Data are a strategic asset, and organizations are collecting more data than ever before. The availability of so much data creates big opportunities but also big challenges in analyzing all of it in a timely manner. Trends in analytics and data management, along with heightened regulatory and governance requirements, demand new, innovative approaches that can quickly transform massive volumes of data into meaningful and actionable insights. In-memory analytics helps to overcome these challenges and enables organizations to analyze extremely large volumes of data very quickly and efficiently. This latest technological innovation provides an entirely new approach to tackling big data by using an in-memory analytics engine to deliver super-fast responses to complex analytical problems. Similar to in-database analytics, this technology eliminates the need to copy and replicate data, and it offers additional benefits such as near real-time analysis.
Traditionally, computers have two types of data storage mechanisms: physical disk (hard drive) and RAM (random access memory). I can recall owning a computer with a floppy disk, 128 MB of disk space, and 4 MB of RAM; however, those days are long gone. Now, I have the luxury of owning a laptop with a CD-ROM drive, 500 GB of disk storage, and 8 GB of RAM. The megabyte has given way to the gigabyte and beyond. In today's world, computers have much more available disk storage than RAM, but reading data from the disk is significantly slower, possibly hundreds of times slower, than accessing the same data from RAM. Performance suffers greatly when analyzing enormous volumes of data with traditional disk-based technology.
In-memory analytics is an innovative approach to querying data while it resides in a computer's random access memory (RAM), as opposed to querying data stored on physical disks. This results in vastly shortened query response times, allowing business analytic applications to execute complex, data-intensive analytics and enabling proactive, data-driven decisions.
As the cost of RAM declines, in-memory analytics is becoming more feasible and affordable for many businesses. Analytic applications have traditionally supported caching data in RAM. Older 32-bit operating systems provided only 4 GB of addressable memory for analyses. With newer 64-bit operating systems offering up to 1 terabyte (TB) of addressable memory (and possibly more in the future), it is now feasible to cache larger volumes of data, potentially an entire data warehouse or data mart, in a computer's RAM.
In addition to providing incredibly fast query response times, in-memory analytics can reduce or eliminate the need for indexing data and for storing pre-aggregated data to save time. This capability tremendously reduces IT costs and enables faster implementation of analytic applications. It is anticipated that as analytic applications embrace in-memory analytics, complex data models, data visualizations, and analytics processes can be executed much faster and with more accuracy and precision.
Customers are still exploring and learning about in-memory analytics, since it is relatively new; the technology was introduced around 2011. Customers who have adopted in-memory analytics are experiencing near real-time analyses with deeper insights and increased performance. It allows them to do more with the data they have and to solve a variety of business problems that were previously impossible to address. As complex data exploration and analytical approaches (descriptive analytics, predictive analytics, machine learning, text analytics, prescriptive analytics, etc.) become more prevalent, the efficiency of integrating both the data and analytical workloads is critical to handling the processing needs in today's business climate of unpredictability and change.
As a consumer of IT, I am used to storing my data on a hard drive on my PC or a local server and accessing it when it is needed. When applying analytics, it is no different. The process is similar, but it might not be so simple to have all the data readily available. Conventionally, the data are stored on a disk in a server room, and business analysts are given permission to access the data for analysis.
In the traditional batch processing of data, IT personnel must manage moving a lot of data back and forth between disks and shuffling data to obtain the right data. This process can create a lot of headaches and complexity for both IT and the business.
Our customers have encountered these common challenges, regardless of the industry or the size of the organization:
Some of the approaches taken to overcome the limitations of physical disk storage include using hardware and software to micromanage where the data are actually stored on the physical platter. The more frequently the data are accessed, the closer to the spindle the data are stored, whereas less frequently accessed data could be written to the outside of the platters.
Now let's examine in-memory analytics in detail and discuss how it works.
Compared to the traditional method of executing complex and advanced analytics, in-memory analytics offers many advantages and is an innovative approach to analyzing large amounts of data. In-memory analytics is a technique in which all the data used by an analytic application are stored within the main memory of the computing environment. In most cases, the computing environment is a data warehouse. Instead of being accessed from a physical disk, data remain in the data warehouse until needed and are then lifted into RAM for analytics. Because the data are kept in memory, multiple users can share them across various applications, and calculations complete extremely quickly, in a secure and parallel environment. In-memory analytics also takes advantage of multithreading and distributed computing, where you can distribute the data (and the complex workloads that process the data) across multiple nodes in clusters, as shown in Figure 3.1.
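To make the distributed, in-memory idea concrete, here is a minimal single-machine sketch in Python. It loads a table into memory once and then spreads an analytic workload across worker processes, standing in for the way an in-memory engine distributes work across cluster nodes; the file path and column names are illustrative assumptions, not any particular vendor's API.

```python
# Minimal sketch: load data into memory once, then fan analytic work out
# across CPU cores as a stand-in for distributing it across cluster nodes.
# File path and column names are hypothetical.
from multiprocessing import Pool

import numpy as np
import pandas as pd


def summarize(partition: pd.DataFrame) -> pd.DataFrame:
    """Per-partition analytic workload: spend totals and counts by customer."""
    return partition.groupby("customer_id")["amount"].agg(["sum", "count"])


if __name__ == "__main__":
    # Read the persistent copy from disk (or a warehouse extract) one time;
    # after this, all analysis runs against the in-memory DataFrame.
    transactions = pd.read_parquet("warehouse_extract/transactions.parquet")

    # Split the in-memory data into partitions and process them in parallel.
    partitions = np.array_split(transactions, 8)
    with Pool(processes=8) as pool:
        results = pool.map(summarize, partitions)

    # Combine the partial aggregates into one answer, then derive the mean.
    combined = pd.concat(results).groupby(level=0).sum()
    combined["mean"] = combined["sum"] / combined["count"]
    print(combined.head())
```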
There are significant differences between traditional and in-memory processing. The first and most significant difference is where the data are stored for analytics. Today, with the powerful hardware available commercially, customers are taking advantage of in-memory processing power instead of constantly transferring and shuffling data residing on disk. With in-memory analytics, the persistent copy of the data still lives on the physical disk, but the data are read into memory when needed for analytics. The second difference, and the biggest advantage over traditional processing, is speed. In-memory processing allows users to keep the data in memory and run iterative processing or jobs without having to go back and forth to the disk each time. End users can quickly get answers without worrying about infrastructure limitations for analytical experiments or testing. In addition, data scientists are not restricted to a sample of data; they have all of the data and can apply as many analytic techniques and iterations as desired to find the best model in near real-time.
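The speed argument comes down to how often you pay the disk-read cost. The hedged sketch below shows the same iterative model search written both ways; the file path, column names, and choice of model are my own illustrative assumptions, not the approach of any specific product.

```python
# Illustrative contrast: a disk-first loop re-reads the data every iteration,
# while a memory-first loop loads once and iterates freely against RAM.
# File path, columns, and model are hypothetical.
import pandas as pd
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

FEATURES = ["balance", "income", "tenure_months"]
TARGET = "loss_amount"


def search_disk_based(path: str, alphas) -> dict:
    """Traditional style: every iteration pays the disk-read cost again."""
    scores = {}
    for alpha in alphas:
        df = pd.read_csv(path)  # re-read from disk on every pass
        scores[alpha] = cross_val_score(Ridge(alpha=alpha),
                                        df[FEATURES], df[TARGET]).mean()
    return scores


def search_in_memory(path: str, alphas) -> dict:
    """In-memory style: read once, then run as many iterations as desired."""
    df = pd.read_csv(path)  # single load into memory
    scores = {}
    for alpha in alphas:
        scores[alpha] = cross_val_score(Ridge(alpha=alpha),
                                        df[FEATURES], df[TARGET]).mean()
    return scores
```

Both functions return the same scores; only the second keeps the analyst out of the disk bottleneck as the number of iterations grows.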
As indicated in Chapter 1, the Analytical Data Life Cycle, in-memory analytics is not only associated with queries and data exploration/visualization, but is also used with more complex processes like predictive analytics, complex model development, and text analytics. For example, regression, correlations, decision trees, and neural networks are all associated with in-memory analytics processing.
In-memory analytics helps to solve the following issues that the traditional approach is unable to resolve:
Similar to in-database analytics, a data warehouse is an essential component of in-memory analytics, especially since it contains a set of data that is integrated, cleansed, and refined. Data exploration and visualization are ideal for in-memory processing because they quickly surface useful information, such as correlations in the data you are working with, to help determine whether and what type of further analysis is needed. One customer expressed the value this way: it can "show me the data and patterns in the data." In addition, in-memory analytics allows for more self-service for end users because there is less dependence on IT to create, maintain, and administer aggregates and indexes of the data. In-memory analytics also helps meet diverse and unplanned workloads (e.g., discovering relationships or building models involving observations at a granular level).
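As a small, hypothetical example of that kind of exploration, a quick correlation pass over an in-memory table can tell you whether deeper modeling is worth pursuing; the table and column names here are assumptions for illustration.

```python
# Quick exploration sketch over an in-memory table; names are assumed.
import pandas as pd

sales = pd.read_parquet("edw/sales_snapshot.parquet")  # loaded once into RAM
corr = sales[["revenue", "discount", "units", "tenure_months"]].corr()
print(corr.round(2))  # strongly correlated pairs become candidates for modeling
```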
In-memory analytics is becoming the new, or next-generation, BI (business intelligence). Many of the vendors in this space have developed visually rich analytics features with click and drag-and-drop capabilities. With easy access to data and analytics, organizations are adopting in-memory analytics to develop interactive dashboards and explore data without limits. A few vendors also offer in-memory capabilities for predictive analytics and data model development. With in-memory technology, business users can now engage their data with blazing speed, resulting in more informed, proactive, data-driven decisions. For IT departments, in-memory analytics means far less time spent preparing data for analysis, building aggregates, tuning performance, and performing other time-consuming tasks.
Gartner Research confirms that not only can data be retrieved faster, but in-memory analytical technology also performs complex calculations and returns query results significantly faster than disk-based approaches. This allows users to dissect data and create robust reports without the limitations associated with traditional business intelligence (BI) tools, such as multidimensional cubes or aggregate tables. Near real-time, ad-hoc query capabilities can be extended even to high-volume, high-velocity transaction-based industries such as financial services, telecommunications, and retail.
Organizations are adopting in-memory analytics, alongside the traditional approach, to solve many issues and to improve performance, economics, and governance. These needs are very similar to those driving in-database analytics and have become the main drivers for many organizations. What follows are some reasons for adopting in-memory analytics.
Although the need for in-memory analytics is still growing, I am seeing huge benefits among customers who have adopted and implemented the technology within their organizations. Let's examine these business and IT benefits.
Depending on the size of the organization and the use of in-memory analytics, the benefits are truly remarkable. Customers who implemented in-memory analytics see big transformations in their processes, productivity, and culture within IT and the business. There is a good balance of tangible and intangible benefits from using in-memory analytics.
Customers have cited the following intangible benefits.
Now that the benefits are explained, let's examine the justification for in-memory analytics.
As mentioned earlier, in-memory is relatively new; it has been on the market for approximately four years (at the time of writing). As with all new technology, there are questions about its capabilities and the value it brings to an organization. Customers often have an analytics strategy and roadmap in mind before discussing the in-memory analytics approach. Here are some things that you should consider to justify in-memory analytics and get started with it.
Finally, it is vital that you involve all parties (IT, business users, and sponsors) early in the decision process, as well as throughout the implementation. When they participate in the decision process, I witness higher success rates and on-time, on-budget delivery of tasks.
Many vendors in the in-memory analytics space offer similar technologies, features, functionality, and infrastructure. However, the success or failure of in-memory analytics does rest to some degree on the technology selected as the delivery platform. Customers who have adopted this technology cite a web-enabled, web-centric platform as their primary requirement. Beyond that, here are some other essential technology-driven prerequisites to consider.
In recent months, we have witnessed data security breaches globally in both the private and public sectors. Thus, selecting a solution that heightens data governance and makes data security a priority can alleviate major headaches, costly remedies, and public embarrassment. I highly recommend a vendor whose solution is built around a centralized data server such as a data warehouse or database. Having a centralized data repository enables IT to govern the data in a highly safeguarded environment. With a centralized data repository, such as a data warehouse, your in-memory analytics can adapt and conform to your organization's data security measures. Another recommendation is to identify the users who have rights and privileges to access, analyze, and store sensitive data, and to adjust those privileges as employees change roles or job functions within the company.
Let's examine some customer successes and case studies. These customers have adopted, implemented, and achieved superior results using in-memory analytics by considering the above requirements.
There have been a number of success stories and use cases for in-memory analytics since its introduction to the industry. As previously mentioned, in-memory analytics is used for data exploration and model development. While in-database analytics started to catch on in the mid- to late 2000s, in-memory analytics was first commercially promoted in 2011 by SAS and Teradata.
Our first success story comes from a global banking and financial services company based in Europe. This institution has an international network spanning 50 markets with more than 9,000 branches and over 149,000 employees. This commercial bank operates in 22 European countries, and its strategic position in Western and Eastern Europe gives the company one of the region's highest market shares.
One of the reasons it adopted and implemented in-memory analytics was to enhance its focus on data governance. It was, and still is, imperative for a financial company to adhere to rules and regulations, especially when it comes to data security and governance. Protecting customer information is a priority for the company.
The CFO (chief financial officer) mandated three main objectives to improve the company and implement change for its future:
The CFO is the executive sponsor for this project, and he has outlined the key areas and initiatives to manage complexity and simplify information sharing across departments and throughout the enterprise.
The traditional approach to analyzing sales and business operations is at best mediocre. Data reside in many silos, and there are many copies of the same data. Thus, analyses of the data can be misleading and the results untrustworthy, as data may be outdated and there is no process to provide a single view of the customer. Analyzing customer data and reporting the results of those analyses are critical for the business. The traditional model does not allow business analysts to analyze large amounts of data at a granular level. The staff spent as much as 85% of their time preparing the data instead of performing the analysis.
The business is asking for an innovative solution that allows analysis at a growing level of granularity, on large amounts of data, down to each individual deal transaction, and that enables the staff to focus more on data analysis instead of data preparation. Thus, in-memory analytics enters the picture to help the business thrive with a single, fast, scalable, and user-friendly solution.
As the bank examines its business, it considers many options and concludes that it needs an advanced analytics solution focused on data governance. Figure 3.3 illustrates the before and after pictures of the architecture.
Figure 3.3 illustrates how the traditional architecture is set up and used by the bank. The silo model has data sources (legacy systems, payment systems, securities, market products, and customer information) feeding various data silos. As the data enter the system, each group, such as controlling, risk, accounting, and regulation, applies different data management processes and services. Because there is no consistent method to manage the data, the result is a "spaghetti" mess of data flowing into and out of the data warehouses everywhere. Since the data may be in inconsistent formats or not integrated with other data sources, the company struggles to trust the state of the data and the results of analyses based on it. In addition, with four siloed data marts potentially storing duplicate data, more resources are required to manage and maintain them. This is what the bank referred to as "number crunching" in the previous section. Ultimately, different reports are delivered, and management is unable to trust which report is correct for making business decisions.
Transforming from the left side to the right side, the layered concept is much more streamlined, with a lot of emphasis on data governance. Analytics has become the focal point instead of just reporting on the data. The company truly believes in using analytics to drive data-driven decisions. Let's examine the new architecture on the right side.
Recall the centralization, culture development, and standardization initiatives described earlier; the new architecture provides all of these elements. As data enter the architecture, there is a standard process for managing and integrating them. One important aspect to highlight is the focus on data quality, a topic that I find many customers tend to ignore. (Refer to in-database data quality in Chapter 2.) In this case, the customer carves out a layer to address data quality issues so that data are cleansed and integrated before they are analyzed. Each business unit, such as controlling, risk, accounting, and regulation, leverages the same data from one centralized repository to analyze and run reports using one view of the data. There are cross-functional layers for IT and the business to govern the data. Figure 3.4 provides a more granular view of the architecture and how data are governed by IT and the business.
The bank no longer has four data marts but one enterprise data warehouse (EDW) that allows IT to stage and store the data. Once the data are captured in the EDW, data mining and reporting tools are available to analyze the data for the various business units. Advanced analytics are applied to the EDW data, and the business can perform further analysis such as validation and reporting. In the next section, we will examine how in-memory analytics is used at this financial institution.
The CFO has three main goals for this project:
In-memory analytics was adopted by the customer and is used extensively for data visualization and exploration, to create reports for executives, and to publish the results to the portal. One area is sales. In the banking world, there are many offers and incentives for credit cards, loans, and other types of accounts. For example, everyone can relate to applying for a loan and wondering whether you qualify (and how much you can borrow) to purchase a car or house or to fund a renovation project. The bank uses in-memory analytics to analyze customer data at a granular level and examine historical data: whether you have applied for a loan before and were accepted or rejected; whether you have employment and a steady income to repay the loan; whether you have defaulted on a loan in the past; and your credit history and credit score, which determine how much you can qualify for. These are basic, rudimentary data exploration techniques used by the bank, but they must run in near real-time so that the bank can provide an immediate response to the applicant. Figure 3.5 illustrates the process.
Once the data reside in the data warehouse, the process begins with data preparation to ensure the data are in the required format. The data are then lifted into memory for analysis. The analytics explore relationships in the data and run simulations, producing output and scenarios based on the applicant's input in seconds. The results can then be turned into a report published to the portal, a dashboard, email, or even mobile devices.
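A hedged sketch of that flow follows: prepare the data, keep it in memory, score the applicant, and hand the result to the reporting layer. The tables, columns, thresholds, and qualification rules are purely illustrative; they are not the bank's actual decision logic.

```python
# Illustrative loan-screening flow: prepare, keep in memory, analyze, publish.
# All names, thresholds, and rules below are hypothetical.
import pandas as pd


def prepare(raw: pd.DataFrame) -> pd.DataFrame:
    """Data preparation: put the warehouse extract into the required format."""
    df = raw.copy()
    df["monthly_income"] = df["annual_income"] / 12
    df["has_prior_default"] = df["defaults_count"] > 0
    return df


def score_applicant(df: pd.DataFrame, applicant_id: str) -> dict:
    """In-memory analysis: history, income, defaults, and credit score
    combined into an immediate qualify/decline answer."""
    row = df.loc[df["applicant_id"] == applicant_id].iloc[0]
    qualifies = (row["credit_score"] >= 620
                 and not row["has_prior_default"]
                 and row["monthly_income"] > 0)
    max_loan = row["monthly_income"] * 36 * 0.3 if qualifies else 0.0
    return {"applicant_id": applicant_id,
            "qualifies": bool(qualifies),
            "max_loan": round(float(max_loan), 2)}


# Usage: load once from the warehouse, keep the table in memory, and answer
# each application in near real time; the result feeds the portal, dashboard,
# email, or mobile report.
applications = prepare(pd.read_parquet("edw/loan_applications.parquet"))
print(score_applicant(applications, applicant_id="A-1001"))
```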
In addition to sales, this institution uses in-memory analytics to analyze its network operations. It analyzes network capacity and the behavior of the network to adequately support business planning. The bank cannot afford downtime in any of its systems, as an outage can drastically affect business operations and result in lost revenue.
What used to take days and hours to process and analyze now takes minutes and seconds. From analyzing loan applications to monitoring network operations for business planning, the bank has expanded its use of in-memory analytics from 1,300 users in 2013 to over 15,000 users today. The user-friendly interface and game-changing performance have made this transformation a success. In-memory analytics offers the following benefits:
The customer shared some best practices and lessons learned when adopting a new technology such as in-memory analytics. The primary takeaway from this bank is that the business side must drive the project while leveraging IT for support. The business side included IT very early on when it evaluated the tools and technology. Because in-memory analytics was relatively new and emerging, the customer started several discussions with various vendors that included a proof of concept aligned with the business and IT initiatives (centralization, standardization, culture development, and innovation). Other best practices and lessons learned are:
These best practices and lessons learned provide a good perspective on the teamwork and investment needed to be successful. The bank continues to expand its use of in-memory analytics. Because the adoption rate has been so high, in-memory analytics will extend beyond 15,000 users and to more departments. The bank will continue to publish information and reports to mobile devices for data-driven decisions.
Thus far, I have provided a few customer successes in the financial, e-commerce, and telecommunications sectors. This next one highlights the use of in-memory analytics in the public sector. It is a government agency based in Europe that handles individual and business taxes, as well as customs, for the entire country. As the population changes, this agency is also in charge of analyzing census data, which is very high in volume. The agency is collecting more data today than ever to manage the changing needs of the population.
In the traditional architecture, many of the business units, such as customs, tax, financial, enforcement, and social services benefits, did not have an automated way to look at the data. They relied on manual lookups and a paper trail to know whether an individual household had paid its property tax, as illustrated in Figure 3.6. This is just one example of their archaic process.
Similar to the financial institution discussed earlier, this agency has data in many silos. There is no standardized process to manage the massive amounts of collected data.
Once the data are collected, applying analytics is challenging and time consuming. The reason is the number of analytical tools and processes needed to analyze the silos of data; each department had its own tools. A standard tool is needed with the analytical capabilities to meet all of the departments' needs. When analytics is used, it can take hours or days to complete because of the infrastructure and the silos of data that need to be consolidated. Daily operations are inefficient for both the IT and business groups, motivating the agency to change in order to support the growth of the country and its population.
The homegrown, in-house application that was developed and is still in use is no longer able to meet the needs of the agency. When the agency was exploring a new analytics package, it considered many factors. First and foremost is the usability of the interface: it has to be easy to use so that anyone from analyst to director level can use the technology without having to do any coding, and it must offer self-service functionality. Another factor was the depth and breadth of the analytics capability, serving everyone from novice users to trained statisticians. The technology must be scalable and fast when analyzing large amounts of data from many sources. After many months of evaluation, the agency selected an in-memory data visualization package that could be integrated with its existing data warehouse.
In-memory data visualization is used in various business units because of its flexibility, depth and breadth, ease of use, and timely analysis. One great example from this agency is the ability to create a dashboard for tax collectors. Prior to this system, all records were manually managed and monitored with paper trails. With in-memory analytics, the data can be analyzed at a granular level that includes all of the customer information: name, address, property tax value, tax owed or paid, and so on. The tax collector can simply pull up the information on a dashboard on a mobile device and determine who has not paid their taxes, as shown in Figure 3.7. The tax collector can receive alerts about those whose taxes are overdue by 30, 60, 90 days, and beyond. When the tax collector visits the household to collect the taxes, the debt can be shown to the client in real time on the dashboard, and there can be no argument about whether the client has paid or not. It is a dynamic application that lets the tax collector update and manage the data. Once the information is updated from the mobile device, the dashboard updates for everyone in the agency to view and track progress. This is a great example of using in-memory analytics to get real-time feeds and results and increase revenue for the agency.
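Behind such a dashboard, the alerting logic can be as simple as bucketing unpaid accounts by how long they are overdue. The sketch below shows one way to do this in Python against an in-memory table; the file and column names are my own assumptions, not the agency's actual system.

```python
# Hypothetical overdue-tax alerting: tag unpaid accounts with 30/60/90-day
# buckets so collectors can be alerted. File and column names are assumed.
import pandas as pd


def overdue_buckets(accounts: pd.DataFrame, as_of: pd.Timestamp) -> pd.DataFrame:
    """Return unpaid, past-due accounts tagged with an overdue bucket."""
    unpaid = accounts.loc[~accounts["tax_paid"]].copy()
    unpaid["days_overdue"] = (as_of - unpaid["due_date"]).dt.days
    unpaid = unpaid.loc[unpaid["days_overdue"] > 0]
    unpaid["bucket"] = pd.cut(unpaid["days_overdue"],
                              bins=[0, 30, 60, 90, float("inf")],
                              labels=["<30", "30-60", "60-90", "90+"])
    return unpaid[["taxpayer_name", "address", "tax_owed",
                   "days_overdue", "bucket"]]


# Usage: the in-memory table is refreshed as collectors update records in the
# field, and the same frame feeds every user's dashboard view.
accounts = pd.read_parquet("edw/property_tax_accounts.parquet")
alerts = overdue_buckets(accounts, as_of=pd.Timestamp.today())
print(alerts.sort_values("days_overdue", ascending=False).head())
```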
In addition to the tax department, the customs department also uses in-memory analytics. As many packages enter the country, the agency collects the data and filters what may be of interest for inspectors to flag and inspect. With in-memory analytics, data are fed in as packages are scanned and analyzed for any suspicious materials requiring further examination. This type of information is shared with the lines of business and executives to ensure the safety of the country and its citizens. Other uses of in-memory analytics and data visualization at this agency include:
The agency continues to expand its use of in-memory analytics. The user community has grown exponentially as well, from hundreds to thousands. Adoption of the technology is very high, as personnel in various departments have complimented its ease of use and the analytical capabilities it offers with superior performance. What used to take months to process can now be done in hours, which makes this agency very happy. It strives to be better every day for its country and citizens.
The agency has shared with me the many benefits of using in-memory analytics integrated with the data warehouse.
When I speak to customers about in-memory analytics, one topic that comes up consistently is the investment or cost associated with the hardware and software. Of course, it depends on the vendor and the architecture that you choose to select. Every vendor offers different options when it comes to in-memory analytics. On the hardware side, you will likely need an appliance or a separate server to host the in-memory analytics. The nodes that reside in the appliance or server should be dedicated for in-memory analytics. When sizing the hardware, you should work with the vendor to configure the system appropriately by providing the vendor the following information based on your organization's requirements:
On the software side, there are solutions that offer in-memory data visualization with deep analytical capabilities. In addition, there are domain-specific in-memory analytics packages for data mining, statistics, forecasting, econometrics, text mining, and optimization. If a customer is looking for an industry-specific in-memory solution, there are packages for anti-money laundering, risk, and marketing optimization. Depending on your needs, I advise adopting one solution and testing it to see whether it meets the requirements of your business. Once you have proven it successful, you can expand the hardware and adopt additional software packages to extend the use of in-memory analytics.
When selecting an in-memory package, it is essential to ensure that the analytics is well integrated with the data warehouse.
By now, you should have a good sense of what in-memory analytics is and what it can do. I tend to compare in-memory analytics to the sprint stage, while in-database is the crawl stage. The next chapter covers Hadoop, which has been a trendy topic in the IT industry for the last few years. Hadoop will be the last leg of the relay, and then we will see how in-database, in-memory, and Hadoop fit into the big picture that customers are embracing.