Data are a strategic asset, and organizations are collecting more data than ever before. The availability of so much data creates big opportunities but also big challenges in analyzing all of it in a timely manner. Trends in analytics and data management, along with heightened regulatory and governance requirements, demand new, innovative approaches that can quickly transform massive volumes of data into meaningful and actionable insights. In-memory analytics helps to overcome these challenges and enables organizations to analyze extremely large volumes of data very quickly and efficiently. This latest technological innovation provides an entirely new approach to tackling big data by using an in-memory analytics engine to deliver super-fast responses to complex analytical problems. Similar to in-database analytics, this technology eliminates the need to copy and replicate data, and it offers additional benefits such as near real-time analysis.
Traditionally, computers have two types of data storage mechanisms: physical disk (hard drive) and RAM (random access memory). I can recall owning a computer with a floppy disk, 128 MB of disk space, and 4 MB of RAM; however, those days are long gone. Now, I have the luxury of owning a laptop with a CD-ROM drive, 500 GB of disk storage, and 8 GB of RAM. The megabyte has given way to the gigabyte and beyond. In today's world, computers have much more available disk storage than RAM, but reading data from the disk is significantly slower, possibly hundreds of times slower, than accessing the same data from RAM. Performance suffers greatly when analyzing enormous volumes of data with traditional disk-based technology.
In-memory analytics is an innovative approach to querying data while it resides in a computer's random access memory (RAM), as opposed to querying data stored on physical disks. This results in vastly shortened query response times, allowing business analytic applications to execute complex, data-intensive analytics and enabling proactive, data-driven decisions.
As the cost of RAM declines, in-memory analytics is becoming more feasible and affordable for many businesses. Analytic applications have traditionally supported caching data in RAM. Older 32-bit operating systems provided only 4 GB of addressable memory for analyses. With newer 64-bit operating systems offering up to 1 terabyte (TB) of addressable memory (and possibly more in the future), it is now feasible to cache larger volumes of data, potentially an entire data warehouse or data mart, in a computer's RAM.
In addition to providing incredibly fast query response times, in-memory analytics can reduce or eliminate the need for indexing data and for storing pre-aggregated data to save time. This capability tremendously reduces IT costs and enables faster implementation of analytic applications. It is anticipated that as analytic applications embrace in-memory analytics, complex data models, data visualizations, and analytics processes can be executed much faster and with more accuracy and precision.
Customers are still exploring and learning about in-memory analytics, since it is relatively new; the technology was introduced around 2011. Customers who have adopted in-memory analytics are experiencing near real-time analyses with deeper insights and increased performance. It allows them to do more with the data they have and to solve a variety of business problems that were previously impossible to address. As complex data exploration and analytical approaches (descriptive analytics, predictive analytics, machine learning, text analytics, prescriptive analytics, etc.) become more prevalent, the efficiency of integrating both the data and analytical workloads is critical to handling the processing needs in today's business climate of unpredictability and change.
As a consumer of IT, I am used to storing my data on a hard drive on my PC or a local server and accessing it when it is needed. When applying analytics, it is no different. The process is similar, but it might not be so simple to have all the data readily available. Conventionally, the data are stored on a disk in a server room, and business analysts are given permission to access the data for analysis.
In the traditional batch processing of data, IT personnel must manage moving a lot of data back and forth between disks and shuffling data to obtain the right data. This process can create a lot of headaches and complexity for both IT and the business.
Our customers have encountered these common challenges, regardless of the industry or the size of the organization:
Some of the approaches taken to overcome the limitations of physical disk storage include using hardware and software to micromanage where the data are actually stored on the physical platter. The more frequently the data are accessed, the closer to the spindle the data are stored, whereas less frequently accessed data could be written to the outside of the platters.
Now let's examine in-memory analytics in detail and discuss how it works.
Compared to the traditional method of executing complex and advanced analytics, in-memory analytics offers many advantages and is an innovative approach to analyzing large amounts of data. In-memory analytics is a technique in which all the data used by an analytic application are stored within the main memory of the computing environment. In most cases, the computing environment is a data warehouse. Instead of being accessed from a physical disk, data remain in the data warehouse until needed and are then lifted into RAM for analytics. Because the data are kept in memory, multiple users can share them across various applications, and calculations complete extremely quickly, in a secure and parallel environment. In-memory analytics also takes advantage of multithreading and distributed computing, where you can distribute the data (and the complex workloads that process the data) across multiple nodes in clusters, as shown in Figure 3.1.
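To make the distributed, in-memory idea concrete, here is a minimal single-machine sketch in Python. It loads a table into memory once and then spreads an analytic workload across worker processes, standing in for the way an in-memory engine distributes work across cluster nodes; the file path and column names are illustrative assumptions, not any particular vendor's API.

```python
# Minimal sketch: load data into memory once, then fan analytic work out
# across CPU cores as a stand-in for distributing it across cluster nodes.
# File path and column names are hypothetical.
from multiprocessing import Pool

import numpy as np
import pandas as pd


def summarize(partition: pd.DataFrame) -> pd.DataFrame:
    """Per-partition analytic workload: spend totals and counts by customer."""
    return partition.groupby("customer_id")["amount"].agg(["sum", "count"])


if __name__ == "__main__":
    # Read the persistent copy from disk (or a warehouse extract) one time;
    # after this, all analysis runs against the in-memory DataFrame.
    transactions = pd.read_parquet("warehouse_extract/transactions.parquet")

    # Split the in-memory data into partitions and process them in parallel.
    partitions = np.array_split(transactions, 8)
    with Pool(processes=8) as pool:
        results = pool.map(summarize, partitions)

    # Combine the partial aggregates into one answer, then derive the mean.
    combined = pd.concat(results).groupby(level=0).sum()
    combined["mean"] = combined["sum"] / combined["count"]
    print(combined.head())
```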
There are significant differences between traditional and in-memory processing. The first and most significant difference is where the data are stored for analytics. Today, with the powerful hardware available commercially, customers are taking advantage of in-memory processing power instead of constantly transferring and shuffling data residing on disk. With in-memory analytics, the persistent copy of the data still lives on the physical disk, but the data are read into memory when needed for analytics. The second difference, and the biggest advantage over traditional processing, is speed. In-memory processing allows users to keep the data in memory and run iterative processing or jobs without having to go back and forth to the disk each time. End users can quickly get answers without worrying about infrastructure limitations for analytical experiments or testing. In addition, data scientists are not restricted to a sample of data; they have all of the data and can apply as many analytic techniques and iterations as desired to find the best model in near real-time.
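The speed argument comes down to how often you pay the disk-read cost. The hedged sketch below shows the same iterative model search written both ways; the file path, column names, and choice of model are my own illustrative assumptions, not the approach of any specific product.

```python
# Illustrative contrast: a disk-first loop re-reads the data every iteration,
# while a memory-first loop loads once and iterates freely against RAM.
# File path, columns, and model are hypothetical.
import pandas as pd
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

FEATURES = ["balance", "income", "tenure_months"]
TARGET = "loss_amount"


def search_disk_based(path: str, alphas) -> dict:
    """Traditional style: every iteration pays the disk-read cost again."""
    scores = {}
    for alpha in alphas:
        df = pd.read_csv(path)  # re-read from disk on every pass
        scores[alpha] = cross_val_score(Ridge(alpha=alpha),
                                        df[FEATURES], df[TARGET]).mean()
    return scores


def search_in_memory(path: str, alphas) -> dict:
    """In-memory style: read once, then run as many iterations as desired."""
    df = pd.read_csv(path)  # single load into memory
    scores = {}
    for alpha in alphas:
        scores[alpha] = cross_val_score(Ridge(alpha=alpha),
                                        df[FEATURES], df[TARGET]).mean()
    return scores
```

Both functions return the same scores; only the second keeps the analyst out of the disk bottleneck as the number of iterations grows.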
As indicated in Chapter 1, the Analytical Data Life Cycle, in-memory analytics is not only associated with queries and data exploration/visualization, but is also used with more complex processes like predictive analytics, complex model development, and text analytics. For example, regression, correlations, decision trees, and neural networks are all associated with in-memory analytics processing.
In-memory analytics helps to solve the following issues that the traditional approach is unable to resolve:
Similar to in-database analytics, a data warehouse is an essential component of in-memory analytics, especially since it contains a set of data that is integrated, cleansed, and refined. Data exploration and visualization are ideal for in-memory processing because they quickly surface useful information, such as correlations in the data you are working with, to help determine whether and what type of further analysis is needed. One customer expressed the value this way: it can "show me the data and patterns in the data." In addition, in-memory analytics allows for more self-service for end users because there is less dependence on IT to create, maintain, and administer aggregates and indexes of the data. In-memory analytics also helps meet diverse and unplanned workloads (e.g., discovering relationships or building models involving observations at a granular level).
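As a small, hypothetical example of that kind of exploration, a quick correlation pass over an in-memory table can tell you whether deeper modeling is worth pursuing; the table and column names here are assumptions for illustration.

```python
# Quick exploration sketch over an in-memory table; names are assumed.
import pandas as pd

sales = pd.read_parquet("edw/sales_snapshot.parquet")  # loaded once into RAM
corr = sales[["revenue", "discount", "units", "tenure_months"]].corr()
print(corr.round(2))  # strongly correlated pairs become candidates for modeling
```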
In-memory analytics is becoming the new, or next-generation, BI (business intelligence). Many of the vendors in this space have developed visually rich analytics features with click and drag-and-drop capabilities. With easy access to data and analytics, organizations are adopting in-memory analytics to develop interactive dashboards and explore data without limits. A few vendors also offer in-memory capabilities for predictive analytics and data model development. With in-memory technology, business users can now engage their data with blazing speed, resulting in more informed, proactive, data-driven decisions. For IT departments, in-memory analytics means far less time spent preparing data for analysis, building aggregates, tuning performance, and performing other time-consuming tasks.
Gartner Research confirms that not only can data be retrieved faster, but in-memory analytical technology also performs complex calculations and returns query results significantly faster than disk-based approaches. This allows users to dissect data and create robust reports without the limitations associated with traditional business intelligence (BI) tools, such as multidimensional cubes or aggregate tables. Near real-time, ad-hoc query capabilities can be extended even to high-volume, high-velocity transaction-based industries such as financial services, telecommunications, and retail.
Organizations are adopting in-memory analytics, alongside the traditional approach, to solve many issues and to improve performance, economics, and governance. These needs are very similar to those driving in-database analytics and have become the main drivers for many organizations. What follows are some reasons for adopting in-memory analytics.
Although the need for in-memory analytics is still growing, I am seeing huge benefits among customers who have adopted and implemented the technology within their organizations. Let's examine these business and IT benefits.
Depending on the size of the organization and the use of in-memory analytics, the benefits are truly remarkable. Customers who implemented in-memory analytics see big transformations in their processes, productivity, and culture within IT and the business. There is a good balance of tangible and intangible benefits from using in-memory analytics.
Customers have cited the following intangible benefits.
Now that the benefits are explained, let's examine the justification for in-memory analytics.
As mentioned earlier, in-memory is relatively new; it has been on the market for approximately four years (at the time of writing). As with all new technology, there are questions about its capabilities and the value it brings to an organization. Customers often have an analytics strategy and roadmap in mind before discussing the in-memory analytics approach. Here are some things that you should consider to justify in-memory analytics and get started with it.
Finally, it is vital that you involve all parties (IT, business users, and sponsors) early in the decision process, as well as throughout the implementation. When they participate in the decision process, I witness higher success rates and on-time, on-budget delivery of tasks.
Many vendors in the in-memory analytics space offer similar technologies, features, functionality, and infrastructure. However, the success or failure of in-memory analytics does rest to some degree on the technology selected as the delivery platform. Customers who have adopted this technology cite a web-enabled, web-centric platform as their primary requirement. Beyond that, here are some other essential technology-driven prerequisites to consider.
In recent months, we have witnessed data security breaches globally in both the private and public sectors. Thus, selecting a solution that heightens data governance and makes data security a priority can alleviate major headaches, costly remedies, and public embarrassment. I highly recommend a vendor whose solution is built around a centralized data server such as a data warehouse or database. Having a centralized data repository enables IT to govern the data in a highly safeguarded environment. With a centralized data repository, such as a data warehouse, your in-memory analytics can adapt and conform to your organization's data security measures. Another recommendation is to identify the users who have rights and privileges to access, analyze, and store sensitive data, and to adjust those privileges as employees change roles or job functions within the company.
Let's examine some customer successes and case studies. These customers have adopted, implemented, and achieved superior results using in-memory analytics by considering the above requirements.
There have been a number of success stories and use cases for in-memory analytics since its introduction to the industry. As previously mentioned, in-memory analytics is used for data exploration and model development. While in-database analytics started to catch on in the mid- to late 2000s, in-memory analytics was first commercially promoted in 2011 by SAS and Teradata.
Our first success story comes from a global banking and financial services company based in Europe. This institution has an international network spanning 50 markets with more than 9,000 branches and over 149,000 employees. This commercial bank operates in 22 European countries, and its strategic position in Western and Eastern Europe gives the company one of the region's highest market shares.
One of the reasons it adopted and implemented in-memory analytics was to enhance its focus on data governance. It was, and still is, imperative for a financial company to adhere to rules and regulations, especially when it comes to data security and governance. Protecting customer information is a priority for the company.
The CFO (chief financial officer) mandated three main objectives to improve the company and implement change for its future:
The CFO is the executive sponsor for this project, and he has outlined the key areas and initiatives to manage complexity and simplify information sharing across departments and throughout the enterprise.
The traditional approach to analyzing sales and business operations is at best mediocre. Data reside in many silos, and there are many copies of the same data. Thus, analyses of the data can be misleading and the results untrustworthy, as data may be outdated and there is no process to provide a single view of the customer. Analyzing customer data and reporting the results of those analyses are critical for the business. The traditional model does not allow business analysts to analyze large amounts of data at a granular level. The staff spent as much as 85% of their time preparing the data instead of performing the analysis.
The business is asking for an innovative solution that allows analysis at a growing level of granularity, on large amounts of data, down to each individual deal transaction, and that enables the staff to focus more on data analysis instead of data preparation. Thus, in-memory analytics enters the picture to help the business thrive with a single, fast, scalable, and user-friendly solution.
As the bank examines its business, it considers many options and concludes that it needs an advanced analytics solution focused on data governance. Figure 3.3 illustrates the before and after pictures of the architecture.
Figure 3.3 illustrates how the traditional architecture is set up and used by the bank. The silo model has data sources (legacy systems, payment systems, securities, market products, and customer information) feeding various data silos. As the data enter the system, each group, such as controlling, risk, accounting, and regulation, applies different data management processes and services. Because there is no consistent method to manage the data, the result is a "spaghetti" mess of data flowing into and out of the data warehouses everywhere. Since the data may be in inconsistent formats or not integrated with other data sources, the company struggles to trust the state of the data and the results of analyses based on it. In addition, with four siloed data marts potentially storing duplicate data, more resources are required to manage and maintain them. This is what the bank referred to as "number crunching" in the previous section. Ultimately, different reports are delivered, and management is unable to trust which report is correct for making business decisions.
Transforming from the left side to the right side, the layered concept is much more streamlined, with a lot of emphasis on data governance. Analytics has become the focal point instead of just reporting on the data. The company truly believes in using analytics to drive data-driven decisions. Let's examine the new architecture on the right side.
Recall the centralization, culture development, and standardization initiatives described earlier; the new architecture provides all of these elements. As data enter the architecture, there is a standard process for managing and integrating them. One important aspect to highlight is the focus on data quality, a topic that I find many customers tend to ignore. (Refer to in-database data quality in Chapter 2.) In this case, the customer carves out a layer to address data quality issues so that data are cleansed and integrated before they are analyzed. Each business unit, such as controlling, risk, accounting, and regulation, leverages the same data from one centralized repository to analyze and run reports using one view of the data. There are cross-functional layers for IT and the business to govern the data. Figure 3.4 provides a more granular view of the architecture and how data are governed by IT and the business.
The bank no longer has four data marts but one enterprise data warehouse (EDW) that allows IT to stage and store the data. Once the data are captured in the EDW, data mining and reporting tools are available to analyze the data for the various business units. Advanced analytics are applied to the EDW data, and the business can perform further analysis such as validation and reporting. In the next section, we will examine how in-memory analytics is used at this financial institution.
The CFO has three main goals for this project:
In-memory analytics was adopted by the customer and is used extensively for data visualization and exploration, to create reports for executives, and to publish the results to the portal. One area is sales. In the banking world, there are many offers and incentives for credit cards, loans, and other types of accounts. For example, everyone can relate to applying for a loan and wondering whether you qualify (and how much you can borrow) to purchase a car or house or to fund a renovation project. The bank uses in-memory analytics to analyze customer data at a granular level and examine historical data: whether you have applied for a loan before and were accepted or rejected; whether you have employment and a steady income to repay the loan; whether you have defaulted on a loan in the past; and your credit history and credit score, which determine how much you can qualify for. These are basic, rudimentary data exploration techniques used by the bank, but they must run in near real-time so that the bank can provide an immediate response to the applicant. Figure 3.5 illustrates the process.
Once the data reside in the data warehouse, the process begins with data preparation to ensure the data are in the required format. The data are then lifted into memory for analysis. The analytics explore relationships in the data and run simulations, producing output and scenarios based on the applicant's input in seconds. The results can then be turned into a report published to the portal, a dashboard, email, or even mobile devices.
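A hedged sketch of that flow follows: prepare the data, keep it in memory, score the applicant, and hand the result to the reporting layer. The tables, columns, thresholds, and qualification rules are purely illustrative; they are not the bank's actual decision logic.

```python
# Illustrative loan-screening flow: prepare, keep in memory, analyze, publish.
# All names, thresholds, and rules below are hypothetical.
import pandas as pd


def prepare(raw: pd.DataFrame) -> pd.DataFrame:
    """Data preparation: put the warehouse extract into the required format."""
    df = raw.copy()
    df["monthly_income"] = df["annual_income"] / 12
    df["has_prior_default"] = df["defaults_count"] > 0
    return df


def score_applicant(df: pd.DataFrame, applicant_id: str) -> dict:
    """In-memory analysis: history, income, defaults, and credit score
    combined into an immediate qualify/decline answer."""
    row = df.loc[df["applicant_id"] == applicant_id].iloc[0]
    qualifies = (row["credit_score"] >= 620
                 and not row["has_prior_default"]
                 and row["monthly_income"] > 0)
    max_loan = row["monthly_income"] * 36 * 0.3 if qualifies else 0.0
    return {"applicant_id": applicant_id,
            "qualifies": bool(qualifies),
            "max_loan": round(float(max_loan), 2)}


# Usage: load once from the warehouse, keep the table in memory, and answer
# each application in near real time; the result feeds the portal, dashboard,
# email, or mobile report.
applications = prepare(pd.read_parquet("edw/loan_applications.parquet"))
print(score_applicant(applications, applicant_id="A-1001"))
```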
In addition to sales, this institution uses in-memory analytics to analyze its network operations. It analyzes network capacity and the behavior of the network to adequately support business planning. The bank cannot afford downtime in any of its systems, as an outage can drastically affect business operations and result in lost revenue.
What used to take days and hours to process and analyze now takes minutes and seconds. From analyzing loan applications to monitoring network operations for business planning, the bank has expanded its use of in-memory analytics from 1,300 users in 2013 to over 15,000 users today. The user-friendly interface and game-changing performance have made this transformation a success. In-memory analytics offers the following benefits:
The customer shared some best practices and lessons learned when adopting a new technology such as in-memory analytics. The primary takeaway from this bank is that the business side must drive the project while leveraging IT for support. The business side included IT very early on when it evaluated the tools and technology. Because in-memory analytics was relatively new and emerging, the customer started several discussions with various vendors that included a proof of concept aligned with the business and IT initiatives (centralization, standardization, culture development, and innovation). Other best practices and lessons learned are:
These best practices and lessons learned provide a good perspective on the teamwork and investment needed to be successful. The bank continues to expand its use of in-memory analytics. Because the adoption rate has been so high, in-memory analytics will extend beyond 15,000 users and to more departments. The bank will continue to publish information and reports to mobile devices for data-driven decisions.
Thus far, I have provided a few customer successes in the financial, e-commerce, and telecommunications sectors. This next one highlights the use of in-memory analytics in the public sector. It is a government agency based in Europe that handles individual and business taxes, as well as customs, for the entire country. As the population changes, this agency is also in charge of analyzing census data, which is very high in volume. The agency is collecting more data today than ever to manage the changing needs of the population.
In the traditional architecture, many of the business units, such as customs, tax, financial, enforcement, and social services benefits, did not have an automated way to look at the data. They relied on manual lookups and a paper trail to know whether an individual household had paid its property tax, as illustrated in Figure 3.6. This is just one example of their archaic process.
Similar to the financial institution discussed earlier, this agency has data in many silos. There is no standardized process to manage the massive amounts of collected data.
Once the data are collected, applying analytics is challenging and time consuming. The reason is the number of analytical tools and processes needed to analyze the silos of data; each department had its own tools. A standard tool is needed with the analytical capabilities to meet all of the departments' needs. When analytics is used, it can take hours or days to complete because of the infrastructure and the silos of data that need to be consolidated. Daily operations are inefficient for both the IT and business groups, motivating the agency to change in order to support the growth of the country and its population.
The homegrown, in-house application that was developed and is still in use is no longer able to meet the needs of the agency. When the agency was exploring a new analytics package, it considered many factors. First and foremost is the usability of the interface: it has to be easy to use so that anyone from analyst to director level can use the technology without having to do any coding, and it must offer self-service functionality. Another factor was the depth and breadth of the analytics capability, serving everyone from novice users to trained statisticians. The technology must be scalable and fast when analyzing large amounts of data from many sources. After many months of evaluation, the agency selected an in-memory data visualization package that could be integrated with its existing data warehouse.
In-memory data visualization is used in various business units because of its flexibility, depth and breadth, ease of use, and timely analysis. One great example from this agency is the ability to create a dashboard for tax collectors. Prior to this system, all records were manually managed and monitored with paper trails. With in-memory analytics, the data can be analyzed at a granular level that includes all of the customer information: name, address, property tax value, tax owed or paid, and so on. The tax collector can simply pull up the information on a dashboard on a mobile device and determine who has not paid their taxes, as shown in Figure 3.7. The tax collector can receive alerts about those whose taxes are overdue by 30, 60, 90 days, and beyond. When the tax collector visits the household to collect the taxes, the debt can be shown to the client in real time on the dashboard, and there can be no argument about whether the client has paid or not. It is a dynamic application that lets the tax collector update and manage the data. Once the information is updated from the mobile device, the dashboard updates for everyone in the agency to view and track progress. This is a great example of using in-memory analytics to get real-time feeds and results and increase revenue for the agency.
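Behind such a dashboard, the alerting logic can be as simple as bucketing unpaid accounts by how long they are overdue. The sketch below shows one way to do this in Python against an in-memory table; the file and column names are my own assumptions, not the agency's actual system.

```python
# Hypothetical overdue-tax alerting: tag unpaid accounts with 30/60/90-day
# buckets so collectors can be alerted. File and column names are assumed.
import pandas as pd


def overdue_buckets(accounts: pd.DataFrame, as_of: pd.Timestamp) -> pd.DataFrame:
    """Return unpaid, past-due accounts tagged with an overdue bucket."""
    unpaid = accounts.loc[~accounts["tax_paid"]].copy()
    unpaid["days_overdue"] = (as_of - unpaid["due_date"]).dt.days
    unpaid = unpaid.loc[unpaid["days_overdue"] > 0]
    unpaid["bucket"] = pd.cut(unpaid["days_overdue"],
                              bins=[0, 30, 60, 90, float("inf")],
                              labels=["<30", "30-60", "60-90", "90+"])
    return unpaid[["taxpayer_name", "address", "tax_owed",
                   "days_overdue", "bucket"]]


# Usage: the in-memory table is refreshed as collectors update records in the
# field, and the same frame feeds every user's dashboard view.
accounts = pd.read_parquet("edw/property_tax_accounts.parquet")
alerts = overdue_buckets(accounts, as_of=pd.Timestamp.today())
print(alerts.sort_values("days_overdue", ascending=False).head())
```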
In addition to the tax department, the customs department also uses in-memory analytics. As many packages enter the country, the agency collects the data and filters what may be of interest for inspectors to flag and inspect. With in-memory analytics, data are fed in as packages are scanned and analyzed for any suspicious materials requiring further examination. This type of information is shared with the lines of business and executives to ensure the safety of the country and its citizens. Other uses of in-memory analytics and data visualization at this agency include:
The agency continues to expand its use of in-memory analytics. The user community has grown exponentially as well, from hundreds to thousands. Adoption of the technology is very high, as personnel in various departments have complimented its ease of use and the analytical capabilities it offers with superior performance. What used to take months to process can now be done in hours, which makes this agency very happy. It strives to be better every day for its country and citizens.
The agency has shared with me the many benefits of using in-memory analytics integrated with the data warehouse.
When I speak to customers about in-memory analytics, one topic that comes up consistently is the investment or cost associated with the hardware and software. Of course, it depends on the vendor and the architecture that you choose to select. Every vendor offers different options when it comes to in-memory analytics. On the hardware side, you will likely need an appliance or a separate server to host the in-memory analytics. The nodes that reside in the appliance or server should be dedicated for in-memory analytics. When sizing the hardware, you should work with the vendor to configure the system appropriately by providing the vendor the following information based on your organization's requirements:
On the software side, there are solutions that offer in-memory data visualization with deep analytical capabilities. In addition, there are domain-specific in-memory analytics packages for data mining, statistics, forecasting, econometrics, text mining, and optimization. If a customer is looking for an industry-specific in-memory solution, there are packages for anti-money laundering, risk, and marketing optimization. Depending on your needs, I advise adopting one solution and testing it to see whether it meets the requirements of your business. Once you have proven it successful, you can expand the hardware and adopt additional software packages to extend the use of in-memory analytics.
When selecting an in-memory package, it is essential to ensure that the analytics is well integrated with the data warehouse.
By now, you should have a good sense of what in-memory analytics is and what it can do. I tend to compare in-memory analytics to the sprint stage, while in-database is the crawl stage. The next chapter covers Hadoop, which has been a trendy topic in the IT industry for the last few years. Hadoop will be the last leg of the relay, and then we will see how in-database, in-memory, and Hadoop fit into the big picture that customers are embracing.