Chapter 4. Enterprise Technologies and Big Data Business Intelligence

As described in Chapter 2, in an enterprise executed as a layered system, the strategic layer constrains the tactical layer, which directs the operational layer. The alignment of layers is captured through metrics and performance indicators, which provide the operational layer with insight into how its processes are executing. These measurements are aggregated and enhanced with additional meaning to become KPIs, through which managers of the tactical layer can assess corporate performance, or business execution. The KPIs are related to other measurements and understandings that are used to assess critical success factors. Ultimately, this series of enrichments corresponds to the transformation of data into information, information into knowledge, and knowledge into wisdom.

This chapter discusses the enterprise technologies that support this transformation. Data is held within the operational-level information systems of an organization, where database structures are queried to generate information. Higher up the analytic food chain are analytical processing systems, which leverage multidimensional structures to answer more complex queries and provide deeper insight into business operations. On a larger scale, data is collected from throughout the enterprise and consolidated into a data warehouse. It is from these data stores that management gains insight into broader corporate performance and KPIs.

This chapter covers the following topics:

• Online Transaction Processing (OLTP)

• Online Analytical Processing (OLAP)

• Extract Transform Load (ETL)

• Data Warehouses

• Data Marts

• Traditional BI

• Big Data BI

Online Transaction Processing (OLTP)

An OLTP system is a software system that processes transaction-oriented data. The term “online transaction” refers to the completion of an activity in realtime, as opposed to batch processing. OLTP systems store operational data that is normalized. This data is a common source of structured data and serves as input to many analytic processes. Big Data analysis results can be used to augment OLTP data stored in the underlying relational databases. OLTP systems, such as point of sale systems, execute business processes in support of corporate operations. As shown in Figure 4.1, they perform transactions against a relational database.

Figure 4.1 OLTP systems perform simple database operations to provide sub-second response times.

The queries supported by OLTP systems consist of simple insert, delete and update operations with sub-second response times. Examples include ticket reservation, banking and point of sale systems.
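
The following is a minimal sketch of one such transaction in Python, using the standard library’s sqlite3 module as a stand-in for a production relational DBMS. The point of sale schema and the record_sale function are hypothetical illustrations, not code from any particular OLTP product.

```python
import sqlite3

# Hypothetical point of sale schema; a production OLTP system would run
# against a full relational DBMS rather than a local SQLite file.
conn = sqlite3.connect("pos.db")
conn.execute("CREATE TABLE IF NOT EXISTS inventory (sku TEXT PRIMARY KEY, qty INTEGER)")
conn.execute("CREATE TABLE IF NOT EXISTS sales (sku TEXT, qty INTEGER, sold_at TEXT)")
conn.execute("INSERT OR IGNORE INTO inventory VALUES ('A100', 50)")
conn.commit()

def record_sale(conn, sku, qty):
    """One online transaction: a simple update plus insert, committed atomically."""
    with conn:  # begins a transaction; commits on success, rolls back on error
        cur = conn.execute(
            "UPDATE inventory SET qty = qty - ? WHERE sku = ? AND qty >= ?",
            (qty, sku, qty))
        if cur.rowcount == 0:
            raise ValueError("unknown SKU or insufficient stock")  # triggers rollback
        conn.execute("INSERT INTO sales VALUES (?, ?, datetime('now'))", (sku, qty))

record_sale(conn, "A100", 2)
```

Each call touches only a handful of rows, which is what allows OLTP systems to sustain sub-second response times under many concurrent users.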

Online Analytical Processing (OLAP)

Online analytical processing (OLAP) systems are used for processing data analysis queries. OLAP systems form an integral part of business intelligence, data mining and machine learning processes. They are relevant to Big Data in that they can serve both as a data source and as a data sink capable of receiving data, and they are used in diagnostic, predictive and prescriptive analytics. As shown in Figure 4.2, OLAP systems perform long-running, complex queries against a multidimensional database whose structure is optimized for performing advanced analytics.

Figure 4.2 OLAP systems use multidimensional databases.

OLAP systems store historical data that is aggregated and denormalized to support fast reporting. Their databases hold this data in multidimensional structures and can answer complex queries based on the relationships between multiple aspects of the data.
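
As an illustration only, the sketch below uses a pandas pivot table as a stand-in for a cube-style OLAP query that aggregates a measure across two dimensions at once; the sales records and dimension names are hypothetical.

```python
import pandas as pd

# Hypothetical denormalized sales history with three dimensions
# (region, product, quarter) and one measure (revenue).
sales = pd.DataFrame({
    "region":  ["East", "East", "West", "West", "East", "West"],
    "product": ["A", "B", "A", "B", "A", "A"],
    "quarter": ["Q1", "Q1", "Q1", "Q2", "Q2", "Q2"],
    "revenue": [100, 150, 90, 120, 110, 95],
})

# A cube-style query: aggregate the measure across two dimensions at once,
# the kind of relationship-based question an OLAP system is optimized for.
cube = sales.pivot_table(index="region", columns="quarter",
                         values="revenue", aggfunc="sum", margins=True)
print(cube)
```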

Extract Transform Load (ETL)

Extract Transform Load (ETL) is a process of loading data from a source system into a target system. The source system can be a database, a flat file, or an application. Similarly, the target system can be a database or some other storage system.

ETL represents the main operation through which data warehouses are fed data. A Big Data solution encompasses the ETL feature-set for converting data of different types. Figure 4.3 shows that the required data is first obtained or extracted from the sources, after which the extracts are modified or transformed by the application of rules. Finally, the data is inserted or loaded into the target system.

Figure 4.3 An ETL process can extract data from multiple sources and transform it for loading into a single target system.
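
A minimal sketch of the three steps in Python, assuming a hypothetical flat-file source (orders_export.csv) and a SQLite database as the target system; the transformation rules shown are illustrative only.

```python
import csv
import sqlite3

# Extract: read raw records from the hypothetical flat-file source.
with open("orders_export.csv", newline="") as f:
    raw_rows = list(csv.DictReader(f))

# Transform: apply rules -- normalize casing, cast types, drop invalid rows.
clean_rows = [
    (row["order_id"], row["customer"].strip().title(), float(row["amount"]))
    for row in raw_rows
    if row["amount"]  # rule: discard records with a missing amount
]

# Load: insert the transformed records into the target system.
target = sqlite3.connect("warehouse.db")
target.execute(
    "CREATE TABLE IF NOT EXISTS orders (order_id TEXT, customer TEXT, amount REAL)")
target.executemany("INSERT INTO orders VALUES (?, ?, ?)", clean_rows)
target.commit()
```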

Data Warehouses

A data warehouse is a central, enterprise-wide repository consisting of historical and current data. Data warehouses are heavily used by BI to run various analytical queries, and they usually interface with an OLAP system to support multi-dimensional analytical queries, as shown in Figure 4.4.

Figure 4.4 Batch jobs periodically load data into a data warehouse from operational systems like ERP, CRM and SCM.

Data pertaining to multiple business entities from different operational systems is periodically extracted, validated, transformed and consolidated into a single denormalized database. With periodic data imports from across the enterprise, the amount of data contained in a given data warehouse will continue to increase. Over time this leads to slower query response times for data analysis tasks. To resolve this shortcoming, data warehouses usually contain optimized databases, called analytical databases, to handle reporting and data analysis tasks. An analytical database can exist as a separate DBMS, as in the case of an OLAP database.

Data Marts

A data mart is a subset of the data stored in a data warehouse that typically belongs to a department, division, or specific line of business. Data warehouses can have multiple data marts. As shown in Figure 4.5, enterprise-wide data is collected and business entities are then extracted. Domain-specific entities are persisted into the data warehouse via an ETL process.

Figure 4.5 A data warehouse’s single version of “truth” is based on cleansed data, which is a prerequisite for accurate and error-free reports, as per the output shown on the right.
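
As a rough sketch of how such a subset might be carved out, the following continues the hypothetical warehouse from the ETL example above and persists a domain-specific extract into a department-level data mart; all table and file names are invented for illustration.

```python
import sqlite3

# Continues the hypothetical warehouse from the ETL sketch above; a data
# mart is carved out of it for a single department's reporting needs.
warehouse = sqlite3.connect("warehouse.db")
warehouse.executescript("""
    ATTACH DATABASE 'marketing_mart.db' AS mart;

    -- Persist the domain-specific subset: only the entities and measures
    -- this line of business actually reports on.
    CREATE TABLE IF NOT EXISTS mart.customer_spend AS
        SELECT customer, SUM(amount) AS total_spend
        FROM orders
        GROUP BY customer;
""")
warehouse.commit()
```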

Traditional BI

Traditional BI primarily utilizes descriptive and diagnostic analytics to provide information on historical and current events. It is not “intelligent” because it only provides answers to correctly formulated questions. Correctly formulating questions requires an understanding of business problems and issues and of the data itself. BI reports on different KPIs through:

• ad-hoc reports

• dashboards

Ad-hoc Reports

Ad-hoc reporting is a process that involves manually processing data to produce custom-made reports, as shown in Figure 4.6. The focus of an ad-hoc report is usually on a specific area of the business, such as marketing or supply chain management. The generated custom reports are detailed and often tabular in nature.

Figure 4.6 OLAP and OLTP data sources can be used by BI tools for both ad-hoc reporting and dashboards.

Dashboards

Dashboards provide a holistic view of key business areas. The information displayed on dashboards is generated at periodic intervals or in realtime or near-realtime. The presentation of data on dashboards is graphical in nature, using bar charts, pie charts and gauges, as shown in Figure 4.7.

Figure 4.7 BI tools use both OLAP and OLTP to display the information on dashboards.

As previously explained, data warehouses and data marts contain consolidated and validated information about enterprise-wide business entities. Traditional BI cannot function effectively without data marts because they contain the optimized and segregated data that BI requires for reporting purposes. Without data marts, data needs to be extracted from the data warehouse via an ETL process on an ad-hoc basis whenever a query needs to be run, which increases the time and effort required to execute queries and generate reports.

Traditional BI uses data warehouses and data marts for reporting and data analysis because they allow complex data analysis queries with multiple joins and aggregations to be issued, as shown in Figure 4.8.

Figure 4.8 An example of traditional BI.
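
The sketch below illustrates the kind of multi-join, aggregating query a BI tool might issue against a data mart organized as a star schema; the schema and data are hypothetical and kept to a few rows for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical star schema: one fact table plus two dimension tables.
conn.executescript("""
CREATE TABLE dim_store   (store_id INTEGER PRIMARY KEY, region TEXT);
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE fact_sales  (store_id INTEGER, product_id INTEGER, revenue REAL);

INSERT INTO dim_store   VALUES (1, 'East'), (2, 'West');
INSERT INTO dim_product VALUES (10, 'Grocery'), (20, 'Apparel');
INSERT INTO fact_sales  VALUES (1, 10, 120.0), (1, 20, 80.0),
                               (2, 10, 95.0),  (2, 20, 60.0);
""")

# A complex analysis query with multiple joins and aggregations:
# revenue rolled up by region and product category.
report = conn.execute("""
    SELECT s.region, p.category, SUM(f.revenue) AS total_revenue
    FROM fact_sales f
    JOIN dim_store s   ON f.store_id = s.store_id
    JOIN dim_product p ON f.product_id = p.product_id
    GROUP BY s.region, p.category
    ORDER BY s.region, p.category
""").fetchall()

for region, category, total in report:
    print(region, category, total)
```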

Big Data BI

Big Data BI builds upon traditional BI by acting on the cleansed, consolidated enterprise-wide data in the data warehouse and combining it with semi-structured and unstructured data sources. It incorporates both predictive and prescriptive analytics to facilitate the development of an enterprise-wide understanding of business performance.

While traditional BI analyses generally focus on individual business processes, Big Data BI analyses focus on multiple business processes simultaneously. This helps reveal patterns and anomalies across a broader scope within the enterprise. It also leads to data discovery by identifying insights and information that may have been previously absent or unknown.

Big Data BI requires the analysis of unstructured, semi-structured and structured data residing in the enterprise data warehouse. This requires a “next-generation” data warehouse that uses new features and technologies to store cleansed data originating from a variety of sources in a single uniform data format. The coupling of a traditional data warehouse with these new technologies results in a hybrid data warehouse, which acts as a uniform and central repository of structured, semi-structured and unstructured data that can provide Big Data BI tools with all of the required data. This eliminates the need for the BI tools to connect to multiple data sources to retrieve or access data. In Figure 4.9, a next-generation data warehouse establishes a standardized data access layer across a range of data sources.

Figure 4.9 A next-generation data warehouse.
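
A minimal sketch of what such a standardized data access layer might look like, returning one uniform tabular format for both a structured (relational) source and a semi-structured (JSON) source; the source registry, file names and load_source function are hypothetical.

```python
import json
import sqlite3
import pandas as pd

def load_source(source):
    """A minimal standardized access layer: every source, structured or
    semi-structured, is returned in one uniform tabular format."""
    if source["kind"] == "relational":          # structured (e.g. an ERP extract)
        with sqlite3.connect(source["path"]) as conn:
            return pd.read_sql_query(source["query"], conn)
    if source["kind"] == "json":                # semi-structured (e.g. clickstream)
        with open(source["path"]) as f:
            return pd.json_normalize(json.load(f))
    raise ValueError(f"unsupported source kind: {source['kind']}")

# Hypothetical source registry spanning different data formats.
sources = [
    {"kind": "relational", "path": "warehouse.db",
     "query": "SELECT * FROM orders"},
    {"kind": "json", "path": "web_events.json"},
]

frames = [load_source(s) for s in sources]  # one access path for all sources
```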

Traditional Data Visualization

Data visualization is a technique whereby analytical results are graphically communicated using elements like charts, maps, data grids, infographics and alerts. Graphically representing data can make it easier to understand reports, view trends and identify patterns.

Traditional data visualization provides mostly static charts and graphs in reports and dashboards, whereas contemporary data visualization tools are interactive and can provide both summarized and detailed views of data. They are designed to help people who lack statistical and/or mathematical skills to better understand analytical results without having to resort to spreadsheets.

Traditional data visualization tools query data from relational databases, OLAP systems, data warehouses and spreadsheets to present both descriptive and diagnostic analytics results.

Data Visualization for Big Data

Big Data solutions require data visualization tools that can seamlessly connect to structured, semi-structured and unstructured data sources and are further capable of handling millions of data records. Data visualization tools for Big Data solutions generally use in-memory analytical technologies that reduce the latency normally attributed to traditional, disk-based data visualization tools.

Advanced data visualization tools for Big Data solutions incorporate predictive and prescriptive data analytics and data transformation features. These tools eliminate the need for data pre-processing methods, such as ETL. The tools also provide the ability to directly connect to structured, semi-structured and unstructured data sources. As part of Big Data solutions, advanced data visualization tools can join structured and unstructured data that is kept in memory for fast data access. Queries and statistical formulas can then be applied as part of various data analysis tasks for viewing data in a user-friendly format, such as on a dashboard.

Common features of visualization tools used in Big Data solutions, each of which is illustrated in the sketch after this list:

Aggregation – provides a holistic and summarized view of data across multiple contexts

Drill-down – enables a detailed view of the data of interest by focusing in on a data subset from the summarized view

Filtering – helps focus on a particular set of data by filtering away the data that is not of immediate interest

Roll-up – groups data across multiple categories to show subtotals and totals

What-if analysis – enables multiple outcomes to be visualized by allowing related factors to be dynamically changed
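
The following sketch demonstrates these operations on a small, hypothetical dataset using pandas; real visualization tools expose the same operations interactively rather than as code.

```python
import pandas as pd

# Hypothetical sales records with a region/city hierarchy.
df = pd.DataFrame({
    "region":  ["East", "East", "West", "West"],
    "city":    ["Boston", "NYC", "LA", "Seattle"],
    "revenue": [100, 150, 90, 120],
})

# Aggregation: a summarized view across the whole dataset.
total = df["revenue"].sum()

# Roll-up: subtotals by region plus a grand total.
rollup = df.pivot_table(index="region", values="revenue",
                        aggfunc="sum", margins=True, margins_name="Total")

# Drill-down: from the regional summary into the underlying city detail.
east_detail = df[df["region"] == "East"][["city", "revenue"]]

# Filtering: keep only the data of immediate interest.
big_sales = df[df["revenue"] > 100]

# What-if analysis: recompute the view after dynamically changing a factor,
# here a hypothetical 10% price increase.
uplift = df.assign(revenue=df["revenue"] * 1.10)

print(total, rollup, east_detail, big_sales, uplift, sep="\n\n")
```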
