Chapter 4. Enterprise Technologies and Big Data Business Intelligence

As described in Chapter 2, in an enterprise executed as a layered system, the strategic layer constrains the tactical layer, which directs the operational layer. The alignment of layers is captured through metrics and performance indicators, which provide the operational layer with insight into how its processes are executing. These measurements are aggregated and enhanced with additional meaning to become KPIs, through which managers of the tactical layer can assess corporate performance, or business execution. The KPIs are related to other measurements and understandings that are used to assess critical success factors. Ultimately, this series of enrichments corresponds to the transformation of data into information, information into knowledge, and knowledge into wisdom.

This chapter discusses the enterprise technologies that support this transformation. Data is held within the operational-level information systems of an organization, where database structures are queried to generate information. Higher up the analytic food chain are analytical processing systems, which leverage multidimensional structures to answer more complex queries and provide deeper insight into business operations. On a larger scale, data is collected from throughout the enterprise and consolidated into a data warehouse. It is from these data stores that management gains insight into broader corporate performance and KPIs.

This chapter covers the following topics:

• Online Transaction Processing (OLTP)

• Online Analytical Processing (OLAP)

• Extract Transform Load (ETL)

• Data Warehouses

• Data Marts

• Traditional BI

• Big Data BI

Online Transaction Processing (OLTP)

An OLTP system is a software system that processes transaction-oriented data. The term “online transaction” refers to the completion of an activity in realtime, as opposed to batch processing. OLTP systems store operational data that is normalized. This data is a common source of structured data and serves as input to many analytic processes. Big Data analysis results can be used to augment OLTP data stored in the underlying relational databases. OLTP systems, such as point of sale systems, execute business processes in support of corporate operations. As shown in Figure 4.1, they perform transactions against a relational database.

Figure 4.1 OLTP systems perform simple database operations to provide sub-second response times.

The queries supported by OLTP systems consist of simple insert, delete and update operations with sub-second response times. Examples include ticket reservation, banking and point of sale systems.
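
The following is a minimal sketch of one such transaction in Python, using the standard library’s sqlite3 module as a stand-in for a production relational DBMS. The point of sale schema and the record_sale function are hypothetical illustrations, not code from any particular OLTP product.

```python
import sqlite3

# Hypothetical point of sale schema; a production OLTP system would run
# against a full relational DBMS rather than a local SQLite file.
conn = sqlite3.connect("pos.db")
conn.execute("CREATE TABLE IF NOT EXISTS inventory (sku TEXT PRIMARY KEY, qty INTEGER)")
conn.execute("CREATE TABLE IF NOT EXISTS sales (sku TEXT, qty INTEGER, sold_at TEXT)")
conn.execute("INSERT OR IGNORE INTO inventory VALUES ('A100', 50)")
conn.commit()

def record_sale(conn, sku, qty):
    """One online transaction: a simple update plus insert, committed atomically."""
    with conn:  # begins a transaction; commits on success, rolls back on error
        cur = conn.execute(
            "UPDATE inventory SET qty = qty - ? WHERE sku = ? AND qty >= ?",
            (qty, sku, qty))
        if cur.rowcount == 0:
            raise ValueError("unknown SKU or insufficient stock")  # triggers rollback
        conn.execute("INSERT INTO sales VALUES (?, ?, datetime('now'))", (sku, qty))

record_sale(conn, "A100", 2)
```

Each call touches only a handful of rows, which is what allows OLTP systems to sustain sub-second response times under many concurrent users.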

Online Analytical Processing (OLAP)

Online analytical processing (OLAP) systems are used for processing data analysis queries. OLAP systems form an integral part of business intelligence, data mining and machine learning processes. They are relevant to Big Data in that they can serve both as a data source and as a data sink capable of receiving data, and they are used in diagnostic, predictive and prescriptive analytics. As shown in Figure 4.2, OLAP systems perform long-running, complex queries against a multidimensional database whose structure is optimized for performing advanced analytics.

Figure 4.2 OLAP systems use multidimensional databases.

OLAP systems store historical data that is aggregated and denormalized to support fast reporting. Their databases hold this data in multidimensional structures and can answer complex queries based on the relationships between multiple aspects of the data.
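
As an illustration only, the sketch below uses a pandas pivot table as a stand-in for a cube-style OLAP query that aggregates a measure across two dimensions at once; the sales records and dimension names are hypothetical.

```python
import pandas as pd

# Hypothetical denormalized sales history with three dimensions
# (region, product, quarter) and one measure (revenue).
sales = pd.DataFrame({
    "region":  ["East", "East", "West", "West", "East", "West"],
    "product": ["A", "B", "A", "B", "A", "A"],
    "quarter": ["Q1", "Q1", "Q1", "Q2", "Q2", "Q2"],
    "revenue": [100, 150, 90, 120, 110, 95],
})

# A cube-style query: aggregate the measure across two dimensions at once,
# the kind of relationship-based question an OLAP system is optimized for.
cube = sales.pivot_table(index="region", columns="quarter",
                         values="revenue", aggfunc="sum", margins=True)
print(cube)
```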

Extract Transform Load (ETL)

Extract Transform Load (ETL) is a process of loading data from a source system into a target system. The source system can be a database, a flat file, or an application. Similarly, the target system can be a database or some other storage system.

ETL represents the main operation through which data warehouses are fed data. A Big Data solution encompasses the ETL feature-set for converting data of different types. Figure 4.3 shows that the required data is first obtained or extracted from the sources, after which the extracts are modified or transformed by the application of rules. Finally, the data is inserted or loaded into the target system.

Figure 4.3 An ETL process can extract data from multiple sources and transform it for loading into a single target system.
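
A minimal sketch of the three steps in Python, assuming a hypothetical flat-file source (orders_export.csv) and a SQLite database as the target system; the transformation rules shown are illustrative only.

```python
import csv
import sqlite3

# Extract: read raw records from the hypothetical flat-file source.
with open("orders_export.csv", newline="") as f:
    raw_rows = list(csv.DictReader(f))

# Transform: apply rules -- normalize casing, cast types, drop invalid rows.
clean_rows = [
    (row["order_id"], row["customer"].strip().title(), float(row["amount"]))
    for row in raw_rows
    if row["amount"]  # rule: discard records with a missing amount
]

# Load: insert the transformed records into the target system.
target = sqlite3.connect("warehouse.db")
target.execute(
    "CREATE TABLE IF NOT EXISTS orders (order_id TEXT, customer TEXT, amount REAL)")
target.executemany("INSERT INTO orders VALUES (?, ?, ?)", clean_rows)
target.commit()
```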

Data Warehouses

A data warehouse is a central, enterprise-wide repository consisting of historical and current data. Data warehouses are heavily used by BI to run various analytical queries, and they usually interface with an OLAP system to support multi-dimensional analytical queries, as shown in Figure 4.4.

Figure 4.4 Batch jobs periodically load data into a data warehouse from operational systems like ERP, CRM and SCM.

Data pertaining to multiple business entities from different operational systems is periodically extracted, validated, transformed and consolidated into a single denormalized database. With periodic data imports from across the enterprise, the amount of data contained in a given data warehouse will continue to increase. Over time this leads to slower query response times for data analysis tasks. To resolve this shortcoming, data warehouses usually contain optimized databases, called analytical databases, to handle reporting and data analysis tasks. An analytical database can exist as a separate DBMS, as in the case of an OLAP database.

Data Marts

A data mart is a subset of the data stored in a data warehouse that typically belongs to a department, division, or specific line of business. Data warehouses can have multiple data marts. As shown in Figure 4.5, enterprise-wide data is collected and business entities are then extracted. Domain-specific entities are persisted into the data warehouse via an ETL process.

Figure 4.5 A data warehouse’s single version of “truth” is based on cleansed data, which is a prerequisite for accurate and error-free reports, as per the output shown on the right.
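
As a rough sketch of how such a subset might be carved out, the following continues the hypothetical warehouse from the ETL example above and persists a domain-specific extract into a department-level data mart; all table and file names are invented for illustration.

```python
import sqlite3

# Continues the hypothetical warehouse from the ETL sketch above; a data
# mart is carved out of it for a single department's reporting needs.
warehouse = sqlite3.connect("warehouse.db")
warehouse.executescript("""
    ATTACH DATABASE 'marketing_mart.db' AS mart;

    -- Persist the domain-specific subset: only the entities and measures
    -- this line of business actually reports on.
    CREATE TABLE IF NOT EXISTS mart.customer_spend AS
        SELECT customer, SUM(amount) AS total_spend
        FROM orders
        GROUP BY customer;
""")
warehouse.commit()
```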

Traditional BI

Traditional BI primarily utilizes descriptive and diagnostic analytics to provide information on historical and current events. It is not “intelligent” because it only provides answers to correctly formulated questions. Correctly formulating questions requires an understanding of business problems and issues and of the data itself. BI reports on different KPIs through:

• ad-hoc reports

• dashboards

Ad-hoc Reports

Ad-hoc reporting is a process that involves manually processing data to produce custom-made reports, as shown in Figure 4.6. The focus of an ad-hoc report is usually on a specific area of the business, such as marketing or supply chain management. The generated custom reports are detailed and often tabular in nature.

Figure 4.6 OLAP and OLTP data sources can be used by BI tools for both ad-hoc reporting and dashboards.

Dashboards

Dashboards provide a holistic view of key business areas. The information displayed on dashboards is generated at periodic intervals or in realtime or near-realtime. The presentation of data on dashboards is graphical in nature, using bar charts, pie charts and gauges, as shown in Figure 4.7.

Figure 4.7 BI tools use both OLAP and OLTP to display the information on dashboards.

As previously explained, data warehouses and data marts contain consolidated and validated information about enterprise-wide business entities. Traditional BI cannot function effectively without data marts because they contain the optimized and segregated data that BI requires for reporting purposes. Without data marts, data needs to be extracted from the data warehouse via an ETL process on an ad-hoc basis whenever a query needs to be run, which increases the time and effort required to execute queries and generate reports.

Traditional BI uses data warehouses and data marts for reporting and data analysis because they allow complex data analysis queries with multiple joins and aggregations to be issued, as shown in Figure 4.8.

Figure 4.8 An example of traditional BI.
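
The sketch below illustrates the kind of multi-join, aggregating query a BI tool might issue against a data mart organized as a star schema; the schema and data are hypothetical and kept to a few rows for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical star schema: one fact table plus two dimension tables.
conn.executescript("""
CREATE TABLE dim_store   (store_id INTEGER PRIMARY KEY, region TEXT);
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE fact_sales  (store_id INTEGER, product_id INTEGER, revenue REAL);

INSERT INTO dim_store   VALUES (1, 'East'), (2, 'West');
INSERT INTO dim_product VALUES (10, 'Grocery'), (20, 'Apparel');
INSERT INTO fact_sales  VALUES (1, 10, 120.0), (1, 20, 80.0),
                               (2, 10, 95.0),  (2, 20, 60.0);
""")

# A complex analysis query with multiple joins and aggregations:
# revenue rolled up by region and product category.
report = conn.execute("""
    SELECT s.region, p.category, SUM(f.revenue) AS total_revenue
    FROM fact_sales f
    JOIN dim_store s   ON f.store_id = s.store_id
    JOIN dim_product p ON f.product_id = p.product_id
    GROUP BY s.region, p.category
    ORDER BY s.region, p.category
""").fetchall()

for region, category, total in report:
    print(region, category, total)
```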

Big Data BI

Big Data BI builds upon traditional BI by acting on the cleansed, consolidated enterprise-wide data in the data warehouse and combining it with semi-structured and unstructured data sources. It incorporates both predictive and prescriptive analytics to facilitate the development of an enterprise-wide understanding of business performance.

While traditional BI analyses generally focus on individual business processes, Big Data BI analyses focus on multiple business processes simultaneously. This helps reveal patterns and anomalies across a broader scope within the enterprise. It also leads to data discovery by identifying insights and information that may have been previously absent or unknown.

Big Data BI requires the analysis of unstructured, semi-structured and structured data residing in the enterprise data warehouse. This requires a “next-generation” data warehouse that uses new features and technologies to store cleansed data originating from a variety of sources in a single uniform data format. The coupling of a traditional data warehouse with these new technologies results in a hybrid data warehouse, which acts as a uniform and central repository of structured, semi-structured and unstructured data that can provide Big Data BI tools with all of the required data. This eliminates the need for the BI tools to connect to multiple data sources to retrieve or access data. In Figure 4.9, a next-generation data warehouse establishes a standardized data access layer across a range of data sources.

Figure 4.9 A next-generation data warehouse.
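
A minimal sketch of what such a standardized data access layer might look like, returning one uniform tabular format for both a structured (relational) source and a semi-structured (JSON) source; the source registry, file names and load_source function are hypothetical.

```python
import json
import sqlite3
import pandas as pd

def load_source(source):
    """A minimal standardized access layer: every source, structured or
    semi-structured, is returned in one uniform tabular format."""
    if source["kind"] == "relational":          # structured (e.g. an ERP extract)
        with sqlite3.connect(source["path"]) as conn:
            return pd.read_sql_query(source["query"], conn)
    if source["kind"] == "json":                # semi-structured (e.g. clickstream)
        with open(source["path"]) as f:
            return pd.json_normalize(json.load(f))
    raise ValueError(f"unsupported source kind: {source['kind']}")

# Hypothetical source registry spanning different data formats.
sources = [
    {"kind": "relational", "path": "warehouse.db",
     "query": "SELECT * FROM orders"},
    {"kind": "json", "path": "web_events.json"},
]

frames = [load_source(s) for s in sources]  # one access path for all sources
```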

Traditional Data Visualization

Data visualization is a technique whereby analytical results are graphically communicated using elements like charts, maps, data grids, infographics and alerts. Graphically representing data can make it easier to understand reports, view trends and identify patterns.

Traditional data visualization provides mostly static charts and graphs in reports and dashboards, whereas contemporary data visualization tools are interactive and can provide both summarized and detailed views of data. They are designed to help people who lack statistical and/or mathematical skills to better understand analytical results without having to resort to spreadsheets.

Traditional data visualization tools query data from relational databases, OLAP systems, data warehouses and spreadsheets to present both descriptive and diagnostic analytics results.

Data Visualization for Big Data

Big Data solutions require data visualization tools that can seamlessly connect to structured, semi-structured and unstructured data sources and are further capable of handling millions of data records. Data visualization tools for Big Data solutions generally use in-memory analytical technologies that reduce the latency normally attributed to traditional, disk-based data visualization tools.

Advanced data visualization tools for Big Data solutions incorporate predictive and prescriptive data analytics and data transformation features. These tools eliminate the need for data pre-processing methods, such as ETL. The tools also provide the ability to directly connect to structured, semi-structured and unstructured data sources. As part of Big Data solutions, advanced data visualization tools can join structured and unstructured data that is kept in memory for fast data access. Queries and statistical formulas can then be applied as part of various data analysis tasks for viewing data in a user-friendly format, such as on a dashboard.

Common features of visualization tools used in Big Data solutions, each of which is illustrated in the sketch after this list:

Aggregation – provides a holistic and summarized view of data across multiple contexts

Drill-down – enables a detailed view of the data of interest by focusing in on a data subset from the summarized view

Filtering – helps focus on a particular set of data by filtering away the data that is not of immediate interest

Roll-up – groups data across multiple categories to show subtotals and totals

What-if analysis – enables multiple outcomes to be visualized by allowing related factors to be dynamically changed
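
The following sketch demonstrates these operations on a small, hypothetical dataset using pandas; real visualization tools expose the same operations interactively rather than as code.

```python
import pandas as pd

# Hypothetical sales records with a region/city hierarchy.
df = pd.DataFrame({
    "region":  ["East", "East", "West", "West"],
    "city":    ["Boston", "NYC", "LA", "Seattle"],
    "revenue": [100, 150, 90, 120],
})

# Aggregation: a summarized view across the whole dataset.
total = df["revenue"].sum()

# Roll-up: subtotals by region plus a grand total.
rollup = df.pivot_table(index="region", values="revenue",
                        aggfunc="sum", margins=True, margins_name="Total")

# Drill-down: from the regional summary into the underlying city detail.
east_detail = df[df["region"] == "East"][["city", "revenue"]]

# Filtering: keep only the data of immediate interest.
big_sales = df[df["revenue"] > 100]

# What-if analysis: recompute the view after dynamically changing a factor,
# here a hypothetical 10% price increase.
uplift = df.assign(revenue=df["revenue"] * 1.10)

print(total, rollup, east_detail, big_sales, uplift, sep="\n\n")
```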
