Identify the metrics you want to instrument (i.e., the metrics that will provide the source data for the analysis)
Normalize the metric data
Convert cumulative data to deltas
Union normalized metrics from multiple sources
Identifying the metrics you want to instrument is an obvious first step, but the choice of what data to use is almost limitless and therefore of no small consequence. In version 12c of Oracle, there are about 140 DBA_HIST views. When I began experimenting with this new process, I chose a handful of tables I wanted to incorporate. I chose tables from the AWR framework based on my experience with and familiarity with them. I chose metrics and tables I had found helpful, and indicative of performance issues, while I was using a traditional approach. As I expanded this effort, I continued to add tables and metrics. So far I have instrumented seven tables, which together include approximately 11,000 metrics. The collection of these metrics occupies approximately two million rows when the data collection is done in hourly increments over a week.
dba_hist_waitstat: cumulative stats on block contention, including CLASS, WAIT_COUNT, TIME
dba_hist_undostat
custom queries: e.g., DBA_ADVISOR% ; DBA_HIST_SQLSTAT ;... blocked sessions
My DOPA process includes Exadata and in-memory metrics in the analysis since they are included in one of the metric source tables (DBA_HIST_SYSSTAT) from which I am pulling data. When Oracle introduces new features such as Exadata [Exadata is Oracle's database machine platform, which has been available for many years] and database in-memory (introduced in Oracle 12c), it also introduces a ton of new and useful metrics. Suffice it to say that the DOPA process is flexible enough to automatically incorporate many of these new metrics right off the bat.
The following figures show some of the metrics collected to support these features.
Since the databases on which I work typically don’t have the Exadata or database in-memory licensed features, the returned value for these metrics is zero. However, if you are working on an Exadata machine or a database with the in-memory feature turned on and run the same code, there will be a result returned, and it will be incorporated into the analysis. By writing your code to always include these important metrics, you don’t have to worry about missing relevant observations, and you don’t have to rewrite code when machine licensing changes. When those metrics are available, they will be included in the analysis. When they aren’t available, the returned value will be zero and they will not inform the analysis.
Normalize the Data
As I related in the introduction to this section, while I was tuning using traditional methods, I found it useful to collect and pivot data out of the tables where they were found into a short, wide format for my performance tuning work. When I began trying to develop a new process for analysis, I found this short-wide data structure cumbersome to work with. I had to write separate code for each metric I wanted to include because each kind of measurement was a special case. This made scaling to thousands of metrics impossible. I realized that if I used the normalized data structure (key-value pair format—also described as a tall, skinny format) instead, I would have a common format for the data, and this would allow me to complete the analysis on all the metrics with just one SQL statement.
Normalization, therefore, is the second step in the process of data preparation. Normalizing the data structure involves putting the metric data into a format commonly referred to as key-value pairs. The normalized data structure looks like a tall, skinny table as opposed to the multicolumn table (MCT) that looks short and wide. Some of the AWR tables (e.g., dba_hist_sysmetric_summary and dba_hist_sys_time_model) already report the metrics they contain in a normalized view. The challenge then was to discover a way to convert the multitude of metric sources that are in a multicolumn table (MCT) structure to a normalized structure.
The process I use to normalize data involves logically “unpivoting” traditional short-wide structured data (MCT) into a tall, skinny structure of key-value pairs. In this format, each row is a key-value pair (KVP).
Figures 3-4, 3-5, and 3-6 show examples of data in both MCT and KVP formats. At the top of each figure, the data is displayed in the short-wide MCT format; below that, the same data is displayed in the tall, skinny KVP format.
Figure 3-4a shows the multicolumn table format of DBA_HIST_LATCH. A SQL “describe” can be used to see all the columns of DBA_HIST_LATCH.
Figure 3-5a shows the multicolumn table format of DBA_HIST_SYSTEM_EVENT. A SQL "describe" can be used to see all the columns of DBA_HIST_SYSTEM_EVENT. You may note that here again the values in the various columns are cumulative rather than being the value for just that particular snap_id.
Figure 3-6a shows the multicolumn table format of DBA_HIST_IOSTAT_FUNCTION. A SQL "describe" can be used to see all the columns of DBA_HIST_IOSTAT_FUNCTION. You may note that the values in the various columns are cumulative rather than being the value for just that particular snap_id.
The how-to for unpivoting the data to obtain a normalized structure is simple. I use a SQL statement to do this, taking advantage of the WITH clause subquery factoring capability of Oracle (an example code snippet is shown in the following text). When implemented, this SQL query converts the columns of data into rows, with each row having a metric name on the left and its value on the right. Following is example code for normalizing data.
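A minimal sketch of the unpivot, not the exact code used in the DOPA process: it converts two of the cumulative columns of DBA_HIST_IOSTAT_FUNCTION (one of the tables instrumented above) into key-value-pair rows using a WITH clause. Column names follow the Oracle 12c data dictionary; the metric-name labels are my own.

```sql
-- Unpivot a short-wide MCT source into tall, skinny KVP rows.
-- One UNION ALL branch per column being unpivoted; each branch
-- projects the same three columns: snap_id, metric name, value.
WITH iostat AS
 (SELECT snap_id, function_name,
         small_read_megabytes, large_read_megabytes
    FROM dba_hist_iostat_function)
SELECT snap_id,
       function_name || ': small_read_megabytes' AS metric_name,
       small_read_megabytes                      AS metric_value
  FROM iostat
UNION ALL
SELECT snap_id,
       function_name || ': large_read_megabytes',
       large_read_megabytes
  FROM iostat;
```

The same pattern extends to any number of columns: one UNION ALL branch per column. (Oracle also offers a built-in UNPIVOT operator that achieves the same result more compactly.)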
The reason I place a significant emphasis on normalizing data is that it is essential for enabling you to instrument large quantities of metrics. Since the motivation for building a better process is to obtain a more complete analysis, it’s important to use all the metrics that might inform your analysis. Although the normalization process is essential, it is not complicated. In fact, it is very simple and easy to implement. Once you understand how to code to normalize data, the door opens for you to instrument any metric. Later in this chapter, I explain how to implement custom queries to enable you to use any data source.
Some metrics are already persisted in the key-value pair (KVP) format and so do not need to go through the normalization step. For example, the metrics found in DBA_HIST_SYS_TIME_MODEL, %OSSTAT, %SYSMETRIC_SUMMARY, and %SYSSTAT already use the KVP format.
As you can see, the key-value pair data structure can easily be implemented in a relational database. It is regularly used by computer scientists, and I have made use of it many times over the years. The usefulness of this structure can be inferred from its ever-increasing use in the field of analytics. Many recently developed systems and tools favor this format over a more traditional structure: NoSQL databases such as MongoDB and Redis, as well as the JSON and XML document formats. I expect this list will grow as developers comprehend and take hold of the power of big data techniques.
Convert to Delta Values
Let’s say, for example, you had a calorie monitor that measured your cumulative calorie usage and reported this every hour. In a certain use case, that would be an important metric; however, you would probably also want to know how many calories you used in a particular hour (i.e., the “delta value”). In the case of Oracle performance metrics, Oracle provides many of the metrics as cumulative values, but we usually want to know the “delta value,” that is, the value for that metric in a particular AWR snapshot period (usually one hour). So, to make the metric more useful, I convert the cumulative values to delta values.
DBA_HIST_SYSMETRIC_SUMMARY.METRIC_NAME = “Response Time Per Txn”
Example average value: 220 centiseconds per Txn
- Average value is for the interval marked by
BEGIN_TIME: 5/15/2018 01:00:00
END_TIME: 5/15/2018 02:00:00
However, for a certain set of metrics which Oracle refers to as statistics, a cumulative value since instance start-up is given at each interval. The values for these metrics increase over time. The metrics persisted in DBA_HIST_SYS_TIME_MODEL, %SYSSTAT, %OSSTAT, %LATCH, %SYSTEM_EVENT, and %IOSTAT_FUNCTION are examples of this type of metric. A close examination of the data in the listings in Figures 3-4, 3-5, and 3-6 also shows the cumulative nature of the values collected by Oracle for the metric. Data that is presented as a cumulative value requires some manipulation in order to be used in the DOPA process for analysis.
The manipulation I do involves converting the cumulative data into deltas. Logically, this makes sense because what we are concerned with for performance issues is the change from one interval to the next as opposed to the cumulative value since instance start-up. So in this step, I’m preparing the data to show the values for the metric in that single snapshot interval (i.e., the “delta”) rather than the cumulative value which is recorded by Oracle. These deltas can be calculated using Oracle’s LAG function [for more details on the LAG function, refer to the Oracle documentation; see the following text for an example of LAG in this context].
Figure 3-8 shows an example of using the LAG function to convert raw data that is cumulative in nature to deltas.
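As a sketch of the technique (not the exact code behind Figure 3-8), the following query computes per-snapshot deltas over DBA_HIST_SYSSTAT, partitioning by database, instance, and statistic so that each metric's cumulative counter is differenced only against its own prior snapshot:

```sql
-- Convert cumulative statistic values into per-snapshot deltas.
-- LAG(value) fetches the value from the previous snap_id within
-- the same dbid/instance/statistic partition; the first row of each
-- partition has no predecessor, so its delta is NULL.
SELECT snap_id,
       stat_name,
       value - LAG(value)
                 OVER (PARTITION BY dbid, instance_number, stat_id
                       ORDER BY snap_id) AS delta_value
  FROM dba_hist_sysstat
 ORDER BY stat_name, snap_id;
```

One caveat: an instance restart resets the cumulative counters, which shows up as a negative delta; in practice you would filter or null out negative deltas before analysis.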
Union Data from All Sources
Once I have the normalized key-value pair format of the metrics from several different sources, I union them together to form one big virtual data set.
Figure 3-9 shows an example of normalized data that has been unioned for analysis.
The union of the normalized metrics is what allows me to scale my analysis to include the thousands of metrics available. Once the data is unioned, it is a simple task to tell the "machine" to apply the same analysis to all the KVP rows.
Up to this point, I have not moved any data, but the data is perceived logically by the Oracle query engine as one giant metric table. I can continue to add metrics. Since operations on the data are performed inside the database and data is not being moved around, it is not a resource-consuming process. Even with the thousands of metrics I've implemented so far, I have not come close to hitting the machine's capacity limits. The analysis I run using data from the seven tables I have implemented so far includes two million rows of metrics and takes approximately 30 seconds to 2½ minutes. Theoretically, at least, you should be able to add all the metrics you want without encountering difficulties.
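The shape of that one virtual metric table can be sketched as a single UNION ALL query. The branches below are illustrative: each projects the same three columns (snap_id, metric name, value), and in practice the cumulative sources would first be wrapped in delta-converting inline views, which are elided here.

```sql
-- Sketch: one virtual KVP data set built by unioning normalized sources.
-- Every branch must project the same column shape so the downstream
-- analysis can treat all rows uniformly.
SELECT snap_id, metric_name, average AS metric_value
  FROM dba_hist_sysmetric_summary       -- already KVP, already per-interval
UNION ALL
SELECT snap_id, stat_name, value
  FROM dba_hist_sys_time_model          -- KVP, but cumulative (needs deltas)
UNION ALL
SELECT snap_id, stat_name, value
  FROM dba_hist_osstat;                 -- KVP, but cumulative (needs deltas)
```

Because this is just a view over the source tables, no data is copied; the union is evaluated in place by the query engine.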
Custom Queries
As mentioned above, you are not limited to the data collected by AWR for your analysis. You can use the basic process I have explained to capture data from any source that can be formatted as a time-series KVP metric. A custom query will enable you to instrument data retrieved from sources such as dba_advisor tables and active session history results, as well as many other sources.
For example, you could use this process to instrument log file switching, which is a time-based metric persisted in v$log_history. You would begin by aggregating the switch events into hourly counts.
The last step would be to normalize this table into the standard KVP format, with a column for time, a column for metric name (log file switches per hour), and a third column for value.
Figure 3-11 shows v$log_history in key-value pair format. I’ve joined v$log_history with DBA_HIST_SNAPSHOT (matching the time values) so that I can get the snap_ids just like all the AWR metrics have. To use this in the DOPA process, one would just need to union this in with the other metrics using the same method being used for normal AWR metrics.
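A hypothetical sketch of that custom query, assuming the join and metric name described above: it counts log switches inside each AWR snapshot window so the result carries a snap_id, just like the native AWR metrics.

```sql
-- Count log file switches per AWR snapshot interval, emitted as KVP rows.
-- The LEFT JOIN keeps intervals with zero switches in the result.
SELECT s.snap_id,
       'log file switches per hour' AS metric_name,
       COUNT(h.first_time)          AS metric_value
  FROM dba_hist_snapshot s
  LEFT JOIN v$log_history h
         ON h.first_time >= s.begin_interval_time
        AND h.first_time <  s.end_interval_time
 GROUP BY s.snap_id
 ORDER BY s.snap_id;
```

The resulting rows can then be unioned into the virtual metric set exactly like any other normalized source.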
Not all metrics have a data pattern that makes it easy to convert to the KVP format. Histograms are a data source that I would put in this category. I have not spent time to discover a method for implementing this type of data, but I am sure there is a way. I welcome any input from others who have a desire to work on this particular aspect.
Summary
Hopefully, this discussion has enabled you to understand the usefulness of adopting the KVP/normalized data structure. Data prep is an easy step, more or less a housekeeping step, but it is essential for scalability. Only when the data is formatted into a single, large, virtual data set does it become possible to include the vast quantity of data that we want to include in the analysis. Normalized data can be manipulated with a single SQL statement in situ, and therefore it is possible to include a vast number of metrics. It is the quantity of metrics that assures us that our analysis is fully informed and therefore accurate.
This discussion should also have convinced you that the process of converting MCT structures to this normalized structure is easily accomplished. Normalization is accomplished using SQL, the language of the database. It is a language we are all familiar with and use every day.
The code snippets I’ve included will provide the example you need to be able to select and implement your own set of metrics. Your choice of metrics will depend on your organization’s goals and objectives.
I listed in the preceding text the metrics I have included to date. I plan to continue to add metrics as I am able because I believe having more metrics adds depth of understanding. My goal is to instrument every source of metrics that could be useful for performance tuning; any metric stored within Oracle is a potential candidate. Of course, adding metrics takes time, so I will prioritize.
Oracle continues to add new features to their product. Each time they do, they add a metric by which its performance can be measured. The analysis I perform with the metrics I have implemented to date takes approximately one minute to run. With this analysis, I have not come anywhere close to maximizing the machine’s capability. Theoretically, it should be possible to include all of the metrics in the analysis, but I don’t know yet if this is true. I also don’t know if it is necessary. I am still in the discovery phase.
When using a traditional approach, it had always plagued me that my finite ability to process information was forcing me to limit my analysis to a few select metrics and ignore so many others. I believed I must be missing out on some valuable information. With the DOPA process, the machine does the analysis and pulls out for my attention only the metrics most likely to predict a problem, so the number of metrics that can be included in the analysis is vastly expanded while the speed and accuracy of the analysis are enhanced.
The next chapter will discuss the first part of that analysis, statistical analysis.