© Roger Cornejo 2018
Roger Cornejo, Dynamic Oracle Performance Analytics, https://doi.org/10.1007/978-1-4842-4137-0_10

10. Further Enhancements

Roger Cornejo, Durham, NC, USA

Introductory Statement

As I have stated repeatedly throughout this work, the DOPA process is something I have been developing over a period of time. I consider it an effective tool as is, but there are many enhancements and further applications which I would like to explore as time allows. In this chapter I discuss further enhancements/possible future work in a general way. I hope these brief comments will stimulate the reader to experiment with these ideas as well. If so, I would love to hear about the results of your exploration.

Metrics Analysis

The DOPA process as currently coded uses a statistical process whereby outliers are removed prior to calculating normal ranges. The metrics are then compared to the normal range to determine which metric values should be flagged, and there is some ability to alter the sensitivity of the process. An alternate method for determining which values should be flagged, one often used in monitoring, is to use percentiles. In this scenario, each metric value would be evaluated against a percentile computed from the high end of the observed range and flagged when it exceeds the stated percentile threshold. The threshold could be adjusted to create a greater (lower percentile) or lesser (higher percentile) degree of sensitivity, and because the threshold is recalculated with each iteration, it is a dynamic threshold.

The PERCENTILE_CONT analytical function in Oracle could be leveraged to accomplish this mechanism for flagging. I believe this would be an enhancement to DOPA that could be accomplished easily. Although I do not have experience with other RDBMSs, I tend to think they would have a corresponding function for this task. The percentile threshold method may be particularly advantageous for metrics whose distribution is not normal. I am aware that metrics fitting this description are part of the analysis used by DOPA, but I haven’t had the opportunity to investigate how many are normal vs. nonnormal in their distribution.
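
As a rough illustration of the idea (my own sketch, not part of the current DOPA code), the following query flags values of each metric that exceed the 95th percentile of that metric's history in dba_hist_sysmetric_summary; the 0.95 literal is an assumption and could itself be a bind variable used to tune sensitivity:
-- Sketch only: percentile-based flagging with an assumed 95th percentile threshold
with pct as (
select metric_name
, percentile_cont(0.95) within group (order by average) pct95
from dba_hist_sysmetric_summary
group by metric_name
)
select s.snap_id, s.metric_name, s.average, p.pct95
from dba_hist_sysmetric_summary s
join pct p on p.metric_name = s.metric_name
where s.average > p.pct95   -- flag: value above the dynamic threshold
order by s.metric_name, s.snap_id;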

Ultimately, I believe that a meta-analysis/comparison of predictive models, one built using the normal distribution and one using the percentile threshold, could be accomplished via machine learning to ascertain which method yields the most accurate model. I’ll discuss this more thoroughly later in this chapter when I discuss machine learning.

Implementing Other Sources/Customized Metrics

So far in this book, I have talked almost exclusively about using the AWR metrics for the DOPA process, and I mentioned the possibility of customized metrics. Although my work focuses on the performance of the Oracle database, the DOPA process concept can be applied not only to other areas within the infrastructure but even totally different applications, essentially any set of instrumented metrics.

Following are some ideas about how you can leverage data from other sources to create customized metrics.

The basic idea is to union in other SQL, including SQL with calculated values. Almost any queryable source can supply metrics, including:
  • Active session history

  • Regressed SQL: dba_hist_sqlstat

  • Advisor analysis: DBA_HIST_ADVISOR queries for recommended action

  • Application of this approach to V$ tables and StatsPack

  • Application of this approach to other metrics collection sources in the infrastructure

  • Application of this approach to other database management systems such as Postgres or MongoDB

Active Session History

To instrument the active session history and incorporate the results with other AWR metrics, I would preprocess the data so that the sample intervals match the sample intervals of the other metrics (i.e., the data prep stage). For example, to properly instrument v$active_session_history, one would need to know that there is one sample every second [=> a sample approximates one second of DB time], and this should be aggregated to the snapshot interval (in most cases one hour). Likewise, to instrument DBA_HIST_ACTIVE_SESS_HISTORY, one would need to know that there is one sample every ten seconds [=> a sample approximates ten seconds of DB time]; again, this would be aggregated to the snapshot interval. Because DBA_HIST_ACTIVE_SESS_HISTORY carries the “snap_id,” it is quite easy to obtain the DB time for a SQL statement in a snapshot interval; in the simple case, the DB time for a sql_id is:
-- DB time (in seconds) per SQL statement per snapshot interval;
-- each DBA_HIST_ACTIVE_SESS_HISTORY sample represents roughly ten seconds of DB time
select snap_id, 'sql_id: ' || sql_id || ' - DB Time' metric_name
, count(*)*10 average
, 'dba_hist_active_sess_history' stat_source
, null metric_id
from DBA_HIST_ACTIVE_SESS_HISTORY
where 1=1
  and sql_id is not null
group by snap_id, SQL_ID

Likewise, one could provide more details in the custom metric by adjusting the group by, for example, “group by snap_id, sql_id, event”.
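
As a sketch of that variant (my own illustration, assuming the same ten-second sample weighting, with sessions on CPU reported under an “ON CPU” pseudo-event):
-- DB time per sql_id broken out by wait event within each snapshot
select snap_id
, 'sql_id: ' || sql_id || ' - ' || nvl(event, 'ON CPU') || ' - DB Time' metric_name
, count(*)*10 average
, 'dba_hist_active_sess_history' stat_source
, null metric_id
from DBA_HIST_ACTIVE_SESS_HISTORY
where 1=1
  and sql_id is not null
group by snap_id, SQL_ID, event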

Regressed SQL: dba_hist_sqlstat

Adding normalized metrics at the SQLSTAT level to the DOPA process would be another possibility. Again, the process is much the same as for other metrics. I prototyped this instrumentation by unpivoting the multicolumn table format of dba_hist_sqlstat. Following the pattern described in the “Normalize the Data” section of Chapter 3, Data Preparation, the following SQL pseudocode (i.e., this is not the full implementation) shows what the code section would look like when creating the unpivoted format of the SQL statistics metrics of interest:
select a.snap_id, a.force_matching_signature metric_id
, parsing_schema_name || ': ' || sql_id ||': ' || 'executions'       metric_name
, executions_delta       delta_value
, a.dbid, a.instance_number
from dba_hist_sqlstat  a
union
select a.snap_id, a.force_matching_signature metric_id
, parsing_schema_name || ': ' || sql_id ||': ' || 'fetches'          metric_name
, fetches_delta          delta_value
, a.dbid, a.instance_number
from dba_hist_sqlstat  a
union
select a.snap_id, a.force_matching_signature metric_id
, parsing_schema_name || ': ' || sql_id ||': ' || 'px_servers_execs' metric_name
, px_servers_execs_delta delta_value
, a.dbid, a.instance_number
from dba_hist_sqlstat a
union
...

Once you have created a normalized data set as in the preceding section, you would union that normalized data set in as per the pattern described in the “Union Data from All Sources” section of Chapter 3. Once the metric source is unioned in, you would then perform the typical DOPA process analysis on those dynamically created metrics.

The table in Figure 10-1 shows selected output from running the prototype SQL code [the prototype code is not provided, as it is not yet fully incorporated into the DOPA process]:
Figure 10-1

Example output from prototype instrumenting Regressed SQL using dba_hist_sqlstat

So far, I’ve only done this in prototype, and the results look interesting. I am keen to fully factor this new capability into the DOPA process.

Advisor Analysis: DBA_ADVISOR% Queries for Recommended Action

I have been exploring the content of the Advisor framework (in particular, the persisted data in the DBA_ADVISOR% tables) for some time now. I’ve become very familiar with the content and where to go for what kind of data. To me, the most promising starting point is the ADDM Advisor which has hourly runs that match the snapshot intervals. Essentially what I would do is unpivot the MCT structure into the IRP format and then union it in with the other DOPA instrumented sources. Since these results are more along the lines of actionable tuning recommendations, I would probably just include them at the top of the Metrics Aggregate View, so you could quickly see what tuning advice was made by ADDM. I have not yet prototyped this approach, but I have the basic queries to extract parameter change recommendations, SQL Tuning Advisor recommendations, and Segment Tuning Advisor recommendations.
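
As a starting point, a minimal sketch of pulling ADDM findings out of the Advisor framework might look like the following (this is my own illustrative query, not the extraction queries mentioned earlier):
-- recent ADDM findings with their estimated impact
select t.task_name, t.execution_start, f.type, f.impact, f.message
from dba_advisor_tasks t
join dba_advisor_findings f on f.task_id = t.task_id
where t.advisor_name = 'ADDM'
order by t.execution_start desc, f.impact desc nulls last;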

Application of This Approach to V$ Tables and StatsPack

For real-time analysis, the AWR tables are not the best choice since you only get rolled-up data in the DBA_HIST% views every hour (or every snapshot interval). Instead, I would find a way to instrument select v$ views and use those as the source of the metrics for the DOPA process analysis. StatsPack views could also be used as data sources for the DOPA process if one does not have access to AWR.
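
As a simple illustration (my assumption of a starting point, not a finished instrumentation), the current values of the system metrics can be read from v$sysmetric; group_id = 2 is the 60-second system metric group:
-- real-time metric values from the 60-second system metric group
select begin_time, end_time, metric_name, value, metric_unit
from v$sysmetric
where group_id = 2
order by metric_name;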

Application of This Approach to Other Metrics Collection Sources in the Infrastructure

The fact that the DOPA process is currently instrumented for metrics coming from AWR should not be considered a limitation of the approach. Rather, DOPA’s analytics-based approach would work for any metric instrumented across the infrastructure from other sources. Essentially, if you can get metrics into Oracle, you can instrument them into the DOPA process. Typically, DBAs don’t have the remit, authority, or tools that cut across the infrastructure, but imagine if the operating system, network, storage, app server, and even application metrics were instrumented in such a way that you could talk SQL to them. It would then be a simple exercise to normalize them and union them into the DOPA process; this would provide a great deal of fodder for analysis. With better infrastructure visibility, solving performance issues related to the database would be much easier.

Application of This Approach to Other Database Management Systems Such As Postgres or MongoDB

I’ve talked to many DBAs who work with MongoDB and Postgres; these databases also collect performance metrics which can be queried. Essentially, if the metric exists and measures something relevant, it can be used for problem-solving in the DOPA process.

Graphs

I am currently using three views in the DOPA process: the Metrics Aggregate View, the Metrics Time-Series View, and the Category Count View. I frequently manually graph data from the Metrics Time-Series View and the Category Count View. I think it would be useful to be able to generate graphs easily to help with the analysis.

Time-Series Graphs

Although I am currently producing a time-series graph that I use in my analysis, the process by which I generate this graph is clunky and time-consuming. It involves several steps: First I look at the results of running the Metrics Aggregate View and select a metric of interest, and then I use the metric name in a run of the Metrics Time-Series View (changing some of the parameters so that I can see all the data). Running the Metrics Time-Series View produces a table of data that I save off to Excel. I go into Excel and choose the columns I want to graph and tweak the size/fonts to obtain the finished graph. This can take several minutes.

This has been my method to date because Excel is accessible and fairly simple to use, and I haven’t had the time to develop a more streamlined process. I know other tools for graphing exist and could probably be integrated with the DOPA process to produce quality graphs in much less time, graphs that would also be more easily modifiable. Google Charts is one option that I would consider. Ultimately, I would like to have a regular interface to the graphs so that a graph could be generated with just a couple of clicks.

Metric Frequency Distributions

A histogram representation of metric value frequency distributions is another enhancement that I’d like to add to the DOPA process. I have prototyped a Metrics Frequency Distribution View going against dba_hist_sysmetric_summary. With this prototype I generate a table of data that could be graphed to represent metric frequency distributions visually. In this view, metric values are sorted into “buckets,” each representing a specified range of metric values. By adding a few key calculations to the DOPA code, I could use the built-in WIDTH_BUCKET Oracle function and group by the metric name and the “buckets.”

Prototype Metrics Frequency Distribution View
select metric_name, bucket
, count(bucket) num_values_in_bucket
, avg(average) average_value_in_bucket
from (
-- assign each metric value to one of :num_buckets equal-width buckets
-- spanning the observed min/max of the metric
select metric_name
, metric_unit
, average
, width_bucket(average, (min(average) over ()), (max(average) over()), nvl(:num_buckets, 100)) bucket
from dba_hist_sysmetric_summary
where 1=1
  and (dbid, instance_number) =
(select d.dbid, i.instance_number from v$database d, v$instance i)
  and     end_time   between sysdate - 30 and sysdate
  and     metric_name       = :metric_name
  and snap_id between nvl(:start_snap_id, snap_id) and nvl(:end_snap_id, snap_id)
  order by end_time
)
group by metric_name, bucket
order by metric_name, bucket
;

Note on bind variables:

I run the code through Toad, which prompts me for the bind variable values; I suspect SQL Developer behaves the same way. SQL*Plus is different, so one should refer to the SQL*Plus documentation for how to populate bind variables using that tool (a minimal sketch follows this list). As far as the values go, follow the description/advice below:
  • :num_buckets defaults to 100 via nvl(:num_buckets, 100) if you don’t supply a value.

  • :metric_name would be any metric name from dba_hist_sysmetric_summary, for example, “Response Time Per Txn,” “Physical Read Total Bytes Per Sec,” “Current Open Cursors Count,” and so on.

  • :start_snap_id and :end_snap_id would be the range of snapshots that you are interested in; by default, if you leave these bind variables null, it will look at all the data in dba_hist_sysmetric_summary.
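
For completeness, a minimal SQL*Plus sketch for populating the bind variables might look like the following (the script name is hypothetical; adjust to your environment):
VARIABLE num_buckets   NUMBER
VARIABLE metric_name   VARCHAR2(64)
VARIABLE start_snap_id NUMBER
VARIABLE end_snap_id   NUMBER
EXEC :num_buckets := 50
EXEC :metric_name := 'Response Time Per Txn'
-- :start_snap_id and :end_snap_id are left NULL to include all snapshots
@metric_frequency_distribution.sql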

A summary of the number of occurrences for each bucket could be represented using a histogram (see Figure 10-2). A histogram is very helpful for visualizing the normal occurrence of metrics; it helps you to understand what is going on with the database. I produced the histograms using Excel against output from running the query in the preceding section for the specific metric name.
Figure 10-2

Example histogram: Response Time Per Txn

Once again, I have been able to produce histograms, but the method by which I do so is clunky.

Adding a graphing capability to the DOPA process would make these tasks much more efficient.

Furthermore, since I’ve already normalized and unioned the metric data, I have access to all of the metrics in one location, so adding a Metrics Frequency Distribution View is not that difficult. If you tried to accomplish this task when the metrics were in separate tables, you would need several different pieces of code to accomplish the same task.

Many of the metrics I’ve generated histograms for have long tails on the high range of the metric and a big hump (or in this case two humps) in the lower range of the metric. By understanding the frequency distribution of the metric, we can project beyond the raw metric data at hand and make inferences about the metric in general even though we don’t practically have access to all the data that comprises the true distribution.
Figure 10-3

Example histogram: Physical Read Total Bytes Per Sec

Another interesting perspective this gives you is the frequency distribution for the same metric across different databases. The following graphs (Figures 10-4 and 10-5) are for the Current Open Cursors Count on two different databases. As you can see, the distribution is specific to the database and date/time range when the values were collected.
Figure 10-4

Example histogram—Database A: Current Open Cursors Count

Figure 10-5

Example histogram—Database B: Current Open Cursors Count

Ranking

Given the enormous number of metrics available in AWR (as well as other sources), it is quite reasonable that individuals have gravitated toward small model approaches that are more easily comprehended. Using the DOPA process allows us to examine a much larger cross section of data, but even with DOPA it is sometimes necessary to prioritize. Here again, the propensity to pay more attention to the metrics with which we are comfortable is a real temptation. While experimenting/developing the DOPA process, I preferentially viewed my “favorite” metrics while simultaneously considering the other flagged metrics. This enabled me to see correlations with other metrics and gave me a better understanding of what was happening inside the database.

It would be a fairly simple task to manually assign a priority ranking to various metrics by including a column in the taxonomy by which the metrics could be sorted, so that when showing flagged metrics, the ones with the highest priority would be shown first.
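
A minimal sketch of what I have in mind follows; the table and column names here (metric_taxonomy, priority_rank, flagged_metrics, flag_ratio) are hypothetical placeholders for illustration, not the names used in the DOPA code:
-- show flagged metrics ordered by a manually assigned priority rank,
-- with unranked metrics falling to the bottom
select t.priority_rank, f.metric_name, f.flag_ratio
from flagged_metrics f
join metric_taxonomy t on t.metric_name = f.metric_name
order by nvl(t.priority_rank, 999), f.flag_ratio desc;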

Note

I have a thought that machine learning could be used to assign rankings rather than manually assigning them.

The idea of manually assigning a ranking/priority to various metrics might be seen as a move back toward the small model approach. I don’t think that prioritizing is bad; I view it as emphasizing well-known/actionable metrics. Having said that, I believe there should be an objective method for bumping certain items up and ranking them as more important.

This leads me to consider another extension of DOPA that I think would be very useful—establishing correlations.

Correlation Analysis

Database professionals already recognize the existence of certain correlations. For example, often when Average Active Sessions goes up, Host CPU Utilization goes up as well; or perhaps you have noticed that when Host CPU Utilization is up, so is single-block read latency.

Since DOPA is already being used for statistical analysis, it is not a stretch to think that we might be able to extend its usefulness to methodically finding correlations between and among the many metrics, not generally, but in the specific performance scenario at hand. The correlation analysis could easily be accomplished using the correlation functions within Oracle.

The result of the CORR function in Oracle is a correlation coefficient ranging in value from –1 to 1. A correlation coefficient of 1 means that the two variables perfectly correlate (when one goes up, the other goes up, and when one goes down, the other goes down), so the closer to 1 a correlation coefficient is, the more correlated the two variables are. A correlation coefficient of 0 means no correlation, and a correlation coefficient of –1 indicates that the relationship is inverse (when one goes up, the other goes down and vice versa).

Note

Oracle’s aggregate function, CORR (var1, var2), takes two numeric variables as input and calculates the Pearson’s correlation coefficient.
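
For reference, the Pearson’s correlation coefficient that CORR computes for paired observations (x_i, y_i) is defined as:
$$ r = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i - \bar{y})^2}} $$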

The fact that the DOPA process uses a normalized item result pair structure for all its metrics means that just one SQL statement is required to correlate all metrics to all metrics by joining the normalized metrics to itself based on the time the data was collected (snap_id). Of course, this generates a massive Cartesian product, so one might want to constrain the analysis to select metric names.
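
A rough sketch of that single statement follows; “normalized_metrics” is a placeholder name I am using for the unioned, normalized metric set, not an object that ships with DOPA:
-- pairwise correlation of every metric against every other metric
select m1.metric_name metric_name_1
, m2.metric_name metric_name_2
, round(corr(m1.average, m2.average), 7) "Pearson's Correlation"
from normalized_metrics m1
join normalized_metrics m2
  on m2.snap_id = m1.snap_id
 and m2.metric_name > m1.metric_name   -- each pair once; no self-pairs
group by m1.metric_name, m2.metric_name
having abs(corr(m1.average, m2.average)) >= 0.75
order by 3 desc;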

To test this idea, I did the following:

I suppose you could say I have a habit of developing performance tuning code in the “heat of battle.” I was notified of a performance issue with a database where application functions hitting the database simply froze. Initial investigation showed 100% CPU utilization for a long duration accompanied by the “cursor: pin S” wait event. The metrics for both “Host CPU Utilization (%)” and “Average Active Sessions” were extremely high during the same interval. Curious about how other metrics correlated to CPU, I created a prototype of a Metrics Correlation View (the “CORR SQL” in Figure 10-6 uses dba_hist_sysmetric_summary) to see what the correlation factor was between “Host CPU Usage Per Sec” and all the other metrics. Interestingly, I found some metrics that were better correlated than “Average Active Sessions.”

Note

A session waits on “cursor: pin S” when it wants to update a shared mutex pin, and another session is currently in the process of updating a shared mutex pin for the same cursor object.

The following SQL code is what I used to prototype a “Metrics Correlation View.” This SQL statement uses Oracle’s CORR function against dba_hist_sysmetric_summary. It performs a pairwise correlation analysis for all metrics defined in the set “metric_set_1” against all the metrics defined in the set “metric_set_2.”
with snaps_for_interval as
(select distinct snap_id , begin_interval_time, end_interval_time
from dba_hist_snapshot
where trunc(begin_interval_time, 'HH')
between trunc(to_date(:intrvl_st_MM_DD_YYYY_HH24_MI, 'MM_DD_YYYY_HH24_MI') , 'HH')
    and trunc(to_date(:intrvl_end_MM_DD_YYYY_HH24_MI, 'MM_DD_YYYY_HH24_MI') , 'HH'))
, metric_set_1 as
(select snap_id, metric_name, average, maxval
from dba_hist_sysmetric_summary natural join dba_hist_snapshot
where upper(metric_name) like upper(nvl(:metric_name1, metric_name))
  and snap_id in (select snap_id from snaps_for_interval))
, metric_set_2 as
(select snap_id, metric_name, average, maxval
from dba_hist_sysmetric_summary natural join dba_hist_snapshot
where upper(metric_name) like upper(nvl(:metric_name2, metric_name))
  and snap_id in (select snap_id from snaps_for_interval))
/* main select CORR function */
select s1.metric_name metric_name_1
, s2.metric_name metric_name_2
, round(CORR(s1.average, s2.average), 7) AS "Pearson's Correlation"
from metric_set_1 s1, metric_set_2 s2
where s1.snap_id = s2.snap_id
  and s1.METRIC_NAME <> s2.metric_name
group by s1.METRIC_NAME, s2.metric_name
having CORR(s1.average, s2.average) is not null
   and (CORR(s1.average, s2.average) >= .75
     or CORR(s1.average, s2.average) <= -.75)
order by 3 desc
;
Note on bind variables for the preceding SQL:
  • :intrvl_st_MM_DD_YYYY_HH24_MI: provide an interval start date/time in the format: ‘MM_DD_YYYY_HH24_MI’

  • :intrvl_end_MM_DD_YYYY_HH24_MI: provide an interval end date/time using the format: ‘MM_DD_YYYY_HH24_MI’

  • :metric_name1: use a full metric name from dba_hist_sysmetric_summary or wildcard it with “%” (or any other wildcarded substring)

  • :metric_name2: same description as with :metric_name1

Figure 10-6 shows tabular results of running the prototype of a “Metrics Correlation View,” showing the pairwise comparisons of “Host CPU Usage Per Sec” compared to all the other metrics [:metric_name1 set to “Host CPU Usage Per Sec,” :metric_name2 wildcarded using “%” for a specific interval].
Figure 10-6

Tabular results of running the prototype of a “Metrics Correlation View”

In reviewing the table (Figure 10-6), one can see that “Host CPU Usage Per Sec” is almost perfectly correlated with “Host CPU Utilization (%)” and “CPU Usage Per Sec.” These correlations are not very interesting since they are pretty much measuring the same thing. The third highest correlation (“Host CPU Usage Per Sec” correlated to “Open Cursors Per Sec,” with correlation coefficient ~ 0.99) is interesting though since the “cursor: pin S” wait event has to do with SQL processing (cursors are used to process SQL). Further, I discovered that the “Average Active Sessions” was not as well correlated to “Host CPU Usage Per Sec” as I thought it would be, with a correlation coefficient of ~ 0.81.

My current thinking for extending DOPA for correlation analysis is that the simple pairwise correlation analysis could be another view, say, the Metrics Correlation View, that produces a table of the pairwise correlation analysis. Such a view added to the set of DOPA model generating views should yield a higher-quality analysis of the performance problems and contribute to improvement in model accuracy. In addition, I could add a Metric Correlation Time-Series View, the output of which could be used to produce a graph of the pairwise correlated metrics on the same graph. The following graph shows an example of pairwise correlated metrics on the same graph for Host CPU Usage Per Sec compared to Open Cursors Per Sec, generated for the problem described in the preceding section.
Figure 10-7

Metrics Correlation Time-Series graph example
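
A time-series extract such as the following could feed that kind of graph (a sketch I put together against dba_hist_sysmetric_summary; it is not the actual Metric Correlation Time-Series View):
-- side-by-side time series for two correlated metrics, ready to graph
select s.snap_id, sn.end_interval_time
, max(case when s.metric_name = 'Host CPU Usage Per Sec' then s.average end) host_cpu_usage_per_sec
, max(case when s.metric_name = 'Open Cursors Per Sec'   then s.average end) open_cursors_per_sec
from dba_hist_sysmetric_summary s
join dba_hist_snapshot sn
  on sn.snap_id = s.snap_id
 and sn.dbid = s.dbid
 and sn.instance_number = s.instance_number
where s.metric_name in ('Host CPU Usage Per Sec', 'Open Cursors Per Sec')
group by s.snap_id, sn.end_interval_time
order by s.snap_id;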

In the preceding problem case study, the DOPA process was used to aid in the analysis of the problem (which turned out to be a flaw in the application which ran a particular SQL statement 800 million times in a day, when it would normally run around 10 million times). Further, the amount of SQL code needed to provide the prototypes of Metrics Correlation and the Metric Correlation Time-Series Views was not that much, and the impact of the additional analysis was significant in developing an understanding of the performance problem at hand. I am confident that adding correlation analysis to the DOPA process will be a fruitful endeavor.

Further efforts at finding correlations and assigning ranking might be accomplished via machine learning, which I will discuss next.

Machine Learning

Machine learning is used in so many applications and across so many industries; I think it is safe to say that this capability is no longer state of the art, but rather state of the practice. Machine learning is a type of artificial intelligence that analyzes data to identify patterns and produce analytical models. Through an iterative process in which the models are exposed to new data, they are enhanced without human intervention, hence the term “machine learning.” The increased capacity of machines to collect, store, and process vast amounts of data quickly has accelerated this field of study.

DOPA is powerful because of the statistical analysis it performs on the data. It relies on several machine learning techniques to produce a very accurate descriptive correlation analysis that can be used for problem-solving. It is not, however, in the purest sense, a machine learning tool since it does not “learn” and extend the model to scenarios it has not previously encountered.

I think it is possible and would likely be very profitable to extend DOPA so that it could perform true machine learning. Furthermore, using machine learning with the DOPA process should be fairly straightforward and well within reach of the average software-savvy DBA through tools such as Oracle Advanced Analytics, Oracle R Enterprise, MATLAB, R, Python, and others. These tools have popular machine learning algorithms already coded up for you. With these tools, the DBA can focus on arranging the input data and interpreting the results rather than the internals of the machine learning algorithm; usually only a few lines of code are needed for the implementation. Personally, I like the idea of using Oracle R Enterprise (within the Advanced Analytics license pack) because I agree with the strategy of bringing the algorithm to the data rather than bringing the data to the algorithm [plus it is set up so the developer can “talk SQL” to the ML algorithms]. In my mind, Oracle Advanced Analytics holds great promise as a scalable platform for building performance analytics models. I can see, however, that there are cases where running R, MATLAB, or Python outside the database would be a fair approach. For example, I understand that R can also generate SQL that can be run in Oracle for some common processing.

In the previous section, I alluded to areas in which I believe ML would be profitable. I’ll expand on some of those ideas here.
  • ML applied to metric analysis

  • ML applied to ranking

  • ML applied to correlation analysis

  • ML applied to building taxonomies

ML Applied to Metric Analysis

Machine learning is a well-suited approach to analyzing Oracle performance metrics, not only because of the volume and variety of data (many variables) but because the Oracle RDBMS is an extremely complex software system with many moving parts that interact with one another. Moreover, since the Oracle RDBMS is likely the most instrumented piece of software on the planet, I assert that the quality and predictive power of the input data is very high to start with (many data mining/data analysis projects require a lot of data prep/cleanup work). The more advanced analytical methods of ML would likely allow a richer analysis and thereby yield higher-quality models.

High numbers of complex variables are difficult to study with traditional statistical methods but can be handled rather easily with machine learning. In machine learning, variables are the measures of interest; for DOPA, the measures of interest are the performance metrics. The promise of applying machine learning to Oracle performance analytics is to quickly analyze massive amounts of performance metrics, to find hidden patterns in the metrics, and to use that information for new insights into performance problems and even to be able to predict performance problems. The DOPA process already identifies the most important factors (metrics) through its feature selection/flagging outside of normal range; however, the use of ML algorithms would hopefully make the feature selection more accurate and provide a richer association of metrics.

ML Applied to Ranking

Since, broadly speaking, machine learning is inference based on data, it is quite reasonable to think that machine learning could be used to help rank which metrics are most important based on which ones were most highly correlated with problems, or with predicted problems in the case of monitoring. This would be a good area to explore for ML. Sadly, this is an area of very early exploration for me; I don’t have many details other than to recognize the potential for additional work in this area.

ML Applied to Correlation Analysis

The DOPA process currently looks at metrics in isolation. The use of machine learning techniques should be able to provide insight into trends/predictive patterns with multiple metrics in combination. Using the machine learning approach of supervised learning with various problem types as the training sets, one should be able to obtain metric-based profiles of problem types.

The normalized metrics used by the DOPA process already provide a single consolidated operational data store which can be analyzed by applying advanced analytics, data mining/data science, and ML techniques. Again, the aim would be to discover key patterns and correlations from historical performance metrics and also provide actionable predictive analysis that could be used to reduce cycle time for a performance resolution, reduce application problems, and reduce downtime.

ML Applied to Building Taxonomies

In a previous chapter, I hinted at building additional taxonomies such as a workload taxonomy (or building out the existing infrastructure and Oracle taxonomies). I haven’t done this (or even completely refined the ones I have implemented) because building the taxonomy is a time-consuming process. It may be possible to use machine learning algorithms to associate a metric with a higher-level grouping to accomplish this taxonomy building process. The possibility of enhancing the accuracy and usefulness of taxonomies in the DOPA process via machine learning is another area that interests me—the goal with both of these being a better bundling of metrics.

Note

A workload taxonomy could be useful in grouping the metrics into a set of workload categories such as storage/memory utilization, transaction rates, execution rates, logon rates, IOPS, and so on.

ML Applied to Capacity Planning

Performance problems or infrastructure capacity issues are not easy to correct quickly (i.e., they may require provisioning of additional infrastructure). For some performance/capacity issues and business models, it is necessary to know far in advance of an actual bottleneck what the capacity needs are and will be in the future in order to avoid problems that negatively impact the business.

The best way to avoid problems is to anticipate capacity needs and plan for them, but without the ability to predict capacity needs accurately, it is possible to either underprovision (thereby incurring the problem we were trying to avoid in the first place) or overprovision the hardware configuration. Overprovisioning may avoid the bottleneck we were trying to avoid, but it entails much higher operating costs.

With a good, metrics-based capacity planning capability, organizations can avoid both under- and overprovisioning and feel confident that the hardware purchase being made will be suitable for the intended use. I would venture to guess that in most organizations, such a capability would be a transformational change resulting in much lower costs.

Capacity planning is yet another area where ML appears to hold great promise for providing the insight necessary for making good decisions.

ML Summary

Machine learning is now state of the practice and holds great promise for analysis and interpretation of big data. DOPA uses the data set of metrics contained in AWR to perform statistical analysis, but I believe that there are many more opportunities for using ML techniques within the DOPA process that would lead to even more accurate analysis and monitoring.

In the process of developing the DOPA process, I have become convinced of the “illusion of explanatory depth,” which refers to the phenomenon of thinking you fully understand something when you really don’t. The more I learn about databases through using the DOPA process, the more I realize what I didn’t understand before. I think bringing ML into the data analysis will yield great insights into database performance and reveal things we never knew before that will be very useful in the work we do every day.

Chapter Summary

I first came up with the idea for the DOPA process around May of 2017. I started out prototyping using dba_hist_sysmetric_summary because the data was already “normalized.” As I continued developing the DOPA process (adding data sources, converting cumulatives into deltas, removing outliers, subsetting on flag ratio, etc.), and using DOPA to solve real-world performance issues (often in the “heat of battle”), I became absolutely drawn to this process, and ideas for future enhancements would frequently spring to mind. I paused major development on the DOPA process so that I could focus on not only using and proving out what I had so far but spreading the word at industry conferences (Hotsos 2018, Collaborate 2018, ECO 2018) and writing this book. I am really jazzed up about implementing the Metrics Correlation View since the prototype with dba_hist_sysmetric_summary went so well. Further, the graphs I was able to easily generate using the prototype version of the Metric Correlation Time-Series View have helped me explain several tricky performance cases. Another area I have an intense interest in pursuing is the metric value frequency distributions and generating the graphs. The prototype for this has given me more insight into the metric data distribution than I initially thought it would. Moreover, I look forward to being able to dive deeper into the field of machine learning and leveraging this “state of the practice” capability in the DOPA process. There is much to explore, learn, and implement and many more fires to fight and problems to solve.

As I stated in the beginning of this chapter, I hope these brief comments will stimulate the reader to experiment with these ideas as well. If so, I would love to hear about the results of your exploration.
