7. Managing, Optimizing, and Tuning VLDBs

Data is the heart and lifeblood of every business. Data growth in every industry is phenomenal and accelerates with each passing year, sometimes exceeding 100 percent over the previous year. Data warehousing is a common requirement for organizations of any size. At the same time, configuring and maintaining very large databases (VLDBs) and extremely large databases (XLDBs) are challenging tasks that require advanced skills. This chapter presents a 360-degree overview of managing VLDBs, offering tips for best practices and strategies in configuration, performance, backup and recovery, and maintenance.

Overview of Very Large Databases

As many Oracle and data warehousing gurus have voiced, there is no fixed definition or standard rule to categorize a certain sized database as a VLDB. Not that long ago, databases that were only hundreds of gigabytes in size were considered to be VLDBs; however, over the past few years, the definition of VLDBs has changed significantly due to enormous data growth. It is not unusual to encounter databases that have grown to the size of several terabytes or even petabytes and that hold billions or even trillions of records. VLDBs and XLDBs therefore demand highly efficient software and hardware resources and require extremely large amounts of storage capacity.

VLDBs and XLDBs used for decision support systems (DSS) are often referred to as data warehouse (DW) databases. DSSs are critical to an organization's management for analyzing the performance and growth of the products and services the organization offers. Data warehouses generally hold historical data that is loaded from various data sources, including other databases, flat files, Excel spreadsheets, and other sources. Extract-transform-load (ETL) tools are typically used to load data from those sources, while applications and utilities such as business intelligence (BI) and business objects tools are used to generate management reports and dashboards for the business's needs.

VLDBs and XLDBs therefore engender a unique set of challenges and can raise some daunting tasks for the information technology (IT) department of an organization. They demand state-of-the-art technologies to meet myriad business needs while simultaneously delivering optimal performance. When designing such a database, you will almost certainly encounter the need for high-end hardware resources, huge storage capacities, large network bandwidths, and special considerations for backup and recovery. The following sections present best practices and discuss how to apply the tools, tips, and tricks to ensure a solid VLDB and XLDB foundation so that you can manage them with ease.

Optimal Basic Configuration

One of the key factors in having an optimal setup is getting the basics right and building a solid foundation. This section briefly reviews some of the initial configuration tips for VLDBs:

• Choosing the right database configuration template in DBCA

• Selecting the optimal data block size

• Sizing adequate system global area (SGA), program global area (PGA), and other memory components

• Leveraging data compression

• Using temporary tablespaces effectively

• Implementing partitioning for easy data management

• Making the right choices for partitioned indexes (especially global vs. local)

• Enabling parallelism to take advantage of multiple CPUs

• Verifying application code for effectiveness before its production deployment

• Implementing appropriate backup and recovery strategies

Data Warehouse Template

When deploying a new database, it is advisable to follow some basic rules specific to the application category: online transaction processing (OLTP), DSS, or a combination of both. To create a new Oracle database, you can use the Database Configuration Assistant (DBCA), which is a Java-based GUI tool, or you can use the CREATE DATABASE statement, a manual approach that requires running scripts. Depending on the nature of the application—that is, whether it’s an OLTP or a DSS application—you should choose the appropriate database creation template through DBCA, as shown in Figure 7.1. If the database is intended for DSS or VLDB, choose the Data Warehouse template so that the appropriate database initialization parameters, online redo log sizing, and tablespace sizing are set properly.

Figure 7.1 Database Configuration Assistant: Create database template

Optimal Data Block Size

It is essential to choose the right database block size, especially for VLDBs and XLDBs. The data block size has an impact on overall database read and write performance. Although the default 8 KB block size is most appropriate for and can meet the demands of an OLTP application, DSS systems generally require a larger block size. If the database is already configured with the default 8 KB block size, the DBA can use Oracle's multiple block size feature. Under some circumstances, using a 16 KB database block size or even a 32 KB size for VLDBs and XLDBs yields performance benefits over a 4 KB or 8 KB block size. The block size is controlled with the db_block_size initialization parameter.

Oracle supports multiple data block sizes within the same database, but the practice is not encouraged unless your application demands it. Any VLDB might have data block sizes of 16 KB or even 32 KB, but only the DBA can determine which block size best supports the application's performance and behavior. Remember, once a database is created with a default block size, that default block size cannot be changed unless the database is re-created.
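
As a minimal sketch of the multiple block size feature, the following statements show how a 16 KB tablespace might be added to a database whose default block size is 8 KB; the cache size, tablespace name, and datafile path are illustrative assumptions:

SQL> SHOW PARAMETER db_block_size

-- A buffer cache for the nondefault block size must be configured first
SQL> ALTER SYSTEM SET db_16k_cache_size = 256M;

-- Create a tablespace that uses the 16 KB block size
SQL> CREATE TABLESPACE data_16k_ts
       DATAFILE '/oradata/data_16k_ts01.dbf' SIZE 10G
       BLOCKSIZE 16K;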

Following are a few tips for choosing the right block size that meets the application’s needs:

• A smaller block size is efficient when rows are small and data access is predominantly random.

• Choose a larger block size to improve read performance when the rows are small and access is sequential, or when you have a mixture of random and sequential reads.

• A larger block size yields significantly better read performance when rows are large, such as rows containing large object (LOB) columns.

• When using a large block size on high-concurrency systems, ensure you have set appropriate values for the INITRANS and MAXTRANS parameters.

• A larger block size has proportionally less overhead and stores more rows in a single block.

• With a larger block size, several rows are brought into the buffer cache with a single read.

Bigfile Tablespaces

Today’s VLDBs and XLDBs commonly grow into terabyte (TB) or even petabyte (PB) sizes. Traditionally, the larger the database, the more tablespaces and datafiles will be required to support the increasing data demands. Depending on the operating system platform, the datafiles of a smallfile tablespace are allowed to grow to a maximum size of 32 GB for an 8 KB block size (128 GB for a 32 KB block size).

To limit the number of datafiles for VLDBs, Oracle introduced the bigfile tablespace concept, which allows a single large datafile per tablespace. However, that single datafile can grow to a maximum size of 32 TB for an 8 KB block size (and up to 128 TB for a 32 KB block size). This feature eliminates the need for numerous datafiles and simplifies tablespace management. Keep in mind that bigfile tablespaces are valid only for locally managed tablespaces with automatic segment space management.

When you create bigfile tablespaces, you need to ensure there is enough free space for growth on the storage/filesystem and that the system supports striping.

SQL> CREATE BIGFILE TABLESPACE data_ts DATAFILE SIZE 10G;
SQL> CREATE BIGFILE TABLESPACE data_ts DATAFILE '/oradata/data_ts01.dbf' SIZE 10G;

Refer to the BIGFILE column in the V$TABLESPACE dictionary view to identify whether the tablespace is a traditional smallfile or bigfile tablespace.

SQL> select name,bigfile from v$tablespace;

NAME                           BIG
------------------------------ ---
SYSTEM                         NO
UNDOTBS1                       NO
SYSAUX                         NO
USERS                          NO
TEMP                           NO
DATA_TS                        YES

Adequate SGA and PGA

Another critical decision a DBA must make concerns the amount of memory resources that should be assigned for the system and program global areas for the database instance used for a VLDB or XLDB. The applications that run against VLDBs typically require much higher rates of data access and process complex calculations, so it is essential to have optimal SGA and PGA to avoid performance pitfalls.

Fortunately, Oracle provides multiple options to conquer memory management and configuration hassles for large-sized database demands. Although you can manually configure memory components such as SGA and PGA separately, you might want to take advantage of the automatic memory management (AMM) feature to handle the SGA and PGA together automatically. It is also critical to ensure there is no frequent dynamic memory allocation and deallocation happening among the SGA components, especially during peak business hours. Review V$SGA_DYNAMIC_COMPONENTS, V$SGA_CURRENT_RESIZE_OPS, V$SGA_RESIZE_OPS, and V$SGA_DYNAMIC_FREE_MEMORY dynamic performance views to analyze whether the current SGA sizing is optimal for your environment.

There is no rule of thumb to define the perfect memory allocation for SGA and PGA, but you may consider starting with an optimal size and then monitoring the SGA and PGA statistics to adjust these settings as needed. Since VLDB applications tend to perform a lot of sorting, ensure that you configure adequate PGA as well. You can use the V$MEMORY_TARGET_ADVICE dynamic view to adjust the value of AMM if it is in use; also, the V$SGA_TARGET_ADVICE and V$PGA_TARGET_ADVICE dynamic views are helpful for tuning and adjusting the values for SGA_TARGET and PGA_AGGREGATE_TARGET, respectively. You can always adjust the SGA settings; sometimes a database restart is necessary for the new values to take effect.
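
The following sketch, using illustrative sizes, shows how the advisor views mentioned above might be queried and how an AMM target could then be adjusted; the 64G value is an assumption, and MEMORY_TARGET cannot exceed MEMORY_MAX_TARGET:

SQL> SHOW PARAMETER memory_target

-- Estimated DB time at alternative memory sizes (a lower factor is better)
SQL> SELECT memory_size, memory_size_factor, estd_db_time_factor
       FROM v$memory_target_advice
      ORDER BY memory_size;

SQL> SELECT sga_size, estd_db_time_factor FROM v$sga_target_advice;
SQL> SELECT pga_target_for_estimate, estd_pga_cache_hit_percentage
       FROM v$pga_target_advice;

-- Adjust AMM if the advisor indicates a benefit (the value shown is only an example)
SQL> ALTER SYSTEM SET memory_target = 64G SCOPE = SPFILE;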

Temporary Tablespace Groups

Managing and configuring the proper sizes of temporary tablespaces can become a significant challenge for a DBA as VLDB/DSS applications tend to perform heavy sorting operations quite often.

Essentially, the Oracle database uses temporary tablespaces to segregate temporary work from the real work in the system. Performance bottlenecks on the temporary tablespace are often caused by an undersized tablespace, which results in the system running out of temporary space when concurrent user queries increase the demand for it. Temporary tablespace groups (TTGs), introduced in Oracle Database 10g, provide performance benefits by spreading I/O across multiple temporary tablespaces. A TTG can be created initially with a single temporary tablespace, and additional temporary tablespaces can be added later. When configuring a TTG, consider having equally sized temp files and the same number of temp files for each temporary tablespace that will be part of the TTG. A TTG is a better option than having one large temporary tablespace or multiple temporary tablespaces assigned to different users in the database.
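
The following is a minimal sketch of creating and using a TTG; the tablespace names, group name, file paths, sizes, and user name are illustrative assumptions:

SQL> CREATE TEMPORARY TABLESPACE temp1
       TEMPFILE '/oradata/temp101.dbf' SIZE 20G
       TABLESPACE GROUP temp_grp;

SQL> CREATE TEMPORARY TABLESPACE temp2
       TEMPFILE '/oradata/temp201.dbf' SIZE 20G
       TABLESPACE GROUP temp_grp;

-- Assign the group as the database default, or to individual users
SQL> ALTER DATABASE DEFAULT TEMPORARY TABLESPACE temp_grp;
SQL> ALTER USER app_user TEMPORARY TABLESPACE temp_grp;

-- Verify group membership
SQL> SELECT group_name, tablespace_name FROM dba_tablespace_groups;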

Data Partitioning

Managing the massive amounts of data typically stored in a VLDB or XLDB is not a simple task. It demands manageability, availability, and good performance. Oracle's partitioning features offer a wide range of partitioning types to support your data and application needs. Data partitioning simplifies data management, improves data availability, and delivers significant performance gains by dividing huge amounts of data into much smaller segments.

Oracle continues to improve and expand partitioning capabilities in each new release of the database. Oracle supports table-level, index-level, and index-organized table (IOT)-level partitioning. When partitioning is applied to an object, each partition is treated as a separate segment. According to the needs of your application, you can implement an appropriate partitioning strategy; when you can't decide on a strategy, or when your data doesn't call for range or list partitioning, you may choose to go with hash partitioning.

Oracle Database 11g made partitioning management easier with the introduction of interval partitioning, an extension of range partitioning that allows a DBA to define automated partitioning of data on the basis of specified timestamp or numeric intervals. With this feature, partitions are created automatically whenever incoming data requires them. And when existing columns do not present an appropriate partitioning key, Oracle 11g also supports the ability to partition data on the basis of the value of an expression stored in a virtual column.
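
A minimal sketch of interval partitioning follows; the table name, columns, and monthly interval are illustrative assumptions:

SQL> CREATE TABLE sales_hist
       ( sale_id   NUMBER,
         sale_date DATE,
         amount    NUMBER(12,2) )
     PARTITION BY RANGE (sale_date)
     INTERVAL (NUMTOYMINTERVAL(1,'MONTH'))
     ( PARTITION p_before_2015
       VALUES LESS THAN (TO_DATE('01-JAN-2015','DD-MON-YYYY')) );

-- New monthly partitions are created automatically as rows with later dates arrive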

DBAs also often agonize over when partitioning should be applied to a table, which column is the best choice for the table’s partitioning key, and whether partitioning should be applied to a table in an OLTP environment. There are many opinions on how best to answer these questions, but a good rule of thumb is that if the table is huge and often requires complex data maintenance, it makes excellent sense to at least consider partitioning regardless of whether an OLTP or DSS application is accessing the table.

Index Partitioning: Local versus Global

One of the tricky tasks that a DBA typically has to address when dealing with DSS applications is deciding between the local and global partitioned indexes for a partitioned table. The type of index partitioning serves an important role in data access, data loading, and partition maintenance, so it generally pays to invest considerable time when making these decisions.

The general perception is that global indexes deliver better performance for OLTP applications, whereas local indexes provide better performance benefits for DSS applications running against VLDBs.

Local index partitions not only provide better performance but also are easier to maintain and offer great application availability. Because each local index partition maps to a single corresponding table partition, local indexes generally preserve application availability when a MERGE, SPLIT, EXCHANGE, DROP, or TRUNCATE partition operation is applied against one or more table partitions. However, you can't use a local index to enforce a primary key or unique constraint unless the partitioning key is part of the index columns. When MERGE, SPLIT, EXCHANGE, DROP, and similar operations are performed, global indexes become unusable unless you either rebuild them or add the UPDATE GLOBAL INDEXES clause to the partition maintenance statement.
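
As a brief sketch (reusing the hypothetical sales_hist table from the earlier interval partitioning example), the statements below create a local index and a global partitioned index, and show the UPDATE GLOBAL INDEXES clause during partition maintenance:

-- Local index: one index partition per table partition
SQL> CREATE INDEX sales_hist_date_ix ON sales_hist (sale_date) LOCAL;

-- Global partitioned index, hash partitioned on the index key
SQL> CREATE INDEX sales_hist_id_ix ON sales_hist (sale_id)
       GLOBAL PARTITION BY HASH (sale_id) PARTITIONS 8;

-- Keep global indexes usable during partition maintenance
SQL> ALTER TABLE sales_hist DROP PARTITION p_before_2015 UPDATE GLOBAL INDEXES;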

Data Compression

Data retention may differ dramatically among organizations according to their different business needs, regulatory requirements, and even the data owner's desire to keep historical data long past its perceived value. When the retention period is longer, a VLDB tends to encompass huge volumes of historical data, which will of course need a correspondingly huge amount of physical storage capacity. DBAs must therefore carefully consider exactly how much data is being retained and implement appropriate maintenance policies to ensure that storage space is not exhausted unexpectedly or prematurely.

Oracle's various data compression features may provide significant storage savings. If you are fortunate enough to be using Exadata or the Oracle ZFS Storage Appliance for your VLDB, those storage platforms offer Hybrid Columnar Compression (HCC), through which you may be able to save significant storage capacity. If your VLDB is not using these storage platforms, however, you may still be able to make use of Oracle Advanced Compression features to compress data within the database.

Oracle Advanced Compression is a separately licensed feature that provides comprehensive storage savings. As of Oracle Database 12c, Advanced Compression offers myriad features, including Advanced Data Compression, Heat Map, Automatic Data Optimization (ADO), Advanced Row Compression, Advanced Index Compression, and Advanced Network Compression.

Table Compression

Table compression allows you to compress the data within a table. Basic table compression, which doesn’t require any additional license, applies an average of 10X compression to a table’s data but applies only to initial direct-path loading. In contrast, Advanced Compression’s Advanced Row Compression, called OLTP compression in earlier releases, applies row compression to tables during all data manipulation language (DML) operations. If you are using either Exadata or ZFS storage, then HCC offers warehouse and archive compression methods in addition to basic and advanced compression.

Here are some examples of table-level compression options:

SQL> CREATE TABLE table_name (column_list...) ROW STORE COMPRESS ADVANCED;
SQL> ALTER TABLE table_name ROW STORE COMPRESS BASIC;
SQL> ALTER TABLE table_name ROW STORE COMPRESS ADVANCED;
SQL> ALTER TABLE table_name MOVE PARTITION partition1 ROW STORE COMPRESS BASIC | ADVANCED;
SQL> ALTER TABLE table_name NOCOMPRESS;

The following query lists the objects on which compression is applied:

SQL> SELECT table_name,compression, compress_for FROM user_tables WHERE compression = 'ENABLED';

Heat Map and Automatic Data Optimization

Often, historical data in a VLDB becomes less active as it grows older. If this is true of data contained in a partitioned table, Oracle 12c’s new (and complementary) Heat Map and ADO features may be particularly useful. You can define ADO policies and conditions under which data should be moved to a different storage tier or compressed at a higher compression level; ADO will then automatically apply the appropriate compression based on heat map statistics, but only when the corresponding conditions are satisfied.

To enable the Heat Map feature, all that is required is to set the HEAT_MAP initialization parameter to ON (the default value is OFF):

SQL> ALTER SESSION | SYSTEM SET HEAT_MAP = ON;
SQL> ALTER SESSION | SYSTEM SET HEAT_MAP = OFF; -- disable Heat Map

To list the statistical information about the Heat Map settings for database objects, you can use views V$HEAT_MAP_SEGMENT and USER_HEAT_MAP_SEGMENT. Views DBA_HEAT_MAP_SEGMENT and DBA_HEAT_MAP_SEQ_HISTOGRAM provide additional information about the individual data segments and their corresponding heat map settings.

ADO allows you to define the policies at table level, tablespace level, and row level for smart compression and automatic data movement. The following examples demonstrate how to deploy ADO policies at various levels for a table. In this example, when 60 days has elapsed since any DML has been performed against any data in the sales table, that table’s segments will be automatically compressed using ADVANCED compression:

SQL> ALTER TABLE sales
       ILM ADD POLICY
       ROW STORE
       COMPRESS ADVANCED
       SEGMENT
       AFTER 60 DAYS OF NO MODIFICATION;

In the subsequent example, when 30 days has elapsed since any DML has been performed against any data in the sales_dec14 partition of the sales table, it will be automatically compressed:

SQL> ALTER TABLE sales
       MODIFY PARTITION sales_dec14
       ILM ADD POLICY
       ROW STORE
       COMPRESS ADVANCED
       ROW
       AFTER 30 DAYS OF NO MODIFICATION;

To disable and then remove all ADO policies against the sales table, use the following syntax:

SQL> ALTER TABLE sales ILM DISABLE_ALL;
SQL> ALTER TABLE sales ILM DELETE_ALL;

Use the following example to query the information on ADO:

SQL> SELECT policy_name, policy_type, enabled FROM user_ilm_policies;

Advanced Index Partition Compression

Oracle Database 12c supports Advanced Index Compression, which yields storage reduction. The Advanced Index Compression feature supports unique and nonunique indexes while maintaining good performance during the index access operation. The compression can be applied at partition level, as demonstrated in the following:

SQL> CREATE INDEX index_name ON table_name(column) COMPRESS ADVANCED HIGH LOCAL (PARTITION
      ip1 COMPRESS ADVANCED LOW, PARTITION ip2 COMPRESS HIGH, PARTITION ip3 NOCOMPRESS);
SQL> CREATE INDEX index_name ON table_name(column) LOCAL (PARTITION ip1 COMPRESS ADVANCED LOW,
      PARTITION ip2 COMPRESS HIGH, PARTITION ip3);
SQL> CREATE INDEX index_name ON table_name(column_list...) COMPRESS ADVANCED LOW;

VLDB Performance Tuning Principles

Delivering optimal performance for applications that access a VLDB is essential. If Oracle database best practices are applied, many tuning nightmares may disappear. Following are the most common performance issues that can affect VLDBs:

• Suboptimal application coding

• Inefficient application and database design

• Inaccurate or missing optimizer statistics

• Lack of indexes or too many indexes

• Limited hardware resources

• Bad network design

Real-World Scenario

To demonstrate the issue of suboptimal application coding, let’s look at a classic example of a poor performance incident—an issue that was ultimately resolved with a simple code modification.

A reconciliation batch scheduled to run at the end of each business day was performing inconsistently. The job sometimes completed within only 2 hours and sometimes took more than 6 hours; during the database’s worst performance, it took nearly 24 hours to complete. This inconsistent performance not only gave the DBA team nightmares, it also significantly impacted the subsequent dependent jobs, and as a result, several key business users were unable to get their daily reports on time.

The job itself was straightforward, involving no complex queries or convoluted logic. The job was to perform the following simple tasks:

1. Delete data from the target tables.

2. Load data from text files into the target tables.

3. Run queries against the target tables to generate reports.

During the course of investigation, the following behavior was observed:

• The database was growing in size at a uniform rate each day.

• Important queries were favoring nested loops in their execution plans.

As a temporary workaround, the following action plan was used:

1. Reorganize the target tables using the ALTER TABLE ... MOVE command.

2. Rebuild all indexes in the schema.

3. Gather application schema statistics.

4. If the preceding steps fail to resolve the inconsistent performance, export the schema into another database residing on a different server that has more CPU and memory capacity.

Despite these workarounds, the problem reappeared and gradually returned to its worst state (taking nearly 24 hours to complete). Interestingly, when the person responsible for the application was contacted, he mentioned that data in all tables was deleted and subsequently reloaded from different sources, and that the observed growth was unlikely given the amount of data he was loading daily.

Then the focus shifted to the data load script. Upon reviewing the code, a classic mistake was observed: data was removed from the tables with the DELETE command followed by direct path loads.

If you know the basics, you can easily spot the reason for the consistent growth in the database. When DELETE is used to remove all data from a table, the high-watermark is not reset; therefore, subsequent direct-path load inserts start loading data above the high-watermark, ignoring the free data blocks below it. This approach causes continuing data growth and consumes significant amounts of free space in the datafiles. Hence, queries on these tables were performing badly.

The simple solution to this issue was to replace the DELETE command with TRUNCATE TABLE. Once this change was applied, the job finished in just 2 hours. There have been no issues with performance in the 6 years since the problem was encountered.
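
A minimal sketch of the before-and-after pattern, with hypothetical table names, follows:

-- DELETE leaves the high-watermark in place; subsequent direct-path loads
-- insert above it and the segment keeps growing
SQL> DELETE FROM target_table;
SQL> INSERT /*+ APPEND */ INTO target_table SELECT * FROM staging_table;
SQL> COMMIT;

-- TRUNCATE resets the high-watermark, so the freed blocks are reused
SQL> TRUNCATE TABLE target_table;
SQL> INSERT /*+ APPEND */ INTO target_table SELECT * FROM staging_table;
SQL> COMMIT;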

The moral of the story is that if you get the basics right, tuning the code just a little could offer significant benefits for many performance issues.

Limiting the Impact of Indexes on Data Loading

Data loading is a common practice in data warehouse databases. In general, indexes improve queries’ data selection performance; however, having too many indexes sometimes hampers DML (insert, update, and delete) performance. When data loading doesn’t produce expected throughput and when a DBA receives performance-related concerns, one of the key areas to investigate is the number of indexes on the table.

One of the typical methods used to improve the performance of bulk data loading is to disable indexes and constraints on the table and then rebuild them after data loading completes. Although this may seem like a good workaround, it can be quite time consuming to rebuild indexes on extremely large tables. Therefore, it is recommended to minimize the number of indexes on the table to improve the data loading process.
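
A minimal sketch of the disable-and-rebuild approach described above follows; the index name and degree of parallelism are illustrative assumptions:

-- Mark the index unusable before the load and let the session skip it
SQL> ALTER INDEX target_table_ix1 UNUSABLE;
SQL> ALTER SESSION SET skip_unusable_indexes = TRUE;

-- ... perform the bulk/direct-path load here ...

-- Rebuild afterward, optionally in parallel, then reset the parallel degree
SQL> ALTER INDEX target_table_ix1 REBUILD PARALLEL 8;
SQL> ALTER INDEX target_table_ix1 NOPARALLEL;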

There are a few approaches to find out which indexes are being used and which are not. One of the methods is to enable monitoring of index usage with the ALTER INDEX ... MONITORING USAGE command. Once you activate monitoring for indexes, you can verify their usage through the V$OBJECT_USAGE dynamic view; if monitoring shows that some of the indexes are not being used, you may consider putting them in invisible mode and dropping them after obtaining permission from the application owner/developers.
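
The following sketch shows that monitoring workflow; the index name is an illustrative assumption, and V$OBJECT_USAGE reports only indexes owned by the connected schema:

SQL> ALTER INDEX orders_status_ix MONITORING USAGE;

-- After a representative workload period, check whether the index was used
SQL> SELECT index_name, monitoring, used, start_monitoring
       FROM v$object_usage
      WHERE index_name = 'ORDERS_STATUS_IX';

-- Hide an apparently unused index before deciding to drop it
SQL> ALTER INDEX orders_status_ix NOMONITORING USAGE;
SQL> ALTER INDEX orders_status_ix INVISIBLE;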

Oracle Database 12c gives you the ability to create partial indexes on a partitioned table. This cool feature provides the flexibility to turn on and off local and global indexing for individual partitions of a table. The following example demonstrates how to create a table with an INDEXING OFF|ON option:

SQL> CREATE TABLE emp (eno NUMBER(9), ename VARCHAR2(100), dob DATE, dept NUMBER(2),
      salary NUMBER(6)) INDEXING OFF
     PARTITION BY RANGE (salary)
     (PARTITION p10000 VALUES LESS THAN (10001) INDEXING OFF,
      PARTITION p20000 VALUES LESS THAN (20001) INDEXING ON,
      PARTITION p30000 VALUES LESS THAN (30001));

SQL> CREATE INDEX index_p1 ON emp(eno) GLOBAL INDEXING FULL|PARTIAL;
SQL> CREATE INDEX index_p2 ON emp(salary) LOCAL INDEXING PARTIAL;
SQL> ALTER TABLE emp MODIFY PARTITION p10000 INDEXING ON;

When indexes are created with the PARTIAL option on a table:

• For a local index, partitions that are tagged as INDEXING OFF will have an UNUSABLE index status, and partitions that are tagged as INDEXING ON will have a USABLE index status.

• For a global index, only those partitions that are tagged as INDEXING ON will be active, and the rest will be excluded.

To view partial-indexing status, refer to the DEF_INDEXING column in DBA|USER_PART_TABLES and the INDEXING column in DBA|USER_TAB_PARTITIONS and DBA|USER_TAB_SUBPARTITIONS. Keep in mind that the PARTIAL option can't be applied to unique indexes.
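
For example, the partial-indexing status of the hypothetical emp table created above might be checked as follows:

SQL> SELECT table_name, def_indexing FROM user_part_tables WHERE table_name = 'EMP';

SQL> SELECT table_name, partition_name, indexing
       FROM user_tab_partitions
      WHERE table_name = 'EMP';

-- FULL or PARTIAL, per index (USER_INDEXES also exposes an INDEXING column in 12c)
SQL> SELECT index_name, indexing FROM user_indexes WHERE table_name = 'EMP';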

Maximizing Resource Utilization

Two things are very common in VLDB environments: huge hardware resources and massive data. The queries that access VLDBs often either scan large amounts of data or perform massive computation on huge amounts of data. When we have a solid hardware presence, it is up to us to best utilize those resources to improve the overall application performance. One of the best ways to do this is to divide and drive: that is, divide the workload of the query into smaller pieces so it will complete as soon as possible. Don’t panic. We are not talking about rocket science here—just leveraging Oracle’s capabilities in regard to parallelism.

When parallelism is applied at any level—database, table, or query—Oracle distributes the workload of the query among multiple slave processes and completes the job more quickly than if it were executed serially. Parallel execution utilizes multiple CPU and I/O resources on the servers to achieve improved performance by reducing the response time needed to execute the query.

As just mentioned, you can define parallelism at various levels in an Oracle database:

• Object level (table or index)

• Session level

• Database level

• At statement runtime

If you are uncertain about the degree of parallelism to use, you can leave this to Oracle, as Oracle will automatically determine the number of parallel processes to use during execution based on the settings for initialization parameters cpu_count and parallel_threads_per_cpu. Following are some of the key initialization parameters pertaining to parallel execution and their default values:

parallel_degree_policy               string      MANUAL
parallel_execution_message_size      integer     16384
parallel_max_servers                 integer     240
parallel_min_percent                 integer     0
parallel_min_servers                 integer     0
parallel_server                      boolean     TRUE
parallel_servers_target              integer     96
parallel_threads_per_cpu             integer     2

Using parallelism is generally advised while dealing with huge amounts of data. For example, the following are some of the statements that might benefit from parallelism:

• CTAS (CREATE TABLE AS SELECT)

• ALTER TABLE ... MOVE | SPLIT PARTITION

• ALTER INDEX REBUILD

• CREATE INDEX

• INSERT AS SELECT

• INSERT /*+ APPEND */ (direct-path INSERTs)

If you are working with an Oracle Real Application Clusters (RAC) database, setting the PARALLEL_FORCE_LOCAL initialization parameter to TRUE restricts parallel execution of a SQL statement to the instance on which it was issued.
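
The following sketch illustrates parallelism at the object, session, and statement levels; the table name and the degree of 8 are illustrative assumptions:

-- Object level: set a default degree of parallelism on the table
SQL> ALTER TABLE sales_hist PARALLEL 8;

-- Session level: enable parallel DML for the current session
SQL> ALTER SESSION ENABLE PARALLEL DML;

-- Statement level: request parallelism with a hint at runtime
SQL> SELECT /*+ PARALLEL(s 8) */ COUNT(*) FROM sales_hist s;

-- Or let Oracle choose the degree automatically
SQL> ALTER SESSION SET parallel_degree_policy = AUTO;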

Gathering Optimizer Statistics

Maintaining accurate and up-to-date optimizer statistics is critical for the Oracle optimizer to generate optimal SQL execution plans. Many performance issues can be resolved simply by gathering fresh optimizer statistics. The challenge comes when you have to gather optimizer statistics on large tables. You must consider the time required to gather the statistics, determine how often statistics must be gathered, and find the most efficient method of gathering them. The following sections provide some best practices for collecting accurate optimizer statistics quickly on large tables and table partitions.

Incremental Statistics Synopsis

Accurate and consistent optimizer statistics on tables helps to improve query performance; however, that need must be balanced against how to collect and maintain accurate statistics consistently on large partitioned tables in a timely manner. Although several workarounds and methods exist to accomplish this, we focus on the benefits of gathering incremental statistics and look at real-world examples.

The concept of gathering incremental statistics was first introduced in Oracle Database 11g with the intention to enhance the performance of gathering statistics at global and partition levels on large partitioned tables. Incremental statistics on partitioned tables are gathered at the global, partition, and subpartition levels. When incremental preferences are enabled, the statistical metadata for each partition of the table are stored in the form of a synopsis in the SYSAUX tablespace.

The synopsis helps to produce accurate global-level statistics by aggregating partition-level statistics, which eliminates the need to scan the full table. In this context, whenever a new partition is added, the global statistics are updated automatically using the existing partitions' synopses and the new partition's synopsis. Whenever 10 percent of the data is changed on a partitioned table, the statistics become stale and need to be regathered.


Note

Bug 16851194 reports continued growth in the SYSAUX tablespace without growth in actual data. Therefore, it is advisable to monitor the SYSAUX tablespace’s growth periodically to ensure you are not hitting this bug. The bug was fixed in Oracle 12.1.0.2.


Starting in Oracle Database 12c, the default behavior for stale statistics is controlled with the new INCREMENTAL_STALENESS preference. The default value for INCREMENTAL_STALENESS is NULL, which invokes the same behavior as in Oracle 11g. Also in Oracle 12c, global-level statistics are automatically calculated for any table whose statistics are either locked or stale.

Additionally, setting INCREMENTAL_STALENESS to USE_STALE_PERCENT lets the STALE_PERCENT preference define a different threshold for statistics staleness. The default threshold is 10 percent, but it can be overridden so that statistics are not considered stale until the percentage of changed rows reaches the specified threshold value:

SQL> exec dbms_stats.set_database_prefs('INCREMENTAL_STALENESS', 'USE_STALE_PERCENT');
SQL> exec dbms_stats.set_database_prefs('STALE_PERCENT', '25');

Use the following example to get the current value for STALE_PERCENT:

SQL> SELECT dbms_stats.get_prefs('STALE_PERCENT', 'SCHEMA', 'TABLENAME') stale_value FROM dual;

To maintain and gather incremental statistics on a partitioned table, you must run through the following procedure first:

1. The INCREMENTAL statistics preference is not enabled by default; verify its current setting using this code:

SQL> SELECT dbms_stats.get_prefs('INCREMENTAL', 'SCHEMA', 'TABLENAME')
      tab_incr_prefs FROM dual;

TAB_INCR_PREFS
-------------------------------------------------------------------
FALSE

2. Turn on the INCREMENTAL statistics preference on the table and verify that the flag is set:

SQL> exec dbms_stats.set_table_prefs('SCHEMA', 'TABLENAME', 'INCREMENTAL', 'TRUE');

SQL> SELECT dbms_stats.get_prefs('INCREMENTAL', 'SCHEMA', 'TABLENAME')
      tab_inc_perf FROM dual;

TAB_INC_PERF
-------------------------------------------------------------------
TRUE

3. Gather statistics on the table using automatic sampling so that synopses are maintained incrementally:

SQL> exec dbms_stats.gather_table_stats('SCHEMA',
      'TABLENAME', ESTIMATE_PERCENT=>dbms_stats.auto_sample_size);

The following query identifies which tables have the INCREMENTAL statistics flag turned on:

SQL> SELECT owner,  object_name
FROM dba_objects
WHERE object_id IN
  (SELECT DISTINCT(obj#)
  FROM sys.optstat_user_prefs$
  WHERE PNAME='INCREMENTAL'
  AND VALCHAR='TRUE');

This example shows how to activate incremental statistics at the schema level:

SQL> exec dbms_stats.set_schema_prefs('SCHEMA', 'INCREMENTAL', 'TRUE');

Use the following syntax to disable the incremental statistics for an individual table:

SQL> exec dbms_stats.set_table_prefs('SCHEMA', 'TABLENAME', 'INCREMENTAL', 'FALSE');

Use the following syntax to disable the incremental statistics at the schema level:

SQL> exec dbms_stats.set_schema_prefs('SCHEMA', 'INCREMENTAL', 'FALSE');

Gathering Statistics Concurrently

Gathering statistics in a schema sometimes can be time consuming because of the large number of tables and indexes. Before Oracle 11.2.0.2, there was no direct method available to gather statistics concurrently on multiple objects. The concurrent statistics-gathering feature (also known as interobject parallelism) in Oracle 11.2.0.2 provides the ability to gather statistics on multiple tables or partitions concurrently. The objective of this feature is to fully utilize the database server’s hardware resources to improve the efficiency of gathering statistics and thus reduce the overall time required to gather them.

This behavior is controlled by the CONCURRENT global preference of the DBMS_STATS package, which is set through the SET_GLOBAL_PREFS procedure. Concurrent statistics gathering is disabled by default; in Oracle 11g Release 2, you enable the feature by explicitly setting the preference to TRUE:

SQL> exec dbms_stats.set_global_prefs('CONCURRENT', 'TRUE');
SQL> SELECT dbms_stats.get_prefs('CONCURRENT') FROM dual;
      DBMS_STATS.GET_PREFS('CONCURRENT')
      ----------------------------------------------------------------
      TRUE

When the value is set to TRUE, Oracle takes advantage of Job Scheduler and Advanced Queuing (AQ) mechanisms to perform concurrent statistics gathering. The number of tables used for concurrent statistics collection is directly proportional to the value of the JOB_QUEUE_PROCESSES initialization parameter. When schema-level statistics gathering is triggered, the number of tables for which statistics will be gathered concurrently is determined by the JOB_QUEUE_PROCESSES value; the remaining tables in the schema will be queued until statistics gathering for the current batch of tables is completed. When schema-level or database-level statistics are triggered, a separate job is created for each nonpartitioned and partitioned table.

In Oracle Database 12c, the CONCURRENT preference can instead be set to MANUAL, AUTOMATIC, ALL, or OFF (the default value). When concurrency is enabled and statistics are gathered on a partitioned table, a separate job is triggered for each partition of the table. Concurrent gathering applies whether statistics are collected at the table, schema, or database level. Therefore, when you have a schema with a large number of tables and partitions, make use of the CONCURRENT feature to reduce the overall time required to gather statistics. Setting the value to AUTOMATIC allows the automatic statistics-gathering job to take advantage of concurrently gathered statistics.

Also, ensure that the RESOURCE_MANAGER_PLAN and JOB_QUEUE_PROCESSES initialization parameters are set to appropriate values.
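
For example, a minimal setup might look like the following; the plan name DEFAULT_PLAN and the value 16 are illustrative assumptions that should be sized for your system:

SQL> ALTER SYSTEM SET resource_manager_plan = 'DEFAULT_PLAN';
SQL> ALTER SYSTEM SET job_queue_processes = 16;

SQL> exec dbms_stats.set_global_prefs('CONCURRENT', 'TRUE');
SQL> exec dbms_stats.gather_schema_stats('SCHEMA');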

To review or monitor the concurrent statistics-gathering jobs, use the following examples:

SQL> SELECT owner,job_name,state,start_date,max_run_duration
FROM dba_scheduler_jobs
WHERE job_class like 'CONCURRENT%' AND state in ('RUNNING', 'SCHEDULED');

SQL> SELECT job_name, state, comments
FROM dba_scheduler_jobs
WHERE job_class LIKE 'CONC%' and STATE = 'RUNNING';

SQL> SELECT job_name, elapsed_time
       FROM dba_scheduler_running_jobs WHERE job_name LIKE 'ST$%';

Oracle Database 12c offers a significant enhancement: statistics gathering for very small and empty tables is encapsulated into a single batch job, which reduces the overhead of statistics maintenance.

Setting the ESTIMATE_PERCENT Value

Another vital consideration when gathering table statistics is to decide on an appropriate estimate percentage. Although gathering statistics against 100 percent of a table’s data will certainly provide accurate statistics, the time and computing resources required for this task—especially for extremely large tables—may have a significant negative impact on your database’s performance.

Obviously, it can be tough to settle on a value for the estimate percentage, as both large and small values have their own sets of benefits and disadvantages. The AUTO sampling statistics-gathering feature of Oracle 11g helps to automatically determine an appropriate estimate percentage while gathering table statistics. The accuracy of AUTO sampling is nearly that of a 100 percent sample size; additionally, it takes less time to complete statistics gathering using AUTO sampling. The new AUTO sample algorithm, implemented by default in Oracle Database 11g Release 2, also influences how index statistics are gathered. The AUTO sample algorithm performs a full table scan rather than using a sampling clause, building a synopsis per column for column-level number of distinct values (NDV) calculations.

The following example demonstrates how to use the AUTO sampling size with the DBMS_STATS package. In order to use the incremental statistics-gathering feature mentioned previously, ESTIMATE_PERCENT must be specified as AUTO_SAMPLE_SIZE:

SQL> exec dbms_stats.gather_table_stats(null,'tablename',
               estimate_percent=>dbms_stats.auto_sample_size);

Backup and Recovery Best Practices

It doesn’t matter whether your database is supporting OLTP or DSS workloads—the data stored within is critical to your organization’s business continuity. One of the prime responsibilities of a DBA is to protect the data from any potential disasters and, when necessary, recover the database according to the application service level agreement (SLA). For VLDBs and XLDBs, backup and recovery thus becomes a prime concern not only to the DBA but also to the business organization’s management. Following are several strategies that will help improve performance for backup and recovery, including reducing overall backup time, optimizing backup duration and size, and reducing database recovery time:

• Configure fast incremental backups using the block change tracking (BCT) feature. This saves a good amount of backup time because it scans only the blocks that have been modified since the previous backup.

• Perform a weekly full backup followed by cumulative daily incremental backups.

• Use RMAN incremental updates/merge backups to significantly reduce database recovery time.

• If sufficient disk space is not a barrier, make backups to disk first and then copy them to tape.

• For tablespaces that contain historical or static data, bring them into read-only mode, back them up once, and then exclude them from the daily backups.

• Make use of multisection backups to back up large datafiles so that data in the files can be backed up in parallel.

• Recovering VLDBs and XLDBs can be extremely time consuming. Therefore, activate flashback logging and use guaranteed restore points to overcome known logical errors.

• Compress historical data to gain storage savings, which ultimately reduces backup time and size.

• If you have implemented Oracle Data Guard, back up your database from the physical standby database instead of the primary database to reduce resource demands.

• Use RMAN compressed backups to reduce backup sizes.

• If your VLDB or XLDB is extremely large and backup processing exceeds the time window allotted, use the PARTIAL option in concert with the DURATION directive to permit RMAN to retain backups of the database even though all datafiles haven't been backed up. This practice essentially scatters backup creation over multiple days within the specified backup execution window defined by DURATION.
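
A few of these practices are sketched below in SQL and RMAN; the file path, the 64G section size, and the 4-hour duration are illustrative assumptions:

-- Enable block change tracking for fast incremental backups
SQL> ALTER DATABASE ENABLE BLOCK CHANGE TRACKING
       USING FILE '/oradata/bct/change_tracking.chg';

-- Weekly compressed level 0 backup using multisection backups of large datafiles
RMAN> BACKUP AS COMPRESSED BACKUPSET INCREMENTAL LEVEL 0
        SECTION SIZE 64G DATABASE;

-- Daily cumulative incremental backup
RMAN> BACKUP INCREMENTAL LEVEL 1 CUMULATIVE DATABASE;

-- Spread backup work across a limited nightly window, keeping what completes
RMAN> BACKUP DURATION 04:00 PARTIAL MINIMIZE TIME DATABASE;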

Exadata Solutions

When the Exadata Database Machine was introduced, it was marketed as an excellent solution to deliver extreme performance for data warehouse databases. Although later versions of Exadata support OLTP workloads as well as DSS systems, it is still highly recommended to consider Exadata solutions for VLDBs and XLDBs.

With its state-of-the-art technologies and massive hardware resources, Exadata genuinely delivers extreme performance for DSS applications. Exadata combines a significant amount of hardware resources (CPU and memory), fast networking components (40 Gb/s InfiniBand), flash storage, and numerous software features such as query offloading, HCC, and smart features (Smart Flash Cache, Smart Scan) to deliver excellent performance. If an Exadata solution fits your organization's budget for a data warehouse, VLDB, or XLDB, use it and reap the extreme performance benefits.

Utilizing a Data Guard Environment

Many companies implement a Data Guard configuration for their data warehousing databases. If you work for one of these companies, consider using the existing Data Guard environment to maximize the return on that investment by offloading workload from the production database.

If you have an Active Data Guard physical standby configured for your primary production VLDB, you can leverage it to offload or redirect some of the workload to that standby database. For instance, if the business department wants to generate daily reports after the nightly batch or data load using aggregate or complex queries against huge amounts of data, you can shift those queries to run against the physical standby database instead. This approach reduces resource utilization on the production database server and also balances the workload.

Summary

Configuring and managing a VLDB is a challenging task. The DBA needs a set of advanced skills to keep the database application operating smoothly. This chapter addressed the most common recommendations for the basic setup, configuration, performance, and maintenance of very large and extremely large databases.
