Data is the heart and lifeblood of every business. Data volumes in every industry are growing at a phenomenal pace, sometimes more than doubling from one year to the next. Data warehousing is a common requirement for organizations of any size. At the same time, configuring and maintaining very large databases (VLDBs) and extremely large databases (XLDBs) are challenging tasks that require advanced skills. This chapter presents a 360-degree overview of managing VLDBs, offering best-practice tips and strategies for configuration, performance, backup and recovery, and maintenance.
As many Oracle and data warehousing gurus have voiced, there is no fixed definition or standard rule to categorize a certain sized database as a VLDB. Not that long ago, databases that were only hundreds of gigabytes in size were considered to be VLDBs; however, over the past few years, the definition of VLDBs has changed significantly due to enormous data growth. It is not unusual to encounter databases that have grown to the size of several terabytes or even petabytes and that hold billions or even trillions of records. VLDBs and XLDBs therefore demand highly efficient software and hardware resources and require extremely large amounts of storage capacity.
VLDBs and XLDBs used for decision support systems (DSS) are often referred to as data warehouse (DW) databases. DSSs are critical to an organization’s business management to analyze the functionality and business growth of the products and services that the organization offers. Data warehouses generally hold historical data that is loaded from various data sources, including another database, a flat file, an Excel spreadsheet, and other sources. Extract-transform-load (ETL) tools are typically used to load data from those sources, while applications and utilities such as business intelligence (BI) and business objects are used to generate management reports and dashboards for the business’s needs.
VLDBs and XLDBs therefore engender a unique set of challenges and can raise some daunting tasks for the information technology (IT) department of an organization. They demand state-of-the-art technologies to meet myriad business needs while simultaneously delivering optimal performance. When designing such a database, you will almost certainly encounter the need for high-end hardware resources, huge storage capacities, large network bandwidths, and special considerations for backup and recovery. The following sections present best practices and discuss how to apply the tools, tips, and tricks to ensure a solid VLDB and XLDB foundation so that you can manage them with ease.
One of the key factors in having an optimal setup is getting the basics right and building a solid foundation. This segment briefly reviews some of the initial configuration tips for VLDBs:
Choosing the right database configuration template in DBCA
Selecting the optimal data block size
Sizing adequate system global area (SGA), program global area (PGA), and other memory components
Leveraging data compression
Using temporary tablespaces effectively
Implementing partitioning for easy data management
Making the right choices for partitioned indexes (especially global vs. local)
Enabling parallelism to take advantage of multiple CPUs
Verifying application code for effectiveness before its production deployment
Implementing appropriate backup and recovery strategies
When deploying a new database, it is advisable to follow some basic rules specific to the application category: online transaction processing (OLTP), DSS, or a combination of both. To create a new Oracle database, you can use the Database Configuration Assistant (DBCA), which is a Java-based GUI tool, or you can use the CREATE DATABASE
statement, a manual approach that requires running scripts. Depending on the nature of the application—that is, whether it’s an OLTP or a DSS application—you should choose the appropriate database creation template through DBCA, as shown in Figure 7.1. If the database is intended for DSS or VLDB, choose the Data Warehouse template so that the appropriate database initialization parameters, online redo log sizing, and tablespace sizing are set properly.
It is essential to choose the right database block size, especially for VLDBs and XLDBs, because the data block size has an impact on overall database read and write performance. Although the default 8 KB block size is most appropriate and can meet the demands of an OLTP application, DSS systems generally require a larger block size. If the database is already configured with the default 8 KB block size, the DBA can use Oracle's multiple block size feature. Under some circumstances, using a 16 KB or even a 32 KB database block size for VLDBs and XLDBs yields performance benefits over a 4 KB or 8 KB block size. The default block size is controlled with the db_block_size
initialization parameter.
Oracle supports multiple data block sizes within the same database, but the practice is not encouraged unless your application demands it. A VLDB might use data block sizes of 16 KB or even 32 KB, but only the DBA can determine which block size best supports the application's performance and behavior. Remember, once a database is created with a default block size, that default block size cannot be changed unless the database is re-created.
Following are a few tips for choosing the right block size that meets the application’s needs:
A smaller block size is efficient when rows are smaller and data access is random.
Choose a larger block size to improve the read performance when the rows are smaller and access is sequential or when you have a mixture of random and sequential reads.
A larger block size yields significant read performance gains when rows are large, such as rows containing large object (LOB) columns.
With a large block size on high-concurrency systems, ensure that you have set appropriate values for the INITRANS
and MAXTRANS
parameters.
A larger block size has proportionally less overhead and stores more rows in a single block.
With a larger block size, many rows are brought into the buffer cache with a single read.
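For example, to use a nondefault 16 KB block size for selected tablespaces, you must first allocate a buffer cache for that block size and then create the tablespace with it. The following is a sketch; the cache size, file path, and tablespace name are illustrative:

SQL> ALTER SYSTEM SET db_16k_cache_size = 256M;
SQL> CREATE TABLESPACE data_16k DATAFILE '/oradata/data_16k01.dbf' SIZE 10G BLOCKSIZE 16K;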
Today’s VLDBs and XLDBs commonly grow into terabyte (TB) or even petabyte (PB) sizes. Traditionally, the larger the database, the more tablespaces and datafiles will be required to support the increasing data demands. Depending on the operating system platform, the datafiles of a smallfile tablespace are allowed to grow to a maximum size of 32 GB for an 8 KB block size (128 GB for a 32 KB block size).
To limit the number of datafiles for VLDBs, Oracle introduced the bigfile tablespace concept, which allows a single large datafile per tablespace. However, that single datafile can grow to a maximum size of 32 TB for an 8 KB block size (and up to 128 TB for a 32 KB block size). This feature eliminates the need for numerous datafiles and simplifies tablespace management. Keep in mind that bigfile tablespaces are valid only for locally managed tablespaces with automatic segment space management.
When you create bigfile tablespaces, you need to ensure there is enough free space for growth on the storage/filesystem and that the system supports striping.
SQL> CREATE BIGFILE TABLESPACE data_ts DATAFILE SIZE 10G;
SQL> CREATE BIGFILE TABLESPACE data_ts DATAFILE '/oradata/data_ts01.dbf' SIZE 10G;
Refer to the BIGFILE
column in the V$TABLESPACE
dictionary view to identify whether the tablespace is a traditional smallfile or bigfile tablespace.
SQL> select name,bigfile from v$tablespace;
NAME BIG
------------------------------ ---
SYSTEM NO
UNDOTBS1 NO
SYSAUX NO
USERS NO
TEMP NO
DATA_TS YES
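Because a bigfile tablespace consists of exactly one datafile, it can be resized and autoextended at the tablespace level rather than the datafile level. The sizes in this sketch are illustrative:

SQL> ALTER TABLESPACE data_ts RESIZE 20T;
SQL> ALTER TABLESPACE data_ts AUTOEXTEND ON NEXT 10G;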
Another critical decision a DBA must make concerns the amount of memory resources that should be assigned for the system and program global areas for the database instance used for a VLDB or XLDB. The applications that run against VLDBs typically require much higher rates of data access and process complex calculations, so it is essential to have optimal SGA and PGA to avoid performance pitfalls.
Fortunately, Oracle provides multiple options to conquer memory management and configuration hassles for large-sized database demands. Although you can manually configure memory components such as SGA and PGA separately, you might want to take advantage of the automatic memory management (AMM) feature to handle the SGA and PGA together automatically. It is also critical to ensure there is no frequent dynamic memory allocation and deallocation happening among the SGA components, especially during peak business hours. Review V$SGA_DYNAMIC_COMPONENTS
, V$SGA_CURRENT_RESIZE_OPS
, V$SGA_RESIZE_OPS
, and V$SGA_DYNAMIC_FREE_MEMORY
dynamic performance views to analyze whether the current SGA sizing is optimal for your environment.
There is no rule of thumb to define the perfect memory allocation for SGA and PGA, but you may consider starting with an optimal size and then monitoring the SGA and PGA statistics to adjust these settings as needed. Since VLDB applications tend to perform a lot of sorting, ensure that you configure adequate PGA as well. You can use the V$MEMORY_TARGET_ADVICE
dynamic view to adjust the value of AMM if it is in use; also, the V$SGA_TARGET_ADVICE
and V$PGA_TARGET_ADVICE
dynamic views are helpful for tuning and adjusting the values for SGA_TARGET
and PGA_AGGREGATE_TARGET
, respectively. You can always adjust the SGA settings; sometimes a database restart is necessary for the new values to take effect.
Managing and configuring the proper sizes of temporary tablespaces can become a significant challenge for a DBA as VLDB/DSS applications tend to perform heavy sorting operations quite often.
Essentially, the Oracle database uses temporary tablespaces to segregate temporary work from the real work in the system. Performance bottlenecks on the temporary tablespace are often caused by undersizing, which causes the system to run out of temporary space when concurrent user queries increase the demand for it. Temporary tablespace groups (TTGs), introduced in Oracle Database 10g, provide performance benefits by spreading I/O across multiple temporary tablespaces. A TTG can be created initially with a single temporary tablespace, and more temporary tablespaces can be added later. When configuring a TTG, use equally sized temp files and the same number of temp files for each temporary tablespace in the group. A TTG is a better option than having one large temporary tablespace or assigning multiple separate temporary tablespaces to different users in the database.
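A TTG is created implicitly the first time its name is referenced in a CREATE or ALTER TEMPORARY TABLESPACE statement. The following sketch creates a two-member group and assigns it as the database default; all names, paths, and sizes are illustrative:

SQL> CREATE TEMPORARY TABLESPACE temp1 TEMPFILE '/oradata/temp101.dbf' SIZE 10G TABLESPACE GROUP temp_grp;
SQL> CREATE TEMPORARY TABLESPACE temp2 TEMPFILE '/oradata/temp201.dbf' SIZE 10G TABLESPACE GROUP temp_grp;
SQL> ALTER DATABASE DEFAULT TEMPORARY TABLESPACE temp_grp;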
Managing the massive amounts of data typically stored in a VLDB or XLDB is not a simple task: it demands manageability, availability, and good performance. Oracle's partitioning features offer a wide range of partitioning types to support your data and application needs. Data partitioning eases data management, improves data availability, and delivers significant performance gains by dividing huge amounts of data into much smaller segments.
Oracle continues to improve and expand partitioning capabilities in each new release of the database. Oracle supports table-level, index-level, and index-organized table (IOT)-level partitioning. When partitioning is applied to an object, each partition is treated as a separate segment. According to the need of your application, you can implement an appropriate partition strategy; when you can’t decide on a strategy, or when your data doesn’t call for range or list partitioning, you may choose to go with hash partitioning.
Oracle Database 11g made partitioning management easier with the introduction of interval partitioning, an extension of range partitioning, which allows a DBA to define automated partitioning of data on the basis of specified timestamp or numeric intervals. With this new feature, partitions are created automatically whenever the data demands. And when existing columns do not present an appropriate partitioning key, Oracle 11g also supports the ability to partition data on the basis of the value of an expression that can be stored in a virtual column.
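The following sketch of interval partitioning (table and column names are illustrative) defines one initial range partition and lets Oracle create a new monthly partition automatically whenever incoming rows demand it:

SQL> CREATE TABLE sales_hist (sale_id NUMBER, sale_date DATE, amount NUMBER)
     PARTITION BY RANGE (sale_date)
     INTERVAL (NUMTOYMINTERVAL(1,'MONTH'))
     (PARTITION p_initial VALUES LESS THAN (TO_DATE('01-01-2015','DD-MM-YYYY')));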
DBAs also often agonize over when partitioning should be applied to a table, which column is the best choice for the table’s partitioning key, and whether partitioning should be applied to a table in an OLTP environment. There are many opinions on how best to answer these questions, but a good rule of thumb is that if the table is huge and often requires complex data maintenance, it makes excellent sense to at least consider partitioning regardless of whether an OLTP or DSS application is accessing the table.
One of the tricky tasks that a DBA typically has to address when dealing with DSS applications is deciding between the local and global partitioned indexes for a partitioned table. The type of index partitioning serves an important role in data access, data loading, and partition maintenance, so it generally pays to invest considerable time when making these decisions.
The general perception is that global indexes deliver better performance for OLTP applications, whereas local indexes provide better performance benefits for DSS applications running against VLDBs.
Local index partitions not only provide better performance but also are easier to maintain and offer great application availability. Because each local index partition maps to a single corresponding individual table partition, this generally provides good application availability when a MERGE
, SPLIT
, EXCHANGE
, DROP
, or TRUNCATE
partition operation is applied against one or more table partitions. However, you can't use a local index to enforce a primary key or unique constraint unless the partitioning key is part of the constraint's column list. For global indexes, when MERGE, SPLIT, EXCHANGE, DROP, and similar operations are performed, you need to either rebuild the indexes afterward or add the UPDATE GLOBAL INDEXES clause to the partition maintenance statement.
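For example (object names are illustrative), a local index is created with the LOCAL keyword, and a partition maintenance operation can keep global indexes usable by adding the UPDATE GLOBAL INDEXES clause:

SQL> CREATE INDEX sales_date_idx ON sales(sale_date) LOCAL;
SQL> ALTER TABLE sales DROP PARTITION sales_dec14 UPDATE GLOBAL INDEXES;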
Data retention may differ dramatically among organizations according to their different business needs, regulatory requirements, and even the data owner's desire to keep historical data long past its perceived value. When the retention period is longer, a VLDB tends to encompass huge volumes of historical data, which of course needs a correspondingly huge amount of physical storage capacity. DBAs must therefore carefully consider exactly how much data is being retained and implement appropriate maintenance policies to ensure that storage space is not exhausted unexpectedly or prematurely.
Oracle’s various data compression features may provide significant storage savings. If you are fortunate enough to be using Exadata or the ZFS logical volume manager for your VLDB, those storage platforms offer Hybrid Columnar Compression (HCC) features through which you may be able to save significant storage capacity. If your VLDB is not using these storage platforms, however, you may still be able to make use of Oracle Advanced Compression features to compress data within the database.
Oracle Advanced Compression is a separately licensed feature that provides comprehensive storage savings. As of Oracle Database 12c, Advanced Compression offers myriad features, including Advanced Data Compression, Heat Map, Automatic Data Optimization (ADO), Advanced Row Compression, Advanced Index Compression, and Advanced Network Compression.
Table compression allows you to compress the data within a table. Basic table compression, which doesn't require any additional license, can substantially compress a table's data but applies only to initial direct-path loading. In contrast, Advanced Compression's Advanced Row Compression, called OLTP compression in earlier releases, compresses table rows during all data manipulation language (DML) operations. If you are using either Exadata or ZFS storage, then HCC offers warehouse and archive compression methods in addition to basic and advanced compression.
Here are some examples of table-level compression options:
SQL> CREATE TABLE table_name (column_list...) ROW STORE COMPRESS ADVANCED;
SQL> ALTER TABLE table_name ROW STORE COMPRESS BASIC;
SQL> ALTER TABLE table_name ROW STORE COMPRESS ADVANCED;
SQL> ALTER TABLE table_name MOVE PARTITION partition1 ROW STORE COMPRESS BASIC | ADVANCED;
SQL> ALTER TABLE table_name NOCOMPRESS;
The following query lists the objects on which compression is applied:
SQL> SELECT table_name,compression, compress_for FROM user_tables WHERE compression = 'ENABLED';
Often, historical data in a VLDB becomes less active as it grows older. If this is true of data contained in a partitioned table, Oracle 12c’s new (and complementary) Heat Map and ADO features may be particularly useful. You can define ADO policies and conditions under which data should be moved to a different storage tier or compressed at a higher compression level; ADO will then automatically apply the appropriate compression based on heat map statistics, but only when the corresponding conditions are satisfied.
To enable the Heat Map feature, all that is required is to set the HEAT_MAP
initialization parameter to ON
(the default value is OFF
):
SQL> ALTER SESSION | SYSTEM SET HEAT_MAP = ON;
SQL> ALTER SESSION | SYSTEM SET HEAT_MAP = OFF; -- disable Heat Map
To list the statistical information about the Heat Map settings for database objects, you can use views V$HEAT_MAP_SEGMENT
and USER_HEAT_MAP_SEGMENT
. Views DBA_HEAT_MAP_SEGMENT
and DBA_HEAT_MAP_SEG_HISTOGRAM
provide additional information about the individual data segments and their corresponding heat map settings.
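For example, the following sketch query reports when each segment in a schema was last written, last read, and last full scanned (the schema name is illustrative):

SQL> SELECT object_name, subobject_name, segment_write_time, segment_read_time, full_scan
     FROM dba_heat_map_segment WHERE owner = 'SCHEMA';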
ADO allows you to define the policies at table level, tablespace level, and row level for smart compression and automatic data movement. The following examples demonstrate how to deploy ADO policies at various levels for a table. In this example, when 60 days has elapsed since any DML has been performed against any data in the sales
table, that table’s segments will be automatically compressed using ADVANCED
compression:
SQL> ALTER TABLE sales
ILM ADD POLICY
ROW STORE
COMPRESS ADVANCED
SEGMENT
AFTER 60 DAYS OF NO MODIFICATION;
In the subsequent example, when 30 days has elapsed since any DML has been performed against any data in the sales_dec14
partition of the sales
table, it will be automatically compressed:
SQL> ALTER TABLE sales
MODIFY PARTITION sales_dec14
ILM ADD POLICY
ROW STORE
COMPRESS ADVANCED
ROW
AFTER 30 DAYS OF NO MODIFICATION;
To disable and then remove all ADO policies against the sales
table, use the following syntax:
SQL> ALTER TABLE sales ILM DISABLE_ALL;
SQL> ALTER TABLE sales ILM DELETE_ALL;
Use the following example to query the information on ADO:
SQL> SELECT policy_name, policy_type, enabled FROM user_ilm_policies;
Oracle Database 12c supports Advanced Index Compression, which reduces index storage requirements. The feature supports unique and nonunique indexes while maintaining good performance during index access operations. Compression can be applied at the partition level, as demonstrated in the following:
SQL> CREATE INDEX index_name ON table_name(column_name) COMPRESS ADVANCED HIGH LOCAL
(PARTITION ip1 COMPRESS ADVANCED LOW, PARTITION ip2 COMPRESS HIGH, PARTITION ip3 NOCOMPRESS);
SQL> CREATE INDEX index_name ON table_name(column_name) LOCAL (PARTITION ip1 COMPRESS ADVANCED LOW,
PARTITION ip2 COMPRESS HIGH, PARTITION ip3);
SQL> CREATE INDEX index_name ON table_name(columns_list...) COMPRESS ADVANCED LOW;
Delivering optimal performance for applications that access a VLDB is essential. If Oracle database best practices are applied, many tuning nightmares may disappear. Following are the most common performance issues that can affect VLDBs:
Suboptimal application coding
Inefficient application and database design
Inaccurate or missing optimizer statistics
Lack of indexes or too many indexes
Limited hardware resources
Bad network design
To demonstrate the issue of suboptimal application coding, let’s look at a classic example of a poor performance incident—an issue that was ultimately resolved with a simple code modification.
A reconciliation batch scheduled to run at the end of each business day was performing inconsistently. The job sometimes completed within only 2 hours and sometimes took more than 6 hours; during the database’s worst performance, it took nearly 24 hours to complete. This inconsistent performance not only gave the DBA team nightmares, it also significantly impacted the subsequent dependent jobs, and as a result, several key business users were unable to get their daily reports on time.
The job itself was straightforward, involving no complex queries or convoluted logic. The job was to perform the following simple tasks:
1. Delete data from the target tables.
2. Load data from text files into the target tables.
3. Run queries against the target tables to generate reports.
During the course of investigation, the following behavior was observed:
The database was growing in size at a uniform rate each day.
Important queries were favoring nested loop joins in their execution plans.
As a temporary workaround the following action plan was used:
1. Reorganize the target tables using the alter table ... move
command.
2. Rebuild all indexes in the schema.
3. Gather application schema statistics.
4. If the preceding steps fail to resolve the inconsistent performance, export the schema into another database residing on a different server that has more CPU and memory capacity.
Despite these workarounds, the problem reappeared and gradually returned to its worst state (taking nearly 24 hours to complete). Interestingly, when the person responsible for the application was contacted, he mentioned that data in all tables was deleted and subsequently reloaded from different sources and that the observed growth was inconsistent with the amount of data he was loading daily.
Then the focus shifted to the data load script. Upon reviewing the code, a classic mistake was observed: data was removed from the tables with the DELETE
command followed by direct path loads.
If you know the basics, you can easily spot the reason for the consistent growth in the database. When DELETE
is used to remove all data from a table, the high-watermark is not reset; therefore, subsequent direct-path loads insert data above the high-watermark, ignoring the free data blocks below it. This approach caused continuing data growth and consumed significant amounts of free space in the datafiles; hence, queries on these tables were performing badly.
The simple solution to this issue was to replace the DELETE
command with TRUNCATE TABLE
. Once this change was applied, the job finished in just 2 hours. There have been no issues with performance in the 6 years since the problem was encountered.
The moral of the story is that if you get the basics right, tuning the code just a little could offer significant benefits for many performance issues.
Data loading is a common practice in data warehouse databases. In general, indexes improve queries’ data selection performance; however, having too many indexes sometimes hampers DML (insert, update, and delete) performance. When data loading doesn’t produce expected throughput and when a DBA receives performance-related concerns, one of the key areas to investigate is the number of indexes on the table.
One of the typical methods used to improve the performance of bulk data loading is to disable indexes and constraints on the table and then rebuild them after data loading completes. Although this may seem like a good workaround, it can be quite time consuming to rebuild indexes on extremely large tables. Therefore, it is recommended to minimize the number of indexes on the table to improve the data loading process.
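One common sketch of this approach (the index name and degree of parallelism are illustrative) is to mark the indexes unusable, tell the session to skip them, perform the load, and then rebuild:

SQL> ALTER INDEX sales_date_idx UNUSABLE;
SQL> ALTER SESSION SET skip_unusable_indexes = TRUE;
-- perform the direct-path load here, then rebuild:
SQL> ALTER INDEX sales_date_idx REBUILD PARALLEL 8 NOLOGGING;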
There are a few approaches to find out which indexes are being used and which are not. One of the methods is to enable monitoring of index usage with the ALTER INDEX ... MONITORING USAGE
command. Once you activate monitoring for indexes, you can verify their usage through the V$OBJECT_USAGE
dynamic view; if monitoring shows that some of the indexes are not being used, you may consider putting them in invisible mode and dropping them after obtaining permission from the application owner/developers.
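For example (the index name is illustrative):

SQL> ALTER INDEX sales_date_idx MONITORING USAGE;
SQL> SELECT index_name, monitoring, used FROM v$object_usage;
SQL> ALTER INDEX sales_date_idx NOMONITORING USAGE;
SQL> ALTER INDEX sales_date_idx INVISIBLE;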
Oracle Database 12c gives you the ability to create partial indexes on a partitioned table. This cool feature provides the flexibility to turn on and off local and global indexing for individual partitions of a table. The following example demonstrates how to create a table with an INDEXING OFF|ON
option:
SQL> CREATE TABLE emp (eno number(9),ename varchar2(100),dob date, dept number(2), salary
number(6)) INDEXING OFF
PARTITION BY RANGE (salary)
(PARTITION p10000 values less than (10001) INDEXING OFF,
partition p20000 values less than (20001) INDEXING ON,
partition p30000 values less than (30001));
SQL> CREATE INDEX index_p1 ON emp(eno) GLOBAL INDEXING FULL|PARTIAL;
SQL> CREATE INDEX index_p2 ON emp(salary) LOCAL INDEXING PARTIAL;
SQL> ALTER TABLE emp MODIFY PARTITION p10000 INDEXING ON;
When indexes are created with the PARTIAL
option on a table:
For a local index, partitions that are tagged as INDEXING OFF
will have an UNUSABLE
index status, and partitions that are tagged as INDEXING ON
will have a USABLE
index status.
For a global index, only those partitions that are tagged as INDEXING ON
will be active, and the rest will be excluded.
To view an index’s status, refer to the INDEXING
column in DBA|USER_PART_TABLES
, DBA|USER_TAB_PARTITIONS
, and DBA|USER_TAB_SUBPARTITIONS
. Keep in mind that a partial index can’t be applied to unique indexes.
Two things are very common in VLDB environments: huge hardware resources and massive data. The queries that access VLDBs often either scan large amounts of data or perform massive computations on huge data sets. When we have a solid hardware presence, it is up to us to make the best use of those resources to improve overall application performance. One of the best ways to do this is to divide and conquer: that is, split the workload of a query into smaller pieces so that it completes as quickly as possible. Don't panic. We are not talking about rocket science here, just leveraging Oracle's capabilities in regard to parallelism.
When parallelism is applied at any level—database, table, or query—Oracle distributes the workload of the query among multiple slave processes and completes the job more quickly than if it were executed serially. Parallel execution utilizes multiple CPU and I/O resources on the servers to achieve improved performance by reducing the response time needed to execute the query.
As just mentioned, you can define parallelism at various levels in an Oracle database:
Object level (table or index)
Session level
Database level
At statement runtime
If you are uncertain about the degree of parallelism to use, you can leave this to Oracle, as Oracle will automatically determine the number of parallel processes to use during execution based on the settings for initialization parameters cpu_count
and parallel_threads_per_cpu
. Following are some of the key initialization parameters pertaining to parallel execution and their default values:
parallel_degree_policy string MANUAL
parallel_execution_message_size integer 16384
parallel_max_servers integer 240
parallel_min_percent integer 0
parallel_min_servers integer 0
parallel_server boolean TRUE
parallel_servers_target integer 96
parallel_threads_per_cpu integer 2
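The following sketch shows parallelism being requested at some of these levels; the table name and degree of parallelism are illustrative:

SQL> ALTER TABLE sales PARALLEL 8;                         -- object level
SQL> ALTER SESSION ENABLE PARALLEL DML;                    -- session level
SQL> SELECT /*+ PARALLEL(s, 8) */ COUNT(*) FROM sales s;   -- statement level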
Using parallelism is generally advised while dealing with huge amounts of data. For example, the following are some of the statements that might benefit from parallelism:
CTAS (CREATE TABLE AS SELECT)
ALTER TABLE ... MOVE | SPLIT PARTITION
ALTER INDEX REBUILD
CREATE INDEX
INSERT AS SELECT
INSERT /*+ APPEND */ (direct-path INSERTs)
If you are working with an Oracle Real Application Clusters (RAC) database, when you set the PARALLEL_FORCE_LOCAL
initialization parameter value to TRUE
, it restricts the SQL statement parallel execution to a single instance.
Maintaining accurate and up-to-date optimizer statistics is always critical to enabling the Oracle optimizer to generate optimal SQL execution plans. Many performance issues can be resolved simply by gathering fresh optimizer statistics. The challenge comes when you have to gather optimizer statistics on large tables. You must consider the time required to gather the statistics, determine how often statistics must be gathered, and find the most efficient method of gathering them. The following sections provide some best practices for collecting accurate optimizer statistics quickly on large tables and table partitions.
Accurate and consistent optimizer statistics on tables helps to improve query performance; however, that need must be balanced against how to collect and maintain accurate statistics consistently on large partitioned tables in a timely manner. Although several workarounds and methods exist to accomplish this, we focus on the benefits of gathering incremental statistics and look at real-world examples.
The concept of gathering incremental statistics was first introduced in Oracle Database 11g with the intention to enhance the performance of gathering statistics at global and partition levels on large partitioned tables. Incremental statistics on partitioned tables are gathered at the global, partition, and subpartition levels. When incremental preferences are enabled, the statistical metadata for each partition of the table are stored in the form of a synopsis in the SYSAUX
tablespace.
The synopsis helps to produce accurate global-level statistics by aggregating partition-level statistics, which eliminates the need to scan the full table. In this context, whenever a new partition is added, the global statistics are updated automatically using the existing partitions' synopses and the new partition's synopsis. Whenever 10 percent of the data in a partitioned table changes, the statistics become stale and need to be regathered.
Note
Bug 16851194 reports continued growth in the SYSAUX
tablespace without growth in actual data. Therefore, it is advisable to monitor the SYSAUX
tablespace’s growth periodically to ensure you are not hitting this bug. The bug was fixed in Oracle 12.1.0.2.
Starting in Oracle Database 12c, the default behavior for stale statistics is controlled with the new INCREMENTAL_STALENESS
argument. The default value for INCREMENTAL_STALENESS
is NULL
, which invokes the same behavior as in Oracle 11g. Also in Oracle 12c, global-level statistics are automatically calculated for any table whose statistics are either locked or stale.
Additionally, the USE_STALE_PERCENT
value for INCREMENTAL_STALENESS, used together with the STALE_PERCENT preference, can define a different threshold for statistics staleness. The default threshold is 10 percent, but it can be overridden so that statistics will not be considered stale until the percentage of changed rows has reached the specified value:
SQL> exec dbms_stats.set_database_prefs('INCREMENTAL_STALENESS', 'USE_STALE_PERCENT');
SQL> exec dbms_stats.set_database_prefs('STALE_PERCENT', '25');
Use the following example to get the current value for STALE_PERCENT
:
SQL> SELECT dbms_stats.get_prefs('STALE_PERCENT', 'SCHEMA', 'TABLENAME') stale_value FROM dual;
To maintain and gather incremental statistics on a partitioned table, you must run through the following procedure first:
1. The INCREMENTAL
statistics preference is not enabled by default; verify its current setting using this code:
SQL> SELECT dbms_stats.get_prefs('INCREMENTAL', 'SCHEMA', 'TABLENAME')
tab_incr_prefs FROM dual;
TAB_INCR_PREFS
-------------------------------------------------------------------
FALSE
2. Turn on the INCREMENTAL
statistics preference on the table:
SQL> exec dbms_stats.set_table_prefs('SCHEMA', 'TABLENAME', 'INCREMENTAL', 'TRUE');
3. Gather statistics on the table, and then verify that the incremental flag is turned on:
SQL> exec dbms_stats.gather_table_stats('SCHEMA',
'TABLENAME', ESTIMATE_PERCENT=>dbms_stats.auto_sample_size);
SQL> SELECT dbms_stats.get_prefs('INCREMENTAL', 'SCHEMA', 'TABLENAME')
tab_inc_perf FROM dual;
TAB_INC_PERF
-------------------------------------------------------------------
TRUE
The following query identifies which tables have the INCREMENTAL statistics flag turned on:
SQL> SELECT owner, object_name
FROM dba_objects
WHERE object_id IN
(SELECT DISTINCT(obj#)
FROM sys.optstat_user_prefs$
WHERE PNAME='INCREMENTAL'
AND VALCHAR='TRUE');
This example shows how to activate incremental statistics at the schema level:
SQL> exec dbms_stats.set_schema_prefs('SCHEMA', 'INCREMENTAL', 'TRUE');
Use the following syntax to disable the incremental statistics for an individual table:
SQL> exec dbms_stats.set_table_prefs('SCHEMA', 'TABLENAME', 'INCREMENTAL', 'FALSE');
Use the following syntax to disable the incremental statistics at the schema level:
SQL> exec dbms_stats.set_schema_prefs('SCHEMA', 'INCREMENTAL', 'FALSE');
Gathering statistics in a schema sometimes can be time consuming because of the large number of tables and indexes. Before Oracle 11.2.0.2, there was no direct method available to gather statistics concurrently on multiple objects. The concurrent statistics-gathering feature (also known as interobject parallelism) in Oracle 11.2.0.2 provides the ability to gather statistics on multiple tables or partitions concurrently. The objective of this feature is to fully utilize the database server’s hardware resources to improve the efficiency of gathering statistics and thus reduce the overall time required to gather them.
This behavior is controlled by the CONCURRENT global preference of the DBMS_STATS package. By default, the value of CONCURRENT is FALSE; in order to enable this feature, it must be explicitly set to TRUE:
SQL> exec dbms_stats.set_global_prefs('CONCURRENT', 'TRUE');
SQL> SELECT dbms_stats.get_prefs('CONCURRENT') FROM dual;
DBMS_STATS.GET_PREFS('CONCURRENT')
----------------------------------------------------------------
TRUE
When the value is set to TRUE, Oracle takes advantage of the Job Scheduler and Advanced Queuing (AQ) mechanisms to perform concurrent statistics gathering. The number of tables processed concurrently is directly proportional to the value of the JOB_QUEUE_PROCESSES initialization parameter. When schema-level statistics gathering is triggered, the number of tables for which statistics will be gathered concurrently is determined by the JOB_QUEUE_PROCESSES value; the remaining tables in the schema are queued until statistics gathering for the current batch of tables is completed. When schema-level or database-level statistics gathering is triggered, a separate job is created for each partitioned and nonpartitioned table.
In Oracle 12c, the CONCURRENT preference can be set to MANUAL, AUTOMATIC, ALL, or OFF (the default value). When concurrency is enabled for a partitioned table, a separate job is triggered for each partition of the table. Note that CONCURRENT is a global preference; it cannot be set at the table or schema level. When you have a schema with a large number of tables and partitions, make use of the CONCURRENT feature to reduce the overall time required to gather statistics. Setting the value to AUTOMATIC allows the automatic statistics-gathering job to take advantage of concurrent statistics gathering.
Also, ensure that the RESOURCE_MANAGER_PLAN and JOB_QUEUE_PROCESSES initialization parameters are set to appropriate values.
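For example, both parameters can be set with ALTER SYSTEM; the plan name and process count shown here are illustrative only and should be sized for your own server:
SQL> ALTER SYSTEM SET resource_manager_plan = 'DEFAULT_PLAN';
SQL> ALTER SYSTEM SET job_queue_processes = 16;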
To review or monitor the concurrent statistics-gathering jobs, use the following examples:
SQL> SELECT owner,job_name,state,start_date,max_run_duration
FROM dba_scheduler_jobs
WHERE job_class like 'CONCURRENT%' AND state in ('RUNNING', 'SCHEDULED');
SQL> SELECT job_name, state, comments
FROM dba_scheduler_jobs
WHERE job_class LIKE 'CONC%' and STATE = 'RUNNING';
SQL> SELECT job_name, elapsed_time
FROM dba_scheduler_running_jobs WHERE job_name LIKE 'ST$%';
Oracle Database 12c offers a significant enhancement: statistics gathering for very small and empty tables is consolidated into a single batch job, which reduces the overhead of statistics maintenance.
Another vital consideration when gathering table statistics is to decide on an appropriate estimate percentage. Although gathering statistics against 100 percent of a table’s data will certainly provide accurate statistics, the time and computing resources required for this task—especially for extremely large tables—may have a significant negative impact on your database’s performance.
Obviously, it can be tough to settle on a value for the estimate percentage, as both large and small values have their own sets of benefits and disadvantages. The AUTO sampling statistics-gathering feature of Oracle 11g helps to automatically determine an appropriate estimate percentage while gathering table statistics. The accuracy of AUTO sampling is nearly that of a 100 percent sample size; additionally, it takes less time to complete statistics gathering using AUTO sampling. The new AUTO sample algorithm—now implemented by default in Oracle Database 11g Release 2—also influences how index statistics are gathered. The AUTO sample algorithm performs a full table scan instead of using a sampling clause to build a synopsis per column for column-level number of distinct values (NDV) calculations.
The following example demonstrates how to use the AUTO sampling size with the DBMS_STATS package. In order to use the incremental statistics-gathering feature mentioned previously, ESTIMATE_PERCENT must be specified as AUTO_SAMPLE_SIZE:
SQL> exec dbms_stats.gather_table_stats(null, 'TABLENAME',
     estimate_percent=>dbms_stats.auto_sample_size);
It doesn’t matter whether your database is supporting OLTP or DSS workloads—the data stored within is critical to your organization’s business continuity. One of the prime responsibilities of a DBA is to protect the data from any potential disasters and, when necessary, recover the database according to the application service level agreement (SLA). For VLDBs and XLDBs, backup and recovery thus becomes a prime concern not only to the DBA but also to the business organization’s management. Following are several strategies that will help improve performance for backup and recovery, including reducing overall backup time, optimizing backup duration and size, and reducing database recovery time:
Configure fast incremental backups using the block change tracking (BCT) feature. This saves a good amount of backup time because RMAN scans only the blocks that have been modified since the previous incremental backup.
Perform a weekly full backup followed by cumulative daily incremental backups.
Use RMAN incremental updates/merge backups to significantly reduce database recovery time.
If sufficient disk space is not a barrier, make backups to disk first and then copy them to tape.
For tablespaces that contain historical or static data, place them in read-only mode, back them up once, and then exclude them from the daily backups.
Make use of multisection backups to back up large datafiles so that data in the files can be backed up in parallel.
Recovering VLDBs and XLDBs can be extremely time consuming. Therefore, activate flashback logging and use guaranteed restore points to overcome known logical errors.
Compress historical data to gain storage savings, which ultimately reduces backup time and size.
If you have implemented Oracle Data Guard, back up your database from the physical standby database instead of the primary database to reduce resource demands.
Use RMAN compressed backups to reduce backup sizes.
If your VLDB or XLDB is extremely large and backup processing exceeds the time window allotted, use the PARTIAL option in concert with the DURATION directive to permit RMAN to retain backups of the database even though all datafiles haven’t been backed up. This practice essentially spreads backup creation over multiple days within the backup execution window defined by DURATION.
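As an illustration, the following sketch combines several of these recommendations; the tracking-file path, tablespace name, section size, and duration window are placeholders to adapt to your environment:
SQL> ALTER DATABASE ENABLE BLOCK CHANGE TRACKING
     USING FILE '/u01/app/oracle/bct.chg';
SQL> ALTER TABLESPACE hist_data READ ONLY;
RMAN> CONFIGURE EXCLUDE FOR TABLESPACE hist_data;
RMAN> BACKUP AS COMPRESSED BACKUPSET INCREMENTAL LEVEL 1 CUMULATIVE
      DATABASE SECTION SIZE 64G;
RMAN> BACKUP DURATION 04:00 PARTIAL MINIMIZE TIME DATABASE;
The first backup statement combines compression, a cumulative incremental level, and multisection processing of large datafiles; the second constrains the backup to a four-hour window while keeping whatever was completed.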
When the Exadata Database Machine was introduced, it was marketed as an excellent solution to deliver extreme performance for data warehouse databases. Although later versions of Exadata support OLTP workloads as well as DSS systems, it is still highly recommended to consider Exadata solutions for VLDBs and XLDBs.
With its state-of-the-art technologies and massive hardware resources, Exadata genuinely delivers extreme performance for DSS applications. Exadata combines a significant amount of hardware resources (CPU and memory), fast networking components (40 Gb/s InfiniBand), flash disks, and numerous software features such as query offloading, HCC, and smart features (smart flash cache, smart scan) to deliver excellent performance. If an Exadata solution fits your organization’s budget for a data warehouse, VLDB, or XLDB, use it and get the extreme performance benefits.
Many companies implement a Data Guard configuration for their data warehousing databases. If you work for one of these companies, consider using the existing Data Guard environment to maximize the return on that investment by offloading workload from the production database.
If you have an Active Data Guard physical standby configured for your primary production VLDB, you can leverage it to offload or redirect some of the workload to that standby database. For instance, if the business department wants to generate daily reports after the nightly batch or data loading, using aggregate or complex queries against huge amounts of data, you can shift those queries to run against the physical standby database instead. This approach reduces resource utilization on the production database server as well as balancing the workload.
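One common way to implement this (the database and service names here are hypothetical) is to define a database service that is active only when the database runs in the physical standby role, and point the reporting tool’s connection at that service:
$ srvctl add service -d dwprod -s rpt_svc -l PHYSICAL_STANDBY
$ srvctl start service -d dwprod -s rpt_svc
Because the service starts only on the standby, report sessions never land on the primary, and the routing follows the databases automatically after a role switchover.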
Configuring and managing a VLDB is a challenging task. The DBA needs a set of advanced skills to keep the database application operating smoothly. This chapter addressed the most common recommendations for the basic setup, configuration, performance, and maintenance of very large and extremely large databases.