December 6, 2012

Quickly Improve SQL Performance with dbms_stats

Oracle Tips by Burleson Consulting
Updated for 11g on September 17, 2010

The old-fashioned “analyze table” and dbms_utility methods for generating CBO statistics are obsolete and somewhat dangerous to SQL performance. This is because the cost-based SQL optimizer (CBO) relies on the quality of the statistics to choose the best execution plan for all SQL statements. The dbms_stats utility does a far better job of estimating statistics, especially for large partitioned tables, and the better statistics result in faster SQL execution plans.

Let’s see how dbms_stats works. It’s easy! Here is a sample execution of dbms_stats for a whole schema:

exec dbms_stats.gather_schema_stats( -
ownname          => 'SCOTT', -
estimate_percent => dbms_stats.auto_sample_size, -
method_opt       => 'for all columns size repeat', -
degree           => 34);

The options parameter controls which objects are analyzed. When GATHER AUTO is specified, the only additional valid parameters are ownname, stattab, statid, objlist and statown; all other parameter settings are ignored.

exec dbms_stats.gather_schema_stats( -
ownname => 'SCOTT', -
options => 'GATHER AUTO');

There are several values for the options parameter that we need to know about:

gather – re-analyzes the whole schema.
gather empty – Only analyze tables that have no existing statistics.
gather stale – Only re-analyze tables with more than 10% modifications (inserts, updates, deletes).
gather auto – This will re-analyze objects which currently have no statistics and objects with stale statistics. Using gather auto is like combining gather stale and gather empty.

Note that both gather stale and gather auto require monitoring. If you issue the “alter table xxx monitoring” command, Oracle tracks changed tables in the dba_tab_modifications view. (From 10g onwards, monitoring is enabled automatically when statistics_level is set to typical or all.) Below we see that the number of inserts, updates and deletes is tracked since the last analysis of statistics.

SQL> desc dba_tab_modifications;

Name                Type
------------------- ------------
TABLE_OWNER         VARCHAR2(30)
TABLE_NAME          VARCHAR2(30)
PARTITION_NAME      VARCHAR2(30)
SUBPARTITION_NAME   VARCHAR2(30)
INSERTS             NUMBER
UPDATES             NUMBER
DELETES             NUMBER
TIMESTAMP           DATE
TRUNCATED           VARCHAR2(3)
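
To look at these counters for one schema, a query along the following lines can be used. This is only a sketch: it assumes the SCOTT schema from the earlier examples and a session that can read the DBA views. The flush call is included because the counters are held in memory and written to the view only periodically.

```sql
-- Write the in-memory DML counters to the dictionary so the view is current
-- (assumes the session has the privileges needed to run this procedure).
exec dbms_stats.flush_database_monitoring_info;

-- Changes recorded since statistics were last gathered.
-- 'SCOTT' is just the example schema used earlier in this article.
select table_name,
       partition_name,
       inserts,
       updates,
       deletes,
       truncated,
       timestamp
from   dba_tab_modifications
where  table_owner = 'SCOTT'
order  by inserts + updates + deletes desc;
```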

The most interesting of these options is the gather stale option. Because all statistics will become stale quickly in a robust OLTP database, we must remember that the rule for gather stale is a row change of more than 10% (based on num_rows at statistics collection time).

In a busy OLTP database, almost every table except read-only tables will therefore be re-analyzed with the gather stale option, so it saves little work there. The gather stale option is best for systems that are largely read-only. For example, if only 5% of the database tables get significant updates, then only 5% of the tables will be re-analyzed with the “gather stale” option.
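
The 10% rule can be approximated by comparing the tracked changes with num_rows. The query below is a rough sketch, not Oracle's internal staleness test. It assumes the SCOTT schema, looks only at table-level rows, and silently skips tables that have never been analyzed (num_rows is null).

```sql
-- Tables whose tracked DML exceeds 10% of num_rows at the last analysis.
-- Illustrative approximation only; 'SCOTT' is the example schema.
select t.table_name,
       t.num_rows,
       m.inserts + m.updates + m.deletes as changed_rows,
       round(100 * (m.inserts + m.updates + m.deletes)
             / nullif(t.num_rows, 0), 1) as pct_changed
from   dba_tables t
join   dba_tab_modifications m
on     m.table_owner = t.owner
and    m.table_name  = t.table_name
where  t.owner = 'SCOTT'
and    m.partition_name is null          -- table-level rows only
and    m.inserts + m.updates + m.deletes > 0.10 * t.num_rows
order  by pct_changed desc nulls last;
```

A small result set on a mostly read-only schema is what makes gather stale cheap there.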

The CASCADE Option

When analyzing specific tables, the cascade option can be used to analyze all related indexes.

The Oracle documentation notes that using the cascade option gathers statistics on the table, plus all indexes for the target table. Using this option is equivalent to running gather_table_stats plus running gather_index_stats for each index on the table. If you always want indexes analyzed when running gather_table_stats, you can use the set_database_prefs, set_global_prefs, or set_table_prefs procedures to always include indexes when gather_table_stats is executed.

exec dbms_stats.gather_table_stats( -
ownname          => 'PERFSTAT', -
tabname          => 'STATS$SNAPSHOT', -
estimate_percent => dbms_stats.auto_sample_size, -
method_opt       => 'for all columns size skewonly', -
cascade          => true, -
degree           => 7);
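
As a sketch of the preference approach mentioned above, the following sets cascade as a table-level default and then reads it back. It assumes 11g or later, where the preference procedures are available, and reuses the PERFSTAT table from the example purely for illustration.

```sql
begin
  -- Make cascade the default for this one table (11g and later).
  dbms_stats.set_table_prefs(
    ownname => 'PERFSTAT',
    tabname => 'STATS$SNAPSHOT',
    pname   => 'CASCADE',
    pvalue  => 'TRUE');
end;
/

-- Read the stored preference back to confirm it.
select dbms_stats.get_prefs('CASCADE', 'PERFSTAT', 'STATS$SNAPSHOT') as cascade_pref
from   dual;
```

With the preference in place, a later gather_table_stats call that omits the cascade argument should pick up this default.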

The DEGREE Option

Note that you can also parallelize the collection of statistics, because gathering statistics performs full-table and full-index scans. When you set degree=x, Oracle will invoke parallel query slave processes to speed up table access. Degree is usually about equal to the number of CPUs, minus 1 (for the OPQ query coordinator).
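
One way to apply the "CPUs minus 1" rule of thumb is to read cpu_count and derive the degree from it. This is a minimal sketch under stated assumptions: the variable names are made up for illustration, the table is the PERFSTAT example used above, and the block is run as an anonymous block by a user who can query v$parameter.

```sql
declare
  l_cpus   number;   -- hypothetical variable names, for illustration only
  l_degree number;
begin
  -- cpu_count as seen by this instance
  select to_number(value)
  into   l_cpus
  from   v$parameter
  where  name = 'cpu_count';

  -- Leave one CPU for the query coordinator, but never go below 1.
  l_degree := greatest(l_cpus - 1, 1);

  dbms_stats.gather_table_stats(
    ownname => 'PERFSTAT',
    tabname => 'STATS$SNAPSHOT',
    degree  => l_degree);
end;
/
```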

Automating Sample Size with dbms_stats

Now that we see how the dbms_stats options work, let’s see how to specify the sample size for dbms_stats. The following estimate_percent argument is a new way to allow Oracle’s dbms_stats to automatically estimate the “best” percentage of a segment to sample when gathering statistics:

estimate_percent => dbms_stats.auto_sample_size

You can verify the accuracy of the automatic statistics sampling by looking at the dba_tables sample_size column. It is interesting to note that Oracle chooses between 5% and 20% for a sample_size when using automatic sampling.
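
A query such as the following shows the sample that was actually used for each table. It is a simple sketch that again assumes the SCOTT schema; the pct_sampled alias is only a convenience name.

```sql
-- Compare the rows sampled with the row count recorded at the last analysis.
select table_name,
       num_rows,
       sample_size,
       round(100 * sample_size / nullif(num_rows, 0), 1) as pct_sampled,
       last_analyzed
from   dba_tables
where  owner = 'SCOTT'
and    last_analyzed is not null
order  by table_name;
```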

In our next installment we will look at automating the collection of histogram data from dbms_stats.

11g Update: Oracle guru Guy Harrison also offers this advice for 11g statistics collection on function-based index columns.

In 11g, I think there are two other ways to get statistics collected for indexed expressions:

1) Collect extended statistics directly on the expression. So for instance, if we had a function SALE_CATEGORY, we might do this:

begin
DBMS_STATS.gather_table_stats
(ownname => USER,
tabname => 'SALES',
method_opt => 'FOR ALL COLUMNS FOR COLUMNS (sale_category(amount_sold))' );
end;

2) Create a virtual column on the expression, then index that column. So for the same example as above we might create the following virtual column, then index the column and collect stats as usual:

ALTER TABLE
SALES
ADD
sales_category
GENERATED ALWAYS AS
(sale_category(amount_sold));

I think I like the first method better, because the statistics will still exist even if the index is dropped and – unlike the second approach – it doesn’t change the logical structure of the table.
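
For completeness, the remaining steps of the second approach might look like the sketch below. The index name sales_category_ix is hypothetical, and the sketch assumes the virtual column from the ALTER TABLE statement above already exists, which in turn requires sale_category to be declared deterministic.

```sql
-- Index the virtual column (the index name is made up for this example).
create index sales_category_ix on sales (sales_category);

-- Collect statistics as usual; cascade also covers the new index.
begin
  dbms_stats.gather_table_stats(
    ownname => user,
    tabname => 'SALES',
    cascade => true);
end;
/
```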

9i (SYSTEM STATISTICS)

For dbms_stats.get_system_stats, the parameter name to get (pname) can have one of the following values:

sreadtim – average time to read single block (random read), in milliseconds
mreadtim – average time to read an mbrc block at once (sequential read), in milliseconds
cpuspeed – average number of CPU cycles per second, in millions
mbrc – average multiblock read count for sequential read, in blocks
maxthr – maximum I/O system throughput, in bytes/sec
slavethr – average slave I/O throughput, in bytes/sec

10g (SYSTEM STATISTICS)
From 10g onwards the same set of parameters applies.
The parameter name to get can have one of the following values:

iotfrspeed – I/O transfer speed in bytes for each millisecond
ioseektim – seek time + latency time + operating system overhead time, in milliseconds
sreadtim – average time to read single block (random read), in milliseconds
mreadtim – average time to read an mbrc block at once (sequential read), in milliseconds
cpuspeed – average number of CPU cycles for each second, in millions, captured for the workload (statistics collected using ‘INTERVAL’ or ‘START’ and ‘STOP’ options)
cpuspeednw – average number of CPU cycles for each second, in millions, captured for the noworkload (statistics collected using the ‘NOWORKLOAD’ option)
mbrc – average multiblock read count for sequential read, in blocks
maxthr – maximum I/O system throughput, in bytes/second
slavethr – average slave I/O throughput, in bytes/second
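
As a sketch of how one of these values can be read, the block below calls dbms_stats.get_system_stats for cpuspeednw and then lists what is stored in the dictionary. The variable names are illustrative, and the second query assumes access to the SYS-owned aux_stats$ table.

```sql
set serveroutput on

declare
  l_status varchar2(30);   -- illustrative variable names
  l_start  date;
  l_stop   date;
  l_value  number;
begin
  -- Fetch a single system statistic by its parameter name.
  dbms_stats.get_system_stats(
    status => l_status,
    dstart => l_start,
    dstop  => l_stop,
    pname  => 'cpuspeednw',
    pvalue => l_value);

  dbms_output.put_line('cpuspeednw = ' || l_value || ' (' || l_status || ')');
end;
/

-- All system statistics currently stored in the dictionary.
select pname, pval1
from   sys.aux_stats$
where  sname = 'SYSSTATS_MAIN'
order  by pname;
```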

Arup Nanda has a great article on extended statistics with dbms_stats, specifically histogram analysis using function-based columnar data:

Next, re-gather statistics on the table and collect the extended statistics on the expression upper(cust_name).


begin
  dbms_stats.gather_table_stats (
     ownname    => 'ARUP',
     tabname    => 'CUSTOMERS',
     method_opt => 'for all columns size skewonly for columns (upper(cust_name))'
  );
end;

Alternatively you can define the column group as part of the gather statistics command.

You do that by placing these columns in the method_opt parameter of the gather_table_stats procedure in dbms_stats as shown below:

begin
   dbms_stats.gather_table_stats (
      ownname          => 'ARUP',
      tabname          => 'BOOKINGS',
      estimate_percent => 100,
      method_opt       => 'FOR ALL COLUMNS SIZE SKEWONLY FOR COLUMNS(HOTEL_ID,RATE_CATEGORY)',
      cascade          => true
   );
end;
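
To check which extensions now exist, the dba_stat_extensions view (11g and later) can be queried. This is a small sketch that assumes the ARUP schema and the two tables from the examples above.

```sql
-- List the expression and column-group extensions on the example tables.
select table_name,
       extension_name,
       extension
from   dba_stat_extensions
where  owner = 'ARUP'
and    table_name in ('CUSTOMERS', 'BOOKINGS')
order  by table_name, extension_name;
```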

ref: http://www.dba-oracle.com/oracle_tips_dbms_stats1.htm

clusterclouds
