# Quickly Improve SQL Performance with dbms_stats

*Oracle Tips by Burleson Consulting
Updated for 11g on September 17, 2010*

The old-fashioned "*analyze table*" and *dbms_utility* methods for generating CBO statistics are obsolete and somewhat dangerous to SQL performance, because the cost-based SQL optimizer (CBO) relies on the quality of the statistics to choose the best execution plan for every SQL statement. The *dbms_stats* utility does a far better job of estimating statistics, especially for large partitioned tables, and better statistics result in faster SQL execution plans.

Let's see how *dbms_stats* works. It's easy! Here is a sample execution of *dbms_stats* with the *options* clause:

```sql
exec dbms_stats.gather_schema_stats(
   ownname          => 'SCOTT',
   estimate_percent => dbms_stats.auto_sample_size,
   method_opt       => 'for all columns size repeat',
   degree           => 34);
```

When the *options* clause is specified, you may specify GATHER options. When GATHER AUTO is specified, the only additional valid parameters are *ownname*, *stattab*, *statid*, *objlist* and *statown*; all other parameter settings are ignored.

```sql
exec dbms_stats.gather_schema_stats(
   ownname => 'SCOTT',
   options => 'GATHER AUTO');
```

There are several values for the *options* parameter that we need to know about:

- *gather* – re-analyzes the whole schema.
- *gather empty* – only analyzes tables that have no existing statistics.
- *gather stale* – only re-analyzes tables with more than 10% modifications (inserts, updates, deletes).
- *gather auto* – re-analyzes objects which currently have no statistics and objects with stale statistics. Using *gather auto* is like combining *gather stale* and *gather empty*.
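For instance, to refresh only the tables that have crossed the staleness threshold, the *options* argument can be set to GATHER STALE (the SCOTT schema here is just a placeholder):

```sql
exec dbms_stats.gather_schema_stats(
   ownname => 'SCOTT',
   options => 'GATHER STALE');
```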

Note that both *gather stale* and *gather auto* require monitoring. If you issue the "*alter table xxx monitoring*" command, Oracle tracks changed tables in the *dba_tab_modifications* view. Below we see that the exact numbers of inserts, updates and deletes are tracked since the last statistics collection.

```sql
SQL> desc dba_tab_modifications;

 Name                 Type
 -------------------- -------------
 TABLE_OWNER          VARCHAR2(30)
 TABLE_NAME           VARCHAR2(30)
 PARTITION_NAME       VARCHAR2(30)
 SUBPARTITION_NAME    VARCHAR2(30)
 INSERTS              NUMBER
 UPDATES              NUMBER
 DELETES              NUMBER
 TIMESTAMP            DATE
 TRUNCATED            VARCHAR2(3)
```
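To see this tracking in action, you might enable monitoring on a table and then query the view (the EMP table is illustrative; note that modification counts are flushed to the view periodically, or on demand with *flush_database_monitoring_info*):

```sql
alter table emp monitoring;

-- Force pending monitoring data out to the dictionary (optional)
exec dbms_stats.flush_database_monitoring_info;

select table_name, inserts, updates, deletes, timestamp
from   dba_tab_modifications
where  table_owner = 'SCOTT';
```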

The most interesting of these options is *gather stale*. Because statistics become stale quickly in a busy OLTP database, we must remember the rule for *gather stale*: more than 10% of the rows changed (based on *num_rows* at statistics collection time).

Hence, almost every table except read-only tables will be re-analyzed with the *gather stale* option, which makes *gather stale* best suited to systems that are largely read-only. For example, if only 5% of the database tables get significant updates, then only 5% of the tables will be re-analyzed with the *gather stale* option.
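Under that 10% rule, a rough staleness check can be written by comparing the tracked modifications against the row count recorded at the last analysis (an illustrative query; exact column availability varies by release):

```sql
-- Tables whose tracked changes exceed 10% of the analyzed row count
select m.table_owner, m.table_name,
       m.inserts + m.updates + m.deletes as changes,
       t.num_rows
from   dba_tab_modifications m,
       dba_tables            t
where  t.owner      = m.table_owner
and    t.table_name = m.table_name
and    t.num_rows   > 0
and    (m.inserts + m.updates + m.deletes) / t.num_rows > 0.10;
```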

**The CASCADE Option**

When analyzing specific tables, the cascade option can be used to analyze all related indexes.

The Oracle documentation notes that using the cascade option gathers statistics on the table, plus all indexes for the target table. Using this option is equivalent to running *gather_table_stats* plus running *gather_index_stats* for each index on the table. If you always want indexes analyzed when running *gather_table_stats*, you can use *set_database_prefs*, *set_global_prefs*, or *set_table_prefs* to always include indexes when *gather_table_stats* is executed.
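For example, the table-level preference might be set like this (11g preference syntax; SCOTT.EMP is a placeholder):

```sql
-- Always cascade to indexes when gathering stats on SCOTT.EMP
exec dbms_stats.set_table_prefs(
   ownname => 'SCOTT',
   tabname => 'EMP',
   pname   => 'CASCADE',
   pvalue  => 'TRUE');
```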

```sql
exec dbms_stats.gather_table_stats(
   ownname          => 'PERFSTAT',
   tabname          => 'STATS$SNAPSHOT',
   estimate_percent => dbms_stats.auto_sample_size,
   method_opt       => 'for all columns size skewonly',
   cascade          => true,
   degree           => 7);
```

#### The DEGREE Option

Note that you can also parallelize the collection of statistics, because *dbms_stats* performs full-table and full-index scans. When you set *degree=x*, Oracle invokes parallel query slave processes to speed up table access. Degree is usually set to about the number of CPUs minus 1 (leaving one for the OPQ query coordinator).
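If you would rather let Oracle pick the degree of parallelism itself, the package provides a constant for that (assuming your release supports *dbms_stats.auto_degree*):

```sql
exec dbms_stats.gather_schema_stats(
   ownname => 'SCOTT',
   degree  => dbms_stats.auto_degree);
```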

#### Automating Sample Size with dbms_stats

Now that we have seen how the *dbms_stats* *options* parameter works, let's see how to specify the sample size for *dbms_stats*. The following *estimate_percent* argument is a new way to allow Oracle's *dbms_stats* to automatically estimate the "best" percentage of a segment to sample when gathering statistics:

```sql
estimate_percent => dbms_stats.auto_sample_size
```

You can verify the accuracy of the automatic statistics sampling by looking at the *sample_size* column of *dba_tables*. It is interesting to note that Oracle chooses between 5% and 20% for the *sample_size* when using automatic sampling.
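A simple verification query looks like this (the schema name is a placeholder):

```sql
select table_name,
       num_rows,
       sample_size,
       round(100 * sample_size / num_rows, 1) as pct_sampled
from   dba_tables
where  owner = 'SCOTT'
and    num_rows > 0;
```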

In our next installment we will look at automating the collection of histogram data with *dbms_stats*.

**11g Update:** Oracle guru Guy Harrison also offers this advice for 11g statistics collection on function-based index columns.

In 11g, I think there are two other ways to get statistics collected for indexed expressions:

1) Collect extended statistics directly on the expression. So, for instance, if we had a function SALES_CATEGORY, we might do this:

```sql
DBMS_STATS.gather_table_stats(
   ownname    => USER,
   tabname    => 'SALES',
   method_opt => 'FOR ALL COLUMNS FOR COLUMNS (sale_category(amount_sold))');
```

2) Create a virtual column on the expression, then index that column. So for the same example as above we might create the following virtual column, then index the column and collect stats as usual:

```sql
ALTER TABLE sales ADD
   sales_category GENERATED ALWAYS AS (sale_category(amount_sold));
```
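To complete the second approach, the virtual column is then indexed and analyzed like any other column (the index name here is illustrative):

```sql
CREATE INDEX sales_category_idx ON sales (sales_category);

exec dbms_stats.gather_table_stats(
   ownname => USER,
   tabname => 'SALES',
   cascade => true);
```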

I think I like the first method better, because the statistics will still exist even if the index is dropped and – unlike the second approach – it doesn’t change the logical structure of the table.

**9i (SYSTEM STATISTICS)**

The system statistics parameter name can have one of the following values:

- *sreadtim* – average time to read a single block (random read), in milliseconds
- *mreadtim* – average time to read *mbrc* blocks at once (sequential read), in milliseconds
- *cpuspeed* – average number of CPU cycles per second, in millions
- *mbrc* – average multiblock read count for sequential reads, in blocks
- *maxthr* – maximum I/O system throughput, in bytes/second
- *slavethr* – average slave I/O throughput, in bytes/second

**10g (SYSTEM STATISTICS)**

From 10g onwards the same parameters apply, with a few additions. The parameter name can have one of the following values:

- *iotfrspeed* – I/O transfer speed, in bytes per millisecond
- *ioseektim* – seek time + latency time + operating system overhead time, in milliseconds
- *sreadtim* – average time to read a single block (random read), in milliseconds
- *mreadtim* – average time to read *mbrc* blocks at once (sequential read), in milliseconds
- *cpuspeed* – average number of CPU cycles per second, in millions, captured for the workload (statistics collected using the 'INTERVAL' or 'START' and 'STOP' options)
- *cpuspeednw* – average number of CPU cycles per second, in millions, captured for the no-workload case (statistics collected using the 'NOWORKLOAD' option)
- *mbrc* – average multiblock read count for sequential reads, in blocks
- *maxthr* – maximum I/O system throughput, in bytes/second
- *slavethr* – average slave I/O throughput, in bytes/second
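These values are typically gathered during a representative workload window and can be inspected afterwards in the dictionary (the 60-minute interval is just an example; *sys.aux_stats$* is where Oracle stores the results):

```sql
-- Gather workload system statistics over a 60-minute window
exec dbms_stats.gather_system_stats(gathering_mode => 'INTERVAL', interval => 60);

-- Inspect the collected values
select pname, pval1
from   sys.aux_stats$
where  sname = 'SYSSTATS_MAIN';
```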

Arup Nanda has a great article on extended statistics with *dbms_stats*, covering specialty histogram analysis on function-based column data:

*Next, re-gather statistics on the table and collect the extended statistics on the expression upper(cust_name).*

```sql
begin
   dbms_stats.gather_table_stats (
      ownname    => 'ARUP',
      tabname    => 'CUSTOMERS',
      method_opt => 'for all columns size skewonly for columns (upper(cust_name))');
end;
/
```

Alternatively you can define the column group as part of the gather statistics command.

*You do that by placing these columns in the method_opt parameter of the gather_table_stats procedure in dbms_stats, as shown below:*

```sql
begin
   dbms_stats.gather_table_stats (
      ownname          => 'ARUP',
      tabname          => 'BOOKINGS',
      estimate_percent => 100,
      method_opt       => 'FOR ALL COLUMNS SIZE SKEWONLY FOR COLUMNS (HOTEL_ID, RATE_CATEGORY)',
      cascade          => true);
end;
/
```

ref: http://www.dba-oracle.com/oracle_tips_dbms_stats1.htm