BEGIN
  dbms_stats.gather_table_stats(
    ownname          => 'xx',
    tabname          => 'xxx',
    method_opt       => 'FOR ALL COLUMNS SIZE AUTO',
    degree           => 4,       -- numeric degree of parallelism
    estimate_percent => 0.0001,  -- numeric sample percentage
    granularity      => 'ALL',
    cascade          => TRUE);
END;
/

**This is where histograms come into play: they allow us to give the optimizer
more detailed information about the distribution of values in a column. Until 12c
they came in two flavours, frequency histograms and height-balanced histograms,
and in our example we need a frequency histogram. (Note: 12c has two new types of
histogram: Top-N, which Oracle calls top-frequency, and hybrid.)

In principle a frequency histogram is an exact picture of the data at a moment in
time (and that last phrase is very important), whereas a height-balanced histogram
is an approximate image of the data distribution that tries to capture details of
popular values and the uneven spread of the rest. A frequency histogram can be
created when a column holds no more than 254 distinct values (2,048 in 12c),
whereas a height-balanced histogram is much less precise and can't really capture
information about more than 127 popular values.

Purpose of Histograms
By default the optimizer assumes a uniform distribution of rows across the distinct
values in a column. For columns that contain data skew (a nonuniform distribution
of data within the column), a histogram enables the optimizer to generate accurate
cardinality estimates for filter and join predicates that involve these columns.

For example, a California-based book store ships 95% of the books to California, 4%
to Oregon, and 1% to Nevada. The book orders table has 300,000 rows.
A table column stores the state to which orders are shipped. A user queries the
number of books shipped to Oregon. Without a histogram, the optimizer assumes an
even distribution of 300000/3 (the NDV is 3), estimating cardinality at 100,000
rows. With this estimate, the optimizer chooses a full table scan. With a
histogram, the optimizer calculates that 4% of the books (about 12,000 rows) are
shipped to Oregon, and chooses an index scan.
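
The arithmetic behind the two estimates can be checked directly. The sketch below assumes a hypothetical table book_orders with a column ship_state (neither name comes from the text above); the first query only reproduces the numbers from the example, and the second shows one way to see the optimizer's own row estimate.

```sql
-- Numbers from the bookstore example: 300,000 rows, NDV = 3, 4% shipped to Oregon.
SELECT 300000 / 3    AS est_rows_uniform,     -- 100000 (no histogram)
       300000 * 0.04 AS est_rows_histogram    -- 12000  (with histogram)
FROM   dual;

-- Hypothetical names: book_orders, ship_state.
-- The "Rows" column of the plan output is the optimizer's cardinality estimate.
EXPLAIN PLAN FOR
  SELECT COUNT(*)
  FROM   book_orders
  WHERE  ship_state = 'OR';

SELECT * FROM TABLE(dbms_xplan.display);
```

Running the EXPLAIN PLAN before and after gathering a histogram on the column should show the estimate moving from the uniform figure towards the skewed one, provided the statistics are current.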

**The METHOD_OPT parameter controls the following:

which columns will or will not have base column statistics gathered on them,
the creation of histograms,
the creation of extended statistics.

The METHOD_OPT parameter syntax is made up of multiple parts. The first two parts
are mandatory and are broken down in the syntax below.

FOR ALL [INDEXED|HIDDEN] COLUMNS SIZE {integer|AUTO|REPEAT|SKEWONLY}

The leading part of the METHOD_OPT syntax controls which columns will have base
column statistics (min, max, NDV, number of nulls, etc.) gathered on them. The
default, FOR ALL COLUMNS, collects base column statistics for all of the columns
(including hidden columns) in the table. The alternative values limit the
collection of base column statistics as follows:

FOR ALL INDEXED COLUMNS limits base column statistics gathering to only those
columns that are included in an index. This value is not recommended, as it is
highly unlikely that only indexed columns will be used in the select list, where
clause predicates, and group by clause of all of the SQL statements executed in
the environment.

FOR ALL HIDDEN COLUMNS limits base column statistics gathering to only the virtual
columns that have been created on a table. This means none of the actual columns
in the table will have any column statistics gathered on them. Again, this value
is not recommended for general statistics gathering purposes. It should only be
used when statistics on the base table columns are accurate and one or more new
virtual columns have been created (e.g. a new column group is created). Gathering
statistics in this mode will then gather statistics on the new virtual columns
without re-gathering statistics on the base columns.
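
As a rough sketch of that workflow, the statements below first create a column group and then gather statistics on the hidden columns only. The owner, table, and column names (XX, XXX, COL1, COL2) are placeholders, not taken from a real schema.

```sql
-- Hypothetical owner, table, and column names throughout.
-- Step 1: create a column group; Oracle stores it as a hidden virtual column
-- and returns the system-generated extension name.
SELECT dbms_stats.create_extended_stats(
         ownname   => 'XX',
         tabname   => 'XXX',
         extension => '(COL1, COL2)') AS extension_name
FROM   dual;

-- Step 2: gather statistics on the hidden (virtual) columns only,
-- leaving the existing base column statistics untouched.
BEGIN
  dbms_stats.gather_table_stats(
    ownname    => 'XX',
    tabname    => 'XXX',
    method_opt => 'FOR ALL HIDDEN COLUMNS SIZE AUTO');
END;
/
```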

Note that if a column is not included in the list to have statistics gathered on
it, then only its average column length is gathered. The average column length is
used to correctly compute the average row length and is then discarded (i.e., not
saved to disk).

The SIZE part of the METHOD_OPT syntax controls the creation of histograms and can
have the following settings:

AUTO means Oracle automatically determines the columns that need histograms based
on the column usage information (SYS.COL_USAGE$) and the presence of data skew.

An integer value indicates that a histogram will be created with at most the
specified number of buckets. It must be in the range [1,254] (up to 2,048 from
12c). To force histogram creation it is recommended that the number of buckets be
left at 254. Note that SIZE 1 means no histogram will be created.

REPEAT ensures a histogram will only be created for a column that already has
one. If the table is partitioned, REPEAT ensures a histogram will be created for a
column that already has one at the global level. However, this is not a
recommended setting, as the number of buckets currently in each histogram limits
the maximum number of buckets used for the newly created histograms. Let's assume
there are 5 buckets currently in a histogram. When the histogram is re-gathered
with SIZE REPEAT, the newly created histogram will use at most 5 buckets and may
not be of good quality.

SKEWONLY automatically creates a histogram on any column that shows a skew in its
data distribution.
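
To make the SIZE settings concrete, here is a minimal sketch that gathers statistics with one setting and then checks which histograms resulted. The owner and table names are placeholders; the alternative method_opt strings in the comments are the other settings described above.

```sql
-- Hypothetical owner and table names.
BEGIN
  dbms_stats.gather_table_stats(
    ownname    => 'XX',
    tabname    => 'XXX',
    -- Alternatives, one at a time:
    --   'FOR ALL COLUMNS SIZE 1'         base column statistics, no histograms
    --   'FOR ALL COLUMNS SIZE 254'       histograms with at most 254 buckets
    --   'FOR ALL COLUMNS SIZE SKEWONLY'  histograms on skewed columns only
    method_opt => 'FOR ALL COLUMNS SIZE AUTO');
END;
/

-- HISTOGRAM shows NONE when no histogram exists for the column.
SELECT column_name, num_distinct, num_buckets, histogram
FROM   dba_tab_col_statistics
WHERE  owner = 'XX'
AND    table_name = 'XXX'
ORDER  BY column_name;
```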

**DEGREE: the degree of parallelism used by the gathering session.

**estimate_percent
The value for estimate_percent is the percentage of rows to sample when estimating
statistics, with NULL meaning compute (every row is read).
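
Instead of a fixed percentage, the DBMS_STATS package provides the constant AUTO_SAMPLE_SIZE, which lets Oracle choose the sample size itself. The call below is a sketch with placeholder owner and table names and an arbitrarily chosen degree of 4.

```sql
-- Hypothetical owner and table names; degree 4 is only an illustration.
BEGIN
  dbms_stats.gather_table_stats(
    ownname          => 'XX',
    tabname          => 'XXX',
    estimate_percent => dbms_stats.auto_sample_size,  -- let Oracle pick the sample
    degree           => 4);                           -- parallel degree for the gather
END;
/
```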

**Granularity of statistics to collect (only pertinent if the table is
partitioned).
ALL: gathers all (subpartition, partition, and global) statistics.
AUTO: determines the granularity based on the partitioning type. This is the
default value.
DEFAULT: gathers global and partition-level statistics. This option is obsolete,
and while currently supported, it is included in the documentation for legacy
reasons only. Use 'GLOBAL AND PARTITION' for this functionality.
GLOBAL: gathers global statistics.
GLOBAL AND PARTITION: gathers the global and partition-level statistics. No
subpartition-level statistics are gathered even if it is a composite partitioned
object.
PARTITION: gathers partition-level statistics.
SUBPARTITION: gathers subpartition-level statistics.
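
One way to see which levels actually received statistics is to query DBA_TAB_STATISTICS, which has one row per table, partition, and subpartition. This is a sketch with placeholder owner and table names.

```sql
-- Hypothetical owner and table names.
-- OBJECT_TYPE is TABLE, PARTITION, or SUBPARTITION.
SELECT object_type,
       partition_name,
       subpartition_name,
       num_rows,
       last_analyzed
FROM   dba_tab_statistics
WHERE  owner = 'XX'
AND    table_name = 'XXX'
ORDER  BY object_type, partition_name, subpartition_name;
```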

**Cascade
Gathers statistics on the indexes for this table. Using this option is equivalent
to running the GATHER_INDEX_STATS Procedure on each of the table's indexes

**To view the collected statistics, query the DBA_TABLES view.
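
A minimal sketch of such a query follows, with placeholder owner and table names. The second query is an optional check on the index statistics gathered by CASCADE.

```sql
-- Hypothetical owner and table names; the dictionary stores unquoted names in upper case.
SELECT table_name, num_rows, blocks, avg_row_len, sample_size, last_analyzed
FROM   dba_tables
WHERE  owner = 'XX'
AND    table_name = 'XXX';

-- Index statistics gathered when CASCADE => TRUE.
SELECT index_name, num_rows, distinct_keys, last_analyzed
FROM   dba_indexes
WHERE  table_owner = 'XX'
AND    table_name = 'XXX';
```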
