**This is where histograms come into play: they allow us to give the optimizer more detailed information about the distribution of values in a column. Until 12c they came in two flavours, frequency histograms and height-balanced histograms, and in our example we need a frequency histogram. (Note: 12c adds two new types of histogram: Top-N, also called top-frequency, and hybrid.)
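The Python sketch below is a simplified model, not Oracle's implementation. It shows how each of the two pre-12c flavours summarises a column.

- The frequency version keeps one bucket per distinct value. It stores cumulative row counts, in the spirit of the ENDPOINT_VALUE / ENDPOINT_NUMBER pairs Oracle exposes.
- The height-balanced version splits the sorted rows into equal-sized buckets and records only each bucket's last value.

The function names and the 95/4/1 sample column are hypothetical.

```python
from collections import Counter
from itertools import accumulate


def frequency_histogram(values):
    """One bucket per distinct value (simplified model).

    Returns (endpoint_value, endpoint_number) pairs, where endpoint_number is
    the cumulative row count up to and including that value.
    """
    counts = Counter(values)
    keys = sorted(counts)
    return list(zip(keys, accumulate(counts[k] for k in keys)))


def height_balanced_histogram(values, buckets):
    """Split the sorted rows into `buckets` groups of roughly equal size and
    record the value at the end of each group; bucket 0 holds the minimum."""
    rows = sorted(values)
    n = len(rows)
    if not 1 <= buckets <= n:
        raise ValueError("need 1 <= buckets <= number of rows")
    endpoints = [(0, rows[0])]
    for b in range(1, buckets + 1):
        # Position of the last row that falls into bucket b.
        endpoints.append((b, rows[b * n // buckets - 1]))
    return endpoints


# Hypothetical sample: 100 orders split 95 / 4 / 1 across three states.
orders = ["CA"] * 95 + ["OR"] * 4 + ["NV"] * 1

print(frequency_histogram(orders))
# [('CA', 95), ('NV', 96), ('OR', 100)]
print(height_balanced_histogram(orders, 4))
# [(0, 'CA'), (1, 'CA'), (2, 'CA'), (3, 'CA'), (4, 'OR')]
```

With only 4 buckets, the height-balanced version never records NV at all. The frequency version keeps an exact count for every value. This is why a column with only a few distinct values calls for a frequency histogram.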
Purpose of Histograms
By default the optimizer assumes a uniform distribution of rows across the distinct
values in a column. For columns that contain data skew (a nonuniform distribution
of data within the column), a histogram enables the optimizer to generate accurate
cardinality estimates for filter and join predicates that involve these columns.
For example, a California-based book store ships 95% of the books to California, 4%
to Oregon, and 1% to Nevada. The book orders table has 300,000 rows.
A table column stores the state to which orders are shipped. A user queries the number of books shipped to Oregon. Without a histogram, the optimizer assumes an even distribution of 300,000/3 (the NDV is 3), estimating cardinality at 100,000 rows. With this estimate, the optimizer chooses a full table scan. With a histogram, the optimizer calculates that 4% of the books (about 12,000 rows) are shipped to Oregon, and chooses an index scan.
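The arithmetic behind the two estimates can be checked directly. This minimal sketch assumes hypothetical per-state row counts that match the 95% / 4% / 1% split. It compares the uniform-distribution guess with a lookup in a frequency histogram. The helper names are illustrative only.

```python
NUM_ROWS = 300_000

# Hypothetical per-state row counts matching the 95% / 4% / 1% split above;
# a frequency histogram on the state column records exactly this information.
ROWS_PER_STATE = {"CA": 285_000, "OR": 12_000, "NV": 3_000}


def uniform_estimate(num_rows, ndv):
    """No histogram: assume every distinct value has the same number of rows."""
    return num_rows // ndv


def histogram_estimate(histogram, value):
    """Frequency histogram: read the row count stored for the value.

    Returning 0 for an unknown value is a simplification; the real optimizer
    handles values missing from the histogram differently.
    """
    return histogram.get(value, 0)


ndv = len(ROWS_PER_STATE)
print(uniform_estimate(NUM_ROWS, ndv))           # 100000
print(histogram_estimate(ROWS_PER_STATE, "OR"))  # 12000
```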
The METHOD_OPT parameter controls:
which columns will or will not have base column statistics gathered on them;
the creation of histograms;
the creation of extended statistics.
The METHOD_OPT parameter syntax is made up of multiple parts. The first two parts are mandatory and are described below.
The leading part of the METHOD_OPT syntax controls which columns will have base column statistics (min, max, NDV, number of nulls, etc.) gathered on them. The default, FOR ALL COLUMNS, collects base column statistics for all of the columns (including hidden columns) in the table. The alternative values limit the collection of base column statistics as follows:
FOR ALL INDEXED COLUMNS limits base column statistics gathering to only those columns that are included in an index. This value is not recommended, as it is highly unlikely that only indexed columns will be used in the select list, WHERE clause predicates, and GROUP BY clauses of all of the SQL statements executed in the environment.
FOR ALL HIDDEN COLUMNS limits base column statistics gathering to only the virtual columns that have been created on a table. This means none of the actual columns in the table will have any column statistics gathered on them. Again, this value is not recommended for general statistics gathering purposes. It should only be used when statistics on the base table columns are accurate and one or more new virtual columns have been created (e.g. a new column group). Gathering statistics in this mode then collects statistics on the new virtual columns without re-gathering statistics on the base columns.
Note that if a column is not included in the list to have statistics gathered on it, then only its average column length is gathered. The average column length is used to correctly compute the average row length and is then discarded (i.e., not saved to disk).
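The effect of the leading part can be modelled as a filter over the table's columns. In this hedged sketch the Column type, its `indexed`/`hidden` flags and the sample ORDERS columns are all hypothetical. The sketch returns two lists:

- the columns that would get base statistics;
- the columns that would only have their average length measured.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    name: str
    indexed: bool = False  # column appears in at least one index
    hidden: bool = False   # virtual column, e.g. one backing a column group


# Which columns each leading clause selects for base statistics (simplified).
SELECTORS = {
    "FOR ALL COLUMNS": lambda c: True,
    "FOR ALL INDEXED COLUMNS": lambda c: c.indexed,
    "FOR ALL HIDDEN COLUMNS": lambda c: c.hidden,
}


def plan_column_stats(columns, clause):
    """Return (base_stats_columns, avg_len_only_columns) for a leading clause."""
    key = " ".join(clause.upper().split())  # normalise case and spacing
    if key not in SELECTORS:
        raise ValueError(f"unsupported clause in this sketch: {clause!r}")
    keep = SELECTORS[key]
    base = [c.name for c in columns if keep(c)]
    avg_len_only = [c.name for c in columns if not keep(c)]
    return base, avg_len_only


# Hypothetical ORDERS table; COLGRP_STATE_DATE stands in for a column-group
# virtual column (real ones get system-generated names).
orders_columns = [
    Column("ORDER_ID", indexed=True),
    Column("SHIP_STATE"),
    Column("ORDER_DATE"),
    Column("COLGRP_STATE_DATE", hidden=True),
]

print(plan_column_stats(orders_columns, "FOR ALL INDEXED COLUMNS"))
# (['ORDER_ID'], ['SHIP_STATE', 'ORDER_DATE', 'COLGRP_STATE_DATE'])
```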
The SIZE part of the METHOD_OPT syntax controls the creation of histograms and can have the following settings:
AUTO means Oracle will automatically determine the columns that need histograms based on the column usage information (SYS.COL_USAGE$) and the presence of data skew.
An integer value indicates that a histogram will be created with at most the specified number of buckets. The value must be in the range [1, 254]. To force histogram creation, it is recommended that the number of buckets be left at 254. Note that SIZE 1 means no histogram will be created.
REPEAT ensures a histogram will only be created for columns that already have one. If the table is partitioned, REPEAT ensures a histogram will be created for a column that already has one at the global level. However, this is not a recommended setting, as the number of buckets currently in each histogram limits the maximum number of buckets used for the newly created histogram. Let's assume there are 5 buckets currently in a histogram. When the histogram is re-gathered with SIZE REPEAT, the newly created histogram will use at most 5 buckets and may not be of good quality.
SKEWONLY automatically creates a histogram on any column that shows a skew in its
data distribution.
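The SIZE settings can be summarised as a small decision function. This is a hedged, simplified model of pre-12c behaviour.

- The name `plan_histogram` and its flags are hypothetical.
- The usage and skew checks that AUTO and SKEWONLY perform inside Oracle are reduced to two booleans.
- Using 254 buckets for AUTO and SKEWONLY is an assumption of the sketch.

The function also encodes the pre-12c rule for choosing a histogram type. A frequency histogram is built when the number of distinct values fits in the available buckets. Otherwise, a height-balanced one is built.

```python
MAX_BUCKETS = 254


def plan_histogram(size, ndv, existing_buckets=None,
                   used_in_predicates=False, skewed=False):
    """Return (histogram_type, buckets), or None when no histogram is built.

    Simplified pre-12c model; the boolean flags stand in for Oracle's own
    column-usage and skew checks.
    """
    if size == "REPEAT":
        if existing_buckets is None:      # no histogram today, none after
            return None
        buckets = existing_buckets        # capped by the old bucket count
    elif size == "AUTO":
        if not (used_in_predicates and skewed):
            return None
        buckets = MAX_BUCKETS             # assumption of this sketch
    elif size == "SKEWONLY":
        if not skewed:
            return None
        buckets = MAX_BUCKETS             # assumption of this sketch
    elif isinstance(size, int) and 1 <= size <= MAX_BUCKETS:
        if size == 1:                     # SIZE 1 means no histogram
            return None
        buckets = size
    else:
        raise ValueError(f"invalid SIZE value: {size!r}")

    if ndv <= buckets:
        return "FREQUENCY", ndv           # one bucket per distinct value
    return "HEIGHT BALANCED", buckets


print(plan_histogram(254, 3))                               # ('FREQUENCY', 3)
print(plan_histogram("REPEAT", 50, existing_buckets=5))     # ('HEIGHT BALANCED', 5)
```

The REPEAT case reproduces the problem described above. A column whose old histogram had 5 buckets is re-gathered with only 5 buckets, even though it now has 50 distinct values.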
**estimate_percent
The value of estimate_percent is the percentage of rows to sample when estimating statistics; NULL means compute (use every row).
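The hypothetical helper below converts an estimate_percent value into an approximate number of sampled rows, treating Python's None as NULL. Oracle's sampling is random, so the real sample size only approximates this figure. Rounding up is a choice made for illustration.

```python
import math


def rows_to_sample(num_rows, estimate_percent):
    """Approximate number of rows read for a given estimate_percent.

    None plays the role of NULL, which means compute (read every row).
    """
    if estimate_percent is None:
        return num_rows
    if not 0 < estimate_percent <= 100:
        raise ValueError("estimate_percent must be in (0, 100]")
    # Rounded up for illustration; the real sample size varies run to run.
    return math.ceil(num_rows * estimate_percent / 100)


print(rows_to_sample(300_000, None))  # 300000
print(rows_to_sample(300_000, 10))    # 30000
```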
**Cascade
Also gathers statistics on the table's indexes. Using this option is equivalent to running the GATHER_INDEX_STATS procedure on each of the table's indexes.
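The sketch below builds the PL/SQL text for a DBMS_STATS.GATHER_TABLE_STATS call that sets the parameters discussed above: ownname, tabname, estimate_percent, method_opt and cascade. For comparison, it also builds the per-index GATHER_INDEX_STATS calls (ownname, indname) that cascade replaces.

It is a minimal text generator, not a client. The helper names and the SALES/ORDERS schema, table and index names are hypothetical. Every argument is passed explicitly, because DBMS_STATS's own defaults are not modelled here.

```python
def sql_literal(value):
    """Render a Python value as a PL/SQL literal."""
    if value is None:
        return "NULL"
    if isinstance(value, bool):            # check bool before int
        return "TRUE" if value else "FALSE"
    if isinstance(value, (int, float)):
        return str(value)
    return "'" + str(value).replace("'", "''") + "'"  # escape quotes


def gather_table_stats_block(owner, table, estimate_percent, method_opt, cascade):
    """Anonymous PL/SQL block calling DBMS_STATS.GATHER_TABLE_STATS."""
    args = {
        "ownname": owner,
        "tabname": table,
        "estimate_percent": estimate_percent,
        "method_opt": method_opt,
        "cascade": cascade,
    }
    params = ",\n    ".join(f"{k} => {sql_literal(v)}" for k, v in args.items())
    return f"BEGIN\n  DBMS_STATS.GATHER_TABLE_STATS(\n    {params});\nEND;"


def gather_index_stats_blocks(owner, indexes):
    """What cascade => TRUE saves you: one GATHER_INDEX_STATS call per index."""
    return [
        f"BEGIN\n  DBMS_STATS.GATHER_INDEX_STATS("
        f"ownname => {sql_literal(owner)}, indname => {sql_literal(ix)});\nEND;"
        for ix in indexes
    ]


print(gather_table_stats_block("SALES", "ORDERS", None,
                               "FOR ALL COLUMNS SIZE AUTO", True))
```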
**To view the collected statistics, query the DBA_TABLES view.