Using The ETL Infrastructure Inside Oracle Database 10g
Purpose
This tutorial covers the Extraction, Transformation, and Loading (ETL) infrastructure of
Oracle Database 10g.
Time to Complete
Approximately 1 hour
Topics
This tutorial covers the following topics:
Overview
Prerequisites
Use Synchronous Change Data Capture (CDC) to Track and Consume Incremental
Data Changes
Propagate Information for a Data Mart
Clean Up
Summary
Overview
What Happens During the ETL Process?
ETL stands for extraction, transformation, and loading. During Extraction, the desired
data has to be identified and extracted from many different sources, including database
systems and applications. Often, it is not possible to identify the specific subset of
interest up front, so more data than necessary has to be extracted and the relevant data
is identified at a later point in time. Depending on the source system's capabilities
(for example, OS resources), some transformations may take place during this extraction
process. The size of the extracted data varies from hundreds of kilobytes to hundreds of
gigabytes, depending on the source system and the business situation. The same is true
for the time delta between two (logically) identical extractions: the interval may range
from days or hours down to minutes or near real time.
For example, Web server log files can easily become hundreds of megabytes in a very
short period of time.
After extraction, the data needs to be physically transported to the target system or to
an intermediate system for further processing. Depending on the chosen means of
transportation, some transformations can be done during this process as well. For example,
a SQL statement that directly accesses a remote target through a gateway can
concatenate two columns as part of the SELECT statement.
After extracting and transporting the data, the most challenging (and time-consuming)
part of ETL follows: transformation and loading into the target system. This may
include:
Applying complex filters
Validating data against information already existing in the target database tables
Checking data that was extracted without knowledge of new versus changed information
against the target objects to determine whether it must be updated or inserted
Inserting the same data several times, as detail-level and as aggregated information
This should be done as quickly as possible and in a scalable manner, and it must not
affect concurrent access to the existing target for information retrieval.
Oracle offers a wide variety of capabilities to address all the issues and tasks relevant in
an ETL scenario. Oracle Database 10g is the ETL transformation engine.
Back to Topic List
Prerequisites
Before starting this tutorial, you should:
1. Perform the Installing Oracle Database 10g on Windows tutorial.
2. Download and unzip etl2.zip into your working directory (for example, c:\wkdir).
2. Now show the execution plan for the equivalent insert based on a UNION ALL set operation, the approach required before the multitable insert was available.
From a SQL*Plus session logged on to the SH schema, execute the following SQL script:
@c:\wkdir\explain_mti_old.sql
DELETE FROM PLAN_TABLE;
COMMIT;
EXPLAIN PLAN FOR
INSERT INTO sales
(prod_id, cust_id, time_id, channel_id,promo_id,amount_sold,quantity_sold)
SELECT product_id, customer_id,weekly_start_date,2,9999,sales_sun,q_sun
FROM sales_input_table
UNION ALL
SELECT product_id, customer_id,weekly_start_date+1,2,9999,sales_mon,q_mon
FROM sales_input_table
UNION ALL
SELECT product_id, customer_id,weekly_start_date+2,2,9999,sales_tue,q_tue
FROM sales_input_table
UNION ALL
SELECT product_id, customer_id,weekly_start_date+3,2,9999,sales_wed,q_wed
FROM sales_input_table
UNION ALL
SELECT product_id, customer_id,weekly_start_date+4,2,9999,sales_thu,q_thu
FROM sales_input_table
UNION ALL
SELECT product_id, customer_id,weekly_start_date+5,2,9999,sales_fri,q_fri
FROM sales_input_table
UNION ALL
SELECT product_id, customer_id,weekly_start_date+6,2,9999,sales_sat,q_sat
FROM sales_input_table;
SET linesize 140
SELECT * from table(dbms_xplan.display);
COMMIT;
Note: The input source table is scanned seven times! The complexity of the denormalization is
handled within the several SELECT operations.
With an increasing number of input records, the performance advantage of the multitable
insert statement, which reduces the load to a single scan of the source, becomes more and
more obvious.
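For comparison, here is a rough sketch of how the same pivoting load could be written as a single multitable insert. The tutorial's actual script is not reproduced in this excerpt; the column lists below simply mirror the UNION ALL version above:
INSERT ALL
INTO sales (prod_id, cust_id, time_id, channel_id, promo_id, amount_sold, quantity_sold)
VALUES (product_id, customer_id, weekly_start_date, 2, 9999, sales_sun, q_sun)
INTO sales (prod_id, cust_id, time_id, channel_id, promo_id, amount_sold, quantity_sold)
VALUES (product_id, customer_id, weekly_start_date+1, 2, 9999, sales_mon, q_mon)
INTO sales (prod_id, cust_id, time_id, channel_id, promo_id, amount_sold, quantity_sold)
VALUES (product_id, customer_id, weekly_start_date+2, 2, 9999, sales_tue, q_tue)
INTO sales (prod_id, cust_id, time_id, channel_id, promo_id, amount_sold, quantity_sold)
VALUES (product_id, customer_id, weekly_start_date+3, 2, 9999, sales_wed, q_wed)
INTO sales (prod_id, cust_id, time_id, channel_id, promo_id, amount_sold, quantity_sold)
VALUES (product_id, customer_id, weekly_start_date+4, 2, 9999, sales_thu, q_thu)
INTO sales (prod_id, cust_id, time_id, channel_id, promo_id, amount_sold, quantity_sold)
VALUES (product_id, customer_id, weekly_start_date+5, 2, 9999, sales_fri, q_fri)
INTO sales (prod_id, cust_id, time_id, channel_id, promo_id, amount_sold, quantity_sold)
VALUES (product_id, customer_id, weekly_start_date+6, 2, 9999, sales_sat, q_sat)
SELECT product_id, customer_id, weekly_start_date,
sales_sun, sales_mon, sales_tue, sales_wed, sales_thu, sales_fri, sales_sat,
q_sun, q_mon, q_tue, q_wed, q_thu, q_fri, q_sat
FROM sales_input_table;
The execution plan for this form shows a single scan of SALES_INPUT_TABLE instead of seven.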
Back to Topic
Using the Multitable Insert for Conditional Insertion
1. Create an intermediate table consisting of new information. From a SQL*Plus session logged on
to the SH schema, execute the following SQL script, which contains the SQL statements from
steps 1-4:
@c:\wkdir\mti2_prepare.sql
Rem create intermediate table with some records
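The remaining statements of mti2_prepare.sql are not reproduced in this excerpt. As a purely illustrative sketch (the source table name, target column lists, and threshold are assumptions, not the tutorial's actual script), a conditional multitable insert routes each source row into one of several targets with WHEN clauses:
Rem hypothetical conditional multitable insert; customers_new is an assumed intermediate table
INSERT FIRST
WHEN cust_credit_limit >= 10000 THEN
INTO customers_special (cust_id, cust_last_name, cust_credit_limit)
VALUES (cust_id, cust_last_name, cust_credit_limit)
ELSE
INTO customers (cust_id, cust_last_name, cust_credit_limit)
VALUES (cust_id, cust_last_name, cust_credit_limit)
SELECT cust_id, cust_last_name, cust_credit_limit
FROM customers_new;
INSERT FIRST inserts each row into the first matching branch only; INSERT ALL would evaluate every WHEN clause for every row.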
6. You can see what was inserted. Execute the following SQL script:
@c:\wkdir\control_mti2.sql
SELECT COUNT(*) FROM customers;
SELECT COUNT(*) FROM customers_special;
SELECT MIN(cust_credit_limit) FROM customers_special;
7. Before continuing, reset your environment by executing the following SQL script:
@c:\wkdir\reset_mti2.sql
set echo on
REM cleanup and reset
2. When creating an external table, you define two kinds of information:
1. The metadata for the table representation inside the database
2. The access parameters describing how to extract the data from the external file
After this metadata has been created, the external data can be accessed from within the
database without an initial load. To create the external table, from a SQL*Plus
session logged on to the SH schema, execute the following SQL script:
@c:\wkdir\create_external_table2.sql
Rem *****
Rem preparation for upsert
Rem file prodDelta.dat, 1.2 MB must
Rem reuse of directory STAGE_DIR
Rem access via external table
Rem *****
DROP TABLE products_delta;
CREATE TABLE products_delta
(
PROD_ID NUMBER(6),
PROD_NAME VARCHAR2(50),
PROD_DESC VARCHAR2(4000),
PROD_SUBCATEGORY VARCHAR2(50),
PROD_SUBCATEGORY_ID NUMBER,
PROD_SUBCATEGORY_DESC VARCHAR2(2000),
PROD_CATEGORY VARCHAR2(50),
PROD_CATEGORY_ID NUMBER,
PROD_CATEGORY_DESC VARCHAR2(2000),
PROD_WEIGHT_CLASS NUMBER(2),
PROD_UNIT_OF_MEASURE VARCHAR2(20),
PROD_PACK_SIZE VARCHAR2(30),
SUPPLIER_ID NUMBER(6),
PROD_STATUS VARCHAR2(20),
PROD_LIST_PRICE NUMBER(8,2),
PROD_MIN_PRICE NUMBER(8,2),
PROD_TOTAL VARCHAR2(13),
PROD_TOTAL_ID NUMBER,
PROD_SRC_ID NUMBER,
PROD_EFF_FROM DATE,
PROD_EFF_TO DATE,
PROD_VALID CHAR(1)
)
ORGANIZATION external
(
TYPE oracle_loader
DEFAULT DIRECTORY data_dir
ACCESS PARAMETERS
(
RECORDS DELIMITED BY NEWLINE CHARACTERSET US7ASCII
NOBADFILE
NOLOGFILE
FIELDS TERMINATED BY "|" LDRTRIM
)
location
('prodDelta.dat')
REJECT LIMIT UNLIMITED NOPARALLEL;
Back to Topic
2. Performing an Upsert Using the SQL MERGE Command
To perform an upsert using the SQL MERGE command, execute the following step:
1. From a SQL*Plus session logged on to the SH schema, execute the following SQL script:
@c:\wkdir\do_merge_new.sql
Rem *****
Rem MERGE
Rem new and modified product information has arrived
Rem use of external table again
Rem - seamlessly within the MERGE command !!!
Rem *****
MERGE INTO products t
USING products_delta s
ON ( t.prod_id=s.prod_id )
WHEN MATCHED THEN
UPDATE SET t.prod_name=s.prod_name, t.prod_list_price=s.prod_list_price,
t.prod_min_price=s.prod_min_price
WHEN NOT MATCHED THEN
INSERT (prod_id, prod_name, prod_desc, prod_subcategory, prod_subcategory_id,
prod_subcategory_desc, prod_category, prod_category_id,
prod_category_desc,
prod_status, prod_list_price, prod_min_price, prod_total,
prod_total_id,
prod_weight_class, prod_pack_size, supplier_id)
VALUES (s.prod_id, s.prod_name, s.prod_desc, s.prod_subcategory,
s.prod_subcategory_id,
s.prod_subcategory_desc, s.prod_category, s.prod_category_id,
s.prod_category_desc,
s.prod_status, s.prod_list_price, s.prod_min_price, s.prod_total,
s.prod_total_id,
s.prod_weight_class, s.prod_pack_size, s.supplier_id)
;
ROLLBACK;
Note: You will ROLLBACK so that you can reissue the same upsert with two SQL statements in
a subsequent operation.
Back to Topic
3. Showing the Execution plan of the MERGE Command
To examine the execution plan of the SQL MERGE statement, perform the following
steps:
1. From a SQL*Plus session logged on to the SH schema, run the explain_merge_new.sql
script, or copy the following SQL statement into your SQL*Plus session:
@c:\wkdir\explain_merge_new.sql
DELETE FROM plan_table;
COMMIT;
EXPLAIN PLAN FOR
MERGE INTO products t
USING products_delta s
ON ( t.prod_id=s.prod_id )
WHEN MATCHED THEN
UPDATE SET t.prod_name=s.prod_name, t.prod_list_price=s.prod_list_price,
t.prod_min_price=s.prod_min_price
WHEN NOT MATCHED THEN
INSERT (prod_id, prod_name, prod_desc, prod_subcategory,
prod_subcategory_id,
prod_subcategory_desc, prod_category, prod_category_id,
prod_category_desc,
prod_status, prod_list_price, prod_min_price, prod_total,
prod_total_id,
prod_weight_class, prod_pack_size, supplier_id)
VALUES (s.prod_id, s.prod_name, s.prod_desc, s.prod_subcategory,
s.prod_subcategory_id,
s.prod_subcategory_desc, s.prod_category, s.prod_category_id,
s.prod_category_desc,
s.prod_status, s.prod_list_price, s.prod_min_price, s.prod_total,
s.prod_total_id,
s.prod_weight_class, s.prod_pack_size, s.supplier_id)
;
set linesize 140
select * from table(dbms_xplan.display);
Back to Topic
4. Performing an Upsert Using Two Separate SQL Commands
Without the MERGE statement, such upsert functionality could be implemented as a
combination of two SQL statements (insert plus update) or in a procedural
manner. A possible upsert implementation of this business problem as two SQL
commands is shown below.
To demonstrate how this was done, perform the following steps:
1. To leverage an updatable join view, a primary or unique key must exist on the join column.
Therefore, you cannot use the external table directly. As a result, you need to create an
intermediate table in the database with the appropriate unique key constraint. Run the following
script to create the necessary structures and perform the upsert:
@c:\wkdir\do_merge_816.sql
CREATE TABLE prod_delta NOLOGGING AS SELECT * FROM products_delta;
CREATE UNIQUE INDEX p_uk ON prod_delta(prod_id);
UPDATE
(SELECT s.prod_id, s.prod_name sprod_name, s.prod_desc sprod_desc,
s.prod_subcategory sprod_subcategory,
s.prod_subcategory_desc sprod_subcategory_desc, s.prod_category
sprod_category,
s.prod_category_desc sprod_category_desc, s.prod_status
sprod_status,
s.prod_list_price sprod_list_price, s.prod_min_price
sprod_min_price,
t.prod_id, t.prod_name tprod_name, t.prod_desc tprod_desc,
t.prod_subcategory
tprod_subcategory,
t.prod_subcategory_desc tprod_subcategory_desc, t.prod_category
tprod_category,
t.prod_category_desc tprod_category_desc, t.prod_status
tprod_status,
t.prod_list_price tprod_list_price, t.prod_min_price
tprod_min_price
FROM products t, prod_delta s WHERE s.prod_id=t.prod_id) JV
SET
tprod_list_price =sprod_list_price,
tprod_min_price =sprod_min_price;
INSERT INTO products
(prod_id, prod_name, prod_desc, prod_subcategory,
prod_subcategory_id,
prod_subcategory_desc, prod_category, prod_category_id,
prod_category_desc,
prod_status, prod_list_price, prod_min_price, prod_total,
prod_total_id,
prod_weight_class, prod_pack_size, supplier_id)
SELECT prod_id, prod_name, prod_desc, prod_subcategory,
prod_subcategory_id,
prod_subcategory_desc, prod_category, prod_category_id,
prod_category_desc,
prod_status, prod_list_price, prod_min_price, prod_total,
prod_total_id,
prod_weight_class, prod_pack_size, supplier_id
FROM prod_delta s WHERE NOT EXISTS (SELECT 1 FROM products t WHERE
t.prod_id =
s.prod_id);
ROLLBACK;
Note that you have to issue two separate SQL statements to accomplish the same functionality
that a single MERGE statement provides.
2. From a SQL*Plus session logged on to the SH schema, view the execution plans by running the
view_explain_merge.sql script, or copy the following SQL statement into your
SQL*Plus session:
@c:\wkdir\view_explain_merge.sql
PROMPT show the plans
DELETE FROM plan_table;
COMMIT;
...
To leverage the updatable join view functionality, the external table needs to be copied into a real
database table, and a unique index on the join column needs to be created. In short, this requires
more operations, more space, and more processing time.
Back to Topic
Back to Topic List
Note the mandatory control columns and the different data types for the columns being
tracked. The data types have to be a superset of the target data types to enable proper
tracking of errors, for example, a non-numeric value destined for a NUMBER target column.
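The creation of the error logging table itself is not shown in this excerpt. Such a table is typically generated with the DBMS_ERRLOG package; a minimal sketch, assuming the table names used in this tutorial:
Rem creates ERR$_SALES_OVERALL with the mandatory control columns plus
Rem VARCHAR2 versions of the tracked target columns
EXEC DBMS_ERRLOG.CREATE_ERROR_LOG(dml_table_name => 'SALES_OVERALL')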
SET LINESIZE 60
DESCRIBE err$_sales_overall
4. Try to load the data residing in the external file into the target table. The default behavior of the
error
logging functionality is set to a REJECT LIMIT of zero. In the case of an error, the DML
operation will fail
and the first record raising the error is stored in the error logging table. Run the script
ins_elt_1.sql to see this behavior.
@c:\wkdir\ins_elt_1.sql
PROMPT First insert attempt , DEFAULT reject limit 0
PROMPT also, the error message that comes back is the one of error #1
INSERT /*+ APPEND NOLOGGING PARALLEL */ INTO sales_overall
SELECT * FROM sales_activity_direct
LOG ERRORS INTO err$_sales_overall ( 'load_test1' );
commit;
PROMPT As you can see, nothing is inserted in the target table, but ONE row is in the error logging table
select count(*) from sales_overall;
select count(*) from err$_sales_overall;
delete from err$_sales_overall;
commit;
From a DML perspective, you want a transaction either to succeed or to fail as a whole.
Hence, from a generic perspective, you either want the DML operation to succeed no
matter what (which translates into REJECT LIMIT UNLIMITED) or you want it to fail as soon
as an error occurs (REJECT LIMIT 0). The default reject limit was set to zero because any
arbitrary number is somewhat meaningless; if you decide to tolerate a specific number of
errors, it is a pure business decision how many errors are tolerable in a specific situation.
5. Try the insert again with a REJECT LIMIT of 10 records. If more than 10 errors occur, the
DML operation will fail, and you will find 11 records in the error logging table. Run the script
ins_elt_2.sql to see this behavior.
@c:\wkdir\ins_elt_2.sql
SET ECHO OFF
6. Load the data into the table and find out what errors you are encountering. Run the script
ins_elt_3.sql.
@c:\wkdir\ins_elt_3.sql
PROMPT Reject limit unlimited will succeed
Rem ... as long as you do not run into one of the current limitations ...
INSERT /*+ APPEND NOLOGGING PARALLEL */ INTO sales_overall
SELECT * FROM sales_activity_direct
LOG ERRORS INTO err$_sales_overall ( 'load_20040802' ) REJECT LIMIT
UNLIMITED;
commit;
8. As of today, there are some limitations around the error logging capability. They all involve
situations where index maintenance is deferred as an optimization. These limitations will be
addressed in a future release of Oracle. Run the script elt_limit.sql.
@c:\wkdir\elt_limit.sql
PROMPT discuss a case with limitation
truncate table sales_overall;
alter table sales_overall add constraint pk_1 primary key (prod_id,
cust_id,time_id,
channel_id, promo_id);
10. Run the same DML operation again, but enforce that the TO_DATE() conversion cannot be
pushed to the DML operation. By using a view construct with a NO_MERGE hint, you can
accomplish this. The equivalent SQL SELECT statement itself fails, as you can see by running
the script sel_ins_elt_5.sql.
@c:\wkdir\sel_ins_elt_5.sql
PROMPT The equivalent SQL statement will fail if the predicate is artificially kept within a lower query block
And the plan shows that the conversion is taking place inside the view, thus the error. Run the
script xins_elt_5.sql to view this behavior:
@c:\wkdir\xins_elt_5.sql
explain plan for
INSERT /*+ APPEND NOLOGGING PARALLEL */ INTO sales_overall
SELECT * FROM
( SELECT /*+ NO_MERGE */ prod_id, cust_id,
TO_DATE(time_id,'DD-mon-yyyy'),
channel_id, promo_id, quantity_sold, amount_sold
FROM sales_activity_direct )
LOG ERRORS INTO err$_sales_overall ( 'load_20040802' )
REJECT LIMIT UNLIMITED;
Back to Topic
Back to Topic List
Experience the Basics of the Table Function
In the ETL process, the data extracted from a source system passes through a sequence
of transformations before it is loaded into a data warehouse. Complex transformations
are implemented in a procedural manner, either outside the database or inside the
database in PL/SQL. When the results of a transformation become too large to fit into
memory, they must be staged by materializing them either into database tables or flat
files. This data is then read and processed as input to the next transformation in the
sequence. Oracle table functions provide support for pipelined and parallel execution of
such transformations implemented in PL/SQL, C, or Java.
A table function is defined as a function that can produce a set of rows as output;
additionally, Oracle table functions can take a set of rows as input. These sets of rows
are processed iteratively in subsets, thus enabling a pipelined mechanism to stream
these subset results from one transformation to the next before the first operation has
finished. Furthermore, table functions can be processed transparently in parallel, just like
native SQL operations such as a table scan or a sort. Staging tables are no longer
necessary.
Because table functions can be used in powerful and multifaceted ways as part of complex
transformations, this tutorial examines only the basic implementation issues. To
do this, perform the following activities:
1. Set up the basic objects for a table function.
2. Perform a nonpipelined table function, returning an array of
records.
3. Perform a pipelined table function.
4. Perform transparent parallel execution of table functions.
5. Perform a table function with autonomous DML.
6. Perform seamless streaming through several table functions.
2. Define the object (collection) type for the examples. From a SQL*Plus session logged on to the
SH schema, execute the following SQL script:
@c:\wkdir\setup_tf_collection.sql
CREATE TYPE product_t_table AS TABLE OF product_t;
/
This type represents the structure of the set of records that will be returned by your
table functions.
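The underlying object type product_t is created in an earlier step that is not listed here. A sketch of a matching definition, with the attribute list inferred from the FETCH lists used in the table functions below (exact data types are assumptions):
CREATE TYPE product_t AS OBJECT (
prod_id               NUMBER(6),
prod_name             VARCHAR2(50),
prod_desc             VARCHAR2(4000),
prod_subcategory      VARCHAR2(50),
prod_subcategory_desc VARCHAR2(2000),
prod_category         VARCHAR2(50),
prod_category_desc    VARCHAR2(2000),
prod_weight_class     NUMBER(2),
prod_unit_of_measure  VARCHAR2(20),
prod_pack_size        VARCHAR2(30),
supplier_id           NUMBER(6),
prod_status           VARCHAR2(20),
prod_list_price       NUMBER(8,2),
prod_min_price        NUMBER(8,2)
);
/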
3. Define a package for the REF CURSOR types. From a SQL*Plus session logged on to the SH
schema, execute the following SQL script:
@c:\wkdir\setup_tf_package.sql
Rem package of all cursor types
Rem handle the input cursor type and the output cursor collection type
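The package specification itself is not reproduced in this excerpt. The later scripts reference cursor_pkg.refcur_t (a weakly typed REF CURSOR) and cursor_pkg.strong_refcur_t (a strongly typed REF CURSOR), so a plausible sketch looks like the following; the record type name and layout are inferred from the rest of the tutorial and are not authoritative:
CREATE OR REPLACE PACKAGE cursor_pkg AS
  TYPE product_type_record IS RECORD (
    prod_id               NUMBER(6),
    prod_name             VARCHAR2(50),
    prod_desc             VARCHAR2(4000),
    prod_subcategory      VARCHAR2(50),
    prod_subcategory_desc VARCHAR2(2000),
    prod_category         VARCHAR2(50),
    prod_category_desc    VARCHAR2(2000),
    prod_weight_class     NUMBER(2),
    prod_unit_of_measure  VARCHAR2(20),
    prod_pack_size        VARCHAR2(30),
    supplier_id           NUMBER(6),
    prod_status           VARCHAR2(20),
    prod_list_price       NUMBER(8,2),
    prod_min_price        NUMBER(8,2));
  TYPE refcur_t IS REF CURSOR;                                    -- weakly typed input cursor
  TYPE strong_refcur_t IS REF CURSOR RETURN product_type_record;  -- strongly typed input cursor
END cursor_pkg;
/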
4. Create a log table for the table function example with autonomous DML. From a SQL*Plus
session logged on to the SH schema, execute the following SQL script:
@c:\wkdir\cre_ope_table.sql
CREATE TABLE obsolete_products_errors
(prod_id NUMBER, msg VARCHAR2(2000));
This table will be used for the table function example that fans out some data as part of its
execution.
Due to time constraints, only the PL/SQL implementation of table functions is covered here.
For details of the interface implementations in C or Java, see the Oracle Data Cartridge
Developer's Guide.
Table functions return a set of records to the subsequent operation. The result set can be
delivered either en bloc (the default PL/SQL behavior) or pipelined, where increments of
the result set are streamed to the subsequent operation as soon as they are produced.
For table function implementations in PL/SQL, the PL/SQL engine controls the size of
the incrementally returned array. For interface implementations of table functions, in C
or Java, it is up to the programmer to define the incremental return set.
Back to Topic
2. Performing a Nonpipelined Table Function, Returning an Array of Records
The following example shows a simple table function that returns its result set
nonpipelined, as a complete array. The table function returns all obsolete products except
those of the product category "Electronics". Note that the input REF CURSOR
is weakly typed, meaning it could be any valid SQL statement. It is up to the table
function code to handle its input, possibly based on other input variables.
To perform the nonpipelined table function, you perform the following steps:
1. From a SQL*Plus session logged on to the SH schema, run create_tf_1.sql, or copy the
following SQL statements into your SQL*Plus session:
@c:\wkdir\create_tf_1.sql
PROMPT SIMPLE PASS-THROUGH, FILTERING OUT obsolete products without product_category 'Electronics'
Rem uses weakly typed cursor as input
CREATE OR REPLACE FUNCTION obsolete_products(cur cursor_pkg.refcur_t) RETURN
product_t_table
IS
prod_id NUMBER(6);
prod_name VARCHAR2(50);
prod_desc VARCHAR2(4000);
prod_subcategory VARCHAR2(50);
prod_subcategory_desc VARCHAR2(2000);
prod_category VARCHAR2(50);
prod_category_desc VARCHAR2(2000);
prod_weight_class NUMBER(2);
prod_unit_of_measure VARCHAR2(20);
prod_pack_size VARCHAR2(30);
supplier_id NUMBER(6);
prod_status VARCHAR2(20);
prod_list_price NUMBER(8,2);
prod_min_price NUMBER(8,2);
sales NUMBER:=0;
objset product_t_table := product_t_table();
i NUMBER := 0;
BEGIN
LOOP
-- Fetch from cursor variable
FETCH cur INTO prod_id, prod_name, prod_desc, prod_subcategory,
prod_subcategory_desc,
prod_category, prod_category_desc, prod_weight_class,
prod_unit_of_measure,
prod_pack_size, supplier_id, prod_status, prod_list_price,
prod_min_price;
EXIT WHEN cur%NOTFOUND; -- exit when last row is fetched
-- Category Electronics is not meant to be obsolete and will be suppressed
IF prod_status='obsolete' AND prod_category != 'Electronics' THEN
-- append to collection
i:=i+1;
objset.extend;
objset(i):=product_t( prod_id, prod_name, prod_desc, prod_subcategory,
prod_subcategory_desc,
prod_category, prod_category_desc, prod_weight_class,
prod_unit_of_measure,
prod_pack_size, supplier_id, prod_status, prod_list_price,
prod_min_price);
END IF;
END LOOP;
CLOSE cur;
RETURN objset;
END;
/
2. You can now transparently SELECT from the defined table function obsolete_products.
From a SQL*Plus session logged on to the SH schema, execute the following SQL script:
@c:\wkdir\use_tf_1.sql
SELECT DISTINCT UPPER(prod_category), prod_status
FROM TABLE(obsolete_products(
CURSOR(SELECT prod_id, prod_name, prod_desc, prod_subcategory,
prod_subcategory_desc,
prod_category, prod_category_desc, prod_weight_class,
prod_unit_of_measure,
prod_pack_size, supplier_id, prod_status, prod_list_price,
prod_min_price
FROM products)));
Note that the input SELECT list for the cursor variable must match the definition of the table
function's input type, in this case cursor_pkg.refcur_t. If you can ensure that the table
definition (in this case PRODUCTS) matches this input cursor, you can simplify the statement
by using SELECT * inside the CURSOR expression.
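For illustration only, assuming the column list of PRODUCTS matched the cursor type exactly, the statement could then be shortened to:
SELECT DISTINCT UPPER(prod_category), prod_status
FROM TABLE(obsolete_products(CURSOR(SELECT * FROM products)));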
Back to Topic
3. Performing Pipelined Table Functions
The following example performs the same filtering as the previous example.
Unlike the first one, this table function returns its result set pipelined, that is,
incrementally. Note that the input REF CURSOR is strongly typed here, meaning it
can only be a SQL statement returning the record type product_type_record.
To perform the pipelined table function, perform the following steps:
1. From a SQL*Plus session logged on to the SH schema, execute the following SQL script:
@c:\wkdir\create_tf_2.sql
Rem same example, pipelined implementation
Rem strong ref cursor (input type is defined)
Rem a table function without a strongly typed input ref cursor
Rem cannot be parallelized
CREATE OR REPLACE FUNCTION obsolete_products_pipe(cur
cursor_pkg.strong_refcur_t)
RETURN product_t_table
PIPELINED
PARALLEL_ENABLE (PARTITION cur BY ANY) IS
prod_id NUMBER(6);
prod_name VARCHAR2(50);
prod_desc VARCHAR2(4000);
prod_subcategory VARCHAR2(50);
prod_subcategory_desc VARCHAR2(2000);
prod_category VARCHAR2(50);
prod_category_desc VARCHAR2(2000);
prod_weight_class NUMBER(2);
prod_unit_of_measure VARCHAR2(20);
prod_pack_size VARCHAR2(30);
supplier_id NUMBER(6);
prod_status VARCHAR2(20);
prod_list_price NUMBER(8,2);
prod_min_price NUMBER(8,2);
sales NUMBER:=0;
BEGIN
LOOP
-- Fetch from cursor variable
FETCH cur INTO prod_id, prod_name, prod_desc, prod_subcategory,
prod_subcategory_desc,
prod_category, prod_category_desc, prod_weight_class,
prod_unit_of_measure,
prod_pack_size, supplier_id, prod_status, prod_list_price,
prod_min_price;
EXIT WHEN cur%NOTFOUND; -- exit when last row is fetched
IF prod_status='obsolete' AND prod_category !='Electronics' THEN
PIPE ROW (product_t( prod_id, prod_name, prod_desc, prod_subcategory,
prod_subcategory_desc,
prod_category, prod_category_desc, prod_weight_class,
prod_unit_of_measure,
prod_pack_size, supplier_id, prod_status, prod_list_price,
prod_min_price));
END IF;
END LOOP;
CLOSE cur;
RETURN;
END;
/
2. You can now transparently SELECT from the defined table function obsolete_products_pipe.
From a SQL*Plus session logged on to the SH schema, execute the following SQL script:
@c:\wkdir\use_tf_2.sql
SELECT DISTINCT prod_category, DECODE(prod_status,'obsolete','NO LONGER AVAILABLE','N/A')
FROM TABLE(obsolete_products_pipe(
CURSOR(SELECT prod_id, prod_name, prod_desc, prod_subcategory,
prod_subcategory_desc,
prod_category, prod_category_desc, prod_weight_class,
prod_unit_of_measure,
prod_pack_size, supplier_id, prod_status, prod_list_price,
prod_min_price
FROM products)));
Back to Topic
4. Performing Transparent Parallel Execution of Table Functions
Table functions can be parallelized transparently without any special intervention or
setup. The only necessary declaration, which has to be done as part of the table function
creation, is to tell the SQL engine if there are any restrictions or rules for parallelizing
its execution.
Think of a table function that processes a complex aggregation operation over some
product category attributes. If you want to process this operation in parallel, you have to
guarantee that all records with the same product category attributes are processed from
the same parallel execution slave, so that your aggregation operation covers all records
that belong to the same grouping. This implies that a distribution rule must be used
during parallel processing, one that guarantees the correct distribution of records as
described above.
This parallel distribution rule is defined as part of the table function header, for
example:
PARALLEL_ENABLE (PARTITION cur BY ANY) IS
All you need to do to enable parallel execution of a table function is to use the syntax
shown.
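The ANY keyword lets the SQL engine distribute the input rows arbitrarily across the parallel execution slaves; it makes no grouping guarantee. For a scenario like the aggregation described above, the distribution can instead be pinned to a column. A sketch of the alternative header clauses (shown in isolation; HASH and RANGE partitioning require a strongly typed REF CURSOR):
-- arbitrary distribution, no grouping guarantee
PARALLEL_ENABLE (PARTITION cur BY ANY)
-- all rows with the same prod_category go to the same parallel execution slave
PARALLEL_ENABLE (PARTITION cur BY HASH (prod_category))
-- range-based distribution on the same column
PARALLEL_ENABLE (PARTITION cur BY RANGE (prod_category))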
To perform one of the table functions you previously defined in parallel, perform the
following steps:
1. From a SQL*Plus session logged on to the SH schema, execute the following SQL script:
@c:\wkdir\pq_session.sql
SELECT *
FROM v$pq_sesstat
WHERE statistic in ('Queries Parallelized','Allocation Height');
2. Now you can use one of the table functions in parallel. You enforce the parallel execution with a
PARALLEL hint on the REF CURSOR. From a SQL*Plus session logged on to the SH schema,
execute the following SQL script:
@c:\wkdir\parallel_tf.sql
SELECT DISTINCT prod_category,
DECODE(prod_status,'obsolete','NO LONGER AVAILABLE','N/A')
FROM TABLE(obsolete_products_pipe(
CURSOR(SELECT /*+ PARALLEL(a,4)*/
prod_id, prod_name, prod_desc, prod_subcategory, prod_subcategory_desc,
prod_category, prod_category_desc, prod_weight_class, prod_unit_of_measure,
prod_pack_size, supplier_id,prod_status, prod_list_price,prod_min_price
FROM products a)));
3. Rerun the SQL script to check the status of the parallelized queries:
@c:\wkdir\pq_session.sql
SELECT *
FROM v$pq_sesstat
WHERE statistic in ('Queries Parallelized','Allocation Height');
Back to Topic
5. Performing Table Functions with Autonomous DML
Although a table function is part of a single atomic transaction, it provides the
additional capability to fan out data to other tables within the scope of an autonomous
transaction. This can be used in a variety of ways, such as for exception or progress
logging or to fan out subsets of data to be used by other independent transformations.
The following example logs some "exception" information into the
OBSOLETE_PRODUCTS_ERRORS table:
1. From a SQL*Plus session logged on to the SH schema, execute the following SQL script:
@c:\wkdir\create_tf_3.sql
CREATE OR REPLACE FUNCTION obsolete_products_dml(cur
cursor_pkg.strong_refcur_t,
prod_cat varchar2 DEFAULT 'Electronics') RETURN product_t_table
PIPELINED
PARALLEL_ENABLE (PARTITION cur BY ANY) IS
PRAGMA AUTONOMOUS_TRANSACTION;
prod_id NUMBER(6);
prod_name VARCHAR2(50);
prod_desc VARCHAR2(4000);
prod_subcategory VARCHAR2(50);
prod_subcategory_desc VARCHAR2(2000);
prod_category VARCHAR2(50);
prod_category_desc VARCHAR2(2000);
prod_weight_class NUMBER(2);
prod_unit_of_measure VARCHAR2(20);
prod_pack_size VARCHAR2(30);
supplier_id NUMBER(6);
prod_status VARCHAR2(20);
prod_list_price NUMBER(8,2);
prod_min_price NUMBER(8,2);
sales NUMBER:=0;
BEGIN
LOOP
-- Fetch from cursor variable
FETCH cur INTO prod_id, prod_name, prod_desc, prod_subcategory,
prod_subcategory_desc,
prod_category, prod_category_desc, prod_weight_class,
prod_unit_of_measure,
prod_pack_size, supplier_id, prod_status, prod_list_price,
prod_min_price;
EXIT WHEN cur%NOTFOUND; -- exit when last row is fetched
IF prod_status='obsolete' THEN
IF prod_category=prod_cat THEN
INSERT INTO obsolete_products_errors VALUES
(prod_id, 'correction: category '||UPPER(prod_cat)||' still available');
COMMIT;
ELSE
PIPE ROW (product_t( prod_id, prod_name, prod_desc, prod_subcategory,
prod_subcategory_desc,
prod_category, prod_category_desc, prod_weight_class,
prod_unit_of_measure,
prod_pack_size, supplier_id, prod_status, prod_list_price,
prod_min_price));
END IF;
END IF;
END LOOP;
CLOSE cur;
RETURN;
END;
/
2. Now you can select from this table function and check the log table. From a SQL*Plus session
logged on to the SH schema, execute the following SQL script:
@c:\wkdir\use_tf_3a.sql
TRUNCATE TABLE obsolete_products_errors;
SELECT DISTINCT prod_category, prod_status FROM TABLE(obsolete_products_dml(
CURSOR(SELECT prod_id, prod_name, prod_desc, prod_subcategory,
prod_subcategory_desc,
prod_category, prod_category_desc, prod_weight_class,
prod_unit_of_measure,
prod_pack_size, supplier_id, prod_status, prod_list_price,
prod_min_price
FROM products)));
SELECT DISTINCT msg FROM obsolete_products_errors;
3. Select from this table function again with a different input argument and check the log table.
@c:\wkdir\use_tf_3b.sql
TRUNCATE TABLE obsolete_products_errors;
SELECT DISTINCT prod_category, prod_status FROM TABLE(obsolete_products_dml(
CURSOR(SELECT prod_id, prod_name, prod_desc, prod_subcategory,
prod_subcategory_desc,
prod_category, prod_category_desc, prod_weight_class,
prod_unit_of_measure,
prod_pack_size, supplier_id, prod_status, prod_list_price,
prod_min_price
FROM products),'Photo'));
SELECT DISTINCT msg FROM obsolete_products_errors;
You see that, depending on the input variable, the table function processes the input rows
differently, thus delivering a different result set and inserting different information into
obsolete_products_errors.
Back to Topic
6. Performing Seamless Streaming Through Several Table Functions
Besides the transparent usage of table functions within SQL statements and their
capability of being processed in parallel, one of their biggest advantages is that table
functions can be called from within each other. Furthermore, table functions can be used
in any SQL statement and can become the input for all kinds of DML statements.
To see how this is done, perform the following steps:
1. This SQL statement uses two table functions, nested in each other. From a SQL*Plus session
logged on to the SH schema, execute the following SQL script:
@c:\wkdir\use_tf_stream.sql
SELECT DISTINCT prod_category, prod_status
FROM TABLE(obsolete_products_dml(CURSOR(SELECT *
FROM TABLE(obsolete_products_pipe(
CURSOR(SELECT prod_id, prod_name, prod_desc, prod_subcategory,
prod_subcategory_desc,
prod_category, prod_category_desc, prod_weight_class,
prod_unit_of_measure,
prod_pack_size, supplier_id, prod_status, prod_list_price,
prod_min_price
FROM products))))));
SELECT COUNT(*) FROM obsolete_products_errors;
SELECT DISTINCT msg FROM obsolete_products_errors;
set echo off
2. Now you can use this table function as INPUT for a CREATE TABLE AS SELECT
command. From a SQL*Plus session logged on to the SH schema, execute the following SQL
script:
@c:\wkdir\CTAS_tf.sql
CREATE TABLE PIPE_THROUGH AS
SELECT DISTINCT prod_category, prod_status
FROM TABLE(obsolete_products_dml(CURSOR(SELECT *
FROM TABLE(obsolete_products_pipe(
CURSOR(SELECT prod_id, prod_name, prod_desc, prod_subcategory,
prod_subcategory_desc,
prod_category, prod_category_desc, prod_weight_class,
prod_unit_of_measure,
prod_pack_size, supplier_id, prod_status, prod_list_price,
prod_min_price
FROM products))))));
3. Before you move to the next section of this tutorial, you need to clean up your environment by
executing the following script:
@c:\wkdir\cleanup_tf.sql
DROP TABLE obsolete_products_errors;
DROP TABLE pipe_through;
DROP FUNCTION obsolete_products;
DROP FUNCTION obsolete_products_pipe;
DROP FUNCTION obsolete_products_dml;
Back to Topic
Back to Topic List
Using Synchronous Change Data Capture (CDC) to Capture and
Consume Incremental Source Changes
Data warehousing involves the extraction and transportation of relational data from one
or more source databases, into the data warehouse for analysis. Change Data Capture
quickly identifies and processes only the data that has changed, not entire tables, and
makes the change data available for further use.
Without Change Data Capture, database extraction is a cumbersome process in which
you move the entire contents of tables into flat files, and then load the files into the data
warehouse. This ad hoc approach is expensive in a number of ways.
Change Data Capture does not depend on intermediate flat files to stage the data outside
of the relational database. It captures the change data resulting from INSERT,
UPDATE, and DELETE operations made to user tables. The change data is then stored in
a database object called a change table, and the change data is made available to
applications in a controlled way.
Publish and Subscribe Model
Most Change Data Capture systems have one publisher that captures and publishes
change data for any number of Oracle source tables. There can be multiple subscribers
accessing the change data. Change Data Capture provides PL/SQL packages to
accomplish the publish and subscribe tasks.
Publisher
The publisher is usually a database administrator (DBA) who is in charge of creating
and maintaining schema objects that make up the Change Data Capture system. The
publisher performs the following tasks:
Determines the relational tables (called source tables) from which the data
warehouse application is interested in capturing change data.
Uses the Oracle supplied package, DBMS_LOGMNR_CDC_PUBLISH, to set up the
system to capture data from one or more source tables.
Publishes the change data in the form of change tables.
Allows controlled access to subscribers by using the SQL GRANT and REVOKE
statements to grant and revoke the SELECT privilege on change tables for users and
roles.
Subscriber
The subscribers, usually applications, are consumers of the published change data.
Subscribers subscribe to one or more sets of columns in source tables. Subscribers
perform the following tasks:
Create a subscription and subscribe to the source tables and columns of interest, using
the DBMS_CDC_SUBSCRIBE package.
Activate the subscription and extend the subscription window to receive a block of
change data, which is read through subscriber views.
Drop the subscriber view and purge the subscription window when finished
processing a block of changes.
Drop the subscription when the subscriber no longer needs its change data.
Please consult the ‘Oracle Data Warehousing Guide’ for a more detailed discussion
about the Change Data Capture Architecture.
To learn more about this topic, perform the following steps:
1. Use synchronous CDC to track all the incremental changes.
2. Create a change table.
3. Subscribe to a change set and to all source table columns of interest.
4. Activate a subscription and extend the subscription window.
5. Investigate how to handle the new environment over time.
6. Run the publisher.
7. Drop the used change view and purge the subscription window.
8. Clean up the CDC environment.
Back to Topic
Back to Topic List
1. Using Synchronous CDC to Track all the Incremental Changes
You use synchronous CDC to track all the incremental changes for the table PRODUCTS.
(For demonstration purposes, the publisher and the subscriber are the same database user.)
To see how this is done, perform the following steps:
1. First create a new intermediate table where you apply all changes. From a SQL*Plus session
logged on to the SH schema, run cr_cdc_target.sql, or copy the following SQL
statements into your SQL*Plus session:
@c:\wkdir\cr_cdc_target.sql
CREATE TABLE my_price_change_Electronics
(prod_id number, prod_min_price number, prod_list_price number, when date);
2. If you have never used Change Data Capture before, you won't see any CDC-related objects other
than the synchronous change set SYNC_SET and the synchronous change source
SYNC_SOURCE. Both of these objects are created as part of the database creation.
@c:\wkdir\show_cdc_env1.sql
SELECT * FROM change_sources;
PROMPT see the change tables
Rem shouldn't show anything
SELECT * FROM change_tables;
PROMPT see the change sets SYNC_SET
SELECT decode(to_char(end_date,'dd-mon-yyyy hh24:mi:ss'),null,'No end date set.') end_date,
decode(to_char(freshness_date,'dd-mon-yyyy hh24:mi:ss'),null,'No freshness date set.') freshness_date
FROM change_sets WHERE set_name='SYNC_SET';
Back to Topic
2. Creating a Change Table
All changes are stored in change tables. A change table has a one-to-one dependency on
a source table; it consists of several fixed metadata columns and a dynamic set of
columns equivalent to the identified source columns of interest.
1. You create a change table by using the provided package DBMS_CDC_PUBLISH.
This script creates a change table called PROD_PRICE_CT and the necessary trigger
framework to track all subsequent changes on PRODUCTS. Note that the tracking is done as part
of the atomic DML operation against PRODUCTS.
See the static metadata columns as well as the columns PROD_ID, PROD_MIN_PRICE, and
PROD_LIST_PRICE (equivalent to source table PRODUCTS).
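The script itself is not listed in this excerpt. A sketch of the kind of call it would contain; the parameter values are assumptions based on the change table described here, not the tutorial's exact script:
BEGIN
  DBMS_CDC_PUBLISH.CREATE_CHANGE_TABLE(
    owner             => 'sh',
    change_table_name => 'prod_price_ct',
    change_set_name   => 'SYNC_SET',
    source_schema     => 'sh',
    source_table      => 'products',
    column_type_list  => 'prod_id number(6), prod_min_price number(8,2), prod_list_price number(8,2)',
    capture_values    => 'both',
    rs_id             => 'y',
    row_id            => 'n',
    user_id           => 'n',
    timestamp         => 'y',
    object_id         => 'n',
    source_colmap     => 'y',
    target_colmap     => 'y',
    options_string    => NULL);
END;
/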
3. To see the metadata of the published source table and the change tables in the system, issue the
following SQL commands:
@c:\wkdir\show_cdc_env2.sql
prompt see the published source tables
SELECT * FROM dba_source_tables;
prompt see published source columns
SELECT source_schema_name,source_table_name, column_name
FROM dba_source_tab_columns;
prompt see the change tables
SELECT * FROM change_tables;
Back to Topic
3. Subscribing to a Change Set and to All Source Table Columns of Interest
You can subscribe to a change set for controlled access to the published change data for
analysis. Note that the logical entity for a subscription handle is a change set and not a
change table. A change set can consist of several change tables and guarantees logical
consistency for all of its change tables. After subscribing to a change set, you have to
subscribe to all source table columns of interest.
1. You must get a unique subscription handle that is used throughout the session and tell the system
what columns you are interested in. You are using the existing change set SYNC_SET. This
functionality is provided by the package DBMS_CDC_SUBSCRIBE.
@c:\wkdir\subs_cdc_ct.sql
variable subname varchar2(30)
begin
:subname := 'my_subscription_no_1';
DBMS_CDC_SUBSCRIBE.CREATE_SUBSCRIPTION (
CHANGE_SET_NAME => 'SYNC_SET',
DESCRIPTION => 'Change data PRODUCTS for ELECTRONICS',
SUBSCRIPTION_name => :subname);
END;
/
PRINT subname
On the subscriber side, you have identified and subscribed to all source tables (and
columns) of interest and defined/created your change view, as sketched below.
Now you are ready to use CDC.
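The call that subscribes to the source columns and names the subscriber (change) view is not reproduced in this excerpt; it would look roughly like the following sketch, with the column list and view name inferred from their later use:
BEGIN
  DBMS_CDC_SUBSCRIBE.SUBSCRIBE(
    subscription_name => :subname,
    source_schema     => 'sh',
    source_table      => 'products',
    column_list       => 'prod_id, prod_min_price, prod_list_price',
    subscriber_view   => 'my_prod_price_change_view');
END;
/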
Back to Topic
4. Activating a Subscription and Extending the Subscription Window
The first step after setting up the publish and subscribe framework is to activate a
subscription and to extend the subscription window to see all changes that have taken
place since the last subscription window extension or since the activation of a
subscription.
1. Activate and extend a subscription window. You will also have a look into the metadata.
@c:\wkdir\sub_cdc1.sql
rem now activate the subscription since we are ready to receive change data.
rem The ACTIVATE_SUBSCRIPTION procedure sets the subscription window to empty initially.
rem At this point, no additional source tables can be added to the subscription.
EXEC DBMS_LOGMNR_CDC_SUBSCRIBE.ACTIVATE_SUBSCRIPTION -
(SUBSCRIPTION_name => 'my_subscription_no_1')
UPDATE products
SET prod_list_price=prod_list_price*1.1
WHERE prod_min_price > 100;
COMMIT;
Rem you will see entries in the change table
Rem note that we have entries for the old and the new values
4. You can now select from this system-generated change view, and you will see only the changes
that have happened for your time window.
@c:\wkdir\sel_cdc_cv1.sql
Rem changes classified for specific product groups
SELECT p1.prod_id, p2.prod_category, p1.prod_min_price,
p1.prod_list_price, commit_timestamp$
FROM my_prod_price_change_view p1, products p2
WHERE p1.prod_id=p2.prod_id
AND operation$='UN';
PROMPT and especially the Electronics' ones - 3 records only
SELECT p1.prod_id, p2.prod_category, p1.prod_min_price,
p1.prod_list_price, commit_timestamp$
FROM my_prod_price_change_view p1, products p2
WHERE p1.prod_id=p2.prod_id
AND operation$='UN'
AND p2.prod_category='Electronics';
In the example shown above, you are joining back to the source table PRODUCTS. When you set
up a CDC environment, ensure that such an operation is not necessary; if possible,
a change table should be usable as a stand-alone source for synchronizing your target environment.
Joining back to the source table may reduce the data stored on disk, but it creates additional
workload on the source site. Moreover, a join-back cannot be performed on a system other than
the source system.
COMMIT;
Back to Topic
5. Investigating How to Handle the Environment Over Time
So far, you have performed your first consumption with the synchronous CDC framework.
Now investigate how to handle such an environment over time, as additional changes
happen and old changes are consumed by all subscribers.
1. Other DML operations take place on the source table PRODUCTS.
@c:\wkdir\dml_cdc2.sql
PROMPT other changes will happen
UPDATE products
SET prod_min_price=prod_min_price*1.1
WHERE prod_min_price < 10;
COMMIT;
2. The synchronous CDC framework tracks these DML operations transparently in the change
table.
@c:\wkdir\show_cdc_ct2.sql
SELECT count(*) FROM prod_price_ct ;
PROMPT and especially the ELECTRONICS' ones
SELECT COUNT(*) FROM prod_price_ct p1, products p2
WHERE p1.prod_id=p2.prod_id AND p2.prod_category='Electronics';
3. Run the following script to view the product data again and then purge the subscription window.
@c:\wkdir\sel_cdc_cv1.sql
Rem changes classified for specific product groups
EXEC DBMS_CDC_SUBSCRIBE.PURGE_WINDOW -
(SUBSCRIPTION_name => 'my_subscription_no_1');
PROMPT still the same number of records in the change table
PROMPT REMEMBER THE NUMBER OF ROWS !!!
SELECT COUNT(*) FROM prod_price_ct;
PROMPT ... change view is empty
SELECT COUNT(*) FROM my_prod_price_change_view p1, products p2
WHERE p1.prod_id=p2.prod_id AND p2.prod_category='Electronics';
5. To get the new changes reflected in the change view, you have to extend the time window for the
change view.
@c:\wkdir\ext_cdc_sub_window1.sql
PROMPT let's get the new change
Rem 'do it again Sam'
Rem first extend the window you want to see
EXEC DBMS_CDC_SUBSCRIBE.EXTEND_WINDOW -
(SUBSCRIPTION_name => 'my_subscription_no_1');
Rem ... now you will see exactly the new changed data since the last consumption
SELECT COUNT(*) FROM my_prod_price_change_view p1, products p2
WHERE p1.prod_id=p2.prod_id AND p2.prod_category='Electronics';
The new change view shows exactly the changes that have not been consumed yet. This is different
from the content of the change table.
6. Now consume the new changes. Because the incremental changes you are interested in are stored
in a change table (a "normal" Oracle table), you can use any language or SQL construct the
database supports.
@c:\wkdir\consume_cdc2.sql
PROMPT the new changes will be used
MERGE INTO my_price_change_Electronics t
USING (
SELECT p1.prod_id, p1.prod_min_price, p1.prod_list_price, commit_timestamp$
FROM my_prod_price_change_view p1, products p2
WHERE p1.prod_id=p2.prod_id AND p2.prod_category='Electronics' AND
operation$='UN') cv
ON (cv.prod_id = t.prod_id)
WHEN MATCHED THEN
UPDATE SET t.prod_min_price=cv.prod_min_price,
t.prod_list_price=cv.prod_list_price
WHEN NOT MATCHED THEN
INSERT VALUES (cv.prod_id, cv.prod_min_price, cv.prod_list_price,
commit_timestamp$);
COMMIT;
rem look at them
SELECT prod_id, prod_min_price, prod_list_price,
to_char(when,'dd-mon-yyyy hh24:mi:ss')
FROM my_price_change_electronics;
Back to Topic
6. Running the Publisher
The Publisher is responsible for maintaining the CDC framework and guaranteeing that
the change tables are purged regularly as soon as all subscribers have consumed a
specific set of change information.
1. From a SQL*Plus session logged on to the SH schema, run purge_ct.sql, or copy the
following SQL statements into your SQL*Plus session:
@c:\wkdir\purge_cdc_ct.sql
PROMPT what to do to avoid overflow of change table ? PURGE IT !
Rem only the rows where we have no potential subscribers will be purged !!!
Rem we guarantee to keep all rows as long as we have subscribers which haven't
rem purged their window ...
exec DBMS_CDC_PUBLISH.purge_change_table('sh','prod_price_ct')
PROMPT REMEMBER THE NUMBER OF ROWS !!!
Rem you will have 18 entries for the last updated 9 records
SELECT COUNT(*) FROM prod_price_ct;
PROMPT this is exactly twice the number of changes we made with the second DML operation
SELECT COUNT(*) FROM prod_price_ct p1, products p2
WHERE p1.prod_id=p2.prod_id AND p2.prod_category='Electronics';
Although we haven't specified any specific time window for the purge operation of the change
table, the CDC framework ensures that only those changes that are no longer needed by any
subscriber are purged.
Back to Topic
7. Dropping the Used Change View and Purging the Subscription Window
1. Because you have already consumed the second set of changes, you can drop your used change view
and purge your subscription window. You will also purge the change table.
@c:\wkdir\purge_cdc_sub_window2.sql
PROMPT ... purge the newly consumed data from the subscription window (2nd DML operation)
EXEC DBMS_CDC_SUBSCRIBE.PURGE_WINDOW -
(SUBSCRIPTION_name =>'my_subscription_no_1');
PROMPT purge all change tables again
EXEC DBMS_CDC_PUBLISH.purge_change_table('sh','prod_price_ct')
PROMPT ... and you will see an empty change table
SELECT * FROM prod_price_ct;
The change table is now empty. All changes are consumed and can be purged.
Back to Topic
8. Cleaning Up the CDC Environment
Clean up the CDC environment. To do this, perform the following step:
1. From a SQL*Plus session logged on to the SH schema, run cleanup_cdc.sql, or copy the
following SQL statements into your SQL*Plus session:
@c:\wkdir\cleanup_cdc.sql
PROMPT CLEAN UP
exec DBMS_CDC_SUBSCRIBE.drop_subscription -
(subscription_name=> 'my_subscription_no_1');
exec DBMS_CDC_PUBLISH.DROP_CHANGE_TABLE (OWNER => 'sh', -
CHANGE_TABLE_NAME => 'prod_price_CT', -
FORCE_FLAG => 'Y');
DROP TABLE my_price_change_electronics;
UPDATE products SET prod_list_price=prod_list_price/1.1
WHERE prod_min_price > 100;
UPDATE products SET prod_list_price=prod_list_price/1.1
WHERE prod_min_price < 10;
COMMIT;
Back to Topic
Back to Topic List
Propagate Information for a Data Mart
Besides its central data warehouse, MyCompany is running several small divisional data
marts. For example, the products department wants to receive all transactional SALES
data partitioned by its main product categories for marketing campaign analysis; only
the SALES data of 2000 is relevant. You will address the business problem by using
Oracle’s Transportable Tablespace capabilities and Oracle’s LIST partitioning.
Furthermore, to guarantee a successful completion of the generation process of this new
table, you run the statement in RESUMABLE mode, thus ensuring that any space
problem will not cause the creation process to fail.
Note: To demonstrate the benefit of resumable statements, you need TWO sessions and
therefore TWO windows. Please read the following section CAREFULLY before
taking the exercise.
To propagate from a data warehouse to a data mart, you perform the following steps:
1. Enable a RESUMABLE session.
2. Create a new tablespace as a potential transportable tablespace.
3. Create a LIST partitioned table in the new tablespace.
4. Leverage the new RESUMABLE statement capabilities for efficient error detection
and handling.
5. Create a new RANGE_LIST partitioned table (this is Oracle9i Release 2
functionality).
6. Prepare the metadata export for a transportable tablespace.
This puts your session into the so-called resumable mode, names it, and enables a maximum
suspension time of 1200 seconds.
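The statement that enables the resumable session is not reproduced in this excerpt; it has the following form (the session name is illustrative):
ALTER SESSION ENABLE RESUMABLE TIMEOUT 1200 NAME 'create_sales_mart';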
Back to Topic
2. Creating a New Tablespace as Potential Transportable Tablespace
Now create an additional tablespace for storing our new LIST partitioned fact table. To
do this, perform the following steps:
1. First, make sure that the tablespace MY_OBE_TRANSFER does not already exist. From a SQL*Plus
session logged on to the SH schema, execute the following SQL script:
@c:\wkdir\drop_TS.sql
DROP TABLESPACE my_obe_transfer INCLUDING CONTENTS AND DATAFILES;
The tablespace is dropped. If you get an ORA-00959 message, you can ignore it; it means that
the tablespace does not exist.
2. From a SQL*Plus session logged on to the SH schema, execute the following SQL script:
NOTE: If you are using a different drive or path for your data files, the script will need to be
edited accordingly.
@c:\wkdir\create_ts.sql
CREATE TABLESPACE my_obe_transfer DATAFILE 'c:\obetemp' SIZE 2M REUSE
autoextend off;
The tablespace is intentionally defined too small for the target table to enforce a space error
during table creation.
Back to Topic
Back to Topic
4. Leveraging the New RESUMABLE Statement Capabilities for Efficient Error
Detection and Handling
While the CREATE TABLE AS SELECT is running in Window 1, you can control
the status of all sessions running in RESUMABLE mode in Window 2.
1. From a SQL*Plus session logged on to the SH schema, execute the following SQL statement:
@c:\wkdir\run_sel_w2.sql
SELECT NAME, STATUS, ERROR_MSG FROM dba_resumable;
As long as no error occurs, the session stays in a normal status. As soon as an error occurs, your
session status changes, and the ORA-error causing this problem is displayed. The current
statement in the RESUMABLE session will be suspended until either (a) the problem is fixed, or
(b) the timeout limit is reached. The error will also show up in the alert.log file of the running
instance.
2. Now fix the problem manually by issuing the following datafile modification. From a SQL*Plus
session logged on as SYSTEM, execute the following SQL script:
@c:\wkdir\fix_ts.sql
ALTER DATABASE DATAFILE 'c:\obetemp' AUTOEXTEND ON NEXT 5M;
The tablespace can now be autoextended by the database itself without any further interaction.
3. As soon as the error is fixed, the suspended session automatically resumes and continues
processing. Check the data dictionary view for change of status. From a SQL*Plus session
logged on to the SH schema, check the status again:
@c:\wkdir\run_sel_w2.sql
SELECT NAME, STATUS, ERROR_MSG FROM dba_resumable;
The resumption of the statement occurs in the previously suspended window.
Back to Topic
5. Creating a Range-List Partitioned Table in the Transfer Tablespace
With the additional RANGE-LIST partitioning strategy, you can use a composite
partitioning method to subdivide a partitioned table based on two logical attributes. The
main partitioning strategy is the well-known and most widely used range partitioning
strategy, found in nearly every data warehouse implementation that deals with rolling
window operations. Furthermore, every range partition is subpartitioned based on a list
partitioning strategy, enabling a finer partition granularity for addressing your business
needs.
A typical example leveraging Range-List Partitioning is a global Retail environment,
having time range partitions (Rolling Window) and a region-oriented list partition
underneath, so that you can maintain every time window for a specific region
independent of each other.
List partitioned tables enable you to group sets of distinct, unrelated values together in
one partition. Consequently, any value not covered by an existing partition definition
raises an Oracle error. Although this enforces a business constraint (you should not see a
non-existent region in your data warehouse), you cannot always be sure that such
violations will not occur. Oracle therefore introduced the capability to create a DEFAULT
list partition, a kind of catch-all partition for all undefined partition key values.
Oracle has also introduced the so-called subpartition template technology, a common
optional element of both range-hash and range-list composite partitioning. The template
provides an easy and convenient way to define default subpartitions for each table
partition. Oracle will create these default subpartitions in any partition for which you do
not explicitly define subpartitions. This clause is useful for creating symmetric
partitions.
In this section, you create a Range-List partitioned table, leveraging all of the
functionality mentioned above.
1. From a SQL*Plus session logged on to the SH schema, run the following SQL script, or copy the
SQL statements into your SQL*Plus session:
@c:\wkdir\create_new_range_list.sql
CREATE TABLE sales_rlp
COMPRESS
TABLESPACE MY_OBE_TRANSFER
PARTITION BY RANGE (time_id)
SUBPARTITION BY LIST (channel_id)
SUBPARTITION TEMPLATE
( SUBPARTITION direct values (3),
SUBPARTITION internet values (4),
SUBPARTITION partner values (2),
SUBPARTITION other values (DEFAULT)
)
(PARTITION SALES_before_1999 VALUES LESS THAN (TO_DATE('01-JAN-1999','DD-MON-YYYY')),
PARTITION SALES_Q1_1999 VALUES LESS THAN (TO_DATE('01-APR-1999','DD-MON-YYYY')),
PARTITION SALES_Q2_1999 VALUES LESS THAN (TO_DATE('01-JUL-1999','DD-MON-YYYY')),
PARTITION SALES_Q3_1999 VALUES LESS THAN (TO_DATE('01-OCT-1999','DD-MON-YYYY')),
PARTITION SALES_Q4_1999 VALUES LESS THAN (TO_DATE('01-JAN-2000','DD-MON-YYYY')),
PARTITION SALES_Q1_2000 VALUES LESS THAN (TO_DATE('01-APR-2000','DD-MON-YYYY')),
PARTITION SALES_Q2_2000 VALUES LESS THAN (TO_DATE('01-JUL-2000','DD-MON-YYYY')),
PARTITION SALES_Q3_2000 VALUES LESS THAN (TO_DATE('01-OCT-2000','DD-MON-YYYY')),
PARTITION SALES_Q4_2000 VALUES LESS THAN (MAXVALUE) NOCOMPRESS)
AS
SELECT * FROM sales sample(10);
This example also shows how you can easily create compressed and uncompressed partitions as
part of an initial table creation. The partition and template names are inherited by all the
subpartitions, thus making the naming and identification of subpartitions much easier and more
convenient.
2. From a SQL*Plus session logged on to the SH schema, run the following SQL script, or copy the
SQL statements into your SQL*Plus session:
@c:\wkdir\show_range_list_names.sql
PROMPT as you can see, the partition and template name is inherited to all the subpartitions,
PROMPT thus making the naming - and identification - of subpartitions much easier ...
select partition_name, subpartition_name, high_value from
user_tab_subpartitions
where table_name='SALES_RLP';
A list or range-list partitioned table groups sets of distinct, unrelated values together in one
partition. Consequently, any value not covered by an existing partition definition raises an
Oracle error. You will now create a DEFAULT list partition for the previously created list
partitioned table sales_prod_dept, a kind of catch-all partition for all undefined
partition key values.
3. From a SQL*Plus session logged on to the SH schema, run the following SQL script, or copy the
SQL statements into your SQL*Plus session:
@c:\wkdir\cr_default_list_part.sql
PROMPT add a new partition to sales_prod_dept that does NOT have a DEFAULT partition
Rem it is added just like for a range partitioned table
ALTER TABLE sales_prod_dept ADD PARTITION gameboy_sales VALUES ('Gameboy');
All records having an undefined partitioning key will be stored in the DEFAULT partition
other_sales. However, having a DEFAULT partition changes the way you add new partitions
to this table. Conceptually, as soon as you have created a default partition, you cover all
possible values for the partitioning key: the defined ones and "all others".
To ADD a new partition to this table, you logically have to SPLIT the default partition into a new
partition with a set of defined keys and a new default partition, still covering "all other values"
(now reduced by the keys we have specified for the new partition). Any attempt to add a new
partition like you did before will raise an ORA error message.
4. From a SQL*Plus session logged on to the SH schema, run the following SQL script, or copy the
SQL statements into your SQL*Plus session:
@c:\wkdir\split_default_list_part.sql
PROMPT Unlike the first time, we cannot simply add a new partition
PROMPT raises ORA-14323: cannot add partition when DEFAULT partition exists
Rem we cannot be sure whether the new value already exists in the DEFAULT partition
ALTER TABLE sales_prod_dept ADD PARTITION undefined_sales VALUES
('Undefined');
This will raise an Oracle error. Run the following commands or the script
split_default_list_part_b.sql
@c:\wkdir\split_default_list_part_b.sql
PROMPT so we have to SPLIT the default partition to ensure that
PROMPT all potential values of 'Undefined' are in the new partition
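The split statement itself is not listed in this excerpt. A sketch of the required form, assuming (as mentioned above) that the existing DEFAULT partition is named other_sales:
ALTER TABLE sales_prod_dept
SPLIT PARTITION other_sales VALUES ('Undefined')
INTO (PARTITION undefined_sales, PARTITION other_sales);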
2. You can now export the data dictionary information of tablespace my_obe_transfer as
follows:
@c:\wkdir\export_metadata.sql
Optionally, you can create a specific directory where you want to store the export dump file:
CREATE DIRECTORY my_obe_dump_dir as 'c:\wkdir';
Now you can export the metadata only. Compare the size of the dump file with the tablespace size
to get a feeling for what it would have meant to extract all the data and not only the metadata.
expdp \'/ as sysdba\' DIRECTORY=my_obe_dump_dir DUMPFILE=meta_MY_OBE_TRANSFER.dmp
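Note that for a transportable tablespace export, the Data Pump command would normally also name the tablespace being transported (and the tablespace must be read-only at that point); a sketch of the fuller form:
expdp \'/ as sysdba\' DIRECTORY=my_obe_dump_dir DUMPFILE=meta_MY_OBE_TRANSFER.dmp TRANSPORT_TABLESPACES=my_obe_transfer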
Back to Topic
Back to Topic List
Cleanup
To clean up your environment, you will need to perform the following step:
1. From a SQL*Plus session logged on to the SH schema, execute the following commands:
SET SERVEROUTPUT ON
EXEC dw_handsOn.cleanup_modules
Back to Topic List
Summary
In this lesson, you learned how to:
Perform a multitable insert
Perform an upsert