CBQT DP - Distinct Placement
Query Transformation
Query transformation rewrites the original query into a semantically equivalent query at parse time. CBQT, cost-based query transformation, searches for the optimal semantically equivalent query based on cost calculation. Almost all RDBMSs have a query transformation phase. Even Apache Hive has query transformation, although it is simple and rule based. In Oracle, even the simplest query goes through the CBQT phase. The sophisticated CBQT process usually generates more efficient execution plans for very complex queries, but it adds overhead for very simple queries, especially when you compare the process with MySQL.
Side note: ANSI SQL join syntax such as A JOIN B ON (A.C1=B.C2) is not Oracle's native join syntax. Oracle treats each such join as a view. For example, A JOIN B ON (A.C1=B.C2) JOIN C ON (B.C3=C.C4) becomes SELECT ... FROM (SELECT ... FROM A, B WHERE A.C1=B.C2) V, C WHERE V.C3=C.C4. View merging is then considered, and CBQT may be applied to each query block.
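The ANSI-to-native rewrite in the side note can be sketched on three hypothetical tables A, B and C. The explicit column lists are an assumption added only to keep the statements valid. The optimizer's actual intermediate text may differ and is best read from a 10053 trace.

```sql
-- 1. ANSI join syntax, as written by the user (A, B, C are hypothetical tables)
SELECT a.c1, b.c3, c.c4
FROM   a
JOIN   b ON (a.c1 = b.c2)
JOIN   c ON (b.c3 = c.c4);

-- 2. Roughly the starting point after the ANSI rewrite:
--    the first join is wrapped in an inline view V
SELECT v.c1, v.c3, c.c4
FROM   (SELECT a.c1, b.c3
        FROM   a, b
        WHERE  a.c1 = b.c2) v,
       c
WHERE  v.c3 = c.c4;

-- 3. After view merging: one query block with native join predicates
SELECT a.c1, b.c3, c.c4
FROM   a, b, c
WHERE  a.c1 = b.c2
AND    b.c3 = c.c4;
```

All three forms return the same rows. The difference is only in how many query blocks the optimizer has to consider.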
Distinct Placement
The following is a simple query with a two-table join and a final SELECT DISTINCT.
select distinct dtpc.targ_profile_id
from udd_dim.dim_targ_prof_content_topic dtpc
join udd_dim.map_sp_id_cont_topic msic
  on msic.cont_topic_id = dtpc.targ_cont_topic_id
where dtpc.data_source_id = 4
  and dtpc.tp_cont_top_include_flag = 1
The Questions
Why does the two-table join have such a long plan, and why is an internal view generated?
The view name VW_DTP_xxx (here VW_DTP_C252A5A8) in the plan indicates that CBQT distinct placement is at work.
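A manual rewrite gives a rough picture of what distinct placement does to this query. The sketch below assumes the DISTINCT is pushed down to the MAP_SP_ID_CONT_TOPIC side, because that table contributes only the join column. The alias vw_dtp is a hypothetical stand-in for the system-generated VW_DTP_C252A5A8, whose real text should be taken from the 10053 trace.

```sql
-- Hand-written approximation of the distinct placement transformation.
-- Assumption: DISTINCT is placed on MAP_SP_ID_CONT_TOPIC before the join.
SELECT DISTINCT dtpc.targ_profile_id
FROM   udd_dim.dim_targ_prof_content_topic dtpc,
       (SELECT DISTINCT msic.cont_topic_id
        FROM   udd_dim.map_sp_id_cont_topic msic) vw_dtp  -- hypothetical alias
WHERE  vw_dtp.cont_topic_id = dtpc.targ_cont_topic_id
AND    dtpc.data_source_id = 4
AND    dtpc.tp_cont_top_include_flag = 1;
```

Because the outer query still applies DISTINCT, removing duplicate join keys early does not change the result. It only shrinks the row source that feeds the join.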
Without DP, the estimated join cardinality is 2,163,072 * 3,571,698 / max(508, 193) = 14,978,413,480, roughly 15 billion rows. That puts a huge burden on the final DISTINCT, and it gets worse if this join is only an intermediate step feeding further table joins or other operations.
With DP, the join cardinality is 508 * 3,571,698 / max(508, 193) = 3,571,698. The plan shows about 710K instead, because the 10053 trace file uses a selectivity of 0.000392 rather than 1/508. This needs further investigation; one possible cause is a histogram.
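The estimates above follow the usual join cardinality formula: rows on the left, times rows on the right, divided by the larger number of distinct join-key values. As a minimal sketch, the arithmetic can be rechecked from any Oracle session. The column aliases are made up for illustration.

```sql
-- Join cardinality ~ card(left) * card(right) / max(ndv(left key), ndv(right key))
SELECT 2163072 * 3571698 / GREATEST(508, 193)  AS card_without_dp,
       508     * 3571698 / GREATEST(508, 193)  AS card_with_dp,
       ROUND(508 * 3571698 * 0.000392)         AS card_with_trace_sel
FROM   dual;
```

With the inputs exactly as written, the first expression comes out at about 15.2 billion, slightly above the figure quoted above, so the quoted value presumably came from slightly different statistics. The third expression, which uses the selectivity from the trace, gives roughly 711K and matches the 710K shown in the plan.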
The following is an excerpt from the 10053 trace file, which shows the query transformation process and the full text of the system-generated inline view VW_DTP_C252A5A8. CBQT also makes 10053 trace files harder to read, because it adds many more combinations on top of access path, join type and join order.
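For readers who want to produce such a trace themselves, the sketch below shows one common way to do it. The trace file identifier and the comment tags in the queries are arbitrary. The NO_PLACE_DISTINCT hint is, to my knowledge, the documented inverse of PLACE_DISTINCT, and its availability on a given release can be checked in V$SQL_HINT.

```sql
-- Tag the trace file so it is easy to find (identifier is arbitrary)
ALTER SESSION SET tracefile_identifier = 'cbqt_dp';
ALTER SESSION SET EVENTS '10053 trace name context forever, level 1';

-- The statement must be hard parsed to be traced; a new comment forces that
SELECT DISTINCT /* dp_trace_1 */ dtpc.targ_profile_id
FROM   udd_dim.dim_targ_prof_content_topic dtpc
JOIN   udd_dim.map_sp_id_cont_topic msic
  ON   msic.cont_topic_id = dtpc.targ_cont_topic_id
WHERE  dtpc.data_source_id = 4
AND    dtpc.tp_cont_top_include_flag = 1;

-- Same query with distinct placement disabled, for comparison
SELECT /*+ NO_PLACE_DISTINCT */ DISTINCT /* dp_trace_2 */ dtpc.targ_profile_id
FROM   udd_dim.dim_targ_prof_content_topic dtpc
JOIN   udd_dim.map_sp_id_cont_topic msic
  ON   msic.cont_topic_id = dtpc.targ_cont_topic_id
WHERE  dtpc.data_source_id = 4
AND    dtpc.tp_cont_top_include_flag = 1;

ALTER SESSION SET EVENTS '10053 trace name context off';

-- Check which distinct placement hints this release knows about
SELECT name, inverse, version
FROM   v$sql_hint
WHERE  name LIKE '%PLACE_DISTINCT%';
```

Comparing the two sections of the trace shows whether the VW_DTP view appears and how the costs of the two alternatives differ.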
Original Issue
The first (good) query has one more table join, to DIM_TARG_PROF_CONTENT_TOPIC. The second (bad) query has two more table joins, one of them to MAP_SP_ID_CONT_TOPIC on a different column.
SELECT DISTINCT v1.adv_order_line_id AS adv_order_line_id,
       v2.site_id AS pred_site_id
FROM (SELECT /*+ NO_MERGE */ DISTINCT daol.adv_order_line_id, dtpc.targ_cont_topic_id
      FROM udd_dim.dim_advertiser_order_line daol
      JOIN udd_dim.dim_targ_prof_content_topic dtpc
        ON (daol.targ_profile_id = dtpc.targ_profile_id)
      WHERE daol.data_source_id = 4
        AND daol.apt_placement_type_code = 'AIC'
        AND daol.targ_section_name_text = 'N/A'
        AND daol.targ_cont_topic_name_text NOT IN ('N/A', 'abt')
        AND dtpc.data_source_id = 4
        AND dtpc.tp_cont_top_include_flag = 1
     ) v1
JOIN (SELECT /*+ NO_MERGE */ DISTINCT msic.cont_topic_id, msis.site_id
      FROM udd_dim.map_sp_id_cont_topic msic
      JOIN udd_dim.map_sp_id_site msis
        ON (msis.space_id = msic.space_id)
     ) v2
  ON v1.targ_cont_topic_id = v2.cont_topic_id
Although the input cardinality is not reduced much, no step has a cardinality estimate that reaches 100M or 1B. The response time changed from never coming back to minutes.