Professional Documents
Culture Documents
In-Database Analytics - OrACLE
In-Database Analytics - OrACLE
DB
Oracle 10g
10g DB
Data Warehousing
ETL
OLAP
BI/DW
Statistics
Data Mining
Flashback : RTE
: In-Database Analytics
DB
In-Database Statistics
OLAP Option
Data Mining Option
Q&A
Flashback :
RTE
RTE(Real-Time Enterprise)?
an enterprise that competes by using up-todate information to progressively remove
delays to the management and
execution of its critical business processes
- Gartner, Definition of Real Time Enterprise
Real-Time Enterprise
(Remove
delays)
(Up-to-date)
RTE
RTE
9
9
.
9
9
DW Renovation
KPI
Fin
MFG
ETL
MES
(ETL
)
ODS
RTDW
(OLAP
+
OLTP)
BI Portal
: Data in
One Place
Information
Driven
Enterprise
: Gap
Query
and Reporting
OLAP
Data Mining
&
3
?
,
?
6
?
Business Intelligence
3 DW : 30TB
DW : 100TB
2,3 PB DW
.
4000 2TB
?
: ,
: ?
: ,
OLAP
Engine
Data
Integration
Engine
Data
Warehouse
Mining
Engine
.
,
?(:TB )
9
9
9
9 Security, Scalability, Availability
&
9 SQL
:
In-Database Analytics
Oracle 10g DB
Data Warehousing
ETL
OLAP
Statistics
Data Mining
DB
Data Warehouse
Built-in Statistics
OLAP Option
Data Mining
Option
Oracle 10g DB
REGION
PRODUCT
TIME
Data Warehousing
ETL
OLAP
Statistics
Data Mining
In-Database Analytics
Oracle 10g
10g DB
Data Warehousing
ETL
OLAP
Statistics
DB
9Fast scoring : CPU 250
Data Mining
TCO
In-Database Analytics :
: DVD
,
3 3
DVD
,
In-Database Analytics :
1 :
DB
DB
2 : DB
3 :
DB
In-Database Analytics :
SQL
select responder, cust_region, count(*) as cnt,
sum(post_purch pre_purch) as tot_increase,
:
avg(post_purch pre_purch) as avg_increase,
stats_t_test_paired(pre_purch, post_purch) as significance
from (
:
select cust_name,
prediction(campaign_model using *) as responder,
sum(case when purchase_date < 15-Apr-2005 then
purchase_amt else 0 end) as pre_purch,
DB
sum(case when purchase_date >= 15-Apr-2005 then
purchase_amt else 0 end) as post_purch
from customers, sales, products@PRODDB
where sales.cust_id = customers.cust_id
and purchase_date between 15-Jan-2005 and 14-Jul-2005
and sales.prod_id = products.prod_id
and contains(prod_description, DVD) > 0
group by cust_id, prediction(campaign_model using *) )
group by rollup responder, cust_region order by 4 desc;
In-Database Analytics :
(SQL pipelining)
DB DM,
9 :
9 :
DB
10g
Ranking functions
rank, dense_rank, cume_dist, percent_rank,
ntile
LAG/LEAD functions
Direct inter-row reference using offsets
Statistical Aggregates
Correlation, linear regression family, covariance
Linear regression
Fitting of an ordinary-least-squares regression
line to a set of number pairs.
Frequently combined with the COVAR_POP,
COVAR_SAMP, and CORR functions.
Note: Statistics and SQL Analytics are included in Oracle
Database Standard Edition
Descriptive Statistics
average, standard deviation, variance, min, max, median
(via percentile_count), mode, group-by & roll-up
DBMS_STAT_FUNCS: summarizes numerical columns
of a table and returns count, min, max, range, mean,
stats_mode, variance, standard deviation, median,
quantile values, +/- n sigma values, top/bottom 5 values
Correlations
Pearsons correlation coefficients, Spearman's and
Kendall's (both nonparametric).
Cross Tabs
Enhanced with % statistics: chi squared, phi coefficient,
Cramer's V, contingency coefficient, Cohen's kappa
Hypothesis Testing
Student t-test , F-test, Binomial test, Wilcoxon Signed
Ranks test, Chi-square, Mann Whitney test, KolmogorovSmirnov test, One-way ANOVA
Distribution Fitting
Kolmogorov-Smirnov Test, Anderson-Darling Test, ChiSquared Test, Normal, Uniform, Weibull, Exponential
In-Database Statistics
(: )
Note: Statistics and SQL Analytics are included in Oracle Database Standard Edition
OLAP
OLAP
SQL Ad-Hoc
9
9
OLAP
API
9
9SQL OLAP API
OracleBI
Reports
OracleBI
OracleBI
Discoverer Spreadsheet Oracle
BI Beans
Add-In
OLAP
Database
OLAP Option: Query Analysis Planning
Oracle Warehouse Builder
Time to build
98
100
60
40
23
20
17
14
16
10
17
14
0
Analytic Workspace
14 MVs
214 MVs
518 MVs
Time to build
400
350
300
250
200
147
150
126
98
100
50
17
23
10
17
0
Analytic Workspace
14 MVs
214 MVs
518 MVs
Data Mining
Data Mining
(Attribute Importance)
(Classification)
(Decision Trees)
(Clustering)
(Associations)
(Anomaly Detection)
Data Mining
(churn)
(Basel II)
DB
(Sarbanes-Oxley)
Attribute importance
Classification, regression & prediction
Anomaly detection
Association rules
Clustering
Nonnegative matrix factorization
BLAST
Attribute Importance
A1 A2 A3 A4 A5 A6 A7
Regression
Income
>$50K
<=$50K
Gender
Age
>35
Status
Married
Buy = 0
Gender
Single F
Buy = 1
Buy = 0
<=35
HH Size
>4
Buy = 1 Buy = 0
<=4
Buy = 1
Clustering
Association Rules
Feature Extraction
clustering text mining
F1 F2 F3 F4
SQL &
Java
BI
metagroup.com
Copyright 2004
META Group, Inc.
All rights reserved.
METAspectrum 60.1
In-Database Analytics
Benefit
H/W, O/S
DB
DB ,
RTE
DB
9, ,
9
DB
9
,