Lecture08 1
Q2: for each product, using the sales of 2016, show each month’s
total sales as a percentage of the product’s total sales for the year
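SELECT DISTINCT prod, month,
       sum(amount) OVER (PARTITION BY prod, month) * 100.0 /
       sum(amount) OVER (PARTITION BY prod) AS pct_of_year
FROM Sales
WHERE year = 2016;
-- DISTINCT collapses the per-sale rows to one row per (prod, month);
-- multiplying by 100.0 gives a percentage and avoids integer division.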
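As a quick check of Q2, here is a minimal sketch that runs the query with Python's built-in `sqlite3` module. It uses a tiny, made-up `Sales` table that has only the columns the query touches (`prod`, `month`, `year`, `amount`). The real schema used in the lecture is not shown, so the table layout and the rows are assumptions. SQLite supports window functions from version 3.25 onward.

```python
import sqlite3

# Hypothetical Sales table: only the columns used by the query.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sales (prod TEXT, month INTEGER, year INTEGER, amount INTEGER)")
conn.executemany(
    "INSERT INTO Sales VALUES (?, ?, ?, ?)",
    [
        ("apple", 1, 2016, 30),
        ("apple", 1, 2016, 10),
        ("apple", 2, 2016, 60),
        ("apple", 2, 2015, 999),  # other year: removed by WHERE year = 2016
        ("pear", 1, 2016, 50),
        ("pear", 3, 2016, 50),
    ],
)

# Q2: each month's share (in percent) of the product's 2016 total.
rows = conn.execute(
    """
    SELECT DISTINCT prod, month,
           sum(amount) OVER (PARTITION BY prod, month) * 100.0 /
           sum(amount) OVER (PARTITION BY prod) AS pct_of_year
    FROM Sales
    WHERE year = 2016
    ORDER BY prod, month
    """
).fetchall()

for row in rows:
    print(row)
# ('apple', 1, 40.0)
# ('apple', 2, 60.0)
# ('pear', 1, 50.0)
# ('pear', 3, 50.0)
```

Without `DISTINCT`, apple's month 1 would appear twice, once for each underlying sale.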
Q3: for each product and month of 2016, show the cumulative total
sales up to and including that month
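SELECT DISTINCT prod, month,
       sum(amount) OVER (PARTITION BY prod
                         ORDER BY month) AS cum_total
FROM Sales
WHERE year = 2016;
-- With ORDER BY, the default frame runs from the first row of the partition
-- through every row of the current month (rows with the same month are peers).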
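Below is a minimal sketch of Q3 on another made-up `Sales` table, again run through Python's `sqlite3`. The table contents are assumptions chosen for illustration. Both apple rows from month 1 receive the same running total (40), because the default frame includes peer rows that share the same month. `DISTINCT` then keeps one copy.

```python
import sqlite3

# Hypothetical Sales table with only the columns Q3 uses.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sales (prod TEXT, month INTEGER, year INTEGER, amount INTEGER)")
conn.executemany(
    "INSERT INTO Sales VALUES (?, ?, ?, ?)",
    [
        ("apple", 1, 2016, 30),
        ("apple", 1, 2016, 10),
        ("apple", 2, 2016, 60),
        ("apple", 3, 2016, 5),
        ("pear", 2, 2016, 20),
        ("pear", 4, 2016, 25),
        ("pear", 4, 2015, 999),  # other year: filtered out
    ],
)

# Q3: running total per product, ordered by month (needs SQLite >= 3.25).
rows = conn.execute(
    """
    SELECT DISTINCT prod, month,
           sum(amount) OVER (PARTITION BY prod ORDER BY month) AS cum_total
    FROM Sales
    WHERE year = 2016
    ORDER BY prod, month
    """
).fetchall()

for row in rows:
    print(row)
# ('apple', 1, 40)
# ('apple', 2, 100)
# ('apple', 3, 105)
# ('pear', 2, 20)
# ('pear', 4, 45)
```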
11/29/2020 Data Management, Business Intelligence and Visualization
Classification Techniques
Regression
Bayesian classification
K-Nearest Neighbors (KNN)
Decision trees
Neural networks
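Example data (fragment):
Outlook  Temp.  Humidity  Windy  Play
Rainy    Mild   High      False  Yes
Sunny    Cool   Normal    False  Yes

Class-conditional probabilities (Play = Yes, Play = No):
Outlook = Rainy:  3/9, 2/5
Temp. = Cool:     3/9, 1/5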
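Here is a minimal sketch of how Bayesian (naive Bayes) classification combines the fractions above for a new day with Outlook = Rainy and Temp. = Cool. Two points are assumptions. First, the denominators 9 and 5 are taken to be the class counts, which gives priors of 9/14 and 5/14. Second, only the two attributes listed on the slide are used.

```python
from fractions import Fraction as F

# Assumed class counts, inferred from the denominators 9 and 5.
n_yes, n_no = 9, 5
prior = {"Yes": F(n_yes, n_yes + n_no), "No": F(n_no, n_yes + n_no)}

# P(attribute = value | Play) as listed on the slide.
likelihood = {
    "Yes": {("Outlook", "Rainy"): F(3, 9), ("Temp.", "Cool"): F(3, 9)},
    "No": {("Outlook", "Rainy"): F(2, 5), ("Temp.", "Cool"): F(1, 5)},
}

def naive_bayes_posterior(observation):
    """Return P(Play | observation), assuming the attributes are
    conditionally independent given the class (the 'naive' assumption)."""
    scores = {}
    for cls in prior:
        score = prior[cls]
        for attr_value in observation:
            score *= likelihood[cls][attr_value]
        scores[cls] = score
    total = sum(scores.values())  # normalize so the posteriors sum to 1
    return {cls: s / total for cls, s in scores.items()}

posterior = naive_bayes_posterior([("Outlook", "Rainy"), ("Temp.", "Cool")])
print(posterior)  # Yes: 5/7, No: 2/7
```

The unnormalized scores are 9/14 · 3/9 · 3/9 = 1/14 for Yes and 5/14 · 2/5 · 1/5 = 1/35 for No. The classifier therefore predicts Play = Yes.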
[Figure: clustering houses based on geographic distance vs. clustering houses based on size]
Clustering – Cases
Marketing: Help marketers discover distinct groups in
their customer bases, and then use this knowledge to
develop targeted marketing programs.
Land use: Identification of areas of similar land use in
an earth observation database.
Insurance: Identifying groups of motor insurance policy
holders with a high average claim cost.
City planning: Identifying groups of houses according to
their house type, value, and geographical location.
Earthquake studies: Observed earthquake epicenters
should be clustered along continental faults.
K-means example: pick 3 initial cluster centers k1, k2, k3 (randomly).
[Figure: data points in the X-Y plane with initial centers k1, k2, k3]
K-means example, 1st repetition
Assign each point to the closest cluster center.
[Figure: points in the X-Y plane assigned to centers k1, k2, k3]
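The assignment step can be sketched in a few lines of Python. The points and centers are hypothetical 2-D values, and the function name `assign` is made up for illustration. Distances are Euclidean, and ties go to the lower-indexed center.

```python
import math

def assign(points, centers):
    """Return, for each point, the index of its nearest center
    (Euclidean distance; ties go to the lower index)."""
    labels = []
    for p in points:
        distances = [math.dist(p, c) for c in centers]
        labels.append(distances.index(min(distances)))
    return labels

# Hypothetical data: indices 0, 1, 2 stand in for k1, k2, k3.
points = [(1, 1), (1, 2), (5, 5), (6, 5), (9, 1)]
centers = [(1, 1), (5, 5), (9, 1)]
labels = assign(points, centers)
print(labels)  # [0, 0, 1, 1, 2]
```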
K-means example, 1st repetition
Move each cluster center to the mean of its cluster.
[Figure: centers k1, k2, k3 shown at their old positions and at their cluster means in the X-Y plane]
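The update step moves each center to the coordinate-wise mean of the points assigned to it. In this minimal sketch, `update` is a hypothetical helper and the data are made up. How to handle an empty cluster is an assumption: here its center simply stays put. Other variants re-seed it instead.

```python
def update(points, labels, centers):
    """Move each center to the mean of its assigned points.
    Assumption: a center with no assigned points keeps its old position."""
    new_centers = []
    for j, old in enumerate(centers):
        members = [p for p, lab in zip(points, labels) if lab == j]
        if members:
            # zip(*members) groups the x values, then the y values, and so on
            new_centers.append(tuple(sum(coord) / len(members) for coord in zip(*members)))
        else:
            new_centers.append(old)
    return new_centers

points = [(1, 1), (1, 2), (5, 5), (6, 5), (9, 1)]
labels = [0, 0, 1, 1, 2]
centers = [(1, 1), (5, 5), (9, 1)]
moved = update(points, labels, centers)
print(moved)  # [(1.0, 1.5), (5.5, 5.0), (9.0, 1.0)]
```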
K-means example, 2nd repetition
Reassign the points that are now closest to a different (new) cluster center.
Q: Which points are reassigned?
[Figure: points in the X-Y plane with the moved centers k1, k2, k3]
K-means example, 2nd repetition
A: three points (highlighted with animation on the original slide).
[Figure: the three reassigned points in the X-Y plane, centers k1, k2, k3]
K-means example, 2nd repetition
Re-compute the cluster means.
[Figure: clusters around k1, k2, k3 in the X-Y plane]
K-means example, 2nd repetition
Move the cluster centers to the cluster means.
[Figure: centers k1, k2, k3 moved to the new cluster means in the X-Y plane]
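The repetitions above follow Lloyd's algorithm. The sketch below makes some assumptions:

* Initial centers are drawn at random from the data points.
* The loop stops once a repetition reassigns no point.
* Empty clusters keep their old center.

The function name, the `seed` parameter and the `max_iter` parameter are hypothetical. The usage example takes k = 2 on two well-separated groups so that the result is easy to check.

```python
import math
import random

def kmeans(points, k, seed=0, max_iter=100):
    """Lloyd's k-means: random initial centers, then alternate the
    assignment and mean-update steps until no point is reassigned."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # pick k initial centers at random
    labels = None
    for _ in range(max_iter):
        # Assignment: nearest center for every point.
        new_labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        if new_labels == labels:  # no reassignment, so the clustering is stable
            break
        labels = new_labels
        # Update: move each center to the mean of its points (keep it if empty).
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centers, labels

group_a = [(0, 0), (0, 1), (1, 0)]
group_b = [(10, 10), (10, 11), (11, 10)]
centers, labels = kmeans(group_a + group_b, k=2)
print(centers, labels)
```

Because the result depends on the initial centers, k-means can stop in a local optimum. In practice it is often run several times with different seeds, and the result with the lowest within-cluster distance is kept.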
Association Rules
Example: baskets in a supermarket containing products
{Beer, Bread} => Diapers | support=5%, confidence=30%
support = 5%: 5% of all baskets contain Beer, Bread, and Diapers
confidence = 30%: 30% of the baskets that contain Beer and Bread
also contain Diapers
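The two measures can be computed directly from a list of baskets. In this minimal sketch the baskets are made-up data, not the supermarket data behind the 5% and 30% figures. `support` and `confidence` are hypothetical helper names. Confidence is computed as support(LHS ∪ RHS) / support(LHS), which matches the definition above.

```python
from fractions import Fraction

def support(itemset, baskets):
    """Fraction of baskets that contain every item of itemset."""
    return Fraction(sum(1 for b in baskets if itemset <= b), len(baskets))

def confidence(lhs, rhs, baskets):
    """Share of baskets containing lhs that also contain rhs:
    support(lhs | rhs) / support(lhs). Raises ZeroDivisionError
    if no basket contains lhs."""
    return support(lhs | rhs, baskets) / support(lhs, baskets)

# Hypothetical baskets, each a set of products.
baskets = [
    {"Beer", "Bread", "Diapers"},
    {"Beer", "Bread"},
    {"Beer", "Bread", "Milk"},
    {"Bread", "Milk"},
    {"Diapers", "Milk"},
]
lhs, rhs = {"Beer", "Bread"}, {"Diapers"}
print(support(lhs | rhs, baskets))    # 1/5
print(confidence(lhs, rhs, baskets))  # 1/3
```

Here the rule {Beer, Bread} => Diapers has support 1/5 (20%) and confidence 1/3 (about 33%).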
Given
a set of items $I = \{I_1, I_2, \ldots, I_m\}$ (e.g., products, books, views), and
a set of baskets (or transactions) $T = \{t_1, t_2, \ldots, t_n\}$,
where each transaction is a set of items, $t_i = \{I_{i_1}, I_{i_2}, \ldots, I_{i_k}\} \subseteq I$