Mining Class Comparisons and Mining Descriptive Statistical Measures
Neha Sharma
ME (I.T.), 3rd Sem
Roll No. 463
MINING CLASS COMPARISONS: DISCRIMINATING BETWEEN DIFFERENT CLASSES
use Big_University_DB
mine comparison as “grad_vs_undergrad_students”
in relevance to name, gender, major, birth_place, birth_date, residence,
phone#, gpa
for “graduate_students”
where status in “graduate”
versus “undergraduate_students”
where status in “undergraduate”
analyze count%
from student
Name            Gender  Major    Birth_place            Birth_date  Residence                 Phone #   GPA
Jim Woodman     M       CS       Vancouver, BC, Canada  8-12-76     3511 Main St., Richmond   687-4598  3.67
Scott Lachance  M       CS       Montreal, Que, Canada  28-7-75     345 1st Ave., Richmond    253-9106  3.70
Laura Lee       F       Physics  Seattle, WA, USA       25-8-70     125 Austin Ave., Burnaby  420-5232  3.83
…               …       …        …                      …           …                         …         …
$$d\text{-weight} = \frac{\mathrm{count}(q_a \in C_j)}{\sum_{i=1}^{m} \mathrm{count}(q_a \in C_i)}$$
Count distribution between graduate and undergraduate students for a generalized tuple
$$\forall X,\ \mathit{graduate\_student}(X) \Leftarrow \mathit{birth\_country}(X) = \text{"Canada"} \wedge \mathit{age\_range}(X) = \text{"25-30"} \wedge \mathit{gpa}(X) = \text{"good"} \quad [d: 30\%]$$
– where 90/(90+210) = 30%
– sufficient
• Quantitative description rule
$$\forall X,\ \mathit{target\_class}(X) \Leftrightarrow \mathit{condition}_1(X)\ [t: w_1,\ d: w'_1] \vee \dots \vee \mathit{condition}_n(X)\ [t: w_n,\ d: w'_n]$$
– necessary and sufficient
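The 30% d-weight quoted for the graduate-student rule can be checked with a few lines of Python. This is only a minimal sketch: the function name `d_weight` and the dictionary layout are illustrative assumptions, and the counts (90 graduate, 210 undergraduate) are the ones given in the slide.

```python
def d_weight(counts_by_class, target):
    """Share of a generalized tuple's total count that falls in the target class."""
    total = sum(counts_by_class.values())
    if total == 0:
        raise ValueError("the generalized tuple covers no tuples")
    return counts_by_class[target] / total

# Counts of the generalized tuple in each class, as given in the slide.
counts = {"graduate": 90, "undergraduate": 210}

print(f"d-weight for graduate: {d_weight(counts, 'graduate'):.0%}")
```

A d-weight near 100% would mean the tuple occurs almost only in the target class; here 70% of the matching tuples are undergraduates, so the condition alone discriminates weakly for graduates.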
Example: Quantitative Description Rule
Location/item  TV                         Computer                   Both_items
               Count  t-weight  d-weight  Count  t-weight  d-weight  Count  t-weight  d-weight
Both_regions   200    20%       100%      800    80%       100%      1000   100%      100%
Crosstab showing associated t-weight, d-weight values and total number (in thousands) of TVs and
computers sold at AllElectronics in 1998
$$\forall X,\ \mathit{Europe}(X) \Leftrightarrow (\mathit{item}(X) = \text{"TV"})\ [t: 25\%,\ d: 40\%] \vee (\mathit{item}(X) = \text{"computer"})\ [t: 75\%,\ d: 30\%]$$
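The weights in the Europe rule can be reproduced from raw counts, as the sketch below tries to show. Note the assumptions: only the Both_regions totals (200 and 800, in thousands) survive in the crosstab here, so the Europe counts are derived from the rule's d-weights (40% of 200 and 30% of 800), and `other_regions` is a hypothetical label for the remainder. The function names are illustrative as well.

```python
# Counts in thousands. Europe values are derived from the rule's d-weights
# and the Both_regions totals; "other_regions" is a hypothetical label for
# whatever remains (200 - 80 TVs, 800 - 240 computers).
counts = {
    "Europe": {"TV": 80, "computer": 240},
    "other_regions": {"TV": 120, "computer": 560},
}

def t_weight(counts, cls, item):
    """How typical the item is within the class: count / class total (row-wise)."""
    return counts[cls][item] / sum(counts[cls].values())

def d_weight(counts, cls, item):
    """How strongly the item points to the class: count / item total (column-wise)."""
    return counts[cls][item] / sum(row[item] for row in counts.values())

for item in ("TV", "computer"):
    t = t_weight(counts, "Europe", item)
    d = d_weight(counts, "Europe", item)
    print(f"Europe, {item}: t = {t:.0%}, d = {d:.0%}")
```

The t-weight is normalized within the target class (a row of the crosstab), while the d-weight is normalized across classes (a column), which is why the two can differ so much for the same cell.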
• Median: A holistic measure
• Variance
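$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2 = \frac{1}{n-1}\left[\sum_{i=1}^{n} x_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} x_i\right)^2\right]$$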
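As a rough check that the two forms of the sample variance agree, the sketch below computes both on the three GPA values listed in the student table (3.67, 3.70, 3.83) and compares them with Python's `statistics.variance`. The function names are illustrative assumptions, and three values are far too few for a meaningful statistic; they only serve as input.

```python
import statistics

def variance_definition(xs):
    """Sample variance as the averaged squared deviation from the mean."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two values")
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)

def variance_shortcut(xs):
    """Same quantity from the sum of squares and the squared sum (one pass over the data)."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two values")
    return (sum(x * x for x in xs) - sum(xs) ** 2 / n) / (n - 1)

gpas = [3.67, 3.70, 3.83]  # GPA column of the student table

print(variance_definition(gpas))
print(variance_shortcut(gpas))
print(statistics.variance(gpas))
```

The shortcut form needs only running totals of x and x², which suits a single scan of a large table, but it subtracts two nearly equal numbers and can lose precision when the values are large relative to their spread.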