Lecture # 2-Normalization - Significance Test
Data Normalization
Min Max Normalization:
A database can contain any number of continuous-type attributes.
Min-max normalization scales the data to fall within a small, specified range
[new_minA, new_maxA], where
new_minA = the new minimum to be used for the scale
new_maxA = the new maximum to be used for the scale
Data Normalization Cont…
Min Max Normalization:
$$v' = \frac{v - \mathit{min}_A}{\mathit{max}_A - \mathit{min}_A}\,(\mathit{new\_max}_A - \mathit{new\_min}_A) + \mathit{new\_min}_A$$
Example:-
Suppose that the minimum and maximum values for the
attribute income are 12,000 and 98,000. By min-max
normalization to the interval [0, 1], a value of 73,600
for income is transformed to
((73,600 - 12,000)/(98,000 - 12,000)) (1.0 - 0) + 0 = 0.716
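The min-max formula can be checked with a few lines of Python. This is a minimal sketch, not part of the original slides; the function name `min_max_normalize` and its parameter names are assumptions chosen for illustration.

```python
def min_max_normalize(v, min_a, max_a, new_min=0.0, new_max=1.0):
    """Map v from the range [min_a, max_a] onto [new_min, new_max]."""
    if max_a == min_a:
        # The formula divides by (max_a - min_a), so the range must not be empty.
        raise ValueError("max_a and min_a must differ")
    return (v - min_a) / (max_a - min_a) * (new_max - new_min) + new_min


# Income example: min = 12,000, max = 98,000, value = 73,600, target range [0, 1].
income_scaled = min_max_normalize(73_600, 12_000, 98_000)
print(round(income_scaled, 3))  # 0.716
```

The attribute minimum maps to new_min and the attribute maximum maps to new_max, so every value inside the original range lands inside the target range.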
Data Normalization Cont…
z-score normalization
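This method of normalization is useful when the
actual minimum and maximum of an attribute are
unknown, or when there are outliers that dominate
the min-max normalization.
$$v' = \frac{v - \bar{A}}{\sigma_A} \quad \text{or} \quad v' = \frac{v - \text{mean}}{\text{S.D.}}$$
where $\bar{A}$ and $\sigma_A$ are the mean and standard deviation of attribute A.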
Example: Let v = 20000, μ = 250, σ = 125. Find v'.
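The exercise above can be worked with a short Python sketch. The function names `z_score` and `z_score_all` are assumptions for illustration, and the use of the population standard deviation (`statistics.pstdev`) in the list helper is also an assumption, since the slides do not say whether the population or sample form is intended.

```python
import statistics


def z_score(v, mean, std_dev):
    """Return (v - mean) / std_dev for a single value."""
    if std_dev == 0:
        raise ValueError("standard deviation must be non-zero")
    return (v - mean) / std_dev


def z_score_all(values):
    """Normalize a list using its own mean and population standard deviation (assumed)."""
    mean = statistics.mean(values)
    std_dev = statistics.pstdev(values)
    return [z_score(v, mean, std_dev) for v in values]


# Values from the exercise: v = 20000, mean = 250, standard deviation = 125.
print(z_score(20000, 250, 125))  # 158.0
```

A result of 158 means the value lies 158 standard deviations above the mean. Unlike min-max normalization, the output is not confined to a fixed interval.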
Data Normalization Cont…
Normalization by decimal scaling
$$v' = \frac{v}{10^j}$$
where j is the smallest integer such that max(|v'|) < 1.
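Decimal scaling can be sketched as follows, assuming the usual definition that j is the smallest integer for which the largest absolute scaled value is below 1. The function name `decimal_scaling` and the sample values are hypothetical and do not come from the slides.

```python
def decimal_scaling(values):
    """Return (j, scaled_values) where scaled = v / 10**j and max(|scaled|) < 1."""
    max_abs = max(abs(v) for v in values)
    j = 0
    # Increase j until the largest magnitude drops below 1.
    while max_abs / 10 ** j >= 1:
        j += 1
    return j, [v / 10 ** j for v in values]


# Hypothetical attribute values ranging from -986 to 917.
j, scaled = decimal_scaling([-986, 120, 917])
print(j, scaled)  # 3 [-0.986, 0.12, 0.917]
```

Here the largest magnitude is 986, so dividing by 10^3 is the first power of ten that brings every value strictly inside (-1, 1).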
$$\chi^2 = \sum \frac{(\text{Observed} - \text{Expected})^2}{\text{Expected}}$$
                          Total
Matric      50      90     140
FA          30     100     130
$$\chi^2 = \frac{(50 - 41.48)^2}{41.48} + \frac{(90 - 98.52)^2}{98.52} + \frac{(30 - 38.52)^2}{38.52} + \frac{(100 - 91.48)^2}{91.48}$$
= 1.75+0.74+1.88+0.79
= 5.16
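The calculation above can be reproduced in Python. This is a sketch under the assumption that each expected count is computed as (row total × column total) / grand total, which matches the expected values 41.48, 98.52, 38.52 and 91.48 used in the slide. The function name `chi_square_statistic` is hypothetical.

```python
def chi_square_statistic(observed):
    """Chi-square statistic for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    chi2 = 0.0
    for i, row in enumerate(observed):
        for k, obs in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[k] / grand_total
            chi2 += (obs - expected) ** 2 / expected
    return chi2


# Observed counts from the table: Matric (50, 90) and FA (30, 100).
observed = [[50, 90], [30, 100]]
print(round(chi_square_statistic(observed), 2))  # 5.16
```

The row totals (140 and 130) are derived from the observed counts, so only the inner cells of the table are passed in.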
Chi-Square Example
Some more examples of Chi-Square are at the end of
these lecture slides.
Problem
A public opinion poll surveyed a simple random
sample of 1000 voters. Respondents were classified by
gender (male or female) and by voting preference
(Republican, Democrat, or Independent). Results are
shown in the contingency table below.
Is there a gender gap? Do the men's voting preferences
differ significantly from the women's preferences? Use
a 0.05 level of significance.
Chi-Square
The first step is to state the null hypothesis and an
alternative hypothesis.
H0: Gender and voting preferences are independent.
Ha: Gender and voting preferences are not independent.
Chi-Square
Formulate an analysis plan. For this analysis, the
significance level is 0.05. Using sample data, we will
conduct a chi-square test for independence.
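The decision rule behind this plan can be sketched in Python. With 2 gender categories and 3 voting preferences, the degrees of freedom are (2 - 1)(3 - 1) = 2. For exactly 2 degrees of freedom, the upper-tail probability of the chi-square distribution has the closed form exp(-x/2), which the sketch relies on. The function names are hypothetical, and no test statistic is computed here because that requires the counts from the contingency table.

```python
import math


def degrees_of_freedom(n_rows, n_cols):
    """Degrees of freedom for a chi-square test of independence."""
    return (n_rows - 1) * (n_cols - 1)


def p_value_df2(chi2):
    """Upper-tail probability for a chi-square variable with 2 degrees of freedom only."""
    return math.exp(-chi2 / 2)


def reject_null(chi2, alpha=0.05):
    """Reject H0 (independence) when the p-value is below the significance level."""
    return p_value_df2(chi2) < alpha


# Gender has 2 levels and voting preference has 3 levels.
print(degrees_of_freedom(2, 3))  # 2
```

At the 0.05 level with 2 degrees of freedom, this rule rejects H0 when the statistic exceeds roughly 5.99.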