
Research Trends in AI

Data Normalization
Min-max normalization:
A database can contain any number of continuous attributes.
Min-max normalization scales the values of an attribute A to fall within a small, specified range [new_minA, new_maxA], where
new_minA = the new minimum of the scaled range
new_maxA = the new maximum of the scaled range
Data Normalization Cont…
Min-max normalization:
$$ v' = \frac{v - min_A}{max_A - min_A} (new\_max_A - new\_min_A) + new\_min_A $$

Example:
Suppose that the minimum and maximum values for the attribute income are 12,000 and 98,000. By min-max normalization to the interval [0, 1], an income value of 73,600 is transformed to
((73,600 - 12,000) / (98,000 - 12,000)) (1.0 - 0) + 0 = 0.716
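The income example can be checked with a few lines of Python. This is a minimal sketch; the function name `min_max_normalize` and its parameter names are chosen here for illustration and do not come from the slides.

```python
def min_max_normalize(v, min_a, max_a, new_min=0.0, new_max=1.0):
    """Map v from the range [min_a, max_a] onto [new_min, new_max]."""
    if max_a == min_a:
        # The formula divides by (max_a - min_a), so the range must be non-empty.
        raise ValueError("max_a and min_a must differ")
    return (v - min_a) / (max_a - min_a) * (new_max - new_min) + new_min


# Income example: min 12,000, max 98,000, value 73,600, target range [0, 1].
income_scaled = min_max_normalize(73_600, 12_000, 98_000)
print(round(income_scaled, 3))  # 0.716
```

The attribute minimum maps to new_min and the attribute maximum maps to new_max, which is a quick sanity check on any implementation.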
Data Normalization Cont…
z-score normalization
This method of normalization is useful when the actual minimum and maximum of an attribute are unknown, or when there are outliers that would dominate the min-max normalization.

$$ v' = \frac{v - \mu_A}{\sigma_A} \quad \text{or} \quad v' = \frac{v - mean}{S.D.} $$
Example: Let v = 20,000, μ = 250, σ = 125. Find v'.
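The exercise can be worked by direct substitution into the z-score formula. The sketch below assumes a hypothetical helper name, `z_score`; only the three input values come from the slide.

```python
def z_score(v, mean, std_dev):
    """Return how many standard deviations v lies from the mean."""
    if std_dev == 0:
        raise ValueError("std_dev must be non-zero")
    return (v - mean) / std_dev


# Values from the exercise: v = 20,000, mean = 250, standard deviation = 125.
print(z_score(20_000, 250, 125))  # 158.0
```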
Data Normalization Cont…
Normalization by decimal scaling

$$ v' = \frac{v}{10^j} $$
where j is the smallest integer such that max(|v'|) < 1.
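A small sketch of decimal scaling follows. The function name `decimal_scaling` and the sample values -986 and 917 are illustrative assumptions, not data from the slides, and the search for j starts at 0, so values already below 1 in magnitude are left unchanged.

```python
def decimal_scaling(values):
    """Divide every value by 10**j, with j the smallest non-negative
    integer that brings the largest absolute value below 1."""
    max_abs = max(abs(v) for v in values)
    j = 0
    while max_abs / 10 ** j >= 1:
        j += 1
    return j, [v / 10 ** j for v in values]


# Hypothetical attribute values ranging from -986 to 917.
j, scaled = decimal_scaling([-986, 917])
print(j, scaled)  # 3 [-0.986, 0.917]
```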
Correlation Analysis (Categorical Data)
Χ2 (chi-square) test

$$ \chi^2 = \sum \frac{(Observed - Expected)^2}{Expected} $$
Observed is the count given in the data, and Expected is the count calculated under the assumption that the variables are independent.
The larger the Χ2 value, the more likely the
variables are related
Chi-Square Calculation: An Example
Find the value of Χ2 (chi-square) from the given data

Exams       Science   Arts   Sum (row)
Matric      50        90     140
FA          30        100    130
Sum (col.)  80        190    270


Chi-Square Calculation: An Example
Expected values of Exams

Exams       Science                   Arts                       Sum (row)
Matric      140 x 80 / 270 = 41.48    140 x 190 / 270 = 98.52    140
FA          130 x 80 / 270 = 38.52    130 x 190 / 270 = 91.48    130
Sum (col.)  80                        190                        270


Chi-Square Calculation: An Example
Χ2 (chi-square) calculation (numbers in
parenthesis are expected counts calculated based
on the data distribution in the two categories)

$$ \chi^2 = \frac{(50 - 41.48)^2}{41.48} + \frac{(90 - 98.52)^2}{98.52} + \frac{(30 - 38.52)^2}{38.52} + \frac{(100 - 91.48)^2}{91.48} $$
= 1.75 + 0.74 + 1.88 + 0.79
= 5.16
Chi-Square Example
Some more examples of Chi-Square are at the end of
this lecture slide
Problem
A public opinion poll surveyed a simple random
sample of 1000 voters. Respondents were classified by
gender (male or female) and by voting preference
(Republican, Democrat, or Independent). Results are
shown in the contingency table below.
Is there a gender gap? Do the men's voting preferences
differ significantly from the women's preferences? Use
a 0.05 level of significance.
Chi-Square
The first step is to state the null hypothesis and an
alternative hypothesis.
H0: Gender and voting preferences are independent.
Ha: Gender and voting preferences are not independent.
Chi-Square
Formulate an analysis plan. For this analysis, the
significance level is 0.05. Using sample data, we will
conduct a chi-square test for independence.

Analyze sample data. Applying the chi-square test for independence to sample data, we compute the degrees of freedom, the expected frequency counts, and the chi-square test statistic. Based on the chi-square statistic and the degrees of freedom, we determine the P-value.
Chi-Square

$$ DF = (r - 1)(c - 1) $$
$$ E_{r,c} = \frac{n_r \cdot n_c}{n} $$
$$ \chi^2 = \sum \frac{(O_{r,c} - E_{r,c})^2}{E_{r,c}} $$
where DF is the degrees of freedom, r is the number of levels of gender (rows), c is the number of levels of voting preference (columns), n_r is the number of observations from level r of gender, n_c is the number of observations from level c of voting preference, n is the number of observations in the sample, E_{r,c} is the expected frequency count when gender is level r and voting preference is level c, and O_{r,c} is the observed frequency count when gender is level r and voting preference is level c.
P-value
The P-value is the probability that a chi-square statistic having 2 degrees of freedom is more extreme than 16.2.
We use the chi-square distribution to find the P-value.
Interpret results. Since the P-value is less than the significance level (0.05), we reject the null hypothesis. Thus, we conclude that there is a relationship between gender and voting preference.
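For the special case of 2 degrees of freedom, the upper-tail probability of the chi-square distribution has the closed form exp(-x/2), so the P-value can be sketched without a statistics library. The function name `chi2_sf_df2` is an assumption for illustration, and the shortcut is not valid for other degrees of freedom.

```python
import math


def chi2_sf_df2(x):
    """P(X > x) for a chi-square variable with exactly 2 degrees of freedom.

    Uses the closed form exp(-x / 2), which holds only for df = 2.
    """
    return math.exp(-x / 2)


alpha = 0.05                 # significance level from the problem statement
p_value = chi2_sf_df2(16.2)  # test statistic quoted on the slide
print(f"{p_value:.4f}")      # 0.0003
print(p_value < alpha)       # True, so the null hypothesis is rejected
```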
Chi-Square Distribution Sample
Another Chi-Square Distribution Sample
Another Example of US-Election
Computing the expected frequency
The Significance Test
Significance
r = the number of rows and c = the number of columns.
The chi-square critical value with df = 1 and α = .05 is 3.84.
Because the calculated value exceeds this critical value, the difference is significant (reject the null hypothesis, so the attributes are not independent).
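The critical-value decision rule can be sketched as follows. The helper names are hypothetical, and because the table for this example is not reproduced in the text, the statistic 5.16 from the earlier Matric/FA example (also a 2 x 2 table, so df = 1) is used purely for illustration.

```python
def degrees_of_freedom(n_rows, n_cols):
    """Degrees of freedom of a contingency table: (r - 1)(c - 1)."""
    return (n_rows - 1) * (n_cols - 1)


def is_significant(statistic, critical_value):
    """Reject the null hypothesis when the statistic exceeds the critical value."""
    return statistic > critical_value


dof = degrees_of_freedom(2, 2)
print(dof)  # 1

# 3.84 is the critical value for df = 1 and alpha = .05 quoted above.
print(is_significant(5.16, 3.84))  # True
```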
Patient and Hospital Example
Null and Alternate Hypothesis
H0: There is no difference in satisfaction of patients between the two schemes, and
H1: There is a difference in satisfaction of patients between the two schemes.
Patient and Hospital Example
Calculations
Chi-Square Value
Conclusion
Null hypothesis is rejected
Another Significance Test
Paired T-Test
Thanks
