Statistical Theory
Group 10
➔ Factor Analysis
is a statistical method used to describe variability among observed, correlated variables in terms of a potentially lower number of unobserved variables called factors.
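A small simulation can make this definition concrete. The sketch below assumes a single hypothetical factor F and three observed, standardized variables generated as x_j = λ_j F + sqrt(1 − λ_j²) e_j. The loadings 0.9, 0.8 and 0.7, the sample size and the seed are made up for illustration. Under this one-factor model, the correlation between two observed variables should come out close to the product of their loadings, so one unobserved variable accounts for all three correlations.

```python
import math
import random

def pearson(a, b):
    """Sample Pearson correlation coefficient of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

def simulate_one_factor(loadings, n, seed=0):
    """Generate n observations per variable from a one-factor model (hypothetical example)."""
    rng = random.Random(seed)
    data = [[] for _ in loadings]
    for _ in range(n):
        f = rng.gauss(0.0, 1.0)  # the unobserved common factor
        for j, lam in enumerate(loadings):
            e = rng.gauss(0.0, 1.0)  # noise specific to variable j
            data[j].append(lam * f + math.sqrt(1.0 - lam ** 2) * e)
    return data

loadings = [0.9, 0.8, 0.7]
data = simulate_one_factor(loadings, n=20000)
for i, j in [(0, 1), (0, 2), (1, 2)]:
    observed = pearson(data[i], data[j])
    implied = loadings[i] * loadings[j]
    print(f"r(x{i + 1}, x{j + 1}) = {observed:.2f}, implied by loadings = {implied:.2f}")
```

Factor analysis works in the opposite direction: it starts from the observed correlations and estimates loadings like these.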
➔ Multivariate Analysis
involves observation and analysis of more than one statistical outcome variable at a time.
Regression Analysis
is a set of statistical processes for estimating the relationships between a dependent variable (outcome variable) and one or more independent variables ('predictors'). The most common form of regression analysis is linear regression, in which a researcher finds the line that most closely fits the data according to a specific mathematical criterion, most often ordinary least squares, which minimizes the sum of squared differences between observed and predicted values.
A hypothetical example
(Sonipat)
Area in ft² (Xi)   Price in Lakhs, INR (Yi)
600 14
650 16
800 17
900 18
1200 24
1300 28
1500 29
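As a sketch, assuming ordinary least squares as the fitting criterion, the closed-form slope and intercept for this table can be computed directly. The function name `fit_least_squares` is only illustrative.

```python
def fit_least_squares(xs, ys):
    """Return (theta0, theta1) minimizing the sum of squared errors of y = theta0 + theta1 * x."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    theta1 = sxy / sxx                  # slope: lakhs per ft²
    theta0 = y_mean - theta1 * x_mean   # the fitted line passes through the two means
    return theta0, theta1

area = [600, 650, 800, 900, 1200, 1300, 1500]  # Xi, ft²
price = [14, 16, 17, 18, 24, 28, 29]            # Yi, lakhs (INR)
theta0, theta1 = fit_least_squares(area, price)
print(f"h(x) = {theta0:.3f} + {theta1:.5f} * x")  # h(x) = 3.649 + 0.01733 * x
```

With the intercept left free, the best-fitting line is roughly h(x) ≈ 3.65 + 0.0173x. The walkthrough that follows fixes θ0 = 0 instead.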
How it works
m = number of training examples (here m = 7)
x's = input variable (area in ft², Xi)
y's = output variable (price in lakhs INR, Yi)
Hypothesis: $h_\theta(x) = \theta_0 + \theta_1 x$
E.g. in the example of house prices, let's assume $\theta_0 = 0$ and $\theta_1 = 0.025$. Therefore:
$h(x_1) = 0 + 0.025 \times 600 = 15$ (predicted value)
$h(x_2) = 0 + 0.025 \times 650 = 16.25$ (predicted value)
$h(x_3) = 0 + 0.025 \times 800 = 20$ (predicted value)
and so on, whereas the actual values are $y_1 = 14$, $y_2 = 16$, $y_3 = 17$.
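The predictions can be reproduced with a few lines of Python. This is a minimal sketch assuming θ0 = 0 and θ1 = 0.025 as above. `hypothesis` is a hypothetical helper name. The sketch prints the predicted and the actual price for every house.

```python
def hypothesis(x, theta0, theta1):
    """Linear hypothesis h(x) = theta0 + theta1 * x."""
    return theta0 + theta1 * x

area = [600, 650, 800, 900, 1200, 1300, 1500]  # x_i, ft²
price = [14, 16, 17, 18, 24, 28, 29]            # y_i, lakhs (INR)
m = len(area)  # number of training examples

for x, y in zip(area, price):
    h = hypothesis(x, theta0=0.0, theta1=0.025)
    print(f"x = {x:5d}   h(x) = {h:6.2f}   y = {y}")
```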
On the right trajectory?
Area in ft² (Xi)   Hypothesis h(x)   Price in Lakhs, INR (Yi)
600                15                14
650                16.25             16
800                20                17
900                22.5              18
1200               30                24
1300               32.5              28
1500               37.5              29
Intuition
Comparing two hypotheses with $\theta_0 = 0$ at Xi = 1500 ft², where the actual price Yi is 29 lakhs:
$\theta_1 = 0.025$: $h(1500) = 37.5$
$\theta_1 = 0.023$: $h(1500) = 34.5$
Because 34.5 is closer to the actual price than 37.5, we can infer that $\theta_1 = 0.023$ is the better choice of the two when $\theta_0 = 0$.
[Figure: h(x) graph for $\theta_1 = 0.025$ and $\theta_1 = 0.023$]
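The comparison above looks at only one house (1500 ft²). The sketch below scores each candidate slope over all seven houses. It uses the sum of squared errors as the criterion, which is an assumption because the slides do not name one. `sum_squared_errors` is a hypothetical helper name.

```python
def sum_squared_errors(theta1, xs, ys, theta0=0.0):
    """Sum of squared differences between h(x) = theta0 + theta1 * x and the actual y."""
    return sum((theta0 + theta1 * x - y) ** 2 for x, y in zip(xs, ys))

area = [600, 650, 800, 900, 1200, 1300, 1500]  # ft²
price = [14, 16, 17, 18, 24, 28, 29]            # lakhs (INR)

for theta1 in (0.025, 0.023):
    print(theta1, round(sum_squared_errors(theta1, area, price), 4))
# 0.025 158.8125
# 0.023 57.2125

# With theta0 fixed at 0, the slope that minimizes the SSE is sum(x*y) / sum(x*x).
best = sum(x * y for x, y in zip(area, price)) / sum(x * x for x in area)
print("best theta1 with theta0 = 0:", round(best, 4))  # 0.0207
```

Both candidate slopes are larger than the squared-error-minimizing slope of about 0.0207. However, 0.023 is closer to it, so its total squared error is much lower.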
MULTIVARIATE ANALYSES
INTRODUCTION
• Multivariate analysis is used to describe analyses of data where there are multiple
variables or observations for each unit or individual.
• Often these data are interrelated, and statistical methods are needed to fully answer the objectives of our research.
Examples Where Multivariate Analyses May Be Appropriate
Types of Variables:
1. Qualitative variable:
2. Quantitative variable:
Principal component analysis (PCA) and common factor analysis (CFA) give similar answers most of the time, especially when the number of variables exceeds 30 or the communalities exceed 0.6 for most variables.
TYPES OF VARIANCES
Unique variance
Specific variance
Shared variance
Factor Analysis
FACTOR ROTATION
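Rotation re-expresses the extracted factors so that they are easier to interpret. Rotation methods differ in what they simplify. Quartimax minimizes the number of factors needed to explain each variable. Varimax minimizes the number of variables that load highly on each factor. Equamax combines the two, so that both the number of variables that load highly on a factor and the number of factors needed to explain a variable are minimized.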
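Varimax is a widely used orthogonal rotation that maximizes the variance of the squared loadings within each factor. The sketch below is restricted to two factors. It uses the raw varimax criterion and a brute-force search over rotation angles. Statistical packages typically use an iterative algorithm, often with Kaiser normalization, so their numbers can differ. The loading matrix and the grid of 3,600 angles are hypothetical choices for illustration.

```python
import math

def rotate(loadings, angle):
    """Orthogonally rotate a two-factor loading matrix by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return [[a * c - b * s, a * s + b * c] for a, b in loadings]

def varimax_criterion(loadings):
    """Raw varimax criterion: sum over factors of the variance of the squared loadings."""
    p = len(loadings)
    total = 0.0
    for j in range(len(loadings[0])):
        sq = [row[j] ** 2 for row in loadings]
        mean = sum(sq) / p
        total += sum((v - mean) ** 2 for v in sq) / p
    return total

def varimax_two_factors(loadings, steps=3600):
    """Choose the angle in [0, pi/2) that maximizes the criterion (brute-force search)."""
    angles = [k * (math.pi / 2) / steps for k in range(steps)]
    best = max(angles, key=lambda a: varimax_criterion(rotate(loadings, a)))
    return rotate(loadings, best)

def communalities(loadings):
    """Sum of squared loadings for each variable (row)."""
    return [sum(v * v for v in row) for row in loadings]

# Hypothetical unrotated loadings: six variables on two factors.
unrotated = [[0.7, 0.5], [0.8, 0.4], [0.6, 0.5],
             [0.5, -0.6], [0.6, -0.5], [0.4, -0.6]]
rotated = varimax_two_factors(unrotated)
for before, after in zip(unrotated, rotated):
    print([round(v, 2) for v in before], "->", [round(v, 2) for v in after])
```

An orthogonal rotation only redistributes each variable's explained variance among the factors. As a result, the communalities are the same before and after rotation.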
FACTOR LOADING
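A factor loading is the correlation coefficient between a variable and a factor (for standardized variables and uncorrelated factors). The squared loading gives the proportion of the variable's variance that is explained by that particular factor.
COMMUNALITY
The communality of a variable is the sum of its squared factor loadings across all factors, i.e. the proportion of that variable's variance explained by the common factors.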
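The arithmetic is short enough to show directly. This sketch uses a hypothetical three-variable, two-factor loading matrix and assumes standardized variables. Under that assumption, 1 − communality is the unique variance. Summing squared loadings down a column instead gives the variance accounted for by that factor.

```python
# Hypothetical loading matrix: rows are variables, columns are two factors.
loadings = {
    "v1": [0.8, 0.1],
    "v2": [0.7, 0.3],
    "v3": [0.2, 0.9],
}

def communality(row):
    """Sum of squared loadings of one variable across all factors."""
    return sum(l ** 2 for l in row)

def factor_variance(loadings, j):
    """Sum of squared loadings in column j: variance accounted for by factor j."""
    return sum(row[j] ** 2 for row in loadings.values())

for name, row in loadings.items():
    h2 = communality(row)
    # Assumes standardized variables (total variance 1 per variable).
    print(f"{name}: communality = {h2:.2f}, unique variance = {1 - h2:.2f}")
for j in range(2):
    print(f"factor {j + 1}: variance accounted for = {factor_variance(loadings, j):.2f}")
```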
Applications of Factor Analysis
1.
– clusters variables into homogeneous sets
– creates new variables (i.e. factors)
– allows us to gain insight into categories
2. Screening of Variables:
– identifies groupings that allow us to select one variable to represent many
– useful in regression (recall collinearity)
3. Summary:
4. Sampling of variables:
– helps select a small group of representative variables from a larger set
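In factor analysis, screening picks one variable per factor grouping. As a simpler stand-in, the sketch below applies a greedy correlation-threshold filter. This filter is not factor analysis itself: it keeps a variable only if its absolute correlation with every variable already kept is below a cut-off. The data, the variable names and the 0.9 threshold are hypothetical.

```python
import math

def pearson(a, b):
    """Sample Pearson correlation coefficient of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

def screen_variables(data, threshold=0.9):
    """Greedy screening: keep a variable only if |r| with every kept variable is below threshold."""
    kept = []
    for name, values in data.items():
        if all(abs(pearson(values, data[k])) < threshold for k in kept):
            kept.append(name)
    return kept

# Hypothetical data: x2 is an exact linear function of x1, so it is redundant.
data = {
    "x1": [1, 2, 3, 4, 5],
    "x2": [2, 4, 6, 8, 10],
    "x3": [5, 3, 4, 1, 2],
}
print(screen_variables(data))  # ['x1', 'x3']
```

Dropping near-duplicates like x2 before fitting a regression is one way to reduce collinearity. The result depends on the order of the variables and on the chosen threshold.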