Statistical Theory


Statistical Theories
Group 10

21st September, 2020


Topics Covered

➔ Regression Analysis: a set of statistical processes for estimating the relationships between a dependent variable and one or more independent variables.

➔ Factor Analysis: a statistical method used to describe variability among observed, correlated variables in terms of a potentially lower number of unobserved variables called factors.

➔ Multivariate Analysis: involves observation and analysis of more than one statistical outcome variable at a time.
Regression Analysis
is a set of statistical processes for estimating the relationships between a dependent variable (the outcome variable) and one or more independent variables ('predictors'). The most common form of regression analysis is linear regression, in which a researcher finds the line that most closely fits the data according to a specific mathematical criterion.
A hypothetical example (Sonipat)

Area in ft2 (Xi)   Price in Lakhs INR (Yi)
600                14
650                16
800                17
900                18
1200               24
1300               28
1500               29
How it works

m = number of training examples
x's = input variable (Area in ft2, Xi)
y's = output variable (Price in Lakhs INR, Yi)

Hypothesis: h(x) = theta0 + theta1 * x

E.g. in the example of house prices, let's assume theta0 = 0 and theta1 = 0.025.

Therefore h(x1) = 0 + 0.025 x 600 ===> 15 (predicted value)
Similarly h(x2) = 0 + 0.025 x 650 ===> 16.25 (predicted value)
          h(x3) = 0 + 0.025 x 800 ===> 20 (predicted value)
and so on...

Whereas the actual values are y1 = 14, y2 = 16, y3 = 17.
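The hypothesis step above can be sketched in a few lines of Python (a minimal illustration, not from the slides; the data is the Sonipat table and the defaults mirror the assumed theta0 = 0, theta1 = 0.025):

```python
xs = [600, 650, 800, 900, 1200, 1300, 1500]   # area in ft2 (Xi)
ys = [14, 16, 17, 18, 24, 28, 29]             # actual price in Lakhs INR (Yi)

def h(x, theta0=0.0, theta1=0.025):
    """Linear hypothesis: predicted price for a given area."""
    return theta0 + theta1 * x

# Compare predictions against the actual prices row by row.
for x, y in zip(xs, ys):
    print(f"area={x}, predicted={h(x):.2f}, actual={y}")
```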
Vs
On the right trajectory?

Let's assume theta0 = 0 and theta1 = 0.025.

Therefore h(x1) = 0 + 0.025 x 600 ===> 15 (predicted value)
Similarly h(x2) = 0 + 0.025 x 650 ===> 16.25 (predicted value)
          h(x3) = 0 + 0.025 x 800 ===> 20 (predicted value)
and so on...

Whereas actual y1 = 14, y2 = 16, y3 = 17
vs prediction h1 = 15, h2 = 16.25, h3 = 20

Now taking the differences y1 - h1, y2 - h2, ..., yn - hn and summing them gives a number which can be negative or positive depending on our selection of theta1.

Area in ft2 (Xi)   Hypothesis h(x)   Price in Lakhs INR (Yi)
600                15                14
650                16.25             16
800                20                17
900                22.5              18
1200               30                24
1300               32.5              28
1500               37.5              29
Intuition

theta1 = 0.025:

Area in ft2 (Xi)   Hypothesis h(x)   Price in Lakhs INR (Yi)
600                15                14
650                16.25             16
800                20                17
900                22.5              18
1200               30                24
1300               32.5              28
1500               37.5              29

vs theta1 = 0.023:

Area in ft2 (Xi)   Hypothesis h(x)   Price in Lakhs INR (Yi)
600                13.8              14
650                14.95             16
800                18.4              17
900                20.7              18
1200               27.6              24
1300               29.9              28
1500               34.5              29



From this we can easily infer that theta1 = 0.023 is a better choice if theta0 = 0.

But how do we express this mathematically?


This is called a cost function, J, which is used to test the accuracy of our hypothesis h:

J(theta0, theta1) = (1 / 2m) * sum from i=1 to m of (h(x_i) - y_i)^2

In simple terms, it subtracts each actual value y from its respective prediction h, squares the differences so positive and negative errors cannot cancel, and sums them (hence the summation). The closer the value of J is to 0, the better the prediction.

[Graph: h(x) for theta1 = 0.025 and theta1 = 0.023 plotted against the data points]
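The comparison can be checked numerically. A sketch (not from the slides) assuming the standard squared-error definition of the cost J on the Sonipat data:

```python
xs = [600, 650, 800, 900, 1200, 1300, 1500]   # area in ft2 (Xi)
ys = [14, 16, 17, 18, 24, 28, 29]             # actual price in Lakhs INR (Yi)

def cost(theta0, theta1):
    """J = (1/2m) * sum of squared prediction errors over the training set."""
    m = len(xs)
    return sum((theta0 + theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)

print(round(cost(0, 0.025), 4))  # higher cost
print(round(cost(0, 0.023), 4))  # lower cost -> the better choice of theta1
```

Running this confirms that theta1 = 0.023 gives a lower J than theta1 = 0.025 when theta0 = 0.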
MULTIVARIATE ANALYSES
INTRODUCTION

• Multivariate analysis is used to describe analyses of data where there are multiple variables or observations for each unit or individual.

• Often these data are interrelated, and statistical methods are needed to fully answer the objectives of our research.
Examples Where Multivariate Analyses May Be Appropriate

• Determining the value of an apartment.

• Book store.
Types of Multivariate Analyses To Be Taught

• Multiple linear regression: A linear regression method where the dependent variable Y is described by a set of X independent variables.

• Multiple linear correlation: Allows for the determination of the strength of the linear relationship between Y and a set of X variables.

• Multivariate nonlinear regression: A form of regression analysis in which the dependent variable Y is described by a nonlinear combination of the independent variables X.
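To make multiple linear regression concrete, here is a minimal sketch (not from the slides). It fits Y = b0 + b1*X1 + b2*X2 by solving the normal equations (X^T X) b = X^T y in plain Python; the data is made up so that y = 1 + 2*x1 + 3*x2 exactly:

```python
def fit(rows, ys):
    """Least-squares fit of y on the given predictor rows, with an intercept."""
    X = [[1.0] + list(r) for r in rows]          # prepend intercept column
    n, p = len(X), len(X[0])
    # Normal equations A b = c, where A = X^T X and c = X^T y.
    A = [[sum(X[k][i] * X[k][j] for k in range(n)) for j in range(p)] for i in range(p)]
    c = [sum(X[k][i] * ys[k] for k in range(n)) for i in range(p)]
    # Gaussian elimination with partial pivoting.
    for i in range(p):
        piv = max(range(i, p), key=lambda r: abs(A[r][i]))
        A[i], A[piv] = A[piv], A[i]
        c[i], c[piv] = c[piv], c[i]
        for r in range(i + 1, p):
            f = A[r][i] / A[i][i]
            for j in range(i, p):
                A[r][j] -= f * A[i][j]
            c[r] -= f * c[i]
    # Back substitution.
    b = [0.0] * p
    for i in reversed(range(p)):
        b[i] = (c[i] - sum(A[i][j] * b[j] for j in range(i + 1, p))) / A[i][i]
    return b

rows = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 5)]   # hypothetical (X1, X2) pairs
ys = [1 + 2*x1 + 3*x2 for x1, x2 in rows]         # constructed so b = [1, 2, 3]
b = fit(rows, ys)
print(b)  # close to [1, 2, 3]
```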
CHARACTERIZING DATA

Types of Variables:

1. Qualitative variable:

• One in which numerical measurement is not possible.

• Observations can be neither meaningfully ordered nor measured.

2. Quantitative variable:

• One in which observations can be measured.

• Observations have a numerical value.

Quantitative variables can be subdivided into two classes: discrete and continuous.


Factor Analysis

• Data reduction tool

• Removes redundancy or duplication from a set of correlated variables

• Represents correlated variables with a smaller set of "derived" variables

• Factors are formed that are relatively independent of one another

TYPES OF FACTOR ANALYSIS

Exploratory Factor Analysis:

● Used to describe underlying structure.

● Principal component analysis (PCA):
Considers the total variance and derives factors that contain a small amount of unique and error variance.

● Common factor analysis (CFA):
Considers only the common or shared variance and derives factors that contain a small amount of unique and error variance.

Both PCA and CFA give similar answers most of the time, especially when the number of variables is > 30 or the communalities are > 0.6 for most variables.
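To make the PCA idea concrete, here is a minimal sketch (not from the slides; the ten (x, y) pairs are illustrative made-up data). It reduces two correlated variables to one derived factor by taking the leading eigenvector of their 2x2 covariance matrix, which has a closed form:

```python
import math

# Two correlated observed variables (hypothetical data).
xs = [2.5, 0.5, 2.2, 1.9, 3.1, 2.3, 2.0, 1.0, 1.5, 1.1]
ys = [2.4, 0.7, 2.9, 2.2, 3.0, 2.7, 1.6, 1.1, 1.6, 0.9]

n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
# Sample covariance matrix [[sxx, sxy], [sxy, syy]].
sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
syy = sum((y - my) ** 2 for y in ys) / (n - 1)
sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)

# Leading eigenvalue of a symmetric 2x2 matrix (closed form).
tr, det = sxx + syy, sxx * syy - sxy * sxy
lam = tr / 2 + math.sqrt((tr / 2) ** 2 - det)
# Corresponding unit eigenvector = direction of the first principal component.
v = (sxy, lam - sxx)
norm = math.hypot(v[0], v[1])
v = (v[0] / norm, v[1] / norm)

# Share of the total variance captured by this single derived factor.
explained = lam / (sxx + syy)
print(f"first component explains {explained:.0%} of total variance")
```

Because the two variables are strongly correlated, one factor retains most of the total variance, which is exactly the "data reduction" described above.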

TYPES OF VARIANCES

● Unique variance
● Specific variance
● Shared variance

FACTOR ROTATION
A rotation method minimizes the number of factors needed to explain each variable: the number of variables that load highly on a factor, and the number of factors needed to explain a variable, are both minimized.

FACTOR LOADING
Factor loading is basically the correlation coefficient between a variable and a factor. The factor loading shows the variance explained by the variable on that particular factor.

COMMUNALITY
The communality of a variable is the sum of the squares of its factor loadings across all factors.
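A small sketch (not from the slides; the loading matrix is hypothetical) showing communality as the sum of squared loadings for each variable:

```python
# Hypothetical loadings: each variable's correlation with two factors.
loadings = {
    "var1": [0.8, 0.1],   # loads mainly on factor 1
    "var2": [0.7, 0.2],
    "var3": [0.1, 0.9],   # loads mainly on factor 2
}

# Communality h^2 = sum of squared loadings across the factors.
communality = {v: sum(l * l for l in ls) for v, ls in loadings.items()}
for v, h2 in communality.items():
    print(v, round(h2, 2))
```

A communality near 1 means the factors account for almost all of that variable's variance; a low value means much of its variance is unique.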
Applications of Factor Analysis

1. Identification of Underlying Factors:
– clusters variables into homogeneous sets
– creates new variables (i.e. factors)
– allows us to gain insight into categories

2. Screening of Variables:
– identifies groupings that allow us to select one variable to represent many
– useful in regression (recall collinearity)

3. Summary:
– allows us to describe many variables using a few factors

4. Sampling of Variables:
– helps select a small group of representative variables from a larger set
