Introduction to Survey Analysis: Correlations, Linear Regression (Simple and Multiple), Logistic Regression (Binomial)
Nevertheless, there is a preferred logic to analyzing survey data.
Types of Relationships
• Non-Causal Relationship
Also known as association or correlation. Two variables are said to be related if there is an association or correlation between them.
Two variables can be related to each other without the relationship being the result of one variable affecting the other.
• Causal Relationship
One variable has a direct influence on the other.
If two variables are causally related, it is possible to conclude that changes in the explanatory variable X will have a direct impact on Y; adjusting the value of one variable will cause the other to change.
Correlational Analysis
• A statistical technique used to determine the existence of an association between two or more variables.
• It determines the strength or degree of association between variables.
• Concerned largely with the study of interdependency or co-variation between variables.
• Does not express causality, or that one variable is a function of the other.
Correlational Analysis
Using the data “Scatterplot_Discuss1”, let us answer the following:
The multiple linear regression model relates a response to several predictors:

Y = β0 + β1X1 + β2X2 + … + βkXk + ε

Where:
Y – is the value of the response variable
β0, β1, β2, …, βk – are the parameters of the model
X1 – is the value of the first predictor variable
X2 – is the value of the second predictor variable
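As a sketch, the parameters of this model can be estimated by ordinary least squares. The minimal NumPy example below uses synthetic data with invented true parameters (1, 2, −3), not data from the slides:

```python
import numpy as np

# Synthetic data (invented for illustration): two predictors, one response.
rng = np.random.default_rng(0)
n = 100
X1 = rng.normal(size=n)
X2 = rng.normal(size=n)
# Assumed true model: Y = 1 + 2*X1 - 3*X2 + noise
Y = 1.0 + 2.0 * X1 - 3.0 * X2 + rng.normal(scale=0.1, size=n)

# Design matrix with an intercept column, then ordinary least squares.
X = np.column_stack([np.ones(n), X1, X2])
beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
print("estimated parameters:", beta)  # close to [1, 2, -3]
```

The first element of `beta` estimates the intercept β0; the remaining elements estimate the slopes on X1 and X2.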
Arturo J Patungan Jr
09662776892
Multiple Linear Regression
The Result
The Plots
• Scatter Plot
The scatter plot of the residuals gives us an idea of the homogeneity of variance (homoscedasticity) of the residuals.
A model with the characteristic of homoscedasticity (no problem of heteroskedasticity) has relatively the same spread of points across the horizontal pattern.
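Lacking the plot itself, the same check can be approximated numerically. The sketch below (synthetic data, homoscedastic by construction) compares the residual spread in the lower and upper halves of the fitted values; a ratio near 1 is what a horizontal band of points looks like in numbers:

```python
import numpy as np

# Synthetic data with constant error variance (an assumption for illustration).
rng = np.random.default_rng(1)
n = 200
x = rng.uniform(0, 10, size=n)
y = 3.0 + 0.5 * x + rng.normal(scale=1.0, size=n)

# Fit a simple regression and compute the residuals.
X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
fitted = X @ beta

# Crude numeric stand-in for eyeballing the residual scatter plot:
# compare residual spread in the lower vs upper half of the fitted values.
order = np.argsort(fitted)
low_sd = resid[order[: n // 2]].std()
high_sd = resid[order[n // 2 :]].std()
ratio = high_sd / low_sd
print(f"spread ratio (near 1 suggests homoscedasticity): {ratio:.2f}")
```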
Diagnostic Checking and Remedial Measures in a Regression Analysis
• Multicollinearity – when predictors are highly correlated
How to detect?
Perfect multicollinearity makes the computer scream at you!
Milder forms may be detected by significant F-statistics or a high R² accompanied by t-statistics that are not significant.
It can also be detected by a high correlation between pairs of regressors.
The rule of thumb commonly used is: multicollinearity is not serious if no VIF is greater than 10.
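A minimal VIF computation can be sketched directly from its definition, VIF_j = 1 / (1 − R²_j), where R²_j comes from regressing the j-th predictor on the others. The data below are synthetic, with x2 built to be nearly collinear with x1:

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of the predictor
    matrix X (predictor columns only, no intercept column)."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    out = []
    for j in range(k):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])  # regress X_j on the rest
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        r2 = 1.0 - (resid ** 2).sum() / ((y - y.mean()) ** 2).sum()
        out.append(1.0 / (1.0 - r2))
    return out

# Synthetic predictors: x2 is nearly collinear with x1, x3 is independent.
rng = np.random.default_rng(2)
n = 300
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.1, size=n)
x3 = rng.normal(size=n)
vifs = vif(np.column_stack([x1, x2, x3]))
print([round(v, 1) for v in vifs])  # x1 and x2 well above 10, x3 near 1
```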
• Multicollinearity
Remedies
Drop one of the variables that are causing the problem.
Add more observations to the data.
Perform principal component analysis or factor analysis before performing the regression.
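As a sketch of the principal-component remedy, the example below (synthetic data, with x2 nearly duplicating x1) extracts components via the SVD of the centered predictors and regresses the response on the first component only, which removes the collinearity:

```python
import numpy as np

# Synthetic data: x2 nearly duplicates x1, so they are almost collinear.
rng = np.random.default_rng(3)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.05, size=n)
y = 2.0 * x1 + rng.normal(scale=0.5, size=n)

# Principal components of the centered predictors via the SVD.
X = np.column_stack([x1, x2])
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = Xc @ Vt.T  # component scores; the first dominates here
print("component variances:", (s ** 2) / (n - 1))

# Keep only the first component and regress y on it.
A = np.column_stack([np.ones(n), scores[:, 0]])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
print("coefficients on [intercept, PC1]:", coef)
```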
• Serial Correlation / Autocorrelation – error terms are correlated with each other
How to detect?
Use the Durbin–Watson test, which tests for first-order autoregressive serial correlation.
The Durbin–Watson statistic should be close to 2 to conclude there is no serial correlation:
Durbin–Watson = 2 – no serial correlation
0 < Durbin–Watson < 2 – positive autocorrelation
2 < Durbin–Watson < 4 – negative autocorrelation
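The statistic itself is simple to compute: the sum of squared successive differences of the residuals divided by the residual sum of squares. A sketch on simulated residuals, comparing independent errors with an AR(1) series (the 0.9 coefficient is an assumption for the example):

```python
import numpy as np

def durbin_watson(resid):
    """Durbin-Watson statistic: sum of squared successive differences
    of the residuals over the residual sum of squares."""
    resid = np.asarray(resid, dtype=float)
    return np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)

rng = np.random.default_rng(4)
# Independent residuals: the statistic should be near 2.
e_indep = rng.normal(size=500)
# Positively autocorrelated residuals (AR(1), rho = 0.9): well below 2.
e_ar = np.zeros(500)
for t in range(1, 500):
    e_ar[t] = 0.9 * e_ar[t - 1] + rng.normal()

print(f"independent:    {durbin_watson(e_indep):.2f}")  # near 2
print(f"autocorrelated: {durbin_watson(e_ar):.2f}")     # well below 2
```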
• Serial Correlation / Autocorrelation
Remedies
Reintroduction of an important omitted variable may remove the problem.
If the source of the serial correlation is an incorrect functional form, then specifying the correct functional form will solve the problem.
• Heteroskedasticity – the error terms do not have equal (constant) variance
How to detect?
A funnel-shaped residual plot indicates nonconstant variance.
If the residuals form a horizontal band centered around 0, the indication is that the variance is constant.
• Heteroskedasticity
Remedy
If the nature of the heteroskedasticity is “known”, or believed to be of a certain form, then a suitable transformation (e.g. logarithmic) of the variables might remove the problem.
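A sketch of the logarithmic remedy: the synthetic data below have multiplicative errors, so the spread of y grows with x, while log(y) is linear in x with constant error variance. A crude spread-ratio check (upper-half vs lower-half residual standard deviation) moves back toward 1 after the transformation:

```python
import numpy as np

# Synthetic data with multiplicative errors: heteroskedastic in y,
# homoscedastic in log(y) = 1 + 0.3*x + e (invented for illustration).
rng = np.random.default_rng(5)
n = 300
x = rng.uniform(1, 10, size=n)
y = np.exp(1.0 + 0.3 * x + rng.normal(scale=0.2, size=n))

def spread_ratio(response, x):
    """Upper-half vs lower-half residual std after a straight-line OLS fit."""
    m = len(x)
    A = np.column_stack([np.ones(m), x])
    coef, *_ = np.linalg.lstsq(A, response, rcond=None)
    resid = response - A @ coef
    order = np.argsort(x)
    return resid[order[m // 2 :]].std() / resid[order[: m // 2]].std()

print(f"raw y:  {spread_ratio(y, x):.2f}")          # well above 1
print(f"log(y): {spread_ratio(np.log(y), x):.2f}")  # near 1
```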
Exercises
• Use the Pizza data.
• Create an MLR model with the following:
Dependent variable – Calories
Independent variables – Moisture, Protein, Fats, Ash, Sodium, Carbohydrates
• Test the multicollinearity assumption.
• Test the significance of the predictors.
• Test for serial correlation.
• Test the heteroskedasticity assumption.
Other Considerations in the Regression Model
• Indicator Variables
Indicator or dummy variables are used to include categorical or
qualitative regressors in the regression analysis
Dummy variables are also used to compare the responses of different
groups.
Dummy variables assume only the values 0 and 1; generally 1 denotes
the presence of a characteristic, while 0 denotes the absence of the
characteristic. However, assignment of labels to the values is generally
arbitrary.
Even though one of the independent variables in the model is
qualitative, it is possible to include interaction effects or interaction terms
in the model by including cross-product terms.
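A minimal sketch of dummy coding and a cross-product term, using hypothetical group labels:

```python
# Hypothetical two-category qualitative regressor and a numeric predictor.
groups = ["control", "treated", "treated", "control", "treated"]
x = [1.2, 3.4, 2.2, 0.5, 4.1]

# 1 denotes the presence of the characteristic ("treated"), 0 its absence.
d = [1 if g == "treated" else 0 for g in groups]

# Cross-product term d*x lets the slope on x differ between the two groups.
interaction = [di * xi for di, xi in zip(d, x)]
print(d)            # [0, 1, 1, 0, 1]
print(interaction)  # [0, 3.4, 2.2, 0, 4.1]
```

Including both `d` and `interaction` as regressors alongside `x` allows both the intercept and the slope to differ between the groups.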
• Number of Indicator Variables
In general, if a qualitative variable has m categories, we can define (m − 1) dummy variables.
If there is more than one qualitative variable, define the appropriate number of dummy variables for each qualitative variable.
Dummy variables can also be used to model seasonality (time-series data).
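For example, a quarterly seasonal variable has m = 4 categories, so m − 1 = 3 dummies suffice, with the omitted quarter serving as the baseline. A small sketch with invented quarter labels:

```python
# Quarterly data (invented labels); Q1 is omitted as the reference category,
# so each of the remaining quarters gets its own 0/1 dummy.
quarters = ["Q1", "Q2", "Q3", "Q4", "Q1", "Q2"]
levels = ["Q2", "Q3", "Q4"]  # m - 1 = 3 dummies for m = 4 categories

dummies = {lvl: [1 if q == lvl else 0 for q in quarters] for lvl in levels}
print(dummies["Q2"])  # [0, 1, 0, 0, 0, 1]
print(dummies["Q3"])  # [0, 0, 1, 0, 0, 0]
```

The coefficient on each dummy then measures that quarter's shift relative to the omitted baseline quarter.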