
12/12/2023

Syllabus
• Regression:
• Linear regression
• Non-linear regression
• Experimental Design for Optimization: Theory of Response Surface Methodology (RSM)

REGRESSION
Experimental Design


Introduction

• Regression analysis involves identifying the relationship between a dependent variable and one or more independent variables.
• A model of the relationship is hypothesized, and estimates of the
parameter values are used to develop an estimated regression
equation.
• Various tests are then employed to determine if the model is
satisfactory.
• If the model is satisfactory, the estimated regression equation can be
used to predict the value of the dependent variable given values for
the independent variables.

Types of Regression
1. Linear Regression
2. Polynomial Regression
3. Logistic Regression
4. Quantile Regression
5. Ridge Regression
6. …


Regression Model
• In simple linear regression, the model used to describe the relationship
between a single dependent variable y and a single independent variable x is y =
β0 + β1x + ε.
• β0 and β1 are referred to as the model parameters, and ε is a probabilistic error
term that accounts for the variability in y that cannot be explained by the linear
relationship with x.
• If the error term were not present, the model would be deterministic; in that
case, knowledge of the value of x would be sufficient to determine the value of
y.
• In multiple regression analysis, the model for simple linear regression is
extended to account for the relationship between the dependent variable y and
p independent variables x1, x2, . . ., xp.
• The general form of the multiple regression model is y = β0 + β1x1 + β2x2 + . . . +
βpxp + ε.
• The parameters of the model are the β0, β1, . . ., βp, and ε is the error term.
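The role of the probabilistic error term can be made concrete with a short simulation; the parameter values and x range below are assumed purely for illustration:

```python
import random

# Simulate observations from the simple linear model y = beta0 + beta1*x + eps.
# beta0, beta1 and the noise scale are illustrative assumptions.
random.seed(0)
beta0, beta1 = 2.0, 0.5
xs = [float(i) for i in range(20)]
ys = [beta0 + beta1 * x + random.gauss(0, 1.0) for x in xs]

# Without the error term the model is deterministic:
# knowing x determines y exactly.
deterministic = [beta0 + beta1 * x for x in xs]
```

Rerunning with a different seed changes `ys` but not `deterministic`, which is exactly the distinction the slide draws.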

Least squares method


• The least squares method is the most widely used procedure for
developing estimates of the model parameters.
• For simple linear regression, the least squares estimates of the model
parameters β0 and β1 are denoted b0 and b1.
• Using these estimates, an estimated regression equation is
constructed: ŷ = b0 + b1x .
• The graph of the estimated regression equation for simple linear
regression is a straight line approximation to the relationship between
y and x.
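A minimal sketch of the least squares computation of b0 and b1, using made-up data:

```python
# Least squares estimates b0, b1 for yhat = b0 + b1*x (hypothetical data).
def least_squares(x, y):
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    # b1 = Sxy / Sxx; b0 is chosen so the line passes through (x_bar, y_bar)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

x = [1, 2, 3, 4, 5]
y = [2.1, 4.0, 6.2, 7.9, 10.1]
b0, b1 = least_squares(x, y)
# Estimated regression equation: yhat = b0 + b1 * x
```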


Examples

A primary use of the estimated regression equation is to predict the value of the dependent
variable when values for the independent variables are given.
For instance, if the estimated regression equation is ŷ = 42.3 + 0.49x, then for a patient with a stress test score of 60 the predicted blood pressure is 42.3 + 0.49(60) = 71.7.
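The prediction step is just plugging the given x into the estimated equation:

```python
# Estimated regression equation from the slide: yhat = 42.3 + 0.49*x,
# applied to a stress test score of 60.
b0, b1 = 42.3, 0.49
stress_score = 60
predicted_bp = b0 + b1 * stress_score  # predicted blood pressure
```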

Analysis of variance and goodness of fit


• A commonly used measure of the goodness of fit provided by the estimated regression equation is the coefficient of determination, R² (in simple linear regression, the square of the correlation coefficient).
• Coefficient of determination, in statistics, R2 (or r2), a measure that assesses the ability of a model
to predict or explain an outcome in the linear regression setting.
• More specifically, R2 indicates the proportion of the variance in the dependent variable (Y) that is
predicted or explained by linear regression and the predictor variable (X, also known as the
independent variable).
• An R2 of 0.35, for example, indicates that 35 percent of the variation in the outcome has been
explained just by predicting the outcome using the covariates included in the model.
• R2 = SSR/SST
• SSR + SSE = SST
• SST = Σ(y − ȳ)2
• SSR = regression sum of squares, SSE = error sum of squares, SST = total sum of squares
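The sum-of-squares decomposition above can be checked numerically; the data below are made up, and the line is fitted by least squares so that SSR + SSE = SST holds:

```python
# Decompose total variation into SSR and SSE and compute R^2
# (hypothetical data, least-squares line with intercept).
x = [1, 2, 3, 4, 5]
y = [2.0, 4.1, 5.9, 8.2, 9.8]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) \
     / sum((xi - x_bar) ** 2 for xi in x)
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * xi for xi in x]

sst = sum((yi - y_bar) ** 2 for yi in y)                 # total sum of squares
ssr = sum((fi - y_bar) ** 2 for fi in y_hat)             # regression sum of squares
sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))    # error sum of squares
r2 = ssr / sst
# For least squares with an intercept, SSR + SSE equals SST (up to rounding).
```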


Let's interpret the values

Regression Statistics
Multiple R          0.986276597
R Square            0.972741527
Adjusted R Square   0.970469987
Standard Error      1.778042169
Observations        14

• Multiple R: the correlation coefficient, which measures the strength of the linear relationship between two variables. The larger the absolute value, the stronger the relationship: 1 means a strong positive relationship, −1 a strong negative relationship, and 0 no relationship at all.
• R Square is the coefficient of determination, which shows the goodness of fit — how closely the points fall to the regression line. In our example, about 97% of the variation in the dependent variable (y-values) is explained by the independent variable (x-values).
• Adjusted R Square is a modified version of R Square that adjusts for predictors that do not contribute significantly to the regression model.
• Standard Error is another goodness-of-fit measure that shows the precision of your regression analysis.


ANOVA
• df is the number of degrees of freedom associated with each source of variance.
• SS is the sum of squares. The smaller the Residual SS relative to the Total SS, the better the model fits the data.
• MS is the mean square (SS divided by df).
• F is the F statistic for the null hypothesis; it is used to test the overall significance of the model.
• Significance F is the p-value of F.

ANOVA
            df   SS            MS            F            Significance F
Regression  1    1353.82113    1353.82113    428.2300848  9.3639E-11
Residual    12   37.93720744   3.161433954
Total       13   1391.758337

            Coefficients   Standard Error   t Stat        P-value       Lower 95%     Upper 95%
Intercept   3.947931358    1.12885803       3.497278888   0.004403889   1.488360998   6.407501717
0.15        12.34519188    0.596567043      20.6937209    9.3639E-11    11.04538396   13.64499981
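The MS and F columns of the ANOVA table can be re-derived from the SS and df columns, since MS = SS/df and F = regression MS / residual MS:

```python
# Re-derive MS and F from the ANOVA table's SS and df values.
ss_reg, df_reg = 1353.82113, 1
ss_res, df_res = 37.93720744, 12

ms_reg = ss_reg / df_reg   # regression mean square
ms_res = ss_res / df_res   # residual mean square
f_stat = ms_reg / ms_res   # overall F statistic
# f_stat reproduces the table's F of about 428.23
```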

Multiple Linear Regression


• Multiple linear regression is a method we can use to understand the
relationship between two or more explanatory variables and a
response variable.
• Example: Suppose we want to know whether the number of hours spent studying and the number of prep exams taken affect the score that a student receives on a certain college entrance exam.
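A minimal sketch of fitting such a model by least squares; the hours-studied, prep-exam, and exam-score values below are made up for illustration:

```python
import numpy as np

# Multiple linear regression: score = b0 + b1*hours + b2*prep_exams + eps.
hours = np.array([1.0, 2, 2, 3, 4, 5, 6])       # hours studied (hypothetical)
prep  = np.array([0.0, 1, 2, 1, 2, 3, 2])       # prep exams taken (hypothetical)
score = np.array([55.0, 60, 59, 66, 70, 74, 80])

# Design matrix with an intercept column of ones.
X = np.column_stack([np.ones_like(hours), hours, prep])
beta, *_ = np.linalg.lstsq(X, score, rcond=None)  # beta = (b0, b1, b2)
y_hat = X @ beta                                  # fitted exam scores
```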


SUMMARY OUTPUT

Regression Statistics
Multiple R          0.855691
R Square            0.732207
Adjusted R Square   0.698733
Standard Error      5.454812
Observations        19

ANOVA
            df   SS          MS         F          Significance F
Regression  2    1301.71     650.855    21.87382   2.64E-05
Residual    16   476.0796    29.75497
Total       18   1777.789

                Coefficients   Standard Error   t Stat      P-value     Lower 95%   Upper 95%
Intercept       66.88755       3.093505         21.62193    2.87E-13    60.32961    73.44548
Hours studied   5.71029        0.942763         6.056976    1.66E-05    3.711722    7.708857
Prep exams      -0.56044       0.931605         -0.60159    0.555876    -2.53536    1.414471

Interpretation
• R Square: 0.732 → 73.2% of the variation in the exam scores can be explained by the number of hours studied and the number of prep exams taken.
• Standard Error: 5.455. In this example, the observed values fall an average of about 5.45 units from the regression line.
• F: 21.87. This is the overall F statistic for the regression model, calculated as regression MS / residual MS.
• Significance F: 2.64E-05 → indicates that the explanatory variables, hours studied and prep exams taken, combined have a statistically significant association with exam score.
