1518 (Assignment 2) - Packages III
BSC(HONS.) STATISTICS
SESSION: 2018-2022
SEMESTER: 6TH
ASSIGNMENT TOPICS:
LINEAR REGRESSION
Francis Galton coined the word "regression". In his work on heredity he observed that characteristics inherited by children appear, on average, to a lesser degree than in their parents: regression leads to mediocrity.
For example, tall parents tend to have tall children, but on average the children's heights are closer to the population mean than their parents' heights. In statistics, the term "regression" now refers to techniques that analyse the relation between two or more variables; it is closely connected with averages. [1]
The aim is to develop a relationship in which a dependent variable depends on a set of independent variables, so that we can predict the mean value of the dependent variable for fixed values of the independent variables.
For example, take a hypothetical population of students' heights at fixed ages. At any specific age there is a range of heights: students of the same age do not all have the same height, but on average height increases with age. This is captured by the regression line, which passes through the average height at each specific age, so we can predict the mean height at any age from the regression line. [1]
When the dependent variable depends on a single explanatory variable, the analysis is known as simple regression analysis.
Linear regression is the most widely used statistical technique; it is a way to model a relationship between two sets of variables. The result is a linear regression equation that can be used to make predictions about data.
A two-dimensional graph of simple regression analysis conveys the fundamental idea, since regression analysis is about estimation: predicting the mean of the population response. The error term e also plays an important role in the model, as it represents the following influences, which are assumed to be almost negligible:
Inherent randomness and vagueness of the underlying theory
Unavailability of data
Errors of measurement
Reluctance to include every relevant variable
Inappropriate functional form of the model
If e is zero, because the independent variable X is assumed to be measured with negligible error and the influences above are absent, the model reduces to
Y = a + bX
where the intercept is obtained as a = Ȳ - bX̄, with Ȳ and X̄ the sample means of Y and X.
Regression analysis is all about estimating a and b from the observations of X and Y.
The method of least squares is one such method of estimation; it has powerful statistical properties and is simple in its mathematical calculations.
The model uses the Ordinary Least Squares (OLS) method, which determines the values of the unknown parameters in a linear regression equation. Its goal is to minimize the squared difference between the observed responses and the ones predicted by the linear regression model. [2]
DETERMINING THE REGRESSION FUNCTION : Our target is estimated values as close as possible to the actual values of Y. We therefore choose a and b so that the sum of squared residuals is as small as possible; the method of least squares provides exactly these estimates of a and b.
Differential calculus plays a significant role in finding the unique estimates: setting the partial derivatives of the sum of squares with respect to a and b to zero gives the normal equations.
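The resulting closed-form estimates can be sketched in a few lines of Python using numpy; the age/height values below are illustrative, not taken from real data:

```python
import numpy as np

# Hypothetical sample: X = age in years, Y = height in cm (illustrative values).
X = np.array([10.0, 11.0, 12.0, 13.0, 14.0, 15.0])
Y = np.array([138.0, 144.0, 149.0, 155.0, 160.0, 166.0])

# Least-squares estimates from the normal equations:
#   b = sum((X - X̄)(Y - Ȳ)) / sum((X - X̄)^2),   a = Ȳ - b X̄
x_bar, y_bar = X.mean(), Y.mean()
b = np.sum((X - x_bar) * (Y - y_bar)) / np.sum((X - x_bar) ** 2)
a = y_bar - b * x_bar

Y_hat = a + b * X  # fitted (predicted) values

print(f"a = {a:.3f}, b = {b:.3f}")
```

The same estimates come out of any standard fitting routine (e.g. numpy's polyfit), since both solve the same least-squares problem.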
1) Observable quantities : The least squares estimators are computed from observable (sample) quantities, so they can be calculated easily.
2) Point estimators : The least squares method provides a single value for each estimator, so they are point estimators of the population parameters.
3) Regression line : Having obtained the estimates, we can draw the regression line. The regression line has the following properties: it passes through the point of sample means (X̄, Ȳ), its residuals sum to zero, and there is no correlation between the residuals and the fixed values of X. [1]
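These properties can be checked numerically; the sketch below uses numpy on a small made-up dataset:

```python
import numpy as np

X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Fit Y = a + bX by ordinary least squares.
b, a = np.polyfit(X, Y, 1)
residuals = Y - (a + b * X)

# Property 1: the residuals sum to zero (up to floating-point error).
print(residuals.sum())
# Property 2: the residuals are uncorrelated with the fixed values of X.
print(np.sum(residuals * X))
# Property 3: the line passes through the point of sample means (X̄, Ȳ).
print(a + b * X.mean() - Y.mean())
```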
Properties of Estimators
Taking expectations shows that the least squares estimators are unbiased; they are also consistent and efficient.
A larger error variance (sigma^2) means greater dispersion in the data and therefore less precise estimates of a and b. Conversely, the more spread out the values of X are, the larger the sum of squares of their deviations and the smaller the variance of the least squares estimators, and vice versa.
Among all linear unbiased estimators, the least squares estimators have the minimum variance. This property makes the least squares estimators the Best Linear Unbiased Estimators (BLUE). [1]
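Both claims can be illustrated with a small Monte Carlo simulation; this is a sketch in which the true parameter values and the two sample designs are assumed for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
a_true, b_true, sigma = 2.0, 3.0, 1.0  # assumed true model Y = 2 + 3X + e

def slope_estimates(X, n_sims=2000):
    """Fit OLS on repeated noisy samples and return the slope estimates."""
    slopes = np.empty(n_sims)
    for i in range(n_sims):
        Y = a_true + b_true * X + rng.normal(0.0, sigma, size=X.size)
        slopes[i] = np.polyfit(X, Y, 1)[0]
    return slopes

X_narrow = np.linspace(0, 1, 20)   # small spread of X values
X_wide = np.linspace(0, 10, 20)    # large spread of X values

s_narrow = slope_estimates(X_narrow)
s_wide = slope_estimates(X_wide)

print(s_wide.mean())                 # close to the true slope 3.0 (unbiasedness)
print(s_narrow.var(), s_wide.var())  # wider spread of X -> smaller variance
```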
The correlation coefficient tells how strong the linear relationship between two variables is.
If Regression Report is not specified, Regress automatically gives a list including values for Parameter
Table, R Squared, Adjusted R Squared, Estimated Variance and ANOVA Table. This set of objects
comprises the default Summary Report. [4]
ANOVA Table : ANOVA Table represents the analysis of variance table for the fitted model.
It provides a comparison of the given model to a smaller one including only a constant term. The table
includes the degrees of freedom, the sum of squares and the mean squares due to the model (labeled Model)
and due to the residuals (labeled Error).
The residual mean square is also available as Estimated Variance. It is calculated by dividing the residual sum of squares (the sum of squared Fit Residuals) by its degrees of freedom n - p, where n is the length of the dataset and p is the number of parameters.
The F-test compares the two models using the ratio of their mean squares. If the value of F is large, the null hypothesis supporting the smaller model is rejected.
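The quantities in such a table can be reproduced directly from a fit; this numpy sketch uses made-up data and follows the Model/Error labels above:

```python
import numpy as np

X = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
Y = np.array([1.8, 4.1, 5.9, 8.2, 9.8, 12.1])
n, p = len(Y), 2                  # p parameters: intercept and slope

b, a = np.polyfit(X, Y, 1)
Y_hat = a + b * X

ss_total = np.sum((Y - Y.mean()) ** 2)       # total sum of squares
ss_model = np.sum((Y_hat - Y.mean()) ** 2)   # due to the model (Model)
ss_error = np.sum((Y - Y_hat) ** 2)          # due to the residuals (Error)

ms_model = ss_model / (p - 1)     # model mean square, df = p - 1
ms_error = ss_error / (n - p)     # residual mean square, df = n - p
estimated_variance = ms_error     # the Estimated Variance of the report

F = ms_model / ms_error           # large F -> reject the constant-only model
print(F)
```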
Parameter Table : Parameter Table is one of the possible values that can occur in a regression report. It represents a table of fitted parameter information: estimates, standard errors, t-test statistics and p-values for each parameter.
R Squared : R Squared is the ratio of the model sum of squares to the total sum of squares. It gives the fraction of the variation of the response that is predicted by the model.
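For simple linear regression, R squared equals the square of the correlation coefficient between X and Y; a numpy sketch on illustrative data:

```python
import numpy as np

X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.0, 4.1, 5.8, 8.2, 9.9])

b, a = np.polyfit(X, Y, 1)
Y_hat = a + b * X

ss_total = np.sum((Y - Y.mean()) ** 2)       # total sum of squares
ss_model = np.sum((Y_hat - Y.mean()) ** 2)   # model sum of squares

r_squared = ss_model / ss_total  # fraction of the variation explained
r = np.corrcoef(X, Y)[0, 1]      # correlation coefficient

print(r_squared, r ** 2)         # equal for simple linear regression
```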
Parameter CI Table : Parameter CI Table represents a table of confidence interval information for the
fitted parameters.
Best Fit Parameters : Best Fit Parameters gives the list of estimated parameter values for the model.
Best Fit : Best Fit gives the model with the Best Fit Parameters inserted.
Fit Residuals : Fit Residuals represents the residual errors for the fitted values. It gives the difference between the response values in the input data and the fitted values (Predicted Response).
Predicted Response : Predicted Response represents the fitted values for data. It gives the value of Best
Fit for each of the input data points.
Durbin Watson D : The Durbin-Watson d statistic is a test for autocorrelation in the residuals from a statistical regression analysis. The statistic takes values between 0 and 4; a value close to 2 suggests no autocorrelation.
A value close to 0 indicates positive correlation, and a value close to 4 indicates negative correlation. [4]
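The statistic is the sum of squared successive differences of the residuals divided by their sum of squares; a minimal numpy sketch with made-up residuals:

```python
import numpy as np

# Residuals from a fitted regression (illustrative values).
e = np.array([0.5, 0.3, -0.1, -0.4, -0.2, 0.1, 0.4, 0.2, -0.3, -0.5])

# Durbin-Watson d = sum((e_t - e_{t-1})^2) / sum(e_t^2)
d = np.sum(np.diff(e) ** 2) / np.sum(e ** 2)
print(d)  # between 0 and 4; here well below 2, suggesting positive autocorrelation
```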
REFERENCES:
1. Gujarati, D.N., Porter, D.C. and Gunasekar, S. (1978). Basic Econometrics, 5th edition. Tata McGraw Hill, New York.
2. https://www.statisticshowto.com/probability-and-statistics/regression-analysis/find-a-linear-regression-equation/
3. Vogt, W.P. (2005). Dictionary of Statistics & Methodology: A Nontechnical Guide for the Social Sciences. SAGE.
4. http://reference.wolframcloud.com/language/LinearRegression/tutorial/LinearRegression.html
5. https://www.statisticshowto.com/probability-and-statistics/coefficient-of-determination-r-squared/