
LINEAR REGRESSION

ASSIGNMENT OF STATISTICAL PACKAGES III

COURSE CODE : 3206

BSC(HONS.) STATISTICS

SESSION:2018-2022

SUBMITTED TO: MS. MALIHA BUTT

SUBMITTED BY: HIRA NAEEM

ROLL NUMBER: 1518-BH-STAT-18

SEMESTER: 6TH

ASSIGNMENT TOPICS:

SIMPLE LINEAR REGRESSION AND THE TERMS USED IN


REGRESSION REPORT IN MATHEMATICA


SIMPLE LINEAR REGRESSION

History of the terminology "Regression"

Francis Galton coined the term "regression". In a well-known article he observed that characteristics inherited by children appear, on average, to a lesser degree than in their parents: regression leads to mediocrity.

For example, the children of tall parents tend to be tall, but on average their heights are closer to the overall mean than their parents' heights. In statistics, the term "regression" now refers to techniques for analysing the relationship between two or more variables, and it is closely tied to averages. [1]

INTERPRETATION OF REGRESSION ANALYSIS

The aim is to develop a relationship in which a dependent variable depends on a set of independent variables, in order to predict the mean value of the dependent variable for fixed values of the independent variables.

For example, take a hypothetical population distribution of students' heights at fixed ages. At a specific age there is a range of heights: students of the same age do not all have the same height, but on average height increases with age. This is shown by the regression line, which passes through the average height at each specific age. So the mean height at any age can be predicted from the regression line. [1]

SIMPLE (TWO VARIABLE) REGRESSION ANALYSIS

When a dependent variable depends on a single explanatory variable, the analysis is known as simple regression analysis.

Linear regression is among the most widely used statistical techniques; it models the relationship between two variables. The result is a linear regression equation that can be used to make predictions about the data.

A two-dimensional graph of simple regression analysis conveys the fundamental idea: regression analysis is about estimating, and hence predicting, the mean of the population response for given values of the explanatory variable.

The primary equation is: Y = a + b(X) + e, where e is the error term
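To make the model concrete, it can be simulated directly. A minimal sketch in Python, where the parameter values a = 2.0, b = 0.5 and the error spread 0.3 are made-up assumptions for illustration only:

```python
import random

# Simulate observations from Y = a + b*X + e with assumed values
# a = 2.0, b = 0.5 and a normally distributed error term e.
random.seed(0)  # reproducible example
a, b = 2.0, 0.5
xs = list(range(1, 11))
es = [random.gauss(0.0, 0.3) for _ in xs]     # the error term e
ys = [a + b * x + e for x, e in zip(xs, es)]  # observed responses
print(len(ys))  # 10
```

Setting every error term to zero would reproduce the deterministic line Y = a + b(X) discussed next.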

IMPORTANCE OF ERROR TERM

The error term plays an important role in the model. It stands in for influences that are assumed to be almost negligible:

 Inherent randomness in the behaviour being modelled
 Unavailable data
 Errors of measurement
 Variables deliberately omitted from the model
 An inappropriate functional form

If e is assumed to be zero (that is, the independent variable X is taken to explain Y exactly), the model reduces to Y = a + b(X)

Where;

 a is an unknown but fixed parameter, defined as the intercept.

 b is also an unknown but fixed parameter, defined as the "slope coefficient".

Regression coefficients: [2]

a and b are also called regression coefficients.

Value of b : b = [ ∑XY − (∑X)(∑Y)/n ] / ∑(X − X̄)²

Value of a : a = Ȳ − bX̄

Regression analysis is essentially about estimating a and b from the observed values of X and Y.
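These formulas can be applied directly. A minimal sketch, using small made-up data (the x and y values below are purely illustrative):

```python
# Fit Y = a + b*X by the textbook formulas, on made-up data.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# b = [ sum(XY) - (sum(X))(sum(Y))/n ] / sum((X - x_bar)^2)
num = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n
den = sum((x - x_bar) ** 2 for x in xs)
b = num / den          # slope coefficient
a = y_bar - b * x_bar  # intercept
print(round(b, 2), round(a, 2))  # 0.6 2.2
```

So the fitted line for this data is Y = 2.2 + 0.6X.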

ESTIMATION PROCEDURE AND ASSUMPTIONS

THE LEAST SQUARES METHOD

It is one of the standard methods of estimation: it has attractive statistical properties and is mathematically simple.

The model uses the Ordinary Least Squares (OLS) method, which determines the values of the unknown parameters in a linear regression equation. Its goal is to minimize the difference between the observed responses and those predicted by the linear regression model. [2]

DETERMINING THE REGRESSION FUNCTION : The target is estimated values that are as close as possible to the actual values of Y. So a and b are chosen so that the sum of the squared residuals is as small as possible; the method of least squares provides exactly these estimates of a and b.
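As an illustrative check of this minimization (with made-up data, and a = 2.2, b = 0.6 as the least-squares estimates for it), no nearby line does better:

```python
# Made-up data; a_ols, b_ols are its least-squares estimates.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

def ssr(a, b):
    """Sum of squared residuals of the line Y = a + b*X."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

a_ols, b_ols = 2.2, 0.6
# Every slightly perturbed line has at least as large a sum of squares.
print(all(ssr(a_ols + da, b_ols + db) >= ssr(a_ols, b_ols)
          for da in (-0.1, 0.0, 0.1) for db in (-0.1, 0.0, 0.1)))  # True
```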

UNIQUE ESTIMATES BY THE LEAST SQUARES METHOD

Differential calculus plays a significant role in finding the unique estimates: setting the partial derivatives of the sum of squared residuals with respect to a and b equal to zero gives the normal equations, whose solution is unique.

NUMERICAL PROPERTIES OF THE ORDINARY LEAST SQUARES METHOD

These properties are obtained from the least squares method:

1) Observable quantities : The least squares estimators are expressed in terms of observable (sample) quantities, so they can be computed easily.

2) Point estimators : The least squares method provides a single value for each estimator, so they are point estimators of the population parameters.

3) Regression line : Once the estimates are obtained, the regression line can be drawn. The regression line has the following properties:

 It passes through the sample means of X and Y.

 The mean of the estimated values equals the mean of the actual values.

 The algebraic sum of the residuals is zero.

 There is no correlation between the residuals and the predicted values.

 There is no correlation between the residuals and the fixed values of X. [1]
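These properties can be checked numerically. A minimal sketch using made-up data, fitted by the usual least-squares formulas:

```python
# Made-up data, fitted by least squares; then check the line's properties.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n
b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
     / sum((x - x_bar) ** 2 for x in xs))
a = y_bar - b * x_bar
fitted = [a + b * x for x in xs]
resid = [y - f for y, f in zip(ys, fitted)]

print(abs(sum(resid)) < 1e-9)               # True: residuals sum to zero
print(abs(sum(fitted) / n - y_bar) < 1e-9)  # True: mean fitted = mean of Y
print(abs(a + b * x_bar - y_bar) < 1e-9)    # True: line passes (x_bar, y_bar)
```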

Properties of Estimators

 Taking expectations shows the unbiasedness of an estimator; the least squares estimators are also consistent and efficient.

 A greater error variance (σ²) means greater dispersion and hence less precise information about a and b. Conversely, more spread in the X values gives a larger sum of squares ∑(X − X̄)², which means a smaller variance of the least squares estimators, and vice versa.

 The least squares estimators have the minimum variance among all linear unbiased estimators. This property makes them the best linear unbiased estimators (BLUE). [1]

Coefficient of Determination (R Squared) [3]

The correlation coefficient r tells how strong a linear relationship there is between two variables.

 Range of r: −1 to +1
 Perfect negative relationship: −1
 Perfect positive relationship: +1
 No linear relationship: 0

In simple linear regression, the coefficient of determination R² is the square of r.
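A minimal sketch of computing r for made-up data (the values are illustrative only):

```python
# Correlation coefficient r for made-up data.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
sxx = sum((x - x_bar) ** 2 for x in xs)
syy = sum((y - y_bar) ** 2 for y in ys)
r = sxy / (sxx * syy) ** 0.5   # always lies between -1 and +1
print(round(r, 3))  # 0.775
```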

TERMS USED IN REGRESSION REPORT IN MATHEMATICA

If the RegressionReport option is not specified, Regress automatically gives a list including values for ParameterTable, RSquared, AdjustedRSquared, EstimatedVariance and ANOVATable. This set of objects comprises the default SummaryReport. [4]

ANOVA Table : ANOVA Table represents the analysis of variance table for the fitted model.

It provides a comparison of the given model to a smaller one including only a constant term. The table
includes the degrees of freedom, the sum of squares and the mean squares due to the model (labeled Model)
and due to the residuals (labeled Error).

The residual mean square is also available in Estimated Variance and is calculated by dividing the residual
sum of squares by its degrees of freedom.

The F-test compares the two models using the ratio of their mean squares. If the value of F is large, the null
hypothesis supporting the smaller model is rejected.
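A sketch of the ANOVA quantities for a simple regression on made-up data (two fitted parameters, so the error degrees of freedom are n - 2):

```python
# ANOVA-table quantities for a simple regression on made-up data.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
sxx = sum((x - x_bar) ** 2 for x in xs)

ss_total = sum((y - y_bar) ** 2 for y in ys)
ss_model = sxy ** 2 / sxx        # "Model" sum of squares, 1 df
ss_error = ss_total - ss_model   # "Error" sum of squares, n - 2 df
ms_model = ss_model / 1
ms_error = ss_error / (n - 2)    # residual mean square
F = ms_model / ms_error          # large F rejects the constant-only model
print(round(F, 1))  # 4.5
```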

Estimated Variance : Estimated Variance represents the estimated error variance.



It is equivalent to the sum of the squared Fit Residuals divided by the degrees of freedom n − p, where n is the length of the dataset and p is the number of parameters.
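A minimal sketch, using hypothetical residual values (made up for illustration) from a fit of five data points with two parameters:

```python
# Estimated variance = residual sum of squares / (n - p).
resid = [-0.8, 0.6, 1.0, -0.6, -0.2]  # hypothetical residuals
n, p = 5, 2                           # 5 data points, 2 parameters (a, b)
s2 = sum(e ** 2 for e in resid) / (n - p)
print(round(s2, 3))  # 0.8
```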

Parameter Table : Parameter Table represents a table of fitted parameter information. It includes the estimates, standard errors, t‐test statistics and p‐values for each parameter.

R Squared : R Squared represents the coefficient of determination R².

It is the ratio of the model sum of squares to the total sum of squares. It gives the fraction of the variation in the response that is predicted by the model.
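A minimal sketch with made-up data, computing R² as the model sum of squares over the total sum of squares:

```python
# R squared for a simple regression on made-up data.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
sxx = sum((x - x_bar) ** 2 for x in xs)
ss_total = sum((y - y_bar) ** 2 for y in ys)
ss_model = sxy ** 2 / sxx     # model (regression) sum of squares
r2 = ss_model / ss_total      # fraction of variation explained
print(round(r2, 3))  # 0.6
```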

Parameter CI Table : Parameter CI Table represents a table of confidence interval information for the
fitted parameters.

It includes parameter estimates, standard errors and confidence intervals.

Best Fit Parameters : Best Fit Parameters gives the list of estimated parameter values for the model.

Best Fit : Best Fit gives the model with the Best Fit Parameters inserted.

Fit Residuals : Fit Residuals represents the residual errors for the fitted values. It gives the difference between the response values in the input data and the fitted values (Predicted Response).

Predicted Response : Predicted Response represents the fitted values for data. It gives the value of Best
Fit for each of the input data points.

Durbin Watson D : The Durbin-Watson d statistic is a test for autocorrelation in the residuals from a statistical regression analysis. The statistic takes values between 0 and 4.

A value close to 0 indicates positive autocorrelation, a value close to 4 indicates negative autocorrelation, and a value near 2 suggests little first-order autocorrelation. [4]
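A minimal sketch of the d statistic on hypothetical residuals (values made up for illustration):

```python
# Durbin-Watson d on a hypothetical residual series.
resid = [-0.8, 0.6, 1.0, -0.6, -0.2]
num = sum((resid[t] - resid[t - 1]) ** 2 for t in range(1, len(resid)))
den = sum(e ** 2 for e in resid)
d = num / den   # lies between 0 and 4; near 2 means little autocorrelation
print(round(d, 3))  # 2.017
```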

REFERENCES :


1. Gujarati, D. N., Porter, D. C., & Gunasekar, S. Basic Econometrics, 5th edition. Tata McGraw-Hill.

2. Edwards, A. L. (1976). An Introduction to Linear Regression and Correlation. W. H. Freeman, San Francisco, CA.
   https://www.statisticshowto.com/probability-and-statistics/regression-analysis/find-a-linear-regression-equation/

3. Gonick, L. (1993). The Cartoon Guide to Statistics. Harper Perennial.
   Kotz, S., et al., eds. (2006). Encyclopedia of Statistical Sciences. Wiley.
   Vogt, W. P. (2005). Dictionary of Statistics & Methodology: A Nontechnical Guide for the Social Sciences. SAGE.
   https://www.statisticshowto.com/probability-and-statistics/coefficient-of-determination-r-squared/

4. http://reference.wolframcloud.com/language/LinearRegression/tutorial/LinearRegression.html

