COLLEGE OF AGRICULTURE AND ENVIRONMENTAL SCIENCES
DEPARTMENT OF AGRICULTURAL ECONOMICS
ECONOMETRICS (AgEc-2133)
Assistant Professor of Agricultural Economics
Gondar, Ethiopia
Thursday, January 28, 2021

Unit Objectives
After the completion of your study in this unit, you should be able to:
» Define econometrics;
» Describe and understand the uses of econometrics;
» Discuss the methodology of econometrics;
» Describe how econometrics combines several disciplines, and describe its models;
» Describe the elements of econometrics;
» Describe the types of econometrics.

Definition of Econometrics
» Econometrics means economic measurement.
» It is the application of statistical techniques to economic variables that are expressed in mathematical form.
» It is the social science in which the tools of economic theory, mathematics and statistical inference are applied to the analysis of economic phenomena.
» It is an integration of economics, mathematical economics and statistics, with the objective of providing numerical values for the parameters of economic relationships.

[Figure: econometrics as the combination of economic theory, mathematical economics, economic statistics and mathematical statistics.]

Economic and Econometric Models
» The first task an econometrician faces is that of formulating an econometric model.
» What is a model?
  ○ A model is a simplified representation of a real-world process.
  ○ For instance, "the demand for oranges depends on the price of oranges" is a simplified representation, since a host of other variables also determine the demand for oranges.
  ○ A model should be representative in the sense that it should contain the salient features of the phenomenon under study.
Simple models are easier:
» To understand
» To communicate
» To test empirically with data

But the choice of a simple model to explain complex real-world phenomena leads to two criticisms:
» The model is oversimplified.
» The assumptions are unrealistic.
For instance, to say that the demand for oranges depends only on the price of oranges is both an oversimplification and an unrealistic assumption.

» To the criticism of oversimplification, many have argued that it is better to start with a simplified model and progressively construct more complicated models.
» As to the criticism of unrealistic assumptions, the relevant question is whether the assumptions are sufficiently good approximations for the purpose at hand.
» In practice we include in our model:
  ○ Variables that we think are relevant for our purpose.
  ○ A "disturbance" or "error" term, which accounts for variables that are omitted as well as all unforeseen forces.
» This brings us to the distinction between an economic model and an econometric model.
» Mathematical modeling is exact in nature, whereas statistical modeling contains a stochastic term.
» Any economic theory is an abstraction from the real world:
  ✓ The immense complexity of the real-world economy makes it impossible for us to understand all interrelationships at once.
  ✓ Not all interrelationships are equally important for understanding the economic phenomenon under study.
» An economic model is a set of assumptions that approximately describes the behavior of an economy (or a sector of an economy).
» It is an organized set of relationships that describes the functioning of an economic entity under a set of simplifying assumptions.
» Economic models consist of the following three basic structural elements:
  ✓ A set of variables
  ✓ A list of fundamental relationships
  ✓ A number of strategic coefficients
An econometric model consists of the following:
1) A set of behavioral equations derived from the economic model. These equations involve some observed variables and some "disturbances".
2) A statement of whether there are errors of observation in the observed variables.
3) A specification of the probability distribution of the "disturbances".
» With these specifications, we can proceed to test the empirical validity of the economic model and use it to make forecasts or for policy analysis.

Desirable properties of an econometric model
The goodness of an econometric model is customarily judged according to the following desirable properties:
1. Theoretical plausibility: The model should be compatible with the postulates of economic theory.
2. Explanatory ability: The model should be able to explain the observations of the actual world.
3. Accuracy of the estimates of the parameters: The estimates of the coefficients should approximate the true parameters of the structural model as closely as possible.
4. Forecasting ability: The model should produce satisfactory predictions of future values of the dependent (endogenous) variables.
5. Simplicity: The model should represent the economic relationships with maximum simplicity.
» The more of the above properties a model possesses, the better the econometric model.

Goal of Econometrics
» Econometrics provides techniques for the verification or refutation of economic theories and for the estimation of the parameters of econometric models.
The aims of econometrics are:
» Testing economic theory and formulating econometric models, that is, formulating economic models in an empirically testable form (specification aspect).
» Estimation and testing of these models with observed data (inference aspect).
» Use of these models for prediction and policy purposes (forecasting the behavior of economic variables).
» Estimation of parameters: A parameter is a population value that we do not know but wish to estimate.
  ○ Estimation is the process that enables us to determine the approximate value of a parameter of interest.
» Hypothesis testing: A hypothesis is a statement that we are interested in and wish to check. It may or may not be true, but it is subject to checking.
  ○ For instance, one may state that there is a positive relationship between consumption and disposable income.
  ○ Or one may hypothesize that the mean life expectancy in Ethiopia equals 65.5 years, and so on.
» Prediction: Prediction is the forecasting of unknown values on the basis of prior information.
  ○ To predict the unknown values, we should first estimate the population parameters.

The main differences between economic theory, mathematical economics, statistics and econometrics are listed below.
» Mathematical economics expresses economic theory in mathematical form (equations or models) without regard to the measurability or empirical verification of the theory.
  ○ It changes verbal descriptions into mathematical symbols or equations.
  ○ Economic theory uses verbal descriptions of relationships, while mathematical economics states economic theory in terms of mathematical symbols.
  ○ Both express economic relationships in an exact form: they do not allow for random variation and do not provide numerical values for the coefficients of the relationships.
» Econometrics is primarily interested in the empirical verification of economic theory. The econometrician often uses mathematical models proposed by the mathematical economist, but puts these models in forms that lend themselves to empirical testing.
  ○ However, econometrics assumes that the relationships are not exact and takes random disturbances into account.
» Economic statistics is mainly concerned with collecting, processing and presenting economic data in the form of charts, diagrams and tables. The economic statistician collects data on GNP, employment, unemployment, prices, etc. The data thus collected constitute the raw data for econometric work. But the economic statistician, not being primarily concerned with using the collected data to test economic theories, does not go any further.
» Mathematical statistics provides many of the tools employed by the econometrician.
  ○ The econometrician often needs special methods in view of the unique nature of most economic data; namely, the data are not usually generated as the result of a controlled experiment.
  ○ The econometrician, like the meteorologist, generally depends upon data that cannot be controlled directly.
  ○ Thus data on consumption, income, investment, savings, prices, etc., which are collected by public and private agencies, are non-experimental in nature.

Generally, the subject of econometrics deserves to be studied in its own right for the following reasons:
1. Economic theory makes statements or hypotheses that are mostly qualitative in nature. The law of demand, for example, does not provide any numerical measure of the relationship between price and quantity demanded. Providing such a measure is the job of the econometrician.
2. The main concern of mathematical economics is to express economic theory in mathematical form without regard to the measurability or empirical verification of the theory. Econometrics is mainly interested in the empirical verification of economic theory.
3. Economic statistics is mainly concerned with collecting, processing and presenting economic data in the form of charts and tables. It does not go any further. The one who does that is the econometrician.
4. Mathematical statistics provides many of the tools for economic studies, but econometrics supplies the latter with many special methods of quantitative analysis based on economic data.
Methodology of Econometrics
» Statement of Theory or Hypothesis: Keynes postulated that the marginal propensity to consume (MPC), the rate of change of consumption for a unit change in income, is greater than zero but less than 1.
» Specification of the Mathematical Model of Consumption: A mathematical economist might suggest the following form of the Keynesian consumption function:

  $Y = \beta_1 + \beta_2 X, \quad 0 < \beta_2 < 1$    Eq(1)

  ○ This equation states that consumption is linearly related to income.
  ○ Y is called the dependent variable, regressand, or endogenous variable.
  ○ X is called the independent, exogenous, or explanatory variable, or regressor.
» Specification of the Econometric Model of Consumption: The purely mathematical model of the consumption function is of limited interest to the econometrician.
  ○ It assumes that there is an exact or deterministic relationship between consumption and income.
  ○ But relationships between economic variables are generally inexact, because in addition to income, other variables affect consumption expenditure.
  ○ To allow for the inexact relationships between economic variables, the econometrician would modify the deterministic consumption function as follows:

  $Y = \beta_1 + \beta_2 X + u$    Eq(2)

  ○ This is an example of a linear regression model.
  ○ The disturbance term (u) may well represent all those factors that affect consumption but are not taken into account explicitly.
  ○ The econometric consumption function hypothesizes that the dependent variable Y (consumption) is linearly related to the explanatory variable X (income), but that the relationship between the two is not exact: it is subject to individual variation.

Significance of the Stochastic Disturbance Term (u)
» The disturbance term is a surrogate for all those variables that are omitted from the model but that collectively affect the dependent variable (Y).
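The contrast between the exact form Eq(1) and the stochastic form Eq(2) can be sketched numerically. This is only an illustration: the parameter values (β1 = 100, β2 = 0.7), the income grid, and the normal distribution assumed for u are hypothetical choices, not values taken from these notes.

```python
import numpy as np

# Hypothetical parameter values, chosen only for illustration.
BETA_1 = 100.0  # intercept
BETA_2 = 0.7    # slope (MPC); satisfies 0 < BETA_2 < 1


def deterministic_consumption(income):
    """Eq(1): exact relationship, no disturbance term."""
    return BETA_1 + BETA_2 * income


def stochastic_consumption(income, rng, sigma=20.0):
    """Eq(2): the same line plus a random disturbance u (assumed normal here)."""
    u = rng.normal(loc=0.0, scale=sigma, size=income.shape)
    return deterministic_consumption(income) + u


rng = np.random.default_rng(seed=0)
income = np.linspace(1000.0, 2000.0, 11)

y_exact = deterministic_consumption(income)
y_observed = stochastic_consumption(income, rng)

# Each observed value differs from the exact line by its own disturbance u.
for x, ye, yo in zip(income, y_exact, y_observed):
    print(f"X={x:7.1f}  exact Y={ye:7.1f}  observed Y={yo:7.1f}  u={yo - ye:6.1f}")
```

The exact values fall on a straight line, while the observed values scatter around it; that scatter is what the disturbance term is meant to capture.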
» Reasons for including a disturbance term in an econometric model:
  ❖ Vagueness of theory: The theory determining the behavior of Y may be, and often is, incomplete.
  ❖ Unavailability of data.
  ❖ Core variables vs. peripheral variables: The joint influence of all or some of the omitted variables may be so small, and at best nonsystematic or random, that as a practical matter and for cost considerations it does not pay to introduce them into the model explicitly.
  ❖ Intrinsic randomness in human behavior: Even if we succeed in introducing all the relevant variables into the model, there is bound to be some "intrinsic" randomness in individual Y's that cannot be explained no matter how hard we try.
  ❖ Poor proxy variables: Although the classical regression model assumes that the variables Y and X are measured accurately, in practice the data may be plagued by errors of measurement.
  ❖ Principle of parsimony: If we can explain the behavior of Y "substantially" with two or three explanatory variables, and if our theory is not strong enough to suggest what other variables might be included, there is little reason to introduce more variables; u can represent all the others.
  ❖ Wrong functional form: Even if we have theoretically correct variables explaining a phenomenon, very often we do not know the form of the functional relationship between the regressand and the regressors.

» Obtaining Data: To estimate the econometric model, that is, to obtain the numerical values of β1 and β2, we need data.

Table 1.1 Data on Y (Personal Consumption Expenditure) and X (Gross Domestic Product), 1982-1996, in billions of dollars

Year    Y         X
1982    3081.5    4620.3
1983    3240.6    4803.7
1984    3407.6    5140.1
1985    3566.5    5323.5
1986    3708.7    5487.7
1987    3822.3    5649.5
1988    3972.7    5865.2
1989    4064.6    6062.0
1990    4132.2    6136.3
1991    4105.8    6079.4
1992    4219.8    6244.4
1993    4343.6    6380.6
1994    4486.0    6610.7
1995    4395.3    6742.1
1996    4714.1    6928.4
» Estimation of the Econometric Model: The numerical estimates of the parameters give empirical content to the consumption function.
  ○ The statistical technique of regression analysis is the main tool used to obtain the estimates.
  ○ Using this technique and the data given in Table 1.1, we obtain the following estimates of β1 and β2, namely -184.08 and 0.7064 respectively.
  ○ Thus, the estimated consumption function is:

  $\hat{Y} = -184.08 + 0.7064 X$    Eq(3)

  ○ The slope coefficient (i.e., the MPC) was about 0.70, which indicates that the average, or mean, consumption expenditure went up by about 70 cents for a dollar's increase in real income.
» Hypothesis Testing: As noted earlier, Keynes expected the MPC to be positive but less than 1.
  ○ In our example we found the MPC to be about 0.70.
  ○ But before we accept this finding as confirmation of Keynesian consumption theory, we must ask: is 0.70 statistically less than 1?
  ○ Such confirmation or refutation of economic theories on the basis of sample evidence is based on a branch of statistical theory known as statistical inference (hypothesis testing).
» Forecasting or Prediction: If the chosen model does not refute the hypothesis or theory under consideration, we may use it to predict the future value(s) of the dependent, or forecast, variable Y on the basis of known or expected future value(s) of the explanatory, or predictor, variable X.
  ○ Suppose we want to predict the mean consumption expenditure for 1997. The GDP value for 1997 was 7269.8 billion dollars. Putting this GDP figure on the right-hand side of Eq(3), we obtain:

  $\hat{Y}_{1997} = -184.08 + 0.7064(7269.8) \approx 4951.3167$    Eq(4)

  ○ The actual value of the consumption expenditure reported in 1997 was 4913.5 billion dollars.
  ○ The estimated model Eq(3) thus overpredicted the actual consumption expenditure by about 37.82 billion dollars.
  ○ We could say the forecast error is about 37.82 billion dollars.
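The estimation and forecasting steps above can be sketched in a few lines. The data arrays below are typed in from Table 1.1 as it appears in these notes; some digits may have been mis-transcribed, so the fitted coefficients can differ somewhat from the reported -184.08 and 0.7064. The function name `ols_simple` is a hypothetical helper, and the forecast deliberately uses the reported coefficients from Eq(3) rather than the refitted ones.

```python
import numpy as np

# Table 1.1 as printed in these notes (billions of dollars), 1982-1996.
Y = np.array([3081.5, 3240.6, 3407.6, 3566.5, 3708.7, 3822.3, 3972.7, 4064.6,
              4132.2, 4105.8, 4219.8, 4343.6, 4486.0, 4395.3, 4714.1])
X = np.array([4620.3, 4803.7, 5140.1, 5323.5, 5487.7, 5649.5, 5865.2, 6062.0,
              6136.3, 6079.4, 6244.4, 6380.6, 6610.7, 6742.1, 6928.4])


def ols_simple(x, y):
    """Least-squares intercept and slope for the two-variable model y = b1 + b2*x + u."""
    x_dev = x - x.mean()
    y_dev = y - y.mean()
    slope = (x_dev * y_dev).sum() / (x_dev ** 2).sum()
    intercept = y.mean() - slope * x.mean()
    return intercept, slope


b1_hat, b2_hat = ols_simple(X, Y)
print(f"fitted intercept = {b1_hat:.2f}, fitted slope (MPC) = {b2_hat:.4f}")

# Forecast for 1997 using the coefficients reported in Eq(3).
B1_REPORTED, B2_REPORTED = -184.08, 0.7064
gdp_1997 = 7269.8
actual_1997 = 4913.5

forecast_1997 = B1_REPORTED + B2_REPORTED * gdp_1997
forecast_error = forecast_1997 - actual_1997
print(f"forecast = {forecast_1997:.2f}, forecast error = {forecast_error:.2f}")
```

With the rounded coefficients the forecast comes out a hundredth or so below the figure in Eq(4), which is the size of discrepancy that rounding of the coefficients alone would produce.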
  ○ Forecast errors are inevitable given the statistical nature of our analysis.
  ○ There is another use of the estimated model Eq(3).
  ○ Suppose the President decides to propose a reduction in the income tax.
  ○ Suppose that, as a result of the proposed policy change, investment expenditure increases.
  ○ As macroeconomic theory shows, the change in income following a dollar's change in investment expenditure is given by the income multiplier M, which is defined as:

  $M = \frac{1}{1 - MPC}$    Eq(5)

  ○ If we use the MPC of 0.70 obtained in Eq(3), this multiplier becomes about M = 3.33.
  ○ That is, an increase (decrease) of a dollar in investment will eventually lead to more than a threefold increase (decrease) in income.
  ○ Thus, a quantitative estimate of the MPC provides valuable information for policy purposes.
  ○ Knowing the MPC, one can predict the future course of income, consumption expenditure, and employment following a change in the government's fiscal policies.
» Use of the Model for Control or Policy Purposes: An estimated model may be used for control, or policy, purposes.
  ○ By an appropriate fiscal and monetary policy mix, the government can manipulate the control variable X to produce the desired level of the target variable Y.

Anatomy of econometric modeling
Economic theory
→ Mathematical model of theory
→ Econometric model of theory
→ Obtaining data
→ Estimation of econometric model
→ Hypothesis testing
→ Forecasting or prediction
→ Using the model for control or policy purposes

Types of Econometrics
Econometrics may be divided into two broad categories: theoretical econometrics and applied econometrics.
Theoretical econometrics is concerned with the development of appropriate methods for measuring economic relationships specified by econometric models. It covers the development of tools and methods and the study of the properties of econometric methods; that is, it is concerned both with the properties of existing methods and with developing new ones.
Theoretical econometrics is closely related to mathematical statistics: it states the assumptions of a particular method and the properties of that method.
» It is particularly important to know what happens when these assumptions are not fulfilled.
» In economics, data cannot usually be obtained from laboratory experiments, because the researcher cannot hold all other conditions constant and change one element at a time.
» In this respect, econometrics leans heavily on mathematical statistics.
» For example, theoretical econometrics may state the assumptions of the least squares method, its properties, and what happens to these properties when one or more of the assumptions of the method are not fulfilled.

Applied econometrics is concerned with the measurement of the parameters of economic relationships and with the prediction, by means of these parameters, of the values of economic variables.
» It describes the development of quantitative economic models and the application of econometric methods to these models using economic data.
» We use the tools of theoretical econometrics to study some special field(s) of economics and business, such as the production function, investment function, demand and supply functions, portfolio theory, etc.
» It is the application of theoretical econometrics to the analysis of economic phenomena.

The core elements of econometrics
» Collection of data
» Specification
» Estimation
» Inference

Types of data
Various types of data are used in the estimation of a model.
❖ Cross-sectional data: Cross-section data give information on the variables concerning individual agents (e.g., consumers or producers) at a given point in time.
❖ Time series data: Time series data give information about the numerical values of variables from period to period and are collected over time. For example, monthly income data for the years 1990-2010 constitute a time series.
❖ Panel data: Panel data are data from repeated surveys of a single (cross-section) sample in different periods of time.
❖ Dummy variable data: When the variables are qualitative in nature, the data are recorded in the form of an indicator function.
  ○ The values of the variables do not reflect the magnitude of the data. They reflect only the presence or absence of a characteristic.
  ○ For example, variables like religion, sex, taste, etc. are qualitative variables.

Unit Objectives
After the completion of your study in this unit, you should be able to:
❖ Describe and understand the meaning of correlation and covariance;
❖ Compute and interpret the covariance of two random variables;
❖ Describe and understand what a correlation coefficient is;
❖ Compute and interpret the correlation coefficient of two random variables;
❖ Describe the types of correlation coefficient.

Covariance
» The parameter that captures the direction (plus or minus) of the relationship between two variables is known as covariance.
» It describes the relationship between two variables without regard to any cause-and-effect relationship.
» Covariance indicates how two variables are related.
» The formula for calculating the covariance of sample data is

  $\sigma_{xy} = \mathrm{cov}(x, y) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$

Regarding the value of $\sigma_{xy}$, there are three possible cases:
» Case I: $\sigma_{xy} > 0$. In this case larger-than-mean values of X are usually associated with larger-than-mean values of Y, and vice versa.
» Case II: $\sigma_{xy} < 0$. In this case larger-than-mean values of X are usually associated with smaller-than-mean values of Y, and vice versa.
» Case III: $\sigma_{xy} = 0$. In this case there is no systematic linear relationship between X and Y.
  ✓ In this situation, X and Y are either statistically independent or non-linearly related.
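The three cases can be checked with a small sketch of the sample covariance formula. The data here are hypothetical and were chosen so that each case appears exactly; in particular, the third series follows y = x², which is perfectly (but non-linearly) related to x and still has zero covariance.

```python
import numpy as np


def sample_covariance(x, y):
    """Sample covariance: sum of cross-products of deviations divided by n - 1."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return ((x - x.mean()) * (y - y.mean())).sum() / (len(x) - 1)


# Hypothetical data, symmetric around zero.
x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])

cov_positive = sample_covariance(x, 3.0 + 2.0 * x)  # Case I: y rises with x
cov_negative = sample_covariance(x, 3.0 - 2.0 * x)  # Case II: y falls as x rises
cov_zero = sample_covariance(x, x ** 2)             # Case III: non-linear relation

print(cov_positive, cov_negative, cov_zero)
```

The third result shows why a zero covariance should not be read as "no relationship": it only rules out a systematic linear one.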
Example 1: To understand how covariance is used, consider the table below, which gives the rate of economic growth ($x_i$) and the rate of return on the stock price ($y_i$).

Economic growth (%)    2.1    2.5    4.0    3.6
Stock price return (%)   8     12     14     10

» Find the covariance of the rate of economic growth ($x_i$) and the rate of return on the stock price ($y_i$).
» Covariance depends on the units in which the variables are measured.
» Using covariance, you can determine whether the variables tend to increase or decrease together.
» But covariance cannot measure the degree to which the variables move together; that is, it does not measure the degree of association.

Correlation Analysis
» Correlation analysis deals with the measurement of the closeness of the relationship described in the regression equation.
» Correlation is a statistical technique used to determine the degree to which two variables are related.
» Suppose we have two variables X and Y:
  ❖ When higher values of X are associated with higher values of Y, and vice versa, the correlation is said to be positive or direct.
  ❖ When higher values of X are associated with lower values of Y, and vice versa, the correlation is said to be negative or inverse.
» Correlation measures the degree to which variables move together.
» Correlation standardizes the measure of interdependence between two variables. The correlation measurement, called a correlation coefficient, always takes a value between -1 and 1.

Correlation Coefficient
» The correlation coefficient is a quantitative measure of the strength of the linear relationship between two variables.
» The correlation coefficient r measures the degree of "straight-line" association between the values of two variables. Thus a value of +1.0 or -1.0 is obtained if all the points in a scatter plot lie on a perfect straight line.
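Example 1 can be worked numerically, and the same sketch shows why covariance depends on units while correlation does not. The helper names `sample_cov` and `sample_corr` are hypothetical, and re-expressing growth as a fraction instead of a percentage is an illustrative choice, not part of the example.

```python
import numpy as np

growth = np.array([2.1, 2.5, 4.0, 3.6])       # x: economic growth, %
returns = np.array([8.0, 12.0, 14.0, 10.0])   # y: stock price return, %


def sample_cov(x, y):
    """Sample covariance with divisor n - 1."""
    return ((x - x.mean()) * (y - y.mean())).sum() / (len(x) - 1)


def sample_corr(x, y):
    """Covariance divided by the product of the sample standard deviations."""
    return sample_cov(x, y) / (x.std(ddof=1) * y.std(ddof=1))


cov_xy = sample_cov(growth, returns)
r_xy = sample_corr(growth, returns)
print(f"covariance = {cov_xy:.2f}, correlation = {r_xy:.2f}")

# Same data, with growth written as a fraction (0.021 instead of 2.1).
growth_frac = growth / 100.0
cov_rescaled = sample_cov(growth_frac, returns)
r_rescaled = sample_corr(growth_frac, returns)
print(f"rescaled covariance = {cov_rescaled:.4f}, rescaled correlation = {r_rescaled:.2f}")
```

Changing the unit of x divides the covariance by 100 but leaves the correlation untouched, which is the sense in which correlation standardizes the measure of interdependence.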
» The population correlation coefficient ρ (rho) measures the strength of the association between the variables.
» The sample correlation coefficient r is an estimate of ρ and is used to measure the strength of the linear relationship in the sample observations.

Features of ρ and r
» Unit free.
» Range between -1 and 1.
» If the correlation coefficient is 1, the variables have a perfect positive correlation.
» If the correlation coefficient is zero, no linear relationship exists between the variables.
» If the correlation coefficient is -1, the variables are perfectly negatively correlated (or inversely correlated) and move in opposition to each other.
» In essence, r is a measure of the scatter of the points around an underlying linear trend.
» It does not necessarily imply any cause-and-effect relationship.
» It is symmetrical in nature; that is, the coefficient of correlation between x and y is the same as that between y and x.
» The correlation coefficient usually calculated is called Pearson's correlation coefficient, r.
» The sample correlation coefficient is computed by:

  $r_{xy} = \frac{\mathrm{cov}(x, y)}{\sqrt{\mathrm{var}(x)\,\mathrm{var}(y)}}$

  or, with $x_i = X_i - \bar{X}$ and $y_i = Y_i - \bar{Y}$ denoting deviations from the means,

  $r_{xy} = \frac{\sum x_i y_i}{\sqrt{\sum x_i^2 \sum y_i^2}}$

  or

  $r_{xy} = \frac{n \sum X_i Y_i - \sum X_i \sum Y_i}{\sqrt{\left[n \sum X_i^2 - (\sum X_i)^2\right]\left[n \sum Y_i^2 - (\sum Y_i)^2\right]}}$

» To calculate the correlation, you must know the covariance of the two variables and the standard deviation of each variable.
» From the earlier example, you know that the covariance of stock price returns and economic growth was calculated to be 1.53.
» Now determine the standard deviation of each of the variables to obtain the correlation coefficient:

  $r_{xy} = \frac{\mathrm{cov}(x, y)}{s_x s_y} = \frac{1.53}{(0.90)(2.58)} \approx 0.66$

» The correlation coefficient possesses all the characteristics of covariance. In addition, it provides a measure of strength.
» A very small correlation does not necessarily indicate that two variables are not associated.
» However, to be sure of this we should study a plot of the data, because it is possible that the two variables display a non-linear relationship.
» In such cases r will underestimate the association, as it is a measure of linear association alone.

[Figure: scatter plots of Y against X illustrating r close to +1, r close to -1, r positive but close to zero, r negative but close to zero, and Y = X² with r = 0.]

» Example 2: From a sample of 20 observations on two random variables X and Y, the following data are obtained:

  $\sum (X_i - \bar{X})^2 = 215.4$, $\sum (Y_i - \bar{Y})^2 = 86.9$, $\sum X_i = 186.2$, $\sum (X_i - \bar{X})(Y_i - \bar{Y}) = 106.4$, $\sum Y_i = 21.9$, $n = 20$

  A. Find the covariance of X and Y and interpret your result.
  B. Find the correlation coefficient of X and Y and interpret your result.

» Example 3: When nicotine is absorbed by the body, cotinine is produced. A measurement of cotinine in the body is therefore a good indicator of how much a person smokes. Listed below are the reported number of cigarettes smoked per day ($X_i$) and the measured amount of cotinine ($Y_i$, in ng/ml).

  X_i    60    10    4      15    10    1      20    8      7      10     10    20
  Y_i    179   283   75.6   174   209   9.51   350   1.85   43.4   25.1   408   344

» Calculate the sample correlation coefficient for cigarettes smoked per day and amount of cotinine.
  $n = 12$, $\sum X_i = 175$, $\sum Y_i = 2102.46$, $\sum X_i^2 = 5155$, $\sum Y_i^2 = 601709.7926$, $\sum X_i Y_i = 37111.51$. Hence,

  $r = \frac{12(37111.51) - (175)(2102.46)}{\sqrt{\left[12(5155) - 175^2\right]\left[12(601709.7926) - 2102.46^2\right]}} \approx 0.262$

» In correlation analysis, we treat any two variables symmetrically: there is no distinction between the dependent and explanatory variables.
» The correlation between scores on mathematics and statistics examinations is the same as that between scores on statistics and mathematics.
» Moreover, both variables are assumed to be random.

Types of correlation coefficient
There are correlation procedures for all four data types (nominal, ordinal, interval, and ratio).
A. Pearson's Product Moment Correlation Coefficient ($r_{xy}$) computes the correlation between two interval or ratio variables.
   » It requires cardinal numbers and a linear relationship between the variables.
B. Spearman's rho ($r_s$) computes the correlation between two ordinal, or ranked, variables.
C. The Contingency Coefficient (C) and Cramér's phi ($\phi_C$) compute the strength of the relationship when testing nominal data analyzed by a chi-square procedure.
D. The Phi Coefficient ($r_\phi$) computes the correlation between two dichotomous variables (two and only two categories).

» Additionally, a study may require the computation of a correlation coefficient between mixed data types.
A. Point biserial is used when one variable is interval/ratio and the second is dichotomous.
B. Rank biserial is used when one variable is ordinal and the second is dichotomous.
C. Kendall's tau correlation coefficient is a coefficient that represents the degree of concordance between two columns of ranked data.

Pearson Product Moment Correlation Coefficient
» The most popular correlation coefficient is the product moment correlation coefficient, better known as Pearson's r.
» Pearson's r is used to determine the correlation between two variables under three conditions:
  1. Both variables must be interval or ratio measures (e.g., attitude scales, test scores).
  2. The relationship between the two variables must be linear: the data points must generally fall along a straight line.
  3. Both variables are normally distributed. A skewed distribution tends to produce a smaller r than a normal distribution.
» Therefore, the Pearson product moment correlation coefficient is a widely used statistical method for obtaining an index of the relationship between two variables when that relationship is linear and the two variables are continuous.
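As an illustration of Pearson's r, the cigarette and cotinine data of Example 3 can be run through the computational (raw-sums) formula and cross-checked against NumPy. The data vectors below are a reconstruction of the Example 3 table; they were checked against the sums ΣX = 175, ΣY = 2102.46 and ΣXY = 37111.51 given in the example, but should still be treated as an assumption. The function name `pearson_from_sums` is hypothetical.

```python
import math

import numpy as np


def pearson_from_sums(n, sum_x, sum_y, sum_xy, sum_x2, sum_y2):
    """Pearson's r from raw sums: (n*Sxy - Sx*Sy) / sqrt((n*Sxx - Sx^2)(n*Syy - Sy^2))."""
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


# Reconstructed Example 3 data (assumption; see the note above).
cigarettes = np.array([60, 10, 4, 15, 10, 1, 20, 8, 7, 10, 10, 20], dtype=float)
cotinine = np.array([179, 283, 75.6, 174, 209, 9.51, 350, 1.85, 43.4, 25.1, 408, 344])

r_sums = pearson_from_sums(
    n=len(cigarettes),
    sum_x=cigarettes.sum(),
    sum_y=cotinine.sum(),
    sum_xy=(cigarettes * cotinine).sum(),
    sum_x2=(cigarettes ** 2).sum(),
    sum_y2=(cotinine ** 2).sum(),
)

# Cross-check with NumPy's correlation matrix.
r_numpy = np.corrcoef(cigarettes, cotinine)[0, 1]
print(f"r from sums = {r_sums:.3f}, r from numpy = {r_numpy:.3f}")
```

The raw-sums form is convenient for hand calculation from a table of totals; library routines work from deviations and give the same value.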
» The Pearson product moment correlation coefficient is calculated with the same formula stated above:

  $r = \frac{n \sum XY - \sum X \sum Y}{\sqrt{\left[n \sum X^2 - (\sum X)^2\right]\left[n \sum Y^2 - (\sum Y)^2\right]}}$

Example 4:

Tree Height (Y)   Trunk Diameter (X)   XY      Y²       X²
35                8                    280     1225     64
49                9                    441     2401     81
27                7                    189     729      49
33                6                    198     1089     36
60                13                   780     3600     169
21                7                    147     441      49
45                11                   495     2025     121
51                12                   612     2601     144
Σ = 321           Σ = 73               Σ = 3142   Σ = 14111   Σ = 713

Calculate the Pearson correlation coefficient of x and y.

Significance Test for Correlation
» Hypotheses:
  $H_0: \rho = 0$ (no correlation)
  $H_1: \rho \neq 0$ (correlation exists)
» Test statistic:

  $t = \frac{r}{\sqrt{\frac{1 - r^2}{n - 2}}}$  (with n - 2 degrees of freedom)

Test solution
» For the tree data, r = 0.886 and n = 8, so $t = 0.886 / \sqrt{(1 - 0.886^2)/6} \approx 4.68$.
» With 6 degrees of freedom, the critical values at the 5% level are ±2.4469. Do not reject $H_0$ if t lies between them; reject $H_0$ otherwise.
» Decision: Reject $H_0$, since 4.68 > 2.4469.
» Conclusion: There is evidence of a linear relationship at the 5% level of significance.

Spearman's Rank Correlation
» Spearman's rho yields a correlation coefficient between two ordinal, or ranked, variables.
» When we have qualitative data like efficiency, honesty, intelligence, etc., we calculate what is called Spearman's rank correlation coefficient as follows:
» Steps:
  i. Rank the different items in X and Y.
  ii. Find the difference of the ranks in each pair and denote it by $d_i$.
  iii. Use the formula given below.
» Occasionally we may need to determine the correlation between two variables where suitable measures of one or both variables do not exist.
» However, the variables can be ranked, and the association between the two variables can be measured by:

  $r_s = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)}$

  ❖ If $r_s$ is close to 1: strong positive association
  ❖ If $r_s$ is close to -1: strong negative association
  ❖ If $r_s$ is close to 0: no association

Example 5: Aster and Almaz were asked to rank 7 different types of lipsticks. See if there is a correlation between the tastes of the ladies.
Lipsticks   A   B   C   D   E   F   G
Aster       2   1   4   3   5   7   6
Almaz       1   3   2   4   5   6   7

» Calculate the Spearman rank correlation coefficient.
  Answer: $\sum d_i^2 = 12$, so $r_s = 1 - \frac{6(12)}{7(7^2 - 1)} \approx 0.786$.
» Generally, the Spearman correlation is used when:
  ❖ Measuring the relationship between two ordinal variables.
  ❖ Measuring the relationship between two variables that are related, but not linearly.

» The hypotheses are: $H_0: \rho_s = 0$, $H_1: \rho_s \neq 0$.
» The test statistic is

  $r_s = \frac{\mathrm{cov}(a, b)}{s_a s_b}$

  ✓ a and b are the ranks of the data.
  ✓ For a large sample (n > 30), $r_s$ is approximately normally distributed: $z = r_s \sqrt{n - 1}$.
» The two variables must be ranked in the same order, giving rank 1 either to the largest (or smallest) value, rank 2 to the second largest (or smallest) value, and so forth.
» If there are ties, we assign to each of the tied observations the mean of the ranks which they jointly occupy; thus, if the third and fourth ordered values are identical, we assign each the rank of $\frac{3 + 4}{2} = 3.5$.
» The ordinary sample correlation coefficient r can also be used to calculate the rank correlation coefficient, where x and y represent the ranks of the observations instead of their actual numerical values.

Example 6: Calculate the rank correlation coefficient.

Month   Value x   rank(x)   Value y   rank(y)   d      d²
1       1         1         1         1.5       -0.5   0.25
2       2         2         1         1.5       0.5    0.25
3       3         3         2         3.5       -0.5   0.25
4       4         4         2         3.5       0.5    0.25
Σ                                               0      1.00

Example 7: Calculate the Spearman's rank correlation, $r_s$, between x and y for the following data:
[Table with columns y, rank(y), x, rank(x), (rank(y) - rank(x))²; the entries are garbled in the source. Values as extracted: 52 10 54 14 47 6 42 8 49 6 4 8 38 50 49 8.]
a. Calculate the coefficient of correlation between x and y.
b. Calculate the Spearman's rank correlation between x and y.
c. Compare your results from parts a and b.

Kendall's Tau Correlation
Kendall's tau correlation is a coefficient that represents the degree of concordance between two columns of ranked data.
» The greater the number of "inversions", the smaller the coefficient will be.
» It ranges from -1 to 1.
» We cannot square the correlation to get the coefficient of determination.
» It is a non-parametric test of correlation that can be used as an alternative to Spearman.

$$\text{Kendall's } \tau = \frac{C - D}{C + D}$$
where C is the sum of concordant pairs and D is the sum of discordant pairs.
» Concordant pairs: the number of observed ranks below a particular rank which are larger than that particular rank.
» Discordant pairs: the number of observed ranks below a particular rank which are smaller in value than that particular rank.

Example 8: Calculate Kendall's tau correlation coefficient for the following ranked data.

Master | Student | C (concordance) | D (discordance)
1 | 2 | 10 | 1
2 | 1 | 10 | 0
3 | 4 | 8 | 1
4 | 3 | 8 | 0
5 | 6 | 6 | 1
6 | 5 | 6 | 0
7 | 8 | 4 | 1
8 | 7 | 4 | 0
9 | 10 | 2 | 1
10 | 9 | 2 | 0
11 | 12 | 0 | 1
12 | 11 | 0 | 0
Total | | ΣC = 60 | ΣD = 6

$$\tau = \frac{60 - 6}{60 + 6} = 0.818$$

» Statistical significance of Kendall's tau:
$$Z = \frac{3\tau\sqrt{n(n-1)}}{\sqrt{2(2n+5)}} = \frac{3(0.818)\sqrt{12(12-1)}}{\sqrt{2(2(12)+5)}} = \frac{3(0.818)(11.489)}{7.616} = 3.7019$$
» The test statistic indicates that the correlation between the master rank and the student rank is statistically significant.

Example 9: Calculate Kendall's tau correlation coefficient for the following ranked data.
[Table of 12 ranked pairs with C (concordance) and D (discordance) columns; the entries are illegible in the source apart from a column total of 45.]

» Partial correlation: allows you to describe the relationship between two variables after controlling for the influence of a third variable.
» Semi-partial correlation: allows you to assess the association between two variables with the effect of the third variable removed from only one of the variables being correlated.
» Multiple correlation: allows you to assess the correlation between one variable and a combination of other variables.
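The worked examples in this unit can be checked numerically. The following is a minimal Python sketch, not the course's own procedure: the function names `pearson_r`, `spearman_rs` and `kendall_tau` are illustrative, the Spearman function assumes there are no tied ranks, and the Kendall function counts concordant and discordant pairs in the way described above (ranks below a given rank that are larger or smaller). The data are the tree, lipstick and master/student figures from Examples in this unit.

```python
import math

def pearson_r(x, y):
    """Pearson r from raw sums: (n*Sxy - Sx*Sy) / sqrt(...)."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    return (n * sxy - sx * sy) / math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))

def spearman_rs(rank_x, rank_y):
    """Spearman r_s = 1 - 6*sum(d^2) / (n(n^2-1)); assumes no tied ranks."""
    n = len(rank_x)
    d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

def kendall_tau(rank_x, rank_y):
    """Kendall tau = (C - D) / (C + D), counting pairs below each rank."""
    second = [b for _, b in sorted(zip(rank_x, rank_y))]  # order by first ranking
    c = d = 0
    for i, r in enumerate(second):
        c += sum(1 for s in second[i + 1:] if s > r)  # concordant
        d += sum(1 for s in second[i + 1:] if s < r)  # discordant
    return (c - d) / (c + d), c, d

# Pearson example: trunk diameter (X) and tree height (Y)
diameter = [8, 9, 7, 6, 13, 7, 11, 12]
height = [35, 49, 27, 33, 60, 21, 45, 51]
r = pearson_r(diameter, height)
n = len(height)
t_stat = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)  # compare with 2.4469

# Spearman example: lipstick rankings
aster = [2, 1, 4, 3, 5, 7, 6]
almaz = [1, 3, 2, 4, 5, 6, 7]
rs = spearman_rs(aster, almaz)

# Kendall example: master and student ranks
master = list(range(1, 13))
student = [2, 1, 4, 3, 6, 5, 8, 7, 10, 9, 12, 11]
tau, c_pairs, d_pairs = kendall_tau(master, student)
m = len(master)
z_stat = 3 * tau * math.sqrt(m * (m - 1)) / math.sqrt(2 * (2 * m + 5))

print(f"Pearson r = {r:.4f}, t = {t_stat:.2f}")
print(f"Spearman r_s = {rs:.4f}")
print(f"Kendall tau = {tau:.3f} (C = {c_pairs}, D = {d_pairs}), Z = {z_stat:.2f}")
```

The Z value printed here uses the unrounded tau, so it can differ in the third decimal place from a hand calculation that rounds tau to 0.818 first.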
Unit Objectives
After the completion of this unit, you should be able to:
» Describe and understand the assumptions underlying the CLRM;
» Estimate the point estimators of the population parameters;
» Estimate the measure of goodness of fit, $r^2$;
» Describe and understand the properties of OLS estimators;
» Test the true population parameters based on sample observations.

Simple Linear Regression
» Regression analysis is concerned with the study of the dependence of one variable on one or more other variables, with a view to estimating the mean value of the former in terms of the known or fixed values of the latter.
» Regression is the estimation or prediction of the average value of a dependent variable on the basis of the fixed values of other variables.
» The simple linear regression model is the most elementary type of regression model, and can be expressed by the following equation:
$$Y_i = \beta_0 + \beta_1 X_i + u_i \tag{1.1}$$
» The simple linear regression model is also called the two-variable linear regression model or bivariate linear regression model because it relates the two variables X and Y.
» The variables Y and X have several different names that are used interchangeably, as follows:

Dependent variable | Independent variable
Explained | Explanatory
Predicted | Control
Response | Predictor
Regressand | Regressor
 | Covariate

Assumptions of Classical Linear Regression
» A1: Linear regression model
✓ The regression model is linear in the parameters.
✓ But it may or may not be linear in the variables.
» A2: X values are fixed in repeated sampling
✓ Values taken by the regressor X are considered fixed in repeated samples.
✓ More technically, X is assumed to be non-stochastic for repeated sampling.
» A3: Zero mean value of disturbance
✓ Given the value of X, the mean, or expected value, of the random disturbance term is zero.
✓ Technically, the conditional mean value of $u_i$ is zero:
$$E(u_i \mid X_i) = 0$$
» A4: Homoscedasticity or equal variance of the disturbance term
✓ Given the value of X, the variance of the disturbance term is the same for all observations.
✓ That is, the conditional variances of the disturbance term are identical:
$$\operatorname{Var}(u_i \mid X_i) = E[u_i - E(u_i \mid X_i)]^2 = E(u_i^2 \mid X_i) = \sigma^2$$
✓ The variation around the regression line is the same across the X values; it neither increases nor decreases as X varies.
» A5: No autocorrelation between the disturbances
✓ Given any two X values, $X_i$ and $X_j$ ($i \neq j$), the correlation between any two $u_i$ and $u_j$ ($i \neq j$) is zero:
$$\operatorname{cov}(u_i, u_j \mid X_i, X_j) = E[(u_i \mid X_i)(u_j \mid X_j)] = 0$$
✓ Technically, this is the assumption of no serial correlation, or no autocorrelation.
» A6: Exogeneity of the independent variables
✓ Zero covariance between $u_i$ and $X_i$:
$$\operatorname{cov}(u_i, X_i) = E(u_i X_i) = 0$$
» A7: The number of observations must be greater than the number of parameters to be estimated
✓ Alternatively, the number of observations must be greater than the number of explanatory variables.
» A8: Variability in X values
✓ The X values in a given sample must not all be the same. Technically, var(X) must be a finite positive number.
» A9: No specification error
✓ The regression model is correctly specified.
✓ Alternatively, there is no specification bias or error in the model used in empirical analysis.

Methods of Estimation
» Specifying the model and stating its underlying assumptions are the first stage of any econometric application.
» The next step is the estimation of the numerical values of the parameters of economic relationships.
» The most commonly used methods are:
✓ Ordinary least squares method (OLS)
✓ Maximum likelihood method (MLM)
✓ Method of moments (MM)
✓ Bayesian estimation technique
✓ The free hand method
✓ The semi-average method

Deriving the OLS Estimators
» To estimate the parameters of the linear regression model, the following assumptions are needed:
$$E(\varepsilon_i) = 0 \tag{1.2}$$
$$\operatorname{Cov}(X_i, \varepsilon_i) = E(X_i \varepsilon_i) = 0 \tag{1.3}$$
» The dominant and most powerful estimation method for the parameters (or regression coefficients) $\beta_0$ and $\beta_1$ is the method of least squares.
» Consider the two-variable Population Regression Function (PRF):
$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i \tag{1.4}$$
» However, the PRF is not directly observable; we estimate it from the Sample Regression Function (SRF):
$$Y_i = \hat{\beta}_0 + \hat{\beta}_1 X_i + e_i \tag{1.5}$$
$$Y_i = \hat{Y}_i + e_i \tag{1.6}$$
» Express equation (1.6) as:
$$e_i = Y_i - \hat{Y}_i$$
$$e_i = Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_i \tag{1.7}$$
» Choose the SRF so that the residuals $\sum e_i = \sum (Y_i - \hat{Y}_i)$ are as small as possible, by adopting the least-squares criterion.
» The sum of squares of the errors (SSE) is:
$$\sum e_i^2 = \sum (Y_i - \hat{Y}_i)^2 = \sum (Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_i)^2 \tag{1.8}$$
» Differentiating equation (1.8) partially with respect to the estimators and setting the results to zero, we obtain:
$$\frac{\partial \sum e_i^2}{\partial \hat{\beta}_0} = -2\sum (Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_i) = -2\sum e_i = 0$$
$$\frac{\partial \sum e_i^2}{\partial \hat{\beta}_1} = -2\sum (Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_i)X_i = -2\sum e_i X_i = 0$$
» The process of differentiation yields the following normal equations for estimating the parameters:
$$\sum Y_i = n\hat{\beta}_0 + \hat{\beta}_1 \sum X_i \tag{1.9}$$
$$\sum X_i Y_i = \hat{\beta}_0 \sum X_i + \hat{\beta}_1 \sum X_i^2 \tag{1.10}$$
» Solving the normal equations simultaneously, or using matrix algebra, we have:
$$\hat{\beta}_1 = \frac{n\sum X_i Y_i - \sum X_i \sum Y_i}{n\sum X_i^2 - (\sum X_i)^2} = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2} = \frac{\sum x_i y_i}{\sum x_i^2} = \frac{\operatorname{Cov}(X, Y)}{\operatorname{Var}(X)} \tag{1.11}$$
$$\hat{\beta}_0 = \frac{\sum X_i^2 \sum Y_i - \sum X_i \sum X_i Y_i}{n\sum X_i^2 - (\sum X_i)^2} = \bar{Y} - \hat{\beta}_1 \bar{X} \tag{1.12}$$
» The estimators obtained from equations (1.11) and (1.12) are known as the least-squares estimators, for they are derived from the least-squares principle (the ordinary least squares (OLS) estimators of the population parameters).
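The derivation above can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the function name `ols_simple` is hypothetical, and the data are the weekly income (X) and consumption (Y) figures from Example 1 of this unit, with the fifth Y value read as 110 so that the values add up to the stated total of 1,110. The sketch computes the estimators from the deviation form of (1.11) and (1.12) and then checks that the two first-order conditions hold.

```python
def ols_simple(x, y):
    """Return (b0, b1) from the deviation-form solution of the normal equations."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sum_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sum_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum_xy / sum_xx          # slope: sum(xy) / sum(x^2)
    b0 = y_bar - b1 * x_bar       # intercept: Y-bar - b1 * X-bar
    return b0, b1

# Weekly income (X) and weekly consumption expenditure (Y)
income = [80, 100, 120, 140, 160, 180, 200, 220, 240, 260]
consumption = [70, 65, 90, 95, 110, 115, 120, 140, 155, 150]

b0, b1 = ols_simple(income, consumption)

# First-order conditions: residuals sum to zero and are orthogonal to X
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(income, consumption)]
sum_e = sum(residuals)
sum_ex = sum(e * xi for e, xi in zip(residuals, income))

print(f"b0 = {b0:.4f}, b1 = {b1:.4f}")
print(f"sum(e) = {sum_e:.2e}, sum(e*X) = {sum_ex:.2e}")
```

The slope equals 16,800 / 33,000, the ratio of the deviation sums listed in Example 1, and the two residual sums are zero up to floating-point rounding.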
» The least-squares line, or the estimated regression line of Y on X, is formulated as:
$$\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i$$
» The variance of the estimator $\hat{\beta}_0$ is:
$$\operatorname{var}(\hat{\beta}_0) = \frac{\sum X_i^2}{n\sum x_i^2}\sigma^2 \tag{1.13}$$
» The standard error of the estimator $\hat{\beta}_0$ is:
$$\operatorname{se}(\hat{\beta}_0) = \sqrt{\operatorname{var}(\hat{\beta}_0)} \tag{1.14}$$
» The variance of the estimator $\hat{\beta}_1$ is:
$$\operatorname{var}(\hat{\beta}_1) = \frac{\sigma^2}{\sum x_i^2} \tag{1.15}$$
» The standard error of the estimator $\hat{\beta}_1$ is:
$$\operatorname{se}(\hat{\beta}_1) = \sqrt{\operatorname{var}(\hat{\beta}_1)} \tag{1.16}$$
» The two estimators are correlated because both are computed from the same sample:
$$\operatorname{cov}(\hat{\beta}_0, \hat{\beta}_1) = -\bar{X}\operatorname{var}(\hat{\beta}_1) = -\bar{X}\frac{\sigma^2}{\sum x_i^2} \tag{1.17}$$

Numerical Properties of OLS Estimators
» The OLS estimators are expressed solely in terms of the observable (i.e., sample) quantities.
» They are point estimators; that is, given the sample, each estimator provides only a single (point) value of the relevant population parameter.
» Once the OLS estimates are obtained from the sample data, the sample regression line can be easily obtained.

Statistical Properties of OLS Estimators
» The statistical properties of OLS estimators are contained in the well-known Gauss-Markov theorem.
» To understand this theorem, we need to consider the best linear unbiasedness property of an estimator.
» An estimator is said to be a best linear unbiased estimator (BLUE) of the population parameter if the following hold:
1. It is linear, that is, a linear function of a random variable, such as the dependent variable Y in the regression model. The fitted line passes through the sample means of Y and X.
2. It is unbiased, that is, its average or expected value is equal to the true value.
3. The mean value of the estimated Y, $\hat{Y}$, is equal to the mean value of the actual Y.
4. It has minimum variance in the class of all such linear unbiased estimators; an unbiased estimator with the least variance is known as an efficient estimator.
5. The mean value of the residuals is zero.
6. The residuals are uncorrelated with the predicted Y, $\hat{Y}_i$.
7. The residuals are uncorrelated with $X_i$.
8.
Consistency: As the sample size approaches infinity, the sampling distribution of the estimators collapses to the population parameters. This holds if the variance of the estimators approaches zero as the sample size approaches infinity. Estimators with this property are consistent estimators.
» Therefore, the BLUE properties are linearity, unbiasedness, efficiency, and consistency.
» To estimate the variance of the OLS estimators, the variance of the disturbance term is needed. The variance of the population disturbance term, $\sigma^2$, is unknown in most cases, so it is estimated from the residuals:
$$\hat{\sigma}^2 = \frac{\sum e_i^2}{n-2} \tag{1.18}$$
» $\hat{\sigma}^2$ is an unbiased estimator of $\sigma^2$.

» Example 1: Hypothetical data on weekly family consumption expenditure, Y, and weekly income, X.

Y | 70 | 65 | 90 | 95 | 110 | 115 | 120 | 140 | 155 | 150
X | 80 | 100 | 120 | 140 | 160 | 180 | 200 | 220 | 240 | 260

$\sum X_i = 1{,}700$, $\sum Y_i = 1{,}110$, $\sum X_i Y_i = 205{,}500$, $\sum X_i^2 = 322{,}000$, $\sum Y_i^2 = 132{,}100$, $\sum X_i \sum Y_i = 1{,}887{,}000$
$\sum x_i^2 = \sum (X_i - \bar{X})^2 = 33{,}000$, $\sum y_i^2 = \sum (Y_i - \bar{Y})^2 = 8{,}890$, $\sum x_i y_i = \sum (X_i - \bar{X})(Y_i - \bar{Y}) = 16{,}800$
» Estimate the parameters from the above data.

» Example: Estimate the regression equation for the following data, interpret the results, and formulate the regression line equation.

Obs. | $Y_i$ (Consumption) | $X_i$ (Disposable Income)
1 | 102 | 114
2 | 106 | 118
3 | 108 | 126
4 | 110 | 130
5 | 122 | 136
6 | 124 | 140
7 | 128 | 148
8 | 130 | 156
9 | 142 | 160
10 | 148 | 164
11 | 150 | 170
12 | 154 | 178

Goodness of Fit
» The coefficient of determination is a measure in simple linear regression analysis that shows the explanatory power of the independent variables (regressors) in explaining the variation in the dependent variable (regressand).
» The total variation in the dependent variable can be decomposed as follows:
$$\sum (Y_i - \bar{Y})^2 = \sum (\hat{Y}_i - \bar{Y})^2 + \sum (Y_i - \hat{Y}_i)^2, \quad \text{i.e.} \quad \sum y_i^2 = \sum \hat{y}_i^2 + \sum e_i^2 \tag{1.19}$$
» For n observations and k explanatory variables, the total variation in the dependent variable can be decomposed into explained and unexplained variation.
$$TSS_{n-1} = SSR_{k-1} + SSE_{n-k} \tag{1.20}$$
» where:
✓ $TSS_{n-1}$ = total sum of squared deviations, with n-1 degrees of freedom;
✓ $SSR_{k-1}$ = sum of squared deviations explained by the regression (explained variation), with k-1 degrees of freedom;
✓ $SSE_{n-k}$ = sum of squared residuals (unexplained variation), with n-k degrees of freedom;
✓ k is the number of parameters estimated.
» Writing ESS for the explained sum of squares (SSR above) and RSS for the residual sum of squares (SSE above), and dividing both sides by TSS, we obtain:
$$1 = \frac{ESS}{TSS} + \frac{RSS}{TSS} = \frac{\sum \hat{y}_i^2}{\sum y_i^2} + \frac{\sum e_i^2}{\sum y_i^2}$$
$$R^2 = \frac{ESS}{TSS} = 1 - \frac{RSS}{TSS} \tag{1.21}$$
» $R^2$ is the ratio of the explained variation to the total variation, or the fraction of the sample variation in Y that is explained by X.
» The coefficient of determination can be characterized in a number of ways:
✓ It is defined in terms of variation about the mean of Y, so if a model is re-parameterized (rearranged) and the dependent variable changes, $R^2$ will change.
✓ It never falls if more regressors are added to the regression.

Residual Analysis
» We need to check the assumptions of regression about the errors by examining the residuals:
✓ Examine the linearity assumption;
✓ Examine constant variance for all levels of X (homoscedasticity);
✓ Evaluate the normal distribution assumption; and
✓ Evaluate the independence assumption.

Basics of Hypothesis Testing
» Hypotheses are about the population parameter, not the estimator.
» There are various tests:
1. F-test: tests overall model adequacy, i.e. whether the fitted model does better than the null model;
2. Log-likelihood test: tests the joint significance of variables in a maximum likelihood estimator (MLE) model; or
3. t-test: tests that individual coefficients are not zero.
» Hypothesis testing procedures:
1. Formulate the null and alternative hypotheses; the alternative depends on whether the test is one- or two-tailed. Example: $H_0: \beta_1 = 0$; $H_1: \beta_1 \neq 0$ (two-sided test).
2. Specify the test statistic and the appropriate distribution:
$$t = \frac{\hat{\beta}_1}{\operatorname{se}(\hat{\beta}_1)} \sim t_{n-2} \tag{1.22}$$
3. Choose the rejection region: $\alpha$.
4. Calculate the test statistic.
5.
Reject or fail to reject the null hypothesis:
✓ if $|t| > t_{\alpha/2}$, reject the null hypothesis (two-sided);
✓ if $|t| > t_{\alpha}$, reject the null hypothesis (one-sided).
6. State the conclusion.

» The theory of hypothesis testing is concerned with developing rules or procedures for deciding whether to reject or not reject the null hypothesis.
» There are two mutually complementary approaches for devising such rules:
✓ confidence interval
✓ test of significance
» Both approaches predicate that the variable (statistic or estimator) under consideration has some probability distribution.
» Hypothesis testing involves making statements or assertions about the value(s) of the parameter(s) of such a distribution.

The Confidence-Interval Approach
» This is an approach to testing the significance of the parameter.
» It uses a one-sided (one-tail) or two-sided (two-tail) test to construct the confidence interval of the estimators.
» Example: the confidence interval of an estimator under a two-tail test is estimated by the following formula:
$$CI(\beta_1) = \hat{\beta}_1 \pm t_{\alpha/2}\operatorname{se}(\hat{\beta}_1) \tag{1.23}$$
» If the population parameter value stated under the null hypothesis falls within the confidence interval, we do not reject the null hypothesis; if it lies outside the interval, we may reject it.
» In statistics:
✓ when we reject the null hypothesis, we say that our finding is statistically significant;
✓ when we do not reject the null hypothesis, we say that our finding is not statistically significant.

The Test-of-Significance Approach
» An alternative but complementary approach to the confidence-interval method of testing statistical hypotheses is the test-of-significance approach.
» A test of significance is a procedure by which sample results are used to verify the truth or falsity of a null hypothesis.
» The key idea behind tests of significance is that of a test statistic (estimator) and the sampling distribution of such a statistic under the null hypothesis.
» The decision to accept or reject the null hypothesis is made on the basis of the value of the test statistic computed from the sample data.
» The t-test statistic is calculated using the following formula, where the second form applies under $H_0: \beta_1 = 0$:
$$t = \frac{\hat{\beta}_1 - \beta_1}{\operatorname{se}(\hat{\beta}_1)} = \frac{\hat{\beta}_1}{\operatorname{se}(\hat{\beta}_1)} \tag{1.24}$$
» In the language of significance tests, a statistic is said to be statistically significant if the absolute value of the test statistic is greater than the critical (tabulated) value.
» A test is said to be statistically insignificant if the absolute value of the test statistic is less than the tabulated value.

Unit Objectives
After the completion of this unit, you should be able to:
» Describe the assumptions of the CLRM;
» Estimate the point estimators of the population parameters in three-variable regression models;
» Estimate $R^2$ in three-variable regression models;
» Estimate the interval estimates of the population parameters in three-variable regression models;
» Test the true population parameters based on sample observations; and
» Test the model adequacy of the regression using the F-test.

Point Estimation
» Multiple regression models are models in which the dependent variable, or regressand, Y depends on two or more explanatory variables, or regressors.
» Multiple regression is a statistical tool that allows you to examine how multiple independent variables are related to a dependent variable.
» It is more amenable to ceteris paribus analysis because:
✓ It allows us to explicitly control for many other factors that simultaneously affect the dependent variable;
✓ It is important for testing economic theories;
✓ It is important for evaluating policy effects when we must rely on non-experimental data.
» The model for multiple linear regression is given by:
$$Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + \dots + \beta_k X_{ki} + u_i$$
» The simplest possible multiple regression model is the three-variable regression, with one dependent variable and two explanatory variables:
$$Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + u_i$$

Assumptions of the CLRM
» Linear in parameters.
» Values of $X_i$ are fixed in repeated sampling.
» Zero mean value of $u_i$, or $E(u_i \mid X_{2i}, X_{3i}) = 0$ for each i.
» No serial correlation, or $\operatorname{cov}(u_i, u_j) = 0$ for $i \neq j$.
» Homoscedasticity, or $\operatorname{var}(u_i) = \sigma^2$.
» The number of observations must be greater than the number of variables.
» Zero covariance between $u_i$ and each X variable, or $\operatorname{cov}(u_i, X_{2i}) = \operatorname{cov}(u_i, X_{3i}) = 0$.
» The X values in a given sample must not all be the same.
» No specification bias, or the model is correctly specified.
» No exact collinearity between the X variables, or no exact linear relationship between $X_2$ and $X_3$.

» No collinearity means that there is no exact linear relationship between the regressors; the term multicollinearity is used when more than one exact linear relationship is involved.
» Informally, no collinearity means none of the regressors can be written as an exact linear combination of the remaining regressors in the model.
» Formally, no collinearity means that there exists no set of numbers, $\lambda_2$ and $\lambda_3$, not both zero, such that
$$\lambda_2 X_{2i} + \lambda_3 X_{3i} = 0 \tag{4.1}$$
» If such an exact linear relationship exists, then $X_2$ and $X_3$ are said to be collinear or linearly dependent.
» On the other hand, if (4.1) holds true only when $\lambda_2 = \lambda_3 = 0$, then $X_2$ and $X_3$ are said to be linearly independent.
» Thus, if $X_{2i} = -4X_{3i}$, the two variables are linearly dependent, and if both are included in a regression model, we will have perfect collinearity, or an exact linear relationship between the two regressors.
» Suppose that in the PRF, $Y$, $X_2$ and $X_3$ represent consumption expenditure, income, and wealth of the consumer, respectively.
» In postulating that consumption expenditure is linearly related to income and wealth, economic theory presumes that wealth and income may have some independent influence on consumption.
» If there is an exact linear relationship between income and wealth, there is no way to assess the separate influence of income and wealth on consumption.
» Let $X_{3i} = 2X_{2i}$ in the consumption-income-wealth regression.
✓ Then the PRF becomes
$$Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 (2X_{2i}) + u_i = \beta_1 + (\beta_2 + 2\beta_3)X_{2i} + u_i = \beta_1 + \alpha X_{2i} + u_i$$
where $\alpha = \beta_2 + 2\beta_3$, so only $\alpha$ can be estimated, not $\beta_2$ and $\beta_3$ separately.
✓ The assumption of no multicollinearity requires that in the PRF we include only those variables that are not exact linear functions of one or more variables in the model.
✓ First, the assumption of no multicollinearity pertains to our theoretical (i.e., PRF) model.
✓ Second, keep in mind that we are talking only about perfect linear relationships between two or more variables.

» Multiple regression analysis is regression analysis conditional upon the fixed values of the regressors:
$$E(Y_i \mid X_{2i}, X_{3i}) = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i}$$
» The meaning of a partial regression coefficient is as follows:
✓ $\beta_2$ measures the change in the mean value of Y, $E(Y)$, per unit change in $X_2$, holding the value of $X_3$ constant.

Table 4.1 Fertility, Literacy and GDP for 64 Countries
» Recall that in the example, Y = child mortality (CM), $X_2$ = per capita GNP (PGNP), and $X_3$ = female literacy rate (FLR).
» Let us suppose we want to hold the influence of FLR constant.
✓ FLR may have some effect on CM as well as on PGNP in any given concrete data.
✓ What we can do is remove the (linear) influence of FLR from both CM and PGNP by running the regression of CM on FLR and that of PGNP on FLR separately, and then looking at the residuals obtained from these regressions.
$$CM_i = 263.8635 - 2.3905\,FLR_i + \hat{u}_{1i}$$
se = (12.2249) (0.2133), $r^2 = 0.6695$
$$PGNP_i = -39.3033 + 28.1427\,FLR_i + \hat{u}_{2i}$$
se = (734.9526) (12.8211)
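» After obtaining $\hat{u}_{1i}$ and $\hat{u}_{2i}$, regress $\hat{u}_{1i}$ on $\hat{u}_{2i}$; both are "purified" of the linear influence of FLR.
» The regression results are as follows:
$$\hat{u}_{1i} = -0.0056\,\hat{u}_{2i}$$
se = (0.0019), $r^2 = 0.1152$
» This regression has no intercept term because the mean value of the OLS residuals $\hat{u}_{1i}$ and $\hat{u}_{2i}$ is zero.
» The slope coefficient of -0.0056 now gives the "true" or net effect of a unit change in PGNP on CM.
» It gives the partial regression coefficient of CM with respect to PGNP.
» To get the partial regression coefficient of CM with respect to FLR, replicate the above procedure: first regress CM on PGNP and obtain the residuals from this regression ($\hat{u}_{1i}$), then regress FLR on PGNP and obtain the residuals from this regression ($\hat{u}_{2i}$), and then regress $\hat{u}_{1i}$ on $\hat{u}_{2i}$.
» This multi-step procedure is lengthy and is not a quick way to obtain the partial regression coefficients.
» To avoid the multi-step procedure, the OLS procedure is applied directly, which yields the partial regression coefficients quickly and routinely.

» Consider the sample regression function (SRF) corresponding to the PRF ($Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + u_i$):
$$Y_i = \hat{\beta}_1 + \hat{\beta}_2 X_{2i} + \hat{\beta}_3 X_{3i} + \hat{u}_i$$
» The OLS procedure consists in choosing the values of the unknown parameters so that the residual sum of squares (RSS), $\sum \hat{u}_i^2$, is as small as possible:
$$\min \sum \hat{u}_i^2 = \sum (Y_i - \hat{\beta}_1 - \hat{\beta}_2 X_{2i} - \hat{\beta}_3 X_{3i})^2$$
» Differentiate partially with respect to the three unknowns and set the resulting equations to zero:
$$\frac{\partial \sum \hat{u}_i^2}{\partial \hat{\beta}_1} = 2\sum (Y_i - \hat{\beta}_1 - \hat{\beta}_2 X_{2i} - \hat{\beta}_3 X_{3i})(-1) = 0$$
$$\frac{\partial \sum \hat{u}_i^2}{\partial \hat{\beta}_2} = 2\sum (Y_i - \hat{\beta}_1 - \hat{\beta}_2 X_{2i} - \hat{\beta}_3 X_{3i})(-X_{2i}) = 0$$
$$\frac{\partial \sum \hat{u}_i^2}{\partial \hat{\beta}_3} = 2\sum (Y_i - \hat{\beta}_1 - \hat{\beta}_2 X_{2i} - \hat{\beta}_3 X_{3i})(-X_{3i}) = 0$$
» Simplifying these, we obtain:
$$\bar{Y} = \hat{\beta}_1 + \hat{\beta}_2 \bar{X}_2 + \hat{\beta}_3 \bar{X}_3 \qquad \text{(eq. 1)}$$
$$\sum Y_i X_{2i} = \hat{\beta}_1 \sum X_{2i} + \hat{\beta}_2 \sum X_{2i}^2 + \hat{\beta}_3 \sum X_{2i} X_{3i} \qquad \text{(eq. 2)}$$
$$\sum Y_i X_{3i} = \hat{\beta}_1 \sum X_{3i} + \hat{\beta}_2 \sum X_{2i} X_{3i} + \hat{\beta}_3 \sum X_{3i}^2 \qquad \text{(eq. 3)}$$
» Substituting eq. 1 into eq. 2 and eq. 3, we obtain:
$$\sum Y_i X_{2i} - \bar{Y}\sum X_{2i} = \hat{\beta}_2 \left(\sum X_{2i}^2 - \bar{X}_2 \sum X_{2i}\right) + \hat{\beta}_3 \left(\sum X_{2i} X_{3i} - \bar{X}_3 \sum X_{2i}\right)$$
$$\sum Y_i X_{3i} - \bar{Y}\sum X_{3i} = \hat{\beta}_2 \left(\sum X_{2i} X_{3i} - \bar{X}_2 \sum X_{3i}\right) + \hat{\beta}_3 \left(\sum X_{3i}^2 - \bar{X}_3 \sum X_{3i}\right)$$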
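» The above equations can be written in deviation form as follows:
$$\sum y_i x_{2i} = \hat{\beta}_2 \sum x_{2i}^2 + \hat{\beta}_3 \sum x_{2i} x_{3i}$$
$$\sum y_i x_{3i} = \hat{\beta}_2 \sum x_{2i} x_{3i} + \hat{\beta}_3 \sum x_{3i}^2$$
» In matrix notation this can be written as:
$$\begin{bmatrix} \sum x_{2i}^2 & \sum x_{2i} x_{3i} \\ \sum x_{2i} x_{3i} & \sum x_{3i}^2 \end{bmatrix} \begin{bmatrix} \hat{\beta}_2 \\ \hat{\beta}_3 \end{bmatrix} = \begin{bmatrix} \sum y_i x_{2i} \\ \sum y_i x_{3i} \end{bmatrix}$$
» Solving simultaneously for $\hat{\beta}_2$ and $\hat{\beta}_3$ gives:
$$\hat{\beta}_2 = \frac{(\sum y_i x_{2i})(\sum x_{3i}^2) - (\sum y_i x_{3i})(\sum x_{2i} x_{3i})}{(\sum x_{2i}^2)(\sum x_{3i}^2) - (\sum x_{2i} x_{3i})^2}$$
$$\hat{\beta}_3 = \frac{(\sum y_i x_{3i})(\sum x_{2i}^2) - (\sum y_i x_{2i})(\sum x_{2i} x_{3i})}{(\sum x_{2i}^2)(\sum x_{3i}^2) - (\sum x_{2i} x_{3i})^2}$$
» $\hat{\beta}_1$ can be computed from the first normal equation as follows:
$$\hat{\beta}_1 = \bar{Y} - \hat{\beta}_2 \bar{X}_2 - \hat{\beta}_3 \bar{X}_3$$
» The variances of the estimators are:
$$\operatorname{var}(\hat{\beta}_1) = \left[\frac{1}{n} + \frac{\bar{X}_2^2 \sum x_{3i}^2 + \bar{X}_3^2 \sum x_{2i}^2 - 2\bar{X}_2 \bar{X}_3 \sum x_{2i} x_{3i}}{\sum x_{2i}^2 \sum x_{3i}^2 - (\sum x_{2i} x_{3i})^2}\right]\sigma^2$$
$$\operatorname{var}(\hat{\beta}_2) = \frac{\sum x_{3i}^2}{(\sum x_{2i}^2)(\sum x_{3i}^2) - (\sum x_{2i} x_{3i})^2}\sigma^2 = \frac{\sigma^2}{\sum x_{2i}^2 (1 - r_{23}^2)}$$
$$\operatorname{var}(\hat{\beta}_3) = \frac{\sum x_{2i}^2}{(\sum x_{2i}^2)(\sum x_{3i}^2) - (\sum x_{2i} x_{3i})^2}\sigma^2 = \frac{\sigma^2}{\sum x_{3i}^2 (1 - r_{23}^2)}$$
» The standard error of estimate is a single summary number that tells you how accurate your predictions are likely to be when you perform linear regression. The standard error of each estimator is the square root of its variance: $\operatorname{se}(\hat{\beta}_j) = \sqrt{\operatorname{var}(\hat{\beta}_j)}$.
» In all these formulas, $\sigma^2$ is the homoscedastic variance of the population disturbances $u_i$, estimated by:
$$\hat{\sigma}^2 = \frac{\sum \hat{u}_i^2}{n-3}$$
» To obtain the OLS estimators of the k-variable linear regression model, we proceed analogously. Thus, we first write
$$\sum \hat{u}_i^2 = \sum (Y_i - \hat{\beta}_1 - \hat{\beta}_2 X_{2i} - \dots - \hat{\beta}_k X_{ki})^2$$
» Differentiating this expression partially with respect to each of the k unknowns and setting the resulting equations equal to zero gives the k normal equations.

Example 1: From the data given in Table 4.1, estimate the estimators of the population parameters, and the variances and standard errors of the OLS estimators.
» Summary statistics:
$\sum X_{2i} = 89{,}570$, $\sum X_{2i}^2 = 593{,}450{,}100$, $\sum X_{3i} = 3{,}273$, $\sum X_{3i}^2 = 209{,}833$, $\sum Y_i = 9{,}056$, $\sum Y_i^2 = 1{,}645{,}102$
$\sum X_{2i} X_{3i} = 5{,}777{,}480$, $\sum Y_i X_{2i} = 7{,}359{,}990$, $\sum Y_i X_{3i} = 361{,}398$

Multiple Coefficient of Determination
» The coefficient of multiple determination, $R^2$, is defined as the proportion of the total variation in Y "explained" by the multiple regression of Y on $X_2$ and $X_3$, and it can be calculated by:
$$R^2 = \frac{ESS}{TSS} = \frac{\hat{\beta}_2 \sum y_i x_{2i} + \hat{\beta}_3 \sum y_i x_{3i}}{\sum y_i^2}$$
» It measures the goodness of fit of the regression equation.
» It is a single summary number that tells you how much of the variation in the dependent variable is directly related to variation in the explanatory variables.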
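The estimator and $R^2$ formulas above can be sketched in Python from summary statistics alone. This is a hedged illustration, not the course's worked solution: it assumes the Example 1 figures are read as $\sum X_2 = 89{,}570$, $\sum X_2^2 = 593{,}450{,}100$, $\sum X_3 = 3{,}273$, $\sum X_3^2 = 209{,}833$, $\sum Y = 9{,}056$, $\sum Y^2 = 1{,}645{,}102$, $\sum X_2 X_3 = 5{,}777{,}480$, $\sum Y X_2 = 7{,}359{,}990$ and $\sum Y X_3 = 361{,}398$ with n = 64, which is an interpretation of the printed values. All variable names are illustrative.

```python
# Assumed reading of the Example 1 summary statistics (n = 64 countries)
n = 64
sum_x2, sum_x2_sq = 89_570, 593_450_100
sum_x3, sum_x3_sq = 3_273, 209_833
sum_y, sum_y_sq = 9_056, 1_645_102
sum_x2x3 = 5_777_480
sum_yx2 = 7_359_990
sum_yx3 = 361_398

# Sample means
y_bar, x2_bar, x3_bar = sum_y / n, sum_x2 / n, sum_x3 / n

# Convert raw sums to deviation-form sums, e.g. sum(x2^2) = sum(X2^2) - (sum X2)^2 / n
s22 = sum_x2_sq - sum_x2 ** 2 / n
s33 = sum_x3_sq - sum_x3 ** 2 / n
syy = sum_y_sq - sum_y ** 2 / n
s23 = sum_x2x3 - sum_x2 * sum_x3 / n
sy2 = sum_yx2 - sum_y * sum_x2 / n
sy3 = sum_yx3 - sum_y * sum_x3 / n

# Solve the two deviation-form normal equations
den = s22 * s33 - s23 ** 2
b2 = (sy2 * s33 - sy3 * s23) / den
b3 = (sy3 * s22 - sy2 * s23) / den
b1 = y_bar - b2 * x2_bar - b3 * x3_bar

# Coefficient of multiple determination: ESS / TSS
r2 = (b2 * sy2 + b3 * sy3) / syy

print(f"b1 = {b1:.4f}, b2 = {b2:.4f}, b3 = {b3:.4f}, R^2 = {r2:.4f}")
```

Under this reading the PGNP coefficient comes out at about -0.0056, in line with the partial regression coefficient reported earlier, and $R^2$ is close to 0.71.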
» The coefficient of determination can be characterized in a number of ways:
✓ $R^2$ is defined in terms of variation about the mean of Y, so if a model is reparameterized (rearranged) and the dependent variable changes, $R^2$ will change.
✓ $R^2$ never falls if more regressors are added to the regression.
✓ $R^2$ quite often takes on values of 0.9 or higher for time series regressions.
» Example: Determine $R^2$ for our child mortality regression model.
» About 70.78% of the total variation in the dependent variable, child mortality, is explained by the two explanatory variables, GNP and the female literacy rate.

$R^2$ and Adjusted $R^2$
» An important property of $R^2$ is that it is a non-decreasing function of the number of explanatory variables or regressors present in the model; as the number of regressors increases, $R^2$ almost invariably increases and never decreases.
$$R^2 = \frac{ESS}{TSS} = 1 - \frac{RSS}{TSS} = 1 - \frac{\sum \hat{u}_i^2}{\sum y_i^2}$$
» $\sum y_i^2$ is independent of the number of X variables in the model because it is simply $\sum (Y_i - \bar{Y})^2$.
» However, $\sum \hat{u}_i^2$ depends on the number of regressors present in the model.
» As the number of X variables increases, $\sum \hat{u}_i^2$ is likely to decrease (at least it will not increase); hence $R^2$ will increase.
» To compare two $R^2$ terms, one must take into account the number of X variables present in the model.
» This can be done readily if we consider an alternative coefficient of determination:
$$\bar{R}^2 = 1 - \frac{\sum \hat{u}_i^2 / (n-k)}{\sum y_i^2 / (n-1)} = 1 - (1 - R^2)\frac{n-1}{n-k}$$
» The term adjusted means adjusted for the df associated with the sums of squares (residual and total).
» For $k > 1$, $\bar{R}^2 < R^2$, which implies that as the number of X variables increases, the adjusted $R^2$ increases less than the unadjusted $R^2$.
» If $\bar{R}^2$ turns out to be negative in an application, its value is taken as zero.
» Example: Suppose a regression model has three explanatory variables and an intercept term. From 64 observations, $R^2$ was found to be 0.7077.
Then what would be the value of the adjusted $R^2$?
$$\bar{R}^2 = 1 - (1 - R^2)\frac{64-1}{64-4} = 1 - (1 - 0.7077)\frac{63}{60} = 1 - 0.2923 \times 1.05 = 1 - 0.306915 = 0.693085$$

Interval Estimation and Hypothesis Testing
» In interval estimation, unlike point estimation, we estimate the parameters as an interval at some probability level.
» In hypothesis testing there are two approaches to test the validity of population parameters:
✓ Interval
✓ Test of significance

Interval Estimation
» If our sole objective is point estimation of the parameters of the regression models, the method of ordinary least squares (OLS), which does not make any assumption about the probability distribution of the disturbances, is sufficient.
» But if our objective is estimation as well as inference, we need to assume that the disturbances follow some probability distribution.
» In multiple regression models the disturbances are assumed to follow the normal distribution with zero mean and constant variance.
» With the normality assumption, the OLS estimators of the partial regression coefficients are best linear unbiased estimators (BLUE).
» Moreover, the OLS estimators are themselves normally distributed, with means equal to the true population parameters and constant variance.
» The t distribution can be used to establish confidence intervals as well as to test statistical hypotheses about the true population partial regression coefficients.
» Similarly, the $\chi^2$ distribution can be used to test hypotheses about the true $\sigma^2$.
» The 95% confidence interval estimates for the true population parameters are as follows.
» The 95% confidence limits (interval) for $\beta_2$:
$$\text{Confidence interval of } \beta_2 = \hat{\beta}_2 \pm t_{\alpha/2} \times \operatorname{se}(\hat{\beta}_2)$$
» The 95% confidence limits (interval) for $\beta_3$:
$$\text{Confidence interval of } \beta_3 = \hat{\beta}_3 \pm t_{\alpha/2} \times \operatorname{se}(\hat{\beta}_3)$$
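The adjusted $R^2$ arithmetic and the confidence-interval formula above can be checked with a few lines of Python. This is a minimal sketch under stated assumptions: the function names are illustrative, the PGNP coefficient (-0.0056) and its standard error (0.0020) are the rounded figures used in this unit's t-test, and the critical value 2.0 is the two-tail 5% value quoted for 61 degrees of freedom.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 = 1 - (1 - R^2)(n - 1)/(n - k), k = parameters incl. intercept."""
    return 1 - (1 - r2) * (n - 1) / (n - k)

def confidence_interval(estimate, std_error, t_crit):
    """Two-sided interval: estimate +/- t_crit * se."""
    margin = t_crit * std_error
    return estimate - margin, estimate + margin

# Adjusted R^2 example: 64 observations, three regressors plus an intercept
adj = adjusted_r2(0.7077, n=64, k=4)

# 95% interval for the PGNP coefficient (rounded inputs, assumed from the text)
lower, upper = confidence_interval(-0.0056, 0.0020, t_crit=2.0)

print(f"adjusted R^2 = {adj:.6f}")
print(f"95% CI for the PGNP coefficient: ({lower:.4f}, {upper:.4f})")
```

Keeping the critical value as an explicit argument makes the dependence on the chosen significance level and degrees of freedom visible, rather than hiding it inside the function.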
» Example: Establish the 95% confidence intervals for the true population coefficients of GNP and the female literacy rate (FLR) in our child mortality (CM) population regression function.
» For the coefficient of GNP, the interval is (-0.0096, -0.0016). If 100 samples of size 64 are drawn and an interval is constructed from each, about 95 of these intervals will contain the true coefficient.
» For the coefficient of FLR, the interval is (-2.6583, -1.8171), with the same interpretation.

Hypothesis Testing
» There are different hypothesis tests for different situations. We will consider two of them:
1. Testing hypotheses about an individual partial regression coefficient
2. Testing the overall significance of the estimated multiple regression model

Tests of Significance of Parameter Estimates
» To test the statistical significance of the parameter estimates of the multiple regression, the partial regression coefficients and the variances of the estimates are required.
» The hypothesis testing procedure is as follows:
✓ Formulate the null and alternative hypotheses:
$$H_0: \beta_2 = 0 \quad \text{and} \quad H_1: \beta_2 \neq 0$$
The null hypothesis states that, with $X_3$ (female literacy rate) held constant, $X_2$ (PGNP) has no (linear) influence on Y (child mortality).
✓ Specify the test statistic and the appropriate distribution. To test the null hypothesis, we use the t test, which is the appropriate test for individual partial regression coefficients:
$$t = \frac{\hat{\beta}_2}{\operatorname{se}(\hat{\beta}_2)}$$
✓ Choose the rejection region: $\alpha$. The most commonly used level of significance is 5% (a 95% confidence level).
✓ Calculate the test statistic.
$$t = \frac{\hat{\beta}_2}{\operatorname{se}(\hat{\beta}_2)} = \frac{-0.0056}{0.0020} = -2.8187$$
(The value -2.8187 is obtained from the unrounded estimates; the rounded figures shown give -2.8.)
✓ Reject or fail to reject the null hypothesis:
» if the absolute value of the computed t exceeds the critical t value at the chosen level of significance, we may reject the null hypothesis;
» if the absolute value of the computed t is less than the critical t value, we may not reject the null hypothesis.
» The sample has 64 observations and three parameters are estimated. Therefore, the degrees of freedom are 61.
» At 61 df and the 5% level of significance, the critical value of the t test is:
✓ 2.0 for a two-tail test
✓ 1.671 for a one-tail test
» For our example, the alternative hypothesis is two-sided. Therefore, we use the two-tail t value.
» Since the computed t value of 2.8187 (in absolute terms) exceeds the critical t value of 2, we can reject the null hypothesis that PGNP has no effect on child mortality.
✓ State the conclusion: the coefficient of PGNP is different from zero. That is, per capita GNP has a significant (negative) effect on child mortality, with the female literacy rate held constant.
✓ The same is true for the effect of the female literacy rate on child mortality, with per capita GNP held constant.

Testing the Overall Significance of the Regression
» Under the separate hypotheses above, each true population partial regression coefficient was tested individually against zero. Now consider the following hypothesis:
$$H_0: \beta_2 = \beta_3 = 0$$
» This null hypothesis is a joint hypothesis that $\beta_2$ and $\beta_3$ are jointly or simultaneously equal to zero.
» A test of such a hypothesis is called a test of the overall significance of the observed or estimated regression line.
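The two tests described above can be sketched in Python as follows. This is a hedged illustration only. The t statistic uses the rounded coefficient and standard error quoted above, so it gives -2.8 rather than -2.8187. The F statistic uses the standard textbook form $F = \frac{R^2/(k-1)}{(1-R^2)/(n-k)}$, which is not stated in the text above and is an assumption here, with $R^2 = 0.7077$, n = 64 and k = 3 parameters for the child mortality model. Function names are illustrative.

```python
def t_statistic(estimate, std_error, hypothesized=0.0):
    """t = (estimate - hypothesized value) / standard error."""
    return (estimate - hypothesized) / std_error

def overall_f(r2, n, k):
    """Assumed standard form: F = [R^2/(k-1)] / [(1-R^2)/(n-k)], k = parameters."""
    return (r2 / (k - 1)) / ((1 - r2) / (n - k))

# Individual test for the PGNP coefficient (rounded inputs from the text)
t_pgnp = t_statistic(-0.0056, 0.0020)
t_critical = 2.0                      # two-tail, 5% level, 61 df (from the text)
reject_h0 = abs(t_pgnp) > t_critical

# Overall significance of the child mortality regression
f_stat = overall_f(0.7077, n=64, k=3)  # numerator df = 2, denominator df = 61

print(f"t = {t_pgnp:.2f}, reject H0: {reject_h0}")
print(f"F = {f_stat:.2f} with (2, 61) degrees of freedom")
```

The computed F would then be compared with the tabulated F value at (2, 61) degrees of freedom for the chosen significance level; that critical value is not given in the text and is left out of the sketch.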
