
‭Lecture notes‬

Lecture 1: Mathematical and statistical foundations  1

Lecture 2 & 3: Estimation Methods 1  13
Lecture 3:  25
Lecture 4 & 5: Maximum Likelihood  29
Lecture 7 - Test of The Capital Asset Pricing Model  36

‭Lecture 1: Mathematical and statistical foundations‬


‭file:///Users/oliviajakobsson/Downloads/Lecture_01_Introduction_Spring_2024.pdf‬

Videos: Athena → Kurskanal

Exam: no heavy questions, since we'll have two quizzes.

‭About the quizzes:‬


●	Place: computer room, Lärosal 33 or 36 (Datorsal), Albano Hus 4
‭●‬ ‭Computer system in the quiz room: Windows‬
‭●‬ ‭Quiz conducted via Moodle, SU account and password required!‬
‭●‬ ‭Quiz instruction and Mock quiz (Athena and Moodle)‬
‭●‬ ‭Quiz times‬
‭○‬ ‭Quiz 1: 26 Feb. 2024 0900–1230‬
‭○‬ ‭Quiz 2: 4 Mar. 2024 0900–1230‬
The quizzes cover the first 3 labs
Later labs (3 and 4) will be tested on the exam

●	Quiz data: Athena, different for each group, available just before the quiz starts.
‭●‬ ‭Quiz code: available just before the quiz starts.‬
‭●‬ ‭Note! Students can only go to their registered group to do the quizzes. Otherwise the‬
‭results are not counted! Please register a group in Athena and Moodle (choose the‬
‭same group!)‬

‭Exam:‬
●	Multiple choice
‭●‬ ‭Some calculations‬
‭●‬ ‭Not heavy‬

●	Pre-read and read (textbooks, lecture notes, lecture slides)
‭●‬ ‭Review basic statistics and elementary matrix algebra‬
●	Go to the lectures

‭●‬ ‭Attend all the computer labs‬
‭●‬ ‭Preview and review the video tutorials for computer labs‬
‭●‬ ‭Practice the mock quizzes‬
‭●‬ ‭Discuss with your peer students‬
‭●‬ ‭Ask questions‬
‭______‬
‭Functions‬
‭-‬ ‭We’ll build a function and describe the relationship!‬

‭●‬ A ‭ function is a mapping or relationship between an input or set of inputs and an‬
‭output‬
●	We write that the output y is a function f of the input x, or y = f(x)
‭●‬ ‭y could be a linear function of x where the relationship can be expressed on a straight‬
‭line‬
‭●‬ ‭Or it could be non-linear where it would be expressed graphically as a curve‬
‭●‬ ‭If the equation is‬‭linear‬‭, we would write the relationship‬‭as‬
y = a + bx

‭where y and x are variables and a and b are parameters‬


‭●‬ ‭a is the intercept and b is the slope/gradient‬

‭Ex.‬

‭●‬ ‭The slope can be negative/zero‬


‭○‬ ‭negative: goes down‬
‭○‬ ‭zero: straight (meaning no relationship between y and x)‬
‭●‬ ‭In general, we can calculate the slope of a straight line by taking any two points on the‬
‭line and dividing the change in y by the change in x‬
‭○‬ ‭∆ (Delta) denotes the change in a variable‬

‭Ex.‬
Differential Calculus
We do NOT need to differentiate in this course! This is just good-to-know info.

‭●‬ T ‭ he effect of the rate of change of one variable on the rate of change of another is‬
‭measured by a mathematical derivative‬
‭●‬ ‭If the relationship between the two variables can be represented by a curve, the‬
‭gradient of the curve will be this rate of change‬
●	Consider a variable y that is a function f of another variable x, i.e. y = f(x): the derivative of y with respect to x is written dy/dx,

or sometimes f′(x)

‭Integration‬
‭●‬ ‭This term measures the instantaneous rate of change of y with respect to x, or in other‬
‭words, the impact of an infinitesimally small change in x‬
‭●‬ ‭Notice the difference between the notations ∆y and dy‬

●	Integration is the opposite of differentiation



‭●‬ ‭If we integrate a function and then differentiate the result, we get back the original‬
‭function‬
●	Integration is used to calculate the area under a curve (between two specific points)
‭●‬ ‭Further details on the rules for integration are not given since the mathematical‬
‭technique is not needed for any of the approaches used here.‬

‭Probability‬
-	A probability cannot be negative or greater than 1
‭Random variables‬
‭●‬ ‭A variable whose value at least in part is determined by the outcome of a chance‬
‭experiment is called a random variable. Random variables are usually denoted by the‬
‭capital letters X, Y, Z and so on, the values taken by them are denoted by‬‭small‬
‭letters x, y, z‬
‭●‬ ‭A random variable may be either discrete or continuous.‬
‭○‬ ‭A discrete RV takes on only a finite number of values.‬
‭■‬ ‭e.g., in throwing dice, if the random variable X is the numbers showing‬
‭on the dice, then X can take 1, 2, 3, 4, 5, 6. Hence, it is a discrete‬
‭random variable.‬
‭○‬ ‭A continuous RV can take on any value in some interval of values.‬
‭■‬ ‭ex height and weight‬

E.g. the stock price of a stock: we can predict part of tomorrow's price, so we assume today's price + something (random).

‭Probability distribution function of a‬‭discrete‬‭random‬‭variable‬

X = number showing on the die
P(X = 1) = 1/6

‭Probability Density Function of a continuous RV‬

-	It is always non-negative

IMPORTANT!
μ (mu) = mean
σ (sigma) = standard deviation

The normal distribution
X follows N(μ, σ²)

Once we know μ and the variance we can compute probabilities!

‭We use normal distribution often since‬


-	it's easy to work with, since it depends on only two parameters.
-	A linear transformation of a normal variable is also normal → Y = aX + b
-	A lot of naturally occurring data, such as height and weight, follows a normal distribution.

‭Expected value‬
‭●‬ ‭The expected value of a random variable X, denoted by E(X) has the following‬
‭properties:‬

1.	The expected value of a constant is the constant itself: E(b) = b if b is a constant.


2.	If a and b are constants,
E(aX + b) = aE(X) + b
3.	If X and Y are independent random variables, then
E(XY) = E(X)E(Y)
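A quick simulation sketch (plain Python, arbitrary constants a = 2 and b = 3) of properties 2 and 3: the sample analogues of E(aX + b) = aE(X) + b and, for independent X and Y, E(XY) = E(X)E(Y):

```python
import random

random.seed(0)
n = 200_000
a, b = 2.0, 3.0

# Draw two independent standard-normal samples X and Y.
xs = [random.gauss(0, 1) for _ in range(n)]
ys = [random.gauss(0, 1) for _ in range(n)]

def mean(v):
    return sum(v) / len(v)

# Property 2: E(aX + b) = a*E(X) + b  (here both sides are close to 3.0)
lhs = mean([a * x + b for x in xs])
rhs = a * mean(xs) + b
print(round(lhs, 2), round(rhs, 2))

# Property 3: for independent X and Y, E(XY) = E(X)E(Y) = 0 here
print(round(mean([x * y for x, y in zip(xs, ys)]), 2))
```

Both printed means land close to their theoretical values; the small gap is sampling noise.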
‭Variance‬

When collecting data we'll have a lot of information, so it is easier to look at the expected value and variance.

‭Covariance‬

‭Variance of correlated variables‬


-	The correlation can take a value from −1 to 1
-	−1 = perfect negative correlation
-	1 = perfect positive correlation

‭Matrices - Background‬
‭●‬ ‭Some useful terminology:‬
‭○‬ ‭A‬‭scalar‬‭is simply a‬‭single number‬‭(although it need‬‭not be a whole number –‬
‭e.g. 3, −5, 0.5 are all scalars)‬
‭○‬ ‭A‬‭vector‬‭is a‬‭one-dimensional array of numbers‬
‭○‬ ‭A‬‭matrix‬‭is a‬‭two-dimensional collection or array‬‭of numbers‬‭. The size of a‬
‭matrix is given by its numbers of rows and columns.‬
‭■‬ ‭good to organize and derive formulas‬
‭●‬ ‭Matrices are very useful and important ways for organising sets of data together,‬
‭which make manipulating and transforming them easy‬
‭●‬ ‭Matrices are widely used in econometrics and finance for solving systems of linear‬
‭equations, for deriving key results, and for expressing formula.‬
‭Vector and Matrix‬
‭●‬ ‭A matrix is a rectangular array of numbers with elements arranged in rows and‬
‭columns.‬
‭●‬ ‭M: rows x columns‬
‭●‬ ‭m: refers to the elements in the rows and column ex. m‬‭11‬ ‭= 1‬

-	A symmetric matrix is always a square matrix


Matrix Addition or Subtraction
●	Addition and subtraction of matrices requires the matrices concerned to be of the same order (i.e. to have the same number of rows and the same number of columns as one another)
●	The operations are then performed element by element

-	it is possible to compute A + B or A − B:

WE ONLY NEED TO DO THIS IN EXCEL

‭Matrix Multiplication‬
‭●‬ ‭Multiplying or dividing a matrix by a scalar (that is, a single number), implies that‬
‭every element of the matrix is multiplied by that number‬

●	More generally, for two matrices A and B of the same order and for c a scalar, the following results hold

‭This will be different when we multiply!‬

‭●‬ M ‭ atrix multiplication‬‭: requires number of columns‬‭of the first matrix‬


‭must be equal to the number of rows of the second matrix.‬
‭●‬ ‭Note also that the ordering of the matrices is important, so in general, AB ̸=‬
‭BA‬
‭●‬ W
‭ hen the matrices are multiplied together, the resulting matrix will be of size‬
‭(number of rows of first matrix X number of columns of second matrix), e.g.‬
‭(3 × 2) × (2 × 4) = (3 × 4).‬

BA is not the same, and therefore in general AB ≠ BA

‭●‬ ‭More generally, (a × b) × (b × c) × (c × d) × (d × e) = (a × e), etc.‬


○	The product is defined if the last dimension of one matrix equals the first dimension of the next!

‭○‬ ‭In excel:‬


1.	See if it's applicable → check that the last dimension of one matrix equals the first dimension of the next :)
2.	Do the calculation in Excel

‭●‬ ‭In general, matrices‬‭cannot be‬‭divided‬‭by one another.‬


‭○‬ ‭Instead, we‬‭multiply by the‬‭inverse‬‭.‬

The Transpose of a Matrix
Switching the rows with the columns
‭The Inverse of a Matrix‬

●	The inverse of a matrix exists only when the matrix is square and non-singular.
●	Use the MINVERSE function in Excel
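A sketch of these matrix rules outside Excel: the functions `mmult` and `minverse2` below are illustrative stand-ins for MMULT and MINVERSE, written in plain Python for small matrices stored as lists of rows.

```python
def mmult(A, B):
    """(m x n) times (n x p): columns of A must equal rows of B."""
    assert len(A[0]) == len(B), "inner dimensions must match"
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def minverse2(M):
    """Inverse of a 2x2 matrix; exists only if the determinant is non-zero."""
    (a, b), (c, d) = M
    det = a * d - b * c
    assert det != 0, "a singular matrix has no inverse"
    return [[d / det, -b / det], [-c / det, a / det]]

A = [[1, 2], [3, 4]]
# A times its inverse gives the identity matrix.
I = mmult(A, minverse2(A))
print([[round(x, 10) for x in row] for row in I])  # [[1.0, 0.0], [0.0, 1.0]]

# Ordering matters: AB is generally not BA.
B = [[0, 1], [1, 0]]
print(mmult(A, B) == mmult(B, A))  # False
```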

‭Matrix properties → sum‬

But if we switch it, it will be

‭-->‬

‭Reading materials for this lecture‬


‭Lecture 2 & 3: Estimation Methods 1‬
‭file:///Users/oliviajakobsson/Downloads/Lecture_02_03_%20OLS_Spring_2024.pdf‬

‭Outline for the lecture:‬


‭1.‬ ‭The classical linear regression model: Ordinary Least Square‬
‭2.‬ ‭The property of an estimator‬
‭3.‬ ‭Multiple linear regression: OLS in matrix form‬
🫡‭ ‬

‭Regression analysis‬
-	Ordinary Least Squares is used to do regression analysis!

Regression analysis is used to describe and evaluate the relationship between a given variable (the dependent variable, y) and one or more other variables (the independent variable(s), x).

‭Regression is different from Correlation‬


‭●‬ ‭So if we say y and x are correlated, it means that we are treating y and x in a‬
‭completely symmetrical‬‭way.‬
‭○‬ ‭so we have no idea if they affect each other.‬
‭BUT:‬
‭●‬ ‭In regression, we treat the dependent variable (y) and the independent variable(s) (x)‬
‭very differently.‬
‭○‬ ‭The y variable‬‭is assumed to be‬‭random‬‭or ‘stochastic’‬‭in some way, i.e. to‬
‭have a‬‭probability distribution.‬
‭○‬ ‭The x variables are‬‭, however, assumed to have‬‭fixed‬‭(‘non-stochastic’)‬
‭values in repeated samples.‬

‭→ We’re trying to use models to understand, using x to predict the y variable!‬

‭Some notations:‬
●	Denote the dependent variable by y and the independent variable(s) by x1, x2, …, xk, where there are k independent variables
‭●‬ ‭Some alternative names for the y and x variables:‬

●	Note that there can be many x variables, but we will limit ourselves to the case where there is only one x variable to start with
●	Cross-sectional data denote: yi, xi
○	i = e.g. individual firms, countries etc
●	Time series: yt, xt
●	Panel data: ...

We first look at the case where y is affected by only one x

‭Simple regression‬
‭●‬ ‭For simplicity, say k=1. This is the situation where y depends on only one x variable.‬

‭Examples of the kind of relationship that may be of interest include:‬


‭●‬ ‭How asset returns vary with their level of market risk‬
‭○‬ ‭portfolio theory‬
‭●‬ ‭Measuring the long-term relationship between stock prices and dividends. etc.‬
‭○‬ ‭dividend cross model‬
‭●‬ ‭Constructing an optimal hedge ratio for a spot position in crude oil.‬
‭○‬ ‭Ex. how many contract to go long or short in order to hedge.‬

‭Example:‬
‭-‬ ‭y depends on only one x here‬

Suppose that we have the following data on the excess returns on a fund XXX and the excess returns on a market index:

y‭ = fund‬
‭x = market portfolio/index‬

●	We have some intuition that the beta on this fund is ? (positive or negative); we want to find whether there is a relationship between y and x given the data.
○	when the excess return on the market (x) is high, the excess return on the fund (y) is high, and vice versa.

‭●‬ ‭The first stage would be to form a scatter plot of the two variables.‬
‭Plot in a scatter diagram‬

‭Finding a Line of Best Fit‬


‭●‬ ‭We can use the general equation for a straight line,‬
‭y = α + βx‬

to get the line that "best fits" the data.

‭●‬ ‭However, this equation above is completely deterministic.‬


○	if y and x really had this exact relationship, all the data points would lie on a straight line.
●	Is this realistic?
●	No!
●	What we do is to add a random disturbance term, u, into the equation.
○	u means that we allow some deviation from the straight line!
yt = α + βxt + ut
-	α = alpha (the intercept)
-	β = beta (the slope)
-	u = random disturbance term

where t (years) = 1, 2, 3, 4, 5 → meaning we have 5 observations

‭Why do we include a Disturbance term? (u)‬


‭-‬ ‭form of error term‬

‭The disturbance term can capture a number of features:‬


‭●‬ ‭We always leave out some determinants of yt (unobservable, unmeasurable)‬
‭●‬ ‭There may be errors in the measurement of yt that cannot be modelled.‬
‭●‬ ‭Random‬‭outside influences on yt which we cannot model‬
‭○‬ ‭e.g. a hurricane, a computer failure.‬
Determining the Regression Coefficients
So how do we determine what α and β are?
●	Choose α and β so that the (vertical) distances from the data points to the fitted line are minimised (so that the line fits the data as closely as possible).

‭Ordinary Least Squares‬


‭●‬ ‭The most common method used to fit a line to the data is‬
‭known as OLS (ordinary least squares).‬
‭●‬ ‭What we actually do is to take each distance from the data point and square it (i.e.‬
‭take the area of each of the squares in the diagram) and minimise the total sum of the‬
‭squares (hence it’s called least squares).‬
‭●‬ ‭Tightening up the notation, let‬

Y BAR AND X BAR (ȳ, x̄) = THE MEANS OF Y AND X

-	5 data points

first, second and fourth residuals = positive
third and fifth = negative

We use the vertical distances because we wish to minimize the errors.
‭Actual and Fitted Value‬

Our aim is to minimize the residual sum of squares

-	so we want to find the alpha and beta that minimize the difference between the actual points and the line.

‭Deriving the OLS Estimator‬


-	T = number of observations
-	This method of finding the optimum is known as ordinary least squares.

-	we will not need to differentiate manually

‭So first we want to minimize‬

What do we use α̂ and β̂ for?

-	we can see that the fund is riskier than the market

‭→ we can use this to answer questions:‬
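A minimal sketch of the closed-form OLS estimators, β̂ = Σ(xt − x̄)(yt − ȳ) / Σ(xt − x̄)² and α̂ = ȳ − β̂x̄, on made-up numbers (illustrative, not the lecture's fund/market data):

```python
# Five hypothetical (x, y) pairs: x = excess return on the market,
# y = excess return on the fund. Values are invented for illustration.
x = [2.0, 3.0, 5.0, 6.0, 9.0]
y = [3.0, 4.5, 7.0, 8.0, 12.5]

xbar = sum(x) / len(x)
ybar = sum(y) / len(y)

# Closed-form OLS estimators for a simple regression y = alpha + beta*x + u.
beta_hat = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) \
           / sum((xi - xbar) ** 2 for xi in x)
alpha_hat = ybar - beta_hat * xbar

print(round(alpha_hat, 3), round(beta_hat, 3))  # 0.333 1.333
```

The fitted line here is ŷ ≈ 0.33 + 1.33x; in Excel the same numbers come out of SLOPE and INTERCEPT.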

‭NOW WE’LL SEE WHY THE OLS IS POPULAR AND HOW IT CAN BE USED:‬
‭The Population and the Sample‬

Population
The population is the total collection of all objects or people to be studied, for example:

Sample
A sample is a selection of just some items from the population.
●	Ideal sample: a random sample, i.e. a sample in which each individual item in the population is equally likely to be drawn.

Estimator or Estimate?
●	Estimators are the formulae used to calculate the coefficients,
○	e.g., the α̂ and β̂ of OLS
■	without the ^ above = the population parameters
●	Estimates are the actual numerical values for the coefficients; they will be a set of numbers

Sampling
In order to find the population parameters we start with the sampling:

‭●‬ ‭Repeated sample with observations‬‭n‬


‭○‬ ‭Values of independent variables unchanged‬
‭○‬ ‭Values of the dependent variable change by drawing a new set of disturbances,‬
‭u.‬
‭○‬ ‭Use an estimator βˆ to calculate an estimate of β for m times‬

●	The manner in which these estimates are distributed is the sampling distribution of β̂
●	The properties of the sampling distribution can be used to characterize and evaluate the estimator
●	Look at the unbiasedness, efficiency and asymptotic properties of an estimator

‭Unbiasedness‬
‭-‬ ‭First property‬
●	The property does not mean that β̂ = β
●	If we could undertake repeated sampling an infinite number of times, we would get the correct estimate on average.

Repeated Sampling: unbiasedness

population model: yt = α + βxt + ut
here we do the opposite!

-	5) we test if OLS generates unbiased estimators α̂ and β̂ for α and β
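The repeated-sampling experiment can be sketched as a small simulation: fix true parameters (chosen arbitrarily here as α = 1, β = 2), keep x fixed, redraw the disturbances u each time, re-estimate β̂, and average over the samples.

```python
import random

random.seed(1)
alpha, beta = 1.0, 2.0            # true (hypothetical) population parameters
x = [1.0, 2.0, 3.0, 4.0, 5.0]     # x values stay fixed across repeated samples
xbar = sum(x) / len(x)
sxx = sum((xi - xbar) ** 2 for xi in x)

estimates = []
for _ in range(5000):             # m repeated samples
    # Redraw the disturbances, which redraws y.
    y = [alpha + beta * xi + random.gauss(0, 1) for xi in x]
    ybar = sum(y) / len(y)
    b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    estimates.append(b)

# Unbiasedness: the average of the estimates is close to the true beta = 2.
print(round(sum(estimates) / len(estimates), 2))
```

Each individual β̂ misses the true value, but the average over many samples sits on top of it: that is what unbiasedness means.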

‭Efficiency‬

●	An estimator β̂ of a parameter β is said to be efficient if it is unbiased and has the smallest variance among all unbiased estimators. If the estimator is efficient, we are minimising the probability that it is a long way off from the true value of β.
●	An estimator that is linear and unbiased and that has the minimum variance among all linear unbiased estimators is called the best linear unbiased estimator (BLUE)
○	OLS is also called this
Probability
Small variance = β̂ is likely close to the true β
Large variance = β̂ can be far from the true β

Unbiased and Efficient

Unbiased - the estimates are centered on the target (spread around the circle)
Efficient - how tightly clustered they are

‭Asymptotic Properties‬
‭●‬ ‭Justify an estimator on the basis of its asymptotic properties of sampling distribution‬
‭in extremely large samples‬
‭○‬ ‭If the asymptotic distribution of βˆ becomes concentrated on a particular value‬
‭β as the sample size approaches infinity, β is said to be the probability limit of‬
‭βˆ and is written plim(βˆ) = β. βˆ is said to be‬‭consistent‬‭if‬

If we could collect an infinite sample for all people


●	If β̂ is consistent and its asymptotic variance is smaller than the asymptotic variance of all other consistent estimators, β̂ is said to be asymptotically efficient.

‭Repeated Sampling: consistent‬

-	if we increase the number of observations, the estimate will converge to the true population parameter.

‭The Assumption of the classical linear regression model‬


Properties of the OLS Estimator: BLUE - (not covered in much detail)
‭●‬ ‭The Best Linear Unbiased Estimator ( if assumptions 1-4 hold)‬
‭○‬ ‭Estimator: αˆ and βˆ are the estimators of the true value of α and β‬
‭○‬ ‭Linear: is a linear estimator‬
‭○‬ ‭Unbiased: On average, the actual value of the αˆ and βˆ will be equal to the‬
‭true values.‬
‭○‬ ‭Best: the OLS estimators have the minimum variance among the class of‬
‭linear unbiased estimators‬

‭The Gauss-Markov theorem proves that the OLS estimator is best.‬

Precision and standard errors

Recall that the estimators of α and β from the sample parameters α̂ and β̂ are given by,

We would like to know how good these estimates are. We need some measure of the reliability or precision of the estimators (α̂ and β̂). The precision of the estimate is given by its standard error. Given assumptions 1–4 above, the standard errors can be shown to be given by,

-	s = the estimated standard deviation of the residuals (ût).


Estimating the Variance of the Disturbance Term
how to estimate the standard error

‭Example: How to Calculate the Parameters and Standard Errors‬


‭Lecture 3:‬
Start slide 39
The Assumption of the classical linear regression model

Example: How to Calculate the Parameters and Standard Errors

Assume we have the following data calculated from a regression of y on a single variable x and a constant over 22 observations.

‭After calculation, we have‬


‭Example (cont’d)‬

‭An Introduction to Statistical Inference‬


●	We want to make inferences about the likely population values from the regression parameters
●	e.g., suppose we have the following regression results

●	β̂ = 0.35 is a single (point) estimate of the unknown population parameter, β.
How reliable is this estimate?
●	The reliability of the point estimate is measured by the coefficient's standard error.

‭Hypothesis Testing: Some concepts‬


●	We can use the information in the sample to make inferences about the population.
●	We will always have two hypotheses that go together,
○	the null hypothesis (denoted H0: β = β*)
○	the alternative hypothesis (denoted H1: β ≠ β*, β > β*, or β < β*).
●	The null hypothesis is the statement or the statistical hypothesis that is actually being tested.
●	The alternative hypothesis represents the remaining outcomes of interest.
●	e.g., suppose given the regression results above, we are interested in the hypothesis that the true value of β is in fact 0.5. We would use the notation H0: β = 0.5, H1: β ≠ 0.5
●	This would be known as a two-sided test.

One-sided Hypothesis Tests

Sometimes we may have some prior information that, e.g., we would expect β > 0.5 rather than β < 0.5. In this case, we would do a one-sided test

or we could have had

There are two common ways to conduct a hypothesis test: the test of significance approach, or the confidence interval approach.

The probability distribution of the Least Squares estimator


‭Lecture 4 & 5: Maximum Likelihood‬
‭Lecture Notes (2019), Chapter 2.5‬

Introduction to Maximum Likelihood Estimation (MLE)

●	The starting point: assuming the distribution of an observed variable is known
○	we're interested in modeling the dependent variable
●	Except for a finite number of unknown parameters
●	The parameters will be estimated by taking the values for them that give the highest probability / likelihood
●	MLE provides us a means of characterizing a distribution, assuming that we know the form of this distribution
●	For example:

A simple demonstration
●	Consider a large pool filled with red and yellow balls. We are interested in the fraction p of red balls in this pool. Now we take a random sample of N balls (do not look at all other balls) with replacement.
●	Denote yi = 1 if ball i is red and yi = 0 if it is yellow

P(yi = 1) = p

●	Now: estimate p from the sample
Assume that our sample contains N1 = Σi yi red and N − N1 yellow balls; the probability of obtaining such a sample is given by,

N = 100
N1 = 60 → red
N − N1 = 40 → yellow

●	For computational purposes it is often more convenient to maximize the (natural) logarithm,

●	Maximize ln L(p) w.r.t. p, and solve the first order condition; we get:
●	The ML estimator thus corresponds to the sample proportion of red balls, and this probability also corresponds to your best guess for p based on the sample that was drawn.
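A sketch of the ball example with the numbers above (N = 100, N1 = 60): the log-likelihood is ln L(p) = N1·ln(p) + (N − N1)·ln(1 − p), and a simple grid search confirms it peaks at the sample proportion N1/N.

```python
import math

N, N1 = 100, 60  # 100 draws, 60 of them red

def log_lik(p):
    # Log-likelihood of observing N1 red balls out of N draws
    # (the binomial coefficient is a constant and can be dropped).
    return N1 * math.log(p) + (N - N1) * math.log(1 - p)

# Grid search over p in (0, 1): the maximizer is the sample proportion.
grid = [i / 1000 for i in range(1, 1000)]
p_hat = max(grid, key=log_lik)
print(p_hat)  # 0.6
```

Solving the first order condition analytically gives the same answer, p̂ = N1/N = 0.6.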

The intuition of MLE
The maximum likelihood principle is as follows:
1.	Determine/assume the distribution of the data (e.g., yi),
2.	From the assumed distribution, we determine the likelihood of observing the sample as a function of the unknown parameters that characterize the distribution
3.	Estimate the values for the unknown parameters that give us the highest likelihood.

‭The Linear Regression Model: MLE‬

‭-‬ ‭MLE FOR NORMAL DISTRIBUTION‬

-	N = number of observations
‭Inference with ML‬
Optional: What if the assumptions in CLRM are violated? Other possible problems in regression analysis?
NOT REQUIRED FOR THE EXAM
●	Heteroskedasticity
●	Serial (auto)correlation
●	Other software
●	See Athena - Resources - Other supplementary materials (optional)
●	Multicollinearity
●	Discrete, ordinal independent variables
●	Discrete dependent variables

‭Similarity and difference: OLS vs MLE‬


‭●‬ ‭What is OLS? What is MLE?‬
‭●‬ ‭When should we use OLS or MLE?‬
‭●‬ ‭Properties of the OLS estimator and ML estimator‬
‭●‬ ‭Assumptions on the error term distribution‬
‭●‬ ‭Sample requirement‬

On whiteboard:

OLS (min Σû²t): linear models; error term ut ~ N(0, σ²); estimator is BLUE; no requirement on the number of observations (small or big sample)

MLE (max ln L): both linear and non-linear models; can assume any distribution for the error term; estimator has asymptotic properties (1. consistency, 2. efficiency); requires a large sample

OLS
-	We're estimating the linear regression
-	Minimize the residual sum of squares
MLE
-	more flexible and powerful; we can estimate both linear and non-linear models

Linear: yt = β1 + β2·x2t + β3·x3t + ... + ut

Non-linear: yt = β1 + β2·x2t / (β3·x3t) + β4·x4t + ut

Variance of the error term:

MLE =
-	biased (the ML estimator of the variance divides by N)
OLS =
-	unbiased (divides by the degrees of freedom)

(2 − 0)/1 = 2

Yes, we reject, since 2 > 1.96


H0: β = 1
New test stat = (2 − 1)/1 = 1

No, we cannot reject, since 1 < 1.96

β̂ ± critical value × SE(β̂)
2 − 1.96 × 1 = 0.04 → lower bound
2 + 1.96 × 1 = 3.96 → upper bound

Yes, we reject, since 0 is not in the interval.
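The arithmetic above, assuming β̂ = 2 and SE(β̂) = 1 as in the example, with a 5% two-sided critical value of 1.96:

```python
beta_hat, se = 2.0, 1.0  # estimate and standard error from the example
crit = 1.96              # 5% two-sided critical value from the normal table

def t_stat(beta_star):
    # Test statistic for H0: beta = beta_star
    return (beta_hat - beta_star) / se

print(abs(t_stat(0.0)) > crit)   # True:  reject H0: beta = 0
print(abs(t_stat(1.0)) > crit)   # False: fail to reject H0: beta = 1

# Confidence interval approach: beta_hat +/- crit * SE(beta_hat)
lower, upper = beta_hat - crit * se, beta_hat + crit * se
print(round(lower, 2), round(upper, 2))  # 0.04 3.96
```

The two approaches agree: 0 lies outside [0.04, 3.96] (reject), while 1 lies inside it (fail to reject).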


-	here we look at whether β* is in the interval or not.

No, we cannot reject, since 1 is in the interval.

-	here we look at whether β* is in the interval or not.

Yes, we can reject H0 since −1 is not in the interval.

Yes, since 2 > 1.96

Test stat = 1.8
1.8 < 1.96, so we fail to reject
P-value ≈ 4.5%

No, since 8% > 5%

-	We reject if the P-value is smaller than the significance level.

Yes, since 8% < 10%

‭Beta 2 is positive, meaning that any increase in xt will result in an increase in yt.‬

_‭ __‬
‭There is a positive relationship between liquidity and trading activity:‬
‭liquidity‬‭i‬ ‭= a + 0,8 Trading activity +...+ u‬‭t‬
‭Lecture 7 - Test of The Capital Asset Pricing Model‬
‭CAPM‬

‭From 6th lecture:‬


-	Check whether a time series is AR(p) or MA(q)
An AR(p) process is described by:
●	an ACF that is infinite in extent but decays geometrically.
●	a PACF that is (close to) zero for lags larger than p.
For an MA(q) process we have:
●	an ACF that is (close to) zero for lags larger than q.
●	a PACF that is infinite in extent but decays geometrically.

‭A combined ARMA model has:‬


‭●‬ ‭a geometrically decaying ACF.‬
‭●‬ ‭a geometrically decaying PACF‬
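A simulation sketch of these identification rules (the AR and MA coefficients of 0.7 are arbitrary): the sample ACF of an AR(1) decays geometrically, while the sample ACF of an MA(1) is close to zero beyond lag 1.

```python
import random

random.seed(2)
T = 20000
eps = [random.gauss(0, 1) for _ in range(T + 1)]

# AR(1): y_t = 0.7 * y_{t-1} + eps_t
ar = [0.0]
for t in range(1, T):
    ar.append(0.7 * ar[-1] + eps[t])

# MA(1): y_t = eps_t + 0.7 * eps_{t-1}
ma = [eps[t] + 0.7 * eps[t - 1] for t in range(1, T)]

def acf(y, lag):
    # Sample autocorrelation at the given lag.
    ybar = sum(y) / len(y)
    num = sum((y[t] - ybar) * (y[t - lag] - ybar) for t in range(lag, len(y)))
    den = sum((v - ybar) ** 2 for v in y)
    return num / den

print([round(acf(ar, k), 2) for k in (1, 2, 3)])  # roughly 0.7, 0.49, 0.34
print([round(acf(ma, k), 2) for k in (1, 2, 3)])  # roughly 0.47, then near 0
```

The AR(1) autocorrelations fall off like 0.7, 0.7², 0.7³, while the MA(1) ACF cuts off after lag 1 (its lag-1 value is θ/(1 + θ²) ≈ 0.47).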

‭Unit root test:‬


●	Non-stationary = e.g. a random walk (the stock market)

‭Dickey-fuller test‬
●	To check if a time series contains a unit root or not, i.e. if yt is stationary or not
●	The null hypothesis says that ψ = 0
○	if it is 0 → random walk model, and the series contains a unit root.

We reject if the DF test stat < critical value


‭___‬

Capital Asset Pricing Model - CAPM

●	We assume that the investor holds a mean-variance efficient portfolio in return and volatility space; risk aversion, greedy investors etc.
○	Expected return = E(r) - y-axis
○	Volatility = σ - x-axis

Same level = we will not invest

The market portfolio will give you the highest return.

●	Unconditional single-period CAPM model: the expected return of an asset must be linearly related to the covariance of its return with the return of the market portfolio.
○	The investors are risk-averse, meaning that if someone invests in stocks instead of in the bank, the investor takes on higher risk and demands a higher expected return.
●	In order to test the model empirically we normally assume IID returns and joint multivariate normality.
●	There are different approaches that focus on different aspects of an asset pricing model.

Tests of the CAPM model:

Test 1:
Analyze if a candidate asset pricing model is sufficient for explaining the expected asset returns by testing the significance of the model's mispricing.
●	Based on a time-series regression analysis (TSR), i.e. a regression on time-series data for some firms/portfolios

This is the focus of today's lecture and lab 4.

Test 2:
Examine if the assets' loadings on the factors can explain the variation in expected returns across assets
●	Use the cross-sectional regression approach (CSR), i.e. a regression on data for different firms, which can be repeated for different time periods.

Test 3:
Test if the factor risk premiums are significantly different from zero.
●	Depending on how we model the factor risk premium, this can be analyzed by both the CSR and the TSR approaches

The Sharpe (1964) and Lintner (1965) version of CAPM

●	we assume there is some risk-free rate at which investors can lend and borrow. The expected return of asset i is the following:

-	rf = risk-free rate
-	β = beta
-	expected return of the market portfolio minus the risk-free rate

= the premium

if we move rf to the left-hand side:

E(ri) − rf = excess return of asset i over the risk-free rate
E(ri) − rf = βim (E(rm) − rf)
The expected excess return of asset i, for any asset:
Expected excess return = some beta × the expected excess return on the market portfolio
-	E(Ri) = βim · E(Rm)

We need to design an empirical model to test the theoretical model:

-	Ri = βi · Rm + εt
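A sketch of the time-series regression behind Test 1: regress an asset's excess return on the market's excess return and look at the estimated intercept, the "mispricing" alpha. The data here are simulated with a true beta of 1.2 and zero true alpha (all numbers hypothetical).

```python
import random

random.seed(3)
T = 5000
# Simulated excess returns: market, then asset with beta = 1.2 and alpha = 0.
rm = [random.gauss(0.5, 2.0) for _ in range(T)]
ri = [1.2 * m + random.gauss(0, 1.0) for m in rm]

# OLS of asset excess return on market excess return.
mbar = sum(rm) / T
ibar = sum(ri) / T
beta_hat = sum((m - mbar) * (r - ibar) for m, r in zip(rm, ri)) \
           / sum((m - mbar) ** 2 for m in rm)
alpha_hat = ibar - beta_hat * mbar

print(round(beta_hat, 1))  # about 1.2: the asset's loading on the market
print(abs(alpha_hat) < 0.1)  # True: estimated mispricing is close to zero
```

If the CAPM holds, α̂ should be statistically indistinguishable from zero for every asset, which is exactly what the Wald test below checks jointly.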

Wald test
Wald test on the intercepts of

-	Here we estimate alpha and beta
-	We test if αi is jointly zero for all the assets.

where Rt denotes the vector of excess returns on risky assets, and Rmt is the excess return on the market portfolio.

-	so we fit this for the different assets.

Assuming that there are N risky assets, Rt is an (N × 1) vector, α and β are (N × 1) vectors of parameters, and εt is an (N × 1) vector of residuals, such that:

If all alphas are jointly zero, the CAPM model is suitable!

In order for the Sharpe–Lintner CAPM to hold, all elements of the vector α must be zero.

‭The log-likelihood function for this multivariate model is given by‬


We maximise the log-likelihood, where

-	N = the number of assets
-	T = the number of time periods

Test the joint hypothesis

The distribution for the ML estimator is:

-	μm = the average excess return on the market → AVERAGE(Rm)

= the variance-covariance matrix

‭About the hypothesis:‬

If at least one alpha is significantly different from zero, then we can reject the null hypothesis.
‭Wald test statistics:‬

α̂ = maximum likelihood estimate

-	in order to test a single α̂ from the MLE:
H0: α = 0 → t-test: α̂ / SE(α̂) ~ N(0, 1), and we use its square

If X ~ N(0, 1),
then X₁² + ... + X₁₀₀² follows a chi-square distribution, χ²(100)

Hence, when we compute the critical value and the P-value we use CHIINV (for Tc) and CHIDIST (for the P-value) in Excel.

Reject if the test stat > Tc
Reject if P-value < significance level

The Likelihood Ratio test

A test to compare models where one is a restricted version (something has been removed) of the other. The test tells us which model is best.

To implement this test, we need to re-estimate the parameters under the restricted model (the model without intercepts), which results in a new log-likelihood value, ln L*.

‭The LR test stat is given by:‬


‭LR = -2(ln L* - ln L)‬
‭-‬ ‭L = log-likelihood value obtained from the unrestricted model.‬

●	The distribution of LR is asymptotically χ²(N).
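A sketch of the LR test with made-up log-likelihood values; the 5% χ²(10) critical value (about 18.31 for N = 10 assets) is taken from standard tables.

```python
# Hypothetical log-likelihood values: lnL from the unrestricted model
# (with intercepts) and lnL* from the restricted model (without).
lnL_unrestricted = -250.0
lnL_restricted = -262.4   # the restricted fit is never better than the unrestricted

# LR = -2 * (ln L* - ln L)
LR = -2 * (lnL_restricted - lnL_unrestricted)
print(round(LR, 1))  # 24.8

crit_5pct_df10 = 18.31  # 5% chi-square(10) critical value, from tables
print(LR > crit_5pct_df10)  # True: reject the restriction (the alphas matter)
```

Rejecting here means the intercept-free (restricted) model fits significantly worse, i.e. the mispricing terms cannot be dropped.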

‭The null hypothesis here is if the‬‭restriction‬‭was‬‭useful or not.‬

Zero-Beta CAPM model → Black

●	In the absence of a risk-free rate, this version works.
●	In the absence of a risk-free asset, one can use the market portfolio and its zero-covariance portfolio, and write the expected return on any feasible portfolio or security q as
●	No correlation with the market portfolio.
●	There exists some special portfolio that lies on the inefficient frontier.
We rewrite the traditional CAPM model to:

How to test the Zero-Beta CAPM

-	γ (gamma) = the return on the zero-covariance (zero-beta) portfolio

Black's Zero-Beta version of CAPM implies that the expected return vector is

-	(ι − β)·γ + β × the expected return on the market

●	where γ is the return of the zero-beta portfolio with the market (treated as an unobserved quantity)
●	ι is an (N × 1) vector of ones

If we reject, it means that we reject Black's version of the CAPM model.
‭AR(p) model:‬

‭AR(1) model:‬

‭MA(q) model:‬

‭MA(1) model:‬

‭Combine AR(p) and MA(q) to ARMA:‬

‭ARMA (1,1)‬

‭Random walk with drift‬

‭Rewritten to‬‭Stochastic non-stationarity‬‭(where‬ϕ ‭> 1)‬
