2022 LI QuantMethods

Last Revised: 08/03/2021
MarkMeldrum.com
Level I - Quantitative Methods

Readings
The Time Value of Money
Organizing, Visualizing, and Describing Data
Probability Concepts
Common Probability Distributions
Sampling and Estimation
Hypothesis Testing
Introduction to Linear Regression
Reviews
This document should be used in conjunction with the corresponding readings in the 2022 Level I CFA® Program curriculum.
Some of the graphs, charts, tables, examples, and figures are copyright 2022, CFA Institute. Reproduced and republished with
permission from CFA Institute. All rights reserved.
Required disclaimer: CFA Institute does not endorse, promote, or warrant accuracy or quality of the products or services
offered by MarkMeldrum.com. CFA Institute, CFA®, and Chartered Financial Analyst® are trademarks owned by CFA
Institute.
© markmeldrum.com. All rights reserved.

The Time Value of Money
a. interpret interest rates as required rates of return, discount rates, or opportunity
costs;
b. explain an interest rate as the sum of a real risk-free rate and premiums that
compensate investors for bearing distinct types of risk;
c. calculate and interpret the effective annual rate, given the stated annual interest
rate and the frequency of compounding;
d. calculate the solution for time value of money problems with different
frequencies of compounding;
e. calculate and interpret the future value (FV) and present value (PV) of a single
sum of money, an ordinary annuity, an annuity due, a perpetuity (PV only), and
a series of unequal cash flows;
f. demonstrate the use of a time line in modeling and solving time value of money
problems.
Time Value of Money (LOS a: interpret; Pg-1)
- 3 rules of money:
1. money sooner is worth more than money later
2. larger cash flows are worth more than smaller cash flows
3. less risky cash flows are worth more than risky cash flows
- Interest rates (r) - can be thought of in 3 ways:
1/ required rate of return: the rate of return required by an investor or lender
\( \text{money}_{today} \times (1 + r) = \text{money}_{tomorrow} \)
2/ discount rate: the rate at which some future value is discounted to arrive at a value today
\( \text{money}_{today} = \dfrac{\text{money}_{tomorrow}}{(1 + r)} \)
3/ opportunity cost: the value an investor or lender forgoes by choosing a particular action
i.e. r is the opportunity cost of current consumption
- typically: req. rate of return = discount rate = opportunity cost
suppose I lend you $1000 for one year. I will want: (LOS b: explain; Pg-2)
rf, real risk-free rate: single period rate
+ inflation premium: compensates for expected inflation (\( \pi^e \))
\( r_f + \pi^e \approx \) nominal risk-free rate; exactly, \( [(1 + r_f)(1 + \pi^e)] - 1 \)
+ Default risk premium: compensates for credit risk
+ Liquidity premium: risk of loss vs. fair value if an investment needs to be converted to cash quickly
+ Maturity premium: greater interest rate risk (i.e. price risk) with longer maturities
- will also have a premium for inflation uncertainty: the longer the time period, the more uncertain we are about the level of expected inflation
Future Value of a Single Cash Flow/ (LOS e: calculate, interpret; Pg-3)
PV = present value (at t = 0), FV = future value (at t = N)
r = interest rate, N = # of periods; r must be in the same periodicity as N
\( FV = PV(1 + r)^N \)
e.g./ PV = 100 at t = 0, r = 5%
t = 1: \( FV = 100(1.05) = 105 \)
t = 2: \( FV = 100(1.05)(1.05) = 100(1.05)^2 = 110.25 \)
e.g./ $100, 5 yrs., 6%
annual: \( 100(1.06)^5 \)
semi-annual: \( 100(1.03)^{10} \)
quarterly: \( 100(1.015)^{20} \)
- the power of compounding/ $1000 invested at age 25, FV = ? at age 65 (SI = simple interest over the same 40 yrs.)
at 5%, N = 40: \( FV = 1000(1.05)^{40} = 7{,}039.99 \) (SI: 2000)
at 7%, N = 40: \( FV = 1000(1.07)^{40} = 14{,}974.46 \) (SI: 2800)
at 9%, N = 40: \( FV = 1000(1.09)^{40} = 31{,}409.42 \) (SI: 3600)
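As a quick numerical check of FV = PV(1 + r)^N, the sketch below recomputes the compounding figures and compares them with simple interest. The function names `future_value` and `simple_interest` are illustrative choices, not part of the notes.

```python
def future_value(pv: float, rate: float, periods: int) -> float:
    """FV = PV * (1 + r)^N; rate and periods must share the same periodicity."""
    return pv * (1 + rate) ** periods


def simple_interest(pv: float, rate: float, periods: int) -> float:
    """Interest earned with no compounding: PV * r * N."""
    return pv * rate * periods


if __name__ == "__main__":
    # $1000 invested for 40 years at 5%, 7%, and 9%
    for r in (0.05, 0.07, 0.09):
        fv = future_value(1000, r, 40)
        si = simple_interest(1000, r, 40)
        print(f"r = {r:.0%}: FV = {fv:,.2f}, simple interest = {si:,.0f}")
```

The gap between FV and simple interest widens quickly as r rises, which is the point of the "power of compounding" comparison.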
e.g./ $5M at t = 0, r = 7% compounded annually, N = 5 years (LOS e: calculate, interpret; Pg-4)
Find FV:
method 1: \( 5{,}000{,}000(1.07)^5 = 7{,}012{,}758.65 \)
method 2: N = 5, I/Y = 7, PV = -5,000,000, PMT = 0, CPT FV = 7,012,758.65
e.g. 2/ Invest ¥2.5M, r = 8% compounded annually, N = 6 years
Find FV:
method 1: \( 2{,}500{,}000(1.08)^6 = 3{,}967{,}186 \) (¥)
method 2: N = 6, I/Y = 8, PMT = 0, PV = -2,500,000, CPT FV = 3,967,186
e.g. 3/ $10M at t = 5, E(r) = 9%, FV at t = 15?
timeline: t = 0, 5, 15; the 10M arrives at t = 5, so N = 15 - 5 = 10
method 1: \( FV = 10{,}000{,}000(1.09)^{10} = 23{,}673{,}636.75 \)
method 2: N = 10, I/Y = 9, PMT = 0, PV = -10,000,000, CPT FV = 23,673,636.75
value at t = 0: \( PV = \dfrac{10{,}000{,}000}{(1.09)^5} = 6{,}499{,}313.86 \)
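Frequency of Compounding/ (LOS d: solve; Pg-5)
- all rates are quoted annually: rs (stated interest rate)
\( FV = PV\left(1 + \dfrac{r_s}{m}\right)^{m \times N} \), m = # of compounding periods per year
e.g./ PV = 10,000, N = 2, rs = 8% compounded quarterly, FV = ?
\( FV = 10{,}000\left(1 + \dfrac{.08}{4}\right)^{4 \times 2} = 10{,}000(1.02)^8 = 11{,}716.59 \)
or/ N = 2 × 4 = 8, I/Y = 8/4 = 2, PMT = 0, PV = -10,000, CPT FV = 11,716.59
e.g. 2/ PV = $1M, N = 1, r = 6% compounded monthly, FV = ?
\( FV = 1{,}000{,}000\left(1 + \dfrac{.06}{12}\right)^{12 \times 1} = 1{,}000{,}000(1.005)^{12} = 1{,}061{,}677.81 \)
or/ N = 1 × 12 = 12, I/Y = 6/12 = .5, PMT = 0, PV = -1,000,000, CPT FV = 1,061,677.81
Continuous Compounding/ \( FV = PV e^{r_s \times N} \)
e.g./ PV = 10,000, N = 2, r = 8% compounded continuously, FV = ?
\( FV = 10{,}000 e^{.08 \times 2} = 11{,}735.11 \)
or/ .08 × 2 = [2nd][LN] × 10,000 = 11,735.11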
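The two compounding formulas can be checked with a short sketch. The helper names `fv_periodic` and `fv_continuous` and the parameter `m` (compounding periods per year) are assumptions made for illustration.

```python
import math


def fv_periodic(pv: float, stated_rate: float, years: float, m: int) -> float:
    """FV with m compounding periods per year: PV * (1 + rs/m)^(m * N)."""
    return pv * (1 + stated_rate / m) ** (m * years)


def fv_continuous(pv: float, stated_rate: float, years: float) -> float:
    """FV with continuous compounding: PV * e^(rs * N)."""
    return pv * math.exp(stated_rate * years)


if __name__ == "__main__":
    print(f"{fv_periodic(10_000, 0.08, 2, 4):,.2f}")      # quarterly, 2 years
    print(f"{fv_periodic(1_000_000, 0.06, 1, 12):,.2f}")  # monthly, 1 year
    print(f"{fv_continuous(10_000, 0.08, 2):,.2f}")       # continuous, 2 years
```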
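EAR: effective annual rate (LOS c: calculate, interpret; Pg-6)
$100 at 8%:
compounding    value after 1 yr.                 EAR
annual         100(1.08) = 108                   8%
semi-annual    100(1.04)^2 = 108.16              8.16%
quarterly      100(1.02)^4 = 108.2432            8.2432%
monthly        100(1.00666...)^12 = 108.30       8.3%
daily          100(1.000219)^365 = 108.3278      8.3278%
continuous     100e^.08 = 108.3287               8.3287%
\( EAR = \left(1 + \dfrac{r_s}{m}\right)^m - 1 \) or, with continuous compounding, \( EAR = e^{r_s} - 1 \)
- if we know EAR, we can solve for rs
e.g./ EAR of 8.3%, compounded monthly
\( .083 = \left(1 + \dfrac{r_s}{12}\right)^{12} - 1 \)
\( (1.083)^{1/12} - 1 = \dfrac{r_s}{12} \)
\( r_s = 12\left[(1.083)^{1/12} - 1\right] = 8\% \)
calculator: 1.083 [y^x] (1 ÷ 12) - 1 × 12 =
e.g. 2/ EAR of 8.3287%, compounded continuously
\( .083287 = e^{r_s} - 1 \)
\( 1.083287 = e^{r_s} \)
\( \ln(1.083287) = r_s = 8\% \)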
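A minimal sketch of the EAR conversions in both directions. All four function names are hypothetical; `m` is the number of compounding periods per year.

```python
import math


def ear(stated_rate: float, m: int) -> float:
    """EAR = (1 + rs/m)^m - 1."""
    return (1 + stated_rate / m) ** m - 1


def ear_continuous(stated_rate: float) -> float:
    """EAR = e^rs - 1."""
    return math.exp(stated_rate) - 1


def stated_from_ear(ear_value: float, m: int) -> float:
    """Invert the periodic formula: rs = m * [(1 + EAR)^(1/m) - 1]."""
    return m * ((1 + ear_value) ** (1 / m) - 1)


def stated_from_ear_continuous(ear_value: float) -> float:
    """Invert the continuous formula: rs = ln(1 + EAR)."""
    return math.log(1 + ear_value)


if __name__ == "__main__":
    for label, m in (("annual", 1), ("semi-annual", 2), ("quarterly", 4),
                     ("monthly", 12), ("daily", 365)):
        print(f"{label:12s} EAR = {ear(0.08, m):.4%}")
    print(f"{'continuous':12s} EAR = {ear_continuous(0.08):.4%}")
```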
Future value of a series of cash flows: (LOS e: calculate, interpret; Pg-7)
annuity - a finite set of level sequential cash flows
ordinary: first cash flow at t = 1
due: first cash flow at t = 0
Ordinary Annuity/ r = 5%, 1000 paid at t = 1, 2, 3, 4, 5
\( FV = 1000(1.05)^4 + 1000(1.05)^3 + 1000(1.05)^2 + 1000(1.05)^1 + 1000 = 5{,}525.63 \)
\( FV = A\left[\dfrac{(1 + r)^N - 1}{r}\right] = 1000\left[\dfrac{(1.05)^5 - 1}{.05}\right] = 5{,}525.63 \)
or/ N = 5, PMT = -1000, I/Y = 5, PV = 0, CPT FV = 5,525.63
e.g./ €20,000/yr., N = 30, r = 9%, end of yr. CF - ordinary annuity/
\( FV = 20{,}000\left[\dfrac{(1.09)^{30} - 1}{.09}\right] = 2{,}726{,}150.77 \)
or/ N = 30, PMT = -20,000, PV = 0, I/Y = 9, CPT FV

· Annuity Due/ (calculate, interpret; Pg-8)
r = 5%, 1000 paid at t = 0, 1, 2, 3, 4
\( FV = 1000(1.05)^5 + 1000(1.05)^4 + 1000(1.05)^3 + 1000(1.05)^2 + 1000(1.05)^1 = 5{,}801.91 \)
= ordinary annuity × (1.05)
\( FV = A\left[\dfrac{(1 + r)^N - 1}{r}\right] \times (1 + r) \)
· calculator in END mode: N = 5, PMT = -1000, PV = 0, I/Y = 5, CPT FV × 1.05 = 5,801.91
· calculator in BGN mode: [2nd][PMT], [2nd][ENTER] (display toggles END / BGN), [CE/C]
N = 5, PMT = -1000, PV = 0, I/Y = 5, CPT FV
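The relationship "annuity due = ordinary annuity × (1 + r)" can be sketched as follows. The function names are illustrative, not from the notes.

```python
def fv_ordinary_annuity(pmt: float, rate: float, n: int) -> float:
    """FV at t = n of n level payments made at t = 1..n."""
    return pmt * ((1 + rate) ** n - 1) / rate


def fv_annuity_due(pmt: float, rate: float, n: int) -> float:
    """Payments at t = 0..n-1: each one compounds for one extra period."""
    return fv_ordinary_annuity(pmt, rate, n) * (1 + rate)


if __name__ == "__main__":
    print(f"{fv_ordinary_annuity(1000, 0.05, 5):,.2f}")
    print(f"{fv_annuity_due(1000, 0.05, 5):,.2f}")
    print(f"{fv_ordinary_annuity(20_000, 0.09, 30):,.2f}")
```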

· Unequal cash flows (calculate, interpret; Pg-9)
r = 5%; cash flows of 1000, 2000, 4000, 5000, 6000 at t = 1, 2, 3, 4, 5
\( FV = 1000(1.05)^4 + 2000(1.05)^3 + 4000(1.05)^2 + 5000(1.05)^1 + 6000(1.05)^0 = 19{,}190.76 \)
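Present value of a single cash flow:
\( FV = PV(1 + r)^N \Rightarrow PV = \dfrac{FV}{(1 + r)^N} = FV(1 + r)^{-N} \)
e.g./ r = 8%, 100,000 received at t = 6, PV at t = 0 = ?
\( PV = \dfrac{100{,}000}{(1.08)^6} = 63{,}016.96 \)
(N = 6, PMT = 0, I/Y = 8, FV = 100,000) CPT PV
e.g./ r = 8%, 100,000 received at t = 10, PV at t = 4 and at t = 0 = ?
\( PV_4 = \dfrac{100{,}000}{(1.08)^6} = 63{,}016.96 \)
\( PV_0 = \dfrac{63{,}016.96}{(1.08)^4} = \dfrac{100{,}000}{(1.08)^{10}} = 46{,}319.35 \)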
Present value of a series of cash flows: (LOS e: calculate, interpret; Pg-10)
· Ordinary annuity/ 1000 received at t = 1, 2, 3, 4, 5; r = 12%
\( PV = A\left[\dfrac{1 - (1 + r)^{-N}}{r}\right] = 1000\left[\dfrac{1 - (1.12)^{-5}}{.12}\right] = 3{,}604.78 \)
or/ N = 5, FV = 0, I/Y = 12, PMT = 1000, CPT PV
· Annuity Due/ 1000 received at t = 0, 1, 2, 3, 4; r = 12%
\( PV = 1000 + 1000\left[\dfrac{1 - (1.12)^{-4}}{.12}\right] = 4{,}037.35 \)
or/ N = 4, PMT = 1000, I/Y = 12, FV = 0, CPT PV, then + 1000
e.g./ $2M today or 20 PMTs of 200k at t = 0, 1, 2, ..., 19? r = 7%
N = 19, PMT = 200,000, I/Y = 7, FV = 0, CPT PV (+ 200,000) = 2,267,119.05
or/ \( PV = 200{,}000 + 200{,}000\left[\dfrac{1 - (1.07)^{-19}}{.07}\right] = 2{,}267{,}119.05 \)
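A short sketch of the two present-value annuity formulas, treating the annuity due as "one payment today plus an ordinary annuity of N - 1 payments". The function names are assumptions for illustration.

```python
def pv_ordinary_annuity(pmt: float, rate: float, n: int) -> float:
    """PV at t = 0 of n level payments received at t = 1..n."""
    return pmt * (1 - (1 + rate) ** -n) / rate


def pv_annuity_due(pmt: float, rate: float, n: int) -> float:
    """PV at t = 0 of n level payments received at t = 0..n-1."""
    return pmt + pv_ordinary_annuity(pmt, rate, n - 1)


if __name__ == "__main__":
    print(f"{pv_ordinary_annuity(1000, 0.12, 5):,.2f}")
    print(f"{pv_annuity_due(1000, 0.12, 5):,.2f}")
    # 20 payments of 200k starting today vs. $2M today, at 7%
    print(f"{pv_annuity_due(200_000, 0.07, 20):,.2f}")
```

In the last case the payment stream is worth more than the $2M lump sum at a 7% discount rate.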
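Present value of a series of cash flows: (LOS e: calculate, interpret; Pg-11)
e.g./ 30 PMTs of 1M at t = 10, 11, 12, ..., 39; r = 5%; PV0 = ?
as an ordinary annuity valued at t = 9:
\( PV_9 = 1{,}000{,}000\left[\dfrac{1 - (1.05)^{-30}}{.05}\right] = 15{,}372{,}451.03 \)
(N = 30, PMT = 1,000,000, I/Y = 5, FV = 0, CPT PV9)
\( PV_0 = \dfrac{PV_9}{(1.05)^9} = 9{,}909{,}219 \)
as an annuity due valued at t = 10:
\( PV_{10} = PV_9 \times (1.05) = 16{,}141{,}073.58 \)
(BGN mode: N = 30, PMT = 1,000,000, I/Y = 5, FV = 0, CPT PV10)
\( PV_0 = \dfrac{PV_{10}}{(1.05)^{10}} = 9{,}909{,}219 \)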
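Present value of a perpetuity: \( PV = \dfrac{A}{r} \) (LOS e: calculate, interpret; Pg-12)
· level CFs
· sequential
· infinite
e.g./ $10 at t = 1 forever, r = 20%: \( PV = \dfrac{10}{.20} = \$50 \)
e.g./ £100/yr. - perpetual, r = 5%: \( PV = \dfrac{100}{.05} = £2{,}000 \)
e.g./ £100 at t = 5, 6, 7, 8, ... forever (r = 5%)
\( PV_4 = \dfrac{100}{.05} = 2{,}000 \)
\( PV_0 = \dfrac{2{,}000}{(1.05)^4} = 1{,}645.40 \)
e.g./ 100 at t = 1, 2, 3, 4 only (r = 5%)
long perpetuity at t = 0: \( PV_0 = \dfrac{100}{.05} = 2{,}000 \)
short perpetuity at t = 4: \( PV_4 = \dfrac{100}{.05} = 2{,}000 \), \( PV_0 = \dfrac{2{,}000}{(1.05)^4} = 1{,}645.40 \)
2,000 - 1,645.40 = 354.60 = 4 yr. ord. annuity
\( PV = 100\left[\dfrac{1 - (1.05)^{-4}}{.05}\right] = 354.60 \)
(N = 4, PMT = 100, I/Y = 5, FV = 0, CPT PV)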
Present value of a series of unequal CFs: (LOS e: calculate, interpret; Pg-13)
1k, 2k, 4k, 5k, 6k at t = 1, 2, 3, 4, 5; r = 5%
\( PV = \dfrac{1000}{(1.05)} + \dfrac{2000}{(1.05)^2} + \dfrac{4000}{(1.05)^3} + \dfrac{5000}{(1.05)^4} + \dfrac{6000}{(1.05)^5} = 15{,}036.46 \)
Calculator:
2nd CF 2nd CE/C
CF0 ↓
C01 1000 ENTER ↓ ↓
C02 2000 ENTER ↓ ↓
C03 4000 ENTER ↓ ↓
C04 5000 ENTER ↓ ↓
C05 6000 ENTER ↓
NPV
I 5 ENTER
NPV CPT
15,036.46
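For unequal cash flows there is no closed-form shortcut, so each flow is discounted (or compounded) on its own. The sketch below assumes the flows arrive at t = 1, 2, ..., n; the function names are illustrative.

```python
from typing import Sequence


def present_value(cash_flows: Sequence[float], rate: float) -> float:
    """Discount each flow to t = 0; cash_flows[k] arrives at t = k + 1."""
    return sum(cf / (1 + rate) ** t
               for t, cf in enumerate(cash_flows, start=1))


def future_value_at_end(cash_flows: Sequence[float], rate: float) -> float:
    """Compound each flow to the date of the last flow (t = n)."""
    n = len(cash_flows)
    return sum(cf * (1 + rate) ** (n - t)
               for t, cf in enumerate(cash_flows, start=1))


if __name__ == "__main__":
    flows = [1000, 2000, 4000, 5000, 6000]
    print(f"PV = {present_value(flows, 0.05):,.2f}")
    print(f"FV = {future_value_at_end(flows, 0.05):,.2f}")
```

The two results are linked by FV = PV(1.05)^5, which is a useful consistency check.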
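Solve for r, N or PMT/ (LOS e: calculate, interpret; Pg-14)
solve for r: \( FV = PV(1 + r)^N \Rightarrow \dfrac{FV}{PV} = (1 + r)^N \Rightarrow r = \left(\dfrac{FV}{PV}\right)^{1/N} - 1 \)
e.g./
Year    Sales       Profit
2008    10,503      822.5
2012    14,146.4    796.4
- growth in Sales: \( 14{,}146.40 = 10{,}503(1 + g)^4 \)
\( g = \left(\dfrac{14{,}146.4}{10{,}503}\right)^{1/4} - 1 = 1.0773 - 1 = .0773 \) (7.73%)
- growth in Profit: \( 796.4 = 822.5(1 + g)^4 \)
\( g = \left(\dfrac{796.4}{822.5}\right)^{1/4} - 1 = .9920 - 1 = -.0080 \) (-.80%)
e.g./ 2007 - 8.52M units sold, 2012 - 7.35M units sold, find g
\( g = \left(\dfrac{7.35}{8.52}\right)^{1/5} - 1 = .9709 - 1 = -.0291 \) (-2.91%)
Note: g is called the compound annual growth rate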
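The compound annual growth rate calculation can be sketched in a few lines. The function name `cagr` is an illustrative choice.

```python
def cagr(begin_value: float, end_value: float, periods: int) -> float:
    """Solve FV = PV * (1 + g)^N for g."""
    return (end_value / begin_value) ** (1 / periods) - 1


if __name__ == "__main__":
    print(f"Sales  2008-2012: {cagr(10_503, 14_146.4, 4):.2%}")
    print(f"Profit 2008-2012: {cagr(822.5, 796.4, 4):.2%}")
    print(f"Units  2007-2012: {cagr(8.52, 7.35, 5):.2%}")
```

Note that N counts periods between the two dates (2008 to 2012 is 4 periods), not the number of data points.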
Solve for r, N or PMT/ (LOS e: calculate, interpret; Pg-15)
Solve for N:
e.g./ How long will it take to double €10M at 7% compounded annually?
\( FV = PV(1 + r)^N \)
\( 20 = 10(1.07)^N \)
\( (1.07)^N = 2 \)
\( N = \dfrac{\ln(FV/PV)}{\ln(1 + r)} = \dfrac{\ln(2)}{\ln(1.07)} = \dfrac{.6931}{.0677} = 10.24 \) yrs.
calculator: 20 ÷ 10 = LN ÷ (1.07 LN) =
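Solve for PMT: $100,000 mortgage, 30 yrs., 8% compounded monthly
\( PV = PMT\left[\dfrac{1 - \left(1 + \frac{r_s}{m}\right)^{-mN}}{\frac{r_s}{m}}\right] \Rightarrow PMT = \dfrac{PV}{\text{annuity factor}} \)
\( PMT = \dfrac{100{,}000}{\left[\dfrac{1 - \left(1 + \frac{.08}{12}\right)^{-12 \times 30}}{\frac{.08}{12}}\right]} = \dfrac{100{,}000}{136.2835} = 733.76 \)
or/ N = 360, PV = -100,000, FV = 0, I/Y = 8/12 = .666..., CPT PMT = 733.76
of the first PMT, 666.67 is interest (100,000 × .08/12)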
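Solving for N and for PMT can be sketched as below. The function names, and the default of `m=12` compounding periods per year, are assumptions made for illustration.

```python
import math


def periods_to_reach(pv: float, fv: float, rate: float) -> float:
    """Solve FV = PV * (1 + r)^N for N."""
    return math.log(fv / pv) / math.log(1 + rate)


def level_payment(pv: float, stated_rate: float, years: int, m: int = 12) -> float:
    """Payment that amortizes pv over years * m periods at rate stated_rate / m."""
    r = stated_rate / m
    n = years * m
    annuity_factor = (1 - (1 + r) ** -n) / r
    return pv / annuity_factor


if __name__ == "__main__":
    print(f"N to double at 7%: {periods_to_reach(10, 20, 0.07):.2f} years")
    pmt = level_payment(100_000, 0.08, 30)
    first_interest = 100_000 * 0.08 / 12
    print(f"PMT = {pmt:.2f}, of which first-month interest = {first_interest:.2f}")
```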
retirement income (LOS f: demonstrate; Pg-16)
timeline (t = 0 is age 22; assume r = 8%):
t = 1 to 15: savings of 2k each yr.
t = 16 to 40: how much to save each yr.? (PMT = ?)
t = 41 to 60 (from age 63): retirement income of 100k each yr.
1. FV15: N = 15, PMT = 2000, I/Y = 8, PV = 0, CPT FV = 54,304.23
2. PV40: N = 20, PMT = 100,000, I/Y = 8, FV = 0, CPT PV = 981,814.74
3. PV15 = 981,814.74/(1.08)^25 = 143,362.53; shortfall at t = 15: 143,362.53 - 54,304.23 = 89,058.30
N = 25, FV = 0, I/Y = 8, PV = -89,058.30, CPT PMT = 8,342.87
or/ FV40 = FV15(1.08)^25 = 371,901.17
shortfall at t = 40: 981,814.74 - 371,901.17 = 609,913.57
N = 25, FV = 609,913.57, I/Y = 8, PV = 0, CPT PMT = 8,342.87
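The retirement problem chains several annuity calculations along one time line. A minimal sketch, assuming the cash-flow dates read off the time line (2k saved at t = 1..15, the unknown payment at t = 16..40, 100k withdrawn at t = 41..60); all variable and function names are illustrative.

```python
def fv_annuity(pmt: float, rate: float, n: int) -> float:
    """FV of an ordinary annuity at the date of its last payment."""
    return pmt * ((1 + rate) ** n - 1) / rate


def pv_annuity(pmt: float, rate: float, n: int) -> float:
    """PV of an ordinary annuity one period before its first payment."""
    return pmt * (1 - (1 + rate) ** -n) / rate


RATE = 0.08

fv15 = fv_annuity(2_000, RATE, 15)        # value of early savings at t = 15
pv40 = pv_annuity(100_000, RATE, 20)      # cost of retirement income at t = 40
pv15 = pv40 / (1 + RATE) ** 25            # same cost, discounted to t = 15

# Method 1: fund the shortfall measured at t = 15
required_pmt = (pv15 - fv15) / pv_annuity(1, RATE, 25)

# Method 2: fund the shortfall measured at t = 40
fv40 = fv15 * (1 + RATE) ** 25
required_pmt_alt = (pv40 - fv40) / fv_annuity(1, RATE, 25)

if __name__ == "__main__":
    print(f"FV15 = {fv15:,.2f}, PV40 = {pv40:,.2f}, PV15 = {pv15:,.2f}")
    print(f"PMT (method 1) = {required_pmt:,.2f}")
    print(f"PMT (method 2) = {required_pmt_alt:,.2f}")
```

Both methods give the same payment because they value the same shortfall at two different dates.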
Organizing, Visualizing, and Describing Data
a. identify and compare data types;
b. describe how data are organized for quantitative analysis;
c. interpret frequency and related distributions;
d. interpret a contingency table;
e. describe ways that data may be visualized and evaluate uses of specific
visualizations;
f. describe how to select among visualization types;
g. calculate and interpret measures of central tendency;
h. evaluate alternative definitions of mean to address an investment problem;
i. calculate quantiles and interpret related visualizations;
j. calculate and interpret measures of dispersion;
k. calculate and interpret target downside deviation;
l. interpret skewness;
m. interpret kurtosis;
n. interpret correlation between two variables.

Sections/
7.5p Data Types (LOS a) identify

14p Data Summarization (LOS b, c, d) describe
16.5p Data Visualization (LOS e, f) interpret
17.5p Measures of Central Tendency (LOS g, h) calculate
6p Other Measures of Location (LOS i) interpret
10p Measures of Dispersion (LOS j, k) select
2.5p Skewness (LOS l)

interpret
4p Kurtosis (LOS m)
7p Correlation (LOS n) interpret
Data: a collection of numbers, characters, words, text that represent facts or information (LOS a: identify, compare; Pg-1)
I/ Numerical (quantitative) or Categorical (qualitative)
Categorical: values that describe a quality or characteristic
- mutually exclusive labels or groups
c) nominal (N): no logical order (e.g. sectors of the economy)
d) ordinal (O): has a logical order or rank
- no information in the distance between groups, however
Numerical: measured or counted quantities
a) Discrete (integer, I): limited to a finite number of values
b) Continuous (ratio, R): can take on any value within a range
Ex #1
II/ Cross-sectional vs. Time-series vs. Panel Data (LOS a: identify, compare; Pg-2)
Definitions:
variable: a particular quality or characteristic (e.g. stock price, height)
observation: value of a specific variable (e.g. GM at $53.30, Tom at 96 kg)
a) cross-sectional - multiple observations of a particular variable across observational units at one point in time
(stock prices of 60 companies)
b) time-series - multiple observations of a particular variable for the same observational unit over time
(GM's stock price over the last 60 months)
c) panel data - cross-sectional + time-series
data table: columns = GM, F, . . ., TSLA; rows = month t, month t-1, month t-2
- one row = cross-sectional (CS) data; one column = time-series (TS) data; the whole table = panel data
Pg-3 (LOS a: identify, compare)
III/ Structured vs. Unstructured Data
a) Structured: highly organized in a pre-defined manner (e.g. stock prices, returns, EPS)
b) Unstructured: no organized form (news, social media posts, company filings, audio/video)
- also called 'alternative data'
• produced by individuals
• generated by business processes (credit card transactions)
• generated by sensors
- to be useful in data analysis must be transformed into structured data
Ex #2
Pg-4
One-dimensional array (1 variable) LOS b
- describe
- e.g. a column of a spreadsheet (CS or TS)
Two-dimensional rectangular array (two or more variables)
- data table (CS or panel)
Var 1 Var 2 ... Var N

Comp 1 - - -
Comp 2 -
Comp m -
LOS c
- interpret
• Frequency distribution (one-way table)
- the number of observations of a specific value
or group of a variable
- sorted in ascending or descending order Exhibit #8
Pg-5 (LOS c: interpret)
• Absolute frequency - actual count of observations per value of the variable (\( \sum = N \))
• Relative frequency - %'age of observations per value of the variable (abs. freq./total N) (\( \sum = 100\% \))
- for numerical data: create non-overlapping intervals (bins)
- sort data in ascending order
- find the range: max - min
- decide on the number of intervals (k)
  too few = too much aggregation - loss of info.
  too many = not enough aggregation - too much noise
- interval width = range/k (round up always)
- Interval 1 = [min value, min value + width), and so on
e.g. [0,5) , [5,10) , [10,15] (k = 3): 0 ≤ x < 5, 5 ≤ x < 10, 10 ≤ x ≤ 15
each obs. falls into only one interval
e.g./ N = 12 (LOS c: interpret; Pg-6)
1. sort in ascending order: -4.57, -4.04, -1.64, 0.28, 1.34, 2.35, 2.38, 4.28, 4.42, 4.68, 7.16, 11.43
2. Range = 11.43 - (-4.57) = 16
3. let k = 4
4. width = range/k = 16/4 = 4
5. Intervals: [-4.57, -0.57) , [-0.57, 3.43) , [3.43, 7.43) , [7.43, 11.43]
6. absolute frequencies: 3, 4, 4, 1
• cumulative absolute frequency a sequence of partial sums

relative frequency that sum to N or 100%
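The binning steps and the cumulative sums can be sketched with the 12 observations from the example. The function name `frequency_distribution` is illustrative, and this sketch does not round the interval width up (not needed here, since 16/4 is already a whole number).

```python
from itertools import accumulate
from typing import List, Sequence


def frequency_distribution(values: Sequence[float], k: int) -> List[int]:
    """Absolute frequencies for k equal-width bins; the last bin is closed on the right."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    counts = [0] * k
    for x in values:
        # the maximum value would land in bin k, so clamp it into the last bin
        idx = min(int((x - lo) / width), k - 1)
        counts[idx] += 1
    return counts


data = [-4.57, -4.04, -1.64, 0.28, 1.34, 2.35,
        2.38, 4.28, 4.42, 4.68, 7.16, 11.43]

absolute = frequency_distribution(data, k=4)
relative = [c / len(data) for c in absolute]
cumulative_absolute = list(accumulate(absolute))   # partial sums, ends at N
cumulative_relative = list(accumulate(relative))   # partial sums, ends at 100%

if __name__ == "__main__":
    print(absolute, cumulative_absolute)
    print([f"{r:.1%}" for r in relative], [f"{c:.1%}" for c in cumulative_relative])
```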
Exhibit #11
Pg-7
• Contingency table - summarizes data for 2 or more LOS d
categorical variables - interpret
(helps visually find patterns)

2-way table = 2 variables
Rows = 5
Columns = 3
R x C table
(5 x 3)
Var 1 row totals

- marginal
frequencies
joint
frequency column totals N
(can be abs. or %’age) - marginal frequencies
• applications/ (LOS d: interpret; Pg-8)
1/ confusion matrix
2/ potential association between 2 categorical variables
- use of a 'chi-square test of independence'
\( \chi^2 = \sum \dfrac{(O - E)^2}{E} \), O = actual (observed) frequency, E = expected frequency, df = (C-1)(R-1)
\( \chi^2 = \dfrac{(73 - 80.457)^2}{80.457} + \dfrac{(183 - 175.543)^2}{175.543} + \dfrac{(26 - 18.543)^2}{18.543} + \dfrac{(33 - 40.457)^2}{40.457} = 5.38 \)
for a (2 × 2) table: df = (2-1)(2-1) = 1
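The chi-square statistic can be reproduced from the four observed counts. The sketch assumes they sit in a 2 × 2 table with rows (73, 183) and (26, 33); that layout is an inference from the expected values, which equal (row total × column total) / N under it. The function name is illustrative.

```python
from typing import List, Tuple


def chi_square_independence(observed: List[List[float]]) -> Tuple[float, int]:
    """Return (chi-square statistic, degrees of freedom) for an R x C table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n   # expected count if independent
            stat += (o - e) ** 2 / e
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df


if __name__ == "__main__":
    stat, df = chi_square_independence([[73, 183], [26, 33]])
    print(f"chi-square = {stat:.2f}, df = {df}")
```

The statistic is then compared with a chi-square critical value for the given df to decide whether to reject independence.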
Pg-9
Visualization: presentation of data in pictorial or LOS e
- describe
graphical format
- evaluate
1/ Histogram & Frequency Polygon
represents the distribution of numerical data
y-axis:
frequency
(can be frequency
absolute or polygon
relative)
x-axis intervals/values
Pg-10
2/ Bar Chart - represent the frequency distribution of LOS e
categorical data - describe
- evaluate
freq.
horizontal
1 variable
- can be vertical
100%
% Pareto stacked bar
chart chart
2 variables
grouped bar
2 variables
chart
(aka. clustered
bar chart)
3/ Tree-Map - a set of coloured rectangles

Pg-11
LOS e
to represent groups - describe
- area = %’age of group - evaluate
- green = health care (1 category)
nested rectangles market cap
(other category)
within each market cap:
more nested rectangles for another category
4/ Word Clouds
size of each word proportional
(aka. tag cloud)
to its frequency in the text
- colour can be used to display
- depicts frequency different sentiment
of unstructured
data (e.g. text)
Pg-12
5/ Line Chart
LOS e
- used to visualize ordered
- describe
observations - evaluate
- typically used for time series data
- facilitates showing changes and underlying
(aids in forecasting) trends
can show more than one time series
- adding a third characteristic

(revenue + time = line chart)
(rev. + time + EPS = bubble line
chart)
EPS+ EPS-
= green = red
Pg-13
6/ Scatter Plot LOS e
- describe
- used to visualize the joint variation - evaluate
identify
in 2 numerical values
outliers
- may be no relationship, a linear or non-linear
relationship
- scatter plot matrix
- assess for pairwise association
among many variables (Exhibit 32)
7/ Heat Maps
- contingency table with

colour-coded cells
- can also be used to visualize

the degree of correlation
among different variables
Pg-14
LOS f
- describe
Pitfalls/
1 2
1 selecting an improper
chart type - hinders
accurate interpretation
3
of data
2 Selecting data that
favours a conclusion
3 truncating the range
4 extending the range

100 10 10B
implies
vs
9B XYZ is
2x that
0 0 8B of ABC
ABC XYZ
Pg-15 (LOS g: calculate, interpret)
• measures of central tendency - specifies where data are centered
(arithmetic mean, median, mode, weighted mean, geometric mean, harmonic mean)
• measures of location - deciles, quantiles, quintiles
• measures of dispersion
descriptive statistics: population - parameters (µ, σ); sample - sample statistics (X-bar, S)
1/ Arithmetic Mean (X-bar)
\( \bar{X} = \dfrac{\sum_{i=1}^{n} X_i}{n} \)
cross-sectional mean: average sales of 50 companies
time-series mean: average sales for last 10 yrs. for GM
1/ Arithmetic Mean (LOS g: calculate, interpret; Pg-16)
property: \( \sum (X_i - \bar{X}) = 0 \)
- deviations from the mean indicate risk (variance, skew, kurtosis)
disadvantage: sensitive to outliers
e.g. 1, 2, 3, 4, 5, 6, 1000: mean = 1021/7 = 145.86, not representative of any value
Options
1/ Do nothing - appropriate if the value is legitimate and correct

- may contain meaningful information
2/ Delete - trimmed mean exclude a small %’age of lowest
and highest values
e.g. 5% trimmed mean:
- deletes top 2.5% and bottom 2.5%
Pg-17
1/ Arithmetic Mean LOS g
Options - calculate
- interpret
3/ replace with another value
95% winsorized mean top 2.5% of values replaced by
the value at which all others lie
e.g./ 100 4 obs - all 4
88 above (opposite for the bottom)
(2.5%) assigned 88
(25 - 75)
majority 12 4 obs - all 3 assigned
0 (2.5%) 12
2/ Median - middlemost value of a set of (sorted) observations
odd # of obs.: median is in position (n+1)/2, e.g. n = 11: (11+1)/2 = 6th obs.
even # of obs.: median = mean of the obs. in positions n/2 and (n+2)/2, e.g. n = 10: mean of the 5th and 6th obs.
- not affected by extreme values (outliers)
- useful for describing central tendency for a non-symmetrical distribution
3/ Mode - the most frequently occurring value in a Pg-18

LOS g
distribution
- calculate
unimodal only 1 value that is most frequent
- interpret
bi-modal two values have the highest frequency
tri-modal three …
or no mode no value occurs more frequently than any other value
(uniform distribution)
only measure of central tendency that can
be used with nominal data
For a symmetrical distribution: mode = median = mean
4/ Weighted Mean (X-bar sub-w) - common in finding RP or E(RP)
\( \bar{X}_w = \sum_{i=1}^{n} w_i X_i \) where \( \sum w_i = 1 \)
RP = WA RA + WB RB + … + WN RN
E(RP) = WA E(RA) + WB E(RB) + … + WN E(RN)
Wi > 0 = long position
Wi < 0 = short position
Pg-19 (LOS g: calculate, interpret)
4/ Weighted Mean - weights can be probabilities
RSP500 = PA RA + PB RB + PC RC where \( \sum P_i = 1 \) (A = bullish, B = neutral, C = bearish)
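5/ Geometric Mean: used with rates of change over time or to compute growth rates
\( \bar{X}_G = \sqrt[n]{X_1 X_2 \cdots X_n} \), with \( X_i \ge 0 \) for \( i = 1, 2, \ldots, n \)
or \( \ln \bar{X}_G = \dfrac{1}{n} \sum_{i=1}^{n} \ln X_i \)
for returns: \( R_G = \left[\prod_{t=1}^{T} (1 + R_t)\right]^{1/T} - 1 \) (* critical to know)
- also referred to as compounded returns
Property: geometric mean ≤ arithmetic mean
- difference between them grows as variability increases (example #10)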
5/ Geometric Mean (LOS g: calculate, interpret; Pg-20)
e.g. #1/ YR1 = 7.8%, YR2 = 6.3%, YR3 = -1.5%
arithmetic mean = (7.8% + 6.3% - 1.5%)/3 = 12.6%/3 = 4.2%
- but \( (1.042)^3 - 1 = 13.137\% \)
$1 invested would actually grow to: (1.078)(1.063)(.985) = 1.128725 (12.8725%)
\( R_G = (1.128725)^{1/3} - 1 = 4.119\% \)
check: \( (1.04119)^3 = 1.128725 \)
e.g. #2/ Beg. Sales $12M, Ending Sales $21M, N = 6 yrs.
\( g = \left(\dfrac{21}{12}\right)^{1/6} - 1 = 9.775\% \)
check: \( \$12M(1.09775)^6 = \$21M \)
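A small sketch contrasting the arithmetic and geometric mean returns for the three-year example. The function name `geometric_mean_return` is illustrative; `math.prod` requires Python 3.8 or later.

```python
import math
from typing import Sequence


def geometric_mean_return(returns: Sequence[float]) -> float:
    """[(1 + R1)(1 + R2)...(1 + RT)]^(1/T) - 1."""
    growth = math.prod(1 + r for r in returns)
    return growth ** (1 / len(returns)) - 1


returns = [0.078, 0.063, -0.015]
arithmetic = sum(returns) / len(returns)
geometric = geometric_mean_return(returns)

if __name__ == "__main__":
    print(f"arithmetic = {arithmetic:.3%}, geometric = {geometric:.3%}")
    # Compounding the geometric mean reproduces the actual ending wealth
    print(f"(1 + geometric)^3 = {(1 + geometric) ** 3:.6f}")
```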
5/ Geometric Mean Pg-21

LOS g
e.g. #3/
Sales - calculate
- interpret
line of best fit (linear regression)
[( + )( + )( + )( + )( + )] −
End.
g4 g5
Beg g3
g2
g1
time
• growth rate of an investment (constant/yr.) over multiple periods

• average single period return
.˙. forecast of returns (or growth rates) in one YR?

over multiple periods?
= −
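6/ Harmonic Mean (LOS g: calculate, interpret; Pg-22)
\( \bar{X}_H = \dfrac{n}{\sum_{i=1}^{n} (1/X_i)} \)
- gives much less weight to outliers
• appropriate for averaging ratios when the ratios are repeatedly applied to a fixed quantity to yield a variable number of units
e.g. dollar cost averaging
invest €1,000 a month for 2 months; m1 €10/sh., m2 €15/sh.
shares bought = 1,000/10 + 1,000/15 = 100 + 66.67 = 166.67
∴ avg. price = 2,000/166.67 = €12/sh.
or/ \( \bar{X}_H = \dfrac{2}{\frac{1}{10} + \frac{1}{15}} = \dfrac{2}{.1 + .0667} = 12 \)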
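The dollar-cost-averaging example can be checked against the standard library's `statistics.harmonic_mean`. Variable names are illustrative; `statistics.geometric_mean` requires Python 3.8 or later.

```python
from statistics import geometric_mean, harmonic_mean, mean

prices = [10, 15]          # price per share in month 1 and month 2
monthly_investment = 1_000

# Average cost per share from first principles: total spent / total shares
shares_bought = sum(monthly_investment / p for p in prices)
average_cost = monthly_investment * len(prices) / shares_bought

if __name__ == "__main__":
    print(f"average cost per share = {average_cost:.2f}")
    print(f"harmonic mean of prices = {harmonic_mean(prices):.2f}")
    # With two observations: arithmetic x harmonic = geometric squared
    print(mean(prices) * harmonic_mean(prices), geometric_mean(prices) ** 2)
```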
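Pg-23 (LOS h: select)
arithmetic mean × harmonic mean = (geometric mean)^2 and arithmetic > geometric > harmonic
arithmetic: when including all values, including outliers
geometric: when compounding is involved
harmonic: to avoid outliers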
Quantiles/ (LOS i: calculate, interpret)
Quartiles: 25%, 50%, 75%; interquartile range: IQR = Q3 - Q1
Quintiles: 20%, 40%, 60%, 80%
Deciles: 10%, 20%, 30% … 90%
Percentiles: 1%, 2% . . . 99%
location: \( L_y = (n + 1)\dfrac{y}{100} \) (with data in ascending order)
- as n ↑, Ly becomes more accurate
- Ly is an integer: done
- Ly is a decimal: interpolation, e.g. Ly = 6.8 lies between X6 and X7:
X6 + (X7 - X6)(.8)
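The location formula with linear interpolation can be sketched as follows. The function name `percentile` is illustrative, and the data are the 12 observations from the frequency-distribution example earlier in these notes, reused here as an assumption for demonstration.

```python
from typing import Sequence


def percentile(sorted_values: Sequence[float], y: float) -> float:
    """Value at the y-th percentile using L_y = (n + 1) * y / 100."""
    n = len(sorted_values)
    location = (n + 1) * y / 100
    lower = int(location)          # position of the observation just below
    fraction = location - lower    # how far to move toward the next one
    if lower < 1:
        return sorted_values[0]
    if lower >= n:
        return sorted_values[-1]
    x_lo = sorted_values[lower - 1]   # positions are 1-based
    x_hi = sorted_values[lower]
    return x_lo + (x_hi - x_lo) * fraction


data = sorted([-4.57, -4.04, -1.64, 0.28, 1.34, 2.35,
               2.38, 4.28, 4.42, 4.68, 7.16, 11.43])

q1 = percentile(data, 25)   # L = 3.25
q2 = percentile(data, 50)   # L = 6.5
q3 = percentile(data, 75)   # L = 9.75
iqr = q3 - q1

if __name__ == "__main__":
    print(f"Q1 = {q1:.3f}, median = {q2:.3f}, Q3 = {q3:.3f}, IQR = {iqr:.3f}")
```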
Pg-24 (LOS i: calculate, interpret)
Box & whisker plot
upper fence = upper bound + (1.5 x IQR)
lower fence = lower bound - (1.5 x IQR)
uses/ rank performance of portfolios and investment managers

in terms of percentile/quartile in which they fall
investment research bottom return decile short long/short
top return decile long HF
(more at L3)
Dispersion the variability around the central tendency Pg-25

LOS j
- calculate
- measures of absolute dispersion - interpret
1/ Range = max value - min value (56 - 12 = 44)
or max value to min value (ranges from 56 to 12)
- uses only 2 observations
- tells us nothing about the shape of the distribution however
2/ Mean Absolute Deviation (MAD)
\( MAD = \dfrac{\sum_{i=1}^{n} |X_i - \bar{X}|}{n} \) - uses all the observations
3/ Variance and standard deviation
\( s^2 = \dfrac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1} \), df = n - 1
Let n = 10. If we know X-bar, we can only take 9 at random, the 10th is constrained.
∴ we do not have n independent variables, we have n - 1
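Pg-26 (LOS j: calculate, interpret)
3/ Variance and standard deviation
\( s^2 = \dfrac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1} \) - variance of X
- measured in units squared (e.g. %²)
\( s = \sqrt{s^2} = \sqrt{\dfrac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}} \) - sd of X
- expressed in the same units of measurement as the mean (\( \sqrt{\%^2} = \% \))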
- for , recall that ≈ − application: BSM

−
= =
= √ − −
=
√
(Level 2 lookahead)
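The three absolute dispersion measures can be compared on one small sample. The data below are hypothetical values chosen for easy arithmetic, and the helper names are illustrative; the results are cross-checked against Python's `statistics` module.

```python
import statistics
from math import sqrt
from typing import Sequence


def mean_absolute_deviation(values: Sequence[float]) -> float:
    """Average absolute distance from the mean (divides by n)."""
    m = sum(values) / len(values)
    return sum(abs(x - m) for x in values) / len(values)


def sample_variance(values: Sequence[float]) -> float:
    """Sum of squared deviations divided by n - 1 (degrees of freedom)."""
    m = sum(values) / len(values)
    return sum((x - m) ** 2 for x in values) / (len(values) - 1)


sample = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical observations, mean = 5

mad = mean_absolute_deviation(sample)
var = sample_variance(sample)
sd = sqrt(var)

if __name__ == "__main__":
    print(f"range = {max(sample) - min(sample)}")
    print(f"MAD = {mad:.4f}, variance = {var:.4f}, sd = {sd:.4f}")
```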
Pg-27 (LOS k: calculate, interpret)
Target Downside Deviation - only concerned with downside risk
Target Semideviation: a measure of dispersion below the target (B)
\( s_{Target} = \sqrt{\dfrac{\sum_{\forall X_i \le B} (X_i - B)^2}{n - 1}} \)
- full n, not just n of X < B
e.g. X = 10, 8, 6, 4, 2, 0; let B = 5
numerator = (5 - 5)^2 + (5 - 5)^2 + (5 - 5)^2 + (4 - 5)^2 + (2 - 5)^2 + (0 - 5)^2 (obs. above B contribute zero)
- as B ↑, s_Target ↑ (example 18 and 19)
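A minimal sketch of target semideviation using the six observations from the example (10, 8, 6, 4, 2, 0). The function name is illustrative. Only observations at or below the target enter the sum, but the divisor uses the full n.

```python
from math import sqrt
from typing import Sequence


def target_semideviation(values: Sequence[float], target: float) -> float:
    """sqrt( sum over X_i <= B of (X_i - B)^2 / (n - 1) ), with n = all observations."""
    n = len(values)
    downside = sum((x - target) ** 2 for x in values if x <= target)
    return sqrt(downside / (n - 1))


observations = [10, 8, 6, 4, 2, 0]

if __name__ == "__main__":
    for b in (3, 5, 7):
        print(f"B = {b}: target semideviation = {target_semideviation(observations, b):.4f}")
```

Raising the target pulls more observations below it and enlarges each shortfall, so the measure increases with B.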
Coefficient of Variation: measure of relative dispersion
\( CV = \dfrac{s}{\bar{X}} \)
e.g./ for returns, CV measures the risk per unit of return
Pg-28 (LOS k: calculate, interpret)
- allows for direct comparisons of dispersion across different data sets
e.g./ = vs. = which one has greater

= = dispersion?
= =. = =. lower = less dispersion
Skew LOS l
- interpret
Skew = 0
Skew > 0 Skew < 0
mean = median = mode mean > median > mode mean < median < mode
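Pg-29 (LOS l: interpret)
\( \text{Skewness} \approx \dfrac{1}{n} \dfrac{\sum_{i=1}^{n} (X_i - \bar{X})^3}{s^3} \) for large n
Kurtosis (LOS m: interpret)
\( K_E = \dfrac{1}{n} \dfrac{\sum_{i=1}^{n} (X_i - \bar{X})^4}{s^4} - 3 \) (excess kurtosis)
leptokurtic (k > 3): K_E > 0, more weight in the tails than the normal distribution
mesokurtic (k = 3): K_E = 0
platykurtic (k < 3): K_E < 0, less weight in the tails than the normal distribution
(Exhibit 50 + Example 21) - good exam Q.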
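Pg-30 (LOS n: interpret)
Covariance: the joint variability of 2 random variables
\( s_{XY} = \dfrac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{n - 1} \)
- expressed in units of X times units of Y (e.g. %² for two return series), so its size is hard to interpret
- > 0 when they covary together: (X - X-bar) > 0 when (Y - Y-bar) > 0, and (X - X-bar) < 0 when (Y - Y-bar) < 0
- determines the sign of rXY
Correlation: measures the linear association between 2 variables
\( r_{XY} = \dfrac{s_{XY}}{s_X s_Y} \)
Properties: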
1/ -1 ≤ r ≤ 1 maximum
2/ r = 0 implies no linear relationship diversification
perfect
3/ r = 1 perfect positive correlation
replication
4/ r = -1 perfect negative correlation
Example #22 perfect hedge
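Sample covariance and correlation can be computed directly from their definitions. The two series below are hypothetical, and the function names are illustrative.

```python
from math import sqrt
from typing import Sequence


def sample_covariance(x: Sequence[float], y: Sequence[float]) -> float:
    """Sum of cross-products of deviations, divided by n - 1."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)


def correlation(x: Sequence[float], y: Sequence[float]) -> float:
    """Covariance scaled by both standard deviations; always in [-1, 1]."""
    return sample_covariance(x, y) / sqrt(
        sample_covariance(x, x) * sample_covariance(y, y))


x = [1, 2, 3, 4, 5]   # hypothetical observations
y = [2, 4, 5, 4, 5]

if __name__ == "__main__":
    print(f"cov = {sample_covariance(x, y):.4f}, r = {correlation(x, y):.4f}")
    print(f"perfect positive: {correlation(x, [2 * a + 1 for a in x]):.1f}")
    print(f"perfect negative: {correlation(x, [-3 * a for a in x]):.1f}")
```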
Pg-31
LOS n
Limitations/
- interpret
1/ Linear association only

2/ Unreliable when outliers are present
3/ correlation does not imply causation
spurious correlation
chance relationship
x and y may have resulted from a process
involving a third variable
x and y may be related to a third variable
(ice cream sales and crime) (height and vocabulary)
Probability Concepts
a. define a random variable, an outcome, and an event;
b. identify the two defining properties of probability including mutually exclusive

and exhaustive events, and compare and contrast empirical, subjective, and a
priori probabilities;
c. describe the probability of an event in terms of odds for and against the event;
d. calculate and interpret conditional probabilities;
e. demonstrate the application of the multiplication and addition rules for

probability;
f. compare and contrast dependent and independent events;
g. calculate and interpret an unconditional probability using the total probability

rule;
h. calculate and interpret the expected value, variance, and standard deviation of
random variables;
i. explain the use of conditional expectation in investment applications;
j. interpret a probability tree and demonstrate its application to investment

problems;
k. calculate and interpret the expected value, variance, standard deviation,

covariances, and correlations of portfolio returns;
l. calculate and interpret covariance of portfolio returns using the joint probability
function;
m. calculate and interpret an updated probability using Bayes’ formula;
n. identify the most appropriate method to solve a particular counting problem

and solve counting problems using factorial, combination, and permutation
concepts.
LOS a-c Probability Concepts and Odds Ratios (define, identify, describe)
(5p)
LOS d-g Conditional and Joint Probability (calculate/interpret, demonstrate,
(12p) compare, contrast)
LOS h-j Expected Value, Variance and Conditional (calculate/interpret,
(6.5p) measures of Expected Value and Variance explain)
LOS k Expected Value, Variance, Standard Deviation,
(6p) Covariance and Correlation of Portfolio Returns
LOS L Covariance of a Joint Probability Function calculate
(2.5p) interpret
LOS m Bayes’ Formula
(6p)
LOS n Principles of Counting (identify)
(5p)
Page 1 (LOS a: define)
Random variable - a quantity whose future outcomes are uncertain (e.g. returns)
Outcome - a possible value of a random variable (e.g. 4.3%)
Event - a specified set of outcomes e.g.: A = (rP < 10%), B = (rP ≥ 10%)
P(A) + P(B) = 100%
if an event is impossible: P(E) = 0%
if an event is certain: P(E) = 100%
(in either case: not really random)
Page 2 (LOS b: identify, compare, contrast)
Property 1: \( 0 \le P(E_i) \le 1 \)
Property 2: \( \sum P(E_i) = 1 \), where the \( E_i \) are
1/ mutually exclusive (if one happens, another can't)
2/ exhaustive (covers all possible outcomes)
How are probabilities estimated?
1/ empirical probabilities: based on historical observation
- past is assumed to be representative of the future
- historical period must include occurrences of the event
- P(E) = (# of occurrences of E)/(total # of observations)
2/ subjective probabilities: adjust an empirical probability based on intuition or experience
- when there is a lack of empirical observations
- to make a personal assessment
Page 3
LOS b
How are probabilities estimated? - identify
3/ a priori probabilities - arriving at a conclusion - compare
based on deductive reasoning - contrast
e.g. P(1) = 1/6 (roll a die, get a 1)
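Odds (LOS c: describe)
Odds for E: \( \dfrac{P(E)}{1 - P(E)} \), expressed as a to b; probability \( P(E) = \dfrac{a}{a + b} \)
Odds against E: \( \dfrac{1 - P(E)}{P(E)} \), expressed as b to a; probability \( P(E) = \dfrac{a}{a + b} \)
e.g./
P(E) = 1/8: odds for = 1 to 7 - for each occurrence of E, we expect 7 non-occurrences
P(A) = 3/17: odds for = 3 to 14 - for every 3 occurrences of A, we expect 14 non-occurrences
Page 4
from odds to probability:
for: 1 to 4, \( P(E) = \dfrac{1}{1 + 4} = .20 \)
against: 4 to 1, \( P(E) = \dfrac{1}{4 + 1} = .20 \)
e.g./ Wager: A = win, B = loss (mutually exclusive, exhaustive)
Odds for = 1 to 15; Odds against = 15 to 1
P(A) = 1/16, P(B) = 15/16; \( 0 \le P(E_i) \le 1 \), \( \sum P(E_i) = 1 \)
win: $1 bet pays $16, profit = $15; loss: lose $1
∴ expected profit = \( \dfrac{1}{16}(15) + \dfrac{15}{16}(-1) = \dfrac{15}{16} - \dfrac{15}{16} = 0 \)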
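Converting between odds and probabilities is easiest to verify with exact fractions. The function names below are illustrative, not from the notes.

```python
from fractions import Fraction
from typing import Tuple


def odds_for(probability: Fraction) -> Tuple[int, int]:
    """Return (a, b) such that the odds for the event are 'a to b'."""
    ratio = probability / (1 - probability)
    return ratio.numerator, ratio.denominator


def prob_from_odds_for(a: int, b: int) -> Fraction:
    """Odds for of 'a to b' imply P(E) = a / (a + b)."""
    return Fraction(a, a + b)


def prob_from_odds_against(b: int, a: int) -> Fraction:
    """Odds against of 'b to a' imply P(E) = a / (a + b)."""
    return Fraction(a, a + b)


# Fair wager: odds for winning are 1 to 15, a $1 bet earns a $15 profit
p_win = prob_from_odds_for(1, 15)
expected_profit = p_win * 15 + (1 - p_win) * (-1)

if __name__ == "__main__":
    print(odds_for(Fraction(1, 8)), odds_for(Fraction(3, 17)))
    print(prob_from_odds_for(1, 4), prob_from_odds_against(4, 1))
    print(f"expected profit = {expected_profit}")
```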
Page 5 (LOS d: calculate, interpret)
P(A): unconditional probability
P(A|B): conditional probability (prob. A given B)
\( P(A|B) = \dfrac{P(AB)}{P(B)} \), where P(AB) is called a joint probability: the probability of A and B occurring (A ∧ B)
e.g./ P(B) = .5, P(AB) = .1
∴ \( P(A|B) = \dfrac{.1}{.5} = 20\% \)
Multiplication Rule: \( P(AB) = P(A|B)P(B) \) (LOS e: demonstrate)
ex. #3: A = YR1 Winner, A^C = YR1 Loser, B = YR2 Winner, B^C = YR2 Loser
Page 6
Tree/
P(A) = .50: P(B|A) = .66, P(B^C|A) = .34
P(A^C) = .50: P(B|A^C) = .34, P(B^C|A^C) = .66
calculate: P(AB) = P(B|A)P(A) = .66 x .5 = 0.33
Addition Rule: P(A or B) = P(A ∨ B) = P(A) + P(B) - P(AB)
- subtracting P(AB) removes the double counting of the overlap between A and B
Page 7 (LOS e: demonstrate)
e.g./ A = $10, P(A) = 0.35; B = $9.75, P(B) = 0.25; here P(AB) = .25
find: P(A or B) = P(A) + P(B) - P(AB)
= .35 + .25 - .25
= .35
(LOS f: compare, contrast)
Independent event: 2 events are independent iff
P(A|B) = P(A) or P(B|A) = P(B) (knowing B tells us nothing about A)
∴ P(AB) = P(A)P(B), e.g. P(A|B) = P(A) = 50%
Dependent event: P(A) is related to P(B)
e.g. A = stock ABC rises, B = SnP500 rises: A is most likely dependent upon B (ex. #6/7)
Total Probability Rule: \( P(A) = P(A|S)P(S) + P(A|S^C)P(S^C) \) (LOS g: calculate, interpret; Page 8)
- holds when the conditioning events (S, S^C) are mutually exclusive and exhaustive
- each term is a joint probability from the multiplication rule: \( P(AS) = P(A|S)P(S) \), so \( P(A) = P(AS) + P(AS^C) \)
- unconditional probability = sum of (conditional probability × probability of the conditioning event)
Page 9 (LOS g: calculate, interpret)
ex. #9: P(S) = 0.55, P(S^C) = 0.45, P(A) = 0.55, P(A|S^C) = 0.40; find P(A|S) = X
\( P(A) = P(A|S)P(S) + P(A|S^C)P(S^C) \)
.55 = .55X + .40(.45)
.55 = .55X + .18
.37 = .55X
X = .37/.55 = .6727
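The total probability rule, and the rearrangement used to back out a missing conditional probability, can be sketched as below. The function names and the generic labels A and S are assumptions for illustration.

```python
from typing import Sequence


def total_probability(conditionals: Sequence[float],
                      scenario_probs: Sequence[float]) -> float:
    """P(A) = sum of P(A | S_i) * P(S_i) over mutually exclusive, exhaustive S_i."""
    if abs(sum(scenario_probs) - 1.0) > 1e-9:
        raise ValueError("scenario probabilities must sum to 1")
    return sum(c * p for c, p in zip(conditionals, scenario_probs))


def solve_missing_conditional(p_a: float, p_s: float,
                              p_a_given_not_s: float) -> float:
    """Rearrange P(A) = P(A|S)P(S) + P(A|S^C)P(S^C) for P(A|S)."""
    return (p_a - p_a_given_not_s * (1 - p_s)) / p_s


if __name__ == "__main__":
    x = solve_missing_conditional(p_a=0.55, p_s=0.55, p_a_given_not_s=0.40)
    print(f"P(A|S) = {x:.4f}")
    # Plugging the answer back in recovers the unconditional probability
    print(f"P(A) = {total_probability([x, 0.40], [0.55, 0.45]):.2f}")
```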
(LOS h: calculate, interpret)
Recall: \( \bar{X}_w = \sum w_i X_i = w_1 X_1 + w_2 X_2 + \dots + w_n X_n \)
- the weights could be probabilities
Page 10
Expected Value of a random variable is the probability-weighted average of the possible outcomes
(expected - what we expect the true value to be or what we expect the future value to be)
\( E(X) = \sum P(X_i) X_i \) (probability × value of X)
e.g./
EPS     P(EPS)
2.60    .15
2.45    .45
2.20    .24
2.00    .16
E(EPS) = .15(2.60) + .45(2.45) + .24(2.20) + .16(2.00) = 2.34
Recall: averages have variances (measures of dispersion)

- since an expected value is a probability-weighted
average, it has a probability-weighted variance
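Page 11 (LOS h: calculate, interpret)
Recall: \( s^2 = \dfrac{\sum (X_i - \bar{X})^2}{n - 1} = \dfrac{1}{n - 1}(X_1 - \bar{X})^2 + \dfrac{1}{n - 1}(X_2 - \bar{X})^2 + \dots + \dfrac{1}{n - 1}(X_n - \bar{X})^2 \)
- the 1/(n - 1) terms are weights: make them probabilities
\( \sigma^2(X) = \sum P(X_i)[X_i - E(X)]^2 = E(X^2) - [E(X)]^2 \)
previous example:
\( \sigma^2(EPS) = .15(2.60 - 2.34)^2 + .45(2.45 - 2.34)^2 + .24(2.20 - 2.34)^2 + .16(2.00 - 2.34)^2 = .0388 \)
\( \sigma(EPS) = \sqrt{.0388} = .197 \)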
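The probability-weighted mean and variance of the EPS example can be recomputed with a short sketch. The function names are illustrative.

```python
from math import sqrt
from typing import Sequence


def expected_value(outcomes: Sequence[float], probs: Sequence[float]) -> float:
    """E(X) = sum of P(X_i) * X_i."""
    return sum(p * x for x, p in zip(outcomes, probs))


def variance(outcomes: Sequence[float], probs: Sequence[float]) -> float:
    """Var(X) = sum of P(X_i) * (X_i - E(X))^2."""
    mu = expected_value(outcomes, probs)
    return sum(p * (x - mu) ** 2 for x, p in zip(outcomes, probs))


eps = [2.60, 2.45, 2.20, 2.00]
prob = [0.15, 0.45, 0.24, 0.16]

mu = expected_value(eps, prob)
var = variance(eps, prob)

if __name__ == "__main__":
    print(f"E(EPS) = {mu:.4f}, variance = {var:.6f}, sd = {sqrt(var):.4f}")
    # Equivalent shortcut: E(X^2) - [E(X)]^2
    print(f"{expected_value([x * x for x in eps], prob) - mu ** 2:.6f}")
```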
Page 12 (LOS i: explain)
conditional expected value: \( E(X|S) = \sum P(X_i|S) X_i \) (probability × value of X)
total probability rule for expected value:
\( E(X) = E(X|S_1)P(S_1) + E(X|S_2)P(S_2) + \dots + E(X|S_n)P(S_n) \)
(LOS j: interpret, demonstrate)
tree, E(EPS) = 2.34:
r↓ (.60): EPS = 2.60 with prob. .25 (.6 x .25 = .15); EPS = 2.45 with prob. .75 (.6 x .75 = .45)
r-unch (.40): EPS = 2.20 with prob. .60 (.4 x .6 = .24); EPS = 2.00 with prob. .40 (.4 x .4 = .16)
But: what is:
1/ E(EPS | r↓) = .25(2.60) + .75(2.45) = 2.4875
2/ E(EPS | r-unch) = .60(2.20) + .40(2.00) = 2.12
E(EPS) = .60(2.4875) + .40(2.12) = 2.34
Page 13 (LOS j: interpret, demonstrate)
same tree, E(EPS) = 2.34:
E(EPS | r↓) = 2.4875
\( \sigma^2(EPS \mid r↓) = .25(2.60 - 2.4875)^2 + .75(2.45 - 2.4875)^2 = .004219 \)
E(EPS | r-unch) = 2.12
\( \sigma^2(EPS \mid r\text{-unch}) = .60(2.20 - 2.12)^2 + .40(2.00 - 2.12)^2 = .0096 \)
(Example #12)
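⟹ Portfolio Returns: E(RP), σ²(RP), Cov(Ri,Rj), ρ(Ri,Rj) (LOS k: calculate, interpret)
1/ E(RP) = E(W1R1 + W2R2 + … + WnRn) = W1E(R1) + W2E(R2) + … + WnE(Rn)
- each Ri is also a random variable:
E(R1) = P(R11)R11 + P(R12)R12 + … + P(R1n)R1n (probability × possible value of R1)
Page 14
e.g./
               W      E(Ri)
SnP500         .50    13%
Corp. bonds    .25    6%
MSCI EAFE      .25    15%
E(RP) = .5(13%) + .25(6%) + .25(15%) = 11.75% - measure of expected reward
2/ \( \sigma^2(R_P) = E\{[R_P - E(R_P)]^2\} = \sum_i \sum_j w_i w_j Cov(R_i, R_j) \)
where \( Cov(R_i, R_j) = E\{[R_i - E(R_i)][R_j - E(R_j)]\} \)
to calculate portfolio variance, need:
1/ all E(Ri)
2/ all Cov(Ri,Rj)
assume 3 assets R1, R2, R3 (Exhibit #11):
\( \sigma^2(R_P) = w_1^2\sigma^2(R_1) + w_2^2\sigma^2(R_2) + w_3^2\sigma^2(R_3) + 2w_1w_2Cov(R_1,R_2) + 2w_1w_3Cov(R_1,R_3) + 2w_2w_3Cov(R_2,R_3) \)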
Page 15 (LOS k: calculate, interpret)
σ²(RP) = f(variances, covariances)
- variances: always > 0; covariances: can be < 0 or > 0
- major point - by selecting assets with zero or negative covariance, portfolio risk is lowered
- for n securities (or asset classes), e.g. n = 5:
n variances: 5 vars.
n² - n covariances: 25 - 5 = 20 covars.
(n² - n)/2 distinct covariances: 10 unique covars.
3/ Correlation: \( \rho(R_i, R_j) = \dfrac{Cov(R_i, R_j)}{\sigma(R_i)\sigma(R_j)} \) (Ex #13)
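Portfolio expected return and variance follow directly from the weights and the covariance matrix. The weights and expected returns below come from the three-asset example; the covariance matrix is a hypothetical set of values supplied only for illustration (the notes refer to an exhibit whose numbers are not reproduced here), and the function names are assumptions.

```python
from math import sqrt
from typing import List, Sequence


def portfolio_expected_return(weights: Sequence[float],
                              expected: Sequence[float]) -> float:
    """E(Rp) = sum of w_i * E(R_i)."""
    return sum(w * e for w, e in zip(weights, expected))


def portfolio_variance(weights: Sequence[float],
                       cov: List[List[float]]) -> float:
    """Var(Rp) = sum over i, j of w_i * w_j * Cov(R_i, R_j)."""
    n = len(weights)
    return sum(weights[i] * weights[j] * cov[i][j]
               for i in range(n) for j in range(n))


weights = [0.50, 0.25, 0.25]        # SnP500, Corp. bonds, MSCI EAFE
expected = [13.0, 6.0, 15.0]        # in percent

# Hypothetical covariance matrix (units: percent squared); the diagonal holds variances
cov = [[400.0, 45.0, 189.0],
       [45.0, 81.0, 38.0],
       [189.0, 38.0, 441.0]]

exp_return = portfolio_expected_return(weights, expected)
variance = portfolio_variance(weights, cov)
corr_stocks_bonds = cov[0][1] / (sqrt(cov[0][0]) * sqrt(cov[1][1]))

if __name__ == "__main__":
    print(f"E(Rp) = {exp_return:.2f}%")
    print(f"Var(Rp) = {variance:.3f}, sd = {sqrt(variance):.2f}%")
    print(f"correlation(SnP500, bonds) = {corr_stocks_bonds:.2f}")
```

The double sum counts each off-diagonal covariance twice, which matches the 2·w_i·w_j·Cov terms in the three-asset expansion.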
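Page 16 (LOS L: calculate, interpret)
Recall: \( s_{XY} = \dfrac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{n - 1} = \dfrac{1}{n - 1}(X_1 - \bar{X})(Y_1 - \bar{Y}) + \dots + \dfrac{1}{n - 1}(X_n - \bar{X})(Y_n - \bar{Y}) \)
- the 1/(n - 1) terms are weights: probabilities? - the concept of joint probability
\( Cov(R_A, R_B) = \sum_i \sum_j P(R_{A,i}, R_{B,j})\,[R_{A,i} - E(R_A)]\,[R_{B,j} - E(R_B)] \) (probability × value of cross product)
where i & j = 1 to n are scenarios
if returns are independent: P(RA RB) = P(RA)P(RB) and E(RA RB) = E(RA)E(RB)
- independence is a stronger property than uncorrelatedness; E(RA RB) = E(RA)E(RB) also holds for uncorrelated random variables (ex #15/16)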
Bayes’ Formula: a method for updating prior probabilities based on new information   Page 17
LOS m (calculate, interpret)
Recall: Total Probability Rule
$P(A) = P(A \mid S_1)P(S_1) + P(A \mid S_2)P(S_2) + \dots + P(A \mid S_n)P(S_n)$
Q: given that we observe $A$, what is $P(S_i)$? i.e. $P(S_i \mid A)$
$P(S_i \mid A) = \frac{P(A \mid S_i)\,P(S_i)}{P(A)} = \frac{P(A \mid S_i)}{P(A)} \times P(S_i)$
e.g./
prior        P(Exp. | prior)   P(no Exp. | prior)   prior x P(Exp. | prior)
E = .45           .75                .25             .45(.75) = .3375
m = .30           .20                .80             .30(.20) = .06
FS = .25          .05                .95             .25(.05) = .0125
                                                     P(Exp.) = .41 (41%)
What is P(exceeds | expands)?
$P(E \mid Exp.) = \frac{P(Exp. \mid E)}{P(Exp.)} \times P(E) = \frac{(.75)(.45)}{.41} = \frac{.3375}{.41} = .8232$
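Bayes’ Formula:   Page 18
LOS m (calculate, interpret)
e.g./ diffuse priors (1/3 each)
prior   P(Exp. | prior)   P(no Exp. | prior)   prior x P(Exp. | prior)   posterior
1/3          .75                .25             (1/3)(.75) = .2500        .2500/.3333 = .75
1/3          .20                .80             (1/3)(.20) = .0667        .0667/.3333 = .20
1/3          .05                .95             (1/3)(.05) = .0167        .0167/.3333 = .05
                                                P(Exp.) = .3333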
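The two Bayes tables can be reproduced with a small helper. This is a sketch under stated assumptions: the function name `bayes_update` and the dictionary keys are illustrative, while the priors and conditional probabilities are those in the examples above.

```python
def bayes_update(priors, likelihoods):
    """Return (P(info), posteriors) where posterior = P(info | S) * P(S) / P(info)."""
    joint = {k: priors[k] * likelihoods[k] for k in priors}
    total = sum(joint.values())  # total probability rule
    return total, {k: v / total for k, v in joint.items()}

# P(expands | scenario) for each prior scenario, as in the example
likelihoods = {"E": 0.75, "m": 0.20, "FS": 0.05}

# Informative priors from the first example
priors = {"E": 0.45, "m": 0.30, "FS": 0.25}
p_expands, posterior = bayes_update(priors, likelihoods)

# Diffuse priors (1/3 each) from the second example
diffuse = {k: 1 / 3 for k in priors}
p_expands_diffuse, posterior_diffuse = bayes_update(diffuse, likelihoods)

print(p_expands, posterior)
print(p_expands_diffuse, posterior_diffuse)
```

With diffuse priors the posteriors equal the conditional probabilities themselves, which is the point of the second table.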
LOS n (identify, analyze)
Counting/
1/ Multiplication
e.g./ Portfolio subdivided by Domestic/Foreign, then by 4 industries, then by 3 size categories
- how many sub-portfolios?
2 x 4 x 3 = 24
Page 19
Counting/   LOS n (identify, analyze)
2/ Factorial: $n!$
e.g./ 3 analysts to cover 3 industries
3 x 2 x 1 = 3!
3/ Multinomial - the number of ways that $n$ objects can be labelled with $k$ labels, with $n_1$ to $n_k$ representing the size of each label category
$\frac{n!}{n_1!\,n_2! \dots n_k!}$
e.g./ Rank 18 funds by total return
- each of the 18 assigned to 5 categories
high risk   above-avg. risk   avg. risk   below-avg. risk   low risk
    4              4              3              4              3
- within a category order does not matter: ABCD is the same as ACBD, so divide by the factorial of each category size (e.g. 4!)
∴ $\frac{18!}{4!\,4!\,3!\,4!\,3!}$
Counting/   Page 20
LOS n (identify, analyze)
if k = 1: factorial
if k = 2: combination or permutation
4/ Combination - the number of ways of selecting $r$ objects from $n$ where order does not matter
${}_nC_r = \binom{n}{r} = \frac{n!}{(n - r)!\,r!}$
e.g./ For 5 price changes with 3 Us, how many ways can this occur? (U/D moves on a recombining lattice)
${}_5C_3 = \frac{5!}{(5 - 3)!\,3!} = \frac{5 \times 4 \times 3!}{2!\,3!} = 10$
5/ Permutation - order matters
${}_nP_r = \frac{n!}{(n - r)!}$   Ex 17, 17, 19
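The counting rules map directly onto Python's `math` module (`math.comb` and `math.perm` need Python 3.8+). A brief sketch; the `multinomial` helper is a name introduced here for illustration.

```python
import math

# 1/ Multiplication rule: 2 regions x 4 industries x 3 size categories
sub_portfolios = 2 * 4 * 3

# 2/ Factorial: 3 analysts assigned to 3 industries
analyst_assignments = math.factorial(3)

# 3/ Multinomial: n! / (n_1! n_2! ... n_k!)
def multinomial(*sizes):
    result = math.factorial(sum(sizes))
    for size in sizes:
        result //= math.factorial(size)  # exact integer division at each step
    return result

fund_labelings = multinomial(4, 4, 3, 4, 3)  # 18 funds into 5 risk categories

# 4/ Combination: 3 up-moves among 5 price changes, order does not matter
up_move_paths = math.comb(5, 3)

# 5/ Permutation: choose and order 3 out of 5, order matters
ordered_selections = math.perm(5, 3)

print(sub_portfolios, analyst_assignments, fund_labelings,
      up_move_paths, ordered_selections)
```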
Common Probability Distributions
a. define a probability distribution and compare and contrast discrete and continuous random
variables and their probability functions;
b. calculate and interpret probabilities for a random variable, given its cumulative distribution
function;
c. describe the properties of a discrete uniform random variable, and calculate and interpret
probabilities given the discrete uniform distribution function;
d. describe the properties of the continuous uniform distribution and calculate and interpret
probabilities, given a continuous uniform distribution;
e. describe the properties of a Bernoulli random variable and a binomial random variable, and
calculate and interpret probabilities given the binomial distribution function;
f. explain the key properties of the normal distribution;
g. contrast between a multivariate distribution and a univariate distribution, and explain the role
of correlation in the multivariate normal distribution;
h. calculate the probability that a normally distributed random variable lies inside a given interval;
i. explain how to standardize a random variable;
j. calculate and interpret probabilities using the standard normal distribution;
k. define shortfall risk, calculate the safety-first ratio, and identify an optimal portfolio using
Roy’s safety-first criterion;
l. explain the relationship between normal and lognormal distributions and why the lognormal
distribution is used to model asset prices;
m. calculate and interpret a continuously compounded rate of return given a specific holding
period return;
n. describe the properties of the Student’s t-distribution, and calculate and interpret its degrees of
freedom;
o. describe the properties of the chi-square distribution and the F-distribution, and calculate and
interpret their degrees of freedom;
p. describe Monte Carlo simulation.

LOS a, b Discrete Random Variables (define, compare, contrast)
(4.5p)
LOS c, d Discrete and Continuous Uniform Distribution (describe, calculate, interpret)
(5p)
LOS e Binomial Distribution (describe, calculate, interpret)
(7p)
LOS f-j Normal Distribution (explain, contrast, calculate, interpret)
(8p)
LOS k Applications of the Normal Distribution (define, calculate, identify)
(4p)
LOS L, m Lognormal Distributions and Continuous Compounding (explain, calculate, interpret)
(6.5p)
LOS n, o Student’s t, Chi-Square, and F-Distributions (describe, calculate, interpret)
(6.5p)
LOS p Monte Carlo Simulation (describe)
(6p)
Page 1
LOS a (define, compare, contrast)
Probability distribution: specifies the probabilities associated with the possible outcomes of a random variable
(uniform, binomial, normal, lognormal, Student’s t, chi-square, F-distribution)
Random variable: a quantity whose future outcomes are uncertain
discrete - takes on at most a countable number of possible values (possibly infinite)
continuous - cannot count the possible values
- every random variable is associated with a probability distribution that describes the variable completely
Probability function: specifies the probability that a random variable takes on a specific value, i.e. P(X = x)
discrete variables: p(x)
continuous variables: f(x), the probability density function (pdf)
Probability function has 2 key properties   Page 2
LOS a (define, compare, contrast)
1/ $0 \le p(x) \le 1$
2/ $\sum p(x)$ over all values of X equals 1
LOS b (calculate, interpret)
Cumulative distribution function (cdf): gives the probability that a variable X is less than or equal to a particular value x
$F(x) = P(X \le x)$
Page 3
Discrete uniform distribution   LOS c (describe, calculate, interpret)
$p(x) = 1/8 = .125$ for each outcome $x = 1, 2, \dots, 8$
$P(X \le 7) = .875$   (P(1) + P(2) + … + P(7) = 7 x 0.125 = .875)
$P(4 \le X \le 6)$ = P(4) + P(5) + P(6) = 3 x .125 = .375
or/ $P(X \le 6) - P(X \le 3) = .750 - .375 = .375$
$P(4 < X \le 6)$ = P(5) + P(6) = 0.250
or/ $P(X \le 6) - P(X \le 4) = .750 - .500 = .250$
Page 4
Continuous uniform distribution   LOS d (describe, calculate, interpret)
$f(x) = \frac{1}{b - a}$ for $a \le x \le b$, 0 otherwise
$F(x) = P(X \le x) = \frac{x - a}{b - a}$ for $a \le x \le b$
example #2/
LOS e (describe, calculate, interpret)
Bernoulli random variable: the outcome of a trial that produces one of two outcomes (1 or 0), where p = probability of success
$p(1) = P(Y = 1) = p$
$p(0) = P(Y = 0) = 1 - p$
- in $n$ trials, we can have 0 to $n$ successes
- if each trial is a random var., then the # of successes in $n$ trials is also a random var.
Page 5
Binomial Random Variable - # of successes in $n$ Bernoulli trials.   LOS e (describe, calculate, interpret)
assumptions: 1/ p is constant for all trials
2/ trials are independent
- a binomial random variable has a distribution completely described by 2 parameters: $X \sim B(n, p)$
Q1: in how many ways can $x$ successes occur in $n$ trials?
- does order matter? (SSFF, SFSF, SFFS, …) no. ∴ ${}_nC_x = \frac{n!}{(n - x)!\,x!}$
Q2: how probable is it to have $x$ successes in $n$ trials?
e.g.: P(SSFF) = $(p)(p)(1 - p)(1 - p) = p^2(1 - p)^2 = p^x(1 - p)^{n - x}$
Page 6
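LOS e (describe, calculate, interpret)
$p(x) = \frac{n!}{(n - x)!\,x!}\; p^x (1 - p)^{n - x}$
(# of ways) x (probability of each way)
[Figure: binomial distributions for n = 5 with p = 0.50 (symmetrical), p = 0.10, and p = 0.90]
e.g. n = 5, p = 0.10: $p(0) = \frac{5!}{(5 - 0)!\,0!}(.10)^0(.90)^5 = .59$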
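example #3.   Page 7
LOS e (describe, calculate, interpret)
          Profitable   Losing   Total
BB001         3           9       12
BB002         5           3        8
$p(x) = \frac{n!}{(n - x)!\,x!}\; p^x (1 - p)^{n - x}$ with p = .5
A/ $P(X \le 3)$, n = 12                         B/ $P(X \ge 5)$, n = 8
$p(0) = \frac{12!}{(12)!\,0!}(.5)^{12} = .000244$      $p(5) = \frac{8!}{(3)!\,5!}(.5)^{8} = .218750$
$p(1) = \frac{12!}{(11)!\,1!}(.5)^{12} = .002930$      $p(6) = \frac{8!}{(2)!\,6!}(.5)^{8} = .109375$
$p(2) = \frac{12!}{(10)!\,2!}(.5)^{12} = .016113$      $p(7) = \frac{8!}{(1)!\,7!}(.5)^{8} = .031250$
$p(3) = \frac{12!}{(9)!\,3!}(.5)^{12} = .053711$       $p(8) = \frac{8!}{(0)!\,8!}(.5)^{8} = .003906$
sum = .072998 or 7.3%                           sum = .363281 or 36.3%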
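The two totals in example #3 can be reproduced from the binomial formula. A minimal sketch, assuming a success probability of p = 0.5 per trade (this value reproduces .072998 and .363281); the helper names `binom_pmf` and `binom_cdf` are illustrative.

```python
from math import comb

def binom_pmf(x, n, p):
    """p(x) = nCx * p^x * (1 - p)^(n - x)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

def binom_cdf(x, n, p):
    """P(X <= x), summing the pmf from 0 to x."""
    return sum(binom_pmf(k, n, p) for k in range(x + 1))

p = 0.5  # assumed probability of a profitable trade

# A/ P(X <= 3) with n = 12 trades
prob_a = binom_cdf(3, 12, p)

# B/ P(X >= 5) with n = 8 trades, i.e. 1 - P(X <= 4)
prob_b = 1 - binom_cdf(4, 8, p)

# Mean and variance of a binomial random variable: np and np(1 - p)
mean_12, var_12 = 12 * p, 12 * p * (1 - p)

print(round(prob_a, 6), round(prob_b, 6), mean_12, var_12)
```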
Page 8
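LOS e (describe, calculate, interpret)
             Mean    Variance
Bernoulli     p      p(1 - p)
Binomial      np     np(1 - p)
Bernoulli derivation (outcome 1 with probability p, outcome 0 with probability 1 - p):
$E(Y) = p(1) + (1 - p)(0) = p$
$\sigma^2 = p(1 - p)^2 + (1 - p)(0 - p)^2 = p(1 - p)(1 - p) + (1 - p)p^2 = p - 2p^2 + p^3 + p^2 - p^3$
$= p - p^2 = p(1 - p)$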
LOS f (explain)
Central Limit Theorem (next reading): the sum (and mean) of a large number of independent random variables with finite variance is approximately normally distributed
pdf: $f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)$ for $-\infty < x < \infty$
- if $\mu = 0$, $\sigma = 1$: standard normal dist.
Page 9
LOS f (explain)
[Figure: normal pdf and cdf; the pdf is centred on 0 with 50% of the area on each side; the cdf rises from 0 to 1.00 and equals .50 at the mean]
- the normal dist. will be used to model asset returns (not asset prices)
- actual returns are more kurtotic than normal
- options add skew
Characteristics:
1/ described by 2 parameters, $\mu$ and $\sigma^2$: $X \sim N(\mu, \sigma^2)$
2/ skew = 0 and kurtosis = 3 (excess kurtosis = 0)
∴ mean = median = mode
3/ a linear combination of 2 or more normal random variables is also normally distributed
R P = w 1 R 1 + w 2 R2 + w 3 R 3 is normally dist.
- each return is a univariate random var., but together they are multivariate
multivariate normal distribution is completely defined by 3 lists of variables:   Page 10
LOS g (explain, contrast)
1/ all the mean returns of all the individual securities ($n$ returns)
2/ all the securities’ variances ($n$ variances)
3/ all pairwise correlations ($n(n - 1)/2$ unique correlations)
LOS h
z-value - calculate
example #5/
1 = cdf
0 = pdf
+/− = %
+/− . = %
+/− . = %
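n = 30 returns   Page 11
LOS i (explain)
for each $x_i$: e.g./ x = 7.2%, with $\mu$ = 4.7% and $\sigma$ = 3%
$z = \frac{x - \mu}{\sigma} = \frac{7.2 - 4.7}{3} = .8333$
- after standardizing, the distribution has $\mu = 0$ and $\sigma = 1$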
LOS j (calculate, interpret)
= NORM.S.DIST(z,1)   (z in, prob. out)   or   = NORM.S.INV(probability)   (prob. in, z out)
1/ $P(Z \le 0.24)$ = NORM.S.DIST(0.24,1) = 0.5948
2/ $P(Z \le -1.65)$ = NORM.S.DIST(-1.65,1) = .04947 ~ 5%
3/ 90th percentile = NORM.S.INV(0.90) = 1.28155 (z-value; 10% lies above 1.28155)
4/ 95th percentile = NORM.S.INV(0.95) = 1.64485 (5% lies above 1.64485)
5th percentile = -NORM.S.INV(0.95) = NORM.S.INV(0.05) = -1.64485 (5% lies below -1.64485)
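Outside Excel, the same lookups can be done with Python's standard library. This sketch uses `statistics.NormalDist` (Python 3.8+): `cdf` plays the role of NORM.S.DIST(z,1) and `inv_cdf` the role of NORM.S.INV(p). The variable names are illustrative.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0.0, sigma=1.0)

# z in, probability out (like NORM.S.DIST(z, 1))
p_below_024 = std_normal.cdf(0.24)
p_below_minus_165 = std_normal.cdf(-1.65)

# probability in, z out (like NORM.S.INV(p))
z_90 = std_normal.inv_cdf(0.90)
z_95 = std_normal.inv_cdf(0.95)
z_05 = std_normal.inv_cdf(0.05)  # mirror image of z_95

print(round(p_below_024, 4), round(p_below_minus_165, 5))
print(round(z_90, 5), round(z_95, 5), round(z_05, 5))
```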
Page 12
Example #6/ $\mu$ = 12%, $\sigma$ = 22%   LOS j (calculate, interpret)
1/ $P(R \ge 20\%)$
$z = \frac{20 - 12}{22} = .3636$
= NORM.S.DIST(0.3636,1) = 0.64194 ∴ $P(R \ge 20\%) = 1 - .64194 = .35806$ (64.194% below, 35.806% above)
2/ $P(12\% \le R \le 20\%)$: area between z = 0 and z = .3636
= NORM.S.DIST(0.3636,1) - NORM.S.DIST(0,1) (= .64194 - .5000 = .14194)
3/ $P(R \le 5.5\%)$
$z = \frac{5.5 - 12}{22} = -.2955$
= NORM.S.DIST(-.2955,1) = .3838 (38.38%)
Page 13
Safety first rules focus on shortfall risk - the risk a portfolio value (or return) will fall below some minimum acceptable level over some time horizon   LOS k (define, calculate, identify)
Let $R_L$ = minimum acceptable level of return
$SFRatio = \frac{E(R_P) - R_L}{\sigma_P}$   (a z-value)
- objective is to maximize this ratio
- optimal portfolio minimizes $P(R_P < R_L) = N(-SFRatio)$
e.g./ $R_L$ = 2%
Portfolio   E(R_P)   σ_P   SFRatio                 P(R_P < R_L)
1            12%     15%   (12 - 2)/15 = .6667     = NORM.S.DIST(-.667,1) = 0.2525
2            14%     16%   (14 - 2)/16 = .75       = NORM.S.DIST(-.75,1) = 0.227
Note: if $R_L = R_F$, SFRatio = Sharpe Ratio   Ex #7
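A short sketch of Roy's safety-first selection for the two portfolios above. The threshold of 2% is an assumption (it is the value that reproduces the ratios .667 and .75), and the function and variable names are illustrative.

```python
from statistics import NormalDist

def safety_first_ratio(expected_return, std_dev, threshold):
    """SFRatio = (E(R_P) - R_L) / sigma_P."""
    return (expected_return - threshold) / std_dev

threshold = 0.02  # assumed minimum acceptable return R_L

# portfolio -> (expected return, standard deviation), from the example
portfolios = {"1": (0.12, 0.15), "2": (0.14, 0.16)}

ratios = {
    name: safety_first_ratio(er, sd, threshold)
    for name, (er, sd) in portfolios.items()
}

# Shortfall probability under normality: P(R_P < R_L) = N(-SFRatio)
shortfall = {name: NormalDist().cdf(-r) for name, r in ratios.items()}

# The optimal portfolio has the highest SFRatio (lowest shortfall probability)
best = max(ratios, key=ratios.get)

print(ratios, shortfall, best)
```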

Page 14
LOS L (explain)
Lognormal distribution - commonly used to model the probability distribution of asset prices
- a variable Y follows a lognormal distribution if LN(Y) is normally distributed
- bounded below by 0 (range 0 to ∞) and right skewed
- completely described by 2 parameters: the $\mu$ and $\sigma^2$ of its associated normal distribution
LOS m (calculate, interpret)
$P_1/P_0 = 1 + R$, where $R$ = holding period return
i.e./ $R = \frac{P_1 - P_0}{P_0} = P_1/P_0 - P_0/P_0 = P_1/P_0 - 1$ ∴ $P_1/P_0 = 1 + R$
Page 15
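LOS m (calculate, interpret)
e.g. given prices $P_0$ and $P_1$: $P_1/P_0 = 1 + R$ gives the holding period return $R$
$\ln(P_1/P_0) = r$, where $r$ = continuously compounded return and $r \sim N(\mu, \sigma^2)$
$r = \ln(P_1/P_0) = \ln(1 + R)$ ∴ $e^{r} = 1 + R$
more generally: $P_T = P_0 e^{r_{0,T}}$ ($P_T/P_0 = e^{r_{0,T}}$ and $\ln(P_T/P_0) = r_{0,T}$)
- to assume returns are normally distributed, we assume returns are
1/ independent
2/ identically distributed ($\mu$ and $\sigma^2$ do not change from period to period)
- so, while $P_1 = P_0(1 + R)$ with a holding period return,
with cont. comp: $P_1 = P_0 e^{r}$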
Volatility: annualized sd of the continuously compounded daily returns of the underlying asset   Page 16
LOS m (calculate, interpret)
- since $r_{0,T} \sim N(\mu T, \sigma^2 T)$, sd = $\sigma\sqrt{T}$
- so both the mean and variance of $r_{0,T}$ scale linearly with time, but the s.d. scales linearly with the square root of time
e.g. if daily vol = .01, annualized vol = $.01\sqrt{250}$ = 15.81%   example #8
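The relationships between holding period return, continuously compounded return, and volatility scaling can be sketched as follows. The two prices are illustrative values, and 250 trading days per year is an assumed convention; only the daily volatility of .01 comes from the example above.

```python
import math

# Illustrative prices (assumed for this sketch)
price_start, price_end = 30.00, 34.50

# Holding period return: P_1 / P_0 = 1 + R
holding_period_return = price_end / price_start - 1

# Continuously compounded return: r = ln(P_1 / P_0) = ln(1 + R)
cc_return = math.log(price_end / price_start)

# Round trip: e^r recovers 1 + R
recovered_ratio = math.exp(cc_return)

# Volatility scales with the square root of time
daily_vol = 0.01
trading_days = 250  # assumed number of trading days per year
annual_vol = daily_vol * math.sqrt(trading_days)

print(round(holding_period_return, 4), round(cc_return, 6),
      round(recovered_ratio, 4), round(annual_vol, 4))
```

The continuously compounded return is always smaller than the holding period return for a positive return, since ln(1 + R) < R.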
LOS n (describe, calculate, interpret)
1/ Student’s t-distribution - defined by a single parameter known as degrees of freedom (df = n - 1)
- relative to the normal, the t-dist. has less weight in the peak and more weight in the tails
- as df ↑, the t-distribution approaches the z-distribution
- for n > 30, t ≃ z
Page 17
LOS n (describe, calculate, interpret)
$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$, where $\mu$ and $\sigma$ are population parameters (only 1 estimate used)
$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$, where $\bar{x}$ and $s$ are sample statistics (∴ 2 estimates used)
- the denominator is the standard error
t-tests are used for hypothesis testing since they are more conservative (a more stringent test) and they produce wider confidence intervals
LOS o (describe, calculate, interpret)
Chi-square ($\chi^2$) distribution - distribution of the sum of squares (deviations) of k independent standard normally distributed random variables (dist. of variances)
- each distribution has its own df (df = n - 1)
- as df ↑, dist. becomes more bell-shaped
- bounded below by zero
Page 18
F-distribution - the ratio of 2 $\chi^2$ variables, each divided by its df   LOS o (describe, calculate, interpret)
$F = \frac{\chi_1^2/(n_1 - 1)}{\chi_2^2/(n_2 - 1)}$, where the larger value is in the numerator
- used in regression to test the significance of the whole regression (explained var/unexplained var)
Example 9   Excel cdfs and their inverses:
NORM.S.DIST(z,1)               NORM.S.INV(p)
T.DIST(t-value,df,1)           T.INV(p,df)
CHISQ.DIST(x2-value,df,1)      CHISQ.INV(p,df)
F.DIST(F-value,df1,df2,1)      F.INV(p,df1,df2)
Page 19
Example 9/ LOS o
- describe
- calculate
- interpret
Monte Carlo Simulation/ Page 20

LOS p
- describe
Step 1: Specify the quantity of interest
e.g. MVP in 10 years
Step 2: Specify a time grid: sub-periods with ∆t increment for the full time horizon
e.g. 20 sub-periods, ∆t = 6 months
Step 3: Specify distributional assumptions for the key risk factors

e.g. ( )= . % = %
= +
Step 4: Draw standard normal random numbers for each key risk factor over each sub-period.
- random number generator produces a distribution of random numbers from 0 to 1, all equally likely
Page 21
Monte Carlo Simulation/ LOS p
Step #4 distribution of random #’s - describe
#1 0.32
#2 0.64 1000 runs
normal cdf 0 1 #20

100% Step #5:
50%
#1 0.31561 on cdf
#2 0.7673 on cdf
50%
#20
0 Step #6:
#1 -0.48 z-value
( )= . %− . ( %) = .
= ( . )
#2 0.73 z-value
( ) = . %+. ( )= . %
= ( . )
#20
Page 22
Monte Carlo Simulation/ LOS p
- describe
= ( + ( )
distribution of possible
Objective: =$ with 95% PoS.
/ + = Beginning Capital
10 yrs.
Monte Carlo Simulation provides only statistical estimates, not

exact results
does not support cause and
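The simulation steps can be sketched in a few lines of Python. This is an illustration only: the starting value, the per-period mean return `mu`, and the volatility `sigma` are hypothetical inputs, and the function name is made up. Each uniform draw is mapped through the inverse normal cdf to a z-value, which is then turned into a period return as mean + z x sd; the grid uses 20 sub-periods and 1000 runs as in the notes.

```python
import random
from statistics import NormalDist, mean

def simulate_terminal_values(start_value, mu, sigma, n_periods, n_runs, seed=0):
    """Simulate terminal portfolio values; mu and sigma are per sub-period."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    terminal_values = []
    for _ in range(n_runs):
        value = start_value
        for _ in range(n_periods):
            u = rng.random()                      # uniform random number on [0, 1)
            u = min(max(u, 1e-12), 1 - 1e-12)     # keep strictly inside (0, 1)
            z = std_normal.inv_cdf(u)             # map onto the normal cdf -> z-value
            period_return = mu + sigma * z        # return = mean + z * sd
            value *= 1 + period_return            # compound over the sub-period
        terminal_values.append(value)
    return terminal_values

# Hypothetical inputs: 100 of capital, 3% mean and 8% sd per 6-month period
values = simulate_terminal_values(100.0, 0.03, 0.08, n_periods=20, n_runs=1000)
values.sort()
average_terminal = mean(values)
fifth_percentile = values[int(0.05 * len(values))]  # exceeded in about 95% of runs

print(round(average_terminal, 2), round(fifth_percentile, 2))
```

Reading off the 5th percentile of the simulated distribution is one way to frame a "95% probability of success" objective; the result is a statistical estimate that changes with the seed and the number of runs.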
Sampling and Estimation
a. compare and contrast probability samples with non-probability samples and

discuss applications of each to an investment problem;
b. explain sampling error;
c. compare and contrast simple random, stratified random, cluster,
convenience, and judgmental sampling;
d. explain the central limit theorem and its importance;
e. calculate and interpret the standard error of the sample mean;
f. identify and describe desirable properties of an estimator;
g. contrast a point estimate and a confidence interval estimate of a population

parameter;
h. calculate and interpret a confidence interval for a population mean, given a

normal distribution with 1) a known population variance, 2) an unknown
population variance, or 3) an unknown population variance and a large sample
size;
i. describe the use of resampling (bootstrap, jackknife) to estimate the sampling

distribution of a statistic;
j. describe the issues regarding selection of the appropriate sample size, data-
mining bias, sample selection bias, survivorship bias, look-ahead bias, and
time-period bias.
LOS a-c Sampling Methods (compare, contrast, explain)

(10.5p)
LOS d, e Central Limit Theorem (explain, calculate, interpret)
(5.5p)
LOS f Properties of Estimators (identify, describe)
(4.5p)
LOS g-h Confidence Intervals (contrast, calculate, interpret)
(8.5p)
LOS i Resampling (describe)
(3p)
LOS j Biases (describe)
(6p)
Page 1
LOS a-c (compare, contrast, explain)
sample: a method of obtaining information about a population’s parameters ($\mu$ & $\sigma$) through sample statistics ($\bar{x}$ & $s$)
A/ Probability sampling: every member of a population has an equal chance of being selected
∴ samples will be more representative of the population
1/ Simple Random Sampling: a subset of a larger population such that each element has an equal probability of being selected
e.g. population N = 500, sample size n = 50: random number generator selects 50 numbers between 1 and 500
- useful when data are homogeneous

2/ Systematic Sampling: when the population is too large to code   Page 2
LOS a-c (compare, contrast, explain)
- select every kth element until the desired sample size is reached
3/ Stratified Random Sampling - population is sub-divided into sub-populations based on one or more classifications
- simple random samples are then drawn from each sub-pop.
- the sub-samples are then pooled to form the main sample
- each sub-sample is proportionate to the size of its sub-population
- guarantees that population sub-divisions are represented in the sample
- statistics will be more precise
Page 3
LOS a-c (compare, contrast, explain)
- sample statistics are estimates of population parameters
- not exact, subject to error
sampling error: the difference between the observed value of a statistic and the population parameter it estimates, as a result of using just a subset of the population
[Figure: sampling distribution of the sample means, for samples all of equal size n; standard error = $\sigma/\sqrt{n}$]   example #1
4/ Cluster sampling - pop. is divided into clusters, each of which is a mini representation of the population
- certain clusters are then selected as a whole using simple random sampling: one-stage cluster sampling
Page 4
LOS a-c (compare, contrast, explain)
- if sub-samples are selected from each cluster: two-stage cluster sampling
- usually results in lowest precision since a cluster may not be representative of the population
- is both cost and time efficient, however
Page 5
B/ Non-probability sampling - depends on factors such as judgment or convenience (in terms of access to data)   LOS a-c (compare, contrast, explain)
- risk that samples may be non-representative
5/ Convenience sampling - observations are selected that are easy to obtain or are accessible
- not necessarily representative, but low cost
6/ Judgmental sampling - select observations based on experience and knowledge
- useful when there is a time constraint and/or the specialty of the researcher would result in better representation
example 2, 3, 4
Page 6
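LOS d, e (explain, calculate, interpret)
Central limit theorem: take samples of size $n$ from a population with any distribution, with mean $\mu$ and finite variance $\sigma^2$
- for large $n$, the sampling distribution of the sample means is approx. normal, centred on $\mu$ (the best estimate of $\mu$ is $\bar{x}$), with variance $\sigma^2/n$
Standard Error = $\sigma/\sqrt{n}$ if we know $\sigma$, or $s/\sqrt{n}$ otherwise
- as $n$ ↑, sampling error decreases
Note: sd ≠ SE
sd = dispersion from the mean (data description)
SE = sampling error (data inference)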
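A quick simulation illustrates the standard error formula. This sketch assumes a hypothetical, clearly non-normal population (exponential with mean 1 and standard deviation 1); the sample size, number of samples, and seed are arbitrary choices.

```python
import random
import statistics

rng = random.Random(42)

sample_size = 50      # n, observations per sample
n_samples = 5000      # number of repeated samples

# Hypothetical population: exponential with mean 1 and sd 1 (right skewed)
population_sd = 1.0

# Sampling distribution of the sample mean
sample_means = [
    statistics.fmean(rng.expovariate(1.0) for _ in range(sample_size))
    for _ in range(n_samples)
]

# Observed dispersion of the sample means vs. sigma / sqrt(n)
observed_se = statistics.stdev(sample_means)
theoretical_se = population_sd / sample_size ** 0.5

print(round(statistics.fmean(sample_means), 4),
      round(observed_se, 4), round(theoretical_se, 4))
```

The sd of the population stays at 1 regardless of n, while the standard error of the mean shrinks with the square root of n, which is the sd versus SE distinction drawn above.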
Page 7
Point Estimators/ Desirable properties   LOS f (identify, describe)
1/ Unbiasedness: an unbiased estimator is one whose expected value (the mean of its sampling distribution) equals the parameter it is intended to estimate
e.g. $s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$ is unbiased while $\frac{\sum (x_i - \bar{x})^2}{n}$ is biased downwards
2/ Efficiency: an unbiased estimator is efficient if no other unbiased estimator has a sampling distribution with smaller variance
e.g. estimators A and B are both unbiased (both sampling distributions are centred on the parameter); estimator A is more efficient since it produces a smaller variance
Page 8
Point Estimators/ Desirable properties   LOS f (identify, describe)
3/ Consistency - a consistent estimator is one for which the probability of estimates close to the value of the population parameter increases as sample size increases
e.g. $\sigma_{\bar{x}} = \sigma/\sqrt{n}$: as $n$ ↑, $\sigma_{\bar{x}}$ ↓, implying less sampling error in $\bar{x}$ as an estimate of $\mu$.
Example 7
LOS g, h (contrast, calculate, interpret)
Confidence Interval: a range for which one can assert with a given probability $(1 - \alpha)$, called the degree of confidence, that it will contain the parameter it is intended to estimate
i.e. lower limit to upper limit: a two-sided confidence interval
Page 9
Confidence Intervals/   LOS g, h (contrast, calculate, interpret)
Interpretation:
Probabilistic - in repeated sampling, 95% of such CIs will, in the long run, include or bracket the population mean
Practical - 95% confident that a given CI contains the population mean
Point Estimate +/- Reliability factor x Standard Error
- reliability factor: $z_{\alpha/2}$ or $t_{\alpha/2}$
- standard error: $\sigma/\sqrt{n}$ or $s/\sqrt{n}$ (the precision of the estimator)
1/ CI for $\mu$ (Normally Distributed population, known variance)
$\bar{x} \pm z_{\alpha/2}\,\sigma/\sqrt{n}$
e.g. $\bar{x} = 25$, standard error $\sqrt{\sigma^2}/\sqrt{n} = 2$, $\alpha = 5\%$: $25 \pm 1.96(2)$ = 21.08 to 28.92
= NORM.S.INV(.975) = 1.96 or = NORM.S.INV(.025) = -1.96
Page 10
1/ CI for $\mu$ (Normally Distributed population, known variance)   LOS g, h (contrast, calculate, interpret)
common reliability factors
90%: $z_{.05}$ = 1.65   (5% in each tail, 90% between -1.65 and +1.65, the lower and upper bounds)
95%: $z_{.025}$ = 1.96
99%: $z_{.005}$ = 2.58
2/ CI for $\mu$ (Large sample, Variance Unknown)
- any sampling distribution
$\bar{x} \pm z_{\alpha/2}\,s/\sqrt{n}$
e.g. $\bar{x} = .45$, standard error $s/\sqrt{n} = .03$, $\alpha = 10\%$: $.45 \pm 1.65(.03)$ = .4005 to .4995
= NORM.S.INV(.95) = 1.65 or = NORM.S.INV(.05) = -1.65
Page 11
3/ CI for $\mu$ ($\sigma^2$ Unknown)   LOS g, h (contrast, calculate, interpret)
$\bar{x} \pm t_{\alpha/2}\,s/\sqrt{n}$
- used when the sample is large regardless of distribution, or/ the sample is small but the population is normally distributed
= T.INV(1 - p,df) = t-val. or = T.INV(p,df) = -t-val., with p = α/2
                                  n < 30    n ≥ 30
normal dist., known σ²              z         z
normal dist., unknown σ²            t       t or z   (practice uses t)
non-normal dist., known σ²         N/A        z
non-normal dist., unknown σ²       N/A      t or z
the larger $n$, the greater the precision
- to obtain a desired CI width, select $n$ as follows:
$\bar{x} \pm t \times s/\sqrt{n}$, with $E = \frac{t \times s}{\sqrt{n}}$
$\sqrt{n} = \frac{t \times s}{E}$ ∴ $n = \left(\frac{t \times s}{E}\right)^2$   note: width = 2E
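A sketch of the interval and sample-size calculations, using the normal reliability factor. The inputs (mean .45, standard deviation .30, n = 100, 90% confidence) are illustrative values chosen to give a standard error of .03; the function names are made up. The code uses the unrounded factor 1.645 rather than 1.65, so the limits differ slightly from hand calculations.

```python
import math
from statistics import NormalDist

def z_confidence_interval(sample_mean, std_dev, n, confidence):
    """Point estimate +/- reliability factor x standard error."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)   # reliability factor z_(alpha/2)
    half_width = z * std_dev / math.sqrt(n)   # E = z * s / sqrt(n)
    return sample_mean - half_width, sample_mean + half_width

def required_sample_size(std_dev, half_width, confidence):
    """n = (z * s / E)^2, rounded up to the next whole observation."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return math.ceil((z * std_dev / half_width) ** 2)

# Illustrative inputs (assumed for this sketch)
lower, upper = z_confidence_interval(0.45, 0.30, 100, 0.90)

# Sample size needed for a half-width E of .0495 at 90% confidence
n_needed = required_sample_size(0.30, 0.0495, 0.90)

print(round(lower, 4), round(upper, 4), n_needed)
```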
Page 12
Resampling: repeatedly draw samples from an original data sample in order to estimate population parameters   LOS i (describe)
1/ Bootstrap method/ uses computer simulation
- population distribution is unknown; all we have is a sample of size n
- draw 1 observation, record, replace
- draw another obs., record, replace
- repeat until the resample has n observations; build B such resamples
- rather than estimate the distribution, this method creates the distribution
- standard error from the B resample estimates: $s_{\bar{x}} = \sqrt{\frac{\sum_{b=1}^{B} (\hat{\theta}_b - \bar{\theta})^2}{B - 1}}$
- can also find the standard error of an estimator even when no analytical formula is available (e.g. median)
Page 13
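LOS i (describe)
2/ Jackknife method/ - omit one observation from a sample, one at a time
- will produce similar results from sample to sample (bootstrap may not)
e.g./ for a sample of n observations, compute n estimates: $\bar{x}_1$ omitting $x_1$, $\bar{x}_2$ omitting $x_2$, …, $\bar{x}_n$ omitting $x_n$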
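Both resampling methods are easy to sketch in Python. The data below are hypothetical returns, and the function names are illustrative. The bootstrap standard error is the standard deviation of the resample estimates; the jackknife standard error uses the usual leave-one-out formula sqrt((n - 1)/n x sum of squared deviations), which is an addition here and not spelled out in the notes.

```python
import math
import random
import statistics

# Hypothetical sample of 10 returns (in %)
data = [2.1, -0.5, 1.3, 0.8, 3.4, -1.2, 0.6, 1.9, 2.7, 0.2]

def bootstrap_se(sample, stat, n_resamples=2000, seed=1):
    """Draw resamples with replacement and return the sd of the estimates."""
    rng = random.Random(seed)
    estimates = [
        stat(rng.choices(sample, k=len(sample))) for _ in range(n_resamples)
    ]
    return statistics.stdev(estimates)  # divides by B - 1

def jackknife_estimates(sample, stat):
    """Recompute the statistic n times, omitting one observation each time."""
    return [stat(sample[:i] + sample[i + 1:]) for i in range(len(sample))]

def jackknife_se(sample, stat):
    estimates = jackknife_estimates(sample, stat)
    n = len(sample)
    centre = statistics.fmean(estimates)
    return math.sqrt((n - 1) / n * sum((e - centre) ** 2 for e in estimates))

classic_se = statistics.stdev(data) / math.sqrt(len(data))  # s / sqrt(n)
boot_se_mean = bootstrap_se(data, statistics.fmean)
boot_se_median = bootstrap_se(data, statistics.median)  # no simple formula needed
jack_se_mean = jackknife_se(data, statistics.fmean)

print(round(classic_se, 4), round(boot_se_mean, 4),
      round(boot_se_median, 4), round(jack_se_mean, 4))
```

The jackknife gives the same answer every time it is run on a given sample, whereas the bootstrap result depends on the random draws (fixed here by the seed).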
LOS j (describe)
1/ Data snooping bias (data mining) - searching a data set for statistically significant patterns/relationships
- if α = 5%, testing 100 different variables will, on average, produce 5 significant relationships by chance
- typically will not be theory-driven
- lack an economic rationale
Page 14
1/ Data snooping bias   LOS j (describe)
- to minimize/avoid: split the data set and use an out-of-sample test
training data set: build and fit a model
validation data set: evaluate model fit and tune the model
test data set: out-of-sample test to evaluate model fit
- if data snooping is present, there will be insignificant model fit out of sample
2/ Sample selection bias/ excluding some observations or time periods
- basically choosing non-random samples
e.g./ survivorship bias: historical data may only include data for companies that survived
- would overstate performance
e.g./ using hedge fund indexes: since they are self-reported, only well-performing funds may opt to report
Page 15
3/ Look ahead bias/ using information that was not available on the observation date   LOS j (describe)
e.g.: models that use price and accounting data from the historical record when the actg. data may not have been available on the same date
(price on Dec 31 and accounting data as of Dec 31, but the accounting data may not have been reported until mid-February)
4/ Time-period bias/ results in one time period may be specific to that time period
- typical of short time series
- too long, risk of including more than one regime/distribution
Hypothesis Testing
a. define a hypothesis, describe the steps of the hypothesis testing, and describe and
interpret the choice of the null and alternative hypotheses;
b. compare and contrast one-tailed and two-tailed tests of hypotheses;
c. explain a test statistic, Type I and Type II errors, a significance level, how significance
levels are used in hypothesis testing and the power of a test;
d. explain a decision rule, the power of a test, and the relation between confidence intervals
and hypothesis tests, and determine whether a statistically significant result is also
economically meaningful;
e. explain and interpret the p-value as it relates to hypothesis testing;
f. describe how to interpret the significance of a test in the context of multiple tests;
g. identify the appropriate test statistic and interpret the results for a hypothesis test
concerning the population mean of both large and small samples when the population is
normally or approximately normally distributed and the variance is 1) known or 2)
unknown;
h. identify the appropriate test statistic and interpret the results for a hypothesis test
concerning the equality of the population means of two at least approximately normally
distributed populations, based on independent random samples with 1) equal or 2)
unequal assumed variances;
i. identify the appropriate test statistic and interpret the results for a hypothesis test
concerning the mean difference of two normally distributed populations;
j. identify the appropriate test statistic and interpret the results for a hypothesis test
concerning 1) the variance of a normally distributed population, and 2) the equality of the
variances of two normally distributed populations based on two independent random
samples;
k. compare and contrast parametric and non parametric tests and describe situations where
each is the more appropriate type of test;
l. explain parametric and non parametric tests of the hypothesis that the population
correlation coefficient equals zero and determine whether the hypothesis is rejected at a
given level of significance;
m. explain tests of independence based on contingency table data.

Hypothesis Testing
LOS a, b (4p) The process of hypothesis testing - define, describe, interpret, compare
1.5p Identifying the appropriate test statistic contrast
LOS c explain
2p Specify the level of significance
3p State the Decision Rule
LOS d explain, determine
1p Make a decision
LOS e (3p) The role of p-values - explain, interpret
LOS f (2.5p) Multiple Tests and Interpreting Significance - describe
LOS g (4.5p) Tests concerning a single mean identify
LOS h (2.5p) Tests concerning differences between means (Ind. Samples) interpret
LOS i (4p) Tests concerning differences between means (Dep. samples)
LOS j (8.5p) Tests concerning tests of Variance
LOS k (2p) Parametric vs. Non Parametric Tests - compare, contrast, describe
LOS L (5.5p) Tests Concerning Correlation - explain, determine
LOS m (5p) Tests of Independence - explain
Page 1
Statistical Inference: the process of making judgments about a larger group (pop.) based on a smaller group (sample)   LOS a (define, describe, interpret)
e.g./ hypothesis testing - test to see whether a sample statistic is likely to come from a population with the hypothesized value of the population parameter
i.e. Does $\mu = \mu_0$?
Hypothesis: a statement about one or more populations that is tested using sample statistics
Process: Step 1: State the hypothesis

2: Identify the appropriate test statistic
3: Specify the level of significance
4: State the decision rule
5: Collect data and calculate the test statistic
6: Make a decision
Step #1: State the hypothesis   Page 2
LOS a (define, describe, interpret)
$H_0$: null - assumed to be true unless we can reject
$H_a$: alternative
- typically want to reject $H_0$
LOS b (compare, contrast)
Two-sided (two-tailed) test
e.g. $H_0: \mu = \mu_0$ vs. $H_a: \mu \ne \mu_0$ ($\mu$ could be < $\mu_0$ or could be > $\mu_0$)
- reject in either tail; fail to reject in between
One-sided (left or right tailed) test
e.g. right-tailed: $H_0: \mu \le \mu_0$, $H_a: \mu > \mu_0$   or/ left-tailed: $H_0: \mu \ge \mu_0$, $H_a: \mu < \mu_0$
- reject only in the tail named by $H_a$; do not reject elsewhere
Page 3
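- the null ($H_0$) always contains the equality sign   LOS b (compare, contrast)
$H_0: \mu = \mu_0$   $H_0: \mu \le \mu_0$   $H_0: \mu \ge \mu_0$
- testing is always done at equality   example #1
Test Statistic: (Step #2)   LOS c (explain)
pop. $\sigma$ is known: $z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$, normally distributed
pop. $\sigma$ is unknown: $t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$, t-distributed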
Page 4
Step 3: Specify the Level of Significance   LOS c (explain)
- level of sig. depends on the seriousness of making a mistake
                        $H_0$ = true                        $H_0$ = false
fail to reject $H_0$    Correct (1 - α), confidence level   Type II error (β)
reject $H_0$            Type I error (α), level of sig.     Correct (1 - β), Power of a test
- as α ↓, β ↑; the only way to decrease both is to increase n
$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$: as n ↑, denom. ↓, t-stat ↑
e.g.: $H_0$: not pregnant; $H_a$: pregnant
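Step #4: State the Decision Rule   Page 5
LOS d (explain, determine)
2-tail: Reject $H_0$ when |test-statistic| > |critical value| ($z_{\alpha/2}$ or $t_{\alpha/2}$)
right tail: Reject $H_0$ when test-statistic > critical value ($z_{\alpha}$ or $t_{\alpha}$)
left tail: Reject $H_0$ when test-statistic < critical value ($-z_{\alpha}$ or $-t_{\alpha}$)
e.g. α = 5%, two-tailed: reject in the 2.5% beyond each critical value, fail to reject in between
e.g. α = 5%, one-tailed: reject in the single 5% tail, fail to reject elsewhere
critical values: = NORM.S.INV(p) or T.INV(p,df)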
- using confidence intervals: $\bar{x} \pm z_{\alpha/2}\,\sigma/\sqrt{n}$   Page 6
LOS d (explain, determine)
lower limit = $\bar{x} - z_{\alpha/2} \cdot \sigma/\sqrt{n}$, upper limit = $\bar{x} + z_{\alpha/2} \cdot \sigma/\sqrt{n}$ (or with $t_{\alpha/2}$ and $s$)
- if the CI around the sample statistic ($\bar{x}$) contains the hypothesized pop. parameter ($\mu_0$), do not reject $H_0$
Step 5: Collect the data and calculate the test statistic   Ex #3
Step 6: Make a decision
- Reject $H_0$ if there is statistical support
Note: something statistically significant may not be economically significant (transaction costs, taxes, risk)   ex #4
Page 7
P-value: the area in the probability distribution outside the calculated test-statistic   LOS e (explain, interpret)
- p-value is the lowest α at which the null can be rejected
- for a two-sided test-stat, combine the probabilities under the curve in both tails: p-value = a + b (a = area above +t-stat, b = area below -t-stat)
= (1 - NORM.S.DIST(+z,1)) x 2
or/ = (1 - T.DIST(+t,df,1)) x 2
one-sided: prob. under the curve in the appropriate tail
left tail: = NORM.S.DIST(-z,1) or = T.DIST(-t,df,1)
right tail: = 1 - NORM.S.DIST(+z,1) or = 1 - T.DIST(+t,df,1)
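For a z-distributed test statistic, the p-value rules above can be sketched with the standard library. The function name `z_test_p_value` and its `tail` argument are illustrative; a t-distributed statistic would need a t cdf, which Python's standard library does not provide.

```python
from statistics import NormalDist

def z_test_p_value(z_stat, tail="two"):
    """p-value for a z-distributed test statistic.

    tail: "two" (both tails), "right", or "left".
    """
    cdf = NormalDist().cdf
    if tail == "two":
        return 2 * (1 - cdf(abs(z_stat)))  # combine both tails
    if tail == "right":
        return 1 - cdf(z_stat)             # area above the statistic
    if tail == "left":
        return cdf(z_stat)                 # area below the statistic
    raise ValueError("tail must be 'two', 'right', or 'left'")

alpha = 0.05
p_two = z_test_p_value(2.10, "two")
decision = "reject H0" if p_two < alpha else "fail to reject H0"

print(round(p_two, 4), decision)
print(round(z_test_p_value(1.645, "right"), 4),
      round(z_test_p_value(-1.645, "left"), 4))
```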
Page 8
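if p-value < α, reject $H_0$   LOS e (explain, interpret)
two-sided: the area beyond the test-statistic in each tail is p/2; reject when p/2 < α/2, i.e. p-value < α
one-sided: reject when the tail area beyond the test-statistic = p-value < α
(the fail-to-reject region has area 1 - α)   example #5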
LOS f (describe)
Rejecting $H_0$ is a positive event (support for $H_a$)
Rejecting a true $H_0$ is thus a false positive
FDR - false discovery rate: expected proportion of false positives
multiple testing problem: in repeated tests with a level of significance α, expect a proportion α of the tests on true nulls to be false positives
- this requires an adjustment to the p-value for the likelihood of significant results being false positives (BH - Benjamini & Hochberg)
Page 9
- rank all p-values from low to high
- starting at the lowest: let k = 1, check $p(k) \le \alpha \times \frac{k}{\#\text{ of tests}}$
  yes: let k = k + 1, repeat
  no: stop; only the tests ranked before k remain significant
exhibit #12 + example #6
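The ranking procedure can be sketched as a short function. This follows the sequential check as described in these notes (stop at the first p-value that fails its threshold); the textbook Benjamini-Hochberg step-up rule instead keeps everything up to the largest rank that passes, so the two can differ. The p-values are hypothetical and the function name is illustrative.

```python
def count_significant_after_adjustment(p_values, alpha):
    """Walk the ranked p-values and stop at the first one above alpha * rank / m."""
    ranked = sorted(p_values)      # rank from low to high
    n_tests = len(ranked)
    n_significant = 0
    for rank, p in enumerate(ranked, start=1):
        threshold = alpha * rank / n_tests
        if p <= threshold:
            n_significant = rank   # still passing: move to the next rank
        else:
            break                  # first failure: stop
    return n_significant

# Hypothetical p-values from 5 tests
p_values = [0.041, 0.001, 0.600, 0.008, 0.039]
alpha = 0.05

unadjusted = sum(p <= alpha for p in p_values)
adjusted = count_significant_after_adjustment(p_values, alpha)

print(unadjusted, adjusted)
```

In this illustration four tests look significant at 5% before the adjustment, but only two survive it.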
LOS g (identify, interpret)
Tests concerning a single mean/
- population approx. normal, $\sigma^2$ known or unknown (unknown is typically the case); n < 30 or n ≥ 30
- recall CLT: sampling distribution of means will be approx. normal with large n regardless of pop. dist.
∴ test statistic = $t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$
Page 10
LOS g (identify, interpret)
test-stat., $\sigma$ known (theoretically correct to use): $z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$
alternative, $\sigma$ unknown but large n: $z = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$
Decision Rule:
2-sided: if |test-stat| > critical value, reject $H_0$
right: if test-stat > critical value, reject $H_0$
left: if test-stat < critical value, reject $H_0$
critical values: = NORM.S.INV(p) or = T.INV(p,df)
common z-values
        2-tailed       1-tailed
1%      +/- 2.576      -2.326 or +2.326
5%      +/- 1.96       -1.645 or +1.645
example #7, example #8
Page 11
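Differences between means - independent samples   LOS h (identify, interpret)
pop 1: mean $\mu_1$ and variance $\sigma_1^2$; pop 2: mean $\mu_2$ and variance $\sigma_2^2$; samples are independent
Q: Are $\bar{x}_1$ and $\bar{x}_2$ from the same population (i.e. $\mu_1 = \mu_2$) or from different populations (i.e. $\mu_1 \ne \mu_2$)?
2-sided: $H_0: \mu_1 - \mu_2 = 0$, $H_a: \mu_1 - \mu_2 \ne 0$   or/ $H_0: \mu_1 = \mu_2$, $H_a: \mu_1 \ne \mu_2$
1-sided - right: $H_0: \mu_1 - \mu_2 \le 0$, $H_a: \mu_1 - \mu_2 > 0$   or/ $H_0: \mu_1 \le \mu_2$, $H_a: \mu_1 > \mu_2$
1-sided - left: $H_0: \mu_1 - \mu_2 \ge 0$, $H_a: \mu_1 - \mu_2 < 0$   or/ $H_0: \mu_1 \ge \mu_2$, $H_a: \mu_1 < \mu_2$
Assumption: $\sigma_1^2 = \sigma_2^2$
test statistic: $t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{s_p^2}{n_1} + \frac{s_p^2}{n_2}}}$
$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$   Ex #9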
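Differences between means - dependent samples   Page 12
LOS i (identify, interpret)
- samples have something in common; sample sizes are equal (i.e. $n_1 = n_2$)
- arrange data in pairs (paired observations)
- calculate a new variable $d_i = x_{1i} - x_{2i}$
and $\bar{d} = \frac{\sum d_i}{n}$ and $s_{\bar{d}} = s_d/\sqrt{n}$, where n = # of paired observations
$H_0: \mu_d = \mu_{d0}$, $H_a: \mu_d \ne \mu_{d0}$   or $H_0: \mu_d \le \mu_{d0}$, $H_a: \mu_d > \mu_{d0}$   or $H_0: \mu_d \ge \mu_{d0}$, $H_a: \mu_d < \mu_{d0}$
test-statistic: $t = \frac{\bar{d} - \mu_{d0}}{s_d/\sqrt{n}}$   Example 10, 11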
Page 13
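LOS j (identify, interpret)
Tests of Variances
1/ Single Variance - independent observations from a normally distributed pop.
- chi-square tests are sensitive to violations of these assumptions
- the $\chi^2$ distribution starts at 0 and is not symmetrical, ∴ critical values are also not symmetrical
= CHISQ.INV(lower p,df): lower critical value
= CHISQ.INV(upper p,df): upper critical value
2 sided: $H_0: \sigma^2 = \sigma_0^2$, $H_a: \sigma^2 \ne \sigma_0^2$
1 sided: left: $H_0: \sigma^2 \ge \sigma_0^2$, $H_a: \sigma^2 < \sigma_0^2$; right: $H_0: \sigma^2 \le \sigma_0^2$, $H_a: \sigma^2 > \sigma_0^2$
test-statistic = $\chi^2 = \frac{(n - 1)s^2}{\sigma_0^2}$   example #12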
Page 14
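Tests of Variances (e.g. compare volatility of 2 funds)   LOS j (identify, interpret)
2/ Equality of 2 Variances
2 sided: $H_0: \sigma_1^2 = \sigma_2^2$, $H_a: \sigma_1^2 \ne \sigma_2^2$   or/ $H_0: \sigma_1^2/\sigma_2^2 = 1$, $H_a: \sigma_1^2/\sigma_2^2 \ne 1$
right-tailed: $H_0: \sigma_1^2 \le \sigma_2^2$, $H_a: \sigma_1^2 > \sigma_2^2$   or/ $H_0: \sigma_1^2/\sigma_2^2 \le 1$, $H_a: \sigma_1^2/\sigma_2^2 > 1$
left-tailed: $H_0: \sigma_1^2 \ge \sigma_2^2$, $H_a: \sigma_1^2 < \sigma_2^2$   or/ $H_0: \sigma_1^2/\sigma_2^2 \ge 1$, $H_a: \sigma_1^2/\sigma_2^2 < 1$
test-statistic: $F = s_1^2/s_2^2$, with the larger variance in the numerator
$df_1 = n_1 - 1$, $df_2 = n_2 - 1$
- the F distribution starts at 0 and is not symmetrical; a two-sided test ∴ needs 2 critical values
left: = F.INV(lower p,df1,df2)   right: = F.INV(upper p,df1,df2)
Example 13/14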
Page 15
LOS k (compare, contrast, describe)
Parametric Testing: sample stats are used to test pop. parameters; relies on distributional assumptions
Non-parametric testing: no parameters tested; no distributional assumptions
Use non-parametric tests:
1/ when data do not meet distributional assumptions, i.e. n < 30, pop. is non-normal
2/ when there are outliers
- test of median instead of mean
3/ when data are given in ranks or use an ordinal scale (ordered categorical; NOIR: nominal, ordinal, interval, ratio)
4/ when hypotheses do not concern a parameter
e.g. Is a sample random?
Page 16
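Tests of Correlation   LOS L (explain, determine)
1/ Parametric test
2-sided: $H_0: \rho = 0$, $H_a: \rho \ne 0$
one-sided: left: $H_0: \rho \ge 0$, $H_a: \rho < 0$; right: $H_0: \rho \le 0$, $H_a: \rho > 0$
- recall $r = \frac{\mathrm{Cov}(X, Y)}{s_X s_Y}$, called a Pearson correlation or a Bivariate correlation
test-statistic: $t = \frac{r\sqrt{n - 2}}{\sqrt{1 - r^2}}$, df = n - 2
- in testing $H_0$, as n ↑, $H_0$ is rejected for even small correlations
- big data sets, almost any r will be significant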
e.g./ =. . √ ,
, = ~ vs. critical = .
= , √ −.
ex #16
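The effect of sample size on the correlation test can be seen with a small sketch. The values r = 0.3, n = 10, and n = 1,000 are hypothetical and chosen only to illustrate the point.

```python
import math

def correlation_t_statistic(r, n):
    """t-statistic and df for H0: rho = 0, given sample correlation r and n pairs."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2), n - 2

# Same correlation, two hypothetical sample sizes
t_small, df_small = correlation_t_statistic(0.3, 10)
t_large, df_large = correlation_t_statistic(0.3, 1000)
print(round(t_small, 4), df_small)  # modest t-statistic with few observations
print(round(t_large, 4), df_large)  # much larger t-statistic with many observations
```

With the same r, the larger sample produces a far larger t-statistic, which is why small correlations become statistically significant in big data sets.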
Page 17
Tests of Correlation    LOS L - explain, determine
2/ Non-parametric test
- used if the normality assumption for $X$ or $Y$ is violated, or outliers are present
Spearman rank correlation coefficient ($r_s$)
- basically a correlation, but calculated on rank values and not the values of the observations of $X$ or $Y$
1/ Rank all $X$ from largest to smallest
   - assign a rank: 1 = largest … n = smallest
   - for a tie: assign all the same average rank
     e.g. 3 tied for 6th: (6 + 7 + 8)/3 = 7 - each get a rank of 7
   - repeat for $Y$
2/ On original data set (pre-ranked): calculate $d_i = (\text{rank } X_i - \text{rank } Y_i)$, where $X_i$ and $Y_i$ are the original paired obs.
3/ $r_s = 1 - \dfrac{6\sum d_i^2}{n(n^2 - 1)}$    (white text example)
test $H_0$: if $n > 30$, $t = \dfrac{r_s\sqrt{n - 2}}{\sqrt{1 - r_s^2}}$, df = $n - 2$
Example #17
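The ranking steps, including the average-rank rule for ties, can be sketched as follows. This is an illustrative implementation with hypothetical function names and data, not code from the curriculum.

```python
def average_ranks(values):
    """Rank from largest (1) to smallest (n); tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the block while the next value ties with the current one
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg_rank = (pos + 1 + end + 1) / 2  # average of the tied rank positions
        for k in range(pos, end + 1):
            ranks[order[k]] = avg_rank
        pos = end + 1
    return ranks

def spearman_rank_correlation(x, y):
    """r_s = 1 - 6 * sum(d_i^2) / (n (n^2 - 1)), with d_i the rank differences."""
    rank_x, rank_y = average_ranks(x), average_ranks(y)
    n = len(x)
    sum_d_sq = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d_sq / (n * (n ** 2 - 1))

tied_ranks = average_ranks([9, 7, 7, 7, 1])  # three values tie for 2nd place
r_s = spearman_rank_correlation([10, 20, 30, 40, 50], [1, 3, 2, 5, 4])
print(tied_ranks, r_s)
```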
LOS m
Tests of Independence/
- explain
- test if classification types are independent
e.g./
Are growth stocks equally likely to be any size or
are they more likely to be large-cap stocks?
Page 19
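Tests of Independence/ Contingency Table (2-way)    LOS m - explain
non-parametric test of indep. (right-tailed)
$\chi^2 = \sum_{i=1}^{m} \dfrac{(O_{ij} - E_{ij})^2}{E_{ij}}$, df = (r-1)(c-1)
m = # of cells (3 x 3 = 9)
$O_{ij}$ = observed value in each cell
$E_{ij}$ = expected value in each cell
$E_{ij} = \dfrac{(\text{total row } i) \times (\text{total column } j)}{\text{overall total}}$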
=( × )/ = .
=( × )/ = .
.˙.
− ( − . )
= =.
.
Page 20
Tests of Independence/    LOS m - explain
$H_0$: size and type are independent
$H_a$: size and type are not independent
−
= .
critical value = CHISQ.INV(0.95,4) = 9.4877
∴ reject $H_0$
standardized residuals = $\dfrac{O_{ij} - E_{ij}}{\sqrt{E_{ij}}}$; > 0 means more obs. than expected if categories were independent
e.g./ − .
= = .
√ .
more than
− . example #18
= = . expected if
√ . independent
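A minimal sketch of the chi-square test of independence, including expected counts and standardized residuals. The 2 x 2 table below is hypothetical; it is not the size/type table from the example above.

```python
import math

def chi_square_independence(observed):
    """Return (chi-square statistic, df, standardized residuals) for a 2-way table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    chi_sq = 0.0
    std_residuals = []
    for i, row in enumerate(observed):
        residual_row = []
        for j, o_ij in enumerate(row):
            e_ij = row_totals[i] * col_totals[j] / total  # expected count if independent
            chi_sq += (o_ij - e_ij) ** 2 / e_ij
            residual_row.append((o_ij - e_ij) / math.sqrt(e_ij))
        std_residuals.append(residual_row)
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi_sq, df, std_residuals

# Hypothetical observed counts
table = [[30, 10],
         [20, 40]]
chi_sq_stat, table_df, residuals = chi_square_independence(table)
print(round(chi_sq_stat, 4), table_df, residuals)
```

A positive standardized residual marks a cell with more observations than independence would predict; a negative one marks fewer.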
Introduction to Linear Regression
a. describe a simple linear regression model and the roles of the dependent and independent variables in the model;
b. describe the least squares criterion, how it is used to estimate regression coefficients, and their interpretations;
c. explain the assumptions underlying the simple linear regression model, and describe how residuals and residual plots indicate if these assumptions may have been violated;
d. calculate and interpret the coefficient of determination and the F-statistic in a simple linear regression;
e. describe the use of analysis of variance (ANOVA) in regression analysis, interpret ANOVA results, and calculate and interpret the standard error of estimate in a simple linear regression;
f. formulate a null and an alternative hypothesis about a population value of a regression coefficient, and determine whether the null hypothesis is rejected at a given level of significance;
g. calculate and interpret the predicted value for the dependent variable, and a prediction interval for it, given an estimated linear regression model and a value for the independent variable;
h. describe different functional forms of simple linear regression.

Simple Linear Regression
LOS a (2.5p) Simple Linear Regression - describe
LOS b (8.5p) Estimating Parameters - describe
LOS c (7p) Assumptions of SLR - explain, describe
LOS d, e (5.5p) Analysis of Variance - calculate, interpret, describe
LOS f (8p) Hypothesis Testing Coefficients - formulate, determine
LOS g (4p) Prediction & Prediction Intervals - calculate, interpret
LOS h (7p) Functional Forms of LR - describe
Page 1
- Simple Linear Regression (LR): one IV    LOS a - describe
DV - dependent variable - $Y$ - the variable we are seeking to explain
IV - independent variable - $X$ - the explanatory variable
LR assumes a linear relationship between the DV and the IV
Variation of $Y$ = $\sum_{i=1}^{n} (Y_i - \bar{Y})^2$ = SST or total sum of squares
- best guess for $Y$ is $\bar{Y}$, thus if $X$ gives a more accurate estimate of $Y$ than $\bar{Y}$, we say $X$ helps explain $Y$
LOS b - describe
$Y_i = b_0 + b_1 X_i + \varepsilon_i$
  $b_0$ = intercept, $b_1$ = slope coefficient (the regression coefficients)
  $\varepsilon_i$ = error term (residual) - the portion of the DV that cannot be explained by the IV
($Y$ is regressed on $X$)
Page 2
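LOS b - describe
regression: compute a line of best fit that minimizes the sum of the squared deviations between the observed values of $Y$ and the predicted values (the regression line)
i.e. min $\sum_{i=1}^{n} (Y_i - \hat{b}_0 - \hat{b}_1 X_i)^2$ = SSE - sum of the squares error (a.k.a. residual sum of squares)
residuals: $e_i = Y_i - \hat{Y}_i$, where $\hat{Y}_i$ = predicted values of DV
- Note: $e_i = Y_i - \hat{Y}_i$ implies the residual is in the same units of measurement as the DV ($Y$)
- $(\bar{X}, \bar{Y})$ lies on the regression line
$\hat{b}_1 = \dfrac{\text{Cov}(X, Y)}{\text{Var}(X)} = \dfrac{\sum (X_i - \bar{X})(Y_i - \bar{Y})/(n - 1)}{\sum (X_i - \bar{X})^2/(n - 1)} = \dfrac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2}$
- the $(n - 1)$ cancels out
- denominator can never be negative, ∴ sign of $\hat{b}_1$ is determined solely by $\text{Cov}(X, Y)$
- if $\text{Cov}(X, Y) > 0$, $\hat{b}_1 > 0$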
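The least squares estimates can be sketched in a few lines. The intercept uses the fact that the point of means lies on the regression line; the data and function name are hypothetical.

```python
from statistics import mean

def fit_simple_regression(x, y):
    """Least squares intercept and slope for Y = b0 + b1 * X."""
    x_bar, y_bar = mean(x), mean(y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))  # cross products
    s_xx = sum((xi - x_bar) ** 2 for xi in x)                        # never negative
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar  # (x_bar, y_bar) lies on the fitted line
    return intercept, slope

# Hypothetical observations
x_obs = [1, 2, 3, 4, 5]
y_obs = [2, 4, 5, 4, 5]
b0_hat, b1_hat = fit_simple_regression(x_obs, y_obs)
print(b0_hat, b1_hat)
```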
Page 3
Interpreting $\hat{b}_0$ and $\hat{b}_1$    LOS b - describe
$\hat{b}_0$ = the value of $Y$ if $X = 0$; only makes sense if the IV has meaning at $X = 0$
$\hat{b}_1$ = the change in $Y$ for a one unit change in $X$
e.g. ROA(%) = 4.875% + 1.25 CAPEX(%): ROA = 4.875% if CAPEX = 0
if CAPEX ↑ 1 unit (i.e. 1%), then ROA ↑ 1.25%
Data/
cross-sectional - many observations on $X$ & $Y$ for the same time period
time-series - many observations on $Y$ (and sometimes $X$) from different time periods
example #2/3
Page 4
Assumptions/    LOS c - explain, describe
1/ Linearity: the relationship between $X$ & $Y$ is linear in the parameters and neither is multiplied or divided by another regression parameter
   - implies the IV must not be random - if so, there would be no linear relation between $X$ & $Y$
2/ Homoskedasticity: $\text{Var}(\varepsilon_i)$ is the same for all observations (vs. heteroskedastic)
   - a violation indicates the data series may come from 2 different populations (CS) or regimes (TS)
3/ Independence: the pairs $(X_i, Y_i)$ are independent of each other
   ∴ $\varepsilon_i$ is uncorrelated across observations (no serial correlation)
   - needed to correctly estimate the variances of $\hat{b}_0$ and $\hat{b}_1$
4/ Normality: $\varepsilon_i$ is normally distributed
   - required to conduct valid tests of the values of the regression coefficients
example #4
Analysis of Variance/    LOS d - calculate, interpret
Total sum of squares (SST) - total: $\sum (Y_i - \bar{Y})^2$
Sum of squared errors (SSE) - unexplained: $\sum (Y_i - \hat{Y}_i)^2$
Regression sum of squares (SSR) - explained: $\sum (\hat{Y}_i - \bar{Y})^2$
∴ SST = SSE + SSR
or/ total SS = unexplained SS + explained SS
Page 6
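Analysis of Variance/    LOS d - calculate, interpret
Coefficient of Determination ($R^2$) - measures the fraction of the total variation in the DV that is explained by the IV (goodness of fit measure)
- if only 1 IV, square the correlation between IV and DV
- if multiple IVs: total variation in $Y$ = $\sum (Y_i - \bar{Y})^2/(n - 1)$
                   explained variation in $Y$ = $\sum (\hat{Y}_i - \bar{Y})^2/(n - 1)$
$R^2 = \dfrac{\text{explained variation}}{\text{total variation}} = \dfrac{\sum (\hat{Y}_i - \bar{Y})^2}{\sum (Y_i - \bar{Y})^2} = \dfrac{SSR}{SST}$
- $R^2$ measures fit but is not a statistical test
∴ statistical test = F-test
  one IV:       $H_0: b_1 = 0$, $H_a: b_1 \ne 0$
  multiple IVs: $H_0: b_1 = b_2 = \dots = b_k = 0$, $H_a$: at least one $b_j \ne 0$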
Page 7
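Analysis of Variance/    LOS d - calculate, interpret
$F = \dfrac{SSR/k}{SSE/(n - (k + 1))} = \dfrac{MSR}{MSE}$ (MS = mean square)
$k$ = # of slope coefficients, $k + 1$ = # of regression coefficients
df1 = $k$
df2 = $n - k - 1$
LOS e - describe, calculate, interpret
ANOVA table/
Source       Sum of squares   df           Mean square               F
Regression   SSR              k            MSR = SSR / k             F = MSR / MSE
Error        SSE              n - k - 1    MSE = SSE / (n - k - 1)
Total        SST              n - 1
$\sqrt{MSE}$ = SEE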
Page 8
Standard Error of the Estimate (SEE) - a measure of the s.d. of $\varepsilon$    LOS e - describe, calculate, interpret
$SEE = \sqrt{MSE} = \sqrt{\dfrac{\sum (Y_i - \hat{Y}_i)^2}{n - k - 1}} = \sqrt{\dfrac{\sum e_i^2}{n - k - 1}}$
the smaller the SEE, the more accurate the regression
(a.k.a. the standard error of the regression or the root mean square error)
e.g./
critical F = F.INV(.95,1,4) = 7.71
=√ .
Example #5
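The ANOVA quantities for a simple linear regression can be computed directly from the fitted values. The sketch below uses six hypothetical observations (so df2 = 4, matching the F.INV(.95,1,4) = 7.71 critical value quoted above); the data and function name are illustrative assumptions.

```python
import math
from statistics import mean

def anova_simple_regression(x, y):
    """SST, SSE, SSR, R-squared, F-statistic, and SEE for a one-IV regression."""
    n, k = len(x), 1
    x_bar, y_bar = mean(x), mean(y)
    slope = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
             / sum((xi - x_bar) ** 2 for xi in x))
    intercept = y_bar - slope * x_bar
    y_hat = [intercept + slope * xi for xi in x]
    sst = sum((yi - y_bar) ** 2 for yi in y)               # total
    sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # unexplained
    ssr = sum((yh - y_bar) ** 2 for yh in y_hat)           # explained
    msr = ssr / k
    mse = sse / (n - k - 1)
    return {"SST": sst, "SSE": sse, "SSR": ssr,
            "R2": ssr / sst, "F": msr / mse, "SEE": math.sqrt(mse)}

# Hypothetical data with n = 6
anova = anova_simple_regression([1, 2, 3, 4, 5, 6], [2, 4, 5, 4, 5, 7])
print(anova)
```

For this hypothetical data set the F-statistic exceeds 7.71, so the null hypothesis that the slope is zero would be rejected at the 5% level.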
Page 9
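1/ Hypothesis Tests of $b_1$:    LOS f - formulate, determine
test statistic: $t = \dfrac{\hat{b}_1 - B_1}{s_{\hat{b}_1}}$, where $B_1$ = hypothesized value
df = $n - (k + 1)$
standard error of $\hat{b}_1$: $s_{\hat{b}_1} = \dfrac{SEE}{\sqrt{\sum (X_i - \bar{X})^2}}$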
= . ( − ) = .
e.g./
: =
at ∝ = %
: =∅
critical t = T.INV(.975,4) = 2.776 (two-sided, 2.5% in each tail)
−( + )
.
= √ = . =
. −
= . Reject
√ . .
( = , . = . )
Note: : =
√ − . √ SLR
: ≠ = = . Reject
√ − √ −. only
df = −
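2/ Hypothesis Tests of $b_0$:    Page 10    LOS f - formulate, determine
$t = \dfrac{\hat{b}_0 - B_0}{s_{\hat{b}_0}}$ and $s_{\hat{b}_0} = SEE\sqrt{\dfrac{1}{n} + \dfrac{\bar{X}^2}{\sum (X_i - \bar{X})^2}}$
df = $n - (k + 1)$
3/ Hypothesis Tests of $b_1$ if IV is an indicator variable (dummy var.) - takes on a value of 0 or 1
- same process as a test of $b_1$
$\hat{Y} = \hat{b}_0 + \hat{b}_1 X$
- if $X = 0$, $\hat{Y} = \hat{b}_0 = \bar{Y}_0$ = avg. of all 0 obs.
- if $X = 1$, $\hat{Y} = \hat{b}_0 + \hat{b}_1 = \bar{Y}_1$ = avg. of all 1 obs.
$\hat{b}_1 = (\hat{b}_0 + \hat{b}_1) - \hat{b}_0 = \bar{Y}_1 - \bar{Y}_0$ = difference in means
∴ test-stat. for $\hat{b}_1$ = test-stat. of differences in means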
Page 11
Level of Significance and p-values/ LOS f
- most software output ∝ = % , : parameter = 0 - formulate
recall: 2-sided p-value = (1 - T.DIST(+t,df,1)) x 2    example #6 - determine
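Prediction interval (or CI) for $\hat{Y}$:    LOS g - calculate, interpret
$\hat{Y}_f = \hat{b}_0 + \hat{b}_1 X_f$, where $X_f$ = forecast value of the IV
- 2 sources of error: $\hat{b}_0$ and $\hat{b}_1$ are estimated with error, and $Y - \hat{Y} = \varepsilon$ (SEE)
- recall for CI: $\hat{Y}_f \pm t_c \times s_f$
- since there are 2 sources of error and not just one, adjust SEE:
$s_f^2 = SEE^2\left[1 + \dfrac{1}{n} + \dfrac{(X_f - \bar{X})^2}{(n - 1)s_X^2}\right]$ and $s_f = \sqrt{s_f^2}$, where $(n - 1)s_X^2 = \sum (X_i - \bar{X})^2$
(terms: 1 = $SEE^2$, 2 = $1/n$, 3 = $(X_f - \bar{X})^2/((n - 1)s_X^2)$)
1. the better the fit of the regression model: lower SEE, lower $s_f$
2. larger $n$ = smaller $s_f$
3. the closer $X_f$ is to $\bar{X}$: smaller $s_f$
Steps/ Determine $\hat{Y}_f$
       Select ∝
       Determine $t_c$
       Determine $s_f$
       Determine $\hat{Y}_f \pm t_c s_f$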
example #7
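The prediction-interval steps can be sketched as below. The data, the forecast value of 4, and the function name are hypothetical; the critical t of 2.776 is the two-sided 5% value for df = 4 and is passed in rather than computed, since the standard library has no t-distribution inverse.

```python
import math
from statistics import mean

def prediction_interval(x, y, x_forecast, t_critical):
    """Forecast, standard error of the forecast, and interval for one-IV regression."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    s_xx = sum((xi - x_bar) ** 2 for xi in x)  # equals (n - 1) * variance of X
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
    intercept = y_bar - slope * x_bar
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    see = math.sqrt(sse / (n - 2))
    # Adjust SEE for the uncertainty in the estimated coefficients
    s_f = see * math.sqrt(1 + 1 / n + (x_forecast - x_bar) ** 2 / s_xx)
    y_hat = intercept + slope * x_forecast
    return y_hat, s_f, (y_hat - t_critical * s_f, y_hat + t_critical * s_f)

# Hypothetical data with n = 6, so df = n - 2 = 4
y_forecast, s_forecast, (lower, upper) = prediction_interval(
    [1, 2, 3, 4, 5, 6], [2, 4, 5, 4, 5, 7], x_forecast=4, t_critical=2.776)
print(round(y_forecast, 4), round(s_forecast, 4), round(lower, 4), round(upper, 4))
```

Moving the forecast value further from the mean of X widens the interval, which is the third effect listed above.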
Page 13
1/ Log - lin model    LOS h - describe
$Y_i = e^{b_0 + b_1 X_i}$ (e.g. Revenues)
- take the log of both sides: $\ln Y_i = b_0 + b_1 X_i$
- $b_1$ = relative change in $Y$ for an absolute change in $X$ (e.g. growth rate of revenue)
2/ Lin - Log model
$Y_i = b_0 + b_1 \ln(X_i)$
- $b_1$ = absolute change in $Y$ for a relative change in $X$
- used when $Y$ and $X$ are significantly different in scale    exh. #35/36
e.g./ $Y$ = percent, $X$ = billions of dollars in Revenue (transform $X$ with $\ln(X)$)
Page 14
3/ Log - Log model: $\ln Y_i = b_0 + b_1 \ln(X_i)$    LOS h - describe
- $b_1$ = the relative change in $Y$ for a relative change in $X$    exh. #37/38
selecting a model depends on goodness of fit: $R^2$, F-stat, SEE
- a plot of the residuals should show randomness and the distribution should be normal
- if not, consider transforming the DV, IV, or both.
Time Value of Money

Review - 1
r - discount rate
rf = risk-free rate + premiums
rf + inflation premium (1 + rf)(1 + CPI) · liquidity
· default
nominal rate · maturity
moving forward X(1 + r) = ( + ) = ( + ⁄ )
= =
( + ) moving backwards ( + ) ( + ⁄ )
· stated annual interest rate = rs (quoted interest rate)

· effective annual rate (EAR) e.g./ 5% annually
= rs quoted in terms
= +. =( . )
of periods
= . q
(EAR)
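Review - 2
- continuous compounding: growth factor $e^{rt}$ or $e^{rn}$; $EAR = e^{r_s} - 1$
- discrete compounding: $EAR = (1 + r_s/m)^m - 1$
Annuity - finite, level, sequential CFs
ordinary - first CF at T1
annuity due - first CF at T0
Perpetuity: · infinite · ordinary · level · sequential
$FV_N = A\left[\dfrac{(1 + r)^N - 1}{r}\right]$ · ord. ann. (term in brackets = FV annuity factor)
$FV_N = A\left[\dfrac{(1 + r)^N - 1}{r}\right] \times (1 + r)$ · ann. due
Review - 3
PV of an annuity/
$PV = A\left[\dfrac{1 - (1 + r)^{-N}}{r}\right]$ · ord. ann.
$PV = A\left[\dfrac{1 - (1 + r)^{-(N - 1)}}{r}\right] + A$ · ann. due
$PV = \dfrac{A}{r}$ - PV of a perpetuity
Solving for r: $r = \left(\dfrac{FV_N}{PV}\right)^{1/N} - 1$; r - also called growth rate
Solving for N: $N = \dfrac{\ln(FV_N/PV)}{\ln(1 + r)}$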
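The time value of money formulas translate directly into small functions. This is a sketch with hypothetical function names and inputs; the annuity-due values are obtained by multiplying the ordinary-annuity values by (1 + r).

```python
import math

def effective_annual_rate(stated_rate, periods_per_year):
    """EAR for a stated annual rate compounded m times per year."""
    return (1 + stated_rate / periods_per_year) ** periods_per_year - 1

def ear_continuous(stated_rate):
    """EAR under continuous compounding."""
    return math.exp(stated_rate) - 1

def fv_ordinary_annuity(payment, r, n):
    return payment * ((1 + r) ** n - 1) / r

def pv_ordinary_annuity(payment, r, n):
    return payment * (1 - (1 + r) ** -n) / r

def pv_perpetuity(payment, r):
    return payment / r

def solve_for_n(pv, fv, r):
    """Number of periods for pv to grow to fv at rate r."""
    return math.log(fv / pv) / math.log(1 + r)

ear_monthly = effective_annual_rate(0.06, 12)   # 6% stated, monthly compounding
fv_single = 1000 * (1 + 0.05) ** 40             # single sum, 40 years at 5%
pv_ord = pv_ordinary_annuity(100, 0.05, 10)
pv_due = pv_ord * (1 + 0.05)                    # annuity due: first CF at T0
print(round(ear_monthly, 6), round(fv_single, 2), round(pv_ord, 2), round(pv_due, 2))
```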
Review - 1
LOS a - identify/compare
I/ N - nominal - no logical ordering

Categorical
O - ordinal - ordered data
I - integer - discrete
Numerical
R - ratio - continuous
II/ cross-sectional - multiple observations of a variable

time-series - multiple observations of the same unit over time
panel data - cross-sectional x time-series
III/ Structured - organized data produced by

Unstructured - no organized form (alternative data individuals
(must be transformed into business processes
structured data) sensors
Review - 2
LOS b - describe/
one dimensional array - column or row of a spreadsheet
two dimensional rectangular array - two or more variables
(data table) rows x columns
LOS c - interpret/
frequency distribution one way table
- # of obs./variable
absolute frequency - actual count max - min
relative frequency - %’age of obs./variable
- for numerical data place each obs. in an interval (range/k)
non-overlapping
absolute
cumulative adds up frequencies
relative
Review - 3
LOS d - interpret/
Contingency table - for categorical data
joint frequencies (r x c entry)
marginal frequencies (r or c totals)
Applications/
Confusion matrix - assess precision of a classification model
T T
- prediction vs. actual
F F
test potential association between 2 variables
( )
- chi-square test of independence =∑
LOS e - describe/evaluate/
Histogram and frequency polygon
- represents distribution of numerical data
Bar chart represents distribution of categorical data
(pareto chart, grouped/clustered bar chart, stacked bar chart)
Review - 4
LOS e - describe/evaluate/
Tree Map - set of coloured rectangles to represent groups
- nested rectangles = more categories
Word Cloud - frequency of unstructured data (text)
Line Chart - typically for time series data (trend analysis)
Scatter Plot - visualize joint variation in 2 numerical variables
Heat Map - contingency table with colour-coded cells
LOS f - describe/
Relationship Comparison
over time
Scatter Heat among categories line chart (2 vars)
Plot Map bar chart bubble line
(2 vars) multiple tree map chart (3 vars)
vars
Scatter Plot Heat Map
Matrix
Review - 5
LOS f - describe/ numerical data histogram
Distribution frequency polygon
cumulative distribution chart
unstructured data categorical data
word cloud bar chart, tree map, heat map
Do not: select an improper chart type
select data that favours a conclusion
truncate or extend the range of an axis
LOS g - calculate/interpret/
measures of central tendency
1/ Arithmetic mean: $\bar{X} = \dfrac{\sum X_i}{n}$ ⇒ $\sum (X_i - \bar{X}) = 0$
sensitive to outliers - do nothing
                      - delete - trimmed mean (5% trimmed - delete top & bottom 2.5%)
                      - replace - winsorized mean (95% winsorized - replace all top/bottom 2.5% of obs.)
LOS g - calculate/interpret/    Review - 6
2/ Median - middlemost value
odd # of obs: median = value at position $(n + 1)/2$
even # of obs: median = mean of the values at positions $n/2$ and $(n + 2)/2$
not affected by extreme values
useful with non-symmetrical distributions
3/ Mode - most frequently occurring value
unimodal, bi-modal, etc.
- no mode = uniform distribution
Symmetrical distribution: mean = median = mode
4/ weighted mean: $\bar{X}_w = \sum w_i X_i$ where $\sum w_i = 1$
   or $\sum p_i = 1$ (probabilities)
5/ geometric mean: $\bar{X}_G = \sqrt[n]{X_1 X_2 \dots X_n}$ or $(X_1 X_2 \dots X_n)^{1/n}$
   or $R_G = [(1 + R_1)(1 + R_2) \dots (1 + R_n)]^{1/n} - 1$
LOS g - calculate/interpret/    Review - 7
5/ geometric mean: used with rates of change over time or to compute growth rates
≤ and = −
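forecasting next period returns: use the arithmetic mean
forecasting over multiple periods: use the geometric mean
6/ Harmonic mean: $\bar{X}_H = \dfrac{n}{\sum (1/X_i)}$ - gives much less weight to outliers
- used when a fixed amount is applied to a variable quantity
LOS h - evaluate/
arithmetic × harmonic ≈ geometric² and arithmetic > geometric > harmonic
arithmetic: one-period return; geometric: compounding; harmonic: minimize outliers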
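A short sketch comparing the three means. The price and return figures are hypothetical, and for exactly two observations the product of the arithmetic and harmonic means equals the squared geometric mean.

```python
import math

def arithmetic_mean(values):
    return sum(values) / len(values)

def geometric_mean(values):
    """n-th root of the product; values must be positive."""
    return math.prod(values) ** (1 / len(values))

def harmonic_mean(values):
    return len(values) / sum(1 / v for v in values)

def geometric_mean_return(returns):
    """Compound growth rate implied by a series of periodic returns."""
    return math.prod(1 + r for r in returns) ** (1 / len(returns)) - 1

prices = [10.0, 20.0]            # hypothetical purchase prices
returns = [0.10, -0.05, 0.20]    # hypothetical annual returns
a_mean = arithmetic_mean(prices)
g_mean = geometric_mean(prices)
h_mean = harmonic_mean(prices)
r_geo = geometric_mean_return(returns)
print(a_mean, round(g_mean, 4), round(h_mean, 4), round(r_geo, 4))
```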
LOS i - calculate/interpret/ Review - 8

measures of location
Quantiles Quartiles, Quintiles, Deciles, Percentiles
=( + ) - with data in ascending order
location Ly is an integer - done

Ly is a decimal linear interpolation
Visualization
Box and whisker plot
upper
= (1.5 x 1QR)
fence
+ upper
bound
Box & whisker plot

lower fence = lower bound - (1.5 x 1QR)
Review - 9
LOS j - calculate/interpret/
measures of absolute dispersion
1/ Range: max value - min value
- only uses 2 observations
- no information about the distribution
2/ MAD - mean absolute deviation: $MAD = \dfrac{\sum |X_i - \bar{X}|}{n}$ - uses all the observations
3/ Variance and standard deviation
$s^2 = \dfrac{\sum (X_i - \bar{X})^2}{n - 1}$ - measured in units squared; $n - 1$ = degrees of freedom
$s = \sqrt{s^2}$ - measured in same units as the mean
4/ Target Downside Deviation
$s_{Target} = \sqrt{\dfrac{\sum_{\forall X_i \le B} (X_i - B)^2}{n - 1}}$ - measure of dispersion below the target $B$; $n$ = full dataset N
LOS k - calculate/interpret/ Review - 10

measures of relative dispersion
CV - Coefficient of Variation = where >
- allows for direct comparison of dispersion across different
data sets
LOS l - interpret/
Skew:
∑( − )
×
LOS m - interpret
Kurtosis:
∑( − )
= −
Review - 11
LOS n - interpret
Covariance - joint variability of 2 random variables
$s_{XY} = \dfrac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{n - 1}$
Correlation: measures the linear association between 2 variables
$r_{XY} = \dfrac{s_{XY}}{s_X s_Y}$, $-1 \le r_{XY} \le +1$
r = 0 no linear association - max diversification
r = -1 perfect negative correlation - perfect hedge
r = +1 perfect positive correlation - perfect replication
Linear association only

sensitive to outliers
correlation does not imply causation
Review - 1
LOS a - define/
Random variable - a quantity whose future outcomes are uncertain
Outcome - a possible value of a random variable (RP)
(4.3%)
Event - a specified set of outcomes (> %, ≤ %)
LOS b - identify/compare/contrast/
Property 1: ≤ ( )≤ where are
i) mutually exclusive
Property 2: ∑ ( ) =
ii) exhaustive
Empirical probability based on historical observation (past is assumed

to be representative of the future
Subjective probability based on experience or intuition
A priori probability based on deductive reasoning
LOS c - describe/    Review - 2
Odds for: $\dfrac{P(E)}{1 - P(E)}$, expressed as a to b; probability = $\dfrac{a}{a + b}$
Odds against: $\dfrac{1 - P(E)}{P(E)}$, expressed as b to a; probability = $\dfrac{a}{a + b}$
LOS d - calculate/interpret/
$P(A)$ - unconditional probability
$P(A|B)$ - conditional probability: $P(A|B) = \dfrac{P(AB)}{P(B)}$
$P(AB)$ - joint probability (A ∧ B)
LOS e - demonstrate/
Multiplication Rule: $P(AB) = P(A|B)P(B)$
Addition Rule: $P(A \text{ or } B) = P(A) + P(B) - P(AB)$, where $P(AB) = P(A|B) \times P(B)$
Review - 3
LOS f - compare/contrast/
Independent event - 2 events are independent iff:
$P(A|B) = P(A)$ ∴ $P(AB) = P(A)P(B)$
Dependent event - $P(A)$ is related to $P(B)$
Total Probability Rule: $P(A) = P(A|S_1)P(S_1) + \dots + P(A|S_n)P(S_n)$
[Figure: probability tree linking unconditional probabilities, conditional probabilities, the multiplication rule, and the total probability rule]
LOS h - calculate/interpret/    Review - 4
Expected Value: the probability-weighted average of the possible outcomes
$E(X) = \sum P(X_i) X_i$, where $P(X_i)$ = probability and $X_i$ = value of outcome
Expected variance: also probability-weighted
$\sigma^2(X) = \sum P(X_i)[X_i - E(X)]^2$ (probability × squared deviation)
LOS i - explain/
Conditional expected value: $E(X|S) = \sum P(X_i|S) X_i$ (probability × value of $X_i$)
LOS j - interpret/ ( | ) X1
= ( | ) + ( )
demonstrate/
B ( | )
X2 = ( ) + ( )
A ( | ) X3
C ( | ) = ( ) ( | )+ ( ) ( )
X4
1/ ( )= ( ) and =
2/ ( )= cross product of
the deviations
∑ − −
=
−
n = 3 variances all > 0

- assume 3 assets:
( )= ( )+ ( )+ ( )
(by selecting
assets with zero or + ( )+ ( )
negative covariance, + ( )
portfolio risk is >
(n2 - n) = 6 Covariances
lowered) ≤
(n - n)/2 = 3 unique covariances
2

3/ Correlation:
= or =
LOS L - calculate/interpret/ value of the cross product

of the
( )= − − deviations
- where i and j
joint probability
are scenarios
LOS m - calculate/interpret
Bayes' Formula - a method for updating prior probabilities based on new information
$P(A|B)P(B) = P(B|A)P(A)$ ∴ $P(A|B) = \dfrac{P(B|A)P(A)}{P(B)}$
where $P(B) = P(B|A)P(A) + P(B|A^C)P(A^C)$
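Bayes' formula for a two-outcome prior can be sketched as below. The prior and likelihoods are hypothetical numbers chosen for illustration.

```python
def bayes_posterior(prior, likelihood_given_event, likelihood_given_not_event):
    """P(event | info) from the prior P(event) and the two conditional likelihoods."""
    # Total probability rule gives the unconditional probability of the information
    p_info = (likelihood_given_event * prior
              + likelihood_given_not_event * (1 - prior))
    return likelihood_given_event * prior / p_info

# Hypothetical inputs: P(event) = 0.4, P(info | event) = 0.7, P(info | not event) = 0.2
posterior = bayes_posterior(0.4, 0.7, 0.2)
print(posterior)
```

The new information raises the probability of the event from the prior because the information is more likely when the event occurs than when it does not.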
LOS n - identify/analyze/
Review - 7
1/ Multiplication - e.g. (2 x 4 x 3) combinations across categories such as industries and sizes
2/ Factorial: $n!$ - the number of ways to fill $n$ positions
3/ Multinomial: $\dfrac{n!}{n_1!\, n_2! \dots n_k!}$ - the number of ways $n$ objects can be labelled w/ k labels (k ≥ 3), each of which has $n_j$ objects
4/ Combination: $\dfrac{n!}{(n - r)!\, r!}$ - the number of ways of selecting $r$ objects from $n$ where order does not matter (k = 2)
5/ Permutation: $\dfrac{n!}{(n - r)!}$
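The counting rules map onto Python's standard `math` functions (`math.comb` and `math.perm` require Python 3.8 or later). The `multinomial` helper and the example counts are hypothetical.

```python
import math

def multinomial(n, group_sizes):
    """Ways to label n objects with k labels, label j used group_sizes[j] times."""
    if sum(group_sizes) != n:
        raise ValueError("group sizes must sum to n")
    result = math.factorial(n)
    for size in group_sizes:
        result //= math.factorial(size)
    return result

ways_multiplication = 2 * 4 * 3                # multiplication rule
ways_factorial = math.factorial(5)             # fill 5 positions
ways_multinomial = multinomial(6, [3, 2, 1])   # 6 objects, 3 labels
ways_combination = math.comb(5, 2)             # choose 2 of 5, order ignored
ways_permutation = math.perm(5, 2)             # choose 2 of 5, order matters
print(ways_multiplication, ways_factorial, ways_multinomial,
      ways_combination, ways_permutation)
```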
LOS a - define/compare/contrast/ Review - 1
Probability distribution specifies the probabilities associated with

the possible outcomes of a random variable
Random variable a quantity whose future outcomes are uncertain
discrete - a countable # of possible values
continuous - infinite and uncountable
Probability function specifies the probabilities that a random variable
can take
p(x) - discrete, f(x) - continuous
properties: ≤ ( )≤ and ∑ ( )=
LOS b - calculate/interpret/
cumulative distribution function (cdf) gives the probability that a
variable is less than or equal to a particular value
( )= ( ≤ )
LOS c - describe/calculate/interpret/ Review - 2

Discrete uniform distribution ( )= , 0 otherwise
−
LOS d - describe/calculate/interpret
Continuous uniform distribution ( )= for all ≤ ≤
−
−
( )=
−
LOS e - describe/calculate/interpret
Bernoulli random variable the outcome of any binary trial
= success ( − ) = failure
- in trials, the number of successes is a binomial random variable
(# of successes in Bernoulli trials)
assumptions: is constant and trials are independent
~ ( , ) - described by 2 parameters
Review - 3
LOS e - describe/calculate/interpret
! - if p = .50, the distribution will
( )= ( − )
( − )! ! be symmetrical
- for ( ≤ )= ( )+ ( − )+⋯ ( )
mean variance
cdf
Bernoulli ( − ) 100%
Binomial ( − ) pdf
LOS f - explain 50% 50%

0
1/ described by 2 parameters, and
2/ skew = 0, kurtosis = 3, mean = median = mode
3/ a linear combination of 2 or more normal random variables
is also normally distributed
- but now multivariate
Review - 4
LOS g - explain/contrast/
- multivariate normal distribution completely described by
3 lists of variables
1/ all the means ( )
2/ all the variances ( )
3/ all pairwise correlations ( − )/ unique corrs.
LOS h - calculate/ + LOS j - calculate/interpret/ probability

= NORM. S. DIST (z-value, 1) = probability
= NORM. S. INV (probability) = z-value
z-value
+/− ~ % +/− . ~ % +/− . ~ %
LOS i - explain/
− =1
= for all in a sample/data set
0
- produces a standard normal distribution

LOS k - define/identify/    Review - 5
Safety first rules focus on shortfall risk - the risk a portfolio value (or return) will fall below some minimum acceptable level over some time horizon
$SFRatio = \dfrac{E(R_P) - R_L}{\sigma_P}$ (a z-value) - objective is to max. this ratio
- optimal portfolio minimizes N(-SFRatio)
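A sketch of the safety-first comparison, assuming normally distributed returns. The two portfolios and the 2% threshold are hypothetical.

```python
from statistics import NormalDist

def safety_first_ratio(expected_return, threshold_return, std_dev):
    """Distance from the expected return to the threshold, in standard deviations."""
    return (expected_return - threshold_return) / std_dev

def shortfall_probability(sf_ratio):
    """N(-SFRatio): probability of falling below the threshold under normality."""
    return NormalDist().cdf(-sf_ratio)

# Hypothetical portfolios with a 2% minimum acceptable return
sf_a = safety_first_ratio(0.10, 0.02, 0.16)
sf_b = safety_first_ratio(0.08, 0.02, 0.10)
best = "B" if sf_b > sf_a else "A"
print(sf_a, round(shortfall_probability(sf_a), 4))
print(round(sf_b, 4), round(shortfall_probability(sf_b), 4), best)
```

The portfolio with the higher ratio has the lower shortfall probability, even though its expected return is lower.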
LOS L - explain/
- commonly used to model prob. dist. of asset prices
right
- a variable Y follows a lognormal dist. if
skewed
LN(Y) is normally dist.
0 ∞
- described by the and of its associated normal distribution
LOS m - calculate/interpret/
( / )= + but ( / )= continuously compounded return
normally distributed
Review - 6
LOS m - calculate/interpret/
~ ( - both
, &
) scale linearly with time
- assumptions: returns are 1/ independent
2/ identically distributed ( & do
not change)
Volatility the annualized sd of the continuously
compounded daily returns of the underlying asset
LOS n - describe/calculate/interpret/
Student’s t-dist. - defined by a single parameter

normal df = −
- as ↑, t-dist. approaches z-dist.
more less
weight t −
weight =
/√
t-tests are more conservative, produce wider confidence intervals
Review - 7
LOS o - describe, calculate, interpret/ low df

- the distribution of variance, df = −
- bounded below by 0 high df
F-distribution ratio of / −
= df =
2 variables / − − , −
Excel function calls:

NORM.S.DIST(z,1)                  NORM.S.INV(prob)
T.DIST(t,df,1)                    T.INV(prob,df)
CHISQ.DIST($\chi^2$ value,df,1)   CHISQ.INV(prob,df)
F.DIST(F,df1,df2,1)               F.INV(prob,df1,df2)
Review - 8
LOS p - describe/
Step 1 Specify the quantity of interest

2 Specify time steps ∆
3 Specify distributional assumptions for the key risk factors
4 draw random values from each distribution for each ∆
times 5 convert the random values to the quantity of interest
6 Calculate a new value of the quantity of interest
- provides statistical estimates only, not exact results

- does not support cause
LOS a-c compare/contrast/explain/ Review - 1
A/ Probability sampling - every observation has an equal probability of

selection
- produce more representative samples
1/ Simple random sample - observations chosen at random
2/ Systematic sampling - every observation
3/ Stratified sampling random samples drawn from sub-populations
in proportion to the size of the sub-pop.
4/ Cluster sampling - pop. is divided into clusters (mini-populations)
- certain clusters are selected as a whole
one-stage cluster sampling
- sub-samples are selected from each cluster
two-stage cluster sampling
usually results in lowest precision but is both time and
cost efficient
LOS a-c compare/contrast/explain/ Review - 2
B/ Non-probability sampling
5/ Convenience sampling - observations that are easy to
obtain or are accessible
6/ Judgmental sampling - select observations based on
experience and knowledge
samples may not be representative
LOS d, e explain, calculate, interpret/

higher sampling error:
the sampling
= /√
distribution of
the sample
sd ≠ decreases as
means
increases
constant
measure of
measure
precision
of dispersion
LOS f - identify/describe/ Review - 3
Desirable properties of an estimator

1/ Unbiasedness - the expected value equals the parameter it
is intended to estimate
2/ Efficiency - no other unbiased estimator has a sampling
distribution with smaller variance
3/ Consistency - as $n$ increases, the precision increases
LOS g, h contrast/calculate/interpret/
Confidence Intervals: Point estimate +/- Reliability Factor x Standard Error
1/ Normally Dist. Pop., Known Variance: $\bar{X} \pm z_{\alpha/2}\, \sigma/\sqrt{n}$
2/ Large Sample, Variance Unknown: $\bar{X} \pm z_{\alpha/2}\, s/\sqrt{n}$
3/ Variance Unknown: $\bar{X} \pm t_{\alpha/2}\, s/\sqrt{n}$
Review - 4
LOS g, h contrast/calculate/interpret/
< >
normal dist., known z z
normal dist., unknown t t or z
practice uses t
non-normal dist., known N/A z
non-normal dist., unknown N/A t or z
to obtain a desired width for a CI: = where = / width
LOS i - describe/
Resampling - repeatedly drawing samples from a given sample
1/ Bootstrap method - sampling with replacement
   sample → draw an observation → record → replace (repeat)
   standard error: $s_{\bar{X}} = \sqrt{\dfrac{1}{B - 1}\sum_{b=1}^{B} (\hat{\theta}_b - \bar{\theta})^2}$, $B$ = # of resamples
Review - 5
LOS i - describe/
1/ Bootstrap method - can also find the standard error of an estimator even when no analytical formula is available
2/ Jackknife method - omit 1 observation from a sample, 1 at a time
   e.g. $n = 3$: sample 1 omits obs. 1, sample 2 omits obs. 2, sample 3 omits obs. 3
   - will produce similar results from sample to sample
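Both resampling methods can be sketched with the standard library. The sample, the number of resamples, and the seed are hypothetical choices; the bootstrap standard error is the standard deviation (B - 1 divisor) of the resampled estimates.

```python
import random
from statistics import mean, stdev

def bootstrap_standard_error(sample, estimator, n_resamples=1000, seed=0):
    """Standard error of an estimator from resamples drawn with replacement."""
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    estimates = []
    for _ in range(n_resamples):
        resample = [rng.choice(sample) for _ in sample]  # draw, record, replace
        estimates.append(estimator(resample))
    return stdev(estimates)

def jackknife_estimates(sample, estimator):
    """Estimator recomputed n times, leaving out one observation each time."""
    return [estimator(sample[:i] + sample[i + 1:]) for i in range(len(sample))]

data = [1, 2, 3, 4, 5]  # hypothetical sample
boot_se = bootstrap_standard_error(data, mean)
jack = jackknife_estimates(data, mean)
print(round(boot_se, 3), jack)
```

Unlike the bootstrap, the jackknife involves no random draws, so it returns the same estimates every time it is run on a given sample.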
LOS j describe/
1/ Data snooping bias (aka data mining) - searching a data set
for statistically significant patterns
- typically will not be theory-driven
- will lack an economic rationale
Review - 6
LOS j describe/
minimize: training data validation test out-of-sample
set data set data test to evaluate
evaluate fit
model fit
build and fit
a model and fine tune
2/ Sample selection bias - exclude some observations or time periods

e.g./ survivorship bias equity indexes exclude companies that
failed
self-report bias hedge fund indexes typically don’t have
poor performers
both overstate results
3/ Look ahead bias - using information that was not available on the
observation date
4/ Time period bias - results of one time period may be specific to that
time period
Hypothesis Testing
LOS a - define, describe, interpret/ Review - 1
hypothesis a statement about one or more populations that are

tested using sample statistics
Process: 1. State the hypothesis

2. Identify the appropriate test statistic
3. Specify the level of significance
4. State the decision rule
5. Collect data and calculate the test statistic
6. Make a decision
#1/
- null assumed to be true unless rejected (contains the equality
- alternative typically want to reject sign)
LOS b - compare/contrast/ left-sided right-sided

2 sided : = : ≥ : ≤
: ≠ : < : >
Review - 2
LOS c - explain/
test statistic a calculated value of a distribution to compare
to a critical value of the distribution in order
to test
- as ∝ ↓, ↑
negative DNR Correct (false)
−∝ Type II - to decrease both, increase
positive R ∝ Correct
Type I ( − )
(false) Power of a test
LOS d - explain/determine/
Reject : 2 sided test-stat > critical value
left-side test-stat < critical value
right-side test-stat > critical value
Review - 3
LOS d - explain/determine/
$\bar{X} \pm$ critical value $\times\ s/\sqrt{n}$ (confidence interval: lower bound, upper bound)
- if the CI around $\bar{X}$ contains $\mu_0$, Do not Reject $H_0$
- if there is statistical support, reject $H_0$
- something statistically significant may not be economically
significant.
LOS e - explain/interpret/
p-value the area in the probability distribution outside the
calculated test statistic
2-sided = (1 - NORM.S.DIST(+z,1)) x 2 or (1 - T.DIST(+t,df,1)) x 2
- if p-value < ∝ , Reject
Review - 4
LOS f - describe/
- multiple testing problem in repeated tests with a level
of significance ∝, expect ∝ false positives
- rank all p-values highest to lowest

yes, go to next p.
- for each one, assess if ( ) ≤∝
# no, stop. ( )=
= adjusted # of FPs
LOS g - identify/interpret/
- test of a single mean
< ≥
population known
approx. normal unknown or
typically
= − = − or = −
/√ /√ /√
Review - 5
LOS h - identify/interpret/
- Differences between means - independent samples
= assumed
=( − )−( − ) =( − ) +( − )
+ −
+
df
: = : ≤ : ≥
: ≠ : > : <
LOS i - identify/interpret/
- Differences between means - dependent samples
- arrange data in pairs, calculate ( − ) ,
∑
=
− : = left right
= df = −
/√ ≥ ≤
: ≠ < >
Review - 6
LOS j - identify/interpret/
left right
Single Variance : = ≥ ≤
: ≠ < >
( − )
= df = − note: must determine each critical
value
left right
Equality of 2 Variances : = ≥ ≤
: ≠ < >
= / df1 = − , df2 = − and is the larger

variance
LOS k - compare/contrast/describe/
non-parametric testing
1/ when data do not meet distributional assumptions
2/ when there are outliers
3/ when data are given in ranks or use an ordinal scale
4/ hypothesis do not concern a parameter
Review - 7
LOS L - explain/determine left right
Test of correlation parametric test : = ≥ ≤
: ≠ < >
√ −
= df = − Pearson or bivariate
√ −
correlation
- as ↑, even small will be significant
Non-parametric test - Spearman rank correlation coefficient

rank all obs. on X & Y from largest to smallest
calculate =( − )
= ∑ √ −
− test = df = −
( − )
−
Review - 8
LOS m - explain/ Tests of Independence
−
non-parametric test = df = ( − )( − )
: categories are independent ×

=
: categories are not independent
right-tail test only
standardized residuals = − > means more obs. than

expected if categories
were independent
< - fewer obs.

Simple Linear Regression
LOS a - describe/ Review - 1
DV - dependent variable - seeking to explain

IV - independent variable - explanatory variable
- best guess for is , so if gives a more accurate estimate

of , we say helps explain
( − ) = total sum of squares
LOS b - describe/ = + + residual/error term
intercept slope
regression coefficients
- regression computes a line of best fit among ( , ) pairs that

minimizes
SSE - sum of squared errors ∑ =
=
Review - 2
LOS b - describe/ = −
( , )
= - denominator can never be neg.
( ) ∴ sign of determined by ( , )
the change in for if > , >
a 1 unit change in
cross-sectional data - many ( , ) pairs for the same time period

time-series data - many (or as well) obs. over time
LOS c - explain/describe/
1/ Linearity - relationship between and is linear in the parameters
- IV must not be random, i.e. ( | )
2/ Homoskedasticity - ( ) is constant for all observations
3/ Independence - ( , ) pairs are independent of each other
- is uncorrelated across observations
Review - 3
LOS c - explain/describe/
4/ Normality - is normally distributed
LOS d - calculate/interpret/
∑( − ) SST Total sum of squares
sum of squared
errors SSE
∑ − + ∑ − SSR - regression sum of
squares
Coefficient of Determination = =
=
- measures fit
slope
coeff.
F-test = / / : =
= =
/ / −( + ) : ≠
reg. coeff.
LOS e - describe/calculate/interpret/
SSR / k = MSR = F-stat.

SSE / n-k-1 = MSE standard error
SST √ = = of the estimate
Review - 4
LOS e - describe/calculate/interpret/
SEE = ∑ / - the smaller the SEE, the more
accurate the regression
−
LOS f - formulate/determine/
− : =
= = =
: ≠
∑ ( − )
√ −
= - produces the same result.
√ −
− = +
= ∑ ( − )
when is an indicator variable same as above

= if = = avg. of all 0 obs.
= + if = + = avg. of all 1 obs.
Review - 5
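$\hat{Y}_f = \hat{b}_0 + \hat{b}_1 X_f$ - 2 sources of error
$\hat{Y}_f \pm t_c s_f$, $s_f^2 = SEE^2\left[1 + \dfrac{1}{n} + \dfrac{(X_f - \bar{X})^2}{(n - 1)s_X^2}\right]$ (terms 1, 2, 3)
more narrow CI: 1. lower SEE / 2. higher $n$ / 3. closer $X_f$ is to $\bar{X}$
LOS h - describe
1/ Log - lin model: $\ln Y = b_0 + b_1 X$
   relative change in $Y$ for absolute change in $X$
2/ Lin - log model: $Y = b_0 + b_1 \ln(X)$
   absolute change in $Y$ for relative change in $X$
3/ Log - log model: $\ln Y = b_0 + b_1 \ln(X)$
   relative change in $Y$ for relative change in $X$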
