Empirical Research Methods-AB
Problem Solving Approach
Dr. Arindam Bandyopadhyay
Associate Professor (Finance)
Research Process: Define Problem → Plan Design / Specify Primary or Secondary Data → Sampling Procedure → Collect Data → Analyze Data → Prepare/Present Report → Follow Up
Econometric Problem Solving Approach
[Flow diagram: Theory and Facts → Statistical Model and Data → Structural Analysis, Evaluation, Forecasting]
The Need for Applied Statistical Tools
Data Collection Phases I-IV (Phases II and III shown):
Phase II: Dispatch of disk and e-mail containing the data entry template and description of data fields to all the participating banks by NIBM (2 weeks)
Phase III: Entry of all relevant data, as per template, by the bank representatives; dispatch of disk & e-mail containing the database by the bank representatives to NIBM (4 weeks)
• In the Literature Review phase you will have to order and digest the
material you have read, and then produce a structured summary and
critique of the reading you have done.
• The aim of the Literature Review is not to systematically catalogue
the reading you have done. Rather, using the hypothesis you have
developed or the project idea you have identified, you should use the
reading to identify the major ideas and threads of development, relate
work that has (perhaps) not previously been related, and thereby
justify and refine your hypothesis / ideas.
• Your literature review must be organised around ideas, with an assessment of previous studies (including their strengths and weaknesses).
• The Literature Review should "tell a story" that identifies the
development and blossoming of your ideas as you conducted your
literature search.
• It provides you the opportunity to persuade your reader (and
examiner) that your work is relevant and that it was worth doing!
Importance of Literature Review in the Research Report
Study | Period | Sample | Approach | Findings
Asarnow & Edwards (1995), Citibank data | 24 years | 831 commercial & industrial loans & structured loans (highly collateralized) | Workout | Average LGD is 35% for C&I loans; for structured loans: 13%
Altman, Edward I. & Vellore M. Kishore (1996), "Almost Everything You Wanted to Know About Recoveries on Defaulted Bonds" | 1978-95 | 696 defaulted bonds by seniority & industry class | Market | Average LGD is 58.3%: 42% for Sr. Secured, 52% for Sr. Unsecured, 66% Sr. Subordinate, 69% Jr. Subordinate
Altman, Brady, Resti and Sironi (2003), ISDA research report | 1982-2001 | 1,300 corporate bonds | Market | Average 62.8%; PD and LGD correlated
Gupton and Stein (2005), Moody's Investor Service, Global Credit Research | 1981-04 | 3,026 defaulted loans, bonds and preferred stocks | Market | Beta distribution fits recovery; small no. of LGD<0
LGD public studies…
Study | Period | Sample | Approach | Findings
Acharya et al. (2003) | 1982-1999 | 1,511 bond & debt instruments | Market | Liquidity, seniority, industry, firm profitability matter
Neto de Carvalho & Dermine (2003 & 2005) | June '85-Dec 2000 | 371 defaulted loans | Workout | Bi-modal LGD; size & collateral effects on LGD etc.
Araten, Michael, and Peeyush (2004), The RMA Journal, May, pp. 96-103 | 1982-'99 | 3,761 large corporate loans of JP Morgan | Workout | Average 39.8%; St. dev. 35.4%; Min/Max (20%/38%)
Mishra & Verma (2016), EPW | 2004-'05 to 2014-'15 | NPA movements & performance of recovery channels (DRTs, SARFAESI, Lok Adalats) | System level, based on various recovery channels | Weaker loan recovery rates; collateral quality, debtor credit relationship, credit environment matter
Example 2: Corporate Compliance Cost Study
Advantages:
• Answers a specific research question
• Data are current & can better give a realistic view
• Source of data is known & can have wide coverage
• Secrecy can be maintained
Disadvantages:
• Expensive
• Time consuming
• Quality declines if interviews/questionnaires are lengthy
• Reluctance to participate in lengthy interview/questionnaire filling
Disadvantages are usually offset by the advantages.
Secondary Data
Advantages:
• Ease of access; saves time and money if on target with the research problem
• Aids in determining direction for primary data collection
• Low cost to acquire
• Secondary research is often used prior to primary research to help clarify the research focus
• Provides a way to access the work of the best scholars all over the world
Disadvantages:
• May not be on target with the research problem & hence may not meet researcher's needs
• Quality and accuracy of data may pose a problem
• Not timely
Sources of Secondary Data
• Internal Bank Information
• Government Agencies: CSO, RBI
• Trade and Industry Associations: NIC
• Economic/Financial Research Firms: CRISIL, NCAER etc.
• Commercial Publications: RBI Report, Published Paper etc.
• News Media
Secondary Database
• Indian Database:
– CMIE PROWESS database (Firm Level)
– http://economicoutlook.cmie.com/ (Macro Database)
– http://www.epwrfits.in/NAS_Series.aspx
– Stock Market Database: www.nseindia.com
– CSO:http://www.mospi.gov.in/
– NIC: http://indiabudget.nic.in
– RBI Database: www.rbi.org.in (or http://dbie.rbi.org.in/DBIE/dbie.rbi?site=publications#!)
• RBI Publications: http://www.rbi.org.in/scripts/publications.aspx?publication=Annual
• Basic Statistical Returns-Credit, Deposit Distribution, Maturity Pattern etc. across
Banks
• Bank’s Balance-sheet data: Annual Accounts Data of Scheduled Commercial
Banks (1979 to 2004): http://www.rbi.org.in/scripts/Foreword.aspx
• Handbook of Statistics of Indian Economy:
http://www.rbi.org.in/scripts/AnnualPublications.aspx?head=Handbook%20of
%20Statistics%20on%20Indian%20Economy
• Daily, quarterly, fortnight, weekly data: http://www.rbi.org.in/scripts/statistics.aspx
• Data with short frequency: RBI Monthly Report:
http://www.rbi.org.in/scripts/BS_ViewBulletin.aspx
• Global Database:
– Federal Reserve:http://www.federalreserve.gov/releases/h15/data.htm
– Global Financial Database: www.globalfindata.com/index.php3?
action=global_financial_database_description
– Global Economic Parameters: http://www.economagic.com/
– Global Economic Statistics: http://www.econstats.com/index.htm
Data Templates & Formats
• The value of z (or t) can be found in statistical tables which contain the area under the normal curve (or t distribution). It is the abscissa of the curve that cuts off an area α at the tails (1 - α equals the desired level of confidence, say 95%). However, many researchers use the t-distribution (two-tail) table to obtain z when the population variance is unknown.
• The margin of error (e) is the difference between the population mean (μ) and the sample mean (x̄).
• Exercise: Suppose a researcher wishes to evaluate the effectiveness of a Financial Literacy Programme organized by banks, where farmers were encouraged to adopt a new practice. Assume there is a large population but we don't know the variability (or variance) in the proportion that adopts the practice. If we desire a 95% confidence level and ±5% precision, what would the resulting sample size be?
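One common answer to the exercise uses Cochran's formula with the conservative guess p = 0.5 (maximum variability when the true proportion is unknown). A minimal sketch, assuming the usual z = 1.96 for 95% confidence:

```python
import math

def cochran_n(z, p, e):
    """Cochran's sample size formula: n = z^2 * p * (1 - p) / e^2.
    p = 0.5 is the most conservative choice when variability is unknown."""
    return z * z * p * (1.0 - p) / (e * e)

# 95% confidence (z = 1.96), +/-5% precision, p = 0.5
n = cochran_n(z=1.96, p=0.5, e=0.05)
print(n, "->", math.ceil(n))  # 384.16 -> survey at least 385 farmers
```

Rounding up, the researcher would need a sample of about 385 farmers.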
Types of Samples
• Feasibility Study
• Data Collection Process
• Quantitative vs. Qualitative Data
• Micro vs. Macro Data
• Data Cleaning/filtering/editing: missing data, outliers, detecting errors
& omissions, checking accuracy, uniformity and consistency etc.
• Data entering & formatting (cross section/panel/time series)
• Data coding: Conversion of qualitative factors into quantitative ones &
data categorization/grouping.
• Cross checking information: data validation
• Model/Method Validation-DV vs. IV, Choice of Functions, Two
Variable vs. Multivariate Techniques, Cross Section vs. Panel,
Degrees of Freedom, Multicollinearity, Serial Correlation, Structural
Changes, Errors in Measurement, Non-Stationarity (trends,
seasonality), Non-linearity
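The cleaning step above (flagging outliers before analysis) can be sketched with a simple standardized-distance screen; the loan figures below are hypothetical, and robust rules such as the IQR method are often preferred in practice:

```python
import statistics

def flag_outliers(values, z_cut=3.0):
    """Flag observations whose distance from the mean exceeds
    z_cut sample standard deviations (a simple screening rule)."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [x for x in values if abs(x - mean) / sd > z_cut]

loans = [1.2, 0.9, 1.5, 1.1, 1.3, 90.9]  # hypothetical loan sizes
print(flag_outliers(loans, z_cut=2.0))   # [90.9]
```

Flagged values should be investigated (data entry error vs. genuine extreme), not automatically deleted.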
Empirical Methods
• Crucial. Many readers will read only the Introduction and the Conclusion.
• What should it contain?
– Remind the reader about the original question
– Remind the reader why this is important and interesting
– Tell the reader what your contribution is, but in much more detail than in the Introduction
– Future avenues of research
Research Presentation
Categorical Data: Graphing Data, Tabulating Data
The Summary Table: Frequency Distribution, Bar Charts
[Figure: Histogram and Ogive of a frequency distribution]
Example of Multi-dimension Bar Charts
• Outliers are values that are markedly smaller or larger than most other values in the same data.
Summary Statistics of a panel of open joint stock cos. during 2000-06 collected by SMIDA, Ukraine
Source: De Servigny & O. Renault, Measuring & Managing Credit Risk, McGraw-Hill
Descriptive Statistics of Bank Variables: Can you detect any issue?
Basic Concepts on Variables
[Figure: frequency histogram of Series: Loss_rate_bsp, 19 observations]
Mean 127.6842, Median 116, Maximum 297, Minimum 43, Std. Dev. 72.44619, Skewness 0.844582, Kurtosis 2.880478, Jarque-Bera 2.270154 (p = 0.321397)
Histogram of Bond Default Rate (bp)
Series: BONDDEF, Sample 1982-2000, Observations 19: Mean 127.6842, Median 116.0000, Maximum 297.0000, Minimum 43.00000, Std. Dev. 72.44619, Skewness 0.844582, Kurtosis 2.880478, Jarque-Bera 2.270154 (p = 0.321397)
[Figure: histogram over the range 25-300 bp]
Jarque-Bera test statistic: JB = ((N - k)/6) × [SK² + (KURT - 3)²/4]
where N = sample size and k = the number of estimated regression coefficients (k = 0 for a raw series). It follows a chi² distribution with 2 d.f.
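The reported Jarque-Bera value for the bond default series can be reproduced directly from its moments (here with k = 0, a raw series); a small sketch:

```python
def jarque_bera(n, skew, kurt):
    """JB = (n/6) * (SK^2 + (KURT - 3)^2 / 4); asymptotically chi-squared
    with 2 degrees of freedom under the null of normality."""
    return n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)

# Moments of the bond default series above (19 observations)
jb = jarque_bera(19, 0.844582, 2.880478)
print(round(jb, 4))  # 2.2702, matching the reported Jarque-Bera value
```

Since 2.27 is well below the 5% chi²(2) critical value of 5.99, normality is not rejected for this series.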
Descriptive Stats about Zone-wise Loan Distribution of a Bank
zone_group p1 p5 p10 p25 p50 p75 p90 p95 p99 min max range mean sd cv Kurto Gini HHI
Central_Z_I 0.11 0.4 0.69 1 1.68 3 7.67 11.15 84.79 0.01 90.89 90.9 4.042 10.40 2.57 56.47 0.634 0.356
Central_Z_II 0.03 0.39 0.62 0.99 1.43 2.54 6.97 13.71 107.7 0.01 211.43 211.4 5.410 20.75 3.83 76.68 0.724 0.519
East_Z 0.02 0.51 0.95 1.35 2.39 10.8 30.5 55.84 260 0.01 1251 1251.0 20.703 97.27 4.70 137.38 0.792 0.598
Mumbai_Z 0.04 0.29 0.64 1.48 4.14 15 49.6 133.7 560 0.004 1204.4 1204.4 27.815 91.71 3.30 67.32 0.786 0.572
North_Z 0.11 0.5 0.82 1.2 2.22 5.49 13.4 41.45 183.3 0.01 731.03 731.0 10.620 44.61 4.20 159.31 0.739 0.519
South_Z_I 0.13 0.83 0.97 1.31 2.4 6.41 24.3 38.27 97.3 0.02 380.5 380.5 8.735 23.87 2.73 155.87 0.701 0.421
South_Z_II 0.07 0.41 0.79 1.37 3.11 9.74 29 59 272.4 0.04 400 400.0 13.249 37.77 2.85 67.84 0.720 0.442
West_Z_I 0.21 0.73 0.94 1.53 3.27 11 29.7 105.5 225.1 0.12 385.4 385.3 18.296 48.88 2.67 31.59 0.759 0.547
West_Z_II 0.22 0.69 0.83 1.23 2.34 4.93 13.7 27.16 50.32 0.07 99.54 99.5 6.108 11.55 1.89 32.66 0.619 0.299
Total 0.08 0.46 0.79 1.24 2.51 7.94 26.1 52.41 250 0.004 1251 1251.0 15.505 62.26 4.02 163.61 0.771 0.578
The above table presents detailed summary statistics of the loan distribution across 9 zones.
p1, p5, …, p50, p75 etc. are the percentile values of the actual size of the loans; e.g., p50 measures the median loan size. The tail of the loan distribution is captured by the p99 percentile.
The pattern of percentiles, coefficients of variation, Gini and Herfindahl indices tells us there is a significant presence of geographic concentration in the corporate loan portfolio of the bank. West Zone II & Central Zone I are diversified. East Zone has the highest level of concentration because of the presence of two very large loans.
Gini & Lorenz Curve Measure of Inequality or Concentration
The Gini coefficient is the ratio of the area between the line of equality and the Lorenz curve (X) to the total area below the line of equality (X + Y):
Gini = X/(X + Y)
Or, more compactly, G = 1 - Σpᵢ×(zᵢ + zᵢ₋₁).
Geographic Loan Concentration: Gini Coefficient Approach
Zone-wise Inequality Comparison in Loan Distribution
[Figure: Lorenz curves plotting cumulative % of loan share against cumulative % of borrowers for each zone: Central_Z_I, Central_Z_II, East_Z, Mumbai_Z, North_Z, South_Z_I, South_Z_II, West_Z_I, West_Z_II]
G = 1 - Σpᵢ×(zᵢ + zᵢ₋₁)
Application-Rating Model Validation Tests: Lorenz Curve for Credit Scores
G = 1 - Σpᵢ×(zᵢ + zᵢ₋₁)
where Zᵢ is the cumulative % share in total default up to the ith group and Pᵢ is the relative default % share.
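The trapezoidal formula G = 1 - Σpᵢ(zᵢ + zᵢ₋₁) can be applied directly to raw loan amounts, with pᵢ the borrower-count share and zᵢ the cumulative amount share; a minimal sketch over hypothetical toy portfolios:

```python
def gini(amounts):
    """G = 1 - sum_i p_i * (z_i + z_{i-1}), with p_i = 1/n per borrower and
    z_i the cumulative share of loan amounts, sorted ascending."""
    xs = sorted(amounts)
    total = sum(xs)
    n = len(xs)
    g, z_prev, cum = 1.0, 0.0, 0.0
    for x in xs:
        cum += x
        z = cum / total
        g -= (1.0 / n) * (z + z_prev)
        z_prev = z
    return g

print(gini([1, 1, 1, 1]))    # perfectly equal pool -> 0.0
print(gini([0, 0, 0, 100]))  # one borrower holds everything -> 0.75
```

Values near the East Zone figure of 0.79 would signal heavy concentration in a handful of large loans.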
Straightforward measures of concentration: HHI
NIBM
Expected Loss (EL) based HHI Measure of Concentration
Besides rating, the EL-based HHI measure depends upon pool size and largest loan size, as well as the no. of loans. Hence, portfolio slicing is important!
• In fractions, an HHI of 0.020 for Pool 1 means it is well diversified; Pool 2 (0.10) shows a moderate level of concentration; and Pool 3 (0.352) indicates a high level of concentration.
• However, HHI does not tell us the ways to reduce concentration!
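The HHI itself is just the sum of squared portfolio shares; a minimal sketch (the second, concentrated pool is hypothetical, chosen to land near Pool 3's level):

```python
def hhi(exposures):
    """Herfindahl-Hirschman Index: sum of squared portfolio shares.
    Equals 1/n for a fully even pool and approaches 1 for one dominant loan."""
    total = sum(exposures)
    return sum((x / total) ** 2 for x in exposures)

# 50 equal loans -> HHI = 1/50 = 0.02, well diversified like Pool 1 above
print(round(hhi([1.0] * 50), 3))
# one loan dominates a small hypothetical pool -> 0.355, highly concentrated
print(round(hhi([55, 15, 10, 10, 10]), 3))
```

The reciprocal 1/HHI gives the "effective number" of equally sized loans, a convenient way to read the index.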
Theil Entropy Index
• The four moments about the mean describe the nature of the loss distribution in risk measurement.
• The mean is the location of a distribution, & the variance, or the square of the standard deviation, measures the scale of a distribution.
• Skewness is a measure of the asymmetry of the distribution. In risk measurement, it tells us whether the probability of winning is similar to the probability of losing, and the nature of losses.
• Negative skewness means there is a substantial probability of a big negative return. Positive skewness means that there is a greater-than-normal probability of a big positive return.
• Kurtosis is useful in describing extreme events (e.g., losses that are so bad that they have only a 1 in 1000 chance of happening).
• In extreme events, the portfolio with the higher kurtosis would suffer worse losses than the portfolio with lower kurtosis.
• Skewness and kurtosis are called the shape parameters.
Moments and the Nature of Distribution
• Since kurtosis measures the shape of the distribution (the fatness of the tails), it focuses on how losses are arranged around the mean.
– Leptokurtic means a smaller proportion of medium-sized deviations from the mean, but a larger proportion of extremely large and small deviations from the mean. Kurtosis greater than three indicates a sharp/high peak with a thin midrange and fat tails.
– Platykurtic means a smaller-than-normal proportion of deviations from the mean that are extremely small or large, and a larger proportion of medium-sized deviations from the mean. Kurtosis of less than three indicates a low peak with a fat midrange on either side.
– A normal distribution is called mesokurtic and has a kurtosis of 3.
Difference between Skewness & Kurtosis
Popular Discrete Distributions: Rule of Thumb for Identifying Them
• Binomial: variance < arithmetic mean
• Poisson: variance = arithmetic mean
• Negative Binomial: variance > arithmetic mean
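This rule of thumb can be automated as a quick screen before formal fitting; the count data and the 10% tolerance below are arbitrary illustrative choices, not from the slides:

```python
import statistics

def suggest_count_model(counts):
    """Compare sample variance with the arithmetic mean to suggest a
    candidate discrete distribution (a heuristic screen only; confirm
    with a formal goodness-of-fit test)."""
    m = statistics.mean(counts)
    v = statistics.variance(counts)  # sample variance
    if abs(v - m) / m < 0.1:         # arbitrary closeness tolerance
        return "Poisson (variance ~ mean)"
    return "Binomial (variance < mean)" if v < m else "Negative Binomial (variance > mean)"

print(suggest_count_model([2, 3, 2, 3, 2, 3]))  # under-dispersed -> Binomial
```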
Date | Fraud Amount ($)
18/01/2003 | 1285.73
26/01/2003 | 1268.1
26/01/2003 | 1392.33
08/01/2003 | 1257.85
20/01/2003 | 1261.13
22/02/2003 | 1252.79
…
09/08/2004 | 1251.9
13/09/2004 | 1347.66
19/09/2004 | 1282.3
26/09/2004 | 1269.83
12/10/2004 | 1312.61
23/10/2004 | 1256.37
27/10/2004 | 1299.78
[Figure: frequency histogram of the fraud amounts]
Binomial Distribution (example: N = 12, p = 0.8)
f(x) = [N!/(x!(N - x)!)] pˣ(1 - p)^(N-x)
Mean = Np and Variance = Np(1 - p)
The parameter p can be estimated by p̂ = x/N.
[Figure: binomial probability plot]
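The binomial pmf and its mean can be checked numerically; `binomial_pmf` is an illustrative helper, not from the slide:

```python
from math import comb

def binomial_pmf(x, n, p):
    """f(x) = C(n, x) * p^x * (1 - p)^(n - x)"""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

n, p = 12, 0.8
probs = [binomial_pmf(x, n, p) for x in range(n + 1)]
print(round(sum(probs), 6))                                       # 1.0
print(round(sum(x * q for x, q in zip(range(n + 1), probs)), 4))  # mean = n*p = 9.6
```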
Summary of Frequency of Loss Daily Data for Credit Card Fraud
Poisson Distribution: f(x) = e^(-λ) λˣ / x!, where e = 2.71828… and x = 0, 1, 2, …
No. of events per day (i) | Observed frauds (nᵢ) | i × nᵢ
0 | 19 | 0
1 | 16 | 16
2 | 51 | 102
3 | 9 | 27
4 | 6 | 24
5 | 5 | 25
6 | 4 | 24
7 | 6 | 42
8 | 2 | 16
9 | 1 | 9
10 | 0 | 0
11 | 0 | 0
12 | 2 | 24
13 | 1 | 13
14 | 0 | 0
15 | 2 | 30
Total | 124 | 352
Here, mean (lambda) λ = Σ(i × nᵢ)/Σnᵢ = 352/124 = 2.84, and SD = √2.84 = 1.68523.
Distribution of Credit Card Fraud Events
[Figure: observed frauds plotted against no. of events per day, 0-15]
Fitted Poisson Values for Credit Card Frauds (λ = 2.84)
No. of Events | Fitted Probability
0 | 5.84%
1 | 16.59%
2 | 23.56%
3 | 22.31%
4 | 15.84%
5 | 9.00%
6 | 4.26%
7 | 1.73%
8 | 0.61%
9 | 0.19%
10 | 0.05%
11 | 0.01%
12-24 | 0.00%
[Figure: fitted Poisson probability curve over 0-24 events]
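The fitted percentages can be reproduced from the Poisson pmf with λ = 2.84; a quick check:

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """f(x) = e^(-lambda) * lambda^x / x!"""
    return exp(-lam) * lam ** x / factorial(x)

lam = 2.84  # = 352/124 rounded, estimated from the frequency table
for x in range(4):
    print(x, f"{poisson_pmf(x, lam):.2%}")
# 0 5.84%, 1 16.59%, 2 23.56%, 3 22.31% -- matching the fitted table
```

Multiplying each probability by the 124 observed days gives the expected counts used in the goodness-of-fit test below.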
Chi-Sq. Goodness of Fit Test
• The risk manager should run a fit test to confirm the right selection of distribution.
• One such test is the chi-squared goodness of fit test.
• H0: The data follow a specified distribution (here Poisson)
• Ha: The data do not follow the specified distribution
• The test statistic is calculated by dividing the data into n bins (or ranges) and is defined as:
T = Σᵢ₌₁ⁿ (Oᵢ - Eᵢ)²/Eᵢ
• Where Oᵢ is the observed no. of events, Eᵢ is the expected (or fitted) no. of events, and n is the no. of categories.
• D.f. = n - k - 1, where k refers to the no. of parameters that need to be estimated.
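The statistic itself is a one-line sum over the bins; the four-bin counts below are a toy illustration, not the fraud data:

```python
def chi_sq_stat(observed, expected):
    """T = sum_i (O_i - E_i)^2 / E_i over the n bins; compare against the
    chi-squared critical value at the chosen significance level."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# toy illustration: observed vs expected counts across 4 equal-probability bins
t = chi_sq_stat([18, 22, 30, 30], [25, 25, 25, 25])
print(t)  # (49 + 9 + 25 + 25) / 25 = 4.32
```

In practice, bins with very small expected counts are usually merged before computing T so the chi-squared approximation holds.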
Chi-Sq Goodness of Fit Result
[Table excerpt: observed vs fitted frequencies per event count, with each bin's contribution to the chi-squared statistic]
Computed Chi² = 15.5845938 with 13 degrees of freedom (n - 1 = 14 - 1 = 13); the critical Chi² at 5% significance is 22.3620325. Since the computed statistic is below the critical value, we fail to reject the null hypothesis, and hence the Poisson distribution fits the data fairly well.
Fitness Test
[Figure: observed vs fitted frauds by events per month]
The Poisson distribution appears visually to fit the data fairly well.
Normal Distribution
If we’d measure very accurately a randomly distributed
characteristic in a very large sample of cases, we’d obtain a
frequency distribution which is symmetric and in which
most cases cluster around the mean.
Examples
• Suppose the daily change in price of a security follows the normal distribution with a mean of 70 bps and a variance of 9. What is the probability that on any given day the change in price is greater than 75 bps?
– Z= (75-70)/3 =1.67
– P(X>75)=P(Z>1.67)
– =1-P(Z<1.67)= 1-0.9525=0.0475
• Now estimate:
– Probability of change in price being 75 or fewer
– Probability of change in price being between 65 and 75 bps
– Probability of change in price being less than or equal to 60 bps
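The worked example (and the follow-up questions) can be checked with the standard normal CDF built from the error function; a minimal sketch:

```python
from math import erf, sqrt

def norm_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

mu, sigma = 70.0, 3.0  # mean 70 bps, variance 9 => sigma = 3
z = (75.0 - mu) / sigma
print(round(1.0 - norm_cdf(z), 4))  # P(X > 75) ~ 0.0478 (slide table: 0.0475)
print(round(norm_cdf(z), 4))        # P(X <= 75)
print(round(norm_cdf(z) - norm_cdf((65.0 - mu) / sigma), 4))  # P(65 < X < 75)
print(round(norm_cdf((60.0 - mu) / sigma), 6))                # P(X <= 60)
```

The small gap vs. the slide's 0.0475 comes from the table rounding z = 5/3 to 1.67.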
Confidence Interval…Example
• Suppose the mean operational loss X̄ = $434,045 and set the significance level α = 5%, so that we have a (1 - α) = 95% confidence interval around the estimate of the mean. Such an interval can be calculated using:
X̄ ± z_(α/2) × Stdev(X̄)
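The interval X̄ ± z × Stdev(X̄) is a one-liner; the standard error below is a hypothetical figure for illustration only, since the slide does not give one:

```python
# Interval estimate around the mean: X_bar +/- z_(alpha/2) * Stdev(X_bar)
mean_loss = 434_045.0   # mean operational loss from the example
stdev_mean = 25_000.0   # hypothetical standard error, not from the slide
z = 1.96                # z_(alpha/2) for a 95% confidence level

lower, upper = mean_loss - z * stdev_mean, mean_loss + z * stdev_mean
print(f"95% CI: ({lower:,.0f}, {upper:,.0f})")  # (385,045, 483,045)
```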
[Figure: histogram of Series: HIST_LGD over 0.0-1.0]
Sample 1-829, Observations 829: Mean 0.751924, Median 0.937150, Maximum 1.000000, Minimum 0.000000, Std. Dev. 0.323241, Skewness -1.160426, Kurtosis 3.063549, Jarque-Bera 186.1932 (p = 0.000000)
Market Risk Example: Histogram of Daily Returns for S&PCNXNIFTY over a 5-year period
Series: SNP_RETURN, Sample 1-1275, Observations 1275: Mean 0.001205, Median 0.002188, Maximum 0.079691, Minimum -0.130539, Std. Dev. 0.014263, Skewness -1.088501, Kurtosis 11.35109, Jarque-Bera 3956.755 (p = 0.000000)
[Figure: histogram of daily returns over -0.10 to 0.05]
Candidates of Popular Non-Normal Distributions
• Beta Distribution
• Log Normal Distribution
• Weibull
• Inverse Gaussian
• Exponential
• Laplace Distribution
Beta Distribution
Mean = α/(α + β) and Variance = αβ/[(α + β)²(α + β + 1)]
Method-of-moments estimates from the sample mean X̄ and sample variance S²:
α̂ = X̄[X̄(1 - X̄)/S² - 1]
β̂ = (1 - X̄)[X̄(1 - X̄)/S² - 1]
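The method-of-moments estimators for the Beta parameters are easy to sketch; the sample mean and variance below are illustrative values, not from the slides:

```python
def beta_mom(x_bar, s2):
    """Method-of-moments estimators for the Beta distribution:
    alpha_hat = X_bar * (X_bar(1 - X_bar)/S^2 - 1)
    beta_hat  = (1 - X_bar) * (X_bar(1 - X_bar)/S^2 - 1)"""
    common = x_bar * (1.0 - x_bar) / s2 - 1.0
    return x_bar * common, (1.0 - x_bar) * common

# symmetric sample (mean 0.5, variance 0.05) -> alpha ~ beta ~ 2.0
a, b = beta_mom(0.5, 0.05)
print(a, b)
```

Note the estimators require S² < X̄(1 - X̄); otherwise the fitted parameters turn negative and the Beta model is inappropriate.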
Log Normal Distribution
• Density Function: f(x) = (1/(xσ√(2π))) exp(-(ln x - μ)²/(2σ²)), x > 0
Kolmogorov-Smirnov Test (K-S)
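The K-S statistic is the largest gap between the empirical CDF and the candidate CDF; a minimal sketch (the evenly spaced sample and the uniform null below are illustrative choices):

```python
def ks_statistic(sample, cdf):
    """D = sup_x |F_n(x) - F(x)|, evaluated just before and after each
    jump of the empirical distribution function."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        d = max(d, abs(i / n - f), abs(f - (i - 1) / n))
    return d

# uniform(0,1) null, F(x) = x; this evenly spaced sample gives D = 0.1
print(ks_statistic([0.1, 0.3, 0.5, 0.7, 0.9], lambda x: x))
```

D is then compared with the K-S critical value for the sample size; large D rejects the candidate distribution.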
Severity Distribution: Legal Liability Loss
Skew 2.8064, Kurtosis 15.3145
[Figure: percent histogram of legal liability losses]
Normal Probability Plot for Legal Event Losses (P-P & Q-Q plots)
[Figure: P-P plot of fitted vs input p-values and Q-Q plot of fitted vs input quantiles, values in millions]
Fitted Exponential Distribution
Expon(149190), Shift = +1688.6; fitted parameters: Shift 1688.58848, b 149189.812; fitted minimum 1688.6 vs actual minimum 2754.2
[Figure: fitted exponential density against the actual data, values in millions]
Fitted Weibull Distribution to Cover the Fat Tail
[Figure: P-P and Q-Q plots of the fitted Weibull against the input data, values in millions]
VaR
Hypothesis Testing
• All hypothesis tests are conducted the same way. The researcher states a
hypothesis to be tested, formulates an analysis plan, analyzes sample data
according to the plan, and accepts or rejects the null hypothesis, based on
results of the analysis.
State the hypotheses. Every hypothesis test requires the analyst to state a
null hypothesis and an alternative hypothesis. The hypotheses are stated in
such a way that they are mutually exclusive. That is, if one is true, the other
must be false; and vice versa.
Formulate an analysis plan. The analysis plan describes how to use
sample data to accept or reject the null hypothesis. It should specify the
following elements.
• Significance level. Often, researchers choose significance levels (α) equal to 0.01, 0.05, or 0.10; but any value between 0 and 1 can be used.
Note that the p-value is the probability value that indicates whether the evidence obtained is statistically significant or not. Statisticians have encoded evidence on a 0 to 1 scale, where smaller values establish greater evidence of statistical significance; a p-value less than 0.05 is the generally accepted benchmark.
One-tailed test vs. Two-tailed Hypothesis Testing
• One-Tailed Test
– A test of a statistical hypothesis, where the region of rejection is on only one side of the sampling distribution, is called a one-tailed test. In such tests, we are only interested in values greater (or less) than the null.
– A one-sided hypothesis test is as follows: Test H0: k = 0 against HA: k > 0 (or k < 0), & we reject the null if Tcomp > Tcritical (or Tcomp < -Tcritical).
• Two-Tailed Test
– A test of a statistical hypothesis, where the region of rejection is on both sides of the sampling distribution, is called a two-tailed test. In such tests, we are interested in values both greater and smaller than the null hypothesis.
– We write this as:
– Test H0: k=0 against HA:k≠0 & we reject the null if | Tcomp |>Tcritical
– In the two-sided hypothesis, we calculate critical value using α/2. For
example, α=5%, the critical value of the test statistic is T0.025.
Problem 1: Two-tailed test
• Analyze sample data. Using sample data, we compute the standard error (SE), degrees of freedom (DF), and the t-score test statistic (t). For large samples it is also treated as a z statistic.
• SE = s / sqrt(n) = 20 / sqrt(50) = 20/7.07 = 2.83
DF = n - 1 = 50 - 1 = 49
t = (x̄ - μ) / SE = (295 - 300)/2.83 = -1.77
• where s is the standard deviation of the sample, x̄ is the sample mean, μ is the hypothesized population mean, and n is the sample size.
• Since we have a two-tailed test, the P-value is the probability that a t-score having 49 degrees of freedom is less than -1.77 or greater than 1.77.
• We use the t Distribution Calculator to find P(t < -1.77) = 0.04 and P(t > 1.77) = 0.04. Thus, the P-value = 0.04 + 0.04 = 0.08.
• Interpret results: Since the P-value (0.08) is greater than the significance level (0.05), we cannot reject the null hypothesis.
Problem 2: One-tailed test
• Bon Air Elementary School has 300 students. The principal of the
school thinks that the average IQ of students at Bon Air is at least
110. To prove her point, she administers an IQ test to 20 randomly
selected students. Among the sampled students, the average IQ is
108 with a standard deviation of 10. Based on these results, should
the principal accept or reject her original hypothesis? Assume a
significance level of 0.01.
– Null hypothesis: μ = 110
Alternative hypothesis: μ < 110
– Note that these hypotheses constitute a one-tailed test. The null
hypothesis will be rejected if the sample mean is too small.
Solution 2: One-tailed test
• Analyze sample data. Using sample data, we compute the standard error
(SE), degrees of freedom (DF), and the t-score test statistic (t).
• SE = s / sqrt(n) = 10 / sqrt(20) = 10/4.472 = 2.236
DF = n - 1 = 20 - 1 = 19
t = (x - μ) / SE = (108 - 110)/2.236 = -0.894
• where s is the standard deviation of the sample, x is the sample mean, μ is
the hypothesized population mean, and n is the sample size.
• Since we have a one-tailed test, the P-value is the probability that the t-
score having 19 degrees of freedom is less than -0.894.
• We use the t Distribution Calculator to find P(t < -0.894) = 0.19. Thus, the
P-value is 0.19.
Interpret results. Since the P-value (0.19) is greater than the significance
level (0.01), we cannot reject the null hypothesis.
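Both worked problems follow the same recipe, which can be sketched in a few lines. The p-values below use the normal approximation to the t distribution (reasonable at these degrees of freedom), so they differ negligibly from the t-table values quoted above:

```python
from math import sqrt
from statistics import NormalDist

def t_score(x_bar, mu, s, n):
    """t = (x_bar - mu) / (s / sqrt(n))"""
    return (x_bar - mu) / (s / sqrt(n))

# Problem 1 (two-tailed): n = 50, x_bar = 295, mu = 300, s = 20
t1 = t_score(295, 300, 20, 50)
p1 = 2 * NormalDist().cdf(-abs(t1))  # normal approximation to t with 49 d.f.
# Problem 2 (one-tailed): n = 20, x_bar = 108, mu = 110, s = 10
t2 = t_score(108, 110, 10, 20)
p2 = NormalDist().cdf(t2)            # lower-tail p, normal approximation
print(round(t1, 2), round(p1, 2))    # -1.77, p ~ 0.08
print(round(t2, 3), round(p2, 2))    # -0.894, p ~ 0.19
```

In both cases the p-value exceeds the chosen significance level, so the null hypothesis is not rejected.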
Hypothesis Testing: Bond Loss Example 1
Parametric Mean Difference Test
• Many problems arise where we wish to test hypotheses about the means of two different populations (e.g. comparing ratios of defaulted and solvent firms, or comparing performance of public sector banks vis-a-vis private banks, etc.)
• Un-paired test: H0: μ1 = μ2 against HA: μ1 ≠ μ2 (or a one-sided alternative).
• Start by assuming H0 is true and use the two-sample test statistic, t = (X̄1 - X̄2)/√(s1²/n1 + s2²/n2), to arrive at a decision.
• A low p-value (<0.05) will reject the null, and a high p-value (>0.10) will fail to reject the null.
Ex: Difference between Solvent & Defaulted Group of Borrowers
The hypothesis statements function the same way as in the two-sample t-test, but we are focused on the medians rather than on the means:
H0: η1 - η2 = 0
H1: η1 - η2 ≠ 0
[Table: ANOVA layout with columns Source of variation, d.f., Sum of squares, Mean sum of squares, F-statistic, p-value]
Spearman's Non-Parametric Rank Order Correlation
R = 1 - 6Σd²/(n³ - n)
• The tie-adjusted rank correlation coefficient is:
R′ = 1 - 6{Σd² + Σ(t³ - t)/12}/(n³ - n)
• t is the number of individuals involved in a tie either in the first or second series. => See the Excel Illustration.
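For the no-ties case, the formula R = 1 - 6Σd²/(n³ - n) can be sketched directly (the paired series below are illustrative):

```python
def rank(values):
    """Ordinal ranks (1 = smallest); assumes no ties for simplicity."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for r, i in enumerate(order, start=1):
        ranks[i] = r
    return ranks

def spearman(x, y):
    """R = 1 - 6 * sum(d^2) / (n^3 - n), d = difference in paired ranks."""
    n = len(x)
    d2 = sum((rx - ry) ** 2 for rx, ry in zip(rank(x), rank(y)))
    return 1.0 - 6.0 * d2 / (n ** 3 - n)

print(spearman([1, 2, 3, 4, 5], [2, 4, 6, 8, 10]))  # perfectly concordant -> 1.0
print(spearman([1, 2, 3, 4, 5], [10, 8, 6, 4, 2]))  # perfectly reversed -> -1.0
```

With ties present, average ranks and the tie-adjusted R′ formula above should be used instead.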
Kendall's Rank Correlation
• Check possible order combinations and compare the two order sets. Then count the number of different pairs (i.e. d) between these two order sets and estimate tau. => See the Excel Illustration.
• Because τ is based upon counting the number of different pairs between two ordered sets, its interpretation can be framed in a probabilistic context.
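The pair-counting definition of τ (concordant minus discordant pairs over all pairs, assuming no ties) can be sketched as follows; the data are illustrative:

```python
from itertools import combinations

def kendall_tau(x, y):
    """tau = (concordant - discordant) / (n(n-1)/2), counted over all
    pairs of observations; assumes no ties."""
    n = len(x)
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)

print(kendall_tau([1, 2, 3, 4], [10, 20, 30, 40]))  # same ordering -> 1.0
print(kendall_tau([1, 2, 3, 4], [40, 30, 20, 10]))  # opposite ordering -> -1.0
```

The probabilistic reading: (τ + 1)/2 is the probability that a randomly chosen pair is concordant rather than discordant.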
Multivariate Analysis
• Z = aX + bY, where
• a = {(VarY(avg.Xsolv - avg.Xdef)) - (CovXY(avg.Ysolv - avg.Ydef))}/((VarX×VarY) - (CovXY)²)
• b = {(VarX(avg.Ysolv - avg.Ydef)) - (CovXY(avg.Xsolv - avg.Xdef))}/((VarX×VarY) - (CovXY)²)
• where CovXY = Σ(X - avgX)(Y - avgY)/(n - 1)
• avg.Xsolv = mean of variable X for borrowers in the solvent category
• avg.Xdef = mean of variable X for borrowers in the defaulted group
• avg.Ysolv = mean of variable Y for borrowers in the solvent category
• avg.Ydef = mean of variable Y for borrowers in the defaulted category
• The cut-off Z-score is the combined benchmark for the identified independent variables to classify a prospective borrower into the defaulted or solvent category.
Statistical Scoring Model-Altman’s Z-Score Model
• SPSS
• STATA
• Eviews
• Bestfit/Easyfit
• Palisade@Risk
Thank You
Email: arindam@nibmindia.org