Empirical Research Methods-AB
Problem Solving Approach
Dr. Arindam Bandyopadhyay
Associate Professor (Finance)
Research Process: Define Problem → Plan Design / Specify Primary or Secondary Data → Sampling Procedure → Collect Data → Analyze Data → Prepare/Present Report → Follow Up
Econometric Problem Solving Approach
[Flow diagram: Theory and Facts → Statistical Model and Data → Structural Analysis, Evaluation, Forecasting]
The Need for Applied Statistical Tools
Data Collection Phases I-IV (Phases II and III shown):
Phase II: Dispatch of disk and e-mail containing the data entry template and description of data fields to all the participating banks by NIBM (2 weeks)
Phase III: Entry of all relevant data, as per template, by the bank representatives; dispatch of disk & e-mail containing the database by the bank representatives to NIBM (4 weeks)
• In the Literature Review phase you will have to order and digest the
material you have read, and then produce a structured summary and
critique of the reading you have done.
• The aim of the Literature Review is not to systematically catalogue
the reading you have done. Rather, using the hypothesis you have
developed or the project idea you have identified, you should use the
reading to identify the major ideas and threads of development, relate
work that has (perhaps) not previously been related, and thereby
justify and refine your hypothesis / ideas.
• Your literature review must be organised around ideas, with an assessment of previous studies (including their strengths and weaknesses).
• The Literature Review should "tell a story" that identifies the
development and blossoming of your ideas as you conducted your
literature search.
• It provides you the opportunity to persuade your reader (and
examiner) that your work is relevant and that it was worth doing!
Importance of Literature Review in the Research Report
Study | Period | Sample | Approach | Findings
Asarnow & Edwards (1995), Citibank data | 24 years | 831 commercial & industrial loans & structured loans (highly collateralized) | Workout | Average LGD is 35% for C&I loans; for structured loans: 13%
Altman, Edward I. & Vellore M. Kishore (1996), "Almost Everything You Wanted to Know About Recoveries on Defaulted Bonds" | 1978-95 | 696 defaulted bonds by seniority & industry class | Market | Average LGD is 58.3%: 42% for Sr. Secured, 52% for Sr. Unsecured, 66% Sr. Subordinate, 69% Jr. Subordinate
Altman, Brady, Resti and Sironi (2003), ISDA research report | 1982-2001 | 1,300 corporate bonds | Market | Average 62.8%; PD and LGD correlated
Gupton and Stein (2005), Moody's Investor Service, Global Credit Research | 1981-04 | 3,026 defaulted loans, bonds and preferred stocks | Market | Beta distribution fits recovery; small no. of LGD<0
LGD public studies…
Study | Period | Sample | Approach | Findings
Acharya et al. (2003) | 1982-1999 | 1,511 bond & debt instruments | Market | Liquidity, seniority, industry, firm profitability matter
Neto de Carvalho & Dermine (2003 & 2005) | June '85-Dec 2000 | 371 defaulted loans | Workout | Bi-modal LGD; size & collateral effects on LGD etc.
Araten, Michael, and Peeyush (2004), The RMA Journal, May, pp. 96-103 | 1982-'99 | 3,761 large corporate loans of JP Morgan | Workout | Average 39.8%; St. dev. 35.4%; Min/Max (20%/38%)
Mishra & Verma (2016), EPW | 2004-'05 to 2014-'15 | NPA movements & performance of recovery channels (DRTs, SARFAESI, Lok Adalats) | System level, based on various recovery channels | Weaker loan recovery rates; collateral quality, debtor credit relationship, credit environment matter
Example 2: Corporate Compliance Cost Study
Advantages:
• Answers a specific research question
• Data are current & can better give a realistic view
• Source of data is known & can have wide coverage
• Secrecy can be maintained
Disadvantages:
• Expensive
• Time consuming
• Quality declines if interviews/questionnaires are lengthy
• Reluctance to participate in lengthy interview/questionnaire filling
Disadvantages are usually offset by the advantages.
Secondary Data
Advantages:
• Ease of access; saves time and money if on target with the research problem
• Aids in determining direction for primary data collection
• Low cost to acquire
• Secondary research is often used prior to primary research to help clarify the research focus
• Provides a way to access the work of the best scholars all over the world
Disadvantages:
• May not be on target with the research problem & hence may not meet researcher's needs
• Quality and accuracy of data may pose a problem
• Not timely
Sources of Secondary Data
• Internal Bank Information
• Government Agencies: CSO, RBI
• Trade and Industry Associations: NIC
• Economic/Financial Research Firms: CRISIL, NCAER etc.
• Commercial Publications: RBI Report, Published Paper etc.
• News Media
Secondary Database
• Indian Database:
– CMIE PROWESS database (Firm Level)
– http://economicoutlook.cmie.com/ (Macro Database)
– http://www.epwrfits.in/NAS_Series.aspx
– Stock Market Database: www.nseindia.com
– CSO:http://www.mospi.gov.in/
– NIC: http://indiabudget.nic.in
– RBI Database: www.rbi.org.in (or http://dbie.rbi.org.in/DBIE/dbie.rbi?site=publications#!)
• RBI Publications: http://www.rbi.org.in/scripts/publications.aspx?publication=Annual
• Basic Statistical Returns-Credit, Deposit Distribution, Maturity Pattern etc. across
Banks
• Bank’s Balance-sheet data: Annual Accounts Data of Scheduled Commercial
Banks (1979 to 2004): http://www.rbi.org.in/scripts/Foreword.aspx
• Handbook of Statistics of Indian Economy:
http://www.rbi.org.in/scripts/AnnualPublications.aspx?head=Handbook%20of
%20Statistics%20on%20Indian%20Economy
• Daily, quarterly, fortnight, weekly data: http://www.rbi.org.in/scripts/statistics.aspx
• Data with short frequency: RBI Monthly Report:
http://www.rbi.org.in/scripts/BS_ViewBulletin.aspx
• Global Database:
– Federal Reserve:http://www.federalreserve.gov/releases/h15/data.htm
– Global Financial Database: www.globalfindata.com/index.php3?
action=global_financial_database_description
– Global Economic Parameters: http://www.economagic.com/
– Global Economic Statistics: http://www.econstats.com/index.htm
Data Templates & Formats
• The value of z (or t) can be found in statistical tables which contain the area under the normal curve (or t distribution). It is the abscissa of the curve that cuts off an area α at the tails (1 - α equals the desired level of confidence, say 95%). However, many researchers use the t-distribution (two-tail) table to obtain z when the population variance is unknown.
• The margin of error (e) is the difference between the population mean (μ) and the sample mean (x̄).
• Exercise: Suppose a researcher wishes to evaluate the effectiveness of a Financial Literacy Programme organized by banks, where farmers were encouraged to adopt a new practice. Assume there is a large population but we don't know the variability (or variance) in the proportion that adopts the practice. If we desire a 95% confidence level and ±5% precision, what would the resulting sample size be?
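One common answer to the exercise uses Cochran's formula with the conservative guess p = 0.5 (maximum variability when the true proportion is unknown). A minimal sketch, assuming the usual z = 1.96 for 95% confidence:

```python
import math

def cochran_n(z, p, e):
    """Cochran's sample size formula: n = z^2 * p * (1 - p) / e^2.
    p = 0.5 is the most conservative choice when variability is unknown."""
    return z * z * p * (1.0 - p) / (e * e)

# 95% confidence (z = 1.96), +/-5% precision, p = 0.5
n = cochran_n(z=1.96, p=0.5, e=0.05)
print(n, "->", math.ceil(n))  # 384.16 -> survey at least 385 farmers
```

Rounding up, the researcher would need a sample of about 385 farmers.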
Types of Samples
• Feasibility Study
• Data Collection Process
• Quantitative vs. Qualitative Data
• Micro vs. Macro Data
• Data Cleaning/filtering/editing: missing data, outliers, detecting errors
& omissions, checking accuracy, uniformity and consistency etc.
• Data entering & formatting (cross section/panel/time series)
• Data coding: Conversion of qualitative factors into quantitative ones &
data categorization/grouping.
• Cross checking information: data validation
• Model/Method Validation-DV vs. IV, Choice of Functions, Two
Variable vs. Multivariate Techniques, Cross Section vs. Panel,
Degrees of Freedom, Multicollinearity, Serial Correlation, Structural
Changes, Errors in Measurement, Non-Stationarity (trends,
seasonality), Non-linearity
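The cleaning step above (flagging outliers before analysis) can be sketched with a simple standardized-distance screen; the loan figures below are hypothetical, and robust rules such as the IQR method are often preferred in practice:

```python
import statistics

def flag_outliers(values, z_cut=3.0):
    """Flag observations whose distance from the mean exceeds
    z_cut sample standard deviations (a simple screening rule)."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [x for x in values if abs(x - mean) / sd > z_cut]

loans = [1.2, 0.9, 1.5, 1.1, 1.3, 90.9]  # hypothetical loan sizes
print(flag_outliers(loans, z_cut=2.0))   # [90.9]
```

Flagged values should be investigated (data entry error vs. genuine extreme), not automatically deleted.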
Empirical Methods
• Crucial. Many readers will read only the Introduction and the Conclusion.
• What should it contain?
– Remind the reader about the original question
– Remind the reader why this is important and interesting
– Tell the reader what your contribution is, but in much more detail than in the Introduction
– Future avenues of research
Research Presentation
Categorical Data: Graphing Data, Tabulating Data
The Summary Table: Frequency Distribution, Bar Charts
[Figure: Histogram and Ogive of a frequency distribution]
Example of Multi-dimension Bar Charts
• Outliers are values that are markedly smaller or larger than most other values in the same data.
Summary Statistics of a panel of open joint stock cos. during 2000-06 collected by SMIDA, Ukraine
Source: De Servigny & O. Renault, Measuring & Managing Credit Risk, McGraw-Hill
Descriptive Statistics of Bank Variables: Can you detect any issue?
Basic Concepts on Variables
[Figure: frequency histogram of Series: Loss_rate_bsp, 19 observations]
Mean 127.6842, Median 116, Maximum 297, Minimum 43, Std. Dev. 72.44619, Skewness 0.844582, Kurtosis 2.880478, Jarque-Bera 2.270154 (p = 0.321397)
Histogram of Bond Default Rate (bp)
Series: BONDDEF, Sample 1982-2000, Observations 19: Mean 127.6842, Median 116.0000, Maximum 297.0000, Minimum 43.00000, Std. Dev. 72.44619, Skewness 0.844582, Kurtosis 2.880478, Jarque-Bera 2.270154 (p = 0.321397)
[Figure: histogram over the range 25-300 bp]
Jarque-Bera test statistic: JB = ((N - k)/6) × [SK² + (KURT - 3)²/4]
where N = sample size and k = the number of estimated regression coefficients (k = 0 for a raw series). It follows a chi² distribution with 2 d.f.
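The reported Jarque-Bera value for the bond default series can be reproduced directly from its moments (here with k = 0, a raw series); a small sketch:

```python
def jarque_bera(n, skew, kurt):
    """JB = (n/6) * (SK^2 + (KURT - 3)^2 / 4); asymptotically chi-squared
    with 2 degrees of freedom under the null of normality."""
    return n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)

# Moments of the bond default series above (19 observations)
jb = jarque_bera(19, 0.844582, 2.880478)
print(round(jb, 4))  # 2.2702, matching the reported Jarque-Bera value
```

Since 2.27 is well below the 5% chi²(2) critical value of 5.99, normality is not rejected for this series.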
Descriptive Stats about Zone-wise Loan Distribution of a Bank
zone_group p1 p5 p10 p25 p50 p75 p90 p95 p99 min max range mean sd cv Kurto Gini HHI
Central_Z_I 0.11 0.4 0.69 1 1.68 3 7.67 11.15 84.79 0.01 90.89 90.9 4.042 10.40 2.57 56.47 0.634 0.356
Central_Z_II 0.03 0.39 0.62 0.99 1.43 2.54 6.97 13.71 107.7 0.01 211.43 211.4 5.410 20.75 3.83 76.68 0.724 0.519
East_Z 0.02 0.51 0.95 1.35 2.39 10.8 30.5 55.84 260 0.01 1251 1251.0 20.703 97.27 4.70 137.38 0.792 0.598
Mumbai_Z 0.04 0.29 0.64 1.48 4.14 15 49.6 133.7 560 0.004 1204.4 1204.4 27.815 91.71 3.30 67.32 0.786 0.572
North_Z 0.11 0.5 0.82 1.2 2.22 5.49 13.4 41.45 183.3 0.01 731.03 731.0 10.620 44.61 4.20 159.31 0.739 0.519
South_Z_I 0.13 0.83 0.97 1.31 2.4 6.41 24.3 38.27 97.3 0.02 380.5 380.5 8.735 23.87 2.73 155.87 0.701 0.421
South_Z_II 0.07 0.41 0.79 1.37 3.11 9.74 29 59 272.4 0.04 400 400.0 13.249 37.77 2.85 67.84 0.720 0.442
West_Z_I 0.21 0.73 0.94 1.53 3.27 11 29.7 105.5 225.1 0.12 385.4 385.3 18.296 48.88 2.67 31.59 0.759 0.547
West_Z_II 0.22 0.69 0.83 1.23 2.34 4.93 13.7 27.16 50.32 0.07 99.54 99.5 6.108 11.55 1.89 32.66 0.619 0.299
Total 0.08 0.46 0.79 1.24 2.51 7.94 26.1 52.41 250 0.004 1251 1251.0 15.505 62.26 4.02 163.61 0.771 0.578
The above table presents detailed summary statistics of the loan distribution across 9 zones.
p1, p5, …, p50, p75 etc. are the percentile values of the actual size of the loans; e.g., p50 measures the median loan size. The tail of the loan distribution is captured by the p99 percentile.
The pattern of percentiles, coefficients of variation, Gini and Herfindahl indices tells us there is a significant presence of geographic concentration in the corporate loan portfolio of the bank. West Zone II & Central Zone I are diversified. East Zone has the highest level of concentration because of the presence of two very large loans.
Gini & Lorenz Curve Measure of Inequality or Concentration
The Gini coefficient is the ratio of the area between the line of equality and the Lorenz curve (X) to the total area below the line of equality (X + Y):
Gini = X/(X + Y)
Or, more compactly, G = 1 - Σpᵢ×(zᵢ + zᵢ₋₁).
Geographic Loan Concentration: Gini Coefficient Approach
Zone-wise Inequality Comparison in Loan Distribution
[Figure: Lorenz curves plotting cumulative % of loan share against cumulative % of borrowers for each zone: Central_Z_I, Central_Z_II, East_Z, Mumbai_Z, North_Z, South_Z_I, South_Z_II, West_Z_I, West_Z_II]
G = 1 - Σpᵢ×(zᵢ + zᵢ₋₁)
Application-Rating Model Validation Tests: Lorenz Curve for Credit Scores
G = 1 - Σpᵢ×(zᵢ + zᵢ₋₁)
where Zᵢ is the cumulative % share in total default up to the ith group and Pᵢ is the relative default % share.
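The trapezoidal formula G = 1 - Σpᵢ(zᵢ + zᵢ₋₁) can be applied directly to raw loan amounts, with pᵢ the borrower-count share and zᵢ the cumulative amount share; a minimal sketch over hypothetical toy portfolios:

```python
def gini(amounts):
    """G = 1 - sum_i p_i * (z_i + z_{i-1}), with p_i = 1/n per borrower and
    z_i the cumulative share of loan amounts, sorted ascending."""
    xs = sorted(amounts)
    total = sum(xs)
    n = len(xs)
    g, z_prev, cum = 1.0, 0.0, 0.0
    for x in xs:
        cum += x
        z = cum / total
        g -= (1.0 / n) * (z + z_prev)
        z_prev = z
    return g

print(gini([1, 1, 1, 1]))    # perfectly equal pool -> 0.0
print(gini([0, 0, 0, 100]))  # one borrower holds everything -> 0.75
```

Values near the East Zone figure of 0.79 would signal heavy concentration in a handful of large loans.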
Straightforward measures of concentration: HHI
NIBM
Expected Loss (EL) based HHI Measure of Concentration
Besides rating, the EL-based HHI measure depends upon pool size and largest loan size, as well as the no. of loans. Hence, portfolio slicing is important!
• In fractions, an HHI of 0.020 for Pool 1 means it is well diversified; Pool 2 (0.10) shows a moderate level of concentration; and Pool 3 (0.352) indicates a high level of concentration.
• However, HHI does not tell us the ways to reduce concentration!
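The HHI itself is just the sum of squared portfolio shares; a minimal sketch (the second, concentrated pool is hypothetical, chosen to land near Pool 3's level):

```python
def hhi(exposures):
    """Herfindahl-Hirschman Index: sum of squared portfolio shares.
    Equals 1/n for a fully even pool and approaches 1 for one dominant loan."""
    total = sum(exposures)
    return sum((x / total) ** 2 for x in exposures)

# 50 equal loans -> HHI = 1/50 = 0.02, well diversified like Pool 1 above
print(round(hhi([1.0] * 50), 3))
# one loan dominates a small hypothetical pool -> 0.355, highly concentrated
print(round(hhi([55, 15, 10, 10, 10]), 3))
```

The reciprocal 1/HHI gives the "effective number" of equally sized loans, a convenient way to read the index.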
Theil Entropy Index
• The four moments about the mean describe the nature of the loss distribution in risk measurement.
• The mean is the location of a distribution, & the variance, or the square of the standard deviation, measures the scale of a distribution.
• Skewness is a measure of the asymmetry of the distribution. In risk measurement, it tells us whether the probability of winning is similar to the probability of losing, and the nature of losses.
• Negative skewness means there is a substantial probability of a big negative return. Positive skewness means that there is a greater-than-normal probability of a big positive return.
• Kurtosis is useful in describing extreme events (e.g., losses that are so bad that they have only a 1 in 1000 chance of happening).
• In extreme events, the portfolio with the higher kurtosis would suffer worse losses than the portfolio with lower kurtosis.
• Skewness and kurtosis are called the shape parameters.
Moments and the Nature of Distribution
• Since kurtosis measures the shape of the distribution (the fatness of the tails), it focuses on how losses are arranged around the mean.
– Leptokurtic means a smaller proportion of medium-sized deviations from the mean, but a larger proportion of extremely large and small deviations from the mean. Kurtosis greater than three indicates a sharp/high peak with a thin midrange and fat tails.
– Platykurtic means a smaller-than-normal proportion of deviations from the mean that are extremely small or large, and a larger proportion of medium-sized deviations from the mean. Kurtosis of less than three indicates a low peak with a fat midrange on either side.
– A normal distribution is called mesokurtic and has a kurtosis of 3.
Difference between Skewness & Kurtosis
Popular Discrete Distributions: Rule of Thumb for Identifying Them
• Binomial: variance < arithmetic mean
• Poisson: variance = arithmetic mean
• Negative Binomial: variance > arithmetic mean
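This rule of thumb can be automated as a quick screen before formal fitting; the count data and the 10% tolerance below are arbitrary illustrative choices, not from the slides:

```python
import statistics

def suggest_count_model(counts):
    """Compare sample variance with the arithmetic mean to suggest a
    candidate discrete distribution (a heuristic screen only; confirm
    with a formal goodness-of-fit test)."""
    m = statistics.mean(counts)
    v = statistics.variance(counts)  # sample variance
    if abs(v - m) / m < 0.1:         # arbitrary closeness tolerance
        return "Poisson (variance ~ mean)"
    return "Binomial (variance < mean)" if v < m else "Negative Binomial (variance > mean)"

print(suggest_count_model([2, 3, 2, 3, 2, 3]))  # under-dispersed -> Binomial
```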
Date | Fraud Amount ($)
18/01/2003 | 1285.73
26/01/2003 | 1268.1
26/01/2003 | 1392.33
08/01/2003 | 1257.85
20/01/2003 | 1261.13
22/02/2003 | 1252.79
…
09/08/2004 | 1251.9
13/09/2004 | 1347.66
19/09/2004 | 1282.3
26/09/2004 | 1269.83
12/10/2004 | 1312.61
23/10/2004 | 1256.37
27/10/2004 | 1299.78
[Figure: frequency histogram of the fraud amounts]
Binomial Distribution (example: N = 12, p = 0.8)
f(x) = [N!/(x!(N - x)!)] pˣ(1 - p)^(N-x)
Mean = Np and Variance = Np(1 - p)
The parameter p can be estimated by p̂ = x/N.
[Figure: binomial probability plot]
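The binomial pmf and its mean can be checked numerically; `binomial_pmf` is an illustrative helper, not from the slide:

```python
from math import comb

def binomial_pmf(x, n, p):
    """f(x) = C(n, x) * p^x * (1 - p)^(n - x)"""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

n, p = 12, 0.8
probs = [binomial_pmf(x, n, p) for x in range(n + 1)]
print(round(sum(probs), 6))                                       # 1.0
print(round(sum(x * q for x, q in zip(range(n + 1), probs)), 4))  # mean = n*p = 9.6
```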
Summary of Frequency of Loss Daily Data for Credit Card Fraud
Poisson Distribution: f(x) = e^(-λ) λˣ / x!, where e = 2.71828… and x = 0, 1, 2, …
No. of events per day (i) | Observed frauds (nᵢ) | i × nᵢ
0 | 19 | 0
1 | 16 | 16
2 | 51 | 102
3 | 9 | 27
4 | 6 | 24
5 | 5 | 25
6 | 4 | 24
7 | 6 | 42
8 | 2 | 16
9 | 1 | 9
10 | 0 | 0
11 | 0 | 0
12 | 2 | 24
13 | 1 | 13
14 | 0 | 0
15 | 2 | 30
Total | 124 | 352
Here, mean (lambda) λ = Σ(i × nᵢ)/Σnᵢ = 352/124 = 2.84, and SD = √2.84 = 1.68523.
Distribution of Credit Card Fraud Events
[Figure: observed frauds plotted against no. of events per day, 0-15]
Fitted Poisson Values for Credit Card Frauds (λ = 2.84)
No. of Events | Fitted Probability
0 | 5.84%
1 | 16.59%
2 | 23.56%
3 | 22.31%
4 | 15.84%
5 | 9.00%
6 | 4.26%
7 | 1.73%
8 | 0.61%
9 | 0.19%
10 | 0.05%
11 | 0.01%
12-24 | 0.00%
[Figure: fitted Poisson probability curve over 0-24 events]
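The fitted percentages can be reproduced from the Poisson pmf with λ = 2.84; a quick check:

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """f(x) = e^(-lambda) * lambda^x / x!"""
    return exp(-lam) * lam ** x / factorial(x)

lam = 2.84  # = 352/124 rounded, estimated from the frequency table
for x in range(4):
    print(x, f"{poisson_pmf(x, lam):.2%}")
# 0 5.84%, 1 16.59%, 2 23.56%, 3 22.31% -- matching the fitted table
```

Multiplying each probability by the 124 observed days gives the expected counts used in the goodness-of-fit test below.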
Chi-Sq. Goodness of Fit Test
• The risk manager should run a fit test to confirm the right selection of distribution.
• One such test is the chi-squared goodness of fit test.
• H0: The data follow a specified distribution (here Poisson)
• Ha: The data do not follow the specified distribution
• The test statistic is calculated by dividing the data into n bins (or ranges) and is defined as:
T = Σᵢ₌₁ⁿ (Oᵢ - Eᵢ)²/Eᵢ
• Where Oᵢ is the observed no. of events, Eᵢ is the expected (or fitted) no. of events, and n is the no. of categories.
• D.f. = n - k - 1, where k refers to the no. of parameters that need to be estimated.
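The statistic itself is a one-line sum over the bins; the four-bin counts below are a toy illustration, not the fraud data:

```python
def chi_sq_stat(observed, expected):
    """T = sum_i (O_i - E_i)^2 / E_i over the n bins; compare against the
    chi-squared critical value at the chosen significance level."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# toy illustration: observed vs expected counts across 4 equal-probability bins
t = chi_sq_stat([18, 22, 30, 30], [25, 25, 25, 25])
print(t)  # (49 + 9 + 25 + 25) / 25 = 4.32
```

In practice, bins with very small expected counts are usually merged before computing T so the chi-squared approximation holds.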
Chi-Sq Goodness of Fit Result
[Table excerpt: observed vs fitted frequencies per event count, with each bin's contribution to the chi-squared statistic]
Computed Chi² = 15.5845938 with 13 degrees of freedom (n - 1 = 14 - 1 = 13); the critical Chi² at 5% significance is 22.3620325. Since the computed statistic is below the critical value, we fail to reject the null hypothesis, and hence the Poisson distribution fits the data fairly well.
Fitness Test
[Figure: observed vs fitted frauds by events per month]
The Poisson distribution appears visually to fit the data fairly well.
Normal Distribution
If we’d measure very accurately a randomly distributed
characteristic in a very large sample of cases, we’d obtain a
frequency distribution which is symmetric and in which
most cases cluster around the mean.
Examples
• Suppose the daily change in price of a security follows the normal distribution with a mean of 70 bps and a variance of 9. What is the probability that on any given day the change in price is greater than 75 bps?
– Z= (75-70)/3 =1.67
– P(X>75)=P(Z>1.67)
– =1-P(Z<1.67)= 1-0.9525=0.0475
• Now estimate:
– Probability of change in price being 75 or fewer
– Probability of change in price being between 65 and 75 bps
– Probability of change in price being less than or equal to 60 bps
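The worked example (and the follow-up questions) can be checked with the standard normal CDF built from the error function; a minimal sketch:

```python
from math import erf, sqrt

def norm_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

mu, sigma = 70.0, 3.0  # mean 70 bps, variance 9 => sigma = 3
z = (75.0 - mu) / sigma
print(round(1.0 - norm_cdf(z), 4))  # P(X > 75) ~ 0.0478 (slide table: 0.0475)
print(round(norm_cdf(z), 4))        # P(X <= 75)
print(round(norm_cdf(z) - norm_cdf((65.0 - mu) / sigma), 4))  # P(65 < X < 75)
print(round(norm_cdf((60.0 - mu) / sigma), 6))                # P(X <= 60)
```

The small gap vs. the slide's 0.0475 comes from the table rounding z = 5/3 to 1.67.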
Confidence Interval…Example
• Suppose the mean operational loss X̄ = $434,045 and set the significance level α = 5%, so that we have a (1 - α) = 95% confidence interval around the estimate of the mean. Such an interval can be calculated using:
X̄ ± z_(α/2) × Stdev(X̄)
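The interval X̄ ± z × Stdev(X̄) is a one-liner; the standard error below is a hypothetical figure for illustration only, since the slide does not give one:

```python
# Interval estimate around the mean: X_bar +/- z_(alpha/2) * Stdev(X_bar)
mean_loss = 434_045.0   # mean operational loss from the example
stdev_mean = 25_000.0   # hypothetical standard error, not from the slide
z = 1.96                # z_(alpha/2) for a 95% confidence level

lower, upper = mean_loss - z * stdev_mean, mean_loss + z * stdev_mean
print(f"95% CI: ({lower:,.0f}, {upper:,.0f})")  # (385,045, 483,045)
```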
[Figure: histogram of Series: HIST_LGD over 0.0-1.0]
Sample 1-829, Observations 829: Mean 0.751924, Median 0.937150, Maximum 1.000000, Minimum 0.000000, Std. Dev. 0.323241, Skewness -1.160426, Kurtosis 3.063549, Jarque-Bera 186.1932 (p = 0.000000)
Market Risk Example: Histogram of Daily Returns for S&PCNXNIFTY over a 5-year period
Series: SNP_RETURN, Sample 1-1275, Observations 1275: Mean 0.001205, Median 0.002188, Maximum 0.079691, Minimum -0.130539, Std. Dev. 0.014263, Skewness -1.088501, Kurtosis 11.35109, Jarque-Bera 3956.755 (p = 0.000000)
[Figure: histogram of daily returns over -0.10 to 0.05]
Candidates of Popular Non-Normal Distributions
• Beta Distribution
• Log Normal Distribution
• Weibull
• Inverse Gaussian
• Exponential
• Laplace Distribution
Beta Distribution
Mean = α/(α + β) and Variance = αβ/[(α + β)²(α + β + 1)]
Method-of-moments estimates from the sample mean X̄ and sample variance S²:
α̂ = X̄[X̄(1 - X̄)/S² - 1]
β̂ = (1 - X̄)[X̄(1 - X̄)/S² - 1]
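The method-of-moments estimators for the Beta parameters are easy to sketch; the sample mean and variance below are illustrative values, not from the slides:

```python
def beta_mom(x_bar, s2):
    """Method-of-moments estimators for the Beta distribution:
    alpha_hat = X_bar * (X_bar(1 - X_bar)/S^2 - 1)
    beta_hat  = (1 - X_bar) * (X_bar(1 - X_bar)/S^2 - 1)"""
    common = x_bar * (1.0 - x_bar) / s2 - 1.0
    return x_bar * common, (1.0 - x_bar) * common

# symmetric sample (mean 0.5, variance 0.05) -> alpha ~ beta ~ 2.0
a, b = beta_mom(0.5, 0.05)
print(a, b)
```

Note the estimators require S² < X̄(1 - X̄); otherwise the fitted parameters turn negative and the Beta model is inappropriate.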
Log Normal Distribution
• Density Function: f(x) = (1/(xσ√(2π))) exp(-(ln x - μ)²/(2σ²)), x > 0
Kolmogorov-Smirnov Test (K-S)
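The K-S statistic is the largest gap between the empirical CDF and the candidate CDF; a minimal sketch (the evenly spaced sample and the uniform null below are illustrative choices):

```python
def ks_statistic(sample, cdf):
    """D = sup_x |F_n(x) - F(x)|, evaluated just before and after each
    jump of the empirical distribution function."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        d = max(d, abs(i / n - f), abs(f - (i - 1) / n))
    return d

# uniform(0,1) null, F(x) = x; this evenly spaced sample gives D = 0.1
print(ks_statistic([0.1, 0.3, 0.5, 0.7, 0.9], lambda x: x))
```

D is then compared with the K-S critical value for the sample size; large D rejects the candidate distribution.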
Severity Distribution: Legal Liability Loss
Skew 2.8064, Kurtosis 15.3145
[Figure: percent histogram of legal liability losses]
Normal Probability Plot for Legal Event Losses (P-P & Q-Q plots)
[Figure: P-P plot of fitted vs input p-values and Q-Q plot of fitted vs input quantiles, values in millions]
Fitted Exponential Distribution
Expon(149190), Shift = +1688.6; fitted parameters: Shift 1688.58848, b 149189.812; fitted minimum 1688.6 vs actual minimum 2754.2
[Figure: fitted exponential density against the actual data, values in millions]
Fitted Weibull Distribution to Cover the Fat Tail
[Figure: P-P and Q-Q plots of the fitted Weibull against the input data, values in millions]
VaR
Hypothesis Testing
• All hypothesis tests are conducted the same way. The researcher states a
hypothesis to be tested, formulates an analysis plan, analyzes sample data
according to the plan, and accepts or rejects the null hypothesis, based on
results of the analysis.
State the hypotheses. Every hypothesis test requires the analyst to state a
null hypothesis and an alternative hypothesis. The hypotheses are stated in
such a way that they are mutually exclusive. That is, if one is true, the other
must be false; and vice versa.
Formulate an analysis plan. The analysis plan describes how to use
sample data to accept or reject the null hypothesis. It should specify the
following elements.
• Significance level. Often, researchers choose significance levels (α) equal to 0.01, 0.05, or 0.10; but any value between 0 and 1 can be used.
Note that the p-value is the probability value that indicates whether the evidence obtained is statistically significant or not. Statisticians have encoded evidence on a 0 to 1 scale, where smaller values establish greater evidence of statistical significance; a p-value less than 0.05 is the generally accepted benchmark.
One-tailed test vs. Two-tailed Hypothesis Testing
• One-Tailed Test
– A test of a statistical hypothesis, where the region of rejection is on only one side of the sampling distribution, is called a one-tailed test. In such tests, we are only interested in values greater (or less) than the null.
– A one-sided hypothesis test is as follows: Test H0: k = 0 against HA: k > 0 (or k < 0), & we reject the null if Tcomp > Tcritical (or Tcomp < -Tcritical).
• Two-Tailed Test
– A test of a statistical hypothesis, where the region of rejection is on both sides of the sampling distribution, is called a two-tailed test. In such tests, we are interested in values both greater and smaller than the null hypothesis.
– We write this as:
– Test H0: k=0 against HA:k≠0 & we reject the null if | Tcomp |>Tcritical
– In the two-sided hypothesis, we calculate critical value using α/2. For
example, α=5%, the critical value of the test statistic is T0.025.
Problem 1: Two-tailed test
• Analyze sample data. Using sample data, we compute the standard error (SE), degrees of freedom (DF), and the t-score test statistic (t). For large samples it is also treated as a z statistic.
• SE = s / sqrt(n) = 20 / sqrt(50) = 20/7.07 = 2.83
DF = n - 1 = 50 - 1 = 49
t = (x̄ - μ) / SE = (295 - 300)/2.83 = -1.77
• where s is the standard deviation of the sample, x̄ is the sample mean, μ is the hypothesized population mean, and n is the sample size.
• Since we have a two-tailed test, the P-value is the probability that a t-score having 49 degrees of freedom is less than -1.77 or greater than 1.77.
• We use the t Distribution Calculator to find P(t < -1.77) = 0.04 and P(t > 1.77) = 0.04. Thus, the P-value = 0.04 + 0.04 = 0.08.
• Interpret results: Since the P-value (0.08) is greater than the significance level (0.05), we cannot reject the null hypothesis.
Problem 2: One-tailed test
• Bon Air Elementary School has 300 students. The principal of the
school thinks that the average IQ of students at Bon Air is at least
110. To prove her point, she administers an IQ test to 20 randomly
selected students. Among the sampled students, the average IQ is
108 with a standard deviation of 10. Based on these results, should
the principal accept or reject her original hypothesis? Assume a
significance level of 0.01.
– Null hypothesis: μ = 110
Alternative hypothesis: μ < 110
– Note that these hypotheses constitute a one-tailed test. The null
hypothesis will be rejected if the sample mean is too small.
Solution 2: One-tailed test
• Analyze sample data. Using sample data, we compute the standard error
(SE), degrees of freedom (DF), and the t-score test statistic (t).
• SE = s / sqrt(n) = 10 / sqrt(20) = 10/4.472 = 2.236
DF = n - 1 = 20 - 1 = 19
t = (x - μ) / SE = (108 - 110)/2.236 = -0.894
• where s is the standard deviation of the sample, x is the sample mean, μ is
the hypothesized population mean, and n is the sample size.
• Since we have a one-tailed test, the P-value is the probability that the t-
score having 19 degrees of freedom is less than -0.894.
• We use the t Distribution Calculator to find P(t < -0.894) = 0.19. Thus, the
P-value is 0.19.
Interpret results. Since the P-value (0.19) is greater than the significance
level (0.01), we cannot reject the null hypothesis.
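Both worked problems follow the same recipe, which can be sketched in a few lines. The p-values below use the normal approximation to the t distribution (reasonable at these degrees of freedom), so they differ negligibly from the t-table values quoted above:

```python
from math import sqrt
from statistics import NormalDist

def t_score(x_bar, mu, s, n):
    """t = (x_bar - mu) / (s / sqrt(n))"""
    return (x_bar - mu) / (s / sqrt(n))

# Problem 1 (two-tailed): n = 50, x_bar = 295, mu = 300, s = 20
t1 = t_score(295, 300, 20, 50)
p1 = 2 * NormalDist().cdf(-abs(t1))  # normal approximation to t with 49 d.f.
# Problem 2 (one-tailed): n = 20, x_bar = 108, mu = 110, s = 10
t2 = t_score(108, 110, 10, 20)
p2 = NormalDist().cdf(t2)            # lower-tail p, normal approximation
print(round(t1, 2), round(p1, 2))    # -1.77, p ~ 0.08
print(round(t2, 3), round(p2, 2))    # -0.894, p ~ 0.19
```

In both cases the p-value exceeds the chosen significance level, so the null hypothesis is not rejected.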
Hypothesis Testing: Bond Loss Example 1
Parametric Mean Difference Test
• Many problems arise where we wish to test hypotheses about the means of two different populations (e.g. comparing ratios of defaulted and solvent firms, or comparing performance of public sector banks vis-a-vis private banks, etc.)
• Un-paired test: H0: μ1 = μ2 against HA: μ1 ≠ μ2 (or a one-sided alternative).
• Start by assuming H0 is true and use the two-sample test statistic, t = (X̄1 - X̄2)/√(s1²/n1 + s2²/n2), to arrive at a decision.
• A low p-value (<0.05) will reject the null, and a high p-value (>0.10) will fail to reject the null.
Ex: Difference between Solvent & Defaulted Group of Borrowers
The hypothesis statements function the same way as in the two-sample t-test, but we are focused on the medians rather than on the means:
H0: η1 - η2 = 0
H1: η1 - η2 ≠ 0
[Table: ANOVA layout with columns Source of variation, d.f., Sum of squares, Mean sum of squares, F-statistic, p-value]
Spearman's Non-Parametric Rank Order Correlation
R = 1 - 6Σd²/(n³ - n)
• The tie-adjusted rank correlation coefficient is:
R′ = 1 - 6{Σd² + Σ(t³ - t)/12}/(n³ - n)
• t is the number of individuals involved in a tie either in the first or second series. => See the Excel Illustration.
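For the no-ties case, the formula R = 1 - 6Σd²/(n³ - n) can be sketched directly (the paired series below are illustrative):

```python
def rank(values):
    """Ordinal ranks (1 = smallest); assumes no ties for simplicity."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for r, i in enumerate(order, start=1):
        ranks[i] = r
    return ranks

def spearman(x, y):
    """R = 1 - 6 * sum(d^2) / (n^3 - n), d = difference in paired ranks."""
    n = len(x)
    d2 = sum((rx - ry) ** 2 for rx, ry in zip(rank(x), rank(y)))
    return 1.0 - 6.0 * d2 / (n ** 3 - n)

print(spearman([1, 2, 3, 4, 5], [2, 4, 6, 8, 10]))  # perfectly concordant -> 1.0
print(spearman([1, 2, 3, 4, 5], [10, 8, 6, 4, 2]))  # perfectly reversed -> -1.0
```

With ties present, average ranks and the tie-adjusted R′ formula above should be used instead.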
Kendall's Rank Correlation
• Check possible order combinations and compare the two order sets. Then count the number of different pairs (i.e. d) between these two order sets and estimate tau. => See the Excel Illustration.
• Because τ is based upon counting the number of different pairs between two ordered sets, its interpretation can be framed in a probabilistic context.
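The pair-counting definition of τ (concordant minus discordant pairs over all pairs, assuming no ties) can be sketched as follows; the data are illustrative:

```python
from itertools import combinations

def kendall_tau(x, y):
    """tau = (concordant - discordant) / (n(n-1)/2), counted over all
    pairs of observations; assumes no ties."""
    n = len(x)
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)

print(kendall_tau([1, 2, 3, 4], [10, 20, 30, 40]))  # same ordering -> 1.0
print(kendall_tau([1, 2, 3, 4], [40, 30, 20, 10]))  # opposite ordering -> -1.0
```

The probabilistic reading: (τ + 1)/2 is the probability that a randomly chosen pair is concordant rather than discordant.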
Multivariate Analysis
• Z = aX + bY, where
• a = {(VarY(avg.Xsolv - avg.Xdef)) - (CovXY(avg.Ysolv - avg.Ydef))}/((VarX×VarY) - (CovXY)²)
• b = {(VarX(avg.Ysolv - avg.Ydef)) - (CovXY(avg.Xsolv - avg.Xdef))}/((VarX×VarY) - (CovXY)²)
• where CovXY = Σ(X - avgX)(Y - avgY)/(n - 1)
• avg.Xsolv = mean of variable X for borrowers in the solvent category
• avg.Xdef = mean of variable X for borrowers in the defaulted group
• avg.Ysolv = mean of variable Y for borrowers in the solvent category
• avg.Ydef = mean of variable Y for borrowers in the defaulted category
• The cut-off Z-score is the combined benchmark for the identified independent variables to classify a prospective borrower into the defaulted or solvent category.
Statistical Scoring Model-Altman’s Z-Score Model
• SPSS
• STATA
• Eviews
• Bestfit/Easyfit
• Palisade@Risk
Thank You
Email: arindam@nibmindia.org