
ECO 391 – Research Methods I

Lecture Slides

By

Dr Jonathan E. Ogbuabor
Department of Economics, UNN
jonathan.ogbuabor@unn.edu.ng
+234(0)8035077722
Data Analysis
Types of data used in economic analysis:
Data used in economic analysis can be
classified according to:
 Source;
 Time dimension;
 Degree of aggregation.
Data classification according to source:
 Primary data: data collected for the first
time by the researcher through
questionnaire administration, face-to-face
interviews, or direct measurement/observation.
 Secondary data: data that have already
been collected by someone else, e.g. from
research reports, government documents,
statistical reports, institutional publications,
etc.
Things to consider before using any
secondary data:
 Is the data too old?
 Is the data in the required frequency/timing?
 Does the data cover the relevant research
variables/geographical space under study?
 Are the integrity, reliability and credibility of
the data collection process assured?
 Is the data in the form required for analysis, or
does it need to be transformed?
 Is the level of aggregation compatible with the
analysis to be done?
Data classification according to time dimension:
 Time series data: This gives information about
research variables from period to period.
It is a set of observations on the values that a
variable takes at different times, e.g. daily,
weekly, monthly, quarterly, or annually.
A time series is said to be stationary if its
mean and variance do not vary systematically
over time (more on this later).
Time dimension classification cont'd
 Cross-sectional data: These are data on one or
more variables collected at the same point in
time, e.g. a population census (conducted every
10 years), or yam production by the 36 states of
Nigeria in a given year.
 Panel data (or pooled data or combined data):
This combines the time series and cross-sectional
dimensions.
It gives the values of the same set of units over
time, arising from repeated surveys of the same
unchanged sample.
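The three time-dimension classes can be pictured as slices of one dataset. The sketch below is only an illustration: the state names, years and output figures are made up, and the helper names `time_series` and `cross_section` are hypothetical.

```python
# Hypothetical panel: yam output (made-up figures) for three states over three years.
# Each key is (unit, period), so the same units are observed repeatedly over time.
panel = {
    ("Benue", 2019): 30.0, ("Benue", 2020): 31.5, ("Benue", 2021): 33.0,
    ("Enugu", 2019): 10.0, ("Enugu", 2020): 11.0, ("Enugu", 2021): 12.5,
    ("Taraba", 2019): 20.0, ("Taraba", 2020): 19.0, ("Taraba", 2021): 21.0,
}

def time_series(data, unit):
    """Time series slice: one unit followed from period to period."""
    return {year: v for (u, year), v in sorted(data.items()) if u == unit}

def cross_section(data, year):
    """Cross-sectional slice: all units observed at the same point in time."""
    return {u: v for (u, y), v in sorted(data.items()) if y == year}

enugu_over_time = time_series(panel, "Enugu")
all_states_2020 = cross_section(panel, 2020)
```

Fixing the unit gives a time series, fixing the period gives a cross-section, and keeping both dimensions gives the panel.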
Classification by degree of aggregation:
 Micro-level data: Are observed for individuals,
households, firms.
 Macro-level data: Are observed for the
economy as a whole.
Note that data disaggregation allows for
deeper analysis showing intra-national or
intra-sample variations like income
classes, geographic regions, genders, etc
Categories of disaggregation are:
 Spatial dimensions: e.g. national, state, LGA,
town, urban, rural, etc
 Individual characteristics: e.g. age, sex, race,
ethnic group, migrant/non-migrant, etc
 Income: e.g. poverty classes, wealth quintile,
etc
 Education: levels of attainment such as
primary, secondary, tertiary, etc
 Employment: sector (Agric, Manufacturing,
Services, etc), formal/informal, etc
Data Preparation
 A researcher should invest time and effort in
preparing and presenting his data.
 This is to ensure that the data is put in a form that is
suitable for estimation
 This also means that the original data could undergo
certain transformations.
 In the case of time series data, such transformations
may include: Data indexing; Log transformation;
Conversion from one frequency (e.g. annual)
to another (e.g. quarterly); Seasonal
adjustment, etc.
• Log transformation is most common.
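Two of the transformations listed above, data indexing and log transformation, can be sketched as follows. The GDP figures are made up for illustration, and the function names are hypothetical rather than taken from any particular package.

```python
import math

def to_index(series, base_position=0, base_value=100.0):
    """Rescale a series so the observation at base_position equals base_value."""
    base = series[base_position]
    return [base_value * x / base for x in series]

def log_transform(series):
    """Natural log of each observation; only defined for strictly positive data."""
    if any(x <= 0 for x in series):
        raise ValueError("log transformation needs strictly positive data")
    return [math.log(x) for x in series]

def log_difference(series):
    """First difference of logs, approximately the period-to-period growth rate."""
    logs = log_transform(series)
    return [b - a for a, b in zip(logs, logs[1:])]

gdp = [200.0, 210.0, 231.0]        # made-up annual values
gdp_index = to_index(gdp)          # [100.0, 105.0, 115.5]
gdp_growth = log_difference(gdp)   # roughly 5% and 10% growth
```

The log difference is close to the percentage growth rate only when growth is small, which is one reason the log form is popular for macroeconomic series.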
Data Presentation
• We present our data using:
Tables (e.g. Frequency Distribution Tables)
Charts
Line Graphs
Scatter Plots
Pie chart
Bar chart
Area chart
etc
Rules Guiding Data Presentation:
• Give a title to all tables/charts;
• Scale & label the axes of all charts;
• Indicate the source of all tables/charts;
• Provide notes, where necessary, to aid
understanding;
• Discuss each table/chart before presenting
another one;
• Avoid too many tables/charts in one report;
• Be simple/concise.
Data Analysis
• Data analysis is the creative process that uses
exploratory, descriptive, and inductive
techniques to examine the nature, patterns,
and relationships of data.
• The nature and purpose of study determine
the type of analysis and approach used, but
general principles apply
• Data analysis is usually preceded by data
presentation.
Forms of Data Analysis
• Here, we consider two forms of data analysis:
• 1. Descriptive Statistics: This summarizes the
data as it exists at face value. We consider:
Mean; Mode ; Median
Range; Variance; Standard
Deviation
Skewness; Kurtosis;
Correlation Matrix (Very important in time series analysis)
Percentiles; etc
They do not involve in-depth analysis or reference
to statistical tables.
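Most of the descriptive statistics listed above can be computed with Python's standard `statistics` module. This is a minimal sketch on made-up data; skewness and kurtosis are computed here from population (divide-by-n) moments, which is one common convention among several.

```python
import statistics

def describe(data):
    """Return the basic descriptive statistics of a list of numbers."""
    n = len(data)
    mean = statistics.fmean(data)
    # Central moments with divisor n, used for skewness and kurtosis.
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    return {
        "mean": mean,
        "median": statistics.median(data),
        "mode": statistics.mode(data),
        "range": max(data) - min(data),
        "variance": statistics.variance(data),  # sample variance, divisor n - 1
        "std_dev": statistics.stdev(data),
        "skewness": m3 / m2 ** 1.5,             # 0 for a symmetric distribution
        "kurtosis": m4 / m2 ** 2,               # 3 for a normal distribution
    }

sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # made-up observations
summary = describe(sample)
```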
Simple Correlation Analysis
• Correlation measures the degree of relationship existing between 2 or
more variables, i.e. the degree to which the variables tend to
move together
• It can be simple (involving 2 variables) or multiple (involving 3
or more variables)
• 2 variables are positively correlated when they tend to change
together in the same direction; negatively correlated if in opposite
directions; uncorrelated when they tend to change with no
linear connection to each other
• A partial correlation coefficient recognises the covariability of
more than 2 variables and measures the correlation between any
2 variables while the other variables are held
constant
• It is standard to assume linearity and that all variables are
normally distributed, i.e. are drawn from the same population
Correlation Contd
• Note that $-1 \le r_{xy} \le 1$
• Pearson coefficient of correlation is commonly
used.
• Where the variables are qualitative or cannot be
measured numerically, the rank correlation
coefficient (or Spearman's correlation
coefficient) is used.
• $r = 1 - \frac{6 \sum D^2}{n(n^2 - 1)}$; where D is the difference between each pair
of ranks and n is the number of observations. $-1 \le r \le 1$ applies.
• Correlation does not imply a
cause-and-effect relationship.
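The Pearson and rank (Spearman) coefficients can be sketched in plain Python as below. The data are made up, and the simple ranking helper assumes there are no tied values, which is the case the rank formula above is meant for.

```python
import math

def pearson(x, y):
    """Pearson coefficient: covariation of x and y scaled by their own variation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """Rank from 1 (smallest) upward; assumes no tied values."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]

def spearman(x, y):
    """Rank correlation, 1 - 6*sum(D^2) / (n(n^2 - 1)), valid without ties."""
    n = len(x)
    d2 = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]          # y rises with x, but not linearly
r_pearson = pearson(x, y)      # below 1 because the relationship is curved
r_spearman = spearman(x, y)    # exactly 1.0 because the rankings agree
```

The example shows why the two measures can differ: the rank coefficient only asks whether the orderings agree, while the Pearson coefficient measures linear association.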
Data Analysis Contd
• 2. Inferential Statistics: Provides deeper
analysis, usually in consultation with statistical
tables, to decide whether or not to reject a
hypothesis. Examples are:
Regression analysis
Analysis of variance
Principal component analysis
Etc
Our emphasis here is on Regression Analysis.
Regression Analysis
• Regression analysis estimates and tests
how one variable is related to another
• It is a useful tool for prediction/forecasting,
inference, hypothesis testing, and modeling
causal relationships, provided the model
assumptions are satisfied
• A typical regression eqn contains the
dependent (response) variable, the
independent (explanatory) variables,
regression parameters to be estimated, and
an error term
Regression Analysis ContD
• We consider the following classes of models:
Simple linear regression model
Residual Diagnostics
Stability diagnostics
Granger Causality Analysis
Multiple regression model
Autoregressive Distributed Lag Models
Error correction models
Models of qualitative choice
Logit model; Probit model; Tobit model
Simple Linear Regression
• A functional rship is a statement of how one variable
depends on one or more other variables, e.g. GDP =
f(GEXP)
• In research, the task is to estimate this functional or
causal rship and test its validity/falsity
• Regression analysis enables us to determine the eqn
for this rship, its form and parameters
• Regression analysis may be linear or nonlinear,
simple or multiple
• Simple linear regression provides the procedure for
determining the line of best fit of the rship b/w, say
GDP and GEXP
Simple Linear Regression Contd
• The general form is: y = α + βx + ε …… 1
• where ε is the error term that captures the unpredictable part
of y; y and x are data quantities; α and β are unknown
parameters (“constants”) to be estimated from the data
• The values of the parameters are usually derived by the
method of ordinary least squares (OLS), which minimizes the sum of
squared residuals for the given dataset
• Once the regression model is estimated, we
check the goodness of fit (e.g. $R^2$); the statistical significance
of the estimated parameters (e.g. t-test for individual
parameters and F-test for overall fit); and the patterns of
the residuals
• F-test and t-test are meaningless unless the model
assumptions are satisfied
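The OLS estimates of α and β in equation 1 have a closed form: β is the covariation of x and y divided by the variation of x, and α makes the line pass through the means. A minimal sketch, using made-up GEXP and GDP figures and a hypothetical function name:

```python
def ols_simple(x, y):
    """Estimate y = alpha + beta*x by OLS; return alpha, beta, R^2 and residuals."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    beta = sxy / sxx
    alpha = my - beta * mx
    residuals = [b - (alpha + beta * a) for a, b in zip(x, y)]
    rss = sum(e ** 2 for e in residuals)   # the quantity OLS minimizes
    tss = sum((b - my) ** 2 for b in y)    # total variation in y
    r_squared = 1 - rss / tss
    return alpha, beta, r_squared, residuals

gexp = [1.0, 2.0, 3.0, 4.0, 5.0]    # made-up regressor values
gdp = [2.1, 3.9, 6.2, 7.8, 10.0]    # made-up dependent variable values
alpha, beta, r_squared, residuals = ols_simple(gexp, gdp)
```

With an intercept in the model, the residuals sum to zero, which is a quick check that the estimation was done correctly.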
Simple Linear Regression Contd
• Some important model assumptions of interest in
this course include:
 Fixed regressor values or regressor values independent of
the error term (Not often true)
 Model is linear in parameters (i.e. the dependent variable is a
linear function of the parameters and the error term) (Not true sometimes)
 Constant parameters
 Normality (the residuals are jointly normally distributed
random variables with zero mean) (Not so serious)
 No autocorrelation (All pairs of disturbances are
uncorrelated) (Serious)
 Homoskedasticity (The variance of the disturbance exists
and is constant) (Serious)
 No multicollinearity (pairs of regressors not highly
correlated) (Serious)
Multiple Regression Analysis
• This involves more than one regressor
• The general form is:
• $Y = \alpha + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_n X_n + \varepsilon$ …… 2
• As before, the parameters can be estimated
by the least squares method
• The standard assumptions still apply, e.g.
normally distributed errors, linear relationship between
dependent and independent variables,
homoscedasticity, etc
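For equation 2, least squares amounts to solving the normal equations (X'X)b = X'y, where X holds a column of ones plus the regressors. The sketch below does this in plain Python with Gaussian elimination; the data are constructed (not real) so that the true coefficients are known, and in practice a statistical package would be used instead.

```python
def solve(a, b):
    """Solve the linear system a*x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def ols_multiple(regressors, y):
    """regressors: list of rows [x1, x2, ...]; returns [alpha, beta1, beta2, ...]."""
    X = [[1.0] + list(row) for row in regressors]  # prepend the intercept column
    k = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)

# Constructed data: y is built exactly as 1 + 2*x1 - 0.5*x2, with no error term.
rows = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 6.0]]
y = [1.0 + 2.0 * x1 - 0.5 * x2 for x1, x2 in rows]
coefs = ols_multiple(rows, y)  # approximately [1.0, 2.0, -0.5]
```

If two regressors were perfectly collinear, X'X would be singular and the system could not be solved, which is the extreme form of the multicollinearity problem.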
Evaluation and Interpretation of Regression Results
• We consider the three criteria for evaluating
regression results:
 Economic Criteria (A Priori Criteria)
 Statistical Criteria (First Order Test)
 Econometric Criteria (Second Order Tests)
• Economic Criteria: The value of each regression
coefficient is a measure of how strongly each
regressor influences the dependent variable. For given
units of measurement, the larger the absolute value of the
coefficient, the larger the impact of the regressor. Evaluation is
generally based on whether the signs and magnitudes agree with
a priori economic expectations. If the variables are in log
form, the coefficients can be interpreted as
elasticities
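The elasticity interpretation of a log-log regression can be checked on constructed data. In the sketch below, quantity is built with a constant price elasticity of -0.5 (an assumption made purely for the illustration), and the slope of ln(quantity) on ln(price) recovers that value.

```python
import math

def slope(x, y):
    """OLS slope of y on x in a simple regression with an intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

price = [1.0, 2.0, 4.0, 8.0]
quantity = [100.0 * p ** -0.5 for p in price]  # constructed: elasticity is -0.5

log_price = [math.log(p) for p in price]
log_quantity = [math.log(q) for q in quantity]
elasticity = slope(log_price, log_quantity)    # approximately -0.5
```

A coefficient of -0.5 here reads as: a 1% rise in price is associated with roughly a 0.5% fall in quantity, whatever units price and quantity are measured in.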
Evaluation and Interpretation ContD
• Statistical Criteria: The first order tests include
 The t-test: To find out if the impact/influence of each
individual regressor on the dependent variable is
significant at a given level of significance or otherwise
(though not conventional, 2-t rule of thumb may be used)
 The F-test: To test the overall significance of the model
 The Coefficient of Determination ($R^2$): This is a measure
of goodness of fit. It measures the proportion of the total
variation in the dependent variable accounted
for by variations in the regressors, i.e. it is used to assess
the explanatory power of the regressors on the dependent
variable.
 Adjusted $R^2$: This takes into account the number of regressors and
the number of observations in the model, since the usual $R^2$ tends to
over-estimate the success of the model.
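The first order tests reduce to a few formulas. The sketch below uses the textbook expressions for the adjusted coefficient of determination, the overall F-statistic written in terms of the coefficient of determination, and the t-ratio of the slope in a simple regression. The function names and the data are illustrative, and k denotes the number of regressors excluding the intercept.

```python
import math

def adjusted_r2(r2, n, k):
    """Penalize R^2 for the number of regressors k, given n observations."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

def f_statistic(r2, n, k):
    """Overall significance: explained variation per regressor over unexplained per df."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))

def slope_t_statistic(x, y):
    """t-ratio (estimate divided by its standard error) of the slope of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    beta = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    alpha = my - beta * mx
    rss = sum((b - alpha - beta * a) ** 2 for a, b in zip(x, y))
    se_beta = math.sqrt(rss / (n - 2) / sxx)  # n - 2 degrees of freedom
    return beta / se_beta

x = [1.0, 2.0, 3.0, 4.0, 5.0]      # made-up data
y = [2.1, 3.9, 6.2, 7.8, 10.0]
t_beta = slope_t_statistic(x, y)   # far above 2, so significant by the 2-t rule
adj = adjusted_r2(0.9, 30, 3)      # slightly below the unadjusted 0.9
f_value = f_statistic(0.9, 30, 3)
```

The computed t and F values are then compared with critical values from the statistical tables at the chosen level of significance.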
Evaluation and Interpretation ContD
• Econometric Criteria: The tests under these criteria are used to
ascertain if the necessary assumptions of the model have
been satisfied.
They determine the reliability of the statistical criteria, and
establish if the OLS estimates are BLUE (best linear unbiased estimators).
We consider the following 2nd order tests:
 Normality Test: This tests if the error term follows the normal
distribution. The Jarque-Bera (JB) statistic, which asymptotically follows the
Chi-square distribution with 2 degrees of freedom, is usually used. H0: Error term is normally distributed
 Autocorrelation Test: To test if the residuals are serially
uncorrelated. As a rule of thumb, a Durbin-Watson (DW) statistic close to 2 indicates
absence of first-order autocorrelation. A more formal test is the Breusch-Godfrey
test. If autocorrelation is present, Newey-West HAC standard errors can be
used so that inference remains valid.
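The JB and DW statistics mentioned above are both simple functions of the regression residuals. A minimal sketch with made-up residual vectors chosen to show the extreme cases; the formulas are the standard textbook ones, and the function names are hypothetical.

```python
def durbin_watson(residuals):
    """Sum of squared successive differences over the sum of squared residuals."""
    num = sum((b - a) ** 2 for a, b in zip(residuals, residuals[1:]))
    den = sum(e ** 2 for e in residuals)
    return num / den

def jarque_bera(residuals):
    """JB = n/6 * (S^2 + (K - 3)^2 / 4), with S = skewness and K = kurtosis."""
    n = len(residuals)
    mean = sum(residuals) / n
    m2 = sum((e - mean) ** 2 for e in residuals) / n
    m3 = sum((e - mean) ** 3 for e in residuals) / n
    m4 = sum((e - mean) ** 4 for e in residuals) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    return n / 6 * (skew ** 2 + (kurt - 3) ** 2 / 4)

persistent = [1.0, 1.0, 1.0, -1.0, -1.0, -1.0]   # made-up: runs of the same sign
alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]  # made-up: sign flips every period

dw_low = durbin_watson(persistent)     # well below 2: positive autocorrelation
dw_high = durbin_watson(alternating)   # well above 2: negative autocorrelation
jb = jarque_bera([-1.0, 1.0, -1.0, 1.0])
```

The JB value is compared with the Chi-square critical value with 2 degrees of freedom (about 5.99 at the 5% level); the test is asymptotic, so samples as tiny as these are for illustration only.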
Econometric Criteria ContD
 Heteroscedasticity Test: To test if the error term has a
constant variance or not. The White Heteroscedasticity test
(no cross terms) or the Breusch-Pagan Test, both of which
follow the Chi-square distribution, can be used. H0: Homoscedasticity.
If heteroscedasticity is a problem, Newey-West HAC standard errors can be used.
 Test for Multicollinearity: The regressors should not be
strongly correlated as this causes problems when drawing
inferences about the impact/influence of a regressor on the
dependent variable. High simple correlations between pairs of
regressors and a relatively high $R^2$ with few significant
t-statistics give an indication of this problem. We will use the
correlation matrix for this test.
 Test for Specification Error or Model Stability: To test if the
model is correctly specified. The Ramsey RESET test, which
uses an F-statistic, tests H0: Model is correctly
specified.
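The correlation-matrix check for multicollinearity described above can be sketched as a scan over all pairs of regressors. The data are made up, and the 0.8 cut-off is only an assumed rule of thumb for this illustration, not a value given in these slides.

```python
import math

def pearson(x, y):
    """Simple (Pearson) correlation between two equally long lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def flag_collinear(columns, threshold=0.8):
    """Return the pairs of regressors whose absolute correlation exceeds threshold.

    columns maps a regressor name to its list of observations.
    The default threshold of 0.8 is an assumption, not a formal critical value.
    """
    names = list(columns)
    flagged = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if abs(pearson(columns[a], columns[b])) > threshold:
                flagged.append((a, b))
    return flagged

regressors = {  # made-up observations
    "x1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "x2": [2.1, 3.9, 6.2, 7.8, 10.0],  # moves almost one-for-one with x1
    "x3": [3.0, 1.0, 4.0, 1.0, 5.0],
}
suspect_pairs = flag_collinear(regressors)
```

A flagged pair suggests that the separate influence of each regressor on the dependent variable will be hard to pin down, so one of the two may need to be dropped or the variables combined.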
