Corrup Logit
Abstract
In this study, we analyse the data set compiled by the World Bank in its survey of firms in the European region (BEEPS), using a logistic regression model for binary response variables. We select variables that concern informal gifts or bribes and attempt to link them with the characteristics of a firm and components of its business environment. Among other factors, we find a significant relationship between being asked for a bribe in connection with acquiring a certificate required for operations and the total number of inspections carried out by enforcement agencies. We report marginal effects (at the mean and averaged over the sample) and odds ratios, and use diagnostic tests to check the robustness of the results.
1 Introduction
The Business Environment and Enterprise Performance Survey (BEEPS henceforth) is a joint initiative of the World Bank Group (World Bank) and the European Bank for Reconstruction and Development (EBRD). The objective of the survey was to collect firm-level data on matters relating to labour, relations with government, innovation and corruption. It covers 29 countries (predominantly in the European region), including Russia, Poland, Turkey and Hungary (a full list can be found on the BEEPS website). This study deals primarily with the data on corruption and on the incidence of requests for informal payments in connection with a firm's operations. The main question is which characteristics of a firm make it susceptible to such informal payments (bribes henceforth). The level of corruption in a country or region has proven to be a very important factor in determining the strength of its business environment. Svensson (2003) studies corruption using a unique data set from Uganda, finding that not all firms reported having to pay bribes and that there were differences between firms facing the same set of policies. A large literature uses perception indices to study the macro-determinants of corruption with cross-country data.¹
¹ Some of the studies that do this are Hellman, Jones, Kaufmann, and Schankerman (2000) and Kaufman and Wei (1999).
2 Econometric Model and Literature Review
In this study, we focus on the effect of several business environment factors and firm-specific characteristics on the incidence and reporting of bribery. We use a logistic regression (logit) model, in which the probability of a positive response is given by the logistic cumulative distribution function. The dependent variable is binary, taking the value y = 1 for a response of Yes and y = 0 for No; all other responses (non-response or refusal) were coded as missing values.
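The coding rule for the dependent variable can be sketched in Python as follows. This is a minimal illustration: the answer labels ("Yes", "No", "Refused", ...) and the function name code_response are hypothetical, since the actual BEEPS response codes are not reproduced in this paper.

```python
def code_response(answer):
    """Map a raw survey answer (hypothetical labels) to the binary dependent variable.

    "Yes" -> 1, "No" -> 0; every other answer (non-response, refusal,
    "don't know", ...) -> None, which is treated as a missing value.
    """
    return {"yes": 1, "no": 0}.get(answer.strip().lower())


answers = ["Yes", "No", "Refused", "Don't know", "No"]
coded = [code_response(a) for a in answers]
# Missing values are dropped before estimation.
estimation_sample = [y for y in coded if y is not None]
print(coded)              # [1, 0, None, None, 0]
print(estimation_sample)  # [1, 0, 0]
```

Coding refusals as missing, rather than as No, keeps them out of the estimation sample instead of treating silence as a negative answer.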
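$$x_1'\beta = \beta_1\,\text{b7} + \beta_2\,\text{ecaq69} + \beta_3\,\text{ecaw1} + \beta_4\,\text{e2}$$

and

$$p_{\text{ecap7}} = \Pr(\text{ecap7} = 1 \mid x_2) = \frac{e^{x_2'\beta}}{1 + e^{x_2'\beta}} \tag{2}$$

where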
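The logit probability in equation (2) can be computed directly from the linear index. A minimal Python sketch follows; the regressor values and coefficients in the example are hypothetical and do not correspond to BEEPS variables or to the estimates reported in Section 5.

```python
import math


def logit_prob(x, beta):
    """Pr(y = 1 | x) = exp(x'beta) / (1 + exp(x'beta)), the form of equation (2)."""
    index = sum(xi * bi for xi, bi in zip(x, beta))
    # Two algebraically equal forms, chosen so that exp() never overflows.
    if index >= 0:
        return 1.0 / (1.0 + math.exp(-index))
    e = math.exp(index)
    return e / (1.0 + e)


# Hypothetical firm: constant, one regressor equal to 10, another equal to 3.
p = logit_prob([1.0, 10.0, 3.0], [-1.0, 0.05, 0.1])
print(round(p, 4))                 # 0.4502
print(math.log(p / (1.0 - p)))     # recovers the index x'beta, about -0.2
```

The second print illustrates that the log-odds of the logit probability equal the linear index, which is why logit coefficients are read on the log-odds scale.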
Logit models have been found to yield results similar to those of probit models (which use the standard normal distribution), and are preferred for studying binary response variables over the linear probability model (LPM), which is known to produce biased and inconsistent estimators. The choice between logit and probit depends on the data generating process, but it has been suggested that the cost of choosing one over the other is small, in the sense that both predict similar probabilities (Cameron and Trivedi 2005). One difference is that the logistic distribution has thicker tails than the standard normal distribution used in the probit. Logit models are also widely used in biometrics, and a useful example is provided by Horowitz and Savin (2001). In the same paper, the authors highlight semiparametric alternatives to the logit model for cases where the form of dependence on the explanatory variables is difficult to ascertain; this is one way of overcoming the specification problems associated with binary response variables. Lastly, in a logit model the coefficients do not directly give the effect of a regressor on the probability that y = 1; they give only the direction of the change. We therefore focus on marginal effects, computed both at the sample mean of the regressors and as the sample average of the marginal effects evaluated at each observation.
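The claim about tail thickness can be checked numerically. A minimal sketch, assuming a like-for-like comparison: the standard logistic distribution has variance π²/3, so it is rescaled to unit variance before being compared with the standard normal. The function names are illustrative only.

```python
import math


def logistic_cdf(z, scale=1.0):
    """CDF of the logistic distribution with location 0 and the given scale."""
    return 1.0 / (1.0 + math.exp(-z / scale))


def normal_cdf(z):
    """Standard normal CDF, Phi(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Scale that gives the logistic distribution unit variance: sqrt(3) / pi.
unit_var_scale = math.sqrt(3.0) / math.pi

for z in (-1.0, -2.0, -3.0, -4.0):
    print(f"z={z:+.0f}  logistic={logistic_cdf(z, unit_var_scale):.5f}  "
          f"normal={normal_cdf(z):.5f}")
```

Near the centre (z = -1) the rescaled logistic puts slightly less mass below z than the normal, but from about two standard deviations outwards it puts more, which is what "thicker tails" means here.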
² For more details about the survey and data analysis using panel data models, refer to the website: http://beeps.prognoz.com/beeps/Home.ashx
3 Data Used
The dataset used in the analysis was downloaded from the BEEPS website (EBRD-World Bank 2010). It contains 437 variables, each with 11,998 observations. Of these, this study uses 10 variables in the regressions; nearly 20 further variables of interest were analyzed separately and are not included in this study.

After the missing values had been generated, some of the variables no longer had enough usable observations for regression and were therefore discarded. We conducted the analysis assuming homoscedastic error terms, since correcting for heteroscedasticity requires knowledge of its form. The errors of the estimated models were, however, found to be heteroscedastic, and ways to work around this are suggested in the next section.
The results of the BEEPS data analysis showed, among other things, that meetings in which firm owners or managers were asked to pay a bribe occurred with significant frequency, and that a large fraction of firms described corruption as the biggest obstacle for a firm in the region. The details can be found on the EBRD-World Bank data portal for BEEPS. Several studies have found conflicting results about how much corruption affects development in a country.⁴
³ Survey non-response was distinguished from item non-response. The former refers to refusals to participate in the survey altogether, whereas the latter refers to refusals to answer specific questions. BEEPS suffers from both problems, and different strategies were used to address them. A report summarized these numbers to alert researchers to these issues when using the data and drawing inferences.
⁴ Because the analysis is restricted to BEEPS data, very few studies suggest any relationship between the variables under study. See Hellman, Jones, Schankerman, and Kaufmann (1999) for a useful study of BEEPS data, especially regarding perception bias.
4 Evaluation of the Model
A logistic regression (reporting coefficients as well as log-odds ratio) was
carried out using Maximum Likelihood Estimation (MLE, hence-
forth). MLE is preferred over Ordinary Least Squares method of esti-
mation (OLS) as the latter yields a biased and inconsistent estimator,
as given in Greene (2003). The analysis also involves using marginal
effects (calculated at mean as well as taken at the sample average of the
individual observations) in order to determine the degree of effect of a
change in the measure of independent variables on the binary response
probability. We conclude this section with a brief outline of the diag-
nostic tests carried out for heteroscedasticity and goodness-of-fit.
1. MLE
MLE is well suited to binary response models, which are non-linear and must be solved iteratively. It chooses the parameter values that maximize the log-likelihood, that is, the logarithm of the joint density of the observed sample; for the logit model, $\ln L(\beta) = \sum_i \{ y_i \ln \Lambda(x_i'\beta) + (1 - y_i) \ln[1 - \Lambda(x_i'\beta)] \}$, where $\Lambda(z) = e^z/(1 + e^z)$. It can be shown that, under regularity conditions, MLE is consistent, asymptotically normally distributed and asymptotically efficient. Notably, MLE can be biased in small samples, although it remains consistent. Furthermore, MLE is not chosen to provide a good fit to the data or exceptional predictive power for the response probability, but simply to maximize the joint density of the observed dependent variable.
2. Marginal Effects
The analysis relies on marginal effects because, as pointed out earlier, the coefficients do not vary with x, whereas the effect of a regressor on the response probability does: in the logit model, $\partial p / \partial x_j = \Lambda(x'\beta)\,[1 - \Lambda(x'\beta)]\,\beta_j$, with $\Lambda(z) = e^z/(1 + e^z)$. Marginal effects therefore differ with the point at which x is evaluated: we can compute the average of the individual marginal effects (AME) or the marginal effect at the sample mean of the regressors (ME at mean). AME may be more useful than ME at mean when the explanatory variables are discrete or enter non-linearly, since the mean observation may not be representative of the entire sample. The precision of both types of marginal effects depends heavily on the size of the sample. If a dummy variable D explains the binary response outcome, its effect is found by evaluating the predicted probability at D = 1 and at D = 0 and taking the difference.
3. Log-odds ratios
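A common way to interpret logit results is through odds ratios. The odds are the probability that y = 1 relative to the probability that y = 0, that is $p/(1 - p)$. Since we are using a logit model, the log-odds are linear in the regressors, $\ln[p/(1 - p)] = x'\beta$ (Cameron and Trivedi 2005), so $\exp(\beta_j)$ is the odds ratio: the factor by which the odds are multiplied when $x_j$ increases by one unit.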
4. Heteroscedasticity tests
We used the Lagrange Multiplier (LM) test to check the assumption of homoscedasticity. Since the errors were found to be heteroscedastic, we cannot rely wholly on the results of MLE: in binary response models, heteroscedasticity of the latent error makes the MLE inconsistent. More flexible functional forms for the independent variables may help in achieving homoscedasticity, as may instrumental variables and two-stage least squares, as suggested by Evans and Schwab (1995). There is also a chance that the heteroscedasticity arises from some other form of misspecification of the model, which would cast doubt on what the LM test actually detects.
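Items 1 to 3 can be illustrated with a short Python sketch, assuming NumPy is available. It fits a logit by Newton-Raphson (one standard way of solving the likelihood equations iteratively, not necessarily the routine behind the estimates in Section 5), then computes the ME at mean, the AME, the discrete change for a dummy regressor, and odds ratios. The data are simulated; the regressors x and d and the parameter values are hypothetical, not BEEPS variables.

```python
import numpy as np


def logistic(z):
    """Logistic CDF: Lambda(z) = exp(z) / (1 + exp(z))."""
    return 1.0 / (1.0 + np.exp(-z))


def fit_logit(X, y, tol=1e-10, max_iter=100):
    """Logit MLE by Newton-Raphson.

    Maximizes ln L(b) = sum_i [y_i ln Lambda_i + (1 - y_i) ln(1 - Lambda_i)],
    with Lambda_i = Lambda(x_i'b). X should contain a column of ones.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = logistic(X @ beta)
        score = X.T @ (y - p)                             # gradient of ln L
        hessian = -(X * (p * (1.0 - p))[:, None]).T @ X   # negative definite
        step = np.linalg.solve(hessian, score)
        beta = beta - step                                # Newton update
        if np.max(np.abs(step)) < tol:
            break
    return beta


def marginal_effects(X, beta, j):
    """ME at mean and AME of continuous regressor j: dp/dx_j = L(1 - L) b_j."""
    p_bar = logistic(X.mean(axis=0) @ beta)     # evaluated at the sample mean
    me_at_mean = p_bar * (1.0 - p_bar) * beta[j]
    p_i = logistic(X @ beta)                    # evaluated at each observation
    ame = np.mean(p_i * (1.0 - p_i)) * beta[j]
    return me_at_mean, ame


def dummy_effect(X, beta, j):
    """Average discrete change for dummy j: Pr(y=1 | D=1) - Pr(y=1 | D=0)."""
    X1, X0 = X.copy(), X.copy()
    X1[:, j], X0[:, j] = 1.0, 0.0
    return np.mean(logistic(X1 @ beta) - logistic(X0 @ beta))


# Simulated (hypothetical) data: constant, continuous x, dummy d.
rng = np.random.default_rng(0)
n = 5000
x = rng.normal(size=n)
d = rng.integers(0, 2, size=n).astype(float)
X = np.column_stack([np.ones(n), x, d])
true_beta = np.array([-0.5, 0.8, -0.4])
y = (rng.random(n) < logistic(X @ true_beta)).astype(float)

beta_hat = fit_logit(X, y)
me, ame = marginal_effects(X, beta_hat, 1)
odds_ratios = np.exp(beta_hat[1:])   # factor change in the odds per unit change
print("coefficients:", beta_hat)
print("ME at mean:", me, "AME:", ame)
print("dummy effect:", dummy_effect(X, beta_hat, 2))
print("odds ratios:", odds_ratios)
```

Because the logit log-likelihood is globally concave, Newton-Raphson started from zero converges reliably; statistical packages typically report the same quantities through built-in marginal-effect routines.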
5 Results
5.1 First logit model
yields negative coefficients on b7 and f1 (percent capacity utilization of the firm, calculated as Output/Max. Output), and positive coefficients on all others. Turning our attention to the MEs and AMEs of this model, we can conclude the following:
Finally, the concerns that the business environment and the performance of a firm in a given country are affected by the level of corruption are not unfounded. The analysis shows that some characteristics of a firm significantly affect the incidence of bribery and corruption. A summary of all results is presented below:
Ecap1: whether a compulsory certificate was required. Ecap7: whether a bribe was requested in connection with the certificate.
ME = marginal effect at the mean; AME = average marginal effect; OR = odds ratio.

Variable | Ecap1 ME    | Ecap1 AME   | Ecap1 OR    | Ecap7 ME    | Ecap7 AME   | Ecap7 OR
---------|-------------|-------------|-------------|-------------|-------------|------------
b7       | -.004190**  | -0.004068** | 0.9833452** | -.0023739** | -.0024931** | 0.9736162**
ecaw1    | .006245**   | 0.0060601** | 1.025343**  | 0.0019926** | 0.0020927** | 1.022697**
ecaq69   | .0019098**  | 0.0018532** | 1.007683**  | 0.0011529** | 0.0012108** | 1.01307**
e2       | 0.213292*   | -0.0206976* | 0.9180745*  | -           | -           | -
b3       | -           | -           | -           | 0.0004625   | 0.0004857   | 1.005223
ecaw2    | -           | -           | -           | 0           | 0           | 1
f1       | -           | -           | -           | -0.0006369* | -0.0006689* | 0.9928521*

** Significant at 1%. * Significant at 5%.
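Because the logit marginal effect of a continuous regressor is Λ(1 - Λ)β_j and the odds ratio is exp(β_j), the reported values can be checked for internal consistency: dividing each ME (or AME) by β_j = ln(OR) should give the same factor for every regressor within a model. For the ME at mean this factor is Λ(x̄'β)[1 - Λ(x̄'β)]; for the AME it is the sample average of Λ_i(1 - Λ_i). The sketch below applies this check to the values copied from the results table, under the assumption that b7, ecaw1, ecaq69, b3 and f1 are treated as continuous regressors; e2 and ecaw2 (whose reported coefficient is zero) are left out.

```python
import math

# (ME at mean, AME, odds ratio) copied from the results table.
ecap1 = {
    "b7": (-0.004190, -0.004068, 0.9833452),
    "ecaw1": (0.006245, 0.0060601, 1.025343),
    "ecaq69": (0.0019098, 0.0018532, 1.007683),
}
ecap7 = {
    "b7": (-0.0023739, -0.0024931, 0.9736162),
    "ecaw1": (0.0019926, 0.0020927, 1.022697),
    "ecaq69": (0.0011529, 0.0012108, 1.01307),
    "b3": (0.0004625, 0.0004857, 1.005223),
    "f1": (-0.0006369, -0.0006689, 0.9928521),
}


def scale_factors(rows):
    """Return (ME / beta, AME / beta) per regressor, with beta = ln(OR)."""
    out = {}
    for name, (me, ame, odds_ratio) in rows.items():
        beta = math.log(odds_ratio)   # implied logit coefficient
        out[name] = (me / beta, ame / beta)
    return out


for label, rows in (("Ecap1", ecap1), ("Ecap7", ecap7)):
    for name, (me_ratio, ame_ratio) in scale_factors(rows).items():
        print(f"{label} {name:7s} ME/beta={me_ratio:.4f}  AME/beta={ame_ratio:.4f}")
```

Within each model the factors agree closely (about 0.25 and 0.24 for Ecap1, about 0.089 and 0.093 for Ecap7). A factor of p(1 - p) ≈ 0.089 corresponds to a predicted probability at the mean of roughly 0.10 (or, symmetrically, 0.90), while ≈ 0.25 corresponds to a probability close to one half.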
6 Improving the Analysis
There are several ways in which the analysis could have been improved, given more time and resources. Firstly, the heteroscedasticity found in the errors of the model could have been dealt with in several ways: as mentioned earlier, by using the two-stage least squares method of estimation, or by attempting to find a more flexible specification of the model. Another approach is the heteroscedastic probit model developed by Alvarez and Brehm (1995), estimated by maximum likelihood, in which the error variance is modelled as a function of covariates.
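A heteroscedastic probit replaces the constant error variance of the ordinary probit with one that depends on covariates. The following is a minimal sketch, assuming the exponential variance specification commonly used for this model, in which the latent error has standard deviation exp(z'γ). The function names and example values are hypothetical; the sketch computes only the probability and the log-likelihood and leaves the maximization to a numerical optimizer.

```python
import math


def norm_cdf(z):
    """Standard normal CDF, Phi(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def het_probit_prob(x, beta, z, gamma):
    """Pr(y = 1 | x, z) = Phi(x'beta / exp(z'gamma)).

    exp(z'gamma) is the standard deviation of the latent error, so the
    error variance may differ across firms. z should not contain a
    constant (it would not be identified). With gamma = 0 the model
    reduces to the ordinary homoscedastic probit.
    """
    index = sum(xi * bi for xi, bi in zip(x, beta))
    sd = math.exp(sum(zi * gi for zi, gi in zip(z, gamma)))
    return norm_cdf(index / sd)


def het_probit_loglik(data, beta, gamma):
    """Log-likelihood over observations (y, x, z); to be maximized numerically."""
    total = 0.0
    for y, x, z in data:
        p = het_probit_prob(x, beta, z, gamma)
        total += math.log(p) if y == 1 else math.log(1.0 - p)
    return total


# Hypothetical observations: x = (constant, regressor), z = (variance regressor,).
data = [(1, [1.0, 2.0], [1.0]), (0, [1.0, -1.0], [0.5]), (1, [1.0, 0.5], [0.0])]
print(het_probit_prob([1.0, 2.0], [0.5, 0.25], [1.0], [0.0]))  # ordinary probit
print(het_probit_prob([1.0, 2.0], [0.5, 0.25], [1.0], [0.5]))  # larger error variance
print(het_probit_loglik(data, [0.5, 0.25], [0.5]))
```

A larger error variance pulls the predicted probability towards one half, which is why ignoring heteroscedasticity distorts both the coefficients and the implied marginal effects.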
The data used were primarily cross-sectional, but since the survey has been conducted over a period of years, we could also employ panel data regression, using Generalized Least Squares (GLS) to estimate the model. Moreover, the original BEEPS data analysis involves complex weighting structures, since the data were collected by stratified random sampling; taking these factors into account could lead to different results.
7 Concluding Remarks
We have analyzed a very specific aspect of bribe-giving and corruption using data collected in the 2009 edition of BEEPS. In the first model, the requirement of compulsory certificates for a firm's operations was shown to be affected by the number of degree-holding employees and the total number of inspections in the past year. Our findings also show a significant effect of the total number of inspections in a given year (and of the top manager's years of work experience) on whether firms reported being asked for bribes. The data used to estimate both models displayed heteroscedasticity in the error terms, which makes the MLE results suspect. The various ways in which this can be remedied are briefly discussed in the paper. Finally, we suggest extending the analysis to specific countries and employing panel data methods to study the BEEPS data.
References
Alvarez, R. M., and J. Brehm (1995): "American Ambivalence Towards Abortion Policy: Development of a Heteroskedastic Probit Model of Competing Values," American Journal of Political Science, 39(4), 1055-1082.
Bardhan, P. (1997): "Corruption and Development: A Review of Issues," Journal of Economic Literature, 35(3), 1320-1346.
Ben-Akiva, M. E., and S. R. Lerman (1985): Discrete Choice Analysis: Theory and Application to Travel Demand. MIT Press, Cambridge, Mass.
Cameron, A. C., and P. K. Trivedi (2005): Microeconometrics. Cambridge University Press.
Cramer, J. S. (1999): "Predictive Performance of the Binary Logit Model in Unbalanced Samples," Journal of the Royal Statistical Society. Series D (The Statistician), 48(1), 85-94.
Dzhumashev, R., A. Islam, and Z. H. Khan (2010): "Non-collusive Corruption: Theory and Evidence from Education Sector in Bangladesh," Monash Economics Working Papers 38-10, Monash University, Department of Economics.
EBRD-World Bank (2010): The Business Environment and Enterprise Performance Survey 2008-09: A Report on Methodology and Observations.
Evans, W. N., and R. M. Schwab (1995): "Finishing High School and Starting College: Do Catholic Schools Make a Difference?," The Quarterly Journal of Economics, 110(4), 941-974.
Greene, W. (2003): Econometric Analysis. Prentice Hall, Upper Saddle River, NJ.
Hellman, J. S., G. Jones, D. Kaufmann, and M. Schankerman (2000): "Measuring governance, corruption, and State capture: how firms and bureaucrats shape the business environment in transition economies," Policy Research Working Paper Series 2312, The World Bank.
Hellman, J. S., G. Jones, M. Schankerman, and D. Kaufmann (1999): "Measuring Governance, Corruption, and State Capture: How Firms and Bureaucrats Shape the Business Environment in Transit," Research Working Papers, 1(1), 1-45.
Horowitz, J. L., and N. E. Savin (2001): "Binary Response Models: Logits, Probits and Semiparametrics," Journal of Economic Perspectives, 15(4), 43-56.
Kaufman, D., and S.-J. Wei (1999): "Does 'grease money' speed up the wheels of commerce?," Policy Research Working Paper Series 2254, The World Bank.
Kay, R., and S. Little (1987): "Transformations of the explanatory variables in the logistic regression model for binary data," Biometrika, 74(3), 495-501.
Silva, J. M. C. S. (2001): "A score test for non-nested hypotheses with applications to discrete data models," Journal of Applied Econometrics, 16(5), 577-597.
Svensson, J. (2003): "Who Must Pay Bribes and How Much? Evidence from a Cross Section of Firms," The Quarterly Journal of Economics, 118(1), 207-230.
Svensson, J. (2005): "Eight Questions about Corruption," The Journal of Economic Perspectives, 19(3), 19-42.
Transparency International (2004): Corruption Perceptions Index 2004, Discussion paper, Transparency International.
Figure 1: 2 × 2 table for first logit model
Figure 2: 2 × 2 table for second logit model