Corrup Logit

EC910 - Econometrics B Project
Everyone Has Their Price: An Analysis of Corruption from the World Bank Business Environment and Enterprise Performance Survey (BEEPS)
March 18, 2012
Abstract
In this study, we analyse the data set compiled by the World Bank in its survey of firms in the European region (BEEPS) using a logistic regression model for binary response variables. We select variables that concern informal gifts or bribes and attempt to link them with the characteristics of a firm and components of its business environment. Among other factors, we find a significant relationship between being asked for a bribe in connection with acquiring a certificate required for operations and the total number of inspections carried out by enforcement agencies. Marginal effects and odds ratios are used to interpret these relationships, and diagnostic tests are used to check the robustness of the results.
1 Introduction
The Business Environment and Enterprise Performance Survey (BEEPS henceforth) is attributable to the World Bank Group (World Bank) and the European Bank for Reconstruction and Development (EBRD). The objective of the survey was to collect firm-level data on matters relating to labour, relations with government, innovation and corruption. It covers 29 countries (predominantly in the European region), including Russia, Poland, Turkey and Hungary (a full list can be found on the BEEPS website). This study deals primarily with data relating to corruption and the incidence of requests for informal payments in connection with a firm's operations. The main issue is which characteristics of a firm make it susceptible to such informal payments (bribes henceforth). The level of corruption in a country or region has proven to be an important factor in determining the strength of its business environment. Svensson (2003) studies corruption using a unique firm-level data set from Uganda, finding that not all firms reported having to pay bribes and that there were differences between firms facing the same set of policies. A large literature uses perception indices and cross-country data to study the macro-determinants of corruption.¹
The main purpose of the analysis is to understand how factors in a firm's business environment (and the firm's performance) affect the incidence of corruption in that environment. We also determine the impact, and its size, of several factors on bureaucracy and bribery in one particular component of business operations: the requirement that a firm obtain a certificate in order to operate.
The rest of the paper is structured as follows. Section 2 specifies the econometric model and briefly surveys the literature. Section 3 introduces the data and discusses the quality of the dataset. Section 4 lists the components of the analysis, with the advantages and disadvantages of each. Section 5 reports selected results and discusses their implications. Finally, Section 6 suggests how the analysis can be extended and improved, followed by concluding remarks.
¹ Some of the studies that do this are Hellman, Jones, Kaufmann, and Schankerman (2000) and Kaufmann and Wei (1999).
2 Econometric Model and Literature Review
In this study, we focus on the effect of several business environment factors and firm-specific characteristics on the incidence and reporting of bribery. We use a logistic regression model, in which the response probability is given by the logistic cumulative distribution function. The dependent variable is binary, taking the value y = 1 for a response of "Yes" and y = 0 for "No"; all other responses (non-response or refusal) were coded as missing.
We expect several independent variables to affect the response probability. Most are measured in absolute terms, but one regressor enters as a ratio (total capacity utilization, f1). Two relationships are studied, given by the following two logit specifications, where $p$ denotes the probability that the binary response equals one and $\Lambda(\cdot)$ is the logistic cumulative distribution function. The $x$'s are explanatory variables, with coefficients $\beta_i$ in the first model and $\gamma_i$ in the second:

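$$p_{ecap1} = \Lambda(x_1'\beta) = \frac{e^{x_1'\beta}}{1 + e^{x_1'\beta}} \tag{1}$$
where
$$x_1'\beta = \beta_1\,\text{b7} + \beta_2\,\text{ecaq69} + \beta_3\,\text{ecaw1} + \beta_4\,\text{e2}$$
and
$$p_{ecap7} = \Lambda(x_2'\gamma) = \frac{e^{x_2'\gamma}}{1 + e^{x_2'\gamma}} \tag{2}$$
where
$$x_2'\gamma = \gamma_1\,\text{b7} + \gamma_2\,\text{ecaq69} + \gamma_3\,\text{ecaw1} + \gamma_4\,\text{f1} + \gamma_5\,\text{b3} + \gamma_6\,\text{ecaw2}$$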
Logit models have been found to yield results similar to those of probit models (which are based on the standard normal distribution), and both are preferred for studying binary response variables to a linear probability model (LPM), which can produce biased and inconsistent estimators. The choice of model depends on the data generating process, but it has been suggested that the cost of choosing either logit or probit is small, in the sense that the two predict similar probabilities (Cameron and Trivedi 2005). One difference is that the logistic distribution has thicker tails than the standard normal distribution underlying the probit. Logit models are also particularly useful in biometrics, and a useful example is provided by Horowitz and Savin (2001). In the same paper, they highlight semiparametric methods of estimating binary response models when it is difficult to ascertain the form of dependence on the explanatory variables; this is one way of overcoming specification problems associated with binary response variables. Lastly, in a logit model the estimated coefficients do not directly measure the effect of a regressor on the response probability; they give only the direction of the change. We therefore focus on marginal effects, both evaluated at the sample mean of the regressors and computed as the sample average of the individual marginal effects.
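The two kinds of marginal effect can be sketched in a few lines of Python. For the logit, $\partial \Lambda(x'\beta)/\partial x_j = \Lambda(x'\beta)[1-\Lambda(x'\beta)]\beta_j$; the sketch below evaluates this at the sample mean of the regressors (ME at mean) and averages it over the observations (AME), and also computes the discrete change used for a dummy regressor. It is an illustration, not the Stata routine used in this paper, and the data and coefficients in it are hypothetical.

```python
import math


def logistic_cdf(z):
    """Logistic CDF Lambda(z) = e^z / (1 + e^z), computed without overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def linear_index(x, beta):
    """Linear index x'beta; include a column of ones in x for an intercept."""
    return sum(xk * bk for xk, bk in zip(x, beta))


def me_at_mean(X, beta, j):
    """Marginal effect of regressor j evaluated at the sample mean of the regressors."""
    n = len(X)
    xbar = [sum(row[k] for row in X) / n for k in range(len(beta))]
    p = logistic_cdf(linear_index(xbar, beta))
    return p * (1.0 - p) * beta[j]


def average_me(X, beta, j):
    """Average over observations of Lambda_i (1 - Lambda_i) beta_j (the AME)."""
    total = 0.0
    for row in X:
        p = logistic_cdf(linear_index(row, beta))
        total += p * (1.0 - p) * beta[j]
    return total / len(X)


def dummy_effect(x, beta, j):
    """Discrete change P(y=1 | D=1) - P(y=1 | D=0) for dummy j, other regressors fixed at x."""
    x1, x0 = list(x), list(x)
    x1[j], x0[j] = 1.0, 0.0
    return logistic_cdf(linear_index(x1, beta)) - logistic_cdf(linear_index(x0, beta))


# Hypothetical example: intercept plus one regressor, coefficients made up.
X_demo = [[1.0, -1.0], [1.0, 1.0]]
beta_demo = [0.0, 2.0]
print(me_at_mean(X_demo, beta_demo, 1))  # 0.5: the mean index is 0, where Lambda(1 - Lambda) = 0.25
print(average_me(X_demo, beta_demo, 1))  # about 0.21, smaller than the ME at the mean
```

In this toy case the mean observation sits exactly where the logistic density peaks, while neither actual observation does, so the ME at mean (0.5) overstates the AME (about 0.21); this is the sense in which the mean observation can be unrepresentative of the sample.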
Other research that studies corruption in a microeconomic framework, or uses survey data, has often found conflicting results depending on the area of study. A study of non-collusive corruption in the education sector in Bangladesh found that a specific theoretical model fit the predictions of Transparency International's Corruption Perceptions Index, 2007 (Transparency International 2004), and that bribery is related to the characteristics of the bribe payer (Dzhumashev, Islam, and Khan 2010). Svensson (2003) built on a unique cross-sectional data set collected from firms in Uganda and found that not all firms report paying bribes, and that bribe payments vary among firms facing the same institutions and policies. BEEPS, which this analysis draws upon, has compiled data and expects to carry out another survey in 2012² (EBRD-World Bank 2010). Other studies of corruption mainly focus on the effect of corruption (in both its political and bureaucratic forms) on development (Bardhan 1997), while a comprehensive discussion of the issue is provided by Svensson (2005).
² For more details about the survey and data analysis using panel data models, refer to the website: http://beeps.prognoz.com/beeps/Home.ashx
3 Data Used
The dataset used in the analysis was downloaded from the BEEPS website (EBRD-World Bank 2010). It contains 437 variables, each with 11,998 observations. This study uses 10 of these variables in the regressions; nearly 20 further variables of interest were analyzed separately and are not reported here.
Over the years, BEEPS has administered many thousands of interviews across the European region. The variables used in this analysis are binary responses to yes-or-no questions (e.g., "Was a gift or informal payment requested in connection with granting a permit to the firm?"), which were decoded using the egen function in Stata. Survey non-response and refusals to answer were treated as missing data³, and these missing values were also generated in Stata, using the mvdecode function.
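The recoding step can be sketched as follows. This is a Python analogue of the Stata recoding, not the code used in the paper, and it assumes a hypothetical coding of the raw answers (1 = Yes, 2 = No, negative codes for "don't know" or refusal); the actual BEEPS codes should be checked against the questionnaire.

```python
from typing import Optional, Sequence

# Hypothetical coding of a raw BEEPS yes/no item (check the questionnaire):
# 1 = Yes, 2 = No, negative values = don't know / refusal / not applicable.
YES_CODE = 1
NO_CODE = 2


def recode_yes_no(raw: int) -> Optional[int]:
    """Map a raw answer to y = 1 (Yes), y = 0 (No) or None (missing)."""
    if raw == YES_CODE:
        return 1
    if raw == NO_CODE:
        return 0
    return None  # plays the role of Stata's missing value after mvdecode


def usable_share(values: Sequence[Optional[float]]) -> float:
    """Share of non-missing observations; variables with too low a share are candidates for dropping."""
    return sum(v is not None for v in values) / len(values)


def complete_cases(rows):
    """Listwise deletion: keep observations with no missing value in any model variable."""
    return [row for row in rows if all(v is not None for v in row)]


print([recode_yes_no(v) for v in [1, 2, -9, 1, -7]])
```

Listwise deletion mirrors what Stata's logit does by default: an observation with a missing value in any model variable is excluded from estimation, which is why variables with many missing answers can shrink the usable sample sharply.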
After the missing values were generated, some variables had too few usable observations to be used in the regressions and were therefore discarded. The analysis assumes homoscedastic error terms, because correcting for heteroscedasticity requires knowledge of its form. The errors of the specified models were nonetheless found to be heteroscedastic; ways to work around this are suggested in the next section.
The results of the BEEPS data analysis show, among other things, that meetings in which firm owners or managers were asked to pay a bribe occurred with significant frequency, and that a large fraction of firms described corruption as the biggest obstacle for a firm in the region. Details can be found on the EBRD-World Bank data portal for BEEPS. As mentioned in the literature review, several studies have found conflicting results about how much corruption affects development in a country.⁴
³ Survey non-response was distinguished from item non-response. The former refers to refusals to participate in the survey altogether, whereas the latter refers to refusals to answer specific questions. BEEPS suffers from both problems, and different strategies were used to address them. A report summarized these numbers to alert researchers to these issues when using the data and making inferences.
⁴ Since the analysis is restricted to BEEPS data, very few studies suggest any relation between the variables under study. See Hellman, Jones, Schankerman, and Kaufmann (1999) for a useful study of BEEPS data, especially regarding perception bias.
4 Evaluation of the Model
A logistic regression (reporting coefficients as well as odds ratios) was carried out using Maximum Likelihood Estimation (MLE henceforth). MLE is preferred over Ordinary Least Squares (OLS), since the latter yields a biased and inconsistent estimator in this setting (Greene 2003). The analysis also uses marginal effects (evaluated at the mean and averaged over the individual observations) to determine how much a change in an independent variable affects the binary response probability. We conclude this section with a brief outline of the diagnostic tests carried out for heteroscedasticity and goodness of fit.
1. MLE
MLE is well suited to binary response models, which are nonlinear and must be solved iteratively. MLE maximizes the log-likelihood, that is, the log of the joint probability of observing the given sample. Under regularity conditions, MLE can be shown to be consistent, asymptotically normally distributed and asymptotically efficient. Notably, MLE can be biased in small samples, although it remains consistent. Furthermore, MLE is not chosen to provide a good fit to the data or exceptional predictive power for the binary response probability, but simply to maximize the joint density of the observed dependent variable.
2. Marginal Effects
The analysis computes several marginal effects because, as pointed out earlier, the coefficients do not directly give the effect on the response probability; unlike the coefficients, the marginal effects vary with x. They therefore depend on the point at which x is evaluated: we can compute the average of the individual marginal effects (AME) or the marginal effects at the sample mean of the regressors (ME at mean). AME may be more useful than ME at mean when the explanatory variables are discrete or enter nonlinearly (i.e., the mean observation may not be representative of the entire sample). The precision of both types of marginal effect depends on the size of the sample. If a dummy variable (D) explains the binary response outcome, its effect is found by evaluating the predicted probability at D = 1 and at D = 0 and taking the difference.
3. Log-odds and odds ratios
A common way to interpret a logit model is through the odds, $P(y=1)/P(y=0)$, i.e., the probability that y = 1 relative to the probability that y = 0. In a logit model the log-odds are linear in the regressors (Cameron and Trivedi 2005), so $\exp(\beta_j)$ is the odds ratio: the factor by which the odds change when $x_j$ increases by one unit.
4. Heteroscedasticity tests
We use a Lagrange Multiplier (LM) test to check the assumption of homoscedasticity. Since the errors are found to be heteroscedastic, we cannot rely wholly on the MLE results. Introducing more flexible functional forms of the independent variables may help achieve homoscedasticity; instrumental variables and two-stage least squares, as suggested by Evans and Schwab (1995), are a further option. It is also possible that the heteroscedasticity arises from some other form of misspecification of the model, which casts doubt on what a rejection by the LM test actually indicates.
5. Goodness of fit test
For a dependent variable $y_i$, the predicted outcome is $\hat y_i = 1$ when the fitted probability (of ecap1 or ecap7) is at least 0.5, and $\hat y_i = 0$ when it is below 0.5. An observation is correctly predicted when $\hat y_i = y_i$ (both 0 or both 1), and the percentage correctly predicted is the share of such observations. McFadden's $R^2 = 1 - \ln L_u / \ln L_0$ compares the log-likelihood of the unrestricted model (keeping all variables), $\ln L_u$, with that of a restricted specification in which only the intercept is kept, $\ln L_0$. If the covariates have no explanatory power, this measure returns a value of zero. Several other specification tests have been devised to evaluate the model. Silva (2001) uses the score function to test the model (an approach commonly used in single-index formulations). Ben-Akiva and Lerman (1985) and Kay and Little (1987) use a prediction rule to determine the average probability of correct predictions. Cramer (1999) uses a measure that penalizes failure to correctly predict the binary response outcomes.
6. Omitted Regressor test
An omitted regressor test is conducted for the first model to assess the predictive ability of an omitted explanatory variable. Omitted regressors can also make the coefficients on the included variables inconsistent; this matters less here because our main interest lies in the MEs and AMEs rather than in the coefficients themselves.
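The estimation step in item 1 and McFadden's $R^2$ in item 5 can be illustrated with a minimal pure-Python sketch. It fits a logit by Newton-Raphson, the standard iterative scheme for this likelihood, on a small hypothetical data set (`X_demo` and `y_demo` are made up, not BEEPS data); it is not the Stata routine used in the paper, and a library implementation would be the natural choice for real work.

```python
import math


def logistic_cdf(z):
    """Logistic CDF Lambda(z) = 1 / (1 + exp(-z)), written to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def solve(A, b):
    """Solve A d = b by Gaussian elimination with partial pivoting (small k only)."""
    n = len(b)
    M = [list(A[i]) + [b[i]] for i in range(n)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    d = [0.0] * n
    for r in range(n - 1, -1, -1):
        d[r] = (M[r][n] - sum(M[r][k] * d[k] for k in range(r + 1, n))) / M[r][r]
    return d


def fitted_probs(X, beta):
    """Lambda(x_i'beta) for every observation."""
    return [logistic_cdf(sum(xk * bk for xk, bk in zip(row, beta))) for row in X]


def log_likelihood(X, y, beta):
    """Sum over i of ln p_i if y_i = 1, else ln(1 - p_i)."""
    return sum(math.log(p) if yi == 1 else math.log(1.0 - p)
               for yi, p in zip(y, fitted_probs(X, beta)))


def fit_logit(X, y, tol=1e-10, max_iter=100):
    """Logit MLE by Newton-Raphson: beta <- beta + (X'WX)^{-1} X'(y - p), W = diag(p(1 - p))."""
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(max_iter):
        p = fitted_probs(X, beta)
        score = [sum((yi - pi) * row[a] for row, yi, pi in zip(X, y, p)) for a in range(k)]
        info = [[sum(pi * (1.0 - pi) * row[a] * row[b] for row, pi in zip(X, p))
                 for b in range(k)] for a in range(k)]
        step = solve(info, score)
        beta = [bj + sj for bj, sj in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta


def mcfadden_r2(X, y, beta):
    """1 - lnL(unrestricted) / lnL(intercept only); 0 means no explanatory power."""
    ybar = sum(y) / len(y)
    ll0 = sum(math.log(ybar) if yi == 1 else math.log(1.0 - ybar) for yi in y)
    return 1.0 - log_likelihood(X, y, beta) / ll0


# Hypothetical toy data: intercept column plus one regressor (no perfect separation).
X_demo = [[1.0, float(x)] for x in range(6)]
y_demo = [0, 0, 1, 0, 1, 1]
beta_demo = fit_logit(X_demo, y_demo)
print(beta_demo, mcfadden_r2(X_demo, y_demo, beta_demo))
```

With an intercept included, the first-order condition forces the generalized residuals $y_i - \hat p_i$ to sum to zero, which is the mean-zero property mentioned in the results; an intercept-only fit returns a McFadden $R^2$ of exactly zero.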
5 Results
5.1 First logit model
We now turn to the first of the two specified models, which addresses the survey question on whether a certificate is required in order to run operations. We estimate the model by MLE; the results follow.
The number of years of work experience of the firm's top manager (b7), the percentage of employees with a university degree (ecaq69), the number of competitors in the main product market (e2) and the number of inspections in the last fiscal year (ecaw1) are all significant, both in the marginal effects at the mean (ME) and in the average marginal effects (AMEs).
Notably, for a firm with mean characteristics, one additional competitor reduces the probability of requiring a certificate for operations by about 2.1 percentage points (ME slope = -0.0213292), ceteris paribus; averaged over the sample, one additional competitor in the main product market reduces this probability by about 2.07 percentage points (AME slope = -0.0206976), ceteris paribus. The odds ratio indicates that the odds that a certificate is required are 1.025 times higher for each additional inspection in the previous fiscal year (ecaw1). An omitted regressor test for the variable j7a (percentage of total annual sales paid as informal payments) showed that the omitted variable was not significant.
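The link between the reported odds ratios and the marginal effects at the mean can be checked by hand. The worked arithmetic below assumes that the reported odds ratios equal $\exp(\hat\beta_j)$ and that the ME at mean takes the usual logit form $\Lambda(\bar x'\hat\beta)[1-\Lambda(\bar x'\hat\beta)]\hat\beta_j$; the figures for ecaw1 (odds ratio 1.025343, ME 0.006245) and e2 (odds ratio 0.9180745) are those reported for the first model.

$$\hat\beta_{\text{ecaw1}} = \ln(1.025343) \approx 0.02503, \qquad (1.025343 - 1)\times 100 \approx 2.5\% \text{ higher odds per inspection}$$
$$\Lambda(\bar x'\hat\beta)\,[1-\Lambda(\bar x'\hat\beta)] \approx \frac{0.006245}{0.02503} \approx 0.2495$$
$$\hat\beta_{\text{e2}} = \ln(0.9180745) \approx -0.08548, \qquad 0.2495 \times (-0.08548) \approx -0.0213$$

The implied scale factor is close to its maximum of 0.25 (attained when the fitted probability is 0.5), and applying it to the e2 coefficient reproduces a marginal effect of about -0.0213, consistent with the 2.1 percentage point reduction reported above.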
In evaluating the model, we find that it correctly predicts the outcome 60.14% of the time, as shown in Figure 1 of the appendix. A test for heteroscedasticity is conducted using the generalized residuals (which have mean zero). The resulting p-value is 0 (to reported precision), so the null hypothesis of homoscedastic error terms is rejected, which leads us to infer that the MLE is inconsistent in this nonlinear model.
5.2 Second logit model⁵

The second specified model was estimated using MLE. It yields negative coefficients on b7 and f1 (percentage capacity utilization of the firm, calculated as output/maximum output), and positive coefficients on all other regressors. Turning to the ME and AMEs of this model, we can conclude the following:

⁵ Since data on this binary variable are available only when ecap1 takes the value 1, we carry out the logistic regression conditional on ecap1 == 1. A model of endogenous sample selection could play a role if the data had not been corrected in this way.
The independent variables produce only very small changes in the probability of the response variable (whether an informal payment or gift was requested in connection with obtaining the compulsory certificate). For an average firm, one additional inspection in the last fiscal year (ecaw1) significantly increases the response probability, by about 0.2 percentage points (ME = 0.0019926). The two variables found to be insignificant are b3 (percentage of the firm owned by the largest owner) and ecaw2 (total cost of inspections, including official and unofficial payments). The same holds for the AMEs. The odds ratio for ecaw1 indicates that each additional inspection in the past year multiplies the odds of being asked for a bribe in connection with obtaining a compulsory certificate by 1.022. This is the largest odds ratio among the independent variables.
The model predicts negative outcomes of the binary response better than positive ones, although overall 89.16% of observations are correctly classified. The 2 × 2 classification table is shown in Figure 2 of the appendix.
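A 2 × 2 classification table like Figure 2, together with the share of each observed outcome that is correctly classified, can be computed as below. This is a sketch with a 0.5 cut-off; the toy outcomes and fitted probabilities are made up to illustrate how an unbalanced sample can give a high overall rate of correct classification while positive outcomes are predicted poorly.

```python
def classification_table(y, p_hat, cutoff=0.5):
    """Count (observed, predicted) pairs; predicted = 1 when p_hat >= cutoff."""
    table = {(obs, pred): 0 for obs in (0, 1) for pred in (0, 1)}
    for yi, pi in zip(y, p_hat):
        table[(yi, 1 if pi >= cutoff else 0)] += 1
    return table


def percent_correct(table):
    """Share (in %) of observations on the diagonal of the 2 x 2 table."""
    n = sum(table.values())
    return 100.0 * (table[(0, 0)] + table[(1, 1)]) / n


def sensitivity(table):
    """Share of observed y = 1 that are predicted as 1."""
    positives = table[(1, 0)] + table[(1, 1)]
    return table[(1, 1)] / positives if positives else float("nan")


def specificity(table):
    """Share of observed y = 0 that are predicted as 0."""
    negatives = table[(0, 0)] + table[(0, 1)]
    return table[(0, 0)] / negatives if negatives else float("nan")


# Made-up, unbalanced example: 8 negatives, 2 positives.
y_demo = [0] * 8 + [1] * 2
p_demo = [0.1] * 8 + [0.3, 0.6]
tab = classification_table(y_demo, p_demo)
print(percent_correct(tab), sensitivity(tab), specificity(tab))  # 90.0 0.5 1.0
```

Reporting sensitivity and specificity alongside the overall percentage makes the asymmetry between negative and positive outcomes explicit, which the overall rate alone hides in unbalanced samples (the issue Cramer (1999) addresses).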
Finally, we can say that concerns about the business environment and performance of firms in a given country being affected by the level of corruption are not completely unfounded. The analysis shows that some characteristics of a firm significantly affect the incidence of bribery and corruption. A summary of all results is presented below:
| Variable | ecap1: ME | ecap1: AME | ecap1: Odds ratio | ecap7: ME | ecap7: AME | ecap7: Odds ratio |
|---|---|---|---|---|---|---|
| b7 | -0.004190** | -0.004068** | 0.9833452** | -0.0023739** | -0.0024931** | 0.9736162** |
| ecaw1 | 0.006245** | 0.0060601** | 1.025343** | 0.0019926** | 0.0020927** | 1.022697** |
| ecaq69 | 0.0019098** | 0.0018532** | 1.007683** | 0.0011529** | 0.0012108** | 1.01307** |
| e2 | -0.0213292* | -0.0206976* | 0.9180745* | - | - | - |
| b3 | - | - | - | 0.0004625 | 0.0004857 | 1.005223 |
| ecaw2 | - | - | - | 0 | 0 | 1 |
| f1 | - | - | - | -0.0006369* | -0.0006689* | 0.9928521* |

ecap1: whether a compulsory certificate was required; ecap7: whether a bribe was requested in connection with the certificate. ME = marginal effect at the mean; AME = average marginal effect. "-" indicates that the variable is not included in the model.
** Significant at 1%
* Significant at 5%
6 Improving the Analysis
There are several ways in which the analysis could have been improved, given more time and resources. Firstly, the heteroscedasticity found in the model's errors could be addressed in several ways: as mentioned earlier, by using two-stage least squares, or by finding a more flexible specification of the model. Another approach is to model the heteroscedasticity explicitly, as in the heteroscedastic probit model developed by Alvarez and Brehm (1995) and estimated by maximum likelihood.
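For reference, a heteroscedastic probit of this kind can be written as follows. This is a sketch in our own notation, assuming the error standard deviation depends on a vector of variables $z_i$ (excluding a constant, for identification) through an exponential link; which variables would enter $z_i$ in the BEEPS setting is not specified here.

$$\Pr(y_i = 1 \mid x_i, z_i) = \Phi\!\left(\frac{x_i'\beta}{\exp(z_i'\delta)}\right)$$
$$\ln L(\beta,\delta) = \sum_{i=1}^{n}\left[y_i \ln \Phi\!\left(\frac{x_i'\beta}{\exp(z_i'\delta)}\right) + (1-y_i)\ln\!\left(1-\Phi\!\left(\frac{x_i'\beta}{\exp(z_i'\delta)}\right)\right)\right]$$

Setting $\delta = 0$ gives the standard homoscedastic probit, so a likelihood-ratio or LM test of $\delta = 0$ is a test of homoscedasticity within this model.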
The data used are primarily cross-sectional, but since the survey has been conducted over several years, panel data regression using Generalized Least Squares (GLS) could also be employed to estimate the model. The original BEEPS data analysis involves complex weighting structures, and since stratified random sampling was used to collect the data, accounting for these factors could also change the results.
Another way to extend the analysis is to split the dataset by country and specify separate models for each country. This might help in understanding differences between countries in the same dataset. Given ample time and resources, one could even collect fresh primary data along the lines of the survey, run the same analysis on it, and reconcile the outcomes of the two.
7 Concluding Remarks
We have analyzed a very specific aspect of bribe-giving and corruption using data collected in the 2009 edition of BEEPS. In the first model, the requirement of a compulsory certificate for a firm's operations was shown to be affected by the percentage of employees holding a university degree and by the total number of inspections in the past year. Our findings also show a significant effect of the total number of inspections in a given year (and of the years of work experience of the top manager) on the probability that firms reported being asked for bribes. The data used for estimating both models displayed heteroscedasticity in the error terms, which makes the MLE results suspect. The paper also briefly mentions various ways of remedying this. Finally, we suggest extending the analysis to specific countries and employing panel data methods to study the BEEPS data.
References
Alvarez, R. M., and J. Brehm (1995): "American Ambivalence Towards Abortion Policy: Development of a Heteroskedastic Probit Model of Competing Values," American Journal of Political Science, 39(4), pp. 1055-1082.
Bardhan, P. (1997): "Corruption and Development: A Review of Issues," Journal of Economic Literature, 35(3), 1320-1346.
Ben-Akiva, M. E., and S. R. Lerman (1985): Discrete Choice Analysis: Theory and Application to Travel Demand. MIT Press, Cambridge, Mass.
Cameron, A. C., and P. K. Trivedi (2005): Microeconometrics: Methods and Applications. Cambridge University Press.
Cramer, J. S. (1999): "Predictive Performance of the Binary Logit Model in Unbalanced Samples," Journal of the Royal Statistical Society. Series D (The Statistician), 48(1), pp. 85-94.
Dzhumashev, R., A. Islam, and Z. H. Khan (2010): "Non-collusive Corruption: Theory and Evidence from Education Sector in Bangladesh," Monash Economics Working Papers 38-10, Monash University, Department of Economics.
EBRD-World Bank (2010): "The Business Environment and Enterprise Performance Survey 2008-09: A Report on Methodology and Observations."
Evans, W. N., and R. M. Schwab (1995): "Finishing High School and Starting College: Do Catholic Schools Make a Difference?," The Quarterly Journal of Economics, 110(4), 941-974.
Greene, W. (2003): Econometric Analysis. Prentice Hall, Upper Saddle River, NJ.
Hellman, J. S., G. Jones, D. Kaufmann, and M. Schankerman (2000): "Measuring Governance, Corruption, and State Capture: How Firms and Bureaucrats Shape the Business Environment in Transition Economies," Policy Research Working Paper Series 2312, The World Bank.
Hellman, J. S., G. Jones, M. Schankerman, and D. Kaufmann (1999): "Measuring Governance, Corruption, and State Capture: How Firms and Bureaucrats Shape the Business Environment in Transit," Research Working Papers, 1(1), 1-45.
Horowitz, J. L., and N. E. Savin (2001): "Binary Response Models: Logits, Probits and Semiparametrics," Journal of Economic Perspectives, 15(4), 43-56.
Kaufmann, D., and S.-J. Wei (1999): "Does 'Grease Money' Speed Up the Wheels of Commerce?," Policy Research Working Paper Series 2254, The World Bank.
Kay, R., and S. Little (1987): "Transformations of the Explanatory Variables in the Logistic Regression Model for Binary Data," Biometrika, 74(3), 495-501.
Silva, J. M. C. S. (2001): "A Score Test for Non-nested Hypotheses with Applications to Discrete Data Models," Journal of Applied Econometrics, 16(5), 577-597.
Svensson, J. (2003): "Who Must Pay Bribes and How Much? Evidence from a Cross Section of Firms," The Quarterly Journal of Economics, 118(1), 207-230.
Svensson, J. (2005): "Eight Questions about Corruption," The Journal of Economic Perspectives, 19(3), 19-42.
Transparency International (2004): "Corruption Perceptions Index 2004," Discussion paper, Transparency International.
Figure 1: 2 × 2 classification table for the first logit model
Figure 2: 2 × 2 classification table for the second logit model
