Measuring The Tax Gap: B. Erard & Associates
Brian Erard
B. Erard & Associates
BEandAssoc@Aol.com
Outline of Presentation
Underground/black/hidden/unobserved
economy
– Broadest concept: Subset of all economic
activity (from both legal and illegal
sources/market and non-market) that goes
unrecorded in official statistics
– Typical concept: Difference between total
market-based income (legal and illegal) and
recorded GDP
How does UE differ from tax gap?
Not all unrecorded income is taxable (due to filing thresholds,
exemptions, and certain deductions)
Some taxable income sources are not counted in UE measures (such as
capital gains and various transfers)
A sizeable portion of the tax gap is attributable to aggressive use of tax
credits, depreciation rules, transfer pricing, and other provisions rather
than direct underreporting of income
The tax gap includes taxes on income that have been reported but not
paid
Recorded GDP actually accounts for some sources of unreported
income
Conceptually, the UE includes income from illegal activities (drugs,
gambling, prostitution) that is typically excluded from tax gap
measurement
The UE is even harder to measure!
Scope of tax gap measurement
Ideally, a broad monetary measure encompassing all taxes
and all forms of non-compliance
As a practical matter, it may be too costly or difficult to
develop a reasonably accurate broad measure
– A large scale random audit programme may exhaust a
large share of a tax administration’s compliance
resources
Alternatives for a narrower scope include:
– Focus on certain key taxes
– Focus on compliance rates rather than compliance levels
– Focus on indicators of non-compliance rather than direct
measures
US tax gap map, TY 2006
Role of third-party reporting and
withholding in U.S.
HMRC tax gap 2009-10 and 2010-11
Table 1.1: Tax gaps for HMRC-administered taxes, 2009-10 (revised) and 2010-11 (£ billion)
Bracketed numbers are footnote markers from the original HMRC table.

                                                     £ billion [1,2,4]       % tax gap [3]
Tax component                                        2009-10   2010-11   2009-10   2010-11
                                                   (revised)           (revised)

Indirect taxes [5]
  Value Added Tax (VAT)                                  8.6       9.6     10.8%     10.1%
  Spirits duty                                           0.1       0.2        4%        5%
  Beer duty                                              0.4       0.4        9%       10%
  Cigarette duty                                         1.2       1.0       11%        9%
  Hand rolled tobacco duty                               0.5       0.5       42%       38%
  Great Britain diesel duty                              0.5       0.1        3%        1%
  Great Britain petrol duty [6]                          N/A       N/A       N/A       N/A
  Northern Ireland diesel duty [7]                       0.1       0.1       12%       25%
  Northern Ireland petrol duty [6,7]                     N/A       0.0       N/A       13%
  Other indirect taxes [8]                               1.0       1.0        6%        5%
  Total indirect taxes                                  12.3      12.9      9.0%      8.4%

Direct taxes
Income Tax, National Insurance Contributions, Capital Gains Tax
  Individuals in self assessment                         4.6       4.4
    Business taxpayers                                   4.2       4.0
    Non-business taxpayers                               0.4       0.4
  Large partnerships in self assessment [9]              0.7       0.8
  Small and medium employers (PAYE) [10]                 0.9       0.8
  Large employers (PAYE)                                 2.0       2.1
  Avoidance                                              1.9       2.1
  Non-declaration of income and capital gains
    by individuals not in self assessment                0.9       1.0
  Ghosts [11]                                            1.3       1.3
  Moonlighters [12]                                      1.8       1.9
  Total                                                 14.1      14.4      5.6%      5.5%
Corporation Tax
  Businesses managed by the Large Business Service       1.1       1.4
    Avoidance                                            0.9       1.1
    Technical issues                                     0.3       0.3
  Large and complex businesses                           1.3       1.2
  Small and medium businesses                            1.4       1.4
  Total                                                  3.8       4.1      9.6%      8.8%
Other direct taxes
  Inheritance tax                                        0.2       0.2
  Stamp duties [13]                                      0.5       0.6
    Stamp duty land tax                                  0.2       0.3
    Shares stamp duty                                    0.3       0.3
  Petroleum revenue tax                                 0.02      0.03
  Total                                                  0.8       0.8      6.5%      4.6%
Total direct taxes                                      18.7      19.3      6.2%      5.9%

Total tax gap                                             31        32      7.1%      6.7%
Denmark personal income taxes TY2006
Audit Data
– Random
– Operational
– Combined operational and random
Measures based on comparisons of surveys and
administrative data
Other creative approaches
Designing random audit studies
Scope
Scale
Sampling strategy
Data collection
Scope
\[ \hat{p} \pm z_{\alpha/2} \sqrt{\frac{p(1-p)}{n}} \]
The term $z_{\alpha/2}\sqrt{p(1-p)/n}$ is known as the margin of error (m)
\[ n = \frac{1.96^2\,(.25)}{.03^2} \approx 1{,}067 \]
Some notes
point estimate)
If we are confident that the true rate p is far from
½, we can use a smaller sample
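The sample-size arithmetic for a compliance rate can be checked with a few lines of Python. This is a minimal sketch, assuming a 95% confidence level (z = 1.96) and a 3-percentage-point margin of error as in the slide; the function name and the 10% rate in the second call are illustrative assumptions, not values from the presentation.

```python
def sample_size_for_proportion(margin, p=0.5, z=1.96):
    """Sample size n at which z * sqrt(p(1-p)/n) equals the requested margin."""
    return z ** 2 * p * (1 - p) / margin ** 2

# Worst case p = 1/2 (the slide's calculation).
n_worst_case = sample_size_for_proportion(0.03)

# Hypothetical case where the true rate is believed to be near 10%.
n_low_rate = sample_size_for_proportion(0.03, p=0.10)

print(round(n_worst_case), round(n_low_rate))
```

Because p(1 - p) is largest at p = 1/2, any rate further from 1/2 gives a smaller required sample for the same margin.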
Estimating the magnitude of non-
compliance
Example: Kleven et al. (2011)
– As part of this study, a random sample of Danish
taxpayers was selected for fairly comprehensive audits
of their personal tax returns
– The study was used for various purposes, including
developing an estimate of overall tax underreporting
Summation notation
Population
Observation X
1 2
2 8
3 5
4 6
5 1
Total 22
\[ \sum_{i=1}^{N} X_i = X_1 + X_2 + X_3 + \dots + X_N \]
\[ \sum_{i=1}^{5} X_i = 2 + 8 + 5 + 6 + 1 = 22 \]
Point estimation
$X_1, X_2, X_3, \dots, X_N$ represent the overall magnitudes of tax underreporting on
the N returns in the population
$\mu = \frac{\sum_{i=1}^{N} X_i}{N}$ represents the mean level of tax underreporting in the population
$\tau = \sum_{i=1}^{N} X_i = N\mu$ represents the aggregate level of tax underreporting in the
population
Our respective point estimates of the mean and aggregate levels of tax
underreporting in the population are:
\[ \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} \]
\[ t = \frac{N}{n} \sum_{i=1}^{n} x_i = N\bar{x} \]
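As a small illustration of the two point estimates, the sketch below computes the sample mean and scales it up to a population total. The function name and the three audited values are hypothetical; the population size of 5 simply mirrors the summation example.

```python
def estimate_mean_and_total(sample, population_size):
    """Return (x_bar, t): the sample mean and the scaled-up aggregate N * x_bar."""
    n = len(sample)
    x_bar = sum(sample) / n
    t = population_size * x_bar
    return x_bar, t

# Hypothetical random sample of n = 3 returns drawn from a population of N = 5.
x_bar, t = estimate_mean_and_total([2, 8, 5], population_size=5)
print(x_bar, t)
```

The estimate t is based on the sample alone, so it will generally differ from the true population total.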
Interval estimation
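The population standard deviation of tax underreporting is defined as:
\[ \sigma = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}} \]
The interval estimates for the mean and aggregate levels of tax
underreporting are, respectively:
\[ \bar{x} \pm z_{\alpha/2} \frac{\sigma}{\sqrt{n}} \]
\[ t \pm z_{\alpha/2}\, N \frac{\sigma}{\sqrt{n}} \]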
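The half-widths of the two intervals can be computed directly. This is a sketch under stated assumptions: σ is treated as known, z = 1.96 corresponds to 95% confidence, and the function name and input values are illustrative only.

```python
import math

def margins_of_error(sigma, n, population_size, z=1.96):
    """Half-widths of the interval estimates for the mean and for the aggregate."""
    m_mean = z * sigma / math.sqrt(n)
    m_total = population_size * m_mean  # the aggregate interval is N times wider
    return m_mean, m_total

# Illustrative inputs: sigma = 2,000, n = 6,147 audited returns, N = 1,000,000.
m_mean, m_total = margins_of_error(sigma=2000, n=6147, population_size=1_000_000)
print(m_mean, m_total)
```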
How large should the sample be?
Suppose we want our margin of error for the mean level of tax
underreporting to be £50, and we believe that $\sigma$ is roughly 2,000. Since
$m = z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$, we compute:
\[ n = \frac{1.96^2 \sigma^2}{m^2} = \frac{1.96^2 \cdot 2{,}000^2}{50^2} \approx 6{,}147 \]
Similarly, suppose that there are 1 million taxpayers and we want our
margin of error for the aggregate level of tax underreporting to be £50
million. Since $m = z_{\alpha/2}\, N \frac{\sigma}{\sqrt{n}}$, we compute:
\[ n = \left(\frac{1.96\, N \sigma}{m}\right)^2 = \left(\frac{1.96 \cdot 1{,}000{,}000 \cdot 2{,}000}{50{,}000{,}000}\right)^2 \approx 6{,}147 \]
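Both sample-size calculations can be reproduced in a few lines. The sketch below assumes z = 1.96 and a known σ; the function names are illustrative.

```python
def sample_size_for_mean(margin, sigma, z=1.96):
    """Solve m = z * sigma / sqrt(n) for n."""
    return (z * sigma / margin) ** 2

def sample_size_for_total(margin, sigma, population_size, z=1.96):
    """Solve m = z * N * sigma / sqrt(n) for n."""
    return (z * population_size * sigma / margin) ** 2

n_mean = sample_size_for_mean(margin=50, sigma=2000)
n_total = sample_size_for_total(margin=50_000_000, sigma=2000,
                                population_size=1_000_000)
print(round(n_mean), round(n_total))
```

The two answers coincide because a £50 million margin on the aggregate for 1 million taxpayers is the same requirement as a £50 margin on the mean.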
Stratified random sampling
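The population of N returns is divided into H strata, with $N_h$ returns in
stratum h, so that $N = \sum_{h=1}^{H} N_h$. The population mean is a weighted
average of the stratum means $\mu_h$:
\[ \mu = \frac{\sum_{h=1}^{H} \sum_{i=1}^{N_h} X_{hi}}{N} = \sum_{h=1}^{H} \frac{N_h}{N} \mu_h \]
A simple random sample of size $n_h$ is drawn from each stratum, and the
sample mean for the hth stratum is
\[ \bar{x}_h = \frac{\sum_{i=1}^{n_h} x_{hi}}{n_h} \]
This serves as an estimate of the population stratum mean $\mu_h$. The estimate
of the overall population mean $\mu$ is computed as:
\[ \bar{x} = \sum_{h=1}^{H} \frac{N_h}{N} \bar{x}_h \]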
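The stratified estimator weights each stratum's sample mean by that stratum's share of the population. A minimal sketch, with a hypothetical function name and made-up strata:

```python
def stratified_mean(strata):
    """Estimate the population mean from (stratum_size, sample_values) pairs.

    stratum_size is N_h, the number of returns in the stratum's population;
    sample_values are the audited amounts x_hi drawn from that stratum.
    """
    population_size = sum(stratum_size for stratum_size, _ in strata)
    estimate = 0.0
    for stratum_size, sample_values in strata:
        stratum_mean = sum(sample_values) / len(sample_values)
        estimate += (stratum_size / population_size) * stratum_mean
    return estimate

# Hypothetical example: 600 low-risk and 400 high-risk returns.
example = [(600, [1, 2, 3]), (400, [10, 20])]
print(stratified_mean(example))
```

Here the stratum means are 2 and 15, so the estimate is 0.6 × 2 + 0.4 × 15 = 7.2, whereas the unweighted mean of the five sampled values (7.2 only by coincidence in other designs) would ignore how heavily each stratum was sampled.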
Sample weights
p.d.f. of A
\[ f(A) = \begin{cases} f_N(A) & A > 0 \\ \Pr(\varepsilon_N < -\beta_N' x_N) & \text{otherwise.} \end{cases} \]
DCE tobit model: pdf of 𝐴 = 𝑁 ∗ 𝐷
When A = 0, just like DCE probit case:
\[ \Pr(A = 0) = 1 - \Pr(\varepsilon_N > -\beta_N' x_N,\ \varepsilon_D > -\beta_D' x_D) \]
When A > 0, the p.d.f. is the sum of expressions for 2
separate kinds of detection outcomes:
1. Perfect detection: $f_N(A)\, \Pr(\varepsilon_D > 1 - \beta_D' x_D)$
2. Partial detection: account for all D rates from 0 to 1:
\[ \int_0^1 \frac{1}{D}\, f_N\!\left(\frac{A}{D}\right) f_D(D)\, dD \]
The 1/D term in the integral is the Jacobian of the
transformation from N to A
DCE tobit likelihood function for
independent normal disturbances
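\[ N^* = \beta_N' X_N + \varepsilon_N, \qquad N = \begin{cases} N^* & N^* > 0 \\ 0 & N^* \le 0 \end{cases} \]
\[ D^* = \beta_D' X_D + \varepsilon_D, \qquad D = \begin{cases} 1 & D^* \ge 1 \\ D^* & 0 < D^* < 1 \\ 0 & D^* \le 0 \end{cases} \]
\[ A = N \cdot D \]
A = 0:
\[ L = 1 - \Phi\!\left(\frac{\beta_N' X_N}{\sigma_N}\right) \Phi\!\left(\frac{\beta_D' X_D}{\sigma_D}\right) \]
A > 0:
\[ L = \frac{1}{\sigma_N}\, \phi\!\left(\frac{A - \beta_N' X_N}{\sigma_N}\right) \left[1 - \Phi\!\left(\frac{1 - \beta_D' X_D}{\sigma_D}\right)\right] + \int_0^1 \frac{1}{D \sigma_N}\, \phi\!\left(\frac{A/D - \beta_N' X_N}{\sigma_N}\right) \frac{1}{\sigma_D}\, \phi\!\left(\frac{D - \beta_D' X_D}{\sigma_D}\right) dD \]
In the first term N = A, since detection is perfect, and the bracketed factor is $\Pr(\varepsilon_D > 1 - \beta_D' X_D)$.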
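To make the two-part likelihood concrete, the sketch below evaluates the contribution of a single return using only the Python standard library. It assumes independent normal disturbances, takes the two linear indices as given numbers (`index_n` for β_N'X_N and `index_d` for β_D'X_D), and writes the perfect-detection probability as 1 − Φ((1 − β_D'X_D)/σ_D). The function names and the midpoint-rule integration are illustrative choices, not the author's implementation.

```python
import math

def norm_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def dce_tobit_likelihood(a, index_n, index_d, sigma_n, sigma_d, steps=2000):
    """Likelihood contribution of one return with detected amount a >= 0."""
    if a <= 0:
        # No detected non-compliance: either none exists or none was detected.
        return 1.0 - norm_cdf(index_n / sigma_n) * norm_cdf(index_d / sigma_d)
    # Perfect detection (D = 1): density of N at a times Pr(D* >= 1).
    perfect = (norm_pdf((a - index_n) / sigma_n) / sigma_n
               * (1.0 - norm_cdf((1.0 - index_d) / sigma_d)))
    # Partial detection: integrate over detection rates D in (0, 1)
    # with a simple midpoint rule.
    width = 1.0 / steps
    partial = 0.0
    for k in range(steps):
        d = (k + 0.5) * width
        f_n = norm_pdf((a / d - index_n) / sigma_n) / sigma_n
        f_d = norm_pdf((d - index_d) / sigma_d) / sigma_d
        partial += f_n * f_d / d * width  # 1/d is the Jacobian term
    return perfect + partial

print(dce_tobit_likelihood(0.0, 0.0, 0.0, 1.0, 1.0))
```

When the detection index is very large, detection is almost always perfect and the expression collapses to the ordinary tobit density of N, which is a convenient sanity check. A fixed-step midpoint rule can be inaccurate for very small detected amounts, so a production version would likely use an adaptive quadrature routine.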
Extensions of approach
Model the probability and magnitude of non-compliance
using separate equations
Account for skewness in distribution of non-compliance
Account for role of third-party information reports
Employ separate models for each income source
Account for cases where an income source was not
examined during the audit
Separately model the case where an income source has not
been reported on the return
Pool data from multiple tax years
Incorporate results into a micro-simulation model
Developing detection controlled
estimates
The estimated parameters of the DCE model are
used to predict the actual level of non-compliance
(N) on each return conditional on the detected
level (A)
These estimates can be aggregated across returns
to estimate overall misreporting by income source
A tax calculator can be applied to estimates of
unreported income to compute the tax gap
One can also use the results to derive implicit
DCE multipliers
Confidence intervals
Our aggregate estimate of underreported income is
$S = \sum_{i=1}^{n} E(N_i \mid A_i)$
Approach 1: Delta method
– $\tilde{S} \pm z_{\alpha/2} \sqrt{V}$, where $V = G' \Sigma G$
– G is the estimated gradient vector $dS/d\beta$
– $\Sigma$ is the estimated covariance matrix of $\beta$
Approach 2: Simulation
– Draw M random samples of parameter vector from a distribution
with mean 𝛽 and covariance Σ
– For each draw, compute an estimate of S
– Sort the simulated values of S and use the α/2 and 1 − α/2
percentiles as the lower and upper bounds
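The simulation approach can be sketched as follows. The slide does not name the sampling distribution, so this sketch assumes a multivariate normal with mean β and covariance Σ. The callable `aggregate`, which maps a parameter vector to the aggregate estimate S, is a hypothetical stand-in for the full DCE prediction step, and the hand-written Cholesky routine only keeps the example free of third-party dependencies.

```python
import math
import random

def cholesky(matrix):
    """Lower-triangular factor of a symmetric positive-definite matrix."""
    size = len(matrix)
    lower = [[0.0] * size for _ in range(size)]
    for i in range(size):
        for j in range(i + 1):
            partial = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - partial)
            else:
                lower[i][j] = (matrix[i][j] - partial) / lower[j][j]
    return lower

def simulate_interval(beta_hat, cov, aggregate, draws=2000, alpha=0.05, seed=0):
    """Percentile interval for S from parameter draws ~ N(beta_hat, cov)."""
    rng = random.Random(seed)
    factor = cholesky(cov)
    size = len(beta_hat)
    simulated = []
    for _ in range(draws):
        shocks = [rng.gauss(0.0, 1.0) for _ in range(size)]
        beta = [beta_hat[i] + sum(factor[i][j] * shocks[j] for j in range(i + 1))
                for i in range(size)]
        simulated.append(aggregate(beta))
    simulated.sort()
    lower_index = int((alpha / 2) * draws)
    upper_index = min(draws - 1, int((1 - alpha / 2) * draws) - 1)
    return simulated[lower_index], simulated[upper_index]

# Toy example: S is just the sum of two parameters.
low, high = simulate_interval([1.0, 2.0], [[1.0, 0.0], [0.0, 1.0]],
                              aggregate=lambda beta: beta[0] + beta[1])
print(low, high)
```

In the toy example S has mean 3 and variance 2, so the interval should be close to 3 ± 1.96·√2.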
Implicit DCE multipliers for TY 2001
Estimates of net income misreporting
Notes on DCE methodology
Relies on variation across examiners in their
performance at uncovering unreported income
– The method essentially scales up performance of all
examiners to reflect what the best examiner would have
found
– Sometimes there are not sufficient examiners who have
each audited an income source on a reasonable number
of returns. One can do some pooling in such cases
To help ensure model identification, it is desirable
not to have too much overlap between the
explanatory variable sets XN and XD
Attempting to distinguish deliberate from
unintentional errors: a simple example
Extensions
Let
A = audit indicator (A=1 if audited, 0 otherwise)
N = Non-compliance if not audited
X = set of explanatory variables
1. N ⊥ A | X (conditional independence or
unconfoundedness)
2. 0 < Pr(A = 1 | X) < 1 (common support)
3. X is exogenous (not influenced by A)
Relationship between statistical
matching and random assignment
Random assignment
– Distribution of both measured and unmeasured
variables balanced across groups
– Common support condition always holds
Matching
– Only distribution of measured variables balanced across
groups
– Common support condition may fail for some values of
X
Relationship between statistical
matching and regression analysis
Matching is non-parametric
– No need to assume functional forms (linearity, additive
errors, normality, etc.)
Common support requirement avoids the
extrapolation problem in regression
– Identifying effects by projecting into regions where no
data points exist
Matching approaches
Exact match on X
– Useful if small number of qualitative variables in X
Inexact match on X
– A distance metric is used to find one or more audited
returns that have similar X values to each unaudited return
Propensity score matching
– Idea is to reduce dimensionality of the problem by
matching on a single index: P(X) = Pr(A = 1 | X)
– Rosenbaum and Rubin (1983) showed that if
N ⊥ A | X then N ⊥ A | P(X)
Propensity score estimation approach
Steps:
1. Estimate “propensity score” using probit or logit
2. Match audited and unaudited returns by propensity score
3. Impute a value N to each unaudited return based on the
observed value(s) of the matched audited return(s)
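Steps 2 and 3 can be sketched in a few lines once the propensity scores are in hand. The example below assumes step 1 has already been carried out elsewhere (for instance with a probit or logit routine) and uses nearest-neighbour matching on the score, which is only one of several possible matching rules. The function name, the choice of k, and all data values are hypothetical.

```python
def impute_by_propensity(audited, unaudited_scores, k=1):
    """Impute non-compliance for unaudited returns from matched audited returns.

    audited: list of (propensity_score, observed_noncompliance) pairs.
    unaudited_scores: propensity scores of the unaudited returns.
    k: number of nearest audited returns to average over.
    """
    imputed = []
    for score in unaudited_scores:
        # Step 2: find the k audited returns closest in propensity score.
        nearest = sorted(audited, key=lambda pair: abs(pair[0] - score))[:k]
        # Step 3: impute the average observed value of the matches.
        imputed.append(sum(value for _, value in nearest) / len(nearest))
    return imputed

# Hypothetical audited returns: (score, detected non-compliance).
audited_returns = [(0.2, 100.0), (0.5, 300.0), (0.8, 900.0)]
print(impute_by_propensity(audited_returns, [0.25, 0.7]))
print(impute_by_propensity(audited_returns, [0.25], k=2))
```

Using more than one match (k > 1) trades some bias for lower variance in the imputed values; the sketch does not enforce a common-support check, which a real application would need.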
Propensity score matching issues
The bump in the filing rate in TY2007 coincides with the “Economic
Stimulus Payment” (worth $300 per family member), suggesting that this
one-time benefit encouraged many ghosts to file a return in that year.
Idea to evaluate the determinants of
filing compliance – “calibrated probit”
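\[ F^* = \beta_F' X_F + \varepsilon_F, \qquad F = \begin{cases} 1 & \text{Filer} \\ 0 & \text{Non-filer} \end{cases} \]
Tax return data: $F_i = 1$ for all observations $i = 1, \dots, N_1$
Survey data: $F_j$ is unknown for all observations $j = 1, \dots, n$
Survey weights: $\sum_{j=1}^{n} w_j = N$ (overall population of filers
and non-filers)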
[Figure: diagram with labels "Impact of the program" and "Control"]