
Measuring the Tax Gap

Brian Erard
B. Erard & Associates
BEandAssoc@Aol.com
Outline of Presentation

I. Tax Gap Overview


II. Measures based on random audits
i. Sample design and considerations
ii. Application of design-based measures
iii. Application of model-based measures
III. Measures based on operational audit data
IV. Measures based on comparisons of surveys
and administrative data
V. Other creative approaches
Conceptual Issues

 How is the tax gap defined?


 What are its components?
 Why attempt to measure it?
 How does it compare to the underground
economy?
 How broad a scope should the measure cover?
How is the tax gap defined?

 Gross – the difference between the tax that
taxpayers should pay and the tax they actually pay
on a timely basis

 Net – the difference between the gross tax gap and
taxes collected through enforcement and late
payments
What are its components?

 Non-filing – taxes owed but not reported and paid
on a timely basis by non-registrants/non-filers
(and late filers)
 Underreporting – taxes attributable to
underreporting of actual liabilities on timely filed
tax returns
 Underpayment – taxes that are reported but not
paid on a timely basis
– This component often can be accurately assessed from
administrative records
How is true tax liability defined?

 The liability that would be recommended based on
the interpretation of a fully informed tax official?
 The actual liability that is assessed following the
resolution of any disputed amounts between the
taxpayer and the tax agency?
 The liability that would be assessed if it were to be
assessed by an impartial court of law?
Why attempt to measure the tax gap?

 Collection of tax revenue is the primary
function of a tax administration
– Accountability: Helpful for evaluating the
degree to which the tax administration is
successful
– Disaggregation of the tax gap is helpful for
understanding the sources and potential
underlying causes of tax non-compliance
What is the underground economy (UE)?

 Underground/black/hidden/unobserved
economy
– Broadest concept: Subset of all economic
activity (from both legal and illegal
sources/market and non-market) that goes
unrecorded in official statistics
– Typical concept: Difference between total
market-based income (legal and illegal) and
recorded GDP
How does UE differ from tax gap?
 Not all unrecorded income is taxable (due to filing thresholds,
exemptions, and certain deductions)
 Some taxable income sources are not counted in UE measures (such as
capital gains and various transfers)
 A sizeable portion of the tax gap is attributable to aggressive use of tax
credits, depreciation rules, transfer pricing, and other provisions rather
than direct underreporting of income
 The tax gap includes taxes on income that have been reported but not
paid
 Recorded GDP actually accounts for some sources of unreported
income
 Conceptually, the UE includes income from illegal activities (drugs,
gambling, prostitution) that is typically excluded from tax gap
measurement
 The UE is even harder to measure!
Scope of tax gap measurement
 Ideally, a broad monetary measure encompassing all taxes
and all forms of non-compliance
 As a practical matter, it may be too costly or difficult to
develop a reasonably accurate broad measure
– A large scale random audit programme may exhaust a
large share of a tax administration’s compliance
resources
 Alternatives for a narrower scope include:
– Focus on certain key taxes
– Focus on compliance rates rather than compliance levels
– Focus on indicators of non-compliance rather than direct
measures
US tax gap map, TY 2006
Role of third-party reporting and
withholding in U.S.
HMRC tax gap 2009-10 and 2010-11
Table 1.1: Tax Gaps for HMRC administered taxes – 2009-10 (revised) and 2010-11 (£ billion)

                                                        Point estimates (£ billion)   Percentage tax gap
Tax / component                                         2009-10 (rev.)    2010-11     2009-10 (rev.)    2010-11

Indirect taxes
  Value Added Tax (VAT)                                        8.6           9.6          10.8%          10.1%
  Spirits duty                                                 0.1           0.2             4%             5%
  Beer duty                                                    0.4           0.4             9%            10%
  Cigarette duty                                               1.2           1.0            11%             9%
  Hand rolled tobacco duty                                     0.5           0.5            42%            38%
  Great Britain diesel duty                                    0.5           0.1             3%             1%
  Great Britain petrol duty                                    N/A           N/A            N/A            N/A
  Northern Ireland diesel duty                                 0.1           0.1            12%            25%
  Northern Ireland petrol duty                                 N/A           0.0            N/A            13%
  Other indirect taxes                                         1.0           1.0             6%             5%
  Total indirect taxes                                        12.3          12.9           9.0%           8.4%

Direct taxes
Income Tax, National Insurance Contributions, Capital Gains Tax
  Individuals in self assessment                               4.6           4.4
    Business taxpayers                                         4.2           4.0
    Non-business taxpayers                                     0.4           0.4
  Large partnerships in self assessment                        0.7           0.8
  Small and medium employers (PAYE)                            0.9           0.8
  Large employers (PAYE)                                       2.0           2.1
  Avoidance                                                    1.9           2.1
  Non-declaration of income and capital gains
    by individuals not in self assessment                      0.9           1.0
  Ghosts                                                       1.3           1.3
  Moonlighters                                                 1.8           1.9
  Total                                                       14.1          14.4           5.6%           5.5%
Corporation Tax
  Businesses managed by the Large Business Service             1.1           1.4
    Avoidance                                                  0.9           1.1
    Technical issues                                           0.3           0.3
  Large and complex businesses                                 1.3           1.2
  Small and medium businesses                                  1.4           1.4
  Total                                                        3.8           4.1           9.6%           8.8%
Other direct taxes
  Inheritance tax                                              0.2           0.2
  Stamp duties                                                 0.5           0.6
    Stamp duty land tax                                        0.2           0.3
    Shares stamp duty                                          0.3           0.3
  Petroleum revenue tax                                       0.02          0.03
  Total                                                        0.8           0.8           6.5%           4.6%
Total direct taxes                                            18.7          19.3           6.2%           5.9%

Total tax gap                                                   31            32           7.1%           6.7%
Denmark personal income taxes TY2006

                                        Mean Reported   Mean Underreported   Underreporting
                                        Amount (DKK)    Amount (DKK)         Percentage
Personal income                              209,681           2,343              1.1
Stock income                                   5,635             274              1.8
Self-employment                               10,398             838              7.5
Capital income                               -11,075             156
Deductions                                    -9,098             129
Net overall income                           206,038           3,744              1.8
Positive income subject to 3rd party
  reporting and withholding                  216,801             400              0.18
Positive income subject to 3rd party
  reporting, but not withholding               7,081             148              2.1
Total tax liability                           69,940           1,670              2.3
Sweden
Pre-filled returns in Sweden and
Denmark
Uses and misuses of tax gap
 Uses
– Reasonably good indicator of the order of
magnitude of tax non-compliance
– Helpful for identifying key sources of non-
compliance
– Underlying data can be useful for risk
assessment
 Misuses
– Short-term trend analysis
– Performance evaluation
Digression on “closing the tax gap”

 Public disclosure of tax gap estimates inevitably
leads to demands to “close the gap”
– Even under an optimal tax administration, it is
important to recognise that some gap will exist
– Nor is it optimal to audit until MR=MC
 Heisenberg uncertainty principle
– Attempts to measure the tax gap impact its size
– Attempts to reduce the tax gap impact the tax
base
How can we measure evasion?

 Audit Data
– Random
– Operational
– Combined operational and random
 Measures based on comparisons of surveys and
administrative data
 Other creative approaches
Designing random audit studies

 Scope
 Scale
 Sampling strategy
 Data collection
Scope

 May be interested in a particular tax or tax issue
– Individual income tax, Corporate income tax, VAT
– Specific credits, deductions, or income sources
 May be interested in a particular taxpayer segment
– Self-employed taxpayers, employers, high wealth
individuals
– For instance, one may want to investigate compliance
by small businesses with all taxes (income tax,
VAT/sales tax, employment taxes, etc.)
Scale

 The appropriate scale of the programme depends
on factors such as:
– What is being measured (e.g., rates or dollar amounts)
– Planned method of estimation: design-based or model-
based
– Desired precision for key estimates
– Other planned uses for the data (e.g., risk scoring)
Evolution of IRS random audit programs: Taxpayer
Compliance Measurement Program (TCMP)

 Line-by-line audits of a stratified random sample
of about 50,000 individual income tax returns
 Conducted approximately every 3 years from TY
1963 until TY 1988
– Also occasional studies of other taxes (employment,
small corporations, partnerships, individual non-filers)
 Primary uses were:
– Development of audit selection criteria
– Measurement of tax gap
– Research
Long dry spell
13 years later … TY 2001 National
Research Program (NRP)
– Stratified random sample of 45,000 individual returns for TY 2001
– Advertised as “kinder and gentler” than TCMP
• About 10% of returns accepted without examination or with
only a correspondence examination
• Not all line items examined
– Some routinely examined – e.g., self-employment returns
– Some examined only at discretion of “classifier” or
examiner
– Case building materials provided in advance
– For TY 2001, had a small “calibration sample” of returns audited
in a manner similar to old TCMP program
• Useful for evaluating non-compliance on line items that were
not routinely examined
NRP redesign

 Smaller annual studies of individual income tax
– Most recently for tax years 2006, 2007, 2008
– About 14,000 returns per year
 No longer a calibration sample
 Some recent studies of other taxes
– S-corporations (tax years 2003 and 2004, 5,000 returns)
– Employment tax (2008-2010, 6,000 returns)
Design challenges

 Mandatory vs. discretionary examination of line
items
 Intensity of probes for unreported income sources
 Examination of related entities
 Adjustments following disputes and appeals
 If detection controlled estimation is to be
employed, ensuring sufficient examiners who have
each done a reasonable number of audits of the
return items of interest
Some best practices for random audit
studies
 Non-sampling errors can plague a random audit study. The
following practices help to prevent such errors:
– Appropriate support and training of examiners and other staff –
buy-in by examiners is crucial
– Provide examiners with relevant case-building information
– Design procedures to distinguish reports on the wrong line item
from reports of an incorrect amount
– Have good procedures for recording, validating, and correcting
data
– Record details on which specific line items or issues have been
examined and which have not
– Provide adequate supervision
 It is also useful to consider what auxiliary information to
collect to aid research
Random sampling: design-based
estimation
 Design-based estimation is very common in
survey work. Under this approach:
– The variables of interest in the population are treated as
fixed but unknown numbers
– Estimates are computed based on a randomly drawn
sample from this population (typically, these estimates
are the sample analogues of the population
characteristics of interest)
– The properties of the estimates (such as their means and
variances) are derived using information only about the
selection probabilities for the observations in the
sample (i.e., the approach is non-parametric)
Estimating the rate of non-compliance

 Canada Processing Review Programme


– Approach is to contact a random sample of individual
taxpayers who have claimed certain credits or
deductions to request receipts to verify their claims
– The results are used to measure the rates of non-
compliance on these items and to develop targeting
criteria for future verification work
 Canada Core Audit Programme
– Approach is to randomly audit various SME segments
for selected tax issues to estimate rates of material non-
compliance and assess risks
Simple random sampling (SRS)

 One starts with a sample frame
– For this example, the frame is all tax returns in a given
year that claimed at least one specified credit or
deduction
 Under SRS, one randomly chooses returns from
the sample frame in such a way that every possible
sample of size n that can be drawn from the N
returns in the population has an equal chance of
selection
Point and interval estimation

Let p = unknown population proportion of returns with an improper claim
n = sample size
$n_1$ = number of sampled returns found to have an improper claim

Then $\hat{p} = n_1 / n$ is the point estimate of the rate of non-compliance

The following is a confidence interval for p:

$$\hat{p} \pm z_{\alpha/2} \sqrt{\frac{p(1-p)}{n}}$$

The term $z_{\alpha/2} \sqrt{p(1-p)/n}$ is known as the margin of error (m)

For a 95% confidence interval, $z_{\alpha/2} = 1.96$
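As a minimal sketch of this calculation, the snippet below computes the point estimate and a 95% interval. Two things here are assumptions rather than part of the slides: the audit counts (120 improper claims among 1,000 sampled returns) are made up, and the unknown p in the margin of error is replaced by its estimate, which is the usual practical shortcut.

```python
import math

def proportion_ci(n1, n, z=1.96):
    """Point estimate and normal-approximation interval for a proportion."""
    p_hat = n1 / n
    # Assumption: substitute p_hat for the unknown p in the margin of error.
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, p_hat - margin, p_hat + margin

# Hypothetical audit result: 120 of 1,000 sampled returns had an improper claim.
p_hat, lower, upper = proportion_ci(n1=120, n=1000)
print(f"rate = {p_hat:.3f}, 95% CI = ({lower:.3f}, {upper:.3f})")
```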


How large should the sample be?
Suppose we want to draw a random sample to estimate the rate of non-
compliance with a margin of error m=.03 (for a 95% level of confidence).
Since
$$m = 1.96 \sqrt{\frac{p(1-p)}{n}}$$
we can calculate n as:
$$n = \frac{1.96^2\, p(1-p)}{(.03)^2}$$

Of course, we don’t know p. The worst case scenario for precision is
p=1/2, in which case:

$$n = \frac{1.96^2 (.25)}{(.03)^2} \approx 1{,}067$$
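The same arithmetic can be checked in a few lines. This is only a sketch; the function name is illustrative, and the second call (p = 0.1) is an added example of how a rate far from 1/2 reduces the required sample.

```python
def sample_size_for_proportion(margin, p=0.5, z=1.96):
    """Sample size needed for a given margin of error on a proportion."""
    return z ** 2 * p * (1 - p) / margin ** 2

n_worst = sample_size_for_proportion(0.03)        # worst case, p = 1/2
n_low = sample_size_for_proportion(0.03, p=0.1)   # hypothetical: true rate near 10%
print(round(n_worst, 1), round(n_low, 1))
```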
Some notes

 If the population size N is relatively small, a
somewhat smaller sample will be required. (We
are ignoring the finite population correction (FPC)
factor $\sqrt{1 - n/N}$ for the standard error of our
point estimate)
 If we are confident that the true rate p is far from
½, we can use a smaller sample
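To illustrate the first note, here is a rough sketch of how the required sample shrinks once the correction is applied. It assumes the usual form of the correction, a factor of sqrt(1 - n/N) on the standard error; the population sizes are invented for the example.

```python
import math

def margin_with_fpc(n, N, p=0.5, z=1.96):
    """Margin of error for a proportion, including the FPC factor."""
    return z * math.sqrt(p * (1 - p) / n) * math.sqrt(1 - n / N)

def sample_size_with_fpc(margin, N, p=0.5, z=1.96):
    """Solve margin_with_fpc(n, N) = margin for n."""
    n0 = z ** 2 * p * (1 - p) / margin ** 2   # sample size ignoring the FPC
    return n0 / (1 + n0 / N)

n_small = sample_size_with_fpc(0.03, N=5_000)      # small frame: noticeably fewer audits
n_large = sample_size_with_fpc(0.03, N=5_000_000)  # large frame: essentially unchanged
print(round(n_small), round(n_large))
```

Solving for n gives n = n0 / (1 + n0/N), where n0 is the uncorrected sample size, so the correction only matters when n0 is a non-trivial fraction of N.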
Estimating the magnitude of non-
compliance
 Example: Kleven et al. (2011)
– As part of this study, a random sample of Danish
taxpayers were selected for rather comprehensive audits
of their personal tax returns
– The study was used for various purposes, including
developing an estimate of overall tax underreporting
Summation notation
Population

Observation   X
1             2
2             8
3             5
4             6
5             1
Total         22

N is population size (5 in this example)

$$\sum_{i=1}^{N} X_i = X_1 + X_2 + X_3 + \cdots + X_N$$

$$\sum_{i=1}^{5} X_i = 2 + 8 + 5 + 6 + 1 = 22$$
Point estimation
$X_1, X_2, X_3, \ldots, X_N$ represent the overall magnitudes of tax underreporting on
the N returns in the population

$x_1, x_2, x_3, \ldots, x_n$ represent the overall magnitudes of tax underreporting on
the n returns in a SRS from the population

$\mu = \frac{\sum_{i=1}^{N} X_i}{N}$ represents the mean level of tax underreporting in the population

$\tau = \sum_{i=1}^{N} X_i = N\mu$ represents the aggregate level of tax underreporting in the
population

Our respective point estimates of the mean and aggregate levels of tax
underreporting in the population are:
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$
$$t = \frac{N}{n} \sum_{i=1}^{n} x_i = N\bar{x}$$
Interval estimation
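The population standard deviation of tax underreporting is defined as:
$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}}$$

The interval estimates for the mean and aggregate levels of tax
underreporting are, respectively:

$$\bar{x} \pm z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$$
$$t \pm z_{\alpha/2}\, N \frac{\sigma}{\sqrt{n}}$$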
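A small sketch of these point and interval estimates follows. The audit adjustments and the population size are invented, and the sample standard deviation is used in place of the unknown population value, which is an assumption on top of the slide's formulas.

```python
import math
import statistics

def srs_intervals(sample, N, z=1.96):
    """Mean and aggregate estimates with margins of error from a simple random sample."""
    n = len(sample)
    x_bar = statistics.fmean(sample)
    s = statistics.stdev(sample)          # assumption: sample estimate stands in for sigma
    margin_mean = z * s / math.sqrt(n)
    total = N * x_bar                     # t = N * x_bar
    margin_total = N * margin_mean
    return x_bar, margin_mean, total, margin_total

# Hypothetical underreported tax found on 10 audited returns (mostly zero, a few large).
sample = [0, 0, 120, 0, 3400, 0, 50, 0, 0, 780]
x_bar, m_mean, t, m_total = srs_intervals(sample, N=1000)
print(f"mean {x_bar:.0f} +/- {m_mean:.0f}; total {t:.0f} +/- {m_total:.0f}")
```

With so few observations and such a skewed sample the margin is wide relative to the estimate, which is the practical difficulty with estimating magnitudes.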
How large should the sample be?
Suppose we want our margin of error for the mean level of tax
underreporting to be £50, and we believe that $\sigma$ is roughly 2,000. Since
$m = z_{\alpha/2}\, \sigma / \sqrt{n}$, we compute:
$$n = \frac{1.96^2 \sigma^2}{m^2} = \frac{1.96^2 \times 2{,}000^2}{50^2} \approx 6{,}147$$

Similarly, suppose that there are 1 million taxpayers and we want our
margin of error for the aggregate level of tax underreporting to be £50
million. Since $m = z_{\alpha/2}\, N \sigma / \sqrt{n}$, we compute:

$$n = \left(\frac{1.96\, N \sigma}{m}\right)^2 = \left(\frac{1.96 \times 1{,}000{,}000 \times 2{,}000}{50{,}000{,}000}\right)^2 \approx 6{,}147$$
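Both sample-size calculations can be reproduced with a short sketch (function names are illustrative). The two answers coincide because a £50 million margin on the total for 1 million taxpayers is the same requirement as a £50 margin on the mean.

```python
def sample_size_for_mean(margin, sigma, z=1.96):
    """Sample size for a target margin of error on the mean."""
    return (z * sigma / margin) ** 2

def sample_size_for_total(margin, sigma, N, z=1.96):
    """Sample size for a target margin of error on the aggregate."""
    return (z * N * sigma / margin) ** 2

n_mean = sample_size_for_mean(margin=50, sigma=2_000)
n_total = sample_size_for_total(margin=50_000_000, sigma=2_000, N=1_000_000)
print(round(n_mean), round(n_total))
```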
Stratified random sampling

 So far, we have considered SRS. However, often
it is preferable to use a stratified random sample.
 One should do so if:
– Reasonably precise estimates are desired for certain
subgroups of the population; or
– The mean value of the variable of interest is likely to
differ substantially across different subgroups
 For instance, separate sampling strata were
defined for employment status (self-employed or
not self-employed), return complexity, and region
in the Denmark study
Summation notation, continued
Population

Stratum      Observation   X
1            1             2
1            2             8
1            3             5
Subtotal 1                 15
2            1             6
2            2             1
Subtotal 2                 7
Total                      22

Size of stratum 1: $N_1 = 3$
Size of stratum 2: $N_2 = 2$

Total for stratum h: $\sum_{j=1}^{N_h} X_{hj} = X_{h1} + X_{h2} + \cdots + X_{hN_h}$

If h = 1, $\sum_{j=1}^{N_1} X_{1j} = X_{11} + X_{12} + X_{13} = 2 + 8 + 5 = 15$

If h = 2, $\sum_{j=1}^{N_2} X_{2j} = X_{21} + X_{22} = 6 + 1 = 7$

Overall total:

$$\sum_{h=1}^{H} \sum_{j=1}^{N_h} X_{hj} = X_{11} + X_{12} + \cdots + X_{1N_1} + X_{21} + \cdots + X_{HN_H}$$

$$\sum_{h=1}^{2} \sum_{j=1}^{N_h} X_{hj} = X_{11} + X_{12} + X_{13} + X_{21} + X_{22} = 2 + 8 + 5 + 6 + 1 = 22$$

$$= \sum_{j=1}^{N_1} X_{1j} + \sum_{j=1}^{N_2} X_{2j} = (X_{11} + X_{12} + X_{13}) + (X_{21} + X_{22}) = (15) + (7) = 22$$
Estimation with a stratified random sample
Under stratified random sampling, we divide the population into H distinct
strata. The population count within the h-th stratum is $N_h$ and the total
population count is $N = \sum_{h=1}^{H} N_h$

The population mean $\mu$ is defined as:

$$\mu = \frac{\sum_{h=1}^{H} \sum_{i=1}^{N_h} X_{hi}}{N} = \sum_{h=1}^{H} \frac{N_h}{N} \mu_h$$

A simple random sample of size $n_h$ is drawn from each stratum, and the
sample mean for the h-th stratum is
$$\bar{x}_h = \frac{\sum_{i=1}^{n_h} x_{hi}}{n_h}$$
This serves as an estimate of the population stratum mean $\mu_h$. The estimate
of the overall population mean $\mu$ is computed as:
$$\bar{x} = \sum_{h=1}^{H} \frac{N_h}{N} \bar{x}_h.$$
Sample weights

 To simplify computation of sample statistics, one
often constructs sample weights, which are
defined as the inverse of the sampling rate within
a stratum: $w_i = N_h / n_h$ for all taxpayers i in stratum h
 So, for instance, the estimate of the population
mean is computed as a weighted average over the
entire sample:
$$\bar{x} = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i} = \frac{1}{N} \sum_{h=1}^{H} N_h \bar{x}_h.$$
Stratified sampling strategies

 Proportional allocation: sample each stratum in
proportion to its size in the population: $\frac{n_h}{N_h} = \frac{n}{N}$
 Optimal allocation: choose stratum sample sizes
$n_h$ to maximise precision for a given overall
sample size n
– Suppose the cost of examining a return in stratum h is
$c_h$
– Then the optimal allocation sets $n_h \propto \frac{N_h \sigma_h}{\sqrt{c_h}}$
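The two allocation rules can be compared with a short sketch. It uses the standard textbook form of cost-adjusted optimal allocation, with n_h proportional to N_h times sigma_h divided by the square root of c_h; the stratum sizes, standard deviations, and costs are invented for illustration.

```python
import math

def proportional_allocation(n, sizes):
    """n_h proportional to stratum size N_h."""
    N = sum(sizes)
    return [n * N_h / N for N_h in sizes]

def optimal_allocation(n, sizes, sigmas, costs):
    """n_h proportional to N_h * sigma_h / sqrt(c_h)."""
    shares = [N_h * s / math.sqrt(c) for N_h, s, c in zip(sizes, sigmas, costs)]
    total = sum(shares)
    return [n * share / total for share in shares]

# Hypothetical: 9,000 wage earners (sigma 500) and 1,000 self-employed (sigma 5,000).
sizes, sigmas, costs = [9_000, 1_000], [500, 5_000], [1.0, 1.0]
prop = proportional_allocation(1_000, sizes)
opt = optimal_allocation(1_000, sizes, sigmas, costs)
print([round(x) for x in prop], [round(x) for x in opt])
```

With equal costs, the high-variance stratum receives roughly half the sample under optimal allocation, against a tenth under proportional allocation.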
Estimating rates vs. magnitudes

 Estimation of rates of non-compliance tends to
require a modest sized random sample (1,000
observations or less) for reasonable precision
 The distribution of the magnitude of tax non-
compliance tends to be highly skewed, resulting in
a large population standard deviation 𝜎
– As a consequence, rather large samples are typically
required for adequate precision in estimating
magnitudes
Model-based approaches with random
audit data
 Under a model-based approach, one specifies a
relationship between the variable of interest (non-
compliance) and its potential determinants
 The model generally imposes functional form and
distributional assumptions (parametric approach)
 The quality of the estimates depends not only on
the sample design but also the validity of the
modelling assumptions
Why use a model-based approach?

 To control for measurement errors, such as:


– The failure to fully detect non-compliance
– Conflation of deliberate and unintentional errors
 To improve one’s understanding of what drives
compliance behaviour and to predict future behaviour
 Potentially, to improve the precision of tax gap estimates
(if the underlying modelling assumptions are reasonably
valid)
Old IRS Approach
 Long ago, a study of randomly audited returns
from TY 1976 found, with the aid of third party
returns not available to the original examiners, that
for every dollar of underreporting discovered for
certain income items, another $2.28 went
undiscovered
 Based on this study, the IRS routinely applied a
“multiplier” of 3.28 to detected unreported sources
not subject to third-party reporting when
estimating the tax gap
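The arithmetic behind the multiplier, written out as a worked equation (the $100 figure is purely illustrative and does not come from the study):

```latex
\text{total underreporting}
  = \text{detected} + 2.28 \times \text{detected}
  = 3.28 \times \text{detected},
\qquad \text{e.g. } 3.28 \times \$100 = \$328 .
```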
What we see

 Random audit studies such as the TCMP and NRP
tell us about audit assessments on different
sources of income and offsets
 So, they give us an idea of how much additional
tax might be assessed if everyone received a fairly
intensive audit
 They also indicate what sorts of income and
deduction items are commonly associated with
compliance problems
What we don’t see

 The objective of tax evasion is to conceal one’s
actual tax liability…
 Not infrequently, this is done so well that
examiners are unable to uncover all of the
cheating that is present on a return
 So audit assessments allow us to observe most of
the unintentional errors and much of the deliberate
cheating that is fairly easy to identify
 However, they show us only a portion of the
deliberate cheating that is hard to uncover
How to measure what we can’t see

 Intuitively, some examiners are better at
uncovering noncompliance than others
– Some might be globally superior on all return issues;
others may have a comparative advantage on particular
issues
 If we knew the relative abilities of different
examiners on a given issue or line item, we could
“scale up” what was detected by a given examiner
to approximate what the best examiner would
have found if (s)he had done the audit
Detection Controlled Estimation (DCE)

 A statistical methodology to account for detection
errors on examinations and inspections
 Original methodology developed by Jonathan
Feinstein
– RAND Journal of Economics (1991), Journal of Law & Economics (1990)
 Improved and extended approach for use with
NRP data in collaboration with Jonathan
Visualizing the approach:
Suppose audit results show this:
Now suppose we are able to break down the results
for each of three examiners who were assigned
similar returns:
Detected vs. actual non-compliance
Main ingredients for DCE
Probit model with perfect detection
(A=N)
DCE probit model (𝐴 = 𝑁 ∗ 𝐷)
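Regular tobit model with perfect
detection (A=N)

$$N^* = \beta_N' x_N + \varepsilon_N, \qquad
N = \begin{cases} N^* & N^* > 0 \\ 0 & \text{otherwise.} \end{cases}$$

p.d.f. of A:
$$f(A) = \begin{cases} f_N(A) & A > 0 \\ \Pr(\varepsilon_N < -\beta_N' x_N) & \text{otherwise.} \end{cases}$$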
DCE tobit model: pdf of $A = N \cdot D$
When A = 0, just like DCE probit case:
$$\Pr(A=0) = 1 - \Pr(\varepsilon_N > -\beta_N' x_N,\ \varepsilon_D > -\beta_D' x_D)$$
When A>0, p.d.f. is sum of expressions for 2
separate kinds of detection outcomes:
1. Perfect detection: $f_N(A)\, \Pr(\varepsilon_D > 1 - \beta_D' x_D)$
2. Partial detection: account for all D rates 0 to 1:
$$\int_0^1 \frac{1}{D}\, f_N\!\left(\frac{A}{D}\right) f_D(D)\, dD$$
The 1/D term in the integral is the Jacobian of the
transformation from N to A
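DCE tobit likelihood function for
independent normal disturbances

$$N^* = \beta_N' X_N + \varepsilon_N, \qquad
N = \begin{cases} N^* & N^* > 0 \\ 0 & N^* \le 0 \end{cases}$$

$$D^* = \beta_D' X_D + \varepsilon_D, \qquad
D = \begin{cases} 1 & D^* \ge 1 \\ D^* & 0 < D^* < 1 \\ 0 & D^* \le 0 \end{cases}$$

$$A = N \cdot D$$

A=0:
$$L = 1 - \Phi\!\left(\frac{\beta_N' X_N}{\sigma_N}\right) \Phi\!\left(\frac{\beta_D' X_D}{\sigma_D}\right)$$

A>0:
$$L = \frac{1}{\sigma_N}\, \phi\!\left(\frac{A - \beta_N' X_N}{\sigma_N}\right) \Phi\!\left(\frac{\beta_D' X_D - 1}{\sigma_D}\right)
+ \int_0^1 \frac{1}{D \sigma_N}\, \phi\!\left(\frac{A/D - \beta_N' X_N}{\sigma_N}\right) \frac{1}{\sigma_D}\, \phi\!\left(\frac{D - \beta_D' X_D}{\sigma_D}\right) dD$$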
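The likelihood contribution of a single return can be sketched numerically as follows. This is an illustration under the slide's assumptions (independent normal disturbances), not the estimation code used for the NRP work. The names `mu_N` and `mu_D` are hypothetical stand-ins for the indices beta_N'X_N and beta_D'X_D, and the partial-detection integral is approximated with a simple midpoint rule.

```python
import math

def phi(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)

def Phi(z):
    """Standard normal distribution function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def dce_tobit_likelihood(A, mu_N, mu_D, sigma_N=1.0, sigma_D=1.0, steps=2000):
    """Likelihood of detected amount A for one return (independent normal errors)."""
    if A == 0:
        # Nothing detected: either no non-compliance (N* <= 0) or no detection (D* <= 0).
        return 1 - Phi(mu_N / sigma_N) * Phi(mu_D / sigma_D)
    # Perfect detection (D = 1): density of N at A times Pr(D* >= 1).
    perfect = phi((A - mu_N) / sigma_N) / sigma_N * Phi((mu_D - 1) / sigma_D)
    # Partial detection: integrate over detection rates D in (0, 1), midpoint rule.
    h = 1.0 / steps
    partial = 0.0
    for k in range(steps):
        D = (k + 0.5) * h
        partial += (phi((A / D - mu_N) / sigma_N) / (D * sigma_N)
                    * phi((D - mu_D) / sigma_D) / sigma_D) * h
    return perfect + partial

# Hypothetical index values for one return.
print(dce_tobit_likelihood(0, mu_N=1.0, mu_D=0.5))
print(dce_tobit_likelihood(0.8, mu_N=1.0, mu_D=0.5))
```

In an actual application the log of this quantity would be summed over returns and maximised over the coefficient vectors and the two standard deviations.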
Extensions of approach
 Model the probability and magnitude of non-compliance
using separate equations
 Account for skewness in distribution of non-compliance
 Account for role of third-party information reports
 Employ separate models for each income source
 Account for cases where an income source was not
examined during the audit
 Separately model the case where an income source has not
been reported on the return
 Pool data from multiple tax years
 Incorporate results into a micro-simulation model
Developing detection controlled
estimates
 The estimated parameters of the DCE model are
used to predict the actual level of non-compliance
(N) on each return conditional on the detected
level (A)
 These estimates can be aggregated across returns
to estimate overall misreporting by income source
 A tax calculator can be applied to estimates of
unreported income to compute the tax gap
 One can also use the results to derive implicit
DCE multipliers
Confidence intervals
 Our aggregate estimate of underreported income is
$S = \sum_{i=1}^{n} E(N_i \mid A_i)$
 Approach 1: Delta method
– $\tilde{S} \pm z_{\alpha/2} \sqrt{V}$, where $V = G' \Sigma G$
– G is estimated gradient vector $\frac{dS}{d\beta}$
– $\Sigma$ is estimated covariance matrix of $\beta$
 Approach 2: Simulation
– Draw M random samples of parameter vector from a distribution
with mean $\beta$ and covariance $\Sigma$
– For each draw, compute an estimate of S
– Sort the sample values of S and use the $\alpha/2$ and $1-\alpha/2$
percentiles as the lower and upper bounds
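The simulation approach can be sketched as below. To keep the example self-contained it assumes a single scalar parameter with a normal sampling distribution; with a full parameter vector one would draw from a multivariate normal with the estimated covariance matrix instead. The function `estimate_S` and the numbers (a multiplier-style parameter of 3.28 with standard error 0.2 applied to 1,000 of detected income) are hypothetical.

```python
import random

def simulated_interval(estimate_S, beta_hat, se_beta, M=5000, alpha=0.05, seed=0):
    """Percentile interval for S from M parameter draws (scalar-parameter sketch)."""
    rng = random.Random(seed)
    # Draw the parameter, recompute S for each draw, then sort the results.
    values = sorted(estimate_S(rng.gauss(beta_hat, se_beta)) for _ in range(M))
    lower = values[int(M * alpha / 2)]
    upper = values[int(M * (1 - alpha / 2)) - 1]
    return lower, upper

detected_total = 1000.0  # hypothetical detected underreporting
lower, upper = simulated_interval(lambda b: b * detected_total,
                                  beta_hat=3.28, se_beta=0.2)
print(round(lower), round(upper))
```

Because S is linear in the parameter here, the interval should be close to the delta-method answer of 3,280 plus or minus 1.96 x 200; the simulation approach earns its keep when S is a non-linear function of the parameters.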
Implicit DCE multipliers for TY 2001
Estimates of net income misreporting
Notes on DCE methodology
 Relies on variation across examiners in their
performance at uncovering unreported income
– The method essentially scales up performance of all
examiners to reflect what the best examiner would have
found
– Sometimes there are not sufficient examiners who have
each audited an income source on a reasonable number
of returns. One can do some pooling in such cases
 To help ensure model identification, it is desirable
not to have too much overlap between the
explanatory variable sets XN and XD
Attempting to distinguish deliberate from
unintentional errors: a simple example
Extensions

 Use separate equations to describe the probability
and magnitude of non-compliance
 Account for undetected non-compliance
Tax gap estimation with operational
audit data
 Operational audits are generally undertaken on
returns where substantial non-compliance is
deemed likely
 This creates a classic sample selection problem
– The audited returns are unlikely to be representative of
unaudited returns
Sample selection example
 During WWII, engineers routinely examined damage to
returning bombers

– They reasoned that those areas that were consistently
shot up would benefit from more reinforcement
 Their concern, of course, was improving the odds that an
aircraft would return successfully; yet the sample consisted
only of aircraft that did return
 Abraham Wald insightfully turned the “common wisdom"
on its head
How to control for sample selection bias

 Statistical models of sample selection


– In research for the IRS, I have used this approach to
estimate the estate tax gap and also to investigate
underreporting of self-employment income
 Statistical matching
Statistical models of sample selection

 The selected sample may differ from the general
population in terms of both observed and unobserved
characteristics
 Under this approach, one attempts to account for both the
observed and unobserved differences
 One does this by jointly modelling the determinants of the
outcome variable of interest and the sample selection
process
Heckman (1979) sample selection model
Correlated errors case
A two-part sample selection model of tax
non-compliance
Estimation issues

 It is very important to have at least one regressor in XA that
is excluded from X
 Results can be sensitive to distributional assumptions, so it
is important to evaluate whether the specification is
adequate
 Performance will tend to be better when most returns have
at least a small chance of being selected for audit
 The estimation results may be used to predict, for each
unaudited return, the magnitude of non-compliance that
would have been discovered if the return had been audited
Statistical matching

 Attempt to control for observed differences
between operational audit sample and unaudited
returns
 No attempt to control for unobserved differences
– Likely to work best with a detailed data base that
includes the most important factors impacting whether
a return is selected for an operational audit
 Often used to evaluate treatment effects
– But it can be used to impute, say, non-compliance from
operationally audited returns to unaudited returns
Key assumptions

 Let
A = audit indicator (A=1 if audited, 0 otherwise)
N = Non-compliance if not audited
X = set of explanatory variables
1. N⊥A|X (conditional independence or
unconfoundedness)
2. 0 < Pr(A = 1 | X) < 1 (common support)
3. X is exogenous (not influenced by A)
Relationship between statistical
matching and random assignment
 Random assignment
– Distribution of both measured and unmeasured
variables balanced across groups
– Common support condition always holds
 Matching
– Only distribution of measured variables balanced across
groups
– Common support condition may fail for some values of
X
Relationship between statistical
matching and regression analysis
 Matching is non-parametric
– No need to assume functional forms (linearity, additive
errors, normality, etc.)
 Common support requirement avoids the
extrapolation problem in regression
– Identifying effects by projecting into regions where no
data points exist
Matching approaches

 Exact match on X
– Useful if small number of qualitative variables in X
 Inexact match on X
– A distance metric is used to find one or more audited
returns that have similar X values to each unaudited return
 Propensity score matching
– Idea is to reduce dimensionality of the problem by
matching on a single index: P(X) = Pr(A = 1 | X)
– Rosenbaum and Rubin (1983) showed that if
N⊥A|X then N⊥A|P(X)
Propensity score estimation approach

Steps:
1. Estimate “propensity score” using probit or logit
2. Match audited and unaudited returns by propensity score
3. Impute a value N to each unaudited return based on the
observed value(s) of the matched audited return(s)
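Steps 2 and 3 can be sketched as follows, using nearest-neighbour matching with replacement. The sketch assumes step 1 has already been carried out, so each return arrives with an estimated propensity score; the scores and audit findings are invented.

```python
def impute_by_nearest_score(audited, unaudited_scores):
    """Impute non-compliance to unaudited returns from the closest audited return.

    audited: list of (propensity_score, observed_noncompliance) pairs.
    unaudited_scores: propensity scores of the unaudited returns.
    """
    imputed = []
    for score in unaudited_scores:
        # With replacement: the same audited return may serve as a match repeatedly.
        _, value = min(audited, key=lambda pair: abs(pair[0] - score))
        imputed.append(value)
    return imputed

# Hypothetical data: three audited returns and four unaudited ones.
audited = [(0.9, 5000), (0.6, 1200), (0.3, 200)]
unaudited_scores = [0.25, 0.55, 0.35, 0.8]
imputed = impute_by_nearest_score(audited, unaudited_scores)
print(imputed, sum(imputed))
```

Summing the imputed values gives a rough estimate of non-compliance among the unaudited returns; a caliper or kernel matcher would replace the `min` call.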
Propensity score matching issues

 Various approaches to match observations


– Nearest neighbour, caliper, kernel, local linear, etc.
 Without replacement or with replacement?
– Without replacement yields lower variation, but at price of higher
potential bias
– Without replacement also sensitive to order in which observations
are matched
– With replacement is preferable if relatively few audited returns are
similar to unaudited returns
 Balancing conditions: want E[A | X, P(X)] = E[A | P(X)]
– Good to break P(X) into 5 or so strata and verify mean values of X
within each stratum are similar for audited and unaudited returns
Choice-based sample

 If the sampling probability depends on whether
return was audited (e.g., oversampling of audited
returns), one can
– Perform an unweighted logit analysis to estimate
parameter vector 𝛽𝐴
– Perform an unweighted logit analysis and match
observations using the estimated log odds ratio:
$$\ln \frac{P(X)}{1 - P(X)} = \beta_A' X_A$$
 Alternatively, one can perform weighted logit (or
probit) and use usual propensity score
Creative and interesting power law
application
 Zipf’s Law postulates that the size (S) or frequency of
an observation is inversely proportional to its rank (R):
$$S_i = \alpha R_i^{-\beta} \;\rightarrow\; \ln S_i = \ln(\alpha) - \beta \ln(R_i)$$
– It has many applications (size of cities, frequency of
words in a book, income rankings, corporation size,
number of visits to websites, etc.)
– It is a “power law” relationship (also known as Pareto)
and often fits the upper tail of a distribution well
 Bloomquist, Hamilton, and Pope (2013) fit a size-rank
regression to operational audit results to estimate
overall non-compliance among very large corporations
(over US$250 million)
Size-rank regression for non-compliance

 Excluding some of the smallest operational audit
adjustment cases, they fit the regression:
$$\ln N_i = \alpha + \beta \ln R_i + \varepsilon_i,$$
where 𝑁 represents the audit adjustment and
𝑅 represents the rank of 𝑁
 They then use the estimated regression
relationship to predict overall non-compliance in
the population
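A size-rank regression of this kind can be sketched with ordinary least squares on logs. This is an illustration of the general technique, not the authors' code. It assumes the audited cases occupy the top ranks of the population, which sidesteps the rank-concentration issue, and the audit adjustments are synthetic values that follow an exact power law.

```python
import math

def fit_size_rank(adjustments):
    """OLS fit of ln(size) on ln(rank); returns (intercept, slope)."""
    sizes = sorted(adjustments, reverse=True)          # rank 1 = largest adjustment
    xs = [math.log(rank) for rank in range(1, len(sizes) + 1)]
    ys = [math.log(size) for size in sizes]
    x_bar, y_bar = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    return y_bar - slope * x_bar, slope

def predicted_total(intercept, slope, population_size):
    """Sum the fitted sizes over ranks 1..population_size."""
    return sum(math.exp(intercept + slope * math.log(rank))
               for rank in range(1, population_size + 1))

# Synthetic adjustments following size = 1000 / rank for the 20 largest cases.
adjustments = [1000 / rank for rank in range(1, 21)]
intercept, slope = fit_size_rank(adjustments)
total_50 = predicted_total(intercept, slope, population_size=50)
print(round(math.exp(intercept)), round(slope, 3), round(total_50))
```

Extending the fitted line from the 20 observed ranks to all 50 ranks in the hypothetical population is the step that turns audit results into an overall estimate.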
Possible issues for further exploration
 Zipf’s Law seems unlikely to hold towards the
bottom of the size-rank distribution in the
population where non-compliance is zero or
negative (i.e., overstatements of income)
 The ranks of observations within the estimation
sample will tend to be more concentrated than the
ranks of the same observations in the population
– It is not clear how much this impacts predictions
 The results may be somewhat sensitive to the
chosen adjustment amount threshold for inclusion
in the size-rank regression
Observations

 Approach is relatively simple to apply and it
accounts for the important fact that a relatively small
number of extreme cases account for the bulk of
non-compliance
 Approach yields an aggregate estimate
– It does not facilitate an analysis of the determinants of
underreporting
– In principle, though, one could extend the approach to
derive separate aggregate estimates for different income
sources or for different categories of corporations
(industry, public/private, national/international, etc.)
Combining operational and random audit
data
Ideally, “round out” operational audit with some random
audits of untargeted returns (and, ideally, non-targeted issues)

[Figure: returns divided into those selected for operational audit and those not selected for operational audit]
Issues with operational-random sample
 Based on a study I performed for the IRS:
– Design-based estimation of such a sample is not very promising –
one needs a rather large random component to obtain reasonable
precision
– Model-based estimation that incorporates sample weights is known
to provide a degree of protection from misspecification. However,
weighted estimation adversely impacts precision, especially when
weights vary substantially in the sample.
– Although unweighted model-based estimation can lead to incorrect
inferences if the modelling assumptions are invalid, a well-
specified model can achieve superior precision in a combined
sample with a large number of targeted audits and relatively few
untargeted ones.
 It would seem ideal to integrate random sampling into an
operational audit selection program
Value of audit data for targeting non-
compliance
 Random audit findings can be used to devise audit
selection strategies
– NRP and DIF-selection in U.S.
 Operational audit findings alone are not adequate
for developing audit selection criteria
– Tunnel vision
 A combined operational-random sample may be a
viable alternative
Survey discrepancies
 Surveys that directly ask about tax evasion are not
especially useful
– Although there has been some success with asking
about informal sector employment (e.g., Lemieux et al.,
1994)
 Comparison of survey reports on income or
expenditure with tax data is more promising
– Informal supplier evasion
– “Nanny Tax” evasion
– Non-filers
U.S. informal suppliers
 The IRS defines informal suppliers as “individuals
who provide products or services through informal
arrangements which frequently involve cash-related
transactions or ‘off the books’ accounting
practice”
 Owing in large part to the lack of a ‘paper trail’,
tax non-compliance among informal suppliers can
be especially difficult to detect
Past IRS methodology
 In the past, the IRS commissioned a special survey
of consumer purchases that attempted to assess
informal earnings based on reported expenditures
on informally supplied goods and services
 It was not clear how successful this research was
in distinguishing informally from formally
supplied goods and services
A new approach
 Jim Alm and I took a different approach. Rather than rely
on a dubious distinction between formal and informal
sales, we identified 12 industries where informal suppliers
are prevalent (food vendors, direct sales, construction,
landscaping, personal services, etc.)
 We then compared reported self-employment earnings in
these industries from a large national survey to tax return
reports from these same industries
 Our approach attempts to account for all earnings in these
industries from self-employment, including that earned
through moonlighting
Findings
 Our results yield a higher level of non-compliance
in these industries than NRP auditors were able to
uncover through intensive audits of tax returns
 At the same time, our estimates are somewhat
lower than the DCE-adjusted NRP estimates.
– This makes sense, since some self-employed
individuals may not be fully forthcoming about their
earnings, even on an anonymous national survey
Who’s minding the Nanny Tax?
 In the U.S., households are responsible for paying
various employment taxes when they pay more than a
nominal amount for the services of a domestic
employee (nannies, housekeepers, home health aides,
cooks, butlers, chauffeurs, groundskeepers, etc.)
 Household employers are required to file Schedule H
with their income tax returns to report all such
payments
 Together, federal and state employment taxes amount
to more than 20% of domestic employee
compensation
Compliance study methodology
 To examine compliance with Nanny Taxes, I used
a large national survey to identify individuals who
report that the longest job they held during the year
was domestic employment in a household
 Based on their reported earnings from this job, I
estimated how many Nanny Tax returns should
have been filed
 The results indicate that only 1 in 4 domestic
employers actually file and pay Nanny Taxes, and
that in aggregate, only about half of the federal
taxes due are actually paid
Validation of estimates
 Concerns with the methodology include:
– Moonlighters are excluded
– Some domestic workers are likely to be reluctant to
report their earnings on the survey
 As a validation exercise, I used a national
consumer expenditure survey to investigate how
much households reported spending on in-home
child care and house cleaning
– The results suggest non-compliance is even worse,
perhaps as high as 70% of taxes due
Individual income tax filing rate
estimation
 IRS is interested in measuring the trend in the
voluntary filing rate (VFR), defined as the ratio of
timely filed required returns to required returns
 I worked with Alan Plumley and Mark Payne of
the IRS to develop an improved filing rate
measure
Measuring the numerator of the VFR
 To measure the numerator of the VFR, a large
representative sample of filed individual returns
was analyzed to estimate the number of timely
filed required returns
– Some filed returns are not required
– In the process of distinguishing required from non-
required returns, we discovered that IRS instructions on
who must file are incomplete
– Our findings led to a change in IRS instructions to
clarify that the gross income concept for filing purposes
disregards all losses
Estimation of denominator of VFR
 We relied on a large national survey to identify
households that appeared to have a filing
requirement
 To address underreporting of certain income
sources on the survey, we imputed additional self-
employment earnings, pensions, and social
security to various households
– The imputations were based on an econometric
analysis of 3rd party reports (for pensions and
social security) and tax returns (for self-
employment earnings)
Findings
The bump in the filing rate in TY2007 coincides with the “Economic
Stimulus Payment” (worth $300 per family member), suggesting that this
one-time benefit encouraged many ghosts to file a return in that year.
Idea to evaluate the determinants of
filing compliance – “calibrated probit”
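$F^* = \beta_F' X_F + \varepsilon_F, \qquad F = \begin{cases} 1 & \text{Filer} \\ 0 & \text{Non-filer} \end{cases}$
Tax return data: $F_i = 1$ for all observations $i = 1, \dots, N_1$
Survey data: $F_j$ is unknown for all observations $j = 1, \dots, n$
Survey weights: $\sum_{j=1}^{n} w_j = N$ (overall population of filers and non-filers)
Constrained estimation of parameters:
$\max_{\beta_F} \; \sum_{i=1}^{N_1} \ln \Pr(F_i = 1) \quad \text{s.t.} \quad \sum_{j=1}^{n} w_j \Pr(F_j = 1) = N_1$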
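The constrained problem above can be sketched in a few lines for the simplest case of one covariate. This is only an illustration under stated assumptions, not the estimator actually implemented for the IRS: the data are simulated, the true coefficients (0.5 and 1.0), the sample sizes, and all function names are hypothetical, and the constraint is handled by solving for the intercept by bisection at each trial slope and then grid-searching the slope.

```python
import math
import random

def norm_cdf(z):
    """Standard normal CDF, used as the probit link Pr(F = 1)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def weighted_filers(b0, b1, survey_x, weights):
    """Weighted survey total of Pr(F_j = 1): the implied number of filers."""
    return sum(w * norm_cdf(b0 + b1 * x) for x, w in zip(survey_x, weights))

def calibrate_intercept(b1, survey_x, weights, n_filers):
    """Bisection on b0 so that the weighted total equals n_filers."""
    lo, hi = -20.0, 20.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if weighted_filers(mid, b1, survey_x, weights) < n_filers:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def filer_loglik(b0, b1, filer_x):
    """Sum of ln Pr(F_i = 1) over the tax return (filer) observations."""
    return sum(math.log(max(norm_cdf(b0 + b1 * x), 1e-300)) for x in filer_x)

# Simulated population (hypothetical): F = 1 if 0.5 + 1.0 * x + e > 0.
rng = random.Random(0)
population_x = [rng.gauss(0.0, 1.0) for _ in range(3000)]
filer_x = [x for x in population_x if 0.5 + 1.0 * x + rng.gauss(0.0, 1.0) > 0]
N1 = len(filer_x)                      # number of filers (tax return data)

# Survey: simple random sample; weights sum to the population size N.
survey_x = rng.sample(population_x, 400)
weights = [len(population_x) / len(survey_x)] * len(survey_x)

# Grid search over the slope; the intercept is pinned down by the constraint.
best = None
for step in range(-30, 31):
    b1 = step / 10.0
    b0 = calibrate_intercept(b1, survey_x, weights, N1)
    ll = filer_loglik(b0, b1, filer_x)
    if best is None or ll > best[0]:
        best = (ll, b0, b1)
loglik_hat, b0_hat, b1_hat = best
print(f"b0_hat = {b0_hat:.2f}, b1_hat = {b1_hat:.2f}, N1 = {N1}")
```

With several covariates, a general constrained optimiser would replace the bisection and grid search; the one-covariate version is used here only to keep the sketch dependency-free.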
Other creative approaches for measuring
non-compliance
 Searching for non-filers in Jamaica
 Searching for traces of evasion
– In consumption behaviour
– In litter
Searching for non-filers in Jamaica
 Prior to the 1986 reform, Jamaica had high
marginal tax rates, but many credits and loopholes
 Wage earners were taxed via PAYE withholding,
while the self-employed were required to file a return
Alm, Bahl, and Murray (1991)
methodology
 The authors sampled 12,000 names from a master
population list based on third-party sources of information
(telephone directories, trade association lists, etc.) on
workers in 9 industries (service stations, customs
brokerages, auto repair, auto parts, hair care, real estate,
contractors, transport, beverage and spirits outlets)
 A similar approach was used to sample 600 professionals
(accountants, architects, attorneys, doctors, etc.)
 The sampled names were matched against Jamaica Income
Tax Department records to check filing and withholding
status
Results
 Only 23% of professionals and 11% of
non-professionals in the selected industries filed a
return or had any income withheld
 Focusing on the non-professionals, assuming that
the characteristics of those who filed were the
same as those who did not file, the amount of
undeclared income was 28.0 percent of reported
income, costing the government 38.8 percent of
actual income taxes collected
Searching for traces of evasion in
consumption behaviour
 Since consumption behaviour tends to be closely
linked to income, the idea is to infer
underreporting of income in cases where
consumption appears to be excessive in relation to
income
– Pissarides and Weber (1989)
– Feldman and Slemrod (2007)
– Fu (2008)
Pissarides and Weber (1989)
Assume food expenditures and wages are reported accurately
on national survey, but not income from self-employment
Estimate a consumption function:
𝑙𝑛𝐶 = 𝛽′ 𝑍 + 𝛾𝑙𝑛𝑌 + 𝛿𝑆𝐸 + 𝜖
where lnY = reported income
SE = dummy variable for self-employment
Z = demographic controls
Data source: 1982 Family Expenditure Survey – diary records
for food consumption used
Actual SE income could then be estimated as exp(𝛿/𝛾)𝑌 (the
actual analysis they use is a bit more complex)
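One way to see where the factor exp(𝛿/𝛾) comes from is sketched below. It assumes, for illustration only, that the true income of the self-employed is a constant multiple k of reported income and that consumption depends on true income Y*; the symbols k and Y* are not in the original slide.

```latex
\begin{align*}
\ln C &= \beta' Z + \gamma \ln Y^{*} + \epsilon,
  \qquad Y^{*} = k^{SE}\, Y \\
\ln C &= \beta' Z + \gamma \ln Y + (\gamma \ln k)\, SE + \epsilon \\
\delta &= \gamma \ln k
  \quad\Longrightarrow\quad
  k = \exp(\delta / \gamma), \qquad Y^{*} = \exp(\delta/\gamma)\, Y
  \ \text{ for the self-employed}
\end{align*}
```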
Results
 The estimated coefficient of the SE dummy (𝛿) is
about 0.10, while the estimated mpc (𝛾) is roughly
0.25, implying that actual self-employment
income is about exp(0.10/0.25) ≈ 1.5 times as large as
the reported amount (33% underreporting)
 Possible issues
– The self-employed may be more prone to eating out,
buying meals for their clients, etc.
– What the self-employed report as earnings on a survey
may be different than what they report on their tax
returns
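The arithmetic behind the first bullet can be checked directly. The sketch uses the rounded coefficient values quoted on the slide (0.10 and 0.25); the variable names are illustrative.

```python
import math

delta = 0.10   # estimated coefficient on the self-employment dummy
gamma = 0.25   # estimated coefficient on log reported income

# Ratio of actual to reported self-employment income.
factor = math.exp(delta / gamma)

# Share of actual income that goes unreported: 1 - reported/actual.
share_unreported = 1.0 - 1.0 / factor

print(f"actual/reported = {factor:.2f}")
print(f"share of actual income unreported = {share_unreported:.2f}")
```

Note that the 33% figure is expressed relative to actual income; relative to reported income the gap is roughly 50%.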
Feldman and Slemrod (2007)
 Apply a similar approach to measure tax evasion in U.S.
by self-employed, but with charitable donations used in
place of food expenditures
– Since charitable contributions are only reported by itemizers,
authors have to control for price of giving in their analysis
 The data source is a large public use sample of federal
individual income tax return data (both cross-sectional and
longitudinal)
 The key assumption is that self-employment status does
not impact the true propensity to make donations
Results
Fu (2008)
 Instead of “inverting” a consumption function to predict
true income, Fu searches for discrepancies between income
reported on Canadian individual tax returns and an
imputed measure of overall consumption and savings
 Data sources
– Regression on Survey of Household Spending (SHS)
used to develop prediction formulae for consumption
and savings
– Prediction formula applied to Survey of Financial
Security (SFS) to impute consumption and savings
• SFS already has detailed information on income reported on
tax returns
Consumption regression
Total household consumption regression:
𝐶 = 𝑍𝛽 + 𝑋𝛾 + 𝑢
where Z = common consumption sources on the two surveys
X = demographic factors, reported income, mortgage
to income ratio, and rent to income ratio
Separate approaches attempted with and without including
durable consumption
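The two-step imputation idea can be sketched as follows. This is a deliberately stripped-down illustration, not Fu's specification: it assumes a single common consumption item as the only predictor, ignores savings and the demographic controls, and uses made-up figures; the function and variable names are hypothetical.

```python
def fit_simple_ols(x, y):
    """Least-squares intercept and slope for a one-regressor model."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Step 1: "SHS-like" survey has both the common item and total consumption.
shs_common_item = [10.0, 20.0, 30.0, 40.0]        # hypothetical, $000s
shs_total_consumption = [25.0, 45.0, 65.0, 85.0]  # hypothetical, $000s
intercept, slope = fit_simple_ols(shs_common_item, shs_total_consumption)

# Step 2: "SFS-like" survey has the common item and income reported for tax,
# but not total consumption, so consumption is imputed from the fitted line.
sfs_common_item = [15.0, 35.0]
sfs_reported_income = [40.0, 60.0]
imputed = [intercept + slope * z for z in sfs_common_item]

# Flag households whose imputed consumption exceeds reported income.
flags = [c > y for c, y in zip(imputed, sfs_reported_income)]
print(list(zip(imputed, sfs_reported_income, flags)))
```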
Findings
 Even just imputing $8/day for food, clothing, and
transportation on top of ongoing expenses in SFS
implies 8% of wage earners and 20% of
self-employed have an income statement discrepancy
 Using SHS to impute consumption yields an
estimated incidence of underreporting of 25% for
non self-employed and 60% for self-employed
(higher if durable consumption is included)
Issues
 Income measure in two data sources may differ;
not clear how this impacts imputations, which
include income as an explanatory variable
 Method assumes that consumption and savings
self-reports are accurate
 Imputations at the individual level likely to be
rather noisy
– Some sensitivity analysis was done
Searching for traces of non-compliance
in litter
 In 2007, state and local cigarette taxes in Chicago
were $2.68 per pack higher than in neighboring
counties and over $3.10 higher than in the
bordering state of Indiana
 Cigarette packages are required to have tax stamps
affixed to them to demonstrate that required taxes
have been paid
 Smokers in Chicago have incentives to purchase
cigarettes from lower tax jurisdictions directly or
from smugglers and/or Native American
reservations
Merriman (2010) methodology
 Students worked in teams to find and collect
littered cigarette packs in randomly selected
representative tracts in Chicago and some
neighbouring locations
 The location of each pack was logged along with
information about the presence or absence of tax
stamps from different jurisdictions
Findings
 75% of sampled cigarette packs from the Chicago
area did not have a Chicago tax stamp (indicating
the packs were purchased from other lower tax
counties or states, or from contraband suppliers)
 The share of packs with a Chicago tax stamp was
lowest in Chicago neighbourhoods close to the
lower cigarette tax state (Indiana)
 More generally, regression analysis indicated that
the share of Chicago tax stamps in a location was
positively associated with the distance to a lower
tax jurisdiction and negatively associated with the
level of tax savings
Selected empirical approaches
 Econometric models using random audit
data
 Aggregate panel data on tax reporting by
state/jurisdiction, supplemented by socio-
economic variables
 Lab Experiments
 Field Experiments
 Agent-Based Models
Econometric models using random audit
data
 NRP Study Findings
– Tax rates, income: hard to identify in a cross-section; Feinstein
(1991) pools 2 years and finds tax rate positively related to non-
compliance and income not significant
– Tax Preparers: Klepper et al. (1991) find preparers enforce
unambiguous tax rules and exploit ambiguous ones; Erard (1993)
finds preparers (especially CPAs and attorneys) are associated with
greater non-compliance
– Prior Audits: Erard (1992) finds weak evidence that audits
positively impact future reporting behaviour
– Demographics: elderly less likely to cheat; married more likely
Aggregate panel data findings
 Dubin et al. (1990), Plumley (1996), Dubin (2007)
find a very large general deterrent effect of audit
rates
 Plumley, Erard, and Snaidauf (2011) find
predictions sensitive to specification decisions:
– Trend vs. year dummies
– Time period
– Dynamics
Lab experiments
 Can test theoretical predictions/hypotheses in a
controlled setting
– Isolate a particular factor to change, holding all other
factors constant
 However, the test is somewhat weak in that the setting
is artificial (external validity is unclear):
– Salience: hard to capture moral and social influences,
real world costs and benefits – I might cheat on taxes in
a lab setting (just a game from my perspective), but
be much more inclined to be honest in actual practice
Lab experiment findings
 Many studies by Alm et al. on a myriad of factors
 Impact of some key factors on compliance:
– Tax rates -
– Audit rates + (modest)
– Fine rates + (small)
– Positive inducements +
– Vote over use of tax revenue (public good) +
 Replication of experiments across different
countries indicates that cultural factors/experience
impact behaviour
Field experiment: Slemrod et al. (2001)
 Randomised field experiment in Minnesota –
treatment sample told in advance their returns
would be “closely examined”; DIF-in-DIF
analysis of reported taxes before/after treatment
– Low and middle income/high opportunity groups: big
increase in reported taxes
– High income/high opportunity group: results “perverse”
DIF-in-DIF approach
[Figure: outcomes over time (t = -1, 0, 1) for the treated and control groups, pre-treatment and post-treatment; the marked gap is labelled as the impact of the program]
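A minimal sketch of the calculation, assuming simple group means before and after treatment. The figures are invented for illustration, and the studies discussed here use regression versions of the same comparison.

```python
def mean(values):
    return sum(values) / len(values)

def diff_in_diff(treat_pre, treat_post, ctrl_pre, ctrl_post):
    """Change in the treated group minus change in the control group."""
    treated_change = mean(treat_post) - mean(treat_pre)
    control_change = mean(ctrl_post) - mean(ctrl_pre)
    return treated_change - control_change

# Hypothetical reported taxes (same units) for three taxpayers per group.
treated_pre = [100.0, 120.0, 80.0]
treated_post = [115.0, 135.0, 95.0]
control_pre = [90.0, 110.0, 100.0]
control_post = [95.0, 115.0, 105.0]

effect = diff_in_diff(treated_pre, treated_post, control_pre, control_post)
print(f"estimated impact of the program: {effect:.1f}")
```

The control group's change stands in for what would have happened to the treated group without the program, so any common time trend is differenced out.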
Field experiment: Kleven et al. (2011)
 Denmark: large scale (40,000 taxpayers) randomised field
experiment; DIF-in-DIF on reported adjustments to income
on pre-populated returns
– Case 1: Threat of audit letters sent to a treatment group
• Significant improvement in reporting of self-reported income (income not
subject to 3rd party reports)
– Case 2: Randomly audited and non-audited taxpayers in
2007 investigated for change in reported income in
subsequent year
• Significant improvement in compliance associated with prior
audit in 2007
Agent-Based Models (ABMs)
 Bloomquist (2012) simulates tax compliance within a large
community of taxpayers, employers, and tax preparers
 Assumes certain behavioral rules for the various “agents”
 Allows communication among social networks
 One simulation experiment involves testing impact of
alternative audit selection strategies on taxpayer reporting.
The results illustrate that targeted audits tend to be more
effective at improving compliance
 Strengths: allows analysis of complex social interactions
and behaviors that are not analytically tractable
 Weakness: Little existing evidence to guide behavioral
rules practiced by different agents