
Chaos, Solitons and Fractals 104 (2017) 238–256

Contents lists available at ScienceDirect

Chaos, Solitons and Fractals


Nonlinear Science, and Nonequilibrium and Complex Phenomena
journal homepage: www.elsevier.com/locate/chaos

Data science for assessing possible tax income manipulation: The case of Italy
Marcel Ausloos (a, b), Roy Cerqueti (c, *), Tariq A. Mir (d)

(a) Institute of Accounting and Finance, School of Management, University of Leicester, University Road, Leicester LE1 7RH, UK
(b) GRAPES, rue de la Belle Jardiniere, B-4031 Liege, Federation Wallonie-Bruxelles, Belgium
(c) University of Macerata, Department of Economics and Law, via Crescimbeni 20, Macerata I-62100, Italy
(d) Nuclear Research Laboratory, Astrophysical Sciences Division, Bhabha Atomic Research Center, Srinagar 190 006, Jammu and Kashmir, India

Article info

Article history:
Received 5 July 2017
Accepted 15 August 2017

JEL classification: H71, C82

Keywords: Data science; Benford law; Aggregated income tax; Data manipulation; Italy

Abstract

This paper explores a fundamental real-world theme from a data science perspective. It specifically discusses whether fraud or manipulation can be observed in, and from, municipality income tax size distributions, obtained by aggregating citizens' fiscal reports. The case study pertains to official data obtained from the Italian Ministry of Economics and Finance over the period 2007–2011. All 20 Italian regions are considered. The data science approach consists in adopting the Benford first-digit law as a quantitative tool. Marked disparities are found for several regions, leading to unexpected conclusions. The most eye-catching regions are not the ones expected from the usual perception of Italy's financial shadow economy.

© 2017 Elsevier Ltd. All rights reserved.

1. Introduction

This paper deals with the relevant theme of identifying the existence of anomalies in tax incomes. We specifically focus on the case of Italian regions. The problem is approached from a data science perspective, which suits the scope of the study: data science is nowadays a major research frontier for processing large data sets (see e.g. [17] and references therein).

The relevance of the study lies in the fact that assessing errors in financial statements is a major task of auditors, regulators, and analysts, not only in financial markets but also in macroeconomic and public affairs, e.g. for governmental economic data. Accurate financial statement data are crucial, even essential, to the management of public budgets. Thus, it is mandatory to check whether misestimations, mistakes, biases, or even manipulations have occurred or are occurring [60]. On the other hand, academic researchers must propose ways of detecting errors or anomalies. Many methods have been proposed, and steps have been taken in creating and validating techniques to assess different constructs of errors [23]. However, despite substantial progress in this safety area, the available methods present deficiencies that limit their usefulness, sometimes because of unclear hypotheses underlying the method. This will most likely continue forever, since the imagination of crooks is well known to lead to ever more sophisticated manipulation, while the reaction of policy makers is slowed by legal processes. Controls are thus challenged by intelligent people lacking classical ethics.

Without casting opprobrium on all Italian citizens because of the supposed tax evasion of a few, and before debating individual cases, it is often acknowledged that Italy is one of the top countries in terms of losses to tax evasion (after the USA and Brazil), whether measured through a GDP ranking (http://investorplace.com/investorpolitics/10-worst-countries-for-tax-evasion/#.Vvkqf3AR54c) or through the amount of tax lost to the shadow economy. Income manipulation might thus rightfully be tested on such a country case, somewhat in line with recently discussed pertinent topics related to ethics and organized crime, e.g. [4,15,16] or [46].

Of course, not only citizens falsify their financial data, but also firms and even governments [2,41,43]. For example, questions have been raised about the data submitted by Greece to Eurostat to meet the strict deficit criteria set by the European Union (EU), see [62], or about the macroeconomic data of China (Holz [37]).

* Corresponding author.
E-mail addresses: ma683@le.ac.uk, marcel.ausloos@ulg.ac.be (M. Ausloos), roy.cerqueti@unimc.it (R. Cerqueti), taarik.mir@gmail.com (T.A. Mir).

http://dx.doi.org/10.1016/j.chaos.2017.08.012
0960-0779/© 2017 Elsevier Ltd. All rights reserved.
Managers can engage in more or less corporate tax avoidance than shareholders would otherwise prefer [6]. On the other hand, firms have incentives to manipulate earnings in order to convince investors, e.g. by rounding a reported figure up to a round number when they have profits (e.g., USD 40 million) and by reporting a figure such as USD 39.95 million when they have losses, as discussed by [65], who observed such unusual patterns in reported earnings. This rounding points to a moderate manipulation of the data; however, its relevance for investors is considered not to be negligible.

At the lower level, that of citizens, it is also known that seemingly small rounding manipulations can influence financial statement users' perception of credit quality [34]. At another level, where the citizen is immersed in a crowd and expects to be protected by the shadow of a bigger cheater, it is interesting to ask whether a collective effect can be seen. This can be done by examining income tax contributions at local levels. This accounting level is the core of our investigation and report.

A review of statistical methods of fraud detection has been provided by Bolton and Hand [13,14], while accountants' perceptions regarding fraud detection and prevention methods have been discussed recently [12]; see also [68] for a quick summary, or [39] for a specific discussion of a couple of techniques.

In this context, the data science approach proposed in the present paper is based on the Benford law.

The Benford law, originally stated for the first-digit (BL1) distribution of data sets, is a logarithmic law:

$$ P(d) = \log_{10}\left(1 + \frac{1}{d}\right), \qquad d = 1, 2, \ldots, 9, \tag{1} $$

where P(d) is the probability that the first digit in the data set is equal to d, and log10 is the logarithm in base 10.

This law stems from the observation by [47], and later independently by [11], that the distribution of the first digit is concentrated on small values: the digit 1 has the highest frequency, and 9 the lowest. In Table 1, the frequency of the first digit given by BL1 is recalled for the reader's convenience.

Table 1
Frequency (Freq.) of the first digit d in a data set, for d = 1, ..., 9, according to BL1, Eq. (1).

d       1      2      3      4      5      6      7      8      9
Freq.   0.301  0.176  0.125  0.097  0.079  0.067  0.058  0.051  0.046

Mathematics also provides the corresponding laws for the second, third, ... digit distributions. However, since these quickly become rather uniform, it is hard (though it has been done) to use such higher-order digits for testing the validity of reported data. We therefore restrict our aim below to the first digit, i.e. to the validity (or not) of BL1 in a specific case, which may serve as a paradigm for other big data investigations.

In general, the Benford law is to be recommended because it has many advantages, such as being scale invariant (i.e. not affected by a change of units), and it is acknowledged to help when there is no supporting document proving the authenticity of transactions [67]. Nowadays, this so-called law provides a convenient basis for the digital analysis of sequences of numbers of a similar nature. For example, analyses based on the Benford law have been used in a wide variety of ways to identify instances of employee theft and tax evasion [34], and also deviations of exchange rates from regular paths [18] or of the Libor rates [1].

In fact, following [65], a manipulation expectation can be obtained using the [11] law, as was later pointed out by [48–50]. Since [51], it has been accepted that BL1 can be used to detect fraud in accounting data reporting individual incomes. The presentation and demonstration of the Newcomb–Benford law (1881–1938) as a powerful methodology in the audit field were further emphasized by [29,36,58,61], among others, and also recently in [21] and by Nigrini and Miller [5,8,22,45,57]. BL1 is also applied outside the financial audit realm; see e.g. [31] for image forensics, [7] for birth rate anomalies, [59] for maternal mortality rates, [63] for the natural sciences, and [44] for religious activities. The literature is huge: see [3] for approximately the last decade, [25], or [10], which contains a rather exhaustive list of references.

Limiting ourselves to data tampering in the public government financial realm, let us mention, among others, the application of the Benford law to selected balances in the Comprehensive Annual Financial Reports of the fifty states of the United States [35,38], and to the political economy of numbers in international macroeconomic statistics [52]. A study of the digit distribution of 134281 contracts issued by 20 management units in two states of Brazil also found significant deviations from Benford's law [24]; see also [33]. Fraud detection (and prevention methods) in the Malaysian public sector have been discussed in [53].

At a lower scale, which is ours, the Benford law has made it possible to uncover deficiencies in the data reported by local governments, such as municipalities and states, in several countries, for example the USA or Brazil. The digit distributions of the financial statements of 3 municipalities, Vallejo City, Orange County and Jefferson County, have been shown to depart significantly from those expected on the basis of BL1 [35].

In Italy, tax collection is a fundamental source of revenue for local governments, enabling the efficient delivery of services [54]. On the other hand, tax evasion is known to be widespread across Italy ([15,30], Marino and Zizza [20,32]).

Obviously, any financial distress of municipalities resulting from income tax evasion has severe repercussions on the lives of taxpayers and municipal employees [9]. This is detrimental to the community; it thus seems important to have better oversight of the quality of financial statements and of accountability with respect to the demand for (and use of) funds, e.g. those returned by the Italian government. Moreover, these concerns about data quality, on the one hand, and the admittedly poor auditing procedures used in Italy, on the other hand, have resurfaced vigorously following the bankruptcy of a number of local government bodies during the recent financial crisis [54], and, to be fair, more generally across several industrialized countries.

Within this review of specific accounting features relevant to our research, it may finally be interesting to point the reader to a very recent and specific (by chance) Italian case, i.e. the detection of anomalies in receivables and payables in Italian universities by [19]. This shows that intermediate levels of financial data scales may contain intriguing features, further suggesting questions on the detection of manipulations through deviations from the Benford law, as in the present case of tax incomes in regions.

Thus, we have considered the aggregated values of the income tax reports of each of the 20 Italian (IT) regions over a recent quinquennium, 2007–2011, for which the data are available. As suggested by [42], we calibrate our analysis with a χ² test.

Our point of view is politico-economic and unique: the accounting reliability of citizens' contributions to the IT GDP, even though questions are numerous. Hopefully, within this framework, one can also (i) enlarge the known range of BL1 applications, (ii) contribute to a better application of BL1 in accounting, and even (iii) show that one can reach socio-economic and political conclusions.

The paper is organized as follows: Section 2 is about the methodology. Section 3 contains the description of the data. The findings are collected in Section 4 and discussed in Section 5. The last section also allows us to offer suggestions for further research lines. All the Tables collecting the results at the regional level are reported in the Appendix.

Table 2
Number Nc,r of cities in the 20 IT regions in 2007, 2009 and 2011 (a single value means that the number was the same in the three years); the regions are ranked by decreasing number of cities. For more complete information on IT cities, the number of inhabitants (Ninhab) in each region according to the 2011 Census is also reported (http://dati-censimentopopolazione.istat.it/Index.aspx?lang=en). The last column gives the empirical χ² of the time-averaged BL1 observations for each region; it is included in this Table in order to lighten the other Tables. Recall that the critical value of the χ² test at the 0.05 significance level is 15.51 for ν = 8 degrees of freedom, which is that of BL1.

Region                  Nc,r (2007 / 2009 / 2011)   Ninhab (2011)   χ²
Lombardia               1546 / 1546 / 1544          9 704 151       6.3694
Piemonte                1206                        4 363 916       5.2830
Veneto                  581                         4 857 210       7.7896
Campania                551                         5 766 810       17.224
Calabria                409                         1 959 050       4.4664
Sicilia                 390                         5 002 904       6.4494
Lazio                   378                         5 502 886       5.0288
Sardegna                377                         1 639 362       24.587
Emilia-Romagna          341 / 341 / 348             4 342 13        8.1620
Trentino-Alto Adige     339 / 333 / 333             1 029 475       5.3782
Abruzzo                 305                         1 307 30        4.7372
Toscana                 287                         3 672 202       9.5140
Puglia                  258                         4 052 566       5.8868
Marche                  246 / 246 / 239             1 541 319       8.6922
Liguria                 235                         1 570 694       16.895
Friuli-Venezia Giulia   219 / 218 / 218             1 218 985       7.8444
Molise                  136                         313 660         12.657
Basilicata              131                         578 036         7.5054
Umbria                  92                          884 268         11.317
Valle d'Aosta           74                          126 806         7.6682
Total                   8101 / 8094 / 8092          59 433 744

2. Methodology

The research reported below provides a thorough analysis, essentially at the regional level, of the value (i.e. size) distribution of the income tax among Italian cities, and hence of their contribution to the country's GDP. Specifically, the data of a given region are obtained by summing the municipal-level data over the municipalities belonging to the region under examination. We refer to the Aggregated Income Tax (AIT, hereafter) of all the citizens living in each Italian region.

However, income manipulation by citizens, if it exists, would occur at the tax income level; hence the specific and somewhat complicated wording of the title of this paper.

The numerical analysis is carried out on the basis of official data obtained from the Italian Ministry of Economics and Finance (MEF), and concerns each year of the 2007–2011 quinquennium. In 2011, there were 8092 municipalities and 20 regions, with widely different characteristics. Notice at once that this implies a large number of graphs and tables. One can concatenate them, but that still means 20 BL1 displays, one per region, each with 5 sets of data points, one per year, as seen below. The first digit of each data point was identified using a simple home-made algorithm, and the actual number of occurrences of each first digit was compared to the expected amount.
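The procedure just described can be sketched in Python. The home-made algorithm itself is not given in the text, so the first_digit() helper below is an assumption: it returns the first significant (nonzero) digit, which for positive values of at least 1 coincides with taking the leftmost character (as done with Excel's LEFT function in the spreadsheet described next), but it also copes with amounts below 1 or with a minus sign. The sketch evaluates Eq. (1), reproducing Table 1, and the expected counts N·P(d) against which actual occurrences are compared. For comparison it also includes the standard Benford second-digit probabilities, P2(d2) = Σ_{k=1..9} log10(1 + 1/(10k + d2)), whose much flatter profile (about 0.120 for d2 = 0 down to about 0.085 for d2 = 9) illustrates why higher-order digits are less discriminating. All helper names and the sample values are illustrative.

```python
import math
from collections import Counter

# BL1, Eq. (1): probability that the first significant digit equals d.
BL1 = {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def second_digit_probs():
    """Benford probabilities of the second digit d2 = 0..9 (for comparison only)."""
    return {
        d2: sum(math.log10(1 + 1 / (10 * k + d2)) for k in range(1, 10))
        for d2 in range(10)
    }


def first_digit(x):
    """First significant (nonzero) digit of a number; the sign is ignored.

    str() of an int or float starts with the mantissa, so the first
    character in 1..9 is the first significant digit
    (e.g. '0.0123' -> 1, '1e-05' -> 1).
    """
    for ch in str(abs(x)):
        if ch in "123456789":
            return int(ch)
    raise ValueError(f"no significant digit in {x!r}")  # 0, inf or nan


def digit_counts(values):
    """Observed number of values whose first digit is d, for d = 1..9."""
    counts = Counter(first_digit(v) for v in values)
    return [counts.get(d, 0) for d in range(1, 10)]


def expected_counts(n):
    """Expected number of values with first digit d under BL1, for n values."""
    return [n * BL1[d] for d in range(1, 10)]


if __name__ == "__main__":
    # Eq. (1) rounded to three decimals, as in Table 1.
    for d, p in BL1.items():
        print(d, f"{p:.3f}")

    # Hypothetical municipal AIT values in EUR, for illustration only.
    ait = [1_250_000.0, 184_300.5, 97_020.0, 1_030_400.0, 23_900.0, 310_000.0]
    print("observed:", digit_counts(ait))
    print("expected:", [round(e, 2) for e in expected_counts(len(ait))])
```

With only a handful of values the comparison is meaningless; in practice each region contributes Nc,r municipal values per year.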
MS Excel was used to organize the data into a usable form. The original Excel data file listed municipalities in the alphabetical order of their names, which was pertinent within the study [45]. The aim of the present study is to analyze AIT data at a mesoscale, i.e. regional, level. So we first assigned the municipalities to their respective regions, using http://www.comuni-italiani.it/nomi/index.html to find the parent region of each municipality. The Excel spreadsheet included 20 columns to accommodate each year of data for each of the twenty regions, and six columns for the five years of analysis plus their average over the quinquennium. The first digits were extracted with the Excel function LEFT(text, num_chars), entering the column of interest as text and 1 (for the first digit) as num_chars. The frequency of each digit, 1 to 9, as first significant digit was then determined with the COUNTIF(range, criteria) function, by specifying the range of cells as range and the digit of interest as criteria.

Several tests can be made to assess the validity of BL1. The most classical one, the χ² test, is used here, as in other BL1 applications [29]. Within this framework, the visualization of the data and the reported test tables allow us to pinpoint regularities or anomalies. Acceptable conformity suggests that the balance is likely not biased and can be accepted without further analysis. Nonconformity does not guarantee that problems exist in the underlying accounts comprising the AIT, or that fraud has occurred, but the results of the BL1 analysis should be taken as an indicator that further investigation is needed. Indeed, if the data do not conform to BL1, this signals that the aggregated data may not be true representations: the numbers may be influenced by operations, biased by error (ours, although we did much cross-checking), or they may have been manipulated to deceive some financial statement user.

From the ergodicity point of view [26–28,40,55,64,66], it is of interest to assess the data through a time average; this average has been taken over the relevant 5 years for each region. A χ² test has also been performed on such averages.

3. Data

The economic data analyzed below have been obtained from the Research Center of the Italian MEF. Contributions to the Italian GDP have been disaggregated at the municipal level (in Italy a municipality or city is called comune, plural comuni), for five recent years: 2007–2011.

Recall that Italy is composed of 20 regions and more than 8000 municipalities. The latter number has varied over time, even during the examined quinquennium, from 8101 down to 8092 between 2007 and 2011: 10 cities merged into 3 new entities, while 2 others were absorbed (phagocytized). Moreover, 7 municipalities moved from the Marche region to another one (Emilia-Romagna) in 2008. The variation in the number of cities in each concerned region has been taken into account: we have made a virtual merging of cities (see also http://www.comuni-italiani.it/regioni.html), in order to compare AIT data for regions of stable size. Since the number of regions has constantly been equal to 20, the regional level seems to be the most interesting one for any data measure and discussion. The regions are listed in order of importance, i.e. by number of cities, in Table 2. Observe that the χ² calculated for the 2011 municipal composition of each region is given there for future reference, thereby avoiding extra columns or lines in other Tables.

In short, the AIT of the resulting cities, and hence that of the regions, was linearly adapted, as if these cities and regional contents had preexisted before the merging or absorption. The (rounded) AIT in successive years and the corresponding average AIT of each region over the quinquennium are given in Table 3. The AIT and ⟨AIT⟩ for IT are in units of 10^11 EUR, but for regions in units of 10^10 EUR. As complementary information, irrelevant for the BL1 test, the number of inhabitants (Ninhab) in each region according to the 2011 Census is also reported, from http://dati-censimentopopolazione.istat.it/Index.aspx?lang=en.
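The χ² test and the time averaging described above can be sketched as follows. This is a minimal illustration, not the authors' spreadsheet: it assumes the classical Pearson statistic Σ_d (O_d − N·P(d))² / (N·P(d)) over d = 1, ..., 9, with ν = 8 degrees of freedom, and it treats the time average as a plain arithmetic mean over the five years (consistent with the ⟨AIT⟩ column of Table 3). Because ν = 8 is even, the upper-tail probability has the closed form e^(−x/2) Σ_{k=0..3} (x/2)^k / k!, so no statistics library is needed; at x = 15.507 it gives 0.05, matching the critical value quoted in the text. The function names are illustrative; the χ² values are those of Table 2.

```python
import math

# BL1 probabilities, Eq. (1), for d = 1..9
BL1 = [math.log10(1 + 1 / d) for d in range(1, 10)]
CRITICAL_005 = 15.507  # chi-square critical value, nu = 8, 0.05 level (quoted in the text)


def chi2_benford(counts):
    """Pearson chi-square of observed first-digit counts (d = 1..9) against BL1."""
    if len(counts) != 9:
        raise ValueError("expected 9 counts, one per digit d = 1..9")
    n = sum(counts)
    return sum((obs - n * p) ** 2 / (n * p) for obs, p in zip(counts, BL1))


def chi2_pvalue_nu8(x):
    """P(X >= x) for a chi-square variable with nu = 8 degrees of freedom.

    For even nu = 2m: P(X >= x) = exp(-x/2) * sum_{k=0}^{m-1} (x/2)**k / k!,
    here with m = 4.
    """
    h = x / 2
    return math.exp(-h) * sum(h ** k / math.factorial(k) for k in range(4))


def time_average(yearly_values):
    """Plain arithmetic mean over the years, e.g. the 2007-2011 quinquennium."""
    return sum(yearly_values) / len(yearly_values)


# chi-square of the time-averaged data per region (Table 2, last column)
TABLE2_CHI2 = {
    "Lombardia": 6.3694, "Piemonte": 5.2830, "Veneto": 7.7896,
    "Campania": 17.224, "Calabria": 4.4664, "Sicilia": 6.4494,
    "Lazio": 5.0288, "Sardegna": 24.587, "Emilia-Romagna": 8.1620,
    "Trentino-Alto Adige": 5.3782, "Abruzzo": 4.7372, "Toscana": 9.5140,
    "Puglia": 5.8868, "Marche": 8.6922, "Liguria": 16.895,
    "Friuli-Venezia Giulia": 7.8444, "Molise": 12.657, "Basilicata": 7.5054,
    "Umbria": 11.317, "Valle d'Aosta": 7.6682,
}

if __name__ == "__main__":
    print(f"p-value at the critical value: {chi2_pvalue_nu8(CRITICAL_005):.4f}")
    for region, x in TABLE2_CHI2.items():
        if x > CRITICAL_005:
            print(f"{region}: chi2 = {x}, p = {chi2_pvalue_nu8(x):.4f}")
    # Abruzzo, Table 3 (units of 10^10 EUR): five yearly AIT values
    print(round(time_average([1.2871, 1.2850, 1.3053, 1.3288, 1.3613]), 4))
```

Only Campania, Sardegna and Liguria exceed the critical value in Table 2, which is the "17 regions with a small value" count used in Section 4.2.1.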
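As a consistency check on the regional figures, the summary statistics reported in Table 4 can be recomputed from a column of Table 3. The sketch below uses only Python's standard library; the function name summary() is illustrative. It assumes, consistently with the tabulated values, that Std Dev denotes the sample standard deviation (denominator n − 1) and Std Error the standard error of the mean, Std Dev/√n; skewness and kurtosis are left out because the estimator behind them is not stated. Applied to the 2011 column (in units of 10^10 EUR), it reproduces the 2011 column of Table 4 up to rounding, including a mean/median ratio of about 1.74.

```python
import math
import statistics

# Regional AIT in 2011, in units of 10^10 EUR, in the row order of Table 3
# (Abruzzo ... Veneto).
AIT_2011 = [
    1.3613, 0.4911, 1.4516, 4.3863, 6.3970, 1.7323, 7.6163, 2.3003, 15.008, 1.8567,
    0.2865, 6.1201, 3.3947, 1.5977, 3.9451, 4.9499, 1.5360, 1.0702, 0.1923, 6.4845,
]


def summary(values):
    """Min, max, sum, mean, median, RMS, sample std dev and std error of the mean."""
    n = len(values)
    sd = statistics.stdev(values)  # sample standard deviation: denominator n - 1
    return {
        "min": min(values),
        "max": max(values),
        "sum": math.fsum(values),
        "mean": statistics.fmean(values),
        "median": statistics.median(values),
        "rms": math.sqrt(math.fsum(v * v for v in values) / n),
        "std_dev": sd,
        "std_error": sd / math.sqrt(n),
    }


if __name__ == "__main__":
    s = summary(AIT_2011)
    for key, value in s.items():
        print(f"{key:10s} {value:.4f}")
    print(f"mean/median = {s['mean'] / s['median']:.2f}")
```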
Table 3
Nc,r, the number of cities in each region; (rounded) AIT in successive years and average AIT, ⟨AIT⟩, of each region over the quinquennium. The AIT and ⟨AIT⟩ are in units of 10^10 EUR for the regions, but in units of 10^11 EUR for IT.

i   Nc,r  Region          2007    2008    2009    2010    2011    ⟨AIT⟩ (5 yr)
1   305   ABRUZZO         1.2871  1.2850  1.3053  1.3288  1.3613  1.3135
2   131   BASILICATA      0.4580  0.4735  0.4820  0.4834  0.4911  0.4776
3   409   CALABRIA        1.3404  1.3961  1.4411  1.4496  1.4516  1.4158
4   551   CAMPANIA        4.1890  4.2908  4.3589  4.3833  4.3863  4.3217
5   348   EM. ROMAGNA     6.2211  6.3448  6.2945  6.3665  6.3970  6.3248
6   218   FRIULI V.G.     1.6876  1.7121  1.7163  1.7214  1.7323  1.7139
7   378   LAZIO           7.1759  7.3436  7.4487  7.5532  7.6163  7.4275
8   235   LIGURIA         2.2020  2.2402  2.2829  2.2958  2.3003  2.2642
9   1544  LOMBARDIA       14.457  14.737  14.561  14.771  15.008  14.707
10  239   MARCHE          1.7710  1.8045  1.7977  1.8232  1.8567  1.8106
11  136   MOLISE          0.2736  0.2823  0.2825  0.2827  0.2865  0.2815
12  1206  PIEMONTE        5.9479  6.0326  5.9797  6.0515  6.1201  6.0264
13  258   PUGLIA          3.1445  3.2563  3.3082  3.3557  3.3947  3.2919
14  377   SARDEGNA        1.4896  1.5510  1.5789  1.5890  1.5977  1.5612
15  390   SICILIA         3.6977  3.8324  3.9066  3.9256  3.9451  3.8615
16  287   TOSCANA         4.7404  4.8175  4.8417  4.8943  4.9499  4.8487
17  333   TRENTINO A.A.   1.3967  1.4379  1.4808  1.5148  1.5360  1.4733
18  92    UMBRIA          1.0167  1.0432  1.0539  1.0624  1.0702  1.0493
19  74    V. D'AOSTA      0.1795  0.1849  0.1889  0.1911  0.1923  0.1873
20  581   VENETO          6.2346  6.3244  6.2912  6.3808  6.4845  6.3431
    8092  IT              6.8910  7.0390  7.0601  7.1424  7.2178  7.0701

Table 4
Summary of (rounded) statistical characteristics of the AIT (in EUR) of the IT regions (Nr = 20) in 2007–2011; the scale factor of each row is given in parentheses.

                   2007     2008     2009     2010     2011     ⟨AIT⟩
Min. (10^9)        1.7950   1.8495   1.8891   1.9110   1.9230   1.8735
Max. (10^10)       14.457   14.737   14.561   14.771   15.008   14.707
Sum (10^10)        68.910   70.390   70.601   71.424   72.178   70.701
Mean (10^10)       3.4455   3.5195   3.5300   3.5712   3.6089   3.5350
Median (10^10)     1.9865   2.0223   2.0403   2.0595   2.0785   2.0374
RMS (10^10)        4.7793   4.8753   4.8601   4.9227   4.9831   4.8840
Std Dev (10^10)    3.3982   3.4613   3.4274   3.4761   3.5254   3.4576
Std Error (10^9)   7.5986   7.7397   7.6639   7.7729   7.8831   7.7314
Skewness           1.7795   1.7784   1.7406   1.7478   1.7672   1.7627
Kurtosis           3.4321   3.4359   3.2850   3.3097   3.3902   3.3708

4. Results

Besides the (rounded) AIT in successive years and the average AIT of each region over the quinquennium, ⟨AIT⟩, given in Table 3 (in units of 10^10 EUR for the regions and 10^11 EUR for IT), the statistical characteristics of the regional AIT distribution for 2007–2011 are reported in Table 4, together with the corresponding time average. The skewness and kurtosis are both positive, and the mean is greater than the median (by a factor of about 1.74). Notably, the minimum and the maximum AIT both have 1 as first digit.

4.1. BL1 displays

Each BL1 data set for each of the 20 regions is displayed in Figs. 1–20; different symbols are used to distinguish the 5 years examined. On each figure, the frequency of the first digit d is given, together with the theoretical BL1. Moreover, the sample standard error, which depends on the number of cities as $\sqrt{1/(N_{c,r} - 1)}$, allows us to draw the estimated range of the confidence interval, defined as [BL1−, BL1+] in obvious notation. The display order follows the region ranking given in Table 2.

Denote the first digit as d, so that d = 1, ..., 9. Visual observations (listed in descending order of Nc,r) point to:

Lombardia: rather smooth distribution, not too far from the confidence interval (CI); d = 6, 7 somewhat away from the CI, above BL1+.
Piemonte: rather smooth distribution, not too far from the CI; d = 5, 6 somewhat away from the CI, specifically above BL1+.
Veneto: rather smooth distribution, not too far from the CI, even if d = 4, 5 are somewhat away from the CI, on different sides of BL1; d = 9 is much dispersed.
Campania: very scattered, except for d = 1, 2; the other digits are far from the CI.
Calabria: rather smooth distribution, not too far from the CI; d = 6, 7 somewhat away from the CI, above BL1+; d = 2 is below BL1−.
Sicilia: d = 1, 4 below BL1−; d = 7 much dispersed.
Lazio: rather smooth distribution, not too far from the CI, even if d = 4, 5, 6 are somewhat away from the CI (below BL1−).
Sardegna: very strongly scattered, except for d = 1, 2.
Emilia-Romagna: rather smooth distribution, not too far from the CI; d = 3, 4, 5 are somewhat away from the CI, below BL1−; d = 8 is quite above BL1+.
Trentino-Alto Adige: rather smooth distribution, not too far from the CI, but d = 4 somewhat away from the CI (above BL1+); d = 7 is quite above BL1+, while d = 6 is somewhat below BL1−.
Abruzzo: rather smooth distribution, but scattered away from the CI, on different sides of BL1.
Toscana: much dispersed; d = 3, 4, 6, 7 below BL1−; d = 8, 9 quite above BL1+.
Puglia: much dispersed; d = 3, 6, 7 much below BL1−; d = 2, 9 quite above BL1+.
Marche: very much dispersed; d = 3, 6 are much below BL1−, while d = 4, 5, 8, 9 are quite above BL1+.
Liguria: very much dispersed; d = 3, 6, 8 are much below BL1−, while d = 2, 4, 6, 9 are quite above BL1+.
Friuli-Venezia Giulia: rather smooth distribution, but rather far from the CI; d = 4, 5, 7 quite away from the CI, below BL1−; d = 2, 3, 8 rather away, above BL1+.
Molise: very much scattered from BL1; d = 1, 2 are far below BL1−, while d = 3, 6 are far above BL1+; wide variation for the other digits.
Basilicata: much scattered from BL1; d = 4, 8 are much below BL1−, while d = 5, 6 are much above BL1+; a wide variation is observed for all digits.
Umbria: much scattered from BL1; d = 4, 6, 7, 8, 9 are much below BL1−, while d = 1, 2, 5 are much above BL1+; here too, there is wide variation for all digits.
Valle d'Aosta: much scattered from BL1; d = 4, 8 are much below BL1−, and d = 2, 5, 7 are much above BL1+; once again, there is wide variation for all digits.

4.2. χ² conformity test

It has been emphasized that five χ² values are to be calculated for each region, one for each year. These one hundred χ² values are found in Tables 5–7. The χ² for the time-averaged AIT values is given in Table 2 instead, to simplify the reading of Tables 5–7.

However, before examining the BL1 conformity of each region, it seems fair to examine whether the distribution of the χ² values is itself anomalous, in order to pinpoint any statistical anomaly arising from the computations. The distribution of the 100 χ² values (20 regions, 5 years) is shown in Fig. 21.

The distribution is markedly positively skewed, but rather irregular, with a set of outliers. Recall that the critical value of χ² at the 0.05 significance level is 15.51 when the number of degrees of freedom is ν = 8, that of BL1. This rather highly peaked distribution with a few outliers (in particular Sardegna and Liguria,

Fig. 1. Lombardia. [Panel: Lombardia, Nc = 1546. Proportion versus first digit for 2007, 2008, 2009, 2010 and 2011, with BL1, BL1+ and BL1−.]

Fig. 2. Piemonte. [Panel: Piemonte, Nc = 1206. Proportion versus first digit for 2007, 2008, 2009, 2010 and 2011, with BL1, BL1+ and BL1−.]

Fig. 3. Veneto. [Panel: Veneto, Nc = 581. Proportion versus first digit for 2007, 2008, 2009, 2010 and 2011, with BL1, BL1+ and BL1−.]

Fig. 4. Campania. [Panel: Campania, Nc = 551. Proportion versus first digit for 2007, 2008, 2009, 2010 and 2011, with BL1, BL1+ and BL1−.]

Fig. 5. Calabria. [Panel: Calabria, Nc = 409. Proportion versus first digit for 2007, 2008, 2009, 2010 and 2011, with BL1, BL1+ and BL1−.]

Fig. 6. Sicilia. [Panel: Sicilia, Nc = 390. Proportion versus first digit for 2007, 2008, 2009, 2010 and 2011, with BL1, BL1+ and BL1−.]

Fig. 7. Lazio. [Panel: Lazio, Nc = 378. Proportion versus first digit for 2007, 2008, 2009, 2010 and 2011, with BL1, BL1+ and BL1−.]

Fig. 8. Sardegna. [Panel: Sardegna, Nc = 377. Proportion versus first digit for 2007, 2008, 2009, 2010 and 2011, with BL1, BL1+ and BL1−.]

Fig. 9. Emilia-Romagna. [Panel: Emilia-Romagna, Nc = 348. Proportion versus first digit for 2007, 2008, 2009, 2010 and 2011, with BL1, BL1+ and BL1−.]

Fig. 10. Trentino-Alto Adige. [Panel: Trentino-Alto Adige, Nc = 339. Proportion versus first digit for 2007, 2008, 2009, 2010 and 2011, with BL1, BL1+ and BL1−.]



Fig. 11. Abruzzo. [Panel: Abruzzo, Nc = 305. Proportion versus first digit for 2007, 2008, 2009, 2010 and 2011, with BL1, BL1+ and BL1−.]

Fig. 12. Toscana. [Panel: Toscana, Nc = 287. Proportion versus first digit for 2007, 2008, 2009, 2010 and 2011, with BL1, BL1+ and BL1−.]



Fig. 13. Puglia. [Panel: Puglia, Nc = 258. Proportion versus first digit for 2007, 2008, 2009, 2010 and 2011, with BL1, BL1+ and BL1−.]

Fig. 14. Marche. [Panel: Marche, Nc = 246. Proportion versus first digit for 2007, 2008, 2009, 2010 and 2011, with BL1, BL1+ and BL1−.]



Fig. 15. Liguria. [Panel: Liguria, Nc = 235. Proportion versus first digit for 2007, 2008, 2009, 2010 and 2011, with BL1, BL1+ and BL1−.]

Fig. 16. Friuli-Venezia Giulia. [Panel: Friuli-Venezia Giulia, Nc = 219. Proportion versus first digit for 2007, 2008, 2009, 2010 and 2011, with BL1, BL1+ and BL1−.]



Fig. 17. Molise. [Panel: Molise, Nc = 136. Proportion versus first digit for 2007, 2008, 2009, 2010 and 2011, with BL1, BL1+ and BL1−.]

Fig. 18. Basilicata. [Panel: Basilicata, Nc = 131. Proportion versus first digit for 2007, 2008, 2009, 2010 and 2011, with BL1, BL1+ and BL1−.]



Fig. 19. Umbria. [Panel: Umbria, Nc = 92. Proportion versus first digit for 2007, 2008, 2009, 2010 and 2011, with BL1, BL1+ and BL1−.]

Fig. 20. Valle d'Aosta. [Panel: Valle d'Aosta, Nc = 74. Proportion versus first digit for 2007, 2008, 2009, 2010 and 2011, with BL1, BL1+ and BL1−.]



Table 5
Frequencies of the first digit in the reported AIT data for IT regions with a high Nc number, for the various years; they can be visually compared to the frequencies expected according to BL1, given in Table 1. The corresponding calculated χ² is to be compared with the theoretical value (15.507) for ν = 8 degrees of freedom at the 0.05 significance level. The null hypothesis is rejected for the cases marked with an asterisk (*).

Digit   1      2      3      4      5      6      7      8      9      χ²
Lombardia, Nc = 1546 for 2007–2009; Nc = 1544 for 2010–2011
2007    0.298  0.168  0.118  0.092  0.087  0.074  0.064  0.050  0.048  5.297
2008    0.297  0.175  0.115  0.089  0.085  0.073  0.070  0.045  0.051  9.692
2009    0.299  0.176  0.114  0.093  0.085  0.072  0.070  0.048  0.043  7.348
2010    0.296  0.172  0.119  0.096  0.082  0.071  0.069  0.054  0.043  4.674
2011    0.300  0.172  0.117  0.097  0.078  0.074  0.067  0.051  0.043  4.836
Piemonte, Nc = 1206
2007    0.304  0.185  0.119  0.091  0.088  0.061  0.064  0.052  0.036  6.279
2008    0.300  0.186  0.112  0.096  0.085  0.065  0.065  0.044  0.047  5.175
2009    0.303  0.180  0.110  0.095  0.090  0.061  0.061  0.053  0.046  4.925
2010    0.304  0.185  0.111  0.094  0.086  0.070  0.055  0.054  0.041  4.327
2011    0.299  0.186  0.114  0.095  0.076  0.080  0.056  0.051  0.042  5.709
Veneto, Nc = 581
2007    0.310  0.165  0.136  0.117  0.057  0.057  0.062  0.040  0.057  11.327
2008    0.313  0.167  0.139  0.107  0.062  0.062  0.065  0.041  0.043  6.251
2009    0.320  0.170  0.138  0.105  0.062  0.064  0.062  0.043  0.036  6.309
2010    0.325  0.172  0.138  0.103  0.067  0.062  0.055  0.053  0.024  9.569
2011    0.322  0.170  0.134  0.105  0.071  0.057  0.060  0.048  0.033  5.492
Campania, Nc = 551
2007    0.309  0.160  0.096  0.094  0.067  0.102  0.051  0.054  0.067  21.647*
2008    0.314  0.160  0.093  0.100  0.065  0.100  0.044  0.062  0.064  23.021*
2009    0.310  0.176  0.091  0.091  0.071  0.093  0.051  0.060  0.058  14.559
2010    0.310  0.172  0.091  0.087  0.082  0.087  0.049  0.065  0.056  13.556
2011    0.310  0.172  0.087  0.083  0.087  0.078  0.060  0.058  0.064  13.336
Calabria, Nc = 409
2007    0.298  0.152  0.132  0.098  0.073  0.073  0.068  0.046  0.059  4.439
2008    0.301  0.164  0.127  0.093  0.073  0.068  0.073  0.042  0.059  4.514
2009    0.298  0.164  0.120  0.100  0.064  0.076  0.071  0.049  0.059  4.939
2010    0.308  0.166  0.115  0.103  0.061  0.081  0.073  0.051  0.042  5.420
2011    0.301  0.166  0.117  0.095  0.073  0.086  0.064  0.054  0.044  3.020
Sicilia, Nc = 390
2007    0.287  0.195  0.123  0.077  0.085  0.079  0.051  0.049  0.054  4.614
2008    0.272  0.205  0.123  0.077  0.082  0.069  0.067  0.044  0.062  7.728
2009    0.279  0.203  0.123  0.077  0.079  0.072  0.064  0.051  0.051  4.421
2010    0.282  0.197  0.131  0.074  0.085  0.056  0.079  0.038  0.056  9.723
2011    0.290  0.195  0.126  0.079  0.077  0.067  0.069  0.038  0.059  5.761
Lazio, Nc = 378
2007    0.304  0.193  0.119  0.095  0.061  0.058  0.053  0.056  0.061  4.980
2008    0.310  0.188  0.127  0.093  0.069  0.050  0.066  0.042  0.056  4.360
2009    0.315  0.193  0.116  0.087  0.082  0.048  0.061  0.040  0.058  5.894
2010    0.320  0.190  0.119  0.093  0.077  0.053  0.056  0.034  0.058  5.613
2011    0.312  0.198  0.127  0.079  0.077  0.061  0.045  0.048  0.053  4.297

- noticed in different fiscal years) is another hint toward pursuing a BL1 analysis on a regional basis.
For emphasizing the regional aspects, and connecting with the last column of Table 2, the distribution of the mean (average over the quinquennium) ⟨χ²⟩ for each region is shown in Fig. 22. Three regions are marked outliers in the upper range, Sardegna, Campania and Liguria, with ⟨χ²⟩ of approximately 15 or more, pointing to strong (questionable) non-conformity with BL1. On the contrary, two regions, Abruzzo and Calabria, have a low ⟨χ²⟩ over the quinquennium, indicating very fine agreement with BL1 (⟨χ²⟩ < 5).
At this stage, it seems important to point to a quite substantial time invariance of the χ² values, according to the displayed data.
Thereafter, it seems natural to discuss the BL1 values of each region, noting conformity or not, starting from the most valid and ending with the most anomalous. In so doing, one can distinguish values according to the standard critical value for rejecting the null hypothesis, i.e. conformity with BL1, at $\chi^2_{0.05} = 15.507$ for ν = 8 degrees of freedom; for completeness, let us observe that (for ν = 8) the critical value of the χ² at a significance level 0.10 is $\chi^2_{0.10} = 13.362$.

4.2.1. ⟨χ²⟩ ≤ 13.362, and yearly (ir)regularity
It is found from Table 2 that the BL1 ⟨χ²⟩ has a small value in 17 regions; from the lowest to the highest: Calabria (4.4664), Abruzzo (4.7372), Lazio (5.0288), Piemonte (5.2830), Trentino-Alto Adige (5.3782), Puglia (5.8868), Lombardia (6.3694), Sicilia (6.4494), Basilicata (7.5054), Valle d'Aosta (7.6682), Veneto (7.7896), Friuli-Venezia Giulia (7.8444), Emilia-Romagna (8.1620), Marche (8.6922), Toscana (9.5140), Umbria (11.317) and Molise (12.657).
Among these, Tables 5-7 show that much regularity is observed in the respective yearly values, with some exceptions.
The values for Veneto (11.33) in 2007, Emilia-Romagna (10.60) in 2010, and Valle d'Aosta (11.31) in 2010 are such that the respective χ² fall outside the 5% sampling error bar. The largest χ² value for Molise occurs in 2008 (20.99), although a surprisingly small χ² value (9.74) occurs in 2007. The χ² values for Umbria are high, but without any severe hint of an anomaly; indeed, the distribution of χ² values is quite narrow for Umbria, implying some systematicity.
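The χ² entries of Tables 5-7 can be re-derived from the tabulated proportions. The Python sketch below is not the authors' code; it assumes that the statistic is Pearson's χ² of the first-digit counts against BL1, P(d) = log10(1 + 1/d), with ν = 8, and that the counts can be recovered by rounding each proportion times Nc. The helper names (`first_digit`, `digit_counts`, `counts_from_proportions`, `chi2_vs_bl1`) are hypothetical. Applied to the Molise 2008 row of Table 7 (Nc = 136), it gives a value close to the tabulated 20.990, above the 5% critical value 15.507.

```python
import math

# BL1: Benford first-digit law, P(d) = log10(1 + 1/d) for d = 1..9.
BL1 = [math.log10(1 + 1 / d) for d in range(1, 10)]

# Critical values quoted in the text for nu = 8 degrees of freedom.
CHI2_CRIT_010 = 13.362
CHI2_CRIT_005 = 15.507


def first_digit(amount: int) -> int:
    """Leading digit of a positive integer amount (assumption: AIT in euros)."""
    if amount <= 0:
        raise ValueError("amounts are assumed to be positive integers")
    return int(str(amount)[0])


def digit_counts(amounts):
    """Observed counts of first digits 1..9 over a collection of amounts."""
    counts = [0] * 9
    for a in amounts:
        counts[first_digit(a) - 1] += 1
    return counts


def counts_from_proportions(props, n_c: int):
    """Approximate counts from 3-decimal table proportions (may not sum to n_c)."""
    return [round(p * n_c) for p in props]


def chi2_vs_bl1(counts) -> float:
    """Pearson chi-square of observed first-digit counts against BL1 expectations."""
    n = sum(counts)
    return sum((obs - n * p) ** 2 / (n * p) for obs, p in zip(counts, BL1))


# Molise, 2008 row of Table 7 (N_c = 136).
molise_2008 = [0.221, 0.110, 0.213, 0.118, 0.059, 0.103, 0.044, 0.074, 0.059]
counts = counts_from_proportions(molise_2008, 136)
chi2 = chi2_vs_bl1(counts)
print(counts, round(chi2, 2), chi2 > CHI2_CRIT_005)
# prints: [30, 15, 29, 16, 8, 14, 6, 10, 8] 20.99 True
```

Because the proportions are rounded to three decimals, the recovered counts need not sum exactly to Nc for every row; with the raw AIT amounts one would call `digit_counts` directly instead.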

Table 6
Frequencies of the first digit in reported AIT data for IT regions with a medium-range Nc number, for various years; they can be visually compared to the expected frequencies according to BL given in Table 1. The corresponding calculated χ² is to be compared with the theoretical value (15.507) for ν = 8 degrees of freedom at the 5% significance level. The null hypothesis (conformity with BL1) is verified in all cases except Sardegna in 2007, 2008, 2010 and 2011, where χ² > 15.507.

Year / first digit 1 2 3 4 5 6 7 8 9 χ²
Sardegna, Nc = 337
2007 0.310 0.167 0.093 0.053 0.050 0.119 0.064 0.109 0.034 56.001
2008 0.310 0.172 0.095 0.106 0.069 0.093 0.077 0.024 0.053 15.606
2009 0.316 0.175 0.101 0.095 0.085 0.069 0.093 0.032 0.034 13.908
2010 0.318 0.175 0.095 0.090 0.093 0.061 0.095 0.034 0.037 16.059
2011 0.314 0.174 0.103 0.087 0.092 0.066 0.103 0.029 0.032 21.363
Emilia-Romagna, Nc = 348
2007 0.296 0.201 0.103 0.078 0.069 0.069 0.057 0.066 0.060 7.516
2008 0.310 0.204 0.101 0.080 0.066 0.066 0.060 0.063 0.049 6.121
2009 0.305 0.201 0.103 0.072 0.075 0.060 0.060 0.060 0.063 8.040
2010 0.290 0.201 0.112 0.072 0.072 0.055 0.066 0.063 0.069 10.604
2011 0.299 0.195 0.115 0.072 0.060 0.069 0.063 0.072 0.055 8.529
Trentino-Alto Adige, Nc = 339
2007 0.322 0.159 0.127 0.100 0.077 0.065 0.062 0.062 0.027 4.713
2008 0.300 0.168 0.126 0.099 0.081 0.048 0.075 0.060 0.042 4.225
2009 0.285 0.171 0.135 0.105 0.078 0.048 0.084 0.039 0.054 7.975
2010 0.300 0.168 0.120 0.114 0.078 0.057 0.072 0.048 0.042 2.992
2011 0.294 0.156 0.117 0.120 0.087 0.048 0.072 0.048 0.057 6.986
Abruzzo, Nc = 305
2007 0.292 0.187 0.105 0.085 0.089 0.056 0.075 0.052 0.059 5.381
2008 0.295 0.170 0.111 0.089 0.098 0.059 0.062 0.056 0.059 3.852
2009 0.289 0.164 0.111 0.102 0.079 0.062 0.066 0.056 0.072 6.090
2010 0.298 0.154 0.121 0.092 0.085 0.069 0.066 0.052 0.062 3.252
2011 0.318 0.144 0.115 0.095 0.079 0.062 0.079 0.059 0.049 5.111
Toscana, Nc = 287
2007 0.331 0.188 0.125 0.059 0.066 0.070 0.035 0.059 0.066 11.580
2008 0.328 0.195 0.105 0.080 0.070 0.059 0.042 0.059 0.063 7.097
2009 0.321 0.206 0.094 0.084 0.073 0.056 0.038 0.066 0.063 10.148
2010 0.341 0.195 0.101 0.084 0.066 0.059 0.042 0.045 0.066 8.958
2011 0.369 0.185 0.101 0.091 0.073 0.042 0.056 0.038 0.045 9.787
Puglia, Nc = 258
2007 0.306 0.194 0.116 0.089 0.089 0.043 0.054 0.047 0.062 5.060
2008 0.295 0.213 0.101 0.101 0.074 0.078 0.039 0.047 0.054 5.989
2009 0.291 0.217 0.101 0.097 0.074 0.078 0.035 0.062 0.047 7.260
2010 0.302 0.209 0.105 0.101 0.078 0.074 0.031 0.062 0.039 6.800
2011 0.310 0.202 0.112 0.101 0.070 0.074 0.035 0.054 0.043 4.325
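The quinquennium mean ⟨χ²⟩ used in Sections 4.2.1 and 4.2.2 can be read, as a sketch, as the plain average of the five yearly values in Tables 5-7. The Python snippet below assumes exactly that (an unweighted mean) and uses the two critical values quoted in the text as band limits; the dictionary holds only a subset of regions copied from the tables, and the function names `mean_chi2` and `band` are illustrative.

```python
# Yearly chi^2 values (2007-2011) copied from Tables 5-7 for a subset of regions.
YEARLY_CHI2 = {
    "Calabria": [4.439, 4.514, 4.939, 5.420, 3.020],
    "Abruzzo": [5.381, 3.852, 6.090, 3.252, 5.111],
    "Molise": [9.741, 20.990, 11.890, 10.475, 10.190],
    "Liguria": [15.920, 27.173, 15.719, 15.954, 9.707],
    "Campania": [21.647, 23.021, 14.559, 13.556, 13.336],
    "Sardegna": [56.001, 15.606, 13.908, 16.059, 21.363],
}

CHI2_CRIT_010 = 13.362  # nu = 8, significance level 0.10
CHI2_CRIT_005 = 15.507  # nu = 8, significance level 0.05


def mean_chi2(values) -> float:
    """Unweighted quinquennium average <chi^2> of the yearly statistics."""
    return sum(values) / len(values)


def band(mean: float) -> str:
    """Place a region in the ranges used for Sections 4.2.1 and 4.2.2."""
    if mean <= CHI2_CRIT_010:
        return "4.2.1"
    if mean >= CHI2_CRIT_005:
        return "4.2.2"
    return "between"


means = {region: mean_chi2(v) for region, v in YEARLY_CHI2.items()}
ranking = sorted(means, key=means.get)  # from most to least conforming
for region in ranking:
    # Number of individual years in which BL1 is rejected at the 5% level.
    rejected_years = sum(x > CHI2_CRIT_005 for x in YEARLY_CHI2[region])
    print(f"{region:9s} <chi2> = {means[region]:7.4f}  {band(means[region])}  "
          f"years rejected at 5%: {rejected_years}")
```

The ranking runs from Calabria (4.4664) to Sardegna (24.5874), matching the averages quoted in the text, and the per-year count makes visible that Molise's single rejected year (2008) does not move its mean out of the 4.2.1 range.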

4.2.2. ⟨χ²⟩ ≥ 15.507, and yearly (ir)regularity
In contrast, Liguria (16.895), Campania (17.224) and Sardegna (24.587) are the three regions showing a marked lack of conformity with respect to BL1.
The two largest χ² values for Sardegna occur in 2007 and 2011: 56.00 and 21.36, respectively. The largest χ² values for Campania occur in 2007 and 2008: 21.65 and 23.02, respectively. The largest χ² value for Liguria occurs in 2008: 27.17; surprisingly, a quite small χ² value (9.70) occurs in 2011.

5. Discussion
This section summarizes and discusses the results of the investigation.
In general, the concordance between the AIT of Italian regions and the theoretical statement of BL1 is rather questionable. There are discrepancies at a regional level, and this is in line with the heterogeneous socio-economic nature of Italian regions. In particular, the geographic and economic features of the regions match closely, which allows clustering them into N (North), C (Center) and S (South, plus Sicilia and Sardegna):
N is the part of Italy constituted by 8 regions: Emilia-Romagna, Friuli-Venezia Giulia, Liguria, Lombardia, Piemonte, Trentino-Alto Adige, Valle d'Aosta and Veneto;
C contains 5 regions: Abruzzo, Lazio, Marche, Toscana and Umbria;
S is the remaining 7 regions: Basilicata, Calabria, Campania, Molise, Puglia, Sardegna and Sicilia.
Deviations from BL1 are usually read as data manipulation. However, the law is certainly a subject of controversy in accounting. It is still not clear either why it should be valid at all, under whatever socio-economic conditions, or whether its theoretical derivation, under various hypotheses, informs us about its origins and its range of applications. Even Newcomb and Benford were doubtful about its realm of validity. Therefore, a deep exploration of the regional reality behind the AIT data, of how these data have been collected, and of the shadow economy at a regional level is required to provide a rigorous interpretation of the results. This is well beyond the scope of this paper. We can only give some suggestions and discussions, which may serve as arguments for future studies.
Among the 3 regions with very anomalous BL1 χ², two belong to S (Sardegna and Campania), while the other comes from N (Liguria).
Sardegna is characterized by a noticeable fragmentation into several municipalities with a very small number of inhabitants. This region does not have a highly developed industrial structure, and a wide part of the regional economy is still based on agriculture and livestock. In the small communities the economy is somewhat closed, and business exchanges are often based on commodities. In such a situation, one can guess that the discrepancy between the official data and the income tax comes from the real regional economy.
Sadly, Campania has the relevant problem of a massive influence of organized crime on the economic system. Hence, deviations from BL1 can be viewed as expected.
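The statement that two of the three anomalous regions belong to S and one to N can be checked mechanically from the N/C/S grouping and the ⟨χ²⟩ values quoted in Section 4.2. A minimal Python sketch, assuming a region counts as anomalous when ⟨χ²⟩ ≥ 15.507; the names `MACRO_AREA`, `MEAN_CHI2` and `anomalous_by_area` are hypothetical:

```python
# Macro-areas as defined in the Discussion (N, C, S).
MACRO_AREA = {
    "Emilia-Romagna": "N", "Friuli-Venezia Giulia": "N", "Liguria": "N",
    "Lombardia": "N", "Piemonte": "N", "Trentino-Alto Adige": "N",
    "Valle d'Aosta": "N", "Veneto": "N",
    "Abruzzo": "C", "Lazio": "C", "Marche": "C", "Toscana": "C", "Umbria": "C",
    "Basilicata": "S", "Calabria": "S", "Campania": "S", "Molise": "S",
    "Puglia": "S", "Sardegna": "S", "Sicilia": "S",
}

# Quinquennium means <chi^2> quoted in Sections 4.2.1 and 4.2.2.
MEAN_CHI2 = {
    "Calabria": 4.4664, "Abruzzo": 4.7372, "Lazio": 5.0288, "Piemonte": 5.2830,
    "Trentino-Alto Adige": 5.3782, "Puglia": 5.8868, "Lombardia": 6.3694,
    "Sicilia": 6.4494, "Basilicata": 7.5054, "Valle d'Aosta": 7.6682,
    "Veneto": 7.7896, "Friuli-Venezia Giulia": 7.8444, "Emilia-Romagna": 8.1620,
    "Marche": 8.6922, "Toscana": 9.5140, "Umbria": 11.317, "Molise": 12.657,
    "Liguria": 16.895, "Campania": 17.224, "Sardegna": 24.587,
}

CHI2_CRIT_005 = 15.507  # nu = 8, significance level 0.05


def anomalous_by_area(means, areas, threshold=CHI2_CRIT_005):
    """Group the regions whose mean chi^2 reaches the threshold by macro-area."""
    result = {"N": [], "C": [], "S": []}
    for region, m in means.items():
        if m >= threshold:
            result[areas[region]].append(region)
    return {area: sorted(regions) for area, regions in result.items()}


print(anomalous_by_area(MEAN_CHI2, MACRO_AREA))
# prints: {'N': ['Liguria'], 'C': [], 'S': ['Campania', 'Sardegna']}
```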

Table 7
Frequencies of the first digit in reported AIT data for IT regions with a low Nc number, for various years; they can be visually compared to the expected frequencies according to BL given in Table 1. The corresponding calculated χ² is to be compared with the theoretical value (15.507) for ν = 8 degrees of freedom at the 5% significance level. The null hypothesis (conformity with BL1) is rejected whenever χ² > 15.507.

Year / first digit 1 2 3 4 5 6 7 8 9 χ²
Marche, Nc = 239
2007 0.268 0.172 0.092 0.121 0.084 0.063 0.054 0.071 0.075 11.051
2008 0.285 0.172 0.092 0.113 0.084 0.071 0.059 0.054 0.071 6.486
2009 0.280 0.180 0.088 0.117 0.088 0.071 0.067 0.042 0.067 7.370
2010 0.285 0.180 0.096 0.096 0.109 0.054 0.071 0.038 0.071 9.946
2011 0.293 0.197 0.088 0.096 0.105 0.042 0.071 0.054 0.054 8.608
Liguria, Nc = 235
2007 0.272 0.209 0.085 0.089 0.077 0.098 0.043 0.047 0.081 15.920
2008 0.268 0.213 0.089 0.081 0.055 0.111 0.055 0.034 0.094 27.173
2009 0.294 0.209 0.077 0.098 0.060 0.094 0.060 0.034 0.077 15.719
2010 0.289 0.196 0.094 0.102 0.047 0.094 0.072 0.030 0.077 15.954
2011 0.306 0.191 0.089 0.115 0.047 0.089 0.060 0.043 0.060 9.707
Friuli-Venezia Giulia, Nc = 219 for 2007; Nc = 218 for 2008-2011
2007 0.279 0.210 0.142 0.078 0.055 0.059 0.068 0.064 0.046 6.075
2008 0.289 0.220 0.138 0.078 0.046 0.073 0.055 0.064 0.037 7.940
2009 0.280 0.220 0.142 0.078 0.041 0.069 0.064 0.064 0.041 8.993
2010 0.303 0.206 0.142 0.073 0.050 0.073 0.060 0.060 0.032 6.516
2011 0.284 0.211 0.151 0.069 0.055 0.073 0.041 0.073 0.041 9.698
Molise, Nc = 136
2007 0.243 0.125 0.191 0.118 0.074 0.074 0.074 0.051 0.051 9.741
2008 0.221 0.110 0.213 0.118 0.059 0.103 0.044 0.074 0.059 20.990
2009 0.250 0.110 0.184 0.110 0.081 0.103 0.051 0.066 0.044 11.890
2010 0.228 0.132 0.169 0.118 0.096 0.081 0.066 0.037 0.074 10.475
2011 0.235 0.140 0.169 0.096 0.103 0.081 0.081 0.029 0.066 10.190
Basilicata, Nc = 131
2007 0.290 0.160 0.130 0.061 0.122 0.076 0.061 0.023 0.076 9.965
2008 0.282 0.168 0.107 0.107 0.084 0.092 0.031 0.061 0.069 5.365
2009 0.290 0.160 0.122 0.099 0.061 0.099 0.053 0.053 0.061 3.567
2010 0.290 0.168 0.115 0.092 0.076 0.122 0.023 0.053 0.061 9.692
2011 0.305 0.160 0.099 0.115 0.053 0.122 0.046 0.046 0.053 8.938
Umbria, Nc = 92
2007 0.380 0.207 0.130 0.065 0.109 0.054 0.011 0.022 0.022 10.855
2008 0.370 0.207 0.130 0.065 0.120 0.043 0.022 0.022 0.022 10.348
2009 0.370 0.207 0.130 0.076 0.120 0.043 0.022 0.022 0.011 11.093
2010 0.370 0.217 0.120 0.076 0.109 0.065 0.011 0.000 0.033 12.352
2011 0.380 0.217 0.109 0.087 0.109 0.054 0.011 0.011 0.022 11.938
Valle d'Aosta, Nc = 74
2007 0.270 0.230 0.095 0.081 0.122 0.054 0.081 0.041 0.027 5.456
2008 0.270 0.243 0.108 0.068 0.108 0.054 0.081 0.014 0.054 6.760
2009 0.284 0.203 0.122 0.054 0.149 0.054 0.068 0.027 0.041 7.477
2010 0.284 0.216 0.108 0.027 0.162 0.068 0.054 0.041 0.041 11.309
2011 0.284 0.176 0.135 0.041 0.135 0.081 0.054 0.027 0.068 7.339

The economic system of Liguria seems to be affected by a pervasive shadow economy. Confartigianato (the Italian association of artisans and small businesses) states that about 73% of the artisans are in competition with illegal and shadow economies (see http://www.confartigianatoliguria.it/node/4153). This evidence represents a good hint for explaining why Liguria exhibits this discrepancy with respect to BL1.
Unexpectedly, we admit, the remaining regions are in accordance with BL1, with some disparities over the quinquennium as highlighted in the previous Section. From both an economic and a social point of view, some regions are quite similar to Campania, with a remarkable presence of organized crime (think of Calabria, Puglia and Sicilia, but also, in the North, of Veneto and Lombardia). Basilicata and Molise are similar to Sardegna as concerns the absence of a well-established industrial structure. It is also worth mentioning that Basilicata is the main producer of fossil fuels in Italy (see the report in http://www.siteb.it/new%20siteb/documenti/RASSEGNA/6711_7.pdf). Moreover, the shadow economy is generally widespread in the entire country.
To conclude: we provide some hints for further exploring cases of violation of BL1, and hence of possible tax income manipulation, through city-level Aggregated Income Tax reports throughout all Italian regions. We admit that not every finding can be explained on the basis of BL1 alone: we do not understand why some regions do not have BL1 violators, and we avoid proposing speculative statements that might be regarded as products of imagination. Yet, this further supports the point that a deeper analysis should be carried out to investigate the nature of the fiscal data and how they are usually collected and approved [56].

6. Conclusions
Today Benford's law is routinely used by forensic analysts to detect errors, incompleteness and dubious manipulation of financial data. The basic premise of the test is that the first digits in real data generally tend to follow the Benford distribution, whereas people intending to play with the numbers, when unaware of the law, try to place the digits uniformly. Thus any departure from the law raises some suspicion. We have assessed possible manipulation of citizens' tax income in Italy through city-aggregated income tax reports from all Italian regions, with data obtained from the Research Center of the Italian MEF.
The validity of the reported data does not seem to have attracted the attention of official accountants. For example, reports such as Economia e Finanza locale - Rapporto 2010 or RAPPORTO ANNUALE 2012 - La situazione del Paese fall short of discussing the data validity.

Fig. 21. Distribution of yearly χ² values in BL1 analysis of AIT in the 20 IT regions.
Fig. 22. Distribution of the mean (average over the quinquennium) χ² values in BL1 analysis of AIT in the 20 IT regions.

This paper provides an examination of a fiscal data set stemming from Italian citizens, at a regional level. Specifically, it focuses on the assessment of potential manipulation of tax income through the adoption of the Benford law for the first digit over the quinquennium 2007-2011.
BL1 presents significant advantages over alternative measures of accounting quality currently used in practice. For example, it does not require time-series, cross-sectional, or forward-looking information, nor details on transactions.
Throughout the paper we refer to municipalities, although in practice we are investigating the incomes of citizens; however, we avoid using any individual information on whether individuals correctly report their income. This is an important distinction, to the extent that this is precisely the population of interest. Though we find significant variations in municipality tax incomes by region, much of the variation is actually attributable to differences in the characteristics of the regions.
Another purpose of this paper has been to provide a proof of concept for using BL1 on fiscal data at a regional level in order to obtain some information on manipulation. It is shown that it is possible to document the variation in income taxes across regions, to order them, and to observe a distribution of anomalies, also in time. The sampling data at this aggregated level are large enough to look in detail at pertinent numbers, - without making strong econometric modeling assumptions. The requirement of a large database (at the country level), expected to provide the scale needed for the data to be sufficiently granular, can be relaxed. The data analysis points to different regional realities, sometimes quite unexpectedly.
Durtschi et al. [29] have pointed out that when interpreting results of Benford's law, one should be aware of a few risks. However, Benford's law is most effective for large data sets, when the data represent more than one distribution, when the mean is greater than the median, and when the skewness is positive. This seems to be the case in our investigation.
Our findings also indicate that Benford's law seems effective in detecting data bias in not too large data sets. There is (alas) no doubt that manipulation of income reports exists in various regions and municipalities. It may be that a few accountants are well aware that BL1 is a test and can avoid the non-conformity surprise at the individual level, but cannot do so at the next (city aggregation) level.
Thus, to our knowledge, this paper is the first to document whether aggregated income tax data conform to the (first-digit) Benford's law, i.e. to show how, without examining (individual) citizens' financial reports, such data are likely to exhibit divergences. In so doing, fraud or manipulation is viewed at a more collective level.

Acknowledgments
This paper is part of scientific activities in COST Action IS1104, "The EU in the new complex geography of economic systems: models, tools and policy evaluation".

Supplementary material
Supplementary material associated with this article can be found, in the online version, at 10.1016/j.chaos.2017.08.012.

References
[1] Abrantes-Metz RM, Villas-Boas SB, Judge G. Tracking the libor rate. Appl Econ Lett 2011;18(10):893-9.
[2] Aggarwal R, Lucey BM. Psychological barriers in gold prices? Rev Financial Econ 2007;16(2):217-30.
[3] Alali FA, Romero S. Benford's law: analyzing a decade of financial data. J Emerging Technol Accounting 2013;10(1):1-39.
[4] Alexeev M, Janeba E, Osborne S. Taxation and evasion in the presence of extortion by organized crime. J Comp Econ 2004;32:375-87.
[5] Amiram D, Bozanic Z, Rouen E. Financial statement errors: evidence from the distributional properties of financial statement numbers. Rev Accounting Stud 2015;20(4):1540-93. Ibid. Erratum to: Financial statement errors: evidence from the distributional properties of financial statement numbers. Review of Accounting Studies, 20(4), 1594-1595.
[6] Armstrong CS, Blouin JL, Jagolinzer AD, Larcker DF. Corporate governance, incentives, and tax avoidance. J Accounting Econ 2015;60(1):1-17.
[7] Ausloos M, Herteliu C, Ileanu B. Breakdown of Benford's law for birth data. Physica A 2015;419:736-45.
[8] Ausloos M, Castellano R, Cerqueti R. Regularities and discrepancies of credit default swaps: a data science approach through Benford's law. Chaos, Solitons and Fractals 2016. doi:10.1016/j.chaos.2016.03.002.
[9] Bartolini D, Santolini R. Political yardstick competition among Italian municipalities on spending decisions. Ann Reg Sci 2012;49(1):213-35.
[10] Beebe N.H.F. A bibliography of publications about Benford's law, Heaps' law, and Zipf's law. 2016. http://ftp.math.utah.edu/pub/tex/bib/benfords-law.pdf.
[11] Benford F. The law of anomalous numbers. Proc Am Philos Soc 1938;74(8):551-72.
[12] Bierstaker JL, Brody RG, Pacini C. Accountants' perceptions regarding fraud detection and prevention methods. Managerial Auditing J 2006;21(5):520-35.
[13] Bolton RJ, Hand DJ. Unsupervised profiling methods for fraud detection. Credit Scoring Credit Control 2001;VII:235-55.
[14] Bolton RJ, Hand DJ. Statistical fraud detection: a review. Stat Sci 2002;17(3):235-49.

[15] Brosio G, Cassone A, Ricciuti R. Tax evasion across Italy: rational non-compliance or inadequate civic concern. Public Choice 2002;112(3):259-73.
[16] Calderoni F. Where is the mafia in Italy? Measuring the presence of the mafia across Italian provinces. Global Crime 2011;12(1):41-69.
[17] Carbone A, Jensen M, Sato AH. Challenges in data science: a complex systems perspective. Chaos, Solitons Fractals 2016;90:1-7.
[18] Carrera C. Tracking exchange rate management in Latin America. Rev Financial Econ 2015;25:35-41.
[19] Ciaponi F, Mandanici F. Using digital frequencies to detect anomalies in receivables and payables: an analysis of the Italian universities. Ekonomski i socijalni razvoj 2015;2(1):86-108.
[20] Chiarini B, Marzano E, Schneider F. Tax rates and tax evasion: an empirical analysis of the long-run aspects in Italy. Eur J Law Econ 2013;35(2):273-93.
[21] Cleary R, Thibodeau JC. Applying digital analysis using Benford's law to detect fraud: the dangers of type I errors. Auditing A J Pract Theory 2005;24(1):77-81.
[22] Clippe P, Ausloos M. Benford's law and Theil transform of financial data. Physica A 2012;391(24):6556-67.
[23] Cooper DJ, Dacin T, Palmer D. Fraud in accounting, organizations and society: extending the boundaries of research. Accounting, Organiz Soc 2013;38(6):440-57.
[24] Costa JIF, Santos J, Travassos SKM. An analysis of federal entities' compliance with public spending: applying the Newcomb-Benford law to the 1st and 2nd digits of spending in two Brazilian states. Revista Contabilidade & Finanças-USP 2012;23(60):187-98.
[25] Costa JIF, Travassos SKM, Santos J. Application of Newcomb-Benford law in accounting audit: a bibliometric analysis in the period from 1988 to 2011. 10th International conference on information systems and technology management, June 12-14 (2013). Sao Paulo, Brazil; 2013.
[26] Davidson P. Sensible expectations and the long-run non-neutrality of money. J Post Keynesian Econ 1987;10(1):146-53.
[27] Davidson P. Reality and economic theory. J Post Keynesian Econ 1996;18(4):479-508.
[28] Davidson P. The Keynes solution: the path to global economic prosperity. Palgrave/Macmillan; 2009.
[29] Durtschi C, Hillison W, Pacini C. The effective use of Benford's law to assist in the detecting of fraud in accounting data. J Forensic Accounting 2004;5:17-34.
[30] Fiorio CV, D'Amuri F. Workers' tax evasion in Italy. Giornale degli Economisti e Annali di Economia 2005;64(2/3):247-70.
[31] Fu D, Shi YQ, Su W. A generalized Benford's law for JPEG coefficients and its applications in image forensics. Electron Imaging 2007. 65051L-65051L International Society for Optics and Photonics.
[32] Galbiati R, Zanella G. The tax evasion social multiplier: evidence from Italy. J Public Econ 2012;96(5):485-94.
[33] Gava AM, Vitiello L. Inflation, quarterly balance sheets and the possibility of fraud: Benford's law and the Brazilian case. J Accounting, Bus Manage 2014;21:43-52.
[34] Guan L, He SD, McEldowney J. Window dressing in reported earnings. Commer Lending Rev 2008;23(3):28-33.
[35] Haynes AH. Detecting fraud in bankrupt municipalities using Benford's law; 2012. Scripps Senior Theses. Paper 42. Available at http://scholarship.claremont.edu/scripps_theses/42.
[36] Hill TP. The first digit phenomenon: a century-old observation about an unexpected pattern in many numerical tables applies to the stock market, census statistics and accounting data. Am Sci 1998;86(4):358-63.
[37] Holz CA. The quality of China's GDP statistics. China Econ Rev 2014;30:309-38.
[38] Johnson GG, Weggenmann J. Exploratory research applying Benford's law to selected balances in the financial statements of state governments. Acad Accounting Financial Stud J 2013;17(1):31-44.
[39] Lin CC, Chiu AA, Huang SYY, Yen DC. Detecting the financial statement fraud: the analysis of the differences between data mining techniques and experts' judgments. Knowl Based Syst 2015;89:459-70.
[40] Lucas RE, Sargent T. After Keynesian macroeconomics. Rational expectations and econometric practice 1. London: George Allen & Unwin; 1981. p. 295-319.
[41] Luippold BL, Kida T, Piercey MD, Smith JF. Managing audits to manage earnings: the impact of diversions on an auditor's detection of earnings management. Accounting, Organiz Soc 2015;41(2):39-54.
[42] Lusk EJ, Halperin M. Detecting Newcomb-Benford digital frequency anomalies in the audit context: suggested chi2 test possibilities. Accounting Finance Res 2014;3(2):191-205.
[43] Michalski T, Stoltz G. Do countries falsify economic data strategically? Some evidence that they might. Rev Econ Stat 2013;95:591-616.
[44] Mir TA. The law of the leading digits and the world religions. Physica A 2012;391:792-8.
[45] Mir TA. The leading digit distribution of the worldwide illicit financial flows. Qual Quant 2016;50(2016):271-81.
[46] Mir TA, Ausloos M, Cerqueti R. Benford's law predicted digit distribution of aggregated income taxes: the surprising conformity of Italian cities and regions. Eur Phys J B 2014;87(11):261.
[47] Newcomb S. Note on the frequency of use of the different digits in natural numbers. Am J Math 1881;4:39-40.
[48] Nigrini M. Using digital frequencies to detect fraud, 8. The White Paper; 1994. p. 36.
[49] Nigrini MJ. A taxpayer compliance application of Benford's law. J Am Taxation Assoc 1996;18(1):72-91.
[50] Nigrini M. Benford's law: applications for forensic accounting, auditing, and fraud detection. Hoboken, N.J.: Wiley; 2012.
[51] Nigrini MJ, Mittermeir LJ. The use of Benford's law as an aid in analytical procedures. Auditing 1997;16:52-67.
[52] Nye J, Moul C. The political economy of numbers: on the application of Benford's law to international macroeconomic statistics. BE J Macroeconomics 2007;7(1).
[53] Othman R, Aris NA, Mardziyah A, Zainan N, Amin NM. Fraud detection and prevention methods in the Malaysian public sector: accountants' and internal auditors' perceptions. Procedia Econ Finance 2015;28:59-67.
[54] Padovani E, Scorsone E. Measuring financial health of local governments: a comparative framework. Year Book of Swiss Administrative Sciences; 2011.
[55] Palmer RG. Broken ergodicity. Adv Phys 1982;31(6):669-735.
[56] Pentland BT, Carlile P. Audit the taxpayer, not the return: tax auditing as an expression game. Accounting, Organiz Soc 1996;21(2-3):269-87.
[57] Pimbley JM. Benford's law and the risk of financial fraud. Risk Professional; 2014. p. 17.
[58] Pinkham RS. On the distribution of first significant digits. Ann Math Stat 1961;32(4):1223-30.
[59] Pollach G, Jung K, Namboya F, Pietruck C. Maternal mortality rate. A reliable indicator? Int J Clin Med 2015;6:342-6.
[60] Puyou F-R. Ordering collective performance manipulation practices: how do leaders manipulate financial reporting figures in conglomerates? Crit Perspect Accounting 2014;25(6):469-88.
[61] Raimi A. The first digit phenomenon again. Proc Am Philos Soc 1985;129(2):211-19.
[62] Rauch B, Göttsche M, Brähler G, Engel S. Fact and fiction in EU-governmental economic data. Ger Econ Rev 2011;12(3):243-55.
[63] Sambridge M, Tkalcic H, Jackson A. Benford's law in the natural sciences. Geophys Res Lett 2010;37:L22301.
[64] Samuelson PA. Classical and neoclassical theory, in monetary theory. Penguin Books, London; 1969. p. 12.
[65] Thomas JK. Unusual patterns in reported earnings. Accounting Rev 1989;54(4):773-87.
[66] Tsallis C, Anteneodo C, Borland L, Osorio R. Nonextensive statistical mechanics and economics. Physica A 2003;324(1):89-100.
[67] Varian H. Benford's law. Am Stat 1972;26:65-6.
[68] Wadhwa L, Pal V. Forensic accounting and fraud examination in India. Int J Appl Eng Res 2012;7(11):2006-9.
