Chaos, Solitons and Fractals: Marcel Ausloos, Roy Cerqueti, Tariq A. Mir
Data science for assessing possible tax income manipulation: The case
of Italy
Marcel Ausloos a,b, Roy Cerqueti c, Tariq A. Mir d
a Institute of Accounting and Finance, School of Management, University of Leicester, University Road, Leicester LE1 7RH, UK
b GRAPES, rue de la Belle Jardiniere, B-4031 Liege, Federation Wallonie-Bruxelles, Belgium
c University of Macerata, Department of Economics and Law, via Crescimbeni 20, Macerata I-62100, Italy
d Nuclear Research Laboratory, Astrophysical Sciences Division, Bhabha Atomic Research Center, Srinagar 190 006, Jammu and Kashmir, India
ARTICLE INFO
Article history: Received 5 July 2017; Accepted 15 August 2017
JEL classification: H71; C82
Keywords: Data science; Benford law; Aggregated income tax; Data manipulation; Italy

ABSTRACT
This paper explores a real-world fundamental theme under a data science perspective. It specifically discusses whether fraud or manipulation can be observed in and from municipality income tax size distributions, through their aggregation from citizen fiscal reports. The study case pertains to official data obtained from the Italian Ministry of Economics and Finance over the period 2007–2011. All (20) Italian regions are considered. The data science approach adopted here takes the Benford first digit law as its quantitative tool. Marked disparities are found for several regions, leading to unexpected conclusions. The most eyebrow-raising regions are not the expected ones according to classical imagination about Italy's financial shadow matters.
© 2017 Elsevier Ltd. All rights reserved.

http://dx.doi.org/10.1016/j.chaos.2017.08.012
0960-0779/© 2017 Elsevier Ltd. All rights reserved.
M. Ausloos et al. / Chaos, Solitons and Fractals 104 (2017) 238–256

Table 3
$N_{c,r}$, the number of cities in each region; (rounded) AIT in successive years and average $\langle AIT \rangle$ of a region over the quinquennium. The AIT and $\langle AIT \rangle$ for IT are in (e+11) EUR, but for regions in (e+10) EUR.
 i   Nc,r  Region          2007    2008    2009    2010    2011    5 yr
 1   305   ABRUZZO         1.2871  1.2850  1.3053  1.3288  1.3613  1.3135
 2   131   BASILICATA      0.4580  0.4735  0.4820  0.4834  0.4911  0.4776
 3   409   CALABRIA        1.3404  1.3961  1.4411  1.4496  1.4516  1.4158
 4   551   CAMPANIA        4.1890  4.2908  4.3589  4.3833  4.3863  4.3217
 5   348   EM. ROMAGNA     6.2211  6.3448  6.2945  6.3665  6.3970  6.3248
 6   218   FRIULI V.G.     1.6876  1.7121  1.7163  1.7214  1.7323  1.7139
 7   378   LAZIO           7.1759  7.3436  7.4487  7.5532  7.6163  7.4275
 8   235   LIGURIA         2.2020  2.2402  2.2829  2.2958  2.3003  2.2642
 9   1544  LOMBARDIA       14.457  14.737  14.561  14.771  15.008  14.707
10   239   MARCHE          1.7710  1.8045  1.7977  1.8232  1.8567  1.8106
11   136   MOLISE          0.2736  0.2823  0.2825  0.2827  0.2865  0.2815
12   1206  PIEMONTE        5.9479  6.0326  5.9797  6.0515  6.1201  6.0264
13   258   PUGLIA          3.1445  3.2563  3.3082  3.3557  3.3947  3.2919
14   377   SARDEGNA        1.4896  1.5510  1.5789  1.5890  1.5977  1.5612
15   390   SICILIA         3.6977  3.8324  3.9066  3.9256  3.9451  3.8615
16   287   TOSCANA         4.7404  4.8175  4.8417  4.8943  4.9499  4.8487
17   333   TRENTINO A.A.   1.3967  1.4379  1.4808  1.5148  1.5360  1.4733
18   92    UMBRIA          1.0167  1.0432  1.0539  1.0624  1.0702  1.0493
19   74    V. D'AOSTA      0.1795  0.1849  0.1889  0.1911  0.1923  0.1873
20   581   VENETO          6.2346  6.3244  6.2912  6.3808  6.4845  6.3431
     8092  IT              6.8910  7.0390  7.0601  7.1424  7.2178  7.0701

Table 4
Summary of (rounded) statistical characteristics for AIT (in Euros) of the IT regions ($N_r = 20$) in 2007–2011.
                     2007    2008    2009    2010    2011    <AIT>
Min. (/e+09)         1.7950  1.8495  1.8891  1.9110  1.9230  1.8735
Max. (/e+10)         14.457  14.737  14.561  14.771  15.008  14.707
Sum (/e+10)          68.910  70.390  70.601  71.424  72.178  70.701
Mean (/e+10)         3.4455  3.5195  3.5300  3.5712  3.6089  3.5350
Median (/e+10)       1.9865  2.0223  2.0403  2.0595  2.0785  2.0374
RMS (/e+10)          4.7793  4.8753  4.8601  4.9227  4.9831  4.8840
Std Dev (/e+10)      3.3982  3.4613  3.4274  3.4761  3.5254  3.4576
Std Error (/e+09)    7.5986  7.7397  7.6639  7.7729  7.8831  7.7314
Skewness             1.7795  1.7784  1.7406  1.7478  1.7672  1.7627
Kurtosis             3.4321  3.4359  3.2850  3.3097  3.3902  3.3708

4. Results

Table 3 gives the (rounded) AIT in successive years and the average $\langle AIT \rangle$ of each region over the quinquennium, in (e+10) EUR for the regions and in (e+11) EUR for IT. The statistical characteristics of the AIT regional distribution for 2007–2011 are reported in Table 4, together with the corresponding time average. The skewness and kurtosis are both positive, and the mean is greater than the median (by a factor of about 1.75). Relevantly, it can be observed that the minimum and maximum AIT both have 1 as first digit.

4.1. BL1 displays

Each BL1 data set for each of the 20 regions is displayed in Figs. 1–20; different symbols are used in order to distinguish the 5 years so examined. On each figure, the frequency of the digit d is given, together with the theoretical BL1. Moreover, the sample standard error bar, which depends on the number of cities, i.e. $(1/(N_{c,r} - 1))$, allows one to draw the estimated range of the confidence interval, defined as [BL1−, BL1+] in obvious notation. The display order has been chosen according to the region ranking given in Table 2.
Denote the first digit as d, so that d = 1, ..., 9. Visual observations (listed in descending order of $N_{c,r}$) point to:

Lombardia: rather smooth distribution, not too far from confidence interval (CI); d = 6, 7 somewhat away from CI; above BL1+.
Piemonte: rather smooth distribution, not too far from CI; d = 5, 6 somewhat away from CI, specifically above BL1+.
Veneto: rather smooth distribution, not too far from CI, even if d = 4, 5 are somewhat away from CI, on different sides of BL1. d = 9 is much dispersed.
Campania: very scattered, except d = 1, 2; other digits are far from CI.
Calabria: rather smooth distribution, not too far from CI; d = 6, 7 somewhat away from CI, above BL1+; the digit d = 2 is below BL1−.
Sicilia: d = 1, 4 below BL1−; d = 7 much dispersed.
Lazio: rather smooth distribution, not too far from CI, even if d = 4, 5, 6 are somewhat away from CI (below BL1−).
Sardegna: very scattered, except for d = 1, 2.
Emilia-Romagna: rather smooth distribution, not too far from CI. d = 3, 4, 5 are somewhat away from CI and below BL1−. The digit d = 8 is quite above BL1+.
Trentino-Alto Adige: rather smooth distribution, not too far from CI, but d = 4 somewhat away from CI (above BL1+). d = 7 is quite above BL1+, while d = 6 is somewhat below BL1−.
Abruzzo: rather smooth distribution, but scattered away from CI; on different sides of BL1.
Toscana: much dispersed; d = 3, 4, 6, 7 below BL1−; d = 8, 9 quite above BL1+.
Puglia: much dispersed; d = 3, 6, 7 much below BL1−; the digits d = 2, 9 are quite above BL1+.
Marche: very much dispersed; the digits d = 3, 6 are much below BL1−, while d = 4, 5, 8, 9 are quite above BL1+.
Liguria: very much dispersed. The digits d = 3, 6, 8 are much below BL1−, while d = 2, 4, 6, 9 are quite above BL1+.
Friuli-Venezia Giulia: rather smooth distribution, but rather far from CI; d = 4, 5, 7 quite away from CI, below BL1−; d = 2, 3, 8 rather away above BL1+.
Molise: very much scattered from BL1. The digits d = 1, 2 are very much below BL1−, while d = 3, 6 are very much above BL1+. Wide variation for other digits.
Basilicata: much scattered from BL1. The digits d = 4, 8 are much below BL1−, while d = 5, 6 are much above BL1+. A wide variation for all digits is observed.
Umbria: much scattered from BL1. The digits d = 4, 6, 7, 8, 9 are much below BL1−, while d = 1, 2, 5 are much above BL1+. Also in this case, there is wide variation for all digits.
Valle d'Aosta: much scattered from BL1. d = 4, 8 are much below BL1−, and d = 2, 5, 7 are much above BL1+. Once again, there is wide variation for all digits.

4.2. $\chi^2$ conformity test

It has been emphasized that there are five $\chi^2$ values to be calculated for each region, one for each year. These one hundred $\chi^2$ values are found in Tables 5–7. Moreover, the resulting $\chi^2$ for the time-averaged values of the AIT is also given; recall that it is reported in Table 2, to simplify the reading of Tables 5–7.
However, before examining each region's BL1, it seems fair to examine whether the distribution of the $\chi^2$ itself is anomalous, in order to pinpoint (if it occurs) some statistical anomaly arising from computations. Such a distribution of the 100 (20 regions, 5 years) $\chi^2$ values is shown in Fig. 21.
The distribution is markedly positively skewed, but rather irregular, with a set of outliers. Recall that the critical value of the $\chi^2$ at a significance level 0.05 is $\chi^2_{0.05} = 15.51$ when the number of degrees of freedom is $\nu = 8$, that of BL1. This rather highly peaked distribution with a few outliers (in particular Sardegna and Liguria,
[Figs. 1–20: one panel per region. x-axis: Digit; y-axis: Proportion. Each panel shows the first-digit frequencies for 2007, 2008, 2009, 2010 and 2011, together with BL1, BL1− and BL1+.]
Fig. 1. Lombardia (Nc = 1546). Fig. 2. Piemonte (Nc = 1206). Fig. 3. Veneto (Nc = 581). Fig. 4. Campania (Nc = 551). Fig. 5. Calabria (Nc = 409). Fig. 6. Sicilia (Nc = 390). Fig. 7. Lazio (Nc = 378). Fig. 8. Sardegna (Nc = 377).
Other panel titles: Emilia-Romagna (Nc = 348); Abruzzo (Nc = 305); Toscana (Nc = 287); Puglia (Nc = 258); Marche (Nc = 246); Liguria (Nc = 235); Molise (Nc = 136); Basilicata (Nc = 131); Umbria (Nc = 92).
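The theoretical BL1 curve drawn in each figure can be sketched in a few lines. The block below is only an illustration: it assumes the standard Benford first-digit probabilities, log10(1 + 1/d), and the helper names `benford_first_digit` and `first_digit` are hypothetical, not taken from the paper. It also checks the remark in Section 4 that the minimum and maximum regional AIT of 2007 (Table 4) both start with the digit 1.

```python
import math


def benford_first_digit(d: int) -> float:
    """Expected BL1 proportion for a first digit d in 1..9."""
    return math.log10(1.0 + 1.0 / d)


def first_digit(x: float) -> int:
    """Leading significant digit of a non-zero number.

    Scientific notation puts the leading digit in the first character.
    """
    return int(f"{abs(x):.15e}"[0])


# Theoretical BL1 proportions, as plotted in each panel (hypothetical name).
BL1 = {d: benford_first_digit(d) for d in range(1, 10)}

# 2007 minimum and maximum regional AIT from Table 4, in EUR.
ait_min_2007 = 1.7950e9
ait_max_2007 = 14.457e10

for d, p in BL1.items():
    print(f"d = {d}: BL1 = {p:.3f}")
print(first_digit(ait_min_2007), first_digit(ait_max_2007))
```

The nine proportions sum to one, which is a quick sanity check on any table of observed first-digit frequencies as well.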
Table 5
Frequencies of the first digit in reported AIT data for IT regions with a high Nc number, for various years; they can be visually compared to the expected frequencies according to BL given in Table 1; the corresponding calculated $\chi^2$ is to be compared with the theoretical one (15.507) for a number of degrees of freedom ($\nu = 8$) at the 0.05 significance level. The null hypothesis is not verified for underlined cases.
Digit 1 2 3 4 5 6 7 8 9 $\chi^2$
Year
Lombardia, Nc = 1546 for 2007–2009; Nc = 1544 for 2010–2011
2007 0.298 0.168 0.118 0.092 0.087 0.074 0.064 0.050 0.048 5.297
2008 0.297 0.175 0.115 0.089 0.085 0.073 0.070 0.045 0.051 9.692
2009 0.299 0.176 0.114 0.093 0.085 0.072 0.070 0.048 0.043 7.348
2010 0.296 0.172 0.119 0.096 0.082 0.071 0.069 0.054 0.043 4.674
2011 0.300 0.172 0.117 0.097 0.078 0.074 0.067 0.051 0.043 4.836
Piemonte, Nc = 1206
2007 0.304 0.185 0.119 0.091 0.088 0.061 0.064 0.052 0.036 6.279
2008 0.300 0.186 0.112 0.096 0.085 0.065 0.065 0.044 0.047 5.175
2009 0.303 0.180 0.110 0.095 0.090 0.061 0.061 0.053 0.046 4.925
2010 0.304 0.185 0.111 0.094 0.086 0.070 0.055 0.054 0.041 4.327
2011 0.299 0.186 0.114 0.095 0.076 0.080 0.056 0.051 0.042 5.709
Veneto, Nc = 581
2007 0.310 0.165 0.136 0.117 0.057 0.057 0.062 0.040 0.057 11.327
2008 0.313 0.167 0.139 0.107 0.062 0.062 0.065 0.041 0.043 6.251
2009 0.320 0.170 0.138 0.105 0.062 0.064 0.062 0.043 0.036 6.309
2010 0.325 0.172 0.138 0.103 0.067 0.062 0.055 0.053 0.024 9.569
2011 0.322 0.170 0.134 0.105 0.071 0.057 0.060 0.048 0.033 5.492
Campania, Nc = 551
2007 0.309 0.160 0.096 0.094 0.067 0.102 0.051 0.054 0.067 21.647
2008 0.314 0.160 0.093 0.100 0.065 0.100 0.044 0.062 0.064 23.021
2009 0.310 0.176 0.091 0.091 0.071 0.093 0.051 0.060 0.058 14.559
2010 0.310 0.172 0.091 0.087 0.082 0.087 0.049 0.065 0.056 13.556
2011 0.310 0.172 0.087 0.083 0.087 0.078 0.060 0.058 0.064 13.336
Calabria, Nc = 409
2007 0.298 0.152 0.132 0.098 0.073 0.073 0.068 0.046 0.059 4.439
2008 0.301 0.164 0.127 0.093 0.073 0.068 0.073 0.042 0.059 4.514
2009 0.298 0.164 0.120 0.100 0.064 0.076 0.071 0.049 0.059 4.939
2010 0.308 0.166 0.115 0.103 0.061 0.081 0.073 0.051 0.042 5.420
2011 0.301 0.166 0.117 0.095 0.073 0.086 0.064 0.054 0.044 3.020
Sicilia, Nc = 390
2007 0.287 0.195 0.123 0.077 0.085 0.079 0.051 0.049 0.054 4.614
2008 0.272 0.205 0.123 0.077 0.082 0.069 0.067 0.044 0.062 7.728
2009 0.279 0.203 0.123 0.077 0.079 0.072 0.064 0.051 0.051 4.421
2010 0.282 0.197 0.131 0.074 0.085 0.056 0.079 0.038 0.056 9.723
2011 0.290 0.195 0.126 0.079 0.077 0.067 0.069 0.038 0.059 5.761
Lazio, Nc = 378
2007 0.304 0.193 0.119 0.095 0.061 0.058 0.053 0.056 0.061 4.980
2008 0.310 0.188 0.127 0.093 0.069 0.050 0.066 0.042 0.056 4.360
2009 0.315 0.193 0.116 0.087 0.082 0.048 0.061 0.040 0.058 5.894
2010 0.320 0.190 0.119 0.093 0.077 0.053 0.056 0.034 0.058 5.613
2011 0.312 0.198 0.127 0.079 0.077 0.061 0.045 0.048 0.053 4.297
noticed in different fiscal years) is another hint toward pursuing a BL1 analysis on a regional basis.
To emphasize the regional aspects, and connecting with the last column of Table 2, the distribution of the mean (average over the quinquennium) $\langle \chi^2 \rangle$ for each region is shown in Fig. 22. Three regions are marked outliers in the upper range: Sardegna, Campania and Liguria, with approximately $\langle \chi^2 \rangle \geq 15$, pointing to much (questionable) non-conformity with BL1. On the contrary, 2 regions have a low $\langle \chi^2 \rangle$ over the quinquennium, Abruzzo and Calabria, indicating very fine agreement with BL1 ($\langle \chi^2 \rangle \approx 4$).
At this stage, it seems important to point to a quite substantial time-invariance of the $\chi^2$ values, according to the displayed data.
Thereafter, it seems natural to discuss the BL1 values of each region, to notice conformity or not, starting from the most valid and ending with the most anomalous. In so doing, one can distinguish values according to the standard risk value for rejecting the null hypothesis, i.e. a uniform distribution, at $\chi^2_{0.05} = 15.507$ for a number of degrees of freedom $\nu = 8$; for completeness, let us observe that (for $\nu = 8$) the critical value of the $\chi^2$ at a significance level 0.10 is $\chi^2_{0.10} = 13.362$.

4.2.1. $\chi^2 \leq 13.362$, and yearly (ir)regularity

It is found from Table 2 that the BL1 $\chi^2$ has a small value in 17 regions; from the lowest to the highest: Calabria (4.4664), Abruzzo (4.7372), Lazio (5.0288), Piemonte (5.2830), Trentino-Alto Adige (5.3782), Puglia (5.8868), Lombardia (6.3694), Sicilia (6.4494), Basilicata (7.5054), Valle d'Aosta (7.6682), Veneto (7.7896), Friuli-Venezia Giulia (7.8444), Emilia-Romagna (8.1620), Marche (8.6922), Toscana (9.5140), Umbria (11.317) and Molise (12.657). Among these, Tables 5–7 show that much regularity is observed in the respective yearly values, with some exceptions.
The values for Veneto (11.33) in 2007, Emilia-Romagna (10.60) in 2010, and Valle d'Aosta (11.31) in 2010 are such that the respective $\chi^2$ fall outside the 5% sampling error bar. The largest $\chi^2$ value for Molise, 20.99, occurs in 2008, although a surprisingly small $\chi^2$ value, about 9.74, occurs in 2007. The $\chi^2$ values for Umbria are high, but without any severe hint of an anomaly; the distribution of $\chi^2$ values is indeed quite narrow for Umbria, implying some systematicity.
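The yearly statistic compared with these critical values can be sketched as follows. This is a minimal illustration under an assumption: the usual Pearson form, $\chi^2 = N_c \sum_d (f_d - BL1_d)^2 / BL1_d$, with $f_d$ the observed first-digit frequency; the function name `chi2_bl1` is hypothetical. Because the tabulated frequencies are rounded to three decimals, the result only approximates the published value (5.297 for Lombardia in 2007).

```python
import math

# Critical values quoted in the text for nu = 8 degrees of freedom.
CHI2_CRIT_05 = 15.507
CHI2_CRIT_10 = 13.362


def chi2_bl1(freqs, n_cities):
    """Pearson chi-square of observed first-digit frequencies against BL1.

    freqs: nine observed proportions for d = 1..9.
    n_cities: number of cities (sample size) in the region.
    """
    expected = [math.log10(1.0 + 1.0 / d) for d in range(1, 10)]
    return n_cities * sum((f - p) ** 2 / p for f, p in zip(freqs, expected))


# Lombardia, 2007 (Table 5), Nc = 1546.
lombardia_2007 = [0.298, 0.168, 0.118, 0.092, 0.087, 0.074, 0.064, 0.050, 0.048]
chi2_value = chi2_bl1(lombardia_2007, 1546)

# True when the statistic stays below the 0.05 critical value.
conforms = chi2_value < CHI2_CRIT_05
print(f"chi2 = {chi2_value:.3f}, below the 0.05 critical value: {conforms}")
```

Working from frequencies rather than raw counts keeps the sketch close to Tables 5–7; with the original city counts the rounding gap would disappear.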
Table 6
Frequencies of the first digit in reported AIT data for IT regions with a medium range Nc number, for various years; they can be visually compared to the expected frequencies according to BL given in Table 1; the corresponding calculated $\chi^2$ is to be compared with the theoretical one (15.507) for a number of degrees of freedom ($\nu = 8$) at the 0.05 significance level. The null hypothesis is always verified in these cases.
Digit 1 2 3 4 5 6 7 8 9 $\chi^2$
Year
Sardegna, Nc = 377
2007 0.310 0.167 0.093 0.053 0.050 0.119 0.064 0.109 0.034 56.001
2008 0.310 0.172 0.095 0.106 0.069 0.093 0.077 0.024 0.053 15.606
2009 0.316 0.175 0.101 0.095 0.085 0.069 0.093 0.032 0.034 13.908
2010 0.318 0.175 0.095 0.090 0.093 0.061 0.095 0.034 0.037 16.059
2011 0.314 0.174 0.103 0.087 0.092 0.066 0.103 0.029 0.032 21.363
Emilia-Romagna, Nc = 348
2007 0.296 0.201 0.103 0.078 0.069 0.069 0.057 0.066 0.060 7.516
2008 0.310 0.204 0.101 0.080 0.066 0.066 0.060 0.063 0.049 6.121
2009 0.305 0.201 0.103 0.072 0.075 0.060 0.060 0.060 0.063 8.040
2010 0.290 0.201 0.112 0.072 0.072 0.055 0.066 0.063 0.069 10.604
2011 0.299 0.195 0.115 0.072 0.060 0.069 0.063 0.072 0.055 8.529
Trentino-Alto Adige, Nc = 339
2007 0.322 0.159 0.127 0.100 0.077 0.065 0.062 0.062 0.027 4.713
2008 0.300 0.168 0.126 0.099 0.081 0.048 0.075 0.060 0.042 4.225
2009 0.285 0.171 0.135 0.105 0.078 0.048 0.084 0.039 0.054 7.975
2010 0.300 0.168 0.120 0.114 0.078 0.057 0.072 0.048 0.042 2.992
2011 0.294 0.156 0.117 0.120 0.087 0.048 0.072 0.048 0.057 6.986
Abruzzo, Nc = 305
2007 0.292 0.187 0.105 0.085 0.089 0.056 0.075 0.052 0.059 5.381
2008 0.295 0.170 0.111 0.089 0.098 0.059 0.062 0.056 0.059 3.852
2009 0.289 0.164 0.111 0.102 0.079 0.062 0.066 0.056 0.072 6.090
2010 0.298 0.154 0.121 0.092 0.085 0.069 0.066 0.052 0.062 3.252
2011 0.318 0.144 0.115 0.095 0.079 0.062 0.079 0.059 0.049 5.111
Toscana, Nc = 287
2007 0.331 0.188 0.125 0.059 0.066 0.070 0.035 0.059 0.066 11.580
2008 0.328 0.195 0.105 0.080 0.070 0.059 0.042 0.059 0.063 7.097
2009 0.321 0.206 0.094 0.084 0.073 0.056 0.038 0.066 0.063 10.148
2010 0.341 0.195 0.101 0.084 0.066 0.059 0.042 0.045 0.066 8.958
2011 0.369 0.185 0.101 0.091 0.073 0.042 0.056 0.038 0.045 9.787
Puglia, Nc = 258
2007 0.306 0.194 0.116 0.089 0.089 0.043 0.054 0.047 0.062 5.060
2008 0.295 0.213 0.101 0.101 0.074 0.078 0.039 0.047 0.054 5.989
2009 0.291 0.217 0.101 0.097 0.074 0.078 0.035 0.062 0.047 7.260
2010 0.302 0.209 0.105 0.101 0.078 0.074 0.031 0.062 0.039 6.800
2011 0.310 0.202 0.112 0.101 0.070 0.074 0.035 0.054 0.043 4.325
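The regional means quoted in the text can be recovered from the yearly values of Table 6. The sketch below assumes, as those values suggest, that the quinquennium mean is the plain arithmetic average of the five yearly $\chi^2$; the names `classify` and `yearly_chi2` and the wording of the labels are illustrative only.

```python
from statistics import mean

# Critical values quoted in the text for nu = 8 degrees of freedom.
CHI2_CRIT_10 = 13.362
CHI2_CRIT_05 = 15.507

# Yearly chi-square values (2007..2011) copied from Table 6.
yearly_chi2 = {
    "Sardegna": [56.001, 15.606, 13.908, 16.059, 21.363],
    "Emilia-Romagna": [7.516, 6.121, 8.040, 10.604, 8.529],
    "Trentino-Alto Adige": [4.713, 4.225, 7.975, 2.992, 6.986],
    "Abruzzo": [5.381, 3.852, 6.090, 3.252, 5.111],
    "Toscana": [11.580, 7.097, 10.148, 8.958, 9.787],
    "Puglia": [5.060, 5.989, 7.260, 6.800, 4.325],
}


def classify(mean_value):
    """Place a mean chi-square relative to the two critical values."""
    if mean_value >= CHI2_CRIT_05:
        return "above the 0.05 critical value"
    if mean_value > CHI2_CRIT_10:
        return "between the 0.10 and 0.05 critical values"
    return "at or below the 0.10 critical value"


mean_chi2 = {region: mean(values) for region, values in yearly_chi2.items()}
labels = {region: classify(value) for region, value in mean_chi2.items()}

for region in sorted(mean_chi2, key=mean_chi2.get):
    print(f"{region}: {mean_chi2[region]:.4f} ({labels[region]})")
```

Sorting by the mean reproduces the ordering used in the text, from the most conforming region to the most anomalous one.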
4.2.2. $\chi^2 \geq 15.507$, and yearly (ir)regularity

In contrast, Liguria (16.895), Campania (17.224), and Sardegna (24.587) are the 3 regions with an indication of much lack of conformity with respect to BL1.
The two largest $\chi^2$ values for Sardegna occur in 2007 and 2011: 56.00 and 21.36, respectively. The largest $\chi^2$ values for Campania occur in 2007 and 2008: 21.65 and 23.02, respectively. The largest $\chi^2$ value for Liguria occurs in 2008: 27.17; surprisingly, a quite small $\chi^2$ value, about 9.70, occurs in 2011.

5. Discussion

This section summarizes and discusses the results of the investigation.
In general, the concordance between the AIT of Italian regions and the theoretical statement of BL1 is rather questionable. There are discrepancies at a regional level, and this is in line with the heterogeneous nature of Italian regions from a socio-economic point of view. In particular, one can note a very good matching between geographic and economic features of the regions, and cluster them among N (North), C (Center), S (South, plus Sicilia and Sardegna).
N is the part of Italy constituted by 8 regions: Emilia-Romagna, Friuli-Venezia Giulia, Liguria, Lombardia, Piemonte, Trentino-Alto Adige, Valle d'Aosta and Veneto;
C contains 5 regions: Abruzzo, Lazio, Marche, Toscana and Umbria;
S is the remaining 7 regions: Basilicata, Calabria, Campania, Molise, Puglia, Sardegna, Sicilia.
Deviations from BL1 are usually read as data manipulation. However, the law is surely a subject of controversy in accounting. It is still not clear either why it should be valid at all, under whatever socio-economic conditions, or whether its theoretical derivation, under various hypotheses, informs us about its origins and its range of applications. Even Newcomb and Benford were dubious about its realm of validity. Therefore, a deep exploration of the regional reality behind the AIT data, of how they have been collected and of the shadow economy at a regional level is required to provide a rigorous interpretation of the results. This is well beyond the scope of this paper. We can only give some suggestions and discussions, to be taken as arguments for future studies.
Among the 3 regions with very anomalous BL1 $\chi^2$, two belong to S: Sardegna and Campania, while the other comes from N: Liguria.
Sardegna is characterized by a noticeable fragmentation at a city level into several municipalities with a very small number of inhabitants. This region does not have a highly developed industrial structure, and a wide part of the regional economy is still based on agriculture and livestock. In the small communities the economy is somewhat closed, and business exchanges are often based on commodities. In such a situation, one can guess that a discrepancy between the official data and the income tax should come from the real regional economy.
Sadly, Campania has the relevant problem of a massive influence of organized crime on the economic system. Hence, deviations from BL1 can be viewed as expected.
Table 7
Frequencies of the first digit in reported AIT data for IT regions with a low Nc number, for various years; they can be visually compared to the expected frequencies according to BL given in Table 1; the corresponding calculated $\chi^2$ is to be compared with the theoretical one (15.507) for a number of degrees of freedom ($\nu = 8$) at the 0.05 significance level. The null hypothesis is not verified for underlined cases.
Digit 1 2 3 4 5 6 7 8 9 $\chi^2$
Year
Marche, Nc = 239
2007 0.268 0.172 0.092 0.121 0.084 0.063 0.054 0.071 0.075 11.051
2008 0.285 0.172 0.092 0.113 0.084 0.071 0.059 0.054 0.071 6.486
2009 0.280 0.180 0.088 0.117 0.088 0.071 0.067 0.042 0.067 7.370
2010 0.285 0.180 0.096 0.096 0.109 0.054 0.071 0.038 0.071 9.946
2011 0.293 0.197 0.088 0.096 0.105 0.042 0.071 0.054 0.054 8.608
Liguria, Nc = 235
2007 0.272 0.209 0.085 0.089 0.077 0.098 0.043 0.047 0.081 15.920
2008 0.268 0.213 0.089 0.081 0.055 0.111 0.055 0.034 0.094 27.173
2009 0.294 0.209 0.077 0.098 0.060 0.094 0.060 0.034 0.077 15.719
2010 0.289 0.196 0.094 0.102 0.047 0.094 0.072 0.030 0.077 15.954
2011 0.306 0.191 0.089 0.115 0.047 0.089 0.060 0.043 0.060 9.707
Friuli-Venezia Giulia, Nc = 219 for 2007; Nc = 218 for 2008–2011
2007 0.279 0.210 0.142 0.078 0.055 0.059 0.068 0.064 0.046 6.075
2008 0.289 0.220 0.138 0.078 0.046 0.073 0.055 0.064 0.037 7.940
2009 0.280 0.220 0.142 0.078 0.041 0.069 0.064 0.064 0.041 8.993
2010 0.303 0.206 0.142 0.073 0.050 0.073 0.060 0.060 0.032 6.516
2011 0.284 0.211 0.151 0.069 0.055 0.073 0.041 0.073 0.041 9.698
Molise, Nc = 136
2007 0.243 0.125 0.191 0.118 0.074 0.074 0.074 0.051 0.051 9.741
2008 0.221 0.110 0.213 0.118 0.059 0.103 0.044 0.074 0.059 20.990
2009 0.250 0.110 0.184 0.110 0.081 0.103 0.051 0.066 0.044 11.890
2010 0.228 0.132 0.169 0.118 0.096 0.081 0.066 0.037 0.074 10.475
2011 0.235 0.140 0.169 0.096 0.103 0.081 0.081 0.029 0.066 10.190
Basilicata, Nc = 131
2007 0.290 0.160 0.130 0.061 0.122 0.076 0.061 0.023 0.076 9.965
2008 0.282 0.168 0.107 0.107 0.084 0.092 0.031 0.061 0.069 5.365
2009 0.290 0.160 0.122 0.099 0.061 0.099 0.053 0.053 0.061 3.567
2010 0.290 0.168 0.115 0.092 0.076 0.122 0.023 0.053 0.061 9.692
2011 0.305 0.160 0.099 0.115 0.053 0.122 0.046 0.046 0.053 8.938
Umbria, Nc = 92
2007 0.380 0.207 0.130 0.065 0.109 0.054 0.011 0.022 0.022 10.855
2008 0.370 0.207 0.130 0.065 0.120 0.043 0.022 0.022 0.022 10.348
2009 0.370 0.207 0.130 0.076 0.120 0.043 0.022 0.022 0.011 11.093
2010 0.370 0.217 0.120 0.076 0.109 0.065 0.011 0.000 0.033 12.352
2011 0.380 0.217 0.109 0.087 0.109 0.054 0.011 0.011 0.022 11.938
Valle d'Aosta, Nc = 74
2007 0.270 0.230 0.095 0.081 0.122 0.054 0.081 0.041 0.027 5.456
2008 0.270 0.243 0.108 0.068 0.108 0.054 0.081 0.014 0.054 6.760
2009 0.284 0.203 0.122 0.054 0.149 0.054 0.068 0.027 0.041 7.477
2010 0.284 0.216 0.108 0.027 0.162 0.068 0.054 0.041 0.041 11.309
2011 0.284 0.176 0.135 0.041 0.135 0.081 0.054 0.027 0.068 7.339
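The yearly (ir)regularity discussed for Liguria and Molise can be read directly off Table 7. The following sketch, with hypothetical helper names, locates the years of the largest and smallest $\chi^2$ for these two regions and counts the years in which the 0.05 critical value (15.507) is exceeded.

```python
YEARS = [2007, 2008, 2009, 2010, 2011]
CHI2_CRIT_05 = 15.507  # critical value quoted in the text for nu = 8

# Yearly chi-square values copied from Table 7.
yearly_chi2 = {
    "Liguria": [15.920, 27.173, 15.719, 15.954, 9.707],
    "Molise": [9.741, 20.990, 11.890, 10.475, 10.190],
}


def extreme_years(values):
    """Return ((year, max value), (year, min value)) for one region."""
    pairs = list(zip(YEARS, values))
    largest = max(pairs, key=lambda pair: pair[1])
    smallest = min(pairs, key=lambda pair: pair[1])
    return largest, smallest


def years_above_critical(values, critical=CHI2_CRIT_05):
    """Years in which the yearly chi-square exceeds the critical value."""
    return [year for year, value in zip(YEARS, values) if value > critical]


for region, values in yearly_chi2.items():
    largest, smallest = extreme_years(values)
    exceed = years_above_critical(values)
    print(f"{region}: max {largest}, min {smallest}, above critical in {exceed}")
```

For Liguria the critical value is exceeded in four of the five years, whereas for Molise only 2008 stands out, which is consistent with the two regions falling on different sides of the classification used in Sections 4.2.1 and 4.2.2.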
The economic system of Liguria seems to be affected by a pervasive shadow economy. Confartigianato (the Italian association of artisans and small businesses) states that about 73% of the artisans are in competition with illegal and shadow economies (see http://www.confartigianatoliguria.it/node/4153). This evidence represents a good hint for explaining why Liguria exhibits this discrepancy with respect to BL1.
Unexpectedly, we admit, the remaining regions are in accordance with BL1, with some disparities over the quinquennium as highlighted in the previous Section. From both an economic and a social point of view, some regions are quite similar to Campania, with a remarkable presence of organized crime (think of Calabria, Puglia and Sicilia, but also, in the North, of Veneto and Lombardia). Basilicata and Molise are similar to Sardegna as concerns the absence of a well-established industrial structure. It is also worth mentioning that Basilicata is the main producer of fossil fuels in Italy (see the report in http://www.siteb.it/new%20siteb/documenti/RASSEGNA/6711_7.pdf). Moreover, the shadow economy is generally widespread in the entire country.
To conclude: we provide some hints for further exploring cases of violations of BL1, and hence of likely tax income manipulation, through city Aggregated Income Tax reports throughout all Italian regions. We admit that not every finding can be explained based on BL1 only: we do not understand why some regions do not have BL1 violators; we avoid proposing speculative statements which might be said to result from imagination. Yet, this further supports the point that a deeper analysis should be carried out to investigate the nature of the fiscal data and how they are usually collected and approved [56].

6. Conclusions

Today Benford's law is routinely used by forensic analysts to detect error, incompleteness and dubious manipulation of financial data. The basic premise of the test is that the first digits in real data, in general, have a tendency to approach the Benford distribution, whereas people intending to play with the numbers, when unaware of the law, try to place the digits uniformly. Thus any departure from the law raises some suspicion. We have assessed the possible manipulation of tax income by citizens in Italy through city aggregated income tax reports from all Italian regions, with data obtained from the Research Center of the Italian MEF.
The validity of the reported data does not seem to have attracted the attention of official accountants. For example, something like Economia
e Finanza locale Rapporto 2010 or RAPPORTO ANNUALE 2012 La situazione del Paese fall short of discussing the data validity.
This paper provides an examination of a fiscal data set stemming from the Italian citizens, on a regional level. Specifically, it focuses on the assessment of potential manipulation of tax income through the adoption of the Benford law for the first digit over the quinquennium 2007–2011.
The BL1 presents significant advantages over alternative measures of accounting quality currently used in practice. For example, it does not require time-series, cross-sectional, or forward-looking information, nor details on transactions.
Throughout the paper we refer to municipalities, though in practice we are investigating the incomes of citizens, but we avoid any individual information on whether individuals correctly report their income. This is an important distinction to the extent that this is precisely the population set of interest. Though we find significant variations in municipality tax incomes by regions, much of the variation is actually attributable to differences in the characteristics of regions
Another purpose of this paper has been to provide a proof of the BL1 concept for using the fiscal data at a regional level in order to provide some information on manipulation. It is shown that it is

Fig. 22. Distribution of the mean (average over the quinquennium) $\langle \chi^2 \rangle$ values in BL1 analysis of AIT in the 20 IT regions.

Acknowledgments

Supplementary material

References

[1] Abrantes-Metz RM, Villas-Boas SB, Judge G. Tracking the libor rate. Appl Econ Lett 2011;18(10):893–9.
[2] Aggarwal R, Lucey BM. Psychological barriers in gold prices? Rev Financial Econ 2007;16(2):217–30.
[3] Alali FA, Romero S. Benford's law: analyzing a decade of financial data. J Emerging Technol Accounting 2013;10(1):1–39.
[4] Alexeev M, Janeba E, Osborne S. Taxation and evasion in the presence of extortion by organized crime. J Comp Econ 2004;32:375–87.
[5] Amiram D, Bozanic Z, Rouen E. Financial statement errors: evidence from the distributional properties of financial statement numbers. Rev Accounting Stud 2015;20(4):1540–93. Ibid. Erratum to: Financial statement errors: evidence from the distributional properties of financial statement numbers. Review of Accounting Studies, 20(4), 1594–1595.
[6] Armstrong CS, Blouin JL, Jagolinzer AD, Larcker DF. Corporate governance, incentives, and tax avoidance. J Accounting Econ 2015;60(1):1–17.
[7] Ausloos M, Herteliu C, Ileanu B. Breakdown of Benford's law for birth data. Physica A 2015;419:736–45.
[8] Ausloos M, Castellano R, Cerqueti R. Regularities and discrepancies of credit default swaps: a data science approach through Benford's law. Chaos, Solitons and Fractals 2016. doi:10.1016/j.chaos.2016.03.002.
[9] Bartolini D, Santolini R. Political yardstick competition among Italian municipalities on spending decisions. Ann Reg Sci 2012;49(1):213–35.
[10] Beebe NHF. A bibliography of publications about Benford's law, Heaps' law, and Zipf's law. 2016. http://ftp.math.utah.edu/pub/tex/bib/benfords-law.pdf.
[11] Benford F. The law of anomalous numbers. Proc Am Philos Soc 1938;74(8):551–72.
[12] Bierstaker JL, Brody RG, Pacini C. Accountants' perceptions regarding fraud detection and prevention methods. Managerial Auditing J 2006;21(5):520–35.
[13] Bolton RJ, Hand DJ. Unsupervised profiling methods for fraud detection. Credit Scoring Credit Control 2001;VII:235–55.
[14] Bolton RJ, Hand DJ. Statistical fraud detection: a review. Stat Sci 2002;17(3):235–49.
[15] Brosio G, Cassone A, Ricciuti R. Tax evasion across Italy: rational noncompliance or inadequate civic concern. Public Choice 2002;112(3):259–73.
[16] Calderoni F. Where is the mafia in Italy? Measuring the presence of the mafia across Italian provinces. Global Crime 2011;12(1):41–69.
[17] Carbone A, Jensen M, Sato AH. Challenges in data science: a complex systems perspective. Chaos, Solitons Fractals 2016;90:1–7.
[18] Carrera C. Tracking exchange rate management in Latin America. Rev Financial Econ 2015;25:35–41.
[19] Ciaponi F, Mandanici F. Using digital frequencies to detect anomalies in receivables and payables: an analysis of the Italian universities. Ekonomski i socijalni razvoj 2015;2(1):86–108.
[20] Chiarini B, Marzano E, Schneider F. Tax rates and tax evasion: an empirical analysis of the long-run aspects in Italy. Eur J Law Econ 2013;35(2):273–93.
[21] Cleary R, Thibodeau JC. Applying digital analysis using Benford's law to detect fraud: the dangers of type I errors. Auditing A J Pract Theory 2005;24(1):77–81.
[22] Clippe P, Ausloos M. Benford's law and Theil transform of financial data. Physica A 2012;391(24):6556–67.
[23] Cooper DJ, Dacin T, Palmer D. Fraud in accounting, organizations and
[40] Lucas RE, Sargent T. After Keynesian macroeconomics. Rational expectations and econometric practice 1. London: George Allen & Unwin; 1981. p. 295–319.
[41] Luippold BL, Kida T, Piercey MD, Smith JF. Managing audits to manage earnings: the impact of diversions on an auditor's detection of earnings management. Accounting, Organiz Soc 2015;41(2):39–54.
[42] Lusk EJ, Halperin M. Detecting Newcomb-Benford digital frequency anomalies in the audit context: suggested chi2 test possibilities. Accounting Finance Res 2014;3(2):191–205.
[43] Michalski T, Stoltz G. Do countries falsify economic data strategically? Some evidence that they might. Rev Econ Stat 2013;95:591–616.
[44] Mir TA. The law of the leading digits and the world religions. Physica A 2012;391:792–8.
[45] Mir TA. The leading digit distribution of the worldwide illicit financial flows. Qual Quant 2016;50(2016):271–81.
[46] Mir TA, Ausloos M, Cerqueti R. Benford's law predicted digit distribution of aggregated income taxes: the surprising conformity of Italian cities and regions. Eur Phys J B 2014;87(11):261.
[47] Newcomb S. Note on the frequency of use of the different digits in natural numbers. Am J Math 1881;4:39–40.
society: extending the boundaries of research. Accounting, OrganizSoc [48] Nigrini M. Using digital frequencies to detect fraud, 8. The White Paper; 1994.
2013;38(6):44057. p. 36.
[24] Costa JIF, Santos J, Travassos SKM. An analysis of federal entities compliance [49] Nigrini MJ. A taxpayer compliance application of Benfords law. J Am Taxation
with public spending: applying the newcomb-Benford law to the 1st and Assoc 1996;18(1):7291.
2nd digits of spending in two brazilian states. Revista Contabilidade & Fi- [50] Nigrini M. BenfordS law: applications for forensic accounting, auditing, and
nanas-USP 2012;23(60):18798. fraud detection. Hoboken, N.J.: Wiley; 2012.
[25] Costa JIF, Travassos, M SK, Santos J. Application of newcomb-Benford law in [51] Nigrini MJ, Mittermeir LJ. The use of Benfords law as an aid in analytical pro-
accounting audit: a bibliometric analysis in the period from 1988 to 2011. 10th cedures. Auditing 1997;16:5267.
International conference on information systems and technology management, [52] Nye J, Moul C. The political economy of numbers: on the application of Ben-
June 1214 (2013). Sao Paulo, Brazil; 2013. fords law to international macroeconomic statistics. BE J Macroeconomics
[26] Davidson P. Sensible expectations and the long-run non-neutrality of money. J 2007;7(1).
Post Keynesian Econ 1987;10(1):14653. [53] Othman R, Aris NA, Mardziyah A, Zainan N, Amin NM. Fraud detection and
[27] Davidson P. Reality and economic theory. J Post Keynesian Econ prevention methods in the Malaysian public sector: accountants and internal
1996;18(4):479508. auditors perceptions. Procedia Econ Finance 2015;28:5967.
[28] Davidson P. The keynes solution: the path to global economic prosperity. Pal- [54] Padovani E, Scorsone E. Measuring nancial health of local governments a
grave/Macmillan; 2009. comparative framework. Year Book of Swiss Administrative Sciences; 2011.
[29] Durtschi C, Hillison W, Pacini C. The effective use of Benfords law to assist in [55] Palmer RG. Broken ergodicity. Adv Phys 1982;31(6):669735.
the detecting of fraud in accounting data. J Forensic Accounting 2004;5:1734. [56] Pentland BT, Carlile P. Audit the taxpayer, not the return: tax auditing as an
[30] Fiorio CV, DAmuri F. Workers tax evasion in Italy. Giornale degli Economisti e expression game. Accounting, Organiz Soc 1996;21(23):26987.
Annali di Economia 2005;64(2/3):24770. [57] Pimbley JM. Benfords law and the risk of nancial fraud. Risk Professional;
[31] Fu D, Shi YQ, Su W. A generalized Benfords law for JPEG coecients and its 2014. p. 17.
applications in image forensics. Electron Imaging 2007. 65051L-65051L Inter- [58] Pinkham RS. On the distribution of rst signicant digits. Ann Math Stat
national Society for Optics and Photonics. 1961;32(4):122330.
[32] Galbiati R, Zanella G. The tax evasion social multiplier: evidence from Italy. J [59] Pollach G, Jung K, Namboya F, Pietruck C. Maternal mortality rate. a reliable
Public Econ 2012;96(5):48594. indicator? Int J Clin Med 2015;6:3426.
[33] Gava AM, Vitiello L. Ination, quarterly balance sheets and the possibility [60] Puyou F-R. Ordering collective performance manipulation practices: how do
of fraud: Benfords law and the brazilian case. J Accounting, Bus Manage leaders manipulate nancial reporting gures in conglomerates? Crit Perspect
2014;21:4352. Accounting 2014;25(6):46988.
[34] Guan L, He SD, Mc, Eldowney J. Window dressing in reported earnings. Com- [61] Raimi A. The rst digit phenomenon again. Proc Am Philos Soc
mer Lending Rev 2008;23(3):2833. 1985;129(2):21119.
[35] Haynes AH. Detecting fraud in bankrupt municipalities using Benfords [62] Rauch B, Gttsche M, Brhler G, Engel S. Fact and ction in EU-governmental
law; 2012. Scripps Senior Theses. Paper 42. available at http://scholarship: economic data. Ger Econ Rev 2011;12(3):24355.
claremont:edu/scripps_-theses/42. [63] Sambridge M, Tkalcic H, Jackson A. Benfords law in the natural sciences. Geo-
[36] Hill TP. The rst digit phenomenon a century-old observation about an unex- phys Res Lett 2010;37. L22301
pected pattern in many numerical tables applies to the stock market, census [64] Samuelson PA. Classical and neoclassical theory, in monetary theory. Penguin
statistics and accounting data. Am Sci 1998;86(4):35863. Books, London; 1969. P.12
[37] Holz CA. The quality of Chinas GDP statistics. China Econ Rev 2014;30:30938. [65] Thomas JK. Unusual patterns in reported earnings. Accounting Rev
[38] Johnson GG, Weggenmann J. Exploratory research applying Benfords law to 1989;54(4):77387.
selected balances in the nancial statements of state governments. Acad Ac- [66] Tsallis C, Anteneodo C, Borland L, Osorio R. Nonextensive statistical mechanics
counting Financial StudJ 2013;17(1):3144. and economics. Physica A 20 03;324(1):8910 0.
[39] Lin CC, Chiu AA, Huang SYY, Yen DC. Detecting the nancial statement fraud: [67] Varian H. Benfords law. Am Stat 1972;26:656.
the analysis of the differences between data mining techniques and experts [68] Wadhwa L, Pal V. Forensic accounting and fraud examination in India. Int J
judgments. Knowl Based Syst 2015;89:45970. Appl EngRes 2012;7(11):20069.