Predicting Poverty and Wealth From Mobile Phone Metadata: Joshua Blumenstock, Gabriel Cadamuro, Robert On
Table 1. Summary statistics for primary data sets. Phone survey data were collected by the authors in Kigali, in collaboration with the Kigali Institute of
Science and Technology. Call detail records were collected by the primary mobile phone operator in Rwanda at the time of the phone survey. Demographic
and Health Survey (DHS) data were collected by the Rwandan National Institute of Statistics. N/A, not applicable.
Summary statistic | Phone survey | Call detail records | DHS (2007) | DHS (2010)
Fig. 2. Construction of high-resolution maps of poverty and wealth from call records. Information derived from the call records of 1.5 million
subscribers is overlaid on a map of Rwanda. The northern and western provinces are divided into cells (the smallest administrative unit of the country), and
the cell is shaded according to the average (predicted) wealth of all mobile subscribers in that cell. The southern province is overlaid with a Voronoi division
that uses geographic identifiers in the call data to segment the region into several hundred thousand small partitions. (Bottom right inset) Enlargement of
a 1-km² region near Kiyonza, with Voronoi cells shaded by the predicted wealth of small groups (5 to 15 subscribers) who live in each region.
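The Voronoi shading described in the Fig. 2 caption can be illustrated with a minimal sketch. It assumes that the geographic identifiers resolve to point locations (for example, cell towers) and that each subscriber has a single home location and a predicted wealth score; the coordinates, scores, and function names below are hypothetical and are not taken from the study. A subscriber belongs to the Voronoi cell of the nearest site, and each cell is summarized by the mean predicted wealth of its subscribers.

```python
from collections import defaultdict
from math import hypot


def nearest_site(point, sites):
    """Return the index of the site closest to `point` (Euclidean distance).

    By definition, a point lies in the Voronoi cell of its nearest site.
    """
    return min(
        range(len(sites)),
        key=lambda i: hypot(point[0] - sites[i][0], point[1] - sites[i][1]),
    )


def mean_wealth_by_cell(sites, subscribers):
    """Average the predicted wealth of the subscribers in each Voronoi cell.

    sites:       list of (x, y) site coordinates (hypothetical tower locations)
    subscribers: list of ((x, y), predicted_wealth) pairs
    Returns a dict mapping site index -> mean predicted wealth.
    Cells that contain no subscribers are omitted.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for location, wealth in subscribers:
        cell = nearest_site(location, sites)
        totals[cell] += wealth
        counts[cell] += 1
    return {cell: totals[cell] / counts[cell] for cell in counts}


# Hypothetical example: three sites and four subscribers.
sites = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
subscribers = [
    ((0.1, 0.1), 1.0),
    ((0.2, 0.0), 3.0),
    ((0.9, 0.1), -1.0),
    ((0.1, 0.8), 0.5),
]
cell_means = mean_wealth_by_cell(sites, subscribers)
```

The brute-force nearest-site search is chosen for clarity only; with hundreds of thousands of partitions a spatial index would likely be needed.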
that are not predictive of wealth. The first step employs a structured, combinatorial method to automatically generate several thousand metrics from the phone logs that quantify factors such as the total volume, intensity, timing, and directionality of communication; the structure of the individual's contact network; patterns of mobility and migration based on geospatial markers in the data; and so forth. The second step uses elastic net regularization to eliminate irrelevant phone metrics and select a parsimonious model that is more likely to generalize (23). We use cross-validation to limit the possibility that the model is overfit on the small sample on which it is trained. In the supplementary materials (section 3B), we provide details on these methods and show that comparable results are obtained under a variety of alternative supervised-learning models, including tree-based ensemble regressors and classifiers (24). We also show that this two-step approach to feature engineering and model selection performs significantly better than a more intuitive approach based on a small number of hand-crafted metrics (table S1).

In addition to predicting composite wealth, this same approach can be used to estimate, with varying degrees of accuracy, how a phone survey participant will respond to any question, such as whether the respondent owns a motorcycle or has electricity in the household (Fig. 1B and table S1). Cross-validated area-under-the-curve (AUC) scores, which indicate the probability that the model will rank a randomly chosen positive response higher than a randomly chosen negative one, range from 0.50 (no better than random) to 0.88 (quite effective). An analogous method can be used to accurately identify the individuals in the sample who are living below a relative poverty threshold (AUC = 0.72 to 0.81) (Fig. 1C). With further refinement, such methods could prove useful to policy-makers and organizations that target resources to the extreme poor (25) (supplementary materials section 6).

For each of these prediction tasks, we use the two-step procedure to select a different model with different metrics and parameters. Although not the focus of our analysis, we note discernible patterns in the set of features identified as the best joint predictors of these different response variables. For instance, features related to an individual's patterns of mobility are generally predictive of motorcycle ownership, whereas factors related to an individual's position within his or her social network are more useful in predicting poverty and wealth (fig. S3). These results suggest that our approach might be generalized to predict a broader class of survey responses, such as the subjective opinions and perceptions of mobile subscribers.

Having fit and cross-validated the model on the phone survey sample (a sample drawn to be representative of all active mobile phone users), we next generate out-of-sample predictions for the characteristics of the remaining 1.5 million Rwandan mobile phone users who did not participate in the survey. Combined with the rich geospatial markers in the phone data, the predicted attributes of millions of individual subscribers enable us to study the geographic distribution of subscriber wealth at an extremely fine degree of spatial granularity (Fig. 2). Whereas public data from Rwanda are only accurate at the level of the district (of which there are 30), the phone data can be used to infer characteristics of each of Rwanda's 2148 cells, as well as small micro-

the urban capital of Kigali, we also find a correlation (r = 0.58) between satellite estimates of night light intensity in 0.55-km² grid cells (fig. S7B) and the predicted distribution (based on phone data and the methods described earlier) of responses to the question "Does your household have electricity?" (fig. S7C).

How might such methods be used in practice? In addition to small-area estimation, one promising application is as a source of low-cost, interim national statistics. In many developing economies, long lag times typically occur between successive national surveys. In Angola, for instance, the most recent census before 2014 was conducted in 1970. In that 44-year period, the official population grew by more than 400%. Rwanda has better resources for data collection, and the DHS preceding the 2010 DHS was conducted in 2007. However, even in that relatively short period, the distribution of wealth in Rwanda shifted slightly. Thus, we find that the 2010 distribution of wealth is more accu-

13. J.-P. Onnela et al., Proc. Natl. Acad. Sci. U.S.A. 104, 7332-7336 (2007).
14. G. Palla, A.-L. Barabási, T. Vicsek, Nature 446, 664-667 (2007).
15. M. C. González, C. A. Hidalgo, A.-L. Barabási, Nature 453, 779-782 (2008).
16. X. Lu, E. Wetter, N. Bharti, A. J. Tatem, L. Bengtsson, Sci. Rep. 3, 2923 (2013).
17. J. E. Blumenstock, Inf. Technol. Dev. 18, 107-125 (2012).
18. V. Frias-Martinez, J. Virseda, in Proceedings of the Fifth International Conference on Information and Communication Technologies and Development (Association for Computing Machinery, New York, 2012), pp. 76-84; http://doi.acm.org/10.1145/2160673.2160684.
19. P. Deville et al., Proc. Natl. Acad. Sci. U.S.A. 111, 15888-15893 (2014).
20. G. C. Cawley, N. L. C. Talbot, J. Mach. Learn. Res. 11, 2079-2107 (2010).
21. D. Filmer, L. H. Pritchett, Demography 38, 115-132 (2001).
22. J. Blumenstock, N. Eagle, Inf. Technol. Int. Dev. 8, 1-16 (2012).
23. H. Zou, T. Hastie, J. R. Stat. Soc. Ser. B 67, 301-320 (2005).
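The AUC scores reported in the text are described as the probability that the model ranks a randomly chosen positive response higher than a randomly chosen negative one. A minimal sketch of that definition follows; it counts correctly ordered positive-negative pairs directly and gives tied pairs half credit, a common convention that is assumed here rather than taken from the article. The scores and labels are hypothetical, and the function name `auc` is illustrative.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    scores: model outputs, higher meaning "more likely positive"
    labels: 1 for a positive response, 0 for a negative response
    Returns the fraction of (positive, negative) pairs in which the
    positive example receives the higher score; ties count as one half.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical predictions for a yes/no survey question.
scores = [0.9, 0.4, 0.7, 0.6, 0.2]
labels = [1, 1, 0, 1, 0]
example_auc = auc(scores, labels)  # 4 of the 6 positive-negative pairs are ordered correctly

# A model that gives every respondent the same score is no better than random.
constant_auc = auc([0.5, 0.5, 0.5, 0.5], [1, 0, 1, 0])
```

The pairwise loop is quadratic in the sample size, which is acceptable for a survey sample of modest size; a rank-based computation would be preferable for larger data.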
SUPPLEMENTARY MATERIALS http://science.sciencemag.org/content/suppl/2015/11/24/350.6264.1073.DC1
RELATED CONTENT http://science.sciencemag.org/content/sci/350/6264/1108.2.full
REFERENCES This article cites 32 articles, 8 of which you can access for free
http://science.sciencemag.org/content/350/6264/1073#BIBL
PERMISSIONS http://www.sciencemag.org/help/reprints-and-permissions
Science (print ISSN 0036-8075; online ISSN 1095-9203) is published by the American Association for the Advancement of
Science, 1200 New York Avenue NW, Washington, DC 20005. © 2017 The Authors, some rights reserved; exclusive
licensee American Association for the Advancement of Science. No claim to original U.S. Government Works. The title
Science is a registered trademark of AAAS.