
VIRTUAL CONFERENCE

30 November 2021

A Study in Determining Indicators of Food-Insecure Households using SHAP and Boruta-SHAP

Nidia Mindiyarti¹, Bagus Sartono¹, Indahwati¹, Alfian Futuhul Hadi², Evi Ramadhani³

¹Department of Statistics, IPB University, Bogor, West Java, Indonesia
²Department of Mathematics, University of Jember, East Java, Indonesia
³Department of Statistics, Syiah Kuala University, Aceh, Indonesia

Presented at the 3rd International Seminar on Science and Technology, Universitas Islam Indonesia, Yogyakarta, Indonesia, Tuesday 30 November 2021

isstec.uii.ac.id

Outline

1. Introduction
2. Methodology
3. Results
4. Conclusion


Introduction

• Machine learning is a set of computational methods helpful in making and improving predictions and in modelling big data.
• The complexity of big data (volume, data size, and data complexity) makes it challenging to handle with traditional learning methods.
• Black box: the various components of a machine-learning model are difficult to understand and interpret, and the resulting values proved to be local and inconsistent for different observations.
• SHAP (SHapley Additive exPlanations) is able to interpret a black box as a white box. However, SHAP is not equipped with an algorithm to filter many variables down to a finite subset.
• Boruta-SHAP addresses this by combining the Boruta variable-selection algorithm with Shapley values.
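The Shapley-value idea at the core of SHAP can be illustrated with a minimal pure-Python sketch: a feature's exact Shapley value is its average marginal contribution to the prediction over all orderings in which features are added. The toy model and inputs below are hypothetical, for illustration only; real SHAP implementations use far more efficient estimators.

```python
from itertools import permutations

def shapley_values(predict, x, baseline):
    """Exact Shapley values for a tiny model: average each feature's
    marginal contribution over all feature orderings. Features not yet
    'present' keep their baseline value."""
    n = len(x)
    phi = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)           # start from the baseline input
        for i in order:
            before = predict(current)
            current[i] = x[i]              # feature i joins the coalition
            phi[i] += predict(current) - before
    return [p / len(orderings) for p in phi]

# Hypothetical scoring model of three features (one additive term,
# one interaction term between features 1 and 2).
def model(v):
    return 2.0 * v[0] + v[1] * v[2]

x = [1.0, 2.0, 3.0]            # instance to explain
baseline = [0.0, 0.0, 0.0]     # reference input
phi = shapley_values(model, x, baseline)
print(phi)                     # [2.0, 3.0, 3.0]: the interaction is split evenly

# Local accuracy: the contributions sum to f(x) - f(baseline).
print(sum(phi) == model(x) - model(baseline))
```

The exact computation above is exponential in the number of features; TreeSHAP-style algorithms exploit tree structure (as in XGBoost) to avoid that cost.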


Introduction

• The Boruta-SHAP algorithm will be used to analyze the incidence of household food insecurity in West Java Province using the XGBoost method.
• To see whether there is a difference in performance between Boruta-SHAP and SHAP, the SHAP algorithm will be applied both to data containing all variables (complete data) and to Boruta-SHAP-filtered data.

Food-insecurity screening questions: Were you worried you would not have enough food to eat? Were you unable to eat healthy and nutritious food? Did you have to skip a meal? Were you hungry but did not eat? Did you eat only a few kinds of foods? Did you eat less than you thought you should? Did your household run out of food? Did you go without eating for a whole day?

750 million people were affected by severe food insecurity, and the world's food organization estimated 690 million people to be malnourished in 2019. In 2020, the moderate and severe food insecurity rate in Indonesia reached 5.12%.

Research Purpose

To examine the differences between the XGBoost method with and without Boruta-SHAP on the SHAP variable-importance scores that characterize household-level food insecurity in West Java.

Methodology

Variables:

Y    Food Insecurity Status
X1   Education of Household Head (6 levels)
X2   Vulnerable Household Head (1=yes, 0=no)
X3   Number of Family Members Illiterate (1, 2, etc.)
X4   Access to Outpatient Treatment (1=yes, 0=no)
X5   Number of Family Members Having a Saving Account (0, 1, 2, etc.)
X6   Income from Transfers (1=yes, 0=no)
X7   Ownership of Land (1=yes, 0=no)
X8   Grantee of Non-Cash Social Assistance (1=yes, 0=no)
X9   Grantee of Hopeful Family Program (1=yes, 0=no)
X10  Grantee of Prosperous Family Program (1=yes, 0=no)
X11  Grantee of Social Assistance from Local Government (1=yes, 0=no)
X12  Grantee of National Health Insurance Program (1=yes, 0=no)
X13  Grantee of Local Health Insurance Program (1=yes, 0=no)
X14  Grantee of Scholarship Social Program (1=yes, 0=no)
X15  Roof Type (roofing, asbestos, roof tile, concrete, other)
X16  Floor Type (cement, tiles/terrazzo, parquet/carpet, marble/ceramic, other)
X17  Wall Type (wood/board, wall/plaster/woven, other)
X18  House Size (square meters)
X19  Internet Access (1=yes, 0=no)
X20  Electricity (1=yes, 0=no)
X21  Type of Cooking Fuel (firewood, kerosene, LPG 3 kg, not cooking, blue gas)
X22  Drinking-Water Source (borehole/pump, plumbing, bottled water/refill, other)
X23  Decent Drinking Water (1=yes, 0=no)
X24  Decent Sanitation (1=yes, 0=no)

Research Process:

Start → Data Exploration → Data Pre-Processing, then two parallel branches:
1. Complete data → XGBoost model (hyperparameter optimization, model fitting, model evaluation) → SHAP → feature importance.
2. Boruta-SHAP variable filtering → filtered data → XGBoost model (hyperparameter optimization, model fitting, model evaluation) → SHAP → feature importance.
End.
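The Boruta step in the process above can be sketched in pure Python. The core idea, which Boruta-SHAP reuses with Shapley-based importances, is to pair every real feature with a randomly shuffled "shadow" copy and keep only features whose importance beats the best shadow. The importance function and data below are hypothetical stand-ins; a real run scores features with SHAP values from a fitted XGBoost model.

```python
import random

def boruta_screen(X, y, importance, seed=0):
    """One Boruta-style screening round.
    X: list of feature columns; y: target column.
    Returns indices of features whose importance beats the best shadow."""
    rng = random.Random(seed)
    shadows = []
    for col in X:
        shadow = col[:]
        rng.shuffle(shadow)        # shuffling destroys any real link to y
        shadows.append(shadow)
    scores = importance(X + shadows, y)
    real, shadow_scores = scores[:len(X)], scores[len(X):]
    return [j for j, s in enumerate(real) if s > max(shadow_scores)]

def abs_cov_importance(cols, y):
    """Stand-in importance: |covariance with y| per column."""
    my = sum(y) / len(y)
    scores = []
    for c in cols:
        mc = sum(c) / len(c)
        scores.append(abs(sum((a - mc) * (b - my) for a, b in zip(c, y)) / len(y)))
    return scores

rng = random.Random(42)
y = [rng.random() for _ in range(300)]
informative = [v + 0.1 * rng.random() for v in y]   # tracks the target
noise = [rng.random() for _ in range(300)]          # unrelated to the target
kept = boruta_screen([informative, noise], y, abs_cov_importance)
print(kept)   # the informative feature (index 0) survives the screen
```

Boruta proper repeats this screen over many rounds with a statistical test for each feature; a single round as above only illustrates the shadow-feature trick.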


Results

The proportion of vulnerable households is greater than the proportion of households that are not vulnerable.

TABLE 3. Optimal Hyperparameters of XGBoost

Hyperparameter      Value (Model 1)   Value (Model 2)
colsample_bytree    0.93              0.93
eta                 0.46              0.46
max_depth           21                20
min_child_weight    13                13
subsample           0.95              0.95
random_state        123               123

Hyperparameter tuning is used in each model to obtain the optimal model. The Boruta-SHAP algorithm selected seven variables: house size, education of household head, ownership of land, number of family members having a saving account, drinking-water source, decent drinking water, and internet access.
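In code, the tuned values in Table 3 amount to a parameter set along the following lines (a sketch: the training call in the comment and anything beyond the tabled values are assumptions, not taken from the slides):

```python
# Optimal hyperparameters from Table 3, Model 1 (complete data).
params_model1 = {
    "colsample_bytree": 0.93,  # fraction of columns sampled per tree
    "eta": 0.46,               # learning rate (shrinkage)
    "max_depth": 21,           # maximum tree depth
    "min_child_weight": 13,    # minimum sum of instance weight in a child
    "subsample": 0.95,         # fraction of rows sampled per tree
    "random_state": 123,       # fixed seed for reproducibility
}

# Model 2 (Boruta-SHAP-filtered data) differs only in max_depth.
params_model2 = dict(params_model1, max_depth=20)

# A real run would hand these to XGBoost, e.g.:
#   model = xgboost.XGBClassifier(**params_model1).fit(X_train, y_train)
print(params_model1["eta"], params_model2["max_depth"])
```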


Results

[Figure: SHAP variable-importance scores for Model 1 (complete data) and Model 2 (Boruta-SHAP-filtered data)]

The SHAP values and the variable order of Model 1 and Model 2 differ. However, the seven variables ranked highest in Model 1 all appear in Model 2, so it can be concluded that the importance orderings obtained from the complete data and from the Boruta-SHAP-filtered data tend to be the same and to converge.


Conclusion

• The order of the variables differed between the SHAP values of the complete-data model and those of the model built on the Boruta-SHAP-filtered data.
• All seven variables selected by Boruta-SHAP are also among the topmost important variables in the complete-data SHAP model, so it can be concluded that the SHAP values of the two models are in line.
• The seven variables that contribute the most are ownership of land, house size, education of household head, number of family members having a saving account, decent drinking water, internet access, and drinking-water source.
• When the analysis is carried out only to see the importance of all variables, the SHAP model without variable filtering is sufficient; however, if the research aims to identify which variables contribute the most, applying the Boruta-SHAP variable-filtering algorithm before modelling is recommended.
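The "in line" claim above boils down to a top-k overlap check between the two importance rankings, which is straightforward to compute (the variable names below are illustrative shorthand, not the slides' exact ordering):

```python
# Hypothetical orderings, most important first. Model 2 contains only
# the seven Boruta-SHAP-selected variables.
model1_order = ["house_size", "land", "education", "saving_accounts",
                "decent_water", "internet", "water_source",
                "electricity", "sanitation"]   # remaining variables follow
model2_order = ["land", "house_size", "saving_accounts", "education",
                "water_source", "internet", "decent_water"]

k = len(model2_order)
overlap = set(model1_order[:k]) & set(model2_order)
print(len(overlap) == k)   # True: model 1's top 7 matches model 2's variables
```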


Acknowledgements
Research reported in this publication was supported by the Ministry of Research
and Technology/National Agency of Research and Innovation - Republic of
Indonesia under Award Number 1/E1/KP.PTNBH/2021. The content is solely the
responsibility of the authors and does not necessarily represent the official views
of the National Agency of Research and Innovation.


References
Christoph Molnar, Interpretable Machine Learning: A Guide for Making Black Box Models Explainable, chapter 2.2. https://christophm.github.io/interpretable-ml-book/
Scott M. Lundberg and Su-In Lee, A Unified Approach to Interpreting Model Predictions (Proceedings of the 31st International Conference on Neural Information Processing Systems, December 2017), pp. 4768-4777. https://dl.acm.org/doi/10.5555/3295222.3295230
Miron B. Kursa and Witold R. Rudnicki, Feature Selection with the Boruta Package (Journal of Statistical Software, September 2010, Volume 36, Issue 11), pp. 1-13. doi:10.18637/jss.v036.i11
Yagyanath Rimal, Boruta Algorithm is Significant for Large Feature Selection of Student Marks Data of Pokhara University Nepal (Universe International Journal of Interdisciplinary Research, Vol. 1, Issue 2, 2020), pp. 308-315. http://doi-ds.org/doilink/08.2020-25662434/
Lee Kuok Leong and Azian Azamimi Abdullah, Prediction of Alzheimer's Disease (AD) Using Machine Learning Techniques with Boruta Algorithm as Feature Selection Method (Journal of Physics: Conference Series, Volume 1372, International Conference of Biomedical Engineering, 26-27 August 2019, Penang Island, Malaysia), pp. 1-8. doi:10.1088/1742-6596/1372/1/012065
Eoghan Keany, BorutaShap: A Wrapper Feature Selection Method Which Combines the Boruta Feature Selection Algorithm with Shapley Values (Zenodo.org, https://pypi.org/project/BorutaShap/, 2020). doi:10.5281/zenodo.4247618
Heru Irawan, "Faktor-Faktor Rumah Tangga yang Mencirikan Tingkat Kerawanan Pangan" [Household Factors Characterizing the Level of Food Insecurity], thesis, Institut Pertanian Bogor, 2019.
Otilia Vanessa Cordero-Ahiman, Jorge Leonardo Vanegas, Pablo Beltrán-Romero and Maria Elena Quinde-Lituma, Determinants of Food Insecurity in Rural Households: The Case of the Paute River Basin of Azuay Province, Ecuador (Sustainability, 2020, 12, 946), pp. 1-18. doi:10.3390/su12030946
Giovanna Menardi and Nicola Torelli, Training and Assessing Classification Rules with Imbalanced Data (Data Mining and Knowledge Discovery, Volume 28, Issue 1, January 2014), pp. 92-122. doi:10.1007/s10618-012-0295-5
Tianqi Chen and Carlos Guestrin, XGBoost: A Scalable Tree Boosting System (Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, 13-17 August 2016), pp. 785-794. doi:10.1145/2939672.2939785
Scott M. Lundberg, Gabriel Erion, Hugh Chen, Alex DeGrave, Jordan M. Prutkin, Bala Nair, Ronit Katz, Jonathan Himmelfarb, Nisha Bansal and Su-In Lee, From Local Explanations to Global Understanding with Explainable AI for Trees (Nature Machine Intelligence 2, 2020), pp. 56-67. doi:10.1038/s42256-019-0138-9


Thank you
