
Diabetes Prediction and Validation from Genealogy using Social Network Analysis

Ms. K Chaitra N Prabhu
Dept. of ISE
Sahyadri College of Engineering & Management Mangaluru

Ms. Manasa Rao K
Dept. of ISE
Sahyadri College of Engineering & Management Mangaluru

Ms. Ravijna R
Dept. of ISE
Sahyadri College of Engineering & Management Mangaluru

Ms. Thrishala N P
Dept. of ISE
Sahyadri College of Engineering & Management Mangaluru

Mr. Ganaraj K
Assistant Professor, Dept. of ISE
Sahyadri College of Engineering & Management Mangaluru

Abstract— Good health is key to human happiness and wellness. It is therefore important to identify health risks early in order to avoid chronic conditions later. Type 2 diabetes is a chronic condition that affects the way the body deals with blood sugar. It is the most common type of diabetes: up to 95% of people with diabetes have Type 2. It usually occurs in middle-aged and older people. Factors that contribute to diabetes include high blood pressure, physical inactivity, a history of heart disease and family history. Family history is a largely unexplored factor that could be used to predetermine the risk of developing diabetes. The project aims to develop a model for early prediction and validation of diabetes. The model outputs how susceptible an individual is to diabetes. The proposed model consists of the following steps: 1. Collecting clinical information associated with diabetes, based on a three-generation family history. 2. Applying a machine learning algorithm to predict whether an individual is diabetes sensitive or not and, if sensitive, forecasting at what age he or she might develop diabetes. 3. Validating the predicted results using Social Network Analysis, which uses graph theory.

Keywords — Diabetes, genealogy, Social Network Analysis

I. INTRODUCTION

Genealogy is the study of the ancestry of a person or a family. Genealogy has been described as an indispensable part of consultation with patients. Nowadays genealogy is used not only as a mode of social analysis but also as a genetic risk predictor. A family history of a specific disease usually reflects the combined effects of genetic susceptibility, common environmental factors and common behaviors in the family line. Analyzing genetic factors helps us to predict whether a descendant is more likely to have a disease that runs through the family for generations. Type 2 diabetes is one of the genetically inherited diseases. If you have a family health history of diabetes in addition to other environmental factors, you are more likely to have prediabetes and develop diabetes. People with prediabetes are susceptible to type 2 diabetes. In 2015, it was estimated that 9.4 percent of the population was diabetes sensitive. Over one in four people were not aware that they had diabetes. One in four people over sixty-five years of age suffers from diabetes. In India, over 77 million adults suffer from diabetes. Researchers forecast this number to rise to 134 million by 2045. While the figures for diabetes are high, researchers estimate that 57 percent of cases remain undiagnosed. This is alarming because
an individual's health is at stake when the person does not take medication on time to control their blood glucose. The purpose of the project is to introduce the idea of genealogy as an instrument to foresee whether a person is likely to develop diabetes in the near future. Genealogy is used to identify individuals with increased sensitivity to diabetes. It allows early identification of the disease risk of immediate family members and helps them take preventive measures and make lifestyle changes in order to reduce the risk of developing diabetes. Social Network Analysis (SNA) is one of the most substantial visual tools to gauge and typify social association. It is used to evaluate networks and uncover the most important nodes within them. In our case SNA is used to find the most substantial node, i.e. a person likely to develop diabetes.

II. LITERATURE SURVEY

Talha Mahboob Alam et al. [1] proposed a model for the early detection of diabetes. In this study, various attributes for diabetes detection were taken into account, and the Apriori algorithm revealed a strong association of body mass index (BMI) and glucose levels with diabetes. Three machine learning algorithms, Random Forest, Artificial Neural Network (ANN) and K-Means clustering, were used, and the accuracy of each algorithm was computed using the AUROC curve and confusion matrix. The ANN algorithm can model non-linear relationships, and hence the model that used ANN was found to be more accurate.

Jack W. Smith et al. [2] proposed the ADAP algorithm, a neural network model, for early diagnosis of diabetes. They devised the Diabetes Pedigree Function (DPF), which considers the family history of diabetes and provides a value that indicates the influence of genetics on the onset of diabetes. The ADAP algorithm was used to generate neural networks for predicting the onset of diabetes within the next 5 years. A ROC curve, which plots the true positive (TP) rate against the false positive (FP) rate, was used to find the influence of the discrimination point.

In the study by Warih Maharani et al. [3], Social Network Analysis was applied to find user interactions in social media. A crawled Twitter dataset was considered and the N most influential users were found. Two centrality measures, degree centrality and eigenvector centrality, were applied. The results showed notable differences between the two centrality measures. A node with low weight can still be the most influential node under the eigenvector centrality measure, whereas the degree centrality measure showed that the node with the highest degree was the most influential. This study can be used to improve marketing strategies in small and medium enterprises.

James B. Meigs et al. [5] proposed a study on genotype score in addition to common risk factors for predicting Type 2 Diabetes. This study aims to find out whether a genotype score based on 18 risk alleles provides slightly better prediction of risk than knowledge of common risk factors alone. Genetic loci have been convincingly associated with the risk of type 2 diabetes. They genotyped SNPs at 18 loci associated with diabetes in 2377 participants of the Framingham Offspring Study, created a genotype score from the number of risk alleles, and used logistic regression to generate C statistics.

Martin Singh-Blom et al. [6] proposed a study on prediction and validation of gene-disease associations using social network analyses. It states that correctly determining associations of genes with diseases has long been a goal in biology. The study used two methods, the Katz measure and the CATAPULT method. The Katz measure is motivated by its success in social network link prediction. CATAPULT is a supervised machine learning method that uses a biased SVM to analyze a heterogeneous gene-trait network. The Katz measure is better at identifying associations between traits and poorly studied genes, whereas CATAPULT is better suited to correctly identifying gene-trait associations overall.

Fawzya Hassan and Masoud E. Shaheen [7] proposed a model for predicting diabetes from health-based streaming data using social media and machine learning. In this study, the Pima Indians Diabetes Dataset (PIDD) is used for training as well as testing the models. The methodologies adopted in this paper include various machine learning algorithms such as logistic regression, decision tree, support vector machines and random forest. Three machine learning techniques were implemented on the dataset and were trained and tested against a test dataset. The results show that ANN outperforms the other models.

Parvin Pasalar et al. [9] proposed a study on predictive factors of diabetic complications: a possible link between family history of diabetes and diabetic retinopathy. The aim of this survey was the evaluation of predictive factors of diabetic retinopathy. Among 1228 diabetic patients, the prevalence of diabetic retinopathy was 26.6%. A significant association was found between retinopathy and family history of diabetes (p = 0.04). A family history of diabetes points towards a possible genetic and epigenetic basis for diabetic retinopathy.

Zalika Klemenc-Ketis et al. [10] proposed a study on family history as a predictor for disease risk in healthy individuals. This research aimed to find out the prevalence of healthy individuals at risk of developing chronic diseases, based on their self-reported family history. Data were gathered from primary health care institutions in Slovenia. Details were collected with a self-developed questionnaire. The main outcome was the number of participants at a moderate or high risk for the development of cardiovascular diseases, diabetes mellitus, and cancer. The final sample consisted of 1,340 respondents. The results showed a moderate or high risk for the development of diabetes in 154 (11.5%) participants. The data were analyzed with the SPSS 19.0 package. The research concluded that family history can identify healthy individuals with a heightened genetic risk for chronic disease.

Raccoons act as a vector of rabies, and the extent to which rabies spreads depends on the social interactions of raccoons. In the paper by Ben T. Hirsch et al. [11], collars and various social network metrics, such as weighted degree, two-step reach and clustering coefficient, were used to measure social connectivity in raccoons. Thirty raccoons were observed for a duration of 18 months and the three metrics were analyzed for each raccoon. The results showed that the monthly social networks were strongly connected and that, as a result, the possibility of pathogen transmission in the raccoon population was high.

Duncan Chambers et al. [12] proposed a systematic scoping review analysing the importance of social network analysis in healthcare. In this study, the literature search aimed to systematically identify social network analyses of healthcare professionals in any healthcare setting. A broad search strategy was initially developed on MEDLINE (OvidSP) using free text terms, synonyms and subject headings relating to social networks and the methods used to investigate them. Key findings of studies that looked at service provision and organisation included differences in the actual and perceived nature of social networks among professionals from different disciplines. In conclusion, very little evidence was found for the potential of SNA being realised in healthcare settings. However, it seems unlikely that networks are less important in healthcare than in other settings. Future research should seek to go beyond the merely descriptive to implement and evaluate SNA-based interventions.

Shoba Ramanadhan et al. [13] proposed a study on addressing cancer disparities via community network mobilization. Development of a rich and productive set of partnerships among diverse players was one of the goals driving the development of the Massachusetts Community Network for Cancer Education, Research, and Training (MassCONECT) project. MassCONECT utilized a Community-based Participatory Research (CBPR) framework, which “integrates education and social action to improve health and reduce health disparities”. They conducted a cross-sectional study at the end of Year 4 of the MassCONECT initiative to describe the social network that developed over that time. The three outcome indices measured the extent to which members engaged in stated network goals. The mean response for the community activities index was 1.97 out of 4 (SD = 1.42); the mean for the publications and grants index was 2.29 out of 3 (SD = 1.01); and the mean for the policy engagement index was 1.11 out of 2 (SD = 1.29). This study describes a successful community mobilization effort that resulted in increased intersectoral partnerships and generated important short-term collaboration outcomes in the first four years of development. A diverse set of partners were engaged in a CBPR effort to reduce and eliminate cancer disparities, with purposive and directed effort in areas including community activities.

III. SYSTEM DESIGN

The dataset used contains the clinical information associated with diabetes, based on a three-generation family history. This dataset is used to find associations in the family history of diabetes and to predict the potential for a descendant to develop diabetes in the near future. The dataset is preprocessed to verify data quality, and features are extracted from the preprocessed data. Feature extraction is part of the dimensionality reduction process, in which an initial set of raw data is split and reduced to more manageable groups. Then, depending on the features extracted from the preprocessed data, some of the data will be used for training and the rest for testing. The machine learning algorithm is used to predict the potential for the development of diabetes. The outcome will be either diabetes sensitive or not.

V. RESULTS AND DISCUSSION

The outcomes of the proposed model are:

• Predict whether an individual within a family may be affected by diabetes with the help of clinical information associated with diabetes, based on a three-generation family history.
• Predict at what age an individual may be diagnosed with diabetes.
• Predict and analyze mortality rates.

VI. CONCLUSION AND FUTURE WORK

The most severe and complicated inheritance patterns are found in type-2 diabetes and they can significantly lower a person's quality of life. Genetic factors are among the primary causes of diabetes. Using social network analysis and genealogy, a novel method is
offered to predict diabetes at an early stage. The model examines whether diabetes runs in the family, forecasts a person's susceptibility to the disease and the age at which it is most likely to appear, and examines the mortality rate. In earlier studies on the subject, numerous machine learning techniques were explored, but early prediction of diabetes using genealogy was less explored. Collecting clinical data for a three-generation family tree is a considerable task. Nevertheless, the model is expected to predict and validate diabetes with better accuracy than the earlier methods, since genealogy is included along with the usual factors that contribute to diabetes.

REFERENCES

[1] Mahboob Alam, T. et al. (2019) “A model for early prediction of diabetes,” Informatics in Medicine Unlocked, 16, p. 100204.

[3] Maharani, W., Adiwijaya and Gozali, A.A. (2014) “Degree centrality and eigenvector centrality in Twitter,” 2014 8th International Conference on Telecommunication Systems Services and Applications (TSSA) [Preprint].

[4] Hossain, M.E., Uddin, S. and Khan, A. (2021) “Network analytics and machine learning for predictive risk modelling of cardiovascular disease in patients with type 2 diabetes,” Expert Systems with Applications, 164, p. 113918.

[5] Meigs, J.B. et al. (2008) “Genotype score in addition to common risk factors for prediction of type 2 diabetes,” New England Journal of Medicine, 359(21), pp. 2208–2219.

[6] Singh-Blom, U.M. et al. (2013) “Prediction and validation of gene-disease associations using methods inspired by social network analyses,” PLoS ONE, 8(5).

[7] Hassan, F. and Shaheen, M.E. (2020) “Predicting diabetes from health-based streaming data using social media, machine learning and Stream Processing Technologies,” International Journal of Engineering Research and Technology, 13(8), p. 1957.

[9] Maghbooli, Z. et al. (2014) “Predictive factors of diabetic complications: A possible link between family history of diabetes and Diabetic retinopathy,” Journal of Diabetes & Metabolic Disorders, 13(1).

[10] Klemenc-Ketis, Z. and Peterlin, B. (2013) “Family history as a predictor for disease risk in healthy individuals: A cross-sectional study in Slovenia,” PLoS ONE, 8(11).

[11] Hirsch, B.T. et al. (2013) “Raccoon social networks and the potential for disease transmission,” PLoS ONE, 8(10).

[12] Chambers, D. et al. (2012) “Social network analysis in healthcare settings: A systematic scoping review,” PLoS ONE, 7(8).

[13] Ramanadhan, S. et al. (2012) “Addressing cancer disparities via Community Network Mobilization and Intersectoral Partnerships: A Social Network Analysis,” PLoS ONE, 7(2).

[14] Asad, M., Qamar, U. and Abbas, M. (2021) “Blood glucose level prediction of diabetic type 1 patients using nonlinear autoregressive neural networks,” Journal of Healthcare Engineering, 2021, pp. 1–7.

[15] Karaivanov, A. (2020) “A social network model of COVID-19,” PLOS ONE, 15(10).
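
As an illustration of the validation idea described in the paper, where Social Network Analysis is used to find the most substantial node in a family network, the following minimal Python sketch computes degree centrality and eigenvector centrality, the two measures discussed in the survey of [3]. It is not the authors' implementation. The family members, the edge list and the function names (`build_graph`, `degree_centrality`, `eigenvector_centrality`) are hypothetical and chosen only for this example. The eigenvector scores are obtained by power iteration on the adjacency matrix plus the identity, an assumption made here so that the iteration also converges on tree-like family graphs.

```python
# Illustrative sketch only: a toy three-generation family graph (hypothetical data).

def build_graph(edges):
    """Build an undirected adjacency map {node: set(neighbours)} from an edge list."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj

def degree_centrality(adj):
    """Fraction of the other nodes that each node is directly connected to."""
    n = len(adj)
    return {node: len(neigh) / (n - 1) for node, neigh in adj.items()}

def eigenvector_centrality(adj, iterations=200, tol=1e-10):
    """Power iteration on (A + I), normalised to unit Euclidean length.

    Adding the identity keeps the same eigenvectors as A but avoids the
    oscillation that plain power iteration shows on bipartite graphs.
    """
    score = {node: 1.0 for node in adj}
    for _ in range(iterations):
        new = {node: score[node] + sum(score[m] for m in adj[node]) for node in adj}
        norm = sum(v * v for v in new.values()) ** 0.5
        new = {node: v / norm for node, v in new.items()}
        converged = max(abs(new[k] - score[k]) for k in adj) < tol
        score = new
        if converged:
            break
    return score

# Hypothetical parent-child links across three generations.
edges = [
    ("grandfather", "father"), ("grandmother", "father"),
    ("grandfather", "aunt"), ("grandmother", "aunt"),
    ("father", "child_1"), ("mother", "child_1"),
    ("father", "child_2"), ("mother", "child_2"),
]
family = build_graph(edges)

deg = degree_centrality(family)
eig = eigenvector_centrality(family)
for person in sorted(family, key=eig.get, reverse=True):
    print(f"{person:12s} degree={deg[person]:.3f} eigenvector={eig[person]:.3f}")
```

In this toy graph both measures rank the same person highest. On larger or weighted networks the two rankings can differ, which is the effect reported in [3].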

