
A PROJECT REPORT

ON
ANALYSIS OF SUICIDE RATES AROUND THE WORLD USING BIG
DATA

Submitted By

MUKUND SAROCH
1/17/FET/BCC/037

Under the guidance of

DR. POONAM TANWAR


ASSOCIATE PROFESSOR

in partial fulfilment for the award of the degree of

BACHELOR OF TECHNOLOGY
IN
COMPUTER SCIENCE AND ENGINEERING

Faculty of Engineering & Technology

Manav Rachna International Institute of Research and Studies, Faridabad

ACKNOWLEDGEMENT

The successful realization of this project is the outcome of a consolidated effort by people
from disparate fronts. I am thankful to Dr. Poonam Tanwar (Associate Professor)
for her valuable advice and support, without which I could not have
completed my project successfully.

I would like to express my sincere gratitude to Prof. (Dr.) Harish C. Rai, Dean, FET
for providing me with the facilities in the Institute for completion of my work.

Words cannot express my gratitude for all those people who helped me directly or
indirectly in my endeavour. I take this opportunity to express my sincere thanks to all
staff members of CSE department for the valuable suggestion and also to my family
and friends for their support.

MUKUND SAROCH

1/17/FET/BCC/037

DECLARATION

I hereby declare that this project report entitled “Analysis of suicide rates around the
world using Big Data” by Mukund Saroch, is being submitted in partial fulfillment
of the requirements for the degree of Bachelor of Technology in Computer Science
& Engineering under Faculty of Engineering & Technology of Manav Rachna
International Institute of Research and Studies, Faridabad, during the academic year
2021, is a bonafide record of my original work carried out under the guidance of DR.
POONAM TANWAR, ASSOCIATE PROFESSOR.

I further declare that I have not submitted the matter presented in this Project for the
award of any other Degree/Diploma of this University or any other
University/Institute.

Mukund Saroch

1/17/FET/BCC/037

Manav Rachna International Institute of Research and Studies,
Faridabad
Faculty of Engineering & Technology
Department of Computer Science and Engineering

May, 2021
CERTIFICATE

This is to certify that this project report entitled “Analysis of Suicide Rates Around
the World using Big Data” by Mukund Saroch, is being submitted in partial
fulfillment of the requirements for the degree of Bachelor of Technology in
Computer Science and Engineering under Faculty of Engineering & Technology of
Manav Rachna International Institute of Research and Studies, Faridabad, during the
academic year 2021, is a bonafide record of work carried out under my guidance and
supervision. I hereby declare that the work has been carried out under my supervision
and has not been submitted elsewhere for any other purpose.

(Signature of the project guide)

Dr. Poonam Tanwar


Associate Professor
Department of Computer Science and Engineering
FET, MRIIRS, Faridabad

TABLE OF CONTENTS

Declaration
Certificate
Acknowledgement
Abstract

I. Introduction

1.1 Introduction
1.2 Suicide Terminology
1.3 Literature Survey
1.4 Historic Perspective
1.5 Existing System
1.6 Proposed System
1.7 Software Requirement
1.8 Hardware Requirement
1.9 Feasibility Study
1.10 Project Timeline
1.11 Overview of the Report

II. System Analysis and Design

2.1 Requirement Specification

III. Implementation and Results

IV. Conclusion and Future Enhancements

ABSTRACT

This paper deals with the analysis of suicides around the world. I have covered
the factors that lead to suicidal deaths. Various factors might drive a
person to take his or her own life: mental disorders, drug misuse,
psychological states, cultural, family and social situations, genetics, experiences
of trauma or loss, and nihilism. Mental disorders and substance misuse frequently
co-exist.
Suicide is a leading cause of death among young adults worldwide. There is a
growing recognition that prevention strategies need to be tailored to the
region-specific demographics of a country and implemented in a culturally
sensitive manner. My paper reviews the existing scenario of suicide in India and
worldwide and suggests some region-specific prevention methods that might
reduce the likelihood of further suicides. The suicide rate in India has been
rising year on year. Distinct from global demographic risk factors, in India
marital status is not necessarily protective, and the female:male ratio in
the rate of suicide is higher. The modes of suicide in India also differ from
those of Western countries. The country with the highest suicide rate in the
world is, by a wide margin, Greenland. In 2016 the number of suicides in India
increased to 230,314. Suicide was the most common cause of death in both the
age groups of 15–29 years and 15–39 years. About 800,000 people die by suicide
worldwide every year. Of those, 135,000 (17%) are residents of India, a nation
with 17.5% of the world's population.

Keywords: Suicide, India, epidemiology, prevention

I. 1.1 INTRODUCTION
One might wonder why a famous TV personality or an actor took his own
life and what caused him to do so. Often, many factors combine to lead a person to
the decision to take their own life. This research paper covers the following aspects
in detail:

• What is it about? This project deals with the frequency of suicides around the
world.
• Why do such problems occur? This paper covers the factors that lead a
person to die by suicide.
• Why did I take this paper up? I wanted to study the increasing number of
suicides and what leads people to take such extreme steps.
• Where do suicides take place the most globally? I have also discussed
the countries that record the highest and the lowest numbers of suicides in a
year.

More than one lakh (100,000) lives are lost every year to suicide in India. Over the
three decades from 1975 to 2005, the suicide rate increased by 43%. The rates
were approximately the same in 1975 and 1985; from 1985 to 1995 there was an
increase of 35%, and from 1995 to 2005 the increase was 5%. However, the male-
female ratio has been stable at around 1.4 to 1. There is wide variation in suicide
rates within the country. The southern states of Kerala, Karnataka, Andhra Pradesh
and Tamil Nadu have a suicide rate of >15 per 100,000, while in the northern states
of Punjab, Uttar Pradesh, Bihar and Jammu and Kashmir the suicide rate is <3 per
100,000. This variable pattern has been stable for the last 20 years. Higher literacy,
a better reporting system, lower external aggression, higher socioeconomic status
and higher expectations are the possible explanations for the higher suicide rates
in the southern states (Vijayakumar L, 2008).
A large share of suicides in India (37.8%) are by those below the age of 30 years.
The fact that 71% of suicides in India are by persons below the age of 44 years
imposes a huge social, emotional and economic burden on society.
The near-equal suicide rates of young men and women and the consistently narrow
male:female ratio indicate that more Indian women die by suicide than their
Western counterparts. Poisoning (34.8%), hanging (31.7%) and self-immolation
(8.5%) were the common methods of suicide (Accidental Deaths and
Suicides in India, 2007). Two large epidemiological verbal autopsy studies in rural
Tamil Nadu reveal that the annual suicide rate is six to nine times the official rate. If
these figures are extrapolated, they suggest that there are at least half a million
suicides in India every year. It is estimated that one in 60 persons is affected by
suicide. This includes both those who have attempted suicide and those who have
been affected by the suicide of a close family member or friend. Thus, suicide is a major
public and mental health problem that demands urgent action.

1.2 SUICIDE TERMINOLOGY


Suicide, from Latin suicidium, is "the act of taking one's own life". Attempted
suicide or non-fatal suicidal behavior is self-injury with at least some desire to end
one's life that does not result in death.

Factors that affect the risk of suicide include mental disorders, drug misuse,
psychological states, cultural, family and social situations, genetics, experiences of
trauma or loss, and nihilism. Mental disorders and substance misuse frequently co-
exist.

1.3 LITERATURE SURVEY


The World Health Organisation (WHO) estimates that each year approximately
one million people die from suicide, which represents a global mortality rate of 16
people per 100,000 or one death every 40 seconds. It is predicted that by 2020 the
rate of death will increase to one every 20 seconds.

The WHO further reports that:

• In the last 45 years suicide rates have increased by 60% worldwide. Suicide is
now among the three leading causes of death among those aged 15-44 (male
and female). Suicide attempts are up to 20 times more frequent than completed
suicides.

• Although suicide rates have traditionally been highest amongst elderly males,
rates among young people have been increasing to such an extent that they are
now the group at highest risk in a third of all countries.

• Mental health disorders (particularly depression and substance abuse) are
associated with more than 90% of all cases of suicide.

• However, suicide results from many complex sociocultural factors and is more
likely to occur during periods of socioeconomic, family and individual crisis
(e.g. loss of a loved one, unemployment, sexual orientation, difficulties with
developing one's identity, disassociation from one's community or other
social/belief group, and honour).

The WHO also states that:

• In Europe, particularly Eastern Europe, the highest suicide rates are reported
for both men and women.

• The Eastern Mediterranean Region and Central Asia republics have the lowest
suicide rates.

• Nearly 30% of all suicides worldwide occur in India and China.

• Suicides globally by age are as follows: 55% are aged between 15 to 44 years
and 45% are aged 45 years and over.

• Youth suicide is increasing at the greatest rate.

In the US, the Centers for Disease Control and Prevention (CDC) reports that:

• Overall, suicide is the eleventh leading cause of death for all Americans,
and is the third leading cause of death for young people aged 15-24 years.

• Although suicide is a serious problem among the young and adults, death rates
continue to be highest among older adults ages 65 years and over.

• Males are four times more likely to die from suicide than are females.
However, females are more likely to attempt suicide than are males.

WHICH COUNTRY SEES THE HIGHEST NUMBER OF SUICIDES IN A YEAR?

The country with the highest suicide rate in the world is, by a wide margin, Greenland,
following its modernizing transformation over recent decades from remote colony to
welfare state; it has a male-female ratio of 2.99. India, in the South Asia region, is the
greatest contributor to the absolute number of suicide deaths.

Figure: Graph showing that the most suicidal country is Greenland.

1.4 HISTORIC PERSPECTIVE
The history of suicide is probably as old as humankind itself. Through the
ages, suicide has variously been glorified, romanticized, bemoaned, and even
condemned. Prominent examples from ancient societies include the tragic
heroes Aegeus, Lycurgus, Cato, Socrates, Zeno, Demosthenes and Seneca; the
Roman figures Brutus, Cassius and Antony; the Egyptian princess Cleopatra; and
Samson, Saul, Abimelech and Achitophel of the Old Testament. A present-day
example is the suicide bomber. Suicide and the concept of taking one’s own life
are mentioned in different religions and cultures across the globe.
Suicide in the Indian context involves the influence of literary, cultural
and social ethos. For a very long time, suicide has been looked down upon in Hindu
mythology, because the ancient texts, the Puranas, consist of stories of glory and
valour, and taking one’s own life would bring shame and disgust within
society. Many ancient folk songs also glorify “death at war.” Suicide also
appears in the Ramayana, where sage Dadhichi is said to have taken his own life
right after Ram went into “samadhi”. The Bhagavad Gita also condemns suicide
for selfish reasons and holds that such a death does not deserve the last rites.
The Brahmanical view held that those who attempt suicide should fast for a
stipulated period. The Upanishads, the holy scriptures, also condemn suicide and
state that ‘he who takes his own life will enter the sunless areas covered by
impenetrable darkness after death’.
Despite all this condemnation, the Vedas permit suicide on religious grounds,
stating that the best sacrifice is that of one’s own life. Suicide by
starvation, or “Sallekhana”, was linked with liberation,
or moksha. The ancient practice of Sati, in which a woman burned herself on the pyre
of her husband, persisted into the 20th century, with many cases reported.
It was practised by Rajput widows in order to protect their chastity
even after the death of their husbands.

1.5 EXISTING SYSTEM

• Around 800,000 people die by suicide every year across the globe.
• A prior suicide attempt is a major risk factor: people who have attempted
suicide before are at the highest risk of dying by suicide.
• Among teenagers aged 15-19, suicide has been found to be the
leading cause of death.
• 79% of global suicides occur in low- and middle-income countries.
• Ingestion of pesticide, hanging and firearms are the most common methods of
suicide around the world.

1.6 PROPOSED SYSTEM


My project examines the highest and the lowest numbers of suicides around the world. It
compares past trends with those of the present. For example, in the diagram
given below:

We see that in 2016, the number of suicides in India increased to 230,314. Suicide has
been the most common cause of death in the age groups of 15–29 years and 15–39
years. About 800,000 people die by suicide worldwide every year; of
these, 135,000 (17%) are residents of India, a nation with 17.5% of the world's population.
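
As a quick check of the share quoted above, the percentage can be recomputed from the two totals. The short Python sketch below uses only the figures already given in the text; the function name is illustrative.

```python
# Figures quoted in the text (annual deaths by suicide).
WORLD_SUICIDES = 800_000
INDIA_SUICIDES = 135_000


def share_percent(part: int, whole: int) -> float:
    """Return `part` expressed as a percentage of `whole`."""
    return 100.0 * part / whole


india_share = share_percent(INDIA_SUICIDES, WORLD_SUICIDES)
print(f"India's share of global suicides: {india_share:.1f}%")
```

The result rounds to the 17% stated in the text, which is close to India's 17.5% share of world population.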

Figure: Comparative analysis showing the absolute and relative changes.

1.7 SOFTWARE REQUIREMENTS
The project will be built using Cloudera running in VMware, with IPython (Python 3.6).
The working environment will be the Anaconda Distribution (Jupyter Notebook). It
uses the following data storage, processing and analysis tools:
1. HDFS
2. Sqoop
3. Hive
4. Pig
5. Business Intelligence

1.8 HARDWARE REQUIREMENTS

• Laptop with a minimum i3 (2 quad core) processor with a clock speed of 2.4
GHz
• 4 GB of RAM
• 1 GB of storage

1.9 FEASIBILITY STUDIES

1.9.1 Financial Feasibility

Being a standalone system, the project does not incur any hardware costs. The resources
required for successful testing and implementation are readily available. Source
code is written with reference to open-source materials. Hence, no costs are
incurred for proprietary licensing.

Further implementation may raise costs if we wish to bring the project to market
or open it for public use. Hardware costs may arise, including but not limited to cloud
space rental and faster processing units. Furthermore, if the project is to be combined
with any proprietary application, additional licensing costs might be incurred.

From this it is clear that the project is financially feasible.

1.9.2 Technical and Time Feasibility

The project is developed solely in Python 3.6 using open-source libraries that have
free support from their developers and friendly online forums. The time factor has
already been assessed beforehand, keeping in mind the errors and obstacles that
usually arise while handling such projects. The project can be completed in finite
time.

From this it is clear that the project is technically feasible.

1.9.3 Resource Feasibility

Resources required for implementing the project successfully are:

• Programming device (laptop) with specifications sufficient to handle the project
• Software: Python 3.6
• Programming tools (free and open source)

1.9.4 Risk Feasibility

• Risk associated with size
Since this project does not deal with extensive graphics and images, and is solely a
code-based programme, it can be carried out without significant risk. However, once we
open the project to the public, we might need databases and storage management
software to store the dataset.

• Technical risks
Is the technology new or obsolete?
The technology is a mixture of new and old, but none of it is obsolete. The technology has
extensive support from its developers and active online developer forums. Hence,
there is no risk in implementing the technology.
As discussed above, it is clear that the project has the required risk feasibility to be
implemented successfully.

1.10 PROJECT TIMELINE

STAGE I
TIME MANAGEMENT
Time management is the management of the time spent, and
progress made, on project tasks and activities. Excellent time
management in project management requires the planning,
scheduling, monitoring and controlling of all project activities.

STAGE II
NAVIGATING THE PROJECT:
Allocating proper sections to each aspect of the
project: introduction, objectives, analysis, scope,
references.

STAGE III
RESEARCH WORK:
After allocating appropriate time and navigation, the next
step of my project was to start researching. I started my
research on the suicide analysis around the world. I referred
to various sources like WHO and other reliable sources.

STAGE IV
STARTED WORKING ON THE PROJECT:
After researching about the various aspects of the project, I
started working on the same.

1.11 OVERVIEW OF THE REPORT
Big Data is a phrase used to mean a massive volume of both structured and
unstructured data that is so large it is difficult to process using traditional database and
software techniques. In most enterprise scenarios the volume of data is too big or it
moves too fast or it exceeds the current processing capacity.

This project analyses a data set related to suicides in different countries all over
the world. It includes information and raw data about the suicides
committed in different regions, the rates at which they occur, and the total number
of suicides in a particular area.

It gives a country-wise analysis of the growth and deviation in the number of
suicides committed. In this project, we have also used the time series
technique to resample the data month-wise or quarter-wise. The features of our project
are given below:

1. Framing the data sector-wise for better understanding.
2. Data munging for scrubbing or removing irrelevant data.
3. Visualization by different techniques for better interpretation.
3.1. Graphical visualization
3.2. Advanced plotting techniques
3.3. Geographical plotting
4. Predictive Analysis
5. Descriptive Analysis
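
The quarter-wise resampling mentioned above can be sketched in plain Python as follows. This is a minimal illustration using the standard library only; the monthly counts are made-up placeholder values and the function names are hypothetical, not taken from the project's code.

```python
from datetime import date
from typing import Dict, Iterable, Tuple


def quarter_label(day: date) -> str:
    """Map a date to a label such as '2016-Q1'."""
    quarter = (day.month - 1) // 3 + 1
    return f"{day.year}-Q{quarter}"


def resample_quarterly(monthly: Iterable[Tuple[date, int]]) -> Dict[str, int]:
    """Sum monthly counts into quarterly totals, in chronological order."""
    totals: Dict[str, int] = {}
    for day, count in sorted(monthly):
        label = quarter_label(day)
        totals[label] = totals.get(label, 0) + count
    return totals


# Placeholder monthly counts (January to June), for illustration only.
monthly_counts = [(date(2016, month, 1), 10 * month) for month in range(1, 7)]
quarterly = resample_quarterly(monthly_counts)
print(quarterly)
```

Month-wise resampling follows the same pattern with a label built from the year and month instead of the quarter.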

2. SYSTEM ANALYSIS AND DESIGN
2.1 Requirement Specifications
Hardware Specifications:

• Laptop with a minimum i3 (2 quad core) processor with a clock speed of 2.4
GHz
• 4 GB of RAM
• 1 GB of storage

Software Specifications:
The project will be built using Cloudera running in VMware, with IPython (Python 3.6).
The working environment will be the Anaconda Distribution (Jupyter Notebook). It
uses the following data storage, processing and analysis tools:
• HDFS
• Sqoop
• Hive
• Pig
• Business Intelligence
Operating System: Windows 8
Programming Language: Python 3.6
IDE platform: Python IDLE 3.6 (64 bit) / JupyterLab (Anaconda)

3. IMPLEMENTATION AND RESULTS

3.1 Assumptions and Dependencies


The tools the project may depend on are RapidMiner and Cognos Insight.
Statistical views and exploratory plots are used to uncover unknown data
elements and the relationships among different attributes.

3.2 Queries
1) Find the number of suicides committed by people above the age of 75.
2) Find how many children between the ages of 5 and 14 years have committed suicide.
3) Find the records of suicides in the year 1999.
4) Find the names of countries with GDP per capita above 20,000.
5) Find the countries with a population of less than 10,000.
6) View all the records of suicides all over the world.
7) Find the average suicides per 100k population.
8) Find the total number of males and females in the list.
9) Find the distinct countries whose names end with “ia” (pattern “%ia”).
10) Find the records of all the females.

3.3 Coding on Tools

Hadoop Distributed File System


Step 1: Moving data into Hadoop Distributed File System (HDFS)
Before running queries, the data must be made available to each tool. For queries in
Hive we load the data into HDFS, and for queries in Pig we use the data from the
local file system.
The snapshot also contains some hadoop fs commands, including mkdir to create a
directory in the Hadoop file system.
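
Since the snapshot itself is not reproduced here, the step above can be sketched as follows. The Python helper below only builds and prints the `hadoop fs` commands (a dry run) and calls Hadoop only if asked to. The file name `master.csv` and the HDFS directory `/user/cloudera/suicide` are assumptions chosen for illustration, not paths taken from the report.

```python
import shlex
import subprocess
from typing import List


def build_hdfs_load_commands(local_csv: str, hdfs_dir: str) -> List[List[str]]:
    """Return the `hadoop fs` invocations that copy a local file into HDFS."""
    return [
        ["hadoop", "fs", "-mkdir", "-p", hdfs_dir],     # create the target directory
        ["hadoop", "fs", "-put", local_csv, hdfs_dir],  # copy the local file into HDFS
        ["hadoop", "fs", "-ls", hdfs_dir],              # list the directory to confirm
    ]


def run_commands(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for argv in commands:
        print(" ".join(shlex.quote(arg) for arg in argv))
        if not dry_run:
            subprocess.run(argv, check=True)


# Hypothetical paths, chosen only for illustration.
commands = build_hdfs_load_commands("master.csv", "/user/cloudera/suicide")
run_commands(commands)  # dry run: prints the commands without calling Hadoop
```

Passing `dry_run=False` on a machine where Hadoop is installed would execute the same three commands in order.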

SQOOP:
Load data into the Sqoop environment.
1. Open the terminal.
2. Write the sqoop command on the terminal.
3. This will open the shell.
4. Now create the table in SQL and insert data into it from Sqoop.

HIVE

Load data into the Hive environment.

1. Open the terminal.
2. Write the hive command on the terminal.
3. This will open the Hive shell.
4. Now create the table in Hive and load data into it from HDFS.
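
Step 4 can be sketched as follows. The Python snippet below only generates the HiveQL text for creating a table and loading a file from HDFS; it does not connect to Hive. The table name, column names, column types and HDFS path are all assumptions made for illustration, since the report does not list the schema.

```python
from typing import List, Tuple

# Assumed schema: names and types are illustrative, not taken from the report.
COLUMNS: List[Tuple[str, str]] = [
    ("country", "STRING"),
    ("year", "INT"),
    ("sex", "STRING"),
    ("age", "STRING"),
    ("suicides_no", "INT"),
    ("population", "BIGINT"),
    ("suicides_per_100k", "DOUBLE"),
    ("gdp_per_capita", "INT"),
]


def create_table_statement(table: str) -> str:
    """Build a HiveQL CREATE TABLE statement for a comma-delimited text file."""
    cols = ",\n  ".join(f"{name} {hive_type}" for name, hive_type in COLUMNS)
    return (
        f"CREATE TABLE IF NOT EXISTS {table} (\n  {cols}\n)\n"
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY ','\n"
        "STORED AS TEXTFILE;"
    )


def load_statement(table: str, hdfs_path: str) -> str:
    """Build a HiveQL statement that moves a file already in HDFS into the table."""
    return f"LOAD DATA INPATH '{hdfs_path}' INTO TABLE {table};"


# Hypothetical table name and HDFS path.
create_sql = create_table_statement("suicides")
load_sql = load_statement("suicides", "/user/cloudera/suicide/master.csv")
print(create_sql)
print(load_sql)
```

The printed statements can then be pasted into the Hive shell opened in step 3.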

Query 1: Find the number of suicides committed by people above the age of 75.

Query 2: Find how many children between the ages of 5 and 14 years have committed
suicide.

Query 3: Find the records of suicides in the year 1999.

Query 4: Find the names of countries with GDP per capita above 20,000.

Query 5: Find the countries with a population of less than 10,000.

Query 6: View all the records of suicides all over the world.

Query 7: Find the average suicides per 100k population.

Query 8: Find the total number of males and females in the list.

Query 9: Find the distinct countries whose names end with “ia” (pattern “%ia”).

Query 10: Find the records of all the females.
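
The queries above are plain SELECT statements, so their logic can be sketched without a Hive installation. The Python example below uses the standard-library `sqlite3` module as a stand-in for Hive; for simple queries like these the SQL is very similar to HiveQL, though the two dialects are not identical. The table name, column names, age-group labels and all data values are assumptions made up for illustration.

```python
import sqlite3

# Tiny made-up sample; the values are placeholders, not real statistics.
# Columns: country, year, sex, age, suicides_no, population,
#          suicides_per_100k, gdp_per_capita
ROWS = [
    ("Australia", 1999, "male", "75+ years", 10, 100_000, 10.0, 25_000),
    ("Australia", 1999, "female", "5-14 years", 1, 200_000, 0.5, 25_000),
    ("Albania", 2000, "male", "5-14 years", 2, 50_000, 4.0, 1_500),
    ("Japan", 2000, "female", "75+ years", 4, 8_000, 50.0, 30_000),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE suicides (country TEXT, year INTEGER, sex TEXT, age TEXT, "
    "suicides_no INTEGER, population INTEGER, suicides_per_100k REAL, "
    "gdp_per_capita INTEGER)"
)
conn.executemany("INSERT INTO suicides VALUES (?, ?, ?, ?, ?, ?, ?, ?)", ROWS)


def scalar(sql: str):
    """Run a query that returns a single value."""
    return conn.execute(sql).fetchone()[0]


def column(sql: str):
    """Run a query and return its first column as a list."""
    return [row[0] for row in conn.execute(sql)]


# Query 1 and Query 2: total suicides in one age group.
q1 = scalar("SELECT SUM(suicides_no) FROM suicides WHERE age = '75+ years'")
q2 = scalar("SELECT SUM(suicides_no) FROM suicides WHERE age = '5-14 years'")
# Query 3: all records for one year.
q3 = conn.execute("SELECT * FROM suicides WHERE year = 1999").fetchall()
# Query 4 and Query 5: countries filtered on a numeric column.
q4 = column("SELECT DISTINCT country FROM suicides "
            "WHERE gdp_per_capita > 20000 ORDER BY country")
q5 = column("SELECT DISTINCT country FROM suicides "
            "WHERE population < 10000 ORDER BY country")
# Query 7: average of the per-100k rate.
q7 = scalar("SELECT AVG(suicides_per_100k) FROM suicides")
# Query 8: number of records per sex.
q8 = dict(conn.execute("SELECT sex, COUNT(*) FROM suicides GROUP BY sex"))
# Query 9: distinct country names ending in "ia".
q9 = column("SELECT DISTINCT country FROM suicides "
            "WHERE country LIKE '%ia' ORDER BY country")
# Query 10: all records for females.
q10 = conn.execute("SELECT * FROM suicides WHERE sex = 'female'").fetchall()

print(q1, q2, len(q3), q4, q5, q7, q8, q9, len(q10))
```

Query 6 is simply `SELECT * FROM suicides` and is omitted for brevity.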

PIG

Load data into the Pig environment.
1. Open the terminal.
2. Write the Pig command (pig -x local) on the terminal.
3. This will open the grunt shell in local mode.
4. Now create a relation and load data into it from the local file system.

PIG FS COMMAND

DESCRIBE COMMAND IN PIG

ILLUSTRATE COMMAND IN PIG:

QUERIES:

Query 1: List the data grouped by country.

Query 2: List the country, year and sex from the dataset using LIMIT.

Query 3: List the data for the country 'Australia'.
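
To show what these three Pig queries compute, the sketch below expresses the same operations in plain Python. The comments name the Pig Latin operators each function roughly corresponds to (GROUP, FOREACH ... GENERATE with LIMIT, and FILTER). The records and the relation name `data` are made up for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Record = Tuple[str, int, str]  # (country, year, sex)

# Placeholder records, for illustration only.
RECORDS: List[Record] = [
    ("Australia", 1999, "male"),
    ("Albania", 2000, "female"),
    ("Australia", 2000, "female"),
    ("Japan", 2001, "male"),
]


def group_by_country(records: List[Record]) -> Dict[str, List[Record]]:
    """Query 1. Roughly: grouped = GROUP data BY country;"""
    groups: Dict[str, List[Record]] = defaultdict(list)
    for record in records:
        groups[record[0]].append(record)
    return dict(groups)


def first_n(records: List[Record], n: int) -> List[Record]:
    """Query 2. Roughly: FOREACH data GENERATE country, year, sex; then LIMIT n."""
    return records[:n]


def filter_country(records: List[Record], name: str) -> List[Record]:
    """Query 3. Roughly: FILTER data BY country == 'Australia';"""
    return [record for record in records if record[0] == name]


grouped = group_by_country(RECORDS)
limited = first_n(RECORDS, 2)
australia = filter_country(RECORDS, "Australia")
print(sorted(grouped), limited, australia)
```

The comparison in `filter_country` is case-sensitive, so the country name must be written exactly as it appears in the data.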

4. DATA VISUALIZATION
BAR GRAPH
This bar graph plots countries around the world against their population. Each bar
shows the population of one country.

PIE CHART

This pie chart is plotted for sex versus age. It shows the ratio in which suicides
have been committed across sex and age groups.

LINE GRAPH
This line graph is plotted between country-year and suicides per 100k.
The lines show the variation in suicides per 100k.
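
The "suicides per 100k" value on this graph is a rate: the number of suicides divided by the population, scaled to 100,000 people. The sketch below shows how a yearly rate could be derived from raw counts before plotting. The rows are made-up placeholder values and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def rate_per_100k(suicides: int, population: int) -> float:
    """Suicides per 100,000 population."""
    return suicides * 100_000 / population


def yearly_rates(rows: Iterable[Tuple[int, int, int]]) -> Dict[int, float]:
    """Aggregate (year, suicides, population) rows into one rate per year."""
    totals: Dict[int, List[int]] = defaultdict(lambda: [0, 0])
    for year, suicides, population in rows:
        totals[year][0] += suicides
        totals[year][1] += population
    return {
        year: rate_per_100k(suicides, population)
        for year, (suicides, population) in sorted(totals.items())
    }


# Placeholder rows: (year, suicides, population).
rows = [(1999, 10, 100_000), (1999, 5, 50_000), (2000, 30, 200_000)]
rates = yearly_rates(rows)
print(rates)
```

Summing suicides and population before dividing gives a population-weighted rate, which differs from averaging the per-group rates when the groups are of unequal size.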

AREA GRAPH

This area graph is plotted for country versus GDP per capita. It shows
how the different countries vary when plotted against their respective GDP per
capita ($).

CONCLUSION AND FUTURE ENHANCEMENTS

SUMMARY OF WORK DONE

As we have reached the end of the project report, we can conclude that the proposed
system can be successfully implemented to predict future trends in suicides
around the world with great accuracy. I have trained the system with random data
from various sources and have achieved the results as expected. With a proper database
and greater processing power, the capabilities of the system can be enhanced to
a great extent.

FUTURE SCOPE

In the near future, AI systems will be much more capable than those we have today. Since
the system is developed fully in Python 3.6, it can be imported as a module for
further development or integration into other complex programs. The scope of this
project is very wide; how it is used further depends only on the creativity of the
developer.

REFERENCES

• https://www.kaggle.com/
• https://www.who.int/
• https://www.afternic.com/forsale/data.govhealthcare.com?utm_source=TDFS_DASLNC&utm_medium=DASLNC&utm_campaign=TDFS_DASLNC&traffic_type=TDFS_DASLNC&traffic_id=daslnc&
• mhccinfo@mentalhealthcommission.ca.
• Turecki G, Brent DA. Suicide and suicidal behaviour. Lancet (London,
England) 2016; 387(10024): 1227-1239.
• Davis Molock S, Heekin JM, Matlin SG, Barksdale CL, Gray E, Booth CL.
The baby or the bath water? Lessons learned from the National Action Alliance
for Suicide Prevention Research Prioritization Task Force literature review.
American journal of preventive medicine 2014; 47(3 Suppl 2): S115-121.
• Table 102-0551. Suicides and suicide rate, by sex and by age group.
• Suicide rates: An overview, 2017. Catalogue no. 82-624-X;
www.statcan.gc.ca/pub/82-624-x/2012001/article/11696-eng.htm#n5.
• Langlois S and Morrison M, Statistics Canada. Suicide deaths and suicide
attempts, 2002.
• Findlay L. Statistics Depression and suicidal ideation among aged 15 to 24.
Catalogue no. 82-003-X ISSN 1209-1367, 2017.
• United Nations Development Program. (2018). Human development
index (HDI). Retrieved from http://hdr.undp.org/en/indicators/137506
• World Bank. (2018). World development indicators: GDP (current US$)
by country: 1985 to 2016. Retrieved
from http://databank.worldbank.org/data/source/world-development-indicators#
• [Szamil]. (2017). Suicide in the Twenty-First Century [dataset].
Retrieved from https://www.kaggle.com/szamil/suicide-in-the-twenty-first-century/notebook
• World Health Organization. (2018). Suicide prevention. Retrieved
from http://www.who.int/mental_health/suicide-prevention/en/
• World Health Organization. Suicide rates per 100,000 by country, year and
sex. [Last accessed on 2012 Mar 27]. Available
from: http://www.who.int/mental_health/prevention/suicide_rates/en/index.html
• Rajagopal S. Suicide pacts and the internet. BMJ. 2004;329:1298–9.
• Birbal R, Maharajh HD, Clapperton M, Jarvis J, Ragoonath A, Uppalapati K.
Cybersuicide and the adolescent population: Challenges of the future? Int J
Adolesc Med Health. 2009;21:151–9.
• Thomas K, Chang SS, Gunnell D. Suicide epidemics: The impact of newly
emerging methods on overall suicide rates - a time trends study. BMC Public
Health. 2011;11:314.
• International Association for Suicide Prevention. World Suicide Prevention
Day. Sep 10, [Last cited in 2011]. Available
from: http://www.iasp.info/wspd/2011_wspd.php
• https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3554961/
• Braun W. Sallekhana: The ethicality and legality of religious suicide by
starvation in the Jain religious community. Med Law. 2008;27:913–24.
• Bhugra D. Sati: A type of nonpsychiatric suicide. Crisis. 2005;26:73–7.
• World Health Organization. Global Burden of Disease. 2004. [Last cited in
2004]. Update. Available
from: http://www.who.int/healthinfo/global_burden_disease/GBD_report_2004update_full.pdf

RESEARCH PAPER

AUTHOR: Mukund Saroch

TITLE: Analysis of suicide rates around the world using Big Data

ACCEPTED IN: Manav Rachna International Institute of Research and Studies, Faridabad

DATED: July, 2020
