IJNGC Latex Research Paper

Depression Detection using Sentiment Analysis
of Social Media Posts

Aroop Kumar
Ashish
Kilaparthi Chaitanya Kumar
Saurabh Kulkarni
Prof. Shubhada Bhalerao
Prof. Sagar Rane
Army Institute of Technology, Pune, India
Depression has become a huge mayhem plaguing the world today. About 265 million individuals of all ages suffer
from depression worldwide. Of these, about 75% remain untreated, with one million individuals taking their lives
every year. Thus, depression is amongst the leading causes of suicide esp. amongst adolescents.
Social media platforms are becoming an inseparable part of people’s daily lives. They mirror the user’s personal
life as users share their happiness, joy, insecurities and sorrow on social media. These platforms are often utilized
by researchers to spot the causes of depression and retract it. Detection of early depression could prove to be an
enormous step in improving mental health of our society collectively.
Thus, to address our problem, we propose a stacking-based ensemble machine learning model which uses XGBoost
and Multinomial Naive Bayes as the base-learners and Logistic Regression as the meta-learner. The model is
developed for twitter data and would flag tweets which are found to be depressive. The stacked model produced a
very high accuracy and F1-Score, which is superior to any other standalone model proposed earlier. The project
would help us employ emotional AI in twitter which would in turn lead to lower suicide rates and improved mental
health.
Keywords: Depression, Twitter, Machine Learning, Naive Bayes, XGBoost, Stacking
1. INTRODUCTION
Depression is one of the most significant causes of disability in the world. In developing nations,
about 75% of people with mental illnesses go untreated, and nearly 1 million people commit
suicide each year due to mental health issues. Moreover, as per the World Health Organization
(WHO), one in every thirteen individuals worldwide suffers from anxiety. Depression is distinct
from natural mood swings and short-term emotional reactions to daily difficulties. Depression
can be dangerous to one’s health, particularly if it lasts for a long time and has a moderate or
extreme severity. It can make the person who is affected suffer greatly and perform poorly at
work, school, and in the family. Depression can lead to suicide in the worst-case scenario. A
depressive episode may be classified as mild, moderate, or extreme depending on the number and
severity of symptoms. There is also a distinction made between depression in individuals who
have or have not had manic episodes.
Social media has taken over people’s lives in a big way. Individuals are quite open to share their
feelings and experiences on these public platforms. This provides researchers a huge amount of
data for mining useful information out of it. One such use case is early depression detection
in social media users. For our model, we have used Twitter data as our input as it is publicly
available, easy to pre-process and has a liberal API.
In the proposed model, first a comprehensive dataset is prepared using various open source
International Journal of Next-Generation Computing, Vol. V, No. N, Month 20YY.

2 ·
datasets like Sentiment140 1 and TWINT GitHub data 2 . Also, the Twitter API has been used to
collect depressive tweets for reducing imbalance in our dataset. The dataset prepared in the above
step is cleaned, pre-processed and passed through a vectorizer, where resolution of the textual
data into features takes place. This vectorized data is passed through a stacking-based Machine
Learning model for training. After training is completed, the model is ready to give predictions
and produces test results. The model performed much better than other state-of-the-art models,
giving an accuracy of 96% and an F1-score of 93%.
2. BACKGROUND AND RELATED WORK

This is a field where there is a lot of study going on. Many scholars have used social media to
investigate mental health in recent years. Husseini Orabi et al. [2018] believe that social media
platforms can reflect users’ personal lives on many levels. Their main goal was to find the most
successful deep neural architecture among two of the most common deep learning approaches
in natural language processing: Convolutional Neural Networks (CNNs) and Recurrent Neural
Networks (RNNs).
Millennials are more willing to talking about their mental health difficulties on social media,
according to Naslund et al. [2017]. In recent years, machine learning (ML) has evolved dra-
matically, allowing for the solution of real-world issues as well as the construction of automated
systems.
Thorstad and Wolff [2019] predicted future mental disease based on posts from an individual’s
Reddit postings by collecting posts from clinical sub-reddits and then classified them to the as-
sociated mental disorder. After compiling the posts, clustering was used to identify the signs of
mental illness that were evident in their everyday spoken language.
Yang et al. [2016] created a database of facial expressions reacting to emotional stimuli from
patients with BD, UD, and safe controls. The AU profiles were created using the subject’s facial
expressions in the CHIMEI database to diagnose mood disorder. The MS was then used to diag-
nose mood disorders by analyzing the fluctuation of the AU profile series over a video segment.
Based on the findings of the mood disorder identification comparison, we can conclude that the
proposed ANN-based approach performed the best.
Katchapakirin et al. [2018] offer a method for detecting depression quickly and easily. Peo-
ple will be more conscious of their emotional states and seek psychological support as a result of
this. This research uses Natural Language Processing (NLP) techniques to develop a depression
detection algorithm for the Thai language on 35 users on the Facebook.
Detecting terms that convey negativity in a social media post is one step towards detecting
depressed moods, according to Tao et al. [2019]. The authors used a multistep approach that
helped us to identify potential users and then discover the terms that these users used to convey
negativity. Since the computation on these datasets was narrowed to just these chosen users, the
results showed that the sentiment of these terms could be collected and rated efficiently. They
also collected stress ratings, which were found to be highly associated with the content’s negative
sentiment.
Jain et al. [2019], the authors analyzed social media posts (especially twitter), conducted ques-
tionnaire and asked students and parents to give their opinion and also scrapped blogs on internet.
1 https://www.kaggle.com/kazanova/sentiment140
2 https://github.com/ram574/Detecting-Depression-in-Social-Media-via-Twitter-Usage

· 3
According to the research, major factors of depression among the age group of 15-29 which they
found during the course of the project are parental pressure, love, failures, bullying, body sham-
ing, inferiority complex, exam pressure, peer pressure, physical and sexual abuse etc. Among
future directions, the authors researched to understand how social media behavior analysis can
help in leading to development
Classification of major machine learning algorithms over certain parameters of interest is demon-
strated below:
Model/s
Ref [Year] Advantages and Limitations
Used
Advantages:
1. Capable of working with huge datasets.
Deshpande 2. Able to solve problems involving multiclass text categorization prediction.
Multinomial
and Rao Limitations:
Naı̈ve Bayes
[2017] 1. Low Precision.
2. Naı̈ve assumption of independence of features causes low correlation.
Advantages:
1. BiLSTM works well with sequence data and hence is a good model for NLP.
Combination 2. XGBoost reduces the effect of imbalance.
Cong et al.
of XGBoost Limitations:
[2018]
and BiLSTM 1. Long training duration.
2. LSTM prone to overfitting.
Advantages:
1. Deep learning models give better results than traditional models.
Convolutional 2. Capable of handling huge amounts of data.
Trotzek
Neural Net- Limitations:
et al. [2018]
works 1. Incapable of handling “long-term dependencies”.
2. Very high training time.
Advantages:
1. Training time is very low.
2. Can work with small datasets efficiently.
Asad et al. Naı̈ve Bayes
Limitations:
[2019] and SVM
1. Low precision and accuracy.
2. Suffers from Zero-frequency problem.
Advantages:
1. Low training time.
Random For- 2. RF reduces overfitting through a voting mechanism.
est, Support 3. NB works well with text data.
Tariq et al. Vector Ma- Limitations:
[2019] chine (SVM), 1. NB is prone to Zero-frequency problem.
Naive Bayes 2. SVM does not work well with noisy data.
(NB) 3. RF lacks interpretability and fails to evaluate the relevance of each variable
due to the ensemble of decision trees.
Advantages:
1. When compared to individual base learners, the stacking model significantly
increased prediction accuracy.
Stacking of
2. Reduced variance.
Mahendran KNN, Ran-
Limitations:
et al. [2020] dom Forest
1. High training time makes it unsuitable for large datasets.
and SVM
2. The model with a significant bias will struggle to learn the relationship between
the qualities.
Table I: Summary of different methodologies with their model, advantages and disadvantages.

4 ·
From the literature reviewed in this section, various models were discovered to work well with
textual data . These models have been compared on the basis of suitable parameters in Table II
and Figure I.
Table II: Metric-based comparison of models working with text data. (Literature Review).
Figure I: F1-Score Comparison (Literature Review).

· 5
3. PROPOSED METHODOLOGY
3.1 Drawbacks of Previous Models
(1) Deep Learning models like Multi-Layer Perceptron (MLP) give better results for large datasets
but social media datasets for depression detection are generally small and thus deep learning
models lead to wastage of time and effort.
(2) The datasets are highly imbalanced which makes Machine Learning models very inefficient
as they become biased.
(3) None of the algorithms provides exceptional accuracy along with high F1-Score which makes
them susceptible to erroneous results.
3.2 Proposed Idea

Most of the algorithms described in our survey of the topic are state-of-the-art algorithms which
work very well with balanced datasets. But we are dealing with highly imbalanced datasets and
thus, our algorithm must have the ability to remove this imbalance and avoid any bias.
Our model will be developed as a stacking generalized combination of XGBoost and Multinomial
Naive Bayes. This hybrid model would be able to extract most relevant information using the
power of both algorithms. Here, the XGBoost and Multinomial Naı̈ve Bayes models would act
as the base-learners for our stack while a Logistic Regression classifier will act as the meta-
model. The stacked model produces very accurate results and a very good F1-score, surpassing
all state-of-the-art models.
3.3 Data Collection

Since our model requires tweets as input, it becomes important to collect sufficient number of
tweets in our dataset so that our model can train well. For this phase we will integrate data from
various open-source datasets with some data scraped by us through the use of Tweepy library.
The open-source datasets can be obtained from Sentiment140 3 and TWINT GitHub data 4 . These
datasets on their own are highly imbalanced in favor of non-depressive tweets. Hence, we scrape
our own data from twitter with Tweepy and search tweets with depressive keywords like “de-
pression”, “anxiety”, “#depression is real”, “suicide” etc. After scraping is completed, we can
integrate all the data together to obtain our final dataset.
Our final dataset contains 10,85,066 tweets out of which about 19% is depressive and rest 81%
is non-depressive. Clearly, this dataset is skewed heavily and needs to be dealt with carefully
(Refer Figure II).
Figure II: Dataset distribution of depressive and non-depressive tweets.
3 https://www.kaggle.com/kazanova/sentiment140
4 https://github.com/ram574/Detecting-Depression-in-Social-Media-via-Twitter-Usage

6 ·
3.4 Data Preprocessing
After preparing the dataset the next task is to clean the data and convert it to a form suitable
for machine learning. Data cleaning involves the following steps: -
(1) Convert text to Lowercase.
(2) Remove Stopwords.
(3) Elongate shorthand notations (e.g., I’ll to I will, didn’t to did not etc.).
(4) Remove URL links and user mentions (e.g., @AroopKumar9).
(5) Remove non-alphabet characters like emojis and numeric data.
(6) Lemmatize the text (e.g., [historical, historically] to history).
After performing the above cleaning operations, the data is passed through a vectorizer (CountVec-
torizer) for Tokenization. Now, each tweet will be stored as a list of tokens. Thus, our data is
ready for training.
3.5 Model Selection and Hyperparameter Tuning
Various models like Logistic Regression, SVM, Naı̈ve Bayes (Multinomial, Bernoulli), Random
Forest and XGBoost were trained and tested against the dataset. It was observed that XGBoost
gave the best accuracy and good precision but a very bad recall. While, Multinomial Naı̈ve Bayes
(MNB) gave a modest accuracy and precision but a high recall. Hence, its ability to classify de-
pressive tweets correctly was better than XGBoost (XGB). Hence, these were the two models
selected as the base models for our stack.
These two models were further improved by tuning their hyperparameters using GridSearchCV.
For MNB the parameter tuned was alpha while for XGB it was n estimators, max depth and
gamma.
3.6 Stacking Generalization (Proposed Model)
Stacking is an ensemble technique of combining various heterogenous base models, training them
on the original data, passing the results obtained to a next meta-model and training it on these
predictions and then predicting the final value. Hence, stacking helps in developing hybrid mod-
els which could perform better than the standalone base models.
Our stacked model has been built using Multinomial Naı̈ve Bayes (MNB) and XGBoost (XGB)
classifiers as the base models and Logistic regression as the meta-model. This has been depicted
in Figure III.
Figure III: Architecture of proposed model.

· 7
4. FINDINGS AND RESULTS

For testing the model, we need to define the metrics upon which the models must be evaluated.
In our case the metrics defined are: Precision, Recall, F1-score (primary) and Accuracy
(secondary). Evaluating the model using the traditional train-test split method will not be of
much use due to imbalanced distribution. Hence, we use Stratified 5-Fold Cross Validation for
obtaining our prediction results.
The stacking model (MNB + XGB) was tested against standalone MNB and XGB models to
get a good understanding of the properties of our model. The results for these are shown in the
below table (Table III): -
Table III: Comparison of state-of-the-art models with stacked model.
The results clearly show that the stacked model is performing exceedingly well on both our pri-
mary metric (F1-score) and secondary metric (Accuracy). The Multinomial Naive Bayes model
on its own gives a very good recall (low false negatives) but has a relatively low precision (high
False Positivity Rate). On the other hand, the XGBoost model gives an excellent precision (low
false positives) but a low recall (high False Negativity Rate). The stacked model utilizes the best
of both these algorithms, thereby giving exceptional values for precision and recall. As a result,
the F1-score and the Accuracy obtained for the stacked model is superior to that of any model
proposed earlier.
The final results for the 5-fold cross validation have been represented in the form of box-plots in
the following diagrams: -

8 ·
Figure IV: Comparison of F1-score.
Table V: Comparison of Accuracy.

· 9
5. CONCLUSION
To sum up, it has been well established that depression is one of the leading issues faced by our
society. Detecting depression early can play a major role in preventing suicides and improving
mental health of the society collectively. Social media has been a revelation in sentiment analysis
and can be used to effectively tackle depression.
Our project uses twitter data to train a stack ensembled model consisting of Multinomial Naı̈ve
Bayes and XGBoost base-models along with a Logistic Regression meta-classifier. The results
have shown convincingly that on both F1-score and Accuracy metrics, the proposed stacking
model has had superior results to the state-of-the-art models. With a F1-score of 93% and an
Accuracy of 96%, our model stands out as one of the best performers among all models developed
to date.
This project could go a long way in integrating emotional AI with social media for eradicat-
ing depression from our society.
References
Asad, N. A., Mahmud Pranto, M. A., Afreen, S., and Islam, M. M. 2019. Depression
detection by analyzing social media posts of user. In 2019 IEEE International Conference
on Signal Processing, Information, Communication Systems (SPICSCON). 13–17.
Cong, Q., Feng, Z., Li, F., Xiang, Y., Rao, G., and Tao, C. 2018. X-a-bilstm: a deep
learning approach for depression detection in imbalanced data. In 2018 IEEE International
Conference on Bioinformatics and Biomedicine (BIBM). 1624–1627.
Deshpande, M. and Rao, V. 2017. Depression detection using emotion artificial intelligence.
In 2017 International Conference on Intelligent Sustainable Systems (ICISS). 858–862.
Husseini Orabi, A., Buddhitha, P., Husseini Orabi, M., and Inkpen, D. 2018. Deep
learning for depression detection of twitter users. 88–97.
Jain, S., Narayan, S. P., Dewang, R. K., Bhartiya, U., Meena, N., and Kumar, V.
2019. A machine learning based depression analysis and suicidal ideation detection system
using questionnaires and twitter. In 2019 IEEE Students Conference on Engineering and
Systems (SCES). 1–6.
Katchapakirin, K., Wongpatikaseree, K., Yomaboot, P., and Kaewpitakkun, Y. 2018.
Facebook social media for depression detection in the thai community. In 2018 15th Inter-
national Joint Conference on Computer Science and Software Engineering (JCSSE). 1–6.
Mahendran, N., Durai Raj Vincent, P. M., Srinivasan, K., Sharma, V., and Jayakody,
D. K. 2020. Realizing a stacking generalization model to improve the prediction accuracy
of major depressive disorder in adults. IEEE Access 8, 49509–49522.
Naslund, J., Aschbrenner, K., Mchugo, G., Unützer, J., Marsch, L., and Bartels, S.
2017. Exploring opportunities to support mental health care using social media: A survey
of social media users with mental illness. Early Intervention in Psychiatry 13.
Tao, X., Dharmalingam, R., Zhang, J., Zhou, X., Li, L., and Gururajan, R. 2019.
Twitter analysis for depression on social networks based on sentiment and stress. In 2019 6th
International Conference on Behavioral, Economic and Socio-Cultural Computing (BESC).
1–4.
Tariq, S., Akhtar, N., Afzal, H., Khalid, S., Mufti, M. R., Hussain, S., Habib, A.,
and Ahmad, G. 2019. A novel co-training-based approach for the classification of mental
illnesses using social media posts. IEEE Access 7, 166165–166172.
Thorstad, R. and Wolff, P. 2019. Predicting future mental illness from social media: a big
data approach.
10 ·
Trotzek, M., Koitka, S., and Friedrich, C. 2018. Utilizing neural networks and linguistic
metadata for early detection of depression indications in text sequences. IEEE Transactions
on Knowledge and Data Engineering 32, 588–601.
Yang, T.-H., Wu, C.-H., Su, M.-H., and Chang, C.-C. 2016. Detection of mood disorder
using modulation spectrum of facial action unit profiles. In 2016 International Conference
on Orange Technologies (ICOT). 5–8.

IJNGC Latex Research Paper

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

IJNGC Latex Research Paper

Uploaded by

Copyright:

Available Formats

Depression Detection using Sentiment Analysis

of Social Media Posts

Army Institute of Technology, Pune, India

Keywords: Depression, Twitter, Machine Learning, Naive Bayes, XGBoost, Stacking

International Journal of Next-Generation Computing, Vol. V, No. N, Month 20YY.

2. BACKGROUND AND RELATED WORK

International Journal of Next-Generation Computing, Vol. V, No. N, Month 20YY.

International Journal of Next-Generation Computing, Vol. V, No. N, Month 20YY.

Figure I: F1-Score Comparison (Literature Review).

International Journal of Next-Generation Computing, Vol. V, No. N, Month 20YY.

3.2 Proposed Idea

3.3 Data Collection

Figure II: Dataset distribution of depressive and non-depressive tweets.

International Journal of Next-Generation Computing, Vol. V, No. N, Month 20YY.

Figure III: Architecture of proposed model.

International Journal of Next-Generation Computing, Vol. V, No. N, Month 20YY.

4. FINDINGS AND RESULTS

Table III: Comparison of state-of-the-art models with stacked model.

International Journal of Next-Generation Computing, Vol. V, No. N, Month 20YY.

Figure IV: Comparison of F1-score.

Table V: Comparison of Accuracy.

International Journal of Next-Generation Computing, Vol. V, No. N, Month 20YY.

International Journal of Next-Generation Computing, Vol. V, No. N, Month 20YY.

You might also like