Da Project Report

Title:-

Tweeting the Pulse: Unveiling Sentiments on India's Education Policy 2020

Introduction:-
In the digital age, social media platforms have become central hubs for public discourse,
providing a dynamic arena where individuals express their opinions, concerns, and
aspirations. Amidst this backdrop, the introduction of the Education Policy 2020 by the
Indian government marks a significant milestone in the country's educational landscape. This
comprehensive reform initiative aims to address the evolving needs and challenges of the
education system, ushering in a new era of modernization and inclusivity. However, the
success and impact of such policy reforms heavily depend on public acceptance, engagement,
and feedback.
Understanding the sentiments and attitudes of stakeholders towards the Education Policy
2020 is paramount for policymakers seeking to assess its effectiveness and address potential
concerns. To this end, our project focuses on harnessing the power of machine learning and
natural language processing techniques to analyze sentiments expressed on Twitter regarding
the Education Policy 2020 in India. By systematically categorizing tweets into positive,
negative, or neutral sentiments, we aim to uncover underlying patterns, trends, and insights
within the vast volume of social media data.
Through this interdisciplinary approach, we seek to empower policymakers, educators, and
other stakeholders with actionable insights derived from social media analytics. By enhancing
the evaluation process and fostering greater public engagement, our project endeavors to
contribute to a more transparent, inclusive, and responsive policymaking process in the realm
of educational reforms. By unraveling the intricate dynamics of sentiment expression in
educational policy discourse, we aspire to facilitate informed decision-making and
consensus-building around India's Education Policy 2020.

Literature review:
The literature on sentiment analysis in social media highlights its significance in understanding public
perceptions towards various topics, including educational policies. Previous studies have employed
diverse methodologies, including machine learning techniques, to analyze sentiment trends and
opinions expressed on platforms like Twitter. While such analyses offer valuable insights, challenges
such as data noise and bias remain. Additionally, there is a growing recognition of the need for
context-aware sentiment analysis to accurately capture the nuances of policy discourse. Our project
builds upon this existing literature to develop a tailored sentiment analysis approach for evaluating
public sentiments towards India's Education Policy 2020 on Twitter.

Data Preprocessing:
In order to prepare the dataset for sentiment analysis, several
preprocessing steps were performed:
 Renaming Fields:
The field names were modified to better reflect their contents. Specifically, the
"clear_text" field was renamed to "tweets," and the "category" field was
renamed to "sentiments" for clarity and consistency.
 Sentiment Label Conversion:
To facilitate sentiment analysis, the numerical sentiment labels in the
"sentiments" field were converted to more intuitive categorical labels. The
mapping was as follows:
Sentiment label -1 was replaced with "negative."
Sentiment label 0 was replaced with "neutral."
Sentiment label 1 was replaced with "positive."
This conversion enhances the interpretability of the sentiment analysis results
and aligns with common conventions for sentiment labeling in natural language
processing tasks.
 Data Cleaning:
The tweets in the "tweets" field underwent data cleaning procedures to remove
noise, irrelevant characters, and special symbols. This step involved removing
mentions, hashtags, URLs, and non-alphanumeric characters that do not
contribute to sentiment analysis.
 Tokenization and Normalization:
The cleaned tweets were tokenized into individual words or tokens and underwent
normalization processes such as lowercasing to ensure uniformity in text
representation. This step prepares the text data for feature extraction and
subsequent analysis.
 Handling Missing Data:
A check was conducted to identify and handle any missing or null values in the
dataset. Depending on the extent of missing data, strategies such as imputation or
removal of incomplete records were employed to maintain data integrity.
 Data Splitting:
Finally, the preprocessed dataset was partitioned into training, validation, and test
sets to facilitate model development, evaluation, and validation. The data splitting
process ensures that the model's performance can be accurately assessed on
unseen data.
 Summary:
The preprocessing steps outlined above lay the foundation for
conducting sentiment analysis on Twitter data related to India's Education Policy
2020. By transforming raw tweet text into structured data with categorical
sentiment labels, we have prepared the dataset for subsequent modeling and
analysis.
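
The preprocessing steps above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not the project's actual code: the sample rows, the helper names (clean_tweet, tokenize, preprocess, split), the choice to drop rows with missing values rather than impute them, and the 70/15/15 split ratio are all assumptions made for the example. Only the field names ("clear_text", "category", "tweets", "sentiments") and the label mapping come from the report.

```python
import random
import re

# Hypothetical raw records using the original field names from the report.
raw_rows = [
    {"clear_text": "Loving the new policy! https://example.com #NEP2020", "category": 1},
    {"clear_text": "@user this reform ignores rural schools", "category": -1},
    {"clear_text": "Policy document released today", "category": 0},
    {"clear_text": None, "category": 1},  # missing tweet text
]

# Label mapping stated in the report.
LABELS = {-1: "negative", 0: "neutral", 1: "positive"}


def clean_tweet(text):
    """Remove URLs, mentions, hashtags and non-alphanumeric characters."""
    text = re.sub(r"https?://\S+", " ", text)    # URLs
    text = re.sub(r"[@#]\w+", " ", text)         # mentions and hashtags (assumed: whole tag dropped)
    text = re.sub(r"[^A-Za-z0-9\s]", " ", text)  # remaining symbols
    return re.sub(r"\s+", " ", text).strip()


def tokenize(text):
    """Lowercase the cleaned text and split it on whitespace."""
    return text.lower().split()


def preprocess(rows):
    """Rename fields, map labels, drop incomplete rows, clean and tokenize."""
    out = []
    for row in rows:
        text, label = row.get("clear_text"), row.get("category")
        if text is None or label not in LABELS:
            continue  # assumed strategy: remove incomplete records
        out.append({
            "tweets": tokenize(clean_tweet(text)),
            "sentiments": LABELS[label],
        })
    return out


def split(rows, train=0.7, val=0.15, seed=0):
    """Shuffle and partition into training, validation and test sets."""
    rows = rows[:]
    random.Random(seed).shuffle(rows)
    n_train = int(len(rows) * train)
    n_val = int(len(rows) * val)
    return rows[:n_train], rows[n_train:n_train + n_val], rows[n_train + n_val:]


data = preprocess(raw_rows)
train_set, val_set, test_set = split(data)
```

On a real dataset the same steps would more likely be written with a data-frame library; the plain-Python form is used here only so that the example runs without extra dependencies.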

Learning Method: Logistic Regression


Logistic regression is a widely used statistical technique for
classification. It is best known in its binary form, but it extends to
more than two classes through a multinomial (softmax) or one-vs-rest
formulation, which makes it suitable for sentiment analysis on Twitter
data related to India's Education Policy 2020. In this project, logistic
regression serves as the core learning method for training a predictive
model to classify tweets into negative, neutral, and positive sentiment
categories.
 Model Representation:
The logistic regression model represents the relationship between the
input features (word occurrences in tweets) and the probability of each
sentiment class (negative, neutral, positive) using the logistic function
(or its multiclass generalization, the softmax function).
Mathematically, logistic regression models the log-odds of the probability
of a tweet belonging to a particular sentiment class as a linear
combination of the input features.
 Training Procedure:
During the training phase, the logistic regression algorithm iteratively
adjusts the model parameters (weights) to minimize the logistic loss
function, which measures the discrepancy between the predicted
probabilities and the true sentiment labels in the training dataset. Gradient
descent optimization techniques are commonly employed to efficiently
update the model parameters and converge to the optimal solution.
 Feature Extraction:
The feature extraction process involves converting the preprocessed tweet
text into numerical feature vectors, typically using the Bag-of-Words
(BoW) representation. Each tweet is represented as a sparse vector of word
occurrences, capturing the presence or absence of specific words in the
tweet. These feature vectors serve as input to the logistic regression model.
 Model Interpretability:
One of the key advantages of logistic regression is its interpretability. The
model coefficients (weights) associated with each feature provide insights
into the importance of different words or tokens in predicting sentiment.
This interpretability allows stakeholders to understand the underlying
factors driving sentiment classification decisions.
 Performance Evaluation:
The performance of the logistic regression model is evaluated using
standard evaluation metrics such as accuracy, precision, recall, and F1-
score on a separate validation dataset. These metrics provide a
comprehensive assessment of the model's predictive performance across
different sentiment categories, guiding the selection of hyperparameters
and fine-tuning of the model.
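
The feature extraction, model representation, training, and interpretability points above can be illustrated with a small self-contained sketch. It is not the project's implementation, which the report does not show: the toy training tweets, the softmax (multinomial) formulation, the learning rate, the number of epochs, and all function names are assumptions chosen for the example. In practice a library implementation would normally be used instead of hand-written gradient descent.

```python
import math
from collections import Counter

CLASSES = ["negative", "neutral", "positive"]

# Hypothetical tokenized training tweets with their sentiment labels.
train = [
    (["great", "policy", "for", "students"], "positive"),
    (["good", "reform", "great", "step"], "positive"),
    (["bad", "policy", "poor", "planning"], "negative"),
    (["poor", "reform", "bad", "idea"], "negative"),
    (["policy", "announced", "today"], "neutral"),
    (["reform", "document", "released", "today"], "neutral"),
]

vocab = sorted({w for tokens, _ in train for w in tokens})
index = {w: i for i, w in enumerate(vocab)}


def bow(tokens):
    """Bag-of-Words count vector; out-of-vocabulary words are ignored."""
    vec = [0.0] * len(vocab)
    for word, count in Counter(tokens).items():
        if word in index:
            vec[index[word]] = float(count)
    return vec


def softmax(scores):
    """Turn class scores into probabilities (shifted for numerical stability)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_proba(weights, bias, x):
    """Linear score per class, followed by softmax."""
    scores = [sum(w * xi for w, xi in zip(weights[k], x)) + bias[k]
              for k in range(len(CLASSES))]
    return softmax(scores)


def fit(samples, lr=0.5, epochs=200):
    """Batch gradient descent on the mean cross-entropy (logistic) loss."""
    weights = [[0.0] * len(vocab) for _ in CLASSES]
    bias = [0.0] * len(CLASSES)
    data = [(bow(tokens), CLASSES.index(label)) for tokens, label in samples]
    for _ in range(epochs):
        grad_w = [[0.0] * len(vocab) for _ in CLASSES]
        grad_b = [0.0] * len(CLASSES)
        for x, y in data:
            probs = predict_proba(weights, bias, x)
            for k in range(len(CLASSES)):
                err = probs[k] - (1.0 if k == y else 0.0)  # dLoss/dScore_k
                grad_b[k] += err
                for j, xj in enumerate(x):
                    grad_w[k][j] += err * xj
        for k in range(len(CLASSES)):
            bias[k] -= lr * grad_b[k] / len(data)
            for j in range(len(vocab)):
                weights[k][j] -= lr * grad_w[k][j] / len(data)
    return weights, bias


def predict(weights, bias, tokens):
    """Return the class with the highest predicted probability."""
    probs = predict_proba(weights, bias, bow(tokens))
    return CLASSES[probs.index(max(probs))]


weights, bias = fit(train)
print(predict(weights, bias, ["great", "reform"]))

# Interpretability: words with the largest weights for the positive class.
pos = CLASSES.index("positive")
top_positive = sorted(vocab, key=lambda w: weights[pos][index[w]], reverse=True)[:3]
print(top_positive)
```

Sorting the vocabulary by a class's weights, as in the last lines, is one simple way to read off which words push a tweet towards that class.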

Testing & Result:


Following model training, the logistic regression model reached an
accuracy of 83% on the test data, meaning that it assigned the
correct sentiment label to roughly five out of every six test
tweets on India's Education Policy 2020. This result indicates
that the model can classify sentiments with reasonable accuracy
and offer useful insights into public perceptions and attitudes.

Accuracy:-
Testing resulted in an 83% accuracy rate, confirming the logistic
regression model's effectiveness in sentiment analysis on Twitter
data.
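
For reference, accuracy and the per-class precision, recall, and F1-score mentioned under Performance Evaluation can be computed as in the sketch below. The label lists are hypothetical and are not the project's test data, so the numbers they produce are unrelated to the 83% reported here; the function names are likewise illustrative.

```python
CLASSES = ["negative", "neutral", "positive"]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def per_class_metrics(y_true, y_pred, label):
    """Precision, recall and F1-score for one sentiment class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)  # true positives
    fp = sum(t != label and p == label for t, p in pairs)  # false positives
    fn = sum(t == label and p != label for t, p in pairs)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Hypothetical true and predicted labels for six tweets.
y_true = ["positive", "negative", "neutral", "positive", "negative", "neutral"]
y_pred = ["positive", "negative", "positive", "positive", "neutral", "neutral"]

print("accuracy:", round(accuracy(y_true, y_pred), 3))
for label in CLASSES:
    p, r, f = per_class_metrics(y_true, y_pred, label)
    print(label, round(p, 3), round(r, 3), round(f, 3))
```

Reporting the per-class figures alongside accuracy shows whether one sentiment class is being predicted much less reliably than the others, which a single accuracy number can hide.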

Conclusion:-
In conclusion, the logistic regression model demonstrated strong
performance with an 83% accuracy rate in sentiment analysis on
Twitter data pertaining to India's Education Policy 2020. This
underscores its efficacy in discerning public sentiments, aiding
policymakers in gauging policy acceptance and effectiveness.
The model's interpretability further enhances its utility by
providing insights into influential factors driving sentiment
classification. Moving forward, leveraging such models can
facilitate informed decision-making and foster greater public
engagement in educational reforms, ultimately contributing to a
more transparent and responsive policymaking process.

References:
1. Choi, H., & Varian, H. (2012). Predicting the Present with Google
Trends. Economic Record, 88, 2-9.
2. Mitchell, L., Frank, M. R., Harris, K. D., Dodds, P. S., &
Danforth, C. M. (2013). The Geography of Happiness:
Connecting Twitter Sentiment and Expression, Demographics,
and Objective Characteristics of Place. PLoS ONE, 8(5), e64417.
3. Agarwal, A., Xie, B., Vovsha, I., Rambow, O., & Passonneau, R.
(2011). Sentiment Analysis of Twitter Data. In Proceedings of the
Workshop on Languages in Social Media (pp. 30-38).
4. Pak, A., & Paroubek, P. (2010). Twitter as a Corpus for Sentiment
Analysis and Opinion Mining. In Proceedings of the Seventh
International Conference on Language Resources and Evaluation
(LREC'10) (pp. 1320-1326).
5. Pang, B., & Lee, L. (2008). Opinion Mining and Sentiment
Analysis. Foundations and Trends in Information Retrieval,
2(1-2), 1-135.
