IR Case Study Final Presentation

APPLICATIONS OF INFORMATION RETRIEVAL IN SOCIAL NETWORKING PLATFORMS
Information Retrieval Case Study

Under The Supervision Of - Dr. MAMTA RANI DAS
TEAM MEMBERS
Name                    Regd No.
Aman Kumar Singh        2101020403
Shrobona Sutradhar      2101020404
Aashutosh Raj           2101020405
Shivam Kumar Mahatha    2101020406
Aman Kumar Singh        2101020407
Ritik Kumar Singh       2101020408
Rupesh Kumar            2101020409
Ashish Ranjan           2101020410
INTRODUCTION
Social networking platforms leverage information retrieval techniques,
notably sentiment analysis, to understand user opinions and emotions.
This involves NLP, machine learning, and data mining to classify text
into positive, negative, or neutral sentiments. Applications include
targeted advertising, reputation management, trend monitoring, and
customer feedback analysis. Such analyses enhance user experience,
enable content personalization, and inform decision-making across
domains.
Sentiment analysis not only enhances user engagement but also provides
valuable insights for businesses, advertisers, and policymakers. By
understanding sentiment dynamics, platforms can adapt strategies to
better meet user needs and preferences.
PROBLEM STATEMENT
Despite advancements, sentiment analysis on social media faces
challenges due to the volume, dynamics, and nuances of user-generated
content. Noise, sarcasm, slang, and cultural variations complicate
accurate analysis, necessitating more robust, context-aware algorithms
for improved reliability.
DATASET
• The Sentiment140 dataset comprises 1,600,000 tweets collected using the Twitter API.
• Tweets are labelled for sentiment polarity with the codes negative (0), neutral (2), and
positive (4). The labels of the 1,600,000 training tweets were assigned automatically from
the emoticons in each tweet rather than by hand, and cover only the negative and positive
classes.
• Each entry in the dataset includes six fields: Target, IDs, Date, Flag, User, and Text.
• Target field holds the sentiment polarity code described above.
• IDs represent unique identifiers assigned to each tweet, aiding in referencing and
tracking.
• Date field specifies the timestamp of tweet posting, including details like day, month,
time, and time zone.
• Flag field records the query used for extraction, or NO_QUERY if there was no query.
• User field contains the username of the Twitter user who posted the tweet, facilitating
source identification and sentiment analysis of specific users.
• Text field comprises the tweet content, including hashtags, mentions, and URLs,
serving as primary data for sentiment analysis tasks.
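The six-field layout above can be read with a few lines of Python. The sketch below is illustrative only: it assumes a header-less CSV in the field order listed, uses two made-up rows rather than real tweets, and maps the polarity codes 0 and 4 to the 0/1 values reported in the results. The names `FIELDS`, `load_rows`, and `to_binary_label` are hypothetical.

```python
import csv
import io

# Column order as listed in the dataset description.
FIELDS = ["target", "ids", "date", "flag", "user", "text"]

# Made-up rows in the six-field layout; these are not real dataset entries.
SAMPLE_CSV = (
    '"0","1001","Sat May 16 23:58:44 UTC 2009","NO_QUERY","user_a","missed the bus again"\n'
    '"4","1002","Sat May 16 23:59:10 UTC 2009","NO_QUERY","user_b","loving the sunshine today"\n'
)

def load_rows(handle):
    """Parse header-less CSV rows into dicts keyed by FIELDS."""
    rows = []
    for row in csv.DictReader(handle, fieldnames=FIELDS):
        row["target"] = int(row["target"])  # polarity code as an integer
        rows.append(row)
    return rows

def to_binary_label(target):
    """Map polarity codes to 0 (negative) and 1 (positive).

    Assumption: only codes 0 and 4 occur; any other code raises KeyError.
    """
    return {0: 0, 4: 1}[target]

rows = load_rows(io.StringIO(SAMPLE_CSV))
labels = [to_binary_label(row["target"]) for row in rows]
print(labels)  # [0, 1]
```

With Pandas, the same file could presumably be read through `pandas.read_csv` by passing the field names via its `names` parameter; the standard-library version is used here only to keep the sketch dependency-free.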
METHODOLOGY
The methodology for Twitter sentiment analysis utilizing the Sentiment140
dataset involves several key steps, including data collection, pre-processing,
model training, evaluation, and prediction. Here is a breakdown of these
steps :
1. Data Collection : The first step involves collecting data from the
Sentiment140 dataset, which contains 1,600,000 annotated tweets
categorized into negative, neutral, and positive sentiments.
2. Data Pre-processing : Before training the sentiment analysis model, the
collected tweet data undergoes pre-processing which includes tasks such
as removing special characters, URLs, hashtags, mentions, and stop
words.
3. Train-Test Split : Once the data is pre-processed, it is
divided into two subsets: a training set and a test set.
Typically, the data is split into a ratio such as 80% for
training and 20% for testing.
4. Logistic Regression Model : Logistic regression is a commonly used
machine learning algorithm for sentiment analysis due to its simplicity
and effectiveness for binary classification tasks.
5. Using New Data : Once the logistic regression model is trained, it can be
deployed to analyze new data, such as unseen tweets from Twitter.
6. Prediction : The trained logistic regression model is then used to predict the
sentiment polarity of the new tweets. For each tweet, the model outputs a
probability score indicating the likelihood of it belonging to each sentiment
class (negative or positive in the binary setting reported in the results).
7. Evaluation : Finally, the accuracy and precision of the sentiment analysis
model are evaluated using the test set.
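The steps above can be sketched end to end in plain Python. This is a minimal from-scratch illustration, not the implementation used in the case study: the tweets are made up, the stop-word list is a tiny placeholder, the features are simple word-presence indicators, and the logistic regression is fitted with a basic stochastic gradient descent loop. All function names and parameter values are assumptions.

```python
import math
import random
import re

# Tiny placeholder stop-word list; a real run would use a fuller one.
STOP_WORDS = {"a", "an", "and", "i", "is", "it", "my", "of", "the", "this", "to"}

def clean_tweet(text):
    """Step 2: lower-case, then drop URLs, mentions, hashtags, symbols, stop words."""
    text = text.lower()
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)  # URLs
    text = re.sub(r"[@#]\w+", " ", text)                 # mentions and hashtags
    text = re.sub(r"[^a-z\s]", " ", text)                # digits and special characters
    return [word for word in text.split() if word not in STOP_WORDS]

def train_test_split(items, test_ratio=0.2, seed=0):
    """Step 3: shuffle, then hold out test_ratio of the items for testing."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_test = round(len(items) * test_ratio)
    return items[: len(items) - n_test], items[len(items) - n_test:]

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_proba(weights, bias, tokens):
    """Steps 5-6: probability that a cleaned tweet is positive."""
    return sigmoid(bias + sum(weights.get(t, 0.0) for t in set(tokens)))

def train_logistic_regression(examples, epochs=200, learning_rate=0.5):
    """Step 4: fit one weight per word by stochastic gradient descent on the log loss."""
    weights, bias = {}, 0.0
    for _ in range(epochs):
        for tokens, label in examples:
            error = predict_proba(weights, bias, tokens) - label
            bias -= learning_rate * error
            for t in set(tokens):
                weights[t] = weights.get(t, 0.0) - learning_rate * error
    return weights, bias

def accuracy(weights, bias, examples):
    """Step 7: share of examples whose thresholded prediction matches the label."""
    hits = sum((predict_proba(weights, bias, tokens) >= 0.5) == bool(label)
               for tokens, label in examples)
    return hits / len(examples)

# Made-up tweets with binary labels (0 = negative, 1 = positive).
TWEETS = [
    ("I love this phone, great battery! http://example.com", 1),
    ("Great day with @friend, so happy #sunshine", 1),
    ("Happy with the service, great staff", 1),
    ("Love the new update, works great", 1),
    ("So happy, love my new shoes", 1),
    ("I hate waiting, awful service", 0),
    ("Awful weather, so sad #rain", 0),
    ("Sad news today, hate it", 0),
    ("Hate this traffic, awful commute http://example.com", 0),
    ("Feeling sad, awful headache @doctor", 0),
]

data = [(clean_tweet(text), label) for text, label in TWEETS]
train, test = train_test_split(data)
weights, bias = train_logistic_regression(train)
print("train accuracy:", accuracy(weights, bias, train))
print("test accuracy:", accuracy(weights, bias, test))
print("P(positive):", predict_proba(weights, bias, clean_tweet("What a great day!")))
```

On a corpus the size of Sentiment140, a library implementation with sparse text features would be the more practical choice; the loop here only makes the individual steps visible.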
LIBRARIES USED
Here is a breakdown of each of the libraries and modules used:
1. NumPy : NumPy is a fundamental package for scientific computing in Python. It
provides support for large, multi-dimensional arrays and matrices, along with a
collection of mathematical functions to operate on these arrays efficiently.
2. Pandas : Pandas is a powerful data manipulation and analysis library built on top
of NumPy. With Pandas, users can perform tasks like data cleaning, transformation,
aggregation, and visualization, making it an essential tool for data scientists and
analysts.
3. RE (Regular Expressions) : The `re` module in Python provides support for
working with regular expressions, allowing users to search, match, and manipulate
text data based on specific patterns or rules.
4. Pickle : Pickle is a module in Python used for serializing and deserializing Python
objects. It enables users to store Python objects persistently for later use, making it
useful for tasks such as saving trained machine learning models, caching
intermediate results, or transferring data between different Python environments.
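As an illustration of the Pickle use case, the sketch below saves an object to disk and loads it back. The `model` dictionary is a hypothetical stand-in for a trained model, and the file name is made up; any picklable Python object is handled the same way.

```python
import pickle
import tempfile
from pathlib import Path

# Hypothetical stand-in for a trained model: any picklable object works.
model = {"weights": {"great": 1.2, "awful": -1.5}, "bias": 0.1}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "sentiment_model.pkl"  # made-up file name
    with open(path, "wb") as handle:
        pickle.dump(model, handle)            # serialize to disk
    with open(path, "rb") as handle:
        restored = pickle.load(handle)        # deserialize from disk

print(restored == model)  # True
```

Pickle files should only be loaded from trusted sources, since unpickling can execute arbitrary code.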
RESULTS
1. Accuracy Scores:
a) Training Data: The accuracy score of training data was 0.8102.
b) Testing Data: The accuracy score of testing data was 0.7780.
2. Prediction Interpretation:
a) Negative Sentiment: Predicted value of 0.
b) Positive Sentiment: Predicted value of 1.
3. Analysis of Results:
a) The accuracy scores suggest that the model has learned useful patterns from the
tweet data.
b) The testing accuracy is about 3.2 percentage points below the training accuracy
(0.7780 versus 0.8102), which points to mild overfitting: the model fits the training
data somewhat better than it generalizes to unseen tweets.
4. Practical Implications:
a) With over 77% accuracy on testing data, the model can support real-world
applications such as brand monitoring, reputation management, customer feedback
analysis, and market sentiment analysis.
b) Businesses and organizations can use this model to make informed decisions based on
Twitter sentiment analysis.
Overall, the sentiment analysis model is effective in classifying tweet sentiments, offering
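For reference, the two evaluation metrics named in the methodology can be computed as follows. This is a minimal sketch on made-up labels, not the case study's test data; only the two accuracy figures in the last line are taken from the results above.

```python
def accuracy(y_true, y_pred):
    """Share of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def precision(y_true, y_pred, positive=1):
    """Share of predicted positives that are truly positive."""
    truths_for_predicted_positive = [t for t, p in zip(y_true, y_pred) if p == positive]
    if not truths_for_predicted_positive:
        return 0.0  # convention chosen here when nothing is predicted positive
    hits = sum(t == positive for t in truths_for_predicted_positive)
    return hits / len(truths_for_predicted_positive)

# Made-up labels: 0 = negative, 1 = positive.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

print(accuracy(y_true, y_pred))   # 0.75
print(precision(y_true, y_pred))  # 0.75

# Gap between the reported training and testing accuracy.
print(round(0.8102 - 0.7780, 4))  # 0.0322
```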
CONCLUSION
Information retrieval in social media, particularly through sentiment analysis and opinion
mining, has become essential for understanding and leveraging user-generated content on
platforms like Twitter, Facebook, and Instagram. These techniques allow businesses,
organizations, and individuals to gain valuable insights into the opinions, emotions, and
attitudes expressed by users regarding specific topics, products, or events. By utilizing
natural language processing, machine learning, and data mining, sentiment analysis and
opinion mining algorithms can efficiently process large volumes of social media data,
extract meaningful patterns, and provide actionable insights for decision-making.
However, challenges such as noise, sarcasm, cultural nuances, and data biases require
continuous research and development to improve accuracy, reliability, and scalability.
Ultimately, the applications of information retrieval in social media offer unprecedented
opportunities for understanding user sentiments, preferences, and behaviors on a large
scale. By harnessing these techniques, businesses, organizations, and researchers can gain
deep insights into public opinion dynamics, drive informed strategies, and foster better
engagement and communication with their target audiences in the dynamic social media
landscape.
FUTURE WORK
1. Improved Sentiment Analysis Models : Continued research and development are
needed to enhance the accuracy and robustness of sentiment analysis models, particularly
in handling complex language structures, sarcasm, irony, and context-dependent
sentiments.
2. Multimodal Sentiment Analysis : With the proliferation of multimedia content on social
media platforms, future work can explore multimodal sentiment analysis techniques that
analyze not only text but also images, videos, and audio data.
3. Real-time Sentiment Monitoring : Incorporating streaming data processing techniques
and scalable sentiment analysis algorithms can facilitate timely decision-making and
crisis management strategies.
4. Fine-grained Opinion Mining : Future research can focus on fine-grained opinion
mining techniques that go beyond sentiment polarity classification to identify specific
aspects, features, or attributes of products, services, or topics that users are discussing.
5. Cross-lingual Sentiment Analysis : Developing algorithms that can handle
code-switching, language variations, and cultural differences can facilitate sentiment
analysis on a global scale, enabling businesses and organizations to gain insights into
diverse user communities and markets.
