AI Report Shivam
Artificial Intelligence
Mini Project Report
On
By
CERTIFICATE
This is to Certify that
Has satisfactorily completed the requirements of Artificial Intelligence mini project simple
portfolio for the degree of T.E (Computer Engineering)
On
1 Abstract
4 Objectives
5 Source code
6 Output
7 Conclusion
ABSTRACT
This report presents the development of a sentiment analysis project using natural language
processing techniques. The project aims to classify text data into positive, negative, or neutral
sentiment categories. The report outlines the project's objectives, methodology, data sources, and
tools used. The project utilizes Python and its natural language processing libraries, including
NLTK and Scikit-learn, to preprocess the data and perform sentiment analysis. The report also
includes an evaluation of the model's performance and potential improvements, demonstrating the
effectiveness of natural language processing techniques in sentiment analysis tasks.
Problem Statement:
Sentiment analysis is a technique used to analyze and classify the sentiment of a text. With the
increase in social media usage, it has become important to monitor the sentiment of people
towards a particular product, service or event. The aim of this project is to perform sentiment
analysis on a dataset of tweets related to a particular topic and classify them into positive,
negative or neutral sentiments.
Outcome: The output of this project will be a report containing the sentiment analysis of the
given dataset and a classification of the tweets into positive, negative or neutral sentiments.
Software Requirements:
To perform the analytics, we need the following software:
• Python 3.5 or above
• pandas library
• matplotlib library
• Scikit-learn library
• NLTK library
Theory Concept:
Natural Language Processing (NLP): It is a subfield of Artificial Intelligence that deals with the
interaction between computers and humans using natural language. NLP techniques are used to
process, analyze and understand human language.
Sentiment Analysis: It is a technique used to analyze and classify the sentiment of a text.
Sentiment analysis can be performed using machine learning algorithms or lexicon-based
approaches.
NLTK Library: It is a Python library used for natural language processing. It provides a suite of
text processing libraries and tools, including tokenization, stemming, lemmatization,
part-of-speech tagging, and sentiment analysis.
Pandas: Pandas is a Python library used for data manipulation and analysis. It provides data
structures for efficiently storing and manipulating large datasets and provides functions for data
cleaning, merging, filtering, and other operations.
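To make the lexicon-based approach mentioned above concrete, the following is a minimal sketch in plain Python. The word list `LEXICON`, its scores, and the function name `lexicon_sentiment` are illustrative assumptions and are not part of this project's code; real lexicons contain thousands of scored words.

```python
import string

# Hypothetical mini-lexicon (assumption for illustration only).
LEXICON = {"good": 1, "great": 2, "love": 2, "bad": -1, "terrible": -2, "hate": -2}


def lexicon_sentiment(text):
    """Return 'positive', 'negative', or 'neutral' from summed word scores."""
    # Lowercase and strip punctuation so that "great!" matches "great".
    cleaned = text.lower().translate(str.maketrans("", "", string.punctuation))
    # Words missing from the lexicon contribute a score of 0.
    score = sum(LEXICON.get(word, 0) for word in cleaned.split())
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(lexicon_sentiment("I love this phone, it is great!"))    # positive
print(lexicon_sentiment("Terrible service, I hate waiting."))  # negative
print(lexicon_sentiment("The package arrived on Tuesday."))    # neutral
```

A lexicon-based scorer needs no training data, while the machine learning approach used in this project learns word weights from labelled tweets.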
Program
1. Import necessary libraries: Import the required libraries - NLTK, Pandas and Scikit-learn.
2. Load the dataset: Load the dataset of tweets into a Pandas dataframe.
3. Text preprocessing: Preprocess the text data by removing stop words, punctuation marks, and
performing tokenization.
4. Feature extraction: Extract features from the preprocessed text data using bag-of-words and
TF-IDF techniques.
5. Sentiment analysis: Train a machine learning algorithm (such as Naive Bayes or Support Vector
Machine) using the extracted features and perform sentiment analysis on the dataset.
6. Classification: Classify the tweets into positive, negative or neutral sentiments based on the
sentiment score.
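The difference between the bag-of-words and TF-IDF features named in the feature extraction step can be seen on a toy corpus. This is a sketch only: the three short documents below are made-up stand-ins for preprocessed tweets, not data from the project.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

# Toy corpus standing in for preprocessed tweets (assumption, illustrative only).
docs = ["good phone good battery", "bad battery", "good service"]

# Bag-of-words: one column per vocabulary word, each cell is a raw count.
bow = CountVectorizer()
X_bow = bow.fit_transform(docs)
# Order the vocabulary by column index so it lines up with the matrix columns.
vocab = sorted(bow.vocabulary_, key=bow.vocabulary_.get)
print(vocab)            # ['bad', 'battery', 'good', 'phone', 'service']
print(X_bow.toarray())  # first row is [0 1 2 1 0]: "good" occurs twice

# TF-IDF: same vocabulary, but words that occur in many documents
# (here "battery" and "good") are weighted down relative to rarer words.
tfidf = TfidfVectorizer()
weights = tfidf.fit_transform(docs).toarray()
print(weights.round(2))
```

In the second document, "bad" receives a larger TF-IDF weight than "battery" because "battery" also occurs in another document.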
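# NLTK for text preprocessing, scikit-learn for features and the classifier
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score, confusion_matrix
import pandas as pd

# Download the stop-word list and the tokenizer data used by word_tokenize
nltk.download('stopwords')
nltk.download('punkt')
nltk.download('punkt_tab')  # newer NLTK releases look for this resource instead

# Load the tweets; the CSV must contain 'text' and 'sentiment' columns
df = pd.read_csv('tweets.csv')

# Lowercase, tokenize, and keep only alphabetic tokens that are not stop words
stop_words = set(stopwords.words('english'))
df['text'] = df['text'].apply(lambda x: ' '.join(
    word for word in word_tokenize(x.lower())
    if word.isalpha() and word not in stop_words))

# Bag-of-words features
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(df['text'])

# Train a Multinomial Naive Bayes classifier on the labelled tweets
clf = MultinomialNB()
clf.fit(X, df['sentiment'])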
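The program above imports accuracy_score and confusion_matrix but trains on the full dataset without measuring performance. One possible way to evaluate such a model is to hold out part of the data, as sketched below. The twelve toy sentences, the 25% test split, and the random seed are assumptions chosen for illustration; they are not taken from tweets.csv.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

# Toy labelled data standing in for tweets.csv (assumption, illustrative only).
texts = [
    "love this phone", "great battery life", "really good service", "love the great camera",
    "hate this phone", "terrible battery life", "really bad service", "hate the terrible camera",
    "phone arrived today", "battery shipped monday", "service opens at nine", "camera comes in black",
]
labels = ["positive"] * 4 + ["negative"] * 4 + ["neutral"] * 4

# Hold out 25% of the data; stratify keeps one example per class in the test set.
X_train_txt, X_test_txt, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, random_state=42, stratify=labels)

# Fit the vocabulary on the training text only, then reuse it for the test text.
vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(X_train_txt)
X_test = vectorizer.transform(X_test_txt)

clf = MultinomialNB()
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

# Rows of the confusion matrix are true labels, columns are predicted labels.
label_order = ["negative", "neutral", "positive"]
acc = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred, labels=label_order)
print("Accuracy:", acc)
print(cm)
```

With so few examples the accuracy figure is not meaningful; the sketch only shows where the evaluation calls would fit in the pipeline.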
Output:
• If the sentiment predicted for the test_tweet is positive, the output will be: "Positive
Sentiment"
• If the sentiment predicted for the test_tweet is negative, the output will be: "Negative
Sentiment"
• If the sentiment predicted for the test_tweet is neutral, the output will be: "Neutral
Sentiment"
The exact output depends on the contents of the 'tweets.csv' dataset and on the sentiment the
trained model predicts for the specific test_tweet supplied to it.
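The program listing in this report does not show how test_tweet is defined or how the messages above are printed. The following is a hedged sketch of one way that step could look. The training sentences, the `MESSAGES` mapping, and the helper name `describe_sentiment` are assumptions for illustration, and the label strings 'positive', 'negative' and 'neutral' are assumed to match the values in the dataset's sentiment column.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Toy training data standing in for tweets.csv (assumption, illustrative only).
train_texts = [
    "love great phone", "great battery love it",
    "hate terrible phone", "terrible battery hate it",
    "phone arrived today", "battery shipped today",
]
train_labels = ["positive", "positive", "negative", "negative", "neutral", "neutral"]

vectorizer = CountVectorizer()
clf = MultinomialNB()
clf.fit(vectorizer.fit_transform(train_texts), train_labels)

# Map each predicted label to the message listed in the Output section.
MESSAGES = {
    "positive": "Positive Sentiment",
    "negative": "Negative Sentiment",
    "neutral": "Neutral Sentiment",
}


def describe_sentiment(tweet):
    """Vectorize one tweet with the fitted vocabulary and return its message."""
    features = vectorizer.transform([tweet])  # transform only, never refit
    label = clf.predict(features)[0]
    return MESSAGES[label]


test_tweet = "love great"
print(describe_sentiment(test_tweet))  # Positive Sentiment
```

The new tweet must go through the same preprocessing and the same fitted vectorizer as the training data, otherwise its feature columns will not line up with what the classifier learned.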
Conclusion: This sentiment analysis project demonstrated how machine learning can be used to
classify text into positive, negative, or neutral sentiment. The project used CountVectorizer for
bag-of-words feature extraction and a Multinomial Naive Bayes classifier (MultinomialNB) for
classification. Overall, the project showcased the potential of AI and NLP in analysing large
volumes of text data.