NLP Proj 1


This project performs sentiment analysis on customer reviews of a business using the Google
Generative AI API, specifically the Gemini Pro model. The workflow is divided into the steps
described below.

1. Mounting the Drive and Importing the Dataset


The project begins by mounting Google Drive to access the dataset stored there. This is done
with the google.colab library, which connects a Colab notebook to Google Drive. The dataset is
the sample data from amazon_alexa.ts, which is imported with the pandas library, keeping only
the columns needed for analysis. The dataset is imbalanced: one label class is significantly
more prevalent than the other.
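The loading and imbalance check can be sketched as follows. This is a minimal illustration, not the project's actual notebook: the column names verified_reviews and feedback are assumptions (the report does not list the selected columns), and a small in-memory sample stands in for the file on Drive.

```python
import io

import pandas as pd

# Assumed column names; the report does not say which columns were selected.
TEXT_COL = "verified_reviews"
LABEL_COL = "feedback"


def load_reviews(source, sep="\t"):
    """Read a delimited file and keep only the text and label columns."""
    return pd.read_csv(source, sep=sep, usecols=[TEXT_COL, LABEL_COL])


def class_counts(df):
    """Return the number of rows per label as a plain dict."""
    return df[LABEL_COL].value_counts().to_dict()


# In Colab the file would be read from a mounted Drive, for example:
#   from google.colab import drive
#   drive.mount("/content/drive")
# A tiny tab-separated sample is used here so the sketch runs anywhere.
sample = io.StringIO(
    "rating\tverified_reviews\tfeedback\n"
    "5\tLove my Echo!\t1\n"
    "4\tWorks well\t1\n"
    "5\tGreat sound\t1\n"
    "1\tStopped working\t0\n"
)
df = load_reviews(sample)
print(class_counts(df))
```

A large gap between the counts is the imbalance that the next step addresses.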

2. Balancing the Dataset


To address the imbalance, an undersampling technique is applied. This involves randomly
dropping rows from the majority class until the class distribution is balanced. This step is crucial
for ensuring that the model does not become biased towards the majority class during training.
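Random undersampling as described above might look like the sketch below. The function name undersample, the label column name, and the fixed random seed are illustrative choices, not taken from the project.

```python
import pandas as pd


def undersample(df, label_col, random_state=0):
    """Randomly drop rows from larger classes until every class matches the smallest."""
    smallest = df[label_col].value_counts().min()
    parts = [
        group.sample(n=smallest, random_state=random_state)
        for _, group in df.groupby(label_col)
    ]
    # Shuffle so the classes are not stored in contiguous blocks.
    combined = pd.concat(parts).sample(frac=1, random_state=random_state)
    return combined.reset_index(drop=True)


# Toy data: six rows of class 1 and two rows of class 0.
df = pd.DataFrame({"text": list("abcdefgh"), "label": [1, 1, 1, 1, 1, 1, 0, 0]})
balanced = undersample(df, "label")
print(balanced["label"].value_counts().to_dict())
```

Undersampling discards data, so it fits best when the minority class is still large enough to give a meaningful evaluation.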

3. Data Preprocessing


The dataset undergoes a series of preprocessing steps to clean the text data. This includes
removing special characters, punctuation, HTML tags, and converting all text to lowercase.
Additionally, extra whitespace is removed, and the text is trimmed to remove leading and trailing
spaces. This step is essential for preparing the data for analysis, as it ensures that the model
can focus on the meaningful content of the reviews rather than irrelevant formatting.
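One way to express these cleaning steps is with regular expressions from the standard library. The function name clean_review and the exact patterns are assumptions made for illustration; the project's own rules may differ.

```python
import re


def clean_review(text):
    """Apply the cleaning steps described above to a single review."""
    text = re.sub(r"<[^>]+>", " ", text)       # drop HTML tags
    text = text.lower()                        # normalise case
    text = re.sub(r"[^a-z0-9\s]", " ", text)   # drop punctuation and special characters
    text = re.sub(r"\s+", " ", text)           # collapse runs of whitespace
    return text.strip()                        # trim leading and trailing spaces


print(clean_review("  <b>Love</b> it!!!  Works   GREAT. "))
```

Note that this pattern also removes apostrophes and non-ASCII letters, so "don't" becomes "don t". Whether that is acceptable depends on the reviews being analysed.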

4. Splitting the Dataset


The dataset is split into a training set and a test set, with 95% of the data used for testing and
5% for training. This split allows for the evaluation of the model's performance on unseen data,
providing a more accurate assessment of its ability to generalize to new data.
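A split with the proportions stated above could be sketched with pandas alone. The helper split_dataset and the seed are hypothetical. Scikit-learn's train_test_split is a common alternative, though the report does not say which tool was used.

```python
import pandas as pd


def split_dataset(df, test_frac, random_state=0):
    """Randomly assign test_frac of the rows to the test set and the rest to the training set."""
    test = df.sample(frac=test_frac, random_state=random_state)
    train = df.drop(test.index)
    return train.reset_index(drop=True), test.reset_index(drop=True)


# Toy data standing in for the balanced, cleaned reviews.
df = pd.DataFrame(
    {
        "text": [f"review {i}" for i in range(100)],
        "label": [i % 2 for i in range(100)],
    }
)
train, test = split_dataset(df, test_frac=0.95)
print(len(train), len(test))
```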

5. Sentiment Analysis Using LLM


The project leverages the Google Generative AI API to perform sentiment analysis. The Gemini
Pro model is selected for this task, as it is capable of generating content based on the provided
prompts. The model is configured with an API key, and a test is run to ensure it is functioning
correctly.
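Configuring the model could look roughly like the sketch below, which assumes the google-generativeai Python package and an API key stored in an environment variable named GOOGLE_API_KEY (both are assumptions, and the model name available to a given account may differ). The SDK import is placed inside the function so the rest of the pipeline can be exercised without the package installed.

```python
import os

MODEL_NAME = "gemini-pro"  # model named in the report; availability may vary


def read_api_key(env=os.environ, var="GOOGLE_API_KEY"):
    """Fetch the API key from the environment instead of hard-coding it."""
    key = env.get(var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {var} environment variable before calling the API")
    return key


def make_generate_fn(api_key, model_name=MODEL_NAME):
    """Return a function that sends one prompt to the model and returns its text reply."""
    # Lazy import: only needed when the API is actually called.
    import google.generativeai as genai

    genai.configure(api_key=api_key)
    model = genai.GenerativeModel(model_name)

    def generate(prompt):
        return model.generate_content(prompt).text

    return generate


def smoke_test():
    """Send a trivial prompt to confirm that the key and model work (makes a real API call)."""
    generate = make_generate_fn(read_api_key())
    return generate("Reply with the single word OK.")
```

Calling smoke_test() once before processing any reviews corresponds to the test run mentioned above.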

6. Integrating the Gemini Pro API


The sentiment analysis task is integrated into the project by feeding the model a prompt that
includes a sample of the cleaned reviews. The model is expected to classify the sentiment of
these reviews as either positive or negative. The output is then processed and added back to
the dataset as predicted labels.
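The prompt construction and output processing can be illustrated as follows. The prompt wording, the JSON reply format, and the helper names build_prompt and parse_labels are assumptions made for this sketch; the report does not give the actual prompt.

```python
import json
import re


def build_prompt(reviews):
    """Embed the cleaned reviews in a prompt that asks for a JSON reply."""
    payload = json.dumps(
        [{"id": i, "review": r} for i, r in enumerate(reviews)], indent=1
    )
    return (
        "Classify the sentiment of each review as positive or negative.\n"
        'Return only a JSON list of objects with the keys "id" and "sentiment".\n\n'
        + payload
    )


def parse_labels(reply, n_reviews):
    """Extract one label per review; None marks a missing or invalid answer."""
    labels = [None] * n_reviews
    # Take the outermost [...] so extra text around the JSON is tolerated.
    match = re.search(r"\[.*\]", reply, flags=re.DOTALL)
    if not match:
        return labels
    try:
        items = json.loads(match.group(0))
    except json.JSONDecodeError:
        return labels
    for item in items:
        if not isinstance(item, dict):
            continue
        idx = item.get("id")
        sentiment = str(item.get("sentiment", "")).strip().lower()
        if (
            isinstance(idx, int)
            and 0 <= idx < n_reviews
            and sentiment in ("positive", "negative")
        ):
            labels[idx] = sentiment
    return labels
```

The list returned by parse_labels has one entry per review, so it can be assigned directly to a new predicted-label column. Keeping None for unparseable answers makes failed classifications visible instead of silently counting them as one class.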

7. Batching Gemini API Calls


To manage the API requests efficiently, the project batches the API calls. This involves dividing
the test set into smaller subsets and processing each batch separately. This approach helps in
managing the API quotas and ensures that the project does not exceed the usage limits.
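Batching can be sketched independently of the API by passing in the function that classifies one batch. The names batched and classify_in_batches, the default batch size, and the optional pause between calls are illustrative assumptions, not the project's settings.

```python
import time


def batched(items, batch_size):
    """Yield consecutive slices of items, each with at most batch_size elements."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def classify_in_batches(reviews, classify_batch, batch_size=20, pause_seconds=0.0):
    """Run classify_batch on each batch and concatenate the predictions in order."""
    predictions = []
    for i, batch in enumerate(batched(reviews, batch_size)):
        if i and pause_seconds:
            time.sleep(pause_seconds)  # spread requests out to stay under rate limits
        labels = classify_batch(batch)
        if len(labels) != len(batch):
            raise ValueError(
                f"batch {i}: expected {len(batch)} labels, got {len(labels)}"
            )
        predictions.extend(labels)
    return predictions


# A stand-in classifier so the sketch runs without any API access.
def fake_classifier(batch):
    return ["positive" if "good" in review else "negative" for review in batch]


reviews = ["good value", "bad sound", "really good", "broke", "good"]
print(classify_in_batches(reviews, fake_classifier, batch_size=2))
```

In the real pipeline, classify_batch would build a prompt for the batch, call the model, and parse the reply. The length check catches replies that omit or add reviews before they misalign the predicted labels.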

8. Evaluation


Finally, the performance of the model is evaluated using a confusion matrix. This provides a
clear visualization of the model's performance, showing the number of true positives, true
negatives, false positives, and false negatives. Additionally, the accuracy of the model is
calculated to give a quantitative measure of its performance.
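The four confusion-matrix counts and the accuracy can be computed with the standard library alone, as in this sketch. The label strings and function names are assumptions. Libraries such as scikit-learn provide equivalent functions, and the report does not say which was used.

```python
from collections import Counter


def confusion_counts(y_true, y_pred, positive="positive"):
    """Count true/false positives and negatives for a binary task.

    Any prediction other than the positive label is treated as negative.
    """
    counts = Counter()
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            counts["tp" if truth == positive else "fp"] += 1
        else:
            counts["fn" if truth == positive else "tn"] += 1
    return {key: counts[key] for key in ("tp", "tn", "fp", "fn")}


def accuracy(counts):
    """Fraction of predictions that match the true label."""
    total = sum(counts.values())
    return (counts["tp"] + counts["tn"]) / total if total else 0.0


y_true = ["positive", "positive", "negative", "negative", "negative"]
y_pred = ["positive", "negative", "negative", "positive", "negative"]
counts = confusion_counts(y_true, y_pred)
print(counts, accuracy(counts))
```

Accuracy is (TP + TN) divided by the total number of predictions, so it is only as informative as the class balance of the test set allows.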

Conclusion


This project demonstrates the application of the Google Generative AI API to sentiment
analysis of customer reviews. By balancing the dataset, preprocessing the text data, and
integrating the Gemini Pro model, the project classifies the sentiment of reviews as either
positive or negative. Batching the API calls keeps usage within quota, and the evaluation step
measures the model's performance with a confusion matrix and an accuracy score.
