Problem Statement NLP Graded Project
Graded Project
Domain:
E-Commerce
Objective:
Sentiment Analysis of Product Reviews
Problem Statement:
Given product review data with rating labels, we will use text analytics to gain insights into
various brands and their ratings, and build a model to predict overall sentiment.
Data Dictionary:
Proprietary content. © Great Learning. All Rights Reserved. Unauthorized use or distribution prohibited. 1
Tasks and Steps [Score: 50 points]
Q5. Textual Data Pre-processing [15 Marks]
5.1 Lowercase all data in columns ‘title’ and ‘body’ [3 Marks]
5.2 Remove all punctuation from columns ‘title’ and ‘body’ [3 Marks]
5.3 Remove stopwords from columns ‘title’ and ‘body’ [3 Marks]
5.4 Transform ‘body’ using TF-IDF vectorizer [3 Marks]
5.5 Split data into X and Y. [3 Marks]
(Vectorized data should be X and sentiment should be Y.)
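Steps 5.1 to 5.5 could be sketched roughly as follows. This is a minimal illustration, not a reference solution: the toy DataFrame is hypothetical, a ‘sentiment’ column is assumed to exist already, and scikit-learn's built-in English stopword list is an assumed choice (NLTK's list would work equally well).

```python
import string

import pandas as pd
from sklearn.feature_extraction.text import ENGLISH_STOP_WORDS, TfidfVectorizer

# Hypothetical toy data standing in for the real review dataset.
# The 'sentiment' column is assumed to have been created beforehand.
df = pd.DataFrame({
    "title": ["Great Phone!", "Terrible battery..."],
    "body": ["I love this phone, it is fast.", "The battery died after two days!"],
    "sentiment": [1, 0],
})

# Translation table that deletes every ASCII punctuation character (5.2).
PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def clean_text(text: str) -> str:
    """Lowercase, strip punctuation, and drop stopwords from one string."""
    text = str(text).lower()                      # 5.1 lowercase
    text = text.translate(PUNCT_TABLE)            # 5.2 remove punctuation
    tokens = [t for t in text.split()
              if t not in ENGLISH_STOP_WORDS]     # 5.3 remove stopwords
    return " ".join(tokens)


# Apply the same cleaning to both text columns; missing values become "".
for col in ("title", "body"):
    df[col] = df[col].fillna("").map(clean_text)

# 5.4 Transform 'body' into a sparse TF-IDF matrix.
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(df["body"])

# 5.5 Vectorized data is X, sentiment is Y.
Y = df["sentiment"]

print(X.shape, Y.shape)
```

Here X is a sparse matrix with one row per review, so it can be passed directly to most scikit-learn classifiers along with Y.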
Submission format:
Please upload your submission in both ‘.html’ and ‘.ipynb’ formats.
In Jupyter Notebook, go to File > Download as > HTML (.html)