TD 3 - Feature Extraction and Feature Selection
TD 3
Feature extraction
Problem 1:
You are given a dataset of customer reviews for a product. Each review is labeled with its
sentiment: positive or negative. Your task is to perform sentiment analysis on this dataset using
both the bag of words and TF-IDF representations.
Dataset:
Review Sentiment
Tasks:
● Tokenize each review and create a bag-of-words representation for each review using
word frequencies.
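A minimal Python sketch of this step follows. It assumes lowercase tokens made of letters and apostrophes (punctuation is discarded) and an alphabetically sorted vocabulary. The two reviews are hypothetical placeholders, because the exercise's dataset rows are not reproduced here, and the names `tokenize` and `bag_of_words` are illustrative rather than part of the exercise.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and extract word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())


def bag_of_words(reviews: list[str]) -> tuple[list[str], list[list[int]]]:
    """Return the sorted vocabulary and one word-frequency vector per review."""
    token_lists = [tokenize(review) for review in reviews]
    vocabulary = sorted({tok for tokens in token_lists for tok in tokens})
    vectors = []
    for tokens in token_lists:
        counts = Counter(tokens)
        # Frequency of each vocabulary word in this review (0 if absent).
        vectors.append([counts[word] for word in vocabulary])
    return vocabulary, vectors


# Hypothetical reviews standing in for the exercise's dataset.
reviews = [
    "Great product, works great",
    "Bad quality, very bad",
]
vocab, vectors = bag_of_words(reviews)
print(vocab)    # ['bad', 'great', 'product', 'quality', 'very', 'works']
print(vectors)  # [[0, 2, 1, 0, 0, 1], [2, 0, 0, 1, 1, 0]]
```

Each position in a vector corresponds to the word at the same position in `vocab`, so "great" appearing twice in the first review gives a 2 in the second slot.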
TF-IDF Representation:
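The formulas are not restated in this sheet. One common variant, assumed here, normalizes term frequency by document length and uses a plain logarithm for IDF. Other texts use raw counts, log base 10, or a smoothed IDF, so check which convention the course uses. Here f_{t,d} is the number of times term t occurs in document d, N is the number of documents (N = 6 in Problem 1), and df(t) is the number of documents containing t.

```latex
\mathrm{tf}(t, d) = \frac{f_{t,d}}{\sum_{t'} f_{t',d}}, \qquad
\mathrm{idf}(t) = \ln \frac{N}{\mathrm{df}(t)}, \qquad
\mathrm{tfidf}(t, d) = \mathrm{tf}(t, d) \cdot \mathrm{idf}(t)
```

For example, a word that occurs once in a 4-word review and appears in 2 of the 6 reviews gets tf = 1/4 = 0.25, idf = ln(6/2) = ln 3 ≈ 1.0986, and a TF-IDF weight of about 0.2747.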
Ms. Laifa Meriem BBA university - 2023
Sentiment Analysis:
● Based on the bag of words and TF-IDF representations, predict the sentiment
(positive or negative) for each review manually.
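To make the manual prediction repeatable, one option is a small lexicon rule, sketched below under loud assumptions. The positive and negative word lists are hypothetical (the exercise does not supply them), and a tie defaults to "positive" as an arbitrary choice. The same function accepts either bag-of-words counts or TF-IDF weights as per-word scores, which makes it easy to compare the two representations on the same review.

```python
# Hypothetical hand-picked sentiment lexicons (not given in the exercise).
POSITIVE_WORDS = {"great", "good", "excellent", "love"}
NEGATIVE_WORDS = {"bad", "poor", "terrible", "broken"}


def predict_sentiment(word_scores: dict[str, float]) -> str:
    """Predict a label from per-word scores (BoW counts or TF-IDF weights).

    Positive words add their score and negative words subtract it; words in
    neither list are ignored. A total of exactly 0 defaults to "positive".
    """
    total = 0.0
    for word, score in word_scores.items():
        if word in POSITIVE_WORDS:
            total += score
        elif word in NEGATIVE_WORDS:
            total -= score
    return "positive" if total >= 0 else "negative"


# Bag-of-words counts for one review, then TF-IDF weights for another.
print(predict_sentiment({"great": 2, "product": 1, "works": 1}))       # positive
print(predict_sentiment({"bad": 0.55, "quality": 0.27, "good": 0.1}))  # negative
```

With TF-IDF weights, a sentiment word that appears in many reviews contributes less than a rarer one, so the two representations can disagree on borderline reviews.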
Note: For TF-IDF, assume a total of 6 documents in the corpus (the number of reviews in the
dataset). You can use the TF-IDF formulas and calculations covered previously to solve
this problem.
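The sketch below computes TF-IDF weights for a six-document corpus, matching the note's N = 6. It assumes tf = count / review length and idf = ln(N / df) with the natural logarithm. If the course uses raw counts, log base 10, or a smoothed IDF, the numbers change but the procedure does not. The six reviews and the function name `tf_idf` are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and extract word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())


def tf_idf(documents: list[str]) -> list[dict[str, float]]:
    """Compute TF-IDF weights with tf = count / length and idf = ln(N / df)."""
    token_lists = [tokenize(doc) for doc in documents]
    n_docs = len(documents)
    # Document frequency: number of documents containing each term at least once.
    df = Counter(term for tokens in token_lists for term in set(tokens))
    weights = []
    for tokens in token_lists:
        counts = Counter(tokens)
        length = len(tokens)
        weights.append({
            term: (count / length) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Six hypothetical reviews, so N = 6 as in the note.
reviews = [
    "the product is great",
    "the price is great",
    "the product is bad",
    "the battery is bad",
    "the screen is good",
    "the sound is poor",
]
weights = tf_idf(reviews)
print(round(weights[0]["great"], 4))  # 0.2747
print(weights[0]["the"])              # 0.0
```

Words such as "the" and "is", which occur in all six reviews, get idf = ln(6/6) = 0 and therefore a weight of 0, while words found in a single review, such as "good" or "screen", receive the highest weights in their review.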
Problem 2:
Dataset:
You are provided with a dataset containing 50 customer reviews, each labeled with its
sentiment (positive or negative).
b. Create a bag of words representation for each review using word frequencies.
2. TF-IDF Representation:
3. Sentiment Analysis: Manually predict the sentiment (positive or negative) for each review
based on both the bag of words and TF-IDF representations.
4. Feature Importance:
a. For each sentiment (positive and negative), identify the top 3 words with the
highest importance based on both bag of words and TF-IDF representations.
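One way to rank words per sentiment is to sum each word's score over all reviews with that label and keep the top three. The sketch below assumes summing (averaging would be an equally reasonable choice) and breaks ties alphabetically so the result is stable. `top_words` is a hypothetical helper, and the score dictionaries are illustrative data, not the exercise's dataset.

```python
from collections import defaultdict


def top_words(doc_scores: list[dict[str, float]], labels: list[str],
              label: str, k: int = 3) -> list[str]:
    """Return the k words with the largest summed score over reviews with `label`.

    `doc_scores` holds one {word: score} dict per review (BoW counts or
    TF-IDF weights). Ties are broken alphabetically.
    """
    totals: dict[str, float] = defaultdict(float)
    for scores, doc_label in zip(doc_scores, labels):
        if doc_label == label:
            for word, score in scores.items():
                totals[word] += score
    ranked = sorted(totals.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:k]]


# Illustrative bag-of-words counts for four labeled reviews.
bow = [
    {"great": 2, "product": 1, "works": 1},
    {"great": 1, "love": 1, "price": 1},
    {"bad": 2, "quality": 1, "very": 1},
    {"bad": 1, "broken": 1, "product": 1},
]
labels = ["positive", "positive", "negative", "negative"]
print(top_words(bow, labels, "positive"))  # ['great', 'love', 'price']
print(top_words(bow, labels, "negative"))  # ['bad', 'broken', 'product']
```

Passing TF-IDF weights instead of counts reuses the same function. With idf = ln(N / df), words that occur in many reviews are down-weighted (to exactly 0 for words in every review), so very common words tend to drop out of the TF-IDF ranking even when raw counts place them near the top.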
5. Summary:
a. Provide a summary of your findings, including insights into the important words for
positive and negative sentiments according to both representations.
b. Compare and contrast the results obtained from the bag of words and TF-IDF
analyses.
Problem 3:
● Write the pseudocode of the Bag of Words algorithm.
● Write the pseudocode of the TF-IDF algorithm.