2173112-Assignment 02
Total Marks:
Obtained Marks:
Assignment # 02
Last date of submission: 5 June 2023
Instructions: Copied or shown assignments will be marked zero. Late submissions will not be
accepted under any circumstances. Submit on Google Classroom.
Q#1:
Using the BBC text classification dataset, available at the following link:
https://lazyprogrammer.me/course_files/nlp/bbc_text_cls.csv
report the performance of the Random Forest classifier available in the scikit-learn library in Python.
Solution:
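import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Load the BBC dataset directly from the course URL
data_url = 'https://lazyprogrammer.me/course_files/nlp/bbc_text_cls.csv'
df = pd.read_csv(data_url)
# Display the first few rows of the dataset
print(df.head())
# Features are the article texts; targets are the topic labels
# (the label column in this CSV is named 'labels', not 'category')
X = df['text']
y = df['labels']
# Hold out a test set (assumed 80/20 split, stratified by class, fixed seed)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)
# Fit TF-IDF on the training texts only, then reuse it for the test texts
vectorizer = TfidfVectorizer()
X_train_tfidf = vectorizer.fit_transform(X_train)
X_test_tfidf = vectorizer.transform(X_test)
# Train the Random Forest and predict on the held-out texts
rf_classifier = RandomForestClassifier(random_state=42)
rf_classifier.fit(X_train_tfidf, y_train)
y_pred = rf_classifier.predict(X_test_tfidf)
# Report accuracy on the test set
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)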
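Since the question asks for the classifier's performance, a single accuracy figure may hide classes that are predicted poorly. The following is a minimal sketch, not part of the original solution, of how per-class precision, recall, F1 and a confusion matrix could be reported with scikit-learn. The function name evaluate_random_forest, the toy corpus (toy_texts, toy_labels), the 75/25 split and the fixed seed are all assumptions made for illustration; the toy corpus only stands in for the BBC data so that the example runs without a download.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline


def evaluate_random_forest(texts, labels, test_size=0.25, seed=42):
    """Fit TF-IDF + Random Forest and return per-class metrics on a held-out split.

    Hypothetical helper for illustration; the split size and seed are assumptions.
    """
    X_train, X_test, y_train, y_test = train_test_split(
        texts, labels, test_size=test_size, stratify=labels, random_state=seed
    )
    # The pipeline fits the vectorizer on the training texts only
    model = make_pipeline(
        TfidfVectorizer(stop_words="english"),
        RandomForestClassifier(n_estimators=100, random_state=seed),
    )
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)

    class_names = sorted(set(labels))
    # Per-class precision, recall and F1 as a printable text table
    report = classification_report(
        y_test, y_pred, labels=class_names, zero_division=0
    )
    # Rows are true classes, columns are predicted classes
    matrix = confusion_matrix(y_test, y_pred, labels=class_names)
    return report, matrix, class_names


# Hypothetical toy corpus standing in for the BBC articles
toy_texts = [
    "Shares rose after the bank reported strong quarterly profits",
    "The company announced a merger and its stock price jumped",
    "Oil prices fell as markets reacted to the interest rate decision",
    "Investors welcomed the retailer's annual earnings and sales growth",
    "The striker scored twice as the team won the cup final",
    "The champion claimed the tennis title after a five set match",
    "The coach praised his players after the league victory",
    "The sprinter broke the world record at the athletics championship",
    "The new smartphone ships with a faster processor and better camera",
    "Researchers released software that detects computer viruses",
    "The broadband provider upgraded its network for faster internet",
    "The gaming console launch drew huge online demand",
]
toy_labels = ["business"] * 4 + ["sport"] * 4 + ["tech"] * 4

report, matrix, class_names = evaluate_random_forest(toy_texts, toy_labels)
print(report)
print(class_names)
print(matrix)
```

To apply this to the assignment data, the same function could be called with the text and label columns of the loaded DataFrame in place of the toy lists. Scores on the toy corpus are not meaningful; it is far too small to say anything about the classifier.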