FYP Final Document
Supervised By
DR. SHAHID IQBAL
BS Software Engineering
Department of Computer Science
Capital University of Science & Technology, Islamabad
Capital University of Science & Technology, Islamabad Department of Software Engineering
Submission Form for Final-Year
PROJECT REPORT
Version: V 3.0
Number of Members: 2
Title: Web App to Classify Shopify User Reviews using Textual Features
MEMBERS’ SIGNATURES
Supervisor’s Signatures
APPROVAL CERTIFICATE
This project, entitled “Web App to Classify User Reviews using Textual
Features”, has been approved for the award of
Committee Signatures:
Supervisor: __________________________
DECLARATION
We hereby declare that no portion of the work referred to in this project has been
submitted in support of an application for another degree or qualification of this or any other
university, institute, or other institution of learning. We further declare that this
undergraduate project, neither as a whole nor in part, has been copied from any
source, except where references have been provided.
MEMBERS’ SIGNATURES
Contents
Chapter 1
Chapter 2
Chapter 3
System Design
Chapter 4
4.1.1 Indentation
Chapter 5
Chapter 6
Software Deployment
Chapter 7
References
Figures:
Figure 1 Work Breakdown Chart
Figure 2 Project Time-Lapse
Figure 3 Use case Diagram
Figure 4 User Registration SSD
Figure 5 User Login SSD
Figure 6 Load Data SSD
Figure 7 View Dataset SSD
Figure 8 Feature Computation SSD
Figure 9 Preprocessing Technique SSD
Figure 10 Select Classification SSD
Figure 11 Save Result SSD
Figure 12 View History SSD
Figure 13 Test Model SSD
Figure 14 Logout SSD
Figure 15 Domain Model
Figure 16 Flow Chart
Figure 17 User Registration Flow Chart
Figure 18 User Login Flow Chart
Figure 19 Features computation flowchart
Figure 20 Pre-Processing Flow Chart
Figure 21 Machine Learning Modeling Flow Chart
Figure 22 Evaluation Metrics Flow Chart
Figure 23 Validation Method Flow Chart
Figure 24 Save model Flow Chart
Figure 25 Test Saved Model Flow Chart
Figure 26 Signup Page Interface
Figure 27 Login Page Interface
Figure 28 About Page Interface
Figure 29 Dataset Selection Interface
Figure 30 View Dataset Interface
Figure 31 Features selection Interface
Figure 32 Data Preprocessing Interface
Figure 33 Classifier Selection Interface
Figure 34 Classifier Result Interface
Figure 35 History Interface
Figure 36 Unseen Review Interface
Figure 37 Text Result Interface
Figure 38 Text Result Interface
Figure 39 Text Result Interface
Figure 40 Text Result Interface
Figure 41 Text Result Interface
Figure 42 Text Result Interface
Tables:
Table 1 Functional Requirements
Table 2 Non-Functional Requirements
Table 3 User Registration Use Case Description
Table 4 User Login Use Case Description
Table 5 Select Dataset Use Case Description
Table 6 View Dataset Use Case Description
Table 7 Select feature extraction method Use Case Description
Table 8 Selection of preprocessing technique Use Case Description
Table 9 Select Classifier Use Case Description
Table 10 Select validation technique Use Case Description
Table 11 Select Evaluation Metric Use Case Description
Table 12 View History Use Case Description
Table 13 Enter unseen text Use Case Description
Table 14 Layers Definition
Table 15 User Registration Test case
Table 16 User Login Test Case
Table 17 Choose Dataset Test Case
Table 18 View Dataset Test Case
Table 19 View Dataset Test Case
Table 20 Train Model Test Case
Table 21 Apply Feature Extraction method on Dataset
Table 22 Apply Part of speech on Dataset Test Case
Table 23 Remove special characters Test Case
Table 24 Apply Preprocessing Technique on Dataset Test Case
Table 25 View processed Feature data Test Case
Table 26 View processed Feature data Test Case
Table 27 Moving to Classifier Test Case
Table 28 Machine Learning Model Test Case
Table 29 Evaluation Metrics Test Case
Table 30 Apply classifier Test Case
Table 31 Save Model Test Case
Table 32 Save Model Test Case
Table 33 Test model Test Case
Table 34 Test model Test Case
Table 35 Unseen Prediction Test Case
Table 36 Unseen Prediction Test Case
Table 37 Unseen Prediction Test Case
Table 38 Unseen Prediction Test Case
Table 39 Display Button Test Case
Table 40 Display Button Test Case
Table 41 About Page Test Case
Table 42 Project Evaluation Guidelines
Chapter 1
This chapter gives a brief summary of the project scope and specification. It also covers the
existing system, the technologies used to develop the software, the project timeline, and the
work breakdown structure of the project.
1.3 Business Scope
Amazon Shopify app reviews will be used as the dataset, and the system will classify reviews into
two categories (Happy/Unhappy). This app will be useful for sellers, who can improve their
products for better future sales, and for customers, who can decide whether or not to buy a
particular product.
1.4 Objectives
This project has the following objectives:
• To help developers resolve problems and make plans for their apps.
The Hypertext Markup Language, or HTML, is the standard markup language for
documents designed to be displayed in a web browser. It can be assisted by
technologies such as Cascading Style Sheets and scripting languages. [4]
1.6 Project Work Break Down
Chapter 2
Requirement Specification and Analysis
8. User can select training using discrete positive emotion features.    Intermediate    Completed
12. User can select all features for training.    Core    Completed
21. User can select F-measure evaluation metric for experiment.    Core    Completed
1. The user should reach the classified text with one button press if possible.    Usability
2. The system should also be user-friendly for admins, because anyone, not only programmers, can be an admin.    Usability
The whole cycle of classifying a dataset should not take more than 40 seconds.
6. After entering unseen tweet text, the system should classify it Accuracy
2.5. Use Case Descriptions
2.5.1. User Registration Use Case Description:
Table 3 User Registration Use Case Description
User
Actors User
Description User must select the dataset from the given options
2.5.4. View Dataset Use Case Description
Table 6 View Dataset Use Case Description
Actors User
Actors User
Post-condition Feature selection is successfully marked
Actors Users
techniques. extraction method on
selected dataset.
Actors Users
Post-condition System will apply the machine learning model on the dataset
2.5.8. Select Validation Technique Use Case Description
Table 10 Select validation technique Use Case Description
Actors Users
Actors User
Description User can view the quality of the machine learning model.
Trigger Drop Down Menu
User System
Actors User
Trigger Button
Post-condition User will see all the past history of model training.
Alternative Flow 1. An unknown error occurred while displaying the list.
3. No past history
Actors User
Chapter 3
System Design
The purpose of this chapter is to provide information that is complementary to the development
phase. Without an adequate design that delivers the required functions as well as the quality
attributes, the project will fail. However, communicating the architecture to its stakeholders is
as important a job as creating it in the first place.
Layers                  Description
Presentation Layer      This layer is used for interaction with the user through a graphical user interface.
Business Logic Layer    This layer contains the business logic. All the constraints and the majority of the functions reside in this layer.
Figure 5 User Login SSD
Figure 7 View Dataset SSD
3.2.2.6 Apply Preprocessing SSD
This is the Apply Preprocessing System Sequence Diagram, which shows that the user selects
one of the pre-processing techniques and that technique is then applied to the given
dataset.
3.2.2.8 Save Result SSD
This is the Save Result System Sequence Diagram, which shows that after the results of the
models are displayed, the model is saved in the system.
This is the View History System Sequence Diagram, which shows the flow of activities that
takes place when the user views the history of saved models.
3.2.2.10 Test Trained Model SSD
This is the Test Saved Model System Sequence Diagram, which shows that when the user tests a
saved model, the user enters an unseen review and the system then displays the predicted result.
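As a rough illustration of the save-and-test flow described in these diagrams, the sketch below stores a trained model on disk and later reloads it to label an unseen review. The dictionary-based model, its word lists, the `predict` helper, and the file name are hypothetical stand-ins chosen for illustration; the project's own classifiers and storage layer are not shown in this section.

```python
import os
import pickle
import tempfile


def predict(model, review):
    """Label a review with a toy keyword model (illustration only)."""
    words = review.lower().split()
    happy = sum(word in model["happy_words"] for word in words)
    unhappy = sum(word in model["unhappy_words"] for word in words)
    return "Happy" if happy >= unhappy else "Unhappy"


def save_model(model, path):
    # Serialize the trained model so it can be listed in the history later.
    with open(path, "wb") as handle:
        pickle.dump(model, handle)


def load_model(path):
    # Restore a previously saved model from disk.
    with open(path, "rb") as handle:
        return pickle.load(handle)


# Hypothetical "trained" model: two small word lists.
model = {
    "happy_words": ["great", "love", "easy"],
    "unhappy_words": ["slow", "broken", "refund"],
}
model_path = os.path.join(tempfile.mkdtemp(), "saved_model.pkl")
save_model(model, model_path)

restored = load_model(model_path)
prediction = predict(restored, "The app is slow and broken")
print(prediction)
```

Keeping the saved artifact separate from the prediction step mirrors the diagrams: saving happens once after training, while testing can be repeated for any number of unseen reviews.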
Figure 14 Logout SSD
Our system has twelve entities. The user entity is used to register and log in to the system,
and the train model entity is used to train the model. Training first requires a dataset, so
there is a choose dataset entity. After choosing the data, a feature extraction method and a
preprocessing technique must be applied, so there are feature extraction and preprocessing
entities. Training also needs the machine learning model, evaluation metrics, and validation
technique entities. Once the training result is available, the model is stored in our database
through the save model entity. A saved model can be tested with an unseen review through the
test model entity, and the system then returns the prediction result through the prediction entity.
Figure 15 Domain Model
(Note: We removed the software architecture diagram, class diagram, sequence diagram, and ER diagram and added a detailed
flow chart, as suggested by our panel and supervisor, because those diagrams are not needed.)
• The user will also select one of the validation techniques from the given options.
• When the user applies these selections to the dataset, the system will train the model and
display its results. The user can also save the trained model in the system.
• The user can go to the History tab and view all saved trained models.
• From the History tab, the user can test a saved trained model by entering an
unseen review.
• The system will then predict the class label of the unseen review (Happy/Unhappy) using
the selected trained model.
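To make the validation step in the first bullet concrete, here is a minimal sketch of how k-fold cross-validation divides a dataset into train and test indices. The function name `k_fold_indices` and the sample and fold counts are assumptions for illustration only; the project's listing in Chapter 4 relies on scikit-learn's `cross_val_score` rather than a hand-written splitter.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) pairs for k roughly equal folds."""
    indices = list(range(n_samples))
    # The first (n_samples % k) folds receive one extra sample.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


folds = list(k_fold_indices(10, 5))
for train, test in folds:
    print("train:", train, "test:", test)
```

Each sample lands in exactly one test fold, so every review is used for evaluation once and for training in the remaining folds.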
➢ Name
➢ Username
➢ Email
➢ Password
➢ Confirm Password
➢ Username
➢ Password
➢ Bag of words.
➢ Part of speech tagging.
➢ Discrete Positive emotion.
➢ Discrete Positive emotion.
➢ Sentiment.
➢ Polarity.
➢ Term frequency inverse document frequency (TF-IDF).
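Two of the feature options listed above, bag of words and TF-IDF, can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions: the helper names `bag_of_words` and `tf_idf`, the whitespace tokenization, and the sample reviews are hypothetical, and the weighting is the unsmoothed textbook form log(N / document frequency), which differs from the smoothed variant that libraries such as scikit-learn apply by default.

```python
import math
from collections import Counter


def bag_of_words(documents):
    """Return the sorted vocabulary and one term-count vector per document."""
    tokenized = [doc.lower().split() for doc in documents]
    vocabulary = sorted({word for tokens in tokenized for word in tokens})
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append([counts[word] for word in vocabulary])
    return vocabulary, vectors


def tf_idf(vectors):
    """Weight raw counts by log(N / document frequency) for each term."""
    n_docs = len(vectors)
    n_terms = len(vectors[0])
    # Number of documents in which each vocabulary term occurs at least once.
    doc_freq = [sum(1 for vec in vectors if vec[j] > 0) for j in range(n_terms)]
    return [
        [vec[j] * math.log(n_docs / doc_freq[j]) for j in range(n_terms)]
        for vec in vectors
    ]


reviews = ["great app great support", "slow app", "great value"]
vocabulary, counts = bag_of_words(reviews)
weights = tf_idf(counts)
print(vocabulary)
print(counts[0])
```

Bag of words keeps only how often each word occurs, while TF-IDF additionally downweights words such as "app" that appear in many reviews and so say little about any single one.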
➢ Stopwords Removal.
➢ Stopwords Removal and Special Character Removal.
➢ Stopwords Removal and Lemmatization.
➢ Stopwords Removal, Special Character Removal and Lemmatization.
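The preprocessing options above combine a small number of text-cleaning steps. The sketch below shows stopword removal with optional special character removal in plain Python. The tiny `STOPWORDS` set and the function names are assumptions made for illustration; lemmatization is left out because it needs a lexical resource such as the WordNet lemmatizer that the project's own listing imports from NLTK.

```python
import re

# Tiny illustrative stopword list; a real run would use a full list such as NLTK's.
STOPWORDS = {"the", "is", "a", "an", "and", "this", "it", "to", "of"}


def remove_special_characters(text):
    # Replace anything that is not a letter, digit, or whitespace with a space.
    return re.sub(r"[^A-Za-z0-9\s]", " ", text)


def remove_stopwords(tokens):
    # Keep only the tokens that are not in the stopword list.
    return [token for token in tokens if token not in STOPWORDS]


def preprocess(text, strip_special=True):
    """Apply optional special character removal, then stopword removal."""
    if strip_special:
        text = remove_special_characters(text)
    tokens = text.lower().split()
    return remove_stopwords(tokens)


print(preprocess("This app is GREAT!!! Easy to use & fast."))
```

Calling `preprocess(text, strip_special=False)` corresponds to the stopwords-only option, while the default corresponds to stopwords plus special character removal.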
3.4.6 Evaluation Metrics Flow Chart
The user will select evaluation metrics from the given options:
➢ Accuracy
➢ F-measure
➢ Precision
➢ Recall
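For reference, the four metrics listed above can be computed from predicted and actual labels as in the following sketch. The function name `evaluation_metrics`, the choice of "Happy" as the positive class, and the sample labels are assumptions for illustration; the project's listing uses scikit-learn with weighted averaging, which can give different numbers than this single-class calculation.

```python
def evaluation_metrics(y_true, y_pred, positive="Happy"):
    """Compute accuracy, precision, recall and F-measure for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return accuracy, precision, recall, f_measure


actual = ["Happy", "Happy", "Unhappy", "Unhappy", "Happy"]
predicted = ["Happy", "Unhappy", "Unhappy", "Happy", "Happy"]
accuracy, precision, recall, f_measure = evaluation_metrics(actual, predicted)
print(accuracy, precision, recall, f_measure)
```

Accuracy counts all correct predictions, whereas precision and recall focus on the positive class; the F-measure is their harmonic mean.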
Figure 23 Validation Method Flow Chart
Figure 24 Save model Flow Chart
3.5 User Interface Design
Figure 28 About Page Interface
Figure 30 View Dataset Interface
Figure 32 Data Preprocessing Interface
Figure 34 Classifier Result Interface
Figure 36 Unseen Review Interface
Chapter 4
Software Development
4.1.1 Indentation
Proper code indentation is used in this project. Indenting blocks of code enhances the
readability, understandability, and hierarchy of the lines of code.
4.1.2 Declaration
• In this project we use one declaration per line to increase clarity and make the code
easier to understand. The order of declarations is as follows:
• All widgets are imported at the beginning.
• Class variables are declared in the sequence public, protected, then private.
• Instance variables are declared in the sequence public, then private.
• Class constructors are then declared with proper names.
• Class methods are grouped by functionality rather than by scope or accessibility, to make
the code easier to read and understand.
• Local variables are declared only at the beginning of the code, after the packages and
libraries are imported.
4.1.4 Naming Convention
Proper naming conventions were followed during the implementation of this project, which
makes the programs easier to read and understand.
We used words from natural language (English) to give understandable names to classes,
variables, and methods, such as Requests, DocumentCollection, and BasicInformation, instead
of unclear names like myc method, a1, b1, etc.
Terminology from the project's domain is used: if the user refers to Email as Registration
Number, then the term Registration Number is used.
Mixed case is used to make names readable: names are generally in lower case, with the
first letter of class names and interface names capitalized.
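A short sketch of how these conventions might look in Python is given below. The class name `DocumentCollection` is taken from the examples above; the attribute and method names, and the use of lower-case names with underscores for methods, are assumptions made for illustration rather than excerpts from the project code.

```python
class DocumentCollection:
    """Class names capitalize the first letter of each word."""

    def __init__(self):
        # One declaration per line, each with a descriptive name.
        self.documents = []
        self.registration_number = None

    def add_document(self, document_text):
        # Method and variable names describe what they hold or do.
        self.documents.append(document_text)

    def document_count(self):
        return len(self.documents)


collection = DocumentCollection()
collection.add_document("Great app, easy to set up.")
print(collection.document_count())
```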
Python is an interpreted high-level general-purpose programming
language. Python's design philosophy emphasizes code readability with
its notable use of significant indentation. [3]
import time

from sklearn import model_selection
from sklearn import tree
from sklearn.ensemble import AdaBoostClassifier
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, classification_report, f1_score,
                             precision_score, recall_score)
from sklearn.model_selection import ShuffleSplit
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.tree import DecisionTreeClassifier
# from mlxtend.classifier import StackingClassifier


def get_preprocessing(pre_processing):
    # Map the preprocessing option chosen in the interface to its dataset code
    if pre_processing == 'Stopwords Removal':
        return "a1"
    elif pre_processing == 'Stopwords + Special Characters':
        return "a2"
    else:
        return "a3"


def generate_random_forest(dataset):
    label_Label = LabelEncoder()
    # converting text labels into numbers
    dataset["label"] = label_Label.fit_transform(dataset['label'])
    X = dataset.drop("label", axis=1)
    y = dataset['label']
    # Split added here: the listing uses X_train/X_test without defining them;
    # test_size=0.3 mirrors generateNaiveBayes below
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)
    # Standardize the features, fitting the scaler on the training split only
    sc = StandardScaler()
    X_train = sc.fit_transform(X_train)
    X_test = sc.transform(X_test)
    start = time.time()
    classifier = RandomForestClassifier(n_estimators=42, criterion='entropy')
    classifier.fit(X_train, y_train)
    y_pred = classifier.predict(X_test)
    cv = ShuffleSplit(n_splits=5, test_size=0.3)  # defined but not passed to cross_val_score
    # 10-fold cross-validation on the full dataset
    scores = cross_val_score(classifier, X, y, cv=10)
    print(classification_report(y_test, y_pred))
    print("Random Forest accuracy after 10 fold CV: %0.2f (+/- %0.2f)" % (scores.mean(), scores.std() * 2)
          + ", " + str(round(time.time() - start, 3)) + "s")
    print("******************************")
    print("******************************")
    print("******************************")
    print("*" * 90)


def generateNaiveBayes(dataset):
    start = time.time()
    label_Label = LabelEncoder()
    # converting text labels into numbers
    dataset["label"] = label_Label.fit_transform(dataset['label'])
    X = dataset.drop("label", axis=1)
    y = dataset['label']
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)
    nb = GaussianNB()
    nb.fit(X_train, y_train)
    y_pred = nb.predict(X_test)
    cv = ShuffleSplit(n_splits=5, test_size=0.3)  # defined but not passed to cross_val_score
    scores = cross_val_score(nb, X, y, cv=10)
    print(classification_report(y_test, y_pred))
    print("Naive Bayes accuracy after 10 fold CV: %0.2f (+/- %0.2f)" % (scores.mean(), scores.std() * 2)
          + ", " + str(round(time.time() - start, 3)) + "s")
    print("******************************")
    print("******************************")
    print("******************************")
    # The lines computing accuracy, recall and f1score are missing from this listing;
    # they are restored here by analogy with precision (an assumption).
    accuracy = accuracy_score(y_test, y_pred)
    print("______________________________")
    precision = precision_score(y_test, y_pred, average='weighted')
    print('Precision:', precision)
    # print ('Precision:', precision_score(y_test, y_pred))
    print("______________________________")
    recall = recall_score(y_test, y_pred, average='weighted')
    print("______________________________")
    f1score = f1_score(y_test, y_pred, average='weighted')
    print("______________________________")
    return accuracy, precision, recall, f1score, nb


if __name__ == "__main__":
    print('NB')
    # read_dataset is defined elsewhere in the project and is not shown in this listing
    accuracy, precision, recall, f1score, model = generateNaiveBayes(read_dataset('sentiment', 'a1'))
import re
from collections import Counter

import nltk
import xlrd
import xlwt
from nltk import word_tokenize, pos_tag
from nltk.corpus import stopwords
from nltk.corpus import wordnet as wn
from nltk.stem import WordNetLemmatizer
from nltk.tokenize import RegexpTokenizer

wordnet_lemmatizer = WordNetLemmatizer()
prepList = ["", "CC", "CD", "DT", "EX", "FW", "IN", "JJ", "JJR", "JJS", "LS", "MD", "NN",
            "NNS", "NNP", "NNPS", "PDT", "POS",
            "PRP", "PRP$", "RB", "RBR", "RBS", "RP", "RP", "TO", "UH", "VB", "VBD",
            "VBG", "VBN", "VBP",
            "VBZ", "WDT", "WP", "WP$", "WRB"]


def process1():
    try:
        for i in range(1, len(prepList)):
            sheetToWrite.write(0, i, prepList[i], style)
        # Remove special characters from the source text
        tokenizer = RegexpTokenizer(r'\w+')
        x = tokenizer.tokenize(txt1)
        print(x)
        # Stop-word removal
        print(tokens_without_sw)
        # x = txt1.lower()
        # x = re.sub(r'\W', ' ', txt1)
        # x = re.sub(r'\s+', ' ', txt1)
        # print(x)
        # word_tokens = word_tokenize(x)
        # print(word_tokens)
        # Remove punctuation
        # Lemmatization
        for token in tokens_without_sw:
            token = wordnet_lemmatizer.lemmatize(token, pos="v")
            print(token)
        for i in counts.elements():
            if i in prepList:
                column = prepList.index(i)
                if column not in tempIndexList:
                    sheetToWrite.write(read + 1, column, counts[i])
                    tempIndexList.append(column)
        """
        Writing 0's to columns with no values
        """
        for i in range(1, len(prepList)):
            if i not in tempIndexList:
                sheetToWrite.write(read + 1, i, 0)
        wbWrite.save("POS Count.xls")
        # print(nltk.help.upenn_tagset())
    except Exception as e:
        print(str(e))


process1()
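The column-filling logic in the listing above (one count per tag, 0 where a tag does not occur) can be sketched on its own, without the spreadsheet handling. The example below is a minimal sketch: TAG_COLUMNS is a shortened, illustrative subset of the tag list, pos_count_row is a hypothetical helper name, and the tagged tokens are written by hand instead of coming from a POS tagger.

```python
from collections import Counter

# Column order for the output row; an illustrative subset of the full tag list.
TAG_COLUMNS = ["JJ", "NN", "RB", "VB"]


def pos_count_row(tagged_tokens, columns=TAG_COLUMNS):
    """Return one count per column for a list of (word, tag) pairs.

    Tags that are not listed in columns are ignored; columns with no
    matching tag get 0.
    """
    counts = Counter(tag for _, tag in tagged_tokens)
    return [counts[tag] for tag in columns]


# Hand-tagged example standing in for the output of a POS tagger.
tagged = [("nice", "JJ"), ("fast", "JJ"), ("checkout", "NN"), ("app", "NN"), ("works", "VBZ")]
print(pos_count_row(tagged))  # [2, 2, 0, 0]
```

Using a Counter means a missing tag reads as 0, so no separate pass is needed to fill the empty columns.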
Bag-of-words
A bag-of-words model, or BoW for short, is a way of extracting features from text for use in
modeling, such as with machine learning algorithms. Each document is represented by the counts
of the words it contains, ignoring word order. The approach is simple and flexible, and can be
used in many ways for extracting features from documents.
Code
import re

import nltk
import pandas as pd
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer

from n_gram import output_to_csv
from Pre_Processing import stopword_rem
from Pre_Processing import lemmitization

# wordnet_lemmatizer = WordNetLemmatizer()
# def stopword_rem(token):
#     tokens_without_sw = [word for word in token if not word in stopwords.words()]
#     return tokens_without_sw
# def lemmitization(token):
#     token = wordnet_lemmatizer.lemmatize(token, pos="v")
#     return token


def main():
    Review_df = pd.read_csv("C:/FYP/POS tagging/bagofwords/abc.csv")
    texts_list = Review_df['text'].tolist()
    # texts_list[0] = "Playing...."
    for i in range(len(texts_list)):
        texts_list[i] = texts_list[i].lower()
        # Replace every non-word character (anything other than letters, digits and "_",
        # such as "!", "?" or white-space) with a space
        texts_list[i] = re.sub(r'\W', ' ', texts_list[i])
        # Collapse each run of white-space characters into a single space
        texts_list[i] = re.sub(r'\s+', ' ', texts_list[i])
        # TODO Number remove
    bag_of_words_list = []
    count = 0
    """
    ['The', 'The', 'Samad']
    wordfreq['The']
    wordfreq {
        'key': value
        The: 2
        Samad: 1
    }
    sentence_1 = ['The', 'The', 'Samad']
    sentence_2 = ['The', 'BAG', 'Samad']
    """
    count += 1
    bag_of_words_list.append(wordfreq)


if __name__ == "__main__":
    main()
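The word-frequency idea described in the listing's comment can be sketched with the standard library alone. The example below is a minimal sketch, not the project's implementation: tokenize and bag_of_words are hypothetical helper names, and the two sentences are the ones used in the comment above.

```python
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def bag_of_words(texts):
    """Return (vocabulary, vectors): one count vector per text, aligned with the vocabulary."""
    counts_per_text = [Counter(tokenize(text)) for text in texts]
    # The vocabulary is every distinct word seen in any text, in sorted order
    vocabulary = sorted(set().union(*counts_per_text))
    vectors = [[counts[word] for word in vocabulary] for counts in counts_per_text]
    return vocabulary, vectors


vocabulary, vectors = bag_of_words(["The The Samad", "The BAG Samad"])
print(vocabulary)  # ['bag', 'samad', 'the']
print(vectors)     # [[0, 1, 2], [1, 1, 1]]
```

Each row of vectors is the feature vector for one review; a word that does not occur in a review contributes 0.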
TF-IDF
TF-IDF (term frequency-inverse document frequency) is a statistical measure that evaluates how
relevant a word is to a document in a collection of documents. It has many uses, most importantly
in automated text analysis, and is very useful for scoring words in machine learning algorithms
for Natural Language Processing (NLP).
Code
import re

import nltk
import pandas as pd

from Pre_Processing import stopword_rem
from Pre_Processing import lemmitization

punctuations = "?:!.,;"
freq = {}
tf = {}
# Count how often each word occurs in the token list
for word in token:
    if word in freq:
        freq[word] += 1
    else:
        freq[word] = 1


def compute_idf(doc_list):
    import math
    idf_dict = {}
    N = len(doc_list)  # doc_list has the form [{}, {}, {}]: one word-count dict per document
    # For each word, count the number of documents that contain it
    for doc in doc_list:
        for word, val in doc.items():
            if val > 0:
                if idf_dict.get(word):
                    idf_dict[word] += 1
                else:
                    idf_dict[word] = 1
    return idf_dict


def main():
    texts_list = ["it is going to rain today",
                  "today i am not going outside",
                  "i am going to watch the season premiere"]
    # corpus = ['This is the first document.',
    #           'This document is the second document.',
    #           'And this is the third one.',
    #           'Is this the first document?',
    #           ]
    # train_set = ["sky is blue", "sun is bright", "sun in the sky is bright"]
    # reviews_df = pd.read_csv("abc.csv")
    # texts_list = reviews_df['text'].tolist()
    for i in range(len(texts_list)):
        texts_list[i] = texts_list[i].lower()
        texts_list[i] = re.sub(r'\W', ' ', texts_list[i])
        texts_list[i] = re.sub(r'\s+', ' ', texts_list[i])


if __name__ == '__main__':
    main()
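To make the measure concrete, the sketch below computes TF-IDF for the three example sentences from the listing, using only the standard library. It assumes one common variant, tf = count / document length and idf = ln(N / df); the listing does not show which variant the project uses, and the function names here are illustrative only.

```python
import math
from collections import Counter


def compute_tf(tokens):
    """Term frequency: count of each word divided by the number of tokens in the document."""
    counts = Counter(tokens)
    total = len(tokens)
    return {word: count / total for word, count in counts.items()}


def compute_idf(tokenized_docs):
    """Inverse document frequency: ln(N / df), where df is the number of documents containing the word."""
    n_docs = len(tokenized_docs)
    doc_freq = Counter(word for tokens in tokenized_docs for word in set(tokens))
    return {word: math.log(n_docs / df) for word, df in doc_freq.items()}


def compute_tfidf(tokenized_docs):
    """Return one {word: tf * idf} dict per document."""
    idf = compute_idf(tokenized_docs)
    return [{word: tf * idf[word] for word, tf in compute_tf(tokens).items()}
            for tokens in tokenized_docs]


texts = ["it is going to rain today",
         "today i am not going outside",
         "i am going to watch the season premiere"]
docs = [text.split() for text in texts]
scores = compute_tfidf(docs)
print(scores[0]["going"])           # 0.0, because "going" appears in every document
print(round(scores[0]["rain"], 3))  # 0.183
```

A word that occurs in every document scores 0, while a word that is frequent in one document and rare elsewhere scores highest, which is what makes the measure useful for ranking words as features.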
Pre-processing
In pre-processing, we perform stop-word removal, special character removal, and lemmatization.
Code
from nltk import WordNetLemmatizer
from nltk.corpus import stopwords

wordnet_lemmatizer = WordNetLemmatizer()
# Build the stop-word set once instead of reloading the list for every token
stop_words = set(stopwords.words())


def stopword_rem(token):
    tokens_without_sw = [word for word in token if word not in stop_words]
    return tokens_without_sw


def lemmitization(token):
    token = wordnet_lemmatizer.lemmatize(token, pos="v")
    return token
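As a rough illustration of the first two steps, the sketch below applies special character removal and stop-word removal to one review using only the standard library. STOP_WORDS is a deliberately tiny, hypothetical list standing in for the NLTK stop-word corpus, the helper names are illustrative, and lemmatization is left out because it needs the WordNet data.

```python
import re

# Hypothetical, deliberately tiny stop-word list used only for this illustration.
STOP_WORDS = {"a", "an", "and", "as", "i", "is", "it", "the", "this", "to"}


def remove_special_characters(text):
    """Lower-case the text and replace each run of non-word characters with one space."""
    text = re.sub(r"\W+", " ", text.lower())
    return text.strip()


def remove_stop_words(tokens):
    """Drop tokens that appear in the stop-word list."""
    return [token for token in tokens if token not in STOP_WORDS]


review = "Nice fast checkout app. Easy to set up and works as promised!"
tokens = remove_special_characters(review).split()
print(remove_stop_words(tokens))
# ['nice', 'fast', 'checkout', 'app', 'easy', 'set', 'up', 'works', 'promised']
```

Keeping the stop words in a set makes each membership check constant time, which matters when the same check runs for every token of every review.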
MODULE WEB APP CODE
Home.html
<!DOCTYPE html>
<style>
html, body {
max-width: 100%;
overflow-x: hidden;
}
</style>
</head>
<body>
<div class="row">
<div class="col-md-12 text-center">
</div>
</div>
</div>
</header>
<button
class="navbar-toggler"
data-toggle="collapse"
data-target="#navbarNav"
>
<span class="navbar-toggler-icon"></span>
</button>
<div class="collapse navbar-collapse" id="navbarNav">
<ul class="navbar-nav">
<li><a href="index2"><strong>Home</strong> </a></li>
</ul>
<br><br>
<script src="https://stackpath.bootstrapcdn.com/bootstrap/4.3.1/js/bootstrap.min.js" integrity="sha384-JjSmVgyd0p3pXB1rRibZUAYoIIy6OrQ6VrjIEaFf/nJGzIxFDsf4x0xIM+B07jRM" crossorigin="anonymous"></script>
Header.html
<div class="row">
<div class="col-md-12 text-center">
</div>
</div>
</div>
</header>
<li><a href="dataset"><strong>DATASET SELECTION</strong> </a></li>
</ul>
</div>
</nav>
</div></div>
<br><br>
<br><br>
<img src="samad.jpeg" alt="HTML5 Icon" width="128" height="128">
<br><br>
</div></div>
<br><br>
Preprocessing.html
</script>
</div>
</header>
<div class="row justify-content-center">
<div class="col-md-11.5 ">
<nav class="navbar navbar-expand-lg navbar-light bg-light">
<li><a href="feature_selection" ><strong><h6>Feature Extraction</h6></strong> </a></li>
</ul>
<br><br>
Features selection.html
<script type="text/javascript">
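function codeAddress() {
    // Restore the previously selected feature, if one was saved
    var myradioValue = localStorage.getItem("feature");
    if (myradioValue !== null) {
        $('input[name="feature"][value="' + myradioValue + '"]').prop('checked', true);
    }
}
function saveradio() {
    // Remember the currently selected feature
    var radiovalue = $('input[name="feature"]:checked').val();
    if (radiovalue !== undefined) {
        localStorage.setItem("feature", radiovalue);
    }
}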
</script>
</head>
<body onload="codeAddress();" onbeforeunload="saveradio();">
</div>
</header>
<div class="row justify-content-center">
<div class="col-md-11.5 ">
<nav class="navbar navbar-expand-lg navbar-light bg-light">
<div class="collapse navbar-collapse" id="navbarSupportedContent">
<ul class="navbar-nav mr-auto">
</ul>
</div>
</nav>
</div></div>
<br><br>
<label for="Bag">Bag Of Words</label><br>
<input type="radio" id="part" name="feature" value="Part of Speech Tagging">
<label for="part">Part of Speech Tagging</label><br>
<input type="radio" id="tf" name="feature" value="TF-IDF">
<label for="tf">TF-IDF</label><br>
<input type="radio" id="pos" name="feature" value="Discrete Positive">
<label for="pos">Discrete Positive</label><br>
<input type="radio" id="neg" name="feature" value="Discrete negative">
<label for="neg">Discrete negative</label><br>
<input type="radio" id="Polarity" name="feature" value="Polarity">
<label for="Polarity">Polarity</label><br>
<input type="radio" id="sent" name="feature" value="Sentiments">
<label for="sent">Sentiments</label><br>
<input type="radio" id="all" name="feature" value="All">
<label for="all">All</label><br>
History.html
<div class="row justify-content-center">
<div class="col-md-10 ">
<table class="table table-striped">
<thead>
<tr>
<th scope="col">Data Type</th>
<th scope="col">Feature Extraction</th>
<th scope="col">Preprocessing</th>
<th scope="col">Machine Learning Model</th>
<th scope="col">Accuracy</th>
<th scope="col">F-Measure</th>
<th scope="col">Precision</th>
<th scope="col">Recall</th>
<th scope="col">Validation Technique</th>
<th scope="col">Action</th>
</tr>
</thead>
<tbody>
{% if dataset %}
{% for ml,prep,feat in dataset %}
<tr>
<!-- <th scope="row">1</th>-->
<td>Product Review</td>
<td>{{ feat.feature }}</td>
<td>{{ prep.prep }}</td>
<td>{{ ml.classifier }}</td>
<td>{{ ml.accuracy }}</td>
<td>{{ ml.fmeasure }}</td>
<td>{{ ml.precision }}</td>
<td>{{ ml.recall }}</td>
<td>{{ ml.val_tech }}</td>
<td>
<form method="post" action="delrec">
{% csrf_token %}
</tbody>
</table>
Chapter 5
Software Testing
This chapter describes the adopted testing procedure, including the selected testing
methodology, the test suite, and the test results of the developed software.
The test cases were executed manually, without the use of any tool.
Date: 1/8/2021
Input
Name=Abdul Samad
Username=abdul.samad
Email=bse173024@cust.pk
Password=173024
Confirm Password=173024
Expected Output
Actual Output
Expected Exceptions
Invalid email.
Date: 1/8/2021
System: Classify Shopify User Reviews
Input
Username=abdul.samad
Password=173024
Expected Output
Actual Output
Registration Successful.
Expected Exceptions
Invalid email.
Empty Field.
Date: 1/8/2021
Input
Expected Output
Actual Output
Dataset selected
Expected Exceptions
Date: 1/8/2021
System: Classify Shopify User Reviews
Input
Product review.
Expected Output
Actual Output
Expected Exceptions
Dataset is displayed
Date: 1/8/2021
Input
Product review.
Expected Output
Actual Output
Dataset is displayed
Expected Exceptions
Date: 1/8/2021
Input
Product review
Expected Output
Actual Output
Expected Exceptions
Backend exception
5.2.7 Apply Feature Extraction method on Dataset Test Case
Date: 4/8/2021
Input
Expected Output
Actual Output
Expected Exceptions
Date: 4/8/2021
Input
Expected Output
Actual Output
Expected Exceptions
Date: 4/8/2021
Input
Expected Output
System will apply special character removal from the given dataset.
Actual Output
Expected Exceptions
System may not be responding.
Date: 4/8/2021
Input
Stopword Removal
Expected Output
Actual Output
Expected Exceptions
Date: 1/8/2021
Objective: View Feature Test ID: 12
Input
Stopword Removal
Expected Output
Actual Output
Expected Exceptions
Date: 1/8/2021
Input
Stopword Removal
Expected Output
Actual Output
Expected Exceptions
Date: 4/8/2021
Input
No input
Expected Output
Actual Output
System displayed the machine learning model and evaluation metrics.
Expected Exceptions
Backend exception
Date: 4/8/2021
Version: 1 Test Type: Black Box Testing
Input
Naive Bayes
Expected Output
Actual Output
Expected Exceptions
Date: 4/8/2021
Input
Expected Output
Actual Output
Expected Exceptions
Date: 4/8/2021
Input
Actual Output
System displayed the machine learning model and evaluation metrics results.
Expected Exceptions
5.2.17 Save Model Test Case 1
Table 31 Save Model Test Case
Date: 4/8/2021
Input
Actual Output
Expected Exceptions
Date: 4/8/2021
Version: 1 Test Type: Black Box Testing
Input
Actual Output
Expected Exceptions
Date: 1/8/2021
Input
Expected Output
Actual Output
Expected Exceptions
Date: 1/8/2021
Input
Expected Output
Actual Output
Expected Exceptions
Date: 4/8/2021
System: Classify Shopify User Reviews
Input
Expected Output
Actual Output
Unhappy
Expected Exceptions
Date: 4/8/2021
Input
Text field e.g., Nice fast checkout app. Easy to set up and works as promised. I
like this wiggling button. Really cool feature!
Expected Output
Actual Output
Unhappy
Expected Exceptions
Date: 4/8/2021
Input
Expected Output
Actual Output
Happy
Expected Exceptions
Date: 4/8/2021
System: Classify Shopify User Reviews
Input
Expected Output
Actual Output
Redirected to login
Expected Exceptions
Date: 4/8/2021
Input
Expected Output
Actual Output
Redirected to Contact Us, but the email was not sent
Expected Exceptions
Date: 4/8/2021
Input
Expected Output
Actual Output
Expected Exceptions
Date: 1/8/2021
Objective: About Page Test ID: 27
Input
Expected Output
Actual Output
Expected Exceptions
None
Chapter 6
Software Deployment
First, we install Git on the system and create an account on GitHub.
• Then we install the GitToolBox plugin in PyCharm.
• Then we push the project to GitHub using this tool.
• Heroku
Then we create an account on Heroku (Cloud Application Platform).
• We create an application on Heroku and select Python as its language.
• Then we connect our Heroku account with our GitHub account.
• Then we click Deploy Branch on the Heroku website to deploy the project.
• Then we check the status of the deployment on Heroku and also from PyCharm.
• Check status from PyCharm.
Chapter 7
REPORT APPROVAL CERTIFICATE
The report of the project, “Web App to Classify Shopify User Reviews using Textual
Features”, has been approved based on the following evaluation guidelines.
Artifacts Guidelines
Analysis and Design artifacts are syntactically correct (use-case model, SSDs,
domain model, class diagram, SDs, ERDs, Flow charts, Activity Diagram,
DFDs)
Consistency and traceability have been maintained among different artifacts
General Guidelines
Formatting (font style, indentation) is according to the FYP template and
consistent throughout the document
Captions are added to all the figures and tables. Figure captions must be placed
below each figure, and table captions must be provided above the table
Each figure or table is followed by some text describing what it represents
_________________
Name & Signature
(Supervisor)
References
Research Paper
[1] F. Rustam, A. Mehmood, M. Ahmad, S. Ullah, D. M. Khan and G. S. Choi, "Classification of
Shopify App User Reviews Using Novel Multi Text Features," in IEEE Access, vol. 8, pp.
30234-30244, 2020, doi: 10.1109/ACCESS.2020.2972632.
Webpage
[2] https://www.jetbrains.com/pycharm/features/, last accessed July 24, 2021.