
Final Year Project Mid Evaluation Report

Predict visitor purchases with a classification model for an E-commerce store

Project Advisor:
Ma’am Zahra Ali

Submitted By:
Sr. no Names Student IDs

1 Ehtisham-Ul-Haq 17011065-013

2 Hamza Muzaffar 17011065-006

3 Khalil Ahmed 17011065-003

4 Abdul Rab 17011065-009

Capstone 1

University of Management and Technology

Sialkot Pakistan
Final Approval

Panel of Examiners

• Head of Department / Area Coordinator ______________________
Department of Computer Science
UMT Sialkot

• Committee Head (Final Year Projects) ______________________
Department of Computer Science
UMT Sialkot

• Supervisor ______________________
Department of Computer Science
UMT Sialkot

• Co-Supervisor ______________________
Department of Computer Science
UMT Sialkot

• UMT Librarian ______________________
Department of Computer Science
UMT Sialkot

1. External Examiner ______________________

Controller of Examinations ___________________

Table of Contents
Definitions and Acronyms
List of Figures
List of Tables
1 Introduction
    1.1 Problem Overview
    1.2 Research Questions
    1.3 Research Objectives
    1.4 Scope
    1.5 Methodology
    1.6 Significance/ Potential Applications
2 Background
3 Literature Review
    3.1 Online/Offline Shopping
        3.1.1 Comparison between Online and Offline Shopping
    3.2 E-commerce
        3.2.1 Types of Ecommerce
        3.2.2 Ecommerce Business Model
        3.2.3 Lower Sales Conversions challenge faced in the Ecommerce Webspace
    3.3 Big Data
    3.4 BigQuery
        3.4.1 BigQuery ML
    3.5 Data Mining
        3.5.1 Classification in Data Mining
    3.6 Machine Learning
        3.6.1 Categories of classification
    3.7 Cloud Computing
    3.8 Related Work
    3.9 Gap Analysis
4 Proposed Methodology
    Suggested Approach
    Workflow of the System
5 Design and Implementation
    System Design
    System Implementation
    Assumptions/Constraints (Optional)
6 References/ Bibliography


List of Tables
Table 1. Online shopping
Table 2. Offline shopping
Table 3. Summary Table

1 Introduction

Technology has grown at a tremendous rate in recent years and has brought revolutionary changes to human
life. There is hardly a field in which technology does not play a vital role, and the invention of the
internet in particular has changed how people live and work. Of a world population of about 7.674 billion,
4.66 billion people (59.5%) are connected to the internet and use it to meet challenges in every field of
life. Recent surveys suggest that in the coming years almost everyone and everything will be connected to
the internet, and that data now handled manually will move to online systems in digital form.
As the number of internet users and the number of fields adopting technology increase day by day,
competition has increased in every walk of life. The most relevant example is the “online system”, which
is regarded as one of the most efficient, quick, and time-saving kinds of system available to people
worldwide. The term “online” indicates a connected state: a functional unit is online when it is connected
to a system and ready for use. Online systems matter because people today are in a hurry to complete their
tasks, formal or informal, and usually prefer to complete them online. They are used not only by the
private sector, business communities, industry, academic institutions, and journalism but also by the
government sector for registration, announcements of events or jobs, online transactions, and so on.
Online shopping attracts people of every age, especially the young generation, which is the most connected
to the internet and can buy products from anywhere around the globe. In our country, almost 17.1% of users
take an interest in online shopping; most of them are aged 15-33, an age group that makes up almost 63% of
the total population. E-commerce stands for electronic commerce, a business model in which companies or
marketers buy and sell their products through the internet. Anything bought or sold through the internet
counts as e-commerce, and many sites exist through which almost anything can be purchased. These sites are
used not only for shopping but also for running online businesses: people can communicate online and buy
or sell products of their choice by visiting a website at any time and from any place in the world.
On online shopping sites such as Daraz, Amazon, Alibaba, AliExpress, and OLX, a user visits the website,
finds a product, compares product details, customer comments, discounts and offers, return policies,
delivery time, size, and price, then places an order and receives it at home. The user does not need to go
from one market to another just to shop; a mobile phone and an internet connection are enough to buy a
product with a single click, which saves valuable time. Online shopping is therefore a leading trend,
driven both by the development of technology and by our generation.
In our project, we predict visitor purchases using machine learning techniques in BigQuery. When a
customer visits an e-commerce store and spends time browsing and selecting products, we focus on the time
spent and the products the customer prefers, so that on a future visit the store can show those products
at the top and the customer can find them at first glance. The main purpose of the project is to predict
whether a visitor will make a purchase on a future visit, using machine learning techniques such as
logistic regression. First, we load a public dataset into BigQuery and explore the e-commerce visitor
sessions, which gives an initial aggregation of the raw data with redundant information removed. Next, we
use a machine learning technique in BigQuery to predict whether the visitor will purchase a product in
the future. Finally, we evaluate the performance of the model and use feature engineering to improve it.
BigQuery is central to the project, so it is important to understand it. As the name suggests, it is built
to handle massive amounts of data. BigQuery is a fully managed, serverless data warehouse that lets us
focus on analytics instead of managing infrastructure, and it was designed to make large-scale data
analysis accessible to everyone. In our project, we use BigQuery to find public datasets and to query and
explore the e-commerce dataset.

1.1 Problem Overview

Classifying whether a web shop session will end in a purchase is a relevant use case for prediction in
e-commerce. Such classification, followed by the display of gift cards to non-purchasing customers to
persuade them to buy after all, has been shown to increase the turnover of a large German clothing
retailer. A variety of prediction models and data sources exist for carrying out such predictions. This
report aims to identify well-suited prediction models and compare their performance across different data
types, such as static and dynamic data, to establish how customers can best be classified as buyers or
non-buyers. This leads to the following research questions.

1.2 Research Questions

Question No.1: How will we gather data on customers browsing the e-commerce website?
Question No.2: What kind of data will be used to determine customer behavior?
Question No.3: What are the reasons a customer might not buy something until a later visit?
Question No.4: What is the role of BigQuery in analyzing customer behavior?
Question No.5: Which classification method gives the best results in predicting customer behavior?
Question No.6: How can customers be classified as buyers and non-buyers on a subsequent visit?

1.3 Research Objectives

• To use and explore e-commerce datasets.
• To create a machine learning model that predicts customer behavior on the first visit and whether the
person will buy on a subsequent visit to the website.
• To train the model on the dataset and evaluate its predictions.
• To improve the performance of the machine learning model through feature engineering.
• To analyze various machine learning models and suggest the best model based on the results.
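
The evaluation objective above can be made concrete with the standard metrics for a binary classifier. The following is a minimal sketch in Python, not the project's implementation: the function names and the example labels are hypothetical and chosen only for illustration. Since purchasing sessions are often a small share of all sessions, the sketch reports precision and recall alongside accuracy.

```python
def confusion_counts(actual, predicted):
    """Count outcomes for 0/1 labels, where 1 means the visitor purchased."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if a == 1 and p == 1:
            tp += 1   # predicted purchase, visitor purchased
        elif a == 0 and p == 1:
            fp += 1   # predicted purchase, visitor did not purchase
        elif a == 0 and p == 0:
            tn += 1   # predicted no purchase, visitor did not purchase
        else:
            fn += 1   # predicted no purchase, visitor purchased
    return tp, fp, tn, fn


def evaluate(actual, predicted):
    """Return accuracy, precision, recall and F1 score as a dictionary."""
    tp, fp, tn, fn = confusion_counts(actual, predicted)
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical labels for ten visitors (1 = purchased on a later visit).
actual = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]
predicted = [1, 0, 0, 0, 0, 1, 0, 1, 0, 0]
print(evaluate(actual, predicted))
```

Comparing candidate models on the same held-out data with these metrics is one way to support the last objective, suggesting the best model based on results.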

1.4 Scope

• Analysis of clusters of user journey data to identify patterns in customers’ second-visit purchasing
choices.
• Further analysis of feature trends that discriminate purchase from no-purchase events and yield
distinctive customer insights.
• Support for targeted recommendations and follow-ups in the form of marketing coupons and offers, to
improve the purchase conversion rate.

1.5 Methodology
In our proposed methodology, we use a machine learning classification model in BigQuery to predict
whether a visitor to an e-commerce store will purchase a product.
In the initial stage of the project, we will study all the features in the dataset in detail. After
redundant and unnecessary information is removed, the Google Analytics dataset will be exported into
BigQuery. From the exported data we will determine which visitors made a purchase and how much time
customers spent on the e-commerce store on their first visit without purchasing. We then move to the main
task, the classification model. We will use logistic regression as the base model. By applying the base
model in BigQuery, we will be able to predict whether a visitor will purchase anything in the future.
Then we will evaluate the performance of the model. Finally, after evaluation, we will use feature
engineering to improve the model’s performance.
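
To illustrate what the logistic regression base model computes, the following is a minimal sketch in plain Python. It is not the BigQuery implementation proposed above, where the model would be trained inside the data warehouse. The two features (minutes on site and pages viewed), the six example sessions, the learning rate, and the number of epochs are all hypothetical values chosen for illustration.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    exp_z = math.exp(z)
    return exp_z / (1.0 + exp_z)


def predict_proba(weights, bias, features):
    """Estimated probability that a session leads to a later purchase."""
    score = sum(w * x for w, x in zip(weights, features)) + bias
    return sigmoid(score)


def train_logistic_regression(rows, labels, learning_rate=0.1, epochs=5000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n_samples = len(rows)
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for features, label in zip(rows, labels):
            error = predict_proba(weights, bias, features) - label
            for j, x in enumerate(features):
                grad_w[j] += error * x
            grad_b += error
        # Step against the average gradient over all sessions.
        weights = [w - learning_rate * g / n_samples
                   for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n_samples
    return weights, bias


# Hypothetical first-visit sessions: [minutes on site, pages viewed / 10].
sessions = [[0.5, 0.1], [1.0, 0.2], [1.5, 0.3],
            [4.0, 0.9], [5.0, 1.2], [6.0, 1.5]]
purchased_later = [0, 0, 0, 1, 1, 1]

weights, bias = train_logistic_regression(sessions, purchased_later)
for features in sessions:
    print(features, round(predict_proba(weights, bias, features), 3))
```

A visitor would be classified as a likely buyer when the estimated probability exceeds a chosen threshold such as 0.5. Feature engineering, the last step of the methodology, corresponds here to changing what goes into each feature list.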

1.6 Significance/ Potential Applications

This project is significant for people’s daily lives. A person who wants to buy a product can simply
search for it in the search bar, usually found at the top of the website, and place an order, which is
easier and faster than going from one shopping center to another. As the population and the number of
internet connections grow, the number of online shoppers is increasing day by day, and marketers need a
good strategic plan to grow their business. The project can help businesses by giving them meaningful
insights into customers’ purchasing habits. E-commerce store owners can learn which products sell best,
discard those that sell least, and build a better user experience in which the customer purchases on the
first visit to the site. Marketers can target the users who are least likely to buy with special
promotions to increase the purchase conversion rate. The project can be applied to both forecasting and
classification applications in which e-commerce stores predict sales from historical data. Any internet
user can use the project for selling and buying, or as a first step toward starting a business and
earning a good income.

2 Background

The extraordinary advancement of technology in recent years has reshaped the whole world and the lives of
the people living in it. There is hardly a field in which technology does not play a vital role, and with
the invention of the internet, unpredictable changes have occurred. As the number of internet users and
the number of fields adopting technology increase day by day, competition has increased in every walk of
life, and the most relevant example is the “online system”.
The online system is regarded as one of the most efficient, quick, and time-saving kinds of system
available to people worldwide. The term “online” indicates a connected state: a functional unit is online
when it is connected to a system and ready for use. Online systems matter because people today are in a
hurry to complete their tasks, formal or informal, and usually prefer to complete them online. They are
used not only by the private sector, business communities, industry, academic institutions, and
journalism but also by the government sector for registration, announcements of events or jobs, online
transactions, and so on. Online shopping attracts people of every age, especially the young generation,
which is the most connected to the internet and can buy products from anywhere around the globe.
E-commerce stands for electronic commerce, a business model in which companies or marketers buy and sell
their products through the internet. Anything bought or sold through the internet counts as e-commerce,
and many sites exist through which almost anything can be purchased. These sites are used not only for
shopping but also for running online businesses: people can communicate online and buy or sell products
of their choice by visiting a website at any time and from any place in the world. On online shopping
sites such as Daraz, Amazon, Alibaba, AliExpress, and OLX, a user visits the website, finds a product,
compares product details, customer comments, discounts and offers, return policies, delivery time, size,
and price, then places an order and receives it at home. The user does not need to go from one market to
another just to shop; a mobile phone and an internet connection are enough to buy a product with a single
click, which saves valuable time. Online shopping is therefore a leading trend, driven both by the
development of technology and by our generation.
Recent surveys suggest that in the coming years almost everyone and everything will be connected to the
internet, and that data now handled manually will move to online systems in digital form. Of a world
population of about 7.674 billion, 4.66 billion people (59.5%) are connected to the internet and use it to
meet challenges in every field of life. In our country, almost 17.1% of users take an interest in online
shopping; most of them are aged 15-33, an age group that makes up almost 63% of the total population. One
recent report says that in 2022 the figure for Pakistan would reach 25%, a positive sign from our youth.
The future of e-commerce in Pakistan is bright, although some factors cause us to lag behind other
countries, among them the illiteracy rate and the lack of financial inclusion. In Indonesia, internet
users reached almost 50% of the total population, approximately 104.96 million people. The latest survey
projects that this number will reach 133.39 million in 2021, which would make Indonesia one of the
biggest online markets globally. People who sell or buy online there fall mainly in the 25-34 age group
and number approximately 12.8 million.
In the era of technology, the World Wide Web, machine learning, and deep learning have come to shape the
whole world. Systems have been developed that can make accurate predictions about the future, using
methods such as clustering and classification in place of individual human judgment. Many countries,
including America, China, Greece, Indonesia, Malaysia, and Portugal, are working to develop effective
models that predict accurately by analyzing customer behavior and customer satisfaction. In America,
almost 209.6 million people out of a population of 328.2 million visit e-commerce stores, and internet
users make up 87.3% of the population. China is one of the fastest growing countries and has the largest
e-commerce market in the world, with internet users at 54.3% of the population. In earlier work in
Malaysia, a “Probability Systematic Sampling Technique” was used: printed questionnaires were distributed
among people of different ages and the results collected. The main focus of that study was customer
behavior and gender differences in online shopping. In Athens, Greece, a quite different technique was
applied with efficient results: the authors studied the application of recommendation systems for
electronic commerce, focusing on the important characteristics and requirements of the environment, and
presented “A Hybrid Model for Dynamic Recommendations in Electronic Commerce Retail Sites”. A study from
Indonesia proposed a conceptual model of e-services built on five major attributes: reliability,
enjoyment, control, ease of use, and speed of delivery.
The purpose of our research is to observe the attitude and behavior of customers and to understand the
previous work thoroughly, in order to predict whether a customer will revisit and buy a product in the
future. Many prediction systems exist today, but many of them require human involvement. We propose a
system that requires no human involvement and performs all prediction tasks by itself, saving people’s
precious time. Deploying a fully automatic system on an e-commerce site would be a remarkable change for
business communities. The underlying aim is to provide a prediction system through which any store owner
can understand what customers are thinking and what is required to satisfy them. This is done by
observing the customer’s behavior and activities, taking account of likes, dislikes, and interests,
offering discounts, and trying to meet expectations until the customer is satisfied.

3 Literature Review

3.1 Online/Offline shopping

Online shopping attracts people of every age, especially the young generation, which is the most connected
to the internet and can buy products from anywhere around the globe. In our country, almost 17.1% of users
take an interest in online shopping; most of them are aged 15-33, an age group that makes up almost 63% of
the total population. E-commerce stands for electronic commerce, a business model in which companies or
marketers buy and sell their products through the internet. Anything bought or sold through the internet
counts as e-commerce, and many sites exist through which almost anything can be purchased. These sites are
used not only for shopping but also for running online businesses: people can communicate online and buy
or sell products of their choice by visiting a website at any time and from any place in the world.
On online shopping sites such as Daraz, Amazon, Alibaba, AliExpress, and OLX, a user visits the website,
finds a product, compares product details, customer comments, discounts and offers, return policies,
delivery time, size, and price, then places an order and receives it at home. The user does not need to go
from one market to another just to shop; a mobile phone and an internet connection are enough to buy a
product with a single click, which saves valuable time. Online shopping is therefore a leading trend,
driven both by the development of technology and by our generation.

3.1.1 Comparison between Online and Offline Shopping

Table 1. Online shopping

No. | Pros | Cons
1 | A wide range of products, such as clothes, shoes, accessories, and household items, is available on a single shared platform. | Risk is a main factor: the customer has not touched the product, so its size, color, etc. remain uncertain.
2 | A customer who does not like the product can return it within the given time period, which reduces travel expenses. | Quality is hard to check online; the customer buys relying only on the product details and the feedback given by other customers.
3 | It saves time, is available 24/7, and offers quick service; it is considered the most efficient option, and anyone can shop at any time from anywhere. | Waiting for delivery is the most irritating part of online shopping, and customers have no control over the delivery process if the product arrives late or damaged.
4 | Good offers and deals are available on online products that are absent in offline stores. | Color and size problems are the most common issue in online shopping.
5 | We can find our favorite product ourselves, compare it with others, and be well assured before buying. | Only interested internet users can buy online, whereas illiterate people are totally unaware of online shopping.
6 | It is the most convenient way to shop; all items are just one click away from the user. | Although it is convenient, the customer is not completely satisfied until he receives and checks the product.
7 | Prices can be compared: we can read the complete details of products and then make the final decision to buy. | It is very hard for online marketers to build the customer’s trust and satisfaction for future visits.
8 | It offers features such as live pictures instead of photo shoots, no compromise on quality assurance, better product availability and accessibility, fast delivery, and suggestions based on previous records. | Product information must be given correctly; otherwise owners face severe consequences when customers report the site for misleading people.
9 | Good planning and policy decisions keep the online system up to date, for example on pricing, return policies, sales, and offers. | There is no concept of bargaining in online shopping.
10 | Payment need not be in cash; users can pay in multiple ways, such as credit card, debit card, or PayPal. | Account security and privacy are at high risk; we can never be sure that our credit card will not be misused.

Table 2. Offline shopping

No. | Pros | Cons
1 | Offline shopping happens in real time, and customer satisfaction is one of its main strengths. | The range of qualities and quantities is limited, so there are fewer choices.
2 | The customer can check the quality of the product and buy only after being completely satisfied. | The price of the product is high, and the owner cannot know the customer’s likes and dislikes.
3 | We do not have to wait days, weeks, or months for delivery to get the product. | It is time consuming; the user has to go from one market to another and may travel a long distance.
4 | A dress that is needed immediately cannot be bought online, so offline shopping is preferable in an emergency. | It is more costly than online shopping: going from one shop to another can be exhausting, and we may spend hours looking at dresses and sometimes find nothing.
5 | Customer security and privacy are never compromised; there is no need to put our transaction records at risk. | The owner cannot know the customer’s likes or dislikes, or predict whether the customer will revisit in the future.
6 | Returns are quick: a customer who is not satisfied with the purchased product just needs to visit the store and get it replaced. | Discounts and offers last a very short time, sometimes just a few hours, and most users do not learn about them before the offer ends.
7 | Payment is very simple: the user just selects his favorite product and pays the owner in cash. | Lack of information is a drawback; often neither the owner nor the buyer has correct information about the product.
8 | There are no color, size, or fitting issues, and we can inspect the product closely before finalizing the purchase. | Availability is limited; stores are not open 24/7.

3.2 E-commerce
E-commerce, also referred to as electronic commerce or internet commerce, is the buying and selling of goods, products or services over the internet, together with the transfer of money and data needed to execute these transactions. The history of e-commerce begins with the first online sale, which dates back to August 11, 1994, when a man sold a CD to his friend through his retail website. Since its inception, e-commerce has evolved to make products easier to discover and purchase through online retailers and marketplaces. It has helped all kinds of sellers, whether freelancers, small businesses or large businesses, to provide their products and services to consumers at a larger scale without any intermediary, which was not possible with traditional bricks-and-mortar retail stores.
3.2.1 Types of Ecommerce
1. Retail: Selling goods, products and services directly to consumers through multiple channels without an intermediary.
2. Dropshipping: Selling goods that are manufactured and sold by one party but shipped to the consumer by a third party.
3. Wholesale: Selling goods in large quantities. Wholesalers provide these goods to retailers in exchange for money, and the retailers then sell them to consumers.
4. Services: Technical or other skills that help solve day-to-day problems, such as social media marketing, which are purchased and paid for online.
5. Subscription: Selling a product or service in return for recurring monthly or yearly payments.
6. Digital products: Downloadable items such as e-books, templates, software and tools that must be paid for before use.

3.2.2 Ecommerce Business Model

The e-commerce business model is an online retail shop, unlike traditional Bricks and Mortar (B&M) retail stores. It is a platform that allows customers to submit orders for a product or service through a merchandising website, known as the e-commerce website. A web store built on this model has integrated modules for payment processing and for warehouse management. These services allow customers to purchase products without hassle, paying with payment cards, i.e., credit or debit cards, or with an online wallet service that holds customer credit. The e-commerce business model can be classified into its two most popular categories, Business to Consumer (B2C) and Business to Business (B2B).

Business to Consumer (B2C)

The Business to Consumer (B2C) e-commerce model allows people to open a retail store that sells various types of products online, and to operate as an independent brand selling to consumers through a website. The B2C model focuses on selling products or services directly to end consumers without the involvement of a third party. A prime example of B2C is Amazon, one of the biggest online e-commerce companies in the world, which made its founder Jeff Bezos one of the richest people in the world.

Business to Business (B2B)

The Business to Business (B2B) model aims to build a good buying relationship between two business entities. It removes the need for the two parties to be personally involved during a deal, since everything is handled digitally; completing a sale and its transactions does not require the parties to meet in person. According to one study, 70% of sales are made in advance, 92% of buyers search for product insights on the internet before purchase, and about 37% of customers post comments and questions for feedback. Other studies report that 75% of B2B deals are made on social media and 57% of purchases are made before meeting a salesperson. A prime example of the B2B model is Alibaba Group, one of the largest B2B marketplace providers in the world.

3.2.3 Lower Sales Conversions challenge faced in the Ecommerce Webspace

The sales conversion rate is a metric calculated as the number of customers who complete a purchase on the e-commerce website divided by the total number of customers who visited the site. For example, if a website has 100 visitors per day looking to purchase a shoe but only 10 of them complete the process, the website has a conversion rate of 10%. More broadly, a conversion is any case where the user follows a call to action and completes the required steps. For example, if a website asks a user to enter his/her email address to subscribe to the company's newsletter and the user successfully subscribes, that counts as a conversion. To optimize the conversion rate, we plan to analyze which machine learning models are best suited to help retailers improve the way they approach customers and increase the overall visits-to-sale ratio.
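
The conversion rate calculation described above can be sketched in a few lines of Python. This is only an illustration; the function name conversion_rate and its parameters are our own assumptions and do not come from any library.

```python
def conversion_rate(conversions: int, visitors: int) -> float:
    """Return the conversion rate as a percentage of visitors."""
    if visitors <= 0:
        raise ValueError("visitors must be a positive number")
    if not 0 <= conversions <= visitors:
        raise ValueError("conversions must be between 0 and visitors")
    return 100.0 * conversions / visitors


# The example from the text: 100 visitors, 10 completed purchases.
rate = conversion_rate(conversions=10, visitors=100)
print(f"Conversion rate: {rate:.1f}%")
```

The same function applies to any call to action, for instance newsletter sign-ups divided by the visitors who saw the sign-up form.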

3.3 Big Data

The term big data refers to a large volume of data, which can be structured or unstructured. Structured data, also known as quantitative data, is highly organized and formatted so that it is easily searchable and accessible in relational databases, for example names, dates, addresses and locations. Unstructured data, often known as qualitative data, has no predefined format, which makes it hard to collect, process and analyze, for example text files, video and audio files, weather data and financial transactions. As its name suggests, big data includes a large amount of data, but it is not the size that matters most; it is the way this data is used by businesses around the globe. Big data can be analyzed for insights that help organizations make better strategic business moves and decisions.

Big data is so large, fast-moving and computationally complex that it cannot be processed using traditional methods. The benefits it can bring to retailers include cost reductions, time savings, production in line with demand, optimized offerings to customers that increase profits, smarter decision making and more. The technologies that make use of big data analytics include classification, cluster analysis, data integration, machine learning, natural language processing, sentiment analysis and many more. Big data has become a necessity for these technologies, as it is needed to uncover hidden patterns and find useful answers without over-fitting the data. Big data works by the following principles:
• Identify
• Access, Manage and Store
• Analyze
• Make decisions

3.4 BigQuery
It is hard to imagine a marketer who does not work with Google Ads, Google Analytics and similar services. Google BigQuery is a significant part of Google's infrastructure, and marketing analytics is a necessity for the progress of all kinds of business. Google is continuously developing BigQuery, which suggests that the service is unlikely to be discontinued or left unsupported in future. BigQuery has a simple and fast infrastructure, and it includes ready-made SQL queries so that useful insights can be drawn from the gathered data. BigQuery works with machine learning, which helps analyze and automate marketing by segmenting the target audience and searching for meaningful insights. It is a fully serverless data warehouse that enables safe and scalable analysis of large amounts of data. Moreover, BigQuery does not need Python to build models; it uses SQL to build and access machine learning models. This speeds up the model building process, as the need to extract and export data from the data warehouse is eliminated.

Some of the features that make it stand out are as follows:

• No servers: The infrastructure does not require us to maintain any server, and work can be done from anywhere in the world.
• Data processing: Enhancements made in BigQuery enable real-time analysis of any kind of data, however large, using SQL queries at any scale.
• Integrations: It can be integrated with other services of the Google platform, such as Google Analytics and Google Ads, which enhances the overall accuracy and efficiency of the product.
• Data security: All data in BigQuery is protected and kept according to Google standards. No matter where we log in from, we always have secure access to the data.

3.4.1 BigQuery ML
This feature allows us to create and run machine learning models using standard SQL tools and queries. It significantly increases development speed by removing the need to move data. The functionality is available through the Google Cloud Console, the bq command-line tool, the BigQuery REST API and Jupyter notebooks. Machine learning on large datasets requires extensive programming and knowledge of ML frameworks. As a result, this area of development was restricted to experts of the field, leaving out many data analysts who understand the data but have no exposure to advanced ML techniques and programming. BigQuery ML addresses this by empowering data analysts to use ML through existing SQL tools. They can use BigQuery ML to create and evaluate machine learning models without having to move data back and forth into spreadsheets. In BigQuery ML, we can supply a model with data from multiple datasets in order to train it and produce predictions that guide business decision making and increase profits. BigQuery ML supports the following models:
• Linear Regression
• Binary Logistic Regression
• Multiclass Logistic Regression
• K-means clustering
• TensorFlow-based Deep Neural Network
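
To give a feel for how such a model is expressed in SQL, the sketch below builds BigQuery ML statements as plain Python strings and prints them. CREATE MODEL, ML.EVALUATE and ML.PREDICT are BigQuery ML constructs; everything else is a hypothetical placeholder chosen for illustration: the model name ecommerce.visitor_model, the table ecommerce.sessions and the columns will_buy, bounces and time_on_site. The helper functions are our own, and the statements are only printed here, not sent to BigQuery.

```python
def create_model_sql(model_name: str, label_column: str, training_query: str) -> str:
    """Build a BigQuery ML statement that trains a binary logistic regression."""
    return (
        f"CREATE OR REPLACE MODEL `{model_name}`\n"
        f"OPTIONS(model_type='logistic_reg', input_label_cols=['{label_column}']) AS\n"
        f"{training_query.strip()}"
    )


def evaluate_model_sql(model_name: str, evaluation_query: str) -> str:
    """Build a statement that evaluates a trained model on labeled rows."""
    return (
        f"SELECT * FROM ML.EVALUATE(MODEL `{model_name}`, (\n"
        f"{evaluation_query.strip()}\n))"
    )


def predict_sql(model_name: str, input_query: str) -> str:
    """Build a statement that scores new rows with a trained model."""
    return (
        f"SELECT * FROM ML.PREDICT(MODEL `{model_name}`, (\n"
        f"{input_query.strip()}\n))"
    )


# Hypothetical table and column names, used only for illustration.
TRAINING_QUERY = """
SELECT will_buy, bounces, time_on_site
FROM `ecommerce.sessions`
"""

print(create_model_sql("ecommerce.visitor_model", "will_buy", TRAINING_QUERY))
print(evaluate_model_sql("ecommerce.visitor_model", TRAINING_QUERY))
print(predict_sql("ecommerce.visitor_model", "SELECT bounces, time_on_site FROM `ecommerce.sessions`"))
```

The printed statements could then be pasted into the BigQuery console or passed to a client library, which is the step where credentials and a real dataset would be needed.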

3.5 Data Mining

Data mining is a branch of data analytics used by firms and organizations around the globe to turn raw data into meaningful information. It uses software to look for patterns in large batches of data, which helps businesses learn more about their customers in order to develop more effective marketing strategies, decrease costs, increase sales and profits, and build loyal, long-lasting relationships with customers. Data mining also helps the healthcare and education sectors by reducing costs and supporting efficient operational decisions. Data mining depends on effective data collection, data warehousing and computer processing. These processes are used to develop machine learning models that power applications such as website recommendations and search engines.

Data mining works by exploring and analyzing large chunks of information to extract meaningful trends and patterns. The process includes five steps: (1) collecting data and loading it into warehouses, (2) storing and managing the data on servers or in the cloud, (3) accessing the data and deciding how to organize and use it, (4) using software to sort the data, and (5) presenting the data in an easy-to-read format, whether a table or a graph, to identify patterns and perform data analysis.
3.5.1 Classification in Data Mining
Classification is the technique most widely used to meet business needs. This mining technique distinguishes data classes and concepts by building a classification model: an algorithm is trained on example data so that the model learns to predict accurate results. One example is deciding whether a customer in a credit card transaction database should be classified as trustworthy or as a defaulter, given his demographic and previous purchase characteristics.

3.6 Machine learning

Machine learning is a field of Artificial Intelligence (AI) concerned with the development and study of applications and machines that learn from data, using experience to improve the efficiency of a system. When data is processed, the system learns from the data set and makes predictions based on it.

3.6.1 Categories of classification

Machine learning is often categorized into three types:
o Supervised Learning
o Unsupervised Learning
o Semi-Supervised Learning

Supervised Learning

Supervised learning is the type of machine learning that uses labeled data. It requires training data consisting of a set of labeled examples, and it makes predictions based on that labeled training set. When known data is given to the system (machine), the model processes it, produces a prediction for the given data and finally decides the output for the object. In this sense it works on the basis of previous experience.
Classification is a category of supervised learning that determines the class to which each data element belongs. Classification in machine learning is commonly divided into:
o Binary Classification
o Multi-Class Classification

BINARY CLASSIFICATION

Binary means two, and binary classification involves exactly two class labels, e.g., spam detection, where an email is either spam or not spam. Some well-known algorithms used for binary classification are:
o Logistic Regression
o K-Nearest Neighbors
o Decision Tree
o Naïve Bayes
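
As a minimal sketch of binary classification with logistic regression, the first algorithm listed above, the following pure-Python example trains a model by batch gradient descent on a tiny made-up data set. The two features (minutes on site and pages viewed), the labels, the learning rate and the number of epochs are all assumptions chosen for illustration; this is not the project's implementation.

```python
import math


def sigmoid(z: float) -> float:
    """Map any real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def train(features, labels, learning_rate=0.1, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    weights = [0.0] * len(features[0])
    bias = 0.0
    n = len(features)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(features, labels):
            p = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
            error = p - y  # derivative of the log loss with respect to z
            for j, xi in enumerate(x):
                grad_w[j] += error * xi
            grad_b += error
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
    return weights, bias


def predict(weights, bias, x) -> int:
    """Return 1 (will buy) if the predicted probability is at least 0.5."""
    return 1 if sigmoid(bias + sum(w * xi for w, xi in zip(weights, x))) >= 0.5 else 0


# Made-up sessions: [minutes on site, pages viewed]; label 1 means a purchase.
sessions = [[1, 1], [2, 1], [1, 2], [6, 5], [7, 6], [8, 5]]
purchased = [0, 0, 0, 1, 1, 1]

weights, bias = train(sessions, purchased)
print(predict(weights, bias, [1, 1]), predict(weights, bias, [8, 5]))
```

The model outputs a probability for each session, and the 0.5 threshold turns that probability into one of the two class labels.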

MULTI-CLASS CLASSIFICATION

Multi-class classification covers tasks that have more than two class labels, e.g., face recognition, in which a face image must be assigned to one of many known individuals.

Unsupervised learning

Unsupervised learning is the type of machine learning that uses unlabeled data. The algorithm works on unlabeled data to decide about each object: when data is given to the machine, it searches for structure in the data on its own. The key point in unsupervised learning is that the data is unlabeled and the algorithm is trained on that unlabeled data. During the process, the algorithm explores the possible ways of grouping or describing the objects and finally produces a result that can be useful for further processing.

Semi-supervised learning

Semi-supervised learning is the field of machine learning that combines supervised and unsupervised learning; in other words, it uses both labeled and unlabeled data to make predictions. It is therefore suited to situations where both labeled and unlabeled data are present, and it makes predictions on the basis of both kinds of data.

3.7 Cloud computing

Cloud computing has become one of the most important models of recent years and plays a vital role in many fields, particularly in technology. It has established its importance both in Information Technology (IT) and in business. Anyone who wants to start a business must pay attention to information security, so data security is one of the main factors in running a business; it can be regarded as a weak point of cloud computing, because data may be decrypted by the cloud service provider. In the simplest terms, cloud computing can be defined as the delivery of multiple types of services over the internet, including information storage, databases, virtual machines, applications, platforms for securing data, and networking resources. In everyday life we use multiple cloud-based applications and benefit from cloud solutions: when we send a file or image over the web, download data, use a mobile app or watch a Netflix show, all of these services are stored in and served from some place within the cloud. If we delete something from our devices, it may still exist somewhere in the cloud, although we do not know where and are unable to locate it. A user can access information from anywhere in the world; the only requirement is an internet connection.

3.8 Related work

In our project, the focus is on observing the user's attitude and behavior, and on thoroughly understanding previous work, in order to predict the customer's future visits. In previous work, several prediction models have been proposed using machine learning, artificial intelligence, neural networks, decision trees, naïve Bayes, cluster analysis, binary classification and so on. In recent years, machine learning and artificial intelligence have mostly been confined to bigger markets because of the high cost of these technologies. Even so, many retailers and company owners were expected to invest in these technologies worldwide by 2021, including big data solutions for machine learning. The ability to personalize for a user's behavior, to predict, and to recommend products that a particular customer is likely to prefer on a website is handled by artificial intelligence, and all technologies that make such predictions are prominent applications of artificial intelligence. As the number of internet users increases every day, retailers tend to move towards implementing machine learning and artificial intelligence in their marketplaces to capture the customer's attention. This shows the importance of modern technology, and that is why we are implementing this approach in our project.

Previous research describes multiple techniques for predicting the buying intention of customers on online sites. Older techniques rely on statistics and probability: a sample of questions is prepared and distributed among people. The questionnaire collects basic information in the form of true/false, right/wrong, agree/disagree, etc. Once collected, the data is not in a well-ordered form. Mostly, people do not fill in the paper carefully; they just complete it as a formality. As a result, the outcome is not accurate, is treated largely as a formality, and no one trusts this type of data or its results. This technique has been used in Malaysia. In China, multiple modern techniques have been used. Researchers there have worked on an architecture based on binary classification (Naïve Bayes) and obtained quite favorable results. They have also applied collaborative filtering, which uses numerical rating values and gives the final result in Yes/No form. Collaborative filtering describes a situation in which people collaborate with each other: the behavior of each user, the actions they perform and their reactions to the items they often use are recorded. It mostly relies on gathering ratings from a massive amount of data produced by a large number of users. As we know, we are dealing with a massive amount of data; it is hard to find a favorite product in big data, and users often get confused while choosing a product.

Collaborative filtering is a technique used for automatic recommender systems. It assumes that customers who have bought the same products share similar interests and preferences. It has two classes, user-based and item-based. User-based collaborative filtering measures the common interests or behavior between the target user and other users. It predicts the rating an end user would give to a specific product from the users' recorded ratings, purchasing history and product preferences. In contrast, item-based collaborative filtering measures the similarity among the items that end users have rated or interacted with. It uses the rating or purchasing information of common items. In contrast to collaborative filtering, content-based filtering is a machine learning technique that uses product features to recommend more products similar to those the user previously liked and took an interest in. It is based on previous history, i.e., what the user has searched for, and it depends on the descriptions of products and customers to predict.
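
User-based collaborative filtering as described above can be sketched as follows. The sketch assumes cosine similarity between users and a similarity-weighted average of other users' ratings; the user names, products and ratings are made up for illustration, and published systems may use different similarity measures.

```python
import math


def cosine_similarity(a: dict, b: dict) -> float:
    """Cosine similarity between two users' rating dictionaries."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[item] * b[item] for item in common)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)


def predict_rating(ratings: dict, user: str, item: str):
    """Predict a rating as the similarity-weighted average of other users' ratings."""
    numerator = 0.0
    denominator = 0.0
    for other, other_ratings in ratings.items():
        if other == user or item not in other_ratings:
            continue
        similarity = cosine_similarity(ratings[user], other_ratings)
        numerator += similarity * other_ratings[item]
        denominator += abs(similarity)
    return numerator / denominator if denominator else None


# Made-up ratings on a 1-5 scale; "ana" has not rated the watch yet.
ratings = {
    "ana": {"shoes": 5, "bag": 1},
    "ben": {"shoes": 5, "bag": 1, "watch": 5},
    "cara": {"shoes": 1, "bag": 5, "watch": 1},
}

print(predict_rating(ratings, "ana", "watch"))
```

Because ana's ratings resemble ben's more than cara's, the predicted rating for the watch is pulled towards ben's rating. An item-based variant would compare the rating columns of products instead of the rating rows of users.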

Many prediction models have already been proposed using modern technologies such as machine learning and artificial intelligence. Some of these models and their authors are mentioned here. Liu Weixiao proposed a model named "Hybrid Intelligent Prediction Algorithms" by combining an Artificial Neural Network (ANN) with the Discrete Grey Prediction Model (DGM). The importance of artificial intelligence cannot be neglected, and it plays a major role in many fields. In the medical field, Zhou Yunhui applied data mining to breast cancer information to make predictions and introduced a treatment approach based on a Bayesian network. In the field of electricity, Xu Jun proposed a method to predict electricity demand using time series algorithms, multivariate linear regression and grey prediction. He also proposed a forecasting method that can improve electricity planning for the short, medium and long term. Peng Yuqing used data mining technology to carry out regression analysis, cluster analysis, principal component analysis and association rule mining on a massive amount of data, from which a substantial amount of information can be extracted that might help the teaching faculty of almost any institution manage their work properly and improve teaching performance.

In the earliest research, three types of models were proposed and implemented for prediction purposes: decision tree, cluster analysis and the naïve Bayes algorithm. They were extensively employed and analyzed to examine the customer's attitude, characteristics, similarities, actions, preference,

The decision tree is one of the most common classification models and is also used for prediction in data mining. It is generated by splitting the data into multiple groups, and its ability to generalize is quite strong. It includes two types of nodes: internal nodes and leaf nodes. Internal nodes test features or attributes, whereas leaf nodes represent the resulting classes. Among all nodes, the root node has no parent, every other node has exactly one parent, and a node without any child node is a leaf node. The main idea of the decision tree is to choose suitable labels for the input data: it tests the attributes of the training set, gathers information about them, and keeps classifying uncertain cases by predicting the best possible result until the data can no longer be split usefully. It starts from the root node, where the data is divided into two groups according to similarity, and keeps dividing the data until a leaf node is reached, which gives the prediction. Several properties show its importance: it places low requirements on the data set, it can process both continuous and categorical data, it is easy to understand and implement, and it does not have high
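
The splitting step of a decision tree can be illustrated with a short sketch. It assumes Gini impurity as the split criterion, which is one common choice (information gain is another) and is not necessarily the criterion used in the cited work; the feature values and labels are made up.

```python
def gini(labels) -> float:
    """Gini impurity of a list of class labels (0.0 means a pure group)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def best_split(values, labels):
    """Return (threshold, weighted impurity) of the best split 'value <= threshold'."""
    best = (None, float("inf"))
    for threshold in sorted(set(values)):
        left = [y for x, y in zip(values, labels) if x <= threshold]
        right = [y for x, y in zip(values, labels) if x > threshold]
        if not left or not right:
            continue  # a split must send data to both children
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if weighted < best[1]:
            best = (threshold, weighted)
    return best


# Made-up data: minutes spent on the site and whether the visitor purchased.
minutes = [1, 2, 3, 8, 9, 10]
bought = [0, 0, 0, 1, 1, 1]

print(best_split(minutes, bought))  # (3, 0.0): both children are pure
```

A full tree repeats this search on each child group until the groups are pure or too small, and each final group becomes a leaf node.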

Cluster analysis is an unsupervised technique in which the data is split into various classes on the basis of common attributes. Objects in the same class have high similarity, while objects in different classes have low similarity. Clustering belongs to unsupervised learning and copes well with small amounts of data. It relies on the characteristics and nature of the data itself. The K-Means algorithm is the most famous clustering technique used in machine learning, and it divides the objects into clusters according to their distance from the cluster averages. Its calculation is simple: for a data set containing n objects, randomly select k objects as the initial cluster centers, then assign each remaining object to the nearest cluster according to the distance between the object and the center of each cluster. Then recalculate the average value of each cluster, and repeat until similarity is high within each class and low between different classes.
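
The K-Means procedure described above can be sketched in pure Python. One deliberate deviation: to keep the example reproducible, the first k points are used as initial centers instead of a random selection. The points are made up for illustration.

```python
def kmeans(points, k, iterations=10):
    """Cluster points with K-Means, using the first k points as initial centers."""
    centroids = [points[i] for i in range(k)]
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: put every point in the cluster of its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            distances = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        # Update step: move each center to the average of its cluster.
        new_centroids = []
        for cluster, old in zip(clusters, centroids):
            if cluster:
                new_centroids.append(tuple(sum(dim) / len(cluster) for dim in zip(*cluster)))
            else:
                new_centroids.append(old)  # keep the center of an empty cluster
        if new_centroids == centroids:
            break  # assignments are stable, so the algorithm has converged
        centroids = new_centroids
    return centroids, clusters


# Made-up 2D points forming two visibly separate groups.
points = [(1, 1), (9, 9), (1, 2), (2, 1), (8, 9), (9, 8)]
centroids, clusters = kmeans(points, k=2)
print(centroids)
print(clusters)
```

With random initialization the result can depend on the starting centers, which is why K-Means is often run several times in practice.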

In our project, we are going to use a machine learning model based on logistic regression to predict the buying behavior of users on e-commerce sites. Through it, a market owner can understand how the customer thinks and what is required for the customer's satisfaction. This is done by observing the customer's behavior, actions and activities, calculating session time and the number of clicks on products, maintaining the history record, and taking account of his/her likes, dislikes and interests, so that discounts can be offered, expectations fulfilled and satisfaction finally gained. The parameters used in our project are: date and time, IP address, cookies, actions, target, url, object.

As the initial step of our project, we are going to use a publicly available dataset and load it into BigQuery. After loading the dataset into BigQuery, we will extract information about each visitor, such as how much time he has spent on the e-commerce store. At that point, we will have collected the information related to visitors and products, with unnecessary information removed. We will then move to our main task, the classification model. We will use a logistic regression model as our base model, from which we start working towards our main objective. By applying the base model in BigQuery, we will be able to predict whether or not a visitor will purchase anything in future. Then we will evaluate the performance of the model in predicting and deciding about visitors. Finally, after evaluation, we will use feature engineering to improve the performance of our project.
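
The evaluation step mentioned above can be illustrated with a small sketch that compares predicted and actual purchase labels. The metrics shown (accuracy, precision and recall) are common choices for binary classifiers and are an assumption on our part; the label lists are made up and do not represent project results.

```python
def evaluate(actual, predicted) -> dict:
    """Compute accuracy, precision and recall for binary labels (1 = purchase)."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)  # true positives
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)  # true negatives
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)  # false positives
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)  # false negatives
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Made-up labels for eight visitors: did they purchase, and what was predicted?
actual = [1, 0, 0, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 0, 0, 1, 1, 0]

metrics = evaluate(actual, predicted)
print(metrics)
```

Since purchasers are usually a small share of visitors, precision and recall tend to be more informative than accuracy alone when comparing models before and after feature engineering.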

3.9 GAP Analysis

Table 3. Summary Table

Columns: Sr., Domain, Architecture, Pros/cons, No. of features, Dataset, Results

[1] Domain: Machine Learning Based Method for Customer Behavior Prediction
    Architecture: Classification models; machine learning such as decision tree, cluster analysis, data mining
    Pros:
      1: Manages a massive amount of data with the help of data mining.
      2: Can transmit data over wide ranges. Object detection runs on mobile platform.
    Cons:
      1: The issue that arises here is handling the sensitive and large amount of data.
      2: It depends upon customer segmentation based on the data provided by the company history.
    No. of features: The proposed system predicts the user by gaining information about features such as age, profession, no. of family members, cultural level, region and marital status.
    Dataset: Dataset is provided by Adventure Work Company.
    Results: The result gained from the proposed model is quite satisfactory; the result of decision tree is 73.43% and of cluster analysis is 61.66%.

[2] Domain: Predicting Customer Behavior in E-commerce using Machine Learning
    Architecture: Machine learning and artificial intelligence techniques are implemented.
    Pros:
      1: It identifies the uncertain attitude of the user.
      2: Comparison and focus on the needs of users.
      3: It helps to increase the sales and revenue of
      4: Access to the data of more than 50 million events for future actions and event occurrences.
    Cons:
      1: It uses sensitive data to train the model, which may affect the result and
      2: It is not well assured about data quantity and all the relevant information about intended
      3: It requires removing unnecessary data that may lead us in the wrong direction.
    No. of features: Multiple features are used in this model: product view, cart adds, cart removes, no. of clicks, cart tab clicks, payment page viewed.
    Dataset: Dataset is provided by Tipser Company, a company that runs its own e-commerce business.
    Results:

[3] Domain: E-commerce Purchase Prediction Approach By User Behavior Data
    Architecture: Machine learning algorithms; binary classification model of Naïve Bayes
    Pros:
      1: It is a fast and convenient way to discover products in a complex situation.
      2: It helps e-commerce companies to understand the user demands more
      3: It helps to build the relationship b/w product and user to purchase or
    Cons:
      1: Users often get confused by the large amount of data while selecting their favorite product.
      2: It needs to eliminate sparse data that includes many gaps.
      3: It is completely against single user vs massive variety of products.
    No. of features: Extracting information from user behavior data, user's preference, and actions performed on products such as no. of clicks, clicking time and duration time.
    Dataset: Dataset is provided by Ali Mobile Recommendation Company. It consists of 12 million records of users across
    Results: The result of the classification model is 63.89%.

[5] Domain: Online-Purchasing Behavior Forecasting with a Firefly Algorithm-based SVM Considering Shopping Cart Use
    Architecture: A hybrid model; an FA-based SVM model of machine learning is used.
    Pros:
      1: It deals with a massive amount of data.
      2: The detailed information about each product can be viewed when registering with username and
      3: It can be used as a powerful tool for forecasting online-purchasing behavior, in terms of prediction accuracy, robustness and time saving.
    Cons:
      1: It needs too much attention to work with a large amount of data.
      2: SVM has its own weaknesses, such as parameter sensitivity.
    No. of features: The proposed system predicts the user by gaining information about features such as clickstream measures, purchasing behaviors, customer heterogeneity and shopping cart usability.
    Dataset: The data set is provided by an anonymous online furniture store, with 3,006,524 visits and 1,267,757 purchases within the sampling period.
    Results: The main focus is the conversion result, i.e., how many users convert; the conversion rate was about 42.17%.

4.1 Suggested Approach

The importance of our project, named "Predict Visitor Purchases with a Classification Model in BQML for an E-commerce Store", will be best perceived by the end user once the whole project is implemented in real life and can be understood in practice. After reading multiple research papers by different experts who did their best to predict user behavior, we have reached a point where we can offer our own idea to make prediction easier and more reliable for any e-commerce store, and here, it is elaborated in
After surveying the needs of our environment, we have decided to propose an idea for a prediction model. We have studied many research papers, articles and opinions of experts who have contributed to this field. The related work is commendable, and the experience gained from it has encouraged us to make a further effort to predict end-user behavior in an efficient way. In previous work, prediction models have been implemented in various countries around the world, including China, America, Sweden, Austria, Indonesia, Malaysia, Greece and Portugal.
In China, multiple modern techniques have been used. Researchers worked on an
architecture based on binary classification (Naïve Bayes) and obtained quite favorable
results. They also applied collaborative filtering, in which a numerical rating value is
used to produce a final Yes/No result. Collaborative filtering describes a situation in
which people collaborate with one another: the system observes their behavior and the
actions they perform, and records their reactions to the documents they often use. It
relies mostly on gathering ratings from a massive amount of data generated by a large
number of users. Because we are dealing with a massive amount of data, it is hard to find
a user's favorite product in it, and users often get confused while choosing a product.
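
As a rough illustration of the rating-based Yes/No idea described above, the following is a
minimal sketch of user-based collaborative filtering. It is not the method of the cited
work: the cosine similarity measure, the 3.5 threshold, the function names, and the toy
ratings are all assumptions made for this example.

```python
import math

def cosine(a, b):
    """Cosine similarity computed over the items both users have rated."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[i] * b[i] for i in shared)
    norm_a = math.sqrt(sum(a[i] ** 2 for i in shared))
    norm_b = math.sqrt(sum(b[i] ** 2 for i in shared))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def predict_rating(ratings, user, item):
    """Similarity-weighted average of other users' ratings for the item."""
    num = den = 0.0
    for other, their in ratings.items():
        if other == user or item not in their:
            continue
        sim = cosine(ratings[user], their)
        num += sim * their[item]
        den += sim
    # None means nobody comparable has rated the item yet.
    return num / den if den else None

def will_like(ratings, user, item, threshold=3.5):
    """Turn the numerical rating estimate into a Yes/No answer (threshold is assumed)."""
    score = predict_rating(ratings, user, item)
    return score is not None and score >= threshold

# Hypothetical ratings on a 1-5 scale.
ratings = {
    "ali": {"lamp": 5, "desk": 4},
    "sara": {"lamp": 5, "desk": 4, "chair": 5},
    "omar": {"lamp": 1, "desk": 5, "chair": 2},
}
```

Here `will_like(ratings, "ali", "chair")` weights Sara's rating more heavily than Omar's
because her past ratings match Ali's more closely.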
In Malaysia, older techniques such as statistics and probability are still used: sample
questionnaires are prepared and distributed among people. Each questionnaire contains
basic questions in forms such as true/false, right/wrong, and agree/disagree. Once
collected, the data is not in a well-ordered form. Most people do not fill in the
questionnaire carefully; they complete it only as a formality. As a result, the outcome is
not accurate, it is regarded as a mere formality, and no one trusts this type of data or
its results.
In Sweden, experts proposed a model for predicting a customer's future visits using
modern technology such as machine learning, artificial intelligence, and binary
classification models, through which retailers learn about the customer's preferences,
likes and dislikes, buying behavior, etc. They used a data set provided by an existing
e-commerce company, which gave them access to a large amount of data. From it they
extracted information about users as well as features such as the customer's product
views, cart adds, cart removes, cart table clicks, payment page views, number of clicks,
preferences, etc. After extraction, the data was in clean form with the unwanted data
discarded. Applying the model to it produced predictions from which they can judge
whether or not the user will buy or revisit the site in future.
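
The pipeline described above (extract clickstream features, then apply a binary
classifier) can be sketched as follows. This is only an illustration under stated
assumptions: the event names, the toy sessions, and the hand-written logistic regression
are hypothetical and are not taken from the cited study or from our BQML model.

```python
import math
from collections import Counter

# Hypothetical event names; the source lists these kinds of features
# but does not give an actual schema.
FEATURES = ["product_view", "cart_add", "cart_remove", "payment_page_view"]

def session_features(events):
    """Count how often each tracked event type occurs in one session."""
    counts = Counter(events)
    return [float(counts[name]) for name in FEATURES]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(weights, bias, x):
    """Estimated probability that the session ends in a purchase."""
    return sigmoid(sum(w * v for w, v in zip(weights, x)) + bias)

def train_logistic(rows, labels, lr=0.1, epochs=1000):
    """Fit weights and bias with plain stochastic gradient descent."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(rows, labels):
            err = predict(weights, bias, x) - y
            weights = [w - lr * err * v for w, v in zip(weights, x)]
            bias -= lr * err
    return weights, bias

# Toy sessions: (events, 1 if the visitor purchased else 0).
sessions = [
    (["product_view", "product_view", "cart_add", "payment_page_view"], 1),
    (["product_view", "cart_add", "payment_page_view"], 1),
    (["product_view"], 0),
    (["product_view", "cart_add", "cart_remove"], 0),
    (["product_view", "product_view"], 0),
    (["product_view", "cart_add", "cart_add", "payment_page_view"], 1),
]
rows = [session_features(events) for events, _ in sessions]
labels = [label for _, label in sessions]
weights, bias = train_logistic(rows, labels)
```

A new session is scored with `predict(weights, bias, session_features(events))`, and a
cut-off such as 0.5 turns the probability into a buy / no-buy decision.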
The importance of technology cannot be neglected in any field, and researchers have
contributed substantial work in their respective areas. Prediction models have already
been proposed using modern technology such as machine learning and artificial
intelligence. Liu Weixiao proposed one such model, named “Hybrid Intelligent Prediction
Algorithms”, which combines the Artificial Neural Network (ANN) and the Discrete Grey
Prediction Model (DGM). Artificial intelligence now plays a major role in many fields. In
the medical field, Zhou Yunhui worked on data mining of breast cancer information and
introduced a treatment approach based on a Bayesian Network. In the field of electricity,
Xu Jun proposed a method for electricity prediction using time series algorithms,
multivariate linear regression, and grey prediction. He also proposed an electricity
forecasting method intended to serve short-, medium-, and long-term purposes. Peng
Yuqing used data mining technology to carry out regression analysis, cluster analysis,
principal component analysis, and association rule mining on a massive amount of data,
from which a considerable amount of information can be extracted that may help the
teaching faculty in almost every institution manage their work properly.

Other research approaches prediction in different ways and reports results in the range
of 60-75%, mostly on data sets provided by companies. According to this research, if
retailers or market owners want to attract users' attention online, they need a proper
plan and must implement it in the right direction. For example, all the factors of
customer satisfaction should be addressed: when a user visits the e-commerce store and
searches for a product, then on a later visit his preferred choice should automatically
appear at the top at first glance, and this could be done by a prediction model.
1. Our suggested approach (write a brief description about approach)
4.2 Workflow of the system

 How you chose your core setting

 What we need to know about the setting
 Calculations, technique, procedure and equipment
 Limitations, assumptions, and range of validity. 

4.3 Algorithms/Architecture

 Description of any novel procedures which you have proposed and implemented
 Incorporation of existing systems (if any), their description and settings

This chapter should explain what you did and how you did it. It must be clearly written so it would be easy for another
researcher to duplicate the experiment if they wished to.

5.1 System Design

Design is one of the most important aspects of any product; it attracts users by giving
the product a stylish and innovative look. It is the process of developing and collecting
the descriptions, plans, images, and procedures that allow us to create what we have in
mind. It can be defined as the visual look or shape given to an object so that people are
persuaded to buy or use the product at least once. The following characteristics make
design important:
o Performance
o Robustness
o Interactivity
o Flexibility
o Security
o Re-usability and Portability

5.1.1 Performance
Performance is the ability of a product judged on its end result, and performance testing
is the process used to check its functionality, efficiency, running capability, stability,
and anything else needed to establish the credibility of the software or application. The
main goal is to check whether the application performs well and runs error free under
suitable conditions. Performance testing provides information about the application and
makes clear what is needed to fulfill the user's requirements and to make the application
perform well. A good developer always tests an application before handing it over to the
end user, because the user demands the performance he was promised. Software whose
performance has not been tested is likely to be considered dead software.
In our project, we put particular effort into building a system that performs well. This
was one of the basic reasons for proposing the prediction model: technology is growing
rapidly, and changes must be adopted quickly. Many prediction systems have been proposed
before, but those we reviewed fall short in performance or accuracy. We proposed this
system while keeping all performance-related issues in mind.
5.1.2 Robustness
The term robust is often used to mean strength, whereas robustness [ R.P] is a system's
ability to cope with abnormal situations, user misbehavior, or errors in the system.
Robustness testing verifies whether a system shows any unacceptable error; fuzz testing is
one technique applied for this purpose. Robustness testing is thus a way to check the
quality of software: it determines whether the software performs well under different
conditions and ensures that the system fulfills the purpose for which it was developed.
It is applied under different situations for assurance, and in other words we can say
that it is the combination of all the reasonable loads applied to the system to check its
performance as well as its validity.
The robustness of our system will be measured using different robustness testing
techniques. We plan to test our software, as well as our hardware system, by giving it
“valid and invalid inputs” to identify how well it works under severe conditions. For the
software, we will test it with items not present in the dataset, unknown entries, or
items that relate to two classes. This will test the robustness of our system.
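
The “valid and invalid inputs” test described above could be sketched roughly as below.
This is a minimal illustration, assuming a hypothetical session record with three count
fields; the field names and the `validate_session` helper are not part of our
implementation.

```python
# Hypothetical fields expected in one visitor-session record.
KNOWN_FIELDS = {"product_views", "cart_adds", "cart_removes"}

def validate_session(record):
    """Return (ok, reason) instead of raising, so bad input cannot crash the system."""
    if not isinstance(record, dict):
        return False, "record is not a mapping"
    names = {str(key) for key in record}
    unknown = names - KNOWN_FIELDS
    if unknown:
        return False, "unknown field: " + sorted(unknown)[0]
    missing = KNOWN_FIELDS - names
    if missing:
        return False, "missing field: " + sorted(missing)[0]
    for name in sorted(KNOWN_FIELDS):
        value = record[name]
        # bool is a subclass of int in Python, so it is rejected explicitly.
        if isinstance(value, bool) or not isinstance(value, int):
            return False, name + " is not an integer"
        if value < 0:
            return False, name + " is negative"
    return True, "ok"

# One valid record and several invalid ones, as a robustness test would supply.
valid = {"product_views": 3, "cart_adds": 1, "cart_removes": 0}
invalid_cases = [
    None,
    {"product_views": 3, "cart_adds": 1},
    {"product_views": 3, "cart_adds": -1, "cart_removes": 0},
    {"product_views": "3", "cart_adds": 1, "cart_removes": 0},
    {"product_views": 3, "cart_adds": 1, "cart_removes": 0, "coupon": 1},
]
results = [validate_session(case) for case in invalid_cases]
```

Every invalid case is rejected with a reason rather than an exception, which is the
behavior a robustness test is meant to confirm.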

[ R.P link] P. Netinant, “Design Reusability and Adaptability for Concurrent Software,” AASRI
Procedia, vol. 5, pp. 133–139, 2019, doi: 10.1016/j.aasri.2013.10.069.

5.1.3 Interactivity
The interactivity of a system describes how users interact with the user interface. In
information technology, interactivity is one of the developer's main concerns. The ease
of user interaction depends on three components:
o Efficiency
o Effectiveness
o Satisfaction

If an interface has these three qualities, it provides a good user interface. The first
is efficiency, which means that the interface does what is assigned to it easily,
naturally, and quickly. The second is effectiveness, which means that the interface
performs the right task and performs it correctly. The third is satisfaction, which means
that the user feels good after using the interface and does not get tired while using it.
The user's interaction with the interface must satisfy these three conditions.
Our system will also have a user interface, because it will have a web and mobile
application used by the authorities. Designing the user interface is critical, because a
carefully developed interface enhances the user experience. Our applications are quite
simple, and the interface will be easy for personnel to understand and use. Not much
background knowledge or expertise will be required to use our software interface.

5.1.4 Flexibility
Flexibility can be defined as the ability of software to accommodate sudden changes in
requirements made by the user during development, which makes the system more valuable,
and to adapt when external changes occur. In this modern era, user requirements and
development environments range from simple to complicated, and situations gradually shift
toward more complex forms. During the development phase, the user may request changes to
requirements concerning functions, the interface, etc. To handle these amendments, we
need a development method that can adjust to such changes in content and requirements. A
sudden change in the project may make a specific method inappropriate, and flexibility is
what handles this.

In our proposed system, we consider the factor of flexibility beforehand. We are
developing the project in a controlled environment, but our proposed system has the
potential to be deployed to a bigger environment. The area and the classification types
in our project are currently limited, but the project is flexible in that it can be
changed according to user requirements.

5.1.5 Security
Security is the protection of the system and all the data and information within it,
while maintaining the functionality intended by the owner. In other words, it protects
data against disclosure to parties other than the intended person, ensures that data or
information is accessible only to those who are authorized, and prevents a third party
from stealing information through any channel. It allows a person to determine that the
information provided is correct and may involve confirming a person's identity. Security
checks are applied at certain points to keep a system confidential. An online system has
two kinds of security: one concerns data and transaction security, while the other
concerns authenticity [ R.P_Article 4]. Security includes checking whether there is any
unauthorized means of access or any way of losing information. It also includes finding
all the holes and weaknesses of the system that can affect security. One of the main
purposes of security is to identify vulnerabilities and then repair them if any security
breach is present in the system.
In our project, we consider security the deciding factor for adoption in real-world
applications, because the more secure the system is, the more attractive it will be to
real-world users. To have a secure system, we chose to perform our processing in the
cloud rather than entirely locally, because it is well known that security breaches
usually exploit users rather than computers. That is why we decided to use cloud
computing to store our data and perform processing. Only authorized users will be able to
access the dataset and other stored information, and cloud computing has become very
secure over the years.

5.1.6 Re-usability and Portability

Reusability is one of the fastest ways to reach a goal in software development. It lets
us create applications quickly, at low cost, and with high quality by building on
pre-existing designs or building blocks with little effort. The reason is that reusable
[R.P1] components are used many times, so they are more likely to have been exercised and
fixed on the basis of field experience, which leads to a high level of quality. It may be
very difficult for developers to achieve this level of quality in a single attempt. Reuse
therefore helps developers achieve both a high-quality system and fast development. It is
faster to assemble applications from existing designs than to design them from scratch,
and because reusable designs are thoroughly tested, there is much less chance of errors,
instability, and misunderstanding in the resulting product. Another major advantage of
reusability is that developers can learn from previous designs: these designs should
employ the best design techniques, and modifying them is a good way to learn, gain more
experience with the techniques, and achieve good results.

[ ] P. Netinant, “Design Reusability and Adaptability for Concurrent Software,” AASRI
Procedia, vol. 5, pp. 133–139, 2019, doi: 10.1016/j.aasri.2013.10.069.
In our project, the whole paper and model can be reused in future for a different kind of
project. In particular, our prediction technique and all the extracted information and
features can be reused, and anyone can draw on them while completing his own project.
Portability is the quality of a computer program or application that allows it to be moved
easily from one system to another. A system is called portable if it can be moved from one
system to another and still runs without generating any error. Transferring an application
from one device to another and making it work smoothly there falls under portability, for
example moving an application from Windows 2000 to Windows 10. Our project is portable as
well, since software component based models are portable, in contrast to hardware components.

System Implementation
The implementation section describes how the different components in the project have been implemented. It should also
consist of:
 Development tools and environment used
 Implementation of different modules (including detail steps about how they were Developed)
 Sample codes (including standards and conventions)
 Difficulties faced and how they were addressed.

Assumptions/Constraints (Optional)

Online shopping vs offline comparison:
