
Sustainable Operations and Computers 4 (2023) 53–61

Contents lists available at ScienceDirect

Sustainable Operations and Computers


journal homepage:
http://www.keaipublishing.com/en/journals/sustainable-operations-and-computers/

PAPQ: Predictive analytics of product quality in industry 4.0


Md. Anjar Ahsan a, Khaleel Ahmad a, Jameel Ahamed a, Mohd Omar a, Khairol Amali Bin Ahmad b,∗
a Department of CS&IT, Maulana Azad National Urdu University, Hyderabad, India
b Faculty of Engineering, National Defence University of Malaysia, Malaysia

Article info

Keywords: Industry 4.0; Product quality prediction; Naïve Bayes; SVM; k-NN; Random forest; Random tree

Abstract

In e-commerce, Industry 4.0 is all about combining analytics, artificial intelligence, and machine learning to simplify procedures and enable product quality review. In addition, the importance of anticipating client behavior in the context of e-commerce is growing as individuals migrate from visiting physical businesses to shopping online. By providing a more personalized purchasing experience, it can increase consumer satisfaction and sales, leading to improved conversion rates and competitive advantage. Using data from e-commerce platforms such as Flipkart and Amazon, it is possible to build models for forecasting customer behavior. This study examines machine learning techniques for product quality prediction and gives an insight into the performance differences of machine learning-based models by doing descriptive data analysis and training each model separately on three datasets viz Mobile, Health Equipments, and Book Datasets. Support Vector Machine, Naïve Bayes, k-Nearest Neighbors, Random Forest, and Random Tree were the machine learning methods utilized in this work. The results indicate that a Support Vector Machine Model provides the greatest fit for the prediction task, with the best performance, reasonable latency, comprehensibility, and resilience for the first two datasets, but Random Forest provides the highest performance for the Book dataset.

1. Introduction

In this fast-growing world, consumers put more and more emphasis on quality products. The online product selling system provides options for the customer to select the best product based on various customer reviews. Although product reviews have aided customers in resolving the merits and demerits of various products, ultimately assisting in the selection of the best product for an individual’s needs, the overwhelming and voluminous ratings provided by existing reviews create confusion and chaos when it comes to purchasing the product, and introduce a challenge for prospective customers to analyze this massive amount of data [1]. The amount of data in Product Reviews is rapidly increasing day by day. Some customers have begun to include pictures/images of the product in their evaluations to make them more appealing and user-friendly [2]. As a result, the product evaluation dataset may be considered a big data analytics challenge [3]. Hence, if we want the customer to be able to select the best product, then we should apply some techniques to help the customers [4]. Industry 4.0 is an integral element of this process, which is further reinforced by the rapid advancements and the way people conduct business through e-commerce. With everything having to do with the online world, Industry 4.0 is the ideal method to enhance present business processes in the actual world. Data collection, one of the key components of Industry 4.0, may enhance present e-commerce procedures and guarantee that all of the data collected will be used to make even more advancements in the future [5,6].

After China (at 575 million) and the United States (at 275 million), India now has the third-largest Internet population on the planet, with about 4.2 trillion dynamic web clients and 3.4 billion web-based life clients in 2018 (The Statistics Portal). In 2019 an estimated 1.98 billion people were expected to buy something on the internet [7]. This indicates that India is rapidly evolving, and people are becoming more accustomed to using the Internet as human civilization advances, communication formats improve, and digital convergence opens new marketing opportunities and challenges. As a result, the Internet has advanced to play a key role in the Customer Decision-Making Process. This work focuses on investigating the components of E-Marketing, in terms of product quality, that lead to Customer Behavior towards a particular product [8].

In this paper, we combine the seller rating, service rating, and product rating provided by different existing users in order to develop a machine-learning model that could assist customers in selecting a high-quality product without the need to manually examine vast quantities of data [9]. The unstructured rating data in its raw form was collected from the e-commerce websites and preprocessed using Python and other preprocessing techniques. The pre-processed data is then analyzed


∗ Corresponding author.
E-mail address: khairol@upnm.edu.my (K.A.B. Ahmad).

https://doi.org/10.1016/j.susoc.2023.02.001
Received 7 November 2022; Received in revised form 26 January 2023; Accepted 17 February 2023
Available online 21 February 2023
2666-4127/© 2023 The Author(s). Published by Elsevier B.V. on behalf of KeAi Communications Co., Ltd. This is an open access article under the CC BY-NC-ND
license (http://creativecommons.org/licenses/by-nc-nd/4.0/)
M. Ahsan, K. Ahmad, J. Ahamed et al. Sustainable Operations and Computers 4 (2023) 53–61

using machine learning algorithms for the development of models that provide quality products.

2. Related works

Several research papers were studied and analyzed during the literature survey pertaining to the various types of research on the quality of the product based on customer behavior and methods that have been employed over these years using classification, Regression, K-means, neural network, and clustering techniques [10]. Customers used the reviewers’ disclosing of identity-descriptive information to augment or replace product information when making purchase decisions and assessing the usefulness of online reviews, according to Lee and Shin [11]. They discovered that online community members’ ratings that include personal information are more comprehensive [12]. Hu et al. looked at the characteristics of reviews obtained from Amazon to see what makes a review useful for shoppers [13]. Three hypotheses were constructed and tested. They discovered that the influence of review extremity was enthusiastic on the product type after assessing the possibilities. Customers need to act directly while knowing the quality of the items; thus, they were divided into two categories: search product and expertise product [14]. Qazi et al. looked at the influence of many aspects of review texts, such as writing system faults, readability, and subjectiveness, on sales. Linguistic accuracy was discovered to be an important factor in driving sales [15]. In comparison to reviews that are either short or very long and include spelling problems, there is a notion that reviews of medium length with a high number of violent writing system flaws are more beneficial to ignorant buyers [16]. Krishnamoorthy investigated the relationship between online review usefulness, rating score, and, as a result, the review text’s qualitative characteristics as evaluated by readability tests. Conformity, comprehensibility, and quality were the three elements they endorsed in their theoretical paradigm [17]. They looked at the directional link between the qualitative features of the review text, review helpfulness, and, as a result, review helpfulness’ effect on the review score. It was discovered that their ability to view had a greater impact on their helpfulness-greatness relationship than its length [18]. Korfiatis et al. explored whether the Quality of reviews influences both their viewers’ and the e-commerce website’s assessments [19]. Shareholders were given questions such as (a) how frequently they use online shopping malls and (b) if they have utilized the target product before. They examined how reader acceptance is influenced by (i) the quality of online product evaluation, and (ii) how often such an event is to occur. Their research revealed that favorable high-quality evaluations raise shareholders’ desire to purchase the goods and low-quality reviews decrease it [20]. Wan (2015) examined a dataset of Amazon’s best-selling products and emphasized the Matthew effect (Merton, 1968) as well as the ratchet effect (Freixas et al., 1985) [21]. Reviews are without a doubt beneficial to those who want to learn more about a product. The usefulness of any review has become a major area of investigation, as seen by Amazon’s introduction of a fresh new procedure known as the “helpfulness vote”. Amazon scores each review based on its “helpfulness vote” and some of the best reviews are shown on the product page. The first useful review is also chosen by hand [22]. Various components of review analysis include sentiment analysis and helpfulness calculation. Many approaches are employed to do this, including classification techniques, the Naïve Bayes theorem, support vector machines, and the language communication process.

Further, there is conceptual debate over a variety of concerns about e-commerce, transportation, Industry 4.0, the Internet of Things, information and communication technology, as well as logistics and supply chain management. This study is crucial from a futuristic point of view because the future appears to be moving more towards automated systems and internet-based approaches, and mobility, transportation, and logistics are affected by Industry 4.0. As a result, the concept of a feasible integrated model for passenger metro trains and last-mile parcel flow of e-commerce deliveries has been offered. This research is centered on the urban area of Nagpur, and its primary objective is to devise a method that can be used to assess the impact of the concepts of Industry 4.0, the Internet of Things, information and communications technology and E-commerce on transportation and mobility. The concept of mobility has always been an essential and fundamental component of human civilization. Interaction on a social, cultural, and economic level is supported. Nevertheless, as a result of the COVID-19 pandemic, people’s mobility has been restricted owing to a fear of coronavirus, and as a consequence, home delivery services have begun, playing an essential role in contributing to economic operations. In addition, the idea behind Industry 4.0, the Internet of Things, and information and communications technology will most certainly have a part to play in electronic commerce and, consequently, in urban mobility [23].

However, the rise of e-commerce has contributed to the strengthening of the citizenship economy in the industrial era 4.0 for the benefit of Universitas Pendidikan Indonesia (UPI) students as discussed in [2]. This research takes a qualitative approach, and the authors made use of a literature review, interviews, observations, document reviews, and field notes as data-collecting methods. In an endeavor to bolster the economy of people, achieve wealth, and realize national goals, the researchers are expected to teach students how to make the most efficient use of e-commerce as a part of Industry 4.0. It is impossible to deny that the advancements in digital technology that happened throughout the industrial age of 4.0 brought about very substantial changes in the order in which humans live their lives. Because of the proliferation of different kinds of e-commerce, a significant shift has taken place in the way that buying and selling transactions are conducted. The advent of e-commerce has unquestionably opened up many lucrative opportunities for business people and entrepreneurs [2].

The authors in [3] reviewed recent developments in robotics and automation technologies, as they pertain to the implementation of Industry 4.0. There is a widespread consensus among businesses, research facilities, and educational institutions that robots and automation technology serve as the bedrock of industrial manufacturing and an essential component of Industry 4.0. At the end of this paper, it is explored whether or not the concept of “Industry 4.0” is truly disruptive or simply a normal, gradual development of industrial production processes [3]. The history of the concept of “Industry 4.0” as well as its progression was covered in [4]. The idea of Industry 4.0 is not limited just to direct manufacturing in the company; it also comprises the entire value chain from providers to customers, and all of the enterprise functions and services are also included. Even though the concept is very comprehensive and complex, three main points can be identified. Industry 4.0 is a subset of the Internet of Things that is applied to the manufacturing and industrial environment. The data generated from IoT devices can be used for preventive maintenance, and it can give manufacturers valuable information on the lifetime and reliability of their goods. It is predicated on the collection of data in real-time, which raises the issues of managing and analyzing massive amounts of data as well as ensuring data security. The fourth industrial revolution of the 21st century is referred to as Industry 4.0. It enables businesses to develop “smarter” products and services by cutting costs and boosting efficiency. The human element is essential to the application, and the work is based on previous research in the field [4].

The authors discussed Industry 4.0, including its objectives and applications, as well as the obstacles that stand in the way of its widespread adoption. The difficulties and impediments, as well as potential solutions, were discussed by the authors. The fourth industrial revolution, often known as “Industry 4.0,” introduces new technologies that make it easier to integrate and computerize manufacturing processes [24]. The goal of the paper [25] was to conduct a study of how the components that make up Industry 4.0 successfully affect the manufacturing sector. The paper also reviewed the additive manufacturing techniques that make Industry 4.0 possible in real-time. This paper also assisted in gaining a better understanding of Industry 4.0 and the components that make it up, such as the IoT, Cyber-Physical Systems (CPS), big data, and additive manufacturing [25]. The purpose of the paper [26] was to demonstrate that an understanding of Internet technology, as well as the web and mobile


platforms that are components of the electronic commerce infrastructure, is necessary for the successful operation of the idea of electronic commerce in an era in which daily changes occur on a global scale. The authors chose to investigate the Internet as a platform for moving trade to the digital network, on the one hand, and as a basis for the development of new trade models, better known in theory and practice as virtual trade, or e-commerce, on the other hand [26].

3. Proposed system

The primary objective of this research is to illustrate product quality based on customer behavior by assigning threshold values to online customer reviews. As lots of products are available on online portals/e-commerce websites, it is difficult for customers to decide on a good product based on its quality [27]. For this problem, e-commerce websites introduced a feature that lets customers provide ratings and feedback for the products they bought. Despite this feature, many potential customers are unable to select a good quality product easily due to the huge volume of rating data available on various online portals, the analysis and viewing of which is a challenge [28,29]. The real-time dataset in the form of customer reviews and feedback used in the work was extracted from Flipkart and Amazon using .NET and Model-View-Controller (MVC) technology. In this work, numerical ratings are provided on a scale of 1 to 5 to the collected dataset (customer reviews) of online available products, where 1 illustrates the worst quality of the product and 5 is the best. Then the dataset is divided into two categories as training and testing data. Further, this rating-based dataset is analyzed using machine learning algorithms viz Naïve Bayes, Support Vector Machine, k-Nearest Neighbours, Random Forest, and Random Tree for the development of prediction models. Hence, the prediction of the product’s quality is purely based on customer ratings and feedback data; the performance metrics for the results of the models are shown in Table 1 and the complete methodology for the development of models is shown in Fig. 1.

Evidently, customers always remain curious about the quality of the product, hence the proposed developed models will be very helpful to decide on the selection of available products on e-commerce portals. The proposed models were developed on the collected datasets of three categories viz Mobile dataset, Books dataset, and Health equipment dataset as shown in Fig. 2. Customer service is also an important aspect when deciding on a product.

Fig. 1. Block diagram for development of quality prediction model.

Table 1
Performance metrics.

Performance Metrics   Formula
Accuracy              $\frac{TP + TN}{\text{Total observations}}$
Recall                $\frac{TP}{TP + FN}$
Precision             $\frac{TP}{TP + FP}$
F-1 Score             $\frac{TP}{TP + \frac{1}{2}(FP + FN)}$

Fig. 2. Development of product quality models.

The complete description and stages for the development of the Quality prediction model are given below:


Table 2
Rating-based product encoding.

Before Encoding    After Encoding
Product Quality    Rating Range
Excellent          4 ≤ E ≤ 5
Good               3 ≤ G ≤ 3.9
Fair               2 ≤ F ≤ 2.9
Poor               0 ≤ P ≤ 1.9

Fig. 3. Count plot of product quality.

Fig. 4. Histogram of the mobile dataset.

(i). Data Collection

In the first stage, the data was extracted from Flipkart and Amazon through .NET using MVC technology. Thereafter, the dataset is divided into training and testing sets for the development of the product’s quality prediction model.

(ii). Selection and Description of Performance Metrics

The performance metrics on which the developed models were compared are listed in Table 1.

(iii). Product Reviews and Feedbacks Encoding

The quality of the product is analyzed from the extracted dataset and encoded into numerical values on a scale of 1 to 5, as shown in Table 2.
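
The encoding step in stage (iii) can be illustrated with a minimal Python sketch. It uses the rating ranges of Table 2; everything else is an assumption made for illustration: the function names, the treatment of the ranges as half-open intervals (so that a rating such as 3.95 still maps to "Good"), and the integer codes 0 to 3, for which the paper states only the value range and not the label-to-code mapping.

```python
from typing import List

# Lower bounds taken from Table 2, checked from best to worst.
# Assumption: ranges are treated as half-open intervals, e.g.
# 3.0 <= rating < 4.0 is "Good", because Table 2 lists one-decimal ranges only.
QUALITY_BANDS = [
    (4.0, "Excellent"),
    (3.0, "Good"),
    (2.0, "Fair"),
    (0.0, "Poor"),
]

# Hypothetical integer codes: the paper reports quality values from 0 to 3
# but does not say which label receives which code.
QUALITY_CODES = {"Poor": 0, "Fair": 1, "Good": 2, "Excellent": 3}


def rating_to_quality(rating: float) -> str:
    """Map an average rating in [0, 5] to a Table 2 quality label."""
    if not 0.0 <= rating <= 5.0:
        raise ValueError(f"rating out of range: {rating}")
    for lower_bound, label in QUALITY_BANDS:
        if rating >= lower_bound:
            return label
    # Not reachable for ratings in [0, 5]; kept so the function never returns None.
    raise AssertionError("no quality band matched")


def encode_ratings(ratings: List[float]) -> List[int]:
    """Encode a list of ratings as integer quality classes."""
    return [QUALITY_CODES[rating_to_quality(r)] for r in ratings]


if __name__ == "__main__":
    for r in (4.6, 3.9, 2.0, 1.2):
        print(r, rating_to_quality(r), encode_ratings([r])[0])
```

The encoded integer class would then serve as the target label for the classifiers, with the remaining attributes as features; that pairing is also an assumption, since the paper does not list the exact feature set.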
Fig. 5. Correlation among different attributes.
4. Results and discussion

The product quality prediction models were developed using machine learning techniques on the datasets extracted from Flipkart and Amazon. Three types of datasets were collected (Mobile Phones, Health Equipment, and Books) from the e-commerce portals. The results of the developed models using machine learning techniques on different datasets are given below.

4.1. Mobile dataset

The mobile dataset is the first category of collected data used for the development of Product Quality Prediction Models. The histogram of the product ratings and reviews in terms of the category instead of quantitative variables defines the count of product quality factors and is shown in Fig. 3.

Each attribute of the mobile dataset is represented by Histogram, as shown in Fig. 4. The average rating attribute ranges from 0 to 35, with the majority of average rating values falling between 0 and 50. The attribute value varies from 0 to 80 in the Mobile dataset. Similarly, the seller Name attribute displays data dispersion from 0 to 25 with a range of values from 0 to 150. Another attribute of a Mobile dataset is product quality which shows different quality values from 0 to 3. The seller’s rating attribute value ranges from 0 to 30 and the highest rating is 30.

The Scatter plot shows the correlation among different attributes of the Mobile dataset as shown in Fig. 5. It is depicted that the attribute seller rating is positively correlated to the overall review and is not related to other attributes of the dataset as seen in Fig. 5. This means that if the value of an attribute’s overall rating increases, then the value of the overall review attribute will also increase. Similarly, the attributes’ overall review is positively related to the overall rating and the rest of the attributes are not correlated to each other.

4.1a. Implementation of Naïve Bayes, SVM, k-NN, Random Forest, and Random Tree Algorithm on Mobile Dataset

The mobile dataset is analyzed using the said machine learning techniques and results are depicted in Table 3 on the given performance metrics. The accuracy of the Naïve Bayes algorithm, Support Vector Machine, k-Nearest Neighbours, Random Forest, and Random Tree are 39.86%, 78.37%, 65.54%, 76.35%, and 68.91% respectively and the F-1 score of Naïve Bayes algorithm, Support Vector Machine, k-Nearest Neighbours, Random Forest, and Random Tree are 0.39, 0.78, 0.65, 0.76, and 0.68 respectively.
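
The metrics reported for each model follow the definitions in Table 1. As a minimal sketch, they can be computed from the confusion counts of a single class treated as the positive class. The class name `Counts`, the helper `_safe_div`, and the example counts are hypothetical, and the paper does not state how per-class values were aggregated into the single figures it reports, so no aggregation is shown here.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """Confusion counts for one quality class treated as the positive class."""
    tp: int  # true positives
    fp: int  # false positives
    fn: int  # false negatives
    tn: int  # true negatives


def _safe_div(numerator: float, denominator: float) -> float:
    # Assumption: a metric with a zero denominator is reported as 0.0.
    return numerator / denominator if denominator else 0.0


def accuracy(c: Counts) -> float:
    """(TP + TN) / total observations."""
    return _safe_div(c.tp + c.tn, c.tp + c.fp + c.fn + c.tn)


def recall(c: Counts) -> float:
    """TP / (TP + FN)."""
    return _safe_div(c.tp, c.tp + c.fn)


def precision(c: Counts) -> float:
    """TP / (TP + FP)."""
    return _safe_div(c.tp, c.tp + c.fp)


def f1_score(c: Counts) -> float:
    """TP / (TP + (FP + FN) / 2), the form given in Table 1."""
    return _safe_div(c.tp, c.tp + 0.5 * (c.fp + c.fn))


if __name__ == "__main__":
    example = Counts(tp=30, fp=10, fn=20, tn=40)  # made-up counts
    print(f"accuracy={accuracy(example):.2f} recall={recall(example):.2f} "
          f"precision={precision(example):.2f} f1={f1_score(example):.2f}")
```

The F-1 form in Table 1 is algebraically the harmonic mean of precision and recall, which is why the two computations agree for the same counts.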


Table 3
Results of NB, SVM, k-NN, RF, RT on a mobile dataset.

Algorithms               Precision   Recall   F-1 Score   Accuracy (%)
Naïve Bayes              0.42        0.40     0.39        39.86
Support Vector Machine   0.80        0.78     0.78        78.37
k-Nearest Neighbours     0.66        0.66     0.65        65.54
Random Forest            0.77        0.76     0.76        76.35
Random Tree              0.70        0.69     0.68        68.91

Table 4
Results of NB, SVM, k-NN, RF, RT on Health Equipment Dataset.

Algorithms               Precision   Recall   F-1 Score   Accuracy (%)
Naïve Bayes              0.33        0.34     0.33        33.78
Support Vector Machine   0.98        0.97     0.97        97.29
k-Nearest Neighbours     0.37        0.41     0.34        40.54
Random Forest            0.95        0.95     0.95        94.59
Random Tree              0.96        0.95     0.95        95.27

Table 5
Results of NB, SVM, k-NN, RF, RT on Book Dataset.

Algorithms               Precision   Recall   F-1 Score   Accuracy (%)
Naïve Bayes              0.48        0.51     0.43        50.69
Support Vector Machine   0.81        0.79     0.79        79.26
k-Nearest Neighbours     0.42        0.53     0.47        53.45
Random Forest            0.87        0.85     0.85        84.79
Random Tree              0.85        0.82     0.82        82.48

Fig. 6. Histogram of product quality.

Fig. 7. Histogram of health equipment dataset.

4.2. Health equipment dataset

This section describes the health equipment dataset collected from online portals. The histogram of product review ratings is shown in Fig. 6, and the attribute relationships of the dataset, similar to the mobile dataset, are shown in Fig. 7.

The histogram representation of each attribute of the dataset correlating to different attributes is shown in Fig. 8. The average rating attribute range values are from 0 to 25; however, most of the average rating values fall between 0 and 60. Some attributes of Health Equipment range from 0 to 100. Similarly, the seller’s name attribute shows the dispersion of data from 0 to 20 with diverse values between 0 and 60. Another attribute of the health equipment dataset is product quality which shows different quality values from 0 to 3. Seller rating attribute value ranges from 0 to 17.5 and the highest rating is 17.5.

4.2a. Implementation of Naïve Bayes, SVM, k-NN, Random Forest, and Random Tree Algorithm on Health Equipment Dataset


This section describes the analysis of the Health equipment dataset using the said machine learning approaches. The accuracy of the Naïve Bayes algorithm, Support Vector Machine, k-Nearest Neighbours, Random Forest, and Random Tree are 33.78%, 97.29%, 40.54%, 94.59%, and 95.27% respectively and the F-1 score of Naïve Bayes algorithm, Support Vector Machine, k-Nearest Neighbours, Random Forest, and Random Tree are 0.33, 0.97, 0.34, 0.95, and 0.95 respectively.

Fig. 8. Health Equipment correlation among different attributes.

4.3. Book dataset

The Book Dataset is the third category of a collected dataset from the e-commerce portals. The histogram of the product quality in terms of product rating and reviews of the dataset is shown in Fig. 9 and the histogram for attributes of the book dataset is shown in Fig. 10.

Fig. 9. Histogram of product quality.

Each attribute of the Book dataset is represented by Histogram, as shown in Fig. 10. The average rating attribute ranges from 0 to 35, with the majority of average rating values falling between 0 and 200. The attribute value varies from 0 to 150 in the Book dataset. Similarly, the seller Name attribute displays data dispersion from 0 to 17.5 with diverse values from 0 to 500. Another attribute of the Book dataset is product quality which shows different quality values from 0 to 3. The seller’s rating attribute value ranges from 0 to 12 and the highest rating is 12.

The scatter plot shown in Fig. 11 is the correlation among different attributes of the books’ dataset. From the figure, we can derive that the attribute seller rating is positively correlated to the overall review and is not related to other attributes. This means that if the value of an attribute’s overall rating increases, then the value of the overall review attribute will be increased. Similarly, the attribute overall review is positively related to the overall rating and the rest of the attributes are not correlated to each other.

4.3a. Implementation of Naïve Bayes, SVM, k-NN, Random Forest, and Random Tree Algorithm on Book Dataset

The results of the analysis of the Book dataset using machine learning techniques are depicted in this section. The accuracy of the Naïve Bayes algorithm, Support Vector Machine, k-Nearest Neighbours, Random Forest, and Random Tree are 50.69%, 79.26%, 53.45%, 84.79%, and 82.48% respectively and the F-1 score of Naïve Bayes algorithm, Support Vector Machine, k-Nearest Neighbours, Random Forest, and Random Tree are 0.43, 0.79, 0.47, 0.85, and 0.82 respectively.
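
With accuracies now reported for all three datasets (Tables 3 to 5), the per-dataset winners and the cross-dataset averages can be checked with a short Python sketch. The accuracy values are copied from those tables; the use of an unweighted mean over the three datasets is an assumption, since the paper does not state how its averages were formed, and the function names are hypothetical.

```python
from statistics import mean
from typing import Dict, List

DATASETS = ["Mobile", "Health Equipment", "Book"]

# Accuracy (%) per model, copied from Tables 3-5 in the DATASETS order above.
ACCURACY: Dict[str, List[float]] = {
    "Naive Bayes": [39.86, 33.78, 50.69],
    "SVM": [78.37, 97.29, 79.26],
    "k-NN": [65.54, 40.54, 53.45],
    "Random Forest": [76.35, 94.59, 84.79],
    "Random Tree": [68.91, 95.27, 82.48],
}


def average_accuracy(table: Dict[str, List[float]]) -> Dict[str, float]:
    """Unweighted mean accuracy per model (assumed averaging scheme)."""
    return {model: round(mean(values), 2) for model, values in table.items()}


def best_model_per_dataset(table: Dict[str, List[float]]) -> Dict[str, str]:
    """Name of the most accurate model for each dataset."""
    return {
        dataset: max(table, key=lambda model: table[model][index])
        for index, dataset in enumerate(DATASETS)
    }


if __name__ == "__main__":
    for model, avg in sorted(average_accuracy(ACCURACY).items(),
                             key=lambda item: item[1], reverse=True):
        print(f"{model:<14} {avg:.2f}")
    print(best_model_per_dataset(ACCURACY))
```

Under this assumption the Random Forest average comes to 85.24 percent, matching the figure quoted in the conclusion, while SVM follows closely at 84.97 percent and remains the most accurate model on the Mobile and Health Equipment datasets.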


Fig. 10. Histogram of book dataset.

Fig. 11. Correlation among different attributes of the Books Dataset.


Table 6
Performance of the developed Models.

Dataset            Accuracy (%)
                   Naïve Bayes   SVM     k-NN    Random Forest   Random Tree
Mobile             39.86         78.37   65.54   76.35           68.91
Health Equipment   33.78         97.29   40.54   94.59           95.27
Book               50.69         79.26   53.45   84.79           82.48

Fig. 12. Comparative Result of NB, SVM, k-NN, RF, RT on Different Datasets.

4.4. Comparison of the entire developed model on three datasets

This section describes the performance comparison of the different developed models for the three collected online datasets. Table 6 depicts the average performance in terms of accuracy of the developed Naïve Bayes, SVM, k-NN, random forest, and random tree models on three datasets viz Mobile, Health Equipment, and Book.

On the Mobile Dataset, the Health Equipment Dataset, and the Books Dataset, we compared the performance results of the Naïve Bayes, SVM, k-NN, random forest, and random tree methods. As can be seen in Table 6 and Fig. 12, the random forest technique has a higher average accuracy than Naïve Bayes, Support Vector Machine, k-Nearest Neighbors, and Random Tree.

5. Conclusion, future work, and limitations

Everyone in today’s society is concerned about the quality of the products they intend to purchase online. Using three distinct datasets, we developed machine learning models to evaluate the quality of online items. In this paper, a data extractor developed in .NET is used to extract three datasets from Flipkart and Amazon: healthcare supplies, mobile phones, and books. After preprocessing, the datasets are analyzed with the Naïve Bayes, SVM, k-NN, Random Forest, and Random Tree algorithms. With an accuracy of 78.37 percent on the Mobile Dataset, the Support Vector Machine algorithm outperformed the Naïve Bayes algorithm, k-Nearest Neighbors, Random Forest, and Random Tree. Further, with an accuracy of 97.29 percent, the Support Vector Machine algorithm outperformed the Naïve Bayes algorithm, k-Nearest Neighbors, Random Forest, and Random Tree on the Health Equipment Dataset. Furthermore, with an accuracy of 84.79 percent, the Random Forest algorithm outperformed the Naïve Bayes algorithm, k-Nearest Neighbors, SVM, and Random Tree on the Book Dataset. Random Forest, with an average accuracy of 85.24 percent, outperforms the Naïve Bayes algorithm, k-Nearest Neighbors, SVM, and Random Tree. This research applies to a wider variety of available datasets from different e-commerce portals and the diverse datasets can be extracted from various e-commerce websites in order to assess the quality of the supplied products. However, the proposed system is limited to the datasets available in a particular language and needs to have precise inputs.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgement

This work is an outcome of a collaboration between Maulana Azad National Urdu University and the National Defence University of Malaysia which is funded through its internal grant UPNM/2023/SF/TK/1.

References

[1] A. Kumar, V. Landge, S. Jaiswal, E-commerce, industry 4.0, & transportation–identifying the potentiality & problems, in: Proceedings of the 1st Indian International Conference on Industrial Engineering and Operations Management, IEOM 2021, 2021, pp. 553–563.
[2] N. Agustina, D. Sundawa, Utilization of e-commerce in the industrial era 4.0 for UPI students in strengthening the economic civics, in: Proceedings of the Annual Civic Education Conference (ACEC 2021), 636, 2022, pp. 270–272, doi:10.2991/assehr.k.220108.049.
[3] C.V. Bidnur, A study on industry 4.0 concept, Int. J. Eng. Res. V9 (04) (2020) 613–618, doi:10.17577/ijertv9is040569.
[4] R. Murdiana, Business Ecosystem & Strategy, Int. J. Bus. Ecosyst. Strateg. 2 (1) (2020) 30–41.
[5] A.Q. Md, K. Jha, S. Haneef, A.K. Sivaraman, K.F. Tee, A review on data-driven quality prediction in the production process with machine learning for industry 4.0, Processes 10 (10) (2022) 1966.
[6] V. Sima, I.G. Gheorghe, J. Subić, D. Nancu, Influences of the industry 4.0 revolution on the human capital development and consumer behavior: a systematic review, Sustainability 12 (10) (2020) 4035.
[7] Q. Cao, W. Duan, Q. Gan, Exploring determinants of voting for the “helpfulness” of online user reviews: a text mining approach, Decis. Support Syst. 50 (2) (2011) 511–521.


[8] S. Bolter, Predicting product review helpfulness using machine learning and specialized classification models, Master of Science Thesis, San Jose State University, United States of America, 2013, doi:10.31979/etd.4wez-ndf6.
[9] C. Stoicescu, Big Data, the perfect instrument to study today’s consumer behavior, Database Syst. J. 6 (2016) 28–42.
[10] S. Maghilnan, M.R. Kumar, Sentiment analysis on speaker specific speech data, in: Proceedings of the International Conference on Intelligent Computing and Control (I2C2), IEEE, 2017, pp. 1–5.
[11] E.J. Lee, S.Y. Shin, When do consumers buy online product reviews? Effects of review quality, product type, and reviewer’s photo, Comput. Hum. Behav. 31 (2014) 356–366.
[12] C. Forman, A. Ghose, B. Wiesenfeld, Examining the relationship between reviews and sales: the role of reviewer identity disclosure in electronic markets, Inf. Syst. Res. 19 (3) (2008) 291–313.
[13] N. Hu, I. Bose, N.S. Koh, L. Liu, Manipulation of online reviews: an analysis of ratings, readability, and sentiments, Decis. Support Syst. 52 (3) (2012) 674–684.
[14] A. Singh, C.S. Tucker, A machine learning approach to product review disambiguation based on function, form and behavior classification, Decis. Support Syst. 97 (2017) 81–91.
[15] A. Qazi, K.B. Syed, R.G. Raj, E. Cambria, M. Tahir, D. Alghazzawi, A concept-level approach to the analysis of online review helpfulness, Comput. Hum. Behav. 58 (2016) 75–81.
[16] A. Ghose, P.G. Ipeirotis, Estimating the helpfulness and economic impact of product reviews: mining text and reviewer characteristics, IEEE Trans. Knowl. Data Eng. 23 (10) (2010) 1498–1512.
[17] S. Krishnamoorthy, Linguistic features for review helpfulness prediction, Expert Syst. Appl. 42 (7) (2015) 3751–3759.
[18] J. Xu, Y. Duan, Pricing and greenness investment for green products with government subsidies: when to apply blockchain technology? Electron. Commer. Res. Appl. 51 (2022) 101108.
[19] N. Korfiatis, E. García-Bariocanal, S. Sánchez-Alonso, Evaluating content quality and helpfulness of online product reviews: the interplay of review helpfulness vs. review content, Electron. Commer. Res. Appl. 11 (3) (2012) 205–217.
[20] S. Lee, J.Y. Choeh, Predicting the helpfulness of online reviews using multilayer perceptron neural networks, Expert Syst. Appl. 41 (6) (2014) 3041–3046.
[21] S. Paknejad, Sentiment Classification On Amazon reviews Using Machine Learning approaches, Degree Project in Computer Science, KTH Royal Institute of Technology, Sweden, June 2018.
[22] F. Avicenna, Online reviews: the effect of sources and framing of reviews on eWoM credibility, product attitude, and behavioral intention (Master’s thesis, University of Twente, 2016).
[23] A. Kumar, V. Landge, S. Jaiswal, E-commerce, Industry 4.0, & Transportation – Identifying the Potentiality & Problems, in 1st Indian International Conference on Industrial Engineering and Operations Management, IEOM (2021) 553–563.
[24] C. Zhang, Y. Chen, H. Chen, D. Chong, Industry 4.0 and its Implementation: a review, Inf. Syst. Front. (2021) 1–11.
[25] S. Kumar, P. Yadav, F. Siddiqui, Industry 4.0 in manufacuturing sector: a review, Int. J. Sci. Eng. Res. 10 (5) (2019) 194–199.
[26] S. Ćuzović, B. Labović, E-commerce in the light of the fourth industrial revolution, Нови Економист 13 (25) (2020) 30–36, doi:10.7251/noe1925030c.
[27] Y. Wan, The Matthew effect in social commerce, Electron. Mark. 25 (4) (2015) 313–324.
[28] H. Almagrabi, A. Malibari, J. McNaught, A survey of quality prediction of product reviews, Int. J. Adv. Comput. Sci. Appl. 6 (11) (2015) 49–58 Nov.
[29] D. Weathers, S.D. Swain, V. Grover, Can online product reviews be more helpful? Examining characteristics of information content by product type, Decis. Support Syst. 79 (2015) 12–23.
