Professional Documents
Culture Documents
An Efficient User Centric Clustering Approach For Product Recommendation Based On Majority Voting: A Case Study On Wine Data Set
An Efficient User Centric Clustering Approach For Product Recommendation Based On Majority Voting: A Case Study On Wine Data Set
An Efficient User Centric Clustering Approach For Product Recommendation Based On Majority Voting: A Case Study On Wine Data Set
liking of each customer is known to all other customers. information about the situation of training data samples in
Sometimes there is a need to select the objects based on the attribute space. This new value takes into accounts the
preferences of values of attributes. One way to select best value of stability and robustness of any train samples
products from the set of products is by using the linear regarding with its neighbors. KNN with applied weights
weighted function of preferences and attribute values. In employs validity as the multiplication factor yields to more
this method, the linear function computes the score for robust classification rather than simple KNN method,
each product Using scores products are ranked in the order efficiently.
that makes each for selecting top score products. Present The Wine dataset is the result of a chemical analysis of
web applications are able to process multi parametric wines grown in the same region in Italy that derived from
ranked queries only on small datasets. Many databases do three different cultivars. The analysis determined the
not support efficient evaluation of customer preference quantities of 13 attributes found in each of the three types
queries. Here for response and throughput is retrieval first of wines. This data set has been in use with many others
and then scoring function is evaluated for each tuple. for comparing various classifiers. In the context of
Finally, a few products are ranked and displayed. The main classification, this is a well-posed problem with well-
disadvantage of the traditional database queries is that behaved class structures.
these query evaluation techniques need retrieval and Data mining techniques to classify the quality of wines
ordering of the entire data preference-based queries can be using a larger physicochemical data set were used in more
augmented with features for scalability purpose. recent works. Cortez and his colleagues [12] built models
Special Indexing techniques are needed for efficient using support vector machine, multiple regression, and
processing weighted preference based query execution. neural networks. A dataset with a large number of records
These indexing techniques are useful for obtaining linear is considered (vinho Verde samples from the Minho region
function preference based optimized query results. Linear of Portugal). A computational procedure was developed
function optimization queries are called preference that performs simultaneous variable and model selection.
selection queries because such queries retrieve tuples Support vector machine achieved desirable results,
maximizing a linear function defined over the attribute “outperforming the multiple regression and neural network
values of the tuples in a relation. Linear preference methods”. This model is vital in supporting the oenologist
function also needs to modify for obtaining better query wine tasting evaluations and to improve wine production.
results The results of this research are relevant to the wine science
domain, helping in the understanding of physicochemical
characterization and the things that affect the final quality.
2. Related work In[7] authors enforced unsupervised neural network (NN)
based on Adaptive Resonance Theory (ART1) as an
A good number of research papers have been published on alternative to statistical classifier so as to discriminate
wine quality that is mostly based on the empirical studies among the 178 samples of wine possessing 13 numbers of
in the wine industry. Most of the research was endorsed to attributes. The dimensionality of the feature variables was
assess wine quality using physicochemical data is based on reduced to 5 by principal component analysis (PCA).Out
small sample sizes. In [10] pattern recognition approaches of 13, the first 2 numbers of principal components
that include clustering, principle component analysis, captured over 55.4 % of the variance of the wine dataset.
nearest neighbors, etc. were applied to classify wines from Nonhierarchical K-means clustering algorithm was used to
Galicia (northwestern Spain) among several different choose the classes available among the samples of wine.
brands. The dataset used consists of 42 white wines. Appalasamy et al [2] applied two classification algorithms,
Principle component analysis (PCA) for wine classification decision tree and Naive Bayes and compared results with
according to the geographical region was reported in [13]. the recent work results.
The authors used the data set that contains 33 Greek wines In [3] authors proposed a brand new data science area
with physicochemical variables. The details of a 2-stage named Wine informatics. In order to automatically retrieve
classification done (principle component and clustering) wines' flavors and characteristics from reviews, which are
from 24 industrial fermentations of a particular type of stored in the human language format, authors proposed a
wine were given in [1. This study tried to detect novel "Computational Wine Wheel" to extract keywords.
undesirable fermentation behavior. Two completely different public-available datasets are
In [6] authors proposed an improved KNN method with produced based on the new technique in their paper. The
weights, which considerably improves the performance of hierarchical clustering algorithm is applied to the primary
KNN method. The authors employed a kind of dataset and got purposeful clustering results. Association
preprocessing on train data. They introduced a new value rules mining is performed on the second dataset to predict
named Validity to train samples which cause to more whether or not a wine is scored higher than 90 points or
IJCSNS International Journal of Computer Science and Network Security, VOL.17 No.10, October 2017 105
not supported on the wine savory reviews. Fivefold cross- automated recommender software systems are developing
validation experiments were executed based on different continuously by many companies for voting based
parameters and results with a range of 73% to 82% selection of products. A customer would be interested in
accuracy were generated. This new domain will bring huge purchasing products that are similar to the products that
benefits to fields as diverse as computer science, statistics, he/she liked earlier. In [15] authors proposed an algorithm
business, and agriculture. that combines user-based approach, item-based and
In [5] authors proposed a data analysis approach to classify Bhattacharyya approach. The main advantage of this
wine into different quality categories. A data set of white hybrid approach is its capability to find more reliable items
wines of 4898 records was used in the analysis. As the data for recommendation. Collaborative filtering technology
set was imbalanced with about 93% of the observations are works by creating a database of preferences for products
from one category with respect to the occurrence of events by their customers. Collaborative technology is becoming
in it, Synthetic Minority Over-Sampling Technique popular in the latest research areas such as E-business,
(SMOTE) was applied to oversample the minority class. A Banking, Space, and Share Market and so on. In [11]
balanced data was considered to model a classifier that authors proposed an improved collaborative filtering
categorizes a wine into three categories. These categories algorithm that combines k-means algorithm with CHARM
include high quality, normal quality, and poor quality. algorithm. This hybrid approach improved the prediction
Three classification techniques used in this work include quality of recommendation system. In [9] authors tried to
decision tree, adaptive boosting and random forest. Among improve the learning speed by splitting the cluster tree into
the techniques, random forest produced to produce the sub-clusters and by using exploration and exploitation
desired results with the minimal error. Based a Wine phases and aggregates as well. From user-centric sensor
dataset of 4898 instances the authors attempt to build data, Friendbook discovers lifestyles of users and measures
models that classify different wines into quality categories. the similarity of lifestyles between users. It recommends
With this model, the test data is tested. The quality friends to users if their lifestyles have high similarity. In
variable is assessed by many factors The authors [16] authors model the daily life of users as life documents.
concluded that the analysis would give a clearer idea to Using these documents lifestyles are extracted by using the
winemakers as to which variables influence the quality the Latent Dirichlet Allocation algorithm.
most and what steps could be an attempt to attain more
desirable outcomes. 2.1 The evolution summary:
In [8] authors used Analytical Hierarchy process (AHP)
classification algorithm. This algorithm provides the way The evolution in the field of wine quality research is
to recommend wine on the basis of the components of the threefold.
wine. Wine selection on the basis of its attributes is a i) With respect to the dataset
different approach. The Machine Learning Techniques Most of the research about Wine has been carrying
used here helped in finding the component accuracy of out based on two popular data sets named Wine
wine attributes. The Analytical Hierarchy Process (AHP) is and Wine quality which are publicly available on UCI
used for arranging and examining complicated problems machine learning platform. Two datasets are available
by mathematical calculations. AHP defines multiple of which one dataset is in red wine and have 1599
attribute issue to advise a particular commodity to an completely different varieties and therefore the alternative
individual. The process of AHP is mainly used to calculate is on white wine and has 4898 varieties. All wines are
weights. The inputs for AHP are relative preferences and produced in a particular area of Portugal. Data are
attributes. The authors have taken red wine dataset and collected on 12 completely different properties of the
weights were allotted to them based on AHP. The obtained wines one amongst that is Quality, supported on sensory
results after analysis of the data were used for knowledge, and therefore the rest are on chemical
recommending a wine to individuals. properties of the wines together with density, acidity,
Users need reasonably good recommendations in finding alcohol content etc. All chemical properties of wines are
products according to their likes and dislikes. In [14] continuous variables. Quality is an ordinal variable with
authors compare data-centric evaluation with user-centric the possible ranking from 1 (worst) to 10 (best). Every
evaluation and obtain remarkable results in favor of user- form of wine is tasted by three independent tasters and
centric approach. In [4] authors used a new clustering therefore the final rank assigned is that the median rank
algorithm based on the mutual vote, which adjusts itself given by the tasters. Knowledge regarding reviews given
automatically to the given dataset, needs minimum by users is additionally out there.
overhead in terms of parameters, and also able to detect ii) With respect to the objectives
clusters with different densities in the same dataset. The common objective of most of the researchers is to
Currently, many Voting/Rating machine searing based assess/predict wine quality based on the wine
106 IJCSNS International Journal of Computer Science and Network Security, VOL.17 No.10, October 2017
calculations, to cluster the products. 4. Compute reverse top-k queries for all the products
c) Nearest neighbor similarity search using Jaccard obtained in the step-3.
coefficient and modified Jaccard coefficient.
IJCSNS International Journal of Computer Science and Network Security, VOL.17 No.10, October 2017 107
Figure 2: Number of clusters for 100 and 200 reviews with varying k From Figure 5 it is observed that with the increase in a
value number of reviews the proportional increase is observed in
the number of clusters. Finer clustering needs larger
From Figure 2 it was observed that the top k query value reviews.
influencing the number of clusters well for large values of
k and as the number of reviews increased the number of
clusters is finer.
Figure 3: Cluster density for 100 and 200 reviews with varying k value From Figure 6 it is observed that with the increase in a
number of reviews the proportional decrease is observed in
From Figure 3 it was observed that the top k query value the density of clusters. Sparser clusters are resulted by
influencing the cluster density well, as the number of larger reviews.
reviews increased.
Figure 4: Execution time for 100 and 200 reviews with varying k value From Figure 7 it was observed that with the increase in a
number of reviews the proportional increase was observed
From Figure 4 it was observed that with the increase in top in the execution time.
k query value the execution time increased a little and falls
back for larger k values. More reviews need more
execution time
110 IJCSNS International Journal of Computer Science and Network Security, VOL.17 No.10, October 2017
From Figure 9 it is observed that with the increase in In this paper, a user-centric similarity framework is
threshold value the oscillatory nature is observed in the introduced in which the similarity of products is assessed
density of clusters. by user preferences. A popular dataset named "Red wine
quality "is considered in this work to assess the quality of
Wine by grouping the individual products into clusters and
then grade the groups based on preferences. The user-
centric approach provided quite different and interesting
results than the conventional approaches have, that do not
consider the preferences that the customers have expressed.
It is observed that the query types introduced in this work
need higher execution times as the number of users and the
preferences increased. This may lead to a scalability
Figure10: Threshold Vs Execution time problem of the proposed framework. This can be
smoothened by introducing R-tree like data structures for
From Figure 10 it is observed that with the increase in the searching and indexing purpose that can optimize the
threshold value the proportional increase is observed in the execution time of the proposed framework. The purpose of
execution time. developing this kind of a system is to support and advise
From the sets of cases executed it is found that the more wine users for better selection and winemakers for
the number of customer reviews, the clearer the providing a better quality.
information from the clusters about the customers’ taste.
As the number of reviews increased, the number of clusters
References
also increased and the density of individual clusters came
[1] Alejandra Urtubia, J Ricardo Perez-Correa, Alvaro Soto,
down. For more reviews to be incorporated in computing and Philippe Pszcz´olkowski. “Using data mining
weighs, the execution time increased accordingly. techniques to predict industrial wine problem
The kinds of influence discussed above are useful for fermentations”. Food Control, 18(12):1512–1517, 2007.
market analysis, and it is directly correlated with the [2] Appalasamy P, A Mustapha, N. D. Rizal, F Johari, and A. F.
number of customers that value a particular product. Mansor. “Classification-based data mining approach for
Although the techniques of reverse top-k queries are good quality control in wine production”. Journal of Applied
Sciences, 12(6):598–601, 2012.
IJCSNS International Journal of Computer Science and Network Security, VOL.17 No.10, October 2017 111
[3] Bernard Chen, Christopher Rhodes and Aaron Crawford, Y.Subba Reddy received M.Sc
“Wineinformatics: Applying Data Mining on Wine Sensory (Computer Science) degree from
Reviews Processed by the Computational Wine Wheel”, Bharathidasan University, Tiruchirapalli,
2014 IEEE International Conference on Data Mining TN and M.E degree in Computer Science
Workshop. & Engineering from Sathyabama
[4] Charif Haydar, Anne Boyer “A New Statistical Density University, Chennai, TN. He is a research
Clustering Algorithm based on Mutual Vote and Subjective scholar in the Department of Computer
Logic Applied to Recommender Systems” UMAP’17, July Science, Sri Venkateswara University,
9-12, 2017, Bratislava, Slovakia. Tirupati, AP, India. His research focus is
[5] Gongzhu Hu, Tan Xi, Faraz Mohammed, “Classification of on Data Mining in Clustering and Similarity measures.
Wine Quality with Imbalanced Data”, 2016 IEEE.
[6] Hamid Parvin, Hoseinali Alizadeh, Behrouz Minati,” A
Modification on K-Nearest Neighbor Classifier” GJCST, P.Govindarajulu, Department of
Vol.10 Issue 14 (Ver.1.0) November 2010. Computer Science, Sri Venkateswara
[7] Kavuri N. C. and Madhusree Kundu. “ART1 Network: University, Tirupathi, AP, India. He
Application in Wine Classification”, International Journal of received his M. Tech., from IIT Madras
Chemical Engineering and Applications, Vol. 2, No. 3, June (Chennai), Ph. D from IIT Bombay
2011. (Mumbai), His area of research is
[8] Kunal Thakkar et al, (IJCSIT),”AHP and Machine Learning Databases, Data Mining, Image
Techniques for Wine Recommendation” International processing, Intelligent Systems and
Journal of Computer Science and Information Technologies, Software Engineering
Vol. 7 (5), 2016, 2349-2352.
[9] Linqi Song, Cem Tekin, Mihaela van der Schaar “Clustering
Based Online Learning in Recommender Systems: A Bandit
Approach”, SongTekin ICASSP2014.
[10] Maria J Latorre, Carmen Garcia-Jares, Bernard Medina, and
Carlos Herrero, "Pattern recognition analysis applied to
classification of wines from Galicia (northwestern Spain)
with the certified brand of origin", Journal of Agricultural
and Food Chemistry, 42(7):1451–1455, 1994.
[11] Paritosh Nagarnaik, Prof. A. Thomas, “Survey on
Recommendation System Methods”, IEEE SPONSORED
2ND INTERNATIONAL CONFERENCE ON
ELECTRONICS AND COMMUNICATION SYSTEM
(ICECS 2015).
[12] Paulo Cortez, Antonio Cerdeira, Fernando Almeida, Telmo
Matos, and Jos´e Reis. “Modeling wine preferences by data
mining from physicochemical properties”. Decision Support
Systems, 47(4):547–553, 2009.
[13] S Kallithraka, IS Arvanitoyannis, P Kefalas, An El-Zajouli,
E Soufleros, and E Psarra. “Instrumental and sensory
analysis of Greek wines; implementation of principal
component analysis (PCA) for classification according to
geographical origin". Food Chemistry, 73(4):501–514, 2001.
[14] Soude Fazeli, Hendrik Drachsler, Marlies Bitter-Rijpkema,
Francis Brouns, Wim van der Vegt, and Peter B. Sloep,
“User-centric Evaluation of Recommender Systems in
Social Learning Platforms: Accuracy is Just the Tip of the
Iceberg”, IEEE TRANSACTIONS ON LEARNING
TECHNOLOGIES.
[15] Umutoni Nadine, Huiying Cao, Jiangzhou Deng,
“Competitive Recommendation Algorithm for E-commerce”
2016, 12th International Conference on Natural
Computation, Fuzzy Systems and Knowledge Discovery
(ICNC-FSKD).
[16] Zhibo Wang, Jilong Liao, Qing Cao, Hairong Qi, Zhi Wang,
“Friendbook: A Semantic-based Friend Recommendation
System for Social Networks”, IEEE TRANSACTIONS ON
MOBILE COMPUTING 2015.