
2016 Second International Conference on Web Research (ICWR)

A Novel Approach for Website Aesthetic Evaluation based on Convolutional Neural Networks

1Masoud Ganj Khani, 2Mohammad Reza Mazinani
1AmirKabir University of Technology, 2Malek-Ashtar University of Technology
Tehran, Iran
1ganjkhany@aut.ac.ir, 2mazinany@gmail.com

1Mohsen Fayyaz, 2Mojtaba Hoseini
1,2Malek-Ashtar University of Technology
Tehran, Iran
1mohsen.fayyaz89@gmail.com, 2mojtabahoseini@aut.ac.ir

Abstract—In this paper we propose a website aesthetic evaluation method. To achieve better performance, we apply convolutional neural networks, one of the main methods of the deep learning research area. Using deep learning and convolutional neural networks for feature representation is one of the main points that differentiates our work from previous ones. Our system takes a screenshot of the website as input and reports whether it is a good or a bad website with respect to the users' country. For the evaluation process, we represent the website screenshot using the MemNet convolutional neural network. We then reduce the dimension of the extracted features using the principal component analysis algorithm. Finally, we classify them using an SVM classifier trained on users' ratings. Furthermore, the aesthetic evaluation in this research is language independent: the website's language does not matter, and our method works for all languages.

Index Terms—Aesthetics, Website, Deep Learning, Convolutional Neural Network.

I. INTRODUCTION

Today, the majority of our needs are met through the internet. Shopping, reading news, and online gaming are only a few of the applications of websites. Websites are generally classified into groups such as news, scientific, and marketing, and each group contains many sites, with new ones added every day and even every hour. One crucial factor that attracts users to a website is aesthetics. Aesthetics is the first thing that unconsciously affects a user after visiting a website [1]. This effect drives the user's next visits toward the sites they find desirable. In other words, when a group of websites offers similar services, users tend to prefer the sites that are more visually attractive than the others [1]. That is why today's web designers spend a lot of time on the graphical elements and images used on a website and design them with great care.

However, what actually makes a site beautiful lies in the eyes of the audience. Attractive, high-quality images, graphic elements, and animations can make a website beautiful, but in practice this is not the whole story: we know of websites that take advantage of these features, yet users do not tend to visit them and they have little traffic [2], [3]. In other words, aesthetics is a qualitative rather than a quantitative parameter to measure. Another line of work examines the effect of complexity on websites; some researchers believe that simplicity creates beauty, but as noted above, it is not the whole explanation of beauty [4].

In this article, we seek to capture beauty in a general way. Using numerical quantities, such as extracting certain parameters of a site, captures only part of the picture. For example, quantitative parameters such as the number of graphical elements, density, and size can only provide a guess about the beauty of a website, a guess that may be wrong in many cases and may not lead to useful results. There is also the possibility of selecting the wrong parameters to represent the beauty of the site, and errors may occur while calculating those parameters. A general approach keeps us away from such mistakes and can perform better on website aesthetic evaluation. Since our approach is general, we do not use such numerical parameters.

Our approach is also language-independent and can be used for sites in any language, which allows the algorithm to be applied in different countries. One of the major challenges of traditional website aesthetic evaluation systems, such as [5], is deciding which features should be extracted from the website to help the system in the evaluation process. In other words, to evaluate the aesthetics of websites, some features are needed, and these features should be able to represent the websites in terms of their aesthetics. This is a classic problem in machine learning research, and a wide variety of algorithms, methods, and fields of research have been introduced to solve it. One of the most promising of these fields is deep learning, a new area of machine learning research that addresses the problem of choosing the best hand-crafted features by learning the best features for representing the raw data.

To alleviate the problem of choosing the best features, we use Convolutional Neural Networks (CNNs), which are among the best feature extraction methods in deep learning. Our system takes the website screenshot as input and represents it using a convolutional neural network.



Finally, we classify the represented data using a Support Vector Machine (SVM) to determine whether the website is good or not. We train our classifier on the opinions of the dataset users. For a better evaluation of a website's aesthetics, we take the users' country into account; therefore, we evaluate each website with respect to its target society.

Our main contributions are:
• Applying deep learning and convolutional neural networks for feature representation of the input screenshots of websites. Our main difference from pictorial aesthetic evaluation systems such as [6] is that we evaluate the aesthetics of a website, which is composed of texts, images, ribbons, changing parts, etc., each of which can have an enormous effect on the user's opinion of the whole website, whereas they evaluate images of natural scenes, seasides, objects, animals, etc., most of which are not composed of different parts with different subjects such as texts, movies, etc.
• Evaluating websites' screenshots, in which all the elements of a website are considered together, similar to what happens in the human mind when viewing a scene, whereas in other works, such as [2], [7], each element of the website is evaluated separately.
• Using a public dataset that has previously been used for website aesthetic evaluation from a different point of view, such as the statistical demographic analysis in [2], [7].
• Considering the home country of the users for whom the websites are evaluated.

The rest of the paper is organized as follows: we survey related work in this field of research in Section II. Our proposed method is introduced in Section III, which explains how we apply deep learning in our work. In Section IV, the experimental results are described. Finally, in Section V we conclude the paper and propose some suggestions for future work.

II. RELATED WORK

According to [1], for measuring how interested a user is in a website, the first impression after entering the site is very important: in a short time, the user decides whether to stay on the site or surf elsewhere. According to [8], this short time is only 50 milliseconds, so it must be managed well. In that paper, the effect of complexity on aesthetics is examined through user ratings.

In consonance with [1], color has a special role in website aesthetics. Some papers, such as [9], show that each color has particular effects on the viewer; some colors are warm, others are cold, and so on. In addition to color, balance is also important. As stated in [10], when a website is balanced, users psychologically sense equilibrium. Balance means an equal distribution of visual weight horizontally and vertically; in other words, the HTML elements are distributed equally inside the page.

In [2], the authors employ metrics such as balance, hue and saturation of colors, symmetry of elements, number of images, and other visual statistical metrics. That paper compares people's ratings of websites with the ratings predicted from the visual metrics and concludes that colorfulness and visual complexity are the main parameters in aesthetics and ratings. As stated in [3], vertical symmetry is also an important parameter in aesthetic website design.

As reported by [4], using separated elements, intelligent titles, and images makes a good impression and attracts viewers. In such sites the information is visually separated and classified into groups, in which case it takes the user about five seconds to find what they are looking for. The visual complexity of a website depends on the presentation, dispersion, and density of its elements. In [4], the authors show the relation between aesthetics and complexity, using a subset of Alexa's top hundred websites together with user voting. The parameters in this examination are menus, images, words, links, and the parts of the page that do not share space with other parts. Complexity here means using many such parameters on the website, and the study shows that if a website is complex, it is not beautiful.

In agreement with [11], the number of text sections, number of links, page dimensions, graphic elements, number of colors, and reading complexity are the most important factors in highly rated websites. That paper also uses voting to gather information about aesthetics; the voting was conducted among web experts who pay attention to content, structure, navigation, graphical design, interaction with the user, and their own experience.

In [12], the authors take the average of people's scores for a website as the criterion for website aesthetics.

In practice, some papers, such as [5], emphasize that the environment of an online shopping website and its aesthetics affect how well it attracts customers. In [13] it is shown that aesthetics increase usability, hedonic value, and utilitarian value, which are the three most important parameters for shopping websites. The paper uses a questionnaire to gather information about 10 shopping websites. According to [13], high-quality images can also attract customers and improve sales.

III. METHODOLOGY

As mentioned above, to evaluate the aesthetics of websites we need features that can separate the websites based on their aesthetics when the websites are described by those features. The problem of choosing the best features for representing raw data is a challenging problem in machine learning research. Deep learning is a new area of machine learning research that gives us the ability to solve such problems by learning the best features for representing the raw data. Deep learning not only provides the power of learning the best features to extract, but also the ability to control the dimension of the new representation. In other words, we simultaneously gain control over the robustness of the learned features and over the amount of dimensionality reduction in the representation. Deep learning based algorithms and methods are growing rapidly across a wide range of applications.

Examples such as [6], [14], [15], [16] show the power of deep learning.

As mentioned above, deep learning has the power of extracting the best features from raw data. This key property motivates us to apply deep learning methods to the website aesthetic evaluation problem. One of the state-of-the-art deep learning tools for feature extraction is the Convolutional Neural Network [17], referred to below as CNN. The basic idea is that, given a screenshot of the website page we want to evaluate, we use a CNN to represent it with the features the CNN has learned, then we reduce the dimension of the extracted features to achieve better classification performance, and finally we classify the websites using a support vector machine (SVM). The flowchart of our proposed method is shown in Fig. 1.

[Fig. 1 flowchart: Website screenshot -> Feature representation using CNN -> Feature dimensionality reduction using PCA -> Classification using SVM]

Fig. 1. Our proposed method consists of three main steps: feature representation using CNN, feature dimension reduction, and SVM classification.

In the following paragraphs, these stages are explained.

A. Feature representation using CNN

In this step, we represent the input raw data using a CNN. There are a wide variety of CNN architectures with state-of-the-art performance [17], [18], [19]. The CNN architecture we use is based on the AlexNet CNN [17]. The training of our CNN follows MemNet [20], which uses a pre-trained form of Hybrid-CNN [21], trained on both the Places dataset [22] and the ILSVRC 2012 dataset [17], and fine-tuned on the LaMem dataset [20]. As can be seen in Fig. 2, the CNN we use for feature extraction consists of five convolutional layers, three max pooling layers, and two fully connected layers (fc6 and fc7). The input dimensions of the CNN are 227×227×3 (an RGB image of size 227×227). In the first convolution layer, a convolution with kernel size 11×11 and stride 4 is applied, which outputs 96 feature maps of size 55×55. Then a max pooling layer pools the features to reduce the impact of rotations of the input image on the final features. In the second convolution layer, the convolution is done with kernel size 5×5 and stride one, which outputs 256 feature maps of size 27×27. The other convolutional layers operate similarly, as shown in Fig. 2. Finally, after the five convolutional layers, there are two fully connected layers that abstract the features and reduce the feature dimension to 4096. For extracting the features, we use layers fc6 and fc7 of the CNN, evaluate both, and choose the better one, as discussed in the next section. Both the fc6 and fc7 layers have a dimension of 4096.

B. Feature dimension reduction

As mentioned above, representing features using the CNN gives us a feature vector of dimension 4096, which is computationally expensive for the classification step. To solve this problem, we use the principal component analysis (PCA) algorithm to reduce the dimension of the features.

C. SVM classification

After dimension reduction, each website is described by a 128-dimensional feature vector. We use a Support Vector Machine [23] with a Gaussian radial basis function kernel as the classifier for the training and testing steps. The classifier takes the 128-dimensional feature vector of each website as input and classifies it into one of two categories: good or bad.

IV. EXPERIMENTAL RESULTS

In this section, we first introduce the dataset of website screenshots; then our results for the whole dataset and for each country are presented and discussed separately.

A. Website screenshot dataset

We use the dataset presented by the Intelligent Interactive Systems Group at Harvard University, which has been used in [2], [7]. The dataset consists of 430 screenshots captured from websites' homepages, all with a resolution of 1024×768. 350 of the websites are English, 60 are in other languages, and 20 were selected from the Webby Award candidates [2]. The websites were selected to represent a wide variety of genres, but unfortunately the structure of most sites is very similar, which results in low variance in the users' ratings. To evaluate the aesthetics of the collected websites, the dataset authors designed an online test in which people around the world could rate the complexity, colorfulness, and visual appeal of the webpages. Each webpage was displayed for 500 ms, so the sites' content did not affect the participants' impressions. Data were collected from volunteers between June 2012 and August 2013, and each participant rated 30 random websites twice to ensure that they rated the sites consistently.

Fig. 2. Architecture of the pretrained CNN.
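As a concrete illustration of how features can be read from this pretrained network, the following is a minimal pycaffe sketch of the fc7 extraction described in Section III-A. It is not the authors' code: the model file names, the input blob name 'data', and the screenshot paths are placeholders, and the preprocessing follows the standard Caffe transformer pipeline under the assumption of an AlexNet-style deploy definition.

import numpy as np
import caffe  # Caffe deep learning framework [24]

# Placeholder file names; the paper does not ship its network definition or weights.
PROTO = 'deploy.prototxt'      # assumed AlexNet-style definition with fc6/fc7 (4096-d each)
WEIGHTS = 'memnet.caffemodel'  # assumed MemNet-style weights fine-tuned on LaMem [20]

net = caffe.Net(PROTO, WEIGHTS, caffe.TEST)

# Standard Caffe preprocessing: 227x227x3 input, channels first, BGR channel order.
transformer = caffe.io.Transformer({'data': net.blobs['data'].data.shape})
transformer.set_transpose('data', (2, 0, 1))     # HxWxC -> CxHxW
transformer.set_raw_scale('data', 255)           # [0, 1] floats -> [0, 255]
transformer.set_channel_swap('data', (2, 1, 0))  # RGB -> BGR

def extract_fc7(screenshot_path):
    """Return the 4096-dimensional fc7 activation for one website screenshot."""
    image = caffe.io.load_image(screenshot_path)  # loads as RGB floats in [0, 1]
    net.blobs['data'].data[...] = transformer.preprocess('data', image)
    net.forward()
    return net.blobs['fc7'].data[0].copy()

# Example: build the 4096-d feature matrix for a list of screenshot files (placeholder paths).
screenshots = ['site_001.png', 'site_002.png']
features = np.vstack([extract_fc7(p) for p in screenshots])

In this sketch, the resize to 227×227 happens inside Transformer.preprocess, which corresponds to the downsizing step mentioned in the experiments below.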

Because the aesthetics of each site are judged differently by different people, the mean ratings of the sites do not vary much. Fig. 3 shows the histogram of the sites' aesthetic ratings.

Fig. 3. Histogram of the sites' aesthetic ratings. Websites with a rating of zero did not receive any ratings and were excluded from the dataset.

B. Feature representation using CNN

We downsized each image to 227×227 and used all three categories together. To extract the best features from the CNN, we compared the outputs of layers fc6 and fc7 of the pretrained CNN. For each layer output we extracted a 4096-dimensional feature vector per sample and then trained an SVM classifier on each. Based on the results for all samples and all ratings, presented in Table I, we chose the features extracted from fc7 because of their lower test error. For implementing the CNN and the feature extraction step, we used the Caffe deep learning framework [24]. We ran our proposed system on a machine with a simple NVIDIA GT620 GPU with CUDA compute capability 2.1.

TABLE I. CLASSIFICATION TEST ERROR (%) FOR FEATURES EXTRACTED FROM LAYERS FC6 AND FC7

                 Mean participant number   Feature extracted from fc6   Feature extracted from fc7
All countries            1757                        48.78                        34.15

C. Feature dimension reduction

The dimension of the fc7 features was reduced to 128 by principal component analysis. We tested 16, 64, 128, and 256 principal components. Choosing too few principal components leads to underfitting and a large training error, while choosing too many leads to overfitting (a small training error but a large test error). Based on our experiments and analyses, the best reduced size is 128.

D. Classification results

Some sites had no ratings, so we excluded them from the dataset, leaving 418 sites. We used 90 percent of the dataset for training and 10 percent for testing, and trained our SVM classifier for each country separately. Because each participant did not rate all the websites, some websites did not have enough ratings, so we only used countries with more than 40 ratings for most of the websites. In addition, we computed results over the ratings of all countries combined. For SVM classification, we used the SVM implementation in MATLAB 2013b with a radial basis function (RBF) kernel. The RBF kernel fits the training data very well, and by using a large Gaussian parameter we retain generality, so the classifier can separate the test data.

To compare our model with the previous work [2], we used their two features (colorfulness and visual complexity), trained an SVM classifier on those features, and then tested that classifier's robustness.
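The experiments above run PCA and the RBF-kernel SVM in MATLAB 2013b. Purely as an illustration of the same three decisions (128 principal components, an RBF kernel, and a 90/10 train/test split), the following is a hedged scikit-learn sketch rather than the authors' code. Deriving binary good/bad labels by thresholding the mean rating at its median is an assumption made for the example, since the paper does not spell out its exact labeling rule, and the gamma and C values are placeholders, not the parameters behind the reported results.

import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Placeholder arrays standing in for the 418 rated sites:
# 'features' would hold the fc7 activations, 'mean_ratings' the per-site mean user rating.
rng = np.random.RandomState(0)
features = rng.rand(418, 4096)
mean_ratings = rng.rand(418)

# Assumed labeling rule: sites rated above the median are "good" (1), the rest "bad" (0).
labels = (mean_ratings > np.median(mean_ratings)).astype(int)

# Reduce the 4096-d fc7 features to 128 principal components.
reduced = PCA(n_components=128).fit_transform(features)

# 90/10 split and an RBF-kernel SVM; gamma and C are illustrative values only.
X_train, X_test, y_train, y_test = train_test_split(
    reduced, labels, test_size=0.1, random_state=0)
classifier = SVC(kernel='rbf', gamma=1e-3, C=1.0)
classifier.fit(X_train, y_train)

test_error = 100.0 * (1.0 - classifier.score(X_test, y_test))
print('Test error: %.2f%%' % test_error)

The "big Gaussian parameter" mentioned above refers to a wide RBF kernel; in this parameterization that corresponds to a small gamma, since gamma = 1/(2*sigma^2) for kernel width sigma.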

Table II shows the classification error per country for deciding whether a website is good or bad. The last row of the table shows the results for the classifier trained on all users' ratings. The results show that our simple model performs better for some countries, and for the rating over all users we obtain the same result as the two handcrafted features. This holds even though our CNN is pretrained on another dataset [20] that was collected for a different goal with different images.

TABLE II. CLASSIFICATION TEST ERROR (%) BY COUNTRY

Country           Mean participant number   Our model test error (%)   Color and complexity features test error (%)
United States              727                       34.15                          31.71
Germany                     50                       51.22                          39.02
United Kingdom             270                       26.83                          34.15
Canada                     101                       34.15                          46.34
Finland                     40                       36.59                          41.46
All                       1757                       34.15                          34.15

V. CONCLUSION

In this paper, we proposed a website aesthetic evaluation method based on CNNs. Using deep learning and CNNs for feature representation is one of the main points that makes our system comparable to previous ones. We showed that the feature extraction ability of a pretrained CNN that has been trained on another dataset is comparable to traditional handcrafted feature extraction methods. Furthermore, CNNs offer the option of fine-tuning, which gives them the ability to learn increasingly better features for the chosen goal. Another contribution of this work is considering the target users' home country when evaluating the aesthetics of websites specifically for them.

As future work, a better dataset could be collected, with more samples and more variation in the users' aesthetic ratings. Another direction is CNN fine-tuning: the CNN used in this work is pretrained on both the Places dataset [22] and the ILSVRC 2012 dataset [17] and fine-tuned on the LaMem dataset [20]; it could be fine-tuned on a dataset of website screenshots to achieve better feature extraction accuracy. A further improvement can be made in the classification step: by collecting a better dataset with more aesthetically varied websites, the classification could use more classes for finer evaluation, or it could be converted to regression for exact rating prediction.

VI. REFERENCES

[1] A. N. Tuch, E. E. Presslaber, M. Stöcklin, K. Opwis, and J. A. Bargas-Avila, "The role of visual complexity and prototypicality regarding first impression of websites: Working towards understanding aesthetic judgments," Int. J. Hum. Comput. Stud., vol. 70, no. 11, pp. 794–811, Nov. 2012.
[2] K. Reinecke, T. Yeh, L. Miratrix, R. Mardiko, Y. Zhao, J. Liu, and K. Z. Gajos, "Predicting users' first impressions of website aesthetics with a quantification of perceived visual complexity and colorfulness," in Proc. SIGCHI Conf. Hum. Factors Comput. Syst. (CHI '13), 2013, pp. 2049–2058.
[3] A. N. Tuch, J. A. Bargas-Avila, and K. Opwis, "Symmetry and aesthetics in website design: It's a man's business," Comput. Human Behav., vol. 26, no. 6, pp. 1831–1837, 2010.
[4] E. Michailidou, S. Harper, and S. Bechhofer, "Visual complexity and aesthetic perception of web pages," in Proc. 26th Annual ACM International Conference on Design of Communication, 2008, pp. 215–224.
[5] Z. Dong, "Website aesthetics: Does it matter? A study on effects of perception of website aesthetics, usability and content quality on online purchase intention," 2007.
[6] X. Lu, Z. Lin, H. Jin, J. Yang, and J. Z. Wang, "RAPID: Rating pictorial aesthetics using deep learning," in Proc. ACM International Conference on Multimedia (MM '14), 2014, pp. 457–466.
[7] K. Reinecke and K. Z. Gajos, "Quantifying visual preferences around the world," in Proc. SIGCHI Conf. Hum. Factors Comput. Syst. (CHI '14), 2014, pp. 11–20.
[8] G. Lindgaard, G. Fernandes, C. Dudek, and J. Brown, "Attention web designers: You have 50 milliseconds to make a good first impression!," Behav. Inf. Technol., vol. 25, no. 2, pp. 115–126, 2006.
[9] H.-Y. Lim, "The effect of color in web page design," Complement, 2003.
[10] G. Lindgaard, "Does emotional appeal determine perceived usability of web sites," in Proc. CybErg: The Second International Cyberspace Conference on Ergonomics, 1999, pp. 202–211.
[11] M. Y. Ivory, R. R. Sinha, and M. A. Hearst, "Empirically validated web page design metrics," in Proc. SIGCHI Conf. Hum. Factors Comput. Syst., 2001, pp. 53–60.
[12] R. Datta and J. Z. Wang, "Algorithmic inferencing of aesthetics and emotion in natural images: An exposition," in Proc. 2008 15th IEEE Int. Conf. Image Process., 2008, pp. 105–108.
[13] Z. Jiang, L. Qiu, C. Yi, B. Choi, and D. Zhang, "An investigation of the effects of website aesthetics and usability on online shoppers' purchase intention," in Proc. 16th Americas Conf. Inf. Syst. (AMCIS 2010), vol. 7, 2010, pp. 5397–5407.
[14] M. Fayyaz, M. H. Saffar, M. Sabokrou, M. Hoseini, and M. Fathy, "Online signature verification based on feature representation," in Proc. 2015 International Symposium on Artificial Intelligence and Signal Processing (AISP), 2015, pp. 211–216.
[15] M. Sabokrou, M. Fathy, M. Hoseini, and R. Klette, "Real-time anomaly detection and localization in crowded scenes," in Proc. 2015 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2015, pp. 56–62.
[16] M. Fayyaz, M. Hajizadeh-Saffar, M. Sabokrou, M. Hoseini, and M. Fathy, "A novel approach for finger vein verification based on self-taught learning," in Proc. 2015 9th Iranian Conference on Machine Vision and Image Processing (MVIP), 2015, pp. 88–91.

[17] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," in Adv. Neural Inf. Process. Syst., 2012, pp. 1–9.
[18] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich, "Going deeper with convolutions," arXiv preprint arXiv:1409.4842, 2014.
[19] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-based learning applied to document recognition," Proc. IEEE, vol. 86, no. 11, pp. 2278–2324, 1998.
[20] A. Khosla, A. S. Raju, A. Torralba, and A. Oliva, "Understanding and predicting image memorability at a large scale," in Proc. IEEE Int. Conf. Comput. Vis., 2015, pp. 2390–2398.
[21] X.-X. Niu and C. Y. Suen, "A novel hybrid CNN–SVM classifier for recognizing handwritten digits," Pattern Recognit., vol. 45, no. 4, pp. 1318–1325, 2012.
[22] B. Zhou, A. Lapedriza, J. Xiao, A. Torralba, and A. Oliva, "Learning deep features for scene recognition using Places database," in Advances in Neural Information Processing Systems, 2014, pp. 487–495.
[23] C. Cortes and V. Vapnik, "Support-vector networks," Mach. Learn., vol. 20, no. 3, pp. 273–297, Sep. 1995.
[24] Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama, and T. Darrell, "Caffe: Convolutional architecture for fast feature embedding," in Proc. ACM International Conference on Multimedia, 2014, pp. 675–678.

