A Novel Approach For Website Aesthetic Evaluation Based On Convolutional Neural Networks
Abstract—In this paper, we propose a website aesthetic evaluation method. To achieve better performance, we apply convolutional neural networks, one of the main methods of the deep learning research area. Using deep learning and convolutional neural networks for feature representation is one of the main points that distinguishes our work from previous ones. Our system takes a screenshot of a website as input and reports whether it is a good or bad website with respect to the users' country. For the evaluation process, we represent the website screenshot using the MemNet convolutional neural network, then reduce the dimension of the extracted features with the principal component analysis algorithm, and finally classify them with an SVM classifier trained on users' ratings. Furthermore, the aesthetic evaluation in this research is language-independent: the website's language does not matter, and our method works for all languages.

Index Terms—Aesthetics, Website, Deep Learning, Convolutional Neural Network.

I. INTRODUCTION

Today, the majority of our needs are met through the internet. Shopping, reading news, and online gaming are only part of what websites offer. Websites are generally classified into groups such as news, scientific, and marketing, and there are many sites in each group, with more added every day and even every hour. One crucial point that attracts users to a website is aesthetics. Aesthetics is the first thing that unconsciously affects a user after visiting a website [1]. This effect steers the user's next visits toward the sites they find desirable. In other words, when a group of websites offers similar services, users tend to favor the sites that are more visually attractive than the others [1]. That is why today's web designers spend a lot of time on the graphical elements and images used on a website and design them with great care.

However, what really makes a site beautiful lies in the eyes of the audience. Attractive, high-quality images, graphic elements, and animations can make a website beautiful, but in fact this alone is not enough: we know of websites that take advantage of these features, yet users do not tend to visit them, and they have little traffic [2], [3]. In other words, aesthetics is a quality to judge rather than a quantity to measure. Another view examines the effect of complexity on websites; some researchers believe that simplicity creates beauty, but as noted above, this is not the whole explanation of beauty [4].

In this article, we seek to understand beauty in a general way. Using numerical quantities, such as extracting certain parameters of the site, captures only part of the picture. For example, quantitative parameters such as the number of graphical elements, density, and size can only give a guess about the beauty of a website, a guess that may be wrong in many cases and may not lead to useful results. There is the possibility of selecting the wrong parameters to represent the beauty of the site, and errors may also occur while computing those parameters. Generality keeps us away from such mistakes, and a general strategy can perform better on website aesthetic evaluation. Since our approach is general, we do not use such numerical parameters.

Our approach is also language-independent and can be used for sites in any language, which makes the algorithm applicable in different countries. One of the major challenges of traditional website aesthetic evaluation systems, such as [5], is deciding which features to extract from the website to help the system in the evaluation process. In other words, to evaluate the aesthetics of websites, we need features that can represent the websites in terms of their aesthetics. This is a classic problem in machine learning research, and a wide variety of algorithms, methods, and fields of research have been introduced to solve it. One of the most promising of these fields is deep learning, a new area of machine learning research that avoids the problem of choosing the best handcrafted features by learning the best features for representing the raw data.

To alleviate the problem of choosing the best features, we use Convolutional Neural Networks (CNNs), which are among the best feature extraction methods in deep learning. Our system takes the website screenshot as input, then represents it using a convolutional neural network, and finally classifies the website based on the extracted features.
Deep learning methods have shown strong results in a variety of applications, such as [6], [14], [15], [16], which demonstrates the power of deep learning.

As mentioned above, deep learning has the power of extracting the best features from raw data. This key property motivates us to apply deep learning methods to the website aesthetic evaluation problem. One of the state-of-the-art deep learning tools for feature extraction is the Convolutional Neural Network [17], referred to below as CNN. The basic idea is that, given a screenshot of the website page we want to evaluate, we use a CNN to represent it based on the CNN's learned features, then reduce the dimension of the extracted features to achieve better classification performance, and finally classify websites using a support vector machine (SVM). The flowchart of our proposed method is shown in Fig. 1.

[Fig. 1 flowchart: Website screenshot -> Feature representation using CNN -> Feature dimensionality reduction using PCA -> Classification using SVM]

Fig. 1. Our proposed method consists of three main steps: feature representation using CNN, feature dimension reduction using PCA, and SVM classification. In the following paragraphs, these stages are explained.

A. Feature representation using CNN

In this step, we represent the raw input data using a CNN. There are a wide variety of CNN architectures with state-of-the-art performance [17], [18], [19]. The CNN architecture we use is based on AlexNet [17]. Our CNN is trained like MemNet [20]: it uses a pretrained form of Hybrid-CNN [21], trained on both the Places dataset [22] and the ILSVRC 2012 dataset [17], and fine-tuned on the LaMem dataset [20]. As can be seen in Fig. 2, the CNN we use for feature extraction consists of five convolutional layers, three max-pooling layers, and two fully-connected layers (fc6 and fc7). The input dimensions of the CNN are 227×227×3 (an RGB image of size 227×227). In the first convolutional layer, a convolution with kernel size 11×11 and stride 4 is performed, which outputs 96 feature maps of size 55×55. A max-pooling layer then pools the features to reduce the impact of small shifts of the input image on the final features. In the second convolutional layer, convolution is performed with kernel size 5×5 and stride one, which outputs 256 feature maps of size 27×27. The remaining convolutional layers operate similarly, as shown in Fig. 2. Finally, after the five convolutional layers, there are two fully-connected layers that abstract the features and reduce the feature dimension to 4096. For feature extraction, we evaluate both layers fc6 and fc7 of the CNN and choose the better one, as discussed in the next section. Both fc6 and fc7 have dimension 4096.

B. Feature dimension reduction

As mentioned above, representing features using the CNN gives us a feature vector of dimension 4096, which is computationally expensive for the classification step. To solve this problem, we use the principal component analysis (PCA) algorithm to reduce the dimension of the features.

C. SVM classification

After dimension reduction, each website is represented by a 128-dimensional feature vector. We use a Support Vector Machine [23] with a Gaussian radial basis function kernel as the classifier in the training and testing steps. The classifier takes the 128-dimensional feature vector of each website as input and classifies it into one of two categories: good or bad.

IV. EXPERIMENTAL RESULTS

In this section, we first introduce the website screenshot dataset; then our results for the whole dataset and for each country are presented and discussed separately.

A. Website screenshot dataset

We use the dataset presented by the Intelligent Interactive Systems Group at Harvard University, which has been used in [2], [7]. The dataset consists of 430 screenshots captured from websites' homepages, all with resolution 1024×768. Of these, 350 are English websites, 60 were obtained from other languages, and 20 were selected from the Webby Award candidates [2]. The websites were selected to represent a wide variety of genres, but unfortunately the structure of most sites was very similar, which results in low variance in users' ratings. To evaluate the aesthetics of the collected websites, the authors designed an online test in which people around the world could rate the complexity, colorfulness, and visual appeal of the webpages. Each webpage was displayed for 500 ms, so the sites' contents did not affect the participants' impressions. Data were collected from volunteers between June 2012 and August 2013. Each participant rated 30 random websites twice, to ensure that they rated the sites consistently.
Fig. 2. Architecture of the pretrained CNN.
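The feature-map sizes stated above follow from standard convolution arithmetic, which a short Python sketch can verify. The 3×3 pooling windows and the padding values of the later layers are AlexNet's usual settings, assumed here because the text states only the first two layers explicitly.

```python
def conv_out(size, kernel, stride=1, pad=0):
    # Spatial output size of a square convolution or pooling window.
    return (size + 2 * pad - kernel) // stride + 1

s = conv_out(227, 11, stride=4)   # conv1: 11x11, stride 4 -> 55x55 (96 maps)
assert s == 55
s = conv_out(s, 3, stride=2)      # max pool 3x3, stride 2 -> 27x27
assert s == 27
s = conv_out(s, 5, pad=2)         # conv2: 5x5, stride 1 -> 27x27 (256 maps)
assert s == 27
s = conv_out(s, 3, stride=2)      # max pool -> 13x13
s = conv_out(s, 3, pad=1)         # conv3 -> 13x13
s = conv_out(s, 3, pad=1)         # conv4 -> 13x13
s = conv_out(s, 3, pad=1)         # conv5 -> 13x13
s = conv_out(s, 3, stride=2)      # max pool -> 6x6
print(256 * s * s)                # prints 9216: the flattened input to fc6
```

The 9216 flattened values are what the fully-connected layer fc6 compresses down to the 4096-dimensional feature vector used below.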
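Steps B and C of the pipeline (PCA down to 128 dimensions, then an RBF-kernel SVM) can be sketched with scikit-learn. The feature matrix below is random stand-in data for illustration only; in the actual pipeline each row would be an fc6/fc7 activation vector for one of the 430 screenshots, and the good/bad labels would come from users' ratings.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Stand-in for fc6/fc7 activations: 430 websites x 4096 dimensions.
features = rng.normal(size=(430, 4096))
labels = rng.integers(0, 2, size=430)    # 1 = "good", 0 = "bad" (synthetic)

# Step B: PCA reduces the 4096-dim vectors to 128 dimensions.
pca = PCA(n_components=128)
reduced = pca.fit_transform(features)
assert reduced.shape == (430, 128)

# Step C: an SVM with a Gaussian RBF kernel, trained on the reduced vectors.
train_x, test_x = reduced[:344], reduced[344:]
train_y, test_y = labels[:344], labels[344:]
clf = SVC(kernel="rbf").fit(train_x, train_y)
pred = clf.predict(test_x)               # one good/bad decision per website
print(pred.shape)                        # prints (86,)
```

With random inputs the predictions are of course meaningless; the sketch only shows the shape of the data flowing through the two stages.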
Table II shows the classification error per country for deciding good websites from bad. The last row of the table gives the results for a classifier trained on all users' ratings.

The results show that our simple model performs better for some countries and for the overall rating. We obtain results comparable to the two handcrafted features. Moreover, our CNN is pretrained on another dataset [20], which was built for a different goal with different images.

TABLE II. CLASSIFICATION ERROR BY COUNTRY

Country          Mean participant number   Color & complexity feature test error (%)   Our model test error (%)
United States    727                        34.15                                       31.71
Germany          50                         51.22                                       39.02
United Kingdom   270                        26.83                                       34.15
Canada           101                        34.15                                       46.34
Finland          40                         36.59                                       41.46
All              1757                       34.15                                       34.15

V. CONCLUSION

In this paper, we proposed a website aesthetic evaluation method based on CNNs. Using deep learning and CNNs for feature representation is one of the main points that made our system comparable to previous ones. We showed that the feature extraction ability of a pretrained CNN trained on another dataset is comparable to traditional handcrafted feature extraction methods. Furthermore, CNNs offer the option of fine-tuning, which gives them the ability to learn better and better features for the chosen goal. Another contribution of this work is considering the target users' home country when evaluating website aesthetics specifically for them.

As future work, a better dataset could be collected, with more samples and more variation in users' aesthetic ratings. Further work could also be done on CNN fine-tuning: the CNN used in this work is pretrained on both the Places dataset [22] and the ILSVRC 2012 dataset [17] and fine-tuned on the LaMem dataset [20]; it could instead be fine-tuned on a dataset of website screenshots to achieve better feature extraction accuracy. Another improvement can be made in the classification step: with a better dataset containing websites with more varied aesthetics, classification could use more classes for finer evaluation, or it could be converted to regression for exact rating prediction.

VI. REFERENCES

[1] A. N. Tuch, E. E. Presslaber, M. Stöcklin, K. Opwis, and J. A. Bargas-Avila, "The role of visual complexity and prototypicality regarding first impression of websites: Working towards understanding aesthetic judgments," Int. J. Hum. Comput. Stud., vol. 70, no. 11, pp. 794–811, Nov. 2012.

[2] K. Reinecke, T. Yeh, L. Miratrix, R. Mardiko, Y. Zhao, J. Liu, and K. Z. Gajos, "Predicting users' first impressions of website aesthetics with a quantification of perceived visual complexity and colorfulness," Proc. SIGCHI Conf. Hum. Factors Comput. Syst. (CHI '13), pp. 2049–2058, 2013.

[3] A. N. Tuch, J. A. Bargas-Avila, and K. Opwis, "Symmetry and aesthetics in website design: It's a man's business," Comput. Human Behav., vol. 26, no. 6, pp. 1831–1837, 2010.

[4] E. Michailidou, S. Harper, and S. Bechhofer, "Visual complexity and aesthetic perception of web pages," in Proceedings of the 26th Annual ACM International Conference on Design of Communication, 2008, pp. 215–224.

[5] Z. Dong, "Website aesthetics: Does it matter? A study on effects of perception of website aesthetics, usability and content quality on online purchase intention," 2007.

[6] X. Lu, Z. Lin, H. Jin, J. Yang, and J. Z. Wang, "RAPID: Rating pictorial aesthetics using deep learning," in Proceedings of the ACM International Conference on Multimedia (MM '14), 2014, pp. 457–466.

[7] K. Reinecke and K. Z. Gajos, "Quantifying visual preferences around the world," Proc. SIGCHI Conf. Hum. Factors Comput. Syst. (CHI), pp. 11–20, 2014.

[8] G. Lindgaard, G. Fernandes, C. Dudek, and J. Brown, "Attention web designers: You have 50 milliseconds to make a good first impression!," Behav. Inf. Technol., vol. 25, no. 2, pp. 115–126, 2006.

[9] H.-Y. Lim, "The effect of color in web page design," Complement, 2003.

[10] G. Lindgaard, "Does emotional appeal determine perceived usability of web sites," in Proceedings of CybErg: The Second International Cyberspace Conference on Ergonomics, 1999, pp. 202–211.

[11] M. Y. Ivory, R. R. Sinha, and M. A. Hearst, "Empirically validated web page design metrics," Proc. SIGCHI Conf. Hum. Factors Comput. Syst., pp. 53–60, 2001.

[12] R. Datta and J. Z. Wang, "Algorithmic inferencing of aesthetics and emotion in natural images: An exposition," 2008 15th IEEE Int. Conf. Image Process., pp. 105–108, 2008.

[13] Z. Jiang, L. Qiu, C. Yi, B. Choi, and D. Zhang, "An investigation of the effects of website aesthetics and usability on online shoppers' purchase intention," 16th Am. Conf. Inf. Syst. (AMCIS 2010), vol. 7, pp. 5397–5407, 2010.

[14] M. Fayyaz, M. H. Saffar, M. Sabokrou, M. Hoseini, and M. Fathy, "Online signature verification based on feature representation," in 2015 International Symposium on Artificial Intelligence and Signal Processing (AISP), 2015, pp. 211–216.

[15] M. Sabokrou, M. Fathy, M. Hoseini, and R. Klette, "Real-time anomaly detection and localization in crowded scenes," in 2015 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2015, pp. 56–62.

[16] M. Fayyaz, M. Hajizadeh-Saffar, M. Sabokrou, M. Hoseini, and M. Fathy, "A novel approach for finger vein verification based on self-taught learning," in 2015 9th Iranian Conference on Machine Vision and Image Processing (MVIP), 2015, pp. 88–91.
[17] A. Krizhevsky, I. Sutskever, and G. Hinton, "ImageNet classification with deep convolutional neural networks," Adv. Neural Inf. Process. Syst., pp. 1–9, 2012.
[18] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich, "Going deeper with convolutions," arXiv preprint arXiv:1409.4842, 2014.
[19] Y. Lecun, L. Bottou, Y. Bengio, and P. Haffner,
“Gradient-based learning applied to document recognition,” Proc.
IEEE, vol. 86, no. 11, pp. 2278–2324, 1998.
[20] A. Khosla, A. S. Raju, A. Torralba, and A. Oliva, "Understanding and predicting image memorability at a large scale," IEEE Int. Conf. Comput. Vis., pp. 2390–2398, 2015.
[21] X.-X. Niu and C. Y. Suen, “A novel hybrid CNN–SVM
classifier for recognizing handwritten digits,” Pattern Recognit.,
vol. 45, no. 4, pp. 1318–1325, 2012.
[22] B. Zhou, A. Lapedriza, J. Xiao, A. Torralba, and A.
Oliva, “Learning deep features for scene recognition using places
database,” in Advances in Neural Information Processing Systems,
2014, pp. 487–495.
[23] C. Cortes and V. Vapnik, “Support-vector networks,”
Mach. Learn., vol. 20, no. 3, pp. 273–297, Sep. 1995.
[24] Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long,
R. Girshick, S. Guadarrama, and T. Darrell, “Caffe: Convolutional
architecture for fast feature embedding,” in Proceedings of the
ACM International Conference on Multimedia, 2014, pp. 675–678.