Software Development Metrics Prediction Using Time Series Methods
In this paper we investigate four models that can be used to analyze time-based
data series to predict trends observed in the software development process,
based on information obtained from software management tools (e.g. GitLab). Those
models are (1) Exponential Smoothing, (2) the Holt-Winters Model, (3) Autoregressive
Integrated Moving Average (ARIMA) and (4) Recurrent Neural Networks (RNN).
Time series analysis can be defined as the decomposition of a given signal X_t
into a trend T_t, a seasonal component S_t and a residual component e_t (an
example is shown in Fig. 1).
314 M. Choraś et al.
Fig. 1. Example decomposition of a time series into data, trend, seasonal and remainder components.

Fig. 2. ACF and PACF diagrams of the series (Lag vs. ACF / Partial ACF).
The simplest way to obtain the trend of a signal is to apply a linear filter to X_t:

T_t = \sum_{i=-\infty}^{\infty} \lambda_i X_{t+i} \qquad (1)
A special case is the symmetric moving average over a window of size 2a + 1:

T_t = \frac{1}{2a+1} \sum_{i=-a}^{a} X_{t+i} \qquad (2)
Exponential smoothing is a method for refining time series data using the expo-
nential window function. In contrast to the simple moving average technique, the
forecasting of data trends using exponential smoothing is characterized by expo-
nentially decreasing weights of past observations, instead of equally weighted
observations. This method is especially beneficial in forecasting time series char-
acterized by a low variation of trend and seasonal changes. Numerous recent
works show the viability of exponential smoothing for prediction purposes. In
the project management area, [3] and [13] propose improvements to well-known
Earned Value Management (EVM) project control methodology and Earned
Duration Index (EDI) parameter by enhancing it with exponential smoothing to
predict project progress and calculate project completion time and cost. In both
cases, the application of exponential smoothing was easy to incorporate into the
traditional forecasting methodologies, raised their accuracy and reduced errors. For
example, EVM enhanced with this technique produced 14.8% more accurate time
predictions and 25.1% more accurate cost predictions on data from 23 real-life
projects examined by [3]. Exponential smoothing has also been applied to forecast
financial/industrial indicators and variable physical quantities. [29] propose three
methods using exponential smoothing to predict solar irradiance over time, based
on historical data. Three variations of the
method were examined: exponential smoothing with seasonal-trend decomposi-
tion, Global Horizontal Irradiance (GHI) time series decomposition (forecasted
separately into a direct component and a diffuse component and then recombined)
and the regression-based reconstruction of GHI employing exponential smoothing
for cloud coverage forecasting. [26] compared several methods based on exponential
smoothing to forecast palm oil production in a given time. Those were: single ES,
double ES (Holt Model), triple ES (Holt-Winters Model), and triple ES in additive
and multiplicative variants. Because the historical data contain a seasonal
component, the authors recommend additive triple exponential smoothing for
prediction, as it gave a smaller error in the forecasted results.
Exponential smoothing predicts the next value of a given time series using a
geometrically weighted sum of past data samples. However, simple exponential
smoothing should only be used with signals that have no systematic trend or
seasonal component.
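The geometric weighting can be sketched with the recursive form of simple exponential smoothing. The series values and the α = 0.5 setting below are illustrative, not taken from the paper's data:

```python
def exponential_smoothing(series, alpha):
    """Simple exponential smoothing: the smoothed value is a geometrically
    weighted sum of past observations, with weights decaying by (1 - alpha)."""
    smoothed = [series[0]]              # seed with the first observation
    for x in series[1:]:
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed

def forecast_next(series, alpha):
    """One-step-ahead forecast: the last smoothed value."""
    return exponential_smoothing(series, alpha)[-1]

# Hypothetical daily issue counts, for illustration only
issues = [100, 102, 101, 105, 104, 108]
print(forecast_next(issues, alpha=0.5))  # 105.75
```

A larger α weights recent observations more heavily; α = 1 reproduces the last observation, α near 0 approaches a long-run average.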
Fig. 3. The GitLab issues signal X and its moving averages for a = 2 and a = 5 (axes: time vs. # GitLab Issues).
As described above, the Holt-Winters model can be used in order to deal with
the limitations of the exponential smoothing model. The Holt-Winters model has
three smoothing parameters, namely α (for the constant level value), β (for the
trend), and γ (for seasonal variations).
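A minimal additive Holt-Winters recursion might look as follows; the initialization scheme and the choice of α, β, γ are illustrative assumptions, not the parameters fitted for Fig. 4:

```python
def holt_winters_additive(series, season_len, alpha, beta, gamma, horizon):
    """Additive Holt-Winters: smooth the level (alpha), trend (beta) and
    seasonal components (gamma), then extrapolate `horizon` steps ahead."""
    # Simple initialization from the first two seasons
    level = sum(series[:season_len]) / season_len
    trend = (sum(series[season_len:2 * season_len])
             - sum(series[:season_len])) / season_len ** 2
    seasonals = [x - level for x in series[:season_len]]

    for t, x in enumerate(series):
        s = seasonals[t % season_len]
        last_level = level
        level = alpha * (x - s) + (1 - alpha) * (level + trend)
        trend = beta * (level - last_level) + (1 - beta) * trend
        seasonals[t % season_len] = gamma * (x - level) + (1 - gamma) * s

    # Forecast: extrapolated level + trend, plus the matching seasonal term
    return [level + h * trend + seasonals[(len(series) + h - 1) % season_len]
            for h in range(1, horizon + 1)]
```

For a flat series the forecast stays flat; the trend and seasonal terms only contribute once the data actually exhibit them.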
Fig. 4. Holt-Winters fitted model (red) for GitLab issues time series (black) (Color figure online)
2.3 ARIMA
ARIMA stands for Autoregressive Integrated Moving Average. ARIMA is com-
posed of an autoregressive and a moving average model, where the autoregression
model involves regressing the variables based on past values.

Fig. 5. Holt-Winters prediction for the next 3 days. The gray bars indicate the confidence levels (95% dark gray and 80% light gray) (Color figure online)

The model is commonly denoted as ARIMA(p, d, q), where parameters p, d, and q are non-negative
integers, which control three main components of the model: (i) the order (the
number of time lags) of the autoregressive model, (ii) the degree of differencing,
and (iii) the order of the moving-average.
It must be noted that ARIMA models are defined for stationary time series.
To obtain stationarity, a non-stationary time series has to be differenced until
the non-stationarity is eliminated. The number of differencing iterations corresponds
to the d parameter of the model. In order to choose the other two parameters (p
and q) we must analyze the autocorrelation diagram (ACF) and the partial
autocorrelation diagram (PACF). From the diagrams shown in Fig. 2, we may
conclude that q will be 0, because the ACF drops below the significance
threshold (dashed line) after lag 0. Similarly, we can estimate p = 0
from the PACF diagram.
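The identification procedure above can be sketched in two helpers: difference the series until it looks stationary (giving d), then inspect the autocorrelations. The data below is illustrative:

```python
def difference(series, d=1):
    """Difference the series d times (determines the 'd' in ARIMA(p, d, q))."""
    for _ in range(d):
        series = [b - a for a, b in zip(series, series[1:])]
    return series

def acf(series, max_lag):
    """Sample autocorrelation function for lags 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    var = sum((x - mean) ** 2 for x in series)
    return [sum((series[t] - mean) * (series[t + k] - mean)
                for t in range(n - k)) / var
            for k in range(max_lag + 1)]

# A linear trend is removed by a single differencing pass (d = 1)
print(difference([100, 102, 104, 106, 108]))  # [2, 2, 2, 2]
```

In practice p and q are then read off the lags at which the PACF and ACF, respectively, drop below the significance threshold.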
Autoregressive Integrated Moving Average (ARIMA) is a technique success-
fully applied for the prediction of non-stationary time series in different areas
(e.g. finance or cybersecurity [2]). QoS in a cloud environment is one case where
the ARIMA model has been examined for cloud workload prediction [4]. The
authors concluded that their model achieves a prediction accuracy of up to 91%,
contributing to better cloud resource allocation with no impact on cloud
environment performance. ARIMA has also been studied for prediction purposes in
software engineering. For example, a comparative analysis of ARIMA, neural
networks and a hybrid technique for Debian bug number prediction can be found in
[19], where a model combining ARIMA and Artificial Neural Networks is presented
as a promising approach to improve prediction accuracy. On the other hand, [16]
shows that a hybrid model based on Singular Spectrum Analysis (SSA) and ARIMA
outperforms other models in predicting software failures during the software
testing process.
Furthermore, Recurrent Neural Networks (RNNs) have gained traction in the context
of forecasting and prediction based on time series data. A number of publications
describe the use of the RNN for such purposes, including [6] Chang et al. for
flooding water level forecasting, [24] Rout et al. for financial time series data pre-
diction (forecasting of stock market indices) or [18] Muller-Navarra et al. for sales
forecasting using partial recurrent neural networks. RNNs have some advantages
over other methods used for time series prediction, namely: robustness to noise
in a dataset, robustness to the incompleteness of the data, a better ability to
predict from big datasets, a better ability to approximate non-linear functions
and learning temporal dependencies among the data [8].
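At its core, an RNN carries a hidden state between time steps, which is what lets it learn temporal dependencies. The scalar Elman-style step below is a sketch with hand-picked weights, purely to show the mechanism; in practice the weights are vector-valued and learned by backpropagation through time:

```python
import math

def rnn_step(x, h, w_xh, w_hh, w_hy):
    """One Elman RNN step: the new hidden state mixes the current input
    with the previous state, so past observations influence the output."""
    h_new = math.tanh(w_xh * x + w_hh * h)
    y = w_hy * h_new
    return y, h_new

def rnn_forecast(series, w_xh=0.5, w_hh=0.5, w_hy=2.0):
    """Feed the series through the recurrence; the final output serves as
    a one-step-ahead prediction (illustrative fixed weights, not trained)."""
    h, y = 0.0, 0.0
    for x in series:
        y, h = rnn_step(x, h, w_xh, w_hh, w_hy)
    return y
```

Because the tanh squashes the state, the output is bounded, and two series that differ only in earlier values still produce different forecasts: the state remembers them.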
Fig. 6. Identified correlated facts: backlog size and the cognitive complexity (left),
backlog size and the number of duplicated lines (right).
In our research we have used real data coming from a software development team
working on commercial and research IT products (web applications) in several
projects. The software team consists of up to 10 individuals who extensively
use GitLab and SonarQube. Following the procedure described in previous
publications concerning the Q-Rapids approach to data gathering and analysis, we
have created connectors to those software management tools in order to calculate
metrics, factors and strategic indicators. Our work hereby adds the prediction
aspect to help product owners in their daily decision making.
Another method used in our analysis is the Holt-Winters model. The method
determines the trend of the data it operates on and estimates the seasonality of
the series. The fitted Holt-Winters model of the GitLab issues time series is
shown in Fig. 4. In contrast to the moving average approach, the Holt-Winters
algorithm provides a better fit to the original data. Moreover, the model rests
on a more formal mathematical toolset to provide prediction capabilities. The
predictions for the next three days for the same data are presented in Fig. 5.
The procedure constitutes an approximation, with a certain confidence level, of
a given time series based on the raw data.
In the literature, we can find numerous examples [28] showing that the
Autoregressive Integrated Moving Average often outperforms the Holt-Winters algorithm.
Therefore, forecasting using the ARIMA model has also been attempted in this
work.
In general, we have observed that the seasonal ARIMA model achieved a better fit
and a lower forecasting error than the Holt-Winters algorithm. For example,
Fig. 6 shows that, for short time frame prediction, the ARIMA model allowed us
to identify that an increasing trend in the number of items in the sprint
backlog may cause deterioration of code quality (increased cognitive complexity
and number of duplicated lines). In that case we used 3 ARIMA models: one to
predict the backlog size, one to predict the cognitive complexity, and one to
predict the number of duplicated lines. Statistically significant correlations
were spotted with the autocorrelation function (ACF).
e_{T+h} = y_{T+h} - \hat{y}_{T+h} \qquad (4)
Moreover, here we assume that we split the time series y into a training set
{y_1, ..., y_T} and a testing set {y_{T+1}, y_{T+2}, ...}. In our experiments we
have evaluated the prediction effectiveness of the following models: ARIMA and
Holt-Winters (described in the previous section), and the random walk forecast.
The effectiveness results for different scenarios are reported in Tables 1, 2 and 3.
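The random walk baseline and the forecast error of Eq. 4 are straightforward to compute; the series and split point below are illustrative:

```python
def random_walk_forecast(train, horizon):
    """Naive random-walk forecast: every future value equals the last
    observed value of the training set."""
    return [train[-1]] * horizon

def forecast_errors(actual, predicted):
    """Eq. 4: e_{T+h} = y_{T+h} - yhat_{T+h} for each horizon h."""
    return [y - yhat for y, yhat in zip(actual, predicted)]

y = [100, 102, 101, 105, 104, 108]      # hypothetical metric values
train, test = y[:4], y[4:]              # split at T = 4
pred = random_walk_forecast(train, len(test))
print(forecast_errors(test, pred))      # [-1, 3]
```

Any candidate model is expected to beat this baseline's errors before it is considered useful.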
We have identified similar results for other metrics as well, namely the
number of tasks under development (tasks in progress) and the number of delayed
tasks.
5 Conclusions
In this paper we have addressed the problem of analyzing software development
and quality data. Our thesis was that time-series analysis methods are well
suited to the challenge of predicting software-related data. Such methods and
tools (e.g. those offered by the H2020 Q-Rapids Project) can be helpful to
Product Owners in their daily decisions related to e.g. assignment of tasks,
time predictions, bug predictions, time to release, etc. We conducted our
experiments on real case data in order to forecast various project-related
characteristics. We have compared different forecasting tools such as ARIMA,
Random Walk, and Holt-Winters. Our experiments have shown that the ARIMA model
performs well on our data, achieving the lowest prediction errors among the
compared models.
Acknowledgments. This work is funded under Q-Rapids project, which has received
funding from the European Union’s Horizon 2020 research and innovation programme
under grant agreement No. 732253.
References
1. Tovey, A.: Cyber attacks cost British industry 34bn a year (2018). http://
www.telegraph.co.uk/finance/newsbysector/industry/defence/11663761/Cyber-
attacks-cost-British-industry-34bn-a-year.html
2. Andrysiak, T., Saganowski, Ł., Choraś, M., Kozik, R.: Network traffic predic-
tion and anomaly detection based on ARFIMA model. In: de la Puerta, J.G.,
et al. (eds.) International Joint Conference SOCO’14-CISIS’14-ICEUTE’14. AISC,
vol. 299, pp. 545–554. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-07995-0_54
3. Batselier, J., Vanhoucke, M.: Improving project forecast accuracy by integrating
earned value management with exponential smoothing and reference class forecast-
ing. Int. J. Project Manage. 35(1), 28–43 (2017)
4. Calheiros, R.N., Masoumi, E., Ranjan, R., Buyya, R.: Workload prediction using
ARIMA model and its impact on cloud applications’ QoS. IEEE Trans. Cloud
Comput. 3(4), 449–458 (2015)
5. Capgemini: World Quality Report 2016–17 (2017). https://www.capgemini.com/world-quality-report-2016-17/. Accessed 9 Oct 2017
6. Chang, F.-J., Chen, P.-A., Ying-Ray, L., Huang, E., Chang, K.-Y.: Real-time multi-
step-ahead water level forecasting by recurrent neural networks for urban flood
control. J. Hydrol. 517, 836–846 (2014)
7. Choraś, M., Kozik, R., Puchalski, D., Renk, R.: Increasing product owners’ cog-
nition and decision-making capabilities by data analysis approach. Cogn. Technol.
Work 21(2), 191–200 (2019)
8. Dorffner, G.: Neural networks for time series processing. Neural Netw. World 6,
447–468 (1996)
9. Felderer, M., Ramler, R.: Risk orientation in software testing processes of small
and medium enterprises: an exploratory and comparative study. Softw. Qual. J.
24, 519–548 (2015)
10. Franch, X., et al.: Data-driven requirements engineering in agile projects: the Q-
rapids approach, pp. 411–414, September 2017
11. Desjardins, J.: QASymphony, how many millions of lines of code does it take?
(2017). https://informationisbeautiful.net/visualizations/million-lines-of-code/
12. Jones, C., Bonsignour, O.: The Economics of Software Quality, 1st edn. Addison-
Wesley Professional, Boston (2011)
13. Khamooshi, H., Abdi, A.: Project duration forecasting using earned duration man-
agement with exponential smoothing techniques. J. Manage. Eng. 33, 04016032
(2016)
14. Kozik, R., Choraś, M., Puchalski, D., Renk, R.: Platform for software quality
and dependability data analysis. In: Zamojski, W., Mazurkiewicz, J., Sugier, J.,
Walkowiak, T., Kacprzyk, J. (eds.) DepCoS-RELCOMEX 2018. AISC, vol. 761, pp.
306–315. Springer, Cham (2019). https://doi.org/10.1007/978-3-319-91446-6_29
15. Guzmán, L., Oriol, M., Rodrı́guez, P., Franch, X., Jedlitschka, A., Oivo, M.: How
can quality awareness support rapid software development? – A research preview.
In: Grünbacher, P., Perini, A. (eds.) REFSQ 2017. LNCS, vol. 10153, pp. 167–173.
Springer, Cham (2017). https://doi.org/10.1007/978-3-319-54045-0_12
16. Liu, G., Zhang, D., Zhang, T.: Software reliability forecasting: singular spectrum
analysis and ARIMA hybrid model, pp. 111–118, September 2015
17. López, L., et al.: Q-rapids tool prototype: supporting decision-makers in manag-
ing quality in rapid software development. In: Mendling, J., Mouratidis, H. (eds.)
CAiSE 2018. LNBIP, vol. 317, pp. 200–208. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-92901-9_17
18. Müller-Navarra, M., Lessmann, S., Voß, S.: Sales forecasting with partial recurrent
neural networks: empirical insights and benchmarking results. In: 2015 48th Hawaii
International Conference on System Sciences, pp. 1108–1116 (2015)
19. Pati, J., Shukla, K.K.: A comparison of ARIMA, neural network and a hybrid
technique for Debian bug number prediction, pp. 47–53, September 2014
20. Jorgensen, P.C.: Software testing: a craftsman’s approach (2016)
21. Kozik, R., Choraś, M., Puchalski, D., Renk, R.: Q-rapids framework for advanced
data analysis to improve rapid software development. J. Ambient Intell. Hum.
Comput. 10, 1927–1936 (2019)
22. Q-Rapids: Q-rapids H2020 project (2018). http://www.q-rapids.eu/
23. QASymphony: The Cost of Poor Software Quality (2016). https://www.qasymphony.com/blog/cost-poor-software-quality/
24. Rout, A.K., Dash, P.K., Dash, R., Bisoi, R.: Forecasting financial time series using
a low complexity recurrent neural network and evolutionary learning approach. J.
King Saud Univ. - Comput. Inform. Sci. 29(4), 536–552 (2017)
25. Shahin, A.A.: Using multiple seasonal holt-winters exponential smoothing to pre-
dict cloud resource provisioning. CoRR, abs/1701.03296 (2016)
26. Siregar, B., Butar-Butar, I.A., Rahmat, R.F., Andayani, U., Fahmi, F.: Comparison
of exponential smoothing methods in forecasting palm oil real production. J. Phys.:
Conf. Ser. 801(1), 012004 (2017)
27. Wang, J., Wu, Z., Shu, Y., Zhang, Z., Xue, L.: A Study on software reliability
prediction based on triple exponential smoothing method (WIP), pp. 61:1–61:9
(2014)
28. Wang, Z.-H., Lu, C.-Y., Pu, B., Li, G.-W., Guo, Z.-J.: Short-term forecast model of
vehicles volume based on ARIMA seasonal model and Holt-Winters. In: ITM Web of
Conferences, vol. 12, p. 04028 (2017)
29. Yang, D., Sharma, V., Ye, Z., Lim, L.I., Zhao, L., Aryaputera, A.W.: Forecasting
of global horizontal irradiance by exponential smoothing, using decompositions.
Energy 81, 111–119 (2015)
Tag-Cloud Based Recommendation for Movies

Sambo Dutta¹, Soumita Das¹, Joydeep Das²(B), and Subhashis Majumder¹

¹ Department of Computer Science and Engineering, Heritage Institute of Technology, Kolkata, WB, India
sambodutta@gmail.com, soumitamum210@gmail.com, subhashis.majumder@heritageit.edu
² The Heritage Academy, Kolkata, WB, India
joydeep.das@heritageit.edu
1 Introduction
The objective of any recommendation system is to predict the items that a user
may like or dislike among a list of items. Recommendation systems are of two cat-
egories, namely collaborative filtering (CF) and content-based filtering. CF-based
systems rely on past user interactions with the items in the database, such
as user ratings, votes, etc. [16], while content-based systems simply rely on the
analysis of the content associated with users or items [9]. In recent years group
recommendations are becoming popular [1,6]. Due to the diversity of interests
in a single group, it becomes a challenge to make recommendations of items for
c Springer Nature Switzerland AG 2019
K. Saeed et al. (Eds.): CISIM 2019, LNCS 11703, pp. 324–336, 2019.
https://doi.org/10.1007/978-3-030-28957-7_27
a group which will be in accordance with the preferences of most of the group
members. In the literature, there are two common strategies for generating group
recommendations. The first is to generate personalized recommendations for each
group member; these individual recommendation lists are then merged using
different aggregation techniques into a single recommendation list that caters
to the interests of the group as a whole [1]. The second strategy is to employ
the aggregation techniques to generate a group profile by combining individual
preferences [5]. This group profile is then treated as a pseudo user and
recommendations are made using traditional recommendation algorithms. In this
paper, we employ the second strategy.
Most group recommendation systems developed to date implement the CF technique
[1,13] to generate the group profile. However, the semantic information
available in the tags or reviews of the group members has not been fully
explored. In this work, we propose a framework that primarily relies on the tags
assigned to the movies by the users and the frequency of their occurrences. These
tags not only help in capturing the content of the movies but the frequency of
occurrence also helps in estimating the correct weightage of the tags among the
group members. The tags and their corresponding weightages are used to com-
pute the individual tag clouds of the group members and also the tag clouds
of the movies to be considered for recommendation. Following this, a group tag
cloud is generated by combining the individual tag clouds based on aggregation
techniques. This group tag cloud is then compared with the movie tag cloud to
compute the similarity score between the two using the lexical database Word-
Net [3,7]. We name this similarity score as Group Score and on the basis of this
score top-N recommendations are generated for the group. In case of identical
Group Score, movies having a higher Popularity Score are ranked higher. Popu-
larity Score is the measure of the popularity of a movie in terms of the number
of unique users who tagged it. Our recommendation framework with different
modules is depicted in Fig. 1. We have verified our algorithm using Precision
and Normalized Discounted Cumulative Gain (nDCG) metrics on MovieLens-
10M and Movielens-20M datasets. Experimental results show that precision and
nDCG values improve when the similarity between group members is higher.
Finally, we state that our proposed design is not only simple but can also be
easily extended to make group recommendations on items other than movies.
2 Related Work
In the domain of movies, Polylens is the first recommender system that provides
recommendations for groups of users [11]. It is a CF based group recommender
system, where a single neighborhood is formed for the group and then the individ-
ual recommendations are merged based on least misery technique. Research per-
taining to aggregating individual recommendations produced by a CF method to
generate group recommendations was addressed by Baltrunas et al. [1]. Recom-
mendations for a group of users were built using 5 rank aggregation techniques
and were tested for group having 2, 3, 4 and 8 members. Removal of natural
noise (biasness) in ratings becomes a challenge for group recommender systems.
Castro et al. [4] developed a fuzzy tool for noise removal from ratings in order to
provide more accurate group recommendations by incorporating fuzzy profiling,
global and local noise management. Seo et al. [14] added a deviation element
to the aggregation method used for providing group recommendations. Items
with low deviations are preferred over items with high deviations. GroupReM
is a group recommender system that instead of using user ratings, leverages the
semantic information from tags to generate recommendations [12]. Although it
uses word-correlation factor to find movies based on group preferences, it ignores
the weightage of each tag in the individual or the group profile. In our work,
we have employed the tag cloud technique [15] to assign a correct weightage to
each tag, which helps to depict the preferences of individuals or of the group as
a whole accurately. We then go on to find the tag similarity to predict movies to
groups based on the similarity between the group tag cloud and the movie tag
cloud using an appropriate scoring system.
3 Our Framework
The first task of any group recommendation system is to form groups of optimal
size. However, to the best of our knowledge none of the available datasets that
are used to evaluate recommender systems provide built-in groups of users. Our
primary focus is to form cohesive groups having optimal sizes from the users
present in the MovieLens dataset. We have varied the group size between 3 and 8
members, which is comparable to the experimental group sizes used in some of
the earlier works [1,12]. We feel a very large group has a lot of diversity in the
preferences of its members and there would hardly be any similarity between
their choices, thus making the recommendations irrelevant. Since we are only
considering the tags while calculating similarity between the group members,
it is not possible to calculate this similarity directly using Cosine similarity or
Euclidean distance. So, for calculating the similarity between two users (say u1
and u2 ) of the group, we compute the similarity between the tag clouds of the
two users as follows.
Tag-Cloud Based Recommendation for Movies 327
Similarity(u_1, u_2) = \frac{1}{n} \sum_{i=1}^{n} \max_{1 \le j \le m} sim(tag_i^{u_1}, tag_j^{u_2}) \qquad (1)
where, n is the number of tags in the tag cloud of user u1 , m is the number
of tags in the tag cloud of user u2 and sim() is a function that calculates the
similarity between two tags using the lexical database WordNet [7]. Note that,
in general, Similarity(u_1, u_2) ≠ Similarity(u_2, u_1). Equation 1 ensures that
when a tag, say t_1, in the tag cloud of u_1 exactly matches a tag, say t_2, in
the tag cloud of u_2, it is not penalized for the lower similarity between t_1
and any other tag in the tag cloud of user u_2. Thus two users with exactly the
same tag clouds will always have a similarity of 1. Note that the similarity
value for an ordered pair of users will always lie in the range [0, 1]. Finally,
for the Group
Similarity, we take an average of all the user pair similarities. This ensures that
the Group Similarity is also in the range [0, 1]. In our work, we randomly form
groups of a predetermined number of users to verify our algorithm.
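Eq. 1 can be sketched as follows; the WordNet-based sim() is replaced here by a toy exact-match similarity, so the numbers only illustrate the mechanics, not the values the real system would produce:

```python
def tag_sim(t1, t2):
    """Stand-in for the WordNet-based sim(): 1.0 on exact match, else 0.0."""
    return 1.0 if t1 == t2 else 0.0

def user_similarity(tags_u1, tags_u2):
    """Eq. 1: average, over u1's tags, of the best match among u2's tags."""
    return sum(max(tag_sim(t1, t2) for t2 in tags_u2)
               for t1 in tags_u1) / len(tags_u1)

print(user_similarity(['classic'], ['classic', 'horror']))  # 1.0
print(user_similarity(['classic', 'horror'], ['classic']))  # 0.5
```

With this toy sim(), identical tag clouds yield 1.0, and the value depends on the order of the pair, since the average runs over the first user's tags.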
A tag cloud of an individual user consists of the tags used by that user and
their corresponding frequencies. Once we find the most frequently used tag,
along with its frequency of occurrence, we use this frequency as
max_uTag_frequency for that user. Subsequently, we calculate the weightage
for each tag in the tag cloud of the user userNum using the following formula:
Weightage(userNum, usertag) = \frac{frequency_{usertag}}{max\_uTag\_frequency} \qquad (2)
where frequency_usertag is the frequency of the tag. By assigning this weightage
we have a measure of the relevance of each tag to the user. Equation 2 gives us
a tag weightage which lies in the range [0, 1]. Table 1 shows a sample user tag
cloud, in which classic is the most frequently used tag, so max_uTag_frequency
is 6; the weightage of each tag is obtained by dividing its individual frequency
by 6. Using the above procedure we form individual tag clouds for all the group
members.
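A sketch of Eq. 2 on a user tag cloud; classic with frequency 6 follows the Table 1 example, while the other tags and frequencies are made up for illustration:

```python
def tag_weightages(tag_freqs):
    """Eq. 2: each tag's weightage is its frequency divided by the maximum
    frequency (max_uTag_frequency) in the user's tag cloud."""
    max_f = max(tag_freqs.values())
    return {tag: f / max_f for tag, f in tag_freqs.items()}

# 'classic': 6 matches Table 1; 'horror' and 'book' are hypothetical.
cloud = tag_weightages({'classic': 6, 'horror': 3, 'book': 2})
print(cloud['classic'], cloud['horror'])  # 1.0 0.5
```

The most frequent tag always gets weightage 1, so every cloud is normalized into [0, 1] regardless of how prolific the user is.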
In this module, we merge all the individual tag clouds of the group members to
form the group tag cloud. First we list the tag clouds belonging to a group and
then add the weightages of each tag assigned by the group members. After this
addition operation we divide this group weightage of each individual tag in the
group tag cloud by the total number of members in the group to get the average
weightage for the group as shown in Eq. 3. This ensures that all the weightages
of the tags lie in the range [0, 1].
Group\_Weightage(usertag) = \frac{\sum_{i=1}^{n} Weightage(i, usertag)}{n} \qquad (3)
where usertag is a tag in the group tag cloud, n is the number of group members,
and Weightage(i, usertag) maps the weightage of that particular tag in the tag
cloud of user i using Eq. 2. If a tag is not present in the tag cloud of user i, its
weightage is considered to be 0. Figure 2 shows an example: three user tag
clouds U1, U2 and U3 and the corresponding group tag cloud G generated by
merging them. The tag classic occurs in the tag clouds of both U1 and U3 but is
not present in the tag cloud of U2. We calculate its weightage in the group tag
cloud by adding the weightages of classic for users U1 and U3, and then dividing
the sum by the number of users in the group, which is 3. So the group weightage
for classic is (1 + 0 + 0.16)/3 = 0.38. In spite of being the most used tag in
the user tag cloud U1, the weightage of classic is reduced in the group since it
has a low weightage for U3 and is absent from the tag cloud of U2. Thus we
obtain a normalized weightage measure for all the tags in the group tag cloud.
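A sketch of the merge in Eq. 3, using the classic example above (weightage 1 for U1, absent from U2, 0.16 for U3); the horror tag for U2 is a made-up placeholder:

```python
def group_weightage(user_clouds):
    """Eq. 3: average each tag's weightage over all members,
    counting an absent tag as weightage 0."""
    n = len(user_clouds)
    tags = set()
    for cloud in user_clouds:
        tags.update(cloud)
    return {tag: sum(cloud.get(tag, 0.0) for cloud in user_clouds) / n
            for tag in tags}

u1 = {'classic': 1.0}
u2 = {'horror': 1.0}        # no 'classic' tag: contributes 0
u3 = {'classic': 0.16}
g = group_weightage([u1, u2, u3])   # g['classic'] == (1 + 0 + 0.16) / 3
```

Dividing by the full group size n (not just the number of members using the tag) is what pulls down tags that only a minority of the group cares about.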
We first list all the movies in the dataset which have not been watched by any
of the group members. Next we develop a scoring system for all the unwatched
movies and rank them. To calculate the Group Score for an unwatched movie we
Tag-Cloud Based Recommendation for Movies 329
need to compare the similarity in content of the group tag cloud and the movie
tag cloud. Hence we need to generate the movie tag cloud first. The procedure
for forming the movie tag cloud is similar to the one used for forming an
individual user tag cloud. Here, all the tags ever used by any user to describe
a particular movie are listed. Following this, the frequency of each unique tag
is calculated, and the frequency of the most frequently used tag in the tag
cloud of the movie is used as max_mTag_frequency for that movie. The weightage
for each tag in the tag cloud of the movie movieNum is then calculated using
the following formula:
Weightage(movieNum, movietag) = \frac{frequency_{movietag}}{max\_mTag\_frequency} \qquad (4)
where frequency_movietag corresponds to the frequency associated with this tag.
This weightage forms the basis for measuring the relevance of each tag assigned
to the movie. Equation 4 gives us a weightage for all the tags in the range [0, 1].
An example of a movie tag cloud is shown in Table 2.
where n and m are the total number of tags in the group tag cloud and the movie
tag cloud respectively, sim() calculates the similarity between two tags using
WordNet, and the Group_Weightage(group_tag_i) and Weightage(mov, movie_tag_j)
functions map the weightages of the corresponding tags from the group and the
movie tag cloud using Eqs. 3 and 4 respectively. Although two tags might be an
exact match, giving the result 1, if the weightages of these two tags are low
the Group Score will also be low. Thus Eq. 5 captures how similar a particular
movie tag is to a group tag, together with their respective relevance to the
movie tag cloud and the group tag cloud. Note that the Group Score will again be
in the range [0, 1].
3.6 Recommendation
We recommend a top-N movie list to the members of the group on the basis of
the Group Score. However, a situation may arise where more than one movie has
the same Group Score. To overcome this, we compute the Popularity Score of each
movie in order to rank more popular movies higher. The Popularity Score of a
movie refers to the number of unique users who have tagged the movie. After
calculating the popularity scores we normalize these values for all movies in
the list L to keep them in the range [0, 1] using the formula below.
PopularityScore_l = \frac{l - L_{min}}{L_{max} - L_{min}} \qquad (6)
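Eq. 6 is a standard min-max normalization over the candidate list; a sketch over hypothetical tagger counts:

```python
def normalize_popularity(counts):
    """Eq. 6: min-max normalize the raw counts of unique taggers
    over the candidate list L into [0, 1]."""
    lo, hi = min(counts), max(counts)
    if hi == lo:
        return [0.0] * len(counts)  # all movies equally popular
    return [(c - lo) / (hi - lo) for c in counts]

print(normalize_popularity([2, 6, 10]))  # [0.0, 0.5, 1.0]
```

The guard for a constant list is an added assumption; Eq. 6 itself leaves that case undefined.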
4 Data Description
4.1 Dataset
We have conducted experiments on the MovieLens-10M and 20M datasets [8].
MovieLens-10M dataset contains 10000054 ratings and 95580 tags applied to
10681 movies by 71567 users while MovieLens-20M has 20000263 ratings and
465564 tags assigned by 138493 users to 27278 movies. In both the datasets only
those users are included who have rated at least 20 movies. Since the datasets
do not have any pre-defined groups, we use our own group formation procedure
to form groups and subsequently provide recommendations to the groups.
4.2 Preprocessing
We observed two important features in the MovieLens dataset: (1) Instead of
assigning tags, about 30% of the users had used complete sentences to describe
a particular movie, and (2) There were some irrelevant tags like dialogues of the
movie, or an expression used to describe the movie. To solve these problems,
we used Natural Language Toolkit [2] to first remove the stop-words like the,
is, which, etc. and then extract nouns and adjectives from the sentences. An
example is shown below.
Original tag: This was based on a book.
Tags generated: book.
Original tag: I was scared by the horror
Tags generated: horror.
tp (true positives) is the number of recommended items that were relevant to the
group based on the training dataset, while fp (false positives) is the number of
irrelevant items that have been recommended. We have measured precision@10 to
evaluate the effectiveness of the Top-10 recommended items.
The nDCG_N score imposes a penalty, or discount, on relevant items that are
ranked lower. This penalty is logarithmically proportional to the relative
position of each relevant item in a ranked list (see Eq. 9). We have used
nDCG_10 to evaluate the performance of the Top-10 recommended movies. A high
nDCG_10 score implies that, in the Top-10 ranked list, the movies relevant to
the preferences of the group members are ranked higher, which makes the
recommendations more effective. This metric is defined below.
nDCG_{10} = \frac{1}{P} \sum_{i=1}^{P} \frac{1}{Q_i} \sum_{k=1}^{Q_i} \frac{DCG_{10,k}}{IDCG_{10,k}} \qquad (8)

DCG_{10,k} = \sum_{j=1}^{10} \frac{2^{relevance_j} - 1}{\log_2(1 + j)} \qquad (9)
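The discounted gain of Eq. 9 and its per-list normalization can be sketched as follows (Eq. 8 then averages this value over groups); the relevance vectors below are illustrative:

```python
import math

def dcg_at_10(relevances):
    """Eq. 9: graded gain 2^rel - 1, discounted by log2(1 + position)."""
    return sum((2 ** rel - 1) / math.log2(1 + j)
               for j, rel in enumerate(relevances[:10], start=1))

def ndcg_at_10(relevances):
    """Normalize by the DCG of the ideal (descending-relevance) ordering."""
    idcg = dcg_at_10(sorted(relevances, reverse=True))
    return dcg_at_10(relevances) / idcg if idcg > 0 else 0.0

print(ndcg_at_10([3, 2, 1, 0]))  # 1.0 — the list is already ideally ranked
```

Moving a relevant item lower in the list shrinks its discounted gain, so any non-ideal ordering scores strictly below 1.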
To demonstrate the accuracy and efficiency of our framework over other
content-based and memory-based approaches, we conduct further experiments. We
compare our approach, TC+Popularity, with TC Without Popularity, Equal
Weightage, Collaborative Filtering and Popularity. For each
approach, groups with members ranging from 3 to 8 have been formed. The
comparison has been further classified into dissimilar, moderately similar and
highly similar groups. For each comparison based on Precision and nDCG, we have
formed 100 groups with a pre-defined degree of cohesiveness and group size.
Thus, in total, 1800 [100 × 3 (group categories) × 6 (different group sizes)]
groups have been formed across the MovieLens-10M and MovieLens-20M datasets.
TC+Popularity consists of two core components. The first uses the tag cloud
technique to form the group tag cloud representing the preferences of the group
members, and the movie tag cloud representing the content of a particular movie.
The Group Score is assigned by comparing the group tag cloud and the movie tag
cloud. The second component is the Popularity Score, which is used to rank
popular movies higher in case of identical Group Scores.
We compare the above technique with TC Without Popularity, which does not
include the Popularity Score feature; there, a highly popular movie may find
itself below a less popular movie in case of an identical Group Score in the
recommended list. In the Equal Weightage technique, tags are not assigned any weightage on