Zhang2015 Pre Production Phase Paper
metrics for measuring "best". These generally measure the homogeneity of the target variable within the subsets.

TABLE II. CART MODEL: CLASSIFICATION THRESHOLDS
CLASS              REVENUE RANGE (¥ MILLION)
A (BLOCKBUSTER)    1000+
B                  600-1000
C                  400-600
D                  200-400
E                  100-200
F                  50-100
G (FLOP)           <50

A. Gini impurity
Gini impurity is used by the CART algorithm. It measures how often a randomly chosen element from the set would be incorrectly labeled if it were labeled randomly according to the distribution of labels in the subset. Gini impurity serves as the splitting criterion. It is computed by summing, over all labels, the probability of an item with that label being chosen times the probability of a mistake in categorizing that item. It reaches its minimum (zero) when all cases in the node fall into a single target category.
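The computation described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation (the study itself used IBM SPSS Modeler). The function names `gini_impurity` and `best_split` are hypothetical. The threshold search over a single numeric feature follows the textbook CART rule of choosing the binary split `value <= threshold` that minimizes the size-weighted Gini impurity of the two child nodes.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity: sum over labels of f * (1 - f), where f is the label's
    fraction in `labels` (equivalently 1 - sum of f**2). Empty input -> 0.0."""
    n = len(labels)
    if n == 0:
        return 0.0
    return sum((c / n) * (1 - c / n) for c in Counter(labels).values())


def best_split(values, labels):
    """Search binary splits `value <= threshold` on one numeric feature and
    return (threshold, weighted_gini) minimizing the size-weighted Gini
    impurity of the two child nodes. Returns (None, parent_gini) if no split
    improves on the unsplit node."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, gini_impurity(labels))  # baseline: do not split
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # identical values cannot be separated by a threshold
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        score = (len(left) * gini_impurity(left)
                 + len(right) * gini_impurity(right)) / n
        if score < best[1]:
            midpoint = (pairs[i - 1][0] + pairs[i][0]) / 2  # threshold between neighbours
            best = (midpoint, score)
    return best


if __name__ == "__main__":
    print(gini_impurity(["A", "A", "G", "G"]))  # 0.5
    print(best_split([100, 200, 15000, 20000], ["G", "G", "A", "A"]))  # (7600.0, 0.0)
```

In the usage example, the values stand for a hypothetical fan-count feature and the labels for TABLE II classes. Splitting at 7600 separates the two "G" films from the two "A" films, producing two pure nodes with zero impurity.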
To compute Gini impurity for a set of items with m classes, let L ∈ {1, 2, …, m} and let f_L be the fraction of items labeled with value L in the set [10]. Gini impurity is then computed by formula (1):

$$ I_G(f) = \sum_{L=1}^{m} f_L \,(1 - f_L) = 1 - \sum_{L=1}^{m} f_L^{2} \tag{1} $$

V. BOX-OFFICE FORECASTING
As mentioned above, we chose two sets of films:
- Training set: the top 100 films of 2014 by box-office revenue.
- Testing set: the top 50 films of the first five months of 2015.
Data were collected and converted according to the variables listed in TABLE I. These 6 variables are transformed into 34 data points, which serve as inputs to our Classification and Regression Tree (CART) model.

TABLE I. VARIABLES
VARIABLES                                      POSSIBLE VALUES
Production country                             China, America, Hong Kong, Taiwan, others
First genre                                    Action, Plot, Adventure, Comedy, Thriller, Animation, Love
Second genre                                   Comedy, Adventure, Fantasy, Action, Thriller, Suspense, Family, Love, Sci-fi, Animation, Costume, War, Crime, Terror, Children
Seasonality                                    New Year, Labor Day, Summer Vacation, National Day, others
Mean of director and first two actors' fans    An integer from 0 to 33890
Mean of director and first five actors' fans   An integer from 0 to 19283

In our CART-based model, we convert revenue forecasting from a point-estimation problem into a classification problem. Each movie is classified into one of seven classes, from "flop" to "blockbuster", as shown in TABLE II. Grouping films in this way lets a CART model be trained to recognize the elements, and combinations of elements, that have predictive value across similarly performing films.

A. Performance metrics
Our model uses the average percent hit rate (APHR) [11] as its accuracy metric, calculated by formula (2). This metric is arguably the most intuitive way to estimate a model's predictive performance. APHR is the ratio of correctly classified samples to the total number of samples, averaged over all classes in the classification problem. It is more commonly known as classification accuracy. As the formula shows, the larger the value, the better the model's classification performance.

$$ \mathrm{APHR} = \frac{\text{Number of Samples Correctly Classified}}{\text{Total Number of Samples}} \tag{2} $$

B. Training model
This study uses IBM SPSS Modeler. We set the 6 variables as inputs and the box-office revenue class as the output. The CART model is configured in enhanced-accuracy mode, which generates a sequence of models (boosting) to obtain more accurate predictions. The maximum tree depth of 7 was selected by testing. After training, the fitting degree (APHR) on the training set is 99%. The importance of the predictive variables is shown in TABLE III.

TABLE III. IMPORTANCE OF VARIABLES
VARIABLES                                      WEIGHT OF IMPORTANCE
Production country                             0.04
First genre                                    0.18
Second genre                                   0.04
Seasonality                                    0.20
Mean of director and first two actors' fans    0.21
Mean of director and first five actors' fans   0.33

The weights of importance indicate that the mean of the director's and first five actors' fans has the greatest influence, with a weight of 0.33. This weight is larger than that of the mean of the director's and first two actors' fans (0.21). This suggests that the more actors are considered, the more comprehensively star value is quantified. Seasonality cannot be ignored: its weight of importance is 0.20, which confirms the importance of seasonality in determining the box-office performance of films in China. The first genre (0.18) is more important than the second genre (0.04), meaning that the first genre can better
represent the main type of a film. Production country, by contrast, has the least influence (0.04, the same weight as the second genre).

This paper also compares the prediction performance of the CART model with that of other models on the training set. As TABLE IV shows, the CART model clearly achieves the best prediction performance, so our study uses CART to predict box office.

TABLE IV. COMPARISON OF MODELS
MODEL               APHR
CART                99%
Bayesian Network    86%
SVM                 77%
C5.0                60%
NNs                 40%

C. Forecasting Results
The top 50 films of the first five months of 2015 serve as the testing set, and their classes are predicted with the trained CART model. The results are shown in TABLE V. The AVERAGE row is the unweighted mean of the seven per-class hit rates.

TABLE V. FORECAST RESULTS
CLASS      TRAINING APHR (%)    TESTING APHR (%)
A          100                  100
B          87.5                 75
C          100                  83.3
D          100                  80
E          100                  75
F          100                  100
G          100                  63.2
AVERAGE    98.2                 82
TOTAL      99                   76

The results indicate that our model performs very well on both the training and testing sets. The fitting degree on the training set is 99%, and the prediction accuracy on the testing set is 76%. By applying CART with a similar methodology to our data, we improve classification accuracy in two comparisons:
- from the 56.01% APHR benchmark previously established by Sharda & Delen [3] to 76% APHR;
- from the 74.4% APHR established by M. Ghiassi & David Lio [4] to 76%.
Our study therefore has practical value and offers innovation. To some extent, it can help filmmakers avoid risk and provide them with guidance. The box-office revenue forecasting model introduced in this research, with its superior accuracy, establishes a scientific decision-support tool for stakeholders in the movie industry. It offers them a rational, practical advantage.

The study is the first to use film fans as a quantization method for star and director value. The weights of importance of the predictive variables indicate that the mean of the director's and first five actors' fans has the greatest influence, with a weight of 0.33. It can therefore be concluded that this method is very effective.

VI. CONCLUSIONS
This paper converts revenue forecasting from a point-estimation problem into a classification problem. In our study, 6 basic film variables, expanded into 34 data points, are used as inputs. The study is the first to use film fans to quantify star and director value, and this method is verified to be effective. After comparing different models, we chose the CART algorithm. The trained model can predict the box-office level at an early stage of a film's production, with a prediction accuracy as high as 76%. This can provide a decision-making reference for filmmakers and reduce investment risk. In future research, we will add data indices to improve prediction precision, such as production company, distribution company, and film format. We will also apply more models.

ACKNOWLEDGMENT
This paper is financially supported by the following:
- Engineering Planning Project of Communication University of China (XNG1356)
- Engineering Planning Project of Communication University of China (XNG1412)
- Outstanding Young Teacher Training Project of Communication University of China (YXJS201527)
- National Science Foundation of China (71172040)

REFERENCES
[1] Yan Wang, Tianxin Jin, "Marketing and risk assessment under the dual perspective of movie box office forecasting," The Chinese Film Market, no. 3, pp. 11-12, 2012.
[2] B. R. Litman, "Predicting success of theatrical movies: An empirical study," Journal of Popular Culture, vol. 16, no. 4, pp. 159-175, 1983.
[3] R. Sharda, D. Delen, "Predicting box-office success of motion pictures with neural networks," Expert Systems with Applications, vol. 30, pp. 243-254, 2006.
[4] M. Ghiassi, David Lio, Brian Moon, "Pre-production forecasting of movie revenues with a dynamic artificial neural network," Expert Systems with Applications, vol. 42, pp. 3176-3193, 2015.
[5] Reggie Panaligan, Andrea Chen, "Quantifying Movie Magic with Google Search" [EB/OL], Google Whitepaper, Industry Perspective + User Insights, http://www.google.com.au/think/research-tudies/quantifying-movie-magic.html, June 2013.
[6] Jingfei Du, Hua Xu, Xiaoqiu Huang, "Box office prediction based on microblog," Expert Systems with Applications, vol. 41, pp. 1680-1689, 2014.
[7] Lian Wang, Jian-min Jia, "Forecasting box office performance based on online search: Evidence from Chinese movie industry," Systems Engineering-Theory & Practice, vol. 34, no. 12, Dec. 2014.
[8] L. Einav, "Seasonality in the U.S. motion picture industry," The RAND Journal of Economics, vol. 38, no. 1, pp. 127-145, 2007.
[9] Machine Learning in Action, Post & Telecom Press, p. 160, 2013.
[10] Data Mining: Concepts, Models, Methods, and Algorithms, 2nd ed., Tsinghua University Press, p. 146, 2013.
[11] Li Zhang, Jianhua Luo, Suying Yang, "Forecasting box office revenue of movies with BP neural network," Expert Systems with Applications, vol. 36, pp. 6580-6587, 2008.