
Bollywood Movies Box Office Collection Analysis by Linear Regression Model
KESHAV JANGID 2016ME20782
Objectives and Relevance
• Objective – Analyze the box office collection of Bollywood movies released in 2018 against factors such as budget, audience review, and release timing, using multiple regression

• Relevance – As movies are one of the biggest sources of entertainment in the present decade, many people want to invest in a movie or become film producers

• Better planning of movies will result in a better return on investment

• The factors I will be incorporating are the movie’s budget, user review (IMDB rating), the number of movies releasing simultaneously on the same day, and the number of releases in that particular month
Data Description
• I collected data for Bollywood movies released in 2018: release date, gross box office collection in India, budget, and IMDB rating

• Data was collected from various sources, including Wikipedia.com, BoxOfficeCollection.com, jackace.com, and sacnilk.com

• No accurate measure was available to capture the hype around a movie
Causes of Error
• Many filmmakers don’t release the exact budget, so its estimate might be wrong
• Hype of the movie is not incorporated
• Effect of the presence of superstar actors is not incorporated in the analysis
• Presence of Hollywood and regional movies doing well in Indian theaters is not considered
• Effect of release on festivals, holidays, and extended weekends is not incorporated
• Audience genre preference is not incorporated
• Effect of the presence of big and famous production houses is not incorporated
Data
Columns: Name; Date of Release; Budget (in Cr.); Gross Indian Collection (in Cr.); IMDB (out of 10); ROI (Gross Collection / Budget); No. of Simultaneous Releases; No. of Movie Releases in that Month
1921 12-Jan 15 19.69 4.2 1.312667 3 5
Kaalakaandi 12-Jan 18 6.15 6.2 0.341667 3 5
Mukkabaaz 12-Jan 12 13.67 8.1 1.139167 3 5
Vodka Diaries 19-Jan 3 1.26 5.6 0.42 1 5
Padmaavat 25-Jan 215 360.89 7 1.678558 1 5
Pad Man 09-Feb 76 100.53 8 1.322763 1 3
Aiyaary 16-Feb 57 22.33 5.2 0.391754 1 3
Sonu Ke titu ki sweety 23-Feb 40 128.85 7.1 3.22125 1 3
Pari 02-Mar 21 35.15 6.6 1.67381 2 8
Veerey ki Wedding 02-Mar 16 3.65 2.8 0.228125 2 8
Hate Story 4 09-Mar 17 25.64 3.3 1.508235 3 8
Dil Junglee 09-Mar 13 1.47 4 0.113077 3 8
3 Storeys 09-Mar 11 2.84 7.1 0.258182 3 8
Raid 16-Mar 72 125.66 7.4 1.745278 1 8
Hichki 23-Mar 20 59.13 7.5 2.9565 1 8
Baaghi 2 30-Mar 59 205.44 5 3.482034 1 8
Blackmail 06-Apr 18 28.81 7 1.600556 1 5
October 13-Apr 33 50 7.5 1.515152 1 5
Beyond the Clouds 20-Apr 7 2.1 6.9 0.3 2 5
Nanu Ki janu 20-Apr 15 4.12 5 0.274667 2 5

Daas Dev 27-Apr 15 1.4 5.1 0.093333 1 5

102 Not out 04-May 35 50 7.5 1.428571 2 5

Omerta 04-May 12 4 7 0.333333 2 5

Raazi 11-May 38 158 7.8 4.157895 1 5

Parmanu 25-May 44 65 7.7 1.477273 2 5

Bioscopewaala 25-May 4 0.55 7.7 0.1375 2 5

Veerey di Wedding 01-Jun 42 102 3.2 2.428571 2 4

Bhavesh Joshi Superhero 01-Jun 20 1.5 7.6 0.075 2 4

Race 3 15-Jun 185 213 2 1.151351 1 4

Sanju 29-Jun 96 430.84 7.8 4.487917 1 4

Soorma 13-Jul 31 42 7.4 1.354839 1 4

Dhadak 20-Jul 41 95 4.5 2.317073 1 4

Saheb Biwi aur Gangster 3 27-Jul 20 6.6 4.3 0.33 2 4

Nawabzade 27-Jul 8 4.5 4.7 0.5625 2 4

Fanney Khan 03-Aug 38 16.7 4.4 0.439474 3 9

Mulk 03-Aug 20 42 6.7 2.1 3 9

Karwaan 03-Aug 23 26.42 7.5 1.148696 3 9

Gold 15-Aug 70 110 7.3 1.571429 2 9

Satyamev Jayate 15-Aug 45 104 5.6 2.311111 2 9

Happy Phir bhaag jaayegi 24-Aug 30 21 4.5 0.7 2 9

Genius 24-Aug 20 4.3 4.6 0.215 2 9

Yamla Pagla Deewana Phir se 31-Aug 36 10 4.8 0.277778 2 9

Stree 31-Aug 30 167 7.7 5.566667 2 9

Paltan 07-Sep 14 10 5.2 0.714286 2 9


Laila Majnu 07-Sep 8 3 7.7 0.375 2 9

Manmarziyaan 14-Sep 30 36 6.9 1.2 3 9

Love Sonia 14-Sep 20 9 7.4 0.45 3 9

Mitron 14-Sep 15 4 6.9 0.266667 3 9

Batti Gul Meter Chalu 21-Sep 49 52.4 6.1 1.069388 2 9

Manto 21-Sep 8 4 7.4 0.5 2 9

Sui Dhaaga 28-Sep 48 80 6.8 1.666667 2 9

Patakha 28-Sep 15 7 7.2 0.466667 2 9

Loveyatri 05-Oct 27 12 3 0.444444 2 10

Andhadhun 05-Oct 32 96 8.4 3 2 10

Jalebi 12-Oct 8 2.5 6.3 0.3125 4 10

Helicopter eela 12-Oct 16 4.5 5.5 0.28125 4 10

FryDay 12-Oct 10 1.5 5.4 0.15 4 10

Tumbbad 12-Oct 6 13 8.2 2.166667 4 10

Namaste England 18-Oct 54 8 1.7 0.148148 2 10

Badhaai ho 18-Oct 29 176 8 6.068966 2 10

Baazar 26-Oct 40 24.8 6.7 0.62 2 10

5 weddings 26-Oct 10 1 3.2 0.1 2 10

Thugs of Hindostan 08-Nov 310 194 4 0.625806 1 4

Mohalla Assi 16-Nov 8.5 2.5 6.9 0.294118 2 4

Pihu 16-Nov 2 3 6.7 1.5 2 4


Bhaiaji Superhit 23-Nov 50 8.2 4.1 0.164 1 4

Kedarnath 07-Dec 60 85 6.1 1.416667 1 3

Zero 21-Dec 210 120 5.5 0.571429 1 3

Simmba 28-Dec 110 295 6.3 2.681818 1 3
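
The derived columns of the table (ROI, number of simultaneous releases, releases in that month) can be recomputed from the raw columns. The sketch below is a minimal illustration on the five January rows only; the function name `derive_columns` and the tuple layout are assumptions made for this example, not part of the original analysis.

```python
from collections import Counter

# Five rows copied from the table: (name, release date, budget in Cr., gross in Cr.)
movies = [
    ("1921", "12-Jan", 15, 19.69),
    ("Kaalakaandi", "12-Jan", 18, 6.15),
    ("Mukkabaaz", "12-Jan", 12, 13.67),
    ("Vodka Diaries", "19-Jan", 3, 1.26),
    ("Padmaavat", "25-Jan", 215, 360.89),
]


def derive_columns(rows):
    """Return one dict per movie with ROI and the two release-count columns."""
    per_date = Counter(date for _, date, _, _ in rows)
    per_month = Counter(date.split("-")[1] for _, date, _, _ in rows)
    derived = []
    for name, date, budget, gross in rows:
        derived.append({
            "name": name,
            "roi": gross / budget,                      # ROI = gross collection / budget
            "same_day": per_date[date],                 # movies releasing on that date
            "same_month": per_month[date.split("-")[1]],  # movies releasing in that month
        })
    return derived


for row in derive_columns(movies):
    print(row["name"], round(row["roi"], 6), row["same_day"], row["same_month"])
```

For these rows the computed values agree with the ROI and count columns listed in the table.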


Preliminary Analysis

Scatter Plot of Box Office Collection (in Cr.) vs Budget (in Cr.) of the movie
Scatter Plot of log(Box Office Collection) vs log(Budget)
Fitted line: y = 1.422x - 3.6693, R² = 0.6492 (x = log(Budget), y = log(Box Office Collection))
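
A straight-line fit on log-transformed values, like the one reported for this plot, can be sketched as follows. This is only an illustration: it uses six rows from the table, assumes base-10 logs of the values in crores (the slide does not state the base or the units), and therefore will not reproduce the reported coefficients. The slope does not depend on the log base or the unit, but the intercept does. `fit_line` is a helper name chosen for this example.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b, r_squared)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)
    return intercept, slope, r_squared


# (budget, gross) in Cr. for a handful of rows from the table
sample = [(15, 19.69), (18, 6.15), (215, 360.89), (76, 100.53), (3, 1.26), (57, 22.33)]
log_budget = [math.log10(b) for b, _ in sample]
log_gross = [math.log10(g) for _, g in sample]

intercept, slope, r2 = fit_line(log_budget, log_gross)
print(f"log(gross) = {slope:.3f} * log(budget) + {intercept:.3f}, R^2 = {r2:.3f}")
```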
Scatter Plot of Return of Investment vs IMDB Rating
MATRIX PLOT

Linear Regression Analysis
• I will fit 3 models and compare them to find the most applicable one

• First, I will regress on all the variables and analyze the result

• Then I will drop the factors whose p-value is greater than 0.05

• Then Finally I will also remove the variable for which p value is greater
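
The procedure above (fit all variables, then drop those that are not significant) can be sketched in plain Python. This is a hedged illustration, not the code behind the slides: it uses raw gross collection as Y (the slides do not say whether Y was transformed), only ten rows from the table, a normal approximation in place of the exact t-distribution p-values, and it drops one variable at a time. The helper names `invert`, `ols`, and `backward_eliminate` are invented for this example.

```python
from statistics import NormalDist


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(matrix)
    aug = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def ols(columns, y):
    """Fit y on an intercept plus the given predictor columns.

    Returns (coefficients, approximate two-sided p-values), intercept first.
    """
    n, k = len(y), len(columns) + 1
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    xtx = [[sum(X[r][a] * X[r][b] for r in range(n)) for b in range(k)] for a in range(k)]
    xty = [sum(X[r][a] * y[r] for r in range(n)) for a in range(k)]
    inv = invert(xtx)
    beta = [sum(inv[a][b] * xty[b] for b in range(k)) for a in range(k)]
    resid = [y[r] - sum(X[r][a] * beta[a] for a in range(k)) for r in range(n)]
    sigma2 = sum(e * e for e in resid) / (n - k)
    p_values = []
    for a in range(k):
        t_stat = beta[a] / (sigma2 * inv[a][a]) ** 0.5
        # Normal approximation to the t distribution (an assumption of this sketch)
        p_values.append(2 * (1 - NormalDist().cdf(abs(t_stat))))
    return beta, p_values


def backward_eliminate(predictors, y, alpha=0.05):
    """Drop the least significant predictor until all p-values are <= alpha."""
    kept = dict(predictors)
    while kept:
        _, p_values = ols(list(kept.values()), y)
        worst_p, worst_name = max(zip(p_values[1:], kept))
        if worst_p <= alpha:
            break
        del kept[worst_name]
    return list(kept)


rows = [  # (budget, imdb, same_day, same_month, gross) copied from the table
    (15, 4.2, 3, 5, 19.69), (215, 7.0, 1, 5, 360.89), (76, 8.0, 1, 3, 100.53),
    (21, 6.6, 2, 8, 35.15), (72, 7.4, 1, 8, 125.66), (185, 2.0, 1, 4, 213.0),
    (30, 7.7, 2, 9, 167.0), (6, 8.2, 4, 10, 13.0), (310, 4.0, 1, 4, 194.0),
    (110, 6.3, 1, 3, 295.0),
]
names = ["budget", "imdb", "same_day", "same_month"]
predictors = {name: [row[i] for row in rows] for i, name in enumerate(names)}
gross = [row[4] for row in rows]
print(backward_eliminate(predictors, gross))
```

With only ten rows, the variables kept by this sketch need not match the models reported in the slides; the full table and exact t-tests would be needed for that.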
Model 1: Using all variables

Model:
Y = 𝛽1 + 𝛽2X2 + 𝛽3X3 + 𝛽4X4 + 𝛽5X5

From the t-tests, budget, IMDB rating, and the number of movies releasing on that date are statistically significant
Model 2: Using 3 variables (budget, IMDB rating, and number of movies releasing on that date)

Model:
Y = 𝛽1 + 𝛽2X2 + 𝛽3X3 + 𝛽4X4

Budget, IMDB rating, and the number of movies releasing on that date are statistically significant here
Model 3: Using 2 variables (budget and IMDB rating)

Model:
Y = 𝛽1 + 𝛽2X2 + 𝛽3X3
OLS Condition Satisfaction

• Model 2:
  - Y = 𝛽1 + 𝛽2X2 + 𝛽3X3 + 𝛽4X4
  - Adjusted R² = 0.5313
  - Multiple R² = 0.552
  - Correlation of residuals with different variables

• All of these correlations are very small, except the one with box office collection (expected, since the dependent variable itself contains the residual term)
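
The two R² figures reported for Model 2 can be cross-checked with the standard adjusted R² formula. The sketch below assumes that all 69 movies in the data table were used in the regression; the slides do not state the sample size explicitly.

```python
def adjusted_r2(r2, n_obs, n_predictors):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1)."""
    return 1 - (1 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)


# 0.552 is the multiple R^2 reported for Model 2, which has 3 predictors.
# n = 69 is the number of movie rows in the data table (assumed sample size).
print(round(adjusted_r2(0.552, 69, 3), 4))  # -> 0.5313
```

Under that assumption the result matches the reported adjusted R² of 0.5313.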
Multicollinearity
VIF for model 1

VIF for model 2

VIF for model 3

As all VIF values are less than 10, multicollinearity is not a concern
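
For reference, the VIF of a predictor is 1 / (1 - R²), where R² comes from regressing that predictor on the other predictors. With only two predictors (as in Model 3) this reduces to 1 / (1 - r²), with r the correlation between them. The sketch below illustrates that special case on eight rows from the table; it does not reproduce the VIF values from the slides, and the function names are chosen for this example.

```python
def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def vif_two_predictors(x1, x2):
    """VIF shared by both predictors of a two-predictor model: 1 / (1 - r^2)."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r * r)


# Budget (in Cr.) and IMDB rating for the first eight movies in the table
budget = [15, 18, 12, 3, 215, 76, 57, 40]
imdb = [4.2, 6.2, 8.1, 5.6, 7.0, 8.0, 5.2, 7.1]
print(round(vif_two_predictors(budget, imdb), 3))
```

A value close to 1 means the two predictors are nearly uncorrelated; the threshold of 10 used above is a common rule of thumb.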


Heteroscedasticity

From the Breusch-Pagan (B-P) test, the p-value is less than 0.05.
Hence we reject the null hypothesis of homoscedasticity.
Thus heteroscedasticity is present
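
The idea behind the B-P test can be sketched for the simplest case of one predictor: regress Y on X, regress the squared residuals on X, and compare LM = n × R² of that auxiliary regression against a chi-square distribution with 1 degree of freedom. This is the studentized (Koenker) form and a one-predictor simplification assumed for illustration; the slides do not say which variant or software was used, and the regression tested there has more than one predictor.

```python
import math


def simple_ols(xs, ys):
    """Fit y = a + b*x; return (fitted values, R^2)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    intercept = mean_y - slope * mean_x
    fitted = [intercept + slope * x for x in xs]
    ss_res = sum((y - f) ** 2 for y, f in zip(ys, fitted))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return fitted, 1 - ss_res / ss_tot


def chi2_sf_df1(x):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2))


def breusch_pagan_one_predictor(xs, ys):
    """Return (LM statistic, p-value) for the one-predictor studentized B-P test."""
    fitted, _ = simple_ols(xs, ys)
    squared_resid = [(y - f) ** 2 for y, f in zip(ys, fitted)]
    _, aux_r2 = simple_ols(xs, squared_resid)
    lm = max(0.0, len(xs) * aux_r2)  # guard against tiny negative rounding error
    return lm, chi2_sf_df1(lm)


# Budget and gross collection (in Cr.) for the first eight movies in the table
budget = [15, 18, 12, 3, 215, 76, 57, 40]
gross = [19.69, 6.15, 13.67, 1.26, 360.89, 100.53, 22.33, 128.85]
lm_stat, p_value = breusch_pagan_one_predictor(budget, gross)
print(f"LM = {lm_stat:.3f}, p = {p_value:.4f}")
```

A p-value below 0.05 leads to rejecting homoscedasticity, which is the decision rule applied above.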
Conclusion
• The number of movies releasing on the same date has a significant effect and its coefficient is negative; thus the data suggest that box office clashes reduce a movie’s collection

• There is no multicollinearity, but heteroscedasticity is present in the model

• The model is not very accurate, because various factors are not incorporated in it

• A good public review also has a significant effect on a movie’s collection, and a good rating increases the chances of a return on investment greater than 1, i.e. box office collection exceeding the film’s budget
THANK YOU
