Predictive Analysis 1 Assignment
ipynb - Colaboratory
import pandas as pd
boll = pd.read_csv('/content/bollywood.csv')
boll.head(2)
   SlNo Release Date MovieName ReleaseTime    Genre  Budget  BoxOfficeCollection  YoutubeViews  YoutubeLi…
0     1    18-Apr-14  2 States          LW  Romance      36                104.0       8576361  26…
boll.Genre.value_counts()
Comedy 36
Drama 35
Thriller 26
Romance 25
Action 21
Thriller 3
Action 3
Name: Genre, dtype: int64
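Thriller and Action each appear twice in these counts. That usually means the raw labels differ by hidden characters such as leading or trailing spaces; this is an assumption, since the CSV itself is not shown. A minimal sketch on a toy Series (the values are made up) showing how such variants split the counts and how str.strip() merges them:

```python
import pandas as pd

# Toy stand-in for boll['Genre']; the padded labels are an assumed cause of the duplicates
genre = pd.Series(['Thriller', ' Thriller ', 'Action', 'Action ', 'Comedy'])

print(genre.value_counts())  # padded variants are counted as separate labels

# Strip surrounding whitespace so each genre collapses to a single label
clean = genre.str.strip()
counts = clean.value_counts()
print(counts)
```

On the real frame the equivalent step would be boll['Genre'] = boll['Genre'].str.strip() before counting, again assuming whitespace is the cause.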
boll[['Genre','ReleaseTime']].value_counts()
Genre ReleaseTime
Drama N 24
Comedy N 23
Thriller N 20
Romance N 15
Action N 12
Drama HS 6
Comedy LW 5
HS 5
Thriller FS 4
Romance LW 4
Drama FS 4
Comedy FS 3
Action N 3
Romance FS 3
HS 3
Action LW 3
HS 3
FS 3
Thriller N 2
Thriller HS 1
LW 1
Drama LW 1
Thriller LW 1
dtype: int64
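Calling value_counts on two columns returns a Series indexed by (Genre, ReleaseTime) pairs, where a blank first level means "same genre as the row above". A sketch on a made-up frame comparing it with pd.crosstab, which lays the same counts out as a grid and fills absent pairs with 0:

```python
import pandas as pd

# Toy frame with the same two columns as boll (values are made up)
df = pd.DataFrame({
    'Genre': ['Drama', 'Drama', 'Comedy', 'Comedy', 'Comedy'],
    'ReleaseTime': ['N', 'HS', 'N', 'N', 'LW'],
})

# Series with a (Genre, ReleaseTime) MultiIndex, sorted by count
pairs = df[['Genre', 'ReleaseTime']].value_counts()

# Same counts as a Genre x ReleaseTime table; combinations that never occur show 0
grid = pd.crosstab(df['Genre'], df['ReleaseTime'])
print(grid)
```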
boll.head(1)
https://colab.research.google.com/drive/1yp6bmte8FJDIV36MybTtJcHeugjFQKF6#scrollTo=1hbjGmTcn40T&printMode=true 1/5
9/20/23, 10:55 PM Untitled19.ipynb - Colaboratory
   SlNo Release Date MovieName ReleaseTime    Genre  Budget  BoxOfficeCollection  YoutubeViews  YoutubeLi…
0     1    18-Apr-14  2 States          LW  Romance      36                104.0       8576361  26…
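# 'month' is not created in the cells shown; this line assumes it comes from the 'Release Date' column (e.g. 18-Apr-14)
boll['month'] = pd.to_datetime(boll['Release Date'], format='%d-%b-%y').dt.month
boll['month'].value_counts()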
1 20
3 19
5 18
7 16
2 16
4 11
9 10
6 10
11 10
10 9
8 8
12 2
Name: month, dtype: int64
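value_counts orders months by frequency, so January (20 films) comes first and December (2) last. To read the same counts in calendar order, sort by the index instead. A sketch on toy month numbers:

```python
import pandas as pd

# Toy month numbers standing in for boll['month']
month = pd.Series([1, 3, 1, 12, 3, 1, 7])

by_frequency = month.value_counts()               # most frequent month first
by_calendar = month.value_counts().sort_index()   # January -> December
print(by_calendar)
```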
boll[boll['Budget']>25][['MovieName','month']].value_counts()
MovieName month
2 States 4 1
Raja Natwarlal 8 1
Kill Dil 11 1
Kochadaiiyaan 5 1
Krrish 3 11 1
..
Highway 2 1
Himmatwala 3 1
Holiday 6 1
Humshakals 6 1
Zilla Ghaziabad 2 1
Length: 62, dtype: int64
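Each (MovieName, month) pair occurs once, so every count above is 1 and "Length: 62" is simply the number of films with Budget > 25. If the aim was to see how many high-budget films were released in each month (an interpretation, not stated in the notebook), counting the month column alone gives that. A sketch on toy rows:

```python
import pandas as pd

# Toy rows; the real frame has one row per film
df = pd.DataFrame({
    'MovieName': ['A', 'B', 'C', 'D'],
    'Budget': [36, 10, 40, 85],
    'month': [4, 4, 4, 12],
})

high = df[df['Budget'] > 25]

# Counting (MovieName, month) pairs yields 1 per film, as in the output above
pairs = high[['MovieName', 'month']].value_counts()

# Counting the month column alone yields high-budget films per month
per_month = high['month'].value_counts()
print(per_month)
```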
boll['ROI']=(boll['BoxOfficeCollection']-boll['Budget'])/boll['Budget']
boll.nlargest(10,['ROI'])
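ROI here is (BoxOfficeCollection - Budget) / Budget, a unitless ratio: 0 means the film only recovered its budget, and 1 means it collected twice its budget. A small check using three rows copied from the nlargest output, followed by the same groupby-mean pattern the notebook applies to ReleaseTime:

```python
import pandas as pd

# Three rows copied from the nlargest output (Budget and BoxOfficeCollection as listed there)
df = pd.DataFrame({
    'MovieName': ['Aashiqui 2', 'PK', 'Fukrey'],
    'ReleaseTime': ['N', 'HS', 'N'],
    'Budget': [12, 85, 5],
    'BoxOfficeCollection': [110.0, 735.0, 36.2],
})

# ROI = (collection - budget) / budget
df['ROI'] = (df['BoxOfficeCollection'] - df['Budget']) / df['Budget']
print(df[['MovieName', 'ROI']].round(2))
# Aashiqui 2: (110 - 12) / 12 ≈ 8.17

# Mean ROI per ReleaseTime
mean_roi = df.groupby('ReleaseTime')['ROI'].mean()
print(mean_roi.round(2))
```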
     SlNo Release Date                  MovieName ReleaseTime    Genre  Budget  BoxOfficeCollection  YoutubeViews  Youtube…
64     65    26-Apr-13                 Aashiqui 2           N  Romance      12                110.0       2926673
89     90    19-Dec-14                         PK          HS    Drama      85                735.0      13270623
132   133    13-Sep-13                Grand Masti          LW   Comedy      35                298.0       1795640
135   136    20-Sep-13               The Lunchbox           N    Drama      10                 85.0       1064854
87     88    14-Jun-13                     Fukrey           N   Comedy       5                 36.2        227912
58     59     5-Sep-14                   Mary Kom           N    Drama      15                104.0       6086811
128   129    18-Oct-13                     Shahid          FS    Drama       6                 40.0       1148516
37     38    11-Jul-14  Humpty Sharma Ki Dulhania           N  Romance      20                130.0       6604595
101   102    12-Jul-13         Bhaag Milkha Bhaag           N    Drama      30                164.0       2635390
115   116     9-Aug-13            Chennai Express          FS   Comedy      75                395.0       1882346
boll.groupby('ReleaseTime')['ROI'].mean()
ReleaseTime
FS    0.973853
HS    0.850867
LW    1.127205
N     0.657722
Name: ROI, dtype: float64
import matplotlib.pyplot as plt
import seaborn as sn
%matplotlib inline
import warnings
warnings.filterwarnings('ignore')
plt.hist(boll['Budget'])
(array([64., 40., 19., 11.,  4.,  4.,  2.,  2.,  1.,  2.]),
 array([  2. ,  16.8,  31.6,  46.4,  61.2,  76. ,  90.8, 105.6, 120.4,
        135.2, 150. ]),
 <BarContainer object of 10 artists>)
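The second array returned by plt.hist holds the bin edges. With default settings matplotlib uses 10 equal-width bins from the smallest to the largest value, so a budget range of 2 to 150 gives bins (150 - 2) / 10 = 14.8 wide. A sketch that reproduces those edges with NumPy; the sample budgets are made up and only their minimum and maximum match the real column:

```python
import numpy as np

# 10 equal-width bins between the extremes seen in the output above
lo, hi, n_bins = 2.0, 150.0, 10
width = (hi - lo) / n_bins                    # (150 - 2) / 10 = 14.8
edges = np.linspace(lo, hi, n_bins + 1)
print(edges)

# np.histogram computes the same edges (plus counts) without drawing a plot
sample = np.array([2, 10, 36, 85, 150])       # toy budgets, not the real column
counts, hist_edges = np.histogram(sample, bins=10)
print(counts)
```

The last bin is closed on the right, so the maximum value (150) is counted in it rather than dropped.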
sn.histplot(boll['Budget'], kde=True, stat='density')  # histplot(kde=True) replaces the deprecated distplot
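# Overlay ROI distributions for two genres (histplot with kde replaces the deprecated distplot)
sn.histplot(boll[boll['Genre']=='Comedy']['ROI'], kde=True, stat='density', color='g', label='comedy')
sn.histplot(boll[boll['Genre']=='Drama']['ROI'], kde=True, stat='density', color='r', label='drama')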
plt.legend()
<matplotlib.legend.Legend at 0x7bea32b63880>
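Overlaid density curves are easier to compare alongside a few summary numbers. A sketch of a numeric complement to the plot, selecting the same two genres with isin and summarising ROI per genre; the ROI values below are made up:

```python
import pandas as pd

# Toy ROI values; the notebook would use boll[['Genre', 'ROI']]
df = pd.DataFrame({
    'Genre': ['Comedy', 'Comedy', 'Comedy', 'Drama', 'Drama', 'Romance'],
    'ROI': [0.5, 1.2, -0.3, 2.0, 0.1, 3.0],
})

# Keep only the two plotted genres, then summarise ROI for each
subset = df[df['Genre'].isin(['Comedy', 'Drama'])]
summary = subset.groupby('Genre')['ROI'].agg(['count', 'mean', 'median'])
print(summary)
```

The median is worth reporting next to the mean because a few large hits (such as those in the nlargest table) can pull the mean of a skewed ROI distribution upward.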
feature=['BoxOfficeCollection','YoutubeLikes']
boll[feature].corr()
BoxOfficeCollection YoutubeLikes
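DataFrame.corr() computes Pearson correlation by default: the covariance of two columns divided by the product of their standard deviations, giving a value between -1 and 1. A sketch checking pandas against the formula written out by hand; the YoutubeLikes values are made up:

```python
import numpy as np
import pandas as pd

# Toy values standing in for BoxOfficeCollection and YoutubeLikes
df = pd.DataFrame({
    'BoxOfficeCollection': [104.0, 735.0, 36.2, 395.0],
    'YoutubeLikes': [100, 900, 30, 300],
})

# Pearson r from pandas (method='pearson' is the default)
r_pandas = df.corr().loc['BoxOfficeCollection', 'YoutubeLikes']

# Same quantity by hand: sum of centred products / sqrt(product of centred sums of squares)
x = df['BoxOfficeCollection'].to_numpy()
y = df['YoutubeLikes'].to_numpy()
xc, yc = x - x.mean(), y - y.mean()
r_manual = (xc * yc).sum() / np.sqrt((xc ** 2).sum() * (yc ** 2).sum())

print(round(r_pandas, 4), round(r_manual, 4))
```

The heatmap cell applies the same computation to every pair of the listed columns and colours the resulting matrix.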
heatfeature=['Budget', 'BoxOfficeCollection','YoutubeViews','YoutubeLikes','YoutubeDislikes']
sn.heatmap(boll[heatfeature].corr(),annot= True)
<Axes: >