
ANALYSIS OF MOVIE DATABASE USING GGPLOT
PAVAN KUMAR G R
1DS20BA068
CONTENTS
I. IMPORTING THE DATABASE
II. CHANGING THE COLUMN NAME & SUMMARISING
III. APPLYING AESTHETICS
IV. HISTOGRAM
V. DENSITY CHART
VI. STATISTICAL TRANSFORMATION
A. GEOM SMOOTH
B. BOX PLOT
VII. FACETS
VIII. ZOOMING THE DATA
IX. ADDING THEME TO DATA

Figure 1: Empty graph
Figure 2: Scatter plot
Figure 3: Scatter plot with colour as genre
Figure 4: Scatter plot with colour as genre and size of dots as budget
Figure 5: Scatter plot along with line chart
Figure 6: Histogram
Figure 7: Density chart
Figure 8: Geom-smooth and scatter plot
Figure 9: Box plot with Geom-jitter
Figure 10: Vertical facets
Figure 11: Horizontal facets
Figure 12: Facet grid
Figure 13: Zooming the data
Figure 14: Zooming using facets
Figure 15: Adding theme to the data
I. IMPORTING THE DATABASE

movierating<-read.csv(file.choose(), header=TRUE, stringsAsFactors=TRUE)


II. CHANGING THE COLUMN NAME & SUMMARISING

a. names(movierating)<-c("Film","Genre","RottenTomatoesRating",
"AudienceRating","BudgetinMillion","Year")
b. View(movierating)
c. head(movierating)
d. tail(movierating)
e. str(movierating)
f. summary(movierating)
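
The import and renaming steps above can be tried without the interactive file chooser. The sketch below is only an illustration: the three rows, the original column headers and the names demo_path and demo_movies are assumptions, not taken from the real data set.

```r
# Hypothetical sample rows standing in for the real movie CSV.
csv_text <- c(
  "Film,Genre,Rotten Tomatoes Ratings %,Audience Ratings %,Budget (million $),Year of release",
  "Film A,Comedy,87,81,8,2009",
  "Film B,Action,9,44,105,2008",
  "Film C,Drama,30,52,20,2009"
)
demo_path <- tempfile(fileext = ".csv")
writeLines(csv_text, demo_path)

# Same call as in section I, with a fixed path instead of file.choose().
demo_movies <- read.csv(demo_path, header = TRUE, stringsAsFactors = TRUE)

# Replace the awkward original headers with short names that are easy to type.
names(demo_movies) <- c("Film", "Genre", "RottenTomatoesRating",
                        "AudienceRating", "BudgetinMillion", "Year")

# Genre is now a factor, which is what the colour and facet steps rely on.
str(demo_movies)
```

Renaming straight after the import means every later ggplot call can refer to the columns without backticks or quoting.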
III. APPLYING AESTHETICS

a. install.packages("ggplot2")
b. library(ggplot2)
c. ggplot(data=
movierating,aes(x=RottenTomatoesRating,y=AudienceRating))
Figure 1: Empty graph.

This draws a blank plot area with Rotten Tomatoes rating on the x axis
and audience rating on the y axis; no geometry has been added yet.
d. ggplot(data=movierating,aes(x=RottenTomatoesRating,y=AudienceRating))+
geom_point()
Figure 2: Scatter plot.

A scatter plot with Rotten Tomatoes rating on the x axis and audience
rating on the y axis; each point is one movie.
e. ggplot(data=movierating,aes(x=RottenTomatoesRating,y=AudienceRating,
color=Genre))+geom_point()
Figure 3: Scatter plot with colour as genre.

The same data as in figure 2, with each point coloured according to
the genre of the movie.
f. g<-ggplot(data=movierating,aes(x=RottenTomatoesRating,y=AudienceRating,
color=Genre,size=BudgetinMillion))
Figure 4: Scatter plot with colour as genre and size of dots as budget.

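Scatter plot of audience rating against Rotten Tomatoes rating, with
colour showing the genre and point size showing the budget. The base
plot is stored in g so that layers can be added to it in the next steps.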
g. g+geom_point()
h. g+geom_line()
i. g+geom_line()+geom_point()
j. g+geom_line(size=0.5)+geom_point()

Figure 5: Scatter plot along with line chart.


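In this graph the plot from figure 4 is extended by adding a line layer
(geom_line) underneath the point layer (geom_point). In step j the line
size is fixed at 0.5 instead of following the budget mapping, so the
lines stay thin and the points remain readable.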
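
The steps in this section use aesthetics in two ways: mapped to a column inside aes() (colour by genre, size by budget) and set to a constant outside aes() (size=0.5 in step j). A minimal sketch of the difference, using a made-up six-row data frame called demo_movies; the data values and the names mapped and fixed are assumptions for illustration only.

```r
library(ggplot2)

# Hypothetical rows; the real data comes from the CSV imported in section I.
demo_movies <- data.frame(
  Genre = factor(c("Action", "Comedy", "Drama", "Action", "Comedy", "Drama")),
  RottenTomatoesRating = c(25, 60, 80, 40, 75, 90),
  AudienceRating = c(45, 65, 78, 50, 70, 85)
)

# Mapped: colour is inside aes(), so each genre gets its own colour and a legend.
mapped <- ggplot(demo_movies, aes(x = RottenTomatoesRating, y = AudienceRating,
                                  colour = Genre)) +
  geom_point()

# Set: colour is outside aes(), so every point is drawn in one fixed colour.
fixed <- ggplot(demo_movies, aes(x = RottenTomatoesRating, y = AudienceRating)) +
  geom_point(colour = "blue")

# layer_data() returns the values ggplot2 computed for a layer.
mapped_colours <- unique(layer_data(mapped)$colour)
fixed_colours <- unique(layer_data(fixed)$colour)
```

Here mapped_colours holds one colour per genre, while fixed_colours holds only the constant that was set.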
IV. HISTOGRAM
a. g<-ggplot(data= movierating,aes(x=BudgetinMillion))
b. g+geom_histogram(binwidth=10,fill="red",colour="green")
c. g+geom_histogram(binwidth=10,aes(fill=Genre),colour="black"
)

Figure 6: Histogram.

The histogram shows how movie budgets (in million $) are distributed.
The x axis is the budget in bins of width 10 and the y axis is the
number of movies in each bin; the coloured segments of each bar show
which genres those movies belong to.
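
To make the meaning of the bar heights concrete, the counts behind a stacked histogram can be worked out by hand in base R. This is a sketch under stated assumptions: the eight budgets are invented, and the bin edges 0, 10, 20, ... are chosen for readability, so they may not line up exactly with the bins ggplot2 picks by default.

```r
# Hypothetical budgets (million $) and genres standing in for the real data.
demo_movies <- data.frame(
  Genre = factor(c("Action", "Action", "Comedy", "Comedy",
                   "Drama", "Drama", "Action", "Comedy")),
  BudgetinMillion = c(105, 35, 8, 22, 18, 28, 45, 32)
)

# Assumed bin edges: 0, 10, 20, ... up to the first edge above the largest budget.
upper_edge <- ceiling(max(demo_movies$BudgetinMillion) / 10) * 10
bin_edges <- seq(0, upper_edge, by = 10)
budget_bin <- cut(demo_movies$BudgetinMillion, breaks = bin_edges,
                  include.lowest = TRUE)

# Rows are budget bins, columns are genres; each cell is a number of movies.
bin_by_genre <- table(budget_bin, demo_movies$Genre)

# The total height of each bar is the row sum across genres.
bar_heights <- rowSums(bin_by_genre)
print(bin_by_genre)
```

Each cell of bin_by_genre corresponds to one coloured segment of a bar, and bar_heights to the full bar.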
V. DENSITY CHART
a. g<-ggplot(data= movierating,aes(x=BudgetinMillion))
b. g+geom_density(aes(fill=Genre),position="stack")
Figure 7: Density chart.

The graph shows the same budget data as figure 6, drawn as stacked
density curves with one curve per genre.
VI. STATISTICAL TRANSFORMATION
A. GEOM SMOOTH
a. g<-ggplot(data=movierating,aes(x=RottenTomatoesRating,y=AudienceRating,
color=Genre))
b. g+geom_point()+geom_smooth(fill=NA)
Figure 8: Geom-smooth and scatter plot.

The graph overlays a smoothed trend line for each genre on the scatter
plot, showing how the audience rating changes with the Rotten Tomatoes
rating within that genre. fill=NA hides the shaded band around each line.
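
When no method is given, geom_smooth() chooses a smoother automatically (a loess curve for small data sets). As a hedged variation on the step above, the sketch below asks for a straight-line fit per genre instead and drops the band with se = FALSE. The data frame and the names trend and fitted_lines are hypothetical.

```r
library(ggplot2)

# Hypothetical ratings for two genres; not taken from the real data set.
demo_movies <- data.frame(
  Genre = factor(rep(c("Action", "Comedy"), each = 6)),
  RottenTomatoesRating = c(10, 25, 40, 55, 70, 85, 15, 30, 45, 60, 75, 90),
  AudienceRating = c(35, 42, 50, 58, 66, 80, 40, 48, 55, 63, 72, 88)
)

trend <- ggplot(demo_movies, aes(x = RottenTomatoesRating, y = AudienceRating,
                                 colour = Genre)) +
  geom_point() +
  # method = "lm" fits one straight line per genre; se = FALSE omits the band.
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

# The second layer holds the fitted line values, one group per genre.
fitted_lines <- layer_data(trend, 2)
```

A straight line is easier to compare across genres, while the default curve follows local bends in the data; which one suits the report depends on what is being shown.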
B. BOX PLOT
a. g<-ggplot(data=movierating,aes(x=Genre,y=RottenTomatoesRating,
color=Genre))
b. g+geom_boxplot()
c. g+geom_boxplot(size=1.2)+geom_jitter()
d. g+geom_jitter()+geom_boxplot(size=1.2,alpha=0.5,
outlier.colour = NA)
Figure 9: Box plot with Geom-jitter.

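The graph shows a box plot of Rotten Tomatoes ratings for each genre.
The individual movies are drawn as jittered points behind the
semi-transparent boxes, and the separate outlier markers are hidden
because the jittered points already show every movie.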
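
The numbers a box plot summarises can be checked directly in base R: the line inside each box is the median and the box edges are the first and third quartiles. A small sketch with invented ratings; demo_movies, genre_median and genre_quartiles are illustrative names.

```r
# Hypothetical ratings, five movies per genre.
demo_movies <- data.frame(
  Genre = factor(rep(c("Action", "Comedy", "Drama"), each = 5)),
  RottenTomatoesRating = c(10, 20, 30, 40, 50,
                           40, 50, 60, 70, 80,
                           50, 60, 70, 80, 90)
)

# Split the ratings into one vector per genre.
ratings_by_genre <- split(demo_movies$RottenTomatoesRating, demo_movies$Genre)

# Median per genre: the line drawn inside each box.
genre_median <- sapply(ratings_by_genre, median)

# First and third quartiles per genre: the lower and upper edges of each box.
genre_quartiles <- sapply(ratings_by_genre, quantile, probs = c(0.25, 0.75))

print(genre_median)
print(genre_quartiles)
```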
VII. FACETS
a. g<-ggplot(data= movierating,aes(x=BudgetinMillion))
b. g+geom_histogram(binwidth=10,aes(fill=Genre),colour="black"
)
c. g+geom_histogram(binwidth=10,aes(fill=Genre),colour="black"
)+facet_grid(Genre~.,scales="free")
Figure 10: Vertical facets.

Faceting draws a separate panel for each genre. With
facet_grid(Genre~.) the panels are stacked vertically, and
scales="free" lets each panel use its own y-axis range. Each panel
shows how the movies of that genre are distributed across the budget
bins.
d. g+geom_histogram(binwidth=10,aes(fill=Genre),colour="black"
)+facet_grid(.~Genre,scales="free")

Figure 11: Horizontal facets.


The same data as in figure 10 is represented side by side.
e. g<-ggplot(data=movierating,aes(x=RottenTomatoesRating,y=AudienceRating,
colour=Genre))
f. g+geom_point(aes(size=BudgetinMillion))+geom_smooth()+
facet_grid(Genre~Year)

Figure 12: Facet grid.

The grid has one row per genre and one column per year. Each panel
plots audience rating against Rotten Tomatoes rating for that genre and
year, with the point size showing the budget.
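
The formula in facet_grid() reads as rows ~ columns, with a dot meaning "no split in this direction". A short sketch of that rule on invented data, which also shows the named-argument spelling with vars() as an alternative to the formula; the data frame and object names are assumptions.

```r
library(ggplot2)

# Hypothetical budgets, four movies per genre.
demo_movies <- data.frame(
  Genre = factor(rep(c("Action", "Comedy", "Drama"), each = 4)),
  BudgetinMillion = c(105, 35, 45, 195, 8, 22, 32, 55, 18, 28, 12, 41)
)

base <- ggplot(demo_movies, aes(x = BudgetinMillion)) +
  geom_histogram(binwidth = 10, aes(fill = Genre), colour = "black")

# Formula form used in the steps above: one row of panels per genre.
by_row_formula <- base + facet_grid(Genre ~ ., scales = "free")

# The same layout written with named arguments.
by_row_vars <- base + facet_grid(rows = vars(Genre), scales = "free")

# One column of panels per genre (the side-by-side layout).
by_col_vars <- base + facet_grid(cols = vars(Genre), scales = "free")

# Count the panels that received data in each layout.
panels_formula <- length(unique(layer_data(by_row_formula)$PANEL))
panels_vars <- length(unique(layer_data(by_row_vars)$PANEL))
panels_cols <- length(unique(layer_data(by_col_vars)$PANEL))
```

All three layouts produce one panel per genre; only the arrangement changes.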
VIII. ZOOMING THE DATA
a. g<-ggplot(data= movierating,aes(x=BudgetinMillion))
b. g+geom_histogram(binwidth=10,aes(fill=Genre),colour="black")+
coord_cartesian(ylim=c(0,65))
Figure 13: Zooming the data.

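Zooming means restricting the visible range of the plot without
discarding any data. Here coord_cartesian(ylim=c(0,65)) limits the y
axis (the number of movies) to the range 0 to 65. Bars taller than 65
are cut off at the top of the panel, but the bins are still computed
from the full data set, so there is no data loss.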
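
The "no data loss" point is easiest to see by comparing coord_cartesian() with limits set on the scale, which do remove out-of-range values before the bins are counted. The sketch below uses the x axis rather than the y axis, because the dropped rows then show up directly in the bin counts; the ten budgets and all object names are hypothetical.

```r
library(ggplot2)

# Hypothetical budgets; two of them (105 and 195) lie above 100.
demo_movies <- data.frame(
  BudgetinMillion = c(8, 18, 22, 28, 32, 35, 45, 55, 105, 195)
)

base <- ggplot(demo_movies, aes(x = BudgetinMillion)) +
  geom_histogram(binwidth = 10)

# Zoom: only the visible window changes, all rows are still binned.
zoomed <- base + coord_cartesian(xlim = c(0, 100))

# Scale limits: values outside 0 to 100 are removed before binning.
limited <- base + scale_x_continuous(limits = c(0, 100))

zoomed_total <- sum(layer_data(zoomed)$count)
# ggplot2 warns about the removed rows; the warning is silenced for this check.
limited_total <- sum(suppressWarnings(layer_data(limited))$count)
```

zoomed_total counts every movie, while limited_total is smaller by the number of budgets outside the limits.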
ZOOMING USING FACETS
a. g<-ggplot(data=movierating,aes(x=RottenTomatoesRating,y=AudienceRating,
colour=Genre))
b. g+geom_point(aes(size=BudgetinMillion))+geom_smooth()+
facet_grid(Genre~Year)+coord_cartesian(ylim=c(0,100))
Figure 14: Zooming using facets.

IX. ADDING THEME TO DATA


a. g<-ggplot(data= movierating,aes(x=BudgetinMillion))
b. g+geom_histogram(binwidth=10,aes(fill=Genre),colour="black")
c. p<-g+geom_histogram(binwidth=10,aes(fill=Genre),colour="black")
d. p+xlab("BUDGET USED")+ylab("No. OF MOVIES")+
theme(axis.title.x=element_text(colour="darkgreen",size=30),
axis.title.y=element_text(colour="darkgreen",size=30),
legend.title=element_text(size=20),
legend.text=element_text(size=20),
legend.position=c(1,1),legend.justification=c(1,1),
plot.title=element_text(colour="darkblue",size=40,family="popularity"))
Figure 15: Adding theme to the data.

Here the theme is applied by assigning the base, geometric and
aesthetic layers to a variable p and then adding theme settings to that
variable, such as the colour and size of the x and y axis titles, the
size of the legend text and the position of the legend.
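
Theme settings can also be stored in their own variable and added to any plot, which keeps the styling consistent across figures. A minimal sketch, assuming a hypothetical report_theme with illustrative sizes and an invented title; none of these values come from the original report. Note that plot.title styling only becomes visible once the plot actually has a title, which is why the sketch sets one with labs().

```r
library(ggplot2)

# Hypothetical budgets, three movies per genre.
demo_movies <- data.frame(
  Genre = factor(rep(c("Action", "Comedy", "Drama"), each = 3)),
  BudgetinMillion = c(105, 35, 45, 8, 22, 32, 18, 28, 41)
)

p <- ggplot(demo_movies, aes(x = BudgetinMillion)) +
  geom_histogram(binwidth = 10, aes(fill = Genre), colour = "black")

# Hypothetical reusable theme; the sizes and colours are illustrative only.
report_theme <- theme(
  axis.title = element_text(colour = "darkgreen", size = 14),
  plot.title = element_text(colour = "darkblue", size = 18),
  legend.position = "bottom"
)

# Labels and theme are added like any other layer.
themed <- p +
  labs(x = "Budget (million $)", y = "Number of movies",
       title = "Movie budget distribution") +
  report_theme
```

The same report_theme object can then be added to the scatter plots and box plots from the earlier sections.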
