Project On Netflix Data Analysis


SESSION 2021 – 22

A
PROJECT
REPORT
ON
INFORMATICS PRACTICES

{Netflix data analysis}


DIRECTED BY
CENTRAL BOARD OF SECONDARY EDUCATION (CBSE)

Guided by:            Submitted by:


Mrs. HONEY KAUR ______________
Roll no:_________
CLASS : XII-

Certificate
This is to certify that _____________ of class 12th ___
(Commerce) of SRI GURU NANAK PUBLIC SCHOOL has
completed his/her project entitled Netflix data analysis under
my supervision. He/She has taken proper care and shown sincerity
in the completion of his/her project.
I certify that this project is up to my expectations and follows the
guidelines given by CBSE.

Principal signature

External Signature Internal SIGNATURE

INDEX
S.NO DESCRIPTION
1. ACKNOWLEDGEMENT
2. PREFACE
3. INTRODUCTION TO PYTHON, PANDAS, MATPLOTLIB, CSV & MYSQL
4. HARDWARE AND SOFTWARE REQUIREMENT
5. Source code
6. Output
7. CONCLUSION
8. REMARKS
9. BIBLIOGRAPHY

ACKNOWLEDGEMENT
First of all, we express our deep sense of gratitude and wholehearted thanks
to our honourable guide Mrs. HONEY KAUR for her valuable guidance,
keen interest and constant encouragement throughout, which brought our
project to life. We feel great pleasure in having undertaken
this project entitled Netflix data analysis. Throughout the project's
development we received immense support from Mrs. HONEY KAUR and all
faculty members of MBVB. We express sincere thanks to Mrs. HONEY
KAUR for providing us with the relevant facilities, valuable guidance and
extra lab time to complete our project on time. We would
like to thank our school management for their coordinated support
throughout the project's development.
We greatly respect each other's contributions, dedication and
sincere efforts in bringing this project to life.

Student : _________

Introduction:

THEORETICAL BACKGROUND

1. Python: Python is an easy-to-learn, easy-to-use, expressive,
interpreted and cross-platform language. It has a large and broad
collection of libraries, such as pandas, matplotlib, logging, time,
sys and many more, which give a programmer huge resources to take
advantage of.
Features:
➔Easy to learn and use
➔Free and open source
➔Large standard library
➔More expressive
2. Pandas: Pandas is a software library written for the Python
programming language for data manipulation and analysis. In
particular, it offers data structures and operations for
manipulating numerical tables and time series.
Features:
➔Handling of data
➔Alignment and indexing
➔Handling missing data
➔Cleaning up data
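The pandas features listed above can be seen in a small sketch. It builds a DataFrame from made-up sample values (hypothetical, not taken from the project's Netflix file), shows that arithmetic between two Series lines up by index label rather than by position, and handles a missing value with isna, fillna and dropna.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the column names match the Netflix dataset used later
shows = pd.DataFrame(
    {"rating": ["PG", "TV-14", "TV-MA", "PG"],
     "user rating score": [82.0, np.nan, 95.0, 70.0]},
    index=["Show A", "Show B", "Show C", "Show D"],
)

# Alignment and indexing: Series are matched by index label, not by position
bonus = pd.Series({"Show C": 1.0, "Show A": 2.0})
aligned = shows["user rating score"] + bonus   # labels missing from bonus give NaN

# Handling missing data: count, fill or drop the NaN values
missing = shows["user rating score"].isna().sum()
filled = shows["user rating score"].fillna(shows["user rating score"].mean())
cleaned = shows.dropna()                       # drops the "Show B" row

print(aligned)
print(missing, filled["Show B"], len(cleaned))
```

Filling with the column mean is only one possible choice; dropping rows or filling with a fixed value are equally valid depending on the analysis.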

3. Matplotlib: Matplotlib is a plotting library for the Python
programming language and its numerical mathematics extension
NumPy. It provides an object-oriented API for embedding plots
into applications using general-purpose GUI toolkits.
Features:
➔Easy Visualisation.
➔Free and open source.
➔Embedded GUI.
➔Widely used for data analysis.
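The object-oriented API mentioned above can be sketched as follows. The project's own program uses the simpler pyplot style (plt.title, plt.show); this hypothetical example instead creates a Figure and an Axes explicitly and saves the chart to an in-memory PNG, so it runs without a display. The yearly counts are made-up sample values.

```python
import io

import matplotlib
matplotlib.use("Agg")          # non-interactive backend: no window is needed
import matplotlib.pyplot as plt

# Hypothetical yearly counts, used only to demonstrate the API
years = [2014, 2015, 2016, 2017]
counts = [10, 14, 9, 17]

# Object-oriented API: create a Figure and an Axes, then call methods on the Axes
fig, ax = plt.subplots(figsize=(6, 4))
ax.plot(years, counts, color="r", linestyle="--", marker="X")
ax.set_xlabel("Release year")
ax.set_ylabel("Number of shows")
ax.set_title("Shows per year (sample data)")

# Write the chart as PNG bytes instead of opening a window
buf = io.BytesIO()
fig.savefig(buf, format="png")
```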

4. CSV file (Comma-separated values): A comma-separated values file
is a delimited text file that uses a comma to separate values. Each
line of the file is a data record. Each record consists of one or more
fields, separated by commas. The use of the comma as a field
separator is the source of the name for this file format.

Features:
➔One line for each record.
➔Comma-separated fields.
➔Space characters adjacent to commas are part of the field
in standard CSV (RFC 4180), although some programs trim them.
➔Fields with embedded commas are enclosed in double-quote
characters.

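The CSV rules above can be checked with Python's built-in csv module. This sketch uses a tiny made-up CSV text (hypothetical sample, not the project's file): a field containing a comma is read back whole because it is quoted, the writer adds quotes only where needed, and spaces after a comma are kept unless skipinitialspace=True is passed.

```python
import csv
import io

# Hypothetical sample: the second field of the data row contains a comma,
# so it is enclosed in double quotes
text = 'title,genre\n"Show A","Drama, Comedy"\n'
rows = list(csv.reader(io.StringIO(text)))
print(rows)  # [['title', 'genre'], ['Show A', 'Drama, Comedy']]

# Writing: the default QUOTE_MINIMAL style quotes only the field with a comma
out = io.StringIO()
csv.writer(out).writerow(["Show A", "Drama, Comedy"])

# Spaces after a comma are kept by default; skipinitialspace=True trims them
spaced = "a, b\n"
kept = next(csv.reader(io.StringIO(spaced)))                            # ['a', ' b']
trimmed = next(csv.reader(io.StringIO(spaced), skipinitialspace=True))  # ['a', 'b']
```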
SYSTEM IMPLEMENTATION

The Hardware used:


The software was developed on a Dell Inspiron 15 3000
3567 15.6-inch FHD laptop (7th Gen Core i7-7500U / 8 GB / 1 TB /
Windows 10 with Office 2016 Home and Student / 2 GB graphics).

The Software used:


➔Windows Operating System
➔Python IDLE
➔Pages for Documentation

Source code

'''Project on Netflix Data Analysis'''
import ast                    # safe parsing of list input such as ['rating', 'release year']
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sqlalchemy import create_engine   # import SQLAlchemy for interaction with MySQL
# create_engine establishes the connection between MySQL and Python pandas
# (the mysql+pymysql URL used below also needs the PyMySQL driver installed)
# Manipulation
#   insert
#   delete a row
#   add another dataframe file
#   drop a column
# Analysis
#   display the top records
#   display the bottom records
#   display a particular column
#   display particular columns
#   display rows
#   rating type
#   rating level conditions
#   aggregate values
#   top rating shows
#   NaN records
# Graphs
#   pie chart of rating-wise shows
#   line chart year-wise


def get_choice(prompt):
    '''Read a menu choice; pressing Enter alone (or typing non-digits) returns 0.'''
    text = input(prompt).strip()
    return int(text) if text.isdigit() else 0


pro=input("Enter the project title: ")

while True:
    print("-----------------------------------------------------------")
    print(" ", pro, " ")
    print("-----------------------------------------------------------")
    mainmenu='''1. Read CSV / Excel File
2. Manipulation
3. Analysis
4. Visualisation
5. Import / Export Data from SQL
6. Exit'''
    print(mainmenu)
    ch=get_choice("Enter your choice: ")
    if ch==1:
        print('''1. Read CSV File to create and Display DataFrame
2. Read Excel File to create and Display DataFrame
3. Press enter to go back''')
        chone=get_choice("Enter your choice: ")
        if chone==1:
            filename=input("Enter the file name with extension .CSV: ")
            df=pd.read_csv(filename)
            print(df)
            print("File Retrieved Successfully!!!!")
        elif chone==2:
            filename=input("Enter the file name with extension .XLSX: ")
            df=pd.read_excel(filename)   # reading .xlsx also needs the openpyxl package
            print(df)
            print("File Retrieved Successfully!!!!")
        # any other choice (3 or Enter) goes back to the main menu
    elif ch==2:
        print('''1. Insert
2. Delete a row
3. Add another dataframe file
4. Press enter to go back''')
        mch=get_choice("Enter your choice: ")
        if mch==1:
            print(df.columns)
            print(df.head(1))
            ninsert={}
            for c in df.columns:
                ninsert[c]=input("Enter " + str(c) + " value: ")
            print(ninsert)
            # DataFrame.append was removed in pandas 2.0; pd.concat replaces it
            df=pd.concat([df, pd.DataFrame([ninsert])], ignore_index=True)
            print("New row inserted")
        elif mch==2:
            print(df.head())
            idx=int(input("Enter the index number of the row to delete: "))
            df=df.drop(index=idx)
            print("Row deleted")
        elif mch==3:
            filename=input("Enter the CSV file name to add: ")
            df=pd.concat([df, pd.read_csv(filename)], ignore_index=True)
            print("DataFrame added, total rows:", len(df))
    elif ch==3:
        print("Data Frame Analysis")
        menu=''' 1. Top records
 2. Bottom records
 3. To print particular column
 4. To print multiple columns
 5. To display complete statistics of the dataframe
 6. To display complete information about dataframe
 7. To display the unique values of the columns
 8. To apply and display the data group by with count function
 9. To apply and display the data using group by with more functions
 10. To apply aggregate function
 11. To apply pivoting
 Press enter to go back'''
        print(menu)
        ch3=get_choice("Enter your choice: ")
        if ch3==1:
            n=int(input("Enter the number of records to be displayed: "))
            print("Top ", n, " records from the dataframe")
            print(df.head(n))
        elif ch3==2:
            n=int(input("Enter the number of records to be displayed: "))
            print("Bottom ", n, " records from the dataframe")
            print(df.tail(n))
        elif ch3==3:
            print("Name of the columns\n", df.columns)
            co=input("Enter the column name to be displayed: ")
            print(df[[co]])
        elif ch3==4:
            print("Name of the columns\n", df.columns)
            co=ast.literal_eval(input("Enter the column names as list in square bracket: "))
            print(df[co])
        elif ch3==5:
            print("Complete Statistics")
            print(df.describe())
        elif ch3==6:
            print("Information about dataframe")
            df.info()   # info() prints the summary itself and returns None
        elif ch3==7:
            print("Displaying unique values of any column")
            print("Name of the columns\n", df.columns)
            co=input("Enter the column name: ")
            print("Distinct values of column ", co, " are: ")
            print(*df[co].unique(), sep='\n')
        elif ch3==8:
            print("Name of the columns\n", df.columns)
            co=ast.literal_eval(input("Enter the column names as list in square bracket: "))
            print(df[co])
            co1=input("Enter the column name to group by: ")   # must be one of the columns in co
            print("Grouped column ", co1)
            dfgroup=df[co].groupby(co1).count()
            print(dfgroup)
        elif ch3==9:
            print("Name of the columns\n", df.columns)
            co=ast.literal_eval(input("Enter the column names as list in square bracket: "))
            print(df[co])
            co1=input("Enter the column name to group by: ")   # the other columns should be numeric
            print("Grouped column", co1, ' max', ' min', ' count', ' sum', ' mean')
            dfgroup=df[co].groupby(co1).agg(['max', 'min', 'count', 'sum', 'mean'])
            print(dfgroup)
        elif ch3==10:
            print("Applying aggregate functions")
            print("Name of the columns\n", df.columns)
            co=ast.literal_eval(input("Enter the column names as list in square bracket: "))
            print('Maximum values of the ', co, ' columns')
            print(df[co].max())   # any aggregate function (min, sum, mean, ...) can be applied
        elif ch3==11:
            # mean user rating size per rating and release year, for shows after 2010
            dfYear=df[df['release year']>2010]
            dfpivot=dfYear.pivot_table(index='rating', columns='release year',
                                       values='user rating size')
            print(dfpivot)
        else:
            print("Invalid choice")
    elif ch==4:
        print("Data Visualisation of pandas data frame")
        menu=''' 1. To display histogram of all numeric columns
 2. To display the line chart
 3. To display line chart of numeric columns
 4. To choose your chart
 Press enter to go back'''
        print(menu)
        ch4=get_choice("Enter your choice: ")
        if ch4==1:
            df.hist()
            plt.show()
        elif ch4==2:
            df.plot(kind='line')
            plt.show()
        elif ch4==3:
            # number of shows with a user rating score in each release year
            dfline=df[['release year', 'user rating score']].groupby('release year').count()
            dfline.plot(color='r', linestyle='--', marker='X', figsize=(10, 10))
            plt.xlabel('Year')
            plt.ylabel("Ratings")
            plt.show()
        elif ch4==4:
            gmenu='''1. Bar chart
2. Box chart
3. Pie chart
4. Box plot
5. Histogram
6. Bar graph'''
            print(gmenu)
            gch=get_choice("Enter your choice: ")
            if gch in (1, 2, 3, 6):
                print("The chart needs only one numeric column besides the group-by column")
                print(df.head(3))
                print("Name of the columns\n", df.columns)
                co=ast.literal_eval(input("Enter the column names as list in square bracket: "))
                print(df[co])
                co1=input("Enter the column name to group by: ")
                print("Grouped column ", co1)
                dfgroup=df[co].groupby(co1).count()
                print(dfgroup)
                if gch in (1, 6):
                    dfgroup.plot(kind='bar', title='Report of graphs', color=['red', 'yellow'],
                                 edgecolor='Green', linewidth=2, linestyle='--', figsize=(10, 10))
                elif gch==2:
                    # box plots do not accept the bar-style edgecolor/linestyle arguments
                    dfgroup.plot(kind='box', title='Report of graphs', figsize=(10, 10))
                else:
                    # a pie chart plots a single column: use the first counted column
                    dfgroup.plot(kind='pie', y=dfgroup.columns[0], autopct='%.2f')
                plt.show()
            elif gch==4:
                df.plot(kind='box')   # box plot of every numeric column
                plt.show()
            elif gch==5:
                df.hist()
                plt.show()
            else:
                print("Invalid choice")
    elif ch==5:
        sqlmenu='''1. Import (transfer the DataFrame into a MySQL table)
2. Export (fetch a MySQL table into a DataFrame)
 Press enter to go back'''
        print(sqlmenu)
        chmenu=get_choice("Enter the choice: ")
        if chmenu==1:
            # create_engine is the method that lets us interact with MySQL: "root" is the
            # MySQL user name, "sms" is the MySQL password and "test" is the name of the
            # database. We connect to the server on localhost at port 3306.
            engine=create_engine('mysql+pymysql://root:sms@localhost:3306/test')
            tablename=input("Enter the table name: ")
            # to_sql transfers the data to MySQL; if_exists="replace" replaces the table if it
            # already exists, and index=False because we don't want to transfer the index
            df.to_sql(tablename, engine, if_exists="replace", index=False)
        elif chmenu==2:
            engine=create_engine('mysql+pymysql://root:sms@localhost:3306/test')
            tablename=input("Enter the table name: ")
            se='SELECT * FROM {}'.format(tablename)
            df=pd.read_sql_query(se, engine)
            print("Data Fetched")
            print(df)
        # Enter (or any other choice) goes back to the main menu
    else:
        break
    con=input("Do you wish to continue (y/n): ")
    if con.lower()=='n':
        break
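The analysis menu (options 8, 9 and 11) relies on groupby, agg and pivot_table. The sketch below runs the same calls on a few made-up rows that reuse the program's column names, so the results can be checked by hand without the real dataset. The sample values are hypothetical.

```python
import pandas as pd

# Hypothetical sample rows using the column names the program expects
df = pd.DataFrame({
    "rating": ["PG", "PG", "TV-14", "TV-14", "TV-MA"],
    "release year": [2015, 2016, 2015, 2016, 2016],
    "user rating score": [80.0, 90.0, 70.0, None, 95.0],
    "user rating size": [82, 82, 80, 80, 80],
})

# Option 8: group by one column and count the non-missing values of the others
counts = df[["rating", "user rating score"]].groupby("rating").count()

# Option 9: several aggregate functions at once on the grouped numeric column
stats = df[["rating", "user rating score"]].groupby("rating").agg(
    ["max", "min", "count", "sum", "mean"])

# Option 11: mean user rating size with ratings as rows and years as columns
pivot = df[df["release year"] > 2010].pivot_table(
    index="rating", columns="release year", values="user rating size")

print(counts)
print(stats)
print(pivot)
```

Note that count skips missing values (TV-14 has one score), and pivot_table averages by default, leaving NaN where a rating has no show in a given year.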

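The SQL option transfers a DataFrame with to_sql and reads it back with read_sql_query. As a hedged sketch of the same round trip without a MySQL server, the example below substitutes an in-memory SQLite database (Python's built-in sqlite3 module, which pandas accepts as a connection); with MySQL, the engine returned by create_engine would take the place of con. The table name "shows" and the data are hypothetical.

```python
import sqlite3

import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({"title": ["Show A", "Show B"], "release year": [2015, 2016]})

# In-memory SQLite database standing in for the MySQL server
con = sqlite3.connect(":memory:")

# Same call pattern as the program: replace the table if it exists, skip the index
df.to_sql("shows", con, if_exists="replace", index=False)

# Read the table back with a SQL query
back = pd.read_sql_query("SELECT * FROM shows", con)
con.close()

print(back)
```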
Conclusion

The conclusion we draw from this project report is that the
Netflix data analysis project is technically feasible, usable and
valuable.

Remarks

BIBLIOGRAPHY

Python pandas:-
1. Informatics Practices by Sumita Arora (Dhanpat Rai Publication).
2. Informatics Practices by Preeti Arora (Sultan Chand Publication).
