Rayat Shikshan Sanstha's

KARMAVEER BHAURAO PATIL COLLEGE, VASHI
[Autonomous College]

Reaccredited by NAAC with Grade 'A+' (CGPA 3.53) | ISO 9001:2008 Certified Institute

'Best College' Award by University of Mumbai

[ DEPARTMENT OF INFORMATION TECHNOLOGY ]

CERTIFICATE

This Is To Certify That


Mr. Tanmay Chandrakant Mane
Student of T.Y.B.Sc.IT. Class From Karmaveer Bhaurao Patil College,
Vashi [Autonomous], Navi Mumbai Has Satisfactorily Completed The
Practical Course In Subject DATA ANALYSIS AND VISUALIZATION As
per The Syllabus Laid By The University Of Mumbai During The Academic
Year 2023-24.

ROLL NO.: 237802
EXAM NO.: 237802

SAHIL VICHARE MADHURI GABHANE


Date: / /2023 Head, Department of IT

External Examiner
INDEX

Sr. No.   Practical Name   Sign

1. a. Print "Hello, Data Analysis!" using Python's print function.
   b. Declare variables for your age, name, and favourite colour.

2. a. Use pandas to create a simple DataFrame with some sample data.
   b. Create a small dataset with missing values using pandas and display it.

3. a. Calculate the sum of numbers from 1 to 10 using a loop.
   b. Create a DataFrame with duplicate rows and remove them using pandas.

4. a. Plot a basic line graph showing the population growth over years.
   b. Create a bar chart that displays the sales of different products.

5. Generate a scatter plot for two variables showing their relationship.

6. Generate a box plot to visualise the distribution of exam scores.

7. Create a scatter plot to explore the relationship between two numeric variables.

8. a. Calculate and visualise the mean and median of a dataset using pandas and Matplotlib.
   b. Choose a small dataset and create a bar chart to show different categories.

9. Plot a line graph showing the temperature variation over days.

Name : Tanmay Mane
Roll No : 237802
Batch : B2

PRACTICAL NO.: 1(A)

Aim: Print "Hello, Data Analysis!" using Python's print function.

Program :

print("Hello, Data Analysis!")

Output :

Explanation :
The print() function writes the specified message to the screen (standard output).

PRACTICAL NO.: 1(B)

Aim : Declare variables for your age, name, and favourite colour.

Program :
# Store each value in a variable, then print it with a label.
age = 20
name = 'abc'
fav_col = 'red'
print('AGE:', age)
print('Name:', name)
print('Favourite colour:', fav_col)

Output :

Explanation :
In the above program:
The variable age has the integer data type (int) and stores the age.
The variable name has the string data type (str) and stores the name.
The variable fav_col has the string data type (str) and stores the favourite colour.
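
The data types described above can be confirmed at run time. The following is a minimal sketch using Python's built-in type() and isinstance() functions; the values are the same placeholders as in the program, and the names age_is_int and name_is_str are illustrative only.

```python
# Same placeholder values as in the program above.
age = 20
name = 'abc'
fav_col = 'red'

# type() reports the data type Python inferred from each assigned value.
for label, value in [('age', age), ('name', name), ('fav_col', fav_col)]:
    print(label, '->', type(value).__name__)

# isinstance() checks a value against an expected type.
age_is_int = isinstance(age, int)
name_is_str = isinstance(name, str)
```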

PRACTICAL NO.: 2(A)

Aim : Use pandas to create a simple DataFrame with some sample data.

Program :
import pandas as pd

# Each dictionary key becomes a column; each list holds that column's values.
data = {'Name': ['abc', 'def', 'ghi'],
        'age': [19, 20, 21],
        'city': ['kk', 'ghansoli', 'Panvel']}

df = pd.DataFrame(data)
df

Output :

Explanation :
In the above program,
we import the pandas library under the alias pd.
We create a dictionary of sample data named data and then convert it into a
DataFrame using the pd.DataFrame() function.
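
Once the DataFrame exists, its structure can be inspected. This is a minimal sketch that reuses the sample dictionary from the program; shape gives the (rows, columns) pair and columns lists the names taken from the dictionary keys. The names n_rows, n_cols and column_names are illustrative.

```python
import pandas as pd

data = {'Name': ['abc', 'def', 'ghi'],
        'age': [19, 20, 21],
        'city': ['kk', 'ghansoli', 'Panvel']}
df = pd.DataFrame(data)

n_rows, n_cols = df.shape          # (number of rows, number of columns)
column_names = list(df.columns)    # dictionary keys become column names

print(n_rows, n_cols)
print(column_names)
```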

PRACTICAL NO.: 2(B)

Aim : Create a small dataset with missing values using pandas and
display it.

Program :
import pandas as pd
import numpy as np

# None marks a missing value in the sample data.
data = {
    'Name': ['Omkar', 'Sid', 'Vaibhav', 'DK'],
    'Age': [22, None, 22, 28],
    'City': ['Mumbai', 'Pune', None, 'Navi Mumbai'],
    'Salary': [50000, 75000, None, 55000]
}

df = pd.DataFrame(data)
df

Output :

Explanation :
We create a small dataset with missing values in pandas using None, which
represents missing or undefined data.
In this example, we have intentionally set some values to None to represent
missing data in the 'Age', 'City' and 'Salary' columns. In the numeric columns,
pandas displays these missing entries as NaN.
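
Missing entries can also be counted rather than read off the display. The following sketch uses the same sample data, with DataFrame.isna() to flag missing cells and sum() to count them per column. The name missing_per_column is illustrative.

```python
import pandas as pd

data = {
    'Name': ['Omkar', 'Sid', 'Vaibhav', 'DK'],
    'Age': [22, None, 22, 28],
    'City': ['Mumbai', 'Pune', None, 'Navi Mumbai'],
    'Salary': [50000, 75000, None, 55000]
}
df = pd.DataFrame(data)

# isna() returns True for every missing cell; sum() counts the True values
# in each column.
missing_per_column = df.isna().sum()
print(missing_per_column)
```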

PRACTICAL NO.: 3(A)

Aim : Calculate the sum of numbers from 1 to 10 using a loop.


Program :
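total = 0  # running total (the name avoids shadowing the built-in sum())
for i in range(1, 11):  # range(1, 11) yields 1, 2, ..., 10
    total = total + i
print('Sum of numbers from 1 to 10:', total)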

Output :

Explanation :
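In the above program,
a for loop iterates over range(1, 11), which yields the integers 1 to 10 (the
stop value 11 is excluded), and adds each number to a running total that is
printed at the end.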
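
The loop result can be cross-checked in two independent ways. This sketch compares it against Python's built-in sum() and the closed-form formula n(n+1)/2; the names n, total, builtin_total and formula_total are illustrative.

```python
n = 10

# Loop version: accumulate 1 + 2 + ... + n.
total = 0
for i in range(1, n + 1):
    total += i

# Cross-checks: built-in sum() and the closed-form formula n(n+1)/2.
builtin_total = sum(range(1, n + 1))
formula_total = n * (n + 1) // 2

print(total, builtin_total, formula_total)  # prints: 55 55 55
```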

PRACTICAL NO.: 3(B)

Aim : Create a DataFrame with duplicate rows and remove them using
pandas.

Program :
import pandas as pd

data = {'Name': ['omkar', 'yash', 'DK', 'vaibhav'],
        'age': [20, 21, 20, 21],
        'city': ['Ghansoli', 'Dombivali', 'Airoli', 'koparkhairane']}
df = pd.DataFrame(data)
print('Original DataFrame:')
print(df)

Output :

Explanation :
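In the above example,
we create a DataFrame with 'Name', 'age' and 'city' columns and print it. The
four sample rows shown are all distinct, so there is nothing to remove yet; once
duplicate rows are present, df.drop_duplicates() returns a copy with the repeated
rows removed.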
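
The removal step named in the aim can be sketched as follows. The rows here are hypothetical sample data in which one row is deliberately repeated; DataFrame.drop_duplicates() keeps the first occurrence of each fully identical row by default. The names duplicate_mask and df_unique are illustrative.

```python
import pandas as pd

# Hypothetical sample data: the last row repeats the first one exactly.
data = {'Name': ['omkar', 'yash', 'DK', 'omkar'],
        'age': [20, 21, 20, 20],
        'city': ['Ghansoli', 'Dombivali', 'Airoli', 'Ghansoli']}
df = pd.DataFrame(data)

# duplicated() marks rows that repeat an earlier row.
duplicate_mask = df.duplicated()

# drop_duplicates() returns a new DataFrame without the repeated rows;
# reset_index(drop=True) renumbers the remaining rows from 0.
df_unique = df.drop_duplicates().reset_index(drop=True)

print('Original DataFrame:')
print(df)
print('After removing duplicates:')
print(df_unique)
```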

PRACTICAL NO.: 4(A)

Aim : Plot a basic line graph showing the population growth over years.

Program :
import matplotlib.pyplot as plt
years=[2018,2019,2020,2021,2022]
population=[7.1,6.4,7.8,8.3,7.5]
plt.plot(years,population,marker='o',linestyle='-')
plt.xlabel('Year')
plt.ylabel('population(billions)')
plt.title('population growth over years')
plt.grid(True)
plt.show()

Output :

Explanation :
We define the years and the corresponding population data.

We then use plt.plot() to create a line graph, with years on the x-axis and
population on the y-axis. We use marker='o' to add markers at the data points
and linestyle='-' to connect the markers with solid lines. Finally, we add labels
for the x- and y-axes, a title and a grid, and display the graph with plt.show().
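
The marker and line style can also be given together as a Matplotlib format string. The following sketch uses the Axes interface instead of the plt.* calls; selecting the 'Agg' backend is an assumption made so the sketch runs without a display (remove that line and call plt.show() to view the figure). The names marker_used and linestyle_used are illustrative.

```python
import matplotlib
matplotlib.use('Agg')  # assumption: non-interactive backend, no window needed
import matplotlib.pyplot as plt

years = [2018, 2019, 2020, 2021, 2022]
population = [7.1, 6.4, 7.8, 8.3, 7.5]

fig, ax = plt.subplots()
# 'o-' is a format string: 'o' = circle markers, '-' = solid connecting line.
(line,) = ax.plot(years, population, 'o-')
ax.set_xlabel('Year')
ax.set_ylabel('Population (billions)')
ax.set_title('Population growth over years')
ax.grid(True)

# Read the style back from the line object to confirm what the string set.
marker_used = line.get_marker()
linestyle_used = line.get_linestyle()
print(marker_used, linestyle_used)
```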

PRACTICAL NO.: 4(B)

Aim : Create a bar chart that displays the sales of different products.

Program :
import matplotlib.pyplot as plt
products=['product A','product B','product C', 'product D']
sales=[2000,6700,3590,1200]
plt.bar(products,sales)
plt.xlabel('Products')
plt.ylabel('sales(units)')
plt.title('Sales of different products')
plt.xticks()
plt.tight_layout()
plt.show()
Output :

PRACTICAL NO.: 5
Aim : Generate a scatter plot for two variables showing their relationship.

Program :
import matplotlib.pyplot as plt
variable1=[1,2,3,4,5,6,7,8,9,10]
variable2=[2,3,5,7,11,13,17,19,23,29]
plt.scatter(variable1,variable2,label='Data points',color='blue',marker='o')
plt.xlabel('Variable 1')
plt.ylabel('Variable 2')
plt.title('Scatter plot of Variable 1 vs. Variable 2')
plt.legend()
plt.grid(True)
plt.show()

Output:

PRACTICAL NO.: 6
Aim : Generate a box plot to visualise the distribution of exam scores.

Program :
import seaborn as sns
import matplotlib.pyplot as plt
exam_score=[78,85,90,88,92,75,82,95,88,76,89,93,80]
sns.boxplot(y=exam_score, color="lightblue")  # y= draws a vertical box, so the scores lie on the labelled y-axis
plt.ylabel('Exam scores')
plt.title('Box plot of exam scores')
plt.grid(axis='y',linestyle='-',alpha=0.7)
plt.show()
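
A box plot is drawn from the quartiles of the data, so it can help to compute them directly. This sketch is an addition to the program above; it uses NumPy's percentile function with its default linear interpolation and applies the conventional 1.5 x IQR rule for outliers. The names q1, q3, iqr, lower_fence, upper_fence and outliers are illustrative.

```python
import numpy as np

exam_score = [78, 85, 90, 88, 92, 75, 82, 95, 88, 76, 89, 93, 80]

# Quartiles: the box spans Q1 to Q3, with a line at the median.
q1, median, q3 = np.percentile(exam_score, [25, 50, 75])
iqr = q3 - q1  # interquartile range = height of the box

# Points beyond 1.5 * IQR from the box are conventionally drawn as outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [s for s in exam_score if s < lower_fence or s > upper_fence]

print(q1, median, q3, iqr, outliers)
```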

PRACTICAL NO.: 7
Aim : Create a scatter plot to explore the relationship between two numeric
variables.

Program :

import matplotlib.pyplot as plt

variable1 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
variable2 = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]

plt.scatter(variable1, variable2, color='blue', marker='o')
plt.xlabel('Variable 1')
plt.ylabel('Variable 2')
plt.title('Scatter Plot of Variable 1 vs. Variable 2')
plt.grid(True)
plt.show()

Output :

Explanation :

import matplotlib.pyplot as plt: This line imports the Matplotlib library, which
is used for creating various types of plots and visualisations.

variable1 and variable2: These are two Python lists containing the values of
'Variable 1' and 'Variable 2'. Each list represents a set of data points. In this
example, 'Variable 1' has the values 1 through 10, and 'Variable 2' contains the
first ten prime numbers.

plt.scatter(variable1, variable2, color='blue', marker='o'): This line creates a
scatter plot. Here's what each parameter does:

variable1 and variable2: These parameters specify the data to be plotted on the x
and y axes, respectively. In this case, 'Variable 1' is on the x-axis, and 'Variable
2' is on the y-axis.
color='blue': This sets the color of the scatter points to blue.
marker='o': This specifies that circular markers (dots) should be used for the
data points.
plt.xlabel('Variable 1') and plt.ylabel('Variable 2'): These lines set labels for
the x and y axes, indicating that 'Variable 1' is on the x-axis and 'Variable 2' is
on the y-axis.

plt.title('Scatter Plot of Variable 1 vs. Variable 2'): This line sets the title of the
plot to "Scatter Plot of Variable 1 vs. Variable 2."

plt.grid(True): This command adds a grid to the plot to help in reading and
interpreting the data points.

plt.show(): This line displays the scatter plot on the screen. It is needed when
the program runs as a script; notebook environments often render the figure
without it.
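
The relationship that the scatter plot shows visually can also be summarised as a single number. The following sketch is an addition to the program above: it computes the Pearson correlation coefficient with NumPy's corrcoef function, where a value close to 1 indicates a strong positive linear relationship. The name r is illustrative.

```python
import numpy as np

variable1 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
variable2 = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]

# corrcoef returns a 2x2 correlation matrix; the off-diagonal entry is the
# correlation between the two variables.
r = np.corrcoef(variable1, variable2)[0, 1]
print(f"Pearson correlation: {r:.3f}")
```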

PRACTICAL NO.: 8(A)

Aim : Calculate and visualise the mean and median of a dataset using pandas and
Matplotlib.

Program :

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

data = {'Values': [12, 24, 36, 48, 60, 72, 84, 96, 108, 120]}
df = pd.DataFrame(data)

mean_value = df['Values'].mean()
median_value = df['Values'].median()
print(f"Mean: {mean_value}")
print(f"Median: {median_value}")

plt.figure(figsize=(8, 6))
plt.boxplot(df['Values'], vert=False, widths=0.4)
plt.scatter([mean_value], [1], color='red', marker='o', label='Mean')
plt.scatter([median_value], [1], color='blue', marker='x', label='Median')
plt.xlabel('Values')
plt.title('Box Plot with Mean and Median')
plt.legend()
plt.grid(True)
plt.show()

Output :

Explanation :

Imports the necessary libraries:

import pandas as pd: Imports the Pandas library for data manipulation.

import numpy as np: Imports the NumPy library for numerical operations.

import matplotlib.pyplot as plt: Imports Matplotlib for creating plots.

Creates a dictionary 'data' containing a single column of data called 'Values'
with ten values ranging from 12 to 120.

Converts the 'data' dictionary into a Pandas DataFrame named 'df'.

Calculates the mean and median of the 'Values' column in the DataFrame:

mean_value = df['Values'].mean(): Calculates the mean of the 'Values' column.

median_value = df['Values'].median(): Calculates the median of the 'Values'
column.

Prints the mean and median values.

Sets up a Matplotlib figure with a specific size (8 inches wide and 6 inches
high) using plt.figure(figsize=(8, 6)).

Creates a boxplot of the 'Values' column using plt.boxplot(df['Values'],
vert=False, widths=0.4). The vert=False argument specifies that the boxplot
should be horizontal, and widths=0.4 sets the width of the boxes.

Adds two scatter points to the plot to mark the mean and median:

plt.scatter([mean_value], [1], color='red', marker='o', label='Mean'): Adds a
red circular marker for the mean value.

plt.scatter([median_value], [1], color='blue', marker='x', label='Median'): Adds a
blue 'x' marker for the median value.

Sets the x-axis label to 'Values' using plt.xlabel('Values').

Sets the title of the plot to 'Box Plot with Mean and Median' with plt.title('Box
Plot with Mean and Median').

Adds a legend to the plot to distinguish the mean and median markers using
plt.legend().

Enables the grid for the plot with plt.grid(True).

Finally, displays the plot with plt.show().
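
The two statistics can be checked numerically before plotting. This sketch uses the same data as the program; because the ten values are evenly spaced, the mean and the median coincide, so the two markers are drawn at the same position on the plot. The name markers_overlap is illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Values': [12, 24, 36, 48, 60, 72, 84, 96, 108, 120]})

mean_value = df['Values'].mean()      # sum of the values / number of values
median_value = df['Values'].median()  # average of the two middle values (60, 72)

# For evenly spaced data the two statistics are equal.
markers_overlap = mean_value == median_value
print(mean_value, median_value, markers_overlap)
```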

PRACTICAL NO.: 8(B)

Aim : Choose a small dataset and create a bar chart to show different categories.

Program :

import pandas as pd
import matplotlib.pyplot as plt

data = {
    'Product': ['Product A', 'Product B', 'Product C', 'Product D'],
    'Sales': [1200, 850, 950, 1100]
}
df = pd.DataFrame(data)

plt.figure(figsize=(8, 6))
plt.bar(df['Product'], df['Sales'], color='skyblue')
plt.xlabel('Product')
plt.ylabel('Sales')
plt.title('Product Sales')
plt.xticks(rotation=15)
plt.grid(axis='y', linestyle='--', alpha=0.7)
plt.show()

Output :

Explanation :

Import the necessary libraries:

import pandas as pd: Imports the Pandas library for data manipulation.

import matplotlib.pyplot as plt: Imports Matplotlib for creating plots.

Create a dictionary 'data' with two columns: 'Product' and 'Sales,' representing
the product names and their corresponding sales values.

Convert the 'data' dictionary into a Pandas DataFrame named 'df.'

Set up a Matplotlib figure with a specific size (8 inches wide and 6 inches high)
using plt.figure(figsize=(8, 6)).

Create a bar chart using plt.bar(df['Product'], df['Sales'], color='skyblue'):

df['Product'] is used as the x-axis (product names).

df['Sales'] is used as the y-axis (sales values).

color='skyblue' sets the color of the bars to sky blue.

Set the x-axis label to 'Product' with plt.xlabel('Product').

Set the y-axis label to 'Sales' with plt.ylabel('Sales').

Set the title of the plot to 'Product Sales' using plt.title('Product Sales').

Rotate the x-axis labels by 15 degrees for better readability with
plt.xticks(rotation=15).

Enable gridlines on the y-axis with a dashed line style and a transparency level
of 0.7 using plt.grid(axis='y', linestyle='--', alpha=0.7).

Finally, display the bar chart with plt.show().
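
Bar charts of categories are often easier to compare when the bars are ordered by value. The following sketch is a variation on the steps above, not part of the original program: it sorts the DataFrame by 'Sales' with sort_values() before plotting. Selecting the 'Agg' backend is an assumption made so the sketch runs without a display; the names df_sorted and bar_heights are illustrative.

```python
import matplotlib
matplotlib.use('Agg')  # assumption: non-interactive backend, no window needed
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    'Product': ['Product A', 'Product B', 'Product C', 'Product D'],
    'Sales': [1200, 850, 950, 1100]
})

# Sort so the best-selling product is drawn first.
df_sorted = df.sort_values('Sales', ascending=False)

fig, ax = plt.subplots(figsize=(8, 6))
bars = ax.bar(df_sorted['Product'], df_sorted['Sales'], color='skyblue')
ax.set_xlabel('Product')
ax.set_ylabel('Sales')
ax.set_title('Product Sales (sorted)')
ax.grid(axis='y', linestyle='--', alpha=0.7)

# Each bar's height equals the sales value it represents.
bar_heights = [bar.get_height() for bar in bars]
print(df_sorted['Product'].tolist(), bar_heights)
```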

PRACTICAL NO.: 9

Aim : Plot a line graph showing the temperature variation over days.

Program :

import matplotlib.pyplot as plt
import pandas as pd

days = ['Day 1', 'Day 2', 'Day 3', 'Day 4', 'Day 5']
temperatures = [75, 78, 82, 79, 83]
data = pd.DataFrame({'Day': days, 'Temperature': temperatures})

plt.figure(figsize=(8, 6))
plt.plot(data['Day'], data['Temperature'], marker='o', linestyle='-', color='b',
         markersize=8)
plt.title('Temperature Variation Over Days')
plt.xlabel('Days')
plt.ylabel('Temperature (°F)')
plt.grid(True)
plt.show()

Output :

Explanation :

Import the necessary libraries:

import matplotlib.pyplot as plt: Imports the Matplotlib library for creating plots.

import pandas as pd: Imports the Pandas library for data manipulation.

Create two lists:

days: A list of strings representing the days, from 'Day 1' to 'Day 5'.

temperatures: A list of numerical values representing the temperature in degrees
Fahrenheit for each respective day.

Create a Pandas DataFrame named 'data' by combining the 'days' and
'temperatures' lists:

'Day': The 'days' list becomes the 'Day' column in the DataFrame.

'Temperature': The 'temperatures' list becomes the 'Temperature' column in the
DataFrame.

Set up a Matplotlib figure with a specific size (8 inches wide and 6 inches high)
using plt.figure(figsize=(8, 6)).

Create a line plot using plt.plot(data['Day'], data['Temperature'],
marker='o', linestyle='-', color='b', markersize=8):

data['Day'] is used as the x-axis, representing the days.

data['Temperature'] is used as the y-axis, representing the temperature values.

marker='o' specifies that circular markers should be placed at data points.

linestyle='-' specifies that the line connecting the data points should be solid.

color='b' sets the color of the line and markers to blue.

markersize=8 sets the size of the markers to 8 points.

Set the title of the plot to 'Temperature Variation Over Days' using
plt.title('Temperature Variation Over Days').

Set the x-axis label to 'Days' with plt.xlabel('Days').

Set the y-axis label to 'Temperature (°F)' with plt.ylabel('Temperature (°F)').

Enable grid lines on the plot with plt.grid(True).

Finally, display the line plot using plt.show().
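
The variation that the line graph shows can also be tabulated. This sketch is an addition to the program above: it uses Series.diff() to compute the day-to-day change and idxmax() to locate the warmest day. The 'Change' column and the name hottest_day are illustrative.

```python
import pandas as pd

days = ['Day 1', 'Day 2', 'Day 3', 'Day 4', 'Day 5']
temperatures = [75, 78, 82, 79, 83]
data = pd.DataFrame({'Day': days, 'Temperature': temperatures})

# diff() subtracts the previous row's value; the first day has no
# predecessor, so its change is NaN.
data['Change'] = data['Temperature'].diff()

# idxmax() gives the row label of the highest temperature.
hottest_day = data.loc[data['Temperature'].idxmax(), 'Day']

print(data)
print('Hottest day:', hottest_day)
```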
