Be A 65 Ads Exp 2


Experiment No. 02
PART B
Roll No.: A7 Name: Ritika Dwivedi
Class: BE-A(COMP) Batch: A1
Date of Experiment: 16-02-24 Date of Submission: 16-02-24
Grade:

Aim: To implement any five data visualization techniques.

B.1 Software Code written by student:

!pip install -q kaggle


import numpy as np
import pandas as pd
import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))
from google.colab import files
files.upload()
!mkdir -p ~/.kaggle
!cp kaggle.json ~/.kaggle/
!chmod 600 ~/.kaggle/kaggle.json
!kaggle datasets download -d shree1992/housedata
!unzip housedata.zip
data=pd.read_csv('data.csv')
data
df=pd.DataFrame(data)
df
Line Graph
import pandas as pd
import matplotlib.pyplot as plt

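# Selecting the data: average price for each bedroom count, so the line
# has one point per x-value and runs in increasing bedroom order
avg_price = data.groupby('bedrooms')['price'].mean()
x_values = avg_price.index # X-axis: Number of bedrooms
y_values = avg_price.values # Y-axis: Average price

# Plotting the data
plt.plot(x_values, y_values, marker='o', linestyle='-', color='red')

# Adding title and labels
plt.title('Average Price vs. Number of Bedrooms')
plt.xlabel('Number of Bedrooms')
plt.ylabel('Average Price')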
# Displaying the plot
plt.show()

import pandas as pd
import matplotlib.pyplot as plt

# Convert 'date' column to datetime format for better plotting


data['date'] = pd.to_datetime(data['date'])

# Sorting data by 'date' to make the line graph meaningful


data.sort_values('date', inplace=True)

# Selecting attributes to plot against 'date'. Example: 'price'


plt.figure(figsize=(10, 6))
plt.plot(data['date'], data['price'], marker='', linestyle='-', label='Price',color='green')

# Adding title and labels


plt.title('Price Over Time')
plt.xlabel('Date')
plt.ylabel('Price')

# Optionally, add a legend if you plot multiple lines


plt.legend()

plt.xticks(rotation=45) # Rotate date labels for better readability


plt.tight_layout() # Adjust the layout to make room for the rotated date labels
plt.show()

import pandas as pd
import matplotlib.pyplot as plt

# Selecting attributes for the line graph


x_attribute = 'sqft_living' # X-axis: Square footage of living space
y_attribute = 'price' # Y-axis: Price

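# Plotting the line graph (sorted by the x attribute so the line runs left to right)
sorted_data = data.sort_values(x_attribute)
plt.figure(figsize=(10, 6))
plt.plot(sorted_data[x_attribute], sorted_data[y_attribute], marker='o', linestyle='-', color='orange')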

# Adding title and labels


plt.title(f'{y_attribute.capitalize()} vs. {x_attribute.capitalize()}')
plt.xlabel(f'{x_attribute.capitalize()}')
plt.ylabel(f'{y_attribute.capitalize()}')

plt.grid(True) # Add grid for better readability

plt.tight_layout()
plt.show()
ScatterPlot
import pandas as pd
import matplotlib.pyplot as plt

# Define variables
x = data['sqft_living']
y = data['price']
size_attribute = 'bathrooms' # Using 'bathrooms' to adjust the size of the points

# Normalize the size attribute to the range 0-100 for better visualization
size_range = data[size_attribute].max() - data[size_attribute].min()
sizes = (data[size_attribute] - data[size_attribute].min()) / size_range * 100

# Plotting the scatter plot with a single color
# (no cmap here: a colormap only applies when per-point values are passed via c=)
plt.figure(figsize=(12, 8))
scatter = plt.scatter(x, y, s=sizes, alpha=1, edgecolors='w', linewidth=1, color='green')

# Adding title and labels


plt.title('House Price by Sqft Living, Sized by Bathrooms')
plt.xlabel('Sqft Living')
plt.ylabel('Price')

plt.grid(True)
plt.show()
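
A side effect of min-max scaling the marker sizes is that the smallest value maps to a size of 0,
so those points are not drawn. The sketch below shows one possible way around this, under
assumptions that are not part of the original code: `scale_sizes` is a hypothetical helper, the
bounds 10 and 100 are arbitrary, and the bathroom values are made up for illustration.

```python
import numpy as np

def scale_sizes(values, smallest=10.0, largest=100.0):
    """Min-max scale values into [smallest, largest] (hypothetical helper)."""
    values = np.asarray(values, dtype=float)
    span = values.max() - values.min()
    if span == 0:
        # All values identical: give every point the same mid-range size
        return np.full(values.shape, (smallest + largest) / 2)
    unit = (values - values.min()) / span  # rescaled to [0, 1]
    return smallest + unit * (largest - smallest)

# Made-up bathroom counts, not taken from the dataset
bathrooms_example = [1.0, 1.5, 2.0, 3.0]
example_sizes = scale_sizes(bathrooms_example)
print(example_sizes)  # smallest value maps to 10, largest to 100
```

The result could then be passed as `s=` to `plt.scatter` in place of the 0-100 sizes.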
Bubble Chart
import pandas as pd
import matplotlib.pyplot as plt

# Define variables
x = data['sqft_living']
y = data['price']
bedrooms = data['bedrooms']
bathrooms = data['bathrooms']

# Set bubble sizes based on the number of bedrooms


sizes = bedrooms * 10 # Adjust the scaling factor as needed

# Set bubble colors based on the number of bathrooms


colors = bathrooms

# Plotting the bubble chart


plt.figure(figsize=(12, 8))
bubble = plt.scatter(x, y, s=sizes, c=colors, cmap='viridis', alpha=0.6, edgecolors='w', linewidth=1)

# Adding color bar to represent the 'bathrooms' attribute


cbar = plt.colorbar(bubble)
cbar.set_label('Number of Bathrooms')

# Adding title and labels


plt.title('House Price vs. Sqft Living with Bedrooms and Bathrooms')
plt.xlabel('Sqft Living')
plt.ylabel('Price')

plt.grid(True)
plt.show()
Histogram
import pandas as pd
import matplotlib.pyplot as plt

# Selecting the variable of interest


prices = data['price']

# Creating the histogram


plt.figure(figsize=(10, 6))
plt.hist(prices, bins=30, color='skyblue', edgecolor='black')

# Adding titles and labels


plt.title('Distribution of House Prices')
plt.xlabel('Price')
plt.ylabel('Frequency')

plt.show()

import pandas as pd
import matplotlib.pyplot as plt

# Selecting the variable of interest


bedrooms = data['bedrooms']

# One more than the largest bedroom count (truncated to an integer), used for the last bin edge
max_bedrooms = int(bedrooms.max() + 1)

# Creating the histogram


plt.figure(figsize=(10, 6))
plt.hist(bedrooms, bins=range(1, max_bedrooms + 1), color='orange', edgecolor='black')

# Adding titles and labels


plt.title('Distribution of Number of Bedrooms')
plt.xlabel('Number of Bedrooms')
plt.ylabel('Frequency')

plt.xticks(range(1, max_bedrooms + 1)) # Set x-axis ticks to integers only


plt.show()
Density Plot
import seaborn as sns
import matplotlib.pyplot as plt

# Select the column for which you want to create the density chart
column_to_plot = 'price'

# Set the style for the plot (optional)


sns.set(style="whitegrid")

# Create a density chart using seaborn's kdeplot
# (palette only applies together with hue, so a single color is passed instead)
sns.kdeplot(data[column_to_plot], fill=True, color='red')

# Add title and labels


plt.title(f'Density Chart for {column_to_plot}')
plt.xlabel(column_to_plot)
plt.ylabel('Density')

# Show the plot


plt.show()

B.2 Input and Output:

Line Chart:
ScatterPlot:
Bubble Chart:

Histogram:
Density Plot:
B.3 Observations and learning:
We observed several visualization techniques in Python (line graph, scatter plot, bubble chart,
histogram, and density plot) and applied them to a house price dataset. Python offers several
plotting libraries, such as Matplotlib, Seaborn, Plotly, Bokeh, and the plotting interface built
into Pandas. They support different types of graphs and offer different levels of complexity
and flexibility, so it is important to choose the right one depending on your needs and the type of
data you want to visualize.
Data visualization techniques are useful for understanding complex data patterns, communicating
information, identifying outliers and anomalies, spotting areas for improvement, and making
data-driven decisions.

B.4 Conclusion:
Hence, we can conclude that we have seen the different types of visualizations available in the
Python programming language.

B.5 Question of Curiosity:


Q1: What is data visualization?
Answer: Data visualization is the graphical representation of data and information using visual
elements such as charts, graphs, and maps. It allows people to easily interpret and understand
complex data sets and patterns. Data visualization can help to identify trends, patterns, and
relationships in data that may not be immediately apparent through tables or raw data. It can also
be used to highlight anomalies and outliers in data, and to communicate insights and findings to
others in a clear and concise manner. Effective data visualization involves choosing the
appropriate visual representation for the data and presenting it in a way that is visually appealing
and easy to understand. It is an important tool for decision making and problem solving in fields
such as business, science, and engineering.
Q2: What is the difference between a line plot and a scatter plot?
Answer:
• The key difference between a line plot and a scatter plot is in the way they represent the
relationship between two variables.
• A line plot shows the trend or pattern in the data over time or across a range of values. It
is used to represent continuous data, where consecutive values are connected by a line. Line plots
are typically used to represent data that changes smoothly over time or space, such as
stock prices or temperature fluctuations.
• A scatter plot shows the relationship between two variables in a set of data. It represents
individual observations, where each data point is plotted as a separate point on the graph with no
connecting line. Scatter plots are typically used to identify patterns or trends in data, to
determine the strength of the relationship between two variables, or to identify outliers or
anomalies in the data.
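
The contrast can be seen by drawing the same ordered series both ways. The sketch below is
illustrative only: the temperature readings are made up, and the `Agg` backend and the output
file name are assumptions chosen so the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: render off-screen so no display is needed
import matplotlib.pyplot as plt

# Made-up temperature readings, already ordered by hour
hours = [0, 3, 6, 9, 12, 15, 18, 21]
temps = [14.0, 13.2, 12.8, 16.5, 21.0, 22.4, 19.1, 16.0]

fig, (ax_line, ax_scatter) = plt.subplots(1, 2, figsize=(10, 4), sharey=True)

# Line plot: consecutive points are joined, emphasizing the trend over time
ax_line.plot(hours, temps, marker='o', linestyle='-')
ax_line.set_title('Line plot')
ax_line.set_xlabel('Hour of day')
ax_line.set_ylabel('Temperature (°C)')

# Scatter plot: each reading is an isolated point with no connecting line
ax_scatter.scatter(hours, temps)
ax_scatter.set_title('Scatter plot')
ax_scatter.set_xlabel('Hour of day')

fig.tight_layout()
fig.savefig('line_vs_scatter.png')  # hypothetical output file name
```

In a notebook, the `matplotlib.use("Agg")` line and `savefig` call could be dropped in favor of
`plt.show()`.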

Q3: What is histogram and density plot?


Answer: A histogram is a graphical representation of the distribution of a set of continuous
data. It works by dividing the data into intervals or bins, and then counting the number of data
points that fall within each bin. The resulting plot displays the count of data points in each bin
as a bar, with the height of the bar representing the number of data points that fall within that
interval. A density plot, on the other hand, is a smoothed version of a histogram that
estimates the probability density function of the underlying data. It works by fitting a density
estimate (typically a kernel density estimate) to the data, which is then plotted as a curve. The
resulting plot shows the relative frequency of data points at different values, with the area
under the curve over a given range representing the probability that a data point falls within
that range.
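
Both ideas can be sketched numerically without plotting. The example below is a minimal
illustration, not the method used by any particular library: the sample values are made up,
`gaussian_kde_sketch` is a hypothetical helper, and the bandwidth of 1.0 is an arbitrary choice
(seaborn's `kdeplot` picks a bandwidth automatically by default).

```python
import numpy as np

def gaussian_kde_sketch(samples, grid, bandwidth):
    """Average of Gaussian bumps centred on each sample (hypothetical helper)."""
    samples = np.asarray(samples, dtype=float)
    grid = np.asarray(grid, dtype=float)
    # Standardized distance from every grid point to every sample
    z = (grid[:, None] - samples[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z**2) / np.sqrt(2 * np.pi)
    return kernels.mean(axis=1) / bandwidth

# Made-up sample values for illustration
values = np.array([2.1, 2.4, 2.9, 3.0, 3.3, 3.8, 4.6, 5.2, 5.9, 7.5])

# Histogram: count how many values fall in each of 4 equal-width bins
counts, edges = np.histogram(values, bins=4)
print("bin edges:", edges)
print("counts:   ", counts)

# Density estimate evaluated on a grid that extends past the data
grid = np.linspace(values.min() - 5, values.max() + 5, 2001)
density = gaussian_kde_sketch(values, grid, bandwidth=1.0)

# The area under the density curve should be approximately 1
area = np.sum(density) * (grid[1] - grid[0])
print("area under curve:", round(float(area), 3))
```

The histogram counts always sum to the number of samples, while the density curve integrates to
about 1, which is why the two plots use different y-axis scales.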
