Be A 65 Ads Exp 2

Experiment No. 02
PART B
Roll No.: A7 Name: Ritika Dwivedi
Class: BE-A(COMP) Batch: A1
Date of Experiment: 16-02-24 Date of Submission: 16-02-24
Grade:
Aim: To implement any five Data visualization Techniques.
B.1 Software Code written by student:
!pip install -q kaggle

import numpy as np
import pandas as pd
import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))
from google.colab import files
files.upload()
!mkdir -p ~/.kaggle
!cp kaggle.json ~/.kaggle/
!chmod 600 ~/.kaggle/kaggle.json
!kaggle datasets download -d shree1992/housedata
!unzip housedata.zip
data = pd.read_csv('data.csv')  # read_csv already returns a DataFrame
data
df = pd.DataFrame(data)  # redundant wrapper; the plots below use `data`
df
Line Graph
import pandas as pd
import matplotlib.pyplot as plt
# Selecting the data

x_values = data['bedrooms'] # X-axis: Number of bedrooms
y_values = data['price'] # Y-axis: Price
# Plotting the data

plt.plot(x_values, y_values, marker='o', linestyle='-',color='red')
# Adding title and labels

plt.title('Price vs. Number of Bedrooms')
plt.xlabel('Number of Bedrooms')
plt.ylabel('Price')
# Displaying the plot
plt.show()
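The line graph above connects the rows in file order, so repeated bedroom counts make the line double back on itself. One possible alternative, sketched here under the assumption that a single average price per bedroom count is what the chart should show, is to aggregate before plotting. The `sample` frame and the helper name `mean_price_by_bedrooms` are hypothetical and only stand in for the real dataset.

```python
import pandas as pd

# Synthetic stand-in for the housing data (values are made up for illustration).
sample = pd.DataFrame({
    "bedrooms": [3, 2, 3, 4, 2, 4],
    "price": [300000, 200000, 340000, 500000, 220000, 460000],
})

def mean_price_by_bedrooms(frame: pd.DataFrame) -> pd.Series:
    """Return the mean price per bedroom count, ordered by bedroom count."""
    return frame.groupby("bedrooms")["price"].mean().sort_index()

summary = mean_price_by_bedrooms(sample)
print(summary)
```

The result can be passed to the same plotting call as before, for example plt.plot(summary.index, summary.values, marker='o').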
import pandas as pd
# Convert 'date' column to datetime format for better plotting

data['date'] = pd.to_datetime(data['date'])
# Sorting data by 'date' to make the line graph meaningful

data.sort_values('date', inplace=True)
# Selecting attributes to plot against 'date'. Example: 'price'

plt.figure(figsize=(10, 6))
plt.plot(data['date'], data['price'], marker='', linestyle='-', label='Price',color='green')

plt.title('Price Over Time')
plt.xlabel('Date')
plt.ylabel('Price')
# Optionally, add a legend if you plot multiple lines

plt.legend()
plt.xticks(rotation=45) # Rotate date labels for better readability

plt.tight_layout() # Adjust the layout to make room for the rotated date labels
plt.show()
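Plotting every sale against its date can produce a very jagged line. As a rough sketch, and assuming a weekly average is an acceptable summary, the dates can be converted, sorted, and resampled before plotting. The `sample` frame below is synthetic and the weekly frequency is an illustrative choice, not something the experiment prescribes.

```python
import pandas as pd

# Synthetic stand-in for the 'date' and 'price' columns (values are made up).
sample = pd.DataFrame({
    "date": ["2014-05-10", "2014-05-02", "2014-05-03"],
    "price": [450000.0, 300000.0, 500000.0],
})

# Convert to datetime, sort chronologically, then average prices per week.
sample["date"] = pd.to_datetime(sample["date"])
weekly_mean = (
    sample.sort_values("date")
          .set_index("date")["price"]
          .resample("W")
          .mean()
)
print(weekly_mean)
```

The resampled series can then be drawn with plt.plot(weekly_mean.index, weekly_mean.values).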
import pandas as pd
# Selecting attributes for the line graph

x_attribute = 'sqft_living' # X-axis: Square footage of living space
y_attribute = 'price' # Y-axis: Price
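# Plotting the line graph (rows sorted by the x attribute so the line runs left to right)

ordered = data.sort_values(x_attribute)
plt.plot(ordered[x_attribute], ordered[y_attribute], marker='o', linestyle='-', color='orange')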

plt.title(f'{y_attribute.capitalize()} vs. {x_attribute.capitalize()}')
plt.xlabel(f'{x_attribute.capitalize()}')
plt.ylabel(f'{y_attribute.capitalize()}')
plt.grid(True) # Add grid for better readability
plt.tight_layout()
plt.show()
Scatter Plot
import pandas as pd
# Define variables
x = data['sqft_living']
y = data['price']
size_attribute = 'bathrooms' # Using 'bathrooms' to adjust the size of the points
# Normalize the size attribute to the range 10-110 so the smallest value stays visible

size_values = data[size_attribute]
sizes = (size_values - size_values.min()) / (size_values.max() - size_values.min()) * 100 + 10
# Plotting the scatter plot with a single color (cmap dropped: it is ignored without c=)

scatter = plt.scatter(x, y, s=sizes, alpha=1, edgecolors='w', linewidth=1, color='green')

plt.title('House Price by Sqft Living, Sized by Bathrooms')
plt.xlabel('Sqft Living')
plt.ylabel('Price')
plt.grid(True)
plt.show()
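The size normalization used for the scatter plot is a min-max rescaling. A minimal sketch of it as a reusable helper is given below; the function name `scale_marker_sizes` and the default size range are assumptions chosen for illustration, and the constant-column case is handled by returning a mid-range size.

```python
import numpy as np

def scale_marker_sizes(values, min_size=10.0, max_size=110.0):
    """Min-max rescale values into [min_size, max_size] for use as marker sizes."""
    values = np.asarray(values, dtype=float)
    span = values.max() - values.min()
    if span == 0:
        # All values identical: avoid dividing by zero, use a mid-range size.
        return np.full(values.shape, (min_size + max_size) / 2)
    unit = (values - values.min()) / span  # now in [0, 1]
    return min_size + unit * (max_size - min_size)

# Hypothetical bathroom counts.
sizes = scale_marker_sizes([1.0, 1.5, 2.5, 3.0])
print(sizes)
```

Keeping the lower bound above zero ensures the observation with the smallest value is still drawn.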
Bubble Chart
import pandas as pd
# Define variables
x = data['sqft_living']
y = data['price']
bedrooms = data['bedrooms']
bathrooms = data['bathrooms']
# Set bubble sizes based on the number of bedrooms

sizes = bedrooms * 10 # Adjust the scaling factor as needed
# Set bubble colors based on the number of bathrooms

colors = bathrooms
# Plotting the bubble chart

bubble = plt.scatter(x, y, s=sizes, c=colors, cmap='viridis', alpha=0.6, edgecolors='w', linewidth=1)
# Adding color bar to represent the 'bathrooms' attribute

cbar = plt.colorbar(bubble)
cbar.set_label('Number of Bathrooms')

plt.title('House Price vs. Sqft Living with Bedrooms and Bathrooms')
plt.xlabel('Sqft Living')
plt.ylabel('Price')
plt.grid(True)
plt.show()
Histogram
import pandas as pd
# Selecting the variable of interest

prices = data['price']
# Creating the histogram

plt.hist(prices, bins=30, color='skyblue', edgecolor='black')
# Adding titles and labels

plt.title('Distribution of House Prices')
plt.xlabel('Price')
plt.ylabel('Frequency')
plt.show()
import pandas as pd
# Selecting the variable of interest

bedrooms = data['bedrooms']
# Integer bounds for the bin edges: start at the smallest value and go one past the largest

min_bedrooms = int(bedrooms.min())
max_bedrooms = int(bedrooms.max() + 1)
# Creating the histogram

plt.hist(bedrooms, bins=range(min_bedrooms, max_bedrooms + 1), color='orange', edgecolor='black')
# Adding titles and labels

plt.title('Distribution of Number of Bedrooms')
plt.xlabel('Number of Bedrooms')
plt.ylabel('Frequency')
plt.xticks(range(min_bedrooms, max_bedrooms + 1)) # Set x-axis ticks to integers only

plt.show()
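For a count-like variable such as bedrooms, one common option is to centre each bin on an integer so that every bar sits directly over its tick. The sketch below shows one way to build such edges; the helper name `integer_bin_edges` and the sample values are hypothetical.

```python
import numpy as np

def integer_bin_edges(values):
    """Return bin edges at k - 0.5 so that each bin is centred on an integer k."""
    values = np.asarray(values)
    lo = int(np.floor(values.min()))
    hi = int(np.ceil(values.max()))
    return np.arange(lo, hi + 2) - 0.5

# Hypothetical bedroom counts, including a zero-bedroom listing.
bedrooms = [0, 2, 3, 3, 4, 3, 2]
edges = integer_bin_edges(bedrooms)
counts, _ = np.histogram(bedrooms, bins=edges)
print(edges)
print(counts)
```

The same edges can be passed to plt.hist through its bins argument.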
Density Plot
import seaborn as sns
# Select the column for which you want to create the density chart
column_to_plot = 'price'
# Set the style for the plot (optional)

sns.set(style="whitegrid")
# Create a density chart using seaborn's kdeplot

sns.kdeplot(data[column_to_plot], fill=True, color='red')  # palette dropped: it only applies with hue=
# Add title and labels

plt.title(f'Density Chart for {column_to_plot}')
plt.xlabel(column_to_plot)
plt.ylabel('Density')
# Show the plot

plt.show()
B.2 Input and Output:
Line Chart:
Scatter Plot:
Bubble Chart:
Histogram:
Density Plot:
B.3 Observations and learning:
We observed several visualization techniques in Python and used them to explore the housing
dataset. Python provides libraries such as Matplotlib, Seaborn, Plotly, Bokeh, and Pandas for
visualizing data. These libraries support different chart types and offer different levels of
complexity and flexibility, so it is important to choose the one that fits the task and the type of
data being visualized.
Data visualization techniques are useful for understanding complex data patterns, communicating
information, identifying outliers and anomalies, spotting areas for improvement, and making
data-driven decisions.
B.4 Conclusion:
Hence, we implemented five data visualization techniques (line graph, scatter plot, bubble chart,
histogram, and density plot) using the Python programming language.
B.5 Question of Curiosity:

Q1: What is data visualization?
Answer: Data visualization is the graphical representation of data and information using visual
elements such as charts, graphs, and maps. It allows people to easily interpret and understand
complex data sets and patterns. Data visualization can help to identify trends, patterns, and
relationships in data that may not be immediately apparent through tables or raw data. It can also
be used to highlight anomalies and outliers in data, and to communicate insights and findings to
others in a clear and concise manner. Effective data visualization involves choosing the
appropriate visual representation for the data and presenting it in a way that is visually appealing
and easy to understand. It is an important tool for decision making and problem solving in fields
such as business, science, and engineering.
Q2: What is the difference between a line plot and a scatter plot?
Answer:
• The key difference between a line plot and a scatter plot is in the way they represent the
relationship between two variables.
• A line plot shows the trend or pattern in the data over time or across a range of values. It
is used for ordered, continuous data, where consecutive values are connected by a line. Line
plots are typically used for data that changes smoothly over time or space, such as
stock prices or temperature fluctuations.
• A scatter plot shows the relationship between two variables in a set of data. Each
observation is plotted as an individual point on the graph, with no connecting line. Scatter
plots are typically used to identify patterns or trends in data, to judge the strength of the
relationship between two variables, or to identify outliers or anomalies in the data.
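The strength of the relationship that a scatter plot shows visually can also be summarized with a number. The sketch below uses the Pearson correlation coefficient as one such measure; this is an assumption about what "strength" means here, and it only captures linear relationships. The helper name `pearson_r` and all values are made up for illustration.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.corrcoef(x, y)[0, 1])

sqft = [800, 1200, 1500, 2000, 2600]
price_linear = [160000, 240000, 300000, 400000, 520000]  # exactly 200 * sqft
price_flat = [300000, 310000, 290000, 310000, 300000]    # barely related to sqft

r_linear = pearson_r(sqft, price_linear)
r_flat = pearson_r(sqft, price_flat)
print(r_linear, r_flat)
```

A value near 1 or -1 corresponds to points lying close to a straight line in the scatter plot, while a value near 0 corresponds to a cloud with no clear linear trend.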
Q3: What are a histogram and a density plot?

Answer: A histogram is a graphical representation of the distribution of a set of continuous
data. It divides the data into intervals, or bins, and counts the number of data points that fall
within each bin. Each bin is drawn as a bar whose height is the count of data points in that
interval. A density plot, on the other hand, is a smoothed version of a histogram that
estimates the probability density function of the underlying data. It fits a density
estimate (typically a kernel density estimate) to the data and plots it as a curve. The height of
the curve shows how concentrated the data points are around each value, and the area under the
curve over a range of values represents the probability that a data point falls within that range.
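The two definitions can be checked numerically. The following is a minimal sketch, assuming a Gaussian kernel with a hand-picked bandwidth of 0.5 (seaborn selects its own bandwidth, so this will not reproduce its curve exactly). It bins a small made-up sample as a normalized histogram, evaluates a kernel density estimate on a grid, and confirms that both areas come out close to 1. The helper name `gaussian_kde_1d` is hypothetical.

```python
import numpy as np

def gaussian_kde_1d(samples, grid, bandwidth):
    """Evaluate a Gaussian kernel density estimate of samples at each grid point."""
    samples = np.asarray(samples, dtype=float)
    grid = np.asarray(grid, dtype=float)
    # One standardized distance per (grid point, sample) pair.
    z = (grid[:, None] - samples[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z**2) / np.sqrt(2 * np.pi)
    return kernels.sum(axis=1) / (len(samples) * bandwidth)

samples = np.array([1.0, 1.2, 1.9, 2.1, 2.4, 3.8])  # made-up values

# Histogram normalized so that the bar areas sum to 1.
heights, edges = np.histogram(samples, bins=3, density=True)
hist_area = float(np.sum(heights * np.diff(edges)))

# Density curve on a grid wide enough to cover the tails.
grid = np.linspace(-4.0, 9.0, 2001)
density = gaussian_kde_1d(samples, grid, bandwidth=0.5)
step = grid[1] - grid[0]
kde_area = float(np.sum((density[:-1] + density[1:]) / 2) * step)  # trapezoid rule

print(hist_area, kde_area)
```

A smaller bandwidth makes the curve follow individual points more closely, while a larger one smooths it further.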
