Data Analytics Practical.pdf.PDF

LAKSHMI NARAIN COLLEGE OF TECHNOLOGY EXCELLENCE, BHOPAL

Department of Computer Science and Engineering

Subject Name: Data Analytics


(Subject Code: CS605)

Submitted by : SAKSHI SHARMA

Enrollment No. 0176CS211150

Course: B.Tech

Session: 2023-24

Submitted to: Dr. Keerti Verma


Data Analysis
Data analysis is the process of cleansing, transforming, and analyzing raw data to obtain usable,
relevant information that can assist businesses in making educated decisions. By giving relevant
insights and data, which are commonly presented in charts, photos, tables, and graphs, this
technique helps to lessen the risks associated with decision-making. Excel's features, including
pivot tables, data tables, and various statistical functions, play a vital role in streamlining
data analysis and help users derive meaningful insights from complex datasets.
Data analytics encompasses not just data analysis, but also data collection, organization, storage,
and the tools and techniques used to delve deeper into data, as well as those used to present the
findings, such as data visualization tools. Data analysis, on the other hand, is concerned with the
process of transforming raw data into meaningful statistics, information, and explanations.
Data visualization is an interdisciplinary field concerned with the graphical depiction of data. When
the data is large, such as in a time series, it is a very effective way of conveying information. A
mapping establishes how the characteristics of a chart's graphical components change in response
to the data. A bar chart, in this sense, is a mapping of a variable's magnitude to the length of a
bar. Mapping is a basic component of data visualization, since the graphic design of the mapping
can negatively affect the reading of a chart.
The iterative Data Analysis Process comprises the following phases:

Specification of Data Requirements
Data Gathering
Data Processing
Data Cleaning
Data Analysis
Data Communication

Introduction to Excel for Data Analysis


Data analysis is a valuable skill that can help you make better judgments. Microsoft Excel is
one of the most used programs for data analysis, with the built-in pivot tables being the most popular
analytic tool. Excel provides a user-friendly platform where individuals can efficiently
organize and interpret data sets. Whether you are working in finance, marketing, or any other industry,
mastering Excel for data analysis can significantly enhance your ability to derive
meaningful insights and inform strategic decision-making.


Experiment 1

Microsoft Excel allows you to examine and interpret data in a variety of ways. The information
could come from several different places. A variety of formats and conversions are available for
the data. Conditional Formatting, Ranges, Tables, Text functions, Date functions, Time
functions, financial functions, Subtotals, Quick Analysis, Formula Auditing, Inquire Tool, What-
if Analysis, Solvers, Data Model, PowerPivot, PowerView, PowerMap, and other Excel
commands, functions, and tools can all be used to analyse it.
Essential Excel Data Analysis Functions
Excel has hundreds of functions, and trying to match the proper formula with the right kind of data
analysis can be overwhelming. The most valuable functions need not be difficult. You'll
wonder how you ever lived without these fifteen easy functions, which will increase your ability to
interpret data.
1. Concatenate
When conducting data analysis, the formula =CONCATENATE is one of the simplest to understand
but most powerful. Text, numbers, dates, and other data from numerous cells can be combined into a
single cell.
SYNTAX = CONCATENATE (text1, text2, [text3], …)
2. Len ()
In data analysis, LEN is used to show the number of characters in each cell. It's frequently
utilised when working with text that has a character limit or when attempting to distinguish between
product numbers.
SYNTAX = LEN (text)
3. Days ()
The =DAYS function calculates the number of calendar days between two dates.
SYNTAX =DAYS (end_date, start_date)
4. Networkdays
NETWORKDAYS returns the number of working days between two dates. Weekends are automatically
excluded, and an optional list of holidays can be excluded as well. It's classified as a
Date/Time Function in Excel. The NETWORKDAYS function is used in finance and accounting for
determining employee benefits based on days worked, the number of working days available
throughout a project, or the number of business days required to resolve a customer problem,
among other things.
SYNTAX = NETWORKDAYS (start_date, end_date, [holidays])
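The two date functions above can be mirrored in a few lines of Python, which may help when checking a spreadsheet result by hand. This is only a sketch: the helper names days and networkdays and the sample dates are made up for illustration, it assumes a Saturday/Sunday weekend, and it assumes the start date is not later than the end date.

```python
from datetime import date, timedelta

def days(end_date, start_date):
    """Calendar days between two dates, like =DAYS(end_date, start_date)."""
    return (end_date - start_date).days

def networkdays(start_date, end_date, holidays=()):
    """Working days from start_date to end_date inclusive, skipping
    Saturdays, Sundays, and any listed holidays (assumes start <= end)."""
    skipped = set(holidays)
    count = 0
    current = start_date
    while current <= end_date:
        # weekday() returns 0 for Monday through 6 for Sunday
        if current.weekday() < 5 and current not in skipped:
            count += 1
        current += timedelta(days=1)
    return count

start = date(2024, 1, 1)   # a Monday (illustrative date)
end = date(2024, 1, 12)    # the Friday of the following week

print(days(end, start))
print(networkdays(start, end))
print(networkdays(start, end, holidays=[date(2024, 1, 1)]))
```

Both endpoints are counted in networkdays, which matches how NETWORKDAYS treats its start and end dates.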
5. Sumifs()
One of the “must-know” formulas for a data analyst is =SUMIFS. =SUM is a familiar formula, but
what if you need to sum data based on numerous criteria? It’s SUMIFS.
SYNTAX = SUMIFS (sum_range, range1, criteria1, [range2], [criteria2], …)

6. Averageifs()
AVERAGEIFS, like SUMIFS, lets you take an average based on one or more criteria.
7. Countifs()
COUNTIFS counts the cells that meet one or more criteria. As a result, it doesn’t need a sum range like SUMIFS.
SYNTAX = COUNTIFS (range, criteria)
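The conditional aggregates SUMIFS, AVERAGEIFS, and COUNTIFS all follow the same idea: keep only the rows that satisfy every criterion, then sum, average, or count them. A minimal Python sketch of that idea is shown here; the order records and the helper names are hypothetical, and only equality criteria are handled (Excel also accepts comparisons such as ">100").

```python
# Hypothetical order records used only for illustration
orders = [
    {"region": "North", "product": "Pen", "amount": 120},
    {"region": "North", "product": "Ink", "amount": 80},
    {"region": "South", "product": "Pen", "amount": 200},
    {"region": "North", "product": "Pen", "amount": 60},
]

def matching(rows, **criteria):
    """Return the rows whose fields equal every given criterion."""
    return [r for r in rows if all(r[k] == v for k, v in criteria.items())]

def sumifs(rows, field, **criteria):
    """Sum one field over the matching rows."""
    return sum(r[field] for r in matching(rows, **criteria))

def countifs(rows, **criteria):
    """Count the matching rows; no value field is needed."""
    return len(matching(rows, **criteria))

def averageifs(rows, field, **criteria):
    """Average one field over the matching rows (fails if none match)."""
    hits = matching(rows, **criteria)
    return sum(r[field] for r in hits) / len(hits)

print(sumifs(orders, "amount", region="North", product="Pen"))
print(countifs(orders, region="North", product="Pen"))
print(averageifs(orders, "amount", region="North", product="Pen"))
```

Note that countifs takes no field argument, which reflects the point that COUNTIFS needs no sum range.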
8. Counta()
COUNTA counts the cells in a range that are not empty. You’ll come across incomplete data sets daily as
a data analyst. Without needing to restructure the data, COUNTA will allow you to examine any gaps
in the dataset.
SYNTAX = COUNTA (value1, [value2], …)
9. Vlookup()
The acronym VLOOKUP stands for ‘Vertical Lookup.’ It’s a function that tells Excel to look for a
specific value in a column (the so-called ‘table array’) to return a value from another column in
the same row.
SYNTAX = VLOOKUP (lookup_value, table_array, column_index_num, [range_lookup])
10. Hlookup()
“Horizontal” is represented by the letter H in HLOOKUP. It looks for a value in the top row of a
table or an array of values, then returns a value from a row you specify in the table or array in the
same column. When your comparison values are in a row across the top of a data table and you
wish to look down a specific number of rows, use HLOOKUP. When your comparison values are in
a column to the left of the data you wish to find, use VLOOKUP.
SYNTAX = HLOOKUP (lookup_value, table_array, row_index, [range_lookup])
11. If ()
The IF function comes in handy a lot. We can use this function to automate decision-making in our
spreadsheets. We could use IF to make Excel conduct a different computation or show a different
value based on the results of a logical test (a decision). The IF function will ask you to run a
logical test, as well as what action to take if the test is true and what action to take if the
test is false.
SYNTAX = IF (logical_test, [value_if_true], [value_if_false])
12. Iferror()
We could display a more informative error than Excel does, or even execute an alternative
computation, by using IFERROR. Two things are required for the IFERROR function to work: the
value that should be checked for an error, and the action that should be taken instead.
SYNTAX = IFERROR (value, value_if_error)
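A common pairing of the functions above is an exact-match VLOOKUP wrapped in IFERROR, so that a missing key shows a friendly message instead of an error. The Python sketch below imitates that pattern; the price table and the helper names vlookup and iferror are assumptions made for illustration, and only exact matching (range_lookup set to FALSE) is modelled.

```python
# Hypothetical price table: the first column holds the lookup key
price_table = [
    ("P-100", "Pen", 1.50),
    ("P-200", "Ink", 4.25),
    ("P-300", "Pad", 2.75),
]

def vlookup(lookup_value, table, col_index):
    """Exact-match lookup: scan the first column and return the value in
    the 1-based column col_index of the first matching row."""
    for row in table:
        if row[0] == lookup_value:
            return row[col_index - 1]
    raise KeyError(lookup_value)  # stands in for Excel's #N/A error

def iferror(compute, value_if_error):
    """Call compute(); return value_if_error if it raises an exception."""
    try:
        return compute()
    except Exception:
        return value_if_error

print(vlookup("P-200", price_table, 3))
print(iferror(lambda: vlookup("P-999", price_table, 3), "Not found"))
```

The lookup is passed to iferror as a lambda so that the error is raised inside the try block, much as Excel evaluates the first argument of IFERROR before deciding which value to show.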
13. Find/Search
The FIND function in Excel returns the position of one text string within another (as a number).
FIND is case-sensitive and delivers a #VALUE! error if the text cannot be located.
SEARCH, however, is not case-sensitive: a =SEARCH for “Bigger” will return results for Bigger or
bigger, broadening the scope of the query. This is very helpful when searching for anomalies or
unique identifiers.
SYNTAX = FIND (find_text, within_text, [start_num])
SYNTAX = SEARCH (find_text, within_text, [start_num])
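The difference between the two functions is easy to try out in Python. The sketch below is an approximation only: the helper names find and search are hypothetical, positions are 1-based as in Excel, a ValueError stands in for the #VALUE! error, and the wildcard support of SEARCH is not modelled.

```python
def find(find_text, within_text, start_num=1):
    """Case-sensitive search that returns a 1-based position, like FIND."""
    pos = within_text.find(find_text, start_num - 1)
    if pos == -1:
        raise ValueError("#VALUE!")  # text not found
    return pos + 1

def search(find_text, within_text, start_num=1):
    """Case-insensitive variant, like SEARCH (wildcards are not handled)."""
    return find(find_text.lower(), within_text.lower(), start_num)

text = "The bigger box"

print(search("Bigger", text))   # matches regardless of case
print(find("bigger", text))     # matches because the case is identical

try:
    find("Bigger", text)        # capital B does not occur in the text
except ValueError as err:
    print("FIND failed:", err)
```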
14. Left/Right
=LEFT and =RIGHT are simple and efficient ways for retrieving static data from cells. =RIGHT
returns the “x” number of characters from the cell’s end, while =LEFT returns the “x”
number of characters from the cell’s beginning. For example, a consumer’s area code can be
extracted from their phone number using =LEFT, while the last four digits can be extracted
using =RIGHT.
SYNTAX = LEFT (text, [num_chars])
SYNTAX = RIGHT (text, [num_chars])
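The phone-number case can be reproduced with string slicing in Python. This is a small sketch with a made-up phone number; the helper names left and right are hypothetical, and num_chars defaults to 1 as it does in Excel.

```python
def left(text, num_chars=1):
    """First num_chars characters, like =LEFT(text, num_chars)."""
    return text[:num_chars]

def right(text, num_chars=1):
    """Last num_chars characters, like =RIGHT(text, num_chars)."""
    return text[-num_chars:] if num_chars > 0 else ""

phone = "415-555-0132"          # made-up number for illustration

area_code = left(phone, 3)      # leading three characters
last_four = right(phone, 4)     # trailing four characters

print(area_code, last_four)
print(len(phone))               # counterpart of =LEN
print(area_code + "/" + last_four)  # counterpart of =CONCATENATE
```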
15. Rank()
Even though =RANK is an old Excel function, it is nevertheless useful for data analysis. It can be
utilised, for example, to determine which clients order the most.
SYNTAX = RANK (number, ref, [order])
Some of the Methods for Data Analysis in Excel
1. Ranges and Tables
The information you have can be in the form of a table or a range. Whether the data is in a range
or a table, certain actions can be performed on it. Certain procedures, however, are more
successful when data is stored in tables rather than ranges, and some operations are only
applicable to tables. You will also gain an understanding of how to analyze data in ranges and
tables: how to name ranges, how to utilise them, and how to manage them. The same may be said
for table names.
2. Data Cleaning – Text Functions, Dates and Times
Before moving on to data analysis, you must clean and organize the data you’ve gathered from
multiple sources. The following approaches can be used to clean data in Excel.

With Text Functions
Containing Date Values
Containing Time Values
3. Conditional Formatting
Conditional formatting in Excel allows you to colour cells or fonts, as well as place symbols
next to values in cells, based on predetermined criteria. This aids in visualizing the most important
values.
It allows you to highlight cells with a different colour depending on the values they contain. Rules,
data bars, colour scales, icon sets, finding duplicates, shading alternate rows, comparing two lists,
conflicting rules, checklists, and Heat Maps are all applications of conditional formatting.
4. Sorting and Filtering
You may need to sort and/or filter your data to prepare for data analysis and/or to display specific
critical data. You can do this in Excel using the simple sorting and filtering options.
Sort and Filter are among the most used Excel features. Within columns, sorting can be done in
ascending or descending order. Lists can also be sorted by colour, reversed, or randomized. Filters
are used to display data that meets given requirements. Number and Text Filters, Date Filters,
Advanced Filter, Data Form, Remove Duplicates, Outlining Data, and Subtotal are some of the options.
5. Subtotals with Ranges
PivotTables are commonly used to summarize data, as you are aware. However, Subtotals with Ranges
is another Excel feature that allows you to group/ungroup data and summarize data in ranges in a few
simple steps.
6. Quick Analysis
You can quickly execute numerous data analysis activities and create quick representations of the
results with Excel’s Quick Analysis function.
7. Understanding Lookup Functions
Excel Lookup Functions allow you to search through a large amount of data for data values that fit a
set of criteria. Vlookup and Hlookup are two different lookup functions. Analysts use Vlookup and
Hlookup to discover a value in a table and retrieve other values that correspond to that value. Data
analysts frequently use them to integrate and consolidate useful data from several Excel sheets.
8. PivotTables
PivotTables allow you to summarise data and create dynamic reports by modifying the PivotTable’s
contents. You can use pivot tables to extract important data from a vast dataset. This is the most
practical method of data analysis. After inserting a Pivot Table, you can drag fields, sort, filter, or
change the summary calculation. Two-dimensional Pivot Tables are also possible. Group Pivot Table
Items, Multi-level Pivot Table, Frequency Distribution, Pivot Chart, Slicers, Update Pivot Table,
Calculated Field/Item, and GetPivotData are all important functions.


9. Data Visualization in Excel
Charts are simple to make and display data in a variety of ways, often making them more helpful
than a sheet of numbers. You can make a chart, modify its type, switch the rows and columns, and
adjust the legend location and the data labels. Column Chart, Line Chart, Pie Chart, Bar Chart,
Area Chart, and Scatter Plot are some of the chart types provided in Microsoft Excel.
10. Data Validation
Only valid values should be entered into cells; otherwise, they risk producing erroneous results.
Using data validation commands, you can rapidly set up validation criteria for a cell, an input
message prompting the user on what should be typed in the cell, and an error message displayed in
the case of incorrect entries.
11. Financial Analysis
Excel has several financial functions. You can learn to employ a combination of these functions to
solve common problems that need financial analysis.
12. Working with Multiple Worksheets
It’s possible that you’ll need to run multiple identical calculations in different worksheets.
Instead of duplicating these calculations in each worksheet, you can complete them in one and have
them display in all of the others. You may also use a report worksheet to compile the data from
the multiple worksheets.
13. Formula Auditing
When you utilise formulas, you should double-check that they are working correctly. Formula
Auditing commands in Excel assist you in tracing precedents and dependents as well as error
checking.
14. What-if Analysis
What-if Analysis lets you change the values in cells to see how those changes affect the results
of the formulas that depend on them. Excel provides three What-if Analysis tools: Scenarios, Goal
Seek, and Data Tables.


Experiment 2

Correlation:
1. Data Preparation:
Organize your data into two columns (let's say Column A and Column B).
2. Correlation Calculation:
In a new cell, use the following formula:
=CORREL(A1:A10, B1:B10)
Replace A1:A10 and B1:B10 with your actual data ranges.
Regression:
1. Data Preparation:
Similar to correlation, organize your data into two columns (let's say Column A for the independent
variable and Column B for the dependent variable).
2. Regression Calculation:
In a new cell, use the following formula:
=LINEST(B1:B10, A1:A10, TRUE, TRUE)
Replace B1:B10 and A1:A10 with your actual data ranges. The TRUE parameters include the intercept and
statistics.
The LINEST function returns an array with slope, y-intercept, standard errors, and other regression
statistics.
Covariance:
1. Data Preparation:
Again, organize your data into two columns (let's say Column A and Column B).
2. Covariance Calculation:
In a new cell, use the following formula:
=COVARIANCE.P(A1:A10, B1:B10)
Replace A1:A10 and B1:B10 with your actual data ranges.
Note: COVARIANCE.P calculates population covariance, while COVARIANCE.S calculates
sample covariance. Choose the appropriate one based on your data.
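To see what these three Excel functions compute, the same quantities can be worked out from first principles in Python. This is a sketch under stated assumptions: the two short lists stand in for columns A and B and are made-up values, the helper names are hypothetical, and linest returns only the slope and intercept rather than the full statistics array that LINEST produces.

```python
from math import sqrt

# Made-up sample data standing in for columns A and B
a = [1, 2, 3, 4, 5]
b = [2, 4, 5, 4, 5]

def mean(values):
    return sum(values) / len(values)

def covariance(xs, ys, sample=False):
    """Population covariance by default (like COVARIANCE.P);
    sample=True divides by n - 1 instead (like COVARIANCE.S)."""
    mx, my = mean(xs), mean(ys)
    total = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return total / (len(xs) - 1 if sample else len(xs))

def correl(xs, ys):
    """Pearson correlation coefficient, like CORREL."""
    return covariance(xs, ys) / sqrt(covariance(xs, xs) * covariance(ys, ys))

def linest(ys, xs):
    """Least-squares slope and intercept of y on x."""
    slope = covariance(xs, ys) / covariance(xs, xs)
    intercept = mean(ys) - slope * mean(xs)
    return slope, intercept

print("correlation:", round(correl(a, b), 4))
print("slope, intercept:", linest(b, a))
print("population covariance:", covariance(a, b))
print("sample covariance:", covariance(a, b, sample=True))
```

The argument order of linest follows LINEST, with the dependent values first and the independent values second.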


Experiment 3

This experiment covers basic syntax in the R programming language for data analytics. R is a powerful
statistical programming language commonly used for data analysis, statistical modeling, and
visualization. Here are some fundamental aspects of R syntax:
1. Assigning Values:
Use the assignment operator <- or = to assign values to variables.
x <- 10
y = 5
2. Data Types:
R supports various data types, including numeric, character, logical, and more.
numeric_var <- 3.14
character_var <- "Hello, World!"
logical_var <- TRUE
3. Vectors:
Vectors are fundamental data structures in R.
numeric_vector <- c(1, 2, 3, 4, 5)
character_vector <- c("apple", "banana", "orange")
4. Indexing and Slicing:
Access elements in a vector using square brackets.
numeric_vector[2]      # Access the second element
character_vector[1:2]  # Access the first two elements
5. Matrices:
Create matrices using the matrix() function.
matrix_data <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2, ncol = 3)
6. Data Frames:
Data frames are used for tabular data.
df <- data.frame(
  Name = c("Alice", "Bob", "Charlie"),
  Age = c(25, 30, 22),
  Grade = c("A", "B", "C")
)
7. Functions:
Define and use functions.
square <- function(x) {
  return(x^2)
}
result <- square(4)
8. Conditional Statements:
Use if, else if, and else for conditional logic.
x <- 10
if (x > 5) {
  print("x is greater than 5")
} else {
  print("x is not greater than 5")
}
9. Loops:
Use for and while loops for iteration.
for (i in 1:5) {
  print(i)
}
j <- 1
while (j <= 5) {
  print(j)
  j <- j + 1
}
10. Packages:
Install and load packages for additional functionality.
install.packages("tidyverse")  # Install the tidyverse package
library(tidyverse)             # Load the tidyverse package


Experiment 4
This experiment implements matrices, arrays, and factors in R, and then performs some basic
operations, including calculating the variance.
Matrices:
# Creating a matrix
mat <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2, ncol = 3)
# Print the matrix
print(mat)
Arrays:
# Creating an array
arr <- array(c(1, 2, 3, 4, 5, 6), dim = c(2, 3, 1))
# Print the array
print(arr)
Factors:
# Creating a factor
gender <- c("Male", "Female", "Male", "Female", "Male")
factor_gender <- factor(gender)
# Print the factor
print(factor_gender)
Variance Calculation:
Now, let's perform variance calculations on a numeric vector, a matrix, and an array.
Variance of Numeric Vector:
# Numeric vector
numeric_vector <- c(2, 4, 6, 8, 10)
# Calculate variance
variance_numeric <- var(numeric_vector)
# Print the result
print(variance_numeric)
Variance of Matrix:
# Matrix
matrix_data <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2, ncol = 3)
# Calculate variance for each column
variance_matrix <- apply(matrix_data, 2, var)
# Print the result
print(variance_matrix)
Variance of Array:
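# Array
array_data <- array(c(1, 2, 3, 4, 5, 6), dim = c(2, 3, 1))
# Calculate the variance of each column within each slice (margins 2 and 3).
# With margins c(1, 2) every group would hold a single value, and var() of one value is NA.
variance_array <- apply(array_data, c(2, 3), var)
# Print the result
print(variance_array)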
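As a cross-check on the numbers above, the same variances can be computed with Python's standard library. R's var() uses the sample formula with divisor n - 1, which corresponds to statistics.variance; the population version is shown alongside for contrast. The column lists below are written out by hand on the assumption that the matrix is filled column by column, which is R's default.

```python
from statistics import pvariance, variance

numeric_vector = [2, 4, 6, 8, 10]

sample_var = variance(numeric_vector)       # divisor n - 1, like R's var()
population_var = pvariance(numeric_vector)  # divisor n

# matrix(c(1, 2, 3, 4, 5, 6), nrow = 2, ncol = 3) fills column by column,
# so its three columns are (1, 2), (3, 4) and (5, 6)
columns = [[1, 2], [3, 4], [5, 6]]
column_variances = [variance(col) for col in columns]

print(sample_var, population_var)
print(column_variances)
```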


Experiment 5

Data frames are a fundamental data structure in R, widely used for storing tabular data. They
allow you to organize data in rows and columns, similar to a spreadsheet. Here's a basic
implementation and use of data frames in R:
Creating a Data Frame:
# Creating a data frame
df <- data.frame(
  Name = c("Alice", "Bob", "Charlie"),
  Age = c(25, 30, 22),
  Grade = c("A", "B", "C"),
  stringsAsFactors = FALSE  # Optional: Prevent character vectors from being converted to factors
)
# Print the data frame
print(df)
Accessing and Modifying Data Frame:
# Accessing a specific column
ages <- df$Age
# Accessing a specific row
row_bob <- df[2, ]
# Modifying a value in the data frame
df[3, "Grade"] <- "B"
# Print the modified data frame
print(df)
Adding and Removing Columns:
# Adding a new column
df$City <- c("New York", "San Francisco", "Los Angeles")
# Removing a column
df <- df[, -4]  # Remove the fourth column (City, added above)
# Print the updated data frame
print(df)
Filtering and Subsetting:
# Filtering rows based on a condition
young_people <- df[df$Age < 30, ]
# Subsetting columns
subset_df <- df[, c("Name", "Age")]
# Print the filtered and subset data frames
print(young_people)
print(subset_df)
Summary Statistics:
# Summary statistics for numeric columns
summary_stats <- summary(df$Age)
# Print the summary statistics
print(summary_stats)
Merging Data Frames:
# Creating a second data frame with the same columns
df2 <- data.frame(
  Name = c("David", "Eva"),
  Age = c(28, 35),
  Grade = c("B", "A"),
  stringsAsFactors = FALSE
)
# Merging data frames (rbind appends the rows of df2 to df)
merged_df <- rbind(df, df2)
# Print the merged data frame
print(merged_df)

Experiment 6

Create sample data in R and perform data manipulation, including data transpose operations.


Create Sample (Dummy) Data:
# Create a data frame with dummy data
data <- data.frame(
Name = c("Alice", "Bob", "Charlie"),
Age = c(25, 30, 22),
Grade = c("A", "B", "C"),
Score1 = c(85, 90, 75),
Score2 = c(92, 88, 80)
)
# Print the dummy data
print(data)
Data Manipulation:
Filtering and Subsetting:
# Filter rows based on a condition
young_people <- data[data$Age < 30, ]
# Subset columns
subset_data <- data[, c("Name", "Age", "Grade")]
# Print the filtered and subset data frames
print(young_people)
print(subset_data)
Adding a New Column:
# Add a new column for the total score
data$Total_Score <- data$Score1 + data$Score2
# Print the updated data frame
print(data)
Transpose Operation:
# Transpose the data frame
transposed_data <- t(data)
# Print the transposed data frame
print(transposed_data)


The t() function is used to transpose the data frame. It swaps rows and columns, effectively turning
columns into rows and vice versa.
Keep in mind that transposing a data frame may not always be suitable, especially if the data has mixed
types or if you are working with a large dataset: t() converts the data frame to a matrix, so
columns of mixed types are all coerced to character strings. However, in some scenarios,
transposing can be useful for reshaping data.
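The effect of a transpose, and the caveat about column types, can be illustrated outside R as well. The Python sketch below uses a cut-down copy of the dummy data (three columns only, chosen for brevity) and plain lists rather than a data frame; the final step converts every value to text to imitate what happens when a table holding both numbers and text is turned into a single-type matrix.

```python
# A cut-down copy of the dummy data, one inner list per row
header = ["Name", "Age", "Grade"]
rows = [
    ["Alice", 25, "A"],
    ["Bob", 30, "B"],
    ["Charlie", 22, "C"],
]

# zip(*rows) regroups the i-th item of every row into one tuple,
# so each original column becomes a row
transposed = [list(column) for column in zip(*rows)]

for name, values in zip(header, transposed):
    print(name, values)

# A matrix holds a single type, so numbers and text end up as text
as_text = [[str(value) for value in column] for column in transposed]
print(as_text[1])
```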


Experiment 7

This experiment briefly discusses various control structures in R and then moves on to data
manipulation using the dplyr package.
Control Structures in R:
1. if-else Statements:
# Example of if-else statement
x <- 10
if (x > 5) {
  print("x is greater than 5")
} else {
  print("x is not greater than 5")
}
2. for Loops:
# Example of for loop
for (i in 1:5) {
  print(i)
}
3. while Loops:
# Example of while loop
j <- 1
while (j <= 5) {
  print(j)
  j <- j + 1
}
4. Switch Case:
# Example of switch case
day <- "Monday"
switch(
  day,
  "Monday" = print("It's the start of the week."),
  "Friday" = print("It's almost the weekend."),
  print("It's a regular day.")
)
Data Manipulation with dplyr Package:
The dplyr package is widely used for data manipulation in R. It provides a set of functions that
make it easy to manipulate and analyze data frames. The examples below use the data frame named
data from Experiment 6.
Installation:
install.packages("dplyr")
library(dplyr)
Examples of dplyr Functions:
1. Filtering Rows:
# Filter rows based on a condition
filtered_data <- filter(data, Age < 30)
2. Selecting Columns:
# Select specific columns
selected_data <- select(data, Name, Age, Grade)
3. Mutating (Adding/Modifying) Columns:
# Add a new column for the total score
mutated_data <- mutate(data, Total_Score = Score1 + Score2)
4. Arranging Rows:
# Arrange rows based on a column
arranged_data <- arrange(data, desc(Age))
5. Summarizing Data:
# Summarize data (mutated_data is used because it contains Total_Score)
summary_data <- summarize(mutated_data, Avg_Score = mean(Total_Score))
These are just a few examples of the powerful data manipulation functions provided by the dplyr
package. The syntax is designed to be expressive and readable, making it easier to write and
understand complex data manipulations in R.
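For readers who want to verify what each of the five dplyr verbs returns, the same operations can be sketched in plain Python on a list of records. This is not dplyr and makes no claim about how dplyr works internally; it simply re-creates the Experiment 6 dummy data by hand and applies the equivalent filter, select, mutate, arrange, and summarize steps.

```python
# The Experiment 6 dummy data re-created as a list of records
data = [
    {"Name": "Alice", "Age": 25, "Grade": "A", "Score1": 85, "Score2": 92},
    {"Name": "Bob", "Age": 30, "Grade": "B", "Score1": 90, "Score2": 88},
    {"Name": "Charlie", "Age": 22, "Grade": "C", "Score1": 75, "Score2": 80},
]

# filter: keep the rows that satisfy a condition
filtered = [row for row in data if row["Age"] < 30]

# select: keep only some of the columns
selected = [{key: row[key] for key in ("Name", "Age", "Grade")} for row in data]

# mutate: add a column computed from existing ones
mutated = [{**row, "Total_Score": row["Score1"] + row["Score2"]} for row in data]

# arrange: sort the rows, here by Age in descending order
arranged = sorted(data, key=lambda row: row["Age"], reverse=True)

# summarize: reduce the rows to a single value
avg_score = sum(row["Total_Score"] for row in mutated) / len(mutated)

print([row["Name"] for row in filtered])
print([row["Name"] for row in arranged])
print(avg_score)
```

The average is taken over mutated rather than data because only the mutated copy carries the Total_Score field.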


Experiment 8
Matlab is a powerful tool for data visualization and analysis. Below is a simple example of data
visualization using Matlab. In this example, we'll generate some sample data and create a scatter plot.
% Generate sample data
x = randn(100, 1);            % 100 random values from a normal distribution
y = 2*x + 0.5*randn(100, 1);  % Linear relationship with some noise
% Scatter plot
scatter(x, y, 'filled');
title('Scatter Plot');
xlabel('X-axis');
ylabel('Y-axis');
grid on;
Explanation of the code:
randn generates random numbers from a standard normal distribution.
scatter creates a scatter plot of the data.
'filled' fills the markers with color.
title, xlabel, ylabel set the title and axis labels.
grid on adds a grid to the plot.
This is a basic example, and Matlab offers a wide range of plotting functions and customization
options for more complex visualizations. You can create line plots, bar plots, histograms, surface
plots, and more.
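The sample data in this experiment follows y = 2x plus noise, so a straight line fitted to the points should have a slope near 2 and an intercept near 0. The Python sketch below generates data the same way and checks this numerically instead of visually. It is a companion to the Matlab example rather than a translation of it: the seed value is an arbitrary choice for repeatability, and the fit uses the ordinary least-squares formulas directly.

```python
import random

random.seed(0)  # arbitrary fixed seed so that the run is repeatable

# 100 standard-normal x values, and y = 2x plus noise with standard deviation 0.5
x = [random.gauss(0, 1) for _ in range(100)]
y = [2 * xi + 0.5 * random.gauss(0, 1) for xi in x]

mean_x = sum(x) / len(x)
mean_y = sum(y) / len(y)

# Ordinary least-squares estimates of the slope and intercept
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
sxx = sum((xi - mean_x) ** 2 for xi in x)
slope = sxy / sxx
intercept = mean_y - slope * mean_x

print(f"slope = {slope:.3f}, intercept = {intercept:.3f}")
```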


Experiment 9
Using Numpy and Pandas in a Jupyter Notebook:
1. Install Numpy and Pandas:
!pip install numpy pandas
2. Import Libraries:
import numpy as np
import pandas as pd
3. Create Numpy Array:
# Creating a Numpy array
numpy_array = np.array([[1, 2, 3], [4, 5, 6]])
print("Numpy Array:")
print(numpy_array)
4. Create Pandas DataFrame:
# Creating a Pandas DataFrame from a Numpy array
pandas_df = pd.DataFrame(numpy_array, columns=['A', 'B', 'C'])
print("\nDataFrame:")
print(pandas_df)
5. Data Manipulation with Pandas:
# Adding a new column
pandas_df['D'] = [7, 8]
# Filtering rows
filtered_df = pandas_df[pandas_df['B'] > 2]
# Display the manipulated DataFrame
print("\nManipulated DataFrame:")
print(filtered_df)
6. Displaying DataFrame in a Table:
# In a Jupyter Notebook, display() renders the DataFrame as a formatted table
from IPython.display import display
display(filtered_df)
Note: No extra package is needed. Pandas DataFrames render as tables in Jupyter, and
print(filtered_df.to_string()) gives a plain-text table outside a notebook.


Experiment 10
Data visualization is a crucial aspect of data analysis, and Python offers several powerful libraries for
creating visualizations. Two popular libraries for this purpose are Matplotlib and Seaborn. Let's go
through a basic study and implementation using these libraries.
Installation:
!pip install matplotlib seaborn
Importing Libraries:
import matplotlib.pyplot as plt
import seaborn as sns
Matplotlib:
Line Plot:
# Create data
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]
# Line plot
plt.plot(x, y, label='Line Plot')
plt.title('Line Plot Example')
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.legend()
plt.show()
Scatter Plot:
# Scatter plot
plt.scatter(x, y, label='Scatter Plot', color='red', marker='o')
plt.title('Scatter Plot Example')
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.legend()
plt.show()
Seaborn:
Pair Plot:
# Load a sample dataset
iris = sns.load_dataset('iris')
# Pair plot
sns.pairplot(iris, hue='species')
# A pair plot is a grid of subplots, so the title is set on the whole figure
plt.suptitle('Pair Plot Example', y=1.02)
plt.show()
Heatmap:
# Create a correlation matrix from the numeric columns (species is text, so it is dropped)
correlation_matrix = iris.drop(columns='species').corr()
# Heatmap
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm')
plt.title('Heatmap Example')
plt.show()
Additional Notes:
Explore various Matplotlib and Seaborn functions for customization, such as colors, styles, and
annotations.
Utilize NumPy and Pandas for data manipulation and preparation before visualization.
