DV Lab
DATE:
Perform data manipulation operations on the iris and mtcars datasets using the dplyr package and
obtain the results for the following functions:
i) filter
ii) select
iii) arrange
iv) mutate
v) summarise
AIM:
To perform data manipulation operations on the iris and mtcars datasets using the dplyr package and obtain the
results for the functions listed above.
i) Filter:
install.packages("dplyr")
library(dplyr)
data("mtcars")
data("iris")
mydata <- mtcars
mydata
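# Convert to tibbles; as_tibble() replaces the deprecated tbl_df()
mynewdata <- as_tibble(mydata)
myirisdata <- as_tibble(iris)
myirisdata
Use filter() to keep only the rows that satisfy a condition, here cars with more than 4 cylinders:
filter(mynewdata, cyl > 4)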
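ii) Select:
When you are working with large datasets with many columns but are interested in only a few,
select() allows you to rapidly zoom in on a useful subset using operations that usually only work
on numeric variable positions. The example below also uses arrange() with desc() to sort the
selected rows by weight in descending order.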
mynewdata %>%
select( cyl,wt,gear)%>%
arrange( desc(wt))
iv) Mutate:
mutate() adds new variables while preserving the existing ones. The new columns are computed as
functions of existing columns.
mynewdata %>%
select(mpg, cyl) %>%
mutate(newvariable = mpg * cyl)  # assumed example: a new column built from mpg and cyl
v) Summarise:
myirisdata %>%
group_by(Species)%>%
summarise(average=mean(Sepal.Length,na.rm=TRUE))
RESULT:
DATE:
Create a data frame and perform the following operations using the tidyr package:
i) gather
ii) spread
iii) separate
iv) unite
AIM:
To create a data frame and perform the gather, spread, separate and unite operations using the tidyr package.
Procedure and Code:
Installing tidyr package
install.packages('tidyr')
library(tidyr)
Creating a dummy data set.
name <- c('Akanash', 'Bhanu','Vinay', 'Varun', 'Prashanth')
weight <- c(35,45,55,65,75)
age <- c(20,21,22,23,24)
class <- c('maths','physics','chemistry','biology','science')
Create a data frame
tdata <- data.frame( name, weight, age, class)
tdata
I) gather():
gather() collects multiple columns and converts them into key:value pairs. This transforms the data
from wide form to long form. It can be used as an alternative to melt() in the reshape package.
longt <- tdata %>% gather( key, value, weight:class)
longt
II) spread():
spread() does the reverse of gather(): it takes a key:value pair and spreads it into
separate columns.
wide <- longt %>% spread( key, value)
wide
III) separate():
separate() splits one column into multiple columns.
Use it when the dataset contains a date-time variable. Because such a column holds several pieces of
information, it makes sense to split it and use those values individually. The following code
shows the usage of the separate function.
Create a data frame:
Humidity <- c(37.79,42.34,52.16,44.57,48.83,44.59)
Rain <- c(0.971360441,1.1096716,1.06475853,0.953183435,0.98878849,0.9887643)
Time <- c("13/03/2018 23:24","09/01/2019 15:44","25/12/2018 19:15","02/01/2019 07:46",
"14/03/2018 01:55","20/10/2018 20:52")
dset <- data.frame (Humidity,Rain,Time)
dset
RESULT:
The tidyr package Program has been Executed Successfully
EX.NO:3 DATA.TABLE PACKAGE
DATE:
Perform the following operations on any external dataset using the data.table package:
i) Select a subset of rows
ii) Select a column with particular values
AIM:
To perform data manipulation operations on an external CSV file using the data.table package.
Procedure and Code:
RESULT:
Thus, the data manipulation using the data.table package was executed successfully.
EX.NO:4 GGPLOT PACKAGE
DATE:
AIM:
To create different types of visualizations for the airquality dataset using the ggplot2
package in R.
a. Line graph
b. Bar graph
c. Histogram
d. Scatter plot
e. Pie chart
RESULT:
Thus, the data visualisation using the ggplot2 package was executed successfully.
EX.NO:5 PANDAS PACKAGE
DATE:
AIM:
To perform data manipulation operations on the iris and airquality datasets
using the data.table package and obtain the results for the following
functions.
a. Select a subset row
b. Select a column with particular values
c. Select columns with multiple values
d. Select a column to return a vector
e. Select multiple columns
f. Returns the sum and standard deviation
g. Sum of selected columns
PROCEDURE AND CODE:
data.table is an enhanced version of data.frame, which
is the standard data structure for storing data in base R.
To install data.table, use the command below:
install.packages("data.table")
data[Species == 'setosa']
OUTPUT:
AIM:
To create different types of graphs from user inputs: 1) Line graph 2) Line
graph with style 3) Bar graph (horizontal and vertical) 4) Histogram
5) Scatter plot.
3. BAR GRAPH:
A-VERTICAL:
import matplotlib.pyplot as plt
studentnames = ['Adeline','Jane','Roo','Bluewhale','Rossey']
marks = [85,55,90,45,60]
# One vertical bar per student; bar height is the mark
plt.bar(studentnames, marks, color='purple')
plt.title('STUDENT DATA-BAR GRAPH VERTICAL')
plt.xlabel('NAMES')
plt.ylabel('MARKS')
plt.show()
OUTPUT PLOT:
B-HORIZONTAL:
import matplotlib.pyplot as plt
studentnames = ['Adeline','Jane','Roo','Bluewhale','Rossey']
marks = [85,55,90,45,60]
# barh() draws horizontal bars, so marks run along the x-axis
plt.barh(studentnames, marks, color='c')
plt.title('STUDENT DATA-BAR GRAPH HORIZONTAL')
plt.xlabel('MARKS')
plt.ylabel('NAMES')
plt.show()
OUTPUT PLOT:
4. HISTOGRAM:
import matplotlib.pyplot as plt
student_marks = [45,12,13,26,15,55,100,98,95,54,58,56,52,24,71,66,66.5,
                 12,23,55,78,10,9,5,10,22,35,65,45]
bins = [0,10,20,30,40,50,60,70,80,90,100]
plt.hist(student_marks, bins, histtype='bar', rwidth=0.8, color='purple')
plt.xlabel('MARKS')
plt.ylabel('NUMBER OF STUDENTS')
plt.title('STUDENT DATA-HISTOGRAM')
plt.show()
OUTPUT PLOT:
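As a rough illustration of what the histogram call computes, the sketch below counts values into bins in plain Python. It assumes matplotlib's documented rule that every bin is half-open [left, right) except the last, which also includes its right edge. The helper name count_into_bins and the short sample_marks list are hypothetical and are not part of the lab program.

```python
def count_into_bins(values, edges):
    """Count how many values fall into each bin defined by consecutive edges.

    Every bin is half-open [left, right) except the last one,
    which also includes its right edge.
    """
    counts = [0] * (len(edges) - 1)
    last = len(counts) - 1
    for v in values:
        for i in range(len(counts)):
            left, right = edges[i], edges[i + 1]
            if left <= v < right or (i == last and v == right):
                counts[i] += 1
                break
    return counts


# Hypothetical sample marks, chosen only to show the edge cases (10 and 100)
sample_marks = [5, 10, 45, 55, 55, 66.5, 98, 100]
bins = list(range(0, 101, 10))  # 0, 10, ..., 100

counts = count_into_bins(sample_marks, bins)

# Print a small text histogram, one row per bin
for left, right, n in zip(bins, bins[1:], counts):
    print(f"{left:>3}-{right:<3} {'#' * n}")
```

A mark of 10 is counted in the 10-20 bin, not the 0-10 bin, while a mark of 100 still lands in the final 90-100 bin.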
5. SCATTER PLOT:
import matplotlib.pyplot as plt
x = [5,6,8,10,15]
y = [20,30,40,50,55]
x2 = [2,13,16,20,18]
y2 = [25,35,16,23.5,40]
# Two point series drawn on the same axes in different colours
plt.scatter(x, y, color='purple')
plt.scatter(x2, y2, color='c')
plt.title('STUDENT DATA-SCATTER PLOT')
plt.ylabel('Present %')
plt.xlabel('Roll.no')
plt.show()
OUTPUT PLOT:
RESULT:
The Visualization Graphs Program has been Executed Successfully
EX.NO:7 EXPLORATORY DATA ANALYSIS(EDA)
DATE:
AIM:
To write a Python program to implement exploratory
data analysis (EDA) on a built-in dataset for data
visualization.
OUTPUT:
2) Getting shape of the data:
print(data.shape)
OUTPUT:
# Fill missing values of categorical columns with the most frequent value (mode).
# Assigning the result back avoids relying on inplace=True on a selected column.
data["Gender"] = data["Gender"].fillna(data["Gender"].mode()[0])
data["Married"] = data["Married"].fillna(data["Married"].mode()[0])
data["Dependents"] = data["Dependents"].fillna(data["Dependents"].mode()[0])
data["Self_Employed"] = data["Self_Employed"].fillna(data["Self_Employed"].mode()[0])
data["Loan_Amount_Term"] = data["Loan_Amount_Term"].fillna(data["Loan_Amount_Term"].mode()[0])
data["Credit_History"] = data["Credit_History"].fillna(data["Credit_History"].mode()[0])
6) Filling missing values of continuous variables with the mean:
data["LoanAmount"] = data["LoanAmount"].fillna(data["LoanAmount"].mean())
7) Checking missing values:
print(data.isnull().sum())
OUTPUT:
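The idea behind the mode and mean filling steps can be sketched in plain Python, without pandas. This is only a minimal illustration: the helper fill_missing and the two short sample columns are hypothetical, None stands in for a missing entry, and ties between equally frequent values are not handled in any particular way.

```python
from statistics import mean, mode


def fill_missing(values, fill_value):
    """Return a copy of values with every None replaced by fill_value."""
    return [fill_value if v is None else v for v in values]


# Hypothetical sample columns; None marks a missing entry
gender = ["Male", "Female", None, "Male", None]
loan_amount = [120.0, None, 100.0, 140.0]

# Categorical column: fill with the most frequent observed value (the mode)
gender_mode = mode([v for v in gender if v is not None])
gender_filled = fill_missing(gender, gender_mode)

# Continuous column: fill with the mean of the observed values
loan_mean = mean([v for v in loan_amount if v is not None])
loan_filled = fill_missing(loan_amount, loan_mean)

print(gender_filled)
print(loan_filled)

# Analogue of checking for remaining missing values
print(sum(v is None for v in gender_filled), sum(v is None for v in loan_filled))
```

The mode suits categorical columns because an average of labels is meaningless, while the mean keeps the overall average of a continuous column unchanged.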
8) Converting categorical variables into numerical:
print(data.head())
OUTPUT:
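One common way to turn a categorical column into numbers is integer label encoding, sketched below in plain Python. This is an assumption about the kind of conversion intended here, not the program's actual code; the helper encode_categories and the sample values are hypothetical.

```python
def encode_categories(values):
    """Map each distinct category to an integer code, assigned in sorted order.

    Returns the encoded list together with the category-to-code mapping.
    """
    mapping = {category: code for code, category in enumerate(sorted(set(values)))}
    return [mapping[v] for v in values], mapping


# Hypothetical sample of a yes/no column such as "Married"
married = ["Yes", "No", "Yes", "Yes"]

encoded, mapping = encode_categories(married)
print(mapping)
print(encoded)
```

Keeping the mapping alongside the encoded values makes it possible to translate the numbers back into the original labels later.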
AIM:
If the project file that you select to import is encrypted, you must
enter the password that was used for encryption to enable decrypting
sensitive connection properties.
RESULT:
Thus, the IBM Watson Studio project has been executed successfully.
EX.NO: 9 DATA ANALYSIS – COVID 19 DATASET
DATE:
AIM:
To perform data analysis and visualization on the COVID-19
dataset.
plt.plot(new_data['Name of State / UT'], new_data['Cured/Discharged/Migrated'], color='Red')
plt.xticks(rotation=90)
plt.show()
OUTPUT:
RESULT:
The Data analysis Program has been Executed Successfully.