Ida Lab Final


GL BAJAJ
Institute of Technologies & Management
[Approved by AICTE, Govt. of India & Affiliated to Dr. APJ Abdul Kalam Technical University, Lucknow, U.P., India]
Department of Applied Computational Science & Engineering
Computer Sc. & Engineering (Data Science), Greater Noida

Lab File

Data Analytics and Visualization


(KDS-551)

Submitted By:
Student Name: Vipin Gupta
University Roll No.: 2101921540062
Branch: CSE-DS
Section: DS
Group: G1

Submitted To:
Mr. Rakesh Kumar

GL BAJAJ Institute of Technologies & Management


Greater Noida

Index

S. No. | Name of Experiment | Date of Experiment | Date of Submission | Signature
1. Get input from user and perform operations (MAX, MIN, AVG, SUM, SQRT, ROUND) using R.
2. To perform data import/export (.CSV, .XLS, .TXT) operations using data frames in R.
3. To get input matrix and perform matrix addition, subtraction, multiplication, inverse, transpose and division using vector concept in R.
4. To perform statistical operations (Mean, Median, Mode and Standard Deviation) using R.
5. To perform data pre-processing operations: i) Handling Missing data ii) Min-Max normalization.
6. To perform dimensionality reduction operation using PCA for Houses Data Set.
7. To perform Simple Linear Regression with R.
8. To perform K-Means clustering operation and visualize for iris dataset.
9. Learn how to collect data via web-scraping, APIs and data connectors from suitable sources as specified.
10. Perform association analysis on a given dataset and evaluate its accuracy.

PROGRAM 1
To get input from user and perform numerical operations (MAX, MIN, AVG, SUM,
SQRT, ROUND) using R.
#Get input for two numbers
num1 <- as.numeric(readline("Enter the first number: "))
num2 <- as.numeric(readline("Enter the second number: "))

#Perform numerical operations
sum_result <- num1 + num2
diff_result <- num1 - num2
prod_result <- num1 * num2
div_result <- num1 / num2
sqrt_result <- sqrt(num1)
round_result <- round(num2)

#Display the results
cat("Sum:", sum_result, "\n")
cat("Difference:", diff_result, "\n")
cat("Product:", prod_result, "\n")
cat("Division:", div_result, "\n")
cat("Square Root of num1:", sqrt_result, "\n")
cat("Rounded num2:", round_result, "\n")

Output:
Enter the first number: 10
Enter the second number: 5
Sum: 15
Difference: 5
Product: 50
Division: 2
Square Root of num1: 3.162278
Rounded num2: 5

This program does the following:

It prompts the user to enter two numbers using the readline function and stores them in
variables num1 and num2.
It performs four basic numerical operations (addition, subtraction, multiplication, and
division) on the entered numbers.
The sqrt function calculates the square root of num1.
The round function rounds num2 to the nearest integer.
The results are displayed using the cat function.

Output 2:

PROGRAM 2
To perform data import/export (.CSV, .XLS, .TXT) operations using data frames in R.
In R, you can perform data import/export operations using data frames with various file
formats like CSV, XLS (Excel), and TXT. Here's how to do it:
Data Import:
CSV (Comma-Separated Values):
To import data from a CSV file into a data frame:

#Read CSV data into a data frame
data <- read.csv("your_file.csv")
XLS (Excel):

To import data from an Excel file into a data frame, you can use the readxl or openxlsx
package. First, you need to install and load the package:

#Install and load the readxl package
install.packages("readxl")
library(readxl)
Then, you can read data from an Excel file:

#Read Excel data into a data frame
data <- read_excel("your_file.xlsx")
TXT (Text):

To import data from a TXT file into a data frame:

#Read text data into a data frame (assuming tab-delimited)
data <- read.table("your_file.txt", header=TRUE, sep="\t")
Data Export:
CSV (Comma-Separated Values):

To export data from a data frame to a CSV file:


#Export data frame to CSV
write.csv(data, "output_file.csv", row.names = FALSE)
XLS (Excel):

To export data from a data frame to an Excel file, you can use the writexl package. First,
install and load the package:

#Install and load the writexl package


install.packages("writexl")
library(writexl)
Then, you can write the data frame to an Excel file:

#Export data frame to Excel
write_xlsx(data, "output_file.xlsx")
TXT (Text):

To export data from a data frame to a TXT file (tab-delimited in this example):

#Export data frame to text file (tab-delimited)
write.table(data, "output_file.txt", sep="\t", row.names=FALSE)

Output:
Data Import
Suppose you have a CSV file named "data.csv" with the following content:

Name,Age,Gender
Alice,25,Female
Bob,30,Male

Charlie,28,Male
Using R, after importing this data, the data frame might look like this:

     Name Age Gender
1   Alice  25 Female
2     Bob  30   Male
3 Charlie  28   Male

Data Export:
After performing some operations on the data frame and then exporting it back to CSV, the
"output.csv" file might look like this:

"","Name","Age","Gender"
"1","Alice",25,"Female"
"2","Bob",30,"Male"
"3","Charlie",28,"Male"
Please note that the specific appearance of the output can vary depending on settings and
options used during import/export, but this gives you a general idea of what to expect.

PROGRAM 3
To get the input matrix from user and perform matrix addition, subtraction,
multiplication, inverse, transpose and division operations using vector concept in R.

#Function to take input matrix from user
get_matrix_input <- function() {
  rows <- as.integer(readline("Enter the number of rows: "))
  cols <- as.integer(readline("Enter the number of columns: "))
  matrix_data <- matrix(numeric(rows*cols), nrow=rows, ncol=cols)
  cat("Enter matrix elements row-wise:\n")
  for (i in 1:rows) {
    for (j in 1:cols) {
      matrix_data[i, j] <- as.numeric(readline(paste("Enter element at row", i, "column", j, ": ")))
    }
  }
  return(matrix_data)
}

#Input matrices
cat("Matrix A:\n")
matrix_A <- get_matrix_input()
cat("Matrix B:\n")
matrix_B <- get_matrix_input()

#Matrix operations
cat("\nMatrix Addition (A+B):\n")
matrix_sum <- matrix_A + matrix_B
print(matrix_sum)

cat("\nMatrix Subtraction (A-B):\n")
matrix_diff <- matrix_A - matrix_B
print(matrix_diff)

cat("\nMatrix Multiplication (A %*% B):\n")
matrix_mult <- matrix_A %*% matrix_B  #true matrix product; use * for element-wise
print(matrix_mult)

cat("\nMatrix Inverse (A^-1):\n")
matrix_inv_A <- solve(matrix_A)
print(matrix_inv_A)

cat("\nMatrix Transpose (A^T):\n")
matrix_trans_A <- t(matrix_A)
print(matrix_trans_A)

cat("\nMatrix Division (A/B, element-wise):\n")
matrix_div <- matrix_A / matrix_B
print(matrix_div)

It defines a function get_matrix_input to take user input for the matrices, including their
dimensions and elements.
It takes input for two matrices, A and B, and then performs the requested operations:
Matrix Addition (A+B)
Matrix Subtraction (A-B)
Matrix Multiplication (A %*% B)
Matrix Inverse (A^-1)
Matrix Transpose (A^T)
Matrix Division (A/B)

The results of each operation are displayed as output. You can copy and paste this code into
an R environment to test it with your own matrices.

Output:

PROGRAM 4
To perform statistical operations (Mean, Median, Mode and Standard Deviation) using
R.

#Import the data using read.csv()
myData <- read.csv("Info.csv", stringsAsFactors=FALSE)

MEAN:
#Compute the mean value
mean_age <- mean(myData$Age)
print(mean_age)

MEDIAN:
#Compute the median value
median_age <- median(myData$Age)
print(median_age)

MODE:
#Most frequent Age value: tabulate frequencies and take the top entry
mode_age <- function() {
  return(sort(table(myData$Age), decreasing=TRUE)[1])
}
mode_age()
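The experiment title also names standard deviation, which the listing above omits; a minimal
sketch using R's built-in sd() function (sample standard deviation), applied to the same Age
column:

STANDARD DEVIATION:
#Compute the standard deviation value
sd_age <- sd(myData$Age)
print(sd_age)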

  Product Age Gender Education MaritalStatus Usage Fitness Income Miles
1   TM195  24   Male        14        Single     3       4  68562     2
2   TM195  19   Male        15        Single     2       3  31836    75
3   TM195  19 Female        14     Partnered     4       3  30699    66

Mean in R Programming Language:

It is the sum of the observations divided by the total number of observations, also known as
the average.

Median in R Programming Language:

It is the middle value of the data set. It splits the data into two halves. If the number of
elements in the data set is odd, the center element is the median; if it is even, the median is
the average of the two central elements.
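For a quick illustration in R (hypothetical values):
median(c(1, 3, 5))    #odd count: returns the middle element, 3
median(c(2, 4, 6, 8)) #even count: returns the average of 4 and 6, i.e. 5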

Mode in R Programming Language:

It is the value that has the highest frequency in the given data set. The data set may have no
mode if the frequency of all data points is the same. Also, we can have more than one mode if
we encounter two or more data points having the same frequency. There is no inbuilt function
for finding the mode in R, so we can create our own function for finding the mode or we can use
the package called modeest.
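As a sketch of the package route, the mfv() ("most frequent value") function from modeest
returns the mode(s) directly; the column name follows the example above:

#Mode using the modeest package
install.packages("modeest")
library(modeest)
mfv(myData$Age)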

OUTPUT:

PROGRAM 5
To perform data preprocessing operations i) Handling Missing data ii) Min-Max
normalization.

Dealing with Missing Values in R:

Missing values in R are handled with the use of some pre-defined functions:

is.na() Function for Finding Missing Values:

This function returns a logical vector that indicates all the NA values present: TRUE for each
element that is missing and FALSE otherwise.

x <- c(NA, 3, 4, NA, NA, NA)
is.na(x)

Output:
[1] TRUE FALSE FALSE TRUE TRUE TRUE

Properties of Missing Values:

• For testing objects that are NA, use is.na().
• For testing objects that are NaN, use is.nan().
• There are classes under which NA comes; hence the integer class has an integer-type NA.
• A NaN value is counted as NA, but the reverse is not valid.

The creation of a vector with one or multiple NAs is also possible:

x <- c(NA, 3, 4, NA, NA, NA)

Output:
[1] NA  3  4 NA NA NA

Removing NA or NaN values:


Extracting values except for NA or NaN values:

Example:

x <- c(0/0, 2, 1, 3, NA, 4, 0/0)
x
x[!is.na(x)]

Output:
[1] NaN   2   1   3  NA   4 NaN
[1] 2 1 3 4
A function called complete.cases() can also be used. This function also works on data frames.
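A minimal sketch of complete.cases() on a small hypothetical data frame:

#Keep only fully observed rows
df <- data.frame(a=c(1, NA, 3), b=c("x", "y", NA))
complete.cases(df)        #[1] TRUE FALSE FALSE
df[complete.cases(df), ]  #keeps only row 1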

Min-Max Normalization:
This technique rescales values to lie in the range between 0 and 1. Also, the data ends up with
smaller standard deviations, which can suppress the effect of outliers.
Example: Let's write a custom function to implement Min-Max Normalization.
v' = ((v - min(A)) / (max(A) - min(A))) * (new_max(A) - new_min(A)) + new_min(A)

This is the formula for Min-Max normalization. Let's use this formula and create a custom
user-defined function, minMax, which takes in one value at a time and computes the scaled
value such that it lies between 0 and 1. Here new_max(A) is 1 and new_min(A) is 0, as we are
trying to scale the values down/up into the range [0,1]. Import the library:
library(caret)

#dataset
data <- data.frame(var1=c(120,345,145,122,596,285,21),
                   var2=c(10,15,45,22,53,26,12),
                   var3=c(34,0.05,0.15,0.12,-6,0.85,0.11))

#custom function to implement min-max scaling
minMax <- function(x) {
  (x - min(x)) / (max(x) - min(x))
}

#normalize data using the custom function
normalizedMydata <- as.data.frame(lapply(data, minMax))
head(normalizedMydata)
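Note that the caret package loaded earlier is not actually required by the custom function; as
an alternative sketch, caret's preProcess() with method="range" performs the same [0,1]
rescaling:

#Alternative: min-max scaling via caret's preProcess
pp <- preProcess(data, method="range")
head(predict(pp, data))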

OUTPUT:

PROGRAM 6
To perform dimensionality reduction operation using PCA for Houses Data Set.
To perform dimensionality reduction using Principal Component Analysis (PCA) for the
"Houses" dataset in R, you can use the following steps as an example:

Load the Houses Dataset: First, you need to load the "Houses" dataset into R. You can do this
by reading a CSV file or using any other method suitable for your dataset.
#Load the Houses dataset (replace 'dataset.csv' with your dataset file)
houses_data <- read.csv("dataset.csv", header=TRUE)
Data Preprocessing: Ensure that your data is cleaned and standardized. You can use the
scale() function to standardize numerical features.

#Standardize numerical features
houses_data_scaled <- scale(houses_data)

PCA Implementation:
#Perform PCA
pca_result <- prcomp(houses_data_scaled, center=TRUE, scale.=TRUE)

#Specify the number of components you want to retain
n_components <- 3  #Adjust as needed
Explained Variance:
#Check the explained variance ratio
explained_variance <- pca_result$sdev^2 / sum(pca_result$sdev^2)
cat("Explained Variance Ratio:", explained_variance, "\n")

Dimensionality Reduction:
#Transform the data and keep the selected number of components
houses_data_pca <- as.data.frame(predict(pca_result, newdata=houses_data_scaled))[, 1:n_components]
Now, houses_data_pca contains the reduced-dimensional data with the specified number of
principal components.
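To help decide how many components to retain, a scree plot of the explained variance
computed above is a common addition; a minimal sketch using base graphics:

#Scree plot of the explained variance ratio
plot(explained_variance, type="b",
     xlab="Principal Component", ylab="Proportion of Variance Explained")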

You can analyze and use this reduced dataset for further analysis or modeling as needed.
Adjust the code according to the actual file name and data preprocessing requirements of
your specific "Houses" dataset.

Output:
Output for Explained Variance: After running the code to print explained variance for each
principal component, you might see something like this:

PROGRAM 7
To perform Simple Linear Regression with R.
To perform Simple Linear Regression in R, you can use the built-in functions and libraries
for regression analysis. Here are the general steps:

Load Your Data: First, load your dataset into R.


#Load your dataset (replace 'dataset.csv' with your dataset file)
data <- read.csv("dataset.csv", header=TRUE)
Explore Your Data: Examine your data to understand the relationship between the variables
you want to use in the regression analysis. Ensure that you have a dependent variable (Y) and
an independent variable (X).

Fit the Linear Regression Model:

#Assuming 'Y' is the dependent variable and 'X' is the independent variable
linear_model <- lm(Y ~ X, data=data)
Summary of the Model:
#Get a summary of the linear regression model
summary(linear_model)
The summary will provide information about coefficients, R-squared, p-values, and other
statistics related to the regression model.
Visualization (Optional): You can create plots to visualize the regression line and the data
points.

#Create a scatterplot of the data
plot(data$X, data$Y, main="Simple Linear Regression", xlab="X", ylab="Y")

#Add the regression line
abline(linear_model, col="red")
Make Predictions:
#To make predictions for new data, for example, new values of 'X':
new_data <- data.frame(X=c(10, 20, 30))  #Replace with your new data
predictions <- predict(linear_model, newdata=new_data)
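If you also want uncertainty around these predictions, predict() for lm objects accepts an
interval argument; a minimal sketch:

#Predictions with 95% confidence intervals (fit, lwr and upr columns)
predict(linear_model, newdata=new_data, interval="confidence")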

These are the basic steps to perform a Simple Linear Regression analysis in R. Make sure to
adapt the code according to your specific dataset and variable names.

Output:
Assuming you’ve followed the steps provided earlier and you have a simple linear regression
model, here’s an example of the output you might expect:

#Output of summary(linear_model)
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   1.2000     0.7386   1.626   0.1786
x             0.8000     0.3139   2.545   0.0672

#Scatter plot with regression line


This output includes information about the coefficients, standard errors, t-values, p-values,
the residual standard error, R-squared values, and more. It summarizes the linear regression
model's fit to the data.

PROGRAM 8
To perform K-Means clustering operation and visualize for iris data set.
library(ggplot2)
df <- iris
head(iris)
ggplot(df, aes(Petal.Length, Petal.Width)) + geom_point(aes(col=Species), size=4)
set.seed(101)
irisCluster <- kmeans(df[,1:4], centers=3, nstart=20)
irisCluster
table(irisCluster$cluster, df$Species)
library(cluster)
clusplot(df[,1:4], irisCluster$cluster, color=TRUE, shade=TRUE, labels=0, lines=0)
#We can see the setosa cluster perfectly separated, while virginica and versicolor have a
#little noise between their clusters.
#(We will not always have labeled data. To choose the right number of centers, we can use
#the elbow method.)

tot.withinss <- vector(mode="numeric", length=10)

for (i in 1:10) {
  irisCluster <- kmeans(df[,1:4], centers=i, nstart=20)
  tot.withinss[i] <- irisCluster$tot.withinss
}
#(Let’s visualize it.)
plot(1:10, tot.withinss, type="b", pch=19)

Output:

PROGRAM 9
Learn to collect data via web-scraping, APIs and data connectors from suitable sources
as specified by the instructor.

Collecting data through web scraping, APIs, and data connectors is a valuable skill for
acquiring information from various online sources. Here's an overview of each method:

1. Web Scraping:
Web scraping involves extracting data from websites by parsing the HTML or XML content
of web pages. Here are the general steps:
a. Select a Target Website: Choose a website from which you want to scrape data. Make sure
to review their terms of service and robots.txt file to ensure compliance with their policies.
b. Inspect the Web Page: Use your web browser's developer tools to inspect the HTML
structure of the web page. Identify the elements that contain the data you want to scrape.
c. Choose a Scraping Tool: Python offers popular libraries like BeautifulSoup and Scrapy for
web scraping. In R, you can use libraries like rvest.
d. Write Code to Scrape Data: Write code to navigate the HTML structure and extract the
desired data. This often involves selecting elements by their HTML tags, classes, or IDs.
e. Store Data: Save the scraped data in a structured format, such as CSV, JSON, or a
database.
f. Automate the Process: Consider using web scraping frameworks to automate data
collection on a schedule if needed.

#Install and load necessary libraries
install.packages("rvest")
install.packages("httr")
library(rvest)
library(httr)

#Specify the URL of the webpage to scrape
url <- "https://example.com"

#Send an HTTP GET request to the webpage
webpage <- httr::GET(url)

#Parse the HTML content of the webpage
parsed_html <- read_html(httr::content(webpage, as="text"))

#Extract data using CSS selectors
data <- parsed_html %>%
  html_nodes(".class-name") %>%
  html_text()

#Print the extracted data
print(data)

In this example, replace "https://example.com" with the URL of the webpage you want to
scrape and "class-name" with the appropriate CSS selector for the data you want to extract.

2. APls (Application Programming Interfaces):


APIs are structured methods for retrieving data from web services. Here's how to use them:

a. Find an API: Look for an API that provides the data you need. Many websites and online
services offer APIs to access their data programmatically.
b. Get API Access: Register for an API key or token if required. This key is often used to
authenticate and track your usage.
c. Read API Documentation: Study the API documentation to understand how to make
requests, the available endpoints, and the data format.
d. Make API Requests: Use a programming language (e.g., Python, R) to make HTTP
requests to the API endpoints, using libraries like requests in Python or httr in R.
e. Parse and Process Data: Once you receive the API response, parse the data (usually in
JSON or XML format) to extract the relevant information.
f. Store Data: Save the data in a structured format for analysis or future use.
g. Respect Rate Limits: Many APIs have rate limits, so ensure you don't exceed them to
avoid being blocked.

#Install and load necessary libraries
install.packages("httr")
library(httr)

#Specify the API endpoint and your API key (if required)
api_url <- "https://api.example.com/data"
api_key <- "your_api_key_here"

#Set up a header with the API key (if required)
auth_header <- httr::add_headers(Authorization=paste("Bearer", api_key))

#Send an HTTP GET request to the API
response <- httr::GET(url=api_url, auth_header)

#Check the response status
if (httr::status_code(response) == 200) {
  #Parse the JSON response
  data <- httr::content(response, as="parsed")

  #Print the retrieved data
  print(data)
} else {
  cat("API request failed:", httr::http_status(response)$reason, "\n")
}
Replace "https://api.example.com/data" with the actual API endpoint, and
“your_api_key_here" with your API key if the API requires authentication.

3. Data Connectors:

Data connectors or integrations allow you to access data from various sources with
predefined connectors or plugins. Here's how to use them:
a. Identify a Data Connector Tool: Use a data integration tool like Zapier, Integromat, or
Microsoft Power Automate that offers connectors for various data sources.
b. Select Data Sources: Choose the data sources you want to connect. These can include
online services, databases, apps, and more.
c. Configure Connectors: Set up the connectors by providing necessary authentication
credentials and configuring the data transfer settings.
d. Automate Data Flows: Create automation workflows that trigger data transfers between
your selected sources on predefined conditions or schedules.
e. Monitor and Maintain: Regularly monitor your data flows to ensure they are working
correctly. Make adjustments as needed.
Remember to respect the terms of service, privacy policies, and legal considerations when
collecting data through these methods. Additionally, keep data security and user privacy in
mind throughout the process.
#Install and load necessary libraries
install.packages("RJDBC")
library(RJDBC)

#JDBC driver and connection parameters
driver <- JDBC("org.postgresql.Driver", classPath="/path/to/driver.jar")
url <- "jdbc:postgresql://localhost:5432/database_name"
user <- "your_username"
password <- "your_password"

#Establish a database connection
conn <- dbConnect(driver, url=url, user=user, password=password)

#Execute a SQL query to fetch data
query <- "SELECT * FROM table_name"
result <- dbGetQuery(conn, query)

#Print the retrieved data
print(result)

#Close the database connection


dbDisconnect(conn)
Replace the driver classpath ("/path/to/driver.jar"), database URL
("jdbc:postgresql://localhost:5432/database_name"), table name ("table_name"), username
("your_username"), and password ("your_password") with your database connection details and query.

PROGRAM 10
Perform association analysis on a given dataset and evaluate its accuracy.
Performing association analysis in R typically involves using the Apriori algorithm to
discover frequent itemsets and association rules in transactional data. Here's a step-by-step
guide on how to perform association analysis in R using the Apriori algorithm and evaluate
its accuracy:

Step 1: Install and Load Required Packages


Before you begin, make sure you have the necessary packages installed. You'll need the
arules package for association analysis:
install.packages("arules")
library(arules)

Step 2: Load and Explore the Dataset


Load your dataset, which should be in a transaction format. Each row represents a
transaction, and items are typically represented as binary values (e.g., 1 for presence, 0 for
absence).

#Load your dataset (replace 'your_dataset.csv' with your dataset's file path)
data <- read.transactions("your_dataset.csv", format="basket", sep=",")
You can explore the dataset using functions like summary() to get an overview of the data.
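Beyond summary(), the arules package also offers inspect() and itemFrequencyPlot() for a
first look at the transactions; a minimal sketch (the topN value is illustrative):

#Quick exploration of the transactions
summary(data)
inspect(data[1:3])               #show the first three transactions
itemFrequencyPlot(data, topN=10) #plot the ten most frequent items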

Step 3: Perform Association Analysis


Use the apriori() function to perform association analysis and discover frequent itemsets and
rules. Set the parameters such as support and confidence based on your analysis
requirements:

#Perform association analysis
rules <- apriori(data, parameter=list(support=0.1, confidence=0.7))
This will generate association rules with the specified support and confidence thresholds.

Step 4: Evaluate Accuracy


To evaluate the accuracy of the association rules, you can consider the following metrics:
Support: Measures the frequency of a rule in the dataset. Higher support values indicate more
frequent itemsets.
Confidence: Measures the strength of the rule. It represents the conditional probability of the
consequent item(s) given the antecedent item(s). Higher confidence values indicate stronger
rules.
Lift: Indicates whether the presence of the antecedent item(s) has a positive or negative effect
on the presence of the consequent item(s). Lift>1 suggests a positive correlation.

You can access these metrics for your rules using the following code:
#View the top rules and their metrics
inspect(head(sort(rules, by="lift"), n=10))
This code will display the top 10 rules sorted by lift, but you can adjust the number and
sorting criteria as needed.
Evaluate the rules based on your domain knowledge and the business context. You may need
to fine-tune the support and confidence thresholds to discover meaningful associations.

Step 5: Interpret and Act on the Results


After evaluating the rules, interpret the results and consider how they can be applied to your
specific problem or business scenario. You can use these insights for various purposes, such
as product recommendations, marketing strategies, or process optimization.
Remember that association analysis results should be validated and tested in real-world
scenarios to ensure their practical usefulness and accuracy.

Output:

#Load required libraries
library(arules)

#Load the Groceries dataset
data("Groceries")

transactions <- as(Groceries, "transactions")

#Perform association analysis using Apriori
rules <- apriori(transactions, parameter=list(support=0.001, confidence=0.51))

#Display the top rules by lift
top_rules <- head(sort(rules, by="lift"), n=10)
inspect(top_rules)
