IDA Lab Final
GL BAJAJ
Institute of Technology & Management, Greater Noida
[Approved by AICTE, Govt. of India & Affiliated to Dr. APJ Abdul Kalam Technical University, Lucknow, U.P., India]
Department of Applied Computational Science & Engineering
Computer Sc. & Engineering (Data Science)
Lab File
Submitted By:
Student Name: Vipin Gupta
University Roll No.: 2101921540062
Branch: CSE-DS
Section: DS
Group: G1
Submitted To:
Mr. Rakesh Kumar
Index
S. No.  Name of Experiment                                                                      Date of Experiment  Date of Submission  Signature
1       Get input from user and perform operations (MAX, MIN, AVG, SUM, SQRT, ROUND) using R.
2       To perform data import/export (.CSV, .XLS, .TXT) operations using data frames in R.
3       To get input matrix and perform matrix addition, subtraction, multiplication, inverse, transpose and division using vector concept in R.
4       To perform statistical operations (Mean, Median, Mode and Standard Deviation) using R.
5       To perform data pre-processing operations: i) Handling missing data ii) Min-Max normalization.
6       To perform dimensionality reduction operation using PCA for Houses Data Set.
7       To perform Simple Linear Regression with R.
8       To perform K-Means clustering operation and visualize for iris data set.
9       To collect data via web scraping, APIs and data connectors from suitable sources as specified by the instructor.
10      To perform association analysis on a given dataset and evaluate its accuracy.
PROGRAM 1
To get input from user and perform numerical operations (MAX, MIN, AVG, SUM,
SQRT, ROUND) using R.
# Get input for two numbers
num1 <- as.numeric(readline("Enter the first number: "))
num2 <- as.numeric(readline("Enter the second number: "))

# Perform numerical operations
sum_result   <- num1 + num2
diff_result  <- num1 - num2
prod_result  <- num1 * num2
div_result   <- num1 / num2
sqrt_result  <- sqrt(num1)
round_result <- round(num2)

# Display the results
cat("Sum:", sum_result, "\n")
cat("Difference:", diff_result, "\n")
cat("Product:", prod_result, "\n")
cat("Division:", div_result, "\n")
cat("Square Root of num1:", sqrt_result, "\n")
cat("Rounded num2:", round_result, "\n")
Output:
Enter the first number: 10
Enter the second number: 5
Sum: 15
Difference: 5
Product: 50
Division: 2
Square Root of num1: 3.162278
Rounded num2: 5
Output 2:
PROGRAM 2
To perform data import/export (.CSV, .XLS, .TXT) operations using data frames in R.
In R, you can perform data import/export operations using data frames with various file
formats like CSV, XLS (Excel), and TXT. Here's how to do it:
Data Import:
CSV (Comma-Separated Values):
To import data from a CSV file into a data frame:
To import data from an Excel file into a data frame, you can use the readxl or openxlsx package. First, you need to install and load the package:
To export data from a data frame to an Excel file, you can use the writexl package. First, install and load the package:
To export data from a data frame to a TXT file (tab-delimited in this example):
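A self-contained sketch of these import/export steps (file names are illustrative; the Excel lines are commented out because they assume the readxl and writexl packages are installed):

```r
# Create a small data frame to work with
df <- data.frame(Name = c("Alice", "Bob", "Charlie"),
                 Age = c(25, 30, 28),
                 Gender = c("Female", "Male", "Male"))

# --- CSV export / import ---
write.csv(df, "output.csv", row.names = TRUE)
csv_data <- read.csv("output.csv")

# --- TXT (tab-delimited) export / import ---
write.table(df, "output.txt", sep = "\t", row.names = FALSE)
txt_data <- read.table("output.txt", header = TRUE, sep = "\t")

# --- Excel (requires the writexl / readxl packages) ---
# library(writexl); write_xlsx(df, "output.xlsx")
# library(readxl);  xls_data <- read_excel("output.xlsx")

print(csv_data)
print(txt_data)
```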
Output:
Data Import
Suppose you have a CSV file named "data.csv" with the following content:

Name,Age,Gender
Alice,25,Female
Bob,30,Male
Charlie,28,Male

Using R, after importing this data, the data frame might look like this:
Data Export:
After performing some operations on the data frame and then exporting it back to CSV, the "output.csv" file might look like this:
"","Name","Age","Gender"
"1","Alice",25,"Female"
"2","Bob",30,"Male"
"3","Charlie",28,"Male"
Please note that the specific appearance of the output can vary depending on settings and
options used during import/export, but this gives you a general idea of what to expect.
PROGRAM 3
To get the input matrix from user and perform matrix addition, subtraction,
multiplication, inverse, transpose and division operations using vector concept in R.
# Input matrices
cat("Matrix A:\n")
matrix_A <- get_matrix_input()
cat("Matrix B:\n")
matrix_B <- get_matrix_input()

# Matrix operations
cat("\nMatrix Addition (A + B):\n")
matrix_sum <- matrix_A + matrix_B
print(matrix_sum)
It defines a function get_matrix_input to take user input for the matrices, including their dimensions and elements.
It takes input for two matrices, A and B, and then performs the requested operations:
Matrix Addition (A+B)
Matrix Subtraction (A-B)
Matrix Multiplication (A*B)
Matrix Inverse (A^-1)
Matrix Transpose (A^T)
Matrix Division (A/B)
The results of each operation are displayed as output. You can copy and paste this code into
an R environment to test it with your own matrices.
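The listing above omits the definition of get_matrix_input and the remaining operations. A minimal sketch follows; the interactive reader uses readline and scan, and fixed 2 x 2 matrices are substituted below so the operations can run non-interactively:

```r
# Read an n x m matrix from the user, row by row
get_matrix_input <- function() {
  n <- as.integer(readline("Number of rows: "))
  m <- as.integer(readline("Number of columns: "))
  elements <- scan(n = n * m)   # read n*m numbers from the console
  matrix(elements, nrow = n, ncol = m, byrow = TRUE)
}

# Fixed example matrices standing in for interactive input
matrix_A <- matrix(c(1, 2, 3, 4), nrow = 2, byrow = TRUE)
matrix_B <- matrix(c(5, 6, 7, 8), nrow = 2, byrow = TRUE)

print(matrix_A - matrix_B)    # subtraction
print(matrix_A %*% matrix_B)  # true matrix multiplication
print(solve(matrix_A))        # inverse (A must be non-singular)
print(t(matrix_A))            # transpose
print(matrix_A / matrix_B)    # element-wise division
```

Note that `%*%` is matrix multiplication, while `*` and `/` operate element-wise on the underlying vectors.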
Output:
PROGRAM 4
To perform statistical operations (Mean, Median, Mode and Standard Deviation) using R.
MEDIAN:
# Compute the median value
median_value <- median(myData$Age)
print(median_value)
MODE:
# R has no built-in statistical mode; tabulate the values and pick the most frequent one
get_mode <- function(v) {
  counts <- table(v)
  as.numeric(names(counts)[which.max(counts)])
}
get_mode(myData$Age)
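The title also asks for the mean and standard deviation. A self-contained sketch covering all four statistics, with myData assumed here as a small illustrative sample:

```r
# Illustrative data frame standing in for myData
myData <- data.frame(Age = c(21, 23, 23, 25, 27, 23, 30))

mean_value   <- mean(myData$Age)
median_value <- median(myData$Age)
sd_value     <- sd(myData$Age)

# Statistical mode: the most frequent value
get_mode <- function(v) {
  counts <- table(v)
  as.numeric(names(counts)[which.max(counts)])
}
mode_value <- get_mode(myData$Age)

cat("Mean:", mean_value, "\n")
cat("Median:", median_value, "\n")
cat("Mode:", mode_value, "\n")
cat("Standard Deviation:", sd_value, "\n")
```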
OUTPUT:
PROGRAM 5
To perform data preprocessing operations i) Handling Missing data ii) Min-Max
normalization.
Example :
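A sketch of the missing-data handling that produces the output shown below; is.na() is TRUE for both NA and NaN, so filtering with it drops all missing values:

```r
# Vector containing both NaN and NA
x <- c(NaN, 2, 1, 3, NA, 4, NaN)
print(x)

# Keep only the non-missing values
print(x[!is.na(x)])
```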
Output:
[1] NaN 2 1 3 NA 4 NaN
[1] 2 1 3 4
A function called complete.cases() can also be used. This function also works on data frames.
Min-Max Normalization:
This technique rescales values to be in the range between 0 and 1. Also, the data ends up with smaller standard deviations, which can suppress the effect of outliers.
Example: Let's write a custom function to implement Min-Max Normalization.
v' = ((v - min(A)) / (max(A) - min(A))) * (new_max(A) - new_min(A)) + new_min(A)
This is the formula for Min-Max normalization. Let's use this formula and create a custom user-defined function, minMax, which takes in one value at a time and computes the scaled value such that it lies between 0 and 1. Here new_max(A) is 1 and new_min(A) is 0, as we are trying to scale the values into the range [0, 1]. Import the library:
library(caret)
#dataset
data <- data.frame(var1 = c(120, 345, 145, 122, 596, 285, 21),
                   var2 = c(10, 15, 45, 22, 53, 26, 12),
                   var3 = c(34, 0.05, 0.15, 0.12, -6, 0.85, 0.11))
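A sketch of the minMax function applied to this data frame. Base R is sufficient here; caret's preProcess with method = "range" would do the same job:

```r
# Min-Max normalization: rescale a numeric vector into [0, 1]
minMax <- function(x) {
  (x - min(x)) / (max(x) - min(x))
}

data <- data.frame(var1 = c(120, 345, 145, 122, 596, 285, 21),
                   var2 = c(10, 15, 45, 22, 53, 26, 12),
                   var3 = c(34, 0.05, 0.15, 0.12, -6, 0.85, 0.11))

# Apply the function to every column
normalized <- as.data.frame(lapply(data, minMax))
print(normalized)
```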
OUTPUT:
PROGRAM 6
To perform dimensionality reduction operation using PCA for Houses Data Set.
To perform dimensionality reduction using Principal Component Analysis (PCA) for the "Houses" dataset in R, you can use the following steps as an example:
Load the Houses Dataset: First, you need to load the "Houses" dataset into R. You can do this by reading a CSV file or using any other method suitable for your dataset.
#Load the Houses dataset (replace 'dataset.csv’ with your dataset file)
houses_data <- read.csv("dataset.csv", header = TRUE)
Data Preprocessing: Ensure that your data is cleaned and standardized. You can use the scale() function to standardize numerical features.
PCA Implementation:
#Perform PCA
pca_result<- prcomp(houses_data_scaled, center=TRUE, scale=TRUE)
Dimensionality Reduction:
# Keep the scores for the first n_components principal components
houses_data_pca <- as.data.frame(predict(pca_result, newdata = houses_data_scaled)[, 1:n_components])
Now, houses_data_pca contains the reduced-dimensional data with the specified number of principal components.
You can analyze and use this reduced dataset for further analysis or modeling as needed.
Adjust the code according to the actual file name and data preprocessing requirements of
your specific "Houses" dataset.
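A runnable end-to-end sketch of the steps above; since the Houses file is not available here, the built-in mtcars data frame stands in for it:

```r
# Stand-in for the Houses data: any all-numeric data frame works the same way
houses_data <- mtcars

# Standardize the features, then run PCA
houses_data_scaled <- scale(houses_data)
pca_result <- prcomp(houses_data_scaled, center = TRUE, scale. = TRUE)

# Proportion of variance explained by each principal component
explained <- summary(pca_result)$importance["Proportion of Variance", ]
print(explained)

# Keep the first two principal component scores
n_components <- 2
houses_data_pca <- as.data.frame(predict(pca_result, newdata = houses_data_scaled)[, 1:n_components])
head(houses_data_pca)
```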
Output:
Output for Explained Variance: After running the code to print explained variance for each
principal component, you might see something like this:
PROGRAM 7
To perform Simple Linear Regression with R.
To perform Simple Linear Regression in R, you can use the built-in functions and libraries
for regression analysis. Here are the general steps:
These are the basic steps to perform a Simple Linear Regression analysis in R. Make sure to adapt the code according to your specific dataset and variable names.
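A minimal runnable sketch of these steps. The x and y vectors are illustrative; they were chosen so the fitted coefficients land near the Estimate column in the output shown further below, though the standard errors depend on the residuals and will differ:

```r
# Illustrative data (replace with your own x and y)
x <- c(1, 2, 3, 4, 5)
y <- c(2, 3, 3, 5, 5)

# Fit the simple linear regression y = b0 + b1 * x
model <- lm(y ~ x)

# Coefficients, standard errors, t values and p-values
print(summary(model))

# Predict y for a new x value
predict(model, newdata = data.frame(x = 6))
```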
Output:
Assuming you’ve followed the steps provided earlier and you have a simple linear regression
model, here’s an example of the output you might expect:
#Output of summary(model)
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   1.2000     0.7386   1.626   0.1786
x             0.8000     0.3139   2.545   0.0672
PROGRAM 8
To perform K-Means clustering operation and visualize for iris data set.
library(ggplot2)
df <- iris
head(iris)
ggplot(df, aes(Petal.Length, Petal.Width)) + geom_point(aes(col=Species), size=4)
set.seed(101)
irisCluster <- kmeans(df[, 1:4], centers=3, nstart=20)
irisCluster
table(irisCluster$cluster, df$Species)
library(cluster)
clusplot(df[, 1:4], irisCluster$cluster, color=TRUE, shade=TRUE, labels=0, lines=0)
#We can see the setosa cluster is separated perfectly, while virginica and versicolor have a
#little noise (overlap) between their clusters.
#(We will not always have labeled data. If we wanted to determine the right number of
#centers, we would use the elbow method.)
Output:
PROGRAM 9
Learn to collect data via web scraping, APIs and data connectors from suitable sources as specified by the instructor.
Collecting data through web scraping, APIs, and data connectors is a valuable skill for
acquiring information from various online sources. Here's an overview of each method:
1. Web Scraping:
Web scraping involves extracting data from websites by parsing the HTML or XML content of web pages. Here are the general steps:
a. Select a Target Website: Choose a website from which you want to scrape data. Make sure
to review their terms of service and robots.txt file to ensure compliance with their policies.
b. Inspect the Web Page: Use your web browser's developer tools to inspect the HTML
structure of the web page. Identify the elements that contain the data you want to scrape.
c. Choose a Scraping Tool: Python offers popular libraries like BeautifulSoup and Scrapy for
web scraping. In R, you can use libraries like rvest.
d. Write Code to Scrape Data: Write code to navigate the HTML structure and extract the
desired data. This often involves selecting elements by their HTML tags, classes, or IDs.
e. Store Data: Save the scraped data in a structured format, such as CSV, JSON, or a database.
f. Automate the Process: Consider using web scraping frameworks to automate data
collection on a schedule if needed.
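A sketch of steps a-f using the rvest package; the URL and the CSS selector are placeholders, as the surrounding text notes:

```r
# Web scraping with rvest (install.packages("rvest") if needed)
library(rvest)

# Hypothetical target page and CSS selector -- replace with your real ones
page <- read_html("https://example.com")

# Select the elements matching the selector and extract their text
items <- page |>
  html_elements(".class-name") |>
  html_text2()

print(items)
```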
In this example, replace "https://example.com" with the URL of the webpage you want to
scrape and "class-name" with the appropriate CSS selector for the data you want to extract.
2. APIs:
a. Find an API: Look for an API that provides the data you need. Many websites and online services offer APIs to access their data programmatically.
b. Get API Access: Register for an API key or token if required. This key is often used to authenticate and track your usage.
c. Read API Documentation: Study the API documentation to understand how to make requests, the available endpoints, and the data format.
d. Make API Requests: Use a programming language (e.g., Python, R) to make HTTP requests to the API endpoints. You can use libraries like requests in Python or httr in R.
e. Parse and Process Data: Once you receive the API response, parse the data (usually in JSON or XML format) to extract the relevant information.
f. Store Data: Save the data in a structured format for analysis or future use.
g. Respect Rate Limits: Many APIs have rate limits, so ensure you don't exceed them to
avoid being blocked.
#Specify the API endpoint and your API key (if required)
api_url<- "https://api.example.com/data"
api_key <- "your_api_key_here"
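Continuing this fragment, a sketch of making the request with the httr package; the endpoint, the key, and the Bearer-token header scheme are illustrative assumptions, since authentication details vary by API:

```r
# Querying a JSON API with httr (install.packages("httr") if needed)
library(httr)

api_url <- "https://api.example.com/data"  # hypothetical endpoint
api_key <- "your_api_key_here"             # hypothetical key

# Send the GET request with the key in an Authorization header
response <- GET(api_url, add_headers(Authorization = paste("Bearer", api_key)))

stop_for_status(response)                  # fail loudly on HTTP errors
parsed <- content(response, as = "parsed") # parse the JSON body into R lists
str(parsed)
```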
3. Data Connectors:
Data connectors or integrations allow you to access data from various sources with predefined connectors or plugins. Here's how to use them:
a. Identify a Data Connector Tool: Use a data integration tool like Zapier, Integromat, or
Microsoft Power Automate that offers connectors for various data sources.
b. Select Data Sources: Choose the data sources you want to connect. These can include online services, databases, apps, and more.
c. Configure Connectors: Set up the connectors by providing necessary authentication
credentials and configuring the data transfer settings.
d. Automate Data Flows: Create automation workflows that trigger data transfers between
your selected sources on predefined conditions or schedules.
e. Monitor and Maintain: Regularly monitor your data flows to ensure they are working correctly. Make adjustments as needed.
Remember to respect the terms of service, privacy policies, and legal considerations when
collecting data through these methods. Additionally, keep data security and user privacy in
mind throughout the process.
# Install and load necessary libraries
install.packages("RJDBC")
library(RJDBC)
PROGRAM 10
Perform association analysis on a given dataset and evaluate its accuracy.
Performing association analysis in R typically involves using the Apriori algorithm to discover frequent itemsets and association rules in transactional data. Here's a step-by-step guide on how to perform association analysis in R using the Apriori algorithm and evaluate its accuracy:
#Load your dataset (replace 'your_dataset.csv' with your dataset's file path)
data <- read.transactions("your_dataset.csv", format = "basket", sep = ",")
You can explore the dataset using functions like summary() to get an overview of the data.
You can access these metrics for your rules using the following code:
#View the top rules and their metrics
inspect(head(sort(rules, by = "lift"), n = 10))
This code will display the top 10 rules sorted by lift, but you can adjust the number and sorting criteria as needed.
Evaluate the rules based on your domain knowledge and the business context. You may need
to fine-tune the support and confidence thresholds to discover meaningful associations.
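Putting the pieces together with the arules package; the file name is the placeholder used above, and the support and confidence thresholds are illustrative values to tune for your data:

```r
# Association analysis with arules (install.packages("arules") if needed)
library(arules)

# Load transactional data; 'your_dataset.csv' is a placeholder for your file
data <- read.transactions("your_dataset.csv", format = "basket", sep = ",")
summary(data)

# Mine association rules with the Apriori algorithm
# (support and confidence thresholds are illustrative starting points)
rules <- apriori(data, parameter = list(support = 0.01, confidence = 0.5))

# Inspect the top 10 rules sorted by lift
inspect(head(sort(rules, by = "lift"), n = 10))
```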
Output: