Department of CSE

COURSE NAME: BDO


COURSE CODE: 21CS3276R
TOPIC :
FOUNDATIONS OF R

Session - 4
AIM OF THE SESSION

To familiarize students with the basic concepts of R programming.

INSTRUCTIONAL OBJECTIVES

This session is designed to:

1. Demonstrate R basics
2. Describe the functions of R
3. Explain how to control the execution of R code

LEARNING OUTCOMES

At the end of this session, you should be able to:

1. Explain the fundamentals of R
2. Write and execute R functions
SESSION INTRODUCTION

Objects of R

Every programming language has its own data types for storing values, so that variables can hold data and operations can be performed on them. The operations that are allowed depend on the data type.

These data types can be character, integer, float, long, etc. Based on the data type, memory is allocated to the variable.


In R, unlike in many other programming languages, a variable is not declared with a data type. A variable name is bound to an object, and the object carries its own type. The following objects are commonly used in R:
1. Vectors
2. Lists
3. Matrices
4. Factors
5. Data Frames
SESSION DESCRIPTION

VECTOR

Atomic vectors are one of the basic types of objects in R programming.

An atomic vector stores elements of a single type: character, double, integer, raw, logical, or complex.

A single value is also a vector, of length one.


Here are some key characteristics and concepts related to vectors in R:

• Homogeneous Data Type: Vectors in R are homogeneous, meaning that all the elements in a vector must be of the same data type. For example, you can have a numeric vector, a character vector, or a logical vector. If you combine values of different types with c(), R coerces them all to one common type instead of keeping the mixture.
• Atomic Data Types: Vectors can contain elements of atomic data types, such as numeric (real or integer values), character (text), logical (TRUE or FALSE), and complex (complex numbers).
• Creation of Vectors:
  • Using the c() function: The most common way to create a vector is by using the c() function, which combines its arguments into a single vector.

Example for VECTOR


# Create vectors
x <- c(1, 2, 3, 4)
y <- c("a", "b", "c", "d")
z <- 5

# Print each vector and its class (output shown in the comments)
print(x)         # [1] 1 2 3 4
print(class(x))  # [1] "numeric"
print(y)         # [1] "a" "b" "c" "d"
print(class(y))  # [1] "character"
print(z)         # [1] 5
print(class(z))  # [1] "numeric"
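
The homogeneity rule can be seen by mixing types inside c(). The following is a minimal sketch; the variable names mixed and nums are illustrative only.

```r
# Mixing types in c() coerces every element to one common type.
mixed <- c(1, "a", TRUE)
class(mixed)   # "character"
mixed          # "1" "a" "TRUE"

# Logical values become numbers when combined with numbers.
nums <- c(1.5, TRUE, FALSE)
class(nums)    # "numeric"
nums           # 1.5 1.0 0.0

# Plain numbers are doubles; an integer literal needs the L suffix.
typeof(5)      # "double"
typeof(5L)     # "integer"

# A single value is a vector of length one.
length(5)      # 1
is.vector(5)   # TRUE
```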
List

A list is another type of object in R. Unlike an atomic vector, a list can hold heterogeneous data: each element can be of a different type and length, and an element can itself be a vector or another list. This flexibility makes lists particularly useful for organizing and managing complex, structured data.


Example

my_list <- list(
  name = "John Doe",
  age = 30,
  city = "New York",
  hobbies = c("Reading", "Hiking", "Cooking"),
  scores = c(95, 89, 78, 92)
)
print(my_list)

Output

$name
[1] "John Doe"

$age
[1] 30

$city
[1] "New York"

$hobbies
[1] "Reading" "Hiking"  "Cooking"

$scores
[1] 95 89 78 92
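
Elements of a list are usually read back by name. The sketch below uses a hypothetical list called person to show the difference between $, [[ ]], and [ ].

```r
# Hypothetical list, similar in shape to the example above.
person <- list(
  name = "John Doe",
  age = 30,
  scores = c(95, 89, 78, 92)
)

person$name            # "John Doe" ($ returns the element itself)
person[["age"]]        # 30         ([[ ]] also returns the element)
class(person["age"])   # "list"     ([ ] returns a sub-list)

mean(person$scores)    # 88.5

# Add a new element, then list the element names.
person$city <- "New York"
names(person)          # "name" "age" "scores" "city"
```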
Matrices

A matrix is a two-dimensional data structure that stores elements of the same data type in rows and columns. Matrices are commonly used for mathematical and statistical operations.


Creation of Matrix in R

You create a matrix in R using the matrix() function. It takes several arguments, such as the data, the number of rows, and the number of columns. By default, the data fill the matrix column by column.

# Create a 3x3 matrix
my_matrix <- matrix(1:9, nrow = 3, ncol = 3)
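
A few common matrix operations, sketched on a small hypothetical matrix m. The byrow argument changes the default column-by-column fill.

```r
# Fill row by row instead of the default column by column.
m <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE)

dim(m)       # 2 3
m[1, 2]      # 2      (row 1, column 2)
m[2, ]       # 4 5 6  (entire second row)
t(m)         # the 3 x 2 transpose
m %*% t(m)   # 2 x 2 matrix product
m * 2        # element-wise multiplication
```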


Example
my_matrix <- matrix(1:9, nrow = 3, ncol = 3)
print(my_matrix)

Output
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9
Factors

In R, a factor is a data structure used to represent categorical data. Categorical data consists of distinct categories or levels, and factors are used to store and manipulate this type of data. Factors are particularly useful for data analysis and statistical modeling, as they enable R to treat categorical variables appropriately.
Creating Factors:

You can create a factor using the factor() function. This function takes a vector of categorical data as its main argument and, optionally, other arguments such as levels to specify the categories and ordered to indicate whether the factor represents an ordered categorical variable.

# Create a factor
gender <- factor(c("Male", "Female", "Male", "Male", "Female"))
print(gender)

Output:
[1] Male   Female Male   Male   Female
Levels: Female Male


Levels:

A factor consists of a set of levels, which are the distinct categories or values in the categorical variable. You can access the levels of a factor using the levels() function. By default, the levels are sorted alphabetically.

factor_levels <- levels(gender)
factor_levels

Output: [1] "Female" "Male"
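
The levels and ordered arguments of factor() can be sketched as follows. The sizes data are hypothetical and serve only as an illustration.

```r
# Hypothetical categorical data with a natural order.
sizes <- factor(
  c("Medium", "Small", "Large", "Small"),
  levels = c("Small", "Medium", "Large"),
  ordered = TRUE
)

levels(sizes)       # "Small" "Medium" "Large" (the order given, not alphabetical)
table(sizes)        # counts per level: Small 2, Medium 1, Large 1
sizes < "Large"     # TRUE TRUE FALSE TRUE (comparisons work on ordered factors)
as.integer(sizes)   # 2 1 3 1 (the underlying integer codes)
```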


Data Frames:

In R, a data frame is a fundamental data structure used to store and manipulate data in a tabular format, similar to a spreadsheet or database table.

Data frames are a common way to organize and work with structured data, making them one of the most important data structures in R.


Creating Data Frames:

You can create a data frame using the data.frame() function. This function allows you to combine vectors of different types into a data frame, with each vector representing a column. All columns must have the same length.


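Example Data Frames:

# Creating a simple data frame
df <- data.frame(
  Name = c("John", "Alice", "Bob"),
  Age = c(28, 24, 32),
  City = c("New York", "Los Angeles", "Chicago")
)
str(df)

Output (R 4.0 or later, where strings are kept as character columns):

'data.frame':   3 obs. of  3 variables:
 $ Name: chr  "John" "Alice" "Bob"
 $ Age : num  28 24 32
 $ City: chr  "New York" "Los Angeles" "Chicago"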
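
Once a data frame exists, its rows and columns can be selected in several ways. The following is a minimal sketch on the same data; the column name AgeNextYear is hypothetical.

```r
df <- data.frame(
  Name = c("John", "Alice", "Bob"),
  Age = c(28, 24, 32),
  City = c("New York", "Los Angeles", "Chicago"),
  stringsAsFactors = FALSE   # keep text as character in every R version
)

nrow(df)             # 3
ncol(df)             # 3
df$Age               # 28 24 32 (one column as a vector)
df[df$Age > 25, ]    # rows where Age exceeds 25 (John and Bob)
df[2, "City"]        # "Los Angeles"

# Add a derived column.
df$AgeNextYear <- df$Age + 1
head(df, 2)          # first two rows
```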
Typical Functions Used in R

R offers a wide range of functions and operators for manipulating objects, including containers such as vectors, lists, and data frames. The following slides list useful functions for common tasks, grouped by category:

• Data Manipulation
• String Manipulation
• Data Aggregation and Summarization
• Data Sorting
• Missing Data Handling
• Data Reshaping
• Data Sampling
• Statistical Analysis
• Function Application
• Data I/O
Data Manipulation:

• subset(): Filter rows based on conditions and select specific columns.
• mutate() (from dplyr): Create or modify columns in a data frame.
• aggregate(): Aggregate data using a summary function.
• merge() (base R) and the join functions such as left_join() and inner_join() (from dplyr): Perform data joins between data frames.
• split(): Split data into a list based on a factor or a grouping variable.
• stack() and unstack(): Reshape data from wide to long and vice versa.
• rbind() and cbind(): Combine data frames by rows or columns.
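
The base R functions in this list can be sketched on two small data frames. The data and the names employees and locations are hypothetical; the dplyr functions are left out so that the example runs without extra packages.

```r
# Hypothetical example data.
employees <- data.frame(
  id = 1:4,
  dept = c("IT", "HR", "IT", "HR"),
  salary = c(50, 40, 70, 60)
)
locations <- data.frame(dept = c("HR", "IT"), city = c("Pune", "Delhi"))

# subset(): keep rows matching a condition and only some columns.
high <- subset(employees, salary > 45, select = c(id, salary))

# aggregate(): mean salary per department.
avg <- aggregate(salary ~ dept, data = employees, FUN = mean)

# merge(): join the two data frames on the shared "dept" column.
joined <- merge(employees, locations, by = "dept")

# split(): one data frame per department.
by_dept <- split(employees, employees$dept)

# rbind(): append a row.
more <- rbind(employees, data.frame(id = 5, dept = "IT", salary = 55))
```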


String Manipulation:

• paste(), paste0(): Concatenate strings.
• grep(), sub(), gsub(): Search for and replace text using regular expressions.
• strsplit(): Split strings into substrings.
• tolower(), toupper(): Convert character data to lowercase or uppercase.
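
A short sketch of these string functions on made-up values:

```r
first <- "Ada"
last <- "Lovelace"

paste(first, last)            # "Ada Lovelace" (space separator by default)
paste0(first, last)           # "AdaLovelace"  (no separator)
toupper(first)                # "ADA"
tolower(last)                 # "lovelace"

words <- c("apple", "banana", "cherry")
grep("an", words)             # 2 (index of the matching element)
sub("a", "A", "banana")       # "bAnana" (first match only)
gsub("a", "A", "banana")      # "bAnAnA" (all matches)

strsplit("a,b,c", ",")[[1]]   # "a" "b" "c"
```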


Data Aggregation and Summarization:

• tapply(): Apply a function to subsets of a vector, grouped by a factor.
• aggregate(): Compute summary statistics for different levels of a factor.
• by(): Apply a function to subsets of a data frame, split by a factor.
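
The three functions differ mainly in what they take and return, as this sketch on a hypothetical scores data frame suggests:

```r
scores <- data.frame(
  group = factor(c("A", "B", "A", "B")),
  score = c(10, 20, 30, 40)
)

# tapply(): vector in, one value per group out.
group_means <- tapply(scores$score, scores$group, mean)   # A = 20, B = 30

# aggregate(): data frame in, data frame out.
group_sums <- aggregate(score ~ group, data = scores, FUN = sum)   # A = 40, B = 60

# by(): the function receives each group's rows as a data frame.
group_max <- by(scores, scores$group, function(d) max(d$score))   # A = 30, B = 40
```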


Data Sorting:

• order(): Get the index that would sort a vector. Use it to sort a data frame by one or more columns.
• sort(): Sort a vector.
• arrange() (from dplyr): Sort data frames by one or more columns.
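
The difference between sort() and order() can be sketched as follows, using a hypothetical people data frame:

```r
ages <- c(28, 24, 32)

sort(ages)                      # 24 28 32
sort(ages, decreasing = TRUE)   # 32 28 24
order(ages)                     # 2 1 3 (positions of the sorted values)

# Sorting a data frame in base R: index the rows with order().
people <- data.frame(name = c("John", "Alice", "Bob"), age = ages)
sorted_people <- people[order(people$age), ]   # Alice, John, Bob
```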


Missing Data Handling:

• is.na(): Check for missing values.
• na.omit(): Remove rows that contain missing values from a data frame.
• complete.cases(): Identify complete cases (rows without missing values) in a data frame.
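
A minimal sketch of missing-value handling on made-up data. The na.rm argument shown here is a standard option of mean() and similar summary functions.

```r
v <- c(1, NA, 3)
is.na(v)                # FALSE TRUE FALSE
mean(v)                 # NA (missing values propagate)
mean(v, na.rm = TRUE)   # 2

d <- data.frame(x = c(1, NA, 3), y = c("a", "b", NA))
complete.cases(d)       # TRUE FALSE FALSE
clean <- na.omit(d)     # only the first row remains
```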

Data Reshaping:

• melt() and dcast() (from reshape2): Reshape data for analysis.
• gather() and spread() (from tidyr): Reshape data from wide to long and vice versa. Newer versions of tidyr offer pivot_longer() and pivot_wider() for the same purpose.
Data Sampling:

• sample(): Randomly sample elements from a vector or data frame.
• sample_n() and sample_frac() (from dplyr): Randomly sample rows from a data frame.
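
A sketch of sample() in base R. The seed value is arbitrary; set.seed() only makes the random draws repeatable, so no specific outputs are shown.

```r
set.seed(42)   # arbitrary seed, so the draws can be reproduced

draws <- sample(1:10, 3)                         # 3 values without replacement
flips <- sample(c("H", "T"), 5, replace = TRUE)  # 5 draws with replacement

# Sample rows of a data frame by sampling row numbers.
d <- data.frame(id = 1:10)
picked <- d[sample(nrow(d), 4), , drop = FALSE]  # 4 random rows
```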

Statistical Analysis:

• lm(): Fit linear regression models.

• glm(): Fit generalized linear models.

• t.test(), wilcox.test(): Perform statistical tests.

• cor(), cov(): Calculate correlation and covariance.


• sum(), min(), max(), mean(), median(): Compute basic summary statistics.
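
A small sketch of lm() and the summary functions. The data are constructed with an exact linear relationship, purely for illustration, so the fitted coefficients are known in advance.

```r
x <- 1:10
y <- 2 * x + 1    # exact linear relationship, for illustration only

fit <- lm(y ~ x)  # fit y as a linear function of x
coef(fit)         # intercept 1, slope 2 (up to floating-point error)

cor(x, y)         # 1 (perfect positive correlation)
mean(y)           # 12
median(y)         # 12
max(y) - min(y)   # 18
```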
Function Application:

• lapply(), sapply(), apply(): Apply a function to elements of a list or to the rows or columns of an array.
• mapply() (base R) and map() (from purrr): Apply a function over one or more lists or vectors.
• do.call(): Call a function with a list of arguments.
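
The base R functions in this group can be sketched as follows; the data are made up for illustration.

```r
nums <- list(a = 1:3, b = 4:6)

lapply(nums, sum)    # a list: a = 6, b = 15
sapply(nums, sum)    # a named vector: a = 6, b = 15

m <- matrix(1:6, nrow = 2)
apply(m, 1, sum)     # row sums: 9 12
apply(m, 2, sum)     # column sums: 3 7 11

# mapply(): walk over several vectors in parallel.
mapply(function(x, y) x + y, 1:3, 4:6)      # 5 7 9

# do.call(): the list supplies the arguments of paste().
do.call(paste, list("a", "b", sep = "-"))   # "a-b"
```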

Data I/O:

• read.csv(), read.table(): Read data from CSV files or tab-delimited text files.

• write.csv(), write.table(): Write data frames to CSV or text files.

• readRDS(), saveRDS(): Read and save R objects.
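
A round trip through a CSV file and an RDS file can be sketched as follows. The paths come from tempfile(), so nothing is written outside the session's temporary directory.

```r
df <- data.frame(id = 1:3, name = c("a", "b", "c"))

# CSV: plain text, readable by other tools.
csv_path <- tempfile(fileext = ".csv")
write.csv(df, csv_path, row.names = FALSE)   # omit the row-name column
df_csv <- read.csv(csv_path)

# RDS: stores a single R object together with its types.
rds_path <- tempfile(fileext = ".rds")
saveRDS(df, rds_path)
df_rds <- readRDS(rds_path)

all.equal(df, df_rds)                        # TRUE
```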


Machine Learning and Data Mining:

• caret and mlr packages: Streamline the process of building and evaluating machine learning models.
• randomForest() (from the randomForest package) and xgboost() (from the xgboost package): Train specific machine learning algorithms.

Data Visualization:
• plot(), hist(), barplot(), etc.: Create basic plots.
• ggplot2 package: Create complex and customized plots.
• lattice package: Create conditioned plots.
• heatmap(), boxplot(), qqnorm(), etc.: Generate specialized plots.
• plotly package: Create interactive plots.
• shiny package: Build interactive web applications.
Control Structures
Control structures in R are fundamental programming constructs that allow you to control the flow of your code and make decisions based on conditions.

R supports a variety of control structures, including conditional statements, loops, and function calls.
Conditional Statements: if Statement

The if statement is the basic conditional statement in R. It executes a block of code only if a condition is true.
Syntax:
if (condition) {
  # Code to execute if condition is true
}
Conditional Statements: if-else Statement

Allows you to execute one block of code if a condition is true and another block if the condition is false. The else keyword must follow the closing brace on the same line.
Syntax:
if (condition) {
  # Code to execute if condition is true
} else {
  # Code to execute if condition is false
}
Conditional Statements: if-else if-else Statement

Allows you to test multiple conditions and execute different code blocks based on which condition is true.
Syntax:
if (condition1) {
  # Code to execute if condition1 is true
} else if (condition2) {
  # Code to execute if condition2 is true
} else {
  # Code to execute if no conditions are true
}

Note: R has no switch statement, but the switch() function provides similar multi-way branching.
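
A minimal sketch of an if-else if-else chain, followed by the switch() function, which R provides for multi-way branching on a value. The variables marks, grade, day, and day_type are hypothetical.

```r
marks <- 80

if (marks >= 90) {
  grade <- "A"
} else if (marks >= 75) {
  grade <- "B"
} else {
  grade <- "C"
}
grade   # "B"

# switch() picks the branch whose name matches the string.
# An empty branch falls through to the next one; the unnamed
# last argument is the default.
day <- "Sun"
day_type <- switch(day,
  "Sat" = ,
  "Sun" = "weekend",
  "weekday"
)
day_type   # "weekend"
```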


Loop Statements: for loop

Iterates over a sequence (e.g., a vector) and executes a block of code for each element in the sequence.
Syntax:
for (variable in sequence) {
  # Code to execute for each element in the sequence
}
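
A short sketch of a for loop over values and over positions. The helper seq_along() is a base R function that returns the indices of a vector; the data are made up.

```r
# Loop over the values of a vector.
total <- 0
for (value in c(2, 4, 6)) {
  total <- total + value
}
total   # 12

# Loop over the positions 1, 2, ..., length(fruits).
fruits <- c("apple", "banana")
labels <- character(length(fruits))
for (i in seq_along(fruits)) {
  labels[i] <- paste(i, fruits[i])
}
labels  # "1 apple" "2 banana"
```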
Loop Statements: while loop

Repeats a block of code as long as a condition is true. The condition is checked before each iteration.
Syntax:
while (condition) {
  # Code to execute as long as the condition is true
}
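
As an illustration, this sketch doubles a number until it reaches 100 and counts the steps. The variable names are hypothetical.

```r
n <- 1
count <- 0
while (n < 100) {
  n <- n * 2           # 2, 4, 8, 16, 32, 64, 128
  count <- count + 1
}
n       # 128
count   # 7
```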
Loop Statements: repeat loop and break statement

Creates an infinite loop, which can be exited using the break statement inside a conditional statement.
Syntax:
repeat {
  # Code to execute
  if (condition) {
    break  # Exit the loop
  }
}
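
A minimal sketch of repeat with break, collecting the squares of 1 to 5. Without the break the loop would never end.

```r
i <- 0
squares <- c()
repeat {
  i <- i + 1
  if (i > 5) {
    break                      # leave the loop once i passes 5
  }
  squares <- c(squares, i^2)   # append the next square
}
squares   # 1 4 9 16 25
```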
Functions in R

A function is a block of code which only runs when it is called. You can pass data, known as parameters, into a function. A function can return data as a result.

In R, you define functions using the function() keyword and implement them by specifying the function's arguments, body, and return value.


Call Function

To call a function, use the function name followed by parentheses containing the arguments, like function_name(arg1, arg2).
function_name <- function(arg1, arg2, ...) {
  # Function body: perform operations
  return(result)
}

function_name(arg1, arg2)
Example
square <- function(x) {
  result <- x^2
  return(result)
}
square(5)

Output

[1] 25
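
Two further features of R functions, sketched with a hypothetical function power(): default argument values, and the fact that the last evaluated expression is returned even without return().

```r
# power() and its default exponent are hypothetical, for illustration only.
power <- function(x, exponent = 2) {
  x^exponent   # the last evaluated expression is the return value
}

power(3)                 # 9 (uses the default exponent)
power(2, exponent = 3)   # 8 (named argument)
power(1:4)               # 1 4 9 16 (arithmetic is vectorized)
```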
SELF-ASSESSMENT QUESTIONS

1. Point out the correct statement.

(a) Blocks are evaluated until a new line is entered after the closing brace
(b) break will execute a loop while a condition is true
(c) The if/else statement conditionally evaluates two statements

2. Point out the wrong statement.

(a) if and else test a condition and act on it
(b) for will execute a loop a fixed number of times
(c) break will execute a loop while a condition is true
TERMINAL QUESTIONS

1. Describe the control structures in R.

2. Discuss the different categories of functions that can be used in R programming.

3. Evaluate the functionality of R objects.


REFERENCES FOR FURTHER LEARNING OF THE SESSION

Text Books:

1. Paulo Cortez, “Modern Optimization with R”, Springer, (2014).
2. Nicholas J. Horton & Ken Kleinman, “Using R and RStudio for Data Management, Statistical Analysis, and Graphics”, Second Edition, CRC Press, (2015).

Reference Books:

1. Carlo Zaniolo, “Advanced Database Systems”, Morgan Kaufmann, Elsevier, (1997).
2. Jan L. Harrington, “Relational Database Design”, Morgan Kaufmann, Elsevier, (2009).
THANK YOU

Team – BDO
