
Programming for Data Analysis

(CT127-3-2-PFDA and Version VC1)

Data Structures
Topic & Structure of the lesson

- Matrices
- Arrays
- Data frames
- Lists

CT038-3-2 Object Oriented Development

CT127-3-2-Programming with Java
for Data Analysis File I/O
Data Structures SlideSlide
2 of 255of 19
Learning outcomes

At the end of this topic, you should be able to:

• Understand the various data structures
available in the R programming language

Key terms you must be able to use

If you have mastered this topic, you should be able to use the following terms correctly in your assignments:

• Matrices
• Arrays
• Data frames
• Lists


Matrix

- A very common mathematical structure that is essential to statistics is a matrix.
- A matrix has rows and columns.
- All data in a matrix must be of the same type, most commonly numeric.
- By default, a matrix is filled column by column.
- Matrices support the usual arithmetic operations, such as addition and multiplication.
- The matrix function is used to create a matrix.


Matrix

The values 1 2 3 4 5 6 7 8 9 fill a matrix with 3 rows and 3 columns, column by column:

       columns
rows   1 4 7
       2 5 8
       3 6 9

matrix(data = x, nrow = 1, ncol = 1, dimnames = NULL)
- nrow: number of rows
- ncol: number of columns
- dimnames: row and column names

> v <- c(1,2,3,4,5,6,7,8,9)
> a = matrix(v, nrow=3, ncol=3)
> a
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9
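
Filling by columns is only the default. As a small sketch (the names by_col and by_row are illustrative and not from the slides), the byrow argument of matrix switches to filling row by row:

```r
v <- 1:9

# Default: values fill the matrix column by column.
by_col <- matrix(v, nrow = 3, ncol = 3)

# byrow = TRUE fills row by row instead.
by_row <- matrix(v, nrow = 3, ncol = 3, byrow = TRUE)

print(by_col[1, ])  # first row of the column-filled matrix: 1 4 7
print(by_row[1, ])  # first row of the row-filled matrix: 1 2 3

# Transposing one layout gives the other.
print(identical(t(by_col), by_row))  # TRUE
```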

Giving names to the rows and columns
> B = matrix(c(2, 4, 3, 1), nrow=2, ncol=2,
+            dimnames = list(c("row1", "row2"), c("col1", "col2")))
> B
     col1 col2
row1    2    3
row2    4    1

Add two matrices

> C = matrix(c(7, 4, 2, 4), nrow=2, ncol=2)
> D = B + C
> D
     col1 col2
row1    9    5
row2    8    5
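
Addition works position by position, and so does the * operator. The sketch below reuses the values of B and C (without the row and column names, to keep it short) to contrast element-wise multiplication with R's matrix product operator %*%. The names elementwise and product are illustrative only.

```r
B <- matrix(c(2, 4, 3, 1), nrow = 2, ncol = 2)
C <- matrix(c(7, 4, 2, 4), nrow = 2, ncol = 2)

elementwise <- B * C    # multiplies matching positions
product     <- B %*% C  # linear-algebra matrix product

print(elementwise)      # rows: 14 6 / 16 4
print(product)          # rows: 26 16 / 32 12
```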

Matrix

- An element at the mth row, nth column of a can be accessed by the expression a[m, n].
> a[2, 3]   # element at 2nd row, 3rd column
[1] 8

- The entire mth row can be extracted as a[m, ].
> a[2, ]    # the 2nd row
[1] 2 5 8

- The entire nth column can be extracted as a[, n].
> a[, 3]    # the 3rd column
[1] 7 8 9
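
The same bracket notation also accepts ranges. As a sketch (sub, row_vec and row_mat are illustrative names), the example below selects a block of the 3 x 3 matrix a and shows that a single row is returned as a plain vector unless drop = FALSE is given:

```r
a <- matrix(1:9, nrow = 3, ncol = 3)

# Several rows and columns at once: rows 1-2, columns 2-3.
sub <- a[1:2, 2:3]
print(sub)

# A single row comes back as a plain vector by default...
row_vec <- a[2, ]
print(dim(row_vec))   # NULL, because a vector has no dimensions

# ...unless drop = FALSE asks R to keep the matrix shape.
row_mat <- a[2, , drop = FALSE]
print(dim(row_mat))   # 1 3
```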
Combining Matrices
> B = matrix(c(2, 4, 3, 1, 5, 7), nrow=3, ncol=2)
> B
     [,1] [,2]
[1,]    2    1
[2,]    4    5
[3,]    3    7
> C = matrix(c(0, 0, 1, 1), nrow=2, ncol=2)
> C
     [,1] [,2]
[1,]    0    1
[2,]    0    1
- Because B and C have the same number of columns, we can combine their rows with rbind.
> rbind(B, C)
     [,1] [,2]
[1,]    2    1
[2,]    4    5
[3,]    3    7
[4,]    0    1
[5,]    0    1
Combining Matrices
> B = matrix(c(2, 4, 3, 1, 5, 7), nrow=3, ncol=2)
> B
     [,1] [,2]
[1,]    2    1
[2,]    4    5
[3,]    3    7
> C = matrix(c(0, 0, 0), nrow=3, ncol=1)
> C
     [,1]
[1,]    0
[2,]    0
[3,]    0
- Because B and C have the same number of rows, we can combine their columns with cbind.
> cbind(B, C)
     [,1] [,2] [,3]
[1,]    2    1    0
[2,]    4    5    0
[3,]    3    7    0
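
Both functions need the shapes to line up: rbind needs matching column counts and cbind needs matching row counts. A minimal sketch follows, in which the names stacked, widened, bad and result are illustrative. A mismatched cbind raises an error, which is caught here with tryCatch so that the script keeps running:

```r
B <- matrix(c(2, 4, 3, 1, 5, 7), nrow = 3, ncol = 2)

stacked <- rbind(B, c(0, 1))     # add one row: needs 2 values (one per column)
widened <- cbind(B, c(0, 0, 0))  # add one column: needs 3 values (one per row)

print(dim(stacked))  # 4 2
print(dim(widened))  # 3 3

# Matrices whose shapes do not line up cannot be combined.
bad <- matrix(0, nrow = 2, ncol = 2)
result <- tryCatch(cbind(B, bad), error = function(e) "cannot combine")
print(result)        # "cannot combine"
```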
Return indices
> B = matrix(c(2, 4, 1, 1, 5, 7), nrow=3, ncol=2)
> B
     [,1] [,2]
[1,]    2    1
[2,]    4    5
[3,]    1    7
> which(B == min(B), arr.ind = TRUE)
     row col
[1,]   3   1
[2,]   1   2
- The minimum value 1 occurs twice, so which returns the row and column of both positions.
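
Without arr.ind = TRUE, which reports positions counted down the columns rather than row and column pairs. The sketch below (linear, max_pos and max_rc are illustrative names) shows this, and uses the base R functions which.max and arrayInd to locate the largest value:

```r
B <- matrix(c(2, 4, 1, 1, 5, 7), nrow = 3, ncol = 2)

# Without arr.ind, which() returns positions counted down the columns.
linear <- which(B == min(B))
print(linear)             # 3 4

# arrayInd() converts such a position into a row/column pair.
max_pos <- which.max(B)   # position of the first maximum
max_rc  <- arrayInd(max_pos, dim(B))
print(max_rc)             # row 3, column 2
```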


Array

- An array is essentially a multidimensional vector.
- All elements must be of the same type, and individual elements are accessed in a similar fashion, using square brackets.
- The first index is the row, the second is the column, and the remaining indices are for the outer dimensions.
- The array function is used to create an array.

array(data = x, dim = length(data), dimnames = NULL)
> vector1 <- c(2,18,30)
> vector2 <- c(10,14,17,13,11,15,22,11,33)
> data <- array(c(vector1, vector2), dim = c(3,2,2))
> data
, , 1

     [,1] [,2]
[1,]    2   10
[2,]   18   14
[3,]   30   17

, , 2

     [,1] [,2]
[1,]   13   22
[2,]   11   11
[3,]   15   33
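
Elements of this array are read with three indices: row, column, then matrix. A short sketch using the same data:

```r
vector1 <- c(2, 18, 30)
vector2 <- c(10, 14, 17, 13, 11, 15, 22, 11, 33)
data <- array(c(vector1, vector2), dim = c(3, 2, 2))

print(data[3, 2, 1])   # row 3, column 2 of the first matrix: 17
print(data[, , 2])     # the whole second matrix
print(data[1, , ])     # row 1 of every matrix, as a 2 x 2 matrix
```

Leaving an index empty selects everything along that dimension, just as with a[m, ] for a matrix.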
vector1 <- c(2,18,30)
vector2 <- c(10,14,17,13,11,15,22,11,33)

row_names <- c("ROW1","ROW2","ROW3")
col_names <- c("COL1","COL2","COL3","COL4")
matrix_names <- c("Matrix1","Matrix2")

data <- array(c(vector1, vector2), dim = c(3,4,2),
              dimnames = list(row_names, col_names, matrix_names))
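
Once dimension names are set, they can be used in place of numeric indices. The sketch below rebuilds the same array. Note that only 12 values are supplied for 3 x 4 x 2 = 24 cells, so R recycles them and the second matrix repeats the first.

```r
vector1 <- c(2, 18, 30)
vector2 <- c(10, 14, 17, 13, 11, 15, 22, 11, 33)
data <- array(c(vector1, vector2), dim = c(3, 4, 2),
              dimnames = list(c("ROW1", "ROW2", "ROW3"),
                              c("COL1", "COL2", "COL3", "COL4"),
                              c("Matrix1", "Matrix2")))

# Names can be used in place of numeric indices.
print(data["ROW2", "COL3", "Matrix1"])   # 11

# 12 values were supplied for 24 cells, so R recycles them:
# the second matrix repeats the first.
print(all(data[, , "Matrix1"] == data[, , "Matrix2"]))  # TRUE
```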

Data Frame

- One of the most useful features of R is the data frame.
- A data frame is just like an Excel spreadsheet in that it has columns and rows.
- R organizes a data frame so that each column is a vector, and all columns have the same length.
- The simplest way to create a data frame is with the data.frame function.

Data Frame
> x <- 10:5
> y <- -3:2
> q <- c("Hockey", "Football", "Baseball", "Basket ball", "Tennis", "Cricket")
> data <- data.frame(x, y, q)
> data
   x  y           q
1 10 -3      Hockey
2  9 -2    Football
3  8 -1    Baseball
4  7  0 Basket ball
5  6  1      Tennis
6  5  2     Cricket

This creates a 6x3 data frame consisting of three vectors.
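
Because each column is its own vector, columns of different types can sit side by side. The sketch below rebuilds the same data frame and inspects the column types (col_types is an illustrative name). Whether q is stored as character or as a factor depends on the R version, since the stringsAsFactors default changed in R 4.0.0, so no output is assumed for that column.

```r
x <- 10:5
y <- -3:2
q <- c("Hockey", "Football", "Baseball", "Basket ball", "Tennis", "Cricket")
data <- data.frame(x, y, q)

# Each column keeps its own type.
col_types <- sapply(data, class)
print(col_types)

# A column can be pulled out as an ordinary vector with $.
print(data$x)      # 10 9 8 7 6 5
print(dim(data))   # 6 3
```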

Data Frame
We can also assign names to the columns of the data frame
> newdata <- data.frame(First=x, Second=y, Sport=q)
> newdata
  First Second       Sport
1    10     -3      Hockey
2     9     -2    Football
3     8     -1    Baseball
4     7      0 Basket ball
5     6      1      Tennis
6     5      2     Cricket
- nrow(newdata) to find the total number of rows
- ncol(newdata) to find the total number of columns
- dim(newdata) to find the dimensions
- names(newdata) to find the names of the columns
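
As a quick sketch, the four functions listed above give the following results for the newdata example:

```r
newdata <- data.frame(First = 10:5, Second = -3:2,
                      Sport = c("Hockey", "Football", "Baseball",
                                "Basket ball", "Tennis", "Cricket"))

print(nrow(newdata))   # 6
print(ncol(newdata))   # 3
print(dim(newdata))    # 6 3
print(names(newdata))  # "First" "Second" "Sport"
```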
Data Frame

no  name  salary
1   Adam  3000
2   Tom   4000
3   Ali   5000

- df[,3] selects the third column of the data frame (here, salary).
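
A data frame is indexed like a matrix, with rows before the comma and columns after it. The sketch below assumes that the table above is stored in a data frame called df with columns no, name and salary. That construction is not shown on the slide and is an assumption made for illustration.

```r
# Assumed construction of the table shown on the slide.
df <- data.frame(no = 1:3,
                 name = c("Adam", "Tom", "Ali"),
                 salary = c(3000, 4000, 5000))

print(df[, 3])     # third column as a vector: 3000 4000 5000
print(df$salary)   # the same column, selected by name
print(df[2, ])     # second row, still a data frame

# Rows can be filtered with a condition: names with salary above 3000.
print(df[df$salary > 3000, "name"])
```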



List

- Unlike a data frame, a list can store any number of items of any type.
- A list can contain all numeric values, all characters, a mix of the two, or even a data frame.
- Lists are created with the list function, where each argument to the function becomes an element of the list.


> data <- list(c(1,2,3), 3:7)
> data
[[1]]
[1] 1 2 3

[[2]]
[1] 3 4 5 6 7

> newlist <- list(data, 1:10)

- length is used to find the number of elements in the list
- names is used to find the names of the elements
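
A short sketch of length and names on lists follows. The list called named and its element names small and range are illustrative, not from the slides.

```r
data <- list(c(1, 2, 3), 3:7)
print(length(data))    # 2 elements
print(names(data))     # NULL, because no names were given

# Names can be supplied when the list is created.
named <- list(small = c(1, 2, 3), range = 3:7)
print(names(named))    # "small" "range"

# Nesting: newlist has 2 elements, the first of which is itself a list.
newlist <- list(data, 1:10)
print(length(newlist))       # 2
print(length(newlist[[1]]))  # 2
```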


- data[[1]] returns the first element of the list; data[[2]] returns the second.
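
Double brackets return the element itself, while single brackets return a shorter list. A minimal sketch of the difference (first and wrapped are illustrative names):

```r
data <- list(c(1, 2, 3), 3:7)

first <- data[[1]]     # the element itself: a numeric vector
print(first)           # 1 2 3

wrapped <- data[1]     # a list of length 1 that contains the element
print(class(wrapped))  # "list"

print(data[[2]][3])    # third value of the second element: 5
```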


Vector, List, Matrix, Data Frame, Array

      Single Type    Multiple Types
1D    Vector         List
2D    Matrix         Data Frame
nD    Array

Quick Review Questions

• How do you create a data frame, a list, a matrix and an array?
• What are the commands required to create each of these data structures?
• How do you combine the rows and columns of matrices?
• What is the difference between a data frame and a list?

Summary of Main Teaching Points

• Matrices
• Arrays
• Data Frames
• Lists

Next Session

• Control Structures and Loops

- for

