
Programming for Data Analysis

(CT127-3-2-PFDA and Version VC1)

Data Structures
Topic & Structure of the lesson

- Matrices
- Arrays
- Data frames
- Lists

CT038-3-2 Object Oriented Development


CT127-3-2-Programming with Java
for Data Analysis File I/O
Data Structures SlideSlide
2 of 255of 19
Learning outcomes

At the end of this topic, you should be able to:


• Understand the various data structures
available in the R programming language

Key terms you must be able to use

If you have mastered this topic, you should be able to use the following terms correctly in your assignments and exams:

• Matrices
• Arrays
• Data frames
• Lists

Matrices

- A matrix is a very common mathematical structure that is essential to statistics.
- A matrix has rows and columns.
- All data in a matrix must be of the same type, most commonly numeric.
- By default, a matrix is filled column by column.
- Matrices support the usual mathematical operations, such as adding two matrices.
- The matrix function is used to create a matrix.
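Two of the points above, column-wise filling and the single-type rule, can be checked with a minimal sketch. The variable names (`by_col`, `by_row`, `mixed`) are illustrative and do not come from the slides.

```r
# Values fill the matrix column by column unless byrow = TRUE is given.
v <- 1:6
by_col <- matrix(v, nrow = 2, ncol = 3)
by_row <- matrix(v, nrow = 2, ncol = 3, byrow = TRUE)
print(by_col)
print(by_row)

# Mixing numbers and text coerces every element to character,
# because a matrix (like the vector it is built from) holds one type only.
mixed <- matrix(c(1, "a", 2, "b"), nrow = 2)
print(typeof(mixed))
```

Comparing the two printed matrices shows how `byrow` changes where each value lands.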

Matrices

matrix(data = NA, nrow = 1, ncol = 1, byrow = FALSE, dimnames = NULL)

- data: the vector of values, here 1 2 3 4 5 6 7 8 9
- nrow: number of rows
- ncol: number of columns
- byrow: FALSE (the default) fills by columns, TRUE fills by rows
- dimnames: row and column names

> v <- c(1,2,3,4,5,6,7,8,9)
> a = matrix(v, nrow=3, ncol=3)
> a
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9

Giving names to the rows and columns
> B = matrix(c(2, 4, 3, 1), nrow=2, ncol=2,
+            dimnames = list(c("row1", "row2"), c("col1", "col2")))
> B
     col1 col2
row1    2    3
row2    4    1

Add two matrices
> C = matrix(c(7, 4, 2, 4), nrow=2, ncol=2)
> D = B + C
> D
     col1 col2
row1    9    5
row2    8    5
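Addition is only one of the operations matrices support. As a hedged sketch using the same values as B and C (without the dimnames), the following contrasts element-wise multiplication with the matrix product; the variable names are illustrative.

```r
B <- matrix(c(2, 4, 3, 1), nrow = 2, ncol = 2)
C <- matrix(c(7, 4, 2, 4), nrow = 2, ncol = 2)

elementwise <- B * C     # multiplies values in matching positions
product     <- B %*% C   # linear-algebra matrix product
transposed  <- t(B)      # swaps rows and columns

print(elementwise)
print(product)
print(transposed)
```

The `*` operator works position by position, just like `+`, whereas `%*%` follows the row-by-column rule from linear algebra.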

Matrix a, with its row and column indices:

     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9

- An element at the mth row, nth column of a can be accessed by the expression a[m, n].
> a[2, 3]   # element at 2nd row, 3rd column
[1] 8
- The entire mth row can be extracted as a[m, ].
> a[2, ]    # the 2nd row
[1] 2 5 8
- The entire nth column can be extracted as a[, n].
> a[, 3]    # the 3rd column
[1] 7 8 9

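The indexing rules above can be tried on the same 3 x 3 matrix. This sketch also shows two related forms that the slides do not cover, `drop = FALSE` and selecting several rows or columns at once; treat them as optional extras.

```r
a <- matrix(1:9, nrow = 3, ncol = 3)

element <- a[2, 3]               # single element in row 2, column 3
row_vec <- a[2, ]                # 2nd row, returned as a plain vector
row_mat <- a[2, , drop = FALSE]  # 2nd row, kept as a 1 x 3 matrix
sub_mat <- a[1:2, c(1, 3)]       # rows 1 and 2 of columns 1 and 3

print(element)
print(row_vec)
print(dim(row_mat))
print(sub_mat)
```

Extracting a single row or column drops the result to a vector by default, which is why `drop = FALSE` is useful when later code expects a matrix.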
Combining Matrices
> B = matrix(c(2, 4, 3, 1, 5, 7), nrow=3, ncol=2)
> B
     [,1] [,2]
[1,]    2    1
[2,]    4    5
[3,]    3    7
> C = matrix(c(0, 0, 1, 1), nrow=2, ncol=2)
> C
     [,1] [,2]
[1,]    0    1
[2,]    0    1
- Then we can combine the rows of B and C with rbind.
> rbind(B, C)
     [,1] [,2]
[1,]    2    1
[2,]    4    5
[3,]    3    7
[4,]    0    1
[5,]    0    1

Combining Matrices
> B = matrix(c(2, 4, 3, 1, 5, 7), nrow=3, ncol=2)
> B
     [,1] [,2]
[1,]    2    1
[2,]    4    5
[3,]    3    7
> C = matrix(c(0, 0, 0), nrow=3, ncol=1)
> C
     [,1]
[1,]    0
[2,]    0
[3,]    0
- Then we can combine the columns of B and C with cbind.
> cbind(B, C)
     [,1] [,2] [,3]
[1,]    2    1    0
[2,]    4    5    0
[3,]    3    7    0

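The two combining functions place different requirements on shape: rbind needs matrices with the same number of columns, and cbind needs the same number of rows. A minimal sketch follows, with illustrative names (`rows_to_add`, `col_to_add`) that are not from the slides.

```r
B <- matrix(c(2, 4, 3, 1, 5, 7), nrow = 3, ncol = 2)
rows_to_add <- matrix(c(0, 0, 1, 1), nrow = 2, ncol = 2)
col_to_add  <- matrix(c(0, 0, 0), nrow = 3, ncol = 1)

stacked <- rbind(B, rows_to_add)  # both have 2 columns
widened <- cbind(B, col_to_add)   # both have 3 rows
print(dim(stacked))
print(dim(widened))

# Combining matrices whose shapes do not line up raises an error;
# tryCatch turns it into a marker value so the script keeps running.
mismatch <- tryCatch(cbind(B, rows_to_add), error = function(e) "error")
print(mismatch)
```

Checking `dim()` before combining is a cheap way to avoid the error case.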
Return indices
> B = matrix(c(2, 4, 1, 1, 5, 7), nrow=3, ncol=2)
> B
     [,1] [,2]
[1,]    2    1
[2,]    4    5
[3,]    1    7
> which(B == min(B), arr.ind = TRUE)
     row col
[1,]   3   1
[2,]   1   2
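As a hedged illustration of what arr.ind changes, the sketch below compares the default result of which (positions counted down the columns) with the row and column form, and applies the same idea to a different condition. The variable names are illustrative.

```r
B <- matrix(c(2, 4, 1, 1, 5, 7), nrow = 3, ncol = 2)

linear_idx  <- which(B == min(B))                   # positions counted column by column
row_col_idx <- which(B == min(B), arr.ind = TRUE)   # one row per match: row, col
first_min   <- which.min(B)                         # only the first minimum
above_three <- which(B > 3, arr.ind = TRUE)         # any logical condition works

print(linear_idx)
print(row_col_idx)
print(first_min)
print(above_three)
```

When the minimum occurs more than once, as it does here, `which.min` reports only the first occurrence, so `which(B == min(B))` is the safer choice for finding all of them.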

Arrays

- An array is essentially a multidimensional vector.
- All elements must be of the same type, and individual elements are accessed in a similar fashion, using square brackets.
- The first index is the row, the second is the column, and the remaining indices are for the outer dimensions.
- The array function is used to create an array.

Arrays
array(data = NA, dim = length(data), dimnames = NULL)

> vector1 <- c(2,18,30)
> vector2 <- c(10,14,17,13,11,15,22,11,33)
> data <- array(c(vector1, vector2), dim = c(3,2,2))
> data
, , 1

     [,1] [,2]
[1,]    2   10
[2,]   18   14
[3,]   30   17

, , 2

     [,1] [,2]
[1,]   13   22
[2,]   11   11
[3,]   15   33

Arrays
vector1 <- c(2,18,30)
vector2 <- c(10,14,17,13,11,15,22,11,33)

row_names <- c("ROW1","ROW2","ROW3")
col_names <- c("COL1","COL2","COL3","COL4")
matrix_names <- c("Matrix1","Matrix2")

data <- array(c(vector1, vector2), dim = c(3,4,2),
              dimnames = list(row_names, col_names, matrix_names))

Arrays
data

data[1,4,1]

data[3,2,2]
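The output for these expressions did not survive in this copy of the slides, so the following is a sketch that rebuilds the named 3 x 4 x 2 array and evaluates them. Note that only 12 values are supplied for 24 cells, so R recycles them and Matrix2 repeats Matrix1. Indexing by name, shown at the end, is an addition that is not on the slide.

```r
vector1 <- c(2, 18, 30)
vector2 <- c(10, 14, 17, 13, 11, 15, 22, 11, 33)

data <- array(c(vector1, vector2), dim = c(3, 4, 2),
              dimnames = list(c("ROW1", "ROW2", "ROW3"),
                              c("COL1", "COL2", "COL3", "COL4"),
                              c("Matrix1", "Matrix2")))

first  <- data[1, 4, 1]                      # row 1, column 4, first matrix
second <- data[3, 2, 2]                      # row 3, column 2, second matrix
named  <- data["ROW3", "COL2", "Matrix2"]    # same cell as 'second', by name
slice  <- data[, , "Matrix1"]                # the whole first 3 x 4 matrix

print(first)
print(second)
print(named)
print(slice)
```

Because of the recycling, `data[, , 1]` and `data[, , 2]` hold the same values in this example.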

Data Frame

- One of the most useful features of R is the data frame.
- A data frame is like an Excel spreadsheet in that it has columns and rows.
- R stores each column of a data frame as a vector.
- The simplest way to create a data frame is the data.frame function.

Data Frame
> x <- 10:5
> y <- -3:2
> q <- c("Hockey", "Football", "Baseball", "Basket ball", "Tennis", "Cricket")
> data <- data.frame(x, y, q)
> data
   x  y           q
1 10 -3      Hockey
2  9 -2    Football
3  8 -1    Baseball
4  7  0 Basket ball
5  6  1      Tennis
6  5  2     Cricket

This creates a 6 x 3 data frame (6 rows, 3 columns) built from three vectors.
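Unlike a matrix, a data frame can hold a different type in each column. The sketch below rebuilds the same data frame and inspects the column types. Passing `stringsAsFactors = FALSE` is an assumption made here so that the text column stays character regardless of the R version in use.

```r
x <- 10:5
y <- -3:2
q <- c("Hockey", "Football", "Baseball", "Basket ball", "Tennis", "Cricket")

data <- data.frame(x, y, q, stringsAsFactors = FALSE)

print(dim(data))            # number of rows, then number of columns
print(sapply(data, class))  # one type per column
str(data)                   # compact overview of the structure
```

Each column is still a single-type vector; only the data frame as a whole mixes types.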

Data Frame
We can also assign names to the columns of the data frame
> newdata <- data.frame(First=x, Second=y, Sport=q)
> newdata
  First Second       Sport
1    10     -3      Hockey
2     9     -2    Football
3     8     -1    Baseball
4     7      0 Basket ball
5     6      1      Tennis
6     5      2     Cricket
- nrow(newdata) returns the number of rows
- ncol(newdata) returns the number of columns
- dim(newdata) returns the dimensions (rows, then columns)
- names(newdata) returns the names of the columns

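The four inspection functions listed above can be run together on the renamed data frame. This is a minimal sketch that recreates `x`, `y` and `q` so that it runs on its own.

```r
x <- 10:5
y <- -3:2
q <- c("Hockey", "Football", "Baseball", "Basket ball", "Tennis", "Cricket")
newdata <- data.frame(First = x, Second = y, Sport = q)

print(nrow(newdata))   # number of rows
print(ncol(newdata))   # number of columns
print(dim(newdata))    # rows and columns together
print(names(newdata))  # column names
```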
Data Frame
df
no  name  salary
1   Adam  3000
2   Tom   4000
3   Ali   5000

- df: the whole data frame
- df$salary or df[,3]: the salary column
- df[2:3,]: rows 2 and 3
- df[3,2]: the value in row 3, column 2
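To try these access expressions, the data frame has to be built first. The sketch below assumes that `no` is an ordinary first column, which is consistent with salary being column 3. The final filtering line is an extra that is not on the slide.

```r
df <- data.frame(no = 1:3,
                 name = c("Adam", "Tom", "Ali"),
                 salary = c(3000, 4000, 5000),
                 stringsAsFactors = FALSE)

by_name  <- df$salary     # column selected by name
by_index <- df[, 3]       # same column selected by position
two_rows <- df[2:3, ]     # rows 2 and 3, all columns
one_cell <- df[3, 2]      # row 3, column 2

# Rows can also be selected with a condition (not shown on the slide).
well_paid <- df[df$salary > 3000, "name"]

print(by_name)
print(two_rows)
print(one_cell)
print(well_paid)
```

Selecting by name with `$` tends to be more robust than selecting by position, since it keeps working if the column order changes.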

Lists

- Unlike a data frame, a list can store any number of items of any type.
- A list can contain all numerics, all characters, a mix of the two, or even a data.frame.
- Lists are created with the list function, where each argument to the function becomes an element of the list.
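As a small sketch of the points above, the following list mixes a numeric vector, a string, a logical value and a data frame. The contents are made up for illustration.

```r
mixed <- list(1:3, "text", TRUE, data.frame(id = 1:2))

print(length(mixed))         # one element per argument passed to list()
print(sapply(mixed, class))  # each element keeps its own type
```

Each argument becomes exactly one element, so the three-value vector `1:3` counts as a single item.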

List

> data <- list(c(1,2,3), 3:7)
> data
[[1]]
[1] 1 2 3

[[2]]
[1] 3 4 5 6 7

> newlist <- list(data, 1:10)

- length is used to find the number of elements in the list
- names is used to find the names of the list elements
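The behaviour of length and names on these lists can be sketched as follows. The named list at the end (`named`, with elements `small` and `range`) is a hypothetical example added for illustration.

```r
data <- list(c(1, 2, 3), 3:7)
newlist <- list(data, 1:10)

print(length(data))     # two elements
print(length(newlist))  # also two: the nested list counts as one element
print(names(data))      # NULL, because no names were given

# Supplying names when creating the list makes elements accessible with $.
named <- list(small = c(1, 2, 3), range = 3:7)
print(names(named))
print(named$range)
```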

List
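- data: the whole list
- data[[1]], data[[2]]: the first and second elements of the list
- data[[1]][1,2]: row 1, column 2 of the first element, when that element is two-dimensional (a matrix or data frame)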
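The figure for this slide did not survive, so the sketch below assumes a hypothetical list whose first element is a small matrix; the actual contents on the slide may have differed. It also contrasts double brackets, which return the element itself, with single brackets, which return a shorter list.

```r
m <- matrix(1:6, nrow = 2)        # hypothetical first element
data <- list(m, c("a", "b", "c"))

first_element <- data[[1]]        # the matrix itself
one_cell      <- data[[1]][1, 2]  # row 1, column 2 of that matrix
sub_list      <- data[1]          # a list of length 1 containing the matrix

print(first_element)
print(one_cell)
print(class(sub_list))
```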

Vector, List, Matrix, Data Frame, Array

      Single Type    Multiple Types
1D    Vector         List
2D    Matrix         Data Frame
nD    Array
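The classification in the table can be checked in R itself. The sketch below creates one small object of each kind (the values are arbitrary) and tests what each one is.

```r
v <- c(1, 2, 3)                              # 1D, single type
l <- list(1, "a", TRUE)                      # 1D, multiple types
m <- matrix(1:4, nrow = 2)                   # 2D, single type
d <- data.frame(n = 1:2, s = c("a", "b"))    # 2D, multiple types
a <- array(1:24, dim = c(2, 3, 4))           # nD, single type

print(is.vector(v))
print(is.list(l))
print(is.matrix(m))
print(is.data.frame(d))
print(is.array(a))

print(length(dim(a)))  # number of dimensions of the array
print(is.list(d))      # a data frame is stored as a list of columns
print(is.array(m))     # a matrix is a two-dimensional array
```

The last two checks show how the structures relate: a data frame is built on a list, and a matrix is a special case of an array.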

Quick Review Questions

• How do you create a data frame, a list, a matrix and an array?
• What commands are required to create each of these data structures?
• How do you combine the rows and columns of matrices?
• What is the difference between a data frame and a list?

Summary of Main Teaching Points

• Matrices
• Arrays
• Data Frames
• Lists

Q&A

Next Session

• Control Structures and Loops
  - if-else
  - switch
  - for
  - while
