Data Frame

10/26/23, 10:54 AM Data Frame
AUTHOR
Dr. Mohammad Nasir Abdullah
Data Frames
Data sets frequently consist of more than one column of data, where each column represents
measurements of a single variable. Each row usually represents a single observation. This format is
referred to as case-by-variable format.
Most data sets are stored in R as data frames. These are like matrices, but with the columns having
their own names.
A data frame is one of the most commonly used data structures in R, especially for data analysis and
statistical modelling. Conceptually, it can be thought of as a table or a spreadsheet, in which rows
represent observations and columns represent variables. A data frame is similar to a
matrix, but with the added flexibility that different columns can contain different types of data (e.g.,
numeric, character, factor).
Features:
1. Mixed Data Types: Unlike matrices, data frames can store a different type of data in each
column (the values within any one column all share the same type).
2. Column Names: Every column in a data frame has a name, which makes accessing and manipulating
data easier and more intuitive.
3. Row Names: By default, rows are named "1", "2", and so on up to the number of rows, but these can also
be set explicitly to other values.
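The three features can be checked directly. The sketch below builds a small, made-up data frame (`people` and `m` are illustrative names only) and assumes R 4.0 or later, where `data.frame()` keeps text as character rather than converting it to a factor.

```r
# Assumes R >= 4.0, where text columns stay character (not factor).
# 'people' and 'm' are illustrative names for made-up data.
people <- data.frame(Name   = c("Ali", "Abu"),
                     Age    = c(9, 6),
                     Passed = c(TRUE, FALSE),
                     row.names = c("s1", "s2"))  # explicit row names

# 1. Mixed data types: each column keeps its own class
sapply(people, class)   # Name "character", Age "numeric", Passed "logical"

# 2. Column names
names(people)           # "Name" "Age" "Passed"

# 3. Row names set explicitly instead of the default "1", "2"
rownames(people)        # "s1" "s2"

# By contrast, a matrix holds one type only, so cbind() turns the ages into text
m <- cbind(Name = c("Ali", "Abu"), Age = c(9, 6))
m[, "Age"]              # "9" "6"
```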
Creation:
A data frame can be created using the data.frame() function:
df <- data.frame(Name = c("Ali", "Abu", "Ahmad"),
                 Age = c(9, 6, 2),
                 Score = c(82, 93, 92))
df
   Name Age Score
1   Ali   9    82
2   Abu   6    93
3 Ahmad   2    92
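`data.frame()` expects every column to supply the same number of rows. A single value is recycled down the whole column, while vectors whose lengths do not fit together stop with an error. A minimal sketch (the `Class` column and the object names `class_list` and `msg` are made up for illustration):

```r
# Illustrative names: class_list, msg, and the Class column are hypothetical.
# A length-1 value is recycled to fill every row
class_list <- data.frame(Name = c("Ali", "Abu", "Ahmad"),
                         Class = "3A")
class_list$Class   # "3A" "3A" "3A"

# Three names but only two ages: data.frame() refuses to build the table
msg <- tryCatch(
  data.frame(Name = c("Ali", "Abu", "Ahmad"), Age = c(9, 6)),
  error = function(e) conditionMessage(e)
)
msg   # the error message reports the differing numbers of rows (3 and 2)
```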
Indexing:
1. Columns: You can access a column in a data frame using the $ operator or double square brackets,
[[…]].
# Extract names from df
df$Name
https://sta334.s3.ap-southeast-1.amazonaw s.com/day3/Data+Frame.html 1/7

[1] "Ali" "Abu" "Ahmad"
# Extract scores from df
df[["Score"]]
[1] 82 93 92
2. Rows: Rows can be accessed using single square brackets […], with the row index before the comma and the column index left empty.
# Extract the first row
df[1, ]
  Name Age Score
1  Ali   9    82
# Extract the third row
df[3, ]
   Name Age Score
3 Ahmad   2    92
3. Subsetting: You can subset a data frame by placing a logical condition in the row position:
# Extract the rows where Score is greater than 90
df[df$Score > 90, ]
   Name Age Score
2   Abu   6    93
3 Ahmad   2    92
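These indexing forms can be combined. The sketch below recreates the same three-row `df` and shows that `[[ ]]` returns the column itself while single brackets with a column name return a one-column data frame, how rows and columns can be chosen in one step, and base R's `subset()` function as an alternative way to write the condition.

```r
df <- data.frame(Name = c("Ali", "Abu", "Ahmad"),
                 Age = c(9, 6, 2),
                 Score = c(82, 93, 92))

# [[ ]] returns the column vector; [ ] with a name returns a data frame
class(df[["Score"]])   # "numeric"
class(df["Score"])     # "data.frame"

# Rows go before the comma, columns after it
df[df$Score > 90, c("Name", "Score")]   # Abu and Ahmad, two columns only
df[2, "Age"]                            # 6

# subset() evaluates the condition inside df, so the df$ prefix is not needed
subset(df, Score > 90, select = c(Name, Score))
```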
Useful functions
1. head() and tail(): Display the first or last part of a data frame.
2. str(): Provides the structure of a data frame, showing the data type of each column and the first
few entries.
3. summary(): Gives a statistical summary of each column in a data frame.
4. dim(): Returns the dimensions (number of rows and columns) of a data frame.
5. rownames() and colnames(): Get or set the row or column names of a data frame.
6. merge(): Merges two data frames by common columns or row names.
Examples
1) head() and tail()
These functions display the first or last part of a data frame, respectively. By default, they show six
rows.
# Create a sample data frame
df <- data.frame(Name = c("Ali", "Abu", "Ahmad", "Aminah", "Rosnah", "Rozanae", "Rohana"),
                 Age = c(25, 32, 29, 24, 27, 31, 23),
                 Score = c(85, 90, 93, 87, 78, 91, 82))

# Display the first few rows
head(df)
     Name Age Score
1     Ali  25    85
2     Abu  32    90
3   Ahmad  29    93
4  Aminah  24    87
5  Rosnah  27    78
6 Rozanae  31    91
# Display the last few rows
tail(df)
     Name Age Score
2     Abu  32    90
3   Ahmad  29    93
4  Aminah  24    87
5  Rosnah  27    78
6 Rozanae  31    91
7  Rohana  23    82
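Both functions take an `n` argument that changes how many rows are shown; a negative `n` instead drops that many rows from the other end. A short sketch on the same seven-row `df`:

```r
df <- data.frame(Name = c("Ali", "Abu", "Ahmad", "Aminah", "Rosnah", "Rozanae", "Rohana"),
                 Age = c(25, 32, 29, 24, 27, 31, 23),
                 Score = c(85, 90, 93, 87, 78, 91, 82))

head(df, n = 3)    # first 3 rows: Ali, Abu, Ahmad
tail(df, n = 2)    # last 2 rows: Rozanae, Rohana (row names 6 and 7 are kept)
head(df, n = -5)   # every row except the last 5: Ali, Abu
```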
2) str()
This function provides a concise display of the structure of an object, such as a data frame.
# Display the structure of df
str(df)
'data.frame': 7 obs. of 3 variables:
 $ Name : chr  "Ali" "Abu" "Ahmad" "Aminah" ...
 $ Age  : num  25 32 29 24 27 31 23
 $ Score: num  85 90 93 87 78 91 82
3) summary()
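Gives a statistical summary of each column in a data frame: the minimum, quartiles, mean and maximum for numeric columns, and the length, class and mode for character columns.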
# Get a summary of df
summary(df)
     Name                Age            Score
 Length:7           Min.   :23.00   Min.   :78.00
 Class :character   1st Qu.:24.50   1st Qu.:83.50
 Mode  :character   Median :27.00   Median :87.00
                    Mean   :27.29   Mean   :86.57
                    3rd Qu.:30.00   3rd Qu.:90.50
                    Max.   :32.00   Max.   :93.00
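`summary()` adapts to what it is given: a numeric vector gets the six statistics shown above, while a factor gets a count for each level. In the sketch below the `Group` column (and the name `scores`) is hypothetical, added only to show a factor.

```r
# 'scores' and its Group column are made up for illustration
scores <- data.frame(Score = c(85, 90, 93, 87, 78, 91, 82),
                     Group = factor(c("A", "B", "A", "B", "A", "B", "A")))

summary(scores$Score)   # Min. 78, 1st Qu. 83.5, Median 87, Mean 86.57, 3rd Qu. 90.5, Max. 93
summary(scores$Group)   # counts per level: A 4, B 3
summary(scores)         # column by column, as with df above
```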
4) dim()
Returns the dimensions of an object.

# Get the dimensions of df (number of rows and columns)
dim(df)
[1] 7 3
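Each part of `dim()` is also available on its own through `nrow()` and `ncol()`. Because a data frame is stored as a list of columns, `length()` counts columns rather than rows. A short sketch on the same `df`:

```r
df <- data.frame(Name = c("Ali", "Abu", "Ahmad", "Aminah", "Rosnah", "Rozanae", "Rohana"),
                 Age = c(25, 32, 29, 24, 27, 31, 23),
                 Score = c(85, 90, 93, 87, 78, 91, 82))

nrow(df)     # 7, the same as dim(df)[1]
ncol(df)     # 3, the same as dim(df)[2]
length(df)   # 3: a data frame is a list of columns, so length() counts columns
```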
5) rownames() and colnames()
Retrieve or set the row or column names of a data frame.
# Get row names of df
rownames(df)
[1] "1" "2" "3" "4" "5" "6" "7"
# Get column names of df
colnames(df)
[1] "Name"  "Age"   "Score"
# Set new row names for df
rownames(df) <- c("A", "B", "C", "D", "E", "F", "G")
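Once row names are set, they can be used as row indices, and `colnames()` can rename a single column by position. The sketch works on a smaller, made-up data frame called `small` (a hypothetical name) so that the seven-row `df` used in the rest of this section is not changed.

```r
# 'small' is a hypothetical example, separate from df
small <- data.frame(Name = c("Ali", "Abu", "Ahmad"),
                    Score = c(82, 93, 92))
rownames(small) <- c("A", "B", "C")

small["B", ]                   # the row named "B" (Abu)
colnames(small)[2] <- "Marks"  # rename only the second column
colnames(small)                # "Name" "Marks"
```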
6) merge()
Merge two data frames by common columns or row names.
# Create another sample data frame
df2 <- data.frame(Name = c("Ali", "Abu", "Rosnah", "Rohana"),
                  Grade = c("A", "B", "A", "C"))
# Merge df and df2 by the "Name" column
merged_df <- merge(df, df2, by = "Name")
print(merged_df)
    Name Age Score Grade
1    Abu  32    90     B
2    Ali  25    85     A
3 Rohana  23    82     C
4 Rosnah  27    78     A
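By default `merge()` keeps only the names found in both data frames, which is why Ahmad, Aminah and Rozanae are missing above. Setting `all.x = TRUE` keeps every row of the first data frame and fills the missing `Grade` values with `NA` (`all.y` and `all` do the same for the second data frame and for both). A sketch using the same data; `inner` and `left` are illustrative names:

```r
df <- data.frame(Name = c("Ali", "Abu", "Ahmad", "Aminah", "Rosnah", "Rozanae", "Rohana"),
                 Age = c(25, 32, 29, 24, 27, 31, 23),
                 Score = c(85, 90, 93, 87, 78, 91, 82))
df2 <- data.frame(Name = c("Ali", "Abu", "Rosnah", "Rohana"),
                  Grade = c("A", "B", "A", "C"))

inner <- merge(df, df2, by = "Name")                # 4 rows: names present in both
left  <- merge(df, df2, by = "Name", all.x = TRUE)  # 7 rows: every row of df kept

left$Name[is.na(left$Grade)]   # "Ahmad" "Aminah" "Rozanae" (no grade in df2)
```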
Let’s use the mtcars dataset.

This dataset, which is built into R, was extracted from the 1974 Motor Trend US magazine. It records fuel consumption and 10 aspects of design and performance for 32 car models from 1973-74.
1. Quick Glance at the Dataset
First, let’s take a quick look at the mtcars dataset:
head(mtcars)

                   mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
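In the output above, the car model names on the left are row names rather than a column, which is why they have no header. A short sketch of reaching them:

```r
head(rownames(mtcars), 3)           # "Mazda RX4" "Mazda RX4 Wag" "Datsun 710"
ncol(mtcars)                        # 11: the model names are not counted as a column
mtcars["Valiant", c("mpg", "hp")]   # one car looked up by its row name: 18.1 mpg, 105 hp
```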
2. Structure of the Dataset (str())
Examining the structure of mtcars:
str(mtcars)
'data.frame': 32 obs. of 11 variables:
 $ mpg : num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
 $ cyl : num  6 6 4 6 8 6 8 4 4 6 ...
 $ disp: num  160 160 108 258 360 ...
 $ hp  : num  110 110 93 110 175 105 245 62 95 123 ...
 $ drat: num  3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
 $ wt  : num  2.62 2.88 2.32 3.21 3.44 ...
 $ qsec: num  16.5 17 18.6 19.4 17 ...
 $ vs  : num  0 0 1 1 0 1 0 1 1 1 ...
 $ am  : num  1 1 1 0 0 0 0 0 0 0 ...
 $ gear: num  4 4 4 3 3 3 3 4 4 4 ...
 $ carb: num  4 4 1 1 2 1 4 2 2 4 ...
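`str()` shows that every mtcars column is stored as `num`, even though some hold category codes. According to the dataset's help page (`?mtcars`), `vs` is the engine shape (0 = V-shaped, 1 = straight). The sketch converts it to a labelled factor on a copy named `cars` (an illustrative name), so `mtcars` itself is left unchanged.

```r
cars <- mtcars   # work on a copy; mtcars stays as it is
cars$vs <- factor(cars$vs, levels = c(0, 1),
                  labels = c("V-shaped", "straight"))

str(cars$vs)     # now a factor with 2 levels instead of num
table(cars$vs)   # V-shaped 18, straight 14
```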
3. Summary of the Dataset (summary())
Providing a statistical summary:
summary(mtcars)
      mpg             cyl             disp             hp
 Min.   :10.40   Min.   :4.000   Min.   : 71.1   Min.   : 52.0
 1st Qu.:15.43   1st Qu.:4.000   1st Qu.:120.8   1st Qu.: 96.5
 Median :19.20   Median :6.000   Median :196.3   Median :123.0
 Mean   :20.09   Mean   :6.188   Mean   :230.7   Mean   :146.7
 3rd Qu.:22.80   3rd Qu.:8.000   3rd Qu.:326.0   3rd Qu.:180.0
 Max.   :33.90   Max.   :8.000   Max.   :472.0   Max.   :335.0
      drat             wt             qsec             vs
 Min.   :2.760   Min.   :1.513   Min.   :14.50   Min.   :0.0000
 1st Qu.:3.080   1st Qu.:2.581   1st Qu.:16.89   1st Qu.:0.0000
 Median :3.695   Median :3.325   Median :17.71   Median :0.0000
 Mean   :3.597   Mean   :3.217   Mean   :17.85   Mean   :0.4375
 3rd Qu.:3.920   3rd Qu.:3.610   3rd Qu.:18.90   3rd Qu.:1.0000
 Max.   :4.930   Max.   :5.424   Max.   :22.90   Max.   :1.0000
       am              gear            carb
 Min.   :0.0000   Min.   :3.000   Min.   :1.000
 1st Qu.:0.0000   1st Qu.:3.000   1st Qu.:2.000
 Median :0.0000   Median :4.000   Median :2.000
 Mean   :0.4062   Mean   :3.688   Mean   :2.812
 3rd Qu.:1.0000   3rd Qu.:4.000   3rd Qu.:4.000
 Max.   :1.0000   Max.   :5.000   Max.   :8.000

4. Dimensions of the Dataset (dim())
Checking the number of rows and columns:
dim(mtcars)
[1] 32 11
5. Column Names (colnames())
Retrieving the names of the columns:
colnames(mtcars) # same as names(mtcars)
 [1] "mpg"  "cyl"  "disp" "hp"   "drat" "wt"   "qsec" "vs"   "am"   "gear"
[11] "carb"
6. Subsetting Example
Extracting data for cars with 6 cylinders and horsepower (hp) greater than 150:
mtcars[mtcars$cyl == 6 & mtcars$hp > 150, ]
              mpg cyl disp  hp drat   wt qsec vs am gear carb
Ferrari Dino 19.7   6  145 175 3.62 2.77 15.5  0  1    5    6
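Conditions can be joined with `|` (OR) as well as `&` (AND), and the column position inside the brackets can be filled at the same time. The same filter can also be written with base R's `subset()`. A sketch; `either` and `same` are illustrative names:

```r
# Cars with 6 cylinders OR more than 150 hp, keeping three columns
either <- mtcars[mtcars$cyl == 6 | mtcars$hp > 150, c("cyl", "hp", "mpg")]
nrow(either)   # 19: 7 six-cylinder cars + 13 cars over 150 hp, Ferrari Dino counted once

# The same filter with subset(); column names need no mtcars$ prefix
same <- subset(mtcars, cyl == 6 | hp > 150, select = c(cyl, hp, mpg))
all.equal(either, same)   # TRUE
```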
Exercise
Exercise 1:
1. Create a data frame named students with the following columns: Name, Age, Grade, and Subject.
Populate it with at least 5 rows of sample data.
2. Display the structure of the students data frame using the str() function.
3. Add a new column to the students data frame named Attendance and populate it with sample
data.
Exercise 2:
1. From the mtcars dataset, extract the mpg (miles per gallon) and hp (horsepower) columns and
save them as a new data frame named car_specs.
2. Retrieve the first 6 rows of the car_specs data frame.
3. Create a subset of mtcars containing only cars with 6 cylinders (cyl).
Exercise 3:
1. Calculate the median horsepower (hp) of all cars in the mtcars dataset.

2. How many cars in the dataset have an automatic transmission (am column: 0 represents
automatic, 1 represents manual)?
3. Which car model in the mtcars dataset has the highest miles per gallon (mpg)?
Exercise 4:
1. Extract and display all cars from mtcars with 4 cylinders (cyl).
2. How many cars in the mtcars dataset have more than 100 horsepower (hp) and weigh less than
3,000 lbs (column wt, recorded in units of 1,000 lbs)?
3. Retrieve all car models from mtcars that have an automatic transmission and can cover more
than 20 miles per gallon.
Exercise 5:
1. How many rows and columns are present in the mtcars dataset?
2. What are the names of all the columns in the dataset?
3. Display the last 8 rows of the dataset.
Exercise 6:
1. Calculate the median horsepower (hp) of all cars in the dataset.
2. How many cars in the dataset have an automatic transmission (am column: 0 represents
automatic, 1 represents manual)?
3. Which car model has the highest miles per gallon (mpg)?
Exercise 7:
1. Extract and display all cars with 4 cylinders (cyl).
2. How many cars have more than 100 horsepower (hp) and weigh less than 3,000 lbs (column wt, recorded in units of 1,000 lbs)?
3. Retrieve all car models that have an automatic transmission and can cover more than 20 miles per
gallon.
