10/26/23, 10:54 AM Data Frame

Data Frame
AUTHOR
Dr. Mohammad Nasir Abdullah

Data Frames

Data sets frequently consist of more than one column of data, where each column represents
measurements of a single variable. Each row usually represents a single observation. This format is
referred to as case-by-variable format.

Most data sets are stored in R as data frames. These are like matrices, but with the columns having
their own names.

A data frame is one of the most commonly used data structures in R, especially for data analysis and
statistical modelling. Conceptually, it can be thought of as a table or a spreadsheet, where you have
rows representing observations and columns representing variables. A data frame is similar to a
matrix, but with the added flexibility that different columns can contain different types of data (e.g.,
numeric, character, factor).

Features:

1. Mixed Data Types: Unlike matrices, data frames can store different classes of objects in each
column.
2. Column Names: Columns in a data frame can have names, which makes accessing and manipulating
data easier and more intuitive.
3. Row Names: By default, rows have index names (from 1 to the number of rows), but these can also
be explicitly set to other values.
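
The contrast with matrices in the first feature can be sketched as follows. This is a minimal illustration, assuming R 4.0 or later (where character columns are not converted to factors by default); the object names m and people are made up for the example.

```r
# A matrix holds a single type, so mixing text and numbers
# coerces every element to character.
m <- cbind(Name = c("Ali", "Abu"), Age = c(9, 6))
class(m[, "Age"])

# A data frame keeps one type per column.
people <- data.frame(Name = c("Ali", "Abu"), Age = c(9, 6))
col_classes <- sapply(people, class)
col_classes
```

Here the Age column of the matrix becomes character, while the Age column of the data frame stays numeric.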

Creation:

A data frame can be created using the data.frame() function:

df <- data.frame(Name = c("Ali", "Abu", "Ahmad"),
                 Age = c(9, 6, 2),
                 Score = c(82, 93, 92))
df

   Name Age Score
1   Ali   9    82
2   Abu   6    93
3 Ahmad   2    92

Indexing:

1. Columns: You can access a column in a data frame using the $ operator or double square brackets
[[…]].

# Extract the Name column from df
df$Name

https://sta334.s3.ap-southeast-1.amazonaws.com/day3/Data+Frame.html



[1] "Ali" "Abu" "Ahmad"

# Extract the Score column from df
df[["Score"]]

[1] 82 93 92

2. Rows: Rows can be accessed using single square brackets […], with the row index placed before the comma.

# Extract the first row
df[1, ]

  Name Age Score
1  Ali   9    82

# Extract the third row
df[3, ]

   Name Age Score
3 Ahmad   2    92
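
The difference between the bracket forms used above can be sketched with a small example. The data frame is the one created earlier; stringsAsFactors = FALSE is written out only so the result does not depend on the R version, and the variable names are illustrative.

```r
df <- data.frame(Name = c("Ali", "Abu", "Ahmad"),
                 Age = c(9, 6, 2),
                 Score = c(82, 93, 92),
                 stringsAsFactors = FALSE)

score_vec <- df[["Score"]]              # double brackets: a plain numeric vector
score_df  <- df["Score"]                # single brackets: a one-column data frame
cell      <- df[2, "Name"]              # row 2, column Name: a single value
two_cols  <- df[, c("Name", "Score")]   # all rows, two columns: a data frame
```

Single brackets keep the data frame structure when one column name is given on its own, whereas $ and [[…]] return the column's contents.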

3. Subsetting: You can subset a data frame by placing a logical condition in the row position.

# Extract the rows where Score is greater than 90
df[df$Score > 90, ]

   Name Age Score
2   Abu   6    93
3 Ahmad   2    92
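
Conditions can also be combined. The sketch below reuses the same small data frame and shows two ways to write the same filter; subset() is a base R function, and the threshold values are chosen only for illustration.

```r
df <- data.frame(Name = c("Ali", "Abu", "Ahmad"),
                 Age = c(9, 6, 2),
                 Score = c(82, 93, 92),
                 stringsAsFactors = FALSE)

# Combine conditions with & (and) or | (or)
high_young <- df[df$Score > 90 & df$Age < 5, ]

# subset() evaluates the condition inside the data frame,
# so the df$ prefix is not needed
same_rows <- subset(df, Score > 90 & Age < 5)

high_young$Name
```

Only Ahmad satisfies both conditions, so each approach returns a single row.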

Useful functions
1. head() and tail(): Display the first or last part of a data frame.
2. str(): Provides the structure of a data frame, showing the data type of each column and the first
few entries.
3. summary(): Gives a statistical summary of all columns in a data frame.
4. dim(): Returns the dimensions (number of rows and columns) of a data frame.
5. rownames() and colnames(): Get or set the row or column names of a data frame.
6. merge(): Merges two data frames by common columns or row names.

Examples
1) head() and tail()

These functions display the first or last part of a data frame, respectively. By default, they show six
rows.

# Create a sample data frame
df <- data.frame(Name = c("Ali", "Abu", "Ahmad", "Aminah", "Rosnah", "Rozanae", "Rohana"),
                 Age = c(25, 32, 29, 24, 27, 31, 23),
                 Score = c(85, 90, 93, 87, 78, 91, 82))

# Display the first few rows
head(df)

     Name Age Score
1     Ali  25    85
2     Abu  32    90
3   Ahmad  29    93
4  Aminah  24    87
5  Rosnah  27    78
6 Rozanae  31    91

# Display the last few rows
tail(df)

     Name Age Score
2     Abu  32    90
3   Ahmad  29    93
4  Aminah  24    87
5  Rosnah  27    78
6 Rozanae  31    91
7  Rohana  23    82
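
The default of six rows can be changed with the n argument of head() and tail(). A short sketch using the same sample data; the result names are illustrative.

```r
df <- data.frame(Name = c("Ali", "Abu", "Ahmad", "Aminah", "Rosnah", "Rozanae", "Rohana"),
                 Age = c(25, 32, 29, 24, 27, 31, 23),
                 Score = c(85, 90, 93, 87, 78, 91, 82),
                 stringsAsFactors = FALSE)

first_three  <- head(df, n = 3)    # first 3 rows
last_two     <- tail(df, n = 2)    # last 2 rows
all_but_last <- head(df, n = -1)   # negative n drops rows from the end
```

A negative n means "everything except that many rows", so head(df, n = -1) returns six of the seven rows.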

2) str()

This function provides a concise display of the structure of an object, such as a data frame.

# Display the structure of df
str(df)

'data.frame': 7 obs. of 3 variables:
 $ Name : chr "Ali" "Abu" "Ahmad" "Aminah" ...
 $ Age  : num 25 32 29 24 27 31 23
 $ Score: num 85 90 93 87 78 91 82

3) summary()

Gives a statistical summary of all columns in a data frame.

# Get a summary of df
summary(df)

     Name                Age            Score
 Length:7           Min.   :23.00   Min.   :78.00
 Class :character   1st Qu.:24.50   1st Qu.:83.50
 Mode  :character   Median :27.00   Median :87.00
                    Mean   :27.29   Mean   :86.57
                    3rd Qu.:30.00   3rd Qu.:90.50
                    Max.   :32.00   Max.   :93.00
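
The figures in the Score column of this summary can be reproduced one at a time, which may help when only a single statistic is needed. A minimal sketch using the same seven scores; the variable names are illustrative.

```r
scores <- c(85, 90, 93, 87, 78, 91, 82)

score_summary <- summary(scores)         # same six statistics as above
score_median  <- score_summary[["Median"]]

mean(scores)                             # arithmetic mean
median(scores)                           # middle value
quantile(scores, probs = c(0.25, 0.75))  # first and third quartiles
```

Calling summary() on a single numeric vector returns the same six statistics that summary(df) reports for each numeric column.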

4) dim()

Returns the dimensions of an object.


# Get the dimensions of df (number of rows and columns)
dim(df)

[1] 7 3
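
When only one of the two dimensions is needed, the base R helpers nrow() and ncol() return them individually. A brief sketch with the same sample data:

```r
df <- data.frame(Name = c("Ali", "Abu", "Ahmad", "Aminah", "Rosnah", "Rozanae", "Rohana"),
                 Age = c(25, 32, 29, 24, 27, 31, 23),
                 Score = c(85, 90, 93, 87, 78, 91, 82))

n_rows <- nrow(df)   # same as dim(df)[1]
n_cols <- ncol(df)   # same as dim(df)[2]
```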

5) rownames() and colnames()

Retrieve or set the row or column names of a data frame.

# Get row names of df
rownames(df)

[1] "1" "2" "3" "4" "5" "6" "7"

# Get column names of df
colnames(df)

[1] "Name" "Age" "Score"

# Set new row names for df
rownames(df) <- c("A", "B", "C", "D", "E", "F", "G")
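
Once row names are set, they can be used in place of row numbers when indexing. A small sketch, using a three-row version of the data so it stays short; the labels A, B, and C are arbitrary.

```r
df <- data.frame(Name = c("Ali", "Abu", "Ahmad"),
                 Age = c(25, 32, 29),
                 Score = c(85, 90, 93),
                 stringsAsFactors = FALSE)
rownames(df) <- c("A", "B", "C")

row_b      <- df["B", ]                 # look up a row by its label
scores_a_c <- df[c("A", "C"), "Score"]  # scores for rows A and C
```

Row labels must be unique, which is why they work as lookup keys here.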

6) merge()

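Merge two data frames by common columns or row names. By default, only rows whose key appears in both data frames are kept, and the result is sorted by the key column.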

# Create another sample data frame
df2 <- data.frame(Name = c("Ali", "Abu", "Rosnah", "Rohana"),
                  Grade = c("A", "B", "A", "C"))

# Merge df and df2 by the "Name" column
merged_df <- merge(df, df2, by = "Name")
print(merged_df)

    Name Age Score Grade
1    Abu  32    90     B
2    Ali  25    85     A
3 Rohana  23    82     C
4 Rosnah  27    78     A
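
The merge above drops every name that appears in only one of the two data frames. The all, all.x, and all.y arguments of merge() control this. The sketch below uses two smaller, made-up data frames (scores and grades) so the effect is easy to see.

```r
scores <- data.frame(Name = c("Ali", "Abu", "Ahmad"),
                     Score = c(85, 90, 93),
                     stringsAsFactors = FALSE)
grades <- data.frame(Name = c("Ali", "Abu", "Rohana"),
                     Grade = c("A", "B", "C"),
                     stringsAsFactors = FALSE)

inner <- merge(scores, grades, by = "Name")                # names present in both
left  <- merge(scores, grades, by = "Name", all.x = TRUE)  # every row of scores
full  <- merge(scores, grades, by = "Name", all = TRUE)    # every name from either
```

Rows without a match are filled with NA, so Ahmad has a missing Grade in left and full, and Rohana has a missing Score in full.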

Let’s use the mtcars data set, which ships with R.

This dataset comprises various specifications and details about different car models from the 1970s.

1. Quick Glance at the Dataset

First, let’s take a quick look at the mtcars dataset:

head(mtcars)


                   mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

2. Structure of the Dataset (str())

Examining the structure of mtcars :

str(mtcars)

'data.frame': 32 obs. of 11 variables:
$ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
$ cyl : num 6 6 4 6 8 6 8 4 4 6 ...
$ disp: num 160 160 108 258 360 ...
$ hp : num 110 110 93 110 175 105 245 62 95 123 ...
$ drat: num 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
$ wt : num 2.62 2.88 2.32 3.21 3.44 ...
$ qsec: num 16.5 17 18.6 19.4 17 ...
$ vs : num 0 0 1 1 0 1 0 1 1 1 ...
$ am : num 1 1 1 0 0 0 0 0 0 0 ...
$ gear: num 4 4 4 3 3 3 3 4 4 4 ...
$ carb: num 4 4 1 1 2 1 4 2 2 4 ...

3. Summary of the Dataset ( summary() )

Providing a statistical summary:

summary(mtcars)

mpg cyl disp hp
Min. :10.40 Min. :4.000 Min. : 71.1 Min. : 52.0
1st Qu.:15.43 1st Qu.:4.000 1st Qu.:120.8 1st Qu.: 96.5
Median :19.20 Median :6.000 Median :196.3 Median :123.0
Mean :20.09 Mean :6.188 Mean :230.7 Mean :146.7
3rd Qu.:22.80 3rd Qu.:8.000 3rd Qu.:326.0 3rd Qu.:180.0
Max. :33.90 Max. :8.000 Max. :472.0 Max. :335.0
drat wt qsec vs
Min. :2.760 Min. :1.513 Min. :14.50 Min. :0.0000
1st Qu.:3.080 1st Qu.:2.581 1st Qu.:16.89 1st Qu.:0.0000
Median :3.695 Median :3.325 Median :17.71 Median :0.0000
Mean :3.597 Mean :3.217 Mean :17.85 Mean :0.4375
3rd Qu.:3.920 3rd Qu.:3.610 3rd Qu.:18.90 3rd Qu.:1.0000
Max. :4.930 Max. :5.424 Max. :22.90 Max. :1.0000
am gear carb
Min. :0.0000 Min. :3.000 Min. :1.000
1st Qu.:0.0000 1st Qu.:3.000 1st Qu.:2.000
Median :0.0000 Median :4.000 Median :2.000
Mean :0.4062 Mean :3.688 Mean :2.812
3rd Qu.:1.0000 3rd Qu.:4.000 3rd Qu.:4.000
Max. :1.0000 Max. :5.000 Max. :8.000


4. Dimensions of the Dataset ( dim() )

Checking the number of rows and columns:

dim(mtcars)

[1] 32 11

5. Column Names ( colnames() )

Retrieving the names of the columns:

colnames(mtcars) #same as names(mtcars)

[1] "mpg" "cyl" "disp" "hp" "drat" "wt" "qsec" "vs" "am" "gear"
[11] "carb"

6. Subsetting Example

Extracting data for cars with 6 cylinders and horsepower (hp) greater than 150:

mtcars[mtcars$cyl == 6 & mtcars$hp > 150, ]

              mpg cyl disp  hp drat   wt qsec vs am gear carb
Ferrari Dino 19.7   6  145 175 3.62 2.77 15.5  0  1    5    6
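
The same filter can also return just a few columns. This is one possible sketch using base R's subset() with its select argument, alongside the equivalent bracket form; the result names are illustrative.

```r
# Keep only mpg and hp for 6-cylinder cars with more than 150 hp
strong_six <- subset(mtcars, cyl == 6 & hp > 150, select = c(mpg, hp))

# Equivalent bracket form: condition in the row position, names in the column position
strong_six_alt <- mtcars[mtcars$cyl == 6 & mtcars$hp > 150, c("mpg", "hp")]

rownames(strong_six)
```

Both forms keep the car model as the row name, so the result still identifies the Ferrari Dino.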

Exercise

Exercise 1:

1. Create a data frame named students with the following columns: Name, Age, Grade, and Subject.
Populate it with at least 5 rows of sample data.

2. Display the structure of the students data frame using the str() function.

3. Add a new column to the students data frame named Attendance and populate it with sample
data.

Exercise 2:

1. From the mtcars dataset, extract the mpg (miles per gallon) and hp (horsepower) columns and
save them as a new data frame named car_specs.

2. Retrieve the first 6 rows of the car_specs data frame.

3. Create a subset of mtcars containing only cars with 6 cylinders (cyl).

Exercise 3:

1. Calculate the median horsepower (hp) of all cars in the mtcars dataset.


2. How many cars in the dataset have an automatic transmission (am column: 0 represents
automatic, 1 represents manual)?

3. Which car model in the mtcars dataset has the highest miles per gallon (mpg)?

Exercise 4:

1. Extract and display all cars from mtcars with 4 cylinders (cyl).

2. How many cars in the mtcars dataset have more than 100 horsepower (hp) and weigh (column
wt, recorded in units of 1,000 lbs) less than 3,000 lbs?

3. Retrieve all car models from mtcars that have an automatic transmission and can cover more
than 20 miles per gallon.

Exercise 5:

1. How many rows and columns are present in the mtcars dataset?

2. What are the names of all the columns in the dataset?

3. Display the last 8 rows of the dataset.

Exercise 6:

1. Calculate the median horsepower (hp) of all cars in the dataset.

2. How many cars in the dataset have an automatic transmission (am column: 0 represents
automatic, 1 represents manual)?

3. Which car model has the highest miles per gallon (mpg)?

Exercise 7:

1. Extract and display all cars with 4 cylinders (cyl).

2. How many cars have more than 100 horsepower (hp) and weigh (column wt, recorded in units of 1,000 lbs) less than 3,000 lbs?

3. Retrieve all car models that have an automatic transmission and can cover more than 20 miles per
gallon.
