Data Frame
AUTHOR
Dr. Mohammad Nasir Abdullah
Data Frames
Data sets frequently consist of more than one column of data, where each column represents
measurements of a single variable. Each row usually represents a single observation. This format is
referred to as case-by-variable format.
Most data sets are stored in R as data frames. These are like matrices, but the columns are always
named and need not all hold the same type of data.
A data frame is one of the most commonly used data structures in R, especially for data analysis and
statistical modelling. Conceptually, it can be thought of as a table or a spreadsheet, where you have
rows representing observations and columns representing variables. A data frame is similar to a
matrix, but with the added flexibility that different columns can contain different types of data
(e.g. numeric, character, factor).
Features:
1. Mixed Data Types: Unlike matrices, data frames can store different classes of objects in each
column.
2. Column Names: Columns in a data frame have names, which makes accessing and manipulating
data easier and more intuitive.
3. Row Names: By default, rows have index names (from 1 to the number of rows), but these can also
be explicitly set to other values.
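The three features above can be seen side by side in a short sketch. The object names m and df and all values are illustrative assumptions, not taken from the original handout:

```r
# Hypothetical values, chosen only to show the contrast with a matrix.
m  <- cbind(c(1, 2), c("a", "b"))                 # matrix: every cell coerced to character
df <- data.frame(x = c(1, 2), y = c("a", "b"))    # data frame: each column keeps its own type

typeof(m)            # "character"
sapply(df, class)    # x is "numeric"; y is "character" (a factor on R versions before 4.0)

colnames(df)                            # column names: "x" "y"
rownames(df)                            # default row names: "1" "2"
rownames(df) <- c("first", "second")    # explicit row names
df["second", "x"]                       # look up a cell by row and column name
```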
Creation:
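A data frame is usually built with data.frame(), passing one vector per column. The following is a minimal sketch; the object name df, the column names and every value are invented for illustration:

```r
# Hypothetical sample data: names and values are assumptions, not the original example.
df <- data.frame(
  Name  = c("Ali", "Siti", "Chong"),
  Age   = c(21, 22, 20),
  Score = c(82, 93, 92),
  stringsAsFactors = FALSE   # keeps Name as character on R versions before 4.0
)

print(df)    # show the whole table
class(df)    # "data.frame"
```

All column vectors must have the same length (or a length that recycles evenly), otherwise data.frame() stops with an error.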
Indexing:
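1. Columns: You can access a column in a data frame using the $ operator or double square brackets
[[…]]. Either form returns the column as a vector, for example:
[1] 82 93 92
2. Rows: Rows can be accessed using single square brackets with the row index before a comma,
[i, ]; leaving the position after the comma empty keeps all columns.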
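Both kinds of indexing can be tried on a small table. This is a sketch only; the data frame and its column names are hypothetical:

```r
# Hypothetical data frame used only to demonstrate indexing.
df <- data.frame(
  Name  = c("Ali", "Siti", "Chong"),
  Score = c(82, 93, 92)
)

scores_dollar  <- df$Score               # column as a vector via $
scores_bracket <- df[["Score"]]          # the same column via [[ ]]
first_row      <- df[1, ]                # first row, all columns (still a data frame)
top_rows       <- df[df$Score > 90, ]    # rows selected by a logical condition

print(scores_dollar)
print(first_row)
print(top_rows)
```

Note that df[1] without a comma selects the first column, not the first row.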
Useful functions
1. head() and tail(): Display the first or last part of a data frame.
2. str(): Provides the structure of a data frame, showing the data type of each column and the first
few entries.
3. summary(): Gives a statistical summary of all columns in a data frame.
4. dim(): Returns the dimensions (number of rows and columns) of a data frame.
5. rownames() and colnames(): Get or set the row or column names of a data frame.
6. merge(): Merges two data frames by common columns or row names.
Examples
1) head() and tail()
These functions display the first or last part of a data frame, respectively. By default, they show six
rows.
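A short sketch using the built-in mtcars data set; the variable names first_six and last_three are illustrative:

```r
# mtcars ships with R (datasets package), so no setup is needed.
first_six  <- head(mtcars)           # first 6 rows by default
last_three <- tail(mtcars, n = 3)    # n overrides the default of 6

print(first_six)
print(last_three)
```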
2) str()
This function provides a concise display of the structure of an object, such as a data frame.
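As a sketch, str() can be applied to the built-in mtcars data set; the helper variable col_classes is an illustrative addition that collects the same type information programmatically:

```r
# One line of output per column: name, type and the first few values.
str(mtcars)

# The class of every column, gathered into a named character vector.
col_classes <- sapply(mtcars, class)
print(col_classes)
```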
3) summary()
# Get a summary of df
summary(df)
4) dim()
[1] 7 3
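dim() reports rows first and columns second, while rownames() and colnames() (item 5 in the list of useful functions) read or set the labels. A sketch on a hypothetical data frame with 7 rows and 3 columns, where every name and value is invented:

```r
# Hypothetical 7-row, 3-column data frame.
df <- data.frame(
  ID    = 1:7,
  Group = c("A", "B", "A", "C", "B", "A", "C"),
  Value = c(5.1, 4.8, 6.0, 5.5, 4.9, 6.2, 5.7)
)

dim(df)         # number of rows, then number of columns
colnames(df)    # current column names
rownames(df)    # default row names "1" to "7"

# Both functions can also appear on the left-hand side to rename.
colnames(df)[3] <- "Measurement"
rownames(df)    <- paste0("obs", 1:7)
print(df)
```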
6) merge()
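A minimal sketch of merge() on two hypothetical tables that share an ID column; the table names roster and marks and all values are assumptions made for illustration:

```r
# Two hypothetical tables sharing an "ID" key column.
roster <- data.frame(ID = c(1, 2, 3), Name  = c("Ali", "Siti", "Chong"))
marks  <- data.frame(ID = c(2, 3, 4), Score = c(93, 92, 75))

inner <- merge(roster, marks, by = "ID")                # only IDs present in both tables
outer <- merge(roster, marks, by = "ID", all = TRUE)    # every ID, NA where a value is missing

print(inner)
print(outer)
```

The arguments all.x = TRUE and all.y = TRUE keep unmatched rows from only the first or only the second table, respectively.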
The same functions applied to the built-in mtcars data set:
head(mtcars)
str(mtcars)
summary(mtcars)
dim(mtcars)
[1] 32 11
colnames(mtcars)
[1] "mpg" "cyl" "disp" "hp" "drat" "wt" "qsec" "vs" "am" "gear"
[11] "carb"
7. Subsetting Example
Extracting data for cars with 6 cylinders and horsepower (hp) greater than 150:
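One way to write this filter is with a logical condition inside single square brackets. The variable names six_cyl_strong and same_result are illustrative:

```r
# Keep rows where both conditions hold; the empty slot after the comma keeps all columns.
six_cyl_strong <- mtcars[mtcars$cyl == 6 & mtcars$hp > 150, ]
print(six_cyl_strong)

# subset() expresses the same filter without repeating the data frame name.
same_result <- subset(mtcars, cyl == 6 & hp > 150)
print(same_result)
```

In the copy of mtcars shipped with R, only the Ferrari Dino row satisfies both conditions.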
Exercise
Exercise 1:
1. Create a data frame named students with the following columns: Name, Age, Grade, and Subject.
Populate it with at least 5 rows of sample data.
2. Display the structure of the students data frame using the str() function.
3. Add a new column to the students data frame named Attendance and populate it with sample
data.
Exercise 2:
1. From the mtcars dataset, extract the mpg (miles per gallon) and hp (horsepower) columns and
save them as a new data frame named car_specs .
Exercise 3:
2. How many cars in the dataset have an automatic transmission (am column: 0 represents
automatic, 1 represents manual)?
3. Which car model in the mtcars dataset has the highest miles per gallon (mpg)?
Exercise 4:
1. Extract and display all cars from mtcars with 4 cylinders (cyl).
2. How many cars in the mtcars dataset have more than 100 horsepower (hp) and weigh (column
wt, recorded in units of 1,000 lbs) less than 3,000 lbs?
3. Retrieve all car models from mtcars that have an automatic transmission and can cover more
than 20 miles per gallon.
Exercise 5:
1. How many rows and columns are present in the mtcars dataset?
Exercise 6:
2. How many cars in the dataset have an automatic transmission (am column: 0 represents
automatic, 1 represents manual)?
3. Which car model has the highest miles per gallon (mpg)?
Exercise 7:
2. How many cars have more than 100 horsepower (hp) and weigh (column wt, recorded in units of
1,000 lbs) less than 3,000 lbs?
3. Retrieve all car models that have an automatic transmission and can cover more than 20 miles per
gallon.