Data tidying with tidyr :: CHEAT SHEET

Tidy data is a way to organize tabular data in a consistent data structure across packages.
A table is tidy if:
• Each variable is in its own column.
• Each observation, or case, is in its own row.
A tidy layout lets you access variables as vectors and preserve cases in vectorized operations.

Reshape Data - Pivot data to reorganize values into a new layout.

pivot_longer(data, cols, names_to = "name", values_to = "value", values_drop_na = FALSE)
"Lengthen" data by collapsing several columns into two. Column names move to a new names_to column and values to a new values_to column.
pivot_longer(table4a, cols = 2:3, names_to = "year", values_to = "cases")
  table4a:
  country  1999  2000
  A        0.7K  2K
  B        37K   80K
  C        212K  213K
  Result:
  country  year  cases
  A        1999  0.7K
  B        1999  37K
  C        1999  212K
  A        2000  2K
  B        2000  80K
  C        2000  213K

pivot_wider(data, names_from = "name", values_from = "value")
The inverse of pivot_longer(). "Widen" data by expanding two columns into several. One column provides the new column names, the other the values.
pivot_wider(table2, names_from = type, values_from = count)
  table2:
  country  year  type   count
  A        1999  cases  0.7K
  A        1999  pop    19M
  A        2000  cases  2K
  A        2000  pop    20M
  B        1999  cases  37K
  B        1999  pop    172M
  B        2000  cases  80K
  B        2000  pop    174M
  C        1999  cases  212K
  C        1999  pop    1T
  C        2000  cases  213K
  C        2000  pop    1T
  Result:
  country  year  cases  pop
  A        1999  0.7K   19M
  A        2000  2K     20M
  B        1999  37K    172M
  B        2000  80K    174M
  C        1999  212K   1T
  C        2000  213K   1T

Expand Tables - Create new combinations of variables or identify implicit missing values (combinations of variables not present in the data).

expand(data, …) Create a new tibble with all possible combinations of the values of the variables listed in … Drop other variables.
expand(mtcars, cyl, gear, carb)
  Table x:
  x1  x2  x3
  A   1   3
  B   1   4
  B   2   3
  Expanded on x1 and x2:
  x1  x2
  A   1
  A   2
  B   1
  B   2

complete(data, …, fill = list()) Add missing possible combinations of values of variables listed in … Fill remaining variables with NA.
complete(mtcars, cyl, gear, carb)
  Table x completed on x1 and x2:
  x1  x2  x3
  A   1   3
  A   2   NA
  B   1   4
  B   2   3

Tibbles are a table format provided by the tibble package. They inherit the data frame class, but have improved behaviors:
• Subset a new tibble with ], a vector with [[ and $.
• No partial matching when subsetting columns.
• Display concise views of the data on one screen.
options(tibble.print_max = n, tibble.print_min = m, tibble.width = Inf) Control default display settings.
View() or glimpse() View the entire data set.

tibble(…) Construct by columns.
tibble(x = 1:3, y = c("a", "b", "c"))
tribble(…) Construct by rows.
tribble(~x, ~y,
  1L, "a",
  2L, "b",
  3L, "c")
Both make this tibble:
# A tibble: 3 × 2
      x y
  <int> <chr>
1     1 a
2     2 b
3     3 c

as_tibble(x, …) Convert a data frame to a tibble.
enframe(x, name = "name", value = "value") Convert a named vector to a tibble. Also deframe().
is_tibble(x) Test whether x is a tibble.

Split Cells - Use these functions to split or combine cells into individual, isolated values.

unite(data, col, …, sep = "_", remove = TRUE, na.rm = FALSE) Collapse cells across several columns into a single column.
unite(table5, century, year, col = "year", sep = "")
  table5:
  country  century  year
  A        19       99
  A        20       00
  B        19       99
  B        20       00
  Result:
  country  year
  A        1999
  A        2000
  B        1999
  B        2000

separate(data, col, into, sep = "[^[:alnum:]]+", remove = TRUE, convert = FALSE, extra = "warn", fill = "warn", …) Separate each cell in a column into several columns. Also extract().
separate(table3, rate, sep = "/", into = c("cases", "pop"))
  table3:
  country  year  rate
  A        1999  0.7K/19M
  A        2000  2K/20M
  B        1999  37K/172M
  B        2000  80K/174M
  Result:
  country  year  cases  pop
  A        1999  0.7K   19M
  A        2000  2K     20M
  B        1999  37K    172M
  B        2000  80K    174M

separate_rows(data, …, sep = "[^[:alnum:].]+", convert = FALSE) Separate each cell in a column into several rows.
separate_rows(table3, rate, sep = "/")
  Result:
  country  year  rate
  A        1999  0.7K
  A        1999  19M
  A        2000  2K
  A        2000  20M
  B        1999  37K
  B        1999  172M
  B        2000  80K
  B        2000  174M

Handle Missing Values - Drop or replace explicit missing values (NA).
  Table x:
  x1  x2
  A   1
  B   NA
  C   NA
  D   3
  E   NA

drop_na(data, …) Drop rows containing NAs in … columns.
drop_na(x, x2)
  x1  x2
  A   1
  D   3

fill(data, …, .direction = "down") Fill in NAs in … columns using the next or previous value.
fill(x, x2)
  x1  x2
  A   1
  B   1
  C   1
  D   3
  E   3

replace_na(data, replace) Specify a value to replace NA in selected columns.
replace_na(x, list(x2 = 2))
  x1  x2
  A   1
  B   2
  C   2
  D   3
  E   2
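A minimal runnable sketch of the Reshape Data and Expand Tables entries, assuming the tidyr and tibble packages are installed. The tables `wide` and `x` are hypothetical stand-ins built by hand to match the small illustrated tables; they are not the packaged tidyr data sets.

```r
library(tidyr)
library(tibble)

# Hypothetical wide table shaped like the table4a illustration:
# one row per country, one column per year.
wide <- tribble(
  ~country, ~`1999`, ~`2000`,
  "A",      "0.7K",  "2K",
  "B",      "37K",   "80K",
  "C",      "212K",  "213K"
)

# Lengthen: the column names 1999/2000 go to `year`, the cells to `cases`.
long <- pivot_longer(wide, cols = 2:3, names_to = "year", values_to = "cases")

# Widen again: pivot_wider() undoes the pivot_longer() call above.
wide_again <- pivot_wider(long, names_from = year, values_from = cases)

# Hypothetical table x from the Expand Tables illustration;
# the combination (A, 2) is implicitly missing.
x <- tribble(
  ~x1, ~x2, ~x3,
  "A", 1,   3,
  "B", 1,   4,
  "B", 2,   3
)

combos    <- expand(x, x1, x2)    # every x1/x2 combination; x3 is dropped
completed <- complete(x, x1, x2)  # adds the (A, 2) row with x3 = NA

print(long)
print(completed)
```

pivot_longer() returns rows country by country (A 1999, A 2000, B 1999, ...), while the illustration stacks them year by year; the set of rows is the same.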

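The Split Cells and Handle Missing Values entries can be checked the same way. This sketch assumes tidyr and tibble are installed. `rates` and `x` are hypothetical hand-built tables shaped like the table3 and x illustrations. The example splits `rate` apart, glues it back with unite(), and then applies each missing-value helper to `x`.

```r
library(tidyr)
library(tibble)

# Hypothetical table shaped like table3: `rate` packs cases and population.
rates <- tribble(
  ~country, ~year, ~rate,
  "A",      1999,  "0.7K/19M",
  "A",      2000,  "2K/20M",
  "B",      1999,  "37K/172M",
  "B",      2000,  "80K/174M"
)

# One column -> two columns, split at "/".
wide_cells <- separate(rates, rate, into = c("cases", "pop"), sep = "/")

# One cell -> two rows, split at "/".
long_cells <- separate_rows(rates, rate, sep = "/")

# unite() glues the two parts back into a single `rate` column.
rejoined <- unite(wide_cells, col = "rate", cases, pop, sep = "/")

# Hypothetical table x from the Handle Missing Values illustration.
x <- tribble(
  ~x1, ~x2,
  "A", 1,
  "B", NA,
  "C", NA,
  "D", 3,
  "E", NA
)

dropped  <- drop_na(x, x2)               # keeps only rows A and D
filled   <- fill(x, x2)                  # carries the last value down
replaced <- replace_na(x, list(x2 = 2))  # every NA in x2 becomes 2
```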
RStudio® is a trademark of RStudio, PBC • CC BY SA RStudio • • 844-448-1212 • • Learn more at • tibble 3.1.2 • tidyr 1.1.3 • Updated: 2021–08


Nested Data
A nested data frame stores individual tables as a list-column of data frames within a larger organizing data frame. List-columns can also be lists of vectors or lists of varying data types.
Use a nested data frame to:
• Preserve relationships between observations and subsets of data. Preserve the type of the variables being nested (factors and datetimes aren't coerced to character).
• Manipulate many sub-tables at once with purrr functions like map(), map2(), or pmap() or with dplyr rowwise() grouping.
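A minimal sketch of working on many sub-tables at once, assuming dplyr, tidyr, tibble, and purrr are installed. `obs` is a hypothetical five-row stand-in for dplyr::storms, so the example runs without the full data set. It nests the rows by storm, counts each sub-table with purrr::map_int(), and flattens the result again with unnest().

```r
library(dplyr)
library(tidyr)
library(tibble)
library(purrr)

# Hypothetical stand-in for storms: several observations per storm name.
obs <- tibble(
  name = c("Amy", "Amy", "Bob", "Bob", "Bob"),
  lat  = c(27.5, 28.5, 22.0, 22.5, 23.0)
)

# One row per storm; each `data` cell is a tibble of that storm's rows.
nested <- obs %>%
  group_by(name) %>%
  nest() %>%
  ungroup()

# map_int() applies nrow() to every sub-table and returns one integer each.
nested <- nested %>% mutate(n_obs = map_int(data, nrow))

# unnest() flattens the sub-tables back into regular columns.
flat <- nested %>% select(-n_obs) %>% unnest(data)
```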


nest(data, …) Moves groups of cells into a list-column of a data frame. Use alone or with dplyr::group_by():
1. Group the data frame with group_by() and use nest() to move the groups into a list-column.
n_storms <- storms %>%
  group_by(name) %>%
  nest()
2. Use nest(new_col = c(x, y)) to specify the columns to group using dplyr::select() syntax.
n_storms <- storms %>%
  nest(data = c(year:long))
  Flat table:
  name  yr    lat   long
  Amy   1975  27.5  -79.0
  Amy   1975  28.5  -79.0
  Amy   1975  29.5  -79.0
  Bob   1979  22.0  -96.0
  Bob   1979  22.5  -95.3
  Bob   1979  23.0  -94.6
  Zeta  2005  23.9  -35.6
  Zeta  2005  24.2  -36.1
  Zeta  2005  24.7  -36.6
  Nested data frame (each "cell" of data holds that storm's yr, lat and long rows):
  name  data
  Amy   <tibble [50x3]>
  Bob   <tibble [50x3]>
  Zeta  <tibble [50x3]>

Index list-columns with [[]]. n_storms$data[[1]]

CREATE TIBBLES WITH LIST-COLUMNS
tibble::tribble(…) Makes list-columns when needed.
tribble( ~max, ~seq,
  3, 1:3,
  4, 1:4,
  5, 1:5)
  max  seq
  3    <int [3]>
  4    <int [4]>
  5    <int [5]>
tibble::tibble(…) Saves list input as list-columns.
tibble(max = c(3, 4, 5), seq = list(1:3, 1:4, 1:5))
tibble::enframe(x, name = "name", value = "value") Converts multi-level list to a tibble with list-cols.
enframe(list('3' = 1:3, '4' = 1:4, '5' = 1:5), 'max', 'seq')

dplyr::mutate(), transmute(), and summarise() will output list-columns if they return a list.
mtcars %>%
  group_by(cyl) %>%
  summarise(q = list(quantile(mpg)))

unnest(data, cols, ..., keep_empty = FALSE) Flatten nested columns back to regular columns. The inverse of nest().
n_storms %>% unnest(data)

unnest_longer(data, col, values_to = NULL, indices_to = NULL) Turn each element of a list-column into a row.
starwars %>%
  select(name, films) %>%
  unnest_longer(films)
  name   films
  Luke   <chr [5]>
  C-3PO  <chr [6]>
  R2-D2  <chr [7]>
  Result (first three films per character):
  name   films
  Luke   The Empire Strik…
  Luke   Revenge of the S…
  Luke   Return of the Jed…
  C-3PO  The Empire Strik…
  C-3PO  Attack of the Cl…
  C-3PO  The Phantom M…
  R2-D2  The Empire Strik…
  R2-D2  Attack of the Cl…
  R2-D2  The Phantom M…

unnest_wider(data, col) Turn each element of a list-column into a regular column.
starwars %>%
  select(name, films) %>%
  unnest_wider(films)
  Result (first three film columns):
  name   ..1            ..2            ..3
  Luke   The Empire...  Revenge of...  Return of...
  C-3PO  The Empire...  Attack of...   The Phantom...
  R2-D2  The Empire...  Attack of...   The Phantom...

hoist(.data, .col, ..., .remove = TRUE) Selectively pull list components out into their own top-level columns. Uses purrr::pluck() syntax for selecting from lists.
starwars %>%
  select(name, films) %>%
  hoist(films, first_film = 1, second_film = 2)
  name   first_film   second_film  films
  Luke   The Empire…  Revenge of…  <chr [3]>
  C-3PO  The Empire…  Attack of…   <chr [4]>
  R2-D2  The Empire…  Attack of…   <chr [5]>

A vectorized function takes a vector, transforms each element in parallel, and returns a vector of the same length. By themselves, vectorized functions cannot work with lists such as list-columns.

dplyr::rowwise(.data, …) Group data so that each row is one group, and within the groups, elements of list-columns appear directly (accessed with [[ ), not as lists of length one. When you use rowwise(), dplyr functions will seem to apply functions to list-columns in a vectorized fashion.
[Figure: rowwise() hands each <tibble [50x4]> cell of the data column to fun( <tibble [50x4]>, …) separately, giving result 1, result 2, result 3.]

Apply a function to a list-column and create a new list-column.
n_storms %>%
  rowwise() %>%
  mutate(n = list(dim(data)))
dim() returns two values per row, so wrap it with list() to tell mutate() to create a list-column.

Apply a function to a list-column and create a regular column.
n_storms %>%
  rowwise() %>%
  mutate(n = nrow(data))
nrow() returns one integer per row.

Collapse multiple list-columns into a single list-column.
starwars %>%
  rowwise() %>%
  mutate(transport = list(append(vehicles, starships)))
append() returns a vector of varying length for each row, so wrap it with list() to make the new column a list-column.

Apply a function to multiple list-columns.
starwars %>%
  rowwise() %>%
  mutate(n_transports = length(c(vehicles, starships)))
length() returns one integer per row.
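A minimal sketch of the two rowwise() patterns for multiple list-columns, assuming dplyr and tibble are installed. `crew` is a hypothetical two-row stand-in for dplyr::starwars, with placeholder vehicle and starship names. It shows when the result must be wrapped in list() and when it can stay a regular column.

```r
library(dplyr)
library(tibble)

# Hypothetical stand-in for starwars: two list-columns of character vectors.
crew <- tibble(
  name      = c("Luke", "C-3PO"),
  vehicles  = list(c("speeder", "bike"), character(0)),
  starships = list(c("X-wing", "shuttle"), character(0))
)

out <- crew %>%
  rowwise() %>%
  mutate(
    # append() gives a vector of varying length per row, so wrap it in list()
    transport    = list(append(vehicles, starships)),
    # length() gives exactly one integer per row, so no list() is needed
    n_transports = length(c(vehicles, starships))
  ) %>%
  ungroup()
```

Inside rowwise(), `vehicles` and `starships` are the plain character vectors for the current row, which is why length() counts their elements rather than returning 1.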
See purrr package for more list functions.
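A minimal sketch of reshaping a list-column with unnest_longer(), unnest_wider() and hoist(), assuming tidyr and tibble are installed. `people` is a hypothetical stand-in for starwars %>% select(name, films), with placeholder film titles. The sketch passes names_sep to unnest_wider() because the elements are unnamed. Newer tidyr releases are assumed to require this for unnamed elements, whereas the tidyr 1.1.3 output shown above generated ..1, ..2 names automatically.

```r
library(tidyr)
library(tibble)

# Hypothetical stand-in for starwars %>% select(name, films).
people <- tibble(
  name  = c("Luke", "C-3PO"),
  films = list(c("F1", "F2", "F3"), c("F1", "F4"))
)

# One row per list element (5 rows in total).
longer <- unnest_longer(people, films)

# One column per position (films_1, films_2, films_3 in current tidyr);
# C-3PO has only two films, so its third column is NA.
wider <- unnest_wider(people, films, names_sep = "_")

# Pull positions 1 and 2 into their own columns; leftovers stay in `films`.
hoisted <- hoist(people, films, first_film = 1, second_film = 2)
```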

