Professional Documents
Culture Documents
Data Import::: Cheat Sheet
Data Import::: Cheat Sheet
Tibbles - an enhanced data frame Tidy Data with tidyr Split Cells
Tidy data is a way to organize tabular data. It provides a consistent data structure across packages.
The tibble package provides a new Use these functions to
A table is tidy if: Tidy data:
S3 class for storing tabular data, the A * B -> C split or combine cells
tibble. Tibbles inherit the data frame A B C A B C A B C A * B C into individual, isolated
class, but improve three behaviors: values.
• Subsetting - [ always returns a new tibble,
[[ and $ always return a vector.
& separate(data, col, into, sep = "[^[:alnum:]]
Each variable is in Each observation, or Makes variables easy Preserves cases during +", remove = TRUE, convert = FALSE,
• No partial matching - You must use full its own column case, is in its own row to access as vectors vectorized operations extra = "warn", fill = "warn", ...)
column names when subsetting
Separate each cell in a column to make
• Display - When you print a tibble, R provides a
concise view of the
Reshape Data - change the layout of values in a table several columns.
table3
data that fits on Use pivot_longer() and pivot_wider() to reorganize the values of a table into a new layout.
# A tibble: 234 × 6 country year rate country year cases pop
manufacturer model displ
one screen 1
<chr>
audi
<chr> <dbl>
a4 1.8 pivot_longer(data, cols, names_to = "name", pivot_wider(data, id_cols = NULL, names_from = name, A 1999 0.7K/19M A 1999 0.7K 19M
2 audi a4 1.8
3 audi a4 2.0 A 2000 2K/20M A 2000 2K 20M
4
5
audi
audi
a4
a4
2.0
2.8
names_prefix = NULL, names_sep = NULL, names_prefix = "", names_sep = "_", names_glue = NULL,
B 1999 37K/172M B 1999 37K 172
6
7
audi
audi
a4
a4
2.8
3.1 names_pattern = NULL, names_ptypes = list(), names_sort = FALSE, names_repair = "check_unique", B 2000 80K/174M B 2000 80K 174
names_transform = list(), names_repair = values_from = value, values_fill = NULL, values_fn = NULL, ...)
8 audi a4 quattro 1.8
w
w
9 audi a4 quattro 1.8
10 audi a4 quattro 2.0 C 1999 212K/1T C 1999 212K 1T
# ... with 224 more rows, and 3
# more variables: year <int>,
"check_unique", values_to = "value", values_drop_na = C 2000 213K/1T C 2000 213K 1T
# cyl <int>, trans <chr>
FALSE, values_ptypes = list(), values_transform = pivot_wider() pivots a names_from and a
list(), ...) values_from column into a rectangular field of separate(table3, rate, sep = "/",
tibble display cells. into = c("cases", "pop"))
pivot_longer() pivots cols columns, moving table2
156 1999 6 auto(l4)
column names into a names_to column, and
separate_rows(data, ..., sep = "[^[:alnum:].]
157 1999 6 auto(l4) country year type count country year cases pop
158 2008 6 auto(l4)
159 2008
160 1999
8 auto(s4)
4 manual(m5) column values into a values_to column. A 1999 cases 0.7K A 1999 0.7K 19M
161 1999
162 2008
4 auto(l4)
4 manual(m5) A 1999 pop 19M A 2000 2K 20M +", convert = FALSE)
163 2008
164 2008
4 manual(m5)
4 auto(l4) table4a A 2000 cases 2K B 1999 37K 172M
165 2008
166 1999
[ reached
4
4
auto(l4)
auto(l4)
getOption("max.print") country 1999 2000 country year cases A 2000 pop 20M B 2000 80K 174M Separate each cell in a column to make
A large table -- omitted 68 rows ]
A 0.7K 2K A 1999 0.7K B 1999 cases 37K C 1999 212K 1T several rows.
to display data frame display
B 37K 80K B 1999 37K B 1999 pop 172M C 2000 NA NA table3
C 212K 213K C 1999 212K B 2000 cases 80K
• Control the default appearance with options: A 2000 2K B 2000 pop 174M
country year rate country year rate
A 1999 0.7K/19M A 1999 0.7K
options(tibble.print_max = n, B
C
2000 80K
2000 213K
C
C
1999
1999
cases 212K
pop 1T
A 2000 2K/20M A 1999 19M
tibble.print_min = m, tibble.width = Inf) B 1999 37K/172M A 2000 2K
B 2000 80K/174M A 2000 20M
• View full data set with View() or glimpse() pivot_longer(table4a, cols = 2:3, pivot_wider(table2, names_from = type, C 1999 212K/1T B 1999 37K
names_to = "year", values_to = "cases") values_from = count) C 2000 213K/1T B 1999 172M
• Revert to data frame with as.data.frame() B 2000 80K
B 2000 174M
CONSTRUCT A TIBBLE IN TWO WAYS
tibble(…)
Handle Missing Values C
C
1999
1999
212K
1T
Both drop_na(data, ...) fill(data, ..., .direction = c("down", "up")) replace_na(data, C 2000 213K
Construct by columns.
replace = list(), ...)
C 2000 1T
make this Drop rows containing Fill in NA’s in … columns with most
tibble(x = 1:3, y = c("a", "b", "c")) tibble NA’s in … columns. recent non-NA values. Replace NA’s by column. separate_rows(table3, rate, sep = "/")
tribble(…) x x x
Construct by rows. A tibble: 3 × 2
x y
x1
A
x2
1
x1
A
x2
1
x1
A
x2
1
x1
A
x2
1
x1
A
x2
1
x1
A
x2
1 unite(data, col, ..., sep = "_", remove = TRUE)
tribble( ~x, ~y, <int> <chr> B NA D 3 B NA B 1 B NA B 2
Collapse cells across several columns to
1 1 a C NA C NA C 1 C NA C 2
1, "a", 2 2 b D 3 D 3 D 3 D 3 D 3 make a single column.
2, "b", 3 3 c E NA E NA E 3 E NA E 2
table5
3, "c")
drop_na(x, x2) fill(x, x2) replace_na(x, list(x2 = 2)) country century year country year
as_tibble(x, …) Convert data frame to tibble. Afghan 19 99 Afghan 1999
enframe(x, name = "name", value = "value") Expand Tables - quickly create tables with combinations of values Afghan
Brazil
20
19
00
99
Afghan
Brazil
2000
1999
Convert named vector to a tibble Brazil 20 00 Brazil 2000
complete(data, ..., fill = list()) expand(data, ...) China 19 99 China 1999
is_tibble(x) Test whether x is a tibble. Adds to the data missing combinations of the Create new tibble with all possible combinations China 20 00 China 2000
values of the variables listed in … of the values of the variables listed in … unite(table5, century, year,
complete(mtcars, cyl, gear, carb) expand(mtcars, cyl, gear, carb) col = "year", sep = "")
RStudio® is a trademark of RStudio, Inc. • CC BY SA RStudio • info@rstudio.com • 844-448-1212 • rstudio.com • Learn more at tidyverse.org • readr 1.1.0 • tibble 1.2.12 • tidyr 0.6.0 • Updated: 2021–03