Ultimate Cheat SHEET - Analysis in R
Basics Geoms Use a geom function to represent data points, use the geom’s aesthetic properties to represent variables.
Each function returns a layer.
ggplot2 is based on the grammar of graphics, the idea
that you can build every graph from the same GRAPHICAL PRIMITIVES TWO VARIABLES
components: a data set, a coordinate system, a <- ggplot(economics, aes(date, unemploy)) both continuous continuous bivariate distribution
and geoms—visual marks that represent data points. b <- ggplot(seals, aes(x = long, y = lat)) e <- ggplot(mpg, aes(cty, hwy)) h <- ggplot(diamonds, aes(carat, price))
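As a minimal sketch of the grammar described above (assuming the mpg data set bundled with ggplot2; the names e, layer, and p are only illustrative), a plot can be assembled from a data set, aesthetic mappings, and a geom layer, with the default Cartesian coordinate system left implicit:

```r
library(ggplot2)

# Data set plus aesthetic mappings: no marks are drawn yet.
e <- ggplot(mpg, aes(cty, hwy))

# A geom function returns a layer object.
layer <- geom_point()

# Adding the layer with `+` yields a complete plot.
p <- e + layer
```

Printing p (or calling print(p)) draws the scatterplot; further layers, scales, or facets can be added with `+` in the same way.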
CC BY SA Posit Software, PBC • info@posit.co • posit.co • Learn more at ggplot2.tidyverse.org • ggplot2 3.3.5 • Updated: 2021-08
Stats An alternative way to build a layer. Scales Override defaults with scales package. Coordinate Systems Faceting
A stat builds new variables to plot (e.g., count, prop). Scales map data values to the visual values of an r <- d + geom_bar() Facets divide a plot into
fl cty cyl aesthetic. To change a mapping, add a new scale. r + coord_cartesian(xlim = c(0, 5)) - xlim, ylim subplots based on the
n <- d + geom_bar(aes(fill = fl)) The default cartesian coordinate system. values of one or more
+ =
x ..count..
discrete variables.
aesthetic prepackaged scale-specific r + coord_fixed(ratio = 1/2)
scale_ to adjust scale to use arguments ratio, xlim, ylim - Cartesian coordinates with t <- ggplot(mpg, aes(cty, hwy)) + geom_point()
data stat geom coordinate plot
x=x· system n + scale_fill_manual( fixed aspect ratio between x and y units.
y = ..count.. values = c("skyblue", "royalblue", "blue", "navy"), t + facet_grid(cols = vars(fl))
Visualize a stat by changing the default stat of a geom limits = c("d", "e", "p", "r"), breaks = c("d", "e", "p", "r"), ggplot(mpg, aes(y = fl)) + geom_bar() Facet into columns based on fl.
name = "fuel", labels = c("D", "E", "P", "R")) Flip cartesian coordinates by switching
function, geom_bar(stat="count") or by using a stat
x and y aesthetic mappings. t + facet_grid(rows = vars(year))
function, stat_count(geom="bar"), which calls a default range of title to use in labels to use breaks to use in
values to include legend/axis in legend/axis legend/axis Facet into rows based on year.
geom to make a layer (equivalent to a geom function). in mapping
Use ..name.. syntax to map stat variables to aesthetics. r + coord_polar(theta = "x", direction=1)
theta, start, direction - Polar coordinates. t + facet_grid(rows = vars(year), cols = vars(fl))
GENERAL PURPOSE SCALES Facet into both rows and columns.
geom to use stat function geom mappings r + coord_trans(y = "sqrt") - x, y, xlim, ylim t + facet_wrap(vars(fl))
Use with most aesthetics Transformed cartesian coordinates. Set xtrans
i + stat_density_2d(aes(fill = ..level..), Wrap facets into a rectangular layout.
scale_*_continuous() - Map continuous values to visual ones. and ytrans to the name of a window function.
geom = "polygon")
variable created by stat scale_*_discrete() - Map discrete values to visual ones. Set scales to let axis limits vary across facets.
scale_*_binned() - Map continuous values to discrete bins. π + coord_quickmap()
60
π + coord_map(projection = "ortho", orientation t + facet_grid(rows = vars(drv), cols = vars(fl),
c + stat_bin(binwidth = 1, boundary = 10) scale_*_identity() - Use data values as visual ones. = c(41, -74, 0)) - projection, xlim, ylim scales = "free")
lat
x, y | ..count.., ..ncount.., ..density.., ..ndensity.. scale_*_manual(values = c()) - Map discrete values to Map projections from the mapproj package x and y axis limits adjust to individual facets:
manually chosen visual ones.
c + stat_count(width = 1) x, y | ..count.., ..prop.. long
(mercator (default), azequalarea, lagrange, etc.). "free_x" - x axis limits adjust
scale_*_date(date_labels = "%m/%d", "free_y" - y axis limits adjust
c + stat_density(adjust = 1, kernel = "gaussian") date_breaks = "2 weeks") - Treat data values as dates.
x, y | ..count.., ..density.., ..scaled..
e + stat_bin_2d(bins = 30, drop = T)
scale_*_datetime() - Treat data values as date times.
Same as scale_*_date(). See ?strptime for label formats.
Position Adjustments Set labeller to adjust facet label:
t + facet_grid(cols = vars(fl), labeller = label_both)
x, y, fill | ..count.., ..density.. Position adjustments determine how to arrange geoms fl: c fl: d fl: e fl: p fl: r
X & Y LOCATION SCALES that would otherwise occupy the same space.
e + stat_bin_hex(bins = 30) x, y, fill | ..count.., ..density.. t + facet_grid(rows = vars(fl),
Use with x or y aesthetics (x shown here) s <- ggplot(mpg, aes(fl, fill = drv)) labeller = label_bquote(alpha ^ .(fl)))
e + stat_density_2d(contour = TRUE, n = 100)
x, y, color, size | ..level.. scale_x_log10() - Plot x on log10 scale. ↵c ↵d ↵e ↵p ↵r
scale_x_reverse() - Reverse the direction of the x axis. s + geom_bar(position = "dodge")
e + stat_ellipse(level = 0.95, segments = 51, type = "t") scale_x_sqrt() - Plot x on square root scale. Arrange elements side by side.
l + stat_contour(aes(z = z)) x, y, z, order | ..level..
l + stat_summary_hex(aes(z = z), bins = 30, fun = max) COLOR AND FILL SCALES (DISCRETE)
s + geom_bar(position = "fill")
Stack elements on top of one
Labels and Legends
x, y, z, fill | ..value.. another, normalize height. Use labs() to label the elements of your plot.
n + scale_fill_brewer(palette = "Blues")
l + stat_summary_2d(aes(z = z), bins = 30, fun = mean) For palette choices: e + geom_point(position = "jitter") t + labs(x = "New x axis label", y = "New y axis label",
x, y, z, fill | ..value.. RColorBrewer::display.brewer.all() Add random noise to X and Y position of title ="Add a title above the plot",
each element to avoid overplotting. subtitle = "Add a subtitle below title",
f + stat_boxplot(coef = 1.5) n + scale_fill_grey(start = 0.2, A caption = "Add a caption below plot",
x, y | ..lower.., ..middle.., ..upper.., ..width.. , ..ymin.., ..ymax.. end = 0.8, na.value = "red") e + geom_label(position = "nudge") alt = "Add alt text to the plot",
B
Nudge labels away from points. <aes> = "New <aes>
<AES> <AES> legend title")
f + stat_ydensity(kernel = "gaussian", scale = "area") x, y
| ..density.., ..scaled.., ..count.., ..n.., ..violinwidth.., ..width.. COLOR AND FILL SCALES (CONTINUOUS) s + geom_bar(position = "stack") t + annotate(geom = "text", x = 8, y = 9, label = "A")
Stack elements on top of one another. Places a geom with manually selected aesthetics.
e + stat_ecdf(n = 40) x, y | ..x.., ..y.. o <- c + geom_dotplot(aes(fill = ..x..))
e + stat_quantile(quantiles = c(0.1, 0.9), Each position adjustment can be recast as a function p + guides(x = guide_axis(n.dodge = 2)) Avoid crowded
o + scale_fill_distiller(palette = "Blues") with manual width and height arguments: or overlapping labels with guide_axis(n.dodge or angle).
formula = y ~ log(x), method = "rq") x, y | ..quantile..
s + geom_bar(position = position_dodge(width = 1)) n + guides(fill = "none") Set legend type for each
e + stat_smooth(method = "lm", formula = y ~ x, se = T, o + scale_fill_gradient(low = "red", high = "yellow") aesthetic: colorbar, legend, or none (no legend).
level = 0.95) x, y | ..se.., ..x.., ..y.., ..ymin.., ..ymax..
ggplot() + xlim(-5, 5) + stat_function(fun = dnorm,
o + scale_fill_gradient2(low = "red", high = "blue",
mid = "white", midpoint = 25) Themes n + theme(legend.position = "bottom")
Place legend at "bottom", "top", "left", or "right".
n = 20, geom = "point") x | ..x.., ..y.. n + scale_fill_discrete(name = "Title",
ggplot() + stat_qq(aes(sample = 1:100)) o + scale_fill_gradientn(colors = topo.colors(6)) r + theme_bw() r + theme_classic() labels = c("A", "B", "C", "D", "E"))
x, y, sample | ..sample.., ..theoretical.. Also: rainbow(), heat.colors(), terrain.colors(), White background Set legend title and labels with a scale function.
cm.colors(), RColorBrewer::brewer.pal() with grid lines. r + theme_light()
e + stat_sum() x, y, size | ..n.., ..prop..
e + stat_summary(fun.data = "mean_cl_boot")
h + stat_summary_bin(fun = "mean", geom = "bar")
SHAPE AND SIZE SCALES
r + theme_gray()
Grey background
r + theme_linedraw()
r + theme_minimal()
Zooming
p <- e + geom_point(aes(shape = fl, size = cyl)) (default theme). Minimal theme. Without clipping (preferred):
e + stat_identity() p + scale_shape() + scale_size() r + theme_dark() r + theme_void() t + coord_cartesian(xlim = c(0, 100), ylim = c(10, 20))
e + stat_unique() p + scale_shape_manual(values = c(3:7)) Dark for contrast. Empty theme.
With clipping (removes unseen data points):
r + theme() Customize aspects of the theme such
as axis, legend, panel, and facet properties. t + xlim(0, 100) + ylim(10, 20)
p + scale_radius(range = c(1,6))
p + scale_size_area(max_size = 6) r + ggtitle("Title") + theme(plot.title.position = "plot") t + scale_x_continuous(limits = c(0, 100)) +
r + theme(panel.background = element_rect(fill = "blue")) scale_y_continuous(limits = c(0, 100))
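The difference between the two zooming approaches listed under Zooming can be sketched as follows. This is an illustration only: it assumes the mpg data set from ggplot2, and the limits c(10, 20) and the names zoomed, clipped, and n_outside are chosen here for the example.

```r
library(ggplot2)

t <- ggplot(mpg, aes(cty, hwy)) + geom_point()

# Zoom without clipping: every row stays in the plot data,
# only the visible window changes.
zoomed <- t + coord_cartesian(xlim = c(10, 20))

# Zoom with clipping: rows with cty outside [10, 20] are treated as
# missing by the x scale and are not drawn.
clipped <- t + xlim(10, 20)

# Number of rows the clipped version would discard (counted with base R).
n_outside <- sum(mpg$cty < 10 | mpg$cty > 20)
```

The distinction matters most when a layer computes a statistic (for example a smoother), because with clipping the statistic is computed from the remaining rows only.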
Data tidying with tidyr : : CHEAT SHEET
Tidy data is a way to organize tabular data in a
consistent data structure across packages. Reshape Data - Pivot data to reorganize values into a new layout. Expand
A table is tidy if:
A B C A B C
table4a Tables
country 1999 2000 country year cases pivot_longer(data, cols, names_to = "name", Create new combinations of variables or identify
& A
B
0.7K 2K
37K 80K
A
B
1999 0.7K
1999 37K
values_to = "value", values_drop_na = FALSE) implicit missing values (combinations of
C 212K 213K C 1999 212K "Lengthen" data by collapsing several columns variables not present in the data).
A 2000 2K
Each variable is in Each observation, or into two. Column names move to a new
B 2000 80K x
its own column case, is in its own row C 2000 213K names_to column and values to a new values_to x1 x2 x3 x1 x2 expand(data, …) Create a
column. A 1 3
B 1 4
A 1
A 2 new tibble with all possible
A B C A *B C pivot_longer(table4a, cols = 2:3, names_to ="year", B 2 3 B 1
B 2
combinations of the values
values_to = "cases") of the variables listed in …
Drop other variables.
table2 expand(mtcars, cyl, gear,
Access variables Preserve cases in country year type count country year cases pop pivot_wider(data, names_from = "name", carb)
as vectors vectorized operations A 1999 cases 0.7K A 1999 0.7K 19M
values_from = "value")
A 1999 pop 19M A 2000 2K 20M x
A 2000 cases 2K B 1999 37K 172M The inverse of pivot_longer(). "Widen" data by x1 x2 x3 x1 x2 x3 complete(data, …, fill =
Tibbles
A 1 3 A 1 3
A
B
2000
1999
pop 20M
cases 37K
B
C
2000 80K 174M
1999 212K 1T
expanding two columns into several. One column B 1 4 A 2 NA list()) Add missing possible
B 1999 pop 172M C 2000 213K 1T provides the new column names, the other the B 2 3 B 1 4
combinations of values of
AN ENHANCED DATA FRAME
B 2 3
B 2000 cases 80K values. variables listed in … Fill
Tibbles are a table format provided B 2000 pop 174M
remaining variables with NA.
C 1999 cases 212K pivot_wider(table2, names_from = type,
by the tibble package. They inherit the complete(mtcars, cyl, gear,
C 1999 pop 1T values_from = count)
data frame class, but have improved behaviors: C 2000 cases 213K carb)
C 2000 pop 1T
• Subset a new tibble with [, a vector with [[ and $.
• No partial matching when subsetting columns.
• Display concise views of the data on one screen. Split Cells - Use these functions to split or combine cells into individual, isolated values. Handle Missing Values
options(tibble.print_max = n, tibble.print_min = m, table5 Drop or replace explicit missing values (NA).
tibble.width = Inf) Control default display settings. country century year country year unite(data, col, …, sep = "_", remove = TRUE,
x
View() or glimpse() View the entire data set.
A 19 99 A 1999
na.rm = FALSE) Collapse cells across several x1 x2 x1 x2 drop_na(data, …) Drop
A 20 00 A 2000
B 19 99 B 1999 columns into a single column. A 1 A 1
rows containing NA’s in …
CONSTRUCT A TIBBLE B NA D 3
B 20 00 B 2000
unite(table5, century, year, col = "year", sep = "") C
D
NA
3
columns.
tibble(…) Construct by columns. E NA drop_na(x, x2)
tibble(x = 1:3, y = c("a", "b", "c")) Both make table3
x
this tibble country year rate country year cases pop separate(data, col, into, sep = "[^[:alnum:]]+",
tribble(…) Construct by rows. x1 x2 x1 x2 fill(data, …, .direction =
A 1999 0.7K/19M0 A 1999 0.7K 19M remove = TRUE, convert = FALSE, extra = "warn", A 1 A 1
tribble(~x, ~y, A 2000 0.2K/20M0 A 2000 2K 20M B NA B 1 "down") Fill in NA’s in …
A tibble: 3 × 2 fill = "warn", …) Separate each cell in a column
1, "a", x y B 1999 .37K/172M B 1999 37K 172 C NA C 1
columns using the next or
<int> <chr> B 2000 .80K/174M B 2000 80K 174 into several columns. Also extract(). D 3 D 3
2, "b", 1 1 a
E NA E 3 previous value.
3, "c") 2
3
2
3
b
c
separate(table3, rate, sep = "/", fill(x, x2)
into = c("cases", "pop"))
x
as_tibble(x, …) Convert a data frame to a tibble. table3
country
A
year
1999
rate
0.7K x1 x2 x1 x2 replace_na(data, replace)
A 1 A 1
enframe(x, name = "name", value = "value") country year rate A 1999 19M
Specify a value to replace
A 1999 0.7K/19M0 A 2000 2K separate_rows(data, …, sep = "[^[:alnum:].]+", B NA B 2
Convert a named vector to a tibble. Also deframe(). C NA C 2
NA in selected columns.
A 2000 0.2K/20M0 A 2000 20M
convert = FALSE) Separate each cell in a column D 3 D 3
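A short sketch of the reshaping functions described in this section, using the table4a data set that ships with tidyr. The object names long and wide are only illustrative.

```r
library(tidyr)

# "Lengthen": the 1999 and 2000 columns collapse into year/cases pairs.
long <- pivot_longer(table4a, cols = 2:3,
                     names_to = "year", values_to = "cases")

# "Widen": the inverse operation spreads year back into columns.
wide <- pivot_wider(long, names_from = year, values_from = cases)

dim(long)   # 6 rows, 3 columns
dim(wide)   # 3 rows, 3 columns
```

Note that after pivot_longer() the year column is character, because it was built from column names; converting it to a number is a separate step.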
CC BY SA Posit Software, PBC • info@posit.co • posit.co • Learn more at tidyr.tidyverse.org • tibble 3.2.1 • tidyr 1.3.0 • Updated: 2023-05
Nested Data
A nested data frame stores individual tables as a list-column of data frames within a larger organizing data frame. List-columns can also be lists of vectors or lists of varying data types.
Use a nested data frame to:
• Preserve relationships between observations and subsets of data. Preserve the type of the variables being nested (factors and datetimes aren't coerced to character).
• Manipulate many sub-tables at once with purrr functions like map(), map2(), or pmap() or with dplyr rowwise() grouping.
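The following is a minimal sketch of a nested data frame, assuming the built-in mtcars data and the purrr package. The column name n_rows is invented for this example.

```r
library(tidyr)
library(purrr)

# One row per cyl value; all other columns move into a list-column
# of data frames named `data`.
nested <- nest(mtcars, data = -cyl)

# Apply a function to every sub-table at once.
nested$n_rows <- map_int(nested$data, nrow)

# unnest() restores the original rows.
flat <- unnest(nested, data)
```

Each element of nested$data is an ordinary data frame, so column types inside the sub-tables are kept as they were.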
String manipulation with stringr : : CHEAT SHEET
The stringr package provides a set of internally consistent tools for working with character strings, i.e. sequences of characters surrounded by quotation marks.
2 4 str_locate(string, pattern) Locate the str_match(string, pattern) Return the str_trim(string, side = c("both", "left", "right"))
4 7
NA NA
positions of pattern matches in a string. NA NA
first pattern match found in each string, as Trim whitespace from the start and/or end of
3 4 Also str_locate_all(). str_locate(fruit, "a") a matrix with a column for each ( ) group in a string. str_trim(str_pad(fruit, 17))
pattern. Also str_match_all().
0 str_count(string, pattern) Count the number str_match(sentences, "(a|the) ([^ ]+)") str_squish(string) Trim whitespace from each
3
1
of matches in a string. str_count(fruit, "a") end and collapse multiple spaces into single
2 spaces. str_squish(str_pad(fruit, 17, "both"))
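A brief sketch of the functions described on this page. The input vectors x and words are made up for illustration.

```r
library(stringr)

x <- c("  apple ", " kiwi   fruit ")

# Trim whitespace from both ends only.
trimmed <- str_trim(x)

# Trim both ends and collapse inner runs of spaces.
squished <- str_squish(x)

words <- c("apple", "banana", "pear")

# Number of matches of "a" in each string.
counts <- str_count(words, "a")

# Start and end position of the first match in each string.
pos <- str_locate(words, "a")
```

Here trimmed[2] still contains three inner spaces, whereas squished[2] is "kiwi fruit"; str_locate() returns an integer matrix with the columns start and end.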
CC BY SA Posit Software, PBC • info@posit.co • posit.co • Learn more at stringr.tidyverse.org • Diagrams from @LVaudor on Twitter • stringr 1.5.0 • Updated: 2023-05
Need to Know Regular Expressions - Regular expressions, or regexps, are a concise language for
describing patterns in strings.
[:space:]
new line
Pattern arguments in stringr are interpreted as MATCH CHARACTERS see <- function(rx) str_view_all("abc ABC 123\t.!?\\(){}\n", rx)
regular expressions after any special characters [:blank:] .
have been parsed. string regexp matches example
(type this) (to mean this) (which matches this) space
In R, you write regular expressions as strings, a (etc.) a (etc.) see("a") abc ABC 123 .!?\(){} tab
sequences of characters surrounded by quotes \\. \. . see("\\.") abc ABC 123 .!?\(){}
("") or single quotes('').
\\! \! ! see("\\!") abc ABC 123 .!?\(){} [:graph:]
Some characters cannot be represented directly \\? \? ? see("\\?") abc ABC 123 .!?\(){}
in an R string . These must be represented as \\\\ \\ \ see("\\\\") abc ABC 123 .!?\(){} [:punct:] [:symbol:]
special characters, sequences of characters that \\( \( ( see("\\(") abc ABC 123 .!?\(){}
have a specific meaning., e.g. . , : ; ? ! / *@# | ` = + ^
\\) \) ) see("\\)") abc ABC 123 .!?\(){}
Special Character Represents \\{ \{ { see("\\{") abc ABC 123 .!?\(){} - _ " ' [ ] { } ( ) ~ < > $
\\ \ \\} \} } see( "\\}") abc ABC 123 .!?\(){}
\" " \\n \n new line (return) see("\\n") abc ABC 123 .!?\(){} [:alnum:]
\n new line \\t \t tab see("\\t") abc ABC 123 .!?\(){}
Run ?"'" to see a complete list \\s \s any whitespace (\S for non-whitespaces) see("\\s") abc ABC 123 .!?\(){} [:digit:]
\\d \d any digit (\D for non-digits) see("\\d") abc ABC 123 .!?\(){}
0 1 2 3 4 5 6 7 8 9
Because of this, whenever a \ appears in a regular \\w \w any word character (\W for non-word chars) see("\\w") abc ABC 123 .!?\(){}
expression, you must write it as \\ in the string \\b \b word boundaries see("\\b") abc ABC 123 .!?\(){}
that represents the regular expression. [:digit:]
1
digits see("[:digit:]") abc ABC 123 .!?\(){} [:alpha:]
1
Use writeLines() to see how R views your string [:alpha:] letters see("[:alpha:]") abc ABC 123 .!?\(){} [:lower:] [:upper:]
1
after all special characters have been parsed. [:lower:] lowercase letters see("[:lower:]") abc ABC 123 .!?\(){}
[:upper:]
1
uppercase letters see("[:upper:]") abc ABC 123 .!?\(){} a b c d e f A B C D E F
writeLines("\\.") 1
# \. [:alnum:] letters and numbers see("[:alnum:]") abc ABC 123 .!?\(){} g h i j k l GH I J K L
[:punct:] 1 punctuation see("[:punct:]") abc ABC 123 .!?\(){}
mn o p q r MNOPQR
writeLines("\\ is a backslash") [:graph:]1 letters, numbers, and punctuation see("[:graph:]") abc ABC 123 .!?\(){}
# \ is a backslash
[:space:]1 space characters (i.e. \s) see("[:space:]") abc ABC 123 .!?\(){} s t u v w x S T U VWX
[:blank:]1 space and tab (but not new line) see("[:blank:]") abc ABC 123 .!?\(){} y z Y Z
. every character except a new line see(".") abc ABC 123 .!?\(){}
INTERPRETATION 1 Many base R functions require classes to be wrapped in a second set of [ ], e.g. [[:digit:]]
Patterns in stringr are interpreted as regexs. To
change this default, wrap the pattern in one of:
ALTERNATES alt <- function(rx) str_view_all("abcde", rx) QUANTIFIERS quant <- function(rx) str_view_all(".a.aa.aaa", rx)
regex(pattern, ignore_case = FALSE, multiline = example example
regexp matches regexp matches
FALSE, comments = FALSE, dotall = FALSE, ...)
Modifies a regex to ignore cases, match end of ab|d or alt("ab|d") abcde a? zero or one quant("a?") .a.aa.aaa
lines as well as the end of strings, allow R comments [abe] one of alt("[abe]") abcde a* zero or more quant("a*") .a.aa.aaa
within regexes, and/or to have . match everything a+ one or more quant("a+") .a.aa.aaa
including \n. [^abe] anything but alt("[^abe]") abcde
str_detect("I", regex("i", TRUE)) [a-c] range alt("[a-c]") abcde 1 2 ... n a{n} exactly n quant("a{2}") .a.aa.aaa
1 2 ... n a{n, } n or more quant("a{2,}") .a.aa.aaa
fixed() Matches raw bytes but will miss some n ... m a{n, m} between n and m quant("a{2,4}") .a.aa.aaa
characters that can be represented in multiple ANCHORS anchor <- function(rx) str_view_all("aaa", rx)
ways (fast). str_detect("\u0130", fixed("i")) regexp matches example
^a start of string anchor("^a") aaa GROUPS ref <- function(rx) str_view_all("abbaab", rx)
coll() Matches raw bytes and will use locale
specific collation rules to recognize characters a$ end of string anchor("a$") aaa Use parentheses to set precedence (order of evaluation) and create groups
that can be represented in multiple ways (slow).
regexp matches example
str_detect("\u0130", coll("i", TRUE, locale = "tr"))
(ab|d)e sets precedence alt("(ab|d)e") abcde
LOOK AROUNDS look <- function(rx) str_view_all("bacad", rx)
boundary() Matches boundaries between
characters, line_breaks, sentences, or words. regexp matches example Use an escaped number to refer to and duplicate parentheses groups that occur
str_split(sentences, boundary("word")) a(?=c) followed by look("a(?=c)") bacad earlier in a pattern. Refer to each group by its order of appearance
a(?!c) not followed by look("a(?!c)") bacad string regexp matches example
(?<=b)a preceded by look("(?<=b)a") bacad (type this) (to mean this) (which matches this) (the result is the same as ref("abba"))
(?<!b)a not preceded by look("(?<!b)a") bacad \\1 \1 (etc.) first () group, etc. ref("(a)(b)\\2\\1") abbaab
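The escaping rule and several of the pattern types listed above can be checked with a short sketch. The test strings are either taken from the examples on this page or invented for illustration, and str_detect(), str_count(), and str_extract() are used in place of the viewer helpers.

```r
library(stringr)

# The R string "\\." is the regular expression \. (a literal dot).
writeLines("\\.")

s <- c("a.b", "axb")
escaped  <- str_detect(s, "a\\.b")       # literal dot only
wildcard <- str_detect(s, "a.b")         # . matches any character
literal  <- str_detect(s, fixed("a.b"))  # no regex interpretation

# Quantifier: runs of two or more "a".
runs <- str_count(".a.aa.aaa", "a{2,}")

# Look ahead: "a" only when followed by "c".
ahead <- str_count("bacad", "a(?=c)")

# Back references: \\2 and \\1 repeat the second and first groups.
backref <- str_extract("abbaab", "(a)(b)\\2\\1")

# regex() modifier: ignore case.
nocase <- str_detect("I", regex("i", ignore_case = TRUE))
```

fixed() is the quickest way to match a string that contains many special characters, at the cost of losing all regular expression features.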
Dates and times with lubridate : : CHEAT SHEET
Date-times 2017-11-28 12:00:00 2017-11-28 12:00:00 Round Date-times
A date-time is a point on the timeline, A date is a day stored as An hms is a time stored as floor_date(x, unit = "second")
stored as the number of seconds since the number of days since the number of seconds since Round down to nearest unit.
1970-01-01 00:00:00 UTC 1970-01-01 00:00:00 floor_date(dt, unit = "month")
2016 2017 2018 2019 2020
Jan Feb Mar Apr
dt <- as_datetime(1511870400) d <- as_date(17498) t <- hms::as_hms(85) round_date(x, unit = "second")
2017-11-28 12:00:00 Round to nearest unit.
## "2017-11-28 12:00:00 UTC" ## "2017-11-28" ## 00:01:25
round_date(dt, unit = "month")
Jan Feb Mar Apr ceiling_date(x, unit = "second",
PARSE DATE-TIMES (Convert strings or numbers to date-times) GET AND SET COMPONENTS change_on_boundary = NULL)
d ## "2017-11-28" Round up to nearest unit.
1. Identify the order of the year (y), month (m), day (d), hour (h), Use an accessor function to get a component. day(d) ## 28 ceiling_date(dt, unit = "month")
minute (m) and second (s) elements in your data. Assign into an accessor function to change a day(d) <- 1 Jan Feb Mar Apr
2. Use the function below whose name replicates the order. Each component in place. d ## "2017-11-01" Valid units are second, minute, hour, day, week, month, bimonth,
accepts a tz argument to set the time zone, e.g. ymd(x, tz = "UTC"). quarter, season, halfyear and year.
ymd_hms(), ymd_hm(), ymd_h(). 2018-01-31 11:59:59 date(x) Date component. date(dt) rollback(dates, roll_to_first = FALSE, preserve_hms = TRUE) Roll back to
2017-11-28T14:02:00 ymd_hms("2017-11-28T14:02:00") last day of previous month. Also rollforward(). rollback(dt)
year(x) Year. year(dt)
2017-22-12 10:00:00
ydm_hms(), ydm_hm(), ydm_h().
ydm_hms("2017-22-12 10:00:00")
2018-01-31 11:59:59 isoyear(x) The ISO 8601 year.
epiyear(x) Epidemiological year. Stamp Date-times
mdy_hms(), mdy_hm(), mdy_h(). stamp() Derive a template from an example string and return a new
11/28/2017 1:02:03 2018-01-31 11:59:59 month(x, label, abbr) Month. function that will apply the template to date-times. Also
mdy_hms("11/28/2017 1:02:03") month(dt) stamp_date() and stamp_time().
dmy_hms(), dmy_hm(), dmy_h(). day(x) Day of month. day(dt) 1. Derive a template, create a function
1 Jan 2017 23:59:59 dmy_hms("1 Jan 2017 23:59:59") Tip: use a
2018-01-31 11:59:59 wday(x, label, abbr) Day of week. sf <- stamp("Created Sunday, Jan 17, 1999 3:34") date with
ymd(), ydm(). ymd(20170131) qday(x) Day of quarter. day > 12
20170131 2. Apply the template to dates
sf(ymd("2010-04-05"))
mdy(), myd(). mdy("July 4th, 2000") 2018-01-31 11:59:59 hour(x) Hour. hour(dt) ## [1] "Created Monday, Apr 05, 2010 00:00"
July 4th, 2000
dmy(), dym(). dmy("4th of July '99") 2018-01-31 11:59:59 minute(x) Minutes. minute(dt)
4th of July '99
2001: Q3 yq() Q for quarter. yq("2001: Q3") 2018-01-31 11:59:59 second(x) Seconds. second(dt) Time Zones
R recognizes ~600 time zones. Each encodes the time zone, Daylight
07-2020 my(), ym(). my("07-2020") 2018-01-31 11:59:59 UTC tz(x) Time zone. tz(dt) Savings Time, and historical calendar variations for an area. R assigns
one time zone per vector.
2:01 hms::hms() Also lubridate::hms(), week(x) Week of the year. week(dt)
hm() and ms(), which return x
J F M A M J isoweek() ISO 8601 week. Use the UTC time zone to avoid Daylight Savings.
periods.* hms::hms(seconds = 0, J A S O N D epiweek() Epidemiological week.
minutes = 1, hours = 2) OlsonNames() Returns a list of valid time zone names. OlsonNames()
x
J F M A M J quarter(x) Quarter. quarter(dt) Sys.timezone() Gets current time zone.
J A S O N D
2017.5 date_decimal(decimal, tz = "UTC") 5:00 6:00
semester(x, with_year = FALSE)
date_decimal(2017.5)
x
J F M A M J Semester. semester(dt) 4:00 Mountain Central 7:00 with_tz(time, tzone = "") Get
the same date-time in a new
now(tzone = "") Current time in tz J A S O N D Pacific Eastern time zone (a new clock time).
(defaults to system tz). now() am(x) Is it in the am? am(dt) Also local_time(dt, tz, units).
pm(x) Is it in the pm? pm(dt) with_tz(dt, "US/Pacific")
today(tzone = "") Current date in a PT
MT
January
CT ET
xxxxx dst(x) Is it daylight savings? dst(d)
xxx tz (defaults to system tz). today()
force_tz(time, tzone = "") Get
fast_strptime() Faster strptime. leap_year(x) Is it a leap year? the same clock time in a new
leap_year(d) 7:00 7:00
fast_strptime("9/1/01", "%y/%m/%d") Pacific Eastern time zone (a new date-time).
Also force_tzs().
parse_date_time() Easier strptime. update(object, ..., simple = FALSE) 7:00 7:00 force_tz(dt, "US/Pacific")
parse_date_time("09-01-01", "ymd") update(dt, mday = 2, hour = 1) Mountain Central
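A compact sketch tying together parsing, component access, rounding, and time zones. It reuses the example timestamp from this page; the zone name "America/Los_Angeles" is substituted here for "US/Pacific" on the assumption that it is available on more systems, and the variable names are illustrative.

```r
library(lubridate)

# Parse: the function name mirrors the order of the components.
dt <- ymd_hms("2017-11-28T14:02:00")   # UTC by default

# Get components with accessor functions.
parts <- c(year(dt), month(dt), day(dt), hour(dt))

# Set a component by assigning into the accessor.
d <- as_date(17498)                    # "2017-11-28"
day(d) <- 1                            # now "2017-11-01"

# Round to month boundaries.
down <- floor_date(dt, unit = "month")
up   <- ceiling_date(dt, unit = "month")
near <- round_date(dt, unit = "month")

# with_tz(): same instant, new clock time.
pacific <- with_tz(dt, "America/Los_Angeles")

# force_tz(): same clock time, new instant.
forced <- force_tz(dt, "America/Los_Angeles")
```

Because with_tz() keeps the instant fixed, pacific == dt is TRUE even though the printed clock times differ, whereas forced is a different point on the timeline.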
CC BY SA Posit Software, PBC • info@posit.co • posit.co • Learn more at lubridate.tidyverse.org • lubridate 1.9.2 • Updated: 2023-05
Math with Date-times — Lubridate provides three classes of timespans to facilitate math with dates and date-times.
Math with date-times relies on the timeline, Periods track changes in clock times, Durations track the passage of Intervals represent specific intervals Not all years
which behaves inconsistently. Consider how which ignore time line irregularities. physical time, which deviates from of the timeline, bounded by start and are 365 days
the timeline behaves during: clock time when irregularities occur. end date-times. due to leap days.
A normal day nor + minutes(90) nor + dminutes(90) interval(nor, nor + minutes(90)) Not all minutes
nor <- ymd_hms("2018-01-01 01:30:00",tz="US/Eastern") are 60 seconds due to
leap seconds.
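The difference between the three timespan classes shows up when the timeline is irregular. The sketch below reuses nor from above and adds a second timestamp, gap, chosen here (as an assumption for illustration) to fall just before the 2018 spring-forward change in US Eastern time; "America/New_York" is used in place of "US/Eastern".

```r
library(lubridate)

# A normal day: clock time and physical time agree.
nor <- ymd_hms("2018-01-01 01:30:00", tz = "America/New_York")
same <- (nor + minutes(90)) == (nor + dminutes(90))

# Clocks jump from 02:00 to 03:00 on 2018-03-11.
gap <- ymd_hms("2018-03-11 01:30:00", tz = "America/New_York")

# Period: moves the clock time forward by 90 minutes.
by_period <- gap + minutes(90)

# Duration: moves forward by exactly 5400 physical seconds.
by_duration <- gap + dminutes(90)

# Interval: a specific span bounded by two date-times.
span <- interval(nor, nor + minutes(90))
span_seconds <- int_length(span)
```

On the normal day both additions agree; across the gap the period lands on 03:00 clock time while the duration lands on 04:00, one clock hour later.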
forwards
Open in new Save Find and
backwards/ window replace
Compile as Run
notebook selected
code
Import data History of past
with wizard commands to
run/copy
Manage
external
View
memory
databases usage
R tutorials
Control
and more in Source Pane Turn on at Tools > Project Options > Git/SVN
A• Added M• Modified
Check Render Choose Configure Insert D• Deleted R• Renamed
?• Untracked
Package Development
Click next to line number to Highlighted line shows where
RStudio opens plots in a dedicated Plots pane RStudio opens documentation in a dedicated Help pane add/remove a breakpoint. execution has paused
Create a new package with
File > New Project > New Directory > R Package
Navigate Open in Export Delete Delete
Enable roxygen documentation with recent plots window plot plot all plots Home page of Search within Search for
Tools > Project Options > Build Tools helpful links help file help file
Roxygen guide at Help > Roxygen Quick Reference
See package information in the Build Tab Viewer pane displays HTML content, such as Shiny
apps, RMarkdown reports, and interactive visualizations
GUI Package manager lists every installed package
Install package Run devtools::load_all()
and restart R and reload changes
Stop Shiny Publish to shinyapps.io, Refresh
Install Update Browse app rpubs, RSConnect, … Run commands in Examine variables Select function
Packages Packages package site environment where in executing in traceback to
Clear output execution has paused environment debug
Run R CMD and rebuild
check View(<data>) opens spreadsheet like view of data set
Customize Run Click to load package with Package Delete
package build package library(). Unclick to detach version from
options tests package with detach(). installed library
Filter rows by value Sort by Search Step through Step into and Resume Quit debug
or value range values for value code one line out of functions execution mode
at a time to run
CC BY SA Posit Software, PBC • info@posit.co • posit.co • Learn more at rstudio.com • Font Awesome 5.15.3 • RStudio IDE 1.4.1717 • Updated: 2021-07
Keyboard Shortcuts RStudio
RUN CODE Windows/Linux Mac
Search command history Ctrl+arrow-up Cmd+arrow-up
DOCUMENTS AND APPS
Knit Document (knitr) Ctrl+Shift+K Cmd+Shift+K
Workbench
Interrupt current command Esc Esc Insert chunk (Sweave & Knitr) Ctrl+Alt+I Cmd+Option+I WHY RSTUDIO WORKBENCH?
Clear console Ctrl+L Ctrl+L Run from start to current line Ctrl+Alt+B Cmd+Option+B Extend the open source server with a
commercial license, support, and more:
NAVIGATE CODE MORE KEYBOARD SHORTCUTS
Go to File/Function Ctrl+. Ctrl+. Keyboard Shortcuts Help Alt+Shift+K Option+Shift+K • open and run multiple R sessions at once
Show Command Palette Ctrl+Shift+P Cmd+Shift+P • tune your resources to improve performance
WRITE CODE
Attempt completion Tab or Tab or
• administrative tools for managing user sessions
Ctrl+Space Ctrl+Space View the Keyboard Shortcut Quick Search for keyboard shortcuts with • collaborate real-time with others in shared projects
Insert <- (assignment operator) Alt+- Option+- Reference with Tools > Keyboard Tools > Show Command Palette • switch easily from one version of R to a different version
Shortcuts or Alt/Option + Shift + K or Ctrl/Cmd + Shift + P.
Insert %>% (pipe operator) Ctrl+Shift+M Cmd+Shift+M • integrate with your authentication, authorization, and audit practices
(Un)Comment selection Ctrl+Shift+C Cmd+Shift+C • work in the RStudio IDE, JupyterLab, Jupyter Notebooks, or VS Code
MAKE PACKAGES Windows/Linux Mac Download a free 45 day evaluation at
Load All (devtools) Ctrl+Shift+L Cmd+Shift+L www.rstudio.com/products/workbench/evaluation/
Test Package (Desktop) Ctrl+Shift+T Cmd+Shift+T
Document Package Ctrl+Shift+D Cmd+Shift+D
Share Projects
File > New Project
RStudio saves the call history,
Visual Editor
workspace, and working Start new R Session Close R Session
Choose Choose Insert Jump to Jump Run directory associated with a in current project in project
Check Render output output code previous to next selected Publish Show file project. It reloads each when
spelling output format location chunk chunk chunk lines to server outline you re-open a project.
T H J
Back to
Source Editor
Block (front page) Active shared
format collaborators
Name of
current
Lists and Links Citations Images File outline project
Insert blocks, Select
block
citations, Insert and Share Project R Version
quotes More
formatting equations, and edit tables with Collaborators
Clear special
formatting characters
Insert
verbatim
code
Run Remote Jobs
Run R on remote clusters
(Kubernetes/Slurm) via the
Job Launcher
Add/Edit
attributes Monitor Launch a job
launcher jobs
Run launcher
jobs remotely
Factors with forcats : : CHEAT SHEET
The forcats package provides tools for working with factors, which are R's data structure for categorical data.
Factors stored displayed Change the order of levels Change the value of levels
R represents categorical integer
1 1= a a 1= a
data with factors. A factor vector 3 23 == bc c 23 == bc a 1= a a 1= b fct_relevel(.f, ..., after = 0L) a 1= a v 1= v
2= x
fct_recode(.f, ...) Manually change
is an integer vector with a 2 b c 2= b c 2= c Manually reorder factor levels. c 2= b
z levels. Also fct_relabel() which obeys
3= c 3= a fct_relevel(f, c("b", "c", "a")) 3= c 3= z purrr::map syntax to apply a function
levels attribute that stores levels 1 a b b b x
a set of mappings between or expression to each level.
a a a v fct_recode(f, v = "a", x = "b", z = "c")
integers and categorical values. When you view a factor, R fct_infreq(f, ordered = NA) Reorder
displays not the integers, but the levels associated with them. fct_relabel(f, ~ paste0("x", .x))
levels by the frequency
c 1= a c 1= c in which they appear in the
= c 2= c c 2= a data (highest frequency first).
a a 1= a Also fct_inseq(). a 1= a 2 1=2 fct_anon(f, prefix = "")
c c 2= b a a c 2= b 2=1 Anonymize levels with random
3= c
f3 <- factor(c("c", "c", "a")) 3= c 1 3=3
integers.
b b fct_infreq(f3) b 3 fct_anon(f)
a a a 2
b 1= a b 1= b fct_inorder(f, ordered = NA)
a 2= b a 2= a Reorder levels by order in which
they appear in the data. a 1= a x 1= x fct_collapse(.f, …, other_level = NULL)
a 1= a a
c 2= b c 2= c Collapse levels into manually defined
c 2= b b fct_inorder(f2) 3= c groups.
3= c c b x fct_collapse(f, x = c("a", "b"))
b a x
a a 1= a a 1= c fct_rev(f) Reverse level order.
2= b 2= b f4 <- factor(c("a","b","c"))
b 3= c b 3= a
c c fct_rev(f4) fct_lump_min(f, min, w = NULL,
Inspect Factors a
c
1= a
2= b
3= c
a
Other
1= a
2 = Other
other_level = "Other") Lumps together
factors that appear fewer than min
times. Also fct_lump_n(),
a 1= a f n fct_count(f, sort = FALSE, a 1= a a 1= c fct_shi (f) Shi levels to le or b Other
fct_lump_prop(), and
c 2= b
a 2 prop = FALSE) Count the 2= b 2= a right, wrapping around end. a a
3= c number of values with each b 3= c b 3= b
fct_lump_lowfreq().
b b 1 c c fct_shi (f4) fct_lump_min(f, min = 2)
level. fct_count(f)
a c 1
fct_match(f, lvls) Check for
lvls in f. fct_match(f, "a") a 1= a a 1= a fct_shu le(f, n = 1L) Randomly a 1= a a 1= a fct_other(f, keep, drop, other_level =
2= b 2= c permute order of factor levels. 2= b 2= b "Other") Replace levels with "other."
a 1= a a 1= a fct_unique(f) Return the b 3= c b 3= b
c 3= c
Other
3 = Other
c c fct_shu le(f4) fct_other(f, keep = c("a", "b"))
b 2= b
b 2= b unique values, removing b b
a duplicates. fct_unique(f) a a
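The entries above can be tried on a small factor. The sketch below is only an illustration: the factor `f` and its contents are assumptions chosen for the example (the extracted text does not define `f`), while `f3` and `f4` follow the definitions given in the entries.

```r
library(forcats)

# Assumed example factor; the sheet's own definition of `f` is not in this text.
f  <- factor(c("a", "c", "b", "a"), levels = c("a", "b", "c"))
f3 <- factor(c("c", "c", "a"))
f4 <- factor(c("a", "b", "c"))

# Inspect: one row per level with its count.
counts <- fct_count(f)

# Reorder levels without changing the underlying values.
releveled <- fct_relevel(f, c("b", "c", "a"))
by_freq   <- fct_infreq(f3)   # most frequent level first
reversed  <- fct_rev(f4)

# Change level values.
recoded   <- fct_recode(f, v = "a", x = "b", z = "c")
collapsed <- fct_collapse(f, x = c("a", "b"))
lumped    <- fct_lump_min(f, min = 2)   # levels seen fewer than 2 times become "Other"

levels(releveled)
levels(lumped)
```

Each function returns a new factor, so the original `f` is left unchanged and the calls can be chained with a pipe.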
CC BY SA Posit Software, PBC • info@posit.co • posit.co • Learn more at forcats.tidyverse.org • Diagrams inspired by @LVaudor on Twitter • forcats 1.0.0 • Updated: 2023-05
Data import with the tidyverse : : CHEAT SHEET
Read Tabular Data with readr
One of the first steps of a project is to import outside data into R. Data is often stored in tabular formats, like csv files or spreadsheets.
The front page of this sheet shows how to import and save text files into R using readr.
The back page shows how to import spreadsheet data from Excel files using readxl or Google Sheets using googlesheets4.

OTHER TYPES OF DATA
Try one of the following packages to import other types of files:
• haven - SPSS, Stata, and SAS files
• DBI - databases
• jsonlite - json
• xml2 - XML
• httr - Web APIs
• rvest - HTML (Web Scraping)
• readr::read_lines() - text data

read_*(file, col_names = TRUE, col_types = NULL, col_select = NULL, id = NULL, locale, n_max = Inf, skip = 0, na = c("", "NA"), guess_max = min(1000, n_max), show_col_types = TRUE) See ?read_delim

read_delim("file.txt", delim = "|") Read files with any delimiter. If no delimiter is specified, it will automatically guess.
    To make file.txt, run: write_file("A|B|C\n1|2|3\n4|5|NA", file = "file.txt")
    File contents:    Result:
    A|B|C             A B C
    1|2|3             1 2 3
    4|5|NA            4 5 NA

read_csv("file.csv") Read a comma delimited file with period decimal marks.
    To make file.csv, run: write_file("A,B,C\n1,2,3\n4,5,NA", file = "file.csv")
    File contents:    Result:
    A,B,C             A B C
    1,2,3             1 2 3
    4,5,NA            4 5 NA
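As a minimal sketch of the two readers above, the following writes the same sample contents to temporary files and reads them back. The use of `tempfile()` and the object names are assumptions made so the example does not touch the working directory.

```r
library(readr)

# Temporary paths (an assumption for this sketch; the sheet uses file.txt / file.csv).
pipe_path <- tempfile(fileext = ".txt")
csv_path  <- tempfile(fileext = ".csv")

write_file("A|B|C\n1|2|3\n4|5|NA", file = pipe_path)
write_file("A,B,C\n1,2,3\n4,5,NA", file = csv_path)

# The default na = c("", "NA") turns the literal text "NA" into a missing value.
pipe_data <- read_delim(pipe_path, delim = "|", show_col_types = FALSE)
csv_data  <- read_csv(csv_path, show_col_types = FALSE)

pipe_data
csv_data
```

Both calls should return the same 2 x 3 tibble, since only the delimiter differs between the files.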
CC BY SA Posit Software, PBC • info@posit.co • posit.co • readr.tidyverse.org • readxl.tidyverse.org and googlesheets4.tidyverse.org • readr 2.1.4 • readxl 1.4.2 • googlesheets4 1.1.0 • Updated: 2023-05
Import Spreadsheets

with readxl

READ EXCEL FILES
read_excel(path, sheet = NULL, range = NULL) Read a .xls or .xlsx file based on the file extension. See front page for more read arguments. Also read_xls() and read_xlsx().
    read_excel("excel_file.xlsx")

READ SHEETS
read_excel(path, sheet = NULL) Specify which sheet to read by position or name.
    read_excel(path, sheet = 1)
    read_excel(path, sheet = "s1")
excel_sheets(path) Get a vector of sheet names.
    excel_sheets("excel_file.xlsx")
To read multiple sheets:
1. Get a vector of sheet names from the file path.
2. Set the vector names to be the sheet names.
3. Use purrr::map_dfr() to read multiple files into one data frame.
    path <- "your_file_path.xlsx"
    path |> excel_sheets() |>
      set_names() |>
      map_dfr(read_excel, path = path)

READXL COLUMN SPECIFICATION
Column specifications define what data type each column of a file will be imported as.
Use the col_types argument of read_excel() to set the column specification.
Guess column types
To guess a column type, read_excel() looks at the first 1000 rows of data. Increase with the guess_max argument.
    read_excel(path, guess_max = Inf)
Set all columns to same type, e.g. character
    read_excel(path, col_types = "text")
Set each column individually
    read_excel(
      path,
      col_types = c("text", "guess", "guess", "numeric")
    )
COLUMN TYPES
    logical   numeric   text    date         list
    TRUE      2         hello   1947-01-08   hello
    FALSE     3.45      world   1956-10-21   1
• skip • guess • logical • numeric • text • date • list
Use list for columns that include multiple data types. See tidyr and purrr for list-column data.

OTHER USEFUL EXCEL PACKAGES
For functions to write data to Excel files, see:
• openxlsx
• writexl
For working with non-tabular Excel data, see:
• tidyxl

with googlesheets4

READ SHEETS
read_sheet(ss, sheet = NULL, range = NULL) Read a sheet from a URL, a Sheet ID, or a dribble from the googledrive package. See front page for more read arguments. Same as range_read().

SHEETS METADATA
URLs are in the form: https://docs.google.com/spreadsheets/d/SPREADSHEET_ID/edit#gid=SHEET_ID
gs4_get(ss) Get spreadsheet meta data.
gs4_find(...) Get data on all spreadsheet files.
sheet_properties(ss) Get a tibble of properties for each worksheet. Also sheet_names().

WRITE SHEETS
write_sheet(data, ss = NULL, sheet = NULL) Write a data frame into a new or existing Sheet.
gs4_create(name, ..., sheets = NULL) Create a new Sheet with a vector of names, a data frame, or a (named) list of data frames.
sheet_append(ss, data, sheet = 1) Add rows to the end of a worksheet.

GOOGLESHEETS4 COLUMN SPECIFICATION
Column specifications define what data type each column of a file will be imported as.
Use the col_types argument of read_sheet()/range_read() to set the column specification.
Guess column types
To guess a column type, read_sheet()/range_read() looks at the first 1000 rows of data. Increase with guess_max.
    read_sheet(path, guess_max = Inf)
Set all columns to same type, e.g. character
    read_sheet(path, col_types = "c")
Set each column individually
    # col types: skip, guess, integer, logical, character
    read_sheet(ss, col_types = "_?ilc")
COLUMN TYPES
    l       n      c       D            L
    TRUE    2      hello   1947-01-08   hello
    FALSE   3.45   world   1956-10-21   1
• skip - "_" or "-"
• guess - "?"
• logical - "l"
• integer - "i"
• double - "d"
• numeric - "n"
• date - "D"
• datetime - "T"
• character - "c"
• list-column - "L"
• cell - "C" Returns list of raw cell data.
Use list for columns that include multiple data types. See tidyr and purrr for list-column data.

FILE LEVEL OPERATIONS
googlesheets4 also offers ways to modify other aspects of Sheets (e.g. freeze rows, set column width, manage (work)sheets). Go to googlesheets4.tidyverse.org to read more.
For whole-file operations (e.g. renaming, sharing, placing within a folder), see the tidyverse package googledrive at googledrive.tidyverse.org.

CELL SPECIFICATION FOR READXL AND GOOGLESHEETS4
Use the range argument of readxl::read_excel() or googlesheets4::read_sheet() to read a subset of cells from a sheet.
    read_excel(path, range = "Sheet1!B1:D2")
    read_sheet(ss, range = "B1:D2")
Also use the range argument with cell specification functions cell_limits(), cell_rows(), cell_cols(), and anchored().
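The multi-sheet steps and the range argument can be sketched with the example workbook that ships with readxl. This is an illustration under assumptions: it relies on `readxl_example("datasets.xlsx")` being available in the installed package, and it uses `purrr::map()` to return a named list rather than `map_dfr()`, because the bundled sheets do not share the same columns.

```r
library(readxl)
library(purrr)

# Bundled example workbook (assumed to be present in the installed readxl).
path <- readxl_example("datasets.xlsx")

# Steps 1 and 2: get the sheet names and use them as the vector's own names.
sheets <- path |> excel_sheets() |> set_names()

# Step 3 (variant): read every sheet into a named list of tibbles.
all_sheets <- map(sheets, function(s) read_excel(path, sheet = s))

# Read only a block of cells from one sheet, using "sheet!range" notation.
subset_cells <- read_excel(path, range = "mtcars!B1:D5")

names(all_sheets)
subset_cells
```

When every sheet has the same columns, the list can be row-bound into one data frame, which is what the `map_dfr()` pipeline in the steps above does in a single call.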
Data transformation with dplyr : : CHEAT SHEET
dplyr functions work with pipes and expect tidy data. In tidy data:
• Each variable is in its own column
• Each observation, or case, is in its own row
pipes: x |> f(y) becomes f(x, y)

Summarize Cases
Apply summary functions to columns to create a new table of summary statistics. Summary functions take vectors as input and return one value (see back).
summarize(.data, …) Compute table of summaries.
    mtcars |> summarize(avg = mean(mpg))
count(.data, …, wt = NULL, sort = FALSE, name = NULL) Count number of rows in each group defined by the variables in … Also tally(), add_count(), add_tally().
    mtcars |> count(cyl)

Group Cases
Use group_by(.data, …, .add = FALSE, .drop = TRUE) to create a "grouped" copy of a table grouped by columns in ... dplyr functions will manipulate each "group" separately and combine the results.
    mtcars |>
      group_by(cyl) |>
      summarize(avg = mean(mpg))
Use rowwise(.data, …) to group data into individual rows. dplyr functions will compute results for each row. Also apply functions to list-columns. See tidyr cheat sheet for list-column workflow.
    starwars |>
      rowwise() |>
      mutate(film_count = length(films))
ungroup(x, …) Returns ungrouped copy of table.
    g_mtcars <- mtcars |> group_by(cyl)
    ungroup(g_mtcars)

Manipulate Cases

EXTRACT CASES
Row functions return a subset of rows as a new table.
filter(.data, …, .preserve = FALSE) Extract rows that meet logical criteria.
    mtcars |> filter(mpg > 20)
distinct(.data, …, .keep_all = FALSE) Remove rows with duplicate values.
    mtcars |> distinct(gear)
slice(.data, …, .preserve = FALSE) Select rows by position.
    mtcars |> slice(10:15)
slice_sample(.data, …, n, prop, weight_by = NULL, replace = FALSE) Randomly select rows. Use n to select a number of rows and prop to select a fraction of rows.
    mtcars |> slice_sample(n = 5, replace = TRUE)
slice_min(.data, order_by, …, n, prop, with_ties = TRUE) and slice_max() Select rows with the lowest and highest values.
    mtcars |> slice_min(mpg, prop = 0.25)
slice_head(.data, …, n, prop) and slice_tail() Select the first or last rows.
    mtcars |> slice_head(n = 5)
Logical and boolean operators to use with filter()
    ==   <   <=   is.na()    %in%   |   xor()
    !=   >   >=   !is.na()   !      &
See ?base::Logic and ?Comparison for help.

ARRANGE CASES
arrange(.data, …, .by_group = FALSE) Order rows by values of a column or columns (low to high), use with desc() to order from high to low.
    mtcars |> arrange(mpg)
    mtcars |> arrange(desc(mpg))

ADD CASES
add_row(.data, …, .before = NULL, .after = NULL) Add one or more rows to a table.
    cars |> add_row(speed = 1, dist = 1)

Manipulate Variables

EXTRACT VARIABLES
Column functions return a set of columns as a new vector or table.
pull(.data, var = -1, name = NULL, …) Extract column values as a vector, by name or index.
    mtcars |> pull(wt)
select(.data, …) Extract columns as a table.
    mtcars |> select(mpg, wt)
relocate(.data, …, .before = NULL, .after = NULL) Move columns to new position.
    mtcars |> relocate(mpg, cyl, .after = last_col())
Use these helpers with select() and across(), e.g. mtcars |> select(mpg:cyl)
    contains(match)      num_range(prefix, range)       :, e.g., mpg:cyl
    ends_with(match)     all_of(x)/any_of(x, …, vars)   !, e.g., !gear
    starts_with(match)   matches(match)                 everything()

MANIPULATE MULTIPLE VARIABLES AT ONCE
    df <- tibble(x_1 = c(1, 2), x_2 = c(3, 4), y = c(4, 5))
across(.cols, .funs, …, .names = NULL) Summarize or mutate multiple columns in the same way.
    df |> summarize(across(everything(), mean))
c_across(.cols) Compute across columns in row-wise data.
    df |>
      rowwise() |>
      mutate(x_total = sum(c_across(1:2)))

MAKE NEW VARIABLES
Apply vectorized functions to columns. Vectorized functions take vectors as input and return vectors of the same length as output (see back).
mutate(.data, …, .keep = "all", .before = NULL, .after = NULL) Compute new column(s). Also add_column().
    mtcars |> mutate(gpm = 1 / mpg)
    mtcars |> mutate(gpm = 1 / mpg, .keep = "none")
rename(.data, …) Rename columns. Use rename_with() to rename with a function.
    mtcars |> rename(miles_per_gallon = mpg)
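To see how grouping changes what the same verbs compute, the sketch below runs a few of the calls above end to end. The object names (`avg_by_cyl`, `col_means`, `row_totals`) are assumptions introduced for the illustration; `df` matches the tibble defined in the sheet.

```r
library(dplyr)

# Grouped summary: one row per cylinder count.
avg_by_cyl <- mtcars |>
  group_by(cyl) |>
  summarize(avg = mean(mpg))

# Same tibble as in the sheet.
df <- tibble(x_1 = c(1, 2), x_2 = c(3, 4), y = c(4, 5))

# Column-wise: apply mean() to every column.
col_means <- df |> summarize(across(everything(), mean))

# Row-wise: sum the first two columns within each row, then drop the row grouping.
row_totals <- df |>
  rowwise() |>
  mutate(x_total = sum(c_across(1:2))) |>
  ungroup()

avg_by_cyl
col_means
row_totals
```

Calling `ungroup()` at the end of the row-wise pipeline is a habit worth keeping, since later verbs would otherwise keep operating one row at a time.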
CC BY SA Posit Software, PBC • info@posit.co • posit.co • Learn more at dplyr.tidyverse.org • dplyr 1.1.2 • Updated: 2023-05
Vectorized Functions
TO USE WITH MUTATE ()
mutate() applies vectorized functions to columns to create new columns.
    … TRUE ~ "other"))
dplyr::coalesce() - first non-NA values by element across a set of vectors
dplyr::if_else() - element-wise if() + else()
dplyr::na_if() - replace specific values with NA

Summary Functions
TO USE WITH SUMMARIZE ()
summarize() applies summary functions to columns to create a new table.

Row Names
Tidy data does not use rownames, which store a variable outside of the columns. To work with the rownames, first move them into a column.
tibble::rownames_to_column() Move row names into col.

Combine Tables

COMBINE VARIABLES
Use by = c("col1", "col2", …) to specify one or more common columns to match on.
    left_join(x, y, by = "A")
    A B.x C B.y D
    a t   1 t   3
    b u   2 u   2
    c v   3 NA  NA

COMBINE CASES
intersect(x, y, …) Rows that appear in both x and y.
setdiff(x, y, …) Rows that appear in x but not y.
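A short sketch of the helpers and table-combining functions named above. The tables `x`, `y`, and `y2` are assumptions built for the illustration (chosen so that the join reproduces the A, B.x, C, B.y, D layout shown for left_join()); they are not taken from the sheet's diagrams.

```r
library(dplyr)

# Hypothetical input tables for the illustration.
x  <- tibble(A = c("a", "b", "c"), B = c("t", "u", "v"), C = 1:3)
y  <- tibble(A = c("a", "b", "d"), B = c("t", "u", "w"), D = 3:1)
y2 <- tibble(A = c("c", "d"), B = c("v", "w"), C = c(3L, 4L))

# Combine variables: keep every row of x, add matching columns from y.
joined <- left_join(x, y, by = "A")

# Combine cases: set operations need the same columns on both sides.
in_both   <- intersect(x, y2)
only_in_x <- setdiff(x, y2)

# Vectorized helpers.
filled  <- coalesce(c(NA, 2), 0)               # replace NA with a fallback
labeled <- if_else(c(1, 5) > 3, "high", "low") # element-wise if/else
blanked <- na_if(c(1, -99), -99)               # turn a sentinel value into NA

# Move row names into a regular column.
with_names <- tibble::rownames_to_column(head(mtcars, 2), var = "model")

joined
```

Because both `x` and `y` contain a column `B` that is not part of `by`, the join keeps both and disambiguates them with the `.x` and `.y` suffixes.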
Shiny : : CHEAT SHEET
Building an App
To generate the template, type shinyapp and press Tab in the RStudio IDE or go to File > New Project > New Directory > Shiny Web Application.
A Shiny app is a web page (ui) connected to a computer running a live R session (server).
Users can manipulate the UI, which will cause the server to update the UI's displays (by running R code).

    # app.R
    library(shiny)

    # In ui nest R functions to build an HTML interface.
    # Customize the UI with Layout Functions.
    ui <- fluidPage(
      # Add Inputs with *Input() functions
      numericInput(inputId = "n", "Sample size", value = 25),
      # Add Outputs with *Output() functions
      plotOutput(outputId = "hist")
    )

    # Tell the server how to render outputs and respond to inputs with R.
    server <- function(input, output, session) {
      # Wrap code in render*() functions before saving to output.
      # Refer to UI inputs with input$<id> and outputs with output$<id>.
      output$hist <- renderPlot({
        hist(rnorm(input$n))
      })
    }

    # Call shinyApp() to combine ui and server into an interactive app!
    shinyApp(ui = ui, server = server)

Save your template as app.R. Keep your app in a directory along with optional extra files.
    app-name/      The directory name is the app name
      app.R
      DESCRIPTION  (optional) used in showcase mode
      README       (optional) used in showcase mode
      R/           (optional) directory of supplemental .R files that are sourced automatically, must be named "R"
      www/         (optional) directory of files to share with web browsers (images, CSS, .js, etc.), must be named "www"
Launch apps stored in a directory with runApp(<path to directory>).
See annotated examples of Shiny apps by running runExample(<example name>). Run runExample() with no arguments for a list of example names.

Share
Share your app in three ways:
1. Host it on shinyapps.io, a cloud based service from RStudio. To deploy Shiny apps:
   • Create a free or professional account at shinyapps.io
   • Click the Publish icon in RStudio IDE, or run: rsconnect::deployApp("<path to directory>")
2. Purchase RStudio Connect, a publishing platform for R and Python. rstudio.com/products/connect/
3. Build your own Shiny Server. rstudio.com/products/shiny/shiny-server/

Outputs
render*() and *Output() functions work together to add R output to the UI.
DT::renderDataTable(expr, options, searchDelay, callback, escape, env, quoted, outputArgs)
    works with dataTableOutput(outputId)
renderImage(expr, env, quoted, deleteFile, outputArgs)
    works with imageOutput(outputId, width, height, click, dblclick, hover, brush, inline)
renderPlot(expr, width, height, res, …, alt, env, quoted, execOnResize, outputArgs)
    works with plotOutput(outputId, width, height, click, dblclick, hover, brush, inline)
renderPrint(expr, env, quoted, width, outputArgs)
    works with verbatimTextOutput(outputId, placeholder)
renderTable(expr, striped, hover, bordered, spacing, width, align, rownames, colnames, digits, na, …, env, quoted, outputArgs)
    works with tableOutput(outputId)
renderText(expr, env, quoted, outputArgs, sep)
    works with textOutput(outputId, container, inline)
renderUI(expr, env, quoted, outputArgs)
    works with uiOutput(outputId, inline, container, …) and htmlOutput(outputId, inline, container, …)
These are the core output types. See htmlwidgets.org for many more options.

Inputs
Collect values from the user.
Access the current value of an input object with input$<inputId>. Input values are reactive.
actionButton(inputId, label, icon, width, …)
actionLink(inputId, label, icon, …)
checkboxGroupInput(inputId, label, choices, selected, inline, width, choiceNames, choiceValues)
checkboxInput(inputId, label, value, width)
dateInput(inputId, label, value, min, max, format, startview, weekstart, language, width, autoclose, datesdisabled, daysofweekdisabled)
dateRangeInput(inputId, label, start, end, min, max, format, startview, weekstart, language, separator, width, autoclose)
fileInput(inputId, label, multiple, accept, width, buttonLabel, placeholder)
numericInput(inputId, label, value, min, max, step, width)
passwordInput(inputId, label, value, width, placeholder)
radioButtons(inputId, label, choices, selected, inline, width, choiceNames, choiceValues)
selectInput(inputId, label, choices, selected, multiple, selectize, width, size) Also selectizeInput()
sliderInput(inputId, label, min, max, value, step, round, format, locale, ticks, animate, width, sep, pre, post, timeFormat, timezone, dragRange)
submitButton(text, icon, width) (Prevent reactions for entire app)
textInput(inputId, label, value, width, placeholder) Also textAreaInput()
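The pairing of an input, a render*() function, and an *Output() function can be sketched as a second small app. This is an illustration, not the sheet's template: the input and output IDs are assumptions, and `testServer()` (available in shiny 1.5.0 and later) is used only as one way to check the server logic without opening a browser.

```r
library(shiny)

# UI: one input and one output placeholder (IDs are assumptions for this sketch).
ui <- fluidPage(
  sliderInput(inputId = "n", label = "Sample size", min = 1, max = 100, value = 25),
  textOutput(outputId = "summary")
)

# Server: renderText() pairs with textOutput(); it reruns whenever input$n changes.
server <- function(input, output, session) {
  output$summary <- renderText({
    paste("You asked for", input$n, "draws")
  })
}

# Combine ui and server. Assigning the result does not launch the app.
app <- shinyApp(ui = ui, server = server)

# Exercise the server logic headlessly by setting the input by hand.
testServer(server, {
  session$setInputs(n = 10)
  print(output$summary)
})
```

To launch the app interactively, pass the object to `runApp(app)` or print it at the console.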
CC BY SA Posit Software, PBC • info@posit.co • posit.co • Learn more at shiny.rstudio.com • Font Awesome 5.15.3 • shiny 1.6.0 • Updated: 2021-07
UI
An app's UI is an HTML document. Use Shiny's functions to assemble this HTML with R.
    fluidPage(
      textInput("a","")
    )
Returns HTML:
    ## <div class="container-fluid">
    ##   <div class="form-group shiny-input-container">
    ##     <label for="a"></label>
    ##     <input id="a" type="text"
    ##       class="form-control" value=""/>
    ##   </div>
    ## </div>

Add static HTML elements with tags, a list of functions that parallel common HTML tags, e.g. tags$a(). Unnamed arguments will be passed into the tag; named arguments will become tag attributes.
Run names(tags) for a complete list.
    tags$h1("Header") -> <h1>Header</h1>
The most common tags have wrapper functions. You do not need to prefix their names with tags$
    ui <- fluidPage(
      h1("Header 1"),
      hr(),
      br(),
      p(strong("bold")),
      p(em("italic")),
      p(code("code")),
      a(href="", "link"),
      HTML("<p>Raw html</p>")
    )

To include a CSS file, use includeCSS(), or
1. Place the file in the www subdirectory
2. Link to it with:
    tags$head(tags$link(rel = "stylesheet",
      type = "text/css", href = "<file name>"))

To include JavaScript, use includeScript() or
1. Place the file in the www subdirectory
2. Link to it with:
    tags$head(tags$script(src = "<file name>"))

IMAGES
To include an image:
1. Place the file in the www subdirectory
2. Link to it with img(src="<file name>")

Layouts
Combine multiple elements into a "single element" that has its own properties with a panel function, e.g.
    wellPanel(
      dateInput("a", ""),
      submitButton()
    )
    absolutePanel()      navlistPanel()
    conditionalPanel()   sidebarPanel()
    fixedPanel()         tabPanel()
    headerPanel()        tabsetPanel()
    inputPanel()         titlePanel()
    mainPanel()          wellPanel()

Organize panels and elements into a layout with a layout function. Add elements as arguments of the layout functions.
sidebarLayout()
    ui <- fluidPage(
      sidebarLayout(
        sidebarPanel(),
        mainPanel()
      )
    )
fluidRow()
    ui <- fluidPage(
      fluidRow(column(width = 4),
        column(width = 2, offset = 3)),
      fluidRow(column(width = 12))
    )
Also flowLayout(), splitLayout(), verticalLayout(), fixedPage(), and fixedRow().

Layer tabPanels on top of each other, and navigate between them, with:
    ui <- fluidPage( tabsetPanel(
      tabPanel("tab 1", "contents"),
      tabPanel("tab 2", "contents"),
      tabPanel("tab 3", "contents")))

    ui <- fluidPage( navlistPanel(
      tabPanel("tab 1", "contents"),
      tabPanel("tab 2", "contents"),
      tabPanel("tab 3", "contents")))

    ui <- navbarPage(title = "Page",
      tabPanel("tab 1", "contents"),
      tabPanel("tab 2", "contents"),
      tabPanel("tab 3", "contents"))

Reactivity
Reactive values work together with reactive functions. Call a reactive value from within the arguments of one of these functions to avoid the error Operation not allowed without an active reactive context.

CREATE YOUR OWN REACTIVE VALUES
*Input() functions (see front page)
Each input function creates a reactive value stored as input$<inputId>.
    # *Input() example
    ui <- fluidPage(
      textInput("a","","A")
    )
reactiveValues(…)
Creates a list of reactive values whose values you can set.
    #reactiveValues example
    server <- function(input,output){
      rv <- reactiveValues()
      rv$number <- 5
    }

RENDER REACTIVE OUTPUT
render*() functions (see front page)
Builds an object to display. Will rerun code in body to rebuild the object whenever a reactive value in the code changes. Save the results to output$<outputId>.
    library(shiny)
    ui <- fluidPage(
      textInput("a","","A"),
      textOutput("b")
    )
    server <- function(input,output){
      output$b <-
        renderText({
          input$a
        })
    }
    shinyApp(ui, server)

CREATE REACTIVE EXPRESSIONS
reactive(x, env, quoted, label, domain)
Reactive expressions:
• cache their value to reduce computation
• can be called elsewhere
• notify dependencies when invalidated
Call the expression with function syntax, e.g. re().
    library(shiny)
    ui <- fluidPage(
      textInput("a","","A"),
      textInput("z","","Z"),
      textOutput("b"))
    server <- function(input,output){
      re <- reactive({
        paste(input$a,input$z)})
      output$b <- renderText({
        re()
      })
    }
    shinyApp(ui, server)

PERFORM SIDE EFFECTS
observeEvent(eventExpr, handlerExpr, event.env, event.quoted, handler.env, handler.quoted, ..., label, suspended, priority, domain, autoDestroy, ignoreNULL, ignoreInit, once)
Runs code in 2nd argument when reactive values in 1st argument change. See observe() for alternative.
    library(shiny)
    ui <- fluidPage(
      textInput("a","","A"),
      actionButton("go","Go")
    )
    server <- function(input,output){
      observeEvent(input$go,{
        print(input$a)
      })
    }
    shinyApp(ui, server)
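The reactive-expression pattern described on this page (compute once in reactive(), reuse with function-call syntax) can be checked without a browser. The sketch below is an illustration under assumptions: the name `full_text` is hypothetical, and `testServer()` from shiny 1.5.0 or later stands in for a running session.

```r
library(shiny)

server <- function(input, output, session) {
  # Reactive expression: caches its value until input$a or input$z changes.
  full_text <- reactive({
    paste(input$a, input$z)
  })

  # Call the expression like a function to read its current value.
  output$b <- renderText({
    full_text()
  })
}

# Drive the server by hand: set inputs, then read the reactive and the output.
testServer(server, {
  session$setInputs(a = "A", z = "Z")
  print(full_text())
  print(output$b)
})
```

Reading `input$a` directly at the top level of `server`, outside `reactive()` or a render function, is what triggers the "Operation not allowed without an active reactive context" error mentioned on this page.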
REACT BASED ON EVENT REMOVE REACTIVITY
eventReactive(eventExpr, isolate(expr)
library(shiny)
ui <- fluidPage( valueExpr, event.env, library(shiny)