K25 Parsing
3 Valid inputs in R
4 Separators
5 Metaprogramming
6 Using expressions
Example:
# The file parsing_example.txt contains the following statements:
# x <- 5 + 3
# y <- 1:10
# z <- sum(4, 6, 9)
parse("parsing_example.txt")
eval(parse("parsing_example.txt"))
ls()
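To try this example without creating a file, the same statements can be handed to parse() as text. This is only a minimal sketch: the variable names code, exprs and env are illustrative, and the separate environment is an added assumption that keeps the assignments out of the global workspace.

```r
# The three statements from parsing_example.txt, supplied as text instead of a file
code <- c("x <- 5 + 3", "y <- 1:10", "z <- sum(4, 6, 9)")
exprs <- parse(text = code)   # expression vector; nothing is evaluated yet
length(exprs)
## [1] 3

env <- new.env()              # illustrative: evaluate in a separate environment
eval(exprs, envir = env)      # evaluates the three statements in order
ls(env)
## [1] "x" "y" "z"
```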
[Figure: two parse trees with numbered nodes. The first tree contains the operators +, /, * and :. The second tree shows an assignment (<-) of a function with the arguments x and y to add; its body, enclosed in { }, returns x + y.]
Daniel Horn & Sheila Görz Advanced R Summer Semester 2022 16 / 67
The internal representation - Expressions
## \- ()
##   \- `sin
##   \- ()
##     \- `+
##     \- 2
##     \- 3

## \- ()
##   \- `mean
##   \- ()
##     \- `:
##     \- 1
##     \- 10
##   \- 0.1
On the next couple of slides, we will take a look at the structure and
contents of an AST.
pryr::ast(2)
## \- 2
pryr::ast("a")
## \- "a"
pryr::ast(TRUE)
## \- TRUE
Constants are always scalars, i.e. vectors of length 1.
pryr::ast(x)
## \- `x
pryr::ast(sin)
## \- `sin
pryr::ast(`+`)
## \- `+
Most functions are also accessed through symbols.
Function calls I
## \- ()
##   \- `sin
##   \- 2

## \- ()
##   \- `sum
##   \- 1L
##   \- 2L
##   \- 3L
Being an inner node of the AST, every function call has one or more children,
which are trees (ASTs) themselves. The first child always denotes the called
function, while the remaining children represent the function's arguments.
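The same structure can be inspected with base R alone, without pryr. A minimal sketch, in which the variable names cl and parts are illustrative:

```r
cl <- quote(sum(1L, 2L, 3L))   # an unevaluated call
parts <- as.list(cl)           # the children of the call node

parts[[1]]                     # first child: the symbol naming the function
## sum
is.symbol(parts[[1]])
## [1] TRUE
length(parts) - 1              # remaining children: the arguments
## [1] 3
```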
Function calls II
## \- ()
##   \- `sin
##   \- ()
##     \- `cos
##     \- `x

## \- ()
##   \- `mean
##   \- ()
##     \- `seq
##     \- 1
##     \- 10
It's not advisable to rely on the order of the arguments, because the
argument matching rules in R are rather complicated:
## \- ()
##   \- `mean
##   \- ()
##     \- `c
##     \- 1
##     \- 2
##   \- 0.1
##   \- TRUE

## \- ()
##   \- `mean
##   \- 0.1
##   \- `MISSING
##   \- TRUE
##   \- ()
##     \- `c
##     \- 1
##     \- 2
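One way to avoid depending on argument order is to let R apply its matching rules first. The sketch below uses match.call() with an explicit function definition; the two example calls c1 and c2 are hypothetical and not taken from the slides, and mean.default is used because it carries the formals x, trim and na.rm.

```r
# Two calls that differ in argument order and naming (illustrative)
c1 <- quote(mean(c(1, 2), 0.1, TRUE))
c2 <- quote(mean(trim = 0.1, na.rm = TRUE, x = c(1, 2)))

# match.call() matches the arguments against the given function definition
s1 <- match.call(mean.default, c1)
s2 <- match.call(mean.default, c2)
s1
## mean(x = c(1, 2), trim = 0.1, na.rm = TRUE)

# After standardising, arguments can be looked up by name instead of position
s1$trim
## [1] 0.1
s2$trim
## [1] 0.1
```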
Function calls IV
## \- ()
##   \- `+
##   \- 2
##   \- 3

## \- ()
##   \- `while
##   \- `cond
##   \- ()
##     \- `=
##     \- `cond
##     \- TRUE
Almost no other elements can appear inside an AST, so constants, symbols
and calls suffice to describe the internal structure of nearly every piece
of R code. The one exception:
Pairlists
## \- ()
##   \- `function
##   \- []
##     \ x = 2
##   \- `x
##   \- <srcref>
Pairlists are used in R mostly for the formals of a function. As such, they
appear in the AST whenever the code contains a call to the function named
function. Pairlists themselves can in turn contain constants, symbols and calls.
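The pairlist can also be reached with base R by quoting a function definition and looking at the second child of the resulting call. A minimal sketch; the names f_call and fmls are illustrative:

```r
f_call <- quote(function(x = 2) x)   # a call to `function`, not yet a function
is.call(f_call)
## [1] TRUE

fmls <- f_call[[2]]                  # second child: the formals
typeof(fmls)
## [1] "pairlist"
names(fmls)
## [1] "x"
fmls$x                               # the default value, a constant
## [1] 2
f_call[[3]]                          # third child: the body, here a symbol
## x
```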
Valid inputs in R
2.1e-2
## [1] 0.021
2.1E2
## [1] 210
For numbers with an absolute value between 0 and 1, the leading zero
can be omitted:
.5
## [1] 0.5
-.5
## [1] -0.5
0xa
## [1] 10
0x123
## [1] 291
2 Either start with a letter or a dot. When starting with a dot, the
second character must not be a digit.
3 Blank characters inside of variable names are not permitted.
4 It’s not allowed to use reserved words (more on them shortly).
We can circumvent these rules by using backticks. E.g.: a§b is not a valid
name, whereas `a§b` is a valid name.
a§b
## Error: <text>:1:2: unexpected input
## 1: a§
##      ^
`a§b` <- 2
`a§b`
## [1] 2
.a <- 5
ls()
ls(all.names = TRUE)
There are in total 19 words in R that are not allowed as variable names.
They are:
if and else,
for, while and repeat,
in, next and break,
function,
TRUE and FALSE,
NULL, Inf and NaN,
NA, NA_integer_, NA_real_, NA_complex_ and NA_character_.
A complete list can also be found via ?Reserved.
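The naming rules and the reserved words can be checked programmatically. The helper is_syntactic() below is hypothetical (it is not part of base R); it relies on the fact that make.names() returns a valid name unchanged and alters an invalid or reserved one.

```r
# Hypothetical helper: a name is syntactic if make.names() leaves it unchanged
is_syntactic <- function(n) make.names(n) == n

is_syntactic(c("abc", ".a", ".2a", "a b", "if", "TRUE"))
## [1]  TRUE  TRUE FALSE FALSE FALSE FALSE

# Backticks still allow a non-syntactic name to be used
`a b` <- 1
get("a b")
## [1] 1
```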
Separators
Separators I
x<-5
x< -5
## [1] FALSE
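That a single blank changes the meaning can be confirmed by quoting both inputs and looking at the function each call refers to. A minimal sketch with illustrative variable names:

```r
assign_call  <- quote(x<-5)    # parsed as an assignment
compare_call <- quote(x< -5)   # parsed as a comparison with -5

as.character(assign_call[[1]])
## [1] "<-"
as.character(compare_call[[1]])
## [1] "<"
```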
Separators II
Metaprogramming
What is metaprogramming?
1:10
## [1] 1 2 3 4 5 6 7 8 9 10
quote(1:10)
## 1:10
quote() returns its input without evaluating it. Alternatively, we can also
create an expression using parse() without directly evaluating it.
## language 1:10
## [1] "expression"
typeof(expr.obj[[1]])
## [1] "language"
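The difference between the two approaches can be sketched as follows. The definition of expr.obj is an assumption here, since the call that created it is not shown above; quote.obj is an illustrative name.

```r
quote.obj <- quote(1:10)           # a single call
expr.obj  <- parse(text = "1:10")  # assumed definition: an expression vector

typeof(quote.obj)
## [1] "language"
typeof(expr.obj)
## [1] "expression"
typeof(expr.obj[[1]])
## [1] "language"

# The first element of the expression vector is the same call that quote() returns
identical(expr.obj[[1]], quote.obj)
## [1] TRUE
```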
Evaluating an expression
## [1] 5 ## [1] 1
eval() is the exact opposite of quote(). Their calls neutralize each other:
eval(quote(eval(eval(quote(quote(2 + 2))))))
## [1] 4
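eval() can also be told where to look up the symbols of an expression. A minimal sketch, assuming a variable x that is not taken from the slides:

```r
e <- quote(x + 1)
x <- 1

r1 <- eval(e)                 # x is looked up in the current environment
r1
## [1] 2
r2 <- eval(e, list(x = 4))    # x is taken from the supplied list instead
r2
## [1] 5
```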
Modifying an expression I
Modifying an expression II
Findings: Expressions have a length and can be subsetted similarly to lists.
Just like for lists: [ returns an expression with the respective elements, [[
returns the content of the element.
typeof(expr[1])
typeof(expr[[1]])
typeof(expr[2])
typeof(expr[[2]])
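For a concrete call, these four lines behave as sketched below. That expr holds the call mean(1, 2, 3) is an assumption made for this illustration.

```r
expr <- quote(mean(1, 2, 3))   # assumed example call
length(expr)                   # function name plus three arguments
## [1] 4

typeof(expr[1])                # [ keeps the container: still a call
## [1] "language"
typeof(expr[[1]])              # [[ returns the element: the symbol mean
## [1] "symbol"
typeof(expr[2])
## [1] "language"
typeof(expr[[2]])              # the constant 1
## [1] "double"
```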
The first element always denotes the function that is being called. This
element can be a symbol or a call itself (constants that are also functions
don’t exist).
## factory()
The other elements of the call are the arguments of the function call. They
are potentially named and may be referenced by their name.
Modifying an expression IV
Just like vectors, calls can be modified using the regular replacement
functions $<- and [[<-:
## mean(1, 2, 3)
## mean(5, 2, 3)
expr[[1]] = 1
expr
## 1(5, 2, 3)
expr[-1]
## 5(2, 3)
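A modified call remains an ordinary call and can be evaluated afterwards. The following sketch uses a hypothetical call to sum() with the argument names a and b, chosen only so that $<- can address an argument by name:

```r
expr <- quote(sum(a = 1, b = 2))
r1 <- eval(expr)
r1
## [1] 3

expr$b <- 10                   # replace an argument by name
expr
## sum(a = 1, b = 10)

expr[[1]] <- as.name("prod")   # replace the called function by another symbol
r2 <- eval(expr)
r2
## [1] 10
```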
Creating a call
To manually generate a call on our own, we can use the functions call()
and as.call():
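A minimal sketch of both functions, with arguments chosen only for illustration: call() takes the function name as a string followed by the arguments, while as.call() takes a list whose first element is the function.

```r
c1 <- call("sum", 1, 2, 3)
c2 <- as.call(list(as.name("sum"), 1, 2, 3))

c1
## sum(1, 2, 3)
identical(c1, c2)
## [1] TRUE
r <- eval(c1)
r
## [1] 6
```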
Sometimes, it’s useful for functions to know their own call. That’s what the
functions sys.call() and match.call() are for:
For example, this is used by the function lm() to return the call as well:
lm(mpg ~ wt, data = mtcars)$call
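The difference between the two functions can be sketched with two hypothetical functions, f_sys() and f_match(): sys.call() returns the call as it was written, whereas match.call() returns it with all arguments matched to their full names.

```r
f_sys   <- function(x, y) sys.call()
f_match <- function(x, y) match.call()

s <- f_sys(1, y = 2)
s
## f_sys(1, y = 2)
m <- f_match(1, y = 2)
m
## f_match(x = 1, y = 2)
```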
Using expressions
[Figure: plot of sin(x) against x, with the axis labels x and sin(x).]
##   x y x...y
## 1 1 2     3
## 2 2 3     5
## 3 3 4     7
## 4 4 5     9
## 5 5 6    11
## 6 6 7    13
## 7 7 8    15
How does R know the axis labels? How does R know column names?
In both examples, the respective R functions not only use the values of
their input parameters, but also the corresponding expressions.
Let’s try to recreate this behavior using quote() while reducing the
functionality to a bare minimum:
f = function(x)
  quote(x)
f(1:10)
## x
f(sin(x))
## x
substitute()
f = function(x)
  substitute(x)
f(1:10)
## 1:10
f(sin(x))
## sin(x)
g = function(x)
  deparse(substitute(x))
g(1:10)
## [1] "1:10"
g(sin(x))
## [1] "sin(x)"
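Combining deparse(substitute(x)) with the value of x gives the kind of automatic labelling seen in the plot and data frame examples. The function describe() below is a hypothetical sketch, not how plot() or data.frame() are actually implemented; note also that deparse() can return more than one string for very long expressions.

```r
# Hypothetical helper: uses both the expression and the value of its argument
describe <- function(x) {
  label <- deparse(substitute(x))   # the expression the caller typed
  paste0("mean(", label, ") = ", mean(x))
}

res <- describe(1:10)
res
## [1] "mean(1:10) = 5.5"
```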
library("e1071")
library(e1071)
Why does the second call work? After all, there is no e1071 object:
e1071
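The idea can be sketched with a hypothetical function pkg_name() that never evaluates its argument, but converts the captured expression to a string. library() is assumed to work along similar lines; its actual implementation handles further cases.

```r
# Hypothetical sketch: accept a package name with or without quotes
pkg_name <- function(package) as.character(substitute(package))

n1 <- pkg_name(e1071)     # a symbol; no object called e1071 is needed
n1
## [1] "e1071"
n2 <- pkg_name("e1071")   # a string constant
n2
## [1] "e1071"
```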
f = function(x)
  substitute(x)
g = function(x)
  deparse(f(x))
g(1:10)

h = function(x)
  deparse(substitute(x))
h(1:10)
substitute() returns the parse tree for the (unevaluated) expression expr,
substituting any variables bound in env.
h = function(x)
  deparse(expr = substitute(x))
h(1:10)
## [1] "1:10"
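Running the definitions of f, g and h shows why substitute() has to be called directly inside the function whose argument we are interested in. Inside f(), the argument was supplied by g() as the symbol x, so that symbol is all substitute() can see:

```r
f <- function(x) substitute(x)
g <- function(x) deparse(f(x))            # substitute() happens one level too deep
h <- function(x) deparse(substitute(x))   # substitute() sees the caller's expression

rg <- g(1:10)
rg
## [1] "x"
rh <- h(1:10)
rh
## [1] "1:10"
```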
How does subset() work? The variables a, b and c don’t exist in any
environment, yet no error occurs.
y = 4
mySubset(sample.df, a >= y)
expr = 4
mySubset(sample.df, a >= expr)
Loopholes I
Loopholes II
During the evaluation of mySubset(), condition is replaced by
the corresponding expression, which is stored in expr.
However, this expression is merely the symbol condition, which itself
is bound to a promise object.
The desired expression is located in the environment of this promise
object and is inaccessible to us.
When calling eval(), the symbol condition is then evaluated inside
the specified environment. Evaluating a symbol means (cf. ?eval)
replacing it by its value.
The calling environment of mySubset() is the execution environment
of subscramble() where the symbol condition is bound to a
promise object. Thus, the result of eval() is this promise object.
Subsequently, the promise object is evaluated in its own environment
where, however, no binding of a exists.
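The steps above can be reproduced with a small sketch. The definitions below are assumptions: the slides' own code for mySubset() and subscramble() is not shown here, so this is one plausible minimal version, and the contents of sample.df are illustrative.

```r
sample.df <- data.frame(a = 1:5, b = 5:1, c = c(5, 3, 1, 4, 1))  # assumed data

# Assumed definition: capture the condition, evaluate it inside the data frame
mySubset <- function(x, condition) {
  expr <- substitute(condition)
  rows <- eval(expr, x, parent.frame())
  x[rows, ]
}

# Assumed wrapper: simply passes its own argument on to mySubset()
subscramble <- function(x, condition) {
  mySubset(x, condition)
}

direct <- mySubset(sample.df, a >= 4)   # works: expr is the call a >= 4
nrow(direct)
## [1] 2

# Called through the wrapper, expr is only the symbol condition. This usually
# fails because a cannot be found, unless an object a exists where the call is made.
nested <- tryCatch(
  subscramble(sample.df, a >= 4),
  error = function(e) conditionMessage(e)
)
nested
```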
Loopholes III
Loopholes IV
##   a b c
## 4 4 2 4
## 5 5 1 1
y = 2
substitute(a + b, list(a = y))
## 2 + b
substitute(a + b, list(`+` = quote(`*`)))
## a * b
## y + b
## 1(a, b)
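Whether a value or a symbol ends up in the result depends on what the list contains. The following sketch contrasts the two cases; the call that uses quote(y) is an assumption about how an output such as y + b can arise, not a line taken from the slides.

```r
y <- 2

s1 <- substitute(a + b, list(a = y))          # the value of y is inserted
deparse(s1)
## [1] "2 + b"

s2 <- substitute(a + b, list(a = quote(y)))   # the symbol y is inserted
deparse(s2)
## [1] "y + b"

# If every symbol is replaced by a value, the result can be evaluated directly
r <- eval(substitute(a + b, list(a = 2, b = 3)))
r
## [1] 5
```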