Data Manipulation With R - 1
INTRODUCTION
What is data.table?
DATA.TABLE
General form
DT[i, j, by]
Take DT, subset rows using i, then calculate j grouped by by.

R:    i       j        by
SQL:  WHERE   SELECT   GROUP BY

mtcarsDT[
  i,                                 # which rows (WHERE)
  .(AvgHP = mean(hp),
    "MinWT(kg)" = min(wt*453.6)),    # what to compute (SELECT); wt is in 1000 lbs
  by                                 # grouped by what (GROUP BY)
]
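The general form can be tried on the built-in mtcars data. This is a minimal sketch, not the slide's exact call: the row filter `mpg > 20` and the grouping column `cyl` are assumptions chosen for illustration, while the `j` expression is the one shown above.

```r
library(data.table)

# mtcars ships with base R; its wt column is recorded in units of 1000 lbs
mtcarsDT <- as.data.table(mtcars)

res <- mtcarsDT[
  mpg > 20,                          # i:  rows to keep (assumed filter), like WHERE
  .(AvgHP = mean(hp),                # j:  what to compute, like SELECT
    "MinWT(kg)" = min(wt * 453.6)),  #     1000 lbs * 453.6 = kg
  by = .(cyl)                        # by: one result row per group (assumed), like GROUP BY
]
print(res)
```

The result has one row per value of `cyl` that survives the filter, with the grouping column first and the two computed columns after it.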
> DT
   A B          C    D
1: 1 a -0.6264538 TRUE
2: 2 b  0.1836433 TRUE
3: 3 c -0.8356286 TRUE
4: 4 a  1.5952808 TRUE
5: 5 b  0.3295078 TRUE
6: 6 c -0.8204684 TRUE

typeof(1)           == double
typeof(1L)          == integer
typeof(NA)          == logical
typeof(NA_integer_) == integer
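The type notes can be checked directly in base R. A small sketch; the variable names `types` and `col_type` are only for illustration.

```r
# A bare numeric literal is a double; the L suffix makes it an integer.
# A plain NA is logical; NA_integer_ is the integer-typed missing value.
types <- c(
  "1"           = typeof(1),
  "1L"          = typeof(1L),
  "NA"          = typeof(NA),
  "NA_integer_" = typeof(NA_integer_)
)
print(types)

# A sequence such as 1:6 is stored as integer
col_type <- typeof(1:6)
print(col_type)
```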
> DT
   A B
1: 1 a
2: 2 b
3: 3 c
4: 4 a
5: 5 b
6: 6 c

> DT[3:5,]
   A B
1: 3 c
2: 4 a
3: 5 b

> DT[3:5]
   A B
1: 3 c
2: 4 a
3: 5 b
data.frame
> DF
  A B
1 1 a
2 2 b
3 3 c
4 4 a
5 5 b
6 6 c

> DF[3:5,]
  A B
3 3 c
4 4 a
5 5 b

> DF[3:5]
Error: undefined columns selected
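The difference in row selection can be reproduced side by side. A minimal sketch, assuming the two-column DT and DF shown here; the result names (`dt_rows`, `df_err`, and so on) are illustrative.

```r
library(data.table)

DT <- data.table(A = 1:6, B = rep(c("a", "b", "c"), 2))
DF <- data.frame(A = 1:6, B = rep(c("a", "b", "c"), 2))

dt_rows  <- DT[3:5]     # data.table: a lone index selects rows
dt_rows2 <- DT[3:5, ]   # same rows with the trailing comma
df_rows  <- DF[3:5, ]   # data.frame: the comma is needed to select rows

# Without the comma a data.frame reads 3:5 as column positions.
# DF has only two columns, so the call fails; capture the message.
df_err <- tryCatch(DF[3:5], error = function(e) conditionMessage(e))
print(df_err)
```

Note that the data.table result is renumbered from 1, while the data.frame keeps the original row names 3, 4, 5.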
Compatibility
Let's practice
Selecting columns in j
> DT
   A B  C
1: 1 a  6
2: 2 b  7
3: 3 c  8
4: 4 d  9
5: 5 e 10

> DT[, .(B, C)]
   B  C
1: a  6
2: b  7
3: c  8
4: d  9
5: e 10
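Column selection in j can be sketched as follows, assuming the five-row DT from this slide. The second call is an addition for contrast: a bare column name in j returns a plain vector rather than a data.table.

```r
library(data.table)

DT <- data.table(A = 1:5, B = c("a", "b", "c", "d", "e"), C = 6:10)

# .() is an alias for list(); a list in j returns a data.table
sel <- DT[, .(B, C)]
print(sel)

# A bare column name in j returns the column as a vector
vec <- DT[, B]
print(vec)
```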
Computing on columns
> DT
   A B  C
1: 1 a  6
2: 2 b  7
3: 3 c  8
4: 4 d  9
5: 5 e 10

> DT[, .(Total = sum(A), Mean = mean(C))]
   Total Mean
1:    15    8
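The same computation as a runnable sketch, assuming the DT from this slide. Each expression inside `.()` is evaluated on whole columns, so two length-one results give a one-row table.

```r
library(data.table)

DT <- data.table(A = 1:5, B = c("a", "b", "c", "d", "e"), C = 6:10)

# sum(A) = 1 + 2 + 3 + 4 + 5 = 15; mean(C) = (6 + 7 + 8 + 9 + 10) / 5 = 8
agg <- DT[, .(Total = sum(A), Mean = mean(C))]
print(agg)
```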
Recycling in j
> DT
   A B  C
1: 1 a  6
2: 2 b  7
3: 3 c  8
4: 4 d  9
5: 5 e 10

> DT[, .(B, C = sum(C))]
   B  C
1: a 40
2: b 40
3: c 40
4: d 40
5: e 40
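Recycling can be checked with a short sketch, again assuming the DT from this slide. When the items in j have different lengths, the length-one item is repeated to match the longest.

```r
library(data.table)

DT <- data.table(A = 1:5, B = c("a", "b", "c", "d", "e"), C = 6:10)

# B has length 5, sum(C) has length 1, so the single value 40 is recycled
rec <- DT[, .(B, C = sum(C))]
print(rec)
```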
Let's practice
Doing j by group
> DT
   A B
1: c 1
2: b 2
3: a 3
4: c 4
5: b 5
6: a 6

> DT[, .(MySum = sum(B), MyMean = mean(B)), by = .(A)]
   A MySum MyMean
1: c     5    2.5
2: b     7    3.5
3: a     9    4.5
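Grouped computation can be sketched as below. The call itself is an assumption reconstructed from the column names MySum and MyMean and the values in the output; the DT is the one from this slide.

```r
library(data.table)

DT <- data.table(A = c("c", "b", "a", "c", "b", "a"), B = 1:6)

# j is evaluated once per distinct value of A.
# Groups come back in order of first appearance (c, b, a), not sorted.
grp <- DT[, .(MySum = sum(B), MyMean = mean(B)), by = .(A)]
print(grp)
```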
Function calls in by
> DT
   A  B
1: 1 10
2: 2 11
3: 3 12
4: 4 13
5: 5 14

> DT[, .(MySum = sum(B)), by = .(Grp = A%%2)]
   Grp MySum
1:   1    36
2:   0    24

%% is the modulo operator: A%%2 is 1 for odd A and 0 for even A.
> DT[2:4, sum(B), by = A%%2]
   A V1
1: 0 24
2: 1 12
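Grouping on a computed value can be sketched as follows, assuming the DT with A = 1:5 and B = 10:14 from this slide. The second call is an assumption: restricting i to rows 2:4 is one way to obtain the sums 24 and 12.

```r
library(data.table)

DT <- data.table(A = 1:5, B = 10:14)

# by accepts expressions: A %% 2 is 1 for odd A and 0 for even A
by_parity <- DT[, .(MySum = sum(B)), by = .(Grp = A %% 2)]
print(by_parity)   # odd rows: 10 + 12 + 14 = 36; even rows: 11 + 13 = 24

# i is applied first, so only rows 2 to 4 are grouped (assumed call)
sub_parity <- DT[2:4, sum(B), by = A %% 2]
print(sub_parity)  # even rows: 11 + 13 = 24; odd row: 12
```

Naming the grouping expression, as in `Grp = A %% 2`, keeps the result readable; an unnamed j expression gets a default column name.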
Let's practice