Intro To Data Science Lecture 4
Introduction to Data Science for Civil Engineers
Fall 2022

OUTLINE
1. Data Exploration
2. Data Management
3. Data Engineering and Shaping
In R, graphing packages include ggplot2, WVPlots, ggpubr, ggstatsplot, lattice, and others.

... function of other variables whose data are available (regression models, neural network models)
Investigate the reasons why data are missing.
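One way to read the regression-model option above is to predict each missing value from variables that were observed. The following is a minimal base R sketch of that idea, not the lecture's own code: the data frame `roads`, its columns `lanes` and `aadt`, and all values are hypothetical.

```r
# Hypothetical road-segment data (made-up values); 'aadt' has two gaps.
roads <- data.frame(
  lanes = c(2, 2, 4, 4, 6, 6),
  aadt  = c(8000, NA, 21000, 19000, NA, 33000)
)

# Count missing values per column before deciding how to handle them.
na_counts <- colSums(is.na(roads))

# Fit a regression on the complete rows (lm() drops rows with NA by default).
fit <- lm(aadt ~ lanes, data = roads)

# Fill only the missing entries with model predictions;
# keep the original column so the imputation stays traceable.
missing <- is.na(roads$aadt)
roads$aadt_imputed <- roads$aadt
roads$aadt_imputed[missing] <- predict(fit, newdata = roads[missing, ])
```

Keeping the imputed values in a separate column makes it easy to compare results with and without imputation once the reason for the missing data is understood.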
9/6/2022
https://github.com/eddelbuettel/gsir-te

The caret and recipes R packages both include many more high-level functions for data preprocessing and normalization.

Package dplyr
Data manipulation through sequences of SQL-like operators.
Part of tidyverse (a collection of R packages designed for data science).

https://github.com/saghirb/Getting-Started-in-R
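The "sequences of SQL-like operators" that dplyr provides can be sketched as a short pipeline. This is an illustrative example under stated assumptions: the data frame `counts` and its values are hypothetical, and the SQL analogies in the comments are informal.

```r
library(dplyr)

# Hypothetical traffic counts (made-up values).
counts <- data.frame(
  RoadID = c("r1", "r1", "r3", "r3", "r4"),
  hour   = c(8, 17, 8, 17, 8),
  volume = c(120, 150, 300, 340, 60)
)

peak <- counts %>%
  filter(volume > 100) %>%                   # like WHERE
  group_by(RoadID) %>%                       # like GROUP BY
  summarise(mean_volume = mean(volume)) %>%  # like an aggregate in SELECT
  arrange(desc(mean_volume))                 # like ORDER BY ... DESC
```

Each verb takes a data frame and returns a data frame, which is what lets the steps be chained in reading order.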
Combining multiple data frames by columns using the base R function cbind()

> temp <- cbind(purchases1, purchases2)
> temp
  day hour n_purchase day hour n_purchase
1   1    9          5   2    9          3
4   1   13          1   2   11          5
6   1   14          1   2   13          3

> capacityTable
  RoadID capacity
1     r1     9.99
2     r3    19.99
3     r4     5.49
4     r5    24.49
> demandTable
  RoadID demand
1     r1     10
2     r2     43
3     r3     55
4     r4      8
> merge(capacityTable, demandTable, by = "RoadID", all.x = TRUE)
  RoadID capacity demand
1     r1     9.99     10
2     r3    19.99     55
3     r4     5.49      8
4     r5    24.49     NA
> capacityTable
  RoadID capacity
1     r1     9.99
2     r3    19.99
3     r4     5.49
4     r5    24.49
> demandTable
  RoadID demand
1     r1     10
2     r2     43
3     r3     55
4     r4      8

> merge(capacityTable, demandTable, by = "RoadID", all.y = TRUE)
  RoadID capacity demand
1     r1     9.99     10
2     r2       NA     43
3     r3    19.99     55
4     r4     5.49      8

> merge(capacityTable, demandTable, by = "RoadID")
  RoadID capacity demand
1     r1     9.99     10
2     r3    19.99     55
3     r4     5.49      8

> merge(capacityTable, demandTable, by = "RoadID", all = TRUE)
  RoadID capacity demand
1     r1     9.99     10
2     r2       NA     43
3     r3    19.99     55
4     r4     5.49      8
5     r5    24.49     NA
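The merge() calls above can be reproduced with a small self-contained script. The construction of the two data frames is an assumption, since the slides only show their printed contents, and the join-type names in the comments are the conventional SQL terms rather than wording from the slides.

```r
# Rebuild the two tables from their printed contents.
capacityTable <- data.frame(
  RoadID   = c("r1", "r3", "r4", "r5"),
  capacity = c(9.99, 19.99, 5.49, 24.49)
)
demandTable <- data.frame(
  RoadID = c("r1", "r2", "r3", "r4"),
  demand = c(10, 43, 55, 8)
)

# Inner join: only RoadIDs present in both tables.
inner <- merge(capacityTable, demandTable, by = "RoadID")
# Left join: every row of capacityTable; unmatched demand becomes NA.
left  <- merge(capacityTable, demandTable, by = "RoadID", all.x = TRUE)
# Right join: every row of demandTable; unmatched capacity becomes NA.
right <- merge(capacityTable, demandTable, by = "RoadID", all.y = TRUE)
# Full outer join: every RoadID from either table.
full  <- merge(capacityTable, demandTable, by = "RoadID", all = TRUE)

# Keys that have no partner in the other table explain the NA values.
only_capacity <- setdiff(capacityTable$RoadID, demandTable$RoadID)  # "r5"
only_demand   <- setdiff(demandTable$RoadID, capacityTable$RoadID)  # "r2"
```

Checking the unmatched keys with setdiff() before merging is a quick way to anticipate how many NA values each join type will introduce.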
package.
> install.packages("data.table")
> library("data.table")
package.
> library("tidyr")
> y_long
  index info meastype meas
1     1    a    meas1    4
2     2    a    meas1    5
3     3    c    meas1    6
4     1    a    meas2    7
5     2    a    meas2    8
6     3    c    meas2    9
7     1    a    meas3   11
8     2    a    meas3   12
9     3    c    meas3   13
>
> y_wide1 <- spread(y_long, key = meastype, value = meas)
>
> y_wide1
  index info meas1 meas2 meas3
1     1    a     4     7    11
2     2    a     5     8    12
3     3    c     6     9    13

> library("cdata")
>
> y_wide2 <- pivot_to_rowrecs(y_long,
+                             columnToTakeKeysFrom = "meastype",
+                             columnToTakeValuesFrom = "meas",
+                             rowKeyColumns = "index")
> y_wide2
  index info meas1 meas2 meas3
1     1    a     4     7    11
2     2    a     5     8    12
3     3    c     6     9    13
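In current tidyr releases, spread() is marked as superseded by pivot_wider(). As a sketch of the same long-to-wide reshaping with the newer function, the example below rebuilds `y_long` from its printed contents; the construction code is an assumption, since the slides only show the printed data frame.

```r
library(tidyr)

# Rebuild the long-format table from its printed contents.
y_long <- data.frame(
  index    = rep(1:3, times = 3),
  info     = rep(c("a", "a", "c"), times = 3),
  meastype = rep(c("meas1", "meas2", "meas3"), each = 3),
  meas     = c(4, 5, 6, 7, 8, 9, 11, 12, 13)
)

# One row per (index, info); one column per distinct meastype.
y_wide3 <- pivot_wider(y_long,
                       names_from  = meastype,
                       values_from = meas)
```

Here `names_from` plays the role of spread()'s `key` argument and `values_from` the role of `value`; the name `y_wide3` is chosen only to avoid clashing with the objects above.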