IE451 Fall 2023-2024 Homework 1 Solutions
Savas Dayanik
Homework 1
• R for Data Science
Section 5.2.4
The first ten flights. The help page for flights states that delays are measured in
minutes. (continued below)
year month day dep_time sched_dep_time dep_delay arr_time
2013 1 1 811 630 101 1047
2013 1 1 848 1835 853 1001
2013 1 1 957 733 144 1056
2013 1 1 1114 900 134 1447
2013 1 1 1505 1310 115 1638
2013 1 1 1525 1340 105 1831
Table continues below
sched_arr_time arr_delay carrier flight tailnum origin dest
830 137 MQ 4576 N531MQ LGA CLT
1950 851 MQ 3944 N942MQ JFK BWI
853 123 UA 856 N534UA EWR BOS
1222 145 UA 1086 N76502 LGA IAH
1431 127 EV 4497 N17984 EWR RIC
1 of 24 19/02/2024, 15:03
IE451 Fall 2023-2024 Homework 1 Solutions file:///home/sdayanik/Downloads/ie451/Homework/H...
The first ten flights to IAH or HOU (continued below)
year month day dep_time sched_dep_time dep_delay arr_time
2013 1 1 517 515 2 830
2013 1 1 533 529 4 850
2013 1 1 623 627 -4 933
2013 1 1 728 732 -4 1041
2013 1 1 739 739 0 1104
2013 1 1 908 908 0 1228
2013 1 1 1028 1026 2 1350
2013 1 1 1044 1045 -1 1352
2013 1 1 1114 900 134 1447
2013 1 1 1205 1200 5 1503
Table continues below
sched_arr_time arr_delay carrier flight tailnum origin dest
819 11 UA 1545 N14228 EWR IAH
830 20 UA 1714 N24211 LGA IAH
932 1 UA 496 N459UA LGA IAH
1038 3 UA 473 N488UA LGA IAH
1038 26 UA 1479 N37408 EWR IAH
1219 9 UA 1220 N12216 EWR IAH
1339 11 UA 1004 N76508 LGA IAH
1351 1 UA 455 N667UA EWR IAH
1222 145 UA 1086 N76502 LGA IAH
1505 -2 UA 1461 N39418 EWR IAH
air_time distance hour minute time_hour
227 1400 5 15 2013-01-01 05:00:00
227 1416 5 29 2013-01-01 05:00:00
229 1416 6 27 2013-01-01 06:00:00
238 1416 7 32 2013-01-01 07:00:00
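A filter that yields a table like the one above can be sketched as follows. This is a minimal illustration on a toy data frame, not the author's original chunk; the rows in `toy` are hypothetical stand-ins for `nycflights13::flights`.

```r
library(dplyr)

# Toy stand-in for the flights table (hypothetical rows, for illustration only).
toy <- tibble(
  flight = c(1545, 1714, 1141, 725),
  dest   = c("IAH", "IAH", "MIA", "HOU")
)

# Keep flights whose destination is either of the two Houston airports.
houston <- toy %>%
  filter(dest %in% c("IAH", "HOU"))

houston
```

Using `%in%` avoids repeating the column name, as `dest == "IAH" | dest == "HOU"` would.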
head(10) %>%
head(10) %>%
5. Arrived more than two hours late, but didn’t leave late
flights %>%
filter(arr_delay > 120, dep_delay <= 0) %>%
pander(caption="Flights that departed early or on time, but arrived more than two hours late.")
Flights that departed early or on time, but arrived more than two hours late.
(continued below)
year month day dep_time sched_dep_time dep_delay arr_time
2013 1 27 1419 1420 -1 1754
2013 10 7 1350 1350 0 1736
2013 10 7 1357 1359 -2 1858
2013 10 16 657 700 -3 1258
2013 11 1 658 700 -2 1329
2013 3 18 1844 1847 -3 39
2013 4 17 1635 1640 -5 2049
2013 4 18 558 600 -2 1149
2013 4 18 655 700 -5 1213
2013 5 22 1827 1830 -3 2217
2013 5 23 1810 1810 0 2208
2013 6 5 1604 1615 -11 2041
2013 6 14 1708 1710 -2 2227
2013 6 24 1602 1605 -3 2134
2013 6 27 2052 2100 -8 13
2013 6 30 1423 1425 -2 1816
2013 7 1 905 905 0 1443
2013 7 7 1659 1700 -1 2050
2013 7 7 1727 1730 -3 2203
2013 7 7 1746 1755 -9 2133
mutate_at(vars(starts_with("sched")), ~{
}) %>%
head(10) %>%
pander(caption = "Ten examples for flights that are at least an hour late, but spent at least 30 minutes less than scheduled in the air")
Ten examples for flights that are at least an hour late, but spent at least 30 minutes
less than scheduled in the air (continued below)
arr_delay sched_dep_time sched_arr_time sched_air_time air_time year
851 1115 1190 75 41 2013
123 453 533 80 37 2013
78 584 733 149 117 2013
115 818 1105 287 193 2013
78 883 953 70 35 2013
61 945 1239 294 183 2013
68 990 1194 204 167 2013
66 975 1099 124 72 2013
61 1184 1269 85 51 2013
246 1040 1240 200 146 2013
Table continues below
month day dep_time dep_delay arr_time carrier flight tailnum
1 1 848 853 1001 MQ 3944 N942MQ
1 1 957 144 1056 UA 856 N534UA
1 1 1120 96 1331 EV 4495 N16561
1 1 1540 122 2020 B6 705 N570JB
1 1 1607 84 1711 UA 465 N435UA
1 1 1716 91 2140 B6 703 N651JB
1 1 1740 70 2102 DL 2139 N369NW
1 1 1743 88 1925 9E 3651 N8515F
1 1 2056 72 2210 EV 4692 N11536
1 1 2205 285 46 AA 1999 N5DNAA
origin dest distance hour minute time_hour
JFK BWI 184 18 35 2013-01-01 18:00:00
EWR BOS 200 7 33 2013-01-01 07:00:00
EWR SAV 708 9 44 2013-01-01 09:00:00
JFK SJU 1598 13 38 2013-01-01 13:00:00
EWR BOS 200 14 43 2013-01-01 14:00:00
JFK SJU 1598 15 45 2013-01-01 15:00:00
LGA MIA 1096 16 30 2013-01-01 16:00:00
JFK RDU 427 16 15 2013-01-01 16:00:00
EWR IAD 212 19 44 2013-01-01 19:00:00
EWR MIA 1085 17 20 2013-01-01 17:00:00
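In the table above, sched_dep_time and sched_arr_time appear as minutes since midnight (for example 1835 becomes 18 * 60 + 35 = 1115), and sched_air_time is their difference. The sketch below reproduces that arithmetic for two rows taken from the tables in this document. The helper name `hhmm_to_minutes` is hypothetical, and `across()` is used as the newer dplyr counterpart of `mutate_at()`; the author's exact code is not recoverable from the fragment.

```r
library(dplyr)

# Convert a clock time stored as an HHMM number into minutes since midnight.
# (Hypothetical helper name, introduced for this sketch.)
hhmm_to_minutes <- function(x) (x %/% 100) * 60 + x %% 100

# Two rows copied from the tables in this document (times still in HHMM form).
toy <- tibble(
  sched_dep_time = c(1835, 733),
  sched_arr_time = c(1950, 853),
  arr_delay      = c(851, 123),
  air_time       = c(41, 37)
)

made_up <- toy %>%
  mutate(across(starts_with("sched"), hhmm_to_minutes),
         sched_air_time = sched_arr_time - sched_dep_time) %>%
  # at least an hour late, yet at least 30 minutes less in the air than scheduled
  filter(arr_delay >= 60, sched_air_time - air_time >= 30)

made_up
```

Note that a plain difference of clock times ignores time-zone changes and flights that cross midnight, so it is only a rough proxy for the scheduled air time.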
head(10) %>%
Missing dep_time
How many flights have a missing dep_time? What other variables are missing? What
might these rows represent?
flights %>%
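One way to answer the question is to count the rows with a missing dep_time and then count the missing values in every other column for those same rows. The sketch below uses a hypothetical toy data frame rather than the full flights table. A common reading, stated here as an assumption, is that such rows are cancelled flights, since their departure and arrival fields are missing together.

```r
library(dplyr)

# Toy stand-in for flights (hypothetical values, for illustration only).
toy <- tibble(
  dep_time  = c(517, NA, 533, NA),
  dep_delay = c(2, NA, 4, NA),
  arr_time  = c(830, NA, 850, NA),
  air_time  = c(227, NA, 227, NA),
  carrier   = c("UA", "EV", "UA", "MQ")
)

# Number of flights with a missing departure time.
n_missing <- toy %>%
  filter(is.na(dep_time)) %>%
  nrow()

# For those flights, count the missing values in every column.
na_counts <- toy %>%
  filter(is.na(dep_time)) %>%
  summarise(across(everything(), ~ sum(is.na(.x))))

n_missing
na_counts
```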
Section 5.3.1
arrange(desc(arr_delay)) %>%
relocate(arr_delay) %>%
head(10) %>%
flights %>%
arrange(arr_delay) %>%
relocate(arr_delay) %>%
head(10) %>%
Fastest flights
Sort flights to find the fastest (highest speed) flights.
flights %>%
arrange(desc(speed)) %>%
head(10) %>%
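The flights table has no speed column, so it has to be derived before sorting. A minimal sketch, assuming speed is defined as distance (miles) divided by air_time (minutes) and scaled to miles per hour; the definition used in the original chunk is not visible, and the third row of `toy` is hypothetical.

```r
library(dplyr)

# Toy stand-in for flights; the first two rows reuse values from the tables
# in this document, the third is made up for illustration.
toy <- tibble(
  flight   = c(1545, 1714, 4576),
  distance = c(1400, 1416, 544),  # miles
  air_time = c(227, 227, 90)      # minutes
)

fastest <- toy %>%
  mutate(speed = distance / air_time * 60) %>%  # miles per hour (assumed definition)
  arrange(desc(speed)) %>%
  relocate(speed)

fastest
```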
flights %>%
arrange(desc(distance)) %>%
relocate(distance) %>%
head(10) %>%
# panderOptions("table.alignment.default", "right")
flights %>%
arrange(distance) %>%
relocate(distance) %>%
head(10) %>%
Section 5.6.7
Which carrier has the worst delays? Challenge: can you disentangle the effects of
bad airports vs. bad carriers? Why/why not? (Hint: think about flights %>%
group_by(carrier, dest) %>% summarise(n()))
We should control for origin, destination, month, and flight time (rush hour or not,
weekend or weekday) when we compare carriers with respect to flight delays. Let
us calculate the mean delay times for each carrier between every pair of departure
mutate(month = factor(month),
weekday = weekdays(time_hour),
na.omit() %>%
weight = n(),
.groups = "drop") # do not let negative arr_delay cancel out positive delays
lmod %>%
pander(caption = "Model for delay versus carrier controlled for other variables")
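The chunk above survives only in fragments, so the following is a sketch of the general approach rather than the author's model: average the delay within each cell of the control variables, record the cell size as a weight, and fit a weighted linear model with carrier as one of the factors. The data are simulated, the set of control variables is reduced, and treating early arrivals as zero delay via `pmax()` is an assumption about what the comment on negative arr_delay refers to.

```r
library(dplyr)

set.seed(1)

# Simulated flights (hypothetical data, not nycflights13).
toy <- tibble(
  carrier   = sample(c("AA", "UA", "DL"), 300, replace = TRUE),
  origin    = sample(c("EWR", "JFK"), 300, replace = TRUE),
  month     = factor(sample(1:3, 300, replace = TRUE)),
  arr_delay = round(rnorm(300, mean = 10, sd = 30))
)

# One row per carrier/origin/month cell: mean delay and number of flights.
cells <- toy %>%
  mutate(delay = pmax(arr_delay, 0)) %>%  # assumption: early arrivals count as zero delay
  group_by(carrier, origin, month) %>%
  summarise(mean_delay = mean(delay),
            weight = n(),
            .groups = "drop")

# Weighted least squares: cells backed by more flights count for more.
lmod <- lm(mean_delay ~ carrier + origin + month, data = cells, weights = weight)

coef(lmod)
```

Weighting by the cell size keeps a carrier with a handful of flights on some route from influencing the fit as much as a carrier with thousands.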
glance() %>%
Adjusted R² equals 34%, which is weak. The ratio of the residual standard error (RSE)
to the mean delay, 279%, is huge. There may be other factors that contribute to the
variation in delay. We cannot use the model for delay prediction, but the model can
be useful to compare carriers.
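The two diagnostics quoted above can be read off any fitted linear model. A minimal sketch on simulated data (the variables `x` and `y` are hypothetical): `summary()` exposes the same adjusted R-squared and residual standard error that `broom::glance()` reports as `adj.r.squared` and `sigma`. Whether the original ratio used a weighted or an unweighted mean delay is not visible in the source, so the plain mean here is an assumption.

```r
set.seed(42)

# Simulated predictor and response (hypothetical data).
x <- rnorm(200)
y <- 5 + 2 * x + rnorm(200, sd = 4)
fit <- lm(y ~ x)

s <- summary(fit)
adj_r2    <- s$adj.r.squared  # variance explained, penalised for model size
rse       <- s$sigma          # residual standard error, in the units of y
rse_ratio <- rse / mean(y)    # RSE relative to the mean response

c(adj_r2 = adj_r2, rse = rse, rse_ratio = rse_ratio)
```

A ratio well above 1 means the typical prediction error exceeds the typical size of the response, which is why such a model is poor for prediction even when its coefficients are still informative.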
lmod %>%
tidy() %>%
head(10) %>%
Since we are controlling for many aspects of the flights, comparing the carrier
estimates is fair. But those estimates have different standard errors. Therefore, it is
better to compare the t-statistics, namely the standardized carrier estimates.
lmod %>%
col="red", lty="dashed") +
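The standardization mentioned above is simply each estimate divided by its standard error. The sketch below checks this on a small simulated example; the carriers, group sizes, and delays are hypothetical and only illustrate the arithmetic.

```r
set.seed(7)

# Simulated delays for three carriers with very different flight counts.
toy <- data.frame(
  carrier = factor(rep(c("AA", "DL", "UA"), times = c(200, 20, 80))),
  delay   = c(rnorm(200, 10, 20), rnorm(20, 18, 20), rnorm(80, 14, 20))
)

fit <- lm(delay ~ carrier, data = toy)

# Coefficient table: Estimate, Std. Error, t value, Pr(>|t|).
ct <- summary(fit)$coefficients

# t-statistic = estimate / standard error.
t_manual <- ct[, "Estimate"] / ct[, "Std. Error"]

cbind(ct[, c("Estimate", "Std. Error")], t_manual)
```

A carrier with few flights tends to have a large standard error, so a large raw estimate can still correspond to a modest t-statistic.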
Section 5.7.1
group_by(tailnum) %>%
geom_linerange(aes(ymin=0, ymax=worst_on_time)) +
Warning in max(arr_delay, na.rm = TRUE): no non-missing arguments to max;
returning -Inf
(the same warning is repeated seven times)
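The warning above is R reporting that max() received no non-missing values and is returning -Inf. This happens when every arr_delay in a group is NA, so na.rm = TRUE leaves nothing to take the maximum of. One way to avoid it, sketched on a hypothetical toy table, is to drop the missing delays before grouping; the column name `worst_delay` is introduced only for this example.

```r
library(dplyr)

# Toy stand-in for flights: plane N2 has no recorded arrival delay at all.
toy <- tibble(
  tailnum   = c("N1", "N1", "N2", "N2"),
  arr_delay = c(12, 45, NA, NA)
)

# With only missing values, max() falls back to -Inf (and warns).
all_missing <- suppressWarnings(max(c(NA_real_, NA_real_), na.rm = TRUE))

# Dropping the missing delays first means no group is left empty,
# so the warning is never raised.
worst <- toy %>%
  filter(!is.na(arr_delay)) %>%
  group_by(tailnum) %>%
  summarise(worst_delay = max(arr_delay), .groups = "drop")

worst
```

The trade-off is that planes with no recorded arrivals disappear from the result instead of showing up with a delay of -Inf.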
Avoid delays
What time of day should you fly if you want to avoid delays as much as possible?
d2 <- flights %>%
group_by(dep_time_cut) %>%
d2 %>%
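The chunks above are truncated, so the following is only a sketch of one way to build a variable like dep_time_cut and compare delays across the day. The breaks (four six-hour bands), the use of dep_time rather than sched_dep_time, and the toy values are all assumptions.

```r
library(dplyr)

# Toy stand-in for flights (hypothetical values); dep_time is in HHMM form.
toy <- tibble(
  dep_time  = c(600, 615, 1230, 1245, 1900, 1915),
  dep_delay = c(-2, 1, 8, 12, 35, 41)
)

by_band <- toy %>%
  mutate(dep_time_cut = cut(dep_time %/% 100,              # hour of day
                            breaks = c(0, 6, 12, 18, 24),
                            right = FALSE)) %>%             # bands like [6,12)
  group_by(dep_time_cut) %>%
  summarise(mean_delay = mean(dep_delay, na.rm = TRUE),
            n = n(),
            .groups = "drop") %>%
  arrange(mean_delay)

by_band
```

Sorting by the mean delay puts the most punctual part of the day in the first row.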