
IE451 Fall 2023-2024 Homework 1 Solutions file:///home/sdayanik/Downloads/ie451/Homework/H...

IE451 Fall 2023-2024 Homework 1


Solutions
Author

Savas Dayanik

Homework 1
• R for Data Science

• Section 5.2.4, Exercises 1, 2, 3

• Section 5.3.1, Exercises 2, 3, 4

• Section 5.6.7, Exercise 5

• Section 5.7.1, Exercises 2, 3

Section 5.2.4

Flights with specific attributes


1. Had an arrival delay of two or more hours
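flights %>%
  filter(arr_delay >= 2*60) %>%
  head() %>%                      # head() returns six rows by default
  pander(caption = "The first six flights with an arrival delay of two or more hours. The help page of flights says the delays are measured in minutes.")

The first six flights with an arrival delay of two or more hours. The help page of flights says the delays are measured in minutes. (continued below)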
year month day dep_time sched_dep_time dep_delay arr_time
2013 1 1 811 630 101 1047
2013 1 1 848 1835 853 1001
2013 1 1 957 733 144 1056
2013 1 1 1114 900 134 1447
2013 1 1 1505 1310 115 1638
2013 1 1 1525 1340 105 1831
Table continues below
sched_arr_time arr_delay carrier flight tailnum origin dest
830 137 MQ 4576 N531MQ LGA CLT
1950 851 MQ 3944 N942MQ JFK BWI
853 123 UA 856 N534UA EWR BOS
1222 145 UA 1086 N76502 LGA IAH
1431 127 EV 4497 N17984 EWR RIC
1626 125 B6 525 N231JB EWR MCO
air_time distance hour minute time_hour
118 544 6 30 2013-01-01 06:00:00
41 184 18 35 2013-01-01 18:00:00
37 200 7 33 2013-01-01 07:00:00
248 1416 9 0 2013-01-01 09:00:00
63 277 13 10 2013-01-01 13:00:00
152 937 13 40 2013-01-01 13:00:00

2. Flew to Houston (IAH or HOU)


flights %>%
  filter(dest %in% c("IAH", "HOU")) %>%
  head(10) %>%
  pander(caption = "The first ten flights to IAH or HOU")

The first ten flights to IAH or HOU (continued below)
year month day dep_time sched_dep_time dep_delay arr_time
2013 1 1 517 515 2 830
2013 1 1 533 529 4 850
2013 1 1 623 627 -4 933
2013 1 1 728 732 -4 1041
2013 1 1 739 739 0 1104
2013 1 1 908 908 0 1228
2013 1 1 1028 1026 2 1350
2013 1 1 1044 1045 -1 1352
2013 1 1 1114 900 134 1447
2013 1 1 1205 1200 5 1503
Table continues below
sched_arr_time arr_delay carrier flight tailnum origin dest
819 11 UA 1545 N14228 EWR IAH
830 20 UA 1714 N24211 LGA IAH
932 1 UA 496 N459UA LGA IAH
1038 3 UA 473 N488UA LGA IAH
1038 26 UA 1479 N37408 EWR IAH
1219 9 UA 1220 N12216 EWR IAH
1339 11 UA 1004 N76508 LGA IAH
1351 1 UA 455 N667UA EWR IAH
1222 145 UA 1086 N76502 LGA IAH
1505 -2 UA 1461 N39418 EWR IAH
air_time distance hour minute time_hour
227 1400 5 15 2013-01-01 05:00:00
227 1416 5 29 2013-01-01 05:00:00
229 1416 6 27 2013-01-01 06:00:00
238 1416 7 32 2013-01-01 07:00:00
249 1400 7 39 2013-01-01 07:00:00
233 1400 9 8 2013-01-01 09:00:00
237 1416 10 26 2013-01-01 10:00:00
229 1400 10 45 2013-01-01 10:00:00
248 1416 9 0 2013-01-01 09:00:00
221 1400 12 0 2013-01-01 12:00:00

3. Were operated by United, American, or Delta


flights %>%
  left_join(airlines, by = "carrier") %>%                  # combine tables airlines and flights using the common key carrier
  filter(str_detect(name, "United|American|Delta")) %>%    # use regular expressions to detect rows whose name column contains United, American, or Delta
  relocate(carrier, name) %>%                              # pull the carrier and name columns to the beginning of the table
  head(10) %>%
  pander(caption = "Ten example flights operated by United, American, or Delta. Combined tables airlines and flights, then used string search. Details are in the same book")

Ten example flights operated by United, American, or Delta. Combined tables airlines and flights, then used string search. Details are in the same book (continued below)
carrier name year month day dep_time
UA United Air Lines Inc. 2013 1 1 517
UA United Air Lines Inc. 2013 1 1 533
AA American Airlines Inc. 2013 1 1 542
DL Delta Air Lines Inc. 2013 1 1 554
UA United Air Lines Inc. 2013 1 1 554
AA American Airlines Inc. 2013 1 1 558
UA United Air Lines Inc. 2013 1 1 558
UA United Air Lines Inc. 2013 1 1 558
AA American Airlines Inc. 2013 1 1 559
UA United Air Lines Inc. 2013 1 1 559
Table continues below
sched_dep_time dep_delay arr_time sched_arr_time arr_delay flight
515 2 830 819 11 1545
529 4 850 830 20 1714
540 2 923 850 33 1141
600 -6 812 837 -25 461
558 -4 740 728 12 1696
600 -2 753 745 8 301
600 -2 924 917 7 194
600 -2 923 937 -14 1124
600 -1 941 910 31 707
600 -1 854 902 -8 1187
tailnum origin dest air_time distance hour minute time_hour
N14228 EWR IAH 227 1400 5 15 2013-01-01 05:00:00
N24211 LGA IAH 227 1416 5 29 2013-01-01 05:00:00
N619AA JFK MIA 160 1089 5 40 2013-01-01 05:00:00
N668DN LGA ATL 116 762 6 0 2013-01-01 06:00:00
N39463 EWR ORD 150 719 5 58 2013-01-01 05:00:00
N3ALAA LGA ORD 138 733 6 0 2013-01-01 06:00:00
N29129 JFK LAX 345 2475 6 0 2013-01-01 06:00:00
N53441 EWR SFO 361 2565 6 0 2013-01-01 06:00:00
N3DUAA LGA DFW 257 1389 6 0 2013-01-01 06:00:00
N76515 EWR LAS 337 2227 6 0 2013-01-01 06:00:00

4. Departed in summer (July, August, and September)


flights %>%
  filter(between(month, 7, 9)) %>%
  head(10) %>%
  pander(caption = "Ten examples for departures in July, August, September")

Ten examples for departures in July, August, September (continued below)
year month day dep_time sched_dep_time dep_delay arr_time
2013 7 1 1 2029 212 236
2013 7 1 2 2359 3 344
2013 7 1 29 2245 104 151
2013 7 1 43 2130 193 322
2013 7 1 44 2150 174 300
2013 7 1 46 2051 235 304
2013 7 1 48 2001 287 308
2013 7 1 58 2155 183 335
2013 7 1 100 2146 194 327
2013 7 1 100 2245 135 337
Table continues below
sched_arr_time arr_delay carrier flight tailnum origin dest
2359 157 B6 915 N653JB JFK SFO
344 0 B6 1503 N805JB JFK SJU
1 110 B6 234 N348JB JFK BTV
14 188 B6 1371 N794JB LGA FLL
100 120 AA 185 N324AA JFK LAX
2358 186 B6 165 N640JB JFK PDX
2305 243 VX 415 N627VA JFK LAX
43 172 B6 425 N535JB JFK TPA
30 177 B6 1183 N531JB JFK MCO
135 122 B6 623 N663JB JFK LAX
air_time distance hour minute time_hour
315 2586 20 29 2013-07-01 20:00:00
200 1598 23 59 2013-07-01 23:00:00
66 266 22 45 2013-07-01 22:00:00
143 1076 21 30 2013-07-01 21:00:00
297 2475 21 50 2013-07-01 21:00:00
304 2454 20 51 2013-07-01 20:00:00
298 2475 20 1 2013-07-01 20:00:00
140 1005 21 55 2013-07-01 21:00:00
126 944 21 46 2013-07-01 21:00:00
304 2475 22 45 2013-07-01 22:00:00

5. Arrived more than two hours late, but didn’t leave late
flights %>%
  filter(arr_delay > 2*60, dep_delay <= 0) %>%
  pander(caption = "Flights that departed early or on time, but arrived more than two hours late.")

Flights that departed early or on time, but arrived more than two hours late. (continued below)
year month day dep_time sched_dep_time dep_delay arr_time
2013 1 27 1419 1420 -1 1754
2013 10 7 1350 1350 0 1736
2013 10 7 1357 1359 -2 1858
2013 10 16 657 700 -3 1258
2013 11 1 658 700 -2 1329
2013 3 18 1844 1847 -3 39
2013 4 17 1635 1640 -5 2049
2013 4 18 558 600 -2 1149
2013 4 18 655 700 -5 1213
2013 5 22 1827 1830 -3 2217
2013 5 23 1810 1810 0 2208
2013 6 5 1604 1615 -11 2041
2013 6 14 1708 1710 -2 2227
2013 6 24 1602 1605 -3 2134
2013 6 27 2052 2100 -8 13
2013 6 30 1423 1425 -2 1816
2013 7 1 905 905 0 1443
2013 7 7 1659 1700 -1 2050
2013 7 7 1727 1730 -3 2203
2013 7 7 1746 1755 -9 2133
2013 7 7 1823 1830 -7 2201
2013 7 22 1555 1600 -5 2139
2013 7 22 1606 1615 -9 2056
2013 7 22 1628 1630 -2 2151
2013 7 28 1710 1711 -1 2248
2013 8 8 1457 1500 -3 1828
2013 8 13 657 659 -2 1015
2013 8 28 1157 1200 -3 1520
2013 9 19 656 700 -4 1037
Table continues below
sched_arr_time arr_delay carrier flight tailnum origin dest
1550 124 MQ 3728 N1EAMQ EWR ORD
1526 130 EV 5181 N611QX LGA MSN
1654 124 AA 1151 N3CMAA LGA DFW
1056 122 B6 3 N703JB JFK SJU
1015 194 VX 399 N629VA JFK LAX
2219 140 UA 389 N560UA JFK SFO
1845 124 MQ 4540 N721MQ LGA DTW
850 179 AA 707 N3EXAA LGA DFW
950 143 AA 2083 N565AA EWR DFW
2010 127 MQ 4674 N518MQ LGA CLE
2000 128 MQ 4626 N525MQ LGA CMH
1840 121 MQ 4657 N510MQ LGA ATL
2015 132 AA 181 N320AA JFK LAX
1916 138 DL 706 N3768 JFK AUS
2210 123 US 2144 N952UW LGA BOS
1554 142 B6 2402 N206JB JFK BUF
1223 140 DL 1057 N337NB LGA MIA
1823 147 US 2183 N948UW LGA DCA
1951 132 F9 837 N263AV LGA DEN
1921 132 B6 1407 N374JB JFK IAD
1955 126 MQ 3486 N724MQ LGA BNA
1938 121 DL 141 N713TW JFK SFO
1831 145 DL 1619 N970DL LGA MSP
1939 132 B6 423 N625JB JFK LAX
2039 129 B6 167 N510JB JFK OAK
1624 124 US 2185 N746UW LGA DCA
814 121 EV 4522 N14188 EWR BNA
1316 124 US 2179 N737US LGA DCA
833 124 UA 331 N808UA LGA ORD
air_time distance hour minute time_hour
135 719 14 20 2013-01-27 14:00:00
117 812 13 50 2013-10-07 13:00:00
192 1389 13 59 2013-10-07 13:00:00
225 1598 7 0 2013-10-16 07:00:00
336 2475 7 0 2013-11-01 07:00:00
386 2586 18 47 2013-03-18 18:00:00
130 502 16 40 2013-04-17 16:00:00
234 1389 6 0 2013-04-18 06:00:00
230 1372 7 0 2013-04-18 07:00:00
90 419 18 30 2013-05-22 18:00:00
82 479 18 10 2013-05-23 18:00:00
158 762 16 15 2013-06-05 16:00:00
334 2475 17 10 2013-06-14 17:00:00
247 1521 16 5 2013-06-24 16:00:00
46 184 21 0 2013-06-27 21:00:00
80 301 14 25 2013-06-30 14:00:00
183 1096 9 5 2013-07-01 09:00:00
64 214 17 0 2013-07-07 17:00:00
236 1620 17 30 2013-07-07 17:00:00
78 228 17 55 2013-07-07 17:00:00
113 764 18 30 2013-07-07 18:00:00
371 2586 16 0 2013-07-22 16:00:00
140 1020 16 15 2013-07-22 16:00:00
332 2475 16 30 2013-07-22 16:00:00
353 2576 17 11 2013-07-28 17:00:00
70 214 15 0 2013-08-08 15:00:00
146 748 6 59 2013-08-13 06:00:00
63 214 12 0 2013-08-28 12:00:00
192 733 7 0 2013-09-19 07:00:00

6. Were delayed by at least an hour, but made up over 30 minutes in flight


flights %>%
  mutate_at(vars(starts_with("sched")), ~{
    full_time <- sprintf("%04d", .)                       # pad times with leading zeros to four digits
    hour_time <- as.numeric(str_sub(full_time, 1, 2))     # extract hour information
    min_time <- as.numeric(str_sub(full_time, 3, 4))      # extract minute information
    hour_time*60 + min_time                               # express time as the number of minutes since midnight
  }) %>%
  mutate(sched_air_time = sched_arr_time - sched_dep_time) %>%   # scheduled air time in minutes
  filter(arr_delay >= 60, sched_air_time - air_time >= 30) %>%
  relocate(arr_delay, sched_dep_time, sched_arr_time, sched_air_time, air_time) %>%
  head(10) %>%
  pander(caption = "Ten examples for flights that are at least an hour late, but spent at least 30 minutes less than scheduled in the air")

Ten examples for flights that are at least an hour late, but spent at least 30 minutes less than scheduled in the air (continued below)
arr_delay sched_dep_time sched_arr_time sched_air_time air_time year
851 1115 1190 75 41 2013
123 453 533 80 37 2013
78 584 733 149 117 2013
115 818 1105 287 193 2013
78 883 953 70 35 2013
61 945 1239 294 183 2013
68 990 1194 204 167 2013
66 975 1099 124 72 2013
61 1184 1269 85 51 2013
246 1040 1240 200 146 2013
Table continues below
month day dep_time dep_delay arr_time carrier flight tailnum
1 1 848 853 1001 MQ 3944 N942MQ
1 1 957 144 1056 UA 856 N534UA
1 1 1120 96 1331 EV 4495 N16561
1 1 1540 122 2020 B6 705 N570JB
1 1 1607 84 1711 UA 465 N435UA
1 1 1716 91 2140 B6 703 N651JB
1 1 1740 70 2102 DL 2139 N369NW
1 1 1743 88 1925 9E 3651 N8515F
1 1 2056 72 2210 EV 4692 N11536
1 1 2205 285 46 AA 1999 N5DNAA
origin dest distance hour minute time_hour
JFK BWI 184 18 35 2013-01-01 18:00:00
EWR BOS 200 7 33 2013-01-01 07:00:00
EWR SAV 708 9 44 2013-01-01 09:00:00
JFK SJU 1598 13 38 2013-01-01 13:00:00
EWR BOS 200 14 43 2013-01-01 14:00:00
JFK SJU 1598 15 45 2013-01-01 15:00:00
LGA MIA 1096 16 30 2013-01-01 16:00:00
JFK RDU 427 16 15 2013-01-01 16:00:00
EWR IAD 212 19 44 2013-01-01 19:00:00
EWR MIA 1085 17 20 2013-01-01 17:00:00
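
The conversion to minutes since midnight can also be done with integer arithmetic instead of string slicing. The following is a minimal sketch in base R, not the author's code; `hhmm_to_minutes` is a hypothetical helper name, and the two clock times are the scheduled times of the first row in the table above (JFK to BWI).

```r
# Hypothetical helper (not part of the original solution): convert clock
# times stored as HHMM integers into minutes since midnight.
hhmm_to_minutes <- function(hhmm) {
  (hhmm %/% 100) * 60 + (hhmm %% 100)   # hours * 60 + minutes
}

# Scheduled times of the first row in the table above (JFK to BWI).
sched_dep <- hhmm_to_minutes(1835)      # 18 * 60 + 35
sched_arr <- hhmm_to_minutes(1950)      # 19 * 60 + 50
sched_air_time <- sched_arr - sched_dep

print(c(sched_dep, sched_arr, sched_air_time))   # 1115 1190 75
```

The result agrees with the sched_dep_time, sched_arr_time, and sched_air_time columns of that row. Two caveats, stated as assumptions about the data rather than results of this homework: the difference becomes negative when a flight is scheduled to land after midnight, and both times are local clock times, so routes that cross time zones would need an adjustment.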


7. Departed between midnight and 6am (inclusive)


flights %>%
  filter(dep_time <= 600 | dep_time == 2400) %>%   # midnight departures are recorded as 2400, not 0
  head(10) %>%
  pander(caption = "Ten examples for flights departed between midnight and 6 AM (inclusive)")

Ten examples for flights departed between midnight and 6 AM (inclusive) (continued below)
year month day dep_time sched_dep_time dep_delay arr_time
2013 1 1 517 515 2 830
2013 1 1 533 529 4 850
2013 1 1 542 540 2 923
2013 1 1 544 545 -1 1004
2013 1 1 554 600 -6 812
2013 1 1 554 558 -4 740
2013 1 1 555 600 -5 913
2013 1 1 557 600 -3 709
2013 1 1 557 600 -3 838
2013 1 1 558 600 -2 753
Table continues below
sched_arr_time arr_delay carrier flight tailnum origin dest
819 11 UA 1545 N14228 EWR IAH
830 20 UA 1714 N24211 LGA IAH
850 33 AA 1141 N619AA JFK MIA
1022 -18 B6 725 N804JB JFK BQN
837 -25 DL 461 N668DN LGA ATL
728 12 UA 1696 N39463 EWR ORD
854 19 B6 507 N516JB EWR FLL
723 -14 EV 5708 N829AS LGA IAD
846 -8 B6 79 N593JB JFK MCO
745 8 AA 301 N3ALAA LGA ORD
air_time distance hour minute time_hour
227 1400 5 15 2013-01-01 05:00:00
227 1416 5 29 2013-01-01 05:00:00
160 1089 5 40 2013-01-01 05:00:00
183 1576 5 45 2013-01-01 05:00:00
116 762 6 0 2013-01-01 06:00:00
150 719 5 58 2013-01-01 05:00:00
158 1065 6 0 2013-01-01 06:00:00
53 229 6 0 2013-01-01 06:00:00
140 944 6 0 2013-01-01 06:00:00
138 733 6 0 2013-01-01 06:00:00


What does between do?


Another useful dplyr filtering helper is between(). What does it do? Can you use it
to simplify the code needed to answer the previous challenges?

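See ?between: between(x, left, right) is a shortcut for x >= left & x <= right. We used it in item 4 above to select the months from July to September.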
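
To make the behaviour concrete, here is a minimal base R sketch of the comparison that between() performs. `between_sketch` is a hypothetical name used only for illustration; it is not the dplyr implementation.

```r
# Base R sketch of what between(x, left, right) computes: an inclusive
# range check on both ends. between_sketch is a hypothetical name.
between_sketch <- function(x, left, right) {
  x >= left & x <= right
}

month <- c(6, 7, 8, 9, 10)
in_summer <- between_sketch(month, 7, 9)
print(in_summer)   # FALSE TRUE TRUE TRUE FALSE
```

Both endpoints are included, which is why filter(between(month, 7, 9)) keeps July and September as well as August.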

Missing dep_time
How many flights have a missing dep_time? What other variables are missing? What
might these rows represent?
flights %>%
  mutate(dep_time = ifelse(is.na(dep_time), "Missing", "Complete")) %>%
  count(dep_time, name = "Count") %>%
  pander(caption = "Number of cases with missing and complete dep_time")

Number of cases with missing and complete dep_time


dep_time Count
Complete 328521
Missing 8255
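
The exercise also asks which other variables are missing in these rows. A minimal base R sketch of one way to check is given below; `toy` is a made-up stand-in for flights so that the snippet runs without nycflights13, and its values are illustrative only.

```r
# Toy stand-in for flights (made-up values); NA marks a missing entry.
toy <- data.frame(
  dep_time  = c(517, NA, 542, NA),
  dep_delay = c(2, NA, 2, NA),
  arr_time  = c(830, NA, 923, NA),
  arr_delay = c(11, NA, 33, NA),
  carrier   = c("UA", "AA", "AA", "EV")
)

# Keep the rows whose dep_time is missing, then count NAs per column.
missing_rows <- toy[is.na(toy$dep_time), ]
na_counts <- colSums(is.na(missing_rows))
print(na_counts)
```

Running the same two steps on the real flights table should show which columns are always missing together with dep_time. If dep_delay, arr_time, arr_delay, and air_time are missing in all of those rows, a natural interpretation is that the rows are cancelled flights.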

Section 5.3.1

Most delayed and earliest departed flights


Sort flights to find the most delayed flights. Find the flights that left earliest.
flights %>%
  arrange(desc(arr_delay)) %>%
  relocate(arr_delay) %>%
  head(10) %>%
  pander(caption = "Ten most delayed flights")

Ten most delayed flights (continued below)
arr_delay year month day dep_time sched_dep_time dep_delay
1272 2013 1 9 641 900 1301
1127 2013 6 15 1432 1935 1137
1109 2013 1 10 1121 1635 1126
1007 2013 9 20 1139 1845 1014
989 2013 7 22 845 1600 1005
931 2013 4 10 1100 1900 960
915 2013 3 17 2321 810 911
895 2013 7 22 2257 759 898
878 2013 12 5 756 1700 896
875 2013 5 3 1133 2055 878
Table continues below
arr_time sched_arr_time carrier flight tailnum origin dest
1242 1530 HA 51 N384HA JFK HNL
1607 2120 MQ 3535 N504MQ JFK CMH
1239 1810 MQ 3695 N517MQ EWR ORD
1457 2210 AA 177 N338AA JFK SFO
1044 1815 MQ 3075 N665MQ JFK CVG
1342 2211 DL 2391 N959DL JFK TPA
135 1020 DL 2119 N927DA LGA MSP
121 1026 DL 2047 N6716C LGA ATL
1058 2020 AA 172 N5DMAA EWR MIA
1250 2215 MQ 3744 N523MQ EWR ORD
air_time distance hour minute time_hour
640 4983 9 0 2013-01-09 09:00:00
74 483 19 35 2013-06-15 19:00:00
111 719 16 35 2013-01-10 16:00:00
354 2586 18 45 2013-09-20 18:00:00
96 589 16 0 2013-07-22 16:00:00
139 1005 19 0 2013-04-10 19:00:00
167 1020 8 10 2013-03-17 08:00:00
109 762 7 59 2013-07-22 07:00:00
149 1085 17 0 2013-12-05 17:00:00
112 719 20 55 2013-05-03 20:00:00

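flights %>%
  arrange(arr_delay) %>%          # sorts by arrival delay; sort by dep_delay to rank by departure instead
  relocate(arr_delay) %>%
  head(10) %>%
  pander(caption = "Ten flights that arrived earliest relative to schedule")

Ten flights that arrived earliest relative to schedule (continued below)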
arr_delay year month day dep_time sched_dep_time dep_delay
-86 2013 5 7 1715 1729 -14
-79 2013 5 20 719 735 -16
-75 2013 5 2 1947 1949 -2
-75 2013 5 6 1826 1830 -4
-74 2013 5 4 1816 1820 -4
-73 2013 5 2 1926 1929 -3
-71 2013 5 6 1753 1755 -2
-71 2013 5 7 2054 2055 -1
-71 2013 5 13 657 700 -3
-70 2013 1 4 1026 1030 -4
Table continues below
arr_time sched_arr_time carrier flight tailnum origin dest
1944 2110 VX 193 N843VA EWR SFO
951 1110 VX 11 N840VA JFK SFO
2209 2324 UA 612 N851UA EWR LAX
2045 2200 AA 269 N3KCAA JFK SEA
2017 2131 AS 7 N551AS EWR SEA
2157 2310 UA 1628 N24212 EWR SFO
2004 2115 DL 1394 N3760C JFK PDX
2317 28 UA 622 N806UA EWR SFO
908 1019 B6 671 N805JB JFK LAX
1305 1415 VX 23 N855VA JFK SFO
air_time distance hour minute time_hour
315 2565 17 29 2013-05-07 17:00:00
316 2586 7 35 2013-05-20 07:00:00
300 2454 19 49 2013-05-02 19:00:00
289 2422 18 30 2013-05-06 18:00:00
281 2402 18 20 2013-05-04 18:00:00
314 2565 19 29 2013-05-02 19:00:00
283 2454 17 55 2013-05-06 17:00:00
309 2565 20 55 2013-05-07 20:00:00
290 2475 7 0 2013-05-13 07:00:00
324 2586 10 30 2013-01-04 10:00:00

Fastest flights
Sort flights to find the fastest (highest speed) flights.
flights %>%
  mutate(speed = distance*1.609/(air_time/60)) %>%   # distance is in miles and air_time in minutes, so speed is in km/h
  relocate(speed, air_time, distance) %>%
  arrange(desc(speed)) %>%
  head(10) %>%
  pander(caption = "Ten highest speed (km/h) flights")

Ten highest speed (km/h) flights (continued below)
speed air_time distance year month day dep_time sched_dep_time
1132 65 762 2013 5 25 1709 1700
1046 93 1008 2013 7 2 1558 1513
1043 55 594 2013 5 13 2040 2025
1032 70 748 2013 3 23 1914 1910
951.6 105 1035 2013 1 12 1559 1600
907.5 170 1598 2013 11 17 650 655
896.9 172 1598 2013 2 21 2355 2358
895.3 175 1623 2013 11 17 759 800
891.7 173 1598 2013 11 16 2003 1925
891.7 173 1598 2013 11 16 2349 2359
Table continues below
dep_delay arr_time sched_arr_time arr_delay carrier flight tailnum
9 1923 1937 -14 DL 1499 N666DN
45 1745 1719 26 EV 4667 N17196
15 2225 2226 -1 EV 4292 N14568
4 2045 2043 2 EV 3805 N12567
-1 1849 1917 -28 DL 1902 N956DL
-5 1059 1150 -51 DL 315 N3768
-3 412 438 -26 B6 707 N779JB
-1 1212 1255 -43 AA 936 N5FFAA
38 17 36 -19 DL 347 N3773D
-10 402 440 -38 B6 1503 N571JB
origin dest hour minute time_hour
LGA ATL 17 0 2013-05-25 17:00:00
EWR MSP 15 13 2013-07-02 15:00:00
EWR GSP 20 25 2013-05-13 20:00:00
EWR BNA 19 10 2013-03-23 19:00:00
LGA PBI 16 0 2013-01-12 16:00:00
JFK SJU 6 55 2013-11-17 06:00:00
JFK SJU 23 58 2013-02-21 23:00:00
JFK STT 8 0 2013-11-17 08:00:00
JFK SJU 19 25 2013-11-16 19:00:00
JFK SJU 23 59 2013-11-16 23:00:00
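
As a check on the units, the speed formula can be evaluated by hand for the first three rows of the table. This is a small base R sketch with the distance and air_time values copied from the table; the variable names are illustrative.

```r
miles_to_km <- 1.609                      # conversion factor used in the solution
distance_miles <- c(762, 1008, 594)       # first three rows of the table
air_time_min <- c(65, 93, 55)

# miles -> km, minutes -> hours, then km divided by hours
speed_kmh <- distance_miles * miles_to_km / (air_time_min / 60)
print(round(speed_kmh))                   # 1132 1046 1043
```

The rounded values match the speed column, which confirms that the reported speeds are in km/h.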

Farthest and shortest distance flights


Which flights travelled the farthest? Which travelled the shortest?
panderOptions("table.alignment.default", "right")

flights %>%
  # distinct(distance, origin, dest) %>%
  count(distance, origin, dest, name = "Number of flights") %>%
  arrange(desc(distance)) %>%
  relocate(distance) %>%
  head(10) %>%
  pander(caption = "Ten farthest distance flights")

Ten farthest distance flights


distance origin dest Number of flights
4983 JFK HNL 342
4963 EWR HNL 365
3370 EWR ANC 8
2586 JFK SFO 8204
2576 JFK OAK 312
2569 JFK SJC 329
2565 EWR SFO 5127
2521 JFK SMF 284
2475 JFK LAX 11262
2465 JFK BUR 371

# panderOptions("table.alignment.default", "right")

flights %>%
  count(distance, origin, dest, name = "Number of flights") %>%
  arrange(distance) %>%
  relocate(distance) %>%
  head(10) %>%
  pander(caption = "Ten shortest distance flights")

Ten shortest distance flights


distance origin dest Number of flights
17 EWR LGA 1
80 EWR PHL 49
94 JFK PHL 976
96 LGA PHL 607
116 EWR BDL 443
143 EWR ALB 439
160 EWR PVD 376
169 EWR BWI 545
173 JFK MVY 221
184 JFK BWI 1221

Section 5.6.7
Which carrier has the worst delays? Challenge: can you disentangle the effects of
bad airports vs. bad carriers? Why/why not? (Hint: think about flights %>%
group_by(carrier, dest) %>% summarise(n()))

We should control for origin, destination, month, and flight time (rush hour or not, weekend or weekday) when we compare carriers with respect to flight delays. Let us calculate the mean delay for each carrier between every pair of departure and arrival airports.


d <- flights %>%

  mutate(month = factor(month),
         weekday = weekdays(time_hour),          # day names follow the current locale; the next line assumes English names
         weekend = weekday %in% c("Saturday", "Sunday"),
         rush_hour = (!weekend) & (between(arr_time, 700, 900) | between(arr_time, 1600, 1800) |
                                   between(dep_time, 700, 900) | between(dep_time, 1600, 1800))) %>%
  na.omit() %>%
  group_by(carrier, origin, dest, rush_hour, weekend, month) %>%
  summarize(delay = mean(pmax(arr_delay, 0), na.rm=TRUE),   # do not let negative arr_delay cancel out positive delays
            weight = n(),
            .groups = "drop")

Next let us fit a linear regression.


lmod <- lm(delay ~ carrier + origin + dest + rush_hour + weekend + month, d, weights = weight) %>%
  step(trace = FALSE)

lmod %>%
  pander(caption = "Model for delay versus carrier controlled for other variables")

Model for delay versus carrier controlled for other variables


Estimate Std. Error t value Pr(>|t|)
(Intercept) 15.9 4.963 3.203 0.001366
carrierAA -4.88 0.9277 -5.261 1.48e-07
carrierAS -7.941 3.331 -2.384 0.01716
carrierB6 -0.1769 0.7987 -0.2215 0.8247
carrierDL -5.284 0.8254 -6.402 1.635e-10
carrierEV 3.905 0.8249 4.734 2.249e-06
carrierF9 8.462 3.223 2.626 0.008671
carrierFL 5.312 1.889 2.812 0.004936
carrierHA -9.531 5.892 -1.618 0.1058
carrierMQ -1.208 0.8634 -1.399 0.1617
carrierOO 8.627 14.38 0.5999 0.5486
carrierUA -4.485 0.9125 -4.916 9.054e-07
carrierUS -7.707 1.029 -7.493 7.574e-14
carrierVX -2.623 1.417 -1.851 0.06428
carrierWN -3.027 1.273 -2.378 0.01743
carrierYV 8.076 3.458 2.336 0.01953
originJFK -0.0928 0.4992 -0.1859 0.8525
originLGA -1.265 0.4399 -2.875 0.004053
destACK -4.45 6.786 -0.6558 0.512
destALB 4.402 6.181 0.7121 0.4764
destANC 0.2404 27.73 0.00867 0.9931
destATL 8.234 4.92 1.673 0.09429
destAUS 6.234 5.106 1.221 0.2222
destAVL -0.2713 6.841 -0.03966 0.9684
destBDL -2.521 6.197 -0.4068 0.6841
destBGR 0.2402 6.382 0.03763 0.97
destBHM 7.29 6.8 1.072 0.2837
destBNA 7.38 4.986 1.48 0.1389
destBOS 3.276 4.901 0.6685 0.5038
destBQN -2.343 5.497 -0.4263 0.6699
destBTV 0.0535 5.094 0.0105 0.9916
destBUF 2.468 4.984 0.4951 0.6206
destBUR 9.412 6.294 1.495 0.1349
destBWI 6.235 5.234 1.191 0.2336
destBZN 6.916 13.93 0.4964 0.6196
destCAE 23.11 8.952 2.581 0.009861
destCAK 4.789 5.813 0.8237 0.4101
destCHO 2.878 12.4 0.2321 0.8165
destCHS 4.41 5.09 0.8664 0.3863
destCLE 7.133 5.013 1.423 0.1548
destCLT 7.275 4.937 1.474 0.1406
destCMH 6.148 5.076 1.211 0.2259
destCRW 7.318 8.285 0.8833 0.3771
destCVG 9.784 5.05 1.938 0.05272
destDAY 4.771 5.309 0.8987 0.3689
destDCA 7.128 4.954 1.439 0.1503
destDEN 9.24 4.967 1.86 0.06289
destDFW 5.648 4.966 1.137 0.2555
destDSM 8.188 5.941 1.378 0.1682
destDTW 5.16 4.945 1.043 0.2968
destEGE 8.916 7.255 1.229 0.2191
destEYW -2.465 19.35 -0.1274 0.8986
destFLL 5.863 4.905 1.195 0.2321
destGRR 10.33 5.671 1.821 0.06866
destGSO 6.929 5.287 1.311 0.1901
destGSP 8.878 5.611 1.582 0.1137
destHDN 4.365 21.2 0.2059 0.8369
destHNL 3.276 6.356 0.5155 0.6062
destHOU 6.929 5.188 1.336 0.1817
destIAD 5.206 4.994 1.042 0.2973
destIAH 6.864 4.965 1.383 0.1668
destILM 1.446 8.933 0.1618 0.8714
destIND 4.424 5.183 0.8535 0.3934
destJAC 17.61 17.54 1.004 0.3153
destJAX 5.128 5.09 1.007 0.3138
destLAS 2.163 4.961 0.4359 0.6629
destLAX 2.607 4.906 0.5314 0.5952
destLEX -9.864 77.35 -0.1275 0.8985
destLGB 0.882 5.701 0.1547 0.877
destMCI 8.225 5.201 1.582 0.1138
destMCO 4.444 4.899 0.9071 0.3644
destMDW 9.406 5.125 1.835 0.06651
destMEM 6.438 5.234 1.23 0.2188
destMHT 3.318 5.508 0.6025 0.5469
destMIA 4.56 4.932 0.9244 0.3553
destMKE 8.166 5.121 1.595 0.1108
destMSN 10.29 5.888 1.747 0.08071
destMSP 7.013 4.966 1.412 0.1579
destMSY 6.182 5.022 1.231 0.2184
destMTJ 0.8145 21.2 0.03842 0.9694
destMVY -6.533 7.206 -0.9066 0.3646
destMYR 4.781 11.26 0.4245 0.6712
destOAK 5.057 6.541 0.7731 0.4395
destOKC 15.21 6.546 2.323 0.02018
destOMA 6.177 5.589 1.105 0.2691
destORD 8.901 4.91 1.813 0.06988
destORF 7.445 5.293 1.407 0.1596
destPBI 6.585 4.95 1.33 0.1835
destPDX 7.727 5.297 1.459 0.1447
destPHL 8.306 5.273 1.575 0.1152
destPHX 6.278 5.009 1.253 0.2101
destPIT 4.672 5.098 0.9163 0.3595
destPSE -3.373 6.333 -0.5326 0.5943
destPSP -13.3 18.87 -0.7047 0.481
destPVD 4.083 6.371 0.6408 0.5217
destPWM 2.957 5.114 0.5782 0.5631
destRDU 5.433 4.961 1.095 0.2734
destRIC 9.907 5.143 1.926 0.0541
destROC 4.264 5.106 0.8352 0.4036
destRSW 1.039 5.026 0.2068 0.8362
destSAN 7.767 5.083 1.528 0.1266
destSAT 11.97 5.723 2.091 0.03656
destSAV 5.449 5.648 0.9646 0.3348
destSBN -0.5733 24.9 -0.02303 0.9816
destSDF 4.53 5.414 0.8367 0.4028
destSEA 4.159 5.051 0.8234 0.4103
destSFO 6.096 4.916 1.24 0.215
destSJC -1.433 6.452 -0.2221 0.8242
destSJU 1.31 4.958 0.2642 0.7917
destSLC 1.004 5.113 0.1963 0.8444
destSMF 2.989 6.678 0.4476 0.6545
destSNA -2.515 5.58 -0.4508 0.6521
destSRQ 4.027 5.344 0.7535 0.4511
destSTL 8.307 5.034 1.65 0.09899
destSTT 1.283 5.943 0.2159 0.8291
destSYR 1.901 5.196 0.3659 0.7144
destTPA 5.731 4.938 1.16 0.2459
destTUL 18.15 6.648 2.73 0.006357
destTVC 5.122 9.312 0.5501 0.5823
destTYS 12.77 5.853 2.182 0.02916
destXNA 6.342 5.48 1.157 0.2471
rush_hourTRUE -10.07 0.2773 -36.32 2.411e-264
month2 -0.1341 0.6915 -0.194 0.8462
month3 1.24 0.6633 1.87 0.06159
month4 4.789 0.6656 7.196 6.884e-13
month5 0.5171 0.6626 0.7804 0.4352
month6 10.28 0.6689 15.38 1.786e-52
month7 10.79 0.662 16.3 1.259e-58
month8 1.551 0.6595 2.352 0.01868
month9 -4.696 0.6698 -7.01 2.603e-12
month10 -4.609 0.6604 -6.979 3.243e-12
month11 -4.767 0.6698 -7.116 1.22e-12
month12 6.604 0.6694 9.865 8.414e-23

How well did the model fit?

lmod %>%
  glance() %>%
  pander(caption = "Model performance")

Model performance (continued below)


r.squared adj.r.squared sigma statistic p.value df logLik AIC
0.3618 0.3495 77.19 29.39 0 132 -28723 57714
BIC deviance df.residual nobs
58632 40785482 6845 6978

The adjusted R2 is about 35% (0.3495), which is weak. The ratio of the residual standard error (RSE) to the mean delay, 279%, is huge. Other factors may contribute to the variation in delay. We cannot use the model for delay prediction, but the model can still be useful for comparing carriers.
lmod %>%
  tidy() %>%
  head(10) %>%
  pander(caption = "A glimpse of the model parameters")

A glimpse of the model parameters


term estimate std.error statistic p.value
(Intercept) 15.9 4.963 3.203 0.001366
carrierAA -4.88 0.9277 -5.261 1.48e-07
carrierAS -7.941 3.331 -2.384 0.01716
carrierB6 -0.1769 0.7987 -0.2215 0.8247
carrierDL -5.284 0.8254 -6.402 1.635e-10
carrierEV 3.905 0.8249 4.734 2.249e-06
carrierF9 8.462 3.223 2.626 0.008671
carrierFL 5.312 1.889 2.812 0.004936
carrierHA -9.531 5.892 -1.618 0.1058
carrierMQ -1.208 0.8634 -1.399 0.1617

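Since we are controlling for many aspects of the flights, comparisons of carrier estimates are fair. Each estimate is measured relative to the baseline carrier 9E, which therefore does not appear in the table. The estimates have different standard errors, however, so it is better to compare the t-statistics, namely the carrier estimates divided by their standard errors.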
lmod %>%
  tidy() %>%                                             # broom package: extracts coefficients and stores them in a tidy tibble
  filter(str_detect(term, "carrier")) %>%                # focus on carriers
  mutate(term = str_replace(term, "carrier", "")) %>%    # carrier short names
  left_join(airlines, by = c("term" = "carrier")) %>%    # carrier long names
  ggplot(aes(reorder(name, statistic), statistic)) +     # reorder the airlines with respect to t-statistic
  geom_linerange(aes(ymin=0, ymax=statistic)) +          # like a bar chart
  geom_hline(yintercept = c(-2,2),                       # any effect within these bounds is statistically indistinguishable from zero
             col="red", lty="dashed") +
  coord_flip() +                                         # long airline names do not fit along the horizontal axis, flip the axes
  labs(title = "Delay propensity of airlines", y = "t-statistic", x = NULL,
       subtitle = "(Negative values indicate delays are less or least likely)")
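
The t-statistic plotted above is the estimate divided by its standard error. The following base R sketch recomputes it for four carriers, with estimates and standard errors copied from the coefficient table; the cutoff of 2 mirrors the dashed lines in the plot and is a rule of thumb, not an exact test.

```r
# Estimates and standard errors copied from the coefficient table.
estimate  <- c(AA = -4.88, DL = -5.284, EV = 3.905, B6 = -0.1769)
std_error <- c(AA = 0.9277, DL = 0.8254, EV = 0.8249, B6 = 0.7987)

t_stat <- estimate / std_error
print(round(t_stat, 2))

# Rule of thumb matching the dashed lines at -2 and 2 in the plot.
significant <- abs(t_stat) > 2
print(significant)
```

Small differences from the tabulated t values come from the rounding of the printed estimates. Under this rule of thumb AA, DL, and EV differ from the baseline carrier, while B6 does not.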


Section 5.7.1

Worst on-time records


Which plane (tailnum) has the worst on-time record?
flights %>%
  group_by(tailnum) %>%
  summarize(worst_on_time = max(arr_delay, na.rm=TRUE), .groups = "drop") %>%
  top_n(worst_on_time, n = 10) %>%
  ggplot(aes(reorder(tailnum, desc(worst_on_time)), worst_on_time)) +
  geom_linerange(aes(ymin=0, ymax=worst_on_time)) +
  geom_text(aes(label=worst_on_time), vjust = -.5) +
  labs(title = "Worst on-time flight tail numbers", y="Delay", x="Tail number")

Warning in max(arr_delay, na.rm = TRUE): no non-missing arguments to max;
returning -Inf

(The same warning is printed seven times.)
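
This warning appears when a tail number has no recorded arr_delay at all: after na.rm = TRUE removes the missing values, max() is left with an empty vector and returns -Inf. A minimal base R sketch of the effect and of one possible way to avoid it follows; `toy` is made-up data, and filtering before aggregating is a suggestion rather than the author's code.

```r
# max() of an all-NA vector with na.rm = TRUE falls back to -Inf (with a warning).
empty_max <- suppressWarnings(max(c(NA_real_, NA_real_), na.rm = TRUE))
print(empty_max)   # -Inf

# Toy stand-in for flights (made-up values); N2 has no recorded arr_delay.
toy <- data.frame(
  tailnum   = c("N1", "N1", "N2", "N2"),
  arr_delay = c(15, 40, NA, NA)
)

# Dropping rows with a missing arr_delay first means no group is left empty.
observed <- toy[!is.na(toy$arr_delay), ]
worst <- aggregate(arr_delay ~ tailnum, data = observed, FUN = max)
print(worst)
```

In the dplyr pipeline the analogous step would be a filter(!is.na(arr_delay)) before group_by(tailnum). The -Inf groups do not affect the top ten, so the plot itself is unchanged.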


Avoid delays
What time of day should you fly if you want to avoid delays as much as possible?
d2 <- flights %>%
  mutate(dep_time_cut = cut(time_hour, breaks = "30 min")) %>%
  group_by(dep_time_cut) %>%
  summarize(median_delay = median(pmax(arr_delay, 0), na.rm=TRUE),
            ucl95 = quantile(pmax(arr_delay, 0), probs = 0.95, na.rm=TRUE),
            lcl05 = quantile(pmax(arr_delay, 0), probs = 0.05, na.rm=TRUE), .groups="drop") %>%
  mutate(date_time = ymd_hms(as.character(dep_time_cut)),   # convert factor label to character and then to datetime
         date = format(date_time, "%Y-%m-%d"))              # record the date only in a variable for filtering later

travel_date <- "2013-07-01"                                 # choose a travel date (in 2013)

d2 %>%
  filter(date == travel_date) %>%                           # filter to travel date
  mutate_at(vars(median_delay, ucl95, lcl05), ~./60) %>%    # convert minutes to hours
  ggplot(aes(date_time, group=1)) +                         # group = 1 is needed to have points connected
  geom_line(aes(y=median_delay), col = "blue") +
  geom_ribbon(aes(ymin=lcl05, ymax = ucl95), fill = "pink", alpha =0.5) +
  scale_x_datetime(date_breaks = "1 hour", date_labels = "%H") +   # horizontal axis ticks at each hour within the day
  labs(title = format(ymd(travel_date), "%d %B %Y %A"),
       subtitle = "Median delay and 5th to 95th percentile band",
       x="Time (hour)", y="Delay (hour)")
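
The three summary statistics used for the plot can be illustrated on a handful of numbers. The sketch below uses base R only; the delays are made-up values, and quantile() is called with its default interpolation method.

```r
# Made-up arrival delays in minutes; negative values are early arrivals.
arr_delay <- c(-12, -5, 0, 8, 20, 45, 130, NA)

# Treat early arrivals as zero delay, as in the solution.
clipped <- pmax(arr_delay, 0)

median_delay <- median(clipped, na.rm = TRUE)
band <- quantile(clipped, probs = c(0.05, 0.95), na.rm = TRUE)

print(median_delay)   # 8
print(band)           # 5%: 0, 95%: 104.5
```

The band spans the 5th and 95th percentiles of the observed delays, so it describes the spread of individual flights rather than the uncertainty of the median.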
