IBM Capstone SpaceY Taylor Collard
17 May 2024
Outline
• Executive Summary
• Introduction
• Methodology
• Results
• Conclusion
• Appendix
2
Executive Summary
• Methodologies used:
○ API calls and Web Scraping
○ SQL queries
○ Python visualization libraries
○ Simple Linear Regression
○ K-Nearest Neighbors
○ Decision Trees
○ Logistic Regression
○ Support Vector Machines
3
Introduction
• SpaceX's advantage comes from reusing the first stage of its rockets,
saving clients over 100 million dollars per launch
• SpaceX advertises Falcon 9 rocket launches on its website with a cost of 62
million dollars; other providers cost upwards of 165 million dollars each
• Therefore, if we can predict whether the first stage will land, we can estimate
the cost of a SpaceX launch and use this information to our advantage.
4
Section
1
5
Methodology
Executive Summary
• Data collection methodology:
• Data was collected from the SpaceX API and by web scraping public data
(Wikipedia) on SpaceX launches.
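The scraping step can be sketched with the standard library alone. This is a minimal illustration, not the notebook's actual code: the `TableRowParser` class, the sample markup, and its column names are made up here, and a real page would first be downloaded and would need more cleanup.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows
        self._row = None    # row currently being read
        self._cell = None   # text fragments of the current cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


# Made-up markup standing in for a downloaded launch table.
SAMPLE_HTML = """
<table>
  <tr><th>Flight No.</th><th>Launch site</th><th>Booster landing</th></tr>
  <tr><td>1</td><td>Site A</td><td>Failure</td></tr>
  <tr><td>2</td><td>Site B</td><td>Success</td></tr>
</table>
"""

parser = TableRowParser()
parser.feed(SAMPLE_HTML)
header, *records = parser.rows
launches = [dict(zip(header, row)) for row in records]
print(launches)
```

Each row becomes a dictionary keyed by the header cells, which is a convenient shape for loading into a data frame later.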
7
Data Collection - Scraping
8
Data Wrangling
9
EDA with Data Visualization
11
Build an Interactive Map with Folium
12
Build a Dashboard with Plotly Dash
• Pie charts show the success/failure split for each launch site and each
site's share of all successful launches, to check whether launch location
appears related to outcome
• A categorical scatter plot was used to visualize the relationship between
booster type, payload, and success rates across all locations.
• GitHub URL of completed Plotly Dash lab (code only), for external
reference and peer review
13
Predictive Analysis (Classification)
14
Results
• EDA
○ Launch year has the strongest relationship with success
○ No strong correlation between orbit type and success
○ Some correlation between launch site and success rate
■ However, more launches were at CCAFS SLC-40 earlier on, when
failure was more common
■ The same location has more successes than other locations later on
• Interactive analytics demo in screenshots on later slides
• Predictive analysis results
○ All models show ~83% accuracy on the test data
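The EDA point about launch site and year can be checked with a simple grouped success rate. The sketch below uses made-up records (site names, years, and outcomes are hypothetical) and a hypothetical helper `success_rate`; it only illustrates how a site can look weak overall yet strong once early launches are excluded.

```python
from collections import defaultdict

# Hypothetical (site, year, landed) records; landed is 1 or 0.
launches = [
    ("Site A", 2013, 0), ("Site A", 2014, 0), ("Site A", 2015, 0),
    ("Site A", 2017, 1), ("Site A", 2017, 1),
    ("Site B", 2016, 1), ("Site B", 2017, 1), ("Site B", 2017, 0),
]


def success_rate(records, key):
    """Return {group: share of successful landings}, grouped by key(record)."""
    totals = defaultdict(lambda: [0, 0])  # group -> [successes, launches]
    for rec in records:
        group = key(rec)
        totals[group][0] += rec[2]
        totals[group][1] += 1
    return {g: s / n for g, (s, n) in sorted(totals.items())}


by_site = success_rate(launches, key=lambda r: r[0])
by_year = success_rate(launches, key=lambda r: r[1])

# Restrict to later launches before comparing sites again.
late = [r for r in launches if r[1] >= 2016]
late_by_site = success_rate(late, key=lambda r: r[0])

print(by_site)
print(by_year)
print(late_by_site)
```

In this toy data Site A has the lower overall rate but the higher rate among later launches, which is the pattern described in the bullets above.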
15
Section
2
Flight Number vs. Launch Site
17
Payload vs. Launch Site
18
Success Rate vs. Orbit Type
19
Flight Number vs. Orbit Type
20
Payload vs. Orbit Type
21
Launch Success Yearly Trend
22
All Launch Site Names
23
Launch Site Names Begin with 'CCA'
24
Total Payload Mass
25
Average Payload Mass by F9 v1.1
26
First Successful Ground Landing Date
27
Successful Drone Ship Landing with Payload between 4000 and 6000
28
Total Number of Successful and Failure Mission Outcomes
29
Boosters Carried Maximum Payload
• A subquery selects all rows carrying the maximum payload mass. All of them are
F9 B5 boosters
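A subquery of this kind can be sketched with Python's built-in `sqlite3` module. The table name `launches`, its two columns, and the rows below are assumptions made for illustration; they are not the schema or data from the lab.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE launches (booster_version TEXT, payload_mass_kg INTEGER)"
)
# Made-up rows: two boosters share the maximum payload mass.
conn.executemany(
    "INSERT INTO launches VALUES (?, ?)",
    [
        ("Booster X1", 5000),
        ("Booster X2", 9000),
        ("Booster X3", 9000),
        ("Booster X4", 7500),
    ],
)

# The inner SELECT finds the maximum; the outer query keeps every row matching it.
query = """
    SELECT booster_version
    FROM launches
    WHERE payload_mass_kg = (SELECT MAX(payload_mass_kg) FROM launches)
    ORDER BY booster_version
"""
max_payload_boosters = [row[0] for row in conn.execute(query)]
print(max_payload_boosters)
conn.close()
```

Using a subquery instead of `ORDER BY ... LIMIT 1` keeps every booster tied for the maximum, not just one of them.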
30
2015 Launch Records
31
Rank Landing Outcomes Between 2010-06-04 and 2017-03-20
32
Section
3
Launch Locations
• Launch locations marked and labeled. Florida locations overlapped due to proximity.
34
Landing result marker clusters
• Marker cluster elements group launch outcomes to reduce clutter. Green: successful landings; red: failed landings
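A colored marker cluster like the one described can be sketched as follows. This is an illustration under assumptions: the helper names, map center, zoom level, and sample coordinates are hypothetical, and `folium` is imported inside the function so the color logic can be read and run without it.

```python
def marker_color(landed):
    """Green for a successful landing (1), red for a failed one (0)."""
    return "green" if landed == 1 else "red"


def build_outcome_map(launches, center=(28.5, -80.6), zoom=5):
    """Return a folium.Map with one clustered, colored marker per launch.

    `launches` is an iterable of (lat, lon, landed) tuples.
    """
    import folium  # third-party package, imported lazily
    from folium.plugins import MarkerCluster

    site_map = folium.Map(location=list(center), zoom_start=zoom)
    # The cluster collapses nearby markers until the user zooms in.
    cluster = MarkerCluster().add_to(site_map)
    for lat, lon, landed in launches:
        folium.Marker(
            location=[lat, lon],
            icon=folium.Icon(color=marker_color(landed)),
        ).add_to(cluster)
    return site_map


# Hypothetical coordinates and outcomes, for illustration only.
sample = [(28.56, -80.58, 1), (28.56, -80.58, 0), (34.63, -120.61, 1)]
colors = [marker_color(landed) for _, _, landed in sample]
print(colors)
```

Calling `build_outcome_map(sample)` in an environment with `folium` installed would return a map object that can be displayed in a notebook or saved to HTML.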
36
Section
4
Share of Successful Launches by Site
• Current view shows almost half of all successful landings come from CCAFS SLC-40
• Choosing a specific launch site from the dropdown shows that site's share of successes vs. failures
38
CCAFS SLC-40 has highest success ratio
• Although CCAFS SLC-40 has almost half of all successful landings, it still
has only a 42.9% success rate
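The two pie-chart views answer different questions: a site's share of all successes versus its own success rate. A small sketch with made-up counts (the site names and numbers are hypothetical, chosen only so that one site has a large share but a modest rate):

```python
# Hypothetical per-site counts: (successful landings, total launches).
site_counts = {
    "Site A": (6, 14),
    "Site B": (4, 5),
    "Site C": (3, 6),
}

total_successes = sum(s for s, _ in site_counts.values())

# Share of all successes contributed by each site (the "all sites" pie).
share_of_successes = {
    site: s / total_successes for site, (s, _) in site_counts.items()
}

# Success rate within each site (the pie shown after picking one site).
success_rate = {site: s / n for site, (s, n) in site_counts.items()}

for site in site_counts:
    print(site, f"share={share_of_successes[site]:.3f}", f"rate={success_rate[site]:.3f}")
```

A site with many launches can dominate the first chart while ranking lower on the second, so the two should be read together.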
39
Payload vs. Success Rate (all sites)
• This slider shows that in the 1000 to 6000 kg range across all sites, the FT booster has
markedly higher success rates than the v1.1 booster
40
Section
5
Classification Accuracy
An example of the .score method
• All models had the same accuracy on the test data according to the score
method, likely because the test set is small
42
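Why a small test set can produce identical scores: accuracy on n samples can only take n + 1 distinct values, so different models easily land on the same one. The sketch below assumes a test set of 18 samples; that size is an assumption for illustration and is not stated on the slides.

```python
def possible_accuracies(n_test):
    """All accuracy values a classifier can score on n_test samples."""
    return [correct / n_test for correct in range(n_test + 1)]


N_TEST = 18  # assumed test-set size, not stated in the slides

levels = possible_accuracies(N_TEST)
step = levels[1] - levels[0]

print(f"{len(levels)} possible scores, step = {step:.3f}")
print(f"15 of {N_TEST} correct -> {15 / N_TEST:.3f}")
```

Under that assumption, two models would need to differ by at least one whole test sample before their scores diverge.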
Classification Accuracy
Code used to cross-validate
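As a rough illustration of what cross-validation does (this is not the code from the slide), the sketch below splits the data into k folds, trains on k - 1 of them, and scores on the held-out fold. The function names, the toy majority-class model, and the data are all hypothetical; in practice a library routine such as scikit-learn's grid search with cross-validation would typically be used.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, val_idx) lists for k contiguous folds."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, val
        start += size


def cross_val_accuracy(X, y, fit, predict, k=5):
    """Mean validation accuracy over k folds."""
    scores = []
    for train, val in k_fold_indices(len(y), k):
        model = fit([X[i] for i in train], [y[i] for i in train])
        hits = sum(predict(model, X[i]) == y[i] for i in val)
        scores.append(hits / len(val))
    return sum(scores) / len(scores)


# Toy stand-in model: always predict the most common training label.
def fit_majority(X_train, y_train):
    return max(set(y_train), key=y_train.count)


def predict_majority(model, x):
    return model


# Made-up features and labels.
X = list(range(10))
y = [1, 1, 0, 1, 1, 1, 0, 1, 1, 0]
mean_acc = cross_val_accuracy(X, y, fit_majority, predict_majority, k=5)
print(f"mean validation accuracy: {mean_acc:.2f}")
```

Averaging over folds gives a steadier estimate than a single small test split, since every sample is used for validation exactly once.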
45
Appendix
46