Professional Documents
Culture Documents
Ds Capstone Template Coursera
Ds Capstone Template Coursera
Outline
• Executive Summary
• Introduction
• Methodology
• Results
• Conclusion
• Appendix
2
Executive Summary (1/3)
Overall context of the project
• The SpaceX company typically offers to launch rockets to space at a fraction of the cheapest proposal of other
competitors.
• In this project, the cost of launch by SpaceX is 62 M€ which is at least 62.4% cheaper than the closest
competitors.
• Savings on the SpaceX side is the result of being able to reuse the first stage orbital rocket several times.
• Thus, a competitor can gain some leverage IF they can predict the outcome of the first-stage landing —
failure or success — based on available data.
• As such, a competitor can determine if they should make a bid at all and how much they should bid, based on the
predictions.
3
Executive Summary (2/3)
The challenge at hand:
• The main challenge is the accurate prediction of the chance of success/failure of the first landing stage of SpaceX Falcon 9 given
a collection of publicly available dataset.
• How to obtain this data which potentially is in a highly unstructured format and standardize it.
• How to contextualize and explore the gleaned data
• To find the best method, its hyperparameters and the corresponding tuning, that can predict the outcome of the landing stage with
the best accuracy.
Summary of methodologies:
• Clearly, the expected outcome is a discrete variable — a variable that can take only two values.
• Therefore, the best methodologies are those for discrete variable prediction.
• Four methodologies are adopted and compared to determine the best
• K-nearest neighbors
• SVM
• Classification trees 4
• Logistic regression
Executive Summary (3/3)
Summary of results:
• Blah blah……
5
Introduction (1/2)
• The space race has intensified over the last decade with novel innovations in space transportation.
• A significant challenge has been the costs of transporting payloads — satellites, materials, communications
systems, etc. — to space.
• The SpaceX company thought they could save cost by innovating on the transport system.
• Usually, after delivering the payload, the transport system is either completely destroyed or unusable.
• SpaceX produced the idea of reusing a stage of the transport mechanism at least several times.
• The results is significant cost savings — by more than 60% in some cases.
• This puts SpaceX in a significant position to underbid competitors.
• However, there is a chance that for numerous reasons, the main component of the reusable stage is
destroyed during transition, and by extension not reusable.
• This gives competitors a slight advantage, if they can predict the outcome of the reusable stage given public
data set.
• Then, they can outbid SpaceX. 6
Introduction (2/2)
7
Section 1
8
Methodology
Executive Summary
•Data collection methodology:
•Raw data was collected through an API request to launch data from SpaceX
•Perform data wrangling
•Data was wrangled exclusively with the Pandas library.
•Perform exploratory data analysis (EDA) using visualization and SQL
•Perform interactive visual analytics using Folium and Plotly Dash
•Perform predictive analysis using classification models
•How to build, tune, evaluate classification models
9
Data Collection Process
• Add the GitHub URL of the Place your flowchart of web scraping here
completed web scraping
notebook, as an external
reference and peer-review
purpose
Summarize what charts were plotted and why you used those charts
Add the GitHub URL of your completed EDA with data visualization notebook, as
an external reference and peer-review purpose
GitHub URL of EDA with data visualization notebook (Ctrl+click here to go to url!) 14
EDA with SQL
Load csv data into Db2, confirm data structure, and adjust if required.
Connect to Db2 from Jupyter notebook and perform high-level table checks
Data queries:
Select unique launch sites from the database
Isolate 5 records corresponding to launch sites with names starting with ‘CCA’
Determine the earliest date of the first successful landing outcome with ground pad
List the names of booster version that carried the maximum payload
List the failed landing outcomes using drone ship, their booster versions and launch site for year 2015
Summarize what map objects such as markers, circles, lines, etc. you created and added to a
folium map
Explain why you added those objects
Add the GitHub URL of your completed interactive map with Folium map, as an external
reference and peer-review purpose
16
Build a Dashboard with Plotly Dash
17
Predictive Analysis (Classification)
Summarize how you built, evaluated, improved, and found the best performing
classification model
You need present your model development process using key phrases and flowchart
Add the GitHub URL of your completed predictive analysis lab, as an external
reference and peer-review purpose
18
Results
19
Section 2
Flight Number vs. Launch Site
Scatter plot Flight Number vs. Launch Site
21
X(Payload) tended to launch from CCAFS SLC 40 & KSC LC 39A # Payloads less than 8000 kg tended to fail at a higher rate when launched from CCAFS SLC 40, # plausibly due to that launch site being used for R&D versus the other two launch # sites used with
• # Payloads that approach MAX(Payload) tended to launch from CCAFS SLC 40 & KSC LC 39A
• # Payloads less than 8000 kg tended to fail at a higher rate when launched from CCAFS SLC 40,
• # plausibly due to that launch site being used for R&D versus the other two launch
• # sites used with less failure-tolerant payloads.
22
Success Rate vs. Orbit Type
Bar chart for Orbital Success Rate
23
Flight Number vs. Orbit Type
Scatter plot for Flight Number vs. Orbit Type
25
Launch Success Yearly Trend
Year-on-Year Success Trend Plot
26
All Launch Site Names
Query:
%sql select Unique(LAUNCH_SITE) from SPACEXTBL;
27
Launch Site Names Begin with 'CCA'
Query:
%sql SELECT LAUNCH_SITE from SPACEXTBL where (LAUNCH_SITE) LIKE 'CCA%'
LIMIT 5
28
Total Payload Mass
Query:
%sql select sum(PAYLOAD_MASS__KG_) as payloadmass from SPACEXTBL;
Results:
619, 967 kg
29
Average Payload Mass by F9 v1.1
Query:
%sql SELECT AVG(PAYLOAD_MASS__KG_) FROM SPACEXTBL WHERE BOOSTER_VERSION = 'F9
v1.1’
30
First Successful Ground Landing Date
Query:
%sql SELECT MIN(DATE) FROM SPACEXTBL WHERE LANDING__OUTCOME = 'Success (ground
pad)'
31
Successful Drone Ship Landing with Payload between 4000 and 6000
Query:
%sql SELECT booster_version FROM SPACEXTBL WHERE LANDING__OUTCOME = 'Success (drone ship)' AND 4000 <
PAYLOAD_MASS__KG_ < 6000;
Names of boosters with successful outcomes with payload between 4000-6000 kg:
F9 FT B1021.1
F9 FT B1023.1
F9 FT B1029.2
F9 FT B1038.1
F9 B4 B1042.1
F9 B4 B1045.1
F9 B5 B1046.1
32
Total Number of Successful and Failure Mission Outcomes
Query:
%sql select count(MISSION_OUTCOME) as mission_outcomes from SPACEXTBL GROUP BY
MISSION_OUTCOME;
33
Boosters Carried Maximum Payload
Query:
%sql select BOOSTER_VERSION as boosterversion from SPACEXTBL where PAYLOAD_MASS__KG_=(select max(PAYLOAD_MASS__KG_) from
SPACEXTBL)
Query:
%sql select date, booster_version, landing__outcome, launch_site from SPACEXTBL where
landing__outcome like 'Failure (drone ship)' and DATE like '2015%’
35
Rank Landing Outcomes Between 2010-06-04 and 2017-03-20
Query:
%sql SELECT LANDING__OUTCOME FROM SPACEXTBL WHERE DATE BETWEEN '2010-06-04' AND
'2017-03-20' ORDER BY DATE DESC;
Results:
38
Geographic Cluster location
Explore the folium map and make a proper screenshot to show the color-labeled
launch outcomes on the map
39
Distance to Proximities
Explore the generated folium map and show the screenshot of a selected
launch site to its proximities such as railway, highway, coastline, with
distance calculated and displayed
40
Section 4
SpaceX Launch Records Dashboard
42
Success Ratio for Site with Highest Launch Success
43
Payload vs. Launch Outcome Dashboard
44
Section 5
Classification Accuracy
46
Confusion Matrix - KNN
• Show the confusion matrix of the best performing model with an explanation
47
Conclusions
Point 1
Point 2
Point 3
Point 4
…
48
Appendix
Include any relevant assets like Python code snippets, SQL queries, charts, Notebook
outputs, or data sets that you may have created during this project
49