Download as pptx, pdf, or txt
Download as pptx, pdf, or txt
You are on page 1of 50

16/05/2022

Outline

• Executive Summary
• Introduction
• Methodology
• Results
• Conclusion
• Appendix

2
Executive Summary (1/3)
Overall context of the project
• The SpaceX company typically offers to launch rockets to space at a fraction of the cheapest proposal of other
competitors.
• In this project, the cost of launch by SpaceX is 62 M€ which is at least 62.4% cheaper than the closest
competitors.
• Savings on the SpaceX side is the result of being able to reuse the first stage orbital rocket several times.
• Thus, a competitor can gain some leverage IF they can predict the outcome of the first-stage landing —
failure or success — based on available data.
• As such, a competitor can determine if they should make a bid at all and how much they should bid, based on the
predictions.

3
Executive Summary (2/3)
The challenge at hand:
• The main challenge is the accurate prediction of the chance of success/failure of the first landing stage of SpaceX Falcon 9 given
a collection of publicly available dataset.
• How to obtain this data which potentially is in a highly unstructured format and standardize it.
• How to contextualize and explore the gleaned data
• To find the best method, its hyperparameters and the corresponding tuning, that can predict the outcome of the landing stage with
the best accuracy.

Summary of methodologies:
• Clearly, the expected outcome is a discrete variable — a variable that can take only two values.
• Therefore, the best methodologies are those for discrete variable prediction.
• Four methodologies are adopted and compared to determine the best
• K-nearest neighbors

• SVM

• Classification trees 4
• Logistic regression
Executive Summary (3/3)
Summary of results:
• Blah blah……

5
Introduction (1/2)
• The space race has intensified over the last decade with novel innovations in space transportation.
• A significant challenge has been the costs of transporting payloads — satellites, materials, communications
systems, etc. — to space.
• The SpaceX company thought they could save cost by innovating on the transport system.
• Usually, after delivering the payload, the transport system is either completely destroyed or unusable.
• SpaceX produced the idea of reusing a stage of the transport mechanism at least several times.
• The results is significant cost savings — by more than 60% in some cases.
• This puts SpaceX in a significant position to underbid competitors.
• However, there is a chance that for numerous reasons, the main component of the reusable stage is
destroyed during transition, and by extension not reusable.
• This gives competitors a slight advantage, if they can predict the outcome of the reusable stage given public
data set.
• Then, they can outbid SpaceX. 6
Introduction (2/2)

• The problem is then 3-fold:


• Where to find publicly available information, clean, and structure it?
• How to contextualize and explore the data?
• Apply a retinue of applicable machine learning methodologies to determine the best to predict the landing
outcome, with accuracy as the metric.

7
Section 1

8
Methodology

Executive Summary
•Data collection methodology:
•Raw data was collected through an API request to launch data from SpaceX
•Perform data wrangling
•Data was wrangled exclusively with the Pandas library.
•Perform exploratory data analysis (EDA) using visualization and SQL
•Perform interactive visual analytics using Folium and Plotly Dash
•Perform predictive analysis using classification models
•How to build, tune, evaluate classification models
9
Data Collection Process

Raw data was collected through an API request to launch data


from SpaceX.
API request returned a json object of the data.
Json object is parsed and normalized using the Pandas library and
a structured ‘dataframe’ returned as the output.
Note:
Raw data from SpaceX API mainly returns unique IDs of elements within
each data column.
These unique IDs are like ‘keys’ to extract the actual desired data
This unique IDs are then used in another API call to extract the desired
data 10
Data Collection – SpaceX API
Flowchart of SpaceX API calls

Place your flowchart of SpaceX API calls here

GitHub URL of API call notebook (Ctrl+click here to go to url!) 11


Data Collection - Scraping
Flowchart of Webscraping Activity
• Present your web scraping
process using key phrases and
flowcharts

• Add the GitHub URL of the Place your flowchart of web scraping here
completed web scraping
notebook, as an external
reference and peer-review
purpose

GitHub URL of webscraping notebook (Ctrl+click here to go to url!) 12


Data Wrangling

Describe how data were processed


You need to present your data wrangling process using key phrases and
flowcharts
Add the GitHub URL of your completed data wrangling related notebooks, as
an external reference and peer-review purpose

GitHub URL of data wrangling notebook (Ctrl+click here to go to url!) 13


EDA with Data Visualization

Summarize what charts were plotted and why you used those charts
Add the GitHub URL of your completed EDA with data visualization notebook, as
an external reference and peer-review purpose

GitHub URL of EDA with data visualization notebook (Ctrl+click here to go to url!) 14
EDA with SQL
Load csv data into Db2, confirm data structure, and adjust if required.
Connect to Db2 from Jupyter notebook and perform high-level table checks
Data queries:
Select unique launch sites from the database

Isolate 5 records corresponding to launch sites with names starting with ‘CCA’

Determine the total payload mass launched by NASA (CRS)

Determine the average payload mass carried by a specific booster version

Determine the earliest date of the first successful landing outcome with ground pad

Determine the total number of successful/failed mission outcomes

List the names of booster version that carried the maximum payload

List the failed landing outcomes using drone ship, their booster versions and launch site for year 2015

Count the landing outcomes for a specific range of dates.


15
GitHub URL of EDA with SQL notebook (Ctrl+click here to go to url!)
Build an Interactive Map with Folium

Summarize what map objects such as markers, circles, lines, etc. you created and added to a
folium map
Explain why you added those objects
Add the GitHub URL of your completed interactive map with Folium map, as an external
reference and peer-review purpose

16
Build a Dashboard with Plotly Dash

Summarize what plots/graphs and interactions you have added to a dashboard


Explain why you added those plots and interactions
Add the GitHub URL of your completed Plotly Dash lab, as an external reference
and peer-review purpose

17
Predictive Analysis (Classification)

Summarize how you built, evaluated, improved, and found the best performing
classification model
You need present your model development process using key phrases and flowchart
Add the GitHub URL of your completed predictive analysis lab, as an external
reference and peer-review purpose

18
Results

• Exploratory data analysis results


• Interactive analytics demo in screenshots
• Predictive analysis results

19
Section 2
Flight Number vs. Launch Site
Scatter plot Flight Number vs. Launch Site

• Show a scatter plot of Flight Number vs. Launch Site


• Show the screenshot of the scatter plot with explanations

21
X(Payload) tended to launch from CCAFS SLC 40 & KSC LC 39A # Payloads less than 8000 kg tended to fail at a higher rate when launched from CCAFS SLC 40, # plausibly due to that launch site being used for R&D versus the other two launch # sites used with

Payload vs. Launch Site

Scatter plot Payload vs. Launch Site

• # Payloads that approach MAX(Payload) tended to launch from CCAFS SLC 40 & KSC LC 39A
• # Payloads less than 8000 kg tended to fail at a higher rate when launched from CCAFS SLC 40,
• # plausibly due to that launch site being used for R&D versus the other two launch
• # sites used with less failure-tolerant payloads.
22
Success Rate vs. Orbit Type
Bar chart for Orbital Success Rate

• Show a bar chart for the


success rate of each orbit type

• Show the screenshot of the


scatter plot with explanations

23
Flight Number vs. Orbit Type
Scatter plot for Flight Number vs. Orbit Type

• Show a scatter point of Flight number vs. Orbit type

• Show the screenshot of the scatter plot with explanations 24


Payload vs. Orbit Type
Scatter plot for Payload vs. Orbit Type

• Show a scatter point of payload vs. orbit type


• Show the screenshot of the scatter plot with explanations

25
Launch Success Yearly Trend
Year-on-Year Success Trend Plot

• Show a line chart of yearly


average success rate

• Show the screenshot of the


scatter plot with explanations

26
All Launch Site Names

Query:
%sql select Unique(LAUNCH_SITE) from SPACEXTBL;

Results of unique launch sites:


CCAFS LC-40
CCAFS SLC-40
CCAFSSLC-40
KSC LC-39A
VAFB SLC-4E

27
Launch Site Names Begin with 'CCA'

Query:
%sql SELECT LAUNCH_SITE from SPACEXTBL where (LAUNCH_SITE) LIKE 'CCA%'
LIMIT 5

Launch site names that begin with ‘CCA’:


CCAFS LC-40
CCAFS LC-40
CCAFS LC-40
CCAFS LC-40
CCAFS LC-40

28
Total Payload Mass

Query:
%sql select sum(PAYLOAD_MASS__KG_) as payloadmass from SPACEXTBL;

Results:
619, 967 kg

29
Average Payload Mass by F9 v1.1

Query:
%sql SELECT AVG(PAYLOAD_MASS__KG_) FROM SPACEXTBL WHERE BOOSTER_VERSION = 'F9
v1.1’

Average payload mass of F9 v1.1


2928 kg

30
First Successful Ground Landing Date

Query:
%sql SELECT MIN(DATE) FROM SPACEXTBL WHERE LANDING__OUTCOME = 'Success (ground
pad)'

Date of first successful ground landing:


2015-12-22

31
Successful Drone Ship Landing with Payload between 4000 and 6000

Query:
%sql SELECT booster_version FROM SPACEXTBL WHERE LANDING__OUTCOME = 'Success (drone ship)' AND 4000 <
PAYLOAD_MASS__KG_ < 6000;

Names of boosters with successful outcomes with payload between 4000-6000 kg:
F9 FT B1021.1
F9 FT B1023.1
F9 FT B1029.2
F9 FT B1038.1
F9 B4 B1042.1
F9 B4 B1045.1
F9 B5 B1046.1

32
Total Number of Successful and Failure Mission Outcomes

Query:
%sql select count(MISSION_OUTCOME) as mission_outcomes from SPACEXTBL GROUP BY
MISSION_OUTCOME;

Total number of successful and failed mission outcomes:


Successful — 99
Failure — 1

33
Boosters Carried Maximum Payload
Query:
%sql select BOOSTER_VERSION as boosterversion from SPACEXTBL where PAYLOAD_MASS__KG_=(select max(PAYLOAD_MASS__KG_) from
SPACEXTBL)

Names of booster with maximum payload:


F9 B5 B1048.4
F9 B5 B1049.4
F9 B5 B1051.3
F9 B5 B1056.4
F9 B5 B1048.5
F9 B5 B1051.4
F9 B5 B1049.5
F9 B5 B1060.2
F9 B5 B1058.3
F9 B5 B1051.6
F9 B5 B1060.3
F9 B5 B1049.7
34
2015 Launch Records

Query:
%sql select date, booster_version, landing__outcome, launch_site from SPACEXTBL where
landing__outcome like 'Failure (drone ship)' and DATE like '2015%’

Failed landing outcomes in drone ship and their booster versions:


2015-01-10 F9 v1.1 B1012Failure (drone ship)CCAFS LC-40
2015-04-14 F9 v1.1 B1015Failure (drone ship)CCAFS LC-40

35
Rank Landing Outcomes Between 2010-06-04 and 2017-03-20
Query:
%sql SELECT LANDING__OUTCOME FROM SPACEXTBL WHERE DATE BETWEEN '2010-06-04' AND
'2017-03-20' ORDER BY DATE DESC;

Results:

No attempt Precluded (drone ship) No attempt


Success (ground pad) No attempt Uncontrolled (ocean)
Success (drone ship) Failure (drone ship) No attempt
Success (drone ship) No attempt No attempt
Success (ground pad) Controlled (ocean) No attempt
Failure (drone ship) Failure (drone ship) Failure (parachute)
Success (drone ship) Uncontrolled (ocean) Failure (parachute)
Success (drone ship) No attempt
Success (drone ship) No attempt
Failure (drone ship) Controlled (ocean)
Failure (drone ship) Controlled (ocean) 36
Success (ground pad) No attempt
Section 3
Geographic Location of Launch Sites

Explore the generated folium map


and make a proper screenshot to
include all launch sites’ location
markers on a global map

Explain the important elements


and findings on the screenshot

38
Geographic Cluster location

Explore the folium map and make a proper screenshot to show the color-labeled
launch outcomes on the map

Explain the important elements and findings on the screenshot

39
Distance to Proximities

Explore the generated folium map and show the screenshot of a selected
launch site to its proximities such as railway, highway, coastline, with
distance calculated and displayed

Explain the important elements and findings on the screenshot

40
Section 4
SpaceX Launch Records Dashboard

Explain the important elements and findings on the screenshot

42
Success Ratio for Site with Highest Launch Success

Explain the important elements and findings on the screenshot

43
Payload vs. Launch Outcome Dashboard

Show screenshots of Payload vs. Launch


Outcome scatter plot for all sites, with different
payload selected in the range slider

Explain the important elements and findings on


the screenshot, such as which payload range or
booster version have the largest success rate, etc.

44
Section 5
Classification Accuracy

• Visualize the built model accuracy for all


built classification models, in a bar chart

• Find which model has the highest


classification accuracy

46
Confusion Matrix - KNN

• Show the confusion matrix of the best performing model with an explanation

47
Conclusions

Point 1
Point 2
Point 3
Point 4

48
Appendix

Include any relevant assets like Python code snippets, SQL queries, charts, Notebook
outputs, or data sets that you may have created during this project

49

You might also like