Capstone Story Template

IBM Data Science Capstone Project
Ikeora Ekene
May 2022
OUTLINE
• Executive Summary
• Introduction
• Methodology
• Results
  • EDA results
  • Interactive map with Folium results
  • Plotly Dashboard results
  • Predictive Analysis results
• Conclusion
• Appendix
EXECUTIVE SUMMARY
• Project to predict Falcon 9 first-stage landing
• Data collection
  • SpaceX API requests and data wrangling
  • Web scraping via Wikipedia
• Exploratory Data Analysis
  • EDA with SQL
  • EDA with data visualizations
• Interactive visual analytics with dashboards
• Interactive visual analytics with Folium maps
• SpaceX machine learning prediction
INTRODUCTION
• SpaceX Falcon 9 launches cost much less than launches from other providers.
• This is because SpaceX can reuse the first stage.
• Knowing whether the first stage will land is needed to estimate the cost of a launch and ultimately to bid against SpaceX.
• Aim of the project
  • To predict whether the first stage of a rocket will land
METHODOLOGY
• Data Collection
  • Using the SpaceX API
  • Web scraping from Wikipedia
• Data Wrangling
  • Data preparation through one-hot encoding of categorical data
  • Removal of irrelevant columns
• Performing Exploratory Data Analysis
  • EDA with SQL
  • Visualizations such as bar charts and scatter graphs provided by the seaborn Python library
• Interactive visual analytics with Folium and Plotly Dash
• Machine learning prediction using classification models provided by scikit-learn
  • Train and test split and data normalization
  • Parameter tuning to maximize accuracy of predictive models
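The one-hot encoding step named in the list above can be sketched in plain Python. This is a minimal illustration rather than the project's actual code: the helper `one_hot_encode`, the column names, and the sample rows are all assumptions made for the example.

```python
def one_hot_encode(rows, column):
    """Replace a categorical column with one 0/1 column per category."""
    categories = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        # Copy every field except the categorical one being expanded.
        new_row = {k: v for k, v in row.items() if k != column}
        for category in categories:
            new_row[f"{column}_{category}"] = 1 if row[column] == category else 0
        encoded.append(new_row)
    return encoded


# Hypothetical rows; the field names are illustrative only.
launches = [
    {"FlightNumber": 1, "Orbit": "LEO"},
    {"FlightNumber": 2, "Orbit": "GTO"},
    {"FlightNumber": 3, "Orbit": "LEO"},
]
encoded = one_hot_encode(launches, "Orbit")
print(encoded[0])
```

Each distinct orbit becomes its own 0/1 column, so a classifier never treats the categories as ordered numbers.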
Data Collection
• From SpaceX API
Make request to SpaceX API using relevant library → Returns SpaceX data as JSON → Data cleaning and filtering → Export data to a CSV file
• Web Scraping
Make HTML request from Wikipedia → Parse the data and find all relevant tables → Convert dictionary to dataframe → Export dataframe object to CSV
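The API path (JSON response, filtering, CSV export) can be sketched with the standard library alone. The payload below is a hypothetical stand-in for the API response, and the field names and the "keep only Falcon 9 rows" rule are assumptions for illustration; no network request is made.

```python
import csv
import io
import json

# Hypothetical payload; a real API response would carry many more fields.
raw = """[
  {"flight_number": 1, "rocket": "Falcon 1", "payload_mass_kg": 20},
  {"flight_number": 6, "rocket": "Falcon 9", "payload_mass_kg": 500},
  {"flight_number": 7, "rocket": "Falcon 9", "payload_mass_kg": 525}
]"""


def to_csv(rows):
    """Serialize a list of dicts to CSV text, using the first row's keys as the header."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


records = json.loads(raw)                                  # JSON text -> Python objects
rows = [r for r in records if r["rocket"] == "Falcon 9"]   # assumed filtering rule
csv_text = to_csv(rows)
print(csv_text)
```

Writing to an in-memory buffer keeps the sketch side-effect free; replacing the buffer with an open file handle would give the CSV export step.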
Data Wrangling
Perform EDA on dataset → Calculate the number of launches at each site → Calculate the number and occurrence of each orbit → Create a landing outcome label from Outcome column
In the dataset, there are various outcome labels such as True Ocean, True RTLS, False RTLS, True ASDS and False ASDS. Each label tells whether the landing was successful. For convenience, the labels are grouped into two classes: 1 stands for a successful landing and 0 for an unsuccessful landing.
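The grouping of outcome labels into two classes might look like the sketch below. It assumes that a label beginning with "True" marks a successful landing, which matches the labels listed in this section; the site list is a hypothetical sample used only to show the per-site count.

```python
from collections import Counter


def landing_class(outcome):
    """Return 1 for a successful landing and 0 otherwise (assumes a 'True ...' prefix means success)."""
    return 1 if outcome.startswith("True") else 0


outcomes = ["True Ocean", "True RTLS", "False RTLS", "True ASDS", "False ASDS"]
classes = [landing_class(o) for o in outcomes]
print(classes)

# Hypothetical sample of launch sites, to illustrate counting launches per site.
sites = ["CCAFS SLC-40", "KSC LC-39A", "CCAFS SLC-40"]
launches_per_site = Counter(sites)
print(launches_per_site.most_common())
```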
EDA with visualizations
Scatter Graph
• Flight Number VS. Payload Mass
• Flight Number VS. Launch Site
• Payload VS. Launch Site
• Flight Number VS. Orbit Type
• Orbit VS. Payload Mass
Bar Graph
• Mean Success Rate VS. Orbit
Line Graph
• Success Rate VS. Year
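The values behind a bar chart of success rate per orbit are group means of the 0/1 landing class. A minimal sketch, assuming a `Class` field holding that label and a handful of made-up rows:

```python
from collections import defaultdict


def success_rate_by(rows, key):
    """Mean of the 0/1 'Class' field for each distinct value of `key`."""
    totals = defaultdict(lambda: [0, 0])  # key value -> [successes, launches]
    for row in rows:
        totals[row[key]][0] += row["Class"]
        totals[row[key]][1] += 1
    return {value: wins / count for value, (wins, count) in totals.items()}


# Hypothetical rows, not taken from the real dataset.
rows = [
    {"Orbit": "LEO", "Class": 1},
    {"Orbit": "LEO", "Class": 0},
    {"Orbit": "GTO", "Class": 0},
    {"Orbit": "SSO", "Class": 1},
]
rates = success_rate_by(rows, "Orbit")
print(rates)
```

Passing a year field as `key` instead would give the series for the success rate line graph.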
EDA with SQL
SQL queries performed on the dataset
• Displaying the names of the unique launch sites in the space mission
• Displaying 5 records where launch sites begin with the string 'KSC'
• Displaying the total payload mass carried by boosters launched by NASA (CRS)
• Displaying the average payload mass carried by booster version F9 v1.1
• Date when the successful landing outcome in drone ship was achieved
• Names of the boosters which have success in ground pad and have payload mass greater than 4000 but less than 6000
• Total number of successful and failure mission outcomes
• Names of the booster versions which have carried the maximum payload mass
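A few of these queries can be tried against an in-memory SQLite table. The table name, column names, and rows below are hypothetical and chosen only to make the sketch runnable; the project's actual schema and database engine are not given in the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE launches (launch_site TEXT, customer TEXT, "
    "booster_version TEXT, payload_mass_kg INTEGER)"
)
# Made-up rows for illustration.
conn.executemany(
    "INSERT INTO launches VALUES (?, ?, ?, ?)",
    [
        ("CCAFS LC-40", "NASA (CRS)", "F9 v1.1", 2000),
        ("KSC LC-39A", "NASA (CRS)", "F9 FT", 2500),
        ("VAFB SLC-4E", "Other", "F9 v1.1", 500),
    ],
)

# Unique launch sites.
unique_sites = [row[0] for row in conn.execute(
    "SELECT DISTINCT launch_site FROM launches ORDER BY launch_site")]

# Up to 5 records whose launch site begins with 'KSC'.
ksc_rows = conn.execute(
    "SELECT * FROM launches WHERE launch_site LIKE 'KSC%' LIMIT 5").fetchall()

# Total payload mass for one customer.
nasa_total = conn.execute(
    "SELECT SUM(payload_mass_kg) FROM launches WHERE customer = ?",
    ("NASA (CRS)",),
).fetchone()[0]

# Average payload mass for one booster version.
v11_average = conn.execute(
    "SELECT AVG(payload_mass_kg) FROM launches WHERE booster_version = ?",
    ("F9 v1.1",),
).fetchone()[0]

print(unique_sites, len(ksc_rows), nasa_total, v11_average)
```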
Predictive Analysis method
• Building the models
  • Load our dataset into a pandas DataFrame
  • Normalize the data
  • Split our data into training and test data sets
  • Select the machine learning algorithms we want to use
  • Pass our parameters and algorithms to GridSearchCV
  • Fit the GridSearchCV objects on the training data to train the models
• Model Evaluation
  • Calculate accuracy for each model with the test set
  • Plot the confusion matrix
• Select the best predictive model
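The model-building and evaluation steps above can be sketched with scikit-learn as follows. This is an illustrative outline, not the project's notebook: the data is synthetic (`make_classification`), the parameter grid and random seeds are arbitrary assumptions, and only logistic regression is shown. One deliberate difference from the order listed above: the scaler here is fitted on the training split only, after the split, so that no test-set statistics reach the model.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the launch features (X) and landing class (y).
X, y = make_classification(n_samples=90, n_features=8, random_state=0)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=2
)

# Fit the scaler on the training split, then apply it to both splits.
scaler = StandardScaler().fit(X_train)
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)

# Assumed grid: three values of the regularization strength C.
param_grid = {"C": [0.01, 0.1, 1.0]}
search = GridSearchCV(LogisticRegression(max_iter=1000), param_grid, cv=5)
search.fit(X_train, y_train)

test_accuracy = search.score(X_test, y_test)          # accuracy on held-out data
matrix = confusion_matrix(y_test, search.predict(X_test))
print(search.best_params_, search.best_score_, test_accuracy)
print(matrix)
```

Repeating the same pattern with `SVC`, `KNeighborsClassifier` and `DecisionTreeClassifier`, each with its own grid, and comparing the test accuracies would cover the model selection step.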
RESULTS
• EDA with visualization results
• EDA with SQL results
• Interactive map with Folium results
• Dashboard results
• Predictive Analysis results

EDA with visualization results
Flight Number vs. Launch Site
The higher the flight number, the higher the success rate at each launch site.
Payload vs. Launch Site
More successful launches are recorded as the payload mass increases.
Orbit vs. Success Rate
ES-L1, GEO, HEO and SSO have the best success rates. Orbit SO has no recorded successes.
Flight Number vs. Orbit
Success in the LEO and VLEO orbits improves with flight number, but no such trend is clearly visible for the other orbits.
Payload Mass vs. Orbit
Success rate increases with payload for LEO and ISS. For GTO, the outcome cannot be distinguished by payload.
Year vs. Success Rate
The success rate has been increasing since 2013.
EDA with SQL results
• Unique launch sites
• 5 records where launch sites begin with the string 'CCA'
• The total payload mass carried by boosters launched by NASA (CRS)
• The date when the first successful landing outcome in ground pad was achieved
• The average payload mass carried by booster version F9 v1.1
• The names of the boosters which have success in drone ship and have payload mass greater than 4000 but less than 6000
• The total number of successful and failure mission outcomes
• The names of the booster versions which have carried the maximum payload mass
• The failed landing outcomes in drone ship, their booster versions, and launch site names in the year 2015
• Ranked count of landing outcomes (such as Failure (drone ship) or Success (ground pad)) between 2010-06-04 and 2017-03-20, in descending order
Interactive map with Folium results
All launch sites displayed
The launch sites are located on the coasts, near Los Angeles and in Florida.
Marked success/failed launches at each site
Failed/successful launches marked out at CCAFS LC-40
Dashboard results
KSC LC-39A is the site with the largest number of successful launches.
Total successful/failed launches at each site (Success vs. Failure): KSC LC-39A, CCAFS LC-40, CCAFS SLC-40, VAFB SLC-4E
KSC LC-39A has the highest launch success rate.
The payload range 0 to 5.5 kg has a higher success rate than the payload range 5.5 to 10 kg.
Predictive Analysis results
Logistic Regression
• Train Accuracy: 0.8464285
• Test Accuracy: 0.833333
SVM
• Train Accuracy: 0.8482142
• Test Accuracy: 0.833333
KNN
• Train Accuracy: 0.8482143
• Test Accuracy: 0.833333
Decision Tree
• Train Accuracy: 0.889285
• Test Accuracy: 0.833333
The four predictive models performed equally well on the test data. This may be a result of the small amount of data used for testing. The Decision Tree model has the highest accuracy on the training set.
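The reported test accuracy of 0.833333 equals 5/6, so it is consistent with any test set whose size is a multiple of 6. The sketch below assumes an 18-launch test set and a made-up set of predictions (neither is stated in the slides) to show how a confusion matrix yields that figure, and why a small test set makes ties between models likely.

```python
from fractions import Fraction


def confusion_counts(y_true, y_pred):
    """Return (tn, fp, fn, tp) for binary 0/1 labels."""
    tn = fp = fn = tp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 1 and pred == 0:
            fn += 1
        elif truth == 0 and pred == 1:
            fp += 1
        else:
            tn += 1
    return tn, fp, fn, tp


# Hypothetical test set: 12 successful landings and 6 failures.
y_true = [1] * 12 + [0] * 6
# Hypothetical predictions: every landing found, 3 failures misread as landings.
y_pred = [1] * 12 + [1] * 3 + [0] * 3

tn, fp, fn, tp = confusion_counts(y_true, y_pred)
accuracy = Fraction(tp + tn, len(y_true))
print((tn, fp, fn, tp), accuracy, round(float(accuracy), 6))
```

With 18 test samples, accuracy can only move in steps of 1/18, and any mix of 3 errors gives the same 15/18. Four different models can therefore land on the same test accuracy while making different mistakes.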
CONCLUSION
• This project allowed us to analyze the SpaceX dataset with visualizations and SQL queries, and to carefully select, transform and engineer features from the data.
• These features were then used to build classification models.
• The accuracy of the four models was good, but the best performing model was the decision tree, with a training accuracy of 0.889285 and a test accuracy of 0.83333.
APPENDIX
• GitHub repository including all the completed notebooks and Python files
