Capstone Story Template

IBM Data Science Capstone Project
Ikeora Ekene
May 2022
OUTLINE
• Executive Summary
• Introduction
• Methodology
• Results
• EDA results
• Interactive map with Folium results
• Plotly Dashboard results
• Predictive Analysis results
• Conclusion
• Appendix
EXECUTIVE SUMMARY
• Project to predict Falcon 9 landing
• Data collection
• SpaceX API requests and data wrangling
• Web scraping via Wikipedia
• Exploratory Data Analysis
• EDA with SQL
• EDA with data visualizations
• Interactive visual analytics with dashboards
• Interactive visual analytics with Folium maps
• SpaceX machine learning prediction
INTRODUCTION
• SpaceX Falcon 9 launches cost far less than those of other providers.
• This is because SpaceX can reuse the first stage.
• Knowing whether the first stage will land is needed to estimate the cost of a launch and, ultimately, to bid against SpaceX.
• Aim of the project
• To predict whether the first stage of a rocket will land
METHODOLOGY
• Data Collection
• Using SpaceX API
• Web scraping from Wikipedia
• Data Wrangling
• Data preparation through one-hot encoding of categorical data
• Removal of irrelevant columns
• Performing Exploratory Data Analysis
• EDA with SQL
• Visualizations such as bar charts and scatter graphs, provided by the seaborn Python library
• Interactive visual analytics with Folium and Plotly Dash
• Machine learning prediction using classification models provided by scikit-learn
• Train and test split and data normalization
• Parameter tuning to maximize accuracy of predictive models
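The one-hot encoding and column-removal steps listed above can be sketched as follows. This is only an illustration on made-up rows: the column names (FlightNumber, Orbit, LaunchSite, Notes) are assumptions and are not taken from the slides.

```python
import pandas as pd

# Made-up rows; the column names are assumptions, not taken from the slides.
launches = pd.DataFrame(
    {
        "FlightNumber": [1, 2, 3],
        "Orbit": ["LEO", "GTO", "LEO"],
        "LaunchSite": ["CCAFS SLC-40", "KSC LC-39A", "VAFB SLC-4E"],
        "Notes": ["a", "b", "c"],  # stands in for an irrelevant column
    }
)

# Remove columns that carry no predictive information.
features = launches.drop(columns=["Notes"])

# Replace each categorical column with one 0/1 indicator column per category.
encoded = pd.get_dummies(features, columns=["Orbit", "LaunchSite"], dtype=float)
```

Each category becomes its own numeric column (for example Orbit_LEO), which is the form the scikit-learn classifiers expect.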
Data Collection
• From SpaceX API
Make request to SpaceX API using relevant library → Returns SpaceX data as JSON → Data cleaning and filtering → Export data to a CSV file

• Web Scraping
Make HTML request from Wikipedia → Parse the data and find all relevant tables → Convert dictionary to dataframe → Export dataframe object to CSV
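A minimal sketch of the API flow (request, JSON, cleaning and filtering, CSV export) using only the Python standard library. The endpoint URL and the field names are assumptions, since the slides do not name them, and the sample records are made up so the cleaning step can be tried offline.

```python
import csv
import json
import urllib.request

# Assumed endpoint and field names: the slides do not name them.
API_URL = "https://api.spacexdata.com/v4/launches/past"
FIELDS = ["flight_number", "date_utc", "rocket", "success"]


def fetch_launches(url=API_URL):
    """Request launch records and decode the JSON response into a list of dicts."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.load(response)


def clean_launches(records, fields=FIELDS):
    """Keep only the chosen fields and drop records missing any of them."""
    rows = []
    for record in records:
        row = {name: record.get(name) for name in fields}
        if all(value is not None for value in row.values()):
            rows.append(row)
    return rows


def export_csv(rows, path, fields=FIELDS):
    """Write the cleaned rows to a CSV file with a header line."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fields)
        writer.writeheader()
        writer.writerows(rows)


# Offline demonstration with two made-up records; the second lacks "success".
sample = [
    {"flight_number": 1, "date_utc": "2010-06-04T18:45:00.000Z",
     "rocket": "r1", "success": True, "extra": 0},
    {"flight_number": 2, "date_utc": "2010-12-08T15:43:00.000Z",
     "rocket": "r1", "success": None},
]
cleaned = clean_launches(sample)
```

With network access, export_csv(clean_launches(fetch_launches()), "launches.csv") would run the whole chain; the file name is arbitrary.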
Data Wrangling
In the dataset, there are various outcome labels such as True Ocean, True RTLS, False RTLS, True ASDS, False ASDS etc. All these labels tell whether the landing was successful or not. For convenience, the various classes will be grouped into two distinct classes: 1 will stand for successful landing and 0 for unsuccessful landing.

Perform EDA on dataset → Calculate the number of launches at each site → Calculate the number and occurrence of each orbit → Create a landing outcome label from Outcome column
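The wrangling steps can be sketched with pandas as below. The rows are made up and the column names are assumptions. The rule that only outcomes beginning with "True" count as successful is also an assumption about how the grouping into 1 and 0 was done.

```python
import pandas as pd

# Made-up rows; column names are assumptions.
df = pd.DataFrame(
    {
        "LaunchSite": ["CCAFS SLC-40", "CCAFS SLC-40", "KSC LC-39A", "VAFB SLC-4E"],
        "Orbit": ["LEO", "GTO", "ISS", "LEO"],
        "Outcome": ["True ASDS", "False ASDS", "True RTLS", "False RTLS"],
    }
)

# Number of launches at each site and occurrences of each orbit.
launches_per_site = df["LaunchSite"].value_counts()
launches_per_orbit = df["Orbit"].value_counts()

# Assumption: only outcomes beginning with "True" count as a successful landing.
df["Class"] = df["Outcome"].str.startswith("True").astype(int)

# Share of launches whose first stage landed successfully.
success_rate = df["Class"].mean()
```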
EDA with visualizations
Scatter Graph
• Flight Number vs. Payload Mass
• Flight Number vs. Launch Site
• Payload vs. Launch Site
• Flight Number vs. Orbit Type
• Orbit vs. Payload Mass

Bar Graph
• Mean success rate vs. Orbit

Line Graph
• Success Rate vs. Year
EDA with SQL
SQL queries performed on the dataset
• Displaying the names of the unique launch sites in the space mission
• Displaying 5 records where launch sites begin with the string 'KSC'
• Displaying the total payload mass carried by boosters launched by NASA (CRS)
• Displaying the average payload mass carried by booster version F9 v1.1
• Date when the successful landing outcome on a drone ship was achieved
• Names of the boosters which have success on a ground pad and have payload mass greater than 4000 but less than 6000
• Total number of successful and failed mission outcomes
• Names of the booster versions which have carried the maximum payload mass
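A few of the queries above can be sketched against an in-memory SQLite table. The table name, column names and rows are hypothetical; the slides do not show the schema or the database engine that was used.

```python
import sqlite3

# In-memory table with a hypothetical schema and made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE launches ("
    "launch_site TEXT, customer TEXT, booster_version TEXT, payload_mass_kg REAL)"
)
conn.executemany(
    "INSERT INTO launches VALUES (?, ?, ?, ?)",
    [
        ("KSC LC-39A", "NASA (CRS)", "F9 FT", 2000.0),
        ("CCAFS LC-40", "NASA (CRS)", "F9 v1.1", 3000.0),
        ("CCAFS LC-40", "Customer X", "F9 v1.1", 5000.0),
        ("VAFB SLC-4E", "Customer Y", "F9 FT", 9000.0),
    ],
)

# Names of the unique launch sites.
unique_sites = [
    row[0]
    for row in conn.execute(
        "SELECT DISTINCT launch_site FROM launches ORDER BY launch_site"
    )
]

# Up to 5 records where the launch site begins with 'KSC'.
ksc_records = conn.execute(
    "SELECT * FROM launches WHERE launch_site LIKE 'KSC%' LIMIT 5"
).fetchall()

# Total payload mass carried by boosters launched by NASA (CRS).
nasa_total = conn.execute(
    "SELECT SUM(payload_mass_kg) FROM launches WHERE customer = 'NASA (CRS)'"
).fetchone()[0]

# Average payload mass carried by booster version F9 v1.1.
v11_average = conn.execute(
    "SELECT AVG(payload_mass_kg) FROM launches WHERE booster_version = 'F9 v1.1'"
).fetchone()[0]

# Booster versions that carried the maximum payload mass (subquery).
max_boosters = [
    row[0]
    for row in conn.execute(
        "SELECT booster_version FROM launches "
        "WHERE payload_mass_kg = (SELECT MAX(payload_mass_kg) FROM launches)"
    )
]
```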
Predictive Analysis method
• Building the models
• Load the dataset into a pandas DataFrame
• Normalize the data
• Split the data into training and test sets
• Select the machine learning algorithms to use
• Pass each algorithm and its parameter grid to GridSearchCV
• Fit the GridSearchCV objects on the training data
• Model Evaluation
• Calculate accuracy for each model with test set
• Plot Confusion Matrix
• Select the best predictive model
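The build-and-evaluate steps can be sketched with scikit-learn as follows. The data is synthetic and the parameter grids are small illustrative choices, not the grids used in the project. The sketch follows the order on the slide (normalize, then split).

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the launch features and the 0/1 landing label.
X, y = make_classification(n_samples=90, n_features=8, random_state=0)

# Normalize, then hold out a test set (the order listed on the slide).
# Fitting the scaler on the training split only would avoid leaking
# test-set statistics into training.
X = StandardScaler().fit_transform(X)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=2
)

# Candidate algorithms with small, illustrative parameter grids.
candidates = {
    "logistic_regression": (LogisticRegression(max_iter=1000), {"C": [0.01, 0.1, 1.0]}),
    "svm": (SVC(), {"kernel": ["linear", "rbf"], "C": [0.1, 1.0, 10.0]}),
    "decision_tree": (DecisionTreeClassifier(random_state=0), {"max_depth": [2, 4, 6]}),
    "knn": (KNeighborsClassifier(), {"n_neighbors": [3, 5, 7]}),
}

results = {}
for name, (estimator, grid) in candidates.items():
    # 5-fold cross-validated search over the grid, refit on the best setting.
    search = GridSearchCV(estimator, grid, cv=5)
    search.fit(X_train, y_train)
    results[name] = {
        "best_params": search.best_params_,
        "cv_accuracy": search.best_score_,
        "test_accuracy": search.score(X_test, y_test),
        "confusion_matrix": confusion_matrix(
            y_test, search.predict(X_test), labels=[0, 1]
        ),
    }

# Pick the model with the highest accuracy on the held-out test set.
best_model = max(results, key=lambda name: results[name]["test_accuracy"])
```

Here cv_accuracy is the mean cross-validated accuracy of the best parameter setting on the training data, and test_accuracy is measured on the held-out split.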
RESULTS
• EDA with visualization results

• EDA with SQL results

• Interactive map with Folium results

• Dashboard results

• Predictive Analysis results


EDA with visualization results
Flight Number vs. Launch Site

The higher the flight number, the higher the success rate at each launch site.
Payload vs. Launch Site

More successful launches are recorded as the payload mass increases.

Orbit vs. Success rate

ES-L1, GEO, HEO and SSO have the best success rates. Orbit SO has no recorded success.
Flight Number vs. Orbit

Success in the LEO and VLEO orbits improves with flight number, but no clear trend can be seen for the other orbits.
Payload Mass vs. Orbit

Success rate increases with payload for LEO and ISS. The outcome cannot be distinguished by payload for GTO.
Year vs. Success rate

The success rate has been increasing since 2013.
EDA with SQL results

• Unique launch sites
• 5 records where launch sites begin with the string 'CCA'
• The total payload mass carried by boosters launched by NASA (CRS)
• The date when the first successful landing outcome on a ground pad was achieved
• The average payload mass carried by booster version F9 v1.1
• The names of the boosters which have success on a drone ship and have payload mass greater than 4000 but less than 6000
• The total number of successful and failed mission outcomes
• The names of the booster versions which have carried the maximum payload mass
• The failed landing outcomes on a drone ship, their booster versions, and launch site names in the year 2015
• Ranked count of landing outcomes (such as Failure (drone ship) or Success (ground pad)) between 2010-06-04 and 2017-03-20, in descending order
Interactive map with Folium results
All launch sites displayed

The launch sites are located on the coasts, near Los Angeles and in Florida.
Marked success/failed launches at each site

Failed/successful launches marked out at CCAFS LC-40
Dashboard results

KSC LC-39A is the site with the largest number of successful launches
Total successful/failed launches at each site
[Chart legend: Success, Failure; sites: KSC LC-39A, CCAFS LC-40, CCAFS SLC-40, VAFB SLC-4E]

KSC LC-39A has the highest launch success rate

The payload range 0 - 5.5 kg has a higher success rate than the payload range 5.5 - 10 kg
Predictive Analysis results

Logistic Regression
• Train Accuracy: 0.8464285
• Test Accuracy: 0.833333
SVM
• Train Accuracy: 0.8482142
• Test Accuracy: 0.833333
KNN
• Train Accuracy: 0.8482143
• Test Accuracy: 0.833333
Decision Tree
• Train Accuracy: 0.889285
• Test Accuracy: 0.833333

The four predictive models performed equally well on the test data. This may be a result of the small amount of data used for testing. The Decision Tree model has the highest accuracy on the training set.
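One way to see why a small test set can produce identical scores: the reported 0.833333 is 5/6, and on a test set of n samples accuracy can only move in steps of 1/n. The sketch below works this out. The test-set size n_test = 18 is a hypothetical value chosen for illustration, since the slides do not state it.

```python
from fractions import Fraction

# Test accuracy shared by all four models in the slides.
reported_test_accuracy = 0.833333

# Closest simple fraction to the reported score.
ratio = Fraction(reported_test_accuracy).limit_denominator(100)

# Hypothetical test-set size; the slides do not state it.
n_test = 18

# Smallest possible change in accuracy on a test set of this size.
step = 1 / n_test

# Number of correct predictions implied by the ratio at this size.
correct = ratio * n_test
```

With so few test samples, one extra correct or incorrect prediction shifts accuracy by several percentage points, so models of similar quality can easily tie.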
CONCLUSION
• This project allowed us to analyze the SpaceX dataset with visualizations and SQL queries, and then to carefully select, transform and engineer features from the data.
• These features were then used to build classification models.
• The accuracy of all four models was good, but the best performing model was the decision tree, with a training accuracy of 0.889285 and a test accuracy of 0.83333.
APPENDIX
• GitHub repository including all the completed notebooks and Python files
