1 - BBDS - Why Learning Data Science Is An Absolute Must

www.bbds.ma
www.bigbang-datascience.com
Agenda
Introduction to Data Science
• Data Explosion
• Why Data Science?
• What is Data Science?
• Types of Analytics
• Data Science Portfolio
• Data Science Process
• Career in Data Science
Deep Learning | Machine Learning Practice
• Picking the right algorithm
• Misconceptions/Myths
• Hard Situations
• Limitations of ML today
Introduction to Machine Learning
• Big Picture
• Definitions, Origins
• Learning Mechanisms and Structure, Tasks and Success Metrics
• Jobs impacted by Machine Learning
Why Should You Become a Data Scientist?
• Make a Better World
• A Fuel of the 21st Century
• Career of Tomorrow
• High Demand
• Great Salaries
• Lucrative Career
Is Data Science for me?
Because of the media coverage around data science and the characterization of data scientists
as “rock stars,” you may feel like it’s impossible for you to enter into this realm. If you’re
the type of person who loves to solve puzzles and find patterns, whether or not you consider
yourself a quant, then data science is for you.
Cathy O’Neil & Rachel Schutt - Doing Data Science
Data Explosion
Some Interesting Facts About Data
▪ Every day, we create 2.5 quintillion bytes of data, so much that
90% of the data in the world today has been created in the last two
years alone.
▪ Walmart handles more than 1 million customer transactions every
hour, which are imported into databases estimated to contain more than
2.5 PB of data.
▪ Twitter generates 12 TB of data every day.
▪ Airbus A380 generates 10 TB every 30 minutes of flight.
▪ NYSE generates a TB of data every month.
What do we do with so much data?
Ignore it or use it.
How Much Data Is Getting Generated?
The model has changed …
Old Model – only a few companies (like news outlets) were generating data; all others were
consuming data
New Model – all of us are generating data, and all of us are consuming data
Opportunities for a New Approach to Analytics
Over 2.5 exabytes (2.5 billion gigabytes) of data are generated every day.
In 2020, the world will generate 50x more data than we generated in 2011.
Data – The Most Valuable Resource
“In its raw form, oil has little value. Once processed and refined, it helps power the world.”
—Ann Winblad
“Data is the new oil.”

—Clive Humby, CNBC
What is Data Science?
Data Science – A Definition
A decade after the term data science was first used, there is continued debate among
practitioners and academics about what data science means.
Data science is the discipline that uses computer science, statistics, machine
learning, visualization and human-computer interaction to collect, clean, integrate,
analyze, visualize and interact with data in order to create data products.
“The ability to take data—to be able to understand it, process it, to extract value
from it, to visualize it, to communicate it
—that’s going to be a hugely important skill in the next decades.”
- Hal Varian, Google’s Chief Economist
Data Science – A Definition
“If you torture the data long enough, it will confess to anything.”
- Ronald Coase
Data Science – A Visual Definition
Multidisciplinary
From my perspective, a data scientist has a blend of many skills:
➢ Statistics quantifies numbers
➢ Data Mining explains patterns
➢ Machine Learning predicts with models
➢ Artificial Intelligence behaves and reasons
Data Science – A Visual Definition
• Data science applies the scientific method to analyzing data
• It lies at the intersection of several disciplines
• It draws on industry knowledge that makes the analysis of Big Data possible
Industry knowledge is essential to knowing what to look for when exploring data
[Diagram labels: Machine Learning, Data Science, Industry Knowledge]
Why Data Science?
Harvard Business Review: Data scientist is the sexiest job of the 21st century
LinkedIn: Statistical Analysis & Data Mining were the hottest skills that got
recruiters’ attention every year from 2014 through 2021
Glassdoor ranked data scientist as the #1 job to pursue every year from 2016 through 2020
McKinsey: the US alone faces a shortage of 150,000+ data analysts and an
additional 1.5 million data-savvy managers
Salary trends have followed the impact of data science. With a national
average salary of $118,000 (which increases to $126,000 in Silicon
Valley), data science has become a lucrative career path where
you can solve hard problems and drive social impact.
“Data Science”: an Emerging Field
“The future belongs to the companies and people that turn data into products.”
- O’Reilly Radar report, 2011
Goal of Data Science
Turn data into data products.
Types of Analytics
There are four distinct types of Analytics
• Descriptive: explains what has happened
• Diagnostic: suggests why it happened
• Predictive: indicates what could happen
• Prescriptive: recommends what should happen
There are several areas of Analytics
What Data Science Can Do
Data science can be applied across any industry
• Empowers management to make better decisions
• Increases accountability and validates decisions
• Increases operational efficiency and investment from staff
• Identifies new opportunities to stay competitive
Data Science Portfolio
Data Scientist Profile (Competencies)
1. Quantitative skills, such as mathematics or statistics.
2. Technical aptitude, such as software engineering, machine learning, and programming skills.
3. Skeptical: this may seem a counterintuitive trait, but it is important that data scientists
can examine their work critically rather than in a one-sided way.
4. Curious & Creative: data scientists must be passionate about data and about finding creative
ways to solve problems and portray information.
5. Communicative & Collaborative: it is not enough to have strong
quantitative skills or engineering skills. To make a project resonate,
you must be able to articulate the business value in a clear way, and
work collaboratively with project sponsors and key stakeholders.
Data Science Is a Team Sport
“Citizen Data Scientist”?
Market trends indicate the emergence of the “Citizen Data Scientist”
Will Data Science take our jobs?
What Jobs Will Be Lost?
• 5% of all jobs can be completely automated today
– Collecting data, processing data and predictable
physical work
– Manufacturing, food preparation, tax
preparation, financial advising etc.
• In 60% of jobs, 30% of the activities can be automated
– 25% of a CEO's job can be automated
• Change is dependent on:
– Technical feasibility, cost of development and
integration, labor market dynamics, economic
benefits, regulatory & social acceptance
• Least susceptible to automation:
– Applying expertise, decision making, planning,
creative tasks, managing and developing others
Ref: MGI 2017b
New Jobs Due to AI
• Trainers: Teach AI systems how they should perform
– Help natural-language processors and language translators make fewer errors (understanding
context, detecting sarcasm).
– Teach how to mimic human behaviors (empathy and humor).
– Ex: I am stressed out about the exam.
• Explainers: Bridge the gap between technologists and business leaders.
– Provide clarity, especially in light of the “right to explanation”
– Explain and correct unintended behaviors
– Ex: Ability to explain why a certain decision was reached
• Sustainers: Ensure that AI systems are operating as designed and that unintended
consequences are addressed.
– Maintain confidence in the fairness and auditability of their AI systems
– Be a watchdog and ombudsman for upholding norms of human values and morals
– Ex: Returning only white women when searching for “loving grandmother”; adverse actions
against minority groups
Ref: Wilson 2017
Data Science Jobs Have Increased
• Data Scientist has been called the
sexiest job of the 21st century
• There will be 11.6 million new jobs
by 2026 per statistics
• Post-Covid, everything will digitize
faster, creating more opportunities
• This is one of the worst
economic crises in centuries
• Reports of job losses and record
unemployment worldwide
• AI and ML jobs are still expected to
grow this year and accelerate
further
Data Science Lifecycle
Career in Data Science
What You Need to Learn to Become a Data Scientist
This next section covers all of the data science skills you’ll need to learn. You’ll also learn
about the tools you need to do your job.
Most data scientists use a combination of skills every day, some of which they have taught
themselves on the job or otherwise. They also come from various backgrounds.
There isn’t any one specific academic credential that is required to be an effective data
scientist.
How to Become a Data Scientist?
Domain Expertise: Health Care, Sales, Finance, IT, Management, Accounting, Transportation,
Media, Travelling, etc.
Programming Languages: Python, R, Julia, Scala, Java, C++
Math | Stats | Probability: Measures of Position, Measures of Dispersion, Measures of Shape,
Measures of Relationships, Mean, Median, and Mode, Variance and Standard Deviation,
Covariance and Correlation, Permutations and Combinations, Unions and Intersections,
Conditional Probability, Bayes Theorem, Binomial Distribution, Poisson Distribution,
Normal Distribution, Sampling, Central Limit Theorem, Hypothesis Testing,
T-Distribution Testing, Regression Analysis, ANOVA, Chi Squared, … etc.
Lingo | Foundations: Decision Tree, Random Forest, Data Science, Entropy, Data Split,
Model fitting, Training set, Testing set, Target variable, Classification, Regression,
Clustering, Big Data, Machine Learning, Deep Learning, NLP, R2, Confusion Matrix, ...etc.
Projects: Customer Churn, Fraud Detection, Basket Analysis, Loan default detection,
Image classification, … etc.
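As a small illustration of the first items in the Math | Stats | Probability list (mean, median, mode, variance and standard deviation), here is a minimal sketch using only Python's standard `statistics` module. The `purchases` sample is made up for illustration and does not come from the course material.

```python
import statistics

# Hypothetical sample: number of purchases made by eight customers in a month.
purchases = [2, 3, 3, 5, 7, 8, 8, 8]

# Measures of position (central tendency).
mean = statistics.mean(purchases)
median = statistics.median(purchases)
mode = statistics.mode(purchases)

# Measures of dispersion, treating the sample as the whole population.
variance = statistics.pvariance(purchases)
std_dev = statistics.pstdev(purchases)

print(f"mean={mean}, median={median}, mode={mode}")
print(f"variance={variance}, std_dev={std_dev:.3f}")
```

If the eight values were a sample drawn from a larger population, `statistics.variance` and `statistics.stdev` (which divide by n - 1) would be the more appropriate choice.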
Three-Legged Stool
Skills (5) | Domains (5) | Lifecycle (5)
One way to understand the collaborations that lead to Data Science success is to think of a
three-legged stool. Each leg is critical to the stool remaining stable and fulfilling its intended
purpose.
Three Legged Stool (Skills)
Data Viz
Tableau is not only an ultra-powerful tool for seasoned analysts, but is also so easy to learn that it is a great
entry point into the world of data. Tableau is like a Data Science career hack.
R/Python
These two programming languages have become the two titans of Data Science. While very different in nature,
they both facilitate the same thing: statistical analysis of unlimited complexity. Knowing at least one is a must.
Knowing both puts you miles ahead.
SQL (PostgreSQL)
Knowing how to efficiently query a database is a crucial part of a Data Scientist’s job: to analyze the data, you first
need to go get it. SQL programming also develops a certain way of thinking about data, which helps you see the
big picture and workflow of your analysis.
Statistics
Needless to say, if you want to be successful as a Data Scientist you will need to develop a certain level of
statistical acumen. Start with Logistic Regression, A/B testing and the Law of Large Numbers.
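To make the Law of Large Numbers concrete, here is a minimal simulation sketch: the share of heads in a series of fair coin flips drifts toward 0.5 as the number of flips grows. The function name `share_of_heads`, the seed and the sample sizes are illustrative choices, not part of the course material.

```python
import random

def share_of_heads(n_flips, seed=42):
    """Simulate n_flips fair coin flips and return the fraction that were heads."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    heads = sum(rng.random() < 0.5 for _ in range(n_flips))
    return heads / n_flips

# The estimate gets closer to the true probability (0.5) as the sample grows.
for n in (10, 1_000, 100_000):
    print(n, share_of_heads(n))
```

The same idea underlies A/B testing: a conversion rate measured on a handful of visitors is noisy, while one measured on many visitors is a far more reliable estimate.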
Presentation
Preparing the data, building models, creating visualizations and deriving insights – are only half of the job. To be
a successful Data Scientist you need to be able to communicate your insights to your audience
Three Legged Stool (Domains)
Data Mining /BI Tools
Also known as ad-hoc analytics, data mining is the process of deriving new insights from data. Though different
in essence, creating business intelligence (BI) Tools is closely related, because often these insights need to be
streamlined and integrated into the business
Machine Learning/Modeling
Machine Learning is popping up everywhere: recommender systems on Amazon & Netflix, speech-to-text, face
recognition on your phone – the list goes on
Advanced Analytics
With Advanced Analytics you create simulations to help real-world businesses identify opportunities for
improvements
Computer Forensics
Computer Forensics, Fraud Analytics and Cyber Security all deal with slightly different things; however, the overall
objectives are the extraction, analysis and protection of information, and even ethical hacking, for legal purposes
Big Data
Big data refers to dealing with large and complex data sets which traditional applications simply cannot cope with.
Rule of “3Vs” – Volume, Variety, Velocity
Three Legged Stool (Lifecycle)
Phase 1 : Identify The Problem
Ever heard the phrase “Here’s some data, can you find some insights?”. Too often stakeholders approach Data
Scientists with vague or even undefined goals. Understanding the end goal is very important and sets up the rest
of the project for success – (Time consumption : 10%)
Phase 2 : Prepare the Data
Data can come from many sources, be in the wrong format, have anomalies and a myriad of other problems. A
single mistake in this stage can render the rest of the analysis useless – (Time consumption : 70%)
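A minimal sketch of what this phase can look like in practice, using only the Python standard library. The rows, column names and the median-imputation choice are all hypothetical and serve only to show three typical problems: duplicates, wrong formats and missing values.

```python
import statistics

# Hypothetical raw export: every value arrives as text, one row is duplicated,
# one age is missing, and one spend value is not a number.
raw_rows = [
    {"customer_id": "101", "age": "34", "spend": "120.50"},
    {"customer_id": "102", "age": "", "spend": "80"},
    {"customer_id": "101", "age": "34", "spend": "120.50"},
    {"customer_id": "103", "age": "29", "spend": "n/a"},
]

def to_number(text):
    """Convert text to float, returning None when the value is unusable."""
    try:
        return float(text)
    except ValueError:
        return None

def prepare(rows):
    # 1. Drop exact duplicate rows while keeping the original order.
    seen, unique = set(), []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    # 2. Fix formats: numeric columns become floats (or None if invalid).
    cleaned = [
        {"customer_id": row["customer_id"],
         "age": to_number(row["age"]),
         "spend": to_number(row["spend"])}
        for row in unique
    ]
    # 3. Impute missing values with the median of the observed values.
    for column in ("age", "spend"):
        observed = [r[column] for r in cleaned if r[column] is not None]
        fill = statistics.median(observed)
        for r in cleaned:
            if r[column] is None:
                r[column] = fill
    return cleaned

prepared = prepare(raw_rows)
for row in prepared:
    print(row)
```

Median imputation is only one of several options; dropping incomplete rows or flagging them for review may be more appropriate depending on the problem identified in Phase 1.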
Phase 3 : Analyze the Data
Creating models, performing data mining, running text analytics, setting up simulations. This is the most fun and
exciting part. If the previous stages have been done correctly, analyzing the data and deriving insights will feel like
a breeze – (Time consumption : 10%)
Phase 4 : Visualize Insights
Visualizing goes hand-in-hand with analyzing. This is a very powerful technique, as seeing the data in various
forms and shapes can help uncover insights that are otherwise not evident – (Time consumption : 10%)
Phase 5 : Present Findings
Presenting findings is a whole separate “Bonus” stage. You need to not only convey the insights in your
audience’s language but also get buy-in from them to take action based on those insights
BBDS 30 Weeks Training Program
Program Overview
Our 30-week training program in Data Science and Machine Learning starts on
August 1 at 10:00 AM EST. The program is a 30-week live stream, 5 times a week at 3
hrs. per session, and the fee is $2999 with a 4-payment plan and a possible discount for direct
payment. Once you complete the program, if you are ready, we will help you with resume
preparation, interview preparation and job placement for free. If you are not ready, then we
will train you for another 30 weeks for free.
The BBDS program was built by identifying the critical skills that hiring managers are asking for, with
the actual tools used by analysts, and is delivered to you with 100% online training, including
practical labs where you get to use platforms hands-on.
Made for the Modern Student
No stuffy classrooms, outdated textbooks, or overpriced bootcamps. This program is made
for the forward-looking learner. You will get a deep dive into not only important skills, but
also strategies for how to use them.
Program Overview
1 - Program Quality
- The program has been acquired by a university in Latin America and it has been
transformed into a postgraduate diploma in Data Science & Machine Learning
2 - Program training material for this course is sourced from
• 7 years of Data Science industry experience
• 4 years of teaching this Data Science program, leading to a refined curriculum
• Columbia University Master's degree in Machine Learning and Applied Data Science
• Maryland University Master's degree in Data Science and Analytics
3 - What is New in Batch 12?
• Batch 12 is 29 weeks, compared to 18 weeks for Batch 11
• Migrated to Canvas as the LMS
• Collaboration with DataCamp: 85+ mandatory videos & 350+ mandatory exercises
• Digital blockchain Certification of Completion
Program Overview
4 - Program Highlights:
• Hybrid classes (Virtual and Physical)
• Flexible schedule (Evenings and Weekends)
• 22+ Group Projects (R & Python)
• 1 Individual Capstone Project
• Extensive Live Online Training
• Instructor-Led Course
• Training Video Recordings
• Quality Training Materials
• Two-Way Interactive Sessions
• Job Oriented Training
• Mock Exam/ Graded Assessments
• Professional Certificate
• Interview Prep, Job Placement and Placement Guidance
• Repeat any time at no additional cost
• Extra help if needed 24/7
5 - Weekly Schedule:
• Saturday at 9:00 AM EST to 12:00 PM EST
• Monday at 8:00 PM EST to 11:00 PM EST
• Wednesday at 8:00 PM EST to 11:00 PM EST
• Thursday at 8:00 PM EST to 11:00 PM EST
Machine Learning Types & Techniques
Factor Analysis (Dimensionality Reduction): PCA, Kernel PCA, LDA, T-SNE, Locally L. Embedding,
SVD - LSA, Matrix Factorization
Supervised Learning (Classification): Decision Tree, Naïve Bayes, Logistic Regression, SVM,
Kernel SVM, KNN, Perceptron (MLP)
Supervised Learning (Regression): Simple Linear, Multiple Linear, Polynomial, LS/Lasso, Ridge/ElN,
Decision Tree, SVR, Partial L.S (PLS)
Unsupervised Learning (Clustering): K-Means, HCA, Bisecting K-Means, Fuzzy C-Means, Mean Shift,
Ex. Maximization, Anomalies Detection, Clustering Anomaly
Unsupervised Learning (Pattern Search, Association Rules): Apriori, Eclat, FP-Growth, A/B Testing,
Recommender Systems
NLP – Deep Learning (Neural Nets): ANN, Convolutional Net (CNN), Recurrent Net (RNN),
Generative Adversarial, Autoencoders (seq2seq), NLP, Text Analysis, Topic Modeling,
Deep Learning for NLP
Time Series (Time Series Analysis): AR – MA, EST, ARMA, ARIMA, Deep Learning for TM
Reinforcement Learning: Stacking, Genetic Algorithm, Q-learning, SARSA, A3C, Deep Q-Network
Ensemble Methods: Random Forest, Bagging, Boosting, Ada-Boost, G. Boosting, LightGBM, XGBoost, CatBoost
Evaluation – Assessment – Optimization: CM – ROC – R2 – MSE, Cross V. – Grid S.
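As an illustration of the "CM" (confusion matrix) item in the evaluation list, here is a minimal sketch for a binary classifier, written in plain Python. The `actual` and `predicted` labels are invented for the example, and the helper name `confusion_counts` is hypothetical.

```python
def confusion_counts(actual, predicted):
    """Return (tp, fp, fn, tn) for binary labels where 1 is the positive class."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == 1 and p == 1 for a, p in pairs)  # correctly flagged positives
    fp = sum(a == 0 and p == 1 for a, p in pairs)  # false alarms
    fn = sum(a == 1 and p == 0 for a, p in pairs)  # missed positives
    tn = sum(a == 0 and p == 0 for a, p in pairs)  # correctly ignored negatives
    return tp, fp, fn, tn

# Hypothetical labels, e.g. 1 = customer churned, 0 = customer stayed.
actual = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0]

tp, fp, fn, tn = confusion_counts(actual, predicted)
accuracy = (tp + tn) / len(actual)
precision = tp / (tp + fp)
recall = tp / (tp + fn)

print(f"TP={tp} FP={fp} FN={fn} TN={tn}")
print(f"accuracy={accuracy} precision={precision} recall={recall}")
```

Accuracy alone can mislead on imbalanced problems such as fraud detection, which is why precision and recall are usually reported alongside it.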
Business Understanding: Determine Business Objectives, Frame the Problem, Assess Feasibility,
Define Success Measurements, Identify Target Variables (Y), Identify Analytical Approach,
Identify Deployment Plan, Produce Project Plan, Identify the Team & Stakeholders
Data Understanding: Collect Initial Data, Install & Import Packages, Read the Data,
Data Manipulation & Wrangling, Exploratory Data Analysis (EDA), Data Visualization, Statistical Analysis
Data Preparation: Transform/Fix Target Variable, Redundant & Duplicates, Data Quality Audit (Missing Values),
Data Quality Audit (Outliers), Data Quality Audit (Cardinality Check), Data Conversion, Data Transformation,
Feature Engineering, Data Normalization, Data Factorization, Data Binarization, Data Dummy, Data Scaling,
Data Standardizing, Data Correlations, Data Aggregation / Binning, Data Decomposition,
Feature Selections (Importance, Low variance, PCA)
Modeling: Design Features, Select the Model, Split Data, Build Model, Fit Model (Train), Predict (Test),
Assess & Evaluate
Optimization: Model Selection, Model Optimization, Parameters Tuning
Deployment: Planning Deployment, Monitoring & Maintenance, Final Report, Lessons Learned
Analytics Base Table (ABT) | Code Book | Data Quality Report | Data Version 2/3/4 | Best Model |
Best Parameters | ROI
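The modeling steps named above (Split Data, Fit Model (Train), Predict (Test), Assess & Evaluate) can be sketched end to end in plain Python. This is only an illustrative sketch: the data is synthetic, a simple least-squares line stands in for whatever model a real project would select, and the helper names are hypothetical.

```python
import random

def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle the rows and hold out a fraction of them for testing."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

def fit_simple_linear(train):
    """Least-squares fit of y = slope * x + intercept."""
    xs = [x for x, _ in train]
    ys = [y for _, y in train]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in train)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def r_squared(actual, predicted):
    """Share of the variance in `actual` explained by `predicted`."""
    mean_a = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    return 1 - ss_res / ss_tot

# Synthetic data: y is roughly 3x + 2 with a small alternating disturbance.
data = [(x, 3 * x + 2 + (0.5 if x % 2 else -0.5)) for x in range(20)]

train, test = train_test_split(data)                  # Split Data
slope, intercept = fit_simple_linear(train)           # Fit Model (Train)
predictions = [slope * x + intercept for x, _ in test]  # Predict (Test)
r2 = r_squared([y for _, y in test], predictions)     # Assess & Evaluate

print(f"slope={slope:.3f} intercept={intercept:.3f} R2={r2:.3f}")
```

Evaluating on rows the model never saw during fitting is the point of the split: a score computed on the training rows alone would overstate how well the model generalizes.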
Certificate of Completion
Q&A
BIG BANG DATA SCIENCE SOLUTIONS
LEARN. ACHIEVE. STANDOUT
