1 - BBDS - Why Learning Data Science Is An Absolute Must
www.bigbang-datascience.com
Agenda
• Data Explosion
• Why Data Science?
• What is Data Science?
• Types of Analytics
• Data Science Portfolio
• Data Science Process
• Career in Data Science
• Picking the right algorithm
• Misconceptions/Myths
• Hard Situations
• Limitations of ML today
Why Should You Become a Data Scientist?
• Make a better world
• A fuel of the 21st century
Is Data Science for me?
Because of the media coverage around data science and the characterization of data scientists
as “rock stars,” you may feel like it’s impossible for you to enter into this realm. If you’re
the type of person who loves to solve puzzles and find patterns, whether or not you consider
yourself a quant, then data science is for you.
Data Explosion
Some Interesting Facts About Data
▪ Every day, we create 2.5 quintillion bytes of data, so much that 90% of the data in the world today has been created in the last two years alone.
What do we do with so much data?
Ignore it, or use it.
How much data is getting generated?
The model has changed …
Old Model – only a few companies (like news outlets) were generating data; everyone else was consuming it
New Model – all of us are generating data, and all of us are consuming data
Opportunities for a New Approach to Analytics
Over 2.5 exabytes (2.5 billion gigabytes) of data are generated every day.
By 2020, the world was expected to generate 50x more data than it generated in 2011.
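The daily figure (2.5 quintillion bytes, i.e. 2.5 exabytes) and the 50x growth claim can be sanity-checked with a little arithmetic. This is a minimal sketch that assumes decimal (SI) units (1 GB = 10^9 bytes, 1 EB = 10^18 bytes) and the short-scale quintillion (10^18); the helper names `to_units` and `implied_annual_growth` are illustrative. Growing 50-fold over the nine years from 2011 to 2020 works out to roughly 54% compound growth per year.

```python
# Decimal (SI) storage units, assumed for the data-volume figures above.
GB = 10**9   # bytes in a gigabyte
EB = 10**18  # bytes in an exabyte


def to_units(n_bytes):
    """Express a byte count in exabytes and gigabytes."""
    return {"EB": n_bytes / EB, "GB": n_bytes / GB}


def implied_annual_growth(multiple, years):
    """Compound annual growth rate that turns 1x into `multiple`x over `years` years."""
    return multiple ** (1 / years) - 1


daily = 25 * 10**17  # 2.5 quintillion bytes (short scale: 1 quintillion = 10**18)
units = to_units(daily)
print(f"{units['EB']} EB/day = {units['GB']:,.0f} GB/day")
print(f"50x from 2011 to 2020 = {implied_annual_growth(50, 2020 - 2011):.1%} per year")
```

Exact integer arithmetic (`25 * 10**17`) is used for the byte count so the unit conversion does not pick up floating-point noise.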
Data – The Most Valuable Resource
“In its raw form, oil has little value. Once processed and refined, it helps power the world.”
—Ann Winblad
A decade after the term data science was first used, there is continued debate among
practitioners and academics about what data science means.
Data Science is the science that uses computer science, statistics and machine learning to extract value from data.
“The ability to take data—to be able to understand it, process it, to extract value
from it, to visualize it, to communicate it
—that’s going to be a hugely important skill in the next decades.”
Data Science – A Definition
Multidisciplinary
Harvard Business Review: Data scientist is the sexiest job of the 21st century
LinkedIn: Statistical Analysis & Data Mining were the hottest skills that got recruiters’ attention in every year from 2014 to 2021
Glassdoor ranked data scientist as the #1 job to pursue in 2016, 2017, 2018, 2019 and 2020
McKinsey: the US alone faces a shortage of 150,000+ data analysts and an additional 1.5 million data-savvy managers
Salary trends have followed the impact of data science. With a national
average salary of $118,000 (which increases to $126,000 in Silicon
Valley), data science has become a lucrative career path where
you can solve hard problems and drive social impact.
“Data Science”: An Emerging Field
Goal of Data Science
Turn data into data products.
Types of Analytics
There are four distinct types of analytics.
• Empowers management to make better decisions
• Increases accountability and validates decisions
2. Technical aptitude, such as software engineering, machine learning, and programming skills.
4. Curious & creative: data scientists must be passionate about data and about finding creative ways to solve problems and portray information.
Data Science Is a Team Sport
“Citizen Data Scientist”?
Market trends indicate the emergence of the “Citizen Data Scientist”.
Will Data Science take our jobs?
What Jobs Will Be Lost?
• 5% of all jobs can be completely automated today
– Collecting data, processing data and predictable physical work
– Manufacturing, food preparation, tax preparation, financial advising, etc.
New Jobs Due to AI
• Trainers: Teach AI systems how they should perform
– Help natural-language processors and language translators make fewer errors (understanding context, detecting sarcasm)
– Teach systems how to mimic human behaviors (empathy and humor)
– Ex: “I am stressed out about the exam.”
• Sustainers: Ensure that AI systems are operating as designed and that unintended consequences are addressed.
– Maintain confidence in the fairness and auditability of their AI systems
– Act as a watchdog and ombudsman for upholding norms of human values and morals
– Ex: a search for “loving grandmother” returning only white women; adverse actions against minority groups
Ref: Wilson 2017
Data Science Jobs Have Increased
• Data scientist has been called the sexiest job of the 21st century
Data Science Lifecycle
Career in Data Science
What You Need to Learn to Become a Data Scientist
This next section covers all of the data science skills you’ll need to learn. You’ll also learn
about the tools you need to do your job.
Most data scientists use a combination of skills every day, some of which they have taught
themselves on the job or otherwise. They also come from various backgrounds.
There isn’t any one specific academic credential that is required to be an effective data
scientist.
How to Become a Data Scientist?
Domain Expertise • Programming Languages • Math | Stats | Probability • Lingo | Foundations • Projects
Domains (5)
One way to understand the collaborations that lead to data science success is to think of a three-legged stool. Each leg is critical to the stool remaining stable and fulfilling its intended purpose.
Three-Legged Stool (Skills)
Data Viz
Tableau is not only an ultra-powerful tool for seasoned analysts, but is also so easy to learn that it is a great entry point into the world of data. Tableau is like a data science career hack.
R/Python
These two programming languages have become the two titans of data science. While very different in nature, they both facilitate the same thing: statistical analysis of almost any complexity. Knowing at least one is a must; knowing both puts you miles ahead.
SQL (PostgreSQL)
Knowing how to query a database efficiently is a crucial part of a data scientist’s job: to analyze the data, you first need to go and get it. SQL programming also develops a certain way of thinking about data, which helps you see the big picture and the workflow of your analysis.
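The "go get the data first" step usually means filtering and aggregating in SQL before any analysis happens. The slide names PostgreSQL; to keep this sketch self-contained it uses Python's built-in `sqlite3` module with an in-memory database and a hypothetical `orders` table instead. The query itself (`GROUP BY`, `ORDER BY`, `LIMIT`) is standard SQL; with a PostgreSQL driver such as psycopg the parameter placeholder would be `%s` rather than `?`.

```python
import sqlite3


def top_customers(conn, limit=2):
    """Return (customer, total) pairs for the customers with the highest order totals."""
    query = """
        SELECT customer, SUM(amount) AS total
        FROM orders
        GROUP BY customer
        ORDER BY total DESC
        LIMIT ?
    """
    return conn.execute(query, (limit,)).fetchall()


# An in-memory database with a small, hypothetical `orders` table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [("ana", 120.0), ("ben", 35.5), ("ana", 80.0), ("cara", 150.0)],
)
print(top_customers(conn))
```

Passing `limit` as a bound parameter rather than formatting it into the string is the habit that protects real queries from SQL injection.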
Statistics
Needless to say, if you want to be successful as a data scientist, you will need to develop a certain level of statistical acumen. Start with logistic regression, A/B testing and the Law of Large Numbers.
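Two of these starting points, the A/B test and the Law of Large Numbers, fit in a few lines of standard-library Python. This is a sketch, not a full testing framework: the conversion counts are made-up illustration numbers, and the pooled two-proportion z-test shown is one common way (not the only way) to evaluate an A/B test. The coin-flip function shows the sample mean settling toward the true probability (0.5) as the number of trials grows.

```python
import math
import random


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rates between variants A and B."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Pooled rate under the null hypothesis that A and B convert equally.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


def coin_flip_mean(n_flips, seed=42):
    """Law of Large Numbers: the share of heads in fair coin flips approaches 0.5."""
    rng = random.Random(seed)
    heads = sum(rng.random() < 0.5 for _ in range(n_flips))
    return heads / n_flips


# Hypothetical experiment: 200/2000 conversions for A, 250/2000 for B.
z, p = two_proportion_z_test(200, 2000, 250, 2000)
print(f"z = {z:.2f}, p = {p:.4f}")
for n in (10, 1_000, 100_000):
    print(n, coin_flip_mean(n))
```

With these numbers the lift from 10% to 12.5% gives a z-score of about 2.5, which is significant at the usual 5% level; with only a few hundred visitors per variant the same lift would not be.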
Presentation
Preparing the data, building models, creating visualizations and deriving insights are only half of the job. To be a successful data scientist, you need to be able to communicate your insights to your audience.
Three-Legged Stool (Domains)
Data Mining/BI Tools
Also known as ad-hoc analytics, data mining is the process of deriving new insights from data. Though different in essence, creating business intelligence (BI) tools is closely related, because these insights often need to be streamlined and integrated into the business.
Machine Learning/Modeling
Machine learning is popping up everywhere: recommender systems on Amazon and Netflix, speech-to-text, face recognition on your phone; the list goes on.
Advanced Analytics
With advanced analytics, you create simulations that help real-world businesses identify opportunities for improvement.
Computer Forensics
Computer forensics, fraud analytics and cyber security each deal with slightly different things, but their overall objectives are the extraction, analysis and protection of information, and even ethical hacking, for legal purposes.
Big Data
Big data refers to dealing with large and complex data sets which traditional applications simply cannot cope with.
Rule of “3Vs” – Volume, Variety, Velocity
Three-Legged Stool (Lifecycle)
Phase 1: Identify the Problem
Ever heard the phrase “Here’s some data, can you find some insights?” Too often, stakeholders approach data scientists with vague or even undefined goals. Understanding the end goal is very important and sets up the rest of the project for success. (Time consumption: 10%)
Our 30-week training program in Data Science and Machine Learning starts on August 1 at 10:00 AM EST. The program is a 30-week live stream, 5 times a week, 3 hrs per session, and the fee is $2999 with a 4-payment plan and a possible discount for direct payment. Once you complete the program, if you are ready, we will help you with resume preparation, interview preparation and job placement for free. If you are not ready, we will train you for another 30 weeks for free.
The BBDS program was built by identifying the critical skills that hiring managers are asking for, using the actual tools used by analysts. It is delivered as 100% online training, including practical labs where you get hands-on use of the platforms.
1 - Program Quality
- The program has been acquired by a university in Latin America and transformed into a Postgraduate Diploma in Data Science & Machine Learning.
Logistic Regression, Polynomial Regression, Bisecting K-Means, FP-Growth, LDA for TM, Deep Learning, CNN, SARSA
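Logistic Regression, the first algorithm in this list, scores each input linearly (w·x + b) and passes the score through the sigmoid function to get a probability. The sketch below trains a one-feature model with plain batch gradient descent on the log loss in standard-library Python. The "hours studied vs. passed" data, the learning rate and the epoch count are illustrative assumptions; in practice you would typically use a library implementation such as scikit-learn's `LogisticRegression`.

```python
import math


def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_regression(xs, ys, lr=0.4, epochs=10_000):
    """Fit p(y=1 | x) = sigmoid(w*x + b) by batch gradient descent on the mean log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            error = sigmoid(w * x + b) - y  # derivative of the log loss w.r.t. the score
            grad_w += error * x
            grad_b += error
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x, threshold=0.5):
    """Classify as 1 when the predicted probability reaches the threshold."""
    return 1 if sigmoid(w * x + b) >= threshold else 0


# Hypothetical data: hours studied -> passed the exam (1) or not (0).
hours = [0.5, 1.0, 1.5, 2.0, 3.0, 3.5, 4.0, 4.5]
passed = [0, 0, 0, 0, 1, 1, 1, 1]
w, b = train_logistic_regression(hours, passed)
print(f"w = {w:.2f}, b = {b:.2f}, decision boundary at x = {-b / w:.2f}")
print([predict(w, b, x) for x in hours])
```

The decision boundary is the x where the score w·x + b crosses zero, i.e. x = -b/w; on this toy data it falls between the last failing and first passing student.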
Data science lifecycle (diagram):
• Frame the Problem: Determine Business Objectives, Assess Feasibility, Define Success Measurements
• Collect Initial Data: Design Target Variable, Install & Import Packages, Data Quality Audit (Missing Values)
• Transform/Fix Data: Normalization, Redundant & Duplicates, Data Factorization, Data Binarization, Data Scaling
• Select Features, Split Data
• Model Selection: Select The Model, Model Optimization, Parameters Tuning
• Planning Deployment: Monitoring & Maintenance, Final Report
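Several of the data preparation steps in the lifecycle above (a data quality audit for missing values, removing redundant and duplicate records, scaling, binarization and splitting the data) can be sketched in standard-library Python. The two-column (age, income) records, mean imputation, min-max scaling and the 70,000 income threshold are illustrative assumptions, not steps prescribed by the diagram; real projects usually do this with pandas and scikit-learn.

```python
import random
from statistics import mean


def audit_missing(rows):
    """Data quality audit: count missing (None) values in each column."""
    return [sum(row[j] is None for row in rows) for j in range(len(rows[0]))]


def drop_duplicates(rows):
    """Remove exact duplicate records, keeping the first occurrence."""
    seen, unique = set(), []
    for row in rows:
        key = tuple(row)
        if key not in seen:
            seen.add(key)
            unique.append(list(row))
    return unique


def impute_mean(rows):
    """Replace each missing value with the mean of its column's observed values."""
    n_cols = len(rows[0])
    means = [mean(r[j] for r in rows if r[j] is not None) for j in range(n_cols)]
    return [[means[j] if r[j] is None else r[j] for j in range(n_cols)] for r in rows]


def min_max_scale(rows):
    """Rescale every column to the range [0, 1]."""
    n_cols = len(rows[0])
    lo = [min(r[j] for r in rows) for j in range(n_cols)]
    hi = [max(r[j] for r in rows) for j in range(n_cols)]
    return [
        [(r[j] - lo[j]) / (hi[j] - lo[j]) if hi[j] > lo[j] else 0.0 for j in range(n_cols)]
        for r in rows
    ]


def binarize(values, threshold):
    """Turn a numeric feature into 0/1 flags: 1 when the value exceeds the threshold."""
    return [1 if v > threshold else 0 for v in values]


def train_test_split(rows, test_fraction=0.4, seed=0):
    """Shuffle reproducibly, then hold out a fraction of the rows for testing."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical (age, income) records with one duplicate and two missing values.
raw = [[25.0, 50000.0], [32.0, None], [25.0, 50000.0],
       [47.0, 82000.0], [None, 61000.0], [51.0, 90000.0]]

print("missing per column:", audit_missing(raw))
clean = impute_mean(drop_duplicates(raw))
scaled = min_max_scale(clean)
high_income = binarize([r[1] for r in clean], threshold=70000)
train, test = train_test_split(scaled)
print(len(train), "training rows,", len(test), "test rows")
```

Duplicates are dropped before imputation so that a repeated record does not pull the column means toward itself; in a real pipeline the scaling bounds would also be computed on the training split only, to avoid leaking test data.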