How do I become a data scientist? - Quora
William Chen, Analyzes data daily as a Data Scientist
Here are some amazing and completely free resources
online that you can use to teach yourself data science.
Besides this page, I would highly recommend the Quora Data Science FAQ as your comprehensive guide to data science! It includes resources similar to this one, as well as advice on preparing for data science interviews. Additionally, follow the Quora Data Science topic if you haven't already to get updates on new questions and answers!
Fulfill your prerequisites
Before you begin, you need Multivariable Calculus, Linear Algebra, and Python. If your math background is up to multivariable calculus and linear algebra, you'll have enough background to understand almost all of the probability / statistics / machine learning for the job.
Multivariate Calculus: https://www.quora.com/What-are-the-best-resources-for-mastering-multivariable-calculus
Numerical Linear Algebra / Computational Linear Algebra / Matrix Algebra: Linear Algebra, Coursera (starts 2/2/2015)
Multivariate calculus is useful for some parts of machine learning and a lot of probability. Linear / Matrix algebra is absolutely necessary for a lot of concepts in machine learning.
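Here is a small example of why matrix algebra matters. Fitting a linear regression is just solving a least-squares matrix problem. This is a minimal sketch, assuming NumPy is installed. The data is synthetic and made up for illustration.

```python
import numpy as np

# Synthetic data (an assumption for illustration): y depends linearly on two
# features plus a little Gaussian noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
true_w = np.array([3.0, -2.0])
y = X @ true_w + 0.1 * rng.normal(size=100)

# Prepend a column of ones so the model can also learn an intercept.
X_design = np.column_stack([np.ones(len(X)), X])

# Solve min_w ||X_design @ w - y||^2. lstsq is SVD-based, which is numerically
# safer than explicitly inverting X^T X (the "normal equations").
w, residuals, rank, singular_values = np.linalg.lstsq(X_design, y, rcond=None)
print("intercept, weights:", w)  # close to [0, 3, -2]
```

The recovered weights land close to the ones used to generate the data. Most "model fitting" reduces to linear algebra routines like this one.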
You also need some programming background to begin, preferably in Python. Most other things in this guide can be learned on the job (like random forests, pandas, A/B testing), but you can't get away without knowing how to program!
Python is the most important language for a data scientist to learn. To learn to code, learn more about Python, and see why Python is so important, check out:
+ How do I learn to code?
+ How do I learn Python?
+ Why is Python a language of choice for data scientists?
+ Is Python the most important programming language to learn for aspiring data scientists & data miners?
If you're currently in school, take statistics and computer science classes. Check out What classes should I take if I want to become a data scientist?
Plug Yourself Into the Community
Check out Meetup to find some that interest you! Attend an interesting talk, learn about data science live, and meet data scientists and other aspiring data scientists. Start reading data science blogs and following influential data
scientists:
+ What are the best blogs about data?
+ What is your source of machine learning and data science news? Why?
+ Data Science: What are some of the best users and agencies to follow on Twitter, Facebook, G+, and LinkedIn?
+ What are the best Twitter accounts about data?
Set up your tools
+ Install Python, IPython, and related libraries (guide)
+ Install R and RStudio (I would say that R is the second most important language. It's good to know both Python and R)
+ Install Sublime Text
Learn to use your tools
+ Learn R with swirl
+ What's the best way to learn to use Sublime Text?
+ How do I learn SQL? (I don't think there's too much of a need to install it on your computer, but just learning the syntax will be helpful for the job)
Learn Probability and Statistics
Be sure to go through a course that involves heavy application in R or Python. Knowing probability and statistics will only really be helpful if you can implement what you learn.
+ Python Application: Think Stats (free pdf) (Python focus)
+ R Applications: An Introduction to Statistical Learning (free pdf) (MOOC) (R focus)
+ Print out a copy of Probability Cheatsheet
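Implementing what you learn can be as simple as checking a textbook probability by simulation. This minimal sketch uses only the standard library. It estimates P(two fair dice sum to 7), whose exact value is 6/36 = 1/6, and attaches a normal-approximation confidence interval. The function name is my own, made up for illustration.

```python
import random

def estimate_prob_sum_seven(n_trials, seed=0):
    """Monte Carlo estimate of P(two fair dice sum to 7); the exact answer is 6/36 = 1/6."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(n_trials) if rng.randint(1, 6) + rng.randint(1, 6) == 7)
    p_hat = hits / n_trials
    # Standard error of a sample proportion: sqrt(p(1 - p) / n).
    se = (p_hat * (1 - p_hat) / n_trials) ** 0.5
    return p_hat, se

p_hat, se = estimate_prob_sum_seven(100_000)
low, high = p_hat - 1.96 * se, p_hat + 1.96 * se  # approx. 95% interval (normal approximation)
print(f"estimate {p_hat:.4f}, 95% CI [{low:.4f}, {high:.4f}], exact {1/6:.4f}")
```

Comparing the simulated interval with the exact answer is a quick way to test your understanding of both the probability and the statistics.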
Complete Harvard's Data Science Course
This course is developed in part by fellow Quora user, Professor Joe Blitzstein. Note that I recommend completing the 2013 version of the class instead of the 2014 version...
Katie Kent, Director of Educational Outcomes @ Ga...
Become a Data Scientist by Doing Data Science
The best way to become a data scientist is to learn - and do - data science. There are many excellent courses and tools available online that can help you get there.
Here is an incredible list of resources compiled by Jonathan Dinu, Co-founder of Zipfian Academy, which trains data scientists and data engineers in San Francisco via immersive programs, fellowships, and workshops.
EDIT: I've had several requests for a permalink to this answer. See here: A Practical Intro to Data Science from Zipfian Academy
EDIT: See also: "How to Become a Data Scientist" on SlideShare: http://www.slideshare.net/ryanor...
Environment
Python is a great programming language of choice for aspiring data scientists due to its general purpose applicability, a gentle (or firm) learning curve, and, perhaps the most compelling reason, the rich ecosystem of resources and libraries actively used by the scientific community.
Development
When learning a new language in a new domain, it helps immensely to have an interactive environment to explore and to receive immediate feedback. IPython provides an interactive REPL, which also allows you to integrate a wide variety of frameworks (including R) into your Python programs.
Statistics
Data scientists are better at software engineering than statisticians and better at statistics than any software engineer. As such, statistical inference underpins much of the theory behind data analysis, and a solid foundation of statistical methods and probability serves as a stepping stone into the world of data science.
Courses
edX: Introduction to Statistics: Descriptive Statistics: A basic introductory ... teaches the complete pipeline of statistical analysis
MIT: Statistical Thinking and Data Analysis: Introduction to probability, sampling, regression, common distributions and inference
While R is the de facto standard for performing statistical analysis, it has quite a high learning curve, and there are other areas of data science for which it is not well suited. To avoid learning a new language for a specific problem domain, we recommend trying to perform the exercises of these courses with Python and its numerous statistical libraries. You will find that much of the functionality of R can be replicated with NumPy, SciPy, Matplotlib, and the Python Data Analysis Library (pandas).
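As an example of replicating everyday R work in Python, here is a minimal sketch. It assumes pandas, NumPy and SciPy are installed, and it uses made-up A/B data. It produces a per-group summary and a Welch two-sample t-test, which is the same default test R's `t.test()` runs on two groups.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Hypothetical A/B-test data (synthetic): one numeric metric for two groups.
rng = np.random.default_rng(42)
df = pd.DataFrame({
    "group": ["A"] * 200 + ["B"] * 200,
    "value": np.concatenate([rng.normal(10.0, 2.0, 200), rng.normal(10.8, 2.0, 200)]),
})

# Per-group summary, roughly what aggregate() or dplyr::summarise() give you in R.
summary = df.groupby("group")["value"].agg(["count", "mean", "std"])
print(summary)

# Welch's two-sample t-test; like R's t.test(), it does not assume equal variances.
a = df.loc[df["group"] == "A", "value"]
b = df.loc[df["group"] == "B", "value"]
result = stats.ttest_ind(a, b, equal_var=False)
print(f"t = {result.statistic:.3f}, p = {result.pvalue:.4g}")
```

Passing `equal_var=False` is the deliberate choice here. It keeps the Python result aligned with R's default behaviour.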
Books
Well-written books can be a great reference (and supplement) to these courses, and also provide a more independent learning experience. These may be useful if you already have some knowledge of the subject or just need to fill in some gaps in your understanding:
O'Reilly Think Stats: An Introduction to Probability and Statistics for Python programmers
Introduction to Probability: Textbook for Berkeley's Stats 134 class, an introductory treatment of probability with complementary exercises
Berkeley Lecture Notes, Introduction to Probability: Compiled lecture notes of the above textbook, complete with exercises
OpenIntro: Statistics: Introductory textbook with supplementary exercises and labs in an online portal
Think Bayes: A simple introduction to Bayesian Statistics with Python code examples
MACHINE LEARNING/ALGORITHMS
A solid base of Computer Science and algorithms is essential for an aspiring data scientist. Luckily there is a wealth of great resources online, and machine learning is one of the more lucrative (and advanced) skills of a data scientist.
Courses
Coursera Machine Learning: Stanford's famous machine learning course taught by Andrew Ng.
Coursera: Computational Methods for Data Analysis: Statistical methods and data analysis applied to physical, engineering, and biological sciences.
MIT Data Mining: An introduction to the techniques of data mining and how to apply ML...
Alex Kamil
Strictly speaking, there is no such thing as "data science" (see What is data science?). See also: Vardi, Science has only two legs: http://portal.acm.org/ft_gateway
Here are some resources I've collected about working with data. I hope you find them useful (note: I'm an undergrad student; this is not an expert opinion in any way).
1) Learn about matrix factorizations
Take the Computational Linear Algebra course (it is sometimes called Applied Linear Algebra, Matrix Computations, Numerical Analysis or Matrix Analysis, and it can be either a CS or an Applied Math course). Matrix decomposition algorithms are fundamental to many data mining applications and are usually underrepresented in a standard "machine learning" curriculum. With TBs of data, traditional tools such as Matlab become unsuitable for the job; you cannot just run eig() on Big Data. Distributed matrix computation packages such as those included in Apache Mahout [1] are trying to fill this void, but you need to understand how the numeric algorithms/LAPACK/BLAS routines [2][3][4][5] work in order to use them properly, adjust for special cases, build your own, and scale them up to terabytes of data on a cluster of commodity machines [6]. Numerics courses are usually built upon undergraduate algebra and calculus, so you should be good with prerequisites. I'd recommend these resources for self-study/reference material:
See Jack Dongarra: Courses and What are some good resources for learning about numerical analysis?
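Here is a concrete matrix factorization at work. This minimal sketch (NumPy assumed; synthetic data) uses the SVD to build the best rank-k approximation of a matrix. That is the idea behind PCA, latent semantic analysis and many recommender models. NumPy's `svd` calls LAPACK under the hood. `low_rank_approx` is a hypothetical helper name chosen for this example.

```python
import numpy as np

def low_rank_approx(A, k):
    """Best rank-k approximation of A in the Frobenius norm (Eckart-Young theorem), via the SVD."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Synthetic 50x40 matrix: exactly rank 3, plus a little noise.
rng = np.random.default_rng(1)
A = rng.normal(size=(50, 3)) @ rng.normal(size=(3, 40)) + 0.01 * rng.normal(size=(50, 40))

for k in (1, 2, 3):
    rel_err = np.linalg.norm(A - low_rank_approx(A, k)) / np.linalg.norm(A)
    print(f"rank {k}: relative Frobenius error {rel_err:.4f}")
```

The error drops sharply at k = 3, because three factors capture the structure and what remains is noise. On a single machine this is one library call. At terabyte scale the same decomposition needs the distributed packages mentioned above.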
2) Learn about distributed computing
It's important to learn how to work with a Linux cluster and how to design scalable distributed algorithms if you want to work with big data (Why the current obsession with big data?).
Crays and Connection Machines of the past can now be replaced with farms of cheap cloud instances; the computing costs dropped to less than $1.80/GFlop in 2011 vs $15M in 1984: http://en.wikipedia.org/wiki/FLOPS
If you want to squeeze the most out of your (rented) hardware, it is also becoming increasingly important to be able to utilize the full power of multicore (see http://en.wikipedia.org/wiki/Moo...)
Note: this topic is not part of the standard Machine Learning track, but you can probably find courses such as Distributed Systems or Parallel Programming in your CS/EE catalog. See distributed computing resources, a systems course at UIUC, key works, and for starters: Introduction to Computer Networking
+ After studying the basics of networking and distributed systems, I'd focus on distributed databases, which will soon become ubiquitous with the data deluge and hitting the limits of vertical scaling. See key works, research trends and for starters: Introduction to relational databases and Introduction to distributed databases (HBase in Action)
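A good first distributed algorithm to study is MapReduce [6], and its classic example is word count. The sketch below is a single-process, pure-Python simulation of its three phases (map, shuffle, reduce). In Hadoop or Spark, the framework runs the mappers and reducers on different machines and performs the shuffle over the network. The function names are my own, made up for illustration.

```python
from collections import defaultdict
from itertools import chain

def map_phase(document):
    """Mapper: emit (word, 1) for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle(mapped_pairs):
    """Shuffle: group all emitted values by key, as the framework would across machines."""
    groups = defaultdict(list)
    for key, value in mapped_pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reducer: combine all values seen for one key."""
    return key, sum(values)

documents = ["the quick brown fox", "the lazy dog", "The fox"]
mapped = chain.from_iterable(map_phase(d) for d in documents)
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(counts)  # {'the': 3, 'quick': 1, 'brown': 1, 'fox': 2, 'lazy': 1, 'dog': 1}
```

Every mapper call is independent, and so is every reducer call (per key). That independence is what lets the pattern scale out horizontally.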
3) Learn about statistical analysis
+ Start learning statistics by coding with R (What are essential references for R?) and experiment with real-world data (Where can I find large datasets open to the public?)
+ Cosma Shalizi compiled some great materials on computational statistics; check out his lecture slides, and also What are some good resources for learning about statistical analysis?
+ I've found that learning statistics in a particular domain (e.g. Natural Language Processing) is much more enjoyable than taking Stats 101. My personal recommendation is the course by Michael Collins at Columbia
(also available on Coursera).
+ You can also choose a field where the use of quantitative statistics and causality principles [7] is inevitable, say molecular biology [8], or a fun sub-field such as cancer research [9], or an even narrower domain, e.g. genetic analysis of tumor angiogenesis [10], and try answering important questions in that particular field, learning what you need in the process.
4) Learn about optimization
+ This subject is essentially a prerequisite to understanding many Machine Learning and Signal Processing algorithms, besides being important in its own right.
+ Start with Stephen P. Boyd's video lectures and also What are some good resources to learn about optimization?
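The workhorse of optimization in machine learning is gradient descent. This minimal NumPy sketch uses toy numbers chosen so that the exact answer is [1, 1]. It minimizes a least-squares objective with a fixed step of 1/L and checks the result against a direct solver. The helper names are hypothetical.

```python
import numpy as np

def gradient_descent(grad, x0, step, n_iters=1000):
    """Plain fixed-step gradient descent: repeat x <- x - step * grad(x)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(n_iters):
        x = x - step * grad(x)
    return x

# Least-squares objective f(x) = 0.5 * ||A x - b||^2 (toy numbers).
A = np.array([[2.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
b = np.array([2.0, 1.0, 2.0])

def grad_f(x):
    """Gradient of f: A^T (A x - b)."""
    return A.T @ (A @ x - b)

# Step 1/L, with L the largest eigenvalue of A^T A (the Lipschitz constant of
# grad_f), is a standard safe choice for a convex quadratic like this one.
L = np.linalg.eigvalsh(A.T @ A).max()
x_gd = gradient_descent(grad_f, x0=[0.0, 0.0], step=1.0 / L)
x_exact = np.linalg.lstsq(A, b, rcond=None)[0]
print(x_gd, x_exact)  # both approximately [1. 1.]
```

How fast this converges depends on the ratio of the smallest to the largest eigenvalue of A^T A. Courses such as Boyd's explain why, and what to do when that ratio is poor.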
5) Learn about machine learning
Before you even think about algorithms, look carefully at the data and select features that help you filter signal from noise. See this talk by Jeremy Howard: At Kaggle, It's a Disadvantage To Know Too Much
Also see How do I learn machine learning? and What are some introductory resources for learning about large scale machine learning? Why?
Statistics vs. machine learning, fight!: http://brenocon.com/blog/2008/12
You can structure your study program according to the online course catalogs and curricula of MIT, Stanford or other top schools. Experiment with data a lot, hack some code, ask questions, talk to good people, set up a web crawler in your garage: The Anatomy of a Search Engine
You can join one of these startups and learn by doing: What startups are hiring engineers with strengths in machine learning/NLP?
The alternative (and rather expensive) option is to enroll in the Machine Learning track of a CS program if you prefer studying in a formal setting. See: What makes a Master's in Computer Science (MS CS) degree worth it and why?
Try to avoid overspecialization. The breadth-first approach often works best when learning a new field and dealing with hard problems; see the Second Voyage of HMS Beagle on the adventures of an ingenious young data
6) Learn about information retrieval
+ Machine learning is not as cool as it sounds: http://teddziuba.com/2008/05/mac...
+ What are some good resources to get started with Information Retrieval? Why?
7) Learn about signal detection and estimation
+ This is a classic topic and "data science" par excellence in my opinion. Some of these methods were used to guide the Apollo mission or detect enemy submarines and are still in active use in many fields. This is often part of the EE curriculum.
+ Good references are Robert F. Stengel's lecture slides on optimal control and estimation (Rob Stengel's Home Page), Alan V. Oppenheim's Signals and Systems, and What are some good resources for learning about signal estimation and detection? A good topic to focus on first is the Kalman filter, widely used for time series forecasting.
+ Talking about data, you probably want to know something about information: its transmission, compression and filtering signal from noise. The methods developed by communication engineers in the 60s (such as the Viterbi decoder, now used in about a billion cellphones) are applicable to a surprising variety of data analysis tasks, from statistical machine translation to understanding the organization and function of molecular networks. A good resource for starters is Information Theory and Reliable
Communication: Robert G. Gallager: 9780471290489: Amazon.com: Books. Also What are some good resources for learning about information theory?
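To get a feel for the Kalman filter recommended above, here is a minimal scalar version. It assumes a random-walk state observed with Gaussian noise of known variance, which is the simplest textbook model. It is written with NumPy, and the function name and parameters are hypothetical. Each step predicts (uncertainty grows) and then updates (the prediction and the new measurement are blended by the Kalman gain).

```python
import numpy as np

def kalman_1d(measurements, process_var, meas_var, x0=0.0, p0=1.0):
    """Scalar Kalman filter for a random-walk state observed with noise.

    Assumed model (for this sketch):
        x_t = x_{t-1} + w_t,  w_t ~ N(0, process_var)
        z_t = x_t + v_t,      v_t ~ N(0, meas_var)
    """
    x, p = x0, p0
    estimates = []
    for z in measurements:
        # Predict: the state is expected to stay put, but its uncertainty grows.
        p = p + process_var
        # Update: blend prediction and measurement using the Kalman gain k.
        k = p / (p + meas_var)
        x = x + k * (z - x)
        p = (1 - k) * p
        estimates.append(x)
    return np.array(estimates)

rng = np.random.default_rng(0)
true_value = 5.0
z = true_value + rng.normal(0.0, 1.0, size=200)  # noisy readings of a constant
est = kalman_1d(z, process_var=1e-5, meas_var=1.0)
print(f"last raw reading: {z[-1]:.3f}, filtered estimate: {est[-1]:.3f}")
```

With a tiny `process_var`, the filter behaves like a running average and settles near the true value. A larger `process_var` makes it track changes faster but smooth less. Forecasting and tracking applications tune exactly this trade-off.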
8) Master algorithms and data structures
+ What are the most learner-friendly resources for learning about algorithms?
9) Practice
Getting In Shape For The Sport Of Data Science
+ Carpentry: http://software-carpentry.org/
+ What are some good toy problems in data science?
+ Tools: Which are some of the best Data Analysis tools?
+ Where can I find large datasets open to the public?
If you do decide to go for a Masters degree:
10) Study Engineering
I'd go for CS with a focus on either IR or Machine Learning or a combination of both, and take some systems courses along the way. As a "data scientist" you will have to write a ton of code and probably develop distributed algorithms/systems to process massive amounts of data. An MS in Statistics will teach you how to do modeling and regression analysis etc., not how to build systems. I think the latter is more urgently needed these days, as the old tools become obsolete with the avalanche of data. There is a shortage of engineers who can build a data mining system from the ground up. You can pick up statistics from books and experiments with R (see item 3 above) or take some statistics classes as a part of your CS studies.
Good luck
[1] http://mahout.apache.org/
[2] http://www.netlib.org/lapack/
[3] http://www.netlib.org/eispack/
[4] http://math.nist.gov/javanumerics/
[5] http://www.netlib.org/scalapack/
[6] http://labs.google.com/papers/ma...
[7] Amazon.com: Causality: Models, Reasoning and Inference (9780521895606): Judea Pearl: Books
[8] Introduction to Biology, MIT 7.012 video lectures
[9] Hanahan & Weinberg, The Hallmarks of Cancer, Next Generation: Page on Wise
[10] The chaotic organization of tumor-associated vasculature, from The Biology of Cancer: Robert A. Weinberg: 9780815342205: Amazon.com: Books, p. 562
Updated 18 Nov, 2013. 138,857 views.
Pronojit Saha, Data Aficionado
SELF STARTER WAY
For a self-starter novice, here is an outline that one can start with (this is reproduced from my blog: How to acquire the "Essential Skill Set"? - the Self Starter way).
0. Base Pre-requisites:
+ Mathematics, Algorithms & Databases: MathIsPower4U - Calculus, Coursera - Linear Algebra, Coursera - Analysis of Algorithms, Coursera - Introduction to Databases
+ Statistics: Probability and Statistics for Programmers, Statistical Formulas For Programmers, Coursera - Data Analysis, Coursera - Statistics One
+ Programming: Google Developers R Programming Lectures, Introduction to R - DataCamp, Scientific Python Lectures, How to Think Like a Computer Scientist
1. Acquire & Scrub Data:
+ DFS & Databases: Hadoop Tutorial - Yahoo, BigDataUniversity: Big Data Course, Hortonworks Sandbox, Learning to Process Big Data with MapReduce and Hadoop - Hands-On Exercises
+ Data Munging: Predictive Analytics: Data Preparation, Data Wrangling in Pandas, Data Wrangler, OpenRefine
2. Filter & Mine Data:
+ Data Analysis in R: Data science in R, Coursera - Computing for Data Analysis in R
+ Data Analysis in Python (numpy, scipy, pandas, scikit): Getting Started With Python For Data Science, SciPy 2013 - NumPy Tutorials, Statistical Data Analysis in Python, Pandas (1st Video Below), SciPy 2013 - Introduction to Scikit-Learn Tutorial I & II (2nd & 3rd Video Below)
+ Exploratory Data Analysis: Exploratory Data Analysis in R, Exploratory Data Analysis in Python, UC Berkeley: Descriptive Statistics, Basic Unix Shell Commands for the Data Scientist
+ Data Mining, Machine Learning: Data Mining Map, Coursera - Machine Learning, A Programmer's Guide to Data Mining, STATS 202 Data Mining & Analysis, Mining Massive Data Sets - Stanford, Learning From Data - CalTech, Coursera - Web Intelligence & Big Data
3. Represent & Refine Data: Tableau - Training & Tutorials, Data visualisation in R with ggplot2 and plyr, Predictive Analytics: Overview and Data visualization, Flowing Data - Tutorials, UC Berkeley - Data Visualization, D3.js Tutorial
4. Domain Knowledge: This skill is developed through experience working in an industry. Each dataset is different and comes with certain assumptions and industry knowledge. For example, a data analyst specializing in stock market data would need time to develop knowledge in analyzing transactional data for restaurants.
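The "Acquire & Scrub Data" and "Exploratory Data Analysis" steps above often look like this in practice. This minimal pandas sketch works on a deliberately messy, made-up table. It normalizes text, coerces bad numbers to missing values, drops unusable rows and duplicates, and then imputes. Median imputation per group is only one of many reasonable choices.

```python
import pandas as pd

# A small, deliberately messy table (hypothetical data).
raw = pd.DataFrame({
    "city": [" Boston", "boston", "NYC ", None, "NYC "],
    "price": ["12.5", "n/a", "9", "11", "9"],
})

clean = raw.copy()
# Normalize text: trim whitespace and lower-case so "Boston" and "boston" match.
clean["city"] = clean["city"].str.strip().str.lower()
# Coerce to numbers; unparseable strings like "n/a" become NaN.
clean["price"] = pd.to_numeric(clean["price"], errors="coerce")
# Drop rows with no city, then exact duplicate rows.
clean = clean.dropna(subset=["city"]).drop_duplicates()
# Impute missing prices with the per-city median.
clean["price"] = clean.groupby("city")["price"].transform(lambda s: s.fillna(s.median()))

print(clean)
print(clean.describe())  # quick exploratory summary of the numeric column
```

The order of operations matters. Normalizing text before `drop_duplicates` is what lets "NYC " and "NYC " collapse into one row, and deduplicating before imputing keeps the median from being skewed by repeated rows.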
Combining all the above:
Data Literacy Course LAP
UC Berkeley Introduction to Data Science
Coursera - Introduction to Data Science
Teach Data Science - Syracuse University
Apply the knowiedge:
Harvard Data Science Course Homework
Kaggle: The Home of Data Science
Analyzing Big Data with Twitter
Analyzing Twitter Data with Apache Hadoop
FORMAL WAY
For a more formal way of becoming a data scientist, one can look into this post (reproduced below): How to acquire the "Essential Skill Set"? - the Formal way
The Essential Skill Set is the basic fundamental skills which every data scientist is expected to know. Traditionally, these can be acquired by undertaking a computer science degree or a statistics degree from an institution. The Stanford Computer Science courses & Statistics courses provide a good reference list of courses to undertake. Now, some of the courses are relevant while many others are not. For example, in Computer Science one would do well to learn about large scale distributed databases & algorithms, but there is no need for learning HCI and UX, or pure-play storage and operating systems, networking, etc. Similarly, some statistics courses focus too much on, let's say, "old school statistics", including thousands of ways of hypothesis testing, instead of more on machine learning (clustering, regression, classification, etc.). So both the streams have many nice to have courses and
must have courses for a data scientist (I dare to claim that at present the percentage of must have courses seems to be greater in a traditional Statistics stream than in a Computer Science stream). As such, one needs to pick the courses wisely.
Or alternatively, one can also look into a number of new Data Science courses that some universities are offering, harping on the points I mentioned above. They combine the must have courses from both the traditional statistics and computer science programs to impart the 4 Essential Skills, as well as include courses to develop the Differentiator Skills in students. The MS in Data Science at NYU & MS in Analytics at USF are good examples of such an amalgamation of the requisite courses. A complete list of such courses is presented here: Colleges with Data Science Degrees
The correct program obviously depends on the individual's goal. One of the recent O'Reilly publications, titled 'Analyzing the Analyzers', does a very good job of aggregating the various data scientist roles into 4 main categories as per their skills. An individual may therefore select a program as per the category of data scientist he most identifies himself with, as shown below.
+ Data Businesspeople are the product and profit-focused data scientists. They're leaders, managers, and entrepreneurs, but with a technical bent. A common educational path is an engineering degree paired with an MBA, or the new Data Science programs as mentioned above.
+ Data Creatives are eclectic jacks-of-all-trades, able to work with a broad range of data and tools. They may think of themselves as artists or hackers, and excel at visualization and open source technologies. They are expected to have an engineering degree (mostly in statistics or economics) but not much in business skills.
+ Data Developers are focused on writing software to do analytic, statistical, and machine learning tasks, often in production environments. They often have computer science degrees, and often work with so-called big data.
+ Data Researchers apply their scientific training, and the tools and techniques they learned in academia, to organizational data. They may have MS or PhDs in statistics, economics, physics, etc., and their creative applications of mathematical tools yield valuable insights and products.
The skills associated with the 4 main categories, which justify the above mentioned program recommendations, are as below:
[Figure: skills associated with each of the 4 main data scientist categories]
Updated Jan. 62,855 views.
Ye Zhao, data enthusiast
There is a really comprehensive and cool visualization of the path to follow to become a data scientist.
The infographic shows the necessary skills to become a good data scientist and maps out the learning path of a data scientist according to 10 different domains.
Edit: The image came from the article Becoming a Data Scientist - Curriculum via Metromap - Pragmatic Perspectives, by Swami Chandrasekaran.
Written ... 12, 2013. 74,973 views.
Peter Skomoroch, Sr. Data Scientist @ LinkedIn
If you have the time to take courses, give it a shot.
1) Try to take some of the undergrad math courses you missed. Linear Algebra, Advanced Calculus, Diff. Eq., Probability, and Statistics are the most important. After that, take some Machine Learning courses. Read a few of the leading ML textbooks and keep up with journals to get a good sense of the field. Read up on what the top data companies are doing. After 1 or 2 machine learning courses you should have enough background to follow most of the academic papers. Implement some of these algorithms on real data.
2) If you are working with large datasets, get familiar with the latest techniques & tools (Hadoop, NoSQL, Spark, etc.) by putting them into practice at work (or outside of work).
3) A big part of data science on the product development side is essentially software engineering, and being able to create, modify and implement algorithms. As William Chen mentioned, many data scientists know Python, R, scikit-learn etc., but that is mostly for analysis or prototyping. If you need to implement anything at scale or within production systems, you will likely need to know how to write code in something like Java or C++. Check out the book Amazon.com: The Pragmatic Programmer: From Journeyman to Master (9780201616224): Andrew Hunt, David Thomas: Books and the Software Carpentry course if you are coming to software development from a science background.
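"Implement some of these algorithms on real data" is a good habit. A classic first candidate is k-means clustering via Lloyd's algorithm. The sketch below is a from-scratch NumPy version, written for learning rather than production; in practice scikit-learn's implementation would be the usual choice. The `kmeans` function and the two synthetic blobs are my own, for illustration.

```python
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    """Lloyd's algorithm for k-means clustering (a from-scratch sketch)."""
    rng = np.random.default_rng(seed)
    # Initialize centroids by picking k distinct data points at random.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Assignment step: each point goes to its nearest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each centroid to the mean of its points
        # (keep the old centroid if a cluster ends up empty).
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, labels

# Two well-separated synthetic blobs around (0, 0) and (8, 8).
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=0.0, scale=0.5, size=(100, 2)),
    rng.normal(loc=8.0, scale=0.5, size=(100, 2)),
])
centroids, labels = kmeans(X, k=2)
print(np.round(centroids, 2))
```

Writing it yourself makes the usual caveats concrete: the result depends on initialization, clusters can go empty, and the loop only finds a local optimum.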
I did a TCTV interview recently with Semil Shah where we went into more depth on how to become a data scientist: https://techcrunch.com/2012/09/06...
Updated 10 Apr, 2014. 85,882 views. Asked to answer by Alex Kamil
Clare Corthell, Designer & Data Scientist
I wrote myself a curriculum for learning Data Science with freely-available resources, which I open-sourced in The Open Source Data Science Masters. It's a free, community-owned resource.
Updated 10 Dec, 2013. 1,084 views.
Pathan Karimkhan, Bigdata, NLP, Machine learning excite...
Being a data scientist requires a solid foundation, typically in computer science and applications, modeling, statistics, analytics and math.
What sets the data scientist apart is strong business acumen, coupled with the ability to communicate findings to both business and IT leaders in a way that can influence how an organization approaches a business challenge. Good data scientists will not just address business problems; they will pick the right problems that have the most value to the organization.
Also, I believe in-depth knowledge of Data Science, Machine Learning and NLP will help to solve ground- to top-level issues. 4-5 years of development experience can give such acumen.
+ Introduction to CS Course
Notes: Introduction to Computer Science course that provides instructions on coding.
Online Resources:
Udacity - Intro to CS course,
Coursera - Computer Science 101
+ Code in at least one object oriented programming language: C++, Java, or Python
Beginner Online Resources:
Coursera - Learn to Program: The Fundamentals,
MIT Intro to Programming in Java,
Google's Python Class,
Coursera - Introduction to Python,
Python Open Source E-Book
Intermediate Online Resources:
Udacity's Design of Computer Programs,
Coursera - Learn to Program: Crafting Quality Code,
Coursera - Programming Languages,
Brown University - Introduction to Programming Languages
+ Learn other Programming Languages
Notes: Add to your repertoire: JavaScript, CSS, HTML, Ruby, PHP, C, Perl, Shell, Lisp, Scheme.
Online Resources: w3schools.com - HTML Tutorial, Learn to code
+ Test Your Code
Notes: Learn how to catch bugs, create tests, and break your software
Online Resources: Udacity - Software Testing Methods, Udacity - Software Debugging
+ Develop logical reasoning and knowledge of discrete math
Online Resources:
MIT Mathematics for Computer Science,
Coursera - Introduction to Logic,
Coursera - Linear and Discrete Optimization,
Coursera - Probabilistic Graphical Models
+ Develop a strong understanding of Algorithms and Data Structures
Notes: Learn about fundamental data types (stacks, queues, and bags), sorting algorithms (quicksort, mergesort, heapsort), and data structures (binary search trees, red-black trees, hash tables), Big O.
Online Resources:
MIT Introduction to Algorithms,
Coursera - Introduction to Algorithms Part 1 & Part 2,
Wikipedia - List of Algorithms,
Wikipedia - List of Data Structures,
Book: The Algorithm Design Manual
+ Develop a strong knowledge of operating systems
Online Resources: UC Berkeley Computer Science 162
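Two items in the list above, algorithms and testing your code, fit together in one small exercise. Here is a minimal Python sketch: an O(log n) binary search, plus a randomized test that checks it against the standard library's `bisect` module as a reference. The function names are my own, made up for illustration.

```python
import random
from bisect import bisect_left

def binary_search(items, target):
    """Return the index of target in the sorted list items, or -1 if absent. O(log n)."""
    lo, hi = 0, len(items) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if items[mid] == target:
            return mid
        if items[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1

def test_binary_search(trials=1000, seed=0):
    """Randomized test: compare against bisect_left on sorted lists of distinct values."""
    rng = random.Random(seed)
    for _ in range(trials):
        items = sorted(rng.sample(range(100), rng.randint(0, 20)))
        target = rng.randrange(-5, 105)
        i = bisect_left(items, target)
        expected = i if i < len(items) and items[i] == target else -1
        assert binary_search(items, target) == expected, (items, target)
    # Edge cases worth checking by hand.
    assert binary_search([], 3) == -1
    assert binary_search([3], 3) == 0

test_binary_search()
```

Comparing against a trusted reference on many random inputs (including empty lists and out-of-range targets) catches the off-by-one bugs that binary search is famous for.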
Julie Lin
Disclaimer: everyone who wrote their answers here is much more experienced and developed in data science than me. I am a totally new starter in data science.
William Chen invited me to answer this question, so I will use my post as a reply:
New Year New Start: Let's go with the top schools, by Julie Lin on Julie's Data Learning
I figured out a new approach to continue my data journey (new to me, but maybe not new to you people):
Spy... No, I mean, search the top schools' data science/data analysis programs, get the materials and teach myself using their well-designed routes.
If you have any thoughts and advice on this post, please feel free to comment. Your words may help me and other people starting in data science.
Reasons to go with top education systems
(To save your time, feel free to skip this part and "Cons" and go straight to "Resources and Links to Start")
1. Rigorous academic foundation.
This is the biggest issue of self-directed learning I see: lacking a solid and rigorous academic foundation to develop further critical thinking. Like building architecture, a good and solid foundation is necessary to go higher. If all you desire is to quickly solve a temporary work problem, learn and apply a tool from any "data analytics tool book". But to go further, you need that foundation.
2. Structured path.
In the first booklist shared in this blog, some great books were highly recommended by data analytics gurus, which I personally followed as the route of my journey. However, now I see them more as the fruits, flowers and leaves of a tree, instead of the whole tree.
To grow the tree, we probably need a whole picture and structured development. My vision sees a higher probability of success in an educational path that has been designed and approved by experienced professors and the top education systems in the US.
The recommended booklist is still useful, as a self-directed "nutrition supplement".
Both points 1 & 2 can be explained by one example:
When I started picking up the booklist for data science, I searched directly for "data analytics" and "data science" books or online courses, but missed the point that a strong statistics, mathematics and a little programming background are vital to dive into data science books/courses. After searching the top schools' data science paths from undergraduate to graduate, I realized that it needs to be a whole designed package, including solid foundations in statistics, maths and computer science besides data science itself.
Common Practice
A book or a guru's advice may be good on one aspect, based on personal experience. Following it without experienced judgment is blindly gambling that you are not on the wrong path. Again, I see a higher chance of success in following the top education programs, as they were designed to fit hundreds of excellent students.
Cons of Top School Data Science Programs
1. Flaws of the program itself
Harvard classes on data science
This article from about a year ago mainly argued that the Harvard data science program is too traditionally statistics-based and lacks automated or "machine-to-machine" elements.
The defense in reply to the article was mainly the Ivy League cliché about "the great academic resources and the excellent people in top schools". As a self-directed learner, I don't think the replies really cover the flaw.
It's totally fine. I can "supplement the nutrition" by reading the machine learning books in my booklist.
2. Not able to participate in real class projects
I think I can make up for this by doing online competitions at The Home of Data Science; please see William Chen's resources below.
Resources and Links to Start
Although I picked Harvard as the example in the "Cons" section, I want to give Harvard professors and students/alumni a big round of applause for sharing useful resources and information on data science.
Please check William Chen.
Vincent Spruyt
This really depends on your background, but for most of us, learning how to program efficiently is the easy part. The problem is that many data scientists start using machine learning toolboxes and libraries such as Python's Scikit-Learn without a basic understanding of the theoretical foundations of the algorithms. I strongly believe that such a black-box approach will leave you with a handicap in the future, as more data scientists emerge.
I would recommend starting by reading some books about probability theory, pattern recognition and machine learning. My top 4 machine learning books for beginners:
1. Pattern Classification - Richard O. Duda
2. Machine Learning - Tom M. Mitchell
3. Pattern Recognition and Machine Learning - Christopher Bishop
4. Machine Learning: A Probabilistic Perspective - Kevin P. Murphy
You can find a review of these books, covering their level of detail and the mathematics involved, on Machine Learning Books - Computer vision for dummies.
Gautam Tambay, Crunched data at Capital One
Claudia Gold, an SF-based data scientist (formerly at Airbnb and ClassDojo), curated this Data Analysis Learning Path, a sequence of online courses for beginners to learn data analysis. She also has some great Quora answers to data science questions.
Another, more advanced, resource is The Open Source Data Science Masters curriculum by Clare Corthell of Mattermark.
Finally, Zipfian Academy/Galvanize has a good post with linked resources: Practical Intro to Data Science
Written 29 May, 2014
Srini Kumar Kadamatl, Data Scientist
The Open Source Data Science Masters has everything you need to know, from the math to the programming.
It was actually written by someone who taught herself data science using only free/open source tools, guides, courses, etc., and became a Data Scientist at a neat startup in the Bay Area!
Written 4 May, 2016