How Do I Become A Data Scientist

How do I become a data scientist? - Quora

How do I become a data scientist?

William Chen, analyzes data daily as a Data Scientist

Here are some amazing and completely free resources online that you can use to teach yourself data science.

Besides this page, I would highly recommend the Quora Data Science FAQ as your comprehensive guide to data science! It includes resources similar to this one, as well as advice on preparing for data science interviews. Additionally, follow the Quora Data Science topic if you haven't already to get updates on new questions and answers!

Fulfill your prerequisites

Before you begin, you need Multivariable Calculus, Linear Algebra, and Python. If your math background is up to multivariable calculus and linear algebra, you'll have enough background to understand almost all of the probability / statistics / machine learning for the job.

+ Multivariate Calculus: https://.../resources-for-mastering-multivariable-calculus
+ Numerical Linear Algebra / Computational Linear Algebra / Matrix Algebra: Linear Algebra, Coursera (starts 2/2/2015)

Multivariate calculus is useful for some parts of machine learning and a lot of probability. Linear / matrix algebra is absolutely necessary for a lot of concepts in machine learning.

You also need some programming background to begin, preferably in Python. Most other things on this guide can be learned on the job (like random forests, pandas, A/B testing), but you can't get away without knowing how to program!

Python is the most important language for a data scientist to learn. To learn to code, more about Python, and why Python is so important, check out
+ How do I learn to code?
+ How do I learn Python?
+ Why is Python a language of choice for data scientists?
+ Is Python the most important programming language to learn for aspiring data scientists & data miners?

If you're currently in school, take statistics and computer science classes. Check out What classes should I take if I want to become a data scientist?

Plug Yourself Into the Community

Check out Meetup to find some that interest you! Attend an interesting talk, learn about data science live, and meet data scientists and other aspirational data scientists.

Start reading data science blogs and following influential data scientists:
+ What are the best blogs about data?
+ What is your source of machine learning and data science news? Why?
+ Data Science: What are some best user agencies to follow on Twitter, Facebook, G+, and LinkedIn?
+ What are the best Twitter accounts about data?

Setup your tools
+ Install Python, IPython, and related libraries (guide)
+ Install R and RStudio (I would say that R is the second most important language. It's good to know both Python and R)
+ Install Sublime Text

Learn to use your tools
+ Learn R with swirl
+ What's the best way to learn to use Sublime Text?
+ How do I learn SQL?
(I don't think there's too much of a need to install it on your computer, but just learning the syntax will be helpful for the job.)

Learn Probability and Statistics

Be sure to go through a course that involves heavy application in R or Python. Knowing probability and statistics will only really be helpful if you can implement what you learn.
+ Python Application: Think Stats (free pdf) (Python focus)
+ R Applications: An Introduction to Statistical Learning (free pdf) (MOOC) (R focus)
+ Print out a copy of Probability Cheatsheet

Complete Harvard's Data Science Course

This course is developed in part by fellow Quora user, Professor Joe Blitzstein. Note that I recommend completing the 2013 version of the class instead of the 2014 version...

Katie Kent, Director of Educational Outcomes @ Ga...

Become a Data Scientist by Doing Data Science

The best way to become a data scientist is to learn, and do, data science. There are many excellent courses and tools available online that can help you get there.

Here is an incredible list of resources compiled by Jonathan Dinu, Co-founder of Zipfian Academy, which trains data scientists and data engineers in San Francisco via immersive programs, fellowships, and workshops.

EDIT: I've had several requests for a permalink to this answer. See here: A Practical Intro to Data Science from Zipfian Academy

EDIT: See also: "How to Become a Data Scientist" on SlideShare: http://www.slideshare.net/ryanor...

Environment

Python is a great programming language of choice for aspiring data scientists due to its general purpose applicability, a gentle (or firm) learning curve, and (perhaps the most compelling reason) the rich ecosystem of resources and libraries actively used by the scientific community.
Development

When learning a new language in a new domain, it helps immensely to have an interactive environment to explore and to receive immediate feedback. IPython provides an interactive REPL which also allows you to integrate a wide variety of frameworks (including R) into your Python programs.

STATISTICS

Data scientists are better at software engineering than statisticians and better at statistics than any software engineer. As such, statistical inference underpins much of the theory behind data analysis, and a solid foundation of statistical methods and probability serves as a stepping stone into the world of data science.

Courses

edX: Introduction to Statistics: Descriptive Statistics: A basic introductory course.
[...] teaches the complete pipeline of statistical analysis.
MIT: Statistical Thinking and Data Analysis: Introduction to probability, sampling, regression, common distributions, and inference.

While R is the de facto standard for performing statistical analysis, it has quite a high learning curve and there are other areas of data science for which it is not well suited. To avoid learning a new language for a specific problem domain, we recommend trying to perform the exercises of these courses with Python and its numerous statistical libraries. You will find that much of the functionality of R can be replicated with NumPy, SciPy, Matplotlib, and Python Data Analysis Library.

Books

Well-written books can be a great reference (and supplement) to these courses, and also provide a more independent learning experience. These may be useful if you already have some knowledge of the subject or just need to fill in some gaps in your understanding:

O'Reilly Think Stats: An introduction to probability and statistics for Python programmers.
Introduction to Probability: Textbook for Berkeley's Stats 134 class, an introductory treatment of probability with complementary exercises.
Berkeley Lecture Notes, Introduction to Probability
: Compiled lecture notes of the above textbook, complete with exercises.
OpenIntro: Statistics: Introductory textbook with supplementary exercises and labs in an online portal.
Think Bayes: A simple introduction to Bayesian statistics with Python code examples.

MACHINE LEARNING/ALGORITHMS

A solid base of Computer Science and algorithms is essential for an aspiring data scientist. Luckily there is a wealth of great resources online, and machine learning is one of the more lucrative (and advanced) skills of a data scientist.

Courses

Coursera Machine Learning: Stanford's famous machine learning course taught by Andrew Ng.
Coursera: Computational Methods for Data Analysis: Statistical methods and data analysis applied to physical, engineering, and biological sciences.
MIT Data Mining: An introduction to the techniques of data mining and how to apply ML...

Alex Kamil

Strictly speaking, there is no such thing as "data science" (see What is data science?). See also: Vardi, Science has only two legs: http://...

Here are some resources I've collected about working with data, hope you find them useful (note: I'm an undergrad student, this is not an expert opinion in any way).

1) Learn about matrix factorizations

+ Take the Computational Linear Algebra course (it is sometimes called Applied Linear Algebra or Matrix Computations or Numerical Analysis or Matrix Analysis, and it can be either a CS or Applied Math course). Matrix decomposition algorithms are fundamental to many data mining applications and are usually underrepresented in a standard "machine learning" curriculum.
With TBs of data, traditional tools such as Matlab become unsuitable for the job; you cannot just run eig() on Big Data. Distributed matrix computation packages such as those included in Apache Mahout [1] are trying to fill this void, but you need to understand how the numeric algorithms/LAPACK/BLAS routines [2][3][4][5] work in order to use them properly, adjust for special cases, build your own, and scale them up to terabytes of data on a cluster of commodity machines [6]. Usually numerics courses are built upon undergraduate algebra and calculus, so you should be good with prerequisites. I'd recommend these resources for self study/reference material: see Jack Dongarra: Courses and What are some good resources for learning about numerical analysis?

2) Learn about distributed computing

+ It's important to learn how to work with a Linux cluster and how to design scalable distributed algorithms if you want to work with big data (Why the current obsession with big data?).
+ Crays and Connection Machines of the past can now be replaced with farms of cheap cloud instances; the computing costs dropped to less than $1.80/GFlop in 2011 vs $15M in 1984: http://...
+ If you want to squeeze the most out of your (rented) hardware, it is also becoming increasingly important to be able to utilize the full power of multicore (see http://...).
+ Note: this topic is not part of a standard Machine Learning track, but you can probably find courses such as Distributed Systems or Parallel Programming in your CS/EE catalog. See distributed computing resources, a systems course at UIUC, key works, and for starters: Introduction to Computer Networking.
+ After studying the basics of networking and distributed systems, I'd focus on distributed databases, which will soon become ubiquitous with the data deluge and hitting the limits of vertical scaling.
See key works, research trends, and for starters: Introduction to relational databases and Introduction to distributed databases (HBase in Action).

3) Learn about statistical analysis

+ Start learning statistics by coding with R: What are essential references for R? and experiment with real-world data: Where can I find large datasets open to the public?
+ Cosma Shalizi compiled some great materials on computational statistics, check out his lecture slides, and also What are some good resources for learning about statistical analysis?
+ I've found that learning statistics in a particular domain (e.g. Natural Language Processing) is much more enjoyable than taking Stats 101. My personal recommendation is the course by Michael Collins at Columbia (also available on Coursera).
+ You can also choose a field where the use of quantitative statistics and causality principles [7] is inevitable, say molecular biology [8], or a fun sub-field such as cancer research [9], or an even narrower domain, e.g. genetic analysis of tumor angiogenesis [10], and try answering important questions in that particular field, learning what you need in the process.

4) Learn about optimization

+ This subject is essentially a prerequisite to understanding many Machine Learning and Signal Processing algorithms, besides being important in its own right.
+ Start with Stephen P. Boyd's video lectures and also What are some good resources to learn about optimization?

5) Learn about machine learning

+ Before you get to think about algorithms, look carefully at the data and select features that help you filter signal from noise. See this talk by Jeremy Howard: At Kaggle, It's a Disadvantage To Know Too Much
+ Also see How do I learn machine learning? and What are some introductory resources for learning about large scale machine learning? Why?
+ Statistics vs. machine learning, fight: http://...
+ You can structure your study program according to online course catalogs and curricula of MIT, Stanford, or other top schools. Experiment with data a lot, hack some code, ask questions, talk to good people, set up a web crawler in your garage: The Anatomy of a Search Engine
+ You can join one of these startups and learn by doing: What startups are hiring engineers with strengths in machine learning/NLP?
+ The alternative (and rather expensive) option is to enroll in a CS program/Machine Learning track if you prefer studying in a formal setting. See: What makes a Master's in Computer Science (MS CS) degree worth it and why?
+ Try to avoid overspecialization. The breadth-first approach often works best when learning a new field and dealing with hard problems, see the Second voyage of HMS Beagle on the adventures of an ingenious young data...

6) Learn about information retrieval

+ Machine learning is not as cool as it sounds: http://...
+ What are some good resources to get started with Information Retrieval? Why?

7) Learn about signal detection and estimation

+ This is a classic topic and "data science" par excellence in my opinion. Some of these methods were used to guide the Apollo mission or detect enemy submarines and are still in active use in many fields. This is often part of the EE curriculum.
+ Good references are Robert F. Stengel's lecture slides on optimal control and estimation: Rob Stengel's Home Page, Alan V. Oppenheim's Signals and Systems and What are some good resources for learning about signal estimation and detection? A good topic to focus on first is the Kalman filter, widely used for time series forecasting.
+ Talking about data, you probably want to know something about information: its transmission, compression, and filtering signal from noise. The methods developed by communication engineers in the 60s (such as the Viterbi decoder, now used in about a billion cellphones) are applicable to a surprising variety of data analysis tasks, from statistical machine translation to understanding the organization and function of molecular networks. A good resource for starters is Information Theory and Reliable Communication: Robert G. Gallager: 9780471290489: Books. Also What are some good resources for learning about information theory?

8) Master algorithms and data structures

+ What are the most learner-friendly resources for learning about algorithms?

9) Practice

+ Getting In Shape For The Sport Of Data Science
+ Carpentry: http://...
+ What are some good toy problems in data science?
+ Tools: Which are some of the best Data Analysis tools?
+ Where can I find large datasets open to the public?

If you do decide to go for a Masters degree:

10) Study Engineering

I'd go for CS with a focus on either IR or Machine Learning or a combination of both and take some systems courses along the way. As a "data scientist" you will have to write a ton of code and probably develop distributed algorithms/systems to process massive amounts of data. An MS in Statistics will teach you how to do modeling and regression analysis etc., not how to build systems. I think the latter is more urgently needed these days as the old tools become obsolete with the avalanche of data. There is a shortage of engineers who can build a data mining system from the ground up. You can pick up statistics from books and experiments with R (see item 3 above) or take some statistics classes as a part of your CS studies.
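
Item 7 above names the Kalman filter as a good first topic in signal estimation. As a rough illustration only (it does not come from any of the resources linked in this answer), a one-dimensional filter that tracks a slowly drifting level from noisy readings might be sketched in Python like this. The function name kalman_1d and the noise settings q and r are assumptions made up for the example.

```python
import random


def kalman_1d(measurements, q=1e-3, r=0.5, x0=0.0, p0=1.0):
    """Scalar Kalman filter for a random-walk state observed with noise.

    q      -- assumed process-noise variance (drift of the true level per step)
    r      -- assumed measurement-noise variance
    x0, p0 -- initial state estimate and the variance of that estimate
    """
    x, p = x0, p0
    estimates = []
    for z in measurements:
        # Predict: a random walk keeps the estimate, but uncertainty grows.
        p = p + q
        # Update: blend prediction and measurement using the Kalman gain.
        k = p / (p + r)
        x = x + k * (z - x)
        p = (1.0 - k) * p
        estimates.append(x)
    return estimates


if __name__ == "__main__":
    random.seed(0)
    # Hypothetical data: a constant level of 10 seen through Gaussian noise.
    noisy = [10.0 + random.gauss(0.0, 0.7) for _ in range(200)]
    smoothed = kalman_1d(noisy, x0=noisy[0])
    print("last raw reading:", round(noisy[-1], 2))
    print("last filtered estimate:", round(smoothed[-1], 2))
```

A larger q makes the filter trust new readings more and react faster; a larger r makes it smooth harder. Real forecasting problems usually need a multi-dimensional state (level plus trend, for instance), which is where the linear algebra from item 1 comes in.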
Good luck

[1] http://mahout.apache.org/
[2] http://www.netlib.org/lapack/
[3] http://...
[4] http://...
[5] http://www.netlib.org/scalapack/
[6] http://labs.google.com/papers/ma...
[7] Causality: Models, Reasoning and Inference (9780521895606): Judea Pearl: Books
[8] Introduction to Biology, MIT 7.012 video lectures
[9] Hanahan & Weinberg, The Hallmarks of Cancer, Next Generation: Page on Wisc
[10] The chaotic organization of tumor-associated vasculature, from The Biology of Cancer: Robert A. Weinberg: 9780815342205: Books, p. 562

Updated 18 Nov, 2013. 138,857 views.

Pronojit Saha, Data Aficionado

SELF STARTER WAY

For a self-starter novice, here is an outline that one can start with (this is reproduced from my blog: How to acquire the "Essential Skill Set"? - the Self Starter way).

0. Base Pre-requisites:
+ Mathematics, Algorithms & Databases: Mathispower4u-Calculus, Coursera-Linear Algebra, Coursera-Analysis of Algorithms, Coursera-Introduction to Databases
+ Statistics: Probability and Statistics for Programmers, Statistical Formulas For Programmers, Coursera-Data Analysis, Coursera-Statistics One
+ Programming: Google Developers R Programming Lectures, Introduction to R-DataCamp, Scientific Python Lectures, How to Think Like a Computer Scientist

1. Acquire & Scrub Data:
+ DFS & Databases: Hadoop Tutorial - Yahoo,
BigDataUniversity: Big Data Course, Hortonworks Sandbox, Learning to Process Big Data with MapReduce and Hadoop - Hands-On Exercises
+ Data Munging: Predictive Analytics: Data Preparation, Data Wrangling in Pandas, Data Wrangler, OpenRefine

2. Filter & Mine Data:
+ Data Analysis in R: Data science in R, Coursera-Computing for Data Analysis in R
+ Data Analysis in Python (numpy, scipy, pandas, scikit): Getting Started With Python For Data Science, SciPy 2013-NumPy Tutorials, Statistical Data Analysis in Python, Pandas (1st Video Below), SciPy 2013-Introduction to Scikit Learn Tutorial I & II (2nd & 3rd Video Below)
+ Exploratory Data Analysis: Exploratory Data Analysis in R, Exploratory Data Analysis in Python, UC Berkeley: Descriptive Statistics, Basic Unix Shell Commands for the Data Scientist
+ Data Mining, Machine Learning: Data Mining Map, Coursera-Machine Learning, A Programmer's Guide to Data Mining, STATS 202 Data Mining & Analysis, Mining Massive Data Sets - Stanford, Learning From Data - CalTech, Coursera-Web Intelligence & Big Data

3.
Represent & Refine Data: Tableau-Training & Tutorials, Data visualisation in R with ggplot2 and plyr, Predictive Analytics: Overview and Data visualization, Flowing Data-Tutorials, UC Berkeley-Data Visualization, D3.js Tutorial

4. Domain Knowledge: This skill is developed through experience working in an industry. Each dataset is different and comes with certain assumptions and industry knowledge. For example, a data analyst specializing in stock market data would need time to develop knowledge in analyzing transactional data for restaurants.

Combining all the above: Data Literacy Course - IAP, UC Berkeley Introduction to Data Science, Coursera-Introduction to Data Science, Teach Data Science-Syracuse University

Apply the knowledge: Harvard Data Science Course Homework, Kaggle: The Home of Data Science, Analyzing Big Data with Twitter, Analyzing Twitter Data with Apache Hadoop

FORMAL WAY

For a more formal way of becoming a data scientist, one can look into this post (reproduced below): How to acquire the "Essential Skill Set"? - the Formal way

The Essential Skill Set are the basic fundamental skills which every data scientist is expected to know. Traditionally, these can be acquired by undertaking a computer science degree or a statistics degree from an institution. The Stanford Computer Science courses & Statistics courses provide a good reference list of courses to undertake. Now some of the courses are relevant while many others are not. For example, in Computer Science one would do well to learn about large scale distributed databases & algorithms, but there is no need for learning HCI and UX, or pure-play storage and operating systems, networking, etc. Similarly, some statistics courses focus too much on, let's say, "old school statistics", including thousands of ways of hypothesis testing, instead of more on machine learning (clustering, regression, classification, etc.).
So both the streams have many nice-to-have courses and must-have courses for a data scientist (I dare to claim that at present the percentage of must-have courses seems to be greater in a traditional Statistics stream than a Computer Science stream). As such, one needs to pick the courses wisely.

Or alternatively, one can also look into a number of new Data Science courses that some universities are offering, harping on the points I mentioned above. They combine the must-have courses from both the traditional statistics and computer science programs to impart the 4 Essential Skills as well as include courses to develop the Differentiator Skills in students. The MS in Data Science at NYU & MS in Analytics at USF are good examples of such amalgamation of the requisite courses. A complete list of such courses is presented here: Colleges with Data Science Degrees

The correct program obviously depends on the individual's goal. One of the recent O'Reilly publications titled 'Analyzing the Analyzers' does a very good job in aggregating the various data scientist roles into 4 main categories as per their skills. An individual may therefore select a program as per the category of data scientist he most identifies himself with, as shown below.

+ Data Businesspeople are the product and profit-focused data scientists. They're leaders, managers, and entrepreneurs, but with a technical bent. A common educational path is an engineering degree paired with an MBA or the new Data Science programs as mentioned above.
+ Data Creatives are eclectic jacks-of-all-trades, able to work with a broad range of data and tools. They may think of themselves as artists or hackers, and excel at visualization and open source technologies. They are expected
They are expected ‘twhave a engineering degree (mostly in statistics or economies) but not much in business sil, + Data Developers are focused on writing software to do analytic, statistical, end machinc learning task, often in production environments. “They often have computer science degrees, and often work with so-called ig data + Data Researchers apply their scientific traning, and the tools and techniques they earned in academia, to organizational data, They may have ‘8 MS oF PhDs instatisies, economic, physies, ee, and their eeative applications of mathematical tools yields valuable insights and products, The sls assorated withthe ¢ main categories, which justify the above mentioned program recommendstion, areas below: SihanSat-D pein Upatea Jan. 62,855 views. ole 797) Domvote Comments 14+ Share 28 Ye Zhao, data entusiast 703 upotes by Wiliam Chen Fj Wyn, Eanen Khoo, (re) There isa really comprehensive and cool visualization ofthe path to fllow to become a data scientist.|-become dala scientist ona anes (1) How do become a data scientist? - Quera ‘The infographic shows the necessary skills to become a good data scientist and ‘mapped out the learning path of a data scientist according to 10 different domains. Edit: The image came from the article, Becoming a Data Scientist - Cursieulum via Metromap - Pragmatic Perspectives, by Swami Chandrasckaran, Viton 12,2013. 74.973 views. Uoyois 703 Domvate Comments 1+ Shave 15 Potor Skomoroch, Sr. Data Scintst @ Linkealn 485 upvotes by Mat Keeoy, Olan Fel, Nel Keer, mote) fy have the time to take courses, give it a shot. 1 Trytotake some of the undergrad math courses you missed. Linear Algebra, Advanced Calculus, Diff. Eq, Probability, Statistics are the most important ‘After that, take some Machine Learning courses. 
Read a few of the leading ML textbooks and keep up with journals to get a good sense of the field. Read up on what the top data companies are doing. After 1 or 2 machine learning courses you should have enough background to follow most of the academic papers. Implement some of these algorithms on real data.

2) If you are working with large datasets, get familiar with the latest techniques & tools (Hadoop, NoSQL, Spark, etc.) by putting them into practice at work (or outside of work).

3) A big part of data science on the product development side is essentially software engineering, and being able to create, modify, and implement algorithms. As William Chen mentioned, many data scientists know Python, R, scikit-learn, etc., but that is mostly for analysis or prototyping. If you need to implement anything at scale or within production systems you will likely need to know how to write code in something like Java or C++. Check out the book The Pragmatic Programmer: From Journeyman to Master (9780201616224): Andrew Hunt, David Thomas: Books and the Software Carpentry course if you are coming to software development from a science background.

I did a TCTV interview recently with Semil Shah where we went into more depth on how to become a data scientist: https://...

Updated 10 Apr, 2014. 85,882 views. Asked to answer by Alex Kamil.

Clare Corthell, Designer & Data Scientist

I wrote myself a curriculum for learning Data Science with freely-available resources, which I open-sourced in The Open Source Data Science Masters. It's a free, community-owned resource.

Updated 10 Dec, 2013.

Pathan Karimkhan, Bigdata, NLP, Machine learning enthusiast.
Being a data scientist requires a solid foundation, typically in computer science and applications, modeling, statistics, analytics, and math.

What sets the data scientist apart is strong business acumen, coupled with the ability to communicate findings to both business and IT leaders in a way that can influence how an organization approaches a business challenge. Good data scientists will not just address business problems, they will pick the right problems that have the most value to the organization.

Also I believe in-depth knowledge in Data science, Machine learning, and NLP will help to solve ground to top level issues. 4-5 years of development experience can give such acumen.

+ Introduction to CS Course
Notes: Introduction to Computer Science course that provides instructions on coding.
Online Resources: Udacity - Intro to CS course, Coursera - Computer Science 101

+ Code in at least one object oriented programming language: C++, Java, or Python
Beginner Online Resources: Coursera - Learn to Program: The Fundamentals, MIT Intro to Programming in Java, Google's Python Class, Coursera - Introduction to Python, Python Open Source E-Book
Intermediate Online Resources: Udacity's Design of Computer Programs, Coursera - Learn to Program: Crafting Quality Code, Coursera - Programming Languages, Brown University - Introduction to Programming Languages

+ Learn other Programming Languages
Notes: Add to your repertoire - JavaScript, CSS, HTML, Ruby, PHP, C, Perl, Shell,
Lisp, Scheme.
Online Resources: w3schools.com - HTML Tutorial, Learn to code

+ Test Your Code
Notes: Learn how to catch bugs, create tests, and break your software.
Online Resources: Udacity - Software Testing Methods, Udacity - Software Debugging

+ Develop logical reasoning and knowledge of discrete math
Online Resources: MIT Mathematics for Computer Science, Coursera - Introduction to Logic, Coursera - Linear and Discrete Optimization, Coursera - Probabilistic Graphical Models

+ Develop strong understanding of Algorithms and Data Structures
Notes: Learn about fundamental data types (stacks, queues, and bags), sorting algorithms (quicksort, mergesort, heapsort), and data structures (binary search trees, red-black trees, hash tables), Big O.
Online Resources: MIT Introduction to Algorithms, Coursera - Introduction to Algorithms Part 1 & Part 2, Wikipedia - List of Algorithms, Wikipedia - List of Data Structures, Book: The Algorithm Design Manual

+ Develop a strong knowledge of operating systems
Online Resources: UC Berkeley Computer Science 162

Julie Lin

Disclaimer: anyone who wrote their answers here is much more experienced and developed in data science than me. I am a total new starter in data science.

William Chen invited me to answer this question so I will use my post as reply: New Year New Start: Let's go with the top schools by Julie Lin on Julie's Data Learning

I figured out a new approach to continue my data journey (new to me but maybe not new to you people): Spy... No, I mean, search the top schools' data science/data analysis programs, get the materials, and teach myself using their well-designed routes.

If you have any thoughts and advice on this post, please feel free to comment. Your words may help me and other people starting in data science.
Reasons to go with top education systems

(To save your time, feel free to skip this part and "Cons" to the "Resources and Links to Start")

1. Rigorous academic foundation.
This is the biggest issue of self-directed learning I see: lacking a solid and rigorous academic foundation to develop further critical thinking. Like building architecture, a good and solid foundation is necessary to go higher. If all your desire is to quickly solve a temporary work problem, learn and apply a tool from any "data analytic tool book". But to go further, pursue more.

2. Structured path.
In the first booklist shared in this blog, some great books were highly recommended by data analytic gurus, which I personally followed as the route of my journey. However, now I see them more as fruits, flowers, and leaves of a tree, instead of the whole tree. To grow the tree, we probably need a whole picture and structured development. My vision sees a higher probability in an educational path that has been designed and approved by experienced professors and the top education systems in the US. The booklist recommended is still useful, as a self-directed "nutrition supplement".

Both points 1 & 2 can be explained by one example. When I started picking up the booklist for data science, I searched directly for "data analytics" and "data science" books or online courses, but missed the point that strong statistics, mathematics, and a little programming background are vital to dive into data science books/courses. After searching the top schools' data science paths from undergraduate to graduate, I realized that it needs to be a whole designed package including solid foundations of statistics, maths, and computer science besides data science itself.

3. Common Practice
A book or a guru's advice may be good on one aspect based on their personal experiences. Following them without experienced judgment is blindly gambling that you are not on a crooked way.
Again, I see a higher probability of fit in the top education programs, as they were designed to fit hundreds of excellent students.

Cons of Top School Data Science Programs

1. Flaws of the program itself
Harvard classes on data science: This article about a year ago mainly argued that the Harvard data science program is too traditionally statistics-based and lacks automated or "machine-to-machine" elements. The defense in reply to the article was mainly the Ivy League's cliché about "the great academic resources and the excellent people in top schools". As a self-directed learner, I think the replies are not really helping to cover the flaw.

It's totally fine. I can "supplement the nutrition" by reading the machine learning books in my booklist.

2. Not able to participate in real class projects
I think I am able to fix it by doing online The Home of Data Science competitions, please see William Chen's resources below.

Resources and Links to Start

Although picking Harvard as the example in the "Cons" section, I want to give Harvard professors and students/alumni a big round of applause for sharing useful resources and information on data science. Please check William Chen...

Vincent Spruyt

This really depends on your background, but for most of us, learning how to program efficiently is the easy part. The problem is that many data scientists start using machine learning toolboxes and libraries such as Python's Scikit-Learn without having a basic understanding of the theoretical foundations of the algorithms. I strongly believe that such a black-box approach will leave you with a handicap in the future, as more data scientists emerge.

I would recommend to start by reading some books about probability theory, pattern recognition, and machine learning.

My top-4 machine learning books for beginners:
1. Pattern Classification - Richard O. Duda
2. Machine Learning - Tom M.
Mitchell
3. Pattern Recognition and Machine Learning - Christopher Bishop
4. Machine Learning: A Probabilistic Perspective - Kevin P. Murphy

You can find a review of these books, regarding their level of detail and the mathematics, on Machine Learning Books - Computer vision for dummies

Gautam Tambay, Crunched data at Capital One!

Claudia Gold, an SF-based data scientist (formerly at Airbnb and ClassDojo), curated this Data Analysis Learning Path, a sequence of online courses for beginners to learn Data Analysis. She also has some great Quora answers to Data Science questions.

Another cool, more advanced resource is The Open Source Data Science Masters curriculum by Clare Corthell of Mattermark.

Finally, Zipfian Academy/Galvanize has a good post with linked resources: Practical Intro to Data Science

Written 29 May, 2014.

Srini Kumar Kadamati, Data Scientist

The Open Source Data Science Masters has everything you need to know, from the math to the programming. It was actually written by someone who taught herself data science using all free / open source tools, guides, courses, etc., and became a Data Scientist at a neat startup in the Bay Area!

Written 4 May. 3,112 views.

