Download as pdf or txt
Download as pdf or txt
You are on page 1of 12

Data Science with Machine Learning

Curriculum

1
Updated April 2021
Data Science with Machine Learning Curriculum

Program Objective

Data science is a fast-evolving field and offers many employment opportunities for people with a
robust operational analysis background. In recent years, technological development in data
collection and storage and innovations in data science tools and methodologies have made it even
more important to have properly trained data analysts and data scientists to perform data analyses
to gain business insights.

NYC Data Science Academy designed the Data Science with Machine Learning bootcamp to
provide accelerated training to fulfill the need for data science professionals in the employment
market. The objective of the Data Science Bootcamp is to provide training in primary data science
tools and methods that prepares students for employment opportunities across all industries as
data science professionals.

Program Description

The Data Science with Machine Learning bootcamp is an advanced certificate program that is
designed primarily for individuals who have earned a baccalaureate or higher degree and want to
further their career in the field of data science. It is a very accelerated training program in which
students learn the major tools and methods for performing data analyses and apply them to various
projects typically found in the data science field.

At the foundation level of the program, students learn to employ R and Python for data analytics
projects and for presenting research results effectively. Beyond the foundational level, students
study machine learning with Python and carry out research projects that involve advanced data
science methods and strategies. The program also exposes students to concepts and practices in
deep learning and big data.

Updated April 2021 2


Data Science with Machine Learning Curriculum

Prework Machine Learning I and Job Search


II with Python Career development
Access to online prework to Foundation of statistics, strategies and post-
prepare for an accelerated, regressions, classifications, bootcamp activities
immersive learning at NYC model selections, unsupervised to seek employment in
Data Science Academy. learning, and Machine Learning the data science field.
Project in a group.

Data Machine
Prework Capstone Job Search
Analytics Learning

Data Analytics Capstone


Data analytics using R, Work with team members to
Python, and other tools and design and complete a
methods; two application culminating project that
projects. uses real-world datasets.

3
Updated April 2021
Data Science with Machine Learning Curriculum

Prework
Once students are enrolled in the bootcamp, they are granted access to our online, self-paced pre-
work materials to get prepared in linear algebra, statistics, and some foundational work in coding.

Enrolled bootcamp students can also choose to take part-time, beginner-level courses hosted at our
NYC campus for preparation. Tuition paid for such courses will be credited as part of bootcamp
tuition.

4
Updated April 2021
Data Science with Machine Learning Curriculum

Curriculum Topic Outline


Data Science Toolkit
● Linux system
o Operating systems and Linux
o File system and file operations
o Text-processing commands
o Other useful commands
● Git
o What is version control and Git?
o Installing Git
o Getting started with Git
o Git tips
o Undoing changes
o What is GitHub?
o Working with remotes
● SQL: Part 1
o What is SQL?
o What does SQL do?
o Basic terminology
o Working with SQL hosted in a Docker container through GUI
o Basic SELECT statements
o Operators
o Case statements
o String operations
o Date and time data
● SQL: Part 2
o LIMIT statements
o WHERE statements
o Aggregating functions
o GROUP BY statements
o HAVING statements
o ORDEWR BY statements
• SQL: Part 3
o UNION and UNION ALL
o Normalization
o JOIN operations
o Subqueries

Updated April 2021

5
Data Science with Machine Learning Curriculum

• SQL: Part 4
o Window functions
o Comparison with Grouping and Aggregating
o Application
• SQL: Part 5
o Data manipulation
o Loading data from Les
o Data integrity

Data Analytics with R

• Introduction to R: Part I
o Introduction to R
o Introduction to RStudio
o R objects
o Functional programming: apply
● Introduction to R: Part II
o More data types
o Control statements
o Functions
o Data transformations
● Manipulating Data with dplyr and tidyr
o Introduction to dplyr
o Built-in functions
o Join data sets
o Groupwise operations
o Reshape the layout
o Split/Combine cells
● Data Visualization with "ggplot2"
o Why ggplot2?
o The “Grammar of Graphics”
o Constructing a ggplot2 plot
o Scatterplots
o Bar charts
o Histograms
o Visualizing big data
o Saving graphs
● Advanced ggplot2
o Customized graphics
o Titles
o Coordinate systems

Updated April 2021

6
Data Science with Machine Learning Curriculum

o Scales
o Themes
o Axis labels
o Legends
o Other visualizations
● Introduction to Shiny
o A quick introduction to shiny
o Building a Shiny App from scratch
o Improving your Shiny App
● Shiny Topics
o GoogleVis
o Leaflet
o Shiny dashboard
● Foundations of Statistics
o Descriptive statistics
o Introduction to inferential statistics
o Introduction to machine learning
• Advanced Statistics
o Distributions
o Going through codes
Project 1: Exploratory Visualization & Shiny

Data Analytics with Python


● Object Oriented Programming in Python
o Object oriented programming
o Portability and readability
o Case study and examples
● Data Structures and Control Flows
o Common data structures “out of the box”
o Iterated and vectorized operations
o Error handling
o Case study and examples
● Advanced Topics
o More class
o Iterators and generators
o Function arguments and unpacking
o Case study and examples
● Basic Python Roundup
o ETL case study
§ NumPy

Updated April 2021

7
Data Science with Machine Learning Curriculum

§ An example
§ Getting started
§ Items/spider/pipelines/settings.py
§ In class lab
● NumPy
o NumPy overview
o Ndarray
o Vectorized/Element-wise Operations
o Subscripting and Slicing
o Intro to Matrix Multiplication and Inversion
o Random number generation
o Case study
● Data Manipulations with Pandas
o Series and common operations
o Data frames and common operations
o Data manipulation
o Time series
● Data Visualization in the NumPy Stack
o Matplotlib
o Seaborn
o Case study
● SciPy and Data Analysis Roundup
o Introduction to SciPy
o Case study
Project 2: Data Analytics with Web Scraping

Business Cases in Data Science

• Role of Data Science and DS Professionals


o Data-driven decision making in the business world
o The data professional’s ecosystem
o Data science as a strategic asset for businesses across industries
o The roles of a data analyst and data scientist
o Survey of storytelling with data
• Data Science Solutions to Business Problems
o Key business operational concepts
o Typical business problems
o Role of data to solve business problems
o The process of data analytics
o Data science modeling - supervised and unsupervised
o Anatomy of a business case with data analysis for marketing
• Data Science Solutions to Business Problems
Updated April 2021

8
Data Science with Machine Learning Curriculum

o General business problems in the healthcare and pharmaceutical


industries
o Characteristics of data science projects in these industries
o Anatomy of a business case wherein data analytics offered a solution
to the problem
• Data Science Solutions to Business Problems
o General business programs in the financial industry
o Characteristics of data science projects in the financial industry
o Anatomy of a business case in which data science offered an
effective solution to a business problem
• Conceptualization of a Data Analytics Problem
o Planning process for a data analytics project
o Understand the business problem
o Data available for the project
o Review of dataset(s) to see what knowledge and insights can be
extracted
o Establish project objective(s)
o Preparing data for analysis
o Determine tools and methods
o Envision the approach
o Identify deliverables
o Tell the story to stakeholders

Machine Learning I
● Simple Linear Regression
o Overview of Machine Learning
o Residuals, RSS, and the Coefficient of Determination.
o Assumptions of Simple Linear Regression.
o Model coefficients
§ Interpretation
§ Standard Errors
● Multiple Linear Regression
o Assumptions of Multiple (General) Linear Regression
o Coefficients of Continuous Features
o Multicollinearity and Overfitting
o Dummifying Categorical features
§ Coefficient Interpretation
o Case Study
● Penalized Linear Regression
o The Bias-Variance tradeoff
o Euclidean (L2) vs Manhattan (L1) Distances

Updated April 2021

9
Data Science with Machine Learning Curriculum

o Ridge, Lasso, and ElasticNet Penalties


o Hyperparameter grid search and the model coefficient shrinkages
• Logistic Regression and Gradient Descent
o The failure of Linear Regression Models for Classification
o Log-odds and Sigmoid function
o The concept of Maximum Likelihood Estimation
o Negative Log-Likelihood as Loss
o Gradient Descent and Stochastic Gradient Descent
• Model Selection
o Model assessment in practice
§ Grid search
§ Train-test split
o Model comparison
§ Tuning
§ Various model types
o Thoughts on Feature Engineering
§ Different features make different models
o Case study
• Discriminant Analysis and Naïve Bayes Model
o Conditional Probability and Bayes Theorem
o Univariate and Multivariate Gaussian Distributions.
o Bayes classifiers
o Linear and Quadratic discriminant analysis and the curse of
dimensionality
o Naive Bayes models
§ Gaussian
§ Bernoulli
§ Multinomial
o Case study
• Time Series
o Introduction – the nature of time series analysis
o Univariate time series data
§ Decomposition
§ Correlation and Autocorrelation
§ Stationarity
§ AR/MA/ARIMA models
o Visit to Multivariate
o Case study

Machine Learning II

Updated April 2021

10
Data Science with Machine Learning Curriculum

● Tree-Based Models
o Decision Trees
o Bagging Trees
o Random Forests
o Case study
● Gradient Boosting
o First step to increase stability
o OOB Errors and assessment
o Comparison to the Decision Tree
● Random Forests
o Big picture behind Gradient Boosting
o Gradient Boosting with Trees
o Case study
● Support Vector Machines
o Maximum Margin Classifiers
o Support Vector Classifiers
o Support Vector Regressors
o Bias/Variance with SVM
o Case study
● Supervised Learning Roundup
o Regressors
o Classifiers
o Scalability and further topics
• Unsupervised 1: Clustering
o KMeans
o Hierarchical models
o Case study
• Unsupervised 2: Matrix Factorization
o Principal component analysis
o The other LDA: Latent Dirichlet Analysis
o Case studies
• Machine Learning Roundup
o Shoutout to those forgotten
o Working together: where unsupervised and supervised cooperate
o Case study
• Machine Learning Project
Project 3: Machine learning on housing price

Data Science: Advanced Topics


● Neural Network Foundations

Updated April 2021

11
Data Science with Machine Learning Curriculum

o Revisit Logistics Regression


o Introduction to the Multilayer Perception
o Comparison of where Classical ML and DL perform well
o Introduction to keras and Building the First MLP for Classification
● Neural Network Architectures
o Dealing with overfitting
o CNN
o RNN
o Auto encoders
o GAN
● Neural Networks Beyond
o Transfer learning
o Case study
● Scalability in Operation
o From desktop to database to distributed
§ Hadoop
o Big data on a small scale
§ Down-sampling
● Cloud Computing and Deployment
o To Cloud or Server?
o Current Cloud platforms and offerings
§ AWS
§ Google
o Survey of deployment and pipelines

Data Science Capstone Project


The capstone project is designed for students to employ the major data science
concepts, tools, and methods they have learned in the program to solve a business
operational problem with real data sets from a real business entity. Students are
presented data sets and potential problems to solve. Students are then required to
form project teams, develop a project proposal for instructor review and approval,
and execute the project. When the project is completed, each project team is
required to present the project findings and share the business insights obtained
from the research.

Updated April 2021

12

You might also like