Part A: Course Design

Course Title MLOps

Course No(s) AIML ZG523
Credit Units 4
Content Authors Pravin Y Pawar
Version 1.0

Course Description

Adaptation of DevOps for building and deploying machine learning systems, Model Deployment:
Infrastructure requirements; Deployment patterns, Model CI/CD (Build, Test, Integration and Delivery of
model); Model Serving tools and technologies; Model life cycle management, ML pipelines with data
management support, model assessment, evolution and management in production, MLOps infrastructure and
tools; Trends in Model deployment: ML on the Cloud / Edge / Browsers; VMs, Containers, Docker,
Kubernetes (K8S), FaaS; ML-as-a-Service.

Course Objectives

The course aims to:

CO1 Provide an understanding of the requirements, stakeholders and essential steps involved in building a
machine learning pipeline

CO2 Equip students with conceptual knowledge and hands-on experience in ML model deployments on different

CO3 Provide experience in automating the process of continually developing, evaluating, deploying and
updating models

CO4 Introduce and apply industry practices for model monitoring and observability

CO5 Orient students towards the latest trends in MLOps, especially on cloud, for edge and mobile devices, in

Text Book(s)

T1 Introducing MLOps, Treveil and Dataiku Team

T2 Designing ML Systems, Chip Huyen

T3 Engineering MLOps: Rapidly build, test, and manage production-ready machine learning life
cycles at scale

Reference Book(s) & other resources

R1 Reliable Machine Learning – Applying SRE Principles to ML in Production, Chen et al

R2 Machine Learning Engineering, Andriy Burkov

R3 Building Machine Learning Pipelines - Automating Model Life Cycles with TensorFlow, Hannes
Hapke & Catherine Nelson

R4 Practical MLOps, Noah Gift, Alfredo Deza

R5 Beginning MLOps with MLFlow: Deploy Models in AWS SageMaker, Google Cloud, and
Microsoft Azure

R6 AWS / Azure MLOps documentation

R7 Various product technical / white papers

Learning Outcomes:

Students will be able to:

LO1 Determine the infrastructure and tooling requirements necessary for a specified ML use case

LO2 Build, deploy, serve, orchestrate and analyze ML pipelines using open-source tools/platforms

LO3 Refine ML models through retraining, periodic tuning and complete remodeling to ensure long-term
accuracy

LO4 Appreciate model serving approaches on various targets such as edge, mobile devices, on cloud, in

Part B: Course Handout

Academic Term I Semester 2023-2024

Course Title MLOps
Course No AIML ZG523
Lead Instructor Pravin Y Pawar

Glossary of Terms

Module M Module is a standalone quantum of designed content. A typical course is
delivered using a string of modules. M2 means module 2.

Contact Hour CH Contact Hour (CH) stands for an hour-long live session with students
conducted either in a physical classroom or enabled through technology.
In this model of instruction, instructor-led sessions total 32 CH.

Recorded Lecture RL RL stands for Recorded Lecture or Recorded Lesson. It is presented to the
student through an online portal. A given RL unfolds as a sequence of
video segments interleaved with exercises.

Lab Exercises LE Lab exercises associated with various modules

Self-Study SS Specific content assigned for self-study

Homework HW Specific problems/design/lab exercises assigned as homework

Modular Structure

Module Summary
No. Content of the Module
M1 MLOps Foundations

M2 Process and Tooling

M3 Model Experimentation and Packaging

M4 Model Deployment & Orchestration

M5 Model Serving

M6 Monitoring & Observability

M7 Continual Learning and Testing

M8 Trends in MLOps

Detailed Structure

M1: MLOps Foundations

Contact Session 1-3

Session  Type  Description/Plan                                  Reference

1        CH1   • Three levels of ML software                     ML Workflow, Three Levels
               • ML life-cycle and System Architecture           AWS ML Lens, ML System Arch

2        CH3   • Challenges with ML lifecycle                    Challenges
         CH4   • Motivation and Drivers for MLOps                T1 Ch1
               • People of MLOps                                 T1 Ch2

3        CH5   • Key MLOps features and maturity models          T1 Ch3, Google MLOps
               • AllTheOps: DataOps, ModelOps, AIOps             AllTheOps

Post CS  AR    • Hidden technical debt in machine learning systems
               • 2020 State of Enterprise Machine Learning | Algorithmia
               • Why is DevOps for Machine Learning so Different?
               • Delivering on the Vision of MLOps
               • MLOps Principles
               • MLOps Principles and How to Implement Them
               • MLOps vs. DevOps vs. ModelOps
               • Differences Between MLOps, ModelOps, AIOps,
               • Roles in ML Team and How They Collaborate With Each Other
         LE    • Lab 1

M2: Process and Tooling

Contact Session 4-5

Session  Type  Description/Plan                                  Reference

4        CH7   • MLOps life-cycle, process and capabilities      Google Guide
         CH8   • Infrastructure: Storage and Compute             T2 Ch10
               • Dev / Production Env, Runtimes                  T1 Ch5

5        CH9   • ML Platforms                                    T2 Ch10
         CH10  • Landscape of MLOps Tools / Platforms            ML Platforms, Mymlops tools
               • Uber’s Michelangelo                             Michelangelo
               • TFX @ Spotify                                   Spotify
               • ML @ Netflix’s

Post CS  AR    • Building a Machine Learning Platform [Definitive
               • Building a machine learning platform
               • Open Source MLOps: Platforms, Frameworks and
               • A Tour of End-to-End Machine Learning
               • End to End ML Platform! Are we there yet?
               • ML Platform Podcast
               • MLOps Landscape in 2023: Top Tools and

M3: Model Experimentation and Packaging

Contact Session 6-7

Session  Type  Description/Plan                                  Reference

6        CH11  • Experimentation
         CH12  • Model Versioning
               • Model Metadata
               • Model Registry

7        CH13  • Model Packaging                                 T3 Ch5
               • Model File formats                              R4 Ch4
               • Serialization
               • Containerization

Post CS  AR    • Three Levels of ML Software
               • Guide to File Formats for Machine Learning: Columnar, Training, Inferencing, and the Feature
               • Lab 2

M4: Model Deployment & Orchestration

Contact Session 8-9

Session  Type  Description/Plan                                  Reference

8        CH15  • Deployment Myths                                T2 Ch7
         CH16  • Productionalization and Deployment              T1 Ch6
               • Deployment requirements and challenges
               • Deployment Patterns                             R2 Ch8
                 o Static / Dynamic / Streaming

9        CH17  • Orchestration of ML Pipelines
         CH18  • Apache Beam / Airflow / Kubeflow

Post CS  LE    • Lab 3

M5: Model Serving

Contact Session 10-12

Session  Type  Description/Plan                                  Reference

10       CH19  • Properties of Model Serving runtime             R2 Ch8
         CH20  • Key serving questions: Load, Latency, Location, R1 Ch7
                 Hardware, Execution, Feature pipelines
               • Model serving Architectures / patterns          three-levels-of-ml-
               • Batch vs Online Prediction/Scoring              T2 Ch7

11       CH21  • Model Server (Model as a Service)
         CH22  • Model API Design
               • Real-time model serving

12       CH23  • Integrated ML Platforms / ML-as-a-Service Platform
               • Case study: Airbnb, Netflix,

Post CS  LE    • How to Solve the Model Serving Component of the MLOps Stack
               • Lab 3 and 5

M6: Monitoring & Observability

Contact Session 13-14

Session  Type  Description/Plan                                  Reference

13       CH25  • Causes of ML system failures                    T2 Ch8
               • Model degradation                               T1 Ch7
               • Drift detection
               • Feedback loop

14       CH27  • Production Monitoring                           R1 Ch9
         CH28  • Monitoring and Observability                    T2 Ch8
                 o ML-specific Metrics
                 o Monitoring toolbox
                 o Observability

Post CS  AR    • A Comprehensive Guide on How to Monitor Your Models in Production
               • Arize - Machine Learning Observability
         LE    • Lab 4

M7: Continual Learning and Testing

Contact Session 15

Session  Type  Description/Plan                                  Reference

15       CH29  • Continual Learning                              T2 Ch9
                 o Stateless retraining vs Stateful training
                 o Challenges
                 o Stages of continual learning
                 o Model upgrades
               • Test in production                              R2 Ch7
                 o Offline vs Online evaluation                  T2 Ch9
                 o Shadow deployment
                 o A/B testing
                 o Canary releases
                 o Interleaving experiments
                 o Bandits

Post CS  SS    • To be identified

M8: Trends in MLOps

Contact Session 16

Session  Type  Description/Plan                                  Reference

16       CH31  • Model Compression
         CH32  • ML in Browsers and Mobile Phones
               • ML on Edge
               • Continuous ML
               • Federated Machine Learning

Post CS  SS    • To be identified

Experiential Learning Component

Lab Topic

1 Construct an end-to-end Machine Learning Pipeline using MLflow (MATS?)   Virtual Labs

• Stages
a) Problem understanding (aka business understanding)
b) Data collection
c) Data annotation
d) Data wrangling
e) Model development, training and evaluation
f) Model validation
g) Local model deployment
h) Prediction

• Tech-Stack
a) RDBMS / Real-time source
b) SQL/Python
c) dbt
d) Feast
e) DVC
f) Python/scikit-learn
g) MLflow
h) GitHub

2 Manage Machine Learning Model Metadata using MLflow / Neptune (Continuous Integration)   Virtual Labs

• Components
a) Projects
b) Experiments
c) Metadata
d) Model tracking / logging
e) Model Registry
• Tech-Stack
a) Python
b) Jupyter Notebooks
c) MLflow / Neptune
d) Git?

3 Deploy and serve the ML model as Microservices (Continuous   Virtual Labs

• Stages
a) Triggers for deployment
b) Local deployment using containers
c) Cloud deployment using AWS services (SageMaker + S3, etc.)
d) Offline (batch) serving
e) Online serving
• Tech-Stack
a) Python
b) Containers
c) AWS
d) MLflow Model Registry
e) API

4 Monitor the performance of a deployed predictive model   Virtual Labs

• Stages
a) Monitoring data and feature drift
b) Monitoring target drift
c) Monitoring model performance
• Tech-Stack
a) MLflow
b) Model Server
c) Evidently

5 Manage the MLOps lifecycle using cloud services   Virtual Labs

• Stages
• Tech-Stack
a) Azure Machine Learning
b) Azure DevOps

Evaluation Scheme:

Legend: EC = Evaluation Component; AN = Afternoon Session; FN = Forenoon Session

No.   Name                                   Type         Duration  Weight  Day, Date, Session, Time

EC-1  Experiential Learning Assignment-I     Take Home    15 days   20%     TBA
      Experiential Learning Assignment-II    Take Home    15 days   20%     TBA
EC-2  Mid-Semester Test                      Closed Book  2 hours   30%     Per programme schedule
EC-3  Comprehensive Exam                     Open Book    3 hours   30%     Per programme schedule

Syllabus for Mid-Semester Test (Closed Book): Topics in Session Nos. 1 to 7

Syllabus for Comprehensive Exam (Open Book): All topics (Session Nos. 1 to 16)

Important links and information:

Elearn portal:
Students are expected to visit the Elearn portal on a regular basis and stay up to date with the latest
announcements and deadlines.
Contact sessions: Students should attend the online lectures as per the schedule provided on the Elearn portal.
Evaluation Guidelines:
1. EC1 consists of two assignments. Announcements will be made available on the portal in a timely manner.
2. For Closed Book tests: No books or reference material of any kind will be permitted.
3. For Open Book exams: Use of books and any printed / written reference material (filed or bound) is
permitted. However, loose sheets of paper will not be allowed. Use of calculators is permitted in all
exams. Laptops/Mobiles of any kind are not allowed. Exchange of any material is not allowed.
4. If a student is unable to appear for the Regular Test/Exam due to genuine exigencies, the student
should follow the procedure to apply for the Make-Up Test/Exam which will be made available on the
Elearn portal. The Make-Up Test/Exam will be conducted only at selected exam centres on the dates to
be announced later.

It shall be the responsibility of the individual student to be regular in maintaining the self-study schedule as
given in the course handout, attend the online lectures, and take all the prescribed evaluation components such
as Assignment/Quiz, Mid-Semester Test and Comprehensive Exam according to the evaluation scheme
provided in the handout.
