Unit - 3 MLMM
Versioning
Models and code may be frequently updated to account for drift or for
experimentation. Systems must ensure that the same versions of models and code are deployed
or that they differ in deliberate ways. For debugging, developers must often identify what
specific version of the models and code has made the specific decision and might want to
retrieve or recreate that specific version. When building systems with machine-learning
components, responsible engineers usually aim to version data, ML pipeline code, models,
non-ML code, and possibly infrastructure configurations.
On terminology: Revisions refer to versions of an artifact over time, where one revision
succeeds another, traditionally identified through increasing numbers or a sequence of
commits in a version control system. Variants refer to versions of an artifact that exist in
parallel, for example, models for different countries or two models deployed in an A/B test.
Traditionally, variants are stored in branches or different files in a repository. Version is the
general term that refers to both revisions and variants. Releases are select versions that are
often given a special name and are chosen for deployment. Here, we care about all forms of
versioning.
Versioning Data-Science Code
Versioning of code is standard practice for software developers, who have grown
accustomed to committing their work, with descriptive commit
messages, to a version control system like Git. Operators now also commonly version
infrastructure configurations (as
discussed in chapter Planning for Operations). The version control system tracks every
single change and who has submitted it, and it enables developers to identify and
retrieve earlier versions. Data scientists, by contrast, are often less rigorous about
versioning their work. Their exploratory workflow does not align well with traditional
version control practices of committing cohesive incremental steps, since there often
are no obvious milestones and much code is not intended to be permanent. For
example, data scientists in our bank might experiment with many different ideas in a
format that makes identifying and showing changes difficult in traditional version
control systems.
While versioning of experimental code can be useful as backup and for tracking ideas,
versioning usually becomes important once models move into production. This is also
when exploratory code is typically turned into more robust
pipelines (see chapter ML Pipeline Quality), for example, when we decide to first
deploy our fraud detection model as part of an A/B experiment. At this point, the code
should be treated like any other production code, and standard version control
practices should be used.
When versioning pipeline code, it is important to also track the versions of the
dependencies involved. Common strategies are to (1) use a package manager and declare
dependencies with pinned versions (e.g., requirements.txt), (2) version copies of the
dependencies alongside the code, or (3)
package all learning code and dependencies into versioned virtual execution
environments, like Docker containers, to ensure that environment changes are also
tracked.
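As a rough illustration of the pinned-versions strategy, the sketch below flags requirement lines that do not pin an exact version with `==`. The helper name `unpinned_requirements` and the package versions in the example are hypothetical, and the check deliberately ignores more advanced requirements.txt syntax such as environment markers or `-r` includes.

```python
import re

# Matches "name==version" lines, optionally with extras such as "dvc[gdrive]==3.30.1".
PINNED = re.compile(r"^[A-Za-z0-9_.\-]+(\[[A-Za-z0-9_,\-]+\])?==[A-Za-z0-9_.\-+!]+$")


def unpinned_requirements(lines):
    """Return the requirement lines that do not pin an exact version."""
    unpinned = []
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line:
            continue  # skip blank lines and comment-only lines
        if not PINNED.match(line):
            unpinned.append(line)
    return unpinned


# Hypothetical requirements; the version numbers are placeholders.
example = [
    "scikit-learn==1.3.2",
    "pandas>=2.0",
    "numpy",
    "# a comment",
    "dvc[gdrive]==3.30.1",
]
print(unpinned_requirements(example))  # ['pandas>=2.0', 'numpy']
```

A check like this could run in continuous integration so that an unpinned dependency is noticed before it silently changes the training environment.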
Reproducing Study
When we start building a new machine learning model and are deciding on the model
architecture, a number of issues arise. We have to monitor the code changes we
make, note any differences in the data we have used for training, and keep up with
hyperparameter value updates.
Being able to track all of these changes is important so that we can reproduce our
experiments without wondering which changes gave us the best model. We can go back to
any point in the experimentation process to see which changes gave the best results.
Background on Hyperparameters
Hyperparameters are the values that configure the model and its training. This includes things like the number
of layers in a neural network or the learning rate for gradient descent. They are
different from model parameters because they are not learned during training. They are
set beforehand and used to create the model we train.
Optimizing these values means running training for different hyperparameter settings to see
how accurate the results are. We can find the best model by iterating through different
hyperparameter values and seeing how they affect our accuracy. That's why we do
hyperparameter tuning. There are a couple of common methods that we'll do some code
examples with: grid search and random search.
Let's start by talking about DVC a bit because we'll be using it to add reproducibility to our
tuning process. This is the tool we'll be using to track changes in our data, code, and
hyperparameters.
With DVC, we can add some automation to the tuning process and be able to find and restore
any really good models that emerge.
For hyperparameter tuning, this means we can play with their values without losing track of
which changes made the best model and also have other engineers take a look. We'll do an
example of this with grid search in DVC first.
After we have cloned the repo, install all of the dependencies with this command.
$ pip install -r requirements.txt
We should be able to open a terminal and run an experiment with the following command.
$ dvc exp run
This will trigger the training process to run and it will record the ROC-AUC of our model.
We can check out the results of the experiment with the following command.
$ dvc exp show --no-timestamp --include-params train.n_est,train.min_split
We're adding a few options here to make the table view clearer. We aren't showing
timestamps and we're only looking at two hyperparameter values. We can run dvc exp
show without the options to see the entire table. This will produce a table similar to this.
Now that we have seen how to run an experiment, we're going to write a small script to
automate grid search for us using DVC. Using grid search in hyperparameter tuning means
we have an exhaustive list of hyperparameter values that we want to cycle through. Grid search
will cover every combination of those hyperparameter values.
We'll do this by creating queues. A queue is how DVC allows us to create experiments that
won't be run until later. That way we can cycle through multiple hyperparameters quickly
instead of manually updating a config file with new hyperparameter values for each
experiment run. The command syntax for creating queues looks like this:
$ dvc exp run --queue --set-param train.min_split=<value>
In the example queue above, we're updating the train.min_split value that's inside of
the params.yaml file. This file holds all of the hyperparameter values and is where DVC
looks to determine if any values have changed. With the command above, we're
automatically updating that value in the params.yaml using a queued experiment.
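To make the idea of detecting changed hyperparameter values concrete, here is a minimal sketch that compares two parameter dictionaries shaped like a parsed params.yaml. It is not DVC's actual implementation; the helper names `flatten` and `changed_params` and the example values are assumptions made for illustration.

```python
def flatten(params, prefix=""):
    """Flatten nested dicts into dotted keys, e.g. {'train': {'n_est': 1}} -> {'train.n_est': 1}."""
    flat = {}
    for key, value in params.items():
        dotted = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=dotted + "."))
        else:
            flat[dotted] = value
    return flat


def changed_params(old, new):
    """Return {dotted_key: (old_value, new_value)} for every value that differs."""
    old_flat, new_flat = flatten(old), flatten(new)
    keys = sorted(set(old_flat) | set(new_flat))
    return {
        key: (old_flat.get(key), new_flat.get(key))
        for key in keys
        if old_flat.get(key) != new_flat.get(key)
    }


# Hypothetical contents of params.yaml before and after a queued experiment.
before = {"train": {"n_est": 250, "min_split": 8}}
after = {"train": {"n_est": 250, "min_split": 64}}
print(changed_params(before, after))  # {'train.min_split': (8, 64)}
```

The dotted keys mirror the train.min_split notation used on the command line, which is why the comparison is done on flattened names.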
Now we can make the script. We can add a new file to the src directory called grid_search.py.
Inside of the file, add the following code.
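import itertools
import subprocess
# Automated grid search experiments
n_est_values = [250, 300, 350, 400, 450, 500]
min_split_values = [8, 16, 32, 64, 128, 256]
# Queue one DVC experiment for every combination of hyperparameter values
for n_est, min_split in itertools.product(n_est_values, min_split_values):
    subprocess.run(["dvc", "exp", "run", "--queue",
                    "--set-param", f"train.n_est={n_est}",
                    "--set-param", f"train.min_split={min_split}"])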
We can run this script now and generate our queue with this command.
$ python src/grid_search.py
We'll see some output in the terminal telling us that the experiments have been queued.
Then we can run them all with the following command.
$ dvc exp run --run-all
This will run every experiment that has been queued. Once all of those have run, take a look
at the metrics for each experiment with the same dvc exp show command as before.
The resulting table lists one row for each queued experiment.
Random search
Another commonly used method for tuning hyperparameters is random search. This takes
random values for hyperparameters and builds the model with them. It usually takes less time
than an exhaustive grid search and it can perform better if run for a similar amount of time as
a grid search.
import random
import subprocess
# Automated random search experiments
num_exps = 10
random.seed(0)  # fixed seed so the same values are drawn on every run
for _ in range(num_exps):
    params = {
        "rand_n_est_value": random.randint(250, 500),
        "rand_min_split_value": random.choice([8, 16, 32, 64, 128, 256])
    }
    subprocess.run(["dvc", "exp", "run", "--queue",
                    "--set-param", f"train.n_est={params['rand_n_est_value']}",
                    "--set-param", f"train.min_split={params['rand_min_split_value']}"])
This search could be far more complex with Bayesian optimization to handle the
hyperparameter value selections, but we're keeping it super simple by choosing random
numbers to focus on reproducibility. This will generate ten experiments with random values
for each hyperparameter.
We can run these new experiments with dvc exp run --run-all and then take a look at the
results with dvc exp show --include-params=train.min_split,train.n_est --no-timestamp. Our
table should look something like this.
This shows the difference in the randomly selected values and the values from grid search.
You might find a better value with random search because it jumps around a range of values
which might hit the optimum faster than it would with a grid search.
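One way to see why random search "jumps around" is to count how many distinct values of a single hyperparameter each method tries for the same number of runs. The sketch below only generates candidate values and trains nothing; the seed and variable names are assumptions made for illustration.

```python
import itertools
import random

n_est_grid = [250, 300, 350, 400, 450, 500]
min_split_grid = [8, 16, 32, 64, 128, 256]

# Grid search: every combination of the listed values (6 x 6 = 36 runs).
grid_runs = list(itertools.product(n_est_grid, min_split_grid))

# Random search with the same budget, using a seeded generator so the draws are repeatable.
rng = random.Random(0)
random_runs = [
    (rng.randint(250, 500), rng.choice(min_split_grid))
    for _ in range(len(grid_runs))
]

grid_distinct = len({n_est for n_est, _ in grid_runs})
random_distinct = len({n_est for n_est, _ in random_runs})

print(len(grid_runs), grid_distinct)      # 36 runs, but only 6 distinct n_est values
print(len(random_runs), random_distinct)  # 36 runs, typically many more distinct n_est values
```

With the same budget, the grid revisits each n_est value six times, while random sampling spreads its runs across the whole range.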
Conclusion
With the comparison between grid search and random search, we can see how
reproducibility can help us find the best model for a project. We'll be able to see all of
the hyperparameter changes and code changes that created each model.
This gives us the ability to fine-tune the model because we can go to any experiment and
resume training with different values, code, or data.
Machine Learning Metrics
Evaluating our machine learning algorithm is an essential part of any project. Our
model may give satisfying results when evaluated using one metric, say accuracy_score, but
may give poor results when evaluated against other metrics such as logarithmic_loss.
Most of the time we use classification accuracy to measure the
performance of our model; however, it is not enough to truly judge our model. Different types
of evaluation metrics are available:
Classification Accuracy
Logarithmic Loss
Confusion Matrix
Area under Curve
F1 Score
Mean Absolute Error
Mean Squared Error
Classification Accuracy
Classification accuracy is what we usually mean when we use the term accuracy. It is the
ratio of the number of correct predictions to the total number of predictions made.
It works well only if there is a roughly equal number of samples belonging to each class.
For example, consider that there are 98% samples of class A and 2% samples of class B in our
training set. Then our model can easily get 98% training accuracy by simply predicting
every training sample as belonging to class A.
When the same model is tested on a test set with 60% samples of class A and 40% samples of
class B, the test accuracy would drop to 60%. Classification accuracy is easy to compute,
but on imbalanced data it gives a false sense of high performance.
The real problem arises when the cost of misclassifying the minority class samples is very
high. If we deal with a rare but fatal disease, the cost of failing to diagnose the disease of a
sick person is much higher than the cost of sending a healthy person to more tests.
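The class-imbalance example can be checked with a few lines of code. This is a minimal sketch in which a deliberately naive "model" always predicts class A; the function names are illustrative only.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def always_a(samples):
    """A trivial classifier that predicts class A for every sample."""
    return ["A"] * len(samples)


train_labels = ["A"] * 98 + ["B"] * 2   # 98% class A, 2% class B
test_labels = ["A"] * 60 + ["B"] * 40   # 60% class A, 40% class B

print(accuracy(train_labels, always_a(train_labels)))  # 0.98
print(accuracy(test_labels, always_a(test_labels)))    # 0.6
```

The classifier has learned nothing about class B, yet its training accuracy looks excellent, which is exactly the false sense of performance described above.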
Logarithmic Loss
Logarithmic Loss, or Log Loss, works by penalising false classifications. It works well for
multi-class classification. When working with Log Loss, the classifier must assign a probability
to each class for all the samples. Suppose there are N samples belonging to M classes; then
the Log Loss is calculated as:
\[ \text{LogLoss} = -\frac{1}{N} \sum_{i=1}^{N} \sum_{j=1}^{M} y_{ij} \log(p_{ij}) \]
where y_{ij} indicates whether sample i belongs to class j (1 if it does, 0 otherwise), and
p_{ij} is the predicted probability that sample i belongs to class j.
Log Loss has no upper bound and it exists on the range [0, ∞). Log Loss nearer to 0 indicates
higher accuracy, whereas Log Loss further from 0 indicates lower accuracy.
In general, minimising Log Loss gives greater accuracy for the classifier.
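A minimal sketch of Log Loss for integer class labels is shown below. Because each sample belongs to exactly one class, only the probability assigned to the true class contributes to the sum. The clipping constant `eps` and the example probabilities are assumptions added to keep the logarithm finite and the example small.

```python
import math


def log_loss(y_true, probs, eps=1e-15):
    """Average negative log-probability assigned to the true class of each sample."""
    total = 0.0
    for label, row in zip(y_true, probs):
        p = min(max(row[label], eps), 1 - eps)  # clip to avoid log(0)
        total += -math.log(p)
    return total / len(y_true)


y_true = [0, 1, 2]  # true class index of each sample
confident = [[0.9, 0.05, 0.05], [0.05, 0.9, 0.05], [0.05, 0.05, 0.9]]
unsure = [[0.4, 0.3, 0.3], [0.3, 0.4, 0.3], [0.3, 0.3, 0.4]]

# Both classifiers pick the right class, but the confident one is penalised less.
print(log_loss(y_true, confident) < log_loss(y_true, unsure))  # True
```

Both example classifiers have identical accuracy, so the difference in Log Loss comes entirely from how much probability they place on the correct class.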
Confusion Matrix
Confusion Matrix as the name suggests gives us a matrix as output and describes the complete
performance of the model. Let’s assume we have a binary classification problem. We have
some samples belonging to two classes: YES or NO. Also, we have our own classifier which
predicts a class for a given input sample. On testing our model on 165 samples, we get the
following result.
Confusion Matrix
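There are four important terms:
True Positives: The cases in which we predicted YES and the actual output was also YES.
True Negatives: The cases in which we predicted NO and the actual output was NO.
False Positives: The cases in which we predicted YES and the actual output was NO.
False Negatives: The cases in which we predicted NO and the actual output was YES.
Accuracy for the matrix can be calculated by dividing the sum of the values lying along
the "main diagonal" by the total number of samples, i.e.
\[ \text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \]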
Confusion Matrix forms the basis for the other types of metrics.
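The four counts can be tallied directly from paired labels and predictions. The sketch below uses a small made-up set of eight samples, not the 165-sample example from the text, and the function name is illustrative.

```python
def confusion_counts(y_true, y_pred, positive="YES"):
    """Count true/false positives and negatives for a binary problem."""
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            counts["TP" if actual == positive else "FP"] += 1
        else:
            counts["FN" if actual == positive else "TN"] += 1
    return counts


# Hypothetical labels and predictions for eight samples.
y_true = ["YES", "YES", "YES", "NO", "NO", "NO", "NO", "YES"]
y_pred = ["YES", "YES", "NO", "NO", "NO", "YES", "NO", "YES"]

counts = confusion_counts(y_true, y_pred)
accuracy = (counts["TP"] + counts["TN"]) / sum(counts.values())
print(counts)    # {'TP': 3, 'TN': 3, 'FP': 1, 'FN': 1}
print(accuracy)  # 0.75
```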
Area Under Curve
Area Under Curve (AUC) is one of the most widely used metrics for evaluation. It is used for
binary classification problems. The AUC of a classifier is equal to the probability that the classifier
will rank a randomly chosen positive example higher than a randomly chosen negative
example. Before defining AUC, let us understand some basic terms:
True Positive Rate (Sensitivity): True Positive Rate is defined as TP / (FN+TP). True
Positive Rate corresponds to the proportion of positive data points that are correctly
considered as positive, with respect to all positive data points.
True Negative Rate (Specificity): True Negative Rate is defined as TN / (FP+TN). True
Negative Rate corresponds to the proportion of negative data points that are correctly
considered as negative, with respect to all negative data points.
False Positive Rate: False Positive Rate is defined as FP / (FP+TN). False Positive Rate
corresponds to the proportion of negative data points that are mistakenly considered as
positive, with respect to all negative data points.
False Positive Rate and True Positive Rate both have values in the range [0,
1]. FPR and TPR are both computed at varying threshold values such as (0.00, 0.02, 0.04, ...,
1.00) and a graph is drawn. AUC is the area under the curve of the plot of True Positive Rate
against False Positive Rate. The greater the AUC, the better the performance of
our model.
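The ranking interpretation of AUC can be computed directly by comparing every positive example with every negative example. This is a brute-force sketch intended for small inputs; counting ties as half a win is a common convention assumed here, and the labels and scores are made up.

```python
def auc_by_ranking(labels, scores):
    """Probability that a random positive is scored higher than a random negative."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))


labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]

# 8 of the 9 positive/negative pairs are ranked correctly.
print(auc_by_ranking(labels, scores))
```

The only misranked pair is the positive scored 0.4 against the negative scored 0.7, which is why the result falls just short of 1.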
F1 Score
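F1 Score is used to measure a test's accuracy. F1 Score is the harmonic mean of
precision and recall. The range for F1 Score is [0, 1]. It tells you how precise your classifier is
(how many of the instances it flags as positive are actually positive), as well as how robust it is (it does not miss a
significant number of instances). High precision but lower recall gives you an extremely
accurate classifier, but one that misses a large number of instances that are difficult to classify. The
greater the F1 Score, the better the performance of our model. Mathematically, it can be
expressed as:
\[ F_1 = 2 \cdot \frac{\text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}} \]
Precision: It is the number of correct positive results divided by the number of positive
results predicted by the classifier, i.e. TP / (TP + FP).
Recall: It is the number of correct positive results divided by the number of all samples that
should have been identified as positive, i.e. TP / (TP + FN).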
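Precision, recall, and F1 Score follow directly from the confusion-matrix counts. The sketch below uses made-up counts and returns 0.0 when a denominator would be zero, which is one common convention rather than the only one.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and their harmonic mean (F1) from raw counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts: 8 true positives, 2 false positives, 8 false negatives.
precision, recall, f1 = precision_recall_f1(tp=8, fp=2, fn=8)
print(precision, recall)  # 0.8 0.5
print(round(f1, 3))       # 0.615
```

Here precision is high while recall is low, and the harmonic mean pulls the F1 Score toward the weaker of the two.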
Mean Absolute Error
Mean Absolute Error (MAE) is the average of the absolute differences between the original values and the
predicted values. It gives us a measure of how far the predictions were from the actual
output. However, it does not give us any idea of the direction of the error, i.e. whether we are
under-predicting or over-predicting the data. Mathematically, it is represented as:
\[ \text{MAE} = \frac{1}{N} \sum_{i=1}^{N} |y_i - \hat{y}_i| \]
Mean Squared Error
Mean Squared Error (MSE) is quite similar to Mean Absolute Error, the only difference being
that MSE takes the average of the square of the difference between the original values and the
predicted values:
\[ \text{MSE} = \frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2 \]
The advantage of MSE is that its gradient is easy to compute,
whereas the absolute value in MAE is not differentiable at zero, which makes gradient-based
optimization less convenient. Because we square the error, the effect of larger errors becomes more pronounced
than that of smaller errors, hence the model focuses more on the larger errors.
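The difference between the two regression metrics is easy to see on a small example. The values below are made up; note how the single error of size 3 dominates the squared metric.

```python
def mean_absolute_error(y_true, y_pred):
    """Average of the absolute differences between true and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between true and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


y_true = [3.0, 5.0, 2.0, 7.0]
y_pred = [2.0, 5.0, 4.0, 10.0]  # errors: 1, 0, -2, -3

print(mean_absolute_error(y_true, y_pred))  # 1.5
print(mean_squared_error(y_true, y_pred))   # 3.5
```

The largest error contributes 3 of the 6 units of absolute error but 9 of the 14 units of squared error.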
Machine Learning Model Versioning
It is crucial to understand version control to appreciate model versioning.
Version control
Version control is the process of tracking and managing modifications to software code or ML systems.
It is essential for maintaining a detailed record of changes to a system, enabling data
science teams to revert to previous (favourable) versions and collaborate effectively. Model
versioning, on the other hand, is a specific type of version control focused on tracking changes
made to the ML model in a machine learning system. By versioning the model, teams can
maintain a complete history of changes made to the model, enabling them to reproduce results,
debug issues, and collaborate effectively. In addition, model versioning can track datasets,
metrics, hyperparameters, algorithms, and artifacts to ensure transparency and accuracy in the
ML development process.
Collaboration: If you’re a solo researcher, this might not be important. When you work with
a team and your project is complex, it becomes very difficult to collaborate without a version
control system.
Versioning: While making changes, the model can break. With a version control system, you
get a changelog which will be helpful when your model breaks and you can revert your
changes to get back to a stable version.
Reproducibility: By taking snapshots for the entire machine learning pipeline, you make it
possible to reproduce the same output again, even with the trained weights, which saves the
time of retraining and testing.
Dependency tracking: Tracking different versions of the datasets (training, evaluation, and
development), tuning the model hyperparameters and parameters. By using version control,
you can test more than one model on different branches or repositories, tune the model
parameters and hyperparameters, and monitor the accuracy of each change.
Model Updates: Model development is not done in one step, it works in cycles. With the help
of version control, you can control which version is released while continuing the
development for the next release.
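The benefits above (a changelog, reverting to a stable version, reproducing an exact artifact) can be illustrated with a tiny in-memory registry that identifies each model artifact by a hash of its content. This is a sketch for intuition only; the class `ModelRegistry`, its methods, and the byte strings standing in for model files are all hypothetical.

```python
import hashlib


def artifact_version(content: bytes) -> str:
    """Derive a short version identifier from the artifact's content."""
    return hashlib.sha256(content).hexdigest()[:12]


class ModelRegistry:
    """Keeps every registered artifact plus a changelog of when it was added."""

    def __init__(self):
        self._versions = {}
        self._history = []

    def register(self, content: bytes, note: str) -> str:
        version = artifact_version(content)
        if version not in self._versions:  # identical content maps to the same version
            self._versions[version] = content
            self._history.append((version, note))
        return version

    def restore(self, version: str) -> bytes:
        return self._versions[version]

    def changelog(self):
        return list(self._history)


registry = ModelRegistry()
v1 = registry.register(b"weights-after-initial-training", "initial model")
v2 = registry.register(b"weights-after-tuning", "tuned hyperparameters")

# If the tuned model misbehaves, the earlier artifact can be restored exactly.
print(registry.restore(v1) == b"weights-after-initial-training")  # True
print(len(registry.changelog()))                                   # 2
```

Identifying artifacts by content rather than by file name means two byte-identical models always share a version, and any change to the weights produces a new one.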
A Version Control System (VCS) is a software tool that enables developers to track and
manage changes to source code, data, or model. By programmatically versioning files and
projects, these tools help data scientists reduce the burden of manual versioning and enable
team collaboration. Also, these tools reduce the likelihood of single-point failure compared to
manual versioning where all changes can be lost if your disk gets damaged.
A local version control system involves creating a database of changes to files and directories in the project, allowing you
to revert to previous versions of the project in case of any issues. The system stores a complete
copy of the project in the local database, which allows the team to work offline without the
need for a network connection. However, it can lead to a single-point failure since it has inadequate
backup capabilities.
A diagram illustrating a local version control system
A centralized version control system (CVCS) stores the code in a central repository, and collaborators work on a local copy of the code.
Changes are made to the local copy, and then the changes are committed to the central
repository. Examples of CVCS include Subversion (SVN), and Perforce.
A distributed version control system (DVCS) also stores the code in a central repository, but each collaborator has a local copy of the
entire repository, enabling them to work offline and commit changes to the local repository.
Changes can then be merged into the central repository as needed. Examples of DVCS
include Git, Mercurial, and Bazaar. Git is a version control system, while GitHub is a cloud-based
hosting service that helps you manage Git repositories.
Git is the most popular version control system among developers and data scientists alike: it
has a very reliable workflow, is supported by most third-party platforms such as
GitHub and GitLab, and benefits from the wide adoption of distributed version control systems by the
development community. Here is an example of using Git for model versioning on
a local machine. We can also connect to a GitHub account to interact with the Git repository.
This is a general example showing the idea behind model versioning with Git.
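## initialize a new Git repository in your project directory (shell)
git init
## create a new PyTorch model and save it to a file (Python)
import torch
import torch.nn as nn
class MyModel(nn.Module):
    def __init__(self):
        super(MyModel, self).__init__()
        self.conv1 = nn.Conv2d(3, 16, kernel_size=3, stride=1, padding=1)
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)
        self.conv2 = nn.Conv2d(16, 32, kernel_size=3, stride=1, padding=1)
        self.fc1 = nn.Linear(32 * 8 * 8, 64)  # assumes 3x32x32 input images
        self.fc2 = nn.Linear(64, 10)
    def forward(self, x):
        x = self.pool(torch.relu(self.conv1(x)))
        x = self.pool(torch.relu(self.conv2(x)))
        x = torch.flatten(x, 1)
        return self.fc2(torch.relu(self.fc1(x)))
model = MyModel()
torch.save(model.state_dict(), "model.pth")
## add the model file to the Git repository and commit the changes (shell)
git add model.pth
git commit -m "Initial version of PyTorch model"
## train the model on some data and save the new version (Python)
# Load some data and train the model; train_dataset is assumed to be defined elsewhere
train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=32, shuffle=True)
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
criterion = nn.CrossEntropyLoss()
for inputs, labels in train_loader:  # a single pass over the data, as a minimal example
    optimizer.zero_grad()
    loss = criterion(model(inputs), labels)
    loss.backward()
    optimizer.step()
torch.save(model.state_dict(), "model.pth")
## Add the new model version to the Git repository and commit the changes (shell)
git add model.pth
git commit -m "Updated PyTorch model after training"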
Apart from versioning the code and data in an ML development lifecycle, the model and the
environment should also be versioned for reproducibility and model optimization.
Models:
Model architecture or algorithm, hyperparameters (batch size, learning rate, epochs, etc.), and
weights can be versioned.
Model evaluation metrics and results for each version, including test accuracy and other
relevant performance indicators, should be documented. This enhances the model’s
explainability and performance throughout experimentation.
Environment:
The configurations used for training and deploying models should be versioned. These
configurations include dependencies like libraries and packages to ensure training and
deployment environment consistency.
The deployment scripts used to deploy the model can be versioned to enable the
reproducibility of the deployment process.
The dependencies required for the deployment environment, such as the operating system,
runtime libraries, and software packages, can be versioned to ensure the consistency of the
deployment environment.
In order to ensure consistency and clarity in model versioning, different types of version
numbers are used to indicate the scope of the changes made to a model. These types of version
numbers are typically broken down into
Major version
Minor version
Patch version
Major Version: A major version indicates a significant change that could impact the
performance or functionality of your model. Typically, a major version update involves a
significant change in your model’s architecture, algorithms, or training data. You can also
introduce new features or capabilities using this. Major versions are typically denoted by
incrementing the first digit in the version number.
Minor Version: A minor version indicates a smaller change that typically does not
significantly affect the model’s performance or functionality. For example, a minor version
update could involve a bug fix, a small optimization, or a new feature that does not
fundamentally alter the model’s behaviour. Minor versions are typically denoted by
incrementing the second digit in the version number.
Patch Version: A patch version indicates a small change or bug fix that is made to a specific
version of the model. Patch versions are typically denoted by incrementing the third digit in
the version number.
Versioning schemes
Semantic Versioning:
Semantic versioning is a widely used versioning scheme that uses a three-part version number
consisting of major, minor, and patch versions. The version number is typically written in the
format “major.minor.patch”. This scheme is often used for software libraries, frameworks, and
APIs.
Calendar Versioning:
Calendar versioning is a versioning scheme that uses the date of release as the version number.
For example, a model released on January 1st, 2022 would have a version number of
2022.01.01. This scheme is often used for data science projects where the focus is on tracking
changes over time.
Sequential Versioning:
Sequential versioning is a versioning scheme that uses a simple sequential numbering system
to track versions. Each new version is assigned the next available number in the sequence
(e.g., 1, 2, 3, etc.). This scheme is often used for small projects or individual models.
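The three schemes can be sketched as small helper functions. The function names are illustrative, and the semantic-version helper follows the usual convention that bumping a higher-level part resets the lower ones to zero.

```python
from datetime import date


def bump(version: str, part: str) -> str:
    """Increment the major, minor, or patch part of a 'major.minor.patch' string."""
    major, minor, patch = (int(x) for x in version.split("."))
    if part == "major":
        return f"{major + 1}.0.0"
    if part == "minor":
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown part: {part}")


def calendar_version(release_date: date) -> str:
    """Use the release date as the version, e.g. 2022.01.01."""
    return release_date.strftime("%Y.%m.%d")


def next_sequential(existing_versions) -> int:
    """Return the next number in a simple 1, 2, 3, ... sequence."""
    return max(existing_versions, default=0) + 1


print(bump("1.4.2", "major"))              # 2.0.0
print(bump("1.4.2", "minor"))              # 1.5.0
print(bump("1.4.2", "patch"))              # 1.4.3
print(calendar_version(date(2022, 1, 1)))  # 2022.01.01
print(next_sequential([1, 2, 3]))          # 4
```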
While model versioning is an essential part of machine learning development, it comes with
its own set of challenges. As models become more complex and teams grow, it becomes
increasingly important to manage the versioning process effectively. Here are a few
considerations that should be made depending on the size of your project:
One major challenge in model versioning is data management and versioning. Keeping track
of changes in data used to train models is crucial as it can have a significant impact on model
performance. However, managing large datasets and tracking changes to them can be
challenging, especially when dealing with distributed datasets across multiple machines. It is
essential to establish proper protocols and workflows for data versioning to ensure data
consistency, reliability, and easy tracking of changes.
Integrating model versioning with existing workflows and systems can also pose a challenge.
Different teams may have different tools and workflows, and integrating model versioning
into these can require significant effort. It is essential to choose a model versioning system that
can integrate seamlessly with existing tools and workflows to minimize disruption and ensure
smooth collaboration across teams.
Machine Learning Model Versioning with DVC
DVC lets us connect with storage providers like AWS S3, Microsoft Azure Blob
Storage, Google Drive, Google Cloud Storage, HDFS, etc., to store ML models and
datasets.
ML Experiment Management
DVC introduces pipelines that help in the easy bundling of ML models, data, and code
Depending on the type of remote storage that will be used, we have to install optional
dependencies: [s3], [gdrive], [gs], [azure], [ssh], [hdfs], [webdav], [oss]. Use [all] to
include them all. Here, we will be using Google Drive as remote storage, so
we install DVC with the [gdrive] option:
pip install "dvc[gdrive]"
Getting Started
We will see how to use dvc for tracking data and ml models with gdrive as remote
storage. Imagine the Git repository which contains the following structure:
models
utils
Gdrive Remote Configuration
Now, we need to configure gdrive remote storage. Go to your Google Drive and create a
folder called dvc_storage in it. Open the folder dvc_storage. Get the folder-id of the
dvc_storage folder from the URL:
https://drive.google.com/drive/folders/folder-id
Now, use the following command to use the dvc_storage folder created in Google
Drive as remote storage. The -d flag makes it the default remote, so that dvc push and
dvc pull use it without further options:
dvc remote add -d myremote gdrive://folder-id
Now, we need to commit the changes to git repository by using the command:
git add -A
git commit -m "configure dvc remote storage"
To push the data to remote storage, we use the following command:
dvc push
Then, we push the changes to git using the command:
git push
To pull data from dvc, we can use the following command:
dvc pull
DVC Pipelines
We can make use of DVC pipelines to reproduce the workflows in our repository. The
main advantage of this is that we can go back to a particular point in time and run the
pipeline to reproduce the same result that we had achieved during the previous time.
There are different stages in the DVC pipeline like prepare, train, and evaluate, with
each of them performing different tasks. The DVC pipeline is nothing but a DAG
(Directed Acyclic Graph). In this DAG, there are nodes and edges, with nodes
representing the stages and edges representing the direct dependencies. The pipeline is
defined in a YAML file (dvc.yaml).
Use the prepare stage to run the data cleaning and pre-processing steps. Use
the train stage to train the machine learning model using the data from the prepare stage.
The evaluate stage uses the trained model and predictions to provide different plots and
metrics.
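To illustrate what it means for the pipeline to be a DAG, the sketch below derives an execution order for the three stages and works out which stages would need to be re-run after a change. It uses Python's standard graphlib module (Python 3.9+) and is not how DVC itself is implemented; the dependency mapping and the helper names are assumptions based on the stage descriptions above.

```python
from graphlib import TopologicalSorter

# Each stage maps to the set of stages it depends on directly.
stage_deps = {
    "prepare": set(),
    "train": {"prepare"},
    "evaluate": {"train"},
}


def run_order(deps):
    """Return the stages in an order where every dependency runs first."""
    return list(TopologicalSorter(deps).static_order())


def stages_to_rerun(deps, changed):
    """Return the changed stage plus every stage that depends on it, in run order."""
    affected = []
    for stage in run_order(deps):
        if stage == changed or any(dep in affected for dep in deps[stage]):
            affected.append(stage)
    return affected


print(run_order(stage_deps))                 # ['prepare', 'train', 'evaluate']
print(stages_to_rerun(stage_deps, "train"))  # ['train', 'evaluate']
```

Because the graph is acyclic, a valid order always exists, and a change in one stage only invalidates the stages downstream of it.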
A machine learning model management framework helps data scientists and
engineers more efficiently manage the end-to-end machine learning lifecycle. The framework
provides a centralized repository for storing, sharing, and tracking machine learning models
and metadata. The repository can be used to store both code and model artifacts, and it
provides a web interface for accessing models and training results. The framework also
includes tools for automated model deployments, monitoring, and versioning. These tools
will help data scientists track the performance of their models in production and quickly roll
back changes if necessary.
The framework is designed to work with any machine learning model, allowing users
to easily manage and deploy models. The framework includes a set of tools that allow users
to train, test, and deploy models. The framework also includes a set of APIs that allow
developers to easily integrate the framework into their applications.
The machine learning model management framework offers several benefits for users,
including the ability to:
Easily track machine learning models throughout their entire lifecycle, from training
to deployment
Efficiently manage large numbers of models
Automate key tasks such as model retraining and performance monitoring
Share models and collaborate with other users in a secure and controlled manner.
The Machine Learning model management framework is a toolkit that helps you manage
your machine learning models throughout their lifecycle, from development to production.
The framework includes tools for training, tuning, and deploying machine learning models.
The goal of the Machine Learning model management framework is to make it easy to
manage machine learning models at scale. The framework is designed to work with any
machine learning platform and any type of machine learning model.
To get started with the Machine Learning model management framework, you’ll need to
install the following:
– Python 3.5 or higher
– TensorFlow 1.12 or higher
– The latest version of the Machine Learning model management framework package
Studio ml setup
Install Python, preferably using a distribution like Anaconda, which comes with
popular data science libraries pre-installed.
Install essential libraries such as NumPy, pandas, scikit-learn, TensorFlow, PyTorch,
or any other libraries you plan to use for ML development.
Use version control systems like Git to manage your ML projects and collaborate with
others.
Create a Git repository for your projects and initialize it with a README file.
4. Data Management:
Organize your data effectively by setting up directories for raw data, processed data,
and datasets used in specific projects.
Use data versioning tools like DVC (Data Version Control) to track changes to your
data and ensure reproducibility.
5. Notebook Management:
6. Experiment Tracking:
Use experiment tracking tools like MLflow or Neptune to log parameters, metrics, and
artifacts from your experiments.
These tools help you keep track of experiments, compare results, and reproduce
experiments later.
Implement monitoring solutions to track model performance, data drift, and model
drift in production.
Continuously monitor and update models to ensure they remain accurate and reliable
over time.
9. Documentation and Communication:
Document your workflows, experiments, and findings to share with collaborators and
stakeholders.
Use tools like Markdown, Jupyter Notebooks, or wiki pages to create documentation.
By following these steps, we can create a robust and efficient ML studio environment to
develop, train, and deploy machine learning models effectively. Adjust the setup according to
your specific requirements and preferences.
The goal of building a machine learning model is to solve a problem, and a machine
learning model can only do so when it is in production and actively in use by consumers. As
such, model deployment is as important as model building. Data scientists excel at creating
models that represent and predict real-world data, but effectively deploying machine learning
models is more of an art than science. Deployment requires skills more commonly found in
software engineering and DevOps.
Many teams embark on machine learning projects without a production plan, an approach
that often leads to serious problems when it's time to deploy. It is both expensive and time-
consuming to create models, and you should not invest in an ML project if you have no plan
to put it in production, except of course when doing pure research. There are three key areas
your team needs to consider before embarking on any ML project:
A machine learning model is of no use to anyone if it doesn’t have any data associated with
it. You’ll likely have training, evaluation, testing, and even prediction data sets. You need to
answer questions like:
The size of your data also matters a lot. If your dataset is large, then you need more
computing power for preprocessing steps as well as model optimization phases. This means
you either have to plan for more compute if you’re operating locally, or set up auto-scaling in
a cloud environment from the start. Remember, either of these can get expensive if you
haven’t thought through your data needs, so pre-plan to make sure your budget can support
the model through both training and production.
Even if you have your training data stored together with the model to be trained, you still
need to consider how that data will be retrieved and processed. Here the question of batch vs.
real-time data retrieval comes to mind, and this has to be considered before designing the ML
system. Batch data retrieval means that data is retrieved in chunks from a storage system
while real-time data retrieval means that data is retrieved as soon as it is available.
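The difference between the two retrieval styles can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the in-memory list `storage` is a hypothetical stand-in for a real storage system, and the function names `read_in_batches` and `read_as_stream` are invented for this example.

```python
from typing import Iterable, Iterator, List


def read_in_batches(records: Iterable[dict], batch_size: int) -> Iterator[List[dict]]:
    """Batch retrieval: yield records in fixed-size chunks."""
    batch: List[dict] = []
    for record in records:
        batch.append(record)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # final, possibly smaller, chunk
        yield batch


def read_as_stream(records: Iterable[dict]) -> Iterator[dict]:
    """Real-time retrieval: hand over each record as soon as it is available."""
    for record in records:
        yield record


# Hypothetical in-memory "storage system" standing in for a database or queue.
storage = [{"id": i, "amount": 10 * i} for i in range(5)]

batches = list(read_in_batches(storage, batch_size=2))
stream = list(read_as_stream(storage))
```

With five records and a batch size of two, the batch reader produces chunks of two, two, and one record, while the stream reader produces the same five records one at a time.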
Along with training data retrieval, you will also need to think about prediction data retrieval.
Your prediction data is rarely as neatly packaged as the training data, so you need to consider
a few more issues related to how your model will receive data at inference time:
As with retrieval, you need to consider whether inference is done in batches or in real time.
These two scenarios require different approaches, as the technology and skills involved may be
different. For batch inference, you might save prediction requests to a central store and then
make inferences after a designated period, while for real-time inference, prediction is
performed as soon as the request is made. Knowing this will enable you to effectively plan
when and how to schedule compute resources, as well as what tools to use.
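A minimal Python sketch of the two inference modes follows. The `predict` function is a hypothetical stand-in for a trained model (it simply averages the features), and `BatchInferenceQueue` and `predict_realtime` are names invented for this example, not part of any library.

```python
from typing import Callable, List


def predict(features: List[float]) -> float:
    """Hypothetical stand-in for a trained model: the mean of the features."""
    return sum(features) / len(features)


class BatchInferenceQueue:
    """Stores prediction requests centrally and scores them together later."""

    def __init__(self, model: Callable[[List[float]], float]) -> None:
        self._model = model
        self._pending: List[List[float]] = []

    def submit(self, features: List[float]) -> None:
        # The request is only stored; no prediction is made yet.
        self._pending.append(features)

    def run(self) -> List[float]:
        # Called after the designated period, e.g. by a scheduled job.
        results = [self._model(f) for f in self._pending]
        self._pending.clear()
        return results


def predict_realtime(model: Callable[[List[float]], float], features: List[float]) -> float:
    """Real-time inference: score the request as soon as it arrives."""
    return model(features)


queue = BatchInferenceQueue(predict)
queue.submit([1.0, 3.0])
queue.submit([2.0, 4.0])
batch_results = queue.run()

realtime_result = predict_realtime(predict, [1.0, 2.0, 3.0])
```

In the batch case compute is needed only when `run` is called, which is why the two modes lead to different scheduling and tooling decisions.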
Your model isn’t going to train, run, and deploy itself. For that, you need frameworks and
tooling, software and hardware that help you effectively deploy ML models. These can be
frameworks like TensorFlow, PyTorch, and scikit-learn for training models, programming
languages like Python, Java, and Go, and even cloud environments like AWS, GCP, and
Azure.
After examining and preparing your use of data, the next line of thinking should consider
what combination of frameworks and tools to use.
The choice of framework is very important, as it can decide the continuity, maintenance, and
use of a model. In this step, you must answer the following questions:
Popularity: How popular is the tool in the developer community? Popularity often means it
works well, is actively in use, and has a lot of support. It is also worth mentioning that there
may be newer tools that are less popular but more efficient than popular ones, especially
among closed-source, proprietary tools. You’ll need to weigh that when picking a proprietary
tool to use. Generally, in open source projects, you’d lean toward popular and more mature
tools for reasons I’ll discuss below.
Support: How good is support for the framework or tool? If it is open source, does it have a
vibrant community behind it? If it is closed source, does the vendor provide good support? How
fast can you find tips, tricks, tutorials, and other use cases in actual projects? Does it run on
Windows, Linux, or macOS? Is it easy to customize or implement in your target
environment? These questions are important because many tools are available for research
and experimentation on a project, but few adequately support your model while in
production.
ML projects are never static. This is part of engineering and design that must be considered
from the start. Here you should answer questions like:
Deploying a machine learning model involves the process of making your trained model
available for use in real-world applications. Here's a step-by-step overview of the deployment
process:
1. Preprocessing and Feature Engineering: Ensure that the preprocessing steps and feature
engineering techniques used during training are implemented in the deployment pipeline.
This includes scaling, normalization, encoding categorical variables, handling missing values,
etc.
4. Model Integration: Integrate your trained model into the deployment environment. This
may involve loading the model weights, architecture, and any other necessary files.
6. Scalability and Performance: Ensure that your deployment setup can handle the expected
load and maintain performance requirements. This may involve load testing and optimizing
the deployment infrastructure.
7. Monitoring and Logging: Implement monitoring and logging mechanisms to track the
performance of your deployed model in real-time. This helps in identifying issues,
monitoring model drift, and gathering insights for further improvement.
10. Feedback Loop: Establish a feedback loop to collect user feedback and performance
metrics from the deployed model. This feedback can be used to iteratively improve the model
over time.
By following these steps, you can effectively deploy your machine learning model and make
it available for use in real-world applications.
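Three of the steps above (reusing the training-time preprocessing, loading the trained model into the serving environment, and logging predictions) can be sketched together in Python. This is a toy illustration rather than a recommended implementation: `ModelArtifact`, `train`, `save`, `load`, and `serve` are hypothetical names, the "model" is a fixed linear function instead of a learned one, and JSON is used as the storage format only to keep the example dependency-free.

```python
import json
import logging
import os
import statistics
import tempfile
from dataclasses import asdict, dataclass
from typing import List

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("model_service")  # hypothetical logger name


@dataclass
class ModelArtifact:
    """Preprocessing parameters and model parameters stored as one unit."""
    mean: float
    stdev: float
    weight: float
    bias: float

    def preprocess(self, x: float) -> float:
        # The same scaling is applied at training and at inference time.
        # Assumes stdev is non-zero, i.e. the training values were not all equal.
        return (x - self.mean) / self.stdev

    def predict(self, x: float) -> float:
        return self.weight * self.preprocess(x) + self.bias


def train(xs: List[float]) -> ModelArtifact:
    # Scaling statistics come from the training data; weight and bias are
    # fixed placeholders here rather than learned values.
    return ModelArtifact(statistics.mean(xs), statistics.pstdev(xs), weight=2.0, bias=1.0)


def save(artifact: ModelArtifact, path: str) -> None:
    with open(path, "w", encoding="utf-8") as f:
        json.dump(asdict(artifact), f)


def load(path: str) -> ModelArtifact:
    with open(path, encoding="utf-8") as f:
        return ModelArtifact(**json.load(f))


def serve(artifact: ModelArtifact, x: float) -> float:
    prediction = artifact.predict(x)
    logger.info("input=%s prediction=%s", x, prediction)  # basic prediction log
    return prediction


trained = train([1.0, 2.0, 3.0])
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "model.json")
    save(trained, path)
    restored = load(path)  # what the deployment environment would do

result = serve(restored, 2.0)
```

Storing the scaling statistics in the same artifact as the model parameters is one simple way to keep training and serving from drifting apart; the logged inputs and predictions are the raw material for later drift monitoring.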
Deployment methods (Method, Description, Implementation, Advantages)

API-Based Deployment
Description: Deploy the model as an API service, allowing other applications to send input
and receive predictions.
Implementation: Use web frameworks like Flask, Django, or FastAPI.
Advantages: Decouples model from application logic, supports real-time predictions, enables
integration with various platforms, and facilitates scalability and concurrent access.

Containerization
Description: Package the model and dependencies into a container image for consistency and
portability.
Implementation: Use Docker to create and deploy container images.
Advantages: Ensures consistency across environments, simplifies deployment and dependency
management, supports microservices architecture, and enables scalability and portability.

Serverless Deployment
Description: Deploy the model as a serverless function, which automatically scales based on
demand.
Implementation: Utilize serverless platforms like AWS Lambda, Azure Functions, or Google
Cloud Functions.
Advantages: Eliminates server management overhead, supports event-driven architectures,
auto-scaling, and pay-per-use billing, and reduces operational costs.

Embedded Deployment
Description: Integrate the model directly into embedded systems or edge devices for local
inference.
Implementation: Optimize the model for resource-constrained devices.
Advantages: Enables offline inference, reduces latency by avoiding network communication,
enhances privacy and security by keeping data local, and supports use cases with limited or
unreliable internet connectivity.

Model Serving Frameworks
Description: Use specialized model serving frameworks for deploying and managing models in
production.
Implementation: Utilize frameworks like TensorFlow Serving or TorchServe.
Advantages: Streamlines deployment process, abstracts away infrastructure management,
provides built-in support for common deployment tasks, and offers scalability, reliability, and
monitoring capabilities.

Edge Deployment
Description: Deploy the model to edge computing devices or servers closer to the data source
or users.
Implementation: Utilize edge computing platforms or edge AI frameworks.
Advantages: Reduces latency for real-time applications, improves privacy and security by
keeping data local, conserves network bandwidth, and enables offline operation in
disconnected environments.