Arize Guide To Optimized Retraining



Authors: Trevor LaViale, Claire Longo


Introduction

While the industry has invested heavily in processes and techniques for knowing when to deploy a model into production, there is arguably less collective knowledge on the equally important task of knowing when to retrain a model. In truth, knowing when to retrain a model is hard due to factors like delays in feedback or labels for live predictions. In practice, many practitioners simply retrain on a fixed schedule, or not at all, and hope for the best.

Based on direct experience working with customers with models in production topping billions of daily predictions, this guide is designed to help data scientists and machine learning engineering teams embrace automated retraining.



Approaches for Retraining

There are two core approaches to automated model retraining:

• Fixed: retraining on a set cadence (e.g., daily, weekly, monthly)
• Dynamic: ad hoc retraining triggered by model performance metrics

While the fixed approach is straightforward to implement, it has some drawbacks. Compute costs can be higher than necessary, frequent retraining can introduce inconsistencies from one model version to the next, and an infrequent retraining schedule can leave the model stale.

The dynamic approach can prevent models from going stale and optimize compute cost. While there are numerous approaches to retraining, Arize has compiled some recommended best practices for dynamic model retraining that will keep models healthy and performant.



There is a suite of tools that can be used to create a model retraining system. The diagram on the preceding page shows how an ML observability platform (i.e., Arize) can integrate into a generalized flow.

There is a wealth of tutorials for specific tooling. Here are a few:

• Automated model retraining with EventBridge
• Automated model retraining with Airflow
• Pachyderm example for a FinTech use case

Ready to get started? Take it a step further with Etsy’s take on stateful model retraining.

Retraining Strategy

Automating the retraining of a live machine learning model can be a complex task, but there
are some best practices that can help guide the design.

1. Metrics to trigger retraining: The metrics used to trigger retraining will depend on the
specific model and use case. Each metric needs a threshold, which is used to trigger
retraining when the performance of the model falls below it. This is where monitors can
come into play. When a performance monitor fires in Arize, for example, the Arize GraphQL
API can be used to programmatically query the performance and drift metrics to evaluate
whether retraining is needed.

Ideal metrics to trigger model retraining:


• Prediction (score or label) drift
• Performance metric degradation
• Performance metric degradation for specific segments/cohorts
• Feature drift
• Embeddings drift



Drift is a measure of the distance between two distributions. It is a meaningful metric
for triggering model retraining because it indicates how much your production data
has shifted from a baseline. Statistical drift can be measured with various drift metrics,
such as population stability index (PSI).

The baseline dataset used to calculate drift can be derived from either the training
dataset or a window of production data.
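To make the drift trigger concrete, here is a minimal sketch of a PSI-based check in Python. It assumes prediction scores are available for both a baseline dataset and a recent production window; the 0.25 threshold matches the example scenarios later in this guide, and the function names are illustrative rather than part of any specific platform API.

```python
import numpy as np

def psi(baseline: np.ndarray, production: np.ndarray, bins: int = 10) -> float:
    """Population Stability Index between a baseline and a production sample."""
    # Bin edges come from the baseline distribution (e.g., the training set).
    edges = np.histogram_bin_edges(baseline, bins=bins)
    base_pct = np.histogram(baseline, bins=edges)[0] / len(baseline)
    prod_pct = np.histogram(production, bins=edges)[0] / len(production)
    # Clip empty bins to avoid division by zero / log(0). Production values that
    # fall outside the baseline range are simply ignored in this simple sketch.
    base_pct = np.clip(base_pct, 1e-6, None)
    prod_pct = np.clip(prod_pct, 1e-6, None)
    return float(np.sum((prod_pct - base_pct) * np.log(prod_pct / base_pct)))

# Trigger retraining when prediction-score drift exceeds a chosen threshold
# (0.25 PSI, matching the scenarios later in this guide).
PSI_THRESHOLD = 0.25

def should_retrain(baseline_scores, production_scores) -> bool:
    return psi(np.asarray(baseline_scores), np.asarray(production_scores)) > PSI_THRESHOLD
```

The same pattern applies to feature drift: compute the metric per feature against the baseline and trigger when any monitored feature crosses its threshold.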

2. Ensuring the new model is working


The new model will need to be tested or validated before promoting it to production to
replace the old one. There are a few recommended approaches here:

• Human review
• Automated metric checks in a CI/CD pipeline (a minimal check is sketched below)
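As a rough illustration of the second approach, the following sketch shows a validation gate that could run in a CI/CD pipeline before promotion. The holdout set, metric choice, and thresholds are assumptions for the example; in practice they should mirror the model's own evaluation criteria.

```python
import sys
from sklearn.metrics import accuracy_score

def validate_candidate(candidate_model, champion_model, X_holdout, y_holdout,
                       min_accuracy: float = 0.80) -> bool:
    """Promote only if the candidate clears an absolute floor and does not regress vs. the champion."""
    candidate_acc = accuracy_score(y_holdout, candidate_model.predict(X_holdout))
    champion_acc = accuracy_score(y_holdout, champion_model.predict(X_holdout))
    return candidate_acc >= min_accuracy and candidate_acc >= champion_acc

# In a CI/CD job, a non-zero exit code blocks promotion of the retrained model.
# if not validate_candidate(candidate, champion, X_holdout, y_holdout):
#     sys.exit(1)
```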

3. Strategy for promoting the new model


The strategy for promoting the new model will depend on the impact that the model
has on the business. In some cases, it may be appropriate to automatically replace the
old model with the new one. In other cases, the new model may need to be A/B
tested live before replacing the old model.

Some strategies for live model testing to consider are:

• Champion vs. Challenger - serve production traffic to both models but only use the
prediction/response from the existing model (champion) in the application. The data
from the challenger model is stored for analysis but not used.
• A/B testing - split production traffic to both models for a fixed experimentation period.
Compare key metrics at the end of the experiment and decide which model to promote.
• Canary deployment - start by redirecting a small percentage of production traffic to the
new model. Since it’s in a production path, this helps catch real issues with the new
model while limiting the impact to a small percentage of users. Ramp up the traffic until
the new model receives 100% of it (a routing sketch follows this list).
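For the canary and A/B patterns above, traffic is usually split deterministically so each user consistently sees one variant. Here is a minimal sketch of hash-based routing; the model objects, percentages, and function names are illustrative and not tied to any particular serving framework.

```python
import hashlib

def route_to_new_model(user_id: str, new_model_share: float) -> bool:
    """Deterministically send a fixed share of traffic to the new model.

    Hashing the user id keeps each user pinned to one variant for the whole
    experiment, which makes metric comparison between the models cleaner.
    """
    bucket = int(hashlib.md5(user_id.encode()).hexdigest(), 16) % 100
    return bucket < new_model_share * 100

def predict(user_id, features, champion_model, challenger_model, new_model_share=0.05):
    # Canary ramp: e.g., 0.05 -> 0.25 -> 0.50 -> 1.0 as confidence in the new model grows.
    model = challenger_model if route_to_new_model(user_id, new_model_share) else champion_model
    return model.predict([features])[0]
```

A champion vs. challenger setup uses the same split, except both models score every request and only the champion's prediction is returned to the application.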

4. Retraining feedback loop data


Once we identify that the model needs to be retrained, the next step is to choose the
right dataset to retrain with. Here are some recommendations to ensure the new
training data will improve the model’s performance.
• If the model performs well overall but is failing to meet optimal performance criteria
on some segments, such as specific feature values or demographics, the new training
dataset should contain extra data points for these lower performing segments. A simple
upsampling strategy can be used to create a new training dataset that targets these
low-performing segments (see the sketch after this list).
• If the model is trained on a small timeslice, the training dataset may not accurately
capture and represent all possible patterns that will appear in the live production data. To
prevent this, avoid training the model on recent data alone. Instead, use a large sample of
historical data, and augment this with the latest data to add additional patterns for the
model to learn from.
• If your model architecture follows the transfer learning design, new data can simply be
added to the model during retraining, without losing the patterns that the model has
already learned from previous training data.
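As an example of the first recommendation, here is a minimal upsampling sketch. It assumes the training data is a pandas DataFrame and that the underperforming segment can be identified by a boolean mask; the column name and upsampling factor are hypothetical.

```python
import pandas as pd

def upsample_segment(train_df: pd.DataFrame, segment_mask: pd.Series,
                     factor: int = 3, seed: int = 42) -> pd.DataFrame:
    """Duplicate rows from a low-performing segment so it carries more weight in retraining."""
    segment_rows = train_df[segment_mask]
    extra = segment_rows.sample(n=len(segment_rows) * (factor - 1),
                                replace=True, random_state=seed)
    return pd.concat([train_df, extra], ignore_index=True)

# Example: the model underperforms for one product category (hypothetical column).
# mask = train_df["product_category"] == "furniture"
# augmented_train_df = upsample_segment(train_df, mask, factor=3)
```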



Arize dashboards are great for tracking and comparing live model performance during
these tests. Whether the model is tested as a shadow deployment, a live A/B test, or simply an
offline comparison, these dashboards offer a simple way to view a side-by-side model
comparison. The dashboards can also easily be shared with others to demonstrate model
performance improvements to stakeholders.

Measurable ROI

Overall, it’s important to have a clear understanding of your business requirements and
the problem you are trying to solve when determining the best approach for automating
the retraining of a live machine learning model. It’s also important to continuously
monitor the performance of the model and make adjustments to the retraining cadence
and metrics as needed.

Measuring Cost Impact


Although it is challenging to calculate direct ROI for some tasks in AI, the value of
optimized model retraining is simple, tangible, and possible to calculate directly. The
compute and storage costs for model training jobs are often already tracked as part of
cloud compute costs. Often, the business impact of a model can be calculated as well.



When optimizing retraining, we are considering both the retraining costs and the impact
of model performance on the business (“AI ROI”). We can weigh these against each
other to justify the cost of model retraining.

Here, we propose a weekly cost calculation, although it can be adapted to a different
cadence such as daily or monthly depending on the model’s purpose and maintenance needs.

Retraining Cost = (compute cost for retraining + cost of storing the new model) × frequency per week
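Expressed as code, the calculation might look like the sketch below; the dollar figures mirror the contrived scenarios that follow, with the storage cost folded into the per-run figure.

```python
def weekly_retraining_cost(compute_cost: float, storage_cost: float,
                           retrains_per_week: int) -> float:
    """Retraining Cost = (compute cost + storage cost) x frequency per week."""
    return (compute_cost + storage_cost) * retrains_per_week

# Scenario one below: daily retraining at $200 per run versus
# drift-triggered retraining twice per week.
old_cost = weekly_retraining_cost(200, 0, 7)   # $1,400 per week
new_cost = weekly_retraining_cost(200, 0, 2)   # $400 per week
savings = 1 - new_cost / old_cost              # ~0.71, i.e. roughly a 71% reduction
```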

SCENARIO ONE: The model is retrained too frequently

My model costs $200 to retrain. I retrain my model once per day. This model maintained a
steady average weekly accuracy of 85%. I set up a pipeline to automatically retrain based on
prediction score drift greater than 0.25 PSI and accuracy. Based on the new rule, my model
starts retraining only twice a week, and maintains that accuracy of 85%.

Comparison of weekly maintenance costs:
• Old model maintenance cost: 7 × $200 = $1,400
• New model maintenance cost: 2 × $200 = $400

That is roughly a 71% reduction in model maintenance costs. Although this is a simple,
contrived example, the magnitude of cost savings can be on this scale.

SCENARIO TWO: The model is not retrained enough

My model costs $200 to train. I retrain my model once per week. This model maintained a
steady average weekly accuracy of 65%. I set up a pipeline to automatically retrain based on
prediction score drift greater than 0.25 PSI. Based on the new rule, my model retrains twice a
week, and has achieved a better accuracy of 85%.

Comparison of weekly maintenance costs:
• Old model maintenance cost: 1 × $200 = $200 for 65% accuracy
• New model maintenance cost: 2 × $200 = $400 for 85% accuracy

For a higher price, better model performance has been achieved. This can be justified and
profitable if the AI ROI gains are higher than the retraining costs. Lack of frequent retraining
could have been leaving money on the table.



Conclusion

Transitioning from model retraining at fixed intervals to automated model retraining triggered
by model performance offers numerous benefits for organizations, from lower compute costs at
a time when cloud costs are increasing to better AI ROI from improved model performance.
Hopefully this guide provides a template for teams to take action.

Questions or thoughts? Feel free to reach out in the Arize Slack community.

To start your ML observability journey, sign up for a free account or schedule a demo.

For the latest on ML observability best practices and tips, sign up for our monthly newsletter, The Drift.
