
FLAML: A Fast Library for AutoML & Tuning

github.com/microsoft/FLAML

Chi Wang1, Qingyun Wu2, Susan Xueqing Liu3, Luis Quintanilla1

1. Microsoft
2. Penn State University
3. Stevens Institute of Technology

1
Copyright © 2022, by tutorial authors
What Will You Learn

How to use FLAML to (1) find accurate ML models with low computational
resources for common ML tasks; (2) tune hyperparameters generically

How to leverage the flexible and rich customization choices:
• Finish the last mile for deployment
• Create new applications

Code examples, demos, use cases

Research & Development opportunities



Agenda
First half (9:30 AM – 11:10 AM)
• Overview of AutoML and FLAML (9:30 AM)
• Task-oriented AutoML with FLAML (9:45 AM)
• ML.NET demo (10:30 AM)
• Tune user defined functions with FLAML (10:40 AM)

Break, Q & A

Second half (11:20 AM – 12:30 PM)
• Zero-shot AutoML (11:20 AM)
• Time series forecasting
• Natural language processing (11:45 AM)
• Online AutoML (12:00 PM)
• Fair AutoML
• Challenges and open problems
Overview
7 Cross-industry Disruptive Trends in Next Few Decades – McKinsey (2021)
• Applied AI: >75% of all digital-service touch points will see improved usability, enriched personalization, and increased conversion
• Future of programming: ~30x reduction in the working time required for software development and analytics

Companies need to master MLOps to make the most of both trends


Machine Learning Workflow

Get the data Prepare the data Train Evaluate Deploy Inferencing

Model Training Model Consumption



Lots of decisions to make to optimize model performance:
• Select learner
• Tune hyperparameters
• …




Example hyperparameter configurations to choose between:
• n_estimators = 5, n_leaves = 10, learning_rate = 0.1
• n_estimators = 1000, n_leaves = 100, learning_rate = 0.01




AutoML
Get the data Prepare the data Train Evaluate Deploy Inferencing

Model Training Model Consumption

Market size forecast in 2030:
• AutoML: $14.8 billion
• Autonomous data platform: $4.8 billion
Benefits of AutoML
• Enables and empowers novices
• Standardizes the ML workflow for better reproducibility, code maintainability, knowledge sharing
• Prevents suboptimal results due to idiosyncrasies of ML innovators
• Builds models more effectively and efficiently
• Enables rapid prototyping
• Fosters learning

[Xin et al. 2021] Xin, D., Wu, E. Y., Lee, D. J.-L., Salehi, N., and Parameswaran, A. Whither AutoML? Understanding the Role of Automation in Machine Learning Workflows. CHI 2021.


Deficiencies of AutoML
• Lacks comprehensive end-to-end support
• Causes system failures due to compute-intensive workloads (no usable model returned; revert to manual development)
• Lacks customizability (or users need to make too many hard choices)
• Lacks transparency and interpretability

FLAML's response to these deficiencies:
• Lacks comprehensive end-to-end support -> offer both AutoML and Tune API & extensible
• Causes system failures due to compute-intensive workloads -> fast, economical
• Lacks customizability -> smooth customization


FLAML (AutoML and Tune):
• Offer both AutoML and Tune API & extensible
• Fast
• Economical
• Smooth customization


Python Library flaml
• Task-oriented AutoML

Customization is easy

• Tune user-defined function


Economical Tuning & AutoML

Leverage the joint impact of multiple factors on cost & error:
• Hyperparameter
• Learner
• Sample size
• Resampling strategy: CV or holdout


Fast library for AutoML and tuning [MLSys’21]
• Online tuning: ChaCha (Champion-Challenger) [ICML’21]
• Meta learning: Zero-shot [KDD-AutoML’22]
• Offline tuning: BlendSearch (local + global search) [ICLR’21]
• Cost-frugal optimization [AAAI’21]: simple and provable local search


Users’ Background
[Figures: users by job title; users by years of programming experience]


Example Use Cases
APPLICATION | FIELD
AutoML tool for .NET developers (Visual Studio, ML.NET) | SDE
Suspicious behavior detector | Security
Credit assessment | Finance
Auto finance functions (classification/regression) | Finance
Hardware demand forecast | Supply chain
A/B testing with automated causal inference (auto-causality) | CRM
Air quality estimate | Science
Pricing | Insurance

• Impact: Accuracy, Productivity, R&D


Blog Post in Towards Data Science
Overview of AutoML and Tune in FLAML

• Task-oriented AutoML
• Tune user-defined function


Agenda and Resources to Be Used in This Tutorial

https://github.com/microsoft/FLAML

https://github.com/microsoft/FLAML/tree/tutorial/tutorial


Task-oriented AutoML
What Is Task-oriented AutoML

Inputs:
• Resource budget
• ML task
  - Training data
  - Task type

Workflow: Prepare the data -> AutoML (Train, Evaluate) -> Deploy -> Inferencing


Task-oriented AutoML: Supported Tasks
Required data format:
X: Numpy array or dataframe
y: Numpy array or series of labels in shape n*1.

Time series forecasting tasks

NLP tasks


Task-specific Built-in ML Estimators
• Classification/regression: XGBoost, RandomForest, CatBoost, …
• NLP: Transformer
• Time series forecasting: Prophet, ARIMA, …


Resources to Be Used

https://github.com/microsoft/FLAML/tree/tutorial/tutorial


Task-oriented AutoML: A Basic Use Case
• Get data
• AutoML with FLAML
• AutoML Result
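
The three steps above (get data, run AutoML, inspect the result) might look like the following minimal sketch. It assumes `flaml` and `scikit-learn` are installed; the iris dataset, the 10-second budget, the log file name, and the helper name `run_basic_automl` are illustrative choices, not taken from the tutorial.

```python
# Settings passed to AutoML.fit(); the concrete values are illustrative assumptions.
AUTOML_SETTINGS = {
    "task": "classification",      # ML task
    "time_budget": 10,             # resource budget, in seconds
    "metric": "accuracy",          # optimization metric
    "log_file_name": "iris.log",   # hypothetical log file name
}


def run_basic_automl(settings=AUTOML_SETTINGS):
    """Get data, run AutoML with FLAML, and return the fitted AutoML object."""
    # Imported lazily so the settings can be inspected without FLAML installed.
    from flaml import AutoML
    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split

    # 1. Get data
    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, _ = train_test_split(X, y, random_state=0)

    # 2. AutoML with FLAML
    automl = AutoML()
    automl.fit(X_train=X_train, y_train=y_train, **settings)

    # 3. AutoML result
    print("best learner:", automl.best_estimator)
    print("best config:", automl.best_config)
    print("best validation loss:", automl.best_loss)
    print("first predictions:", automl.predict(X_test)[:5])
    return automl
```

Calling `run_basic_automl()` runs the search within the time budget and prints the selected learner and configuration.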



Benefits of Task-oriented AutoML in FLAML
• Save manual efforts (human resource)
• Effectiveness and efficiency: use small computation resource to find models with good performance

Figure 1. Box plot of scaled score difference between FLAML and other libraries when FLAML uses equal or smaller budget (positive difference meaning FLAML is better). [MLSys’21]

• Rich customization choices in FLAML
Helpful in scenarios such as: due to business/deployment requirements, you need to use
- Special ML learner (e.g., a domain-specific one), and/or
- Custom metrics, and/or
- Various constraints (e.g., in terms of computation resource, model complexity, inference time)
- …
Task-oriented AutoML: Overview
Empowered by our cost-frugal Hyperparameter Optimization (HPO) algorithms

Inputs: resource budget, ML task (and more)
Outputs: ML model

AutoML loop:
1. Estimator selection picks an estimator $m$; hyperparameter selection picks a config, giving the estimator and config $m_c$
2. Split data into $D_{train}$ and $D_{val}$
3. $m_c.fit(D_{train})$
4. $m_c.predict(D_{val})$ gives $loss_{m_c}$, which is fed back to the selection steps


The Cost-Frugal HPO Problem
• $f(x)$: validation loss under hyperparameter configuration $x$.
• Each time you query $f(x)$, you need to pay a cost of $g(x)$.

• Two important properties:
  o $f(x)$ is a black-box function
  o Function value evaluation is expensive

• Vanilla HPO: find $x^*$ with a small number of iterations
• Cost-frugal HPO: find $x^*$ while keeping the total cost $\sum_i g(x_i)$ small


Insights On The Cost-Frugal HPO Problem
• Vanilla HPO: find $x^*$ with a small number of iterations
• Cost-frugal HPO: find $x^*$ while keeping the total cost $\sum_i g(x_i)$ small

1. If $g(x)$ is constant, low cost ⇔ small number of iterations. Bayesian optimization optimizes for this case.
2. It is common to encounter cost-related hyperparameters (examples: the number of estimators and leaves in gradient boosted trees; the number of layers and neurons in a DNN).

[Figure: cost $g(\#leaves, \#estimators)$, ranging from low cost to high cost]
Assumptions about Cost-related HPs

1. Lipschitz continuous
2. Easy to know low-cost configurations before we start the optimization

[Figure: cost surface from cheap to expensive, with the low-cost region marked]


Cost-Frugal HPO Algorithm Design
• To avoid high-cost points until necessary -> low-cost starting point + local search
• To find low-loss points -> follow loss-descent directions
• Cannot directly use gradient-based methods: no gradient available.

• Surprise: function values are enough to find the right directions for efficient convergence.
• More surprise: sign comparison between function values is enough!


A Cost-Frugal HPO Algorithm (CFO)[AAAI’21]

Repeat the following steps after each move:
1. Uniformly sample a direction from a local unit sphere;
2. Compare;
3. Move (and break) or try the opposite direction;
4. Move (and break) or stay.
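
The four steps above can be illustrated with a small pure-Python sketch. It is a simplified randomized direct search, not the CFO implementation in FLAML: the fixed initial step size, the rule of shrinking the step by 10% after a failed iteration, and all function names are assumptions made for illustration. The `start` argument plays the role of the low-cost starting point.

```python
import math
import random


def sample_unit_direction(dim, rng):
    """Sample a direction uniformly from the unit sphere in `dim` dimensions."""
    while True:
        vec = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(v * v for v in vec))
        if norm > 0:
            return [v / norm for v in vec]


def randomized_direct_search(loss, start, step=0.5, max_evals=200, seed=0):
    """Local search that only compares loss values (no gradients needed)."""
    rng = random.Random(seed)
    incumbent = list(start)
    best_loss = loss(incumbent)
    evals = 1
    while evals < max_evals:  # approximate budget: may overshoot by one evaluation
        direction = sample_unit_direction(len(incumbent), rng)  # step 1
        moved = False
        for sign in (1.0, -1.0):  # try the direction, then its opposite
            candidate = [x + sign * step * d for x, d in zip(incumbent, direction)]
            candidate_loss = loss(candidate)
            evals += 1
            if candidate_loss < best_loss:  # step 2: compare with the incumbent
                incumbent, best_loss = candidate, candidate_loss  # move (and break)
                moved = True
                break
        if not moved:
            step *= 0.9  # stay; shrinking the step is an illustrative assumption
    return incumbent, best_loss


def quadratic(x):
    """Toy loss with its minimum at (3, 3)."""
    return sum((xi - 3.0) ** 2 for xi in x)


point, value = randomized_direct_search(quadratic, start=[0.0, 0.0])
print("incumbent:", point, "loss:", value)
```

Because a move is accepted only when the loss strictly decreases, the incumbent always holds the best loss seen so far, and each new point lies within one step of the incumbent.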

Implications:
At any iteration,
• The incumbent has the best loss
• The evaluated point in the next iteration is in the neighboring area of the incumbent => the cost is not far away from the incumbent
A Cost-Frugal HPO Algorithm (CFO)[AAAI’21]
• Theoretical guarantees on:
  o Convergence rate
  o The total evaluation cost

[Figures: loss surface (high loss to low loss) and cost surface (high cost to low cost)]
Combine local search and global search
• Local search (LS): low cost; may get trapped in local optima
• Global search: able to explore the whole space; high cost


Framework
• Economical Hyperparameter Optimization With Blended Search Strategy.
Chi Wang, Qingyun Wu, Silu Huang, Amin Saied. ICLR 2021.



An Example Use Case That Needs Customization
• Application domain: Security
• Overall objective: Find a good model to detect suspicious behaviors.

User comments about customization and performance:

• “I am able to use *** classifiers as the custom learner and use search space for random forest, lr and lightgbm.”

• “It is useful for me to optimize the hyperparameters in a short time. So I appreciate that, and to be able to customize the metrics. My job is to create detectors using models, my last few models were optimized using flaml. It is a corporate level product.”

• “Adding 0.5% positive precision increase to "suspicious behavior detector" while adding 3.5% more true positives (positive recall). Adding 0.8% positive precision increase to "suspicious remote behavior detector" while adding 24% more true positives (positive recall). Contributed to ~2000 detections weekly for both detectors above.”
Task-oriented AutoML: Optimization Metric

[Diagram: AutoML loop. Estimator selection and hyperparameter selection choose an estimator and config $m_c$; data is split into $D_{train}$ and $D_{val}$; $m_c.fit(D_{train})$; $m_c.predict(D_{val})$ gives $loss_{m_c}$, which is fed back to the selection steps]

“I appreciate that I am able to customize the metrics. My job is to create detectors using models, my last few models were optimized using flaml.”


Built-in Optimization Metrics
• metric



User-defined Metric Function
• metric



Optimization metric

Metrics to log

“It does allow us to build our models fast; esp. it allows us to control overfitting. We did observe that the models we built using the packages are simpler (in terms of #trees, depth, and #features in the models) and more robust (over time), compared to other models we built in the past using similar tools like *** (another HPO service).”

The Power of a User-defined Metric Function
“Closing the gap between the loss function we optimize in ML and the product metrics we
really want to optimize.”
--Carlos Guestrin at KDD ’19, on 4 Perspectives in Human-Centered Machine Learning

• User-defined metric function in FLAML: allow creative metrics toward the ultimate objectives
  - Metrics beyond typical ML predictive performance metrics
  - For example, business objectives, such as profit or revenue

Concrete examples:
1. A heuristic objective to control overfitting: $obj = Loss_{val} \cdot (1 + \alpha) - \alpha \cdot Loss_{train}$
2. Integrating business optimization with a machine learning model
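
The overfitting-control objective in example 1 can be written as a user-defined metric function. The sketch below follows the callable signature that FLAML documents for custom metrics, as far as I recall it (returning the value to minimize plus a dict of metrics to log); the zero-one loss, the value `alpha = 0.5`, and the helper names are assumptions chosen to keep the example dependency-free.

```python
def zero_one_loss(y_true, y_pred):
    """Fraction of misclassified examples."""
    y_true, y_pred = list(y_true), list(y_pred)
    return sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)


def custom_metric(X_val, y_val, estimator, labels, X_train, y_train,
                  weight_val=None, weight_train=None, *args):
    """Return (metric to minimize, dict of metrics to log)."""
    alpha = 0.5  # illustrative weight; larger values penalize the train/val gap more
    val_loss = zero_one_loss(y_val, estimator.predict(X_val))
    train_loss = zero_one_loss(y_train, estimator.predict(X_train))
    # obj = Loss_val * (1 + alpha) - alpha * Loss_train
    objective = val_loss * (1 + alpha) - alpha * train_loss
    return objective, {"val_loss": val_loss, "train_loss": train_loss}
```

Under that assumption, the function would be passed as `automl.fit(..., metric=custom_metric)`; the first return value drives the search and the dict is what gets logged.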



Task-oriented AutoML: Estimators and Search Space
“I am able to use *** classifiers as the custom learner and use search space for random forest, lr and lightgbm.”

Search space: Candidate Estimators x Hyperparameter Search Space

[Diagram: AutoML loop. Estimator selection and hyperparameter selection choose an estimator and config $m_c$; data is split into $D_{train}$ and $D_{val}$; $m_c.fit(D_{train})$; $m_c.predict(D_{val})$ gives $loss_{m_c}$, which is fed back to the selection steps]


Estimators
estimator_list
• Use built-in estimators with default search space
  - Classification/regression task: "lgbm", "xgboost", "rf", "extra_tree", "catboost", "lrl2"/"lrl1", "kneighbor"
  - Time series forecasting task: "prophet", "arima", "sarimax"
  - NLP task: "transformer"

• Add custom estimators


Adding a Custom Estimator
1. Build a custom estimator by inheriting flaml.model.BaseEstimator or a derived class.
2. Give the custom estimator a name and add it in AutoML.
3. Tune the newly added custom estimator depending on your needs.


Search Space
Each hyperparameter is associated with a dict with the following fields:
• "domain": specifies the possible values of the hyperparameter and their distribution.
• "init_value" (optional): specifies the initial value of the hyperparameter.
• "low_cost_init_value" (optional): specifies the value of the hyperparameter that is associated with low computation cost.
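
A custom estimator that declares its own search space with these fields could be sketched as follows. The sketch assumes `flaml` (with its XGBoost dependency) is installed and that `flaml.model` exposes `XGBoostSklearnEstimator`; newer releases may have moved the module. The class name `XGBoost2D`, the learner name `"xgb2d"`, the bounds, and the helpers `n_rows`, `make_xgboost_2d`, and `tune_custom_estimator` are illustrative.

```python
def n_rows(data_size):
    """Accept data_size as an int or as a shape tuple (this has varied by version)."""
    return int(data_size[0]) if isinstance(data_size, (tuple, list)) else int(data_size)


def make_xgboost_2d():
    """Step 1: build a custom estimator by deriving from a built-in one."""
    from flaml import tune
    from flaml.model import XGBoostSklearnEstimator  # module path assumed from the slide

    class XGBoost2D(XGBoostSklearnEstimator):
        @classmethod
        def search_space(cls, data_size, task, **params):
            # Assumes the dataset has more than 4 rows, so that upper > lower.
            upper = min(32768, n_rows(data_size))
            return {
                "n_estimators": {
                    "domain": tune.lograndint(lower=4, upper=upper),
                    "init_value": 4,
                    "low_cost_init_value": 4,  # cost-related hyperparameter
                },
                "max_leaves": {
                    "domain": tune.lograndint(lower=4, upper=upper),
                    "init_value": 4,
                    "low_cost_init_value": 4,  # cost-related hyperparameter
                },
            }

    return XGBoost2D


def tune_custom_estimator(X_train, y_train, time_budget=10):
    """Steps 2 and 3: register the estimator under a name, then tune only it."""
    from flaml import AutoML

    automl = AutoML()
    automl.add_learner(learner_name="xgb2d", learner_class=make_xgboost_2d())
    automl.fit(X_train=X_train, y_train=y_train, task="classification",
               estimator_list=["xgb2d"], time_budget=time_budget)
    return automl
```

Only the two hyperparameters listed in `search_space` are searched in this sketch; everything else keeps the estimator's defaults.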

Cost-related Hyperparameter in the Search Space


Resources to Be Used in This Tutorial

• https://github.com/microsoft/FLAML/tree/tutorial/tutorial


Customize the Search Spaces (of Existing Estimators)
Option 1. Create a new estimator with a revised search space.
Option 2. A shortcut to override the search space of existing estimators via custom_hp.

What custom_hp can express:
• Using a different search range for "n_estimators"
• Disabling search by setting "domain" to None
• Setting a constant value
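
The three kinds of override listed above might be combined in one `custom_hp` dict, as in this sketch. The estimator names come from the built-in list; the specific hyperparameters and bounds are illustrative assumptions. The `tune` argument is expected to be the `flaml.tune` module and is passed in so the dict structure can be inspected without FLAML installed.

```python
def build_custom_hp(tune, lower=10, upper=1000):
    """Build a custom_hp dict; `tune` is expected to be the flaml.tune module."""
    return {
        # Use a different search range for "n_estimators"
        "xgboost": {
            "n_estimators": {
                "domain": tune.lograndint(lower=lower, upper=upper),
                "low_cost_init_value": lower,
            },
        },
        # Disable search for a hyperparameter by setting "domain" to None
        "rf": {"max_leaves": {"domain": None}},
        # Set a constant value
        "lgbm": {"subsample_freq": {"domain": 1}},
    }
```

With FLAML installed, the result would be passed as `automl.fit(..., custom_hp=build_custom_hp(flaml.tune))`.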



Task-oriented AutoML: ML Procedure and Ensemble
[Diagram: AutoML loop. Estimator selection and hyperparameter selection choose an estimator and config $m_c$; data is split into $D_{train}$ and $D_{val}$; $m_c.fit(D_{train})$; $m_c.predict(D_{val})$ gives $loss_{m_c}$, which is fed back to the selection steps]

Data splitting and evaluation:
• eval_method: a string of resampling strategy, one of ['auto', 'cv', 'holdout'].
• split_ratio, n_splits
• split_type: ['auto', 'stratified', 'uniform', 'time', 'group']
• X_val, y_val

Fitting:
• fit_kwargs: additional keyword arguments to pass to the fit() function of the candidate learners, such as sample_weight.
• fit_kwargs_by_estimator: the user-specified keyword arguments, grouped by estimator name.


Task-oriented AutoML: Advanced Functionalities
• Required inputs
  • X_train, y_train, task, time_budget/max_iter
• Logging
  • log_file_name: a string of the log file name
  • log_type: "better" or "all"
• Constraints
  • train_time_limit
  • pred_time_limit
  • metric_constraints
• Warm start
  • starting_points
• Parallel tuning [Xin et al. 2021]
  • n_concurrent_trials
Task-oriented AutoML: Logging

{"record_id": 12, "iter_per_learner": 2, "logged_metric": {"pred_time": 3.085368373117264e-07},
"trial_time": 0.056574344635009766, "wall_clock_time": 1.4277665615081787, "validation_loss":
0.40724474179893333, "config": {"n_estimators": 4, "max_leaves": 4, "learning_rate": 0.03859136192132082,
"subsample": 1.0, "colsample_bylevel": 0.8148474110627004, "colsample_bytree": 0.9777234800442423,
"reg_alpha": 0.0009765625, "reg_lambda": 5.525802807180917, "min_child_weight": 0.01199969653421202,
"FLAML_sample_size": 10000}, "learner": "xgboost", "sample_size": 10000}
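
A log record like the one above is a JSON object, so it can be inspected with the standard library. The sketch below parses an abbreviated copy of that record (only some of the config fields are kept) and pulls out a few fields; the helper name `summarize_record` is hypothetical.

```python
import json

# Abbreviated copy of the record shown above.
LOG_RECORD = (
    '{"record_id": 12, "iter_per_learner": 2, '
    '"logged_metric": {"pred_time": 3.085368373117264e-07}, '
    '"trial_time": 0.056574344635009766, "wall_clock_time": 1.4277665615081787, '
    '"validation_loss": 0.40724474179893333, '
    '"config": {"n_estimators": 4, "max_leaves": 4, "FLAML_sample_size": 10000}, '
    '"learner": "xgboost", "sample_size": 10000}'
)


def summarize_record(line):
    """Extract the learner, validation loss, timing, and sample size of one trial."""
    record = json.loads(line)
    return {
        "learner": record["learner"],
        "validation_loss": record["validation_loss"],
        "wall_clock_time": record["wall_clock_time"],
        "sample_size": record["sample_size"],
        "n_estimators": record["config"]["n_estimators"],
    }


summary = summarize_record(LOG_RECORD)
print(summary)
```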



Task-oriented AutoML: Logging with MLflow


Task-oriented AutoML: More Constraints
• Constraints
  • train_time_limit: training time constraint in seconds
  • pred_time_limit: prediction time constraint in seconds
  • metric_constraints: a list of constraints on certain metrics
Constraints of 4 Different Types
1. Constraints on the AutoML process: time_budget, max_iter
2. Constraints on the constructor arguments of the estimators.
3. Constraints on the models tried in AutoML: train_time_limit and/or pred_time_limit
4. Constraints on the metrics of the ML model tried in AutoML: metric_constraints
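
Constraint types 1, 3, and 4 map onto keyword arguments of `AutoML.fit()`. The sketch below collects them in one settings dict; all numeric values are illustrative. It assumes, from my recollection of the FLAML documentation, that each metric constraint is a `(metric_name, "<=" or ">=", threshold)` tuple and that the constrained metric names are reported by a user-defined metric function. Type 2 (constructor arguments) would instead go through a custom estimator or `custom_hp`.

```python
CONSTRAINED_SETTINGS = {
    # 1. Constraints on the AutoML process
    "time_budget": 60,        # seconds for the whole search
    "max_iter": 100,          # maximum number of trials
    # 3. Constraints on the models tried in AutoML
    "train_time_limit": 1,    # training time constraint, in seconds
    "pred_time_limit": 1e-3,  # prediction time constraint, in seconds
    # 4. Constraints on the metrics of the models tried in AutoML
    "metric_constraints": [
        ("train_loss", "<=", 0.1),
        ("val_loss", "<=", 0.1),
    ],
}


def check_metric_constraints(constraints):
    """Validate the assumed (name, sign, threshold) shape of each constraint."""
    for name, sign, threshold in constraints:
        if not isinstance(name, str) or sign not in ("<=", ">="):
            return False
        if not isinstance(threshold, (int, float)):
            return False
    return True
```

The dict would then be unpacked into the call, for example `automl.fit(X_train=X_train, y_train=y_train, task="classification", metric=custom_metric, **CONSTRAINED_SETTINGS)`, where `custom_metric` is a user-defined metric function that logs `train_loss` and `val_loss`.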



Task-oriented AutoML: Warm Start
• Warm start
  • starting_points: starting hyperparameter config for the estimators
Warm Start
• starting_points: a dictionary of configs to start with, or a str to specify the starting hyperparameter config for the estimators | default="data".
If str (these options will be covered later in zero-shot AutoML):
- if "data", use data-dependent defaults;
- if "data:path", use data-dependent defaults which are stored at path;
- if "static", use data-independent defaults.
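
For the dictionary form of `starting_points`, one common pattern is to feed the best configurations found by an earlier run into a later one. The sketch below takes the AutoML class as an argument so the flow can be read (and exercised with a stand-in) without FLAML installed; it assumes the fitted object exposes `best_config_per_estimator`, as FLAML's `AutoML` does to my knowledge. The helper name is hypothetical.

```python
def fit_with_warm_start(automl_class, X_train, y_train, **fit_kwargs):
    """Run AutoML twice; the second run starts from the first run's best configs."""
    first = automl_class()
    first.fit(X_train=X_train, y_train=y_train, **fit_kwargs)

    second = automl_class()
    second.fit(
        X_train=X_train,
        y_train=y_train,
        starting_points=first.best_config_per_estimator,  # warm start
        **fit_kwargs,
    )
    return first, second
```

With FLAML installed, a call might look like `fit_with_warm_start(flaml.AutoML, X_train, y_train, task="classification", time_budget=10)`.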



Task-oriented AutoML: Parallel Tuning
• Parallel tuning
  • n_concurrent_trials: the number of concurrent trials
Parallel Tuning
[Diagram: FLAML (AutoML and Tune) can run trials sequentially or in parallel; parallel backends shown: Ray, NNI]


Easy Parallel Tuning Is a Desirable Feature

“So I work in Research and that entire application I wrote myself over the last 9 months. My job (and the teams) currently is to research a next state application stack for our existing products. In particular the models out of this product *** have been implemented in FLAML (this is our own GLM and GBM) which is exactly why we liked FLAML because it allowed us to easily extend the estimators, something *** (an alternative HPO service) does not allow. Anyway the reason we love FLAML and Ray is to get large scale parallel tuning, something the product team are struggling with because it’s desktop and C# based, so to have FLAML out of the box distribute across an auto scaled cluster removes at least 18 months work for us.”
-- An S&P 500 company
Parallel vs Sequential Tuning

Heads-up: Parallel tuning is not necessarily more desirable than sequential tuning.

Things to consider:
• Different overhead and trial time. E.g., when parallel tuning is used (with the ray backend), the computation overhead is larger than in sequential tuning.
• Availability of parallel resources
• Different randomness

Find more in this doc: https://microsoft.github.io/FLAML/docs/Use-Cases/Task-Oriented-AutoML#parallel-tuning
Parallel vs Sequential Tuning
A rough estimation of the wall-clock time needed to finish N trials depends on:
• Computation overhead
• Scale of parallelism
• Trial time to evaluate a particular hyperparameter configuration

Example: with k (scale of parallelism) = 8, SingleTrialTime ≈ 0.3, and OverHead (parallel tuning) ≈ 2.6, sequential tuning is faster.
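
The example numbers can be checked with a small calculation. The formula used here, N / k × (SingleTrialTime + OverHead), is how I recall the rough estimate in the linked FLAML documentation, so treat it as an assumption; the slide gives only the parallel overhead, so the sequential overhead is assumed to be zero, and N = 100 is arbitrary.

```python
def estimated_wall_clock_time(n_trials, single_trial_time, overhead, parallelism=1):
    """Rough estimate: N / k * (SingleTrialTime + OverHead)."""
    return n_trials / parallelism * (single_trial_time + overhead)


N_TRIALS = 100  # arbitrary; the comparison does not depend on N

# Sequential tuning: k = 1, overhead assumed negligible (not stated on the slide).
sequential = estimated_wall_clock_time(N_TRIALS, 0.3, overhead=0.0, parallelism=1)

# Parallel tuning with the slide's numbers: k = 8, overhead of about 2.6.
parallel = estimated_wall_clock_time(N_TRIALS, 0.3, overhead=2.6, parallelism=8)

print(f"sequential: {sequential:.2f}, parallel: {parallel:.2f}")
```

Under these assumptions each parallel trial costs about 2.9 time units spread over 8 workers, roughly 0.36 per trial, which is more than the 0.3 of a sequential trial. Parallel tuning pays off only when the trial time is large relative to the overhead.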
use_ray in Sequential Tuning
• use_ray: boolean or dict.
• If boolean, default=False | Whether to use ray to run the training in separate processes. This can be used to prevent OOM for large datasets, but will incur more overhead in time.
• If dict: the dict contains the keyword arguments to be passed to ray.tune.run

Suggested scenarios to use the ray backend:
1. Parallel tuning
2. Sequential tuning with potential out-of-memory (OOM) failure
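
The two suggested scenarios can be turned into a small helper that picks the corresponding `AutoML.fit()` keyword arguments. This is only a sketch of the decision rule stated above; the helper name and the `risk_of_oom` flag are hypothetical, and the returned dict uses only the `use_ray` and `n_concurrent_trials` arguments named in this tutorial.

```python
def backend_settings(n_concurrent_trials=1, risk_of_oom=False):
    """Choose use_ray / n_concurrent_trials following the two suggested scenarios."""
    if n_concurrent_trials > 1:
        # Scenario 1: parallel tuning runs on the ray backend.
        return {"n_concurrent_trials": n_concurrent_trials, "use_ray": True}
    # Scenario 2: sequential tuning, using ray only to guard against OOM failures.
    return {"n_concurrent_trials": 1, "use_ray": bool(risk_of_oom)}
```

For example, `automl.fit(X_train=X_train, y_train=y_train, task="classification", time_budget=60, **backend_settings(n_concurrent_trials=4))` would request four concurrent trials, assuming ray is installed.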



AutoML Use Cases: 1. Credit Scoring and Fraud Detection in Financial Industry
• Overall objective and constraints: finding an “optimal” (gradient boosting or deep learning) model that
  - performs the best in the out-of-time (OOT) period;
  - controls model complexity (e.g., # of variables in the model, # of trees, tree depth);
  - maintains model explainability;
  - preferably meets the following criteria: (1) does not overfit the training data; (2) performs consistently between training/holdout/OOT; (3) is simple and intuitive, and can pass regulatory exams.

Advantages of FLAML:
• Fast HPO
• Controlled by our HPO algorithm
• Allow custom metric

“It does allow us to build our models fast; esp. it allows us to control overfitting. We did observe that the models we built using the packages are simpler (in terms of #trees, depth, and #features in the models) and more robust (over time), compared to other models we built in the past using similar tools like *** (another HPO service).”
AutoML Use Cases: 2. Investment Management
• Overall objectives and constraints: Find the best model (customized
estimator) based on business requirements for deployment.
• Advantages of FLAML:
• Saved dev time
• High compatibility

“Dev time saved 30 - 40 percent on average; for gigantic datasets the saving is even more; regarding the performance, AUC wise the lift is about 0.1 - 0.2 point.

For us, a big part of R&D is testing new algorithms released in recent publications with internal data; thanks to the high compatibility of FLAML, we can conveniently incorporate these new algorithms into the existing pipeline and make apples-to-apples comparisons.”
-- A large private equity firm
AutoML Use Cases: 3. Customer Retention and Growth Analysis
• Overall objectives and constraints: find a good multiclass workload classification model (mainly xgboost) based on the usage and behavior pattern.
• Advantages of FLAML:
  • Fast (30 hours of grid search -> minutes)
  • Can find accurate models

“the model runs faster (in minutes) and we are able to try out various sampling techniques and additional derived attributes. Thus, we are using FLAML model in the production classification models. It really helped to improve our team productivity and reduced development iterations.”
AutoML Use Cases: 4. Security
• Overall objective and constraints: find a good model to detect suspicious behaviors.
• Advantages of FLAML:
  • Support customized learner, custom metrics
  • Fast
  • Can find accurate models

“It is useful for me to optimize the hyperparameters in a short time. So I appreciate that, and to be able to customize the metrics. My job is to create detectors using models, my last few models were optimized using flaml. It is a corporate level product.”

“Adding 0.5% positive precision increase to "suspicious behavior detector" while adding 3.5% more true positives (positive recall). Adding 0.8% positive precision increase to "suspicious remote behavior detector" while adding 24% more true positives (positive recall). Contributed to ~2000 detections weekly for both detectors above.”

Task-oriented AutoML: Q & A

Inputs: resource budget, ML task (and more)
Outputs: ML model

[Diagram: AutoML loop. Estimator selection and hyperparameter selection choose an estimator and config $m_c$; data is split into $D_{train}$ and $D_{val}$; $m_c.fit(D_{train})$; $m_c.predict(D_{val})$ gives $loss_{m_c}$, which is fed back to the selection steps]


ML.NET Platform
• Model Builder (Visual Studio UI): wizard-like experience
• ML.NET CLI (cross-platform global tool): command-line tool, CI/CD
• AutoML .NET API (FLAML)
• ML.NET API (Microsoft.ML)


Demo: FLAML AutoML in .NET

Tune User Defined Function
AutoML vs Tune

AutoML
• Inputs: 1. Resource budget; 2. ML task (and more)
• Loop: estimator selection and hyperparameter selection choose an estimator and config $m_c$; data is split into $D_{train}$ and $D_{val}$; $m_c.fit(D_{train})$; $m_c.predict(D_{val})$ gives $loss_{m_c}$, which is fed back to the selection steps
• Outputs: ML model

Tune
• Inputs: 1. Resource budget; 2. Search space; 3. User defined function
• Loop: hyperparameter selection proposes a config $c$; the user defined function evaluates it and returns the result $r_c$
• Outputs: best config
Tune User Defined Function
Tune
Inputs: (1) resource budget, (2) search space, (3) user defined function. Outputs: best config.
[Diagram: hyperparameter selection proposes a config $c$; the user defined function evaluates it and returns a result $r$. The user defined function can cover model training, inference, and downstream applications.]


Tune User Defined Function
Examples:
It can be used to tune generic hyperparameters for:
• MLOps workflows, pipelines (e.g., AzureML pipeline, MLflow pipeline)
• Mathematical/statistical models (e.g., causal models)
• Algorithms (e.g., RL policies)
• Computing experiments (e.g., simulations in environmental science)
• Software configurations (e.g., database configurations)
Resources to Be Used

• https://github.com/microsoft/FLAML/tree/tutorial/tutorial


Tune User Defined Function: A Basic Tuning Procedure
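Inputs: (1) resource budget, (2) search space, (3) user defined function. Outputs: best config.
[Diagram: hyperparameter selection proposes a config $c$; the user defined function evaluates it and returns a result $r$.]
The user defined function is specified by: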

• Evaluation function
• Optimization metric
• Optimization mode
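
The three ingredients above can be sketched with `flaml.tune.run`. This is a minimal illustration, not the code shown in the tutorial: the toy objective, the parameter names `x` and `y`, and the budget values are assumptions. FLAML is imported inside the function so that the evaluation function can be exercised on its own.

```python
def evaluate_config(config):
    """User defined function: return the metric(s) for one config."""
    x, y = config["x"], config["y"]
    # Toy objective (an assumption for illustration): minimum at (3, -1).
    return {"score": (x - 3) ** 2 + (y + 1) ** 2}


def run_basic_tuning(num_samples=50, time_budget_s=10):
    # Requires `pip install flaml`; imported lazily on purpose.
    from flaml import tune

    analysis = tune.run(
        evaluate_config,              # evaluation function
        config={                      # search space
            "x": tune.uniform(-10, 10),
            "y": tune.uniform(-10, 10),
        },
        metric="score",               # optimization metric
        mode="min",                   # optimization mode
        num_samples=num_samples,      # resource budget: number of trials
        time_budget_s=time_budget_s,  # resource budget: wall-clock seconds
    )
    return analysis.best_config
```

Calling `run_basic_tuning()` returns the best config found within the budget; the evaluation function must report the metric named in `metric`.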





Benefits of Tune
• Saves manual effort (human resources)
• Effectiveness and efficiency: uses a small amount of computational resources
to find hyperparameter configurations with good performance
• Flexibility: supports even more diverse use cases than AutoML


More about search space
Inputs: (1) resource budget, (2) search space, (3) user defined function. Outputs: best config.


Cost-related Hyperparameters in Search Space
• low_cost_partial_config (optional): A dictionary from a subset of controlled
dimensions to the initial low-cost values.
• cat_hp_cost (optional): A dictionary from a subset of categorical dimensions to
the relative cost of each choice.

• Search space
• Controlled dimensions where we know the low-cost values
• The relative cost across different categorical choices
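
A sketch of how these two optional arguments might look next to a search space. The hyperparameter names, ranges, and cost ratios are illustrative assumptions; the list in `cat_hp_cost` is assumed to follow the order of the choices given to `tune.choice`.

```python
def build_search_space():
    # Requires `pip install flaml`; imported lazily on purpose.
    from flaml import tune

    return {
        "n_estimators": tune.lograndint(lower=4, upper=1024),
        "max_leaves": tune.lograndint(lower=4, upper=1024),
        "tree_method": tune.choice(["auto", "approx", "hist"]),
    }


# Controlled dimensions whose cheap values are known: the search starts here.
low_cost_partial_config = {"n_estimators": 4, "max_leaves": 4}

# Relative cost of each categorical choice (illustrative numbers).
cat_hp_cost = {"tree_method": [1, 1, 2]}
```

Both dictionaries would be passed to `tune.run` alongside `config=build_search_space()`, via the `low_cost_partial_config` and `cat_hp_cost` arguments.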

Hierarchical Search Space
A hierarchical search space for xgboost
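
One way to express such a space is to nest sub-spaces inside a categorical choice. The sketch below is an assumption-laden illustration, not the slide's code: the key name `booster_params`, the ranges, and the helper `flatten_config` are hypothetical.

```python
def build_hierarchical_space():
    # Requires `pip install flaml`; imported lazily on purpose.
    from flaml import tune

    gbtree_space = {
        "booster": "gbtree",
        "max_depth": tune.randint(lower=2, upper=8),
        "learning_rate": tune.loguniform(lower=0.001, upper=1),
    }
    gblinear_space = {
        "booster": "gblinear",
        "lambda": tune.uniform(0, 1),
        "alpha": tune.loguniform(0.0001, 1),
    }
    return {
        "n_estimators": tune.lograndint(lower=4, upper=64),
        # Each booster brings its own hyperparameters.
        "booster_params": tune.choice([gbtree_space, gblinear_space]),
    }


def flatten_config(config):
    """Merge the sampled nested booster choice into flat XGBoost parameters."""
    flat = {k: v for k, v in config.items() if k != "booster_params"}
    flat.update(config["booster_params"])
    return flat
```

Inside the evaluation function, `flatten_config(config)` would turn a sampled config into keyword arguments for the estimator.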

HPO algorithm

• CFO: local search [AAAI'21]
• BlendSearch: local search + global search [ICLR'21]

CFO is suggested when
• Simple search space
• Known good (low-cost) starting points
• Non-parallel tuning or low parallelization


Tune User Defined Function: More Constraints?

Inputs:
1. Resource budget
2. Search space
3. User defined function
   - More constraints?
Outputs: Best config


More Constraints on the Tuning
• config_constraints: constraints on the configurations (a list of 3-tuples)

Each tuple: (function f: config -> float, inequality, constraint threshold)


More Constraints on the Tuning
• metric_constraints: constraints on the metrics (a list of 3-tuples)

Each constrained metric needs to be reported in the evaluation function.
config_constraints vs metric_constraints
Does the calculation of the constraint rely on the
evaluation procedure in the metric function?
• No: use config_constraints
• Yes: use metric_constraints
Note: config_constraints can be checked
before evaluation. So if a config does not
satisfy config_constraints, it will not be evaluated
(which saves computation).
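
The difference between the two constraint types can be sketched in plain Python. The constraint function `area`, the metric name `training_cost`, and the thresholds are illustrative assumptions; the two checker functions only mimic what a tuner could do and are not FLAML's implementation.

```python
import operator

OPS = {"<=": operator.le, ">=": operator.ge}


def area(config):
    """Depends on the config only, so it can be checked before evaluation."""
    return config["width"] * config["height"]


# (function of config, inequality, threshold)
config_constraints = [(area, "<=", 1000)]
# (metric name reported by the evaluation function, inequality, threshold)
metric_constraints = [("training_cost", "<=", 1.0)]


def satisfies_config_constraints(config, constraints):
    return all(OPS[op](fn(config), thr) for fn, op, thr in constraints)


def satisfies_metric_constraints(result, constraints):
    return all(OPS[op](result[name], thr) for name, op, thr in constraints)
```

Both lists have the shape expected by the `config_constraints` and `metric_constraints` arguments of `tune.run`; a config failing `satisfies_config_constraints` never needs to be evaluated.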



Tune User Defined Function: Parallel Tuning

Inputs:
1. Resource budget
2. Search space
3. User defined function
   - More constraints?
   - Enable parallel tuning?
Outputs: Best config


Parallel Tuning
• resources_per_trial: a dict of hardware resources to allocate per trial

Required
Recommended



Tune User Defined Function: Warm Start

Inputs:
1. Resource budget
2. Search space
3. User defined function
   - More constraints?
   - Enable parallel tuning?
   - Enable warm start?
Outputs: Best config


Warm Start
• points_to_evaluate: a list of initial configs to try first
• evaluated_rewards: a list of rewards for the corresponding configs
provided in points_to_evaluate (must be the same length as or shorter
than points_to_evaluate)

The need to leverage results from previous runs.

Results from previous runs + some new configs to try first
Warm Start
• Evaluated configs and results
• Additional points to evaluate
Warm Start

Points to evaluate
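
A small sketch of how the two warm-start lists could be assembled from earlier results. The configs and rewards are made-up values, and it is assumed here that the rewards line up with the first entries of `points_to_evaluate`.

```python
# Results kept from a previous run: (config, reward) pairs (made-up values).
previous_results = [
    ({"x": 1.0, "y": 2.0}, 13.0),
    ({"x": 2.5, "y": -0.5}, 0.5),
]
# New configs we would like the tuner to try first.
new_candidates = [{"x": 3.0, "y": 0.0}]

# Already-evaluated configs come first, followed by the additional points.
points_to_evaluate = [cfg for cfg, _ in previous_results] + new_candidates
evaluated_rewards = [reward for _, reward in previous_results]

# The rewards list may be shorter than, but never longer than, the points list.
assert len(evaluated_rewards) <= len(points_to_evaluate)
```

The two lists would then be passed as `tune.run(..., points_to_evaluate=points_to_evaluate, evaluated_rewards=evaluated_rewards)`, with rewards expressed in the same metric that is being optimized.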



Tune User Defined Function: Trial Scheduling

Inputs:
1. Resource budget
2. User defined function
3. Search space
   - More constraints?
   - Enable parallel tuning?
   - Enable warm start?
   - Enable trial scheduling
Outputs: Best config


What Is a Scheduler Doing?
A scheduler can help manage the trials' execution. It can be
used to perform multi-fidelity evaluation and/or early
stopping.


Trial Scheduling
• scheduler: A scheduler for executing the trials.
  - 'flaml': Authentic scheduler in FLAML
  - 'asha': The Asynchronous Successive Halving Algorithm
  - An instance of the TrialScheduler class from ray.tune
• resource_attr: A string to specify the resource dimension used by the
scheduler.
• min_resource: A float of the minimal resource to use for the resource_attr.
• max_resource: A float of the maximal resource to use for the resource_attr.
• reduction_factor: A float of the reduction factor used for incremental
pruning.

Trial Scheduling
• scheduler='flaml' (an authentic scheduler in FLAML)
  - Starts the search with the minimum resource.
  - At any time point (before the max resource is reached), switches between HPO with the current resource and increasing the resource for evaluation, depending on which leads to faster improvement.
• resource_attr = the attribute name for r (e.g., sample size)
• min_resource = $r$, max_resource = $R$, reduction_factor = 2

[Figure: resource schedule along the search trajectory of configs. Configs are first evaluated with resource $r$; the resource can then be raised to $2r$, $4r$, and so on, up to $R$.]
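
The ladder of resource levels implied by `min_resource`, `max_resource`, and `reduction_factor` can be computed as below. This is only a sketch of the levels themselves; when to move up a level is decided by the scheduler and is not modeled here.

```python
def resource_schedule(min_resource, max_resource, reduction_factor=2):
    """Return the resource levels r, f*r, f^2*r, ..., capped at max_resource."""
    schedule = []
    resource = min_resource
    while resource < max_resource:
        schedule.append(resource)
        resource *= reduction_factor
    schedule.append(max_resource)  # the final level is always the full resource
    return schedule
```

For example, with a minimum sample size of 1000 and a full dataset of 10000 rows, the levels are 1000, 2000, 4000, 8000, and 10000.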
Effectiveness of the "flaml" Scheduler
[Figure: three panels comparing tuning with and without the flaml scheduler.]
Effectiveness of the authentic scheduler (scheduler = 'flaml') in FLAML [MLSys'21]


Trial Scheduling With scheduler='flaml'

• Specifying "sample_size" as the resource dimension
• Using the "flaml" scheduler
• Setting the lower limit, upper limit, and changing factor of the resource


Trial Scheduling With scheduler='flaml'

• The suggested config contains the additional resource_attr dimension.
• The evaluation function should read the resource dimension from the config and use it accordingly (e.g., as the sample size).
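
A sketch of an evaluation function that honors the resource dimension, together with a `tune.run` call using the "flaml" scheduler. The toy loss, the hyperparameter `c`, and the budget values are assumptions made for illustration.

```python
from functools import partial


def evaluate_with_resource(config, data):
    """Evaluate a config on the first `sample_size` data points only."""
    sample_size = int(config["sample_size"])  # added to the config by the scheduler
    subset = data[:sample_size]
    # Toy loss (an assumption): mean squared error of predicting the constant c.
    loss = sum((value - config["c"]) ** 2 for value in subset) / len(subset)
    return {"loss": loss}


def run_with_flaml_scheduler(data, time_budget_s=10):
    # Requires `pip install flaml`; imported lazily on purpose.
    from flaml import tune

    return tune.run(
        partial(evaluate_with_resource, data=data),
        config={"c": tune.uniform(0, 10)},
        metric="loss",
        mode="min",
        scheduler="flaml",            # using the "flaml" scheduler
        resource_attr="sample_size",  # the resource dimension
        min_resource=10,              # lower limit
        max_resource=len(data),       # upper limit
        reduction_factor=2,           # changing factor
        time_budget_s=time_budget_s,
        num_samples=-1,               # no cap on trials; the time budget decides
    )
```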

A Use Case of Tune: Validating Strategies
Overall objectives and constraints:
Tuning hyperparameters to find the best strategies (the evaluation of which
requires expensive simulation) that lead to the highest profitable rate.
Advantages of FLAML.Tune:
• Fits the use case well
• "scheduler" allows for low cost evaluations
"I started playing around with the tuning module - which is really cool by the way and fits really neatly
into one of my use cases, so cheers for that! I'm running simulations on market strategies in a given
bootstrapped sample.

I love the idea of having a resource_attr that increases to validate low cost results on a full dataset. For my
case, it's the number of bootstrap iterations that my data should be sampled with each strategy I'm testing.
So the low cost is 30 samples, which runs in a few seconds, and the validation/max scenario would be 1000
samples, which takes several minutes. The 30 folds help me exclude obviously bad strategies without
having to waste resources on running them 1000 times."
Application: Supercharge A/B testing w/ automated causal inference

[Diagram: AutoML (estimator selection and hyperparameter selection, with model training and model validation on the data split $D_{train}$ / $D_{val}$) combined with Tune (hyperparameter selection over a user defined function).]

• AutoML: automating generic regression/classification tasks
• Tune: tuning causal models

"First of all, I use FLAML models FLAML AutoML as the generic regression. So whenever the
models need a regression component, I throw flaml into there. And also if they need the classifier, I
use FLAML classifier. The only exception is if my sample is actually random. I use the dummy
classifier as a propensity function. But then also on the second level I use FLAML for model
hyperparameter and estimator search."
-Head of AI at WISE
Interested in Knowing More ?
• GitHub: https://github.com/microsoft/FLAML
• Documentation: https://microsoft.github.io/FLAML/


Frequently Asked Questions (FAQ)
https://microsoft.github.io/FLAML/docs/FAQ
• About low_cost_partial_config in Tune
• How does FLAML handle imbalanced data (unequal distribution of
target classes in classification task)?
• How to interpret model performance? Is it possible for me to
visualize feature importance, SHAP values, optimization history?


Zero-shot AutoML
What Is Zero-shot AutoML
• Zero-shot AutoML = "no-tuning" AutoML
• Recommend data-dependent default configurations at runtime
• The configuration depends on the dataset (X_train and y_train)
• Requires mining good hyperparameter configurations across
different datasets offline

Example (dataset name: houses): static default, r2 = 0.8296; data-dependent default, r2 = 0.8537


Benefit of Zero-shot AutoML

Training one model only

The decision of hyperparameter configuration is instant

User code remains the same

It requires less input (tuning budget) from the user

The offline preparation can be customized for a domain and leverage the historical tuning data



Concerns of Zero-shot AutoML

• Performance: robust meta-learning; combine with HPO
• Transparency: transparent portfolio-based approach
• New tasks/estimators/metrics: customizable meta-learning


Data-dependent Defaults vs. Data-agnostic Defaults
Mature libraries already have carefully crafted static defaults
• LightGBM
• XGBoost
• Random Forest
Can data-dependent defaults outperform static defaults consistently?
Learned Zero-shot vs. Static Default

The margin is large in many tasks

• bng_libras_move (+24%)
• nyc_taxi (+15%)
• Yolanda (+17%)
• >1% on >67% of the tasks

The static default performs catastrophically in some tasks

• brazil_houses (-0.39 vs 0.76)


• poker (0.28 vs. 0.94)

Learned zero-shot is slightly worse in 1 task

• comet (0.9857 vs. 0.9907, -0.5%)



Use Zero-shot AutoML: Import a "Flamlized" Learner
• LGBMClassifier, LGBMRegressor (inheriting LGBMClassifier,
LGBMRegressor from lightgbm)
• XGBClassifier, XGBRegressor (inheriting XGBClassifier, XGBRegressor
from xgboost)
• RandomForestClassifier, RandomForestRegressor (inheriting from
scikit-learn)
• ExtraTreesClassifier, ExtraTreesRegressor (inheriting from scikit-learn)


Magic Behind the Scene
• flaml.default.LGBMRegressor inherits lightgbm.LGBMRegressor
• Decide the hyperparameter configurations based on
  - Training data (size and other metafeatures)
  - Offline AutoML results
• The decision is made instantly
[Figure: Task, Configuration]


Check the Configuration Before Training
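
A sketch of how the suggested configuration might be inspected without training, assuming the `suggest_hyperparams` method of the flamlized learners and its four-element return value as described in the FLAML documentation. The wrapper function name is hypothetical.

```python
def suggest_config(X_train, y_train):
    """Return the data-dependent default hyperparameters without fitting."""
    # Requires `pip install flaml lightgbm`; imported lazily on purpose.
    from flaml.default import LGBMRegressor

    estimator = LGBMRegressor()
    # Assumed return value: (hyperparams, estimator_name, X_transformed, y_transformed)
    hyperparams, estimator_name, X_transformed, y_transformed = (
        estimator.suggest_hyperparams(X_train, y_train)
    )
    return hyperparams
```

The returned dictionary can be printed or logged before calling `fit`, which keeps the zero-shot decision transparent.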



Combine Zero-shot AutoML and HPO
• Further improve the accuracy by tuning
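
A sketch of starting HPO from the data-dependent default, assuming the `starting_points="data"` option of `AutoML.fit`. The task, estimator list, and time budget are illustrative values, and the wrapper function name is hypothetical.

```python
zero_shot_hpo_settings = {
    "task": "regression",
    "estimator_list": ["lgbm"],
    "starting_points": "data",  # begin the search from the data-dependent default
    "time_budget": 600,         # seconds of tuning after the zero-shot start
}


def fit_with_zero_shot_start(X_train, y_train):
    # Requires `pip install flaml`; imported lazily on purpose.
    from flaml import AutoML

    automl = AutoML()
    automl.fit(X_train, y_train, **zero_shot_hpo_settings)
    return automl
```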



Use Your Own Meta-learned Defaults
Who may consider customized defaults?
• AutoML providers for a particular domain
• Data scientists or engineers who need to repeatedly train models for similar
tasks with varying training data
• Researchers or developers who would like to leverage meta learning for new
tasks/estimators/metrics


Meta Learning in FLAML
Metafeatures

Configurations

Evaluation results



Learn Data-dependent Defaults
Metafeatures
Configurations
Evaluation results

python -m flaml.default.portfolio --output my --input my --metafeatures my/all/metafeatures.csv --task binary --estimator lgbm rf

Input:
my/
  all/metafeatures.csv
  lgbm/
    2dplanes.json
    …
    results.csv
  rf/…

Output:
my/
  lgbm/binary.json
  rf/binary.json
  all/binary.json
Use Your Own Meta-learned Defaults
• Use "flamlized" learner
• Combine zero-shot and HPO

location_for_defaults/
  all/multiclass.json
  lgbm/multiclass.json
  xgb_limitdepth/multiclass.json
  rf/multiclass.json


"Flamlize" a Learner

• Share it with others
• or with your future self
• Update the learned defaults continuously


Reference
• Mining Robust Default Configurations for Resource-constrained AutoML. Moe Kayali,
Chi Wang. KDD-AutoML 2022.
Time Series Forecasting

Time Series Forecasting Tasks

• Forecast label can be numerical (default) or categorical
  - Numerical: task="ts_forecast", or task="ts_forecast_regression"
  - Categorical: task="ts_forecast_classification"
• Data ordered by a datetime column with equal intervals
  - Daily: 3-30-2022, 3-31-2022, 4-1-2022, 4-2-2022, …
  - Weekly: 3-28-2022, 4-4-2022, 4-11-2022, …
  - Monthly: 3-1-2022, 4-1-2022, 5-1-2022, …
• Forecast horizon
  - How many future time points to predict


Time Series Forecasting Dataset

SKU     Date        Volume  Month  Price  Temperature
SKU_01  01-01-2022  4       Jan    11     25
SKU_02  01-01-2022  8       Jan    12     25
SKU_01  02-01-2022  20      Feb    10.5   27
SKU_02  02-01-2022  32      Feb    11.2   27
…       …           …       …      …      …

Column roles, from left to right:
• Time series ID (categorical)
• Timestamp (datetime)
• Forecast label (numerical or categorical)
• Known features (numerical or categorical)
• Unknown features (numerical or categorical)
Estimators

• Statistical models: Prophet, ARIMA, SARIMAX
  - Caveats: ID columns, unknown features
• Regressors/Classifiers: LightGBM, XGBoost, RandomForest, ExtraTrees
  - Caveat: unknown features
• Neural networks: TemporalFusionTransformer (pytorch-forecasting)
  - Caveat: training cost is high


Data Splitter
• Holdout: training, then validation
• Cross validation: Fold 1, Fold 2, Fold 3, Fold 4, Fold 5
[Figure: timeline of time points 1 to 24 showing the training and validation ranges for holdout and for each of the five folds.]
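
The two splitting schemes can be sketched on time indices in plain Python. The cross-validation variant below assumes an expanding training window, in the style of scikit-learn's TimeSeriesSplit; the slide's figure may differ in detail, and the function names are hypothetical.

```python
def holdout_split(n_points, horizon):
    """Train on everything except the last `horizon` points; validate on those."""
    split = n_points - horizon
    return list(range(split)), list(range(split, n_points))


def expanding_window_folds(n_points, n_folds, horizon):
    """Each fold validates on a later block and trains on all points before it."""
    folds = []
    for k in range(n_folds, 0, -1):
        val_end = n_points - (k - 1) * horizon
        val_start = val_end - horizon
        folds.append((list(range(val_start)), list(range(val_start, val_end))))
    return folds
```

With 24 time points, 5 folds, and a horizon of 2, the last fold validates on the final two points and the first fold on points 14 and 15 (0-based), so training data always precedes validation data.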



Example: Prepare Dataset



Example: Run AutoML.fit()
List of ML learners in AutoML Run: ['lgbm', 'rf', 'xgboost',
'extra_tree', 'xgb_limitdepth', 'prophet', 'arima', 'sarimax']
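
A sketch of the kind of call this slide refers to. The metric, time budget, and evaluation method are illustrative assumptions, the wrapper function name is hypothetical, and the training dataframe is assumed to have the timestamp as its first column.

```python
ts_settings = {
    "task": "ts_forecast",     # numerical forecast label
    "metric": "mape",          # optimization metric (illustrative choice)
    "time_budget": 10,         # seconds (illustrative)
    "eval_method": "holdout",  # or "cv"
}


def fit_forecaster(train_df, label, horizon):
    # Requires `pip install "flaml[ts_forecast]"`; imported lazily on purpose.
    from flaml import AutoML

    automl = AutoML()
    # `period` is the forecast horizon: how many future time points to predict.
    automl.fit(dataframe=train_df, label=label, period=horizon, **ts_settings)
    return automl
```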



Example: Check Result



Example: Prediction



Natural Language Processing
pip install "flaml[nlp]"



Natural Language Processing Tasks by FLAML
Natural language understanding
• Sequence classification: sentiment classification / hate speech detection / document categorization
• Sequence regression: review rating prediction, book price prediction
• Token classification: named entity recognition / POS tagging
• Multiple choice classification: multiple choice classification

Natural language generation
• Seq2seq: text summarization


FLAML NLP Backbone: Fine-Tuning Language Models

[Diagram: FLAML (AutoML, Tune) on top of Transformers]

Transformers:
• Library for fine-tuning transformer models
• Supports major NLP tasks
• Open access to most state-of-the-art language models: huggingface.co/models


FLAML NLP Backbone: Fine-Tuning Language Models
• Stage 1: Pretraining on unlabeled data
• Stage 2: Fine tuning on downstream tasks (e.g., natural language inference, named entity recognition, QA)




NLP: A Basic Use Case

• Get the data
• AutoML using FLAML
• Result
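
A sketch of the "AutoML using FLAML" step for sequence classification. The setting names follow the FLAML NLP documentation as far as known; the model path, output directory, and budget are illustrative assumptions, and the wrapper function name is hypothetical.

```python
nlp_settings = {
    "task": "seq-classification",
    "metric": "accuracy",
    "time_budget": 100,  # seconds (illustrative)
    "gpu_per_trial": 1,  # set to 0 when no GPU is available
    "fit_kwargs_by_estimator": {
        "transformer": {
            "model_path": "google/electra-small-discriminator",  # assumed model
            "output_dir": "data/output/",                        # assumed path
        }
    },
}


def fit_text_classifier(X_train, y_train, X_val, y_val):
    # Requires `pip install "flaml[nlp]"`; imported lazily on purpose.
    from flaml import AutoML

    automl = AutoML()
    automl.fit(
        X_train=X_train, y_train=y_train, X_val=X_val, y_val=y_val, **nlp_settings
    )
    return automl
```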



NLP: A Basic Use Case

FLAML accuracy = 1-0.077 = 0.922 Electra accuracy = 0.912



Fine-Tuning Hyperparameters: FLAML vs Transformers

FLAML's AutoML.fit vs Transformers' hyperparameter_search API:
• Low code implementation
• More customization, e.g., custom metric
• Better performance


Hyperparameter Fine-Tuning with Transformers
• AutoML with Transformers for text summarization: 107 lines
• AutoML with Transformers for NER: 133 lines
Hyperparameter Fine-Tuning with Transformers
• User code for implementing summarization with Transformers: pre-processing/tokenization, model, data collator, Trainer, post processing, computing metrics
• User code for implementing NER with Transformers: pre-processing/tokenization, model, data collator, Trainer, computing metrics
Low-code Hyperparameter Fine-Tuning with FLAML

[Diagram: FLAML (AutoML, Tune) on top of Transformers covers pre-processing, model loading, data collation, computing metrics, training, and post processing. Training arguments (e.g., model) are passed in.]
Low-code Hyperparameter Fine-Tuning with FLAML

• AutoML with FLAML for text summarization: 15 lines
• AutoML with FLAML for NER: 15 lines


Performance: FLAML vs Transformers

[Figure: validation accuracy, FLAML vs Transformers]
• FLAML > Transformers: x8
• FLAML < Transformers: x1
• FLAML = Transformers: x1


Number of Trials: FLAML vs Transformers

[Figure: number of trials, FLAML vs Transformers]
• FLAML > Transformers: x8
• FLAML < Transformers: x2


Open Problems on Fine-Tuning Hyperparameters

• Model selection
• Warm starting AutoML / zero-shot AutoML
• Troubleshooting AutoML failure
• Optimal search space
• ...


Model Selection for Fine-Tuning LM Hyperparameters
[Figure: RoBERTa: 84.6; BERT: 69.0]
• Non-trivial problem. The "best" model does not exist!
• Spooky-author-identification: BERT outperforms RoBERTa (and Electra, Muppet, etc.)
[Figure: validation accuracy vs. wall clock time]
Troubleshooting AutoML Failure
• RS, BO, and ASHA underperform recommended HPs [ACL 2021]
• Reducing the search space can effectively reduce overfitting in HPO [ACL 2021]
FLAML NLP: Summary

• An AutoML layer on top of Transformers
• Low-code implementation
• Outperforms Transformers' tuning performance
• Join us to provide a better tool for AutoML for NLP


Online AutoML https://vowpalwabbit.org/

[Figure: data arriving over time]

Properties of online machine learning
• Data is available in a sequential order
• Training/learning is performed online
• Prediction/decision needs to be made online

• Improves user experience with real-time learning

https://azure.microsoft.com/en-us/services/cognitive-services/personalizer/
Online ML -> Online AutoML

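[Diagram: interaction between the environment and the learning agent (an online learner) at each step $t$.]
• The environment provides the input $X_t$.
• The online learner outputs the prediction $\hat{Y}_t$ = Predict($X_t$).
• The environment reveals the ground truth $Y_t$, and the learner updates with Learn($X_t$, $\hat{Y}_t$, $Y_t$).
• Need AutoML for the hyperparameters: featurization choices, learning rate, regularization, …
• Online evaluation (e.g., progressive validation loss, cumulative regret)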
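
The predict-then-learn loop and the progressive validation loss can be sketched in plain Python. The one-dimensional linear model, squared loss, and SGD update are assumptions chosen for brevity; this is not Vowpal Wabbit's learner.

```python
def progressive_validation_loss(stream, learning_rate=0.1):
    """Average loss of predictions made before each label is revealed."""
    weight, bias = 0.0, 0.0
    cumulative_loss, count = 0.0, 0
    for x, y in stream:
        y_hat = weight * x + bias           # Predict(X_t)
        cumulative_loss += (y_hat - y) ** 2  # scored before learning from Y_t
        count += 1
        gradient = 2 * (y_hat - y)          # Learn(X_t, Y_hat_t, Y_t)
        weight -= learning_rate * gradient * x
        bias -= learning_rate * gradient
    return cumulative_loss / count if count else 0.0
```

Because every example is scored before it is used for training, the returned value acts as a validation loss without a separate holdout set. Here `learning_rate` plays the role of a hyperparameter that online AutoML would tune.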



Online Namespace Interactions Tuning

• Features grouped into namespaces in VW:

• Sometimes adding feature interactions is helpful.

• How to automatically decide which namespace interactions to use?


(A double exponentially large search space)
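
To see why the space is double exponential, one can count the candidates under a simplifying assumption: an interaction is any group of two or more distinct namespaces, and a configuration is any set of such interactions. The exact rules in VW may differ, and the function names are hypothetical.

```python
from itertools import combinations


def candidate_interactions(namespaces):
    """All groups of two or more distinct namespaces."""
    return [
        group
        for size in range(2, len(namespaces) + 1)
        for group in combinations(namespaces, size)
    ]


def num_interaction_sets(n_namespaces):
    """Number of possible sets of interactions: 2^(2^n - n - 1)."""
    return 2 ** (2 ** n_namespaces - n_namespaces - 1)
```

With 3 namespaces there are 4 candidate interactions and 16 possible sets; with 5 namespaces there are already 2^26 sets, which is why exhaustive search is impractical.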



Online AutoML?
• Unique challenges of online AutoML (which make
conventional AutoML methods not directly applicable)
  - Sharp computation constraints (only a constant factor more computation is allowed at any time point)
  - Vastly different scales of data volume
  - Learning algorithms are being evaluated constantly


Online AutoML
Research question
How can we best design an efficient online automated
machine learning algorithm?

Key challenge
Finding a balance between
• Searching over a large number of plausible choices
• Concentrating the limited computation budget on a few
promising choices such that we do not pay a high "learning
price" (regret)


Champion-Challengers (ChaCha) for Online AutoML [ICML’21]
Champion: proven best config at the concerned time point
Challengers: the rest of the candidate configs under consideration
Champion update Challenger generation

An initial config

For online learning


Live challenger scheduling
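
The idea of a "proven" champion can be illustrated with a confidence-bound comparison. This is a simplified sketch, assuming losses in [0, 1] and a Hoeffding-style radius; the actual tests and bounds used by ChaCha are defined in the ICML'21 paper and are not reproduced here.

```python
import math


def confidence_radius(n_samples, delta=0.1):
    """Hoeffding-style radius for an average of n losses bounded in [0, 1]."""
    return math.sqrt(math.log(1 / delta) / (2 * n_samples))


def is_better(challenger_loss, challenger_n, champion_loss, champion_n):
    """Promote only if the challenger's upper bound beats the champion's lower bound."""
    upper = challenger_loss + confidence_radius(challenger_n)
    lower = champion_loss - confidence_radius(champion_n)
    return upper < lower


def is_worse(challenger_loss, challenger_n, champion_loss, champion_n):
    """Drop a challenger whose lower bound exceeds the champion's upper bound."""
    lower = challenger_loss - confidence_radius(challenger_n)
    upper = champion_loss + confidence_radius(champion_n)
    return lower > upper
```

With few samples the radius is wide, so neither test fires and the challenger keeps running; this is the trade-off between exploring many choices and concentrating the budget on a few.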
Resources to Be Used

https://github.com/microsoft/FLAML/tree/tutorial/tutorial


Demo: AutoVW +
AutoVW:

Vanilla VW:

ML Fairness and Fair AutoML
• Fairness in ML

Existing and ongoing efforts:
✓ Fairness definitions in ML
✓ Unfairness mitigation methods
Source: https://fairmlbook.org/tutorial1.html
ML Fairness and Fair AutoML

• Fair AutoML: to find models that are not only accurate but
also fair.
• Is it possible to find a model that is fair without applying
additional unfairness mitigation?
• If unfairness mitigation is necessary, how to incorporate it
into an AutoML pipeline?
• If we can find fair models (w/ or w/o unfairness mitigation),
how will the "utility" (e.g., accuracy, cost) be affected?


Fair AutoML
• Developed abstractions for fairness assessment and unfairness
mitigation techniques in AutoML.
• Investigated the need and impact of unfairness mitigation on
AutoML.

[preprint] Fair AutoML. Qingyun Wu, Chi Wang, arXiv preprint arXiv:2111.06495 (2022).
Fair AutoML

• The fair AutoML problem

Unfairness mitigation Fairness assessment



Fair AutoML
FairAutoML: A fair AutoML framework

Inputs:
- Data: $D_{train}$, $D_{val} = (X_{val}, Y_{val})$
- Loss function: Loss()
- Hyperparameter search space $C$
- Fairness assessment procedure Fair() (e.g., from FairLearn)
- Unfairness mitigation procedure Mitigate()

[Diagram: the FairAutoML loop.]
• HyperparameterSearcher: suggests a config $c$ = Suggest($C$).
• FairnessManager: for deciding whether we should perform the mitigation. No: regular model building ($m = 0$). Yes: model building w/ unfairness mitigation ($m = 1$).
• Model building produces the ML model $f_{c,m}$ and the model prediction $\hat{Y}_{val} = f_{c,m}(X_{val})$.
• Loss($\hat{Y}_{val}$, $Y_{val}$) and the fairness assessment Fair($f_{c,m}$, $D_{val}$) are used to update the HyperparameterSearcher and the FairnessManager.
FairFLAML
• A self-adaptive strategy (via FairnessManager) to balance HPO
and unfairness mitigation with a goal of finding a model with
the best fair loss efficiently and effectively.

[Figure: fair loss on the adult dataset (sensitive attribute = "sex")]
Future Work
Open Problems and Research Opportunities on AutoML
• Deficiencies of AutoML
  - Causes system failures due to compute-intensive workloads
  - Lacks customizability
  - Lacks comprehensive end-to-end support
  - Lacks transparency and interpretability
• Timeout
• Elasticity
• Multi-output tasks
• Early stopping of trials
• Search space suggestion
• Overfitting
• Multiple optimization metrics
• Customize meta learning
• Deployment constraints
• Online learning
• Guardrail
• How to decide time budget
• Trustworthiness and Ethics
• How many data samples are needed
• Reproducibility


Use FLAML to Improve/Inspire Your Research
✓ ML model tuning
- Tuning reinforcement learning models
- Tuning graph neural networks

✓ Tuning certain choices in your task towards a particular objective
- Synthetic dataset generation by tuning data generation related choices
- Finding the most profitable investment strategies
- RL policy tuning

✓ Compare different methods/models more comprehensively with
sufficient tuning of each method/model
Call for Contribution
This project welcomes and encourages all forms of
contribution, including but not limited to:
✓ Pushing patches.
✓ Code review of pull requests.
✓ Documentation, examples and test cases.
✓ Community participation in issues, discussions, and Gitter.
✓ Tutorials, blog posts, talks that promote the project.
✓ Sharing application scenarios and/or related research.


Call For Contribution: Opportunities
✓ Share your use cases and needs
- Connect via Gitter: https://gitter.im/FLAMLer/community
- Create issues
✓ Development
- Improve existing features: zero-shot AutoML, time series
forecasting, online AutoML
- Develop new features: computer vision tasks, multi-modal
model, fair AutoML, quality monitoring and drift detection,
visualization and explanation
✓ Integration with other libraries
Call for Contribution

Gitter:
https://gitter.im/FLAMLer/community

Contributing guide:
https://microsoft.github.io/FLAML/docs/Contribute

Roadmap:
https://github.com/microsoft/FLAML/wiki/Roadmap-for-Upcoming-Features


References
• [MLSys'21] FLAML: A Fast and Lightweight AutoML Library. Chi Wang, Qingyun Wu, Markus Weimer,
Erkang Zhu.
• [ICML'21] ChaCha for Online AutoML. Qingyun Wu, Chi Wang, John Langford, Paul Mineiro, Marco Rossi.
• [ICLR'21] Economical Hyperparameter Optimization With Blended Search Strategy. Chi Wang, Qingyun Wu,
Silu Huang, Amin Saied.
• [AAAI'21] Frugal Optimization for Cost-related Hyperparameters. Qingyun Wu, Chi Wang, Silu Huang.
• [ACL'21] An Empirical Study on Hyperparameter Optimization for Fine-Tuning Pre-trained Language
Models. Susan Liu, Chi Wang.
• [KDD-AutoML'22] Mining Robust Default Configurations for Resource-constrained AutoML. Moe Kayali, Chi
Wang.
• [preprint] Fair AutoML. Qingyun Wu, Chi Wang. arXiv preprint arXiv:2111.06495 (2022).

Q&A
aka.ms/FLAML

Thanks to all contributors & collaborators!