Auto ML

INTRODUCTION
• Automated machine learning (automated ML) is the process of
automating the time-consuming, iterative tasks of machine
learning model development to build models with high scale,
efficiency and productivity while sustaining model quality.
• Traditional machine learning model development:
1. Is resource-intensive.
2. Requires significant domain knowledge.
3. Consumes time for feature engineering, model selection, etc.
• With automated machine learning, efficiency can be improved
to get production-ready ML models.
Problem Statement
• To develop an AutoML package in Python that lets the user enter a
dataset and outputs the model that best fits that particular dataset.
First, data cleaning is performed on the dataset, which includes
eliminating redundant data and filling in missing values. It also
includes an encoding algorithm for columns that contain string values.
This is followed by feature engineering, which includes feature
extraction and selection. Then the best model, with the best
hyperparameters for prediction, is selected. The last step is
cross-validation to check the consistency of the selected model.
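The cleaning and encoding steps of the problem statement can be sketched in plain Python as below. This is a minimal sketch under stated assumptions, not the package's actual implementation: rows are dicts with None marking a missing value, each column holds either numbers or strings (not a mix), numeric gaps are filled with the column mean, string gaps with the most frequent value, and string columns are label-encoded to integers. The function name `clean_dataset` is hypothetical.

```python
from statistics import mean, mode


def clean_dataset(rows):
    """Remove duplicate rows, fill missing values and label-encode string columns.

    rows: list of dicts mapping column name -> value, with None marking a missing value.
    Returns (cleaned_rows, encoders); encoders maps each string column to its
    category -> integer code mapping.
    """
    # Step 1: eliminate redundant data by keeping only the first copy of identical rows.
    seen, unique_rows = set(), []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique_rows.append(dict(row))  # copy so the caller's data is untouched

    columns = list(unique_rows[0]) if unique_rows else []
    encoders = {}
    for col in columns:
        present = [r[col] for r in unique_rows if r[col] is not None]
        if not present:
            continue  # nothing to impute from; leave an all-missing column as-is
        is_string = any(isinstance(v, str) for v in present)
        # Step 2: fill missing values (mode for string columns, mean for numeric ones).
        fill = mode(present) if is_string else mean(present)
        for r in unique_rows:
            if r[col] is None:
                r[col] = fill
        # Step 3: encode string columns as integers, one code per sorted category.
        if is_string:
            mapping = {c: i for i, c in enumerate(sorted(set(present)))}
            encoders[col] = mapping
            for r in unique_rows:
                r[col] = mapping[r[col]]
    return unique_rows, encoders
```

Label encoding imposes an arbitrary order on the categories; for models that are sensitive to magnitudes, one-hot encoding is a common alternative.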
Objective
• The objective of AutoML is to make machine learning more accessible
by automatically generating a data analysis pipeline that can include
data pre-processing, feature selection, and feature
engineering methods along with machine learning methods and
parameter settings that are optimized for a given dataset.
• Each of these steps can be time-consuming for the machine learning
expert and can be debilitating for the novice. These methods
enable data science using machine learning, thus making this powerful
technology more widely accessible for those hoping to make use
of big data.
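Choosing models and parameter settings automatically, as described above, amounts to scoring every candidate on the same data and keeping the best one. The sketch below scores candidates with k-fold cross-validation, which also serves as the consistency check named in the problem statement. It is illustrative only: the candidate set (a majority-class baseline and k-nearest neighbours with different values of k), the round-robin fold assignment and all function names are assumptions.

```python
from collections import Counter
from statistics import mean


def fit_majority(X, y):
    """Baseline model: always predict the most frequent training label."""
    label = Counter(y).most_common(1)[0][0]
    return lambda x: label


def make_knn(k):
    """Return a fit function for k-nearest neighbours with hyperparameter k."""
    def fit(X, y):
        def predict(x):
            # Indices of the k training points closest to x (squared Euclidean distance).
            nearest = sorted(
                range(len(X)),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(X[i], x)),
            )[:k]
            return Counter(y[i] for i in nearest).most_common(1)[0][0]
        return predict
    return fit


def cross_val_accuracy(fit, X, y, folds=3):
    """Mean accuracy over `folds` folds; sample i goes to fold i % folds."""
    scores = []
    for f in range(folds):
        test_idx = [i for i in range(len(X)) if i % folds == f]
        train_idx = [i for i in range(len(X)) if i % folds != f]
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        correct = sum(model(X[i]) == y[i] for i in test_idx)
        scores.append(correct / len(test_idx))
    return mean(scores)


def select_best(candidates, X, y, folds=3):
    """Score every (model, hyperparameter) candidate and return the best one."""
    results = {name: cross_val_accuracy(fit, X, y, folds) for name, fit in candidates.items()}
    best = max(results, key=results.get)
    return best, results
```

A call such as `select_best({"majority": fit_majority, "knn(k=1)": make_knn(1), "knn(k=3)": make_knn(3)}, X, y)` returns the name of the winning candidate together with every candidate's mean cross-validated accuracy.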
Literature Survey
1. Xin He, Kaiyong Zhao, Xiaowen Chu on topic “AutoML: A
Survey of the State-of-the-Art”, Department of Computer
Science, Hong Kong Baptist University, August 2019
• The main objective of this paper is to provide a comprehensive and up-to-date study of state-of-the-art
AutoML.
• First, AutoML techniques are introduced in detail, following the stages of the machine learning pipeline.
• These stages include data preparation, feature engineering and model generation. Data preparation includes
data collection and data cleaning. Feature engineering includes feature selection, construction and
extraction.
• Model generation includes model structures and hyperparameter optimization. Then the paper
summarizes existing Neural Architecture Search (NAS) research, which is one of the most popular
topics in AutoML.
• It also compares the models generated by NAS algorithms with human-designed models. Finally,
several open problems for future research are presented.
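The feature-engineering stage described above includes choosing which features to keep. As an illustration, and not a method taken from the survey, the sketch below uses a simple filter approach: each feature is scored by the absolute Pearson correlation between it and a numeric target, and the top `k` features are kept. The function names are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


def select_features(X, y, k):
    """Filter-style feature selection: keep the k columns most correlated with y.

    Returns (kept_column_indices, reduced_X).
    """
    scores = []
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        scores.append((abs(pearson(column, y)), j))
    # Highest absolute correlation first; ties broken by the lower column index.
    ranked = sorted(scores, key=lambda s: (-s[0], s[1]))
    keep = sorted(j for _, j in ranked[:k])
    return keep, [[row[j] for j in keep] for row in X]
```

Filter methods are cheap because they never train a model, but they score features one at a time and can miss features that are only useful in combination.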
Advantages:
• The pipeline to be followed to develop an AutoML application is provided.
• Most aspects of AutoML, from data collection to validation of the selected models for a particular dataset, are
covered.
Limitations:
• The methodologies provided by the paper for model selection are far too complex for simple AutoML
applications that include only basic regression and classification models, and they increase the time complexity
of the problem.
2. Mohamed Maher, Sherif Sakr on topic “SmartML: A Meta Learning-Based
Framework for Automated Selection and Hyperparameter Tuning for Machine
Learning Algorithms”, published in Proceedings of the 22nd International Conference
on Extending Database Technology (EDBT), March 26-29, 2019
• The main objective of this paper is to provide an approach for automated
algorithm selection and hyperparameter tuning.
• The paper presents SmartML, a meta-learning-based framework for this task.
• SmartML automatically extracts the meta-features of the input dataset and searches its knowledge base
for the best-performing algorithm.
• SmartML applies the SMAC (Sequential Model-based Algorithm Configuration) technique for hyperparameter optimization.
• SMAC attempts to model the relationship between algorithm performance and a
given set of hyperparameters.
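The meta-learning idea can be sketched as a nearest-neighbour lookup: describe the new dataset by meta-features, find the most similar dataset in the knowledge base, and reuse the algorithm that worked best there. This is a minimal sketch under assumptions, not SmartML's implementation: the four meta-features, the unweighted Euclidean distance and the function names are illustrative, and the SMAC tuning step is not shown. A real system would typically normalise meta-features before comparing them.

```python
import math
from collections import Counter


def meta_features(X, y):
    """Describe a classification dataset by a few simple meta-features."""
    n = len(y)
    counts = Counter(y)
    class_entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    # log10 of the sample count keeps dataset size on a scale closer to the others.
    return (math.log10(n), len(X[0]), len(counts), class_entropy)


def recommend_algorithm(X, y, knowledge_base):
    """Return the algorithm recorded for the most similar previously seen dataset.

    knowledge_base: list of (meta_feature_tuple, best_algorithm_name) pairs.
    """
    query = meta_features(X, y)
    closest = min(knowledge_base, key=lambda entry: math.dist(query, entry[0]))
    return closest[1]


def update_knowledge_base(knowledge_base, X, y, best_algorithm):
    """Store the outcome of a new run so that later recommendations can use it."""
    knowledge_base.append((meta_features(X, y), best_algorithm))
```

`update_knowledge_base` corresponds to the growing knowledge base listed among the advantages: each finished run adds one more reference point for future lookups.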
Advantages:
• The meta-learning feature emulates the role of a domain expert in machine
learning.
• The knowledge base grows with each run, which makes the framework smarter over time.
Limitations:
• This approach runs an iterative process over various algorithms and hyperparameters,
which may lead to high time complexity.
3. Janek Thomas, Stefan Coors, Bernd Bischl on topic “Automatic Gradient Boosting”,
published by Department of Statistics, LMU, Ludwigstrasse 33, D80539 Munich, 13 July 2018
• Automatic machine learning performs predictive modeling with high-performing
machine learning tools without human intervention.
• Automatic gradient boosting simplifies this idea one step further, using only
gradient boosting as a single learning algorithm in combination with model-based
hyperparameter tuning, threshold optimization and encoding of categorical features.
• Boosting implementations cannot natively handle categorical variables and it is
necessary to transform such features. The simplest possibility is to encode these
features into integers.
• The paper presents a fast, scalable and robust AutoML solution that can handle
categorical features (even with many levels), outliers and missing data, while
having a much smaller configuration space compared to existing solutions.
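Threshold optimization, mentioned above, means replacing the default 0.5 cut-off on a binary classifier's predicted probability with the value that scores best on a chosen measure. The sketch below assumes balanced accuracy as that measure and tries each distinct predicted probability as a candidate threshold; both choices and the function names are assumptions, not the paper's exact procedure.

```python
def balanced_accuracy(labels, predictions):
    """Mean of the true-positive rate and true-negative rate (both classes must occur)."""
    positives = [p for l, p in zip(labels, predictions) if l == 1]
    negatives = [p for l, p in zip(labels, predictions) if l == 0]
    tpr = sum(positives) / len(positives)
    tnr = sum(1 - p for p in negatives) / len(negatives)
    return (tpr + tnr) / 2


def optimize_threshold(probabilities, labels, metric=balanced_accuracy):
    """Pick the decision threshold on P(class = 1) that maximises `metric`.

    Candidates are the default 0.5 plus every distinct predicted probability;
    a sample is predicted positive when its probability is >= the threshold.
    """
    best_t = 0.5
    best_score = metric(labels, [int(p >= 0.5) for p in probabilities])
    for t in sorted(set(probabilities)):
        score = metric(labels, [int(p >= t) for p in probabilities])
        if score > best_score:
            best_t, best_score = t, score
    return best_t, best_score
```

In practice the threshold should be tuned on held-out (for example, cross-validated) predictions rather than on the data the model was trained on, otherwise the gain is overestimated.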
Advantages:
• Automatic gradient boosting handles categorical variables through encoding.
• A larger number of hyperparameters can be tuned for GBT than with grid search and
random search.
Limitations:
• The results showed that auto-sklearn outperformed autoxgboost on more of the
datasets.
4. Peter Flach on topic “Performance Evaluation in Machine Learning:
The Good, the Bad, the Ugly, and the Way Forward”, published in The Thirty-Third AAAI Conference on Artificial Intelligence (AAAI-19),
July 2019
• The paper gives an overview of how the understanding of performance evaluation
measures for machine-learned models has improved over the last
decade.
• A single aggregated measurement is insufficient to accurately reflect
the performance of a machine learning algorithm.
• Concatenation, Scales and Transformations
• Measurements on Confusion Matrices
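To illustrate why one aggregated number can mislead, the sketch below derives several measures from a single binary confusion matrix. The function names and the convention that 0/0 evaluates to 0.0 are assumptions for illustration. For a classifier that always predicts the majority class on data with 95 negatives and 5 positives, accuracy is 0.95 while recall and F1 are both 0.

```python
def confusion_counts(labels, predictions):
    """Return (tp, fp, fn, tn) for binary 0/1 labels and predictions."""
    tp = fp = fn = tn = 0
    for actual, predicted in zip(labels, predictions):
        if actual == 1 and predicted == 1:
            tp += 1
        elif actual == 0 and predicted == 1:
            fp += 1
        elif actual == 1 and predicted == 0:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def evaluation_report(labels, predictions):
    """Several measures derived from the same confusion matrix."""
    tp, fp, fn, tn = confusion_counts(labels, predictions)

    def ratio(num, den):
        return num / den if den else 0.0  # treat 0/0 as 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    return {
        "accuracy": ratio(tp + tn, tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "specificity": ratio(tn, tn + fp),
        "f1": ratio(2 * precision * recall, precision + recall),
    }
```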
Advantages:
• The paper gives an overview of how to choose the best performance evaluation measure for the chosen machine learning
model.
• It gives an idea of when to stop the AutoML pipeline and display the results.
Limitations:
• A question remains unanswered: would a measurement theory endowed with latent variables be
all that is required to evaluate the performance of a model?
5. Quanming Yao, Mengshuo Wang, Yuqiang Chen, Wenyuan Dai, Yi-Qi Hu, Yu-Feng Li, Wei-Wei Tu,
Qiang Yang, Yang Yu on topic “Taking the Human out of Learning Applications: A
Survey on Automated Machine Learning”, published on arxiv.org, January 2019
• Achieving good learning performance with machine learning is knowledge- and
labor-intensive.
• To make machine learning techniques easier to apply and reduce the
demand for experienced human experts, automated machine learning
(AutoML) has emerged.
• First, the paper introduces and defines the AutoML problem, drawing
inspiration from both automation and machine learning.
• It then categorizes and reviews existing work from two
aspects: the problem setup and the employed techniques.
Advantages:
• The AutoML pipeline and methodology are explained clearly.
• The directions proposed for future work served as inspiration for developing an AutoML pipeline.
• New and effective techniques for hyperparameter optimization are presented.
Limitations:
• The random search and grid search techniques it mentions are dated and take a long time to
produce results.
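The time cost noted above comes from how the two searches spend evaluations: grid search tries every combination in the Cartesian product of the candidate values, so its cost multiplies with each added hyperparameter, whereas random search samples a fixed budget of combinations. The sketch below is a generic illustration; the function names, the dict-of-lists search space and the fixed seed are assumptions.

```python
import itertools
import random


def grid_search(space, objective):
    """Evaluate every combination in the Cartesian product of the value lists."""
    names = list(space)
    best, evaluations = None, 0
    for values in itertools.product(*(space[name] for name in names)):
        params = dict(zip(names, values))
        score = objective(params)
        evaluations += 1
        if best is None or score > best[1]:
            best = (params, score)
    return best, evaluations


def random_search(space, objective, budget, seed=0):
    """Evaluate `budget` randomly sampled combinations (sampling with replacement)."""
    rng = random.Random(seed)
    best = None
    for _ in range(budget):
        params = {name: rng.choice(values) for name, values in space.items()}
        score = objective(params)
        if best is None or score > best[1]:
            best = (params, score)
    return best, budget
```

For three hyperparameters with ten candidate values each, grid search needs 10 × 10 × 10 = 1000 evaluations, while random search with a budget of 50 stops after 50, at the risk of missing the single best combination.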
Methodology
Conclusion
• The purpose of AutoML is to automate repetitive tasks such as
automatic selection of classifiers, pipeline creation and
hyperparameter tuning, so that data scientists can spend more of their
time on the business problem at hand.
• AutoML also aims to make the technology available to everybody
rather than a select few. AutoML and data scientists can work in
conjunction to accelerate the ML process so that the full
potential of machine learning can be realized.
