
Machine Learning and Its

Applications in IC Physical Design


David Z. Pan Yibo Lin
University of Texas at Austin Peking University
http://users.ece.utexas.edu/~dpan/ http://yibolin.com/

1
Outline
Part I: Introduction to Machine Learning
• What is Machine Learning
• Taxonomy of Machine Learning
• Machine Learning Techniques
Part II: Machine Learning for Physical Design
• Motivations
• Opportunities
• Challenges and Future Directions

2
A Bit History

Courtesy Nvidia3
Enormous Opportunities for
Machine Learning

Automotive Appliances

Healthcare
Entertainment 4
Enormous Opportunities for
Machine Learning

Automotive Appliances

Healthcare
Trash Classification Entertainment 5
Machine Learning ≈ Looking for a
Function
Speech recognition: f(audio) = "How are you"
Playing Go: f(board state) = "5-5" (next move)
Image recognition: f(image) = "Cat"
Dialogue system: f("Hi", what the user said) = "Hello" (system response)
6
Machine Learning Framework

TRAINING
• Start from a set of candidate functions {f1, f2, ..., f8} (our model family)
• Evaluate the goodness of each function on labeled data ("dog", "cat", "dog")
• Pick the BEST function f*

TESTING
• Use f* (our model) to label new data, e.g., "cat"

7
Taxonomy of Machine Learning
Unsupervised Learning | Supervised Learning | Semi-supervised Learning (three learning scenarios)

Unsupervised Learning: only inputs x; no labels/outputs y
Supervised Learning: training data with inputs x and labels/outputs y
Semi-supervised Learning: only subsets of the data have labels/outputs y

8
Based on [Boning@MIT2019]
Taxonomy of Machine Learning
Unsupervised Learning | Supervised Learning | Semi-supervised Learning
Unsupervised Learning:
• Clustering (e.g., K-means)
• Dimensionality reduction (e.g., PCA)
• Kernel methods
• Density estimation

9
Taxonomy of Machine Learning
Unsupervised Learning | Supervised Learning | Semi-supervised Learning
Semi-supervised Learning scenarios:
• Sampling
• Search and optimization
• Active learning
• Reinforcement learning
• Online learning
• Transfer learning
• Adversarial learning

10
Taxonomy of Machine Learning
Unsupervised Learning | Supervised Learning | Semi-supervised Learning
Semi-supervised Learning scenarios:
• Sampling
• Search and optimization (e.g., Genetic Algorithm, Bayesian Optimization)
• Active learning
• Reinforcement learning
• Online learning
• Transfer learning
• Adversarial learning

11
Taxonomy of Machine Learning
Unsupervised Learning | Supervised Learning | Semi-supervised Learning
Supervised Learning:
• Classification: logistic, SVM, trees/random forest, NN
• Regression (parametric): OLS, MARS, PCR, ridge, ...
• Instance-based (nonparametric): Gaussian process
• Dimensionality reduction: latent variable, sparse coding, matrix completion
• Generative (density estimation): Bayesian, MCMC
• Discriminative: deep NN (convolutional, recurrent)

15
Taxonomy of Machine Learning
Unsupervised Learning | Supervised Learning | Semi-supervised Learning
(same supervised-learning taxonomy as the previous slide, highlighting parametric vs. nonparametric)

Parametric methods: model complexity is fixed (it does not depend on the number of
training samples; it may depend on hyperparameters and/or regularization)
Nonparametric methods: model complexity depends on the training samples

16
Taxonomy of Machine Learning
Unsupervised Learning | Supervised Learning | Semi-supervised Learning
(same supervised-learning taxonomy as the previous slides, highlighting generative vs. discriminative)

Generative adversarial: combines generative and discriminative learners, e.g., GAN

Generative methods: learn $p(x, y)$ and/or $p(x \mid y)$ to reason about $p(y \mid x)$
Discriminative methods: learn $p(y \mid x)$ relationships directly

17
Before Going too Technical…

18
New Job Opportunities: AI Trainer

• Machines can learn by themselves


• Why do we need AI trainers/engineers?

• Pokémon can fight by themselves. Why do
they need Pokémon trainers?

Based on [Lee@NTU2017] 19
To be Serious, Tasks for AI Trainer
• Proper model
• Define the set of functions
• Proper loss function
• Not always easy to find the best function
• Optimality? E.g., Deep learning
• Require experienced engineers

Step 1: Define a set of functions
Step 2: Evaluate goodness of functions
Step 3: Pick the best function
Based on [Lee@NTU2017] 20
Taxonomy of Machine Learning

Unsupervised Learning
• Clustering
• Dimensionality reduction
• Kernel methods
• Density estimation

Supervised Learning
• Classification
• Regression (parametric)
• Instance-based (nonparametric)
• Dimensionality reduction
• Generative (density estimation)
• Discriminative

Semi-supervised Learning
• Sampling
• Search and optimization
• Active learning
• Reinforcement learning
• Online learning
• Transfer learning
• Adversarial learning

21
Based on [Boning@MIT2019]
Linear Models
• Linear regression
• $f(x) = w^\top x + b = \sum_{i=1}^{N} w_i x_i + b$
• Logistic classification
• $P(y = 1 \mid x) = \dfrac{1}{1 + e^{-(\sum_{i=1}^{N} w_i x_i + b)}}$

22
Linear Models – Training
• Ordinary Least Squares (OLS)
• Minimizes the mean square error over the 𝑛 data points
• Direct solution is possible
$\min_{w,b} \ \frac{1}{n} \sum_{i=1}^{n} \left( w^\top x^{(i)} + b - y^{(i)} \right)^2$

• Ridge Regression
• Regularization by penalizing large weights
$\min_{w,b} \ \frac{1}{n} \sum_{i=1}^{n} \left( w^\top x^{(i)} + b - y^{(i)} \right)^2 + \lambda \lVert w \rVert^2$

• Stochastic gradient descent (SGD)


• Pick a data point 𝑖 (or a batch) at random for iterative
model updates, same idea as neural networks
23
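A minimal NumPy sketch of these training options (our illustration; function names and data layout are placeholders, not from the tutorial): closed-form OLS and ridge, plus one SGD step on the squared loss.

```python
import numpy as np

def fit_ols(X, y):
    """Closed-form OLS: append a bias column and solve the least-squares problem."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    wb, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    return wb[:-1], wb[-1]                      # weights, bias

def fit_ridge(X, y, lam=1e-2):
    """Closed-form ridge: penalize large weights (the bias is also penalized here for brevity)."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    A = Xb.T @ Xb + lam * np.eye(Xb.shape[1])
    wb = np.linalg.solve(A, Xb.T @ y)
    return wb[:-1], wb[-1]

def sgd_step(w, b, x_i, y_i, lr=1e-3):
    """One stochastic gradient step on (w^T x_i + b - y_i)^2, the same idea used for neural networks."""
    err = w @ x_i + b - y_i
    return w - lr * 2 * err * x_i, b - lr * 2 * err
```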
Feature Space
• Mapping from input space to feature space
$\boldsymbol{x} = [x_1, x_2, \cdots, x_N] \mapsto \phi(\boldsymbol{x}) = [\phi(x_1), \phi(x_2), \cdots, \phi(x_N)]$

Attributes Features

Input Space Feature Space


• Nonlinear features under a linear model
• $f(x) = \sum_{i=1}^{M} w_i \phi_i(x) + b$

24
Support Vector Machine
• A learning method that uses a hypothesis space of
linear functions in a high dimensional feature
space trained with an optimization based learning
algorithm
$x_i$: feature vector
$y_i$: label in {-1, 1}
Consider $\hat{y}_i = w^\top x_i - b$

Objective
$\min \ \frac{1}{2} \lVert w \rVert^2$

Subject to
$(w^\top x_i - b)\, y_i \ge 1$
25
Support Vector Machine
• Extend to a high-dimensional feature space
through the (nonlinear) kernel
• Data can be linearly separable in feature space

26
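As an illustrative aside (not code from the tutorial), scikit-learn's SVC makes the linear vs. kernel contrast concrete on a toy dataset that is not linearly separable in the input space.

```python
from sklearn.svm import SVC
from sklearn.datasets import make_circles

# Concentric circles: not linearly separable in the input space.
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

linear_svm = SVC(kernel="linear").fit(X, y)
rbf_svm = SVC(kernel="rbf", gamma=2.0).fit(X, y)   # implicit high-dimensional feature space

print("linear kernel accuracy:", linear_svm.score(X, y))  # poor
print("RBF kernel accuracy:", rbf_svm.score(X, y))        # near 1.0
```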
Decision Trees for Classification
Partition the input space and fit very simple models
to predict the output in each partition.
• Split nodes: select the “best” feature and value to split
• “Best”: usually defined in terms of some notion of
“impurity” in the resulting partition of training data.
Typical impurity measures include misclassification
error, Gini index, or entropy

27
Probability of kyphosis after surgery [wikipedia]
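A hedged scikit-learn sketch of impurity-based splitting (the dataset choice here is ours, not the kyphosis example): each split greedily picks the feature and threshold that most reduce the chosen impurity measure.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
for impurity in ("gini", "entropy"):
    # Shallow tree: each internal node stores the "best" feature/threshold split.
    tree = DecisionTreeClassifier(criterion=impurity, max_depth=3, random_state=0).fit(X, y)
    print(impurity, "training accuracy:", round(tree.score(X, y), 3))
```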
K-Means Clustering
Unsupervised learning
• Only have feature 𝑥, no label 𝑦
Target: identify natural groups of data
• $\min \ \sum_{i=1}^{k} \sum_{x \in C_i} \lVert x - \mu_i \rVert^2$

Iterative algorithm
Randomly set cluster centers $m_1, \ldots, m_k$;
While not converged:
• Use these centers to assign each data point to the nearest cluster;
• Adjust centers $m_i$ based on those memberships;
29
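A compact NumPy sketch of the iterative algorithm above (illustrative only; initialization and stopping rule are simple choices of ours).

```python
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]      # random initial centers
    for _ in range(n_iters):
        # Assign each point to the nearest center.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=-1)
        labels = dists.argmin(axis=1)
        # Adjust each center to the mean of its members (keep old center if a cluster is empty).
        new_centers = np.array([X[labels == i].mean(axis=0) if np.any(labels == i) else centers[i]
                                for i in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels
```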
Deep Learning

Deep Learning for IoT Big Data and Streaming Analytics: A Survey 30
Neural Networks
• Multilayer perceptron (MLP)

𝒇 𝒙 = 𝝈𝟑 (𝑾𝟑 𝝈𝟐 (𝑾𝟐 𝝈𝟏 (𝑾𝟏 𝒙)))


𝝈𝒊 : activation function

𝑾𝟏 𝑾𝟐 𝑾𝟑
Neurons

31
Neural Networks
• Multilayer perceptron (MLP)

𝒇 𝒙 = 𝝈𝟑 (𝑾𝟑 𝝈𝟐 (𝑾𝟐 𝝈𝟏 (𝑾𝟏 𝒙)))


𝝈𝒊 : activation function

Rectified linear unit (ReLU): $f(x) = \begin{cases} 0, & x < 0 \\ x, & x \ge 0 \end{cases}$

Logistic (Sigmoid): $f(x) = \dfrac{1}{1 + e^{-x}}$
32
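For concreteness, a hedged PyTorch sketch of such an MLP (the layer sizes are arbitrary choices, not values from the slides).

```python
import torch
import torch.nn as nn

# Three-layer MLP f(x) = sigma3(W3 sigma2(W2 sigma1(W1 x))).
mlp = nn.Sequential(
    nn.Linear(16, 64), nn.ReLU(),      # sigma1 = ReLU
    nn.Linear(64, 64), nn.ReLU(),      # sigma2 = ReLU
    nn.Linear(64, 1), nn.Sigmoid(),    # sigma3 = logistic, e.g. for a binary output
)

x = torch.randn(8, 16)                 # batch of 8 feature vectors
print(mlp(x).shape)                    # torch.Size([8, 1])
```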
A Bit History on Deep Learning

https://beamandrew.github.io/deeplearning
/2017/02/23/deep_learning_101_part1.html 33
A Bit History on Deep Learning
• “Neural networks” → “Deep learning” in 2006
• Geoffrey Hinton
• Breakthrough in 2012
• Large scale visual recognition challenge (LSVRC)
• ImageNet

https://beamandrew.github.io/
deeplearning/2017/02/23/dee
p_learning_101_part1.html 34
How to Train a Neural Network
• Objective: $\min \ \sum_i \mathrm{Loss}(f(x_i), y_i)$
• Gradient descent
• Backpropagation

Example for the first hidden neuron: $y_1 = f_1(w_{(x_1)1} x_1 + w_{(x_2)1} x_2)$
(figure: inputs $x_1, x_2$ feed the hidden neurons $f_i(e)$, weighted by $w$, up to the output)

Forward propagation
35
How to Train a Neural Network
• Gradient descent
• Loss function: 𝐿(𝑧)
• Backpropagation
• Chain rule

Forward pass: $z = f(x, y)$

Backward pass (chain rule):
$\dfrac{dL}{dx} = \dfrac{dL}{dz}\dfrac{dz}{dx}, \qquad \dfrac{dL}{dy} = \dfrac{dL}{dz}\dfrac{dz}{dy}$

36
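A tiny PyTorch check of the chain rule above (our example; f(x, y) = x*y and L = z^2 are arbitrary choices).

```python
import torch

# z = f(x, y) = x * y, L = z**2, so dL/dx = 2*z*y and dL/dy = 2*z*x.
x = torch.tensor(3.0, requires_grad=True)
y = torch.tensor(-2.0, requires_grad=True)
z = x * y            # forward pass
L = z ** 2           # loss
L.backward()         # backward pass applies dL/dx = dL/dz * dz/dx automatically
print(x.grad, y.grad)   # tensor(24.), tensor(-36.)
```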
How to Train a Neural Network
• Gradient descent
• Backpropagation
• Non-linear optimization

(figure: loss landscape over weight $w$; gradient descent may stop at a local cost minimum rather than the global cost minimum)
37
Why Deep? Universality Theorem
• Any function $f: \mathbb{R}^N \to \mathbb{R}^M$
• Can be realized by a neural network with one
hidden layer
• (given enough hidden neurons)

Why DEEP neural networks


but not FAT neural networks?
Input layer Output layer
Hidden layer
Reference for the reason:
http://neuralnetworksandde
eplearning.com/chap4.html 38
Why Deep? Analogy
Logic Circuits:
• Consist of logic gates
• Two layers of logic gates can represent any Boolean function
• Use multiple layers of logic gates to build functions in a much simpler way (fewer gates required)

Neural Networks:
• Consist of neurons
• A single-hidden-layer network can represent any continuous function
• Use multiple layers to represent functions in a much simpler way (fewer parameters, less data?)

More reason:
https://www.youtube.com/watch?v=Xs
C9byQkUH8&list=PLJV_el3uVTsPy9oCR
Y30oBPNLCo89yu49&index=13 39
Why Deep? NOT Fat
• Occam's razor: entities should not be multiplied without necessity
• The simplest solution is most likely the right one
• Fewer parameters, better generalization
• Deep + thin vs. short + fat
• Same number of parameters
• Deep networks tend to generalize better
• Require less training data
#Layers x #Neurons | Error Rate (%)      1 x #Neurons | Error Rate (%)
5 x 2k             | 17.2                1 x 3772     | 22.5
7 x 2k             | 17.1                1 x 4634     | 22.4

Seide, Frank, Gang Li, and Dong Yu. "Conversational speech transcription using context-dependent deep
40
neural networks." Twelfth annual conference of the international speech communication association. 2011.
How Deep?

(ImageNet top-5 error vs. network depth)
5 layers: top-5 err. 16.4

ResNet depth | top-5 err.
34           | 7.4
101          | 6.0
152          | 5.7

41
More Flavors…
Convolutional Neural Networks
Convolutional layer
• Suitable to images
• Correlation with neighboring pixels

42
More Flavors…
Highway Networks
Residual Network Highway Network

copy Gate
controller copy

Deep Residual Learning for Image Recognition: http://arxiv.org/abs/1512.03385
Training Very Deep Networks: https://arxiv.org/pdf/1507.06228v2.pdf
43
More Flavors…
Random Networks

Exploring Randomly Wired Neural Networks for Image Recognition
https://arxiv.org/pdf/1904.01569.pdf
44
Generative Adversarial Networks
• GAN [Goodfellow et al, 2014] [Radford et al, 2015]
• Two networks contest (generator and discriminator)
• Produces images similar to those in the training data

45
Generative Adversarial Networks
• GAN [Goodfellow et al, 2014] [Radford et al, 2015]
• Two networks contest (generator and discriminator)
• Produces images similar to those in the training data

Real samples

Is D
D
Correct?

Latent
Space G
Generated
Fake Samples 46
Noise Fine Tune Training
Generative Adversarial Networks
• GAN [Goodfellow et al, 2014] [Radford et al, 2015]
• Two networks contest (generator and discriminator)
• Produces images similar to those in the training data

$\min_G \max_D \ \mathbb{E}_{x \sim p_{data}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]$
(loss for real samples + loss for generated samples)

Real samples

Is D
D
Correct?

Latent
Space G
Generated
Fake Samples 47
Noise Fine Tune Training
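A minimal PyTorch sketch of one training step under this minimax loss (network sizes and the "real" data are placeholders, not the tutorial's code).

```python
import torch
import torch.nn as nn

latent_dim, data_dim = 16, 2
G = nn.Sequential(nn.Linear(latent_dim, 64), nn.ReLU(), nn.Linear(64, data_dim))
D = nn.Sequential(nn.Linear(data_dim, 64), nn.ReLU(), nn.Linear(64, 1), nn.Sigmoid())
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCELoss()

def gan_step(real):
    batch = real.size(0)
    fake = G(torch.randn(batch, latent_dim))
    # Discriminator: push real samples toward 1 and generated samples toward 0.
    d_loss = bce(D(real), torch.ones(batch, 1)) + bce(D(fake.detach()), torch.zeros(batch, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()
    # Generator: try to fool D into predicting 1 on generated samples.
    g_loss = bce(D(fake), torch.ones(batch, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    return d_loss.item(), g_loss.item()

print(gan_step(torch.randn(32, data_dim)))   # toy "real" batch just to exercise the step
```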
Generative Adversarial Networks
• GAN [Goodfellow et al, 2014] [Radford et al, 2015]
• Two networks contest (generator and discriminator)
• Produces images similar to those in the training data

DCGAN
48
Recent Development of GANs

Timeline:
• Jun. 2014: Generative Adversarial Networks (GANs). The architecture of a generator and a discriminator is first proposed by Goodfellow et al.
• Nov. 2014: Conditional GANs (CGANs). What if I want my GAN to generate data with specific attributes?
• Nov. 2015: Deep Convolutional GANs (DCGANs). As the first major improvement on the GAN architecture, DCGAN is more stable during training and generates higher quality samples.
• Nov. 2016: Image-to-Image Translation with CGAN (pix2pix). Let's build a generic CGAN architecture where the condition is an image, and learn to translate this image into another domain.
• Nov. 2017: PG-GAN/pix2pixHD. Nvidia: "Let's make some high-resolution GANs with multi-scale approaches."
• Sep. 2018: BigGAN. Google: "Hold my beer Nvidia, my interns can use thousands of TPUs to generate cheeseburgers."

49
Recent Development of GANs

High quality images generated by BigGAN


50
Reinforcement Learning

AlphaGO

AlphaStar

51
Supervised vs. Reinforcement
• Supervised
• Learning from teachers
• Labeled data
Next move: “5-5” Next move: “3-3”

• Reinforcement
• Learn from critics
Win!
Move 1 Move 2 …
Lose!
• AlphaGO
• Supervised learning + reinforcement learning

52
Reference and More Resources
ML courses from Hung-Yi Lee at NTU
• http://speech.ee.ntu.edu.tw/~tlkagk/courses.html
Stanford ML courses on Coursera
• https://www.coursera.org/learn/machine-learning
Tutorials on GANs
• NIPS 2016: https://arxiv.org/pdf/1701.00160.pdf
• CVPR 2018: https://sites.google.com/view/cvpr2018tutorialongans/
Toolkit tutorials
• Tensorflow: https://www.tensorflow.org/tutorials
• PyTorch: https://pytorch.org/tutorials/

53
Outline
Part I: Introduction to Machine Learning
• What is Machine Learning
• Taxonomy of Machine Learning
• Machine Learning Techniques
Part II: Machine Learning for Physical Design
• Motivations
• Opportunities
• Challenges and Future Directions

54
Modern VLSI Layouts

IBM Power7: 1.2B transistors | Apple A12: 6.9B transistors | Nvidia Xavier: 9B transistors

• Large scale: billions of transistors
• Complicated design flow
• Long design cycles

55
IC Design Flow – Silicon Compiler
(design flow; the fabless portion runs through mask synthesis)
System Specification
→ Architectural Design
→ Functional & Logic Design (e.g., Verilog: module test ... endmodule)
→ Logic Synthesis (AND/OR gate netlist)
→ Physical Design: Floorplanning → Placement → Clock Tree Synthesis → Signal Routing → Timing Closure → DFM Closure
   (with DRC / LVS / STA & Verification)
→ Mask Synthesis
→ Fabrication
→ Packaging and Testing
56


IC Design Flow – Silicon Compiler
(same design flow as the previous slide, with the manufacturing side expanded)
• Process Modeling: test patterns → lithography experiment → lithography modeling
• Mask Synthesis: layout/mask → mask optimization recipe → SRAF & OPC → lithography rule check → mask optimization
57


Placement Example

Optimization targets
• Satisfy timing constraints?
• Satisfy power constraints?
• 100% routable?
• Wirelength minimized?
• …
[Courtesy NTU team] 58
Routing Example

A zoom-in 3D view
[Courtesy samyzaf]

Challenging problem
• 10+ metal layers
• Millions of nets
• May be highly congested
• Minimize wirelength
[Courtesy Umich] 59
High Correlation between P&R

Courtesy semiwiki

60
Mask Synthesis
Design target

without OPC with OPC

Mask

Wafer

61
Mask Synthesis: OPC+SRAF

[Courtesy Semiengineering]

62
Mask Verification: Lithography
Simulation

Contact Mask

Aerial Image
(Light intensity map)

Resist Pattern

63
Mask Verification: Lithography
Hotspots

Litho Difficulty
Estimator: hotspots

[Courtesy Tech Design Forums] 64


Challenges for VLSI Design
• Long and complicated design flow
• Nested chicken-egg loops
• P&R, OPC&SRAF…
• Nearly all problems are NP-hard
• Stacking of metaheuristics
• High expectations for optimality
• Shoot for even 1% improvement
• Single iteration is expensive
• One iteration of backend flow may take days
• Require many iterations for convergence

65
How Machine Learning Can Help

Mask Mask
Placement Routing
Synthesis Verification

Hammers to tackle each step

66
How Machine Learning Can Help

Mask Mask
Placement Routing
Synthesis Verification

Bridges to connect each step

67
Placement

Mask Mask
Placement Routing
Synthesis Verification

68
VLSI Placement

Placement is critical to VLSI


design quality and design closure

Input
Gate-level netlist
Standard cell library
Output
Legal placement solution
Objective
Optimize wirelength,
routability, etc.

RePlAce
69
Nonlinear Placement Algorithm
Objective of nonlinear placement: minimize Wirelength + λ · Density

Challenges of Nonlinear Placement


Low efficiency
§ > 3h for 10M-cell design
Limited acceleration
§ Limited speedup, e.g., mPL, due to clustering
Huge development effort
§ > 1 year for ePlace/RePlAce
70
Advances in Deep Learning
Hardware/Software

[Courtesy Nvidia]

GPU

CPU

Over 60x speedup in neural network training since 2013

Deep learning toolkits

71
DREAMPlace Strategies
• We propose a novel analogy by casting the
nonlinear placement optimization into a neural
network training problem

• Greatly leverage deep learning hardware (GPU)


and software toolkit (e.g., PyTorch)

• Enable ultra-high parallelism and acceleration


while getting the state-of-the-art results

[Lin+,DAC’19]
72
Best paper award
Analogy between NN Training and
Placement

Train a neural network Solve a placement 73


Analogy between NN Training and
Placement
Casting the placement problem into neural network training

Train a neural network Solve a placement 74


DREAMPlace Architecture
Leverage highly optimized deep learning toolkit PyTorch

(architecture: a Python placement API on top of a nonlinear optimizer (Nesterov's method, matching RePlAce [TCAD'18, Cheng+]), automatic gradient computation, and C++/CUDA operators; the wirelength and density ops play the role of CONV and ReLU layers)

75
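A toy sketch of the analogy, not DREAMPlace code: cell coordinates are the trainable "parameters", wirelength plus a density penalty is the "loss", and a stock PyTorch optimizer drives the updates. The density surrogate below is a deliberate simplification of the actual electrostatic formulation, and the random 2-pin netlist is a placeholder.

```python
import torch

n_cells = 1000
pos = torch.nn.Parameter(torch.rand(n_cells, 2))        # cell (x, y), the "weights"
nets = torch.randint(0, n_cells, (5000, 2))              # toy 2-pin netlist (src, dst)
opt = torch.optim.Adam([pos], lr=0.01)                   # same optimizers as NN training

def wirelength(p):
    return (p[nets[:, 0]] - p[nets[:, 1]]).abs().sum()   # L1 wirelength proxy

def density(p, bins=16):
    # Soft, differentiable bin-occupancy surrogate; its variance penalizes crowding.
    grid = torch.linspace(0, 1, bins)
    occ_x = torch.exp(-((p[:, 0:1] - grid) ** 2) / 0.01).sum(0)
    occ_y = torch.exp(-((p[:, 1:2] - grid) ** 2) / 0.01).sum(0)
    return occ_x.var() + occ_y.var()

for step in range(200):
    opt.zero_grad()
    loss = wirelength(pos) + 10.0 * density(pos)         # objective = WL + lambda * density
    loss.backward()                                       # "backpropagation" gives the gradients
    opt.step()
```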
Experimental Results
DREAMPlace
• CPU: Intel E5-2698 v4 @2.20GHz
• GPU: 1 NVIDIA Tesla V100
• Single CPU thread was used
34x
speedup
RePlAce [TCAD’18,Cheng+]
• CPU: 24-core 3.0 GHz Intel Xeon
• 64GB memory allocated ISPD 2005 Benchmarks
200K~2M cells

Same quality of results!


43x
speedup

10M-cell design
finishes within 5min c.f. 3h Industrial Benchmarks
1M~10M cells

76
[Lin+,DAC’19]
Bigblue4 (2M-Cell Design)

Density Map Potential Map Field Map

Placement Metrics GPU Usage on Titan Xp

Code release: https://github.com/limbo018/DREAMPlace


77
Routing

Mask Mask
Placement Routing
Synthesis Verification

78
Routing Guidance: GeniusRoute
Routing for analog circuits, e.g., comparator
• Sensitive performance to clock routing
Existing manual layouts
• Hard to encode designer expertise into rules

Sweep the routing of the clock net 79


[Zhu+,ICCAD’19]
Routing Guidance: GeniusRoute
Learn from manual layouts
• Encode designer expertise into neural networks!
Generate routing guidance for critical nets
• Clock, power/ground, critical signal nets
• Probability map of routing

(figure: unrouted placement → generated routing region → routed layout)
80
[Zhu+,ICCAD’19]
Routing Guidance: GeniusRoute
• Adopt autoencoder for routing region generation
• Compare with routing from manual layouts

81
[Zhu+,ICCAD’19]
Routing Guidance: GeniusRoute
• Test on comparators and OTAs
• Evaluate with post layout simulation
• Compare with manual layout and previous methods

              Schematic  Manual  ICCAD'10  W/o guide  GeniusRoute
Offset (µV)   /          480     1230      2530       830
Delay (ps)    102        170     180       164        163
Noise (µVrms) 439.8      406.6   437.7     439.7      420.7
Power (µW)    13.45      16.98   17.19     16.82      16.80

Closest results to the manual layout

82
[Zhu+,ICCAD’19]
MAGICAL (Machine Generated Analog Layout) Overview

(MAGICAL flow: inputs (circuit netlist, design rules) → layout constraint extractor (pattern matching + small signal analysis) → device generator (parametric instances) → placer (performance-driven placement, post-placement optimization) → router (deep-learning-guided routing) → output GDSII layout; validation with Calibre® DRC/LVS/PEX and evaluation with Cadence® Virtuoso® ADE)
• MAGICAL 0.2 Version Open-source released in May 2019


• GitHub: https://github.com/magical-eda/MAGICAL
• End-to-end analog layout generation from netlist to GDSII
• No dependency on commercial tools
• Push-button, no-human-in-the-loop 83
Mask Synthesis

Mask Mask
Placement Routing
Synthesis Verification

84
Optical Proximity Correction
Issue: conventional OPC consumes a very long time
Goal: highly accurate correction in a shorter runtime

(figure: mask image vs. printed image at OPC iterations 0, 3, 5)
(figure: ratio of model-based OPC time, normalized to the 40nm node, vs. technology node)

Model-based OPC is the most widely used technique; a drastic reduction of OPC time is required!
85
Results on HBM-based OPC
(figure: RMSE (root mean square error, nm) on train/test sets vs. the number of iterations of model-based OPC; edge placement error (nm) compared with model-based OPC at iterations 2, 6, 10)

The number of iterations of model-based OPC can be reduced by 2x [Matsunawa+, SPIE'15]
86
GAN for OPC
• Generative adversarial networks for OPC
• [Yang+, DAC’18]

(figure: an encoder-decoder generator maps the target to a generated mask; the discriminator scores the generated target & mask against the reference target & mask, e.g., 0.2 vs. 0.8)

Compared with the ILT flow:
• EPE error: 9% reduction
• PV-band: 1% reduction
• Overall runtime: 2x less
87
SRAF Insertion
• SRAF (sub-resolution assist feature) insertion
• ML-based approach achieved similar result but 10x faster

(figure: target pattern with SRAFs; an SRAF box, OPC region, and SRAF region are defined around the target, and sub-sampling points are labeled SRAF label 0 or SRAF label 1)

[Xu+, ISPD’16, TCAD’17] 88


Comparison with Model-based SRAF
• Similar PV band and EPE
• But 10x faster

Can we do better?!

89
Image Translation with Generative
Adversarial Networks
Generative Adversarial Network (GAN) [Goodfellow+, 2014]
• Two neural networks contest (generator and discriminator)
• Produces images similar to those in the training data set
Conditional GAN (CGAN) for Image Translation [Isola+, CVPR’17]
• Takes an image in one domain and translate it to another one

90
SRAF Insertion & Image Translation
• What does SRAF generation have to do with Image
translation?!

• Can we define the problem as translating images from


the Target Domain ($D_T$) to the SRAF Domain ($D_S$)?

[Alawieh+,DAC’19] 91
Challenges for SRAF Insertion
• Layout images have sharp edges which pose a
challenge to GANs
• No guarantee of generating polygonal SRAF shapes
• Sharp edges can complicate gradient propagation
• Generated images ultimately need to be converted to layout format
• Images cannot be directly mapped to 'GDS' format
• Post-processing steps should not be time consuming
• Hence, a proper encoding is needed to address
these challenges!

92
Multi-Channel Heatmap Encoding
Key Idea: encode each type of object on a separate
channel in the image
• Channel index carries object description (type, size,...)
• Excitations on the channel carry objects location

Original Layout Encoded Layout


93
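A hedged sketch of the encoding idea (the channel names, canvas size, and Gaussian excitation are illustrative assumptions, not the paper's exact scheme).

```python
import numpy as np

H = W = 256
CHANNELS = {"contact": 0, "sraf": 1}     # one channel per object type (hypothetical names)

def encode(objects, sigma=2.0):
    """objects: list of (type, x, y) with x, y in [0, 1)."""
    img = np.zeros((len(CHANNELS), H, W), dtype=np.float32)
    yy, xx = np.mgrid[0:H, 0:W]
    for kind, x, y in objects:
        cx, cy = x * W, y * H
        # Gaussian "heat" excitation centered on the object's location, on its own channel.
        img[CHANNELS[kind]] += np.exp(-((xx - cx) ** 2 + (yy - cy) ** 2) / (2 * sigma ** 2))
    return img

layout = encode([("contact", 0.25, 0.5), ("sraf", 0.4, 0.5)])
print(layout.shape)   # (2, 256, 256): channel index carries the type, excitations carry locations
```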
CGAN Approach
Generator:
• Trained to produce images in $D_S$ based on input from $D_T$
Fake/Real

• Tries to fool the Discriminator Discriminator


Discriminator: Real
Diff
• Trained to detect ‘fakes’
generated by the Generator
The two networks are jointly Encoder Decoder

trained until convergence Generator Fake

Input

94
Results Decoding
Decoding the generated layout images in two steps
• Thresholding
• Excitation detection

Decoding scheme is fast ⟹ GPU accelerated

SRAF
location

Isolated
pixel

95
Sample Results TCAD’17: [Xu+, ISPD’16, TCAD’17], SVM-based
MB: Model-Based Approach - Calibre

TCAD’17 GAN-SRAF MB

Target Pattern MB SRAF LS_SVM Final

LS_SVM Prediction CGAN Final CGAN Prediction 96


Comparison Summary

No SRAF MB TCAD’17 CGAN

PV Band (um2) 0.00335 0.002845 0.00301 0.00291

EPE (nm) 3.9287 0.5270 0.5066 0.541

Run time (s) - 6910 700 48

• The proposed CGAN based approach can


achieve comparable results with TCAD’17 and MB
with 14.6X and 144X reduction in runtime

[Alawieh+,DAC’19] 97
Mask Verification

Mask Mask
Placement Routing
Synthesis Verification

98
Bottleneck in Semiconductor and IC
Manufacturing: Lithography

Layout Chip
• Lithographic modeling and hotspot detection
• What you see (at design) is NOT what you get (at fab)
• Hotspot ⟹ poor performance and yield
• Litho-simulations are extremely CPU intensive
• Need much faster algorithms to guide IC design, and
pinpoint possible mask/wafer inspection spots 99
Challenges in Lithography Modeling

Rigorous simulation: physics-based simulation, e.g., Synopsys S-Litho

Rigorous Simulation

Mask Layout Accurate but slow Resist Pattern

• Simulating 2 μm × 2 μm using Synopsys S-Litho ⟹ ~1 minute


• A 2 mm × 2 mm chip contains 1M such clips ⟹ 1.9 years!
• Intel Ivy Bridge 4C: 160 mm2

100
ML for Lithography Modeling

Rigorous simulation: physics-based simulation, e.g., Synopsys S-Litho

Rigorous Simulation

Mask Layout Accurate but slow Resist Pattern

Machine learning for resist modeling [Watanabe+, SPIE’17] [Shim+, SPIE’17]

Optical Machine Threshold


Model Learning Processing

Mask Layout Aerial Image Slicing Threshold Resist Pattern

Speeds up the resist modeling stage


101
VLSI Technology Nodes

[electronicsforu, 2017] 102


Transfer Learning for Lithography
Modeling

Training with limited new tech. data + older tech.

10nm Data 10nm Litho


Model
Training
Large amount of data Knowledge
Transfer

7nm Data 7nm Litho


Model
Training
Small amount of data
Good accuracy for very
[Lin+, TCAD’19]
small data, CNN 103
Transfer Learning with ResNet
Transfer and Fix TFk Scheme

Source Domain Target Domain


Input Input

conv conv

conv Knowledge conv Fix k layers


Transfer

··· ···

conv conv
Finetune
fc fc

fc fc

Source Domain Target Domain


Output Output 104
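A hedged PyTorch sketch of the transfer-and-fix idea (the helper name and the assumption that convolution layers appear in transfer order are ours): copy the source-domain weights, freeze the first k conv layers, and fine-tune the rest on the small target-domain dataset.

```python
import torch.nn as nn

def transfer_and_fix(model, k):
    """TFk-style sketch: freeze the first k conv layers, return the parameters left to fine-tune."""
    conv_layers = [m for m in model.modules() if isinstance(m, nn.Conv2d)]
    for conv in conv_layers[:k]:                  # fix k layers
        for p in conv.parameters():
            p.requires_grad = False
    # Only the unfrozen parameters are handed to the optimizer for fine-tuning.
    return [p for p in model.parameters() if p.requires_grad]

# Usage (hypothetical model/optimizer names):
#   trainable = transfer_and_fix(pretrained_n10_model, k=2)
#   optimizer = torch.optim.Adam(trainable, lr=1e-4)
```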
Technology Transition from N10 to N7

Contact Layer Design Rules [Liebmann, SPIE'15]

                    N10     N7
Patterning          LELE    LELELE
Pitch (nm)          64      45
Mask pitch (nm)     128     135
Litho-target (nm)   60      60

                    N10     N7a     N7b
Design Rule         A       B       B
Optical Source      A       B       B
Resist Material     A       A       B

Optical Sources: N10 vs. N7 | Resist Materials: Resist A vs. Resist B (different dissolution slopes)
105


Data Reduction from Knowledge
Transfer

(figure: training data (%) needed vs. CD RMS error (nm), comparing CNN, CNN TF0, and ResNet TF0)

From N10 to N7b: 2~10X reduction of training data (e.g., 2.6X for CNN; 2X~4X at tighter error targets)
From N7a to N7b: 8~20X reduction of training data (e.g., 8X for CNN; 10X~20X at tighter error targets)
106
End-to-End Lithography Modeling

Rigorous simulation: physics-based simulation, e.g., Synopsys S-Litho

Rigorous Simulation

Mask Layout Accurate but slow Resist Pattern

Machine learning for end-to-end lithography modeling

Machine Learning

Mask Layout Resist Pattern


Goal: ultimately fast modeling
[Ye+,DAC’19]
107
Best paper nomination
Image Translation for Lithography
Modeling

(figure: a 1 µm x 1 µm mask clip and its 128 nm x 128 nm resist pattern, both rendered as 256 px x 256 px images; expensive litho simulation vs. fast image translation)

Different elements are encoded on different image channels.
The resist pattern is zoomed in for high resolution/accuracy.
108
LithoGAN Architecture
Generator

Encoder Decoder

Discriminator CNN for center prediction


109
LithoGAN Visualization

Model advancement progress

110
Experimental Results

Setup
• Python w/ TensorFlow
• 3.3GHz Intel i9 CPU & Nvidia TITAN Xp GPU

Datasets
• Different types of contact arrays [Lin+, TCAD’18]
• 982 mask clips at 10nm node (N10)
• 979 mask clips at 7nm node (N7)
• 75-25 rule for train/test split

Methods
• Rigorous sim using S-Litho: golden resist patterns
• [Lin+, TCAD'18]: optical sim using Calibre + threshold prediction using CNN + post processing
  (Mask Layout → Optical Model → Aerial Image → Slicing → Machine Learning → Threshold → Threshold Processing → Resist Pattern)

Compelling runtime speedup for early technology exploration


[Ye+,DAC’19] 111
Experimental Results

(figure: predicted contour vs. golden contour, with their intersection, union, and the edge displacement error (EDE) of the bounding boxes)

Accuracy measures
Edge Displacement Error (EDE)
• Distance between the golden edge and the predicted one of the bounding boxes
• The smaller, the better; captures bounding box mismatch
IOU = Intersection / Union
• The larger, the better; captures contour mismatch

Competent accuracy for lithography usage (in consultation with industry)
[Ye+,DAC’19] 112
Lithography Hotspot Detection

Pattern/Graph Matching (pros/cons):
• Accurate and fast for known patterns
• But too many possible patterns to enumerate
• Sensitive to changing manufacturing conditions

Data Mining/ML (pros/cons):
• Good at detecting unknown or unseen hotspots
• Trade-off between accuracy and false alarms
• Accuracy may not be good for "seen" patterns

113
PM & ML for Hotspot Detection

Pattern Matching Methods Machine Learning Methods


Good for detecting previously Good for detecting new/previously
known types of hotspots unknown hotspots

A New Unified Formulation (EPIC)


Good for detecting all types of hotspots
with good accuracy/false-alarm
tradeoff (Meta-Classifier)

• Meta-classification combines the strength of different


hotspot detection techniques [Ding+, ASPDAC’12]
• Balance accuracy and false-alarm
114
Application-Specific ML

Different ways of feature extractions/representations

$\begin{bmatrix} C_{11,k} & C_{12,k} & C_{13,k} & \cdots & C_{1n,k} \\ C_{21,k} & C_{22,k} & C_{23,k} & \cdots & C_{2n,k} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ C_{n1,k} & C_{n2,k} & C_{n3,k} & \cdots & C_{nn,k} \end{bmatrix}
\;\cdots\;
\begin{bmatrix} C_{11,1} & C_{12,1} & C_{13,1} & \cdots & C_{1n,1} \\ C_{21,1} & C_{22,1} & C_{23,1} & \cdots & C_{2n,1} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ C_{n1,1} & C_{n2,1} & C_{n3,1} & \cdots & C_{nn,1} \end{bmatrix}$

(DCT encoding: the layout clip is divided into an n x n grid, yielding k coefficient maps $C_{ij,1..k}$ per clip)

Deep learning: CNN, … [Yang+, TCAD’19] 115
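A hedged SciPy sketch of DCT-based feature encoding in this spirit (the block grid, coefficient count, and the simple row-major truncation are simplifying assumptions; a zigzag low-frequency scan is common in practice).

```python
import numpy as np
from scipy.fftpack import dctn

def dct_features(clip, n=12, k=32):
    """Divide a layout clip into n x n blocks, take a 2-D DCT per block, keep k coefficients each."""
    h, w = clip.shape
    bh, bw = h // n, w // n
    feats = []
    for i in range(n):
        for j in range(n):
            block = clip[i * bh:(i + 1) * bh, j * bw:(j + 1) * bw]
            coeffs = dctn(block, norm="ortho").ravel()
            feats.append(coeffs[:k])              # low-order coefficients C_{ij,1..k}
    return np.concatenate(feats)

clip = (np.random.rand(120, 120) > 0.5).astype(np.float32)   # stand-in layout clip
print(dct_features(clip).shape)                               # (n*n*k,) feature vector
```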


Recent Progress – Lack of Data
• Self-paced semi-supervised learning
• [Chen+, ASPDAC’19]
• Leverage unlabeled data
• Multi-task learning
(figure: a shared stack of conv/max-pool layers feeds two heads: a clustering task trained with a pairwise contrastive loss on $\hat{z}_i$, and a classification task trained with a weighted softmax cross-entropy loss on $\hat{y}_i$)
116
Preliminary Results [Chen+, ASPDAC’19]
Better accuracy with small amount of training data
(figure: accuracy of the methods labeled DAC and SSL as the amount of training data varies)

117
Recent Progress: Model Confidence
Examine confidence of model predictions
• [Ye+, DATE’19]
• Gaussian Process provides confidence level
• Active learning: build accurate GP model with little data
• Use GP model: simulate unconfident samples

Flow (active learning with a Gaussian Process): from the clip pool, select samples, run simulation, and train the GPR model; then predict on the remaining clips with GPR. Confident predictions (e.g., p(hotspot) ≈ 98%) go directly to the hotspot/non-hotspot output; unconfident ones (e.g., p(hotspot) ≈ 65%) are sent to simulation and added to the training set until no new hotspots are detected.
118
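A hedged scikit-learn sketch of the confidence-driven loop (the features, labels, and confidence thresholds are synthetic stand-ins for real clip features and a lithography simulator, not the paper's setup).

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)
features = rng.normal(size=(500, 8))                              # stand-in clip features
simulate = lambda X: (X[:, 0] + 0.2 * rng.normal(size=len(X)) > 0).astype(int)  # fake litho labels

labeled_idx = rng.choice(500, size=50, replace=False)             # small initial training set
gp = GaussianProcessClassifier(kernel=RBF(1.0)).fit(features[labeled_idx],
                                                    simulate(features[labeled_idx]))
proba = gp.predict_proba(features)[:, 1]

confident = (proba > 0.95) | (proba < 0.05)       # e.g. p(hotspot) ~ 98% -> trust the model
unconfident_idx = np.where(~confident)[0]         # these clips would go to litho simulation
print(f"{confident.sum()} clips decided by the model, "
      f"{len(unconfident_idx)} sent to simulation and added to the training set")
```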


Preliminary Results [Ye+, DATE’19]
• Larger 𝛼 implies a large confidence interval
• Confidence interval: $[\mu - \alpha\sigma, \mu + \alpha\sigma]$
• Trade-off: quality and #simulations required

(figure: for Layout 2 and Layout 3, accuracy (%), number of false alarms, and #simulations (%) as α increases over 0.382, 0.682, 0.866)
119
How Machine Learning Can Help

Mask Mask
Placement Routing
Synthesis Verification

Bridges to connect each step

120
Bridge Placement and Routing

Mask Mask
Placement Routing
Synthesis Verification

121
Predicting Routing Congestion
Given placement, predict
• Routing utilization map [Yu+,DAC’19] for FPGA
• DRC hotspots [Chan+,ISPD’17] [Xie+,ICCAD’18] for ASIC

Input FPGA Placement Output Routing Util Ground Truth


https://ycunxi.github.io/cunxiyu/dac19_demo.html

122
Predicting Routing Congestion
Given placement, predict
• Routing utilization map [Yu+,DAC’19] for FPGA
• DRC hotspots [Chan+,ISPD’17] [Xie+,ICCAD’18] for ASIC

[Chan+,ISPD’17]
123
Challenges in Predicting Congestion
Feature representation
• How to encode interconnect information
• Scalability of features
Model selection
• SVM, ANN, FCN, GAN, etc.

Different possibility of routing trees [Spindler+,DATE’07] Connectivity images


Evenly distribute routing demands (RUDY map) [Yu+,DAC’19]
124
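A hedged NumPy sketch of a RUDY-style demand map in the spirit of [Spindler+,DATE'07] (the grid size and net format are our assumptions): each net's routing demand is spread uniformly over its bounding box, producing an image-like input for a congestion prediction model.

```python
import numpy as np

def rudy_map(nets, grid=64):
    """nets: list of pin arrays of shape (num_pins, 2) with coordinates in [0, 1)."""
    demand = np.zeros((grid, grid))
    for pins in nets:
        x0, y0 = pins.min(axis=0)
        x1, y1 = pins.max(axis=0)
        w, h = max(x1 - x0, 1e-6), max(y1 - y0, 1e-6)
        density = (w + h) / (w * h)               # HPWL divided by bounding-box area
        gx0, gy0 = int(x0 * grid), int(y0 * grid)
        gx1 = max(int(np.ceil(x1 * grid)), gx0 + 1)
        gy1 = max(int(np.ceil(y1 * grid)), gy0 + 1)
        demand[gy0:gy1, gx0:gx1] += density       # spread the demand evenly over the box
    return demand

nets = [np.random.rand(4, 2) for _ in range(1000)]   # toy 4-pin nets
print(rudy_map(nets).shape)                           # (64, 64) congestion feature map
```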
Integrate Models into Placement
• Up to 76.8% DRC reduction
• Minor WL and timing impacts
[Chan+,ISPD’17]
        #DRCs                      Wirelength                     TNS (ns)
eg1     8478 → 1964  (-76.83%)     1742804 → 1747685  (0.3%)      -153.43 → -158.4  (3.2%)
eg2     1502 → 927   (-38.28%)     1750698 → 1753047  (0.1%)      -168.23 → -163.5  (-2.8%)
eg3     2017 → 1819  (-9.82%)      1772889 → 1773701  (0.0%)      -215.75 → -213.6  (-1.0%)
eg4     2026 → 1780  (-12.14%)     1735185 → 1735227  (0.0%)      -151.36 → -149.6  (-1.2%)
eg5     4252 → 4255  (0.07%)       1831492 → 1836060  (0.2%)      -264.34 → -275.6  (4.3%)
eg6     3440 → 3891  (13.11%)      1790059 → 1794184  (0.2%)      -195.65 → -203.5  (4.0%)
avg: -20.6% (#DRCs), 0.2% (wirelength), 1.1% (TNS)
max: 13.1%, 0.3%, 4.3%; min: -76.8%, 0.0%, -2.8%
125
Integrate Models into Placement
• Recent study from UTDA
• On 2016 ISPD FPGA Placement contest
benchmarks
• Up to 7% reduction in routed wirelength

Placement
Iterations

Congestion
Cell Inflation
Prediction
126
Bridge Design and Manufacturing

Mask Mask
Placement Routing
Synthesis Verification

127
Lithography-Friendly Detailed Router:
AENEID
• Motivation: avoid RET dependent layout
printability

• Challenges: the lithography hotspot detection


dilemma in the detailed routing stage

[Ding+,DAC’11]
128
Lithography-Friendly Detailed Router:
AENEID
• Objective: minimize total wirelength
• Subject to: keep lithography cost on each routing
grid within a threshold
Original: $\min_P \ \sum_{e \in P} 1, \quad \text{s.t. } \mathrm{litho}(e) \le L, \ \forall e \in P$

Lagrangian relaxation: $\max_{\lambda} \min_P \ \sum_{e \in P} \big(1 + \lambda_e (\mathrm{litho}(e) - L)\big), \quad \text{s.t. } \lambda_e \ge 0$

50% reduction
in hotspots

ANN SVM
[Ding+,DAC’11] 129
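A hedged sketch of the multiplier update implied by the relaxation (step size and costs are toy values of ours, not AENEID's implementation): multipliers on edges whose lithography cost exceeds the threshold L grow by a subgradient step, so subsequent shortest-path routing avoids those edges.

```python
import numpy as np

def update_multipliers(lam, litho_cost, L, step=0.1):
    lam = lam + step * (litho_cost - L)      # subgradient step on the relaxed objective
    return np.maximum(lam, 0.0)              # multipliers stay non-negative

def edge_weights(lam, litho_cost, L):
    return 1.0 + lam * (litho_cost - L)      # weight used by the router's path search

litho_cost = np.array([0.2, 0.9, 1.4])       # per-edge litho cost (toy numbers)
lam = np.zeros(3)
for _ in range(20):
    lam = update_multipliers(lam, litho_cost, L=1.0)
print(lam, edge_weights(lam, litho_cost, L=1.0))   # only the hotspot-prone edge gets penalized
```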
Conclusion
• Machine learning brings
• New modeling opportunities
• New optimization techniques
• New hardware acceleration, e.g., GPU acceleration
• New software platforms, e.g., Tensorflow, PyTorch
• Hammers and bridges for conventional EDA flow
• Reformulate the problems, e.g., neural network training,
image-to-image translation
• Accurate and efficient information feedback from late
stages to early stages

130
Open Problems
• Connectivity feature representation
• How to encode hypergraph as input to models
• Or develop models that are friendly to ultra-large
graphs
• Optimization-friendly ML models and ML-friendly
optimization techniques
• Target at integrating ML models into optimization
• Need to consider fidelity, smoothness, accuracy,
convergence rate
• Generalization guarantee
• How far can ML models generalize
• How to know whether a model is applicable to new data

131
Future Directions
• Routability & Timing-driven DREAMPlace with
deep learning toolkits and GPU acceleration
• Integrate ML models for routability and timing prediction
• Reinforcement learning for routing strategies
• Rip-up and re-route policy
• Tree topology generation
• End-to-end mask synthesis and verification
• Use LithoGAN to guide SRAF and OPC
• No need to generate golden SRAF and OPC solutions
• Open discussions…

132
Q&A
Thanks

133
Recent Development of Placement

*Data collected from RePlAce [TCAD'18, Cheng+] and http://vlsi-cuda.ucsd.edu/~ljw/ePlace/ on ISPD 2005 benchmarks
Future Directions

A BIGGER DREAM:
• New Solvers: SGD, ADAM, etc.
• New Objectives: routability, timing, etc.
• New Accelerations: multi-GPU, distributed computing, mixed precision
• Applicable to Other CAD Problems: gate sizing, floorplanning, ...

135
