
Machine Learning and Its

Applications in IC Physical Design


David Z. Pan Yibo Lin
University of Texas at Austin Peking University
http://users.ece.utexas.edu/~dpan/ http://yibolin.com/

1
Outline
Part I: Introduction to Machine Learning
• What is Machine Learning
• Taxonomy of Machine Learning
• Machine Learning Techniques
Part II: Machine Learning for Physical Design
• Motivations
• Opportunities
• Challenges and Future Directions

2
A Bit History

Courtesy Nvidia3
Enormous Opportunities for
Machine Learning

Automotive Appliances

Healthcare
Entertainment 4
Enormous Opportunities for
Machine Learning

Automotive Appliances

Healthcare
Trash Classification Entertainment 5
Machine Learning ≈ Looking for a
Function
Speech recognition: f(audio) = "How are you"
Playing Go: f(board state) = "5-5" (next move)
Image recognition: f(image) = "Cat"
Dialogue system: f("Hi", what the user said) = "Hello" (system response)
6
Machine Learning Framework

TRAINING
• Start from a set of candidate functions {f1, f2, ..., f8} (our model family)
• Evaluate the goodness of each function on labeled data ("dog", "cat", "dog")
• Pick the BEST function f*

TESTING
• Use f* (our model) to label new data, e.g., "cat"

7
Taxonomy of Machine Learning
Unsupervised Learning | Supervised Learning | Semi-supervised Learning (three learning scenarios)

Unsupervised Learning: only inputs x; no labels/outputs y
Supervised Learning: training data with inputs x and labels/outputs y
Semi-supervised Learning: only subsets of the data have labels/outputs y

8
Based on [Boning@MIT2019]
Taxonomy of Machine Learning
Unsupervised Learning | Supervised Learning | Semi-supervised Learning
Unsupervised Learning:
• Clustering (e.g., K-means)
• Dimensionality reduction (e.g., PCA)
• Kernel methods
• Density estimation

9
Taxonomy of Machine Learning
Unsupervised Learning | Supervised Learning | Semi-supervised Learning
Semi-supervised Learning scenarios:
• Sampling
• Search and optimization
• Active learning
• Reinforcement learning
• Online learning
• Transfer learning
• Adversarial learning

10
Taxonomy of Machine Learning
Unsupervised Learning | Supervised Learning | Semi-supervised Learning
Semi-supervised Learning scenarios:
• Sampling
• Search and optimization (e.g., Genetic Algorithm, Bayesian Optimization)
• Active learning
• Reinforcement learning
• Online learning
• Transfer learning
• Adversarial learning

11
Taxonomy of Machine Learning
Unsupervised Learning | Supervised Learning | Semi-supervised Learning
Supervised Learning:
• Classification: logistic, SVM, trees/random forest, NN
• Regression (parametric): OLS, MARS, PCR, ridge, ...
• Instance-based (nonparametric): Gaussian process
• Dimensionality reduction: latent variable, sparse coding, matrix completion
• Generative (density estimation): Bayesian, MCMC
• Discriminative: deep NN (convolutional, recurrent)

15
Taxonomy of Machine Learning
Unsupervised Learning | Supervised Learning | Semi-supervised Learning
(same supervised-learning taxonomy as the previous slide, highlighting parametric vs. nonparametric)

Parametric methods: model complexity is fixed (it does not depend on the number of
training samples; it may depend on hyperparameters and/or regularization)
Nonparametric methods: model complexity depends on the training samples

16
Taxonomy of Machine Learning
Unsupervised Learning | Supervised Learning | Semi-supervised Learning
(same supervised-learning taxonomy as the previous slides, highlighting generative vs. discriminative)

Generative adversarial: combines generative and discriminative learners, e.g., GAN

Generative methods: learn $p(x, y)$ and/or $p(x \mid y)$ to reason about $p(y \mid x)$
Discriminative methods: learn $p(y \mid x)$ relationships directly

17
Before Going too Technical…

18
New Job Opportunities: AI Trainer

• Machines can learn by themselves


• Why do we need AI trainers/engineers?

• Pokémon can fight by themselves. Why do
they need Pokémon trainers?

Based on [Lee@NTU2017] 19
To be Serious, Tasks for AI Trainer
• Proper model
• Define the set of functions
• Proper loss function
• Not always easy to find the best function
• Optimality? E.g., Deep learning
• Require experienced engineers

Step 1: Define a set of functions
Step 2: Evaluate goodness of functions
Step 3: Pick the best function
Based on [Lee@NTU2017] 20
Taxonomy of Machine Learning

Unsupervised Learning
• Clustering
• Dimensionality reduction
• Kernel methods
• Density estimation

Supervised Learning
• Classification
• Regression (parametric)
• Instance-based (nonparametric)
• Dimensionality reduction
• Generative (density estimation)
• Discriminative

Semi-supervised Learning
• Sampling
• Search and optimization
• Active learning
• Reinforcement learning
• Online learning
• Transfer learning
• Adversarial learning

21
Based on [Boning@MIT2019]
Linear Models
• Linear regression
• $f(x) = w^\top x + b = \sum_{i=1}^{N} w_i x_i + b$
• Logistic classification
• $P(y = 1 \mid x) = \dfrac{1}{1 + e^{-(\sum_{i=1}^{N} w_i x_i + b)}}$

22
Linear Models – Training
• Ordinary Least Squares (OLS)
• Minimizes the mean square error over the 𝑛 data points
• Direct solution is possible
$\min_{w,b} \ \frac{1}{n} \sum_{i=1}^{n} \left( w^\top x^{(i)} + b - y^{(i)} \right)^2$

• Ridge Regression
• Regularization by penalizing large weights
$\min_{w,b} \ \frac{1}{n} \sum_{i=1}^{n} \left( w^\top x^{(i)} + b - y^{(i)} \right)^2 + \lambda \lVert w \rVert^2$

• Stochastic gradient descent (SGD)


• Pick a data point 𝑖 (or a batch) at random for iterative
model updates, same idea as neural networks
23
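A minimal NumPy sketch of these training options (our illustration; function names and data layout are placeholders, not from the tutorial): closed-form OLS and ridge, plus one SGD step on the squared loss.

```python
import numpy as np

def fit_ols(X, y):
    """Closed-form OLS: append a bias column and solve the least-squares problem."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    wb, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    return wb[:-1], wb[-1]                      # weights, bias

def fit_ridge(X, y, lam=1e-2):
    """Closed-form ridge: penalize large weights (the bias is also penalized here for brevity)."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    A = Xb.T @ Xb + lam * np.eye(Xb.shape[1])
    wb = np.linalg.solve(A, Xb.T @ y)
    return wb[:-1], wb[-1]

def sgd_step(w, b, x_i, y_i, lr=1e-3):
    """One stochastic gradient step on (w^T x_i + b - y_i)^2, the same idea used for neural networks."""
    err = w @ x_i + b - y_i
    return w - lr * 2 * err * x_i, b - lr * 2 * err
```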
Feature Space
• Mapping from input space to feature space
$\boldsymbol{x} = [x_1, x_2, \cdots, x_N] \mapsto \phi(\boldsymbol{x}) = [\phi(x_1), \phi(x_2), \cdots, \phi(x_N)]$

Attributes Features

Input Space Feature Space


• Nonlinear features under a linear model
• $f(x) = \sum_{i=1}^{M} w_i \phi_i(x) + b$

24
Support Vector Machine
• A learning method that uses a hypothesis space of
linear functions in a high dimensional feature
space trained with an optimization based learning
algorithm
$x_i$: feature vector
$y_i$: label in {-1, 1}
Consider $\hat{y}_i = w^\top x_i - b$

Objective
$\min \ \frac{1}{2} \lVert w \rVert^2$

Subject to
$(w^\top x_i - b)\, y_i \ge 1$
25
Support Vector Machine
• Extend to a high-dimensional feature space
through the (nonlinear) kernel
• Data can be linearly separable in feature space

26
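As an illustrative aside (not code from the tutorial), scikit-learn's SVC makes the linear vs. kernel contrast concrete on a toy dataset that is not linearly separable in the input space.

```python
from sklearn.svm import SVC
from sklearn.datasets import make_circles

# Concentric circles: not linearly separable in the input space.
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

linear_svm = SVC(kernel="linear").fit(X, y)
rbf_svm = SVC(kernel="rbf", gamma=2.0).fit(X, y)   # implicit high-dimensional feature space

print("linear kernel accuracy:", linear_svm.score(X, y))  # poor
print("RBF kernel accuracy:", rbf_svm.score(X, y))        # near 1.0
```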
Decision Trees for Classification
Partition the input space and fit very simple models
to predict the output in each partition.
• Split nodes: select the “best” feature and value to split
• “Best”: usually defined in terms of some notion of
“impurity” in the resulting partition of training data.
Typical impurity measures include misclassification
error, Gini index, or entropy

27
Probability of kyphosis after surgery [wikipedia]
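A hedged scikit-learn sketch of impurity-based splitting (the dataset choice here is ours, not the kyphosis example): each split greedily picks the feature and threshold that most reduce the chosen impurity measure.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
for impurity in ("gini", "entropy"):
    # Shallow tree: each internal node stores the "best" feature/threshold split.
    tree = DecisionTreeClassifier(criterion=impurity, max_depth=3, random_state=0).fit(X, y)
    print(impurity, "training accuracy:", round(tree.score(X, y), 3))
```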
K-Means Clustering
Unsupervised learning
• Only have feature 𝑥, no label 𝑦
Target: identify natural groups of data
• $\min \ \sum_{i=1}^{k} \sum_{x \in C_i} \lVert x - \mu_i \rVert^2$

Iterative algorithm
Randomly set cluster centers $m_1, \ldots, m_k$;
While not converged:
• Use these centers to assign each data point to the nearest cluster;
• Adjust centers $m_i$ based on those memberships;
29
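A compact NumPy sketch of the iterative algorithm above (illustrative only; initialization and stopping rule are simple choices of ours).

```python
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]      # random initial centers
    for _ in range(n_iters):
        # Assign each point to the nearest center.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=-1)
        labels = dists.argmin(axis=1)
        # Adjust each center to the mean of its members (keep old center if a cluster is empty).
        new_centers = np.array([X[labels == i].mean(axis=0) if np.any(labels == i) else centers[i]
                                for i in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels
```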
Deep Learning

Deep Learning for IoT Big Data and Streaming Analytics: A Survey 30
Neural Networks
• Multilayer perceptron (MLP)

𝒇 𝒙 = 𝝈𝟑 (𝑾𝟑 𝝈𝟐 (𝑾𝟐 𝝈𝟏 (𝑾𝟏 𝒙)))


𝝈𝒊 : activation function

𝑾𝟏 𝑾𝟐 𝑾𝟑
Neurons

31
Neural Networks
• Multilayer perceptron (MLP)

𝒇 𝒙 = 𝝈𝟑 (𝑾𝟑 𝝈𝟐 (𝑾𝟐 𝝈𝟏 (𝑾𝟏 𝒙)))


𝝈𝒊 : activation function

Rectified linear unit (ReLU): $f(x) = \begin{cases} 0, & x < 0 \\ x, & x \ge 0 \end{cases}$

Logistic (Sigmoid): $f(x) = \dfrac{1}{1 + e^{-x}}$
32
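For concreteness, a hedged PyTorch sketch of such an MLP (the layer sizes are arbitrary choices, not values from the slides).

```python
import torch
import torch.nn as nn

# Three-layer MLP f(x) = sigma3(W3 sigma2(W2 sigma1(W1 x))).
mlp = nn.Sequential(
    nn.Linear(16, 64), nn.ReLU(),      # sigma1 = ReLU
    nn.Linear(64, 64), nn.ReLU(),      # sigma2 = ReLU
    nn.Linear(64, 1), nn.Sigmoid(),    # sigma3 = logistic, e.g. for a binary output
)

x = torch.randn(8, 16)                 # batch of 8 feature vectors
print(mlp(x).shape)                    # torch.Size([8, 1])
```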
A Bit History on Deep Learning

https://beamandrew.github.io/deeplearning
/2017/02/23/deep_learning_101_part1.html 33
A Bit History on Deep Learning
• “Neural networks” → “Deep learning” in 2006
• Geoffrey Hinton
• Breakthrough in 2012
• Large scale visual recognition challenge (LSVRC)
• ImageNet

https://beamandrew.github.io/
deeplearning/2017/02/23/dee
p_learning_101_part1.html 34
How to Train a Neural Network
• Objective: $\min \ \sum_i \mathrm{Loss}(f(x_i), y_i)$
• Gradient descent
• Backpropagation

Example for the first hidden neuron: $y_1 = f_1(w_{(x_1)1} x_1 + w_{(x_2)1} x_2)$
(figure: inputs $x_1, x_2$ feed the hidden neurons $f_i(e)$, weighted by $w$, up to the output)

Forward propagation
35
How to Train a Neural Network
• Gradient descent
• Loss function: 𝐿(𝑧)
• Backpropagation
• Chain rule

Forward pass: $z = f(x, y)$

Backward pass (chain rule):
$\dfrac{dL}{dx} = \dfrac{dL}{dz}\dfrac{dz}{dx}, \qquad \dfrac{dL}{dy} = \dfrac{dL}{dz}\dfrac{dz}{dy}$

36
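A tiny PyTorch check of the chain rule above (our example; f(x, y) = x*y and L = z^2 are arbitrary choices).

```python
import torch

# z = f(x, y) = x * y, L = z**2, so dL/dx = 2*z*y and dL/dy = 2*z*x.
x = torch.tensor(3.0, requires_grad=True)
y = torch.tensor(-2.0, requires_grad=True)
z = x * y            # forward pass
L = z ** 2           # loss
L.backward()         # backward pass applies dL/dx = dL/dz * dz/dx automatically
print(x.grad, y.grad)   # tensor(24.), tensor(-36.)
```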
How to Train a Neural Network
• Gradient descent
• Backpropagation
• Non-linear optimization

(figure: loss landscape over weight $w$; gradient descent may stop at a local cost minimum rather than the global cost minimum)
37
Why Deep? Universality Theorem
• Any function $f: \mathbb{R}^N \to \mathbb{R}^M$
• Can be realized by a neural network with one
hidden layer
• (given enough hidden neurons)

Why DEEP neural networks


but not FAT neural networks?
Input layer Output layer
Hidden layer
Reference for the reason:
http://neuralnetworksandde
eplearning.com/chap4.html 38
Why Deep? Analogy
Logic Circuits:
• Consist of logic gates
• Two layers of logic gates can represent any Boolean function
• Use multiple layers of logic gates to build functions in a much simpler way (fewer gates required)

Neural Networks:
• Consist of neurons
• A single-hidden-layer network can represent any continuous function
• Use multiple layers to represent functions in a much simpler way (fewer parameters, less data?)

More reason:
https://www.youtube.com/watch?v=Xs
C9byQkUH8&list=PLJV_el3uVTsPy9oCR
Y30oBPNLCo89yu49&index=13 39
Why Deep? NOT Fat
• Occam's razor: entities should not be multiplied without necessity
• The simplest solution is most likely the right one
• Fewer parameters, better generalization
• Deep + thin vs. short + fat
• Same number of parameters
• Deep networks tend to generalize better
• Require less training data
#Layers x #Neurons | Error Rate (%)      1 x #Neurons | Error Rate (%)
5 x 2k             | 17.2                1 x 3772     | 22.5
7 x 2k             | 17.1                1 x 4634     | 22.4

Seide, Frank, Gang Li, and Dong Yu. "Conversational speech transcription using context-dependent deep
40
neural networks." Twelfth annual conference of the international speech communication association. 2011.
How Deep?

(ImageNet top-5 error vs. network depth)
5 layers: top-5 err. 16.4

ResNet depth | top-5 err.
34           | 7.4
101          | 6.0
152          | 5.7

41
More Flavors…
Convolutional Neural Networks
Convolutional layer
• Suitable to images
• Correlation with neighboring pixels

42
More Flavors…
Highway Networks
Residual Network Highway Network

copy Gate
controller copy

Deep Residual Learning for Image Recognition: http://arxiv.org/abs/1512.03385
Training Very Deep Networks: https://arxiv.org/pdf/1507.06228v2.pdf
43
More Flavors…
Random Networks

Exploring Randomly Wired Neural Networks for Image Recognition
https://arxiv.org/pdf/1904.01569.pdf
44
Generative Adversarial Networks
• GAN [Goodfellow et al, 2014] [Radford et al, 2015]
• Two networks contest (generator and discriminator)
• Produces images similar to those in the training data

45
Generative Adversarial Networks
• GAN [Goodfellow et al, 2014] [Radford et al, 2015]
• Two networks contest (generator and discriminator)
• Produces images similar to those in the training data

Real samples

Is D
D
Correct?

Latent
Space G
Generated
Fake Samples 46
Noise Fine Tune Training
Generative Adversarial Networks
• GAN [Goodfellow et al, 2014] [Radford et al, 2015]
• Two networks contest (generator and discriminator)
• Produces images similar to those in the training data

$\min_G \max_D \ \mathbb{E}_{x \sim p_{data}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]$
(loss for real samples + loss for generated samples)

Real samples

Is D
D
Correct?

Latent
Space G
Generated
Fake Samples 47
Noise Fine Tune Training
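A minimal PyTorch sketch of one training step under this minimax loss (network sizes and the "real" data are placeholders, not the tutorial's code).

```python
import torch
import torch.nn as nn

latent_dim, data_dim = 16, 2
G = nn.Sequential(nn.Linear(latent_dim, 64), nn.ReLU(), nn.Linear(64, data_dim))
D = nn.Sequential(nn.Linear(data_dim, 64), nn.ReLU(), nn.Linear(64, 1), nn.Sigmoid())
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCELoss()

def gan_step(real):
    batch = real.size(0)
    fake = G(torch.randn(batch, latent_dim))
    # Discriminator: push real samples toward 1 and generated samples toward 0.
    d_loss = bce(D(real), torch.ones(batch, 1)) + bce(D(fake.detach()), torch.zeros(batch, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()
    # Generator: try to fool D into predicting 1 on generated samples.
    g_loss = bce(D(fake), torch.ones(batch, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    return d_loss.item(), g_loss.item()

print(gan_step(torch.randn(32, data_dim)))   # toy "real" batch just to exercise the step
```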
Generative Adversarial Networks
• GAN [Goodfellow et al, 2014] [Radford et al, 2015]
• Two networks contest (generator and discriminator)
• Produces images similar to those in the training data

DCGAN
48
Recent Development of GANs

Timeline:
• Jun. 2014: Generative Adversarial Networks (GANs). The architecture of a generator and a discriminator is first proposed by Goodfellow et al.
• Nov. 2014: Conditional GANs (CGANs). What if I want my GAN to generate data with specific attributes?
• Nov. 2015: Deep Convolutional GANs (DCGANs). As the first major improvement on the GAN architecture, DCGAN is more stable during training and generates higher quality samples.
• Nov. 2016: Image-to-Image Translation with CGAN (pix2pix). Let's build a generic CGAN architecture where the condition is an image, and learn to translate this image into another domain.
• Nov. 2017: PG-GAN/pix2pixHD. Nvidia: "Let's make some high-resolution GANs with multi-scale approaches."
• Sep. 2018: BigGAN. Google: "Hold my beer Nvidia, my interns can use thousands of TPUs to generate cheeseburgers."

49
Recent Development of GANs

High quality images generated by BigGAN


50
Reinforcement Learning

AlphaGO

AlphaStar

51
Supervised vs. Reinforcement
• Supervised
• Learning from teachers
• Labeled data
Next move: “5-5” Next move: “3-3”

• Reinforcement
• Learn from critics
Win!
Move 1 Move 2 …
Lose!
• AlphaGO
• Supervised learning + reinforcement learning

52
Reference and More Resources
ML courses from Hung-Yi Lee at NTU
• http://speech.ee.ntu.edu.tw/~tlkagk/courses.html
Stanford ML courses on Coursera
• https://www.coursera.org/learn/machine-learning
Tutorials on GANs
• NIPS 2016: https://arxiv.org/pdf/1701.00160.pdf
• CVPR 2018: https://sites.google.com/view/cvpr2018tutorialongans/
Toolkit tutorials
• Tensorflow: https://www.tensorflow.org/tutorials
• PyTorch: https://pytorch.org/tutorials/

53
Outline
Part I: Introduction to Machine Learning
• What is Machine Learning
• Taxonomy of Machine Learning
• Machine Learning Techniques
Part II: Machine Learning for Physical Design
• Motivations
• Opportunities
• Challenges and Future Directions

54
Modern VLSI Layouts

IBM Power7: 1.2B transistors | Apple A12: 6.9B transistors | Nvidia Xavier: 9B transistors

• Large scale: billions of transistors
• Complicated design flow
• Long design cycles

55
IC Design Flow – Silicon Compiler
(design flow; the fabless portion runs through mask synthesis)
System Specification
→ Architectural Design
→ Functional & Logic Design (e.g., Verilog: module test ... endmodule)
→ Logic Synthesis (AND/OR gate netlist)
→ Physical Design: Floorplanning → Placement → Clock Tree Synthesis → Signal Routing → Timing Closure → DFM Closure
   (with DRC / LVS / STA & Verification)
→ Mask Synthesis
→ Fabrication
→ Packaging and Testing
56


IC Design Flow – Silicon Compiler
(same design flow as the previous slide, with the manufacturing side expanded)
• Process Modeling: test patterns → lithography experiment → lithography modeling
• Mask Synthesis: layout/mask → mask optimization recipe → SRAF & OPC → lithography rule check → mask optimization
57


Placement Example

Optimization targets
• Satisfy timing constraints?
• Satisfy power constraints?
• 100% routable?
• Wirelength minimized?
• …
[Courtesy NTU team] 58
Routing Example

A zoom-in 3D view
[Courtesy samyzaf]

Challenging problem
• 10+ metal layers
• Millions of nets
• May be highly congested
• Minimize wirelength
[Courtesy Umich] 59
High Correlation between P&R

Courtesy semiwiki

60
Mask Synthesis
Design target

without OPC with OPC

Mask

Wafer

61
Mask Synthesis: OPC+SRAF

[Courtesy Semiengineering]

62
Mask Verification: Lithography
Simulation

Contact Mask

Aerial Image
(Light intensity map)

Resist Pattern

63
Mask Verification: Lithography
Hotspots

Litho Difficulty
Estimator: hotspots

[Courtesy Tech Design Forums] 64


Challenges for VLSI Design
• Long and complicated design flow
• Nested chicken-egg loops
• P&R, OPC&SRAF…
• Nearly all problems are NP-hard
• Stacking of metaheuristics
• High expectations for optimality
• Shoot for even 1% improvement
• Single iteration is expensive
• One iteration of backend flow may take days
• Require many iterations for convergence

65
How Machine Learning Can Help

Mask Mask
Placement Routing
Synthesis Verification

Hammers to tackle each step

66
How Machine Learning Can Help

Mask Mask
Placement Routing
Synthesis Verification

Bridges to connect each step

67
Placement

Mask Mask
Placement Routing
Synthesis Verification

68
VLSI Placement

Placement is critical to VLSI


design quality and design closure

Input
Gate-level netlist
Standard cell library
Output
Legal placement solution
Objective
Optimize wirelength,
routability, etc.

RePlAce
69
Nonlinear Placement Algorithm
Objective of nonlinear placement: minimize Wirelength + λ · Density

Challenges of Nonlinear Placement


Low efficiency
§ > 3h for 10M-cell design
Limited acceleration
§ Limited speedup, e.g., mPL, due to clustering
Huge development effort
§ > 1 year for ePlace/RePlAce
70
Advances in Deep Learning
Hardware/Software

[Courtesy Nvidia]

GPU

CPU

Over 60x speedup in neural network training since 2013

Deep learning toolkits

71
DREAMPlace Strategies
• We propose a novel analogy by casting the
nonlinear placement optimization into a neural
network training problem

• Greatly leverage deep learning hardware (GPU)


and software toolkit (e.g., PyTorch)

• Enable ultra-high parallelism and acceleration


while getting the state-of-the-art results

[Lin+,DAC’19]
72
Best paper award
Analogy between NN Training and
Placement

Train a neural network Solve a placement 73


Analogy between NN Training and
Placement
Casting the placement problem into neural network training

Train a neural network Solve a placement 74


DREAMPlace Architecture
Leverage highly optimized deep learning toolkit PyTorch

(architecture: a Python placement API on top of a nonlinear optimizer (Nesterov's method, matching RePlAce [TCAD'18, Cheng+]), automatic gradient computation, and C++/CUDA operators; the wirelength and density ops play the role of CONV and ReLU layers)

75
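A toy sketch of the analogy, not DREAMPlace code: cell coordinates are the trainable "parameters", wirelength plus a density penalty is the "loss", and a stock PyTorch optimizer drives the updates. The density surrogate below is a deliberate simplification of the actual electrostatic formulation, and the random 2-pin netlist is a placeholder.

```python
import torch

n_cells = 1000
pos = torch.nn.Parameter(torch.rand(n_cells, 2))        # cell (x, y), the "weights"
nets = torch.randint(0, n_cells, (5000, 2))              # toy 2-pin netlist (src, dst)
opt = torch.optim.Adam([pos], lr=0.01)                   # same optimizers as NN training

def wirelength(p):
    return (p[nets[:, 0]] - p[nets[:, 1]]).abs().sum()   # L1 wirelength proxy

def density(p, bins=16):
    # Soft, differentiable bin-occupancy surrogate; its variance penalizes crowding.
    grid = torch.linspace(0, 1, bins)
    occ_x = torch.exp(-((p[:, 0:1] - grid) ** 2) / 0.01).sum(0)
    occ_y = torch.exp(-((p[:, 1:2] - grid) ** 2) / 0.01).sum(0)
    return occ_x.var() + occ_y.var()

for step in range(200):
    opt.zero_grad()
    loss = wirelength(pos) + 10.0 * density(pos)         # objective = WL + lambda * density
    loss.backward()                                       # "backpropagation" gives the gradients
    opt.step()
```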
Experimental Results
DREAMPlace
• CPU: Intel E5-2698 v4 @2.20GHz
• GPU: 1 NVIDIA Tesla V100
• Single CPU thread was used
34x
speedup
RePlAce [TCAD’18,Cheng+]
• CPU: 24-core 3.0 GHz Intel Xeon
• 64GB memory allocated ISPD 2005 Benchmarks
200K~2M cells

Same quality of results!


43x
speedup

10M-cell design
finishes within 5min c.f. 3h Industrial Benchmarks
1M~10M cells

76
[Lin+,DAC’19]
Bigblue4 (2M-Cell Design)

Density Map Potential Map Field Map

Placement Metrics GPU Usage on Titan Xp

Code release: https://github.com/limbo018/DREAMPlace


77
Routing

Mask Mask
Placement Routing
Synthesis Verification

78
Routing Guidance: GeniusRoute
Routing for analog circuits, e.g., comparator
• Sensitive performance to clock routing
Existing manual layouts
• Hard to encode designer expertise into rules

Sweep the routing of the clock net 79


[Zhu+,ICCAD’19]
Routing Guidance: GeniusRoute
Learn from manual layouts
• Encode designer expertise into neural networks!
Generate routing guidance for critical nets
• Clock, power/ground, critical signal nets
• Probability map of routing

(figure: unrouted placement → generated routing region → routed layout)
80
[Zhu+,ICCAD’19]
Routing Guidance: GeniusRoute
• Adopt autoencoder for routing region generation
• Compare with routing from manual layouts

81
[Zhu+,ICCAD’19]
Routing Guidance: GeniusRoute
• Test on comparators and OTAs
• Evaluate with post layout simulation
• Compare with manual layout and previous methods

              Schematic  Manual  ICCAD'10  W/o guide  GeniusRoute
Offset (µV)   /          480     1230      2530       830
Delay (ps)    102        170     180       164        163
Noise (µVrms) 439.8      406.6   437.7     439.7      420.7
Power (µW)    13.45      16.98   17.19     16.82      16.80

Closest results to the manual layout

82
[Zhu+,ICCAD’19]
MAGICAL (Machine Generated Analog Layout) Overview

(MAGICAL flow: inputs (circuit netlist, design rules) → layout constraint extractor (pattern matching + small signal analysis) → device generator (parametric instances) → placer (performance-driven placement, post-placement optimization) → router (deep-learning-guided routing) → output GDSII layout; validation with Calibre® DRC/LVS/PEX and evaluation with Cadence® Virtuoso® ADE)
• MAGICAL 0.2 Version Open-source released in May 2019


• GitHub: https://github.com/magical-eda/MAGICAL
• End-to-end analog layout generation from netlist to GDSII
• No dependency on commercial tools
• Push-button, no-human-in-the-loop 83
Mask Synthesis

Mask Mask
Placement Routing
Synthesis Verification

84
Optical Proximity Correction
Issue: conventional OPC consumes a very long time
Goal: highly accurate correction in a shorter runtime

(figure: mask image vs. printed image at OPC iterations 0, 3, 5)
(figure: ratio of model-based OPC time, normalized to the 40nm node, vs. technology node)

Model-based OPC is the most widely used technique; a drastic reduction of OPC time is required!
85
Results on HBM-based OPC
(figure: RMSE (root mean square error, nm) on train/test sets vs. the number of iterations of model-based OPC; edge placement error (nm) compared with model-based OPC at iterations 2, 6, 10)

The number of iterations of model-based OPC can be reduced by 2x [Matsunawa+, SPIE'15]
86
GAN for OPC
• Generative adversarial networks for OPC
• [Yang+, DAC’18]

(figure: an encoder-decoder generator maps the target to a generated mask; the discriminator scores the generated target & mask against the reference target & mask, e.g., 0.2 vs. 0.8)

Compared with the ILT flow:
• EPE error: 9% reduction
• PV-band: 1% reduction
• Overall runtime: 2x less
87
SRAF Insertion
• SRAF (sub-resolution assist feature) insertion
• ML-based approach achieved similar result but 10x faster

(figure: target pattern with SRAFs; an SRAF box, OPC region, and SRAF region are defined around the target, and sub-sampling points are labeled SRAF label 0 or SRAF label 1)

[Xu+, ISPD’16, TCAD’17] 88


Comparison with Model-based SRAF
• Similar PV band and EPE
• But 10x faster

Can we do better?!

89
Image Translation with Generative
Adversarial Networks
Generative Adversarial Network (GAN) [Goodfellow+, 2014]
• Two neural networks contest (generator and discriminator)
• Produces images similar to those in the training data set
Conditional GAN (CGAN) for Image Translation [Isola+, CVPR’17]
• Takes an image in one domain and translate it to another one

90
SRAF Insertion & Image Translation
• What does SRAF generation have to do with Image
translation?!

• Can we define the problem as translating images from


the Target Domain ($D_T$) to the SRAF Domain ($D_S$)?

[Alawieh+,DAC’19] 91
Challenges for SRAF Insertion
• Layout images have sharp edges which pose a
challenge to GANs
• No guarantee of generating polygonal SRAF shapes
• Sharp edges can complicate gradient propagation
• Generated images ultimately need to be converted to layout format
• Images cannot be directly mapped to 'GDS' format
• Post-processing steps should not be time consuming
• Hence, a proper encoding is needed to address
these challenges!

92
Multi-Channel Heatmap Encoding
Key Idea: encode each type of object on a separate
channel in the image
• Channel index carries object description (type, size,...)
• Excitations on the channel carry objects location

Original Layout Encoded Layout


93
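A hedged sketch of the encoding idea (the channel names, canvas size, and Gaussian excitation are illustrative assumptions, not the paper's exact scheme).

```python
import numpy as np

H = W = 256
CHANNELS = {"contact": 0, "sraf": 1}     # one channel per object type (hypothetical names)

def encode(objects, sigma=2.0):
    """objects: list of (type, x, y) with x, y in [0, 1)."""
    img = np.zeros((len(CHANNELS), H, W), dtype=np.float32)
    yy, xx = np.mgrid[0:H, 0:W]
    for kind, x, y in objects:
        cx, cy = x * W, y * H
        # Gaussian "heat" excitation centered on the object's location, on its own channel.
        img[CHANNELS[kind]] += np.exp(-((xx - cx) ** 2 + (yy - cy) ** 2) / (2 * sigma ** 2))
    return img

layout = encode([("contact", 0.25, 0.5), ("sraf", 0.4, 0.5)])
print(layout.shape)   # (2, 256, 256): channel index carries the type, excitations carry locations
```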
CGAN Approach
Generator:
• Trained to produce images in $D_S$ based on input from $D_T$
Fake/Real

• Tries to fool the Discriminator Discriminator


Discriminator: Real
Diff
• Trained to detect ‘fakes’
generated by the Generator
The two networks are jointly Encoder Decoder

trained until convergence Generator Fake

Input

94
Results Decoding
Decoding the generated layout images in two steps
• Thresholding
• Excitation detection

Decoding scheme is fast ⟹ GPU accelerated

SRAF
location

Isolated
pixel

95
Sample Results TCAD’17: [Xu+, ISPD’16, TCAD’17], SVM-based
MB: Model-Based Approach - Calibre

TCAD’17 GAN-SRAF MB

Target Pattern MB SRAF LS_SVM Final

LS_SVM Prediction CGAN Final CGAN Prediction 96


Comparison Summary

No SRAF MB TCAD’17 CGAN

PV Band (um2) 0.00335 0.002845 0.00301 0.00291

EPE (nm) 3.9287 0.5270 0.5066 0.541

Run time (s) - 6910 700 48

• The proposed CGAN based approach can


achieve comparable results with TCAD’17 and MB
with 14.6X and 144X reduction in runtime

[Alawieh+,DAC’19] 97
Mask Verification

Mask Mask
Placement Routing
Synthesis Verification

98
Bottleneck in Semiconductor and IC
Manufacturing: Lithography

Layout Chip
• Lithographic modeling and hotspot detection
• What you see (at design) is NOT what you get (at fab)
• Hotspot ⟹ poor performance and yield
• Litho-simulations are extremely CPU intensive
• Need much faster algorithms to guide IC design, and
pinpoint possible mask/wafer inspection spots 99
Challenges in Lithography Modeling

Rigorous simulation: physics-based simulation, e.g., Synopsys S-Litho

Rigorous Simulation

Mask Layout Accurate but slow Resist Pattern

• Simulating 2 μm × 2 μm using Synopsys S-Litho ⟹ ~1 minute


• A 2 mm × 2 mm chip contains 1M such clips ⟹ 1.9 years!
• Intel Ivy Bridge 4C: 160 mm2

100
ML for Lithography Modeling

Rigorous simulation: physics-based simulation, e.g., Synopsys S-Litho

Rigorous Simulation

Mask Layout Accurate but slow Resist Pattern

Machine learning for resist modeling [Watanabe+, SPIE’17] [Shim+, SPIE’17]

Optical Machine Threshold


Model Learning Processing

Mask Layout Aerial Image Slicing Threshold Resist Pattern

Speeds up the resist modeling stage


101
VLSI Technology Nodes

[electronicsforu, 2017] 102


Transfer Learning for Lithography
Modeling

Training with limited new tech. data + older tech.

10nm Data 10nm Litho


Model
Training
Large amount of data Knowledge
Transfer

7nm Data 7nm Litho


Model
Training
Small amount of data
Good accuracy for very
[Lin+, TCAD’19]
small data, CNN 103
Transfer Learning with ResNet
Transfer and Fix TFk Scheme

Source Domain Target Domain


Input Input

conv conv

conv Knowledge conv Fix k layers


Transfer

··· ···

conv conv
Finetune
fc fc

fc fc

Source Domain Target Domain


Output Output 104
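A hedged PyTorch sketch of the transfer-and-fix idea (the helper name and the assumption that convolution layers appear in transfer order are ours): copy the source-domain weights, freeze the first k conv layers, and fine-tune the rest on the small target-domain dataset.

```python
import torch.nn as nn

def transfer_and_fix(model, k):
    """TFk-style sketch: freeze the first k conv layers, return the parameters left to fine-tune."""
    conv_layers = [m for m in model.modules() if isinstance(m, nn.Conv2d)]
    for conv in conv_layers[:k]:                  # fix k layers
        for p in conv.parameters():
            p.requires_grad = False
    # Only the unfrozen parameters are handed to the optimizer for fine-tuning.
    return [p for p in model.parameters() if p.requires_grad]

# Usage (hypothetical model/optimizer names):
#   trainable = transfer_and_fix(pretrained_n10_model, k=2)
#   optimizer = torch.optim.Adam(trainable, lr=1e-4)
```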
Technology Transition from N10 to N7

Contact Layer Design Rules [Liebmann, SPIE'15]

                    N10     N7
Patterning          LELE    LELELE
Pitch (nm)          64      45
Mask pitch (nm)     128     135
Litho-target (nm)   60      60

                    N10     N7a     N7b
Design Rule         A       B       B
Optical Source      A       B       B
Resist Material     A       A       B

Optical Sources: N10 vs. N7 | Resist Materials: Resist A vs. Resist B (different dissolution slopes)
105


Data Reduction from Knowledge
Transfer

(figure: training data (%) needed vs. CD RMS error (nm), comparing CNN, CNN TF0, and ResNet TF0)

From N10 to N7b: 2~10X reduction of training data (e.g., 2.6X for CNN; 2X~4X at tighter error targets)
From N7a to N7b: 8~20X reduction of training data (e.g., 8X for CNN; 10X~20X at tighter error targets)
106
End-to-End Lithography Modeling

Rigorous simulation: physics-based simulation, e.g., Synopsys S-Litho

Rigorous Simulation

Mask Layout Accurate but slow Resist Pattern

Machine learning for end-to-end lithography modeling

Machine Learning

Mask Layout Resist Pattern


Goal: ultimately fast modeling
[Ye+,DAC’19]
107
Best paper nomination
Image Translation for Lithography
Modeling

(figure: a 1 µm x 1 µm mask clip and its 128 nm x 128 nm resist pattern, both rendered as 256 px x 256 px images; expensive litho simulation vs. fast image translation)

Different elements are encoded on different image channels.
The resist pattern is zoomed in for high resolution/accuracy.
108
LithoGAN Architecture
Generator

Encoder Decoder

Discriminator CNN for center prediction


109
LithoGAN Visualization

Model advancement progress

110
Experimental Results

Setup
• Python w/ TensorFlow
• 3.3GHz Intel i9 CPU & Nvidia TITAN Xp GPU

Datasets
• Different types of contact arrays [Lin+, TCAD’18]
• 982 mask clips at 10nm node (N10)
• 979 mask clips at 7nm node (N7)
• 75-25 rule for train/test split

Methods
• Rigorous sim using S-Litho: golden resist patterns
• [Lin+, TCAD'18]: optical sim using Calibre + threshold prediction using CNN + post processing
  (Mask Layout → Optical Model → Aerial Image → Slicing → Machine Learning → Threshold → Threshold Processing → Resist Pattern)

Compelling runtime speedup for early technology exploration


[Ye+,DAC’19] 111
Experimental Results

(figure: predicted contour vs. golden contour, with their intersection, union, and the edge displacement error (EDE) of the bounding boxes)

Accuracy measures
Edge Displacement Error (EDE)
• Distance between the golden edge and the predicted one of the bounding boxes
• The smaller, the better; captures bounding box mismatch
IOU = Intersection / Union
• The larger, the better; captures contour mismatch

Competent accuracy for lithography usage (in consultation with industry)
[Ye+,DAC’19] 112
Lithography Hotspot Detection

Pattern/Graph Matching (pros/cons):
• Accurate and fast for known patterns
• But too many possible patterns to enumerate
• Sensitive to changing manufacturing conditions

Data Mining/ML (pros/cons):
• Good at detecting unknown or unseen hotspots
• Trade-off between accuracy and false alarms
• Accuracy may not be good for "seen" patterns

113
PM & ML for Hotspot Detection

Pattern Matching Methods Machine Learning Methods


Good for detecting previously Good for detecting new/previously
known types of hotspots unknown hotspots

A New Unified Formulation (EPIC)


Good for detecting all types of hotspots
with good accuracy/false-alarm
tradeoff (Meta-Classifier)

• Meta-classification combines the strength of different


hotspot detection techniques [Ding+, ASPDAC’12]
• Balance accuracy and false-alarm
114
Application-Specific ML

Different ways of feature extractions/representations

$\begin{bmatrix} C_{11,k} & C_{12,k} & C_{13,k} & \cdots & C_{1n,k} \\ C_{21,k} & C_{22,k} & C_{23,k} & \cdots & C_{2n,k} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ C_{n1,k} & C_{n2,k} & C_{n3,k} & \cdots & C_{nn,k} \end{bmatrix}
\;\cdots\;
\begin{bmatrix} C_{11,1} & C_{12,1} & C_{13,1} & \cdots & C_{1n,1} \\ C_{21,1} & C_{22,1} & C_{23,1} & \cdots & C_{2n,1} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ C_{n1,1} & C_{n2,1} & C_{n3,1} & \cdots & C_{nn,1} \end{bmatrix}$

(DCT encoding: the layout clip is divided into an n x n grid, yielding k coefficient maps $C_{ij,1..k}$ per clip)

Deep learning: CNN, … [Yang+, TCAD’19] 115
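A hedged SciPy sketch of DCT-based feature encoding in this spirit (the block grid, coefficient count, and the simple row-major truncation are simplifying assumptions; a zigzag low-frequency scan is common in practice).

```python
import numpy as np
from scipy.fftpack import dctn

def dct_features(clip, n=12, k=32):
    """Divide a layout clip into n x n blocks, take a 2-D DCT per block, keep k coefficients each."""
    h, w = clip.shape
    bh, bw = h // n, w // n
    feats = []
    for i in range(n):
        for j in range(n):
            block = clip[i * bh:(i + 1) * bh, j * bw:(j + 1) * bw]
            coeffs = dctn(block, norm="ortho").ravel()
            feats.append(coeffs[:k])              # low-order coefficients C_{ij,1..k}
    return np.concatenate(feats)

clip = (np.random.rand(120, 120) > 0.5).astype(np.float32)   # stand-in layout clip
print(dct_features(clip).shape)                               # (n*n*k,) feature vector
```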


Recent Progress – Lack of Data
• Self-paced semi-supervised learning
• [Chen+, ASPDAC’19]
• Leverage unlabeled data
• Multi-task learning
(figure: a shared stack of conv/max-pool layers feeds two heads: a clustering task trained with a pairwise contrastive loss on $\hat{z}_i$, and a classification task trained with a weighted softmax cross-entropy loss on $\hat{y}_i$)
116
Preliminary Results [Chen+, ASPDAC’19]
Better accuracy with small amount of training data
(figure: accuracy of the methods labeled DAC and SSL as the amount of training data varies)

117
Recent Progress: Model Confidence
Examine confidence of model predictions
• [Ye+, DATE’19]
• Gaussian Process provides confidence level
• Active learning: build accurate GP model with little data
• Use GP model: simulate unconfident samples

Flow (active learning with a Gaussian Process): from the clip pool, select samples, run simulation, and train the GPR model; then predict on the remaining clips with GPR. Confident predictions (e.g., p(hotspot) ≈ 98%) go directly to the hotspot/non-hotspot output; unconfident ones (e.g., p(hotspot) ≈ 65%) are sent to simulation and added to the training set until no new hotspots are detected.
118
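A hedged scikit-learn sketch of the confidence-driven loop (the features, labels, and confidence thresholds are synthetic stand-ins for real clip features and a lithography simulator, not the paper's setup).

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)
features = rng.normal(size=(500, 8))                              # stand-in clip features
simulate = lambda X: (X[:, 0] + 0.2 * rng.normal(size=len(X)) > 0).astype(int)  # fake litho labels

labeled_idx = rng.choice(500, size=50, replace=False)             # small initial training set
gp = GaussianProcessClassifier(kernel=RBF(1.0)).fit(features[labeled_idx],
                                                    simulate(features[labeled_idx]))
proba = gp.predict_proba(features)[:, 1]

confident = (proba > 0.95) | (proba < 0.05)       # e.g. p(hotspot) ~ 98% -> trust the model
unconfident_idx = np.where(~confident)[0]         # these clips would go to litho simulation
print(f"{confident.sum()} clips decided by the model, "
      f"{len(unconfident_idx)} sent to simulation and added to the training set")
```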


Preliminary Results [Ye+, DATE’19]
• Larger 𝛼 implies a large confidence interval
• Confidence interval: $[\mu - \alpha\sigma, \mu + \alpha\sigma]$
• Trade-off: quality and #simulations required

(figure: for Layout 2 and Layout 3, accuracy (%), number of false alarms, and #simulations (%) as α increases over 0.382, 0.682, 0.866)
119
How Machine Learning Can Help

Mask Mask
Placement Routing
Synthesis Verification

Bridges to connect each step

120
Bridge Placement and Routing

Mask Mask
Placement Routing
Synthesis Verification

121
Predicting Routing Congestion
Given placement, predict
• Routing utilization map [Yu+,DAC’19] for FPGA
• DRC hotspots [Chan+,ISPD’17] [Xie+,ICCAD’18] for ASIC

Input FPGA Placement Output Routing Util Ground Truth


https://ycunxi.github.io/cunxiyu/dac19_demo.html

122
Predicting Routing Congestion
Given placement, predict
• Routing utilization map [Yu+,DAC’19] for FPGA
• DRC hotspots [Chan+,ISPD’17] [Xie+,ICCAD’18] for ASIC

[Chan+,ISPD’17]
123
Challenges in Predicting Congestion
Feature representation
• How to encode interconnect information
• Scalability of features
Model selection
• SVM, ANN, FCN, GAN, etc.

Different possibility of routing trees [Spindler+,DATE’07] Connectivity images


Evenly distribute routing demands (RUDY map) [Yu+,DAC’19]
124
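A hedged NumPy sketch of a RUDY-style demand map in the spirit of [Spindler+,DATE'07] (the grid size and net format are our assumptions): each net's routing demand is spread uniformly over its bounding box, producing an image-like input for a congestion prediction model.

```python
import numpy as np

def rudy_map(nets, grid=64):
    """nets: list of pin arrays of shape (num_pins, 2) with coordinates in [0, 1)."""
    demand = np.zeros((grid, grid))
    for pins in nets:
        x0, y0 = pins.min(axis=0)
        x1, y1 = pins.max(axis=0)
        w, h = max(x1 - x0, 1e-6), max(y1 - y0, 1e-6)
        density = (w + h) / (w * h)               # HPWL divided by bounding-box area
        gx0, gy0 = int(x0 * grid), int(y0 * grid)
        gx1 = max(int(np.ceil(x1 * grid)), gx0 + 1)
        gy1 = max(int(np.ceil(y1 * grid)), gy0 + 1)
        demand[gy0:gy1, gx0:gx1] += density       # spread the demand evenly over the box
    return demand

nets = [np.random.rand(4, 2) for _ in range(1000)]   # toy 4-pin nets
print(rudy_map(nets).shape)                           # (64, 64) congestion feature map
```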
Integrate Models into Placement
• Up to 76.8% DRC reduction
• Minor WL and timing impacts
[Chan+,ISPD’17]
        #DRCs                      Wirelength                     TNS (ns)
eg1     8478 → 1964  (-76.83%)     1742804 → 1747685  (0.3%)      -153.43 → -158.4  (3.2%)
eg2     1502 → 927   (-38.28%)     1750698 → 1753047  (0.1%)      -168.23 → -163.5  (-2.8%)
eg3     2017 → 1819  (-9.82%)      1772889 → 1773701  (0.0%)      -215.75 → -213.6  (-1.0%)
eg4     2026 → 1780  (-12.14%)     1735185 → 1735227  (0.0%)      -151.36 → -149.6  (-1.2%)
eg5     4252 → 4255  (0.07%)       1831492 → 1836060  (0.2%)      -264.34 → -275.6  (4.3%)
eg6     3440 → 3891  (13.11%)      1790059 → 1794184  (0.2%)      -195.65 → -203.5  (4.0%)
avg: -20.6% (#DRCs), 0.2% (wirelength), 1.1% (TNS)
max: 13.1%, 0.3%, 4.3%; min: -76.8%, 0.0%, -2.8%
125
Integrate Models into Placement
• Recent study from UTDA
• On 2016 ISPD FPGA Placement contest
benchmarks
• Up to 7% reduction in routed wirelength

Placement
Iterations

Congestion
Cell Inflation
Prediction
126
Bridge Design and Manufacturing

Mask Mask
Placement Routing
Synthesis Verification

127
Lithography-Friendly Detailed Router:
AENEID
• Motivation: avoid RET dependent layout
printability

• Challenges: the lithography hotspot detection


dilemma in the detailed routing stage

[Ding+,DAC’11]
128
Lithography-Friendly Detailed Router:
AENEID
• Objective: minimize total wirelength
• Subject to: keep lithography cost on each routing
grid within a threshold
Original: $\min_P \ \sum_{e \in P} 1, \quad \text{s.t. } \mathrm{litho}(e) \le L, \ \forall e \in P$

Lagrangian relaxation: $\max_{\lambda} \min_P \ \sum_{e \in P} \big(1 + \lambda_e (\mathrm{litho}(e) - L)\big), \quad \text{s.t. } \lambda_e \ge 0$

50% reduction
in hotspots

ANN SVM
[Ding+,DAC’11] 129
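A hedged sketch of the multiplier update implied by the relaxation (step size and costs are toy values of ours, not AENEID's implementation): multipliers on edges whose lithography cost exceeds the threshold L grow by a subgradient step, so subsequent shortest-path routing avoids those edges.

```python
import numpy as np

def update_multipliers(lam, litho_cost, L, step=0.1):
    lam = lam + step * (litho_cost - L)      # subgradient step on the relaxed objective
    return np.maximum(lam, 0.0)              # multipliers stay non-negative

def edge_weights(lam, litho_cost, L):
    return 1.0 + lam * (litho_cost - L)      # weight used by the router's path search

litho_cost = np.array([0.2, 0.9, 1.4])       # per-edge litho cost (toy numbers)
lam = np.zeros(3)
for _ in range(20):
    lam = update_multipliers(lam, litho_cost, L=1.0)
print(lam, edge_weights(lam, litho_cost, L=1.0))   # only the hotspot-prone edge gets penalized
```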
Conclusion
• Machine learning brings
• New modeling opportunities
• New optimization techniques
• New hardware acceleration, e.g., GPU acceleration
• New software platforms, e.g., Tensorflow, PyTorch
• Hammers and bridges for conventional EDA flow
• Reformulate the problems, e.g., neural network training,
image-to-image translation
• Accurate and efficient information feedback from late
stages to early stages

130
Open Problems
• Connectivity feature representation
• How to encode hypergraph as input to models
• Or develop models that are friendly to ultra-large
graphs
• Optimization-friendly ML models and ML-friendly
optimization techniques
• Target at integrating ML models into optimization
• Need to consider fidelity, smoothness, accuracy,
convergence rate
• Generalization guarantee
• How far can ML models generalize
• How to know whether a model is applicable to new data

131
Future Directions
• Routability & Timing-driven DREAMPlace with
deep learning toolkits and GPU acceleration
• Integrate ML models for routability and timing prediction
• Reinforcement learning for routing strategies
• Rip-up and re-route policy
• Tree topology generation
• End-to-end mask synthesis and verification
• Use LithoGAN to guide SRAF and OPC
• No need to generate golden SRAF and OPC solutions
• Open discussions…

132
Q&A
Thanks

133
Recent Development of Placement

*Data collected from RePlAce [TCAD'18, Cheng+] and http://vlsi-cuda.ucsd.edu/~ljw/ePlace/ on ISPD 2005 benchmarks
Future Directions

A BIGGER DREAM:
• New Solvers: SGD, ADAM, etc.
• New Objectives: routability, timing, etc.
• New Accelerations: multi-GPU, distributed computing, mixed precision
• Applicable to Other CAD Problems: gate sizing, floorplanning, ...

135
