

Machine Learning-Based VLSI Cells Shape Function Estimation

Xiao Quan Li and Marwan Anwar Jabri

Manuscript received July 17, 1996. This paper was recommended by Associate Editor M. Sarrafzadeh. The authors are with the Computer Engineering Laboratory, Department of Electrical Engineering, The University of Sydney, NSW 2006, Australia. Publisher Item Identifier S 0278-0070(98)05195-1.

Abstract— We describe in this paper a novel approach based upon machine learning for estimating layout shape functions of full-custom integrated circuit cells. A neural network is trained to estimate one dimension of a cell layout from the circuit netlist, a desired packing density, and prescribed values of the complementary dimension. The neural network is then combined with a linear function generator and a neural network that predicts the number of contacts (vias) to produce estimates of cell layout shape functions. We have experimented with this approach on an independent test set of circuits and the results are very encouraging. The resulting estimation system is very fast and can be easily incorporated into existing floorplanning systems. An additional benefit of the machine learning aspect is the simplicity and systematicity in incorporating new circuits and technology information into the estimation system as they become available.

Index Terms— Dimension estimation, floorplanning, layout aspect ratios, neural networks, number of contacts, shape functions.

I. INTRODUCTION
In full-custom very large scale integration (VLSI) design, a cell refers to the lowest level circuit in the design hierarchy, as shown in Fig. 1. For a given circuit context, the masks of the cell can be designed in various ways. We show in Fig. 2 the circuit schematic of an inverter cell designed using an nMOS technology and an associated mask layout. Different layouts of the same circuit are called "realizations." We show in Fig. 3 some possible layout realizations of the inverter in Fig. 2(a). The width and height of different realizations can be used to define a discrete shape function as shown in Fig. 4.

Fig. 1. A view of an integrated circuit hierarchy.

Fig. 2. (a) Schematic of a circuit, with T1 and T2 being depletion and enhancement mode nMOS transistors, respectively. (b) Mask layout of the circuit.

Features that characterize and differentiate full custom cell layout (FCCL) realizations are (Fig. 2):
1) Circuit Specific: schematics of the circuit—transistors/gates count and their area, number of netlists (Fig. 5); and
2) Contextual: floorplan information—how the cell layout fits with its surrounding interconnection port positions.
Different realizations can be produced by varying items in 2) above. A continuous shape function represents a continuous relationship between the width and the height (Fig. 4).

Shape functions are useful in automated and manual design tasks. In automated design, they are used in chip level area optimization and floorplanning [1]–[3]. Shape functions of lower level modules are assumed to exist and are used to generate the minimal area of the overall floorplan by optimally selecting the realization for each module so as to minimize a chip level cost function. The assumption of the existence of FCCL shape functions (discrete or continuous) is valid in the case of library based design systems. In this case, several layout realizations of a circuit may exist and the layout dimensions of these different realizations can be used to define a discrete shape function. For standard cell based designs, the shape functions of cells and modules can be estimated accurately using empirical [4] and constructive [5] techniques.

In manual design tasks, the existence of shape function estimators allows designers to explore layout dimension options and manual floorplanning. For example, the availability of cell dimension estimates can assist in the determination of wiring delays and clock skews.

This paper is concerned with the automatic estimation of FCCL shape functions from circuit schematics. In other words, we are concerned with the task of "predicting" the width and height of unknown realizations given some features that can be extracted from the circuit schematic and contextual descriptions, without knowledge of the layout.

The automatic prediction of cell shape functions in a full-custom design methodology is a complex process because of the large number of practical layout options. The creation of a shape function prediction model is further complicated by the nonlinear relationships that exist between cell features and possible layouts [6].

This paper presents a novel machine learning-based technique for the generation of shape function prediction models. The technique is based on multilayer neural networks trained using gradient descent to predict the width or height of FCCL's and the number of contacts required by a circuit. To the best of our knowledge, this is the first time that the estimation of shape functions of full-custom layouts from schematics has been attempted. The technique has been evaluated on a database of real cells, and it shows great potential for practical exploitation in CAD systems.

II. SHAPE FUNCTION MODELING

The modeling of shape functions can be treated as a nonparametric prediction problem. To achieve this, one has to define features or "predictor variables," collect data or observations, and use the collected data to build an estimation model.


Fig. 3. Possible layouts for the inverter of Fig. 2. Note the layouts have been scaled to fit. (b) A rotated version of (a). (c) A different position for Vin. (d) An up-down flip of (a).

Before we describe the features that we have used in our modeling, we will review those used in the modeling of FCCL dimensions, which have been reported in [6].

A. Features Characterizing FCCL Dimensions

The width and height of an FCCL depend on a number of features of the circuit at hand. Previous research has shown that the relationship between these features and the actual dimensions of the FCCL is too complex to be modeled using handcrafted formulas [6], but it is amenable to automatic modeling. These features can be divided into two groups: circuit specific and context (floorplan) oriented.

1) Circuit-Specific Features: The circuit specific features that have commonly been considered are:
Ntrans: number of transistors or transistor equivalents of gates;
Tav: average area of these transistors;
Nnetlists: number of interconnection netlists in the circuit.
All these features influence the area of the FCCL in a relatively easily predictable manner. Their influence on the aspect ratio of an FCCL is significantly more difficult to predict. This is especially the case for interconnection netlists, where wiring directions (which are considered to be unknown in the dimension prediction process) affect the aspect ratio more directly than the number of netlists.

2) Context-Specific Features: The contextual (floorplan) feature group consists of the information about the number of interconnection ports on each side of the block's enclosing rectangle. For these features, we introduce the index side defined as

side ∈ {top, bottom, left, right}.

The floorplan-based features are:
Data_side: number of data ports on side;
Ctl_side: number of control ports on side;
Clk_side: number of clock ports on side;
Power_side: number of power and ground ports on side;
Pack: the packing density, as discussed in Fig. 6.

The numbers of data, control, clock, and power ports have been treated as separate features because in practice they occupy varying mask layers. This is not to suggest that these ports should occupy different layers, but that degrees of freedom should be provided to accommodate the cases where they do. We show in Fig. 3 how the position of ports affects layout organization and dimensions.

Fig. 4. The filled circles represent a discrete shape function. The continuous curve represents a continuous shape function. The continuous shape function defines the minimal area required for the layout of the block. Realizations with corresponding width and height points above the curve (dotted lines) are possible but will not be minimal in terms of area.

Fig. 5. Netlists of the circuit in Fig. 2. Note the circuit has four netlists.

B. Features Characterizing Shape Functions

The discussion above was aimed at briefly reviewing the important features that characterize the dimensions of FCCL's. The question now is which of these features would influence the shape function of a cell? More specifically, does shape function modeling require floorplan-based features?

As mentioned in Section I, different layout realizations of a circuit are obtained by varying the contextual features of the cell. The variation of these features translates into possible movements of ports along a side of the FCCL enclosing rectangle, or from one side to another. Such movements are not necessarily linear, that is, ports may have to be moved in groups so as to maintain their practical distribution. For example, if a group of ports represents a data bus, then it is unlikely that the movement of one port (representing one bit of the bus) from one side of the circuit enclosing rectangle to another side would lead to a layout realization that meets practical floorplans. This complex relationship between floorplans (port positions) and shape functions leads to difficulties in selecting the number of ports and their type on a side as predictor variables in shape function modeling.
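For concreteness, the sketch below gathers the predictor variables just listed into a single record and derives the aggregate per-type port counts used later in the paper. It is only an illustration under assumed names; Table I itself is not reproduced in this text, so the exact field set and grouping are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict

SIDES = ("top", "bottom", "left", "right")

@dataclass
class CellFeatures:
    """Circuit-specific and contextual (floorplan) predictor variables."""
    n_trans: int                      # Ntrans: transistors / transistor equivalents
    t_av: float                       # Tav: average transistor area
    n_netlists: int                   # Nnetlists: interconnection netlists
    pack: float                       # Pack: packing density (0..1)
    data: Dict[str, int] = field(default_factory=dict)    # Data_side per side
    ctl: Dict[str, int] = field(default_factory=dict)     # Ctl_side per side
    clk: Dict[str, int] = field(default_factory=dict)     # Clk_side per side
    power: Dict[str, int] = field(default_factory=dict)   # Power_side per side

    def aggregate_ports(self):
        # Aggregate (total) port counts per type, as used in later sections
        return {kind: sum(d.get(s, 0) for s in SIDES)
                for kind, d in (("data", self.data), ("ctl", self.ctl),
                                ("clk", self.clk), ("power", self.power))}

inv = CellFeatures(n_trans=2, t_av=6.0, n_netlists=4, pack=0.85,
                   data={"left": 1, "right": 1}, power={"top": 1, "bottom": 1})
print(inv.aggregate_ports())
```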


Fig. 6. The packing density of mask layout. It reflects the amount of "dead space" in the layout. The packing density of the layout in (a) is higher than that of (b). The dead space is any wasted space in the layout that may be due to layout pitch matching. Of course, design-rule-based spacing is excluded.

TABLE I. FEATURES USED IN SHAPE FUNCTION MODELING EXPERIMENTS.

1) Prescribed Input Dimension: The contextual information can be represented in a simpler but indirect fashion. This is possible in shape function modeling because we can aim at estimating one dimension given the cell features and given prescribed values of the second dimension. For example, we can develop a model that estimates the width of an FCCL given the cell specific features, the aggregate numbers of ports for each type, and prescribed values of the height. In this case, the height of the cell becomes what we call a prescribed input dimension (PID). As the height of the FCCL is specified and the aggregate number of ports for each type is also provided, the shape function prediction model generation technique could extract a relationship that represents the floorplan-based features. As we will see later in the paper, the provision of a PID is simply and conveniently implemented using a linear function generator.

Therefore, our modeling experiments have used the ten features (predictor variables) shown in Table I. We have not included the PID in this table as it is automatically generated. The modeling experiments we describe later evaluate the impact of including the aggregate number of ports for each type as well.

C. Prediction Model Construction Techniques

Many automatic data modeling techniques have been developed over the last few decades. Traditional techniques make use of "fitting" (or regression) algorithms to fit some observation data using a set of basis functions. Polynomial and spline fitting and simple linear regression are the most widely used. Depending on the prediction problem, some modeling techniques are more suitable than others. For instance, multivariate adaptive regression splines [7] (or MARS) is a technique that is more suitable to problems where the input variables (features) need not be highly combined in a linear or nonlinear fashion to produce accurate models [8]. The experiments we have done show that there are some complex combinations of the input feature variables in the modeling of FCCL dimensions and shape functions, and they also confirm that modeling techniques such as MARS do not lead to predictions as accurate as multilayer perceptrons do.

Multilayer perceptrons can provide effective prediction models. It has been shown that a three-layer multilayer perceptron (MLP) network is a universal approximator [9]. A typical MLP architecture is shown in Fig. 7. The input layer has a size of N + 1 pin neurons (counting the bias input). These pin neurons have a unity transfer function. Their outputs are fed through weights to the next layer, which contains nonlinear neurons. Each pin in the input layer is connected to every neuron in the first layer. Each neuron of the first layer feeds its output through weights to all nonlinear neurons of the second layer, and so on, until the output layer.

The weight connecting neuron i to neuron j is denoted w_ij. Neurons are indexed from zero (the bias pin neuron) to N, where

N = 1 + Σ_{l=0}^{L−1} N_l

and N_l is the number of neurons in layer l. When pattern n is applied, the output of neuron i in layer l is

y_i(n) = f(net_i(n))   (1)

where

net_i(n) = Σ_j w_ij y_j(n)

and the sum runs over the neurons j that feed neuron i. The function f(·) is usually a squashing function such as the sigmoid function

f_s(x) = 1 / (1 + e^(−x))   (2)

or the hyperbolic tangent function

f_t(x) = tanh(x).   (3)


Fig. 7. Typical MLP architecture.
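As an illustration of the feedforward computation in (1)–(3), the following sketch propagates one pattern through a fully connected MLP with a unity-output bias pin and a sigmoid squashing function. The layer sizes, random weights, and function names are placeholders, not the paper's MUME configuration.

```python
import numpy as np

def sigmoid(x):
    # Squashing function f_s of (2): 1 / (1 + exp(-x))
    return 1.0 / (1.0 + np.exp(-x))

def mlp_forward(x, weights, transfer=sigmoid):
    """Propagate one input pattern through a fully connected MLP.

    `weights` is a list of matrices; weights[l] has shape
    (n_out, n_in + 1), the extra column holding the bias weight
    (the unity-output bias pin of the text).
    """
    y = np.asarray(x, dtype=float)
    for W in weights:
        y_with_bias = np.append(y, 1.0)        # bias pin output is always 1
        net = W @ y_with_bias                  # net_i = sum_j w_ij * y_j
        y = transfer(net)                      # y_i = f(net_i), as in (1)
    return y

# Example: a hypothetical 4-8-5-1 network (three circuit features + PID in,
# one complementary dimension out), with random illustrative weights.
rng = np.random.default_rng(0)
sizes = [4, 8, 5, 1]
weights = [rng.normal(scale=0.5, size=(sizes[l + 1], sizes[l] + 1))
           for l in range(len(sizes) - 1)]
pattern = np.array([0.3, 0.5, 0.2, 0.7])       # normalized features + PID
print(mlp_forward(pattern, weights))
```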

Both the hyperbolic tangent and sigmoid functions are used in the networks described in this paper.

In the case of estimation, the inputs to the MLP are the features and its outputs are the variables that need to be estimated. An MLP is trained to estimate by means of an optimization algorithm. The most commonly used algorithm is backpropagation [13]. Backpropagation performs training by minimizing the network output error. We define the error for the output neuron k when pattern n is applied as

e_k(n) = d_k(n) − y_k(n)   (4)

where y_k(n) is the output of neuron k and d_k(n) is its desired output for pattern n. The pattern mean-squared error for n is defined as

ε(n) = (1/2) Σ_{k∈Ω} e_k²(n)   (5)

where Ω is the set of all output neurons. The training algorithm will determine the values of the weights so as to minimize the total mean-squared error

ε_Total = Σ_n ε(n).   (6)

The training will stop when ε_Total reaches a small prescribed value. Once the weights have been determined, the MLP is evaluated on an independent set of patterns (called the testing set) to assess its generalization capability. The quantification of the generalization capability is problem dependent. In the case of shape function modeling, an error margin ε_m is defined and the output of the MLP for pattern n is said to be correct if

ε(n) ≤ ε_m.   (7)

The error margin ε_m is also often used in the training algorithm to instruct it to neglect corrections to the weights when (7) holds for a particular pattern. The value of ε_m could be critical. If it is too small, an over-training effect can take place where, when training completes, the MLP has memorized the training patterns well but its generalization performance (on the testing set) degrades compared to its performance when larger values of ε_m are used. Techniques such as cross-validation [10] can be used to avoid over-training. A third set of patterns, called the cross-validation dataset (CVD), is created (patterns are removed from the training and/or testing set). During training the CVD is applied through the MLP and the corresponding total mean-squared error is recorded. Training is halted when this error shows some significant increase. However, in applications where data is limited, the strength of cross-validation is diminished because the training and testing data may suffer in size, and the size of the CVD may not be sufficiently large to confidently detect good stopping conditions. In this case, the best stopping criteria can be identified by running a large number of experiments using various stopping criteria and values for ε_m.
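The following sketch shows one plausible way to combine the error-margin rule of (7) with cross-validation stopping in a gradient-descent outer loop. The `forward` and `update` callables stand in for a backpropagation implementation, and all thresholds are illustrative; the actual MUME training settings are not reproduced here.

```python
import numpy as np

def pattern_mse(output, target):
    # Pattern mean-squared error of (5): 0.5 * sum_k e_k(n)^2
    e = np.asarray(target, float) - np.asarray(output, float)
    return 0.5 * float(np.sum(e * e))

def train_with_early_stopping(forward, update, train_set, cvd,
                              eps_m=0.01, max_epochs=5000, patience=20):
    """Gradient-descent outer loop with the error-margin rule and
    cross-validation (CVD) stopping described in the text.

    `forward(x)` returns the network output; `update(x, d)` applies one
    backpropagation weight correction for pattern (x, d). Both are
    assumed to close over the network weights.
    """
    best_cvd, stale = float("inf"), 0
    for epoch in range(max_epochs):
        for x, d in train_set:
            if pattern_mse(forward(x), d) <= eps_m:
                continue                  # (7) holds: neglect the correction
            update(x, d)
        cvd_error = sum(pattern_mse(forward(x), d) for x, d in cvd)  # (6) on the CVD
        if cvd_error < best_cvd:
            best_cvd, stale = cvd_error, 0
        else:
            stale += 1                    # CVD error no longer improving
            if stale >= patience:         # sustained increase: halt training
                break
    return epoch
```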
III. SHAPE FUNCTION MODELING ARCHITECTURE

The shape function modeling architecture proposed in this paper consists of two subsystems: a complementary dimension estimation module and a linear function generator. The complementary dimension estimation module estimates a dimension of the FCCL given input features from Table I and a PID. To produce a shape function, the linear function generator produces a monotonically increasing series of numbers between zero and one at a predefined step size. In the experiments we report below, step sizes of 0.1 and 0.01 are used. Although the shape functions produced by this architecture are discrete, the actual underlying model is continuous. Because the MLP is in effect trained to predict a dimension given the complementary dimension (PID), we call such configurations of MLP's dimension-dependent prediction (DDP).
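A minimal sketch of this arrangement, assuming a trained DDP is available as a callable: the linear function generator sweeps a normalized PID from zero to one at a fixed step, and each (PID, estimated complementary dimension) pair becomes one point of the discrete shape function.

```python
import numpy as np

def shape_function(ddp, features, step=0.1):
    """Return a list of (pid, estimated_dimension) points.

    `ddp(inputs)` is assumed to be a trained dimension-dependent
    predictor taking the normalized features with the PID appended
    as its last input and returning the complementary dimension.
    """
    points = []
    for pid in np.arange(step, 1.0 + 1e-9, step):   # monotonically increasing series
        inputs = np.append(features, pid)
        points.append((float(pid), float(ddp(inputs))))
    return points

# Illustrative use with a dummy predictor standing in for a trained MLP.
dummy_ddp = lambda x: 1.0 / (1.0 + 5.0 * x[-1])     # larger PID -> smaller output
features = np.array([0.3, 0.5, 0.2])                 # hypothetical normalized features
for pid, dim in shape_function(dummy_ddp, features, step=0.25):
    print(f"PID={pid:.2f}  estimated complementary dimension={dim:.3f}")
```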
A. Cells and Layout Database

Our database was collected from two sources. The first source is the VLSI designer's library [11], which was used in the full-custom dimension estimation experiments in [12] and [6]. The layouts from the second source are full-custom cells developed in our laboratory in the 1980's. Altogether, 53 layouts have been collected. They are all digital circuits, nMOS-based, and represent a variety of functions. The layouts are lambda based, with some mapping to a 5-µm and others to a 3-µm fabrication process. Note that the database does not include regular arrays of structures such as PLA's, RAM's, and ROM's. Such blocks are commonly produced by "module generators"; their organizations are predefined, and highly accurate estimates of their dimensions and shape functions can be produced.


Fig. 8. The rotation of a pattern has the effect of introducing a new and important point of the shape function.

Although our database is based on nMOS circuits, the shape function modeling method described in this paper can be applied to other technologies where layouts can be collected. The collection of data for such a task is not easy, as circuits and associated full-custom layouts are generally treated with secrecy. Hence, we did not have a choice in selecting the data, as only nMOS circuits and layouts were available.

For each circuit in the database, the features shown in Table I and the corresponding dimensions of the layout are extracted and stored to form a pattern. Because the features are independent of the orientation of the cell, the number of patterns was doubled by rotating the cells (swapping the width and height for each cell) to produce a total number of patterns of 106. As the database does not contain patterns of equal width and height, no pattern redundancy is introduced.

The rotation of the original 53 patterns yields a very convenient effect besides the increase in the amount of training and testing data. This effect is illustrated in Fig. 8. For each pattern, a new symmetrical point of the circuit shape function is introduced by the rotation.

The patterns in the database are then normalized by dividing each feature value by a number that represents the maximum of that feature in the database. The resulting patterns are then bounded between zero and one. Other normalization methods were also assessed but did not yield better results.
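A sketch of the two preprocessing steps just described, under assumed array layouts: the pattern set is doubled by swapping width and height (the features themselves are orientation independent), and each feature is divided by its maximum over the database so that all values fall between zero and one.

```python
import numpy as np

def augment_by_rotation(features, widths, heights):
    """Each original (features, width, height) pattern also contributes a
    rotated (features, height, width) pattern, doubling the dataset."""
    feats = np.concatenate([features, features], axis=0)
    w = np.concatenate([widths, heights])
    h = np.concatenate([heights, widths])
    return feats, w, h

def normalize_by_max(features):
    """Divide every feature column by its maximum over the database so the
    resulting values are bounded between zero and one."""
    col_max = features.max(axis=0)
    col_max[col_max == 0] = 1.0          # guard against all-zero columns
    return features / col_max

# Three hypothetical patterns x four features (e.g. Ntrans, Tav, Nnetlists, Pack).
feats = np.array([[12, 30.0, 4, 0.8],
                  [40, 22.5, 9, 0.6],
                  [ 7, 18.0, 3, 0.9]], dtype=float)
widths, heights = np.array([120.0, 310.0, 90.0]), np.array([80.0, 150.0, 95.0])
feats2, w2, h2 = augment_by_rotation(normalize_by_max(feats), widths, heights)
print(feats2.shape, w2, h2)              # six patterns after rotation
```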
The fact that each FCCL in the database exists in its original and rotated form ensures that the trained MLP is not biased toward a particular dimension. Hence, where a width- or a height-dimension estimation MLP was used, proper training should yield on average similar behavior for each MLP.

IV. EXPERIMENTS AND RESULTS

The investigation of MLP architectures capable of performing a mapping/estimation task is an iterative design process. Determining a suitable architecture requires understanding of the learning algorithm, the relationship between the dimensions of the input and output spaces, the architecture of the network (number and sizes of hidden layers, neuron transfer functions), and the amount of data. A detailed account of MLP exploration procedures is beyond the scope of this paper. In brief, for a given feature set and data, essential elements in the interpretation of MLP performance include an experimental procedure, a concise performance measure, and variation of the number of free parameters (weights) through the variation of the number of hidden layers and their sizes.

A. Experimental Procedures

We distinguish here between experiments and experiment sets. In an experiment, a modeling architecture is constructed and evaluated using:
1) the average and standard deviation of the relative error of the model on an independent test dataset;
2) an error tolerance τ.
An experiment is repeated 20 times with different initial conditions, training, and independent testing datasets to yield an experiment set. The average and standard deviation of the two measures above are then computed.

The details of the experimental procedure are shown below.
1) A set of predictor variables (input features) is defined.
2) A Muscle file (MUME's neural network configuration file [14]) is created. Parameters such as the training convergence criteria and the training output error margin ε_m are set.
3) A program is used to randomly generate, from the master layout database, training and testing sets containing N_tr and N_te patterns, respectively.
4) The MUME neural-network simulation program is executed to train the neural networks.
5) When training is completed (either the convergence criteria or the maximum number of iterations is reached), the generalization performance of the resulting neural-network shape function model is evaluated on the test set to produce the estimates of the complementary dimension for each test pattern.
6) The predicted and expected values of the dimension D being modeled are used by a number of postprocessing programs to compute:
   a) the relative dimension error for a pattern in the test set
      ε_{e,p} = |D_{e,p} − D_{a,p}| / D_{e,p}   (8)
   b) the average of the relative dimension errors
      ε_e = (1/N_te) Σ_{i=1}^{N_te} ε_{e,i}   (9)
   c) the standard deviation of the relative dimension errors
      σ_e = √{ [N_te Σ_{i=1}^{N_te} ε_{e,i}² − (Σ_{i=1}^{N_te} ε_{e,i})²] / [N_te(N_te − 1)] }   (10)
   d) the tolerance-specific performance C_{e,τ} (percentage of patterns for which the target dimension is predicted to within τ).


7) Steps 3)–6) above are repeated N_e times (the number of trials) with a different random selection of the training and testing sets to produce an experiment set, for which we compute the following measures:
   a) the experiment set relative error average
      ε̄ = (1/N_e) Σ_{e=1}^{N_e} ε_e   (11)
   b) the associated standard deviation
      σ = √{ [N_e Σ_{e=1}^{N_e} ε_e² − (Σ_{e=1}^{N_e} ε_e)²] / [N_e(N_e − 1)] }   (12)
   c) the tolerance-specific average performance
      C_τ = (1/N_e) Σ_{e=1}^{N_e} C_{e,τ}   (13)
   d) the associated standard deviation
      σ_C = √{ [N_e Σ_{e=1}^{N_e} C_{e,τ}² − (Σ_{e=1}^{N_e} C_{e,τ})²] / [N_e(N_e − 1)] }.   (14)
These measures are then used to assess and compare shape function estimation architectures.
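A sketch of the postprocessing measures, assuming the per-pattern expected and estimated dimensions are available as arrays. It follows (8)–(10) for a single experiment and reuses the same average/standard-deviation form for the experiment-set measures (11)–(14); the numbers shown are illustrative, not the paper's data.

```python
import numpy as np

def relative_errors(d_expected, d_actual):
    # (8): relative dimension error per test pattern
    d_expected, d_actual = np.asarray(d_expected, float), np.asarray(d_actual, float)
    return np.abs(d_expected - d_actual) / d_expected

def mean_and_std(values):
    # (9)/(10) for one experiment; the same form gives (11)-(14) over experiments
    v = np.asarray(values, float)
    n = len(v)
    mean = v.sum() / n
    std = np.sqrt((n * (v ** 2).sum() - v.sum() ** 2) / (n * (n - 1)))
    return mean, std

def tolerance_performance(errors, tau):
    # C_{e,tau}: percentage of test patterns predicted to within tolerance tau
    return 100.0 * float(np.mean(np.asarray(errors) <= tau))

# Illustrative numbers only.
expected = np.array([100.0, 220.0, 85.0, 140.0])
estimated = np.array([92.0, 260.0, 90.0, 120.0])
err = relative_errors(expected, estimated)
print(mean_and_std(err), tolerance_performance(err, tau=0.2))
print(mean_and_std([0.12, 0.10, 0.15]))   # e.g. averaging eps_e over experiments
```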
Note that for each experiment, and unless otherwise specified, 76 patterns were selected at random from the data to form a training set (N_tr = 76) and the remaining 30 patterns were used for independent testing (N_te = 30).

B. Further Assessments of Shape Function Prediction Performance
The evaluation of the quality of the shape functions predicted by
our proposed method is not a straightforward process. The reason is
that a “direct” assessment of a shape function model would require
more than two layout patterns of the same circuit, which is very
difficult to assemble. To overcome this, we introduce two additional
“checks” to the evaluation of (13) and (14).
Check 1: The output of the shape function estimator should always decrease or remain unchanged when the PID increases, as illustrated in Fig. 4. Deviations from this desired behavior would indicate significant problems in the modeling architecture, its training, and/or its input feature representation.
Check 2: The response of the model to features of “similar”
circuits has to be “similar.” Consider a circuit C1 in the test set. C1’s
features will be used to produce a prediction for its shape function.
The “closest” circuit C2 to C1 will be found by scanning the test set
for the circuit with the smallest Euclidean distance to the predictor
variables of C1. The model used to predict the shape function of C1
should produce a dimension estimate for C2 (given the PID of C2
from the database) which should be “close” to the predicted shape
function curve of C1.
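Both checks are straightforward to automate. In the sketch below, a shape function is taken to be a list of (PID, estimate) points; Check 1 verifies that the estimate never increases with the PID, and Check 2 locates the nearest test circuit in Euclidean feature space and measures how far its estimate falls from the first circuit's predicted curve. The tolerance and interpolation choices are assumptions, not taken from the paper.

```python
import numpy as np

def check1_monotone(points, slack=1e-6):
    """Check 1: the estimated dimension should decrease or stay unchanged
    as the PID increases (points are (pid, estimate) pairs)."""
    estimates = [e for _, e in sorted(points)]
    return all(b <= a + slack for a, b in zip(estimates, estimates[1:]))

def check2_similarity(c1_features, c1_curve, test_features, test_pids,
                      ddp, rel_tol=0.15):
    """Check 2: the circuit closest to C1 (smallest nonzero Euclidean
    distance over the predictor variables) should receive an estimate,
    at its own PID, close to C1's predicted curve."""
    dists = np.linalg.norm(np.asarray(test_features, float)
                           - np.asarray(c1_features, float), axis=1)
    dists[dists == 0] = np.inf                      # exclude C1 itself
    j = int(np.argmin(dists))                       # index of closest circuit C2
    pid2 = test_pids[j]
    est2 = float(ddp(np.append(test_features[j], pid2)))
    pids, ests = zip(*sorted(c1_curve))             # interpolate C1's curve at C2's PID
    on_curve = float(np.interp(pid2, pids, ests))
    return abs(est2 - on_curve) <= rel_tol * on_curve, j
```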

C. Shape Function Modeling Using Four Features

Fig. 9. Shape function modeling using a four-feature DDP.

TABLE II. TRAINING AND GENERALIZATION PERFORMANCE OF FOUR-FEATURE DDP's FOR SHAPE FUNCTION MODELING.

A shape function prediction system using a four-feature DDP is shown in Fig. 9. Note that port and packing density information is not used. The output of the DDP MLP represents the complementary dimension to the PID input, so if the PID represents the width, the output of the DDP MLP represents the height.

We show in Table II the training and generalization performance for a number of MLP DDP architectures. We show under the "training performance" column the percentage of patterns in the training sets for which the complementary dimension has been estimated to within the specified tolerance τ. Note that the four-layer networks achieve better training performance. The table shows under the "generalization performance" column the average and standard deviation [according to (11) and (12)] of the estimated complementary dimension errors over the 20 runs and for the best of these runs. The standard deviation of the generalization error is a useful indication of how consistent the complementary dimension prediction is.


Fig. 10. Shape functions produced by a four-feature DDP-based model.

Fig. 11. Shape functions produced by a ten-feature DDP-based model.

It is also a useful measure to compare networks with different architectures and the effects of input features on the performance. Note that the error margin ε_m is not used in the evaluation of the generalization performance (i.e., any error counts). The four-layer networks also achieve better generalization performance on average. The best performance is achieved by the 4-8-5-1 architecture.

TABLE III. PERFORMANCE OF THE FOUR-FEATURE SHAPE FUNCTION MODEL (DDP 4-8-5-1) ON THE TRAINING AND TESTING SETS.

Table III shows the performance of this architecture on the training and testing sets when the tolerance τ is swept from 0.05 (5% of target) to 0.5 (50% of target). The average and the standard deviation of the number of patterns that the trained 4-8-5-1 MLP DDP has correctly estimated within τ are shown as percentages in the third column for the training set and in the fourth for the testing set. Note that in the case of the training set, this is different from the training performance (that is, it is not the same as training the net at a varying error margin).

The numbers in Table III basically say that not only does the network have poor generalization performance, but it also has difficulties learning the training set. This is not surprising given that we are trying to create a model that is too complex to be constructed from the information conveyed by the four input features. This is also illustrated in the poor quality shape functions produced by the model, of which we show two examples in Fig. 10. A simple inspection is sufficient to see that the number of transistors, the number of netlists, the average transistor area, and the PID do not convey sufficient information for accurate dimension-dependent estimation.

D. Shape Function Modeling Using Ten Features

TABLE IV. TRAINING AND GENERALIZATION PERFORMANCE OF TEN-FEATURE DDP's FOR SHAPE FUNCTION MODELING.

Given the poor results obtained with the four features of the previous section, an obvious improvement is to incorporate the aggregate number of ports per type as predictor variables. Note that no port-per-side information is used (that is, no context information). Again, the output of the DDP MLP represents the complementary dimension to the PID input, so if the PID represents the width, the output represents the height.

We show in Table IV the training and generalization performance of a number of MLP DDP architectures. What we see in this table is an improvement over the four-feature modeling architecture.


The average generalization performance has improved by two percentage points.

We show in Fig. 11 shape functions produced by this model for the same patterns as in our previous illustration. Clearly there are some improvements, but they are not significant. For a large number of circuit patterns in the testing sets, the shape functions produced by the ten-feature models have behavior similar to those shown in Fig. 11. The dimension estimated by the DDP decreases as the complementary input dimension increases, up to a point, and then both dimensions seem to unexpectedly develop a monotonically increasing relationship. This suggests that, like the four-feature-based DDP, the ten-feature-based DDP is failing to capture the essence of the shape function model.

Because all possible direct features have been explored, there are three possible reasons that could be responsible for the poor estimation performance:
1) inadequate (DDP) modeling architecture;
2) insufficient training data;
3) insufficient information in the input representation (features).
We have investigated many potential architectures, and we could not detect a change in the DDP estimation performance. As the data are scarce, the only option we had was to increase the size of the training set at the expense of the size of the testing set. This was tried and no noticeable improvement was observed. As for the adequacy of the input representation, the only option available to us (as all the features have been assessed) was to explore the impact of "hidden features." The obvious choice was to target the use of the number of contacts as a feature, assuming that it could itself be estimated accurately from the features of Table I.

E. Including Number of Contacts in Shape Function Modeling

We argue here that the number of contacts is a valuable hidden feature in shape function prediction, and we support this argument later in the paper with experimental results.

It is known that the number of contacts or vias [6] can affect dimension prediction when contextual information is used as predictor variables. In shape function prediction no contextual information can be used in a simple fashion. Because the number of contacts directly affects area, and a prescribed input dimension is used as a predictor variable, we can expect that the extraction of a relationship between the number of contacts, the PID, and the complementary dimension to be predicted can be achieved by our modeling technique. We have performed a number of experiments where the number of contacts was used as an additional input to the implementations described in the previous two sections. We discuss these experiments and their results in the next few sections. Following these experiments we will describe how the number of contacts can be accurately estimated using features from Table I.

F. Including the Number of Contacts as an Input to a Ten-Feature DDP

Fig. 12. Including the number of contacts to a ten-feature DDP.

TABLE V. TRAINING AND GENERALIZATION PERFORMANCE OF ELEVEN-FEATURE DDP's FOR SHAPE FUNCTION MODELING.

TABLE VI. PERFORMANCE OF THE 11-FEATURE DDP (11-6-1) ON THE TRAINING AND TESTING SETS.

The modeling architecture is shown in Fig. 12. Again, for comparison purposes, the training and testing sets used here are those used in the previous experiments. The training and generalization performance of this architecture for various DDP MLP's is shown in Table V. Clearly, the number-of-contacts feature yields a significant improvement in modeling accuracy compared to all models described so far.

The best performance is achieved by an 11-6-1 DDP MLP. We show in Table VI the performance of this DDP on the training and testing sets when the tolerance is swept from 0.05 to 0.5. Note the significant improvement on the testing set in particular. For an error margin of 20%, over 81% of the testing patterns are correctly estimated. The shape functions for the same patterns illustrated previously are shown in Fig. 13. Note the significant improvement in the shape of these functions, in particular with respect to Check 1 and the performance of the ten-feature-based model. Before we look at how this architecture performs with regard to Check 2, we look at the estimation of the number of contacts and its integration in an overall shape function estimation system.


Fig. 13. Shape functions produced by an 11-feature DDP with the number-of-contacts feature provided from the database.

Fig. 14. Shape functions produced by the 11-feature DDP with the number-of-contacts feature predicted by the NCP.

TABLE VII. TRAINING AND GENERALIZATION PERFORMANCE OF THE NCP's.

TABLE VIII. GENERALIZATION PERFORMANCE OF THE CASCADED 11-6-1 DDP AND 9-11-5-1 NCP.

G. Number of Contact Prediction for Shape Function DDP


The number of contact prediction (NCP) systems we have investigated build on the experience described in [6]. Note that, in comparison to the work reported in [6] (on the prediction of actual


Fig. 15. Comparison of shape functions generated with the 11-feature-based DDP with the number of contacts provided from the database and from the NCP.

dimensions of FCCL), the input feature representation in our present context is different and is restricted to the features of Table I. Because the NCP will be connected to a DDP, the data used for the evaluation of the NCP and the DDP should be consistent. That is, no training data of the NCP should be part of the testing sets of the DDP and vice-versa. The easiest way to maintain consistency is to use the 11-feature DDP experiment datasets for the training and evaluation of the NCP.

Our investigations of the NCP explored various combinations of circuit features and multilayer perceptron architectures. We have grouped the investigated architectures according to their input representations (features):
three-feature NCP architectures: Ntrans, Taverage-area, Nnodes;
four-feature NCP architectures: Ntrans, Taverage-area, Nnodes, Pack;
five-feature NCP architectures: Ntrans, Taverage-area, Nnodes, Pack, Porttotal;
nine-feature NCP architectures: Ntrans, Taverage-area, Nnodes, Pack, Porttotal, Datatotal, Ctltotal, Clktotal, Powertotal.

We show in Table VII the training and generalization performance of the best performing architecture for each of these groups. The best performing architecture in each group is shown in italics, and the overall best performing architecture is shown in bold (9-11-5-1), which has an average error of 9.21% with a very small standard deviation.

H. Incorporating the NCP in Shape Function Modeling

The DDP (11-6-1) and NCP (9-11-5-1) are now cascaded to form an 11-feature shape function prediction system (with the PID input of the DDP supplied by the linear function generator). The generalization performance of the cascaded scheme is shown in Table VIII. Obviously the performance is slightly degraded compared to the 11-feature shape function estimation where the number of contacts was supplied from the database.

We show in Fig. 14 the estimated shape functions for patterns 4 and 12. Obviously the improvements introduced by the number of contacts are maintained when this number is estimated. This result is very pleasing.
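A sketch of the cascade just described, with both trained networks represented by placeholder callables: the NCP estimates the contact count from its own feature subset, the estimate is appended to the DDP inputs, and the linear function generator sweeps the PID to trace the shape function. Feature contents and input ordering are assumptions.

```python
import numpy as np

def cascaded_shape_function(ncp, ddp, ncp_features, ddp_features, step=0.1):
    """NCP -> DDP cascade. `ncp` maps its feature vector to an estimated
    (normalized) number of contacts; `ddp` maps [ddp_features, contacts, PID]
    to the complementary dimension. Returns (PID, dimension) points."""
    contacts = float(ncp(np.asarray(ncp_features, float)))
    points = []
    for pid in np.arange(step, 1.0 + 1e-9, step):    # linear function generator
        inputs = np.concatenate([ddp_features, [contacts, pid]])
        points.append((float(pid), float(ddp(inputs))))
    return points

# Dummy stand-ins for the trained 9-11-5-1 NCP and 11-6-1 DDP.
dummy_ncp = lambda f: 0.3 + 0.1 * f.mean()
dummy_ddp = lambda x: max(0.05, 0.9 - 0.7 * x[-1] + 0.2 * x[-2])
print(cascaded_shape_function(dummy_ncp, dummy_ddp,
                              ncp_features=np.full(9, 0.4),
                              ddp_features=np.full(9, 0.4),
                              step=0.25))
```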


Fig. 16. The shape functions of pattern 4 and pattern 10, the closest to pattern 4 according to a Euclidean distance based on the 11 input features and the prescribed input dimension.

We show in Fig. 15 shape functions produced by the 11-6-1 DDP for four different (representative) patterns when the number of contacts is sourced from the database and when it is estimated. Overall, the differences are not significant.

As an implementation of Check 2, we show in Fig. 16 the shape functions for patterns 4 and 10. Pattern 10 is the closest to pattern 4 in terms of Euclidean distance (smallest but nonzero Euclidean distance). Although the circuits associated with patterns 4 and 10 are slightly different, the similarity in the estimated shape functions provides further evidence that the modeling reflects important aspects of the desired dimensional relationship.

I. Operation Speed

The operation speed of the overall prediction system is independent of the circuit at hand and is directly proportional to the size of the MLP's. The speed can be computed by counting the number of multiply/accumulate and neuron transfer function operations according to (1)–(3). For either of the neural networks, the number of operations for a feedforward evaluation is approximated by

Σ_{l=1}^{L} N_l N_{l−1} (multiply and accumulate operations) + Σ_{l=1}^{L} N_l (neuron transfer functions, (2) or (3)).

On a Sparc workstation Ultra 1, the speed for a single feedforward pass through the whole system is less than 200 ms.
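Applying the expression above, and treating the "11-6-1" and "9-11-5-1" labels as the layer sizes N_0, N_1, ..., gives a quick illustrative operation count for the two cascaded networks (bias weights are ignored, as in the expression):

```python
def feedforward_ops(layer_sizes):
    # sum over layers of N_l * N_{l-1} multiply/accumulates plus N_l transfer functions
    macs = sum(layer_sizes[l] * layer_sizes[l - 1] for l in range(1, len(layer_sizes)))
    transfers = sum(layer_sizes[1:])
    return macs, transfers

for name, sizes in [("DDP 11-6-1", [11, 6, 1]), ("NCP 9-11-5-1", [9, 11, 5, 1])]:
    macs, transfers = feedforward_ops(sizes)
    print(f"{name}: {macs} multiply/accumulates, {transfers} neuron transfer functions")
```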

V. CONCLUSIONS

This paper has presented a novel technique for custom cell layout shape function estimation based upon machine learning. Our results show that a dimension can be predicted to within 11.52% average error and that estimation accuracy is improved when the aggregate numbers of ports for each of data, control, clock, and power are provided.

The accuracy and speed efficiency make our proposed method ideal for use in fast design exploration during floorplanning, timing, and skew analysis by CAD systems.

The proposed shape function generation system was evaluated on nMOS circuits and layouts, and the method is applicable to any technology for which a database of layouts can be gathered.

The results of the experiments demonstrate the importance of port information in shape function modeling. The results also demonstrate that machine learning in the context of neural computing can take advantage of the port information in conjunction with the PID to yield improvements.

The presence of three- and five-micron nMOS circuits in our database shows a fabrication process independence in our shape function modeling approach. The question as to whether constructed models can be used to generate shape functions of CMOS circuits is open to debate and requires further research. As the model input variables (features) are all lambda-based, one can expect that generated models would apply to other lambda-based technologies. This is not necessarily true because, although no nMOS-specific measures are used in our database, the layout dimensions used in the MLP training inherently reflect the nMOS organization of layout. CMOS layouts are organized differently as they take advantage of different design rules (e.g., well spacing). Based on our modeling experience, however, we anticipate improved performance when CMOS-based layout shape function estimation is targeted.

REFERENCES

[1] G. Zimmerman, "A new area and shape function estimation technique for VLSI layouts," in Proc. IEEE/ACM 25th Design Automation Conf., 1988, pp. 60–65.
[2] L. Stockmeyer, "Optimal orientations of cells in slicing floorplan designs," Inform. Contr., vol. 57, pp. 91–101, 1983.
[3] T. Wang and D. Wong, "Efficient shape curve construction in floorplan design," in Proc. European Conf. Design Automation, 1991, pp. 356–360.
[4] F. Kurdahi and A. Parker, "Techniques for area estimation of VLSI layouts," IEEE Trans. Computer-Aided Design, vol. 8, pp. 81–92, Jan. 1989.
[5] F. Kurdahi and C. Ramachandran, "LAST: A Layout Area and Shape function esTimator for high level applications," in Proc. European Conf. Design Automation, 1991, pp. 351–355.
[6] M. Jabri and X. Li, "Predicting the number of contacts and dimensions of full-custom integrated circuit blocks using neural networks techniques," IEEE Trans. Neural Networks, vol. 3, pp. 146–153, Jan. 1992.
[7] J. H. Friedman, "Multivariate adaptive regression splines," Ann. Statist., vol. 19, pp. 1–141, Mar. 1991.
[8] J. Zemrole, personal communication.
[9] K. Hornik, M. Stinchcombe, and H. White, "Multilayer feedforward networks are universal approximators," Neural Networks, vol. 2, pp. 359–366, 1989.
[10] J. Hertz, A. Krogh, and R. Palmer, Introduction to the Theory of Neural Computation. Reading, MA: Addison-Wesley, 1991.
[11] J. Newkirk and R. Mathews, The VLSI Designer's Library. Reading, MA: Addison-Wesley, 1983.
[12] X. Chen and M. Bushnell, "A module area estimator for VLSI layout," in Proc. IEEE/ACM 25th Design Automation Conf., 1988, pp. 60–65.
[13] D. Rumelhart and J. McClelland, Parallel Distributed Processing: Explorations in the Microstructure of Cognition, vol. 2. Cambridge, MA: MIT Press, 1986.
[14] M. Jabri, E. Tinker, and L. Leerink, "MUME—A multinet multiarchitecture neural simulation environment," 1994, pp. 229–247.
