Cell Shape Estimation
I. INTRODUCTION
In full-custom very large scale integration (VLSI) design, a cell refers to the lowest level circuit in the design hierarchy, as shown in Fig. 1. For a given circuit context, the masks of the cell can be designed in various ways. We show in Fig. 2 the circuit schematic of an inverter cell designed in an nMOS technology and an associated mask layout. Different layouts of the same circuit are called "realizations." We show in Fig. 3 some possible layout realizations of the inverter in Fig. 2(a). The width and height of different realizations can be used to define a discrete shape function, as shown in Fig. 4.

Fig. 2. (a) Schematic of a circuit, with T1 and T2 being depletion- and enhancement-mode nMOS transistors, respectively. (b) Mask layout of the circuit.

Features that characterize and differentiate full custom cell layout (FCCL) realizations are (Fig. 2):

1) Circuit Specific: schematics of the circuit (transistor/gate count and area, number of netlists; Fig. 5), and
2) Contextual: floorplan information, i.e., how the cell layout fits with its surrounding interconnection port positions.

Different realizations can be produced by varying the items in 2) above. A continuous shape function represents a continuous relationship between the width and the height (Fig. 4).

Shape functions are useful in both automated and manual design tasks. In automated design, they are used in chip-level area optimization and floorplanning [1]-[3]. Shape functions of lower level modules are assumed to exist and are used to generate the minimal-area overall floorplan by selecting the realization of each module so as to minimize a chip-level cost function. The assumption that FCCL shape functions (discrete or continuous) exist is valid for library-based design systems: several layout realizations of a circuit may exist, and the layout dimensions of these realizations can be used to define a discrete shape function. For standard cell based designs, the shape functions of cells and modules can be estimated accurately using empirical [4] and constructive [5] techniques.

In manual design tasks, shape function estimators allow designers to explore layout dimension options and to perform manual floorplanning. For example, cell dimension estimates can assist in determining wiring delays and clock skews.

This paper is concerned with the automatic estimation of FCCL shape functions from circuit schematics. In other words, we are concerned with the task of "predicting" the width and height of unknown realizations given features that can be extracted from the circuit schematic and contextual descriptions, without knowledge of the layout.

The automatic prediction of cell shape functions in a full-custom design methodology is a complex process because of the large number of practical layout options. The creation of a shape function prediction model is further complicated by the nonlinear relationships that exist between cell features and possible layouts [6].

This paper presents a novel machine-learning-based technique for the generation of shape function prediction models. The technique is based on multilayer neural networks trained using gradient descent to predict the width or height of FCCL's and the number of contacts required by a circuit. To the best of our knowledge, this is the first time that the estimation of shape functions of full-custom layouts from schematics has been attempted. The technique has been evaluated on a database of real cells and shows great potential for practical exploitation in CAD systems.
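The notion of a discrete shape function introduced above (the width/height pairs of a circuit's realizations) can be illustrated with a short sketch. The realization dimensions below are invented for illustration; in the paper they come from actual FCCL realizations of a single circuit.

```python
# Sketch: building a discrete shape function from layout realizations.
# A realization is a (width, height) pair; the shape function keeps,
# for increasing width, the smallest achievable height.

def discrete_shape_function(realizations):
    """Keep only the Pareto-optimal (width, height) points."""
    points = sorted(realizations)          # sort by width, then height
    frontier = []
    best_h = float("inf")
    for w, h in points:
        if h < best_h:                     # strictly better height
            frontier.append((w, h))
            best_h = h
    return frontier

# Hypothetical realizations of one circuit:
realizations = [(20, 60), (25, 50), (30, 40), (30, 55), (45, 30)]
print(discrete_shape_function(realizations))
# -> [(20, 60), (25, 50), (30, 40), (45, 30)]
```

The dominated point (30, 55) is dropped because a 30-wide realization of height 40 also exists.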
Manuscript received July 17, 1996. This paper was recommended by Associate Editor M. Sarrafzadeh. The authors are with the Computer Engineering Laboratory, Department of Electrical Engineering, The University of Sydney, NSW 2006 Australia. Publisher Item Identifier S 0278-0070(98)05195-1.

II. SHAPE FUNCTION MODELING

The modeling of shape functions can be treated as a nonparametric prediction problem. To achieve this, one has to define features (the "predictor variables"), collect data or observations, and use the collected data to build an estimation model.
614 IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 17, NO. 7, JULY 1998
Fig. 6. The packing density of a mask layout reflects the amount of "dead space" in the layout. The packing density of the layout in (a) is higher than that of (b). Dead space is any wasted space in the layout, for example due to layout pitch matching; spacing required by design rules is excluded.
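The packing-density idea of Fig. 6 can be sketched as the fraction of the layout bounding box occupied by actual geometry. The helper and the rectangle list below are hypothetical; real layouts require polygon handling and must not count design-rule spacing as dead space.

```python
# Sketch: packing density = used area / bounding-box area.
# rects is a hypothetical list of non-overlapping (width, height)
# feature rectangles inside a bbox_w x bbox_h cell.

def packing_density(rects, bbox_w, bbox_h):
    used = sum(w * h for w, h in rects)
    return used / (bbox_w * bbox_h)

# A 100 x 100 cell whose features cover 6500 square units:
print(packing_density([(50, 80), (50, 50)], 100, 100))  # -> 0.65
```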
TABLE I
FEATURES USED IN SHAPE FUNCTION MODELING EXPERIMENTS

1) Prescribed Input Dimension: The contextual information can be represented in a simpler but indirect fashion. This is possible in shape function modeling because we can aim at estimating one dimension given the cell features and prescribed values of the second dimension. For example, we can develop a model that estimates the width of an FCCL given the cell-specific features, the aggregate number of ports of each type, and prescribed values of the height. In this case, the height of the cell becomes what we call a prescribed input dimension (PID). As the height of the FCCL is specified and the aggregate number of ports of each type is also provided, the shape function prediction model generation technique can extract a relationship that represents the floorplan-based features. As we will see later in the paper, the provision of a PID is simply and conveniently implemented using a linear function generator.

Therefore, our modeling experiments have used the ten features (predictor variables) shown in Table I. We have not included the PID in this table as it is automatically generated. The modeling experiments we describe later also evaluate the impact of including the aggregate number of ports of each type.

C. Prediction Model Construction Techniques

Many automatic data modeling techniques have been developed over the last few decades. Traditional techniques make use of "fitting" (or regression) algorithms to fit some observation data using a set of basis functions. Polynomial fitting, spline fitting, and simple linear regression are the most widely used. Depending on the prediction problem, some modeling techniques are more suitable than others. For instance, multivariate adaptive regression splines (MARS) [7] is a technique that is more suitable to problems where the input variables (features) need not be highly combined in a linear or nonlinear fashion to produce accurate models [8]. Our experiments show that complex combinations of the input feature variables arise in the modeling of FCCL dimensions and shape functions, and they confirm that modeling techniques such as MARS do not lead to predictions as accurate as those of multilayer perceptrons.

Multilayer perceptrons can provide effective prediction models. It has been shown that a three-layer multilayer perceptron (MLP) network is a universal approximator [9]. A typical MLP architecture is shown in Fig. 7. The input layer has N + 1 pin neurons (counting the bias input). These pin neurons have a unity transfer function. Their outputs are fed through weights to the next layer, which contains nonlinear neurons. Each pin in the input layer is connected to every neuron in the first layer. Each neuron of the first layer feeds its output through weights to all nonlinear neurons of the second layer, and so on, until the output layer.

The weight connecting neuron i to neuron j is denoted w_{ij}. Neurons are indexed from zero (the bias pin neuron) to N, where

    N = 1 + \sum_{l=0}^{L-1} N_l

and N_l is the number of neurons in layer l. When pattern n is applied, the output of neuron i in layer l is

    y_i(n) = f(\mathrm{net}_i(n))    (1)

where

    \mathrm{net}_i(n) = \sum_j w_{ij} \, y_j(n)

with the sum taken over the neurons j feeding neuron i. The function f(\cdot) is usually a squashing function such as the sigmoid function

    f_s(x) = \frac{1}{1 + e^{-x}}    (2)

or the hyperbolic tangent function

    f_t(x) = \tanh(x).    (3)
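The forward pass in (1)-(3) can be sketched as follows: each neuron computes a weighted sum of the previous layer's outputs (including a bias input held at one) and passes it through a squashing function. The weights below are illustrative, not the trained values from the paper.

```python
import math

def sigmoid(x):
    """Sigmoid squashing function, equation (2)."""
    return 1.0 / (1.0 + math.exp(-x))

def layer_forward(inputs, weights, f=sigmoid):
    """weights[i][j]: weight from input j to neuron i; inputs already
    include the bias value 1.0 as their last element."""
    return [f(sum(w * x for w, x in zip(row, inputs))) for row in weights]

def mlp_forward(features, layers, f=sigmoid):
    """Propagate a feature vector through all layers, equation (1)."""
    y = list(features)
    for weights in layers:
        y = layer_forward(y + [1.0], weights, f)   # append bias input
    return y

# Illustrative 2-input, 2-hidden-neuron, 1-output network:
layers = [[[0.5, -0.3, 0.1], [0.2, 0.4, -0.1]],   # hidden layer
          [[1.0, -1.0, 0.0]]]                      # output layer
print(mlp_forward([0.8, 0.2], layers))
```

Replacing `f=sigmoid` with `math.tanh` gives the hyperbolic-tangent variant of equation (3).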
Both hyperbolic tangent and sigmoid functions are used in the networks described in this paper.

In the case of estimation, the inputs to the MLP are the features and its outputs are the variables that need to be estimated. An MLP is trained to estimate by means of an optimization algorithm. The most commonly used algorithm is backpropagation [13]. Backpropagation performs training by minimizing the network output error. We define the error for output neuron k when pattern n is applied as

    e_k(n) = d_k(n) - y_k(n)    (4)

where y_k(n) is the output of neuron k and d_k(n) is its desired output for pattern n. The pattern mean-squared error for n is defined as

    \varepsilon(n) = \frac{1}{2} \sum_{k \in \Omega} e_k^2(n)    (5)

where \Omega is the set of all output neurons. The training algorithm determines the values of the weights so as to minimize the total mean-squared error

    \varepsilon_{\mathrm{Total}} = \sum_n \varepsilon(n).    (6)

Training stops when \varepsilon_{\mathrm{Total}} reaches a small prescribed value. Once the weights have been determined, the MLP is evaluated on an independent set of patterns (called the testing set) to assess its generalization capability. The quantification of the generalization capability is problem dependent. In the case of shape function modeling, an error margin \varepsilon_m is defined and the output of the MLP for pattern n is said to be correct if

    \varepsilon(n) \le \varepsilon_m.    (7)

The error margin \varepsilon_m is also often used in the training algorithm to instruct it to neglect corrections to the weights when (7) holds for a particular pattern. The value of \varepsilon_m can be critical. If it is too small, an over-training effect can take place: when training completes, the MLP has memorized the training patterns well, but its generalization performance (on the testing set) degrades compared to its performance when larger values of \varepsilon_m are used. Techniques such as cross-validation [10] can be used to avoid over-training. A third set of patterns, called the cross-validation dataset (CVD), is created (patterns are removed from the training and/or testing set). During training, the CVD is applied through the MLP and the corresponding total mean-squared error is recorded. Training is halted when this error shows some significant increase. However, in applications where data is limited, the strength of cross-validation is diminished because the training and testing data may suffer in size, and the CVD may not be sufficiently large to confidently detect good stopping conditions. In this case, the best stopping criteria can be identified by running a large number of experiments using various stopping criteria and values of \varepsilon_m.

III. SHAPE FUNCTION MODELING ARCHITECTURE

The shape function modeling architecture proposed in this paper consists of two subsystems: a complementary dimension estimation module and a linear function generator. The complementary dimension estimation module estimates a dimension of the FCCL given input features from Table I and a PID. To produce a shape function, the linear function generator produces a monotonically increasing series of numbers between zero and one at a predefined step size. In the experiments we report below, step sizes of 0.1 and 0.01 are used. Although the shape functions produced by this architecture are discrete, the actual underlying model is continuous. Because the MLP is in effect trained to predict a dimension given the complementary dimension (PID), we call such configurations of MLP's dimension-dependent prediction (DDP).

A. Cells and Layout Database

Our database was collected from two sources. The first source is the VLSI designer's library [11], which was used in the full-custom dimension estimation experiments of [12] and [6]. The layouts from the second source are full-custom cells developed in our laboratory in the 1980's. Altogether, 53 layouts have been collected. They are all of digital circuits, nMOS based, and represent a variety of functions. The layouts are lambda based, with some mapping to a 5-µm and others to a 3-µm fabrication process. Note that the database does not include regular arrays of structures such as PLA's, RAM's, and ROM's. Such blocks are commonly produced by "module generators"; their organizations are predefined, and highly accurate estimates of their dimensions and shape functions can be produced.
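The shape-function generation scheme of Section III (a linear function generator sweeping the PID while the DDP predicts the complementary dimension) can be sketched as follows. The DDP here is a stand-in callable, not the trained MLP from the paper.

```python
# Sketch of the Section III architecture: the linear function generator
# emits a monotonically increasing series of PID values in [0, 1] at a
# fixed step; the DDP is queried once per value.

def linear_function_generator(step=0.1):
    n = round(1.0 / step)
    return [i * step for i in range(n + 1)]     # 0.0, step, ..., 1.0

def shape_function(ddp, features, step=0.1):
    """Return (PID, predicted complementary dimension) pairs."""
    return [(pid, ddp(features + [pid]))
            for pid in linear_function_generator(step)]

# Hypothetical stand-in DDP: wider cells predicted shorter.
fake_ddp = lambda x: max(0.0, 1.0 - 0.8 * x[-1])
curve = shape_function(fake_ddp, features=[0.3, 0.5], step=0.25)
print(curve)
```

Even though the emitted curve is a discrete list of points, the underlying predictor is continuous, matching the remark in Section III.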
Fig. 8. The rotation of a pattern has the effect of introducing a new and important point of the shape function.
Although our database is based on nMOS circuits, the shape function modeling method described in this paper can be applied to other technologies for which layouts can be collected. The collection of data for such a task is not easy, as circuits and associated full-custom layouts are generally treated with secrecy. Hence, we did not have a choice in selecting the data, as only nMOS circuits and layouts were available.

For each circuit in the database, the features shown in Table I and the corresponding dimensions of the layout are extracted and stored to form a pattern. Because the features are independent of the orientation of the cell, the number of patterns was doubled by rotating the cells (swapping the width and height of each cell) to produce a total of 106 patterns. As the database does not contain patterns of equal width and height, no pattern redundancy is introduced.

The rotation of the original 53 patterns yields a very convenient effect besides the increase in the amount of training and testing data. This effect is illustrated in Fig. 8: for each pattern, a new symmetrical point of the circuit shape function is introduced by the rotation.

The patterns in the database are then normalized by dividing each feature value by the maximum of that feature in the database. The resulting patterns are thus bounded between zero and one. Other normalization methods were also assessed but did not yield better results.

The fact that each FCCL in the database exists in its original and rotated forms ensures that the trained MLP is not biased toward a particular dimension. Hence, where a width- or a height-dimension estimation MLP was used, proper training should yield on average similar behavior for each MLP.

IV. EXPERIMENTS AND RESULTS

The investigation of MLP architectures capable of performing a mapping/estimation task is an iterative design process. Determining a suitable architecture requires understanding of the learning algorithm, the relationship between the dimensions of the input and output spaces, the architecture of the network (number and sizes of hidden layers, neuron transfer functions), and the amount of data. A detailed account of MLP exploration procedures is beyond the scope of this paper. In brief, for a given feature set and data, essential elements in the interpretation of MLP performance include an experimental procedure, a concise performance measure, and variation of the number of free parameters (weights) through the variation of the number of hidden layers and their sizes. A model's performance is characterized by:

1) the average and standard deviation of the relative error of the model on an independent test dataset;
2) an error tolerance.

An experiment is repeated 20 times with different initial conditions and different training and independent testing datasets to yield an experiment set. The average and standard deviation of the two measures above are then computed.

The details of the experimental procedure are as follows.

1) A set of predictor variables (input features) is defined.
2) A Muscle file (MUME's neural-network configuration file [14]) is created. Parameters such as the training convergence criteria and the training output error margin \varepsilon_m are set.
3) A program is used to randomly generate, from the master layout database, training and testing sets containing N_{tr} and N_{te} patterns, respectively.
4) The MUME neural-network simulation program is executed to train the neural networks.
5) When training is completed (either the convergence criteria or the maximum number of iterations is reached), the generalization performance of the resulting neural-network shape function model is evaluated on the test set to produce estimates of the complementary dimension for each test pattern.
6) The predicted and expected values of the dimension D being modeled are used by a number of postprocessing programs to compute:

a) the relative dimension error for a pattern p in the test set

    \varepsilon_{e,p} = \frac{|D_{e,p} - D_{a,p}|}{D_{e,p}}    (8)

b) the average of the relative dimension errors

    \bar{\varepsilon}_e = \frac{1}{N_{te}} \sum_{i=1}^{N_{te}} \varepsilon_{e,i}    (9)

c) the standard deviation of the relative dimension errors

    \sigma_e = \sqrt{\frac{N_{te} \sum_i \varepsilon_{e,i}^2 - \bigl(\sum_i \varepsilon_{e,i}\bigr)^2}{N_{te}(N_{te} - 1)}}.    (10)
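The postprocessing measures (8)-(10) can be computed with a short sketch: the relative dimension error per test pattern, then its mean and sample standard deviation over the test set. The expected/actual values below are invented for illustration.

```python
import math

def relative_errors(expected, actual):
    """Per-pattern relative dimension error, equation (8)."""
    return [abs(e - a) / e for e, a in zip(expected, actual)]

def mean_error(errs):
    """Average of the relative errors, equation (9)."""
    return sum(errs) / len(errs)

def std_error(errs):
    """Sample standard deviation of the relative errors, equation (10)."""
    n = len(errs)
    s, s2 = sum(errs), sum(e * e for e in errs)
    return math.sqrt((n * s2 - s * s) / (n * (n - 1)))

expected = [100.0, 80.0, 120.0]   # invented dimensions
actual   = [ 90.0, 84.0, 120.0]
errs = relative_errors(expected, actual)
print(errs, mean_error(errs), std_error(errs))
```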
Fig. 13. Shape functions produced by an 11-feature DDP with the number-of-contacts feature provided from the database.

Fig. 14. Shape functions produced by the 11-feature DDP with the number-of-contacts feature predicted by the NCP.

TABLE VII
TRAINING AND GENERALIZATION PERFORMANCE OF THE NCP's

TABLE VIII
GENERALIZATION PERFORMANCE OF THE CASCADED 11-6-1 DDP AND 9-11-5-1 NCP
Fig. 15. Comparison of shape functions generated with 11-feature-based DDP with the number of contacts provided from the database and from the NCP.
dimensions of FCCL), the input feature representations in our present context are different and are restricted to the features of Table I. Because the NCP will be connected to a DDP, the data used for the evaluation of the NCP and the DDP should be consistent. That is, no training data of the NCP should be part of the testing sets of the DDP, and vice versa. The easiest way to maintain consistency is to use the 11-feature DDP experiment datasets for the training and evaluation of the NCP.

Our investigations of the NCP explored various combinations of circuit features and multilayer perceptron architectures. We have grouped the investigated architectures according to their input representations (features):

three-feature NCP architectures: N_trans, T_average-area, N_nodes;
four-feature NCP architectures: N_trans, T_average-area, N_nodes, Pack;
five-feature NCP architectures: N_trans, T_average-area, N_nodes, Pack, Port_total;
nine-feature NCP architectures: N_trans, T_average-area, N_nodes, Pack, Port_total, Data_total, Ctl_total, Clk_total, Power_total.

We show in Table VII the training and generalization performance of the best performing architecture for each of these groups. The best performing architecture in each group is shown in italic, and the overall best performing architecture, shown in bold (9-11-5-1), has an average error of 9.21% with a very small standard deviation.

H. Incorporating the NCP in Shape Function Modeling

The DDP (11-6-1) and NCP (9-11-5-1) are now cascaded to form an 11-feature shape function prediction system (with the PID input of the DDP supplied by the linear function generator). The generalization performance of the cascaded scheme is shown in Table VIII. As expected, the performance is slightly degraded compared to the 11-feature shape function estimation where the number of contacts was supplied from the database.

We show in Fig. 14 the estimated shape functions for patterns 4 and 12. The improvements introduced by the number of contacts are maintained when this number is estimated. This result is very pleasing.
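The cascaded scheme can be sketched as follows: the NCP estimates the number-of-contacts feature from circuit features, and its output is appended to the DDP's input vector together with the PID from the linear function generator. Both networks are stand-in callables here, not the trained 9-11-5-1 NCP and 11-6-1 DDP.

```python
# Sketch of cascading the NCP into the DDP: the NCP's prediction
# becomes one of the DDP's input features.

def cascaded_prediction(ncp, ddp, circuit_features, pid):
    n_contacts = ncp(circuit_features)            # estimated feature
    return ddp(circuit_features + [n_contacts, pid])

# Hypothetical stand-ins for the two trained networks:
fake_ncp = lambda f: 0.4                          # normalized contact count
fake_ddp = lambda x: sum(x) / len(x)              # placeholder regressor
print(cascaded_prediction(fake_ncp, fake_ddp, [0.2, 0.6], pid=0.5))
```

Sweeping `pid` over the linear function generator's output then yields the cascaded system's shape function.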
We show in Fig. 15 the shape functions produced by the 11-6-1 DDP for four different (representative) patterns when the number of contacts is sourced from the database and when it is estimated. Overall, the differences are not significant.

As an implementation of Check 2, we show in Fig. 16 the shape functions for patterns 4 and 10. Pattern 10 is the closest to pattern 4 in terms of Euclidean distance (smallest but nonzero distance). Although the circuits associated with patterns 4 and 10 are slightly different, the similarity of the estimated shape functions provides further evidence that the modeling reflects important aspects of the desired dimension relationship.

I. Operation Speed

The operation speed of the overall prediction system is independent of the circuit at hand and is directly proportional to the size of the MLP's. The speed can be computed by counting the number of multiply/accumulate operations and neuron transfer function evaluations according to (1)-(3). For either of the neural networks, the number of operations for a feedforward evaluation is approximately

    \underbrace{\sum_{l=1}^{L} N_l N_{l-1}}_{\text{multiply and accumulate}} + \underbrace{\sum_{l=1}^{L} N_l}_{\text{neuron functions [(2) or (3)]}}.

On a Sparc Ultra 1 workstation, the time for a single feedforward pass through the whole system is less than 200 ms.

V. CONCLUSIONS

This paper has presented a novel technique for custom cell layout shape function estimation based upon machine learning. Our results show that a dimension can be predicted to within 11.52% average error and that estimation accuracy is improved when the aggregate numbers of data, control, clock, and power ports are provided.

The accuracy and speed efficiency make our proposed method ideal for fast design exploration during floorplanning, timing, and skew analysis by CAD systems.

REFERENCES

[1] G. Zimmerman, "A new area and shape function estimation technique for VLSI layouts," in Proc. IEEE/ACM 25th Design Automation Conf., 1988, pp. 60-65.
[2] L. Stockmeyer, "Optimal orientations of cells in slicing floorplan designs," Inform. Contr., vol. 57, pp. 91-101, 1983.
[3] T. Wang and D. Wong, "Efficient shape curve construction in floorplan design," in Proc. European Conf. Design Automation, 1991, pp. 356-360.
[4] F. Kurdahi and A. Parker, "Techniques for area estimation of VLSI layouts," IEEE Trans. Computer-Aided Design, vol. 8, pp. 81-92, Jan. 1989.
[5] F. Kurdahi and C. Ramachandran, "LAST: A Layout Area and Shape function esTimator for high level applications," in Proc. European Conf. Design Automation, 1991, pp. 351-355.
[6] M. Jabri and X. Li, "Predicting the number of contacts and dimensions of full-custom integrated circuit blocks using neural networks techniques," IEEE Trans. Neural Networks, vol. 3, pp. 146-153, Jan. 1992.
[7] J. H. Friedman, "Multivariate adaptive regression splines," Ann. Statist., vol. 19, pp. 1-141, Mar. 1991.
[8] J. Zemrole, personal communication.
[9] K. Hornik, M. Stinchcombe, and H. White, "Multilayer feedforward networks are universal approximators," Neural Networks, vol. 2, pp. 359-366, 1989.
[10] J. Hertz, A. Krogh, and R. Palmer, Introduction to the Theory of Neural Computation. Reading, MA: Addison-Wesley, 1991.
[11] J. Newkirk and R. Mathews, The VLSI Designer's Library. Reading, MA: Addison-Wesley, 1983.
[12] X. Chen and M. Bushnell, "A module area estimator for VLSI layout," in Proc. IEEE/ACM 25th DAC, 1988, pp. 60-65.
[13] D. Rumelhart and J. McClelland, Parallel Distributed Processing: Explorations in the Microstructure of Cognition, vol. 2. Cambridge, MA: MIT Press, 1986.
[14] M. Jabri, E. Tinker, and L. Leerink, "MUME—A multinet multiarchitecture neural simulation environment," 1994, pp. 229-247.