IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 2, NO. 3, SEPTEMBER 1994
Abstract - In this paper, we describe the design of SODAS-DSP (Sogang Design Automation System-DSP), a pipelined datapath synthesis system targeted for application-specific DSP chip design. Through facilitated user interaction, the design space of pipelined datapaths for given design descriptions can be explored to produce an optimal design which meets design constraints. Taking SFG (Signal Flow Graph) in schematic as inputs, SODAS-DSP generates pipelined datapaths through scheduling and module allocation processes. New scheduling and module allocation algorithms are proposed for efficient synthesis of pipelined hardware. The proposed scheduling algorithm is of iterative/constructive nature, where the measure of equidistribution of operations among pipeline partitions is adopted as the objective function. Module allocation is performed in two passes: the first pass for initial allocation and the second one for reduction of interconnection cost. In the experiments, we compare the synthesis results for benchmark examples with those of recent pipelined datapath synthesis systems, Sehwa and PISYN, and show the effectiveness of SODAS-DSP.

I. INTRODUCTION

... process, operations and variables are assigned to functional and memory units, and interconnections among those modules are constructed by buses and multiplexors.

The higher the abstraction level of design automation, the wider the design space to explore [12]. It is very difficult to devise an efficient high-level synthesis algorithm that provides exhaustive search possibilities in the design space. To overcome this problem, the application area of a high-level synthesis system is restricted to a certain target architecture so that the system can synthesize the datapath optimized in the frame of the target architecture.

DSP (Digital Signal Processing) has very wide application areas, such as speech, audio, and image processing, and its importance is widely recognized. DSP algorithms involve linear and nonlinear arithmetic operations on multi-dimensional signals. The most important aspect of DSP is real-time processing. A signal processor must be specially designed to execute repetitive real-time algorithms with a constant data rate. So, minimization of the chip area subject to a worst-case data rate must ...
The SPAID system produces a datapath which uses N buses, each of which is connected to a dedicated register file. The target architecture of SPAID is very simple and regular. However, it is difficult to get high speed computations, because operands must be fetched from register files and the results of operations are also transferred to register files through buses in each time step.

In this paper, the design of SODAS-DSP, a high level synthesis system for application-specific DSP chip designs, is described. Targeting pipelined datapaths with fixed DII (Data Initiation Interval [9], [14]), efficient scheduling and module allocation algorithms are proposed. The proposed scheduling algorithm is of iterative/constructive nature. The measure of equidistribution of operations to the pipeline partitions is modeled by the 'entropy' function, and the approximate values of its derivatives are used as the priority function to distribute operations. Module allocation consists of two passes: initial allocation and allocation improvement. Allocation is iteratively performed to reduce the interconnection cost from the initial allocation. Section II presents the target architecture and design methodology of SODAS-DSP. The proposed scheduling and module allocation algorithms are described in Section III and Section IV, respectively. Section V presents the experimental results for benchmark examples, and conclusions are drawn in Section VI.

II. TARGET ARCHITECTURE AND DESIGN METHODOLOGY

A. Target Architecture

A DSP algorithm performs a sequence of operations which organizes similar computation blocks on consecutively initiated data [3]. The pipelining technique is essential for high speed DSP applications. The target architecture of this work is the pipelined datapath with fixed DII, like that of Sehwa and PISYN.

Fig. 1 shows the space-time diagram for a five-stage pipeline with DII = 2, where Ij represents the j-th task input and Fi represents the function executed in stage Si. At time 1, task I1 is entered into the pipeline, and functions F1 and F2 are performed at stages S1 and S2 on I1 in two clock cycles. At time 3, a new task I2 is initiated and functions F1 and F2 are performed on it, while functions F3 and F4 are performed on I1. At time 5, another task is entered into the pipeline, while I1 and I2 are served at stages S5 and S3/S4, respectively.

Fig. 1. Space-time diagram for a five-stage pipeline with DII = 2.

If there is no resynchronization overhead caused by pipeline hazards, the average performance gain of a pipeline depends on DII and clock cycle time [9]. Opportunities for hardware sharing can be increased at the cost of degraded performance of a pipeline by increasing DII. Area and performance trade-offs in pipeline designs can be achieved by changing the synthesis parameters: DII, clock cycle time, and number of pipeline stages. Through careful scheduling of operations to pipeline stages and allocation of hardware modules, high utilization of hardware modules can be achieved.

In Fig. 1, we can find that functions F1, F3, and F5 are executed at the same time, as are functions F2 and F4. We can group the functions into two clusters: {F1, F3, F5} and {F2, F4}. Any functions belonging to the same cluster cannot be executed on the same FU, while functions belonging to different clusters can be. Pipeline partitions are defined to represent the sets of time-overlapping stages whose functions are executed concurrently on consecutive data. In Fig. 1, the sets {S1, S3, S5} and {S2, S4} are two pipeline partitions. It is noticeable that the number of pipeline partitions is equal to DII. It is impossible for operations belonging to the same pipeline partition to share hardware resources, while operations belonging to different pipeline partitions are allowed to do so. Most pipeline synthesis systems employ scheduling and module allocation algorithms exploiting this fact.

The target architecture of SODAS-DSP is the pipelined datapath with fixed DII. For the sharing of an FU between operations, proper interconnections must be provided. For the interconnection, only multiplexors are supported in SODAS-DSP, with the belief that high speed execution cannot be achieved with buses. The target architecture consists of FU's in tandem with a proper amount of storage, as is illustrated in Fig. 2. It is obtained by assigning functions F1, F2 to FU1 and F3, F4 to FU2, and establishing interconnections with multiplexors and latches. At time 2n + 1, FU1, FU2, and FU3 perform functions F1, F3, and F5, respectively, on the data produced by their predecessor stages. The results generated by FU1 and FU2 are stored into latches located at output ports. At time 2n, functions F2 and F4 are executed in FU1 and FU2, respectively. The result produced by FU3 is the final output.

B. Design Methodology

The overall configuration and synthesis methodology of SODAS-DSP is shown in Fig. 3. The user interface of the system consists of two major parts: SFG View and Datapath View. Taking design descriptions in SFGDL (Signal Flow Graph Description Language) and in schematic SFG, the SFG View manages the entire design process through a menu-driven user interface. Datapath View displays the datapaths generated by the system. Communication between these two views
is performed by an interprocess communication mechanism. It allows the designers to examine the correspondence between the operations in the SFG View and the functional modules in the Datapath View.

Fig. 3. Overall configuration and synthesis methodology of SODAS-DSP.

Design descriptions verified through simulation are handed over to the synthesizer together with design constraints in area and/or time. Design constraints are refined into synthesis constraints (such as DII, number of stages, and module set) more suitable for synthesis. Depending on the constraints, appropriate pipeline scheduling, time constrained or area constrained, is performed to generate an optimal pipeline schedule. The scheduling result displayed in the SFG View can be modified by the designer. Datapaths are established through module allocation and displayed in the Datapath View. Under the fixed pipeline schedule, the module allocation process determines the sharing of FU's and builds interconnections among FU's. Designers can also modify the pattern of FU sharing through the Datapath View. If design constraints are not satisfied in the synthesized datapath, the entire process is repeated with new synthesis constraints and module sets. The final design satisfying the given design constraints is simulated for verification. Through the user-interactive synthesis of SODAS-DSP, the design space of pipelined datapaths for given design descriptions can be explored to produce an optimal design which meets design constraints.

SFG View also generates the call pattern graph (CPG), which contains the information on the SFG hierarchy, to manage interactive hierarchical design. Each RTL module is constructed by the corresponding module synthesizer with the assumption that all submodules are already synthesized and in the library. Traversing the CPG from leaf modules, functional modules are synthesized such that all the design constraints are satisfied.

C. Design Descriptions

A design can be described in SFG using the schematic editor, or in SFGDL, a textual representation for SFG. Fig. 4 shows an SFG displayed in the SFG View and its textual representation in SFGDL. In the figure, nodes represent operations and edges represent the signal flows between nodes. To enhance designer's productivity, design descriptions in higher level HDL's are also to be supported, such as Silage [7] and VHDL [18].

Fig. 4. Representations of the second-order IIR filter (a) in SFG, (b) in SFGDL:

    INPORT x(8) ;
    OUTPORT y(8) ;
    NET a1, a2, b0, b1, b2, m1, m2, m3, m4, m5, s1, s2, s4, z1, z2 ;
    NODE A1 ; CELL CONSTANT ; OUTLIST a1 ; END ;
    NODE A2 ; CELL CONSTANT ; OUTLIST a2 ; END ;
    NODE B0 ; CELL CONSTANT ; OUTLIST b0 ; END ;
    NODE B1 ; CELL CONSTANT ; OUTLIST b1 ; END ;
    NODE B2 ; CELL CONSTANT ; OUTLIST b2 ; END ;
    NODE mult1 ; CELL MULT ; INLIST a1, z1 ; OUTLIST m1 ; END ;
    NODE mult2 ; CELL MULT ; INLIST a2, z2 ; OUTLIST m2 ; END ;
    NODE mult3 ; CELL MULT ; INLIST s1, b0 ; OUTLIST m3 ; END ;
    NODE mult4 ; CELL MULT ; INLIST z1, b1 ; OUTLIST m4 ; END ;
    NODE mult5 ; CELL MULT ; INLIST z2, b2 ; OUTLIST m5 ; END ;
    NODE add1 ; CELL ADD ; INLIST x, s2 ; OUTLIST s1 ; END ;
    NODE add2 ; CELL ADD ; INLIST m2, m1 ; OUTLIST s2 ; END ;
    NODE add3 ; CELL ADD ; INLIST m3, s4 ; OUTLIST y ; END ;
    NODE add4 ; CELL ADD ; INLIST m5, m4 ; OUTLIST s4 ; END ;
    NODE Z1 ; CELL Z ; INLIST s1 ; OUTLIST z1 ; END ;
    NODE Z2 ; CELL Z ; INLIST z1 ; OUTLIST z2 ; END ;

III. SCHEDULING ALGORITHM FOR PIPELINED DATAPATHS

To achieve pipelining, the input task must be divided into a sequence of subtasks, each of which can be executed by a dedicated hardware stage operating concurrently with the other stages in the pipeline. Pipeline scheduling is a process that assigns operations to pipeline stages [13]. The goal of this process can be either maximizing the speed while area constraints are satisfied, or minimizing the total area cost while the time constraints are satisfied. Two scheduling algorithms, under time constraints and under area constraints, are devised and described in this section.

A. Objective Function

Throughout the scheduling process, the time frame intervals for all the operations in the SFG are calculated and maintained as a scheduling state. At an intermediate state of the scheduling process, each operation in the SFG has its time frame interval, [b_opn, e_opn]. The objective function of the scheduling is defined as the measure of equidistribution of operations to pipeline partitions and can be calculated from the time frame intervals for operations in the SFG. The probability for the operations of type 'OP' of belonging to stage i, p_OP(i), is the normalized
form of the distribution graph [15] and is given by (1), where N_OP is the number of operations of type 'OP' and Prob(opn, i) is the probability of an operation 'opn' being scheduled at stage i:

    p_OP(i) = Σ_{opn ∈ OP} Prob(opn, i) / N_OP,  i = 1, ..., max_stage    (1)

where

    Prob(opn, i) = 1/(e_opn - b_opn + 1)  for b_opn ≤ i ≤ e_opn,
    Prob(opn, i) = 0                      otherwise.

The probability that operations of type 'OP' belong to pipeline partition k is given in (2), where the sum is taken over all the stages in the pipeline partition. Pipeline partition k (0 ≤ k ≤ DII-1) is the set of stages S_i (1 ≤ i ≤ max_stage) where index i satisfies i modulo DII = k. A functional unit can be shared among the operations belonging to different pipeline partitions. Thus, the number of functional units in the final implementation is equal to the maximum number of operations in a pipeline partition.

    P_OP(k) = Σ_{i s.t. i modulo DII = k} p_OP(i),  k = 0, ..., DII-1    (2)

The measure of equidistribution for each type of operations is defined by an entropy function and given by (3):

    H(OP) = - Σ_{k=0}^{DII-1} P_OP(k) log P_OP(k) / log DII    (3)

The value of H(OP) lies between 0 and 1. H(OP) becomes 1 when all the pipeline partitions have the same probability. The objective function for a scheduling state S is given by (4):

    OF(S) = Σ_OP H(OP) w(OP)    (4)

where the weight w(OP) for operation type 'OP' is defined by the area and the number of appearances of the 'OP' type operations in the SFG. The maximal sharing of a functional unit can be achieved by maximizing the objective function.

Fig. 5. A scheduling state where the time frame interval of each operation is indicated.

Fig. 5 shows a scheduling state, where the time frame interval for each operation is indicated in the SFG. The values in square brackets [b, e] are obtained by ASAP and ALAP schedulings, respectively. An operation with the time frame interval [b, e] can be assigned to any stage between b and e. The DG value at each stage can be obtained by summing the reciprocal of the length of the time frame interval for the operations that can be scheduled at the stage. For example, the DG value for add operations at stage 3, DG+(3), is obtained as follows:

    DG+(3) = 0.333 (due to +5) + 0.333 (due to +6) + 0.25 (due to +7)
           + 0.25 (due to +8) + 1 (due to +9) + 0.5 (due to +10)
           = 2.666.

Fig. 6. (a) DG's for add and multiply operations. (b) Probabilities for add and multiply operations in each pipeline partition.

Fig. 6(a) shows the DG for the SFG of Fig. 5. The probabilities that add and multiply operations belong to each pipeline partition are shown in Fig. 6(b). The probability of add operations in pipeline partition 1 is the sum of the probabilities at stages 1 and 4, i.e., P+(1) = p+(1) + p+(4) = 4.166/15 + 2.5/15 = 0.44. The measure of equidistribution is calculated using (3) as follows:

    H(+) = -(P+(0) log P+(0) + P+(1) log P+(1) + P+(2) log P+(2)) / log DII
         = -(0.28 log(0.28) + 0.44 log(0.44) + 0.28 log(0.28)) / log 3
         = 0.977

    H(*) = -(P*(0) log P*(0) + P*(1) log P*(1) + P*(2) log P*(2)) / log DII
         = -(0.15 log(0.15) + 0.58 log(0.58) + 0.27 log(0.27)) / log 3
         = 0.909

    OF(S) = H(+) w(+) + H(*) w(*) = 0.977 * 15 + 0.909 * 8 = 21.93

In Fig. 6(b), the probability of add operations is more balanced than that of multiply operations. Hence, the add operation has a larger entropy value: 0.977 for add operations versus 0.909 for multiply operations. The value of the objective function is determined to be 21.93, where the weights for add and multiply operations are given by 15 and 8, respectively.
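As an illustrative sketch (not code from the paper; the function names and data layout are ours), the objective of (1)-(4) can be computed directly from the time frame intervals:

```python
import math

def prob(b, e, i):
    """Prob(opn, i): an operation with time frame [b, e] is equally
    likely to be scheduled at any stage of its interval."""
    return 1.0 / (e - b + 1) if b <= i <= e else 0.0

def entropy(P, dii):
    """H(OP) of (3): entropy of the partition probabilities,
    normalized by log DII so that it lies in [0, 1]."""
    return -sum(Pk * math.log(Pk) for Pk in P if Pk > 0) / math.log(dii)

def objective(frames_by_type, weights, dii, max_stage):
    """OF(S) of (4): sum over operation types of H(OP) * w(OP).
    frames_by_type maps an operation type to a list of (b, e) pairs."""
    of = 0.0
    for op_type, frames in frames_by_type.items():
        n_op = len(frames)
        # p_OP(i) of (1): normalized distribution-graph value at stage i
        p = [sum(prob(b, e, i) for (b, e) in frames) / n_op
             for i in range(1, max_stage + 1)]
        # P_OP(k) of (2): stage i belongs to pipeline partition i modulo DII
        P = [0.0] * dii
        for i, pi in enumerate(p, start=1):
            P[i % dii] += pi
        of += entropy(P, dii) * weights[op_type]
    return of
```

For the add probabilities of Fig. 6(b), entropy([0.28, 0.44, 0.28], 3) evaluates to 0.977, matching the H(+) value above.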
from (b_opn, e_opn) to (b_opn + 1, e_opn) and (b_opn, e_opn - 1) ...

Fig. 8 shows a schedule of the FIR filter with the FU constraints of 5 adders and 3 multipliers. DII is decided by max(⌈N+/C+⌉, ⌈N*/C*⌉) = max(⌈15/5⌉, ⌈8/3⌉) = 3. Fig. 8(a) is the initial scheduling state obtained by the list schedule, where operations in six stages and in three pipeline partitions are shown in separate boxes. The initial schedule uses 7 adders and 3 multipliers when executed in pipelined fashion. It exceeds the available adders by 2. At pipeline partition P1, two add operations, +4 and +5, in the first stage are selected according to the priority function and deferred to the second stage. Then the assignment of each operation is changed as shown in Fig. 8(b). Final results shown in Fig. 8(c) are obtained by deferring operations +7 and +8 in pipeline partition P2.
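The DII bound and the per-partition FU demand used in this example can be sketched as follows (an illustrative sketch; the names are ours, not from the paper):

```python
import math
from collections import Counter

def min_dii(n_ops, n_fus):
    """Lower bound on DII under FU constraints: max over operation
    types of ceil(N_OP / C_OP).  For the FIR example,
    max(ceil(15/5), ceil(8/3)) = 3."""
    return max(math.ceil(n_ops[t] / n_fus[t]) for t in n_ops)

def fus_needed(schedule, dii):
    """schedule maps an operation name to (op_type, stage).  Operations
    in the same pipeline partition (stage modulo DII) cannot share an
    FU, so the FU count per type is the largest per-partition count."""
    per_partition = Counter()
    for op_type, stage in schedule.values():
        per_partition[(op_type, stage % dii)] += 1
    need = {}
    for (op_type, _), n in per_partition.items():
        need[op_type] = max(need.get(op_type, 0), n)
    return need
```

When a partition demands more FU's of some type than are available, operations are deferred to the next stage, as with +4 and +5 in Fig. 8.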
Step 1: 1) Set b_opn to the result of the non-pipelined list scheduling under the given constraints. 2) Set e_opn to the result of the ALAP schedule.
Step 2: count = 1.
Step 3: If (FU constraints are met in all pipeline partitions) then stop.
Step 4: k = count % DII.
Step 5: While (# FU's in pipeline partition k exceeds the constraints) do ...

where p_OP,b(i) is the probability that operations of 'OP' type in branch b belong to stage i. The objective function of (4) represents the measure of equidistribution of operations in pipeline partition k, where the summation of p_OP(i) over all stages is unique and equal to 1. The reduction of hardware due to conditional sharing is reflected in the reduction of p_OP by (6). The weight modified to reflect this fact is given in (7).
IV. MODULE ALLOCATION ALGORITHM

    Cost Function = (α · # mux inputs) + (β · # latches)    (8)
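Under (8), the two cost terms can be evaluated from the fanin sets and the latch requirements (a sketch under our own data layout; alpha and beta stand for the weights of (8)):

```python
def interconnect_cost(fanin_sets, latch_demand, alpha=1.0, beta=1.0):
    """Cost function of (8): alpha * (# mux inputs) + beta * (# latches).
    fanin_sets maps an (FU, port) pair to the set of sources feeding it;
    a port contributes |FS| multiplexor inputs only when more than one
    source feeds it.  latch_demand maps an FU to {destination: # latches};
    latches on one FU output can be shared, so each FU needs only the
    maximum over its destinations (the scheme of Fig. 11)."""
    mux_inputs = sum(len(fs) for fs in fanin_sets.values() if len(fs) > 1)
    latches = sum(max(d.values(), default=0) for d in latch_demand.values())
    return alpha * mux_inputs + beta * latches
```

For the Fig. 11 example, destinations needing 3, 2, and 0 latches are served by max(3, 2, 0) = 3 latches on the FU output.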
Fig. 10. An example of calculation of the number of multiplexor inputs. (a) A scheduled SFG. (b) The fanin set and the number of required multiplexor inputs.

Fig. 11. Sharing of latches on the interconnections from an FU (FU -> D1: 3 latches; FU -> D2: 2 latches; FU -> D3: no latch).
... left port. Investigating the fanin sets for all the FU ports, the total number of multiplexor inputs is determined.

Fig. 11 shows a method of sharing latches. When interconnections from an FU to destination points D_j, for j = 1, 2, 3, need N_fu,dj latches, the number of latches for the complete interconnection is equal to the maximum of N_fu,dj for j = 1, 2, 3. In the figure, for example, interconnections from the FU to destination points D1, D2, D3 need 3, 2, 0 latches, respectively. Here, three latches are sufficient for the required data transfer.

Fig. 12. An example state transition. (a) Initial allocation state. (b) After exchanging the FU assignment of operations +2 and +4.

B. Module Allocation Algorithm

The proposed algorithm performs allocation in two passes. Taking the scheduling result, a possible assignment of operations to FU's is performed at initial allocation. Each possible assignment is represented as an allocation state in which the interconnection cost of multiplexors and latches is determined. As module allocation progresses, the assignment is refined to reduce the value of the cost function defined in (8) through state transitions. Possible state transitions are generated by a pairwise exchange of operations and/or by operand swapping. Among those transitions, the most beneficial one is accepted as the next state. The overall module allocation algorithm is summarized as follows:

Step 1: Perform initial allocation.
Step 2: Find all the possible state transitions.
Step 3: Take the most beneficial one, i.e., a state transition with minimal cost, among the possible state transitions.
Step 4: If the cost is reduced, then make the state transition and go to Step 2.

An operation opn assigned to FU1 can be moved to FU2 if the operation can be executed in FU2 and no operations are assigned to FU2 in the pipeline partition that opn belongs to. However, possible moves of operations hardly exist when operations are scheduled such that maximal sharing of FU's is achieved. Two operations are selected to exchange their assignment to FU's. Assume that operations op1 and op2 are assigned to FU1 and FU2, respectively. Operations op1 and op2 are candidates for pairwise exchange if both operations can be performed in FU1 and FU2, and op1 can be moved to FU2 and op2 can be moved to FU1 simultaneously. An example state transition is shown in Fig. 12. The initial state of assignment is shown in Fig. 12(a). Two 2-input multiplexors, MUX1 and MUX2, are necessary for the ports of FU3, where the size of the fanin set for each port is |FS(FU3, 1)| = 2 and |FS(FU3, 2)| = 2. MUX1 selects the results of operations +1 and +3 for the left operand of FU3, and MUX2 selects the results of +2 and +3 for the right port of FU3. Only one latch is sufficient for the results of FU1, because the results of FU1 generated by operations +1 and +4 must be transferred to MUX1 and MUX2 through one latch. Similarly, one latch is required for the results of FU2. There are two candidates for exchange, (+2, +4) and (+1, +3), in the initial allocation state. A new state obtained by the exchange of the operation pair (+2, +4) is shown in Fig. 12(b). The number of required latches is equal to that of the initial state, but the number of multiplexor inputs is reduced by 4. This is due to the fact that no multiplexor is required for the input ports of FU3, as shown in the fanin set of Fig. 12(b).

In addition, operand swapping is also considered as a candidate state transition. Every commutative operation can swap its operands to reduce the size of the fanin set. Adopting operand swapping, the interconnection cost is reduced by 12% in the final datapath implementation.
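The Step 1-4 improvement loop can be sketched as a greedy search over pairwise exchanges (illustrative only; cost_fn and can_swap stand for the cost of (8) and the legality test described above):

```python
from itertools import combinations

def improve_allocation(assign, cost_fn, can_swap):
    """Iteratively refine an allocation state.  assign maps an operation
    to its FU; cost_fn(assign) returns the interconnection cost; and
    can_swap(op1, op2, assign) tests that each operation can execute on
    the other's FU.  The most beneficial exchange is taken until no
    exchange reduces the cost (Steps 2-4 of the algorithm above)."""
    cost = cost_fn(assign)
    while True:
        best, best_cost = None, cost
        for op1, op2 in combinations(sorted(assign), 2):
            if assign[op1] == assign[op2] or not can_swap(op1, op2, assign):
                continue
            trial = dict(assign)
            trial[op1], trial[op2] = assign[op2], assign[op1]
            c = cost_fn(trial)
            if c < best_cost:
                best, best_cost = trial, c
        if best is None:
            return assign, cost
        assign, cost = best, best_cost
```

Operand swapping could be added as a second kind of state transition in the same loop, mirroring the text above.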
TABLE IV
SYNTHESIS RESULTS FOR THE FIFTH-ORDER ELLIPTIC WAVE FILTER.
(a) WITH # STAGES = 9. (b) WITH # STAGES = 10.
(Rows: # multipliers, # adders, # MUX inputs, # registers, and CPU time in seconds.)

Fig. 13. Scheduling of the 16-point FIR filter with # stages = 6, DII = 3. (a) Result by Sehwa and PISYN. (b) Result by SODAS-DSP.

TABLE III
SYNTHESIS RESULTS FOR THE 16-POINT FIR FILTER (# STAGES = 6).
(Rows: # multipliers, # adders, # MUX inputs, and # registers.)
V. EXPERIMENTAL RESULTS

SODAS-DSP has been implemented in the C language on a SUN SPARC-1 workstation running UNIX. Experiments are performed on the benchmark examples: the 16-point FIR filter, the fifth-order elliptic wave filter, and the FDCT (Fast Discrete Cosine Transform) kernel. Those are taken from the systems reported in the literature [2], [10], [16] for comparison purposes.
cases. When the number of stages is set to 10, one multiplier and one adder are saved with DII = 3, and one adder is saved with DII = 8 and DII = 10. The interconnection cost is also reduced by 20%. It validates the use of the cost function ...

TABLE V
PATTERN OF SHARING FU'S FOR THE DATAPATH SHOWN IN FIG. 16.

Fig. 16. Synthesized datapath for the FDCT kernel in Datapath View.
have been proposed. The proposed scheduling algorithms, under time constraints and/or area constraints, have the objective function adopting the measure of equidistribution of operations among pipeline partitions. The time constrained scheduling algorithm is of iterative/constructive nature, where the derivative of the objective function is used as the priority function. A variation of the list scheduling is proposed for the synthesis of pipelined datapaths under area constraints. The proposed module allocation algorithm iteratively improves the interconnection cost (the numbers of multiplexor inputs and latches) from an initial allocation by pairwise exchange of operations and operand swapping. In the experiments, we showed that SODAS-DSP generates efficient pipelined datapaths compared with the systems reported previously in the literature. For the dramatic enhancement of designer's productivity, efforts are being made to support high level HDL's. Silage is now supported, and research is continuing to support VHDL.

ACKNOWLEDGMENT

The authors would like to thank their colleagues Y. Lee, who developed a VHDL simulator, and M. Hyun, who designed a powerful VHDL synthesis system, for building up a fancy working environment in the CAD & Computer Systems Laboratory of Sogang University. The authors also wish to thank the anonymous reviewers for their constructive comments to improve the quality of this paper.
REFERENCES

[1] H. De Man et al., "Architecture driven synthesis techniques for VLSI implementation of DSP algorithms," Proc. IEEE, vol. 78, no. 2, pp. 319-335, Feb. 1990.
[2] R. Camposano and W. Wolf, High-Level VLSI Synthesis. New York: Kluwer Academic, 1991.
[3] F. Catthoor and H. De Man, "Application specific architectural methodologies for high throughput digital signal and image processing," IEEE Trans. ASSP, vol. 38, no. 2, pp. 339-349, Feb. 1990.
[4] D. D. Gajski, Silicon Compilation. Reading, MA: Addison Wesley, 1988.
[5] B. S. Haroun and M. I. Elmasry, "Architectural synthesis for DSP silicon compilers," IEEE Trans. Computer-Aided Design, vol. 8, no. 4, pp. 431-447, April 1989.
[6] R. I. Hartley and J. R. Jasica, "Behavioral to structural translation in a bit-serial silicon compiler," IEEE Trans. Computer-Aided Design, vol. 7, no. 8, pp. 877-886, Aug. 1988.
[7] P. Hilfinger, "A high level language and silicon compiler for digital signal processing," in Proc. Custom Integrated Circuits Conf., May 1985, pp. 213-216.
[8] K. Hwang and A. E. Casavant, "Scheduling and hardware sharing in pipelined data paths," in Proc. ICCAD, Nov. 1989.
[9] P. M. Kogge, The Architecture of Pipelined Computers. New York: McGraw-Hill, 1982.
[10] D. J. Mallon and P. B. Denyer, "A new approach to pipeline optimisation," in Proc. EDAC, March 1990, pp. 83-88.
[11] J. Kim, F. Kurdahi, and N. Park, "Automatic synthesis of time-stationary controllers for pipelined datapaths," in Proc. ICCAD-91, Nov. 1991, pp. 30-33.
[12] M. C. McFarland and A. C. Parker, "The high level synthesis of digital systems," Proc. IEEE, vol. 78, no. 2, pp. 301-318, Feb. 1990.
[13] N. Park and A. C. Parker, "Sehwa: A software package for synthesis of pipelines from behavioral specifications," IEEE Trans. Computer-Aided Design, vol. 7, no. 3, pp. 356-370, Mar. 1988.
[14] N. Park, "Synthesis of high-speed digital systems," Ph.D. dissertation, Univ. Southern Calif., Oct. 1985.
[15] P. Paulin, "Force directed scheduling for the behavioral synthesis of ASIC's," IEEE Trans. Computer-Aided Design, vol. 8, no. 6, pp. 661-679, June 1989.
[16] G. Saucier and P. M. McLellan, Logic and Architectural Synthesis for Silicon Compilers. Amsterdam: North-Holland, Elsevier Science, 1989.
[17] K. Wakabayashi and T. Yoshimura, "A resource sharing and control synthesis method for conditional branches," in Proc. ICCAD, Nov. 1989, pp. 62-65.
[18] IEEE Standard VHDL Language Reference Manual, IEEE Std 1076-1987, April 1989.

Hong-Shin Jun received the B.S. and M.S. degrees in electronic engineering from Sogang University, Seoul, Korea, in 1989 and 1991, respectively. He is currently working toward the Ph.D. degree in the Department of Electronic Engineering of Sogang University. His research interests include silicon compilation and optimization in VLSI design.

Sun-Young Hwang (M'86) received the B.S. degree in electronic engineering from Seoul National University, Seoul, Korea, in 1976, the M.S. degree from the Korea Advanced Institute of Science in 1978, and the Ph.D. degree in electrical engineering from Stanford University, CA, in 1986. From 1976 to 1981, he was with Samsung Semiconductor, Inc., Korea, where he designed several CMOS VLSI chips and managed a design section. Until 1988, he was with the Center for Integrated Systems at Stanford University, working on high-level synthesis and simulation system design. In 1986 and 1987, he held a consulting position at the Palo Alto Research Center of Fairchild Semiconductor Corporation. In 1989, he joined Sogang University, Seoul, Korea, where he is currently an Associate Professor of electronic engineering. His research interests include silicon compilation, VLSI design, and computer systems design.