
NOVEL DESIGN OF POWER EFFICIENT

RECONFIGURABLE MULTIPLY-ACCUMULATE
CO-PROCESSOR
K. Jayachitra (1), T. Suresh (2)

(1) PG scholar, (2) Professor, Dept. of Electronics and Communications,
RMK Engineering College, Anna University of Technology, Kaverapettai, Chennai.

kjayachitra@ymail.com, 2fiosuresh@yahoo.co.in

Abstract- Designing an adaptive multiply-accumulate unit with an area trade-off is an extensive and tedious process. Experimenting with multiply-accumulate units of various complex structures is nevertheless worthwhile, since the multiply-accumulate unit is an essential and unavoidable part of modern digital equipment. This project focuses on designing a reconfigurable multiply-accumulate unit that trades bit width for array size, thus maximizing hardware resource utilization. The critical task of predicting and splitting the available multiply-accumulate adder units dynamically can be made much simpler by exploiting parallel processing by means of co-processors. Thus, by designing a dynamically reconfigurable bit-width multiply-accumulate unit using a co-processor, overall chip power and speed can be optimized to a considerable extent.
Keywords- Multiply-accumulate (MAC), Reconfigurable computing, OFDM, CDMA, Microprocessor unit (MPU)

I INTRODUCTION
Multiplication and accumulation are critical operations in digital signal processing applications. A high-speed, high-throughput multiply-accumulate (MAC) unit is always key to achieving a high-performance digital signal processing system. The main consideration in MAC design is to enhance the speed of the system. The MAC is a common accelerator used extensively in microprocessors and digital signal processors for data-intensive applications. For example, many filters, orthogonal frequency-division multiplexing, channel estimation, etc. require FIR and FFT/IFFT computations that can be efficiently accelerated by dedicated MAC units.
Reconfigurable computing incorporates some degree of flexibility in the arithmetic structures so that they can serve different purposes and minimize the wastage of resources. It uses reconfigurable devices as pieces of hardware able to adapt dynamically to algorithms. Reconfigurable hardware devices offer bit-level reconfigurability and address the need for high performance in complex, real-time applications. Coarse-grained reconfigurable architectures provide high flexibility and mainly tackle digital signal processing and multimedia problems.
II RELATED WORK
The work related to this paper is discussed here. The hardware structure and application of dynamically reconfigurable hardware for wireless systems are presented in [1], which provides the complete design flow from requirements analysis and specification through VHDL implementation to the final physical layout of the chip. The design of a reconfigurable architecture optimized for protocol processing, trading off energy consumption, cost, size and flexibility by using a platform-based design methodology that advocates reusability, is described in [2]. The reconfiguration requirements of hardware platforms for high-speed wireless communication systems, with emerging wireless communication standards based on the OFDM and CDMA transmission techniques, are analyzed in [3]. An architectural solution and on-chip implementation flow for the IEEE 802.11a MAC layer, which exploits dedicated hardware for timing-critical tasks, is given in [4].
In this paper a reconfigurable multiply-accumulate unit is introduced to build a feasible and highly flexible design with multiple multipliers and adders. It consists of a reconfigurable multiplier, a reconfigurable adder and an accumulation unit. Reconfiguration is done with respect to the following parameters: (i) the bit widths of the operands, that is, it can operate on two 64-bit items, or eight 32-bit items, or thirty-two 16-bit items, or one hundred and twenty-eight 8-bit items, or five hundred and twelve 4-bit items, in unsigned, signed or 2's complement representation; (ii) the arithmetic system of the computations; (iii) various throughput rates.
Multiplication can be done by repeated use of small mxm multipliers and small adder circuit blocks. The accumulation module consists of an adder implemented using combined blocks of carry-save adders. Reconfiguration bits are used to reconfigure the structure, and part of the architecture may be disabled during the idle state to save power. Reconfiguration allows different arithmetic representations such as signed, unsigned or 2's complement [6]. To reduce power consumption, a number of pipeline stages can be bypassed through multiplexers, which also achieves high performance and low energy dissipation [7]. Inner product computation is used for constructing a reconfigurable multiplier, and the evaluation is implemented in FPGA devices to compare performance, area and power consumption [5].
III BACKGROUND
A run-time reconfigurable MAC is designed to trade bit width for array size, with signed, unsigned or 2's complement data, based on the specified throughput requirements. The proposed MAC unit consists of a reconfigurable multiplier, a reconfigurable adder and an accumulation unit. The reconfigurable MAC resolves the design conflict between versatility, area and computation speed. The multiplication part is implemented through the recursive decomposition of a partial product matrix and the repeated use of small mxm multipliers and small adder circuit blocks. The adder is implemented using combined blocks of carry-save adders. An appropriate number of pipeline stages can be bypassed through multiplexers, trading throughput rate for power consumption. This technique achieves high performance and very low energy dissipation by adapting its structure dynamically to computational requirements over time.
The interface unit manipulates the incoming operands for multiplication and addition and performs data transfers from/to the bus. The arithmetic selection unit performs data conversion between different arithmetic representations. The reconfigurable multiplier performs multiplication for various word lengths of input data, and it affects the overall latency, power consumption and area.
The addition unit performs addition for various numbers of summands and for word lengths of 8, 16, 32 and 64 bits. It is based on carry-save addition of multiple summands for increased performance.
The reconfigurable hardware MAC co-processor is designed so that critical tasks can be delegated to it, and it handles simultaneous data streams of different bit widths. It provides opportunities for power savings that keep the design feasible for battery-powered hand-held devices. The MAC co-processor performs its tasks in parallel, and thus it is expected to be more power efficient than an equivalent implementation of the MAC on either a microprocessor or an FPGA. The architecture can switch dynamically between different bit widths of operands.
IV PROPOSED WORK
A. Reconfigurable Multiplication Unit
The unsigned multiplication can be expressed by

$$P = \sum_{i=0}^{7} \sum_{j=0}^{7} A_i B_j 2^{i+j} \qquad (1)$$

where $i$ and $j$ vary from 0 to 7.

The architecture of the 8x8 multiplier is shown in Fig.1. The recon1 control bit is used to control the four products of the 4x4 multipliers. A carry-save adder is an adder used for multiple operands and is more efficient in terms of area. Two pipeline stages are required to complete the 8x8 multiplication. Multiplication of larger input bit widths can be done by decomposing the 8x8 partial products into four 4x4 products. The 16-bit data is split across four 4x4 multipliers, and the products are passed through a set of demultiplexers and multiplexers to accumulate the partial products in the adder unit. The final reconfigurable unit pipelines the multiplications and performs them concurrently to reduce the number of pipeline stages. A reconfigurable $m2^k \times m2^k$ multiplier based on $m \times m$ multipliers can execute either $4^0$ multiplications of size $m2^k \times m2^k$, or $4^1$ multiplications of size $m2^{k-1} \times m2^{k-1}$, or $4^2$ multiplications of size $m2^{k-2} \times m2^{k-2}$.
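The decomposition described above can be illustrated with a small behavioural sketch. It is not the authors' hardware description: the function names `mul4x4`, `mul8x8` and `multiplications_available` are hypothetical, and ordinary integer addition stands in for the carry-save adder that combines the partial products.

```python
def mul4x4(a, b):
    """Stand-in for one 4x4 hardware multiplier block (hypothetical name)."""
    assert 0 <= a < 16 and 0 <= b < 16, "operands must fit in 4 bits"
    return a * b


def mul8x8(a, b):
    """Unsigned 8x8 product assembled from four 4x4 partial products."""
    assert 0 <= a < 256 and 0 <= b < 256, "operands must fit in 8 bits"
    a_hi, a_lo = a >> 4, a & 0xF
    b_hi, b_lo = b >> 4, b & 0xF
    low_low = mul4x4(a_lo, b_lo)    # weight 2^0
    low_high = mul4x4(a_lo, b_hi)   # weight 2^4
    high_low = mul4x4(a_hi, b_lo)   # weight 2^4
    high_high = mul4x4(a_hi, b_hi)  # weight 2^8
    # Plain addition models the adder that accumulates the partial products.
    return low_low + ((low_high + high_low) << 4) + (high_high << 8)


def multiplications_available(k, j):
    """Number of independent products of size m*2^(k-j): 4^j, as stated in the text."""
    assert 0 <= j <= k
    return 4 ** j
```

For example, `mul8x8(200, 123)` gives the same value as `200 * 123`, and `multiplications_available(2, 1)` reflects that halving the operand width quadruples the number of products the same array can deliver.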

Fig.1. The Reconfigurable 8x8 Multiplier


B. Arithmetic Selection Unit
The ASU consists of three modules for the data conversion of incoming and outgoing data. A binary number is converted into its corresponding 2's complement representation, and a signed number is converted into an unsigned number before the multiplier unit and restored to its original form after multiplication. recon1, recon2, recon3 and recon4 are the control bits used to determine whether the input is 32 4-bit items, 16 8-bit items, 8 16-bit items, 4 32-bit items or 2 64-bit items. A sign bit is used to save the sign of the input data by setting the bit to 1 or 0. The sign bits of the operands are examined through an XOR gate, and the output is given to the result bit, which represents the sign of the result.
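The sign handling described for the ASU can be sketched as follows. This is only an illustrative model under stated assumptions: the helper names `to_sign_magnitude` and `signed_multiply` are hypothetical, and the unsigned multiplier is modelled with Python's integer product.

```python
def to_sign_magnitude(value, width):
    """Split a 2's complement value of the given width into (sign bit, magnitude)."""
    assert -(1 << (width - 1)) <= value < (1 << (width - 1)), "value out of range"
    sign = 1 if value < 0 else 0
    return sign, abs(value)


def signed_multiply(a, b, width):
    """Multiply two signed values using an unsigned core plus an XOR of the sign bits."""
    sign_a, magnitude_a = to_sign_magnitude(a, width)
    sign_b, magnitude_b = to_sign_magnitude(b, width)
    product = magnitude_a * magnitude_b  # models the unsigned multiplier unit
    result_sign = sign_a ^ sign_b        # XOR gate on the saved sign bits
    # Restore the signed form after multiplication.
    return -product if result_sign else product
```

The operands are converted to unsigned magnitudes before the multiplier, and the sign of the result is restored afterwards from the XOR of the two saved sign bits.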
C. Reconfigurable Addition Unit
Addition can be done by splitting it into a number of smaller additions that are performed in parallel, thus reducing pipeline stages and power. Each number is split into 2 parts. In the first step, four 2-bit numbers of minimum and maximum magnitudes are added, and the two separate results are added in the second step. The split of additions is shown in Fig.2.

Fig.2. Addition of eight numbers in two steps

Carry-save adders are used to increase the number of summands and to enlarge the width of the summands. The AU consists of two modules. The first module consists of identical units in which the main additions take place; these units are independent of each other and can perform different kinds of additions at the same time. The second module adds the outputs of the first module to produce the final result based on addition properties.
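As an illustration of carry-save addition of multiple summands, the sketch below reduces a list of non-negative integers with 3:2 compressors and performs a single carry-propagating addition at the end. It is a generic textbook model, not the paper's two-module AU; the names `carry_save_add` and `add_many` are hypothetical.

```python
def carry_save_add(x, y, z):
    """3:2 compressor: reduce three operands to a sum word and a carry word."""
    partial_sum = x ^ y ^ z                      # bitwise sum without carries
    carry = ((x & y) | (x & z) | (y & z)) << 1   # carries, shifted to their weight
    return partial_sum, carry


def add_many(summands):
    """Add non-negative integers using carry-save stages and one final addition."""
    operands = list(summands)
    while len(operands) > 2:
        x, y, z = operands[:3]
        partial_sum, carry = carry_save_add(x, y, z)
        # Each stage replaces three operands with two, so the loop terminates.
        operands = operands[3:] + [partial_sum, carry]
    # The final carry-propagating addition of the remaining operands.
    return sum(operands)
```

No carry ripples through the word until the last step, which is why this structure suits a large number of summands.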

D. Accumulation unit
The accumulation unit is shown in Fig.3. It consists of a register and a carry look-ahead adder. The most recent sum is stored in the register and is transferred to the adder to be added to the next available input. The maximum sum that comes out of the addition unit is 66 bits. Reset and clock are the signals used to transfer and accumulate the sum. When the multiplier multiplies two 128-bit inputs, the product does not pass through the AU, because it is the widest product and it is produced only once in each time step.

Fig.3. The Accumulation Unit

E. Reconfiguration Logic
The reconfiguration logic receives the clock, the reset signal and 10 internal control signals, and it performs two main functions: internal clock generation and decoding of internal signals. Reconfiguration control bits are used to reconfigure the bit width, to select the signed, unsigned or 2's complement representation, and to regulate the pipeline stages.
F. Reconfigurable Hardware Co-processor
The RHCP is responsible for implementing the time-critical and power-intensive functions of the MAC. The proposed part is responsible for bit-width reconfiguration. Fig.4 is a block diagram of the RHCP's main components. It consists of an Interface and Reconfiguration Controller (IRC) for the flexible delegation of tasks to the Reconfigurable Functional Units (RFUs).

Fig.4. Reconfigurable hardware Co-processor

i. Interface and Reconfiguration controller
The IRC is a key component of the architecture. It interprets microprocessor commands to the RHCP, and it determines and controls reconfiguration dynamically through a complementary reconfiguration controller.

Fig.5. Interface and reconfiguration controller

ii. Opcodes and look up tables


A reconfigurable functional unit is triggered by a series of requests, and it executes a task only after it has been reconfigured correctly for that particular task. Two tables and a set of opcodes are required to reconfigure the units correctly. An opcode corresponds to a request for service from an RFU in a particular reconfiguration state. For each opcode, the opcode table has a field indicating which RFU that opcode corresponds to and the reconfiguration state that the RFU should be in. The IRC needs the dynamic rfu_table to find an RFU's status: it indicates whether the RFU is in use, the current configuration state of the RFU and the status of any queued request for that RFU.
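The two tables can be pictured with the following minimal sketch. Only the table roles and the name rfu_table come from the text; the field names, the opcode values and the RFU and state labels ("multiplier", "8bit" and so on) are assumptions made purely for illustration.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class OpcodeEntry:
    """Static opcode table row: target RFU and the state it must be in."""
    rfu: str
    required_state: str


@dataclass
class RfuStatus:
    """Dynamic rfu_table row: usage flag, current state and queued requests."""
    in_use: bool = False
    current_state: str = "idle"
    queued: deque = field(default_factory=deque)


# Hypothetical contents; the paper does not list actual opcodes or states.
OPCODE_TABLE = {
    0x01: OpcodeEntry("multiplier", "8bit"),
    0x02: OpcodeEntry("multiplier", "16bit"),
    0x03: OpcodeEntry("adder", "8bit"),
}

rfu_table = {
    "multiplier": RfuStatus(current_state="8bit"),
    "adder": RfuStatus(current_state="16bit"),
}


def needs_reconfiguration(opcode):
    """Compare the state an opcode requires with the RFU's current state."""
    entry = OPCODE_TABLE[opcode]
    return rfu_table[entry.rfu].current_state != entry.required_state
```

With these example contents, opcode 0x01 can be served immediately, while opcodes 0x02 and 0x03 would first require the corresponding RFU to be reconfigured.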
iii. The controllers
The IRC is a combination of interacting controllers: an interface controller (IC) and a reconfiguration controller (RC). The IC has two interface modules: one that receives requests from the MPU, and another that interrupts the MPU. The control task is delegated to task handlers that run concurrently. The task handlers are composed of controllers for reconfiguration management and MAC operations. The IRC receives a request and passes it on to one of the three task handlers. The TH_R looks up the opcode table for each opcode and then reads the rfu_table to check the requested state against the relevant RFU's current state. It invokes the RC if an RFU is in the wrong state. The RC then triggers the RFU and reconfigures it to the required configuration. Once the TH_R has cleared the first opcode of the super opcode, it triggers the corresponding TH_M. The TH_M performs the actual task of reading the opcode and the associated arguments, interpreting the opcode command using the opcode table, passing the arguments to the RFUs and triggering them. Because three asynchronous task handlers run concurrently, each having two independent and asynchronous controllers, there is a possibility of contention on shared resources such as the lookup tables, the RFUs and the interconnect between the IRC and the RFUs. Mutex variables are used to guard the tables; a task handler asserts the mutex when it is reading a table. Contention over an RFU is handled by a sleep/wake and queuing mechanism. The queuing mechanism has the potential to delay the execution of a task, but it will not cause a violation of the real-time constraints of the protocol.
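The interplay of the TH_R, the TH_M, the table mutex and the queuing mechanism can be sketched in software as below. This is a simplified sequential model under loud assumptions: the class `Rfu` and the functions `th_r`, `th_m` and `rfu_done` are hypothetical, a single `threading.Lock` stands in for the mutex variables, and the real design is concurrent hardware rather than Python code.

```python
import threading
from collections import deque


class Rfu:
    """Hypothetical model of one RFU with its state, busy flag and request queue."""

    def __init__(self, name, state):
        self.name = name
        self.state = state
        self.busy = False
        self.pending = deque()
        self.log = []


table_lock = threading.Lock()  # stands in for the mutex guarding the tables


def th_r(rfu, required_state):
    """Reconfiguration handler: bring the RFU into the requested state if needed."""
    with table_lock:
        if rfu.state != required_state:
            rfu.log.append(f"reconfigure {rfu.state}->{required_state}")
            rfu.state = required_state


def th_m(rfu, task):
    """Operation handler: trigger the RFU, or queue the task if the RFU is busy."""
    with table_lock:
        if rfu.busy:
            rfu.pending.append(task)  # the request sleeps in the queue
            return "queued"
        rfu.busy = True
    rfu.log.append(f"run {task}")
    return "started"


def rfu_done(rfu):
    """Completion event: wake the next queued request, or mark the RFU free."""
    with table_lock:
        if not rfu.pending:
            rfu.busy = False
            return None
        next_task = rfu.pending.popleft()
    rfu.log.append(f"run {next_task}")
    return next_task
```

A request that arrives while the RFU is busy is delayed, not dropped, which mirrors the observation that queuing can postpone a task without losing it.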
iv. Reconfigurable Functional units
Each RFU has a uniform interface and is responsible for carrying out the actual functions requested by the MPU. The RFUs are heterogeneous and are dynamically as well as individually reconfigurable. When an RFU is triggered, it performs an operation on a block of data, e.g. an encryption RFU or a fragmentation RFU. An RFU may also operate on a single data word when triggered.
The RFUs are all connected to the data bus of the packet memory. They are each assigned an address as well, and an address decoder translates write operations to these addresses into triggers for the RFUs. The IRC or any of the RFUs can become a master of the packet bus. Hence, the same packet bus can be used for the IRC writing data to an RFU, the IRC writing data to the packet memory, an RFU writing data to the packet memory, or an RFU writing data to another RFU. A bus arbitration block manages the multiple potential masters of the buses.
Normally an RFU operates on a block of data and the IRC then hands over control to another RFU. Some RFUs, however, need to interact with another RFU on every word, and switching between RFUs so frequently would incur too much overhead. An RFU can directly trigger another RFU by asserting its address on the address bus, but situations arose where an RFU would be reading data from a memory while requiring another RFU to process this data. Since the address bus is being used by the first RFU to read the memory, it cannot use the same bus to trigger another RFU concurrently. The RFUs do their job very quickly and store the formatted packets in the buffer, ready to be sent.
To overcome this problem, the RHCP implements a hard-wired master/slave mechanism whereby an RFU can become the master of another RFU, triggering it directly rather than through the address bus and the address decoder.
v. Event handlers and buffers
The event handler is a simple block that interprets Rx events. If a packet is to be received, it formats a service request. The IRC gets service requests either from the MPU or from the event handler, and it is transparent to the source of the request. The buffer control is implemented as two asynchronous interacting state machines. The buffer interacts with the DRMP data width, quickly carrying out data transactions and leaving the DRMP free to cater to the different data bit-width modes.
V EXPERIMENTAL RESULTS
Logic utilization: 8 bit, 16 bit, 32 bit, 64 bit
Number of 4 input LUT: 17, 17, 17
Number of slices: 10, 10, 10
Number of bonded IOBs: 25, 41, 73
Number of GCLKS:

Table.1 Device utilization design summary of Reconfigurable MAC for different bit width.

Logic utilization: 8 bit, 16 bit, 32 bit, 64 bit
Number of 4 input LUT: 18, 37
Number of slices: 16, 32, 64
Number of bonded IOBs: 27, 48, 96, 192

Table.2 Device utilization design summary of reconfigurable carry save adder unit for multiple bit widths.

VI CONCLUSION
The design of a dynamically reconfigurable multiply-accumulate co-processor is presented. It can be reconfigured in terms of bit widths and throughput rate. The reconfigurable MAC co-processor consists of a reconfigurable adder and a reconfigurable multiplier that perform efficient multiplication and addition by the splitting mechanism. It is capable of handling parallel streams of data of various bit widths. The strength of the design comes from the use of sub-multipliers, repeatable parts and fully operation-independent units. Moreover, the DRMP provides power-saving opportunities resulting from the reconfigurable MAC and its concurrent tasks. The DRMP handles tasks concurrently through request and grant services from/to the arbiter, and the RFUs are reconfigured based on the various bit widths of the operands; thus the DRMP increases the speed and performance of multiply-accumulate operations in digital applications.
REFERENCES
[1] J. Becker, T. Pionteck, C. Habermann, and M. Glesner. Design and implementation of a coarse-grained dynamically reconfigurable hardware architecture. In VLSI, 2001. Proceedings. IEEE Computer Society Workshop on, pages 41-46, Orlando, FL, Apr. 19-20, 2001.
[2] T. Tuan, S.-F. Li, and J. Rabaey. Reconfigurable platform design for wireless protocol processors. In Acoustics, Speech, and Signal Processing, 2001. Proceedings. (ICASSP 01). 2001 IEEE International Conference on, volume 2, pages 893-896, Salt Lake City, UT, May 7-11, 2001.
[3] T. Pionteck, L. D. Kabulepa, C. Schlachta, and M. Glesner. Reconfiguration requirements for high speed wireless communication systems. In Field Programmable Technology (FPT), 2003. Proceedings. 2003 IEEE International Conference on, pages 118-125, Dec. 15-17, 2003.
[4] G. Panic, D. Dietterle, Z. Stamenkovic, and K. Tittelbach-Helmrich. A system-on-chip implementation of the IEEE 802.11a MAC layer. In Digital System Design, 2003. Proceedings. Euromicro Symposium on, pages 319-324, Sept. 1-6, 2003.
[5] R. Lin. Reconfigurable parallel inner product processor architecture. IEEE Trans. VLSI Syst., 9(2):261-272, 2001.
[6] B. Parhami. Computer Arithmetic: Algorithms and Hardware Designs. Oxford University Press, 2000.
[7] S. Kim and M. Papaefthymiou. Reconfigurable low energy multiplier for multi-media system design. In Proceedings of the IEEE Annual Workshop on VLSI, 2000.
[8] S. W. Nabi, C. C. Wells, and W. Vanderbauwhede. A dynamically reconfigurable system-on-chip for implementing wireless MACs. In Microelectronics and Electronics Conference, 2007. RME. Ph.D. Research in, pages 37-40, July 2-5, 2007.
[9] S. W. Nabi, C. C. Wells, and W. Vanderbauwhede. Towards a Dynamically Reconfigurable SoC for Wireless MACs in Consumer Handheld Devices. In First International Conference on Computer, Control and Communication, pages 182-191, Nov. 12-13, 2007.
