Microelectronics Journal: Ali Zarei, Farshad Safaei

Microelectronics Journal 82 (2018) 62–70
Contents lists available at ScienceDirect
Microelectronics Journal
journal homepage: www.elsevier.com/locate/mejo
Power and area-efficient design of VCMA-MRAM based full-adder using approximate computing for IoT applications
Ali Zarei, Farshad Safaei *
Faculty of Computer Science and Engineering, Shahid Beheshti University G.C., Evin, 1983963113, Tehran, Iran
ARTICLE INFO

Keywords:
Approximate computing
Internet of things (IoT)
Magnetic tunnel junction (MTJ)
Non-volatile full-adder
Voltage-controlled magnetic anisotropy (VCMA)
Low-power computing

ABSTRACT

The next generation of computing will be within the realm of the Internet of Things (IoT). High density, near-zero leakage, and high endurance are among the important properties of magnetic RAM (MRAM) that make it attractive for many IoT applications. Because many of these applications can tolerate some computational error, it is reasonable to use approximate computing to achieve significant gains in area and power. The full-adder is one of the main building blocks of a CPU for basic operations. In this paper, we use a voltage-controlled magnetic anisotropy spin-transfer torque (VCMA-STT) magnetic tunnel junction (MTJ) as the memory element of an accurate non-volatile full-adder and present the corresponding write circuit, which reduces power consumption by 7.6x compared to the state-of-the-art work. We also propose several very cost-effective approximate magnetic full-adders (AMFs) based on STT-assisted precessional VCMA. Some of the proposed AMFs reduce area by more than 50% and power consumption by 9.5x.
* Corresponding author.
E-mail addresses: a_zarei@sbu.ac.ir (A. Zarei), f_safaei@sbu.ac.ir (F. Safaei).
https://doi.org/10.1016/j.mejo.2018.10.010
Received 14 June 2018; Received in revised form 3 October 2018; Accepted 24 October 2018
Available online 27 October 2018
0026-2692/© 2018 Elsevier Ltd. All rights reserved.

1. Introduction

With the emergence of the Internet of Things (IoT), the number of smart devices around the world is increasing and is anticipated to double by 2020 [1]. Consequently, data will be generated at such a high rate that current computing systems cannot process it without high-speed technology. Moreover, despite advances in semiconductor technology and low-power design, the power consumption of processing systems has increased dramatically because of larger amounts of data [2].

Approximate computing, in which fewer resources are used, is one of the most important methods for reducing power consumption in IoT [3]. In approximate computing, area, power, and delay can be reduced effectively without significantly affecting output quality [4–6]. Approximate computing is particularly important in applications such as image and signal processing, voice recognition, data analysis, machine learning, and the global positioning system (GPS), because these applications share the ability to tolerate a certain amount of computational error [5–7]. In addition, with the increasing popularity of smart devices that support multimedia applications, the demand for video and image processing on a very small budget has increased [8].

Numerous IoT applications naturally lend themselves to approximation since they interact with noisy input data [9]. These applications tolerate some errors, but the final output should stay within a semantically acceptable range. Various hardware components, such as data processing and data storage units, can exploit this error tolerance [10]. In Ref. [11], the authors studied the use of MRAM to design a non-volatile processor for IoT devices and concluded that in most cases it leads to better energy efficiency. Emerging memory technologies can potentially make IoT devices energy-efficient, and approximate computing can extend the battery life of these devices [12]. For example, in a European project, STT-MRAM was used for the analog and digital sub-systems of IoT platforms, which led to better integration and power efficiency of embedded and mobile communication systems [13].

The full-adder (FA) is the main element of the computational circuits in the arithmetic logic unit (ALU) of every processor; it is used for addition and for other operations such as subtraction, division, and multiplication. Full-adders are also used in floating-point units (FPU) and in address generation for accessing the cache and main memory [14]. Recently, non-volatile magnetic full-adders (MFA) have been designed and implemented; they use memory elements to store data values and have almost no static power consumption [15–17].

The International Technology Roadmap for Semiconductors (ITRS) reported that static power consumption will account for a considerable portion of total power consumption in the future [18]. Power consumption,
particularly static power, is one of the major challenges in conventional semiconductor-based memory. Emerging non-volatile memory, however, offers a suitable solution to this challenge, since its static power consumption is almost zero [19,20].

Among emerging non-volatile memories, spin-transfer torque magnetic random access memory (STT-MRAM) is one of the most important solutions; thanks to its high density, low static power consumption, and high endurance, it can replace SRAM in the cache. Accordingly, researchers have also designed flip-flop and MFA circuits with this technology [21,22]. However, the write power and delay of STT-MRAM are worse than those of conventional SRAM because higher charge currents are needed for switching [19]. An appropriate way to reduce energy is to use an electric field instead of a charge current, which has much lower power consumption [23].

Controlling magnetization with an electric field can be a more useful and effective method than STT for high-speed, low-power MRAM applications. Voltage-controlled magnetic anisotropy (VCMA) allows an electric field, or voltage, to be used to switch the magnetic tunnel junction (MTJ). Among the several switching approaches that exploit the VCMA effect, STT-assisted precessional VCMA is generally optimal in terms of dynamic power consumption, design complexity, and latency [24]. Using STT-assisted precessional VCMA, a low-power non-volatile flip-flop was designed for data backup, as well as an MRAM-backed SRAM cell for the reconfigurable memory units of FPGAs [25,26]. The key contributions of our work are:

• We introduce an STT-assisted precessional VCMA based full-adder that is very cost-effective compared to other types of full-adders, which makes it particularly suitable for IoT applications.
• We provide write circuits for STT-assisted precessional VCMA MTJs that require only one supply voltage.
• We propose several approximate full-adders based on STT-assisted precessional VCMA MTJs that are very power- and area-efficient.
• We evaluate the proposed accurate and approximate full-adder designs at both the circuit and the application level. For the application-level evaluation, we apply the approximate designs to the Sobel edge detection algorithm.

The rest of this paper is organized as follows. Section 2 reviews the MTJ and explains the STT-assisted precessional VCMA strategy. Section 3 presents the magnetic full-adder based on STT-assisted precessional VCMA. Section 4 introduces related work, our proposed approximate full-adders, and their application to the Sobel edge detection algorithm. Simulation results are presented in Section 5. Finally, Section 6 summarizes and concludes the paper.

2. VCMA-MTJ

The MTJ is the essential element of MRAM. Each MTJ is composed of two ferromagnetic layers (CoFeB), called the free layer and the fixed layer, with an oxide tunnel barrier (MgO) sandwiched between them as an insulator. The free layer can be magnetized in two directions, parallel (P) or anti-parallel (AP) to the fixed layer, which determines the MTJ resistance. When the magnetic directions of the free and fixed layers are parallel, the resistance is low; in contrast, if they are anti-parallel, the resistance is high.

The difference between the parallel and anti-parallel resistances is captured by the tunnel magnetoresistance (TMR) ratio given in Eq. (1) [20], where RP and RAP are the MTJ resistances in the parallel and anti-parallel states, respectively. Since a larger TMR ratio represents a larger difference between the parallel and anti-parallel resistances, the TMR ratio is a proper measure of how well the two MTJ states can be distinguished.

$$\mathrm{TMR} = \frac{R_{AP} - R_{P}}{R_{P}} \qquad (1)$$

With the VCMA effect, the magnetization dynamics of the free layer of an MTJ can be described by a modified Landau-Lifshitz-Gilbert (LLG) equation [23]:

$$\frac{d\vec{m}}{dt} = -\gamma\,\vec{m}\times\vec{H}_{eff} + \alpha\,\vec{m}\times\frac{d\vec{m}}{dt} - \rho_{stt}\,\vec{m}\times\left(\vec{m}\times\vec{m}_{r}\right) \qquad (2)$$

where γ is the gyromagnetic ratio, m is the magnetization vector of the free layer, m_r is the polarization vector, α is the Gilbert damping factor, H_eff is the effective magnetic field, and ρstt is the STT factor. The effective magnetic field comprises the external field, the demagnetization field, the thermal noise field, and the voltage-dependent anisotropy field H_ani(Vb), which is expressed as [27]

$$\vec{H}_{ani}(V_b) = \frac{2k_i(0)\,t_{ox} - 2\xi V_b}{\mu_0 M_s t_f t_{ox}}\, m_z\,\hat{z} \qquad (3)$$

In Eq. (3), ξ is the VCMA coefficient, tf is the thickness of the free layer, tox is the thickness of the oxide barrier, Ms is the saturation magnetization, μ0 is the vacuum permeability, and ki(0) is the perpendicular anisotropy at zero voltage.

In the steady state, the magnetization vector of a ferromagnetic layer is aligned along an axis called the easy axis, which is the most favorable orientation from the point of view of minimum potential energy. Depending on the orientation of the easy axis of the free and fixed layers, i.e., on the geometry of the ferromagnetic layers, there are two types of MTJs: in-plane anisotropy (IPA) devices and perpendicular-to-plane anisotropy (PPA) devices [28]. Compared to IPA devices, PPA devices consume less energy and offer higher thermal stability and better scalability [23]. Hence, we consider PPA devices in this paper.

In a PPA MTJ (p-MTJ), a voltage is applied to increase the switching speed by creating an electric field across the oxide barrier. Fig. 1.a shows the p-MTJ device based on STT-assisted precessional VCMA, and Fig. 1.b depicts the corresponding write signal. The write signal of an STT-assisted precessional VCMA p-MTJ consists of a VCMA voltage pulse (1.2 V) applied at time t (in the pre-charge phase) for 0.2 ns, followed by an STT voltage pulse (0.8 V or −0.8 V) within 0.4 ns [25]. In effect, the VCMA effect assists STT in switching the MTJ. If the STT voltage pulse is positive (T1 − T2 = 0.8 V), '0' is written in the p-MTJ; if it is negative (T1 − T2 = −0.8 V), '1' is written.

To switch the MTJ from '1' (P) to '0' (AP) or vice versa, a number of transistors are needed to control the VCMA effect and the STT current. Fig. 2.a shows the write driver for STT. If the write signal (En-write) is active, an input ('1' or '0') can be written in the MTJ. Fig. 2.b and c depict the proposed write circuit for the STT-assisted precessional VCMA MTJ and its switching waveforms, respectively. When the P0 transistor is turned on, i.e., the VCMA signal is '0', the source voltage pulse (1.2 V) applies the VCMA effect to the MTJ for 0.2 ns (purple solid path). Afterward, if En-write is set to '1', the N0 and N1 transistors are activated by the STT signal to write '0' (red dashed path) or '1' (green dotted path).

Fig. 1. (a) STT-assisted precessional VCMA p-MTJ device. (b) Write signal, consisting of the VCMA effect followed by STT, for writing '0' in an STT-assisted precessional VCMA MTJ [25].
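The write sequence described above (a 1.2 V VCMA pulse for 0.2 ns, then a ±0.8 V STT pulse for 0.4 ns whose sign selects the stored bit) can be summarized as a small pulse schedule. The Python sketch below is only an illustration under stated assumptions: the STT pulse is assumed to start exactly when the VCMA pulse ends and to last the full 0.4 ns, and `Pulse`, `write_sequence`, and `voltage_at` are hypothetical helpers, not part of the authors' HSPICE setup.

```python
from dataclasses import dataclass

VCMA_V = 1.2   # VCMA pulse amplitude (V)
VCMA_NS = 0.2  # VCMA pulse width (ns)
STT_V = 0.8    # STT pulse magnitude (V); its sign selects the stored bit
STT_NS = 0.4   # STT pulse width (ns)


@dataclass(frozen=True)
class Pulse:
    kind: str        # "VCMA" or "STT"
    start_ns: float  # pulse start time
    width_ns: float  # pulse duration
    volts: float     # amplitude; for STT this is T1 - T2

    @property
    def end_ns(self) -> float:
        return self.start_ns + self.width_ns


def write_sequence(bit: int, t0_ns: float = 0.0) -> list[Pulse]:
    """Pulse schedule for writing `bit` into an STT-assisted precessional VCMA p-MTJ.

    Assumption: the STT pulse begins as soon as the VCMA pulse ends.
    A positive STT pulse writes '0' (AP); a negative one writes '1' (P).
    """
    if bit not in (0, 1):
        raise ValueError("bit must be 0 or 1")
    vcma = Pulse("VCMA", t0_ns, VCMA_NS, VCMA_V)
    stt = Pulse("STT", vcma.end_ns, STT_NS, STT_V if bit == 0 else -STT_V)
    return [vcma, stt]


def voltage_at(seq: list[Pulse], kind: str, t_ns: float) -> float:
    """Amplitude of the pulse of the given kind at time t_ns (0.0 if inactive)."""
    for p in seq:
        if p.kind == kind and p.start_ns <= t_ns < p.end_ns:
            return p.volts
    return 0.0
```

For example, `write_sequence(1)` yields a 1.2 V VCMA pulse from 0 to 0.2 ns followed by a −0.8 V STT pulse from 0.2 to 0.6 ns, so in this idealized schedule one write takes 0.6 ns.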
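Eqs. (1) and (3) can be evaluated directly with the device parameters listed in Table 3. The sketch below assumes RP = 100 kΩ and RAP = 200 kΩ (the approximate values in Table 3), SI units throughout, and evaluates only the anisotropy term of Eq. (3) per unit m_z; it ignores the demagnetization, thermal, and external fields that also enter H_eff. The function names are hypothetical.

```python
import math

MU0 = 4e-7 * math.pi  # vacuum permeability (T*m/A)

# Device parameters from Table 3, in SI units
KI0 = 0.32e-3  # interfacial PMA k_i(0), J/m^2
MS = 0.625e6   # saturation magnetization, A/m
XI = 60e-15    # VCMA coefficient, J/(V*m)
T_OX = 1.4e-9  # oxide thickness, m
T_F = 1.1e-9   # free-layer thickness, m
R_P = 100e3    # parallel-state resistance, ohm (approximate)
R_AP = 200e3   # anti-parallel-state resistance, ohm (approximate)


def tmr(r_p: float, r_ap: float) -> float:
    """Eq. (1): TMR = (R_AP - R_P) / R_P, returned as a fraction."""
    return (r_ap - r_p) / r_p


def h_ani_coeff(vb: float) -> float:
    """Eq. (3) without the m_z factor: anisotropy field per unit m_z, in A/m."""
    return (2 * KI0 * T_OX - 2 * XI * vb) / (MU0 * MS * T_F * T_OX)


def critical_voltage() -> float:
    """Bias at which the anisotropy term of Eq. (3) vanishes: k_i(0) * t_ox / xi."""
    return KI0 * T_OX / XI
```

With these values, `tmr(R_P, R_AP)` gives 1.0, i.e., the 100% TMR ratio assumed in the simulations, `h_ani_coeff(0.0)` is about 7.4 × 10^5 A/m, and the 1.2 V VCMA pulse lowers this field linearly in Vb, as Eq. (3) prescribes.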
Fig. 2. (a) write driver (b) proposed write circuit (c) MTJ switching waveforms for STT-assisted precessional VCMA p-MTJ.
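Eq. (2) is implicit in dm/dt. Crossing both sides with m (and using |m| = 1) gives the explicit form (1 + α²) dm/dt = T + α m × T, where T = −γ m × H_eff − ρstt m × (m × m_r). The sketch below integrates that form for a single macrospin with a fixed-step fourth-order Runge-Kutta scheme. It is a toy model, not the Verilog-A model of Ref. [24], and rests on these assumptions: H_eff contains only the anisotropy term of Eq. (3) at Vb = 0, with H_K ≈ 7.4 × 10^5 A/m derived from Table 3 (no demagnetization, thermal, or external field); γ is written as μ0γ ≈ 2.21 × 10^5 m/(A·s) so that fields are in A/m; and ρstt is treated as a rate in 1/s. `llg_rhs` and `simulate` are hypothetical names.

```python
import numpy as np

GAMMA = 2.211e5  # mu0 * gyromagnetic ratio, m/(A*s) (assumed)
ALPHA = 0.05     # Gilbert damping factor (Table 3)
H_K = 7.4e5      # anisotropy field at Vb = 0, A/m (Eq. (3) with Table 3 values)


def llg_rhs(m, alpha=ALPHA, rho_stt=0.0, m_r=(0.0, 0.0, 1.0)):
    """dm/dt from Eq. (2), rewritten in explicit form."""
    m_r = np.asarray(m_r, dtype=float)
    # Toy effective field: uniaxial perpendicular anisotropy only
    h_eff = np.array([0.0, 0.0, H_K * m[2]])
    # Precession and STT torques (every term of Eq. (2) except damping)
    t = -GAMMA * np.cross(m, h_eff) - rho_stt * np.cross(m, np.cross(m, m_r))
    # Crossing Eq. (2) with m gives (1 + alpha^2) dm/dt = t + alpha * m x t
    return (t + alpha * np.cross(m, t)) / (1.0 + alpha**2)


def simulate(m0, t_end=2e-9, dt=0.5e-12, **kwargs):
    """Fixed-step RK4 integration; m is renormalized to unit length each step."""
    m = np.asarray(m0, dtype=float)
    m = m / np.linalg.norm(m)
    for _ in range(int(round(t_end / dt))):
        k1 = llg_rhs(m, **kwargs)
        k2 = llg_rhs(m + 0.5 * dt * k1, **kwargs)
        k3 = llg_rhs(m + 0.5 * dt * k2, **kwargs)
        k4 = llg_rhs(m + dt * k3, **kwargs)
        m = m + (dt / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
        m = m / np.linalg.norm(m)
    return m
```

Starting from a magnetization tilted 0.3 rad away from +z (or −z) with `rho_stt=0`, damping relaxes it back to the nearer pole, so both P and AP states are stable. With this sign convention, a positive `rho_stt` pulls m toward `m_r`, and a sufficiently large value (e.g., 5 × 10^10 1/s here) switches the state.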
When the source voltage passes through an NMOS transistor (N0, N1), it drops to about 0.8 V, reduced by the NMOS threshold voltage (Vth,n ≈ 0.4 V). As a result, the MTJ write circuit can be powered by a single 1.2 V supply instead of two different voltage amplitudes. Subsequently, the STT current is applied to the MTJ for 0.4 ns. As illustrated in Fig. 2.b, the P3 and N2 transistors are turned on to pass the STT current through the MTJ if the input is '0', so the magnetization of the free layer becomes anti-parallel to that of the fixed layer. In contrast, if the input is '1', the P2 and N3 transistors are turned on, which makes the magnetization of the free layer parallel to the fixed layer.

3. Non-volatile STT-assisted precessional VCMA based magnetic full-adder

As illustrated in Fig. 3, the MFA structure contains a pre-charge sense amplifier (PCSA), CMOS logic, and MTJ memory elements. A pair of MRAM cells in complementary states stores the non-volatile input data [15]. In this paper, we implement a non-volatile magnetic full-adder based on STT-assisted precessional VCMA MTJs. Compared to STT-MTJ based MRAM, STT-assisted precessional VCMA achieves lower power consumption and faster write switching. With STT-assisted VCMA-MRAM, the dynamic write energy can be reduced significantly, whereas STT-MRAM has 30x higher energy consumption [29].

In this paper, we assume that input 'B' is the only non-volatile datum. Hence, 'B' is written in 'MTJ-L' and 'MTJ-R' in complementary form. A PCSA comprises a pre-charge sub-circuit, a discharging sub-circuit, and a pair of inverters, and it works in two phases. In the first phase, 'CLK' is set to '0' to pre-charge the complementary outputs, such as 'SUM' and '/SUM', and 'Cout' and '/Cout'. In the second phase, 'CLK' is set to '1' for discharge. The two discharging branches have non-identical resistances: the branch with lower resistance is pulled down, which simultaneously causes the other branch to be pulled up to 'Vdd', so the branch with lower resistance eventually settles to 'Gnd'. The CMOS logic is based on Eq. (4) to Eq. (7) [15]. Unlike the 'Cout' CMOS logic, the CMOS logic for 'SUM' matches the logical relation among the inputs 'A', 'B', and 'Ci'.

$$\mathrm{Sum} = A \oplus B \oplus C_i = \bar{A}\bar{B}C_i + \bar{A}B\bar{C}_i + A\bar{B}\bar{C}_i + ABC_i \qquad (4)$$

$$\overline{\mathrm{Sum}} = \bar{A}\bar{B}\bar{C}_i + \bar{A}BC_i + A\bar{B}C_i + AB\bar{C}_i \qquad (5)$$

$$C_{out} = AB + AC_i + BC_i \qquad (6)$$

$$\overline{C_{out}} = \bar{A}\bar{B} + \bar{A}\bar{C}_i + \bar{B}\bar{C}_i \qquad (7)$$

Fig. 3. STT-assisted precessional VCMA based magnetic full-adder architecture with 'SUM' and 'Cout' outputs; 'B' is the non-volatile datum.

4. Related work

There are many methods in the literature for accurate and approximate adders [30–39]. Compared to accurate adders, approximate adders sacrifice some accuracy to reduce power, area, and delay. For example, in Ref. [30], an OR operation was used instead of XOR in 'SUM' to compute the least significant bits. Several valuable works have been done on approximate non-volatile magnetic full-adders [32,37]. In Ref. [37], an approximate majority-gate based full-adder that uses current-induced domain wall motion (DWM) is presented. In Ref. [32], the authors reduced the supply voltage of the circuit so that it can be used for approximate computing (AX-MFA2). According to their results, half of the 'SUM' outputs are erroneous, i.e., the number of errors (NoE) is 4. In addition, 'Cout' is not exactly correct and has an NoE of 2. They also eliminated the input 'Ci' from the CMOS logic and computed 'SUM' as a 2-input XOR, which has an NoE of 4, while 'Cout' remains correct (AX-MFA1). In these methods, the area of the FA circuit is not reduced significantly, whereas our proposed methods reduce it at a much lower cost.

In this paper, seven approximate magnetic full-adders (AMFs) based on STT-assisted precessional VCMA are presented. Although the VCMA effect already improves energy-delay efficiency by using an electric field, for error-tolerant IoT applications with a very limited energy budget we propose approximate full-adders that are more cost-effective than previous works, with the same or even better accuracy. For simplicity, in the remainder of this paper we call these methods AMF1 to AMF7.

Table 1 shows the truth table of a full-adder. According to the truth table, the 'SUM' and 'Cout' outputs are in most cases equal to the '/Cout' and '/SUM' outputs, respectively. In other words, only in two of the eight cases, 'A = B = Ci = 0' and 'A = B = Ci = 1' (the first and last rows of Table 1), is computing 'SUM' from '/Cout' and 'Cout' from '/SUM' incorrect. Therefore, we can exploit this feature to obtain 'SUM' or 'Cout' approximately.

Table 1
Truth table of a full-adder.

| A | B | Ci | SUM | /SUM | Cout | /Cout |
|---|---|----|-----|------|------|-------|
| 0 | 0 | 0 | 0 | 1 | 0 | 1 |
| 0 | 0 | 1 | 1 | 0 | 0 | 1 |
| 0 | 1 | 0 | 1 | 0 | 0 | 1 |
| 0 | 1 | 1 | 0 | 1 | 1 | 0 |
| 1 | 0 | 0 | 1 | 0 | 0 | 1 |
| 1 | 0 | 1 | 0 | 1 | 1 | 0 |
| 1 | 1 | 0 | 0 | 1 | 1 | 0 |
| 1 | 1 | 1 | 1 | 0 | 1 | 0 |

Since the 'Cout' sub-circuit CMOS logic has fewer transistors than the 'SUM' sub-circuit, we keep the 'Cout' sub-circuit intact and manipulate the 'SUM' sub-circuit for approximation. Keeping 'Cout' error-free has the advantage that, in multi-bit adders, errors do not propagate to higher stages. Therefore, in AMF1, we use the 'Cout' sub-circuit CMOS logic instead of the 'SUM' sub-circuit CMOS logic. As shown in Fig. 4, the CMOS logic of the '/Cout' and 'Cout' sub-branches, which has fewer transistors, is placed in the 'SUM' and '/SUM' sub-branches, respectively. For AMF2, we remove the complete 'SUM' sub-circuit and use the '/Cout' sub-branch of the 'Cout' sub-circuit in place of 'SUM'. AMF2 is simpler and more efficient than AMF1, since it removes a PCSA, two MTJs along with their write circuit, and eight transistors of CMOS logic, which is very cost-effective in terms of power and area. Consequently, as shown in Table 2, 'Cout' is accurate in both AMF1 and AMF2, and 'SUM' has an NoE of 2.

Fig. 4. CMOS logic for AMF1.

One of the problems with MTJs is the power consumption and delay of writing. The write-circuit transistors must be large enough to provide the current required for MTJ switching. Hence, in AMF3, we eliminate the write circuit of the MTJs in the 'SUM' sub-circuit, so the initial states of these MTJs remain unchanged and the NoE of 'SUM' is 4. AMF4 is the same as AMF1, except that the MTJs in the 'SUM' sub-circuit have no write circuit; in this method, 'SUM' has an NoE of 4. In AMF5, we only remove the CMOS logic of the 'SUM' sub-circuit, which has 8 transistors; half of the 'SUM' outputs remain correct, so the NoE is 4. In AMF6, we remove all MTJ write circuits in the full-adder, which significantly reduces power consumption. Note that in this method the 'Cout' output has a smaller NoE than the 'SUM' output: 2 and 4, respectively.

AMF7 is a combination of AMF2 and AMF6 and has the minimum cost among all proposed methods. In AMF7, the 'Cout' sub-circuit MTJs have no write circuit, and we also remove the 'SUM' sub-circuit and use the '/Cout' sub-branch of the 'Cout' sub-circuit in place of 'SUM'. In this method, a PCSA, two MTJs together with the CMOS logic of the 'SUM' sub-circuit, and the write circuits of four MTJs are deleted, while 'Cout' and 'SUM' have NoEs of 2 and 4, respectively. The 'SUM' and 'Cout' outputs of the accurate full-adder are presented in Table 2 together with those of the proposed and previous approximate full-adders. Incorrect outputs in this table are shown in bold on a gray background; as can be seen, AMF1 and AMF2 have the fewest errors among the proposed designs.

Table 2
SUM and Cout outputs of the accurate and approximate magnetic full-adders: proposed (AMF1 to AMF7) and previous works: AX-MFA [32], approximate domain wall-based [37], and approximate CMOS-based [38].

4.1. Sobel's edge detection algorithm

Image processing is employed in many applications, and one of its important aims is to provide images in which features and structure can be measured in order to extract useful information [40]. The most fundamental feature of an image is its edges. Image edges are areas of striking contrast, and they carry the important information of an image. Edge detection filters out unusable information and significantly reduces the amount of data while preserving the important structural properties of an image [41]. Sobel is one of the most important edge detection algorithms in image processing. Recently, image processing algorithms, especially edge detection algorithms such as Sobel, have been implemented in IoT devices to monitor the environment [42,43]. We chose the Sobel algorithm because it can be implemented almost entirely with 8-bit addition operations. To investigate the effect of the proposed AMFs on the image quality of the Sobel output, we applied the approximate full-adders of Table 2 to the Sobel algorithm.

Approximate addition can be applied to a number of least significant bits (LSBs), according to the trade-off between accuracy and cost. We used the peak signal-to-noise ratio (PSNR) and the structural similarity index measure (SSIM) to compare the accuracy of the approximate magnetic full-adders. PSNR (in dB) and SSIM are two key parameters for comparing two images [44]. In the accurate case, where the two images are identical, PSNR equals INF and SSIM equals 1.

5. Simulation results and evaluation

To evaluate our proposed methods and compare them with previous works, we used the HSPICE simulator with a PTM 45 nm CMOS model [45] and the STT-assisted precessional VCMA Verilog-A model developed in Ref. [24]. In addition, we used MATLAB to implement the Sobel algorithm and to measure the quality metrics of images after approximation. In other words, the Verilog-A model of STT-assisted VCMA is used for device-level simulation, HSPICE simulation is used to measure power and delay at the circuit level, and, at the application level, MATLAB simulation is used to assess the impact of the approximate full-adders on the Sobel output quality by calculating the PSNR and SSIM metrics. To map the HSPICE simulations to MATLAB, we implemented a MATLAB function that performs approximate bitwise addition, in which the approximate add operation and the number of approximated LSBs can be configured.

An MTJ device with a TMR ratio of 100% is considered in all simulations, and the remaining parameters are listed in Table 3. We simulate our accurate and approximate STT-assisted VCMA-MRAM based full-adders and compare them with the accurate and approximate CMOS-based [34,38], MTJ-based [34], SHE-based [35], and DWM-based [36,37] full-adders.

Fig. 5 depicts the transient simulation of single-bit accurate and approximate magnetic full-adders with inputs 'A', 'B', 'Ci' and outputs 'SUM', 'Cout'. For instance, in the accurate MFA, if 'A' = '1', 'B' = '1', and 'Ci' = '1', both 'SUM' and 'Cout' are '1', but in AMF1 or AMF2 'SUM' is '0' and 'Cout' is '1'.
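The configurable approximate bitwise addition used to map the circuit results onto the Sobel algorithm was written in MATLAB and is not listed in the paper. The Python sketch below illustrates the idea under stated assumptions. It uses a plain ripple-carry adder of unsigned `width`-bit operands (8 by default) in which the k least significant full-adder cells use the AMF1/AMF2 behavior (exact 'Cout', 'SUM' = '/Cout') and the remaining cells are exact. PSNR uses the standard definition 10·log10(peak²/MSE) with peak = 255 for 8-bit images. SSIM is omitted, and `approx_add` and `psnr` are hypothetical names.

```python
import math


def full_add(a, b, ci, approx):
    """One full-adder cell; approx=True uses the AMF1/AMF2 behavior."""
    cout = (a & b) | (a & ci) | (b & ci)        # Eq. (6), kept exact
    s = (1 - cout) if approx else (a ^ b ^ ci)  # SUM = /Cout when approximate
    return s, cout


def approx_add(x, y, k, width=8):
    """Ripple-carry sum of two unsigned ints; the k lowest cells are approximate."""
    carry = 0
    result = 0
    for i in range(width):
        a = (x >> i) & 1
        b = (y >> i) & 1
        s, carry = full_add(a, b, carry, approx=(i < k))
        result |= s << i
    return result | (carry << width)  # keep the final carry as bit `width`


def psnr(ref, test, peak=255):
    """PSNR in dB between two equal-length pixel sequences; inf if identical."""
    mse = sum((r - t) ** 2 for r, t in zip(ref, test)) / len(ref)
    return math.inf if mse == 0 else 10 * math.log10(peak ** 2 / mse)
```

With k = 0 the function reduces to exact addition; with k = 1, adding 0 + 0 already returns 1, since the approximate cell outputs 'SUM' = '/Cout' = 1 for A = B = Ci = 0. Comparing the exact and approximate sums over an image with `psnr` mirrors, in simplified form, the 1-LSB to 4-LSB evaluation reported in Table 4.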
Fig. 5. Transient simulations of the STT-assisted precessional VCMA based accurate magnetic full-adder and of AMF1 and AMF2.

Table 3
Parameters used in the model.

| Parameter | Description | Default value |
|-----------|-------------|---------------|
| Ki | Interfacial PMA | 0.32 mJ/m² |
| Ms | Saturation magnetization | 0.625 × 10^6 A/m |
| ξ | VCMA coefficient | 60 fJ/(V·m) |
| d | Diameter of MTJ | 50 nm |
| tox | Oxide layer thickness | 1.4 nm |
| tf | Free layer thickness | 1.1 nm |
| α | Gilbert damping factor | 0.05 |
| Δ | Thermal stability at Vb = 0 | 40 |
| RP, RAP | MTJ resistance | ~100 kΩ, ~200 kΩ |

Error distance (ED) is one of the most valuable metrics for comparing approximate FAs: the accurate output 'a' and the inaccurate output 'b' are compared over all feasible input combinations using Eq. (8) [46]. The error rate (ER) and the mean error distance (MED) are two other important metrics derived from the ED, assuming that the inputs are uniformly distributed. ER is defined as the probability of producing an erroneous output, and MED is defined as the mean ED over all possible outputs for a given set of input vectors [47].

$$ED(a,b) = |a - b| = \left|\sum_{i} a[i]\,2^{i} - \sum_{j} b[j]\,2^{j}\right| \qquad (8)$$

The simulation results for the proposed methods and previous works are presented in Table 4 in terms of the number of transistors (T), the number of MTJs (M), the number of SHE-MTJs (SM), the number of domain walls (D), power consumption (dynamic + static), the number of errors (NoE) of the 'SUM' and 'Cout' outputs, error rate (ER), mean error distance (MED), and the image quality metrics PSNR and SSIM for 1-LSB to 4-LSB approximation (x-LSB means that the x low-order bits of the 8-bit adder are approximate). Since most previous works were designed and evaluated in 180 nm, for a fair comparison we scaled our designs to this process node. To do this, we used the scaling equations for accurate prediction of power and delay presented in Ref. [48], from the 45 nm node with Vdd = 1.2 to the 180 nm node with Vdd = 1; these equations were also used for scaling in previous works.

Since VCMA uses an electric field instead of an electric current, it has lower dynamic power consumption than STT-MTJ switching. Besides, the STT-assisted VCMA-MTJ employs a much thicker tunneling barrier, which results in negligible static power consumption. Additionally, the STT-assisted VCMA-MTJ writes faster than the STT-MTJ [29]. As expected, the simulation results show that our proposed accurate STT-assisted VCMA-MRAM based full-adder significantly outperforms all previous accurate full-adders [34–37] in power consumption, by 7.6x compared to the state-of-the-art work. As shown in Table 4, the device count (area) of the proposed accurate full-adder is almost the same as that of the other accurate works. Delay is the maximum time from an input change until it affects the 'SUM' or 'Cout' output. According to the simulation results, the full-adder delay is determined by the 'Cout' output and is lower than 1 ns.

As shown in Table 4, AMF7 has the lowest cost in terms of device count, power consumption, and delay compared to the other proposed approximate methods and previous approximate works. Compared to AX-MFA1, AX-MFA2, and the approximate DWM-based full-adder, AMF7 reduces device count by more than 50% and power consumption substantially, by 9.5x, 3.6x, and 39x, respectively. Hereafter, we compare the proposed and previous methods with equal NoE, ER, and MED. When the NoE of 'SUM' is 4 and 'Cout' has no error, the device count (if the write circuit is also considered) and especially the power consumption of AMF3 and AMF4 are better than those of AX-MFA1 and AMF5.

When the NoE of 'SUM' is 2 and that of 'Cout' is zero, the device count and power consumption of AMF2 are half those of AMF1. Furthermore, AMF2 reduces power consumption by 9.4x and 8.4x compared to the approximate CMOS-based [38] and DWM-based [37] full-adders, respectively. Notably, AMF2 outperforms the other methods when 'Cout' is error-free and the NoE of 'SUM' is 2. When the NoEs of 'Cout' and 'SUM' are 2 and 4, respectively, AMF6 performs better than AX-MFA2 owing to its lack of write circuits, which consume considerable power; in this case, AMF7 outperforms the other methods.

To compare image quality under approximate computing, we used the PSNR and SSIM metrics to evaluate the accurate and approximate full-adders performing the Sobel algorithm. According to the results in Table 4, when only the least significant bit is approximated (1-LSB), AMF5 and AX-MFA1 have higher PSNR and SSIM. We found that in the 1-LSB addition there is no incoming carry, so one input (C) is always zero, another input (B) is also zero in most cases, and input A is either one or zero. According to Table 2, AMF5 and AX-MFA1 work correctly in these cases, i.e., A = 0, B = 0, C = 0 and A = 1, B = 0, C = 0, whereas the other methods are erroneous in one of these two cases. In general, AMF1, AMF2, and the approximate CMOS-based [38] and DWM-based [37] full-adders have the best accuracy. It is noteworthy that the PSNR and SSIM of AMF7 are close to those of the other methods for 1-LSB to 4-LSB approximation, despite its having the smallest area and power consumption. In Table 4, the best PSNR and SSIM values are shown in bold.

Fig. 6 illustrates the accurate and approximate results of applying the Sobel edge detection algorithm to the Lena image. First, as Fig. 6.b depicts, we apply Sobel to the original image to obtain the accurate output. Then, according to Table 4, we show several of the better approximate Sobel results for 1-LSB and 2-LSB approximation for all approximate methods. As stated previously, in 1-LSB and 2-LSB approximation, one and two least significant bits, respectively, are computed approximately. As shown in Fig. 6.c to e, g, i, and k to m, with 1-LSB or even 2-LSB approximation there is no meaningful change in the output of the approximated Sobel.

Table 4
Comparison of the previous works and the proposed accurate and approximate STT-assisted VCMA-MRAM based full-adders.

| Type | Magnetic full-adder | Device count (1) | Power (2) (μW) | Delay (3) (ps) | NoE (4) SUM | NoE (4) Cout | ER (5) (%) | MED (6) | 1-LSB (7) PSNR | 1-LSB SSIM | 2-LSB PSNR | 2-LSB SSIM | 3-LSB PSNR | 3-LSB SSIM | 4-LSB PSNR | 4-LSB SSIM |
|------|---------------------|------------------|----------------|----------------|-------------|--------------|------------|---------|----------------|------------|------------|------------|------------|------------|------------|------------|
| Accurate | Proposed | 32T+4M | 7.3 + 0 | 989 | 0 | 0 | 0 | 0 | INF | 1 | INF | 1 | INF | 1 | INF | 1 |
| Accurate | CMOS [34] | 42T | 71.1 + 0.001 | 2200 | | | | | | | | | | | | |
| Accurate | MTJ [34] | 34T+4M | 2100 + 0 | 10200 | | | | | | | | | | | | |
| Accurate | SHE [35] | 23T+3SM | 710 + 0 | 7000 | | | | | | | | | | | | |
| Accurate | DWM [36] | 20T+4M+2D | 85 + 0 | 877 | | | | | | | | | | | | |
| Accurate | DWM [37] | 28T+4M+2D | 55.6 + 0 | 3000 | | | | | | | | | | | | |
| Approx. | CMOS [38] | 14T | 32.5 + 0.002 | 645 | 2 | 0 | 25 | 0.25 | 21.95 | 0.91 | 20.70 | 0.88 | 16.13 | 0.76 | 12.21 | 0.66 |
| Approx. | DWM [37] | 28T+4M+2D | 28.9 + 0 | 2000 | 2 | 0 | 25 | 0.25 | 21.95 | 0.91 | 20.70 | 0.88 | 16.13 | 0.76 | 12.21 | 0.66 |
| Approx. | AX-MFA1 [32] | 28T+4M | 7 + 0 | 990 | 4 | 0 | 50 | 0.5 | 23.52 | 0.92 | 14.83 | 0.70 | 12.91 | 0.64 | 12.32 | 0.61 |
| Approx. | AX-MFA2 [32] | 32T+4M | 2.4 + 0 | 901 | 4 | 2 | 50 | 0.5 | 20.40 | 0.87 | 16.03 | 0.73 | 14.32 | 0.66 | 11.72 | 0.61 |
| Approx. | AMF1 | 28T+4M | 6.73 + 0 | 990 | 2 | 0 | 25 | 0.25 | 21.95 | 0.91 | 20.70 | 0.88 | 16.13 | 0.76 | 12.21 | 0.66 |
| Approx. | AMF2 | 14T+2M | 3.43 + 0 | 990 | 2 | 0 | 25 | 0.25 | 21.95 | 0.91 | 20.70 | 0.88 | 16.13 | 0.76 | 12.21 | 0.66 |
| Approx. | AMF3 | 32T+4M | 4.29 + 0 | 990 | 4 | 0 | 50 | 0.5 | 21.91 | 0.90 | 18.09 | 0.80 | 14.53 | 0.67 | 12.04 | 0.62 |
| Approx. | AMF4 | 28T+4M | 4.38 + 0 | 990 | 4 | 0 | 50 | 0.5 | 20.73 | 0.88 | 16.08 | 0.76 | 13.01 | 0.67 | 10.33 | 0.62 |
| Approx. | AMF5 | 26T+4M | 6.53 + 0 | 989 | 4 | 0 | 50 | 0.5 | 23.52 | 0.92 | 14.83 | 0.70 | 12.91 | 0.64 | 12.32 | 0.61 |
| Approx. | AMF6 | 32T+4M | 1.60 + 0 | 869 | 4 | 2 | 50 | 0.5 | 20.40 | 0.87 | 16.03 | 0.73 | 14.32 | 0.66 | 11.72 | 0.61 |
| Approx. | AMF7 | 14T+2M | 0.74 + 0 | 868 | 4 | 2 | 50 | 0.5 | 21.05 | 0.89 | 17.33 | 0.78 | 14.52 | 0.68 | 11.04 | 0.62 |

Notes: AX-MFA1 and AX-MFA2 full-adders are based on STT-assisted VCMA-MRAM. Technology scaling is also used to achieve a fair comparison. (1) T: CMOS transistor, M: MTJ, SM: SHE-MTJ, D: DW. (2) Dynamic power + static power. Unlike CMOS-based storage, which must stay powered to keep its data, the non-volatile designs can be powered off. (3) Worst-case delay is considered for the proposed accurate and approximate designs (AMF1 to AMF7) and for AX-MFA1 and AX-MFA2. (4) Number of errors in the outputs. (5) Error rate. (6) Mean error distance. (7) Least significant bits of the 8-bit addition in the Sobel edge detection algorithm.

Fig. 6. (a) Lena original image; (b) accurate result of applying Sobel to the Lena image; (c) AMF1, AMF2, [37], and [38], 1-LSB; (d) AMF1, AMF2, [37], and [38], 2-LSB; (e) AMF3, 1-LSB; (f) AMF3, 2-LSB; (g) AMF4, 1-LSB; (h) AMF4, 2-LSB; (i) AMF5 and AX-MFA1, 1-LSB; (j) AMF5 and AX-MFA1, 2-LSB; (k) AMF6 and AX-MFA2, 1-LSB; (l) AMF6 and AX-MFA2, 2-LSB; (m) AMF7, 1-LSB; (n) AMF7, 2-LSB.

6. Conclusions

In this paper, we introduced an accurate magnetic full-adder based on
STT-assisted precessional VCMA and its corresponding write circuit. Afterward, we proposed seven approximate magnetic full-adders (AMFs) that are efficient in terms of area and power and can be used in IoT devices. Moreover, these approximate full-adders were applied to some of the least significant bits (LSBs) of the additions in the Sobel edge detection algorithm, and the PSNR and SSIM metrics were used for comparison. Our simulation results demonstrate that some of the proposed AMFs reduce area and power consumption significantly, with the same or even better accuracy than previous works. Among the proposed designs, AMF2 and AMF7 are the most affordable: AMF2 is the most cost-effective design when 'Cout' must be error-free, while AMF7 is the most desirable when errors in 'Cout' are tolerable.

Appendix A. Supplementary data

Supplementary data to this article can be found online at https://doi.org/10.1016/j.mejo.2018.10.010.

References

[1] A.B. Intelligence, More than 30 Billion Devices Will Wirelessly Connect to the Internet of Everything in 2020, Allied Business Intelligence (ABI) Research, New York, NY, USA, 2014. Retrieved November, 10 (2013).
[2] M. Gao, Q. Wang, M.T. Arafin, Y. Lyu, G. Qu, Approximate computing for low power and security in the internet of things, Computer 50 (2017) 27–34.
[3] M. Alioto, Enabling the Internet of Things: from Integrated Circuits to Integrated Systems, Springer, 2017.
[4] A. Bosio, A. Virazel, P. Girard, M. Barbareschi, Approximate computing: design & test for integrated circuits, in: Test Symposium (LATS), 2017 18th IEEE Latin American, IEEE, 2017, p. 1.
[5] Q. Xu, T. Mytkowicz, N.S. Kim, Approximate computing: a survey, IEEE Design Test 33 (2016) 8–22.
[6] H.E. Yantir, A.M. Eltawil, F.J. Kurdahi, Approximate memristive in-memory computing, ACM Trans. Embed. Comput. Syst. 16 (2017) 129.
[7] S. Mittal, A survey of techniques for approximate computing, ACM Comput. Surv. 48 (2016) 62.
[8] B. Zeinali, D. Karsinos, F. Moradi, Progressive scaled STT-RAM for approximate computing in multimedia applications, IEEE Trans. Circ. Syst. II: Expr. Briefs 67 (7) (2017) 938–942.
[9] F. Samie, L. Bauer, J. Henkel, IoT technologies for embedded computing: a survey, in: Proceedings of the Eleventh IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis, ACM, 2016, p. 8.
[10] J. Henkel, S. Pagani, H. Amrouch, L. Bauer, F. Samie, Ultra-low power and dependability for IoT devices (Invited paper for IoT technologies), in: 2017 Design, Automation & Test in Europe Conference & Exhibition (DATE), IEEE, 2017, pp. 954–959.
[11] S. Senni, L. Torres, G. Sassatelli, A. Gamatie, Non-volatile processor based on MRAM for ultra-low-power IoT devices, ACM J. Emerg. Technol. Comput. Syst. 13 (2017) 17.
[12] H. Jayakumar, A. Raha, Y. Kim, S. Sutar, W.S. Lee, V. Raghunathan, Energy-efficient system design for IoT devices, in: Design Automation Conference (ASP-DAC), 2016 21st Asia and South Pacific, IEEE, 2016, pp. 298–301.
[13] M. Tahoori, S. Nair, R. Bishnoi, S. Senni, J. Mohdad, F. Mailly, L. Torres, P. Benoit, A. Gamatie, P. Nouet, Using multifunctional standardized stack as universal spintronic technology for IoT, in: Design, Automation & Test in Europe Conference & Exhibition (DATE), 2018, IEEE, 2018, pp. 931–936.
[14] S. Dutt, S. Nandi, G. Trivedi, Analysis and design of adders for approximate computing, ACM Trans. Embed. Comput. Syst. 17 (2018) 40.
[15] E. Deng, Y. Zhang, J.-O. Klein, D. Ravelsona, C. Chappert, W. Zhao, Low power magnetic full-adder based on spin transfer torque MRAM, IEEE Trans. Magn. 49 (2013) 4982–4987.
[16] H.-P. Trinh, W. Zhao, J.-O. Klein, Y. Zhang, D. Ravelsona, C. Chappert, Magnetic adder based on racetrack memory, IEEE Trans. Circ. Syst. I: Reg. Pap. 60 (2013) 1469–1477.
[17] Y. Gang, W. Zhao, J.-O. Klein, C. Chappert, P. Mazoyer, A high-reliability, low-power magnetic full adder, IEEE Trans. Magn. 47 (2011) 4611–4616.
[18] B. Hoefflinger, ITRS: the International Technology Roadmap for Semiconductors, in: Chips 2020, Springer, 2011, pp. 161–174.
[19] B.K. Kaushik, S. Verma, A.A. Kulkarni, S. Prajapati, Next Generation Spin Torque Memories, Springer, 2017.
[20] D.D. Tang, Y.-J. Lee, Magnetic Memory: Fundamentals and Technology, Cambridge University Press, 2010.
[21] X. Fong, K. Roy, On-chip Non-volatile STT-MRAM for Zero-standby Power, in: Enabling the Internet of Things, Springer, 2017, pp. 213–246.
[22] E. Deng, G. Prenat, L. Anghel, Non-volatile Magnetic Decider Based on Magnetic Tunnel Junctions, Electronics Letters, 2016.
[23] S. Wang, H. Lee, F. Ebrahimi, P.K. Amiri, K.L. Wang, P. Gupta, Comparative evaluation of spin-transfer-torque and magnetoelectric random access memory, IEEE J. Emerg. Select. Top. Circ. Syst. 6 (2016) 134–145.
[24] W. Kang, Y. Ran, Y. Zhang, W. Lv, W. Zhao, Modeling and exploration of the voltage-controlled magnetic anisotropy effect for the next-generation low-power and high-speed MRAM applications, IEEE Trans. Nanotechnol. 16 (2017) 387–395.
[25] W. Kang, Y. Ran, W. Lv, Y. Zhang, W. Zhao, High-speed, low-power, magnetic non-volatile flip-flop with voltage-controlled, magnetic anisotropy assistance, IEEE Magnet. Lett. 7 (2016) 1–5.
[26] R. Rajaei, A. Gholipour, Low power, reliable, and nonvolatile MSRAM cell for facilitating power gating and nonvolatile dynamically reconfiguration, IEEE Trans. Nanotechnol. 17 (2018) 261–267.
[27] J. Alzate, Voltage-controlled Magnetic Dynamics in Nanoscale Magnetic Tunnel Junctions, Electrical Engineering, University of California, Los Angeles, 2014.
[28] S. Huda, A. Sheikholeslami, A novel STT-MRAM cell with disturbance-free read operation, IEEE Trans. Circ. Syst. I: Reg. Pap. 60 (2013) 1534–1547.
[29] H. Cai, W. Kang, Y. Wang, L. Naviner, J. Yang, W. Zhao, High performance MRAM with spin-transfer-torque and voltage-controlled magnetic anisotropy effects, Appl. Sci. 7 (2017) 929.
[30] H.R. Mahdiani, A. Ahmadi, S.M. Fakhraie, C. Lucas, Bio-inspired imprecise computational blocks for efficient VLSI implementation of soft-computing applications, IEEE Trans. Circ. Syst. I: Reg. Pap. 57 (2010) 850–862.
[31] A. Panahi, F. Sharifi, M.H. Moaiyeri, K. Navi, CNFET-based approximate ternary adders for energy-efficient image processing applications, Microprocess. Microsyst. 47 (2016) 454–465.
[32] H. Cai, Y. Wang, L.A.D.B. Naviner, W. Zhao, Robust ultra-low power non-volatile logic-in-memory circuits in FD-SOI technology, IEEE Trans. Circ. Syst. I: Reg. Pap. 64 (2017) 847–857.
[33] H. Cai, Y. Wang, L.A. Naviner, Z. Wang, W. Zhao, Approximate computing in MOS/spintronic non-volatile full-adder, in: Nanoscale Architectures (NANOARCH), 2016 IEEE/ACM International Symposium on, IEEE, 2016, pp. 203–208.
[34] S. Matsunaga, J. Hayakawa, S. Ikeda, K. Miura, H. Hasegawa, T. Endoh, H. Ohno, T. Hanyu, Fabrication of a nonvolatile full adder based on logic-in-memory architecture using magnetic tunnel junctions, APEX 1 (2008), 091301.
[35] A. Roohi, R. Zand, D. Fan, R.F. DeMara, Voltage-based concatenatable full adder using spin Hall effect switching, IEEE Trans. Comput. Aided Des. Integrated Circ. Syst. 36 (2017) 2134–2138.
[36] A. Roohi, R. Zand, R.F. DeMara, A tunable majority gate-based full adder using current-induced domain wall nanomagnets, IEEE Trans. Magn. 52 (2016) 1–7.
[37] S. Angizi, H. Jiang, R.F. DeMara, J. Han, D. Fan, Majority-based spin-CMOS primitives for approximate computing, IEEE Trans. Nanotechnol. 17 (4) (2018) 795–806.
[38] V. Gupta, D. Mohapatra, A. Raghunathan, K. Roy, Low-power digital signal processing using approximate adders, IEEE Trans. Comput. Aided Des. Integrated Circ. Syst. 32 (2013) 124–137.
[39] E. Deng, Y. Zhang, W. Kang, B. Dieny, J.-O. Klein, G. Prenat, W. Zhao, Synchronous 8-bit non-volatile full-adder based on spin transfer torque magnetic tunnel junction, IEEE Trans. Circ. Syst. I: Reg. Pap. 62 (2015) 1757–1765.
[40] J.C. Russ, The Image Processing Handbook, CRC Press, 2016.
[41] S. Gupta, S.G. Mazumdar, Sobel edge detection algorithm, Int. J. Comput. Sci. Manag. Res. 2 (2013) 1578–1583.
[42] M.-S. Liao, S.-F. Chen, C.-Y. Chou, H.-Y. Chen, S.-H. Yeh, Y.-C. Chang, J.-A. Jiang, On precisely relating the growth of phalaenopsis leaves to greenhouse environmental factors by using an IoT-based monitoring system, Comput. Electron. Agric. 136 (2017) 125–139.
[43] K. Lakshmi, S. Gayathri, Implementation of IoT with image processing in plant growth monitoring system, J. Sci. Innov. Res. 6 (2017) 80–83.
[44] A. Hore, D. Ziou, Image quality metrics: PSNR vs. SSIM, in: 2010 20th International Conference on Pattern Recognition (ICPR), IEEE, 2010, pp. 2366–2369.
[45] Predictive Technology Model (PTM). [Online]. http://ptm.asu.edu/.
[46] J. Liang, J. Han, F. Lombardi, New metrics for the reliability of approximate and probabilistic adders, IEEE Trans. Comput. 62 (2013) 1760–1771.
[47] C. Liu, J. Han, F. Lombardi, An analytical framework for evaluating the error characteristics of approximate adders, IEEE Trans. Comput. 64 (2015) 1268–1281.
[48] A. Stillmaker, B. Baas, Scaling equations for the accurate prediction of CMOS device performance from 180 nm to 7 nm, Integrat. VLSI J. 58 (2017) 74–81.
