Professional Documents
Culture Documents
Mohanty Isqed2011 Bch
Mohanty Isqed2011 Bch
net/publication/261205011
BCH code based multiple bit error correction in finite field multiplier circuits
CITATIONS READS
20 1,445
5 authors, including:
All content following this page was uploaded by Saraju P. Mohanty on 11 June 2015.
Abstract— This paper presents a design methodology for mul- providing the correct information to the external world. Recent
tiple bit error detection and correction in Galois field arithmetic research proposes various schemes to overcome such attacks.
circuits such as the bit parallel polynomial basis (PB) multipliers The conventional approach for concurrent error detection
over GF(2m ). These multipliers are crucial in most of the
cryptographic hardware designs and hence it is essential to ensure may be considered as a one-to-one mapping between the
that they are not vulnerable to security threats. Security threats output of the functional unit, which is represented as a binary
arising from injected soft (transient) faults into a cryptographic vector of length k, and a codeword which is a binary vector
circuit can expose the secret information, e.g. the secret key, of length n [13]. Namely, a codeword represents a Boolean
to an attacker. To prevent such soft or transient fault related expression that is a minterm. The Hamming distance between
attacks, we consider fault tolerance as a method of mitigation.
Most of the current fault tolerant schemes are only multiple bit the codewords must be greater than one, otherwise, it is
error detectable but not multiple bit error correctable. Keeping impossible to detect an error. This restriction increases the
this in view, we present a multiple bit error correction scheme implementation cost of the functional unit and the implemen-
based on the BCH codes, with an efficient bit-parallel Chien tation cost of the checker [15]. The most commonly used
search module. This paper details the design procedure as fault tolerant schemes for single error detection/correction
well as the hardware implementation specs. Comparison with
existing methods demonstrate improved area, and reduced delay are the error detection and recalculation methods, and the
performances. error correcting designs [11] [9]. In the error detection and
recalculation scheme, the concurrent error detection circuitry
I. I NTRODUCTION (CED) monitors for an error occurrence and, in case of an
Online error detection and correction has received much error, it rolls backs and recalculate again. The drawback of this
attention in recent years as a candidate for attack tolerant approach is that it often increases the time redundancy. An-
cryptographic hardware design and to certain extent fault other commonly used approach is the N-modular redundancy
tolerance as well [1]. With low operating voltage levels and (NMR). In NMR, the actual circuitry is replicated N times
low noise margins, digital designs are often vulnerable to and the decision of whether the resultant output is correct is
various faults [2]. In present day communication devices, cryp- decided by a voting circuitry. The main pitfall of this method is
tographic hardware is an inevitable part. The cryptographic the increase in space redundancy depending upon the number
hardware often needs to store some information hidden in of bit error corrections we need. In [5], a single error correction
order to ensure high degree of information security. The (SEC) scheme based on the Hamming codes is proposed, while
Faults resulting in incorrect output are mainly due to naturally in [10] this method has been extended to the LDPC codes and
occurring faults or due to malicious attacks. The former the method in [6] used to synthesize efficient circuits.
can be detected with various testing techniques, whereas the Motivation and Contribution: From the literature review
latter cannot be detected with the testing schemes presently made, it is evident that efficient error correcting designs
available. Clearly, this can result in catastrophe, if unde- for Galois field based arithmetic circuits are not fully ex-
tected [8]. Attacks against such cryptography hardware are plored. Our proposed scheme overcomes most of the design
often classified into two categories: invasive and non-invasive. drawbacks mentioned above. Firstly, this is the first known
The invasive attacks are based on reverse engineering and approach for bit parallel multiple error correction based on
hence require costly equipment. The non-invasive method, on BCH codes for the multipliers over GF(2m ), where as all
the other hand, exploits the implementation weakness of the the existing techniques considered only single bit error. As
device and is also known as the side channel attacks [3]. It a part of the design, we also propose an efficient bit parallel
is also noted that, hacking of information within the chip is implementation of the Chein search module within the error
possible by introducing hardware Trojans within the process correcting circuitry. Secondly, most of the discussed error cor-
variation allowance of the chip [4]. In these types of attacks, rection schemes impose much higher time redundancy where
the attacker may try to acquire the hidden information by as our proposed multiple bit error correction scheme runs in
injecting random events, e.g. through transient faults, into the parallel with the logic, hence requires less time overhead. The
hardware [7]. To keep our information hidden, we need to added delay is only due to the decoding part of the proposed
mask these injected faults and ensure that the devices keep scheme. In the parity based error correction schemes [1], [5],
the error in the parity blocks can not be detected, where as bi s, where 0 ≤ i ≤ m − 1, are the coordinates of A and B re-
in the proposed scheme, the errors in the parity blocks are spectively. The formulation is based on three matrices namely,
detected as well as the final outputs corrected. We have also an m × m reduction matrix Q, a lower triangular matrix L and
derived the closed form expressions for the parity prediction an upper triangular matrix U. The matrix based multiplication
block. is formulated as an inner product (IP) network with two vector
→
The remainder of this paper is organized as follows. Sec- outputs d~ and e respectively, where,
tion II explains the basic fundamentals of the Galois field
d~ = L~b (1)
arithmetic and Polynomial basis multiplication. In Section III,
we present the design methodologies of the BCH code based ~e = U~b, (2)
multiple error correction scheme. Section IV presents the
where ~b = [b0 , b1 , b2 , . . . , bm−1 ]T , a vector column of the coor-
experimental results, and finally Section V concludes the paper
dinates and xT represents the x transpose. The matrices L and
with future extensions of our research.
U are defined in [12].
The multiplication outputs are given by the equation:
II. P RELIMINARIES
~c = d~ + QT~e, (3)
For completeness, this section presents the preliminaries of
Galois field (GF) arithmetic, which is necessary for both the where the matrix Q, which is dependent on the irreducible
multiplier as well as the BCH coding scheme. polynomials, can be derived as shown in [12] and ~c =
For every prime number p, there exists a Galois field, [c0 , c1 , c2 , . . . , cm−1 ]T is the output bits. For the sake of clarity,
also known as the finite fields, over the set GF(p) having we depict a small example below.
p elements with special elements 0 and 1 as the additive Example 1: Let A and B be two multiplicands over GF(23 )
and multiplicative identities respectively. It is possible to generated with the irreducible polynomial P(x) = x3 + x + 1
extend the fields over GF(p) to a field that consists of pm with A = [a0 , a1 , a2 ] and B = [b0 , b1 , b2 ]. Then, A and B can also
elements, where m is a nonzero positive integer. This extended be represented as A = a2 x2 + a1 x + a0 and B = b2 x2 + b1 x +
field over the set GF(pm ) is known as the extension of the b0 in the polynomial form. The product C(x) = A(x) · B(x)
field over GF(p). Let ‘+’ and ‘·’ represent the addition and mod P(x). Now, C(x) = (a2 x2 + a1 x + a0 ) · (b2 .x2 + b1 x + b0 ) =
multiplication operations on the field elements. Then GF(pm ) (a2 b2 )x4 + (a1 b2 + a2 b1 )x3 + (a0 b2 + a1 b1 + a2 b0 )x2 + (a0 b1 +
forms a finite field if it forms a commutative ring with identity a1 b0 )x+(a0 b0 ). Let us consider the outputs of the IP networks
over these two operations. The finite fields over GF(2), and d and e. In the GF(23 ) arithmetic, d and e are column vectors
their extensions over GF(2m ), have particular interest in digital having 3 and 2 elements respectively. Let d = [d0 , d1 , d2 ] and
electronics owing to the field elements 0 and 1 only. e = [e0 , e1 ]. Then C(x) can be rewritten as,
The finite fields over GF(2m ) can be generated with monic C(x) = e1 x4 + e0 x3 + d2 x2 + +d1 x + d0 . (4)
irreducible polynomials of the form P(x) = xm−1 + ∑i=0 m−2
ci .xi ,
where ci ∈ GF(2) [14]. Other than elements 0 and 1 the field
consists of elements that are multiples of the element α , also A B
known as the primitive element, where α is the root of P(x) m m
i.e. P(α ) = 0. P(x) is also known as the primitive polynomial m
m
of the field. To make sure that the operations over the field are
finite, any element in the field having power > 2m−1 is reduced BCH Check−bit
m−bit PB GF Multiplier
to an element with power < 2m−1 by using the primitive Generation
polynomial P(x). The set of elements {0, 1, α , α 2 , . . . , α m−1 }
forms the polynomial basis (PB). Any element A ∈ GF(2m ) can .... pk .... p2 p1 p0
cm c1 c0
be represented using the elements in PB. Let A, B ∈ GF(2m )
with, A(x) = ∑i=0 m−1
ai xi , and B(x) = ∑i=0
m−1
bi xi , , where ai , bi ∈ ECm
Synd Generator &
GF(2). The polynomial basis multiplication of A(x) and B(x)
...
over GF(2m ) is defined as C(x) = A(x) · B(x) mod P(x). EC2 BCH Decoder
To simplify the classical way of finite field multiplication, EC1
M(x) = c4 x4 + c3 x3 + c2 x2 + c1 x + c0 (10) The syndrome decoding is done by using the well known
n−k n−k 4 3 2 Peterson-Gorenstein-Zierler algorithm. Peterson noticed that
x M(x) = x (c4 x + c3 x + c2 x + c1 x + c0 )
we need only a few of the syndromes to effectively correct the
= c4 x14 + c3 x13 + c2 x12 + c1 x11 + c0 x10 , (11) bit errors. In our case for three bit error correction we need
since n = 15 and k = 5 in this case. to calculate only syndromes S1, S3, and S5. The generalized
The parity check bits are generated by, equation for syndromes for the given example of BCH(15, 5, 7)
are given as follows,
P(x) = xn−k M(x) mod g(x). (12) S1 = s13α 3 + s12α 2 + s11α + s10, S3 = s33α 3 + s32α 2 +
Let α be the primitive element of GF(24 ), as shown in s31α +s30, S5 = s53α 3 +s52α 2 +s51α +s50, s10 = c4 +c3 +
Table I. Here, P(x) = x4 + x + 1 is the primitive polynomial. c2 + c0 + p8 + p7 + p4 + p0 , s11 = c2 + c1 + c0 + p9 + p7 + p5 +
The three minimal polynomials m1 (x), m3 (x), and m5 (x) are p4 + p1 , s12 = c3 + c2 + c1 + c0 + p8 + p6 + p5 + p2 , s13 = c4 +
given by, c3 + c2 + c1 + p9 + p7 + p6 + p3 , s30 = c4 + c0 + p9 + p5 + p4 +
p0 , s31 = c4 +c3 + p9 + p8 + p4 + p3 , s32 = c4 +c2 + p9 + p7 +
m1 (x) = x4 + x + 1 (13)
p4 + p2 , s33 = c4 + c3 + c2 + c1 + p9 + p8 + p7 + p6 + p4 + p3 +
m3 (x) = x4 + x3 + x2 + x + 1 (14) p2 + p1 , s50 = c4 + c2 + c1 + p9 + p8 + p6 + p5 + p3 + p2 + p0 ,
m5 (x) = x2 + x + 1. (15) s51 = c4 + c3 + c1 + c0 + p8 + p7 + p5 + p4 + p2 + p1 , s52 =
c4 + c3 + c1 + c0 + p8 + p7 + p5 + p4 + p2 + p1 , s53 = 0.
For three bit error correction (t = 3), the generator polyno-
Determining whether the computation is error free is not
mial for constructing the codeword is then given by,
sufficient, and we also need to correct these errors in case
g(x) = lcm m1 (x), m3 (x), m5 (x)). (16) if they are present. For this, we need to compute the error
Substituting values of Eq. (13), Eq. (14) and Eq. (15) in positions or error locations of the erroneous bits. To de-
Eq. (16) we get, termine the error positions effectively, we need to decode
the syndromes. The syndrome decoding block of the BCH
g(x) = x10 + x8 + x5 + x4 + x2 + x + 1 (17) based error correction technique contains an error locator
Substituting Eq. (17) in Eq. (12) gives, polynomial generator block that finds the root of the error
locator polynomial and a decoder that eventually corrects the
P(x) = p9 x9 + p8 x8 + p7 x7 + p6 x6 + p5 x5 + p4 x4 + p3 x3 erroneous bits based on the computed error position. For this
+p2 x2 + p1 x1 + p0 (18) purpose computed syndrome values are passed on to the error
of the error locator polynomial are based on exhaustive search
methods. A widely known algorithm for finding the roots is
the Chien search algorithm, in which all the possible values
of the primitive element α , ranging from α 0 , α , . . . , α 2m−1 ,
are induced into the error locator polynomial to check if they
satisfy the polynomial. In the proposed design, a bit parallel
implementation of the Chien search algorithm is implemented.
In particular, we proposed a scheme in which the root of the
error locator polynomial is checked only among the powers of
the primitive element α corresponding to the bit positions of
the message bits, i.e. the multiplier output bits. The roots of
the error locator polynomial corresponding to the parity bits
are omitted in order to reduce the hardware complexity. For a
5-bit multiplier, we check whether α , α 2 , α 3 , α 4 , α 5 are roots
of the error locator polynomial, which in turn corresponds
to the bit positions c4 , c3 , c2 , c1 and c0 in the output of the
multiplier. In other words, if α is a root of the error locator
polynomial, it says that the bit c4 of the multiplier is erroneous,
etc. The decoder corrects the erroneous bit(s) corresponding
to the information provided by the parallel root search block.
Based on this design principle, we have also extended the
Fig. 3. Block with area analysis of a 45-bit GF multiplier with 3-bit error design to a 16-bit bit parallel PB multiplier over GF(216 ) and
correction.
to a 45-bit bit parallel PB multiplier over GF(245 ).
(S13 + S3 + S1S2).
The above three equations give the coefficients σ 1, σ 2, and 120
80
60
40
20
0
With Error Correction
Original Design 4 Error 5 Error
2 Error 3 Error
the SynopsysTM (180nm technology) design compiler. Fig. 5 on the BCH codes. We also presented an efficient bit parallel
shows the area overhead for the various designs with 2, 3, 4, structure of the iterative Chien search algorithm for finding
and 5 error correction for a 45-bit multiplier. Fig. 3 shows the roots of the error locator polynomial, comprising less
the area of the various blocks in our proposed multiple error area and time complexity. The experimental results showed
correction scheme. Fig. 4 shows the snapshot of a typical that the proposed scheme has a lower complexity in terms
ModelsimTM simulation result. During the simulations, we of area and delay compared with the NMR based techniques.
have introduced faults into the multiplier outputs randomly Also, compared to SEC techniques the hardware overhead is
for checking the error correction capability of the proposed well within acceptable margins despite its enhanced capability.
scheme. The highlighted parts in Fig. 4 show one among Future extensions of the proposed work will include fully
the many testing values. We have introduced errors in the testable versions of the proposed scheme for higher complexity
intermediate stages of the multiplier, which in turn gave bit parallel and digit serial architectures. Power and delay
multiple bit errors at the multiplier output. In this case we specs will be explored with advanced testable designs.
have errors at bit positions 1, 2, and 16, however the cout
R EFERENCES
values show the corrected final output from the BCH decoder.
Although the example designs considered 2 to 5 bit error [1] A. Reyhani-Masoleh, and M. Anwar Hasan, “Fault Detection Architec-
tures for Field Multiplication Using Polynomial Bases”, IEEE Trans.
correction capability, based on the theory presented in this Computers, vol. 55, No. 9, Sept. 2006.
paper, we can easily extend its capability to more than five [2] Mitra S., Seifert N., Zhang M., Shi Q and Kim K, “Robust System
bits. The back end process, place and route, is done for a 45- Design with Built-In Soft Error Resilience”, IEEE Computer, Vol. 38,
Number 2, pp. 43-52, Feb. 2005.
bit GF multiplier with three error correction capability using [3] C. R. Moratelli, E. Cota, M. S. Lubaszewski “A Cryptography Core
the Cadence EncounterTM tool set. The final layout of the Tolerant to DFA Fault Attacks”, Journal Integrated Circuits and Systems,
design is shown in Fig. 6. pp-14-21, 2007.
[4] Y. Jin, Y. Makris, “Hardware Torjans in Wireless Cryptographic ICs”,
IEEE Design & Test Computers, January/February 2010.
[5] J. Mathew, A. M. Jabir, H. Rahaman, D. K. Pradhan, “Single error
Correctable Bit Parallel Multipliers over GF(2m )”, IET Comput. Digit.
Tech., Vol. 3, Iss.3, pp. 281-288,2009.
[6] A. Jabir, D. Pradhan, J. Mathew “GfXpress: A Technique for Synthesis
and Optimization of GF(2m ) Polynomials”, IEEE Trans. CAD 27(4),
690–711, 2008.
[7] G. B. Ratanpal, R. D. Williams,and T. N. Blalock, “An On-Chip Signal
Suppression Countermeasure to Power Analysis Attacks”, IEEE Trans.
on dependable and secure computing, vol. 1, no. 3, July 2004.
[8] D. Boneh, R. A. DeMillo, and R. J. Lipton, “On the Importance of Elim-
inating Errors in Cryptographic Computations”, Journal of Cryptology,
14, pp. 101–119, 2001.
[9] M. Ciet and M. Joye. Dueck, ”Elliptic Curve Cryptosystems in the
Presence of Permanent and Transient Faults”, Designs, Codes and
Cryptography, 36(1), July 2005, pp. 33–43.
[10] J. Mathew, J. Singh, A. M. Jabir, M. Hosseinabady and D. K. Pradhan,
“Fault Tolerant Bit Parallel Finite Field Multipliers using LDPC Codes”,
IEEE, 2008.
[11] G. Gaubatz and B. Sunar, ”Robust finite field arithmetic for fault-
Fig. 6. Layout of 45-bit GF Multiplier with Multiple Error Correction tolerant public-key cryptography”, 2nd Workshop on Fault Tolerance
Feature. and Diagnosis in Cryptography (FTDC), 2005.
[12] A. Reyhani-Masoleh, and M. Anwar Hasan, “Low Complexity Bit Par-
allel Architectures for Polynomial Basis Multiplication over GF(2m )”,
Further, for a given error correction capability, the extra IEEE Trans. on Computers, Vol.53, No.8, pp.945-959, August 2004.
hardware comes down significantly for larger and more practi- [13] K. Wu, R. Karri, G. Kuznetsov and M. Goessel: “Low Cost Concurrent
Error Detection for the Advanced Encryption Standard”, International
cal designs. For example, for the 5-, 16-, and 45-bit multipliers Test Conference, pp.1242-1248, 2004.
we observed that the extra hardware is 600%, 240%, and [14] D. K. Pradhan, “A Theory of Galois Switching Functions”, IEEE Trans.
150.4% respectively for 3-bit error correction capability. Computers, vol. 27, no. 3, pp.239-248, Mar. 1978.
[15] O. Keren “One-to-Many: Context-Oriented Code for Concurrent Error
Detection” Journal of Electron Test, vol. 26, pp. 337-353, 2010
V. C ONCLUSIONS AND F UTURE WORK [16] E. D. Matrovito, “VLSI Achitectures for Computation in Galois Fields.”
Ph.D. thesis, Linkoping University, Linkoping Sweden, 1999.
This paper proposed a technique for designing bit parallel
multiple bit error correctable multipliers over GF(2m ) based