Burst Error Correcting Code For Protecting On-Chip Memory Systems Against Multiple Cell Upsets

IPASJ International Journal of Electronics & Communication (IIJEC)

Web Site: http://www.ipasj.org/IIJEC/IIJEC.htm


Email: editoriijec@ipasj.org
ISSN 2321-5984

A Publisher for Research Motivation........

Volume 2, Issue 10, October 2014

Burst error correcting code for protecting on-chip memory systems against multiple cell upsets
Hoyoon Jun (1) and Yongsurk Lee (2)

(1) School of Electrical and Electronic Engineering, Yonsei University, Seoul, Korea
(2) School of Electrical and Electronic Engineering, Yonsei University, Seoul, Korea

ABSTRACT
Multiple cell upsets (MCUs) caused by neutron-induced soft errors threaten the reliability of on-chip memory systems as process
technologies continue to shrink in the deep submicron regime. Multiple burst error correction codes (MBECCs) are useful for
addressing these upsets. We propose a cost-effective single error correction, double error detection, and double-triple adjacent error
correction (SEC-DED-DTAEC) code. The proposed code corrects double adjacent errors without mis-correction and corrects triple
adjacent errors with fewer parity bits than a double error correction (DEC) BCH code. In simulation experiments, we demonstrate
that the proposed code requires no more parity bits than conventional double adjacent error correction codes and fewer than
DEC BCH and triple adjacent error correction codes.

Keywords: error correcting code, neutron-induced soft error, multiple cell upsets, on-chip memories

1. INTRODUCTION
Multiple cell upsets (MCUs) caused by neutron-induced soft errors are extremely problematic for on-chip memory
systems as the size and the supply voltage of silicon devices shrink in the deep sub-micron regime. These upsets may
cause burst errors in physically adjacent memory cells because the heavy-ion tracks left by nuclear reactions alter the
information held in storage nodes [1, 2]. On-chip memory systems may be effectively protected against soft errors
by error correcting codes (ECCs) [2]. Single error correction and double error detection (SEC-DED) codes correct
single-bit errors and detect double-bit errors using a small number of redundant bits, called parity check bits, and are widely
employed in commercial on-chip memory systems. However, these codes cannot reliably protect these systems against
MCUs. To correct multiple adjacent errors caused by MCUs, multiple burst error correction codes (MBECCs) have been
proposed. SEC-DED and double adjacent error correction (SEC-DED-DAEC) codes exploit linear
block coding [3, 4]. These codes are shortened Hamming codes that correct both single and double adjacent errors.
However, because some syndromes for double adjacent errors and double non-adjacent errors are equal, these codes are
prone to mis-correction. Double error correction Bose-Chaudhuri-Hocquenghem (DEC BCH) codes for on-chip
memories are proposed in [5]. Although these codes correct random and burst double errors without mis-correction,
they require many parity bits and incur a long decoding latency. Triple adjacent error correction codes are proposed in [6]. These
codes are derived from orthogonal Latin square codes that can correct random and burst double errors with simple one-step
majority logic decoding. However, these codes require more parity bits than DEC BCH codes. In addition, low-cost
burst error correction codes are proposed in [7]. These codes are more efficient for flip-flops than for on-chip memories
because they offer rapid decoding but still require many parity check bits. In this paper, we propose cost-effective SEC-DED
and double-triple adjacent error correction (SEC-DED-DTAEC) codes for protecting on-chip memory systems
against MCUs. In simulation experiments and comparisons with conventional MBECCs, the proposed codes corrected
single errors and double adjacent errors without mis-correction and corrected triple adjacent errors with the same number of
parity check bits as SEC-DED-DAEC codes.

2. LINEAR BLOCK CODE


A block code with n - k parity check bits is a linear (n, k) block code if its 2^k codewords form a k-dimensional
subspace of the vector space of all n-tuples over the field GF(2). All 2^k codewords can be generated by a k × n
generator matrix G or specified by an (n - k) × n parity-check matrix H (the H-matrix). The generator matrix G consists of
a k × (n - k) parity equation matrix P and a k × k identity matrix I_k. The H-matrix consists of the (n - k) × k matrix P^T
and the (n - k) × (n - k) identity matrix I_{n-k}, where P^T is the transpose of P. The G and H matrices thus have k and n - k
linearly independent rows, respectively. Therefore, an (n, k) linear block code is completely generated by linear combinations
of the rows of G and is equivalently specified as the null space of H. In addition, G H^T = 0, where H^T is the transpose of H,
because G = [P | I_k] and H = [I_{n-k} | P^T].
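
As a small worked check of the relation G H^T = 0, the following Python sketch builds G = [P | I_k] and H = [I_{n-k} | P^T] for a (7, 4) code. The matrix P and the helper names are illustrative assumptions chosen for this example; they are not taken from the paper.

```python
def identity(n):
    """n x n identity matrix over GF(2), as a list of rows."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul_gf2(a, b):
    """Matrix product with all arithmetic taken modulo 2."""
    return [[sum(x & y for x, y in zip(row, col)) % 2 for col in zip(*b)]
            for row in a]

# Hypothetical k x (n - k) parity equation matrix for k = 4, n = 7.
P = [[1, 1, 0],
     [1, 0, 1],
     [0, 1, 1],
     [1, 1, 1]]
k, r = len(P), len(P[0])

# G = [P | I_k] is k x n; H = [I_{n-k} | P^T] is (n - k) x n.
G = [p_row + i_row for p_row, i_row in zip(P, identity(k))]
H = [i_row + pt_row for i_row, pt_row in zip(identity(r), transpose(P))]

# G H^T = P + P = 0 over GF(2).
GHt = matmul_gf2(G, transpose(H))

# Encoding: the codeword of the information vector u is v = u G.
u = [1, 0, 1, 1]
v = matmul_gf2([u], G)[0]
v_syndrome = matmul_gf2([v], transpose(H))[0]
```

Here `GHt` is the 4 × 3 all-zero matrix, and every codeword produced by `G` has an all-zero `v_syndrome`.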

An n-bit vector v = (v_0, v_1, ..., v_{n-1}) is a codeword if v H^T = 0. An error vector e can be expressed as the difference
between a transmitted codeword v and a received word r = (r_0, r_1, ..., r_{n-1}). Over GF(2), the sum r + v is the
n-tuple error vector e = (e_0, e_1, ..., e_{n-1}), where e_i = 1 for r_i ≠ v_i and e_i = 0 for r_i = v_i. When r is
received, the decoder computes the (n - k)-tuple s = (s_0, s_1, ..., s_{n-k-1}) = r H^T, which is called the syndrome of r.
The syndrome is s = 0 if r is the transmitted codeword without an error; otherwise s ≠ 0, unless the error pattern is itself a codeword.
The syndrome depends only on the error pattern, because s = r H^T = (v + e) H^T = v H^T + e H^T. Consequently, s = e H^T,
since v H^T = 0. If the syndrome s is equal to a single column vector of the H-matrix, this single error pattern indicates the
bit position to be corrected in r.

The minimum distance of a linear block code determines the random error detecting and correcting capability of
the code. The Hamming weight of a codeword v is defined as the number of 1s in the codeword. The Hamming distance between
two codewords of equal length is the number of positions at which the corresponding
symbols are different. Thus, the distance between any two codewords is equal to the weight of the XOR of these
codewords. Because the XOR of two codewords of a linear block code is again a codeword, the minimum distance of the code
equals the minimum weight of its non-zero codewords. For each
codeword of Hamming weight w, there is a linear dependence relation among w columns of H, that is, the sum of these w
columns is the zero vector. In other words, if w is the minimum weight, every combination of w - 1 or fewer columns of the
H-matrix is linearly independent, that is, the sum of any w - 1 or fewer columns is non-zero. A received word r that is not a
codeword, and therefore does not satisfy this minimum distance rule, cannot be the transmitted codeword v. An (n, k) linear
block code with minimum distance d_min can correct all patterns of t = ⌊(d_min - 1) / 2⌋ or fewer errors.

The first class of linear block codes for
error correction and detection is the Hamming code. For any positive integer r = n - k with d_min - 1 ≤ n - k, where n and k
are the lengths of the codeword and the information respectively, there exists a Hamming code with n = 2^r - 1 codeword bits,
k = 2^r - r - 1 information bits, n - k = r parity bits, and error correcting capability t = 1 (d_min = 3). A code is capable
of correcting t errors and detecting t + 1 errors if d_min > 2t + 1. The minimum Hamming distances of SEC-DED and DEC codes are
4 and 5, respectively. For the H-matrix of a SEC-DED code, the sum of any three or fewer columns should be non-zero. The SEC-DED
code in computer memory systems is usually the Hsiao code [8], whose H-matrix consists of odd-weight columns. The important
feature of the Hsiao code is fast encoding and decoding with few parity bits. The H-matrix of the Hsiao code satisfies the
following requirements:
1. There is no all-zero column vector.
2. Every column is distinct.
3. Every column contains an odd number of 1s.

Figure 1 Parity-check matrix of the Hsiao (39, 32) SEC-DED code


The third requirement gives the code generated by the H-matrix a minimum distance of at least 4, because the
sum of three odd-weight columns always has odd weight and is therefore non-zero. Fig. 1 shows the H-matrix, without the
identity matrix, of the (39, 32) SEC-DED Hsiao code. Every column vector in this H-matrix has weight three (odd).
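
The three requirements and the resulting syndrome decoding can be illustrated with a short Python sketch. It assumes a Hsiao-style (39, 32) matrix whose information columns are simply the first 32 weight-3 vectors of length 7 in the order produced by `itertools.combinations`; this is an illustrative choice and not necessarily the matrix of Fig. 1. Columns are stored as 7-bit integers, and all function names are hypothetical.

```python
from itertools import combinations

R = 7  # number of parity check bits

# Assumed information columns: the first 32 weight-3 vectors of length 7.
data_cols = [sum(1 << i for i in c) for c in combinations(range(R), 3)][:32]
parity_cols = [1 << i for i in range(R)]   # identity part of the H-matrix
H = data_cols + parity_cols                # 39 columns in total

def popcount(x):
    return bin(x).count("1")

# The three Hsiao requirements.
no_zero_column = all(c != 0 for c in H)
all_distinct = len(set(H)) == len(H)
all_odd_weight = all(popcount(c) % 2 == 1 for c in H)

# Minimum distance of at least 4: no two or three columns sum to zero.
min_distance_ok = (all(a ^ b != 0 for a, b in combinations(H, 2)) and
                   all(a ^ b ^ c != 0 for a, b, c in combinations(H, 3)))

def syndrome(word):
    """s = r H^T, computed as the XOR of the columns where the word has a 1."""
    s = 0
    for bit, col in zip(word, H):
        if bit:
            s ^= col
    return s

def decode(word):
    s = syndrome(word)
    if s == 0:
        return "no error", None
    if popcount(s) % 2 == 1 and s in H:
        return "single error", H.index(s)   # position of the bit to flip
    return "uncorrectable", None            # e.g. any double error

# The all-zero word is a codeword of every linear code.
one_error = [0] * 39
one_error[5] = 1
two_errors = [0] * 39
two_errors[5] = two_errors[20] = 1
```

With these inputs, `decode(one_error)` locates bit position 5, while `decode(two_errors)` reports an uncorrectable error, because the XOR of two odd-weight columns has even weight and cannot match any single column.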

3. PROPOSED CODE
3.1 H-matrix Generation Rules
The proposed code is derived from the Hsiao SEC-DED code. To correct double adjacent errors without mis-correction, the
syndromes for DAEC must be unique. That is, the syndromes for double adjacent errors must not overlap those for double
non-adjacent errors and must be unique among double adjacent errors. In addition, because every column in the H-matrix has
odd weight, the syndrome of a triple adjacent error also has odd weight. It therefore cannot coincide with the syndrome of any
double error, but it must be completely separated from the syndromes of single errors and of other triple adjacent errors.
Therefore, the H-matrix in our proposed codes is generated under the following rules:

1. There is no all-zero column vector.
2. Every column vector is unique and has odd weight.
3. Every XOR result of double and triple adjacent column vectors is unique.
4. XOR results of double non-adjacent column vectors may overlap each other.
5. XOR results between columns of information bits and columns of parity check bits may overlap.
The first two rules guarantee a Hamming distance of four for SEC-DED. The third rule provides the capability to correct
double and triple adjacent errors. The fourth rule helps to decrease the number of parity check bits, because the
proposed code does not need to correct random double errors, given the adjacent error patterns produced by MCUs. Typically, a bit
interleaving scheme is used in SRAMs to achieve effective regularity of the SRAM layout, because the bit-cell pitch in
the horizontal direction of the SRAM columns is typically smaller than that of an I/O circuit, including the sense
amplifier and the write driver. Thus, most information bits are physically separated from the parity check bits.
Therefore, the fifth rule is acceptable.
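
The first three rules can be checked mechanically. The Python sketch below is a minimal illustration under stated assumptions: columns are given as integers in physical order, the list is treated as one contiguous run of adjacent cells, and rule 3 is read together with the requirement of Section 3.1 that double adjacent syndromes do not overlap double non-adjacent ones and triple adjacent syndromes do not overlap single-error syndromes. The function name and the example columns are hypothetical.

```python
def popcount(x):
    return bin(x).count("1")

def violated_rules(cols):
    """Return the numbers of the rules (1 to 3) that the column list violates."""
    violated = []
    if any(c == 0 for c in cols):
        violated.append(1)                       # rule 1: no all-zero column
    if len(set(cols)) != len(cols) or any(popcount(c) % 2 == 0 for c in cols):
        violated.append(2)                       # rule 2: unique, odd weight
    adj2 = [a ^ b for a, b in zip(cols, cols[1:])]
    adj3 = [a ^ b ^ c for a, b, c in zip(cols, cols[1:], cols[2:])]
    non_adj = {cols[i] ^ cols[j]
               for i in range(len(cols)) for j in range(i + 2, len(cols))}
    unique = len(set(adj2)) == len(adj2) and len(set(adj3)) == len(adj3)
    separated = not (set(adj2) & non_adj) and not (set(adj3) & set(cols))
    if not (unique and separated):
        violated.append(3)                       # rule 3: adjacent syndromes unique
    return violated

# Hypothetical 6-bit columns, all of weight 3.
good = [0b000111, 0b001011, 0b010101, 0b111000]
# Last column equals the XOR of the first three, so two adjacent pairs collide.
bad = [0b000111, 0b001011, 0b010101, 0b011001]
```

For these examples, `violated_rules(good)` returns an empty list, whereas `violated_rules(bad)` returns `[3]`.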
3.2 H-matrix Generation Procedures
The H-matrix for our proposed codes is generated from a purpose-built column pool matrix and a heuristic algorithm. The
column vectors in the column pool matrix are arranged in reversed colexicographic order, as shown in Fig. 2. This matrix
is exploited to address the mis-correction problem. In this matrix, runs of consecutive ones and zeros in the same row help
to separate double adjacent errors from double non-adjacent errors. For example, in the first row of the column pool, if
two errors occur at one column in the run of consecutive ones and another column in the run of consecutive zeros, the sum of
the first-row bits of these columns is one. In contrast, the sum of the first-row bits of two adjacent
columns within the same run (ones or zeros) is zero; the only exception is the pair of border
columns between the two runs. The diagonal arrangements of ones beneath the sub-sections where consecutive
ones occur assist in discriminating random double errors in those sub-sections.

Figure 2 Example of 7C3 column pool matrix
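
A column pool of this kind can be generated with a few lines of Python. The sketch below enumerates all weight-3 columns of length 7 in reversed colexicographic order. It assumes that the top row of the pool corresponds to the largest element of each 3-subset; the row convention of Fig. 2 may differ, and the function name is hypothetical.

```python
from itertools import combinations

def column_pool(r, weight):
    """Columns of length r and the given weight in reversed colexicographic order.

    Colexicographic order compares subsets by their largest element first,
    so each subset is keyed by its elements sorted in descending order.
    """
    subsets = combinations(range(r), weight)
    ordered = sorted(subsets, key=lambda s: sorted(s, reverse=True), reverse=True)
    # Row 0 (top) holds element r - 1, row r - 1 (bottom) holds element 0.
    return [[1 if (r - 1 - row) in s else 0 for row in range(r)] for s in ordered]

pool = column_pool(7, 3)                 # the 7C3 pool has 35 columns
first_row = [col[0] for col in pool]
second_row = [col[1] for col in pool]

# Print the pool as a 7 x 35 matrix, one row per line.
for row in range(7):
    print("".join(str(col[row]) for col in pool))
```

Under this convention the first row consists of 15 consecutive ones followed by 20 consecutive zeros, and each lower row repeats the same pattern of runs inside the sections defined by the rows above it.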


An H-matrix satisfying all the proposed rules is generated by the algorithm shown in Fig. 3. First, column
pool matrices are constructed in reversed colexicographic order from column vectors of length r with odd weight.
Second, to exploit the advantage of reversed colexicographic order, appropriate columns are always retrieved from the
pool matrices in left-to-right order. Column selection is managed by an overlap weight, defined as the weight of the
XOR result between the last updated column in the temporary H-matrix and a newly selected column in the column
pools. A low overlap weight yields an H-matrix with few ones, because it favors low-weight
column vectors. In addition, exploiting overlap weights in ascending order is useful for selecting available columns
and for distinguishing double adjacent and non-adjacent errors. Third, uniqueness and overlap tests are performed, and
the temporary H-matrix is updated according to the test result. If no satisfactory H-matrix is found, the overlap weight, seed
column, and number of parity check bits are increased, and the procedure is repeated. Lastly, the H-matrix with the
smallest number of ones is selected to reduce the decoding logic area and power consumption of the proposed code.

Figure 3 The proposed algorithm for the H-matrix generation
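
The procedure described above can be approximated by a simple greedy search. The Python sketch below is not the algorithm of Fig. 3; it is a simplified illustration under several assumptions: only the information-bit columns are constructed, parity columns are modeled only as single-error syndromes, the seed column is fixed to the first pool column, there is no backtracking, and the final selection of the matrix with the fewest ones is omitted. All function names are hypothetical.

```python
from itertools import combinations
from typing import Optional

def popcount(x: int) -> int:
    return bin(x).count("1")

def column_pool(r: int, weight: int) -> list:
    """Weight-`weight` columns of length r as integers, in reversed
    colexicographic order (this equals descending integer order)."""
    masks = [sum(1 << i for i in c) for c in combinations(range(r), weight)]
    return sorted(masks, reverse=True)

def passes_tests(cols: list, r: int) -> bool:
    """Uniqueness and overlap tests for a temporary H-matrix."""
    singles = set(cols) | {1 << i for i in range(r)}   # data and parity columns
    adj2 = [a ^ b for a, b in zip(cols, cols[1:])]
    adj3 = [a ^ b ^ c for a, b, c in zip(cols, cols[1:], cols[2:])]
    non_adj = {cols[i] ^ cols[j]
               for i in range(len(cols)) for j in range(i + 2, len(cols))}
    return (len(set(cols)) == len(cols)
            and len(set(adj2)) == len(adj2)
            and len(set(adj3)) == len(adj3)
            and not (set(adj2) & non_adj)
            and not (set(adj3) & singles))

def build_h(k: int, r: int, weights=(3, 5)) -> Optional[list]:
    """Greedily pick k information columns of length r, or return None."""
    pool = [c for w in weights if w <= r for c in column_pool(r, w)]
    if not pool:
        return None
    h = [pool[0]]                                      # fixed seed column
    while len(h) < k:
        chosen = None
        # The XOR of two odd-weight columns has even weight, so overlap
        # weights are tried in ascending order 2, 4, ...
        for overlap in range(2, r + 1, 2):
            for c in pool:                             # left-to-right retrieval
                if (c not in h and popcount(h[-1] ^ c) == overlap
                        and passes_tests(h + [c], r)):
                    chosen = c
                    break
            if chosen is not None:
                break
        if chosen is None:
            return None                                # caller may increase r
        h.append(chosen)
    return h

def search(k: int, r_start: int, r_max: int):
    """Increase the number of parity check bits until a matrix is found."""
    for r in range(r_start, r_max + 1):
        h = build_h(k, r)
        if h is not None:
            return r, h
    return None

result = search(4, 6, 8)
```

For this toy case with k = 4, the search succeeds at r = 6 and returns the columns 111000, 110100, 110010, and 110001. Larger word sizes would need the seed and backtracking steps that this sketch leaves out.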

4. EXPERIMENTAL RESULTS
The effectiveness of our proposed codes was evaluated by simulating them in a high-level language. Table 1 lists the
main parameters of the proposed codes. The fourth column (r-OH) gives the reduction in parity bits achieved by our proposed
codes compared with the DEC BCH code. For 64 information bits, the proposed code requires 21.4% fewer parity bits than the
DEC BCH code. The sixth column (1s-OH) shows the rate of increase of ones in the H-matrix compared with SEC-DED codes.
This result suggests that our proposed codes can be decoded nearly as rapidly as SEC-DED codes. Table 2 compares the
performance of our proposed code with that of conventional DAEC codes (where k = 32). Mis-correction is
avoided in both our proposed code and the DEC BCH code. In addition, our proposed code corrects triple adjacent errors
with the same number of parity bits as the DAEC codes. Although the codes proposed in [6] can also correct triple adjacent
errors using low-cost decoding logic, they require 36 parity bits for k = 64, which is 327% of the number required by our
proposed code.
Table 1: Proposed SEC-DED-DTAEC code parameters

n      k      r     r-OH    # of 1s    1s-OH
42     32     10    -2      100        +4%
75     64     11    -3      256        +23%
141    128    13    -3      538        +13%
Table 2: Comparison of DAEC codes (k = 32)

codes            r     mis-correction rate    # of 1s    TAEC
DAEC [3]         10    8.8%                   140        No
DAEC [4]         10    9.0%                   80         No
DEC BCH [5]      12    0%                     200        No
Proposed code    10    0%                     100        Yes

5. CONCLUSION
In this paper, we have proposed a cost-effective single error correction, double error detection, and double-triple adjacent
error correction (SEC-DED-DTAEC) code that reliably protects on-chip memory systems from multiple cell upsets
(MCUs). To achieve DAEC without mis-correction, the H-matrix consists of column vectors collected from column pool
matrices arranged in reversed colexicographic order. In addition, the overlap weight enables the construction of a low-cost
H-matrix. The proposed codes correct single, double adjacent, and triple adjacent errors with few parity bits. Therefore, our
proposed code can protect against MCUs in on-chip memory systems at much lower power consumption than conventional
multiple burst error correcting codes.

References
[1] Taiki U., Takashi K., Hideya M., and Hashimoto M., "Soft-Error in SRAM at Ultra-Low Voltage and Impact of
Secondary Proton in Terrestrial Environment," IEEE Trans. Nucl. Sci., pp. 4232-4237, 2013.
[2] Nicolaidis M., "Design for soft error mitigation," IEEE Trans. Device Mater. Rel., vol. 5, pp. 405-418, 2005.
[3] Datta R. and Touba N. A., "Exploiting Unused Spare Columns to Improve Memory ECC," In Proceedings of the
27th IEEE VLSI Test Symp., pp. 47-52, 2009.
[4] Neale A. and Sachdev M., "A New SEC-DED Error Correction Code Subclass for Adjacent MBU Tolerance in
Embedded Memory," IEEE Trans. Device Mater. Rel., pp. 223-230, 2013.
[5] Naseer R. and Draper J., "Parallel double error correcting code design to mitigate multi-bit upsets in SRAMs," In
Proceedings of the 34th European Solid-State Circuits Conf., pp. 222-225, 2008.
[6] Reviriego P., Bleakely C. and Maestro J. A., "Implementing triple adjacent error correction in double error
correction orthogonal Latin squares," In Proceedings of the 26th IEEE Symp. Defect and Fault Tolerance in VLSI
and Nanotechnology Systems, pp. 167-171, 2013.
[7] Reviriego P., Pontarelli S., Maestro J. A. and Ottavi M., "Low-cost single error correction multiple adjacent error
correction codes," IET Electronics Lett., pp. 1470-1472, 2012.
[8] Hsiao M. Y., "A class of optimal minimum odd-weight-column SEC-DED codes," IBM J. Res. Develop., pp. 395-401, 1970.
