Burst error correcting code for protecting on-chip memory systems against multiple cell upsets
Hoyoon Jun and Yongsurk Lee
ABSTRACT
Multiple cell upsets (MCUs) caused by neutron-induced soft errors threaten the reliability of on-chip memory systems as
process technologies continue to shrink in the deep submicron regime. Multiple burst error correction codes (MBECCs) are
useful for addressing these upsets. We propose a cost-effective single error correction, double error detection, and
double-triple adjacent error correction (SEC-DED-DTAEC) code. The proposed code corrects double adjacent errors without
mis-correction, and corrects triple adjacent errors with fewer parity bits than double error correction (DEC) BCH codes.
In simulation experiments, we demonstrate that the proposed code requires no more parity bits than conventional double or
triple adjacent error correction codes.
Keywords: error correcting code, neutron-induced soft error, multiple cell upsets, on-chip memories
1. INTRODUCTION
Multiple cell upsets (MCUs) caused by neutron-induced soft errors are extremely problematic for on-chip memory
systems as the size and the supply voltage of silicon devices shrink in the deep sub-micron regime. These upsets may
cause burst errors in physically adjacent memory cells because the heavy-ion tracks left by nuclear reactions alter the
information held in storage nodes [1, 2]. On-chip memory systems can be effectively protected against soft errors
by error correcting codes (ECCs) [2]. Single error correction and double error detection (SEC-DED) codes correct
single-bit errors and detect double-bit errors using a small number of redundant bits, called parity check bits, and are
widely employed in commercial on-chip memory systems. However, these codes cannot reliably protect these systems against
MCUs.
To correct multiple adjacent errors caused by MCUs, multiple burst error correction codes (MBECCs) have been
proposed. SEC-DED and double adjacent error correction (SEC-DED-DAEC) codes exploit linear
block coding [3, 4]. These codes are shortened Hamming codes that correct both single and double adjacent errors.
However, because some syndromes for double adjacent errors and double non-adjacent errors are equal, these codes are
prone to mis-correction. Double error correction Bose-Chaudhuri-Hocquenghem (DEC BCH) codes for on-chip
memories are proposed in [5]. Although these codes correct random and burst double errors without mis-correction,
they require many parity bits and a long decoding latency. Triple adjacent error correction codes are proposed in [6]. These
codes are derived from orthogonal Latin square codes that can correct random and burst double errors with simple
one-step majority logic decoding. However, these codes require more parity bits than DEC BCH codes. In addition, low-cost
burst error correction codes are proposed in [7]. These codes are more efficient for flip-flops than for on-chip memories
because they offer rapid decoding, but they still require many parity check bits.
In this paper, we propose cost-effective SEC-DED and double-triple adjacent error correction (SEC-DED-DTAEC) codes
for protecting on-chip memory systems against MCUs. In simulation experiments and in comparison with conventional
MBECCs, the proposed codes corrected single errors and double adjacent errors without mis-correction, and corrected
triple adjacent errors with the same number of parity check bits as SEC-DED-DAEC codes.
For an $n$-bit vector $v = (v_0, v_1, \ldots, v_{n-1})$, $v$ is a codeword if $vH^T = 0$. An error vector $e$ expresses the
difference between a transmitted codeword $v$ and a received word $r = (r_0, r_1, \ldots, r_{n-1})$. The sum of these vectors,
$r + v$, is the $n$-tuple error vector $e = (e_0, e_1, \ldots, e_{n-1})$, because $e_i = 1$ for $r_i \neq v_i$ and $e_i = 0$ for
$r_i = v_i$. When $r$ is received, the decoder computes an $(n-k)$-tuple $s = (s_0, s_1, \ldots, s_{n-k-1}) = rH^T$, which is
called the syndrome of $r$. The syndrome is $s = 0$ if $r$ is the transmitted codeword without an error; otherwise
$s \neq 0$ for every detectable error pattern. The syndrome is determined by the error pattern alone, because
$s = rH^T = (v + e)H^T = vH^T + eH^T$. Consequently, $s = eH^T$, since $vH^T = 0$. If the syndrome $s$
is equal to a single column vector of the H-matrix, this single-error pattern indicates the bit position to be corrected
in $r$.
The minimum distance of a linear block code determines the random error detecting and correcting capability of
the code. The Hamming weight of a codeword $v$ is defined as the number of 1s in the codeword. The Hamming distance
between two codewords of equal length is the number of positions at which the corresponding
symbols are different. Thus, the distance between any two codewords is equal to the weight of the XOR of these
codewords. Because the XOR of two codewords is itself a codeword, the minimum distance of a linear block code equals
the minimum weight of its non-zero codewords. For each
codeword of Hamming weight $w$, there is a linear dependence relation among $w$ columns of $H$: the sum of those $w$
columns is the zero vector. In other words, when $w$ is the minimum weight, every combination of $w - 1$ or fewer columns
of the H-matrix is linearly independent, so the sum of any $w - 1$ or fewer columns is non-zero. The received word $r$ is
not the transmitted codeword $v$ if $r$ does not satisfy the minimum distance rule. An $(n, k)$ linear block code with minimum
distance $d_{min}$ can correct all patterns of $t = \lfloor (d_{min} - 1) / 2 \rfloor$ or fewer errors.
The first class of linear block codes for
error correction and detection is the Hamming code. For any positive integer $r$ with $d_{min} - 1 \leq n - k$, where $n$ and
$k$ are the lengths of the codeword and the information word respectively, there exists a Hamming code with $2^r - 1 = n$
codeword bits, $2^r - r - 1 = k$ information bits, $n - k = r$ parity bits, and $t = 1$ ($d_{min} = 3$) error correcting
capability. A code is capable of correcting $t$ errors and detecting $t + 1$ errors if $d_{min} > 2t + 1$. The minimum Hamming
distances for SEC-DED and DEC codes are 4 and 5, respectively. For the H-matrix of a SEC-DED code, the sum of any 3 or fewer
columns must be non-zero. The SEC-DED code used in
computer memory systems is usually the Hsiao code [8], whose H-matrix consists of odd-weight columns. The important
feature of the Hsiao code is fast encoding and decoding with few parity bits. The H-matrix of the Hsiao code satisfies
the following requirements:
1. There is no all-zero column vector.
2. Every column is distinct.
3. Every column contains an odd number of 1s.
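The syndrome computation and the three Hsiao requirements above can be illustrated with a minimal sketch. The (8, 4) H-matrix below is a small example chosen for this illustration only, not a matrix from this paper, and the names `H_COLS`, `is_hsiao`, `syndrome`, `encode`, and `decode` are hypothetical. Each column is stored as a 4-bit integer, so adding columns over GF(2) is an XOR.

```python
# Illustrative (8, 4) odd-weight-column code; NOT a matrix from the paper.
# Columns 0-3 belong to data bits (weight 3), columns 4-7 to parity bits (weight 1).
H_COLS = [0b0111, 0b1011, 0b1101, 0b1110, 0b1000, 0b0100, 0b0010, 0b0001]

def is_hsiao(cols):
    """Check the three requirements: non-zero, distinct, odd-weight columns."""
    return (all(c != 0 for c in cols)
            and len(set(cols)) == len(cols)
            and all(bin(c).count("1") % 2 == 1 for c in cols))

def syndrome(cols, word):
    """s = r * H^T over GF(2): XOR of the columns selected by the 1-bits of word."""
    s = 0
    for bit, col in zip(word, cols):
        if bit:
            s ^= col
    return s

def encode(data):
    """Append 4 parity bits so that the codeword has a zero syndrome."""
    s = syndrome(H_COLS[:4], data)
    parity = [(s >> shift) & 1 for shift in (3, 2, 1, 0)]
    return list(data) + parity

def decode(word):
    """Correct a single error; flag anything else with a non-zero syndrome."""
    s = syndrome(H_COLS, word)
    if s == 0:
        return list(word), "no error"
    if s in H_COLS:                      # syndrome equals one column of H
        fixed = list(word)
        fixed[H_COLS.index(s)] ^= 1      # flip the indicated bit
        return fixed, "corrected"
    return list(word), "detected"        # e.g. even-weight syndrome of a double error

codeword = encode([1, 0, 1, 1])
single = list(codeword); single[2] ^= 1
double = list(codeword); double[0] ^= 1; double[1] ^= 1
print(decode(single))
print(decode(double))
```

Because every column has odd weight, the syndrome of any double error has even weight and can never equal a column, which is why the double error is flagged rather than mis-corrected in this sketch.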
3. PROPOSED CODE
3.1 H-matrix Generation Rules
The proposed code is derived from the Hsiao SEC-DED code. To correct double adjacent errors without mis-correction,
the syndromes for DAEC must be unique. Specifically, syndromes for double adjacent errors must not overlap with those
for double non-adjacent errors and must be unique among double adjacent errors. In addition, syndromes for triple
adjacent errors must be completely separated from those of single errors and of other triple adjacent errors, because
every column of the H-matrix has odd weight and both kinds of error therefore yield odd-weight syndromes.
Therefore, the H-matrix in our proposed codes is generated under the following rules:
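Independently of the generation rules, the syndrome requirements stated at the start of this subsection can be checked mechanically for any candidate H-matrix. The sketch below is a hypothetical checker written for illustration, not the authors' construction procedure; the function names are assumptions. Columns are given as integers, and the check is slightly stricter than the text in that double non-adjacent syndromes are compared against all correctable syndromes, not only the double adjacent ones.

```python
from itertools import combinations

def syndrome(cols, positions):
    """XOR of the H-matrix columns at the given error positions."""
    s = 0
    for p in positions:
        s ^= cols[p]
    return s

def separates_adjacent_errors(cols):
    """Return True if single, double adjacent, and triple adjacent errors have
    non-zero, mutually distinct syndromes, and no double non-adjacent error
    shares a syndrome with any of them (so it is detected, not mis-corrected)."""
    n = len(cols)
    correctable = ([(i,) for i in range(n)]
                   + [(i, i + 1) for i in range(n - 1)]
                   + [(i, i + 1, i + 2) for i in range(n - 2)])
    syndromes = [syndrome(cols, p) for p in correctable]
    if 0 in syndromes or len(set(syndromes)) != len(syndromes):
        return False
    correctable_set = set(syndromes)
    for i, j in combinations(range(n), 2):
        if j - i > 1:                       # double non-adjacent error
            s = syndrome(cols, (i, j))
            if s == 0 or s in correctable_set:
                return False
    return True

# Toy inputs only: three columns are enough to exercise every branch.
print(separates_adjacent_errors([0b001, 0b010, 0b100]))
print(separates_adjacent_errors([0b001, 0b010, 0b011]))
```

In the second toy input the double adjacent error on bits 0 and 1 produces the same syndrome as a single error on bit 2, which is exactly the kind of collision that causes mis-correction.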
4. EXPERIMENTAL RESULTS
The effectiveness of our proposed code was evaluated by simulation in a high-level language. Table 1 lists the
main parameters of the proposed code. The fourth column (r-OH) shows the reduction in parity bits achieved by our
proposed codes compared with the DEC BCH code. For 64 information bits, the proposed code requires 21.4% fewer parity
bits (11 instead of 14) than the DEC BCH code. The sixth column (1s-OH) shows the rate of increase in the number of ones
in the H-matrix compared to SEC-DED codes.
This result implies that our proposed codes can decode at least as rapidly as SEC-DED codes. Table 2 compares the
performance of our proposed code with that of conventional DAEC codes (where k = 32). Mis-correction is
avoided in both our proposed code and the DEC BCH code. In addition, our proposed code corrects triple adjacent errors
with the same number of parity bits as the DAEC codes. Although the codes proposed in [6] can also correct triple adjacent
errors using low-cost decoding logic, they require 36 parity bits for k = 64, about 3.27 times (327% of) the 11 parity
bits required by our proposed code.
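The percentages quoted above follow from the Table 1 values by simple arithmetic. The short sketch below reproduces them under one assumption: the DEC BCH parity count is inferred as r minus the r-OH entry, since the text describes r-OH as the change relative to DEC BCH. The variable and function names are hypothetical.

```python
# Values copied from Table 1 (keyed by the number of information bits k).
PROPOSED_R = {32: 10, 64: 11, 128: 13}   # parity bits of the proposed code
R_OVERHEAD = {32: -2, 64: -3, 128: -3}   # r-OH column, relative to DEC BCH

def dec_bch_parity(k):
    """Assumed DEC BCH parity count, inferred as r - (r-OH)."""
    return PROPOSED_R[k] - R_OVERHEAD[k]

def parity_reduction_percent(k):
    """Percentage of parity bits saved relative to DEC BCH."""
    return 100.0 * -R_OVERHEAD[k] / dec_bch_parity(k)

for k in sorted(PROPOSED_R):
    print(k, dec_bch_parity(k), round(parity_reduction_percent(k), 1))

# Ratio of the 36 parity bits reported for the codes of [6] to the proposed code, k = 64.
OLS_TAEC_R_64 = 36
print(round(OLS_TAEC_R_64 / PROPOSED_R[64], 2))
```

For k = 64 this gives 3 of 14 parity bits saved, which is the 21.4% figure in the text.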
Table 1: Proposed SEC-DED-DTAEC code parameters
n      k      r     r-OH   # of 1s   1s-OH
42     32     10    -2     100       +4%
75     64     11    -3     256       +23%
141    128    13    -3     538       +13%
Table 2: Comparison with conventional codes (k = 32)
r     mis-correction rate   # of 1s   TAEC
10    8.8%                  140       No
10    9.0%                  80        No
12    0%                    200       No
10    0%                    100       Yes
5. CONCLUSION
In this paper, we have proposed a cost-effective single error correction, double error detection, and double-triple
adjacent error correction (SEC-DED-DTAEC) code that reliably protects on-chip memory systems from multiple cell upsets
(MCUs). To achieve DAEC without mis-correction, the H-matrix consists of column vectors collected from reversed
colexicographic order matrices. In addition, the overlap weight enables the construction of a low-cost H-matrix. The
proposed code corrects single errors, double adjacent errors, and triple adjacent errors with few parity bits. Therefore,
our proposed code can protect against MCUs in on-chip memory systems at much lower power consumption than conventional
multiple burst error correcting codes.
References
[1] Taiki U., Takashi K., Hideya M., and Hashimoto M., Soft-Error in SRAM at Ultra-Low Voltage and Impact of
Secondary Proton in Terrestrial Environment, IEEE Trans. Nucl. Sci., pp. 4232-4237, 2013.
[2] Nicolaidis M., Design for soft error mitigation, IEEE Trans. Device Mater. Rel., vol. 5, pp. 405-418, 2005.
[3] Datta R. and Touba N. A., Exploiting Unused Spare Columns to Improve Memory ECC, In Proceedings of the
27th IEEE VLSI Test Symp., pp. 47-52, 2009.
[4] Neale A. and Sachdev M., A New SEC-DED Error Correction Code Subclass for Adjacent MBU Tolerance in
Embedded Memory, IEEE Trans. Device Mater. Rel., pp. 223-230, 2013.
[5] Naseer R. and Draper J., Parallel double error correcting code design to mitigate multi-bit upsets in SRAMs, In
Proceedings of the 34th European Solid-State Circuits Conf., pp. 222-225, 2008.
[6] Reviriego P., Bleakely C. and Maestro J. A., Implementing triple adjacent error correction in double error
correction orthogonal Latin squares, In Proceedings of the 26th IEEE Symp. Defect and Fault Tolerance in VLSI
and Nanotechnology Systems, pp. 167-171, 2013.
[7] Reviriego P., Pontarelli S., Maestro J. A. and Ottavi M., Low-cost single error correction multiple adjacent error
correction codes, IET Electronics Lett., pp. 1470-1472, 2012.
[8] Hsiao M. Y., A class of optimal minimum odd-weight-column SEC-DED codes, IBM J. Res. Develop., pp. 395-401, 1970.