Design of Cost Effective Scan-Chain Based Multiple Errors Recovery in TMR Systems
SHIVANANDA N.T
Associate Professor, Dept. of ECE, Don Bosco Institute of Technology, Bangalore, Karnataka,
India
ABSTRACT: In this paper, we present a scan-chain based roll-forward recovery technique for
safety-critical applications that use triple modular redundancy (TMR), called scan-chain based
multiple errors recovery in TMR (SMERTMR) systems. In this technique, single or multiple
transient faults and latent faults in a faulty module are detected and corrected using the scan
chain flip-flops. Transient faults, also referred to as temporary faults, are detected using the
scan chain flip-flops. Latent faults are detected by comparing the internal states of the TMR
modules: a latent fault is one that is not propagated to the system outputs but causes a
mismatch between the states of the TMR modules. When a mismatch is detected between the module
outputs, the SMERTMR system locates the faulty modules and corrects them by copying the state of
a non-faulty redundant module into the faulty modules. This is implemented in the SMERTMR
controller, which combines the comparison and recovery modes and is used both for comparing the
states and for recovering erroneous modules. We modified the existing method to reduce area and
power consumption and to improve performance. The proposed system is designed in Verilog and
simulated using Xilinx ISE.
INTRODUCTION
Today, embedded processors are widely used in many applications. Safety-critical applications
such as industrial process control, avionics in space exploration, and patient life-support
monitoring in the medical field are of particular concern. To meet their reliability
requirements, the embedded processor should be equipped with appropriate error detection and
correction mechanisms. Providing fault tolerance with minimum performance overhead is another
important issue, because these systems must satisfy both timing constraints and fault tolerance
requirements.
In safety-critical applications, one of the best-known and most widely used fault-tolerant
techniques is Triple Modular Redundancy (TMR). A traditional TMR system consists of three
redundant modules and a voter. The TMR-based error recovery techniques proposed so far use a
retry or roll-back mechanism to recover from transient errors. These techniques are not suitable
for applications with tight deadlines, because once an error is detected the faulty module must
re-execute the entire process.
A roll-forward recovery mechanism involves no re-computation, so it can be used in applications
with tight deadlines. A roll-forward technique for TMR systems has been proposed in S. Yu & E. J.
McCluskey (2001). The technique called Scan-chain TMR (ScTMR), proposed in M. Ebrahimi, S. G.
Miremadi, & H. Asadi (2001), uses a roll-forward mechanism and scan chains to recover from both
transient and permanent faults, and it is suitable for general-purpose circuits. It has two
major drawbacks. First, when latent faults are present, ScTMR is unable to recover even a single
faulty module. Second, ScTMR cannot recover the system if multiple faults occur simultaneously
at the module outputs. The system named SMERTMR, proposed in M. Ebrahimi, S. G. Miremadi, H.
Asadi & M. Fazeli (2013), extends this approach to locate, remove and recover the errors in
multiple faulty modules. SMERTMR reuses the available scan chains to compare the internal states
of the TMR modules, locate the faulty modules, and restore the correct state from a fault-free
redundant module into the faulty modules. In this work, SMERTMR is further modified to improve
performance and to reduce area and power consumption.
RELATED WORK
In a traditional TMR system, the voter can mask only one faulty module, and that module cannot
be recovered. The voters proposed in H. Kim & K. G. Shin (1996), K. G. Shin & H. Kim (1989), and
P. K. Chande, A. K. Ramani & P. C. Sharma (1989) use the fault history of the modules and
identify a permanent error when the number of consecutive faults exceeds a predefined number.
Transient and permanent faults can be masked using multiple voters and a disagree detector. The
disagree detector detects a single fault by comparing the values of the different voters in a
TMR system, but a faulty detector may itself lead to failure. Most of the previous work (H. Kim &
K. G. Shin (1996), K. G. Shin & H. Kim (1989)) uses a retry mechanism: when an error is detected,
the faulty module re-executes the entire process. Re-computation may cause the task to complete
after its deadline.
The work proposed in M. Ebrahimi, S. G. Miremadi, H. Asadi & M. Fazeli (2013) shows that multiple
errors can be recovered. It uses a roll-forward recovery mechanism, so it is well suited to
applications with tight deadlines. In the roll-forward recovery method, the faulty module is
recovered by replacing its erroneous state with the correct state of a fault-free redundant
module, which avoids re-computation. The architecture proposed here recovers multiple errors
with lower power consumption, improved performance and less area.
SMERTMR ARCHITECTURE
The SMERTMR block diagram is shown in figure.1. SMERTMR includes 1) three redundant
modules; 2) a voter circuit; and 3) a controller. The proposed voter is an intelligent voter that
can detect an error in a single faulty module. The controller logic plays a very important role
because it has to make the decisions.
In this technique, once the voter detects an error, it activates an error signal to alert the
SMERTMR controller. When the error signal is activated, the controller switches from normal
operation to the comparison mode to locate the faulty modules. After locating the faulty modules,
the controller switches to the recovery mode, which recovers the faulty modules by loading them
with the correct state of one of the fault-free redundant modules. The controller reuses the
available scan chains to compare the internal states of the TMR modules. The proposed system can
therefore detect and recover from an error using the same circuitry. It also handles recovery
from multiple errors in TMR systems.
Figure.1. SMERTMR block diagram
Proposed Voter
In TMR, all three modules perform the same operation, and the outputs of all three modules are
given to the voter. When the voter detects an error, it alerts the controller, which initiates
the comparison and recovery operations. In redundant systems, detecting and correcting faulty
modules is a challenging issue and is still an ongoing research topic. A wrong detection, or the
inability to find the faulty module, significantly affects system reliability.
To address this issue, the voter shown in figure.2 can identify the faulty module and can also
detect comparator faults. The voter contains a comparison unit of three comparators (Cm12, Cm13
and Cm23), each of which compares the outputs of two TMR modules and signals any mismatch
between them. For example, error signal E12 is activated when a mismatch between output I and
output II is detected. When any TMR module generates an erroneous output (for example, output
II), two comparators (Cm12 and Cm23) activate their mismatch signals (E12 and E23). If a
comparator (for example, Cm13) is faulty, only the corresponding error signal (E13) is activated
and the other error signals (E12 and E23) remain deactivated.
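The comparator behaviour described above can be sketched in software. The following is a minimal behavioural model in Python, not the Verilog design itself: the function names and the `DIAGNOSIS` table are hypothetical, the table is reconstructed only from the prose in this section, and the entry for all three error signals being active is an assumption. Note that with fault-free comparators a single active error signal cannot occur (if two pairs of outputs agree, the third pair must agree too), which is why a lone error signal points at a comparator.

```python
def compare_outputs(out1, out2, out3):
    """Model comparators Cm12, Cm13 and Cm23: each flag is True on a mismatch."""
    e12 = out1 != out2
    e13 = out1 != out3
    e23 = out2 != out3
    return e12, e13, e23


# Diagnosis keyed by (E12, E13, E23), reconstructed from the text of this section.
DIAGNOSIS = {
    (False, False, False): "no fault",
    (True, True, False): "module I",
    (True, False, True): "module II",
    (False, True, True): "module III",
    (True, False, False): "comparator Cm12",
    (False, True, False): "comparator Cm13",
    (False, False, True): "comparator Cm23",
    (True, True, True): "multiple faults",  # assumption: not stated in the text
}


def diagnose(e12, e13, e23):
    """Map the three error signals to the suspected faulty unit."""
    return DIAGNOSIS[(e12, e13, e23)]


# Example: module II produces a wrong output, so E12 and E23 are raised.
flags = compare_outputs(5, 7, 5)
suspect = diagnose(*flags)
```

In this example `suspect` names module II, matching the case described in the text where output II is erroneous.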
To detect permanent faults, the presented voter has three input signals (Pr12, Pr13 and Pr23).
These three signals are deactivated during normal operation and during the recovery process. An
output selector circuit feeds the error signals E12 and E13 into a logical AND gate to generate
the select signal of a 2 x 1 multiplexer. The final output follows the majority of the module
outputs. Table I is used to identify the faulty module or faulty comparator from the error
signals and to select the correct voter output. According to Table I, when any of the
comparators, module II, or module III becomes faulty, the output of module I is selected as the
fault-free output. When module I is faulty, the output of module II is selected as the
fault-free output. The output selector circuit can be designed from this specification.
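The output selector can be illustrated with a short behavioural sketch in Python. It assumes only what the text states: the select signal is the AND of E12 and E13, and the 2 x 1 multiplexer chooses between the outputs of module I and module II. The function name `select_output` is hypothetical, and the permanent-fault inputs (Pr12, Pr13, Pr23) are left out of this sketch.

```python
def select_output(out1, out2, e12, e13):
    """2 x 1 multiplexer driven by select = E12 AND E13.

    select is True only when module I disagrees with both other modules,
    in which case module II's output is used; otherwise module I's output is used.
    """
    select = e12 and e13
    return out2 if select else out1


# Module I faulty: E12 and E13 are both active, so module II is selected.
case_module1 = select_output(out1=9, out2=5, e12=True, e13=True)

# Module II faulty: only E12 (and E23) is active, so module I is selected.
case_module2 = select_output(out1=5, out2=9, e12=True, e13=False)

# Comparator Cm13 faulty: E13 alone is active, so module I is still selected.
case_cm13 = select_output(out1=5, out2=5, e12=False, e13=True)
```

In all three cases the selected value is the fault-free one (5 in this example), which is the behaviour the specification above asks of the selector.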
SMERTMR CONTROLLER
The state diagram of the SMERTMR controller, shown in figure.3, includes four states: normal mode, comparison mode,
recovery mode and unrecoverable mode. Initially, when there are no faulty modules, the system is in the normal
mode of operation. When the voter detects an error, the controller switches to the comparison and recovery modes
of operation, in which the internal states are compared and the faulty module is located. The state of the faulty
module is then replaced by the correct state of a fault-free module. After the recovery process completes
successfully, the system resumes normal operation. If the comparison process fails to locate the faults, or the
recovery process does not recover them, the system enters the unrecoverable condition.
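The four controller states can be sketched as a small transition function in Python. This is a simplified illustration rather than the controller's actual logic: the input names `error`, `located` and `recovered` are hypothetical, each call stands for the completion of one mode rather than one clock cycle, and the return from recovery mode to normal mode on success is an assumption based on the description above.

```python
from enum import Enum


class Mode(Enum):
    NORMAL = "normal"
    COMPARISON = "comparison"
    RECOVERY = "recovery"
    UNRECOVERABLE = "unrecoverable"


def next_mode(mode, error=False, located=False, recovered=False):
    """Return the controller mode that follows `mode` for the given inputs."""
    if mode is Mode.NORMAL:
        # The voter's error signal moves the controller out of normal mode.
        return Mode.COMPARISON if error else Mode.NORMAL
    if mode is Mode.COMPARISON:
        # Recovery is only possible if the faulty modules were located.
        return Mode.RECOVERY if located else Mode.UNRECOVERABLE
    if mode is Mode.RECOVERY:
        # Assumed: a successful recovery resumes normal operation.
        return Mode.NORMAL if recovered else Mode.UNRECOVERABLE
    # The unrecoverable condition is terminal in this sketch.
    return Mode.UNRECOVERABLE


# Walk through one successful detect, locate and recover sequence.
m = Mode.NORMAL
m = next_mode(m, error=True)
m = next_mode(m, located=True)
m = next_mode(m, recovered=True)
```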
flip-flops in the faulty module. After the comparison mode completes, the Fault Locator Unit
(FLU) identifies the faulty modules and stores their numbers in the Faulty Module Register (FMR).
As shown in figure.4, the counters unit consists of three counters, namely counter12, counter13
and counter23, which store the number of mismatches between pairs of TMR modules. The controller
architecture uses XOR gates to compare the internal states of the TMR modules. In the recovery
process, the state of the faulty module is overwritten through the scan chain with the state of
a fault-free redundant module. In comparison mode the counters count up: whenever a mismatch is
detected, the corresponding counter is incremented by one. In recovery mode the counters count
down: whenever a mismatch between modules is detected, the corresponding counter is decremented
by one. For example, if module I is faulty, the corresponding counters (counter12 and counter13)
are incremented in comparison mode and decremented in recovery mode. All counters should hold
zero at the end of the recovery mode. If any counter is non-zero, the SMERTMR system enters the
unrecoverable condition.
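The counter mechanism can be illustrated with a small behavioural sketch in Python. It is deliberately simplified and rests on several assumptions: each module's scan chain is modelled as a list of bits, only the single-faulty-module case is handled, and the location rule in `locate_single_faulty` (the faulty module is the one named by both non-zero counters) is inferred from the counter12/counter13 example above rather than taken from the FLU design. All function names are hypothetical.

```python
def compare_pass(s1, s2, s3):
    """Comparison mode: XOR scan-out bits pairwise and count mismatches (up counting)."""
    counters = {"12": 0, "13": 0, "23": 0}
    for b1, b2, b3 in zip(s1, s2, s3):
        counters["12"] += b1 ^ b2
        counters["13"] += b1 ^ b3
        counters["23"] += b2 ^ b3
    return counters


def locate_single_faulty(counters):
    """Assumed rule: the module shared by the two non-zero counters is faulty."""
    nonzero = frozenset(k for k, v in counters.items() if v)
    rule = {
        frozenset({"12", "13"}): 1,
        frozenset({"12", "23"}): 2,
        frozenset({"13", "23"}): 3,
    }
    return rule.get(nonzero)  # None if the pattern is not a single-module fault


def recovery_pass(states, faulty, counters):
    """Recovery mode: count down on each mismatch and copy a good state over the faulty one."""
    good = 2 if faulty == 1 else 1  # any module other than the faulty one
    restored = []
    for bits in zip(states[1], states[2], states[3]):
        b1, b2, b3 = bits
        counters["12"] -= b1 ^ b2
        counters["13"] -= b1 ^ b3
        counters["23"] -= b2 ^ b3
        restored.append(bits[good - 1])
    states[faulty] = restored
    # Recovery is considered successful only if every counter returned to zero.
    return all(v == 0 for v in counters.values())


# Module I holds one flipped bit (position 2) relative to modules II and III.
states = {1: [1, 0, 1, 1], 2: [1, 0, 0, 1], 3: [1, 0, 0, 1]}
counters = compare_pass(states[1], states[2], states[3])
faulty = locate_single_faulty(counters)
ok = recovery_pass(states, faulty, counters)
```

In this example counter12 and counter13 each reach one during comparison and return to zero during recovery, so `ok` is true and module I ends up with the same state as module II. If the states changed between the two passes, the counters would not cancel and the sketch would report a failed recovery.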
SIMULATION RESULTS
Figure.7. Simulation for one faulty module
Figure.8. Simulation for two faulty modules
CONCLUSION
In this paper, we designed a system that detects and recovers from multiple errors in TMR
systems. In the proposed technique, we designed a voter circuit and a combined comparison and
recovery unit for the SMERTMR controller, which reduces the area used. Because the comparison
and recovery modes operate at the same time, the time delay and the power consumption are also
reduced.
REFERENCES