Design of Cost Effective Scan-Chain Based Multiple Errors Recovery in TMR Systems

ASHRAYA K.N
PG Scholar, Dept. of ECE, Don Bosco Institute of Technology, Bangalore, Karnataka, India

SHIVANANDA N.T
Associate Professor, Dept. of ECE, Don Bosco Institute of Technology, Bangalore, Karnataka, India

ABSTRACT: In this paper, we present a scan-chain-based roll-forward recovery technique for
safety-critical applications that use triple modular redundancy (TMR), called scan-chain-based
multiple errors recovery in TMR (SMERTMR). The technique uses the scan-chain flip-flops to detect
and correct single or multiple transient faults and latent faults in the faulty modules. Transient
faults are temporary faults. A latent fault is a fault that does not propagate to the system outputs
but causes a mismatch between the states of the TMR modules; latent faults are detected by
comparing the internal states of the TMR modules. When a mismatch is detected between the module
outputs, the SMERTMR system locates the faulty modules and corrects them by copying the state of a
non-faulty redundant module into them. Comparison of the states and recovery of the erroneous
modules are performed by the SMERTMR controller, which combines the comparison and recovery
modes. We modified the existing method to reduce area and power consumption and to improve
performance. The proposed system is designed in Verilog and simulated with the Xilinx ISE
simulator.

KEYWORDS: Roll-forward error recovery, scan-chain TMR (ScTMR), scan-chain multiple errors
recovery TMR (SMERTMR), triple modular redundancy (TMR), fault tolerance.

INTRODUCTION

In today's world, embedded processors are widely used in many applications. Of particular concern
are safety-critical applications such as industrial process control, avionics in space exploration,
and patient life-support monitoring in the medical field. To meet their reliability requirements,
the embedded processor should be equipped with appropriate error detection and correction
mechanisms; providing fault tolerance with minimum performance overhead is another important
issue. These systems must satisfy both timing constraints and fault-tolerance requirements.

In safety-critical applications, one of the best-known and most widely used fault-tolerance
techniques is Triple Modular Redundancy (TMR). A traditional TMR system consists of three redundant
modules and a voter. Most TMR-based error recovery techniques proposed so far use a retry or
roll-back mechanism to recover from transient errors. These techniques are not suitable for
tight-deadline applications, because once an error is detected the faulty module re-executes the
entire process.

A roll-forward recovery mechanism involves no re-computation, so it can be used for tight-deadline
applications. A roll-forward technique for TMR systems has been proposed in S. Yu & E. J.
McClusey (2001). The technique called scan-chain TMR (ScTMR), proposed in M. Ebrahimi, S. G.
Miremadi, & H. Asadi (2001), uses a roll-forward mechanism and scan chains to recover from both
transient and permanent faults, and is suitable for general-purpose circuits. It has two major
drawbacks. First, when latent faults are present, ScTMR is unable to recover even a single faulty
module. Second, ScTMR cannot recover the system if multiple faults occur simultaneously at the
module outputs. SMERTMR extends ScTMR to locate, remove and recover errors in multiple faulty
modules, M. Ebrahimi, S. G. Miremadi, H. Asadi & M. Fazeli (2013). SMERTMR reuses the available
scan chains to compare the internal states of the TMR modules, locate the faulty modules, and
restore the correct state from a fault-free redundant module into them. In this work, SMERTMR is
further modified to improve performance and to reduce area and power consumption.

RELATED WORK

In a traditional TMR system, the voter masks only one faulty module, and that module is not
recovered. The voters proposed in H. Kim & K. G. Shin (1996), K. G. Shin & H. Kim (1989), and P. K.
Chande, A. K. Ramani & P. C. Sharma (1989) use the history of the faulty modules and identify a
permanent error when the number of consecutive faults exceeds a predefined threshold. Transient and
permanent faults can be masked using multiple voters and a disagree detector. The disagree detector
detects a single fault by comparing the values of the different voters in a TMR system, but a
faulty detector may lead to failure. Most of the previous work, H. Kim & K. G. Shin (1996), K. G.
Shin & H. Kim (1989), uses a retry mechanism: when an error is detected, the faulty module
re-executes the entire process. This re-computation may cause the task to complete after its
deadline.

The work in M. Ebrahimi, S. G. Miremadi, H. Asadi & M. Fazeli (2013) shows that multiple errors
can be recovered. It uses a roll-forward recovery mechanism and is therefore well suited to
tight-deadline applications. In the roll-forward recovery method, the faulty module is recovered by
replacing its erroneous state with the correct state of a fault-free redundant module, which avoids
re-computation. The architecture proposed here provides multiple-error recovery with lower power
consumption, improved performance and less area.

SMERTMR ARCHITECTURE

The SMERTMR block diagram is shown in Figure 1. SMERTMR includes 1) three redundant modules; 2) a
voter circuit; and 3) a controller. The proposed voter is an intelligent voter that can detect an
error in a single faulty module. The controller logic plays a very important role because it makes
the decisions.

In this technique, once the voter detects an error, it activates an error signal to alert the
SMERTMR controller. On activation of the error signal, the controller switches from normal
operation to the comparison mode to locate the faulty modules. After locating them, the controller
switches to the recovery mode, in which the state of each faulty module is replaced with the
correct state of one of the fault-free redundant modules. The available scan chains are reused to
compare the internal states of the TMR modules, so the proposed system detects and recovers errors
using the same circuitry. It also handles multiple-error recovery in TMR systems.

Figure.1. SMERTMR block diagram

Proposed Voter
In TMR, all three modules perform the same operation, and their outputs are fed to the voter. When
the voter detects an error, it alerts the controller, which initiates the comparison and recovery
operations. In redundant systems, detecting and correcting faulty modules is a challenging issue
and still an ongoing research topic. A wrong detection, or the inability to find the faulty module,
significantly affects system reliability.

Table 1. Working of the proposed voter

Figure.2. Proposed voter

To address this issue, the voter shown in Figure 2 can identify the faulty module and can also
detect comparator faults. The voter contains a comparison unit made up of three comparators (Cm12,
Cm13 and Cm23), each of which compares the outputs of two TMR modules and signals any mismatch
between them. For example, error signal E12 is activated when a mismatch between output I and
output II is detected. When one TMR module generates an erroneous output (e.g., output II), two
comparators (Cm12 and Cm23) activate their mismatch signals (E12 and E23). If a comparator (e.g.,
Cm13) is faulty, only the corresponding error signal (E13) is activated, and the other error
signals (E12 and E23) remain deactivated.
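
The error-signal patterns described above can be illustrated with a short behavioural sketch in
Python. It is not the authors' Verilog design: the function names are hypothetical, the patterns
for module I, module III, Cm12 and Cm23 are inferred by symmetry from the two cases given in the
text, and the case where all three signals are active is not covered by the text and is simply
reported as undetermined.

```python
def error_signals(out1, out2, out3):
    """Model fault-free comparators Cm12, Cm13 and Cm23 (1 = mismatch)."""
    e12 = int(out1 != out2)
    e13 = int(out1 != out3)
    e23 = int(out2 != out3)
    return e12, e13, e23


def diagnose(e12, e13, e23):
    """Map an (E12, E13, E23) pattern to the suspected faulty unit."""
    patterns = {
        (0, 0, 0): "no fault",
        (1, 1, 0): "module I",         # inferred by symmetry (assumption)
        (1, 0, 1): "module II",        # case stated in the text
        (0, 1, 1): "module III",       # inferred by symmetry (assumption)
        (1, 0, 0): "comparator Cm12",  # inferred by symmetry (assumption)
        (0, 1, 0): "comparator Cm13",  # case stated in the text
        (0, 0, 1): "comparator Cm23",  # inferred by symmetry (assumption)
    }
    # (1, 1, 1) is not described in the text; report it as undetermined.
    return patterns.get((e12, e13, e23), "undetermined")


# Output II is erroneous, so E12 and E23 are activated.
print(diagnose(*error_signals(5, 9, 5)))
# A lone E13 cannot come from consistent module outputs, so Cm13 is suspected.
print(diagnose(0, 1, 0))
```

A single active error signal can only be explained by a faulty comparator, because two fault-free
comparators that both report a match imply that all three outputs are equal.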

To detect permanent faults, the presented voter has three input signals (Pr12, Pr13 and Pr23).
During the normal and recovery processes these three signals are deactivated. An output selector
circuit feeds the error signals E12 and E13 into a logical AND gate to generate the select signal
of a 2 x 1 multiplexer. The ultimate output follows the majority of the module outputs. Table 1 is
used to identify the faulty module or faulty comparator from the error signals and to select the
correct voter output. According to Table 1, when any of the comparators, module II or module III
becomes faulty, the output of module I is selected as the fault-free output. When module I is
faulty, the output of module II is selected as the fault-free output. The output selector circuit
is designed to this specification.
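
The output selector can be sketched behaviourally in Python as follows. This is an illustration
under stated assumptions rather than the authors' circuit: the function name `vote` is
hypothetical, the comparators are modelled as fault-free, and the permanent-fault inputs (Pr12,
Pr13 and Pr23) are treated as deactivated and therefore omitted.

```python
def vote(out1, out2, out3):
    """Select the voter output with an AND gate driving a 2 x 1 multiplexer."""
    # Comparators Cm12 and Cm13; Cm23 is not used by the selector.
    e12 = int(out1 != out2)
    e13 = int(out1 != out3)
    # Module I is suspected only when it disagrees with both other modules.
    select = e12 & e13
    # Multiplexer: select = 1 picks module II, select = 0 picks module I.
    return out2 if select else out1


print(vote(7, 3, 3))  # module I faulty, output of module II is selected
print(vote(3, 7, 3))  # module II faulty, output of module I is selected
print(vote(3, 3, 7))  # module III faulty, output of module I is selected
```

Only E12 and E13 are needed because module I is the default choice; it is overridden only when
both comparators that involve module I report a mismatch.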

SMERTMR CONTROLLER

The state diagram of SMERTMR, shown in Figure 3, includes four states: normal mode, comparison mode,
recovery mode and unrecoverable mode. Initially, when there are no faulty modules, the system is in the normal
mode of operation. When the voter detects an error, the controller switches to the comparison and recovery
modes of operation, in which the internal states are compared, the fault is located, and the state of the faulty
module is replaced by the correct state of a fault-free module. If the comparison process fails to locate the
faults, or the recovery process does not complete successfully, the system enters the unrecoverable
condition.
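
The four-state behaviour can be sketched as a small transition function in Python. The names
`Mode` and `next_mode` and the three boolean inputs are hypothetical, each call stands for the
completion of the current mode, and the return to normal mode after a successful recovery is an
assumption, since the text only states when the unrecoverable condition is entered.

```python
from enum import Enum


class Mode(Enum):
    NORMAL = "normal"
    COMPARISON = "comparison"
    RECOVERY = "recovery"
    UNRECOVERABLE = "unrecoverable"


def next_mode(mode, error=False, located=False, recovered=False):
    """Return the mode that follows `mode` once it has completed."""
    if mode is Mode.NORMAL:
        # The voter's error signal triggers the comparison mode.
        return Mode.COMPARISON if error else Mode.NORMAL
    if mode is Mode.COMPARISON:
        # Recovery is possible only if the faulty modules were located.
        return Mode.RECOVERY if located else Mode.UNRECOVERABLE
    if mode is Mode.RECOVERY:
        # Assumption: a successful recovery resumes normal operation.
        return Mode.NORMAL if recovered else Mode.UNRECOVERABLE
    # The unrecoverable condition is terminal in this sketch.
    return Mode.UNRECOVERABLE


mode = Mode.NORMAL
mode = next_mode(mode, error=True)      # voter reports an error
mode = next_mode(mode, located=True)    # faulty module located
mode = next_mode(mode, recovered=True)  # state restored
print(mode)
```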

Figure.3. SMERTMR state diagram

Figure.4. SMERTMR Controller architecture in comparison and recovery mode

SMERTMR Controller in comparison and recovery mode


The SMERTMR controller is the brain of the system and is responsible for both error detection and
recovery. Figure 4 shows the proposed SMERTMR controller architecture. The controller switches from
normal operation to the comparison mode to locate the faulty modules and, once they are located,
changes to the recovery mode. The proposed controller architecture combines the comparison and
recovery modes. This controller can also work without the voter circuit, and it can locate and
recover multiple faults in TMR systems. In comparison mode, the internal states of all TMR modules
are shifted out through the scan chains and all module pairs (I/II, I/III and II/III) are compared.
During recovery mode, the flip-flop values of a fault-free redundant module are shifted out through
the scan chain and copied to the corresponding flip-flops in the faulty module. After completion of
the comparison mode, the Fault Locator Unit (FLU) determines the faulty modules and stores their
numbers in the Faulty Module Register (FMR).
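
The comparison and recovery steps can be illustrated with a behavioural Python sketch in which each
scan chain is a list of bits. The function names are hypothetical and the locating rule is an
assumption: if exactly one pair of modules agrees, the remaining module is taken to be faulty. How
the FLU resolves the case where no pair agrees is not detailed here, so the sketch reports that
case as unresolved instead of guessing.

```python
from itertools import combinations


def compare_pairs(states):
    """Count bit mismatches for the pairs I/II, I/III and II/III (XOR per bit)."""
    return {
        (a, b): sum(x ^ y for x, y in zip(states[a], states[b]))
        for a, b in combinations(sorted(states), 2)
    }


def locate_faulty(states):
    """Return the faulty module numbers (the FMR content), or None if unresolved."""
    mismatches = compare_pairs(states)
    matching = [pair for pair, count in mismatches.items() if count == 0]
    if len(matching) == 3:
        return []  # all states agree, nothing to recover
    if len(matching) == 1:
        # The module outside the agreeing pair holds the erroneous state.
        return [m for m in sorted(states) if m not in matching[0]]
    return None  # no pair agrees; not resolved by this sketch


def recover(states, faulty):
    """Copy the scan-chain state of a fault-free module into each faulty module."""
    donor = next(m for m in sorted(states) if m not in faulty)
    for m in faulty:
        states[m] = list(states[donor])
    return states


# Module II holds a flipped bit in its third flip-flop.
states = {1: [1, 0, 1, 1], 2: [1, 0, 0, 1], 3: [1, 0, 1, 1]}
fmr = locate_faulty(states)
recover(states, fmr)
print(fmr, states[2])
```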

As shown in Figure 4, the counters unit consists of three counters, counter12, counter13 and
counter23, which store the number of mismatches between the corresponding pairs of TMR modules. The
controller architecture uses XOR gates to compare the internal states of the TMR modules. In the
recovery process, the state of the faulty module is overwritten through the scan chain with the
state of a fault-free redundant module. The counters count up during comparison mode: when a
mismatch is detected, the corresponding counter is incremented by one. They count down during
recovery mode: upon detection of a mismatch between modules, the corresponding counter is
decremented by one. For example, if module I is faulty, the corresponding counters (counter12 and
counter13) are incremented in comparison mode and decremented in recovery mode. All counters should
hold zero at the end of the recovery mode. If any counter is non-zero, the SMERTMR system enters
the unrecoverable condition.
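
The up/down counting check can be sketched in Python as below. This is a behavioural illustration
with hypothetical function names, not the authors' hardware: the scan chains are modelled as bit
lists, one loop iteration stands for one shift cycle, and the recovery pass is assumed to compare
the bits shifted out of the modules before the faulty bit is overwritten.

```python
PAIRS = ((1, 2), (1, 3), (2, 3))  # counter12, counter13, counter23


def comparison_pass(states):
    """Comparison mode: count up once for every mismatching bit of each pair."""
    counters = {pair: 0 for pair in PAIRS}
    for i in range(len(states[1])):
        for a, b in PAIRS:
            if states[a][i] ^ states[b][i]:  # XOR comparison
                counters[(a, b)] += 1
    return counters


def recovery_pass(states, counters, faulty, donor):
    """Recovery mode: count down on each mismatch while copying donor bits."""
    for i in range(len(states[donor])):
        for a, b in PAIRS:
            if states[a][i] ^ states[b][i]:
                counters[(a, b)] -= 1
        states[faulty][i] = states[donor][i]  # shift the correct bit in
    # Recovery succeeds only if every counter is back at zero.
    return all(value == 0 for value in counters.values())


# Module I is faulty in two flip-flops.
states = {1: [0, 1, 1, 0], 2: [1, 1, 0, 0], 3: [1, 1, 0, 0]}
counters = comparison_pass(states)
ok = recovery_pass(states, counters, faulty=1, donor=2)
print(ok, states[1])
```

If a new fault changes any module state between the two passes, the mismatches seen in recovery
mode no longer cancel those counted in comparison mode, some counter ends non-zero, and the sketch
returns False, which corresponds to the unrecoverable condition.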

SIMULATION RESULTS

Proposed Voter Simulation Results


Figure 5 shows the simulation results of the proposed voter. The signal ULTO is the ultimate
output, and MI, MII and MIII are the outputs of the three redundant modules. The design is checked
for all input combinations shown in Table 1. If a mismatch is found, the voter selects the majority
of the module outputs and activates the corresponding error signals among E12, E13 and E23.

Figure.5. Simulation for Voter results

Figure.6. Area and Time Comparison

SMERTMR Simulation Results for Multiple Error Recovery


The area and time consumption of the proposed design are compared with those of the existing method
in Figure 6. Figure 7 and Figure 8 show the simulation results for one and two faulty modules,
respectively. The faults are injected using a fault injection technique.

Figure.7. Simulation for one faulty module

Figure.8. Simulation for two faulty modules

CONCLUSION

In this paper, we designed a system to detect and recover from multiple errors in TMR systems. In
the proposed technique, we designed a voter circuit and a SMERTMR controller that combines the
comparison mode and the recovery mode in a single unit, which reduces the area used. Because the
comparison and recovery modes operate at the same time, the time delay and the power consumption
are also reduced.

REFERENCES

M. Ebrahimi, S. G. Miremadi, H. Asadi & M. Fazeli (2013) Low-Cost Scan-Chain-Based Technique to
Recover Multiple Errors in TMR Systems, IEEE Trans. VLSI Systems, vol. 21, no. 8, pp. 1454-1468.
H. Kim & K. G. Shin (1996) Design and analysis of an optimal instruction retry policy for
TMR controller computers, IEEE Trans. Comput., vol. 45, no. 11, pp. 1217-1225.
S. Yu & E. J. McClusey (2001) On-line testing and recovery in TMR systems for real-time
applications, in Proc. Int. Test. Conf., pp. 240-249.
K. G. Shin & H. Kim (1989) A time redundancy approach to TMR failures using fault-state
likelihoods, IEEE Trans. Ind. Electron., vol. 43, no. 10, pp. 1151-1162.
P. K. Chande, A. K. Ramani & P. C. Sharma (1989) Modular TMR multiprocessor system,
IEEE Trans. Electron., vol. 36, no. 1, pp. 34-41.
M. Ebrahimi, S. G. Miremadi & H. Asadi (2001) ScTMR: A Scan chain based error recovery
technique for TMR systems in safety critical applications, in Proc. Design Autom. Test Eur.
Conf. Exhibit., pp. 1-4.
Parag K. Lala, Fault tolerant and fault testable hardware design (textbook).
