High Availability

Designing a Control System for High Availability

Art Pietrzyk, TUV FSExp, Rockwell Automation

Brian Root, Redundancy Marketing Manager, Process Initiative, Rockwell Automation
Paul Gruhn, P.E., CFSE, Training Manager, ICS Triplex
When hearing the term "high availability," many engineers think of redundancy as the
only method for achieving higher availability. However, redundancy increases the
number of components, which increases the number of potential component failures.
Therefore, redundancy, if not applied properly, can actually decrease system availability.
So, should redundancy remain top-of-mind or should alternate methods be considered?
This paper will discuss redundant and non-redundant methods for achieving high
availability of control systems, as well as improvements in control technology and
recommended control system designs. The paper will also highlight features within the
Rockwell Automation Integrated Architecture™ platform and ICS Triplex product lines
that can help achieve higher availability.
Please note that this paper focuses on high availability for control systems. This paper
does not address safety system concepts or design philosophies.
What is High Availability?
At the most basic level, availability can be defined as the probability that a system is
operating successfully when needed. Availability is often expressed as a percent.
Expressed mathematically, availability is one minus the unavailability.
Availability (A) is calculated using the formula A = MTBF / (MTBF + MDT), where
MTBF is Mean Time Between Failure and MDT is Mean Down Time. MDT is often
assumed to be the same as MTTR, the Mean Time to Repair. MTTF, Mean Time To
Failure, is often considered interchangeable with MTBF, although there are subtle
differences. Another common term in the field of reliability engineering is failure rate (λ),
which is expressed as 1/MTBF.
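
The availability and failure-rate formulas above can be worked through numerically. The following is a minimal Python sketch; the function names and the example figures (50,000 hours MTBF, 8 hours MDT) are assumptions chosen only for illustration and are not data for any product.

```python
HOURS_PER_YEAR = 8760  # 365 days x 24 hours


def availability(mtbf_hours: float, mdt_hours: float) -> float:
    """A = MTBF / (MTBF + MDT), as defined in the text."""
    if mtbf_hours <= 0 or mdt_hours < 0:
        raise ValueError("MTBF must be positive and MDT must not be negative")
    return mtbf_hours / (mtbf_hours + mdt_hours)


def failure_rate(mtbf_hours: float) -> float:
    """Failure rate (lambda) = 1 / MTBF, in failures per hour."""
    if mtbf_hours <= 0:
        raise ValueError("MTBF must be positive")
    return 1.0 / mtbf_hours


def expected_downtime_per_year(a: float) -> float:
    """Unavailability (1 - A) scaled to hours of downtime per year."""
    return (1.0 - a) * HOURS_PER_YEAR


if __name__ == "__main__":
    # Hypothetical figures, used only to exercise the formulas.
    a = availability(mtbf_hours=50_000, mdt_hours=8)
    print(f"availability          : {a:.6f}")
    print(f"failure rate (per h)  : {failure_rate(50_000):.2e}")
    print(f"downtime (h per year) : {expected_downtime_per_year(a):.2f}")
```

Because MDT appears in the denominator, halving the time needed to repair improves availability in much the same way as doubling MTBF, which is why the paper treats maintainability alongside reliability.
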
The term high availability has been used to encompass all things related to productivity,
including reliability and maintainability. So let's take a closer look at these terms as well.
Reliability can be defined as the likelihood that a device will perform its intended
function during a specific period of time (often called the mission time). It is a measure
of system success over a time interval.
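
Reliability over a mission time can be illustrated with a short calculation. The sketch below assumes a constant failure rate (the exponential model, R(t) = exp(-t / MTBF)); that model is a common simplification and an assumption here, not something the paper prescribes, and the function name is hypothetical.

```python
import math


def reliability(mission_hours: float, mtbf_hours: float) -> float:
    """Probability of surviving the mission time without a failure.

    Assumes a constant failure rate (lambda = 1 / MTBF), so that
    R(t) = exp(-lambda * t). Real devices may not follow this model.
    """
    if mission_hours < 0 or mtbf_hours <= 0:
        raise ValueError("mission time must be >= 0 and MTBF must be > 0")
    return math.exp(-mission_hours / mtbf_hours)


if __name__ == "__main__":
    # Hypothetical device with an MTBF of 100,000 hours.
    for years in (1, 5, 10):
        hours = years * 8760
        print(f"{years:>2} year mission: R = {reliability(hours, 100_000):.3f}")
```

Under this assumption, a device whose mission time equals its MTBF has only about a 37 percent chance of completing the mission without a failure, which shows why MTBF alone says little about long missions.
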
To help make sure that products meet customer expectations, reliability can be designed
using techniques such as Component Derating and Design through Six Sigma.
Diagnostics are necessary in order to detect faults and alert personnel when faulty
hardware needs to be replaced. This helps achieve high availability. The level of
diagnostics for the Allen-Bradley® ControlLogix® programmable automation controller
(PAC) from Rockwell Automation exceeds 90 percent, meaning most failures can be
detected and appropriate actions taken. ICS Triplex Trusted™ and AADvance systems
have diagnostic coverage in the range of 99 percent.
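
The practical meaning of a diagnostic coverage figure can be shown by splitting a failure rate into its detected and undetected parts. This is a minimal sketch: the function name and the total failure rate are hypothetical, and only the 90 and 99 percent coverage levels come from the text.

```python
def split_failure_rate(total_rate: float, coverage: float) -> tuple[float, float]:
    """Split a failure rate into (detected, undetected) parts.

    coverage is the diagnostic coverage as a fraction between 0 and 1.
    """
    if not 0.0 <= coverage <= 1.0:
        raise ValueError("coverage must be between 0 and 1")
    detected = total_rate * coverage
    undetected = total_rate * (1.0 - coverage)
    return detected, undetected


if __name__ == "__main__":
    total = 1e-5  # hypothetical failures per hour
    for coverage in (0.90, 0.99):
        detected, undetected = split_failure_rate(total, coverage)
        print(f"coverage {coverage:.0%}: detected {detected:.2e}/h, "
              f"undetected {undetected:.2e}/h")
```

Moving from 90 to 99 percent coverage reduces the undetected portion by a factor of about ten, even though the headline numbers look close.
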
But even the most robust and reliable system may not be the most available. To be
available, a system must also be easy to troubleshoot, modify and repair during the
mission time, which may exceed a decade.
The Impact of Maintainability on Availability
Maintainability is the ability of a system to be changed or repaired.
Factors that affect maintainability include:
• System and component-level diagnostics for detecting and isolating failures
• Annunciation of faults
• Tools for troubleshooting
• Trained personnel
• Time to replace or repair
• Ability to add components or make changes
Maintainability of a system significantly impacts the end user's perception of availability.
For example, today's automobiles have diagnostic features that can improve availability
(e.g. tires that heal and/or monitor pressure, electronic ignitions, diagnostic message
displays that indicate a malfunctioning system and the ability to call for roadside
assistance).
Some of the features within Rockwell Automation products that help improve
maintainability include things such as system-level diagnostics, wire-off detection,
auto-tuning I/O and deterministic communications. To assist with troubleshooting, onboard
LED indicators, network monitoring tools, graphical programming languages and HMI
displays help quickly identify and remedy problems.
Key to keeping a system maintained is to make sure there are qualified and trained
personnel. Less obvious, but just as important, are physical characteristics that affect
maintainability. Modules or components should be capable of being removed, replaced or
added to the system without interrupting the mission. Replacements should not need
rewiring or reprogramming.
Features like online edits, partial downloads, adding I/O online, and removing and
inserting modules under power help make the ControlLogix PAC and the ICS Triplex
Trusted and AADvance systems more available. The ability to add tags to the HMI online
also helps improve the availability of the HMI and information layers.
Control technology features that have improved maintainability include:
• Adding or removing modules under power
• Adding I/O online
• Online edits and partial downloads
• Soft switching of processor's producer/consumer communication
• Internal diagnostics to detect failures
• Diagnostics of field circuit problems: open circuit, short circuit, etc.
• Configurable fault response: hold last state or turn off
• HART and other fieldbus technology with sensor and actuator diagnostics
• Self-learning or inherent machine diagnostics
• Adding sensors, I/O and tags online without interruption
In today's world of increasingly complex semiconductor and software-based devices, it
can be more challenging to predict failures. It is not a question of if a failure is possible,
but how often it can be detected and whether the mission can be completed.
HART and other fieldbus devices communicate with more intelligent sensors, instruments
and actuators, and provide their own level of device diagnostics. This diagnostic
functionality, along with additional process data that these devices provide, is married to
software that benefits users with upfront alarms, calibration and model information for
easier replacement and inventory management, making sure the parts are replaced
quickly and correctly.
Other innovations, such as state-based control and self-learning diagnostic routines, have
raised the ability of the controller to detect, annunciate and describe problems within the
machinery. For many users, the ability to maintain and revise the system without shutting
down offers an acceptable level of availability, especially if the change or repair can be
made in minutes.
Achieving High Availability Through Redundancy and Fault Tolerance
Customers or critical applications that cannot tolerate impact to the mission may find
redundancy or fault tolerance necessary.
Redundant components needed for high availability include:
• Uninterruptible power supply (UPS)
• Redundant power supplies
• Redundant components
  o Chassis
  o Processors
  o I/O modules
  o Sensors and actuators
  o PCs/HMI
  o Networks
  o Media
  o Servers
  o Databases
Some form of redundancy or fault tolerance is generally used if a control system
shutdown or loss of visibility causes a major loss of revenue, loss of equipment, injury to
people or a disruption to public services. Redundancy in these situations means the
duplication or triplication of equipment that is needed to operate without disruption, if
and when the primary equipment fails during the mission. Fault tolerance is the ability
of the system to tolerate faults and continue operating properly. There are subtle
differences between redundancy and fault tolerance.
In terms of electrical equipment, the most important place to start for guaranteeing
reliable operation is by providing continuous power. Most power comes from the
electrical power grid to a plant, where at one point in the electrical delivery system a
single transformer supplies an area within a plant. A lightning strike on the power grid
will certainly have an impact. Poor-quality power is sometimes filtered by a plant's
electrical infrastructure, because it can cause unexpected behavior in running
microprocessor-based equipment. Thus, the control system can only be assumed to be as
reliable as the power provided to it. The good news is that there are many suppliers of
quality uninterruptible power supplies (UPSs) that can help provide constant power of
acceptable quality. The key is to attach the output power of the UPS to the primary
controller, which filters surges and minimizes recovery of the system when power is
re-established by the plant.
Rockwell Automation systems are available with redundant power supplies and will work
with DC and AC voltage options to accommodate plant, battery and UPS supplied power.
This is critical when powering the primary power supply with a UPS and isolating the
second redundant power supply with standard plant line power. Redundant power
supplies can be installed on both the redundant controller pairs and in a remote
ControlLogix I/O chassis.
In addition to the power considerations, redundant components may be required to
provide or maintain control in the event of failure. The ControlLogix system is an
example of a platform that provides this assurance. This system provides redundant
power supplies, controllers and network modules (both ControlNet and EtherNet/IP
versions) that reside in a separate chassis. This separation and duplication provide for a
more complete and maintainable controller redundancy.
AADvance systems can be fault tolerant and utilize dual or triple redundancy of
processors and I/O modules. Trusted systems utilize triple modular redundancy (TMR)
for the highest level of fault tolerance.
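
The effect of duplicating equipment on availability can be estimated with the standard series and parallel combination rules. The sketch below is illustrative only: it assumes that component failures are independent and that the switchover itself is perfect unless modeled explicitly. Neither assumption comes from this paper, and the availability figures are made up.

```python
from math import prod


def series_availability(parts: list[float]) -> float:
    """All parts are required: the system is up only if every part is up."""
    return prod(parts)


def parallel_availability(parts: list[float]) -> float:
    """Any one part is sufficient: the system is down only if all are down."""
    return 1.0 - prod(1.0 - a for a in parts)


if __name__ == "__main__":
    controller = 0.99   # hypothetical availability of one controller
    changeover = 0.98   # hypothetical shared changeover element

    single = controller
    redundant_pair = parallel_availability([controller, controller])
    # A shared element that both controllers depend on sits in series
    # with the pair and can cancel the benefit of the duplication.
    pair_with_changeover = series_availability([redundant_pair, changeover])

    print(f"single controller     : {single:.6f}")
    print(f"redundant pair        : {redundant_pair:.6f}")
    print(f"pair + shared element : {pair_with_changeover:.6f}")
```

With these made-up numbers the duplicated pair alone comes out near 0.9999, yet a 0.98 changeover element in series pulls the total below the 0.99 of a single controller. This is one way in which redundancy that is not applied properly can reduce availability.
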
All Rockwell Automation systems employing redundancy or fault tolerance only require
a single configuration. The only additional setup needed with the ControlLogix system,
other than the hardware, is to check the redundant controller parameter box in the
Rockwell Software® RSLogix™ 5000 programming software. All tag data and I/O can
be updated between the primary and secondary controller at a user-defined rate.
In the event of a failure of one of the modules in the primary's chassis, the system
switches control and HMI function with a bumpless transfer on the control to the
secondary chassis. When the faulty module is replaced in the disqualified secondary
chassis, the system re-synchronizes automatically without any extra operator action.
Redundant controllers can be migrated to the next release without shutting down control,
which answers one of the most significant concerns for an end user.

HMI availability is mission-critical because most customers require some form
of HMI to provide visibility into the process or equipment 100 percent of the time.
Typically, the network of choice for the information layer to an HMI, data historian or
MES system is Ethernet, but if redundant media is specified, ControlNet is the best
Redundant Field Devices
When it comes to determining if redundant I/O modules are required for achieving higher
availability, sensors and end devices must be addressed as well. Most system designers
know that the reliability and diagnostics of sensors and actuators are an order of
magnitude lower than those of the logic solver. They often implement redundancy on the
input side by monitoring the same process variable with two separate sensors that are
wired to two separate I/O modules. A comparison is made in the logic to determine if
there is a mismatch. Maintenance can perform a check to isolate the anomaly, or automatic
testing schemes can be implemented to isolate a problem to the sensor, wire or module.
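
The comparison of two redundant inputs described above can be sketched in a few lines. In practice this check would live in the controller's application logic; Python is used here only for illustration, and the function name and tolerance values are hypothetical.

```python
def inputs_agree(value_a: float, value_b: float, tolerance: float = 0.0) -> bool:
    """Return True when two redundant readings agree within a tolerance.

    Discrete inputs can be passed as booleans with the default tolerance
    of 0; analog inputs need a tolerance that allows for normal sensor
    and conversion differences.
    """
    return abs(value_a - value_b) <= tolerance


if __name__ == "__main__":
    # Two pressure transmitters on the same process variable (made-up values).
    print(inputs_agree(50.0, 50.4, tolerance=0.5))   # within tolerance
    print(inputs_agree(50.0, 51.0, tolerance=0.5))   # mismatch to investigate
    # Two discrete inputs wired to separate modules.
    print(inputs_agree(True, False))
```

A comparison of this kind only reports that the two readings differ; deciding which sensor, wire or module is at fault still requires a maintenance check or an automatic test scheme.
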
Redundancy With the ControlLogix System
If redundant I/O modules are desired to achieve availability and maintainability, then
several methods utilizing terminations are possible. Each method and associated
terminations offer various degrees of diagnostics, ease of use and costs. The levels
discussed in this paper are based on the following:
1) Allen-Bradley Bulletin 1492™ terminal blocks and termination boards

In Figure 1, simple termination blocks are used to simplify the wiring from a single
sensor to two input modules. Two separate modules should be used so that if one fails,
the other can still provide input data while the failed module is replaced. For both discrete
and analog inputs, no diode isolation is required. The analog signal from a sensor must be
converted to a voltage so it can be read by two input modules in parallel.
Figure 1.

Terminations with blocking diodes can be used for discrete and analog outputs. The
diodes isolate the final output drive devices so that a failure to OFF or GND does not
bring the other output low. Note: The polarity of the diodes will be based on whether the
modules are sinking or sourcing. Most outputs are sourcing so the P side of the diode
should normally go to the output (see Figure 2).
Figure 2.
There are some obvious limitations to this simple form of redundancy. First of all, outputs
which fail ON are not isolated and will remain on unless an alternative method for
removing power is designed into the system. The failure modes covered are those outputs
which turn off, and the second output can continue to provide power.
There are no diagnostics to detect if an output has failed unless a provision, such as
monitoring outputs with inputs, is included. Inputs can be compared for agreement in
ladder logic (see Figure 3). If a miscompare is detected, the user must check the
input states and voltages at the termination panel to isolate the failure.
Figure 3. Application logic can compare input values or states for concurrence.
The user program must also contain rungs to annunciate a fault in the event of a sustained
miscompare between two points (see Figure 4).
Figure 4.
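
The idea of annunciating only a sustained miscompare can be illustrated outside of ladder logic. The sketch below is a rough Python analogue of a comparison followed by an on-delay timer and a latched fault bit; it is not the logic shown in the figures, and the class name, tolerance and delay are hypothetical.

```python
class MiscompareMonitor:
    """Latch a fault when two inputs disagree for longer than a set delay."""

    def __init__(self, tolerance: float, delay_s: float) -> None:
        self.tolerance = tolerance
        self.delay_s = delay_s
        self.elapsed_s = 0.0   # time the inputs have disagreed continuously
        self.fault = False     # latched until reset() is called

    def scan(self, value_a: float, value_b: float, scan_time_s: float) -> bool:
        """Call once per program scan; returns the latched fault state."""
        if abs(value_a - value_b) > self.tolerance:
            self.elapsed_s += scan_time_s
        else:
            self.elapsed_s = 0.0   # agreement restarts the delay
        if self.elapsed_s >= self.delay_s:
            self.fault = True
        return self.fault

    def reset(self) -> None:
        """Operator acknowledgement after the cause has been repaired."""
        self.elapsed_s = 0.0
        self.fault = False


if __name__ == "__main__":
    monitor = MiscompareMonitor(tolerance=0.5, delay_s=2.0)
    readings = [(10.0, 12.0), (10.0, 10.0), (10.0, 12.0), (10.0, 12.0)]
    for a, b in readings:
        print(monitor.scan(a, b, scan_time_s=1.0))
```

The delay keeps brief differences, such as two modules sampling a changing signal at slightly different moments, from raising nuisance alarms, while the latch makes sure a genuine fault stays visible until someone acts on it.
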
The next example of a form of I/O redundancy is through the use of complex
terminations, which have on-board circuitry and are able to provide more diagnostics and
other functionality. Some of the features available include: over-voltage protection, fuses,
and LEDs for troubleshooting. Figure 5 shows a restricted form of redundancy for
achieving fault tolerance of I/O for a SIL-rated ControlLogix system.

Figure 5. Fault Tolerant ControlLogix System (labels: Redundancy Module, DC Input Board, Output Board, I/O Chassis A, I/O Chassis B)
Fault Tolerance With AADvance
The AADvance system, supplied by ICS Triplex, a Rockwell Automation company, can
have individual portions configured simplex, dual or triplicated. Different levels of fault
tolerance can be provided depending upon the user's requirements. A 1oo2D arrangement
(1 out of 2 with diagnostics) is fault tolerant and can survive single module failures. In a
dual configuration, if a single module were to fail, the configuration degrades to simplex.
If the last module were to fail, the system would shut down. This is referred to as a 2-1-0
degradation mode.
Figure 6. Redundant (1oo2D) AADvance System
AADvance can also have portions configured in a triple modular redundant (TMR)
architecture for greater fault tolerance. In a triplicated configuration, when a single
module fails, the configuration degrades to dual. If another module were to fail before the
first failure is repaired, the configuration degrades to single. If the last module were to
fail, the system would shut down. This is referred to as a 3-2-1-0 degradation mode.
Fault Tolerance With Trusted System
The ICS Triplex Trusted system utilizes a triple modular redundant (TMR) architecture.
Triplication eliminates the possibility of any single component failure causing a spurious
or false trip. This achieves the highest level of availability. High levels of internal
diagnostics, as well as errors detected through discrepancies, allow the system to continue
running in the presence of faults and annunciate faults for operator action. Modules can
be easily replaced online without affecting the process.
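
The principle behind triplication, that a single faulty channel is outvoted and flagged through its discrepancy, can be shown with a small two-out-of-three voter. This is a software illustration of the general TMR idea only and does not represent how the Trusted system implements its voting; the function name and channel labels are hypothetical.

```python
from collections import Counter


def vote_2oo3(a, b, c):
    """Return (voted_value, discrepant_channels) for three channel values.

    The voted value is the one reported by at least two channels. Channels
    that disagree with it are listed so that they can be annunciated.
    Raises ValueError when no two channels agree.
    """
    channels = {"A": a, "B": b, "C": c}
    value, count = Counter(channels.values()).most_common(1)[0]
    if count < 2:
        raise ValueError("no majority: all three channels disagree")
    discrepant = [name for name, v in channels.items() if v != value]
    return value, discrepant


if __name__ == "__main__":
    print(vote_2oo3(True, True, True))    # healthy: no discrepancy
    print(vote_2oo3(True, True, False))   # channel C is outvoted and flagged
```

The process continues on the majority value while the flagged channel is repaired, which is how a single component failure is kept from causing a spurious trip.
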
A difference in the triplication between AADvance and Trusted systems is that the
Trusted majority votes data in hardware. Therefore, it is not possible to run on a single
"slice". This is referred to as a 3-2-0 degradation mode. If spare modules are installed, the
degradation mode becomes 3-3-2-0.
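
The degradation modes named in this paper (2-1-0, 3-2-1-0, 3-2-0 and 3-3-2-0) can be reproduced with a simple counting model. The sketch assumes that failures occur one at a time with no repair in between, and that an installed spare takes over for the first failed module; the function and parameter names are hypothetical.

```python
def degradation_path(installed: int, min_to_run: int = 1, spares: int = 0) -> list[int]:
    """List the number of working modules after each successive failure.

    installed  -- modules normally in service (2 for dual, 3 for TMR)
    min_to_run -- fewest modules the system can run on; below this it shuts down
    spares     -- spare modules that take over without lowering the level
    """
    healthy = installed
    path = [healthy]
    while healthy > 0:
        if spares > 0:
            spares -= 1        # a spare takes over, so the level is kept
        else:
            healthy -= 1       # one fewer working module
        if healthy < min_to_run:
            healthy = 0        # too few modules left: the system shuts down
        path.append(healthy)
    return path


if __name__ == "__main__":
    print(degradation_path(2))                          # dual
    print(degradation_path(3))                          # triplicated, can run on one
    print(degradation_path(3, min_to_run=2))            # majority vote needs two
    print(degradation_path(3, min_to_run=2, spares=1))  # same, with one spare
```

Setting min_to_run to 2 captures the point made above: a system that votes by majority in hardware cannot continue on a single slice, so it goes from two working modules straight to shutdown.
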
The Trusted TMR system provides:
• The highest level of internal diagnostics
• Tolerance to multiple failures
• No time repair restrictions
• Reduced operating system size and complexity
Achieving High Network and HMI Availability Through Design
Many times, customers can achieve an acceptable level of availability through design,
which includes the controller, HMI and information system. The designer must be willing
to accept the fact that anything can fail and design the facility around this notion.
The equipment or plant can be designed to continue running if a machine were to fail.
This has often been referred to as modular distributed design, and involves the following:
• Distributed control architectures
• Distributed control design with independent lines, zones, etc.
• Distributed HMI
• Distributed databases
In continuous and batch processing operations, following the S88 model helps achieve
availability by allowing recipes and procedures to be ported to various equipment, lines
and plants. Human interface into a process or operation is not only crucial, but in many
cases it is an absolute requirement that has been achieved over the years using hardwired
indicator lights and manual controls.
Today, these are being replaced with the more cost-effective CRT-based HMI. Each
technology has its positives and negatives. The negative aspect of electronic, and
especially PC-based, HMI is that it is a single and complex device. Microsoft-based
displays have the additional challenges of being open and subject to security, virus and
other related issues.
Since most process customers cannot see the machinery and product, which is spread out
over large areas and in closed pipes and tanks, visibility is crucial. Most customers use
one of several different methods to achieve visibility and most, if not all, include some
form of redundancy. One typical method is to design using diversity. This might mean
that diverse HMI devices, such as an Allen-Bradley PanelView™ terminal with FactoryTalk®
View HMI software or a CRT-based HMI, are used in conjunction with local indicator
lights. Another method is the use of client-server configurations, with multiple clients
sometimes linked to redundant servers.
If data is critical, then provisions should be made to protect it from loss, for example if
a single server goes down. One of the most economical methods is to keep some of the
more recent data stored in the controllers. If a network is determined to be a weak link,
then redundant or fault tolerant communications should be utilized. This could include
Ethernet rings, with or without redundant media. Networks can be configured with
redundant paths using switches and/or routers.
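
Keeping recent data close to its source, as suggested above, amounts to a bounded buffer that is emptied once the server is reachable again. The sketch below shows the idea in Python for illustration only; a controller would hold such data in its own tag memory, and the class name and capacity are hypothetical.

```python
from collections import deque


class RecentDataBuffer:
    """Hold the most recent samples until a server can collect them."""

    def __init__(self, capacity: int) -> None:
        # With maxlen set, the oldest sample is discarded once the buffer is full.
        self._samples = deque(maxlen=capacity)

    def record(self, timestamp: float, value: float) -> None:
        """Store one sample; called whether or not the server is available."""
        self._samples.append((timestamp, value))

    def drain(self) -> list[tuple[float, float]]:
        """Hand over everything stored so far and empty the buffer."""
        samples = list(self._samples)
        self._samples.clear()
        return samples


if __name__ == "__main__":
    buffer = RecentDataBuffer(capacity=3)
    for t in range(5):                 # server outage: samples accumulate
        buffer.record(float(t), 20.0 + t)
    print(buffer.drain())              # server back: the newest 3 samples survive
```

The capacity sets how long a server outage can last before the oldest data is lost, so it should be sized against the sample rate and the expected repair time.
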
Security is also important, as security breaches will impact availability.
This paper explored the redundant and non-redundant methods for achieving high
availability, as well as improvements in control technology and recommended control
system designs. Although redundancy is the traditional method for achieving high
availability, the means to achieving high availability require more than just thinking
about redundant components. A system with no redundant components can still be very
