Download as pdf or txt
Download as pdf or txt
You are on page 1of 25

US010802929B2

( 12) United States Patent ( 10) Patent No .: US 10,802,929 B2


Bailey et al . (45 ) Date of Patent : Oct. 13 , 2020
( 54 ) PARALLEL PROCESSING SYSTEM ( 56 ) References Cited
RUNTIME STATE RELOAD
U.S. PATENT DOCUMENTS
( 71 ) Applicant: Tesla, Inc. , Palo Alto , CA (US ) 4,015,804 A * 4/1977 Dobler B61L 27/0011
246/5
( 72 ) Inventors: Daniel William Bailey , Austin , TX 4,816,989 A * 3/1989 Finn GO6F 9/4881
(US ) ; David Glasco , Austin , TX (US ) 709/248
(Continued )
( 73 ) Assignee : Tesla, Inc. , Palo Alto , CA (US )
FOREIGN PATENT DOCUMENTS
( * ) Notice: Subject to any disclaimer, the term of this WO 2013126066 Al 8/2013
patent is extended or adjusted under 35
U.S.C. 154 ( b ) by 147 days . OTHER PUBLICATIONS
( 21 ) Appl. No .: 15 /979,771 International Searching Authority; International Search Report and
Written Opinion ; International Application No. PCT / IB2018 /
(22) Filed : May 15 , 2018 060136 ; dated Apr. 12 , 2019 ; 11 pages .
( 65 ) Prior Publication Data Primary Examiner Jonathan D Gibson
(74 ) Attorney, Agent, or Firm — Knobbe, Martens, Olson
US 2019/0205218 A1 Jul . 4 , 2019 & Bear, LLP
(57 ) ABSTRACT
Related U.S. Application Data A parallel processing system includes at least three proces
sors operating in parallel, state monitoring circuitry, and
( 60 ) Provisional application No. 62/ 613,306 , filed on Jan. state reload circuitry. The state monitoring circuitry couples
to the at least three parallel processors and is configured to
3 , 2018 .
monitor runtime states of the at least three parallel proces
(51 ) Int. Cl . sors and identify a first processor of the at least three parallel
GOOF 11/00 ( 2006.01 ) processors having at least one runtime state error. The state
G06F 11/14 ( 2006.01) reload circuitry couples to the at least three parallel proces
(Continued )
sors and is configured to select a second processor of the at
least three parallel processors for state reload , access a
( 52 ) U.S. CI. runtime state of the second processor, and load the runtime
CPC GO6F 11/1469 ( 2013.01 ) ; G06F 9/52 state of the second processor into the first processor. Moni
(2013.01 ) ; G06F 11/0724 ( 2013.01 ) ; toring and reload may be performed only on sub - systems of
( Continued ) the at least three parallel processors . During reload , clocks
( 58 ) Field of Classification Search and supply voltages of the processors may be altered . The
CPC G06F 11/2028 state reload may relate to sub - systems.
See application file for complete search history. 18 Claims , 15 Drawing Sheets

154 PARALLEL PROCESSOR 102A

298 PARALLEL PROCESSOR 102B


STATE
MONITORING STATE RELOAD
CIRCUITRY CIRCUITRY
3rd PARALLEL PROCESSOR 102C 106
104

Nth PARALLEL PROCESSOR 102N

100
US 10,802,929 B2
Page 2

(51 ) Int. Ci. 5,062,046 A 10/1991 Sumiyoshi G06F 9/468


GO6F 9/52 ( 2006.01 ) 700/4
G06F 11/267 ( 2006.01 ) 5,464,435 A * 11/1995 Neumann A61N 1/36514
GO6F 11/18 ( 2006.01 ) 607/9
G06F 11/16 ( 2006.01 ) 5,473,771 A * 12/1995 Burd G06F 11/1443
GOOF 11/07 ( 2006.01 ) 709/248
G06F 11/30 ( 2006.01 ) 5,652,833 A * 7/1997 Takizawa GOOF 11/2028
G06F 11/20 ( 2006.01 ) 714/10
( 52) U.S. CI. 10,474,549 B2 * 11/2019 West G06F 11/1658
2006/0107112 A1 * 5/2006 Michaelis G06F 11/1654
CPC GO6F 11/0796 (2013.01 ) ; G06F 11/1629 714/15
(2013.01 ) ; G06F 11/1658 ( 2013.01 ) ; G06F 2006/0107114 A1 * 5/2006 Michaelis G06F 11/0721
11/181 ( 2013.01 ) ; GO6F 11/183 ( 2013.01 ) ; 714/25
G06F 11/2028 (2013.01 ) ; G06F 11/267 2009/0328049 Al 12/2009 Tanaka
(2013.01 ) ; G06F 11/3013 (2013.01 ) ; GOOF 2012/0159235 A1 * 6/2012 Suganthi G06F 11/2028
2201/805 ( 2013.01 ) ; GO6F 2201/82 (2013.01 ) 714 / 4.11
( 56) References Cited 2015/0026532 A1 1/2015 Clouqueur
2016/0132356 A1 * 5/2016 Kozawa G06F 9/50
U.S. PATENT DOCUMENTS 713/100
2018/0285296 A1 * 10/2018 Rota G06F 15/17
4,823,256 A * 4/1989 Bishop G06F 11/1666 2019/0034301 A1 * 1/2019 West G06F 11/1658
700/82 2019/0205218 A1 * 7/2019 Bailey G06F 9/52
4,891,810 A * 1/1990 de Corlieu G06F 11/2041
714/11 * cited by examiner
U.S. Patent Oct. 13 , 2020 Sheet 1 of 15 US 10,802,929 B2

RSETLAOED CIRUTY 106

P1RAO0sCEL2StAOR P21ARO0nCEL2SdBOR P31AROr0CEL2SdCOR PN1ARO0tCEL2ShNOR FIG


1
.

st

STA E MONITRG CIRUTY 104 100


U.S. Patent Oct. 13 , 2020 Sheet 2 of 15 US 10,802,929 B2

CLIMATE CONTRL DEVICE 218 SADENSOR 212D


204B
204A STERING CONTRLE 224

SADENSOR 212F 205


E/NGINE
MOT R CONTRLE 222

BRAKING CONTRLE 226


?

BATERY CONTRLE 220 SADENSOR 212C


2
.
FIG

202
CAONTRDLE
SADENSOR 212E INFOTAME DEVICE 214
200
ADSENSOR 212A
SADENSOR 212B
MEORY 216
U.S. Patent Oct. 13 , 2020 Sheet 3 of 15 US 10,802,929 B2

361

MEORY 354 R3EC6IV0ER


PDAURTIONVLMEGS PRYOCSETINMG 364 MEDIA
1
362
F
/ 3B
.
FIG

GENRAL PROCESING CIRUTY 352 T3RANS5MIT8ER


*****CTEKE

202

311

MEORY 304 R3EC1IV0ER


O,SL(RCORATIMHNDEIOARC ST)EYNSPOER 306 MEDIA
312
F
/
I
FIG
3A
.

PROCESING CIRUTY 302 T3RANS0MIT8ER


*****
300
U.S. Patent Oct. 13 , 2020 Sheet 4 of 15 US 10,802,929 B2

RSETLAOED CIRUTY 406

FEA
4CShc0a3inA P24ARO0nCEL2SdBOR SC4hc0a3inB I.P34AROr0CEL2SdCOR 4CSh0ca3iCn PN4RAO0tCEL2ShNOR CS4hc0a3inN 4
.
FIG
P1"4ARO0CEL2S OALR
st

STA E MONITRG CIRUTY 404 400


U.S. Patent Oct. 13 , 2020 Sheet 5 of 15 US 10,802,929 B2

402A A

PROCES 502AN
SUB-SYSTEM 3 2

PROCES 502A3 CMSOINTRAUEYG


3 3 3 3

SUB-SYSTEM
5MLEO0C6ARLY 13 3 3

404
CRSIETLAUOREDY 406 5
.
FIG

PROCES 502A
3

3
SUB-SYSTEM 3 3

3 3
PROCES 502A1
SUB-SYSTEM
U.S. Patent Oct. 13 , 2020 Sheet 6 of 15 US 10,802,929 B2

-602N 2

LOCAL ME ORY 606N 3


PROCES -SSUBYSTEM 604N 3 3 3

602C
PROCES
3

LOCAL M6E0ORCY 3 2 2 2
-SSUBYSTEM 604C 3

6
.
FIG
CMSOINTRAU EYG CRSIETLAUOREDY
404 406

602B
PROCES
2

LOCAL 6ME0ORBY 2
-SSUBYSTEM 604B

-602A 2 2 3

LOCAL M6E0ORAY 3 3
PROCES 604A
SUB-SYSTEM 3
U.S. Patent Oct. 13 , 2020 Sheet 7 of 15 US 10,802,929 B2

702 706 708 710 712


Ter
704

OSMCITNRAUFOESYG -S/PRAYOCUELTSBOMLR ?ESRTAOTRE


Y
/PICRSDAETONLUSIFAEORDLY ESRW-UYTINRAOBEMH SCR/PIEATOLUISCTABERODYL RSFTOUEYTNLOSAIREMD SRACOIUETNLFSAMERDY -S/PRAEYOLCUTSBEOMLDR OSRLCIEUTNAFDMERSY PI-S/RAYONUCELTSBEOMLR ESRW/SUB-PUYTOINSCRAETOSMHRE
N
r .
FIG
ZA

700
U.S. Patent Oct. 13 , 2020 Sheet 8 of 15 US 10,802,929 B2

740 742 744

OFTATLFCOEWMASCORTKN LAFVSCOEUMTRASCOGEN
ACRSIETLUOAREDSY SRIEGLNOADS ECRSITALBUOSHREDY POFT/RLAOUCHELSIEOTRLYS SUBYTEM ECRSITALBUOHREDYS OFPTLAURHWLIEOTY SP/URBOCYESTMR 7C
.
FIG

710

730 732
pre 734

OFATLVSOENUATRSECGT /SP-RAYOCUSELTBOMLR
ACRSIETLUOAREDSY SRIEGLNOADS CMRSIOELTDAUFCRKEDSY ONEPOFATL/AREOCALSETOLR SUB-SYSTEM CRSMIOETDLAUFRESY FIG
7B
.

710
U.S. Patent Oct. 13 , 2020 Sheet 9 of 15 US 10,802,929 B2

814

4CMSOITNR0AU4IENYG 812A 812B wase mfer


10
810A

802A vd 1 clk1 802B vd 2 clk2 802C vd 3 clk3 8


.
FIG
806A 806B 806C

-HOLT
-80 A 808B -808C

804A 804B 804C


Sub ys1 Su8bS1y0s2B Su8bS1y2s30 800

810C
mazuropen w apbsano ***
U.S. Patent Oct. 13 , 2020 Sheet 10 of 15 US 10,802,929 B2

814
?
=
serpum

812A 812B

802
vd al clkal
1 810A

806C
806B 808C FIG
9
.

AFG
806A
808B 804C
808A
824
+SubSys321 804A 804B 1

900
810B 8120
810C
w form from
s
U.S. Patent Oct. 13 , 2020 Sheet 11 of 15 US 10,802,929 B2

804B
910
9
804A
10
.
FIG
908 904s
902
906

rsteload1 scan_hift scan_dti dat 1 st_reload2 data2 824


U.S. Patent Oct. 13 , 2020 Sheet 12 of 15 US 10,802,929 B2

8 7 3
22

11
.
FIG

(X _X X
clkal dat 2
st_reload1 dat 1
U.S. Patent Oct. 13 , 2020 Sheet 13 of 15 US 10,802,929 B2

EARLY CY LE
12
.
FIG
NEG SKEW

XXX
2 3
?

FAULT FAULT
*
*
*

ff.
&
*

dXXaXtaX1
clk2 dat 2 clk1
st_reload1
U.S. Patent Oct. 13 , 2020 Sheet 14 of 15 US 10,802,929 B2

-CLYATLE
POS SKEW

*********

XX
FAULT FAULT
13
.
FIG

clk2 dat 2 clk1


Xst_reload1 X
data1
U.S. Patent Oct. 13 , 2020 Sheet 15 of 15 US 10,802,929 B2

HOLD
SKIP

HOLD

XX
3

SKIP
DO
*
FIG
14
.
*
*
*

*
*

H
U
T
44.
+
++
++
++
43
44+

SKIP
1

FAULT FAULT
HOLD

WdXatXa1X
clk2 data
2 clk1
st_reload1
US 10,802,929 B2
1 2
PARALLEL PROCESSING SYSTEM error state . The state time reload is performed while other
RUNTIME STATE RELOAD processors having the good state continue to function , thus
increasing system availability.
CROSS - REFERENCE TO RELATED According to one aspect of the present disclosure , the
APPLICATION 5 runtime states of the at least three parallel processors cor
respond to respective sub - systems of the at least three
The present application claims priority pursuant to 35 parallel processors. With such aspect , only a predetermined
U.S.C. 119 (e ) to U.S. Provisional Patent Application No. portion of the parallel processors are monitored for runtime
62 / 613,306 , entitled “ PARALLEL PROCESSING SYS 10
errors and have their states replaced when an error is
TEM RUNTIME STATE RELOAD ” , filed 3 Jan. 2018 , determined . With this implementation , only deemed most
which is incorporated herein by reference in its entirety for important sub - systems may be affected . With one particular
all purposes. example of this aspect , the parallel processing system sup
BACKGROUND ports autonomous driving and the respective sub - systems of
15 the at least three parallel processors are safety sub - systems
Technical Field that determine whether autonomous driving is to be enabled .
According to another aspect of the present disclosure the
The present invention relates to parallel processing sys secondstate reload circuitry is configured to use a scan chain of the
tems ; and more particularly to the states of parallel proces processor to access the runtime state of the second
sors of a parallel processing system . 20 processor and to use a scan chain of the first processor to
load the runtime state of the second processor into the first
Description of Related Art processor. By using scan chains for state access and reload ,
the scan chains as modified according to the present disclo
A system on a Chip ( SoC ) includes a plurality of pro sure support the additional aspects of the present disclosure .
cessing systems arranged on a single integrated circuit . Each 25 Further, according to still another aspect of the present
of these separate processing systems typically performs a disclosure, accessing the runtime state of the second pro
corresponding set of processing functions. The separate cessor includes accessing a plurality of pipeline states of the
processing systems typically interconnect via one or more second processor and loading the runtime state of the second
communication bus structures that include an N - bit wide processor into the first processor includes loading the plu
data bus (N , an integer greater than one ). 30 rality of pipeline states into the first processor. According to
Some SoCs are deployed within systems that require high this aspect , during loading of the runtime state of the second
availability, e.g. , financial processing systems, autonomous processor into the first processor the state reload circuitry
driving systems , medical processing systems , and air traffic may be further configured to alter at least one clock input of
control systems , among others. These parallel processing the first processor and at least one clock input of the second
systems typically operate upon the same input data and 35 processor.
include substantially identical processing components, e.g. , According to yet another aspect of the present disclosure ,
pipeline structure, so that each of the parallel processing during loading of the runtime state of the second processor
systems , when correctly operating, produces substantially into the first processor the state reload circuitry is further
the same output. Thus, should one of the parallel processors configured to alter a supply voltage of at least one of the first
fail, at least one other processor would be available to 40 memory processordataandofthethesecond processor and / or to invalidate
first processor.
continue performing autonomous driving functions.
A method for operating a parallel processing system
SUMMARY having at least three parallel processors an embodiment of
the present disclosure according to the present disclosure
Thus, in order to overcome the above -described short- 45 includes monitoring runtime states of the at least three
comings , among other shortcomings, a parallel processing parallel processors , identifying a first processor of the at
system of an embodiment of the present disclosure includes least three parallel processors having at least one runtime
at least three processors operating in parallel, state moni state error, selecting a second processor of the at least three
toring circuitry, and state reload circuitry. The state moni parallel processors for state reload , accessing a runtime state
toring circuitry couples to the at least three parallel proces- 50 of the second processor, and loading the runtime state of the
sors and is configured to monitor runtime states of the at second processor into the first processor.
least three parallel processors and identify a first processor According to an aspect of this method , the runtime states
of the at least three parallel processors having at least one of the at least three parallel processors may correspond to
runtime state error. The state reload circuitry couples to the respective sub - systems of the at least three parallel proces
at least three parallel processors and is configured to select 55 sors . The method may further include using a scan chain of
a second processor of the at least three parallel processors the second processor to access the runtime state of the
for state reload, access a runtime state of the second pro second processor and using a scan chain of the first proces
cessor, and load the runtime state of the second processor sor to load the runtime state of the second processor into the
into the first processor. first processor.
The parallel processing system may be implemented as 60 According to another aspect of this method , the method
part of an autonomous driving system , part of a financial may further include altering at least one clock input of the
processing system , part of a data center processing system , first processor and at least one clock input of the second
or part of another system requiring high reliability. With the processor and / or invalidating local memory of the first
state reload aspect of the parallel processing system of the processor.
present disclosure , when one processing system is deter- 65 Benefits of the disclosed embodiments will become
mined to be in an error state , the good state of another apparent from reading the detailed description below with
processor may be loaded into the processing system in the reference to the drawings.
US 10,802,929 B2
3 4
BRIEF DESCRIPTION OF THE SEVERAL processor of a plurality of parallel processors/plurality of
VIEWS OF THE DRAWINGS sub -systems and to load an error free runtime state into
another processor/sub -system in runtime. The principals of
FIG . 1 is a diagram illustrating a parallel processing the present disclosure may be applied to critical computing
system constructed and operating according to a described 5 systems such as data centers , communication switches ,
embodiment. medical devices , autonomous driving systems , and indus
FIG . 2 is a block diagram illustrating an autonomous trial system controllers , for example, that should not be
driving controller constructed and operating according to a taken out of service for a full restart . A particular installation
first described embodiment. within an autonomous driving system is described herein
FIG . 3A is a block diagram illustrating an autonomous 10 with reference to FIGS . 2 , 3A , and 3B herein .
driving sensor constructed according to a described embodi The parallel processing system 100 of FIG . 1 includes N
ment. parallel processors 102A , 102B , 102C , ... , and 102N state
FIG . 3B is a block diagram illustrating an autonomous monitoring circuitry 104 and state reload circuitry 106. The
driving controller constructed according to a described principals of the present disclosure are contemplated and
embodiment. 15 described with at least three processors operating in parallel.
FIG . 4 is a block diagram illustrating a parallel processing However, these principals may be extended to include more
system that supports runtime state reload according to a than three processors operating in parallel.
described embodiment. Further, each of the parallel processors 102A - 102N may
FIG . 5 is a block diagram illustrating a processor of a include a plurality of sub -systems. Moreover, each of the
plurality of parallel processors of the parallel processing 20 parallel processors 102A - 102N may have its own local
system that supports runtime state reload according to a memory .
described embodiment. The plurality of parallel processors 102A - 102N have
FIG . 6 is a block diagram illustrating respective sub identical or nearly identical processing components, e.g. ,
systems of a plurality of parallel processors of a parallel pipeline structure with combination logic and data paths
processing system that supports runtime state reload accord- 25 there between, and operates on substantially the same input
ing to a described embodiment. data . With the nearly identical structure and operating on the
FIG . 7A is a flow diagram illustrating runtime state reload same input data , the plurality of parallel processors 102A
operations according to one or more described embodi 102N should have identical output data and runtime states.
ments . However, because of local environmental conditions of the
FIG . 7B is a flow diagram illustrating first options for 30 plurality of parallel processors 102A -102N , e.g. , by voltage
runtime state reload operations according to one or more fluctuations, circuit aging , clock skew , memory write errors ,
aspects of the present disclosure . memory read errors , etc., one ( or more ) of the parallel
FIG. 7C is a flow diagram illustrating second options for
processors 102A- 102N may have one or more runtime state
runtime state reload operations according to one or more errors . With runtime state errors , the processor may fail to
other aspects of the present disclosure . 35 correctly perform its processing operations and produce
FIG . 8 is a block diagram illustrating portions of a erroneous output data . The term “ runtime” is used herein to
plurality of processors of a parallel processing system and indicate that state monitoring and state reload is done while
state monitoring circuitry according to one or more the plurality of parallel processors 102A - 102N are opera
described embodiments . tional and performing their intended functions.
FIG . 9 is a block diagram illustrating portions of a 40 Thus, according to the present disclosure, the parallel
plurality of processors of a parallel processing system hav processing system 100 includes state monitoring circuitry
ing common clock and voltage input according to one or 104 coupled to the plurality of parallel processors 102A
more described embodiments. 102N . The state monitoring circuitry 104 is configured to
FIG . 10 is a block diagram illustrating a portion of state monitor runtime states , inputs, and / or outputs of the plurality
reload circuitry that works in combination with a scan chain 45 of parallel processors 102A - 102N and to identify a first
according to one or more described embodiments . processor, e.g. , 102A of the plurality of parallel processors
FIG . 11 is a timing diagram illustrating clocks of the 102A - 102N having at least one runtime state error. The state
circuits of FIGS . 9 and 10 according to one or more monitoring circuitry 104 may monitor runtime states of the
described embodiments . plurality of parallel processors 102A - 102N at any accessible
FIG . 12 is a timing diagram illustrating clocks of the 50 pipeline location, at any output location, or at any other
circuits of FIGS . 8 and 10 according to one or more location of the plurality of parallel processors 102A - 102N at
described embodiments . which runtime state data is available. Generally, with at least
FIG . 13 is a timing diagram illustrating clocks of the three processors 102A , 102B , and 102C , differing respective
circuits of FIGS . 8 and 10 according to one or more other runtime states are compared to one another in parallel. In
described embodiments . 55 one embodiment considering three parallel processors/ sub
FIG . 14 is a timing diagram illustrating clocks of the systems , if two out of three of the runtime states are equal
circuits of FIGS . 8 and 10 according to one or more other or consistent with one another and a third of the runtime
described embodiments . states is unequal or inconsistent with the other two , it is
determined that the unequal or inconsistent runtime state is
DETAILED DESCRIPTION OF THE 60 in error. Many different mechanisms may be employed to
DISCLOSURE compare runtime states to one another. One particular
example is illustrated in FIGS . 8 and 9 .
FIG . 1 is a block diagram illustrating a parallel processing The parallel processing system 100 further includes state
system 100 that supports runtime state reload according to a reload circuitry 106 coupled to the plurality of parallel
described embodiment. The concepts of the present disclo- 65 processors 102A - 102N , the state reload circuitry 106 being
sure may also be applied to any parallel processing system configured to select a second processor, e.g. , 102B , of the
to determine whether a runtime state error exists in any plurality of parallel processors 102A - 102N for state reload ,
US 10,802,929 B2
5 6
to access a runtime state of the second processor 102B , and By way of example and not limitation , processing cir
to load the runtime state of the second processor 102B into cuitry 302 may be a central processing unit , a microcon
the first processor 102A . Various structures and methodolo troller, a digital signal processor, an application specific
gies for monitoring runtime states and reloading runtimes integrated circuit , a Judging unit, a Determining Unit, an
will be described further herein with reference to FIGS . 5 Executing unit, combinations of any of the foregoing, or any
4-14 . As will be described further with reference to FIG . 6 , other device suitable for execution of computer programs.
however, not all of the components of the plurality of By way of example, memory 304 may be dynamic memory,
processors 102A - 102N may be monitored and enabled for static memory, disk drive ( s ), flash drive ( s ), combinations of
runtime state reload . any of the foregoing, or any other form of computer
FIG . 2 is a block diagram illustrating an autonomous 10 memory. The memory 304 stores computer programs for
driving system 200 constructed and operating according to operations of the present disclosure, may also store other
a described embodiment. The example of FIG . 2 is provided computer programs, configuration information , and other
to show a system in which a plurality of processors is short - term and long - term data necessary for implementation
deployed to support autonomous driving. The example of of the embodiments of the present disclosure.
FIG . 2 is only one implementation in which the teachings of 15 The transceiver 311 includes a transmitter 308 , a receiver
the present invention may be employed. Other implemen 310 , and a media I/ F 312. The media I /F 312 may be a
tation examples include financial processing systems, medi transmit / receive ( T /R ) switch , a duplexer, or other device
cal processing systems, and air traffic control systems, that supports the illustrated coupling . In other embodiments ,
among others. both the transmitter 308 and receiver 310 couple directly to
The autonomous driving system 200 includes a bus , an 20 the bus or couple to the bus other than via the media I/F 312 .
autonomous driving controller 202 coupled to the bus , and The transceiver 311 supports communications via the bus .
a plurality of autonomous driving sensors 212A - 212F The processing circuitry 302 and the transceiver 311 are
coupled to the bus . In the embodiment of FIG . 2 , the bus configured to transmit autonomous driving data to the
includes two interconnected sections . A first section 204A of autonomous driving controller 108 on the bus .
the bus includes one or more conductors , a second section 25 FIG . 3B is a block diagram illustrating an autonomous
204B of the bus includes one or more conductors , and an driving controller constructed according to a described
interconnecting portion 205 that interconnects the first sec embodiment. The autonomous driving controller 202
tion 204A and the second section 204B of the bus . The bus includes general processing circuitry 352 , memory 354 , and
may be a twisted pair of conductors, a pair of strip conduc a transceiver 361 coupled to the general processing circuitry
tors , a coaxial conductor, a two conductor power bus that 30 352 and configured to communicate with a plurality of
carries DC power, or another structure having one or two autonomous driving sensors via the bus . The autonomous
conductors to support communications . driving controller 108 also includes an autonomous driving
A plurality of devices communicates via the bus . These parallel processing system 364 that operates on autonomous
devices include the autonomous driving controller 202 , the driving data received from the autonomous driving sensors
plurality of autonomous driving sensors 212A - 212F, an 35 and supports autonomous driving operations. The autono
infotainment device 214 , memory 216 , a climate control mous driving parallel processing system 364 includes at
device 218 , a battery controller 220 ( when the vehicle is an least three processors operating in parallel with one another.
electric vehicle or hybrid vehicle ), an engine/motor control The transceiver 361 includes a transmitter 358 , a receiver
ler 222 , a steering controller 224 , and a braking controller 360 , and a media I /F 362 that in combination support
226. Note that the communication connectivity via the bus 40 communications via the bus .
may be different in differing embodiments. The plurality of The construct of the general processing circuitry 352 may
autonomous driving sensors 212A - 212F may include one or be similar to the construct of the processing circuitry 302 of
more RADAR units , one or more LIDAR units, one or more the autonomous driving sensor 300. The autonomous driv
cameras, and / or one or more proximity sensors . The plural ing parallel processing system 364 will be described further
ity of autonomous driving sensors 212A - 212F collect 45 herein with reference to FIGS . 4-14 . The memory 354 may
autonomous driving data and transmit the collected autono be of similar structure as the memory 304 of the autonomous
mous driving data to the autonomous driving controller 108 driving sensor 300 but with capacity as required to support
on the bus . the functions of the autonomous driving controller 108 .
FIG . 3A is a block diagram illustrating an autonomous FIG . 4 is a block diagram illustrating a parallel processing
driving sensor constructed according to a described embodi- 50 system that supports runtime state reload according to a
ment. The autonomous driving sensor 300 includes data described embodiment. The parallel processing system 400
collection component 306 configured to collect autonomous of FIG . 4 may be the autonomous driving parallel processing
driving data . The data collection component 306 may be a system 364 of the autonomous driving controller 202. How
RADAR sensor, a LIDAR sensor, a sonic proximity sensor, ever, the concepts of the present disclosure , may also be
or another type of sensor. The autonomous driving sensor 55 applied to any other parallel processing system to determine
300 further includes processing circuitry 302 , memory 304 , whether a runtime state error exists in any processor of a
and a transceiver 311 coupled to the processing circuitry plurality of parallel processors/plurality of sub - systems and
302 , to the memory 304 , and to the data collection compo to load an error free runtime state into another processor/
nent 306 via a bus . The processing circuitry 302 executes sub -system in runtime. For example, the principals of the
programs stored in memory 304 , e.g. , autonomous driving
emergency operations, reads and writes data from / to
60 present disclosure may be applied to critical computing
systems such as data centers, communication switches ,
memory, e.g. , data and instructions to support autonomous medical devices, and industrial system controllers, for
driving operations, to interact with the data collection com example, that should not be taken out of service for a full
ponent 306 to control the collection of autonomous driving restart.
data, to process the autonomous driving data, and to interact 65 The parallel processing system 400 of FIG . 4 includes N
with the transceiver 311 to communicate via the bus , among parallel processors 402A , 402B , 402C , ... , and 402N state
other operations . monitoring circuitry 404 and state reload circuitry 406. The
US 10,802,929 B2
7 8
principals of the present disclosure are contemplated and gies for monitoring runtime states and reloading runtimes
described with at least three processors operating in parallel. will be described further herein with reference to FIGS .
However, these principals may be extended to include more 5-14 . As will be described further with reference to FIG . 6 ,
than three processors operating in parallel. however, not all of the components of the plurality of
Each of the parallel processors 402A - 402N may have 5 processors 402A - 402N may be monitored and enabled for
specific structure relating to autonomous driving in the runtime state reload .
embodiment of FIG . 2. For example, each of the parallel With various aspects of the parallel processing system
processors 402A - 402N may include convolutional process 400 , the state reload circuitry 406 is configured to use scan
ing components that operate on autonomous driving data . chains 403A - 403N of the plurality of processors 402A - 402N
Further, each of the parallel processors 402A - 402N may 10 to access the runtime state of the second processor 402B and
include a plurality of sub - systems , including safety sub to load the runtime state of the second processor 402B into
systems , security sub - systems , and / or other sub -systems. the first processor 402A . The structure and usage of scan
Moreover, each of the parallel processors 402A - 402N may chains is generally known and will not be described further
have its own local memory . herein except for how the scan chains relate to the present
The plurality of parallel processors 402A -402N have 15 disclosure .
identical or nearly identical processing components, e.g. , The plurality of processors 402A - 402N may include a
pipeline structure with combination logic and data paths pipeline architecture including a pluralities of processing
there between , and operates on substantially the same input logic intercoupled by data latching circuitry . The plurality of
data, e.g. , input data received from the autonomous driving processors 402A - 402N may each include a plurality of
sensors 212A - 212E described with reference to FIG . 2. With 20 pipelines . With such a processing structure, accessing the
the nearly identical structure and operating on the same runtime state of the second processor 402B includes access
input data, the plurality of parallel processors 402A - 402N ing a plurality of pipeline states of the second processor
should have identical output data and runtime states . How 402B . Further, in with such a processing structure , loading
ever, because of local environmental conditions of the the runtime state of the second processor 402B into the first
plurality of parallel processors 402A - 402N , e.g. , by voltage 25 processor 402A includes loading the plurality of pipeline
fluctuations, circuit aging , clock skew , memory write errors, states into the first processor.
memory read errors , etc. , one (or more) of the parallel According to another aspect of the parallel processing
processors 402A - 402N may have one or more runtime state system 404 , during loading of the runtime state of the second
errors . With runtime state errors , the processor may fail to processor 402B into the first processor 402A , the state reload
correctly perform its processing operations and produce 30 circuitry 406 is further configured to alter at least one clock
erroneous output data . Because of the requirements of input of the first processor 402A and at least one clock input
autonomous driving, erroneous output data cannot be toler of the second processor 402B . Examples of operation with
ated . The term “ runtime " is used herein to indicate that state out clock alteration and with clock alteration will be
monitoring and state reload is done while the plurality of described further herein with reference to FIGS . 11-14 .
parallel processors 402A - 402N are operational and perform- 35 According to another aspect of the parallel processing
ing their intended functions. system 400 , during loading of the runtime state of the second
Thus, according to the present disclosure, the parallel processor 402B into the first processor 402A , the state reload
processing system 400 includes state monitoring circuitry circuitry 406 is tolerant of differing supply voltages of at
404 coupled to the plurality of parallel processors 402A least one of the first processor 402A and the second proces
402. The state monitoring circuitry 404 is configured to 40 sor 402B . In one operation , a single source voltage may be
monitor runtime states , inputs, and /or outputs of the plurality applied to these processors 402A and 402B to alleviate any
of parallel processors 402A - 402N and to identify a first problems that could be caused during runtime state reload by
processor, e.g. , 402A of the plurality of parallel processors driving the processors 402A and 402B with differing volt
402A - 402N having at least one runtime state error. The state ages .
monitoring circuitry 404 may monitor runtime states of the 45 According to yet another aspect of the parallel processing
plurality of parallel processors 402A - 402N at any accessible system 400 , loading of the runtime state of the second
pipeline location , at any output location, or at any other processor 402B into the first processor 402A includes
location of the plurality of parallel processors 402A - 402N at obtaining memory data , e.g. , cache memory data , from local
which runtime state data is available . Generally, with at least memory of the second processor 402B and loading the
three processors 402A , 402B , and 402C , differing respective 50 memory data into local memory of the first processor 402A .
runtime states are compared to one another in parallel. In These operations are important to maintain consistency in
one embodiment considering three parallel processors /sub the runtime states . Since this is usually impractical, another
systems , if two out of three of the runtime states are equal approach is to invalidate the cache memory data for all
or consistent with one another and a third of the runtime processors 402A , 402B , 402C while the runtime state is
states is unequal or inconsistent with the other two, it is 55 reloaded in the faulty processor . A third approach is to
determined that the unequal or inconsistent runtime state is monitor cache memory data being written and read into each
in error. Many different mechanisms may be employed to cache memory ; if all the data is identical at the time of
compare runtime states to one another. One particular runtime state reload , then invalidation is not required. A
example is illustrated in FIGS . 8 and 9 . fourth approach is to only monitor data being written in the
The parallel processing system 400 further includes state 60 cache memory and use Error Correction Codes (ECC ) to fix
reload circuitry 406 coupled to the plurality of parallel faults discovered when the data is read .
processors 402A - 402N , the state reload circuitry 406 con In some embodiments , both the state monitoring circuitry
figured to select a second processor, e.g. , 402B , of the 404 and the state reload circuitry 406 have access to all
plurality of parallel processors 402A - 402N for state reload , pipeline states , in parallel , of the plurality of processors ,
to access a runtime state of the second processor 402B , and 65 including the input and output. Thus , all pipelines states of
to load the runtime state of the second processor 402B into the plurality of processors may be separately monitored for
the first processor 402A . Various structures and methodolo runtime state errors . Moreover, by having the ability to
US 10,802,929 B2
9 10
access all pipeline states of each of the plurality of proces time state error exists ( step 704 ) . Techniques for determining
sors , the just the pipeline state of one processor may be whether a runtime state error exists are further described
loaded into another processor instead of the entire processor with reference to FIGS . 8 and 9. If no runtime state errors are
state . detected , operations 700 return to step 702 .
FIG . 5 is a block diagram illustrating a processor of a 5 If one or more runtime state errors are detected at step
plurality of parallel processors of the parallel processing 704 , operations continue with identifying a first processor of
system that supports runtime state reload according to a the at least three parallel processors having at least one
described embodiment. The structure of FIG . 5 is directed to runtime state error ( step 706 ) . Operations 700 then continue
the first processor 402A of FIG . 4. The first processor 402A with selecting a second processor of the at least three parallel
includes a plurality of processor sub - systems 502A1 , 10 processors for state reload ( step 708 ) . Then , operations 700
502A2 , 502A3 , ... , 502AN and local memory 506A . The continue with accessing a runtime state of the second
state monitoring circuitry 404 couples to at least some of the processor ( step 710 ) and concludes with loading the runtime
processor sub - systems 502A1-502AN and to the local state of the second processor into the first processor ( step
memory 506A to monitor the runtime state of the at least 712 ) . With step 712 completed, operations 700 return to step
some of the processor sub - systems 502A1-502AN . Further, 15 702 .
the state reload circuitry 406 couples to at least some of the With one aspect of the operations 700 of FIG . 7A , the
processor sub - systems 502A1-502AN and to the local runtime states of the at least three parallel processors may
memory 506A to retrieve the runtime state of the at least correspond to respective sub -systems of the at least three
some of the processor sub -systems 502A1-502AN . This parallel processors. Further, according to another aspect , the
same construct may also be employed with the second 20 operations 700 may include using a scan chain of the second
processor 402B of FIG . 4 to load the runtime state of processor to access the runtime state of the second processor
processor 402A into processor 402B . and using a scan chain of the first processor to load the
With the embodiment of FIG . 5 , only some of the sub runtime state of the second processor into the first processor.
systems 502A1-502AN may be monitored and enabled for According to yet another aspect , the operations 700 include
runtime state reload . As described above with reference to 25 modifying memory data of the first processor.
FIG . 4 , sub - systems may be chosen for runtime state moni FIG . 7B is a flow diagram illustrating first options for
toring and runtime state reload based upon their importance runtime state reload operations according to one or more
compared to other sub - systems. In one particular embodi aspects of the present disclosure . With the embodiment of
ment, only the runtime states of the safety sub - systems of the FIG . 7C , runtime state reload operations 710 include assert
plurality of processors 402A - 402N of the parallel processing 30 ing reload signals by the state reload circuitry (step 730 ) ,
system 400 are enabled for runtime state monitoring and modifying a clock of at least one parallel processor/ sub
runtime state reload . system ( step 732 ) and /or modifying a source voltage of at
FIG . 6 is a block diagram illustrating respective sub least one parallel processor / sub - system ( step 734 ) . Clock
systems of a plurality of parallel processors of a parallel skew , supply voltage noise , ground plane noise , and circuit
processing system that supports runtime state reload accord- 35 aging can cause runtime state errors. Thus, to avoid future
ing to a described embodiment. Shown are a plurality of runtime state errors, differing clocks and / or source voltages
sub - systems 602A - 602N that include a corresponding plu should be used . Further, in order to cause the runtime state
rality of processor sub - system 604A - 604N and a corre reload process to be successful, the state reload circuitry
sponding plurality of local memories 606A - 606N . An may temporarily modify clocks and / or source voltage during
example of this implementation is the safety sub -system 40 access of the runtime state and loading of the accessed
described above . In other words, only a portion of the runtime state . As will be described further with reference to
runtime state of the plurality of processors 402A - 402N is FIGS . 11-13 , clocks to the first and second processors/sub
monitored and enabled for runtime state reload . In the systems may be individually manipulated .
example of FIG . 6 , that portion of the runtime state moni FIG . 7C is a flow diagram illustrating second options for
tored and enabled for runtime state reload corresponds to a 45 runtime state reload operations according to one or more
plurality of safety sub - systems . other aspects of the present disclosure . With the embodiment
The state monitoring circuitry 404 couples to the plurality of FIG . 7C , runtime state reload operations 710 include
of sub - systems 602A - 602N and is configured to monitor asserting reload signals by the state reload circuitry ( step
runtime states of the plurality of sub -systems 602A - 602N 740 ) , establishing a common clock for at least two of the
and to identify a first sub - system , e.g. , 602C , of the plurality 50 parallel processors / sub -systems ( step 742 ) and / or establish
of sub - systems 602A - 602N having at least one runtime state ing a common source voltage for at least two of the parallel
error. The state reload circuitry 406 couples to the plurality processors /sub -systems ( step 744 ) .
of sub -systems 602A - 602N and is configured to select a FIG . 8 is a block diagram illustrating portions of a
second sub - system processor, e.g. , 602B , of the plurality of plurality of processors of a parallel processing system and
sub - systems 602A- 602N for state reload, to access a runtime 55 state monitoring circuitry according to one or more
state of the second sub - system 602B , and to load the runtime described embodiments. The components 800 of FIG . 8
state of the second sub -system 602B into the first sub -system include portions of three processors / sub - systems and a por
602C . In the example of FIG . 6 , the state reload circuitry 406 tion of the state monitoring circuitry 404. These components
may also access contents of the local memory 606B that illustrate a three processor/sub - system implementation .
corresponds to sub - system 602B to extract data and to write 60 These components exist for each monitored runtime state of
this data to the local memory 606C . the three processor / sub - system implementation. Block 802A
FIG . 7A is a flow diagram illustrating runtime state reload is a portion of a first processor / first sub - system and includes
operations according to one or more described embodi flip - flops 804A and 806A ( also referred to herein as flops or
ments . Operations 700 of a parallel processing system latches ) and processing logic 808A . Block 802B is a portion
having at least three parallel processors include monitoring 65 of a second processor /second sub - system and includes flops
runtime states of the at least three parallel processors ( step 804B and 806B and processing logic 808B . Block 802C is
702 ) . Operations continue with determining whether a run a portion of a third processor/third sub -system and includes
US 10,802,929 B2
11 12
flops 804C and 806C and processing logic 808C . Flops with the subsequent timing diagrams and, if these flops were
810B , 810C , and 812C enable processing delays between not included , the timing diagrams would be modified
the blocks 802A , 802B , and 802C , which operate on respec accordingly.
tive data ( e.g. , 32 , 64 , 128 bits wide) received as input. These FIG . 10 is a block diagram illustrating a portion of state
three blocks 802A , 802B , and 802C may be considered to 5 reload circuitry that works in combination with a scan chain
represent a single logic block, input latch , and output latch , according to one or more described embodiments . The state
i.e. , pipeline stage of a respective processor/ sub -system . The reload circuitry illustrated in FIG . 10 corresponds to detail
state monitoring circuitry 404 is able to monitor one pipeline 824 of FIG . 9 and includes OR gates 902 and 904 and
stage , more than one pipeline stage , and / or the output of the multiplexers 906 , 908 and 910. A first OR gate 902 receives
blocks 802A - 802C . Monitoring of more than pipeline stage/ 10 as its input a first state reload signal ( st_reloadl) and a
output requires parallel comparators for each pipeline stage/ scan_shift signal ( corresponding to a scan chain) . When the
output. scan_shift signal is logic high , multiplexer 906 selects
Comparator 814 compares the three runtime states scan_data_in signal as its output. When the scan_shift signal
received from the three blocks 802A , 802B , and 802C , is logic low , multiplexer 906 selects the output of flop 804B
which have been time aligned by flops 810A , 812A , and 15 as its output, such operations occurring during runtime state
812B . Comparator 814 compares all bits ofreceived runtime reload . When the output of OR gate 902 is logic low, the
states or a portion of the bits of the received runtime states. output of multiplexer 908 is datal (first processor pipeline
Based upon its comparison, comparator 814 either deter data ). When the output of OR gate 902 is logic high, the
mines that the states are consistent, concluding that no output of multiplexer 908 is the output of multiplexer 906
runtime state errors exist , or when one of the runtime states 20 ( scan_data_in operations during scan chain operations or the
disagrees with the other two runtime states , the comparator output of flop 804B during first runtime state reload opera
determines that a runtime state error exists and identifies the tions ) .
block 802A , 802B , or 802C that presents the erroneous Second OR gate 904 receives as its inputs the scan_shift
runtime state . The state monitoring circuitry 404 communi signal and a second state reload signal ( st_reload2 ) and,
catively couples to the state reload circuitry 406 to notify the 25 when either of those two inputs is logic high , the output of
state reload circuitry 406 which of the blocks 802A , 802B , OR gate 904 is logic high . During scan chain or runtime state
or 802C has the runtime state error. The state reload circuitry reload operations, with the output of OR gate 904 logic high ,
406 then selects one of blocks 802A , 802B , or 802C that multiplexer 910 produces as its output the output of flop
does not have a runtime state error for runtime state reload . 804A . During normal operations (neither st_reload2 nor
In some embodiments, both the state monitoring circuitry 30 scan_shift logic high) , the multiplexer 910 produces as its
404 and the state reload circuitry 406 have access to all output data2 ( second processor pipeline data ).
pipeline states , inputs, and outputs, in parallel, of the plu FIG . 11 is a timing diagram illustrating clocks of the
rality of processors / sub - systems. Thus , all pipelines states, circuits of FIGS . 9 and 10 according to one or more
inputs , and outputs of the plurality of processors /sub-sys described embodiments . As shown , the runtime state ( datal)
tems may be separately monitored for runtime state errors . 35 of first processor /first sub -system is determined to have at
Moreover, by having the ability to access all pipeline states , least one error. In response to this determination by the state
inputs, and outputs of each of the plurality of processors/ monitoring/ state reload circuitry, the signal st_reload1 is
sub -systems, the entire pipeline , input, and output of one asserted to initiate the loading of runtime state ( data2 ) from
processor may be loaded into another processor during a few second processor / second sub - system into the first processor/
clock cycles . Thus , the only loss of function of the proces- 40 first sub - system . However, by using a single clock ( clkall)
sor / sub -system having the runtime state error is between the for both the first and second processors/sub -systems, there is
time the runtime state error is detected until the time that the a one clock cycle delay before data2 is loaded into the first
second runtime state of the second processor/ second sub processor/sub -system . However, using a single clock may
system is loaded into the first processor/sub -system . result in errors in the runtime state reload process due to
FIG . 9 is a block diagram illustrating portions of a 45 clock delay.
plurality of processors of a parallel processing system hav FIG . 12 is a timing diagram illustrating clocks of the
ing common clock and voltage input according to one or circuits of FIGS . 8 and 10 according to one or more
more described embodiments. The components 900 of FIG . described embodiments. As shown , the runtime state ( datal )
9 are the same components as were previously described of first processor / first sub -system is determined to have at
with reference to FIG . 8 but with added detail regarding 50 least one error. In response to this determination by the state
runtime state reload . As shown, during state reload opera monitoring state reload circuitry , the signal st_reload1 is
tions , data at the output of flop 804A may serve as an input asserted to initiate the loading of runtime state ( data2 ) from
data to flop 804B , data at the output of flop 804B may serve second processor /second sub - system into the first processor/
as an input data to flop 804A and / or 804C . Likewise, during first sub -system . With the embodiment of FIG . 12 , a first
state reload operations, data at the output of flop 806A may 55 clock ( clkl ) is used for the first processor / first sub - system
serve as an input data to flop 806B , data at the output of flop and a second clock (clkl ) is used for the second processor/
806B may serve as an input data to flop 806A and /or 806C . second sub -system . There exists a negative skew between
Further, with the embodiment of FIG . 9 , a single source the first clock ( clkl ) and the second clock ( clk2 ) , resulting
voltage ( vddall) and a single clock ( clkall) drives all com in an early cycle of the loading of the runtime state ( data2 )
ponents. Such driving of components may be done during 60 of the second processor/second sub - system into the first
normal operations or only during runtime state reload opera processor /sub -system , potentially resulting in errors in the
tions . Examples of how a single clock may be manipulated runtime state reload process .
to assist in runtime state reload operations will be described FIG . 13 is a timing diagram illustrating clocks of the
further with reference to FIG . 14 . circuits of FIGS . 8 and 10 according to one or more other
With the embodiments of FIGS . 8 and 9 , the flops 810A , 65 described embodiments. As shown , the runtime state ( datal )
810B , and 810C as well as the flops 812A , 812B , and 812C of first processor / first sub - system is determined to have at
are optional . The illustrated embodiments are consistent least one error. In response to this determination by the state
US 10,802,929 B2
13 14
monitoring / state reload circuitry, the signal st_reload1 is gramming or code in one or more digital computers or
asserted to initiate the loading of runtime state ( data2 ) from processors , by using application specific integrated circuits
second processor / second sub - system into the first processor/ ( ASICs ) , programmable logic devices, field programmable
first sub - system . With the embodiment of FIG . 13 , a first gate arrays (FPGAs ), optical , chemical , biological , quantum
clock ( clkl ) is used for the first processor / first sub -system 5 or nano -engineered systems , components and mechanisms.
and a second clock (clkl ) is used for the second processor/ Based on the disclosure and teachings representatively pro
second sub - system . There exists a positive skew between the vided herein , a person skilled in the art will appreciate other
first clock ( clkl ) and the second clock ( clk2 ), resulting in a ways or methods to implement the invention .
late cycle of the loading of the runtime state (data2 ) of the As used herein , the terms " comprises," " comprising , "
second processor /second sub - system into the first processor/ 10 “ includes , ” “ including, ” “ has , ” “ having ” or any contextual
sub - system , potentially resulting in errors in the runtime variants thereof , are intended to cover a non - exclusive
state reload process.
FIG . 14 is a timing diagram illustrating clocks of the ratus that comprises a, list
inclusion . For example a process , product, article, or appa
of elements is not necessarily
circuits of FIGS . 8 and 10 according to one or more other
described embodiments . As shown, the runtime state (datal) 15 limited to only those elements , but may include other
of first processor / first sub - system is determined to have at elements not expressly listed or inherent to such process ,
least one error. In response to this determination by the state product, article , or apparatus. Further, unless expressly
monitoring / state reload circuitry, the signal st_reload1 is stated to the contrary, “ or ” refers to an inclusive or and not
asserted to initiate the loading of runtime state (data2 ) of the to an exclusive or. For example , a condition “ A or B ” is
second processor/ second sub- system into the first processor/ 20 satisfied by any one of the following: A is true (or present)
first sub - system . With the embodiment of FIG . 14 , a first and B is false (or not present ), A is false (or not present) and
clock ( clkl ) is used for the first processor /first sub -system B is true (or present ), and both A and B is true (or present ).
and a second clock (clk1 ) is used for the second processor/ Although the steps, operations, or computations may be
second sub - system . The implementation of FIG . 14 is tol presented in a specific order, this order may be changed in
erant of large skew between clocks due to independent 25 different embodiments . In some embodiments, to the extent
clocks and /or independent voltage sources. Further, each of multiple steps are shown as sequential in this specification,
the clocks clk1 and clk2 and the state reload signal st_re some combination of such steps in alternative embodiments
load1 is separately controlled . With control available, cycles
of clk1 and clk2 may be skipped to compensate for delay in may be performed at the same time . The sequence of
data2 that is used to load into datal . As shown, the first state 30 reversed , ordescribed
operations
otherwise
herein can be interrupted , suspended ,
controlled by another process .
reload signal ( st_reload?) is held high to allow data2 to be It will also be appreciated that one or more of the elements
latched and available for loading into the first processor / first depicted in the drawings / figures can also be implemented in
sub -system (as datal ). By skipping a cycle of clk1 after the a more separated or integrated manner , or even removed or
first state reload signal ( st_reloadl ) is asserted, data2 can be
latched and available for loading into the first processor/first 35 rendered as inoperable in certain cases, as is useful in
sub - system to replace datal. After the skipped cycle of clki , accordance with a particular application . Additionally , any
signal arrows in the drawings / figures should be considered
when clk1 transitions from low to high the complete runtime only
state of the second processor / second sub -system may be as exemplary, and not limiting , unless otherwise spe
loaded into the first processor / first sub - system . cifically noted therewith .
In the foregoing specification, the disclosure has been 40
described with reference to specific embodiments. However, What is claimed is :
as one skilled in the art will appreciate , various embodi 1. A parallel processing system comprising:
ments disclosed herein can be modified or otherwise imple at least three processors operating in parallel;
mented in various other ways without departing from the state monitoring circuitry coupled to the at least three
spirit and scope of the disclosure. Accordingly, this descrip- 45 parallel processors , the state monitoring circuitry con
tion is to be considered as illustrative and is for the purpose figured to :
of teaching those skilled in the art the manner of making and monitor runtime states of the at least three parallel
using various embodiments of the disclosed system , method, processors; and
and computer program product. It is to be understood that identify a first processor of the at least three parallel
the forms of disclosure herein shown and described are to be 50 processors having at least one runtime state error ;
taken as representative embodiments . Equivalent elements, and
materials, processes or steps may be substituted for those state reload circuitry coupled to the at least three parallel
representatively illustrated and described herein . Moreover, processors , the state reload circuitry configured to :
certain features of the disclosure may be utilized indepen select a second processor of the at least three parallel
dently of the use of other features, all as would be apparent 55 processors for state reload ;
to one skilled in the art after having the benefit of this access a runtime state of the second processor ; and
description of the disclosure. load the runtime state of the second processor into the
Routines, methods, steps , operations, or portions thereof first processor,
described herein may be implemented through electronics, wherein loading of the runtime state of the second
e.g. , one or more processors , using software and firmware 60 processor into the first processor includes at least one
instructions. A " processor " or " processing circuitry " of:
includes any hardware system , hardware mechanism or invalidating memory data of the first processor,
hardware component that processes data , signals or other invalidating memory data of the three processors,
information . A processor can include a system with a central determining that invalidation of memory data is not
processing unit , multiple processing units , dedicated cir- 65 required, or
cuitry for achieving functionality, or other systems. Some repairing memory data using Error Correction
embodiments may be implemented by using software pro Codes .
US 10,802,929 B2
15 16
2. The parallel processing system of claim 1 , wherein the invalidating memory data of a plurality of the sub
runtime states of the at least three parallel processors cor systems;
respond to respective sub - systems of the at least three determining that invalidation of memory data is not
required; or
parallel processors .
3. The parallel processing system of claim 2 , wherein : 5 repairing memory data using Error Correction
Codes .
the parallel processing system supports autonomous driv 10. The runtime state reload system of claim 9 , wherein
ing ; and the respective sub - systems are safety sub -systems of an
the respective sub -systems of the at least three parallel autonomous driving system that determine whether autono
processors are safety sub -systems that determine 10 mous driving is to be enabled .
whether autonomous driving is to be enabled . 11. The runtime state reload system of claim 9 , wherein
4. The parallel processing system of claim 1 , wherein the the state reload circuitry is configured to :
state reload circuitry is configured to : use a modified scan chain of the second sub - system to
use a modified scan chain of the second processor to access the runtime state of the second sub -system ; and
access the runtime state of the second processor ; and 15
use a modified scan chain of the first sub -system to load
use a modified scan chain of the first processor to load the the runtime state of the second sub - system into the first
runtime state of the second processor into the first sub - system .
processor. 12. The runtime state reload system of claim 9 , wherein :
5. The parallel processing system of claim 1 , wherein : accessing the runtime state of the second sub - system
accessing the runtime state of the second processor 20
includes accessing a plurality of pipeline states of the
includes accessing a plurality of pipeline states of the second sub -system ; and
second processor; and loading the runtime state of the second sub - system into
loading the runtime state of the second processor into the the first sub - system includes loading the plurality of
first processor includes loading the plurality of pipeline pipeline states into the first sub - system .
states into the first processor. 25
13. The runtime state reload system of claim 9 , wherein ,
6. The parallel processing system of claim 1 , wherein , during into the
loading of the runtime state of the second sub - system
first sub - system the state reload circuitry is further
during loading of the runtime state of the second processor
into the first processor the state reload circuitry is further configured
sub - system
to alter at least one clock input of the first
and at least one clock input of the second
configured to alter at least one clock input of the first
processor and at least one clock input of the second proces- 30 sub14.-system .
The runtime state reload system of claim 9 , wherein ,
sor .
7. The parallel processing system of claim 1 , wherein , during into the
loading of the runtime state of the second sub - system
first sub - system the state reload circuitry is further
during loading of the runtime state of the second processor
into the first processor the state reload circuitry is further configured
sub - system
to alter a supply voltage of at least one of the first
and the second sub - system .
configured to alter a supply voltage of at least one of the first 35 15. A method for operating a parallel processing system
processor and the second processor. having at least three parallel processors, the method com
8. The parallel processing system of claim 1 , wherein
loading of the runtime state of the second processor into the prising :
monitoring runtime states of the at least three parallel
first processor includes invalidating memory data of the first processors ;
processor. 40
9. A runtime state reload system of a parallel processing identifying a first processor of the at least three parallel
system that includes at least three parallel processors , the processors having at least one runtime state error;
runtime state reload system comprising: selecting a second processor of the at least three parallel
state monitoring circuitry coupled to respective sub -sys processors for state reload ;
tems of the at least three parallel processors, the state 45 invalidatinga runtime
accessing state of the second processor;
local memory of the first processor ; and
monitoring circuitry configured to : loading the runtime state of the second processor into the
monitor runtime states of the respective sub -systems;
and first processor.
16.
identify a first sub - system of the respective sub - systems the at leastThe method of claim 15 , wherein the runtime states of
having at least one runtime state error; and 50
three parallel processors correspond to respective
state reload circuitry coupled to the respective sub - sys sub - systems of the at least three parallel processors .
tems , the state reload circuitry configured to : 17. The method of claim 16 , further comprising:
select a second sub - system of the respective sub using a scan chain of the second processor to access the
systems; runtime state of the second processor; and
access a runtime state of the second sub -system ; and 55
using a scan chain of the first processor to load the
load the runtime state of the second sub - system into the runtime state of the second processor into the first
first sub -system , processor.
wherein loading of the runtime state of the second 18. The method of claim 15 , further comprising altering
sub - system into the first sub - system includes at least at least one clock input of the first processor and at least one
one of: clock input of the second processor.
invalidating memory data of the first sub - system ;

You might also like