Professional Documents
Culture Documents
Recent Progress in The APL Fault Management Process: Kristin Fretz & Adrian Hill Jhu/Apl
Recent Progress in The APL Fault Management Process: Kristin Fretz & Adrian Hill Jhu/Apl
Finding:
§ “There is insufficient formality in the documentation of
FM designs and architectures, as well as a lack of
principles to guide the processes”
Recommendation:
§ “Identify representation techniques to improve the
design, implementation and review of FM systems”
§ “Establish a set of design guidelines to aid in FM design”
Implemented Change for Finding 4:
Insufficient Formality of Documentation
§ APL’s QMS has formal FM and Autonomy engineering
processes which define required documentation and
reviews
FM
Autonomy
FM
Architecture
Document
N/A
FM
Requirements
Document
Autonomy
Requirements
Specifica.on
FM
Design
Specifica.on
Autonomy
Design
Spec
&
Users
Guide
FM
Test
Plan
&
Verifica.on
Matrix
Autonomy
Acceptance
Test
Plan
FM
Test
Procedures
Autonomy
Acceptance
Test
Specifica.on
FM
Test
Reports
Autonomy
Acceptance
Test
Report
(includes
verifica.on
matrix)
§ RBSP FM and Autonomy developed system and
subsystem diagrams to communicate information
captured in architecture, requirements, and specifications
Implemented Change for Finding 4:
Insufficient Formality of Documentation
§ Example of Fault Management Modes Diagram used to convey interactions between
Hardware, Software, Autonomy, and Ground/Operations for managing spacecraft faults
Ground Commands
XCVR FSW
OPERATIONAL MODE self-initiated reset RF processor watchdog
•Normal power-positive conditions exist timeout
•Spacecraft is available for science data collection Nominal Operations Configurations
Communication C&DH FSW self-initiated reset
XCVR FSW
self-initiated reset RF processor watchdog
Power Low battery current not during eclipse or
current controller over-power Battery electronics over-power Payload Instrument power down timeout
request, instrument
RF Processor Reset heartbeat failure, or over-power
Avionics IEM processor
watchdog timeout Communication
Battery Discharge Battery Electronics Protection
Protection RF Electronics
Individual Establish
XCVR FSW defaults to RF Processor Reset
Power Off Disconnect Non-Optimal Instrument Safing Emergency IEM Non Power-On Reset
Current
Reset PSE Battery
Turn Battery
Battery Reboots Emergency RF Electronics
Interface Card Electronics Off Comm Recover Establish
Controller Electronics Maintenance Comm
Power Off
C&DH
Existing SSR Spin Pulse
Establish
Emergency
XCVR FSW defaults to
Emergency
FSW File System Management Reboots Emergency
Instrument Comm Comm
Reboots (3x max) Comm
Battery over-temperature Battery Temperature
returns to nominal SSPA over-power IEM Reset CCD Command SSPA over-power
Battery Over-Temperature Protection Battery Over-Temperature
Recovery Latch Valve IEM Reset via IEM CCD RF Electronics
Enable
G&C/Propulsion continuous current
RF Electronics Protection
Disable Set Charge Enable Over- Enable Over- Reset IEM Recover
Coulometer
Charge Control
Current to
Trickle
Temperature
Recovery
Coulometer
Charge
Temperature
Protection
Protection
Latch Valve
Continuous-Current
non-critical
modules*
Existing SSR
File System
Power Off
Control SSPA
Power Off Power-Off *Resets SBC, SSR (lose open files only), SCIF (non-critical) SWCLT expiration or
FSW off-pulse command Latch Valve Transceiver over-power
to PSE I/F card SSPA IEM off-pulse
Local
HWCLT Expiration
RF Electronics
LVS or LBSOC Off-pulse Establish
XCVR Softdefaults
LVS to MaxEmergency
Sun Angle Exceedance Hardware Command Loss Timer Sequence
Emergency
PDU Load Shed Sequence (3x max)
Comm
Comm
SAFE MODE PDU
Off-Pulse
XCVR
Off-Pulse
IEM
Off-Pulse
Reinitialize
File System
Switched Loads Separated •Protects against life-threatening low-power (Destructive)
PGS VT-only
OFF
or communication fault HWCLT Expiration
Separated
•No science data collection Separated
KEY
Not
Separated
Hardware Command Loss Timer Sequence
Go-Safe Sequence
Post-Separation Sequence Disable Disable C&DH/ BME, PSE Power on RF Load
System
Assert PDU
Autonomy Autonomy Post-Sep Macro Separated & Thruster Fire I/F Delete Inst CC, & Downlink & Resume SSR Re-enforcement
PDU Detects 3 of
4 Switches and
IEM Provides
Switch Status to
Detects 3 of 4 (Enable Safety Nominal -OR- Fault LVS or LBSOC & Power Off Time-tag Instruments Configure Emer Operations
Reinitialize (SS, HTRs,
Go-Safe
Relay KEY
Switches via IEM Buses, RF ON, Sun SA Deploy Macro PDU XCVR
Catbeds Commands IEMOFF Comm PSE I/F ON)
Removes Inhibit Autonomy
telemetry Sensor ON) File System Autonomy
Off-Pulse Off-Pulse Off-Pulse
Hardware Rule that triggers post-sep macro disabled by ground after confirmation that macro successfully completed
(Destructive)
Hardware
MOPS
MOPS Launch Vehicle Separation
FSW
LVS or
Launch Safe
FSW LBSOC
Configuration Configuration
Implemented Change for Finding 4:
Insufficient Formality of Documentation
§ Example of Autonomy Design Diagram used to convey implementation of monitors
and responses that comprise the on-board autonomy system for responding to faults
RF COMMUNICATIONS LAUNCH SAFE MODE DETECT SUN ANGLE MAX [R12]
If sun angle exceeds Maximum Sun Angle
SSR INITIALIZATION
[SV04] and either Mission Phase [SV03] is
DETECT SW CLT TIMEOUT [R15] DETECT ANY CDH REBOOT [R01] DETECT SEPARATION [R05] DETECT SA DEPLOY READY [R06] DETECT CDH REBOOT WITH SSR
ORBIT or Deployments Complete Flag
If time elapsed since If C&DH processor reboots for any If 3 (or more) of 4 switches indicate separation and If 3 (or more) of 4 switches indicate sep -AND- CONTENTS LOST [R04]
DETECT HARDWARE LVS [R10] [SV11] is TRUE
Time of Last SWCLT Restart reason and (spacecraft is separated Mission Phase [SV03] is LAUNCH for 5 seconds Mission Phase [SV03] is LAUNCH -AND- If C&DH processor reboots and SSR
If Hardware LVS detected and either
[SV01] exceeds SWCLT Interval or Mission Phase [SV03] is ORBIT) Time of Separation [SV05] is non-zero -AND- contents are lost (e.g., IEM Off-
Mission Phase [SV03] is ORBIT or
[SV02] and (spacecraft is separated ( time elapsed since Time of Separation [SV05] DETECT HARDWARE CLT [R13] Pulse) and SSR H/W Initialization of
Deployments Complete Flag [SV11] is
or Mission Phase [SV03] is ORBIT) exceeds SA Deploy Delay Interval [SV13] -OR- If C&DH processor reboots as result of HW its SDRAM is complete
TRUE
POST-SEPARATION SEQUENCE EXECUTIVE LVS detected -OR- LBSOC is detected ) CLT expiration (HWCLT Relay set) and
[M05] (spacecraft is separated or Mission Phase
q
RF COMMUNICATIONS
Set Time of Separation [SV05] to current DETECT LOW BATT SOC [R11] [SV03] is ORBIT)
POWER ON RF DOWNLINK PATH
POWER ON RF DOWNLINK PATH FSW uptime If LBSOC is detected and and either REINITIALIZE SSR RECORDING
AND OFF PULSE TRANSCEIVER
AND CONFIGURE EMERGENCY q
CALL PERFORM POST-SEPARATION Mission Phase [SV03] is ORBIT or [M04]
[M15] DETECT SOFTWARE LVS [R14]
COMM [M01] SEQUENCE [M07] SA DEPLOYMENTS EXECUTIVE [M06] Deployments Complete Flag [SV11] is q
Enable SSR memory scrub
q
Power on SSPA (A) If batt voltage is less than Software LVS
q
Power on SSPA (A) q
Sleep 20 seconds q
Sleep 60 seconds TRUE q
Create new SSR file system
q
CALL OFF PULSE
q
Sleep 5 seconds q
DETECT SW CLT TIMEOUT
CALL PERFORM POST-SEPARATION
[R15]
q
DETECT
CALL PERFORM X-AXIS SA DEPLOYMENTS [M08]
ANY CDH REBOOT [R01] Threshold [SV14] and either Mission Phase
(destructive)
TRANSCEIVER [M16] [SV03] is ORBIT or Deployments Complete
q
Disable baseband uplink
q
Issue transceiver firmware SEQUENCE If [M07]
time elapsed since q
Sleep 60 seconds If C&DH processor reboots for any Flag [SV11] is TRUE
q
Sleep 5 seconds
q
q
reset
Sleep 15 seconds
Time of Last SWCLT Restartqq
CALL PERFORM Y-AXIS SA DEPLOYMENTS [M13]
Sleep 60 seconds
reason and (spacecraft is separated CALL RECORD ENGINEERING
DATA TO SSR [M27]
DETECT TRANSCEIVER EXCESS
q
CALL CONFIGURE RF FOR [SV01] exceeds
PERFORM POST-SEPARATION SEQUENCE
SWCLT Interval
q
or
CALL PERFORM X-AXIS SA DEPLOYMENTS Mission
[M08] Phase [SV03] is ORBIT) PERFORM GOSAFE [M10]
EMERGENCY COMM [M19] q
Sleep 60 seconds q
CALL DISABLE THRUSTERS [M20]
POWER [R16] [M07] [SV02] and (spacecraft is separated q
CALL PERFORM Y-AXIS SA DEPLOYMENTS [M13] q
Delete instrument timetags
If transceiver power exceeds limits q
Enable RF safety buses (A&B) DETECT CDH REBOOT WITH SSR
q
or Mission Phase [SV03] is ORBIT)
Enable actuator safety buses (A&B) q
Set Deployments Complete Flag [SV11] to TRUE q
Suspend C&DH timetags at priorities 4-15
CONTENTS PRESERVED [R03]
q
CALL POWER OFF BATTERY
q
Enable propulsion safety buses (A&B) If C&DH processor reboots and SSR
MANAGEMENT ELECTRONICS [M42]
q
Enable battery conn relays (1&2) RECORD ENGINEERING DATA TO contents are preserved (subject to
q
Power off PSE current controller
CONFIGURE RF FOR q
Enable SAJB Relay at Reduced Curr (A&B) SSR [M27] maximum 3 attempts)
OFF PULSE TRANSCEIVER [M16] q
CALL SHUTDOWN PAYLOAD [M47]
EMERGENCY COMM [M19] q
Select VT Level #4 q
Open files on SSR to record
DisconnectPOWER ONrelay
RF DOWNLINK PATH
q
Off-pulse transceiver (h/w- PERFORM X-AXIS SA PERFORM Y-AXIS SA q
CALL POWER ON RF DOWNLINK PATH
q
Configure FSW tables for q
limited to 3 attempts)
BME cell bypass DEPLOYMENTS [M08] POWER ON[M13]
DEPLOYMENTS RF DOWNLINK PATH AND CONFIGURE EMERGENCY COMM engineering data (includes
emergency downlink rates q
CALL POWERAND ONOFF PULSEPATH
RF DOWNLINK TRANSCEIVERq
Fire +X primary / -X q
Fire +Y primary / -Y housekeeping, event message
q
Disable FSW playback AND CONFIGURE EMERGENCY COMM ANDredundant CONFIGURE EMERGENCY [M01]
and command status packets) RESUME SSR RECORDING [M03]
off pulse will
q
Sleep 30 seconds [M15] redundant actuators actuators q
CALL RECORD ENGINEERING DATA TO q
Enable SSR memory scrub
cause transceiver [M01] q
Sleep 5 seconds qCOMM
Sleep[M01]
5 seconds SSR [M27]
software reboot q
Configure transceiver for q
q
sensorPower
Power on sun on
electronics unitSSPA (A) q
Fire +X redundant / -X qq
FirePower
+Y redundant / -Y
q
Recover existing SSR file system
emergency uplink/downlink on SSPA (A) q
Power on propulsion module heaters q
Sleep 5 seconds
q
CALL OFF PULSE primary actuators primary actuators q
Power on sun sensor electronics unit
DETECT TRANSCEIVER *** This macro is disabled by default for I&T *** q
Sleep 5 seconds q
Power on bus survival heaters
q
CALL RECORD ENGINEERING
SOFTWARE REBOOT [R17] TRANSCEIVER [M16] RECORD INSTRUMENT DATA TO DATA TO SSR [M27]
DETECT SSPA EXCESS POWER q
Issue transceiver firmware q
Inhibit PDU Response to Low Battery State of SSR [M28] q
CALL RECORD INSTRUMENT
If transceiver flight software reboots
[R18] q
Disable baseband uplink Charge from PSE Interface Card
unexpectedly
If SSPA power exceeds limits
reset q
Power on PSE interface card
q
Open files on SSR to record DATA TO SSR [M28]
instrument data
q
Sleep 15 seconds q
Assert PDU Go-Safe relay
KEY:
Past
APL
Mission
#1
Past
APL
Mission
#2
Past
APL
Mission
#3
RBSP
Staff
Months