Diverse monitor techniques (with independence between the monitor and the monitored function in the same computer)
What it is, why it matters for functional safety, and exactly how to apply it.
What is it?
A diverse monitor is a supervisory function that continuously checks a main (monitored) function against simple, safety-oriented rules (invariants). Unlike the main controller, it derives from a distinct specification and is realized using a different language, toolchain, and runtime partition. The monitor does not prove full correctness; instead, it ensures the system does not enter or remain in an unsafe state and, when needed, actively enforces a safe reaction.
How it supports functional safety
The technique targets systematic failures in requirements, design, and implementation by adding an independently specified checker that validates outputs, timing, and key assumptions. Because it observes real data and time, it can also detect manifestations of random/common-cause hardware faults (e.g., stuck outputs, clock drift) and prevent unsafe propagation. The monitor must have authority to inhibit or override the main function so the safety function never silently acts on unsafe or corrupted information.
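A minimal sketch of that check-then-enforce pattern is shown below; all helper names (read_candidate, invariants_hold, open_gate_once, force_safe_state, service_watchdog, wait_cycle_ms) are placeholders, not a real API, and the real checks are application-specific.
/* Generic diverse-monitor cycle (sketch; every helper below is a placeholder). */
#include <stdbool.h>
#include <stdint.h>

extern bool read_candidate(float *cmd);   /* one-way data from the monitored function */
extern bool invariants_hold(float cmd);   /* bounds, rate, and plausibility checks */
extern void open_gate_once(void);         /* per-cycle permission to actuate */
extern void force_safe_state(void);       /* decisive safe reaction (inhibit/override) */
extern void service_watchdog(void);       /* windowed watchdog owned by the monitor */
extern void wait_cycle_ms(uint32_t ms);   /* independent time base */

void diverse_monitor_cycle(void)
{
    float cmd;
    if (read_candidate(&cmd) && invariants_hold(cmd)) {
        open_gate_once();      /* the main function may actuate for this cycle only */
        service_watchdog();    /* only healthy monitor cycles kick the watchdog */
    } else {
        force_safe_state();    /* never act on unsafe or missing data */
    }
    wait_cycle_ms(10);
}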
When to use
- SIL 2–4 software where residual specification/implementation faults in the controller are credible and consequences are severe.
- Complex or novel control algorithms that are hard to verify exhaustively, where independence is required but a second computer is impractical.
- When timing integrity matters (e.g., rate limits, watchdog windows) and you need an independent time base plus a decisive safe reaction path.
Inputs & Outputs
Inputs
- Candidate setpoints or key internal states from the main function (one-way, integrity-protected interface).
- Independent references: timer/clock domain, select sensors (read-only), configuration limits.
Outputs
- Permission gate for actuation (opened per cycle upon successful checks).
- Safe-state commands (e.g., de-energize, force pump off) and diagnostic flags/events.
Procedure
- Define safety invariants: Translate hazards into bounds, temporal guards (rates, timeouts), and plausibility checks that are distinct from the control law.
- Engineer independence: Separate specification; different implementation approach (e.g., C monitor vs. C++ controller); diverse toolchains; memory/MPU partitioning; independent timer/clock.
- Design the authority path: Give the monitor exclusive control of a permission gate and/or the final safety output so it can enforce a safe state regardless of main behavior.
- Harden the interface: One-way main→monitor data with sequence counters and CRC; avoid callbacks and shared mutable state (a minimal mailbox sketch follows this list).
- Supervise timing: The monitor owns the windowed watchdog; loss of monitor activity must drop the system to a safe state.
- Verify and validate: Fault-inject specification and timing errors, demonstrate independence (build/report evidence), and prove deterministic safe reactions.
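As a sketch of the "Harden the interface" step, the one-way channel can be a single-writer/single-reader mailbox; the candidate_t frame and MonitorIF_* names mirror the C side of the worked example below, while crc32() and the shared-memory placement are assumptions. A torn or corrupted read is caught by the monitor's CRC check, and stale or replayed frames by the monotonic sequence counter.
/* Sketch: one-way mailbox from the main function (single writer) to the monitor
   (single reader). Integrity is verified by the monitor, not here. A real design
   would also pin g_mailbox in a region the monitor can only read. */
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

typedef struct { uint32_t seq; float cmd; uint32_t crc32; } candidate_t;

extern uint32_t crc32(const void *data, size_t len);   /* assumed to be available */

static candidate_t g_mailbox;   /* writable by the controller partition only */

/* Producer side (main function): publish and return; no blocking, no callbacks. */
void MonitorIF_publish(const candidate_t *f)
{
    memcpy(&g_mailbox, f, sizeof(*f));
}

/* Consumer side (monitor): copy out and report whether the frame is new.
   CRC and monotonic-sequence checks are done by the monitor loop itself. */
bool MonitorIF_try_read(candidate_t *out)
{
    static uint32_t last_delivered_seq = 0;
    memcpy(out, &g_mailbox, sizeof(*out));
    bool fresh = (out->seq != last_delivered_seq);
    last_delivered_seq = out->seq;
    return fresh;
}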
Worked Example
High-level
A chemical dosing controller computes a pump flow setpoint. A diverse monitor (independent spec) checks: (1) absolute flow bound, (2) ramp rate per 10 ms, (3) commanded vs measured plausibility, and (4) interface integrity. The monitor alone can open a per-cycle gate to actuate; otherwise it forces the pump off.
Code-level
// --- pump_controller.cpp (C++, Toolchain A) ---
// Publishes candidate setpoints only; does not actuate without a gate.
struct Candidate { uint32_t seq; float cmd; uint32_t crc32; };
void PumpControllerTask() {
    static uint32_t seq = 0;
    for (;;) {
        float demand = IO::readDemand();
        float temp = IO::readTemperature();
        float cmd = compute_command(demand, temp); // complex control law
        Candidate f{ ++seq, cmd, 0 };
        f.crc32 = crc32(&f, sizeof(f) - sizeof(f.crc32));
        MonitorIF::publish(f); // one-way to monitor
        if (MonitorIF::gate_open_once()) {
            IO::setPumpFlow(cmd); // actuate only if gate opened
        }
        rtos::delay_ms(10);
    }
}
// --- safety_monitor.c (C, Toolchain B) ---
// Independent spec: invariants + temporal guards; owns watchdog & safety output.
typedef struct { uint32_t seq; float cmd; uint32_t crc32; } candidate_t;
#define MAX_SAFE_FLOW 100.0f
#define MAX_RAMP_PER10 10.0f // mg/s per 10 ms window
static float last_cmd = 0.0f;
static uint32_t last_seq = 0;
void SafetyMonitorTask(void) {
    lptim_start(); // independent clock domain
    wdt_config_windowed(); // only the monitor services the watchdog
    for (;;) {
        candidate_t f;
        if (MonitorIF_try_read(&f)) {
            // Integrity: sequence & CRC
            if (!crc_ok(&f, sizeof(f)) || f.seq <= last_seq) {
                MonitorIF_close_gate();
                IO_forcePumpOff(); // SAFE REACTION: invalid frame
                Log_warn("SM: integrity fail");
                continue;
            }
            uint32_t dt = lptim_elapsed_ms();
            float max_step = (MAX_RAMP_PER10 * (dt / 10.0f));
            // Invariant 1: absolute bound
            if (f.cmd > MAX_SAFE_FLOW) {
                MonitorIF_close_gate();
                IO_forcePumpOff(); // SAFE REACTION: bound violated
                Log_warn("SM: abs bound");
                continue;
            }
            // Invariant 2: rate limit
            if ((f.cmd - last_cmd) > max_step) {
                MonitorIF_close_gate();
                IO_forcePumpOff(); // SAFE REACTION: ramp exceeded
                Log_warn("SM: ramp");
                continue;
            }
            // Invariant 3: plausibility vs. measured flow (read-only)
            float pv = IO_readMeasuredFlow();
            if (f.cmd > 0 && pv <= 0) {
                MonitorIF_close_gate();
                IO_forcePumpOff(); // SAFE REACTION: cmd/pv mismatch
                Log_warn("SM: plausibility");
                continue;
            }
            // All checks passed: open gate for this cycle only
            MonitorIF_pulse_gate();
            last_cmd = f.cmd;
            last_seq = f.seq;
            wdt_kick(); // only the monitor kicks the watchdog
        }
        lptim_delay_ms(10);
    }
}
// --- boot_platform.c (conceptual) ---
// MPU/ownership: monitor has exclusive write access to safety output.
mpu_allow_write(SAFETY_DO_BASE, SAFETY_DO_SIZE, TASK_SAFETY_MONITOR);
mpu_deny_write (SAFETY_DO_BASE, SAFETY_DO_SIZE, TASK_MAIN_CTRL);
// Hardware wiring: SAFE_OFF line is driven only by monitor context.
// If monitor stalls and WDT expires -> hardware drops to safe.
Result: Even if the main controller has a systematic fault, unsafe commands are blocked; timing anomalies or interface corruption trigger a deterministic SAFE REACTION (pump de-energized).
Quality criteria
- Independence evidence: Distinct specification; different language/toolchain; separate memory, stacks, and RTOS tasking; independent time base.
- Authority: Monitor controls a permission gate and/or exclusive safety output path that the main function cannot override.
- Simplicity: Monitor logic limited to clear invariants and temporal guards; easy to test and justify.
- Interface integrity: One-way data flow with sequence counters and CRC; no shared mutable state.
- Deterministic reactions: SAFE REACTION defined, implemented, and verified (incl. watchdog ownership by the monitor).
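Evidence for the deterministic-reaction criterion can be gathered with scripted fault injection. The sketch below assumes the monitor loop body has been factored into a callable monitor_step() and that host-build test doubles (test_inject_candidate, test_gate_was_pulsed, test_pump_forced_off) replace the real MonitorIF/IO; all of these names are hypothetical.
/* Sketch of a fault-injection check (host-based test doubles; names assumed). */
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

extern void monitor_step(void);                 /* one iteration of SafetyMonitorTask */
extern void test_inject_candidate(uint32_t seq, float cmd, bool corrupt_crc);
extern bool test_gate_was_pulsed(void);
extern bool test_pump_forced_off(void);

static void expect_safe_reaction(void)
{
    assert(!test_gate_was_pulsed());   /* actuation must stay blocked */
    assert(test_pump_forced_off());    /* pump must be driven to the safe state */
}

void test_monitor_blocks_unsafe_commands(void)
{
    test_inject_candidate(1u, 150.0f, false);   /* above MAX_SAFE_FLOW */
    monitor_step();
    expect_safe_reaction();

    test_inject_candidate(2u, 50.0f, true);     /* corrupted CRC */
    monitor_step();
    expect_safe_reaction();
}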
Common pitfalls
- Hidden coupling (shared libraries, callbacks) — Mitigation: enforce one-way interface; static linking separation; code reviews for coupling.
- Monitor without authority — Mitigation: give exclusive ownership of the safety output and prove it via MPU and wiring diagrams.
- Over-complex monitor — Mitigation: keep to bounds, plausibility, and timing checks; avoid re-implementing the controller.
- No independent time base — Mitigation: use a separate timer/clock domain and a windowed watchdog serviced only by the monitor.
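For the last mitigation, a sketch of monitor-owned timing supervision in the spirit of the wdt_*/lptim_* calls in the worked example; wdt_config_window_ms, the window bounds, and the drift tolerance are assumptions.
/* Sketch: independent time base plus windowed watchdog (names/values illustrative). */
#include <stdbool.h>
#include <stdint.h>

extern void     wdt_config_window_ms(uint32_t min_ms, uint32_t max_ms); /* hypothetical */
extern uint32_t lptim_elapsed_ms(void);   /* independent low-power timer domain */

#define CYCLE_MS     10u
#define DRIFT_TOL_MS 2u

void timing_supervision_init(void)
{
    /* A kick is accepted only inside a window around the nominal 10 ms cycle;
       too early (runaway loop) or too late (stall) both trip the watchdog. */
    wdt_config_window_ms(CYCLE_MS - 2u, CYCLE_MS + 5u);
}

/* Cross-check the main scheduler's elapsed time against the independent timer. */
bool time_bases_agree(uint32_t scheduler_dt_ms)
{
    uint32_t independent_dt_ms = lptim_elapsed_ms();
    uint32_t diff = (scheduler_dt_ms > independent_dt_ms)
                        ? scheduler_dt_ms - independent_dt_ms
                        : independent_dt_ms - scheduler_dt_ms;
    return diff <= DRIFT_TOL_MS;   /* detects clock drift or a stuck timer */
}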
FAQ
Does the monitor prove the controller is correct?
No. It ensures the system does not become unsafe. The controller may still be sub-optimal; the monitor’s role is to block hazardous behavior.
Can both functions share an OS?
Yes, if you can prove independence: separate tasks, MPU partitions, one-way interface, and a monitor-owned watchdog and output authority.
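A conceptual sketch of that arrangement, in the same spirit as boot_platform.c above; the rtos_create_task/mpu_* calls, task IDs, priorities, stack sizes, and region values are all placeholders for the target platform.
/* Sketch (conceptual): one shared RTOS, two enforced partitions. Every identifier
   and address below is a placeholder for the target's task and MPU API. */
#include <stdint.h>

enum { TASK_MAIN_CTRL = 1, TASK_SAFETY_MONITOR = 2, PRIO_LOW = 1, PRIO_HIGH = 3 };

#define MAILBOX_BASE   0x20040000u   /* placeholder values; real ones come from */
#define MAILBOX_SIZE   0x100u        /* the board support package / linker script */
#define SAFETY_DO_BASE 0x40020000u
#define SAFETY_DO_SIZE 0x10u

extern void PumpControllerTask(void);
extern void SafetyMonitorTask(void);
extern void rtos_create_task(int id, void (*entry)(void),
                             uint8_t *stack, uint32_t stack_size, int prio);
extern void mpu_allow_write(uintptr_t base, uint32_t size, int id);
extern void mpu_deny_write(uintptr_t base, uint32_t size, int id);

static uint8_t ctrl_stack[1024];   /* separate stacks, no sharing */
static uint8_t mon_stack[1024];

void platform_partition_init(void)
{
    /* Separate tasks and priorities; the monitor preempts the controller. */
    rtos_create_task(TASK_MAIN_CTRL,      PumpControllerTask, ctrl_stack, sizeof ctrl_stack, PRIO_LOW);
    rtos_create_task(TASK_SAFETY_MONITOR, SafetyMonitorTask,  mon_stack,  sizeof mon_stack,  PRIO_HIGH);

    /* One-way mailbox: the controller writes, the monitor can only read. */
    mpu_allow_write(MAILBOX_BASE, MAILBOX_SIZE, TASK_MAIN_CTRL);
    mpu_deny_write (MAILBOX_BASE, MAILBOX_SIZE, TASK_SAFETY_MONITOR);

    /* Safety output and watchdog belong to the monitor alone (see boot_platform.c). */
    mpu_allow_write(SAFETY_DO_BASE, SAFETY_DO_SIZE, TASK_SAFETY_MONITOR);
    mpu_deny_write (SAFETY_DO_BASE, SAFETY_DO_SIZE, TASK_MAIN_CTRL);
}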