Defensive Programming in Functional Safety — IEC 61508
Defensive programming is a coding practice that anticipates invalid inputs, unexpected control flow, and anomalous states — then enforces predefined safe reactions at runtime. It is a key technique for controlling systematic failures in safety-related software under IEC 61508.
What is defensive programming?
Defensive programming deliberately assumes things will go wrong. Instead of trusting inputs, timing, or state transitions, the code checks them and responds in a controlled way. Typical measures include:
- Range and plausibility checks
- Parameter type and dimension checks
- Sanity guards on state machines
- Safe defaults
The aim is predictable behaviour under abnormal conditions — and evidence that the safety function will not act on bad data.
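As a minimal illustration of a sanity guard with a safe default, a state variable read from memory or a bus can be validated before use (a sketch; the mode names and values are illustrative, not taken from any standard):

```c
/* Illustrative operating modes for a hypothetical actuator */
typedef enum { MODE_IDLE = 0, MODE_RUN = 1, MODE_CALIBRATE = 2, MODE_SAFE = 3 } op_mode_t;

/* Sanity guard: any value outside the defined modes falls back to the
 * safe default, so a corrupted state word can never select an undefined branch. */
op_mode_t validate_mode(int raw_mode)
{
    switch (raw_mode) {
    case MODE_IDLE:
    case MODE_RUN:
    case MODE_CALIBRATE:
        return (op_mode_t)raw_mode; /* known, allowed mode */
    default:
        return MODE_SAFE;           /* safe default for unknown/corrupted state */
    }
}
```

The explicit `default` branch is the defensive element: every unanticipated value maps to one predictable, reviewable reaction.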
How it supports functional safety
Defensive programming helps control systematic failures that stem from design mistakes, coding errors, or misunderstood requirements. It detects anomalies at runtime and forces predefined outcomes.
It also helps surface manifestations of random or common-cause hardware faults when they appear as corrupted or implausible data — so the safety function does not silently operate on invalid information.
The key question is: if bad data reaches the safety function, what happens — and can you prove it?
When to use
- Safety-related inputs/outputs where out-of-range or inconsistent values could lead to hazards
- Interfaces across trust boundaries (sensor buses, comms frames, configuration files, HMI inputs)
- State machines, control loops, or mode logic sensitive to timing, ordering, or missing steps
Inputs and outputs
Inputs
- Sensor readings, commands, configuration parameters
- Control-flow context (mode, state, sequence/heartbeat, timeouts)
- Design constraints (valid ranges, units, rates of change, pre/postconditions)
Outputs
- Validated or clamped values, or rejected inputs with traceable diagnostics
- Safe reactions (hold last safe value, degrade mode, safe-state actuation)
- Logs and counters for anomaly detection and verification evidence
Procedure
- Derive checks from hazards and requirements. Identify variables, interfaces, states, and timing that can cause harm if wrong. Define ranges, units, invariants, and allowable transitions.
- Design explicit safe reactions. For each detected anomaly, specify what the software must do — discard, clamp, substitute last safe value, degrade, or enter a defined safe state — and how it recovers.
- Implement systematically. Apply range/type/dimension checks, plausibility guards (rate-of-change, cross-sensor consistency), control-flow guards (sequence counters, watchdogs, timeouts), and fail-safe defaults.
- Make it observable. Add deterministic error codes, health counters, and bounded logging so anomalies are auditable without flooding.
- Verify abnormal cases. Unit and integration tests must exercise boundary and fault-injected scenarios. Demonstrate deterministic safe reactions and no unintended side effects.
- Review and analyse. Peer reviews and static analysis (e.g. MISRA-style rules) confirm all critical paths are guarded and reactions are reachable and testable.
- Trace and maintain. Trace each check to a requirement and hazard. Update checks and tests when operating ranges or assumptions change.
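As one illustration of the "implement systematically" step, a rolling sequence counter can detect lost, repeated, or re-ordered frames on a communication link (a sketch; the frame layout and resynchronisation policy are assumptions, and the caller is expected to bind a failed check to a concrete safe reaction):

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical frame carrying an 8-bit rolling sequence counter. */
typedef struct { uint8_t seq; int16_t payload; } frame_t;

static uint8_t expected_seq = 0;

/* Control-flow guard: accept a frame only if its counter matches the
 * expected value; otherwise report a sequence anomaly so the caller can
 * apply its safe reaction (e.g. discard and count towards a safe-state trigger). */
bool check_sequence(const frame_t *f)
{
    if (f->seq != expected_seq) {
        expected_seq = (uint8_t)(f->seq + 1); /* resynchronise after reporting */
        return false; /* anomaly: lost, repeated or re-ordered frame */
    }
    expected_seq = (uint8_t)(expected_seq + 1); /* wraps naturally at 255 */
    return true;
}
```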
Worked example — variable-speed drive
A variable-speed drive receives a target speed from a supervisory PLC. If the command is out of the certified operating envelope or jumps too quickly, the drive must reject it and hold the last safe value. After repeated anomalies it must enter a safe torque-off state.
Code-level example
/* Pseudo-C: defensive handling of a commanded speed */
#include <stdlib.h> /* abs() */

#define SPEED_MIN_KMH 0
#define SPEED_MAX_KMH 200
#define MAX_DELTA_KMH 20 /* max allowed change per cycle */
#define MAX_BAD_COUNT 3  /* anomalies tolerated before safe state */

/* Provided elsewhere by the drive firmware */
enum { ERR_RANGE = 1, ERR_RATE = 2 };
extern void log_error(int code, int value);
extern void enter_safe_state(void); /* e.g. safe torque-off */
extern void set_drive_speed(int speed_kmh);

static int last_safe_speed = 0;
static int bad_counter = 0;

int apply_commanded_speed(int cmd_speed)
{
    /* Range check (km/h integer) */
    if (cmd_speed < SPEED_MIN_KMH || cmd_speed > SPEED_MAX_KMH) {
        log_error(ERR_RANGE, cmd_speed);
        if (++bad_counter >= MAX_BAD_COUNT) {
            enter_safe_state();   /* SAFE REACTION: transition to defined safe state */
        }
        return last_safe_speed;   /* SAFE REACTION: hold last known safe value */
    }

    /* Plausibility: rate-of-change limit (protect mechanics) */
    if (abs(cmd_speed - last_safe_speed) > MAX_DELTA_KMH) {
        log_error(ERR_RATE, cmd_speed);
        if (++bad_counter >= MAX_BAD_COUNT) {
            enter_safe_state();   /* repeated anomalies also force the safe state */
        }
        return last_safe_speed;   /* SAFE REACTION: reject transient spike */
    }

    /* Good command: reset health counter and actuate */
    bad_counter = 0;
    last_safe_speed = cmd_speed;
    set_drive_speed(cmd_speed);
    return cmd_speed;
}
Result: The actuator never receives a hazardous command value or slew, and repeated anomalies deterministically force a safe state.
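To verify the abnormal cases, a fault-injection unit test can drive the handler with out-of-range and fast-slewing commands and assert the safe reactions. Below is a self-contained sketch: a minimal stand-in that mirrors the safe reactions of the example above, with the drive and safe-state calls stubbed so they can be asserted (all names here are illustrative test scaffolding):

```c
#include <stdlib.h>

#define SPEED_MAX 200
#define MAX_DELTA 20
#define MAX_BAD   3

static int last_safe = 0, bad = 0, safe_state_entered = 0;

/* Stub for the real safe-state transition, recorded for assertion */
static void enter_safe_state_stub(void) { safe_state_entered = 1; }

/* Minimal stand-in for the speed handler, same guard logic */
static int handle(int cmd)
{
    if (cmd < 0 || cmd > SPEED_MAX || abs(cmd - last_safe) > MAX_DELTA) {
        if (++bad >= MAX_BAD) enter_safe_state_stub();
        return last_safe; /* hold last safe value */
    }
    bad = 0;
    last_safe = cmd;
    return cmd;
}
```

A test then injects three consecutive anomalies and checks that the output is held each time and the safe state is entered on the third, giving direct evidence for the deterministic reaction claimed above.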
Quality criteria
- Completeness: All safety-relevant interfaces and states have defined checks and mapped safe reactions.
- Determinism: Reactions are bounded in time, free of undefined behaviour, and leave the system in a known state.
- Traceability and test evidence: Each check is traced to a requirement and hazard, and covered by tests including boundary and injected-fault cases.
- Clarity: Guard code is readable with no hidden side effects. Logs and diagnostics are unambiguous and rate-limited.
- Acceptable overhead: CPU, memory, and latency impact are measured and shown acceptable for the safety function.
Common pitfalls
"Detect but do nothing"
Checks raise flags without enforcing safe behaviour.
Mitigation: Bind each check to a concrete safe reaction and test it.
Inconsistent units or scaling
Comparing values in different units defeats plausibility checks entirely.
Mitigation: Centralise unit conversions and assert on interface contracts.
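One way to centralise conversions is to fix a single internal unit and convert only at the interface boundary, so every plausibility check compares like with like (a sketch; the 0.1 km/h tick resolution and helper names are illustrative):

```c
/* Internal unit: all speeds are held in 0.1 km/h ticks.
 * Conversions happen once, at the interface boundary. */
typedef int speed_ticks_t; /* 0.1 km/h per tick */

static speed_ticks_t from_kmh(int kmh) { return kmh * 10; }
static speed_ticks_t from_mph(int mph) { return (mph * 16093) / 1000; } /* 1 mph = 1.6093 km/h */
```

Internal code never sees km/h or mph directly, which removes the class of faults where two call sites compare values in different units.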
Excessive complexity
Too many ad-hoc checks become a maintenance risk and a source of new systematic faults.
Mitigation: Prioritise by hazard and consolidate into reusable guards.
Assuming development asserts survive to production
Assertions may be compiled out in release builds, silently removing safety-critical checks.
Mitigation: Implement runtime guards that remain active in production builds.
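A common pattern is to keep a runtime guard macro that is never compiled out, separate from the development-only `assert` (a sketch; the macro name, example function, and reaction are illustrative):

```c
#include <assert.h>

static int guard_trips = 0;
static void safe_reaction(void) { guard_trips++; /* e.g. hold output, raise diagnostic */ }

/* Unlike assert(), SAFETY_GUARD stays active regardless of NDEBUG. */
#define SAFETY_GUARD(cond) do { if (!(cond)) safe_reaction(); } while (0)

int clamp_duty_cycle(int pct)
{
    assert(pct >= -1000 && pct <= 1000);  /* development aid only; may vanish in release */
    SAFETY_GUARD(pct >= 0 && pct <= 100); /* production check with a defined reaction */
    if (pct < 0)   return 0;
    if (pct > 100) return 100;
    return pct;
}
```

The design point is that the production guard has a defined reaction (here, clamping plus a diagnostic counter), whereas `assert` merely aborts and only when `NDEBUG` is not defined.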
Silent clamping without traceability
Hiding errors impedes diagnosis and removes evidence that the system detected a problem.
Mitigation: Log bounded, meaningful diagnostics with error codes.
No recovery path
The system enters a safe state but cannot resume safely.
Mitigation: Design and verify recovery criteria and procedures.
Frequently asked questions
Is defensive programming mandatory in IEC 61508?
It is a recommended technique in IEC 61508-3 Table A.4. Depending on SIL, architecture, and hazard analysis, using it — or equivalent measures — is often expected to achieve acceptable residual risk.
How do I justify the runtime overhead?
Prioritise guards by hazard, measure CPU/memory/latency impact, and provide test evidence that overhead is bounded and acceptable for the safety function.
Is this the same as exception handling?
No. Exception handling reacts after an error has already been raised. Defensive programming validates inputs and state up front and enforces a safe reaction even where no exception would ever be thrown.
Can it improve diagnostic coverage (DC)?
Yes. While aimed at systematic failures, checks such as plausibility and timeouts can detect effects of random faults in data paths, contributing evidence towards DC claims.
Related techniques
- Range and plausibility checks — core checks often implemented as part of defensive programming
- Control-flow monitoring — complementary technique to detect sequence/order anomalies and task overruns
References
- IEC 61508-3:2010 — especially Annex A, Table A.4; IEC 61508-7:2010, C.2.5 (defensive programming)
- ISO 26262-6:2018 — Road vehicles — Product development at the software level
- J.E. Cooling, Software Engineering for Real-time Systems (2003)
Go deeper — IEC 61508 Certification Course
Our IEC 61508 course covers software safety techniques, architectural design, SIL verification, and safety case preparation — for engineers building safety-related systems.