Defensive Programming in Functional Safety — IEC 61508
Defensive programming is a coding practice that anticipates invalid inputs, unexpected control flow, and anomalous states — then enforces predefined safe reactions at runtime. It is a key technique for controlling systematic failures in safety-related software under IEC 61508.
What is defensive programming?
Defensive programming deliberately assumes things will go wrong. Instead of trusting inputs, timing, or state transitions, the code checks them and responds in a controlled way. Typical measures include:
- Range and plausibility checks
- Parameter type and dimension checks
- Sanity guards on state machines
- Safe defaults
The aim is predictable behaviour under abnormal conditions — and evidence that the safety function will not act on bad data.
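As a minimal illustration of a sanity guard with a safe default, a state variable read from memory or a bus can be validated before use (a sketch; the mode names and values are illustrative, not taken from any standard):

```c
/* Illustrative operating modes for a hypothetical actuator */
typedef enum { MODE_IDLE = 0, MODE_RUN = 1, MODE_CALIBRATE = 2, MODE_SAFE = 3 } op_mode_t;

/* Sanity guard: any value outside the defined modes falls back to the
 * safe default, so a corrupted state word can never select an undefined branch. */
op_mode_t validate_mode(int raw_mode)
{
    switch (raw_mode) {
    case MODE_IDLE:
    case MODE_RUN:
    case MODE_CALIBRATE:
        return (op_mode_t)raw_mode; /* known, allowed mode */
    default:
        return MODE_SAFE;           /* safe default for unknown/corrupted state */
    }
}
```

The explicit `default` branch is the defensive element: every unanticipated value maps to one predictable, reviewable reaction.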
How it supports functional safety
Defensive programming helps control systematic failures that stem from design mistakes, coding errors, or misunderstood requirements. It detects anomalies at runtime and forces predefined outcomes.
It also helps surface manifestations of random or common-cause hardware faults when they appear as corrupted or implausible data — so the safety function does not silently operate on invalid information.
The key question is: if bad data reaches the safety function, what happens — and can you prove it?
When to use
- Safety-related inputs/outputs where out-of-range or inconsistent values could lead to hazards
- Interfaces across trust boundaries (sensor buses, comms frames, configuration files, HMI inputs)
- State machines, control loops, or mode logic sensitive to timing, ordering, or missing steps
Inputs and outputs
Inputs
- Sensor readings, commands, configuration parameters
- Control-flow context (mode, state, sequence/heartbeat, timeouts)
- Design constraints (valid ranges, units, rates of change, pre/postconditions)
Outputs
- Validated or clamped values, or rejected inputs with traceable diagnostics
- Safe reactions (hold last safe value, degrade mode, safe-state actuation)
- Logs and counters for anomaly detection and verification evidence
Procedure
- Derive checks from hazards and requirements. Identify variables, interfaces, states, and timing that can cause harm if wrong. Define ranges, units, invariants, and allowable transitions.
- Design explicit safe reactions. For each detected anomaly, specify what the software must do — discard, clamp, substitute last safe value, degrade, or enter a defined safe state — and how it recovers.
- Implement systematically. Apply range/type/dimension checks, plausibility guards (rate-of-change, cross-sensor consistency), control-flow guards (sequence counters, watchdogs, timeouts), and fail-safe defaults.
- Make it observable. Add deterministic error codes, health counters, and bounded logging so anomalies are auditable without flooding.
- Verify abnormal cases. Unit and integration tests must exercise boundary and fault-injected scenarios. Demonstrate deterministic safe reactions and no unintended side effects.
- Review and analyse. Peer reviews and static analysis (e.g. MISRA-style rules) confirm all critical paths are guarded and reactions are reachable and testable.
- Trace and maintain. Trace each check to a requirement and hazard. Update checks and tests when operating ranges or assumptions change.
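As one illustration of the "implement systematically" step, a rolling sequence counter can detect lost, repeated, or re-ordered frames on a communication link (a sketch; the frame layout and resynchronisation policy are assumptions, and the caller is expected to bind a failed check to a concrete safe reaction):

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical frame carrying an 8-bit rolling sequence counter. */
typedef struct { uint8_t seq; int16_t payload; } frame_t;

static uint8_t expected_seq = 0;

/* Control-flow guard: accept a frame only if its counter matches the
 * expected value; otherwise report a sequence anomaly so the caller can
 * apply its safe reaction (e.g. discard and count towards a safe-state trigger). */
bool check_sequence(const frame_t *f)
{
    if (f->seq != expected_seq) {
        expected_seq = (uint8_t)(f->seq + 1); /* resynchronise after reporting */
        return false; /* anomaly: lost, repeated or re-ordered frame */
    }
    expected_seq = (uint8_t)(expected_seq + 1); /* wraps naturally at 255 */
    return true;
}
```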
Worked example — variable-speed drive
A variable-speed drive receives a target speed from a supervisory PLC. If the command is out of the certified operating envelope or jumps too quickly, the drive must reject it and hold the last safe value. After repeated anomalies it must enter a safe torque-off state.
Code-level example
/* Pseudo-C: defensive handling of a commanded speed */
#include <stdlib.h> /* abs() */

#define SPEED_MIN_KMH 0
#define SPEED_MAX_KMH 200
#define MAX_DELTA_KMH 20 /* max allowed change per cycle */
#define MAX_BAD_COUNT 3  /* anomalies tolerated before safe state */

/* Provided elsewhere by the drive firmware */
enum { ERR_RANGE = 1, ERR_RATE = 2 };
extern void log_error(int code, int value);
extern void enter_safe_state(void); /* e.g. safe torque-off */
extern void set_drive_speed(int speed_kmh);

static int last_safe_speed = 0;
static int bad_counter = 0;

int apply_commanded_speed(int cmd_speed)
{
    /* Range check (km/h integer) */
    if (cmd_speed < SPEED_MIN_KMH || cmd_speed > SPEED_MAX_KMH) {
        log_error(ERR_RANGE, cmd_speed);
        if (++bad_counter >= MAX_BAD_COUNT) {
            enter_safe_state();   /* SAFE REACTION: transition to defined safe state */
        }
        return last_safe_speed;   /* SAFE REACTION: hold last known safe value */
    }

    /* Plausibility: rate-of-change limit (protect mechanics) */
    if (abs(cmd_speed - last_safe_speed) > MAX_DELTA_KMH) {
        log_error(ERR_RATE, cmd_speed);
        if (++bad_counter >= MAX_BAD_COUNT) {
            enter_safe_state();   /* repeated anomalies also force the safe state */
        }
        return last_safe_speed;   /* SAFE REACTION: reject transient spike */
    }

    /* Good command: reset health counter and actuate */
    bad_counter = 0;
    last_safe_speed = cmd_speed;
    set_drive_speed(cmd_speed);
    return cmd_speed;
}
Result: The actuator never receives a hazardous command value or slew, and repeated anomalies deterministically force a safe state.
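To verify the abnormal cases, a fault-injection unit test can drive the handler with out-of-range and fast-slewing commands and assert the safe reactions. Below is a self-contained sketch: a minimal stand-in that mirrors the safe reactions of the example above, with the drive and safe-state calls stubbed so they can be asserted (all names here are illustrative test scaffolding):

```c
#include <stdlib.h>

#define SPEED_MAX 200
#define MAX_DELTA 20
#define MAX_BAD   3

static int last_safe = 0, bad = 0, safe_state_entered = 0;

/* Stub for the real safe-state transition, recorded for assertion */
static void enter_safe_state_stub(void) { safe_state_entered = 1; }

/* Minimal stand-in for the speed handler, same guard logic */
static int handle(int cmd)
{
    if (cmd < 0 || cmd > SPEED_MAX || abs(cmd - last_safe) > MAX_DELTA) {
        if (++bad >= MAX_BAD) enter_safe_state_stub();
        return last_safe; /* hold last safe value */
    }
    bad = 0;
    last_safe = cmd;
    return cmd;
}
```

A test then injects three consecutive anomalies and checks that the output is held each time and the safe state is entered on the third, giving direct evidence for the deterministic reaction claimed above.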
Quality criteria
- Completeness: All safety-relevant interfaces and states have defined checks and mapped safe reactions.
- Determinism: Reactions are bounded in time, free of undefined behaviour, and leave the system in a known state.
- Traceability and test evidence: Each check is traced to a requirement and hazard, and covered by tests including boundary and injected-fault cases.
- Clarity: Guard code is readable with no hidden side effects. Logs and diagnostics are unambiguous and rate-limited.
- Acceptable overhead: CPU, memory, and latency impact are measured and shown acceptable for the safety function.
Common pitfalls
"Detect but do nothing"
Checks raise flags without enforcing safe behaviour.
Mitigation: Bind each check to a concrete safe reaction and test it.
Inconsistent units or scaling
Comparing values in different units defeats plausibility checks entirely.
Mitigation: Centralise unit conversions and assert on interface contracts.
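One way to centralise conversions is to fix a single internal unit and convert only at the interface boundary, so every plausibility check compares like with like (a sketch; the 0.1 km/h tick resolution and helper names are illustrative):

```c
/* Internal unit: all speeds are held in 0.1 km/h ticks.
 * Conversions happen once, at the interface boundary. */
typedef int speed_ticks_t; /* 0.1 km/h per tick */

static speed_ticks_t from_kmh(int kmh) { return kmh * 10; }
static speed_ticks_t from_mph(int mph) { return (mph * 16093) / 1000; } /* 1 mph = 1.6093 km/h */
```

Internal code never sees km/h or mph directly, which removes the class of faults where two call sites compare values in different units.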
Excessive complexity
Too many ad-hoc checks become a maintenance risk and a source of new systematic faults.
Mitigation: Prioritise by hazard and consolidate into reusable guards.
Assuming development asserts survive to production
Assertions may be compiled out in release builds, silently removing safety-critical checks.
Mitigation: Implement runtime guards that remain active in production builds.
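A common pattern is to keep a runtime guard macro that is never compiled out, separate from the development-only `assert` (a sketch; the macro name, example function, and reaction are illustrative):

```c
#include <assert.h>

static int guard_trips = 0;
static void safe_reaction(void) { guard_trips++; /* e.g. hold output, raise diagnostic */ }

/* Unlike assert(), SAFETY_GUARD stays active regardless of NDEBUG. */
#define SAFETY_GUARD(cond) do { if (!(cond)) safe_reaction(); } while (0)

int clamp_duty_cycle(int pct)
{
    assert(pct >= -1000 && pct <= 1000);  /* development aid only; may vanish in release */
    SAFETY_GUARD(pct >= 0 && pct <= 100); /* production check with a defined reaction */
    if (pct < 0)   return 0;
    if (pct > 100) return 100;
    return pct;
}
```

The design point is that the production guard has a defined reaction (here, clamping plus a diagnostic counter), whereas `assert` merely aborts and only when `NDEBUG` is not defined.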
Silent clamping without traceability
Hiding errors impedes diagnosis and removes evidence that the system detected a problem.
Mitigation: Log bounded, meaningful diagnostics with error codes.
No recovery path
The system enters a safe state but cannot resume safely.
Mitigation: Design and verify recovery criteria and procedures.
Frequently asked questions
Is defensive programming mandatory in IEC 61508?
It is a recommended technique in IEC 61508-3 Table A.4. Depending on SIL, architecture, and hazard analysis, using it — or equivalent measures — is often expected to achieve acceptable residual risk.
How do I justify the runtime overhead?
Prioritise guards by hazard, measure CPU/memory/latency impact, and provide test evidence that overhead is bounded and acceptable for the safety function.
Is this the same as exception handling?
No. Exception handling reacts after an error has already been raised. Defensive programming validates inputs and state up front and enforces a safe reaction even where no exception would ever be thrown.
Can it improve diagnostic coverage (DC)?
Yes. While aimed at systematic failures, checks such as plausibility and timeouts can detect effects of random faults in data paths, contributing evidence towards DC claims.
Related techniques
- Range and plausibility checks — core checks often implemented as part of defensive programming
- Control-flow monitoring — complementary technique to detect sequence/order anomalies and task overruns
References
- IEC 61508-3:2010 — especially Annex A, Table A.4; IEC 61508-7:2010, C.2.5 (defensive programming)
- ISO 26262-6:2018 — Road vehicles — Product development at the software level
- J.E. Cooling, Software Engineering for Real-time Systems (2003)
Go deeper — IEC 61508 Certification Course
Our IEC 61508 course covers software safety techniques, architectural design, SIL verification, and safety case preparation — for engineers building safety-related systems.