Defensive Programming

2025-09-04

What is it?

Defensive programming is a software technique that deliberately assumes things will go wrong. Instead of trusting inputs, timing, or state transitions, the code checks them and responds in a controlled way. Typical measures include range and plausibility checks, parameter type/dimension checks, sanity guards on state machines, and safe defaults. The aim is predictable behavior under abnormal conditions and evidence that the safety function will not act on bad data.
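
As a minimal illustration, consider a state-machine sanity guard with a safe default. The sketch below uses hypothetical names (op_mode_t, handle_mode, enter_safe_state); the mode handler refuses to act on an unexpected value instead of trusting it:

/* Sketch: sanity guard on a state machine with a safe default. */
typedef enum { MODE_INIT, MODE_RUN, MODE_MAINT } op_mode_t;

extern void enter_safe_state(void);  /* provided by the safety layer */

void handle_mode(op_mode_t mode)
{
    switch (mode) {
    case MODE_INIT:  /* ... initialization actions ... */ break;
    case MODE_RUN:   /* ... normal operation ...       */ break;
    case MODE_MAINT: /* ... maintenance actions ...    */ break;
    default:
        /* A corrupted or unexpected mode value must never be
           acted on; fall back to a defined safe reaction. */
        enter_safe_state();
        break;
    }
}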

When to use

  • Safety-related inputs/outputs where out-of-range or inconsistent values could lead to hazards.
  • Interfaces across trust boundaries (sensor buses, comms frames, configuration files, HMI inputs).
  • State machines, control loops, or mode logic sensitive to timing, ordering, or missing steps.

Inputs & Outputs

Inputs

  • Sensor readings, commands, configuration parameters.
  • Control-flow context (mode, state, sequence/heartbeat, timeouts).
  • Design constraints (valid ranges, units, rates of change, pre/postconditions).

Outputs

  • Validated/clamped values or rejected inputs with traceable diagnostics.
  • Safe reactions (hold last safe value, degrade mode, safe-state actuation).
  • Logs/counters for anomaly detection and verification evidence.

Procedure

  1. Derive checks from hazards and requirements. Identify variables, interfaces, states, and timing that can cause harm if wrong; define ranges, units, invariants, and allowable transitions.
  2. Design explicit safe reactions. For each detected anomaly, specify what the software must do (discard, clamp, substitute last safe value, degrade, or enter a defined safe state) and how it recovers.
  3. Implement systematically. Apply: range/type/dimension checks; plausibility (rate-of-change, cross-sensor consistency); control-flow guards (sequence counters, watchdogs, timeouts); fail-safe defaults.
  4. Make it observable. Add deterministic error codes, health counters, and bounded logging so anomalies are auditable without flooding; a diagnostics sketch follows this list.
  5. Verify abnormal cases. Unit and integration tests must exercise boundary and fault-injected scenarios; demonstrate deterministic safe reactions and no unintended side effects.
  6. Review and analyze. Peer reviews and static analysis (e.g., MISRA-style rules) confirm all critical paths are guarded and reactions are reachable and testable.
  7. Trace and maintain. Trace each check to a requirement and hazard; update checks and tests when operating ranges or assumptions change.
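
To make step 4 concrete, the following sketch shows bounded, deterministic diagnostics. All names (diag_log, diag_cycle_reset, MAX_LOG_PER_CYCLE) are hypothetical; the point is that every anomaly is counted, but log output per control cycle is capped so a flood of faults cannot starve the control loop:

#include <stdint.h>
#include <stdio.h>

#define MAX_LOG_PER_CYCLE 5            /* cap on log entries per control cycle */

enum { ERR_RANGE = 1, ERR_RATE = 2, ERR_TIMEOUT = 3 };

static uint32_t health_counter[4];     /* per-code anomaly counters */
static uint32_t logs_this_cycle;

/* Deterministic, rate-limited diagnostic: every anomaly is counted,
   but at most MAX_LOG_PER_CYCLE entries are emitted per cycle. */
void diag_log(int code, int value)
{
    if (code < 1 || code > 3) {        /* defensive: unknown error code */
        code = 0;
    }
    health_counter[code]++;
    if (logs_this_cycle < MAX_LOG_PER_CYCLE) {
        logs_this_cycle++;
        printf("E%02d value=%d count=%u\n",
               code, value, (unsigned)health_counter[code]);
    }
}

/* Called once at the start of each control cycle. */
void diag_cycle_reset(void)
{
    logs_this_cycle = 0;
}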

Worked Example

High-level

A variable-speed drive receives a target speed from a supervisory PLC. If the command is out of the certified operating envelope or jumps too quickly, the drive must reject it and hold the last safe value; after repeated anomalies it must enter a safe torque-off state.

Code-level

/* Pseudo-C: defensive handling of a commanded speed */
#include <stdlib.h>                    /* abs() */

#define SPEED_MIN_KMH   0
#define SPEED_MAX_KMH   200
#define MAX_DDELTA_KMH  20             /* max allowed change per cycle */
#define MAX_BAD_COUNT   3

enum { ERR_RANGE = 1, ERR_RATE = 2 };  /* diagnostic codes */

/* Provided elsewhere by the diagnostics and drive layers. */
extern void log_error(int code, int value);
extern void enter_safe_state(void);
extern void set_drive_speed(int speed_kmh);

static int last_safe_speed = 0;
static int bad_counter = 0;

int apply_commanded_speed(int cmd_speed, int dt_ms)
{
    (void)dt_ms;  /* cycle time; reserved for time-based rate limits */

    /* Range check (km/h integer) */
    if (cmd_speed < SPEED_MIN_KMH || cmd_speed > SPEED_MAX_KMH) {
        bad_counter++;
        log_error(ERR_RANGE, cmd_speed);
        if (bad_counter >= MAX_BAD_COUNT) {
            enter_safe_state();   /* SAFE REACTION: defined safe state */
        }
        return last_safe_speed;   /* SAFE REACTION: hold last safe value */
    }

    /* Plausibility: rate-of-change limit (protect mechanics) */
    if (abs(cmd_speed - last_safe_speed) > MAX_DDELTA_KMH) {
        bad_counter++;
        log_error(ERR_RATE, cmd_speed);
        if (bad_counter >= MAX_BAD_COUNT) {
            enter_safe_state();   /* SAFE REACTION: repeated anomalies */
        }
        return last_safe_speed;   /* SAFE REACTION: reject transient spike */
    }

    /* Control-flow/sequence sanity (e.g., watchdog/heartbeat elsewhere) */
    bad_counter = 0;              /* reset health on good command */
    last_safe_speed = cmd_speed;
    set_drive_speed(cmd_speed);   /* actuation */
    return cmd_speed;
}

Result: The actuator never receives a hazardous command value or slew, and repeated anomalies deterministically force a safe state.
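
Step 5 of the procedure calls for exercising the abnormal cases explicitly. A minimal unit-test sketch for the function above (with hypothetical stubs standing in for log_error, enter_safe_state, and set_drive_speed) could look like this:

#include <assert.h>
#include <stdbool.h>

extern int apply_commanded_speed(int cmd_speed, int dt_ms);

static bool safe_state_entered = false;

/* Hypothetical stubs so the unit under test links in isolation. */
void log_error(int code, int value) { (void)code; (void)value; }
void enter_safe_state(void)         { safe_state_entered = true; }
void set_drive_speed(int speed_kmh) { (void)speed_kmh; }

int main(void)
{
    /* Good command within range and slew limit is passed through. */
    assert(apply_commanded_speed(10, 10) == 10);

    /* Out-of-range command is rejected; last safe value is held. */
    assert(apply_commanded_speed(250, 10) == 10);
    assert(!safe_state_entered);

    /* Transient spike beyond MAX_DDELTA_KMH is rejected too. */
    assert(apply_commanded_speed(90, 10) == 10);

    /* Third consecutive anomaly deterministically forces the safe state. */
    assert(apply_commanded_speed(-5, 10) == 10);
    assert(safe_state_entered);

    return 0;
}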

Quality criteria

  • Completeness: All safety-relevant interfaces and states have defined checks and mapped safe reactions.
  • Determinism: Reactions are bounded in time, free of undefined behavior, and leave the system in a known state.
  • Traceability & test evidence: Each check is traced to a requirement/hazard and covered by tests, including boundary and injected-fault cases.
  • Clarity: Guard code is readable (no hidden side effects); logs/diagnostics are unambiguous and rate-limited.
  • Acceptable overhead: CPU, memory, and latency impact are measured and shown acceptable for the safety function.

Common pitfalls

  • “Detect but do nothing”: Checks raise flags without enforcing safe behavior → bind each check to a concrete safe reaction and test it.
  • Inconsistent units or scaling: Comparing apples to oranges defeats plausibility → centralize unit conversions and assert on interface contracts (see the type-wrapper sketch after this list).
  • Excessive complexity: Too many ad-hoc checks become a maintenance risk → prioritize by hazard and consolidate into reusable guards.
  • Assuming development asserts in production: Assertions may be compiled out → implement runtime guards that remain active in release builds.
  • Silent clamping without traceability: Hiding errors impedes diagnosis → log bounded, meaningful diagnostics with error codes.
  • No recovery path: System enters a safe state but cannot resume safely → design and verify recovery criteria and procedures.
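
For the units pitfall above, one common remedy is to give each unit its own wrapper type so the compiler rejects accidental mixing. The type and function names below (speed_kmh_t, speed_rpm_t, speed_in_range) are hypothetical sketch names:

/* Distinct wrapper types make units non-interchangeable at compile time. */
typedef struct { int value; } speed_kmh_t;
typedef struct { int value; } speed_rpm_t;

/* Constructor makes the unit explicit at every call site. */
static speed_kmh_t kmh(int v) { speed_kmh_t s = { v }; return s; }

/* The guard can only ever receive km/h; passing a speed_rpm_t is a
   compile-time error, not a silent plausibility hole. */
static int speed_in_range(speed_kmh_t s)
{
    return s.value >= 0 && s.value <= 200;
}

/* Usage:
   speed_in_range(kmh(120));   // OK
   speed_rpm_t r = { 1500 };
   speed_in_range(r);          // rejected by the compiler */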

FAQ

Is defensive programming mandatory in IEC 61508?

It is a recommended technique in IEC 61508-3 Table A.4. Depending on SIL, architecture, and hazard analysis, using it (or equivalent measures) is often expected to achieve acceptable residual risk.

How do I justify the runtime overhead?

Prioritize guards by hazard, measure CPU/memory/latency impact, and provide test evidence that overhead is bounded and acceptable for the safety function.

Is this the same as exception handling?

No. Exceptions report errors; defensive programming prevents unsafe behavior by validating conditions and enforcing a safe reaction even when exceptions are not thrown.

Can it improve diagnostic coverage (DC)?

Yes. While aimed at systematic failures, checks (e.g., plausibility, timeouts) can detect effects of random faults in data paths, contributing evidence towards DC claims.
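
As an illustration of the timeout point, a heartbeat guard can detect a stalled sensor path, which is one way the effect of a random hardware fault shows up in a data path. The sketch below assumes a millisecond tick source; the names (heartbeat_seen, data_path_alive) are hypothetical:

#include <stdint.h>
#include <stdbool.h>

#define HEARTBEAT_TIMEOUT_MS 100  /* max silence tolerated on the bus */

static uint32_t last_heartbeat_ms;

/* Called whenever a valid sensor frame arrives. */
void heartbeat_seen(uint32_t now_ms)
{
    last_heartbeat_ms = now_ms;
}

/* Called every control cycle: a stuck or disconnected data path is
   detected as a missing heartbeat (unsigned arithmetic handles wrap). */
bool data_path_alive(uint32_t now_ms)
{
    return (now_ms - last_heartbeat_ms) <= HEARTBEAT_TIMEOUT_MS;
}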

This article explains Defensive Programming in general functional-safety practice. Always consult applicable standards for normative requirements.

