Error Detecting Codes in Functional Safety — IEC 61508
Error detecting codes are mathematical techniques that add redundancy bits to digital data so that corruption during transmission or storage can be detected. In functional safety, their purpose is to prevent faulty data from being used in safety-critical decisions. Common examples include CRCs, Hamming codes, and checksums.
What are error detecting codes?
Error detecting codes add extra redundancy bits to digital information, allowing a system to check whether data has been corrupted. Common examples include parity bits, checksums, Hamming codes, and cyclic redundancy checks (CRC).
In functional safety, the purpose is not to correct errors but to prevent faulty data from being used in safety-critical decisions. When corruption is detected, the system discards the data and applies a defined safe reaction.
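As a minimal illustration of what "adding redundancy bits" means, the sketch below attaches one even-parity bit to a byte and rechecks it at the receiver. The function names and the way the bit is carried alongside the byte are assumptions made for this example, not a prescribed format.

```python
def even_parity_bit(byte: int) -> int:
    """Return the bit that makes the total number of 1-bits even."""
    return bin(byte & 0xFF).count("1") % 2

def encode(byte: int) -> tuple[int, int]:
    """Attach the redundancy bit to the data (hypothetical framing)."""
    return byte & 0xFF, even_parity_bit(byte)

def is_valid(byte: int, parity: int) -> bool:
    """Recompute the parity at the receiver and compare with the received bit."""
    return even_parity_bit(byte) == parity

data, parity = encode(0b1011_0010)   # four 1-bits, so the parity bit is 0
corrupted = data ^ 0b0000_0100       # one bit flipped in transit

print(is_valid(data, parity))        # True
print(is_valid(corrupted, parity))   # False
```

The receiver learns only that something is wrong, not which bit changed, which is all that detection-and-discard needs.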
How it supports functional safety
Error detecting codes help prevent systematic failures by ensuring that corrupted or incomplete data does not silently propagate through the system. They also detect the effects of random hardware faults or electromagnetic interference that might alter transmitted or stored data.
By discarding or safely reacting to erroneous data, error detection prevents hazardous control actions.
The key question is: if data is corrupted between sensor and controller, will your system detect it — or act on it?
When to use
- Safety-related communication between sensors, controllers, and actuators
- Protecting memory contents in embedded controllers or safety PLCs
- Serial communication links exposed to noise or interference
- Any case where corrupted data could cause a dangerous or unintended actuation
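For the memory-protection case, one possible sketch is to store a CRC next to the protected block and verify it on every retrieval. The block contents, the 4-byte big-endian layout, and the use of CRC-32 from Python's standard library are all assumptions chosen for illustration; a real design would select the code against its own integrity target.

```python
import struct
import zlib

def store(block: bytes) -> bytes:
    """Return the stored image: payload followed by its CRC-32 (4 bytes, big-endian)."""
    return block + struct.pack(">I", zlib.crc32(block))

def load(image: bytes):
    """Return the payload if the stored CRC matches, otherwise None."""
    if len(image) < 4:
        return None
    payload = image[:-4]
    stored_crc = struct.unpack(">I", image[-4:])[0]
    if zlib.crc32(payload) != stored_crc:
        return None  # caller applies the defined safe reaction
    return payload

image = store(b"trip_limit=85")       # hypothetical parameter block
print(load(image))                    # b'trip_limit=85'

damaged = bytearray(image)
damaged[3] ^= 0x10                    # simulate a single bit flip in memory
print(load(bytes(damaged)))           # None
```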
Inputs and outputs
Inputs
- Raw data to be stored or transmitted
- Coding scheme (e.g. CRC polynomial, Hamming parameters)
Outputs
- Encoded data with redundancy bits
- Detection status (valid or corrupted)
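The inputs and outputs listed above can be sketched as a small encode/check pair. This example assumes a 16-bit CRC with polynomial 0x1021 and initial value 0xFFFF (the parameter set commonly called CRC-16/CCITT-FALSE); the polynomial and frame layout are illustrative choices, not a recommendation for any particular SIL.

```python
def crc16(data: bytes, poly: int = 0x1021, init: int = 0xFFFF) -> int:
    """Bitwise CRC-16, most significant bit first, no reflection, no final XOR."""
    crc = init
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ poly) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc

def encode(data: bytes) -> bytes:
    """Input: raw data. Output: data with the 2 redundancy bytes appended."""
    return data + crc16(data).to_bytes(2, "big")

def check(frame: bytes) -> bool:
    """Output: detection status (True = valid, False = corrupted)."""
    if len(frame) < 2:
        return False
    return crc16(frame[:-2]) == int.from_bytes(frame[-2:], "big")

frame = encode(b"123456789")
print(hex(crc16(b"123456789")))          # 0x29b1, the standard check value
print(check(frame))                      # True
print(check(b"023456789" + frame[-2:]))  # False: first byte was altered
```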
Procedure
- Select an appropriate code. Choose parity, CRC, Hamming, or another scheme based on required safety integrity level and the expected error patterns.
- Encode outgoing data by adding redundancy bits.
- Transmit or store the data with the code attached.
- At the receiver (or during retrieval), recompute and verify the code.
- If valid → accept data.
- If invalid → apply a safe reaction (discard, hold last safe value, or enter safe state).
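The three safe reactions named in the last step can be sketched as a configurable receiver. The `Reaction` and `Receiver` names, the latching behaviour of the safe state, and the use of CRC-32 are assumptions for this example only.

```python
import zlib
from enum import Enum

class Reaction(Enum):
    DISCARD = "discard"
    HOLD_LAST = "hold last safe value"
    SAFE_STATE = "enter safe state"

class Receiver:
    def __init__(self, reaction: Reaction, initial_value: bytes):
        self.reaction = reaction
        self.last_safe_value = initial_value
        self.in_safe_state = False

    def on_frame(self, data: bytes, crc: int):
        """Verify the code, then accept the data or apply the configured reaction."""
        if self.in_safe_state:
            return None                      # assumption: the safe state is latched
        if zlib.crc32(data) == crc:
            self.last_safe_value = data      # valid: accept data
            return data
        if self.reaction is Reaction.SAFE_STATE:
            self.in_safe_state = True
            return None
        if self.reaction is Reaction.HOLD_LAST:
            return self.last_safe_value
        return None                          # DISCARD

good = b"21.5"
rx = Receiver(Reaction.HOLD_LAST, initial_value=b"20.0")
print(rx.on_frame(good, zlib.crc32(good)))     # b'21.5'
print(rx.on_frame(b"99.9", zlib.crc32(good)))  # b'21.5' (corrupted frame rejected)
```

Which reaction is appropriate depends on the application's hazard analysis; the code only shows that the choice should be explicit rather than implied.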
Worked example — temperature sensor on a noisy bus
A temperature sensor sends data to a safety controller over a noisy bus. A CRC is appended to each data frame. If corruption occurs, the controller rejects the frame and keeps the last safe reading, avoiding a spurious shutdown command.
Code-level example
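import zlib

last_safe_value = None  # most recent payload that passed the CRC check

def compute_crc(data: bytes) -> int:
    # Illustrative choice: CRC-32 from the standard library
    return zlib.crc32(data)

def transmit(data: bytes):
    return data, compute_crc(data)

def receive(data: bytes, crc: int):
    global last_safe_value
    if compute_crc(data) == crc:
        last_safe_value = data  # verified frame becomes the new held value
        return data
    # SAFE REACTION: discard frame, hold last safe value
    return last_safe_value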
Result: The controller only acts on verified, uncorrupted sensor data.
Quality criteria
- Code selection: Coding scheme selected must match the SIL/ASIL target.
- Error coverage: Coverage against single-bit and burst errors must be justified.
- Safe reaction: Reaction on error must be specified, tested, and documented.
Common pitfalls
Using error correction instead of detection
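If more bits are corrupted than the code can correct, the decoder may "correct" the data to a different valid codeword. The result is an incorrect but valid-looking value, which is unsafe.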
Mitigation: Always discard or enter safe state on detection. Do not guess the correct value.
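The risk can be made concrete with a small Hamming(7,4) sketch, written here from the textbook construction (parity bits at positions 1, 2, and 4). Two flipped bits produce a non-zero syndrome that a correcting decoder interprets as a single-bit error, so it "repairs" the wrong bit. The function names are hypothetical.

```python
def hamming74_encode(d: list[int]) -> list[int]:
    """Encode 4 data bits into a 7-bit codeword (positions 1..7)."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4
    p2 = d1 ^ d3 ^ d4
    p4 = d2 ^ d3 ^ d4
    return [p1, p2, d1, p4, d2, d3, d4]

def syndrome(c: list[int]) -> int:
    """Return 0 for a valid codeword, else the position a single error would have."""
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    return s1 + 2 * s2 + 4 * s4

def correct_and_decode(c: list[int]) -> list[int]:
    """Correcting decoder: flip the bit the syndrome points at, then extract data."""
    c = list(c)
    s = syndrome(c)
    if s:
        c[s - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]

data = [1, 0, 1, 1]
received = hamming74_encode(data)
received[0] ^= 1                      # first bit error
received[1] ^= 1                      # second bit error

print(correct_and_decode(received))   # [0, 0, 1, 1]: wrong, but looks valid
print(syndrome(received) != 0)        # True: detection alone would have rejected it
```

Used for detection only, the same code flags both one-bit and two-bit errors, and the frame is discarded instead of being silently altered.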
Weak codes missing multi-bit errors
A single parity bit detects only an odd number of flipped bits. Any error that flips an even number of bits passes unnoticed, giving a false sense of security.
Mitigation: Use strong CRCs or Hamming codes where required by the SIL target.
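A short sketch of this pitfall: two flipped bits leave the overall parity unchanged, while a CRC over the same data changes. The frame contents are arbitrary, and CRC-32 from Python's standard library stands in for whatever CRC a real design would justify.

```python
import zlib

def parity(data: bytes) -> int:
    """Even/odd parity over all bits of the data."""
    return sum(bin(b).count("1") for b in data) % 2

frame = b"\x42\x17"
corrupted = bytes([frame[0] ^ 0x01, frame[1] ^ 0x80])  # two bits flipped

print(parity(frame) == parity(corrupted))          # True: parity misses the error
print(zlib.crc32(frame) == zlib.crc32(corrupted))  # False: the CRC detects it
```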
Not testing safe reaction paths
The detection logic works but the safe reaction has never been exercised.
Mitigation: Include error injection in verification to prove the full detection-to-reaction chain.
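One way to exercise the full chain is to flip every bit of a known-good frame in turn and confirm that each corrupted copy ends in the safe reaction. The sketch below assumes a hold-last-value reaction and CRC-32; the `receive` and `inject_single_bit_errors` helpers are defined here for the example and are not part of any real test framework.

```python
import zlib

def receive(data: bytes, crc: int, last_safe_value: bytes) -> bytes:
    """Accept verified data, otherwise fall back to the held value."""
    if zlib.crc32(data) == crc:
        return data
    return last_safe_value

def inject_single_bit_errors(data: bytes):
    """Yield one corrupted copy of the data per bit position."""
    for i in range(len(data) * 8):
        corrupted = bytearray(data)
        corrupted[i // 8] ^= 1 << (i % 8)
        yield bytes(corrupted)

frame = b"T=21.5C"
crc = zlib.crc32(frame)
held = b"T=21.4C"

results = [receive(bad, crc, held) for bad in inject_single_bit_errors(frame)]
print(len(results))                      # 56 injected faults
print(all(r == held for r in results))   # True: every fault hit the safe reaction
```

A real campaign would also cover burst errors, corrupted CRC fields, and the reaction's effect on the actuator side, none of which this sketch attempts.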
Frequently asked questions
Why not correct errors instead of just detecting them?
Correction may produce an incorrect but valid-looking value, which is unsafe. Functional safety favours discarding over guessing.
Are CRCs enough for SIL 3/4?
CRCs with sufficient length and carefully chosen polynomials can provide very high diagnostic coverage, but justification is required. The choice must match the SIL target and the expected error patterns.
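As a rough feel for why length matters, the sketch below counts what fraction of all non-zero error patterns would go undetected by a linear code, which for an r-bit CRC comes out close to 2^-r. It assumes every error pattern is equally likely, which real channels do not guarantee, so it is an orientation aid and not a substitute for the justification described above.

```python
from fractions import Fraction

def undetected_fraction(codeword_bits: int, crc_bits: int) -> Fraction:
    """Fraction of non-zero error patterns that are themselves valid codewords."""
    data_bits = codeword_bits - crc_bits
    return Fraction(2**data_bits - 1, 2**codeword_bits - 1)

# Hypothetical 64-bit frame protected by a 16-bit or a 32-bit CRC
print(float(undetected_fraction(64, 16)))   # about 2**-16
print(float(undetected_fraction(64, 32)))   # about 2**-32
```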
Related techniques
- Cyclic Redundancy Checks (CRC) — specialised polynomial-based detection
- Diverse redundancy — prevents systematic failures through architectural means
References
- IEC 61508-3:2010 — Annex C
- Huffman, W.; Pless, V. — Fundamentals of Error-Correcting Codes, Cambridge University Press, 2003
- Koopman, P. — CRC and Error Detection Tutorial