Error Detecting Codes
What is it?
Error detecting codes are mathematical techniques that add redundancy bits to digital information so that a system can check whether the data was corrupted during transmission or storage. Common examples include parity bits, checksums, cyclic redundancy checks (CRC), and Hamming codes used in detection-only mode. In functional safety, their purpose is not to correct errors but to prevent faulty data from being used in safety-critical decisions.
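As a minimal illustration of what a redundancy bit does, the sketch below appends one even-parity bit to a byte and checks it on receipt. The function names `add_even_parity` and `parity_ok` are hypothetical and exist only for this example.

```python
def add_even_parity(byte: int) -> int:
    """Return a 9-bit word: the 8 data bits followed by one even-parity bit."""
    parity = bin(byte & 0xFF).count("1") % 2  # 1 if the number of set bits is odd
    return ((byte & 0xFF) << 1) | parity

def parity_ok(word: int) -> bool:
    """A valid word always contains an even number of set bits."""
    return bin(word & 0x1FF).count("1") % 2 == 0

word = add_even_parity(0b10110010)      # 4 set bits, so the parity bit is 0
print(parity_ok(word))                  # True
print(parity_ok(word ^ 0b000010000))    # one flipped bit: False
```

A single parity bit only catches an odd number of flipped bits, which is why stronger codes are usually needed for safety-related data.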
How it supports functional safety
Error detecting codes help control the effects of failures by ensuring that corrupted or incomplete data does not silently propagate through the system. They detect the effects of random hardware faults or electromagnetic interference that alter transmitted or stored data. By discarding erroneous data or reacting to it safely, the system avoids hazardous control actions.
When to use
- Safety-related communication between sensors, controllers, and actuators.
- Protecting memory contents in embedded controllers or safety PLCs.
- Serial communication links exposed to noise or interference.
- Any case where corrupted data could cause a dangerous or unintended actuation.
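For the memory-protection case, one possible pattern is to store a CRC next to the value and recheck it on every read. The sketch below assumes CRC-32 from Python's `zlib` module as a stand-in for whatever code the target platform provides; the class name `ProtectedValue` is hypothetical.

```python
import zlib

class ProtectedValue:
    """Holds a bytes value together with a CRC-32 computed when it was written."""

    def __init__(self, value: bytes):
        self.write(value)

    def write(self, value: bytes) -> None:
        self._value = bytearray(value)
        self._crc = zlib.crc32(self._value)

    def read(self) -> bytes:
        # Recompute the CRC on every read; a mismatch means the stored bytes changed.
        if zlib.crc32(self._value) != self._crc:
            raise ValueError("stored value failed CRC check")
        return bytes(self._value)

setpoint = ProtectedValue(b"\x01\x2c")  # e.g. a 16-bit limit value
setpoint.read()                         # passes the check
setpoint._value[0] ^= 0x04              # simulate a bit flip in memory
# setpoint.read() would now raise ValueError, and the caller applies its safe reaction
```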
Inputs & Outputs
Inputs
- Raw data to be stored or transmitted
- Coding scheme (e.g., CRC polynomial, Hamming parameters)
Outputs
- Encoded data with redundancy bits
- Detection status (valid / corrupted)
Procedure
- Select an appropriate error detection code (parity, CRC, Hamming, etc.) based on the required safety integrity level.
- Encode outgoing data by adding redundancy bits.
- Transmit or store the data with the code attached.
- At the receiver (or during retrieval), recompute and verify the code.
  - If valid → accept data.
  - If invalid → apply a safe reaction (discard, hold last safe value, or enter safe state).
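The procedure above can be sketched at the byte level as follows. This is an illustration under stated assumptions, not a qualified implementation: CRC-32 from `zlib` stands in for the selected code, and the names `encode`, `verify`, `Receiver`, and the threshold `MAX_CONSECUTIVE_ERRORS` are hypothetical.

```python
import struct
import zlib

MAX_CONSECUTIVE_ERRORS = 3  # hypothetical threshold, not taken from any standard

def encode(payload: bytes) -> bytes:
    """Encode step: append a big-endian CRC-32 to the payload."""
    return payload + struct.pack(">I", zlib.crc32(payload))

def verify(frame: bytes):
    """Verify step: return the payload if the CRC matches, otherwise None."""
    if len(frame) < 4:
        return None
    payload, (crc,) = frame[:-4], struct.unpack(">I", frame[-4:])
    return payload if zlib.crc32(payload) == crc else None

class Receiver:
    def __init__(self, initial_safe_value: bytes):
        self.last_safe_value = initial_safe_value
        self.errors = 0
        self.safe_state = False

    def on_frame(self, frame: bytes) -> bytes:
        payload = verify(frame)
        if payload is not None:          # valid: accept data
            self.last_safe_value = payload
            self.errors = 0
            return payload
        self.errors += 1                 # invalid: apply the safe reaction
        if self.errors >= MAX_CONSECUTIVE_ERRORS:
            self.safe_state = True       # too many bad frames: enter safe state
        return self.last_safe_value      # discard frame, hold last safe value
```

Holding the last value for a bounded number of frames and then entering the safe state is only one possible combination of the reactions listed above; the appropriate choice depends on the process safety time of the application.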
Worked Example
High-level
A temperature sensor sends data to a safety controller over a noisy bus. A CRC is appended to each data frame. If corruption occurs, the controller rejects the frame and keeps the last safe reading, avoiding a spurious shutdown command.
Code-level
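import zlib

last_safe_value = None  # most recent payload that passed the CRC check

def compute_crc(data: bytes) -> int:
    return zlib.crc32(data)  # CRC-32 as a stand-in for the project's CRC

def transmit(data: bytes):
    return data, compute_crc(data)

def receive(data: bytes, crc: int):
    global last_safe_value
    if compute_crc(data) == crc:
        last_safe_value = data  # frame verified: update the held value
        return data
    # SAFE REACTION: discard frame, hold last safe value
    return last_safe_value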
Result: The controller only acts on verified, uncorrupted sensor data.
Quality criteria
- Coding scheme selected must match the SIL/ASIL target.
- Coverage against single-bit and burst errors must be justified.
- Safe reaction on error must be specified, tested, and documented.
Common pitfalls
- Using error correction instead of detection → unsafe mis-corrections. Mitigation: always discard or enter safe state on detection.
- Weak codes (e.g., simple parity) missing multi-bit errors. Mitigation: use a CRC or Hamming code whose detection capability is justified for the frame length and expected error patterns.
- Not testing safe reaction paths. Mitigation: include error injection in verification.
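Two of the pitfalls above can be demonstrated with a small error-injection experiment: a two-bit error passes a parity check but is caught by a CRC. The helper names `parity_bit` and `flip_bits` are hypothetical, and CRC-32 from `zlib` is used only as a convenient example code.

```python
import zlib

def parity_bit(data: bytes) -> int:
    """Even parity over all bits of the message."""
    return sum(bin(b).count("1") for b in data) % 2

def flip_bits(data: bytes, positions) -> bytes:
    """Error-injection helper: flip the given bit positions (0 = LSB of byte 0)."""
    out = bytearray(data)
    for pos in positions:
        out[pos // 8] ^= 1 << (pos % 8)
    return bytes(out)

original = b"\x12\x34\x56\x78"
corrupted = flip_bits(original, [3, 17])  # inject a two-bit error

parity_detects = parity_bit(corrupted) != parity_bit(original)
crc_detects = zlib.crc32(corrupted) != zlib.crc32(original)
print(parity_detects, crc_detects)        # False True
```

The same kind of helper can drive verification tests that confirm the safe reaction is actually triggered when a corrupted frame arrives.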
References
- IEC 61508-3:2010, Annex C
- Huffman, W., Pless, V. Fundamentals of Error-Correcting Codes, Cambridge University Press, 2003
- Koopman, P. CRC and Error Detection Tutorial
FAQ
Why not correct errors instead of just detecting them?
Correction may produce an incorrect but valid-looking value, which is unsafe. Functional safety favors discarding over guessing.
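The risk can be shown with a plain Hamming(7,4) code, which corrects any single-bit error but has no way to recognise a double-bit error. The sketch below is an illustrative assumption rather than a reference implementation: it uses the common layout with parity bits at positions 1, 2, and 4 and no additional overall parity bit.

```python
def hamming74_encode(d):
    """Encode 4 data bits [d1, d2, d3, d4] into a 7-bit Hamming codeword."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4
    p2 = d1 ^ d3 ^ d4
    p4 = d2 ^ d3 ^ d4
    return [p1, p2, d1, p4, d2, d3, d4]  # positions 1..7

def hamming74_correct(c):
    """Single-error correction: flip the bit the syndrome points at, return data bits."""
    c = list(c)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]  # covers positions 1, 3, 5, 7
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]  # covers positions 2, 3, 6, 7
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]  # covers positions 4, 5, 6, 7
    syndrome = s1 + 2 * s2 + 4 * s4
    if syndrome:
        c[syndrome - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]

data = [1, 0, 1, 1]
codeword = hamming74_encode(data)

one_error = list(codeword)
one_error[4] ^= 1                  # flip one bit
two_errors = list(codeword)
two_errors[0] ^= 1                 # flip two bits, both in the parity positions
two_errors[1] ^= 1

print(hamming74_correct(one_error) == data)   # True: single error repaired
print(hamming74_correct(two_errors) == data)  # False: "corrected" into a wrong value
```

In the second case the data bits arrived intact, yet the decoder flipped one of them and returned a different, valid-looking value without any indication of failure.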
Are CRCs enough for SIL 3/4?
CRCs with sufficient length and carefully chosen polynomials can provide very high diagnostic coverage, but justification is required.
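As a rough guide to what "sufficient length" means, the following textbook properties of an r-bit CRC (generator polynomial of degree r with a nonzero constant term) are often used as a starting point. They assume error patterns that are effectively random and do not replace a polynomial-specific Hamming-distance analysis at the actual frame length.

```latex
\begin{align*}
\text{burst length} \le r &: \quad \text{always detected} \\
\text{burst length} = r + 1 &: \quad P_{\text{undetected}} = 2^{-(r-1)} \\
\text{burst length} > r + 1 \text{ or random corruption} &: \quad P_{\text{undetected}} \approx 2^{-r}
\end{align*}
```

For example, r = 32 gives an undetected fraction of about 2.3 × 10^-10 for random corruption, whereas r = 8 gives about 3.9 × 10^-3.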