SIL 3 Mechanical Devices, Really?

2020-03-03

functional safety

Some Basic Concepts

To come to this conclusion, I first need to explain a few basic concepts: architectural constraints, Type, HFT and SFF. Yes, sorry about that. It is the only way to understand that not every claimed SIL 3 is really SIL 3.

IEC 61508 does not allow you to design just any architecture. What IEC 61508 does allow is summarised in the architectural constraints table below. This table represents one of around a thousand SIL requirements that exist in the IEC 61508 standard. When the Type, the Hardware Fault Tolerance (HFT) and the Safe Failure Fraction (SFF) are known, this table tells you which SIL can potentially be achieved. Keep in mind that to really meet that SIL, many other SIL requirements need to be addressed as well.

The IEC 61508 standard differentiates between Type A and Type B devices. Type A devices are typically mechanical devices like actuators, valves, solenoid valves, relays, etc. They are considered to be “simple” devices compared to Type B devices, which are “complex”. Any device with software inside is a Type B device as it is based on complex electronics. Examples are smart transmitters and programmable electronic logic solvers.

A single Type A device, like a valve, has an HFT of zero. From the architectural constraints table, you can see that you can claim a higher SIL for a single device (HFT=0) when the SFF of that device is higher. You can also see that it is “easier” to achieve a higher SIL with a Type A device than with a Type B device. A Type A device with an SFF between 90% and 99% achieves SIL 3, while a Type B device in the same SFF range only achieves SIL 2. By the way, this does not mean it is also easier for a Type A device to achieve an SFF of over 90%.
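The lookup described above can be sketched in a few lines of Python. The values are my reading of the architectural constraints tables in IEC 61508-2 and should be checked against the standard itself; the names `MAX_SIL`, `sff_band` and `max_sil` are made up for this illustration.

```python
# Maximum claimable SIL by architectural constraints.
# Outer key: device type. Inner key: SFF band.
# Each tuple is indexed by HFT (0, 1, 2); 0 means "not allowed".
# ASSUMPTION: values transcribed from my reading of IEC 61508-2,
# verify against the standard before relying on them.
MAX_SIL = {
    "A": {"<60": (1, 2, 3), "60-90": (2, 3, 4), "90-99": (3, 4, 4), ">=99": (3, 4, 4)},
    "B": {"<60": (0, 1, 2), "60-90": (1, 2, 3), "90-99": (2, 3, 4), ">=99": (3, 4, 4)},
}


def sff_band(sff: float) -> str:
    """Map an SFF given as a fraction (0.0 to 1.0) to its band in the table."""
    if not 0.0 <= sff <= 1.0:
        raise ValueError("SFF must be a fraction between 0.0 and 1.0")
    if sff < 0.60:
        return "<60"
    if sff < 0.90:
        return "60-90"
    if sff < 0.99:
        return "90-99"
    return ">=99"


def max_sil(device_type: str, hft: int, sff: float) -> int:
    """Return the highest SIL the table allows, or 0 if not allowed."""
    if hft not in (0, 1, 2):
        raise ValueError("this sketch only covers HFT 0, 1 and 2")
    return MAX_SIL[device_type][sff_band(sff)][hft]


print(max_sil("A", 0, 0.95))  # single Type A device, SFF 95%
print(max_sil("B", 0, 0.95))  # single Type B device, same SFF
```

The result is only the SIL the architecture permits. It says nothing about the many other requirements a device has to meet.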

What Is A Safe Failure Fraction, And Why Is It Important?

The SFF measures the effectiveness of the fail-safe design and/or the diagnostics built into that device. The SFF of a device is important as it is one of the factors that will decide which SIL can be claimed for this device. As the table above shows, the higher the SFF, the higher the SIL level that can be claimed. Each device has an SFF, which is presented as a percentage (0-100%). The SFF is calculated using the following formula:

SFF = (SD + SU + DD) / (SD + SU + DD + DU)

To calculate the SFF, you need the Safe Detected (SD), Safe Undetected (SU), Dangerous Detected (DD) and Dangerous Undetected (DU) failure rates. Normally, the device manufacturer derives these failure rates using the FMEDA technique. From this formula, it becomes clear that the SFF of a device is driven by its DU failure rate: the larger the share of DU failures in the total failure rate, the lower the SFF.
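As a minimal sketch, the formula translates directly into Python. The function name `sff` and the failure rates in the loop are illustrative assumptions, not data from a real device.

```python
def sff(sd: float, su: float, dd: float, du: float) -> float:
    """Safe Failure Fraction as a fraction (0.0 to 1.0).

    All arguments are failure rates in the same unit, for example FIT.
    """
    total = sd + su + dd + du
    if total <= 0:
        raise ValueError("total failure rate must be positive")
    return (sd + su + dd) / total


# Hypothetical device: 400 FIT of safe undetected failures, no diagnostics.
# Only the DU rate changes, and the SFF moves with it.
for du in (0, 50, 200, 800):
    print(f"DU = {du:>3} FIT -> SFF = {sff(0, 400, 0, du):.0%}")
```

With everything else held constant, the SFF drops as the DU rate grows, and it only reaches 100% when the DU rate is zero.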

When the manufacturer can design the device so that there are no DU failures, the device has an SFF of 100%. Since a high SFF allows you to claim a high SIL, there is a lot of pressure “to ensure” that a device has a high SFF. But there are no devices on the market with zero DU failures and thus an SFF of 100%.

A device manufacturer can get a high SFF for the device in two ways. Option one is to make a fail-safe design. A fail-safe design means many failures inside the device go to the safe state. In other words, these failures are Safe failures (SD or SU failures). When a device is 100% fail-safe, and thus only has Safe failures, the SFF is also 100%. But, there are also no devices on the market that only have safe failures.

Unfortunately, there are always dangerous failures inside a device. The only option a device manufacturer has to improve the SFF when there are too many Dangerous Undetected failures is to improve the device's diagnostics. This is the big advantage Type B devices have. They are based on electronics and thus can have a lot of diagnostics built in. Type A devices do not have diagnostics in terms of the standard. They can only depend on periodic proof testing. The only problem is that periodic proof testing cannot be used to claim a failure to be “detected”.

Many people do not realise that the SFF is a design parameter, not an operational one. What we mean by this is that once the manufacturer has finished the device's design, the SFF of that device is determined and fixed. Changing (read: improving) the SFF once the device is installed in the field is almost impossible. For most devices, end users have no control over the SFF. They get what they get.

SIL 3 Mechanical Devices, Really?

There are many mechanical devices on the market for which the manufacturers claim SIL 3. For a single Type A device, this means the device must have an SFF of over 90%. Can this really be done? Most of the time, NOT!

Mechanical devices do not have built-in diagnostics like programmable electronic devices can have. In other words, there are no Safe Detected and no Dangerous Detected failures by default. Only Safe Undetected and Dangerous Undetected failures.

Let's take an example of a typical mechanical device (relay, solenoid valve, ball valve, etc.; it does not matter which). Let's assume the device has the following failure rates:

SD = 0 FIT
SU = 436 FIT
DD = 0 FIT
DU = 244 FIT

This leads to an SFF of 436 / (436 + 244) = 64%. Being a Type A device with an HFT of zero, we can see from the table above that this device could claim SIL 2. You cannot achieve SIL 3 with this device unless you increase the HFT to 1, that is, unless you add redundancy, for example in a 1oo2 architecture.

What if I could build an automated diagnostic test for this device? Can I not turn undetected failures into detected failures? Of course, you can. Let's assume we build some system which allows us to test our Type A device automatically. We turn undetected failures into detected failures. Of course, no test is perfect, so there will always be some undetected failures. Let us assume our automated test is very effective, and our failure rates now look like this:

SD = 0 FIT
SU = 436 FIT
DD = 189 FIT
DU = 55 FIT

This leads to an SFF of (436 + 189) / (436 + 189 + 55) = 92%. Fantastic, this means you can now claim SIL 3 for a single device. Sounds great, but there is a catch.

IEC 61508 has rules for diagnostic tests, and one of the rules is that the test needs to be frequent enough. If the test is not run frequently enough, it cannot be claimed as a diagnostic test. And if it is not a diagnostic test, we cannot use it to turn DU failures into DD failures; thus, we cannot improve the SFF. So this is a critical point that end users need to consider.

When Is a Test Frequent Enough to Be a Diagnostic Test?

Whether a test interval is frequent enough to be a diagnostic test depends on the HFT and the demand mode of the function. IEC 61508 has the following rules:

For subsystems with HFT=0, like an individual valve:
- In low demand mode, the diagnostic test interval plus the time to repair a detected failure must be less than the MTTR assumed in the calculations;
- In high demand or continuous mode, the diagnostic test interval plus the time needed to go to a safe state must be less than the process safety time;
- In high demand mode, the diagnostic test rate must be at least 100 times the demand rate.
For subsystems with HFT≥1, for example 1oo2 valves:
- The diagnostic test interval plus the time to repair a detected failure must be less than the MTTR assumed in the calculations.
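The rules above can be sketched as a single check. This is an illustration under stated assumptions, not a compliance tool: the function name and parameters are hypothetical, all times are in hours, and the two rules for high demand mode are treated as alternatives (either one is enough), which is my reading of the standard.

```python
from typing import Optional


def qualifies_as_diagnostic(
    hft: int,
    mode: str,                              # "low", "high" or "continuous"
    interval_h: float,                      # diagnostic test interval
    repair_h: float = 0.0,                  # time to repair a detected failure
    mttr_h: Optional[float] = None,         # MTTR assumed in the calculations
    reaction_h: float = 0.0,                # time to go to the safe state
    pst_h: Optional[float] = None,          # process safety time
    demands_per_h: Optional[float] = None,  # demand rate
) -> bool:
    """Return True if the test interval is short enough to claim diagnostics."""
    if interval_h <= 0:
        raise ValueError("test interval must be positive")
    if mode not in ("low", "high", "continuous"):
        raise ValueError("unknown demand mode")

    # HFT >= 1, or HFT = 0 in low demand mode: compare against the MTTR.
    if hft >= 1 or mode == "low":
        return mttr_h is not None and interval_h + repair_h < mttr_h

    # HFT = 0 in high demand or continuous mode: compare against the PST.
    within_pst = pst_h is not None and interval_h + reaction_h < pst_h

    # HFT = 0 in high demand mode only: test rate at least 100x demand rate.
    # ASSUMPTION: this is an alternative to the PST rule, not an extra one.
    fast_vs_demand = (
        mode == "high"
        and demands_per_h is not None
        and (1.0 / interval_h) >= 100.0 * demands_per_h
    )
    return within_pst or fast_vs_demand


# A yearly test on a single valve with an assumed MTTR of 24 hours.
print(qualifies_as_diagnostic(0, "low", 8760, repair_h=8, mttr_h=24))
```

A missing input (no MTTR, no process safety time) makes the check fail, on the reasoning that credit should not be taken when the supporting number is unknown.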

Does Process Safety Time or MTTR Kill the SIL Level of Most Mechanical Devices?

The achievable SIL level of a single mechanical Type A device depends on the SFF. The SFF is higher when there are fewer DU failures. To turn DU failures into DD failures, we can build in a diagnostic test. A test needs to be frequent enough to count as a diagnostic test. For low-demand mode functions, the diagnostic test interval plus the action to go to the safe state must be shorter than the process safety time. What is the process safety time, and why does it not favour the SIL level?

Process safety time is the period between a failure occurring in the process or the basic process control system (with the potential to give rise to a hazardous event) and the occurrence of the hazardous event if the SIF is not performed. In many process installations, the process safety time ranges from less than 1 second to a few minutes, and only exceptionally reaches a few hours. Consider now the IEC 61508 rules about the diagnostic test interval above, and you will realise that process safety time is critical.

In our last example, we claimed an SFF of 92% because we assumed an automatic test would reveal those failures. In practice, this automated test is a proof test, and proof testing of low-demand mode functions is done once per year. If the test interval is one year and the action to go to the safe state takes 10 seconds, then the process safety time must be longer than one year plus 10 seconds. Which end user has process safety times of more than a year? Nobody.

In practice, the process safety time is a few seconds, minutes or hours. Let's assume the process safety time is 1 hour. With an action to go to the safe state of 10 seconds, the test interval must be shorter than 1 hour minus 10 seconds. For low-demand mode functions, the proof test interval is usually around one year, not a matter of hours. In other words, this automated proof test needs to run far more often. And if it does not run often enough, we cannot claim the proof test as a diagnostic test, and thus we cannot improve the SFF by turning DU failures into DD failures. In other words, our 92% SFF is, in reality, 64%, and thus our mechanical device is not SIL 3 but SIL 2.
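The example can be replayed with the numbers from this article. The sketch below is illustrative only: the helper names are hypothetical, the SIL mapping covers just a single Type A device (HFT=0), and the SIL 1 value for an SFF below 60% is my reading of the standard rather than something stated in this article.

```python
HOURS_PER_YEAR = 8760


def sff(sd, su, dd, du):
    """Safe Failure Fraction as a fraction (0.0 to 1.0)."""
    return (sd + su + dd) / (sd + su + dd + du)


def effective_rates(su, du, covered_fit, interval_h, reaction_h, pst_h):
    """Move covered_fit from DU to DD only if the test is fast enough.

    The test is credited when test interval + reaction time < process
    safety time; otherwise every dangerous failure stays undetected.
    """
    credited = covered_fit if interval_h + reaction_h < pst_h else 0
    return {"sd": 0, "su": su, "dd": credited, "du": du - credited}


def max_sil_type_a_hft0(sff_value):
    """Highest SIL for a single Type A device (ASSUMPTION: per IEC 61508-2)."""
    if sff_value < 0.60:
        return 1
    if sff_value < 0.90:
        return 2
    return 3


REACTION_H = 10 / 3600  # 10 seconds to go to the safe state
PST_H = 1.0             # process safety time of 1 hour

# Compare a yearly proof test with a test that runs every half hour.
for interval_h in (HOURS_PER_YEAR, 0.5):
    rates = effective_rates(436, 244, 189, interval_h, REACTION_H, PST_H)
    value = sff(**rates)
    print(f"interval {interval_h} h: SFF {value:.0%}, "
          f"SIL {max_sil_type_a_hft0(value)}")
```

Only the half-hour test earns the credit for the 189 FIT of detected failures. The yearly test leaves the device at its original failure rates.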

SIL 3 Mechanical Devices, Not Really

The process safety time, the diagnostic test interval, and the claimed MTTR are the reasons why most “SIL 3” mechanical devices are really SIL 1 or SIL 2. End users should be careful with device manufacturers claiming SIL 3 for single devices according to IEC 61508. The claim only works if you have a test interval that meets the IEC 61508 rules for diagnostic testing. Which, most likely, you have not!

But We Apply IEC 61511

That makes, unfortunately for you, little difference. IEC 61511 states that new devices should be developed according to IEC 61508.
