SIL 3 Mechanical Devices, Really?

Michel Houtermans 2016-10-01

There are many manufacturers of mechanical devices (valves, actuators, etc.) that claim that their devices can be used in SIL 3 applications. In this post I am going to explain why most mechanical devices will actually never achieve SIL 3. To reach this conclusion I first need to explain a few basic concepts: architectural constraints, Type, HFT and SFF. Yes, sorry about that. It is the only way to understand that not every claimed SIL 3 is really SIL 3.

IEC 61508 does not allow you to just design any architecture. What IEC 61508 does allow is summarised in the architectural constraints table below. This table represents one of around a thousand SIL requirements that exist in the IEC 61508 standard. When the Type, the Hardware Fault Tolerance (HFT) and the Safe Failure Fraction (SFF) are known, this table tells you which SIL can potentially be achieved. Keep in mind that in order to really meet that SIL, many other SIL requirements need to be addressed as well.

[Table: Architectural Constraints according to IEC 61508]

The IEC 61508 standard distinguishes between Type A and Type B devices. Type A devices are typically mechanical devices like actuators, valves, solenoid valves, relays and so on. They are considered to be “simple” devices compared to Type B devices, which are “complex”. Any device with software inside is a Type B device as it is based on complex electronics. Examples are smart transmitters and programmable electronic logic solvers.

A single Type A device, like a valve, has a HFT of zero. From the architectural constraints table you can see that you can claim a higher SIL level for a single device (HFT=0) when the SFF of that device is higher. What you can also see is that it is “easier” to achieve a higher SIL level with a Type A device than with a Type B device. A Type A device with a SFF between 90% and 99% achieves SIL 3, while a Type B device in the same SFF range only achieves SIL 2. By the way, this does not mean that it is also easier for a Type A device to actually achieve a SFF of over 90%.
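The table lookup described above can be sketched in a few lines of Python. This is only an illustration: the table values are an assumption, transcribed from the architectural constraints tables of IEC 61508-2, and should be checked against the standard itself. The names `SIL_TABLE`, `sff_band` and `max_sil` are made up for this sketch.

```python
# Maximum SIL allowed by the architectural constraints, by SFF band and HFT.
# ASSUMPTION: values transcribed from IEC 61508-2 (Tables 2 and 3); verify
# against the standard before relying on them. 0 means "not allowed".
SIL_TABLE = {
    "A": [          # columns: HFT 0, 1, 2
        (1, 2, 3),  # SFF < 60%
        (2, 3, 4),  # 60% <= SFF < 90%
        (3, 4, 4),  # 90% <= SFF < 99%
        (3, 4, 4),  # SFF >= 99%
    ],
    "B": [
        (0, 1, 2),  # SFF < 60%
        (1, 2, 3),  # 60% <= SFF < 90%
        (2, 3, 4),  # 90% <= SFF < 99%
        (3, 4, 4),  # SFF >= 99%
    ],
}


def sff_band(sff):
    """Return the table row for an SFF given as a fraction (0.0 to 1.0)."""
    if sff < 0.60:
        return 0
    if sff < 0.90:
        return 1
    if sff < 0.99:
        return 2
    return 3


def max_sil(device_type, hft, sff):
    """Look up the highest SIL the architectural constraints allow."""
    column = min(hft, 2)  # the table stops at HFT = 2
    return SIL_TABLE[device_type][sff_band(sff)][column]


print(max_sil("A", 0, 0.92))  # single Type A device, SFF 92%
print(max_sil("B", 0, 0.92))  # single Type B device, SFF 92%
```

Under these assumed table values, the first call gives 3 and the second gives 2, which matches the Type A versus Type B comparison in the text.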

What Is Safe Failure Fraction And Why Is It Important?

The SFF measures the effectiveness of the fail safe design and/or the diagnostics built into that device. The SFF of a device is important as it is one of the factors that will decide which SIL can be claimed for this device. As we see in the table above, the higher the SFF the higher the SIL level that can be claimed. Each device has a SFF which is presented as a percentage (0-100%). The SFF is calculated using the following formula:

SFF = (SD + SU + DD) / (SD + SU + DD + DU)

In order to calculate the SFF you need the Safe Detected (SD), Safe Undetected (SU), Dangerous Detected (DD) and Dangerous Undetected (DU) failure rates. Normally the manufacturer of the device derives these failure rates using the FMEDA technique. From this formula it becomes clear that the SFF of a device is driven by the share of DU failures in the total failure rate: the smaller that share, the higher the SFF.
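As a minimal sketch, the formula above translates directly into a small Python function. The function name and the example failure rates are hypothetical and only serve to show the arithmetic. Rates are given in FIT (1 FIT is one failure per 10^9 hours), but any consistent unit works because the SFF is a ratio.

```python
def safe_failure_fraction(sd, su, dd, du):
    """SFF = (SD + SU + DD) / (SD + SU + DD + DU), returned as a fraction."""
    total = sd + su + dd + du
    if total <= 0:
        raise ValueError("total failure rate must be positive")
    return (sd + su + dd) / total


# Hypothetical failure rates in FIT, chosen only to illustrate the formula.
sff = safe_failure_fraction(sd=100, su=200, dd=600, du=100)
print(f"SFF = {sff:.0%}")  # 900 / 1000
```

Only the DU term is missing from the numerator, so every FIT that moves from DU into one of the other three categories raises the SFF.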

When the manufacturer can design the device in such a way that there are no DU failures, then the device has a SFF of 100%. Since a high SFF allows you to claim a high SIL, there is a lot of pressure “to make sure” that a device has a high SFF. But there are no devices on the market with zero DU failures and thus a SFF of 100%.

There are two ways a device manufacturer can get a high SFF for the device. Option one is to make a fail-safe design. A fail-safe design means that many failures inside the device go to the safe state. In other words, these failures are Safe failures (SD or SU failures). When a device is 100% fail safe, and thus only has Safe failures, the SFF is also 100%. But there are also no devices on the market that only have safe failures.

Unfortunately there are always dangerous failures inside a device. Option two, and the only option a device manufacturer has left when there are too many Dangerous Undetected failures, is to improve the diagnostics of that device. This is the big advantage Type B devices have. They are based on electronics and thus can have a lot of diagnostics built in. Type A devices do not have diagnostics in terms of the standard. They can only depend on periodic proof testing. The problem is that periodic proof testing cannot be used to claim a failure to be “detected”.

What many people do not realise is that the SFF is a design parameter and not an operational parameter. What we mean by this is that once the manufacturer has finished the design of the device, the SFF of that device is determined and fixed. It is almost impossible to change (read: improve) the SFF once the device is installed in the field. For most devices, end users have no control over the SFF. They get what they get.

SIL 3 Mechanical Devices, Really?

There are many mechanical devices on the market for which the manufacturers claim SIL 3. For a single Type A device (HFT=0), this means the device must have a SFF of over 90%. Can this really be done? Most of the time NOT!

Mechanical devices do not have built-in diagnostics like programmable electronic devices can have. In other words, by default there are no Safe Detected and no Dangerous Detected failures, only Safe Undetected and Dangerous Undetected failures.

Let's take an example of a typical mechanical device (relay, solenoid valve, ball valve, etc.; it does not matter which). Let's assume the valve has the following failure rates:

  • SD = 0 FIT
  • SU = 436 FIT
  • DD = 0 FIT
  • DU = 244 FIT

This leads to a SFF of 436 / (436 + 244) = 64%. Being a Type A device with a HFT of zero, we can see from the table above that this device could claim SIL 2. You cannot achieve SIL 3 with this device unless you increase the HFT to 1 and thus build redundancy, for example 1oo2.

What if I could build an automated diagnostic test for this device? Can I not turn undetected failures into detected failures? Of course you can. Let's assume we build some kind of system which allows us to automatically test our Type A device. We turn undetected failures into detected failures. Of course no test is perfect, so there will always be some undetected failures left. Let us assume our automated test is very effective and our failure rates now look like this:

  • SD = 0 FIT
  • SU = 436 FIT
  • DD = 189 FIT
  • DU = 55 FIT

This leads to a SFF of (436 + 189) / (436 + 189 + 55) = 92%. Fantastic, this means you can now claim SIL 3 for a single device. Sounds great, but there is a catch.
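To see how much work the automated test has to do in this example, the SFF formula can be turned around: given the SU rate and the total dangerous rate, how many FIT must be detected to reach a 90% SFF? The sketch below uses the example's numbers. The helper name `required_dd` is made up for this illustration.

```python
def required_dd(su, dangerous_total, target_sff, sd=0.0):
    """Smallest DD rate (same unit as the inputs) that reaches target_sff."""
    total = sd + su + dangerous_total
    max_du = (1.0 - target_sff) * total  # DU rate allowed at the target SFF
    return max(0.0, dangerous_total - max_du)


su, dangerous = 436.0, 244.0  # FIT, from the example above
dd_needed = required_dd(su, dangerous, target_sff=0.90)

print(f"DD needed for 90% SFF: about {dd_needed:.0f} FIT")
print(f"Coverage needed:       about {dd_needed / dangerous:.0%}")
print(f"Coverage in example:   about {189 / dangerous:.0%}")
```

With these numbers, roughly 176 of the 244 dangerous FIT (about 72%) must be detected, and the example's 189 FIT corresponds to about 77% coverage. That is a lot to ask of a test on a mechanical device, which is why the validity of that test matters so much.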

IEC 61508 has rules for diagnostic tests, and one of the rules is that the test needs to be frequent enough. If the test frequency is not high enough, it cannot be claimed as a diagnostic test. And if it is not a diagnostic test, we cannot use it to turn DU failures into DD failures and thus we cannot improve the SFF. So this is a very critical point that end users need to take into account.

When Is A Test Frequent Enough To Be A Diagnostic Test?

Whether a test interval is frequent enough to be a diagnostic test depends on the HFT and the demand mode of the function. IEC 61508 has the following rules:

  • For subsystems with HFT=0, like an individual valve:
    • When the mode is low demand mode, the diagnostic test interval plus the time to repair detected failures must be less than the MTTR assumed in the calculations;
    • When the mode is high demand or continuous mode, the diagnostic test interval plus the time for the action to go to the safe state must be shorter than the process safety time;
    • When the mode is high demand, the diagnostic test rate must be at least 100x the demand rate.
  • For subsystems with HFT≥1, for example 1oo2 valves:
    • The diagnostic test interval plus the time to repair detected failures must be less than the MTTR assumed in the calculations.
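The rules above can be written down as a small check. This is a sketch under stated assumptions, not a substitute for the standard: the function and parameter names are made up, all times are in hours, and the two high demand criteria for HFT=0 are treated as alternatives (meeting either one is enough), which is an assumption about how the list should be read.

```python
def counts_as_diagnostic(hft, mode, test_interval_h, *,
                         repair_time_h=0.0, mttr_h=None,
                         action_time_h=0.0, process_safety_time_h=None,
                         demand_interval_h=None):
    """Return True if the test is frequent enough to count as a diagnostic.

    mode is "low", "high" or "continuous"; all times are in hours.
    """
    if mode not in ("low", "high", "continuous"):
        raise ValueError(f"unknown demand mode: {mode!r}")

    # HFT >= 1, or HFT = 0 in low demand mode: compare against assumed MTTR.
    if hft >= 1 or mode == "low":
        if mttr_h is None:
            raise ValueError("mttr_h is required for this case")
        return test_interval_h + repair_time_h < mttr_h

    # HFT = 0 in high demand or continuous mode: process safety time rule.
    if (process_safety_time_h is not None
            and test_interval_h + action_time_h < process_safety_time_h):
        return True
    # HFT = 0 in high demand mode: test rate at least 100x the demand rate
    # (ASSUMPTION: treated as an alternative to the rule above).
    if mode == "high" and demand_interval_h is not None:
        return demand_interval_h / test_interval_h >= 100
    return False


HOURS_PER_YEAR = 8760
# Hypothetical single valve in low demand mode, 8 h repair, 24 h assumed MTTR.
print(counts_as_diagnostic(0, "low", HOURS_PER_YEAR, repair_time_h=8, mttr_h=24))
print(counts_as_diagnostic(0, "low", 1, repair_time_h=8, mttr_h=24))
```

With these hypothetical numbers the yearly test fails the check and the hourly test passes it. The 8 hour repair time and 24 hour MTTR are illustrative values only.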

Does Process Safety Time or MTTR Kill The SIL Level Of Most Mechanical Devices?

The achievable SIL level of a single mechanical, Type A device depends on the SFF. The SFF is higher when there are fewer DU failures. In order to turn DU failures into DD failures we can build in diagnostic tests. In order for a test to be a diagnostic test it needs to be frequent enough. Depending on the demand mode (see the rules above), the diagnostic test interval plus the time to act on a detected failure must be shorter than either the assumed MTTR or the process safety time. What is the process safety time, and why does it not work in favour of the SIL level?

The process safety time is the time period between a failure occurring in the process or the basic process control system (with the potential to give rise to a hazardous event) and the occurrence of the hazardous event if the SIF is not performed. In many process installations the process safety time ranges from less than 1 second to a few seconds or a few minutes, and only exceptionally is it a few hours. Consider now the IEC 61508 rules about the diagnostic test interval above and you realise that the process safety time is a critical factor.

In our last example we claimed a SFF of 92% because we assumed an automatic test would reveal those failures. In practice this automated test is a proof test, and proof testing of low demand mode functions is done, maybe, once per year. If the test interval is 1 year and the action to go to the safe state takes 10 seconds, then the process safety time must be longer than 1 year plus 10 seconds. Which end user has process safety times of more than a year? Nobody.

In practice the process safety time is more like a few seconds, minutes or hours. Let's assume the process safety time is 1 hour. With an action to go to the safe state of 10 seconds, this means the test interval must be shorter than 1 hour minus 10 seconds. For low demand mode functions the proof test interval is usually around 1 year, not a matter of hours. In other words, this automated proof test is not fast enough. And if it is not fast enough, then we cannot claim the proof test as a diagnostic test and thus we cannot improve the SFF by turning DU failures into DD failures. In other words, our 92% SFF is in reality 64% and thus our mechanical device is not SIL 3 but SIL 2.
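The timing in this example is easy to check with Python's standard `datetime.timedelta`. The rule is that the test interval plus the action time must fit inside the process safety time, so the longest allowed test interval is the process safety time minus the action time. The variable names are only for illustration.

```python
from datetime import timedelta

process_safety_time = timedelta(hours=1)    # assumed in the example
action_time = timedelta(seconds=10)         # time to reach the safe state
proof_test_interval = timedelta(days=365)   # "maybe, once per year"

# Longest test interval that still fits inside the process safety time.
max_test_interval = process_safety_time - action_time
print(max_test_interval)  # 0:59:50

# How many times too slow the yearly proof test is.
factor = proof_test_interval / max_test_interval
print(f"Yearly test is roughly {factor:,.0f} times too slow")
```

The yearly proof test misses the requirement by a factor of several thousand, so it is nowhere near qualifying as a diagnostic test.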

SIL 3 Mechanical Devices, Not Really

The process safety time, the diagnostic test interval and the claimed MTTR are the reasons why most SIL 3 mechanical devices are really SIL 1 or 2. End users, be careful with device manufacturers claiming SIL 3 for single devices according to IEC 61508. It only works if you have a test interval that meets the IEC 61508 rules for diagnostic testing. Which most likely you do not!

But We Apply IEC 61511

Unfortunately, that does not make much difference for you. IEC 61511 is very clear that new devices should be developed according to IEC 61508.
