We often get the question whether HFT (hardware fault tolerance) is the same as redundancy. The answer is no. In this blog post I will explain why not. To do that we need to understand three terms: redundancy, HFT and voting.
In the technical world everybody seems to know the word redundancy and yet it can be very confusing, especially when you try to express it as a number, i.e., how redundant is a design? So what is redundancy? Redundancy can be defined as a property of a system function that is designed in such a way that there are multiple means (parts, components, devices, software, etc.) to carry out the function, so that the function will not fail if one or more of these means fail. Redundancy is not determined by the number of similar parts or devices you see. Whether there is redundancy or not is determined solely by the function that you carry out with these parts or devices. Take a look at the following pictures. You see two valves. Is this redundancy or not?
Well, that depends on the function that is being carried out with these two valves. If the function is to stop the flow upon demand and both valves are open during normal operation, then only one valve needs to close to stop the flow. In other words, if one valve is stuck open (a dangerous failure) the function will still work because the other valve can close. This is redundancy, and it is a so-called 1oo2 architecture.
If, on the other hand, the function is to open the flow and both valves are closed during normal operation, then both valves need to open in order to start the flow. If one valve is stuck closed (in this case also a dangerous failure) the function cannot be carried out, even if the other valve opens. This is not redundancy, and the valves are in a so-called 2oo2 architecture.
In the first case we are redundant, but how redundant? Some cultures simply call it redundant, others say it is two redundant, but the correct way to express it is one redundant. The reason is that one valve is needed to stop the flow and there is one additional valve in case the other fails.
HFT and Voting
In the functional safety business we use the term HFT to express whether we have redundancy or not. When a design has an HFT of X it means that it can tolerate X dangerous failures and still work. With X+1 dangerous failures it does not work any more. HFT can easily be calculated if the architecture is known, i.e., 1oo1, 1oo2, 2oo3, etc. If the architecture is expressed as MooN, then the HFT is calculated as N – M. In other words, a 2oo4 architecture has an HFT of 4 – 2 = 2. This means it can tolerate 2 dangerous failures and still work, and thus it is an architecture with redundancy. But how redundant is it? Let's explore this.
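The N – M rule is easy to put into a few lines of code. The following is only a minimal sketch in Python; the function name hft and the way the "MooN" string is parsed are my own assumptions for illustration and do not come from any standard or tool.

```python
import re


def hft(architecture: str) -> int:
    """Return the hardware fault tolerance N - M of an 'MooN' architecture.

    Hypothetical helper for illustration: 'architecture' is a string
    such as '1oo2' or '2oo4'.
    """
    match = re.fullmatch(r"(\d+)oo(\d+)", architecture)
    if match is None:
        raise ValueError(f"not an MooN architecture: {architecture!r}")
    m, n = int(match.group(1)), int(match.group(2))
    if not 1 <= m <= n:
        raise ValueError(f"need 1 <= M <= N, got M={m}, N={n}")
    return n - m


if __name__ == "__main__":
    for arch in ("1oo1", "1oo2", "2oo2", "2oo3", "2oo4"):
        print(arch, "HFT =", hft(arch))
    # 1oo1 HFT = 0
    # 1oo2 HFT = 1
    # 2oo2 HFT = 0
    # 2oo3 HFT = 1
    # 2oo4 HFT = 2
```

Note that the sketch only tells you how many dangerous failures the architecture tolerates. It says nothing yet about how many ways are left to carry out the function.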
A 1oo1 architecture has an HFT of 0; it can tolerate 0 failures and has no (zero) redundancy. A 2oo2 architecture also has an HFT of 0 and can tolerate 0 failures. It has no redundancy, yet it consists of two devices. The explanation is voting. Voting is defined as the number of paths that must work out of the total number of paths available. A 2oo2 has two paths available, but both paths need to work. If one path fails, the function does not work any more, even if the other path is available. Hence a 2oo2 has no redundancy. So just because you see two valves does not mean you have redundancy. You need to know how much voting is needed.
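To make the voting idea concrete, here is a small Python sketch. It assumes each path is simply either working or failed, and that the function is available as long as at least M paths work. The name function_works is hypothetical and only used for this illustration.

```python
from typing import Sequence


def function_works(m: int, path_ok: Sequence[bool]) -> bool:
    """True if at least m of the available paths are still working.

    'path_ok' holds one entry per path: True = working, False = failed.
    """
    return sum(path_ok) >= m


# Two valves, one of them has failed dangerously.
one_valve_failed = [True, False]

# 1oo2: one working path is enough, so the function survives.
print(function_works(1, one_valve_failed))  # True

# 2oo2: both paths must work, so one failure defeats the function.
print(function_works(2, one_valve_failed))  # False
```

The same two devices with the same single failure give a different answer purely because of the voting, which is the point made above.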
So how does this work out for the most popular architectures in the safety industry? See the table below, which gives an overview.
Notice anything special? Yes, HFT looks like it is equal to redundancy, but suddenly with 2oo4 it goes wrong. That automatically means that hardware fault tolerance is not a measure of redundancy. It is not the same. If HFT is larger than zero you know you have redundancy, but you do not know how much.
Let's assume we have four transmitters A, B, C, D in a 2oo4 architecture. This means we have the following options to carry out our desired function: AB, AC, AD, BC, BD, CD. To know how redundant we are, we need to know how many options are left once a device has failed. If one transmitter fails, does the function still work? If it does, we have redundancy. Let's assume A fails; then we have the following options left:
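BC, BD, CD. The options AB, AC and AD are gone because they all need A. In other words, after one failure we have three options left and thus we are three redundant after one failure. If a second transmitter fails, say B, only CD is left, so we are one redundant after two failures.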
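The counting above can be reproduced with a short Python sketch. It assumes that an "option" is any group of M healthy devices, as in the 2oo4 example. The function name remaining_options is made up for this illustration.

```python
from itertools import combinations
from typing import Iterable, List


def remaining_options(m: int, devices: Iterable[str], failed: Iterable[str]) -> List[str]:
    """List every group of m healthy devices that can still carry out the function."""
    failed_set = set(failed)
    healthy = [d for d in devices if d not in failed_set]
    return ["".join(group) for group in combinations(healthy, m)]


# 2oo4 with transmitters A, B, C, D
print(remaining_options(2, "ABCD", ""))     # ['AB', 'AC', 'AD', 'BC', 'BD', 'CD']
print(remaining_options(2, "ABCD", "A"))    # ['BC', 'BD', 'CD']
print(remaining_options(2, "ABCD", "AB"))   # ['CD']
print(remaining_options(2, "ABCD", "ABC"))  # []
```

With three failures no option is left, which matches an HFT of 2: the architecture tolerates two dangerous failures and fails on the third.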
My preference is to use HFT instead of trying to count redundancy. It better expresses what we want to know in the first place: how many dangerous failures the design can tolerate and still work. I hope this demystified the terms redundancy, HFT and voting for you.