Redundancy, HFT, and Voting
2014-08-16
We are often asked whether HFT (hardware fault tolerance) is the same as redundancy. The answer is no, and in this blog post I will explain why not. To do that, we first need to understand three terms: redundancy, HFT, and voting.
REDUNDANCY
In the technical world, everybody seems to know the word redundancy, and yet it can be very confusing, especially when you try to express it as a number: how redundant is a design? So what is redundancy? Redundancy means that a system function is designed in such a way that there are multiple means (parts, components, devices, software, etc.) to carry it out, so that the function does not fail if one or more of these means fail. Redundancy is not determined by the number of similar parts or devices you see. Whether there is redundancy or not is determined solely by the function that you carry out with these parts or devices. Take a look at the following pictures. You see two valves. Is this redundant or not?
Is this redundant or not?
Well, that depends on the function that is carried out with these two valves. If the function is to stop the flow on demand and both valves are open during normal operation, then only one valve needs to close to stop the flow. In other words, if one valve is stuck open (a dangerous failure), the function still works because the other valve can close. This is redundancy, and it is a so-called 1oo2 (one-out-of-two) architecture.
If, on the other hand, the function is to open the flow and both valves are closed during normal operation, then both valves need to open to start the flow. If one valve is stuck closed (in this case also a dangerous failure), the function cannot be carried out, even if the other valve opens. This is not redundancy; the valves are in a so-called 2oo2 (two-out-of-two) architecture.
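The two valve cases can be sketched in a few lines. This is a minimal model, assuming the two valves sit in series in one line (so flow passes only when every valve is open) and that a failed valve simply stays in its normal-operation position. The function names `flow_passes`, `stop_flow_succeeds`, and `start_flow_succeeds` are hypothetical, chosen only for illustration.

```python
# Hypothetical model of the two valve cases.
# Assumption: the valves are in series, so flow passes only if every valve is open.
# A dangerous failure leaves a valve stuck in its normal-operation position.

def flow_passes(valve_open: list[bool]) -> bool:
    """Flow passes through series valves only when all of them are open."""
    return all(valve_open)

def stop_flow_succeeds(stuck_open: list[bool]) -> bool:
    """Demand: stop the flow. Valves are normally open; a healthy valve closes,
    a failed valve stays stuck open, so 'still open' equals 'failed'."""
    still_open = list(stuck_open)
    return not flow_passes(still_open)

def start_flow_succeeds(stuck_closed: list[bool]) -> bool:
    """Demand: start the flow. Valves are normally closed; a healthy valve opens,
    a failed valve stays stuck closed."""
    now_open = [not stuck for stuck in stuck_closed]
    return flow_passes(now_open)

# 1oo2 (stop the flow), one valve stuck open: the function still works
print(stop_flow_succeeds([True, False]))   # True
# 2oo2 (start the flow), one valve stuck closed: the function fails
print(start_flow_succeeds([True, False]))  # False
```

The same pair of physical valves gives a redundant result for one function and a non-redundant result for the other, which is exactly why the function, not the hardware count, decides.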
In the first case we have redundancy, but how much? Some cultures call it redundant, others say it is two redundant, but the correct way to express it is one redundant: one valve is needed to stop the flow, and there is one additional valve in case the first one fails.
HFT AND VOTING
In the functional safety business, we use the term HFT to express whether we have redundancy or not. A design with an HFT of X can tolerate X dangerous failures and still work; with X+1 dangerous failures, it no longer works. HFT is easy to calculate if the architecture is known (1oo1, 1oo2, 2oo3, etc.). If the architecture is expressed as MooN, then HFT = N − M. For example, a 2oo4 architecture has an HFT of 2. This means it can tolerate two failures and still work, so it is an architecture with redundancy. But how redundant is it? Let's explore this.
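The HFT = N − M rule translates directly into code. A minimal sketch, assuming architectures are written as strings such as `"2oo4"`; the helpers `parse_moon` and `hft` are hypothetical names.

```python
def parse_moon(arch: str) -> tuple[int, int]:
    """Parse an MooN label such as '2oo4' into the pair (M, N)."""
    m, n = arch.split("oo")
    return int(m), int(n)

def hft(m: int, n: int) -> int:
    """Hardware fault tolerance of an MooN architecture: HFT = N - M."""
    if not 1 <= m <= n:
        raise ValueError("MooN requires 1 <= M <= N")
    return n - m

for arch in ["1oo1", "1oo2", "2oo2", "2oo3", "2oo4", "3oo3"]:
    print(f"{arch}: HFT = {hft(*parse_moon(arch))}")
```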
A 1oo1 architecture has HFT = 0: it can tolerate zero failures and has zero redundancy. A 2oo2 architecture also has HFT = 0: it can tolerate zero failures and has zero redundancy, yet it consists of two devices. The reason is voting. Voting is the number of paths that must work out of the total number of paths available. A 2oo2 has two paths available, and both need to work. If one path fails, the function no longer works, even though the other path is still available. Hence a 2oo2 has no redundancy. So just because you see two valves does not mean you have redundancy; you also need to know the voting.
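Voting can be expressed as a single check: an MooN arrangement works when at least M of its N paths work. A minimal sketch, with the function name `moon_works` chosen for illustration.

```python
def moon_works(m: int, path_ok: list[bool]) -> bool:
    """An MooN arrangement carries out its function when at least M of its
    N paths still work (N = len(path_ok))."""
    return sum(path_ok) >= m

# 2oo2: two paths available, both must work
print(moon_works(2, [True, True]))   # True
print(moon_works(2, [True, False]))  # False: a single failure defeats it
# 1oo2: two paths available, one is enough
print(moon_works(1, [True, False]))  # True
```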
So how does this work for the most popular architectures in the safety industry? The table below gives an overview.
Architecture | Voting | HFT | Redundancy |
---|---|---|---|
1oo1 | 1 | 0 | 0 |
1oo2 | 1 | 1 | 1 |
2oo2 | 2 | 0 | 0 |
2oo3 | 2 | 1 | 1 |
2oo4 | 2 | 2 | 3 |
3oo3 | 3 | 0 | 0 |
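The HFT column in the table above can be cross-checked without the formula, by trying every possible set of failed paths and finding the largest number of failures that the function always survives. This is a brute-force sketch; `tolerated_failures` is a hypothetical helper, and it reuses the voting rule "at least M of N paths must work".

```python
from itertools import combinations

def moon_works(m: int, path_ok: list[bool]) -> bool:
    """The MooN function works when at least M of the N paths work."""
    return sum(path_ok) >= m

def tolerated_failures(m: int, n: int) -> int:
    """Largest k such that the function still works for EVERY possible set of
    k failed paths, i.e. HFT found by exhaustive search instead of N - M."""
    tolerated = -1
    for k in range(n + 1):
        survives_all = all(
            moon_works(m, [i not in failed for i in range(n)])
            for failed in combinations(range(n), k)
        )
        if not survives_all:
            break
        tolerated = k
    return tolerated

# Architectures from the table, as (M, N)
table = {"1oo1": (1, 1), "1oo2": (1, 2), "2oo2": (2, 2),
         "2oo3": (2, 3), "2oo4": (2, 4), "3oo3": (3, 3)}

for arch, (m, n) in table.items():
    print(f"{arch}: HFT by search = {tolerated_failures(m, n)}, N - M = {n - m}")
```

For every row the search agrees with N − M, so the HFT column follows from voting alone.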
Do you notice anything special? HFT looks as if it is equal to redundancy, but with 2oo4 the pattern breaks: the HFT is 2, while the redundancy is 3. This means that hardware fault tolerance is not a measure of redundancy; the two are not the same. If HFT is larger than zero, you know you have redundancy, but you do not know how much.
Let's assume we have four transmitters, A, B, C, and D, in a 2oo4 architecture. This means we have the following options to carry out the desired function: AB, AC, AD, BC, BD, and CD. To know how redundant we are, we need to count how many options are left after one device has failed. If one transmitter fails, does the function still work? If it does, we have redundancy. Let's assume A fails; then the following options are left: BC, BD, and CD. In other words, after one failure we have three options left, and thus we are three redundant after one failure. After two failures (for example A and B), only CD is left, so we are one redundant after two failures.
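The counting rule above can be sketched with `itertools.combinations`. This is a minimal illustration, assuming that "redundancy" means the number of M-device combinations that remain usable after a given set of failures, as described in this post; `working_options` is a hypothetical helper.

```python
from itertools import combinations

def working_options(m: int, devices: list[str], failed: tuple[str, ...] = ()) -> list[str]:
    """Return every group of M devices that can still carry out the function,
    i.e. every M-combination that contains none of the failed devices."""
    return ["".join(group) for group in combinations(devices, m)
            if not set(group) & set(failed)]

transmitters = ["A", "B", "C", "D"]
print(working_options(2, transmitters))              # ['AB', 'AC', 'AD', 'BC', 'BD', 'CD']
print(working_options(2, transmitters, ("A",)))      # ['BC', 'BD', 'CD']
print(working_options(2, transmitters, ("A", "B")))  # ['CD']

# The same count (options left after one failure) for each architecture in the table
for m, n in [(1, 1), (1, 2), (2, 2), (2, 3), (2, 4), (3, 3)]:
    devices = [chr(ord("A") + i) for i in range(n)]
    left = len(working_options(m, devices, ("A",)))
    print(f"{m}oo{n}: options left after one failure = {left}")
```

Run over every row, this count gives 0, 1, 0, 1, 3, 0, which matches the Redundancy column of the table, while HFT for the same rows is 0, 1, 0, 1, 2, 0.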
My preference is to use HFT instead of trying to count redundancy, because it better expresses what we want to know in the first place. I hope this has demystified the terms redundancy, HFT, and voting for you.