Common Cause Failures

Common cause failures are the bane of redundancy. A common cause failure is one in which a single failure or condition affects the operation of multiple devices that would otherwise be considered independent. Such failures can prevent the SIS from functioning when there is a process demand. They can also reduce the effectiveness of separation between the PCS and the SIS when both systems are implemented using equipment from the same vendor, particularly if both are connected to the same data highway.

Distributed control systems (DCSs) are designed to avoid common cause failures, particularly those that would affect both a controller and its back-up. However, not all potential common cause failures can be eliminated.
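To illustrate why a residual common cause fraction matters so much for redundant equipment, the sketch below uses the beta-factor model, a widely used simplified approximation that is not taken from this text. All names and numbers (the failure rate, the proof test interval, and the beta values) are assumptions chosen only for illustration, and the formulas ignore repair time and diagnostics.

```python
# Simplified beta-factor sketch (illustrative assumptions, not a SIL verification).

def pfd_1oo1(lambda_du, test_interval_h):
    """Approximate average probability of failure on demand, single channel."""
    return lambda_du * test_interval_h / 2.0


def pfd_1oo2(lambda_du, test_interval_h, beta):
    """Approximate average PFD of a 1oo2 pair with a common cause fraction beta."""
    # Independent failures: both channels must fail separately.
    independent = ((1.0 - beta) * lambda_du * test_interval_h) ** 2 / 3.0
    # Common cause failures: one event disables both channels at once.
    common = beta * lambda_du * test_interval_h / 2.0
    return independent + common


LAMBDA_DU = 2.0e-6       # assumed dangerous undetected failure rate, per hour
TEST_INTERVAL = 8760.0   # assumed proof test interval, hours (one year)

single = pfd_1oo1(LAMBDA_DU, TEST_INTERVAL)
for beta in (0.0, 0.02, 0.10):
    redundant = pfd_1oo2(LAMBDA_DU, TEST_INTERVAL, beta)
    print(f"beta={beta:.2f}  1oo1={single:.2e}  1oo2={redundant:.2e}  "
          f"improvement={single / redundant:.1f}x")
```

With these assumed values, the benefit of the second channel shrinks quickly as beta grows, because the common cause term is not reduced by redundancy at all.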

The most common source of common cause failure is software. If a software problem occurs in both the PCS and the SIS, the failure in the PCS can cause a process upset which, if unmitigated, might lead to an accident, while the simultaneous problem in the SIS might leave it unavailable to mitigate the hazard.

Normally, if both the PCS and the SIS have been properly tested, the chance of both coincidentally encountering a software bug at the same time is extremely small unless there is some common triggering event. For example, a problem with the data highway could cause the same software code to be executed in each device connected to the highway, and a bug in that code could then result in the failure of multiple devices.

Common cause failures must therefore be identified during the design process, and their potential impact on SIS functionality must be understood. Unfortunately, experts disagree considerably on how to define common cause failure and on which specific events constitute a common cause. The following are often cited as examples of common cause faults:

However, examining these faults in light of any SIS design shows that each of these six examples can disable single I/O systems as well as redundant I/O systems.

For example, mis-calibration of redundant sensors is often cited as an important common cause to consider. Using a single sensor would eliminate the common cause problem by definition, but that makes no sense as a remedy: mis-calibration of a single sensor will cause the SIS to fail just as seriously as mis-calibration of redundant sensors.
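The mis-calibration argument can be sketched in a few lines. The trip setpoint, process value, and calibration offset below are hypothetical numbers, and the function names are invented for this illustration. The sketch compares a single sensor with a 1oo2 (one-out-of-two) pair when the same calibration error is applied to every sensor.

```python
# Hypothetical illustration: a shared calibration error defeats 1oo1 and 1oo2 alike.

TRIP_SETPOINT = 100.0  # assumed high trip point, engineering units


def sensor_reading(true_value, calibration_offset):
    """Value reported by a sensor that carries a fixed calibration error."""
    return true_value + calibration_offset


def trips_1oo1(true_value, offset):
    """Single sensor: trip when its reading reaches the setpoint."""
    return sensor_reading(true_value, offset) >= TRIP_SETPOINT


def trips_1oo2(true_value, offset_a, offset_b):
    """Redundant pair: trip when either sensor reaches the setpoint."""
    return (sensor_reading(true_value, offset_a) >= TRIP_SETPOINT
            or sensor_reading(true_value, offset_b) >= TRIP_SETPOINT)


process_value = 105.0  # true process value, above the trip point
bad_offset = -10.0     # error from a faulty calibration reference or procedure

print(trips_1oo1(process_value, bad_offset))              # False: demand missed
print(trips_1oo2(process_value, bad_offset, bad_offset))  # False: both share the error
print(trips_1oo2(process_value, bad_offset, 0.0))         # True: errors are independent
```

Redundancy helps only in the last case, where the second sensor does not share the calibration error. Removing the second sensor does nothing to remove the cause.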

Mis-calibration, in its broadest sense, can result from anything from a poor calibration procedure (procedural problem) to bad calibration equipment (mechanical problem) to incorrect execution of the calibration procedure (human error). Consequently, eliminating common cause failure must involve examining everything and everyone that interacts with the device.

Of course, the ultimate failure is a safety requirements specification (SRS) that is incorrect at the beginning of the design process, with the result that the transmitter cannot detect the potential incident. This is the most disastrous common cause failure because it leads directly to the hazardous incident that the designer is seeking to prevent.

A checklist-based methodology for common cause failure (CCF) prevention has been incorporated into draft ISA TR84.0.02 Part 1 Annex A, where it is referenced to IEC 61508. The proposed methodology is still under development, and numerous changes are expected prior to final issuance.

 
