The context — having the safety function is not enough
On 11 December 2005, the Buncefield oil storage depot north of London exploded. Tank 912 was being filled overnight. Two barriers were supposed to stop it overflowing: the automatic level gauge, and an independent high-high level switch meant to cut the inflow as a last resort.
The gauge froze without raising an alarm. The independent switch was inoperative: its locking device had not been refitted after maintenance, and without it the switch could not act. The tank overflowed, a large petrol-vapour cloud formed and then exploded. It was one of the largest peacetime explosions in Europe: more than forty people injured, no fatalities by luck, and massive damage.
The lesson is exactly that of functional safety: a safety function existed on paper, but its integrity was neither demonstrated nor maintained. Designing a protection is not enough. You must prove it reduces the risk by the expected factor, and guarantee that it keeps doing so throughout the life of the installation. That is precisely the problem IEC 61508 tackles head-on.
What functional safety is, and what it is not
Functional safety is the part of overall safety that depends on an active system operating correctly in response to its inputs. Concretely: a sensor detects a hazardous condition, a logic unit decides, an actuator brings the installation to a safe state. A typical example: “if pressure exceeds the threshold, close the valve in under two seconds.”
It is not the same as the intrinsic safety of equipment in an explosive atmosphere, which falls under ATEX. Nor is it the mechanical strength of a casing or a structure. Functional safety is a function actively performed by a sensor–logic–actuator chain. If that chain fails at the wrong moment, the protection no longer exists.
IEC 61508 speaks of electrical, electronic and programmable electronic safety-related systems. In plain terms: safety relays, safety PLCs, sensors and transmitters, safety valves, and the software that drives them.
IEC 61508: the parent standard
IEC 61508 is titled “Functional safety of electrical/electronic/programmable electronic safety-related systems.” It is published by the International Electrotechnical Commission. Its first edition dates from the late 1990s, its second edition from 2010.
It has seven parts. Part 1 sets the general requirements and the lifecycle. Part 2 covers electronic hardware. Part 3 covers software. Part 4 gives the definitions. Parts 5, 6 and 7 provide methods, application guidance and a catalogue of techniques.
It is a generic standard — a so-called basic standard. Sector standards derive from it and adapt it to their field: IEC 61511 for the process industries, IEC 62061 for machinery, IEC 61513 for nuclear, the EN 5012x series for railways, and ISO 26262 for automotive. On the machinery side, ISO 13849 offers an alternative approach based on performance levels. We will cover these variants in dedicated articles. Here, we lay the foundation.
The goal: reduce risk by a demonstrable factor, the SIL
The central idea is simple. You assess the risk of a hazardous scenario. You determine the risk reduction needed to make it tolerable. You then assign the safety function an integrity level, the SIL — Safety Integrity Level. There are four levels, from SIL 1, the least demanding, to SIL 4, the most demanding.
The SIL is measured differently depending on how often the function is called upon.
In low-demand mode, that is, when the function is called upon no more than once a year, you measure the average probability of failure on demand, the average PFD. Each level also corresponds to a risk reduction factor, which is simply the inverse of the average PFD.
| SIL | Average PFD (low-demand mode) | Risk reduction factor |
|---|---|---|
| 1 | 10⁻² to 10⁻¹ | 10 to 100 |
| 2 | 10⁻³ to 10⁻² | 100 to 1,000 |
| 3 | 10⁻⁴ to 10⁻³ | 1,000 to 10,000 |
| 4 | 10⁻⁵ to 10⁻⁴ | 10,000 to 100,000 |
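As a rough illustration of the table above, the sketch below derives a PFD target from a risk-reduction need and maps it to a SIL band. The function names and the example frequencies are assumptions chosen for illustration; they come neither from the standard nor from the site's calculators.

```python
from typing import Optional

# Low-demand SIL bands from the table above: (SIL, lower bound, upper bound).
PFD_BANDS = [(4, 1e-5, 1e-4), (3, 1e-4, 1e-3), (2, 1e-3, 1e-2), (1, 1e-2, 1e-1)]


def required_pfd(unmitigated_per_year: float, tolerable_per_year: float) -> float:
    """Average PFD needed to bring a hazard frequency down to the tolerable one."""
    return tolerable_per_year / unmitigated_per_year


def risk_reduction_factor(pfd_avg: float) -> float:
    """The risk reduction factor is the inverse of the average PFD."""
    return 1.0 / pfd_avg


def sil_from_pfd(pfd_avg: float) -> Optional[int]:
    """Return the SIL whose band contains pfd_avg, or None outside SIL 1 to 4."""
    for sil, low, high in PFD_BANDS:
        if low <= pfd_avg < high:
            return sil
    return None


# Hypothetical scenario: hazard expected once every 10 years without protection,
# tolerable once every 2,000 years.
target = required_pfd(unmitigated_per_year=0.1, tolerable_per_year=5e-4)
print(f"PFD target {target:.1e}, RRF {risk_reduction_factor(target):.0f}, "
      f"SIL {sil_from_pfd(target)}")
```

A result of `None` means the figure falls outside the four bands: either no SIL is needed, or the target is beyond what a single function can reasonably claim.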
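In high-demand or continuous mode, when the function is called upon more than once a year or operates permanently, you measure the average frequency of dangerous failure per hour, the PFH. It is still often called the probability of dangerous failure per hour, the term used in the first edition.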
| SIL | PFH (high-demand or continuous mode) |
|---|---|
| 1 | 10⁻⁶ to 10⁻⁵ per hour |
| 2 | 10⁻⁷ to 10⁻⁶ per hour |
| 3 | 10⁻⁸ to 10⁻⁷ per hour |
| 4 | 10⁻⁹ to 10⁻⁸ per hour |
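The same lookup can be sketched for the PFH table, together with the choice of mode. This is a minimal sketch: the function names are hypothetical, and the once-a-year threshold is the usual IEC 61508 boundary between low-demand and high-demand operation.

```python
from typing import Optional

# High-demand / continuous SIL bands from the table above, per hour.
PFH_BANDS = [(4, 1e-9, 1e-8), (3, 1e-8, 1e-7), (2, 1e-7, 1e-6), (1, 1e-6, 1e-5)]


def demand_mode(demands_per_year: float) -> str:
    """Classify the mode of operation from the expected demand rate."""
    if demands_per_year <= 1.0:
        return "low demand"            # measured by the average PFD
    return "high demand or continuous"  # measured by the PFH


def sil_from_pfh(pfh: float) -> Optional[int]:
    """Return the SIL whose band contains pfh, or None outside SIL 1 to 4."""
    for sil, low, high in PFH_BANDS:
        if low <= pfh < high:
            return sil
    return None


# Hypothetical machine guard opened about once per shift, PFH of 3e-8 per hour.
print(demand_mode(demands_per_year=700), sil_from_pfh(3e-8))
```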
To convert a risk reduction into a SIL, or a PFD target into a level, the site’s PFD to SIL and risk reduction factor to SIL calculators give the result directly.
Two distinct enemies: random and systematic failures
A safety function can fail for two fundamentally different reasons. IEC 61508 treats them separately, because they are neither quantified nor countered in the same way.
Random hardware failures arise from wear, ageing, physical defects. They are quantifiable by a failure rate. They are fought with redundancy, built-in diagnostics, and periodic testing.
Systematic failures come from human error frozen into the design: an ambiguous specification, a software defect, a generic wiring error, a poor component choice. They are not quantifiable by a probability. They are fought with process rigour: reviews, verification, validation, traceability. IEC 61508 measures that rigour by the systematic capability, rated from 1 to 4.
A crucial point, too often forgotten: a SIL is not just a reliability figure. It combines three requirements.
The three requirements of a SIL
Meeting the quantified PFD or PFH target is necessary, but not sufficient. A valid SIL rests on three pillars, and the weakest of the three sets the level actually achieved.
First, the quantitative requirement: the average PFD, or the PFH, must fall within the band of the target SIL.
Second, the architectural constraints. They impose a minimum hardware fault tolerance as a function of the safe failure fraction, that is, the share of failures that are not dangerous or that are detected. The standard distinguishes components with simple, well-understood behaviour, called Type A, from complex components such as microprocessors, called Type B. It offers two routes: Route 1H, based on tables that cross hardware fault tolerance with safe failure fraction, and Route 2H, based on reliability data from field feedback, introduced in the second edition.
Third, the systematic requirement: the systematic capability of the elements and the control of design faults.
The practical consequence is clear. You can perfectly well compute an excellent PFD and still miss the SIL, because the architecture lacks redundancy, or because the software was not developed with the required rigour.
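The "weakest pillar" rule can be sketched in a few lines. The class and field names below are hypothetical, and each of the three levels is assumed to have been established beforehand by its own analysis.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SilClaim:
    quantitative: int   # SIL band reached by the average PFD or the PFH
    architectural: int  # highest SIL the architectural constraints allow
    systematic: int     # systematic capability of the elements (1 to 4)

    def achieved(self) -> int:
        """The SIL actually achieved is the lowest of the three."""
        return min(self.quantitative, self.architectural, self.systematic)

    def limiting_pillars(self) -> List[str]:
        """Names of the requirement(s) that cap the achieved SIL."""
        pillars = {
            "quantitative": self.quantitative,
            "architectural": self.architectural,
            "systematic": self.systematic,
        }
        return [name for name, level in pillars.items() if level == self.achieved()]


# Hypothetical loop: excellent PFD, but a single channel and SC 2 software.
claim = SilClaim(quantitative=3, architectural=1, systematic=2)
print(claim.achieved(), claim.limiting_pillars())
```

Listing the limiting pillar makes it obvious where to invest: here, more testing would change nothing, while added redundancy would.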
The heart of the standard: the safety lifecycle
The major contribution of IEC 61508 is not a formula. It is a methodological idea: safety is not a component you buy, it is an end-to-end process. The standard formalises it as the safety lifecycle, a sequence of stages covering the entire life of the installation.
Everything starts with hazard and risk analysis. From it flows the safety requirements specification, which defines each function, its SIL, its response time and its safe state. Then comes the allocation of functions to systems, then the realisation of the hardware and the software. The software follows its own cycle, a V-model, described in Part 3. Then come installation, validation, operation and maintenance, change management, and finally decommissioning.
Each stage has inputs, outputs and a verification. One weak link ruins the whole chain: a vague specification, a fanciful reliability figure, an unrealistic test, or sloppy maintenance as at Buncefield. Functional safety is managed over time, not only at design.
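To make "a vague specification" concrete, here is a minimal sketch of one entry of a safety requirements specification, with a check for the most obvious gaps. The structure and field names are assumptions for illustration; the standard requires this content but does not prescribe a data format.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SafetyFunctionSpec:
    name: str
    trigger: str                # hazardous condition to detect
    action: str                 # what the final element must do
    safe_state: str             # state the installation must reach
    target_sil: int             # SIL allocated by the risk analysis
    max_response_time_s: float  # detection-to-safe-state time

    def gaps(self) -> List[str]:
        """List the fields that make the requirement untestable as written."""
        problems = []
        for field_name in ("trigger", "action", "safe_state"):
            if not getattr(self, field_name).strip():
                problems.append(f"{field_name} is empty")
        if not 1 <= self.target_sil <= 4:
            problems.append("target_sil must be between 1 and 4")
        if self.max_response_time_s <= 0:
            problems.append("max_response_time_s must be positive")
        return problems


# The overpressure example used earlier; the SIL value is hypothetical.
overpressure = SafetyFunctionSpec(
    name="Overpressure shutdown",
    trigger="pressure exceeds the threshold",
    action="close the valve",
    safe_state="valve closed",
    target_sil=2,
    max_response_time_s=2.0,
)
print(overpressure.gaps())
```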
The metrics you need to read
A few quantities recur in every SIL study. Failure rates are split according to whether the failure is dangerous or safe, detected or undetected. The dangerous undetected share is the most penalising: it is the one that only periodic testing reveals.
Diagnostic coverage measures the fraction of dangerous failures detected automatically. The common-cause factor, often noted beta, measures the share of failures that strike two redundant channels at the same time; it is the Achilles heel of any redundancy. The interval between periodic tests weighs heavily on the average PFD.
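The four-way split of failure rates leads directly to diagnostic coverage and to the safe failure fraction used by the architectural constraints. The sketch below assumes constant failure rates per hour; the class name and the numerical values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureRates:
    safe_detected: float         # per hour
    safe_undetected: float       # per hour
    dangerous_detected: float    # per hour
    dangerous_undetected: float  # per hour

    @property
    def dangerous(self) -> float:
        return self.dangerous_detected + self.dangerous_undetected

    @property
    def total(self) -> float:
        return self.safe_detected + self.safe_undetected + self.dangerous

    def diagnostic_coverage(self) -> float:
        """Fraction of dangerous failures detected automatically."""
        return self.dangerous_detected / self.dangerous

    def safe_failure_fraction(self) -> float:
        """Share of failures that are safe, or dangerous but detected."""
        return (self.total - self.dangerous_undetected) / self.total


# Hypothetical transmitter data.
rates = FailureRates(2e-7, 1e-7, 6e-7, 1e-7)
print(f"DC = {rates.diagnostic_coverage():.1%}, SFF = {rates.safe_failure_fraction():.1%}")
```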
For a single channel without redundancy, the average PFD is roughly half the product of the dangerous undetected failure rate and the test interval. In other words: testing twice as often halves that term. Redundant architectures are noted “M out of N”: one out of one, one out of two, two out of three. The failure rate to mean time between failures calculator helps handle these orders of magnitude.
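These orders of magnitude are easy to reproduce. The sketch below implements the single-channel approximation described above and, as an assumption, a widely used simplified formula for one out of two with a common-cause factor; it is not the full formula of Part 6. The failure rate and the test interval are hypothetical.

```python
def pfd_avg_1oo1(lambda_du: float, test_interval_h: float) -> float:
    """Single channel: half the product of the DU failure rate and the test interval."""
    return lambda_du * test_interval_h / 2


def pfd_avg_1oo2(lambda_du: float, test_interval_h: float, beta: float) -> float:
    """Simplified one-out-of-two approximation with a common-cause term."""
    independent = ((1 - beta) * lambda_du * test_interval_h) ** 2 / 3
    common_cause = beta * lambda_du * test_interval_h / 2
    return independent + common_cause


def mtbf_hours(failure_rate_per_hour: float) -> float:
    """Mean time between failures, assuming a constant failure rate."""
    return 1.0 / failure_rate_per_hour


LAMBDA_DU = 2e-6   # dangerous undetected failures per hour (hypothetical)
ONE_YEAR_H = 8760  # yearly proof test

print(pfd_avg_1oo1(LAMBDA_DU, ONE_YEAR_H))       # about 8.8e-3
print(pfd_avg_1oo1(LAMBDA_DU, ONE_YEAR_H / 2))   # half of the value above
print(pfd_avg_1oo2(LAMBDA_DU, ONE_YEAR_H, 0.1))  # about 9.6e-4
print(mtbf_hours(LAMBDA_DU))                     # 500,000 hours
```

With a beta of 10 %, the common-cause term accounts for most of the one-out-of-two result, which is why an optimistic beta distorts the whole calculation.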
The stakes
The first stake is human and environmental. A failed safety function means a major-accident risk: fire, explosion, toxic release, crushing.
The second stake is regulatory. Major-hazard sites fall under the Seveso directive in Europe; machinery, under the machinery regulation. Demonstrating the conformity of your safety functions is not optional.
The third stake is economic. A spurious trip stops production and is costly in availability. An undersized safety measure costs far more in the event of an accident. The right SIL is a balance, not a maximum.
The fourth stake is cyber. Safety systems are now connected. The Triton attack, in 2017, targeted precisely a safety instrumented system to disable its protections. Functional safety and cybersecurity can no longer be handled separately. The topic is detailed in the article Exchanging data between two OT controllers.
The real difficulties
The safety requirements specification is the first source of problems. An ambiguous requirement produces a function that does not do what was needed, and no reliability calculation rescues that.
Reliability data is the second difficulty. The failure rates supplied by manufacturers are sometimes optimistic, and the common-cause factor is regularly underestimated. A SIL study is only as good as its input data.
Periodic testing is the third difficulty. A partial test that does not reveal all dangerous failures gives false confidence. Test coverage matters as much as test frequency.
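The effect of incomplete test coverage can be estimated with a common approximation: failures the test reveals accumulate over the test interval, while those it never reveals accumulate over the whole lifetime. This is a sketch under that assumption, with hypothetical figures, not a formula quoted from the article's sources.

```python
def pfd_avg_with_coverage(
    lambda_du: float, test_interval_h: float, coverage: float, lifetime_h: float
) -> float:
    """Average PFD of a single channel with an imperfect proof test."""
    if not 0.0 <= coverage <= 1.0:
        raise ValueError("coverage must be between 0 and 1")
    revealed = coverage * lambda_du * test_interval_h / 2
    never_revealed = (1 - coverage) * lambda_du * lifetime_h / 2
    return revealed + never_revealed


LAMBDA_DU = 2e-6      # dangerous undetected failures per hour (hypothetical)
TEST_INTERVAL = 8760  # yearly proof test, in hours
LIFETIME = 87600      # ten years in service, in hours

perfect = pfd_avg_with_coverage(LAMBDA_DU, TEST_INTERVAL, 1.0, LIFETIME)
partial = pfd_avg_with_coverage(LAMBDA_DU, TEST_INTERVAL, 0.9, LIFETIME)
print(f"perfect test: {perfect:.2e}, 90 % coverage: {partial:.2e}")
```

With these figures, a test that misses only one dangerous failure in ten moves the result from the SIL 2 band into the SIL 1 band.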
Finally, the confusion between the SIL of a component and the SIL of a complete loop is a recurring mistake. A sensor certified for a given level does not make the whole loop that level. The loop depends on all its elements, its architecture and its test interval.
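A quick calculation shows why. For subsystems in series and small values, the average PFD of the loop is approximately the sum of the average PFDs of the sensor, the logic solver and the final element. The figures below are hypothetical, and the sketch covers the quantitative requirement only.

```python
from typing import Optional

PFD_BANDS = [(4, 1e-5, 1e-4), (3, 1e-4, 1e-3), (2, 1e-3, 1e-2), (1, 1e-2, 1e-1)]


def sil_from_pfd(pfd_avg: float) -> Optional[int]:
    """Return the SIL whose low-demand band contains pfd_avg, or None."""
    for sil, low, high in PFD_BANDS:
        if low <= pfd_avg < high:
            return sil
    return None


def loop_pfd(*subsystem_pfds: float) -> float:
    """Approximate loop PFD for subsystems in series: the sum of their PFDs."""
    return sum(subsystem_pfds)


sensor, logic_solver, valve = 4e-3, 5e-4, 8e-3  # hypothetical average PFDs
total = loop_pfd(sensor, logic_solver, valve)
print(f"sensor alone: SIL {sil_from_pfd(sensor)}, "
      f"complete loop: PFD {total:.2e}, SIL {sil_from_pfd(total)}")
```

The sensor sits comfortably in the SIL 2 band, yet the loop only reaches the SIL 1 band, mostly because of the valve.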
The mistakes not to make
| Mistake | Why people make it | Why it is dangerous |
|---|---|---|
| Aiming for the highest SIL “to be safe” | Believing you buy safety with a number | Massive over-cost, and false safety if systematic capability does not follow |
| “The sensor is SIL 2, so my loop is SIL 2” | Confusing component and loop | The loop depends on all its elements, architecture and testing; it can be well below |
| “The PFD is good, it’s validated” | Reducing the SIL to a single figure | Forgetting the architectural constraints and systematic capability |
| Bypassing safety to keep producing, “we’ll put it back later” | Production pressure | This is the Buncefield scenario: the barrier exists but no longer acts |
| Test once, never review again | The function “worked” at commissioning | The PFD drifts over time; without realistic periodic testing, integrity erodes |
Good practice
Start with the risk analysis, then write a clear, testable safety requirements specification, function by function.
Allocate the SIL to each function, not globally to the installation.
Verify the three requirements, not only the PFD: the quantified target, the architectural constraints, and the systematic capability.
Use credible reliability data, a realistic common-cause factor, and a test interval consistent with the lifetime and the targeted availability.
Separate the control system from the safety system. The system that drives the process must not be the one that protects it; their independence is a fundamental requirement.
Manage bypasses: every inhibition must be logged, time-limited, signalled, and cleared by procedure.
Put functional safety management in place, and have an independent functional safety assessment carried out by a person or team distinct from the designers.
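The bypass rule above lends itself to a simple register. The sketch below is one possible shape, with hypothetical names and durations: every inhibition records who authorised it and why, carries a time limit, and can be listed as overdue until it is formally cleared.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class Bypass:
    function: str            # safety function being inhibited
    reason: str
    authorised_by: str
    started: datetime
    max_duration: timedelta  # time limit set when the bypass is granted
    cleared: Optional[datetime] = None

    def is_active(self) -> bool:
        return self.cleared is None

    def is_overdue(self, now: datetime) -> bool:
        """True if the bypass is still active after its time limit."""
        return self.is_active() and now > self.started + self.max_duration

    def clear(self, when: datetime) -> None:
        """Record the formal removal of the bypass."""
        self.cleared = when


def overdue(register: List[Bypass], now: datetime) -> List[Bypass]:
    """Bypasses that must be signalled because their time limit has passed."""
    return [b for b in register if b.is_overdue(now)]


register = [
    Bypass("High-high level trip", "switch calibration", "shift supervisor",
           started=datetime(2024, 3, 1, 8, 0), max_duration=timedelta(hours=8)),
]
print([b.function for b in overdue(register, now=datetime(2024, 3, 2, 8, 0))])
```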
Review checklist
- Hazard and risk analysis documented and dated
- Safety requirements specification testable, per function
- SIL allocated and justified for each function
- The three requirements verified: PFD or PFH, architectural constraints, systematic capability
- Reliability data, common-cause factor and test interval justified
- Independence between control system and safety system
- Bypass management logged and time-bounded
- Periodic test plan with defined coverage
- Functional safety management in place
- Independent functional safety assessment carried out
- Change management applied to every modification
Going further
- The IEC 61508 page details the parent standard part by part.
- The sector variants: IEC 61511 for processes, IEC 62061 and ISO 13849 for machinery. Each will be the subject of a dedicated article.
- The Safety PLC hub covers the safety controllers that perform these functions.
- The PFD to SIL, risk reduction to SIL and failure rate to mean time between failures calculators for orders of magnitude.
One last thing. Functional safety is not declared; it is demonstrated and maintained. Buncefield had the safety function; what it lacked was integrity. That is the whole difference IEC 61508 draws: not “do you have a protection?”, but “can you prove it reduces the risk by the required factor, today and in ten years?”