Report

Reliability Prediction Method for Safety Instrumented Systems
PDS Method Handbook – 2013 Edition

Authors: Stein Hauge, Tony Kråkenes, Per Hokstad, Solfrid Håbrekke, Hui Jin

SINTEF Technology and Society, Safety Research, May 2013

SINTEF Teknologi og samfunn / SINTEF Technology and Society
Address: Postboks 4760 Sluppen, NO-7465 Trondheim, NORWAY
Telephone: +47 73593000, Telefax: +47 73592896
[email protected], www.sintef.no
Enterprise /VAT No: NO 948007029 MVA

KEYWORDS: Safety Instrumented Systems (SIS), Reliability analysis, SIL calculations, IEC 61508

VERSION: 1.0
DATE: 2013-05-23
AUTHORS: Stein Hauge, Tony Kråkenes, Per Hokstad, Solfrid Håbrekke, Hui Jin
CLIENT(S): Multiclient – PDS Forum
CLIENT'S REF.: Håkon S. Mathisen
PROJECT NO.: 60S051
NUMBER OF PAGES: 93 incl. appendices

ABSTRACT

PDS is a method used to quantify the safety unavailability and loss of production for safety instrumented systems (SISs). This report presents an updated version of the PDS method. Among new and updated topics are:

• Calculations for multiple safety systems.
• Slightly updated model for common cause calculations.
• More thorough discussion of different demand mode situations.
• How to incorporate the effect from reduced proof test coverage (PTC) in the reliability calculations.
• General update of the calculation formulas and examples.

PREPARED BY: Stein Hauge
CHECKED BY: Jørn Vatn
APPROVED BY: Frode Rømo, Research Director
REPORT NO.: SINTEF A24442
ISBN: 978-82-14-05601-3
CLASSIFICATION: Restricted
CLASSIFICATION THIS PAGE: Unrestricted

Document history

VERSION    DATE        VERSION DESCRIPTION
DRAFT 0.1  2013-02-04  Draft version to PDS members for comment.
1.0        2013-05-23  Final version


PREFACE

The present report is an update of the 2010 edition of the PDS method handbook /6/ and is mainly a result of work carried out in the research project "Barriers to prevent and limit acute releases to sea". The authors would like to thank everyone who has provided us with input and comments to this PDS method handbook. The work has been funded by the Research Council of Norway and the PDS participants.

Trondheim, May 2013

PDS forum participants in the project period 2010 – 2012:

Oil Companies/Operators/Drilling Companies
• A/S Norske Shell
• BP Norge AS
• ConocoPhillips Norge
• Eni Norge AS
• GDF SUEZ E&P
• Odfjell Drilling & Technology
• Marathon Petroleum Company (Norway) LLC
• Talisman Energy Norge
• Teekay Petrojarl ASA
• Statoil ASA
• TOTAL E&P NORGE AS

Control and Safety System Vendors
• ABB AS
• FMC Kongsberg Subsea AS
• Honeywell AS
• Kongsberg Maritime AS
• Bjørge Safety Systems AS
• Siemens AS
• Simtronics AS

Engineering Companies and Consultants
• Aker Engineering & Technology AS
• Det Norske Veritas AS
• Lilleaker Consulting AS
• Safetec Nordic AS
• Scandpower AS

Governmental bodies
• The Norwegian Maritime Directorate (Observer)
• The Petroleum Safety Authority Norway (Observer)
• The Research Council of Norway (funding)


ABSTRACT

PDS is a method used to quantify the safety unavailability and loss of production for safety instrumented systems (SISs). The method accounts for all types of failure categories: technical, software, human, etc. This report presents an updated version of the PDS method.

IEC 61508 and IEC 61511 have become important standards for specification, design and operation of safety instrumented systems in the process industry. The PDS method is in line with the main principles advocated in these standards, focusing mainly – but not only – on the quantitative aspects of the standards.


Table of contents

PREFACE
ABSTRACT
1 INTRODUCTION
  1.1 Purpose of the Handbook
  1.2 Organisation of the Handbook
  1.3 Abbreviations
2 THE NEED FOR RELIABILITY CALCULATIONS
  2.1 Why do we Need Reliability Analysis of Safety Instrumented Systems?
  2.2 Why PDS?
  2.3 Uncertainty in Reliability Analysis
3 RELIABILITY CONCEPTS
  3.1 Introduction
  3.2 Failure Classification by Cause of Failure
  3.3 How to make the reliability calculations more realistic
  3.4 Testing and Failure Detection
  3.5 Failure Mode Classification and Taxonomy
  3.6 Dangerous Undetected Failures - λDU
  3.7 Performance Measures for Loss of Safety – Low Demand Systems
  3.8 Loss of Production
4 MODELLING OF COMMON CAUSE FAILURES
  4.1 The PDS Extension of the Beta-Factor Model – CMooN
  4.2 Proposed Values for the CMooN Factors
  4.3 Standard β-factor Model Versus PDS Approach
  4.4 Modelling of CCF for Components with Non-Identical Characteristics
5 PDS CALCULATION FORMULAS – LOW DEMAND SYSTEMS
  5.1 Assumptions and Limitations
  5.2 PDS Formulas for Loss of Safety
  5.3 How to Model the Quantitative Effect of Imperfect Functional Testing
  5.4 Quantification of Spurious Trip Rate (STR)
6 PDS CALCULATION FORMULAS – HIGH DEMAND SYSTEMS
  6.1 High and Low Demand Systems
  6.2 Loss-of-safety Measures: PFD and PFH examples
  6.3 Using PFD or PFH?
  6.4 PFH Formulas; Including both Common Cause and Independent Failures
7 CALCULATIONS FOR MULTIPLE SYSTEMS
  7.1 Background
  7.2 Motivation for Using Correction Factors (CF) in Multiple SIS Calculations
  7.3 Correction Factors for Simultaneous Testing
  7.4 Correction Factors for Non-Simultaneous Testing
  7.5 Concluding remarks
8 QUANTIFICATION EXAMPLE
  8.1 System Description – Topside HIPPS function
  8.2 Reliability Input Data
  8.3 Loss of Safety Assessment – CSU
  8.4 Spurious Trip Assessment
9 REFERENCES
APPENDIX A: NOTATION AND ABBREVIATIONS
APPENDIX B: THE CONFIGURATION FACTOR CMooN
  B.1 Determining the Configuration Factor CMooN
  B.2 Formulas for the Configuration Factor CMooN
APPENDIX C: DETAILED FORMULAS FOR PFD AND DTU
APPENDIX D: MULTIPLE SIS – BACKGROUND AND CALCULATIONS
  D.1 Approaches to determining CF in case of simultaneous testing
  D.2 The effect of differences in testing
APPENDIX E: PFD VERSUS PFH AND THE EFFECT OF DEMANDS


1 INTRODUCTION

1.1 Purpose of the Handbook

The PDS 1 method is used to quantify the safety unavailability and loss of production for safety instrumented systems (SISs). The method has been widely used in the Norwegian petroleum industry, but is also applicable to other business sectors. This handbook provides an updated version of the PDS method. The objective has been to incorporate development work done in the PDS project during the last years and, based on input from practitioners and PDS participants, to provide some more in-depth discussion of selected areas.

The increased use of computer-based safety systems has resulted in functional safety standards like IEC 61508, /1/ and IEC 61511, /2/. IEC 61508 provides a basis for specification, design and operation of SISs with emphasis on safety activities in each lifecycle phase of the system. For estimating the reliability of a SIS, the IEC standards describe a number of possible calculation approaches, including analytical formulas, Boolean approaches such as reliability block diagrams (RBD) and fault tree analysis (FTA), Markov modelling and Petri nets (see IEC 61508-6, Annex B). It should be noted that the IEC standards do not mandate one particular approach or a particular set of formulas, but leave it to the user to choose the most appropriate approach for quantifying the reliability of a given system or function. For further reading and details about the different reliability modelling approaches, reference is also made to the new ISO TR 12489, /24/.

The PDS method represents an example of how to implement analytical formulas, and together with the PDS data handbook, it offers an effective and practical approach towards implementing the quantitative aspects of the IEC standards. Efforts have been made to give the reader an understanding of how the formulas are derived, including their applicability and their limitations. The report is aimed at reliability and safety engineers, as well as management, designers and technical personnel working with safety instrumented systems.

1.2 Organisation of the Handbook

The report is organised as follows:

• Chapter 2 includes a general discussion on the need for reliability calculations, and why the PDS calculation method is recommended.
• Chapter 3 discusses the failure classification and the reliability parameters of the updated PDS method.
• In chapter 4 the modelling of common cause failures is discussed.
• Chapter 5 presents calculation formulas for low demand mode systems.
• In chapter 6 a discussion of high demand versus low demand mode systems is given, and formulas for high demand mode / continuously operating systems are presented.
• Chapter 7 provides a discussion of and formulas for calculating the reliability of multiple layer safety systems.
• Chapter 8 presents a worked example of quantification.

Appendix A presents a complete list of notation and abbreviations used in the report. In Appendix B the modelling of common cause failures is discussed in some more detail, and in Appendix C slightly more detailed formulas than those given in chapter 5 are presented. Appendix D provides a description of the various alternative approaches to determining an appropriate correction factor (CF) for multiple SIS calculations in the case of simultaneous testing. This appendix also contains a discussion of the effects of non-simultaneous testing, both regarding different phasing and different length of test intervals. Finally, Appendix E gives a discussion of the use of PFH versus PFD.

1 PDS is a Norwegian acronym for reliability of safety instrumented systems.


The present report focuses on the safety and reliability aspects of the PDS method, including performance measures for loss of safety and for production availability. It does not consider maintenance performance and lifecycle cost calculations.

1.3 Abbreviations

avg - Average
CCF - Common cause failures
CF - Correction factor (for multiple SISs)
CMF - Common mode failures
Crit - Critical (failures)
CSU - Critical safety unavailability
D - Dangerous
DC - Diagnostic coverage
DD - Dangerous detected
DU - Dangerous undetected
DTU - Downtime unavailability
ESD - Emergency shutdown
HIPPS - High integrity pressure protection system
HR - Hazard rate
IEC - International Electrotechnical Commission
LOPA - Layer of protection analysis
MCS - Minimal cut set
MooN - M-out-of-N; M, N = 1, 2, 3, …
m-oo-n - Representative m-out-of-n structure of a SIS; m, n not necessarily integers
MTTF - Mean time to failure
MTTR - Mean time to restoration
N/A - Not applicable
NOG - Norwegian Oil and Gas Association (former OLF)
NONC - Non-critical (failures)
OREDA - Offshore reliability data
PDS - Norwegian acronym for "reliability of computer based safety systems"
PFD - Probability of failure on demand
PFH - Probability of failure per hour
PSA - Petroleum Safety Authorities (Norway)
PSD - Process shutdown
PT - Pressure transmitter
PTC - Proof test coverage
PWV - Production wing valve
RH - Random hardware
RNNP - Project: Risk level in Norwegian petroleum production (www.ptil.no)
S - Safe
SAR - Safety analysis report
SD - Safe detected
SU - Safe undetected
SFF - Safe failure fraction
SIF - Safety instrumented function
SIL - Safety integrity level
SIS - Safety instrumented system
STR - Spurious trip rate
SYST - Systematic (failures)
TIF - Test independent failure


2 THE NEED FOR RELIABILITY CALCULATIONS

2.1 Why do we Need Reliability Analysis of Safety Instrumented Systems?

There is an increasing reliance on safety instrumented systems (SISs) to achieve satisfactory risk levels in the process industry. Also in other business sectors, such as the public transport industry (air and rail) and the manufacturing industry, there is a major increase in the use of computer based safety systems. Fire and gas detection systems, process shutdown systems and emergency shutdown systems are examples of SISs used to prevent abnormal operating conditions from developing into an accident. Such systems are thus installed to reduce the process risk associated with health and safety effects, environmental impacts, loss of property, and business interruption costs, /5/. In the PDS method failure of such systems is referred to as "loss of safety" or "safety unavailability".

Addressing safety and reliability in all relevant phases of the safety system life cycle therefore becomes paramount, both with respect to safe and to commercial operation. It must be verified that all safety requirements for the SIS are satisfied, and that the risk reduction actually obtained from the SIS is in line with what is required. Here, the PDS method plays an important role in predicting the risk reduction obtained from the safety instrumented functions (SIF) that are performed by the SIS.

IEC 61508 and IEC 61511 have become the main standards for design, construction, and operation of SISs in the process industry. The Norwegian Oil and Gas Association (NOG) has developed a guideline (former OLF guideline no. 070, /3/) to support the implementation of the two IEC standards. In the regulations from the Norwegian Petroleum Safety Authorities (PSA), /4/, specific references are given to the IEC standards and the NOG guideline. IEC 61508 allows using different approaches for quantifying loss of safety. In the NOG guideline, it is recommended to use the PDS method for this purpose.

Although most reliability analyses have been used to gain confidence in the system by assessing the reliability attributes, it may be even more interesting to use reliability analysis as a means to achieve reliability, e.g., by design optimisation. It is usually most efficient to employ these techniques in the design phase of the system, when less costly changes can be made. Proper analytic tools available during the design process may ensure that an optimal system configuration is installed from the very beginning, thereby reducing overall system cost.

The operational phase has been given more attention in recent years, and the need for barrier control is stressed in the PSA regulations, ref. /4/. Further, both the IEC standards and the PSA regulations focus on the entire life cycle of the safety systems. In the PDS project, guidelines for follow-up of SISs in the operating phase have been developed (downloadable from the web) and procedures for updating failure rates and test intervals in the operating phase have been suggested, ref. /7/ and /8/.

2.2 Why PDS?

Uncritical use of quantitative analyses may weaken the confidence in the value of performing reliability analyses, as extremely 'good', but highly unrealistic figures can be obtained, depending on the assumptions and the input data used. The PDS method is considered to be realistic as it accounts for all major factors affecting reliability during system operation, such as:

• All major failure categories/causes
• Common cause failures
• Automatic self-tests
• Functional (manual) testing
• Systematic failures
• Complete safety function
• Redundancies and voting logic

The PDS method has been developed in close cooperation with the industry, and attempts have been made to keep the formulas and associated explanations as simple and intuitive as possible without losing required accuracy. The method may therefore contribute to enhancing the use of reliability analysis in the engineering disciplines, thereby bridging the gap between reliability theory and application. As stressed in IEC 61508, it is important to be function oriented and take into account the performance of the total signal path from the sensors via the control logic to the actuators. This is a core issue in PDS.

2.3 Uncertainty in Reliability Analysis

It is important to realize that quantification of loss of safety is associated with uncertainty. This means that the results that we obtain from such analyses are not the true value, but rather a basis for comparing the reliability of different system designs and for trending reliability performance in the operational phase. An important objective of quantitative (and qualitative) reliability analyses is to increase the awareness among system designers, operators, and maintenance personnel on how the system may fail and what the main contributors to such failures are. We may relate the uncertainty to:

• The model: To what extent is the model able to capture the most important phenomena of the system, including its operating conditions? In practice, we often need to balance two conflicting interests:
  o The model should be sufficiently simple to be handled by available mathematical and statistical methods, and
  o The model should be sufficiently realistic such that the results are of practical relevance.

• Data used in the analysis: To what extent are the data relevant and able to capture the future performance?
  o The use of reliability data is usually based on some assumed statistical model. E.g., the standard assumption of a constant failure rate may be a simplification for some equipment types.
  o Historical performance is not the same as future performance, even for the same component. The historical performance is often based on various samples with various operating conditions and in some cases different properties (such as size, design principle and so on).
  o Data may be incomplete due to few samples, lack of censoring, and not including all types of failures, for example software related failures.
  o There is also uncertainty related to data collection, failure reporting, classification and interpretation of data.

Sensitivity analyses may be performed to investigate how changes in the model and the assumed data can influence the calculated loss of safety. The use of sensitivity analyses is common practice in sectors like the nuclear industry and the aerospace industry, but has so far been given limited attention in the process industry.
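As an illustration of such a sensitivity analysis, the minimal sketch below (with assumed, illustrative input values, and anticipating the simplified 1oo2 PFD formula presented in chapter 5) recomputes the loss of safety over a range of failure rates and β-factors:

```python
# Minimal sensitivity-analysis sketch. The input values are illustrative
# assumptions, not data from this handbook.
tau = 8760.0  # functional test interval in hours (12 months)

for lam_du in (0.5e-6, 1.0e-6, 2.0e-6):   # assumed per-hour DU failure rates
    for beta in (0.02, 0.05, 0.10):       # assumed common cause fractions
        # Simplified 1oo2 formula from chapter 5: CCF term + independent term.
        pfd = beta * lam_du * tau / 2 + (lam_du * tau) ** 2 / 3
        print(f"lambda_DU = {lam_du:.1e}/h, beta = {beta:.2f} -> PFD = {pfd:.1e}")
```

Tabulating the results of such a sweep quickly shows which input parameter dominates the calculated loss of safety for a given configuration.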


3 RELIABILITY CONCEPTS

3.1 Introduction

This chapter presents the failure classification and the reliability parameters used in the PDS method. The objective is to give an introduction to the model taxonomy and to explain the relation between the PDS and the IEC 61508 approach for quantification of loss of safety.

IEC 61508 and IEC 61511 distinguish between four levels of risk reduction, called safety integrity levels (SIL). To each SIL, the IEC standards assign a target range for loss of safety. To measure loss of safety, the standards use Probability of Failure on Demand (PFD) for low demand SISs and Probability of Failure per Hour (PFH) for high demand / continuously operating SISs. This chapter describes some of the main concepts and principles underlying the formulas for PFD and PFH, and outlines the slight differences between the PDS approach and the approaches in IEC 61508 and IEC 61511.

3.2 Failure Classification by Cause of Failure

Failures can be categorised according to failure cause, and the IEC standards differentiate between random hardware failures and systematic failures. PDS uses the same classification and gives a somewhat more detailed breakdown of the systematic failures, as indicated in Figure 1.

[Figure: failure classification tree. Failures are split into random hardware failures (aging failures, i.e., random failures due to natural and foreseen stressors) and systematic failures, the latter comprising software faults (e.g., programming error, compilation error, error during software update), installation failures (e.g., gas detector cover left on after commissioning, valve installed in wrong direction, incorrect sensor location), hardware related failures (e.g., inadequate specification, inadequate implementation, design not suited to operational conditions), operational failures (e.g., valve left in wrong position, sensor calibration failure, detector in override mode) and excessive stress failures (e.g., excessive vibration, unforeseen sand production, too high temperature).]

Figure 1: Possible failure classification by cause of failure

The following failure categories (causes) are defined:

Random hardware failures are failures resulting from the natural degradation mechanisms of the component. For these failures it is assumed that the operating conditions are within the design envelope of the system.


Systematic failures are in PDS defined as failures that can be related to a particular cause other than natural degradation. Systematic failures are due to errors made during the specification, design, operation and maintenance phases of the lifecycle. Such failures can therefore normally be eliminated by a modification, either of the design or manufacturing process, the testing and operating procedures, the training of personnel or changes to procedures and/or work practices. There are different schemes for splitting between random hardware and systematic failures and for classifying the systematic failures. Here, a further split into five categories of systematic failures has been suggested:

• Software faults may be due to programming errors, compilation errors, inadequate testing, unforeseen application conditions, change of system parameters, etc. Such faults are present from the point where the incorrect code is developed until the fault is detected either through testing or through improper operation of the safety function. Software faults can also be introduced during modification to existing process facilities, e.g., inadequate update of the application software to reflect the revised shutdown sequences or erroneous setting of a high alarm outside its operational limits.

• Hardware related systematic failures are failures (other than software faults) introduced mainly during the design phase of the equipment but also during modifications/repairs. It may be a failure arising from incorrect, incomplete or ambiguous system specification, a failure in the manufacturing process and/or in the quality assurance of the component. Examples are a valve failing to close due to insufficient actuator force or a sensor failing to discriminate between true and false demands.

• Installation failures are failures introduced during the last phases prior to operation, i.e., during installation or commissioning. If detected, such failures are typically removed during the first months of operation and are therefore often excluded from data bases. These failures may however remain inherent in the system for a long period and can materialise during an actual demand. Examples are erroneous location of e.g., fire/gas detectors, a valve installed in the wrong direction or a sensor that has been erroneously calibrated during commissioning.

• Excessive stress failures occur when stresses or conditions beyond the design specification are placed upon the component. The excessive stresses may be caused either by external causes or by internal influences from the medium. Examples may be damage to process sensors as a result of excessive vibration, internal valve erosion caused by unforeseen sand production or plugging of instrument taps caused by unforeseen low temperatures.

• Operational failures are initiated by human errors during operation/intervention or testing/maintenance/repair. In the operational phase the variability of tasks and work practices increases, thereby increasing the possibility of human errors during interaction with the SIS. Such errors are therefore believed to be an important contributor towards SIS unavailability, which is further supported by data from sources such as OREDA /13/ and operational reviews performed by SINTEF. Examples of such interaction failures 2 are loops left in override position after completion of maintenance, a shutdown function set in bypass during start-up due to dynamic process conditions, erroneous calibration of a level sensor or a process sensor isolation valve left in closed position so that the instrument does not sense the medium.

Systems and equipment that are designed and operated in accordance with IEC 61508 undergo a formal work process specifically aimed at minimizing systematic failures. The standard also provides a number of checklists with measures and techniques to avoid and control such failures during the different life cycle phases. Hence, systematic failures shall be minimised to the extent possible, given that functional safety management according to IEC 61508 and IEC 61511 is properly implemented.

2 It should be mentioned that in other classification schemes (e.g., in /24/) some of these operational human errors are defined as random failures rather than systematic ones. However, in PDS we have chosen to classify all human errors as systematic failures.


In general, systematic failures can give rise to failure of multiple components, i.e., common cause failures. Random hardware failures, on the other hand, can be denoted independent failures and are assumed not to result in common cause failures.

It should be noted that some failures may not fit perfectly into the above scheme. E.g., it may sometimes be difficult to discriminate between an aging failure and a stress failure. Similarly, it may be argued that there is overlap between some of the failure categories. However, for the purpose of illustrating that SIS failures may have a variety of causes without introducing an overly complex classification scheme, the above categories are considered sufficiently detailed.

Random hardware failures are sometimes referred to as physical failures whereas systematic failures are referred to as non-physical failures. A physical failure occurs when a component has degraded to a point of failure where it is not able to operate and thus needs to be changed or repaired. An example can be a relay which due to wear out is not able to change position. A non-physical failure, on the other hand, occurs when the component is still able to operate but does not perform its specified function. An example may be a gas detector not functioning because it is still covered by plastic due to sand blasting in the area. It should, however, be noted that systematic failures caused by excessive stresses may result in a physical failure of the component. E.g., unforeseen vibration of a pump can cause a physical failure of a flow transmitter located on the connected piping. Hence, given the classification scheme in Figure 1, it is not correct to state that all systematic failures are non-physical failures.

In line with the IEC standards, the PDS method has a strict focus on the entire safety function and therefore intends to account for all failures that could compromise this function. Some of these failures may be related to the interface/environment, such as e.g., "vibration of nearby pump causing transmitter to fail". However, it is part of the PDS philosophy to include, or at least to consider, the possibility of such events since they may contribute towards the unavailability of the safety system.

3.3 How to make the reliability calculations more realistic

Following the introduction of IEC 61508 and the accompanying SIL verification process, it has become an increasing problem that exaggerated performance claims are made by equipment manufacturers (see e.g., /9/ and /10/). Predictive analyses based on seemingly perfect operating conditions often claim failure rates an order of magnitude or more below what has historically been observed during operation. There may be several causes for such exaggerated claims of performance, including imprecise definition of equipment and analysis boundaries, incorrect failure classification or too optimistic predictions of the diagnostic coverage factor, /9/. Another important reason seems to be that figures from such predictive analyses frequently exclude any possible contributions from systematic failures, e.g., failures that in one way or another can be attributed to operation rather than the equipment itself. From a manufacturer's point of view this is understandable – why include failures that are not "his responsibility"? On the other hand, the SIS is installed for the purpose of providing a further specified risk reduction, and unrealistic failure rates can result in far too optimistic predictions.

An important idea behind the PDS method is that the predicted risk reduction, calculated for a safety instrumented function (SIF) in the design phase, should reflect the actual risk reduction that may be experienced in the operational phase. In the PDS method we have therefore argued that the contributions from both random hardware failures and systematic failures should, to the degree possible, be quantified. This approach may appear somewhat different from the IEC 61508 standard, which says that only the contribution from random hardware failures shall be quantified and that reduction and avoidance of systematic failures shall be treated qualitatively. It should, however, be noted that IEC 61508 actually quantifies part of the systematic failures through the proposed method for quantifying hardware related common cause failures (ref. IEC 61508-6, Annex D). The IEC standard also repeatedly states that the contribution from human errors should be included, although not explicitly saying how this shall be done.


Some main arguments why the contribution from systematic failures, and in particular those introduced in the operational phase, should be included in the reliability estimates are:

• We want our risk reduction predictions (and SIL calculations) to be as realistic as possible;
• Failure to adequately address potential systematic failures can lead to overly optimistic results and a possible misallocation of resources intended to reduce risk;
• Too optimistic failure rates may result in an inadequate number of protection layers since too high risk reduction is assumed for each layer;
• Systematic failures are often the dominant contributor towards the overall failure probability (ref. e.g., failure data dossiers in /11/);
• Failure rates as given in e.g., /11/ and /13/ are based on historic (operational) data and therefore often include some systematic failures.

In PDS the systematic failures have, in addition to failure cause, been classified in two main categories:

1. Systematic failures detectable during testing. Examples may be a detector left in override mode at the last test, a miscalibrated transmitter or a valve that will not close due to hydrate formation;
2. Systematic failures not detected during testing but occurring upon a true demand. One example may be a software error introduced during update of the program logic. Another example can be a valve that closes during regular testing but due to insufficient actuator force does not close upon a process demand situation (with high process pressure).

It should finally be pointed out that a thorough understanding of the system, including an analysis of relevant failure modes and how to detect them, is crucial in order to avoid these failures in the first place.

3.4 Testing and Failure Detection

Testing and subsequent failure detection is vital in order to reveal and remove hidden failures in the safety system. Mainly, we have three possibilities for failure detection:

• Failure detection by automatic self-tests (including operator observation);
• Failure detection by functional testing (i.e., manual testing);
• Failure detection upon process demands / shutdowns.

3.4.1 Automatic Self-tests

Modules often have built-in automatic (diagnostic) self-tests to detect failures. Typical failure modes that can be detected by diagnostics are signal loss, drifted analogue signal / signal out of range or final element in wrong position, /5/. Further, upon discrepancy between redundant modules in the safety system, the system may determine which of the modules has failed. This is considered part of the self-test. However, it is never the case that all failures are detected automatically. The fraction of failures being detected by the automatic self-test is called the diagnostic (fault) coverage and quantifies the effect of the self-test.

Note that the actual effect on system performance from a failure that is detected by the automatic self-test will depend on system configuration and what action is taken when the equipment fault is detected. In particular it is important to consider whether the fault initiates an automatic shutdown action or alternatively only generates a system alarm which requires an active operator response.

In addition to the automatic self-test, an operator or maintenance crew may detect dangerous failures incidentally in between tests. For instance, the panel operator may detect a transmitter that has frozen or a detector that has been left in by-pass. Similarly, when a process segment is isolated for maintenance, the operator may detect that one of the valves will not close. In previous editions of the handbook, the PDS method has allowed for incorporating this effect into the diagnostic coverage factor. However, since there is an increasing trend towards low- or unmanned (or subsea) installations, and also to be in line with the IEC definitions, we now define diagnostic coverage for dangerous failures to only include the effect of self-test (ref. section 3.6.1 for definition of coverage).

3.4.2 Functional Testing

Functional testing is performed manually at predefined time intervals and aims at testing the components involved in the execution of a safety instrumented function. In reliability analyses it is often assumed that functional testing is "perfect" in the sense that it replicates a true demand and thereby detects 100% of the failures. However, in reality the testing may be imperfect and/or the test conditions may deviate from the true demand conditions, leaving some parts of the function untested. Some typical examples of test conditions that may not reveal all failures are 3:

• Partial stroke testing (PST) 4;
• Test buttons on switches, e.g., built-in test facilities - these may or may not reveal all faults;
• Transmitters put into test mode and signals injected (usually with smart / fieldbus transmitters);
• Pressure transmitters tested from manifold, i.e., impulse lines not tested;
• Equipment not tested in normal position.

To cater for the effect of incomplete testing, the probability of so-called test independent failures (TIF) can be added to the PFD. This is further discussed in section 3.7.4. The fraction of failures detected during functional testing is often referred to as the proof test coverage (PTC). Partial stroke testing of valves is perhaps the best known example, where only part of the valve functionality is tested but not the full stroke. Typically, for such a partial test, the test coverage may be estimated and applied in the reliability calculations. This is further discussed in section 3.7.5.

3.4.3 Process Demands Serving as Testing

Generally, it has not been standard practice in reliability analyses to model demands as a means for failure detection. One obvious reason is that a real demand on the safety function cannot be predicted, and detection of a failure at this point may anyhow be too late (especially in single configurations). There will however be several planned (and unplanned) shutdown events where data related to SIS performance can be recorded – either manually or automatically in the plant information management system. Such information may typically include a listing of activated equipment, result of activation and possible failure modes including response/travel times. Hence, it may be possible to utilise this shutdown information for testing purposes, thereby potentially reducing the need for manual testing.

Utilising shutdown reports as a means of testing should however be done with great care. It is required that the data recorded during the shutdown provide information equivalent to that obtained during a functional test. Further, it is important to identify which functions or parts of functions are not activated during the shutdown and therefore need to be tested separately.

3 For more information see: http://www.hse.gov.uk/foi/internalops/hid_circs/technical_general/spc_tech_gen_48.htm
4 Some practitioners have argued that PST is a means to increase the diagnostic coverage. However, since the interval between partial stroke testing is usually 1-3 months, whereas a diagnostic test should take place more frequently than once a week (ref. IEC 61508-6, Table D.3), we here consider partial stroke testing as a (partial) functional test.

3.5 Failure Mode Classification and Taxonomy

In IEC 61508-4 /1/ the following definitions are given of a dangerous and a safe failure respectively:

Dangerous failure: "failure of an element and/or subsystem and/or system that plays a part in implementing the safety function that: a) prevents a safety function from operating when required (demand mode) or causes a safety function to fail (continuous mode) such that the EUC is put into a hazardous or potentially hazardous state; or b) decreases the probability that the safety function operates correctly when required"

Safe failure: "failure of an element and/or subsystem and/or system that plays a part in implementing the safety function that: a) results in the spurious operation of the safety function to put the EUC (or part thereof) into a safe state or maintain a safe state; or b) increases the probability of the spurious operation of the safety function to put the EUC (or part thereof) into a safe state or maintain a safe state".

Furthermore, IEC 61508-4 defines "no effect failure" as a failure of an element that plays a part in implementing the safety function but has no direct effect on the safety function. In line with IEC 61508 the PDS method also considers three failure modes: dangerous, safe and non-critical failures. These failure modes are given the following interpretations (on a component level):

• Dangerous (D). The component does not operate upon a demand (e.g., sensor stuck upon demand or valve does not close on demand). The Dangerous failures are further split into:
  o Dangerous Undetected (DU). Dangerous failures not detected by automatic self-test (i.e., revealed only by a functional test or upon a demand);
  o Dangerous Detected (DD). Dangerous failures detected by automatic self-test.

• Safe (S). The component may operate without any demand (e.g., sensor provides a shutdown signal without a true demand - 'false alarm'). The Safe failures are further split into:
  o Safe Undetected (SU). Safe failures not detected by automatic self-test (or incidentally by personnel) and which therefore result in spurious operation of the component 5;
  o Safe Detected (SD). Potential spurious operation failures detected by automatic self-test (or incidentally by personnel). Hence, actual trips of the component are avoided.

• Non-critical (NONC). The main functions of the component are not affected. Examples may be sensor imperfection or a minor leakage of hydraulic oil from an actuator, which has no immediate impact on the specified safety function. These failures correspond to the no-effect failures as defined by IEC 61508-4.

5 Depending on system configuration a spurious trip of the SIS may be avoided; e.g., by using a 2oo2 voting.

The Dangerous and Safe (spurious operation) failures are considered "critical" in the sense that they affect either of the two main functions, i.e., (1) the ability to shut down on demand or (2) the ability to maintain production when safe. The Safe failures are usually revealed instantly upon occurrence, whilst the Dangerous failures are "dormant" and can be detected by testing or upon a true demand. Note that although a safe failure typically results in the system going to its predefined safe state, such failures are by no means without consequences. There may be associated production losses, environmental emissions caused by flaring and also the required process start-up with all of its potential hazards.

It should further be noted that a given failure may be classified as either dangerous or safe depending on the intended application. E.g., loss of hydraulic supply to a valve actuator operating on-demand will be dangerous in an energise-to-trip application and safe in a de-energise-to-trip application. Hence, when performing reliability calculations, the assumptions underlying the applied failure data as well as the context in which the data shall be used must be carefully considered.

Based on the classification discussed above, the failure rate λ can be split into the following elements:

• λDD = Rate of dangerous detected failures
• λDU = Rate of dangerous undetected failures
• λSD = Rate of safe detected failures
• λSU = Rate of safe undetected failures
• λNONC = Rate of non-critical failures (comparable to "no effect failure" in IEC)

We also introduce λcrit = λD + λS, which is the rate of critical failures, i.e., failures which unless detected can cause a failure on demand or a spurious trip of the safety function. In addition we have the total failure rate λ = λcrit + λNONC. Table 1 and Figure 2 further illustrate how λcrit and λ can be split into their various elements.

Table 1: Rate of critical failures, λcrit, split into various elements

           | Safe failures | Dangerous failures | Sum
Undetected | λSU           | λDU                | -
Detected   | λSD           | λDD                | -
Sum        | λS            | λD                 | λcrit

[Figure: the failure rate λ split into λDU (dangerous failure, undetected by automatic self-test), λDD (dangerous failure, detected by automatic self-test), λSU (safe failure, undetected by automatic self-test or personnel), λSD (safe failure, detected by automatic self-test or personnel) and λNONC; λcrit comprises all elements except λNONC, and of the critical failures all elements except λDU contribute to the SFF (Safe Failure Fraction).]

Figure 2: Failure rate λ split into various elements

3.6 Dangerous Undetected Failures - λDU

As discussed above, the critical failure rate, λcrit, is split into dangerous and safe failures which are further split into detected and undetected failures. When performing safety unavailability calculations, the rate of dangerous undetected failures, λDU, is of special importance, since this parameter – together with the functional test interval – to a large degree governs the prediction of how often a safety function is likely to fail on demand. As discussed in section 3.2, λDU will include both random hardware failures as well as systematic failures. Consequently, it is relevant to think of λDU as comprising two elements: λDU−RH, which is the rate of DU random hardware failures (i.e., the strict IEC 61508 definition of λDU), and λDU−SYST, being the rate of DU systematic failures, detectable by functional testing. Hence we can write: λDU = λDU−RH + λDU−SYST.


Further, in PDS the parameter 𝑟 is defined as being the fraction of 𝜆DU originating from random hardware failures, i.e., 𝑟 = 𝜆DU−RH /𝜆DU. Then, 1 − 𝑟 will be the fraction of 𝜆DU originating from systematic failures, i.e., 1 − 𝑟 = 𝜆DU−SYST /𝜆DU .

It must be pointed out that splitting 𝜆DU is not necessary when performing standard reliability calculations. This is further discussed when the calculation formulas are presented in the next sections. However, when considering risk reducing measures to reduce the failure rate it is advantageous to have additional knowledge on how the different failure contributors distribute.

3.6.1 Coverage Factors and Safe Failure Fraction

IEC 61508 defines the diagnostic coverage (DC) as:

• DC = λDD/λD, i.e., the fraction of dangerous failures detected by automatic on-line diagnostic tests. The fraction of dangerous failures is computed by using the dangerous failure rates associated with the detected dangerous failures divided by the total rate of dangerous failures.

In the IEC definition (of DC) given above, the coverage only includes failures "detected by automatic on-line diagnostic tests". As discussed in section 3.4.1, it will for some equipment and some installations where detected failures can be rectified quickly be relevant also to include random observation (by control room operator, field operator or maintenance crew). However, to be in line with the IEC definition we will adopt the same definition as above for dangerous failures, whereas we keep the PDS definition (from previous PDS handbooks) for safe failures. We therefore define the coverage, c, for dangerous and safe (spurious operation) failures as:

• cD = λDD/λD, i.e., the fraction of dangerous failures detected by automatic self-tests
• cS = λSD/λS, i.e., the fraction of safe (spurious operation) failures that are detected by automatic self-tests (or by personnel) so that a spurious operation of the component is avoided

Thus, we see that the coverage for dangerous failures, cD, now is directly comparable to the DC as defined in IEC 61508. Concerning the coverage factor for safe failures, cS, this factor is not explicitly defined in IEC 61508 and its physical interpretation seems to vary among users of the IEC standards. In PDS a safe detected (SD) failure is interpreted as a failure which is detected prior to a spurious operation of the component (or spurious trip of the system), whereas a safe undetected failure actually causes a component trip (but a system trip may be avoided due to the configuration of the system, e.g., 2oo2 voting).

Finally, observe that IEC also introduces the safe failure fraction (SFF) in relation to the requirements for hardware fault tolerance. This is the fraction of failures that are not critical with respect to safety unavailability of the safety function. SFF is defined as the ratio of safe failures plus dangerous detected failures to the total failure rate and can be estimated as:

• SFF = 1 − (λDU/λcrit); or rather in percentage: SFF = [1 − (λDU/λcrit)] ∙ 100 %.
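As a numerical illustration of these definitions (with assumed, illustrative failure rates rather than data from the handbook), the sketch below computes cD (= DC), cS and SFF for a single component, and also splits λDU using an assumed value of the PDS parameter r from section 3.6:

```python
# Illustrative failure rates per hour (assumed values, not from the handbook).
lam_dd, lam_du = 1.2e-6, 0.8e-6   # dangerous detected / undetected
lam_sd, lam_su = 1.5e-6, 1.5e-6   # safe detected / undetected

lam_d = lam_dd + lam_du           # total dangerous failure rate
lam_s = lam_sd + lam_su           # total safe failure rate
lam_crit = lam_d + lam_s          # rate of critical failures

c_d = lam_dd / lam_d              # coverage for dangerous failures (= DC)
c_s = lam_sd / lam_s              # coverage for safe failures
sff = 1 - lam_du / lam_crit       # safe failure fraction

r = 0.5                           # assumed fraction of lambda_DU that is random hardware
lam_du_rh, lam_du_syst = r * lam_du, (1 - r) * lam_du

print(f"DC = {c_d:.0%}, cS = {c_s:.0%}, SFF = {sff:.0%}")
# -> DC = 60%, cS = 50%, SFF = 84%
```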

3.7 Performance Measures for Loss of Safety – Low Demand Systems

The measures for loss of safety used in IEC are the average PFD (Probability of Failure on Demand) for low demand systems and PFH (Probability of Failure per Hour) for high demand systems. This section presents the various measures for loss of safety used in PDS. All these reflect safety unavailability of the function, i.e., the probability of a failure on demand. Probability of failure per hour (PFH) is discussed separately in chapter 6.

3.7.1 Contributions to Loss of Safety

The potential contributors to loss of safety (safety unavailability) have in PDS been split into three main categories:


• PFD: Unavailability due to dangerous undetected (DU) failures.
• DTU: Unavailability due to known or planned downtime.
• PTIF: Unavailability due to TIF failures (test independent failures).

1) Unavailability due to dangerous undetected (DU) failures, i.e., unavailability caused by dangerous failures that are detectable only during functional testing or upon a demand (not revealed by automatic self-test). This unavailability, which is often referred to as "unknown", may be thought of as comprising two elements:
   a) The unavailability due to dangerous undetected random hardware failures (occurring with rate λDU−RH).
   b) The unavailability due to dangerous undetected systematic failures (occurring with rate λDU−SYST).

2) Unavailability due to known or planned downtime. This unavailability is caused by components either taken out for repair or for testing/maintenance. The downtime unavailability can be split into two main contributors:
   a) The known unavailability due to dangerous (D) failures where the failed component must be repaired. The average period of unavailability due to these events equals the mean time to restoration, MTTR, i.e., the time elapsing from when the failure is detected until the situation is restored.
   b) The planned (and known) unavailability due to the downtime/inhibition time during functional testing and/or preventive maintenance.

3) Unavailability due to test independent failures, i.e., unavailability caused by hidden dangerous failures that are not revealed during functional testing but only upon a true demand. These failures are denoted Test Independent Failures (TIF), as they are not detected through the functional test or by automatic self-test, only during a real demand.

Figure 3 illustrates the three categories of contributors to loss of safety.

[Figure: the three contributors to loss of safety. Dangerous undetected failures – 1a) random hardware failures and 1b) systematic failures – contribute to PFD; downtime unavailability – 2a) out for repair and 2b) out for testing – contributes to DTU; and 3) failures not covered by functional testing (test independent failures, TIF) contribute to PTIF.]

Figure 3: Loss of safety contributors in PDS

It should be noted that the actual contribution to loss of safety from failures in category 2) will depend heavily on the operating philosophy, on the configuration of the process plant as well as on the configuration of the SIS itself. Therefore, the downtime unavailability should be treated separately and not together with categories 1) and 3). Often, temporary compensating measures will be introduced while a component is down for maintenance or repair. At other times, when the component is considered too critical to continue production (e.g., a critical shutdown valve in single configuration), the production may simply be shut down during the restoration and testing period. On the other hand, there may be test- or repair-situations where parts of or the whole safety system is bypassed while production is being maintained. An example may be that selected fire and gas detectors are being inhibited while reconfiguring a node in the fire and gas system.
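To make the three categories concrete, the sketch below evaluates each of them for a single (1oo1) component. All input values are illustrative assumptions; the PFD term uses the simplified low demand formula presented in chapter 5, the repair term approximates the known unavailability as λD ⋅ MTTR, and the planned term simply assumes the component is bypassed for t_test hours during each functional test:

```python
# Illustrative assumptions only - not input data from the handbook.
lam_du = 2.0e-6    # rate of dangerous undetected failures per hour
lam_d  = 5.0e-6    # total rate of dangerous failures per hour
tau    = 8760.0    # functional test interval in hours (12 months)
mttr   = 8.0       # mean time to restoration in hours
t_test = 4.0       # assumed bypass/inhibition time per functional test, hours
p_tif  = 1.0e-5    # assumed probability of a test independent failure

pfd = lam_du * tau / 2             # category 1: "unknown" unavailability
dtu = lam_d * mttr + t_test / tau  # category 2: repair (2a) + planned downtime (2b)
# category 3: the TIF contribution is the probability p_tif itself

print(f"PFD = {pfd:.1e}, DTU = {dtu:.1e}, PTIF = {p_tif:.1e}")
# -> PFD = 8.8e-03, DTU = 5.0e-04, PTIF = 1.0e-05
```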


Often the downtime unavailability is small compared to the contribution from failures in category 1); that is, usually the MTTR is small compared to τ/2.

The simplified PFD formulas presented below apply for small values of λDU ⋅ τ and should be used with care when λDU ⋅ τ > 0.05 or when N is large (i.e., N > 3, see Appendix C).

Simplified PFD formulas for some different voting logics are summarised in Table 3 below. The table includes the following:

i. In the first column, the voting logic (MooN) is given;
ii. In the second column, the PFD contribution from common cause failures is included. For voted configurations like 1oo2, 1oo3, 2oo3, etc. this will often be the main contributor towards the total PFD;
iii. In the third column, the contribution to PFD from independent failures is given. For an MooN voting we get a contribution if at least N − M + 1 of the components fail within the same test interval. Note that for the 1oo1, 2oo2, … and NooN votings, the PFD will (conservatively) equal the sum of the independent failure contributions.


Table 3: Summary of simplified formulas for PFD (total PFD ≈ common cause contribution + contribution from independent failures)

Voting | Common cause contribution | Contribution from independent failures
1oo1 | – | 𝜆DU ⋅ 𝜏/2
1oo2 | 𝛽 ⋅ 𝜆DU ⋅ 𝜏/2 | (𝜆DU ⋅ 𝜏)²/3
1oo3 | C1oo3 ⋅ 𝛽 ⋅ 𝜆DU ⋅ 𝜏/2 | (𝜆DU ⋅ 𝜏)³/4
2oo2 | – | 2 ⋅ 𝜆DU ⋅ 𝜏/2
2oo3 | C2oo3 ⋅ 𝛽 ⋅ 𝜆DU ⋅ 𝜏/2 | (𝜆DU ⋅ 𝜏)²
3oo3 | – | 3 ⋅ 𝜆DU ⋅ 𝜏/2
1oo𝑁 (𝑁 = 2, 3, …) | C1oo𝑁 ⋅ 𝛽 ⋅ 𝜆DU ⋅ 𝜏/2 | (𝜆DU ⋅ 𝜏)^𝑁/(𝑁 + 1)
𝑀oo𝑁 (𝑀 < 𝑁; 𝑁 = 2, 3, …) | C𝑀oo𝑁 ⋅ 𝛽 ⋅ 𝜆DU ⋅ 𝜏/2 | 𝑁!/((𝑁 − 𝑀 + 2)! ⋅ (𝑀 − 1)!) ⋅ (𝜆DU ⋅ 𝜏)^(𝑁−𝑀+1)
𝑁oo𝑁 (𝑁 = 1, 2, 3, …) | – | 𝑁 ⋅ 𝜆DU ⋅ 𝜏/2

For slightly more accurate formulas, reference is made to Appendix C.
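To make Table 3 concrete, the following Python sketch implements the simplified formulas. It is an illustration only: the function name and the example figures are ours, and the C𝑀oo𝑁 factor must be taken from the configuration factors discussed in chapter 4.

```python
from math import factorial

def pfd_simplified(n, m, lam_du, tau, beta=0.0, c_moon=1.0):
    """Simplified PFD for an MooN voting, per Table 3.

    lam_du : dangerous undetected failure rate (per hour)
    tau    : functional test interval (hours)
    beta   : common cause fraction (used only when M < N)
    c_moon : C_MooN configuration factor (1.0 for 1oo2)
    """
    if m == n:
        # NooN: PFD (conservatively) equals the sum of the
        # independent failure contributions.
        return n * lam_du * tau / 2
    ccf = c_moon * beta * lam_du * tau / 2
    k = n - m + 1  # independent failures needed for a system failure
    ind = factorial(n) / (factorial(n - m + 2) * factorial(m - 1)) * (lam_du * tau) ** k
    return ccf + ind

# Example: a 1oo2 voting with lam_DU = 1.0e-6 per hour and annual testing:
print(pfd_simplified(n=2, m=1, lam_du=1.0e-6, tau=8760, beta=0.02))  # ~1.1e-4
```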

5.2.2 Comparison with other Methods

To illustrate the quantitative effect of using the above simplified formulas as compared to performing more exact calculations – here represented by Markov chain modelling with the standard 𝛽-factor model – some typical PFD calculations have been performed. For further comparison, the results from using the IEC 61508 formulas have also been included (ref. tables B.2–B.4 of IEC 61508-6).


Table 4: Calculated PFD using alternative calculation methodologies 1)

Voting | 𝛽 | 𝜆D | DC | 𝜏 | PDS | Markov | IEC 61508
1oo1 | – | 5 ⋅ 10⁻⁶ | 60 % | 12 months | 8.8 ⋅ 10⁻³ | 8.7 ⋅ 10⁻³ | 8.8 ⋅ 10⁻³
1oo2 | 0.02 | 2.5 ⋅ 10⁻⁶ | 60 % | 12 months | 1.1 ⋅ 10⁻⁴ | 1.1 ⋅ 10⁻⁴ | 1.1 ⋅ 10⁻⁴
2oo2 | – | 0.5 ⋅ 10⁻⁶ | 60 % | 6 months | 8.8 ⋅ 10⁻⁴ | 8.7 ⋅ 10⁻⁴ 2) | 8.8 ⋅ 10⁻⁴
1oo3 | 0.10 | 5.0 ⋅ 10⁻⁶ | 90 % | 24 months | 2.2 ⋅ 10⁻⁴ 3) | 4.4 ⋅ 10⁻⁴ | 4.4 ⋅ 10⁻⁴ 4)
2oo3 | 0.02 | 2.5 ⋅ 10⁻⁵ | 90 % | 12 months | 9.2 ⋅ 10⁻⁴ 3) | 6.7 ⋅ 10⁻⁴ | 7.1 ⋅ 10⁻⁴ 4)

1) For the IEC 61508 results, a mean restoration time of 8 hours has been assumed for dangerous failures. For the PDS calculations the contribution from repair of dangerous failures is not included (it is treated separately as DTU, ref. section 5.2.3). However, as seen from the results this difference is negligible, which is also confirmed by the Markov modelling, where the contribution from repair has been omitted.
2) For the 2oo2 voting, a 𝛽-factor of 0.02 has been applied in the Markov modelling.
3) For the 1oo3 and 2oo3 votings, the PDS and IEC/Markov results differ as expected. IEC (and here also the Markov modelling) applies the standard 𝛽-factor model for CCF, which treats a 1oo3 and a 2oo3 voting like a 1oo2 voting, whereas the PDS formulas include the C𝑀oo𝑁 factors (ref. discussion in sections 4.1–4.2).
4) If using the correction factors for 𝛽 as given in IEC 61508-6, Table D.5, the same results are obtained with the IEC 61508 and the PDS formulas.
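As a cross-check, the PDS column of Table 4 can be reproduced with the pfd_simplified sketch given after Table 3, taking 𝜆DU = (1 − DC) ⋅ 𝜆D, a month as 730 hours, and the configuration factors C1oo3 = 0.5 and C2oo3 = 2.0 (the C𝑀oo𝑁 values that reproduce the PDS column; cf. chapter 4):

```python
# pfd_simplified as defined after Table 3.
# (n, m, lam_D, DC, test interval in months, beta, C_MooN)
rows = [
    (1, 1, 5.0e-6, 0.60, 12, 0.00, 1.0),
    (2, 1, 2.5e-6, 0.60, 12, 0.02, 1.0),
    (2, 2, 0.5e-6, 0.60, 6,  0.00, 1.0),
    (3, 1, 5.0e-6, 0.90, 24, 0.10, 0.5),
    (3, 2, 2.5e-5, 0.90, 12, 0.02, 2.0),
]
for n, m, lam_d, dc, months, beta, c in rows:
    lam_du = (1 - dc) * lam_d          # DU rate from the total dangerous rate
    pfd = pfd_simplified(n, m, lam_du, months * 730, beta=beta, c_moon=c)
    print(f"{m}oo{n}: PFD = {pfd:.1e}")
# Output: 8.8e-03, 1.1e-04, 8.8e-04, 2.2e-04, 9.2e-04 (the PDS column)
```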

5.2.3 Formulas for Downtime Unavailability (DTU)

The downtime unavailability includes two elements:
1) The downtime related to repair of (dangerous) failures. The average duration of this period is the mean time to restoration (MTTR);
2) The downtime (or inhibition time) resulting from planned activities such as testing and preventive maintenance.

As discussed in previous sections, the contribution from downtime unavailability will depend on the operational philosophy and the configuration of the process as well as of the SIS itself. Further, statutory requirements saying that compensating measures shall be introduced upon degradation of a critical safety function will also affect the operational philosophy. Hence, which formulas to apply will depend on several factors. Below, some approximate formulas and the corresponding underlying assumptions are given for each of the two downtime contributors listed above.

1) Downtime Unavailability due to Repair of Dangerous Failures – DTUR

The approximate formulas for the downtime unavailability due to repair, here referred to as DTUR, are comparable to the PFD formulas presented above. However, given that a dangerous failure has occurred, the average “known” unavailability period is MTTR rather than 𝜏/2.

If we follow IEC 61508, we include the MTTR of all dangerous failures, and for a single component the related downtime unavailability is then approximately 𝜆D ⋅ MTTR. Whether it is correct to treat DD failures detected during normal operation similarly to DU failures revealed during a functional test or upon a true demand is a matter of discussion. However, in order to be in line with the IEC standard, the following discussion handles dangerous failures in general.


When establishing formulas for DTUR, the operational philosophy must be specified. Here, three possible operational/repair philosophies are considered:

1. Always shut down. This (extreme) philosophy may apply for the most critical safety functions, and means that production is shut down (even for redundant systems) whenever at least one component of the safety function experiences a dangerous failure. In this case there will be no contribution to the DTUR, but there will be a contribution to loss of production.
2. Degraded operation if possible; otherwise shutdown. This may be the most common philosophy. If all redundant components have a dangerous failure there will be a shutdown; otherwise there will be degraded operation. If there is a single D failure in a 2oo3 voting, it must be specified whether the degraded operation is a 1oo2 or a 2oo2 voting. Note that if a 2oo3 voting degrades to a 1oo2 configuration, the safety performance actually improves, and no degradation term should be added, ref. Appendix C.
3. Always continue production, even with no protection. This is another extreme philosophy, where all the (redundant) components have experienced a dangerous failure, but production is continued during the repair/restoration period even with no protection available.

Observe that the above list is not complete, since alternative operational philosophies can be foreseen (“combinations” of the above). Also note that the possibility of introducing compensating measures has not been included in this discussion. Table 5 presents DTUR formulas for three common configurations for the two operational philosophies that may give DTUR contributions.

Table 5: Formulas for DTUR for some voting logics and operational philosophies 1), 2)

Initial voting logic | Failure type | Degraded operation | Operation with no protection
1oo1 | Single failure | N/A | 𝜆D ⋅ MTTR
1oo2 | Single failure | Degraded operation with 1oo1: 2 ⋅ 𝜆D ⋅ MTTR ⋅ 𝜆DU ⋅ 𝜏/2 | N/A
1oo2 | Both components fail | N/A | 𝛽 ⋅ 𝜆D ⋅ MTTR
2oo3 | Single failure | Degraded operation with 2oo2: 3) 3 ⋅ 𝜆D ⋅ MTTR ⋅ 2 ⋅ 𝜆DU ⋅ 𝜏/2 | N/A
2oo3 | Two components fail | Degraded operation with 1oo1: (C2oo3 − C1oo3) ⋅ 𝛽 ⋅ 𝜆D ⋅ MTTR ⋅ 𝜆DU ⋅ 𝜏/2 | N/A
2oo3 | All three components fail | N/A | C1oo3 ⋅ 𝛽 ⋅ 𝜆D ⋅ MTTR

1) Note that 𝜆D has been used in the formulas to ensure consistency with IEC 61508. In Appendix C the more correct 𝜆DD has been applied, since degraded operation will mainly take place upon a detected failure.
2) Also note that the formulas provided here do not distinguish between the MTTR for one, two or three components.
3) Degradation to a 1oo2 voting gives no contribution to the DTU, since a 1oo2 voting actually gives increased safety as compared to a 2oo3 voting.

When the operational/repair philosophy for safe (detected and undetected) failures is specified, similar DTUR formulas as shown in Table 5 can be established for these failure types as well. If the same repair


philosophy applies for all critical failures, the approximate DTUR formulas as given in Table 5 can be applied, simply replacing 𝜆D with 𝜆crit .
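As a minimal illustration of Table 5, the DTUR contribution for a 1oo2 voting under the “degraded operation” philosophy can be coded as follows (the function name and example figures are ours):

```python
def dtu_r_1oo2_degraded(lam_d, mttr, lam_du, tau):
    """DTU_R for a 1oo2 voting, 'degraded operation' philosophy (Table 5):
    the rate of single dangerous failures (2 * lam_d) times the repair
    duration (MTTR) times the average PFD of the remaining 1oo1 channel."""
    return 2 * lam_d * mttr * lam_du * tau / 2

# Example: lam_D = 5e-6/h, MTTR = 8 h, lam_DU = 2e-6/h, annual testing:
print(dtu_r_1oo2_degraded(5e-6, 8, 2e-6, 8760))  # ~7.0e-7, usually negligible
```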

2) Downtime Unavailability due to Functional Testing/Preventive Maintenance – DTUT

The downtime unavailability due to planned testing (or other preventive maintenance) activities is here referred to as DTUT. For a single component the DTUT can be given as 𝑡/𝜏, where 𝑡 is the duration of the function being bypassed during functional testing, and 𝜏 is the time between tests. Hence, this is simply the fraction of time when the function is being bypassed.

Similarly, if redundant (voted) components are taken out for testing simultaneously while production is maintained (as is often the case for e.g., gas detectors), the DTUT can also be given as 𝑡/𝜏. Here, 𝑡 is still the duration of the function being bypassed during the test (and may differ from the 𝑡 above, where a single component is tested). For some redundant systems, e.g., two process sensors voted 1oo2, it may be more relevant that one sensor is taken out for testing while the other is still operating. In this period the function is actually degraded to a 1oo1 system, and we therefore need to calculate the unavailability contribution during the time of degraded operation: While testing the first component (component 1), a time period 𝜏 has elapsed since the last test. The corresponding unavailability contribution while testing component 1 then becomes:

DTUT(1) ≈ (𝑡/𝜏) ⋅ 𝜆DU ⋅ 𝜏 = 𝑡 ⋅ 𝜆DU.

Here, 𝑡/𝜏 is the fraction of time while component 1 is tested, and 𝜆DU ⋅ 𝜏 is the probability that a dangerous undetected failure has been introduced to component 2 during the period 𝜏 since the last test. While testing component 2, component 1 will be active and has just been tested (“as good as new”). Hence, the unavailability contribution from testing of component 2 becomes:

DTUT(2) ≈ (𝑡/𝜏) ⋅ 𝜆DU ⋅ 𝜏/2.
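A small sketch, under the assumption that the total DTUT is simply the sum of the two contributions derived above (names and figures are ours):

```python
def dtu_t_1oo2_one_at_a_time(t, lam_du, tau):
    """DTU_T for a 1oo2 voting where the two components are tested in
    sequence: DTU_T(1) = (t/tau)*lam_du*tau (component 2 has been
    unattended for a whole interval) plus DTU_T(2) = (t/tau)*lam_du*tau/2
    (component 1 is freshly tested)."""
    return (t / tau) * lam_du * tau + (t / tau) * lam_du * tau / 2

# Example: 4 hours of bypass per test, lam_DU = 2e-6/h, annual testing:
print(dtu_t_1oo2_one_at_a_time(4, 2e-6, 8760))  # 1.2e-05
```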

Assuming that 𝑡 ≪ 𝜏, […] For a redundant system (𝑁 > 1) there must be more than one failure in a test interval in order to have a system failure, and so the length of the test interval, 𝜏, necessarily enters the PFH formula (an alternative explanation of the above formula is given below for a 1oo2 voting).

Further observe that, given a system failure, this will on average occur in the last part of the interval, when all components have failed, and 1/(𝑁 + 1) is actually the average fraction of the interval where all 𝑁 components are failed. So the probability that the system has a dangerous undetected failure is the probability that all units fail in an interval, multiplied by this fraction of the interval having a system failure; see the above formula for PFD1oo𝑁.


Observe that the two safety measures – PFH and PFD – are just two different ways of expressing loss of safety, and in the present example we have: 10)

PFD1oo𝑁 ≈ PFH1oo𝑁 ⋅ 𝜏/(𝑁 + 1)  (for 𝑁 = 1, 2, 3, …; independent failures only).

Also note that neither of these measures actually restricts itself to be used for low demand or high demand mode only. However, there is a difference in interpretation: PFD is the relative fraction of time that the system is unavailable, whereas PFH expresses the frequency at which DU failures occur (irrespective of the duration of the resulting unavailability).

6.3 Using PFD or PFH?

When choosing between the measures PFD and PFH it is important to consider the foreseen frequency of process demands. If this is higher than once per year, the system should – according to IEC 61508 – be treated as high demand, implying the use of PFH. This criterion for deciding which measure to apply is considered in more depth in Appendix E; the main conclusions are summarised below. First, it is claimed that an important criterion for choosing a loss-of-safety measure should be its ability to capture how the safety depends on the following parameters:

• Failure rates (in particular 𝜆DU) and 𝛽
• Configuration (i.e., C𝑀oo𝑁)
• Length of the interval of functional testing (𝜏)
• Demand rate (𝛿)

The effect of the demand rate, 𝛿, is in most analyses not explicitly accounted for, although it certainly affects safety. In the present context, where low demand and high demand are introduced, the demand rate becomes an important parameter. Secondly, we will include the hazard rate (HR), which is the rate of demands where the SIS has failed. This is an important safety parameter as it expresses the actual hazard frequency. Thus, in addition to calculating the PFD or PFH, we also want to estimate the HR. In standard reliability theory, where the effect on the PFD from the number of demands is not accounted for, HR = PFD ∙ 𝛿. Since the demands may actually reduce PFD by serving as a functional test, this relationship can be somewhat more complex. However, for a fixed average PFD the hazard rate and the demand rate will be proportional.

The investigations in Appendix E mainly consider the standard approximation regarding PFH and PFD, i.e., only the contribution from CCF is considered for 𝑀oo𝑁 (𝑀 < 𝑁). It is concluded as follows regarding measures for loss-of-safety due to DU failures:

• PFH seems a sensible measure for systems operating in continuous mode, when we are talking of more or less immediate failure detection.
• PFH alone is not suited as a measure of loss of safety for on demand systems (neither low demand nor high demand), irrespective of 𝜏 and 𝛿:
  o The main contributor to PFH does not depend on the length of the test interval, 𝜏. So the decrease in safety experienced by increasing 𝜏 is essentially not captured by PFH.
  o No argument is found for using PFH instead of PFD if the number of demands is above one per year. PFH is essentially constant, independent of the demand rate, 𝛿, and does not reveal how the risk depends on the demand rate. However, by starting from PFH, we can easily calculate both PFD and HR.

10) Here the factor 𝜏/(𝑁 + 1) can be interpreted as the average duration of system unavailability when all 𝑁 components have failed.




• The relation HR = 𝛿 ∙ PFD can be shown to be valid also in the generalized situation where PFD depends on 𝛿 (see Appendix E). So when the demands actually serve as functional tests, it is recommended that the generalized expressions for HR and PFD (depending on both 𝜏 and 𝛿, see Appendix E) are used to determine whether the SIS has an acceptable safety unavailability.

Summing up the above, it can be concluded that for systems working on demand, such as emergency shutdown systems, process shutdown systems and fire and gas detection systems, PFD should generally be preferred over PFH as the safety unavailability measure. However, using the hazard rate (HR) may be an even better alternative for estimating the associated risk.

6.4 PFH Formulas; Including both Common Cause and Independent Failures

In this section we will, for completeness, present simplified formulas for calculating PFH for different voting configurations, now also taking into account the effect of combinations of independent failures. The focus is still on DU failures, but it is also discussed how to include the effect of DD failures. Note that the effect of demands is not accounted for in this section.

6.4.1 Assumptions for Simplified PFH Formulas

In section 5.1 some assumptions underlying the PFD formulas were given. Similarly, when establishing simplified formulas for PFH certain assumptions must be made:

• All failure rates are considered constant with respect to time, i.e., an exponential failure model is assumed.
• PFH is calculated as an average value.
• A component is considered “as good as new” after a repair or a functional test (standard assumption).
• The time between diagnostic self-tests is assumed significantly lower than the time between demands.
• The self-test period is “small” compared to the interval between functional testing, i.e., at least a factor 100 lower.
• When giving the “simple” formulas for PFH (section 6.4.2), the contribution from unavailability due to repair and testing of components is not included (cf. discussion in section 6.4.2), i.e., short MTTRs are assumed.
• For single (1oo1) component systems, the system is immediately put in a safe state upon detection and repair of a dangerous detected failure. Similarly, a DD failure affecting all 𝑁 redundant components of a system will upon detection immediately result in the system going to a safe state. So, in these simplified formulas we actually ignore DD failures, and PFH equals the rate of DU failures.
• The PFH of the function (safety system) is obtained by summing the PFH of each (series of / set of) redundant module(s). That is, we assume that PFHA and PFHB are small enough to let 1 − (1 − PFHA) ⋅ (1 − PFHB) ≈ PFHA + PFHB.
• The term 𝜆DU ⋅ 𝜏 should be small enough to allow e^(−𝜆DU⋅𝜏) ≈ 1 − 𝜆DU ⋅ 𝜏, i.e., 𝜆DU ⋅ 𝜏 ≤ 0.2.
• The rate of independent DU failures is throughout approximated by 𝜆DU (rather than e.g., using (1 − 𝛽) ⋅ 𝜆DU for 1oo2).
• For 𝑁 ≥ 3 we ignore the contribution of combinations of single and double failures. For instance, when considering a triple system voted 1oo3, we will only include, in the system failure frequency, common cause failures taking all three components out, or three separate (independent) failures. Consequently, we disregard the possibility that within the same test interval one common cause failure takes out two components whereas the third component fails independently.
• The formulas given here do not account for demands as a means of testing to detect dangerous failures (ref. discussion in section 3.4.3).

6.4.2 Simplified PFH Formulas

In the following simplified formulas the above list of assumptions applies, implying that the contribution from dangerous detected failures is negligible. For the inclusion of DD failures, relaxing some of the assumptions, see the next section.

1oo1 voting

As discussed above, when considering a single 1oo1 system which goes to a safe state upon detection of a dangerous failure, the PFH will equal the dangerous undetected failure rate: PFH1oo1 = λDU .

Observe that for a single component system the test interval 𝜏 does not enter the PFH formula. A constant failure rate is assumed, and functional testing (of a single component system) will not influence this failure rate. Additionally, frequent self-tests prevent DD failures from contributing significantly.

1oo2 voting

For a duplicated module, voted 1oo2, we get the following contribution when first considering only the common cause failures:

PFH1oo2(CCF) ≈ 𝛽 ⋅ 𝜆DU.

In addition we need to add the contribution from independent failures. This contribution can be approximated by:

PFH1oo2(ind.) ≈ (𝜆DU ⋅ 𝜏)²/𝜏 = 𝜆DU² ⋅ 𝜏.

Observe that for a duplicated – and generally for any redundant – system, the test interval 𝜏 does enter the PFH formula. This can be explained as follows for a system voted 1oo2: Upon failure of either of the components (with constant rate 𝜆DU), the likelihood of the other unit also being down (and thus giving a system failure upon a demand) will inevitably depend on how long it is since the components were tested. In other words: For a single system it is assumed that a critical situation will occur “at once” upon the introduction of a DU failure, whereas for a redundant system there will be “back-up”, and the availability of this back-up will depend on the time since the last functional test. Now, the contribution from independent failures in a 1oo2 voting can be intuitively interpreted as follows: Consider a system with two redundant components A and B, each failing with the constant rate 𝜆DU. If component A fails, this will “place a demand” on component B, which must have failed for the redundant system to fail. The rate of this event can be expressed as the rate of failure of component A, i.e., 𝜆DU, times the likelihood of component B being failed upon a demand, i.e., (𝜆DU ⋅ 𝜏)/2. Hence the rate of the above event becomes 𝜆DU ⋅ (𝜆DU ⋅ 𝜏)/2. Similarly, component B can fail first with rate 𝜆DU, which is multiplied by the likelihood of component A failing on demand, i.e., 𝜆DU ⋅ (𝜆DU ⋅ 𝜏)/2. By adding these two equal contributions, we obtain the above expression for PFH1oo2(ind.).

45 of 93

Reliability Prediction Method for Safety Instrumented Systems PDS Method Handbook, 2013 Edition

Hence, when including both the common cause and the independent failure contributions, we get the following PFH formula for a 1oo2 voted system:

PFH1oo2 ≈ 𝛽 ⋅ 𝜆DU + (𝜆DU ⋅ 𝜏)²/𝜏.

2oo3 voting

For components voted 2oo3, the common cause contribution to the PFH is given by:

PFH2oo3(CCF) ≈ C2oo3 ⋅ 𝛽 ⋅ 𝜆DU.

The contribution from independent failures can be approximated by:

PFH2oo3(ind.) ≈ 3 ⋅ (𝜆DU ⋅ 𝜏)²/𝜏 = 3 ⋅ 𝜆DU² ⋅ 𝜏.

We observe that this contribution is three times the independent-failure contribution for a 1oo2 voting. This can be intuitively explained as follows: Consider a system with three components A, B and C voted 2oo3, each failing with a constant rate 𝜆DU. For a 2oo3 voting two components must fail for the system to fail. 11) Now consider the case when component A fails. This will place a demand on components B and C, and upon failure of either of these components the system fails. The rate of this event can be expressed as the rate of failure of component A, i.e., 𝜆DU, times the likelihood of either component B or C being failed upon a demand, i.e., 2 ⋅ (𝜆DU ⋅ 𝜏)/2. Hence the rate of the above event becomes 𝜆DU ⋅ (𝜆DU ⋅ 𝜏). Similarly, component B or C can fail first, and by adding these three equal contributions, we obtain the above expression for PFH2oo3(ind.). When including both the common cause and the contribution from independent failures, we then get the following PFH formula for a 2oo3 voted system:

PFH2oo3 ≈ C2oo3 ⋅ 𝛽 ⋅ 𝜆DU + 3 ⋅ (𝜆DU ⋅ 𝜏)²/𝜏.

𝑀oo𝑁 voting

Generally, for components voted 𝑀oo𝑁, the common cause contribution to the PFH is given by:

PFH𝑀oo𝑁(CCF) ≈ C𝑀oo𝑁 ⋅ 𝛽 ⋅ 𝜆DU ; (𝑀 < 𝑁).

When also including the contribution from independent failures, we obtain the following approximate formula for an 𝑀oo𝑁 voted system, ref. /18/:

PFH𝑀oo𝑁 ≈ C𝑀oo𝑁 ⋅ 𝛽 ⋅ 𝜆DU + 𝑁!/((𝑁 − 𝑀 + 1)! ⋅ (𝑀 − 1)!) ⋅ (𝜆DU ⋅ 𝜏)^(𝑁−𝑀+1)/𝜏.

Finally, for an 𝑁oo𝑁 voting, we may apply the following approximation formula:

PFH𝑁oo𝑁 ≈ 𝑁 ⋅ 𝜆DU.

11) We have then disregarded the possibility of triple failures, which would also give a system failure for the 2oo3 system.


Summary of simplified formulas for PFH

The simplified/approximate PFH formulas for some different voting logics are summarised in Table 9. The table includes the following:
i. In the first column, the voting logic (𝑀oo𝑁) is given;
ii. In the second column, the PFH contribution from common cause failures is given;
iii. In the third column, the contribution to PFH from independent failures is given.

Table 9: Simplified formulas for PFH (DU failures only; not accounting for demands as a test)

Voting | Contribution from CCF | Contribution from independent failures
1oo1 | – | 𝜆DU
1oo2 | 𝛽 ⋅ 𝜆DU | (𝜆DU ⋅ 𝜏)²/𝜏
1oo3 | C1oo3 ⋅ 𝛽 ⋅ 𝜆DU | (𝜆DU ⋅ 𝜏)³/𝜏
2oo2 | – | 2 ⋅ 𝜆DU
2oo3 | C2oo3 ⋅ 𝛽 ⋅ 𝜆DU | 3 ⋅ (𝜆DU ⋅ 𝜏)²/𝜏
3oo3 | – | 3 ⋅ 𝜆DU
𝑀oo𝑁 (𝑀 < 𝑁; 𝑁 = 2, 3, …) | C𝑀oo𝑁 ⋅ 𝛽 ⋅ 𝜆DU | 𝑁!/((𝑁 − 𝑀 + 1)! ⋅ (𝑀 − 1)!) ⋅ (𝜆DU ⋅ 𝜏)^(𝑁−𝑀+1)/𝜏
𝑁oo𝑁 (𝑁 = 1, 2, 3, …) | – | 𝑁 ⋅ 𝜆DU

Note that there is a close connection between the PFH formulas given above and the ones for PFD in Table 3. The results can be summarised as follows. For 𝑀oo𝑁 voting (𝑀 < 𝑁):

PFD𝑀oo𝑁(CCF) ≈ PFH𝑀oo𝑁(CCF) ⋅ 𝜏/2
PFD𝑀oo𝑁(ind.) ≈ PFH𝑀oo𝑁(ind.) ⋅ 𝜏/(𝑁 − 𝑀 + 2)

And for 𝑁oo𝑁:

PFD𝑁oo𝑁 ≈ PFH𝑁oo𝑁 ⋅ 𝜏/2

So, except for combinations of independent failures, we have (see section 6.3) that PFD ≈ PFH ⋅ 𝜏/2 (the “linear case”).
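A companion sketch to the PFD function given after Table 3 collects the Table 9 expressions (again with C𝑀oo𝑁 supplied by the caller; names and figures are ours):

```python
from math import factorial

def pfh_simplified(n, m, lam_du, tau, beta=0.0, c_moon=1.0):
    """Simplified PFH for an MooN voting, per Table 9 (DU failures only;
    demands not accounted for)."""
    if m == n:
        return n * lam_du  # NooN: tau does not enter the formula
    ccf = c_moon * beta * lam_du
    k = n - m + 1  # independent DU failures needed for a system failure
    ind = factorial(n) / (factorial(n - m + 1) * factorial(m - 1)) * (lam_du * tau) ** k / tau
    return ccf + ind

# 1oo2 example; note that PFD ≈ PFH * tau/2 holds for the CCF part:
print(pfh_simplified(2, 1, 1.0e-6, 8760, beta=0.02))  # ~2.9e-8 per hour
```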

6.4.3 Including MTTR and DD Failures in the PFH Formulas

As stated in the list of assumptions, the above formulas are valid when:


1. The time between diagnostic self-tests is “very short”, so that DD failures can be assumed to be detected “immediately”;
2. The MTTR is also very short (e.g., compared to 𝜏), and in practice we let MTTR = 0;
3. A DD failure causing a system failure will upon detection result in the system immediately going to the safe state. This assumption corresponds to the philosophy (see section 5.2.3) “degraded operation if possible; otherwise shutdown”. Note that this is the same operational philosophy as specified in IEC 61508 for calculating PFH (see section B.3.3 in part 6 of the standard).

When combining assumptions 2 and 3, the result is – as shown in the above simplified formulas – that the DD failures are ignored, and (for a 1oo1 voting) the PFH equals the rate of DU failures (a DD failure can never cause a system failure due to assumption 3, and degradation periods due to DD failures are ignored due to assumption 2). In this section we will relax some of these assumptions. Assumption 1, and initially also assumption 3, is maintained. Assumption 2 is however removed, as the MTTR is assumed sufficiently long to affect the unavailability (of channels) due to independent failures. As compared to the PFD/CSU (low demand) calculations, this corresponds to including the DTUR contribution in the unavailability measure.

Consider first a 1oo2 voting configuration. PFH still consists of two terms (Table 9): the CCF contribution resulting from DU failures, and the contribution from an independent DU failure in both channels. The second contribution is now changed:

• First one of the two channels fails, causing one channel to be unavailable. The mean duration of this unavailability is 𝜏/2 for a DU failure and MTTR for a DD failure.
• During this unavailability of one channel, there is a DU failure of the second channel.

This results in the following PFH:

PFH1oo2 = 𝛽 ⋅ 𝜆DU + 2 ⋅ (𝜆DU ⋅ 𝜏/2 + 𝜆DD ⋅ MTTR) ⋅ 𝜆DU

Observe that for MTTR = 0 this reduces to the expression given in Table 9. When we compare with the corresponding expression given in section B.3.3.2 of IEC 61508-6, the results are very similar. The standard introduces different 𝛽 values for DU and DD failures: (1 − 𝛽D) ∙ 𝜆DD is used for the rate of independent DD failures, and (1 − 𝛽) ⋅ 𝜆DU rather than 𝜆DU is used for the rate of independent DU failures (which is similar to the more accurate PFD formulas in Appendix C). Further, in the standard the mean unavailability period of a DU failure is given as 𝜏/2 + repair time, rather than 𝜏/2 as used above.

Following the same line of argument for 2oo3 and 1oo3 configurations we get:

PFH2oo3 = C2oo3 ⋅ 𝛽 ⋅ 𝜆DU + 6 ⋅ (𝜆DU ⋅ 𝜏/2 + 𝜆DD ⋅ MTTR) ⋅ 𝜆DU

and

PFH1oo3 = C1oo3 ⋅ 𝛽 ⋅ 𝜆DU + 6 ⋅ (𝜆DU ⋅ 𝜏/2 + 𝜆DD ⋅ MTTR) ⋅ (𝜆DU ⋅ 𝜏/3 + 𝜆DD ⋅ MTTR) ⋅ 𝜆DU.

Again, these formulas are comparable to the results of the IEC standard. It is worth observing that 𝜆DU is a common factor in all the PFH expressions given above, corresponding to the rate of the “final” DU failure that causes the system to fail. The general expression for an 𝑀oo𝑁 voting is relatively complex. However, for an 𝑁oo𝑁 voting we have, as before:


PFH𝑁oo𝑁 = 𝑁 ⋅ 𝜆DU.
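The 1oo2 expression with the DD/MTTR term can be sketched as follows; it reduces to the Table 9 formula for MTTR = 0 (function name and example figures are ours):

```python
def pfh_1oo2_with_dd(lam_du, lam_dd, mttr, tau, beta):
    """PFH for 1oo2 under 'degraded operation if possible; otherwise
    shutdown': CCF term plus the two ways of one channel (DU or DD
    failure) being down while the second channel gets a DU failure."""
    return beta * lam_du + 2 * (lam_du * tau / 2 + lam_dd * mttr) * lam_du

# Example: lam_DU = 1e-6/h, lam_DD = 4e-6/h, MTTR = 8 h, annual testing:
print(pfh_1oo2_with_dd(1e-6, 4e-6, 8, 8760, 0.02))  # ~2.9e-8 per hour
```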

Finally, consider a change to assumption 3 given above, and assume that for instance a DD failure of all channels will give a system failure (i.e., the system is not brought to a safe state). This means that the “last failure”, causing the system to fail, has rate 𝜆D = 𝜆DU + 𝜆DD rather than 𝜆DU. So the result for this operational philosophy, “no protection”, is given as:

PFH𝑀oo𝑁(no protection) = (𝜆D/𝜆DU) ⋅ PFH*𝑀oo𝑁

where PFH*𝑀oo𝑁 represents the expressions for the philosophy used above: “degraded operation if possible; otherwise shutdown”. Recently a formula including both DD and DU failures, but still having MTTR = 0, has been obtained. The result is, see /23/: 12)

PFH𝑀oo𝑁(no protection) ≈ C𝑀oo𝑁 ⋅ 𝛽 ⋅ (𝜆DU + 𝜆DD) + 𝑁!/((𝑁 − 𝑀 + 1)! ⋅ (𝑀 − 1)!) ⋅ (𝜆DU ⋅ 𝜏)^(𝑁−𝑀) ⋅ (𝜆DU + 𝜆DD).

So, referring to PFH𝑀oo𝑁 as the formula obtained from Table 9, it is easily seen that by inserting 𝜆D = 𝜆DU + 𝜆DD we again have the following simple relation:

PFH𝑀oo𝑁(no protection) = (𝜆D/𝜆DU) ⋅ PFH𝑀oo𝑁

Often 𝜆DD ≫ 𝜆DU, and for the operational philosophy “no protection” one should apply this PFH formula, i.e., include DD failures in the PFH calculations, irrespective of the value of 𝜏.

6.4.4 Failure data for high demand systems

The use of failure data for high demand systems is a difficult issue. Most of the SIS data from the process industry are based on low demand systems. Applying these data to systems with higher demand rates is not necessarily correct, since other failure mechanisms may be introduced. Further, failure data for continuously operated components (pneumatic valves, relays, contactors, position switches, etc.) may originally be derived from cyclic fatigue testing and the subsequent procedure described in ISO 13849 13) for deriving the mean time to failure (MTTF). Using these data for on demand systems may be a dubious exercise, since systems which are basically stand-by will experience different failure mechanisms than systems operating more or less continuously.

12) Actually, /23/ obtains a formula including even further terms, but the present formula gives the main terms.
13) EN-ISO 13849-1: “Safety of machinery – Safety related parts of control systems – Part 1: General principles for design”. In section C.4 of this document, a procedure for deriving MTTF figures based on results from cyclic fatigue testing is described.


7 CALCULATIONS FOR MULTIPLE SYSTEMS

7.1 Background

Often more than one SIS – or a combination of a SIS and other safety systems – is necessary in order to obtain a desired risk reduction. The safety systems may then work in sequence, where the additional systems are back-up systems that are activated once the preceding system fails. Such redundant safety systems are commonly referred to as safety layers (e.g., in the LOPA terminology), and the total protection system may be referred to as a multiple safety system or a multiple SIS. Normally, when having multiple safety systems, it is sufficient that one of the systems works successfully in order to have a successful safety action. In reliability terminology this means that the systems are voted 1oo𝑁, where 𝑁 is the number of redundant systems.

In many industries, such as the petroleum industry, two or possibly three safety layers are common. A simple example is given in Figure 8, where a possible solution for high level protection of a pressure vessel is indicated. Here, level protection of the vessel is implemented both through the PSD system and the ESD system.

[Figure 8 (schematic): a pressure vessel with two level transmitters (LT1, LT2), one connected to the PSD logic and one to the ESD logic, each logic tripping a pilot-operated valve (XV for PSD, ESV for ESD) and giving a signal out.]

Figure 8: Example of a multiple SIS – combining PSD and ESD level protection

When addressing the total reliability of several SISs combined, one often calculates the average PFD (PFDavg) of each SIS independently, and combines the results to find the total PFD of the multiple SIS. A standard approach, adopted by many tools and practitioners, is to simply take the product of the individual PFDavg values to find the total (average) PFD. This is appropriate as long as the PFDs of the systems are independent, but independence is rarely the case. Dependencies exist between the SISs, as well as between components within a SIS, due to e.g., simultaneous functional testing, close location or common utility sources (hydraulic, electricity, etc.). Dependent failures may be split into three categories: common cause failures (CCF), cascading failures and negative dependencies, ref. /15/. CCFs are discussed separately in chapter 4, whereas cascading failures and negative dependencies are not explicitly addressed in this handbook. In this chapter we primarily focus on the impact of the systemic dependencies introduced by periodic testing. We consider mainly the PFD arising from independent failures of components, but CCF are also treated to some extent. The calculation error made when simply multiplying average PFD values may be negligible or significant, depending on the case at hand. It may also be either conservative or non-conservative. In order to minimize

50 of 93

Reliability Prediction Method for Safety Instrumented Systems PDS Method Handbook, 2013 Edition

the error and ensure the robustness of the calculations, a correction factor (CF) is introduced to the product of average PFD values. For example, for a multiple SIS comprising two layers, the average PFD of the multiple SIS can be calculated as:

PFDavg = CF ∙ PFDavg(SIS1) ∙ PFDavg(SIS2)

The main purpose of this chapter is to discuss the appropriate use of correction factors (CFs) in various cases, and to suggest CFs for multiple SIS.
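As an illustration of this approach (the layer PFDs and the “global” CF of 4/3 – see Appendix D – are example figures only):

```python
def pfd_multiple_sis(cf, pfd_layers):
    """Average PFD of a multiple SIS: correction factor times the product
    of the individual layers' average PFDs."""
    total = 1.0
    for pfd in pfd_layers:
        total *= pfd
    return cf * total

# Two layers, e.g., PSD and ESD level protection:
print(pfd_multiple_sis(4/3, [1.0e-2, 5.0e-3]))  # 6.7e-05
```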

It should be pointed out that as the complexity of the modelling increases – due to e.g., the number of components, dependencies between components, specific repair strategies or alternative probability distributions – simplified formulas may reach their limit of application. It may then be appropriate to explore alternative modelling approaches, like Markov-driven fault trees and Petri net modelling. For further reading on this, reference is made to the new ISO TR 12489, /24/.

7.1.1 Time-Dependent PFD and Average PFD

In reliability calculations, and in this handbook, components are assumed to have a constant failure rate, denoted 𝜆. The time to failure then follows an exponential probability distribution with parameter 𝜆. The component unavailability at time 𝑡 is the probability of having failed at time 𝑡, and is called the probability of failure on demand (PFD) at time 𝑡. The PFD is an increasing function of time, with PFD(𝑡) = 1 − e^(−𝜆𝑡), which for small values of 𝜆𝑡 is approximately 𝜆𝑡. […]

[…] (𝑛 > 1), where several MCS will arise. This result can be used to calculate the PFD of the overall SIS quite accurately by the “cut set” method, without explicitly identifying a CF for the multiple SIS, as long as all the MCSs are known. The PFD contribution from an MCS is the product of the PFDs of the components in the MCS, multiplied by the CF, and the total PFD of the multiple SIS is the sum of all contributions, i.e.,

PFD = Σₖ₌₁ᴷ [2^(𝑛𝑘)/(𝑛𝑘 + 1)] ∙ ∏ᵢ₌₁^(𝑛𝑘) PFD𝑘𝑖

where 𝐾 is the number of MCSs, 𝑛𝑘 is the number of components in MCS 𝑘, and PFD𝑘𝑖 is the PFD of component 𝑖 in MCS 𝑘.

It should be noted that the “cut set” approach is rather advanced compared to the other approaches discussed in this chapter. The required calculations are somewhat tedious to perform by hand, but may easily be automated in a spreadsheet or a more dedicated computer tool.
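A sketch of such an automation of the “cut set” method (structure and names are ours):

```python
def pfd_cut_set(minimal_cut_sets):
    """Total PFD of a multiple SIS from its minimal cut sets: each MCS
    contributes CF * product of component PFDs, with CF = 2**n/(n + 1)
    for an MCS of n simultaneously tested components."""
    total = 0.0
    for pfds in minimal_cut_sets:
        n = len(pfds)
        product = 1.0
        for p in pfds:
            product *= p
        total += 2**n / (n + 1) * product
    return total

# Example: one double MCS (two independent channels) and one single
# MCS (a CCF element, always voted 1oo1):
print(pfd_cut_set([[1.0e-2, 2.0e-2], [1.0e-3]]))  # ~1.3e-3
```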

D.1.6 Discussion

Comparing the six approaches discussed in this handbook, we see that they all end up with very similar CFs when applied to a common example (Table D.6), except the very conservative “maximal order” approach.

Table D.6: Summary of possible approaches to determining the CF for a multiple SIS

# | Approach | SIS element structure | Element PFD contribution | CF (example)
1 | “Global” | Unknown/disregarded | Unknown/disregarded | 1.33
2 | “Maximal order” | Known | Unknown/disregarded | 2.0
3 | “Minimal order” | Known | Unknown/disregarded | 1.33
4 | “Dominant element” | Known | Known | 1.33
5 | “Structural average” | Known | Known | 1.36
6 | “Cut set” | Known | Known | 1.35

The similarity of the results is explained by the fact that in the given example each SIS is dominated by single component elements with respect to PFD contribution. Especially the CCF elements, which are always modelled as voted 1oo1, have a significant contribution. This dominance of single component elements makes each approach identical or very similar to the “global” approach. If the single component elements were less dominant, e.g., if CCF failures were not modelled, there would be a greater variation in the suggested CFs from the various approaches.


More than anything, the results show that the simple “global” approach will be adequate for the vast majority of conceivable multiple SIS calculations, and that there is seldom a need for adopting one of the more advanced, or overly conservative, approaches.

D.2 The effect of differences in testing

In order to be able to assess the need for correction factors for multiple SIS, we first need to study the effect of testing on the PFD estimates. In the following we consider test intervals of different phasing and/or different length. Furthermore, we consider simple SIS structures consisting of two components (in parallel), and generalize to three or more components when appropriate.

D.2.1 Different phasing of test intervals

We first consider the case with two components voted 1oo2 having failure rates 𝜆1 and 𝜆2, respectively, and the same test interval 𝜏, but where the testing is not done simultaneously. The situation is illustrated in Figure D.7.

[Figure D.7 (schematic): saw-tooth curves PFD1(𝑡) ≈ 𝜆1𝑡 and PFD2(𝑡), where PFD2(𝑡) ≈ 𝜆2(𝑡 + 𝜏 − 𝑎) for 𝑡 < 𝑎 and PFD2(𝑡) ≈ 𝜆2(𝑡 − 𝑎) for 𝑡 > 𝑎.]

Figure D.7: PFD(t) for two components with equal test intervals, but not tested simultaneously. The representative interval (0, τ] is shown in grey

Assuming that component 2 is tested at time 𝑎 inside the test interval of component 1, we have:

PFD(𝑎) = (1/𝜏) ∫₀^𝜏 PFD1(𝑡) ∙ PFD2(𝑡) d𝑡 = (1/𝜏) ∫₀^𝑎 𝜆1𝑡 ∙ 𝜆2(𝑡 + 𝜏 − 𝑎) d𝑡 + (1/𝜏) ∫𝑎^𝜏 𝜆1𝑡 ∙ 𝜆2(𝑡 − 𝑎) d𝑡
       = (4/3 − 2𝑎/𝜏 + 2𝑎²/𝜏²) ∙ 𝜆1𝜆2𝜏²/4 = (4/3 − 2𝑎/𝜏 + 2𝑎²/𝜏²) ∙ PFD1 ∙ PFD2

The expression in parentheses is the CF needed when calculating with average PFD values. PFD(𝑎) attains its maximum value

PFDmax = (4/3) ∙ PFD1 ∙ PFD2

when 𝑎 = 0 or 𝑎 = 𝜏, i.e., when the components are tested simultaneously. This corresponds to the well-known CF of 4/3 for two redundant components. Further, PFD(𝑎) attains its minimum value

PFDmin = (5/6) ∙ PFD1 ∙ PFD2

when 𝑎 = 𝜏/2, i.e., when component 2 is tested in the middle of the test interval of component 1.


Note that this minimum PFD is actually lower than the PFD obtained when simply multiplying the average PFD values of the components. This implies a CF less than 1, i.e., the uncorrected product of average PFD values is actually conservative. Compared to the case of simultaneous testing, we see a PFD reduction of 38 % in the case of “optimal” testing. Hence, there is a substantial potential for improvement in the total PFD if components are tested at different times. This is exploited in staggered testing, /15/, where components are not tested simultaneously, but tests are distributed as evenly as possible in time. Staggered testing is intended to reduce the impact of common cause failures, but has a significant positive effect on the PFD arising from independent failures as well. In our example, staggered testing would imply testing component 2 at 𝑡 = 𝜏/2, which yields the minimum PFD value for the system of two components. Despite the positive effect, staggered testing is often a desk exercise; in real operation such a test regime may prove unlikely due to practical considerations.

If we have no prior knowledge of 𝑎, i.e., if the two components are tested completely independently (this corresponds to selecting 𝑎 at random), the expected average PFD can be obtained by integrating PFD(𝑎): 15)

PFD = (1/𝜏) ∫₀^𝜏 PFD(𝑎) d𝑎 = 𝜆1𝜆2𝜏²/4 = PFD1 ∙ PFD2

We see that this result is equal to the PFD obtained when simply multiplying the average PFD values. Hence, with independent testing, there is no correlation between the PFD(𝑡) functions of the components, and the systemic dependencies vanish.

1oo3 voting

Next we consider the case with three components voted 1oo3 having the same test interval 𝜏, but where the testing is not done simultaneously. We assume that components 2 and 3 are tested at times 𝑎 and 𝑏 inside the test interval of component 1, such that 0 ≤ 𝑎 ≤ 𝑏 ≤ 𝜏. Analogous to the 1oo2 case, it can be shown that the PFD of this system of three components is:

PFD(𝑎, 𝑏) = (2 − 8(𝑎 + 𝑏)/(3𝜏) + 4𝑎(𝑎 + 𝑏)/𝜏² + (8(𝑎³ + 𝑏³) − 12𝑎𝑏(𝑎 + 𝑏))/(3𝜏³)) ∙ PFD1 ∙ PFD2 ∙ PFD3

Similar to the 1oo2 voting, this function attains its maximum value when the components are tested simultaneously (i.e., 𝑎 = 𝑏 = 0) and its minimum value when the tests are evenly distributed in time (i.e., 𝑎 = 𝜏/3 and 𝑏 = 2𝜏/3):

PFDmax = 2 ∙ PFD1 ∙ PFD2 ∙ PFD3
PFDmin = (2/3) ∙ PFD1 ∙ PFD2 ∙ PFD3

The lengthy expression in parentheses above is the required CF, and we have CF = 2 for PFDmax and CF = 2/3 for PFDmin. Also, in case of no prior knowledge of 𝑎 and 𝑏, i.e., if the three components are tested completely independently, the average PFD is again equal to the product of average PFD values:

PFD = (2/𝜏²) ∫₀^𝜏 ∫₀^𝑏 PFD(𝑎, 𝑏) d𝑎 d𝑏 = 𝜆1𝜆2𝜆3𝜏³/8 = PFD1 ∙ PFD2 ∙ PFD3

15) Note the difference in interpretation of PFD(𝑎) and PFD: PFD(𝑎) is the average PFD of the 1oo2 structure given that component 2 is tested at time 𝑎, while PFD is the average PFD of the 1oo2 structure when we have no information about 𝑎.


Generalization to 1oo𝑛 voting

Although it is not proven here, these results are generalizable to the 1oo𝑛 case, where 𝑛 redundant components have the same test interval 𝜏, but where the testing is not done simultaneously. This means that:

1. The PFD attains its maximum value when the components are tested simultaneously:
   PFDmax = (∏ᵢ₌₁ⁿ 𝜆ᵢ) ⋅ 𝜏ⁿ/(𝑛 + 1) = (2ⁿ/(𝑛 + 1)) ∙ ∏ᵢ₌₁ⁿ PFDᵢ
2. PFDmax is always higher than the product of average PFD values, corresponding to a CF = 2ⁿ/(𝑛 + 1). This implies that the product of average PFD values in this case is non-conservative.
3. The PFD attains its minimum value when the tests are distributed evenly in time, i.e., at times 𝑡ₖ = 𝑘𝜏/𝑛, 𝑘 = 1, …, 𝑛.
4. PFDmin is always lower than the product of average PFD values, corresponding to a CF < 1. This implies that the product of average PFD values in this case is conservative.
5. The average PFD with independent testing is equal to the product of average PFD values, so that no CF is necessary:
   PFD = ∏ᵢ₌₁ⁿ (𝜆ᵢ𝜏/2) = ∏ᵢ₌₁ⁿ PFDᵢ
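The 1oo2 phasing result is easily checked numerically (function name is ours):

```python
def cf_1oo2_phasing(a, tau):
    """CF for two 1oo2-voted components with equal test interval tau,
    component 2 tested at offset a into component 1's interval."""
    return 4/3 - 2*(a/tau) + 2*(a/tau)**2

print(cf_1oo2_phasing(0.0, 1.0))  # simultaneous testing: 1.333... (= 4/3)
print(cf_1oo2_phasing(0.5, 1.0))  # staggered testing:    0.833... (= 5/6)
```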

D.2.2 Different length of test intervals

We consider the case with two components voted 1oo2 with failure rates 𝜆1 and 𝜆2, respectively, and test intervals 𝜏1 and 𝜏2, respectively. We first assume that the test interval of component 2 is a multiple of the interval of component 1, i.e., 𝜏2 = 𝑛𝜏1, and that the testing is “quasi-simultaneous” in the sense that whenever component 2 is tested, component 1 is tested also. The situation is illustrated in Figure D.8 with 𝑛 = 4.

[Figure D.8 (schematic): PFD1(𝑡) restarts at each of its tests at 𝜏1, 2𝜏1, 3𝜏1, …, while PFD2(𝑡) grows over the whole interval 𝜏2 = 4𝜏1.]

Figure D.8: PFD(t) for two components with different test intervals and failure rates. The case of τ2 = 4τ1 is shown

The overall PFD for this system is:


PFD = (1/(𝑛𝜏)) ∫₀^(𝑛𝜏) PFD1(𝑡) ∙ PFD2(𝑡) d𝑡 = (1/(𝑛𝜏)) Σₖ₌₁ⁿ ∫₍ₖ₋₁₎𝜏^(𝑘𝜏) 𝜆1(𝑡 − (𝑘 − 1)𝜏) ∙ 𝜆2𝑡 d𝑡 = ((3𝑛 + 1)/(3𝑛)) ∙ PFD1 ∙ PFD2    (D.2)

If there is a difference in phasing of the tests, we know from the discussion above that this will reduce the PFD. Assuming that component 2 is tested at time 𝑎 inside the test interval of component 1, it can be shown that:

PFD(𝑎) = ((3𝑛 + 1)/(3𝑛) − 2𝑎/(𝑛𝜏) + 2𝑎²/(𝑛𝜏²)) ∙ PFD1 ∙ PFD2

Mirroring the results in section D.2.1, this function attains its maximum value (with CF > 1) when the components are tested simultaneously and its minimum value (with CF < 1) when 𝑎 = 𝜏/2:

PFDmax = ((3𝑛 + 1)/(3𝑛)) ∙ PFD1 ∙ PFD2
PFDmin = ((6𝑛 − 1)/(6𝑛)) ∙ PFD1 ∙ PFD2

Also, in case of no prior knowledge of 𝑎, i.e., if the two components are tested completely independently, the average PFD is equal to the product of average PFD values, and no CF is needed:

PFD = (1/𝜏) ∫₀^𝜏 PFD(𝑎) d𝑎 = PFD1 ∙ PFD2

We see that as 𝑛 grows, the effect of different phasing diminishes rapidly. For example, for two components tested quarterly and yearly, respectively (𝑛 = 4), calculating with individual average PFD values will require a CF of 1.08. In other words: omitting the CF in the calculation underestimates the actual PFD by a mere 8 %. Formula (D.2) may therefore be used as a fairly good – and conservative – approximation also in cases of different phasing. Furthermore, the formula applies for integer 𝑛, but can also be used as an approximation in cases where 𝜏2 is not an exact multiple of 𝜏1.
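Formula (D.2) is equally easy to evaluate (function name is ours):

```python
def cf_quasi_simultaneous(n):
    """CF for a 1oo2 voting with tau2 = n * tau1 and quasi-simultaneous
    testing, formula (D.2)."""
    return (3*n + 1) / (3*n)

print(cf_quasi_simultaneous(4))  # quarterly vs. yearly testing: ~1.08
```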


APPENDIX E: PFD VERSUS PFH AND THE EFFECT OF DEMANDS

This appendix provides some of the background for the main results presented in chapter 6. In addition to the standard parameters, we now introduce the following notation:

𝑋 = Number of demands in a test interval (0, 𝜏]. 𝑋 is Poisson distributed with parameter 𝛿 ∙ 𝜏, where
𝛿 = Constant rate of demands which serve as a functional test.
𝑍 = Number of hazards in a test interval (0, 𝜏], i.e., the number of demands not being “prevented” by the SIS, due to a DU failure of the SIS.
𝑌1, 𝑌2, … , 𝑌𝑥 = Occurrences of the demands in a test interval, given 𝑋 = 𝑥, where 0 < 𝑌1 < 𝑌2 < ⋯ < 𝑌𝑥 < 𝜏.

Regarding PFD, PFH and HR we use the following notation:

PFD(𝑡) = Instantaneous PFD at time 𝑡, 0 < 𝑡 ≤ 𝜏; the general case, also accounting for demands.
PFD = Average PFD over the test interval (0, 𝜏]; i.e., the general case when demands are accounted for.
PFD𝑥 = Average PFD, when demands are accounted for, given that 𝑋 = 𝑥 in the test interval (0, 𝜏].
PFD0 = “Traditional” PFD. Average PFD over the test interval, not accounting for demands (i.e., 𝑋 = 0).
PFD0(𝑡) = Instantaneous PFD at time 𝑡, not accounting for demands, i.e., 𝑋 = 0.
PFH0(𝑡) = Rate of SIS failures at time 𝑡, not accounting for demands, i.e., 𝑋 = 0.
PFH0 = Average rate of SIS failures, given 𝑋 = 0.
PFH(𝑡) = Rate of SIS failures at time 𝑡, 0 < 𝑡 ≤ 𝜏; the general case, also accounting for demands.
PFH = Average rate of SIS failures; the general case, accounting for demands.
HR = Hazard rate, i.e., the rate of demands not prevented by the SIS and thus giving a hazardous event (HR = E(𝑍)/𝜏).

Here it is assumed that we have an on demand system where it is the DU failures that make the essential contributions to loss of safety (this is often a fair assumption, since when D failures are detected “immediately”, safety precautions may be taken to avoid hazards). Further, a constant rate of DU failures is assumed, and we make the standard simplifying assumption PFD0(𝑡) = 1 − e^(−𝜆DU⋅𝑡) ≈ 𝜆DU ⋅ 𝑡 (and e.g., PFD0 = 𝜆DU ⋅ 𝜏/2). The system is assumed to be as good as new after each functional test, and the demands included here are also considered to give a “perfect test”. It should be noted that if a demand does not always serve as a functional test of the system, 𝛿 should be defined as the rate of demands which serve as a functional test (this might depend on the voting configuration), and thus 𝑋 is the number of those demands.

First it is observed that the occurrence of demands essentially does not affect the SIS failure rate PFH. Unless we account for combinations of independent failures, the PFH is independent of 𝛿 and equal to PFH0. Further, assuming a constant PFH (= PFH0), we get PFD0 = PFH0 ⋅ 𝜏/2. This is the relation between PFD and PFH when demands are not accounted for (i.e., 𝑋 = 0).


The objective now is to derive expressions for the PFD, PFH (and HR), also taking the effect of demands into account. We present the results for the case that PFD(𝑡) is a step-wise linear function in 𝑡 (the “linear case”), corresponding to the time-dependent PFH being constant. This means that we restrict attention to either 𝑁oo𝑁 configurations or, for 𝑀oo𝑁 configurations (𝑀 < 𝑁), to CCF only (i.e., PFH0 = C𝑀oo𝑁 ⋅ 𝛽 ⋅ 𝜆DU). This is a standard assumption and not considered a serious limitation, and we skip discussing the contribution of independent DU failures. The approach chosen is to derive the expressions for the basic parameters (such as HR) by conditioning on the values of 𝑋 and 𝑌1, 𝑌2, … , 𝑌𝑋. The approach follows the line of probabilistic arguments found e.g., in /15/. First we give the following basic probabilistic result:

Lemma. Given 𝑋 = 𝑥, the distribution of 𝑌1, 𝑌2, … , 𝑌𝑋 will be that of the order statistics of a uniform distribution over (0, 𝜏]. That is, the joint probability density function (pdf) of 𝑌1, 𝑌2, … , 𝑌𝑋 given 𝑋 = 𝑥 equals

𝑓𝑌(𝑦1, 𝑦2, … , 𝑦𝑥) = 𝑥!/𝜏^𝑥 ;  0 < 𝑦1 < 𝑦2 < ⋯ < 𝑦𝑥 < 𝜏.

Proof. Let 𝑋 = 2. Then,

P(𝑌1 ≤ 𝑦1, 𝑌2 ≤ 𝑦2 | 𝑋 = 2) = P(𝑌1 ≤ 𝑦1, 𝑌2 ≤ 𝑦2 ∩ 𝑋 = 2)/P(𝑋 = 2).

According to the Poisson distribution, P(𝑋 = 2) = ((𝛿𝜏)²/2!) ⋅ e^(−𝛿𝜏). Further, P(𝑌1 ≤ 𝑦1, 𝑌2 ≤ 𝑦2 ∩ 𝑋 = 2) is the probability of the event that
• in the interval (0, 𝑦1] there is exactly one demand;
• in the interval (𝑦1, 𝑦2] there is exactly one demand;
• in the interval (𝑦2, 𝜏] there is no demand.

It follows that

P(𝑌1 ≤ 𝑦1, 𝑌2 ≤ 𝑦2 ∩ 𝑋 = 2) = 𝛿𝑦1 e^(−𝛿𝑦1) ⋅ 𝛿(𝑦2 − 𝑦1) e^(−𝛿(𝑦2−𝑦1)) ⋅ e^(−𝛿(𝜏−𝑦2)).

And so the conditional cumulative distribution function equals

P(𝑌1 ≤ 𝑦1, 𝑌2 ≤ 𝑦2 | 𝑋 = 2) = 𝑦1 ⋅ (𝑦2 − 𝑦1)/(𝜏²/2) ;  0 < 𝑦1 < 𝑦2 < 𝜏,

giving the conditional pdf

𝑓𝑌(𝑦1, 𝑦2) = 2/𝜏² ;  0 < 𝑦1 < 𝑦2 < 𝜏.

This proves the result for 𝑋 = 2. The general result is proved similarly. ∎

A hazard is defined as a demand occurring when the SIS is in a failed state (not responding properly to the demand). The hazard rate, HR, will be given as E(𝑍)/𝜏, and so we first find E(𝑍). Given 𝑋 = 𝑥 demands, the number of hazards in (0, 𝜏] can be written as


𝑍 = 𝐼1 + 𝐼2 + ⋯ + 𝐼𝑥,

where 𝐼𝑘 equals 1 if demand no. 𝑘 results in a hazard, that is, if the SIS has a DU failure at that demand (occurring at time 𝑌𝑘), and 0 otherwise. So P(𝐼𝑘 = 1) = E(𝐼𝑘) = the probability that demand no. 𝑘 meets a failed SIS. First consider the conditional case that both 𝑋 and 𝑌1, 𝑌2, … , 𝑌𝑋 are given. In the conditional case (Figure E.1):

P(𝐼𝑘 = 1) = 𝜆DU ⋅ (𝑦𝑘 − 𝑦𝑘−1), 𝑘 = 2, 3, … , 𝑥, and P(𝐼1 = 1) = 𝜆DU ⋅ 𝑦1

(writing out the result just for a single SIS component). It follows by a simple calculation that, conditionally, given 𝑋 = 𝑥 and (𝑌1, 𝑌2, … , 𝑌𝑥) = (𝑦1, 𝑦2, … , 𝑦𝑥):

E(𝑍) = E(𝐼1 + 𝐼2 + ⋯ + 𝐼𝑥) = 𝜆DU ⋅ 𝑦𝑥.

[Figure E.1 (schematic): saw-tooth PFD(𝑡) over (0, 𝜏], restarting at each demand time 𝑦1, 𝑦2, … , 𝑦𝑥, with peak values 𝜆DU𝑦1, 𝜆DU(𝑦2 − 𝑦1), … , 𝜆DU(𝑦𝑥 − 𝑦𝑥−1), 𝜆DU(𝜏 − 𝑦𝑥).]

Figure E.1: PFD(t) for a single component, given X = x demands, and given the instances (yk) of demand (“linear case”, DU failures only)

By integrating this over the joint pdf of 𝑌1, 𝑌2, … , 𝑌𝑥 (given 𝑋 = 𝑥), we obtain the conditional value of the mean number of hazards, E(𝑍), in an interval with 𝑋 = 𝑥 demands:

E(𝑍 | 𝑋 = 𝑥) = 𝑥 ⋅ 𝜆DU𝜏/(𝑥 + 1)  (= 𝜆DU𝜏/(1 + 𝑥⁻¹) for 𝑥 > 0).

For 𝑥 = 0 we get the obvious result E(𝑍 | 𝑋 = 0) = 0 (there can be no hazard in an interval with no demand). Further, E(𝑍 | 𝑋 = 1) = 𝜆DU𝜏/2, which is also as expected: the single demand occurs randomly in the interval, and the average probability that the demand “meets” a failed system equals the average PFD0 = 𝜆DU𝜏/2. However, as the number of demands, 𝑥, increases, the mean number of hazards, E(𝑍 | 𝑋 = 𝑥), also increases (approaching 𝜆DU𝜏 = 2 ⋅ PFD0).

It remains to find the unconditional value of E(𝑍). Since 𝑋 has a Poisson distribution with mean 𝛿 ∙ 𝜏, it can be derived that

E(𝑍) = Σ𝑥 E(𝑍 | 𝑋 = 𝑥) ⋅ P(𝑋 = 𝑥) = 𝜆DU(e^(−𝛿𝜏) − 1 + 𝛿𝜏)/𝛿,

and so

HR = E(𝑍)/𝜏 = 𝜆DU(e^(−𝛿𝜏) − 1 + 𝛿𝜏)/(𝛿𝜏).

This rather fundamental result shows how the HR depends on all the parameters 𝜆DU, 𝜏 and 𝛿. It is valid for a SIS with a constant failure rate, here denoted 𝜆DU (i.e., a 1oo1 configuration). However, if we


restrict attention to CCF – which is usually the main contribution to loss of safety – the SIS failure rate will actually be constant also for 𝑀oo𝑁, i.e., PFH0 = C𝑀oo𝑁 ⋅ 𝛽 ⋅ 𝜆DU for 𝑀 < 𝑁, provided each component has a constant failure rate and only CCF is included in the calculations (or we have an 𝑁oo𝑁 configuration with PFH0 = 𝑁 ⋅ 𝜆DU). We then have the more general result

HR = PFH0 ⋅ (e^(−𝛿𝜏) − 1 + 𝛿𝜏)/(𝛿𝜏).

So the fundamental parameter HR is now given as the product of PFH0 and a factor which is entirely determined by 𝛿𝜏 (= the mean number of demands in the test interval). Actually, by expanding e^(−𝛿𝜏), we get (e^(−𝛿𝜏) − 1 + 𝛿𝜏)/(𝛿𝜏) ≈ 𝛿𝜏/2 − (𝛿𝜏)²/6 ≈ 𝛿𝜏/2 (for “small” 𝛿𝜏). It follows that

HR ≈ PFH0 ⋅ 𝛿𝜏/2  (small 𝛿𝜏).

Next, we investigate the expression for PFD when demands are accounted for. By following the same procedure as above, first find the time-dependent PFD, i.e., PFD(𝑡), given 𝑋 = 𝑥 and (𝑌1, 𝑌2, … , 𝑌𝑥) = (𝑦1, 𝑦2, … , 𝑦𝑥). For a 1oo1 system, PFD(𝑡) increases linearly from 0 to 𝜆DU𝑦1, where it drops to 0 and then again increases linearly to 𝜆DU(𝑦2 − 𝑦1) at time 𝑦2, etc.; see Figure E.1. We can then derive the conditional PFD, given 𝑋 = 𝑥:

PFD𝑥 = 𝜆DU ⋅ 𝜏/(𝑥 + 2),

showing how PFD decreases as 𝑥 increases (due to the “testing” performed by the demands). Of course we get the special case PFD0 = 𝜆DU ⋅ 𝜏/2. From PFD𝑥 we can now derive the overall average PFD = Σ𝑥 PFD𝑥 ⋅ P(𝑋 = 𝑥). However, the overall PFD is most easily obtained by utilizing that the HR at time 𝑡 equals HR(𝑡) = PFD(𝑡) ⋅ 𝛿; integrating over (0, 𝜏] then gives HR = PFD ∙ 𝛿.

From the above result on HR it then directly follows that

PFD = 𝜆DU(e^(−𝛿𝜏) − 1 + 𝛿𝜏)/(𝛿²𝜏).

Again, we may replace 𝜆DU ⋅ 𝜏 by using PFD0 = 𝜆DU ⋅ 𝜏/2, and the average PFD becomes

PFD = PFD0 ⋅ 2(e^(−𝛿𝜏) − 1 + 𝛿𝜏)/(𝛿𝜏)²

or

PFD = PFD0 ⋅ 𝑓(𝛿𝜏),

where we define 𝑓(𝑥) = 2(e^(−𝑥) − 1 + 𝑥)/𝑥².

Here it can be proved that 𝑓(𝑥) → 1 as 𝑥 → 0, which assures that PFD approaches PFD0 as 𝛿 → 0.

By performing an expansion of e^(−𝑥), we get the approximation 𝑓(𝑥) ≈ 1 − 𝑥/3 for small 𝑥. Then, the PFD becomes:

PFD ≈ PFD0 ⋅ (1 − 𝛿𝜏/3)  (small 𝛿𝜏).


For “small” 𝛿𝜏 this will give a simple relation between PFD and the “traditional” PFD0 . Note that the exact formula is obtained by multiplying the “traditional” PFD0 with a simple factor which depends on 𝛿𝜏.

Summary of main results

Note that the above results are valid in general for the “linear case”, i.e., PFH0 being constant:

PFH0 ≈ 𝑁 ⋅ 𝜆DU for the 𝑁oo𝑁 configuration, and
PFH0 ≈ C𝑀oo𝑁 ⋅ 𝛽 ⋅ 𝜆DU for the 𝑀oo𝑁 configuration, 𝑀 < 𝑁,

an approximation being valid when combinations of independent failures are not taken into consideration. It is seen that this neither depends on the length of the test interval, 𝜏, nor on the demand rate, 𝛿. So this approximation, which usually represents the major contribution to PFH0, captures the effect of neither 𝜏 nor 𝛿.

Further, by introducing 𝑓(𝑥) = 2(e^(−𝑥) − 1 + 𝑥)/𝑥² we have the following relations between the three measures PFH0, PFD and HR (with PFH = PFH0):

PFD = PFD0 ⋅ 𝑓(𝛿𝜏) = PFH0 ∙ 𝑓(𝛿𝜏) ⋅ 𝜏/2
HR = PFD ∙ 𝛿 = PFD0 ∙ 𝛿 ⋅ 𝑓(𝛿𝜏)
HR = PFH0 ∙ 𝑓(𝛿𝜏) ∙ 𝛿𝜏/2

In particular, HR is the product of the constant PFH0 and a function of the mean number of demands in the test interval, 𝛿𝜏. These results generalize the standard results, which disregard the effect of demands. However, note that the relations are given under the assumption that PFH remains constant and equal to PFH0 throughout the test interval (which is usually a fair approximation).

Some numerical results

Table E.1 indicates the effect of the demands on PFD, by giving the “modification factor” 𝑓(𝛿𝜏) = PFD/PFD0 for some values of 𝛿𝜏. Note that the interpretation of 𝛿𝜏 is the mean number of demands (acting as a test) in one test interval. As stated above, the modification factor approaches 1 as 𝛿𝜏 approaches 0. If, for instance, there are five demands per test interval (acting as tests), PFD is seen to be reduced to about one third of its value when there are no demands.

Table E.1: Values of “modification factor” = PFD/PFD0 as a function of δτ. Last line: HR/PFD0 (with τ = 1)

𝛿𝜏 = mean number of demands per test interval | 0.1 | 0.2 | 0.5 | 1 | 2 | 5 | 10
𝑓(𝛿𝜏) = PFD/PFD0 | 0.97 | 0.94 | 0.85 | 0.74 | 0.57 | 0.32 | 0.18
𝛿 ∙ 𝑓(𝛿𝜏) = HR/PFD0 (𝜏 = 1) | 0.10 | 0.19 | 0.43 | 0.74 | 1.1 | 1.6 | 1.8


negligible. But as 𝛿 increases (above 1), HR/PFD0 becomes markedly smaller than 𝛿 (approaching 2 as 𝛿 → ∞). So no harm is done by calculating PFH0 as long as the result is afterwards used to calculate HR. PFD (and in particular HR = 𝛿 ∙ PFD) is well suited to describe how loss-of-safety depends on both 𝜏 and 𝛿. It is highly recommended to calculate HR. However, PFD is also a measure reflecting both 𝜏 and 𝛿, and there is little doubt that PFD is a better loss-of-safety parameter than PFH0 (when dormant failures are the main issue). When demands actually serve as functional tests, it is recommended that the expressions for HR and PFD referred to above are used to determine whether the SIS has an acceptable reliability (i.e., whether the loss-of-safety is sufficiently low).

It is observed that PFD depends on the demands only through 𝛿𝜏 = the mean number of demands in a test interval. We take this as an indication that the previous IEC 61508 definition of low demand mode in /23/ is more sensible than the new one, which focuses entirely on the number of demands in one year (as one year also seems a rather arbitrary time unit).
