Resilience engineering is a subfield of safety science research that focuses on understanding how complex adaptive systems cope when encountering a surprise. The term resilience in this context refers to the capabilities that a system must possess in order to deal effectively with unanticipated events. Resilience engineering examines how systems build, sustain, degrade, and lose these capabilities.[1]
Resilience engineering researchers have studied multiple safety-critical domains, including aviation, anesthesia, fire safety, space mission control, military operations, power plants, air traffic control, rail engineering, health care, and emergency response to both natural and industrial disasters.[2][3] Resilience engineering researchers have also studied the non-safety-critical domain of software operations.[4]
Whereas other approaches to safety (e.g., behavior-based safety, probabilistic risk assessment) focus on designing controls to prevent or mitigate specific known hazards (e.g., hazard analysis), or on assuring that a particular system is safe (e.g., safety cases), resilience engineering examines a system's more general capability to deal with hazards that were unknown until they were encountered.
In particular, resilience engineering researchers study how people cope effectively with complexity to ensure safe system operation, especially when they are under time pressure.[5] Under the resilience engineering paradigm, accidents are not attributable to human error. Instead, the assumption is that humans working in a system always face goal conflicts and limited resources, requiring them to make trade-offs constantly while under time pressure. When failures happen, they are understood as the system temporarily being unable to cope with complexity.[6] Hence, resilience engineering is related to other perspectives in safety that have reassessed the nature of human error, such as the "new look",[7] the "new view",[8] "safety differently",[9] and Safety-II.[10]
Resilience engineering researchers ask how systems build and sustain the capabilities needed to cope with surprise, and how those capabilities degrade and are lost. Because incidents often involve unforeseen challenges, resilience engineering researchers frequently use incident analysis as a research method.
The first symposium on resilience engineering was held in October 2004 in Söderköping, Sweden. It brought together fourteen safety science researchers with an interest in complex systems.
A second symposium on resilience engineering was held in November 2006 in Sophia Antipolis, France.[11] The symposium had eighty participants.[12] The Resilience Engineering Association, an association of researchers and practitioners with an interest in resilience engineering, continues to hold biennial symposia.[13]
These symposia led to a series of books being published (see Books section below).
This section discusses aspects of the resilience engineering perspective that are different from traditional approaches to safety.
The resilience engineering perspective assumes that the nature of the work people do within a system that contributes to an accident is fundamentally the same as the work that contributes to successful outcomes. As a consequence, if work practices are examined only after an accident and interpreted only in the context of that accident, the resulting analysis is subject to selection bias.[14]
The resilience engineering perspective posits that a significant number of failure modes are literally inconceivable before they happen, because the environments that systems operate in are highly dynamic and the perspectives of the people within a system are inherently limited. These sorts of events are sometimes referred to as fundamental surprise. This contrasts with the approach of probabilistic risk assessment, which focuses on evaluating conceivable risks.
The resilience engineering perspective holds that human performance variability has positive effects as well as negative ones, and that safety is increased by amplifying the positive effects of human variability as well as by adding controls to mitigate the negative effects. For example, the ability of humans to adapt their behavior to novel circumstances is a positive effect that creates safety. As a consequence, adding controls to mitigate the effects of human variability can, in certain circumstances, reduce safety.[15]
Expert operators are an important source of resilience within systems. These operators become experts through previous experience dealing with failures.[16]
Under the resilience engineering perspective, operators are always required to trade off risks. As a consequence, in order to create safety, it is sometimes necessary for a system to take on some risk.
The researcher Richard Cook distinguishes two separate kinds of work that tend to be conflated under the heading of resilience engineering:
The first type of resilience engineering work is determining how best to take advantage of the resilience already present in the system. Cook uses setting a broken bone as an example of this type of work: the resilience is already present in the physiology of bone, and setting the bone uses this resilience to achieve better healing outcomes.
Cook notes that this first type of resilience work does not require a deep understanding of the underlying mechanisms of resilience: humans were setting bones long before the mechanism by which bone heals was understood.
The second type of resilience engineering work involves altering mechanisms in the system in order to increase the amount of resilience. Cook uses the example of new drugs such as abaloparatide and teriparatide, which mimic parathyroid hormone-related protein and are used to treat osteoporosis.
Cook notes that this second type of resilience work requires a much deeper understanding of the underlying existing resilience mechanisms in order to create interventions that can effectively increase resilience.
The safety researcher Erik Hollnagel views resilient performance as requiring four systemic potentials: the potential to respond, the potential to monitor, the potential to learn, and the potential to anticipate.
These four potentials are described in a white paper from Eurocontrol on Systemic Potentials Management: https://skybrary.aero/bookshelf/systemic-potentials-management-building-basis-resilient-performance
The safety researcher David Woods includes two concepts in his definition of resilience: graceful extensibility, the ability of a system to extend its performance when surprise events challenge its boundaries, and sustained adaptability, the ability of a system to keep adapting over longer timescales as its environment changes.[17]
These two concepts are elaborated in Woods's theory of graceful extensibility.
Woods contrasts resilience with robustness, which is the ability of a system to deal effectively with potential challenges that were anticipated in advance.
The safety researcher Richard Cook argued that bone should serve as the archetype for understanding what resilience means in the Woods perspective. Cook notes that bone exhibits both graceful extensibility (it has a soft boundary at which it can extend function) and sustained adaptability (it constantly adapts through a dynamic balance between creation and destruction, directed by mechanical strain).
In Woods's view, there are three common patterns in how complex adaptive systems fail: decompensation, in which a system exhausts its capacity to adapt as disturbances cascade; working at cross-purposes, in which behavior that is adaptive locally is maladaptive at the system level; and getting stuck in outdated behaviors, in which a system continues to rely on adaptations that worked in the past but no longer fit its current environment.
In 2012, growing interest in resilience engineering gave rise to the subfield of resilient health care. This led to a series of annual conferences on the topic, which are still ongoing, as well as a series of books on Resilient Health Care, and in 2022 to the establishment of the Resilient Health Care Society (registered in Sweden; https://rhcs.se/).