Resilience engineering is a subfield of safety science research that focuses on understanding how complex adaptive systems cope when encountering a surprise. The term resilience in this context refers to the capabilities that a system must possess in order to deal effectively with unanticipated events. Resilience engineering examines how systems build, sustain, degrade, and lose these capabilities.[1]
Resilience engineering researchers have studied multiple safety-critical domains, including aviation, anesthesia, fire safety, space mission control, military operations, power plants, air traffic control, rail engineering, health care, and emergency response to both natural and industrial disasters.[2][3] Resilience engineering researchers have also studied the non-safety-critical domain of software operations.[4]
Whereas other approaches to safety (e.g., behavior-based safety, probabilistic risk assessment) focus on designing controls to prevent or mitigate specific known hazards (e.g., hazard analysis), or on assuring that a particular system is safe (e.g., safety cases), resilience engineering examines a system's more general capability to deal with hazards that were unknown until they were encountered.
In particular, resilience engineering researchers study how people cope effectively with complexity to ensure safe system operation, especially when they are under time pressure.[5] Under the resilience engineering paradigm, accidents are not attributable to human error. Instead, the assumption is that humans working in a system always face goal conflicts and limited resources, requiring them to make trade-offs constantly while under time pressure. When failures happen, they are understood as the system temporarily being unable to cope with complexity.[6] Hence, resilience engineering is related to other perspectives in safety that have reassessed the nature of human error, such as the "new look",[7] the "new view",[8] "safety differently",[9] and Safety-II.[10]
Resilience engineering researchers ask how systems build and sustain the capabilities needed to cope with surprise, and how those capabilities degrade and are lost. Because incidents often involve unforeseen challenges, resilience engineering researchers frequently use incident analysis as a research method.
The first symposium on resilience engineering was held in October 2004 in Söderköping, Sweden. It brought together fourteen safety science researchers with an interest in complex systems.
A second symposium on resilience engineering was held in November 2006 in Sophia Antipolis, France.[11] The symposium had eighty participants.[12] The Resilience Engineering Association, an association of researchers and practitioners with an interest in resilience engineering, continues to hold biennial symposia.[13]
These symposia led to a series of books being published (see Books section below).
This section discusses aspects of the resilience engineering perspective that are different from traditional approaches to safety.
The resilience engineering perspective assumes that the nature of the work people do within a system that contributes to an accident is fundamentally the same as the work that contributes to successful outcomes. As a consequence, if work practices are examined only after an accident and interpreted only in the context of that accident, the resulting analysis is subject to selection bias.[14]
The resilience engineering perspective posits that a significant number of failure modes are literally inconceivable before they happen, because the environments that systems operate in are highly dynamic and the perspectives of the people within a system are inherently limited. These sorts of events are sometimes referred to as fundamental surprise. This contrasts with the approach of probabilistic risk assessment, which focuses on evaluating conceivable risks.
The resilience engineering perspective holds that human performance variability has positive effects as well as negative ones, and that safety is increased by amplifying the positive effects of human variability as well as by adding controls to mitigate the negative effects. For example, the ability of humans to adapt their behavior to novel circumstances is a positive effect that creates safety. As a consequence, adding controls to mitigate the effects of human variability can, in certain circumstances, reduce safety.[15]
Expert operators are an important source of resilience within systems. These operators become experts through previous experience dealing with failures.[16]
Under the resilience engineering perspective, operators are always required to trade off risks. As a consequence, in order to create safety, it is sometimes necessary for a system to take on some risk.
The researcher Richard Cook distinguishes two separate kinds of work that tend to be conflated under the heading of resilience engineering:
The first type of resilience engineering work is determining how best to take advantage of the resilience already present in the system. Cook uses setting a broken bone as an example of this type of work: the resilience is already present in the physiology of bone, and setting the bone uses this resilience to achieve better healing outcomes.
Cook notes that this first type of resilience work does not require a deep understanding of the underlying mechanisms of resilience: humans were setting bones long before the mechanism by which bone heals was understood.
The second type of resilience engineering work involves altering mechanisms in the system in order to increase the amount of resilience. Cook uses the example of new drugs such as abaloparatide and teriparatide, which mimic parathyroid hormone-related protein and are used to treat osteoporosis.
Cook notes that this second type of resilience work requires a much deeper understanding of the underlying existing resilience mechanisms in order to create interventions that can effectively increase resilience.
The safety researcher Erik Hollnagel views resilient performance as requiring four systemic potentials: the potential to respond, the potential to monitor, the potential to learn, and the potential to anticipate.
These four potentials are described in a white paper from Eurocontrol on Systemic Potentials Management: https://skybrary.aero/bookshelf/systemic-potentials-management-building-basis-resilient-performance
The safety researcher David Woods includes two concepts in his definition of resilience: graceful extensibility, the ability of a system to extend its performance when surprise events challenge its boundaries, and sustained adaptability, the ability of a system to keep adapting over longer timescales as its environment changes.[17]
These two concepts are elaborated in Woods's theory of graceful extensibility.
Woods contrasts resilience with robustness, which is the ability of a system to deal effectively with potential challenges that were anticipated in advance.
The safety researcher Richard Cook argued that bone should serve as the archetype for understanding what resilience means in the Woods perspective. Cook notes that bone exhibits both graceful extensibility (it has a soft boundary at which it can extend function) and sustained adaptability (it constantly adapts through a dynamic balance between creation and destruction, directed by mechanical strain).
In Woods's view, there are three common patterns in how complex adaptive systems fail: decompensation, in which a system exhausts its capacity to adapt as disturbances cascade; working at cross-purposes, in which behavior that is adaptive locally is maladaptive at the system level; and getting stuck in outdated behaviors, in which a system continues to rely on adaptations that worked in the past but no longer fit its current environment.
In 2012, growing interest in resilience engineering gave rise to the subfield of resilient health care. This led to a series of annual conferences on the topic, which are still ongoing, as well as a series of books on Resilient Health Care, and in 2022 to the establishment of the Resilient Health Care Society (registered in Sweden; https://rhcs.se/).