Test oracle explained

In software testing, a test oracle (or just oracle) is a provider of information that describes correct output based on the input of a test case. Testing with an oracle involves comparing actual results of the system under test (SUT) with the expected results as provided by the oracle.^[1]

The term "test oracle" was first introduced in a paper by William E. Howden.^[2] Elaine Weyuker subsequently explored different kinds of oracles.^[3]

An oracle can operate separately from the SUT and be accessed at test runtime, or it can be consulted before a test is run, with the expected results encoded into the test logic.^[4]

However, method postconditions, which act as automated oracles in design by contract models, are part of the SUT.^[5]
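The idea of a postcondition acting as an oracle inside the SUT can be sketched in Python. This is a minimal illustration, not the mechanism of any particular design by contract tool: the `ensure` decorator and the `integer_sqrt` function are hypothetical names introduced here.

```python
import functools


def ensure(postcondition):
    """Attach a postcondition to a function (hypothetical helper, not a library API).

    The postcondition receives the return value followed by the original
    arguments and must return True when the result is acceptable.
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            result = func(*args, **kwargs)
            # The oracle runs inside the SUT on every call.
            if not postcondition(result, *args, **kwargs):
                raise AssertionError(
                    f"postcondition of {func.__name__} violated for {args!r}"
                )
            return result
        return wrapper
    return decorator


@ensure(lambda result, n: result >= 0 and result * result <= n < (result + 1) ** 2)
def integer_sqrt(n):
    """Return the largest integer r such that r * r <= n, for n >= 0."""
    r = 0
    while (r + 1) * (r + 1) <= n:
        r += 1
    return r


# Any input exercises the oracle; no expected value has to be written down.
for n in range(50):
    integer_sqrt(n)
```

Because the check travels with the function, a test only needs to supply inputs; the postcondition decides whether each result is acceptable.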

Determining the correct output for a given input (and a set of program or system states) is known as the oracle problem or test oracle problem. Some consider it a relatively hard problem, and it involves working with problems related to controllability and observability.^[6]

Categories

A research literature survey covering 1978 to 2012^[7] found several potential categories of test oracles.

Specified

A specified oracle is typically associated with formalized approaches to software modeling and software code construction. It is connected to formal specification,^[8] model-based design which may be used to generate test oracles,^[9] state transition specification for which oracles can be derived to aid model-based testing^[10] and protocol conformance testing,^[11] and design by contract for which the equivalent test oracle is an assertion.

Specified test oracles have a number of challenges. Formal specification relies on abstraction, which in turn may naturally have an element of imprecision, as no model can capture all behavior.^[7]

Derived

A derived test oracle differentiates correct and incorrect behavior by using information derived from artifacts of the system. These may include documentation, system execution results and characteristics of versions of the SUT.^[7] Regression test suites (or reports) are an example of a derived test oracle: they are built on the assumption that the result from a previous system version can be used as an aid (oracle) for a future system version. Previously measured performance characteristics may be used as an oracle for future system versions, for example, to trigger a question about observed potential performance degradation. Textual documentation from previous system versions may be used as a basis to guide expectations in future system versions.
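A regression oracle of this kind can be sketched as a comparison against recorded output. The example below is only an illustration under stated assumptions: `format_invoice` stands in for a hypothetical SUT, and `RECORDED_OUTPUT` is assumed to have been captured from an earlier, accepted version.

```python
def format_invoice(items):
    """Hypothetical SUT: render (name, price) pairs and a total as text."""
    lines = [f"{name}: {price:.2f}" for name, price in items]
    total = sum(price for _, price in items)
    lines.append(f"TOTAL: {total:.2f}")
    return "\n".join(lines)


# Output recorded from a previous, accepted version of the system.
# The recording is trusted only because that version was accepted.
RECORDED_OUTPUT = "pen: 1.50\nbook: 12.00\nTOTAL: 13.50"


def regression_check(current_output, recorded_output):
    """Return (line number, recorded, current) for each differing line.

    An empty list means no change was observed.
    """
    current = current_output.splitlines()
    recorded = recorded_output.splitlines()
    diffs = []
    for i in range(max(len(current), len(recorded))):
        got = current[i] if i < len(current) else None
        want = recorded[i] if i < len(recorded) else None
        if got != want:
            diffs.append((i + 1, want, got))
    return diffs


differences = regression_check(
    format_invoice([("pen", 1.5), ("book", 12.0)]), RECORDED_OUTPUT
)
```

A non-empty `differences` list does not prove a defect; it raises a question, since the change may be intended or the recorded version may itself have been wrong.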

A pseudo-oracle^[7] falls into the category of derived test oracle. A pseudo-oracle, as defined by Weyuker,^[12] is a separately written program which can take the same input as the program or SUT so that their outputs may be compared to understand if there might be a problem to investigate.
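As a rough sketch of a pseudo-oracle, two independently written routines can evaluate the same polynomial and have their outputs compared. Both functions and the tolerance values are assumptions chosen for illustration; neither implementation is taken as authoritative.

```python
import math
import random


def horner(coeffs, x):
    """Hypothetical SUT: evaluate a polynomial with Horner's rule.

    coeffs[i] is the coefficient of x**i.
    """
    result = 0.0
    for c in reversed(coeffs):
        result = result * x + c
    return result


def naive_eval(coeffs, x):
    """Pseudo-oracle: the same polynomial, computed term by term."""
    return sum(c * x ** i for i, c in enumerate(coeffs))


def find_disagreements(trials=200, seed=0):
    """Run both programs on random inputs and collect cases that differ."""
    rng = random.Random(seed)
    disagreements = []
    for _ in range(trials):
        coeffs = [rng.uniform(-10, 10) for _ in range(rng.randint(1, 6))]
        x = rng.uniform(-3, 3)
        a, b = horner(coeffs, x), naive_eval(coeffs, x)
        # Floating-point results are compared with a tolerance (assumed values).
        if not math.isclose(a, b, rel_tol=1e-9, abs_tol=1e-9):
            disagreements.append((coeffs, x, a, b))
    return disagreements
```

A disagreement shows only that at least one of the two programs may be wrong; agreement does not rule out a fault shared by both.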

A partial oracle^[7] is a hybrid between specified test oracle and derived test oracle. It specifies important (but not complete) properties of the SUT. For example, metamorphic testing exploits such properties, called metamorphic relations, across multiple executions of the system.

Implicit

An implicit test oracle relies on implied information and assumptions.^[7] For example, a program crash implies unwanted behavior, so the crash itself serves as an oracle indicating that there may be a problem. There are a number of ways to search and test for unwanted behavior, sometimes called negative testing, which includes specialized subsets such as fuzzing.

Implicit test oracles have limitations because they rely on implied conclusions and assumptions. For example, a program or process crash may not be a priority issue if the system is fault-tolerant and so operates under a form of self-healing/self-management. Implicit test oracles may be susceptible to false positives due to environment dependencies. Property-based testing relies on implicit oracles.
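A crash-based implicit oracle can be illustrated with a very small fuzzing loop. This is a sketch under assumptions: `parse_key_value` is a hypothetical SUT, and treating `ValueError` as the only documented failure is a choice made for the example.

```python
import random


def parse_key_value(text):
    """Hypothetical SUT: parse 'key=value' into a (key, value) tuple.

    Documented to raise ValueError on malformed input.
    """
    if "=" not in text:
        raise ValueError("missing '='")
    key, value = text.split("=", 1)
    if not key:
        raise ValueError("empty key")
    return key, value


def fuzz(target, trials=1000, seed=0):
    """Feed random strings to target and report inputs that crash it.

    The implicit oracle: any exception other than the documented
    ValueError is treated as a sign of a possible problem.
    """
    rng = random.Random(seed)
    alphabet = "ab=\n\x00 "
    crashes = []
    for _ in range(trials):
        text = "".join(rng.choice(alphabet) for _ in range(rng.randint(0, 8)))
        try:
            target(text)
        except ValueError:
            pass  # documented rejection, not a crash
        except Exception as exc:  # unexpected failure
            crashes.append((text, repr(exc)))
    return crashes
```

The loop never checks whether a returned value is correct; it can only flag unexpected failures, which is why such an oracle is weak on its own.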

Human

A human can act as a test oracle. This approach can be categorized as quantitative or qualitative.^[7] A quantitative approach aims to find the right amount of information to gather on a SUT (e.g., test results) for a stakeholder to be able to make decisions on fitness for purpose or the release of the software. A qualitative approach aims to find the representativeness and suitability of the input test data and context of the output from the SUT. An example is using realistic and representative test data and making sense of the results (if they are realistic). These can be guided by heuristic approaches, such as gut instincts, rules of thumb, checklist aids, and experience to help tailor the specific combination selected for the SUT.

Examples

Test oracles are most commonly based on specifications and documentation.^[13]^[14] A formal specification used as input to model-based design and model-based testing would be an example of a specified test oracle. The model-based oracle uses the same model to generate and verify system behavior.^[15] Documentation that is not a full specification of the product, such as a usage or installation guide, or a record of performance characteristics or minimum machine requirements for the software, would typically be a derived test oracle.

A consistency oracle compares the results of one test execution to another for similarity.^[16] This is another example of a derived test oracle.

An oracle for a software program might be a second program that uses a different algorithm to evaluate the same mathematical expression as the product under test. This is an example of a pseudo-oracle, which is a derived test oracle.^[12]

For a Google search, there is no complete oracle to verify whether the number of returned results is correct. A metamorphic relation^[17] can be defined such that a follow-up, narrowed-down search will produce fewer results. This is an example of a partial oracle, which is a hybrid between specified test oracle and derived test oracle.
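The narrowing relation can be sketched against a toy in-memory search function. Everything here is hypothetical: `DOCUMENTS` and `search` stand in for a real search service, and the relation is relaxed to "no more results, and no new ones", an assumption made because a narrowed query may legitimately return the same set.

```python
DOCUMENTS = [
    "test oracle for software testing",
    "metamorphic testing of search engines",
    "software design by contract",
    "fuzzing and negative testing",
]


def search(query, documents=DOCUMENTS):
    """Hypothetical SUT: return documents containing every query word."""
    words = query.lower().split()
    return [d for d in documents if all(w in d.lower().split() for w in words)]


def narrowing_relation_holds(query, extra_word):
    """Metamorphic relation: adding a word must not add results.

    Nothing states how many results either query should return; only
    the relation between the two executions is checked.
    """
    broad = search(query)
    narrow = search(f"{query} {extra_word}")
    return len(narrow) <= len(broad) and all(d in broad for d in narrow)
```

For instance, `narrowing_relation_holds("testing", "software")` compares the two result sets without needing the correct count for either query.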

A statistical oracle uses probabilistic characteristics,^[18] for example with image analysis where a range of certainty and uncertainty is defined for the test oracle to pronounce a match or otherwise. This would be an example of a quantitative approach in human test oracle.
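A statistical oracle can be sketched as an acceptance interval around an expected statistic. The example is a simplified illustration, not the method of the cited work: `roll_die` is a hypothetical SUT, and the threshold `z` is an assumed choice that trades false alarms against missed faults.

```python
import math
import random


def roll_die(rng):
    """Hypothetical SUT: simulate one roll of a fair six-sided die."""
    return rng.randint(1, 6)


def mean_within_tolerance(samples, expected_mean, expected_std, z=4.0):
    """Statistical oracle: accept if the sample mean lies within
    z standard errors of the expected mean."""
    n = len(samples)
    sample_mean = sum(samples) / n
    standard_error = expected_std / math.sqrt(n)
    return abs(sample_mean - expected_mean) <= z * standard_error


rng = random.Random(42)
rolls = [roll_die(rng) for _ in range(10_000)]
# A fair die has mean 3.5 and variance 35/12.
verdict = mean_within_tolerance(rolls, 3.5, math.sqrt(35 / 12))
```

The verdict is probabilistic: a correct SUT can occasionally fall outside the interval, and a faulty one can fall inside it, so the interval width has to be chosen with that in mind.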

A heuristic oracle provides representative or approximate results over a class of test inputs.^[19] This would be an example of a qualitative approach in human test oracle.

Bibliography

Binder, Robert V. (1999). "Chapter 18 - Oracles" in Testing Object-Oriented Systems: Models, Patterns, and Tools, Addison-Wesley Professional, 7 November 1999.

Notes and References

1. Barr, Earl T.; et al. The Oracle Problem in Software Testing: A Survey, 2015.
2. Howden, W.E. (July 1978). "Theoretical and Empirical Studies of Program Testing". IEEE Transactions on Software Engineering. 4 (4): 293–298. doi:10.1109/TSE.1978.231514.
3. Weyuker, Elaine J. "The Oracle Assumption of Program Testing", in Proceedings of the 13th International Conference on System Sciences (ICSS), Honolulu, HI, January 1980, pp. 44-49.
4. Jalote, Pankaj. An Integrated Approach to Software Engineering, Springer/Birkhäuser, 2005.
5. Meyer, Bertrand; Fiva, Arno; Ciupa, Ilinca; Leitner, Andreas; Wei, Yi; Stapf, Emmanuel (September 2009). "Programs That Test Themselves". Computer. 42 (9): 46–55. doi:10.1109/MC.2009.296.
6. Ammann, Paul; Offutt, Jeff. Introduction to Software Testing, 2nd edition, Cambridge University Press, 2016.
7. Barr, Earl T.; Harman, Mark; McMinn, Phil; Shahbaz, Muzammil; Yoo, Shin (November 2014). "The Oracle Problem in Software Testing: A Survey". IEEE Transactions on Software Engineering. 41 (5): 507–525. doi:10.1109/TSE.2014.2372785.
8. Börger, E. (1999). "High Level System Design and Analysis Using Abstract State Machines". In Hutter, D.; Stephan, W.; Traverso, P.; Ullman, M. (eds.). Applied Formal Methods — FM-Trends 98. Lecture Notes in Computer Science. Vol. 1641. pp. 1–43. doi:10.1007/3-540-48257-1_1. ISBN 978-3-540-66462-8. CiteSeerX 10.1.1.470.3653.
9. Peters, D.K. (March 1998). "Using test oracles generated from program documentation". IEEE Transactions on Software Engineering. 24 (3): 161–173. doi:10.1109/32.667877. CiteSeerX 10.1.1.39.2890.
10. Utting, Mark; Pretschner, Alexander; Legeard, Bruno (2012). "A taxonomy of model-based testing approaches". Software Testing, Verification and Reliability. 22 (5): 297–312. doi:10.1002/stvr.456. ISSN 1099-1689.
11. Gaudel, Marie-Claude (2001). "Testing from Formal Specifications, a Generic Approach". In Craeynest, D.; Strohmeier, A. (eds.). Reliable Software Technologies — Ada-Europe 2001. Lecture Notes in Computer Science. Vol. 2043. pp. 35–48. doi:10.1007/3-540-45136-6_3. ISBN 978-3-540-42123-8.
12. Weyuker, E.J. (November 1982). "On Testing Non-Testable Programs". The Computer Journal. 25 (4): 465–470. doi:10.1093/comjnl/25.4.465.
13. Peters, Dennis K. (1995). Generating a Test Oracle from Program Documentation (M. Eng. thesis). McMaster University. CiteSeerX 10.1.1.69.4331.
14. Peters, Dennis K.; Parnas, David L. "Generating a Test Oracle from Program Documentation". Proceedings of the 1994 International Symposium on Software Testing and Analysis. ISSTA. ACM Press. pp. 58–65.
15. Robinson, Harry. Finite State Model-Based Testing on a Shoestring, STAR West 1999.
16. Hoffman, Douglas. Analysis of a Taxonomy for Test Oracles, Quality Week, 1998.
17. Zhou, Z.Q.; Zhang, S.; Hagenbuchner, M.; Tse, T.H.; Kuo, F.-C.; Chen, T.Y. (2012). "Automated functional testing of online search services". Software Testing, Verification and Reliability. 22 (4): 221–243. doi:10.1002/stvr.437. hdl:10722/123864.
18. Mayer, Johannes; Guderlei, Ralph (2004). "Test Oracles Using Statistical Methods". Proceedings of the First International Workshop on Software Quality, Lecture Notes in Informatics. Springer. pp. 179–189.
19. Hoffman, Douglas. Heuristic Test Oracles, Software Testing & Quality Engineering Magazine, 1999.