E-graph explained
In computer science, an e-graph is a data structure that stores an equivalence relation over terms of some language.
Definition and operations
Let
be a set of
uninterpreted functions, where
is the subset of
consisting of functions of arity
. Let
be a countable set of opaque identifiers that may be compared for equality, called
e-class IDs. The application of
to e-class IDs
is denoted
and called an
e-node.
The e-graph then represents equivalence classes of e-nodes, using the following data structures:
representing equivalence classes of e-class IDs, with the usual operations
,
and
. An e-class ID
is
canonical if
; an e-node
is canonical if each
is canonical (
in
).
- An association of e-class IDs with sets of e-nodes, called e-classes. This consists of
(i.e. a mapping) from canonical e-nodes to e-class IDs, and
that maps e-class IDs to e-classes, such that
maps equivalent IDs to the same set of e-nodes:
\foralli,j\inid,M[i]=M[j]\Leftrightarrowfind(U,i)=find(U,j)
Invariants
In addition to the above structure, a valid e-graph conforms to several data structure invariants. Two e-nodes are equivalent if they are in the same e-class. The congruence invariant states that an e-graph must ensure that equivalence is closed under congruence, where two e-nodes
f(i1,\ldots,in),f(j1,\ldots,jn)
are congruent when
find(U,ik)=find(U,jk),k\in\{1,\ldots,n\}
. The
hashcons invariant states that the hashcons maps canonical e-nodes to their e-class ID.
Operations
E-graphs expose wrappers around the
,
, and
operations from the union-find that preserve the e-graph invariants. The last operation, e-matching, is described below.
E-matching
Let
be a set of variables and let
be the smallest set that includes the 0-arity function symbols (also called
constants), includes the variables, and is closed under application of the function symbols. In other words,
is the smallest set such that
,
\Sigma0\subsetTerm(\Sigma,V)
, and when
x1,\ldots,xn\inTerm(\Sigma,V)
and
, then
f(x1,\ldots,xn)\inTerm(\Sigma,V)
. A term containing variables is called a
pattern, a term without variables is called
ground.
An e-graph
represents a ground term
t\inTerm(\Sigma,\emptyset)
if one of its e-classes represents
. An e-class
represents
if some e-node
does. An e-node
represents a term
if
and each e-class
represents the term
(
in
).
e-matching is an operation that takes a pattern
and an e-graph
, and yields all pairs
where
is a substitution mapping the variables in
to e-class IDs and
is an e-class ID such that each term
is represented by
. There are several known algorithms for e-matching,
[1] the
relational e-matching algorithm is based on
worst-case optimal joins and is worst-case optimal.
[2] Complexity
- An e-graph with n equalities can be constructed in O(n log n) time.
Equality saturation
Equality saturation is a technique for building optimizing compilers using e-graphs. It operates by applying a set of rewrites using e-matching until the e-graph is saturated, a timeout is reached, an e-graph size limit is reached, a fixed number of iterations is exceeded, or some other halting condition is reached. After rewriting, an optimal term is extracted from the e-graph according to some cost function, usually related to AST size or performance considerations.
Applications
E-graphs are used in automated theorem proving. They are a crucial part of modern SMT solvers such as Z3[3] and CVC4, where they are used to decide the empty theory by computing the congruence closure of a set of equalities, and e-matching is used to instantiate quantifiers. In DPLL(T)-based solvers that use conflict-driven clause learning (also known as non-chronological backtracking), e-graphs are extended to produce proof certificates. E-graphs are also used in the Simplify theorem prover of ESC/Java.[4]
Equality saturation is used in specialized optimizing compilers,[5] e.g. for deep learning[6] and linear algebra.[7] Equality saturation has also been used for translation validation applied to the LLVM toolchain.[8]
E-graphs have been applied to several problems in program analysis, including fuzzing,[9] abstract interpretation,[10] and library learning.[11]
References
- Book: de Moura. Leonardo. Bjørner. Nikolaj. Automated Deduction – CADE-21 . Efficient E-Matching for SMT Solvers . 2007. Pfenning. Frank. https://link.springer.com/chapter/10.1007/978-3-540-73595-3_13. Lecture Notes in Computer Science. 4603. en. Berlin, Heidelberg. Springer. 183–198. 10.1007/978-3-540-73595-3_13. 978-3-540-73595-3.
- Willsey. Max. Nandi. Chandrakana. Wang. Yisu Remy. Flatt. Oliver. Tatlock. Zachary. Panchekha. Pavel. 2021-01-04. egg: Fast and extensible equality saturation. Proceedings of the ACM on Programming Languages. 5. POPL. 23:1–23:29. 10.1145/3434304. 2004.03082. 226282597.
- Book: Tate. Ross. Stepp. Michael. Tatlock. Zachary. Lerner. Sorin. Proceedings of the 36th annual ACM SIGPLAN-SIGACT symposium on Principles of programming languages . Equality saturation . 2009-01-21. https://doi.org/10.1145/1480881.1480915. POPL '09. Savannah, GA, USA. Association for Computing Machinery. 264–276. 10.1145/1480881.1480915. 978-1-60558-379-2. 2138086.
- Book: Flatt . Oliver . Coward . Samuel . Willsey . Max . Tatlock . Zachary . Panchekha . Pavel . October 2022 . Small Proofs from Congruence Closure . A. Griggio . N. Rungta . Proceedings of the 22nd Conference on Formal Methods in Computer-Aided Design – FMCAD 2022 . https://repositum.tuwien.at/handle/20.500.12708/81325 . en . TU Wien Academic Press . 75–83 . 10.34727/2022/isbn.978-3-85448-053-2_13 . 978-3-85448-053-2. 252118847 .
External links
Notes and References
- Moskal . Michał . Łopuszański . Jakub . Kiniry . Joseph R. . 2008-05-06 . E-matching for Fun and Profit . Electronic Notes in Theoretical Computer Science . Proceedings of the 5th International Workshop on Satisfiability Modulo Theories (SMT 2007) . en . 198 . 2 . 19–35 . 10.1016/j.entcs.2008.04.078 . 1571-0661. free .
- Zhang . Yihong . Wang . Yisu Remy . Willsey . Max . Tatlock . Zachary . 2022-01-12 . Relational e-matching . Proceedings of the ACM on Programming Languages . 6 . POPL . 35:1–35:22 . 10.1145/3498696. 236924583 . free .
- Book: de Moura. Leonardo. Bjørner. Nikolaj. Tools and Algorithms for the Construction and Analysis of Systems . Z3: An Efficient SMT Solver . 2008. Ramakrishnan. C. R.. Rehof. Jakob. Lecture Notes in Computer Science. 4963. en. Berlin, Heidelberg. Springer. 337–340. 10.1007/978-3-540-78800-3_24. 978-3-540-78800-3. free.
- Detlefs . David . Nelson . Greg . Saxe . James B. . May 2005 . Simplify: a theorem prover for program checking . Journal of the ACM . 52 . 3 . 365–473 . 10.1145/1066100.1066102 . 9613854 . 0004-5411.
- Joshi. Rajeev. Nelson. Greg. Randall. Keith. 2002-05-17. Denali: a goal-directed superoptimizer. ACM SIGPLAN Notices. 37. 5. 304–314. 10.1145/543552.512566. 0362-1340.
- Yang. Yichen. Phothilimtha. Phitchaya Mangpo. Wang. Yisu Remy. Willsey. Max. Roy. Sudip. Pienaar. Jacques. 2021-03-17. Equality Saturation for Tensor Graph Superoptimization. cs.AI. 2101.01332.
- Wang. Yisu Remy. Hutchison. Shana. Leang. Jonathan. Howe. Bill. Suciu. Dan. 2020-12-22. SPORES: Sum-Product Optimization via Relational Equality Saturation for Large Scale Linear Algebra. cs.DB. 2002.07951.
- Book: Stepp. Michael. Tate. Ross. Lerner. Sorin. Computer Aided Verification . Equality-Based Translation Validator for LLVM . 2011. Gopalakrishnan. Ganesh. Qadeer. Shaz. Lecture Notes in Computer Science. 6806. en. Berlin, Heidelberg. Springer. 737–742. 10.1007/978-3-642-22110-1_59. 978-3-642-22110-1. free.
- Web site: Wasm-mutate: Fuzzing WebAssembly Compilers with E-Graphs (EGRAPHS 2022) - PLDI 2022 . 2023-02-03 . pldi22.sigplan.org.
- Coward . Samuel . Constantinides . George A. . Drane . Theo . 2022-03-17 . Abstract Interpretation on E-Graphs . cs.LO . 2203.09191 .
Coward . Samuel . Constantinides . George A. . Drane . Theo . 2022-05-30 . Combining E-Graphs with Abstract Interpretation . cs.DS . 2205.14989 .
- Cao . David . Kunkel . Rose . Nandi . Chandrakana . Willsey . Max . Tatlock . Zachary . Polikarpova . Nadia . 2023-01-09 . babble: Learning Better Abstractions with E-Graphs and Anti-Unification . Proceedings of the ACM on Programming Languages . 7 . POPL . 396–424 . 10.1145/3571207 . 2212.04596 . 254536022 . 2475-1421.