E-graph explained

In computer science, an e-graph is a data structure that stores an equivalence relation over terms of some language.

Definition and operations

Let $\Sigma$ be a set of uninterpreted functions, where $\Sigma_n$ is the subset of $\Sigma$ consisting of functions of arity $n$. Let $\mathrm{id}$ be a countable set of opaque identifiers that may be compared for equality, called e-class IDs. The application of $f \in \Sigma_n$ to e-class IDs $i_1, i_2, \ldots, i_n \in \mathrm{id}$ is denoted $f(i_1, i_2, \ldots, i_n)$ and called an e-node.

The e-graph then represents equivalence classes of e-nodes, using the following data structures:

A union-find structure $U$ representing equivalence classes of e-class IDs, with the usual operations $\mathrm{find}$, $\mathrm{add}$ and $\mathrm{merge}$. An e-class ID $e$ is canonical if $\mathrm{find}(U, e) = e$; an e-node $f(i_1, \ldots, i_n)$ is canonical if each $i_j$ is canonical ($j \in \{1, \ldots, n\}$).

A hashcons $H$ (i.e. a mapping) from canonical e-nodes to e-class IDs, and

An e-class map $M$ that maps e-class IDs to e-classes, such that $M$ maps equivalent IDs to the same set of e-nodes:

$\forall i, j \in \mathrm{id},\; M[i] = M[j] \Leftrightarrow \mathrm{find}(U, i) = \mathrm{find}(U, j)$
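The structure U and the notion of canonicity can be illustrated with a minimal Python sketch. It assumes that e-class IDs are consecutive integers and that an e-node is a tuple whose first element is the function symbol and whose remaining elements are child e-class IDs; the names UnionFind, is_canonical_id and is_canonical_node are illustrative and not taken from any particular library.

```python
class UnionFind:
    """Equivalence classes of e-class IDs (assumed here to be integers 0, 1, 2, ...)."""

    def __init__(self):
        self.parent = []

    def add(self):
        """Create a fresh ID that forms its own singleton class."""
        self.parent.append(len(self.parent))
        return len(self.parent) - 1

    def find(self, i):
        """Return the canonical representative of i, halving paths on the way."""
        while self.parent[i] != i:
            self.parent[i] = self.parent[self.parent[i]]
            i = self.parent[i]
        return i

    def merge(self, i, j):
        """Join the classes of i and j and return the surviving representative."""
        ri, rj = self.find(i), self.find(j)
        if ri != rj:
            self.parent[rj] = ri
        return ri


def is_canonical_id(U, e):
    # An e-class ID is canonical if find(U, e) = e.
    return U.find(e) == e


def is_canonical_node(U, node):
    # An e-node (symbol, i1, ..., in) is canonical if every child ID is canonical.
    return all(is_canonical_id(U, child) for child in node[1:])
```

In this encoding the hashcons H can be a dict keyed by canonical e-node tuples, and the e-class map M a dict from canonical IDs to sets of e-node tuples, looked up through find.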

Invariants

In addition to the above structure, a valid e-graph conforms to several data structure invariants. Two e-nodes are equivalent if they are in the same e-class. The congruence invariant states that an e-graph must ensure that equivalence is closed under congruence, where two e-nodes $f(i_1, \ldots, i_n)$ and $f(j_1, \ldots, j_n)$ are congruent when $\mathrm{find}(U, i_k) = \mathrm{find}(U, j_k)$ for all $k \in \{1, \ldots, n\}$. The hashcons invariant states that the hashcons maps canonical e-nodes to their e-class ID.
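As a small illustration of the congruence condition, the following Python sketch tests whether two e-nodes are congruent. It assumes e-nodes are tuples of the form (symbol, child IDs...) and that the caller supplies a find function; the toy parent table is invented purely for the example.

```python
def congruent(find, node_a, node_b):
    """True if both e-nodes share symbol and arity and have pairwise equivalent children."""
    return (
        node_a[0] == node_b[0]
        and len(node_a) == len(node_b)
        and all(find(i) == find(j) for i, j in zip(node_a[1:], node_b[1:]))
    )


# Toy union-find state (hypothetical): IDs 0 and 1 are equivalent, 2 is on its own.
parent = {0: 0, 1: 0, 2: 2}


def find(i):
    while parent[i] != i:
        i = parent[i]
    return i


same = congruent(find, ("f", 0, 2), ("f", 1, 2))      # children pairwise equivalent
different = congruent(find, ("f", 0), ("f", 2))       # children in distinct classes
```

The congruence invariant then requires that any two e-nodes for which this test succeeds live in the same e-class.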

Operations

E-graphs expose wrappers around the $\mathrm{add}$, $\mathrm{find}$, and $\mathrm{merge}$ operations from the union-find that preserve the e-graph invariants. The last operation, e-matching, is described below.
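One way such wrappers could look is sketched below in Python. This is a deliberately naive version under stated assumptions: IDs are integers, e-nodes are tuples (symbol, child IDs...), and merge restores the invariants by re-canonicalizing the whole hashcons until nothing changes. Production implementations use more efficient strategies; the class name EGraph and the methods canonicalize and rebuild are illustrative choices.

```python
class EGraph:
    def __init__(self):
        self.parent = []     # union-find U over integer e-class IDs
        self.hashcons = {}   # H: canonical e-node -> e-class ID
        self.classes = {}    # M, keyed by canonical ID: set of e-nodes

    def find(self, i):
        while self.parent[i] != i:
            self.parent[i] = self.parent[self.parent[i]]
            i = self.parent[i]
        return i

    def canonicalize(self, node):
        return (node[0],) + tuple(self.find(c) for c in node[1:])

    def add(self, symbol, *children):
        """Return the class of symbol(children...), creating it if it is new."""
        node = self.canonicalize((symbol,) + children)
        if node in self.hashcons:
            return self.find(self.hashcons[node])
        eid = len(self.parent)
        self.parent.append(eid)
        self.hashcons[node] = eid
        self.classes[eid] = {node}
        return eid

    def merge(self, a, b):
        """Assert that a and b are equal, then restore the invariants."""
        a, b = self.find(a), self.find(b)
        if a != b:
            self.parent[b] = a
            self.rebuild()
        return self.find(a)

    def rebuild(self):
        """Re-canonicalize until no two congruent e-nodes sit in different classes."""
        while True:
            fresh, pending = {}, []
            for node, eid in self.hashcons.items():
                node, eid = self.canonicalize(node), self.find(eid)
                other = fresh.setdefault(node, eid)
                if other != eid:
                    pending.append((other, eid))   # congruent but not yet merged
            self.hashcons = fresh
            if not pending:
                break
            for x, y in pending:
                x, y = self.find(x), self.find(y)
                if x != y:
                    self.parent[y] = x
        self.classes = {}
        for node, eid in self.hashcons.items():
            self.classes.setdefault(eid, set()).add(node)


g = EGraph()
a, b = g.add("a"), g.add("b")
fa, fb = g.add("f", a), g.add("f", b)
g.merge(a, b)
print(g.find(fa) == g.find(fb))  # True: f(a) and f(b) became congruent
```

Rebuilding the whole table on every merge keeps the sketch short at the cost of efficiency; the point is only that add and merge leave the hashcons and congruence invariants intact.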

E-matching

Let $V$ be a set of variables and let $\mathrm{Term}(\Sigma, V)$ be the smallest set that includes the 0-arity function symbols (also called constants), includes the variables, and is closed under application of the function symbols. In other words, $\mathrm{Term}(\Sigma, V)$ is the smallest set such that $V \subset \mathrm{Term}(\Sigma, V)$, $\Sigma_0 \subset \mathrm{Term}(\Sigma, V)$, and when $x_1, \ldots, x_n \in \mathrm{Term}(\Sigma, V)$ and $f \in \Sigma_n$, then $f(x_1, \ldots, x_n) \in \mathrm{Term}(\Sigma, V)$. A term containing variables is called a pattern; a term without variables is called ground.
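The distinction between patterns and ground terms can be made concrete with a short Python sketch. The encoding is an assumption made for illustration: a variable is a string such as "?x", and an application f(x1, ..., xn) is a tuple ("f", x1, ..., xn), so a constant is a one-element tuple.

```python
def variables(term):
    """Return the set of variables occurring in a term."""
    if isinstance(term, str):          # assumed encoding: strings are variables
        return {term}
    found = set()
    for argument in term[1:]:          # term[0] is the function symbol
        found |= variables(argument)
    return found


def is_ground(term):
    return not variables(term)


def is_pattern(term):
    return bool(variables(term))


ground_term = ("f", ("a",), ("g", ("b",)))     # f(a, g(b))
pattern_term = ("f", "?x", ("g", "?x"))        # f(?x, g(?x))
```

Here ground_term contains no variables, while pattern_term mentions the single variable "?x" twice.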

An e-graph $E$ represents a ground term $t \in \mathrm{Term}(\Sigma, \emptyset)$ if one of its e-classes represents $t$. An e-class $C$ represents $t$ if some e-node $f(i_1, \ldots, i_n) \in C$ does. An e-node $f(i_1, \ldots, i_n) \in C$ represents a term $g(j_1, \ldots, j_n)$ if $f = g$ and each e-class $M[i_k]$ represents the term $j_k$ ($k \in \{1, \ldots, n\}$).
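The mutually recursive definition of representation translates directly into code. The Python sketch below assumes the e-class map M is given as a dict from canonical integer IDs to sets of e-node tuples whose child IDs are already canonical, and that ground terms are nested tuples ("f", t1, ..., tn); the function names are illustrative.

```python
def class_represents(M, class_id, term):
    """An e-class represents a term if one of its e-nodes does."""
    return any(node_represents(M, node, term) for node in M[class_id])


def node_represents(M, node, term):
    """An e-node f(i1..in) represents g(t1..tn) if f = g and each M[ik] represents tk."""
    return (
        node[0] == term[0]
        and len(node) == len(term)
        and all(class_represents(M, i, t) for i, t in zip(node[1:], term[1:]))
    )


def egraph_represents(M, term):
    """An e-graph represents a term if one of its e-classes does."""
    return any(class_represents(M, class_id, term) for class_id in M)


# Hypothetical e-graph in which a and f(a) share class 0, so f(f(a)) is represented too.
M = {0: {("a",), ("f", 0)}, 1: {("g", 0)}}
```

The recursion always descends into a strictly smaller subterm, so it terminates even when the e-graph contains cycles, as class 0 does here.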

E-matching is an operation that takes a pattern $p \in \mathrm{Term}(\Sigma, V)$ and an e-graph $E$, and yields all pairs $(\sigma, C)$ where $\sigma \subset V \times \mathrm{id}$ is a substitution mapping the variables in $p$ to e-class IDs and $C \in \mathrm{id}$ is an e-class ID such that each term $\sigma(p)$ is represented by $C$. There are several known algorithms for e-matching;[1] the relational e-matching algorithm is based on worst-case optimal joins and is worst-case optimal.[2]
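To make the input and output of e-matching concrete, here is a naive backtracking matcher in Python. It is not the relational algorithm and makes no claim to efficiency. It assumes the e-class map M is a dict from canonical integer IDs to sets of canonical e-node tuples, and that pattern variables are strings such as "?x"; all function names are illustrative.

```python
def match_in_class(M, pattern, class_id, subst):
    """Yield each extension of subst under which class_id represents pattern."""
    if isinstance(pattern, str):                     # pattern variable
        if pattern not in subst:
            yield {**subst, pattern: class_id}       # bind the variable
        elif subst[pattern] == class_id:
            yield subst                              # consistent with earlier binding
        return
    for node in M[class_id]:
        if node[0] == pattern[0] and len(node) == len(pattern):
            yield from match_children(M, pattern[1:], node[1:], subst)


def match_children(M, patterns, child_ids, subst):
    """Match sub-patterns against child classes left to right, threading subst."""
    if not patterns:
        yield subst
        return
    for extended in match_in_class(M, patterns[0], child_ids[0], subst):
        yield from match_children(M, patterns[1:], child_ids[1:], extended)


def ematch(M, pattern):
    """Yield all pairs (substitution, e-class ID) for the pattern."""
    for class_id in M:
        for subst in match_in_class(M, pattern, class_id, {}):
            yield subst, class_id


# Hypothetical e-graph: class 2 holds both a + b and b + a.
M = {0: {("a",)}, 1: {("b",)}, 2: {("+", 0, 1), ("+", 1, 0)}}
matches = list(ematch(M, ("+", "?x", "?y")))
```

For the pattern ?x + ?y this finds two matches in class 2, one per e-node, whereas the non-linear pattern ?x + ?x finds none because no e-node has two equal children.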

Equality saturation

Equality saturation is a technique for building optimizing compilers using e-graphs. It operates by applying a set of rewrites using e-matching until the e-graph is saturated, a timeout is reached, an e-graph size limit is reached, a fixed number of iterations is exceeded, or some other halting condition is reached. After rewriting, an optimal term is extracted from the e-graph according to some cost function, usually related to AST size or performance considerations.
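The final extraction step can be sketched in Python for the simplest cost function, AST size. The sketch assumes the e-graph has already been saturated and is given as a dict M from canonical integer IDs to sets of canonical e-node tuples; the rewrite loop itself is omitted, and the function name extract_smallest is illustrative.

```python
def extract_smallest(M):
    """Map each e-class ID to (cost, term) for a smallest term it represents.

    Cost is AST size: one per e-node plus the cost of its children.
    Classes that represent no finite term are left out of the result.
    """
    best = {}
    changed = True
    while changed:                       # iterate to a fixpoint
        changed = False
        for class_id, nodes in M.items():
            for node in nodes:
                children = node[1:]
                if any(c not in best for c in children):
                    continue             # some child has no known term yet
                cost = 1 + sum(best[c][0] for c in children)
                if class_id not in best or cost < best[class_id][0]:
                    term = (node[0],) + tuple(best[c][1] for c in children)
                    best[class_id] = (cost, term)
                    changed = True
    return best


# Hypothetical saturated e-graph for (x * 2) / 2, where class 0 also contains x.
M = {0: {("x",), ("/", 1, 2)}, 1: {("*", 0, 2)}, 2: {("2",)}}
best = extract_smallest(M)
```

For class 0 the extractor picks the single-node term x over the five-node term (x * 2) / 2. Costs are positive integers that only decrease, so the loop terminates.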

Applications

E-graphs are used in automated theorem proving. They are a crucial part of modern SMT solvers such as Z3[3] and CVC4, where they are used to decide the empty theory by computing the congruence closure of a set of equalities, and e-matching is used to instantiate quantifiers. In DPLL(T)-based solvers that use conflict-driven clause learning (also known as non-chronological backtracking), e-graphs are extended to produce proof certificates. E-graphs are also used in the Simplify theorem prover of ESC/Java.[4]

Equality saturation is used in specialized optimizing compilers,[5] e.g. for deep learning[6] and linear algebra.[7] Equality saturation has also been used for translation validation applied to the LLVM toolchain.[8]

E-graphs have been applied to several problems in program analysis, including fuzzing,[9] abstract interpretation,[10] and library learning.[11]

Notes and References

  1. Moskal, Michał; Łopuszański, Jakub; Kiniry, Joseph R. (2008-05-06). "E-matching for Fun and Profit". Electronic Notes in Theoretical Computer Science. Proceedings of the 5th International Workshop on Satisfiability Modulo Theories (SMT 2007). 198 (2): 19–35. doi:10.1016/j.entcs.2008.04.078. ISSN 1571-0661.
  2. Zhang, Yihong; Wang, Yisu Remy; Willsey, Max; Tatlock, Zachary (2022-01-12). "Relational e-matching". Proceedings of the ACM on Programming Languages. 6 (POPL): 35:1–35:22. doi:10.1145/3498696. S2CID 236924583.
  3. de Moura, Leonardo; Bjørner, Nikolaj (2008). "Z3: An Efficient SMT Solver". In Ramakrishnan, C. R.; Rehof, Jakob (eds.). Tools and Algorithms for the Construction and Analysis of Systems. Lecture Notes in Computer Science. Vol. 4963. Berlin, Heidelberg: Springer. pp. 337–340. doi:10.1007/978-3-540-78800-3_24. ISBN 978-3-540-78800-3.
  4. Detlefs, David; Nelson, Greg; Saxe, James B. (May 2005). "Simplify: a theorem prover for program checking". Journal of the ACM. 52 (3): 365–473. doi:10.1145/1066100.1066102. S2CID 9613854. ISSN 0004-5411.
  5. Joshi, Rajeev; Nelson, Greg; Randall, Keith (2002-05-17). "Denali: a goal-directed superoptimizer". ACM SIGPLAN Notices. 37 (5): 304–314. doi:10.1145/543552.512566. ISSN 0362-1340.
  6. Yang, Yichen; Phothilimtha, Phitchaya Mangpo; Wang, Yisu Remy; Willsey, Max; Roy, Sudip; Pienaar, Jacques (2021-03-17). "Equality Saturation for Tensor Graph Superoptimization". arXiv:2101.01332 [cs.AI].
  7. Wang, Yisu Remy; Hutchison, Shana; Leang, Jonathan; Howe, Bill; Suciu, Dan (2020-12-22). "SPORES: Sum-Product Optimization via Relational Equality Saturation for Large Scale Linear Algebra". arXiv:2002.07951 [cs.DB].
  8. Stepp, Michael; Tate, Ross; Lerner, Sorin (2011). "Equality-Based Translation Validator for LLVM". In Gopalakrishnan, Ganesh; Qadeer, Shaz (eds.). Computer Aided Verification. Lecture Notes in Computer Science. Vol. 6806. Berlin, Heidelberg: Springer. pp. 737–742. doi:10.1007/978-3-642-22110-1_59. ISBN 978-3-642-22110-1.
  9. "Wasm-mutate: Fuzzing WebAssembly Compilers with E-Graphs (EGRAPHS 2022) - PLDI 2022". pldi22.sigplan.org. 2023-02-03.
  10. Coward, Samuel; Constantinides, George A.; Drane, Theo (2022-03-17). "Abstract Interpretation on E-Graphs". arXiv:2203.09191 [cs.LO].
    Coward, Samuel; Constantinides, George A.; Drane, Theo (2022-05-30). "Combining E-Graphs with Abstract Interpretation". arXiv:2205.14989 [cs.DS].
  11. Cao, David; Kunkel, Rose; Nandi, Chandrakana; Willsey, Max; Tatlock, Zachary; Polikarpova, Nadia (2023-01-09). "babble: Learning Better Abstractions with E-Graphs and Anti-Unification". Proceedings of the ACM on Programming Languages. 7 (POPL): 396–424. doi:10.1145/3571207. arXiv:2212.04596. S2CID 254536022. ISSN 2475-1421.