Mogensen–Scott encoding explained

In computer science, Scott encoding is a way to represent (recursive) data types in the lambda calculus. Church encoding performs a similar function. The data and operators form a mathematical structure which is embedded in the lambda calculus.

Whereas Church encoding starts with representations of the basic data types and builds up from them, Scott encoding starts from the simplest method to compose algebraic data types.

Mogensen–Scott encoding extends and slightly modifies Scott encoding by applying the encoding to metaprogramming. This encoding allows lambda calculus terms to be represented as data and operated on by a metaprogram.

History

Scott encoding appears first in a set of unpublished lecture notes by Dana Scott, whose first citation occurs in the book Combinatorial Logic, Volume II.^[1] Michel Parigot gave a logical interpretation of, and a strongly normalizing recursor for, Scott-encoded numerals,^[2] referring to them as the "Stack type" representation of numbers. Torben Mogensen later extended Scott encoding for the encoding of lambda terms as data.^[3]

Discussion

Lambda calculus allows data to be stored as parameters to a function that does not yet have all the parameters required for application. For example,

((\lambda x_1 \ldots x_n. \lambda c. c\ x_1 \ldots x_n)\ v_1 \ldots v_n)\ f

may be thought of as a record or struct where the fields x_1 \ldots x_n have been initialized with the values v_1 \ldots v_n. These values may then be accessed by applying the term to a function f. This reduces to

f\ v_1 \ldots v_n
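The record-like term above can be sketched in Python. This is only an illustration under stated assumptions: the names `make_record`, `point`, `total` and `second` are hypothetical, and uncurried Python arguments stand in for the curried lambda-calculus parameters.

```python
def make_record(*fields):
    """Model (lambda x1..xn. lambda c. c x1..xn) v1..vn.

    The field values are captured in a closure that still waits for c.
    """
    return lambda c: c(*fields)

# A "record" whose three fields are initialised with the values 1, 2, 3.
point = make_record(1, 2, 3)

# Applying the record to a function f reduces to f 1 2 3.
total = point(lambda x, y, z: x + y + z)
second = point(lambda x, y, z: y)
```

Each accessor is simply a function of all the fields, so projecting one field and combining several fields use the same mechanism.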

The variable c may represent a constructor for an algebraic data type in functional languages such as Haskell. Now suppose there are N constructors, each with A_i arguments:

\begin{array}{c|c|c}
\text{Constructor} & \text{Given arguments} & \text{Result} \\
\hline
((\lambda x_1 \ldots x_{A_1}. \lambda c_1 \ldots c_N. c_1\ x_1 \ldots x_{A_1})\ v_1 \ldots v_{A_1}) & f_1 \ldots f_N & f_1\ v_1 \ldots v_{A_1} \\
((\lambda x_1 \ldots x_{A_2}. \lambda c_1 \ldots c_N. c_2\ x_1 \ldots x_{A_2})\ v_1 \ldots v_{A_2}) & f_1 \ldots f_N & f_2\ v_1 \ldots v_{A_2} \\
\vdots & \vdots & \vdots \\
((\lambda x_1 \ldots x_{A_N}. \lambda c_1 \ldots c_N. c_N\ x_1 \ldots x_{A_N})\ v_1 \ldots v_{A_N}) & f_1 \ldots f_N & f_N\ v_1 \ldots v_{A_N}
\end{array}

Each constructor selects a different function from the function parameters f_1 \ldots f_N. This provides branching in the process flow, based on the constructor. Each constructor may have a different arity (number of parameters). If the constructors have no parameters, then the set of constructors acts like an enum: a type with a fixed number of values. If the constructors have parameters, recursive data structures may be constructed.
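Both cases can be illustrated with a short Python sketch. The colour enum and the list type are hypothetical examples chosen for illustration, not taken from the source, and handlers are passed as uncurried Python arguments.

```python
# Enum-like type: constructors without parameters only pick a branch.
red = lambda r, g, b: r
green = lambda r, g, b: g
blue = lambda r, g, b: b

def colour_name(colour):
    # One handler value per constructor; the constructor selects one.
    return colour("red", "green", "blue")

# Recursive type: a list with two constructors of different arity.
nil = lambda on_nil, on_cons: on_nil            # arity 0

def cons(head, tail):                           # arity 2
    return lambda on_nil, on_cons: on_cons(head, tail)

def length(xs):
    # Branch on the constructor; the cons branch recurses on the tail.
    return xs(0, lambda head, tail: 1 + length(tail))
```

Applying a value to its handlers plays the role of a `case` expression: no tag is inspected, because the value itself is the selector.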

Definition

Let D be a datatype with N constructors, \{c_i\}_{i=1}^{N}, such that constructor c_i has arity A_i.

Scott encoding

The Scott encoding of constructor c_i of the data type D is

\lambda x_1 \ldots x_{A_i}. \lambda c_1 \ldots c_N. c_i\ x_1 \ldots x_{A_i}
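The definition can be read as a recipe for building constructors mechanically. The following Python sketch assumes a hypothetical helper `scott_constructor(i, arity, n)` and an illustrative two-constructor option type; neither appears in the source. Because Python functions are uncurried here, the handler for a nullary constructor is written as a zero-argument function.

```python
def scott_constructor(i, arity, n):
    """Build constructor c_i (1-based) of a datatype with n constructors.

    Mirrors  lambda x_1..x_{A_i}. lambda c_1..c_N. c_i x_1..x_{A_i}.
    """
    def construct(*xs):
        if len(xs) != arity:
            raise TypeError(f"expected {arity} arguments, got {len(xs)}")

        def select(*cs):
            if len(cs) != n:
                raise TypeError(f"expected {n} handlers, got {len(cs)}")
            # Pick the i-th handler and pass it the stored arguments.
            return cs[i - 1](*xs)

        return select

    return construct

# Illustrative datatype with N = 2, A_1 = 0, A_2 = 1.
nothing = scott_constructor(1, 0, 2)()
just = scott_constructor(2, 1, 2)

def get_or_default(m, default):
    return m(lambda: default, lambda value: value)
```

For instance, `get_or_default(just(5), 0)` selects the second handler and passes it the stored value, while `get_or_default(nothing, 0)` selects the first.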

Mogensen–Scott encoding

Mogensen extends Scott encoding to encode any untyped lambda term as data. This allows a lambda term to be represented as data within a lambda calculus metaprogram. The meta-function mse converts a lambda term into the corresponding data representation of the lambda term:

\begin{align}
\operatorname{mse}[x] &= \lambda a, b, c. a\ x \\
\operatorname{mse}[M\ N] &= \lambda a, b, c. b\ \operatorname{mse}[M]\ \operatorname{mse}[N] \\
\operatorname{mse}[\lambda x. M] &= \lambda a, b, c. c\ (\lambda x. \operatorname{mse}[M])
\end{align}

The "lambda term" is represented as a tagged union with three cases:

Constructor a - a variable (arity 1, not recursive)
Constructor b - function application (arity 2, recursive in both arguments)
Constructor c - lambda abstraction (arity 1, recursive)

For example,

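\begin{array}{l}
\operatorname{mse}[\lambda x. f\ (x\ x)] \\
\lambda a, b, c. c\ (\lambda x. \operatorname{mse}[f\ (x\ x)]) \\
\lambda a, b, c. c\ (\lambda x. \lambda a, b, c. b\ \operatorname{mse}[f]\ \operatorname{mse}[x\ x]) \\
\lambda a, b, c. c\ (\lambda x. \lambda a, b, c. b\ (\lambda a, b, c. a\ f)\ \operatorname{mse}[x\ x]) \\
\lambda a, b, c. c\ (\lambda x. \lambda a, b, c. b\ (\lambda a, b, c. a\ f)\ (\lambda a, b, c. b\ \operatorname{mse}[x]\ \operatorname{mse}[x])) \\
\lambda a, b, c. c\ (\lambda x. \lambda a, b, c. b\ (\lambda a, b, c. a\ f)\ (\lambda a, b, c. b\ (\lambda a, b, c. a\ x)\ (\lambda a, b, c. a\ x)))
\end{array}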
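The three mse equations can be sketched as a small Python function. Several things here are assumptions made for illustration only: source terms are written as tuples tagged `"var"`, `"app"` and `"lam"`, free variables are represented by their own name as a string, and `show` is a hypothetical consumer that prints an encoded term using generated names `v0`, `v1`, and so on.

```python
def mse(term, env=None):
    """Translate a tuple-based lambda term into its Mogensen-Scott form."""
    env = env or {}
    tag = term[0]
    if tag == "var":
        # mse[x] = lambda a,b,c. a x
        # A bound x is whatever the enclosing meta-level lambda received;
        # a free x stands for its own name (an assumption of this sketch).
        x = env.get(term[1], term[1])
        return lambda a, b, c: a(x)
    if tag == "app":
        # mse[M N] = lambda a,b,c. b mse[M] mse[N]
        m, n = mse(term[1], env), mse(term[2], env)
        return lambda a, b, c: b(m, n)
    if tag == "lam":
        # mse[lambda x. M] = lambda a,b,c. c (lambda x. mse[M])
        _, name, body = term
        return lambda a, b, c: c(lambda x: mse(body, {**env, name: x}))
    raise ValueError(f"unknown term: {term!r}")

def show(t, depth=0):
    """Print an encoded term by supplying one handler per constructor."""
    name = f"v{depth}"
    return t(
        lambda x: x,  # variable: x is the name substituted for it
        lambda m, n: "(" + show(m, depth) + " " + show(n, depth) + ")",
        lambda f: "(lam " + name + "." + show(f(name), depth + 1) + ")",
    )

# The example term  lambda x. f (x x)
example = ("lam", "x", ("app", ("var", "f"),
                        ("app", ("var", "x"), ("var", "x"))))
```

Binding is delegated to the host language: the abstraction case hands the consumer a Python function, so `show` chooses the variable's representation by applying that function to a fresh name.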

Comparison to the Church encoding

The Scott encoding coincides with the Church encoding for booleans. Church encoding of pairs may be generalized to arbitrary data types by encoding c_i of D above as

\lambda x_1 \ldots x_{A_i}. \lambda c_1 \ldots c_N. c_i\ (x_1\ c_1 \ldots c_N) \ldots (x_{A_i}\ c_1 \ldots c_N)

Compare this to the Mogensen–Scott encoding,

\lambda x_1 \ldots x_{A_i}. \lambda c_1 \ldots c_N. c_i\ x_1 \ldots x_{A_i}

With this generalization, the Scott and Church encodings coincide on all enumerated datatypes (such as the boolean datatype) because each constructor is a constant (no parameters).

Concerning the practicality of using either the Church or Scott encoding for programming, there is a symmetric trade-off:^[4] Church-encoded numerals support a constant-time addition operation and have no better than a linear-time predecessor operation; Scott-encoded numerals support a constant-time predecessor operation and have no better than a linear-time addition operation.
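The trade-off can be seen in a small Python sketch of both numeral systems. The definitions are the standard textbook ones rather than anything given in the source, the `c_` and `s_` prefixes are naming assumptions, and Python tuples stand in for Church pairs in the predecessor to keep the example short.

```python
# Church numerals: the number n means "apply f n times".
c_zero = lambda f: lambda x: x
c_succ = lambda n: lambda f: lambda x: f(n(f)(x))
# Addition composes the two iterations; it does not inspect either numeral.
c_add = lambda m, n: lambda f: lambda x: m(f)(n(f)(x))

def c_pred(n):
    # Predecessor iterates over pairs (k - 1, k) n times: linear in n.
    step = lambda pair: (pair[1], c_succ(pair[1]))
    return n(step)((c_zero, c_zero))[0]

def c_to_int(n):
    return n(lambda k: k + 1)(0)

# Scott numerals: a number is a case split on zero / successor.
s_zero = lambda z, s: z
s_succ = lambda n: lambda z, s: s(n)
# Predecessor is a single case split.
s_pred = lambda n: n(s_zero, lambda p: p)

def s_add(m, n):
    # Addition has to walk through m one successor at a time: linear in m.
    return m(n, lambda p: s_succ(s_add(p, n)))

def s_to_int(n):
    return n(0, lambda p: 1 + s_to_int(p))

c_three = c_succ(c_succ(c_succ(c_zero)))
s_three = s_succ(s_succ(s_succ(s_zero)))
```

Here `c_add` only builds a closure, whereas `s_add` needs explicit recursion; conversely `s_pred` unwraps one constructor, whereas `c_pred` rebuilds the number from zero.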

Type definitions

Church-encoded data and operations on them are typable in System F, as are Scott-encoded data and operations. However, the typing of the Scott encoding is significantly more complicated.^[5]

The type of the Scott encoding of the natural numbers is the positive recursive type:

\mu X. \forall R. R \to (X \to R) \to R

Full recursive types are not part of System F, but positive recursive types are expressible in System F via the encoding:

\mu X. G[X] = \forall X. ((G[X] \to X) \to X)

Combining these two facts yields the System F type of the Scott encoding:

\forall X. (((\forall R. R \to (X \to R) \to R) \to X) \to X)

This can be contrasted with the type of the Church encoding:

\forall X. X \to (X \to X) \to X

The Church encoding is a second-order type, but the Scott encoding is fourth-order!

References

Stump, A. (2009). Directly reflective meta-programming. Higher-Order and Symbolic Computation, 22, 115-144.
Mogensen, T.Æ. (1992). Efficient Self-Interpretations in lambda Calculus. J. Funct. Program., 2, 345-363.

Notes and References

1. Curry, Haskell (1972). Combinatorial Logic, Volume II. North-Holland Publishing Company. ISBN 0-7204-2208-6.
2. Parigot, Michel (1988). "Programming with proofs: A second order type theory". In H. Ganzinger (ed.), European Symposium on Programming: ESOP '88. 2nd European Symposium on Programming, Nancy, France, March 21–24, 1988. Lecture Notes in Computer Science, vol. 300. Springer. pp. 145–159. doi:10.1007/3-540-19027-9_10. ISBN 978-3-540-19027-1.
3. Mogensen, Torben (1992). "Efficient Self-Interpretation in Lambda Calculus". Journal of Functional Programming, 2 (3), 345–364. doi:10.1017/S0956796800000423. S2CID 8736707.
4. Parigot, Michel (1990). "On the representation of data in lambda-calculus". In Egon Börger, Hans Kleine Büning, Michael M. Richter (eds.), International Workshop on Computer Science Logic: CSL '89. 3rd Workshop on Computer Science Logic, Kaiserslautern, FRG, October 2–6, 1989. Lecture Notes in Computer Science, vol. 440. Springer. pp. 209–321. doi:10.1007/3-540-52753-2_47. ISBN 978-3-540-52753-4.
5. See the note "Types for the Scott numerals" by Martín Abadi, Luca Cardelli and Gordon Plotkin (February 18, 1993).
