
# More monotonicity

Thanks to a tip from Bob Atkey, I’ve recently been looking at coeffects as a means to prove program functions monotone. This post contains an introduction to coeffects, along with an explanation of how they can be used to reason about monotonicity. My goal is to cover the key ideas of coeffects without getting into a full treatment of their semantics. The particular coeffect system that I will examine is taken from Coeffects: A calculus of context-dependent computation by Tomas Petricek, Dominic Orchard, and Alan Mycroft.

The content of this post depends on some basic category theory. In particular, it involves functors, natural transformations, and cartesian closed categories. We write objects using capital letters $A,B,C,$ etc. We write arrows as lowercase letters $f,g,$ etc. Categories are written using bold capital letters $\mbf{C}, \mbf{Q},$ etc. Natural transformations are written as lowercase Greek letters: $\delta, \epsilon, \eta$, etc. The component of a natural transformation $\eta$ at an object $A$ is written $(\eta)_A$, where the natural transformation has been parenthesized and subscripted by the object. We write functors using capital letters $D,F,G,$ etc. We write the composition of two functors $F$ and $G$ as $(G \circ F)$ and the application of a functor $F$ to an object $A$ as $F(A)$.

# Coeffects

In a structural type-and-coeffect system, a function $f$ has a type of the form $\tau \overset{q}{\to} \sigma$, where $\tau$ and $\sigma$ are $f$’s domain and codomain types, and $q$ is a coeffect scalar. The coeffect scalar denotes some “contextual capability” which augments the domain $\tau$. We will formalize the notion of a contextual capability in the next section, and in the meantime it will stay intuitive.

Petricek et al. provide an example targeted at dataflow languages such as Lustre. In this example, time is modeled as a sequence of discrete steps, and variables denote value streams which contain one element per time step. A variable occurrence denotes its stream’s value at the current time step, and values at previous timesteps can be accessed by wrapping occurrences of the variable in a syntactic form called pre. As depicted above, a coeffect scalar in this system is a natural number denoting a lower bound on the number of cached values at consecutive previous timesteps for a particular stream. Note that certain contextual capabilities are stronger than others. For example, in the dataflow system, the coeffect scalar $1$ allows us to read a stream’s value at 1 timestep prior, while the coeffect scalar $2$ is stronger than $1$ because it allows us to read a stream’s value both 1 and 2 timesteps prior.
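To make the dataflow reading concrete, here is a toy Python rendering (my own, purely illustrative): a variable denotes a stream, a plain occurrence reads the current timestep, and each application of `pre` reads one timestep earlier, so nested `pre`s require correspondingly more cached values.

```python
# A toy sketch of the dataflow reading of variables: the stream's history
# so far is a list, with the most recent value last. Names are my own.

history = [7, 8, 9]        # x's stream so far; history[-1] is "now"

x = history[-1]            # a plain occurrence: value at the current step
pre_x = history[-2]        # pre x: one step back (needs 1 cached value)
pre_pre_x = history[-3]    # pre (pre x): needs 2 cached values

print(x, pre_x, pre_pre_x)  # → 9 8 7
```

Reading a stream $k$ timesteps back is exactly what the coeffect scalar $k$ licenses, which is why $2$ is a stronger capability than $1$: a history long enough for `pre (pre x)` is also long enough for `pre x`.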

It may be helpful to open a new browser window fixed to Figure 2, so that its rules may be quickly recalled when required by later sections.

We can assume that a function constant of type $\tau \overset{s}{\to} \sigma$ “utilizes” at most the contextual capabilities indicated by its scalar $s$. But an upper bound on the contextual requirements of a lambda abstraction must be proven, using the type-and-coeffect system of Figure 2. A type-and-coeffect judgment has the form $\Gamma @ R \vdash e : \tau$. Here $\Gamma$, $e$, and $\tau$ are the familiar typing context, expression, and type. $R$ is a vector of coeffect scalars, associating one scalar to each entry in $\Gamma$; we refer to such a vector simply as a coeffect. We write the concatenation of coeffects $R$ and $S$ as $R \times S$.

The ABS rule says that under context $\Gamma$ and coeffect $\langle \ldots \rangle$, an abstraction $\lambda x. e$ has type $\sigma \overset{s}{\to} \tau$ whenever its body $e$ has type $\tau$ under the extended context $\Gamma, x:\sigma$ and extended coeffect $\langle \ldots, s \rangle$. The key to ensuring that our abstraction does not “overuse” the capability denoted by $s$ is to restrict, within the body $e$, both the positions in which the variable $x$ may occur and the positions in which it may be discarded; the weaker the capability a scalar denotes, the more restrictive it is syntactically. The particular set of restrictions a scalar makes depends on a parameter to the structural type-and-coeffect system called the coeffect scalar structure.

A coeffect scalar structure $\mathbf Q = (Q, \otimes, \oplus, use, ign, \leq)$ consists of

• An underlying set $Q$ of scalars
• Binary composition and contraction operators $\otimes$ and $\oplus$ on $Q$
• A coeffect scalar $use$, the most restrictive scalar that permits a variable to appear in a hole context, i.e. the most restrictive scalar $s$ such that $x:\tau @ \langle s \rangle \vdash x : \tau$ is derivable
• A coeffect scalar $ign$, the most restrictive scalar which constrains a variable in a way that permits it to be discarded, i.e. whenever $\Gamma @ \langle \ldots \rangle \vdash e : \sigma$ is derivable so is $\Gamma, x: \tau @ \langle \ldots, ign \rangle \vdash e : \sigma$

• A preorder $\leq$ on $Q$, where $q_1 \leq q_2$ means that $q_1$ is more restrictive than $q_2$

We require $(Q, \otimes, use)$ and $(Q, \oplus, ign)$ to be monoids. Further, for all $p,q,r \in Q$ we require these distributivity equalities:

$$p \otimes (q \oplus r) = (p \otimes q) \oplus (p \otimes r) \qquad (q \oplus r) \otimes p = (q \otimes p) \oplus (r \otimes p)$$

Finally, both $\oplus$ and $\otimes$ should be monotone separately in each argument, that is, $q \leq r$ implies

$$p \otimes q \leq p \otimes r \qquad q \otimes p \leq r \otimes p \qquad p \oplus q \leq p \oplus r \qquad q \oplus p \leq r \oplus p$$

The coeffect scalar structure for the dataflow coeffect system is $\mathbf{Q_{df}} = (\mathbb N, +, max, 0, 0, \leq)$, where $\mathbb N$ is the set of natural numbers, $+$ is the addition operator on natural numbers, $max$ is the operator which produces the greater of two natural number arguments, and $\leq$ is the standard comparison operator for natural numbers. It’s instructive to consider the system of Figure 2 instantiated with this structure, in which the VAR axiom $x : \tau @ \langle 0 \rangle \vdash x : \tau$ says that a variable occurrence does not require the capability to read that variable’s stream at prior time steps. Assuming $B$ is the sole base type, the $pre$ construct, which allows us to access a stream at the previous time step, is then implemented as a built-in function constant of type $B \overset{1}{\to} B$.
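We can mechanically check that $\mathbf{Q_{df}}$ really satisfies the scalar-structure laws. The following sketch (my own, not from the post) samples the monoid, distributivity, and monotonicity requirements over a finite range of naturals:

```python
# Checking the coeffect scalar structure laws for Q_df = (N, +, max, 0, 0, <=)
# on a finite sample of naturals.

import itertools

otimes = lambda p, q: p + q      # composition
oplus = lambda p, q: max(p, q)   # contraction
use, ign = 0, 0

for p, q, r in itertools.product(range(6), repeat=3):
    # (N, +, 0) and (N, max, 0) are monoids
    assert otimes(p, use) == p == otimes(use, p)
    assert oplus(p, ign) == p == oplus(ign, p)
    assert otimes(otimes(p, q), r) == otimes(p, otimes(q, r))
    assert oplus(oplus(p, q), r) == oplus(p, oplus(q, r))
    # composition distributes over contraction on both sides
    assert otimes(p, oplus(q, r)) == oplus(otimes(p, q), otimes(p, r))
    assert otimes(oplus(q, r), p) == oplus(otimes(q, p), otimes(r, p))
    # both operators are monotone in each argument
    if q <= r:
        assert otimes(p, q) <= otimes(p, r) and otimes(q, p) <= otimes(r, p)
        assert oplus(p, q) <= oplus(p, r) and oplus(q, p) <= oplus(r, p)

print("all Q_df laws hold on the sample")
```

The interesting case is distributivity: $p + \max(q, r) = \max(p + q, p + r)$, i.e. composing a capability onto the join of two requirements is the same as composing it onto each and then joining.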

To get a feel for the APP rule, consider this derivation.

The conclusion concatenates disjoint typing contexts and coeffects from its function and argument premises. While the scalar in the left premise ($0$) matches its portion of the conclusion (the $0$ in $\langle 0, 1 \rangle$), the scalar in the right premise ($0$) differs from its portion of the conclusion (the $1$ in $\langle 0, 1 \rangle$). The right premise of Figure 3 places the variable $x$ in an empty context, where it is required to provide its value at the current timestep and $0$ prior timesteps (thus the scalar $0$). In the left premise, $pre$ is a function requiring access to its argument at 1 prior timestep (thus the scalar $1$). Hence, the conclusion $pre~x$ requires access to $x$ at $0 + 1 = 1$ prior timesteps. The essential power of type-and-coeffect systems is to reason about such composition of contextual capabilities, and this is handled with the $\otimes$ operator, which in the dataflow coeffect scalar structure $\mathbf{Q_{df}}$ is the $+$ operator on natural numbers. In the APP rule, $\Gamma_2$ and $S$ may in general contain multiple entries. When a vector coeffect rather than a scalar is used for the second argument of $\otimes$, we apply $\otimes$ componentwise: for all coeffects $\langle s_1, s_2, \ldots, s_n \rangle$ and all scalars $t$ we have $t \otimes \langle s_1, s_2, \ldots, s_n \rangle = \langle t \otimes s_1, t \otimes s_2, \ldots, t \otimes s_n \rangle$.
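The componentwise action of $\otimes$ can be sketched in a few lines of Python (a toy encoding of my own, with coeffects as lists and the dataflow $\otimes$, i.e. $+$, as the default):

```python
# Applying a scalar to a coeffect vector componentwise, as in the APP rule.
# The default otimes is the dataflow one: addition of naturals.

def scalar_times_vector(t, S, otimes=lambda a, b: a + b):
    """t ⊗ <s1, ..., sn> = <t ⊗ s1, ..., t ⊗ sn>."""
    return [otimes(t, s) for s in S]

# In the derivation of `pre x`, pre's scalar 1 acts on the argument's
# coeffect <0>, yielding <1>: access to x one timestep back.
print(scalar_times_vector(1, [0]))        # [1]
print(scalar_times_vector(2, [0, 1, 3]))  # [2, 3, 5]
```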

Since the APP rule concatenates disjoint typing contexts and coeffects, how do we type terms in which a single variable has multiple occurrences? We accomplish this by contracting two variables in context and combining their associated contextual capabilities. This requires an application of the CTXT typing rule, which allows us to prove a typing judgment with context-and-coeffect $\Gamma'@R'$ using a typing derivation with context-and-coeffect $\Gamma @ R$, obtained from a structural manipulation of $\Gamma'@R'$. The right premise allows us to apply a structural manipulation rule of the form $\Gamma'@ R' \rightsquigarrow \Gamma@R,\theta$; in particular, to contract two variables (and their associated capabilities) we must use the CONTR rule.

# Formalizing contextual capabilities

So far we have been keeping the notion of a contextual capability intuitive. We will now briefly summarize the formalization of contextual capabilities. Recall that a standard typed lambda calculus can be modeled as a cartesian closed category $\mathbf C$, where for every type $\tau$, the denotation $\sem{\tau}$ of $\tau$ is an object of $\mathbf C$. In particular, the interpretation of a function type $\sem{\sigma \to \tau}$ is the exponential object $\sem{\sigma} \Rightarrow \sem{\tau}$. Furthermore, the interpretation of a typing judgment is an arrow $\sem{x_1:\tau,\ldots,x_n:\tau_n \vdash e : \sigma} : \sem{\tau_1} \times \ldots \times \sem{\tau_n} \to \sem{\sigma}$.

To model type-and-coeffect judgments categorically, we interpret the scalar coeffect structure $\catq = (Q,\otimes,\oplus,use,ign,\leq)$ as a category using the standard interpretation of preorders as categories. However, we use the transpose of $\leq$ for the preorder. That is, the objects of $\catq$ are the elements of $Q$, and for each $q_1, q_2 \in Q$ with $q_2 \leq q_1$, $\catq$ has one arrow with domain $q_1$ and codomain $q_2$. We consider $\catq$ a strict monoidal category with monoidal product $\otimes$ and unit $use$.

We interpret a structural type-and-coeffect system (instantiated with a scalar coeffect structure $\catq$) with respect to a cartesian closed category $\mathbf C$ and an indexed comonad functor

$$D : \catq \to [\mathbf C, \mathbf C]$$

Writing $D_{q}$ for the application of $D$ to the scalar $q$, $D$ is associated with a counit natural transformation

$$\epsilon_{use} : D_{use} \to 1_{\mathbf C}$$

and a family of comultiplication natural transformations

$$\delta_{q,r} : D_{q \otimes r} \to D_{q} \circ D_{r}$$

making the following laws hold for all objects $C$ of $\mathbf C$ and all scalars $q, r, s \in Q$:

$$(\epsilon_{use})_{D_{q}(C)} \circ (\delta_{use,q})_C = 1_{D_{q}(C)} = D_{q}((\epsilon_{use})_C) \circ (\delta_{q,use})_C$$

$$(\delta_{q,r})_{D_{s}(C)} \circ (\delta_{q \otimes r,s})_C = D_{q}((\delta_{r,s})_C) \circ (\delta_{q,r \otimes s})_C$$

In true categorical fashion, a contextual capability is defined not by its structure, but by how it behaves. In particular, the contextual capability denoted by $q$ is the endofunctor $D_{q} : \mathbf C \to \mathbf C$. $D_{q}$ operates on an object $C$ of $\mathbf C$ by “placing it into a context with capability $q$”. It operates on an arrow $f : A \to B$ of $\mathbf C$, which for our purposes represents a transformation, by converting it into a transformation $D_{q} f : D_q A \to D_q B$ that preserves the contextual capability $q$.

$\epsilon_{use}$ is the transformation of pulling an object out of a context, while for scalars $q,r \in Q$, $\delta_{q,r}$ is the transformation of decomposing a single context into a context-in-a-context. Whenever $p, q \in Q$ are such that $p \leq q$, the functoriality of $D$ gives a natural transformation $sub_{p,q} : D_{q} \to D_{p}$, which helps justify the context manipulation rule called SUB.

The left-hand side of the first law therefore states that a transformation which

1. decomposes a context of capability $q$ into an empty context containing a context of capability $q$,
2. then takes the inner context of capability $q$ out of the empty context,

is equivalent to the identity transformation, which does nothing. The second law shows that context decomposition is, in a sense, associative.

With the indexed comonad $D$, a function type $\sigma \overset{s}{\to} \tau$ is interpreted as the exponential $D_{s} \sem{\sigma} \Rightarrow \sem{\tau}$. A single-variable type-and-coeffect judgment $x:\sigma @ \langle s \rangle \vdash e : \tau$ is interpreted as an arrow of type $D_{s}\sem{\sigma} \to \sem{\tau}$. Of course, we also need a way to interpret type-and-coeffect judgments with multiple variables in context, as well as interpretations of structural rules such as contraction and weakening. For this we need structural indexed comonads, a topic outside the scope of this post. For a full account of categorical semantics for coeffects, see Petricek.

At this point, you might be wondering about the goal of all this indexed comonad business. Furthermore, if the coeffect $R$ in a type-and-coeffect judgment $\Gamma @ R \vdash e : \tau$ just denotes some applications of endofunctors, why bother making those applications explicit in the judgment system? In other words, why have a judgment of the form $x : Nat @ \langle - \rangle \vdash negate~x : Nat$ rather than $x : D_{-}(Nat) \vdash negate~x : Nat$? The answer is that the domain has been decomposed into a portion $\Gamma$ that is handled manually by the programmer and a portion $R$ that is handled automatically by the indexed comonad. To apply a function of type $\sigma \overset{t}{\to} \tau$, the programmer must place an expression of type $\sigma$ in the argument position of the application. Importantly, the programmer does not supply an argument of type $D_t( \sigma)$; the contextual capability $t$ is automatically piped into the function application.

To convey the intuition behind this automatic piping of contextual capabilities, we present the notion of an indexed coKleisli category.

If $D : \mbf{Q} \to [\mbf{C},\mbf{C}]$ is an indexed comonad, the indexed coKleisli category determined by $D$, written $\mbf{D^\ast}$, is defined such that

• The objects of $\mbf{D^\ast}$ are the same as the objects of $\mbf{C}$.

• For all $t \in \mbf{Q}$, and objects $A$ and $B$ of $\mbf{C}$, an arrow $f' : D_{t}(A) \to B$ is regarded as an arrow $f : A \to B$ of $\mbf{D^\ast}$.

• Given arrows $f : A \to B$ and $g : B \to C$ of $\mbf{D^\ast}$ with underlying arrows $f' : D_t(A) \to B$ and $g' : D_s(B) \to C$ in $\mbf{C}$, we define the composition $g \circ f : A \to C$ as the arrow generated from the arrow $(g \circ f)' : D_{s \otimes t}(A) \to C$ defined as $(g \circ f)' = g' \circ D_{s}(f') \circ (\delta_{s,t})_A$.

• For each object $A$ of $\mbf{D^\ast}$, the identity arrow $1_A$ in $\mbf{D^\ast}$ has the underlying arrow $1'_A = (\epsilon_{use})_A$ in $\mbf{C}$.

In an indexed coKleisli category, we can compose arrows without regard to the scalars attached to their underlying arrows. In this way, programming in a type-and-coeffect system is kind of like composing arrows in a coKleisli category.
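To make this concrete, here is a small Python sketch (an encoding of my own, not from the post) of indexed coKleisli composition for the dataflow comonad, representing $D_n(A)$ as tuples of $n+1$ values: the current value followed by $n$ cached previous timesteps.

```python
# Indexed coKleisli composition for the dataflow comonad, with
# D_n(A) encoded as tuples of length n+1 (current value first).

def delta(s, t, xs):
    """(δ_{s,t})_A : D_{s⊗t}(A) → D_s(D_t(A)); here s ⊗ t = s + t."""
    assert len(xs) == s + t + 1
    return tuple(tuple(xs[i:i + t + 1]) for i in range(s + 1))

def d_map(s, f, xs):
    """D_s(f): apply f at the current and the s cached previous timesteps."""
    assert len(xs) == s + 1
    return tuple(f(x) for x in xs)

def cokleisli_compose(g, s, f, t):
    """Compose g' : D_s(B) → C after f' : D_t(A) → B into D_{s+t}(A) → C."""
    return lambda xs: g(d_map(s, f, delta(s, t, xs)))

# pre : B --1--> B reads the value one timestep back.
pre = lambda xs: xs[1]          # underlying arrow D_1(B) → B

# pre ∘ pre needs 1 + 1 = 2 cached timesteps.
pre2 = cokleisli_compose(pre, 1, pre, 1)
print(pre2((3, 2, 1)))          # → 1, the value two timesteps back
```

Note how the caller of `pre2` never mentions the scalars: the composite demands a history of length $1 + 1 + 1$, and $\delta$ does the “piping” of the inner capability automatically, just as the APP rule does in the type system.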

To provide some further intuition, in an algorithmic type-and-coeffect system, $R$ in the judgment $\Gamma @ R \vdash e : \tau$ is an output position, much like $\tau$; in contrast, $\Gamma$ is an input position. The contextual capabilities required to execute expression $e$ must be synthesized from the structure of $e$ rather than inherited from $e$’s context.

We can interpret dataflow coeffects by letting $\mathbf C = \mathbf{Sets}$, the category of sets and functions, and using the indexed comonad $D : \mathbf{Q_{df}} \to [\mathbf{Sets}, \mathbf{Sets}]$ where for all $n, m \in \mathbb{N}$, sets $A$ and $B$, and functions $f : A \to B$, we have

$$D_{n}(A) = A \times A^{n}$$

and

$$D_{n}(f)(a_0, a_1, \ldots, a_n) = (f(a_0), f(a_1), \ldots, f(a_n))$$

We place the type $A$ into a context of capability $n$ by pairing it up with its n-ary self-product, where the nth component of the product represents the value of the stream at the nth previous consecutive timestep. To transform a function $f$ into an equivalent function which preserves the contextual capability $n$, we must apply the function not only at the current timestep, but also at the $n$ cached consecutive previous timesteps.

We pull a value of type $A$ out of a context with 0 cached previous consecutive timesteps simply by getting its value at the current timestep:

$$(\epsilon_{0})_A(a_0) = a_0$$

Having cached $n+m$ previous consecutive timesteps is equivalent to having cached the length-$m$ suffixes at the previous $n$ consecutive timesteps:

$$(\delta_{n,m})_A(a_0, \ldots, a_{n+m}) = ((a_0, \ldots, a_m), (a_1, \ldots, a_{1+m}), \ldots, (a_n, \ldots, a_{n+m}))$$

Finally, we can discard cached values. For $n \leq m$ we have

$$(sub_{n,m})_A(a_0, \ldots, a_m) = (a_0, \ldots, a_n)$$
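We can sanity-check the comonad laws for this dataflow interpretation on a sample value. The encoding below (tuples of length $n+1$ for $D_n(A)$) is my own:

```python
# Checking the counit and associativity laws of the dataflow indexed
# comonad on a sample, with D_n(A) encoded as tuples of length n+1.

def delta(s, t, xs):
    """(δ_{s,t})_A : D_{s+t}(A) → D_s(D_t(A)): the length-(t+1) windows of xs."""
    return tuple(tuple(xs[i:i + t + 1]) for i in range(s + 1))

def eps(xs):
    """(ε_0)_A : D_0(A) → A: read the value at the current timestep."""
    return xs[0]

xs = (5, 4, 3, 2, 1)   # an element of D_4(Nat): current value plus 4 cached
q, r, s = 2, 1, 1      # note q + r + s = 4

# counit laws: trivially decomposing and then extracting is the identity
assert eps(delta(0, 4, xs)) == xs
assert tuple(eps(w) for w in delta(4, 0, xs)) == xs

# associativity law: the two ways of decomposing twice agree
left = tuple(delta(r, s, w) for w in delta(q, r + s, xs))
right = delta(q, r, delta(q + r, s, xs))
assert left == right
print("comonad laws hold on the sample")
```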

# Monotonicity as coeffect

Instantiating the coeffect calculus of Figure 2 with the following coeffect scalar structure results in a system which allows us to prove functions monotone.

• The scalar set is $Q = \{ ?, +, -, \sim \}$, where $?$ denotes the capability to be transformed in an equivalence-class-preserving manner, $+$ to be transformed monotonically, $-$ to be transformed antitonically, and $\sim$ to be transformed in an order-robust manner. (An order-robust function maps any two input values related by the symmetric transitive closure of its domain’s order to two isomorphic elements of its codomain.)
• The composition and contraction operators $\otimes$ and $\oplus$ are defined in Figure 6.
• The $use$ scalar is $+$.
• The $ign$ scalar is $\sim$.
• The preorder is the partial order $\leq$ depicted in Figure 7.

For the underlying cartesian closed category $\mathbf C$ of our semantics, we choose $\mathbf{Preorder}$, the category of preordered sets and monotone functions. For preordered sets $(A, \leq_A)$ and $(B, \leq_B)$, and monotone function $f : (A, \leq_A) \to (B, \leq_B)$, our indexed comonad $D$ is defined as

$$D_{+}(A, \leq_A) = (A, \leq_A) \qquad D_{-}(A, \leq_A) = (A, \geq_A) \qquad D_{?}(A, \leq_A) = (A, =_A) \qquad D_{\sim}(A, \leq_A) = (A, \ast_A) \qquad D_{q}(f) = f$$

where

• $\geq_A$ is the dual of $(A, \leq_A)$, that is, the preordered set defined such that for all $a_1,a_2 \in A$ we have $a_1 \geq_A a_2$ if and only if $a_2 \leq_A a_1$.
• $=_A$ is the order obtained by removing from $\leq_A$ all edges $(a,a')$ which do not have a symmetric counterpart $(a',a)$ in $\leq_A$.
• $\ast_A$ is the symmetric transitive closure of $\leq_A$.
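These three order transformations are easy to compute directly. Here is a sketch (the encoding and example are my own) over preorders represented as sets of pairs $(a, b)$ meaning $a \leq b$:

```python
# The three order transformations used by the monotonicity comonad,
# on preorders encoded as sets of pairs (a, b) meaning a ≤ b.

def dual(rel):
    """≥_A : reverse every edge."""
    return {(b, a) for (a, b) in rel}

def core(rel):
    """=_A : keep only edges with a symmetric counterpart."""
    return {(a, b) for (a, b) in rel if (b, a) in rel}

def sym_trans_closure(rel):
    """∗_A : symmetric transitive closure of ≤_A."""
    r = rel | dual(rel)
    changed = True
    while changed:
        extra = {(a, d) for (a, b) in r for (c, d) in r if b == c} - r
        changed = bool(extra)
        r |= extra
    return r

# The two-point chain false ≤ true (plus reflexivity), i.e. Bool:
bool_order = {("f", "f"), ("t", "t"), ("f", "t")}
print(dual(bool_order))                       # the flipped order
print(core(bool_order))                       # only reflexive edges survive
print(sorted(sym_trans_closure(bool_order)))  # everything becomes related
```

On this two-point chain, $=_A$ keeps only the reflexive edges (no two distinct booleans are equivalent), while $\ast_A$ relates everything to everything, which is why $D_{\sim}$ forces order-robust behavior.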

For all preorders $S$ and all $p,q \in Q$ we have

$$(\epsilon_{+})_S(a) = a \qquad (\delta_{p,q})_S(a) = a$$

Furthermore, if $p \leq q$ then we have

$$(sub_{p,q})_S(a) = a$$

These definitions are a bit anticlimactic, but they make sense when one considers that $D_{+}(S) = S$ and $D_{p \otimes q}(S) = (D_{p} \circ D_{q})(S)$.

The natural transformations of monotonicity coeffects have a much different flavor than those of the dataflow coeffects, in that they have no significant “runtime content”. While it’s tempting to think of a contextual capability as some extra information that is paired with each input value supplied to a function, monotonicity coeffects show that this really isn’t the case.

# Example derivation

I will demonstrate this system using an example from page 2 of Consistency Analysis in Bloom: a CALM and Collected Approach. Let $Nat$ be the type of natural numbers, let $Set[Nat]$ be the type of finite sets of natural numbers, ordered by inclusion, and let $Bool$ be the two-element preordered set containing the boolean values true and false, ordered such that $false \leq true$. The example requires us to prove the expression $(Min~X) < 10$ monotone, where

• $X$ is a variable of type $Set[Nat]$
• $Min$ is a function of type $Set[Nat] \ato Nat$, which produces the minimum element contained in the argument
• $<$ is the standard “less than” operator on naturals, of type $Nat \ato (Nat \pto Bool)$

Renaming “$<$” to “$Lt$”, the expression $(Min~X) < 10$ rewritten to prefix notation is $(Lt~(Min~X))~10$. The following abbreviations allow our proofs to fit on the page.

• $ty(Min)$ abbreviates $Set[Nat] \ato Nat$
• $ty(Lt)$ abbreviates $Nat \ato (Nat \pto Bool)$

Using Petricek’s coeffect system, a proof of the judgment $Lt : ty(Lt), Min : ty(Min), X : Set[Nat] @ \langle ?, ?, + \rangle \vdash (Lt~(Min~X))~10 : Bool$ follows.

where $\Pi_1$ is

and $\Pi_2$ is

and $\Pi_3$ is

and $\Pi_4$ is

# Conclusion

Coeffects are a simple yet extremely general framework enabling type systems which can prove, among many other things, that a program function is monotone. I think that recently proposed programming frameworks such as Bloom, LVars, and Lasp can benefit from this power. In addition, I believe all of these frameworks would also benefit from the ability to prove that a program function is a semilattice homomorphism. For many semilattice datatypes, a monotonicity-aware type system such as the one presented here can be used for this purpose, as a monotone function from a certain (common) kind of semilattice’s join-irreducible elements can be converted into a homomorphism from the semilattice itself.

While this post focused primarily on semantics, there are pragmatic issues which need addressing.

First, the coeffect calculus presented here is non-algorithmic. However, it’s not hard to imagine an algorithmic variant, developed under the same strategy as algorithmic linear type systems: a type-and-coeffect judgment would include an additional vector of output coeffects, denoting that portion of the input capability which is not actually utilized by the term. It may be necessary to place some additional restrictions on the scalar coeffect structure, but I expect that the scalar coeffect structure for monotonicity coeffects would satisfy these restrictions.

Second, even with an algorithmic system, it isn’t currently clear to me that this approach is going to be mentally tractable for programmers, as there’s a fair amount of coeffect shuffling that happens to the left of the turnstile. In my opinion, a type system is most useful as a design tool, a mental framework that a programmer uses to compose a complex program from simple parts. I’m ultimately looking for such a mental framework, rather than a mere verification tool, for proving program functions monotone.

It’s exciting to imagine a statically typed variant of BloomL. Perhaps coeffects can serve as the basis for static BloomL’s type system.