diff --git a/paper/gtt.tex b/paper/gtt.tex index 92c0b72343f28e2a55ba3cc8a38fa66e61dbcdc2..30c5da4d09deca12ae35a854c31bd20d8b03aeec 100644 --- a/paper/gtt.tex +++ b/paper/gtt.tex @@ -66,7 +66,7 @@ %% Citation style %% Note: author/year citations are required for papers published as an %% issue of PACMPL. -% \citestyle{acmauthoryear} %% For author/year citations +\citestyle{acmauthoryear} %% For author/year citations \setcitestyle{nosort} \title{Gradual Type Theory} @@ -410,9 +410,9 @@ However, the dichotomy between gradual and optional typing is not as firm as one might like. % There have been many different proposed semantics of run-time type -checking: ``transient'' cast semantics~\cite{vitousekswordssiek:2017} +checking: ``transient'' cast semantics~\citep{vitousekswordssiek:2017} only checks the head connective of a type (number, function, list, -\ldots), ``eager'' cast semantics~\cite{herman2010spaceefficient} checks +\ldots), ``eager'' cast semantics~\citep{herman2010spaceefficient} checks run-time type information on closures, whereas ``lazy'' cast semantics~\citep{findler-felleisen02} will always delay a type-check on a function until it is called (and there are other possibilities, see @@ -458,8 +458,7 @@ the other does not) results. The logic axiomatizes the equational properties gradual programs should satisfy, and offers a high-level syntax for proving theorems about many languages at once: -if a particular operational -semantics models gradual type theory, then it satisfies all +if a language models gradual type theory, then it satisfies all provable equivalences/approximations. % Due to its type-theoretic design, different axioms of program @@ -485,8 +484,7 @@ common. % The second property we take for granted is that the language satisfies the \emph{dynamic gradual guarantee}~\cite{refined} (``graduality'')---a -now standard correctness theorem of gradual typing that most newly -proposed gradual languages seek to satisfy--- which constraints how +strong correctness theorem of gradual typing--- which constraints how changing type annotations changes behavior. Graduality says that if we change the types in a program to be ``more precise''---e.g., by changing from the dynamic type to a more precise type such as integers or @@ -736,8 +734,8 @@ types and downcasts with lazy/computation types, and that the modalities relating values and computations induce the downcasts for eager/value types and upcasts for lazy/computation types. Moreover, this analysis articulates an important behavioral property of casts that was proved -operationally for call-by-value in \cite{newahmed18} but missed for -call-by-name in \cite{newlicata2018-fscd}: upcasts for eager types and +operationally for call-by-value in \citet{newahmed18} but missed for +call-by-name in \citet{newlicata2018-fscd}: upcasts for eager types and downcasts for lazy types are both ``pure'' in a suitable sense, which enables more refactorings and program optimizations. In particular, we show that these casts can be taken to be (and are essentially forced to @@ -788,7 +786,7 @@ Then for the extended CBPV and define a step-indexed biorthogonal logical relation that interprets the ordering relation on terms as contextual error approximation, which underlies the definition of graduality as -presented in \cite{newahmed18}. +presented in \citet{newahmed18}. % Combining these theorems gives an implementation of the term language of GTT in which $\beta, \eta$ are observational equivalences @@ -812,7 +810,6 @@ Our modular proof architecture allows us to easily prove correctness of $\beta, \eta$ and graduality for all of these interpretations. \textbf{Contributions} - The main contributions of the paper are as follows. \begin{enumerate} \item We present Gradual Type Theory in section \ref{sec:gtt}, a simple @@ -854,7 +851,7 @@ The main contributions of the paper are as follows. \begin{shortonly} \textbf{Extended version:} An extended version of the paper, which includes the omitted cases of definitions, lemmas, and proofs is -available as \citet{newlicataahmed19:extended}. +available in \citet{newlicataahmed19:extended}. \end{shortonly} % While gradual typing has been researched quite extensively by proving @@ -1452,7 +1449,7 @@ types will arise from type dynamism and casts. The \emph{type dynamism} relation of gradual type theory is written $A \ltdyn A'$ and read as ``$A$ is less dynamic than $A'$''; intuitively, this means that $A'$ supports more behaviors than $A$. -\cite{newahmed18,newlicata2018-fscd} analyze this as the existence of an \emph{upcast} +\citet{newahmed18,newlicata2018-fscd} analyze this as the existence of an \emph{upcast} from $A$ to $A'$ and a downcast from $A'$ to $A$ which form an embedding-projection pair (\emph{ep pair}) for term error approximation (an ordering where runtime errors are minimal): the upcast followed by the @@ -1490,35 +1487,35 @@ domain~\cite{newahmed18,newlicata2018-fscd}. \begin{mathpar} \framebox{$A \ltdyn A'$ and $\u B \ltdyn \u B'$} - \inferrule*[right=VTyRefl]{ }{A \ltdyn A} + \inferrule*[lab=VTyRefl]{ }{A \ltdyn A} - \inferrule*[right=VTyTrans]{A \ltdyn A' \and A' \ltdyn A''} + \inferrule*[lab=VTyTrans]{A \ltdyn A' \and A' \ltdyn A''} {A \ltdyn A''} - \inferrule*[right=CTyRefl]{ }{\u B \ltdyn \u B'} + \inferrule*[lab=CTyRefl]{ }{\u B \ltdyn \u B'} - \inferrule*[right=CTyTrans]{\u B \ltdyn \u B' \and \u B' \ltdyn \u B''} + \inferrule*[lab=CTyTrans]{\u B \ltdyn \u B' \and \u B' \ltdyn \u B''} {\u B \ltdyn \u B''} - \inferrule*[right=VTyTop]{ }{A \ltdyn \dynv} + \inferrule*[lab=VTyTop]{ }{A \ltdyn \dynv} - \inferrule*[right=$U$Mon]{\u B \ltdyn \u B'} + \inferrule*[lab=$U$Mon]{\u B \ltdyn \u B'} {U B \ltdyn U B'} - \inferrule*[right=$+$Mon]{A_1 \ltdyn A_1' \and A_2 \ltdyn A_2' } + \inferrule*[lab=$+$Mon]{A_1 \ltdyn A_1' \and A_2 \ltdyn A_2' } {A_1 + A_2 \ltdyn A_1' + A_2'} - \inferrule*[right=$\times$Mon]{A_1 \ltdyn A_1' \and A_2 \ltdyn A_2' } + \inferrule*[lab=$\times$Mon]{A_1 \ltdyn A_1' \and A_2 \ltdyn A_2' } {A_1 \times A_2 \ltdyn A_1' \times A_2'} \\ - \inferrule*[right=CTyTop]{ }{\u B \ltdyn \dync} + \inferrule*[lab=CTyTop]{ }{\u B \ltdyn \dync} -\inferrule*[right=$F$Mon]{A \ltdyn A' }{ \u F A \ltdyn \u F A'} +\inferrule*[lab=$F$Mon]{A \ltdyn A' }{ \u F A \ltdyn \u F A'} -\inferrule*[right=$\with$Mon]{\u B_1 \ltdyn \u B_1' \and \u B_2 \ltdyn \u B_2'} +\inferrule*[lab=$\with$Mon]{\u B_1 \ltdyn \u B_1' \and \u B_2 \ltdyn \u B_2'} {\u B_1 \with \u B_2 \ltdyn \u B_1' \with \u B_2'} -\inferrule*[right=$\to$Mon]{A \ltdyn A' \and \u B \ltdyn \u B'} +\inferrule*[lab=$\to$Mon]{A \ltdyn A' \and \u B \ltdyn \u B'} {A \to \u B \ltdyn A' \to \u B'} \begin{longonly} \\ @@ -1551,7 +1548,7 @@ considered. However, we can arrive at a first proposal by considering how previous work would be embedded into CBPV. % In the previous work on both CBV and -CBN~\cite{newahmed18,newlicata2018-fscd} every type dynamism judgement +CBN~\citep{newahmed18,newlicata2018-fscd} every type dynamism judgement $A \ltdyn A'$ induces both an upcast from $A$ to $A'$ and a downcast from $A'$ to $A$. % @@ -1667,14 +1664,14 @@ our design choice is forced for all casts, as long as the casts between ground t \inferrule*[lab=TmDynRefl]{ }{\Gamma \ltdyn \Gamma \mid \Delta \ltdyn \Delta \vdash E \ltdyn E : T \ltdyn T} \qquad + \inferrule*[lab=TmDynVar] + { } + {\Phi,x \ltdyn x' : A \ltdyn A',\Phi' \vdash x \ltdyn x' : A \ltdyn A'} + \\\\ \inferrule*[lab=TmDynTrans]{\Gamma \ltdyn \Gamma' \mid \Delta \ltdyn \Delta' \vdash E \ltdyn E' : T \ltdyn T' \\\\ \Gamma' \ltdyn \Gamma'' \mid \Delta' \ltdyn \Delta'' \vdash E' \ltdyn E'' : T' \ltdyn T'' } {\Gamma \ltdyn \Gamma'' \mid \Delta \ltdyn \Delta'' \vdash E \ltdyn E'' : T \ltdyn T''} - \\\\ - \inferrule*[lab=TmDynVar] - { } - {\Phi,x \ltdyn x' : A \ltdyn A',\Phi' \vdash x \ltdyn x' : A \ltdyn A'} \qquad \inferrule*[lab=TmDynValSubst] {\Phi \vdash V \ltdyn V' : A \ltdyn A' \\\\ @@ -1859,7 +1856,7 @@ typically because they are identical), which is a syntactic judgement for contex Finally, we assert some term dynamism axioms that describe the behavior of programs. The cast universal properties at the top of -Figure~\ref{fig:gtt-term-dyn-axioms}, following~\cite{newlicata2018-fscd}, say that +Figure~\ref{fig:gtt-term-dyn-axioms}, following~\citet{newlicata2018-fscd}, say that the defining property of an upcast from $A$ to $A'$ is that it is the least dynamic term of type $A'$ that is more dynamic that $x$, a ``least upper bound''. That is, $\upcast{A}{A'}{x}$ is a term of type $A'$ that @@ -4595,7 +4592,7 @@ introduction forms for $\dynv$ are exactly the introduction forms for all of the value types (unit, pairing,$\texttt{inl}$, $\texttt{inr}$, $\texttt{force}$), while elimination forms are all of the elimination forms for computation types ($\pi$, $\pi'$, application and binding); such ``bityped'' languages -are related to \cite{girard01locussolum,zeilberger09thesis}. +are related to \citet{girard01locussolum,zeilberger09thesis}. \begin{shortonly} In the extended version, we give an extension of GTT axiomatizing this implementation of the dynamic types. @@ -4701,7 +4698,7 @@ lazy product type $\with$. Normally, these are not included in this way, but rather sums are encoded by making each case use a fresh constructor (using nominal techniques like opaque structs in Racket) and then making the sum the -union of the constructors, as argued in \cite{siekth16recursiveunion}. +union of the constructors, as argued in \citet{siekth16recursiveunion}. % We leave modeling this nominal structure to future work, but in minimalist languages, such as simple dialects of Scheme and Lisp, sum @@ -5385,7 +5382,7 @@ translate a heterogeneous term dynamism to a homogeneous inequality \end{theorem} \begin{longonly} - Relative to previous work on graduality (\cite{newahmed18}), + Relative to previous work on graduality \citep{newahmed18}, the distinction between complex value upcasts and complex stack downcasts guides the formulation of the theorem; e.g. using upcasts in the left-hand theorem would require more thunks/forces. @@ -6742,7 +6739,7 @@ strictness). \end{longonly} % \ This translation clarifies the behavioral meaning of complex values and -stacks, following \cite{munchmaccagnoni14nonassociative, +stacks, following \citet{munchmaccagnoni14nonassociative, fuhrmann1999direct}, and therefore of upcasts and downcasts. \begin{longonly} This is related to completeness of focusing: it moves inversion rules @@ -7069,7 +7066,7 @@ $\kw{split}$ are not evaluation contexts in the operational semantics). \end{longonly} -Levy~\cite{levy03cbpvbook} translates \cbpvstar\/ to \cbpv, but not does not prove +\citet{levy03cbpvbook} translates \cbpvstar\/ to \cbpv, but not does not prove the inequality preservation that we require here, so we give an alternative translation for which this property is easy to @@ -7113,7 +7110,7 @@ the same type. The \emph{de-complexification} procedure is defined as follows. % We note that this translation is not the one presented in -\cite{levy03cbpvbook}, but rather a more inefficient version that, in CPS +\citet{levy03cbpvbook}, but rather a more inefficient version that, in CPS terminology, introduces many administrative redices. % Since we are only proving results up to observational equivalence @@ -7840,7 +7837,7 @@ in figure \end{figure} \fi -This is morally the same as in \cite{levy03cbpvbook}, but we present +This is morally the same as in \citet{levy03cbpvbook}, but we present stacks in a manner similar to Hieb-Felleisen style evaluation contexts\iflong(rather than as an explicit stack machine with stack frames)\fi. % @@ -7988,7 +7985,7 @@ least preorders so we write $\apreorder$ rather than $\sim$. \noindent Three important relations arise as liftings: Equality of results lifts to observational equivalence ($\ctxize=$). The preorder generated by $\err \ltdyn R$ (i.e. the other three results are unrelated maximal elements) lifts to the notion of -\emph{error approximation} used in \cite{newahmed18} to prove the +\emph{error approximation} used in \citet{newahmed18} to prove the graduality property ($\ctxize\ltdyn$). The preorder generated by $\diverge \preceq R$ lifts to the standard notion of \emph{divergence approximation} ($\ctxize\preceq$). @@ -7999,7 +7996,7 @@ The most famous use of lifting is for observational equivalence, which is the lifting of equality of results ($\ctxize=$), and we will show that $\equidyn$ proofs in GTT imply observational equivalences. % -However, as shown in \cite{newahmed18}, the graduality property is +However, as shown in \citet{newahmed18}, the graduality property is defined in terms of an observational \emph{approximation} relation $\ltdyn$ that places $\err$ as the least element, and every other element as a maximal element. @@ -8158,7 +8155,7 @@ approximation, i.e., $M \ctxize= N$ if and only if $M \ctxize\preceq N$ and $N \ctxize\preceq M$, and since $\preceq$ has $\diverge$ as a least element, we can use a step-indexed relation to prove it. % -As shown in \cite{newahmed18}, a similar trick works for error +As shown in \citet{newahmed18}, a similar trick works for error approximation, but since $\ltdyn$ is \emph{not} an equivalence relation, we decompose it rather into two \emph{different} orderings: error approximation up to divergence on the left $\errordivergeleft$ and @@ -8252,8 +8249,8 @@ So using Lemmas~\ref{lem:ctx-commutes-conjunction}, \ref{lem:ctx-commutes-dual} \end{proof} As a corollary, the decomposition of contextual equivalence into diverge -approximation in \cite{ahmed06:lr} and the decomposition of dynamism in -\cite{newahmed18} are really the same trick: +approximation in \citet{ahmed06:lr} and the decomposition of dynamism in +\citet{newahmed18} are really the same trick: \begin{corollary}[Contextual Decomposition] ~~~ \label{cor:contextual-decomposition} \begin{enumerate} \item $\ctxize= \mathbin{\Leftrightarrow} \ctxize{\preceq} \wedge @@ -9357,7 +9354,7 @@ Corollary~\ref{cor:contextual-decomposition} that this coincides with contextual equivalence. \end{longproof} -\section{Related and Future Work} +\section{Discussion and Related Work} \label{sec:related} In this paper, we have given a logic for reasoning about gradual @@ -9374,7 +9371,7 @@ implementation theorems (e.g. $\upcast{\mathtt{list}(A)}{\mathtt{list}(A')} \equidyn \mathtt{map}\upcast{A}{A'}$). Additionally, since more efficient cast implementations such as optimized cast calculi (the lazy variant in -\cite{herman2010spaceefficient}) and threesome +\citet{herman2010spaceefficient}) and threesome casts~\cite{siekwadler10zigzag}, are equivalent to the lazy contract semantics, they should also be models of GTT, and if so we could use GTT to reason about program transformations and optimizations in them. @@ -9395,7 +9392,7 @@ sections \ref{sec:contract}, \ref{sec:complex}, \ref{sec:operational}. We conjecture that simple call-by-value and call-by-name gradual languages are also models of GTT, by extending the translation of call-by-push-value into call-by-value and call-by-name in the appendix -to Levy's monograph \cite{levy03cbpvbook}. +of Levy's monograph \cite{levy03cbpvbook}. % In order for the theorem to apply, the language must validate an appropriate version of the $\eta$ principles for the types. @@ -9404,10 +9401,10 @@ So for example, a call-by-value language that has reference equality of functions does \emph{not} validate even the value-restricted $\eta$ law for functions, and so the case for functions does not apply. % -It is a well-known issue that the lazy semantics of function casts is -not compatible with the refinement property that graduality models in -the presence of pointer equality, and our uniqueness theorem provides -a different perspective on this phenomenon +It is a well-known issue that in the presence of pointer equality of +functions, the lazy semantics of function casts is not compatible with +the graduality property, and our uniqueness theorem provides a +different perspective on this phenomenon \cite{findlerflattfelleisen04,chaperonesimpersonators, refined}. % However, we note that the cases of the uniqueness theorem for each @@ -9416,9 +9413,9 @@ specification of casts and the $\beta,\eta$ principles for the particular connective, and not on the presence of any other types, even the dynamic types. % -So while a call-by-value language that has reference equality of -functions might still have the $\eta$ principle for strict pairs, so -that case of the theorem still applies. +So even if a call-by-value language may have reference equality +functions, if it has the $\eta$ principle for strict pairs, then the +pair cast must be that of theorem \ref{functorial-casts}. Next, we consider the applicability to non-eager languages. % @@ -9430,13 +9427,13 @@ restriction for call-by-value function $\eta$. % We are not aware of any call-by-name gradual languages, but there is considerable work on \emph{contracts} for non-eager languages, -especially Haskell, \cite{hinzeJeuringLoh06,XuPJC09}. +especially Haskell \cite{hinzeJeuringLoh06,XuPJC09}. % However, we note that Haskell is \emph{not} a call-by-name language in our sense for two reasons. % First, Haskell uses call-by-need evaluation where results of -computations are memoized. However when only considering Haskell's +computations are memoized. However, when only considering Haskell's effects (error and divergence), this difference is not observable so this is not the main obstacle. % @@ -9450,7 +9447,7 @@ x. \Omega$ will not. % This is a crucial feature of Haskell and is a major source of differences between implementations of lazy contracts, as noted in -\cite{Degen2012TheIO}. +\citet{Degen2012TheIO}. % We can understand this difference by using a different translation into call-by-push-value: what Levy calls the ``lazy paradigm'', as @@ -9465,7 +9462,7 @@ With this embedding and the uniqueness theorem, GTT produces a definition for lazy casts, and the definition matches the work of \citet{XuPJC09} when restricting to non-dependent contracts. -Some recent work \cite{greenmanfelleisen:2018} gives a spectrum of +\citet{greenmanfelleisen:2018} gives a spectrum of differing syntactic type soundness theorems for different semantics of gradual typing. % @@ -9473,7 +9470,7 @@ Our work here is complementary, showing that certain program equivalences can only be achieved by certain cast semantics. \citet{Degen2012TheIO} give an analysis of different cast semantics -for contracts in lazy languages, specifically based on Haskell, so +for contracts in lazy languages, specifically based on Haskell, i.e., call-by-need with \seq. % They propose two properties ``meaning preservation'' and @@ -9493,7 +9490,7 @@ non-contracted term. Completeness, on the other hand, requires that when a contract is attached to a value that it is \emph{deeply} checked. % -The two properties are incompatible because for instance a pair of a +The two properties are incompatible because, for instance, a pair of a diverging term and a value can't be deeply checked without causing the entire program to diverge. % @@ -9523,12 +9520,13 @@ Because of this, the theorems proven in that paper are more closely related to our model construction in section \ref{sec:contract}. % -More specifically, many of the lemmas proven in the extended version -of the paper have direct analogues in Henglein's work. +More specifically, many of the properties of casts needed to prove +theorem \ref{thm:axiomatic-graduality} have direct analogues +in Henglein's work, such as the coherence theorems. % We have not included these lemmas in the paper because they are quite -similar to lemmas proven in \citet{newahmed18}; see there for a -more detailed comparison. +similar to lemmas proven in \citet{newahmed18}; see there for a more +detailed comparison, and the extended version of this paper for full proof details \citep{newlicataahmed19:extended}. % Finally, we note that our assumption of compositionality, i.e., that all casts can be decomposed into an upcast followed by a downcast, is @@ -9665,7 +9663,7 @@ interpretation. % It may be of interest to develop a more general notion of model of GTT for which we can prove soundness and completeness theorems, as in -\cite{newlicata2018-fscd}. +\citet{newlicata2018-fscd}. % A model would be a strong adjunction between double categories where one of the double categories has all ``companions'' and the other has @@ -9757,16 +9755,16 @@ structure~\cite{ahman+16fiberedeffects}. %% recursive types, polymorphism, and state, where both equational %% reasoning and gradualization are more difficult. -\newpage +%% \newpage \paragraph{Acknowledgments} -We thank Gabriel?, Sam TH?, Ron?, MSFP people? FSCD people? for helpful -discussions about this work. We thank the anonymous reviewers for -helpful feedback on a previous version of this article. This material -is based on research sponsored by the NSF? FIXME and the United States -Air Force Research Laboratory under agreement number FA9550-15-1-0053 -and and FA9550-16-1-0292. +We thank Gabriel Scherer, Kenji Maillard, Ron Garcia, anyone else? for +helpful discussions about this work. We thank the anonymous reviewers +for helpful feedback on this article. This +material is based on research sponsored by the NSF? FIXME and the +United States Air Force Research Laboratory under agreement number +FA9550-15-1-0053 and and FA9550-16-1-0292. %% The U.S. Government is %% authorized to reproduce and distribute reprints for Governmental %% purposes notwithstanding any copyright notation thereon.