diff --git a/paper/gtt.tex b/paper/gtt.tex
index 30c5da4d09deca12ae35a854c31bd20d8b03aeec..e7541a8528ad9871c4f55cefd5c810b3a5857b16 100644
--- a/paper/gtt.tex
+++ b/paper/gtt.tex
@@ -436,9 +436,10 @@ connective.
 %
 The blame soundness theorem might also be quite strong, but depends on
 the definition of blame, which is part of the operational semantics of
-the language being defined (just as Milner's original ``well-typed
-programs cannot go wrong'' depends on putting in an error outcome, and
-then ruling it out for typed programs).
+the language being defined.
+%% (just  Milner's original ``well-typed
+%% programs cannot go wrong'' depends on putting in an error outcome, and
+%% then ruling it out for typed programs).
 %
 We argue that these type soundness theorems are only indirectly
 expressing the actual desired properties of the gradual language,
@@ -468,6 +469,7 @@ Gradual type theory can be used both to explore language design questions and
 to verify behavioral properties of specific programs, such as correctness of
 optimizations and refactorings.
 
+\textbf{FIXME: This needs a section header but I'm not sure what to call it.}
 To get off the ground, we take two properties of the gradual language
 for granted.
 %
@@ -484,7 +486,7 @@ common.
 %
 The second property we take for granted is that the language satisfies
 the \emph{dynamic gradual guarantee}~\cite{refined} (``graduality'')---a
-strong correctness theorem of gradual typing--- which constraints how
+strong correctness theorem of gradual typing--- which constrains how
 changing type annotations changes behavior.  Graduality says that if we
 change the types in a program to be ``more precise''---e.g., by changing
 from the dynamic type to a more precise type such as integers or
@@ -570,34 +572,35 @@ guarantee, or one of the $\beta, \eta$ axioms---and in practice, it is
 usually $\eta$.
 
 For instance, a transient semantics, where only the top-level
-connectives are checked, violates $\eta$ for strict pairs:
+connectives are checked, violates $\eta$ for strict pairs
 \begin{small}
-  \[ {x : A_1 \times A_2} \vdash \letXbeYinZ x {(x_1,x_2)} 0 \neq 0 \]
-\end{small}
-because the top-level connectives of $A_1, A_2$ are only checked when
-the pattern match is introduced. As a concrete counterexample to
-contextual equivalence, let $A_1, A_2$ all be \texttt{String}. Then
-because only the top-level connective is checked, $(0,1)$ is a valid
-value of type $\texttt{String} \times \texttt{String}$, but pattern
-matching on the pair ensures that the two components are checked to be
-strings, so $\letXbeYinZ {(0,1)} {(x_1,x_2)} 0 \mapsto \err$, whereas
-with no pattern match a value is returned. This means simple program
-changes such as changing a function of two arguments to take a single
-pair of those arguments that are valid in a typed language are
-invalidated by the transient semantics.
+  \[ {x : A_1 \times A_2} \vdash (\letXbeYinZ x {(x_1,x_2)} 0) \neq 0 \]
+\end{small}%
+because the top-level connectives of $A_1$ and $A_2$ are only checked
+when the pattern match is introduced. As a concrete counterexample to
+contextual equivalence, let $A_1, A_2$ all be \texttt{String}.  Because
+only the top-level connective is checked, $(0,1)$ is a valid value of
+type $\texttt{String} \times \texttt{String}$, but pattern matching on
+the pair ensures that the two components are checked to be strings, so
+the left-hand side $\letXbeYinZ {(0,1)} {(x_1,x_2)} 0 \mapsto \err$
+(raises a type error). On the right-hand side, with no pattern, match a
+value (0) is returned. This means simple program changes that are valid
+in a typed language, such as changing a function of two arguments to
+take a single pair of those arguments, are invalidated by the transient
+semantics.
 %
 In summary, transient semantics is ``lazier'' than the types dictate,
 catching errors only when the term is inspected.
 
 As a subtler example, in call-by-value ``eager cast semantics'' the
-$\beta, \eta$ principles for all of the eager datatypes ($0, +, 1,
+$\beta\eta$ principles for all of the eager datatypes ($0, +, 1,
 \times$, lists, etc.) will be satisfied, but the $\eta$ principle for
-the function type $\to$ is violated, i.e., there are values $V : A \to
-A'$ for which $V \neq \lambda x:A. V x $.
+the function type $\to$ is violated: there are values $V : A \to A'$ for
+which $V \neq \lambda x:A. V x $.
 %
 For instance, take an arbitrary function value $V : A \to
 \texttt{String}$ for some type $A$, and let $V' = \obcast{A \to
-  \dyn}{A \to \texttt{String}}$ be the result of casting it to have a
+  \dyn}{A \to \texttt{String}}{V}$ be the result of casting it to have a
 dynamically typed output.
 %
 Then in eager semantics, the following programs are not equivalent:
@@ -630,10 +633,17 @@ In summary the ``eager'' cast semantics is in fact overly eager: in
 its effort to find bugs faster than ``lazy'' semantics it disables the
 very type-based reasoning that gradual typing should provide.
 
+%% FIXME: There should be some comment here that these aren't new
+%%   observations about transient/eager, and what our work adds\ldots Is
+%%   this right?
+  While these criticisms of transient and eager cast semantics are
+  well-known (cite?), a novel consequence of our development is that
+  \emph{only} the lazy cast semantics satisifies both graduality and
+  $\eta$.
 The reader unfamiliar with proof theory may find the centrality of
 $\eta$ equalities in our development unusual.  
 %
-Of course, the original $\eta$ law of the $\lambda$-calculus, which
+Of course, the $\eta$ law of the untyped $\lambda$-calculus, which
 states that any $\lambda$-term $M \equiv \lambda x. M x$, is
 restricted in a typed language to only hold for terms of function type $M
 : A \to B$.
@@ -679,14 +689,13 @@ principle which enables similar reasoning, such as proofs by induction.
 The $\eta$ principles for lazy types \emph{in call-by-name} support dual
 behavioral reasoning about lazy functions, records, and streams.
 
-\textbf{Overview of GTT.}  This paper unifies two strands of previous
-work.  The first~\citep{newahmed18} develops operational (logical
-relations) reasoning for gradual typing in a call-by-value setting, but
-does not develop an axiomatic proof theory for it.  The
-second~\citep{newlicata2018-fscd} develops an axiomatic gradual type
-theory, but for only function and pair types in call-by-name (whereas
-all existing gradual languages are call-by-value), and gives
-denotational but not operational models of the axioms.
+\textbf{Technical Overview of GTT.}  The gradual type theory developed
+in this paper unifies our previous work~\citep{newahmed18} on
+operational (logical relations) reasoning for gradual typing in a
+call-by-value setting (which did not consider a proof theory), and on an
+axiomatic proof theory for gradual typing~\citep{newlicata2018-fscd} in
+a call-by-name setting (which considered only function and product
+types, and denotational but not operational models).
 
 In this paper, we develop an axiomatic gradual type theory GTT for a unified
 language that includes \emph{both} call-by-value/eager types and
@@ -725,21 +734,21 @@ advance.  The specification defines the casts ``uniquely up to
 equivalence'', which means that any two implementations satisfying it
 are behaviorally equivalent.
 
-One of our main contributions (Section~\ref{sec:gtt}) is to generalize
-this axiomatic approach to call-by-push-value, where there are both
-eager/value types and lazy/computation types. This is both a subtler
-question than it might at first seem, and has a surprisingly nice
-answer: we find that upcasts are naturally associated with eager/value
-types and downcasts with lazy/computation types, and that the modalities
-relating values and computations induce the downcasts for eager/value
-types and upcasts for lazy/computation types.  Moreover, this analysis
-articulates an important behavioral property of casts that was proved
-operationally for call-by-value in \citet{newahmed18} but missed for
-call-by-name in \citet{newlicata2018-fscd}: upcasts for eager types and
-downcasts for lazy types are both ``pure'' in a suitable sense, which
-enables more refactorings and program optimizations.  In particular, we
-show that these casts can be taken to be (and are essentially forced to
-be) ``complex values'' and ``complex stacks'' (respectively) in
+We generalize this axiomatic approach to call-by-push-value
+(Section~\ref{sec:gtt}), where there are both eager/value types and
+lazy/computation types. This is both a subtler question than it might at
+first seem, and has a surprisingly nice answer: we find that upcasts are
+naturally associated with eager/value types and downcasts with
+lazy/computation types, and that the modalities relating values and
+computations induce the downcasts for eager/value types and upcasts for
+lazy/computation types.  Moreover, this analysis articulates an
+important behavioral property of casts that was proved operationally for
+call-by-value in \citep{newahmed18} but missed for call-by-name in
+\citep{newlicata2018-fscd}: upcasts for eager types and downcasts for
+lazy types are both ``pure'' in a suitable sense, which enables more
+refactorings and program optimizations.  In particular, we show that
+these casts can be taken to be (and are essentially forced to be)
+``complex values'' and ``complex stacks'' (respectively) in
 call-by-push-value, which corresponds to a behavioral property of
 \emph{thunkability} and
 \emph{linearity}~\cite{munchmaccagnoni14nonassociative}.  We argue in
@@ -749,11 +758,11 @@ dynamic eager/value type and a dynamic lazy/computation type, where the
 former can be thought of as a sum of all possible values, and the latter
 as a product of all possible behaviors.  At the language design level,
 gradual type theory can be used to prove that, for a variety of
-eager/value and lazy/computation types, the ``lazy'' semantics
-of casts is the unique implementation satisfying $\beta,\eta$ and
-graduality (Section~\ref{sec:theorems-in-gtt}). These behavioral
-equivalences can then be used in reasoning about optimizations,
-refactorings, and correctness of specific programs.  
+eager/value and lazy/computation types, the ``lazy'' semantics of casts
+is the unique implementation satisfying $\beta,\eta$ and graduality
+(Section~\ref{sec:theorems-in-gtt}). These behavioral equivalences can
+then be used in reasoning about optimizations, refactorings, and
+correctness of specific programs.
 
 %% Morever, working in a setting with mixed eager/value and
 %% lazy/computation types often leads to more insight into each than
@@ -786,7 +795,7 @@ Then
 for the extended CBPV and define a step-indexed biorthogonal logical
 relation that interprets the ordering relation on terms as contextual
 error approximation, which underlies the definition of graduality as
-presented in \citet{newahmed18}.
+presented in \citep{newahmed18}.
 %
 Combining these theorems gives an implementation of the term
 language of GTT in which $\beta, \eta$ are observational equivalences
@@ -809,39 +818,38 @@ functions.
 Our modular proof architecture allows us to easily prove correctness
 of $\beta, \eta$ and graduality for all of these interpretations.
 
-\textbf{Contributions}
+\textbf{Contributions.}
 The main contributions of the paper are as follows.
 \begin{enumerate}
-\item We present Gradual Type Theory in section \ref{sec:gtt}, a simple
+\item We present Gradual Type Theory in Section \ref{sec:gtt}, a simple
   axiomatic theory of gradual typing. The theory axiomatizes three
   simple assumptions about a gradual language: compositionality,
   graduality, and type-based reasoning in the form of $\eta$
   equivalences.
 \item We prove many theorems in the formal logic of Gradual Type
-  Theory in section \ref{sec:theorems-in-gtt}. Chief among these are the
-  uniqueness implementation theorems for casts which show that for
+  Theory in Section \ref{sec:theorems-in-gtt}. These include the
+  unique implementation theorems for casts, which show that for
   each type connective of GTT, the $\eta$ principle for the type
   ensures that the casts must implement the lazy contract
   semantics. Furthermore, we show that upcasts are always pure
   functions and dually that downcasts are always strict functions, as
   long as the base type casts are pure/strict.
 \item To substantiate that GTT is a reasonable axiomatic theory for
-  gradual typing, we construct \emph{models} of GTT in sections
-  \ref{sec:contract}, \ref{sec:complex} and \ref{sec:lr}.  This
-  proceeds in two stages. First (Section \ref{sec:contract}), we use
-  call-by-push-value as a typed metalanguage to construct several
-  models of GTT using different recursive types to implement the
-  dynamic types of GTT and interpret the casts as embedding-projection
-  pairs. This extends standard translations of dynamic typing into
-  static typing using type tags: the dynamic value type is constructed
-  as a recursive sum of basic value types, but dually the dynamic
-  computation type is constructed as a recursive \emph{product} of
-  basic computation types. We observe that this interpretation of the
-  dynamic computation type naturally models stack-based
+  gradual typing, we construct \emph{models} of GTT in Sections
+  \ref{sec:contract}, \ref{sec:complex} and \ref{sec:lr}.  This proceeds
+  in two stages. First (Section \ref{sec:contract}), we use
+  call-by-push-value as a typed metalanguage to construct several models
+  of GTT using different recursive types to implement the dynamic types
+  of GTT and interpret the casts as embedding-projection pairs. This
+  extends standard translations of dynamic typing into static typing
+  using type tags: the dynamic value type is constructed as a recursive
+  sum of basic value types, but dually the dynamic computation type is
+  constructed as a recursive \emph{product} of basic computation
+  types. This dynamic computation type naturally models stack-based
   implementations of variable-arity functions as used in the Scheme
   language.
 \item We then give an operational model of the term dynamism ordering
-  as contextual error approximation in sections \ref{sec:complex} and
+  as contextual error approximation in Sections \ref{sec:complex} and
   \ref{sec:lr}. To construct this model, we extend previous work on
   logical relations for error approximation from call-by-value to
   call-by-push-value \cite{newahmed18}, simplifying the presentation
@@ -1022,7 +1030,7 @@ available in \citet{newlicataahmed19:extended}.
 \label{sec:gtt}
 
 In this section we introduce the syntax of Gradual Type Theory, an
-extension of Call-by-push-value to support the constructions of
+extension of Call-by-push-value~\citep{levy03cbpvbook} to support the constructions of
 gradual typing.
 %
 First we introduce call-by-push-value and then describe in turn the
diff --git a/paper/max.bib b/paper/max.bib
index 84d08a4209f0ffa2fad52299b03ba0ed4adb831f..52648e1d860a0181b5cfcb0db3504d207a9e7509 100644
--- a/paper/max.bib
+++ b/paper/max.bib
@@ -1226,5 +1226,12 @@ author="Hinze, Ralf and Jeuring, Johan and L{\"o}h, Andres",
 title="Typed Contracts for Functional Programming",
 booktitle=flops,
 year="2006",
+}
+
+@Unpublished{newlicataahmed19:extended,
+author = {Max S. New and Daniel R. Licata and Amal Ahmed},
+title = {Gradual Type Theory (Extend Version)},
+year = {2018},
+note = {arxiv:},
+}
 
-}
\ No newline at end of file