diff --git a/paper/gtt.tex b/paper/gtt.tex
index fab1bc964003210e25b477fea20eb38507815888..cc05b349e5cb75f93a0540897761bed67060b99c 100644
--- a/paper/gtt.tex
+++ b/paper/gtt.tex
@@ -361,14 +361,14 @@ on it?
 The reason this program is rejected is because the annotation $x :
 \texttt{Num}$ is not just something the implementor of the function
 must respect, but also something that the type checker must
-\emph{ensure}, and therefore the programmer, compiler and automated
-tools can rely on.
+\emph{ensure}, and therefore something that the programmer, compiler and
+automated tools can rely on.
 %
 For instance, if the variable $x$ is guaranteed to only be
 instantiated with numbers, then the programmer is free to replace $0$
 with $x - x$ or vice-versa.
 %
-However if $x$ is instantiated with a closure, then $x - x$ will raise
+However, if $x$ is instantiated with a closure, then $x - x$ will raise
 a type error while $0$ will succeed.
 %
 We can formalize this equality by \emph{observational equivalence} of
@@ -427,8 +427,8 @@ the definition of blame, which is an element of the system as defined.
 %
 We argue that these type soundness theorems are only indirectly
 getting at the actual desired properties of the gradual language,
-which are program equivalences in the typed portion of the code that
-are not valid in the dynamically typed portion.
+which are \emph{program equivalences in the typed portion of the code} that are
+not valid in the dynamically typed portion. 
 
 % AA: To directly get at the desired properties of a gradual language, we will
 % first axiomatize all the properties we want, and then show what operational
@@ -459,7 +459,7 @@ $\obcast{B}{A}{t}$ is equivalent to first casting up from $A$ to
 $\dyn$ and then down to $B$: $\obcast{B}{\dyn}\obcast{\dyn}{A} t$.
 %
 These casts often have quite different performance characteristics,
-but should have extensionally the same behavior: of the cast semantics
+but should have the same extensional behavior: of the cast semantics
 presented in (TODO: cite design space), only the partially eager
 detection strategory violates this principle, and this strategy is not
 common.
@@ -467,8 +467,65 @@ common.
 The second property we take for granted is that the language satisfy
 Siek-Vitousek-Cimini-Boyland's \emph{dynamic gradual guarantee}, a now
 standard correctness theorem of gradual typing that most newly
-proposed gradual languages seek to satisfy. 
-% AA: Following New-Ahmed, we call it graduality, which says that...
+proposed gradual languages seek to satisfy.  This property says that if we
+change the types in a program to be ``more precise''---e.g., by changing from 
+the dynamic type to a more precise type such as integers or functions---the
+program will either produce the same behavior as the original or with raise a
+dynamic type error.  Conversely, if a program does not error and some types are
+made ``less precise'' then the program behaves the same as before.  This is an
+important reasoning principle for programmers: it captures how we expect
+program behavior to change if we change types to be more or less precise. 
+
+% AA: I think we should explain that we build on recent work by newlicata and
+% newahmed somewhere in the intro...
+
+% We build on two strands of recent work. 
+% %
+% First, \cite{cbn-gtt} provided an axiomatic semantics for the
+% negative fragment of call-by-name in which upcasts and downcasts were
+% specified as certain joins and meets respectively.
+% %
+% They proved two ``contract uniqueness principles'' for call-by-name gradual
+%% typing: if the language satisfies a property 
+%% called graduality\footnote{an extensional version of \cite{refined}'s
+%%   dynamic gradual guarantee} and the $\eta$ laws for functions and
+%% products, then the casts between function types (respectively product
+%% types) must be equivalent to the classic ``wrapping'' implementation.
+
+%% Second, \cite{cbv-gt-lr} provided a logical relation interpretation of
+%% graduality in terms of a nonstandard ``observational error
+%% approximation'' relation.
+%% %
+%% They synthesized the casts from the syntax of type dynamism proofs and
+%% decomposed the graduality property into observational error
+%% approximation ``up to cast''.
+
+%% Each paper had certain limitations: \cite{cbn-gtt} only dealt with the
+%% negative fragment of call-by-name, and did not have an accompanying
+%% operational semantic interpretation.
+%% %
+%% On the other hand, \cite{cbv-gt-lr} dealt with positive and negative
+%% types in call-by-value, but only allowed for contextual equivalence
+%% proofs in a \emph{single} language, because it was not based on an
+%% axiomatic semantics.
+%% %
+%% In this paper, we resolve all of these problems by developing a new
+%% axiomatic system we dub \emph{Gradual Type Theory} (GTT), and a
+%% logical relation model for it using observational error approximation
+%% in a variant of Levy's \emph{Call-by-push-value}.
+%% %
+%% Gradual Type Theory is built by applying the type theoretic
+%% methodology used to design Call-by-name GTT to Call-by-push-value.
+%% %
+%% We chose call-by-push-value because it follows a similar type
+%% theoretic discipline as the negative type lambda calculus: all
+%% connectives internalize some property of the judgmental structure of
+%% the system.
+%% %
+%% Furthermore, since it fully abstractly embeds call-by-value and
+%% call-by-name calculi, we can provide similar uniqueness theorems for
+%% those systems by restricting GTT to only include the image of those
+%% embeddings.
 
 We then study what program equivalences are provable in GTT under
 various assumptions.
@@ -601,9 +658,9 @@ The model is a \emph{contract} interpretation of GTT in that the
 ``built-in'' casts of GTT are translated to ordinary functions in a
 CBPV language that perform the necessary checks.
 %
-The translation builds on the work of
-\cite{newlicata2018,newahmed2018} which provided similar translations
-for CBV and CBN fragments of the theory.
+The translation builds on recent work by 
+\citet{newlicata2018-fscd} and \citet{newahmed2018} which provided similar
+translations for CBV and CBN fragments of the theory.
 
 To keep the proofs high-level, we break the proof into two steps.
 %
@@ -648,199 +705,152 @@ support the CBV or CBN subsets of CBPV.
 Our modular proof architecture allows us to easily prove correctness
 of $\beta, \eta$ and graduality for all of these interpretations.
 
-%% While gradual typing has been researched quite extensively by proving
-%% safety theorems, these safety theorems are too weak to justify program
-%% equivalences commonly used by programmers and compilers to refactor
-%% and optimize typed programs.
-%% %
-%% These questions are essential to prove if dynamic typing is to be
-%% \emph{added} to an existing statically typed language.
-%% %
-%% For instance if when making a Gradual Haskell or Gradual OCaml,
-%% certain program equivalences were not validated, then existing
-%% optimizers and other tools could not necessarily be used, and existing
-%% code might have unexpected behavior because its correctness was based
-%% on now invalid reasoning.
-%% %
-%% Furthermore, even verified programs fall back to dynamically checking
-%% correctness: for instance CompCert's register allocation algorithm was
-%% originally an unverified procedure whose output was validated, and any
-%% bugs would cause CompCert to fail to compile the program.
-%% %
-%% Techniques from gradual typing should help to more easily and
-%% correctly implement these mixed statically/dynamically verified
-%% programs, an idea explored in \cite{bader-aldrich-tanter-vmcai18}.
-%% %
-%% Clearly it is absolutely crucial that such a gradual extension be
-%% compatible with the existing semantics of the verification language.
-
-%% It might appear that program equivalence theorems have not been proven
-%% because they do not hold for a gradually typed language.
-%% %
-%% For instance adding gradual typing to a pure simply typed lambda
-%% calculus with sums and functions makes the equational theory
-%% inconsistent, and assuming a call-by-value evaluation order will break
-%% the $\eta$ principles for functions and product types.
-%% %
-%% The reason is that the pure language has no effects while the gradual
-%% language has two: general recursion and dynamic type errors.
-%% %
-%% However, when starting from a language that already has non-trivial
-%% effects, these obstacles go away: the equational theory for pure
-%% lambda calculus does not have models in effectful languages and
-%% call-by-value languages only satisfy a restricted $\eta$ principle.
-%% %
-%% In this article we will show that gradually typed languages \emph{can}
-%% satisfy the same reasoning principles as an effectful language without
-%% gradual typing.
-%% %
-%% To clarify this goal we propose the following principle of
-%% \emph{strong} gradual type soundness:
-%% \begin{principle}{Strong Gradual Type Soundness}
-%%   The relational properties of a gradually typed language should be
-%%   the same as the fully typed sublanguage with general recursion and
-%%   errors.
-%% \end{principle}
-
-%% The typical way to establish these relational properties for a
-%% language presented with an operational semantics is to express them as
-%% observational equivalence or approximation properties of programs and
-%% then develop a \emph{logical relation} or other \emph{denotational
-%%   semantics} that is sound for observational
-%% approximation/equivalence.
-%% %
-%% However, even simple languages with general recursion require a
-%% non-trivial semantics: step-indexing for logical relations and domain
-%% theory or metric spaces for denotational semantics.
-%% %
-%% These semantic techniques are essential, but obscure the basic
-%% principles being studied due to heavy formal machinery.
-
-%% Instead, we will study program equivalence properties of gradual
-%% typing via \emph{axiomatic semantics}.
-%% %
-%% This has the benefit that equivalence and approximation properties can
-%% be stated and proven entirely syntactically, and allows for multiple
-%% different concrete interpretations using logical relations, domains,
-%% etc.
-%% %
-%% This way we can add gradual typing as a sort of ``axiomatic mixin''
-%% that can be added modularly to the axioms of an existing type theory.
+% While gradual typing has been researched quite extensively by proving
+% safety theorems, these safety theorems are too weak to justify program
+% equivalences commonly used by programmers and compilers to refactor
+% and optimize typed programs.
+% %
+% These questions are essential to prove if dynamic typing is to be
+% \emph{added} to an existing statically typed language.
+% %
+% For instance if when making a Gradual Haskell or Gradual OCaml,
+% certain program equivalences were not validated, then existing
+% optimizers and other tools could not necessarily be used, and existing
+% code might have unexpected behavior because its correctness was based
+% on now invalid reasoning.
+% %
+% Furthermore, even verified programs fall back to dynamically checking
+% correctness: for instance CompCert's register allocation algorithm was
+% originally an unverified procedure whose output was validated, and any
+% bugs would cause CompCert to fail to compile the program.
+% %
+% Techniques from gradual typing should help to more easily and
+% correctly implement these mixed statically/dynamically verified
+% programs, an idea explored in \cite{bader-aldrich-tanter-vmcai18}.
+% %
+% Clearly it is absolutely crucial that such a gradual extension be
+% compatible with the existing semantics of the verification language.
+
+% It might appear that program equivalence theorems have not been proven
+% because they do not hold for a gradually typed language.
+% %
+% For instance adding gradual typing to a pure simply typed lambda
+% calculus with sums and functions makes the equational theory
+% inconsistent, and assuming a call-by-value evaluation order will break
+% the $\eta$ principles for functions and product types.
+% %
+% The reason is that the pure language has no effects while the gradual
+% language has two: general recursion and dynamic type errors.
+% %
+% However, when starting from a language that already has non-trivial
+% effects, these obstacles go away: the equational theory for pure
+% lambda calculus does not have models in effectful languages and
+% call-by-value languages only satisfy a restricted $\eta$ principle.
+% %
+% In this article we will show that gradually typed languages \emph{can}
+% satisfy the same reasoning principles as an effectful language without
+% gradual typing.
+% %
+% To clarify this goal we propose the following principle of
+% \emph{strong} gradual type soundness:
+% \begin{principle}{Strong Gradual Type Soundness}
+%   The relational properties of a gradually typed language should be
+%   the same as the fully typed sublanguage with general recursion and
+%   errors.
+% \end{principle}
+
+% The typical way to establish these relational properties for a
+% language presented with an operational semantics is to express them as
+% observational equivalence or approximation properties of programs and
+% then develop a \emph{logical relation} or other \emph{denotational
+%   semantics} that is sound for observational
+% approximation/equivalence.
+% %
+% However, even simple languages with general recursion require a
+% non-trivial semantics: step-indexing for logical relations and domain
+% theory or metric spaces for denotational semantics.
+% %
+% These semantic techniques are essential, but obscure the basic
+% principles being studied due to heavy formal machinery.
+
+% Instead, we will study program equivalence properties of gradual
+% typing via \emph{axiomatic semantics}.
+% %
+% This has the benefit that equivalence and approximation properties can
+% be stated and proven entirely syntactically, and allows for multiple
+% different concrete interpretations using logical relations, domains,
+% etc.
+% %
+% This way we can add gradual typing as a sort of ``axiomatic mixin''
+% that can be added modularly to the axioms of an existing type theory.
+
+% To study gradual typing axiomatically, we first need to understand the
+% relational principles that gradual typing should satisfy.
+% %
+% In previous work \cite{cbn-gtt}, we have developed a suitable notion
+% we call \emph{graduality}, that is an extensional refinement of
+% Siek-Vitousek-Cimini-Boyland's \emph{(dynamic) gradual guarantee}.
+% %
+% Graduality formalizes the notion that refining the types of a
+% gradually typed program to be more precise should result in a
+% refinement of program behavior.
+% %
+% If a programmer decides to change a dynamically typed portion of the
+% code to have type $\texttt{number}$, then either the new code will
+% error (because it does not produce a number) or else it should have
+% the same behavior as the original code (modulo typing differences).
+% %
+% To formalize this axiomatically, we developed a \emph{logic} for this
+% notion of type and program refinement.
+% %
+% First we extend the term language with a notion of type
+% \emph{dynamism} $A \ltdyn B$ where the dynamic type is a greatest
+% element $A \ltdyn \dyn$.
+% %
+% We also extend the language with a notion of term dynamism $t \ltdyn
+% t'$ where the dynamic type error is the \emph{least} element $\err
+% \ltdyn t$.
+% %
+% The type and term orderings are related in that if $t \ltdyn t'$ and
+% $t, t'$ are typed as $\Gamma \vdash t : A$ and $\Gamma' \vdash t' :
+% A'$ respectively then it must be the case that $A \ltdyn A'$ and
+% $\Gamma \ltdyn \Gamma'$ pointwise.
+
+% We observed in that work that in addition to modeling the gradual
+% guarantee, an axiomatic presentation of term dynamism allows us to
+% give a \emph{specification} for the casts of gradual typing.
+% %
+% Specifically, for any types $A \ltdyn B$, we add two functions: an
+% \emph{upcast} $\upcast{A}{B} t$ from $A$ to $B$ and a \emph{downcast}
+% $\dncast{A}{B} u$ from $B$ to $A$.
+% %
+% The informal specification for these terms is that a casted term
+% should act as much as possible like the original term, while
+% satisfying the different type.
+% %
+% We can model this as a refinement relation.
+% %
+% If $u : B$, then the downcast should refine the behavior of $u$: which
+% we write as $\dncast{A}{B}{u} \ltdyn u$.
+% %
+% However, this would be satisfied even by the dynamic type error
+% $\err$, so we want to ensure that it does not refine the behavior too
+% much: only as much is needed to satisfy $B$.
+% %
+% We can state this by saying that $\dncast{A}{B}{u}$ should have the
+% \emph{most behavior} of any term of type $A$ that refines $u$.
+% %
+% That is we add a rule that says if $t : A$ and $t \ltdyn u$ then $t
+% \ltdyn \dncast{A}{B}{u}$.
+% %
+% We can axiomatize the upcasts in a perfectly dual manner: $\upcast A B
+% t$ should be refined by $t$ and it should be the minimal program with
+% this property.
+% %
+% Note that these properties \emph{uniquely define} the casts (up to
+% mutual refinement) because two greatest elements must each be greater
+% than the other and dually for least elements.
 
-%% To study gradual typing axiomatically, we first need to understand the
-%% relational principles that gradual typing should satisfy.
-%% %
-%% In previous work \cite{cbn-gtt}, we have developed a suitable notion
-%% we call \emph{graduality}, that is an extensional refinement of
-%% Siek-Vitousek-Cimini-Boyland's \emph{(dynamic) gradual guarantee}.
-%% %
-%% Graduality formalizes the notion that refining the types of a
-%% gradually typed program to be more precise should result in a
-%% refinement of program behavior.
-%% %
-%% If a programmer decides to change a dynamically typed portion of the
-%% code to have type $\texttt{number}$, then either the new code will
-%% error (because it does not produce a number) or else it should have
-%% the same behavior as the original code (modulo typing differences).
-%% %
-%% To formalize this axiomatically, we developed a \emph{logic} for this
-%% notion of type and program refinement.
-%% %
-%% First we extend the term language with a notion of type
-%% \emph{dynamism} $A \ltdyn B$ where the dynamic type is a greatest
-%% element $A \ltdyn \dyn$.
-%% %
-%% We also extend the language with a notion of term dynamism $t \ltdyn
-%% t'$ where the dynamic type error is the \emph{least} element $\err
-%% \ltdyn t$.
-%% %
-%% The type and term orderings are related in that if $t \ltdyn t'$ and
-%% $t, t'$ are typed as $\Gamma \vdash t : A$ and $\Gamma' \vdash t' :
-%% A'$ respectively then it must be the case that $A \ltdyn A'$ and
-%% $\Gamma \ltdyn \Gamma'$ pointwise.
-
-%% We observed in that work that in addition to modeling the gradual
-%% guarantee, an axiomatic presentation of term dynamism allows us to
-%% give a \emph{specification} for the casts of gradual typing.
-%% %
-%% Specifically, for any types $A \ltdyn B$, we add two functions: an
-%% \emph{upcast} $\upcast{A}{B} t$ from $A$ to $B$ and a \emph{downcast}
-%% $\dncast{A}{B} u$ from $B$ to $A$.
-%% %
-%% The informal specification for these terms is that a casted term
-%% should act as much as possible like the original term, while
-%% satisfying the different type.
-%% %
-%% We can model this as a refinement relation.
-%% %
-%% If $u : B$, then the downcast should refine the behavior of $u$: which
-%% we write as $\dncast{A}{B}{u} \ltdyn u$.
-%% %
-%% However, this would be satisfied even by the dynamic type error
-%% $\err$, so we want to ensure that it does not refine the behavior too
-%% much: only as much is needed to satisfy $B$.
-%% %
-%% We can state this by saying that $\dncast{A}{B}{u}$ should have the
-%% \emph{most behavior} of any term of type $A$ that refines $u$.
-%% %
-%% That is we add a rule that says if $t : A$ and $t \ltdyn u$ then $t
-%% \ltdyn \dncast{A}{B}{u}$.
-%% %
-%% We can axiomatize the upcasts in a perfectly dual manner: $\upcast A B
-%% t$ should be refined by $t$ and it should be the minimal program with
-%% this property.
-%% %
-%% Note that these properties \emph{uniquely define} the casts (up to
-%% mutual refinement) because two greatest elements must each be greater
-%% than the other and dually for least elements.
-
-%% Previous work has built towards this goal from two directions.
-%% %
-%% First, \cite{cbn-gtt} provided an axiomatic semantics for the
-%% negative fragment of call-by-name in which upcasts and downcasts were
-%% specified as certain joins and meets respectively.
-%% %
-%% There, they proved two ``contract uniqueness principles'' for
-%% call-by-name gradual typing: if the language satisfies a property
-%% called graduality\footnote{an extensional version of \cite{refined}'s
-%%   dynamic gradual guarantee} and the $\eta$ laws for functions and
-%% products, then the casts between function types (respectively product
-%% types) must be equivalent to the classic ``wrapping'' implementation.
-
-%% Second, \cite{cbv-gt-lr} provided a logical relation interpretation of
-%% graduality in terms of a nonstandard ``observational error
-%% approximation'' relation.
-%% %
-%% They synthesized the casts from the syntax of type dynamism proofs and
-%% decomposed the graduality property into observational error
-%% approximation ``up to cast''.
-
-%% Each paper had certain limitations: \cite{cbn-gtt} only dealt with the
-%% negative fragment of call-by-name, and did not have an accompanying
-%% operational semantic interpretation.
-%% %
-%% On the other hand, \cite{cbv-gt-lr} dealt with positive and negative
-%% types in call-by-value, but only allowed for contextual equivalence
-%% proofs in a \emph{single} language, because it was not based on an
-%% axiomatic semantics.
-%% %
-%% In this paper, we resolve all of these problems by developing a new
-%% axiomatic system we dub \emph{Gradual Type Theory} (GTT), and a
-%% logical relation model for it using observational error approximation
-%% in a variant of Levy's \emph{Call-by-push-value}.
-%% %
-%% Gradual Type Theory is built by applying the type theoretic
-%% methodology used to design Call-by-name GTT to Call-by-push-value.
-%% %
-%% We chose call-by-push-value because it follows a similar type
-%% theoretic discipline as the negative type lambda calculus: all
-%% connectives internalize some property of the judgmental structure of
-%% the system.
-%% %
-%% Furthermore, since it fully abstractly embeds call-by-value and
-%% call-by-name calculi, we can provide similar uniqueness theorems for
-%% those systems by restricting GTT to only include the image of those
-%% embeddings.
 
 %% Our proof architecture is as follows:
 %% \begin{tikzcd}
@@ -5220,7 +5230,7 @@ translate a heterogeneous inequality to a homogeneous inequality
   \end{longonly}
 \end{theorem}
 
-Relative to previous work on graduality (\cite{new-ahmed2018}),
+Relative to previous work on graduality (\cite{newahmed2018}),
 the distinction between complex value upcasts and complex stack
 downcasts guides the formulation of the theorem; e.g. using upcasts in
 the left-hand theorem would require more thunks/forces.  
diff --git a/paper/max.bib b/paper/max.bib
index f70e5d4218848590882de70b5adb40de0ab42905..02ccf78315c32da68684dc724aae4806ff899811 100644
--- a/paper/max.bib
+++ b/paper/max.bib
@@ -115,6 +115,7 @@
 @STRING{icfp15 =    icfp # ", Vancouver, British Columbia, Canada" }
 @STRING{icfp16 =    icfp # ", Nara, Japan" }
 @STRING{icfp17 =    icfp # ", Oxford, United Kingdom" }
+@STRING{icfp18 =    icfp # ", Saint Louis, Missouri" }
 
 % ----
 @STRING{oopsla =    "{ACM} {S}ymposium on {O}bject {O}riented {P}rogramming:
@@ -533,7 +534,7 @@ author = "Andrew M. Pitts",
 @inproceedings{ahmed17,
 author = {Amal Ahmed and Dustin Jamner and Jeremy G. Siek and Philip Wadler},
 title = {Theorems for Free for Free: Parametricity, With and Without Types},
-booktitle = icfp,
+booktitle = icfp17,
 year = {2017},
 }
 @inproceedings{igarashipoly17,
@@ -975,3 +976,10 @@ note="Available at \texttt{http://www.cs.indiana.edu/\linebreak[0]$\sim$amal/\li
   pages={245--292},
   year={1999},
 }
+
+@inproceedings{newahmed18,
+author = {Max S. New and Amal Ahmed},
+title = {Graduality from Embedding-Projection Pairs},
+booktitle = icfp18,
+year = {2018},
+}