diff --git a/paper/gtt.tex b/paper/gtt.tex
index 711676431685c261abad722b27de4da62f6526b1..50ff6027adb630cf167d7235cceef03c5b90fc82 100644
--- a/paper/gtt.tex
+++ b/paper/gtt.tex
@@ -338,13 +338,13 @@ program interactively with minimal work.
 %
 On the other hand, statically typed code provides mathematically
 sound reasoning principles that justify type-based refactorings,
-enable compiler optimizations and provide machine-verified
-documentation.
+enable compiler optimizations, and underly formal software verification.
+%% provide machine-verified documentation -- reasoning principles don't provide documentation
 %
 The difficulty is accomodating both of these styles and their benefits simultaneously:
 allowing the dynamic and static code to interact without forcing the
 dynamic code to be statically checked or violating the correctness of
-type-based optimizations.
+type-based reasoning.
 
 The linchpin to the design of a gradually typed language is the
 semantics of \emph{runtime type casts}.  These are runtime checks that ensure
@@ -359,7 +359,7 @@ raise a dynamic type error.
 A programmer familiar with dynamically typed programming might object
 that this is overly strong: for instance if $f$ is just a constant
 function $f = \lambda x:\texttt{Num}. 0$ then why bother checking if
-$x$ is a number since the body of the program doesn't seem to depend
+$x$ is a number since the body of the program does not seem to depend
 on it?
 %
 The reason this program is rejected is because the annotation $x :
@@ -376,9 +376,9 @@ instantiated with numbers, then the programmer is free to replace $0$
 with $x - x$ or vice-versa.
 %
 However, if $x$ can be instantiated with a closure, then $x - x$ will raise
-a type error while $0$ will succeed.
+a runtime type error while $0$ will succeed.
 %
-We can formalize this equality by \emph{observational equivalence} of
+We can such relationships by \emph{observational equivalence} of
 programs: the two closures $\lambda x:\texttt{Num}. 0$ and $\lambda
 x:\texttt{Num}. x - x$ are indistinguishable to any other program in
 the language.
@@ -408,10 +408,11 @@ However, the dichotomy between gradual and optional typing is not as
 firm as one might like.
 %
 There have been many different proposed semantics of run-time type
-checking: ``transient'' cast semantics only checks the head connective of a
-type, ``eager'' cast semantics checks run-time type information on
-closures, whereas ``lazy'' cast semantics will always delay a type-check on
-a function until it is called~\cite{vitousekswordssiek:2017,siek+09designspace}.
+checking: ``transient'' cast semantics only checks the head connective
+of a type (number, function, list, \ldots), ``eager'' cast semantics
+checks run-time type information on closures, whereas ``lazy'' cast
+semantics will always delay a type-check on a function until it is
+called~\cite{vitousekswordssiek:2017,siek+09designspace,findler-felleisen02,hermanXXspaceefficient}.
 %
 The extent to which these different semantics have been shown to
 validate type-based reasoning has been limited to syntactic type
@@ -432,8 +433,8 @@ connective satisfies the type.
 The blame soundness theorem might also be quite strong, but depends on
 the definition of blame, which is part of the operational semantics of
 the language being defined (just as Milner's original ``well-typed
-programs don't go wrong'' depended on putting in operational transitions
-to an error state, and then ruling them out for typed programs).
+programs cannot go wrong'' depends on putting in an error outcome, and
+then ruling it out for typed programs).
 %
 We argue that these type soundness theorems are only indirectly
 expressing the actual desired properties of the gradual language,
@@ -441,27 +442,28 @@ which are \emph{program equivalences in the typed portion of the code} that are
 not valid in the dynamically typed portion. 
 
 In this paper, we systematically study questions of program equivalence
-for a broad class of gradually typed languages by working in an
+for a class of gradually typed languages by working in an
 \emph{axiomatic theory} of gradual program equivalence, a language and
 logic we call \emph{gradual type theory} (GTT). 
 %
 Gradual type theory is the combination of a language of terms and
 gradual types with a simple logic for proving program equivalence and
-\emph{approximation} (equivalence up to one side erroring more) results.
+\emph{error approximation} (equivalence up to one program erroring when
+the other does not) results.
 %
-The logic directly axiomatizes the equational properties gradual
+The logic axiomatizes the equational properties gradual
 programs should satisfy, and offers a high-level syntax for proving
-theorems about many languages at once.
+theorems about many languages at once:
+if a particular operational
+semantics models gradual type theory, then it satisfies all
+provable equivalences/approximations.
 %
 Due to its type-theoretic design, different axioms of program
 equivalence are easily added or removed.
 %
-If a particular operational
-semantics models the axioms, then it satisfies all
-equivalences/approxamations provable in the type theory.
-%
-This gives a way of organizing and exploring the space of
-implementations that satisfy these criteria.
+Gradual type theory can be used both to explore language design questions and
+to verify behavioral properties of specific programs, such as correctness of
+optimizations and refactorings.
 
 To get off the ground, we take two properties of the gradual language
 for granted.
@@ -477,18 +479,17 @@ presented in \cite{siek+09designspace}, only the partially eager
 detection strategory violates this principle, and this strategy is not
 common.
 %
-The second property we take for granted is that the language satisfy
-Siek-Vitousek-Cimini-Boyland's \emph{dynamic gradual guarantee}~\cite{refined}
-(``graduality''), a now standard correctness theorem of gradual typing
-that most newly proposed gradual languages seek to satisfy.  This
-property says that if we change the types in a program to be ``more
-precise''---e.g., by changing from the dynamic type to a more precise
-type such as integers or functions---the program will either produce the
-same behavior as the original or raise a dynamic type error.
-Conversely, if a program does not error and some types are made ``less
-precise'' then the program behaves the same as before.  This allows
-programmers to reason about how changing type annotations changes
-behavior.
+The second property we take for granted is that the language satisfies
+the \emph{dynamic gradual guarantee}~\cite{refined} (``graduality'')---a
+now standard correctness theorem of gradual typing that most newly
+proposed gradual languages seek to satisfy--- whcih constraints how
+changing type annotations changes behavior.  Graduality says that if we
+change the types in a program to be ``more precise''---e.g., by changing
+from the dynamic type to a more precise type such as integers or
+functions---the program will either produce the same behavior as the
+original or raise a dynamic type error.  Conversely, if a program does
+not error and some types are made ``less precise'' then the program
+behaves the same as before.  
 
 % We build on two strands of recent work. 
 % %
@@ -544,9 +545,9 @@ various assumptions.
 Our central application is to study when the $\beta, \eta$ equalities
 are satisfied in a gradually typed language.
 %
-The $\beta$ equalities are usually reductions of the language, so only
-the $\eta$ (extensionality) equalities for each type connective are in
-question.
+%% The $\beta$ equalities are usually reductions of the language, so only
+%% the $\eta$ (extensionality) equalities for each type connective are in
+%% question.
 %
 We approach this problem by a surprising tack: rather than defining the
 behavior of dynamic type casts and then verifying or invalidating the
@@ -555,24 +556,24 @@ $\beta$ and $\eta$ equality and then show that certain reductions of
 casts are in fact program equivalence \emph{theorems} deducible from the
 axioms of GTT.
 %
-Thus, any gradually typed language for which these reductions are not
-program equivalences is \emph{not} a model of the axioms of gradual type
-theory: so the language violates either compositionality, the gradual
-guarantee, or one of the $\beta, \eta$ axioms (in practice, usually
-$\eta$).
-
-The cast reductions that we show satisfy all of these constraints are
-those given by the ``lazy cast semantics'' (in the terminology of
-\cite{siek+09designspace})
-%
-For instance, a transient semantics, where only the first level connectives are
-checked, violates $\eta$ (for example, in the context
-${x : A_1 \times (A_2 \times A_3)}$ $\letXbeYinZ x {(x_1,(x_2,x_3))} 0$ is not
-equivalent to $0$; this is provable in GTT as a consequence of the $\eta$
-principle for $\times$).  Next, in a typical CBV ``eager cast semantics'' the
-$\beta, \eta$ principles for all of the eager datatypes $0, +, 1, \times$ will
-be satisfied, but the $\eta$ principle for the function type $\to$ is violated,
-i.e., there are values $V : A \to A'$ for which $V \neq \lambda x:A. V x $.
+Contrapositively, any gradually typed language for which these
+reductions are not program equivalences is \emph{not} a model of the
+axioms of gradual type theory: so the language violates either
+compositionality, the gradual guarantee, or one of the $\beta, \eta$
+axioms---and in practice, it is usually $\eta$.
+
+The cast reductions that we show satisfy all three constraints are
+those given by the ``lazy cast semantics''~\cite{findler-felleisen02,siek+09designspace}.  
+%
+For instance, a transient semantics, where only the top-level
+connectives are checked, violates $\eta$---for example, in transient
+semantics, ${x : A_1 \times (A_2 \times A_3)}$ $\letXbeYinZ x
+{(x_1,(x_2,x_3))} 0$ is not equivalent to $0$, but this is provable in
+GTT using the $\eta$ principle for $\times$.  Next, in a typical CBV
+``eager cast semantics'' the $\beta, \eta$ principles for all of the
+eager datatypes ($0, +, 1, \times$, lists, etc.) will be satisfied, but
+the $\eta$ principle for the function type $\to$ is violated, i.e.,
+there are values $V : A \to A'$ for which $V \neq \lambda x:A. V x $.
 %% We provide a concrete example in the appendix.
 %
 Thus, we argue that ``eager'' cast semantics is in fact overly eager: in
@@ -583,14 +584,14 @@ The reader unfamiliar with proof theory may find the centrality of
 $\eta$ equalities in our development unusual.  
 %
 Of course, the original $\eta$ law of the $\lambda$-calculus, which
-states that for any $\lambda$-term $M \equiv \lambda x. M x$, is
+states that any $\lambda$-term $M \equiv \lambda x. M x$, is
 restricted in a typed language to only hold for terms of function type $M
 : A \to B$.
 %
 While this famously ``fails'' to hold in call-by-value languages in the presence
-of effects (if $M$ is a program that prints \texttt{"hello"} before returning a
-function then $M$ will print \emph{now} whereas $\lambda x. M x$ will only print
-when given an argument), this is also not an obstacle, because the $\eta$ law is
+of effects: if $M$ is a program that prints \texttt{"hello"} before returning a
+function, then $M$ will print \emph{now}, whereas $\lambda x. M x$ will only print
+when given an argument.  But this is also not an obstacle, because the $\eta$ law is
 valid in some call-by-value languages (e.g. SML) if we have a ``value
 restriction'' $V \equiv \lambda x. V x$ or otherwise restrict to non-effectful
 programs.
@@ -600,7 +601,7 @@ connective, and be sensitive to the effects/evaluation order of the terms
 involved.
 %
 For instance, the $\eta$ principle for the boolean type $\texttt{Bool}$
-in call-by-value is that for any term $M$ with a free variable $x :
+\emph{in call-by-value} is that for any term $M$ with a free variable $x :
 \texttt{Bool}$, $M$ is equivalent to a term that performs an if
 statement on $x$: $M \equiv \kw{if} x (M[\texttt{true}/x])
 (M[\texttt{false}/x])$.
@@ -624,17 +625,16 @@ Any eager datatype, one whose elimination form is given by pattern
 matching such as $0, +, 1, \times, \mathtt{list}$, has a similar $\eta$
 principle which enables similar reasoning, such as proofs by induction.
 %
-The $\eta$ principles for negative types in call-by-name support dual
-reasoning, equating different ways to construct lazy functions, records,
-and streams.
+The $\eta$ principles for lazy types \emph{in call-by-name} support dual
+behavioral reasoning about lazy functions, records, and streams.
 
 \textbf{Overview of GTT.}  This paper unifies two strands of previous
-work.  The first~\cite{newahmed18} develops operational (logical relations)
-reasoning for gradual typing in a call-by-value setting, but does not
-develop an axiomatic proof theory for reasoning about gradual programs.
-The second~\cite{newlicata2018-fscd} develops an axiomatic gradual type theory, but
-only for negative/computation types in call-by-name (whereas all
-existing gradual languages are call-by-value), and gives
+work.  The first~\cite{newahmed18} develops operational (logical
+relations) reasoning for gradual typing in a call-by-value setting, but
+does not develop an axiomatic proof theory for it.  The
+second~\cite{newlicata2018-fscd} develops an axiomatic gradual type
+theory, but for only function and pair types in call-by-name (whereas
+all existing gradual languages are call-by-value), and gives
 domain-theoretic but not operational models of the axioms.
 
 In this paper, we develop an axiomatic gradual type theory for a unified
@@ -644,56 +644,60 @@ show that it is sound for contextual equivalence via a logical relations model
 (Sections~\ref{sec:contract}, \ref{sec:complex}, \ref{sec:operational}).
 Because the $\eta$ principles for types play a key role in our approach, it is
 necessary to work in a setting where we can have $\eta$ principles for both
-eager and lazy types.  One such setting is Levy's
+eager and lazy types.  We use Levy's
 Call-by-Push-Value~\citep{levy03cbpvbook} (CBPV), which fully and faithfully
-embeds both call-by-value and call-by-name evaluation, with both eager and lazy
+embeds both call-by-value and call-by-name evaluation with both eager and lazy
 datatypes,\footnote{The distinction between ``lazy'' vs ``eager'' casts above is
   different than lazy vs. eager datatypes.}  and underlies much recent work on
 reasoning about effectful programs~\cite{bauerpretnar13eff,lindley+17frank}.
-This allows us to prove results about existing call-by-value gradually typed
+This gradual type theory can prove results in and about existing call-by-value gradually typed
 languages, and also suggests a design for call-by-name and even full
 call-by-push-value gradually typed languages.
 
-We show that the axiomatic gradual type theory approach of \cite{newlicata2018-fscd}
-applies to call-by-(push)-value.  In this approach, general casts are
-decomposed into upcasts and downcasts, as suggested above.  A \emph{type
-  dynamism} relation (corresponding to type precision~\cite{refined} and
-na\"ive subtyping~\cite{wadler-findler09}) controls which casts exist: a
-type dynamism $A \ltdyn A'$ induces an upcast from $A$ to $A'$ and a
-downcast from $A'$ to $A$.  Then, a \emph{term precision} judgement is
-used for equational/approximational reasoning about programs.  Term
-precsion relates two terms whose types
-are related by type precision, and the upcasts and downcasts are each
-\emph{specified} by certain term precision judgements holding.  This
-specification is entirely independent of the types involved in the
-casts, so cast reductions can be \emph{proved from it}, rather than
-stipulated in advance.  The specification defines the
-casts ``uniquely up to equivalence'', which means that any two
-implementations satisfying it are behaviorly equivalent.
-
-One of our main contributions (Section~\ref{sec:gtt}) is to generalize this
-axiomatic approach to call-by-push-value, where there are both eager/value types
-and lazy/computation types. This is both a subtler question than it might at
-first seem, and has a surprisingly nice answer: we find that upcasts are
-naturally associated with eager/value types and downcasts with lazy/computation
-types, and that the modalities relating values and computations induce the
-downcasts for eager/value types and upcasts for lazy/computation types.
-Moreover, this analysis articulates an important behavioral property of casts
-that was proved operationally for call-by-value in \cite{newahmed18} but missed
-for call-by-name in \cite{newlicata2018-fscd}: upcasts for eager types and
-downcasts for lazy types are both ``pure'' in a suitable sense, which enables
-more refactorings and program optimizations.  In particular, we show that these
-casts can be taken to be (and are essentially forced to be) ``complex values''
-and ``complex stacks'' (respectively) in call-by-push-value, which corresponds
-to a behavioral property of \emph{thunkability} and
+In the proior work on call-by-name gradual type
+theory~\cite{newlicata2018-fscd}, general casts are decomposed into
+upcasts and downcasts, as suggested above.  A \emph{type dynamism}
+relation (corresponding to type precision~\cite{refined} and na\"ive
+subtyping~\cite{wadler-findler09}) controls which casts exist: a type
+dynamism $A \ltdyn A'$ induces an upcast from $A$ to $A'$ and a downcast
+from $A'$ to $A$.  Then, a \emph{term dynamism} judgement is used for
+equational/approximational reasoning about programs.  Term dynamism
+relates two terms whose types are related by type dynamism, and the
+upcasts and downcasts are each \emph{specified} by certain term
+dynamism judgements holding.  This specification is entirely
+independent of the types involved in the casts, so cast reductions can
+be \emph{proved from it}, rather than stipulated in advance.  The
+specification defines the casts ``uniquely up to equivalence'', which
+means that any two implementations satisfying it are behaviorly
+equivalent.
+
+One of our main contributions (Section~\ref{sec:gtt}) is to generalize
+this axiomatic approach to call-by-push-value, where there are both
+eager/value types and lazy/computation types. This is both a subtler
+question than it might at first seem, and has a surprisingly nice
+answer: we find that upcasts are naturally associated with eager/value
+types and downcasts with lazy/computation types, and that the modalities
+relating values and computations induce the downcasts for eager/value
+types and upcasts for lazy/computation types.  Moreover, this analysis
+articulates an important behavioral property of casts that was proved
+operationally for call-by-value in \cite{newahmed18} but missed for
+call-by-name in \cite{newlicata2018-fscd}: upcasts for eager types and
+downcasts for lazy types are both ``pure'' in a suitable sense, which
+enables more refactorings and program optimizations.  In particular, we
+show that these casts can be taken to be (and are essentially forced to
+be) ``complex values'' and ``complex stacks'' (respectively) in
+call-by-push-value, which corresponds to a behavioral property of
+\emph{thunkability} and
 \emph{linearity}~\cite{munchmaccagnoni14nonassociative}.  We argue in
-Section~\ref{sec:related} that this property is related to blame soundness.  Our
-gradual type theory naturally has two dynamic types, a dynamic eager/value type
-and a dynamic lazy/computation type, where the former can be thought of as a sum
-of all possible values, and the latter as a product of all possible behaviors.
-These axioms imply that, for a variety of eager/value and lazy/computation
-types, the ``lazy contract'' semantics of casts is the unique implementation
-satisfying $\beta,\eta$ and graduality (Section~\ref{sec:theorems-in-gtt}).
+Section~\ref{sec:related} that this property is related to blame
+soundness.  Our gradual type theory naturally has two dynamic types, a
+dynamic eager/value type and a dynamic lazy/computation type, where the
+former can be thought of as a sum of all possible values, and the latter
+as a product of all possible behaviors.  At the language design level,
+gradual type theory can be used to prove that, for a variety of
+eager/value and lazy/computation types, the ``lazy contract'' semantics
+of casts is the unique implementation satisfying $\beta,\eta$ and
+graduality (Section~\ref{sec:theorems-in-gtt}).  
 
 %% Morever, working in a setting with mixed eager/value and
 %% lazy/computation types often leads to more insight into each than
@@ -1480,7 +1484,7 @@ relation for call-by-value gradual typing~\cite{newahmed18} proved that all
 upcasts had this property (but this depended on the only effects being
 divergence and type error).  In GTT, we can make this property explicit
 in the syntax of the casts, by making the upcast $\upcast{A}{A'}$
-induced by a value type precision $A \ltdyn A'$ itself a complex value,
+induced by a value type dynamism $A \ltdyn A'$ itself a complex value,
 rather than computation.  On the other hand, a downcast between value
 types is typically implemented as a case-analysis looking for a specific
 tag, and erroring otherwise, and so should not be a complex value.
@@ -1510,7 +1514,7 @@ type dynamisms, it suffices to put in only \emph{upcasts} for
 \emph{value} type dynamisms
 and \emph{downcasts} for \emph{computation} type dynamisms, because of
 monotonicity of type dynamism for $U$/$\u F$ types.  The
-\emph{downcast} for a \emph{value} type precision $A \ltdyn A'$, as a
+\emph{downcast} for a \emph{value} type dynamism $A \ltdyn A'$, as a
 stack $\bullet : \u F A' \vdash \dncast{\u F A}{\u F A'}{\bullet} : \u F
 A$ as described above, is obtained from $\u F A \ltdyn \u F A'$ as
 computation types.  The upcast for a computation type precsion $\u B
@@ -2345,7 +2349,7 @@ $\upcast{A}{\dynv}{x} \equidyn \upcast{A'}{\dynv}{\upcast{A}{A'}{x}}$.
  less but but otherwise does not change the behavior) and
  $\upcast{T}{T'}{\dncast{T}{T'}{x}} \ltdyn x$ (downcast then upcast
  might error more but otherwise does not change the behavior).  However,
- since a value type precision $A \ltdyn A'$ induces a value upcast $x :
+ since a value type dynamism $A \ltdyn A'$ induces a value upcast $x :
  A \vdash \upcast{A}{A'}{x} : A'$ but a stack downcast $\bullet : \u F
  A' \vdash \dncast{\u F A}{\u F A'}{\bullet} : \u F A$ (and dually for
  computations), the statement of the theorem wraps one cast with 
@@ -3630,7 +3634,7 @@ This suggests taking $\bot_v := 0$ and $\bot_c := \top$.
 %% dynamic value type should have an upcast complex value into every type,
 %% which the empty type $0$ does.  A least dynamic computation type should
 %% have a complex stack from every type, which the \emph{unit} type $\top$
-%% does.  Thus, it is sensible to extend GTT with the type precision axioms
+%% does.  Thus, it is sensible to extend GTT with the type dynamism axioms
 %% $0 \ltdyn A$ and $\top \ltdyn \u B$, in which case we have:
 %% \end{shortonly}
 \begin{theorem}