diff --git a/paper/gtt.tex b/paper/gtt.tex
index 2a61d07f74b73042869dd1aa375aa3e066203316..a83742b9c9a1e73088cd7395ea17fa0f6251764f 100644
--- a/paper/gtt.tex
+++ b/paper/gtt.tex
@@ -446,6 +446,63 @@ expressing the actual desired properties of the gradual language,
 which are \emph{program equivalences in the typed portion of the code} that are
 not valid in the dynamically typed portion. 
 
+Such program equivalences typically include $\beta$-like principles,
+which arise from computation steps, as well as \emph{$\eta$ equalities},
+which express the uniqueness or universality of certain constructions.
+%% The reader unfamiliar with proof theory may find the centrality of
+%% $\eta$ equalities in our development unusual.  
+%
+%%Of course,
+The $\eta$ law of the untyped $\lambda$-calculus, which
+states that any $\lambda$-term $M \equiv \lambda x. M x$, is
+restricted in a typed language to only hold for terms of function type $M
+: A \to B$---$\lambda$ is the unique/universal way of making an element
+of the function type.  
+%
+This famously ``fails'' to hold in call-by-value languages in the
+presence of effects: if $M$ is a program that prints \texttt{"hello"}
+before returning a function, then $M$ will print \emph{now}, whereas
+$\lambda x. M x$ will only print when given an argument. But this can be
+accomodated with one further modification: the $\eta$ law is valid in
+simple call-by-value languages\footnote{This does not hold in languages
+  with some intensional feature of functions such as reference
+  equality. We discuss the applicability of this theorem more generally
+  in section \ref{sec:related}} (e.g. SML) if we have a ``value
+restriction'' $V \equiv \lambda x. V x$.
+%
+This illustrates that $\eta$/extensionality rules must be stated for
+each type connective, and be sensitive to the effects/evaluation order
+of the terms involved.
+%
+For instance, the $\eta$ principle for the boolean type $\texttt{Bool}$
+\emph{in call-by-value} is that for any term $M$ with a free variable $x :
+\texttt{Bool}$, $M$ is equivalent to a term that performs an if
+statement on $x$: $M \equiv \kw{if} x (M[\texttt{true}/x])
+(M[\texttt{false}/x])$.
+%
+If we have an \texttt{if} form that is strongly typed (i.e., errors on
+non-booleans) then this tells us that it is \emph{safe} to run an if
+statement on any input of boolean type (in CBN, by contrast an if
+statement forces a thunk and so is not safe).
+%
+In addition, even if our \texttt{if} statement does some kind of
+coercion, this tells us that the term $M$ only cares about whether $x$
+is ``truthy'' or ``falsy'' and so a client is free to change e.g. one
+truthy value to a different one without changing behavior.
+%
+This $\eta$ principle justifies a number of program optimizations,
+such as dead-code and common subexpression elimination, and 
+hoisting an if
+statement outside of the body of a function if it is well-scoped
+($\lambda x. \kw{if} y \, M \, N \equiv \kw {if} y \, (\lambda x.M) \, (\lambda x.N)$).
+%
+Any eager datatype, one whose elimination form is given by pattern
+matching such as $0, +, 1, \times, \mathtt{list}$, has a similar $\eta$
+principle which enables similar reasoning, such as proofs by induction.
+%
+The $\eta$ principles for lazy types \emph{in call-by-name} support dual
+behavioral reasoning about lazy functions, records, and streams.
+
 \textbf{An Axiomatic Approach to Gradual Typing}
 In this paper, we systematically study questions of program equivalence
 for a class of gradually typed languages by working in an
@@ -638,54 +695,6 @@ well-known (cite?), a novel consequence of our development is that
 \emph{only} the lazy cast semantics satisifies both graduality and
 $\eta$.
 
-The reader unfamiliar with proof theory may find the centrality of
-$\eta$ equalities in our development unusual.  
-%
-Of course, the $\eta$ law of the untyped $\lambda$-calculus, which
-states that any $\lambda$-term $M \equiv \lambda x. M x$, is
-restricted in a typed language to only hold for terms of function type $M
-: A \to B$.
-%
-This famously ``fails'' to hold in call-by-value languages in the presence
-of effects: if $M$ is a program that prints \texttt{"hello"} before returning a
-function, then $M$ will print \emph{now}, whereas $\lambda x. M x$ will only print
-when given an argument. But this can be accomodated with one further modification: the $\eta$ law is
-valid in simple call-by-value languages\footnote{This does not hold in languages with some intensional feature of functions such as reference equality. We discuss the applicability of this theorem more generally in section \ref{sec:related}} (e.g. SML) if we have a ``value
-restriction'' $V \equiv \lambda x. V x$.
-%
-This illustrates that $\eta$/extensionality rules must be stated for each type
-connective, and be sensitive to the effects/evaluation order of the terms
-involved.
-%
-For instance, the $\eta$ principle for the boolean type $\texttt{Bool}$
-\emph{in call-by-value} is that for any term $M$ with a free variable $x :
-\texttt{Bool}$, $M$ is equivalent to a term that performs an if
-statement on $x$: $M \equiv \kw{if} x (M[\texttt{true}/x])
-(M[\texttt{false}/x])$.
-%
-If we have an \texttt{if} form that is strongly typed (i.e., errors on
-non-booleans) then this tells us that it is \emph{safe} to run an if
-statement on any input of boolean type (in CBN, by contrast an if
-statement forces a thunk and so is not safe).
-%
-In addition, even if our \texttt{if} statement does some kind of coercion, this
-tells us that the term $M$ only cares about whether $x$ is ``truthy'' or
-``falsy'' and so a client is free to change e.g. one truthy value
-to a different one without changing behavior.
-%
-This $\eta$ principle justifies a number of program optimizations,
-such as dead-code and common subexpression elimination, and 
-hoisting an if
-statement outside of the body of a function if it is well-scoped
-($\lambda x. \kw{if} y \, M \, N \equiv \kw {if} y \, (\lambda x.M) \, (\lambda x.N)$).
-%
-Any eager datatype, one whose elimination form is given by pattern
-matching such as $0, +, 1, \times, \mathtt{list}$, has a similar $\eta$
-principle which enables similar reasoning, such as proofs by induction.
-%
-The $\eta$ principles for lazy types \emph{in call-by-name} support dual
-behavioral reasoning about lazy functions, records, and streams.
-
 \textbf{Technical Overview of GTT.}  The gradual type theory developed
 in this paper unifies our previous work~\citep{newahmed18} on
 operational (logical relations) reasoning for gradual typing in a