The claim is that there exists a dense linear order with no endpoints that is not isomorphic to the real line. First we observe that if \(M\) is a nonstandard model of arithmetic, then \(M\) has ordertype \(\mathbb{N}+\mathbb{Z}\theta\), where \(\theta\) is the ordertype of some dense linear order without endpoints. In other words, every nonstandard model of arithmetic looks like the natural numbers followed by many copies of the integers, arranged so that between any two copies there is another copy, and every \(\mathbb{Z}\)-chain is preceded and followed by other \(\mathbb{Z}\)-chains.
To see this, note that the standard natural numbers \(0, s0, ss0,\ldots\) make up the \(\mathbb{N}\) part of \(M\), and every nonstandard natural number \(a\) sits in a \(\mathbb{Z}\)-chain \(\ldots, a-2, a-1, a, a+1, a+2,\ldots\). For no endpoints: take any \(\mathbb{Z}\)-chain and a nonstandard \(a\) on it; \(2a\) and \(a/2\) (or \((a+1)/2\) if \(a\) is odd) must sit on different \(\mathbb{Z}\)-chains. Why? If \(a\) and \(2a\) were on the same chain, then \(2a\) could be reached from \(a\) by standardly many applications of the successor operation, contradicting the assumption that \(a\) is nonstandard. For denseness the idea is similar: take nonstandard \(a,b\) on different chains; then \((a+b)/2\) (or \((a+b+1)/2\) if \(a+b\) is odd) must sit on a chain between them, because otherwise the midpoint of \(a\) and \(b\) could be reached either by standardly many applications of the successor operation from \(a\) or of the predecessor operation from \(b\), contradicting the assumption that \(a\) and \(b\) are nonstandard and sit on different chains.
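For a countable nonstandard model, \(\theta\) is a countable dense linear order without endpoints, hence (by Cantor's isomorphism theorem) just \(\mathbb{Q}\), so the order type \(\mathbb{N}+\mathbb{Z}\mathbb{Q}\) is concrete enough to play with in code. A small sketch, with all encodings my own:

```python
from fractions import Fraction

# An element is either ('std', n) for a standard natural number n,
# or ('ns', q, z): the z-th element of the Z-chain indexed by q in Q.
def less(a, b):
    if a[0] == 'std' and b[0] == 'std':
        return a[1] < b[1]
    if a[0] == 'std':          # every standard number precedes every nonstandard one
        return True
    if b[0] == 'std':
        return False
    return (a[1], a[2]) < (b[1], b[2])  # compare chain index, then position in the chain

def same_chain(a, b):
    # two nonstandard elements lie on the same Z-chain iff their chain indices agree,
    # i.e. iff they differ by a standard (finite) amount
    return a[0] == b[0] == 'ns' and a[1] == b[1]

a = ('ns', Fraction(1), 0)
b = ('ns', Fraction(2), -5)

# denseness of the chains: between any two chains there is a third
mid = ('ns', (a[1] + b[1]) / 2, 0)
assert less(a, mid) and less(mid, b)
assert not same_chain(a, mid) and not same_chain(mid, b)

# no last chain: the chain indexed by q+1 lies above the chain indexed by q
assert less(b, ('ns', b[1] + 1, 0))
```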
Next, by the Löwenheim–Skolem theorem there must be nonstandard models of arithmetic of cardinality \(\vert \mathbb{R}\vert\). If all dense linear orders without endpoints of this size were isomorphic, then such a model would have ordertype \(\mathbb{N}+\mathbb{Z}\mathbb{R}\). We now argue that there is no such model. This argument is due to Klaus Potthoff.
Assume there is a nonstandard model of arithmetic of ordertype \(\mathbb{N}+\mathbb{Z}\mathbb{R}\). Take a nonstandard natural number \(a\), and consider \((na\mid 0<n\in \mathbb{N})\). Writing \(\mathbb{Z}_{r_n}\) for the chain that contains \(na\), we observe that the sequence \(r_n\) is an increasing sequence of real numbers bounded above (by the index of the chain that contains \(a^2\), for example). So the \(r_n\)’s converge to some \(r\). Consider the chain \(\mathbb{Z}_r\) and take an element \(b\) in it (if there’s any multiple of \(a\) in it, then choose \(b\) smaller than that). Let \(X:=\{c\mid a \text{ divides } c \wedge c<b\}\). Then the standard natural numbers can be defined in \(M\) as \(\{n\mid na\in X\}\), contradicting the fact that the standard natural numbers are not definable in nonstandard models.
Putting everything together, there must be dense linear orders without endpoints of size continuum that are not isomorphic to the real line.
The following screenshot is taken from the notes of Jörg Brendle’s Bogotá lectures on forcing and the structure of the real line.
This post provides a proof or two of the remark that in any extension which adds reals, the ground model reals have inner measure zero.
Theorem. Assume \(M\) is a model of ZFC, possibly a proper class. If there is a real number not in \(M\), then the reals in \(M\) have inner measure zero.
Proof using \([0,1]\). Letting \(a\) denote a real not in \(M\), consider the translates \(A_n=\{r+\frac{a}{n}\mid r\in [0,1]^M\}\). The \(A_n\)’s are pairwise disjoint, because otherwise (say \(q+\frac{a}{n}=r+\frac{a}{m}\) with \(n\neq m\)) \(a\) would be definable in \(M\) as the unique real solution to the equation \(q+\frac{x}{n}=r+\frac{x}{m}\), contradicting the assumption that \(a\notin M\).
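For the record, the disjointness step comes down to the fact that for \(n\neq m\) the equation \(q+\frac{x}{n}=r+\frac{x}{m}\) is linear in \(x\) with nonzero coefficient \(\frac{1}{n}-\frac{1}{m}\), hence has exactly one solution. A quick sanity check in exact rational arithmetic (the sample values are arbitrary):

```python
from fractions import Fraction

def solve(q, r, n, m):
    # q + x/n = r + x/m  =>  x*(1/n - 1/m) = r - q  =>  x = (r - q) * n * m / (m - n)
    return Fraction(r - q) * n * m / (m - n)

q, r, n, m = Fraction(1, 3), Fraction(5, 7), 2, 5
x = solve(q, r, n, m)
assert q + x / n == r + x / m   # x really solves the equation
# uniqueness: the equation is linear in x with nonzero coefficient 1/n - 1/m,
# so there is no second solution; in the proof this pins down the missing real a
```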
But translation preserves inner measure, and \(\bigcup_n A_n\) is bounded. So if \([0,1]^M\) had inner measure anything other than zero, then \(\bigcup_n A_n\) would have infinite inner measure, contradicting boundedness. \(\square\)
Proof using \(2^\omega\). Given a real \(a\notin M\), consider the flip maps induced by \(a\). That is, for each natural number \(n\), let \(F_n\) flip the \((n+k)^\text{th}\) bit of \(x\in 2^\omega\) iff \(a(k)=1\), leaving the bits below position \(n\) unchanged. In other words, \(F_n(x)(n+k)=1-x(n+k)\) iff \(a(k)=1\), otherwise \(F_n(x)(n+k)=x(n+k)\). Now mirroring the proof above, let \(A_n=F_n[(2^\omega)^M]\).
First notice that if \(F_n(x)=F_m(y)\) for \(n\neq m\), then \(x\neq y\) (this is most easily proven by taking the contrapositive and using the fact that \(a\) has at least one \(1\)). Next, I claim that the \(A_n\)’s must be pairwise disjoint. This is because if \(F_n(x)=F_m(y)\) for \(n\neq m\), then \(a\) is definable as the unique real that makes this true (recall that the \(F\)’s are defined from \(a\)).
To see why \(a\) is unique: suppose not. Then there are \(a\neq a'\) witnessing the corresponding \(F_n(x)=F_m(y)\) and \(F'_n(x)=F'_m(y)\), where the \(F'\)’s are the flip maps induced by \(a'\). Now let \(k\) be the first place where \(a\) differs from \(a'\), and assume without loss of generality that \(a(k)=0\) and \(n<m\).
Observe: \(F_m(y)(n+k)=F'_m(y)(n+k)\). This is because if \(n+k<m\), then the equality holds by definition of \(F_m\) and \(F'_m\). On the other hand, if \(n+k=m+l\) for some \(l\), then \(a(l)=a'(l)\) by minimality of \(k\) and the assumption that \(n<m\). Hence \(F_m(y)(m+l)=F'_m(y)(m+l)\).
Now to arrive at a contradiction, notice that we have:
\[\begin{align*} F_n(x)(n+k) & = F_m(y)(n+k)\\ & = F'_m(y)(n+k) \\ & = F'_n(x)(n+k) \end{align*}\]But this cannot be true, since \(a\) tells \(F_n\) to keep the \((n+k)^\text{th}\) bit of \(x\), whereas \(a'\) tells \(F'_n\) to flip it. \(\square\)
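The flip maps are easy to experiment with on finite truncations. Below is a small sketch (function names are mine, not from the text) that implements \(F_n\) on bit lists and checks the uniqueness step of the argument: given \(n<m\) and \(F_n(x)=F_m(y)\), the bits of \(a\) are forced one by one, directly for positions below \(m\) and via the recursion \(a(i-n)=x(i)\oplus y(i)\oplus a(i-m)\) afterwards.

```python
def flip(a, n, x):
    # F_n flips the (n+k)-th bit of x exactly when a(k) = 1
    return [x[i] ^ a[i - n] if i >= n else x[i] for i in range(len(x))]

def recover_a(x, y, n, m):
    # solve F_n(x) = F_m(y) for a, assuming n < m: for n <= i < m the equation
    # reads x[i] ^ a[i-n] = y[i], and for i >= m it reads
    # x[i] ^ a[i-n] = y[i] ^ a[i-m], so each bit of a is forced in turn
    a = [None] * (len(x) - n)
    for i in range(n, len(x)):
        a[i - n] = x[i] ^ y[i] if i < m else x[i] ^ y[i] ^ a[i - m]
    return a

a = [1, 0, 1, 1, 0, 1, 0, 0]
x = [0, 1, 1, 0, 1, 0, 1, 1]
n, m = 1, 3
y = flip(a, m, flip(a, n, x))       # each F_m is an involution, so F_m(y) = F_n(x)
assert flip(a, m, y) == flip(a, n, x)
assert recover_a(x, y, n, m) == a[:len(x) - n]   # a is pinned down, as the proof needs
```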
The lemma before the remark is meant to show that random forcing preserves outer measure. So after forcing to add a random real, the ground model reals have outer measure 1 but inner measure 0, making them non-measurable. Similarly, after adding a Cohen real, the ground model reals don’t have the property of Baire.
“People would usually spend a whole class in computability theory proving this. What they are doing is they are very carefully proving the Baire category theorem without explicitly saying it.”
I’ve recently learned of a pretty neat proof of the existence of incomparable Turing degrees. And this reminds me that I’ve actually seen quite a few funny (nuking-the-mosquito-type) proofs of this statement, so I decided to record them here.
The first proof was shown to me today by Andrew Marks. You can do this with either measure or category:
Consider the relation \(R(x,y)\Leftrightarrow x\leq_T y\). First notice that this is a Borel subset of \(\mathbb{R}\times \mathbb{R}\) (take your favorite interpretation of what \(\mathbb{R}\) is). So it is measurable/has the Baire property. Second, observe that each section \(R_y\) is countable, and so it is a null/meager subset of \(\mathbb{R}\). By Fubini’s theorem/Kuratowski-Ulam theorem, \(R\) is a null/meager subset of \(\mathbb{R}^2\). The analogous argument works to show that \(R^{-1}\) is also null/meager.
Hence, \(R\cup R^{-1}\), the set containing all pairs that are Turing comparable, is null/meager. Therefore the set of pairs \((x,y)\) such that \(x,y\) are Turing-incomparable is measure one/comeager.
The second proof I saw on the internet (for example here). It uses the observation that the continuum hypothesis follows from total comparability of the Turing degrees: each real computes only countably many reals, so the reals ordered by \(\leq_T\) would form an uncountable linear order in which every proper initial segment is countable. This implies there are at most \(\omega_1\) many reals, so CH holds.
Now just force to negate CH. In the extension, \(\leq_T\) is not a linear order. But the sentence “there exist two reals that are incomparable” is \(\Sigma^1_1\), and hence by Mostowski absoluteness it already holds in the ground model.
The third proof is somewhat similar to the second. I came up with it when I was thinking about the question “if a real is computable from a comeager set of reals, is it computable?” The measure analogue of this is true, and it was the first interaction between computability theory and measure theory; that result was independently obtained by Sacks and by De Leeuw–Moore–Shannon–Shapiro. The answer to my question is also yes, and the argument I came up with had already appeared in Andreas Blass’s Needed reals and recursion in generic reals 20 years ago.
The proof goes like this: force to add two Cohen reals; then neither computes the other. But again it’s \(\Sigma^1_1\) to say that there exist two incomparable reals, and so this already holds true in the ground model.
The key fact used in the proof is that Cohen reals carry no computational power. I think this is an independently interesting fact, so I’ll end this post with a properly written proof.
Theorem. Let \(M\) be a countable transitive model of enough of ZFC, and let \(x\) be a real in \(M\) and \(c\) a Cohen real over \(M\). If \(x\) is computable relative to \(c\), then \(x\) is computable.
Proof. If \(x\) is computed by the Turing program \(\Phi^c_e\), then this fact also holds true in \(M[c]\), and so by the forcing theorem this is forced by some condition \(p\). That is,
\[p\Vdash \text{ the } \check e\text{th Turing program in the oracle }\dot c \text{ computes } \check x\]For any \(i\in\omega\) we compute \(x(i)\) as follows: run \(\Phi^s_e(i)\) for all the \(s\) extending \(p\).
As soon as any of these computations halts, the output will be the correct value of \(x(i)\). This is because: if \(s_0,s_1\) are two different conditions extending \(p\) and \(\Phi^{s_0}_e(i)=0\neq 1=\Phi^{s_1}_e(i)\), then we can build two different \(M\)-generic filters \(G_0\) and \(G_1\) containing \(s_0,s_1\) respectively. Now \(M[G_0]\) and \(M[G_1]\) will both think \(x\) is computed by the \(e\)th program with oracle \(\dot c\) (since both filters contain \(p\); note that they interpret \(\dot c\) differently, but that doesn’t matter). So \(M[G_0]\) thinks that \(x(i)=0\) and \(M[G_1]\) thinks \(x(i)=1\). But whatever \(x(i)\) is, this is an absolute fact about \(x\in M\), so it should be answered in the same way by all transitive models extending \(M\). Contradiction! \(\square\)
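The search in this proof is ordinary dovetailing over finite conditions. Here is a toy sketch of it; `phi` is a hypothetical stand-in for running \(\Phi_e\) with a finite oracle, returning `None` when the oracle is too short for the computation to halt, and the search bound is artificial:

```python
from itertools import product

def compute_bit(phi, p, i, max_len=20):
    # run phi(s, i) on every finite 0-1 string s extending the condition p,
    # in order of length; the first halting computation gives the right answer,
    # since (by the argument above) no two conditions extending p can disagree
    for length in range(len(p), max_len + 1):
        for tail in product((0, 1), repeat=length - len(p)):
            s = list(p) + list(tail)
            out = phi(s, i)
            if out is not None:
                return out
    return None  # artificial bound reached (true dovetailing would keep going)

# toy machine: "output bit i of the oracle", divergent when the oracle is too short;
# with this phi, the value of x(i) is forced whenever p already decides position i
phi = lambda s, i: s[i] if i < len(s) else None
p = [1, 0, 1]
assert compute_bit(phi, p, 1) == 0   # forced by the condition p itself
```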
I remember being puzzled by the following passage from the chapter on Borel equivalence relations (by Greg Hjorth) in the Handbook of Set Theory.
The claim is that there is a Borel map \(f:2^\omega\to 2^\omega\) that reduces identity to eventual equality. In other words, \(f\) is such that
\[x=y\Leftrightarrow (f(x)(n)=f(y)(n) \text{ for all but finitely many } n)\]It is well-known (or maybe I should say well-documented?^{1}) that the existence of such a map is equivalent to saying that there is a perfect set of inequivalent elements for the following equivalence relation denoted \(E_0\):
\[xE_0 y \Leftrightarrow x(n)=y(n) \text{ for all but finitely many } n\]which is to say that \(x\) and \(y\) are eventually equal.
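One explicit reduction witnessing this (my own rendering of a standard construction) sends \(x\) to the concatenation of its initial segments of lengths \(1, 2, 3, \ldots\): equal inputs give equal outputs, while if \(x(k)\neq y(k)\) then every block of length greater than \(k\) contains a disagreement, so the outputs differ infinitely often. A finite-approximation sketch:

```python
def reduce_prefix(x, num_blocks):
    # finite approximation of f(x) = (x restricted to 1)(x restricted to 2)... ,
    # i.e. the concatenation of prefixes; x is a function from naturals to {0,1}
    out = []
    for n in range(1, num_blocks + 1):
        out.extend(x(i) for i in range(n))
    return out

x = lambda i: i % 2                     # 0,1,0,1,...
y = lambda i: 1 if i == 4 else i % 2    # differs from x exactly at position 4

fx, fy = reduce_prefix(x, 30), reduce_prefix(y, 30)
diffs = [i for i in range(len(fx)) if fx[i] != fy[i]]
# one disagreement per block of length > 4: the images are NOT eventually equal
assert len(diffs) == 30 - 4
assert reduce_prefix(x, 30) == reduce_prefix(x, 30)   # equal inputs, equal outputs
```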
Hjorth says it’s routine to prove the existence of a perfect set of mutually generic reals in the Cantor space. This is puzzling at first glance: mutual genericity is a notion from forcing, which is typically used for proving consistency results rather than existence results.
It turns out that this is one of those cases where one can prove an existence claim using forcing. The trick, of course, is to appeal to Shoenfield absoluteness.
To see this, consider the statement: there is a perfect set such that any two elements are not eventually equal. Now perfect sets are coded by perfect trees, which can in turn be coded by a single real. So this sentence is really saying that there is a real coding a perfect tree, such that any two branches (i.e., real numbers tracing these branches) are not eventually equal.
Since it is arithmetic to say two reals are not eventually equal, this makes the statement \(\Sigma^1_2\). So if we can force this, this already holds true in the ground model by Shoenfield absoluteness.
But then this is easy to prove: the forcing to add a perfect set of Cohen reals^{2} will make this true. This is because the perfect set of Cohen reals will pairwise fail to be eventually equal: if some \(c_1,c_2\) were eventually equal, then from \(c_1\) we could define \(c_2\) by only changing \(c_1\) on some finite initial segment. That would contradict the fact that \(c_1\) and \(c_2\) are mutually generic.
So I think this is what Hjorth means to say with mutual genericity.
Of course, since Cohen forcing is essentially a Baire-category method, I’ve committed theft over honest toil by sweeping under the Cohen rug any mention of “dense, meager, comeager”, etc. The interested reader can find an argument using Baire category notions in Su Gao’s Invariant Descriptive Set Theory, Theorem 5.3.1, a stronger theorem which is attributed to Mycielski.
See, for example, Proposition 5.1.12 in Su Gao’s Invariant Descriptive Set Theory. ↩
What I intend to achieve here is somewhat particular: after a first course in formal logic, set theory, or Gödel’s theorems, a motivated student might see the term “forcing” pop up on the occasional Google chase. They might be intrigued as to why this technique won its discoverer, Paul Cohen, a Fields Medal. But “what is forcing” is a question that is highly difficult to address in office hours, because there are quite a few moving pieces involved. Each of these pieces brings its own special kind of unease to the student, making the whole thing quite daunting. It is simply my hope here to record these moving pieces and their associated difficulties.
Due to this peculiar aim, the article is sprinkled with pausing remarks (click to expand) on what is difficult about the matter at hand and whether the difficulty is conceptual or technical. My judgment is that forcing is not substantially harder than the other main theorems in graduate math textbooks. It is simply due to the variety of concepts involved and the unfamiliarity of the tools used that forcing has gained some kind of notoriety.
An earlier simpler draft of this post was first published in Chinese on the Q & A platform Zhihu. One may view the original version here: link to the Chinese version.
Forcing is the name of a technical method to establish independence results in set theory. These are results saying that certain statements cannot be deduced from the axioms of set theory (the Zermelo-Fraenkel axioms with Choice).
To show that a statement cannot be deduced from a theory, one shows that the negation of the statement is consistent with the theory. Given a mathematical theory $T$, we write $\mathsf{Con}(T)$ for the statement that $T$ is consistent: i.e., there is no proof that starts with the axioms of $T$ and ends with a contradiction.
If for some sentence $P$ we succeed in showing $\mathsf{Con}(\mathsf{ZFC}+P)$, then we have succeeded in showing that $\mathsf{ZFC}$ cannot refute $P$. This is because if $\mathsf{ZFC}$ refutes $P$, then this means $\mathsf{ZFC}$ proves $\neg P$ (“not-P”); therefore from the theory $\mathsf{ZFC}+P$ one can deduce $P \wedge \neg P$, a contradiction, by doing the following: first deduce $\neg P$ from $\mathsf{ZFC}$ alone, and then deduce $P$.
Forcing allows one to conclude, under suitable assumptions laid out below, $\mathsf{Con}(\mathsf{ZFC}+P)$ for various choices of $P$. Different people have different ways of understanding and implementing it, but arguably the most straightforward way to make sense of it is that it is a method of building new models of ZFC from old ones.
It turns out that talking about proofs is not the easiest thing to do. It is easier to work with models instead. A model of a theory, roughly put, is just a set of things that satisfies that theory. The connection between proofs and models was confirmed by Kurt Gödel in his doctoral thesis:
Fact. (Gödel’s Completeness Theorem): $\mathsf{Con}(T)$ if and only if $T$ has a model.
So if our goal is to show that $\mathsf{ZFC}$ cannot deduce $P$ (so we want to show $\mathsf{Con}(\mathsf{ZFC}+\neg P)$), it is enough to build a model of $\mathsf{ZFC}+\neg P$. This shifts the focus from talking about proofs, which are strings of symbols generated by some fixed rule, to models, which are perhaps more tangible and familiar.
Unfortunately, one cannot just build a model of $\mathsf{ZFC}$ out of thin air. Another theorem by Gödel, his second incompleteness theorem, in simple terms states:
If $\mathsf{ZFC}$ proves that $\mathsf{ZFC}$ has a model, then it can also deduce a contradiction.
So if $\mathsf{ZFC}$ is consistent to begin with, we cannot use it to deduce $\mathsf{Con}(\mathsf{ZFC})$, let alone $\mathsf{Con}(\mathsf{ZFC}+P)$. This leaves us with the second-best option: assume that $\mathsf{ZFC}$ is consistent, and then show $\mathsf{Con}(\mathsf{ZFC}+P)$.
This is where models come in: assuming $\mathsf{ZFC}$ is consistent provides us with a model of it. With forcing, one can carefully pick out an element, add it to the model, and make sure that the resulting model does what one wants.
Models of $\mathsf{ZFC}$ are very complex. A model of $\mathsf{ZFC}$ will contain everything that $\mathsf{ZFC}$ proves to exist (i.e., pretty much all of math). In particular, it proves that the following set exists: $\{f \mid f: \mathbb{N}\to \{0,1\}\}$. Intuitively, this is the set of all infinite sequences of zeroes and ones. There is a way of viewing them as the real numbers, which is why set theorists and recursion theorists often like to refer to elements of this set as the real numbers. Moreover, by Cantor’s diagonal argument, the set $\{f \mid f: \mathbb{N}\to \{0,1\}\}$, which we denote by $2^\mathbb{N}$, is uncountable.
Paul Cohen invented forcing for the express purpose of adding such a $0-1$ sequence to a model of $\mathsf{ZFC}$. To do this, he restricted his attention to models of $\mathsf{ZFC}$ that he knew for sure would be missing at least one $0-1$ sequence: the countable transitive ones.
A model $M$ is transitive if and only if $x\in M$ implies $x\subseteq M$. Intuitively, the model $M$ “sees” the elements of its elements, the elements of those elements, and so on. Transitive models are nice, in that if $M_1\subseteq M_2$ are transitive models, then they agree on the meaning of basic, simple terms. For instance, if $x\in M_1$ is such that $M_1$ satisfies the sentence $``x \text{ is the set of natural numbers}”$, then $M_2$ will satisfy the same sentence. Recall that the use of forcing is to build a model of $\mathsf{ZFC}$ on top of another one, so this $M_1\subseteq M_2$ situation is precisely what we will end up with. Therefore it is important to be able to hold the meaning of certain terms fixed across these two models.
Suppose $M$ is a countable transitive model of $\mathsf{ZFC}$. Since $M$ is countable, we know for sure that there are elements that $M$ is missing: for example, a bijection $g: M\leftrightarrow \mathbb{N}$ witnessing the countability of $M$. Another example is the “height of $M$”. This is a common name for the set of ordinals in $M$, that is, $M \cap \text{Ordinals}$. Being an ordinal is a basic, simple notion that $M$ correctly recognizes, so if $M \cap \text{Ordinals}$ were an element of $M$, then $M$ would satisfy “the class of ordinals forms a set”, and we know this isn’t a theorem of $\mathsf{ZFC}$.
It is also clear that adding a single set to $M$ won’t be enough. This is because $\mathsf{ZFC}$ axioms are “generative” in spirit: if $x$ is a set, then $\{x\}$ is also a set; if $x,y$ are sets, then so is $\{x,y\}$, and so on. So whatever set we add to $M$, we will also have to add the sets that are generated by it, as required by the axioms. And the sets that are generated by those, ad infinitum.
The point of the last two paragraphs is this: while we know there are plenty of things that cannot be in $M$, we need to be careful about what we add to it. For instance, the new set should not be too informative: it must not make the countability of $M$, or the “set-ness” of its ordinals, visible to the extension.
Recall that our goal is to add a $0-1$ sequence to $M$. Such a sequence must exist, since there are uncountably many of these and $M$ is countable. But how do we pick out such a sequence? It cannot be too simple, because simple notions are guaranteed to exist; it cannot be too specific, because adding it to $M$ might not result in a model of $\mathsf{ZFC}$ any more. This was the challenge that Cohen faced, which he recounts in his account of the discovery of forcing.
The innovation of forcing derives from the remarkable decision that this sequence is going to be constructed using partial information. Given that our object of interest is an infinite $0-1$ sequence, we take partial information to mean the finite initial segments of the sequence.
Definition. The Cohen poset (short for partially ordered set) is the set of finite $0-1$ sequences $\{s\mid s: \{0,…,n\}\to \{0,1\}, n\in \mathbb N\}$, together with the empty sequence, ordered by extension: a longer sequence counts as a stronger condition.
And the informal notion of construction using partial information is made precise by appealing to the notion of a filter.
Definition. A filter on the Cohen poset is a collection $G$ of finite $0-1$ sequences that are coherent in the following sense: 1) the empty sequence is in $G$; 2) if a sequence is in $G$, then all of its initial segments are also in $G$; 3) if $s_0,s_1$ are two sequences in $G$, then one must be an extension of the other.
For example, if $f: \mathbb N\to\{0,1\}$ is any $0-1$ sequence whatsoever, the set of all its finite initial segments is a filter. This tells us that filters are not enough: each $0-1$ sequence corresponds to a filter, so the idea of “construct it using partial information” is too broad. It does not avoid the “too informative”, “too specific”, or “too simple” problems. What’s missing is an additional restriction on the filters so that the object we end up obtaining is perfectly average: not so simple as to be fixed across all models, not so specific as to be generated outright in $M$, and not so informative as to contradict $\mathsf{ZFC}$.
Cohen realized this missing restriction was to be the topological notion of genericity. In addition to being coherent, the filter is further required to pick out elements that are perfectly average in the following sense.
Definition. A dense subset of the Cohen poset is one in which every finite $0-1$ sequence has an extension. A filter is $M$-generic if it intersects every dense subset of the Cohen poset that lies in $M$.
So generic filters capture the idea of being not too specific.
They are not too simple either. In fact, if $G$ is an $M$-generic filter on the Cohen poset, then it can be shown that $G$ is not an element of $M$. This is roughly because, if $G\in M$, then we can use $G$ to express the property of “not being in $G$” and show that the finite $0-1$ sequences having this property form a dense subset lying in $M$; genericity then forces $G$ to meet this set, a contradiction.
We observe that, for each natural number $n$, the set of finite binary sequences of length at least $n$ is dense. Or in other words, it is a generic property of binary sequences to have length $\geq n$. So for each $n$, a generic filter will contain finite sequences of length at least $n$.
This implies that the construction indeed gives us what we want. If $G$ is an $M$-generic filter, then piecing together its elements (i.e., taking $\bigcup G$) gives us a genuine function $f: \mathbb N\to \{0,1\}$, an infinite $0-1$ sequence that is not in $M$.
Do generic filters exist at all? If $M$ is a countable transitive model, then yes. This result is known as the Rasiowa–Sikorski lemma, proven about a decade before the invention of forcing.
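The proof idea fits in a few lines of code: list the dense sets as $D_0, D_1,\ldots$ (possible since $M$ is countable) and at stage $n$ extend the condition built so far into $D_n$. In this sketch the dense sets are toy stand-ins given as predicates, and the search bound is artificial:

```python
from itertools import product

def extend_into(s, dense):
    # find some extension of the finite 0-1 sequence s lying in the dense set;
    # density guarantees the search succeeds (the bound 10 is artificial)
    for extra in range(0, 10):
        for tail in product((0, 1), repeat=extra):
            t = s + list(tail)
            if dense(t):
                return t
    raise ValueError("no extension found within the search bound")

def generic(dense_sets):
    # Rasiowa-Sikorski: meet the n-th dense set at stage n; the union of the
    # resulting chain of conditions is (an initial segment of) the generic real
    s = []
    for d in dense_sets:
        s = extend_into(s, d)
    return s

# toy dense sets: "length at least n, with a 1 occurring at or after position n"
dense_sets = [lambda t, n=n: len(t) > n and 1 in t[n:] for n in range(5)]
g = generic(dense_sets)
assert all(d(g) for d in dense_sets)   # the final condition meets every listed dense set
```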
This is by all accounts one of those pleasant moments in mathematics where ideas from various fields converge. It turns out that the existence of a generic filter, which is motivated by our attempt to build an object outside of $M$ that is perfectly average, is a close relative of the Baire category theorem. The exact equivalence is spelled out in this paper by Goldblatt.
The proof of the Rasiowa–Sikorski lemma reveals a strong flavor of diagonal arguments, hinting at a uniformity between the Baire category proof of the uncountability of the reals and the more familiar diagonalization proof. This analogy underlies much of the study of what’s known today as forcing axioms (see for instance the exposition by Viale).
The resulting sequence obtained by piecing together the generic filter is now called a Cohen (generic) real over $M$. Due to later work of Solovay, we now know in hindsight that if $M$ is a countable transitive model of $\mathsf{ZFC}$, then in the sense of Baire category almost every real number is generic over $M$.
Forcing is a tool to build new models out of old ones. So far, we started with a countable transitive model, and motivated the notion of generic filters. Such filters exist, and the resulting object is perfectly average in a precise sense. But we’ve only built an infinite binary sequence. How do we build new models of $\mathsf{ZFC}$?
In the above, we observed that it is not enough to just add the generic object to $M$: we need to add the things generated by it. A doomed attempt is to try to describe explicitly what one needs to add to $M$: i.e., if $f$ is the generic sequence, we require that $\{f\}$ be in the new model, that $\{\{f\}\}$ be in the new model, and so on. This is doomed because the $\mathsf{ZFC}$ axioms are so complicated that there is simply no hope for a tangible description rounding out what one needs to add to $M$ in order to obtain a model of $\mathsf{ZFC}$. What’s missing, then, is some kind of process that “takes care of itself”, so to speak.
The incredibly ingenious realization by Cohen was that, just as the generic object is built using partial information, so can the entire extension model of $M$ be built. To understand this, we need to understand a theorem (really an axiom) of $\mathsf{ZFC}$: every set can be obtained in the following recursive process, carried out transfinitely:

- Start with the empty set.
- At each successor stage, collect all the subsets of what has been constructed so far.
- At each limit stage, gather together everything constructed at earlier stages.
The familiar reader will notice that this is an informal description of the von Neumann hierarchy, where: $V_0=\emptyset$, $V_{\alpha+1}=\mathcal{P}(V_\alpha)$, and $V_\lambda = \bigcup_{\beta<\lambda}V_\beta$ for limit ordinals $\lambda$.
It is an axiom of $\mathsf{ZFC}$ that every set belongs to one of these stages. Intuitively, the von Neumann hierarchy describes a process building a model of $\mathsf{ZFC}$. Cohen decided to mimic this process, using the generic sequence we obtained.
The trick here requires a new perspective. That is, we no longer view the statement $x\in X$ as a proposition, but we view it as expressing a degree of truth. Of course, in the classical case, to assert $x\in X$ is to assert that $x\in X$ is true to degree 1, and to assert $x\notin X$ is to assert that $x\in X$ is true to degree 0 (i.e., not true at all).
That is, we would like to identify each set $X\in M$ with its characteristic function $char_X$, where $char_X(x)=1$ if and only if $x\in X$. Since this is set theory, mind you, functions are really just sets of ordered pairs. So we shall view a set $X$ as the set $X \times \{0,1\}$, and we take the “true” elements of $X$ to be those that are paired with $1$.
One way to “extend” $M$ (in scare quotes because we are not really adding anything), is to do this for each element of $M$, so we end up getting the set of “possible elements of $M$”:
$\{(x,i)\mid i\in \{0,1\}~\&~ x\in M\}$
We can view these possible elements of $M$ as being undetermined whether they are in $M$ or not (so a little bit like Schrödinger’s poor cat), waiting to be given a “truth value” as to the question “am I in $M$?” One can also think of them as names of elements that will go into $M$, waiting to be called out by some kind of selector.
Of course, it is trivial to recover $M$ from the set of its possible elements: just take all the $(x,1)$, and take the left coordinate.
Let’s pause here and reflect the above way of thinking. We have a way of naming the possible elements of $M$, and we can recover $M$ by designating a privileged “truth value” to pick out what names we call out (in our case $1$). Our target is to modify this strategy so that it names the possible elements of the bigger model extending $M$, and we would like to find some ways to use the generic filter as well.
There is a shortcoming with the current strategy: it’s not expressive enough. By this I mean the names take the following form: $(a,b)$ where $a$ is an element of $M$, and $b$ is either $0$ or $1$. In their present form, they are too closely tied to the elements of $M$ to let us name an element that is not in it. The strategy also relies too much on $M$: in order to “extend” $M$, we need to first have $M$ ready to form the names and call out the correct ones. These defects doom the current strategy as a way of building/talking about a structure before we have it.
Recall that it is wise to leave the construction of the extension model to a process that somehow “takes care of itself”. And a sensible candidate is to imitate the von Neumann hierarchy in a way that reflects the recursive nature of the construction. The essential idea of the von Neumann hierarchy is this: a set appears at a given stage only after all of its elements have appeared at earlier stages.
So our current strategy fails to capture this guiding principle, because it considers every element of $M$ at once, ignoring the hierarchical nature of the sets.
So let us mimic the von Neumann hierarchy, not in the construction of the new sets but in the construction of the names.
Definition. Call the two-element poset $\{0,1\}$ the truth value poset.
Definition. Let us consider the Name hierarchy for the truth value poset:
- $\text{Name}_0$ contains only the empty set. The empty set is the only level $0$ name.
- $\text{Name}_{\alpha+1} = \mathcal{P}(\text{Name}_\alpha\times \{0,1\})$.
- For limit ordinals $\lambda$, $\text{Name}_\lambda=\bigcup_{\beta<\lambda}\text{Name}_\beta$.
Of course, the name hierarchy described above is solely restricted to the truth value poset, and intended as an illustration.
So we’ve improved our description of the names of the possible elements in the “extension” of $M$. This is done by mimicking the von Neumann construction of the sets. But what we have are just names so far. We don’t have a model yet. We need a way, again, to turn names into real elements.
Definition. The filter $T$ on the truth value poset is the set $\{1\}$.
To recover $M$ from the names for the truth value poset, we also follow von Neumann’s guiding principle. This is sensible, because the names themselves are constructed recursively in a hierarchical fashion.
Note that we can’t just say “take the left elements that are paired with $1$” like before, since these left elements are themselves names. One needs to give the names values recursively, as follows:
1. $M_0$ contains only the empty set. That is, the empty set (as a name) gets evaluated by the filter $T$ to be the empty set.
2. $M_{\alpha+1}$ contains the $T$-evaluations of the names in $\text{Name}_{\alpha+1}$ in the following sense: if $\tau$ is a name in $\text{Name}_{\alpha+1}$, then its $T$-evaluation is obtained by evaluating the names in it that are paired with $1$. Or in symbols:
$val_T(\tau) = \{val_T(\sigma) \mid (\sigma, 1)\in \tau\}$
3. At limit stage, $M_\lambda$ is the set of all evaluations of the stages before $\lambda$.
One may verify, somewhat tediously, that the set of evaluations of the names is exactly $M$.
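The evaluation just described is a short recursion, and the toy case is easy to run. Below, names are modeled as frozensets of (name, truth value) pairs; the encoding is of course just one possible rendering:

```python
def val_T(tau):
    # evaluate a name over the truth value poset with the filter T = {1}:
    # keep exactly the sub-names paired with 1, and evaluate those recursively
    return frozenset(val_T(sigma) for (sigma, i) in tau if i == 1)

empty_name = frozenset()                    # the level-0 name; it evaluates to the empty set
name_of_singleton = frozenset({(empty_name, 1)})        # names the set containing the empty set
noisy_name = frozenset({(empty_name, 0),                # the 0-labelled pair is discarded
                        (name_of_singleton, 1)})

assert val_T(empty_name) == frozenset()
assert val_T(name_of_singleton) == frozenset({frozenset()})
assert val_T(noisy_name) == frozenset({frozenset({frozenset()})})
```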
So we’ve improved our method of naming possible elements of a model and then instantiating them to be really elements of a model. This was done in the toy example with the truth value poset $\{0,1\}$ and the filter $T=\{1\}$. Forcing builds new models by taking more complicated posets and using generic filters on them as valuations. For instance:
Definition. The name hierarchy for the Cohen poset $\mathbb{C}$ is defined recursively as follows
- $\text{Name}_0$ contains only the empty set. The empty set is the only level $0$ name.
- $\text{Name}_{\alpha+1} = \mathcal{P}(\text{Name}_\alpha\times \mathbb{C})$.
- For limit ordinals $\lambda$, $\text{Name}_\lambda=\bigcup_{\beta<\lambda}\text{Name}_\beta$.
Notice that, in the view that we are setting up, elements of the Cohen poset themselves take the role of truth values. So we still think of a statement of the form $x\in X$ as an assertion with a degree of truth; however, we no longer take the “degrees of truth” to be just the set $\{0,1\}$. Instead, we abstractly take the “degrees of truth” to be the elements of the Cohen poset.
And if $G$ is an $M$-generic filter on the Cohen poset, the set of evaluations of the names in $M$, which we shall call the generic extension of $M$ by $G$ (in symbols, $M[G]$), is also obtained recursively.
1. $M[G]_0$ contains only the empty set. That is, the empty set (as a name) gets evaluated by the filter $G$ to be the empty set.
2. $M[G]_{\alpha+1}$ contains the $G$-evaluations of the names in $\text{Name}_{\alpha+1}$ in the following sense: if $\tau$ is a name in stage $\text{Name}_{\alpha+1}$, then its $G$-evaluation is the set of $G$-evaluations of the names it contains. Or in symbols:
$val_G(\tau) = \{val_G(\sigma) \mid \exists s\in G~ (\sigma, s)\in \tau\}$
3. At limit stages, $M[G]_\lambda$ is the union of all the earlier stages $M[G]_\beta$ for $\beta<\lambda$.
$M[G]$ denotes the union of all such $M[G]_\alpha$’s. The term “forcing” derives from the following definition of the relation between the poset in question and what $M[G]$ satisfies.
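The only change from the toy case is that evaluation is now parametrized by the filter. A minimal sketch, using a hypothetical 3-element poset $\{1, a, b\}$ in place of the Cohen poset, shows how different filters turn the same name into different sets:

```python
# val_G(tau) = { val_G(sigma) : ∃ s ∈ G, (sigma, s) ∈ tau }

def val(tau, G):
    return frozenset(val(sigma, G) for (sigma, s) in tau if s in G)

empty = frozenset()
one = frozenset({(empty, 1)})            # a name for {∅}
# A name whose evaluation depends on which conditions G contains:
tau = frozenset({(empty, 'a'), (one, 'b')})

assert val(tau, {1, 'a'}) == frozenset({frozenset()})               # {∅}
assert val(tau, {1, 'b'}) == frozenset({frozenset({frozenset()})})  # {{∅}}
```

This is the sense in which the new model depends on $G$: the names all live in $M$, but which sets they denote is decided by the filter.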
Definition. If $s$ is an element of the Cohen poset and $P$ is a statement about elements of $M[G]$, then we write $s\Vdash P$ (“$s$ forces $P$”) to mean: every $M$-generic filter $G$ containing $s$ makes $M[G]$ satisfy $P$.
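The quantifier shape of this definition can be checked by brute force on a finite example. The sketch below uses a hypothetical poset $\{1, a, b\}$ (with $1$ weakest and $a, b$ incomparable) and quantifies over all filters rather than generic ones, since genericity is vacuous on a finite poset; this is an illustration of the definition, not real forcing.

```python
from itertools import combinations

elements = [1, 'a', 'b']
leq = {('a', 1), ('b', 1)} | {(s, s) for s in elements}  # pairs s ≤ t

def is_filter(F):
    if 1 not in F:                       # must contain the weakest condition
        return False
    if any(s in F and (s, t) in leq and t not in F       # upward closed
           for s in elements for t in elements):
        return False
    return all(any((r, s) in leq and (r, t) in leq for r in F)
               for s in F for t in F)    # downward directed

filters = [set(F) for r in range(1, 4)
           for F in combinations(elements, r) if is_filter(set(F))]

def val(tau, G):
    return frozenset(val(sigma, G) for (sigma, s) in tau if s in G)

empty = frozenset()
tau = frozenset({(empty, 'a')})          # ∅ gets in iff 'a' ∈ G

def forces(s, prop):
    """s ⊩ prop: every filter containing s makes prop true."""
    return all(prop(G) for G in filters if s in G)

def P(G):
    return empty in val(tau, G)          # the statement "∅ ∈ val(τ)"

assert len(filters) == 3                 # {1}, {1,'a'}, {1,'b'}
assert forces('a', P)                    # 'a' forces ∅ into val(τ)
assert not forces(1, P)                  # the weakest condition does not
```

The condition $a$ forces the statement, while the weakest condition leaves it undecided, which is exactly the behavior the definition is after.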
In many texts at this point, basic properties of $M[G]$ will be established. The most important ones are:
1. $M[G]$ is a countable transitive model with $M\subseteq M[G]$ and $G\in M[G]$.
2. $M[G]$ satisfies $\mathsf{ZFC}$.
The proof of 2 is far from trivial. It requires a careful study of the relation between $M, G,$ and $M[G]$, as well as the forcing relation. A key component is the Truth Lemma and the Definability Lemma, which jointly say that the forcing relation $s \Vdash P$ turns out to be equivalent to a topological property of the poset in question. This makes it a mathematical property that $M$ can handle using the $\mathsf{ZFC}$ axioms, rather than a metamathematical property about countable transitive models.
So that is how forcing builds new models of $\mathsf{ZFC}$ out of old ones. I should stress that we’ve only looked at forcing with toy examples, namely the Cohen poset and the truth value poset. Forcing itself is incredibly versatile, in that one can use forcing with whatever poset one desires, and sophisticated arguments and extensions have been produced using clever designs of posets. For instance, let me say a little bit about how one obtains a model in which $2^{\aleph_0}\neq \aleph_1$ (so the continuum hypothesis fails in that model).
This is done by forcing with a suitable product of the Cohen poset: roughly, one takes $\aleph_2$ many copies of it, so that the generic filter adds $\aleph_2$ many distinct new reals.
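To sketch the standard textbook construction (an outline only, not spelled out in this post): one forces with finite partial functions from $\omega_2\times\omega$ to $2$, ordered by reverse inclusion,

```latex
\operatorname{Add}(\omega,\omega_2) \;=\; \{\, p \mid p\colon \omega_2\times\omega \rightharpoonup 2,\ p \text{ finite} \,\},
\qquad p \le q \iff p \supseteq q .
```

For each $\alpha<\omega_2$ the generic filter glues together a real $x_\alpha(n)=\bigl(\bigcup G\bigr)(\alpha,n)$; a density argument shows these reals are pairwise distinct, and a $\Delta$-system argument shows the poset is ccc, so cardinals are preserved and the extension satisfies $2^{\aleph_0}\ge\aleph_2$.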
I should also point out that the approach outlined in this post, namely the approach using countable transitive models, is one of several equivalent approaches to forcing. Some others are:
In syntactic forcing, one completely dispenses with any talk of models or truth in a model. Rather, one proceeds directly via the topological property given by the Truth Lemma and the Definability Lemma, making forcing really a study of topology and the codes of formulas.
Pro: no need to talk about countable transitive models or the existence of generics any more.
Con: involves complicated coding and translating semantic notions (such as truth) into somewhat stilted syntactic notions that are not very intuitive.
In this approach, one takes the truth value metaphor seriously. If one forces with the special class of posets known as the complete Boolean algebras, one can make precise the notion of a Boolean-valued characteristic function (which is what names are in this approach) and the corresponding Boolean-valued model. So names end up playing the roles of actual elements in a Boolean-valued model. This was invented by Solovay and Scott shortly after Cohen’s invention of forcing, in order to get a conceptually perspicuous way of thinking about what forcing is really doing.
Pro: it is easier to understand what forcing is trying to do this way, and for those with a category-theoretic background, this is arguably the most natural way of understanding forcing. In addition, many of the results from the study of Boolean algebras can be applied to the study of generic extensions, a connection that is obscured in the countable transitive model approach.
Con: substantial prerequisites in the theory of complete Boolean algebras, including the separative quotient of a poset, regular open completion of a separative poset, etc.
This is a slight variant of Boolean-valued models. One uses the Boolean analogue of the ultrapower construction to obtain an (uncountable) model over which generics exist. Joel Hamkins provided a naturalist account of forcing based on this idea.
The key to this proof is the following special case of Fubini’s Theorem:
Theorem. Let \(A\) be a measurable subset of \(\mathbb{R}^2\). For each \(y\in \mathbb{R}\) we define the section of \(A\) by \(y\) as \(A_y:=\{x\in \mathbb{R}\mid (x,y)\in A\}\). Then \(A\) has measure zero iff almost every section of \(A\) has measure zero. That is, iff \(\{y\in\mathbb{R}\mid A_y \text{ is not measure zero}\}\) is a measure-zero subset of \(\mathbb{R}\).
Here, by not measure zero (or non-null), I mean either having positive measure or being nonmeasurable.
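Equivalently, in integral form (writing \(\mu_2\) and \(\mu_1\) for two- and one-dimensional Lebesgue measure), this special case of Fubini's Theorem says that for measurable \(A\subseteq\mathbb{R}^2\),

```latex
\mu_2(A) \;=\; \int_{\mathbb{R}} \mu_1(A_y)\, dy ,
\qquad\text{so}\qquad
\mu_2(A)=0 \iff \mu_1(A_y)=0 \text{ for almost every } y .
```

(For null \(A\), almost every section is in fact measurable and null, which is the form used below.)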
The key to combining Fubini’s Theorem with a well-ordering of the reals is that we can identify the first place where things stop being null. In other words, we find a position in the well-ordering below which every proper initial segment is null. This will give us a set on which Fubini’s Theorem produces a contradiction.
Let us see this in action with a warm-up proof.
Theorem. Assume the continuum hypothesis, which implies there is a well-ordering \(\prec\) of \(\mathbb{R}\) in which every proper initial segment is countable. Then \(\prec\) is not measurable as a subset of the plane.
Proof: For each \(x\in\mathbb{R}\), the initial segment \(\prec_x\) determined by \(x\) is countable, and hence of measure zero. It is helpful to visualize \((\mathbb{R},\prec )\times (\mathbb{R},\prec )\) as a plane. Assume towards a contradiction that the set \(\prec =\{(x,y)\mid x\prec y\}\) is measurable. Then Fubini’s Theorem implies it has measure zero, because each of its sections is an initial segment in the well-ordering \(\prec\), hence countable and null.
But then the complement of this well-ordering \(\succ:=\{(x,y)\mid y\prec x \text{ or } y=x\}\) would also have measure zero, by the same argument (except now we consider the sections along the other axis). This would then mean that \(\mathbb{R}^2=\prec \cup \succ\) is null. Contradiction!
So this shows that under \(\mathsf{CH}\), a well-ordering of the reals is not measurable. We now observe that \(\mathsf{CH}\) was used only to establish that every initial segment in the well-ordering has measure zero. This assumption can be dispensed with: one instead considers the least position in the well-ordering at which things stop being null. This works as follows.
Theorem. Let \(\prec\) be a well-ordering of the reals. Then \(\prec\) is not measurable.
Proof: Suppose it were measurable. Let \(a\) be the least real such that the product \(\prec_a\times\prec_a\) is non-null in the sense above (\(\prec_a\) here denotes the initial segment determined by \(a\) in this well-ordering). If no such real exists, we instead run the same argument on the whole plane \((\mathbb{R},\prec)\times(\mathbb{R},\prec)\).
Now we restrict the well-ordering to everything \(\prec\)-below \(a\): consider \(\prec\upharpoonright a :=\{(x,y)\mid x,y\prec a \text{ and }x\prec y\}\). Every section of this set is null: the section by some \(z\prec a\) is just the initial segment \(\prec_z\), and if \(\prec_z\) were non-null, the square \(\prec_z\times\prec_z\) would be non-null as well, violating the minimality of \(a\). So Fubini’s Theorem implies that \(\prec\upharpoonright a\) has measure zero.
But the other half of the square \(\prec_a\times\prec_a\) is the reflection of \(\prec\upharpoonright a\) across the diagonal (together with the diagonal itself, which is null), and reflection preserves measure. Hence the square \(\prec_a\times\prec_a\) is the union of measure-zero sets, which implies that it has measure zero itself, contradicting the choice of \(a\). \(\square\)