An introductory overview of forcing, from the learner's perspective
Note: the goal of this article is to give a sense of the kind of difficulties, both conceptual and technical, one might encounter if one decides to learn the mathematical tool of forcing. As such it is not my intention to provide an intuitive motivation for forcing, or explain the inner workings of it in a perspicuous way. Already there are excellent sources for that: for example see A Philosopher’s Guide to Forcing by Toby Meadows or A Beginner’s Guide to Forcing by Timothy Chow.
What I intend to achieve here is somewhat particular: after a first course in formal logic, set theory, or Gödel’s theorems, a motivated student might see the term “forcing” pop up on the occasional Google chase. They might be intrigued as to why this technique won Paul Cohen, its discoverer, a Fields Medal, for instance. But “what is forcing” is a question that is highly difficult to address in office hours, because there are quite a few moving pieces involved. Each of these pieces brings its own special kind of unease to the student, making the whole thing quite daunting. It is simply my hope here to record these moving pieces and their associated difficulties.
Due to this peculiar aim, the article is sprinkled with pausing remarks (click to expand) on what is difficult about the matter at hand and whether it’s conceptual or technical. My judgment is that forcing is not substantially harder than the other main theorems in graduate math textbooks. It is simply due to the variety of concepts involved and the unfamiliarity of the tools used that forcing has gained some kind of notoriety.
An earlier simpler draft of this post was first published in Chinese on the Q & A platform Zhihu. One may view the original version here: link to the Chinese version.
Forcing is the name of a technical method to establish independence results in set theory. These are results saying that certain statements cannot be deduced from the axioms of set theory (the Zermelo-Fraenkel axioms with Choice).
To show that a statement cannot be deduced from a theory, one shows that the negation of the statement is consistent with the theory. Given a mathematical theory $T$, we write $\mathsf{Con}(T)$ for the statement that $T$ is consistent: i.e., there is no proof that starts with the axioms of $T$ and ends with a contradiction.
If for some sentence $P$ we succeed in showing $\mathsf{Con}(\mathsf{ZFC}+P)$, then we have succeeded in showing that $\mathsf{ZFC}$ cannot refute $P$. This is because if $\mathsf{ZFC}$ refutes $P$, then this means $\mathsf{ZFC}$ proves $\neg P$ (“not-P”); therefore from the theory $\mathsf{ZFC}+P$ one can deduce $P \wedge \neg P$, a contradiction, by doing the following: first deduce $\neg P$ from $\mathsf{ZFC}$ alone, and then deduce $P$.
Forcing allows one to conclude, under suitable assumptions laid out below, $\mathsf{Con}(\mathsf{ZFC}+P)$ for various choices of $P$. Different people have different ways of understanding and implementing it, but arguably the most straightforward way to make sense of it is that it is a method of building new models of ZFC from old ones.
It turns out that talking about proofs is not the easiest thing to do. It is easier to work with models instead. A model of a theory, roughly put, is just a collection of objects that satisfies that theory. The connection between proofs and models was established by Kurt Gödel in his doctoral thesis:
Fact. (Gödel’s Completeness Theorem): $\mathsf{Con}(T)$ if and only if $T$ has a model.
So if our goal is to show that $\mathsf{ZFC}$ cannot deduce $P$ (so we want to show $\mathsf{Con}(\mathsf{ZFC}+\neg P)$), it is enough to build a model of $\mathsf{ZFC}+\neg P$. This shifts the focus from talking about proofs, which are strings of symbols generated by some fixed rule, to models, which are perhaps more tangible and familiar.
Unfortunately, one cannot just build a model of $\mathsf{ZFC}$ out of thin air. Another theorem by Gödel, his second incompleteness theorem, in simple terms states:
If $\mathsf{ZFC}$ proves that $\mathsf{ZFC}$ has a model, then it can also deduce a contradiction.
So if $\mathsf{ZFC}$ is consistent to begin with, we cannot use it to deduce $\mathsf{Con}(\mathsf{ZFC})$, let alone $\mathsf{Con}(\mathsf{ZFC}+P)$. This leaves us with the second-best option: assume that $\mathsf{ZFC}$ is consistent, and then show $\mathsf{Con}(\mathsf{ZFC}+P)$.
This is where models come in: assuming $\mathsf{ZFC}$ is consistent provides us with a model of it. With forcing, one can carefully pick out an element, add it to the model, and make sure that the resulting model does what one wants.
Models of $\mathsf{ZFC}$ are very complex. A model of $\mathsf{ZFC}$ will contain everything that $\mathsf{ZFC}$ proves to exist (i.e., pretty much all of math). In particular, $\mathsf{ZFC}$ proves that the following set exists: $\{f \mid f: \mathbb{N}\to \{0,1\}\}$. Intuitively, this is the set of all infinite sequences of zeroes and ones. There is a way of viewing them as the real numbers, which is why set theorists and recursion theorists often like to refer to elements of this set as the real numbers. Note that the set $\{f \mid f: \mathbb{N}\to \{0,1\}\}$, which we denote by $2^\mathbb{N}$, is uncountable.
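One concrete way of viewing a $0-1$ sequence as a real number is to read it as a binary expansion. Here is a minimal illustrative sketch (in Python; the function name and the restriction to a finite prefix are mine, since a program can only inspect finitely much of an infinite sequence):

```python
def prefix_to_real(prefix):
    """Read a finite 0-1 prefix as the binary expansion 0.b0 b1 b2 ...,
    i.e. the sum of b_n / 2**(n + 1).  An infinite sequence determines a
    real number in [0, 1]; a finite prefix gives a dyadic approximation."""
    return sum(bit / 2 ** (n + 1) for n, bit in enumerate(prefix))

print(prefix_to_real([1, 1, 1, 1, 1, 1]))        # 0.984375, approaching 1
print(prefix_to_real([1, 0, 1, 0, 1, 0, 1, 0]))  # 0.6640625, approaching 2/3
```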
Paul Cohen invented forcing for the express purpose of adding such a $0-1$ sequence to a model of $\mathsf{ZFC}$. To do this, he restricted his attention to models of $\mathsf{ZFC}$ that he knew for sure would be missing at least one $0-1$ sequence: the countable transitive ones.
A model $M$ is transitive if and only if $x\in M$ implies $x\subseteq M$. Intuitively, the model $M$ “sees” the elements of its elements, the elements of those elements, and so on. Transitive models are nice, in that if $M_1\subseteq M_2$ are transitive models, then they agree on the meaning of basic, simple terms. For instance, if $x\in M_1$ is such that $M_1$ satisfies the sentence $``x \text{ is the set of natural numbers}”$, then $M_2$ will satisfy the same sentence. Recall that the use of forcing is to build a model of $\mathsf{ZFC}$ on top of another one, so this $M_1\subseteq M_2$ situation is precisely what we will end up with. Therefore it is important to be able to hold the meaning of certain terms fixed across these two models.
Suppose $M$ is a countable transitive model of $\mathsf{ZFC}$. Since $M$ is countable, we know for sure that there are sets that $M$ is missing. One example is a bijection $g: M\to \mathbb{N}$ witnessing that $M$ is countable. Another example is the “height of $M$”. This is a common name for the set of ordinals in $M$, that is, $M \cap \text{Ordinals}$. Being an ordinal is a basic, simple notion that $M$ correctly recognizes, so if $M \cap \text{Ordinals}$ were an element of $M$, then $M$ would satisfy “the class of ordinals forms a set”, which we know $\mathsf{ZFC}$ refutes.
It is also clear that adding a single set to $M$ won’t be enough. This is because $\mathsf{ZFC}$ axioms are “generative” in spirit: if $x$ is a set, then $\{x\}$ is also a set; if $x,y$ are sets, then so is $\{x,y\}$, and so on. So whatever set we add to $M$, we will also have to add the sets that are generated by it, as required by the axioms. And the sets that are generated by those, ad infinitum.
The point of the last two paragraphs is this: while we know there are plenty of things that cannot be in $M$, we need to be careful about what we add to it. For instance, the added set should not be so informative as to make the countability of $M$, or the “set-ness” of its ordinals, obvious.
Recall that our goal is to add a $0-1$ sequence to $M$. Such a sequence must exist, since there are uncountably many of these and $M$ is countable. But how do we pick out such a sequence? It cannot be too simple, because sufficiently simple sequences are guaranteed to be in $M$ already; it cannot be too specific, because adding it to $M$ might not result in a model of $\mathsf{ZFC}$ any more. This was the challenge that Cohen faced, which he describes in his account of the discovery of forcing.
The innovation of forcing derives from the remarkable decision that this sequence is going to be constructed using partial information. Given that our object of interest is an infinite $0-1$ sequence, we take partial information to mean the finite initial segments of the sequence.
Definition. The Cohen poset (poset being short for partially ordered set) is the set of all finite $0-1$ sequences, including the empty sequence, ordered by extension: $\{s\mid s: \{0,\dots,n-1\}\to \{0,1\},\ n\in \mathbb N\}$.
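To make the combinatorics concrete, here is a minimal illustrative sketch (in Python; representing conditions as tuples is my own choice of encoding, not part of the definition) of conditions in the Cohen poset and of the extension relation that orders them:

```python
# A condition in the Cohen poset: a finite 0-1 sequence, encoded as a tuple.
# The empty tuple () is the empty sequence, the least informative condition.

def is_condition(s):
    """Check that s is a finite sequence of 0s and 1s."""
    return all(bit in (0, 1) for bit in s)

def extends(t, s):
    """t extends s when s is an initial segment of t,
    i.e. t carries at least the information that s does."""
    return len(t) >= len(s) and t[:len(s)] == s

s = (1, 0)
t = (1, 0, 1, 1)
print(is_condition(s), is_condition(t))  # True True
print(extends(t, s))                     # True
print(extends(s, t))                     # False
```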
And the informal notion of construction using partial information is made precise by appealing to the notion of a filter.
Definition. A filter on the Cohen poset is a collection $G$ of finite $0-1$ sequences that are coherent in the following sense: 1) the empty sequence is in $G$; 2) if a sequence is in $G$, then all of its initial segments are also in $G$; 3) if $s_0,s_1$ are two sequences in $G$, then one must be an extension of the other.
For example, if $f: \mathbb N\to\{0,1\}$ is any $0-1$ sequence whatsoever, the set of all its initial segments will be a filter. This tells us that filters are not enough: each $0-1$ sequence corresponds to a filter. So this idea of “construct it using partial information” is too broad. It does not avoid the “too informative” or “too specific” or “too simple” problem. What’s missing is an additional restriction on the filters so that the object we end up obtaining is perfectly average: not so simple as to be present in every model, not so specific as to be generated outright in $M$, and not so informative as to contradict $\mathsf{ZFC}$.
Cohen realized this missing restriction was to be the topological notion of genericity. In addition to being coherent, the filter is further required to pick out elements that are perfectly average in the following sense.
Definition. A dense subset of the Cohen poset is one in which every finite $0-1$ sequence can find an extension. A filter is $M$-generic if it intersects every dense subset of the Cohen poset that is in $M$.
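To get a feel for density, here is a minimal illustrative sketch (in Python, continuing the tuple encoding from above; the two families of dense sets are examples of mine) showing that every condition can be extended into a dense set:

```python
def in_long_enough(t, n):
    """Membership in the dense set D_n = {conditions of length >= n}."""
    return len(t) >= n

def in_hits_one_after(t, n):
    """Membership in the dense set E_n = {conditions with a 1 at some position >= n}."""
    return any(bit == 1 for bit in t[n:])

def extend_into(s, predicate):
    """Extend the condition s until it satisfies `predicate`.
    Padding with 1s happens to work for both families above;
    density in general only promises that *some* extension exists."""
    t = tuple(s)
    while not predicate(t):
        t = t + (1,)
    return t

s = (0, 1, 0)
print(extend_into(s, lambda t: in_long_enough(t, 6)))     # (0, 1, 0, 1, 1, 1)
print(extend_into(s, lambda t: in_hits_one_after(t, 4)))  # (0, 1, 0, 1, 1)
```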
So generic filters capture the idea of being not too specific.
They are not too simple either. In fact, if $G$ is an $M$-generic filter on the Cohen poset, then it can be shown that $G$ is not an element of $M$. This is roughly because, if $G\in M$, then we can use $G$ to express the property of “not being in $G$” and show that the finite $0-1$ sequences having this property form a dense subset in $M$.
We observe that, for each natural number $n$, the set of finite binary sequences of length at least $n$ is dense. Or in other words, it is a generic property of binary sequences to have length $\geq n$. So for each $n$, a generic filter will contain finite sequences of length at least $n$.
This implies that the construction indeed gives us what we want. If $G$ is an $M$-generic filter, then piecing together its elements (i.e., taking $\bigcup G$) gives us a genuine function $f: \mathbb N\to \{0,1\}$, an infinite $0-1$ sequence that is not in $M$.
But do such $M$-generic filters exist at all? If $M$ is a countable transitive model, then yes. This result is known as the Rasiowa-Sikorski lemma, proven about a decade before the invention of forcing.
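The proof idea is easy to simulate on finite data: since $M$ is countable, its dense subsets can be listed as $D_0, D_1, D_2, \dots$, and one extends a condition step by step so as to meet each $D_i$ in turn; the conditions produced along the way generate the desired generic filter. Below is a minimal illustrative sketch (in Python; the particular dense sets and the search bound are stand-ins of mine, since a real enumeration of the dense sets of $M$ is infinite):

```python
def meet(s, dense, max_len=50):
    """Search for an extension of the condition s lying in the dense set,
    given as a membership predicate.  The bound max_len only keeps the
    illustration finite; density guarantees an extension exists."""
    frontier = [tuple(s)]
    while frontier:
        t = frontier.pop(0)
        if dense(t):
            return t
        if len(t) < max_len:
            frontier += [t + (0,), t + (1,)]
    raise ValueError("no extension found within the search bound")

# Stand-ins for (finitely many of) the dense sets of M:
# D_i = "length >= i" interleaved with E_i = "some 1 at or beyond position i".
dense_sets = []
for i in range(5):
    dense_sets.append(lambda t, i=i: len(t) >= i)
    dense_sets.append(lambda t, i=i: any(b == 1 for b in t[i:]))

# The diagonal construction: meet the dense sets one by one.
chain = [()]
for D in dense_sets:
    chain.append(meet(chain[-1], D))

# The filter G is generated by the conditions along the chain; piecing them
# together (the finite analogue of taking the union of G) gives a longer and
# longer approximation to the generic 0-1 sequence.
print(chain[-1])
```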
This is by all means one of those pleasant moments in mathematics where ideas from various fields converge. It turns out that the existence of a generic filter, motivated by our attempt to build a perfectly average object outside of $M$, is a close relative of the Baire category theorem. The exact equivalence is spelled out in this paper by Goldblatt.
The proof of the Rasiowa-Sikorski lemma reveals a strong flavor of diagonal arguments, hinting at a uniformity between the Baire category proof of the uncountability of the reals and the more familiar diagonalization proof. This analogy underlies much of the study of what’s known today as forcing axioms (see for instance the exposition by Viale).
The resulting sequence obtained by piecing together the generic filter is now called a Cohen (generic) real over $M$. Due to later work of Solovay, we now know in hindsight that if $M$ is a countable transitive model of $\mathsf{ZFC}$, then in the topological sense (that is, outside of a meager set) almost every real number is Cohen generic over $M$.
Forcing is a tool to build new models out of old ones. So far, we started with a countable transitive model, and motivated the notion of generic filters. Such filters exist, and the resulting object is perfectly average in a precise sense. But we’ve only built an infinite binary sequence. How do we build new models of $\mathsf{ZFC}$?
In the above, we observed that it is not enough to just add the generic object to $M$. We need to add things that are generated by it. A doomed attempt is to try to describe directly what one needs to add to $M$: i.e., if $f$ is the generic sequence, we require that $\{f\}$ be in the new model, $\{\{f\}\}$ be in the new model, and so on. This is doomed, because the $\mathsf{ZFC}$ axioms are so complicated that there is simply no hope for some tangible description to round out what one needs to add to $M$ in order to obtain a model of $\mathsf{ZFC}$. What’s missing, then, is some kind of process that “takes care of itself”, so to speak.
The incredibly ingenious realization by Cohen was that, just like the generic object is built using partial information, so too can the entire extension of $M$ be built. To understand this, we need to understand a theorem (really an axiom) of $\mathsf{ZFC}$: every set can be obtained in the following recursive process, carried out transfinitely:
- start with nothing at all;
- at each successor stage, collect together all possible sets whose elements come from the previous stage;
- at each limit stage, gather up everything obtained at the earlier stages.
The familiar reader will notice that this is an informal description of the von Neumann hierarchy, where: $V_0=\emptyset$, $V_{\alpha+1}=\mathcal{P}(V_\alpha)$, and $V_\lambda = \bigcup_{\beta<\lambda}V_\beta$ for limit ordinals $\lambda$.
It is an axiom of $\mathsf{ZFC}$ that every set belongs to one of these stages. Intuitively, the von Neumann hierarchy describes a process building a model of $\mathsf{ZFC}$. Cohen decided to mimic this process, using the generic sequence we obtained.
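The first few stages of the hierarchy are small enough to compute explicitly. Here is a minimal illustrative sketch (in Python, using frozensets as a stand-in for pure sets; of course no program can carry the construction transfinitely, which is the whole point of the hierarchy):

```python
from itertools import chain, combinations

def powerset(s):
    """All subsets of the finite set s, each returned as a frozenset."""
    items = list(s)
    return frozenset(frozenset(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1)))

V = [frozenset()]              # V_0 is the empty set
for _ in range(4):
    V.append(powerset(V[-1]))  # V_{n+1} is the power set of V_n

for n, stage in enumerate(V):
    print(f"|V_{n}| = {len(stage)}")
# Prints sizes 0, 1, 2, 4, 16; the next stage already has 2**16 elements,
# and the genuine hierarchy continues through all the ordinals.
```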
The trick here requires a new perspective. That is, we no longer view the statement $x\in X$ as a proposition, but we view it as expressing a degree of truth. Of course, in the classical case, to assert $x\in X$ is to assert that $x\in X$ is true to degree 1, and to assert $x\notin X$ is to assert that $x\in X$ is true to degree 0 (i.e., not true at all).
That is, we would like to identify each set $X\in M$ with its characteristic function $\mathrm{char}_X$, where $\mathrm{char}_X(x)=1$ if and only if $x\in X$. Since this is set theory, mind you, functions are really just sets of ordered pairs. So we shall view a set $X$ as the set $X \times \{0,1\}$, and we take the “true” elements of $X$ to be those that are paired with $1$.
One way to “extend” $M$ (in scare quotes because we are not really adding anything) is to do this for each element of $M$, so we end up getting the set of “possible elements of $M$”:
$\{(x,i)\mid i\in \{0,1\}~\&~ x\in M\}$
We can view these possible elements of $M$ as being undetermined whether they are in $M$ or not (so a little bit like Schrödinger’s poor cat), waiting to be given a “truth value” as to the question “am I in $M$?” One can also think of them as names of elements that will go into $M$, waiting to be called out by some kind of selector.
Of course, it is trivial to recover $M$ from the set of its possible elements: just take all the $(x,1)$, and take the left coordinate.
Let’s pause here and reflect on the above way of thinking. We have a way of naming the possible elements of $M$, and we can recover $M$ by designating a privileged “truth value” to pick out what names we call out (in our case $1$). Our target is to modify this strategy so that it names the possible elements of the bigger model extending $M$, and we would like to find some way to use the generic filter as well.
There is a shortcoming with the current strategy: it’s not expressive enough. By this I mean the names are taking the following form: $(a,b)$ where $a$ is an element of $M$, and $b$ is either $0$ or $1$. In their present form, they are too closely tied to the elements of $M$ to let us name an element that is not in it. It also relies too much on $M$: in order to “extend” $M$, we need to first have $M$ ready to form the names and call out the correct ones. These defects doom the current strategy as a way of building/talking about a structure before we have it.
Recall that it is wise to leave the construction of the extension model to a process that somehow “takes care of itself”. And a sensible candidate is to imitate the von Neumann hierarchy in a way that reflects the recursive nature of the construction. The essential idea of the von Neumann hierarchy is this: each stage is built using only the material available at the earlier stages, so the whole universe of sets unfolds level by level rather than being given all at once.
So our current strategy fails to capture this guiding principle, because it considers every element of $M$ at once, ignoring the hierarchical nature of the sets.
So let us mimic the von Neumann hierarchy, not in the construction of the new sets, but in the construction of their names.
Definition. Call the two-element poset $\{0,1\}$ the truth value poset.
Definition. Let us consider the Name hierarchy for the truth value poset:
- $\text{Name}_0$ contains only the empty set. The empty set is the only level $0$ name.
- $\text{Name}_{\alpha+1} = \mathcal{P}(\text{Name}_\alpha\times \{0,1\})$.
- For limit ordinals $\lambda$, $\text{Name}_\lambda=\bigcup_{\beta<\lambda}\text{Name}_\beta$.
Of course, the name hierarchy described above is solely restricted to the truth value poset, and intended as an illustration.
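Because the truth value poset has only two elements, the first couple of levels of this name hierarchy can be listed outright. Here is a minimal illustrative sketch (in Python, encoding a name as a frozenset of (name, value) pairs; the encoding and the helper names are mine):

```python
from itertools import chain, combinations, product

def powerset(s):
    """All subsets of the finite set s, each returned as a frozenset."""
    items = list(s)
    return frozenset(frozenset(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1)))

TRUTH_VALUES = (0, 1)  # the truth value poset

# Name_0 has a single name: the empty set, encoded as the empty frozenset.
names = [frozenset({frozenset()})]

# Name_{k+1} = powerset(Name_k x {0, 1})
for _ in range(2):
    pairs = frozenset(product(names[-1], TRUTH_VALUES))
    names.append(powerset(pairs))

for k, level in enumerate(names):
    print(f"|Name_{k}| = {len(level)}")
# |Name_0| = 1, |Name_1| = 4, |Name_2| = 256
```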
So we’ve improved our description of the names of the possible elements in the “extension” of $M$. This is done by mimicking the von Neumann construction of the sets. But what we have are just names so far. We don’t have a model yet. We need a way, again, to turn names into real elements.
Definition. The filter $T$ on the truth value poset is the set $\{1\}$.
To recover $M$ from the names for the truth value poset, we also follow von Neumann’s guiding principle. This is sensible, because the names themselves are constructed recursively in a hierarchical fashion.
Note that we can’t just say “take left elements that are paired with $1$” like before, since these left elements are also themselves names. One needs to recursively give the names values like the following:
1. $M_0$ contains only the empty set. That is, the empty set (as a name) gets evaluated by the filter $T$ to be the empty set.
2. $M_{\alpha+1}$ contains the $T$-evaluations of the names in $\text{Name}_{\alpha+1}$, in the following sense: if $\tau$ is a name in stage $\text{Name}_{\alpha+1}$, then its $T$-evaluation is the set of $T$-evaluations of the names in it that are paired with $1$. Or in symbols:
$val_T(\tau) = \{val_T(\sigma) \mid (\sigma, 1)\in \tau\}$
3. At limit stages, $M_\lambda$ is the set of all evaluations obtained at the stages before $\lambda$.
One may verify, somewhat tediously, that the set of evaluations of the names is exactly $M$.
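The recursion is short enough to write down directly. Here is a minimal illustrative sketch (in Python, using the same encoding of names as frozensets of (name, value) pairs; the function name `val` is my rendering of the evaluation, not official notation):

```python
# The filter T on the truth value poset {0, 1}.
T = frozenset({1})

def val(name, filter_):
    """Evaluate a name recursively: keep the names paired with a value in
    the filter, and evaluate those names in turn."""
    return frozenset(val(sub, filter_) for (sub, v) in name if v in filter_)

empty = frozenset()                          # the name for the empty set
one   = frozenset({(empty, 1)})              # a name for { {} }
mixed = frozenset({(empty, 0), (one, 1)})    # the pair (empty, 0) gets filtered out

print(val(empty, T) == frozenset())                            # True
print(val(one, T)   == frozenset({frozenset()}))               # True
print(val(mixed, T) == frozenset({frozenset({frozenset()})}))  # True
```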
So we’ve improved our method of naming possible elements of a model and then instantiating them to be really elements of a model. This was done in the toy example with the truth value poset $\{0,1\}$ and the filter $T=\{1\}$. Forcing builds new models by taking more complicated posets and using generic filters on them as valuations. For instance:
Definition. The name hierarchy for the Cohen poset $\mathbb{C}$ is defined recursively as follows
- $\text{Name}_0$ contains only the empty set. The empty set is the only level $0$ name.
- $\text{Name}_{\alpha+1} = \mathcal{P}(\text{Name}_\alpha\times \mathbb{C})$
- For limit ordinals $\lambda$, $\text{Name}_\lambda=\bigcup_{\beta<\lambda}\text{Name}_\beta$
Notice that, in the view that we are setting up, elements of the Cohen poset themselves take the role of truth values. So we still think of a statement of the form $x\in X$ as an assertion of a degree of truth; however, we no longer think of “degrees of truth” as being just the set $\{0,1\}$. We abstractly take the “degrees of truth” to be the Cohen poset.
And if $G$ is an $M$-generic filter on the Cohen poset, the set of evaluations of the names in $M$, which we shall call the generic extension of $M$ by $G$ (in symbols, $M[G]$), is also obtained recursively.
1. $M[G]_0$ contains only the empty set. That is, the empty set (as a name) gets evaluated by the filter $G$ to be the empty set.
2. $M[G]_{\alpha+1}$ contains the $G$-evaluations of the names in $\text{Name}_{\alpha+1}$, in the following sense: if $\tau$ is a name in stage $\text{Name}_{\alpha+1}$, then its $G$-evaluation is the set of $G$-evaluations of the names in it that are paired with some condition in $G$. Or in symbols:
$val_G(\tau) = \{val_G(\sigma) \mid \exists s\in G~ (\sigma, s)\in \tau\}$
3. At limit stages, $M[G]_\lambda$ is the set of all evaluations obtained at the stages before $\lambda$.
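The evaluation recursion is the same as before, except that the second coordinates are now conditions in the Cohen poset and we keep a pair exactly when its condition lands in $G$. Here is a minimal illustrative sketch (in Python; the name `tau` and the finite sets standing in for filters are examples of mine, since a genuine generic filter is infinite) showing the key phenomenon: the same name evaluates to different sets under different filters.

```python
def val(name, filter_):
    """G-evaluation: keep the names paired with a condition in the filter,
    and evaluate those names recursively."""
    return frozenset(val(sub, filter_) for (sub, cond) in name if cond in filter_)

empty = frozenset()  # the name for the empty set

# A name that "asks the filter a question": it names { {} } exactly when
# the filter contains a condition starting with 1.
tau = frozenset({(empty, (1,)), (empty, (1, 0)), (empty, (1, 1))})

# Two finite stand-ins for filters on the Cohen poset (conditions as tuples).
G1 = frozenset({(), (1,), (1, 0), (1, 0, 1)})   # conditions starting with 1
G2 = frozenset({(), (0,), (0, 1), (0, 1, 1)})   # conditions starting with 0

print(val(tau, G1))  # frozenset({frozenset()}): under G1, tau names { {} }
print(val(tau, G2))  # frozenset():              under G2, tau names the empty set
```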
$M[G]$ denotes the union of all such $M[G]_\alpha$’s. The term “forcing” derives from the following definition of the relation between the poset in question and what $M[G]$ satisfies.
Definition. If $s$ is an element of the Cohen poset and $P$ is a statement about elements of $M[G]$, then we write $s\Vdash P$ (“$s$ forces $P$”) to mean: every $M$-generic filter $G$ containing $s$ makes $M[G]$ satisfy $P$.
In many texts at this point, basic properties of $M[G]$ will be established. The most important ones are:
1. $M[G]$ is a countable transitive model containing every element of $M$ as well as $G$ itself, and it has the same ordinals (the same “height”) as $M$;
2. $M[G]$ satisfies $\mathsf{ZFC}$.
The proof of 2 is far from trivial. It requires a careful study of the relation between $M$, $G$, and $M[G]$, as well as the forcing relation. A key component is the Truth Lemma and the Definability Lemma, which jointly say that the forcing relation $s \Vdash P$ turns out to be equivalent to a topological property of the poset in question. This makes it a mathematical property that $M$ can handle using the $\mathsf{ZFC}$ axioms, rather than a metamathematical property about countable transitive models.
So that is how forcing builds new models of $\mathsf{ZFC}$ out of old ones. I should stress that we’ve only looked at forcing with toy examples, namely the Cohen poset and the truth value poset. Forcing itself is incredibly versatile, in that one can use forcing with whatever poset one desires, and sophisticated arguments and extensions have been produced using clever designs of posets. For instance, let me say a little bit about how one obtains a model in which $2^{\aleph_0}\neq \aleph_1$ (so the continuum hypothesis fails in that model).
This is done by considering a suitable product of the Cohen poset: roughly, instead of adding a single new $0-1$ sequence, one adds $\aleph_2$-many of them at once, using finite partial functions from $\omega_2\times\mathbb N$ to $\{0,1\}$ as conditions, and then checks that the cardinals of $M$ (in particular its $\aleph_2$) are not collapsed in $M[G]$.
I should also point out that the approach outlined in this post, namely the approach using countable transitive models, is one of several equivalent approaches to forcing. Some others are:
In syntactic forcing, one completely dispenses with any talk of models or truth in a model. Rather, one proceeds directly to the topological property given by the truth lemma and the definability lemma, making forcing really a study of topology and of the syntactic codes of formulas.
Pro: no need to talk about countable transitive models or the existence of generics any more.
Con: involves complicated coding and translating semantic notions (such as truth) into somewhat stilted syntactic notions that are not very intuitive.
In the Boolean-valued approach, one takes the truth value metaphor seriously. If one forces with the special class of posets known as the complete Boolean algebras, one can make precise the notion of a Boolean-valued characteristic function (which is what names are in this approach) and the corresponding Boolean-valued model. So names end up playing the roles of actual elements in a Boolean-valued model. This was invented by Solovay and Scott shortly after Cohen’s invention of forcing, in order to get a conceptually perspicuous way of thinking about what forcing is really doing.
Pro: it is easier to understand what forcing is trying to do in this way, and if one has a category-theoretic background, this is arguably the most natural way of understanding forcing. In addition, many of the results from the study of Boolean algebras can be applied to the study of generic extensions, a connection that is obscured in the countable transitive model approach.
Con: substantial prerequisites in the theory of complete Boolean algebras, including the separative quotient of a poset, regular open completion of a separative poset, etc.
This is a slight variant of Boolean-valued models. One uses the Boolean analogue of the ultrapower construction to obtain an (uncountable) model over which generics exist. Joel Hamkins provided a naturalist account of forcing based on this idea.