An introductory overview of forcing, from the learner's perspective

Note: the goal of this article is to give a sense of the kind of difficulties, both conceptual and technical, one might encounter if one decides to learn the mathematical tool of forcing. As such it is not my intention to provide an intuitive motivation for forcing, or explain the inner workings of it in a perspicuous way. Already there are excellent sources for that: for example see A Philosopher’s Guide to Forcing by Toby Meadows or A Beginner’s Guide to Forcing by Timothy Chow.

What I intend to achieve here is somewhat particular: after a first course in formal logic, set theory, or Gödel’s theorems, a motivated student might see the term “forcing” pop up on the occasional Google chase. They might be intrigued as to why this technique won Paul Cohen, its discoverer, a Fields Medal, for instance. But “what is forcing” is a question that is highly difficult to address in office hours, because there are quite a few moving pieces involved. Each of these pieces brings its own special kind of unease to the student, making the whole thing quite daunting. It is simply my hope here to record these moving pieces and their associated difficulties.

Due to this peculiar aim, the article is sprinkled with pausing remarks (click to expand) on what is difficult about the matter at hand and whether it’s conceptual or technical. My judgment is that forcing is not substantially harder than any other main theorem in graduate math textbooks. It is simply the variety of concepts involved and the unfamiliarity of the tools used that have earned forcing its notoriety.

An earlier, simpler draft of this post was first published in Chinese on the Q&A platform Zhihu. One may view the original version here: link to the Chinese version.

Forcing is the name of a technical method to establish *independence results* in set theory. These are results saying that certain statements **cannot** be deduced from the axioms of set theory (the Zermelo-Fraenkel axioms with Choice).

To show that a statement cannot be deduced from a theory, one shows that the negation of the statement is *consistent* with the theory. Given a mathematical theory $T$, we write $\mathsf{Con}(T)$ for the statement that $T$ is consistent: i.e., there is no proof that starts with the axioms of $T$ and ends with a contradiction.

If for some sentence $P$ we succeed in showing $\mathsf{Con}(\mathsf{ZFC}+P)$, then we have succeeded in showing that $\mathsf{ZFC}$ cannot refute $P$. This is because if $\mathsf{ZFC}$ refutes $P$, then this means $\mathsf{ZFC}$ proves $\neg P$ (“not-P”); therefore from the theory $\mathsf{ZFC}+P$ one can deduce $P \wedge \neg P$, a contradiction, by doing the following: first deduce $\neg P$ from $\mathsf{ZFC}$ alone, and then deduce $P$.

Forcing allows one to conclude, under suitable assumptions laid out below, $\mathsf{Con}(\mathsf{ZFC}+P)$ for various choices of $P$. Different people have different ways of understanding and implementing it, but arguably the most straightforward way to make sense of it is that it is a method of building new models of ZFC from old ones.

It turns out that talking about proofs is not the easiest thing to do. It is easier to work with models instead. A model of a theory, roughly put, is just a set of things that satisfies that theory. The connection between proofs and models was confirmed by Kurt Gödel in his doctoral thesis:

Fact. (Gödel’s Completeness Theorem): $\mathsf{Con}(T)$ if and only if $T$ has a model.

A technical difficulty lies in understanding the statements $\mathsf{Con}(\mathsf{ZFC})$ and "$\mathsf{ZFC}$ has a model" as precise mathematical statements.

In the strictest sense, $\mathsf{Con}(\mathsf{ZFC})$ is to be construed as the number-theoretic claim that, roughly, there does not exist a natural number which in Unicode encodes a sequence of symbols, in which the starting lines are axioms of $\mathsf{ZFC}$, the last line is the sentence $0=1$, and the intermediate lines are all obtained by valid rules of inference in first-order logic. This whole business of encoding sequences of symbols by natural numbers is fiddly and not very fun to go through. Fortunately it accords with one's intuition of how proofs work, so this is familiar territory for someone comfortable with proofs; it is just painstakingly formalized.

Fact: One can write down formulas $\mathsf{Axiom}(x)$, $\mathsf{isProof}(x,y)$ in the language of number theory, such that if number theory can prove $\mathsf{Axiom}(n)$, then $n$ really is the Unicode encoding of one of the $\mathsf{ZFC}$ axioms; and if number theory can prove $\mathsf{isProof}(a,b)$, then $a$ really is the Unicode encoding of a $\mathsf{ZFC}$-proof of the statement encoded by $b$.
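To get a taste of the arithmetization, here is a toy Python sketch (my own illustration, not the official coding used in the literature): a line of symbols is packed into a single natural number via its Unicode code points, and unpacked again.

```python
# Toy illustration of coding syntax inside arithmetic: a string of
# symbols becomes one natural number by treating its Unicode code
# points (shifted by 1) as digits in a large base. Not the official
# arithmetization of ZFC, just the flavor of it.

BASE = 0x110001  # strictly larger than any Unicode code point + 1

def encode(line: str) -> int:
    """Pack a sequence of symbols into a single natural number."""
    n = 0
    for ch in line:
        n = n * BASE + (ord(ch) + 1)  # shift by 1 so no digit is 0
    return n

def decode(n: int) -> str:
    """Unpack the natural number back into the sequence of symbols."""
    chars = []
    while n > 0:
        n, digit = divmod(n, BASE)
        chars.append(chr(digit - 1))
    return "".join(reversed(chars))

line = "0=1"
assert decode(encode(line)) == line
```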

Similarly, one needs to strictly formalize what a theory is and what it means for a theory to have a model. This is of course routine in an advanced logic course, but it is highly unlikely that someone without a logic background can spell out what a model really is. Again, the good news is that this doesn't matter much, because it stays pretty close to our mathematical practice and intuition.

So if our goal is to show that $\mathsf{ZFC}$ cannot deduce $P$ (so we want to show $\mathsf{Con}(\mathsf{ZFC}+\neg P)$), it is enough to build a model of $\mathsf{ZFC}+\neg P$. This shifts the focus from talking about proofs, which are strings of symbols generated by some fixed rule, to models, which are perhaps more tangible and familiar.

Unfortunately, one cannot just build a model of $\mathsf{ZFC}$ out of thin air. Another theorem by Gödel, his second *in*completeness theorem, in simple terms states:

If $\mathsf{ZFC}$ proves that $\mathsf{ZFC}$ has a model, then it can also deduce a contradiction.

So if $\mathsf{ZFC}$ is consistent to begin with, we cannot use it to deduce $\mathsf{Con}(\mathsf{ZFC})$, let alone $\mathsf{Con}(\mathsf{ZFC}+P)$. This leaves us with the second-best option: *assume* that $\mathsf{ZFC}$ is consistent, and then show $\mathsf{Con}(\mathsf{ZFC}+P)$.

This is where models come in: assuming $\mathsf{ZFC}$ is consistent provides us with a model of it. With forcing, one can carefully pick out an element, add it to the model, and make sure that the resulting model does what one wants.

Models of $\mathsf{ZFC}$ are very complex. A model of $\mathsf{ZFC}$ will contain everything that $\mathsf{ZFC}$ proves to exist (i.e., pretty much all of math). In particular, it proves that the following set exists: $\{f \mid f: \mathbb{N}\to \{0,1\}\}$. Intuitively, this is the set of all infinite sequences of zeroes and ones. There is a way of viewing them as the real numbers, which is why set theorists and recursion theorists often like to refer to elements in this set as the real numbers. In particular, the set $\{f \mid f: \mathbb{N}\to \{0,1\}\}$, which we denote by $2^\mathbb{N}$, is uncountable.

Paul Cohen invented forcing for the express purpose of adding such a $0-1$ sequence to a model of $\mathsf{ZFC}$. To do this, he restricted his attention to models of $\mathsf{ZFC}$ that he knew for sure would be missing at least one $0-1$ sequence: the countable transitive ones.

One of the first great stumbling blocks for learners of forcing (and mathematical logic really) is the existence of countable models of $\mathsf{ZFC}$. This presents a seeming contradiction between the facts at hand:

1. (Cantor's Theorem) $\mathsf{ZFC}$ proves that uncountable sets exist; in particular, the set of real numbers, or $2^\mathbb{N}$, is uncountable; but,

2. (Downward Löwenheim-Skolem Theorem) If $\mathsf{ZFC}$ has any model at all, then it has a countable model.

These two facts seem to contradict each other in a confusing way, so much so that the tension has earned its own name (Skolem's paradox) and its own article in the Stanford Encyclopedia of Philosophy.

The existence of countable models certainly presents conceptual difficulties for someone seeing this for the first time. Indeed this is one of those "you'll get used to it" bits in math. But perhaps let me give an intuitive explanation:

$\mathsf{ZFC}$ has only countably many formulas. This entails that the theory can only talk about countably many things. In particular, it can only define countably many things, and prove that they exist. A structure can be a model of $\mathsf{ZFC}$ if it doesn't rule out the existence of these things. So to obtain a model of $\mathsf{ZFC}$, we are only required to throw in the things that $\mathsf{ZFC}$ defines and proves to exist: there are only countably many of these!

What about the set of real numbers, which $\mathsf{ZFC}$ proves to exist? Well, we are only building a structure that satisfies a collection of sentences. There is no guarantee that we are really building the actual mathematical universe. Thus if $M$ is a countable model, $M$ is going to satisfy "there exists a unique set of real numbers", but there is no guarantee that the set $M$ judges to be the real numbers really is the set of all real numbers (although there might be overlap).

A model $M$ is transitive if and only if $x\in M$ implies $x\subseteq M$. Intuitively, the model $M$ “sees” the elements of its elements, the elements of those elements, and so on. Transitive models are nice, in that if $M_1\subseteq M_2$ are transitive models, then they agree on the meaning of basic, simple terms. For instance, if $x\in M_1$ is such that $M_1$ satisfies the sentence $``x \text{ is the set of natural numbers}”$, then $M_2$ will satisfy the same sentence. Recall that the use of forcing is to build a model of $\mathsf{ZFC}$ on top of another one, so this $M_1\subseteq M_2$ situation is precisely what we will end up with. Therefore it is important to be able to hold the meaning of certain terms fixed across these two models.

One needs of course to define what "basic, simple terms" are and prove that they behave as intended. For those interested, these refer to the things on the $\Delta_1$ level of the Lévy hierarchy. And the technical term for "holding the meaning fixed across models" is "absoluteness".

One should also note that "there exists a countable transitive model of $\mathsf{ZFC}$" is strictly stronger than "there exists a countable model of $\mathsf{ZFC}$". In fact, it is much, much stronger. One can show that "there exists a countable transitive model of $\mathsf{ZFC}$" implies $\mathsf{Con(ZFC)}$, $\mathsf{Con(ZFC+Con(ZFC))}$, $\mathsf{Con(ZFC+Con(ZFC+Con(ZFC)))}$, and so much more.

This means we are really cheating a little bit when we restrict attention to countable transitive models, whose existence is not really guaranteed by the starting assumption of merely $\mathsf{Con(ZFC)}$. The technical difficulty here lies in dispensing with the transitive model assumption while still working with it.

This is done via the Reflection Principle, which says that, given any finite subset of the $\mathsf{ZFC}$ axioms, one can find a countable transitive model of it. And the proof of $\mathsf{Con(ZFC)}\rightarrow \mathsf{Con(ZFC+P)}$ proceeds in two steps:

1. If $\mathsf{Con(ZFC+P)}$ does not hold, then $\mathsf{ZFC+P}$ can deduce a contradiction in finitely many steps, using only finitely many axioms of $\mathsf{ZFC+P}$. Let's say it uses the axioms $A_1,...,A_{1024}, P$.

2. But the forcing construction will also end up using only finitely many axioms of $\mathsf{ZFC}$, say the axioms $B_1,...,B_{2048}$. Using the Reflection Principle, one obtains a countable transitive model of $\{A_1,...,A_{1024}, B_1,...,B_{2048}\}$, and uses forcing to produce a countable transitive model of $\{A_1,...,A_{1024}, B_1,...,B_{2048}, P\}$. But we just assumed the latter theory is inconsistent!

Therefore, working from the $\mathsf{ZFC}$ axioms $\{A_1,...,A_{1024}, B_1,...,B_{2048}\}$, we proved the existence of a model of $\{A_1,...,A_{1024}, P\}$. This is impossible, if $\mathsf{ZFC}$ is consistent. So our starting assumption must be wrong. Hence $\mathsf{Con(ZFC+P)}$ must hold.

Suppose $M$ is a countable transitive model of $\mathsf{ZFC}$. Since $M$ is countable, we know for sure that there are sets missing from $M$. One example is a bijection $g: M\to \mathbb{N}$ witnessing the countability of $M$. Another example is the “height of $M$”: this is a common name for the set of ordinals in $M$, that is, $M \cap \text{Ordinals}$. Being an ordinal is a basic, simple notion that $M$ correctly recognizes, so if $M \cap \text{Ordinals}$ were an element of $M$, then $M$ would satisfy “the class of ordinals forms a set”, which $\mathsf{ZFC}$ refutes (the Burali-Forti paradox).

It is also clear that adding a single set to $M$ won’t be enough. This is because $\mathsf{ZFC}$ axioms are “generative” in spirit: if $x$ is a set, then $\{x\}$ is also a set; if $x,y$ are sets, then so is $\{x,y\}$, and so on. So whatever set we add to $M$, we will also have to add the sets that are generated by it, as required by the axioms. And the sets that are generated by those, ad infinitum.

The point of the last two paragraphs is this: while we know there are plenty of things that cannot be in $M$, we need to be careful about what we add to it. For instance, the new set should not be too informative: it should not make the countability of $M$, or the “set-ness” of its ordinals, apparent.

Recall that our goal is to add a $0-1$ sequence to $M$. Such a sequence must exist, since there are uncountably many of these and $M$ is countable. But how do we pick out such a sequence? It cannot be too simple, because simple objects are guaranteed to already be in $M$; it cannot be too specific, because adding it to $M$ might not result in a model of $\mathsf{ZFC}$ any more. This was the challenge Cohen faced, which he describes in his account of the discovery of forcing.

The innovation of forcing derives from the remarkable decision that this sequence is going to be constructed using partial information. Given that our object of interest is an infinite $0-1$ sequence, we take partial information to mean the finite initial segments of the sequence.

Definition. The Cohen poset (“poset” is short for partially ordered set) is the set of all finite $0-1$ sequences, $\{s\mid s: \{0,…,n\}\to \{0,1\},\ n\in \mathbb N\}$, together with the empty sequence, ordered by end-extension.

And the informal notion of construction using partial information is made precise by appealing to the notion of a filter.

Definition. A filter on the Cohen poset is a collection $G$ of finite $0-1$ sequences that is coherent in the following sense: 1) the empty sequence is in $G$; 2) if a sequence is in $G$, then all of its initial segments are also in $G$; 3) if $s_0,s_1$ are two sequences in $G$, then one must be an extension of the other.

For example, if $f: \mathbb N\to\{0,1\}$ is any $0-1$ sequence whatsoever, the set of all its initial segments will be a filter. This tells us that filters alone are not enough: every $0-1$ sequence corresponds to a filter, so this idea of “construct it using partial information” is too broad. It does not avoid the “too informative”, “too specific”, or “too simple” problems. What’s missing is an additional restriction on the filters so that the object we end up obtaining is perfectly average: not too simple as to be fixed across all models, not too specific as to be generated outright in $M$, and not too informative as to contradict $\mathsf{ZFC}$.
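To make the poset and the filter clauses concrete, here is a toy Python sketch (representations and helper names are my own): conditions are modeled as tuples of 0s and 1s, and we check that the initial segments of an infinite sequence do form a filter.

```python
from itertools import islice

def extends(s, t):
    """A condition s extends t iff t is an initial segment of s."""
    return len(s) >= len(t) and s[:len(t)] == t

def is_filter(G):
    """Check the three filter clauses for a finite set G of conditions."""
    return (
        () in G                                                # 1) empty sequence
        and all(s[:k] in G for s in G for k in range(len(s)))  # 2) initial segments
        and all(extends(s, t) or extends(t, s)                 # 3) pairwise
                for s in G for t in G)                         #    comparable
    )

def initial_segments(f, n):
    """The initial segments (up to length n) of an infinite sequence f."""
    bits = tuple(islice(f, n))
    return {bits[:k] for k in range(n + 1)}

def alternating():  # the sequence 0, 1, 0, 1, ...
    i = 0
    while True:
        yield i % 2
        i += 1

assert is_filter(initial_segments(alternating(), 6))
```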

Cohen realized this missing restriction was to be the topological notion of genericity. In addition to being coherent, the filter is further required to pick out elements that are perfectly average in the following sense.

Definition. A dense subset of the Cohen poset is a set $D$ of conditions such that every finite $0-1$ sequence has an extension in $D$. A filter is $M$-generic if it intersects every dense subset of the Cohen poset that is an element of $M$.

So generic filters capture the idea of being not too specific.

They are not too simple either. In fact, if $G$ is an $M$-generic filter on the Cohen poset, then it can be shown that $G$ is not an element of $M$. This is roughly because, if $G\in M$, then we could use $G$ to express the property of “not being in $G$” and show that the finite $0-1$ sequences with this property form a dense subset in $M$; genericity would then force $G$ to meet this dense set, a contradiction.

We observe that, for each natural number $n$, the set of finite binary sequences of length at least $n$ is dense. Or in other words, it is a generic property of binary sequences to have length $\geq n$. So for each $n$, a generic filter will contain finite sequences of length at least $n$.

This implies that the construction indeed gives us what we want. If $G$ is an $M$-generic filter, then piecing together its elements (i.e., taking $\bigcup G$) gives us a genuine function $f: \mathbb N\to \{0,1\}$, an infinite $0-1$ sequence that is not in $M$.

But do $M$-generic filters exist at all? If $M$ is a countable transitive model, then yes. This result is known as the Rasiowa–Sikorski lemma, proven about a decade before the invention of forcing.
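The proof is a diagonal construction that is easy to sketch: enumerate the dense sets lying in $M$ as $D_0, D_1, \dots$ (this is exactly what the countability of $M$ buys us) and extend a single condition step by step into each $D_n$; the chain of conditions so produced generates a generic filter. A toy Python rendering (my own presentation), with each dense set given as a function that extends any condition into it:

```python
# Toy Rasiowa-Sikorski construction: walk one condition through a
# countable list of dense sets, extending it into each in turn.

def meet_all(dense_extenders):
    """Return a condition meeting every dense set in the list."""
    p = ()  # start with the empty sequence
    for extend_into in dense_extenders:
        p = extend_into(p)  # later steps only extend p, so earlier targets stay met
    return p

# "Has length at least n" is dense: pad with zeroes.
def at_least(n):
    return lambda p: p + (0,) * max(0, n - len(p))

# "Takes the value 1 somewhere" is dense: append a 1 if needed.
hits_one = lambda p: p if 1 in p else p + (1,)

p = meet_all([hits_one] + [at_least(n) for n in range(10)])
assert 1 in p and len(p) >= 9

# Taking the union of the chain of conditions (the map G -> U G from
# the main text) then yields the desired infinite 0-1 sequence.
```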

By all means this is one of those pleasant moments in mathematics where ideas from various fields converge. It turns out that the existence of a generic filter, which is motivated by our attempt to build an object outside of $M$ that is perfectly average, is a close relative of the Baire category theorem. The exact equivalence is spelled out in this paper by Goldblatt.

The proof of the Rasiowa–Sikorski lemma reveals a strong flavor of diagonal arguments, hinting at a uniformity between the Baire category proof of the uncountability of the reals and the more familiar diagonalization proof. This analogy underlies much of the study of what’s known today as forcing axioms (see for instance the exposition by Viale).

The resulting sequence obtained by piecing together the generic filter is now called a Cohen (generic) real over $M$. Due to later work of Solovay, we now know in hindsight that if $M$ is a countable transitive model of $\mathsf{ZFC}$, then in the topological sense almost every real number (all but a meager set) is generic over $M$.

Forcing is a tool to build new models out of old ones. So far, we started with a countable transitive model, and motivated the notion of generic filters. Such filters exist, and the resulting object is perfectly average in a precise sense. But we’ve only built an infinite binary sequence. How do we build new models of $\mathsf{ZFC}$?

In the above, we observed that it is not enough to just add the generic object to $M$. We need to add the things that are generated by it. A doomed attempt is to try to find some way to describe what one needs to add to $M$: i.e., if $f$ is the generic sequence, we require that $\{f\}$ be in the new model, that $\{\{f\}\}$ be in the new model, and so on. This is doomed because the $\mathsf{ZFC}$ axioms are so complicated that there is simply no hope for some tangible description to round out what one needs to add to $M$ in order to obtain a model of $\mathsf{ZFC}$. What’s missing, then, is some kind of process that “takes care of itself”, so to speak.

The incredibly ingenious realization by Cohen was that, just as the generic object is built using partial information, so too can the entire extension model of $M$ be built. To understand this, we need to understand a theorem (really an axiom) of $\mathsf{ZFC}$: every set can be obtained by the following recursive process, carried out transfinitely:

- stage 0: take the empty set
- stage $\alpha+1$: take all the subsets of the things in stage $\alpha$
- at limit stage $\lambda$: collect all things gathered so far.

The familiar reader will notice that this is an informal description of the von Neumann hierarchy, where $V_0=\emptyset$, $V_{\alpha+1}=\mathcal{P}(V_\alpha)$, and $V_\lambda = \bigcup_{\beta<\lambda}V_\beta$ for limit ordinals $\lambda$.
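The finite stages of this hierarchy are small enough to compute literally. A minimal Python sketch (my own illustration), with hereditarily finite sets modeled as frozensets:

```python
from itertools import chain, combinations

def powerset(s):
    """All subsets of a frozenset, each again as a frozenset."""
    items = list(s)
    return frozenset(
        frozenset(c)
        for c in chain.from_iterable(
            combinations(items, r) for r in range(len(items) + 1))
    )

V = [frozenset()]   # stage 0: the empty set
for _ in range(4):  # a few successor stages
    V.append(powerset(V[-1]))

print([len(stage) for stage in V])  # [0, 1, 2, 4, 16]
```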

We are slowly getting into the unfamiliar territory of definition by transfinite recursion and proof by transfinite induction. The reader who has never heard of the term ordinal might be at a loss here. In more familiar recursions and inductions, one moves along the natural numbers. The ordinals, simply put, are objects that mark the steps of a recursion or induction that goes on longer than the natural numbers.

Some math textbooks will cover transfinite recursion or transfinite induction (for instance Terence Tao's *Analysis*). But more likely than not, transfinite recursion and induction will take a little getting used to. Moreover, the transfinite recursions to come will be massively more complicated than what one typically encounters in real analysis or formal logic. This might create a sense of unease for the faint of heart.

It is an axiom of $\mathsf{ZFC}$ that every set belongs to one of these stages. Intuitively, the von Neumann hierarchy describes a process building a model of $\mathsf{ZFC}$. Cohen decided to mimic this process, using the generic sequence we obtained.

The trick here requires a new perspective. That is, we no longer view the statement $x\in X$ as a proposition, but we view it as expressing a degree of truth. Of course, in the classical case, to assert $x\in X$ is to assert that $x\in X$ is true to degree 1, and to assert $x\notin X$ is to assert that $x\in X$ is true to degree 0 (i.e., not true at all).

That is, we would like to identify each set $X\in M$ with its characteristic function $char_X$, where $char_X(x)=1$ if and only if $x\in X$. Since this is set theory, mind you, functions are really just sets of ordered pairs. So we shall view a set $X$ as a set of pairs drawn from $X \times \{0,1\}$, and we take the “true” elements of $X$ to be those that are paired with $1$.

One way to “extend” $M$ (in scare quotes because we are not really adding anything), is to do this for each element of $M$, so we end up getting the set of “possible elements of $M$”:

$\{(x,i)\mid i\in \{0,1\}~\&~ x\in M\}$

We can view these possible elements of $M$ as being undetermined whether they are in $M$ or not (so a little bit like Schrödinger’s poor cat), waiting to be given a “truth value” as to the question “am I in $M$?” One can also think of them as *names* of elements that will go into $M$, waiting to be called out by some kind of selector.

Of course, it is trivial to recover $M$ from the set of its possible elements: just take all the $(x,1)$, and take the left coordinate.
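In code, this toy naming scheme and its inverse are one-liners (a sketch; any finite set stands in for $M$):

```python
M = {"a", "b", "c"}  # any finite set, standing in for M

# The "possible elements": every element paired with each truth value.
possible = {(x, i) for x in M for i in (0, 1)}

# Recovering M: keep exactly the names paired with the privileged value 1.
assert {x for (x, i) in possible if i == 1} == M
```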

Let’s pause here and reflect on the above way of thinking. We have a way of naming the possible elements of $M$, and we can recover $M$ by designating a privileged “truth value” (in our case $1$) that picks out which names we call out. Our target is to modify this strategy so that it names the possible elements of the bigger model extending $M$, and we would like to find some way to use the generic filter as well.

There is a shortcoming with the current strategy: it’s not expressive enough. By this I mean the names take the following form: $(a,b)$, where $a$ is an element of $M$ and $b$ is either $0$ or $1$. In their present form, they are too closely tied to the elements of $M$ to let us name an element that is not in it. The strategy also relies too much on $M$: in order to “extend” $M$, we need to first have $M$ ready, to form the names and call out the correct ones. These defects doom the current strategy as a way of *building/talking about a structure before we have it*.

Recall that it is wise to leave the construction of the extension model to a process that somehow “takes care of itself”. And a sensible candidate is to imitate the von Neumann hierarchy in a way that reflects the recursive nature of the construction. The essential idea of the von Neumann hierarchy is this:

- Build things in stages. At each stage, use only what is available at that stage.

So our current strategy fails to capture this guiding principle, because it considers every element of $M$ at once, ignoring the hierarchical nature of the sets.

So let us mimic the von Neumann hierarchy, not in the construction of the new sets themselves, but in the construction of the *names*.

Definition. Call the two-element poset $\{0,1\}$ the *truth value poset*.

Definition. Let us consider the Name hierarchy for the truth value poset:

- $\text{Name}_0$ contains only the empty set. The empty set is the only level $0$ name.
- $\text{Name}_{\alpha+1} = \mathcal{P}(\text{Name}_\alpha\times \{0,1\})$.
- For limit ordinals $\lambda$, $\text{Name}_\lambda=\bigcup_{\beta<\lambda}\text{Name}_\beta$.

Of course, the name hierarchy described above is solely restricted to the truth value poset, and intended as an illustration.
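The low stages of this name hierarchy are also finite, so one can compute them outright. A toy Python sketch of the definition above (my own illustration):

```python
from itertools import chain, combinations

def powerset(s):
    """All subsets of a frozenset, each again as a frozenset."""
    items = list(s)
    return frozenset(
        frozenset(c)
        for c in chain.from_iterable(
            combinations(items, r) for r in range(len(items) + 1))
    )

POSET = (0, 1)  # the truth value poset

def next_names(names):
    """Name_{alpha+1} = P(Name_alpha x {0,1})."""
    return powerset(frozenset((sigma, p) for sigma in names for p in POSET))

Name = [frozenset({frozenset()})]  # Name_0: only the empty set
for _ in range(2):
    Name.append(next_names(Name[-1]))

# Name_1 = P({empty set} x {0,1}) has 2^2 = 4 names; Name_2 has 2^8 = 256.
print([len(stage) for stage in Name])  # [1, 4, 256]
```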

The definition of names is probably one of the greatest challenges for someone seeing forcing for the first time.

The key point is that the construction of the names fully mimics the construction of the von Neumann hierarchy. At stage 0, we take the empty set and declare that to be a name. At stage $\alpha+1$, we take the names obtained in the previous stage, pair those with either $0$ or $1$, take arbitrary collections of the results, and declare these to be names of level $\alpha+1$. At limit stages, gather all names we've obtained so far.

In some texts, names are given the following definition: given a poset $\mathbb{P}$, a $\mathbb{P}$-name is any (possibly empty) set of ordered pairs of the form $(\sigma, p)$, where $\sigma$ is a $\mathbb{P}$-name and $p\in \mathbb{P}$. This seemingly circular definition is secured by noting that the empty set is a name, and names are built in the hierarchical way described above. An analogous description of what a **set** is might help: the empty set is a set; a set is anything that contains elements of the form $x$, where $x$ is a set.

I've attempted to tell a somewhat motivating story of how names come about. It's also interesting to note what Cohen himself remarked on the discovery of (something in the spirit of) the name hierarchy:

"There are certainly moments in any mathematical discovery when the resolution of a problem takes place at such a subconscious level that, in retrospect, it seems impossible to dissect it and explain its origin. Rather, the entire idea presents itself at once, often perhaps in a vague form, but gradually becomes more precise.

This idea as it presented itself to me, appeared so different from any normal way of thinking, that I felt it could have enormous consequences."

So we’ve improved our description of the names of the possible elements in the “extension” of $M$. This is done by mimicking the von Neumann construction of the sets. But what we have are just names so far. We don’t have a model yet. We need a way, again, to turn names into real elements.

Definition. The filter $T$ on the truth value poset is the set $\{1\}$.

To recover $M$ from the names for the truth value poset, we also follow von Neumann’s guiding principle. This is sensible, because the names themselves are constructed recursively in a hierarchical fashion.

Note that we can’t just say “take the left elements that are paired with $1$” like before, since these left elements are themselves names. One needs to recursively give the names values, as follows:

1. $M_0$ contains only the empty set. That is, the empty set (as a name) gets evaluated by the filter $T$ to be the empty set.

2. $M_{\alpha+1}$ contains the $T$-evaluations of the names in $\text{Name}_{\alpha+1}$, in the following sense: if $\tau$ is a name in $\text{Name}_{\alpha+1}$, then its $T$-evaluation is obtained by $T$-evaluating the names in it that are paired with $1$. Or in symbols:

$val_T(\tau) = \{val_T(\sigma) \mid (\sigma, 1)\in \tau\}$

3. At limit stages, $M_\lambda$ is the set of all evaluations from the stages before $\lambda$.

One may verify, somewhat tediously, that the set of evaluations of the names (with the name hierarchy computed inside $M$) is exactly $M$.
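The stage-by-stage evaluation can be compressed into a single recursion, which is how one would program it. A toy Python sketch (names modeled as frozensets of pairs, as in the earlier hierarchy sketch):

```python
T = {1}  # the filter on the truth value poset

def val(tau, filt):
    """Evaluate a name: keep the entries whose attached value lies in
    the filter, each evaluated recursively in turn."""
    return frozenset(val(sigma, filt) for (sigma, p) in tau if p in filt)

empty_name = frozenset()
assert val(empty_name, T) == frozenset()  # the empty name yields the empty set

# The name {(empty_name, 1)} evaluates to the singleton of the empty
# set; pairing with 0 instead leaves nothing to call out.
assert val(frozenset({(empty_name, 1)}), T) == frozenset({frozenset()})
assert val(frozenset({(empty_name, 0)}), T) == frozenset()
```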

So we’ve improved our method of naming possible elements of a model and then instantiating them to be really elements of a model. This was done in the toy example with the truth value poset $\{0,1\}$ and the filter $T=\{1\}$. Forcing builds new models by taking more complicated posets and using generic filters on them as valuations. For instance:

Definition. The name hierarchy for the Cohen poset $\mathbb{C}$ is defined recursively as follows

- $\text{Name}_0$ contains only the empty set. The empty set is the only level $0$ name.
- $\text{Name}_{\alpha+1} = \mathcal{P}(\text{Name}_\alpha\times \mathbb{C})$
- For limit ordinals $\lambda$, $\text{Name}_\lambda=\bigcup_{\beta<\lambda}\text{Name}_\beta$

Notice that, in the view we are setting up, elements of the Cohen poset themselves take the role of truth values. So we still think of a statement of the form $x\in X$ as an assertion of a degree of truth; however, we no longer take the “degrees of truth” to be just the set $\{0,1\}$. We abstractly take the “degrees of truth” to be the Cohen poset.

And if $G$ is an $M$-generic filter on the Cohen poset, the set of evaluations of the names in $M$, which we shall call the *generic extension* of $M$ by $G$ (in symbols, $M[G]$), is also obtained recursively.

1. $M[G]_0$ contains only the empty set. That is, the empty set (as a name) gets evaluated by the filter $G$ to be the empty set.

2. $M[G]_{\alpha+1}$ contains the $G$-evaluations of the names in $\text{Name}_{\alpha+1}$, in the following sense: if $\tau$ is a name in $\text{Name}_{\alpha+1}$, then its $G$-evaluation is obtained by $G$-evaluating the names in it that are paired with some condition in $G$. Or in symbols:

$val_G(\tau) = \{val_G(\sigma) \mid \exists s\in G~ (\sigma, s)\in \tau\}$

3. At limit stages, $M[G]_\lambda$ is the set of all evaluations from the stages before $\lambda$.
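In code, the change from the truth value toy example is minimal: the test “paired with $1$” becomes “paired with some condition in $G$”. A toy Python sketch, with conditions as tuples as before:

```python
def val_G(tau, G):
    """The same recursion as before, with Cohen conditions as the
    attached 'truth values' and membership in G as the test."""
    return frozenset(val_G(sigma, G) for (sigma, s) in tau if s in G)

# A name that contains (the name of) the empty set just in case the
# generic sequence begins with a 1:
tau = frozenset({(frozenset(), (1,))})
assert val_G(tau, {(), (1,)}) == frozenset({frozenset()})  # (1,) in G
assert val_G(tau, {(), (0,)}) == frozenset()               # (1,) not in G
```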

$M[G]$ denotes the union of all such $M[G]_\alpha$’s. The term “forcing” derives from the following definition of the relation between the poset in question and what $M[G]$ satisfies.

Definition. If $s$ is an element of the Cohen poset and $P$ is a statement about elements of $M[G]$, then we write $s\Vdash P$ (“$s$ forces $P$”) to mean: every generic filter $G$ having $s$ as an element will make $M[G]$ satisfy $P$.

In many texts at this point, basic properties of $M[G]$ will be established. The most important ones are:

1. $M[G]$ is countable and transitive; $G\in M[G]$; $M\subseteq M[G]$; and any transitive model of $\mathsf{ZFC}$ satisfying these conditions will contain $M[G]$.
2. $M[G]$ satisfies $\mathsf{ZFC}$.

The proof of 2 is far from trivial. It requires a careful study of the relation between $M$, $G$, and $M[G]$, as well as the forcing relation. Key components are the Truth Lemma and the Definability Lemma, which jointly say that the forcing relation $s \Vdash P$ is equivalent to a topological property of the poset in question. This makes it a mathematical property that $M$ can handle using the $\mathsf{ZFC}$ axioms, rather than a metamathematical property about countable transitive models.

The majority of the time and energy on a first go at forcing will be spent on these facts, which involve very tedious proofs by induction. In particular, the equivalent topological property given by the Truth and Definability Lemmas will involve complex recursive definitions that are not intuitively clear at all.

So that is how forcing builds new models of $\mathsf{ZFC}$ out of old ones. I should stress that we’ve only looked at forcing with toy examples, namely the Cohen poset and the truth value poset. Forcing itself is incredibly versatile, in that one can use forcing with whatever poset one desires, and sophisticated arguments and extensions have been produced using clever designs of posets. For instance, let me say a little bit about how one obtains a model in which $2^{\aleph_0}\neq \aleph_1$ (so the continuum hypothesis fails in that model).

This is done by considering a suitable product of the Cohen poset.

- $M$ thinks that there is a poset consisting of $\aleph_2$-many copies of the Cohen poset.
- Forcing with it produces a generic extension that has that many infinite $0-1$ sequences, and hence that many real numbers.
- One then checks that the generic filter $G$ is not so informative as to disturb the calculation of what $\aleph_2$ (“the second uncountable cardinal”) is between $M$ and $M[G]$. This step is necessary because, as we’ve seen, adding things to $M$ might reveal the countability of elements of $M$.
- This is ensured by appealing to the truth lemma and the definability lemma, and using a topological property of the poset known as the countable chain condition.
- So one concludes that, in $M[G]$, $2^{\aleph_0}\geq \aleph_2$.

I should also point out that the approach outlined in this post, namely the approach using countable transitive models, is one of several equivalent approaches to forcing. Some others are:

- Syntactic forcing.

In syntactic forcing, one completely dispenses with any talk of models or truth in a model. Rather, one proceeds directly to the topological property given by the Truth Lemma and the Definability Lemma, making forcing really the study of topology and of the Unicode encodings of formulas.

Pro: no need to talk about countable transitive models or the existence of generics any more.

Con: involves complicated coding and translating semantic notions (such as truth) into somewhat stilted syntactic notions that are not very intuitive.

- Boolean-valued models.

In this approach, one takes the truth value metaphor seriously. If one forces with the special class of posets known as the complete Boolean algebras, one can make precise the notion of a Boolean-valued characteristic function (which is what names are in this approach) and the corresponding Boolean-valued model. So names end up playing the roles of actual elements in a Boolean-valued model. This was invented by Solovay and Scott shortly after Cohen’s invention of forcing, in order to obtain a conceptually perspicuous way of thinking about what forcing is really doing.

Pro: it is easier to understand what forcing is trying to do in this way, and if one has a category-theoretic background, this is arguably the most natural way of understanding forcing. In addition, many results from the study of Boolean algebras can be applied to the study of generic extensions, a connection that is obscured in the countable transitive model approach.

Con: substantial prerequisites in the theory of complete Boolean algebras, including the separative quotient of a poset, regular open completion of a separative poset, etc.

- Boolean-valued ultrapowers.

This is a slight variant of Boolean-valued models. One uses the Boolean analogue of the ultrapower construction to obtain an (uncountable) model over which generics exist. Joel Hamkins provided a naturalist account of forcing based on this idea.

Finally, some pointers for further reading:

- For an introductory text, Kunen’s *Set Theory: An Introduction to Independence Proofs* is really hard to beat. This book takes the countable transitive model approach, but has a section devoted to addressing other approaches.
- Jech’s *Set Theory: The Third Millennium Edition* is another classic. This encyclopedia of a book contains an introductory chapter on forcing and uses the Boolean-valued approach.
- If Kunen’s and Jech’s books are too long or detailed, two shorter, more focused alternatives are *Forcing for Mathematicians* by Weaver and *Fast Track to Forcing* by Džamonja. Besides both being dedicated introductions to forcing, the former has the advantage of surveying many actual mathematical problems that have been proven independent using forcing, and it has a few chapters on forcing axioms; the latter surveys many sophisticated posets invented by set theorists.
- This set of lecture notes by Kameryn Williams: very compact and self-contained material that takes one from logical preliminaries all the way to applications of forcing to the study of large cardinals (what a ride!). It has a section devoted to the introduction of forcing.
- Intermediate extensions or symmetric submodels. If $M$ satisfies $\mathsf{ZFC}$, then $M[G]$ does too. So it might seem that independence results about the axiom of choice cannot be obtained by forcing. This is not the case: as Cohen proved, one can look at models of set theory between $M$ and $M[G]$ in a systematic way to obtain models where the axiom of choice fails. This subject is treated in Jech’s *The Axiom of Choice*. The tour de force of this line of thought is no doubt Solovay’s classic 1970 paper in the Annals of Mathematics, where he produced a model in which every set of real numbers is Lebesgue measurable (and much more!).
- Applications of forcing to mathematics. The later chapters of Weaver’s book contain proofs of independence of some open problems. Forcing also provides new tools to study sets of real numbers, which is the topic of Miller’s book *Descriptive Set Theory and Forcing: How to prove theorems about Borel sets the hard way*. For a more comprehensive overview of set-theoretic tools used in real analysis, see *Set Theory and the Analyst*.
- The proliferation of generic extensions makes these extensions and the forcing method themselves interesting objects of study. This is a subfield known as Set-theoretic Geology. See also The Modal Logic of Forcing.
- The proliferation of generic extensions also invites the question: are the axioms of set theory more similar to the axioms of group theory than to the axioms of the real numbers? If so, what can one say about the truth/falsity of independent statements, such as the continuum hypothesis? A provocative view comes from Joel Hamkins in his influential paper The Set-theoretic Multiverse. This whole debate is nicely recounted in the early sections of A Reconstruction of Steel’s Multiverse Project by Maddy and Meadows. The same sections also contain a riveting narrative of how the independence phenomenon influenced the development of set theory and set theorists’ thinking.
- And much much more!