Different kinds of ∞, and the א numbers

We all have some notion of what it means for something to be infinite. Take the biggest number you can think of, and multiply it by 7! Ok, not quite but since we’ve already conjured up the image, let’s work with it. Call this number a. Now let’s, raise it to the power of itself. Are we there yet? Not even close since we still have,

    \begin{align*}\lim\limits_{x\rightarrow\infty} \frac{a^a}{x}=0\end{align*}

no matter how wildly imaginative you were in your choice of fixed \frac{a}{7} in the first place. But can one kind of infinity be smaller/larger than another? Well, a brilliant 19^{\text{th}} century German mathematician thought so, and now, most people agree with him.

When Georg Cantor published his thoughts advancing this notion, it made him the laughing stock of his contemporaries. To quote this interesting paper on Cantor’s life,

Leopold Kronecker, one of Cantor’s teachers and among the most prominent members of the German mathematics establishment, even attacked Cantor personally, calling him a “scientific charlatan,” a “renegade” and a “corrupter of youth”.

The ridicule of his peers caused Cantor immense emotional distress, leading him to shy away from the public eye, and he eventually passed away secluded, and in poverty. Nevertheless, Cantor’s singular thought earned him a little piece of immortality as his theorems live on, and are part of standard curriculum in advanced mathematics. His ideas are essential in Measure Theory,  and its application to Quant Finance through the Stochastic Calculus. In Part I, we provide some basic background in Set Theory, then state and prove what is now known as Cantor’s Theorem.

Bijective Maps

Consider a set X containing N different elements.


These elements can be integers, tractors, hamburgers, hamburgers and tractors, etc. The crucial thing is that the elements are, as Cantor put it,

distinct of our perception (Anschauung) or of our thought.

ln other words, x_i\neq x_j whenever i\neq j, \ 1\leq i,j \leq N. Suppose Y is any other set. We might wish to create some kind of relationship between the elements of X and the elements of Y. The apparatus for this task is our good ol’ friend, the function.

A function or map f:X\rightarrow Y is a machine that takes an element from its domain X, and spits out an element from its codomain Y. There are 3 very general classes of functions, the third just being a combination of the first 2.

Definition (Injection):

A function f:X\rightarrow Y is an injection, or one to one,  if for every x_1,x_2\in X,  f(x_1)=f(x_2) implies x_1=x_2.

Here f is an injection, but not a surjection

As can be seen from the diagram, the injection f maps every element x_i\in X to some  y_j\in Y, such that no two elements of X map to the same element of Y. The visual test for injectivity generally encountered in pre-calculus is the so called horizontal line test, which states that a function is injective if and only if one can draw any horizontal line on the graph of f(x)=y such that the line touches f(x) at at most one point. Note that in our diagram, f(X)\neq Y, leading us to…

Definition (surjection):

A function f:X\rightarrow Y is a surjection, or onto,  if  for every y\in Y, there is at least one x\in X such that y=f(x).

A surjection, but not an injection

Hence, if the image of X under f is the entire codomain Y, i.e. f(X)=Y, then f:X\rightarrow Y is a surjection. sur is the French word for on, which conveys the idea that f maps X onto the entire codomain Y.

Definition 1 (bijection):

A function f:X\rightarrow Y is a bijection if it is both an injection and a surjection.

A bijection is both an injection and surjection


Definition 2 (bijection):

f:X\rightarrow Y is a bijection if and only if there exists a function f^{-1}:Y\rightarrow X, such that for all x\in X and y \in Y   

    \begin{align*}f^{-1}\circ f(x) &:=f^{-1}\left(f(x)\right) = I(x) \\ \text{ and } f\circ f^{-1}(y) &:= f\left(f^{-1}(y)\right)=I(y) \\ \\ \text{where }I(z) &= z, \ \text{(the identity map).}\end{align*}

Hence, whenf^{-1} is both a left inverse and a right inverse, we say that f^{-1} is the inverse function of f. When such an f^{-1} described exists, f is invertible. Hence bijections are always invertible functions and vice versa. It’s a nice exercise to demonstrate the equivalence of these 2 definitions of bijections.

Ok, let’s consider 3 functions, just to summarize.

    \begin{align*}f(x)&=x^2 \\ g(x)&=x^3\\h(x) &=e^x\end{align*}

To start, we’ll assume the 3 map from \mathbb{R} to \mathbb{R}.

The Horizontal Line Test – Can you figure out the approximate scale of the graph? :)

Right off the bat we can see that 

    \begin{align*}f(x)=f(-x), \ \forall x\in\mathbb{R}\end{align*}

A so called even function. Thus f is not an injection and therefore not a bijection.

We can make it an injection by restricting the domain of f to non-negative reals on [0,\infty).

Done. So is 

    \begin{align*}f:[0,\infty)\rightarrow \mathbb{R}\end{align*}

a bijection? Still not, since there’s no real number that when squared, will yield a negative real number. So while we have tweaked f into an injection, it’s still not a surjection, and hence not a bijection. So we need to change the codomain too, by restricting it to all non-negative reals. 

    \begin{align*}f:[0,\infty)\rightarrow [0,\infty) \end{align*}

And finally, by restricting f to the upper right quadrant, we have our bijection.

What about g:\mathbb{R}\rightarrow\mathbb{R}? First notice that the derivative 

    \begin{align*}\frac{dg(x)}{dx} &= 3x^2 \\ &>0, \ \forall x \in\mathbb{R}\setminus \{0\}\end{align*}

So g is continuously differentiable, and strictly increasing in x over the real line (except at the origin where it switches from a concave function to a convex function). Hence for any two reals x_1<x_2, g(x_2) can’t possibly be  g(x_1), since g is strictly increasing on [x_1,x_2].

Hence all strictly monotonic functions are injections. Since g is also continuous over all of \mathbb{R},  (else g would not be differentiable over \mathbb{R}), all reals are hit as  x\rightarrow\pm\infty, and we conclude g is a bijection (actually a homeomorphism). In general, all strictly monotonic functions map bijectively from their domains to their images (not necessarily codomains!). Notice that g^{-1}(y) =\sqrt[3]{y} exists and is also a bijection.

You can use the exact same argument to show 

    \begin{align*}h &:\mathbb{R}\rightarrow (0,\infty): x\mapsto e^x \\ h^{-1} &:(0,\infty)\rightarrow \mathbb{R}:y\mapsto \ln y\end{align*}

 are bijections.


Suppose we wanted to get a sense of the size of a set F. In other words, we’d like to measure F. To keep things simple, we’ll just count the number of elements in F, and call this the size of F. This is what Cantor called the cardinality, or cardinal measure of F, which we’ll denote by |F|. Why choose cardinality to measure F? For one thing, its universally understandable – since everybody knows how to count. More importantly though, it’s unitless.

To see why, let F be a set of N different pigeons each capable of delivering 1 mailpiece. If E is a set of N different letters, then you are guaranteed that each piece of mail can be assigned to a unique pigeon. In other words, |F|=|E|. This is known as the Pigeonhole Principle.

The Axiom of Choice – A close relative of the pigeonhole principle


 |F|=|E| if and only if there is a bijection f  mapping E onto F. We then say F and E are equinumerous. 

In other words, for each letter l_i in the mailbag E, there is a unique pigeon p_i in the coop F such that p_i=f(l_i). The result is that all pigeons and all letters will be put  in a 1 to 1 correspondence.

Our bijection f

These finite cardinal measures, or finite cardinals are represented by elements of  \mathbb{N}. All the naturals represent here, are magnitudes of sets. No more, no less.


Notice that each of the l_i, p_i isn’t quite a number, and it’s not obvious that p_1<p_3 or l_7 <l_N.
Why should a particular pigeon come before any other in the set? Because it’s prettier? Because it’s faster? Actually, it’s simply because we’ve chosen to order it that way. The ordering information is encoded in the subscripts, called ordinals. Once again, the characters we use for these finite ordinal numbers are the natural numbers.

Unlike cardinals, which represent magnitudes of sets, ordinals are used to induce an order relation on sets. Keep in mind that notwithstanding cardinals’ and ordinals’ use of the same characters (arabic numerals),  they represent different things. In fact, the pronunciation is different. “One”, “Two”, “Three”, …, for cardinals, “First”, “Second”, “Third”,…, for ordinals. Cardinals are absolute magnitudes, which have meaning on their lonesome. Ordinals, on the other hand, are relative magnitudes, which can only be understood in the context of other ordinals.

Countable sets

We mentioned the fact that the cardinal measure is useful since it relies on the universal ability to count. Cantor wanted to consider the implications of making this concept more precise.


A set E is countable if and only if there exists a bijection f such that f(E)\subseteq\mathbb{N}.

Here \mathbb{N} is the set of all Natural Numbers, (or counting numbers). i.e.,


Hence, F will be countable, provided we can bijectively map it onto a set of naturals, possibly all of \mathbb{N}.
We have already seen that when a set F has finitely many elements, it can be bijectively mapped to any other set with the same finite cardinality. Hence, every finite set is countable.

To see why, let |F|=N for some finite number N. Then we can just take the set \widetilde{F}=\{1,2,3,\dots,N\} as our subset of \mathbb{N}. Since both sets have cardinality N, we can simply let our bijection be 

    \begin{align*}f:F\rightarrow\widetilde{F}\subset\mathbb{N} \text{ such that } f(p_i) = i, \ i =1,2,3,\dots, N \end{align*}

What about sets with infinitely many elements? Well, countability is defined in terms of \mathbb{N}. Yet,


Does that mean we can’t count the countable numbers? Referring back to the definition of a countable set, all we need to do is demonstrate the existence of a bijective map from \mathbb{N} to \mathbb{N} , and we are done.

This is easy enough, let’s define f:\mathbb{N}\rightarrow \mathbb{N}

    \begin{align*}f(i) &= i, \ \forall i\in\mathbb{N}\end{align*}

(A)bort , (R)etry: (F)ail: R

Let’s verify that f so defined forms a bijection. Consider any two natural numbers i,j\in\mathbb{N}. Then,

    \begin{align*}f(i)-f(j) &= i-j\end{align*}

Hence f(i)=f(j) if and only if i=j showing that f is injective. It is also a surjection, since for every natural i in the codomain \mathbb{N}, there is an \widetilde{i} in the domain \mathbb{N} such that f(\widetilde{i})=i. In particular, \widetilde{i}=i. Hence f produces unique, exhaustive pairings between \mathbb{N} and \mathbb{N}, and is therefore a bijection. 

We have shown that \mathbb{N} is indeed countable, as is any infinitely numerous set for which there is a bijection onto \mathbb{N}.

Cantor called all such sets countably infinite, at most countable, or simply countable.

The Powerset

A very important set that can be derived from any set F is its powerset denoted by \mathcal{P}(F). \mathcal{P}(F) is the set of all possible unique subsets of F.

To see what this means, suppose an entrepeneur wished to capitalize on the USPS’s termination of Saturday delivery, by opening a pigeon mail delivery service. To keep pigeon morale high, s/he decides that every pigeon will have the opportunity to make a delivery with a group of its peers, where the group is responsible for delivering a single mailpiece. In order to ensure the pigeons don’t have too much fun, each pigeon will still have to make a delivery on its own from time to time. Let’s say s/he anticipates a mailbag E of 3 letters every Saturday, and therefore invests in a set F of 3 pigeons for now. How many different kinds of groups can be formed from these pigeons? Well,

    \begin{align*}\mathcal{P}(F) = \{\emptyset, \{p_1\}, \{p_2\}, \{p_3\}, \{p_1,p_2\},\{p_1,p_3\}, \{p_2,p_3\}, \{p_1,p_2,p_3\} \}\end{align*}

Hence there are 7 possible groups, plus one empty set denoted by \emptyset = \{\}, giving 8 total.

Powersets exist for every set by the last of the Zermelo–Fraenkel (ZF) axioms – Axiom of Powerset (Axiom VIII).

Cardinality of powersets of finite sets

Notice that


In general,

    \begin{align*}|F| &<|\mathcal{P}(F)| \\ &=2^{|F|}, \ \forall F, |F|<\infty. \end{align*}

This isn’t obvious though. Just because it held in the case of 3 pigeons, doesn’t necessarily mean it should hold in the case of say, a^a.

There are several ways to convince yourself with minimal guilt that it will hold for every single finite cardinal.
The following is proof by induction that it will.
Please see oldrin.bataku’s intuitive explanation, that requires no combinatorial messiness.


\textbf{Proof: } \text{(by induction)}

Let’s try F with 4 pigeons. How many elements are in \mathcal{P}(F)? Let’s phrase this differently – how many unique groups of pigeons in F are there?

Recall that there  are 

    \begin{align*}\binom{N}{k}&:= \ _{N}C_k :=\frac{N!}{(N-k)!k!} \qquad \qquad (0\leq k\leq N)\end{align*}

(read: N choose k) ways of making a unique collection of k items from a set of N elements. Where the factorial operator,  is defined by,

    \begin{align*}N! & :=N(N-1)! \\ &=N(N-1)(N-2)(N-3)\cdot\dots\cdot 3\cdot 2 \cdot 1\cdot\underbrace{0!}_{:=1}\end{align*}

Thus, the total possible number of unique groups of 2 pigeons formed from the 4 is,

    \begin{align*} \binom{4}{2} &=\frac{4!}{2!2!} \\ &= 6 \end{align*}


    \begin{align*}|\mathcal{P}(F)| &= \binom{4}{0}+\binom{4}{1}+\binom{4}{2}+\binom{4}{3}+\binom{4}{4} \\ &= 16 \\ &= 2^4\end{align*}

Ok, we’re getting pretty confident that our hunch is right. We’ve already shown shown that groups of 3, and 4 pigeons have powersets with cardinality 2^3,2^4, respectively.

Now, assume that every set of N elements has powerset cardinality 2^N. If we can show this assumption implies every set of N+1 elements has powerset cardinality 2^{N+1}, we will have shown this holds for all N\geq 3. Note that had we started by demonstrating the (trivial) case of N=0 (the empty set \emptyset=\{\} (where clearly we can only make 2^0=1 choice – to choose nothing), we would then be showing this holds for all  N\geq 0. So now that we have already shown it, we might as well just say we’re showing it for all such non-negative N. This is called proof by induction.

Towards this end, let F have N+1 elements.

    \begin{align*}|\mathcal{P}(F)| &=\sum_{k=0}^{N+1} \binom{N+1}{k} \\ &= \binom{N+1}{0}+\sum_{k=1}^{N} \binom{N+1}{k} + \binom{N+1}{N+1} \\ &= \binom{N+1}{0}+\sum_{k=1}^{N} \left(\binom{N}{k}+\binom{N}{k-1}\right) + \binom{N+1}{N+1}  & (*) \\ &= \left(\binom{N}{0}+\sum_{k=1}^N\binom{N}{k}\right) +\left(1+\sum_{k=1}^N\binom{N}{k-1}\right) \\ \text{let } j = k-1 \\ &=2^N +\left(\binom{N}{N}+\sum_{j=0}^{N-1}\binom{N}{j}\right)  \\ &=2(2^N) \\ &=2^{N+1} \end{align*}

Which is what we wanted to show.

Note that in equality (*), we used Pascal’s identity,


Actually, as oldrin.bataku notes,  the above proof is just a special case of a more general result – the Binomial theorem, which requires just a tiny bit more work to show that,

    \begin{align*}(x+y)^n &=\sum_{k=0}^n \binom{n}{k}x^k y^{n-k}, &x,y\in\mathbb{R},n\in\mathbb{N}\end{align*}

Letting x=y=1 gives us 2^n.


Yet, what happens in the case where N=\infty? Well, the above formula yields,

    \begin{align*}|\mathcal{P}(F)|&=2^\infty \\ &=\infty \end{align*}

But this doesn’t give us our answer, since |\mathbb{N}|=\infty and as shown above it’s still countable. Furthermore using F, E, as our respective pigeon coop and mailbag, intuition would suggest that though there will be an infinite number of pigeon groups, if ever there is a group without a letter, we should still be able to just reach into the infinite mailbag and grab a letter, without taking one away from any other group, right? Let’s find out.

Cantor’s Theorem: transfinite cardinals

What follows is Cantor’s brilliant argument that the determination of whether an infinitely large set is countable is dependent upon just how “infinite” it is. It starts by considering the most basic powerset of a countably infinite set – \mathcal{P}(\mathbb{N}).
Let’s assume for the moment that we’ve found a bijection mapping \mathbb{N} onto its powerset.


Let’s construct a special kind of set D defined by,

    \begin{align*}D=\left\{i\in \mathbb{N}| i\notin f(i)\right\}\end{align*}

In words, D is the set of all natural numbers i, such that i is not in the set to which f maps it. That is, all naturals i, with the property that i\notin f(i). To illustrate, let’s say our bijection of the first 5 naturals looks like this,


After these 5 maps, which elements will we have added to D? Since 1 isn’t in f(1)1 must be in D. Since 2 is in f(2)2\notin D. Similarly, 3,5\in D whereas 4\notin D. Hence, so far, D=\{1,3,5\}. Continuing in this fashion, numbers will be added to D, so long as they aren’t in their map under f.

Let’s stop for a moment and consider what we have. D is just a set of natural numbers, and therefore being a subset of \mathbb{N}, must be an element of the powerset \mathcal{P}(\mathbb{N}). Since D\in\mathcal{P}(\mathbb{N}), and we have assumed that f, being bijective, matches every natural number with a unique element of the powerset, there must be some natural number j such that f(j)=D.

Let’s see what else we can say. One thing that we can be certain of is j is either in D or not in D.

So is j\in D? Well, if it were, that would mean that j gets mapped to a set containing it, and by the way we defined D, this is impossible. Hence j is not in D.

But if j\notin D, then j is mapped by f to a set which doesn’t contain it, meaning that j must be in D. Clearly j being both in and not in D is absurd, and we can only conclude our assumption of f being a bijection is clearly wrong. Hence there is no bijective function mapping \mathbb{N} into its powerset. We can only conclude that even though \mathbb{N} is itself countably infinite, its powerset just has too many elements to be countable.

Ok, so far we’ve shown that

    \begin{align*}|\mathbb{N}| &< |\mathcal{P}(\mathbb{N} )| \end{align*}


    \begin{align*} |F| &< |\mathcal{P}(F)|\end{align*}

for any countable set  F.

What about uncountable sets? Then the same argument still applies. Let X be an arbitrary set, and f:X\rightarrow \mathcal{P}(X) be a purported bijection. Then we once again construct our special set 

    \begin{align*}D=\{x\in X|x\notin f(x)\}.\end{align*}

and proceed to show, as we did in the countable case, that this generates the contradiction, 

    \begin{align*}x\in D \Longleftrightarrow x \notin D.\end{align*}

Hence for arbitrary set F


The ב ,א, and ω numbers

Under revision: Thx Fourmisain

Cantor realized that even though both countable sets and their powersets had infinite cardinality, these cardinalities represented different magnitudes of infinity, the latter being indescribably larger than the former. For this reason, he created two new classes of numbers called the transfinite numbers.

Transfinite ordinals the class of which is denoted by ON, are used to encode order information upon infinitely numerous sets. The first and smallest such number is denoted by \omega. There are a variety of characterizations of such numbers, (see Von Neumann’s construction), but we can loosely consider \omega to be the first “spot” at the end of a countably infinite “list”.

Hence, the next few transfinite ordinals will be 


until we can no longer count these infinite numbers, and we “arrive” at the first uncountable transfinite ordinal, \omega_1. Keep in mind that transfinite numbers do not usually behave the same way finite numbers do under familiar operations of addition and multiplication. For example, for any finite ordinal number c,

    \begin{align*}\omega + c &\neq c+\omega \\ &=\omega \\ &= c\cdot \omega \\ &\neq \omega \cdot c \\ &= \sum_{i=1}^c \omega\end{align*}

See here if interested why. To give a visual, from wiki,

Start at 0, and move clockwise

The transfinite cardinals, also known as the aleph numbers, are denoted by,

    \begin{align*}\aleph_{\alpha}, \qquad \aleph_{\alpha} < \aleph_{\alpha+1}\end{align*}

For every ordinal \alpha. Cantor chose \aleph  from the Hebrew words אין סוף (Ayn Sof), which translates as “without end”, and is one of the names used to describe God in the Zohar.

Cantor discerned between these infinities by replacing infinity with \aleph_0 for the cardinality of countably infinite sets, and \aleph_1 for uncountably infinite sets. Hence, Cantor’s Theorem can be succinctly stated as,


In the next post, we’ll explore the nature of two extremely important infinite sets: the set of all rational numbers (\mathbb{Q}), and the real numbers (\mathbb{R}). We’ll also take a look at another of Cantor’s brilliant constructs – the Cantor set.

7 thoughts on “Different kinds of ∞, and the א numbers

  1. Hey, Dovi, nice presentation. Check the section titled ‘Countable sets’. you say that you will demonstrate the existence of a bijective map from N to N. It seems to me that you postulate the existence of said bijective map in lieu and place of demonstrating it. Maybe I am wrong.

  2. Nice post, though you made a mistake in giving the power set of F. It should read

    P(F) = {{}, {p_1}, {p_2}, {p_3}, {p_1, p_2}, {p_1, p_3}, {p_2, p_3}, {p_1, p_2, p_3}}

    rather than

    P(F) = {{}, p_1, p_2, p_3, {p_1, p_2}, {p_1, p_3}, {p_2, p_3}, {p_1, p_2, p_3}}

  3. by the way, there is a simpler ‘proof’ of the cardinality of the power set by considering that for each subset there are two possibilities for each element — membership or lack thereof. It follows that for a set of cardinality n we have n such possibilities per element and so 2^n possible subsets corresponding to the different ways we can include or ‘leave out’ elements per subset. I usually find this more intuitive to explain than the inductive argument. You might also wish to appeal to the binomial theorem in noting nC0 + nC1 + nC2 + … + nCn = (1 + 1)^n

  4. Thank you! I made the changes. I was actually considering proving the binomial theorem, rather than the special case we were dealing with. Will do that when I get home later.

  5. This is a good introductory to Cantor’s work, although I felt, that the the end of your blogpost isn’t that fleshed out and looked into.

    First of all, I was wondering, why you are restricting Cantor’s Theorem only to the set of natural numbers, when it applies to every set, especially every infinite one: |X| < |P(X)|, for every set X. That alone tells you, that there has to be an infinite amount of infinite cardinals (the cardinals omega, |P(omega)|, |P(P(omega))| are sometimes called beth numbers). You don't even need to change your proof, as you didn't use any specific property of the set of natural numbers.

    Then there is an error in your last paragraph.
    You are stating, that Aleph_0 < 2^{Aleph_0} = Aleph_1.
    Of course Aleph_0 < Aleph_1, by definition; but the statement 2^{Aleph_0} = Aleph_1 (or Beth_1 = Aleph_1) is called the continuum hypothesis and can neither be proven nor disproven (in ZFC), which in itself is one of the most interesting things about mathematics or mathematical logic, tying tightly together with Gödel's incompleteness theorems. It probably also was a driving force of Cantor's mental demise, constantly trying to prove the unprovable.
    The only thing one can be sure of, is that 2^{Aleph_0} is at least as big as Aleph_1 and that 2^{Aleph_0} is the cardinality of the set of real numbers, although the latter isn't trivial.

  6. Hmm, I actually intended for the Cantor’s Theorem section to generalize the inequality |F|<|\mathcal{P}(F)| developed for finite sets in the previous section, but somehow stopped short of showing it holds beyond countable sets such as \mathbb{N}.

    You’re absolutely right about 2^{\aleph _0}=\aleph _1 unprovability/undisprovability in ZFC, and I’ve made the change. I should probably demonstrate at least why |\mathbb{R}|> \aleph _0, and |\mathbb{R}|=2^{\aleph _0}, in the next post.
    Thank you for pointing out that mistake!

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>