Doubling a Cube Using Origami

There are three problems in classical geometry that for many centuries were open: trisecting an angle, doubling the cube, and squaring the circle. These are all now known to be impossible using "classical methods". What do I mean by "classical methods"? I mean we start with a Euclidean plane, so flat, and extending infinitely far in all directions. We are given two tools: an unmarked compass and an unmarked straightedge.

The compass allows us to draw circles; if we have a point and a length, we can draw a circle whose center is at that point and whose radius is that length. But the compass is unmarked, so we can't actually just pick a length arbitrarily. We have to first construct or be given a line segment of that length.

The straightedge allows us to draw line segments. In particular, given two points, we can draw a line segment that joins those two points, and extend that line segment as far as we want. We can also find and draw lines that are tangent to a circle: if we have a point outside the circle we can draw a line that goes through the point and is tangent to the circle, and if we have a point on the circle we can draw the tangent line to the circle through that point. But the straightedge is unmarked, so again we can't pick the lengths of our line segments beforehand; we need points to cordon off any particular lengths we want to use.

The problem of doubling a cube is to take a line segment and construct another line segment whose length is the length of the first segment times the cube root of 2. The problem of trisecting an angle is to take two line segments that intersect at a point with some angle, and construct a third line segment that also goes through the point and whose angle with one of the first two line segments is 1/3 of the original angle. The problem of squaring the circle is to take a circle and construct a square whose area is exactly that contained within the circle.
All of these can be done if we allow calculus, but the question was how much could be done with just an unmarked compass and an unmarked straightedge?

Let's look at the question of doubling a cube. It is really a question of constructing two line segments where the ratio of their lengths is the cube root of 2. So we can ask: can we construct a cube root using an unmarked compass and an unmarked straightedge? Well, actually, first let's ask a simpler question: can we construct a square root using an unmarked compass and an unmarked straightedge? In particular, suppose that we have a line segment of length L; can we construct another line segment of length L√n for any natural number n? The answer turns out to be yes. There's a fairly simple construction, although that doesn't mean it's easy to stumble across.

Spoiler: Hint
Set up two similar right triangles that share a leg; what does that tell you about the ratios of the leg lengths?

So we can take square roots. Indeed, the quadratic formula tells us that since we can take square roots, we can in fact solve general quadratic equations with integer coefficients, at least once you figure out how to formulate them in terms of line segments. Since you can take square roots, you can also take fourth roots, and eighth roots, and sixteenth roots, by repeatedly taking square roots. Now consider all the numbers you get by starting with the natural numbers and including everything you can get by repeatedly adding, subtracting, multiplying, dividing, and taking square roots. We end up with a field that we call the "constructible numbers". It turns out that these numbers are precisely the length ratios that you can get via compass and straightedge starting from a single line segment.

To briefly give some intuition: compasses and straightedges give you lines and circles. From an algebraic perspective, a line is defined by a degree 1 polynomial: y = mx + b. So you can do linear stuff with lines.
A circle is defined by a degree 2 polynomial: x^{2} + y^{2} = r^{2}. So you can solve quadratic stuff since you have access to circles. Thus you might expect that the things you can construct with lines and circles are exactly the things you can get via basic arithmetic and solving quadratic equations.

Now the question becomes: is the cube root of 2 a constructible number? I don't remember if I talked about Galois theory in this thread yet, so I'll just say it outright: no, the cube root of 2 is not constructible. Indeed, if an integer is not the cube of another integer, then its cube root is not constructible. So you can't double a cube using just a compass and straightedge. In fact, trisecting an angle is also a form of taking a cube root; the angle addition formulas in trigonometry and/or De Moivre's formula will show that being able to trisect an arbitrary angle is equivalent to being able to take the cube root of an arbitrary integer. Just as there are some integers whose cube roots are easy, like 8 or 27, there are some angles that are easy to trisect; but most of the time you're out of luck.

However, with origami, you can take the cube root of a length! First, what are the allowed moves in origami? You can make a fold along a line that joins two specified points; you can make a fold that takes one specified point and puts it in the same place as another specified point; you can make a fold that takes one specified line and lays it along another specified line; and given two points and two lines, you can make a fold that puts the first point on the first line and the second point on the second line, provided such a fold exists at all. A little work shows that anything you can construct with a compass and straightedge can be achieved via folding, provided you have a big enough piece of paper. But you can also take cube roots! Take two lines that cross at right angles and treat them as coordinate axes; i.e.
you specify some length as 1, and you specify one line as the x-axis and the other as the y-axis. Now mark two points, M at (0,-1) and N at (-n, 0). Fold so that M ends up on the line y = 1 and N ends up on the line x = n. The x-intercept of the fold is at ^{3}√n.

How does this work? It essentially extends the reasoning behind the construction for square roots.

Spoiler: Hint
The fold allows you to set up 3 similar right triangles so that the first shares a leg with the second, and the second shares its other leg with the third. Now what can you say about the ratios of leg lengths?

So you can double a cube using origami, and you can trisect an angle using origami. Just as you can solve quadratics using a compass and straightedge, you can solve degree 4 polynomials with integer coefficients using origami. However, squaring a circle requires constructing a length ratio that is not the solution to any polynomial with integer coefficients, i.e. a transcendental number, and hence neither compass and straightedge nor origami will be able to square a circle.
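If you'd rather trust arithmetic than paper, the claim is easy to sanity-check numerically. This is a sketch, not the origami itself: I'm assuming the crease of that fold is the line y = -c·x + c^{2} with c = n^{1/3} (which is what you get by describing the fold as a common tangent to the two parabolas the fold traces out), and a fold is just a reflection across the crease:

```python
import math

def reflect(point, slope, intercept):
    """Reflect a point across the line y = slope*x + intercept."""
    px, py = point
    a, b, c0 = slope, -1.0, intercept          # the line as a*x + b*y + c0 = 0
    d = (a * px + b * py + c0) / (a * a + b * b)
    return (px - 2 * a * d, py - 2 * b * d)

n = 2.0
c = n ** (1.0 / 3.0)          # conjectured x-intercept: the cube root of n
slope, yint = -c, c * c       # candidate crease: y = -c*x + c^2

# Folding along the crease reflects across it. M = (0, -1) should land on
# the line y = 1, and N = (-n, 0) should land on the line x = n.
mx, my = reflect((0.0, -1.0), slope, yint)
nx, ny = reflect((-n, 0.0), slope, yint)
assert abs(my - 1.0) < 1e-9
assert abs(nx - n) < 1e-9

# And the crease's x-intercept really is the cube root of n.
assert abs(-yint / slope - c) < 1e-9
```

Swapping in other values of n works the same way, which is a decent numerical hint that the crease with x-intercept ^{3}√n is exactly the fold that puts M on y = 1 and N on x = n.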
Quandles

Mathematicians like to come up with bizarre objects to play with, because making your own toys can be fun. One such object is called a "quandle", which is a delightful word coined by David Joyce.

Consider a group, G, and the conjugation operation: x ► y = xyx^{-1}. Suppose we are only allowed to talk about ►, and not regular group multiplication. Then what can we say? Well, we can say that x ► x = xxx^{-1} = x, and that given x and y, there is a unique element z such that x ► z = y (namely z = x^{-1}yx). Finally, we can write a weird-looking law that says:

x ► (y ► z) = (x ► y) ► (x ► z)

Check this one! It essentially follows because conjugation is always a group automorphism. It also indicates that ► is nonassociative, but in an interesting way. Anyway, the point is that we can examine a group using only conjugation and not group multiplication, and we get a thing where left "multiplication" is an automorphism of the thing, unlike regular group multiplication. And as mathematicians often do, we take the laws that we got from a particular object, and treat those laws as axioms while ignoring the original object.

A quandle is a set that has two operations, denoted ► and ◄, that obey the following laws:

1) a ► a = a = a ◄ a
2) a ► (b ◄ a) = b = (a ► b) ◄ a
3l) a ► (b ► c) = (a ► b) ► (a ► c)
3r) (a ◄ b) ◄ c = (a ◄ c) ◄ (b ◄ c)

In our group example, x ◄ y = y^{-1}xy. Another example is to take a group and now define x ► y as xy^{-1}x. This one is also weird, because the map y ↦ xy^{-1}x is not a group automorphism. And yet, it still obeys the quandle laws.

So why do we care about quandles in general? There are two cases that might be interesting outside of "eh, here's a fun bit of nonsense".

Consider Euclidean space. If you do a reflection of the entire space across some plane, and then do a reflection across a different plane, what you end up with cannot be replicated using only a single reflection.
If you do three reflections in a row, then sometimes the result can be replicated by a single reflection, but sometimes not. So how can you take reflections across two different planes and end up with a reflection across a third plane? Simple: reflect across the first plane, then across the second, and then across the first again. So now we have a way to compose reflections to get reflections. And this is quandle-y, since reflections are elements of the group of symmetries of Euclidean space, and the means of composing reflections that we just defined is conjugation in that group, at least when restricted to reflections. The interesting fact is that the result of this conjugation is again a reflection.

This can be generalized to other types of spaces where we have reflection-like symmetries, like a sphere, or even things like the set of integers modulo n, where a "reflection" looks like the map

r_{j}(i) = 2j - i mod n

This leads to the study of symmetric spaces, which are really nice geometric objects studied by both algebraic and differential geometers, for various reasons. A lot of nice geometric objects, like spheres and projective spaces and Grassmannians, are symmetric spaces. Indeed, for the octonions, the octonionic projective plane is probably most easily described as a symmetric space. The structure that we've been calling a "quandle" was originally discovered by Mituhisa Takasaki when he was trying to come up with an algebraic structure for the reflections of a symmetric space; he called the structure a "kei".

Consider a knot, by which we mean a loop of string living in three-dimensional space. In my post on knots I talked about knot diagrams, where we place the knot on a plane and mark down which parts of the knot cross over other parts. We consider "oriented" knots, by which we mean: start at some point on the knot, pick a direction along the string to travel in, and then follow the entire knot in that direction.
So now each bit of the string has a direction "forward" and "backward". Consider a point where the string crosses over itself. Let's rotate everything so that the two bits of the string involved go from up to down. Now we label bits of the string. For the strand of string going over the crossing, we give the same label, say, "y", to both the part of that strand going into the crossing and coming out of it. For the strand that goes below, if the incoming part is labeled "x", we label the outgoing part either "y ► x" if the over-crossing strand goes from upper left to lower right, or "x ◄ y" if it goes the other way.

Here's a diagram, where they write "a ►_{+} b" for what I'm writing as "b ► a" and write "a ►_{-} b" for what I'm writing as "a ◄ b", and also use white triangles instead of black*. Unfortunately, the exact notation for quandles hasn't been nailed down. I have my reasons for writing it the way I do, but not everyone agrees with me. Anyway, visually it looks like this:

There are three Reidemeister moves that tell us when two knot diagrams are really the same knot. And they in fact give us precisely the quandle laws:

Move I: I.e. x ► x = x. The mirror image of move I gives us the other half of law 1.

Move II: This move and its mirror image give us, as you can see, the second quandle law.

Move III: A little work shows that this gives us half of the third quandle law, and its mirror image gives us the other half.

Since the quandle laws precisely match the Reidemeister moves, this tells us that our rule for assigning labels to strands gives us a way to associate a quandle to each knot that doesn't depend on the diagram used to represent the knot. Hence this quandle is a knot invariant. Indeed, the term "quandle" was created by David Joyce when he was first examining knot invariants. So quandles aren't entirely just "let's do something funny to groups".

* I would also be using white triangles if my CharMap program had them available.
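If you want to convince yourself that the conjugation example really does satisfy all four quandle laws, a brute-force check over a small group works fine. Here's a sketch using S3, with permutations represented as tuples (the function names are mine, not standard):

```python
from itertools import permutations

# Elements of S3 as permutations of (0, 1, 2): p[i] is where p sends i.
S3 = list(permutations(range(3)))

def mul(p, q):
    """Composition: (p * q)(i) = p(q(i))."""
    return tuple(p[q[i]] for i in range(3))

def inv(p):
    """Inverse permutation."""
    out = [0, 0, 0]
    for i, j in enumerate(p):
        out[j] = i
    return tuple(out)

def conj(x, y):      # x (black-right-triangle) y  =  x y x^{-1}
    return mul(mul(x, y), inv(x))

def conj_r(x, y):    # x (black-left-triangle) y  =  y^{-1} x y
    return mul(mul(inv(y), x), y)

# Check every quandle law on every combination of elements.
for x in S3:
    assert conj(x, x) == x and conj_r(x, x) == x                    # law 1
    for y in S3:
        assert conj(x, conj_r(y, x)) == y                           # law 2, half
        assert conj_r(conj(x, y), x) == y                           # law 2, other half
        for z in S3:
            assert conj(x, conj(y, z)) == conj(conj(x, y), conj(x, z))            # 3l
            assert conj_r(conj_r(x, y), z) == conj_r(conj_r(x, z), conj_r(y, z))  # 3r
```

The same triple loop with `conj` swapped for `lambda x, y: mul(mul(x, inv(y)), x)` checks the x ► y = xy^{-1}x example too.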
Laver Tables

So slightly tangent to my last post, let's talk about that law 3l:

a ► (b ► c) = (a ► b) ► (a ► c)

If we ignore the rest of the quandle laws and only ask for this one, we get a bunch more structures than just quandles. This law is sometimes called "left self-distributivity", because it looks like the law of distributivity, which we are most familiar with in terms of multiplication distributing over addition, but instead of one operation distributing over another, here it's an operation that distributes over itself. And here we're only asking for distribution on one side.

The Laver tables are the most famous of the left self-distributive structures that aren't quandles. We pick a positive number n and define ► on the set {1, ..., 2^{n}} by the laws

p ► 1 = p + 1 mod 2^{n}
p ► (q ► r) = (p ► q) ► (p ► r)

where when taking mod we write 2^{n} instead of 0. Perhaps surprisingly, this completely defines ► on pairs of numbers between 1 and 2^{n} inclusive, and indeed the entire table can be computed mechanically. For instance, if n is 2 we get a "multiplication table"

    1 2 3 4
1 | 2 4 2 4
2 | 3 4 3 4
3 | 4 4 4 4
4 | 1 2 3 4

The entries in the first row are interesting: 2 4 2 4. Well, on their own they're not. But let's look at larger n:

n = 2: 2 4 2 4
n = 3: 2 4 6 8 2 4 6 8
n = 4: 2 12 14 16 2 12 14 16 2 12 14 16 2 12 14 16

The entries repeat, with a cycle smaller than 2^{n}, but how much smaller? We call the length of the cycle the period of the table. The periods for the first few tables, starting with n = 1, are

1, 2, 4, 4, 8, 8, 8, 8, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, ...

One question you might ask is "does the period ever exceed 16?" And if the answer is yes, you can ask "does the period get arbitrarily large?"
Randall Dougherty proved that the smallest n for which the period can possibly be 32 is A(9, A(8, A(8, 255))), where A is the Ackermann function; if you're familiar with the Ackermann function, or at least aware of its general properties, you will understand that this is a ridiculously large number. In fact, the only known proof that the period does in fact get arbitrarily large, and one of the reasons that Richard Laver came up with these tables, requires the existence of a particular type of Large Cardinal Axiom, which I have previously mentioned as being extra axioms that one tacks on to the usual ZFC axioms if regular set theory is insufficient. In particular, you need a rank-into-rank cardinal; these are among the largest cardinals not known to be inconsistent with ZFC. So in other words, determining whether or not these rather simple, mechanically-defined, finite tables have a certain growth property requires stepping far outside of what ZFC alone offers.
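"Computed mechanically" deserves a demonstration. Here's a minimal sketch in Python: since q = (q-1) ► 1, the self-distributive law forces p ► q = (p ► (q-1)) ► (p ► 1), and that recurrence plus memoization is the whole algorithm (function names are mine):

```python
import sys
from functools import lru_cache

def laver_table(n):
    """The 2^n-by-2^n Laver table. Only p > 1 = p + 1 (mod 2^n, writing 2^n
    for 0) is given; everything else is forced by left self-distributivity,
    via p > q = (p > (q-1)) > (p > 1), because q = (q-1) > 1."""
    size = 2 ** n
    sys.setrecursionlimit(100000)

    @lru_cache(maxsize=None)
    def op(p, q):
        if q == 1:
            return p % size + 1          # p + 1, with 2^n wrapping to 1
        return op(op(p, q - 1), op(p, 1))

    return [[op(p, q) for q in range(1, size + 1)] for p in range(1, size + 1)]

def first_row_period(table):
    """Length of the repeating cycle in row 1."""
    row = table[0]
    for k in range(1, len(row) + 1):
        if len(row) % k == 0 and all(row[i] == row[i % k] for i in range(len(row))):
            return k

# The n = 2 table from the post, and the start of the period sequence.
assert laver_table(2) == [[2, 4, 2, 4], [3, 4, 3, 4], [4, 4, 4, 4], [1, 2, 3, 4]]
assert [first_row_period(laver_table(n)) for n in range(1, 5)] == [1, 2, 4, 4]
```

Small n values run instantly; the point of the Dougherty result above is that watching this program will never tell you whether the period passes 16.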
okay what the actual fuck is going on with the Riemann Hypothesis rn? Are people freaking out over nothing again?
Apparently, some guy allegedly came up with a proof for it, but he's also known for kinda talking out of his ass for attention sometimes. That's at least what I got from vague discussions of it.
Sir Michael Atiyah, age 89, former president of the Royal Society, Fields Medalist, who like 20 or 30 years ago was one of the biggest names in geometry, co-inventor of topological K-theory and the Atiyah-Singer Index Theorem among other really important accomplishments, claimed a few weeks ago that he had a proof of the Riemann Hypothesis and gave his first public presentation about it today, or yesterday, or whatever.

I don't think any serious mathematicians gave that claim any credence even before the presentation, because Atiyah's work has been declining quite badly for the past decade or so; for instance, in 2016 he also claimed to have a solution to an open problem about six-dimensional spheres that turned out to be full of unfixable holes. Also, the Riemann Hypothesis is a little far from his usual wheelhouse; he does algebraic topology, not analytic number theory. But he's also Sir Michael Atiyah, who earned that knighthood for his contributions to mathematics, so most mathematicians of any clout just kind of politely don't say anything too loudly, because the guy is nearly 90 years old and his best years, very good years at that, are definitely behind him.

On the other hand, non-mathematicians hyped it up because Atiyah is known to be a Name among mathematicians and the Riemann Hypothesis is known to be an Important Problem among mathematicians, so the media, for some value of "the media", totally blew everything out of proportion, because that's what they're paid to do. The end result is that Atiyah's claimed proof doesn't seem to work at all, as expected.
What I've heard is that the function integral to his proof, if it has all the properties he claims it has, is a constant.
No, I read that that particular line of reasoning doesn't apply cause the function isn't actually analytic.
Grover's Algorithm

After some learning, I think I appreciate Grover's algorithm a lot more. Especially now that I really understand why it's inverting a function and not searching a database, and also because I have a nice geometric idea of why it works. The particulars are as follows:

Suppose we have a function f whose domain is the set of binary strings of length N, and for some particular binary string x we have f(x) = 1, while for all other binary strings y we have f(y) = 0. The hope is to find x in the complete absence of any information about f other than the ability to evaluate it on individual binary strings. The classical procedure in this case would be to simply try strings until we hit the right one, because in the absence of other information we can't really do anything smarter. That will take, on average, trying half of the total number of binary strings, i.e. 2^{N-1} of them. So O(2^{N}) operations.

In the quantum case, however, we have some interesting things we can do. Indeed, we get a quadratic speedup over the classical case. We start with not a set of binary strings, but a vector space of dimension 2^{N} with a basis given by vectors corresponding to our binary strings; we'll denote the vector corresponding to a string w by |w>. Now we also assume that we have a measuring device that takes a vector that is a linear combination of basis vectors and spits out a binary string; the probability of getting any particular binary string is proportional to the squared magnitude of the coefficient of the corresponding basis vector in the linear combination.

Our quantum computer likes taking in vectors and spitting out vectors, so when we say "compute f", what we end up with is more along the lines of an operator U_{f} taking a vector |w> and spitting out (-1)^{f(w)}|w>. So in particular,

U_{f}|x> = -|x>

while for all other y in our set of binary strings,

U_{f}|y> = |y>

Notably, U_{f} acts as a reflection.
Grover's algorithm then builds another reflection called the Grover diffusion operator U_{s}; it takes the uniform superposition |s> of all of the basis states (i.e. a linear combination of all of the basis vectors with coefficient 1/2^{N/2}), and reflects that, while keeping everything orthogonal to that vector fixed. (I think the usual diffusion operator is actually minus this operator, but it's still reflection-y). Now we alternate applying U_{f} and U_{s}. What happens when you apply two reflections one after the other? You get a rotation in the plane given by the two reflected vectors. So if you start with the uniform superposition |s> and apply U_{s}U_{f} over and over, you'll end up with a superposition that's pointed more toward |x>. But be careful, because you can actually end up going past |x> if you rotate too far. So how far should you rotate? Well, |s> and |x> aren't quite orthogonal to each other, so U_{s}U_{f} doesn't rotate by pi radians. Indeed, you can do some trig and figure out the angle; it's 2arcsin(1/2^{N/2}) which is approximately 2/2^{N/2} because for small angles, arcsin is approximately the identity function. So to get our vector as close to |x> as possible, we should apply our rotation about 2^{N/2}pi/4 times, and then measure what binary string corresponds to our state. In other words, that's O(2^{N/2}) operations, as opposed to the classical case of O(2^{N}) operations. Hence the quadratic speedup. Furthermore, Grover's algorithm is optimal, in that if your only information about f is the operator U_{f} then you need to apply it at least O(2^{N/2}) times. The tricky part about Grover's algorithm, and why it's better to view it as inverting a function rather than searching a database, is actually building |s> and U_{s}. Given N qubits, we can build |s> using a Hadamard gate and U_{s} from there, but for more general sets it's actually quite hard to set up anything similar. 
Furthermore, even getting the data of a classical database into a quantum computer takes at least O(size of database) time, so you can't really get any real speedups in that case.
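The whole two-reflections-make-a-rotation story is easy to simulate classically for small N. Here's a minimal sketch (pure Python, no quantum library; the names are mine) that applies U_{f} then the diffusion step the recommended ~(pi/4)·2^{N/2} times and checks that the target string dominates the measurement probabilities:

```python
import math

def grover(num_qubits, target):
    """Simulate Grover's algorithm with a phase oracle marking `target`.
    State vector over 2**num_qubits basis states, as plain floats."""
    dim = 2 ** num_qubits
    amps = [1.0 / math.sqrt(dim)] * dim          # uniform superposition |s>

    iterations = round(math.pi / 4 * math.sqrt(dim))
    for _ in range(iterations):
        amps[target] = -amps[target]             # U_f: flip the sign on |x>
        mean = sum(amps) / dim                   # diffusion: reflect about |s>,
        amps = [2 * mean - a for a in amps]      # sending each a to 2<a> - a

    return [a * a for a in amps]                 # measurement probabilities

probs = grover(3, target=5)
assert max(range(len(probs)), key=probs.__getitem__) == 5
assert probs[5] > 0.9    # about 0.95 after the 2 iterations used for N = 3
```

Bumping `num_qubits` up shows the rotation picture directly: run one iteration too many and the probability of the target starts dropping again, exactly the "rotating past |x>" warning above.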
Borromean Rings

Succumbing to his own fears and the false promises of Sauron, Boromir attacked -- wait. That's not quite right.

The Borromean rings are an arrangement of three loops such that any two of them are unlinked, but the full set cannot be separated. If you were to, say, delete the blue ring, the red and green rings become separable. Similarly, if you delete the red ring, the green and blue rings become separable, and if you delete the green ring, the red and blue rings become separable. But taken all together, each ring blocks the other two rings from coming apart.

Note: despite what the image seems to imply, you cannot actually make a set of Borromean rings out of perfect circles. For instance, suppose that the red and green rings are perfect circles. Then the blue ring somehow has to go over the red circle but below the green one, despite the green one being below the red one.

The Borromean rings are an instance of a broader set of arrangements called Brunnian links, which are sets of loops such that deleting any one loop allows the entire set to be pulled apart into unlinked loops. The Borromean rings are the simplest nontrivial case. You can make more complicated Brunnian links by adding in more loops or by arranging them in a more complicated fashion. For instance, in the three-loop case, you can make a Brunnian link with more crossings, if you make sure that the blue loop always crosses above the red loop and below the green loop, which always crosses below the red loop. This always-above and always-below behavior ensures that the loops are pairwise unlinked. The trickier part is to make sure that they don't just come apart anyway.

One way to get Brunnian links with any number of loops is to use the "Rubber Band" setup, pictured with 10 links here:

But you can probably imagine adjusting this to any number of loops.

A similar and perhaps more familiar object is the standard three-strand braid.
If you nail down the ends of the strands and then delete any one of them, the remaining two become unbraided. This is because each strand always crosses over one of the other strands and always crosses below the third, which is the same property that allows the Borromean rings to unlink. And indeed, if you take a standard braid and glue the beginning of each strand to its end, you'd get a 3-loop Brunnian link. Again in this picture, the blue strand always goes over the red one, so deleting the black strand leaves the blue and red able to separate. There is, as one might expect, a bunch of deep mathematics related to these things, in terms of how to identify and classify these objects. Unfortunately, it takes us into the realm of algebraic topology, which I am less than stellar both in terms of doing and in terms of explaining. Images stolen from Wikipedia, except for the Rubber Band picture, stolen from The Knot Atlas
R^{4} is smooth as hell

Suppose we have R^{2}. And we have an object that claims to be R^{2}. How can we verify this claim? Well, we can't, because that claim isn't quite well-defined. But we can ask if the object is equivalent to R^{2} for various notions of equivalent.

So we can ask: is there a bijection from the points of our object to the points of R^{2}? So we can at least say that our object has the same size as R^{2}.

Or we can ask about the topological structure: do we have a homeomorphism, a continuous bijection whose inverse is also continuous? No rips, tears, jumps, missing points, weird interleavings, etc. So we can at least say that our object has the same topology as R^{2}.

We can ask about smooth structure*: we come up with a way to talk about derivatives of functions defined on our object, and derivatives of derivatives, and so on, and we can ask for a diffeomorphism, a homeomorphism that is smooth (infinitely differentiable) according to our notion of derivatives and whose inverse is also smooth.

It turns out that if an object is homeomorphic to R^{2}, and we put in a smooth structure that is compatible with the topology, then the object is also diffeomorphic to R^{2}. So we've really only talked about two distinct ways to possibly be equivalent to R^{2}, rather than 3. Similarly for R^{1}, and R^{3}, and R^{5}, and R^{6}, and R^{n} for all positive integer values of n... other than n = 4.

R^{4}, or at least "standard" R^{4}, has a particular topological structure where open sets can be built out of open sets of the form

{(w,x,y,z) | (w-w_{0})^{2} + (x-x_{0})^{2} + (y-y_{0})^{2} + (z-z_{0})^{2} < r^{2}}

and a particular smooth structure, which is the one you expect from multivariable calculus. But there are other smooth structures you can stick on R^{4}. We call an object that is homeomorphic to but not diffeomorphic to the standard R^{4} an exotic R^{4}.
Exotic R^{4}s fall into roughly two types: those for which there is a smooth embedding into the standard R^{4}, and those for which there isn't. The ones that can embed are called small, and the ones that can't embed are called large. There is in fact a largest exotic R^{4} into which all other R^{4}s can embed. But wait, you might say, can't R^{4} always properly embed into itself pretty trivially? Well, yes, if we only insist on continuity. If we insist on smoothness, then a bunch of stuff goes wrong, which is why we have larger R^{4}s that can't embed smoothly into smaller R^{4}s.

There is a continuum of exotic R^{4}s, and indeed one can create a smoothly-varying family of R^{4}s, one for each point on the real line, such that the family glues together like a stack of pancakes to make an R^{5}. And it turns out that no matter which family you pick, even if each R^{4} involved is exotic, the resulting R^{5} is the standard R^{5}, because there are no exotic R^{5}s.

This oddity at n = 4 happens because of surfaces "knotting" together when you're trying to move from one structure to another. In 3 or fewer dimensions you can't really get surface knots, while in 5 and higher dimensions you can always shove things along extra dimensions to undo anything that crops up. In 4 dimensions, however, you have enough room to get knots but not enough room to undo them. Welp.

I mentioned in the post on the Poincare Conjecture/Perelman Theorem that the results we have say that being homotopy equivalent to a 2-sphere means diffeomorphic to a 2-sphere, and homotopy equivalent to a 3-sphere means diffeomorphic to a 3-sphere. Is this true in general? Not for n = 7! We know that exotic 7-spheres exist! We can take an object that is homeomorphic to a 7-sphere and stick on a smooth structure that is incompatible with the standard smooth structure that we would get from viewing the 7-sphere as a subset of R^{8}. In fact, there are 28 distinct smooth structures (up to diffeomorphism).
They form a cyclic group, where the group operation is "cut a 7-disk out of each sphere and glue the results together along the boundaries of the holes you just created". But note: if you take an exotic 7-sphere and remove one point, then you get something that is homeomorphic to R^{7}, the same way that if you take a balloon and puncture it then all the air rushes out and you're left with a rubber sheet. But there are no exotic R^{7}s, so that means that all exotic 7-spheres are the standard R^{7} plus a point, so all the exoticness can be located at that point. So the smooth Poincare Conjecture is false at least in some cases. Note that there are no exotic 1-, 2-, 3-, 5-, 6-, or 12-spheres, but there are several other dimensions where we know that there exist exotic spheres. We don't know at all in the case of n = 4! There might be exotic 4-spheres to go along with our exotic R^{4}s, but we don't know yet! * In more detail: by "object" we're generally talking about a manifold, i.e. a thing where each small-enough piece has a homeomorphism to an open subset of R^{n} for some n, and the pieces glue together nicely. We can carry over the notions of derivative from R^{n} to the pieces via these homeomorphisms, and if the results don't disagree where the pieces overlap, then we say that we have a smooth structure on our object. Note: not all topological manifolds admit any smooth structures, but anything that claims to be homeomorphic to a smooth manifold should probably try its best if it's going to be making such a claim.
Arrow's Impossibility Theorem

Maine's 2nd District just elected Jared Golden to the US House of Representatives, but in a somewhat complicated manner. Originally, more people voted for Bruce Poliquin than for Jared Golden, but the race was tight so there was a runoff. And the runoff used what is called "ranked choice voting", wherein instead of just saying "I want this guy to be Representative", you put all the people you wouldn't mind being Representative in order of how much you don't mind. So if you really want, say, Tiffany Bond to be Representative then you put her first, but if you wouldn't mind Golden, then you might put him second, and if you might be willing to hold your nose for Poliquin, then you might put him third, just so that people know that even though you want Bond if you can get her, you'd rather have Golden than Poliquin.

And in the end what happened was that while more people put Poliquin first than put Golden first, more people ranked Golden above Poliquin overall. Then the vote counters asked "do more people prefer Poliquin to Golden or Golden to Poliquin?" and since more people preferred Golden over Poliquin (and more people preferred either of the two over any third-party candidate), according to ranked choice voting, Golden won.

But Poliquin is challenging that result in court, saying that ranked choice voting is unconstitutional. Now, it doesn't actually say in the Constitution how exactly votes are turned into decisions, only that the decisions should be based on the votes. So Poliquin is unlikely to succeed in his suit. Moreover, the state of Maine voted earlier this year that runoffs should use ranked choice voting, so it is indeed a democratic decision. But it is an interesting outcome that two different methods of vote-tallying give different results despite ostensibly asking the same question: who wants Golden versus who wants Poliquin.
Clearly, what we should do is figure out which vote-aggregation scheme is best and just use that, right? I mean, once we get past the usual partisan stonewalling and gerrymandering and irrational stubbornness and corruption and lobbying and whatnot. There's a mathematical theorem called Arrow's Impossibility Theorem which states that in a fairly broad set of circumstances, there is no "best" vote-aggregation scheme, and in fact there are no vote-aggregation schemes that fit what sound like fairly reasonable criteria. But of course the devil is in the details, so let's talk about the details.

Arrow's Impossibility Theorem starts with a simple setup: there are at least two voters, and there are at least three candidates, and each voter submits an ordered list of the candidates, in order of preference. Not how much they'd prefer any given candidate, but simply "I prefer A more than B, and B more than C, and C more than D" and so on. We assume that for each voter the rankings are transitive: if a voter prefers A over B and B over C, then that voter prefers A over C. No rock-paper-scissors here. So that's our setup.

Now we take those votes and want to put them together into a single list that reflects, in some fashion, the will of the voting populace. What does that mean? Here's a formal statement, which I will then interpret: Let L be the list of possible orderings of the candidates, and suppose there are N voters. A voting scheme is a function F from L^{N}, i.e. the set of all possible votes cast, to L, which is the "will of the people". Let V_{i} be the vote of voter i. Then the vote scheme takes the collection {V_{i}} and returns F(V_{1},...,V_{N}).

Arrow's theorem gives a few "fairness" criteria:

1: Unanimity: If for all i, A comes before B in V_{i}, then A should come before B in F(V_{1},...,V_{N}). If everyone prefers A to B, then the People prefer A to B.

2: No dictators: There is no i such that F(V_{1},...,V_{N}) = V_{i} for all possible votes, i.e.
there is no single person whose vote is the only one that matters. 3: Independence of Irrelevant Alternatives: Consider two candidates A and B, and suppose we have two sets of possible votes, {V_{i}} and {W_{i}}, such that for each i, V_{i} and W_{i} agree on whether A or B is better. Then F(V_{1},...,V_{N}) and F(W_{1},...,W_{N}) should agree on whether A or B is better, regardless of where the other candidates end up. Arrow's Impossibility Theorem states that with the given setup, no voting scheme F can satisfy all three criteria at the same time. The proof is a little tricky, involving constructing a sequence of possible vote sets that force one of the criteria to fail. There's a bunch of bookkeeping involved that is clever but I don't find terribly interesting. There are a variety of ways that people try to weaken or change the setup and criteria in order to get something that works. For instance, you might consider a setup where you're only trying to pick a single winner. Unfortunately, the Gibbard-Satterthwaite theorem states that you end up with one of three outcomes: 1: Your voting scheme has a dictator. 2: You only have two candidates. 3: People use "tactical voting", i.e. submitting rankings that they don't actually hold in order to influence who ends up winning. The more general Gibbard's theorem gives similar results even when voters submit something other than rankings, for instance numerical grades for each candidate. Anyway, determining "fair" voting is hard in the mathematical sense, even before we get to the political difficulties.
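To make criterion 3 concrete, here's a small sketch using the standard Borda count (each ballot awards a candidate one point per opponent ranked below them; this is my choice of example scheme, not one from the post). Moving the "irrelevant" candidate C around, without any voter changing their mind about A versus B, flips the societal A-versus-B outcome:

```python
def borda(ballots):
    """Return candidates sorted best-first by Borda score: each ballot
    awards a candidate one point per opponent ranked below them."""
    scores = {}
    for ballot in ballots:
        n = len(ballot)
        for place, cand in enumerate(ballot):
            scores[cand] = scores.get(cand, 0) + (n - 1 - place)
    return sorted(scores, key=scores.get, reverse=True)

# Profile 1: three voters say A > B > C, two say B > C > A.
profile1 = [["A", "B", "C"]] * 3 + [["B", "C", "A"]] * 2
# Profile 2: every voter's A-vs-B preference is unchanged;
# only the "irrelevant" candidate C has moved.
profile2 = [["A", "C", "B"]] * 3 + [["B", "C", "A"]] * 2

print(borda(profile1))  # B first: scores B=7, A=6, C=2
print(borda(profile2))  # A first: scores A=6, C=5, B=4
```

So Borda satisfies unanimity and has no dictator, but fails Independence of Irrelevant Alternatives, just as Arrow's theorem says some criterion must fail.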
I think my favorite real-life encounter with Arrow's Theorem (and variants thereof) was finding out that it also applies to machine-learning ensembles. Because of course it does, but I hadn't thought about it that way before and also more of my day-to-day involves machine learning than political elections.
I know, right? I mean, a machine learning ensemble is a bunch of individual learners submitting what they each think the answer is, and then having to take all those individual answers and turn them into a single answer for the entire ensemble. Only the question is "what is this thing" rather than "who should be president". But it's still voting. So when you have more than two classes and you're asking each learner for confidence levels or whatnot, then you run into Arrow-type theorems.
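As a toy illustration of why ensembles hit the same wall (the three "learners" and their class rankings here are invented for the example): rankings that are individually transitive can still produce a cyclic majority preference, so no single aggregate ranking respects all the pairwise majority votes.

```python
# Each learner submits a transitive ranking of three classes.
learner_rankings = [
    ["cat", "dog", "bird"],
    ["dog", "bird", "cat"],
    ["bird", "cat", "dog"],
]

def majority_prefers(x, y):
    """True if a majority of learners rank class x above class y."""
    wins = sum(r.index(x) < r.index(y) for r in learner_rankings)
    return wins * 2 > len(learner_rankings)

# The pairwise majorities form a rock-paper-scissors cycle:
print(majority_prefers("cat", "dog"),   # cat beats dog 2-1
      majority_prefers("dog", "bird"),  # dog beats bird 2-1
      majority_prefers("bird", "cat"))  # bird beats cat 2-1
```

This is the classic Condorcet cycle, just wearing a machine-learning costume.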
That feel when you think you have a sharp, brilliant insight but then it turns out that you've just screwed up some bookkeeping. In unrelated news, it turns out that I have been misinterpreting the Cooley-Tukey algorithm. Oops.
It's almost Christmas Eve, and that means that at least here in the US, lots of stores are going to be packed with people doing last minute shopping. And with so many people trying to get so many gifts in such a small window, queuing up to pay for stuff is going to be super painful. Which means it's time to talk about The Long Line A lot of counterexamples in topology can be summed up as someone smiling at you with their lips only and saying "haha, no. Fuck you." The Long Line is one of my favorite cases. We start with the interval [0, 1), which contains all the real numbers no less than 0 and strictly less than 1. If we take two copies of it and shift one over, we can put them together end-to-end to make [0, 1)+[1, 2), which we often call the interval [0, 2). Three copies get us [0, 3), and so on. Countably many copies get us the non-negative real numbers, [0, ∞). We'll call this a short ray. If we take two copies of this short ray and flip one of them over, we can join them at 0 to get the real line (-∞, ∞). Okay, so we can build the real line out of copies of [0, 1). And indeed, a countable number of copies of [0, 1). What if we took our short ray [0, ∞) and stuck another copy of [0, 1) on the end? We'd get a set [0, ∞)+[∞, ∞+1). We might write this as [0, ∞+1). Which looks kind of weird! But it's still a thing we can do, I guess. And we can stick another copy of [0, 1) onto that to get [0, ∞+2), and even another whole ray, to get something that we might write as [0, ∞*2). And so on. We want to be careful about how we think about these things. When we look at [0, ∞+1), we want to remember that by ∞+1 we mean that there's a copy of [0, 1) stuck in after a countable number of copies of [0, 1), so ∞+1 is not ∞. Rather, we have a bunch of new numbers that we've declared to be larger than any real number, but that otherwise act just like real numbers within their own copy of [0, 1). 
Similarly, for [0, ∞*2), we have two intervals, each of which looks like the nonnegative real numbers, and everything in one interval is bigger than everything in the other interval. So since we're talking about bigger and smaller, we're talking about orderings, which means we're talking about ordinals. Recall that an ordinal is a set with a notion of "smaller than" on it that is well-ordered, i.e. any subset of an ordinal has a smallest element. So the positive integers are an ordinal, since any set of positive integers has a smallest one, but the real numbers with the standard ordering aren't an ordinal since, for instance, there's no smallest positive real number. So the way we talk about these unions of copies of [0, 1) is via ordinals: every element of our set gets two coordinates: an ordinal and a real number in [0, 1), and we say that the element with coordinates {a, b} is smaller than the element with coordinates {c, d} if a < c, or if a = c and b < d. So the real number 2.97 gets coordinates {2, 0.97} and the point 4.86 gets coordinates {4, 0.86}, and since 4 > 2, we get that 4.86 is bigger than 2.97. I'm going to write ω_{0} instead of ∞ to be clear that I'm talking about the smallest infinite ordinal, i.e. the one that comes after all the finite integers. So we can have elements with coordinates like {ω_{0}, 0.35}, and that's bigger than 4.86 and 2.97 because ω_{0} is bigger than 4 and bigger than 2. We call this type of ordering "lexicographic" or "dictionary" ordering, since that's how we order words: if the first letter differs, then see which word's first letter comes first in the alphabet. If the first letters are the same, then look at the second letter. And so on. Recall that every finite integer is in the set denoted by ω_{0}, so our short ray, i.e. the nonnegative real numbers, can be viewed as the product ω_{0} x [0,1) with the dictionary order on it. We can also use this ordering to put a topology on it. 
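The dictionary ordering is easy to play with directly, since Python already compares tuples lexicographically. In this sketch a point's first coordinate is itself encoded as a pair (how many copies of ω_{0}, plus a finite offset); that encoding is an assumption of the example, enough for the ordinals used here but not a full ordinal implementation:

```python
# A point on the ray is (ordinal, fraction), where the ordinal
# ω_0·k + n is encoded as the pair (k, n).  Python's tuple comparison
# is lexicographic, which is exactly the dictionary order in the text.
def point(omegas, finite, fraction):
    return ((omegas, finite), fraction)

p_297 = point(0, 2, 0.97)  # the real number 2.97
p_486 = point(0, 4, 0.86)  # the real number 4.86
p_w35 = point(1, 0, 0.35)  # {ω_0, 0.35}, past every real number

print(p_297 < p_486 < p_w35)  # True: 2.97 < 4.86 < {ω_0, 0.35}

# Membership in an "open interval" is just a chained comparison:
print(point(0, 2, 0.65) < p_297 < point(1, 7, 0.111))  # True
```

The second comparison is the interval-membership test for the interval ({2, 0.65}, {ω_{0}+7, 0.111}) that comes up below.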
Suppose that you have two numbers, {a, b} and {c, d} where {a, b} < {c, d} in our ordering. Then we say that the "open interval" ({a, b}, {c, d}) is the set of every number that is less than {c, d} but greater than {a, b} in dictionary order. If {a, b} and {c, d} correspond to finite real numbers, then such an interval is your basic single, connected piece of the real line with no holes. So we'll consider them that way in general. Which is a little weird to think about when you have points whose first coordinate isn't finite, but that's math for you. So for instance, the "interval" ({2, 0.65}, {ω_{0}+7, 0.111}) should be considered as if it were just like a regular interval, containing every number between 2.65 and ω_{0}+7.111. The topology generated by these intervals is called, appropriately, the "lexicographic" or "dictionary" topology. Okay, so we have our finite real numbers, and we have some other stuff that's bigger than all the finite stuff. But that's not long yet. For any countable ordinal, the resulting ray is still homeomorphic to a short ray, i.e. the nonnegative real numbers, i.e. there's a continuous function from your ray to the nonnegative real numbers and back, which sends open intervals to open intervals. Just because some of the intervals look kind of weird doesn't really make them act any different from a topological perspective. So to get the Long Line, we look at ω_{1}, which is the first uncountable ordinal. Now look at ω_{1} x [0, 1), again with the dictionary order and dictionary topology. We call this the long ray. Making another copy, flipping the copy over, and fusing the two copies at {0, 0} gives us the Long Line. Now this guy is long. Not only is it uncountable as a set, but it's an uncountable number of copies of the nonnegative real line stuck end-to-end. But it still is the same cardinality as the real numbers. 
Even though for any point on the Long Line the immediate region around that point looks like the real numbers, taken as a whole this is a very different beast, not at all homeomorphic to the real numbers. For instance, any strictly increasing sequence converges somewhere on the Long Line. This is not the case for the regular real numbers, where the sequence 1, 2, 3, ... doesn't converge. On the Long Line, it converges to the point {ω_{0}, 0}. Even sequences like 1, ω_{0}, ω_{0}^{2}, ω_{0}^{3}, ... converge on the Long Line, somewhere off in the distance. So the Long Line appears locally to be the same as the real numbers, but is simply much, much longer. Hopefully any lines you might have to stand in aren't too bad.