# Math(s)! Currently: the Twin Primes Conjecture

Discussion in 'General Chatter' started by Exohedron, Mar 14, 2015.

1. ### Exohedron (Doesn't like words)

I really like Wikipedia's proof of why the determinant is multiplicative: first you show that any n-linear, alternating function F from k^n to k is determined by the value it takes on the identity matrix I_n (viewed as n vectors), and thus that the determinant is the unique such function that takes the value 1 at I_n; for any other n-linear alternating function F, F applied to M is just F(I_n)det(M) by linearity in the space of multilinear functions.
Then you note that the function d_A that takes a matrix M to det(AM) is such a function, and takes the value det(A) at M = I_n, and so in general is just det(A)det(M).

Step one takes a little effort, though, mostly to convince the reader that the bookkeeping works out.

Last edited: Dec 30, 2018
2. ### Exohedron (Doesn't like words)

A slight divergence from pure math stuff, because I attempted to explain some of this to my parents recently:

Digital Signatures

You want to send someone a letter, and you want them to be certain that you sent the letter. Conversely, you don't want people sending things under your name.
If we have a physical letter, the traditional method is to append a signature or a seal, some symbol or sign that your intended recipient can recognize as yours and that can't be produced by anyone else.
But forgery is a thing: duplicating someone's handwriting or copying a seal. Most of us don't have to worry about that on the physical plane, because it takes effort that isn't worth it for most targets.
But there are types of forgery that do pay well. For instance, art forgery; learning someone's painting style, techniques, materials well enough to claim that you've "found" a lost piece.

In the digital realm, we send a lot of messages, important ones, and it's very easy to copy bits. So we need a good way of signing things that is more than just appending a bunch of bits containing your name.

Fortunately, we have some decent public key encryption systems.
For a recap: in a public key system, you have some algorithm E that takes a key K_E and a message p and produces an encrypted string c, and a different algorithm D that takes a different key K_D and the encrypted string c and produces the message p. This is different from more traditional, symmetric key cryptography, wherein the two keys are the same. In public key cryptography, because the encryption key is distinct from the decryption key, you can tell people the encryption key, and they can send you encrypted messages using the algorithm E and the key K_E, but as long as you keep the decryption key secret, nobody can read those encrypted messages except for you, even if they know the algorithm D.

So how does that help with signing things?
A digital signature works similarly to public key encryption. You, the signer, have a key K_S and a key K_V, and you release K_V to the public. There is an algorithm S which takes a message m and your key K_S and produces a signature h, and an algorithm V that takes a message m, a supposed signature h, and a key K_V, and verifies whether the signature would be produced by S given the message and the key K_S.
Note that m also gets transmitted. The reason for this is that we want the signature to depend on the message in some fashion. Otherwise a forger could take a valid signature from one message and paste it onto a fake message, like ripping a wax seal from one letter and gluing it onto a fake letter. At the very least, the verifier wants to know not only that they got a message from the correct person, but that they got the correct message from that person. This is called authentication, and is slightly different from identification. But it's still important, because someone might intercept a legitimate message from you, mess with the contents of the message without touching the signature, and then send it onward, and you find yourself promising money to people you didn't mean to.
Also, for the verifier's benefit, a message with a valid signature can't be repudiated by the sender, because nobody else could have produced a valid signature for that message; hence you can't deny having signed and therefore presumably having sent that message. But this is usually considered a less important aspect compared to identification and authentication.

Having a public key encryption system actually directly helps with making a digital signature scheme. Your standard public key setup promises that
p = D(K_D, E(K_E, p))
so that decrypting with the proper decryption key undoes an encryption with the proper encryption key. But a lot of them also have the property that
c = E(K_E, D(K_D, c))
i.e. that encrypting with the proper encryption key undoes a decryption with the proper decryption key. This is not always the case, but it is true for a lot of public key encryption systems, in particular RSA.
So, to sign a message, you take the message m, you "decrypt" it using K_D to get some h, and you use h as your signature. The verifier, knowing the corresponding key K_E, encrypts h and sees if it matches m.
Voila, a signature. If a would-be forger doesn't know your decryption key K_D, they won't be able to "decrypt" m to get h, and so won't be able to produce a valid signature. Moreover, if they don't have the correct m, then the encrypted signature won't match. So we get identification and authentication.

Okay, so I'm leaving a step out. We don't want to decrypt the entire message, because messages can be long but your signature should be short. So we cheat a little. Instead of decrypting the message, we hash the message, using a complicated function that takes in messages and produces short strings of nonsense. Then we decrypt the hashed message to get a short signature. Then the verifier compares the encrypted signature with the hash of the message.
The danger here is if someone can find two different messages that generate the same hash, since a signature for one would then be a valid signature for the other. Fortunately, we also have decent hash functions.

So that's what your computer is doing under the hood every time it tries to send a message that needs to be trustworthy. It hashes the message, turns the hash into a signature, and then sends the message and the signature; conversely, every time it receives a signed message, it processes the signature, hashes the message, and checks if it got matching results. If the signature corresponds to the message, great. If not, someone is trying to pull a fast one. Or something got garbled during transmission; either way, don't trust the message.
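The whole hash-then-sign loop can be sketched in a few lines of Python (my own toy version, assuming Python 3.8+ for modular inverses via pow). The primes and the message here are made up and absurdly small, purely for illustration; real RSA keys are thousands of bits and use proper padding.

```python
import hashlib

# Toy RSA hash-then-sign sketch. NOT secure: tiny primes, no padding.
p, q = 61, 53
n = p * q                            # public modulus
e = 17                               # public (verification) exponent
d = pow(e, -1, (p - 1) * (q - 1))    # private (signing) exponent

def sign(message: bytes) -> int:
    # hash the message, then "decrypt" the hash with the private key
    h = int.from_bytes(hashlib.sha256(message).digest(), "big") % n
    return pow(h, d, n)

def verify(message: bytes, signature: int) -> bool:
    # "encrypt" the signature with the public key and compare to the hash
    h = int.from_bytes(hashlib.sha256(message).digest(), "big") % n
    return pow(signature, e, n) == h

sig = sign(b"pay Alice 5 dollars")
assert verify(b"pay Alice 5 dollars", sig)                 # genuine signature passes
assert not verify(b"pay Alice 5 dollars", (sig + 1) % n)   # tampered signature fails
```

Note that tampering with either the message or the signature breaks the match, which is exactly the identification-plus-authentication property described above.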

Now, I'm sure some of you are thinking "wait, isn't RSA going to be insecure when we have quantum computers? Aren't all of our public key systems currently in use going to be insecure against quantum computation? Doesn't that mean that our digital signature algorithms will be insecure?" And the answer is yes.

So in the future, possibly the near future, we are going to need both encryption algorithms that will stand up against a quantum computer, and digital signature schemes that will stand up against a quantum computer. It seems, from what I can tell, that signature schemes are slightly easier to come up with than encryption algorithms, due to how they handle what each person needs to know, but it's still a hard problem.

3. ### Exohedron (Doesn't like words)

For reasons, I am currently liveblogging a math paper in my bijou blogette here. The paper, Homological Tools for the Quantum Mechanic, by Tom Mainiero, is about detecting factorizability of tensors via cohomology, available on the arXiv here.
If you want to see someone reacting in uninformative ways to what is really a small book (126 pages) then you can follow the tag #MathPaperLiveBlog.*

* I don't know if you can actually follow in a meaningful fashion other than watching the thread and/or searching for "#MathPaperLiveBlog" in the search bar.

Last edited: Jan 26, 2019
4. ### Exohedron (Doesn't like words)

Today I learned that a map between Poisson manifolds that preserves the Poisson structure is called an ichthyomorphism.

• Winner x 4
• Informative x 2
5. ### Exohedron (Doesn't like words)

Given that I seem to be an algebraic differential geometer who was unfortunately mostly raised by analytic differential geometers, the discovery that torsion can be explained in terms of exterior derivatives applied to differential forms makes me very happy. I don't know what is meant by "twisting" in a differential-geometric sense, but the discrepancy between a covariant exterior derivative and the standard exterior derivative makes sense as a thing.

6. ### Exohedron (Doesn't like words)

I learned a great π fact today.

π is the minimum value of π_p

On the plane, we can define what are called "L^p" norms, by which we mean the following: first, pick a real number p between 1 and ∞.
Given a vector with coordinates (x, y), we say that the L^p norm of (x, y) is (|x|^p + |y|^p)^(1/p).
So the usual Euclidean norm is the L^2 norm, and the L^1 norm is just |x| + |y|.
There's also what we call the L^∞ norm, which gives max(|x|, |y|).
We can define the L^p-distance between two points (x_1, y_1) and (x_2, y_2) as the L^p norm of the vector difference between the two points, i.e.
d_p((x_1, y_1), (x_2, y_2)) = (|x_1 - x_2|^p + |y_1 - y_2|^p)^(1/p)

Now we can define a unit L^p circle to be the set of points of L^p-distance 1 from a given point. We can also define a notion of L^p-length of a curve using calculus, substituting p where we normally would see 2 in the arclength formula.

So now we can ask "What is the L^p-length of a unit L^p-circle?" We can divide that value by 2 and call the resulting number π_p.
So for instance, π_1 is 4, a fact that I'll let you puzzle out on your own. π_2 is the familiar 3.14... π_∞ also ends up being 4.

So here's the fun π fact: π, by which we mean π_2, is the minimum value of π_p with regard to all values of p between 1 and ∞.
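This can be checked numerically. The sketch below is my own, not from any particular source: it integrates the L^p arclength element (1 + |dy/dx|^p)^(1/p) along the arc y = (1 - x^p)^(1/p), using only the eighth of the circle with 0 ≤ x ≤ 2^(-1/p), where the slope stays bounded, and multiplying by 4 to get the half-perimeter.

```python
import math

# Numerically estimate pi_p, the L^p half-perimeter of the unit L^p circle.
def pi_p(p, n=20000):
    a = 2 ** (-1.0 / p)    # x-coordinate where the arc meets the line y = x
    total = 0.0
    for i in range(n):     # midpoint rule on [0, a]
        x = (i + 0.5) * a / n
        # |dy/dx| for y = (1 - x^p)^(1/p)
        dy = x ** (p - 1) * (1 - x ** p) ** (1 / p - 1)
        total += (1 + dy ** p) ** (1 / p) * (a / n)
    return 4 * total
```

Running this, pi_p(1) and pi_p at large p come out near 4, pi_p(2) comes out near 3.14159, and values on either side of p = 2 are larger, consistent with the minimum being at p = 2.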

Last edited: Mar 28, 2019
• Like x 2
• Informative x 1
7. ### Exohedron (Doesn't like words)

Given a polynomial with complex coefficients, we can consider the polynomial both as a bunch of coefficients and as a function that takes complex numbers to complex numbers. Note that the product of any two polynomials is again a polynomial, and that if two polynomials aren't 0 everywhere, then their product isn't 0 everywhere.
This also works with real numbers, talking about polynomials in terms of either their coefficients or their values.

Something funny happens when we look at a finite field. Suppose that we look at a field F_p with p elements for some prime number p. Then Fermat's little theorem states that x^p = x for any x in F_p. We have that x is a nonzero polynomial, and x^(p-1) - 1 is a nonzero polynomial since evaluating it at x = 0 gives -1. But the product of these two is x^p - x, which always takes on the value 0 for any x in F_p.
So we have two nonzero polynomials that appear to multiply to something that always evaluates to 0?
The problem that we have here is that we're very, very far away from being algebraically closed, and so the nonzero-ness of x^p - x is hidden from us. If we move to a field extension of F_p, then it is no longer true that x^p - x always evaluates to 0; for elements of the extension that aren't in F_p, x^p - x yields a nonzero value.
However, there are still lurking issues. Suppose that we extend F_p by adding in roots of some irreducible degree k polynomial. Then we end up with the field F_(p^k). So now most elements don't satisfy x^p - x = 0. However, everything satisfies x^(p^k) - x = 0, and so now we have that the product of x and x^(p^k - 1) - 1 always evaluates to 0.
So we have to move to an even bigger extension, and at this point we might as well go all the way to the algebraic closure of Fp just in case.
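Here's a small sketch checking this for p = 3, modeling F_9 as F_3[t]/(t^2 + 1). The choice of t^2 + 1 is my own for illustration; it works because -1 is not a square mod 3, so t^2 + 1 is irreducible there.

```python
# In F_3, every element satisfies x^3 = x (Fermat's little theorem),
# so x * (x^(p-1) - 1) = x^p - x is the zero *function* on F_3.
p = 3
assert all(pow(x, p, p) == x for x in range(p))

# Model F_9 = F_3[t]/(t^2 + 1): elements are pairs (a, b) meaning a + b*t.
def mul(u, v):
    a, b = u
    c, d = v
    # (a + bt)(c + dt) = ac + (ad + bc)t + bd*t^2, with t^2 = -1
    return ((a * c - b * d) % p, (a * d + b * c) % p)

def power(u, n):
    r = (1, 0)
    for _ in range(n):
        r = mul(r, u)
    return r

field = [(a, b) for a in range(p) for b in range(p)]
# x^(p^2) = x holds for every element of F_9 ...
assert all(power(x, p * p) == x for x in field)
# ... but x^p = x now fails for the elements outside F_3:
assert any(power(x, p) != x for x in field)
```

So the "hidden nonzero" of x^p - x becomes visible as soon as we pass to the degree-2 extension, exactly as described above.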

Anyway, I was originally going to talk about the group structure of CP^1 in terms of rational functions and roots and poles, and how that totally doesn't work for F_p P^1 due to the problem of hidden nonzeros, except I kind of lost the thread. I'm still learning how weird algebraic geometry is when you're so far away from being algebraically closed.

8. ### Exohedron (Doesn't like words)

More fun with modular arithmetic.

Newton's method and Hensel's lemma

Suppose that you wanted to find a solution to the equation x^3 + 2x + 1 = 0 mod 2873. Don't ask me why.
How would you find one? Well, the first thing you do, as you do in all questions about modular arithmetic, is factor the modulus.
2873 = 13^2 * 17.
According to the Chinese Remainder Theorem, if you can find a solution modulo 13^2 and a solution modulo 17, you can combine them into a solution modulo 2873.
But you can't use the Chinese Remainder Theorem to combine a solution modulo 13 with a solution modulo 13 to get a solution modulo 13^2.
So what do you do?

Let's take a digression into the real numbers.

Suppose you have a function f and you want to find a point x such that f(x) = 0.
For some particular functions we have formulas for finding such points, but for most functions we don't.
But we can find approximate solutions in some cases.
Suppose we guess a point x_0. f(x_0) probably isn't 0, and probably isn't close enough to 0. Can we make a better guess? Well, assume that f is linear. If f has slope m, then we have that
f(x_0 - f(x_0)/m) = 0
So we set
x = x_0 - f(x_0)/m
And we're done!
But what if f is not linear? Well, then we just assume that f is approximately linear, i.e. differentiable. Then the slope of the line that approximates f at the point x_0 is f'(x_0).
So we get that we should look at
x_1 = x_0 - f(x_0)/f'(x_0)
and this might be a better approximation to x.
So we compute f(x_1), and if it isn't close enough to 0 then we set
x_2 = x_1 - f(x_1)/f'(x_1)
And so on.
This is called Newton's method for finding zeros of a function.
In general this will serve us well, unless we accidentally have that f' = 0 at one of these points. If that happens, we pick another starting point.
Newton's method requires a decent enough function, by which we mean one with a continuous second derivative and a good Taylor approximation. It also requires that we don't pick a really bad x_0. Also, it requires that a solution actually exist; otherwise we might get all sorts of nonsense.
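In Python, the iteration looks like this (a minimal sketch of my own, assuming f is differentiable, the starting guess is decent, and f' doesn't vanish along the way):

```python
# Newton's method: repeatedly replace x with x - f(x)/f'(x).
def newton(f, fprime, x0, tol=1e-12, max_iter=100):
    x = x0
    for _ in range(max_iter):
        fx = f(x)
        if abs(fx) < tol:      # close enough to a zero
            return x
        x = x - fx / fprime(x)
    return x

# Example: a root of x^2 - 2, starting from 1.0, converges to sqrt(2).
root = newton(lambda x: x * x - 2, lambda x: 2 * x, 1.0)
```
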

Now back to modular arithmetic.

Suppose we have the function f(x) = x^3 + 2x + 1, and we want to find a solution to f(x) = 0 not in the real numbers, but in the integers modulo 13^2.
We could try finding one directly, either by testing each possible number or by using the cubic formula, but let's try a slightly different tack.
Suppose that we had a solution to f(x) = 0 mod 13^2. Then since 13 divides 13^2, x mod 13 should be a solution to f(x) = 0 mod 13. So let's find x mod 13.
Modulo 13, we get that the only solution is x = 2.
So now let's try to lift that to a solution modulo 13^2. We know x mod 13 is 2, so our actual solution modulo 13^2 is going to be 2 + 13t for some t. Let's plug that into f and see what we get:
f(2 + 13t) = (2 + 13t)^3 + 2(2 + 13t) + 1
= 8 + 3*2^2*13t + 3*2*(13t)^2 + (13t)^3 + 2*2 + 2*13t + 1
= 13 + 3*2^2*13t + 3*2*13^2*t^2 + 13^3*t^3 + 2*13t
Since we're working modulo 13^2, the t^2 and t^3 terms vanish, since both have coefficients that are divisible by 13^2. So we're left with
13 + 3*2^2*13t + 2*13t = 13 + (3*2^2 + 2)*13t mod 13^2
3*2^2 + 2 = 14, so we get
13 + 14*13t = 13 + 13t mod 13^2
So t = -1, and our actual solution is 2 - 13 = -11 = 158 mod 13^2.
And you can check that this is an actual solution to f(x) = 0 mod 13^2.

Note what happened: the terms with higher powers of t vanished due to the modulus, and we were left with a linear equation in t. Solving the linear equation allowed us to get from a solution mod 13 to a solution mod 13^2.
And what linear equation did we end up with? Well, we were looking at solving
8 + 2*2 + 1 + (3*2^2 + 2)*13t = 0
f(2) + f'(2)*13t = 0
So we wanted 13t = -f(2)/f'(2).

So why does this work? The answer is that we are doing the same thing that we did above, approximating our function by a linear one and solving that to get a more accurate answer.
When we computed f(2 + 13t) and got rid of the terms that were divisible by 13^2 = 169, we got rid of everything divisible by t^2, and that gave us the linear approximation f(2) + f'(2)*13t.
So what's the notion of accuracy at work here? The p-adic one!
Recall that the p-adic absolute value of an integer is given by 1 over the highest power of p that divides that integer, and that the p-adic distance between two integers is 1 over the highest power of p that divides their difference.
When we're working mod p, we only get accuracy to within 1/p. Good enough to get initial approximations. Then when we move to mod p^2, we're asking for accuracy within 1/p^2, and the correction term -f(x)/f'(x) gives us that extra accuracy.
Note that this also works for moving from mod p^2 to mod p^3: again we take the solution x_0 that works mod p^2, and we look for a solution of f(x_0 + p^2*t) = 0 mod p^3, and we end up with
f(x_0) + f'(x_0)*p^2*t = 0 mod p^3
because all of the terms that involve higher powers of t also involve at least p^3.
And so we get
p^2*t = -f(x_0)/f'(x_0)
and that gets us the solution mod p^3.
So if we wanted to find a solution to x^3 + 2x + 1 = 0 mod 13^3, we would compute
158 - f(158)/f'(158) = 158 - 1014/196 = 158 - 1014 = 1341
(dividing by 196 changes nothing at the precision we need, since 196 = 1 mod 13), and you can check that 1341 does indeed solve f(x) = 0 modulo 13^3.
And of course, finding a solution mod p^k is just repeating this process.

This is called Hensel's lemma, which also happens to tell us that for each solution to f(x) = 0 mod p where f'(x) is nonzero mod p, there is exactly one corresponding solution mod p^2, mod p^3, mod p^4 and so on, given by the corresponding correction terms. But it's really just Newton's method for p-adic integers.

So that's how we can solve polynomials over composite moduli: first we factor into powers of distinct primes, and then we solve the polynomial over each prime factor, and then we use Hensel's lemma to turn solutions modulo primes into solutions modulo powers of primes, and then we use the Chinese Remainder Theorem to put those solutions together over the original modulus.
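The whole recipe can be sketched in Python (my own sketch, assuming Python 3.8+ so that pow(a, -1, m) computes modular inverses), applied to the original problem mod 2873:

```python
# Solve f(x) = x^3 + 2x + 1 = 0 mod 2873 = 13^2 * 17 via
# brute force mod the primes, Hensel lifting, and the CRT.
def f(x):
    return x ** 3 + 2 * x + 1

def fprime(x):
    return 3 * x ** 2 + 2

def hensel_lift(x, p, k):
    """Lift a solution of f(x) = 0 mod p to one mod p^k,
    assuming f'(x) is nonzero mod p (a simple root)."""
    for i in range(1, k):
        mod = p ** (i + 1)
        # Newton step x -> x - f(x)/f'(x), with division done mod p^(i+1)
        x = (x - f(x) * pow(fprime(x), -1, mod)) % mod
    return x

# Step 1: roots modulo the primes, found by brute force.
r13 = next(x for x in range(13) if f(x) % 13 == 0)   # x = 2
r17 = next(x for x in range(17) if f(x) % 17 == 0)
# Step 2: Hensel-lift the mod-13 root to mod 13^2.
r169 = hensel_lift(r13, 13, 2)                        # x = 158, as above
# Step 3: Chinese Remainder Theorem to combine mod 169 and mod 17.
x = (r169 * 17 * pow(17, -1, 169) + r17 * 169 * pow(169, -1, 17)) % 2873
assert f(x) % 2873 == 0
```

The same hensel_lift call with k = 3 reproduces the mod 13^3 solution 1341 from the post.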

Last edited: Mar 24, 2019
9. ### Exohedron (Doesn't like words)

I actually really like the Miller-Rabin test for primality. It's simple to implement, and it's easy to understand why it works with only a little bit of number theory. In contrast, the AKS test, while not awful to implement, is a much harder beast to prove. On the other hand, AKS is deterministic polynomial time, which is pretty good, unlike the Miller-Rabin test, which is polynomial time but only probabilistic.
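Since it really is simple to implement, here's a toy sketch of my own: write n - 1 = 2^r * d with d odd, then check random bases; a composite n passes any one random base with probability at most 1/4.

```python
import random

def miller_rabin(n, rounds=20):
    """Probabilistic primality test; False means definitely composite."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    r, d = 0, n - 1
    while d % 2 == 0:       # factor n - 1 as 2^r * d with d odd
        r += 1
        d //= 2
    for _ in range(rounds):
        a = random.randrange(2, n - 1) if n > 4 else 2
        x = pow(a, d, n)
        if x == 1 or x == n - 1:
            continue
        for _ in range(r - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False    # a is a witness: n is definitely composite
    return True             # probably prime
```

With 20 rounds, a composite slips through with probability at most 4^(-20), which is good enough for most purposes.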

10. Isn't there a limit to the number you'd have to check assuming the generalized Riemann hypothesis?

11. ### Exohedron (Doesn't like words)

Yeah, if the Generalized Riemann Hypothesis is true, we get that there's a deterministic, polytime version of the Miller-Rabin test called the Miller test, where you only have to check bases up to 2(log n)^2. GRH gives us that there's a set of elements that generates (Z/nZ)^×, all of which are smaller than 2(log n)^2, so since the strong liars are all in a proper subgroup, checking all the generators means we must at some point test something that's outside the subgroup and thus get a witness of compositeness.

However, unlike the Miller-Rabin test, I don't think I understand how the stuff with the GRH works.

12. ### Exohedron (Doesn't like words)

Today's Math Fact:

A signature in the logic sense is a set of function symbols and relation symbols, each with specified arity (number of inputs). A theory is a collection of sentences using those symbols and any other logical grammar necessary to create actual sentences, without any free variables. A model of a theory is a collection of sets and a map from the relation symbols to relations on those sets and from the function symbols to functions on those sets, with the proper arities, such that the sentences in the theory are all satisfied. Sometimes we'll specify a theory just from a set of axioms and generate the rest via the rules of logical inference.

The Löwenheim-Skolem theorem says that any countable first-order theory with an infinite model has a model of every infinite cardinality.

For instance, the natural numbers with the Peano axioms form a countable first-order theory, and there is an infinite model, i.e. the natural numbers that we're used to. So, by Löwenheim-Skolem, there is a model of the Peano axioms that is the same size as the real numbers.
Conversely, if we consider the set of real numbers and all the true statements we can make about them just in terms of +, *, 0 and 1, we get what is called the theory of real-closed fields, and by Löwenheim-Skolem there is a model of this that is countable.

The fun math fact is that Thoralf Skolem didn't think much of this result, because he didn't believe in uncountable sets, and so for him the theorem was a tautology; if there's only one infinite cardinal then obviously anything with a model of that size has a model of every infinite cardinal.

Those familiar with set theory might be a bit confused at this point. Löwenheim-Skolem implies that ZF set theory, which is a countable theory that has an infinite model, must also have a countable model. But doesn't Cantor's diagonalization argument imply the existence of an uncountable set in ZF set theory? This is called Skolem's paradox.
Fortunately, Skolem, being a serious logician, did have a resolution, which was to be very, very careful about what it means to be countable.
In particular, the axioms of ZF and Cantor diagonalization imply that there is some set which is infinite and does not have a 1-1 correspondence with the natural numbers. The trick is then to construct a system in which there are sets that have the cardinality of the natural numbers, at least if you're allowed to view the system from the outside, but for which any possible correspondence fails to be a set; since everything in a model of set theory has to be a set, we get that we don't have a 1-1 correspondence to the natural numbers. It's kind of like how we can "prove" the Gödel statement of a formal system, but only by going to a meta-system.
So it turns out that "countable" and "uncountable" aren't absolute properties of a set, but are rather dependent on what model you're working in.

Anyway, Skolem eventually went even further and became a finitist.

13. ### Exohedron (Doesn't like words)

I keep thinking about writing up an introduction to homology/cohomology/homological algebra and I'm not entirely sure why. It's not even an area I'm super interested in beyond de Rham, group, and Lie algebra cohomology, which are all basically the same thing anyway.

14. ### Exohedron (Doesn't like words)

Why do I ever bother to talk about anything other than combinatorial group theory?

Consider the set of regular polygons. For each integer greater than 2, we have a regular polygon with that many sides, and a unique one, ignoring translations and rotations and dilations and whatnot.

Now let's go up a dimension and talk about regular solids, i.e. the Platonic solids. What do we mean by regular? We mean that all the faces are regular polygons and the faces are all identical and all of the corners are identical as well.
We suddenly get a drastic decrease in variety, in that there are only 5 Platonic solids.

If we go up another dimension, we can ask about regular hypersolids, where by regular we mean that the 3-dimensional faces are regular polyhedrons that are all identical and the vertices are all identical.
We end up with 6 regular hypersolids.

If we go up another dimension, we can ask about regular hyper- nevermind. The general term is "n-polytope". A polygon is a 2-polytope, a solid is a 3-polytope, a hypersolid is a 4-polytope, etc. So we can ask about regular 5-polytopes. How many are there? What about regular 6-polytopes?

The answer ties back to the Dynkin Diagrams I talked about pretty far back in this thread. Like, page 4 or 5 or something.

Anyway, a refresher:
A_n looks like a row of n dots, each connected to the next by a line marked 3.
BC_n looks like A_n except the last line is marked with a 4 instead of a 3.
D_n looks like a row of n-2 dots, each connected to the next by a line marked 3, and then two more dots, each connected by a line marked 3 to the last dot in the row.
E_n looks like a row of n-1 dots, each connected to the next by a line marked 3, and then another dot connected to the third dot in the row by a line marked 3. We have to restrict n to be at least 6 and at most 8.
F_4 looks like A_4 except the middle line is marked with a 4 instead of a 3.
G_2 looks like 2 dots connected by a line marked 6.
H_n looks like A_n except the last line is marked with a 5 instead of a 3. We have to restrict n to be at least 2 and at most 4.
I_2(k) looks like two dots connected by a line marked k. We have to restrict to k at least 3.

You might note that this list is kind of redundant in places. A_2 is I_2(3), A_3 is D_3, B_2 is I_2(4), G_2 is I_2(6), and H_2 is I_2(5).

It turns out, for reasons I'll get into later, that the diagrams that correspond to regular n-polytopes are the ones that are just rows of n dots. We call these "linear" Dynkin diagrams. So not D_n or E_n.
Also, because of the redundancies, we'll only look at A_n, BC_n and H_n for n > 2, and ignore G_2 in favor of I_2(k).

In fact, regular polytopes correspond to the endpoints of the linear Dynkin diagrams, so if the two endpoints of the Dynkin diagram look the same, then they yield the same polytope.

So for A_n, the diagram is symmetric, and so we only get one polytope from it. For n = 2 we get a triangle, and for n = 3 we get a tetrahedron, and for higher n we get what is called a simplex, which you get by taking n+1 equidistant points in R^n and connecting all pairs by lines and then filling in all triples by triangles and all quartets by tetrahedrons etc.

For BC_n, the endpoint connected to the next dot by a line marked 4 yields a cube or hypercube of the appropriate dimension. The other endpoint yields what is called an "orthoplex", which is the dual of the hypercube in the same way that the octahedron is the dual of the cube: replace the n-1 dimensional faces by points and the n-2 dimensional faces by lines and so on. For BC_4, we get the usual hypercube and what is called the 16-cell, which is made of tetrahedrons.
This also explains why, for instance, the two endpoints of A_3 don't yield distinct polyhedra: the tetrahedron is its own dual.

For F_4 we get a 4-dimensional object called a 24-cell, made of octahedrons.

For H_3 the endpoint connected to the next dot by a line marked 5 yields a dodecahedron, and the other endpoint yields the icosahedron.
For H_4, the endpoint connected by a line marked 5 yields the 120-cell, made of dodecahedrons, and the other gives the 600-cell, made of tetrahedrons.

For I_2(k) you can probably guess that I_2(k) yields a k-gon, again self-dual so it doesn't matter which endpoint we pick.

Counting gives a few numbers: for n = 2, we have an infinite set because we can let k be any number greater than 2.
For n = 3, we have one polyhedron from A_3, two from BC_3, and two from H_3, for a total of 5 regular solids.
For n = 4, we have one from A_4, two from BC_4, one from F_4 and two from H_4, for a total of 6 regular hypersolids.
For higher n, we have one from A_n, two from BC_n, and then no more, because we've run out of Fs and Hs. So for n greater than 4, there are only 3 regular n-polytopes.
So our pattern is infinity, 5, 6, 3, 3, 3,....

A certain type of person now has a few questions. Why this correspondence? What do the other dots in the linear Dynkin diagrams mean? What do the non-linear Dynkin diagrams mean?

I'll cover that in another post, because it gets kind of complicated.

15. ### Exohedron (Doesn't like words)

A flag in mathematics, namely in algebraic geometry and combinatorics, is a bunch of things that are contained in other things.

Let's give an example: take a cube.
A flag for a cube might be a vertex, and an edge containing that vertex, and a face containing that edge.
There are 48 such flags for a cube: there are eight vertices, and each vertex is contained in three edges, and each edge is contained in two faces.
Notably, the cube also has 48 symmetries, in the forms of rotations and reflections, because given a vertex you can put it into any of 8 places, and once you've done that you can pick an edge from that vertex and put it in any of three positions, and once you've done that you can take a face containing that edge and put it in one of two positions.
And indeed, given two flags there is a unique symmetry of the cube that moves one flag to the other.
So the idea that we have here is that the set of flags corresponds to the set of symmetries of the object in question.
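The 48-flag count for the cube is easy to check by brute force. The setup below is my own: vertices at the corners of [-1,1]^3, edges joining vertices that differ in one coordinate, faces given by fixing one coordinate.

```python
from itertools import product

# Enumerate the flags (vertex, edge, face) of a cube explicitly.
vertices = list(product([-1, 1], repeat=3))
# An edge joins two vertices that differ in exactly one coordinate.
edges = list({frozenset((u, v)) for u in vertices for v in vertices
              if sum(a != b for a, b in zip(u, v)) == 1})
# A face is the set of 4 vertices with a fixed value in one coordinate.
faces = [frozenset(v for v in vertices if v[i] == s)
         for i in range(3) for s in (-1, 1)]

flags = [(v, e, f) for v in vertices for e in edges for f in faces
         if v in e and e <= f]
assert len(flags) == 48   # 8 vertices * 3 edges each * 2 faces each
```
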

So let's look at an icosahedron. What does a flag look like here? It looks like a vertex contained in an edge contained in a triangular face. How many flags are there? There are 12 vertices, and five edges off of each vertex, and two faces off of each edge, so there are a total of 120 flags. And there are 120 symmetries of the icosahedron.

A flag doesn't have to look like a 0-dimensional thing contained in a 1-dimensional thing in a 2-dimensional thing. It could look like a 0-dimensional thing contained in a 2-dimensional thing; we call this a partial flag because it doesn't have a 1-dimensional piece.
It could, however, look like a 0-dimensional thing contained in a 2-dimensional thing and also contained in a different 2-dimensional thing.
For instance, consider the cuboctahedron; you get this by taking a cube and cutting off all of the corners so that the square faces become smaller square faces; you end up with a shape that has 6 square faces and 8 triangular faces; now each vertex is contained in four edges, and each edge is contained in a single triangular face and a single square face.
So a flag could be a vertex and an edge containing the vertex and a square face containing the edge and a triangular face containing the edge. But now we have too much information; if we only knew the square and the triangle, we could figure out the edge. So we can make a flag that's just the vertex, the square and the triangle, and the edge is implied.
How many such flags does the cuboctahedron have? Well, pick a vertex; there are 12 vertices. Pick a square containing that vertex; there are two such squares. And now pick a triangle that also contains that vertex and shares an edge with the square; there are two such triangles. So we get 12*2*2 = 48, same as the cube, which is probably to be expected since we got the cuboctahedron by slicing bits off of the cube.

Now let's look at some diagrams.

First let's look at the diagram A_0. This is a row of no dots, and I'm going to say that it corresponds to a vertex, being 0-dimensional. It has one flag, containing just the vertex. We'll say that #A_0 = 1.
Let's look at the diagram A_1, which looks like a single dot, *. I'm going to say that it corresponds to an edge, being 1-dimensional. An edge has two flags, one for each vertex. We'll say that #A_1 = 2.
Let's look at the diagram A_2, which looks like two dots in a row connected by a line marked 3. I'm going to say that it corresponds to a triangle, and has six flags. We'll say that #A_2 = 6.
Let's also look at the diagram BC_2, which looks like two dots in a row, connected by a line marked 4. I'm going to say that it corresponds to a square, and has eight flags. We'll say that #BC_2 = 8.

Now let's look at BC_3, which looks like three dots in a row, the first two connected by a line marked 4 and the last two connected by a line marked 3. Now consider the last dot. If we delete it we get a diagram that looks like BC_2, which I said corresponded to a square.
If instead we delete the second and third dots, we're left with A_1, which is an edge, and if we delete all three dots we get A_0, which is a vertex.
So our flags for the cube are what we get by deleting dots and looking at the remaining diagrams.

What if we flip it around? If we delete the first dot only, we get the diagram A_2, corresponding to a triangle; if we delete the first and second dots we get A_1, for an edge, and if we delete all three dots we get A_0, for a vertex. And these are the flags for an octahedron.

But wait, you say, the icosahedron also had flags that were made up of a vertex contained in an edge contained in a triangle, so why should we be so confident that we got an octahedron this time around?

Because we haven't used all the information in the diagram yet!

So consider our diagram BC3 again. We said that deleting dots from the right got us pieces of the flag for a cube. What does deleting dots from the left tell us about the cube?
Well, suppose we deleted the first dot. Then we get the diagram A2, i.e. the triangle.
Notice that if you take a cube and consider a vertex, that vertex is contained in 3 edges and each of those edges is contained in 2 faces, so that vertex is contained in six flags. 6 = #A2. In fact, recalling the relation between flags and symmetries, we get that there are 6 symmetries of the cube that keep a particular vertex in place. We call this the "stabilizer" of the vertex.

Now suppose you delete the first and second dots from BC3. You end up with A1, and #A1 = 2. Only now instead of two vertices contained in the single edge, we have two faces containing the single edge. Similarly, for each edge there are 2 symmetries of the cube that keep all the points of the edge in place.

And finally, deleting all three dots yields A0, where #A0 = 1, and indeed, a face is contained in one...cube. I guess? More importantly, if we insist on keeping all of the points of a face in place, we can't actually move the cube at all.

So for the cube we assign the left-most dot to the vertices, because deleting it and everything to the right of it yields A0, and then the dots to the right of the first dot tell us that there are 6 symmetries of the cube that will keep the vertex in place.
The dot in the middle corresponds to the edges, since the dots to the left of it yield A1, and the dot to the right tells us that there are two symmetries of the cube that fix all of the points of the edge.
The dot on the right corresponds to the faces, since the dots to the left of it give us a BC2, i.e. a square, and the lack of dots to the right of it tell us that keeping the face in place locks the entire cube in place.
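Those stabilizer sizes can be verified directly. Here's a quick check (mine, not the post's), using the standard model of the cube's 48 symmetries as signed coordinate permutations:

```python
from itertools import permutations, product

# The 48 symmetries of the cube [-1,1]^3, modeled as signed coordinate
# permutations: (x1, x2, x3) -> (s1*x_p1, s2*x_p2, s3*x_p3).
def apply(sym, v):
    perm, signs = sym
    return tuple(s * v[p] for s, p in zip(signs, perm))

symmetries = [(perm, signs)
              for perm in permutations(range(3))
              for signs in product([1, -1], repeat=3)]

v = (1, 1, 1)    # a vertex of the cube
w = (1, 1, -1)   # an adjacent vertex, so {v, w} is an edge

# Symmetries fixing the vertex, and symmetries fixing the edge pointwise.
fix_vertex = [g for g in symmetries if apply(g, v) == v]
fix_edge = [g for g in fix_vertex if apply(g, w) == w]

print(len(symmetries), len(fix_vertex), len(fix_edge))  # 48 6 2
```

The counts 48, 6 and 2 match the flag count and the vertex and edge stabilizers read off from the diagram.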

Now let's look at the octahedron. We'll flip our diagram over so that the 3 is between the left two dots and the 4 is between the right two dots.
Our leftmost dot corresponds to the vertices since there's nothing to the left of it, and the dots to the right form a BC2. So there are eight symmetries of the shape that keep the vertex in place; this is how we determine that we're getting an octahedron, since for an icosahedron there would be 10 symmetries.

Now what happens when we instead fold the diagram over so that the middle dot is to the left and the other two dots are to the right? What do we get?
Well now the left-most dot is still the vertex dot, but now the dots to the right of it are two disconnected dots. So we get an A1 and another A1 of symmetries, i.e. 2*2 symmetries.
Okay, what about the other dots? Well, for the dot connected to the left-most dot by a 3, the rest of the dots yield a BC2, so it corresponds to a square, and there are no dots to the right of it, so keeping the square in place locks the whole shape. For the dot connected to the left-most dot by a 4, the rest of the dots yield an A2, so it corresponds to a triangle, and again keeping the triangle in place locks the whole shape.

So we have that each vertex is contained in a square and in a triangle. Moreover, the A1 + A1 set of symmetries that fixes a vertex indicates that each vertex must be contained in two squares and two triangles. This should sound kind of familiar: it's the cuboctahedron! The symmetries that leave a vertex fixed swap the two squares, or swap the two triangles, or swap the squares and swap the triangles, or leave everything in place.
So you see what happens when we pick something other than an endpoint of the diagram to be the vertex point: we end up with something that has multiple types of faces, because we have multiple types of right-most dots.
We'd also get similar results from diagrams that aren't linear: again we'd get multiple right-most dots, and hence our shape wouldn't be regular.

Except! What if the two right-most dots gave the same shape when deleted?

Let's look at A3, which is three dots in a row connected by lines marked 3. Let's fold the diagram so that the middle dot is on the left and the other two dots are on the right.
Now the two dots on the right both yield A2 when deleted, i.e. triangles. Deleting the dot to the left yields A1+A1 again. So now we have that each vertex is contained in two triangles, and also in two other triangles. Sound familiar? That's an octahedron.
But now something slightly odd is going on, because according to the diagram, we no longer have a symmetry that will fix a vertex in place and rotate the octahedron by 90 degrees; there is no such motion in A1+A1. So now the eight triangular faces of the octahedron have been sundered into two sets of four. Identical sets, yes, and if you held an octahedron in your hand you'd be able to swap one set for the other, but according to the diagram you can't.
So that's what happens when the two right-most dots give the same shape: they tell you the shapes of the faces, but they lie about the symmetries because they think that you can't swap the faces even when the faces are the same shape.

So that's why the regular polytopes all correspond to endpoints of the linear Dynkin diagrams: from a Dynkin diagram you can read off a polytope by picking one dot to be the vertex dot, and if you don't pick an endpoint of a linear Dynkin diagram you either get a shape whose faces are different shapes or the diagram lies to you about how symmetrical the polytope is.

16. ### ExohedronDoesn't like words

"1+1 = 2?"

During grad school, I was trying to explain homotopy type theory to some friends of mine in the department lounge, and, as is the habit of many people who give impromptu lectures, was writing somewhat uninformative, contextless tidbits on the board.
The clearest one was the question:

"1 + 1 = 2?"

And of course, this is immediately noticed by people who were in the lounge but hadn't been listening to the conversation, and who I like to think normally respect me as a mathematician.

So what the hell was I talking about?

Consider a pair of sets A and B. If we are only allowed to talk about maps to and from our sets but aren't allowed to talk about individual elements, how do we define the disjoint union of the sets?
Now consider the disjoint union of A and B, which we'll denote by A+B. We have a function il: A -> A+B, and a function ir: B -> A+B, the usual inclusion maps. Moreover, for any set X and any functions f: A -> X and g: B -> X, we can uniquely define (f+g): A+B -> X such that
(f+g)il = f
(f+g)ir = g
So the disjoint union allows us to combine functions with the same range set.
If we were allowed to talk about elements, all of this follows by looking at an element of A+B and asking "Is it from A? Then do the function from A to the set. Is it from B? Then do the function from B to the set."
But if we can't talk about elements, then this will have to do.
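Here's a minimal sketch of this in Python, modeling A+B with tagged values; the names il, ir and copair are my own, chosen to match the text:

```python
# Inclusions into the disjoint union A+B, modeled as tagged pairs.
def il(a):
    return ("L", a)

def ir(b):
    return ("R", b)

def copair(f, g):
    """The unique map (f+g): A+B -> X with (f+g)il = f and (f+g)ir = g."""
    def h(tagged):
        tag, x = tagged
        return f(x) if tag == "L" else g(x)
    return h

A, B = {1, 2}, {2, 3}
disjoint_union = {il(a) for a in A} | {ir(b) for b in B}
print(len(disjoint_union))  # 4, even though A and B overlap

f = lambda a: a * 10   # A -> X
g = lambda b: -b       # B -> X
h = copair(f, g)
assert all(h(il(a)) == f(a) for a in A)
assert all(h(ir(b)) == g(b) for b in B)
```

The tags play the role of "is it from A or from B?" without our ever inspecting individual elements in the abstract definition.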

Note that the number of elements in A+B is the number of elements in A plus the number of elements in B.
Well, wait, we said that we aren't allowed to talk about elements. Fine.
We call a set T "terminal" if for any set S, there is exactly one function from S to T. Notably, all one-element sets are terminal, and indeed, the only terminal sets have exactly one element. Moreover, as sets, all the terminal sets are isomorphic, because there's exactly one function between any two terminal sets, and that function has to be invertible because there's exactly one function going the other way.
So we have a notion of 1, as any terminal set. We also can say that 1+1 has two elements, in the sense that there are two functions from 1 to 1+1, which, if we were allowed to talk about elements, would send the single element of 1 to one of the elements of 1+1, or to the other element.

While we're here, we can also define "0" as an initial object. We say that a set I is "initial" if for any set S, there is exactly one function from I to S. It turns out that the set I has to be the empty set, so "0" is a pretty good moniker for it.

Okay, let's abstract a bit, to categories. Instead of talking about sets, we'll talk about more generic "objects", and instead of functions, we'll talk about more generic "morphisms". We can still talk about terminal objects, in that there is always exactly one morphism from any object to a terminal object. We can talk about initial objects, in that there is exactly one morphism from an initial object to any object. And we can talk about "disjoint unions", except we call them coproducts, in the following sense:
Suppose that we have two objects A and B. Then we define A+B to be an object such that there are morphisms il: A -> A+B and ir: B -> A+B and such that for any object X and any pair of morphisms f: A -> X and g: B ->X, we get a unique morphism (f+g):A+B -> X such that
(f+g)il = f
(f+g)ir = g

While we're here, we can also define the product. It looks like the coproduct with all of the arrows backwards, like so:
Suppose that we have two objects A and B. Then we define A*B to be an object such that there are morphisms pl: A*B -> A and pr: A*B -> B and such that for any object X and any pair of morphisms f: X -> A and g: X -> B, we get a unique morphism (f*g): X -> A*B such that
pl(f*g) = f
pr(f*g) = g
And note that for sets, the cardinality of A*B is the cardinality of A times the cardinality of B.
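The product's universal property can be sketched the same way; pl, pr and pairing are again my own names:

```python
# Projections out of the product A*B, modeled as ordinary pairs.
def pl(pair):
    return pair[0]

def pr(pair):
    return pair[1]

def pairing(f, g):
    """The unique map (f*g): X -> A*B with pl(f*g) = f and pr(f*g) = g."""
    return lambda x: (f(x), g(x))

A, B, X = {1, 2, 3}, {"u", "v"}, {0, 1}
product_set = {(a, b) for a in A for b in B}
print(len(product_set))  # 6 = |A| * |B|

f = lambda x: x + 1                    # X -> A
g = lambda x: "u" if x == 0 else "v"   # X -> B
h = pairing(f, g)
assert all(pl(h(x)) == f(x) and pr(h(x)) == g(x) for x in X)
```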

For certain categories, we also have a notion of exponentiation: A^B, by which we mean all of the morphisms from B to A. Note that if A and B are sets, then the cardinality of A^B is exactly the cardinality of A raised to the cardinality of B.

Suppose we have a category where products, coproducts and exponentials are always defined, including the empty product and the empty coproduct. Then we automatically get initial and terminal objects as well, namely the empty coproduct and the empty product, which we'll denote by 0 and 1.
All of these things obey what Tarski called the "high-school algebra" axioms:
x+y = y+x
(x+y)+z = x+(y+z)
x*1 = x
x*y = y*x
(x*y)*z = x*(y*z)
x*(y+z) = (x*y)+(x*z)
1^x = 1
x^1 = x
x^(y+z) = x^y * x^z
(x*y)^z = x^z * y^z
(x^y)^z = x^(y*z)
and indeed, you might recognize these from high school. Note that we might need to replace those equalities with isomorphisms, but eh, close enough.
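In Set, the exponential laws can at least be spot-checked by counting functions; here's a throwaway script (my own encoding of functions as tuples of images):

```python
from itertools import product

def functions(B, A):
    """All functions B -> A, encoded as tuples of images over sorted(B);
    this models the exponential A^B in Set."""
    return list(product(sorted(A), repeat=len(B)))

x = {1, 2, 3}   # |x| = 3
y = {"p", "q"}  # |y| = 2
z = {10, 20}    # |z| = 2

# x^(y+z) = x^y * x^z, at the level of cardinalities:
y_plus_z = {("L", b) for b in y} | {("R", c) for c in z}
assert len(functions(y_plus_z, x)) == len(functions(y, x)) * len(functions(z, x))

# (x^y)^z = x^(y*z), i.e. currying:
y_times_z = {(b, c) for b in y for c in z}
assert len(functions(z, set(functions(y, x)))) == len(functions(y_times_z, x))
print("exponent laws hold on cardinalities")  # 81 = 9 * 9 and 81 = 3^4
```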

We also have a notion of "how many points does an object X have", again by saying "how many morphisms are there from a terminal object to X?"

Anyway, so we've defined a notion of 1 in a more general setting, and 1+1 in a more general setting, and the notion of number of points in a more general setting.

Of course, all of this can be demonstrated with a simple example:
Suppose our objects are pairs of sets, and our morphisms are pairs of functions. Then our terminal object looks like (1,1), and we have that (1,1)+(1,1) = (1+1,1+1).
So far so good, right?
Except now we have that there are four morphisms from (1,1) to (1+1,1+1). Oops!
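To make that count concrete, a tiny illustrative script (mine):

```python
from itertools import product

# The disjoint union 1+1 of two one-element sets, as tagged values.
two = [("L", "*"), ("R", "*")]

# A function out of a one-element set is determined by a single image,
# so morphisms 1 -> 1+1 correspond to elements of 1+1.
print(len(two))  # 2: in Set, 1+1 has two points

# In Set x Set, a morphism (1,1) -> (1+1, 1+1) is a pair of such functions.
pair_maps = list(product(two, two))
print(len(pair_maps))  # 4: here the coproduct of two terminal objects has four points
```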

So it turns out that the axioms given above, familiar from high-school algebra, are insufficient to actually tell you the size of 1+1.

Last edited: Apr 20, 2019
17. ### ExohedronDoesn't like words

My favorite fact about harmonic polynomials.

Suppose you have a function P in n variables x₁, ..., xₙ. We define the Laplacian Δ as the operator that sends P to
Δ(P) = ∂²P/∂x₁² + ... + ∂²P/∂xₙ²
In other words, you take the second derivative of P in each direction, and add them together.

The Laplacian is a particularly interesting operator for a number of reasons. It shows up all over the place in differential equations, since it gives us a nice way to talk about wave-like behavior or about diffusion of heat or particles or generally any case where the behavior of an individual is described by averaging the behavior of its neighbors.

But as always I don't care about analysis, I want to talk about algebra.

We say that P is harmonic if Δ(P) = 0.
My favorite fact about harmonic functions goes as follows:

Let C be the polynomial x₁² + ... + xₙ².
Then for any twice-differentiable real-valued function* P in n variables, there is a unique way to write P as a (possibly infinite) series of terms CᵏPₖ where each Pₖ is harmonic.

So for instance, let's suppose that n = 3, so that
C = x₁² + x₂² + x₃²
Suppose that
P = 3x₁⁴ - 4x₁²x₂² + 3x₁²x₃² + x₂⁴ + x₂²x₃² + x₃⁴
Then we have that
P₀ = x₁⁴ - 6x₁²x₂² + x₂⁴
P₁ = x₁² - x₂²
P₂ = 1
and you can check that each of these is harmonic.
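This decomposition can be checked mechanically. A small sketch, representing polynomials as dicts from exponent triples to coefficients (my own throwaway encoding, not from the post):

```python
# Polynomials in x1, x2, x3 as {(e1, e2, e3): coefficient} dicts.
def add(p, q):
    r = dict(p)
    for m, c in q.items():
        r[m] = r.get(m, 0) + c
    return {m: c for m, c in r.items() if c != 0}

def mul(p, q):
    r = {}
    for m1, c1 in p.items():
        for m2, c2 in q.items():
            m = tuple(a + b for a, b in zip(m1, m2))
            r[m] = r.get(m, 0) + c1 * c2
    return {m: c for m, c in r.items() if c != 0}

def d2(p, i):
    """Second partial derivative with respect to variable i."""
    r = {}
    for m, c in p.items():
        if m[i] >= 2:
            e = m[i]
            m2 = m[:i] + (e - 2,) + m[i + 1:]
            r[m2] = r.get(m2, 0) + c * e * (e - 1)
    return r

def laplacian(p):
    return add(add(d2(p, 0), d2(p, 1)), d2(p, 2))

C = {(2, 0, 0): 1, (0, 2, 0): 1, (0, 0, 2): 1}        # x1^2 + x2^2 + x3^2
P = {(4, 0, 0): 3, (2, 2, 0): -4, (2, 0, 2): 3,
     (0, 4, 0): 1, (0, 2, 2): 1, (0, 0, 4): 1}
P0 = {(4, 0, 0): 1, (2, 2, 0): -6, (0, 4, 0): 1}
P1 = {(2, 0, 0): 1, (0, 2, 0): -1}
P2 = {(0, 0, 0): 1}

# Each piece is harmonic (empty dict = the zero polynomial)...
assert laplacian(P0) == {} and laplacian(P1) == {} and laplacian(P2) == {}
# ...and they reassemble P as P0 + C*P1 + C^2*P2.
recon = add(P0, add(mul(C, P1), mul(mul(C, C), P2)))
assert recon == P
print("decomposition checks out")
```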

So why do I like this fact? Because I'm a representation theorist, and this is about representations of the orthogonal group SO(n).
If we take our space Rⁿ and apply a rotation to it, then we can ask how this rotation affects our functions P and C and how it affects our operator Δ. It turns out that any rotation will yield the same C, because C is just the squared distance from the origin, which our rotations preserve by definition, and it will also preserve Δ, because we can get Δ by taking C and replacing the variables with the corresponding partial derivatives.
So if we write R for the rotation, then applying R to P and then applying Δ is the same as applying Δ to P and then applying R:
Δ(R(P)) = R(Δ(P))
In particular, if P is harmonic, then so is R(P) for any rotation R.
So this fact about harmonic functions is particularly nice since it allows us to write any function in terms of an invariant C and harmonic functions, which themselves are defined by a rotationally-invariant operator.

Indeed, this allows us to write functions on spheres in a particularly useful way, because the unit sphere in Rⁿ is what we get by setting C to 1. Conversely, if we have a function that we've only defined on the unit sphere, we can extend it to a harmonic function on all of Rⁿ in a unique way, by extending it in any way at all, writing the result in terms of C and harmonic functions, and then setting all instances of C in the expression to 1.

There's some other fun you can have with certain other groups, where by "certain other" I mean Coxeter groups and simple Lie groups, because I never talk about any other groups (I really ought to talk about other groups at some point).

* You might need the function to be analytic. I'm not actually sure. I know that it's true for polynomials and therefore true for analytic functions, but I am not sure for more general cases.

Last edited: May 12, 2019
• Like x 1
18. ### ExohedronDoesn't like words

deRham cohomology is best cohomology (fight me)

There's an old theorem of Poincare that says that if you have a vector field v defined everywhere on the plane, and the curl of v is 0 everywhere, then there is some function f such that v is the gradient of f. Moreover, if you have a region R of the plane that doesn't have any holes in it, and v is defined everywhere on R and the curl of v is 0 everywhere on R, then there is some function f defined on R such that v is the gradient of f.
The converse is pretty clear just by writing out definitions and expanding terms and being careful with your bookkeeping.

The part about R not having any holes in it is pretty important. Consider the region that's the entire plane except the origin. Let v be the vector field given by
v(x, y) = (-y/(x²+y²), x/(x²+y²))
v isn't defined at the origin, but that's outside our region.
If I've gotten my numbers right, then we have that the curl of v is 0 everywhere in our region. But you can check that v is not the gradient of any function!

Suppose that v is in fact the gradient of a function f. If you start at a point in the plane, and evaluate f there, you get some value f(x,y). And just like regular integration will tell you how f changes as you move in terms of its derivative, the integral along a curve will tell you how f changes as you move around in the plane in terms of the gradient:
f(end) - f(start) = ∫C grad(f)·t ds
where t is the unit tangent vector to the curve C at the point, and ds is your arclength element.
In particular, if you start and end at the same point, i.e. your curve is a loop, then integrating the gradient around the loop should yield no change.

So let's look at v on the unit circle. For (x,y) on the unit circle, we get v(x,y) = (-y, x). Also, our tangent vector to the unit circle is (-y, x).
∫C v·t ds = ∫C (y² + x²) ds = ∫C 1 ds = 2π
which is not 0!
So we get that v can't be the gradient of a function!
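You can confirm the 2π numerically with a quick Riemann sum around the circle (illustrative Python, nothing beyond the standard library):

```python
import math

def v(x, y):
    """The curl-free vector field on the punctured plane."""
    r2 = x * x + y * y
    return (-y / r2, x / r2)

# Integrate v . t ds around the unit circle, parametrized by angle.
N = 100000
total = 0.0
for k in range(N):
    theta = 2 * math.pi * k / N
    x, y = math.cos(theta), math.sin(theta)
    tx, ty = -math.sin(theta), math.cos(theta)   # unit tangent vector
    vx, vy = v(x, y)
    total += (vx * tx + vy * ty) * (2 * math.pi / N)  # ds = dtheta on the unit circle

print(round(total, 6))  # 6.283185, i.e. 2*pi, not 0
```

On the unit circle the integrand is identically 1, so the sum is exactly the circumference.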

And indeed, there is a sense in which this can be used to detect holes:
If you have a region R in the plane and someone hands you a vector field defined on R whose curl is 0 but which isn't the gradient of a function on R, then you know by Poincare's theorem that your region has to have a hole.
And if someone hands you a loop and the integral of the vector field around that loop isn't 0, then you know that the vector field isn't the gradient of a function.

In fact, in order to tell you anything interesting, the loop has to go around a hole! Otherwise we can just apply Poincare's theorem to the region inside the loop, and so the integral around the loop would have to be 0.

So let's pull away from individual vector fields and functions, and look at a general statement: Let R be a connected region in the plane and consider all of the vector fields on our region R that have vanishing curl. Call this set Z¹(R). Note that it's a vector space, since the curl is a linear operation on vector fields. It's a very big (infinite-dimensional, in fact) vector space, but we're not interested in it on its own.
Let B¹(R) be the vector space of all vector fields that are gradients of functions on R. Again, a very big vector space. Also it's a subspace of Z¹(R), since the curl of a gradient is always 0.
The question is: is there anything in Z¹(R) that isn't in B¹(R)? Alternatively, is
H¹(R) = Z¹(R)/B¹(R)
nontrivial as a vector space?
If R has a hole in it, then H¹(R) does have something nontrivial in it, and thus is at least one-dimensional.
Claim: The dimension of H¹(R) is the number of holes in R.

Now let's boost by a dimension. Here's where the fun starts.

Suppose you have a region in 3-dimensional space. We can define functions and vector fields in this region.

Consider a vector field v. We can again define the curl of v, and again we get the statement that if our region has no holes in it, then every vector field whose curl is 0 everywhere is the gradient of a function. And again we can say:
Define Z¹(R) as the space of vector fields with vanishing curl, and B¹(R) as the space of vector fields that are gradients of functions, and define H¹(R) = Z¹(R)/B¹(R).
And if H¹(R) is nontrivial, then we have a hole.
But not just missing isolated points or pockets. No, this indicates a tunnel that goes all the way through our space, like the hole of a donut. H¹ asks what happens when we take a rubber band around our supposed hole and let it go: for an empty pocket the rubber band will just slide around it and shrink to nothing, but around a tunnel all the way through the space, the band can't avoid the tunnel.
But what about empty pockets? How do we detect them?
Well, suppose you have a vector field v whose divergence is 0 everywhere. If our region has no holes in it, then every vector field whose divergence is everywhere 0 is the curl of a vector field.
So we define Z²(R) as the space of vector fields with vanishing divergence, and B²(R) as the space of vector fields that are curls of vector fields, and define H²(R) = Z²(R)/B²(R).
And now the dimension of H²(R) counts pockets rather than tunnels. H² asks: if you have a balloon and you let it shrink, can it shrink to nothing, or will it get caught on a patch of missing space?

This is the beginning of what is called deRham cohomology, which is the nicest cohomology but only works in spaces that have enough derivatives defined on them. But we have this idea that we can count holes of various types by taking some operations z and b such that zb = 0, and looking at the space Z of things that z sends to 0, and the space B of things that are the result of applying b to stuff, and looking at H = Z/B.
If we're a bit more careful about what kinds of derivative-like operators we're talking about and switch from vector fields to what are called "differential forms", we can extend this idea to any number of dimensions. If we have a region of four-dimensional space, what kind of holes can we have, and what should we compute to detect these holes? But that requires a bit more machinery.
We can even extend it to things that aren't subsets of Rⁿ. We just need appropriate notions of derivative and differential form.

On the other hand, because Z and B are so large, I guess it's not great if you want to actually compute stuff. But who cares about that kind of thing?

• Like x 1
19. ### ExohedronDoesn't like words

I'm starting to appreciate generating functions more. I mean, I already liked the idea, but now that I'm occasionally using them to actually do stuff I'm really seeing how powerful and elegant they can be.

20. ### ExohedronDoesn't like words

I just remembered that we can tie the harmonic function stuff and the deRham cohomology stuff together:
There is a natural way to extend the Laplacian Δ to act on differential forms, and using that extension we get that for each w in Hⁿ(R) there is a unique element z in Zⁿ(R) such that Δ(z) = 0 and w is the image of z under the natural quotient map [·]: Zⁿ(R) -> Hⁿ(R).

The interaction of the harmonic structure and the differential structure is kind of interesting. For instance, if you have an element a in Zm(R) and an element b in Zn(R), then ab is an element of Zm+n(R), but it is not necessarily the case that the product of two harmonic things is harmonic; there is a harmonic differential form that corresponds to it, since we can take [ab] in Hm+n(R) and ask for the corresponding harmonic element of Zm+n(R), but it probably isn't going to be ab even if a and b are harmonic.