I really like Wikipedia's proof of why the determinant is multiplicative: first you show that any n-linear, alternating function F taking n vectors in k^{n} to k is determined by the value it takes on the identity matrix I_{n} (viewed as n vectors), and thus that the determinant is the unique such function that takes the value 1 at I_{n}; moreover, for any other n-linear alternating function F, F applied to M is just F(I_{n})det(M), by linearity in the space of multilinear functions. Then you note that the function d_{A} that takes a matrix M to det(AM) is such a function, and takes the value det(A) at M = I_{n}, and so in general is just det(A)det(M). Step one takes a little effort, though, mostly to convince the reader that the bookkeeping works out.
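The multiplicativity itself is easy to sanity-check numerically (no substitute for the proof, of course); a toy 2×2 check in Python:

```python
# Toy sanity check of det(A M) = det(A) det(M) for 2x2 integer matrices.
def det2(m):
    (a, b), (c, d) = m
    return a * d - b * c

def matmul2(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

A = [[2, 3], [1, 4]]
M = [[5, -1], [2, 7]]
assert det2(matmul2(A, M)) == det2(A) * det2(M)  # 5 * 37 = 185
```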
A slight divergence from pure math stuff, because I attempted to explain some of this to my parents recently:

Digital Signatures

You want to send someone a letter, and you want them to be certain that you sent the letter. Conversely, you don't want people sending things under your name. If we have a physical letter, the traditional method is to append a signature or a seal, some symbol or sign that your intended recipient can recognize as yours and that can't be produced by anyone else. But forgery is a thing, duplicating someone's handwriting or copying a seal. Most of us don't have to worry about that on the physical plane, because forging takes effort that isn't worth it for most people. But there are types of forgery that do pay well. For instance, art forgery: learning someone's painting style, techniques, and materials well enough to claim that you've "found" a lost piece. In the digital realm, we send a lot of messages, important ones, and it's very easy to copy bits. So we need a good way of signing things that is more than just appending a bunch of bits containing your name.

Fortunately, we have some decent public key encryption systems. For a recap: in a public key system, you have some algorithm E that takes a key K_{E} and a message p and produces an encrypted string c, and a different algorithm D that takes a different key K_{D} and the encrypted string c and produces the message p. This is different from more traditional, symmetric key cryptography, wherein the two keys are the same. In public key cryptography, because the encryption key is distinct from the decryption key, you can tell people the encryption key, and they can send you encrypted messages using the algorithm E and the key K_{E}; but as long as you keep the decryption key secret, nobody can read those encrypted messages except for you, even if they know the algorithm D. So how does that help with signing things? A digital signature works similarly to public key encryption.
You, the signer, have a key K_{S} and a key K_{V}, and you release K_{V} to the public. There is an algorithm S which takes a message m and your key K_{S} and produces a signature h, and an algorithm V that takes a message m, a supposed signature h, and a key K_{V}, and verifies whether the signature would be produced by S given the message and the key K_{S}. Note that m also gets transmitted. The reason for this is that we want the signature to depend on the message in some fashion. Otherwise a forger could take a valid signature from one message and paste it onto a fake message, like ripping a wax seal from one letter and gluing it onto a fake letter. At the very least, the verifier wants to know that not only did they get a message from the correct person, they got the correct message from that person. This is called authentication, and is slightly different from identification. But it's still important, because someone might intercept a legitimate message from you, mess with the contents of the message without touching the signature, and then send it onward, and you find yourself promising money to people you didn't mean to. Also, for the verifier's benefit, a message with a valid signature can't be repudiated by the sender, because nobody else could have produced a valid signature for that message; hence you can't deny having signed, and therefore presumably having sent, that message. But this is usually considered a less important aspect compared to identification and authentication. Having a public key encryption system actually directly helps with making a digital signature scheme. Your standard public key setup promises that p = D(K_{D}, E(K_{E}, p)), so that decrypting with the proper decryption key undoes an encryption with the proper encryption key. But a lot of them also have the property that c = E(K_{E}, D(K_{D}, c)), i.e. that encrypting with the proper encryption key undoes a decryption with the proper decryption key.
This is not always the case, but it is true for a lot of public key encryption systems, in particular RSA. So, to sign a message, you take the message m, you decrypt it using K_{D} to get some "decrypted" h, and you use h as your signature. The verifier, knowing the corresponding key K_{E}, encrypts h and sees if it matches m. Voila, a signature. If a would-be forger doesn't know your decryption key K_{D}, they won't be able to "decrypt" m to get h, and so won't be able to produce a valid signature. Moreover, if they don't have the correct m, then the encrypted signature won't match. So we get identification and authentication. Okay, so I'm leaving a step out. We don't want to decrypt the entire message, because messages can be long but your signature should be short. So we cheat a little. Instead of decrypting the message, we hash the message, using a complicated function that takes in messages and produces short strings of nonsense. Then we decrypt the hashed message to get a short signature. Then the verifier compares the encrypted signature with the hash of the message. The danger here is if two messages generate the same hash in a predictable manner. Fortunately, we also have decent hash functions. So that's what your computer is doing under the hood every time it tries to send a message that needs to be trustworthy. It hashes the message, turns the hash into a signature, and then sends the message and the signature; conversely, every time it receives a signed message, it processes the signature, hashes the message, and checks if it got matching results. If the signature corresponds to the message, great. If not, someone is trying to pull a fast one. Or something got garbled during transmission; either way, don't trust the message. Now, I'm sure some of you are thinking "wait, isn't RSA going to be insecure when we have quantum computers? Aren't all of our public key systems currently in use going to be insecure against quantum computation? 
Doesn't that mean that our digital signature algorithms will be insecure?" And the answer is yes. So in the future, possibly the near future, we are going to need both encryption algorithms that will stand up against a quantum computer, and digital signature schemes that will stand up against a quantum computer. It seems, from what I can tell, that signature schemes are slightly easier to come up with than encryption algorithms, due to how they handle what each person needs to know, but it's still a hard problem.
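As a side note, the hash-then-sign flow described above can be sketched in a few lines of Python with textbook RSA, using the classic toy keypair n = 3233, e = 17, d = 2753 (tiny and utterly insecure, purely for illustration; real systems use far larger keys plus padding schemes):

```python
import hashlib

# Toy RSA keypair (p = 61, q = 53): public (n, e), private exponent d.
n, e, d = 3233, 17, 2753

def sign(message: bytes) -> int:
    # Hash the message, then "decrypt" the hash with the private key.
    h = int(hashlib.sha256(message).hexdigest(), 16) % n
    return pow(h, d, n)

def verify(message: bytes, signature: int) -> bool:
    # "Encrypt" the signature with the public key and compare to the hash.
    h = int(hashlib.sha256(message).hexdigest(), 16) % n
    return pow(signature, e, n) == h

msg = b"I promise to pay you 5 dollars."
s = sign(msg)
assert verify(msg, s)                 # genuine signature checks out
assert not verify(msg, (s + 1) % n)   # a corrupted signature fails
```

A tampered message almost surely fails too, since its hash changes; with a real-sized modulus the chance of an accidental match is negligible.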
For reasons, I am currently liveblogging a math paper in my bijou blogette. The paper, Homological Tools for the Quantum Mechanic, by Tom Mainiero, is about detecting factorizability of tensors via cohomology, and is available on the arXiv. If you want to see someone reacting in uninformative ways to what is really a small book (126 pages), then you can follow the tag #MathPaperLiveBlog.* * I don't know if you can actually follow it in a meaningful fashion other than watching the thread and/or searching for "#MathPaperLiveBlog" in the search bar.
Today I learned that a map between Poisson manifolds that preserves the Poisson structure is called an ichthyomorphism.
Given that I seem to be an algebraic differential geometer who was unfortunately mostly raised by analytic differential geometers, the discovery that torsion can be explained in terms of exterior derivatives applied to differential forms makes me very happy. I don't know what is meant by "twisting" in a differential-geometric sense, but the discrepancy between a covariant exterior derivative and the standard exterior derivative makes sense as a thing.
I learned a great π fact today.

π is the minimum value of π_{p}

On the plane, we can define what are called "L^{p}" norms, by which we mean the following: first, pick a real number p between 1 and ∞. Given a vector with coordinates (x, y), we say that the L^{p} norm of (x, y) is (|x|^{p} + |y|^{p})^{1/p}. So the usual Euclidean norm is the L^{2} norm, and the L^{1} norm is just |x| + |y|. There's also what we call the L^{∞} norm, which gives max(|x|, |y|). We can define the L^{p}-distance between two points (x_{1}, y_{1}) and (x_{2}, y_{2}) as the L^{p} norm of the vector difference between the two points, i.e. d_{p}((x_{1}, y_{1}), (x_{2}, y_{2})) = (|x_{1} - x_{2}|^{p} + |y_{1} - y_{2}|^{p})^{1/p}. Now we can define a unit L^{p} circle to be the set of points of L^{p}-distance 1 from a given point. We can also define a notion of L^{p}-length of a curve using calculus, substituting p where we normally would see 2 in the arc length formula. So now we can ask "What is the L^{p}-length of a unit L^{p}-circle?" We can divide that value by 2 and call the resulting number π_{p}. So for instance, π_{1} is 4, a fact that I'll let you puzzle out on your own. π_{2} is the familiar 3.14... π_{∞} also ends up being 4. So here's the fun π fact: π, by which we mean π_{2}, is the minimum value of π_{p} over all values of p between 1 and ∞.
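If you want to see this numerically, here's a sketch that approximates π_{p} by summing L^{p}-distances between closely spaced points on the unit L^{p} circle (a polygonal approximation, so accuracy is only a few decimal places at this resolution):

```python
def pi_p(p, steps=200000):
    # Half the L^p-perimeter of the unit L^p circle: walk the first-quadrant
    # arc y = (1 - x^p)^(1/p), summing L^p distances between sample points,
    # then multiply the quarter-arc by 4 and divide by 2.
    quarter = 0.0
    prev = (0.0, 1.0)
    for i in range(1, steps + 1):
        x = i / steps
        y = (1.0 - x ** p) ** (1.0 / p)
        dx, dy = x - prev[0], abs(y - prev[1])
        quarter += (dx ** p + dy ** p) ** (1.0 / p)
        prev = (x, y)
    return 4 * quarter / 2

# pi_p(1) gives 4, pi_p(2) gives the familiar 3.14159..., and sampling
# other values of p shows the minimum sitting at p = 2.
```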
Given a polynomial with complex coefficients, we can consider the polynomial both as a bunch of coefficients and as a function that takes complex numbers to complex numbers. Note that the product of any two polynomials is again a polynomial, and that if two polynomials aren't 0 everywhere, then their product isn't 0 everywhere. This also works with real numbers, talking about polynomials in terms of either their coefficients or their values. Something funny happens when we look at a finite field. Suppose that we look at a field F_{p} with p elements for some prime number p. Then Fermat's little theorem states that x^{p} = x for any x in F_{p}. We have that x is a nonzero polynomial, and x^{p-1} - 1 is a nonzero polynomial since evaluating it at x = 0 gives -1. But the product of these two functions gives x^{p} - x, which always takes on the value 0 for any x in F_{p}. So we have two nonzero polynomials whose product always evaluates to 0? The problem that we have here is that we're very, very far away from being algebraically closed, and so the nonzero-ness of x^{p} - x is hidden from us. If we move to a field extension of F_{p}, then it is no longer true that x^{p} - x always evaluates to 0; for elements of the extension that aren't in F_{p}, x^{p} - x yields a nonzero value. However, there are still lurking issues. Suppose that we extend F_{p} by adding in roots of some irreducible degree k polynomial. Then we end up with the field F_{p^k}. So now most elements don't satisfy x^{p} - x = 0. However, everything satisfies x^{p^k} - x = 0, and so now we have that the product of x and x^{p^k - 1} - 1 always evaluates to 0. So we have to move to an even bigger extension, and at this point we might as well go all the way to the algebraic closure of F_{p} just in case.
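Here's the phenomenon in miniature in Python, with p = 3 and the extension F_{9} = F_{3}[t]/(t^{2} + 1), modeled as pairs (a, b) standing for a + bt (my encoding, just for illustration; x^{2} + 1 is irreducible over F_{3}):

```python
p = 3
# Over F_p, the nonzero polynomial x^p - x evaluates to 0 at every point.
assert all((x ** p - x) % p == 0 for x in range(p))

# A tiny model of F_9 = F_3[t]/(t^2 + 1): pairs (a, b) meaning a + b*t.
def mul(u, v, q=3):
    a, b = u
    c, d = v
    # (a + b t)(c + d t) = ac + (ad + bc) t + bd t^2, with t^2 = -1
    return ((a * c - b * d) % q, (a * d + b * c) % q)

def power(u, n):
    r = (1, 0)
    for _ in range(n):
        r = mul(r, u)
    return r

t = (0, 1)
assert power(t, 3) != t   # x^3 - x no longer vanishes once we leave F_3...
assert power(t, 9) == t   # ...but x^9 - x still vanishes on all of F_9
```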
Anyway, I was originally going to talk about the group structure of CP^{∞} in terms of rational functions and roots and poles, and how that totally doesn't work for F_{p}P^{∞} due to the problem of hidden nonzeros except I kind of lost the thread. I'm still learning how weird algebraic geometry is when you're so far away from being algebraically closed.
More fun with modular arithmetic.

Newton's method and Hensel's lemma

Suppose that you wanted to find a solution to the equation x^{3} + 2x + 1 = 0 mod 2873. Don't ask me why. How would you find one? Well, the first thing you do, as you do in all questions about modular arithmetic, is factor the modulus: 2873 = 13^{2} * 17. According to the Chinese Remainder Theorem, if you can find a solution modulo 13^{2} and a solution modulo 17, you can combine them into a solution modulo 2873. But you can't use the Chinese Remainder Theorem to combine a solution modulo 13 with a solution modulo 13 to get a solution modulo 13^{2}. So what do you do?

Let's take a digression into the real numbers. Suppose you have a function f and you want to find a point x such that f(x) = 0. For some particular functions we have formulas for finding such points, but for most functions we don't. But we can find approximate solutions in some cases. Suppose we guess a point x_{0}. f(x_{0}) probably isn't 0, and probably isn't close enough to 0. Can we make a better guess? Well, assume that f is linear. If f has slope m, then we have that f(x_{0} - f(x_{0})/m) = 0, so we set x = x_{0} - f(x_{0})/m, and we're done! But what if f is not linear? Well, then we just assume that f is approximately linear, i.e. differentiable. Then the slope of the line that approximates f at the point x_{0} is f '(x_{0}). So we should look at x_{1} = x_{0} - f(x_{0})/f '(x_{0}), which might be a better approximation to x. So we compute f(x_{1}), and if it isn't close enough to 0 then we set x_{2} = x_{1} - f(x_{1})/f '(x_{1}), and so on. This is called Newton's method for finding zeros of a function. In general this will serve us well, unless we accidentally have that f ' = 0 at one of these points. If that happens, we pick another starting point. Newton's method requires a decent enough function, by which we mean one with a continuous second derivative and a good Taylor approximation.
It also requires that we don't pick a really bad x_{0}. Also it requires that a solution actually exist; otherwise we might get all sorts of nonsense. Now back to modular arithmetic. Suppose we have the function f(x) = x^{3} + 2x + 1, and we want to find a solution to f(x) = 0 not in the real numbers, but in the integers modulo 13^{2}. We could try finding one directly, either by testing each possible number or by using the cubic formula, but let's take a slightly different tack. Suppose that we had a solution to f(x) = 0 mod 13^{2}. Then since 13 divides 13^{2}, x mod 13 should be a solution to f(x) = 0 mod 13. So let's find x mod 13. Modulo 13, we get that the only solution is x = 2. So now let's try to lift that to a solution modulo 13^{2}. We know x mod 13 is 2, so our actual solution modulo 13^{2} is going to be 2 + 13t for some t. Let's plug that into f and see what we get:

f(2 + 13t) = (2 + 13t)^{3} + 2(2 + 13t) + 1
= 8 + 3*2^{2}*13t + 3*2*(13t)^{2} + (13t)^{3} + 2*2 + 2*13t + 1
= 13 + 3*2^{2}*13t + 3*2*13^{2}t^{2} + 13^{3}t^{3} + 2*13t

Since we're working modulo 13^{2}, the t^{2} and t^{3} terms vanish, since both have coefficients that are divisible by 13^{2}. So we're left with

13 + 3*2^{2}*13t + 2*13t = 13 + (3*2^{2} + 2)*13t mod 13^{2}

3*2^{2} + 2 = 14, so we get 13 + 14*13t = 13 + 13t mod 13^{2}. So t = -1, and our actual solution is 2 - 13 = -11 = 158 mod 13^{2}. And you can check that this is an actual solution to f(x) = 0 mod 13^{2}. Note what happened: the terms with higher powers of t vanished due to the modulus, and we were left with a linear equation in t. Solving the linear equation allowed us to get from a solution mod 13 to a solution mod 13^{2}. And what linear equation did we end up with? Well, we were looking at solving

8 + 2*2 + 1 + (3*2^{2} + 2)*13t = 0

which we can read as f(2) + f '(2)*13t = 0. So we wanted that 13t = -f(2)/f '(2). That term should look somewhat familiar.
Indeed, it's the correction term from Newton's method. Our initial "approximate solution" was 2, and then we corrected by -13, which is indeed what we would get from computing -f(2)/f '(2) modulo 13^{2}. So why does this work? The answer is that we are doing the same thing that we did above: approximating our function by a linear one and solving that to get a more accurate answer. When we computed f(2 + 13t) and got rid of the terms that were divisible by 169, we got rid of everything divisible by t^{2}, and that gave us the linear approximation f(2) + f '(2)*13t. So what's the notion of accuracy at work here? The p-adic one! Recall that the p-adic absolute value of an integer is 1 over the highest power of p that divides that integer, and that the p-adic distance between two integers is 1 over the highest power of p that divides their difference. When we're working mod p, we only get accuracy to within 1/p, good enough for initial approximations. Then when we move to mod p^{2}, we're asking for accuracy within 1/p^{2}, and the correction term -f(x)/f '(x) gives us that extra accuracy. Note that this also works for moving from mod p^{2} to mod p^{3}: again we take the solution x_{0} that works mod p^{2}, and we look for a solution of f(x_{0} + p^{2}t) = 0 mod p^{3}, and we end up with f(x_{0}) + f '(x_{0})p^{2}t = 0 mod p^{3}, because all of the terms that involve higher powers of t also involve at least p^{3}. And so we get p^{2}t = -f(x_{0})/f '(x_{0}), and that gets us the solution mod p^{3}. So if we wanted to find a solution to x^{3} + 2x + 1 = 0 mod 13^{3}, we would compute 158 - f(158)/f '(158) = 158 - 1014/196 = 158 - 1014 = 1341 mod 13^{3}, where dividing by f '(158) = 196 means multiplying by its inverse modulo 13^{3} (and it happens that 1014/196 ≡ 1014 mod 13^{3}). You can check that 1341 does indeed solve f(x) = 0 modulo 13^{3}. And of course, finding a solution mod p^{k} is just repeating this process.
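The lifting step is mechanical enough to write down as code; here's a sketch in Python (using pow with exponent -1 for the modular inverse, so Python 3.8+):

```python
def f(x):
    return x ** 3 + 2 * x + 1

def fprime(x):
    return 3 * x ** 2 + 2

def hensel_lift(x, p, k):
    # Lift a root of f mod p^k to a root mod p^(k+1), assuming f'(x) is
    # invertible mod p (the "simple root" condition).
    mod = p ** (k + 1)
    return (x - f(x) * pow(fprime(x), -1, mod)) % mod

x = 2                      # root mod 13
x = hensel_lift(x, 13, 1)  # 158, root mod 13^2
x = hensel_lift(x, 13, 2)  # 1341, root mod 13^3
assert x == 1341 and f(1341) % 13 ** 3 == 0
```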
This is called Hensel's lemma, which also happens to tell us that for each solution x to f(x) = 0 mod p with f '(x) not equal to 0 mod p, there is exactly one corresponding solution mod p^{2}, mod p^{3}, mod p^{4} and so on, given by the corresponding correction terms. But it's really just Newton's method for p-adic integers. So that's how we can solve polynomials over composite moduli: first we factor the modulus into powers of distinct primes, then we solve the polynomial modulo each prime factor, then we use Hensel's lemma to turn solutions modulo primes into solutions modulo powers of primes, and then we use the Chinese Remainder Theorem to put those solutions together over the original modulus.
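For completeness, here's the tail end of that pipeline in Python for the original equation x^{3} + 2x + 1 = 0 mod 2873, taking the lifted root 158 mod 13^{2} from the computation above and finding the mod 17 root by brute force:

```python
def f(x):
    return x ** 3 + 2 * x + 1

# Roots modulo each prime-power factor of 2873 = 13^2 * 17.
r169 = 158                                          # from Hensel lifting
r17 = next(x for x in range(17) if f(x) % 17 == 0)  # brute force: x = 3

# Chinese Remainder Theorem: find x = r169 mod 169 with x = r17 mod 17.
x = (r169 + 169 * ((r17 - r169) * pow(169, -1, 17) % 17)) % 2873
assert f(x) % 2873 == 0   # x = 496 works
```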
I actually really like the Miller-Rabin test for primality. It's simple to implement, and it's easy to understand why it works with only a little bit of number theory. In contrast, the AKS test, while not awful to implement, is a much harder beast to prove correct. On the other hand, AKS is deterministically polynomial time, which is pretty good, unlike the Miller-Rabin test, which is only probabilistically polynomial.
Yeah, if the Generalized Riemann Hypothesis is true, we get that there's a deterministic, polytime version of the Miller-Rabin test called the Miller test, where you only have to check bases up to 2(log n)^{2}. GRH gives us that there's a set of elements that generates (Z/nZ)^{×}, all of which are smaller than 2(log n)^{2}, so since the strong liars are all contained in a proper subgroup, checking all the generators means we must at some point test something that's outside the subgroup and thus find a witness of compositeness. However, unlike the Miller-Rabin test, I don't think I understand how the stuff with the GRH works.
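For reference, here's a compact Miller-Rabin in Python (the probabilistic version with random bases; the Miller test would instead loop deterministically over bases up to 2(log n)^{2}):

```python
import random

def miller_rabin(n, rounds=20):
    # Returns False if n is definitely composite, True if n passed
    # `rounds` random-base tests (so is very probably prime).
    if n < 2:
        return False
    if n in (2, 3):
        return True
    if n % 2 == 0:
        return False
    # Write n - 1 = 2^r * d with d odd.
    d, r = n - 1, 0
    while d % 2 == 0:
        d //= 2
        r += 1
    for _ in range(rounds):
        a = random.randrange(2, n - 1)
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(r - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False  # a is a witness that n is composite
    return True
```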
Today's Math Fact: A signature in the logic sense is a set of function symbols and relation symbols, each with specified arity (number of inputs). A theory is a collection of sentences using those symbols and any other logical grammar necessary to create actual sentences, without any free variables. A model of a theory is a collection of sets together with a map from the relation symbols to relations on those sets and from the function symbols to functions on those sets, with the proper arities, such that the sentences in the theory are all satisfied. Sometimes we'll specify a theory just from a set of axioms and generate the rest via the rules of logical inference. The Löwenheim-Skolem theorem says that any countable first-order theory with an infinite model has a model of every infinite cardinality. For instance, first-order Peano arithmetic is a countable first-order theory, and it has an infinite model, namely the natural numbers that we're used to. So, by Löwenheim-Skolem, there is a model of the Peano axioms that is the same size as the real numbers. In the other direction, if we consider the set of real numbers and all the true statements we can make about them just in terms of +, *, 0 and 1, we get what is called the theory of real closed fields, and by Löwenheim-Skolem there is a model of this that is countable. The fun math fact is that Thoralf Skolem didn't think much of this result, because he didn't believe in uncountable sets, and so for him the theorem was a tautology; if there's only one infinite cardinal then obviously anything with a model of that size has a model of every infinite cardinal. Those familiar with set theory might be a bit confused at this point. Löwenheim-Skolem implies that ZF set theory, which is a countable theory that has an infinite model, must also have a countable model. But doesn't Cantor's diagonalization argument imply the existence of an uncountable set in ZF set theory? This is called Skolem's paradox.
Fortunately, Skolem, being a serious logician, did have a resolution, which was to be very, very careful about what it means to be countable. In particular, the axioms of ZF and Cantor diagonalization imply that there is some set which is infinite and does not have a 1-1 correspondence with the natural numbers. The trick is then to construct a system in which there are sets that have the cardinality of the natural numbers, at least if you're allowed to view the system from the outside, but for which any possible correspondence fails to be a set; since everything in a model of set theory has to be a set, we get that we don't have a 1-1 correspondence to the natural numbers. It's kind of like how we can "prove" the Gödel statement of a formal system, but only by going to a meta-system. So it turns out that "countable" and "uncountable" aren't absolute properties of a set, but are rather dependent on what model you're working in. Anyway, Skolem eventually went even further and became a finitist.
I keep thinking about writing up an introduction to homology/cohomology/homological algebra and I'm not entirely sure why. It's not even an area I'm super interested in beyond de Rham, group, and Lie algebra cohomology, which are all basically the same thing anyway.
Why do I ever bother to talk about anything other than combinatorial group theory? Consider the set of regular polygons. For each integer greater than 2, we have a regular polygon, and a unique one, ignoring translations and rotations and dilations and whatnot. Now let's go up a dimension and talk about regular solids, i.e. the Platonic solids. What do we mean by regular? We mean that all the faces are regular polygons, the faces are all identical, and all of the corners are identical as well. We suddenly get a drastic decrease in variety, in that there are only 5 Platonic solids. If we go up another dimension, we can ask about regular hypersolids, where by regular we mean that the 3-dimensional faces are regular polyhedrons that are all identical and the vertices are all identical. We end up with 6 regular hypersolids. If we go up another dimension, we can ask about regular hyper- nevermind. The general term is "n-polytope". A polygon is a 2-polytope, a solid is a 3-polytope, a hypersolid is a 4-polytope, etc. So we can ask about regular 5-polytopes. How many are there? What about regular 6-polytopes? The answer ties back to the Dynkin Diagrams I talked about pretty far back in this thread. Like, page 4 or 5 or something. Anyway, a refresher:

A_{n} looks like a row of n dots, each connected to the next by a line marked 3.

BC_{n} looks like A_{n} except the last line is marked with a 4 instead of a 3.

D_{n} looks like a row of n-2 dots, each connected to the next by a line marked 3, and then two more dots, each connected by a line marked 3 to the last dot in the row.

E_{n} looks like a row of n-1 dots, each connected to the next by a line marked 3, and then another dot connected to the third dot in the row by a line marked 3. We have to restrict n to be at least 6 and at most 8.

F_{4} looks like A_{4} except the middle line is marked with a 4 instead of a 3.

G_{2} looks like 2 dots connected by a line marked 6.
H_{n} looks like A_{n} except the last line is marked with a 5 instead of a 3. We have to restrict n to be at least 2 and at most 4.

I_{2}(k) looks like two dots connected by a line marked k. We have to restrict to k at least 3.

You might note that this list is kind of redundant in places. A_{2} is I_{2}(3), A_{3} is D_{3}, BC_{2} is I_{2}(4), G_{2} is I_{2}(6), and H_{2} is I_{2}(5). It turns out, for reasons I'll get into later, that the diagrams that correspond to regular n-polytopes are the ones that are just rows of n dots. We call these "linear" Dynkin diagrams. So not the D_{n} or E_{n}. Also, because of the redundancies, we'll only look at A_{n}, BC_{n} and H_{n} for n > 2, and ignore G_{2} in favor of I_{2}(k). In fact, regular polytopes correspond to the endpoints of the linear Dynkin diagrams, so if the two endpoints of the Dynkin diagram look the same, then they yield the same polytope. So for A_{n}, the diagram is symmetric, and so we only get one polytope from it. For n = 2 we get a triangle, and for n = 3 we get a tetrahedron, and for higher n we get what is called a simplex, which you get by taking n+1 equidistant points in R^{n} and connecting all pairs by lines and then filling in all triples by triangles and all quartets by tetrahedrons, etc. For BC_{n}, the endpoint connected to the next dot by a line marked 4 yields a cube or hypercube of the appropriate dimension. The other endpoint yields what is called an "orthoplex", which is the dual of the hypercube in the same way that the octahedron is the dual of the cube: replace the n-1 dimensional faces by points and the n-2 dimensional faces by lines and so on. For BC_{4}, we get the usual hypercube and what is called the 16-cell, which is made of tetrahedrons. This also explains why, for instance, the two endpoints of A_{3} don't yield distinct polyhedra, because the tetrahedron is its own dual. For F_{4} we get a 4-dimensional object called a 24-cell, made of octahedrons.
For H_{3}, the endpoint connected to the next dot by a line marked 5 yields a dodecahedron, and the other endpoint yields the icosahedron. For H_{4}, the endpoint connected by a line marked 5 yields the 120-cell, made of dodecahedrons, and the other gives the 600-cell, made of tetrahedrons. For I_{2}(k), you can probably guess that it yields a k-gon; the diagram is again symmetric, so it doesn't matter which endpoint we pick. Counting gives a few numbers: for n = 2, we have an infinite set, because we can let k be any number greater than 2. For n = 3, we have one polyhedron from A_{3}, two from BC_{3}, and two from H_{3}, for a total of 5 regular solids. For n = 4, we have one from A_{4}, two from BC_{4}, one from F_{4} and two from H_{4}, for a total of 6 regular hypersolids. For higher n, we have one from A_{n}, two from BC_{n}, and then no more, because we've run out of Fs and Hs. So for n greater than 4, there are only 3 regular n-polytopes. So our pattern is infinity, 5, 6, 3, 3, 3,.... A certain type of person now has a few questions. Why this correspondence? What do the other dots in the linear Dynkin diagrams mean? What do the non-linear Dynkin diagrams mean? I'll cover that in another post, because it gets kind of complicated.
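The tally at the end can be written out as a tiny function (just encoding the case analysis above, nothing deeper):

```python
def regular_polytopes(n):
    # Count regular n-polytopes from the linear diagram families, for n >= 3.
    count = 1          # A_n: symmetric diagram, one polytope (the simplex)
    count += 2         # BC_n: the hypercube and its dual, the orthoplex
    if n == 3:
        count += 2     # H_3: dodecahedron and icosahedron
    if n == 4:
        count += 1     # F_4: the self-dual 24-cell
        count += 2     # H_4: 120-cell and 600-cell
    return count

assert [regular_polytopes(n) for n in range(3, 8)] == [5, 6, 3, 3, 3]
```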
A flag in mathematics, namely in algebraic geometry and combinatorics, is a bunch of things that are contained in other things. Let's give an example: take a cube. A flag for a cube might be a vertex, and an edge containing that vertex, and a face containing that edge. There are 48 such flags for a cube: there are eight vertices, each vertex is contained in three edges, and each edge is contained in two faces. Notably, the cube also has 48 symmetries, in the form of rotations and reflections, because given a vertex you can put it into any of 8 places, and once you've done that you can pick an edge from that vertex and put it in any of three positions, and once you've done that you can take a face containing that edge and put it in one of two positions. And indeed, given two flags there is a unique symmetry of the cube that moves one flag to the other. So the idea that we have here is that the set of flags corresponds to the set of symmetries of the object in question. So let's look at an icosahedron. What does a flag look like here? It looks like a vertex contained in an edge contained in a triangular face. How many flags are there? There are 12 vertices, five edges off of each vertex, and two faces off of each edge, so there are a total of 120 flags. And there are 120 symmetries of the icosahedron. A flag doesn't have to look like a 0-dimensional thing contained in a 1-dimensional thing in a 2-dimensional thing. It could look like a 0-dimensional thing contained in a 2-dimensional thing; we call this a partial flag because it doesn't have a 1-dimensional piece. It could, however, look like a 0-dimensional thing contained in a 2-dimensional thing and also contained in a different 2-dimensional thing.
For instance, consider the cuboctahedron; you get this by taking a cube and cutting off all of the corners so that the square faces become smaller square faces; you end up with a shape that has 6 square faces and 8 triangular faces; now each vertex is contained in four edges, and each edge is contained in a single triangular face and a single square face. So a flag could be a vertex and an edge containing the vertex and a square face containing the edge and a triangular face containing the edge. But now we have too much information; if we only knew the square and the triangle, we could figure out the edge. So we can make a flag that's just the vertex, the square and the triangle, and the edge is implied. How many such flags does the cuboctahedron have? Well, pick a vertex; there are 12 vertices. Pick a square containing that vertex; there are two such squares. And now pick a triangle that also contains that vertex and shares an edge with the square; there are two such triangles. So we get 12*2*2 = 48, same as the cube, which is probably to be expected since we got the cuboctahedron by slicing bits off of the cube. Now let's look at some diagrams. First let's look at the diagram A_{0}. This is a row of no dots, and I'm going to say that it corresponds to a vertex, being 0-dimensional. It has one flag, containing just the vertex. We'll say that #A_{0} = 1. Let's look at the diagram A_{1}, which looks like a single dot, *. I'm going to say that it corresponds to an edge, being 1-dimensional. An edge has two flags, one for each vertex. We'll say that #A_{1} = 2. Let's look at the diagram A_{2}, which looks like two dots in a row connected by a line marked 3. I'm going to say that it corresponds to a triangle, and has six flags. We'll say that #A_{2} = 6. Let's also look at the diagram BC_{2}, which looks like two dots in a row, connected by a line marked 4. I'm going to say that it corresponds to a square, and has eight flags. We'll say that #BC_{2} = 8.
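As an aside, the flag counts mentioned so far are just short products:

```python
# Flags: a vertex, an edge containing it, and a face containing the edge
# (for the cuboctahedron: vertex, square, triangle, with the edge implied).
cube_flags = 8 * 3 * 2            # 8 vertices, 3 edges each, 2 faces per edge
icosahedron_flags = 12 * 5 * 2    # 12 vertices, 5 edges each, 2 faces per edge
cuboctahedron_flags = 12 * 2 * 2  # 12 vertices, 2 squares each, 2 triangles each
assert (cube_flags, icosahedron_flags, cuboctahedron_flags) == (48, 120, 48)
```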
Now let's look at BC_{3}, which looks like three dots in a row, the first two connected by a line marked 4 and the last two connected by a line marked 3. Now consider the last dot. If we delete it we get a diagram that looks like BC_{2}, which I said corresponded to a square. If instead we deleted the second and third dots, we're left with A_{1}, which is an edge, and if we delete all three dots we get A_{0}, which is a vertex. So our flags for the cube are what we get by deleting dots and looking at the remaining diagrams. What if we flip it around? If we delete the first dot only, we get the diagram A_{2}, corresponding to a triangle; if we delete the first and second dots we get A_{1}, for an edge, and if we delete all three dots we get A_{0}, for a vertex. And these are the flags for an octahedron. But wait, you say, the icosahedron also had flags that were made up of a vertex contained in an edge contained in a triangle, so why should we be so confident that we got an octahedron this time around? Because we haven't used all the information in the diagram yet! So consider our diagram BC_{3} again. We said that deleting dots from the right got us pieces of the flag for a cube. What does deleting dots from the left tell us about the cube? Well, suppose we deleted the first dot. Then we get the diagram A_{2}, i.e. the triangle. Notice that if you take a cube and consider a vertex, that vertex is contained in 3 edges and each of those edges is contained in 2 faces, so that vertex is contained in six flags. 6 = #A_{2}. In fact, recalling the relation between flags and symmetries, we get that there are 6 symmetries of the cube that keep a particular vertex in place. We call this the "stabilizer" of the vertex. Now suppose you delete the first and second dots from BC_{3}. You end up with A_{1}, and #A_{1} = 2. Only now instead of two vertices contained in the single edge, we have two faces containing the single edge. 
Similarly, for each edge there are 2 symmetries of the cube that keep all the points of the edge in place. And finally, deleting all three dots yields A_{0}, where #A_{0} = 1, and indeed, a face is contained in one...cube. I guess? More importantly, if we insist on keeping all of the points of a face in place, we can't actually move the cube at all.

So for the cube we assign the left-most dot to the vertices, because deleting it and everything to the right of it yields A_{0}, and then the dots to the right of the first dot tell us that there are 6 symmetries of the cube that will keep the vertex in place. The dot in the middle corresponds to the edges, since the dots to the left of it yield A_{1}, and the dot to the right tells us that there are two symmetries of the cube that fix all of the points of the edge. The dot on the right corresponds to the faces, since the dots to the left of it give us a BC_{2}, i.e. a square, and the lack of dots to the right of it tells us that keeping the face in place locks the entire cube in place.

Now let's look at the octahedron. We'll flip our diagram over so that the 3 is between the left two dots and the 4 is between the right two dots. Our leftmost dot corresponds to the vertices since there's nothing to the left of it, and the dots to the right form a BC_{2}. So there are eight symmetries of the shape that keep the vertex in place; this is how we determine that we're getting an octahedron, since for an icosahedron there would be 10 symmetries.

Now what happens when we instead fold the diagram over so that the middle dot is to the left and the other two dots are to the right? What do we get? Well, now the left-most dot is still the vertex dot, but the dots to the right of it are two disconnected dots. So we get an A_{1} and another A_{1} of symmetries, i.e. 2*2 symmetries. Okay, what about the other dots?
Well, for the dot connected to the left-most dot by a 3, the rest of the dots yield a BC_{2}, so it corresponds to a square, and there are no dots to the right of it, so no symmetries leave it unmoved. For the dot connected to the left-most dot by a 4, the rest of the dots yield an A_{2}, so it corresponds to a triangle, and again there are no dots to the right of it, so no symmetries leave it unmoved. So we have that each vertex is contained in a square and in a triangle. Moreover, the A_{1} + A_{1} set of symmetries that fixes a vertex indicates that each vertex must be contained in two squares and two triangles. This should sound kind of familiar: it's the cuboctahedron! The symmetries that leave a vertex fixed swap the two squares, or swap the two triangles, or swap the squares and swap the triangles, or leave everything in place.

So you see what happens when we pick something other than an endpoint of the diagram to be the vertex point: we end up with something that has multiple types of faces, because we have multiple types of right-most dots. We'd also get similar results if we looked at diagrams that weren't linear: again we'd get multiple right-most dots, and hence our shape wouldn't be regular.

Except! What if the two right-most dots gave the same shape when deleted? Let's look at A_{3}, which is three dots in a row connected by lines marked 3. Let's fold the diagram so that the middle dot is on the left and the other two dots are on the right. Now the two dots on the right both yield A_{2} when deleted, i.e. triangles. Deleting the dot on the left yields A_{1}+A_{1} again. So now we have that each vertex is contained in two triangles, and also in two other triangles. Sound familiar? That's an octahedron. But now something slightly odd is going on, because according to the diagram, we no longer have a symmetry that will fix a vertex in place and rotate the octahedron by 90 degrees; there is no such motion in A_{1}+A_{1}.
So now the eight triangular faces of the octahedron have been sundered into two sets of four. Identical sets, yes, and if you held an octahedron in your hand you'd be able to swap one set for the other, but according to the diagram you can't. So that's what happens when the two right-most dots give the same shape: they tell you the shapes of the faces, but they lie about the symmetries because they think that you can't swap the faces even when the faces are the same shape. So that's why the regular polytopes all correspond to endpoints of the linear Dynkin diagrams: from a Dynkin diagram you can read off a polytope by picking one dot to be the vertex dot, and if you don't pick an endpoint of a linear Dynkin diagram you either get a shape whose faces are different shapes or the diagram lies to you about how symmetrical the polytope is.
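The stabilizer counts that BC_{3} predicted for the cube (6 for a vertex, 2 for the points of an edge, 1 for the points of a face) can also be checked by brute force: the cube's 48 symmetries are the signed permutations of the coordinates. A minimal Python sketch; the particular vertex, edge, and face chosen are arbitrary:

```python
from itertools import permutations, product

# The 48 symmetries of the cube [-1,1]^3: permute coordinates, then flip signs.
syms = [(perm, signs)
        for perm in permutations(range(3))
        for signs in product((-1, 1), repeat=3)]

def apply(sym, v):
    perm, signs = sym
    return tuple(signs[i] * v[perm[i]] for i in range(3))

# Stabilizer of a vertex: symmetries fixing (1,1,1).
vert_stab = [s for s in syms if apply(s, (1, 1, 1)) == (1, 1, 1)]

# Pointwise stabilizer of an edge: symmetries fixing both endpoints.
edge = [(1, 1, 1), (1, 1, -1)]
edge_stab = [s for s in syms if all(apply(s, v) == v for v in edge)]

# Pointwise stabilizer of a face: symmetries fixing all four corners of the x=1 face.
face = [(1, 1, 1), (1, 1, -1), (1, -1, 1), (1, -1, -1)]
face_stab = [s for s in syms if all(apply(s, v) == v for v in face)]

print(len(syms), len(vert_stab), len(edge_stab), len(face_stab))  # 48 6 2 1
```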
"1+1 = 2?" During grad school, I was trying to explain homotopy type theory to some friends of mine in the department lounge, and, as is the habit of many people who give impromptu lectures, was writing somewhat uninformative, contextless tidbits on the board. The most clear one was the question: "1 + 1 = 2?" And of course, this is immediately noticed by people who were in the lounge but hadn't been listening to the conversation, and who I like to think normally respect me as a mathematician. So what the hell was I talking about? Consider a pair of sets A and B. If we are only allowed to talk about maps to and from our sets but aren't allowed to talk about individual elements, how do we define the disjoint union of the sets? Now consider the disjoint union of A and B, which we'll denote by A+B. We have a function i_{l}: A -> A+B, and a function i_{r}: B -> A+B, the usual inclusion maps. Moreover, for any set X and any functions f: A -> X and g: B -> X, we can uniquely define (f+g): A+B -> X such that (f+g)i_{l} = f (f+g)i_{r} = g So the disjoint union allows us to combine functions with the same range set. If we were allowed to talk about elements, all of this follows by looking at an element of A+B and asking "Is it from A? Then do the function from A to the set. Is it from B? Then do the function from B to the set." But if we can't talk about elements, then this will have to do. Note that the number of elements in A+B is the number of elements in A plus the number of elements in B. Well, wait, we said that we aren't allowed to talk about elements. Fine. We call a set T "terminal" if for any set S, there is exactly one function from S to T. Notably, all one-element sets are terminal, and indeed, the only terminal sets have exactly one element. Moreover, as sets, all the terminal sets are isomorphic, because there's exactly one function between any two terminal sets, and that function has to be invertible because there's exactly one function going the other way. 
So we have a notion of 1, as any terminal set. We also can say that 1+1 has two elements, in the sense that there are two functions from 1 to 1+1, which, if we were allowed to talk about elements, would send the single element of 1 to one of the elements of 1+1, or to the other element. While we're here, we can also define "0" as an initial object. We say that a set I is "initial" if for any set S, there is exactly one function from I to S. It turns out that the set I has to be the empty set, so "0" is a pretty good moniker for it.

Okay, let's abstract a bit, to categories. Instead of talking about sets, we'll talk about more generic "objects", and instead of functions, we'll talk about more generic "morphisms". We can still talk about terminal objects, in that there is always exactly one morphism from any object to a terminal object. We can talk about initial objects, in that there is exactly one morphism from an initial object to any object. And we can talk about "disjoint unions", except we call them coproducts, in the following sense: Suppose that we have two objects A and B. Then we define A+B to be an object such that there are morphisms i_{l}: A -> A+B and i_{r}: B -> A+B and such that for any object X and any pair of morphisms f: A -> X and g: B -> X, we get a unique morphism (f+g): A+B -> X such that

(f+g)i_{l} = f
(f+g)i_{r} = g

While we're here, we can also define the product. It looks like the coproduct with all of the arrows backwards, like so: Suppose that we have two objects A and B. Then we define A*B to be an object such that there are morphisms p_{l}: A*B -> A and p_{r}: A*B -> B and such that for any object X and any pair of morphisms f: X -> A and g: X -> B, we get a unique morphism (f*g): X -> A*B such that

p_{l}(f*g) = f
p_{r}(f*g) = g

And note that for sets, the cardinality of A*B is the cardinality of A times the cardinality of B.
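In Set, the product side of this is just as easy to implement directly as the coproduct side: A*B is the set of pairs, the projections pick out the coordinates, and (f*g) runs both functions. A Python sketch; the sample sets and maps are arbitrary:

```python
from itertools import product as cartesian

def p_l(pair): return pair[0]   # p_l: A*B -> A
def p_r(pair): return pair[1]   # p_r: A*B -> B

def pairing(f, g):
    """Given f: X -> A and g: X -> B, the unique (f*g): X -> A*B
    with p_l(f*g) = f and p_r(f*g) = g."""
    return lambda x: (f(x), g(x))

A, B = {1, 2, 3}, {"a", "b"}
AxB = set(cartesian(A, B))
print(len(AxB))  # 6: |A*B| = |A| * |B|

X = {0, 1, 2}
f = lambda x: x + 1                       # f: X -> A
g = lambda x: "a" if x % 2 == 0 else "b"  # g: X -> B
fg = pairing(f, g)
for x in X:
    assert p_l(fg(x)) == f(x) and p_r(fg(x)) == g(x)
```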
For certain categories, we also have a notion of exponentiation: A^{B}, by which we mean all of the morphisms from B to A. Note that if A and B are sets, then the cardinality of A^{B} is exactly the cardinality of A raised to the cardinality of B. Suppose we have a category where products, coproducts and exponentials are always defined. Then we automatically get initial and terminal objects as well, which we'll denote by 0 and 1. All of these things obey what Tarski called the "high-school algebra" axioms:

x+y = y+x
(x+y)+z = x+(y+z)
x*1 = x
x*y = y*x
(x*y)*z = x*(y*z)
x*(y+z) = (x*y)+(x*z)
1^{x} = 1
x^{1} = x
x^{y+z} = x^{y}*x^{z}
(x*y)^{z} = x^{z}*y^{z}
(x^{y})^{z} = x^{y*z}

and indeed, you might recognize these from high school. Note that we might need to replace those equalities with isomorphisms, but eh, close enough. We also have a notion of "how many points does an object X have", again by saying "how many morphisms are there from a terminal object to X?"

Anyway, so we've defined a notion of 1 in a more general setting, and 1+1 in a more general setting, and the notion of number of points in a more general setting. Of course, all of this can be demonstrated with a simple example: Suppose our objects are pairs of sets, and our morphisms are pairs of functions. Then our terminal object looks like (1,1), and we have that (1,1)+(1,1) = (1+1,1+1). So far so good, right? Except now we have that there are four morphisms from (1,1) to (1+1,1+1). Oops! So it turns out that the axioms given above, familiar from high-school algebra, are insufficient to actually tell you the size of 1+1.
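The pairs-of-sets example is small enough to enumerate completely: in Set, 1+1 has two points, while in Set x Set, where a morphism is a pair of functions, the object (1,1)+(1,1) = (1+1,1+1) has four. A quick Python check:

```python
from itertools import product

def functions(dom, cod):
    """All functions dom -> cod, represented as dicts."""
    return [dict(zip(dom, img)) for img in product(cod, repeat=len(dom))]

one = [0]      # a terminal set: one element
two = [0, 1]   # 1+1 in Set

# In Set, 1+1 has two points (two functions 1 -> 1+1):
print(len(functions(one, two)))  # 2

# In Set x Set, the terminal object is (1,1), and a morphism
# (1,1) -> (2,2) is a PAIR of functions 1 -> 2, so:
points = [(f, g) for f in functions(one, two) for g in functions(one, two)]
print(len(points))  # 4, not 2
```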
My favorite fact about harmonic polynomials.

Suppose you have a function P in n variables x_{1}, ..., x_{n}. We define the Laplacian Δ as the operator that sends P to

Δ(P) = ∂^{2}P/∂x_{1}^{2} + ... + ∂^{2}P/∂x_{n}^{2}

In other words, you take the second derivative of P in each direction, and add them together. The Laplacian is a particularly interesting operator for a number of reasons. It shows up all over the place in differential equations, since it gives us a nice way to talk about wave-like behavior or about diffusion of heat or particles or generally any case where the behavior of an individual is described by averaging the behavior of its neighbors. But as always I don't care about analysis, I want to talk about algebra.

We say that P is harmonic if Δ(P) = 0. My favorite fact about harmonic functions goes as follows: Let C be the polynomial x_{1}^{2} + ... + x_{n}^{2}. Then for any twice-differentiable real-valued function* P in n variables, there is a unique way to write P as a (possibly infinite) series of terms C^{k}P_{k} where each P_{k} is harmonic.

So for instance, let's suppose that n = 3, so that

C = x_{1}^{2} + x_{2}^{2} + x_{3}^{2}

Suppose that

P = 3x_{1}^{4} - 4x_{1}^{2}x_{2}^{2} + 3x_{1}^{2}x_{3}^{2} + x_{2}^{4} + x_{2}^{2}x_{3}^{2} + x_{3}^{4}

Then we have that P = P_{0} + C·P_{1} + C^{2}·P_{2}, where

P_{0} = x_{1}^{4} - 6x_{1}^{2}x_{2}^{2} + x_{2}^{4}
P_{1} = x_{1}^{2} - x_{2}^{2}
P_{2} = 1

and you can check that each of these is harmonic.

So why do I like this fact? Because I'm a representation theorist, and this is about representations of the orthogonal group SO(n). If we take our space R^{n} and apply a rotation to it, then we can ask how this rotation affects our functions P and C and how it affects our operator Δ.
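The example decomposition is easy to verify numerically: the claim P = P_0 + C·P_1 + C^2·P_2 is an identity of polynomials, so the two sides should agree at every point. A quick Python check at random points (each P_k is harmonic by hand, e.g. ΔP_0 = (12x_1^2 - 12x_2^2) + (-12x_1^2 + 12x_2^2) + 0 = 0):

```python
import math
import random

def C(x1, x2, x3):
    return x1**2 + x2**2 + x3**2

def P(x1, x2, x3):
    return (3*x1**4 - 4*x1**2*x2**2 + 3*x1**2*x3**2
            + x2**4 + x2**2*x3**2 + x3**4)

# The harmonic pieces from the worked example:
def P0(x1, x2, x3): return x1**4 - 6*x1**2*x2**2 + x2**4
def P1(x1, x2, x3): return x1**2 - x2**2
def P2(x1, x2, x3): return 1

# Check P = P0 + C*P1 + C^2*P2 at random points.
for _ in range(100):
    x = [random.uniform(-2, 2) for _ in range(3)]
    lhs = P(*x)
    rhs = P0(*x) + C(*x) * P1(*x) + C(*x)**2 * P2(*x)
    assert math.isclose(lhs, rhs, rel_tol=1e-9, abs_tol=1e-9)
print("decomposition checks out")
```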
It turns out that any rotation will yield the same C, because C is just the squared Euclidean distance, which our rotations preserve by definition, and any rotation will also preserve Δ, because we can get Δ by taking C and replacing the variables with the corresponding partial derivatives. So if we write R for the rotation, then applying R to P and then applying Δ is the same as applying Δ to P and then applying R:

Δ(R(P)) = R(Δ(P))

In particular, if P is harmonic, then so is R(P) for any rotation R. So this fact about harmonic functions is particularly nice since it allows us to write any function in terms of an invariant C and harmonic functions, which themselves are defined by a rotationally-invariant operator. Indeed, this allows us to write functions on spheres in a particularly useful way, because the unit sphere in R^{n} is what we get by setting C to 1. Conversely, if we have a function that we've only defined on the unit sphere, we can extend it to a harmonic function on all of R^{n} in a unique way, by extending it in any way at all, writing the result in terms of C and harmonic functions, and then setting all instances of C in the expression to 1.

There's some other fun you can have with certain other groups, where by "certain other" I mean Coxeter groups and simple Lie groups, because I never talk about any other groups (I really ought to talk about other groups at some point).

* You might need the function to be analytic. I'm not actually sure. I know that it's true for polynomials and therefore true for analytic functions, but I am not sure for more general cases.
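As a quick numerical sanity check of the claim that rotations preserve harmonicity: take the harmonic P_0 from the example above, rotate it by an arbitrary angle, and evaluate a finite-difference Laplacian at an arbitrary point. Both should come out approximately zero. A Python sketch (the angle, plane of rotation, and sample point are all arbitrary choices):

```python
import math

def laplacian(f, p, h=1e-3):
    """Finite-difference approximation of the Laplacian of f at point p."""
    val = f(*p)
    total = 0.0
    for i in range(len(p)):
        q1, q2 = list(p), list(p)
        q1[i] += h
        q2[i] -= h
        total += (f(*q1) - 2 * val + f(*q2)) / h**2
    return total

def P0(x1, x2, x3):
    return x1**4 - 6*x1**2*x2**2 + x2**4   # harmonic

# R(P0): rotate by an angle theta in the (x1, x3)-plane, then apply P0.
theta = 0.7
def RP0(x1, x2, x3):
    c, s = math.cos(theta), math.sin(theta)
    return P0(c*x1 - s*x3, x2, s*x1 + c*x3)

print(laplacian(P0, (0.5, -0.3, 0.8)))   # ≈ 0
print(laplacian(RP0, (0.5, -0.3, 0.8)))  # ≈ 0: R(P0) is still harmonic
```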
deRham cohomology is best cohomology (fight me)

There's an old theorem of Poincare that says that if you have a vector field v defined everywhere on the plane, and the curl of v is 0 everywhere, then there is some function f such that v is the gradient of f. Moreover, if you have a region R of the plane that doesn't have any holes in it, and v is defined everywhere on R and the curl of v is 0 everywhere on R, then there is some function f defined on R such that v is the gradient of f. The converse is pretty clear just by writing out definitions and expanding terms and being careful with your bookkeeping.

The part about R not having any holes in it is pretty important. Consider the region that's the entire plane except the origin. Let v be the vector field given by

v(x, y) = (-y/(x^{2}+y^{2}), x/(x^{2}+y^{2}))

v isn't defined at the origin, but that's outside our region. If I've gotten my numbers right, then we have that the curl of v is 0 everywhere in our region. But you can check that v is not the gradient of any function!

Suppose that v is in fact the gradient of a function f. If you start at a point in the plane, and evaluate f there, you get some value f(x,y). And just like regular integration will tell you how f changes as you move in terms of its derivative, the integral along a curve will tell you how f changes as you move around in the plane in terms of the gradient:

f(end) - f(start) = ∫_{C} grad(f)·t ds

where t is the unit tangent vector to the curve C at the point, and ds is your arclength element. In particular, if you start and end at the same point, i.e. your curve is a loop, then integrating the gradient around the loop should yield no change.

So let's look at v on the unit circle. For (x,y) on the unit circle, we get v(x,y) = (-y, x). Also, our tangent vector to the unit circle is (-y, x). So

∫_{C} v·t ds = ∫_{C} (y^{2}+x^{2}) ds = ∫_{C} 1 ds = 2π

which is not 0! So we get that v can't be the gradient of a function!
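Here's the same computation done numerically, approximating the loop integral by a Riemann sum over the parametrization (cos t, sin t) of the unit circle; the step count is arbitrary:

```python
import math

def v(x, y):
    r2 = x * x + y * y
    return (-y / r2, x / r2)

# Integrate v·t around the unit circle, parametrized as (cos t, sin t),
# with unit tangent (-sin t, cos t) and ds = dt.
N = 10000
total = 0.0
for k in range(N):
    t = 2 * math.pi * k / N
    x, y = math.cos(t), math.sin(t)
    vx, vy = v(x, y)
    total += (vx * (-math.sin(t)) + vy * math.cos(t)) * (2 * math.pi / N)

print(total)  # ≈ 2π, not 0, so v is not a gradient on the punctured plane
```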
And indeed, there is a sense in which this can be used to detect holes. If you have a region R in the plane and someone hands you a vector field defined on R whose curl is 0 but which isn't the gradient of a function on R, then you know by Poincare's theorem that your region has to have a hole. If someone hands you a loop and the integral of the vector field around that loop isn't 0, then you know that the vector field isn't the gradient of a function. In fact, in order to tell you anything interesting, the loop has to go around a hole! Otherwise we can just apply Poincare's theorem to the region inside the loop, and so the integral around the loop would have to be 0.

So let's pull away from individual vector fields and functions, and look at a general statement: Let R be a connected region on the plane and consider all of the vector fields on our region R that have vanishing curl. Call this set Z^{1}(R). Note that it's a vector space, since the curl is a linear operation on vector fields. It's a very big (infinite-dimensional, in fact) vector space, but we're not interested in it on its own. Let B^{1}(R) be the vector space of all vector fields that are gradients of functions on R. Again, a very big vector space. Also, it's a subspace of Z^{1}(R), since the curl of a gradient is always 0.

The question is: is there anything in Z^{1}(R) that isn't in B^{1}(R)? Alternatively, is H^{1}(R) = Z^{1}(R)/B^{1}(R) nontrivial as a vector space? If R has a hole in it, then H^{1}(R) does have something nontrivial in it, and thus is at least one dimensional.

Claim: The dimension of H^{1}(R) is the number of holes in R.

Now let's boost by a dimension. Here's where the fun starts. Suppose you have a region in 3-dimensional space. We can define functions and vector fields in this region. Consider a vector field v.
We can again define the curl of v, and again we get the statement that if our region has no holes in it, then every vector field whose curl is 0 everywhere is the gradient of a function. And again we can say: define Z^{1}(R) as the space of vector fields with vanishing curl, and B^{1}(R) as the space of vector fields that are gradients of functions, and define

H^{1}(R) = Z^{1}(R)/B^{1}(R)

And if H^{1}(R) is nontrivial, then we have a hole. But not just missing isolated points or pockets. No, this indicates a tunnel that goes all the way through our space, like the hole of a donut. H^{1} asks what happens when we take a rubber band around our supposed hole and let it go: for an empty pocket the rubber band will just slide around it and shrink to nothing, but around a tunnel that goes all the way through the space, the band can't avoid the tunnel.

But what about empty pockets? How do we detect them? Well, suppose you have a vector field v whose divergence is 0 everywhere. If our region has no holes in it, then every vector field whose divergence is everywhere 0 is the curl of a vector field. So we define Z^{2}(R) as the space of vector fields with vanishing divergence, and B^{2}(R) as the space of vector fields that are curls of vector fields, and define

H^{2}(R) = Z^{2}(R)/B^{2}(R)

And now the dimension of H^{2}(R) counts pockets rather than tunnels. H^{2} asks: if you have a balloon and you let it shrink, can it shrink to nothing, or will it get caught on a patch of missing space?

This is the beginning of what is called deRham cohomology, which is the nicest cohomology but only works on spaces that have enough derivatives defined on them. But we have this idea that we can count holes of various types by taking some operations z and b such that zb = 0, and looking at the space Z of things that z sends to 0, and the space B of things that are the result of applying b to stuff, and looking at H = Z/B.
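The "zb = 0" pattern is easy to check numerically in the 3-dimensional picture: the divergence of a curl always vanishes, which is exactly why B^{2}(R) sits inside Z^{2}(R). A finite-difference sketch in Python; the sample vector field and evaluation point are arbitrary:

```python
def partial(F, i, j, p, h=1e-4):
    """Central-difference ∂F_i/∂x_j at the point p, for F: R^3 -> R^3."""
    q1, q2 = list(p), list(p)
    q1[j] += h
    q2[j] -= h
    return (F(*q1)[i] - F(*q2)[i]) / (2 * h)

def curl(F):
    return lambda x, y, z: (
        partial(F, 2, 1, (x, y, z)) - partial(F, 1, 2, (x, y, z)),
        partial(F, 0, 2, (x, y, z)) - partial(F, 2, 0, (x, y, z)),
        partial(F, 1, 0, (x, y, z)) - partial(F, 0, 1, (x, y, z)),
    )

def div(F, p):
    return sum(partial(F, i, i, p) for i in range(3))

F = lambda x, y, z: (x * y, y * z, z * x)   # an arbitrary smooth vector field
d = div(curl(F), (0.3, -0.7, 1.1))
print(d)  # ≈ 0: div(curl F) = 0
```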
If we're a bit more careful about what kinds of derivative-like operators we're talking about and switch from vector fields to what are called "differential forms", we can extend this idea to any number of dimensions. If we have a region of four-dimensional space, what kind of holes can we have, and what should we compute to detect these holes? But that requires a bit more machinery. We can even extend it to things that aren't subsets of R^{n}. We just need appropriate notions of derivative and differential form. On the other hand, because Z and B are so large, I guess it's not great if you want to actually compute stuff. But who cares about that kind of thing?
I'm starting to appreciate generating functions more. I mean, I already liked the idea, but now that I'm occasionally using them to actually do stuff I'm really seeing how powerful and elegant they can be.
I just remembered that we can tie the harmonic function stuff and the deRham cohomology stuff together: there is a natural way to extend the Laplacian Δ to act on differential forms, and using that extension we get that for each w in H^{n}(R) there is a unique element z in Z^{n}(R) such that Δ(z) = 0 and w is the image of z under the natural quotient map []: Z^{n}(R) -> H^{n}(R).

The interaction of the harmonic structure and the differential structure is kind of interesting. For instance, if you have an element a in Z^{m}(R) and an element b in Z^{n}(R), then ab is an element of Z^{m+n}(R), but it is not necessarily the case that the product of two harmonic things is harmonic. There is a harmonic differential form that corresponds to the product, since we can take [ab] in H^{m+n}(R) and ask for the corresponding harmonic element of Z^{m+n}(R), but it probably isn't going to be ab itself, even if a and b are harmonic.