Complex numbers and branched covers: If you don't want to think about the math, just watch the video at the end. It's a spherical video, so that means you can click and drag the viewpoint around. Please do so, and don't be afraid to turn all the way around several times. [math] Consider the complex numbers, a + ib where a and b are real numbers and i squares to -1. One of the fun statements about the complex numbers that is not true about the real numbers is that for any polynomial P(z) = c_{n}z^{n} + c_{n-1}z^{n-1} + ... + c_{1}z + c_{0} where n is not equal to 0 and c_{n} is not 0, i.e. for any polynomial that isn't a constant function, there is some complex number a + bi such that P(a+bi) = 0. This is often called the fundamental theorem of algebra. Indeed, for most polynomials there are n distinct complex numbers a+bi such that P(a+bi) = 0. This is very different from the real case, where, for instance, x^2 + 1 = 0 has no solutions in the real numbers; for any real number x, x^2 is at least 0, so x^2 + 1 is at least 1 and hence can't equal 0.

Now consider the function f_{c}(z) = z^{2} - c. If f_{c}(z) = 0, that tells us that z^{2} = c. Unlike the real case, for the complex numbers there is always at least one solution. Furthermore, if c is not equal to 0, we get two distinct solutions, since if z^{2} = c, then (-z)^{2} = z^{2} = c, so solutions come in pairs if z is not equal to -z, i.e. if z is not 0.

Let's look at the complex plane again. It stretches off into infinity in all directions without reaching it, and that makes it kind of unruly, so we will make a point at "infinity" just by declaring one; I'll denote it by ∞. So now every direction heads off to ∞. Perhaps a more concrete realization of this: consider, say, a circular rubber sheet, maybe a meter or so across. If you take the edge of the sheet and shrink and glue that edge together into a single point, you get a balloon, i.e. a sphere.
Now every straight line on the original rubber sheet eventually hits the edge of the sheet, since the sheet is only a meter across, and since we've shrunk the edge to a single point, every straight line on the original sheet now hits that single point. That single point is ∞. Of course, the complex plane is a lot bigger than a meter across, but if we never actually measure distances then our complex plane + ∞ is just a sphere. This is called the "one-point compactification" of the plane, or the "Riemann sphere". But what can we say about polynomials now? Well, we should expect that ∞^{k} is going to be ∞ for any positive integer k, so P(∞) = ∞ if P is not a constant function. And conversely, f_{∞}(z) = z^{2} - ∞, so f_{∞}(z) = 0 only if z = ∞. So let's examine our sphere. We have the map z -> z^{2}, which maps points on the sphere to points on the sphere, in almost a 2-to-1 fashion. So it's almost like we're actually mapping two copies of the sphere to one copy. But it's not like we have one copy that is z and one copy that is -z. Consider z = 1. That gets mapped to 1. Moving counterclockwise around 0 gets us to z = i, which gets sent to -1, and then to z = -1, which gets sent to 1, and then to z = -i which gets sent to -1, and then back to 1. So our two copies of the sphere are actually joined together. One way to think about this is to take two copies of the Riemann sphere, slice them both along the positive-real axis from 0 to ∞, and then join the two copies so that the first quadrant of one sphere gets connected to the fourth quadrant of the other, i.e. positive-imaginary part of one sphere to the negative-imaginary part of the other. For the most part this just gets us a somewhat-more convoluted surface, but a short-sighted ant standing on it wouldn't be able to tell that anything funny is going on because you're just gluing the edge of one sphere to that of an identical sphere; except at 0 and ∞ where all of the quadrants come together. 
This surface is called a 2-fold branched cover of the Riemann sphere, because it's two copies of the Riemann sphere glued together so that most of the time it just looks like Riemann-sphere stuff, and then there are two points where things go bad. We call those bad points "branch points". [/math] Here's the promised video of a branched cover that Henry Segerman made of his apartment, with the branch points at the middle of the ceiling and the floor.
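The 2-to-1 behavior above is easy to check numerically. Here's a minimal sketch in plain Python (standard-library `cmath` only): for nonzero c, z² = c has exactly the two roots ±√c, and walking once around the unit circle upstairs makes z² wrap around the circle twice downstairs, which is the 1 → i → -1 → -i → 1 versus 1 → -1 → 1 → -1 → 1 pattern described above.

```python
import cmath

# For any nonzero complex c, z^2 = c has exactly two solutions: r and -r.
c = 3 + 4j
r = cmath.sqrt(c)
assert abs(r * r - c) < 1e-12
assert abs((-r) * (-r) - c) < 1e-12

# Walk counterclockwise around the unit circle: z = e^(i*theta).
# As theta goes 0 -> 2*pi, z^2 = e^(2i*theta) goes around twice, so the
# points 1, i, -1, -i upstairs land on 1, -1, 1, -1 downstairs.
for theta_deg, expected in [(0, 1), (90, -1), (180, 1), (270, -1)]:
    z = cmath.exp(1j * cmath.pi * theta_deg / 180)
    assert abs(z * z - expected) < 1e-12
```

Note that `cmath.sqrt` has to pick one of the two roots; its branch cut along the negative real axis is the computational shadow of the slice-and-reglue construction above.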
1 + 1 = 2 So in the Yahoo Answers thread the question of proving that 1 + 1 = 2 was brought up, and the reference given was Russell and Whitehead's Principia Mathematica. The Principia was a mathematical milestone, a first attempt at a full, from-first-principles, ground-up derivation of all of mathematics. I've mentioned this sort of endeavor earlier in this thread; this is what mathematicians tend to refer to when they talk about such things. And the common remark about the Principia is that it contains a proof that 1 + 1 = 2, and that proof is several hundred pages long. The Principia had a lot of problems: it wasn't rigorous by today's standards, it used nonstandard notation that didn't work, the authors didn't know what they were doing, and, as noted in earlier posts in this thread, the project was doomed from the start. These days mathematicians don't use the Principia or the system set up within it. Too many problems, not enough power or breadth, and also it's awful to try to use.

So there are two linked questions here. Firstly, why is "1 + 1 = 2" hard to prove, and secondly, how would one go about proving it? The first question has several answers. One is the fact that 1 + 1 = 2 is obvious, and so proving it means having to deliberately pretend that you don't know a lot of things that have been with you since you were a child, things that you know in your heart to be true and unquestionable and therefore really hard to explain, where the best you can do is ask "but how can it not be true?", which, unfortunately, does not constitute a proof. A complementary answer is that 1 + 1 = 2 has four symbols, all of which need to be defined, and depending on where you start, this can be very difficult. This is one of the things that the Principia struggled with: terrible definitions. What does it mean for two numbers to be equal? What does it mean to add a number to another number? What is 2?
These days mathematicians have come up with various ways of defining those four symbols, or, as the case may be, not defining those four symbols but at least putting sufficient constraints on their behavior that logical inference can be performed. And then there's again the fact that the authors of the Principia, brilliant mathematicians and logicians, didn't really have a great idea of what they were doing. Mathematical logic was a young field, formalization hadn't quite figured itself out yet, and so Principia was a groundbreaking effort; and like most alpha versions it had a number of bugs that the authors hadn't had any clue were there.

The second question ties into the definitions issue of the first question. To prove something is to provide a path from an accepted starting point to that something using moves approved by the person you're trying to convince; so a proof of a thing depends on the starting point and the moves you're allowed. The two most common setups are the Zermelo-Fraenkel axioms with the von Neumann construction of the natural numbers, or the Peano axioms, and the proofs are fairly similar in the two cases but differ slightly in their starting points and which fiddly bits absolutely need to be present. For concreteness, let's examine the Peano setup. If you don't want to suffer through a bunch of pedantry, skip to the horizontal line.

In the following, statements preceded by [ ] are axioms or definitions, and statements preceded by ( ) are theorems derived from axioms/definitions. The [ | ] on the right shows how each theorem is derived: the thing to the right of the | is the axiom/definition used at the given step, and the things to the left are demonstrations that the axiom/definition is applicable. We have three basic symbols: 0, S and =.
They obey the following rules:

[x0] 0 is a (natural) number
[x1a] reflexivity: (a is a number) implies (a = a)
[x1b] symmetry: (a and b are numbers) implies (a = b implies b = a)
[x1c] transitivity: (a and b and c are numbers and a = b and b = c) implies (a = c)
[x2] closure of numberhood under equality: (b is a number and a = b) implies (a is a number)
[x3] closure of numberhood under succession: (a is a number) implies (S(a) is a number)
[x4] S is a function: (a and b are numbers and a = b) implies (S(a) = S(b))
[x5] S is injective: (a and b are numbers and S(a) = S(b)) implies (a = b)
[x6] 0 comes first: (a is a number) implies (S(a) does not equal 0)
[x7] induction: for any subset M of numbers, ((0 is in M) and ((x is in M) implies (S(x) is in M))) implies that for all numbers y, y is in M.

We define addition recursively:

[a0] For all numbers a and b, a + b is a number
[a1] For all numbers a, b, b', if b = b' then a + b = a + b'
[a2] For all numbers a, a + 0 = a
[a3] For all numbers a and b, a + S(b) = S(a + b)

[d1] We define 1 = S(0)
(l0) S(0) is a number [x0 | x3]
(l1) 1 is a number [d1, (l0) | x2]
[d2] We define 2 = S(1)

Given this, we prove that 1 + 1 = 2 via:

(1) 1 + 1 is a number [(l1) | a0]
(2) 1 + 1 = 1 + S(0) [(l1), d1 | a1]
(3) 1 + S(0) is a number [(1), (2) | x2]
(4) 1 + S(0) = S(1 + 0) [(l1), x0 | a3]
(5) S(1 + 0) is a number [(3), (4) | x2]
(6) 1 + 1 = S(1 + 0) [(1), (2), (3), (4), (5) | x1c]
(7) 1 + 0 = 1 [(l1) | a2]
(8) 1 + 0 is a number [(l1), (7) | x2]
(9) S(1 + 0) = S(1) [(7), (8), (l1) | x4]
(10) S(1) is a number [(l1) | x3]
(11) 1 + 1 = S(1) [(1), (5), (6), (9), (10) | x1c]
(12) 2 is a number [d2, (10) | x2]
(13) S(1) = 2 [(10), (12) | x1b]
(14) 1 + 1 = 2 [(1), (10), (11), (12), (13) | x1c]

____________________

Behold: not hundreds of pages. Things I have not proved: certain facts about equality, for instance the idea that not being not equal to something means that you're equal to it. Don't try to parse that too hard.
The point is that here is a proof in only a relatively few lines of the theorem "1 + 1 = 2" starting from a somewhat barebones but not-too-unreasonable system. If I wanted to do this in ZF it would take a little longer, but one can essentially translate the Peano setup to ZF. Anyway, the point is that the community of mathematicians has figured out how to rigorously add 1 to itself without taking hundreds of pages to do so. It still takes longer than one at first expects that it should, but we can do it.
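The recursive definition of addition above ([a2] and [a3]) is also directly executable. Here's a minimal sketch in Python, with my own toy encoding of numbers as nested tuples (0 is the empty tuple, S wraps one more layer):

```python
# Numbers are built from 0 by the successor S; + follows the rules
# [a2] (a + 0 = a) and [a3] (a + S(b) = S(a + b)) from the text.
ZERO = ()

def S(a):
    # Successor: wrap the previous number in one more layer.
    return (a,)

def add(a, b):
    if b == ZERO:
        return a            # [a2]
    return S(add(a, b[0]))  # [a3], since b = S(b') where b' = b[0]

ONE = S(ZERO)   # [d1]
TWO = S(ONE)    # [d2]

# The theorem: 1 + 1 = 2.
assert add(ONE, ONE) == TWO
```

The machine-checked bookkeeping about what is and isn't "a number" (the x2/x3 steps in the proof) is handled here implicitly by the representation, which is a big part of why the code is shorter than the proof.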
Ordinals A few posts back I mentioned that there are some things that are too big to be sets. One such thing is the collection of sets; if we assume that the collection of all sets is a set, we get Russell's paradox. Another is the collection of ordinal numbers. If cardinal numbers describe the possible sizes of sets, ordinal numbers describe the possible well-orderings you can have on sets. That's a little vague and misleading, so let's say something concrete.

Consider the empty set. It has nothing in it. {}. Call it 0. Now consider the set with one thing in it. What can we put in it? 0. So {0} = 1. Now consider the set with two things in it. What can we put in it? 0 and 1. So {0, 1} = 2. And so on. After building n sets, we can make a set with n things in it by putting all the previous things inside. So it looks like we're just remaking the cardinal numbers. Only now we have an ordering. We can say that a set n is greater than a set m if m is an element of n.

Now we get to the fun part. Suppose that we've built all of the finite sets that we get by this process. It takes an infinite number of steps to do that, but let's suppose that we've done it. Now we can shove all of them into a set: {0, 1, 2, 3,...} = ω. The size of ω is our friend aleph_{0}, the size of the natural numbers. But here's where things start to diverge a bit. What about the set {0, 1, 2, 3,..., ω}? Its size is still aleph_{0}. But it's greater than ω, because it contains ω. If ω were finite, we'd call this new set ω + 1. ω isn't finite, but we do that anyway. So {0, 1, 2, 3,..., ω} = ω + 1.* And we can build {0, 1, 2, 3,..., ω, ω + 1} = ω + 2, and so on, until we reach {0, 1, 2, 3,..., ω, ω + 1, ω + 2, ω + 3,...} = ω + ω = ω*2. And then onwards, to {0, 1, 2,..., ω, ω + 1, ω + 2,..., ω*2, ω*2 + 1, ω*2 + 2,...
} = ω*3, and then {0, 1, 2,..., ω, ω + 1,..., ω*2,..., ω*3,..., ω*4,...} = ω*ω = ω^{2}. And then {0, 1, 2,..., ω,..., ω^{2},..., ω^{2}*2,..., ω^{2}*3,...} = ω^{2}*ω = ω^{3}. And then {0, 1, 2,..., ω,..., ω^{2},..., ω^{3},..., ω^{4},...} = ω^{ω}. And then... Anyway this keeps going. Eventually we get an awful infinite tower ω^{ω^{ω^{...}}} which we call ε_{0}, and then after a while we run out of the ability to denote things using ε_{0} and so we start in on ε_{1}, and then ε_{2},..., ε_{ω}, ε_{ω^{ω}}, ε_{ε_{0}}, ε_{ε_{1}},... and you get the idea. And then we eventually run out of things that can be denoted by ε and have to move on to other notations. And then we run out of notations, and we haven't even started in on the uncountable sets; all of these ωs and εs are orderings on countable sets. The set of all countable ordinals is an uncountable ordinal called ω_{1}, and then the entire process starts over again.

Anyway, the point is that mathematicians have tried to take the notion of "and so on" to its logical conclusion and eventually realized that you don't end up with a set. Basically, if the collection of all ordinal numbers were a set then it would be an ordinal number and thus an element of itself, which would mean that it would actually be less than itself. This fact, that there are too many ordinals to constitute a set, also tells us that there are too many cardinals. Our friend aleph_{0} has friends aleph_{1}, aleph_{2},..., aleph_{ω}, aleph_{ω+1},..., aleph_{ε_{0}},..., one for each ordinal number, each a bigger cardinal than the last. Since there are too many ordinal numbers, there are too many cardinal numbers.

*Notably, 1 + ω is not considered equal to ω + 1. ω + 1 is a set with a countably infinite number of things, and then another thing at the end. 1 + ω is one thing followed by a countably infinite number of things with no ending, but that's the same as just a countably infinite number of things with no ending; adding one thing at the beginning doesn't change that.
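The absorption behavior in that footnote (1 + ω = ω, but ω + 1 > ω) can be played with directly. Here's a small sketch, using my own toy encoding of the ordinals below ω²: the pair (a, b) stands for ω*a + b.

```python
# Ordinals below omega^2, written omega*a + b as the pair (a, b).
# Ordinal addition is not commutative: a finite number added on the
# LEFT of a limit ordinal gets absorbed.
def ord_add(x, y):
    a, b = x
    c, d = y
    if c > 0:
        # b + (omega*c + d): the finite tail b is swallowed by omega*c.
        return (a + c, d)
    return (a, b + d)

omega = (1, 0)
one = (0, 1)

assert ord_add(one, omega) == omega      # 1 + omega = omega
assert ord_add(omega, one) == (1, 1)     # omega + 1 > omega
assert ord_add(omega, omega) == (2, 0)   # omega + omega = omega*2
```

This encoding only reaches up to ω², of course; the real definition of ordinal addition works by transfinite recursion, but the non-commutativity is already visible here.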
Okay, I wrote a bunch of stuff for the last post, but that's kind of tangential to the thing I really wanted to talk about. Surreal numbers Consider the following process: Each step consists of building new things from things that were built at previous steps. To build a new thing, you take all of the things that were previously built and divide them into "left things" and "right things" so that each left thing is less than each right thing. The new thing is considered to be greater than all of the left things and smaller than all of the right things, and is the "simplest" thing that fits those criteria.

For example, to start out with, you have nothing built, so you end up with no left things and no right things. The thing that gets built out of this is called "0", and we write it as < | > = 0. Now we can make 0 into a left thing or a right thing. We'll say that < 0 | > is 1, and < | 0 > is -1. Now we have three things to place. < -1, 0, 1 | > = 2, < -1, 0 | 1 > = 1/2, < -1 | 0, 1 > = -1/2, < | -1, 0, 1 > = -2. And so on. At the next round, we would say that < -2, -1, -1/2, 0 | 1/2, 1, 2 > = 1/4, and < -2, -1, -1/2, 0, 1/2 | 1, 2 > = 3/4. So after countably many steps we've ended up with all of the fractions whose denominators are powers of 2. This includes all of the integers.

Now we can shove everything to the left, i.e. <..., -1,..., 0,..., 1,... | >, which we'll call ω. Why? Because it comes after all of the integers. At the next step we can make <..., -1,..., 0,..., 1,..., ω | > = ω + 1, and so on. But we can also build <..., -1,..., 0,..., 1,... | ω > = ω - 1. Note that this works differently from the ordinal arithmetic we've defined above. We can also build awful things like <..., -1,..., 0 | ..., 1,...> where the ..., 1 indicates all of the positive numbers that we've built. So we'd want a number that's less than any positive number but greater than 0, i.e.
an infinitesimal, although not in the sense that I used the term "infinitesimal" several posts ago; these guys don't square to 0. If we continue on like this, we get a collection that is called the surreal numbers. It includes all of the real numbers, and also numbers that are greater than all of the real numbers, and numbers that are infinitesimally small. Some fiddling with left things and right things gives us notions of addition, subtraction, multiplication and division (except by 0). The real numbers have their usual notions, but we also have things like the ω - 1 mentioned above, and 1/ω, which is the <..., -1,..., 0 | ..., 1,...> mentioned above. In fact we get all of the algebraic structure of a field like the real numbers or the rational numbers, the full ability to do arithmetic, except for one crucial difference: the full surreal numbers contain all of the ordinal numbers. But as we saw in the previous post, there are too many ordinal numbers to form a set. So the surreal numbers can't form a set. A field, however, has to be a set.

Mathematicians are constantly running out of ways to express themselves. They burn through alphabets, and then unicode symbols, and then fonts, assigning each thing a precise meaning that gets overwritten later because they've run out again. For a thing like the surreal numbers, we want to say that it's like a field, has all of the operations of a field, but is too big to be a set. So what do mathematicians call it? They call it a FIELD. I'm not actually sure how that's pronounced in real life because I've never heard it said. Also, because wordplay: we're generally aware of the word "surreal" in the usual, colloquial sense, and now we have "surreal" in the mathematical sense.
Now, just as you can go from the real numbers to the complex numbers by saying "numbers of the form a + bi where a and b are real and i is the square root of -1" we can talk about numbers of the form a + bi where a and b are surreal and i is the square root of -1; such numbers are called "surcomplex numbers", because mathematicians are great at naming things.
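The ordering that drives the whole construction above is itself computable. Here's a sketch in Python, using my own toy encoding of a surreal number as a pair (left things, right things) and the standard comparison rule: x ≤ y exactly when no left thing of x beats y and no right thing of y undercuts x.

```python
# A surreal number is a pair <L | R> of previously built numbers.
# x <= y iff (no xl in L(x) has y <= xl) and (no yr in R(y) has yr <= x).
def leq(x, y):
    left_x, _ = x
    _, right_y = y
    return (all(not leq(y, xl) for xl in left_x)
            and all(not leq(yr, x) for yr in right_y))

def lt(x, y):
    return leq(x, y) and not leq(y, x)

zero = ((), ())              # < | >
one = ((zero,), ())          # < 0 | >
neg_one = ((), (zero,))      # < | 0 >
two = ((zero, one), ())      # < 0, 1 | >
half = ((zero,), (one,))     # < 0 | 1 >

assert lt(neg_one, zero) and lt(zero, one) and lt(one, two)
assert lt(zero, half) and lt(half, one)   # 0 < 1/2 < 1
```

Note that the recursion bottoms out because every number's left and right things were built at strictly earlier steps, which is exactly the "previously built" condition in the construction.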
The Curry-Howard Correspondence The Curry-Howard Correspondence was a series of letters between a mathematician, Howard, and a food dish originating in India. No one is quite sure why Howard thought to write to a food dish or how said food dish was able to write back, but the world is often stranger than one thinks.*

The basic idea is that deductions can be viewed as computations. Suppose I have a statement, N. For instance, maybe the statement is "there is a natural number". A witness of N is a reason to accept N as true, for instance in this case, the existence of the number 3. We can write this as 3:N, witness on the left, proposition being witnessed on the right. More formally, we view propositions as types in the programming sense, and a witness is then a term or instance of that type. A proof then takes a set of axioms, i.e. some propositions, and derives another proposition from them. We can consider this in terms of witnesses: a proof takes reasons to accept the initial set of propositions and turns them into a reason to accept the concluding proposition. A computation takes instances of types and returns an instance of another type. So for instance, let's take the statement Z that there is an integer. We can take a witness of N, and, say, multiply it by 2 to get a witness of Z. So starting with our witness 3:N, we get the witness 6:Z. Hence we would say that N => Z, because we have a function from N to Z. Notably, if we have a function from type A to type B, then if we interpret A and B as propositions we would have that A implies B, because our functions turn witnesses for A into witnesses for B.

Let's go back to our usual propositional truth-table: If Q is true, then regardless of what P is, P => Q. If P is false, then regardless of what Q is, P => Q. If P is true and Q is false, then P !=> Q. How does that look in our functions setup? If Q has a witness q, then we can define the function "send anything in P to q".
So we have a function, and therefore P => Q. If P doesn't have any witnesses, then we can define the function "if you find anything in P, send it to whatever". So we have a function that doesn't do anything, but that doesn't disqualify it from being a function, and therefore P => Q. If P has a witness but Q does not, then there's nowhere to send that witness, so we can't have a function from P to Q. Therefore P !=> Q. Based on this, we have the following ideas: the proposition TRUE has a single witness that has no properties other than its existence. Everything implies TRUE because we can always make a function that sends things to the single witness of TRUE. The proposition FALSE has no witnesses. It implies everything because no function from FALSE to anything else ever gets evaluated, so those functions don't have to do anything. Moreover, things with witnesses can't have functions to FALSE because there's nowhere for the witnesses to go, and TRUE can't have any functions to things without witnesses because TRUE has a witness that needs to go somewhere. So that's our basic model. This is exciting, because now we can use computers to check proofs, by writing the proof as a sequence of functions and having a program check that our typing declarations are correct. We can define AND and OR and NOT in terms of computations. For instance, we have a function that I'll call "meet" that takes a witness of A and a witness of B and returns a witness of the type A AND B, and a pair of functions "proj_{A}" and "proj_{B}" that take a witness of A AND B and return a witness of A and a witness of B respectively. See if you can figure out how OR works. For NOT, we can think of it as saying that A is false, or in our witness-setup, that A has no witnesses. How can we say "has no witnesses"? We can say that A => FALSE, because if A had witnesses then we could not have A => FALSE. If I am to believe (A => FALSE), I need a reason to believe, i.e. 
a witness, and the clearest choice of a witness of (A => FALSE) is a function from A to FALSE. This is one of the more interesting aspects of this sort of computational logic: implications, i.e. (A => B), are themselves propositions, whose witnesses are "proofs" or functions. In other words, the proposition A => B has as witnesses functions f: A -> B. This is significantly different from the usual formulation of "proving things", in which proofs are separate objects from the things being discussed. A proof about numbers is not itself a number (Gödel encoding does not turn a proof into a number, it merely encodes it using numbers), and the whole body of logic that is used to construct and examine proofs is an extra attachment to the domain of inquiry, subject to different laws and different techniques. In contrast, in the Curry-Howard propositions-as-types setup, proofs are just another kind of witness, and indeed just another type of object, just as natural numbers and integers are objects, so we're not attaching an extra thing to our framework: proofs and logic are treated the same way as the objects in question. Moreover, just as we can examine natural numbers and integers and find more structure than the mere existence of natural numbers and integers, we can look at the witnesses of A => B and perhaps find structure that is not described by the mere statement A => B.

One result of this particular formulation of NOT A is that double negation doesn't work. NOT (NOT A) = NOT (A => FALSE) = ((A => FALSE) => FALSE). Which is a weird thing to write, but the statement is that if you had a function from A to FALSE, you could produce a witness of FALSE, and a witness of NOT (NOT A) is then a function that takes (functions from A to FALSE) and returns witnesses of FALSE. This is very, very different from having a witness of A, and indeed, just using the definitions given, you can't in general form a function from witnesses of NOT (NOT A) to witnesses of A.
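The witness game can be sketched in plain Python, treating witnesses as ordinary values and implications as ordinary functions. The names below (`meet`, `proj_A`, `proj_B`) come from the text; `FALSE` and `double_negation_intro` are my own illustrative choices. Note what you can and can't write: A => NOT (NOT A) has a completely generic recipe, while the reverse direction does not.

```python
# FALSE is a type with no witnesses: trying to construct one blows up.
class FALSE:
    def __init__(self):
        raise RuntimeError("FALSE has no witnesses")

# AND: a witness of (A AND B) is a pair of witnesses, as in the text.
def meet(a, b):
    return (a, b)

def proj_A(ab):
    return ab[0]

def proj_B(ab):
    return ab[1]

def double_negation_intro(a):
    # A => NOT (NOT A): given a witness of A, build a witness of
    # ((A => FALSE) => FALSE) by feeding the A-witness to the supposed
    # refutation of A.  There is no analogous generic recipe going the
    # other way: nothing here conjures an A-witness out of a function.
    return lambda not_a: not_a(a)

nn = double_negation_intro(3)   # from 3:N we get nn : NOT (NOT N)
assert proj_A(meet(3, "spare")) == 3
assert proj_B(meet(3, "spare")) == "spare"
```

(For OR, the usual answer is a tagged value: a witness of A OR B is either a witness of A labeled "left" or a witness of B labeled "right", matching the exercise posed above.)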
So this logic is naturally intuitionistic! Double negation has to be added in as an extra rule, but then we lose the computational nature of the setup, because how the hell do you turn a (function from (functions from A to FALSE) to FALSE) into a witness of A? All that the usual statement of double negation does is say that if such a (function...) exists, then a witness of A exists, without saying how to get from one to the other. In fact, like many intuitionistic theories, this propositions-as-computation-types framework is generally unhappy with statements of existence that don't actually provide explicit examples. A witness in the hand is worth two in the bush, or however that goes. So that's the Curry-Howard correspondence, the realization that several problems of the form "how to get from A to B" are actually the same.

*CW: lying
Quantum Stuff! This is technically physics, but because the physical interpretation isn't really settled at all, what we have is essentially a purely mathematical model that somehow matches very well with physical observations. So I think that counts as applied math. If you know some linear algebra, the whole of this post will boil down to: eigenvectors of one matrix are often not eigenvectors of another matrix, and matrices don't necessarily commute. There's a lot more to quantum mechanics than that, of course, but most of the fun divergences from classical behavior come from those facts, which are actually just one fact.

Suppose you're standing somewhere, and 200 kilometers north of you is a city; call it city A. 200 kilometers to the south of you is city B. To the west is city 1, and to the east is city 2. And then someone stands 200 kilometers northwest of you. You can ask about that person, "are they in city A?" and the answer would probably be no, unless you're in like Tokyo or something. And you can ask "are they in city 1?" and the answer is still no. You can describe their position in terms of the two cities, but they are in neither city. And suppose they decide to wander around the circle of radius 200 kilometers that has you at the center;* then sometimes they'll pass through city A or B or 1 or 2, but most of the time they can't be properly described as being in any of the aforementioned cities.

Let's get a little less metaphorical. When we have a physical system, a collection of physical objects, forces, fields, etc, just a bunch of physical things that are possibly capable of interacting in a physically describable way, we say that a total description of that system at a given moment is called its state, and different physical paradigms describe states in different ways. Conversely, different physical paradigms have different ways to read off physical predictions from states.
For instance, for a system consisting of one particle, we might describe it in terms of where it is and how it's moving, and maybe its mass or its electric charge, etc, and all of those properties would be collected into a description called its state, and to find out the momentum of the particle you take the velocity data, which is stored as a vector or a triple of numbers, and the mass data, which is another number, and multiply them together.

In quantum mechanics, we use what is called a "wavefunction" to describe the state, usually written Ψ. There are multiple ways to construct it, but the important part is that given an input x, Ψ will spit out a complex number Ψ(x). So we have a notion of multiplying Ψ by a complex number, by which we mean that the function cΨ will take an input x and spit out c times Ψ(x). Also, given two wavefunctions, we can add them together, where Ψ + Φ takes x and spits out Ψ(x) + Φ(x). The formal term is that the set of wavefunctions forms a vector space, if you know that term. If you don't, just think of them in terms of what happens when you put in inputs. So given a bunch of wavefunctions, Ψ_{1}, Ψ_{2}, etc, and a bunch of complex numbers, c_{1}, c_{2}, etc, we can form a wavefunction c_{1}Ψ_{1} + c_{2}Ψ_{2} + etc, which mathematicians would call a linear combination, and which physicists would call a superposition. For instance, if Ψ_{A} corresponds to being in city A, and Ψ_{1} to being in city 1, then √(1/2)Ψ_{A} + √(1/2)Ψ_{1} corresponds to being 200 kilometers northwest.

This is one way that quantum mechanics differs from our usual understanding of physics, because we usually think of things as having definite physical properties, as being in city A or B or 1 or 2. We usually think of Schrodinger's cat as definitely alive or definitely dead, even if we don't know which. But quantum mechanics isn't about being in the cities. Quantum mechanics is about wandering in the wilderness in between.
Extracting data from a wavefunction is a little interesting: for each physically observable quantity, we have an operator, say, A, that takes Ψ and spits out another wavefunction A(Ψ), and then a little more manipulation gives us a value, written <A>. When A(Ψ) is equal to aΨ, we say that Ψ is an eigenfunction of A, and the physical interpretation is that the system described by Ψ has value a for that physical observable. So for instance, there is a position operator, X, and if X(Ψ) = xΨ, we say that the system described by Ψ has position x.

So what happens when Ψ isn't an eigenfunction of A? Fortunately, Ψ can be written as a linear combination of eigenfunctions of A, i.e. we say that Ψ is a superposition of states that have definite values for that observable, and then the value <A> ends up being a weighted average of the various possible values a that the physical observable can have. In the technical jargon, A is a linear transformation, i.e. A(cΨ) = cA(Ψ) for complex numbers c, and A(Ψ + Φ) = A(Ψ) + A(Φ). Hence the averaging process. So suppose that X is the operator corresponding to position. If Ψ corresponds to a state with a definite position, then <X> will end up being a value x that corresponds to that position. If Ψ doesn't have a definite position, then <X> will give you an expected position, in the probabilistic sense.

Even worse are things like the operator N that measures how many particles a system has. Some states have a definite number of particles, and for those <N> gives an integer n, the number of particles. But superpositions of those states give wavefunctions that are not eigenfunctions of N, and then <N> might end up being some weird fraction because it's averaging over the possible numbers of particles the system could have. But that's mathematical prediction. When we actually do observations we get valid numbers, not fuzzy averages, and the system then acts as if the number we got was the definite value for that system.
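The weighted-average rule can be seen concretely in a two-state toy model (a sketch with numpy; the 2x2 matrix and the particular weights are my own illustrative choices, standing in for a real, infinite-dimensional observable):

```python
import numpy as np

# A toy observable with "definite value" states up (+1) and down (-1).
A = np.array([[1.0, 0.0],
              [0.0, -1.0]])

up = np.array([1.0, 0.0])     # eigenfunction: A @ up = +1 * up
down = np.array([0.0, 1.0])   # eigenfunction: A @ down = -1 * down

# A superposition: weight 1/4 on the value +1, weight 3/4 on the value -1.
psi = np.sqrt(0.25) * up + np.sqrt(0.75) * down

# <A> = psi^dagger A psi, the weighted average 0.25*(+1) + 0.75*(-1) = -0.5.
expect = psi.conj() @ A @ psi
assert abs(expect - (-0.5)) < 1e-12
```

The -0.5 is not a value the observable can ever return on a single measurement (it only ever returns +1 or -1); it's the average over many measurements, which is exactly the "fuzzy average" point above.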
Instead of being just a passive documentation of reality, observing seemingly causes the state of the system to instantly change to one that has a definite value, one whose wavefunction is an eigenfunction of whatever the corresponding operator is for that observation. Moreover, the weighting in the weighted average reflects the frequencies with which each of the possible eigenfunctions appears when doing said observation. We don't necessarily get the value <A> for any given measurement; when we measure the same thing a lot of times, the average of the values that we get ends up being <A>. And this is where the interpretation questions start, because what does that mean? Things certainly don't appear to act that way in regular life, and the whole setup seems to lose a bit of predictive power if it only returns "probably" all the time. Einstein's "God does not play dice" comes from this. I have my own opinions as to what's going on here, but as none of the interpretations yield any predictive differences, I can't bring myself to care enough to argue for one.**

Anyway, back to wave-particle duality. If you look for wave-like properties, things like momentum or energy, then you'll see wave-like behavior. If you look for particle-like properties like position, you'll see particle-like behavior, because the system acts as if, at the instant of observation, its wavefunction was an eigenfunction of whatever you were looking for. But the vast majority of the time, when unobserved, the wavefunction wanders the superpositions, not a wave-like eigenfunction or a particle-like eigenfunction, and thus the thing it describes can't properly be called a wave or a particle at all. For the chemists, this is why all of those "orbital diagrams" that you see in textbooks describing "where the electron is" are so bizarre.
They're not actually trying to describe where the electron "is", because those pictures don't correspond to states that are particle-like, and hence the electrons in question don't really have positions to speak of. Those pictures are trying to describe the weighting that the weighted average <X> would compute for the wave-like states being depicted. Speaking of wave-like and particle-like behavior, let's talk about the Heisenberg Uncertainty Principle. We have the obvious question of "what if two people are looking at the same system and one tries to measure particle-like properties and the other tries to measure wave-like properties at the same time?" And this is an interesting question. It turns out that you can't do it, or at least, you can't do it precisely. Again, the physical interpretation is a little bit up in the air, but here's the mathematics, according to our model: Suppose that we have two operators A and B. If Ψ is an eigenfunction of A, then we have that A(Ψ) = aΨ. Then B(A(Ψ)) = B(aΨ) = aB(Ψ). And if Ψ is also an eigenfunction of B, say B(Ψ) = bΨ, we get aB(Ψ) = abΨ, and we would say that Ψ is an eigenfunction of the combined operator BA with eigenvalue ab. Notably, in this case A(B(Ψ)) spits out the same thing. But if A and B have different eigenfunctions, then A(B(Ψ)) and B(A(Ψ)) will in general spit out different things. In the particular case known as Heisenberg's Uncertainty Principle, the operator X that corresponds to position and the operator P that corresponds to momentum have completely different eigenfunctions, and so the combined operator XP, which should measure the position x and the momentum p and multiply them together to give xp, gives a different result from PX, which should measure the momentum p and the position x and multiply them together to give px. But x and p are numbers, and so xp and px should give the same value. Again for those linear-algebra folks, this is the very common fact that matrices don't usually commute. 
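The failure of XP and PX to agree can be seen in a discretized sketch, with X and P approximated as matrices on a finite grid. The grid size and spacing here are arbitrary choices of mine, and this is only an approximation: the real X and P act on infinite-dimensional function spaces.

```python
import numpy as np

# Position and momentum as matrices on a grid of n points spaced dx
# apart, with hbar = 1. A finite-dimensional approximation only.
n, dx = 200, 0.1
x = (np.arange(n) - n // 2) * dx
X = np.diag(x)

# P = -i d/dx via central differences.
D = (np.diag(np.ones(n - 1), 1) - np.diag(np.ones(n - 1), -1)) / (2 * dx)
P = -1j * D

# XP and PX genuinely differ. Acting on a smooth state, the difference
# is approximately multiplication by i: the canonical relation [X, P] = i.
comm = X @ P - P @ X
psi = np.exp(-x**2)
mid = n // 2
print((comm @ psi)[mid])   # ≈ 1j (times psi[mid] = 1)
```

Away from the grid boundary, applying the commutator to any smooth vector just multiplies it by (approximately) i, which is the matrix shadow of Heisenberg's relation.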
Since we're not dealing with eigenfunctions, we instead get those fuzzy averages <X> and <P>. A little bit more detail about how to get measurements eventually yields that the fuzziness of those averages has a real manifestation, not just in the measurements of position and momentum, but in the physical values of the position and momentum. In particular, there's a lower bound on the variance in the position times the variance in the momentum. We know that it's not just a measurement issue because of the existence of Bose-Einstein condensates. Because the condensate is so cold, the momentum is constrained very precisely, since there's not enough energy for high momentum; this means less variance in the momentum, which means the variance in the position has to go up, and so the atoms in the Bose-Einstein condensate act as if smeared out, and if the smearing is larger than the distance between the atoms then the atoms start to act like one big quantum object rather than a bunch of isolated ones, and that's when the weirdness starts. Anyway, that's at least the basis of the weirdness of non-relativistic, non-gauge quantum mechanics. The rabbit hole goes a lot deeper, and while the mathematics here is nothing terribly special, there is stuff down the line that makes most mathematicians shake their heads and wonder how the hell quantum mechanics manages to make predictions at all, never mind remarkably accurate ones. *Rather vain of you, I have to say. **Relative State, if you're wondering.
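That lower bound can be checked numerically in a simple case. A sketch, with grid parameters of my own choosing and hbar set to 1, so that the bound reads Var(x)·Var(p) ≥ 1/4; a Gaussian wavepacket is the state that saturates it.

```python
import numpy as np

# Gaussian wavepacket on a grid; hbar = 1.
x = np.linspace(-10, 10, 4001)
dx = x[1] - x[0]
sigma = 1.0
psi = (2 * np.pi * sigma**2) ** -0.25 * np.exp(-x**2 / (4 * sigma**2))

var_x = np.sum(x**2 * psi**2) * dx   # <x^2>; <x> = 0 by symmetry
dpsi = np.gradient(psi, dx)
var_p = np.sum(dpsi**2) * dx         # <p^2> = integral of |psi'|^2; <p> = 0

print(var_x * var_p)                 # ≈ 0.25, the minimum the bound allows
```

Narrowing sigma shrinks var_x but inflates var_p by the same factor, which is the smearing trade-off the condensate exploits.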
Special Relativity Since I talked about quantum mechanics, how about some relativity? From a geometric standpoint, rather than a physical one. Physics will appear at the end. Let's talk about the most infamous Parallel Postulate, which in one phrasing goes: Given a point P and a line l that doesn't go through P, there is exactly one line through P that is parallel to l. What does it mean for two distinct lines to be parallel? We could say that parallel lines never meet. If they did meet, they'd meet in a single point, and we could very well say that this point is determined by the two lines. So parallel lines do not determine any points. We can get variations on Euclidean geometry by fiddling with this postulate, for instance saying that there might be no lines through P that are parallel to l, so that any line through P must meet l at some point*. Or we could say that perhaps there are many lines through P that are parallel to l**. But rather than doing that, we're going to stick with exactly one line through P that is parallel to l. Instead, we're going to flip this on its head and start talking about parallel points. What does it mean for two points to be parallel? We could say that parallel points don't determine any lines. "Parallel" sounds a bit weird in this context, but what else are you going to call it? So if we have a point P and a line l that doesn't go through P, how many points on l are parallel to P? In Euclidean geometry, there are none. If we say exactly one, then we get what's called Galilean geometry, for reasons we'll come back to. If we say "lots", then we get Minkowskian geometry. In particular, let's say that we'll only allow lines whose slopes are strictly between 1 and -1. We'll call lines with slope between 1 and -1 "timelike", lines with slope exactly 1 or -1 "lightlike", and lines with slope greater than 1 or less than -1 "spacelike", including vertical lines. 
In particular, we only want to look at timelike lines as actual lines; the rest are convenient fictions. So if two points would determine a lightlike or spacelike line, then they're parallel, because they don't determine an actual line. In Euclidean geometry, if we rotate the entire plane, parallel lines remain parallel. So in our Minkowskian geometry, we need to keep parallel lines parallel when we "rotate", but we also need to keep our parallel points parallel. In particular, we need to have that a lightlike line gets sent to a lightlike line, and spacelike lines get sent to spacelike lines. But in Euclidean geometry we can also talk about distances. Can we talk about distances in Minkowskian geometry? Sure. Our distance looks a little wonky though. In Euclidean geometry, we can use the Pythagorean theorem to talk about distances on the coordinate plane: the distance between (0, 0) and (x, y) is given by ||(x, y)||^{2} = x^{2} + y^{2}. In Minkowskian geometry we instead write ||(x,y)||^{2} = x^{2} - y^{2}. And you'll note that this works with our notion of timelike, lightlike and spacelike: if the line determined by (0, 0) and (x, y) is timelike, ||(x,y)||^{2} is positive; if the line is lightlike, ||(x,y)||^{2} is 0; and if the line is spacelike, ||(x,y)||^{2} is negative. So only timelike lines have meaningful lengths, which is why they're the only actual lines. A lot of weirdness comes from that minus sign. For instance, in Euclidean geometry we can talk about perpendicular lines, and perpendicularity can in fact be determined from the distance formula. In Minkowskian geometry, we get that lightlike lines are perpendicular to themselves, and that timelike lines are only perpendicular to spacelike lines and vice-versa. So we can't have any right-angle triangles in Minkowskian geometry if we demand that all the sides be actual line segments! So, physics time. First, we rename our variables as t and d. Also we include Galilean geometry. 
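The timelike/lightlike/spacelike trichotomy is just the sign of ||(x, y)||^2, so it can be written as a little helper (a hypothetical function of my own naming, using the x^2 - y^2 convention above with x on the horizontal axis):

```python
# Classify the line through (0, 0) and (x, y) by the sign of x^2 - y^2.
def classify(x, y):
    q = x**2 - y**2
    if q > 0:
        return "timelike"    # slope strictly between -1 and 1
    if q == 0:
        return "lightlike"   # slope exactly 1 or -1
    return "spacelike"       # slope beyond 1 or -1, or a vertical line

print(classify(2, 1), classify(1, 1), classify(0, 1))
```

Two points whose difference classifies as lightlike or spacelike are exactly the "parallel points" above.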
So our formulas become

Euclidean: ||(t, d)||^{2} = t^{2} + d^{2}
Galilean: ||(t, d)||^{2} = t^{2}
Minkowskian: ||(t, d)||^{2} = t^{2} - d^{2}

and now we interpret t as time change and d as spatial distance. Slope is then velocity. The units are chosen so that the speed of light is 1; in other words, if (0, 0) and (t, d) determine a lightlike line, then the velocity that line corresponds to is the speed of light, and in general a line with slope k corresponds to a speed equal to |k| times the speed of light. Points being parallel means that we can't go from one point to the other without going too fast; in Galilean geometry, "too fast" means infinite speed, while in Minkowskian geometry "too fast" means above the speed of light. The corresponding notions of "rotation" become "boosts", i.e. changing velocities, and they preserve these speed limits. In particular, Minkowskian rotations leave the speed of light at 1. ||(t, d)|| is now the "proper time", or in other words what a clock moving from (0, 0) to (t, d) along a straight line would measure. Galilean geometry is called such because it is the geometric interpretation of the physics described by Galileo (and Newton). Minkowskian geometry is the geometry corresponding to the physics of Einstein (and Lorentz and Poincare). The physical weirdness of special relativity can be seen from the differences between Galilean and Minkowskian geometry. For instance, we intuitively assume that time as measured by the traveling clock only depends on t, and not on d; only on time as we measure it, not on whatever spatial maneuvering the clock is doing. This leads to, say, the twin paradox, where the twin who goes out and comes back ages less than the twin who stays in place. This corresponds to straight lines being the longest paths rather than the shortest. In addition to changing how many parallel points we have, we can also fool around with the number of parallel lines as well. 
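The two kinds of "rotation" can be sketched directly. A minimal illustration in units where c = 1: the Galilean boost is a shear that leaves t (and hence t^2) alone, while the Minkowskian boost mixes t and d but preserves t^2 - d^2.

```python
import math

def galilean_boost(t, d, v):
    # A shear: time is untouched, so ||(t, d)||^2 = t^2 is preserved.
    return t, d - v * t

def minkowski_boost(t, d, v):
    # Requires |v| < 1, i.e. below the speed of light.
    g = 1.0 / math.sqrt(1.0 - v * v)
    return g * (t - v * d), g * (d - v * t)

t, d = 5.0, 3.0
t2, d2 = minkowski_boost(t, d, 0.6)
print(t**2 - d**2, t2**2 - d2**2)   # both 16.0: the proper time 4 is invariant

# Lightlike stays lightlike: the speed of light is left at 1.
lt, ld = minkowski_boost(1.0, 1.0, 0.6)
print(lt**2 - ld**2)                # 0.0
```

Boosting (5, 3) by v = 0.6 lands on (4, 0): the clock's own rest frame, where the elapsed time is just the proper time 4.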
So we get a total of 9 possibilities, depending on whether we have zero, one, or many parallel lines and zero, one, or many parallel points. So far we've talked about five of these possibilities: (zero, zero), (one, zero), (many, zero), (one, one), and (one, many), the last of which was the focus. Actually, all nine geometries have physical interpretations, although the ( , zero) row, corresponding to Elliptic, Euclidean and Hyperbolic, only shows up in science fiction*** and awkward work-arounds in quantum field theory****. But the other six, "Co-Euclidean", Galilean, "Co-Minkowskian", "Co-hyperbolic", Minkowskian and "Doubly-Hyperbolic", all have interesting physical meanings in terms of how time works (how many parallel points) and how gravity works (how many parallel lines). At some point I might work myself up to describing general relativity, except at the moment I'm not sure if I can explain it well without going into a lot of technical detail. It's either bad analogies, no information, or a mess of awful formulas, and I'd rather not do any of those if I can help it. *Elliptic, or spherical geometry: the geometry of a sphere where the "lines" are great circles. **Hyperbolic geometry: the geometry of a Pringles chip. Or saddles, I guess. ***Okay, so the only place in sci-fi that I've seen it taken seriously is Greg Egan's Orthogonal trilogy, but he takes it very seriously, in that Orthogonal is really more of a series of papers that happen to occur in a science-fiction narrative. It's really entertaining, though, and the characters are actually intelligent, rather than Hollywood Intelligent. ****Wick rotation. Ugh. It works, but ugh.
1/|k|, no? ... wait, are you putting the t axis sideways? >.< *headtilt* Okay, I see the advantages of the way you're setting it up, and the concept of parallel points is very cool, but instead/also consider this: Let's graph points in spacetime on the complex plane. The horizontal axis measures the real quantity x, distance in meters. The vertical axis measures the imaginary quantity i*c*t, with c being the speed of light (natural units are convenient, but they obscure dimensional stuff). It turns out i*c*t is also a "distance" in units of imaginary meters. The ordinary distance formula now gives d^2 = x^2 + (i*c*t)^2 = x^2 - (c^2)(t^2), and we call d the proper distance - physically, the distance between two points in a frame of reference where they are simultaneous. A negative d^2 is a time-like interval, and a zero d^2 is light-like. The math is all equivalent, it's just an alternate description - but I think this one demonstrates the unity of space and time better. Space and time depend on each other because they're actually the same kind of thing. Time being an imaginary axis makes a big difference - consider the duration of one second, versus the distance of one light second (300,000 km) - but the reason it makes sense to rotate them together is that they are fundamentally related, as evidenced by our ability to measure them both in meters. And then similar rotations link mass with energy, and electricity with magnetism. Plus, it lets me say "A year is an imaginary distance", which makes me feel like a Vorlon. General relativity makes me feel like Daft Punk - I start chanting "the future... is down" over and over in my head.
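For what it's worth, this bookkeeping is easy to check with Python's built-in complex numbers. The sample separation below is a made-up example of mine; note that the imaginary parts cancel exactly, leaving the real quantity x^2 - (c^2)(t^2).

```python
c = 299_792_458.0   # speed of light in m/s
x, t = 4.0, 1e-8    # two events 4 meters and 10 nanoseconds apart

d_squared = x**2 + (1j * c * t) ** 2   # = x^2 - (c*t)^2
print(d_squared)    # ≈ (7.01+0j): real and positive, a space-like interval
```
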
I actually really dislike that description, to be honest, because it doesn't generalize well to multiple spatial dimensions or to curvatures, and also I'm stuck in the (+---) convention. Real things should have real square magnitudes. Also the complex numbers have a notion of magnitude built in, and it unfortunately would give x^{2} + c^{2}t^{2}. In the interest of not building in any structure that has to be ignored later, I stick to the vector space version. Plus that description encourages Wick rotations, which I am firmly against.
... well, it wouldn't be the first time I learned a better interpretation on the internet than I got in my actual classes. What was that about curvatures? I took general relativity in grad school and it made my head hurt.
So one way of looking at curvature is to say "go in a tiny circle; what changed as you went around?" Start at the north pole. Suppose you have an arrow pointing along the surface of the Earth, say toward Vancouver. You head south in the direction that the arrow points until you reach the equator. Then, still keeping the arrow pointing south, you head east for a while. Then you start heading north again, still keeping the arrow pointing south. When you reach the north pole, the arrow is no longer pointing toward Vancouver. At no point did you "turn" the arrow relative to itself. It's just that the world is curved. Anyway, the point of this is that the metric you put on the tangent spaces dictates what it means to "turn" your arrow, and hence this notion of curvature. And for any metric that is sufficiently nice, you can compute a notion of curvature based on this idea of wandering in loops and seeing how your tangent vectors change. But if you also have to carry around a complex structure on your vector spaces, things get a bit more difficult, because now you have to carry around not one but two arrows: the real and the imaginary, and these have a relationship. But these aren't linked to each other in a Lorentz-invariant way: a boost doesn't leave the new time axis as i times the new space axis. The ability to say that time is just imaginary space depends on the coordinates you use. This is okay in flat space, because you can just decouple the complex structure from the relativistic structure and say "we're just going to express things in these particular coordinates", and then just use those particular coordinates forever. But because a curved space has that problem with the loops, you don't get to pick your coordinate system. Or at least, you can pick your coordinate system once, but if you go in a loop then you end up with different coordinates when you get back since your vectors get all skewed around. 
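The Vancouver story can even be simulated. A sketch on the unit sphere, with parallel transport approximated by repeatedly projecting the carried arrow onto the tangent plane at each small step (a standard numerical stand-in for the Levi-Civita transport, not an exact method); the loop goes pole to equator, a quarter turn east, then back to the pole.

```python
import numpy as np

# Parallel transport on the unit sphere, approximated by projecting the
# carried arrow onto the tangent plane at each step and restoring its length.
def transport(points, w):
    for p in points:
        w = w - np.dot(w, p) * p    # stay tangent to the sphere at p
        w = w / np.linalg.norm(w)   # never stretch or "turn" the arrow
    return w

n = 2000
theta = np.linspace(0, np.pi / 2, n)    # pole -> equator along longitude 0
leg1 = np.stack([np.sin(theta), np.zeros(n), np.cos(theta)], axis=1)
phi = np.linspace(0, np.pi / 2, n)      # a quarter turn east along the equator
leg2 = np.stack([np.cos(phi), np.sin(phi), np.zeros(n)], axis=1)
theta2 = np.linspace(np.pi / 2, 0, n)   # back north along longitude 90
leg3 = np.stack([np.zeros(n), np.sin(theta2), np.cos(theta2)], axis=1)

w0 = np.array([1.0, 0.0, 0.0])          # the arrow at the pole
w1 = transport(np.vstack([leg1, leg2, leg3]), w0)
angle = np.arccos(np.clip(np.dot(w0, w1), -1.0, 1.0))
print(angle)   # ≈ pi/2: rotated by the enclosed area, though never "turned"
```

The quarter-turn loop encloses one eighth of the sphere, area pi/2, and the arrow comes back rotated by exactly that angle.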
There is a way to make the "time as imaginary space" view work, but only if you completely separate out the t axis and the x axis as distinct complex coordinates. The x-t system isn't a single complex plane, but rather one axis each from two different complex planes; from one complex plane you take the real axis and label it x and from the other complex plane you take the imaginary axis and label it t, and putting them together you get something that looks like but is not really a complex plane. Now if you do this and keep all of these extra hidden bits in mind, a lot of the curvature problem goes away because now your two arrows are independent. But those extra hidden bits are important and therefore shouldn't be hidden.
Innnteresting... though I don't know what you mean by "tangent space" - the arrow to Vancouver is not a tangent vector, since on different legs of the loop it is parallel, perpendicular, and anti-parallel to the path.
By "tangent vector" I am referring to the vectors tangent to the surface of the Earth. So not tangent to the path, necessarily.
I'm probably going to type up a thing about (differential) geometric curvature at some point. Not in terribly much detail here, maybe more detail, including more general bundle curvature, on my blog. I wish I still had that exercise ball that I'd scribbled all over.
Curvature The mathematical notion of "curvature" is familiar in certain cases, like surfaces, where it appears as deviation from flatness, but it gets more complicated in higher dimensions. Even for surfaces, people aren't always clear on what it means for a surface to be curved from a mathematical perspective. Here I'll give some notions of curvature that can be extended to more dimensions in various ways. In particular, these are ways that can be detected by someone living in the space in question, or at least someone who can't look "up" and can't see very far but has an accurate ruler and protractor. This was a topic that was near and dear to my heart up until grad school, when I realized that I liked saying "take a derivative" much more than I liked actually taking derivatives, and so switched to algebra. We'll consider three cases: zero curvature, like a sheet of paper on your desk or said sheet of paper rolled into a tube*; (constant) positive curvature, like the surface of a ball; and (constant) negative curvature, like a saddle or a Pringles chip. We define a "line segment" as the shortest path between two distinct points, and a "line" as a bunch of line segments joined end-to-end so that the angle between adjacent line segments is π radians. Line segments may have to be "small enough". We define the distance between two points to be the arclength of the line segment between the two points. Consider a point and a small distance. We can look at all of the points that particular distance away from the particular point, and that set of points forms a "circle". We can then measure the circumference of the circle and compute the ratio of the circumference to the distance which served as a radius. If our surface has zero curvature, then the ratio is 2π regardless of the distance chosen. If our surface has positive curvature, then the ratio is less than 2π by an amount depending on the distance chosen. 
If our surface has negative curvature, then the ratio exceeds 2π by an amount depending on the distance chosen. Consider three distinct points drawn on our surface and line segments joining those three points to form a triangle. Look at the inner angles of this triangle. Add them up. If our surface has zero curvature, then the sum of the inner angles is π radians regardless of the area of the triangle. If our surface has positive curvature, then the sum of the inner angles exceeds π radians by an amount proportional to the area of the triangle. If our surface has negative curvature, then the sum of the inner angles is less than π radians by an amount proportional to the area of the triangle.** Consider a triangle drawn on our surface. Starting at one corner of the triangle, place a little arrow so that it lies along the surface in the direction of one of the edges of the triangle. Now pretend you've strapped that arrow to the back of an ant, so they're facing the same way, and have the ant carry the arrow along that edge until it reaches the next corner. Now the arrow is transferred, without turning it, to the back of a new ant that travels along the next edge, so that the arrow and the ant carrying it are no longer facing the same way. When the second ant gets to the end of the second edge, again transfer the arrow, without turning it, to a third ant who will carry it along the third edge. The arrow ends up back at the starting point. If our surface has zero curvature, then the arrow points in the same direction it was originally pointing. If our surface has positive curvature, then the arrow has been rotated clockwise by an amount proportional to the area of the triangle. If our surface has negative curvature, then the arrow has been rotated counterclockwise by an amount proportional to the area of the triangle. *** Take two points and draw the line segment between them. 
Then at each endpoint of the segment draw a line perpendicular to the segment, so that the two lines drawn point in the same direction. If our surface has zero curvature, the two lines always stay the same distance apart as you go along them. If our surface has positive curvature, the two lines get closer together as you go along them. If our surface has negative curvature, the two lines get farther apart as you go along them. Now let's leave behind our surface, and suppose we're just doodling around in our room. Suppose we have three straight sticks, and we arrange them into a triangle, and then measure the inner angles. We'd expect the sum of the inner angles to be π radians, but what if it isn't? What would we then have to conclude about the space that we live in? Or suppose that we have a ball, and we measure the ratio of its surface area to the square of its radius. High-school geometry tells us that this ratio ought to be 4π regardless of the size of the ball, but what if it's not? Of course, if our space is curved, the curvature is very minute, so you'd have to make a very big ball or a big triangle to notice it. But these are, in theory, things you could detect. The last method of detecting curvature, the two lines, is (part of) the method usually considered by (astro-)physicists to detect curvature. For instance, if we take "inertial paths", i.e. freefall, to be our notion of "straight line", then curvature in spacetime manifests as gravity, where nearby freefall lines (for instance, the path of a skydiver in spacetime and the path of the Earth in spacetime) get closer together, which looks like things being drawn to each other. Of course, the big part of General Relativity is explaining how matter affects the inertial paths, and it is more complicated because in more dimensions, there are more things that paths can do than just get closer together or farther apart. 
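All three surface tests can be checked on a unit sphere, where the exact formulas are classical. A sketch, using the octant triangle (the north pole plus two equator points a quarter turn apart, which has three right angles) and meridians as the "perpendicular lines":

```python
import numpy as np

R = 1.0  # unit sphere, constant curvature +1

# 1. Circles: geodesic radius r gives circumference 2*pi*R*sin(r/R),
#    so the circumference-to-radius ratio falls below 2*pi.
r = 1.0
print(2 * np.pi * R * np.sin(r / R) / r)   # ≈ 5.29, less than 2*pi ≈ 6.28

# 2. Triangles: the octant triangle has three right angles; its angle
#    excess equals its area, as the proportionality claim predicts.
angle_sum = 3 * (np.pi / 2)
excess = angle_sum - np.pi         # pi/2
area = (4 * np.pi * R**2) / 8      # one eighth of the sphere: pi/2
print(excess, area)                # equal

# 3. Perpendicular lines: two meridians leave the equator at right
#    angles, pointing the same way. At colatitude theta their separation
#    is R*sin(theta)*dphi, shrinking from the equator toward the pole.
dphi = 0.1
theta = np.array([np.pi / 2, np.pi / 3, np.pi / 6])
print(R * np.sin(theta) * dphi)    # 0.1, then smaller, then smaller still
```

The same three checks run on a saddle flip every inequality the other way, which is the negative-curvature column above.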
But this is what people mean when they say that gravity is a "fictitious force" in GR: objects move in straight lines according to whatever their inertia tells them they should be doing, so they don't experience any force of gravity, not in the Newtonian F = ma sense; rather, the phenomenon of gravity comes about from those straight lines getting closer together. There are of course formal notions of curvature that explain how to measure the amount of curvature itself with just an accurate ruler, even in spaces with many dimensions and nonconstant curvature, but that requires a lot of ugly calculus and probably some topology facts that I don't want to get into here. Even worse is going the other way: starting with the curvature and trying to find a space that has that curvature via awful differential equations. I have been told that physics grad students doing general relativity or quantum field theory spend a lot of their time doing such calculations, and I do not envy them. *Yes, the tube is considered geometrically "flat", since any small-enough geometric figure drawn on the plane can be rolled up onto the tube without distorting any lengths or angles. **Fun fact: the smallest angle between two lines that actually meet is 0 (asymptotically), so the smallest sum of inner angles that a triangle can have is 0. Hence in a space with constant negative curvature, there's actually an upper bound on the area that triangles can have. ***This setup, of an arrow carried around by ants without turning, is why I don't like the complex-number model of Minkowski geometry, since the turning means that the real and imaginary axes will get screwed around as one moves around the surface, even if you return to the point where you started. Since the match between the complex numbers and Minkowski geometry depends on a particular algebraic relationship that, in curved space, gets destroyed when you rotate the axes, the model doesn't work well if the space isn't flat.
I'm having trouble extending this description of curvature to spacetime. (I passed GR, really I did!) The angle between the arrow and the ant's path would correspond to, what, the apparent velocity of a test particle in freefall, from a frame of reference that's moving around a closed timelike loop?
Yes. You need to have a notion of "parallel" velocities for nearby points, i.e. parallel transport, which is handled via the Levi-Civita connection as long as the paths are differentiable. In that case, we have that taking two vectors and parallel transporting them along the same path preserves their inner product, so intervals are preserved, and then there's also a property called "torsion-free" to prevent the space of test-particle velocity vectors from just spinning freely around the path. Anyway, the point is that there's a good notion of how to compare test-particle velocity vectors at two nearby points along a path, and we want to say "keep it constant". Note: instead of a loop, usually you take two timelike paths between a pair of timelike-separated points: you start with a velocity vector of a test particle at the first point, transport it on each of the paths, and see how the two results differ when you get to the second point; this deals with any smoothness/time-reversal/closedness issues that don't show up in Riemannian geometries. So the parallel-transported velocity vector reflects the path taken if the spacetime is curved. Well, technically, that's holonomy; curvature is the infinitesimal version. Again I think the more phenomenologically-detectable property is the divergence/convergence properties of nearby parallel geodesics (inertial paths in spacetime), but this is certainly one way to look at the notion of gravitational curvature.
Yeah, the convergence of geodesics is the one I remember from school, but the argument you made against the complex-number model relies on the loops, and it was very convincing except my brain wants to visualize what it's being convinced by. Lemme sleep on it.
I mean, you can do it with loops, except at some point they need to go backwards in time. If you're fine with that, and the requisite non-smooth point needed to actually get a path that goes forward in time and then goes backward and is timelike in both directions, then you can turn the two-point, two-path setup into one point and a path that abruptly goes backwards in time at some other point.