Roger Barlow

Antineutrinos and the failure of Occam’s Razor

William of Ockham is one of the few medieval theologian/philosophers whose name survives today, thanks to his formulation of the principle known as Occam’s Razor. In the original Latin, if you want to show off, it runs Non sunt multiplicanda entia sine necessitate, or Entities are not to be multiplied without necessity, which can be loosely paraphrased as The simplest explanation is the best one, an idea that is as attractive to a 21st century audience as it was back in the 14th.

William of Ockham

Now fast forward a few centuries and let’s try and apply this to the neutrino. People talk about the “Dirac Neutrino” but that’s a bit off-target. Paul Dirac produced the definitive description not of the neutrino but of the electron. The Dirac Equation shows – as explained in countless graduate physics courses – that there have to be 2×2=4 types of electron: there are the usual negatively charged ones and the rarer positively charged ones (usually known as positrons), and for each of these the intrinsic spin can point along the direction of motion (‘right handed’) or against it (‘left handed’). The charge is a basic property that can’t change, but handedness depends on the observer (if you and I observe and discuss electrons while the two of us are moving, we will agree about their directions of spin but not about their directions of motion.)

Dirac worked all this out to describe how the electron experienced the electromagnetic force. But it turned out to be the key to describing its behaviour in the beta-decay weak force as well. But with a twist. Only the left handed electron and the right handed positron ‘feel’ the weak force. If you show a right handed electron or a left handed positron to the W particle that’s responsible for the weak force then it’s just not interested. This seems weird but has been very firmly established by decades of precision experiments.

(If you’re worried that this preference appears to contradict the statement earlier that handedness is observer-dependent then well done! Let’s just say I’ve oversimplified a bit, and the mathematics really does take care of it properly. Give yourself a gold star, and check out the difference between ‘helicity’ and ‘chirality’ sometime.)

Right, that’s enough about electrons, let’s move on to neutrinos. They also interact weakly, very similarly to the electron: only the left-handed neutrino and the right-handed antineutrino are involved, and the right-handed neutrino and left-handed antineutrino don’t.

But it’s worse than that. The right handed neutrino and left handed antineutrino don’t interact weakly: they also don’t interact electromagnetically because the neutrino, unlike the electron, is neutral. And they don’t interact strongly either. In fact they don’t interact full stop.

And this is where William comes in wielding his razor. Our list of fundamental particles includes this absolutely pointless pair that don’t participate at all. What’s the point of them? Can’t we rewrite our description in a way that leaves them out?

And it turns out that we can.

Ettore Majorana, very soon after Dirac published his equation for the electron, pointed out that for neutral particles a simpler outcome was possible. In his system the ‘antiparticle’ of the left-handed neutrino is the right-handed neutrino. The neutrino, like the photon, is self-conjugate. The experiments that showed that neutrinos and antineutrinos were distinct (neutrinos produce electrons in targets: antineutrinos produce positrons) in fact showed the difference between left-handed and right-handed neutrinos. There are only 2 neutrinos and they both interact, not 2×2 where two of the foursome just play gooseberry.

So hooray for simplicity. But is it?

The electron (and its heavier counterparts, the mu and the tau) is certainly a Dirac particle. So are the quarks, both the 2/3 and the -1/3 varieties. If all the other fundamental fermions are Dirac particles, isn’t it simpler that the neutrino is cut to the same pattern, rather than having its own special prescription? If we understand electrons – which it is fair to say that we do – isn’t it simpler that the neutrino be just a neutral version of the electron, rather than some new entity introduced specially for the purpose?

And that’s where we are. It’s all very well advocating “the simple solution” but how can you tell what’s simple? The jury is still out. Hopefully a future set of experiments (on neutrinoless double beta decay) will give an answer on whether a neutrino can be its own antiparticle, though these are very tough and will take several years. After which we will doubtless see with hindsight the simplicity of the answer, whichever it is, and tell each other that it should have been obvious thanks to William. But at the moment he’s not really much help.

STV: the benefit nobody talks about

British democracy was created in the 19th century, like the railways. Like the railways it was, for its time, truly world-leading. However after 150 years, like the railways, it is showing its age. People, technology and society have come a long way, and a system which worked yesterday needs to be adapted and improved for the different conditions of today.

One flaw which is becoming increasingly apparent is the way it stifles minorities. Under the first-past-the-post system the winner takes all and the loser gets nothing. Democracy has to be more than that. Even in a simple two party system, 49% of the voters may have no say in how the country is run – and in a multiparty system the ruling party may have the support of well below half of the population. In a balanced system where the pendulum of power swings to and fro this may not matter too much, but when the difference is structural a large minority is rendered permanently powerless, which in the long run invites revolution.

These arguments are well rehearsed and various schemes to improve proportionality are suggested: the party list system, as was used in the Euro elections, the additional member system, as is used in the regional assemblies, the alternative vote and the Single Transferable Vote. Pundits with spreadsheets discuss the improvements in ‘proportionality’ given by the various schemes. I want to make a point in favour of the STV scheme which has nothing to do, directly, with proportionality. It gives voters the chance to choose their MP from within the party list.

Let’s take a town of 200,000 people. Under STV its voters elect three MPs. Suppose, for simplicity, there are only two parties: Left and Right. The town is fairly evenly balanced, and elects 2 Left and 1 Right MP in some elections and 2 Right and 1 Left in others, depending on the way the political wind is blowing.

Now, although each party knows that the best it can hope for is 2 seats out of 3, they will put up 3 candidates. Not to fill the slate will be seen as a sign of weakness. This happens. To take an example close to home, in the last Euro elections (2019) here in the North West region all the parties (Conservatives, Labour, Liberal Democrats, Brexit, Change, UKIP and the Greens) put up full slates of 8 candidates, although they knew that they were never going to win all of them. The picture was the same in other regions. Parties will put up as many candidates as there are seats to be won.

So when a voter in this hypothetical town goes into the polling booth their ballot paper has six names, and they rank them in order (and although the mechanics of counting STV votes are complicated, its use by voters is really simple). A staunchly pro-Left voter will write 1, 2 and 3 against the Left candidates and 4, 5 and 6 against the Right candidates: a pro-Right voter will do the reverse. In doing so they are not only expressing their allegiance to a party, they are also expressing their preference for the candidates within that party. And that preference carries through to the result.

Let’s see how that works. Suppose that Smith, Jones and Robinson are the candidates for the Left party, which is doing well this time, while Brown, Green and White are standing for the Right party, which is lagging. Smith (a prominent local character) is more popular than Jones (a relative newcomer of whom little is known), while Robinson (whose controversial Twitter stream has annoyed many people) is least popular of the three. As the votes are counted the popular Smith is the first to reach the quota (more than one quarter of the votes cast). Smith is elected, and surplus votes are diverted to the Jones pile.

Even with that boost, perhaps neither Jones nor anyone else makes quota. For the lagging Right party, Brown is the most popular candidate, followed by Green and then White, so the unfortunate White has the smallest number of 1st preference votes and is eliminated, their votes going to Brown who now makes quota. Robinson is eliminated next, their votes going to Jones who narrowly beats Green. Yes, proportionality has worked, after a fashion, in that the town has elected two Left and one Right MP, but it has done more than that: it has chosen between the candidates within the parties.
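For anyone who wants to see the mechanics, here is a minimal sketch of such a count in Python. The ballot numbers (100 voters, all voting a straight party ticket) are invented purely so that the count follows the story above, and the function name stv_count is my own. It uses a quota of "more than one quarter" for three seats and passes on surpluses by scaling down the weight of every ballot in the elected candidate's pile, which is one common convention among several; real STV rules differ in the details.

```python
from fractions import Fraction

# Hypothetical ballots: (number of voters, ranking from 1st to 6th preference).
BALLOTS = [
    (29, ["Smith", "Jones", "Robinson", "Brown", "Green", "White"]),
    (14, ["Jones", "Smith", "Robinson", "Brown", "Green", "White"]),
    (9,  ["Robinson", "Jones", "Smith", "Brown", "Green", "White"]),
    (24, ["Brown", "Green", "White", "Smith", "Jones", "Robinson"]),
    (16, ["Green", "Brown", "White", "Smith", "Jones", "Robinson"]),
    (8,  ["White", "Brown", "Green", "Smith", "Jones", "Robinson"]),
]

def stv_count(ballots, seats):
    """Simplified STV count; returns (quota, elected, log of events)."""
    total = sum(n for n, _ in ballots)
    quota = total // (seats + 1) + 1          # "more than one quarter" for 3 seats
    piles = [[Fraction(n), ranking] for n, ranking in ballots]
    continuing = {c for _, ranking in ballots for c in ranking}
    elected, log = [], []

    def top_choice(ranking):
        # Highest-ranked candidate who is neither elected nor eliminated.
        return next((c for c in ranking if c in continuing), None)

    while len(elected) < seats and continuing:
        if len(continuing) <= seats - len(elected):
            elected.extend(sorted(continuing))  # no contest left: fill the seats
            break
        tally = {c: Fraction(0) for c in continuing}
        for weight, ranking in piles:
            top = top_choice(ranking)
            if top is not None:
                tally[top] += weight
        leader = max(tally, key=lambda c: (tally[c], c))
        if tally[leader] >= quota:
            # Elected: only the surplus fraction of each ballot moves on.
            keep = (tally[leader] - quota) / tally[leader]
            for pile in piles:
                if top_choice(pile[1]) == leader:
                    pile[0] *= keep
            continuing.remove(leader)
            elected.append(leader)
            log.append(("elected", leader, tally[leader]))
        else:
            # Nobody makes quota: drop the lowest and pass their ballots on.
            loser = min(tally, key=lambda c: (tally[c], c))
            continuing.remove(loser)
            log.append(("eliminated", loser, tally[loser]))
    return quota, elected, log

quota, elected, log = stv_count(BALLOTS, seats=3)
print("quota:", quota)
for event, name, votes in log:
    print(event, name, "with", float(votes), "votes")
print("elected:", elected)
```

With these made-up numbers the events come out in the order described: Smith elected, White eliminated, Brown elected, Robinson eliminated, and Jones taking the last seat ahead of Green.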

Everybody’s vote counts. There may be cases where a ballot is not counted for the voter’s preferred party – because the candidate made quota or dropped off the bottom – but their 4-5-6 ranking is used to express a preference as to which candidate of their non-preferred party gets elected. And so far we’ve ignored cross-party voting, which will strengthen the effect: voters are not tied to party allegiance and may vote for a popular individual despite their party.

STV also gives a much-needed voice to the majority. There is much – valid – complaint that in a ‘safe’ seat, voters for the losing parties have no say. But voters for the winning party have no say either. The candidate is appointed by a small selection committee, or by party headquarters. With STV it may still be effectively built-in that a party is bound to get a seat, but which of the candidates benefits from this is in the hands of the voters. Candidates – and sitting MPs – are going to realise this. They will be aware that they are answerable to the electorate rather than the party machinery. Today a Tory MP in the shires or a Labour MP in the industrial north knows that it would take major misbehaviour on their part to make voters switch party and thereby lose their seat, but with STV they will need to fear a switch in preference within the party ticket, and will treat their voters with much more respect.

This will change the dynamic of elections. Candidates will have to appeal to the electorate not just for their party but for themselves. Bright young SPADs who work the system within the party organisation to get onto the candidate list will also have to appeal to the electors if they’re going to get elected. It’s worth noting that this dynamic is the opposite to the ‘party list’ system. You sometimes hear people object to PR because it gives control to the party rather than the voter; this applies to the list system but for STV it’s just the opposite.

Hopefully, in 100 years’ time “safe seats” will have gone the way of rotten boroughs and be consigned to history. STV can make that happen, giving choice to the people rather than the party machinery.

The Lesson from the Prisoner’s Dilemma

This is a classic puzzle which, like all such, comes in the form of a story. Here is one version:

Alice and Bob are criminals. No question. They have been caught red-handed in a botched robbery of the Smalltown Store, and are now in jail awaiting trial.

The police have realised that Alice and Bob match the description of the pair who successfully robbed the Bigtown Bank last month. They really want to get a conviction for that, but with no evidence apart from the resemblance they need to get a confession.

So they say to Alice: “Look, you are going to get a 1 year sentence for the Smalltown Store job, no question. But if you co-operate with us by confessing that the two of you did the Bigtown Bank heist then we’ll let you go completely free. You can claim Bob was the ringleader and he’ll get a 10 year sentence.”

Alice thinks a moment and asks two questions.

“Are you making the same offer to Bob? What happens if we both confess?”

The police tell her that yes, they are making the same offer to both of them. And if both confess, they’ll get 6 years each.

OK, that’s the story. All that circumstantial detail is just to lead up to this decision table, which Alice is now looking at:

	Bob confesses	Bob denies
Alice confesses	6+6	0+10
Alice denies	10+0	1+1
(Each entry is Alice’s sentence + Bob’s sentence, in years.)

That’s the problem in a nutshell. Before we look at it there are maybe a few points to clear up:

Alice and Bob are not an item. They are just business partners. Each is aiming to minimise their own jail term, and what happens to the other is irrelevant for them.
‘Go free’ really does mean that – there are no vengeful families or gang members to bring retribution on an informer.
Whether they actually committed the Bigtown Bank job is completely irrelevant to the puzzle.

OK, let’s get back to Alice. She reasons as follows:

“I don’t know what Bob is going to do. Suppose he denies the bank job. Then I should confess, to reduce my sentence from 1 year to zero. But what if he confesses? In that case, I’d better confess too, to get 6 years rather than 10. Whichever choice Bob makes, the better option for me is to confess. So I’ll confess.”

Bob will, of course, reason the same way. If Alice denies, he should confess. If Alice confesses, he should confess. 0 is less than 1, and 6 is less than 10. Therefore he should confess.

The logic is irrefutable. But look at that table again. The prisoners have firmly chosen the top left box, and will both serve 6 years. That’s a terrible result! It’s not only the worst total sentence (12 years), it’s the next-to-worst individual sentence (6 years is better than 10, but much worse than 0 or 1). Clearly the bottom right is the box to go for. It’s the optimal joint result and the next-to-optimal individual result.
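The two ways of reading the table can be checked mechanically. This is just a sketch: the sentences are the ones in the table, while the names YEARS, alice_best_reply and so on are my own labels.

```python
# Sentences in years: YEARS[(alice_choice, bob_choice)] = (alice_years, bob_years)
YEARS = {
    ("confess", "confess"): (6, 6),
    ("confess", "deny"): (0, 10),
    ("deny", "confess"): (10, 0),
    ("deny", "deny"): (1, 1),
}
CHOICES = ("confess", "deny")

def alice_best_reply(bob_choice):
    # Alice's slice: Bob's choice is fixed, she minimises her own sentence.
    return min(CHOICES, key=lambda a: YEARS[(a, bob_choice)][0])

def bob_best_reply(alice_choice):
    # Bob's slice: Alice's choice is fixed, he minimises his own sentence.
    return min(CHOICES, key=lambda b: YEARS[(alice_choice, b)][1])

alice_replies = {b: alice_best_reply(b) for b in CHOICES}
bob_replies = {a: bob_best_reply(a) for a in CHOICES}

# The whole-table view: minimise (or maximise) the total years served.
joint_best = min(YEARS, key=lambda cell: sum(YEARS[cell]))
joint_worst = max(YEARS, key=lambda cell: sum(YEARS[cell]))

print("Alice's best reply to each Bob choice:", alice_replies)
print("Bob's best reply to each Alice choice:", bob_replies)
print("Best joint outcome:", joint_best, "total", sum(YEARS[joint_best]))
print("Worst joint outcome:", joint_worst, "total", sum(YEARS[joint_worst]))
```

Each individual's best reply is "confess" whatever the other does, yet the box they land in is the one with the largest total.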

That is obvious to us because we look at the table as a whole. But Bob (or Alice) can only consider their slices through it and either slice leads to the Confess choice. To see it holistically one has to change the question from the Prisoner’s Dilemma to the Prisoners’ Dilemma. That’s only the movement of an apostrophe, but it’s a total readjustment of the viewpoint. A joint Bob+Alice entity, if the police put them in one room together for a couple of minutes (but they won’t), can take the obvious bottom-right 1+1 choice. Separate individual Bob or Alice units, no matter how rational, cannot do that.

This is what the philosophers call emergence. The whole is more than just the sum of its parts. A forest is more than a number of trees. An animal is more than a bunch of cells. It’s generally discussed in terms of complex large-N systems: what’s nice about the Prisoner’s Dilemma is that emergence appears with just N=2. There is a Bob+Alice entity which is more than Bob and Alice separately, and makes different (and better) decisions.

There’s also a lesson for politics. It’s an illustration of the way that Mrs Thatcher was wrong: there is such a thing as society, and it is more than just all its individual members. Once you start looking for them, the world is full of examples where groups can do things that individuals can’t – not just from the “united we stand” bundle-of-sticks argument but because they give a different viewpoint.

I should stockpile lavatory paper in case there’s a shortage caused by people stockpiling lavatory paper.
When recruiting skilled workers it’s quicker and cheaper for me to poach yours rather than train my own.
My best fishing strategy is to catch all the fish in the pond, even though that leaves none for you, and none for me tomorrow.
If I get another cow that will always give me more milk, even though the common grazing we share is finite.

Following the last instance, economists call this “The tragedy of the commons”. It’s the point at which Adam Smith’s “invisible hand” fails.

This tells us something about democracy. A society or a nation is more than just the individuals that make it up. E pluribus unum means that something larger, more powerful and – dare one say it – better can emerge. So democracy is more than just arithmetically counting noses: it provides the means whereby men and women can speak with one voice as a distinct people. That’s the ideal, anyway, and – even if the form we’ve got is clunky and imperfect – some of us still try to believe in it.

Why can’t science journalists understand p-values?

The Xenon1T experiment has just announced a really interesting excess of events, which could be due to axions or some other new particle or effect. It’s a nice result from a difficult experiment and the research team deserve a lucky break. Of course like any discovery it could be a rare statistical fluctuation which will go away when more data is taken. They quote the significance as 3.5 sigma, and we can actually follow this: they see 285 events where only 232 are expected: the surplus of 53 is just 3.5 times the standard deviation you would expect from Poisson statistics: 15.2 events, the square root of 232.

This is all fine. But the press accounts – as in, for example, Scientific American – report this as “there’s about a 2 in 10,000 chance that random background radiation produced the signal”. It’s nothing of the sort.

Yes, the probability of exceeding 3.5 sigma (known as the p-value) is actually 2.3 in 10,000. But that’s not the probability that the signal was produced by random background. It’s the probability that random background would produce the signal. Not the same thing at all.
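The arithmetic behind those numbers fits in a few lines. This is a sketch that assumes, as the quoted significance implies, that the Poisson fluctuation can be treated as Gaussian and that the p-value is the one-sided tail; the variable names are mine.

```python
import math

observed, expected = 285, 232

sigma = math.sqrt(expected)               # Poisson standard deviation
n_sigma = (observed - expected) / sigma   # size of the excess in units of sigma

def one_sided_p(z):
    """Probability that a unit Gaussian fluctuates above z."""
    return 0.5 * math.erfc(z / math.sqrt(2))

p_at_3p5 = one_sided_p(3.5)       # p-value for exactly 3.5 sigma
p_observed = one_sided_p(n_sigma)  # p-value for the unrounded excess

print(f"sigma = {sigma:.1f} events, excess = {n_sigma:.2f} sigma")
print(f"p-value at 3.5 sigma = {p_at_3p5:.2e}")
print(f"p-value for the observed excess = {p_observed:.2e}")
```

The unrounded excess is a shade under 3.5 sigma, so its p-value comes out slightly larger than the 3.5 sigma figure.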

What’s the difference? Well, if you buy a lottery ticket there is, according to Wikipedia, a 1 in 7,509,578 chance of winning a million pounds. Fair enough. But now you meet a millionaire and ask “What is the chance they got that way through last week’s lottery?” it’s certainly not 1 in 7,509,578.

There are several paths to riches: inheritance, business and of course previous lottery winners who haven’t spent it all yet. The probability that some plutocrat got that way through a particular week’s lottery depends not just on that 1 in 7,509,578 number but on the number of people who buy lottery tickets, and the number of millionaires who made their pile by other means. (It’s then just given by Bayes’ theorem – I’ll spare you the formula.) You can’t find the answer by just knowing p, you need all the others as well.

There is a 1 in 7 chance that your birthday this year falls on a Wednesday, but if today is Wednesday, the probability that it’s your birthday is not 1 in 7. Your local primary school teacher is probably a woman, but most women are not primary teachers. All crows are black, but not all black birds are crows. Everyday examples are all around. For instance – to pick an analogous one – if you see strange shapes in the sky this could be due to either flying saucers or to unusual weather conditions. Even if a meteorologist calculates that such conditions are very very unusual, you’ll still come down in favour of the mundane explanation.

So going back to the experiment, the probability that random background would give a signal like this may be about 2 in 10,000 but that’s not the probability that this signal was produced by random background: that also depends on the probabilities we assign to the mundane random background or the exotic axion. Despite this 2 in 10,000 figure I very much doubt that you’d find a particle physicist outside the Xenon1T collaboration who’d give you as much as even odds on the axion theory turning out to be the right one. (Possibly also inside the collaboration, but it’s not polite to ask.)
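To put rough numbers on this, here is Bayes’ theorem applied to the two competing explanations. Everything except the p-value of 2.3 in 10,000 is invented for illustration: the prior belief in new physics (1 in 1000) and the chance that new physics, if real, would give an excess like this (50%) are assumptions, not anyone’s measured values. Using the p-value as the probability of the data given background is also a simplification.

```python
def posterior_background(p_value, prior_new, likelihood_new):
    """Probability that the excess is background, given that it was seen.

    p_value        : P(excess this big | background only)
    prior_new      : prior probability of the exotic explanation (assumed)
    likelihood_new : P(excess this big | exotic explanation) (assumed)
    """
    prior_background = 1.0 - prior_new
    from_background = p_value * prior_background
    from_new = likelihood_new * prior_new
    return from_background / (from_background + from_new)

P_VALUE = 2.3e-4

# A sceptic who gives new physics a 1 in 1000 prior chance:
sceptic = posterior_background(P_VALUE, prior_new=1e-3, likelihood_new=0.5)

# Someone who starts at even odds:
even_odds = posterior_background(P_VALUE, prior_new=0.5, likelihood_new=0.5)

print(f"sceptic:   P(background | excess) = {sceptic:.2f}")
print(f"even odds: P(background | excess) = {even_odds:.1e}")
```

With the sceptical prior the chance that the excess is just background comes out at around 30%, a long way from 2 in 10,000. The answer moves with the prior, which is exactly why p on its own cannot give it.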

This is a very common mistake – every announcement of an anomaly comes with its significance reported in terms of the number of sigma, which somebody helpfully translates into the equivalent p-value, which is then explained wrongly, with language like “the probability of the Standard Model being correct is only one in a million” instead of “the probability that the Standard Model would give a result this weird is only one in a million”. When you’re communicating science then you use non-technical language so people understand – but you should still get the facts right.

Tips for speakers #2: Beware of the second slide

In a million and one grad student talks the second slide looks like this: the table-of-contents or the outline-of-the-talk. It may be a bit more colourful, with banners and logos and exciting pictures, but it’s basically the same, and the speaker will repeat the traditional phrases “After an introduction and a survey of the literature, I’ll describe the methodology we used…”

By this stage, one minute into the talk, the members of the audience are all thinking “Here’s another grad student talk like a million others… so predictable.” and their attention will wander to their unanswered emails, or their plans for dinner, or an attractively-filled T shirt two rows in front, and the poor speaker has got to work really hard to get them back.

Do you need a contents slide at all? It’s not compulsory. Even though some presentation packages provide it almost by default, with sections and subsections, you don’t have to have one. Before you include it you should weigh up the reasons for and against.

Against:

It cuts into your time allocation.
It disrupts the flow of the talk as, by definition, it stands outside the narrative.
It will tend to shift the focus onto you as the speaker rather than on the material

On the other hand:

It can give structure to an otherwise amorphous talk
It can help the audience keep track of a complicated sequence of ideas

So its inclusion or exclusion depends on the talk length, the nature of the material, the links between you and your audience, and your personal style. In a 10 minute conference oral where you’re developing one idea it’s almost certainly not wanted. In a one hour seminar covering disparate but linked topics it could be really useful. If you’re going to include an outline, that should be a conscious decision, not just something you feel you ought to do.

If you do decide to include one, then make it work for you. Refer back to it during the talk, showing the audience where they’ve got to on the map you set out at the start. (There are some Beamer themes that do this automatically – Berkeley is a standard. UpSlide does it for PowerPoint. But it’s easy to do it by hand.) If you’re including an outline then make full use of it.

The final point is: if you’ve decided to include a table of contents then customize it. Make it your own and unique so that it’s not just the same as every other grad student talk. Here’s a revised version of that original outline slide (with some invented details). It’s the same slide: the first bullet is the introduction, the second is the literature search, and so on. But don’t call them that. Fill in your details in the generic slots, and that will keep the audience engaged and attentive and get them into your world and your language from the start.

Probability and job applications

In preparing a recent talk on probability for physics grad students – it’s here if you’re interested – I thought up a rather nice example to bring out a key feature of frequentist probability. I decided not to include it, as the talk was already pretty heavy, but it seemed too good an illustration to throw away. So here it is.

Suppose you’ve applied for a job. You make the short list and are summoned for interview. You learn that you’re one of 5 candidates.

So you tell yourself – and perhaps your partner – not to get too hopeful. There’s only a 20% probability of your getting the job.

But that’s wrong.

That 20% number is a joint property of yourself and what statisticians call the collective, or the ensemble. Yes, you are one of a collective of 5 candidates, but those candidates are not all the same.

Let me tell you – from my experience of many job interviews, good and bad, on both sides of the table – about those 5 candidates.

One candidate will not turn up for the interview. Their car will break down, or their flight will be cancelled, or they will be put in Covid-19 quarantine. Whether their subconscious really doesn’t want them to take this job, or they have a guardian angel who knows it would destroy them, or another candidate is sabotaging them, or they’re just plain unlucky, they don’t show. There’s always one.

A second candidate will be hopeless. They will have submitted a very carefully prepared CV and application letter that perfectly match everything in the job specification, bouncing back all the buzz-words and ticking all the boxes so that HR says they can’t not be shortlisted. But at the interview they turn out to be unable to do anything except repeat how they satisfy all the requirements, and they’ll show no signs of real interest in the work of the job apart from the fact that they desperately want it.

The third candidate will be grim. Appointable, but only just above threshold. The members of the panel who are actually going to work with them are thinking about how they’re going to have to simplify tasks and provide support and backup, and how they really were hoping for someone better than this.

Candidate four is OK. Someone who understands the real job, not just the job spec in the advert, and who has some original (though perhaps impractical) ideas. They will make a success of the job and though there will be occasional rough patches they won’t need continual support.

Candidate five is a star. Really impressive qualification and experience on paper, glowing references, and giving a superb interview performance, answering questions with ease and enthusiasm and using them to say more. They will certainly get offered the job – at which point they will ask for a delay, and it will become clear that they’re also applying for a much better job at a superior institution, and that they don’t really want this one which is only an insurance in case they don’t get their top choice.

So there are the five. (Incidentally, they are distributed evenly between genders, backgrounds and ethnicities). Don’t tell yourself your chance is 20%. That’s true only in the sense that your chance of being male (as opposed to female) is 50%. Which it is, as far as I’m concerned, but certainly not as far as you’re concerned.

Instead ask yourself – which of the five candidates are you?

(If you don’t know, then you’re candidate #3)

Tips for speakers #1: don’t thank the audience for their attention

One problem anyone faces in putting any sort of talk together is how to finish. And a depressingly large number of speakers do so with a slide like this

This way of ending a talk came originally, I think, from Japan. And unless you are Japanese you should never use it. A Japanese speaker has centuries of proud samurai tradition behind them, and when they say “thank you for your attention” what they mean is

If you are not Japanese this does not work. Instead the message conveyed is

Which is not a good way to finish.

And this throws away a golden opportunity. The end of the talk is the point at which you really have the attention of the audience. This may not be for the best of reasons – perhaps they want to hear the next speaker, or to go off for much-needed coffee, but when you put your conclusions slide up your listeners’ brains move up a gear. They look up from the email on their laptops and wonder what’s next. So your final message is the one with the best chance of being remembered.

Give them the pitch that you hope they’ll take away with them.

“So we have the best results yet on ….”

“So we have the prospect of getting the best results on … in time for next year’s conference”

“There are going to be many applications of this technique”

“We understand the whole process of … a lot better”

Whatever’s appropriate. Be positive and upbeat and, even if they’ve been asleep for the past 20 minutes, they will go away with a good feeling about your work, your talk, and your ability as a speaker.

(See what I just did??)

LHCb clocks up 8 Inverse femtobarns

The LHCb experiment has just announced that it’s accumulated 8 inverse femtobarns of data. The screen shows the result and the ongoing totals.

It’s obviously a cause for celebration, but maybe this is an opportunity to explain the rather obscure ‘inverse femtobarn’ unit.

Let’s start with the barn. It’s a unit of area. 10^-28 square metres, so rather small. It was invented by the nuclear physicists to describe the effective target-size corresponding to a particular nuclear reaction. When you fire beam particles at target particles then all sorts of things can happen, with probabilities predicted by complicated quantum mechanical calculations, but those probabilities can be considered as if each reaction had its own area on the target nucleus: the bigger the area the more likely the reaction. It’s not literally true, of course, but the dimensions are right and you can use it as a model if you don’t push it too far. Nuclear cross sections, usually called σ, are typically a few barns, or fractions of a barn, so it’s a handy unit.

No, it’s not named after some Professor Barn – it’s probably linked to expressions like “Couldn’t hit the broad side of a barn” or even “Couldn’t hit a barn door with a banjo!”

Particle physicists took this over, though our cross sections were typically smaller – millibarns (10^-3 barns) for strong interaction processes, microbarns (10^-6) and nanobarns (10^-9) for electromagnetic processes such as were measured at PETRA and LEP. Beyond that lie picobarns (10^-12) and femtobarns (10^-15). Only the neutrino physicists stuck with m² or cm² as their cross sections are so small that even using attobarns can’t give you a sensible number. So a femtobarn is just a cross section or area, 10^-43 square metres.

In a colliding beam storage ring like the LHC, luminosity measures the useful collision rate: it depends on how many particles are in the beams, how tightly the beams have been focussed and how well they are aligned when they collide. The event rate is the product of the cross section and the luminosity, R=Lσ, so luminosity is what accelerator designers set out to deliver. Integrated luminosity is just luminosity integrated over time, and ∫L dt = N/σ, as the integrated rate is just the total number of events, ∫R dt = N.

So there we have it. An integrated luminosity of 8 inverse femtobarns (written 8 fb^-1 or 8 (1/fb) ) means that for a cross section as tiny as 1 fb, one would expect to see, on average, 8 events. 8 is a measurable number – though that depends on backgrounds – but what this is saying is that we can detect (if they’re there) rare processes that have cross sections of order 1 fb. That’s the sort of number that many Beyond-the-Standard-Model theories come up with. It’s pushing back the frontier.

If you look at the screen you can see the recorded number as 8000 pb^-1. Yes, an inverse femtobarn is bigger than an inverse picobarn. Which is obvious if you think about it, but disconcerting until you do.

Another point to take from the picture is that the LHC accelerator actually delivered more. 8769.76 pb^-1 were delivered to get the 8000 taken. Some loss is inevitable, due to time lost to ramping voltages and detector calibration, and the overall efficiency of over 90% is pretty good.
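The bookkeeping is simple enough to write down. This sketch uses N = σ × ∫L dt together with the two totals quoted from the screen; the 1 fb cross section is just an example value and the function and variable names are mine.

```python
FB_PER_PB = 1000.0  # 1 pb = 1000 fb, so 1 fb^-1 = 1000 pb^-1

def expected_events(cross_section_fb, integrated_lumi_inv_fb):
    """Mean number of events: N = sigma * (integrated luminosity)."""
    return cross_section_fb * integrated_lumi_inv_fb

recorded_inv_pb = 8000.0
delivered_inv_pb = 8769.76

# Converting to inverse femtobarns makes the number smaller, not bigger.
recorded_inv_fb = recorded_inv_pb / FB_PER_PB

n_events = expected_events(1.0, recorded_inv_fb)   # example: a 1 fb process
efficiency = recorded_inv_pb / delivered_inv_pb    # recorded / delivered

print(f"recorded: {recorded_inv_fb} fb^-1")
print(f"expected events for a 1 fb process: {n_events}")
print(f"recording efficiency: {efficiency:.1%}")
```

The same function shows why the frontier moves with more data: doubling the integrated luminosity doubles the expected count for any given cross section.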

So it’s a landmark. But these BSM processes haven’t shown up yet and until they do we need to keep taking data – and increase the luminosity. Both the LHCb detector and the LHC accelerator are working hard to achieve this – both are needed, as the detector has to be able to handle the increased data it’s being given. So we’ve passed a milestone, but there’s still a long and exciting road ahead.

Why is the sky blue?

This is a stock ‘Ask the physicist’ question and most physicists think they know the answer. Actually, they only know half the story.

The usual response is “Rayleigh Scattering”. On further probing they will remember, or look up, a formula like

I = I₀ · 8π⁴α²(1 + cos²θ) / (λ⁴R²)

for the intensity of light scattered at an angle θ by molecules of polarisability α a distance R away, where λ is the wavelength and I₀ the incident intensity.

The key point to this formula is that the intensity is proportional to the inverse 4th power of the wavelength. Light’s oscillating electric field pushes all the electrons in a molecule one way and all the nuclei the other, so the molecule (presumably Nitrogen or Oxygen in the stratosphere) responds like any simple harmonic oscillator to a forced oscillation well below its resonant frequency. These oscillating charges act as secondary radiators of EM waves, and higher frequencies radiate more. Visible light varies in frequency by about a factor of 2 over the spectrum (actually a bit less but 2 will do – we speak of the ‘octave’ of visible EM radiation) so violet light is scattered 16 times as much as red light. So when we look at the sky we see light from the sun that’s been scattered by molecules, and it’s dominated by higher frequencies / shorter wavelengths so it has a blue colour – not completely violet as there is still some longer wavelength light present.
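To put numbers on the inverse 4th power, here is a small sketch. The function name is made up for the illustration, and the wavelengths (400 and 700 nm for the edges of the visible range, 450 and 650 nm for typical blue and red) are assumed round values, not figures from the text.

```python
def rayleigh_ratio(short_wavelength_nm, long_wavelength_nm):
    """How much more strongly the shorter wavelength is scattered.

    Uses only the 1/wavelength^4 dependence; every other factor in the
    Rayleigh formula cancels in the ratio.
    """
    return (long_wavelength_nm / short_wavelength_nm) ** 4

# The idealised 'octave': a factor of 2 in wavelength.
print(rayleigh_ratio(1.0, 2.0))
# Assumed edges of the visible spectrum.
print(rayleigh_ratio(400.0, 700.0))
# Assumed typical blue versus typical red.
print(rayleigh_ratio(450.0, 650.0))
```

The first call reproduces the factor of 16. The more realistic ranges give roughly 9 and roughly 4, which is the "actually a bit less" caveat in action. The effect is still large.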

This also explains why light from the sun at sunset (and sunrise), which travels a long way through the atmosphere to get to us, appears redder, having lost its short wavelength component to provide a blue sky for those living to the west (or east) of where we are. It also predicts and explains the polarisation of the scattered blue light: there are dramatic effects to be seen with a pair of polarised sunglasses, but this post’s quite long enough without going into details of them.

Most explanations stop there. This is just as well, because a bit more thought reveals problems. Why don’t light rays show this behaviour at ground level? Why don’t they scatter in solids and liquids, in glass and water? There are many more molecules to do the scattering, after all, but we don’t see any blue light coming out sideways from a light beam going through a glass block or a glass of water.

The reason appears when we consider scattering by several molecules. They are excited by the same EM wave and all start oscillating; the oscillations are secondary sources of EM radiation which could be perceived by an observer – except that they are, in general, out of phase. The light from source to molecule to observer takes different optical paths for each molecule, and when you add them all together they will (apart from statistical variations which are insignificant) sum to zero. To put it another way, when you shine a light beam through a piece of glass and look at it from the side, you perceive different induced dipoles, but half will point up and half will point down, and there is no net effect. The random phase differences only disappear if you look directly along or against the direction of the beam – against the beam the secondary sources combine to give a reflected ray; along the beam their combined effect is out of phase with the original ray and their sum slips in phase – making the light beam slow down.

light scattered by different molecules is out of phase and sums to zero

So we’re stuck. One molecule is not enough to turn the sky blue; you need many. But many molecules co-operate in such a way that there is no side scattering. Dust was once suggested as the reason but dust is only present in exceptional circumstances like after volcanic eruptions.

The only way to do it would be if the molecules grouped together in clusters. Clusters small compared to the wavelength of visible photons, but separated by distances large compared to their coherence length. Why would they ever do that?

But they do. Molecules in a gas – unlike those in a solid or liquid – are scattered in random positions and form clusters by sheer statistical variation. This clustering is enhanced by the attractive forces between molecules – the same forces that make them condense into a liquid at higher pressures / lower temperatures. So the fluctuations in density in the stratosphere are considerable; their size is small and their separation is large, and it’s these fluctuations in molecular density that give us the bright blue sky.

The figure shows (very schematically) how this happens. In the first plot the density is very low and the few molecules are widely separated. In the second the density is higher and even though the distribution shown here is random, clusters emerge due to statistical fluctuations. In the third plot these clusters are enhanced by attraction between the molecules. In the final plot the density is so high they form a solid (or liquid).
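The contrast between smooth and fluctuating densities can be seen in a toy calculation. This is a sketch under loud assumptions: it is one-dimensional, each scatterer contributes a unit phasor, and its optical path difference is taken to be simply its position along a line. Evenly spaced scatterers, a tenth of a wavelength apart, are a crude stand-in for a uniform dense medium. All function names are invented for the illustration.

```python
import cmath
import math
import random

def side_scattered_intensity(positions, wavelength):
    """|sum of unit phasors|^2, each phase set by the scatterer's position.

    Toy assumption: the path difference for a scatterer is just its
    coordinate x, so its phase is 2*pi*x/wavelength.
    """
    k = 2.0 * math.pi / wavelength
    amplitude = sum(cmath.exp(1j * k * x) for x in positions)
    return abs(amplitude) ** 2

def evenly_spaced(n, length):
    """n scatterers at regular intervals: no density fluctuations."""
    return [length * i / n for i in range(n)]

def randomly_placed(n, length, rng):
    """n scatterers dropped independently at random, as in a dilute gas."""
    return [rng.uniform(0.0, length) for _ in range(n)]

def mean_random_intensity(n, length, wavelength, trials, seed=1):
    """Average intensity over many independent random arrangements."""
    rng = random.Random(seed)
    total = sum(
        side_scattered_intensity(randomly_placed(n, length, rng), wavelength)
        for _ in range(trials)
    )
    return total / trials

n, wavelength = 500, 1.0
length = 50.0 * wavelength  # a region many wavelengths across

print("single scatterer:", side_scattered_intensity([0.0], wavelength))
print("evenly spaced:   ", side_scattered_intensity(evenly_spaced(n, length), wavelength))
print("random, averaged:", mean_random_intensity(n, length, wavelength, trials=400))
```

The evenly spaced arrangement gives an intensity that is zero to rounding error, which is the cancellation that keeps a glass block dark from the side. The random arrangements give, on average, an intensity close to n times that of a single scatterer. That surviving signal comes entirely from the statistical fluctuations in where the scatterers happen to sit.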

This puzzle was not solved by Rayleigh but by – yet again – Albert Einstein. In 1910 he explained (Annalen der Physik 33 1275 (1910)) the size and nature of the density fluctuations in a gas, and showed how the theory explained the phenomenon of critical opalescence, when gases turn milky-white at the critical point, and that the sky was an example of this. It doesn’t even count as one of his ‘great’ papers – though it does follow on from his 1905 annus mirabilis paper on Brownian motion. He showed that our blue sky comes from light scattering not just off molecules, but off fluctuations in the molecular density.

So if anyone ever asks you why the sky is blue, be sure to give them the full story.

The Lorentz Transformation – a minimal proof

You can find many ways in the textbooks to derive the Lorentz Transformation, starting from Einstein’s famous two postulates: that the laws of physics are the same in all inertial frames, and that the speed of light is a constant. You can do it in one big chunk, or by starting with length contraction and time dilation.

What I want to do here is show a proof which requires only one, surprisingly minimal, assumption, and which relegates ‘light’ to its proper place as a subsidiary phenomenon. This is the opposite of the order which is usually taught, so this is not the sort of proof you get in Relativity 101, but after you’ve learnt and are happy with the standard proofs, I think you’ll appreciate this one.

We make some basic assumptions – as indeed we do in a conventional proof, though they’re not usually spelt out. Events occur in continuous time t and continuous space r, though for simplicity we’ll just consider one space dimension x. Space and time are isotropic and homogeneous – there are no special times or places. We can plot events in space-time diagrams, where the t axis is calibrated using repeated identical processes like the swing of a pendulum or the vibrations of a crystal, and the x axis is calibrated using stationary identical rods.

Events cause, and are caused by, other events. For a pair of events A and B it could be that A→B, A has a (possible) effect on B, or that B→A, B has a (possible) effect on A. In the first case we say that A lies in the past of B, and B is in the future of A. In the second case it’s the other way round. We dismiss the possibility that both A→B and B→A, as that leads to paradoxes of the killing-your-grandfather variety. But what about the possibility that neither A→B nor B→A: that there can be pairs of events for which neither can influence the other?

There’s not an obvious answer. If you were designing a universe you could insist that any pair of events must have a causal connection one way or the other, or you could allow the equivalent of the ‘Don’t know’ box. The choice is not forced on us by logic. But let’s suppose that we do live in a universe where this directed link between events is optional rather than compulsory:

There are pairs of events which are not causally connected.

I promised you a single assumption: there it is. Now let’s build on it.

For any event there must be some events which are not causally connected. The assumption says this is true for some events, but all events must be similar (as space and time are homogeneous), so this is true in general. So we can draw a space-time diagram showing the events that are past, future, and elsewhere for an event at the origin.

Causality is transitive: if A→B and B→C then A→C, as A can influence C through B. That means that at any particular point x, events that are in A’s past must be followed by elsewhere events and then future events. They can’t be mixed up.

The events occur in defined regions

The Elsewhere region extends to the origin

Even at small distances there must be elsewhere events. If there were some minimum distance Δ from A within which all events were either past or future, and B is the event at Δ on the division between past and future, then the same argument applied to B means that all events within 2Δ of A must be in the past or future, and so on for 3Δ, 4Δ, 5Δ… That would leave no elsewhere events at all.

The lines separating the past, elsewhere and future regions must be straight lines going through the origin. For any point B on the future light cone of A, the line separating B’s elsewhere and future must have the same gradient as the light cone for A at x=0. But the future light cone of B defines the future light cone of A. So the gradient must be constant all the way. (The same applies for the past light cone, and symmetry requires that the gradient have the same magnitude.)

So to re-cap: first we establish that there are elsewhere events, then that they lie in regions, then that these regions go all the way to the origin, and finally that the shape of the elsewhere region is a simple double wedge. (It’s called a ‘light cone’ as you can imagine extending the picture to two space dimensions by rotating these 2D pictures about the vertical axis, but you probably knew that already.)

Out of this picture a number emerges: the gradient of the line dividing the elsewhere region from the future (or the past). We have no way of knowing what its value is – only that it is finite. It describes the speed of the fastest possible causal signal and we will, of course, denote it by c. It can be viewed as a fundamental property of the universe, or as a way of relating time measurement units to space ones.

Now we’re on more familiar ground. If an event that we denote by (x,t) is observed by someone in a different inertial frame moving at some constant speed relative to the first, they will ascribe different numbers (x’,t’). What is the transformation (x,t)→(x’,t’)?

Let’s assume that zeros are adjusted so that (0,0) is just (0,0). That’s trivial.
We require that vector equations remain true: if (x_A,t_A)=(x_B,t_B)+(x_C,t_C) then (x’_A,t’_A)=(x’_B,t’_B)+(x’_C,t’_C). That limits us to linear transformations x’=Ax+Bt; t’=Cx+Dt. So the transformation is completely described by 4 parameters A,B,C and D.
The inverse transform (x’,t’) to (x,t) must be the same, except that the direction of the speed has changed. That’s the equivalent of changing the sign of x or t. So x=Ax’-Bt’; t=-Cx’+Dt’. The transformation to the new frame and back again must take us exactly back to what we started with, i.e. A(Ax+Bt)-B(Cx+Dt)=x. From which we must have A=D and A²-BC=1. The four parameters are reduced to two.
Finally we impose the requirement that the new co-ordinates (x’,t’) must lie in the same sector (past, future, or elsewhere) as the old. In particular, if x=ct then x’=ct’. That means Act+Bt=c(Cct+Dt) and using A=D from the previous paragraph, this shows B=c²C. The two parameters are reduced to one. This is most neatly expressed by introducing v=-B/A, as then A²-BC=1 gives our old friend A=1/√(1-v²/c²) and substituting A, B, C and D gives the familiar form of the Lorentz transformations.
The Lorentz Transformation
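Written out, the substitution goes as follows. The symbol γ is introduced here purely as shorthand for A and is not used elsewhere in this post; everything else follows from v=-B/A, B=c²C, A=D and A²-BC=1.

```latex
\begin{aligned}
A &= D = \gamma \equiv \frac{1}{\sqrt{1 - v^2/c^2}}, \qquad
B = -\gamma v, \qquad
C = \frac{B}{c^2} = -\frac{\gamma v}{c^2}, \\
x' &= Ax + Bt = \gamma\,(x - vt), \\
t' &= Cx + Dt = \gamma\left(t - \frac{v x}{c^2}\right).
\end{aligned}
```

As a check, A² - BC = γ² - γ²v²/c² = γ²(1 - v²/c²) = 1, as required.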

Inspecting these shows that v, which we introduced as a parameter, describes the motion of the point x’=0, the origin of the primed frame, in the original frame, i.e. the speed of one frame with respect to the other.

A bit of algebra shows that the ‘interval’ of an event is the same in both frames: x²-c²t²=x’²-c²t’². Which is neat, showing that the points lie on a hyperbola of which the light-cone crossed-lines is the limiting case, so they cannot move between sectors. But we didn’t have to assume that the interval is unchanged, only that an interval of zero remains zero.
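If you would rather let a computer do the bit of algebra, the claims can be spot-checked numerically. This is a sketch, not a proof. It assumes the standard form x’=A(x-vt), t’=A(t-vx/c²) with A=1/√(1-v²/c²), and it picks units in which c=1, which is a choice made here for convenience. The function names are invented for the illustration.

```python
import math

def lorentz(x, t, v, c=1.0):
    """Transform event (x, t) to a frame moving at speed v; needs |v| < c."""
    a = 1.0 / math.sqrt(1.0 - (v / c) ** 2)
    return a * (x - v * t), a * (t - v * x / c**2)

def interval(x, t, c=1.0):
    """The quantity x^2 - c^2 t^2 that should be frame-independent."""
    return x**2 - (c * t) ** 2

x, t, v = 3.0, 2.0, 0.6      # an arbitrary 'elsewhere' event and frame speed
xp, tp = lorentz(x, t, v)

# The interval should be the same in both frames.
print(interval(x, t), interval(xp, tp))
# Transforming back with -v should recover the original event.
print(lorentz(xp, tp, -v))
# An event on the light cone (x = ct) should stay on it (x' = ct').
print(lorentz(5.0, 5.0, v))
```

The two intervals agree, the round trip returns the starting event, and the light-cone event keeps x’=ct’, all up to floating-point rounding.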

So the Lorentz Transformation springs from the basic causal structure of space-time, assuming that not all events are causally connected one way or the other, with c the speed of the fastest causal signal, whatever that happens to be. Length contraction and time dilation follow from this. Then you discover that if you have Coulomb’s Law type electrostatics the Lorentz Transformations give you magnetism and Maxwell’s Equations emerge. These have wavelike solutions with wave velocity c.

In terms of logical argument, the causal structure of the universe just happens to include the possibility that 2 events cannot affect one another in either way. This fundamental property leads to relativity and the Lorentz Transformation, which leads to electromagnetism, which then leads to EM waves and light, even though historically and pedagogically the sequence is presented the other way round.