It seems unthinkable today

As a new lecturer in the early 1980s, I soon learnt that the first meeting of the 3rd year examiners was the focal point of the physics department’s year. All the academics would be there: attendance was higher than at any seminar. Because this was the meeting that mattered.

Exams were over, and the marks had been collected and aggregated. Now the final-year students were to be awarded their degree classifications – a decision defining them for the rest of their lives. This was done to a clear scheme: 70% or above was a first, 60% a 2-1, and so on. Anyone making the threshold when all their marks were added got the degree, no question. But what about those just below the line with 69.9% or 58.8%? We reckoned we couldn’t mark more accurately than 2%, so anyone within that margin deserved individual consideration. The external examiner, plus a couple of internal assistants, would examine borderline candidates orally, typically going over a question in which they’d done uncharacteristically badly, to give them the chance to redeem the effects of exam panic or taking a wrong view. It was grim for the students, but we did our best, talking science with them, as one physicist to another, trying to draw out the behaviour characteristic of a 1st class (or 2-1 or…) student.

But not all students in the borderlines could be interviewed. There were too many for that, if the examining panel was to do the thorough job each candidate deserved. So a selection had to be made, and that’s what this meeting was for. Starting with students scoring 69.99 and working down the list, the chair would ask those who knew the student – their tutors, director of studies, and anyone who had been in contact with them during their 3 year course – whether they thought this candidate was in the right place, or if they deserved a shot at the rung above. Those of us who knew an individual would give our opinion – usually in the upward direction, but not always. Medical evidence and other cases of distress were presented. On the basis of all this information, the meeting would decide on the interview lists.

As we worked down from 69.99 to 67.00 the case for interview got harder to make. Those with inconsistent performance – between papers, between years – got special attention. This was done at all the borderlines (and in exceptional circumstances for some below the nominal 2% zone).

We were too large a department for me or anyone to know all the students, but we would each know a fair fraction of them, one way or another, with a real interest in their progress and this, their final degree. So it mattered. We were conscientious and careful, and as generous as we could be. At the end of the meeting, which would have lasted more than 2 hours, there was the cathartic feeling of a job well done.

The 2nd meeting of the 3rd year examiners would follow some days later. This was also well attended and important, but there was little opportunity for input. The interviews would have taken place, and the panel would make firm recommendations as to whether or not a student should be nudged up or left in place. The degree lists would be agreed and signed, and we would be done with that cohort of undergraduates and start preparing for the freshers who would replace them.


The university decided that exam marking should be anonymised. The most obvious effect was that the scripts had numbers rather than names, removing the only mildly interesting feature of the tedious business of marking. But a side effect was that the students in the examiners’ meetings became anonymised too. And if candidate 12345 has a score of 69.9%, I have no way of knowing whether this is my student Pat, keen and impressive in tutorials but who made a poor choice of a final year option, or Sam, strictly middle of the road but lucky in their choice of lab partner. There was no way for us to give real information about real people. The university produced sets of rules to guide the selection of candidates for interview, and all we could do was rubber-stamp the application of the rules. People gave up attending. Eventually I did too.

By this point some readers’ heads will have exploded with anger. This tale of primitive practices must sound like an account of the fun we used to have bear-baiting and cock-fighting, and the way drowning a witch used to pull the whole village together. Yes, we were overwhelmingly (though not completely) white and male, though I never heard anyone make an overtly racist or sexist comment about a candidate, and I am very sure that anyone who had done so would have been shouted down. We were physicists judging other physicists, and in doing that properly there is no room for any other considerations. There may have been subconscious influences – though we would, by definition, be unaware of that. I can hear the hollow laughter from my non-white and/or female colleagues when I tell them the process wasn’t biassed. But it wasn’t very biassed – and it could not move people down, it could only refrain from moving them up. Although the old system had to go as it was open to unfair discriminatory prejudice, I don’t believe that in our department (and I wouldn’t be prepared to speak for anywhere else) we were unfair. But perhaps you shouldn’t take my word for that.

So the old unfair system based on professional judgement has been replaced by a new unjust system based on soulless number-crunching. There is no good solution: while we draw any line to divide individuals into classes – particularly the 2-1/2-2 border in the middle of the mark distribution – and while we measure something as multidimensional as ‘ability’ by a single number, there are going to be misclassifications. I had hoped that when, thanks to data protection legislation, universities had to publish transcripts of all a student’s marks rather than just the single degree class, the old crude classification would become unimportant, but this shows no signs of happening.

There is no question that anonymous marking was needed. But any positive reform has some negative side effects, and this was one of them. The informed judgement of a community was replaced by a set of algorithms in a spreadsheet. And replacing personal and expert knowledge of students by numerical operations with spreadsheets is bound to bring injustices. Also a rare instance where the department acted as a whole, rather than as a collection of separate research groups, got wiped from existence.

Writing the abstract

Perhaps the abstract was once a brief summary of the full paper. That is now largely history. In these days of the information explosion the abstract’s purpose is to let the reader know whether they want to spend the time reading your whole paper – which may possibly involve them in the hassle of downloading it and even fighting a paywall.

So there are two aspects. You want to make it inviting: you want the right peer group to read and heed it, and in some cases you want the conference organisers to select it for a talk or poster. But you also need to inform those who wouldn’t find it relevant that they’d be wasting their time going further.

So it is not a summary. It is not a precis. It does not have to cover everything in your paper. You cannot assume the potential reader (who is probably scrolling down a long list of many such abstracts) will read your abstract all the way through: they will take a glance at the first couple of lines and only read further if you’ve caught their attention.

After writing and reading (and not reading) many abstracts, I have come to rely on the 4 sentence system. It gives a sure-fire mechanism for producing high quality abstracts, it does not involve any staring at a blank sheet of paper waiting for inspiration, and it is also flexible. It works for experimental and theoretical papers, and for simulations. It is good for the reader and the author.

The 4 Sentence Abstract

  1. What you did. This is the opening which will catch the reader’s eye and their attention. Keep it short and specific. Don’t mention your methodology. “We describe the 4 sentence system for writing an abstract.”
  2. Why this is important. This is why you chose to work on this topic, way back when. The core specialist readers will know this, of course, but will be happy to have their views confirmed and reinforced: for those in the field but not quite so specialised it may be necessary to justify the work you’ve done. “Many authors find it difficult to write their abstract, and many paper abstracts are long and unhelpful.”
  3. How your result improves on previous ones. This is your chance to big-up what you’ve done. You have more data, or better apparatus, or a superior technique, or whatever. Now you can mention your methodology, insofar as it’s an improvement on previous work. “Our technique provides an easy-to-use methodical system.”
  4. Give the result. If possible, the actual result, particularly if it’s a relatively straightforward measurement. If (but only if) you are submitting an abstract to a future conference and you haven’t actually got your results yet, you may have to paraphrase this as “Results for … are given.” “People using it spend less time writing, and the abstracts they produce are better.”

This is a starting framework which can be adapted. The 4 “sentences” can be split if necessary, their relative length and emphasis varied according to the paper they describe. But it fits pretty much every situation, and it gives a thematic organisation which matches the potential reader’s expectation. (You can write it in the first or third person, active or passive, depending on your preferences and the tradition of your field, provided you’re consistent.)

There is a lot of advice about abstracts around on the web. Much of it is, to my mind, unhelpful in that it sees the abstract through the eyes of the author, as a summary based on the paper, rather than through the eyes of a potential reader. I’ve taken to using the 4 sentences: what we did, why it matters, how it’s better, and the result. I now find writing abstracts quick and straightforward, and the results are pretty good.

Why computing can be complicated

It is amazing how simple computation can have profound complexity once you start digging.

Let’s take a simple example: finding the average (arithmetic mean) of a set of numbers. It’s the sort of thing that often turns up in real life, as well as in class exercises. Working from scratch you would write a program like (using C as an example: Python or Matlab or other languages would be very similar)

float sum=0;
for(int j=0;j<n;j++){
   sum += x[j];
}
float mean=sum/n;

which will compile and run and give the right answer until one day, eventually, you will spot it giving an answer that is wrong. (Well, if you’re lucky you will spot it: if you’re not then its wrong answer could have bad consequences.)

What’s the problem? You won’t find the answer by looking at the code.

The float type indicates that 32 bits are used, shared between a mantissa, an exponent and a sign, and in the usual IEEE 754 format that gives 24 bits of binary precision, corresponding to 7 to 8 decimal digits. Which in most cases is plenty.

To help see what’s going on, suppose the computer worked in base 10 rather than base 2, and used 6 digits. So the number 123456 would be stored as 1.23456 × 10⁵. Now, in that program loop the sum gets bigger and bigger as the values are added. Take a simple case where the values all just happen to be 1.0. Then after you have worked through 1,000,000 of them, the sum is 1000000, stored as 1.00000 × 10⁶. All fine so far. But now add the next value. The sum should be 1000001, but you only have 6 digits so this is also stored as 1.00000 × 10⁶. Ouch – but the sum is still accurate to 1 part in 10⁶. But when you add the next value, the same thing happens. If you add 2 million numbers, all ones, the program will tell you that their average is 0.5. Which is not accurate to 1 part in 10⁶, not nearly!

Going back to the usual but less transparent binary 24 bit precision, the same principles apply. If you add up millions of numbers to find the average, your answer can be seriously wrong. Using double precision gives 53 bit precision, roughly 16 decimal figures, which certainly reduces the problem but doesn’t eliminate it. The case we considered where the numbers are all the same is actually a best-case: if there is a spread in values then the smallest ones will be systematically discarded earlier.
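Here is a minimal sketch of that stall in binary, assuming float arithmetic is carried out in IEEE 754 single precision (the usual case on x86-64 and ARM64 builds). The helper name naive_mean_of_repeated is invented for this illustration: it averages n copies of one value, generated on the fly, so that no huge array is needed.

```c
/* Naive mean of n copies of `value`, accumulated in a single-precision total.
 * Hypothetical helper for illustration only: the "data" are generated on the
 * fly instead of being read from an array. */
float naive_mean_of_repeated(float value, long n)
{
    float sum = 0.0f;
    for (long j = 0; j < n; j++) {
        sum += value;  /* has no effect once sum is too large for value to register */
    }
    return sum / (float)n;
}
```

With value 1.0f the total should stop growing at 2^24 = 16,777,216, so for n = 33,554,432 (that is, 2^25) the function is expected to return 0.5 rather than 1.0, the binary counterpart of the six-digit decimal example.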

And you’re quite likely to meet datasets with millions of entries. If not today then tomorrow. You may start by finding the mean height of the members of your computing class, for which the program above is fine, but you’ll soon be calculating the mean multiplicity of events in the LHC, or distances of galaxies in the Sloan Digital Sky Survey, or nationwide till receipts for Starbucks. And it will bite you.

Fortunately there is an easy remedy. Here’s the safe alternative

float mean=0;
for(int j=0;j<n;j++){
     mean += (x[j]-mean)/(j+1);
}

Which is actually one line shorter! The slightly inelegant (j+1) in the denominator arises because C arrays start from zero. Algebraically they are equivalent because

$$\bar{x}_n = \frac{1}{n}\sum_{j=1}^{n} x_j = \frac{(n-1)\,\bar{x}_{n-1} + x_n}{n} = \bar{x}_{n-1} + \frac{x_n - \bar{x}_{n-1}}{n}$$

but numerically they are different and the trap is avoided. If you use the second code to average a sequence of 1.0 values, it will return an average of 1.0 forever.
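As a sketch, the running-mean loop can be wrapped in a small function. The name running_mean and the choice to return 0 for an empty array are assumptions made here, not part of the original snippet.

```c
#include <stddef.h>

/* Running (incremental) mean of x[0..n-1] in single precision.
 * The accumulator always stays the size of a typical value, so small
 * corrections are never swamped by a huge total.
 * Returns 0 for n == 0 (an arbitrary choice for this sketch). */
float running_mean(const float *x, size_t n)
{
    float mean = 0.0f;
    for (size_t j = 0; j < n; j++) {
        mean += (x[j] - mean) / (float)(j + 1);
    }
    return mean;
}
```

For the values 1, 2, 3, 4 the intermediate means are 1, 1.5, 2 and 2.5, and for any run of identical values the correction term is exactly zero after the first element. The price is one division per element instead of one at the end.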

So those (like me) who have once been bitten by the problem will routinely code using running averages rather than totals. Just to be safe. The trick is well known.

What is less well known is how to safely evaluate standard deviations. Here one hits a second problem. The algebra runs

$$\sigma^2 = \frac{n}{n-1}\left(\overline{x^2} - \bar{x}^2\right)$$

where the n/(n-1) factor, Bessel’s correction, just compensates for the fact that the squared standard deviation or variance of a sample is a biassed estimator of that of the parent. We know how to calculate the mean safely, and we can calculate the mean square in the same way. However we then hit another problem if, as often happens, the mean is large compared to the standard deviation.

Suppose what we’ve got is approximately Gaussian (or normal, if you prefer) with a mean of 100 and a standard deviation of 1. Then the calculation in the right hand bracket will look like

10001 – 10000

which gives the correct value of 1. However we’ve put two five-digit numbers into the sum and got a single digit out. If we were working to 5 significant figures, we’re now only working to 1. If the mean were ~1000 rather than ~100 we’d lose two more. There’s a significant loss of precision here.

If the first rule is not to add two numbers of different magnitude, the second is not to subtract two numbers of similar magnitude. Following these rules is hard because an expression like x+y can be an addition or a subtraction depending on the signs of x and y.

This danger can be avoided by doing the calculation in two passes. On the first pass you calculate the mean, as before. On the second pass you calculate the mean of (x-μ)² where the differences are sensible, of order of the standard deviation. If your data is in an array this is pretty easy to do, but if it’s being read from a file you have to close and re-open it – and if the values are coming from an online data acquisition system it’s not possible.
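The contrast between the one-pass textbook formula and the two-pass calculation can be sketched as follows. Both function names are invented for this illustration, and plain sums are used for the means because the test data are tiny; single-precision IEEE 754 arithmetic is assumed.

```c
#include <stddef.h>

/* Textbook formula: n/(n-1) * (mean of squares - square of mean).
 * Subtracts two nearly equal large numbers, so precision is lost. */
float variance_textbook(const float *x, size_t n)
{
    float sum = 0.0f, sumsq = 0.0f;
    for (size_t j = 0; j < n; j++) {
        sum   += x[j];
        sumsq += x[j] * x[j];
    }
    float mean   = sum / (float)n;
    float meansq = sumsq / (float)n;
    return (meansq - mean * mean) * (float)n / (float)(n - 1);
}

/* Two-pass version: first the mean, then the mean squared deviation.
 * The differences x[j] - mean are of the order of the standard deviation. */
float variance_two_pass(const float *x, size_t n)
{
    float sum = 0.0f;
    for (size_t j = 0; j < n; j++) {
        sum += x[j];
    }
    float mean = sum / (float)n;

    float ss = 0.0f;
    for (size_t j = 0; j < n; j++) {
        float d = x[j] - mean;
        ss += d * d;
    }
    return ss / (float)(n - 1);
}
```

For the three values 9999, 10000 and 10001 the true sample variance is 1. In single precision the squares no longer fit in 24 bits, so the textbook formula is expected to lose the answer entirely, while the two-pass version recovers it.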

And there is a solution. It’s called the Welford Online Algorithm and the code can be written as a simple extension of the running-mean program above

 // Welford's algorithm (sqrt needs #include <math.h>)
float mean=x[0];
float V=0;
for(int j=1;j<n;j++){
     float oldmean=mean;
     mean += (x[j]-mean)/(j+1);
     V += ((x[j]-mean)*(x[j]-oldmean) - V)/j;
}
float sigma=sqrt(V);

The subtractions and the additions are safe. The use of both the old and new values for the mean accounts algebraically, as Welford showed, for the change that the mean makes to the overall variance. The only differences from our original running average program are the need to keep track of both old and new values, and initially defining the mean as the first element (element zero), so the loop starts at j=1, avoiding division by zero: the variance estimate from a single value is meaningless. (It might be good to add a check that n>1 to make it generally safe.)
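One way to package this with the n>1 check is sketched below. The function name, the output pointers and the return-code convention are assumptions made for the illustration; the update lines are those of the algorithm above.

```c
#include <stddef.h>

/* Welford's online algorithm with a guard against n < 2.
 * Writes the mean and the sample variance (n-1 denominator) through the
 * output pointers. Returns 0 on success, -1 if fewer than two values are
 * supplied, in which case the outputs are left untouched. */
int welford(const float *x, size_t n, float *mean_out, float *var_out)
{
    if (n < 2) {
        return -1;  /* variance of a single value is meaningless */
    }
    float mean = x[0];
    float V = 0.0f;
    for (size_t j = 1; j < n; j++) {
        float oldmean = mean;
        mean += (x[j] - mean) / (float)(j + 1);
        V += ((x[j] - mean) * (x[j] - oldmean) - V) / (float)j;
    }
    *mean_out = mean;
    *var_out = V;   /* take sqrtf(V) from <math.h> to get sigma */
    return 0;
}
```

For the values 9999, 10000 and 10001 the running quantities are all small, exactly representable numbers, so this should give a mean of 10000 and a variance of 1 in a single pass, with no need to store or re-read the data.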

I had suspected such an algorithm should exist but, after searching for years, I only found it recently (thanks to Dr Manuel Schiller of Glasgow University). It’s beautiful and it’s useful and it deserves to be more widely known.

It is amazing how simple computation can have profound complexity once you start digging.

What’s wrong with Excel?

I just posted a tweet asking how best to dissuade a colleague from presenting results using Excel.

The post had a fair impact – many likes and retweets – but also a lot of people saying, in tones from puzzlement to indignation, that they saw nothing wrong with Excel and this tweet just showed intellectual snobbery on my part.

A proper answer to those 31 replies deserves more than the 280 character Twitter limit, so here it is.

First, this is not an anti-Microsoft thing. When I say “Excel” I include Apple’s Numbers and LibreOffice’s Calc. I mean any spreadsheet program, of which Excel is overwhelmingly the market leader. The brand name has become the generic term, as happened with Hoover and Xerox.

Secondly, there is nothing intrinsically wrong with Excel itself. It is really useful for some purposes. It has spread so widely because it meets a real need. But for many purposes, particularly in my own field (physics) it is, for reasons discussed below, usually the wrong tool.

The problem is that people who have been introduced to it at an early stage then use it because it’s familiar, rather than expending the effort and time to learn something new. They end up digging a trench with a teaspoon, because they know about teaspoons, whereas spades and shovels are new and unfamiliar. They invest lots of time and energy in digging with their teaspoon, and the longer they dig the harder it is to persuade them to change.

From the Apple Numbers standard example. It’s all about sales.

The first and obvious problem is that Excel is a tool for business. Excel tutorials and examples (such as that above) are full of sales, costs, overheads, clients and budgets. That’s where it came from, and why it’s so widely used. Although it deals with numbers, and thanks to the power of mathematics numbers can be used to count anything, the tools it provides to manipulate those numbers – the algebraic formulae, the graphs and charts – are those that will be useful and appropriate for business.

That bias could be overcome, but there is a second and much bigger problem. Excel integrates the data and the analysis. You start with a file containing raw numbers. Working within that file you create a chart: you specify what data to plot and how to plot it (colours, axes and so forth). The basic data is embellished with calculations, plots, and text to make (given time and skill) a meaningful and informative graphic.

The alternative approach (the spade or shovel of the earlier analogy) is to write a program (using R or Python or Matlab or Gnuplot or ROOT or one of the many other excellent languages) which takes the data file and makes the plots from it. The analysis is separated from the data.

Let’s see how this works and why the difference matters. As a neutral example, we’ll take the iris data used by Fisher and countless generations of statistics students. It’s readily available. Let’s suppose you want to plot the Sepal length against the Petal length for all the data. It’s very easy, using a spreadsheet or using a program

Using Apple Numbers (other spreadsheets will be similar) you download the iris data file, open it, and click on

  • Chart
  • Scatter-plot icon.
  • “Add Data”
  • Sepal Length column
  • Petal Length column

and get

In R (other languages will be similar) you read the data (if necessary) and then draw the desired plot

plot(iris$Sepal.Length, iris$Petal.Length)

and get

Having looked at your plot, you decide to make it presentable by giving the axes sensible names, by plotting the data as solid red squares, by specifying the limits for x as 4 – 8 and for y as 0 – 7, and removing the ‘Petal length’ title.

Going back to the spreadsheet you click on:

  • The green tick by the ‘Legend’ box, to remove it
  • “Axis”
  • Axis-scale Min, and insert ‘4’ (the other limits are OK)
  • Tick ‘Axis title’
  • Where ‘Value Axis’ appears on the plot, over-write with “Sepal Length (cm)”
  • ‘Value Y’
  • Tick ‘Axis title’
  • Where ‘Value Axis’ appears, over-write with “Petal Length(cm)”
  • “Series”
  • Under ‘Data Symbols’ select the square
  • Click on the chart, then on one of the symbols
  • “Style”
  • ‘Fill Color’ – select a nice red
  • ‘Stroke Color’ – select the same red

In R you type the same function with some extra arguments

plot(iris$Sepal.Length,iris$Petal.Length,xlab="Sepal length (cm)", ylab="Petal length (cm)", xlim=c(4,8), ylim=c(0,7), col='red', pch=15)

So we’ve arrived at pretty much the same place by the two different routes – if you want to tweak the size of the symbols or the axis tick marks and grid lines, this can be done by more clicking (for the spreadsheet) or specifying more function arguments (for R). And for both methods the path has been pretty easy and straightforward, even for a beginner. Some features are not immediately intuitive (like the need to over-write the axis title on the plot, or that a solid square is plotting character 15), but help pages soon point the newbie to the answer.

The plots may be the same, but the means to get there are very different. The R formatting is all contained in the line

plot(iris$Sepal.Length,iris$Petal.Length,xlab="Sepal length (cm)", ylab="Petal length (cm)", xlim=c(4,8), ylim=c(0,7), col='red', pch=15)

whereas the spreadsheet uses over a dozen point/click/fill operations. Which are nice in themselves but make it harder to describe what you’ve done – that left hand column up above is much longer than the one on the right. And that was a specially prepared simple example. If you spend many minutes of artistic creativity improving your plot – changing scales, adding explanatory features, choosing a great colour scheme and nice fonts – you are highly unlikely to remember all the changes you made, to be able to describe them to someone else, or to repeat them yourself for a similar plot tomorrow. And the spreadsheet does not provide such a record, not in the same way the code does.

Now suppose you want to process the data and extract some numbers. As an example, imagine you want to find the mean of the petal width divided by the sepal width. (Don’t ask me why – I’m not a botanist).

  • Click on rightmost column header (“F”) and Add Column After.
  • Click in cell G2, type “=”, then click cell C2, type “/”, then cell E2, to get something like this

(notice how your “/” has been translated into the division-sign that you probably haven’t seen since primary school. But I’m letting my prejudice show…)

  • Click the green tick, then copy the cell to the clipboard by Edit-Copy or Ctrl-C or Command-C
  • Click on cell G3, then drag the mouse as far down the page as you can, then fill those cells by Edit-Paste or Ctrl-V or Command-V
  • Scroll down the page, and repeat until all 150 rows are filled
  • Add another column (this will be H)
  • Somewhere – say H19 – insert “=” then “average(“,click column G , and then “)”. Click the green arrow
  • Then, because it is never good just to show numbers, in H18 type “Mean width ratio”. You will need to widen the column to get it to fit

Add two lines to your code:

> ratio=iris$Petal.Width/iris$Sepal.Width
> print(paste("Mean width ratio",mean(ratio)))
[1] "Mean width ratio 0.411738307332676"

It’s now pretty clear that even for this simple calculation the program is a LOT simpler than the spreadsheet. It smoothly handles the creation of new variables, and mathematical operations. Again the program is a complete record of what you’ve done, that you can look at and (if necessary) discuss with others, whereas the contents of cell H19 are only revealed if you click on it.

As an awful warning of what can go wrong – you may have spotted that the program uses “mean” whereas the spreadsheet uses “average”. That’s a bit off (Statistics 101 tells us that the mode, the mean and the median are three different ‘averages’) but excusable. What is tricky is that if you type “mean(” into the cell, this gets autocorrected to “median(“. What then shows when you look at the spreadsheet is a number which is not obviously wrong. So if you’re careless/hurried and looking at your keyboard rather than the screen, you’re likely to introduce an error which is very hard to spot.

This difference in the way of thinking is brought out if/when you have more than one possible input dataset. For the program, you just change the name of the data file and re-run it. For the spreadsheet, you have to open up the new file and repeat all the click-operations that you used for the first one. Hopefully you can remember what they are – and if not, you can’t straightforwardly re-create them by examining the original spreadsheet.

So Excel can be used to draw nice plots and extract numbers from a dataset, particularly where finance is involved, but it is not appropriate

  • If you want to show someone else how you’ve made those plots
  • If you are not infallible and need to check your actions
  • If you want to be able to consider the steps of a multi-stage analysis
  • If you are going to run the same, or similar, analyses on other datasets

and as most physics data processing problems tick all of these boxes, you shouldn’t be using Excel for one.

Why we’re teaching the Standard Model all wrong

In any description of the Standard Model of Particle Physics, from the serious graduate-level lecture course to the jolly outreach chat for Joe Public, you pretty soon come up against a graphic like this.

“Particles of the Standard Model”

It appears on mugs and on T shirts, on posters and on websites. The colours vary, and sometimes bosons are included. It may be – somewhat pretentiously – described as “the new periodic table”. We’ve all seen it many times. Lots of us have used it – I have myself.

And it’s wrong.

Fundamentally wrong. And we’ve known about it since the 1990s.

The problem lies with the bottom row: the neutrinos. They are shown as the electron, mu and tau neutrinos, matching the charged leptons.

But what is the electron neutrino? It does not exist – or at least if it does exist, it cannot claim to be a ‘particle’. It does not have a definite mass. An electron neutrino state is not a solution of the Schrödinger equation: it oscillates between the 3 flavours. Anything that changes its nature when left to itself, without any interaction from other particles, doesn’t deserve to be called an ‘elementary particle’.

That this changing nature happened was a shattering discovery at the time, but now it’s been firmly established over 20 years of careful measurement of these oscillations: from solar neutrinos, atmospheric neutrinos, reactors, sources and neutrino beams.

There are three neutrinos. Call them 1, 2 and 3. They do have definite masses (even if we don’t know what they are) and they do give solutions of the Schrödinger equation: a type 1 neutrino stays a type 1 neutrino until and unless it interacts, likewise 2 stays 2 and 3 stays 3.

So what is an ‘electron neutrino’? Well, when a W particle couples to an electron, it couples to a specific mixture of ν1, ν2, and ν3. That specific mixture is called νe. The muon and tau are similar. Before the 1990s, when the only information we had about neutrinos came from their W interactions, we only ever met neutrinos in these combinations so it made sense to use them. And they have proved a useful concept over the years. But now we know more about their behaviour – even though that is only how they vary with time – we know that the 1-2-3 states are the fundamental ones.
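In symbols, and only as a sketch: write the mixing coefficients as U_{ei}. These are elements of what is usually called the PMNS matrix, a name not used in this post, and conventions for the complex conjugation vary between textbooks. The electron-flavour state and its free evolution in time are then

$$|\nu_e\rangle = U_{e1}^{*}\,|\nu_1\rangle + U_{e2}^{*}\,|\nu_2\rangle + U_{e3}^{*}\,|\nu_3\rangle$$

$$|\nu_e(t)\rangle = \sum_{i=1}^{3} U_{ei}^{*}\, e^{-iE_i t/\hbar}\,|\nu_i\rangle$$

Each 1-2-3 state only picks up a phase, so it stays what it is. The mixture changes character because the three phases advance at different rates when the masses, and hence the energies E_i, differ.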

By way of an analogy: the 1-2-3 states are like 3 notes, say C, E and G, on a piano. Before the 1990s our pianist would only play them in chords: CE, EG and CG (the major third, the minor third and the fifth, but this analogy is getting out of hand…) As we only ever met them in these combinations we assumed that these were the only combinations they ever occurred in which made them fundamental. Now we have a more flexible pianist and know that this is not the case.

We have to make this change if we are going to be consistent between the quarks in the top half of the graphic and the leptons in the bottom. When the W interacts with a u quark it couples to a mixture of d, s and b. Mostly d, it is true, but with a bit of the others. We write $d' = U_{ud}\,d + U_{us}\,s + U_{ub}\,b$ and introduce the CKM matrix or the Cabibbo angle. But we don’t put d’ in the “periodic table”. That’s because the d quark, the mass eigenstate, leads a vigorous social life interacting with gluons and photons as well as Ws, and it does so as the d quark, not as the d’ mixture. This is all obvious. So we have to treat the neutrinos in the same way.

So if you are a bright annoying student who likes to ask their teacher tough questions (or vice versa), when you’re presented with the WRONG graphic, ask innocently “Why are there lepton number oscillations among the neutral leptons but not between the charged leptons?”, and retreat to a safe distance. There is no good answer if you start from the WRONG graphic. If you start from the RIGHT graphic then the question is trivial: there are no oscillations between the 1-2-3 neutrinos any more than there are between e, mu and tau, or u, c, and t. If you happen to start with a state which is a mixture of the 3 then of course you need to consider the quantum interference effects, for the νe mixture just as you do for the d’ quark state (though the effects play out rather differently).

So don’t use the WRONG Standard Model graphic. Change those subscripts on the bottom row, and rejoice in the satisfaction of being right. At least until somebody shows that neutrinos are Majorana particles and we have to re-think the whole thing…

Why can’t science journalists understand p-values?


The Xenon1T experiment has just announced a really interesting excess of events, which could be due to axions or some other new particle or effect. It’s a nice result from a difficult experiment and the research team deserve  a lucky break. Of course like any discovery it could be a rare statistical fluctuation which will go away when more data is taken. They quote the significance as 3.5 sigma, and we can actually follow this: they see 285 events where only 232 are expected: the surplus of 53 is just 3.5 times the standard deviation you would expect from Poisson statistics: 15.2 events, the square root of 232.

This is all fine. But the press accounts – as in, for example, Scientific American – report this as “there’s about a 2 in 10,000 chance that random background radiation produced the signal”. It’s nothing of the sort.

Yes, the probability of exceeding 3.5 sigma (known as the p-value) is actually 2.3 in 10,000. But that’s not the probability that the signal was produced by random background. It’s the probability that random background would produce the signal. Not the same thing at all.


What’s the difference? Well, if you buy a lottery ticket there is, according to Wikipedia, a 1 in 7,509,578 chance of winning a million pounds. Fair enough. But if you now meet a millionaire and ask “What is the chance they got that way through last week’s lottery?”, the answer is certainly not 1 in 7,509,578.

There are several paths to riches: inheritance, business and of course previous lottery winners who haven’t spent it all yet. The probability that some plutocrat got that way through a particular week’s lottery depends not just on that 1 in 7,509,578 number but on the number of people who buy lottery tickets, and the number of millionaires who made their pile by other means. (It’s then just given by Bayes’ theorem – I’ll spare you the formula.) You can’t find the answer by just knowing p, you need all the others as well.

There is a 1 in 7 chance that your birthday this year falls on a Wednesday, but if today is Wednesday, the probability that it’s your birthday is not 1 in 7. Your local primary school teacher is probably a woman, but most women are not primary teachers. All crows are black, but not all black birds are crows. Everyday examples are all around. For instance – to pick an analogous  one – if you see strange shapes in the sky this could be due to either flying saucers or to unusual weather conditions. Even if a meteorologist calculates that such conditions are very very unusual, you’ll still come down in favour of the mundane explanation.

So going back to the experiment, the probability that random background would give a signal like this may be about 2 in 10,000 but that’s not the probability that this signal was produced by random background: that also depends on the probabilities we assign to the mundane random background or the exotic axion. Despite this 2 in 10,000 figure I very much doubt that you’d find a particle physicist outside the Xenon1T collaboration who’d give you as much as even odds on the axion theory turning out to be the right one. (Possibly also inside the collaboration, but it’s not polite to ask.)
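To see how much the answer depends on those prior probabilities, here is a toy Bayes’ theorem calculation. Everything in it beyond the p-value is an assumption made up for illustration: it pretends there are only two hypotheses (background or a new signal), it uses the p-value as a rough stand-in for the probability of the data given background, it takes 0.5 as the probability of the data given a real signal, and the priors are invented.

```python
def posterior_background(p_data_given_bg, p_data_given_signal, prior_signal):
    """Bayes' theorem for two competing hypotheses: background or new signal."""
    prior_bg = 1.0 - prior_signal
    numerator = p_data_given_bg * prior_bg
    return numerator / (numerator + p_data_given_signal * prior_signal)

p_value = 2.3e-4             # P(data | background), approximated by the p-value
p_data_given_signal = 0.5    # hypothetical: a real signal gives data like this half the time

# Hypothetical priors, from "even odds on new physics" to "very sceptical"
for prior_signal in (0.5, 1e-2, 1e-4):
    post = posterior_background(p_value, p_data_given_signal, prior_signal)
    print(f"prior P(signal) = {prior_signal:g}  ->  P(background | data) = {post:.4f}")
```

Only the even-odds prior gives a posterior anywhere near the p-value. A sceptic who starts at 1 in 10,000 for new physics still ends up favouring the mundane explanation, just like the observer of strange shapes in the sky.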

This is a very common mistake – every announcement of an anomaly comes with its significance reported in terms of the number of sigma, which somebody helpfully translates into the equivalent p-value, which is then explained wrongly, with language like “the probability of the Standard Model being correct is only one in a million” instead of “the probability that the Standard Model would give a result this weird is only one in a million”. When you’re communicating science you use non-technical language so that people understand, but you should still get the facts right.


Tips for speakers #1: don’t thank the audience for their attention

One problem anyone faces in putting any sort of talk together is how to finish.  And a depressingly large number of speakers do so with a slide like this

[Image: Screenshot 2019-06-15 at 15.27.05]

This way of ending a talk came originally, I think, from Japan. And unless you are Japanese you should never use it. A Japanese speaker has centuries of proud samurai tradition behind them, and when they say “thank you for your attention” what they mean is

[Image: Screenshot 2019-06-15 at 15.22.15]

If you are not Japanese this does not work. Instead the message conveyed is

[Image: Screenshot 2019-06-15 at 15.24.52]

Which is not a good way to finish.

And this throws away a golden opportunity.  The end of the talk is the point at which you really have the attention of the audience. This may not be for the best of reasons – perhaps they want to hear the next speaker, or to go off for much-needed coffee, but when you put your conclusions slide up your listeners’ brains move up a gear. They look up from the email on their laptops and wonder what’s next. So your final message is the one with the best chance of being remembered.

Give them the pitch that you hope they’ll take away with them.

“So we have the best results yet on ….”

“So we have the prospect of getting the best results on … in time for next year’s conference”

“There are going to be many applications of this technique”

“We understand the whole process of … a lot better”

Whatever’s appropriate.  Be positive and upbeat and, even if they’ve been asleep for the past 20 minutes,  they will go away with a good feeling about your work, your talk, and your ability as a speaker.


(See what I just did??)







LHCb clocks up 8 Inverse femtobarns

The LHCb experiment has just announced that it’s accumulated 8 inverse femtobarns of data. The screen shows the result and the ongoing totals.

It’s obviously a cause for celebration, but maybe this is an opportunity to explain the rather obscure ‘inverse femtobarn’ unit.

Let’s start with the barn. It’s a unit of area: 10⁻²⁸ square metres, so rather small. It was invented by the nuclear physicists to describe the effective target-size corresponding to a particular nuclear reaction. When you fire beam particles at target particles then all sorts of things can happen, with probabilities predicted by complicated quantum mechanical calculations, but those probabilities can be considered as if each reaction had its own area on the target nucleus: the bigger the area the more likely the reaction. It’s not literally true, of course, but the dimensions are right and you can use it as a model if you don’t push it too far. Nuclear cross sections, usually called σ, are typically a few barns, or fractions of a barn, so it’s a handy unit.

No, it’s not named after some Professor Barn – it’s probably linked to expressions like “Couldn’t hit the broad side of a barn” or even “Couldn’t hit a barn door with a banjo!”

Particle physicists took this over, though our cross sections were typically smaller – millibarns (10⁻³ barns) for strong interaction processes, microbarns (10⁻⁶) and nanobarns (10⁻⁹) for electromagnetic processes such as were measured at PETRA and LEP. Beyond that lie picobarns (10⁻¹²) and femtobarns (10⁻¹⁵). Only the neutrino physicists stuck with m² or cm² as their cross sections are so small that even using attobarns can’t give you a sensible number. So a femtobarn is just a cross section or area, 10⁻⁴³ square metres.

In a colliding beam storage ring like the LHC, luminosity measures the useful collision rate: it reflects how many particles are in the beams, how tightly the beams have been focussed and how well they are aligned when they collide. The event rate is the product of the cross section and the luminosity, R=Lσ, so luminosity is what accelerator designers set out to deliver. Integrated luminosity is just luminosity integrated over time, and ∫L dt = N/σ, as the integrated rate is just the total number of events, ∫R dt = N.

So there we have it. An integrated luminosity of 8 inverse femtobarns (written 8 fb⁻¹ or 8 (1/fb)) means that for a cross section as tiny as 1 fb, one would expect to see, on average, 8 events. 8 is a measurable number – though that depends on backgrounds – but what this is saying is that we can detect (if they’re there) rare processes that have cross sections of order 1 fb. That’s the sort of number that many Beyond-the-Standard-Model theories come up with. It’s pushing back the frontier.

If you look at the screen you can see the recorded number as 8000 pb⁻¹. Yes, an inverse femtobarn is bigger than an inverse picobarn. Which is obvious if you think about it, but disconcerting until you do.

Another point to take from the picture is that the LHC accelerator actually delivered more: 8769.76 pb⁻¹ were delivered to get the 8000 taken. The loss is inevitable, due to time lost to ramping voltages and detector calibration, and the overall efficiency of over 90% is pretty good.
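The bookkeeping is easy to get wrong by a factor of a thousand, so here is a small Python sketch of it. It just applies N = σ ∫L dt with the numbers quoted above; the function and constant names are my own choices.

```python
INV_PB_PER_INV_FB = 1000.0   # 1 pb = 1000 fb, so 1 fb^-1 = 1000 pb^-1

def expected_events(cross_section_fb, integrated_lumi_inv_fb):
    """N = sigma * integral(L dt), with sigma in fb and luminosity in fb^-1."""
    return cross_section_fb * integrated_lumi_inv_fb

recorded_inv_pb = 8000.0     # recorded by LHCb (from the screen)
delivered_inv_pb = 8769.76   # delivered by the LHC (from the screen)

recorded_inv_fb = recorded_inv_pb / INV_PB_PER_INV_FB
n_events = expected_events(1.0, recorded_inv_fb)   # a process with a 1 fb cross section
efficiency = recorded_inv_pb / delivered_inv_pb

print(f"recorded: {recorded_inv_fb} fb^-1")
print(f"expected events for a 1 fb process: {n_events}")
print(f"recording efficiency: {efficiency:.1%}")
```

This is the raw expectation only: it ignores detector acceptance, selection efficiency and backgrounds, all of which matter in a real analysis.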

So it’s a landmark. But these BSM processes haven’t shown up yet and until they do we need to keep taking data – and increase the luminosity. Both the LHCb detector and the LHC accelerator are working hard to achieve this – both are needed, as the detector has to be able to handle the increased data it’s being given. So we’ve passed a milestone, but there’s still a long and exciting road ahead.

Why is the sky blue?

This is a stock ‘Ask the physicist’ question and most physicists think they know the answer. Actually, they only know half the story.

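The usual response is “Rayleigh Scattering”. On further probing they will remember, or look up, a formula like

$$ I = I_0 \, \frac{8\pi^4 \alpha^2}{\lambda^4 R^2} \left(1 + \cos^2\theta\right) $$

for the intensity of light of wavelength λ and incident intensity I₀ scattered at an angle θ by molecules of polarisability α a distance R away.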

[Figure: The sky is blue]

The key point to this formula is that the intensity is proportional to the inverse 4th power of the wavelength. Light’s oscillating electric field pushes all the electrons in a molecule one way and all the nuclei the other, so the molecule (presumably Nitrogen or Oxygen in the stratosphere) responds like any simple harmonic oscillator to a forced oscillation well below its resonant frequency. These oscillating charges act as secondary radiators of EM waves, and higher frequencies radiate more. Visible light varies in frequency by about a factor of 2 over the spectrum (actually a bit less but 2 will do – we speak of the ‘octave’ of visible EM radiation) so violet light is scattered 16 times as much as red light. So when we look at the sky we see light from the sun that’s been scattered by molecules, and it’s dominated by higher frequencies / shorter wavelengths so it has a blue colour – not completely violet as there is still some longer wavelength light present.
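The factor of 16 is just 2 to the 4th power, as this short Python sketch shows. The 700 nm and 400 nm values in the second call are my own round numbers for the red and violet ends of the spectrum, put in only to show what “actually a bit less” does to the ratio.

```python
def rayleigh_ratio(long_wavelength, short_wavelength):
    """How much more strongly the shorter wavelength is scattered (intensity ~ 1/wavelength^4)."""
    return (long_wavelength / short_wavelength) ** 4

octave = rayleigh_ratio(2.0, 1.0)          # the idealised factor-of-2 'octave'
realistic = rayleigh_ratio(700.0, 400.0)   # assumed round values in nm, for illustration

print(f"factor of 2 in wavelength: {octave:.0f} times more scattering")
print(f"700 nm vs 400 nm: {realistic:.1f} times more scattering")
```

Either way the short wavelengths win by a large margin, which is all the argument needs.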

This also explains why light from the sun  at sunset (and sunrise),  which travels a long way through the atmosphere to get to us, appears redder, having lost its short wavelength component to provide a blue sky for those living to the west (or east) of where we are.  It also predicts and explains the polarisation of the scattered blue light: there are dramatic effects to be seen with a pair of polarised sunglasses, but this post’s quite long enough without going into details of them.

Most explanations stop there. This is just as well, because a bit more thought reveals problems. Why don’t light rays show this behaviour at ground level? Why don’t they scatter in solids and liquids, in glass and water? There are many more molecules to do the scattering, after all, but we don’t see any blue light coming out sideways from a light beam going through a glass block or a glass of water.

The reason appears when we consider scattering by several molecules. They are excited by the same EM wave and all start oscillating. The oscillations are secondary sources of EM radiation which could be perceived by an observer – except that they are, in general, out of phase. The light from source to molecule to observer takes different optical paths for each molecule, and when you add them all together they will (apart from statistical variations which are insignificant) sum to zero. To put it another way, when you shine a light beam through a piece of glass and look at it from the side, you perceive different induced dipoles, but half will point up and half will point down, and there is no net effect. The random phase factors drop out only if you look directly along or against the direction of the beam: against the beam the secondary sources combine to give a reflected ray; along the beam their combined effect is out of phase with the original ray, so the sum slips in phase, making the light beam slow down.



[Figure: light scattered by different molecules is out of phase and sums to zero]

So we’re stuck. One molecule is not enough to turn the sky blue, you need many. But many molecules co-operate in such a way that there is no side scattering. Dust was once suggested as the reason but dust is only present in exceptional circumstances like after volcanic eruptions.

The only way to do it would be if the molecules grouped together in clusters.   Clusters small compared to the wavelength of visible photons, but separated by distances large compared to their coherence length. Why would they ever do that?

But they do. Molecules in a gas – unlike those in a solid or liquid – are scattered in random positions and form clusters by sheer statistical variation. This clustering is enhanced by the attractive forces between molecules – the same forces that make them condense into a liquid at higher pressures / lower temperatures. So the fluctuations in density in the stratosphere are considerable; their size is small and their separation is large, and it’s these fluctuations in molecular density that give us the bright blue sky.

[Figure: Molecules in a gas]

The figure shows (very schematically) how this happens. In the first plot the density is very low and the few molecules are widely separated.  In the second the density is higher and even though the distribution shown here is random, clusters emerge due to statistical fluctuations. In the third plot these clusters are enhanced by attraction between the molecules. In the final plot the density is so high they form a solid (or liquid).
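The contrast between ordered and randomly placed scatterers can be illustrated with a toy phasor sum in Python. This is only a sketch under strong assumptions of my own: each molecule contributes a unit-amplitude wave, the only thing distinguishing molecules is their phase, and the attractive forces mentioned above are ignored. The numbers of molecules and trials, and the random seed, are arbitrary.

```python
import cmath
import math
import random

def summed_intensity(phases):
    """Intensity |sum of unit phasors|^2 seen by a sideways observer."""
    amplitude = sum(cmath.exp(1j * phi) for phi in phases)
    return abs(amplitude) ** 2

N = 200  # number of scatterers in the toy model

# Ordered medium: phases evenly spread over one full cycle, so they cancel.
ordered_phases = [2 * math.pi * k / N for k in range(N)]
I_ordered = summed_intensity(ordered_phases)

# Gas: independent random positions give independent random phases.
rng = random.Random(1)   # fixed seed so the run is repeatable
trials = 500
I_random = sum(
    summed_intensity([rng.uniform(0.0, 2 * math.pi) for _ in range(N)])
    for _ in range(trials)
) / trials

print(f"ordered scatterers: intensity = {I_ordered:.2e}")
print(f"random scatterers:  mean intensity = {I_random:.1f} (compare N = {N})")
```

In the ordered case the contributions cancel essentially exactly. In the random case the cancellation is incomplete and the average intensity comes out close to N, which is the statistical fluctuation doing the scattering.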

This puzzle was not solved by Rayleigh but by – yet again – Albert Einstein. In 1910 he explained (Annalen der Physik 33 1275 (1910)) the size and nature of the density fluctuations in a gas, and showed how the theory explained the phenomenon of critical opalescence, when gases turn milky-white at the critical point, and that the sky was an example of this. It doesn’t even count as one of his ‘great’ papers – though it does follow on from his 1905 annus mirabilis paper on Brownian motion. He showed that our blue sky comes from light scattering not just off molecules, but off fluctuations in the molecular density.

So if anyone ever asks you why the sky is blue, be sure to give them the full story.

The Lorentz Transformation – a minimal proof



You can find many ways in the textbooks to derive the Lorentz Transformation, starting from Einstein’s famous two postulates: that the laws of physics are the same in all inertial frames, and that the speed of light is a constant. You can do it in one big chunk, or by starting with length contraction and time dilation.

What I want to do here is show a proof which requires only one, surprisingly minimal, assumption, and  which relegates ‘light’ to its proper place as a subsidiary phenomenon. This is the opposite of the order which is usually taught, so this is not the sort of proof  you get in Relativity101, but after you’ve learnt and are happy with the standard proofs, I think you’ll appreciate this one.

We make some basic assumptions – as indeed we do in a conventional proof, though they’re not usually spelt out.  Events occur in continuous time t and continuous space r, though for simplicity we’ll just consider one space dimension x. Space and time are isotropic and homogeneous – there are no special times or places. We can plot events in space-time diagrams, where the t axis is calibrated using repeated identical processes like the swing of a pendulum or the vibrations of a crystal, and the x axis is calibrated using stationary identical rods.

Events cause, and are caused by, other events. For a pair of events A and B it could be that A→B, A has a (possible) effect on B, or that B→A, B has a (possible) effect on A. In the first case we say that A lies in the past of B, and B is in the future of A. In the second case it’s the other way round. We dismiss the possibility that both A→B and B→A, as that leads to paradoxes of the killing-your-grandfather variety. But what about the possibility that neither A→B nor B→A: that there can be pairs of events for which neither can influence the other?

There’s not an obvious answer. If you were designing a universe you could insist that any pair of events must have a causal connection one way or the other, or you could allow the equivalent of the ‘Don’t know’ box. The choice is not forced on us by logic. But let’s suppose that we do live in a universe where this directed link between events is optional rather than compulsory:

There are pairs of events which are not causally connected.

I promised you a single assumption: there it is.  Now let’s build on it.

For any event there must be some events which are not causally connected to it. The assumption says this is true for some events, but all events must be similar (as space and time are homogeneous), so this is true in general. So we can draw a space-time diagram showing the events that are past, future, and elsewhere for an event at the origin.

Events lie in definite regions

Causality is transitive: if A→B and B→C then A→C, as A can influence C through B. That means that at any particular point x, events that are in A’s past must be followed by elsewhere events and then future events. They can’t be mixed up. The events occur in defined regions.



The Elsewhere region extends to the origin

Even at small distances there must be elsewhere events – if there were some minimum distance from A, Δ, within which all events were either past or future,  and B is the event at Δ on the division between past and future, then all events within 2Δ of A must be in the past and future, and so on for 3,4,5….



The Elsewhere region is a simple wedge

The lines separating the past, elsewhere and future regions must be straight lines going through the origin. For any point B on the future light cone of A, the line separating B’s elsewhere and future must have the same gradient as the light cone for A at x=0. But the future light cone of B defines the future light cone of A. So the gradient must be constant all the way. (The same applies for the past light cone, and symmetry requires that the gradient have the same magnitude.)

So to re-cap: first we establish that there are elsewhere events, then that they lie in regions, then that these regions go all the way to the origin, and finally that the shape of the elsewhere region is a simple double wedge. (It’s called a ‘light cone’ as you can imagine extending the picture to two space dimensions by rotating these 2D pictures about the vertical axis, but you probably knew that already.)

Out of this picture a number emerges: the gradient of the line dividing the elsewhere region from the future (or the past). We have no way of knowing what its value is – only that it is finite. It describes the speed of the fastest possible causal signal and we will, of course, denote it by c. It can be viewed as a fundamental property of the universe, or as a way of relating time measurement units to space ones.

Now we’re on more familiar ground. If an event that we denote by (x,t) is observed by someone in a different inertial frame moving at some constant speed relative to the first, they will ascribe different numbers (x’,t’). What is the transformation (x,t)→(x’,t’)?

  1. Let’s assume that zeros are adjusted so that (0,0) is just (0,0). That’s trivial.
  2. We require that vector equations remain true: if (x_A,t_A)=(x_B,t_B)+(x_C,t_C) then (x’_A,t’_A)=(x’_B,t’_B)+(x’_C,t’_C). That limits us to linear transformations x’=Ax+Bt; t’=Cx+Dt. So the transformation is completely described by 4 parameters A, B, C and D.
  3. The inverse transform (x’,t’) to (x,t) must be the same, except that the direction of the speed has changed. That’s the equivalent of changing the sign of x or t. So x=Ax’-Bt’; t=-Cx’+Dt’. The transformation to the new frame and back again must take us exactly back to what we started with, i.e. A(Ax+Bt)-B(Cx+Dt)=x. From which we must have A=D and A²-BC=1. The four parameters are reduced to two.
  4. Finally we impose the requirement that the new co-ordinates (x’,t’) must lie in the same sector (past, future, or elsewhere) as the old. In particular, if x=ct then x’=ct’. That means Act+Bt=c(Cct+Dt) and using A=D from the previous paragraph, this shows B=c²C. The two parameters are reduced to one. This is most neatly expressed by introducing v=-B/A, as then A²-BC=1 gives our old friend A=1/√(1-v²/c²) and substituting A, B, C and D gives the familiar form of the Lorentz transformations.

    The Lorentz Transformation:

    $$ x' = \frac{x - vt}{\sqrt{1 - v^2/c^2}}, \qquad t' = \frac{t - vx/c^2}{\sqrt{1 - v^2/c^2}} $$

Inspecting these shows that v, which we introduced as a parameter, describes the motion of the point x’=0, the origin of the primed frame, in the original frame, i.e. the speed of one frame with respect to the other.

A bit of algebra shows that the ‘interval’ of an event is the same: x²-c²t² = x’²-c²t’². Which is neat, showing that the points lie on a hyperbola of which the light-cone crossed-lines is the limiting case, so they cannot move between sectors. But we didn’t have to assume that the interval is unchanged, only that an interval of zero remains zero.
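If you’d rather not do the bit of algebra, a numerical spot check is quick. The Python sketch below builds A, B, C and D from v as in the steps above, then checks the constraints, the round trip, and the invariance of the interval for one arbitrarily chosen event. Units with c=1 and the particular values of x, t and v are my own choices for illustration.

```python
import math

C_LIGHT = 1.0   # work in units where c = 1 (an arbitrary choice for this check)

def coefficients(v, c=C_LIGHT):
    """A, B, C, D of x' = Ax + Bt, t' = Cx + Dt, using v = -B/A, B = c^2 C and A = D."""
    A = 1.0 / math.sqrt(1.0 - v * v / (c * c))
    return A, -A * v, -A * v / (c * c), A

def lorentz(x, t, v, c=C_LIGHT):
    """Transform the event (x, t) to a frame moving at speed v."""
    A, B, C, D = coefficients(v, c)
    return A * x + B * t, C * x + D * t

def interval(x, t, c=C_LIGHT):
    return x * x - c * c * t * t

x, t, v = 3.0, 5.0, 0.6            # arbitrary event and frame speed
A, B, C, D = coefficients(v)
xp, tp = lorentz(x, t, v)          # into the moving frame
xb, tb = lorentz(xp, tp, -v)       # and back again with the speed reversed

print("A^2 - BC =", A * A - B * C)
print("transformed event:", (xp, tp))
print("round trip:", (xb, tb))
print("intervals:", interval(x, t), interval(xp, tp))
```

The event was picked to sit on the worldline x=vt of the primed origin, so it should land at x’=0, in line with the interpretation of v given above.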

So the Lorentz Transformation springs from the basic causal structure of space-time, assuming that not all events are causally connected one way or the other, with c the speed of the fastest causal signal, whatever that happens to be. Length contraction and time dilation follow from this. Then you discover that if you have Coulomb’s Law type electrostatics the Lorentz Transformations give you magnetism and Maxwell’s Equations emerge. These have wavelike solutions with wave velocity  c. 

In terms of logical argument, the causal structure of the universe just happens to include the possibility that 2 events cannot affect one another in either way. This fundamental property leads to relativity and the Lorentz Transformation,  which leads to electromagnetism, which then leads to EM waves and light, even though historically and pedagogically the sequence is presented the other way round.