Writing the abstract

Perhaps the abstract was once a brief summary of the full paper. That is now largely history. In these days of the information explosion the abstract’s purpose is to let the reader know whether they want to spend the time reading your whole paper – which may involve the hassle of downloading it and even fighting a paywall.

So there are two aspects. You want to make it inviting: you want the right peer group to read and heed it, and in some cases you want the conference organisers to select it for a talk or poster. But you also need to inform those who wouldn’t find it relevant that they’d be wasting their time going further.

So it is not a summary. It is not a precis. It does not have to cover everything in your paper. You cannot assume the potential reader (who is probably scrolling down a long list of many such abstracts) will read your abstract all the way through: they will take a glance at the first couple of lines and only read further if you’ve caught their attention.

After writing and reading (and not reading) many abstracts, I have come to rely on the 4 sentence system. It gives a sure-fire mechanism for producing high quality abstracts, it does not involve any staring at a blank sheet of paper waiting for inspiration, and it is also flexible. It works for experimental and theoretical papers, and for simulations. It is good for the reader and the author.

The 4 Sentence Abstract

  1. What you did. This is the opening which will catch the reader’s eye and their attention. Keep it short and specific. Don’t mention your methodology. “We describe the 4 sentence system for writing an abstract.”
  2. Why this is important. This is why you chose to work on this topic, way back when. The core specialist readers will know this, of course, but will be happy to have their views confirmed and reinforced: for those in the field but not quite so specialised it may be necessary to justify the work you’ve done. “Many authors find it difficult to write their abstract, and many paper abstracts are long and unhelpful.”
  3. How your result improves on previous ones. This is your chance to big-up what you’ve done. You have more data, or better apparatus, or a superior technique, or whatever. Now you can mention your methodology, insofar as it’s an improvement on previous work. “Our technique provides an easy-to-use methodical system.”
  4. Give the result. If possible, the actual result, particularly if it’s a relatively straightforward measurement. If (but only if) you are submitting an abstract to a future conference and you haven’t actually got your results yet, you may have to paraphrase this as “Results for … are given.” “People using it spend less time writing, and the abstracts they produce are better.”

This is a starting framework which can be adapted. The 4 “sentences” can be split if necessary, their relative length and emphasis varied according to the paper they describe. But it fits pretty much every situation, and it gives a thematic organisation which matches the potential reader’s expectation. (You can write it in the first or third person, active or passive, depending on your preferences and the tradition of your field, provided you’re consistent.)

There is a lot of advice about abstracts around on the web. Many of them are, to my mind, unhelpful in that they see the abstract through the eyes of the author, as a summary based on the paper, rather than through the eyes of a potential reader. I’ve taken to using the 4 sentences: what we did, why it matters, how it’s better, and the result. I now find writing abstracts quick and straightforward, and the results are pretty good.

Why computing can be complicated

It is amazing how simple computation can have profound complexity once you start digging.

Let’s take a simple example: finding the average (arithmetic mean) of a set of numbers. It’s the sort of thing that often turns up in real life, as well as in class exercises. Working from scratch you would write a program like (using C as an example: Python or Matlab or other languages would be very similar)

float sum=0;
for(int j=0;j<n;j++){
   sum += x[j];
   }
float mean=sum/n;

which will compile and run and give the right answer until one day, eventually, you will spot it giving an answer that is wrong. (Well, if you’re lucky you will spot it: if you’re not then its wrong answer could have bad consequences.)

What’s the problem? You won’t find the answer by looking at the code.

The float type indicates that 32 bits are used, shared between a mantissa, an exponent and a sign, and in the usual IEEE 754 format that gives 24 bits of binary precision, corresponding to 7 to 8 significant decimal figures. Which in most cases is plenty.

To help see what’s going on, suppose the computer worked in base 10 rather than base 2, and used 6 digits. So the number 123456 would be stored as 1.23456 × 10⁵. Now, in that program loop the sum gets bigger and bigger as the values are added. Take a simple case where the values all just happen to be 1.0. Then after you have worked through 1,000,000 of them, the sum is 1000000, stored as 1.00000 × 10⁶. All fine so far. But now add the next value. The sum should be 1000001, but you only have 6 digits so this is also stored as 1.00000 × 10⁶. Ouch – but the sum is still accurate to 1 part in 10⁶. But when you add the next value, the same thing happens. If you add 2 million numbers, all ones, the program will tell you that their average is 0.5. Which is not accurate to 1 part in 10⁶, not nearly!

Going back to the usual but less transparent binary 24 bit precision, the same principles apply. If you add up millions of numbers to find the average, your answer can be seriously wrong. Using double precision gives a 53 bit mantissa, roughly 16 decimal figures, which certainly reduces the problem but doesn’t eliminate it. The case we considered where the numbers are all the same is actually a best case: if there is a spread in values then the smallest ones will be systematically discarded earlier.

And you’re quite likely to meet datasets with millions of entries. If not today then tomorrow. You may start by finding the mean height of the members of your computing class, for which the program above is fine, but you’ll soon be calculating the mean multiplicity of events in the LHC, or distances of galaxies in the Sloan Digital Sky Survey, or nationwide till receipts for Starbucks. And it will bite you.

Fortunately there is an easy remedy. Here’s the safe alternative

float mean=0;
for(int j=0;j<n;j++){
     mean += (x[j]-mean)/(j+1);
     } 

Which is actually one line shorter! The slightly inelegant (j+1) in the denominator arises because C arrays start from zero. Algebraically they are equivalent because

$$\mu_n \;=\; \mu_{n-1} + \frac{x_n - \mu_{n-1}}{n} \;=\; \frac{1}{n}\sum_{i=1}^{n} x_i$$

but numerically they are different and the trap is avoided. If you use the second code to average a sequence of 1.0 values, it will return an average of 1.0 forever.
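Here is a minimal self-contained check of both behaviours. The test data – 100 million values, all exactly 1.0 – are just an invented worst case for the naive sum, chosen so that the running total exceeds 2^24:

#include <stdio.h>

int main(void) {
    const long n = 100000000L;   /* 100 million values, all equal to 1.0 */

    float sum = 0;
    float mean = 0;
    for (long j = 0; j < n; j++) {
        float x = 1.0f;
        sum += x;                      /* naive running total   */
        mean += (x - mean) / (j + 1);  /* running-mean update   */
    }
    /* the naive total saturates at 2^24 = 16777216, so its mean is badly wrong */
    printf("naive mean   = %f\n", sum / n);  /* prints 0.167772 */
    printf("running mean = %f\n", mean);     /* prints 1.000000 */
    return 0;
}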

So those (like me) who have once been bitten by the problem will routinely code using running averages rather than totals. Just to be safe. The trick is well known.

What is less well known is how to safely evaluate standard deviations. Here one hits a second problem. The algebra runs

$$\sigma^2 \;=\; \frac{n}{n-1}\left(\overline{x^2} - \bar{x}^{\,2}\right),\qquad \bar{x}=\frac{1}{n}\sum_{i=1}^{n}x_i,\quad \overline{x^2}=\frac{1}{n}\sum_{i=1}^{n}x_i^2$$

where the n/(n-1) factor, Bessel’s correction, just compensates for the fact that the squared standard deviation or variance of a sample is a biased estimator of that of the parent. We know how to calculate the mean safely, and we can calculate the mean square in the same way. However we then hit another problem if, as often happens, the mean is large compared to the standard deviation.

Suppose what we’ve got is approximately Gaussian (or normal, if you prefer) with a mean of 100 and a standard deviation of 1. Then the calculation in the right hand bracket will look like

10001 – 10000

which gives the correct value of 1. However we’ve put two five-digit numbers into the subtraction and got a single digit out. If we were working to 5 significant figures, we’re now only working to 1. If the mean were ~1000 rather than ~100 we’d lose two more digits. There’s a significant loss of precision here.

If the first rule is not to add two numbers of different magnitude, the second is not to subtract two numbers of similar magnitude. Following these rules is hard because an expression like x+y can be an addition or a subtraction depending on the signs of x and y.

This danger can be avoided by doing the calculation in two passes. On the first pass you calculate the mean, as before. On the second pass you calculate the mean of (x−μ)², where the differences are sensible, of order of the standard deviation. If your data is in an array this is pretty easy to do, but if it’s being read from a file you have to close and re-open it – and if the values are coming from an online data acquisition system it’s not possible.
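When the data do fit in an array, a minimal sketch of the two-pass calculation might look like this (the function name twopass_sigma is just for illustration, and it assumes n > 1):

#include <math.h>

float twopass_sigma(const float x[], int n) {
    /* Pass 1: running mean, exactly as in the running-average code above */
    float mean = 0;
    for (int j = 0; j < n; j++) mean += (x[j] - mean) / (j + 1);

    /* Pass 2: running mean of the squared differences (x - mu)^2,
       which are of order sigma and therefore safe to accumulate    */
    float V = 0;
    for (int j = 0; j < n; j++) {
        float d = x[j] - mean;
        V += (d * d - V) / (j + 1);
    }
    /* Apply Bessel's correction n/(n-1) before taking the square root */
    return sqrtf(V * n / (n - 1.0f));
}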

Fortunately, for the online case there is a solution. It’s called the Welford online algorithm, and the code can be written as a simple extension of the running-mean program above

 // Welford's algorithm
 
float mean=x[0];
float V=0;
for(int j=1;j<n;j++){
     float oldmean=mean;
     mean += (x[j]-mean)/(j+1);
     V += ((x[j]-mean)*(x[j]-oldmean) - V)/j;
     } 
float sigma=sqrt(V);

The subtractions and the additions are safe. The use of both the old and new values for the mean accounts algebraically, as Welford showed, for the change that the mean makes to the overall variance. The only differences from our original running-average program are the need to keep track of both old and new values of the mean, and initialising the mean to the first element (element zero), so the loop starts at j=1, avoiding division by zero: the variance estimate from a single value is meaningless. (It might be good to add a check that n>1 to make it generally safe.)
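As a quick sanity check, here is the same loop wrapped in a complete program and run on a small invented dataset with a mean near 100 and a spread near 1 – exactly the regime where the naive one-pass formula loses precision:

#include <stdio.h>
#include <math.h>

int main(void) {
    /* invented test data: mean 100.0, standard deviation about 1 */
    float x[] = {100.2f, 99.1f, 101.3f, 98.7f, 100.9f, 99.8f};
    int n = sizeof(x) / sizeof(x[0]);

    float mean = x[0];
    float V = 0;
    for (int j = 1; j < n; j++) {
        float oldmean = mean;
        mean += (x[j] - mean) / (j + 1);
        V += ((x[j] - mean) * (x[j] - oldmean) - V) / j;
    }
    printf("mean = %f  sigma = %f\n", mean, sqrt(V));
    /* expect mean close to 100.0 and sigma close to 1.0 */
    return 0;
}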

I had suspected such an algorithm should exist but, after searching for years, I only found it recently (thanks to Dr Manuel Schiller of Glasgow University). It’s beautiful, it’s useful and it deserves to be more widely known.

It is amazing how simple computation can have profound complexity once you start digging.

What’s wrong with Excel?

I just posted a tweet asking how best to dissuade a colleague from presenting results using Excel.

The post had a fair impact – many likes and retweets – but also a lot of people saying, in tones from puzzlement to indignation, that they saw nothing wrong with Excel and this tweet just showed intellectual snobbery on my part.

A proper answer to those 31 replies deserves more than the 280 character Twitter limit, so here it is.

First, this is not an anti-Microsoft thing. When I say “Excel” I include Apple’s Numbers and LibreOffice’s Calc. I mean any spreadsheet program, of which Excel is overwhelmingly the market leader. The brand name has become the generic term, as happened with Hoover and Xerox.

Secondly, there is nothing intrinsically wrong with Excel itself. It is really useful for some purposes. It has spread so widely because it meets a real need. But for many purposes, particularly in my own field (physics) it is, for reasons discussed below, usually the wrong tool.

The problem is that people who have been introduced to it at an early stage then use it because it’s familiar, rather than expending the effort and time to learn something new. They end up digging a trench with a teaspoon, because they know about teaspoons, whereas spades and shovels are new and unfamiliar. They invest lots of time and energy in digging with their teaspoon, and the longer they dig the harder it is to persuade them to change.

From the Apple Numbers standard example. It’s all about sales.

The first and obvious problem is that Excel is a tool for business. Excel tutorials and examples (such as the one above) are full of sales, costs, overheads, clients and budgets. That’s where it came from, and why it’s so widely used. Although it deals with numbers, and thanks to the power of mathematics numbers can be used to count anything, the tools it provides to manipulate those numbers – the algebraic formulae, the graphs and charts – are those that will be useful and appropriate for business.

That bias could be overcome, but there is a second and much bigger problem. Excel integrates the data and the analysis. You start with a file containing raw numbers. Working within that file you create a chart: you specify what data to plot and how to plot it (colours, axes and so forth). The basic data is embellished with calculations, plots, and text to make (given time and skill) a meaningful and informative graphic.

The alternative approach (the spade or shovel of the earlier analogy) is to write a program (using R or Python or Matlab or Gnuplot or ROOT or one of the many other excellent languages) which takes the data file and makes the plots from it. The analysis is separated from the data.

Let’s see how this works and why the difference matters. As a neutral example, we’ll take the iris data used by Fisher and countless generations of statistics students. It’s readily available. Let’s suppose you want to plot the Sepal length against the Petal length for all the data. It’s very easy, using a spreadsheet or using a program

Using Apple Numbers (other spreadsheets will be similar) you download the iris data file, open it, and click on

  • Chart
  • Scatter-plot icon.
  • “Add Data”
  • Sepal Length column
  • Petal Length column

and get

In R (other languages will be similar) you read the data (if necessary) and then draw the desired plot

iris=read.csv("filename")
plot(iris$Sepal.Length, iris$Petal.Length)

and get

Having looked at your plot, you decide to make it presentable by giving the axes sensible names, by plotting the data as solid red squares, by specifying the limits for x as 4 – 8 and for y as 0 – 7, and removing the ‘Petal length’ title.

Going back to the spreadsheet you click on:

  • The green tick by the ‘Legend’ box, to remove it
  • “Axis”
  • Axis-scale Min, and insert ‘4’ (the other limits are OK)
  • Tick ‘Axis title’
  • Where ‘Value Axis’ appears on the plot, over-write with “Sepal Length (cm)”
  • ‘Value Y’
  • Tick ‘Axis title’
  • Where ‘Value Axis’ appears, over-write with “Petal Length(cm)”
  • “Series”
  • Under ‘Data Symbols’ select the square
  • Click on the chart, then on one of the symbols
  • “Style”
  • ‘Fill Color’ – select a nice red
  • ‘Stroke Color’ – select the same red

In R you type the same function with some extra arguments

plot(iris$Sepal.Length,iris$Petal.Length,xlab="Sepal length (cm)", ylab="Petal length (cm)", xlim=c(4,8), ylim=c(0,7), col='red', pch=15)

So we’ve arrived at pretty much the same place by the two different routes – if you want to tweak the size of the symbols or the axis tick marks and grid lines, this can be done by more clicking (for the spreadsheet) or specifying more function arguments (for R). And for both methods the path has been pretty easy and straightforward, even for a beginner. Some features are not immediately intuitive (like the need to over-write the axis title on the plot, or that a solid square is plotting character 15), but help pages soon point the newbie to the answer.

The plots may be the same, but the means to get there are very different. The R formatting is all contained in the line

plot(iris$Sepal.Length,iris$Petal.Length,xlab="Sepal length (cm)", ylab="Petal length (cm)", xlim=c(4,8), ylim=c(0,7), col='red', pch=15)

whereas the spreadsheet uses over a dozen point/click/fill operations. These are nice in themselves, but they make it harder to describe what you’ve done – the list of spreadsheet steps above is much longer than the single line of R. And that was a specially prepared simple example. If you spend many minutes of artistic creativity improving your plot – changing scales, adding explanatory features, choosing a great colour scheme and nice fonts – you are highly unlikely to remember all the changes you made, to be able to describe them to someone else, or to repeat them yourself for a similar plot tomorrow. And the spreadsheet does not provide such a record, not in the same way the code does.

Now suppose you want to process the data and extract some numbers. As an example, imagine you want to find the mean of the petal width divided by the sepal width. (Don’t ask me why – I’m not a botanist).

  • Click on rightmost column header (“F”) and Add Column After.
  • Click in cell G2, type “=”, then click cell C2, type “/”, then cell E2, to get something like this

(notice how your “/” has been translated into the division-sign that you probably haven’t seen since primary school. But I’m letting my prejudice show…)

  • Click the green tick, then copy the cell to the clipboard by Edit-Copy or Ctrl-C or Command-C
  • Click on cell G3, then drag the mouse as far down the page as you can, then fill those cells by Edit-Paste or Ctrl-V or Command-V
  • Scroll down the page, and repeat until all 150 rows are filled
  • Add another column (this will be H)
  • Somewhere – say H19 – insert “=” then “average(”, click column G, and then “)”. Click the green arrow
  • Then, because it is never good just to show numbers, in H18 type “Mean width ratio”. You will need to widen the column to get it to fit

Add two lines to your code:

> ratio=iris$Petal.Width/iris$Sepal.Width
> print(paste("Mean width ratio",mean(ratio)))
[1] "Mean width ratio 0.411738307332676"

It’s now pretty clear that even for this simple calculation the program is a LOT simpler than the spreadsheet. It smoothly handles the creation of new variables, and mathematical operations. Again the program is a complete record of what you’ve done, that you can look at and (if necessary) discuss with others, whereas the contents of cell H19 are only revealed if you click on it.

As an awful warning of what can go wrong – you may have spotted that the program uses “mean” whereas the spreadsheet uses “average”. That’s a bit off (Statistics 101 tells us that the mode, the mean and the median are three different ‘averages’) but excusable. What is tricky is that if you type “mean(” into the cell, this gets autocorrected to “median(“. What then shows when you look at the spreadsheet is a number which is not obviously wrong. So if you’re careless/hurried and looking at your keyboard rather than the screen, you’re likely to introduce an error which is very hard to spot.

This difference in the way of thinking is brought out if/when you have more than one possible input dataset. For the program, you just change the name of the data file and re-run it. For the spreadsheet, you have to open up the new file and repeat all the click-operations that you used for the first one. Hopefully you can remember what they are – and if not, you can’t straightforwardly re-create them by examining the original spreadsheet.

So Excel can be used to draw nice plots and extract numbers from a dataset, particularly where finance is involved, but it is not appropriate

  • If you want to show someone else how you’ve made those plots
  • If you are not infallible and need to check your actions
  • If you want to be able to consider the steps of a multi-stage analysis
  • If you are going to run the same, or similar, analyses on other datasets

and as most physics data processing problems tick all of these boxes, you shouldn’t be using Excel for them.

Why we’re teaching the Standard Model all wrong

In any description of the Standard Model of Particle Physics, from the serious graduate-level lecture course to the jolly outreach chat for Joe Public, you pretty soon come up against a graphic like this.

“Particles of the Standard Model”

It appears on mugs and on T shirts, on posters and on websites. The colours vary, and sometimes bosons are included. It may be – somewhat pretentiously – described as “the new periodic table”. We’ve all seen it many times. Lots of us have used it – I have myself.

And it’s wrong.

Fundamentally wrong. And we’ve known about it since the 1990s.

The problem lies with the bottom row: the neutrinos. They are shown as the electron, mu and tau neutrinos, matching the charged leptons.

But what is the electron neutrino? It does not exist – or at least if it does exist, it cannot claim to be a ‘particle’. It does not have a definite mass. An electron neutrino state is not a stationary solution of the Schrödinger equation: left to itself it oscillates between the 3 flavours. Anything that changes its nature when left to itself, without any interaction with other particles, doesn’t deserve to be called an ‘elementary particle’.

This changing nature was a shattering discovery at the time, but it has now been firmly established by over 20 years of careful measurement of these oscillations: from solar neutrinos, atmospheric neutrinos, reactors, sources and neutrino beams.

There are three neutrinos. Call them 1, 2 and 3. They do have definite masses (even if we don’t know what they are) and they do give solutions of the Schrödinger equation: a type 1 neutrino stays a type 1 neutrino until and unless it interacts, likewise 2 stays 2 and 3 stays 3.

So what is an ‘electron neutrino’? Well, when a W particle couples to an electron, it couples to a specific mixture of ν1, ν2 and ν3. That specific mixture is called νe. The muon and tau are similar. Before the 1990s, when the only information we had about neutrinos came from their W interactions, we only ever met neutrinos in these combinations, so it made sense to use them. And they have proved a useful concept over the years. But now we know more about their behaviour – even though that is only how they vary with time – we know that the 1-2-3 states are the fundamental ones.
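In the standard notation, with U the PMNS mixing matrix, that specific mixture is

$$\nu_e \;=\; U_{e1}\,\nu_1 + U_{e2}\,\nu_2 + U_{e3}\,\nu_3$$

and similarly for the muon and tau neutrinos with the U_μi and U_τi elements.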

By way of an analogy: the 1-2-3 states are like 3 notes, say C, E and G, on a piano. Before the 1990s our pianist would only play them in chords: CE, EG and CG (the major third, the minor third and the fifth, but this analogy is getting out of hand…). As we only ever met the notes in these combinations, we took the chords themselves to be fundamental. Now we have a more flexible pianist and know that this is not the case.

We have to make this change if we are going to be consistent between the quarks in the top half of the graphic and the leptons in the bottom. When the W interacts with a u quark it couples to a mixture of d, s and b. Mostly d, it is true, but with a bit of the others. We write d′ = U_ud d + U_us s + U_ub b and introduce the CKM matrix or the Cabibbo angle. But we don’t put d′ in the “periodic table”. That’s because the d quark, the mass eigenstate, leads a vigorous social life interacting with gluons and photons as well as Ws, and it does so as the d quark, not as the d′ mixture. This is all obvious. So we have to treat the neutrinos in the same way.

So if you are a bright annoying student who likes to ask their teacher tough questions (or vice versa), when you’re presented with the WRONG graphic, ask innocently “Why are there lepton number oscillations among the neutral leptons but not between the charged leptons?”, and retreat to a safe distance. There is no good answer if you start from the WRONG graphic. If you start from the RIGHT graphic then the question is trivial: there are no oscillations between the 1-2-3 neutrinos any more than there are between e, mu and tau, or u, c, and t. If you happen to start with a state which is a mixture of the 3 then of course you need to consider the quantum interference effects, for the νe mixture just as you do for the d’ quark state (though the effects play out rather differently).

So don’t use the WRONG Standard model graphic. Change those subscripts on the bottom row, and rejoice in the satisfaction of being right. At least until somebody shows that neutrinos are Majorana particles and we have to re-think the whole thing…

Antineutrinos and the failure of Occam’s Razor

William of Ockham is one of the few medieval theologian-philosophers whose name survives today, thanks to his formulation of the principle known as Occam’s Razor. In the original Latin, if you want to show off, it runs Non sunt multiplicanda entia sine necessitate, or Entities are not to be multiplied without necessity, which can be loosely paraphrased as The simplest explanation is the best one, an idea that is as attractive to a 21st century audience as it was back in the 14th.

 William of Ockham

Now fast forward a few centuries and let’s try and apply this to the neutrino. People talk about the “Dirac Neutrino” but that’s a bit off-target. Paul Dirac produced the definitive description not of the neutrino but of the electron. The Dirac Equation shows – as explained in countless graduate physics courses – that there have to be 2×2=4 types of electron: there are the usual negatively charged ones and the rarer positively charged ones (usually known as positrons), and for each of these the intrinsic spin can point along the direction of motion (‘right handed’) or against it (‘left handed’). The charge is a basic property that can’t change, but handedness depends on the observer (if you and I observe and discuss electrons while the two of us are moving, we will agree about their directions of spin but not about their directions of motion.)

Paul Dirac, 1933

Dirac worked all this out to describe how the electron experienced the electromagnetic force.  But it turned out to be the key to describing its behaviour in the beta-decay weak force as well. But with a twist. Only the left handed electron and the right handed positron  ‘feel’ the weak force. If you show a right handed electron or a left handed positron to the W particle that’s responsible for the weak force then it’s just not interested.   This seems weird but has been very firmly established by decades of precision experiments.

(If you’re worried that this preference appears to contradict the statement earlier that handedness is observer-dependent then well done! Let’s just say I’ve oversimplified a bit, and the mathematics really does take care of it properly. Give yourself a gold star, and check out the difference between ‘helicity’ and ‘chirality’ sometime.)

Right, that’s enough about electrons, let’s move on to neutrinos. They also interact weakly, very similarly to the electron: only the left-handed neutrino and the right-handed antineutrino are involved, and the right-handed neutrino and left-handed antineutrino don’t.

But it’s worse than that. The right handed neutrino and left handed antineutrino don’t just fail to interact weakly: they also don’t interact electromagnetically, because the neutrino, unlike the electron, is neutral. And they don’t interact strongly either. In fact they don’t interact, full stop.

And this is where William comes in wielding his razor. Our list of fundamental particles includes this absolutely pointless pair that don’t participate at all. What’s the point of them? Can’t we rewrite our description in a way that leaves them out?

And it turns out that we can.

Ettore Majorana

Ettore Majorana, very soon after Dirac published his equation for the electron, pointed out that for neutral particles a simpler outcome was possible. In his system the ‘antiparticle’ of the left-handed neutrino is the right-handed neutrino. The neutrino, like the photon, is self-conjugate. The experiments that showed that neutrinos and antineutrinos were distinct (neutrinos produce electrons in targets: antineutrinos produce positrons) in fact showed the difference between left-handed and right-handed neutrinos. There are only 2 neutrinos and they both interact, not 2×2 where two of the foursome just play gooseberry.

So hooray for simplicity. But is it?

The electron (and its heavier counterparts, the mu and the tau) is certainly a Dirac particle. So are the quarks, both the 2/3 and the -1/3 varieties. If all the other fundamental fermions are Dirac particles, isn’t it simpler that the neutrino is cut to the same pattern, rather than having its own special prescription? If we understand electrons – which it is fair to say that we do – isn’t it simpler that the neutrino be just a neutral version of the electron, rather than some new entity introduced specially for the purpose?

And that’s where we are. It’s all very well advocating “the simple solution” but how can you tell what’s simple? The jury is still out. Hopefully a future set of experiments (on neutrinoless double beta decay) will give an answer on whether a neutrino can be its own antiparticle, though these are very tough and will take several years. After which we will doubtless see with hindsight the simplicity of the answer, whichever it is, and tell each other that it should have been obvious thanks to William.   But at the moment he’s not really much help.

STV: the benefit nobody talks about

British democracy was created in the 19th century, like the railways. Like the railways it was, for its time, truly world-leading. However after 150 years, like the railways, it is showing its age. People, technology and society have come a long way, and a system which worked yesterday needs to be adapted and improved for the different conditions of today.

One flaw which is becoming increasingly apparent is the way it stifles minorities. Under the first-past-the-post system the winner takes all and the loser gets nothing. Democracy has to be more than that. Even in a simple two party system, 49% of the voters may have no say in how the country is run – and in a multiparty system the ruling party may have the support of well below half of the population.  In a balanced system where the pendulum of power swings to and fro this may not matter too much, but when the difference is structural a large minority is rendered permanently powerless, which in the long run invites revolution.

These arguments are well rehearsed, and various schemes to improve proportionality are suggested: the party list system, as was used in the Euro elections; the additional member system, as is used in the regional assemblies; the alternative vote; and the Single Transferable Vote. Pundits with spreadsheets discuss the improvements in ‘proportionality’ given by the various schemes. I want to make a point in favour of the STV scheme which has nothing to do, directly, with proportionality. It gives voters the chance to choose their MP from within the party list.

Let’s take a town of 200,000 people. Under STV its voters elect three MPs.  Suppose, for simplicity, there are only two parties: Left and Right.   The town is fairly evenly balanced, and elects 2 Left and 1 Right MP in some elections and 2 Right and 1 Left in others, depending on the way the political wind is blowing. 

Now, although each party knows that the best it can hope for is 2 seats out of 3, they will put up 3 candidates. Not to fill the slate would be seen as a sign of weakness. This happens. To take an example close to home, in the last Euro elections (2019) here in the North West region all the parties (Conservatives, Labour, Liberal Democrats, Brexit, Change, UKIP and the Greens) put up full slates of 8 candidates, although they knew that they were never going to win all 8 seats. The picture was the same in other regions. Parties will put up as many candidates as there are seats to be won.

So when a voter in this hypothetical town goes into the polling booth their ballot paper has six names, and they rank them in order (and although the mechanics of counting STV votes are complicated, its use by voters is really simple).  A staunchly pro-Left voter will write 1, 2 and 3 against the Left candidates and 4,  5 and 6 against the Right candidates: a pro-Right voter will do the reverse.  In doing so they are not only expressing their allegiance to a party, they are also expressing their preference for the candidates within that party.   And that preference carries through to the result.

Let’s see how that works.  Suppose that Smith, Jones and Robinson are the candidates for the Left party, which is doing well this time, while Brown, Green and White are standing for the Right party, which is lagging. Smith (a prominent local character) is more popular than Jones (a relative newcomer of whom little is known), while Robinson (whose controversial twitter stream has annoyed many people) is least popular of the three.   As the votes are counted the popular Smith is the first to reach the quota (more than one quarter of the votes cast).  Smith is elected, and surplus votes are diverted to the Jones pile.  

Even with that boost, perhaps neither Jones nor anyone else makes quota.   For the lagging Right party, Brown is the most popular candidate, followed by Green and then White, so the unfortunate White has the smallest number of 1st preference votes and is eliminated, their votes going to Brown who now makes quota.  Robinson is eliminated next,  their votes going to Jones who narrowly beats Green.   Yes, proportionality has worked, after a fashion, in that the town has elected two Left and one Right MP,  but it has done more than that: it has chosen between the candidates within the parties.

Everybody’s vote counts. There may be cases where a ballot is not counted for the voter’s preferred party – because the candidate made quota or dropped off the bottom – but their 4-5-6 ranking is used to express a preference as to which candidate of their non-preferred party gets elected. And so far we’ve ignored cross-party voting, which will strengthen the effect: voters are not tied to party allegiance and may vote for a popular individual despite their party.

STV also gives a much-needed voice to the majority. There is much – valid – complaint that in a ‘safe’ seat, voters for the losing parties have no say. But voters for the winning party have no say either.  The candidate is appointed by a small selection committee, or by party headquarters.  With STV it may still be effectively built-in that a party is bound to get a seat, but which of the candidates benefits from this is in the hands of the voters. Candidates – and sitting MPs – are going to realise this. They will be aware that they are answerable to the electorate rather than the party machinery. Today a Tory MP in the shires or a Labour MP in the industrial north knows that it would take major misbehaviour on their part to make voters switch party and thereby lose their seat, but with STV they will need to fear a switch in preference within the party ticket, and will treat their voters with much more respect.

This will change the dynamic of elections. Candidates will have to appeal to the electorate not just for their party but for themselves. Bright young SPADs who work the system within the party organisation to get onto the candidate list will also have to appeal to the electors if they’re going to get elected.  It’s worth noting that this dynamic is the opposite to the ‘party list’ system. You sometimes hear people object to PR because it gives control to the party rather than the voter; this applies to the list system but for STV it’s just the opposite.

Hopefully, in 100 years time “safe seats” will have gone the way of rotten boroughs and be consigned to history. STV can make that happen, giving choice to the people rather than the party machinery. 

The Lesson from the Prisoner’s Dilemma

This is a classic puzzle which, like all such, comes in the form of a story. Here is one version:


Alice and Bob are criminals. No question. They have been caught red-handed in a botched robbery of the Smalltown Store, and are now in jail awaiting trial.

The police have realised that Alice and Bob match the description of the pair who successfully robbed the Bigtown Bank last month. They really want to get a conviction for that, but with no evidence apart from the resemblance they need to get a confession.

So they say to Alice: “Look, you are going to get a 1 year sentence for the Smalltown Store job, no question. But if you co-operate with us by confessing that the two of you did the Bigtown Bank heist then we’ll let you go completely free. You can claim Bob was the ringleader and he’ll get a 10 year sentence.”

Alice thinks a moment and asks two questions.

“Are you making the same offer to Bob? What happens if we both confess?”

The police tell her that yes, they are making the same offer to both of them. And if both confess, they’ll get 6 years each.




OK, that’s the story. All that circumstantial detail is just to lead up to this decision table, which Alice is now looking at:

                           Bob
                  Confess        Deny
Alice  Confess     6+6           0+10
       Deny        10+0          1+1

That’s the problem in a nutshell. Before we look at it there are maybe a few points to clear up

  • Alice and Bob are not an item. They are just business partners. Each is aiming to minimise their own jail term, and what happens to the other is irrelevant for them.
  • ‘Go free’ really does mean that – there are no vengeful families or gang members to bring retribution on an informer.
  • Whether they actually committed the Bigtown Bank job is completely irrelevant to the puzzle.

OK, let’s get back to Alice. She reasons as follows:

“I don’t know what Bob is going to do. Suppose he denies the bank job. Then I should confess, to reduce my sentence from 1 year to zero. But what if he confesses? In that case, I’d better confess too, to get 6 years rather than 10. Whichever choice Bob makes, the better option for me is to confess. So I’ll confess.”

Bob will, of course, reason the same way. If Alice denies, he should confess. If Alice confesses, he should confess. 0 is less than 1, and 6 is less than 10. Therefore he should confess.

The logic is irrefutable. But look at that table again. The prisoners have firmly chosen the top left box, and will both serve 6 years. That’s a terrible result! It’s not only the worst total sentence (12 years), it’s the next-to-worst individual sentence (6 years is better than 10, but much worse than 0 or 1). Clearly the bottom right is the box to go for. It’s the optimal joint result and the next-to-optimal individual result.

That is obvious to us because we look at the table as a whole. But Bob (or Alice) can only consider their slices through it and either slice leads to the Confess choice. To see it holistically one has to change the question from the Prisoner’s Dilemma to the Prisoners’ Dilemma. That’s only the movement of an apostrophe, but it’s a total readjustment of the viewpoint. A joint Bob+Alice entity, if the police put them in one room together for a couple of minutes (but they won’t), can take the obvious bottom-right 1+1 choice. Separate individual Bob or Alice units, no matter how rational, cannot do that.
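The two viewpoints can even be spelled out mechanically. Here is a minimal sketch, with the sentences from the table hard-coded: it checks Alice’s best response to each choice Bob might make, and then, looking at the table as a whole, finds the cell with the smallest total sentence.

#include <stdio.h>

/* Sentences (years) from the decision table: [Alice's choice][Bob's choice],
   index 0 = Confess, 1 = Deny. alice[][] is Alice's sentence, bob[][] is Bob's. */
int alice[2][2] = {{6, 0}, {10, 1}};
int bob[2][2]   = {{6, 10}, {0, 1}};

int main(void) {
    /* Individual view: Alice's best response to each choice Bob might make */
    for (int b = 0; b < 2; b++) {
        int best = (alice[0][b] < alice[1][b]) ? 0 : 1;
        printf("If Bob %s, Alice does best to %s\n",
               b ? "denies" : "confesses", best ? "deny" : "confess");
    }
    /* Joint view: which cell minimises the total sentence? */
    int bestA = 0, bestB = 0;
    for (int a = 0; a < 2; a++)
        for (int b = 0; b < 2; b++)
            if (alice[a][b] + bob[a][b] < alice[bestA][bestB] + bob[bestA][bestB])
                { bestA = a; bestB = b; }
    printf("Jointly best: Alice %s, Bob %s (total %d years)\n",
           bestA ? "denies" : "confesses", bestB ? "denies" : "confesses",
           alice[bestA][bestB] + bob[bestA][bestB]);
    return 0;
}

Run it and the individual view says “confess” whatever Bob does, while the joint view picks deny–deny with a total of 2 years.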

This is what the philosophers call emergence. The whole is more than just the sum of its parts. A forest is more than a number of trees. An animal is more than a bunch of cells. It’s generally discussed in terms of complex large-N systems: what’s nice about the Prisoner’s Dilemma is that emergence appears with just N=2. There is a Bob+Alice entity which is more than Bob and Alice separately, and it makes different (and better) decisions.

There’s also a lesson for politics. It’s an illustration of the way that Mrs Thatcher was wrong: there is such a thing as society, and it is more than just all its individual members. Once you start looking for them, the world is full of examples where groups can do things that individuals can’t – not just from the “united we stand” bundle-of-sticks argument but because they give a different viewpoint.

  • I should stockpile lavatory paper in case there’s a shortage caused by people stockpiling lavatory paper.
  • When recruiting skilled workers it’s quicker and cheaper for me to poach yours rather than train my own.
  • My best fishing strategy is to catch all the fish in the pond, even though that leaves none for you, and none for me tomorrow.
  • Getting another cow will always give me more milk, even though the common grazing we share is finite.

Following the last instance, economists call this “The tragedy of the commons”. It’s the point at which Adam Smith’s “invisible hand” fails.

This tells us something about democracy. A society or a nation is more than just the individuals that make it up. E pluribus unum means that something larger, more powerful and – dare one say it – better can emerge. So democracy is more than just arithmetically counting noses: it provides the means whereby men and women can speak with one voice as a distinct people. That’s the ideal, anyway, and – even if the form we’ve got is clunky and imperfect – some of us still try to believe in it.

Why can’t science journalists understand p-values?


The Xenon1T experiment has just announced a really interesting excess of events, which could be due to axions or some other new particle or effect. It’s a nice result from a difficult experiment and the research team deserve a lucky break. Of course, like any discovery, it could be a rare statistical fluctuation which will go away when more data is taken. They quote the significance as 3.5 sigma, and we can actually follow this: they see 285 events where only 232 are expected, and the surplus of 53 is just 3.5 times the standard deviation you would expect from Poisson statistics: 15.2 events, the square root of 232.
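In symbols:

$$\text{significance} \;=\; \frac{285-232}{\sqrt{232}} \;=\; \frac{53}{15.2} \;\approx\; 3.5\ \sigma$$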

This is all fine. But the press accounts – as in, for example, Scientific American – report this as “there’s about a 2 in 10,000 chance that random background radiation produced the signal”. It’s nothing of the sort.

Yes, the probability of exceeding 3.5 sigma (known as the p-value) is actually 2.3 in 10,000. But that’s not the probability that the signal was produced by random background. It’s the probability that random background would produce the signal. Not the same thing at all.


What’s the difference? Well, if you buy a lottery ticket there is, according to Wikipedia, a 1 in 7,509,578 chance of winning a million pounds. Fair enough. But now suppose you meet a millionaire and ask “What is the chance they got that way through last week’s lottery?” The answer is certainly not 1 in 7,509,578.

There are several paths to riches: inheritance, business, and of course previous lottery winners who haven’t spent it all yet. The probability that some plutocrat got that way through a particular week’s lottery depends not just on that 1 in 7,509,578 number but on the number of people who buy lottery tickets, and the number of millionaires who made their pile by other means. (It’s then just given by Bayes’ theorem – I’ll spare you the formula.) You can’t find the answer by just knowing p, you need all the others as well.
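For anyone who does want the formula, in the lottery example it reads

$$P(\text{lottery}\mid\text{millionaire}) \;=\; \frac{P(\text{millionaire}\mid\text{lottery})\,P(\text{lottery})}{P(\text{millionaire})}$$

where P(lottery) folds in that 1 in 7,509,578 figure together with the number of people who play, and P(millionaire) counts millionaires made by every route.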

There is a 1 in 7 chance that your birthday this year falls on a Wednesday, but if today is Wednesday, the probability that it’s your birthday is not 1 in 7. Your local primary school teacher is probably a woman, but most women are not primary teachers. All crows are black, but not all black birds are crows. Everyday examples are all around. For instance – to pick an analogous  one – if you see strange shapes in the sky this could be due to either flying saucers or to unusual weather conditions. Even if a meteorologist calculates that such conditions are very very unusual, you’ll still come down in favour of the mundane explanation.

So going back to the experiment, the probability that random background would give a signal like this may be about 2 in 10,000, but that’s not the probability that this signal was produced by random background: that also depends on the probabilities we assign to the mundane random background or the exotic axion. Despite this 2 in 10,000 figure I very much doubt that you’d find a particle physicist outside the Xenon1T collaboration who’d give you as much as even odds on the axion theory turning out to be the right one. (Possibly also inside the collaboration, but it’s not polite to ask.)

This is a very common mistake – every announcement of an anomaly comes with its significance reported in terms of the number of sigma, which somebody helpfully translates into the equivalent p-value, which is then explained wrongly, with language like “the probability of the Standard Model being correct is only one in a million” instead of “the probability that the Standard Model would give a result this weird is only one in a million”.   When you’re communicating science then you  use non-technical language so people understand – but you should still get the facts right.

 

Tips for speakers#2: Beware of the second slide


In a million and one grad student talks the second slide looks like this: the table of contents, or the outline of the talk. It may be a bit more colourful, with banners and logos and exciting pictures, but it’s basically the same, and the speaker will repeat the traditional phrases: “After an introduction and a survey of the literature, I’ll describe the methodology we used…”

By this stage, one minute into the talk, the members of the audience are all thinking “Here’s another grad student talk like a million others… so predictable. ” and their attention will wander to their unanswered emails, or their plans for dinner, or an attractively-filled T shirt two rows in front, and the poor speaker has got to work really hard to get them back.

Do you need a contents slide at all?  It’s not compulsory.  Even though some presentation packages provide it almost by default, with sections and subsections, you don’t have to have one.  Before you include it you should weigh up the reasons for and against.

Against:

  • It cuts into your time allocation.
  • It disrupts the flow of the talk as, by definition, it stands outside the narrative.
  • It will tend to shift the focus onto you as the speaker rather than on the material

On the other hand:

  • It can give structure to an otherwise amorphous talk
  • It can help the audience keep track of a complicated sequence of ideas

So its inclusion or exclusion depends on the talk length, the nature of  the material, the links between you and your audience, and your personal style. In a 10 minute conference oral where you’re developing one idea it’s almost certainly not wanted. In a one hour seminar covering disparate but linked topics it could be really useful.  If you’re going to include an outline, that should be a conscious decision, not just something you feel you ought to do.

If you do decide to include one, then make it work for you. Refer back to it during the talk, showing the audience where they’ve got to on the map you set out at the start. (There are some Beamer  themes that do this automatically – Berkeley is a standard.  UpSlide does it for PowerPoint. But it’s easy to do it by hand.) If you’re including an outline then make full use of it.  

The final point is: if you’ve decided to include a table of contents then customize it. Make it your own and unique so that it’s not just the same as every other grad student talk. Here’s a revised version of that original outline slide (with some invented details). It’s the same slide: the first bullet is the introduction, the second is the literature search, and so on. But don’t call them that. Fill in your details in the generic slots, and that will keep the audience engaged and attentive and get them into your world and your language from the start.

Probability and job applications

In preparing a recent talk on probability for physics grad students – it’s here if you’re interested – I thought up a rather nice example to bring out a key feature of frequentist probability. I decided not to include it, as the talk was already pretty heavy, but it seemed too good an illustration to throw away. So here it is.

Suppose you’ve applied for a job. You make the short list and are summoned for interview. You learn that you’re one of 5 candidates.

So you tell yourself – and perhaps your partner – not to get too hopeful.  There’s only a 20% probability of your getting the job.

But that’s wrong.

That 20% number is a joint property of yourself and what statisticians call the collective, or the ensemble.  Yes, you are one of a collective of 5 candidates, but those candidates are not all the same.

Let me tell you – from my experience of many job interviews, good and bad, on both sides of the table – about those 5 candidates.


One candidate will not turn up for the interview. Their car will break down, or their flight will be cancelled, or they will be put in Covid-19 quarantine. Whether their subconscious really doesn’t want them to take this job, or they have a guardian angel who knows it would destroy them, or another candidate is sabotaging them, or they’re just plain unlucky, they don’t show.   There’s always one.

A second candidate will be hopeless. They will have submitted a very carefully prepared CV and application letter that perfectly match everything in the job specification, bouncing back all the buzz-words and ticking all the boxes, so that HR says they can’t not be shortlisted. But at the interview they turn out to be unable to do anything except repeat how they satisfy all the requirements, and they’ll show no signs of real interest in the work of the job apart from the fact that they desperately want it.

The third candidate will be grim. Appointable, but only just above threshold. The members of the panel who are actually going to work with them are thinking about how they’re going to have to simplify tasks and provide support and backup, and how they really were hoping for someone better than this.

Candidate four is OK.  Someone who understands the real job, not just the job spec in  the advert, and who has some original (though perhaps impractical) ideas. They will make a success of the job and though there will be occasional rough patches they won’t need continual support.

Candidate five is a star. Really impressive qualification and experience on paper, glowing references, and giving a superb interview performance, answering questions with ease and enthusiasm and using them to say more. They will certainly get offered the job – at which point they will ask for a delay, and it will become clear that they’re also applying for a much better job at a superior institution, and that they don’t really want this one which is only an insurance in case they don’t get their top choice.

So there are the five. (Incidentally, they are distributed evenly between genders, backgrounds and ethnicities). Don’t tell yourself your chance is 20%. That’s true only in the sense that your chance of being male (as opposed to female) is 50%.  Which it is, as far as I’m concerned, but certainly not as far as you’re concerned.

Instead ask yourself – which of the five candidates are you?

(If you don’t know, then you’re candidate #3)