Why can’t science journalists understand p-values?


The Xenon1T experiment has just announced a really interesting excess of events, which could be due to axions or some other new particle or effect. It’s a nice result from a difficult experiment, and the research team deserve a lucky break. Of course, like any discovery, it could be a rare statistical fluctuation which will go away when more data is taken. They quote the significance as 3.5 sigma, and we can actually follow this: they see 285 events where only 232 are expected. The surplus of 53 is just 3.5 times the standard deviation you would expect from Poisson statistics, 15.2 events, the square root of 232.
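The arithmetic is easy to check. A minimal Python sketch, assuming the background count is Poisson so that its standard deviation is the square root of the 232 expected events (a real analysis would also fold in systematic uncertainties, which this ignores):

```python
import math

expected_background = 232  # events expected from known backgrounds
observed = 285             # events actually seen

excess = observed - expected_background         # the surplus of 53 events
poisson_sigma = math.sqrt(expected_background)  # Poisson standard deviation
significance = excess / poisson_sigma           # surplus in units of sigma

print(f"excess = {excess}, sigma = {poisson_sigma:.1f}, significance = {significance:.2f}")
```

This gives a significance of about 3.48, which rounds to the quoted 3.5 sigma.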

This is all fine. But the press accounts (as in, for example, Scientific American) report this as “there’s about a 2 in 10,000 chance that random background radiation produced the signal”. It’s nothing of the sort.

Yes, the probability of exceeding 3.5 sigma (known as the p-value) is actually 2.3 in 10,000. But that’s not the probability that the signal was produced by random background. It’s the probability that random background would produce the signal. Not the same thing at all.
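Converting a number of sigma into a p-value is a one-line calculation. A minimal sketch using Python's standard `math.erfc`, assuming (as is usual when looking for an excess) the one-sided tail of a normal distribution:

```python
import math

def one_sided_p_value(n_sigma):
    """Probability that a standard normal variable exceeds n_sigma."""
    return 0.5 * math.erfc(n_sigma / math.sqrt(2))

p = one_sided_p_value(3.5)
print(f"p = {p:.2e}, about {p * 10_000:.1f} in 10,000")
```

Note what the function computes: the chance of a result at least this extreme *if only background is present*. Nothing in it refers to how likely background is in the first place.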


What’s the difference? Well, if you buy a lottery ticket there is, according to Wikipedia, a 1 in 7,509,578 chance of winning a million pounds. Fair enough. But if you now meet a millionaire and ask “What is the chance they got that way through last week’s lottery?”, it’s certainly not 1 in 7,509,578.

There are several paths to riches: inheritance, business and of course previous lottery winners who haven’t spent it all yet. The probability that some plutocrat got that way through a particular week’s lottery depends not just on that 1 in 7,509,578 number but on the number of people who buy lottery tickets, and the number of millionaires who made their pile by other means. (It’s then just given by Bayes’ theorem – I’ll spare you the formula.) You can’t find the answer by just knowing p; you need all the others as well.
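For anyone who does want the arithmetic, here is the frequency form of that Bayes’ theorem calculation as a small Python sketch. The ticket sales and millionaire counts below are hypothetical placeholders, not real figures, and the model simply assumes every ticket wins independently with the quoted odds:

```python
LOTTERY_WIN_PROB = 1 / 7_509_578  # jackpot odds quoted above

def prob_rich_from_lottery(tickets_sold, other_millionaires, p_win=LOTTERY_WIN_PROB):
    """Chance that a millionaire picked at random won last week's lottery.

    Simplified model: each ticket wins independently with probability p_win,
    and other_millionaires counts everyone who got rich some other way
    (including winners of earlier draws).
    """
    lottery_millionaires = tickets_sold * p_win  # expected winners last week
    return lottery_millionaires / (lottery_millionaires + other_millionaires)

# Hypothetical inputs for illustration only: same p_win, different populations.
for tickets, others in [(10_000_000, 1_000_000), (10_000_000, 100)]:
    chance = prob_rich_from_lottery(tickets, others)
    print(f"{tickets:,} tickets, {others:,} other millionaires: {chance:.2e}")
```

The jackpot odds are identical in both lines, yet the answers differ by orders of magnitude, because the answer depends on the other numbers too.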

There is a 1 in 7 chance that your birthday this year falls on a Wednesday, but if today is Wednesday, the probability that it’s your birthday is not 1 in 7. Your local primary school teacher is probably a woman, but most women are not primary teachers. All crows are black, but not all black birds are crows. Everyday examples are all around. For instance – to pick an analogous one – if you see strange shapes in the sky this could be due to either flying saucers or to unusual weather conditions. Even if a meteorologist calculates that such conditions are very very unusual, you’ll still come down in favour of the mundane explanation.

So, going back to the experiment, the probability that random background would give a signal like this may be about 2 in 10,000, but that’s not the probability that this signal was produced by random background: that also depends on the probabilities we assign to the mundane random background or the exotic axion. Despite this 2 in 10,000 figure, I very much doubt that you’d find a particle physicist outside the Xenon1T collaboration who’d give you as much as even odds on the axion theory turning out to be the right one. (Possibly also inside the collaboration, but it’s not polite to ask.)
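To see how the prior enters, here is a minimal Python sketch. It is deliberately generous to the axion: it compares background-only Poisson statistics (mean 232) with a hypothetical signal model that predicts exactly the 285 events seen. The prior probability of 1 in 1,000 for new physics is an illustrative assumption, not anyone's published estimate:

```python
import math

def poisson_log_pmf(k, mean):
    """Natural log of the Poisson probability of observing k events."""
    return k * math.log(mean) - mean - math.lgamma(k + 1)

observed = 285
background_mean = 232
signal_plus_background_mean = 285  # hypothetical best case for the signal

# Bayes factor: how much more likely the data are under signal than background
log_bayes_factor = (poisson_log_pmf(observed, signal_plus_background_mean)
                    - poisson_log_pmf(observed, background_mean))
bayes_factor = math.exp(log_bayes_factor)

prior_signal = 1e-3  # hypothetical prior probability of the new-physics explanation
prior_odds = prior_signal / (1 - prior_signal)
posterior_odds = bayes_factor * prior_odds
posterior_background = 1 / (1 + posterior_odds)

print(f"Bayes factor ~ {bayes_factor:.0f}, P(background | data) ~ {posterior_background:.2f}")
```

Even with a signal model tuned to fit perfectly, a sceptical prior leaves the mundane explanation favoured. Change the prior and the answer changes, which is exactly why the p-value alone cannot give the probability that the signal is background.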

This is a very common mistake. Every announcement of an anomaly comes with its significance reported in terms of the number of sigma, which somebody helpfully translates into the equivalent p-value, which is then explained wrongly, with language like “the probability of the Standard Model being correct is only one in a million” instead of “the probability that the Standard Model would give a result this weird is only one in a million”. When you’re communicating science you use non-technical language so people understand, but you should still get the facts right.


The Monty Hall Puzzle

There are many probability paradoxes, but the Monty Hall Puzzle is much the greatest of these, provoking more head scratching and bafflement than any other.

It is easy to state. Monty Hall hosted a TV quiz show “Let’s Make a Deal”, in which a contestant has to choose one of 3 doors: behind one of these is a sports car, whereas the other two both contain a goat. (Some discussions of the puzzle – and there are many – speak of ‘a large prize or smaller prizes’, but they can be dismissed as non-canonical; the goats are essential.) There is no other information, so the contestant has a 1 in 3 chance of guessing correctly. Let’s say, without loss of generality, that they pick door 1.

But Monty doesn’t open it straight away. Instead he opens one of the other 2 doors – let’s say it’s door 3 – and shows that it contains a goat. He then offers the contestant a chance to switch their choice from the original door 1 to door 2.

Should the contestant switch? Or stick? Or does it make no difference?

That’s the question. I suggest you think about it before reading on. What would you do? Bear in mind that the pressure is on, you are in a spotlight with loud music building up tension, and Monty is insistent for an answer. Putting the contestant under pressure makes good television.

Several arguments are put forward, often vehemently:

  1. You should switch: the odds were 1/3 that door 1 was the winner, and 2/3 that it was one of the other doors. You now know the car isn’t behind door 3, so all that 2/3 collapses onto door 2. Switching doubles your chance from 1/3 to 2/3.
  2. There’s no point in switching: all you actually know, discarding the theatricality, is that the car is either behind door 1 or door 2, so the odds are equal.
  3. But you should switch! Suppose there were 100 doors rather than 3. You choose one, and Monty opens 98 others, revealing 98 goats, leaving just one of the non-chosen doors unopened. You’d surely want to switch to that door he’s so carefully avoided opening.

Thought about it? OK, the answer is that there is no answer. You don’t yet have enough information to make the decision, as you need to know Monty’s strategy. Maybe he wants you to lose, and only offers you the chance to switch because you’ve chosen the winning door. Or maybe he’s kind and is offering because you’ve chosen the wrong door. (There’s a pragmatic let-out which says that if you don’t know whether to switch or stick you might as well switch, as it can’t do any harm – we can close that bolthole by supposing that Monty will charge you a small amount, $10 or so, to change your mind.)

OK, let’s suppose we know the rules, and they are:

  1. Monty always opens another door.
  2. He always opens a door with a goat and offers the chance to switch. If both non-chosen doors contain goats he chooses either at random.

Now we have enough information. We can analyse this using frequentist probability, which is what we learnt at school.

Suppose we did this 1800 times (a nice large number with lots of useful factors). Then the car would be behind each door 600 times. Alright, not exactly 600 because of the randomness of the process, but the law of large numbers ensures it will be close.

A door is then chosen. This is also random, so in each of the three groups of 600 cases door 1 will be chosen only 200 times, 3 × 200 = 600 cases in all. The other cases can now be discarded, as we know they didn’t happen.

For the 200 cases where the car is behind door 1, Monty will open door 2 and door 3 100 times each. We know he didn’t open door 2, so only 100 cases survive. But all 200 cases with the car behind door 2 survive, as for them he is sure to open door 3. When the car is behind door 3 he is never going to open it. So of the original 1800 instances, door 1 is chosen and door 3 is opened in 300 cases, of which 200 involve a winning door 2 and only 100 have door 1 as the winner. Within this sample the odds are 2:1 in favour of door 2. You should switch!
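The counting argument can be written out directly. A minimal Python sketch, assuming the idealised exact thirds and halves used in the text rather than random fluctuations; the function name is just for illustration:

```python
def tally_rules_one(total_games=1800):
    """Count games where door 1 is chosen and Monty opens door 3,
    keyed by the door hiding the car (rules: Monty always shows a goat)."""
    survivors = {}
    for car in (1, 2, 3):
        car_here = total_games // 3   # car behind each door in a third of games
        picked_1 = car_here // 3      # contestant picks door 1 a third of the time
        if car == 1:
            opened_3 = picked_1 // 2  # two goat doors: Monty picks door 3 half the time
        elif car == 2:
            opened_3 = picked_1       # door 3 is the only goat he can show
        else:
            opened_3 = 0              # he never opens the door hiding the car
        survivors[car] = opened_3
    return survivors

counts = tally_rules_one()
print(counts)  # {1: 100, 2: 200, 3: 0}
```

Door 2 wins in 200 of the 300 surviving games, the 2:1 odds found above.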

You can also show this using Bayes’ theorem. Maybe I’ll write about Bayes’ theorem another time. For the moment, let’s just accept that when you have data, prior probabilities are multiplied by the likelihood of getting that data, subject to overall normalisation.

The initial probability is 1/3 for each door.


The ‘data’ is that Monty chose to open door 3. If the winner is door 2, he will certainly open door 3. If it is door 3, he will not open it. If it is door 1, there is a 50% chance of picking door 3 (and 50% for door 2). So the likelihoods for the car being behind doors 1, 2 and 3 are 1/2, 1 and 0 respectively, and after normalisation

$$P_1' = \tfrac{1}{3} \qquad P_2' = \tfrac{2}{3} \qquad P_3' = 0$$

So switch! It doubles your chances. 
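The same update in a few lines of Python, using exact fractions. The helper name `posterior` is ours, purely for illustration:

```python
from fractions import Fraction

def posterior(priors, likelihoods):
    """Multiply priors by likelihoods and normalise (Bayes' theorem)."""
    unnormalised = [p * l for p, l in zip(priors, likelihoods)]
    total = sum(unnormalised)
    return [u / total for u in unnormalised]

priors = [Fraction(1, 3)] * 3
# Chance that Monty opens door 3 if the car is behind door 1, 2 or 3 (first rules)
likelihoods = [Fraction(1, 2), Fraction(1), Fraction(0)]

post = posterior(priors, likelihoods)
print(post)  # door 1: 1/3, door 2: 2/3, door 3: 0
```

Modelling a different set of rules only means changing the list of likelihoods.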

If you think that’s all obvious and are feeling pretty smug, let’s try a slightly different version of the rules:

  1. Monty always opens another door.
  2. He does this at random. If it reveals a car, he says “Tough.” If it contains a goat, he offers a switch.

The frequentist analysis is similar: starting with 1800 cases, if door 1 is chosen then that leaves 600, with 200 for each door being the winner. Now he opens doors 2 and 3 with equal probability, whatever the winning door may be. If it’s door 1, 100 survive as before. If it’s door 2, this time only 100 survive, and in the other hundred he opens door 2 to show a car. For door 3 there are no survivors as he either reveals a goat behind door 2 or a car behind door 3, neither of which has happened. So in this scenario there are 200 survivors, 100 each for doors 1 and 2. The odds are even and there is no point in switching.
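The tally for these rules, as a minimal Python sketch that uses the same idealised exact split of 1800 games as the text (the function name is hypothetical):

```python
def tally_rules_two(total_games=1800):
    """Count games where door 1 is chosen and Monty opens door 3 to show a goat,
    keyed by the door hiding the car (rules: Monty opens door 2 or 3 at random)."""
    survivors = {}
    for car in (1, 2, 3):
        picked_1 = total_games // 3 // 3  # car here AND door 1 chosen: 200 games
        opened_3 = picked_1 // 2          # Monty opens door 3 in half of them
        # keep only games where door 3 revealed a goat, i.e. the car is elsewhere
        survivors[car] = opened_3 if car != 3 else 0
    return survivors

counts = tally_rules_two()
print(counts)  # {1: 100, 2: 100, 3: 0}
```

Now door 1 and door 2 each win in 100 of the 200 surviving games.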

Using Bayes’ theorem gives (of course) the same result. The prior probabilities are still all 1/3. The likelihood for Monty to pick door 3 and reveal a goat is 1/2 for both door 1 and door 2 concealing a car, and zero for door 3. Normalising

$$P_1' = \tfrac{1}{2} \qquad P_2' = \tfrac{1}{2} \qquad P_3' = 0$$

and there’s no point in switching.

So a slight change in the rules switches the result. The arguments 1 to 3 are all suspect. Even the third argument (which I personally find pretty convincing) is not valid for the second set of rules. If Monty opens 98 doors at random to reveal 98 goats, this does not make it any more likely that the one door he left unopened is the winner.
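The 100-door version can be checked exactly. A minimal Python sketch with exact fractions; the function name is hypothetical, and "random Monty" means he leaves one of the unchosen doors shut at random (equivalently, opens the others at random), which is the many-door analogue of the second rules:

```python
from fractions import Fraction

def stick_and_switch(n_doors, random_monty):
    """Exact win probabilities (stick, switch) with n_doors doors, given that Monty
    opened every unchosen door but one (the 'survivor') and all showed goats."""
    prior = Fraction(1, n_doors)
    if random_monty:
        # He leaves one of the n-1 unchosen doors shut, picked at random,
        # whatever is behind it; the car must not be behind an opened door.
        like_chosen = Fraction(1, n_doors - 1)
        like_survivor = Fraction(1, n_doors - 1)
    else:
        # He knows where the car is and never reveals it.
        like_chosen = Fraction(1, n_doors - 1)  # car behind your door: any door may survive
        like_survivor = Fraction(1)             # car behind survivor: it must stay shut
    # Doors that were opened have likelihood zero, so only two terms remain.
    total = prior * like_chosen + prior * like_survivor
    return prior * like_chosen / total, prior * like_survivor / total

print(stick_and_switch(100, random_monty=False))  # switching wins 99 times in 100
print(stick_and_switch(100, random_monty=True))   # even odds
```

With a knowing Monty the survivor really is special; with a random Monty it is no more likely than your own door.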

If you don’t believe that – or any of the other results – then the only cure is to write a simulation program in the language of your choice. This will only take a few lines, and seeing the results will convince you where mathematical logic can’t.
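Here is one such program, a minimal Python sketch covering both sets of rules (the function names are our own). Doors are numbered 0 to 2, and the contestant's choice is random rather than always door 1, which by symmetry makes no difference:

```python
import random

def play(rng, random_monty):
    """Play one game. Return None if Monty reveals the car (no switch offered),
    otherwise True if switching wins and False if sticking wins."""
    car = rng.randrange(3)
    choice = rng.randrange(3)
    others = [d for d in range(3) if d != choice]
    if random_monty:
        opened = rng.choice(others)       # second rules: either other door at random
        if opened == car:
            return None                   # "Tough." No switch offered.
    else:
        goats = [d for d in others if d != car]
        opened = rng.choice(goats)        # first rules: always a goat, random if two
    switch_to = next(d for d in others if d != opened)
    return switch_to == car

def switch_win_rate(random_monty, games=100_000, seed=1):
    """Fraction of games offering a switch in which switching wins."""
    rng = random.Random(seed)
    results = [play(rng, random_monty) for _ in range(games)]
    offered = [r for r in results if r is not None]
    return sum(offered) / len(offered)

print("first rules:", switch_win_rate(random_monty=False))  # close to 2/3
print("second rules:", switch_win_rate(random_monty=True))  # close to 1/2
```

Games where Monty reveals the car are discarded, just as the frequentist analysis discards cases that didn't happen.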

So the moral is to be very wary of common sense and “intuition” when dealing with probabilities, and to trust only in the results of the calculations. Thank you, Monty!