You can find many ways in the textbooks to derive the Lorentz Transformation, starting from Einstein’s famous two postulates: that the laws of physics are the same in all inertial frames, and that the speed of light is a constant. You can do it in one big chunk, or by starting with length contraction and time dilation.
What I want to do here is show a proof which requires only one, surprisingly minimal, assumption, and which relegates ‘light’ to its proper place as a subsidiary phenomenon. This is the opposite of the order which is usually taught, so this is not the sort of proof you get in Relativity101, but after you’ve learnt and are happy with the standard proofs, I think you’ll appreciate this one.
We make some basic assumptions – as indeed we do in a conventional proof, though they’re not usually spelt out. Events occur in continuous time t and continuous space r, though for simplicity we’ll just consider one space dimension x. Space and time are isotropic and homogeneous – there are no special times or places. We can plot events in space-time diagrams, where the t axis is calibrated using repeated identical processes like the swing of a pendulum or the vibrations of a crystal, and the x axis is calibrated using stationary identical rods.
Events cause, and are caused by, other events. For a pair of events A and B it could be that A→B, A has a (possible) effect on B, or that B→A, B has a (possible) effect on A. In the first case we say that A lies in the past of B, and B is in the future of A. In the second case it’s the other way round. We dismiss the possibility that both A→B and B→A, as that leads to paradoxes of the killing-your-grandfather variety. But what about the possibility that neither A→B nor B→A: that there can be pairs of events for which neither can influence the other?
There’s not an obvious answer. If you were designing a universe you could insist that any pair of events must have a causal connection one way or the other, or you could allow the equivalent of the ‘Don’t know’ box. The choice is not forced on us by logic. But let’s suppose that we do live in a universe where this directed link between events is optional rather than compulsory:
There are pairs of events which are not causally connected.
I promised you a single assumption: there it is. Now let’s build on it.
For any event there must be some events which are not causally connected to it. The assumption says this is true for some events, but all events must be similar (as space and time are homogeneous), so it is true in general. So we can draw a space-time diagram showing the events that are past, future, and elsewhere for an event at the origin.
Causality is transitive: if A→B and B→C then A→C, as A can influence C through B. That means that at any particular point x, events that are in A’s past must be followed by elsewhere events and then future events. They can’t be mixed up. The events occur in defined regions.
Even at small distances there must be elsewhere events. Suppose there were some minimum distance Δ from A within which all events were either past or future, and let B be the event at Δ on the division between past and future. By homogeneity the same holds within Δ of B, so all events within 2Δ of A must be past or future, and so on for 3Δ, 4Δ, 5Δ…, which would leave no elsewhere events at all.
The lines separating the past, elsewhere and future regions must be straight lines going through the origin. For any point B on the future light cone of A, the line separating B’s elsewhere and future must have the same gradient as the light cone for A at x=0. But the future light cone of B defines the future light cone of A. So the gradient must be constant all the way. (The same applies for the past light cone, and symmetry requires that the gradient have the same magnitude.)
So to re-cap: first we establish that there are elsewhere events, then that they lie in regions, then that these regions go all the way to the origin, and finally that the shape of the elsewhere region is a simple double wedge. (It’s called a ‘light cone’ as you can imagine extending the picture to two space dimensions by rotating these 2D pictures about the vertical axis, but you probably knew that already.)
Out of this picture a number emerges: the gradient of the line dividing the elsewhere region from the future (or the past). We have no way of knowing what its value is – only that it is finite. It describes the speed of the fastest possible causal signal and we will, of course, denote it by c. It can be viewed as a fundamental property of the universe, or as a way of relating time measurement units to space ones.
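The three regions can be sketched in a few lines of Python. This is only an illustration: the function name `classify`, the default c=1, and the choice to count events lying exactly on the dividing line as past or future (rather than elsewhere) are all assumptions made for the example.

```python
def classify(x: float, t: float, c: float = 1.0) -> str:
    """Classify the event (x, t) relative to an event at the origin.

    c is the gradient of the dividing line, i.e. the speed of the
    fastest possible causal signal. Events exactly on the dividing
    line are treated as past/future (an assumed convention).
    """
    if x == 0 and t == 0:
        return "origin"
    if abs(x) <= c * abs(t):
        # A signal no faster than c can link the origin and (x, t).
        return "future" if t > 0 else "past"
    # No signal is fast enough: neither event can influence the other.
    return "elsewhere"


if __name__ == "__main__":
    for event in [(0.5, 1.0), (-0.5, -1.0), (2.0, 1.0), (1.0, 0.0)]:
        print(event, classify(*event))
```

Changing c only changes the opening angle of the wedge; the three-way classification itself is unaffected.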
Now we’re on more familiar ground. If an event that we denote by (x,t) is observed by someone in a different inertial frame moving at some constant speed relative to the first, they will ascribe different numbers (x’,t’). What is the transformation (x,t)→(x’,t’)?
- Let’s assume that zeros are adjusted so that (0,0) is just (0,0). That’s trivial.
- We require that vector equations remain true: if (x_A,t_A)=(x_B,t_B)+(x_C,t_C) then (x’_A,t’_A)=(x’_B,t’_B)+(x’_C,t’_C). That limits us to linear transformations x’=Ax+Bt; t’=Cx+Dt. So the transformation is completely described by 4 parameters A, B, C and D.
- The inverse transform (x’,t’) to (x,t) must be the same, except that the direction of the speed has changed. That’s the equivalent of changing the sign of x or t. So x=Ax’-Bt’; t=-Cx’+Dt’. The transformation to the new frame and back again must take us exactly back to what we started with, i.e. A(Ax+Bt)-B(Cx+Dt)=x for all x and t. The coefficient of t gives AB=BD, so A=D, and the coefficient of x gives A²-BC=1. The four parameters are reduced to two.
- Finally we impose the requirement that the new co-ordinates (x’,t’) must lie in the same sector (past, future, or elsewhere) as the old. In particular, if x=ct then x’=ct’. That means Act+Bt=c(Cct+Dt) and using A=D from the previous paragraph, this shows B=c²C. The two parameters are reduced to one. This is most neatly expressed by introducing v=-B/A, as then A²-BC=1 gives our old friend A=1/√(1-v²/c²) and substituting A, B, C and D gives the familiar form of the Lorentz transformations.
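As a numerical check on the steps above, here is a minimal Python sketch that builds A, B, C and D from v and c and confirms the constraints. The function names and the default c=1 are assumptions for illustration only. Written out, the result is x’=A(x-vt), t’=A(t-vx/c²).

```python
import math


def lorentz_coefficients(v: float, c: float = 1.0):
    """Return (A, B, C, D) for x' = A x + B t, t' = C x + D t."""
    if abs(v) >= c:
        raise ValueError("need |v| < c")
    A = 1.0 / math.sqrt(1.0 - v * v / (c * c))  # from A^2 - BC = 1
    B = -v * A                                  # definition v = -B/A
    C = B / (c * c)                             # from B = c^2 C
    D = A                                       # from the round trip
    return A, B, C, D


def transform(x: float, t: float, v: float, c: float = 1.0):
    """Map (x, t) to (x', t') in the frame moving at speed v."""
    A, B, C, D = lorentz_coefficients(v, c)
    return A * x + B * t, C * x + D * t


def inverse_transform(xp: float, tp: float, v: float, c: float = 1.0):
    """Map (x', t') back to (x, t): same coefficients, B and C flipped."""
    A, B, C, D = lorentz_coefficients(v, c)
    return A * xp - B * tp, -C * xp + D * tp


if __name__ == "__main__":
    v = 0.6
    A, B, C, D = lorentz_coefficients(v)
    print("A^2 - BC =", A * A - B * C)            # should be close to 1
    print("round trip:", inverse_transform(*transform(2.0, 3.0, v), v))
    print("x = ct maps to:", transform(1.0, 1.0, v))  # x' should equal t'
```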
Inspecting these shows that v, which we introduced as a parameter, describes the motion of the point x’=0, the origin of the primed frame, in the original frame, i.e. the speed of one frame with respect to the other.
A bit of algebra shows that the ‘interval’ of an event is the same: x²-c²t²=x’²-c²t’². Which is neat, showing that the points lie on a hyperbola of which the light-cone crossed-lines is the limiting case, so they cannot move between sectors. But we didn’t have to assume that the interval is unchanged, only that an interval of zero remains zero.
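The invariance is easy to check numerically. The sketch below is an illustration only: the names `boost` and `interval` and the sample events are assumptions, and c is set to 1 by default.

```python
import math


def boost(x: float, t: float, v: float, c: float = 1.0):
    """Lorentz transformation of (x, t) to a frame moving at speed v."""
    gamma = 1.0 / math.sqrt(1.0 - v * v / (c * c))
    return gamma * (x - v * t), gamma * (t - v * x / (c * c))


def interval(x: float, t: float, c: float = 1.0) -> float:
    """The quantity x^2 - c^2 t^2 for an event (x, t)."""
    return x * x - c * c * t * t


if __name__ == "__main__":
    v = 0.6
    # One elsewhere event, one future event, one on the dividing line.
    for event in [(3.0, 1.0), (1.0, 3.0), (2.0, 2.0)]:
        print(event, interval(*event), interval(*boost(*event, v)))
```

The sign of the interval (positive for elsewhere, negative for past or future) is what keeps an event in its sector. For the elsewhere event (3, 1) the boosted time comes out negative, showing that the time order of two elsewhere events is frame-dependent.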
So the Lorentz Transformation springs from the basic causal structure of space-time, assuming that not all events are causally connected one way or the other, with c the speed of the fastest causal signal, whatever that happens to be. Length contraction and time dilation follow from this. Then you discover that if you have Coulomb’s Law type electrostatics the Lorentz Transformations give you magnetism and Maxwell’s Equations emerge. These have wavelike solutions with wave velocity c.
In terms of logical argument, the causal structure of the universe just happens to include the possibility that 2 events cannot affect one another in either way. This fundamental property leads to relativity and the Lorentz Transformation, which leads to electromagnetism, which then leads to EM waves and light, even though historically and pedagogically the sequence is presented the other way round.
4 thoughts on “The Lorentz Transformation – a minimal proof”
I sometimes find myself convinced by two different opposing arguments.
I read an interesting blog that says that Einstein was not explicit about a hidden, but assumed, postulate when developing Special Relativity. Fixing this flaw, according to the author, Dr. Robert Buenker, will preserve the empirically proven parts of SR while ridding it of some of its most problematic areas. I would appreciate it if you could point out the flaw in his reasoning: The Alternative Lorentz Transformation http://alternativelorentztransformation.blogspot.com/p/introduction.html
Buenker attacks the standard assumption that transverse co-ordinates are not affected – y’=y. However this ‘assumption’ can be justified as follows. Because of the symmetry between the two frames, if y’ is less than y (he suggests y’=y/gamma) then we must also have y less than y’, which is a contradiction and absurd, so they must be the same.
“But!” you shout, “Surely that’s what happens in length contraction! Both observers say that a moving rod is shorter than a stationary one. What’s different about x and y?” The difference lies in the question “How do you measure the length of a rod?” to which the answer is that you compare both ends against a ruler and read off the numbers. In doing so the measurements of the two ends must be done at the same time (unless the rod and ruler are stationary). If time elapses between the measurement of one end and the measurement of the other then the ruler will move and the measurement won’t be valid.
So suppose an observer on a railway track lays a metre rule across the track, and places a camera midway. An observer on the train positions a metre rule similarly. As the train passes, the ends of the rules coincide and the two images of this meeting travel to the camera, arriving 0.5/c≈1.7 ns later. The observer on the track argues that they arrive together, they travelled the same distance, so the images were taken simultaneously and this is a valid measurement. The observer on the train argues the same. The track observer may say that the time taken for the images to get to the camera on the train is not 1.7 ns but, using Pythagoras, 0.5/√(c²-v²), a bit longer, but they will still agree that the times taken for the images at the two ends are the same. So the two observers can agree that the rods have the same length as their ends coincided simultaneously.
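The two travel times can be worked out directly. This is a rough sketch: the function names and the train speeds are assumptions chosen for illustration, and c is taken as the SI value.

```python
import math

C_LIGHT = 299_792_458.0  # speed of light in m/s


def delay_at_rest(half_length: float) -> float:
    """Light-travel time from one end of the rule to a camera at rest."""
    return half_length / C_LIGHT


def delay_moving(half_length: float, v: float) -> float:
    """Travel time, as judged from the track, to a camera moving at speed v.

    The light covers the hypotenuse c*t while the camera moves v*t along
    the track, so (c*t)^2 = half_length^2 + (v*t)^2.
    """
    return half_length / math.sqrt(C_LIGHT**2 - v**2)


if __name__ == "__main__":
    half = 0.5  # metres: half of a metre rule
    print("camera at rest:", delay_at_rest(half), "s")  # roughly 1.67e-9 s
    for v in (50.0, 0.6 * C_LIGHT):  # hypothetical train speeds
        print("v =", v, "->", delay_moving(half, v), "s")
```

The delay is the same for both ends of the rule because the two halves are mirror images of each other about the camera, which is why both observers accept the transverse comparison as simultaneous.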
If they try and do this for a longitudinal measurement then they won’t agree. If the train observer says that two images arrive together so they must have started together, the track observer will say that as the train is moving, for the events to arrive at the same time the one at +x must have started later – the comparisons were not simultaneous and this is not a valid measurement.
So y’=y and z’=z are not ‘assumptions’. They hold because observers can compare the transverse lengths of measuring rods in such a way that both agree that the measurement is valid (which is not true for longitudinal lengths), so anything other than y’=y would lead to a contradiction.
Bottom line: Buenker is wrong and Einstein is right, as usual.
I love the simplicity of this proof; it doesn’t appeal to much more than what I believe any reasonable person would agree with. I’m just going through Landau’s Classical Theory of Fields textbook and his explanation is very similar to what’s outlined here; both have helped me to make sense of a phenomenon I’ve often wondered about yet never quite wrapped my head around (I’m a PhD candidate in experimental heavy ion physics, and I often fear my understanding is well behind the bubble). Thank you for posting this up!
I’m hoping this is the same argument as put forward by Vladimir Ignatowski in 1909 (and others later). Versions of Ignatowski’s argument are usually laid out in more mathematically formal language involving groups, and I have to confess to getting a bit lost. But I can follow your argument easily, even though I suspect it’s the same thing.