As a new lecturer in the early 1980s, I soon learnt that the first meeting of the 3rd year examiners was the focal point of the physics department’s year. All the academics would be there: attendance was higher than at any seminar. Because this was the meeting that mattered.
Exams were over, and the marks had been collected and aggregated. Now the final-year students were to be awarded their degree classifications – a decision defining them for the rest of their lives. This was done to a clear scheme: 70% or above was a first, 60% a 2-1, and so on. Anyone making the threshold when all their marks were added got the degree, no question. But what about those just below the line with 69.9% or 58.8%? We reckoned we couldn’t mark more accurately than 2%, so anyone within that margin deserved individual consideration. The external examiner, plus a couple of internal assistants, would examine borderline candidates orally, typically going over a question in which they’d done uncharacteristically badly, to give them the chance to redeem the effects of exam panic or taking a wrong view. It was grim for the students, but we did our best, talking science with them, as one physicist to another, trying to draw out the behaviour characteristic of a 1st class (or 2-1 or…) student.
But not all students in the borderlines could be interviewed. There were too many for that, if the examining panel was to do the thorough job each candidate deserved. So a selection had to be made, and that’s what this meeting was for. Starting with students scoring 69.99 and working down the list, the chair would ask those who knew the student – their tutors, director of studies, and anyone who had been in contact with them during their 3-year course – whether they thought this candidate was in the right place, or if they deserved a shot at the rung above. Those of us who knew an individual would give our opinion – usually in the upward direction, but not always. Medical evidence and evidence of other cases of distress were given. On the basis of all this information, the meeting would decide on the interview lists.
As we worked down from 69.99 to 67.00 the case for interview got harder to make. Those with inconsistent performance – between papers, between years – got special attention. This was done at all the borderlines (and in exceptional circumstances for some below the nominal 2% zone).
We were too large a department for me or anyone to know all the students, but we would each know a fair fraction of them, one way or another, with a real interest in their progress and this, their final degree. So it mattered. We were conscientious and careful, and as generous as we could be. At the end of the meeting, which would have lasted more than 2 hours, there was the cathartic feeling of a job well done.
The 2nd meeting of the 3rd year examiners would follow some days later. This was also well attended and important, but there was little opportunity for input. The interviews would have taken place, and the panel would make firm recommendations as to whether or not a student should be nudged up or left in place. The degree lists would be agreed and signed, and we would be done with that cohort of undergraduates and start preparing for the freshers who would replace them.
The university decided that exam marking should be anonymised. The most obvious effect was that the scripts had numbers rather than names, removing the only mildly interesting feature of the tedious business of marking. But a side effect was that the students in the examiners’ meetings became anonymised too. And if candidate 12345 has a score of 69.9%, I have no way of knowing whether this is my student Pat, keen and impressive in tutorials but who made a poor choice of a final year option, or Sam, strictly middle of the road but lucky in their choice of lab partner. There was no way for us to give real information about real people. The university produced sets of rules to guide the selection of candidates for interview; all we could do was rubber-stamp the application of the rules. People gave up attending. Eventually I did too.
By this point some readers’ heads will have exploded with anger. This tale of primitive practices must sound like an account of the fun we used to have bear-baiting and cock-fighting, and the way drowning a witch used to pull the whole village together. Yes, we were overwhelmingly (though not completely) white and male, though I never heard anyone make an overtly racist or sexist comment about a candidate, and I am very sure that anyone who had done so would have been shouted down. We were physicists judging other physicists, and in doing that properly there is no room for any other considerations. There may have been subconscious influences – though we would, by definition, be unaware of them. I can hear the hollow laughter from my non-white and/or female colleagues when I tell them the process wasn’t biassed. But it wasn’t very biassed – and it could not move people down, it could only refrain from moving them up. Although the old system had to go as it was open to unfair discriminatory prejudice, I don’t believe that in our department (and I wouldn’t be prepared to speak for anywhere else) we were unfair. But perhaps you shouldn’t take my word for that.
So the old unfair system based on professional judgement has been replaced by a new unjust system based on soulless number-crunching. There is no good solution: while we draw any line to divide individuals into classes – particularly the 2-1/2-2 border in the middle of the mark distribution – and while we measure something as multidimensional as ‘ability’ by a single number, there are going to be misclassifications. I had hoped that when, thanks to data protection legislation, universities had to publish transcripts of all the students’ marks rather than just the single degree class, the old crude classification would become unimportant, but this shows no signs of happening.
There is no question that anonymous marking was needed. But any positive reform has some negative side effects, and this was one of them. The informed judgement of a community was replaced by a set of algorithms in a spreadsheet. And replacing personal and expert knowledge of students by numerical operations with spreadsheets is bound to bring injustices. Also, a rare instance where the department acted as a whole, rather than as a collection of separate research groups, got wiped from existence.