Done and Dusted
Thursday saw the release of GCSE exam results, marking the end of UK exam season. We now have a single week of breathing-space before it all starts again with the new cohorts in September. Bring it on.
Results days are some of the most emotionally charged days in the academic calendar, but they're always mixed with commentary from politicians and pundits talking about the state of the nation's education and, usually, the need for reform. It's only a matter of time before someone says those mortifying words: "Exams are getting too easy, they were tougher in my day!" I've heard politicians say it, people on buses, parents of students and so on. Everyone seems to think their exams were the most difficult to have ever existed, but is that fair?
It seems like an insult to the hard-working students who have bled themselves dry in order to do well, but I guess it makes you feel special if you truly believe your life has been a tougher struggle than anyone else’s.
But how are today’s exams different to those of the past? As someone on the front line of modern education (well, it’s really the students who are on the front line, I’m more like the drill sergeant who trains them and sends them off to war) I thought I’d share my thoughts on the topic.
How Grades Work – UK vs USA
Education is a tricky thing to get right and I don’t think any country has it figured out (although I’d take a glance in Canada and Scandinavia’s direction). Most of the web traffic I get comes from the UK and the US, so let’s take a look at how these two systems address the problem of getting an entire population educated.
In the USA nothing is standardised. Every pupil attends classes and their teacher is responsible for their overall grade. How that grade is reached varies between schools, subjects and teachers themselves. Typically between 40 and 50% of the grade is based on a final exam, written and marked by the school faculty, while the remaining 50 – 60% is based on things like coursework, class participation, attendance and behaviour.
At the end of the year, the teacher adds up your scores from these different streams, and the grade boundaries are pretty straightforward. Score 90% and you get an A. 80% gets you a B, 70% a C, 60% a D. Anything below that and you get an F - a “Fail”. You can re-take the year, however, so if you don’t get enough good grades you get another shot. And that’s the end of that.
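The weighted-average idea is easy to sketch. This is a minimal illustration with made-up weightings (real ones vary by school, subject and teacher), using the fixed letter boundaries described above:

```python
# A minimal sketch of US-style continuous assessment. The weights here
# are invented for illustration; real schools choose their own.
def final_grade(exam, coursework, participation,
                weights=(0.5, 0.35, 0.15)):
    """Each component is a percentage score from 0 to 100."""
    score = (weights[0] * exam
             + weights[1] * coursework
             + weights[2] * participation)
    # Fixed letter boundaries: 90 = A, 80 = B, 70 = C, 60 = D, else F.
    for boundary, letter in ((90, "A"), (80, "B"), (70, "C"), (60, "D")):
        if score >= boundary:
            return letter
    return "F"
```

Under these invented weights, a student scoring 85% on the exam, 92% on coursework and full participation lands on 89.7% overall, which falls just under the A boundary and earns a B.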
In the UK, exams are written by privately run exam boards. Exam boards make money in two ways: entry fees (schools pay to enter a student for an exam) and things like textbooks and online resources. This second one means an exam board tends to make more money if it changes the content of the course every few years, since schools have to buy new books and equipment to keep up.
The exams are sat nationally at the same time up and down the country, before they get collected and distributed to markers (usually teachers earning a smidge of extra cash). That’s why it takes several months between the exams being sat in May/June and results day in August. Typically 90 – 100% of your score is determined by the final exam, with things like coursework, homework and behaviour being irrelevant.
At GCSE level (age 16) the grades go from A* - G, with a “U” grade being a fail. Except starting next year we’re switching to a numerical system where the grades go from 9 – 1 (9 being the highest). Then at A-level (age 18) the grades go from A* - E, with U being a fail.
The grade boundaries are moderated every year by a team of exam officers (slightly different for each board) so the score required to achieve a particular grade changes. Re-sitting is a complicated and expensive option so once you’ve done your exams that’s pretty much it unless you can afford the re-sit fees.
There are clear strengths and weaknesses with both systems. The UK model is obviously intended to be standardised so that an A from one school means the same as an A from another (although the fact that there are about five different exam boards sort of undermines that).
It does also prevent manipulation so a teacher can’t mark a student they don’t like harshly, or give extra credit to a student who’s good on the football team and the local community wants to see them going to college etc.
The US system has the clear advantage that the student has a chance to demonstrate skill over a long period of time, rather than being scrutinised on three years’ worth of work in a single exam. I’ve known students who have suffered a personal tragedy a few days before their exam and obviously didn’t do their best. In the American system I’d be able to give them the grade I felt they deserved, but in the UK if you’re ill on the day – too bad. Until we learn how to digitally upload information to the human brain, it’s unlikely anyone will solve the problem perfectly.
Lies, Damned Lies and Statistics
Let me demonstrate something which I think is important. I wanted to look at the figures surrounding GCSE and A-level grades but it turns out getting hold of these statistics is surprisingly difficult. The UK government website doesn’t offer any publicly available information so you really have to go hunting to find what you want.
I am particularly grateful to Brian Stubbs from the University of Bath, who I contacted to help write this blog. If you’re interested, I strongly encourage you to check out his website: http://www.bstubbs.co.uk where he has collated decades of historical exam information. So what does the data show?
Well, in 1989 approximately 77,700 A grades were awarded to A-level students in the UK. This year, around 150,000 were awarded. So exams are twice as easy because the number of A grade students has doubled?
Let’s take another look. In 1989 an “A” grade was the highest grade you could get, but in 2017 it’s the second highest. The highest grade in 2017 is an A*...of which only 69,000 were issued. In other words the “top grade awarded” went down significantly, so exams are obviously getting much, much harder, right?
Not necessarily. In 1989 only around 600,000 students nationally even attempted A-levels whereas this year it was around 830,000. So if we take the top grade as a percentage we see the number of top grades awarded has gone from 11% to 8%, so exams have gotten harder but only by a small amount.
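That percentage comparison is nothing more than the top-grade count divided by the total number of entries. Here it is made explicit, using the approximate 2017 figures quoted above:

```python
# Share of entrants receiving the top grade, as a percentage.
def top_grade_share(top_grades_awarded, total_entries):
    return 100 * top_grades_awarded / total_entries

# 2017: roughly 69,000 A* grades out of roughly 830,000 entries.
print(f"{top_grade_share(69_000, 830_000):.1f}%")  # about 8%
```

The calculation itself is trivial; the point of the section is that everything hinges on which numerator and denominator you decide to feed in.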
Now let’s look at GCSE grades. In 1988, 12.8% of students were getting the second-highest grade. In 2007 that number was 13.1%. So actually the difficulty level of exams hasn’t changed at all - the same number of students are getting the same kind of grades.
But a really interesting pattern emerges if we look at the years a new “top grade” is introduced. In 2011, 7.8% of GCSE students achieved a grade of A*. Compare that with 1994, the year the A* grade was introduced – that year only 2.8% of students got it. So the exams are getting easier?
Well no, this year they introduced the grade 9 and only 3% of students got it. So if you compare like with like, i.e. compare 2017 with 1994, then you get 3% of students achieving the top grade, so there has been no change. Exams are staying about the same.
This year there has been a 0.4% drop in grade 9s/8s compared to last year’s A* grades for English GCSE. That’s the headline most newspapers are worrying over. Except what’s not being mentioned is that this grade-dip is for English language and literature combined. If you look at English literature (all students sit two English GCSEs) we’ve actually gone up by over 2%.
The point I’m making should be obvious. Depending on which years I pick and which grades I choose to look at, I can spin any story I want. If I were in the government I might want to make it look like grades were going up under my party. Or perhaps I might want to make it look like grades went down under the opposition. If I were the head of an exam board I might want to make it look like grades are staying level and that everything is nice and fair. We have to be very careful what we’re looking at.
The statistics are complicated. However, it is reasonably accurate to say there has been a slight increase in the percentage of “top grades” being awarded over the past twenty years. Grade inflation is a real thing, albeit a very subtle one. But that doesn’t necessarily mean exams are getting easier. In Science you don’t just look at the data and immediately decide the explanation. You consider alternative explanations and see if they account for the data better.
If exams were getting easier then we wouldn’t see sudden dips when a new grading system is introduced, like we did this year and in 1994. Actually, the most sensible conclusion to draw would be that grades increase as a function of familiarity. Change how familiar the exam is and you’ll see a dip in grades. What you might really be seeing in those figures is that people do better each year, provided it’s the same style of exam.
Teachers get used to the types of questions, pupils have access to more past years’ papers, examiners have more trustworthy mark schemes, exam-writers have done it before so they can give teachers more training on what to expect, etc.
Actually, a very steady increase in grades is precisely what you would expect if the exams were staying more or less the same. The grade-inflation data we see is very small, implying that it’s more about adaptation rather than exams getting easier.
Today, partly thanks to the fact that schools are shifting to online data storage, we can keep past papers from previous years and give them to our students. In fact, at my school I have made video recordings of myself answering previous years’ Physics papers. Students can log on to the physics network and watch me as I attempt a question, describing my method as I go. This is very specific coaching which gives them a slight edge. And that’s a good thing.
The downside is that we spend a lot of time “teaching to the test” rather than teaching a subject for the fun of it and we put waaaaay too much emphasis on answering exam-questions. It has to be said that I have been able to train some students to jump through hoops and over obstacles and squeeze them over the boundary of an A grade, when really they don’t understand the Science any better than a student who gets a B.
Maybe I’m actually causing problems for them further down the line by doing that. I have occasionally coached a student to get an A grade, and they’ve gone to University only to find they don’t really understand the subject as well as they thought and have dropped out. Perhaps I should just let students do a bit worse and not train them in the art of the exam? Hmmm that's a tricky one.
Ultimately, once teachers get to know how an exam system works they can train the students to do better at it, so we see an increase in grades. The problem is that this puts teachers in a difficult position. The government tells schools to raise their standards. If the grades don’t go up then we’ve failed to do it. If the grades do go up then it’s because the exams are easier. It’s a no-win scenario which is not something anyone wants to face.
Besides, I’m not sure “more A grades” necessarily equates to a higher standard of education. At the moment more A grades just means more students better trained to pass exams. Is there a risk that some of the A-grade students aren’t really comparable to A-grade students of yesteryear because they’ve been coached to pass an exam rather than having a deep understanding of the subject? I’m not sure what the solution is (like I said, education’s a tricky thing to get right) so I tend to keep things as simple as I can: if a student asks me for help...I give it to them.
What Are Exams Like Now?
A report commissioned by Ofqual (The Office of Qualifications and Examinations Regulation) in 2012 really irked me. It decided that looking at grade boundaries wasn’t a good way of deciding if exams were getting easier. So far I agree. In order to solve the problem, they did a detailed analysis of exam papers from 2005 and compared them with exam papers from 2008...in two subjects (Biology and Geography). That would be like looking at the weather in two cities a week apart and drawing conclusions about climate change. That’s far too narrow a data set.
The report then claimed that yes, exams really were getting easier. Most of this conclusion came from two factors. Let’s look at the first one.
Ofqual noted that older exams had more essay-questions while modern exams had more multiple-choice questions. Therefore modern exams are easier. The assumption seems to be that essays are hard and multiple-choice is easy. Let’s break that nonsense down.
Working as an exam-marker isn't exactly a soul-fulfilling job. You get paid for every exam script you mark (not very much) so the aim is obviously to get as many done as quickly as possible. After a 10-hour day in school you go home, log on and spend another five hours staring at the same question over and over again, clicking buttons on a screen.
Do you think every line of every essay is closely scrutinised? Or do you think some markers just skim read it and decide the mark based on a general impression? I’m not saying that’s what should happen...but what do you think does happen?
Personally I know a lot of students who feel very confident writing essays. Use the right keywords, keep your grammar up to scratch, drop in some phrases you know the examiner is looking for and you can bluff your way to a high grade. In multiple-choice there’s a clear right or wrong answer and you can’t argue the point. An essay gives you room for manoeuvre and interpretation. A ticked box does not.
You might argue that in a multiple-choice question at least you have the correct answer written somewhere in front of you. But if you know the answer to the question, having it as a multiple-choice makes no difference...you’d have written the correct answer anyway. If you don’t know the right answer then you’re still at no advantage. Yes the right answer is written in front of you, but so are four incorrect ones. If you make a guess you’re 80% likely to get it wrong. Does that make multiple-choice sound easier?
The second issue the Ofqual report highlighted was that some of the Biology papers had less emphasis on scientific content and more on softer things like context. That’s true, but it actually makes the questions harder to answer.
Here’s an example. When I worked as an examiner there was a question on an exam I marked which said “explain why graphite is used in pencils.” I saw one student who gave the following answer: “graphite is composed of layers of hexagonally arranged carbon atoms in a 2D lattice. These layers have weak van der Waals interactions between them meaning they will slide off each other, allowing the graphite to be scraped as pencil lead.”
That answer is scientifically perfect. It’s a “hard science” answer. But guess what, that student got zero marks. The mark scheme wanted you to say “graphite is dark and brittle.” And there is the problem.
That’s a soft answer. It’s what a five-year-old child would say...but that doesn’t make the question easier to answer. It actually makes it harder because you’ve got no idea what the examiner wants you to say if they’re not looking for the specific Science.
I actually complained about that question because it was punishing students who had better scientific understanding and favouring those who answered like children. I wrote to the exam board and explained why I thought the mark-scheme should be changed. They ignored me, so I quit. They asked me to mark again for them the following year and I refused.
What the Ofqual report seemed to miss is that asking straightforward science questions is easier for a well-prepared student to answer because they know what’s expected. So I disagree with Ofqual vociferously. Exams are not getting easier unless you’re naïve enough to assume that certain types of question are inherently “easy” rather than acknowledging different students have different strengths and weaknesses.
Today’s GCSE physics students have to memorise 26 equations for their exam, whereas previous years were given a data-sheet to consult. I’m not sure I even know 26 equations off the top of my head. When I need to know an equation I do what every single scientist in the real world does...I look it up.
In English, students are no longer allowed to take their books into the exam to reference certain passages of text. In Chemistry A level, students are expected to know over 40 reaction pathways...most of which won’t get asked about. And the same is true across any subject. Exams are hard regardless of which year you’re looking at. But even doing that is a bit pointless because the grade boundaries are constantly changing.
How Do Grade Boundaries Get Decided?
Because the exam is different every year, grade boundaries change with it. At university, the grade boundaries for your final exams don’t fluctuate, so if you happen to sit your paper during a tougher year, that’s just tough luck. University departments always have internal moderation panels to try and make sure the exam questions are fair, but it’s obviously never perfect.
The idea of moderating grade boundaries is to get around this problem. If the exam is harder, the boundaries are lower so you don’t get everyone failing. If the exam was really easy the grade boundaries are higher so you don’t get everyone passing who doesn’t deserve it.
But we’re faced with the same problem: how do we actually do this moderation? Do we assume the top 10% of students will be the best, so we give them all A* grades no matter how well they did? Then we just go down by 10% for the A grades and so on?
There’s an obvious reason not to do that. It makes the assumption that every year the abilities of students will be in the same proportion. There are going to be fluctuations each year so chopping things off every ten percent doesn’t quite seem fair. And what if there are more students one particular year? That means you get more students with the top grades, but are they comparable to the students who got the top grade the previous year?
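To make the objection concrete, here is what a pure “fixed slices” scheme would look like. The band fractions below are invented for illustration and are emphatically not how any exam board actually sets its boundaries:

```python
# Hypothetical norm-referencing: hand out grades by rank alone, in fixed
# cumulative shares of the cohort, regardless of raw performance.
def norm_reference(scores, bands=(("A*", 0.10), ("A", 0.20),
                                  ("B", 0.40), ("C", 0.70))):
    """bands maps each grade to the cumulative top fraction of the cohort
    receiving at least that grade; anyone outside the last band gets a U."""
    ranked = sorted(scores, reverse=True)
    n = len(ranked)

    def grade_of(score):
        frac = (ranked.index(score) + 1) / n  # ties share the best rank
        for grade, cutoff in bands:
            if frac <= cutoff:
                return grade
        return "U"

    return {s: grade_of(s) for s in set(scores)}
```

The flaw described above falls straight out of this sketch: feed in a weaker (or stronger) cohort and exactly the same 10% still get the top grade, because the scheme only ever looks at rank, never at what anyone actually knows.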
Usually around 5.5 million students are entered for GCSEs but in 2017 it was 3.6 million. A sociologist could spend years analysing this sudden dip in numbers, or we could just recognise that populations go up and down with time. That’s not a trend, that’s random noise.
Either we keep the grade boundaries the same each year and make an effort to keep the exams of comparable difficulty, or we go through all sorts of committee procedures to moderate the grades after the fact. And this is Britain...so it’s the latter option we go for.
The exact process by which grade boundaries are decided isn’t made clear unless you’re one of the senior examiners, but if you’re curious here’s the website of the Exam board OCR explaining how two students who both score 61/80 end up with different grades: http://www.ocr.org.uk/ocr-for/learners-and-parents/getting-your-results/calculating-your-grade/
That sounds potentially ludicrous. Part of the justification is that one of the students attempted more complicated questions. But more complicated according to whom? We’ve already seen that Ofqual considers essay questions harder than multiple-choice with very little justification, so deciding that one question is harder than another can vary from person to person.
Either the two students sat different papers (undermining the whole point of standardised testing) or they attempted trickier questions on the same paper. That’s like saying doing a single 2-mark question is more valuable than two 1-mark questions. Is it? Says who?
It turns out that grade boundaries are down to examiner opinion. If some examiners think a particular question is trickier or easier this can affect how well the student does after they have already sat the exam and there’s nothing the student can do about it.
The key message is that an A grade one year is not necessarily equivalent to an A grade the year after. You might immediately say “yes but the previous year’s paper was harder, so you can’t compare how they did one year with how they would have done the previous year.” Which is absolutely 100% exactly and entirely my point. You can’t compare two years because the exams are different. So there’s no point speculating on which was easier or harder. It’s too subjective.
I’m alright with you saying that a student who gets an A grade has done well, better than a student who gets a D...obviously that’s true. But that kind of broad statement is all we can honestly say. If we try and get more specific, analysing how students have gone up or down, we're extracting more information than is really there.
Likewise we can make general statements about exam difficulty. Calculus is harder than multiplying fractions. Balancing equations is tougher than counting electrons on a diagram, but again, being more specific is uncalled for. Is calculus harder than trigonometry? Is a pH calculation harder than an NMR spectrum analysis? It depends on the student and the examiner. It's too hard to call.
The problem is that when we try to compare exams between the present and the past we're getting too specific. We can't make accurate statements. Otherwise we're looking for patterns which are only there by coincidence.
Comparisons are Deadly
On the government GCSE-results website you can find the following quotation: “It is always difficult to compare in a meaningful way grade boundaries between old and new qualifications”. That’s actually a very fair thing to say. Well done government!
It’s just a shame they undermine their own message on the very same web-page with the phrase “Overall results are stable comparing outcomes last summer with outcomes this summer” (I’ve paraphrased it because the original sentence is three times as long and adds nothing).
Using the word stable seems like a mistake to me. Stability implies that something isn’t going to fall in the future, or hasn’t fallen compared to the past. But if we’ve already agreed we can’t compare present, past and future, what do we mean by saying the grades are “stable”? It’s almost like “stable” was just a positive-sounding buzzword which doesn’t actually convey much meaning.
The thing is, in the UK, exam criteria change roughly every six years. Each school picks a different exam board and, as Ofqual’s own report found, the style of exam differed even three years apart. How can we possibly hope to extract any meaningful data looking thirty years apart?
The Chemistry A level exam at the end of 2014 was fairly reasonable, but the one at the end of 2015 literally made the news because it was so difficult (I saw dozens of students coming out of the exam hall in tears that day). Two exams in the same topic one year apart can be wildly different.
With past papers available and teachers teaching to the test, exam boards have to keep writing tougher questions to make it more of a challenge. It’s an arms race between students’ preparation and an exam board’s desire to actually test them. It gets to the point where if a student writes that a chemical is “blue/green” they get the point, but if they write “turquoise” they don’t (I’ve seen that happen too).
The fact is that it’s not possible to make a meaningful or detailed statement about the quality of exams by simply looking at the grades. If you think exams used to be easier, try teaching a class of students. Or better yet, try sitting your children’s exams yourself and see how well you do.
Personally, I think things are Harder...but not because of the exams
As someone who sat A-level exams just over ten years ago, I’ve seen a decade’s worth of exam material and it looks about the same. Some bits were harder, some bits were easier.
I mean that’s just my personal opinion...but the exam boards are using that approach to measure grade boundaries, so I don’t see a problem. From what I can tell, the quality of questions is “stable”. There are fluctuations year on year but the exams today’s students are sitting are no harder, nor easier than the exams their parents sat.
However, there is something else which I think has to be factored in which you can’t measure or quantify. This makes it rather hard to write about it in a Science blog, so I’ll make it clear: at this point I’m going into anecdote and speculation. What I have noticed is that students today are under more pressure than their parents were. A lot more.
I have seen students vomiting in exams from stress. I’ve seen them pass out. I’ve seen scores of students having intense anxiety attacks and I’ve even seen one or two wetting themselves. No, it isn’t pretty. Horrible to read about, right? Imagine being a teacher who cares about these kids. Or imagine being the student yourself.
Imagine you’ve been studying something for three years (in the case of GCSEs) or two years (in the case of A-levels) and now you have to prove yourself in the space of two hours and it’s your ONLY chance. Imagine knowing you’re in competition with 5 million other students and the grade boundaries are in free-fall based on the whims of the examiners. Imagine having to study 10 subjects (only 3 of which you actually chose to do). And imagine being told that your entire future depends on them.
Students are given benchmark grades in year 7. They’re given mock exams in year 10 and then twice in year 11. There are catch-up sessions, warm-up sessions, workshops, after-school extra lessons and students are constantly tested (every three weeks roughly). Not only this, but they are repeatedly warned about the risks of doing badly in exams and how their life will be over if they don’t get the right grades.
Imagine being in a frightening, results-driven environment which is compulsory, you don’t get paid for it and you’re judged as a person based on a few hours worth of work. School in the UK is stressful for kids. Give them a break.
Yes, of course exams should involve stress. I remember working myself silly when I studied for my A-levels, but it was nothing like what I’m seeing today. I don’t really know what the cause is (I have a few guesses but this blog is already too long) but something isn’t right with this picture. When you have dozens of students crying their eyes out before sitting a mock exam...something has gone wrong somewhere.