Done and Dusted
Thursday saw the release of GCSE exam results, marking the end of UK exam season. We now have a single week of breathing-space before it all starts again with the new cohorts in September. Bring it on.
Results days are some of the most emotionally charged days in the academic calendar, but they always come with commentary from politicians and pundits about the state of the nation’s education and, usually, the need for reform. It’s only a matter of time before someone says those mortifying words: "Exams are getting too easy, they were tougher in my day!" I’ve heard politicians say it, people on buses, parents of students and so on. Everyone seems to think their exams were the most difficult ever set, but is that fair?
It seems like an insult to the hard-working students who have bled themselves dry in order to do well, but I guess it makes you feel special if you truly believe your life has been a tougher struggle than anyone else’s.
But how are today’s exams different to those of the past? As someone on the front line of modern education (well, it’s really the students who are on the front line, I’m more like the drill sergeant who trains them and sends them off to war) I thought I’d share my thoughts.
How Grades Work – UK vs USA
Education is a tricky thing to get right and I don’t think any country has it figured out (although I’d take a glance in Canada and Scandinavia’s direction). Most of the web traffic I get comes from the UK and the US, so let’s look at how these two systems address the problem of getting a population educated.
In the USA, school grades aren’t standardised nationally. Every pupil attends classes and their teacher is responsible for their overall grade. How that grade is reached varies between schools, subjects and teachers themselves. Typically between 40 – 50% of the grade is based on a final exam, written and marked by the school faculty, while the remaining 50 – 60% is based on things like coursework, class participation, attendance and behaviour.
At the end of the year, the teacher adds up your scores from these different streams, and the grade boundaries are pretty straightforward. Score 90% or more and you get an A; 80% gets you a B, 70% a C and 60% a D. Anything below that and you get an F - a “Fail”. You can re-take the year, however, so if you don’t get good grades you get another shot. And that’s the end of that.
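The arithmetic described above fits in a few lines of Python. This is a minimal sketch, not any real school's policy: the 45% exam weighting (picked from inside the 40 – 50% range quoted) and the function names are hypothetical, and many schools also use plus/minus grades.

```python
def weighted_score(exam_pct: float, other_pct: float, exam_weight: float = 0.45) -> float:
    """Combine the final-exam percentage with everything else (coursework,
    participation, attendance, behaviour) using a hypothetical weighting."""
    return exam_weight * exam_pct + (1 - exam_weight) * other_pct


def letter_grade(score: float) -> str:
    """Map a percentage to a letter using the 90/80/70/60 cut-offs."""
    for cutoff, letter in ((90, "A"), (80, "B"), (70, "C"), (60, "D")):
        if score >= cutoff:
            return letter
    return "F"


# A student who scores 72% in the final exam but 95% on everything else.
score = weighted_score(72, 95)
print(round(score, 2), letter_grade(score))
```

In that example the student ends up on 84.65%, which is a B.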
In the UK, exams are written by privately run exam boards. Exam boards make money in two ways: entry fees (schools pay to enter a student for an exam) and things like textbooks and online resources. The second means an exam board tends to make more money if it changes the content of the course every few years, since schools have to buy new books and equipment to keep up.
The exams are sat at the same time up and down the country, and the scripts are then collated and distributed to markers (teachers earning a smidge of extra cash). That’s why there are several months between the exams being sat in May/June and results day in August. Typically 90 – 100% of your score is determined by the final exam, with things like coursework, homework and behaviour being irrelevant.
At GCSE level (age 16) the grades go from A* - G, with a “U” grade being a fail. Except we’re switching to a numerical system, already used for some subjects this year and rolling out to more from next year, where the grades go from 9 – 1 (9 being the highest). Then at A-level (age 18) the grades go from A* - E, with U being a fail.
The grade boundaries are moderated every year by a team of exam officers (slightly different for each board), so the score required to achieve a particular grade changes from year to year. Re-sitting is a complicated and expensive option, so once you’ve done your exams that’s pretty much it unless you can afford the re-sit fees.
There are clear strengths and weaknesses with both systems. The UK model is obviously intended to be standardised so that an A from one school means the same as an A from another (although the fact that there are five different exam boards sort of undermines that).
It also prevents manipulation: a teacher can’t mark down a student they don’t like, or give extra credit to a student who’s a star of the football team because the local community wants to see them go to college.
The US system has the clear advantage that the student has a chance to demonstrate skill over a long period of time, rather than being scrutinised on three years’ worth of work in a single exam. I’ve known students who suffered a personal tragedy a few days before their exam and, understandably, didn’t do their best. In the American system I’d be able to give them the grade I felt they deserved, but in the UK if you’re ill on the day – too bad. Until we learn how to digitally upload information to the human brain, it's unlikely anyone will solve the problem.
Lies, Damned Lies and Statistics
Let me demonstrate something which I think is important. I wanted to look at the figures surrounding GCSE and A-level grades but it turns out getting hold of these numbers is surprisingly difficult. The UK government website doesn’t offer any publicly available information so you really have to go hunting to find what you want.
I am particularly grateful to Brian Stubbs from the University of Bath, whom I contacted while writing this blog. If you’re interested, I strongly encourage you to check out his website, http://www.bstubbs.co.uk, where he has gathered decades of historical data on UK exams. Now, what does the data show?
Well, in 1989 approximately 77,700 A grades were awarded to A-level students in the UK. This year around 150,000 were awarded. So exams are twice as easy because the number of A grade students has doubled?
Let’s take another look. In 1989 an “A” grade was the highest grade you could get, but in 2017 it’s the second highest. The highest grade in 2017 is an A*...of which only 69,000 were issued. In other words, the number of top grades awarded went down significantly, so exams are obviously getting much, much harder, right?
Not necessarily. In 1989 only around 600,000 students nationally even attempted A-levels, whereas this year it was around 830,000. So if we express top grades as a percentage of students, the share receiving the top grade has gone from roughly 13% to 8%, so exams have gotten harder, but only by a small amount.
Now let’s look at GCSE grades. In 1988, 12.8% of students were getting the second-highest grade. In 2007 that number was 13.1%. So the difficulty level of exams has barely changed - roughly the same proportion of students are getting the same kind of grades.
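Here is a quick check of that percentage arithmetic as a tiny Python sketch. It uses only the rounded figures quoted above (77,700 top grades out of about 600,000 students in 1989; 69,000 out of about 830,000 in 2017), so treat the result as approximate: with these inputs the top-grade share comes to roughly 13% in 1989 and 8% in 2017. The function name is just for illustration.

```python
def top_grade_share(top_grades: int, students: int) -> float:
    """Percentage of students who received the top grade."""
    return 100 * top_grades / students


# Rounded figures quoted in the text.
share_1989 = top_grade_share(77_700, 600_000)  # A was the top A-level grade
share_2017 = top_grade_share(69_000, 830_000)  # A* is the top A-level grade

print(f"1989: {share_1989:.0f}%  2017: {share_2017:.0f}%")
```

Raw counts alone would suggest the top grade has become rarer by about a tenth; dividing by the size of the cohort is what turns that into a comparable share.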
But a really interesting pattern emerges if we look at the years a new “top grade” is introduced. In 2011, 7.8% of GCSE students achieved a grade of A*. Compare that with 1994, the year the A* grade was introduced – that year only 2.8% of students got it. So the exams are getting easier?
Well, no: this year they introduced the grade 9 and only 3% of students got it. So if you compare like with like, i.e. 2017 with 1994, then around 3% of students achieved the new top grade in each case, so there has been virtually no change. Exams are staying about the same.
This year there has been a 0.4% drop in grade 9s/8s compared to last year’s A* grades for English GCSE. That’s the headline most newspapers are worrying over. Except what’s not being mentioned is that this grade-dip is for English language and literature combined. If you look at English literature (all students sit two English GCSEs) we’ve actually gone up by over 2%.
The point I’m making should be obvious. Depending on which years I pick and which grades I choose to look at, I can spin any story I want. If I were in government I might want to make it look like grades were going up under my party. Or perhaps I might want to make it look like grades went down under the opposition. If I were the head of an exam board I might want to make it look like grades are staying level and that everything is nice and fair. We have to be very careful what we’re looking at.
The statistics are complicated. However, it is reasonably accurate to say there has been a slight increase in the percentage of “top grades” being awarded over the past twenty years. Grade inflation is a real thing, albeit a very subtle one. But that doesn’t necessarily mean exams are getting easier.
If exams were getting easier then we wouldn’t see sudden dips when a new grading system is introduced like we did this year and in 1994. Actually, the most sensible conclusion to draw would be that grades increase as a function of familiarity. Change how familiar the exam is and you’ll see a dip in grades. What you might really be seeing in those figures is that people do better each year, provided it’s the same style of exam.
Teachers get used to the types of questions, pupils have access to more past-year papers, examiners have more trustworthy mark schemes, exam-writers have done it before so they can give more training to teachers on what to expect etc.
Actually, a very steady increase in grades is precisely what you would expect if the exams were staying more or less the same. The grade inflation we see is small, which suggests it’s more about adaptation than about exams getting easier.
Today, partly thanks to the fact that schools are shifting to online data storage, we can keep past papers from previous years and give them to our students. In fact, at my school I have made video recordings of myself answering previous Physics papers. Students can log on to the school network and watch me as I answer a question, describing my method as I go. This is very specific coaching which gives them a slight edge. And that’s a good thing.
The downside is that we spend a lot of time “teaching to the test” rather than teaching a subject for the fun of it and we put waaaaay too much emphasis on answering exam-questions. It has to be said that I have been able to train some students to jump through hoops and over obstacles and squeeze them over the boundary of an A grade, when they don’t really understand the Science any better than a student who gets a B.
Maybe I’m actually causing problems for them further down the line by doing that. I have occasionally coached a student to get an A grade, and they’ve gone to University only to find they don’t really understand the subject as well as they thought and have dropped out. Perhaps I should just let students do a bit worse and not train them in the art of the exam? Hmmm that's a tricky one.
Ultimately, once teachers get to know how an exam system works they can train the students to do better at it so we see an increase in grades. The problem is that this puts teachers in a difficult position. The government tells schools to raise their standards. If the grades don’t go up then we’ve failed to do it. If the grades do go up then it’s because the exams are easier. It’s a no-win scenario which is not something anyone wants to face.
Besides, I’m not sure “more A grades” necessarily equates to a higher standard of education. At the moment more A grades just means more students better trained to pass exams. Is there a risk that some of the A-grade students aren’t really comparable to A-grade students of yesteryear because they’ve been coached to pass an exam rather than having a deep understanding of the subject?
What Are Exams Like Now?
A report commissioned by Ofqual (The Office of Qualifications and Examinations Regulation) in 2012 really bugged me. It decided that looking at grade boundaries wasn’t a good way of deciding if exams were getting easier. In order to solve the problem they did a detailed analysis of exam papers from 2005 and compared them with exam papers from 2008...in two subjects (Biology and Geography). That would be like looking at the weather in two cities a week apart and drawing conclusions about climate change. That’s far too narrow a data set.
The report then claimed that yes, exams were getting easier. Most of this conclusion came from two factors. Let’s look at the first one.
Ofqual noted that older exams had more essay questions while modern exams had more multiple-choice questions, and concluded that modern exams are therefore easier. The assumption seems to be that essays are hard and multiple-choice is easy. Let’s break that nonsense down.
Working as an exam-marker isn't a fulfilling job. You get paid for every exam script you mark (not very much) so the aim is obviously to get as many done as quickly as possible. After a 10-hour day in school you go home, log on and spend another five hours staring at the same question over and over again, clicking buttons on a screen.
Do you think every line of every essay is closely scrutinised? Or do you think some markers just skim read and decide the mark based on a general impression? I’m not saying that’s what should happen...but what do you think does happen?
Personally I know a lot of students who feel confident writing essays. Use the right keywords, keep your grammar up to scratch, drop in some phrases you know the examiner is looking for and you can bluff your way to a high grade. In multiple-choice there’s a clear right or wrong answer and you can’t argue the point. An essay gives you room for manoeuvre and interpretation. A tick-box does not.
You might argue that in a multiple-choice question, at least you have the correct answer written somewhere in front of you. But if you know the answer to the question, having it as a multiple-choice makes no difference...you’d have written the correct answer anyway. If you don’t know the right answer then you’re at no advantage. Yes the right answer is written in front of you, but so are four incorrect ones. If you make a guess you’re 80% likely to get it wrong. Does that make multiple-choice sound easier?
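The guessing argument is easy to put numbers on. The sketch below assumes, as above, five options with exactly one correct answer, plus a hypothetical 10-question multiple-choice section answered entirely by blind guessing; the function names are made up for illustration.

```python
from fractions import Fraction
from math import comb


def p_guess_wrong(n_options: int) -> Fraction:
    """Chance that a blind guess picks one of the incorrect options."""
    return Fraction(n_options - 1, n_options)


def p_at_least(k: int, n_questions: int, n_options: int) -> Fraction:
    """Binomial probability of getting at least k of n_questions right
    by guessing blindly on every one."""
    p = Fraction(1, n_options)
    return sum(
        comb(n_questions, i) * p**i * (1 - p) ** (n_questions - i)
        for i in range(k, n_questions + 1)
    )


print(p_guess_wrong(5))             # one right answer, four wrong ones
print(float(p_at_least(5, 10, 5)))  # at least half of 10 questions by luck
```

With five options a blind guess is wrong 4 times in 5, and the chance of scraping at least 5 out of 10 by pure luck is only about 3.3%.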
The second issue the Ofqual report highlighted was that some of the Biology papers put less emphasis on scientific content and more on softer things like context. That’s true, but it actually makes the questions harder to answer.
Here’s an example. When I worked as an examiner there was a question on an exam I marked which said “explain why graphite is used in pencils.” I saw one student who gave the following answer: “graphite is composed of layers of hexagonally arranged carbon atoms in a 2D lattice. These layers have weak van der Waals interactions between them meaning they will slide off each other, allowing the graphite to be scraped as pencil lead.”
That answer is scientifically perfect. It’s a “hard science” answer. But guess what, that student got zero marks. The mark scheme wanted you to say “graphite is dark and brittle.” That’s a soft answer. It’s what a five-year-old child would say...and that doesn’t make the question easier to answer. It makes it harder because you’ve got no idea what the examiner wants you to say if they’re not looking for the Science.
I actually complained about that question because it was punishing students who had better scientific understanding and favouring those who answered like children. I wrote to the exam board and explained why I thought the mark-scheme should be changed. They ignored me, so I quit.
What the Ofqual report seemed to miss is that straightforward science questions are easier for a well-prepared student to answer, because they know what’s expected. So I disagree with Ofqual vociferously. Exams are not getting easier unless you’re naïve enough to assume that certain types of question are inherently “easy” rather than acknowledging that different students have different strengths and weaknesses.
Today’s GCSE physics students have to memorise 23 equations for their exam, whereas previous cohorts were given a data sheet to consult. I’m not sure I even know 23 equations off the top of my head. When I need to know an equation I do what every single scientist in the real world does...I look it up.
In English, students are no longer allowed to take their books into the exam to reference certain passages of text. In Chemistry A level, students are expected to know over 40 reaction pathways, most of which won’t get asked about. And the same is true across every subject. Exams are hard regardless of which year you’re looking at. But even comparing exam papers from different years is a bit pointless, because the grade boundaries are constantly changing.
How do Grade Boundaries get Decided?
Because the exam is different every year, grade boundaries change with it. At University the grade boundaries for your final exams don’t fluctuate so if you happen to sit your paper during a tougher year, then that’s just tougher luck. University departments always have internal moderation panels to try and make sure the exam questions are fair, but it’s never perfect obviously.
The idea of moderating grade boundaries is to get around this problem. If the exam is harder, the boundaries are lower so you don’t get everyone failing. If the exam is really easy, the boundaries are higher so students don’t pass without deserving it.
But we’re faced with the same problem: how do we actually do this moderation? Do we assume the top 10% of students will be the best, so we give them all A* grades no matter how well they did? Then we just go down by 10% for the A grades and so on?
There’s an obvious reason not to do that. It assumes that every year student ability will be distributed in the same proportions. There are going to be fluctuations each year, so chopping things off every ten percent doesn’t seem fair. And what if the number of students changes from one year to the next? Usually around 5.5 million students are entered for GCSEs but in 2017 it was 3.6 million. Under a fixed-percentage system, the number of students with top grades rises and falls with the size of the cohort; are those students comparable to the ones who got top grades the previous year?
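To make the "top 10% get an A*" idea concrete, here is a toy Python sketch of pure norm referencing. It is not how OCR or any other exam board actually sets boundaries; the 10% bands, the grade list and the two cohorts of scores are all hypothetical.

```python
GRADES = ("A*", "A", "B", "C", "D", "E", "F", "G", "U")  # GCSE letters, A* to U


def grade_by_rank(scores: list[int], band_pct: int = 10) -> list[str]:
    """Pure norm referencing: rank students by raw score and give the top
    band_pct percent an A*, the next band_pct percent an A, and so on.
    Anyone beyond the last band gets the lowest grade. Ties keep input order
    because sorted() is stable."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    grades = [""] * n
    for rank, student in enumerate(order):
        band = rank * 100 // (n * band_pct)  # integer maths avoids rounding errors
        grades[student] = GRADES[min(band, len(GRADES) - 1)]
    return grades


strong_cohort = [90, 85, 80, 75, 70, 65, 60, 55, 50, 45]
weak_cohort = [70, 40, 30, 20, 10, 5, 4, 3, 2, 1]
print(grade_by_rank(strong_cohort)[4])  # the student who scored 70
print(grade_by_rank(weak_cohort)[0])    # also scored 70
```

The student who scores 70 gets a D in the stronger cohort, while an identical 70 earns an A* in the weaker one: under this scheme the grade reflects who else sat the exam as much as how well you did.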
Either we keep grade boundaries the same each year and make an effort to keep exams of comparable difficulty, or we go through all sorts of committee procedures to moderate the grades after the fact. And this is Britain...so it’s the latter we go for.
The exact process by which grade boundaries are decided isn’t made clear unless you’re one of the senior examiners, but if you’re curious, here’s the website of the exam board OCR explaining how two students who both score 61/80 end up with different grades: http://www.ocr.org.uk/ocr-for/learners-and-parents/getting-your-results/calculating-your-grade/
That sounds potentially ludicrous. Part of the justification is that one of the students attempted more complicated questions. But more complicated according to whom? We’ve already seen that Ofqual considers essay questions harder than multiple-choice with very little justification, so judgements about which question is harder can vary from person to person.
Either the two students sat different papers (undermining the whole point of standardised testing) or one of them attempted trickier questions on the same paper. That’s like saying a single 2-mark question is more valuable than two 1-mark questions. Is it? Says who?
It turns out that grade boundaries are down to examiner opinion. If some examiners think a particular question is trickier or easier this can affect how well the student does after they have already sat the exam and there’s nothing the student can do about it.
The key message is that an A grade one year is not necessarily equivalent to an A grade the year after. You might immediately say “yes but the previous year’s paper was harder, so you can’t compare how they did one year with how they would have done the previous year.” Which is entirely my point. You can’t compare two years because the exams are different. So there’s no point speculating on which was easier or harder.
I’m alright with you saying that a student who gets an A grade has done well, better than a student who gets a D...obviously that’s true. But that kind of broad statement is all we can honestly say. If we try and get more specific, analysing how students have gone up or down, we're extracting more information than is really there.
Comparisons are Deadly
On the government GCSE-results website you can find the following quotation: “It is always difficult to compare in a meaningful way grade boundaries between old and new qualifications”. That’s a very fair thing to say. Well done government!
It’s just a shame they undermine their own message on the very same web-page with the phrase “Overall results are stable comparing outcomes last summer with outcomes this summer” (I’ve paraphrased it because the original sentence is three times as long and adds nothing).
Using the word stable seems like a mistake to me. Stability implies something isn’t going to fall in the future, or hasn’t fallen compared to the past. But if we’ve already agreed we can’t compare present, past and future, what do we mean by saying the grades are “stable”? It’s almost like “stable” was just a positive-sounding buzzword which doesn’t actually convey much meaning. Hmmm.
In the UK, exam criteria change about every six years. Each school picks its own exam board and, as Ofqual’s own report found, there was a different style of exam even three years apart. How can we possibly hope to extract meaningful data looking thirty years apart?
The Chemistry A level exam at the end of 2014 was fairly reasonable, but the one at the end of 2015 literally made the news because it was so difficult (I saw dozens of students coming out of the exam hall in tears that day). Two exams in the same topic one year apart can be wildly different.
With past papers available and teachers teaching to the test, exam boards have to keep writing tougher questions to make it more of a challenge. It’s an arms race between students’ preparation and an exam board’s desire to actually test them. It gets to the point where if a student writes that a chemical is “blue/green” they get the mark, but if they write “turquoise” they don’t (I’ve seen that happen too).
The fact is that it’s not possible to talk about the quality of exams by looking at the grades. If you think exams used to be easier, try teaching a class of students. Or better yet, try sitting your children’s exams yourself and see how well you do. Here's a maths question from an Edexcel GCSE paper a few years ago. Remember this is testing "General" maths education for 16-year-olds.
Personally, I think things are Harder...but not because of the exams
As someone who sat A-level exams just over ten years ago, I’ve seen a decade’s worth of exam material and it looks about the same. Some bits were harder, some bits were easier.
I mean, that’s just my personal opinion...but the exam boards use that approach to set grade boundaries, so I don’t see a problem. From what I can tell, the quality of questions is “stable”. There are fluctuations year on year, but the exams today’s students are sitting are no harder and no easier than the exams their parents sat.
However, there is something else which I think has to be factored in, and which you can’t quantify. This makes it rather hard to write about in a Science blog, so I’ll make it clear: at this point I’m going into anecdote and speculation. What I have noticed is that students today are under more pressure than their parents were. A lot more.
I have seen students vomiting in exams from stress. I’ve seen them pass out. I’ve seen scores of students having intense anxiety attacks and I’ve even seen one or two wetting themselves. Yes, this isn’t pretty. Horrible to read about, right? Imagine you’re a teacher who cares about these kids. Or imagine you’re the actual student themselves.
Imagine you’ve been studying something for three years (in the case of GCSEs) or two years (in the case of A-levels) and now you have to prove yourself in the space of two hours and it’s your ONLY chance. Imagine knowing you’re in competition with 5 million other students and the grade boundaries are in free-fall based on the whims of the examiners. Imagine having to study 10 subjects (only 3 of which you actually chose to do). And imagine being told your entire future depends on them.
Students are given benchmark grades in year 7. They’re given mock exams in year 10 and then twice in year 11. There are catch-up sessions, warm-up sessions, workshops, after-school extra lessons and students are constantly tested (every three weeks roughly). Not only this, but they are repeatedly warned about the risks of doing badly in exams and how their life will be over if they don’t get the grades.
Imagine being in a frightening, results-driven environment which is compulsory, where you don’t get paid and you’re judged as a person based on a few hours’ worth of work. School in the UK is stressful for kids. Give them a break.
Yes, of course exams should involve stress. I remember working myself silly when I studied for my A-levels, but it was nothing like what I’m seeing today. I don’t really know what the cause is (I have a few guesses but this blog is already too long) but something isn’t right with this picture. When you have dozens of students crying their eyes out before sitting a mock exam, something has gone wrong somewhere.
Better Late Than Inaccurate
I don’t often write about current affairs in Science for two reasons. The first is that when a "news-worthy" Science story breaks, it gets splashed everywhere in the media so there’s no need for me to report it too. The other reason is that I like to take my time with things. When you hear a Scientific claim, the best thing to do is check it carefully, do some research, find original sources, learn the background etc. Unfortunately, the media machine moves very fast so by the time I know what’s actually happened I’m usually behind the curve.
And to be honest I like it that way. I’d much rather be cautious when I hear a news story than comment within the hour. Particularly if it’s complicated. So, despite many people trying to persuade me to write more up-to-date stuff, I’m going to be stubborn. Personally I value accuracy over expediency.
One of the exceptions to these rules is when the "hype" over a story has gotten out of proportion, or when people are misunderstanding what actually happened. In that case, I do feel more of an urge to put my thoughts out there, and this story of Artificial Intelligence (AI) gone haywire is a prime example.
You might have come across it a few days ago (31st July was when the story broke). I ran across it on scaremongering Instagram feeds and ignored it, perhaps foolishly. When it kept coming back I decided to look into it and see what the truth was. It’s taken me a few days to get to grips with it, as I’m not an expert on AI technology, but I’m pretty confident I can report with reasonable insight.
What Got Reported/Is Being Reported
According to the headlines, Facebook was doing research on AIs and successfully created two robots which possessed the ability to communicate with each other. The robots struck up a conversation, but very quickly decided to abandon English and invented their own language which the programmers could no longer decrypt.
The robots conversed in their secret “robot-ese” with increasing speed, hiding their conversation from us, learning as they went. Panicked and frightened, the Facebook programmers immediately shut down the software before it got too smart. This is apparently the first instance of computers creating their own secret code system and attempting to outwit their human creators.
What Actually Happened...
Facebook, like many other companies that develop computer software, spends a lot of time researching chatbots. Chatbots are programs designed to mimic human speech, useful for all sorts of things like voice recognition software or operating systems that talk back.
The way they work is by picking up on certain words, applying the basic rules of grammar and syntax, interpreting the message and outputting a logical response. There’s a debate around whether this constitutes “speaking” a language, but a lot of chatbot software can be quite sophisticated.
And chatbots aren't anything new. In fact, there’s an annual competition called the Loebner Prize, running since 1991, in which chatbots compete to try and convince a panel of judges they are human. These tests (where someone is talking to a screen and isn’t sure if it’s a person or a robot) are called “Turing tests”, and there are lots of chatbots with reasonable success rates at passing them. Specific and detailed conversations are still beyond them, but simple chats about the weather etc. can be simulated easily.
One of the things chatbot programmers particularly like to do to road-test them is to put two chatbots into conversation with/against each other. Depending on your perspective this is either ingenious or hilarious. The result is that the two chatbots communicate and try to understand each other’s usage of a language.
Obviously, when two chatbots talk they can end up exchanging complete gibberish, because they don’t really understand English (that’s the whole point of the research: to see how close a simulation can get). And that’s what these two chatbots ended up doing; the only difference was that their gibberish had a vague structure to it. The language they were using was still English, just a slightly distorted version which made little sense to any human reader.
If you’re curious what their conversation looked like, here’s a short section of it. In the spirit of the AI takeover, you should probably listen to the theme music from Terminator 2: Judgment Day while you read the extract:
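To make the "pick up on certain words and output a response" idea and the bot-versus-bot test concrete, here is a deliberately crude, hypothetical keyword-matching bot in Python, wired up to talk to a copy of itself. It skips grammar entirely and is far simpler than the research systems in this story; the rules and replies are invented purely for illustration.

```python
import re

# Hypothetical keyword -> reply rules. Real chatbots are far more elaborate.
RULES = {
    "weather": "I hear it might rain later.",
    "rain": "Better take an umbrella, then.",
    "umbrella": "Mine is broken. Do you like the weather today?",
}
FALLBACK = "Tell me more."


def reply(message: str) -> str:
    """Return the reply for the first known keyword found in the message."""
    for word in re.findall(r"[a-z]+", message.lower()):
        if word in RULES:
            return RULES[word]
    return FALLBACK


def converse(opening: str, turns: int) -> list[str]:
    """Pit two copies of the bot against each other: each reply becomes
    the other bot's next input."""
    transcript = [opening]
    for _ in range(turns):
        transcript.append(reply(transcript[-1]))
    return transcript


for line in converse("How's the weather?", 6):
    print(line)
```

After three exchanges this toy pair starts going round in circles, repeating the same lines forever. That is only a loose analogy for the repetitive output below; the real bots got there by a very different route.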
Extract from the Chatbot Conversation:
Bob: you i everything else . . . . . . . . . . . . . .
Alice: balls have a ball to me to me to me to me to me to me to me
Bob: i i can i i i everything else . . . . . . . . . . . . . .
Alice: balls have a ball to me to me to me to me to me to me to me
Bob: i . . . . . . . . . . . . . . . . . . .
Alice: balls have zero to me to me to me to me to me to me to me to me to
Bob: you i i i i i everything else . . . . . . . . . . . . . .
Alice: balls have 0 to me to me to me to me to me to me to me to me to
Bob: you i i i everything else . . . . . . . . . . . . . .
Alice: balls have zero to me to me to me to me to me to me to me to me to
Quaking in your boots right?
Not so much "The Terminator" as "Johnny 5 having a Tourette’s outburst". The chatbots started producing conversations like that and, as a result, the programmers switched them off. Not because they were being intelligent, but because they were being stupid. And here's the thing: this happens all the time. Chatbots often start talking in garbled forms - the only reason this made the headlines was because it was Facebook doing it, and it's front-page news when Mark Zuckerberg blows his nose.
The malfunction occurred because, when the programmers wrote the chatbots, they forgot to specify that the language had to stick to certain grammatical rules. If we agree that a language has to have certain properties, e.g. finite words, infinite sentences, recursivity and generativity, then there are 24 grammatical possibilities a language can take (linguistic logicians like Frederick Newmeyer have actually worked this out). Of the roughly 6,000 languages on Earth, only 15 of the possible grammar structures are actually used, with most languages sticking to one of 4.
In other words, almost every human language on Earth conforms to one of 4 types, but there are 20 largely unused ones out there. So it’s not surprising that a mathematically minded chatbot might settle on one of the others, which might even be more efficient than ours. Really, it’s no wonder that a computer would butcher our language...our language is a mess to begin with.
So technically, the chatbots did start using their own language, but it wasn’t an invented one. It was just one of the other possible ones, and they were still using English words. There was nothing sinister going on; the plug was pulled because the bots were failing, badly, to simulate our language. It wasn't so much a case of "Oh God the robots are sentient, quick pull the plug!" as "Oh damn it...Steve, the stupid bots are talking like idiots again, can you hit the stop button, I can't reach it over my coffee!"
So no, we don't have anything to fear from Facebook AIs getting too smart. The worst we could say is that a robot who spoke like Alice or Bob would be extremely irritating.
This blog was brought to you by Skynet.
That's a Good Question
It's something I get asked by students of all ages. The Universe is expanding, but the Universe is (by definition) everything...so what the devil is it expanding into? Admittedly, most of my students don't phrase it like that because they're not 19th-century businessmen, but you get the idea.
I usually do my best to give an answer on the fly but it's a surprisingly tricky thing to deal with because there are lots of misconceptions and variables we have to take into account. I'm afraid the answer isn't something simple like "your mum's face". It gets very strange, very fast.
The most straightforward response to the question is technically "we don't know" but NOT because we have no explanation - actually we have three - we just don't know which one is correct. So I decided it was time to do justice to the question and go through what we do and don't know. This topic is a bit of a head-pickler though, so if you find yourself getting confused don't worry, that means you probably understand it. It's the people who claim they aren't confused you need to worry about. So, let's get down to business...
How do we know it's expanding?
If you look at the stars, everything seems simple. They follow predictable patterns and, the occasional comet or meteor aside, nothing seems to be moving around very much. For the longest time we assumed our Universe was completely static, but in 1912 we discovered something very unusual.
Imagine asking someone to do an impression of a car going past on a motorway. Pretty much everyone will make the same noise: Niieeeeeaaaaaoowwwww! It's hard to write it but you can imagine the sound I'm trying to describe. It starts off as a high pitched whine and then gets lower as it shoots past you. You may have also noticed the same effect when an ambulance goes past your house. The blaring of the siren seems to gradually droop as it moves away from you. This phenomenon is called Doppler shift and the diagram below shows where it comes from.
The sound waves are depicted as ripples. If you imagine standing in front of the car as it approaches, the sound waves are being squashed, since the car is moving toward its own wavefront. The result is that your eardrums pick up lots of compressions per second, aka a high frequency sound. By contrast, if you are standing behind the car the pulses are stretched out, because the car is moving away from you, and your ear will detect a low frequency sound. High frequency sounds are what our brains perceive as higher pitch, while lower frequency sounds correspond to the lower notes. This is why the car's sound appears to go from high to low as it shoots past you: the waves go from compressed to rarefied, creating a pitch differential. But it's not just sound waves that do this; any type of ripple exhibits the Doppler phenomenon.
As you probably remember from high-school physics, beams of light exhibit a wave-like property. The nature of light is complicated but we can think of it as a ripple in an invisible field. This means a beam of light can appear stretched if its source is moving away from you, and compressed if its source is moving toward you.
If I were to throw a torch at your head, the beam of light would be slightly compressed before it hits you. And if you threw it back to me, the beam would be stretched as it moves away. This means beams of light can appear higher or lower in frequency. Except instead of giving a different note, it gives a different colour. A high-pitched beam of light is what we think of as blue/violet, while a low-pitched beam of light is what we perceive as red/orange.
Although it sounds hard to believe, an object moving toward you appears slightly blue and an object moving away appears slightly red. The effect is imperceptible, however, mainly because everyday objects move extremely slowly compared with light, and partly because your eye isn't sensitive enough to pick up on it. But it is there, and you can detect it with the right equipment.
It was in 1912 that a man whose name (amazingly) was Vesto Slipher discovered that light from other galaxies was redshifted. If you want to go into detail, then technically what he discovered was that the Fraunhofer lines were redshifted (feel free to look that up), but it amounts to the same thing: distant galaxies give off light which is being stretched.
By 1917, the astronomer Edward James Keeler had made a careful measurement of all the known galaxies and discovered that on average everything was redshifted and therefore moving away from us. There are a couple of exceptions, e.g. the Andromeda galaxy is blueshifted, but the average picture is clear. Everything in the Universe is moving away, which means the whole thing is expanding. Here are some photographs of Slipher and Keeler - they don't help with the explanation but I had to include them for the look on Slipher's face.
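If you fancy putting numbers on the car example, the textbook Doppler formula for a moving source and a stationary listener does the job. Here's a quick Python sketch; the 440 Hz engine note, the 30 m/s speed and the 343 m/s speed of sound are all assumed, illustrative values.

```python
SPEED_OF_SOUND = 343.0  # m/s in air at about 20 °C (assumed value)


def observed_frequency(source_hz: float, source_speed: float, approaching: bool) -> float:
    """Frequency heard by a stationary listener from a moving source.

    Approaching: wavefronts bunch up, so divide by (v - v_source).
    Receding:    wavefronts spread out, so divide by (v + v_source).
    """
    if approaching:
        return source_hz * SPEED_OF_SOUND / (SPEED_OF_SOUND - source_speed)
    return source_hz * SPEED_OF_SOUND / (SPEED_OF_SOUND + source_speed)


# A hypothetical 440 Hz engine note passing at 30 m/s (about 67 mph).
before = observed_frequency(440.0, 30.0, approaching=True)
after = observed_frequency(440.0, 30.0, approaching=False)
print(f"approaching: {before:.0f} Hz, receding: {after:.0f} Hz")  # 482 Hz, 405 Hz
```

The note drops from roughly 482 Hz to roughly 405 Hz as the car goes past.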
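To get a feel for just how small the colour shift is, here's a rough calculation using the relativistic Doppler formula for light. The 10 m/s throwing speed is an assumption; the point is the size of the answer.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # m/s


def stretch_factor(speed: float) -> float:
    """Observed / emitted wavelength for a source moving straight away
    (positive speed) or straight toward you (negative speed), using the
    relativistic longitudinal Doppler formula."""
    beta = speed / SPEED_OF_LIGHT
    return math.sqrt((1 + beta) / (1 - beta))


# An assumed throwing speed of 10 m/s.
torch = stretch_factor(10.0) - 1
print(f"wavelength stretched by {torch:.1e}")  # about 3.3e-08
```

A stretch of a few parts in a hundred million is far below anything your eye could notice, which is why you need lab equipment (or galaxies moving at enormous speeds) to see it.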
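For a single spectral line, the amount of stretching is usually written as the redshift z. The sketch below uses hydrogen-alpha (Fraunhofer's C line, at about 656.28 nm); the observed wavelength is a made-up value for illustration, and the v ≈ cz shortcut only holds for speeds well below the speed of light.

```python
SPEED_OF_LIGHT_KM_S = 299_792.458


def redshift(observed_nm: float, rest_nm: float) -> float:
    """z = (observed - rest) / rest; positive means the light is stretched."""
    return (observed_nm - rest_nm) / rest_nm


def recession_speed_km_s(z: float) -> float:
    """Low-speed approximation v ≈ c * z, only reasonable when z is much less than 1."""
    return SPEED_OF_LIGHT_KM_S * z


H_ALPHA_REST_NM = 656.28  # hydrogen-alpha, Fraunhofer's C line
observed_nm = 662.85      # hypothetical measurement from a distant galaxy

z = redshift(observed_nm, H_ALPHA_REST_NM)
print(f"z = {z:.4f}, v ≈ {recession_speed_km_s(z):.0f} km/s")
```

A line shifted by about 1% toward the red corresponds to a galaxy receding at roughly 3,000 km/s.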
Where's the Centre?
The first gut-reaction everyone has to this discovery is to be spooked by it - is Earth truly the centre of the Universe? Why is everything moving away from us specifically? This is where most of the misconceptions stem from, so let's get detailed. The idea that all galaxies are flying away from us is wrong. They aren't.
In 1927 Edwin Hubble discovered that the further out you looked the faster things appear to be going. Imagine you were looking at a particular galaxy, call it "A" and measured its speed as 100 m/s. Then say there was another galaxy further away, call it "B", and B was also moving away from you at 100 m/s.
Both galaxies are flying away from us at 100 m/s. But now imagine standing on galaxy A and looking at B. B is moving at 100 m/s too, so you wouldn't see it moving at all. It would appear stationary because you're matching its velocity. You would see it at a constant distance, and it would be planet Earth which would appear to be moving away.
What Hubble discovered is that galaxy B is actually moving at 200 m/s from our perspective. This means an observer in galaxy A would look at B and say "galaxy B is moving away from me at 100 m/s, just like the Earth is."
In other words, an observer in galaxy A would also see everything moving away from themselves. What Hubble showed was that, because more distant things recede from us faster, there is no "stationary point" of the Universe which everything is flying away from. Actually, everything is moving away from everything else. There is no "centre of the Universe". Every point could be described as the centre, which starts to make things hard to visualise. However, the point of Hubble's discovery is that nobody can claim to be the centre of the Universe, no matter how much they might want it to be true.
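The A-and-B argument can be checked for as many galaxies as you like. This sketch assumes Hubble's law in its simplest form, speed = H × distance, lays some hypothetical galaxies out along a single line, and then recomputes every distance and speed from each galaxy's point of view.

```python
# Hubble's law in its simplest form: recession speed = H * distance.
H = 1.0  # hypothetical constant: 1 m/s of speed per unit of distance

# Hypothetical galaxies along one line (arbitrary distance units), Earth at 0.
positions = {"Earth": 0.0, "A": 100.0, "B": 200.0, "C": -150.0}
velocities = {name: H * x for name, x in positions.items()}


def view_from(observer: str) -> dict[str, tuple[float, float]]:
    """Distance and velocity of every other galaxy, relative to `observer`."""
    x0, v0 = positions[observer], velocities[observer]
    return {
        name: (positions[name] - x0, velocities[name] - v0)
        for name in positions
        if name != observer
    }


for observer in positions:
    for name, (distance, speed) in view_from(observer).items():
        # Every observer finds the same law, so nobody sits at a special centre.
        assert speed == H * distance
    print(observer, view_from(observer))
```

From galaxy A, B sits 100 units away receding at 100 m/s, and Earth sits 100 units away in the other direction, also receding at 100 m/s, just as described above.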
Actually, things aren't moving at all
The fact that the Universe is expanding in all directions without a centre doesn't seem to make sense. How can all the galaxies be flying away but not be flying away from a specific point? This question was actually solved in 1923...four years before we even knew it needed solving. Sometimes Science is like that.
The Russian physicist Aleksander Friedmann had been playing around with Einstein's theory of general relativity (1915) and discovered that if you assume the fabric of space itself was somehow stretching, the equations still worked.
Friedmann was, largely for fun, seeing if it was possible to create a theoretical universe in which the fabric of space was expanding, and it turned out to be legitimate. It sounds wrong to imagine empty space having any kind of property, but it was just equations on a piece of paper; a mathematical curiosity which described a possible Universe, not the actual one.
Once Hubble had discovered the Universe was expanding in all directions, however, people began looking at Friedmann's ideas seriously and realised they matched what we observe. In a Friedmann universe, it's not the objects which are flying away from each other but the background of empty space which is stretching, creating the illusion of objects moving. Was it possible that Friedmann's theoretical universe was accidentally the real one?
Pretty soon we turned Friedmann's equations into a testable prediction: if the expansion is an illusion caused by "space-stretching" rather than "object-movement", it should be detectable in the form of a microwave signal in deep space. The reasons why Friedmann's equations predict this are laborious and mathematical, so I'll skip over them...the outcome is simple: if it's space which is expanding, we should discover a microwave hum across the entire Universe, caused by beams of light from the early expansion getting stretched out. In 1964, such a signal was discovered by Arno Penzias and Robert Wilson, and it's unmistakable.
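The full derivation won't fit in a blog, but the last step can be sketched with Wien's law, which links the temperature of a glow to the wavelength where it's brightest. This assumes the Universe became transparent at roughly 3,000 K and that the background glow measures about 2.725 K today; stretching space lowers the glow's temperature by the same factor it stretches the light.

```python
WIEN_B = 2.897771955e-3  # Wien's displacement constant, metre-kelvin
T_NOW = 2.725            # measured temperature of the microwave background today, K
T_RELEASE = 3000.0       # approximate temperature when the Universe became transparent, K


def peak_wavelength_m(temperature_k: float) -> float:
    """Wien's law: a hotter glow peaks at a shorter wavelength."""
    return WIEN_B / temperature_k


# Stretching space by a factor s cools the glow by the same factor s.
stretch = T_RELEASE / T_NOW
print(f"stretched by ~{stretch:.0f}x")                                  # ~1101x
print(f"peak then: {peak_wavelength_m(T_RELEASE) * 1e9:.0f} nm")        # 966 nm
print(f"peak now:  {peak_wavelength_m(T_NOW) * 1e3:.2f} mm")            # 1.06 mm
```

So the light has been stretched by a factor of roughly 1,100, moving its peak from the near-infrared into millimetre-scale microwaves.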
As crazy as it sounds, the galaxies of our Universe are not drifting away from each other like an explosion. They are "standing still" and it's the empty space between them getting bigger. Here's a photograph of Friedmann, again, just for the look on his face.
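One way to picture Friedmann's idea is with fixed grid coordinates (cosmologists call them comoving coordinates) multiplied by a single scale factor that grows with time. In this sketch the galaxies and their grid positions are hypothetical; nobody moves on the grid, yet every physical distance grows.

```python
import math
from itertools import combinations

# Hypothetical galaxies pinned to fixed "comoving" grid coordinates.
# They never move on the grid; only the grid spacing changes.
comoving = {
    "Milky Way": (0.0, 0.0, 0.0),
    "A": (3.0, 4.0, 0.0),
    "B": (-2.0, 1.0, 2.0),
}


def proper_distance(p, q, scale_factor: float) -> float:
    """Physical distance = scale factor x fixed grid distance."""
    return scale_factor * math.dist(p, q)


def all_distances(scale_factor: float) -> dict:
    return {
        (m, n): proper_distance(comoving[m], comoving[n], scale_factor)
        for m, n in combinations(comoving, 2)
    }


before = all_distances(1.0)
after = all_distances(2.0)  # space itself has stretched by a factor of 2

for (m, n), d0 in before.items():
    # Every separation doubles, whichever galaxy you measure from.
    print(f"{m} to {n}: {d0:.2f} -> {after[(m, n)]:.2f}")
```

No galaxy has changed its grid position, yet every galaxy sees all the others twice as far away.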
The Balloon Analogy
The most common way of illustrating the expansion of the Universe is with an analogy I have mixed feelings about. The idea is that you draw a bunch of dots on an uninflated balloon and then gradually blow into it. As you do so, the elastic stretches and the dots (representing galaxies) give the illusion of moving away from each other. I've used it myself in class but there are a lot of potential misconceptions which arise. Here's what it looks like...
The problems with the analogy are twofold. First, the balloon clearly has a centre...it's the point inside where you're blowing air into. It also shows the balloon expanding into the room you're doing the demonstration in. What we have to be clear about is that the interior of the balloon and the exterior of the balloon are NOT part of the analogy.
Essentially, you have to ignore the fact that you know the balloon is being inflated because we're pumping air in and ignore the fact that there is a room surrounding you. You have to focus on the surface of the balloon only. This two-dimensional surface is what the analogy is really about. If you imagine you're some kind of microscopic bug living on the surface of the balloon, as you look around you'll see galaxies moving away and space expanding. We can't easily demonstrate the 3D process but we can simplify it by compressing the third dimension into just two.
The second problem is that the rubber is still made of particles which are being spread as the balloon expands. The reality is that empty space is not made of particles which are rearranging and spreading. It's the fabric of empty space which is expanding and it doesn't have any finer structure we're aware of.
But, if you can bypass those two problems, the balloon analogy is pretty good. It shows that it's the space between the dots expanding, shows the overall volume of the territory getting bigger and shows that the dots themselves aren't expanding very much, i.e. the galaxies themselves aren't getting much bigger, just the region between them. Technically, because empty space is stretching then yes, the distance between two stars will slightly increase over time (the dots on the balloon will gradually grow larger as the ink molecules are moved), but the effect is too small to observe.
It's also very useful in showing that the Universe has no centre. If you imagine asking the little bug to find the central point of the balloon's surface, it wouldn't be able to. Just as a circle has no start or end, the surface of the balloon has no central point either. You could pick any point on the surface equally. The "centre" of the balloon exists in a higher dimension than the bug can perceive.
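For the bug on the balloon, distances are measured along the rubber, not through the inside. This sketch treats the ink dots as fixed latitude/longitude points (hypothetical positions) and measures the great-circle distance between them as the radius grows from an assumed 10 cm to 15 cm.

```python
import math


def surface_distance(lat1, lon1, lat2, lon2, radius):
    """Great-circle distance between two dots on a sphere (angles in degrees)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    # Spherical law of cosines; clamp to [-1, 1] to guard against rounding error.
    c = math.sin(p1) * math.sin(p2) + math.cos(p1) * math.cos(p2) * math.cos(dlon)
    return radius * math.acos(max(-1.0, min(1.0, c)))


# Hypothetical ink dots, fixed in latitude/longitude while the balloon inflates.
dots = [(0, 0), (0, 90), (45, 180), (-30, -60)]
SMALL, BIG = 10.0, 15.0  # balloon radius in cm, before and after a puff of air

for i, (lat1, lon1) in enumerate(dots):
    for lat2, lon2 in dots[i + 1:]:
        d0 = surface_distance(lat1, lon1, lat2, lon2, SMALL)
        d1 = surface_distance(lat1, lon1, lat2, lon2, BIG)
        # Every pair spreads apart by the same factor, whichever dot you start from.
        print(f"{(lat1, lon1)} to {(lat2, lon2)}: {d0:.2f} cm -> {d1:.2f} cm (x{d1 / d0:.2f})")
```

Every pair comes out stretched by the same factor of 1.5, so no dot can claim to be the one everything is fleeing from.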
I've also heard a pretty good analogy which is to imagine the Universe as a blob of dough with chocolate chips in it. As the dough is cooked it expands and the chocolate chips end up further away from each other. Although even the word "expands" could be misleading. Stretching is really what Friedmann had in mind.
OK...but seriously, what's it stretching into?
Now that we've covered what the expanding universe theory actually says, we can address the question properly. Even though the balloon analogy isn't perfect, it shows that the volume enclosing all the galaxies is increasing. In the dough analogy you could eventually get to the edges of the dough and ask yourself what was beyond. In the balloon analogy you could measure the thickness of the balloon's elastic and notice that it is gradually getting thinner. So the question is still there: what is the background that we measure our Universe against?
Or, it can be phrased in an even simpler way: what is outside the Universe? And this is where things get interesting. There are, at present, three likely contenders for dealing with the question.
1. The Universe is Finite
This one is the simplest to visualise. The idea is that our Universe really does have a limit 14 billion light years away from us and it separates our Universe from whatever is without. This "without" could have all sorts of properties but it could also be a complete vacuum. Perhaps the emptiness outside our Universe is like some kind of soup and our Universe has an edge made of big-bang material, or perhaps it's just sheer empty space which our "space" is moving into.
This boundary to our Universe constitutes an Event Horizon, i.e. a surface which separates two regions and makes it impossible for them to communicate with each other. This doesn't mean it's a physical surface (although it might be); it could just be that once you get to the edge of empty space, you find...even emptier space. This edge of the Universe is sometimes called the Cosmic Event Horizon and it really means the point of perfect ignorance: by definition, we cannot know what is outside it.
This does of course mean it's entirely possible there are lots of Universes out there which are all occupying this mysterious void and they are gradually expanding into it together. This isn't to be confused with the many-worlds interpretation of quantum mechanics, but it has a lot of the same outcomes: there are a huge number of Universes, possibly infinite, possibly not, and they are all occupying some kind of mega-space. Each Universe could have totally different laws of physics and different historical timelines so anything could be possible provided you pick the right pocket Universe.
2. The Universe is Infinite
The previous idea is a strange one, but it's not intractably strange. We can just about imagine it. But this next one is a whole other kettle of carrots. It's possible the Universe simply is everything, so the question of there being an outside is meaningless. It's like asking what is North of the North Pole? Or what's more right-angled than 90 degrees? By definition there is nothing beyond; the Universe has no edge, it is just everywhere. This isn't easy to swallow because as humans we aren't very good at picturing infinity. But here's a stab.
Consider the following numbers: 1 2 3 4 5 6 7 8 9. We can imagine the number line going on forever in both directions, i.e. it is infinite. But now consider this number line: 1 1.5 2 2.5 3 3.5 4 4.5 5 5.5 6 6.5 7 7.5 8 8.5 9. That number line seems to have more numbers in it, since I've included the points halfway between each integer, so it looks like a bigger line...but it's still infinite in both directions. Strangely, mathematicians count these two infinities as exactly the same size: double every number on the second line and each one lands on a different whole number, with no whole number left over. Stretch an infinite line and it doesn't run out of room.
I could go further and keep including decimals between each number, in fact I could do that an infinite number of times. Fill in every possible decimal and you get the real numbers, which genuinely are a bigger infinity than the whole numbers (Georg Cantor proved this in the 19th century). In fact, there are an infinite number of infinities. In(finite)ception! But the weird thing about infinities is that if an infinity expands (or stretches), it doesn't have to be stretching into anything...it just is.
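Here's a small way to see the "stretching without stretching into anything" idea. Doubling every whole number (n becomes 2n) pulls every pair of neighbours further apart, yet every point still lands on a whole number, i.e. still on the same line. A program can only check a finite window, so treat this as an illustration of the rule rather than a proof about infinity.

```python
def stretch(points: list[int], factor: int) -> list[int]:
    """Stretch a number line by moving every point n to factor * n."""
    return [factor * n for n in points]


# A finite window onto the infinite line of whole numbers.
window = list(range(-4, 5))
stretched = stretch(window, 2)

gaps_before = {b - a for a, b in zip(window, window[1:])}
gaps_after = {b - a for a, b in zip(stretched, stretched[1:])}

# Every stretched point is still a whole number (still on the same line),
# no two points collide, and every gap has doubled.
assert all(isinstance(n, int) for n in stretched)
assert len(set(stretched)) == len(stretched)
print(gaps_before, "->", gaps_after)  # {1} -> {2}
```

On a finite window the ends do move outward, but on the full infinite line the rule n → 2n never needs a point that isn't already there.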
So it's possible the answer is that there is nothing outside the Universe, it is literally everything, so it's expanding and that's all there is to it. That one isn't exactly easy to digest, and personally I'm doubtful of it (I won't bore you as to why). But it is a possibility. And that's just the nature of infinity. It goes on forever.
3. The Universe is Looped
Have you ever played the really old mobile-phone game "Snake"? If you've not, the idea is simple. You control a pixelated snake which has to move around the phone screen eating other black pixels. I dunno, scones or something, whatever snakes eat.
What made the game really interesting was that if you went off the right-hand edge of the screen you reappeared on the left-hand edge. If you went upwards you just appeared at the bottom, and so on. The snake-game Universe was infinite as far as the digital snake could tell: if it went in one direction forever it just kept coming back to where it started. The snake was a 2D creature who thought its Universe had no edge, but we as higher-dimensional beings (3D creatures) could see the entire size of the snake's Universe.
The real answer to how this would be possible is that the snake's Universe was actually curved in the third dimension (our Universe). It looped back on itself, so the 2D Universe the snake perceived was really the surface of a doughnut (mathematicians call the shape a torus): the left and right edges join up around one loop of the doughnut, and the top and bottom join up around the other. If you're a 2D creature living on a looped surface like that and the surface is stretching, you would see everything moving away from you in all directions, yet you would never find an edge, because the edges are joined up in a higher dimension. Remember it's not the objects on the screen flying away from each other, it's the screen itself stretching. Now all we have to do is go back and add one extra dimension.
We 3D creatures may find that if we travel in a straight line we end up back where we started. It would seem strange to us, but a 4D being looking "down" would see that our Universe was curved back on itself.
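The Snake screen's wraparound is just remainder arithmetic, and the same rule works in any number of dimensions. In this sketch the 84 × 48 screen size and the 100-cell 3D "universe" are assumed values for illustration.

```python
def step(position, move, size):
    """Move one step, wrapping around every axis like the Snake screen.

    Python's % always gives a result in [0, s) for a positive s, so stepping
    off the left edge (-1) reappears on the right edge (s - 1).
    """
    return tuple((p + m) % s for p, m, s in zip(position, move, size))


def walk_until_home(start, move, size, max_steps=100_000):
    """Count straight-line steps until we are back where we started."""
    pos = step(start, move, size)
    steps = 1
    while pos != start:
        if steps >= max_steps:
            return None  # gave up (not expected for whole-number moves)
        pos = step(pos, move, size)
        steps += 1
    return steps


SCREEN = (84, 48)  # assumed resolution for an old phone screen
print(walk_until_home((10, 20), (1, 0), SCREEN))  # 84: off the right, back on the left
print(walk_until_home((10, 20), (1, 1), SCREEN))  # 336: the diagonal takes longer

# One extra dimension: a hypothetical looped 3D "universe" of 100 x 100 x 100 cells.
print(walk_until_home((0, 0, 0), (0, 0, 1), (100, 100, 100)))  # 100
```

Whichever straight line you pick, you always end up back home eventually, even though you never hit an edge.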
So in a way the Universe is simultaneously finite and infinite depending on your perspective. It might be infinite in the 3rd dimension but finite in the 4th. It could have an edge as far as a higher-D creature can tell but to us we'd never see it because we're trapped in our 3D world. So what our Universe is expanding into could actually be a higher dimension. That's why I enjoyed playing Snake anyway.
Will we ever know?
The answer to the expansion question hinges on a lot of unknowns. There are sub-theories of the ones I've mentioned above and there are subtle details I've missed out, but it looks very likely that one of these three explanations is correct. Conclusively answering it is going to prove difficult, however, because there's a limit to how far into space we can actually see.
The further out something is, the longer its light has taken to reach us, which means that when we look at objects far away we're also looking back in time. The furthest objects we can see today are galaxies which formed a few hundred million years after the big bang expansion started. We can literally take photographs of the early Universe and figure out how it evolves. But that presents several difficulties.
As far as we can tell, our Universe took its current form 14 billion years ago. This means the farthest out we could ever hope to see would be 14 billion light years. Beyond is also "before" and asking what happened "before the start of time" gets sticky and possibly meaningless.
There's also the fact that the very early Universe was opaque and glowy, meaning we won't really be able to see past the early wall of light to what came before it. I'm afraid going out to the edge of the Universe and looking to see what's there is probably not feasible...not with today's understanding of Physics at least. So it's going to have to be elsewhere that we need to look.
If there are higher dimensions then maybe we can detect them. If there are other pocket Universes then maybe they influence ours in some measurable way. At the moment, we just don't know so this question remains speculative. The Universe is expanding, that much is clear. The fabric of space is what causes it, but beyond that we are still piecing the puzzle together.
And there we have it. The Universe is either expanding into a multiverse, it is infinite so isn't expanding into anything, or it's expanding into itself via some hyperspace curvature. I'm afraid these questions always lead to weird territory, but that's because we're dealing with the fundamentals of reality; it would be a surprise if it didn't bake our brains. Not to mention a disappointment. Personally I'd rather live in a Universe which takes effort to explain.
Expanding Universe: internapcdn
Captain Li Shang: animatedheroes
Sound Waves: meet
Vesto Slipher: newspaperslibrary
Edward James Keeler: Britannica
Miley Cyrus: celebuzz
Aleksander Friedmann: Wikipedia
Balloon Stretch: Astronomer
Stretch pose: pinimg
Multiverse: space cheetah
Infinity Movie: Wikimedia
Snake game: Mothership
I love science, let me tell you why.