# Box plots - teachers' notes

## Data telling stories

I wanted a separate page for teachers on this activity to avoid the outside chance of a student getting a sneak preview of what we might be up to! Sometimes I think it is worth keeping a few things to ourselves until just the right moment. Despite the open nature of the task, the intent is clearly that students come up with the idea of the box plot for themselves, which is, of course, a really powerful idea! I feel that the box plot is often overlooked, especially for IAs, and that when it is used it is often used trivially. It is, though, a fundamentally powerful tool for comparing data sets, and when seen in a non-trivial context it can tell a story very well. This activity is such an example.

### What is the data?

This is one of the last things that students find out, and I like to wait until they have effectively made 4 box and whisker diagrams to compare the data sets before I tell them. The data sets are infant mortality rates for a systematic sample of 19 countries from Africa, the Americas, Europe and Asia. The figure is the number of deaths in the first year of life per 1000 births. So if the figure is 17, it means that 17 out of 1000 babies born will die before they reach 1 year old. Below is an image of the Infant Mortality sample data (click to download). The data comes from the excellent www.gapminder.org.
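As a quick check of how the figure is read, the arithmetic can be sketched in a few lines of Python. The function name and the counts below are invented for illustration; they are not taken from the Gapminder data.

```python
def infant_mortality_rate(deaths_under_one, births):
    """Deaths in the first year of life per 1000 births."""
    return deaths_under_one * 1000 / births

# Invented counts for illustration: 340 deaths among 20,000 births.
rate = infant_mortality_rate(340, 20_000)
print(rate)  # a figure of 17 means 17 out of every 1000 babies born
```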

### Notes

It is a good idea to print the data sets on 4 different coloured pieces of paper so that they are easily distinguishable from each other. If time allows, laminating gives them a longer life!

I recommend giving students total freedom over this to start with. Simply ask them to 'arrange the numbers in a way that describes the set'. As so often, what they come up with is very interesting and often very thoughtful and creative. Let them go through the steps listed below and explain to each other what they did.

- What do you notice about the number set from the start?
- What are the features of the number set you think might set it apart from another set?
- How can you show these features by the way you set them out?
- Try and find a way that shows ALL of the key features of the number set.
- Have a look at the representations the other groups have chosen - how different or similar are they to yours? What do their representations show you that yours doesn’t? And vice versa?

I think that knowing that these numbers are in context - even without knowing the context - does give us a different take on things. So, if students haven't already taken that leap for themselves, make a point of saying that they are 'data sets' and then go through the next phase. The more exchange, discussion and debate that goes on around this, the better.

- How, if at all, does this impact how you want to set them out?
- Having settled on a representation - are there any key numbers in the set that need to be highlighted because of the role they have in telling the story?
- Have a look at the representations the other groups have chosen - how different or similar are they to yours? What do their representations show you that yours doesn’t? And vice versa?

#### What sort of things come up?

Well, this can be difficult to predict. I have done this with a number of groups of students and teachers and usually get something I haven't seen before. Below are some of the things that do come up.

There is an instinct to **separate the integers** and use other means of **'classifying the numbers'** into different sets. This is obviously not appropriate once we know they are a data set, but it is an interesting instinct.

**Grouping into classes** - This is quite a common instinct (and a good one!). In doing so, we often see the beginnings of a frequency histogram, which is very nice!

**Numerical ordering** - Because I know what I want students to do, I am often surprised that they don't do this straight away! It is, eventually, a common approach though.

**Numerical order on a scale** - This, however, does not seem to be an instinct, which I find fascinating because the scale adds so much detail to the story that is being told. It is common to see one group with a numerical order and another with a histogram and, by asking students to think about the pros and cons of both, they often then come up with the scale.

Ultimately I am very happy to steer students towards the numerical order on the scale if it doesn't happen naturally. It usually does, but I think it is fine to steer given the aims of the exercise. It is just nicer if it happens naturally.
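For teachers who want to check a group's 'grouping into classes' quickly, here is a minimal Python sketch of the idea. The class width of 10, the function name and the numbers are all my own assumptions for illustration; the numbers are invented and are not the Gapminder sample.

```python
from collections import Counter

def group_into_classes(values, width):
    """Count how many values fall in each class [start, start + width)."""
    counts = Counter((v // width) * width for v in values)
    return {start: counts[start] for start in sorted(counts)}

# Invented numbers for illustration only.
values = [3, 7, 12, 15, 18, 21, 24, 38, 41, 56]
width = 10
classes = group_into_classes(values, width)

# One row of '#' per class gives the beginnings of a frequency histogram.
for start, count in classes.items():
    print(f"{start:>2} to under {start + width:>2}: {'#' * count}")
```

Comparing this picture with a plain numerical ordering of the same numbers is one way in to the pros and cons discussion: the classes show where the numbers bunch up, but the individual values are lost.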

#### Working with scales

I love watching this part of the activity as students effectively use trial and improvement and then a range of reasoned techniques (e.g. divisions in carpet tiles) to get the data on a scale. Of course, each group has a different data set and, as such, different scaling issues.

These issues are only multiplied when we ask them to put all of their data sets on the same scale. This is more than just a bit of fun; it is the part where the power of the box plot as a comparative tool comes in. The geographical areas all put each other into perspective. It is a powerful exercise.
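The reasoning behind a shared scale can be sketched in Python: find the smallest and largest value across all of the sets, then turn every value into a distance along one strip. This is only a sketch under my own assumptions. The strip length of 200 (cm, say), the set names and the numbers are invented, and students may well choose to start their scale at zero instead.

```python
def shared_scale_positions(data_sets, strip_length):
    """Map every value to a distance along one strip shared by all sets.

    Assumes the values are not all identical (otherwise there is no scale).
    """
    all_values = [v for values in data_sets.values() for v in values]
    low, high = min(all_values), max(all_values)
    unit = strip_length / (high - low)  # strip length per unit of data
    return {
        name: [round((v - low) * unit, 1) for v in values]
        for name, values in data_sets.items()
    }

# Invented numbers and set names, for illustration only.
data_sets = {"Set A": [4, 10, 24], "Set B": [20, 54, 104]}
positions = shared_scale_positions(data_sets, strip_length=200)
print(positions)
```

With these invented numbers, Set A is squeezed into the first fifth of the strip while Set B stretches to the far end, which is exactly the kind of perspective the shared scale gives.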

#### 5 Figure summaries

Once groups have got their data sets on a numerical scale, they will start talking about how the data is spread out, which is obviously a key aim. At this point it is worth asking students to think about how they can 'summarise' the spread of the data set. Which are the key descriptive numbers? Drawing on previous knowledge, most groups will identify the **median** and have the subsequent discussion about why this may or may not be in the **middle** of the scale (hold on to this moment for teaching the normal distribution!).

Of course, the focus on lower and upper quartiles is fairly arbitrary, but if teachers get involved in the discussion at this point we can discuss the role of the 5 figure summary and, as such, complete the creation of a box plot.
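For checking students' answers, a 5 figure summary can be computed with a short Python sketch like the one below. Quartile conventions vary between textbooks and calculators; here I am assuming the 'median of each half' method, leaving out the middle value when the count is odd. The function name and the 19 values are invented for illustration and are not the Gapminder sample.

```python
from statistics import median

def five_figure_summary(values):
    """Return (minimum, lower quartile, median, upper quartile, maximum).

    Assumed convention: the quartiles are the medians of the lower and
    upper halves, leaving out the middle value when the count is odd.
    Needs at least two values.
    """
    ordered = sorted(values)
    half = len(ordered) // 2
    lower_half = ordered[:half]
    upper_half = ordered[-half:]
    return (ordered[0], median(lower_half), median(ordered),
            median(upper_half), ordered[-1])

# 19 invented values, for illustration only.
values = [3, 4, 5, 6, 8, 9, 11, 12, 14, 15, 17, 20, 23, 27, 31, 38, 45, 60, 82]
summary = five_figure_summary(values)
print(summary)  # (3, 8, 15, 31, 82)
```

In this invented set the median of 15 sits well to the left of the middle of the range from 3 to 82, which is the kind of observation that feeds the discussion about where the median falls on the scale.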

#### All on the same scale

Another key moment (as described above). This is where the comparison comes into play, and it is brilliant to listen to how students will talk about the differences between their data sets even before they know what they are. If (when!) this happens it is a real triumph. This is the moment to tell students what the data is. Whilst it is a sensitive and worrying topic, this is when you see the impact of the work that has been done. You can usually hear a few gasps through the silence as students realise the significance of the story the data sets are telling! Obviously it is important to handle the ensuing discussion carefully and equally important to reflect on the nature and worth of the box plot as a tool. Not much convincing needed.

#### ToK

This activity is another one that is really charged with opportunities for ToK and definitely not to be missed. The following is just a suggested list of related questions that can come up here...

Do we need to compare two or more data sets to really bring meaning to any one of them?

Two data sets have the same median - how similar are they? How much do two data sets need to have in common before we consider them similar?

What is the impact of sampling on any knowledge that can be gained from the associated analysis?

What is an acceptable infant mortality for a given country?

Are data representations better taught or conceived originally?

How can numbers tell stories?

........