Mentor: Now I would like to help you with another graphing method that allows you to compare
different categories of data. It is called a
box plot. It looks something like this:
Each one of the vertical lines represents an important number related to the data set: The
first and last line (leftmost and rightmost) are drawn at the lowest and highest data values.
The three lines that form the box are drawn 25%, 50%, and 75% of the way through the data.
These five numbers, the least, the 25%, the 50%, the 75%, and the greatest, together are
called a five-number summary.
Student: Summary of what, the data?
Mentor: Right. In the past we have talked about the mean of the data being the average of all of the
data points. There is another important 'middle' number. It is called the
median (M).
Student: I know that. The median is the midpoint of the data set. If you were to line up the data set
from least to greatest, split the data points in half, put the lower half of the
data points on one side of the scale and the upper half of the data points on the other side
of the scale, the median would be the data value at the balancing point. If there is an odd number
of data points, it is the middle value, like 7 for the data points 1 1 3 7 8 8 9.
Mentor: That's right, the quantity of numbers on either side of the scale is the same. But keep in
mind, if you have an even quantity of numbers, you average the two middle numbers and report
their average as the median. You do not add this number to the list. It is simply the median
value, and it marks the fiftieth percentile of the data.
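The median rule just described, for both odd and even counts, can be sketched in a few lines of Python. This is a minimal illustration; the function name `median` is our own choice, and Python's built-in `statistics.median` follows the same odd/even rule.

```python
def median(values):
    """Return the median: the middle value, or the average of the two middle values."""
    data = sorted(values)              # line the data up from least to greatest
    n = len(data)
    mid = n // 2
    if n % 2 == 1:                     # odd count: the middle data point itself
        return data[mid]
    return (data[mid - 1] + data[mid]) / 2   # even count: average the two middle values


print(median([1, 1, 3, 7, 8, 8, 9]))   # 7
print(median([2, 6, 7, 10, 14, 15]))   # 8.5
```

Note that the even-count result, 8.5, is not one of the data points; it is only reported as the median.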
Also remember not to confuse the median with the
mean. The median depends on the position of the data points once they are put in order, whereas the
mean depends on their actual values.
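A small Python sketch makes the difference concrete. It uses the student's data set 1 1 3 7 8 8 9, plus a hypothetical variation (`changed`) in which only the largest value is altered. The median stays the same because the order of the points does not change, while the mean moves with the new value.

```python
def mean(values):
    # The mean uses the actual size of every value.
    return sum(values) / len(values)


def median(values):
    # The median only uses position in the sorted list.
    data = sorted(values)
    n = len(data)
    mid = n // 2
    return data[mid] if n % 2 == 1 else (data[mid - 1] + data[mid]) / 2


original = [1, 1, 3, 7, 8, 8, 9]
changed = [1, 1, 3, 7, 8, 8, 90]   # hypothetical: only the largest value is changed

print(round(mean(original), 2), median(original))   # 5.29 7
print(round(mean(changed), 2), median(changed))     # 16.86 7
```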
Now let's look at splitting the halves in half in order to find the ends of the box in the box
plot.
Student: You mean splitting the data into quarters?
Mentor: Yes. We want to talk about the twenty-fifth and the seventy-fifth percentiles of the data.
The twenty-fifth percentile is called the
first quartile (Q1) or the
lower quartile and the seventy-fifth percentile is the
third quartile (Q3) or
upper quartile.
Student: What exactly is a quartile?
Mentor: The lower quartile is the median of the first 50% of the data. And the upper quartile is the
median of the last 50% of the data.
Student: So is it just another point in the data set?
Mentor: It is, much like the median, as long as there is an odd number of data points in the first or last
50% of the data. If there is an even number of data points, then the quartile is the average of
the two middle numbers, just like when we found the median.
There are two possible ways of finding the quartiles. Neither method is considered more
standard than the other, so your final answer for the upper and lower quartiles will
depend on which method you choose to use.
Student: What are the different methods?
Mentor: Well, it depends on whether or not the median is part of the data set. If the median is not
part of the original data set then you just use the numbers on one side of the median
depending on which quartile you are trying to calculate. However, it gets a little tricky when
you are trying to calculate the upper and lower quartiles of a data set in which the median is
a number in the set.
Student: What do you mean, "it gets tricky?"
Mentor: I mean there are two different ways people calculate the quartiles when the median is a number
in the data set. One method people use is to include the median in the calculation of both
the upper and lower quartiles. The second method is to exclude the median from the calculation
of both quartiles.
Do you remember how we found the median to begin with?
Student: Yes, we took the middle number of the data set if the set had an odd number of values, and we
averaged the two middle numbers if there was an even quantity of numbers in the data set.
Mentor: Correct, and we use a similar method to find the quartiles. If we choose to include the
median in our calculations on sets where the median is a number in the data set, then to find
the lower quartile we look at all of the numbers from the lowest value through the
median and calculate the median of those numbers. The median of the lower half of the data set
is the first quartile. Can you figure out how we will calculate the third quartile?
Student: My guess is, you look at all the numbers from the median to the greatest number, calculate
their median and that number will be equal to the third quartile.
Mentor: You are absolutely correct. Do you have any questions?
Student: You covered how to calculate the quartiles when the median is part of the data set, but what
if the median is not a part of the data set?
Mentor: Good question! If the median is not part of the data set and you want to calculate the upper
quartile then you just calculate the median of the numbers in the upper 50% of the data set.
Student: And for the lower quartile you just find the median of the lower 50% of the data set.
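The quartile rules discussed so far can be collected into one Python sketch. The names `median`, `quartiles`, and the `include_median` flag are illustrative choices, not a standard API. The sketch also assumes that "the median is part of the data set" means the set has an odd number of values, so the median is the middle data point; an even-sized set is always split into two equal halves.

```python
def median(values):
    """Median of an already-sorted list."""
    n = len(values)
    mid = n // 2
    if n % 2 == 1:
        return values[mid]
    return (values[mid - 1] + values[mid]) / 2


def quartiles(values, include_median=True):
    """Return (Q1, median, Q3) as the medians of the lower half, whole set, and upper half."""
    data = sorted(values)
    n = len(data)
    if n < 2:
        raise ValueError("need at least two values to split the data into halves")
    half = n // 2
    if n % 2 == 0:
        # Even count: the median is not one of the data points, so split cleanly.
        lower, upper = data[:half], data[half:]
    elif include_median:
        # Odd count, method 1: the middle value goes into both halves.
        lower, upper = data[:half + 1], data[half:]
    else:
        # Odd count, method 2: the middle value is left out of both halves.
        lower, upper = data[:half], data[half + 1:]
    return median(lower), median(data), median(upper)


print(quartiles([1, 4, 9, 12, 16, 23, 24]))                        # (6.5, 12, 19.5)
print(quartiles([1, 4, 9, 12, 16, 23, 24], include_median=False))  # (4, 12, 23)
print(quartiles([2, 6, 7, 10, 14, 15]))                            # (6, 8.5, 14)
```

The first two calls correspond to the two methods the student works through for 1 4 9 12 16 23 24; the third matches the mentor's values for 2 6 7 10 14 15, where the choice of method makes no difference.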
Mentor: Exactly. Now, do you want to give me a couple of data sets for which to calculate the median?
Student: Let's use 2 6 7 10 14 15, since it has an even number of values, and then we can
use 1 4 9 12 16 23 24 for an odd-sized data set. The median for the first set is 8.5; I
averaged 7 and 10. The median for the second set is 12, the middle number.
Mentor: Good. Now for the quartiles. For 2 6 7 10 14 15 the first quartile is equal to 6 and the
third quartile is equal to 14.
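To see where the mentor's 6 and 14 come from, here is a short Python sketch that splits the even-sized set into its lower and upper 50% and takes the middle value of each half.

```python
data = sorted([2, 6, 7, 10, 14, 15])
half = len(data) // 2          # 6 values, so each half holds 3

lower = data[:half]            # [2, 6, 7]
upper = data[half:]            # [10, 14, 15]

# Each half has an odd count, so its median is simply its middle value.
q1 = lower[len(lower) // 2]
q3 = upper[len(upper) // 2]
print(q1, q3)                  # 6 14
```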
Student: Right, I got that. Let me try the other set: 1 4 9 12 16 23 24. If I include the median to
calculate the quartiles, then the first quartile is the average of 4 and 9, or 6.5, and the third
quartile is the average of 16 and 23, or 19.5. If I do not include the median to calculate the
quartiles, then the lower quartile is 4 and the upper quartile is 23.
Mentor: Right! That was great. So far we have calculated the median, first quartile, and third
quartile for the second data set. What else do we need to complete our five-number summary?
Student: The highest and lowest values of the data set.
Mentor: Right again. We then use those five numbers in drawing our box plot.
Student: Okay.
Five-number summary for 1 4 9 12 16 23 24 (median included in the quartile calculations):
Lo                     1
First Quartile (Q1)    6.5
Median (Q2)            12
Third Quartile (Q3)    19.5
Hi                     24
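A sketch of a helper that produces all five numbers at once. The name `five_number_summary` is hypothetical, and it assumes the include-the-median method, which is the one that gives Q1 = 6.5 and Q3 = 19.5 in the values listed above.

```python
def median(sorted_values):
    """Median of an already-sorted list."""
    n = len(sorted_values)
    m = n // 2
    if n % 2 == 1:
        return sorted_values[m]
    return (sorted_values[m - 1] + sorted_values[m]) / 2


def five_number_summary(values):
    # Assumption: when the count is odd, the median is included in both halves.
    data = sorted(values)
    n = len(data)
    lower = data[: (n + 1) // 2]   # lower half, plus the median if n is odd
    upper = data[n // 2:]          # upper half, plus the median if n is odd
    return {"Lo": data[0], "Q1": median(lower), "Median": median(data),
            "Q3": median(upper), "Hi": data[-1]}


print(five_number_summary([1, 4, 9, 12, 16, 23, 24]))
# {'Lo': 1, 'Q1': 6.5, 'Median': 12, 'Q3': 19.5, 'Hi': 24}
```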
Mentor: The last two numbers of interest are the ranges. The range of the data set is
the greatest value minus the smallest value. The interquartile range is the third quartile
minus the first quartile. Do you know what the interquartile range represents?
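Both ranges are simple subtractions. The Python sketch below uses the student's odd-sized set and the quartiles from its five-number summary (median included). For comparison, it also computes the interquartile range you would get with the exclude-the-median quartiles, 4 and 23, which shows how the choice of method carries through.

```python
data = sorted([1, 4, 9, 12, 16, 23, 24])

# Quartiles from the five-number summary (median included in both halves).
q1, q3 = 6.5, 19.5

data_range = data[-1] - data[0]   # greatest value minus smallest value
iqr = q3 - q1                     # spread of the middle fifty percent

# Same IQR using the quartiles found by excluding the median.
iqr_excluding = 23 - 4

print(data_range, iqr, iqr_excluding)   # 23 13.0 19
```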
Student: The middle fifty percent of the data.
Mentor: Right. And the middle fifty percent of the data determines the length of the box. So for the
data set 1 4 9 12 16 23 24 here is our box plot:
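Since a box plot is just the five-number summary drawn to scale, a rough text version can be sketched in Python. The function `text_box_plot` and its `width` parameter are hypothetical. The whiskers are drawn as `-`, the box from Q1 to Q3 as `=`, the box ends as `[` and `]`, and the lowest value, median, and highest value as `|`.

```python
def text_box_plot(lo, q1, med, q3, hi, width=47):
    """Draw a one-line text box plot from a five-number summary."""
    if hi == lo:
        raise ValueError("Lo and Hi must differ to set a scale")

    def col(value):
        # Position of a data value on a line of `width` characters.
        return round((value - lo) * (width - 1) / (hi - lo))

    row = ["-"] * width                    # whiskers span the whole line, Lo to Hi
    for c in range(col(q1), col(q3) + 1):
        row[c] = "="                       # the box covers the middle fifty percent
    for value, mark in ((lo, "|"), (q1, "["), (med, "|"), (q3, "]"), (hi, "|")):
        row[col(value)] = mark             # mark the five summary numbers
    return "".join(row)


print(text_box_plot(1, 6.5, 12, 19.5, 24))
```

With the default width, each unit of data takes two characters, so the call above prints `|----------[==========|==============]--------|`. The box is longer than either whisker because the interquartile range (13) is more than half of the full range (23).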