The big idea: When you repeat a measurement, you get a set of values, not one number. To describe that set you report two things:
- An average: a typical, central value (usually the mean).
- A spread: how scattered the values are around the average (the standard deviation).
On a graph, the average is the height of the bar, and the spread is shown by an error bar drawn on top of it.
- Mean (x̄): The average. Add up all the values and divide by how many there are.
- Median: The middle value when the data are placed in order from smallest to largest.
- Mode: The most common value, or on a frequency graph the tallest class (most common range).
- Range: The spread from the smallest to the largest value (largest − smallest).
- Standard deviation (s): A measure of how spread out the data are around the mean. Small s = tightly clustered; large s = widely scattered.
- Error bar: The line drawn on a bar or point showing the spread of the data, usually ± one standard deviation about the mean.
| Statistic | What it tells you | How to find it |
|---|---|---|
| Mean (x̄) | The AVERAGE value, a typical central value | Add all values, divide by how many there are: x̄ = Σx ÷ n |
| Median | The MIDDLE value when the data are put in order | Order the data; take the middle one (or the mean of the middle two) |
| Mode | The MOST COMMON value (or class) | The value/class that occurs most often — the tallest bar on a frequency graph |
| Range | The total SPREAD from smallest to largest | Largest value − smallest value |
| Standard deviation (s) | How SPREAD OUT the data are around the mean | Read it from your calculator. A small s = values cluster near the mean; a large s = values are widely spread |
Mean vs median vs mode: Mean = add-them-up-and-divide average.
Median = the middle one in order.
Mode = the most common one (the tallest bar on a frequency graph).
On a histogram the mode is the tallest class and the median is the class holding the middle individual — Paper 1B asks you to read these straight off the bars.
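As a quick check on the four definitions, here is a minimal Python sketch using the standard `statistics` module. The data set is made up for illustration (it does not come from these notes), and the variable names are arbitrary.

```python
from statistics import mean, median, multimode

# Hypothetical data (not from the notes): seeds counted in 7 pods
values = [3, 7, 5, 7, 9, 7, 4]

data_mean = mean(values)                 # add them up, divide by how many
data_median = median(values)             # middle value once sorted
data_modes = multimode(values)           # most common value(s), as a list
data_range = max(values) - min(values)   # largest minus smallest

print("mean:", data_mean)      # 6
print("median:", data_median)  # 7
print("mode(s):", data_modes)  # [7]
print("range:", data_range)    # 6
```

`multimode` returns a list because a data set can have more than one most-common value.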
Here is the method, shown with a small data set so you can see every step and carry the units through.
The mean formula: $$\bar{x} = \dfrac{\sum x}{n}$$
where Σx means add up all the values and n is how many values there are. The answer keeps the same units as the data.
IB-style question — calculate the mean and describe the spread
A student measured the height of 5 bean seedlings (in mm): 41, 38, 45, 39, 47. Calculate the mean height, and state what an error bar of ± one standard deviation would show. [3]
Worked solution (formula first, then numbers)
- Write the formula. $$\bar{x} = \dfrac{\sum x}{n}$$ Add the values, divide by how many there are.
- Add the values. Σx = 41 + 38 + 45 + 39 + 47 = 210 mm.
- Divide by n. There are n = 5 seedlings, so x̄ = 210 ÷ 5 = 42 mm. (Mark 1: correct Σx = 210. Mark 2: 42 mm with units.)
- The error bar. A calculator gives the sample standard deviation s ≈ 3.9 mm. An error bar drawn ± one s would run from 42 − 3.9 = 38.1 mm up to 42 + 3.9 = 45.9 mm. It shows the spread of the data about the mean, not a single point. (Mark 3: error bar = ± SD / shows the spread.)
Final answer
Mean = 210 ÷ 5 = 42 mm. An error bar of ± 1 standard deviation (s ≈ 3.9 mm) runs from about 38.1 mm to 45.9 mm and shows how spread out the heights are around the mean.
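The seedling calculation can be reproduced in a few lines of Python. This is a sketch, assuming the calculator's s is the sample standard deviation (n − 1 in the denominator), which is what `statistics.stdev` computes; `statistics.pstdev` would give the population version instead.

```python
from statistics import mean, stdev

heights_mm = [41, 38, 45, 39, 47]   # seedling heights from the question

x_bar = mean(heights_mm)    # sum of values / number of values
s = stdev(heights_mm)       # sample standard deviation (assumed to match the calculator's s)

lower = x_bar - s           # bottom of the error bar
upper = x_bar + s           # top of the error bar

print(f"mean = {x_bar} mm, s = {s:.1f} mm")
print(f"error bar runs from {lower:.1f} mm to {upper:.1f} mm")
```

Rounding is left to the final print so the error-bar limits are worked out from the unrounded s.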
You don't calculate SD by hand in IB: In the IB you find the standard deviation on your GDC/calculator (the 1-Var Stats function) — you are not expected to compute it by hand.
What you are expected to do is interpret it: a small s means the values are tightly clustered (short error bars, more reliable); a large s means they are widely scattered (long error bars, less reliable).
Reading an error bar off a chart: An error bar is centred on the mean (the top of the bar) and reaches ± one standard deviation.
So its total height = 2s, and its half-length = s. The longer the error bar, the more spread (and the less reliable) the data.
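Going the other way, from a chart back to the numbers, is the same arithmetic in reverse. The helper below is a hypothetical illustration (the function name is invented), assuming the error bar shows the mean ± one s.

```python
def read_error_bar(lower, upper):
    """Return (mean, s) for an error bar that spans lower..upper as mean ± 1 s."""
    mean = (lower + upper) / 2   # the bar is centred on the mean
    s = (upper - lower) / 2      # total height is 2s, so half of it is s
    return mean, s

# An error bar read off a chart as running from 17 to 27
print(read_error_bar(17, 27))   # (22.0, 5.0)
```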
The overlap rule: Error bars let you judge whether two means are really different — without doing a full statistical test.
Error bars OVERLAP → the difference between the means is NOT significant (the two could be the same).
Error bars do NOT overlap (a clear gap) → the difference IS significant (the means are genuinely different).
Small bars = low spread = more reliable. Large bars = high spread = less reliable.
Mean bee visits per hour to three flower colours, each bar capped with an error bar of ± one standard deviation. Blue and purple OVERLAP (difference not significant); yellow sits clearly higher with a CLEAR GAP (difference is significant).
IB-style question — use the overlap rule to compare three treatments
Bees visited blue flowers a mean of 22 times/hour (s = 5), purple flowers 26 times/hour (s = 5) and yellow flowers 47 times/hour (s = 4). Using the error bars, state whether (a) blue and purple differ significantly, and (b) yellow and blue differ significantly. Justify each answer. [4]
Worked solution
- Find each bar's reach. Each error bar runs ± one s about the mean. Blue: 17 to 27. Purple: 21 to 31. Yellow: 43 to 51.
- (a) Blue vs purple. Blue reaches up to 27 and purple reaches down to 21, so the bars OVERLAP (21–27 is shared). → The difference is NOT significant — bees may visit blue and purple equally. (Marks 1–2.)
- (b) Yellow vs blue. Blue tops out at 27; yellow's bar starts at 43. There is a clear gap (27 up to 43), so the bars do NOT overlap. → The difference IS significant — yellow really is visited more. (Marks 3–4.)
Final answer
Blue (17–27) and purple (21–31) overlap → not significant. Yellow (43–51) and blue (17–27) have a clear gap → significant. Always justify by stating whether the error bars overlap.
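The overlap check is simple enough to sketch in Python with the bee data from the question. The function names are made up for illustration, and treating bars that just touch as overlapping is an assumption, since the notes do not cover that edge case.

```python
def error_bar(mean, s):
    """Return the (lower, upper) reach of an error bar of mean ± one s."""
    return (mean - s, mean + s)

def bars_overlap(a, b):
    """True if two (lower, upper) error bars share any values."""
    return a[0] <= b[1] and b[0] <= a[1]

blue = error_bar(22, 5)      # (17, 27)
purple = error_bar(26, 5)    # (21, 31)
yellow = error_bar(47, 4)    # (43, 51)

print(bars_overlap(blue, purple))   # True  -> difference not significant
print(bars_overlap(yellow, blue))   # False -> clear gap, difference significant
```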
| What the error bars do | What it means | Wording to use |
|---|---|---|
| They OVERLAP (no clear gap) | The two means could be the same — the difference is NOT significant | “The error bars overlap, so the difference between the means is not significant.” |
| They DO NOT overlap (clear gap) | The means are genuinely different — the difference IS (likely) significant | “The error bars do not overlap, so the difference is significant.” |
| The bars are SMALL | The data are tightly clustered → more precise / more reliable | “Small error bars show low spread, so the results are reliable.” |
| The bars are LARGE | The data are widely spread → less precise / less reliable | “Large error bars show high spread, so the results are less reliable.” |
Box-and-whisker plots: median, quartiles, outliers: A box plot is another way to show a data set's spread.
The line inside the box is the median. The box runs from the lower quartile (Q1) to the upper quartile (Q3) — the middle 50% of the data (its length is the interquartile range). The whiskers reach the minimum and maximum, and a separate marked point beyond a whisker is an outlier.
On Paper 1B you read the median straight off the box, compare two boxes, or explain what a marked point (an outlier) represents.
Box-and-whisker plots of leaf length (mm) for the same plant grown in shade vs full sun. The line inside each box is the MEDIAN, the box spans the lower-to-upper quartile (the middle 50% of the data), the whiskers reach the minimum and maximum, and a separate marked point is an OUTLIER.
Reading the box plot above: Shade has a higher median leaf length (32 mm) than full sun (22 mm).
The box for shade (Q1 = 26 mm to Q3 = 38 mm) holds the middle 50% of shade leaves.
The lone points at 9 mm (shade) and 47 mm (full sun) are outliers: single values far from the rest.
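To see where the parts of a box plot come from, here is a rough Python sketch. The leaf lengths are invented (they are not the data behind the figure), and the 1.5 × IQR cut-off for flagging outliers is a common convention assumed here, not a rule stated in these notes. Quartile values also vary slightly between calculation methods; this uses the `inclusive` method of `statistics.quantiles`.

```python
from statistics import quantiles

# Hypothetical leaf lengths in mm (not the data behind the figure)
lengths = [9, 24, 26, 28, 30, 32, 33, 35, 38, 40, 41]

# Lower quartile, median, upper quartile
q1, q2, q3 = quantiles(lengths, n=4, method="inclusive")
iqr = q3 - q1                    # length of the box (interquartile range)

# Assumed convention: points more than 1.5 x IQR beyond the box are outliers
low_fence = q1 - 1.5 * iqr
high_fence = q3 + 1.5 * iqr
outliers = [x for x in lengths if x < low_fence or x > high_fence]

print("Q1, median, Q3:", q1, q2, q3)
print("IQR:", iqr)
print("outliers:", outliers)     # [9]
```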