The big idea: No single measurement can be fully trusted — every measurement carries some random error, and any one reading could be an anomaly (a freak result).
So we repeat each measurement. Each repeat is called a replicate.
Repeats let us do three things: take a mean (which evens out random error), spot anomalies (an odd value that doesn't fit), and check the result is repeatable (we'd get a similar value if we did it again).
A result that we'd get again on repeating is called reliable.
- Replicate: One repeat of a measurement made under the same conditions. Several replicates let you calculate a mean.
- Reliable / repeatable: A result is reliable if repeating the same method gives a very similar result, i.e. the repeats agree with each other.
- Anomaly (anomalous result): A measurement that lies well outside the others. Repeats let you spot it; you usually re-check it or leave it out of the mean.
- Random error: Small unpredictable variation that makes repeats differ. Taking a mean of many repeats reduces its effect.
- Mean: The average of the replicates: add the values and divide by how many there are. The single best estimate from a set of repeats.
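To see why a mean evens out random error, here is a minimal simulation sketch. The "true" rate of 30 bubbles per minute and the size of the random error are made-up assumptions, chosen only for illustration; they are not data from any experiment.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch gives the same numbers on every run

TRUE_RATE = 30.0  # hypothetical true value in bubbles per minute (assumed)
NOISE_SD = 2.0    # hypothetical size of the random error (assumed)


def one_reading():
    """A single measurement: the true value plus some random error."""
    return random.gauss(TRUE_RATE, NOISE_SD)


def mean_of_replicates(n):
    """The mean of n replicate readings taken under the same conditions."""
    return statistics.mean([one_reading() for _ in range(n)])


# Repeat each approach many times to see how much the results scatter.
single_readings = [one_reading() for _ in range(2000)]
means_of_five = [mean_of_replicates(5) for _ in range(2000)]

spread_single = statistics.stdev(single_readings)
spread_means = statistics.stdev(means_of_five)

print(f"scatter of single readings: {spread_single:.2f}")
print(f"scatter of means of five:   {spread_means:.2f}")
```

The means of five scatter less around the true value than single readings do, which is the sense in which averaging replicates reduces the effect of random error.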
Replicates vs range of the variable: These are two different kinds of 'more':
More replicates = measure the SAME setting several times → makes each point more reliable.
More values of the variable = test MORE settings (e.g. more temperatures) → shows the trend across the range.
A good improvement answer often needs both.
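One way to picture the two kinds of "more" is as a small results table, sketched below. The temperatures and bubble counts are invented purely for illustration: the number of keys is how many settings were tested, and the length of each list is how many replicates were taken at that setting.

```python
from statistics import mean

# Made-up readings for illustration only: bubbles per minute at each temperature (°C).
readings = {
    15: [12, 13, 11],
    25: [24, 26, 25],
    35: [31, 30, 32],
}

# More keys = more values of the variable (shows the trend across the range).
n_settings = len(readings)

# Longer lists = more replicates (makes each point more reliable).
n_replicates = {temp: len(values) for temp, values in readings.items()}

# One mean per setting: these are the points you would plot.
means = {temp: mean(values) for temp, values in readings.items()}

for temp in sorted(means):
    print(f"{temp} °C: mean of {n_replicates[temp]} replicates = {means[temp]}")
```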
| | A RELIABLE method | An UNRELIABLE method |
|---|---|---|
| Repeats | Each measurement is REPEATED several times (replicates) | Each value is measured only ONCE |
| Spread of repeats | Repeats are CLOSE together (small spread) | Repeats are WIDELY scattered (large spread) |
| What you do with them | Calculate a MEAN; spot and re-check anomalies | No mean possible; an anomaly is impossible to detect |
| If you repeated the whole experiment | You'd get a very SIMILAR result (repeatable) | You might get a very DIFFERENT result |
| Confidence in the result | HIGH — the value is trustworthy | LOW — the value could be a one-off fluke |
Let's see, with real numbers, why repeats make a result more reliable. A student measures the rate of photosynthesis of pondweed at one light intensity by counting the bubbles of oxygen released per minute. They take five replicates.
The five replicate readings: Replicate 1: 31 · Replicate 2: 33 · Replicate 3: 30 · Replicate 4: 32 · Replicate 5: 64 bubbles per minute.
Four readings sit around 30; one (64) is far higher, so it is a likely anomaly (perhaps single bubbles were counted twice, or a bubble stuck and released late).
IB-style question — using the mean and the spread to handle an anomaly
(a) Calculate the mean if all five readings are kept. (b) Identify the anomaly and recalculate the mean without it. (c) State which mean is more reliable, and why. [3]
Worked solution (formula first, then numbers with units)
- (a) The formula for the mean. $\bar{x} = \dfrac{\sum x}{n}$: add the readings and divide by how many there are ($n$).
- Substitute all five. $\bar{x} = \dfrac{31 + 33 + 30 + 32 + 64}{5} = \dfrac{190}{5} = 38$ bubbles per minute. Notice this mean (38) is higher than four of the five readings, a sign that one value is dragging it up.
- (b) Identify the anomaly. The 64 lies far outside the cluster (30–33), so it is the anomaly. Recompute without it: $\bar{x} = \dfrac{31 + 33 + 30 + 32}{4} = \dfrac{126}{4} = 31.5$ bubbles per minute.
- Measure the spread. Range $=$ max $-$ min. With the anomaly the spread is $64 - 30 = 34$; without it the spread is $33 - 30 = 3$. A spread of 3 is tight: the repeats agree.
- (c) Which is more reliable? The mean of 31.5 is more reliable: removing the single anomalous reading leaves four repeats that agree closely (small spread), so the value would be reproduced on repeating. Keeping the 64 makes the mean unrepresentative.
Final answer
(a) Mean of all five = 190 ÷ 5 = 38 bubbles min⁻¹. (b) 64 is the anomaly; mean of the other four = 126 ÷ 4 = 31.5 bubbles min⁻¹. (c) 31.5 is more reliable, because the four remaining repeats are tightly clustered (range only 3), so the result is repeatable, whereas the single 64 distorts the mean.
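The arithmetic in this worked example can be checked with a short sketch. Flagging the anomaly as "the reading furthest from the median" is a rule of thumb assumed for this sketch only, not an official method; in an exam you identify the anomaly by eye and justify it.

```python
from statistics import mean, median

readings = [31, 33, 30, 32, 64]  # bubbles per minute, from the worked example


def value_range(values):
    """Spread of a set of readings: largest value minus smallest value."""
    return max(values) - min(values)


# (a) Mean with every reading kept.
mean_all = mean(readings)

# (b) Flag the reading furthest from the median (assumed rule of thumb),
# then recalculate the mean without it.
middle = median(readings)
anomaly = max(readings, key=lambda r: abs(r - middle))
kept = [r for r in readings if r != anomaly]
mean_kept = mean(kept)

# Spread with and without the anomaly.
range_all = value_range(readings)
range_kept = value_range(kept)

print("mean of all five:", mean_all)
print("anomaly:", anomaly)
print("mean without anomaly:", mean_kept)
print("range with / without anomaly:", range_all, "/", range_kept)
```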
The two reasons repeats make a result reliable:
1. They reveal anomalies. With one reading you can't tell a fluke from a true value; with several, an odd one stands out and can be re-checked or excluded.
2. They reduce random error. Averaging many repeats cancels out the small random ups-and-downs, so the mean is closer to the true value than any single reading.
And the spread of the repeats (range or standard deviation) is itself the evidence: small spread = reliable; large spread = you need more repeats before trusting the mean.
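As a rough illustration of spread as evidence, the sketch below computes the sample standard deviation of the pondweed readings with and without the 64. Using the sample standard deviation (dividing by n − 1) is an assumption made here; the range would tell the same story.

```python
from statistics import stdev

with_anomaly = [31, 33, 30, 32, 64]  # all five pondweed readings
without_anomaly = [31, 33, 30, 32]   # the 64 left out

# Sample standard deviation: a typical distance of a reading from the mean.
sd_with = stdev(with_anomaly)
sd_without = stdev(without_anomaly)

print(f"standard deviation with the anomaly:    {sd_with:.1f}")
print(f"standard deviation without the anomaly: {sd_without:.1f}")
```

The four agreeing repeats have a standard deviation of about 1.3 bubbles per minute, against about 14.6 when the 64 is kept, so the smaller spread matches the set of repeats you would trust more.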
How spread out the repeats are tells you how RELIABLE the method is. Each bar is the mean of several repeats; the cap shows the spread (± the range/standard deviation). 'Tight method' has small, tightly-clustered repeats (small spread = reliable); 'scattered method' has widely-scattered repeats (large spread = less reliable) even though the means are similar.
Reliable, valid and accurate are NOT the same: Examiners use these words precisely — answer the one they ask for:
Reliable = repeats of the same method agree (fix it with more replicates).
Valid = the method is a fair test of what it claims — only one variable changed, others controlled, a control present (fix it with a control / controlling variables).
Accurate = a reading is close to the true value (fix it with a better instrument).
| Term | What it asks | How you improve it |
|---|---|---|
| Reliable / repeatable | Do REPEATS of the SAME method agree with each other? | More replicates; reduce random error |
| Valid | Does the method actually TEST what it claims (fair test, control, only one variable changed)? | Add a control; control all other variables; test the right range |
| Accurate | Is a measurement CLOSE to the true value? | Calibrate / use a better instrument; reduce systematic error |
How this is tested: On Paper 1B (and threaded through the data question) this micro-topic appears as short Suggest / Explain / Evaluate items attached to a given method and data set:
• Suggest how to make it more reliable → 'repeat each measurement more times and take a mean' (reduces random error / reveals anomalies).
• Justify why replicates are needed → so you can calculate a mean, spot anomalies and check repeatability.
• Propose improvements → give a fix AND a matched reason (more replicates → more reliable; a control → shows the effect is due to your variable; a wider range → reveals the full trend).
• Evaluate whether the data support a claim → say what the data show, THEN weigh the support against limitations (small sample, big spread/overlap, no control, only one factor changed). A balanced answer ('supports it, BUT…') scores best.
The scenario: A gardener claims a new plant feed makes basil grow taller. They grew 3 fed plants and 3 unfed plants for two weeks and measured final height (cm):
Fed: 18, 20, 19 (mean 19 cm)
Unfed: 14, 22, 12 (mean 16 cm)
The gardener concludes: 'The feed clearly makes basil grow taller.'
IB-style question — evaluate the claim that the feed makes basil grow taller
Evaluate the gardener's claim using the data, and suggest how the investigation could be improved. [4]
How to score all four marks
- What the data show (support). The mean height of the fed plants (19 cm) is higher than that of the unfed plants (16 cm), so on average the fed plants were taller. This supports the claim.
- Why it is weak (limitation 1 — spread/overlap). The unfed plants are widely spread (12 to 22 cm, range 10) and actually overlap the fed plants (the tallest unfed plant, 22 cm, beats every fed plant). A 3 cm difference in means is small compared with that spread, so the difference may just be random variation, not the feed.
- Why it is weak (limitation 2 — sample size & repeats). Only 3 plants per group is a very small sample, so the means are unreliable; a single odd plant shifts the mean a lot. We can't yet call the result reliable.
- Improvements (each with a matched reason). Use many more plants per group (a larger sample → more reliable mean); control other variables (same light, water, soil, pot size → a fair test, so any difference is due to the feed); ideally repeat the whole experiment. Only then could you fairly judge the claim. A t-test could then check whether the difference between the two means is statistically significant. (Award marks for: states data supports it; spread/overlap point; small-sample point; a matched improvement.)
Final answer
On average the fed plants were taller (mean 19 cm vs 16 cm), which supports the claim. BUT the unfed heights are widely spread (12–22 cm) and overlap the fed group, and there are only 3 plants per group, so the 3 cm difference could be random variation rather than the feed. Improve it by using many more plants per group (larger, more reliable sample) and controlling all other variables (light, water, soil) for a fair test, so any difference can be attributed to the feed.
✓ Why this scores full marks: It is balanced: it first says what the data show (the means support the claim), then weighs that against the limitations (overlap/spread and tiny sample), and finishes with matched improvements (more plants → reliable; control variables → valid).
A common way to lose marks is to answer 'yes, the feed works' from the means alone, ignoring the spread, the overlap and the sample size.
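The numbers behind this evaluation can be laid out in a short sketch. The means, ranges and overlap come straight from the gardener's data. The t statistic is an optional extra: it uses Welch's form of the t-test, which is a choice made for this sketch rather than anything the question specifies.

```python
from math import sqrt
from statistics import mean, variance

fed = [18, 20, 19]    # final heights in cm
unfed = [14, 22, 12]  # final heights in cm

mean_fed = mean(fed)
mean_unfed = mean(unfed)
difference = mean_fed - mean_unfed

range_fed = max(fed) - min(fed)
range_unfed = max(unfed) - min(unfed)

# The groups overlap if each group's tallest plant reaches the other's shortest.
groups_overlap = max(unfed) >= min(fed) and max(fed) >= min(unfed)

# Welch's t statistic (assumed form): difference in means divided by its
# standard error, using each group's own sample variance.
standard_error = sqrt(variance(fed) / len(fed) + variance(unfed) / len(unfed))
t_stat = difference / standard_error

print("means (fed, unfed):", mean_fed, mean_unfed)
print("ranges (fed, unfed):", range_fed, range_unfed)
print("groups overlap:", groups_overlap)
print(f"t statistic: {t_stat:.2f}")
```

The difference in means (3 cm) is smaller than the spread of the unfed group (10 cm), and the t statistic comes out below 1, which is consistent with the point that so small a sample cannot yet separate the effect of the feed from random variation.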