IB Biology HL - Reliability, replicates & evaluating method

The big idea: No single measurement can be fully trusted — every measurement carries some random error, and any one reading could be an anomaly (a freak result).

So we repeat each measurement. Each repeat is called a replicate.

Repeats let us do three things: take a mean (which evens out random error), spot anomalies (an odd value that doesn't fit), and check the result is repeatable (we'd get a similar value if we did it again).

A result that we'd get again on repeating is called reliable.

Replicate: One repeat of a measurement made under the same conditions. Several replicates let you calculate a mean.
Reliable / repeatable: A result is reliable if repeating the same method gives a very similar result — the repeats agree with each other.
Anomaly (anomalous result): A measurement that lies well outside the others. Repeats let you spot it; you usually re-check it or leave it out of the mean.
Random error: Small unpredictable variation that makes repeats differ. Taking a mean of many repeats reduces its effect.
Mean: The average of the replicates: add the values and divide by how many there are. The single best estimate from a set of repeats.

Replicates vs range of the variable: These are two different kinds of 'more':

More replicates = measure the SAME setting several times → makes each point more reliable.

More values of the variable = test MORE settings (e.g. more temperatures) → shows the trend across the range.

A good improvement answer often needs both.
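The difference between the two kinds of 'more' can be sketched as an experiment plan. This is a minimal illustration only: the temperatures and the number of replicates are hypothetical choices, not values from these notes.

```python
from itertools import product

# Hypothetical design: these temperatures and the replicate count are
# illustrative assumptions, not values taken from the notes.
temperatures_c = [10, 20, 30, 40, 50]   # MORE values of the variable -> shows the trend
replicates_per_setting = 3              # MORE replicates -> each point more reliable

# Every run is one (temperature, replicate number) pair.
runs = list(product(temperatures_c, range(1, replicates_per_setting + 1)))

print(len(runs))   # 15 runs in total (5 settings x 3 replicates)
print(runs[:3])    # [(10, 1), (10, 2), (10, 3)]
```

Lengthening `temperatures_c` widens the range tested; raising `replicates_per_setting` adds repeats at each setting. The two changes answer different exam prompts.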

	A RELIABLE method	An UNRELIABLE method
Repeats	Each measurement is REPEATED several times (replicates)	Each value is measured only ONCE
Spread of repeats	Repeats are CLOSE together (small spread)	Repeats are WIDELY scattered (large spread)
What you do with them	Calculate a MEAN; spot and re-check anomalies	No mean possible; an anomaly is impossible to detect
If you repeated the whole experiment	You'd get a very SIMILAR result (repeatable)	You might get a very DIFFERENT result
Confidence in the result	HIGH — the value is trustworthy	LOW — the value could be a one-off fluke

Let's see, with real numbers, why repeats make a result more reliable. A student measures the rate of photosynthesis of pondweed at one light intensity by counting the bubbles of oxygen released per minute. They take five replicates.

The five replicate readings: Replicate 1: 31 · Replicate 2: 33 · Replicate 3: 30 · Replicate 4: 32 · Replicate 5: 64 bubbles per minute.

Four readings sit between 30 and 33; one (64) is far higher, so it is a likely anomaly (maybe some bubbles were counted twice, or trapped gas escaped in a sudden burst).

IB-style question — using the mean and the spread to handle an anomaly

(a) Calculate the mean if all five readings are kept. (b) Identify the anomaly and recalculate the mean without it. (c) State which mean is more reliable, and why. [3]

Worked solution (formula first, then numbers with units)

(a) The formula for the mean. $\bar{x} = \dfrac{\sum x}{n}$: add the readings and divide by how many there are ($n = 5$).
Substitute all five. $\bar{x} = \dfrac{31 + 33 + 30 + 32 + 64}{5} = \dfrac{190}{5} = 38$ bubbles per minute. Notice this mean (38) is higher than four of the five readings, a sign that one value is dragging it up.
(b) Identify the anomaly. The 64 lies far outside the cluster (30–33), so it is the anomaly. Recompute without it: $\bar{x} = \dfrac{31 + 33 + 30 + 32}{4} = \dfrac{126}{4} = 31.5$ bubbles per minute.
Measure the spread. Range $=$ max $-$ min. With the anomaly the spread is $64 - 30 = 34$; without it the spread is $33 - 30 = 3$. A spread of 3 is tight: the repeats agree.
(c) Which is more reliable? The mean of 31.5 is more reliable: removing the single anomalous reading leaves four repeats that agree closely (small spread), so the value would be reproduced on repeating. Keeping the 64 makes the mean unrepresentative.

Final answer

(a) Mean of all five = 190 ÷ 5 = 38 bubbles min⁻¹. (b) 64 is the anomaly; mean of the other four = 126 ÷ 4 = 31.5 bubbles min⁻¹. (c) 31.5 is more reliable, because the four remaining repeats are tightly clustered (range only 3), so the result is repeatable, whereas the single 64 distorts the mean.
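The arithmetic in this worked example can be checked with a short Python sketch. The readings come from the question; the helper name `spread` is an illustrative choice, and the 64 is dropped by inspection as in part (b), not by any formal outlier test.

```python
from statistics import mean

# Replicate readings from the worked example (bubbles per minute).
readings = [31, 33, 30, 32, 64]

def spread(values):
    """Range of a set of repeats: largest value minus smallest."""
    return max(values) - min(values)

# Exclude the reading identified as anomalous in part (b).
without_anomaly = [r for r in readings if r != 64]

print(mean(readings), spread(readings))                # 38 34
print(mean(without_anomaly), spread(without_anomaly))  # 31.5 3
```

Removing one reading moves the mean from 38 to 31.5 and shrinks the range from 34 to 3, which is the evidence used in part (c).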

The two reasons repeats make a result reliable:

1. They reveal anomalies. With one reading you can't tell a fluke from a true value; with several, an odd one stands out and can be re-checked or excluded.

2. They reduce random error. Averaging many repeats cancels out the small random ups-and-downs, so the mean is closer to the true value than any single reading.

And the spread of the repeats (range or standard deviation) is itself the evidence: small spread = reliable; large spread = you need more repeats before trusting the mean.
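As a rough illustration of spread measured by standard deviation, the sketch below reuses the five pondweed readings from the worked example. It assumes the sample standard deviation (dividing by n − 1), which is what `statistics.stdev` computes; the notes do not say which form is intended.

```python
from statistics import stdev

# Pondweed readings from the worked example (bubbles per minute).
all_readings = [31, 33, 30, 32, 64]
cluster = [31, 33, 30, 32]          # the same repeats without the anomalous 64

# Sample standard deviation (n - 1 in the denominator): an assumption of
# this sketch, since the notes do not specify which form to use.
sd_all = stdev(all_readings)
sd_cluster = stdev(cluster)

print(round(sd_all, 1), round(sd_cluster, 1))   # 14.6 1.3
```

The small value for the four clustered repeats matches the tight range of 3, while the large value with the 64 included signals that the set should not yet be trusted.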

How spread out the repeats are tells you how RELIABLE the method is. Each bar is the mean of several repeats; the cap shows the spread (± the range/standard deviation). 'Tight method' has small, tightly-clustered repeats (small spread = reliable); 'scattered method' has widely-scattered repeats (large spread = less reliable) even though the means are similar.

Reliable, valid and accurate are NOT the same: Examiners use these words precisely — answer the one they ask for:

Reliable = repeats of the same method agree (fix it with more replicates).

Valid = the method is a fair test of what it claims — only one variable changed, others controlled, a control present (fix it with a control / controlling variables).

Accurate = a reading is close to the true value (fix it with a better instrument).

Term	What it asks	How you improve it
Reliable / repeatable	Do REPEATS of the SAME method agree with each other?	More replicates; reduce random error
Valid	Does the method actually TEST what it claims (fair test, control, only one variable changed)?	Add a control; control all other variables; test the right range
Accurate	Is a measurement CLOSE to the true value?	Calibrate / use a better instrument; reduce systematic error

How this is tested: On Paper 1B (and threaded through the data question) this topic appears as short Suggest / Explain / Evaluate items based on a given method and its data:

• Suggest how to make it more reliable → 'repeat each measurement more times and take a mean' (reduces random error / reveals anomalies).

• Justify why replicates are needed → so you can calculate a mean, spot anomalies and check repeatability.

• Propose improvements → give a fix AND a matched reason (more replicates → more reliable; a control → shows the effect is due to your variable; a wider range → reveals the full trend).

• Evaluate whether the data support a claim → say what the data show, THEN weigh the support against limitations (small sample, big spread/overlap, no control, other variables not controlled). A balanced answer ('supports it, BUT…') scores best.

The scenario: A gardener claims a new plant feed makes basil grow taller. They grew 3 fed plants and 3 unfed plants for two weeks and measured final height (cm):

Fed: 18, 20, 19 (mean 19 cm)

Unfed: 14, 22, 12 (mean 16 cm)

The gardener concludes: 'The feed clearly makes basil grow taller.'

IB-style question — evaluate the claim that the feed makes basil grow taller

Evaluate the gardener's claim using the data, and suggest how the investigation could be improved. [4]

How to score all four marks

What the data show (support). The mean height of the fed plants (19 cm) is higher than that of the unfed plants (16 cm), so on average the fed plants were taller, which supports the claim.
Why it is weak (limitation 1 — spread/overlap). The unfed plants are widely spread (12 to 22 cm, range 10) and actually overlap the fed plants (the tallest unfed plant, 22 cm, beats every fed plant). A 3 cm difference in means is small compared with that spread, so the difference may just be random variation, not the feed.
Why it is weak (limitation 2 — sample size & repeats). Only 3 plants per group is a very small sample, so the means are unreliable; a single odd plant shifts the mean a lot. We can't yet call the result reliable.
Improvements (each with a matched reason). Use many more plants per group (a larger sample → more reliable mean); control other variables (same light, water, soil, pot size → a fair test, so any difference is due to the feed); ideally repeat the whole experiment. Only then could you fairly judge the claim, and a t-test could check whether the difference between the two means is statistically significant. (Award marks for: states that the data support the claim; spread/overlap point; small-sample point; a matched improvement.)
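To show what a t-test works from, here is a minimal sketch that computes a t statistic for the basil data. It assumes Welch's form of the test (unequal variances), which is one common choice and not necessarily the version your course uses; the function name `welch_t` is illustrative.

```python
from math import sqrt
from statistics import mean, variance

# Final heights from the scenario (cm).
fed = [18, 20, 19]
unfed = [14, 22, 12]

def welch_t(a, b):
    """t statistic: difference in means divided by its standard error.

    Uses Welch's form (separate sample variances), an assumption of this sketch.
    """
    standard_error = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / standard_error

t = welch_t(fed, unfed)
print(round(t, 2))   # 0.96
```

The difference in means is smaller than its own standard error, which reflects the wide spread of the unfed plants. Judging significance still needs a critical value from a t-table, which this sketch does not look up.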

Final answer

On average the fed plants were taller (mean 19 cm vs 16 cm), which supports the claim. BUT the unfed heights are widely spread (12–22 cm) and overlap the fed group, and there are only 3 plants per group, so the 3 cm difference could be random variation rather than the feed. Improve it by using many more plants per group (larger, more reliable sample) and controlling all other variables (light, water, soil) for a fair test, so any difference can be attributed to the feed.
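The figures quoted in this answer can be reproduced with a short sketch. The heights come from the scenario; the helper `summarise` and the overlap check are illustrative, not a prescribed method.

```python
from statistics import mean

# Final heights from the scenario (cm).
fed = [18, 20, 19]
unfed = [14, 22, 12]

def summarise(heights):
    """Mean, lowest, highest and range for one group of plants."""
    return {
        "mean": mean(heights),
        "min": min(heights),
        "max": max(heights),
        "range": max(heights) - min(heights),
    }

fed_s, unfed_s = summarise(fed), summarise(unfed)
difference = fed_s["mean"] - unfed_s["mean"]

# Two groups overlap if each group's lowest value is no higher than the
# other group's highest value.
overlap = fed_s["min"] <= unfed_s["max"] and unfed_s["min"] <= fed_s["max"]

print(fed_s["mean"], unfed_s["mean"], difference)   # 19 16 3
print(fed_s["range"], unfed_s["range"], overlap)    # 2 10 True
```

A 3 cm difference in means set against a 10 cm range in the unfed group, with overlapping values, is the numerical basis for the 'BUT' in the answer.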

✓ Why this scores full marks: It is balanced: it first says what the data show (the means support the claim), then weighs that against the limitations (overlap/spread and tiny sample), and finishes with matched improvements (more plants → reliable; control variables → valid).

A common way to lose marks is to answer 'yes, the feed works' from the means alone, ignoring the spread, the overlap and the sample size.
