5.1.2 · 4 min read

Reliability, replicates & evaluating method

IB Biology • Unit 5


Contents

  • Why one measurement is never enough
  • Worked example: how repeats reveal and reduce an anomaly
  • Exam-style question: evaluate the evidence
The big idea: No single measurement can be fully trusted — every measurement carries some random error, and any one reading could be an anomaly (a freak result).

So we repeat each measurement. Each repeat is called a replicate.

Repeats let us do three things: take a mean (which evens out random error), spot anomalies (an odd value that doesn't fit), and check the result is repeatable (we'd get a similar value if we did it again).

A result that we'd get again on repeating is called reliable.
Replicate: One repeat of a measurement made under the same conditions. Several replicates let you calculate a mean.
Reliable / repeatable: A result is reliable if repeating the same method gives a very similar result, so the repeats agree with each other.
Anomaly (anomalous result): A measurement that lies well outside the others. Repeats let you spot it; you usually re-check it or leave it out of the mean.
Random error: Small unpredictable variation that makes repeats differ. Taking a mean of many repeats reduces its effect.
Mean: The average of the replicates: add the values and divide by how many there are. The single best estimate from a set of repeats.
Replicates vs range of the variable: These are two different kinds of 'more':

More replicates = measure the SAME setting several times → makes each point more reliable.

More values of the variable = test MORE settings (e.g. more temperatures) → shows the trend across the range.

A good improvement answer often needs both.
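The difference between the two kinds of 'more' can be pictured as a small data layout. The sketch below is only an illustration: the temperatures and readings are hypothetical values invented for this example, not data from these notes.

```python
from statistics import mean

# Hypothetical readings, made up for illustration only.
# Keys   = values of the variable (temperature in degrees C): more keys = wider range tested.
# Values = replicates at that setting: longer lists = more replicates per point.
readings = {
    20: [12, 13, 11],
    30: [21, 20, 22],
    40: [15, 16, 14],
}

settings_tested = len(readings)  # how many values of the variable were tested
replicates_per_setting = {t: len(r) for t, r in readings.items()}
means = {t: mean(r) for t, r in readings.items()}  # one mean per setting

print("settings tested:", settings_tested)
print("replicates per setting:", replicates_per_setting)
print("mean at each setting:", means)
```

Adding a fourth number to one of the lists is 'more replicates'; adding a new key such as 50 is 'more values of the variable'.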
| | A RELIABLE method | An UNRELIABLE method |
|---|---|---|
| Repeats | Each measurement is REPEATED several times (replicates) | Each value is measured only ONCE |
| Spread of repeats | Repeats are CLOSE together (small spread) | Repeats are WIDELY scattered (large spread) |
| What you do with them | Calculate a MEAN; spot and re-check anomalies | No mean possible; an anomaly is impossible to detect |
| If you repeated the whole experiment | You'd get a very SIMILAR result (repeatable) | You might get a very DIFFERENT result |
| Confidence in the result | HIGH: the value is trustworthy | LOW: the value could be a one-off fluke |

Let's see, with real numbers, why repeats make a result more reliable. A student measures the rate of photosynthesis of pondweed at one light intensity by counting the bubbles of oxygen released per minute. They take five replicates.

The five replicate readings: Replicate 1: 31 · Replicate 2: 33 · Replicate 3: 30 · Replicate 4: 32 · Replicate 5: 64 bubbles per minute.

Four readings sit between 30 and 33; one (64) is far higher, so it is a likely anomaly (maybe one bubble was counted as two, or a bubble stuck and released late).

IB-style question — using the mean and the spread to handle an anomaly

(a) Calculate the mean if all five readings are kept. (b) Identify the anomaly and recalculate the mean without it. (c) State which mean is more reliable, and why. [3]

Worked solution (formula first, then numbers with units)

  1. (a) The formula for the mean. $\bar{x} = \dfrac{\sum x}{n}$: add the readings and divide by how many there are ($n$).
  2. Substitute all five. $\bar{x} = \dfrac{31 + 33 + 30 + 32 + 64}{5} = \dfrac{190}{5} = 38$ bubbles per minute. Notice this mean (38) is higher than four of the five readings, a sign one value is dragging it up.
  3. (b) Identify the anomaly. The 64 lies far outside the cluster (30–33), so it is the anomaly. Recompute without it: $\bar{x} = \dfrac{31 + 33 + 30 + 32}{4} = \dfrac{126}{4} = 31.5$ bubbles per minute.
  4. Measure the spread. Range $= \text{max} - \text{min}$. With the anomaly the spread is $64 - 30 = 34$; without it the spread is $33 - 30 = 3$. A spread of 3 is tight: the repeats agree.
  5. (c) Which is more reliable? The mean of 31.5 is more reliable: removing the single anomalous reading leaves four repeats that agree closely (small spread), so the value would be reproduced on repeating. Keeping the 64 makes the mean unrepresentative.

Final answer

(a) Mean of all five = 190 ÷ 5 = 38 bubbles min⁻¹. (b) 64 is the anomaly; mean of the other four = 126 ÷ 4 = 31.5 bubbles min⁻¹. (c) 31.5 is more reliable, because the four remaining repeats are tightly clustered (range only 3), so the result is repeatable, whereas the single 64 distorts the mean.
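The arithmetic in this worked example can be checked with a few lines of Python. This is a minimal sketch using the five readings given above. The standard deviation line is an extra: it uses the sample (n − 1) formula, which is an assumption, since the question only asks for the mean and range.

```python
from statistics import mean, stdev

readings = [31, 33, 30, 32, 64]  # bubbles per minute, from the worked example


def spread(values):
    """Range = max - min."""
    return max(values) - min(values)


mean_all = mean(readings)                  # (a) 190 / 5
kept = [r for r in readings if r != 64]    # (b) drop the anomaly identified by eye
mean_kept = mean(kept)                     # 126 / 4

print(mean_all, spread(readings))   # 38 34
print(mean_kept, spread(kept))      # 31.5 3
# Sample standard deviation with and without the anomaly
print(round(stdev(readings), 1), round(stdev(kept), 1))  # 14.6 1.3
```

Note that the code does not decide what counts as an anomaly; the 64 is removed because it was judged anomalous by inspection, as in step (b).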

The two reasons repeats make a result reliable: 1. They reveal anomalies. With one reading you can't tell a fluke from a true value; with several, an odd one stands out and can be re-checked or excluded.

2. They reduce random error. Averaging many repeats cancels out the small random ups-and-downs, so the mean is closer to the true value than any single reading.

And the spread of the repeats (range or standard deviation) is itself the evidence: small spread = reliable; large spread = you need more repeats before trusting the mean.
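The claim that averaging reduces random error can be explored with a small simulation. The sketch below is illustrative only: the 'true' rate of 31.5 is borrowed from the worked example, and the size of the random error (1.5) and the bell-shaped error model are assumptions made for the demonstration.

```python
import random
from statistics import mean, stdev

rng = random.Random(0)   # fixed seed so the run is repeatable
TRUE_VALUE = 31.5        # assumed "true" rate (borrowed from the worked example)
NOISE_SD = 1.5           # assumed size of the random error (hypothetical)


def mean_of_repeats(n):
    """Simulate n replicates with random error and return their mean."""
    return mean([rng.gauss(TRUE_VALUE, NOISE_SD) for _ in range(n)])


def scatter_of_means(n, trials=2000):
    """How much the mean of n repeats varies from one experiment to the next."""
    return stdev([mean_of_repeats(n) for _ in range(trials)])


for n in (1, 5, 25):
    print(n, "repeats -> scatter of the mean:", round(scatter_of_means(n), 2))
```

The scatter of the mean should shrink as the number of repeats goes up, which is what 'the mean is closer to the true value than any single reading' looks like in numbers.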

How spread out the repeats are tells you how RELIABLE the method is. Each bar is the mean of several repeats; the cap shows the spread (± the range/standard deviation). 'Tight method' has small, tightly-clustered repeats (small spread = reliable); 'scattered method' has widely-scattered repeats (large spread = less reliable) even though the means are similar.

Reliable, valid and accurate are NOT the same: Examiners use these words precisely — answer the one they ask for:

Reliable = repeats of the same method agree (fix it with more replicates).

Valid = the method is a fair test of what it claims — only one variable changed, others controlled, a control present (fix it with a control / controlling variables).

Accurate = a reading is close to the true value (fix it with a better instrument).
| Term | What it asks | How you improve it |
|---|---|---|
| Reliable / repeatable | Do REPEATS of the SAME method agree with each other? | More replicates; reduce random error |
| Valid | Does the method actually TEST what it claims (fair test, control, only one variable changed)? | Add a control; control all other variables; test the right range |
| Accurate | Is a measurement CLOSE to the true value? | Calibrate / use a better instrument; reduce systematic error |
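Reliable and accurate can be told apart with two simple numbers: the spread of the repeats (reliability) and the gap between the mean and the true value (accuracy). The sketch below uses hypothetical readings and an assumed true value of 50 units, both invented for illustration. Validity is not covered, because it depends on the design of the method rather than on the numbers.

```python
from statistics import mean

TRUE_VALUE = 50  # assumed true value (hypothetical)

# Hypothetical repeats from two instruments, made up for illustration only.
instrument_a = [54, 55, 56]   # repeats agree closely, but all read too high
instrument_b = [40, 50, 60]   # repeats centre on the true value, but scatter widely


def describe(repeats):
    """Return (spread of the repeats, offset of the mean from the true value)."""
    spread = max(repeats) - min(repeats)   # small spread = reliable
    offset = mean(repeats) - TRUE_VALUE    # small offset = accurate
    return spread, offset


print("A:", describe(instrument_a))   # A: (2, 5)
print("B:", describe(instrument_b))   # B: (20, 0)
```

Instrument A is reliable but not accurate (a systematic error, fixed by calibration); instrument B gives an accurate mean but is not reliable (random error, reduced by more replicates).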

How this is tested: On Paper 1B (and threaded through the data question) this subtopic appears as short Suggest / Explain / Evaluate items based on a given method and data set:

• Suggest how to make it more reliable → 'repeat each measurement more times and take a mean' (reduces random error / reveals anomalies).

• Justify why replicates are needed → so you can calculate a mean, spot anomalies and check repeatability.

• Propose improvements → give a fix AND a matched reason (more replicates → more reliable; a control → shows the effect is due to your variable; a wider range → reveals the full trend).

• Evaluate whether the data support a claim → say what the data show, THEN weigh the support against limitations (small sample, big spread/overlap, no control, only one factor changed). A balanced answer ('supports it, BUT…') scores best.
The scenario: A gardener claims a new plant feed makes basil grow taller. They grew 3 fed plants and 3 unfed plants for two weeks and measured final height (cm):

Fed: 18, 20, 19 (mean 19 cm)

Unfed: 14, 22, 12 (mean 16 cm)

The gardener concludes: 'The feed clearly makes basil grow taller.'

IB-style question — evaluate the claim that the feed makes basil grow taller

Evaluate the gardener's claim using the data, and suggest how the investigation could be improved. [4]

How to score all four marks

  1. What the data show (support). The mean height of the fed plants (19 cm) is higher than that of the unfed plants (16 cm), so on average the fed plants were taller, which supports the claim.
  2. Why it is weak (limitation 1 — spread/overlap). The unfed plants are widely spread (12 to 22 cm, range 10) and actually overlap the fed plants (the tallest unfed plant, 22 cm, beats every fed plant). A 3 cm difference in means is small compared with that spread, so the difference may just be random variation, not the feed.
  3. Why it is weak (limitation 2 — sample size & repeats). Only 3 plants per group is a very small sample, so the means are unreliable; a single odd plant shifts the mean a lot. We can't yet call the result reliable.
  4. Improvements (each with a matched reason). Use many more plants per group (a larger sample → more reliable mean); control other variables (same light, water, soil, pot size → a fair test, so any difference is due to the feed); ideally repeat the whole experiment. Only then could you fairly judge the claim. A t-test could then check whether the difference between the two means is statistically significant (a chi-squared test is for counts in categories, so it does not suit height measurements). (Award marks for: states data supports it; spread/overlap point; small-sample point; a matched improvement.)
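As a rough sketch of what a t-test measures, the code below computes a t statistic for the basil data: the difference in means divided by the standard error of that difference. The unequal-variance (Welch) form is an assumption chosen here because the two groups have very different spreads; the notes do not say which form to use.

```python
from math import sqrt
from statistics import mean, variance

fed = [18, 20, 19]     # final heights in cm
unfed = [14, 22, 12]


def welch_t(a, b):
    """t = difference in means / standard error of that difference."""
    standard_error = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / standard_error


t = welch_t(fed, unfed)
print(round(t, 2))   # 0.96
```

A t value below 1 means the 3 cm difference in means is smaller than its own standard error, which fits the point that the difference is small compared with the spread. Turning t into a p-value needs t-distribution tables or a statistics library, which this sketch leaves out.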

Final answer

On average the fed plants were taller (mean 19 cm vs 16 cm), which supports the claim. BUT the unfed heights are widely spread (12–22 cm) and overlap the fed group, and there are only 3 plants per group, so the 3 cm difference could be random variation rather than the feed. Improve it by using many more plants per group (larger, more reliable sample) and controlling all other variables (light, water, soil) for a fair test, so any difference can be attributed to the feed.
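The numbers behind this answer can be checked with a short script. This is a minimal sketch using the six heights from the scenario; the function name and the overlap rule (two ranges overlap when each group's tallest plant is at least as tall as the other group's shortest) are illustrative choices, not part of the mark scheme.

```python
from statistics import mean

fed = [18, 20, 19]     # final heights in cm
unfed = [14, 22, 12]


def summarise(heights):
    """Mean, lowest, highest and range of one group."""
    return {
        "mean": mean(heights),
        "min": min(heights),
        "max": max(heights),
        "range": max(heights) - min(heights),
    }


fed_s, unfed_s = summarise(fed), summarise(unfed)
difference = fed_s["mean"] - unfed_s["mean"]

# Two ranges overlap when each group's maximum reaches the other group's minimum.
overlap = unfed_s["max"] >= fed_s["min"] and fed_s["max"] >= unfed_s["min"]

print(fed_s)
print(unfed_s)
print(difference, overlap)   # 3 True
```

The unfed range (10 cm) is more than three times the difference in means (3 cm), which is the quantitative core of the 'supports it, BUT' argument.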

✓ Why this scores full marks: It is balanced: it first says what the data show (the means support the claim), then weighs that against the limitations (overlap/spread and tiny sample), and finishes with matched improvements (more plants → reliable; control variables → valid).

A common way to lose marks is to answer 'yes, the feed works' from the means alone, ignoring the spread, the overlap and the sample size.

Try an IB Exam Question

Two researchers compared the growth of bacteria on plates of very different starting sizes.

Instead of reporting the absolute increase in colony number, they reported the percentage increase relative to each plate's starting count.

Explain why expressing the results as a relative (percentage) change, rather than an absolute change, allows a fairer comparison between the plates.
[2 marks]
