The big idea: Two groups in an experiment will almost never give exactly the same numbers — there is always some random variation. So when you see a difference, you have to ask:
Is this a REAL difference, or could it just be chance?
A statistical test answers that. It gives a number you compare to a critical value. From that you decide whether the result is statistically significant — unlikely to be just chance.
- Statistically significant: a result so unlikely to have happened by chance alone that we accept it is a real effect. By convention we use a 5% cut-off (p = 0.05).
- Null hypothesis (H₀): the 'no effect' starting assumption. There is NO real difference between the groups (or NO association); any difference is just chance.
- Alternative hypothesis (H₁): there IS a real difference between the groups (or a real association).
- p-value: the probability that a difference at least this big would happen by chance if H₀ were true. Small p = unlikely to be chance.
- Critical value: a threshold (read from a table at p = 0.05 and the right degrees of freedom) that the calculated statistic must reach to count as significant. These values are GIVEN to you on the paper.
- Degrees of freedom (df): a number, set by how many categories or measurements you have, that tells you which row of the critical-value table to read.
The p = 0.05 rule (this is the whole decision): We compare the calculated statistic to the critical value at p = 0.05:
If the calculated value is BIGGER than (or equal to) the critical value → p < 0.05 → the result is significant → reject H₀ (the difference is real).
If the calculated value is SMALLER than the critical value → p > 0.05 → the result is not significant → do not reject H₀ (the difference could be chance).
Memory hook: bigger calculated → significant.
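The decision rule is simple enough to write as a one-line comparison. The sketch below is only an illustration: the function name `is_significant` is made up for this example, and taking the absolute value is an added assumption so that a negative t (which depends on which mean is subtracted from which) is judged by its size.

```python
def is_significant(calculated: float, critical: float) -> bool:
    """Return True when the calculated statistic reaches the critical value."""
    # Assumption: compare the size of the statistic, so a negative t still works.
    return abs(calculated) >= critical


# The two worked examples in these notes:
print(is_significant(4.8, 3.84))   # chi-squared example: True  (reject H0)
print(is_significant(1.78, 2.04))  # t-test example:      False (do not reject H0)
```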
Which test? Match the data type: Chi-squared (χ²) is for counts — does a set of categories fit an expected ratio, or is there an association between two categorical variables?
t-test is for two means of a measured variable — are two averages significantly different?
Ask first: am I comparing counts (χ²) or averages (t-test)?
| | Chi-squared (χ²) test | t-test |
|---|---|---|
| Use it to compare… | OBSERVED counts/frequencies against EXPECTED counts (e.g. a genetic ratio, or an association in a contingency table) | TWO MEANS of a measured (continuous) variable from two groups |
| Typical biology example | Do the offspring fit an expected 3:1 monohybrid ratio? Is flower colour associated with habitat? | Is mean leaf length different between shaded and sunlit plants? |
| Data type | Counts / categories | Measurements (numbers you can average) |
| What you compare | Calculated χ² vs the critical value | Calculated t vs the critical value |
| Significant when | calculated χ² ≥ critical value | calculated t ≥ critical value |
Both tests follow the same five steps — state the hypotheses, work out what is expected, calculate the statistic, look up the critical value, then compare. The maths is given to you; the marks are for using it correctly and drawing the right conclusion.
The 5 steps (the same shape for both tests)
- State the hypotheses. H₀ (null): there is no real difference/association (any difference is just chance). H₁ (alternative): there is a real difference/association.
- Work out what is expected under H₀. For χ²: the expected counts from the ratio. For a t-test: you compare the two sample means directly.
- Calculate the test statistic (χ² or t) from your data using the formula.
- Find the degrees of freedom (df) and read the critical value at p = 0.05 from the table (this table is GIVEN to you).
- Compare and conclude. If the calculated value ≥ the critical value, the result is significant (p < 0.05) → reject H₀. If it is smaller, the result is not significant (p > 0.05) → do not reject H₀.
Chi-squared formula: $\chi^2 = \sum \frac{(O - E)^2}{E}$
where O = the observed count, E = the expected count, and $\sum$ means you add up the term for every category. The expected counts come from the ratio you are testing.
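As a rough sketch of that sum, the calculation is one term per category, added together. The function name `chi_squared` and the counts used here are hypothetical, chosen only to make the arithmetic easy to follow; they are not from any question in these notes.

```python
def chi_squared(observed, expected):
    """Add up (O - E)^2 / E over every category."""
    if len(observed) != len(expected):
        raise ValueError("need one expected count for each observed count")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts, for illustration only.
observed = [60, 40]
expected = [50, 50]
print(chi_squared(observed, expected))  # 100/50 + 100/50 = 4.0
```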
Worked example — does the cross fit a 3 : 1 ratio?
Solution
- State the hypotheses. H₀: the offspring fit a 3 : 1 ratio (any difference is chance). H₁: they do not fit 3 : 1.
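- Expected counts (E) from the 3 : 1 ratio out of 160: $E = \tfrac{3}{4} \times 160 = 120$ and $E = \tfrac{1}{4} \times 160 = 40$.
- Calculate χ², one term per category, then add: $\chi^2 = \frac{(O_1 - 120)^2}{120} + \frac{(O_2 - 40)^2}{40}$, where $O_1$ and $O_2$ are the two observed counts.
- Work each term out and add them: χ² = 4.8.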
- Degrees of freedom = (number of categories − 1) = 2 − 1 = 1. From the table, the critical χ² at df = 1, p = 0.05 is 3.84.
- Compare and conclude. Calculated χ² = 4.8 is bigger than the critical 3.84, so p < 0.05. The result is significant → reject H₀: the offspring do not fit a 3 : 1 ratio.
Final answer
χ² = 4.8 (df = 1). This is greater than the critical value 3.84 at p = 0.05, so the difference is significant and we reject H₀ — the offspring do not fit a 3 : 1 ratio.
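Steps 2 and 4 of this example (expected counts and degrees of freedom) can be checked with a short sketch. The helper names `expected_counts` and `degrees_of_freedom` are invented for illustration; the 3 : 1 ratio and the total of 160 come from the example above.

```python
def expected_counts(ratio, total):
    """Share a total out in the given ratio, e.g. 3 : 1."""
    parts = sum(ratio)
    return [total * r / parts for r in ratio]


def degrees_of_freedom(n_categories):
    """For a ratio (goodness-of-fit) test: number of categories minus 1."""
    return n_categories - 1


expected = expected_counts([3, 1], 160)
print(expected)                           # [120.0, 40.0]
print(degrees_of_freedom(len(expected)))  # 1
```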
| Degrees of freedom (df) | Critical χ² at p = 0.05 |
|---|---|
| 1 | 3.84 |
| 2 | 5.99 |
| 3 | 7.81 |
| 4 | 9.49 |
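Reading the table and drawing the conclusion can also be sketched in code. The dictionary below simply copies the four rows of the table above; the names `CRITICAL_CHI2_P05` and `conclude` are hypothetical, and any df outside the table raises a `KeyError` rather than guessing a value.

```python
# Critical chi-squared values at p = 0.05, copied from the table above.
CRITICAL_CHI2_P05 = {1: 3.84, 2: 5.99, 3: 7.81, 4: 9.49}


def conclude(chi2, df):
    """Compare a calculated chi-squared with the tabulated critical value."""
    critical = CRITICAL_CHI2_P05[df]  # KeyError if df is not in the table
    if chi2 >= critical:
        return f"chi2 = {chi2} >= {critical}: significant, reject H0"
    return f"chi2 = {chi2} < {critical}: not significant, do not reject H0"


print(conclude(4.8, 1))   # the 3 : 1 worked example above
print(conclude(0.18, 1))  # a small chi-squared value at the same df
```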
t-test, comparing two means: A t-test asks whether two sample means ($\bar{x}_1$ and $\bar{x}_2$) are significantly different. You calculate a value of t from the two means, their spreads (standard deviations) and the sample sizes. The formula is given to you, so the marks are for the decision, not for memorising it:
If calculated $t \ge$ critical $t$ (at p = 0.05) → the means ARE significantly different.
If calculated $t <$ critical $t$ → they are NOT significantly different.
Degrees of freedom for two groups of size $n_1$ and $n_2$: $df = n_1 + n_2 - 2$.
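For anyone curious how t comes out of the means, spreads and sample sizes, here is a minimal sketch. It assumes the pooled-variance (Student's) form of the t-test, which is the version that pairs with df = total number of measurements minus 2; the notes say only that the formula is given, so the formula on your paper may be laid out differently. The function name `pooled_t` and the leaf lengths are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(group_a, group_b):
    """Return (size of t, df), assuming the pooled-variance t-test."""
    n1, n2 = len(group_a), len(group_b)
    df = n1 + n2 - 2
    # Pool the two sample variances, weighting each by its own df.
    pooled_var = ((n1 - 1) * variance(group_a) + (n2 - 1) * variance(group_b)) / df
    standard_error = sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (mean(group_a) - mean(group_b)) / standard_error
    return abs(t), df


# Hypothetical leaf lengths in cm, for illustration only.
sunlit = [5.1, 5.4, 4.9, 5.6, 5.0]
shaded = [5.8, 6.1, 5.5, 6.0, 5.9]
t, df = pooled_t(sunlit, shaded)
print(f"t = {t:.2f}, df = {df}")
```

The calculated t would then be compared with the critical t for that df, exactly as in the two rules above.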
Worked example — are the two means significantly different?
Solution
- Hypotheses. H₀: there is no difference in mean leaf length between sunlit and shaded plants. H₁: there is a difference.
- Degrees of freedom: df = (total number of measurements in both groups − 2) = 30.
- Critical value. From the table, the critical t at df = 30, p = 0.05 is 2.04.
- Compare and conclude. The calculated t = 1.78 is smaller than the critical 2.04, so p > 0.05. The difference is not significant → do not reject H₀: there is no evidence that mean leaf length differs.
Final answer
Calculated t = 1.78 < critical t = 2.04 (df = 30, p = 0.05), so the difference is not significant — we do not reject H₀.
| Degrees of freedom (df) | Critical t at p = 0.05 (two-tailed) |
|---|---|
| 5 | 2.57 |
| 10 | 2.23 |
| 20 | 2.09 |
| 30 | 2.04 |
The error-bar shortcut for a t-test: On a graph of two means with error bars you can often judge significance by eye:
Error bars OVERLAP → the means are probably NOT significantly different (a t-test would give t below the critical value).
Error bars clearly DON'T overlap → the difference may well be significant.
This is exactly the reasoning examiners want when they show error bars on a chart.
Two means with overlapping error bars: because the bars overlap, the difference between the means is probably NOT statistically significant; a t-test would be expected to give t below the critical value (p > 0.05).
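The by-eye check can be written as a comparison of two ranges. This is only a sketch: the function name `error_bars_overlap` and the numbers are hypothetical, and the notes do not say what the error bars represent (standard deviation, standard error or a confidence interval), so treat the result as a rough guide rather than a replacement for the t-test.

```python
def error_bars_overlap(mean_a, err_a, mean_b, err_b):
    """Return True if the ranges mean_a ± err_a and mean_b ± err_b overlap."""
    low_a, high_a = mean_a - err_a, mean_a + err_a
    low_b, high_b = mean_b - err_b, mean_b + err_b
    return low_a <= high_b and low_b <= high_a


# Hypothetical means and error-bar lengths, for illustration only.
print(error_bars_overlap(5.2, 0.4, 5.7, 0.3))  # 4.8-5.6 vs 5.4-6.0 -> True
print(error_bars_overlap(5.2, 0.1, 5.7, 0.1))  # 5.1-5.3 vs 5.6-5.8 -> False
```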
How this is tested: On Paper 1B and in the IA you are GIVEN the formulae and the critical-value table — the marks are for running the test correctly and stating the right conclusion, not for memorising maths.
Common tasks: justify, using the data, that a difference is non-significant (calculated value below the critical value / p > 0.05 / error bars overlap); explain how to collect quantitative data so a treatment can be tested statistically (a numeric measure, replicates, treated vs control, then a t-test on the two means); and draw conclusions from a graph that shows p-values, stating an effect only where it is marked significant.
IB-style question: test observed counts against an expected 3 : 1 ratio
How to score all four marks
- State H₀. H₀: the offspring fit a 3 : 1 grey : ebony ratio; any difference is due to chance. (H₁: they do not fit 3 : 1.)
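- Expected counts from 3 : 1 out of 120: grey = ¾ × 120 = 90, ebony = ¼ × 120 = 30.
- Calculate χ² using $\chi^2 = \sum \frac{(O - E)^2}{E}$: χ² = 4/90 + 4/30 ≈ 0.044 + 0.133 ≈ 0.18.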
- Conclusion. df = 2 − 1 = 1, so the critical value is 3.84. Calculated χ² = 0.18 is smaller than 3.84, so p > 0.05: the result is not significant → do not reject H₀. The offspring do fit a 3 : 1 ratio. (Mark 1: H₀ stated. Mark 2: correct expected counts. Mark 3: χ² ≈ 0.18 correctly calculated. Mark 4: compares to 3.84 and concludes 'not significant / fits 3 : 1'.)
Final answer
H₀: offspring fit 3 : 1. Expected = 90 grey, 30 ebony. χ² = 4/90 + 4/30 = 0.18. df = 1, critical = 3.84. Since 0.18 < 3.84, p > 0.05 — not significant, do not reject H₀: the data fit a 3 : 1 ratio.
✓ Why this scores full marks: It states H₀, works the expected counts from the ratio, plugs into $\chi^2 = \sum \frac{(O - E)^2}{E}$ to get a correct value, and, crucially, compares to the critical value and gives a biological conclusion ('fits 3 : 1 / not significant').
The classic way to lose the last mark is to calculate χ² but forget to compare it to 3.84, or to get the direction backwards (remember: calculated bigger = significant).
Justifying a NON-significant result (a 2-mark favourite): If asked to justify that a difference is not significant, quote the numbers:
'The calculated statistic is below the critical value at p = 0.05 (so p > 0.05), therefore the difference is not statistically significant and we do not reject H₀.'
On a chart you can add: 'the error bars overlap, which agrees that the means are not significantly different.'