IB Biology HL - Statistical significance: chi-squared & t-test | Free Notes

The big idea: Two groups in an experiment will almost never give exactly the same numbers — there is always some random variation. So when you see a difference, you have to ask:

Is this a REAL difference, or could it just be chance?

A statistical test answers that. It gives a number you compare to a critical value. From that you decide whether the result is statistically significant — unlikely to be just chance.

Statistically significant: A result so unlikely to have happened by chance alone that we accept it is a real effect. By convention we use a 5% cut-off (p = 0.05).
Null hypothesis (H₀): The 'no effect' starting assumption: there is NO real difference between the groups (or NO association) — any difference is just chance.
Alternative hypothesis (H₁): There IS a real difference between the groups (or a real association).
p-value: The probability of getting a difference at least this big by chance alone if H₀ were true. Small p = unlikely to be chance.
Critical value: A threshold (read from a table at p = 0.05 and the right degrees of freedom) that the calculated statistic must reach to count as significant. These values are GIVEN to you on the paper.
Degrees of freedom (df): A number, set by how many categories or measurements you have, that tells you which row of the critical-value table to read.

The p = 0.05 rule (this is the whole decision): We compare the calculated statistic to the critical value at p = 0.05:

If the calculated value is BIGGER than (or equal to) the critical value → p < 0.05 → the result is significant → reject H₀ (the difference is real).

If the calculated value is SMALLER than the critical value → p > 0.05 → the result is not significant → do not reject H₀ (the difference could be chance).

Memory hook: bigger calculated → significant.
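The decision rule can be sketched in a few lines of Python. This is only an illustration: the function name `decide` and the wording of the returned strings are assumptions, and the numbers are the calculated and critical values quoted in the worked examples of these notes.

```python
def decide(calculated: float, critical: float) -> str:
    """Apply the p = 0.05 rule: compare the calculated statistic to the critical value."""
    if calculated >= critical:
        # Calculated value reaches the threshold, so the result counts as significant.
        return "significant: reject H0"
    # Calculated value falls short of the threshold.
    return "not significant: do not reject H0"


print(decide(4.8, 3.84))   # chi-squared worked example
print(decide(1.78, 2.04))  # t-test worked example
```

The comparison is the same for both tests; only the statistic and the table it is read against change.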

Which test? Match the data type: Chi-squared (χ²) is for counts — does a set of categories fit an expected ratio, or is there an association between two categorical variables?

t-test is for two means of a measured variable — are two averages significantly different?

Ask first: am I comparing counts (χ²) or averages (t-test)?

	Chi-squared (χ²) test	t-test
Use it to compare…	OBSERVED counts/frequencies against EXPECTED counts (e.g. a genetic ratio, or an association in a contingency table)	TWO MEANS of a measured (continuous) variable from two groups
Typical biology example	Do the offspring fit an expected 3:1 monohybrid ratio? Is flower colour associated with habitat?	Is mean leaf length different between shaded and sunlit plants?
Data type	Counts / categories	Measurements (numbers you can average)
What you compare	Calculated χ² vs the critical value	Calculated t vs the critical value
Significant when	calculated χ² ≥ critical value	calculated t ≥ critical value

Both tests follow the same five steps — state the hypotheses, work out what is expected, calculate the statistic, look up the critical value, then compare. The maths is given to you; the marks are for using it correctly and drawing the right conclusion.

The 5 steps (the same shape for both tests)

State the hypotheses. H₀ (null): there is no real difference/association (any difference is just chance). H₁ (alternative): there is a real difference/association.
Work out what is expected under H₀. For χ²: the expected counts from the ratio. For a t-test: you compare the two sample means directly.
Calculate the test statistic (χ² or t) from your data using the formula.
Find the degrees of freedom (df) and read the critical value at p = 0.05 from the table (this table is GIVEN to you).
Compare and conclude. If the calculated value ≥ the critical value, the result is significant (p < 0.05) → reject H₀. If it is smaller, the result is not significant (p > 0.05) → do not reject H₀.

Chi-squared formula:

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

where O = the observed count, E = the expected count, and $\sum$ means you add up the term for every category. The expected counts come from the ratio you are testing.
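As a minimal sketch, the sum of (O − E)² / E over every category could be written like this in Python. The function name `chi_squared` and the counts are hypothetical, chosen only so the arithmetic is easy to follow by hand.

```python
def chi_squared(observed, expected):
    """Add up (O - E)^2 / E, one term per category."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected need one value per category")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts for two categories (not taken from any exam question).
observed = [55, 45]
expected = [50, 50]
print(chi_squared(observed, expected))  # 25/50 + 25/50 = 1.0
```

Each category contributes its own term, so a category that is far from its expected count, or has a small expected count, adds the most to χ².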

Worked example — does the cross fit a 3 : 1 ratio?

Solution

State the hypotheses. H₀: the offspring fit a 3 : 1 ratio (any difference is chance). H₁: they do not fit 3 : 1.
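Expected counts (E) from the 3 : 1 ratio out of 160: $E_1 = \tfrac{3}{4} \times 160 = 120$ and $E_2 = \tfrac{1}{4} \times 160 = 40$.
Calculate χ², one term per category, then add: $\chi^2 = \frac{(O_1 - E_1)^2}{E_1} + \frac{(O_2 - E_2)^2}{E_2}$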
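Work each term out. The stated total of 4.8 corresponds to each observed count differing from its expected count by 12: $\chi^2 = \frac{144}{120} + \frac{144}{40} = 1.2 + 3.6 = 4.8$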
Degrees of freedom = (number of categories − 1) = 2 − 1 = 1. From the table, the critical χ² at df = 1, p = 0.05 is 3.84.
Compare and conclude. Calculated χ² = 4.8 is bigger than the critical 3.84, so p < 0.05. The result is significant → reject H₀: the offspring do not fit a 3 : 1 ratio.

Final answer

χ² = 4.8 (df = 1). This is greater than the critical value 3.84 at p = 0.05, so the difference is significant and we reject H₀ — the offspring do not fit a 3 : 1 ratio.

Degrees of freedom (df)	Critical χ² at p = 0.05
1	3.84
2	5.99
3	7.81
4	9.49
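The expected-count and look-up steps of the worked example can be sketched as follows. The helper names `expected_counts` and `chi2_critical` are hypothetical; the dictionary simply copies the four rows of the table above, and the rule df = categories − 1 is the one used here for testing counts against a ratio.

```python
# Critical chi-squared values at p = 0.05, copied from the table above.
CRITICAL_CHI2_P05 = {1: 3.84, 2: 5.99, 3: 7.81, 4: 9.49}


def expected_counts(total, ratio):
    """Split a total into expected counts according to a ratio such as (3, 1)."""
    parts = sum(ratio)
    return [total * r / parts for r in ratio]


def chi2_critical(n_categories):
    """Return (df, critical value) when testing counts against an expected ratio."""
    df = n_categories - 1
    return df, CRITICAL_CHI2_P05[df]  # KeyError if df is not in this short table


expected = expected_counts(160, (3, 1))
df, critical = chi2_critical(len(expected))
print(expected, df, critical)  # [120.0, 40.0] 1 3.84
```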

t-test for comparing two means: A t-test asks whether two sample means ($\bar{x}_1$ and $\bar{x}_2$) are significantly different. You calculate a value of t from the two means, their spreads (standard deviations) and the sample sizes. The formula is given to you, so the marks are for the decision, not for memorising it:

If calculated $t \ge$ critical $t$ (at p = 0.05) → the means ARE significantly different.

If calculated $t <$ critical $t$ → they are NOT significantly different.

Degrees of freedom for two groups of size $n_1$ and $n_2$: $df = n_1 + n_2 - 2$.
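These notes do not reproduce the t formula itself, so the sketch below is an assumption: it uses the equal-variance (pooled) form of the two-sample t statistic, which is the version that goes with df = n₁ + n₂ − 2. The function name `pooled_t` and the leaf lengths are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(sample_1, sample_2):
    """Return (|t|, df) for two independent samples, assuming equal variances."""
    n1, n2 = len(sample_1), len(sample_2)
    df = n1 + n2 - 2
    # Pool the two sample variances, weighting each by its own degrees of freedom.
    pooled_var = ((n1 - 1) * variance(sample_1) + (n2 - 1) * variance(sample_2)) / df
    standard_error = sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (mean(sample_1) - mean(sample_2)) / standard_error
    return abs(t), df  # the sign only says which mean is larger


# Hypothetical leaf lengths in cm (illustration only).
sunlit = [1, 2, 3, 4, 5]
shaded = [2, 3, 4, 5, 6]
t, df = pooled_t(sunlit, shaded)
print(round(t, 2), df)
```

With these made-up numbers the means differ by 1 and the standard error is about 1, so t comes out near 1 with df = 8, well below any critical value in a p = 0.05 table.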

Worked example — are the two means significantly different?

Solution

Hypotheses. H₀: there is no difference in mean leaf length between sunlit and shaded plants. H₁: there is a difference.
Degrees of freedom: $df = n_1 + n_2 - 2 = 30$.
Critical value. From the table, the critical t at df = 30, p = 0.05 is 2.04.
Compare and conclude. The calculated t = 1.78 is smaller than the critical 2.04, so p > 0.05. The difference is not significant → do not reject H₀: there is no evidence that mean leaf length differs.

Final answer

Calculated t = 1.78 < critical t = 2.04 (df = 30, p = 0.05), so the difference is not significant — we do not reject H₀.

Degrees of freedom (df)	Critical t at p = 0.05 (two-tailed)
5	2.57
10	2.23
20	2.09
30	2.04
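A rough sketch of the look-up and comparison for the t-test worked example follows. The dictionary copies the four rows of the table above; the function name `t_test_decision` is hypothetical, and taking the absolute value of t is an assumption that matches a two-tailed table, where only the size of t matters.

```python
# Critical t values at p = 0.05 (two-tailed), copied from the table above.
CRITICAL_T_P05 = {5: 2.57, 10: 2.23, 20: 2.09, 30: 2.04}


def t_test_decision(calculated_t, df):
    """Return (critical value, is_significant) for a calculated t at p = 0.05."""
    critical = CRITICAL_T_P05[df]  # KeyError if this df row is not in the short table
    significant = abs(calculated_t) >= critical
    return critical, significant


critical, significant = t_test_decision(1.78, 30)
print(critical, significant)  # 2.04 False
```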

The error-bar shortcut for a t-test: On a graph of two means with error bars you can often judge significance by eye:

Error bars OVERLAP → the means are probably NOT significantly different (a t-test would give t below the critical value).

Error bars clearly DON'T overlap → the difference may well be significant.

This is exactly the reasoning examiners want when they show error bars on a chart.

Two means with overlapping error bars: because the bars overlap, the difference between the means is probably NOT statistically significant, and a t-test would be expected to give t below the critical value (p > 0.05).
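The by-eye check amounts to asking whether two intervals overlap, which can be sketched as below. The function name `error_bars_overlap` and all the values are hypothetical, and the error value stands for whatever half-length the bars on the chart show. The result is a quick guide, not a replacement for the t-test.

```python
def error_bars_overlap(mean_1, error_1, mean_2, error_2):
    """True if the ranges mean_1 ± error_1 and mean_2 ± error_2 overlap."""
    low_1, high_1 = mean_1 - error_1, mean_1 + error_1
    low_2, high_2 = mean_2 - error_2, mean_2 + error_2
    return low_1 <= high_2 and low_2 <= high_1


# Hypothetical means and error-bar half-lengths (illustration only).
print(error_bars_overlap(12.0, 1.5, 10.5, 1.0))  # True: probably not significant
print(error_bars_overlap(20.0, 1.0, 15.0, 1.0))  # False: may well be significant
```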


How this is tested: On Paper 1B and in the IA you are GIVEN the formulae and the critical-value table — the marks are for running the test correctly and stating the right conclusion, not for memorising maths.

Common tasks: justify, using the data, that a difference is non-significant (calculated value below the critical value / p > 0.05 / error bars overlap); explain how to collect quantitative data so a treatment can be tested statistically (a numeric measure, replicates, treated vs control, then a t-test on the two means); and draw conclusions from a graph that shows p-values, stating an effect only where it is marked significant.

IB-style question: test a monohybrid count against an expected ratio

How to score all four marks

State H₀. H₀: the offspring fit a 3 : 1 grey : ebony ratio; any difference is due to chance. (H₁: they do not fit 3 : 1.)
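Expected counts from 3 : 1 out of 120: grey $= \tfrac{3}{4} \times 120 = 90$, ebony $= \tfrac{1}{4} \times 120 = 30$.
Calculate χ² using $\chi^2 = \sum \frac{(O - E)^2}{E}$: $\chi^2 = \frac{4}{90} + \frac{4}{30} \approx 0.044 + 0.133 \approx 0.18$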
Conclusion. df = 2 − 1 = 1, so the critical value is 3.84. Calculated χ² = 0.18 is smaller than 3.84, so p > 0.05: the result is not significant → do not reject H₀. The offspring do fit a 3 : 1 ratio. (Mark 1: H₀ stated. Mark 2: correct expected counts. Mark 3: χ² ≈ 0.18 correctly calculated. Mark 4: compares to 3.84 and concludes 'not significant / fits 3 : 1'.)

Final answer

H₀: offspring fit 3 : 1. Expected = 90 grey, 30 ebony. χ² = 4/90 + 4/30 = 0.18. df = 1, critical = 3.84. Since 0.18 < 3.84, p > 0.05 — not significant, do not reject H₀: the data fit a 3 : 1 ratio.

✓ Why this scores full marks: It states H₀, works the expected counts from the ratio, plugs into the χ² formula to get a correct value, and, crucially, compares to the critical value and gives a biological conclusion ('fits 3 : 1 / not significant').

The classic way to lose the last mark is to calculate χ² but forget to compare it to 3.84, or to get the direction backwards (remember: calculated bigger = significant).
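The four marks map onto four short steps, sketched here end to end. The observed counts of 92 grey and 28 ebony are an assumption: they are one pair that sums to 120 and gives the 4/90 + 4/30 terms in the final answer. The function name `goodness_of_fit` is also hypothetical.

```python
# Critical chi-squared values at p = 0.05, as listed earlier in these notes.
CRITICAL_CHI2_P05 = {1: 3.84, 2: 5.99, 3: 7.81, 4: 9.49}


def goodness_of_fit(observed, ratio):
    """Test observed counts against an expected ratio at p = 0.05."""
    total = sum(observed)
    expected = [total * part / sum(ratio) for part in ratio]          # mark 2
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))  # mark 3
    df = len(observed) - 1
    critical = CRITICAL_CHI2_P05[df]
    return expected, chi2, df, critical, chi2 >= critical             # mark 4


observed = [92, 28]  # ASSUMED grey and ebony counts, for illustration only
expected, chi2, df, critical, significant = goodness_of_fit(observed, (3, 1))
print(expected, round(chi2, 2), df, critical, significant)
# [90.0, 30.0] 0.18 1 3.84 False
```

The final `False` is the step that earns the last mark: 0.18 is below 3.84, so H₀ is not rejected.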

Justifying a NON-significant result (a 2-mark favourite): If asked to justify that a difference is not significant, quote the numbers:

'The calculated statistic is below the critical value at p = 0.05 (so p > 0.05), therefore the difference is not statistically significant and we do not reject H₀.'

On a chart you can add: 'the error bars overlap, which agrees that the means are not significantly different.'

