IB Biology - Correlation, causation & coefficient of determination

The big idea: A correlation means two measured variables change together in a pattern.

Positive correlation — as one goes up, the other goes up (e.g. leaf area and transpiration rate).

Negative correlation — as one goes up, the other goes down (e.g. altitude and air temperature).

But a correlation does not prove that one variable causes the other. Some third (confounding) variable, or pure coincidence, could be behind it. This is the rule examiners test again and again: correlation ≠ causation.

A positive correlation: as leaf area increases, transpiration rate increases. The best-fit line slopes UP and r is positive (≈ +0.9). A strong correlation still does NOT prove that larger leaves CAUSE faster transpiration.

Correlation: A relationship in which two variables tend to change together (positively or negatively).
Positive correlation: As one variable increases, the other also increases. The best-fit line slopes upward.
Negative correlation: As one variable increases, the other decreases. The best-fit line slopes downward.
Causation: A change in one variable directly produces a change in another. Shown by a controlled experiment, NOT by a correlation alone.
Confounding variable: A third variable that affects both measured variables and can create a correlation without a direct cause.
Correlation coefficient (r): A number from −1 to +1 measuring the strength and direction of a linear correlation.
Coefficient of determination (R²): r squared — the fraction (or %) of the variation in one variable explained by the other.

Why correlation can't prove cause: Ice-cream sales and drowning deaths rise together every summer — but ice cream does not cause drowning.

A third variable (hot weather) drives both.

So whenever you see a correlation, ask: could something else explain both? Only a controlled experiment can establish causation.
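The ice-cream example can be reproduced with a small simulation. This is a minimal sketch using invented numbers, not real data: the function `simulate_summer`, the coefficients and the noise levels are all assumptions chosen for illustration. Temperature is the only thing that influences either outcome, yet the two outcomes still end up strongly correlated with each other.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def simulate_summer(days=200, seed=42):
    """Simulated (made-up) data: temperature drives BOTH outcomes.

    Ice-cream sales and the drowning index never influence each other.
    """
    rng = random.Random(seed)
    temps = [rng.uniform(10, 35) for _ in range(days)]         # daily temperature, °C
    ice_cream = [10 * t + rng.gauss(0, 20) for t in temps]     # depends only on temperature
    drownings = [0.2 * t + rng.gauss(0, 0.5) for t in temps]   # depends only on temperature
    return temps, ice_cream, drownings


temps, ice_cream, drownings = simulate_summer()
r = pearson_r(ice_cream, drownings)
print(f"r(ice cream, drownings) = {r:+.2f}")  # strongly positive, with no causal link
```

The high r here comes entirely from the shared dependence on temperature, which is exactly what a confounding variable does.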

Two numbers summarise a correlation. You are never asked to calculate them by hand — they are given to you — but you must interpret them correctly.

r — the correlation coefficient — captures two things at once: the direction (its sign) and the strength (how close it is to ±1).
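Although r is always given to you in the exam, seeing how it is produced can make its meaning clearer. The sketch below assumes r is the usual Pearson correlation coefficient (the notes do not name the method), and the leaf-area measurements are hypothetical values invented for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient r for two equal-length lists of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # How x and y vary together (numerator) ...
    co_variation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # ... scaled by how much each one varies on its own (denominator).
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return co_variation / math.sqrt(spread_x * spread_y)


# Hypothetical measurements, invented for illustration only.
leaf_area_cm2 = [10, 20, 30, 40, 50]
transpiration_rate = [1.2, 1.9, 3.2, 3.8, 5.1]  # arbitrary units

r = pearson_r(leaf_area_cm2, transpiration_rate)
print(f"r = {r:+.2f}")  # the sign gives the direction, the size gives the strength
```

Because the numerator is positive when x and y rise together and negative when one rises as the other falls, the sign of r carries the direction, while the scaling keeps r between −1 and +1.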

Interpreting r — direction and strength: Read r in two steps:

Sign → direction. r > 0 means positive (both rise); r < 0 means negative (one rises, the other falls).

Size → strength. The closer |r| is to 1, the stronger the linear correlation; r ≈ 0 means no linear relationship.

So r ≈ −0.9 is a strong negative correlation, while a small positive r (only a little above 0) is a weak positive one. Same strength reasoning, opposite signs: r = +0.9 and r = −0.9 are equally strong.

Correlation coefficient r	Strength	Direction
r ≈ +1 (e.g. +0.9)	Strong	Positive — as x ↑, y ↑
r ≈ +0.5	Moderate	Positive
r ≈ 0	None / very weak	No linear relationship
r ≈ −0.5	Moderate	Negative — as x ↑, y ↓
r ≈ −1 (e.g. −0.9)	Strong	Negative
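The two-step reading of r summarised in the table can be written as a short function. This is only a sketch: the function name `describe_r` and the cut-off values (0.7, 0.4, 0.1) are assumptions made for illustration, since the notes give example values rather than fixed boundaries.

```python
def describe_r(r):
    """Return (strength, direction) for a correlation coefficient r.

    The cut-offs below are illustrative assumptions, not official IB boundaries.
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")

    # Step 1: size -> strength (ignore the sign by using the absolute value).
    size = abs(r)
    if size >= 0.7:
        strength = "strong"
    elif size >= 0.4:
        strength = "moderate"
    elif size >= 0.1:
        strength = "weak"
    else:
        return ("none / very weak", "no linear relationship")

    # Step 2: sign -> direction.
    direction = "positive" if r > 0 else "negative"
    return (strength, direction)


for value in (0.9, 0.5, 0.0, -0.5, -0.9):
    print(value, describe_r(value))
```

Note that `describe_r(0.9)` and `describe_r(-0.9)` report the same strength and differ only in direction, which is the point of reading the two properties separately.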

A negative correlation: as altitude increases, mean air temperature decreases. The best-fit line slopes DOWN and r is negative (≈ −0.95). Direction (sign of r) and strength (how close to ±1) are read separately.

The coefficient of determination R² = r²: The coefficient of determination is simply r squared:

$$R^2 = r^2$$

It tells you what fraction of the variation in y is explained by x (the rest is due to other factors / random scatter). Multiply by 100 to read it as a percentage.

Worked example. A study finds r = +0.9 between leaf area and transpiration rate.

R² = r² = (0.9)² = 0.81, so about 81% of the variation in transpiration rate is explained by leaf area; the other ~19% is due to other factors (humidity, wind, light…).

Going the other way: if you are told R², then r = ±√R² (use the graph's slope to pick the sign).
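The arithmetic linking r and R² can be checked in a few lines. This is a minimal sketch with r = +0.9 (R² = 0.81); the function names are hypothetical, and the `slope_is_positive` flag stands in for reading the direction of the best-fit line off the graph.

```python
import math


def percent_explained(r):
    """Percentage of the variation in y explained by x, from R² = r²."""
    return 100 * r ** 2


def r_from_r_squared(r_squared, slope_is_positive):
    """Recover r from R²; the sign must come from the slope of the best-fit line."""
    magnitude = math.sqrt(r_squared)
    return magnitude if slope_is_positive else -magnitude


r = 0.9
print(f"R² = {r ** 2:.2f}")                                   # R² = 0.81
print(f"explained:   {percent_explained(r):.0f}%")            # explained:   81%
print(f"unexplained: {100 - percent_explained(r):.0f}%")      # unexplained: 19%

# Going the other way: R² alone cannot tell you the direction.
print(f"{r_from_r_squared(0.81, slope_is_positive=True):+.2f}")   # +0.90
print(f"{r_from_r_squared(0.81, slope_is_positive=False):+.2f}")  # -0.90
```

Squaring discards the sign, which is why R² on its own never tells you whether the correlation is positive or negative.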

	Correlation coefficient r	Coefficient of determination R²
Range	−1 to +1	0 to 1 (often given as a %)
Tells you	Strength AND direction of the linear relationship	What FRACTION of the variation in y is explained by x
Sign	Can be + or −	Never negative (it is r squared, so it runs from 0 to 1)
Link	—	R² = r² (so r = ±√R²)
Example	r = +0.9	R² = 0.81 → ~81% of the variation in y is explained by x

R² is still only a correlation: A high R² (say 0.92) sounds powerful, with 92% of the variation explained, but it is still a correlation.

It does not prove x causes y. A confounding variable could explain that 92% just as well. Strength of a correlation and proof of causation are different things.

How this is tested: On Paper 1B and the IA you analyse data you are given:

Describe / state the relationship a graph or table shows (1 mark) — say the direction (positive/negative) and quote figures from the data.

Comment on an R² value (2 marks) — convert it to a % of variation explained and note it still does not prove cause.

Deduce / discuss / evaluate a causal claim from a correlation: the safe answer is almost always that the data show a correlation but do not prove causation, that a third variable could be responsible, and that a controlled experiment is needed.

IB-style question — interpret a correlation and an R² value

A scientist measured the resting heart rate and the weekly exercise time of 40 adults. The data gave a correlation coefficient of r = −0.80, and a coefficient of determination of R² = 0.64.

(a) Describe the relationship between weekly exercise time and resting heart rate. [1]
(b) State what the R² value of 0.64 tells you. [2]
(c) A newspaper claims the data prove that exercising more lowers your resting heart rate. Evaluate this claim. [2]

Fully worked answer

(a) Direction + strength. r = −0.80 is negative and close to −1, so there is a strong negative correlation: as weekly exercise time increases, resting heart rate decreases. (1 mark: direction stated with the variables.)
(b) Turn R² into a percentage. R² = 0.64, and 0.64 × 100 = 64%, so about 64% of the variation in resting heart rate is explained by weekly exercise time (1 mark). The remaining ~36% is due to other factors such as age, diet and genetics (1 mark). (Check: $R^2 = r^2 = (-0.80)^2 = 0.64$ ✓.)
(c) Evaluate the causal claim. The data show a strong correlation, but a correlation does not prove causation (1 mark). A confounding variable could explain both: for example, healthier or younger people may both exercise more and have lower resting heart rates. In addition, only 64% of the variation is explained here, leaving ~36% unaccounted for. To establish cause you would need a controlled experiment. So the claim is not justified by these data (1 mark).

Final answer

(a) A strong negative correlation: as weekly exercise time increases, resting heart rate decreases.
(b) R² = 0.64 → about 64% of the variation in resting heart rate is explained by exercise time; the other ~36% is due to other factors.
(c) The data show a correlation, not causation: a confounding variable (e.g. age, general fitness) could cause both, so a controlled experiment is needed; the claim is not justified.

✓ Why this scores full marks: Part (a) names the direction AND the strength and links the right variables.

Part (b) converts R² to a % of variation explained and accounts for the rest.

Part (c) gives the examiner's favourite line: correlation ≠ causation, names a plausible confounding variable, and calls for a controlled experiment — that is what an Evaluate needs.
