The big idea: A correlation means two measured variables change together in a pattern.
Positive correlation — as one goes up, the other goes up (e.g. leaf area and transpiration rate).
Negative correlation — as one goes up, the other goes down (e.g. altitude and air temperature).
But a correlation does not prove that one variable causes the other. Some third (confounding) variable, or pure coincidence, could be behind it. This is the rule examiners test again and again: correlation ≠ causation.
A positive correlation: as leaf area increases, transpiration rate increases. The best-fit line slopes UP and r is positive (≈ +0.9). A strong correlation still does NOT prove that larger leaves CAUSE faster transpiration.
- Correlation: A relationship in which two variables tend to change together (positively or negatively).
- Positive correlation: As one variable increases, the other also increases. The best-fit line slopes upward.
- Negative correlation: As one variable increases, the other decreases. The best-fit line slopes downward.
- Causation: A change in one variable directly produces a change in another. Shown by a controlled experiment, NOT by a correlation alone.
- Confounding variable: A third variable that affects both measured variables and can create a correlation without a direct cause.
- Correlation coefficient (r): A number from −1 to +1 measuring the strength and direction of a linear correlation.
- Coefficient of determination (R²): r squared, the fraction (or %) of the variation in one variable explained by the other.
Why correlation can't prove cause: Ice-cream sales and drowning deaths rise together every summer — but ice cream does not cause drowning.
A third variable (hot weather) drives both.
So whenever you see a correlation, ask: could something else explain both? Only a controlled experiment can establish causation.
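The ice-cream example can be made concrete with a small simulation. The sketch below is only an illustration: every number in it is invented, `drowning_index` is a hypothetical stand-in for drowning deaths, and `pearson_r` is a helper written for this example. By construction, temperature drives both variables and neither one affects the other.

```python
import random
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient r for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


rng = random.Random(0)  # fixed seed so the run is repeatable

# Simulated days (all values are made up for illustration).
# Temperature is the confounding variable: it feeds into BOTH measurements,
# but ice-cream sales never appear in the formula for the drowning index.
days = []
for _ in range(5000):
    temperature = rng.uniform(10, 35)
    ice_cream_sales = 20 * temperature + rng.gauss(0, 50)
    drowning_index = 0.3 * temperature + rng.gauss(0, 1.5)
    days.append((temperature, ice_cream_sales, drowning_index))

# Correlation across all days: clearly positive, despite no causal link.
r_all = pearson_r([d[1] for d in days], [d[2] for d in days])

# Now compare only days with nearly the same temperature (20 to 22 degrees).
# Holding the third variable roughly constant removes most of the pattern.
similar_days = [d for d in days if 20 <= d[0] <= 22]
r_similar = pearson_r([d[1] for d in similar_days], [d[2] for d in similar_days])

print(f"r across all days:            {r_all:.2f}")
print(f"r at similar temperatures:    {r_similar:.2f}")
```

Across all days the two variables rise together, yet once temperature is held roughly constant the correlation largely disappears. That is the same logic as a controlled experiment: keep the other variables fixed and see whether the relationship survives.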
Two numbers summarise a correlation. You are never asked to calculate them by hand — they are given to you — but you must interpret them correctly.
r — the correlation coefficient — captures two things at once: the direction (its sign) and the strength (how close it is to ±1).
Interpreting r — direction and strength: Read r in two steps:
Sign → direction. $r > 0$ means positive (both rise); $r < 0$ means negative (one rises, the other falls).
Size → strength. The closer $|r|$ is to 1, the stronger the linear correlation; $r = 0$ means no linear relationship.
So $r = -0.9$ is a strong negative correlation, while a value only slightly above 0 is a weak positive one. Same strength reasoning, opposite signs: $r = +0.9$ and $r = -0.9$ are equally strong.
| Correlation coefficient r | Strength | Direction |
|---|---|---|
| r ≈ +1 (e.g. +0.9) | Strong | Positive — as x ↑, y ↑ |
| r ≈ +0.5 | Moderate | Positive |
| r ≈ 0 | None / very weak | No linear relationship |
| r ≈ −0.5 | Moderate | Negative — as x ↑, y ↓ |
| r ≈ −1 (e.g. −0.9) | Strong | Negative |
A negative correlation: as altitude increases, mean air temperature decreases. The best-fit line slopes DOWN and r is negative (≈ −0.95). Direction (sign of r) and strength (how close to ±1) are read separately.
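The two-step reading of r (sign for direction, size for strength) can be written as a short function. This is a minimal sketch: the function name `describe_r` and the cut-offs 0.7 and 0.3 are assumptions chosen to agree with the table above, not official boundaries, and different textbooks draw the lines in slightly different places.

```python
def describe_r(r, strong=0.7, moderate=0.3):
    """Return (strength, direction) for a correlation coefficient r.

    The `strong` and `moderate` cut-offs are illustrative assumptions.
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")

    # Step 1: size -> strength (the sign is ignored here).
    size = abs(r)
    if size >= strong:
        strength = "strong"
    elif size >= moderate:
        strength = "moderate"
    else:
        strength = "very weak or none"

    # Step 2: sign -> direction.
    if r > 0:
        direction = "positive"
    elif r < 0:
        direction = "negative"
    else:
        direction = "no direction"

    return strength, direction


# The values used in this topic's examples and table.
for value in (+0.9, +0.5, 0.0, -0.5, -0.80, -0.95):
    strength, direction = describe_r(value)
    print(f"r = {value:+.2f}: {strength}, {direction}")
```

Because strength depends only on `abs(r)`, a value and its negative always get the same strength label; only the direction changes.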
The coefficient of determination R² = r²: The coefficient of determination is simply r squared:
$$R^2 = r^2$$
It tells you what fraction of the variation in y is explained by x (the rest is due to other factors / random scatter). Multiply by 100 to read it as a percentage.
Worked example. A study finds $r = +0.9$ between leaf area and transpiration rate.
$R^2 = r^2 = (0.9)^2 = 0.81$, so about 81% of the variation in transpiration rate is explained by leaf area; the other ~19% is due to other factors (humidity, wind, light…).
Going the other way: if you are told $R^2 = 0.81$, then $r = \pm\sqrt{0.81} = \pm 0.9$ (use the graph's slope to pick the sign).
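In the exam these values are given to you, but seeing where they come from can make the link between r and R² easier to remember. The sketch below computes r from a small data set and then squares it. The data are invented for illustration (they are not from any real study), and `pearson_r` is a helper written for this example using the standard Pearson formula.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient r for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Made-up measurements, for illustration only.
leaf_area = [10, 15, 20, 25, 30, 35, 40]               # cm^2
transpiration = [1.1, 1.9, 2.0, 3.2, 3.1, 4.4, 4.3]    # arbitrary units

r = pearson_r(leaf_area, transpiration)
r_squared = r ** 2                      # coefficient of determination
percent_explained = 100 * r_squared     # same value read as a percentage

print(f"r   = {r:+.2f}")
print(f"R^2 = {r_squared:.2f}  (about {percent_explained:.0f}% of variation explained)")

# Going the other way: the square root recovers only the SIZE of r.
# The sign has to come from the slope of the best-fit line.
size_of_r = sqrt(r_squared)
print(f"|r| recovered from R^2 = {size_of_r:.2f}")
```

Squaring throws the sign away, which is why R² alone cannot tell you whether the correlation is positive or negative.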
| | Correlation coefficient r | Coefficient of determination R² |
|---|---|---|
| Range | −1 to +1 | 0 to 1 (often given as a %) |
| Tells you | Strength AND direction of the linear relationship | What FRACTION of the variation in y is explained by x |
| Sign | Can be + or − | Never negative (it is r squared) |
| Link | — | R² = r² (so r = ±√R²) |
| Example | r = +0.9 | R² = 0.81 → ~81% of the variation in y is explained by x |
R² is still only a correlation: A high $R^2$ (say 0.92) sounds powerful, with 92% of the variation explained, but it is still a correlation.
It does not prove x causes y. A confounding variable could explain that 92% just as well. Strength of a correlation and proof of causation are different things.
How this is tested: On Paper 1B and the IA you analyse data you are given:
- Describe / state the relationship a graph or table shows (1 mark): say the direction (positive/negative) and quote figures from the data.
- Comment on an R² value (2 marks): convert it to a % of variation explained and note it still does not prove cause.
- Deduce / discuss / evaluate a causal claim from a correlation: the safe answer is almost always "the data show a correlation but do not prove causation; a third variable could be responsible; a controlled experiment is needed".
IB-style question — interpret a correlation and an R² value
A scientist measured the resting heart rate and the weekly exercise time of 40 adults. The data gave a correlation coefficient of r = −0.80, and a coefficient of determination of R² = 0.64.
(a) Describe the relationship between weekly exercise time and resting heart rate. [1]
(b) State what the R² value of 0.64 tells you. [2]
(c) A newspaper claims the data prove that exercising more lowers your resting heart rate. Evaluate this claim. [2]
Fully worked answer
- (a) Direction + strength. $r = -0.80$ is negative and close to −1, so there is a strong negative correlation: as weekly exercise time increases, resting heart rate decreases. (1 mark: direction stated with the variables.)
- (b) Turn R² into a percentage. $R^2 = 0.64$, and $0.64 \times 100 = 64\%$, so about 64% of the variation in resting heart rate is explained by weekly exercise time (1 mark). The remaining ~36% is due to other factors such as age, diet and genetics (1 mark). (Check: $R^2 = r^2 = (-0.80)^2 = 0.64$ ✓.)
- (c) Evaluate the causal claim. The data show a strong correlation, but a correlation does not prove causation (1 mark). A confounding variable could explain both (e.g. healthier or younger people both exercise more and have lower heart rates), and about 36% of the variation is not explained by exercise time at all. To establish cause you would need a controlled experiment. So the claim is not justified by these data (1 mark).
Final answer
(a) A strong negative correlation: as weekly exercise time increases, resting heart rate decreases. (b) R² = 0.64 → about 64% of the variation in resting heart rate is explained by exercise time; the other ~36% is due to other factors. (c) The data show a correlation, not causation — a confounding variable (e.g. age, general fitness) could cause both, so a controlled experiment is needed; the claim is not justified.
✓ Why this scores full marks: Part (a) names the direction AND the strength and links the right variables.
Part (b) converts $R^2$ to a % of variation explained and accounts for the rest.
Part (c) gives the examiner's favourite line: correlation ≠ causation, names a plausible confounding variable, and calls for a controlled experiment — that is what an Evaluate needs.