Key Idea: This topic is about measuring and modelling how two variables move together — describing a scatter, putting a number on it with r, and fitting the line y = ax + b to predict. On Paper 2 the GDC hands you a, b and r in one step.
📈 Describing a scatter & Pearson's r
Describe a scatter by its direction and its strength. Direction — positive (both rise together) or negative (one rises as the other falls). Strength — strong (points hug a line) or weak (loosely scattered). e.g. 'strong positive', 'weak negative'. r puts a number on exactly this.
- Sign of r: the direction. Positive r is a positive correlation, negative r is a negative correlation.
- Size of |r|: the strength. Near 1 is strong, near 0 is weak.
- r = 1 or r = −1: the points lie exactly on a straight line.
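To see the sign, the size and the r = ±1 case side by side, here is a minimal Python sketch. It assumes the standard formula r = Sxy / √(Sxx · Syy), which these notes leave to the GDC; the helper name pearson_r and the three small data sets are made up for illustration.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's r as Sxy / sqrt(Sxx * Syy) (illustrative helper)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

xs = [1, 2, 3, 4, 5]

r_up = pearson_r(xs, [3, 5, 7, 9, 11])    # points exactly on y = 2x + 1
r_down = pearson_r(xs, [10, 8, 6, 4, 2])  # points exactly on y = -2x + 12
r_loose = pearson_r(xs, [2, 5, 3, 6, 4])  # rising overall, loosely scattered

print(r_up)     # 1.0
print(r_down)   # -1.0
print(r_loose)  # 0.5
```

The first two sets sit exactly on a line, so r hits +1 and −1. The third still rises overall (positive sign) but the points are scattered, so |r| drops well below 1.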
📉 The regression line y = ax + b
- a is the gradient: the change in y for each 1-unit rise in x
- b is the y-intercept: the predicted y when x = 0
The regression line always passes through (x̄, ȳ). So substituting the means into y = ax + b works exactly — handy for finding a missing mean, or for checking your line. To predict y use y on x; to predict x use x on y. The two lines cross at (x̄, ȳ), so solving them together gives the means.
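Both facts about the mean point can be checked numerically. The sketch below assumes the standard least-squares formulas (a = Sxy / Sxx for y on x, and the matching c = Sxy / Syy for x on y written as x = cy + d); the data set and the names c, d, x_cross and y_cross are made up for illustration and are not part of the syllabus notation.

```python
xs = [1, 2, 3, 4, 5]
ys = [2, 5, 3, 6, 4]
n = len(xs)

x_bar = sum(xs) / n
y_bar = sum(ys) / n
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
sxx = sum((x - x_bar) ** 2 for x in xs)
syy = sum((y - y_bar) ** 2 for y in ys)

# y on x:  y = a*x + b
a = sxy / sxx
b = y_bar - a * x_bar

# x on y:  x = c*y + d  (assumed notation for the second line)
c = sxy / syy
d = x_bar - c * y_bar

# Fact 1: substituting x_bar into y = a*x + b returns y_bar.
y_at_mean = a * x_bar + b

# Fact 2: solve the two lines together. Substituting x = c*y + d into
# y = a*x + b gives y * (1 - a*c) = a*d + b, valid when a*c != 1.
y_cross = (a * d + b) / (1 - a * c)
x_cross = c * y_cross + d

print(x_bar, y_bar)      # 3.0 4.0
print(y_at_mean)         # 4.0
print(x_cross, y_cross)  # 3.0 4.0
```

The crossing point matches the means, which is why solving the two regression lines simultaneously is a quick way to recover x̄ and ȳ.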
✏️ IB-style worked examples
IB-style question — describe correlation from r
A study of weekly rainfall and umbrella sales gives r = 0.91, and a study of outdoor temperature and heater use gives r = −0.87. Describe the correlation in each case.
Step by step:
Read r in two parts: sign = direction, |r| close to 1 = strength.
Do the same for the second value.
r = 0.91 → strong positive; r = −0.87 → strong negative.
IB-style question — find r and the regression line (Paper 2)
For six bakeries, x = number of staff and y = loaves baked per hour are (2, 18), (3, 26), (4, 31), (5, 40), (6, 44), (7, 53). Find r and the regression line of y on x, then predict y when x = 8.
Step by step:
Enter the pairs in L1, L2 and run linear regression on the GDC.
Write the line, then substitute x = 8 to predict.
x = 8 is just outside the data — flag it as extrapolation.
r ≈ 0.996 (very strong positive); y = 6.8x + 4.73; y ≈ 59 loaves at x = 8.
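The GDC values for the bakery data can be cross-checked with a short Python sketch. It assumes the standard least-squares formulas behind linear regression; the helper name linreg and the variable names are illustrative, not GDC commands.

```python
from math import sqrt

def linreg(xs, ys):
    """Return (a, b, r) for the line y = a*x + b (illustrative helper)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    a = sxy / sxx
    b = y_bar - a * x_bar
    r = sxy / sqrt(sxx * syy)
    return a, b, r

staff = [2, 3, 4, 5, 6, 7]          # x values from the question
loaves = [18, 26, 31, 40, 44, 53]   # y values from the question

a, b, r = linreg(staff, loaves)

x_new = 8
y_pred = a * x_new + b
# A prediction counts as extrapolation when x lies outside the observed range.
is_extrapolation = not (min(staff) <= x_new <= max(staff))

print(f"r = {r:.3f}, y = {a:.1f}x + {b:.2f}")
# r = 0.996, y = 6.8x + 4.73
print(f"x = {x_new}: y = {y_pred:.1f}, extrapolation: {is_extrapolation}")
# x = 8: y = 59.1, extrapolation: True
```

The output agrees with the worked answer: about 59 loaves, with the prediction flagged because x = 8 is above the largest observed x of 7.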
IB-style question — interpret and use the line
A regression line for a seedling's height y cm against age x weeks is y = 1.8x + 4, and the mean age is x̄ = 5. Interpret a and b, and find the mean height ȳ.
Step by step:
Gradient a = growth per week; intercept b = height at week 0.
The mean point (x̄, ȳ) lies on the line — substitute x̄ = 5.
Grows ≈ 1.8 cm per week, ≈ 4 cm at week 0; ȳ = 13 cm.
Important: A strong r shows the variables move together, NOT that one causes the other — never claim cause from r alone. Two more traps: r only measures a linear pattern (a strong curve can give a small r), and extrapolating far beyond the data is unreliable — only trust predictions inside the data range.
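The "linear only" trap is easy to demonstrate. In this minimal sketch the points lie exactly on the curve y = x², a perfect but non-linear relationship, yet r comes out as zero. The data set is made up for illustration, and r is computed with the standard formula Sxy / √(Sxx · Syy).

```python
from math import sqrt

xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]  # every point lies exactly on y = x^2

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
sxx = sum((x - x_bar) ** 2 for x in xs)
syy = sum((y - y_bar) ** 2 for y in ys)

r_curve = sxy / sqrt(sxx * syy)
print(r_curve)  # 0 for this symmetric curve, despite the perfect pattern
```

The falling left half and the rising right half cancel out in Sxy, so r reports no linear pattern even though y is completely determined by x. Always look at the scatter before trusting r.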
Exam Tips
- Describe a scatter with BOTH a direction and a strength.
- r's sign is the direction; how close |r| is to 1 is the strength; −1 ≤ r ≤ 1.
- On Paper 2, get a, b and r from LinReg(ax+b) — never compute by hand.
- The regression line always passes through (x̄, ȳ); predict y with y on x.
- A strong r is not proof of cause, and extrapolating beyond the data is unreliable.