Key Idea: Topic 2.6 takes the model types from 2.5 and asks: which one fits this data best? You use the GDC to run regression, get an equation, and then use it to make predictions. The critical thinking skill here is knowing when to trust a prediction (interpolation — inside the data) versus when to be cautious (extrapolation — beyond the data).
✅ The modelling workflow
Example: Data: year (x) vs sales (y) for 5 years, with the years numbered x = 1 to 5. GDC linear regression gives: y = 12.4x + 85.2, r = 0.97. Interpretation: strong positive linear relationship (r close to 1). For every additional year, sales increase by about 12.4 units. Predict sales in year 6: x = 6 is just beyond the last data point (x = 5), so this is a slight extrapolation. Prediction: y = 12.4(6) + 85.2 = 159.6 (treat with some caution).
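The prediction step in this example can be sketched in Python. This is only an illustration, not how a GDC works internally: the function name predict_with_range_check is hypothetical, and the data range x = 1 to 5 is an assumption (the years numbered 1 to 5).

```python
def predict_with_range_check(slope, intercept, x, x_min, x_max):
    """Substitute x into y = slope*x + intercept and say whether x
    lies inside the data range (interpolation) or outside it (extrapolation)."""
    y = slope * x + intercept
    kind = "interpolation" if x_min <= x <= x_max else "extrapolation"
    return y, kind

# Slope and intercept come from the example; the range 1..5 is an assumption.
y6, kind = predict_with_range_check(12.4, 85.2, 6, x_min=1, x_max=5)
print(f"Predicted sales in year 6: {y6:.1f} ({kind})")
```

The range check mirrors the habit worth building for the exam: compare the x-value with the smallest and largest x in the data before deciding how far to trust the prediction.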
Selecting the regression type matters. Fitting a straight line to exponential data gives a poor model even if r looks acceptable, because the line misses the curvature. Always check the graph of the regression against the scatter plot. If an exam question says 'use your regression equation to predict...', substitute your x-value into the equation and calculate y. Round to a sensible level of accuracy for the context.
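To see why the model type matters, here is a minimal sketch that fits both a linear and an exponential model to the same data and compares how well each fits. The data values are hypothetical (invented to roughly double each step), and the helper names linear_fit and r_squared are illustrative. The exponential model is fitted by running a linear fit on (x, ln y), which is one common approach; a GDC may report its r² differently, for example on the log scale.

```python
import math

def linear_fit(xs, ys):
    """Least-squares line y = a*x + b; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b

def r_squared(ys, fitted):
    """Coefficient of determination of the fitted values against the data."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - f) ** 2 for y, f in zip(ys, fitted))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

# Hypothetical data that roughly doubles each step (not from any real source).
xs = [1, 2, 3, 4, 5, 6]
ys = [2.1, 3.9, 8.2, 15.8, 32.5, 63.0]

# Linear model: y = a*x + b
a, b = linear_fit(xs, ys)
linear_fitted = [a * x + b for x in xs]

# Exponential model y = A * e^(k*x), fitted as a line through (x, ln y).
k, ln_A = linear_fit(xs, [math.log(y) for y in ys])
exp_fitted = [math.exp(ln_A + k * x) for x in xs]

r2_linear = r_squared(ys, linear_fitted)
r2_exp = r_squared(ys, exp_fitted)
print(f"linear      R^2 = {r2_linear:.3f}")
print(f"exponential R^2 = {r2_exp:.3f}")
```

On data like this the linear fit can still score a fairly high r², which is why the number alone is not enough: plotting both curves over the scatter plot shows the line cutting through a clearly curved pattern.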
Paper 2 (GDC allowed): Write the regression equation, the value of r or r², and then use it to make the prediction. Show the substitution step. Whenever you extrapolate, acknowledge the limitation: 'this is beyond the data range so the prediction may be less reliable'. This is an explicit IB marking criterion.
IB-style question [7 marks]
A scientist studies how the mass m (grams) of a chemical that has dissolved depends on the temperature T (°C) of the water. Data are collected for temperatures from 20 °C to 60 °C. A GDC linear regression gives the line of m on T as m = 0.85T + 4 with correlation coefficient r = 0.97.
(a) Describe the correlation between temperature and mass dissolved.
(b) Use the regression line to estimate the mass dissolved at 45 °C.
(c) The scientist uses the line to predict the mass dissolved at 95 °C. State, with a reason, whether this prediction is reliable.
Step by step:
(a) Read r and state strength + direction. r = 0.97 is close to +1.
(b) Substitute T = 45 into the line: m = 0.85(45) + 4 = 38.25 + 4 = 42.25. 45 °C is inside the range 20–60, so this is interpolation.
(c) Compare T = 95 with the data range 20–60. It is well outside, so the prediction is an extrapolation. Beyond the tested temperatures the relationship may not stay linear (e.g. the water boils at 100 °C, or the solution saturates), so the prediction is not reliable.
Answers: (a) Strong positive linear correlation. (b) m = 42.25, so about 42.3 g (3 s.f.). (c) Not reliable: T = 95 °C is outside the data range 20–60 °C (extrapolation), so the linear pattern may not continue.