Scatter diagrams and bivariate data
Bivariate data: Data with two variables (e.g., height and weight).
Plotted as points (x,y) on a scatter diagram.
[Diagram: math-scatter-regression]
Reading scatter diagrams: Each point is one observation (one pair of values).
The overall pattern of the points shows the relationship between the two variables.
| Pattern | Meaning |
|---|---|
| Points trend upward | Positive correlation |
| Points trend downward | Negative correlation |
| Points scattered randomly | No correlation or very weak correlation |
Always plot bivariate data: Plotting reveals patterns that raw numbers and summary statistics hide (e.g., Anscombe's quartet).
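To illustrate why plotting matters, here is a minimal Python sketch using the first two datasets of Anscombe's quartet. It computes the Pearson correlation coefficient r (introduced under Types of correlation) for each. The helper name `pearson_r` is our own choice for this sketch, not something from these notes.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# First two datasets of Anscombe's quartet (they share the same x values).
x = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y1 = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
y2 = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]

r1 = pearson_r(x, y1)  # points form a roughly linear cloud
r2 = pearson_r(x, y2)  # points lie on a smooth curve, not a line
print(round(r1, 2), round(r2, 2))  # the two values are almost identical
```

The two r values agree to about two decimal places, yet a scatter diagram shows one dataset is roughly linear and the other is clearly curved. The numbers alone would not reveal that difference.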
Types of correlation
Correlation strength: Strong: points lie close to a straight line.
Weak: points are widely scattered around a slight trend.
None: no pattern.
| Type | r value | Scatter pattern |
|---|---|---|
| Strong positive | 0.7 to 1.0 | Points close to upward line |
| Moderate positive | 0.4 to 0.7 | Scattered but trend up |
| Weak positive | 0.0 to 0.4 | Very scattered, slight up |
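| Strong negative | -1.0 to -0.7 | Points close to downward line |
| Moderate negative | -0.7 to -0.4 | Scattered but trend down |
| Weak negative | -0.4 to 0.0 | Very scattered, slight down |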
| No correlation | Near 0 | Random scatter |
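The bands in the table can be turned into a small lookup, sketched below in Python. The function name `describe_correlation` and the `near_zero` cut-off are assumptions made for this sketch. It also assumes that negative values mirror the positive bands and that a boundary value such as 0.7 goes to the stronger band. The cut-offs themselves are a convention, not a fixed rule.

```python
def describe_correlation(r, near_zero=0.05):
    """Label r using the bands in the table above.

    near_zero is a hypothetical cut-off for "near 0"; the table does not
    give an exact value, so 0.05 is only an illustrative choice.
    """
    if not -1 <= r <= 1:
        raise ValueError("r must lie between -1 and +1")
    size = abs(r)
    if size < near_zero:
        return "no correlation"
    if size >= 0.7:
        strength = "strong"
    elif size >= 0.4:
        strength = "moderate"
    else:
        strength = "weak"
    direction = "positive" if r > 0 else "negative"
    return f"{strength} {direction}"

for value in (0.85, 0.5, 0.2, 0.0, -0.75):
    print(value, "->", describe_correlation(value))
```

A label like this is only a starting point for a written interpretation. The scatter diagram should still be checked to confirm the pattern really is linear.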
Pearson correlation coefficient r: Measures the strength and direction of a linear relationship.
Always between -1 and +1.
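These notes do not state the formula, so the sketch below uses the standard definition r = Sxy / sqrt(Sxx × Syy), where Sxy, Sxx and Syy are sums of products of deviations from the means. The function name and the tiny datasets are made up for illustration.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's r computed from deviations about the means."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sxy: how x and y vary together; Sxx, Syy: how each varies alone.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one variable is constant")
    return sxy / sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]
r_up = pearson_r(x, [2, 4, 6, 8, 10])    # perfect upward line
r_down = pearson_r(x, [10, 8, 6, 4, 2])  # perfect downward line
print(r_up, r_down)
```

Points lying exactly on an upward line give r = +1 and points exactly on a downward line give r = -1. Any scatter around the line pulls r towards 0.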
Interpreting correlation
Worked example
Dataset: height (cm) vs weight (kg) for 10 students.
The scatter diagram shows a clear upward trend, with r=0.85.
What does this mean?
Interpretation
- r=0.85 is positive (as height increases, weight tends to increase)
- 0.85 is close to 1 (strong correlation)
- Points cluster near a line, so weight can be predicted reasonably well from height
- But this is not evidence of causation: the correlation alone cannot show that being taller causes a student to weigh more
Final answer
Strong positive correlation exists, but we cannot conclude causation.
Correlation vs causation: Strong correlation does NOT mean one variable causes the other.
Both may depend on a third variable (for example, age could influence both height and weight).
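The third-variable idea can be shown with a small simulation, sketched below in Python. Everything in it is hypothetical: the ages, the coefficients and the noise levels are invented so that height and weight each depend only on age, with no direct link between them.

```python
import random
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

random.seed(1)  # fixed seed so the run is repeatable

# Hypothetical set-up: age is the third variable that drives both measures.
ages = [random.uniform(5, 15) for _ in range(200)]
height_noise = [random.gauss(0, 5) for _ in ages]
weight_noise = [random.gauss(0, 4) for _ in ages]
heights = [80 + 6 * a + e for a, e in zip(ages, height_noise)]
weights = [3 * a + e for a, e in zip(ages, weight_noise)]

r_observed = pearson_r(heights, weights)          # strong, driven by age
r_age_removed = pearson_r(height_noise, weight_noise)  # what is left without age
print(round(r_observed, 2), round(r_age_removed, 2))
```

In this simulated data height and weight are strongly correlated, yet once the effect of age is taken out the remaining correlation is close to zero. The strong r comes entirely from the shared dependence on age.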
Outliers and their influence
Effect of outliers: One extreme point can dramatically change the correlation coefficient and the line of best fit.
Worked example
Dataset 1: r=0.90 (clean).
Dataset 2: same data plus one outlier point far from trend.
New r=0.50.
Why did r drop so much?
Explanation
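- An outlier is an extreme value far from the main pattern
- r measures how closely all the points, taken together, follow a straight-line trend
- One outlier away from the trend increases the scatter, weakening the correlation
- Always identify outliers and investigate them; remove one only with a good reason (e.g., a recording error)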
Final answer
Outliers can mask or exaggerate true relationships. Report correlation with and without outliers.
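Reporting the correlation with and without an outlier can be sketched in Python as follows. The data are made up for illustration and are not the datasets from the worked example. The sketch also assumes the line of best fit is the least-squares line, whose gradient is Sxy / Sxx.

```python
from math import sqrt

def r_and_gradient(xs, ys):
    """Return (Pearson's r, least-squares gradient) for paired data."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy), sxy / sxx

# Made-up data lying close to an upward line.
x_clean = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y_clean = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1, 18.0, 19.9]

# Same data plus one point far below the trend.
x_outlier = x_clean + [11]
y_outlier = y_clean + [2.0]

r_without, gradient_without = r_and_gradient(x_clean, y_clean)
r_with, gradient_with = r_and_gradient(x_outlier, y_outlier)

print("without outlier:", round(r_without, 2), round(gradient_without, 2))
print("with outlier:   ", round(r_with, 2), round(gradient_with, 2))
```

A single extra point is enough to pull both r and the gradient of the fitted line down sharply. Quoting both versions lets the reader see how much the conclusion depends on that one observation.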