IB Math AI HL Topic 4.10: Spearman rank correlation coefficient | Notes & Flashcards

IB Math AI HL — Spearman rank correlation coefficient

IB Mathematics AI SL topic covering core concepts and exam-style applications.

Higher Level students should use this topic hub as a map: start with the shared sub-topics, then follow the HL-only extensions and exam-skill links where this topic asks for deeper analysis.

Key concepts in Spearman rank correlation coefficient

Key Idea: Before you can analyse data, you need to understand what kind of data you have and how it was collected. Topic 4.1 covers the vocabulary of statistics: population vs sample, types of data, and sampling methods. Getting these right matters because the method of collection affects the validity of any conclusions you draw.

✅ Types of data

Type	Description	Example
Qualitative (categorical)	Categories or labels. No numerical meaning.	Colour of cars; country of birth; grade (A/B/C)
Quantitative discrete	Countable whole numbers. Gaps between values.	Number of students in a class; goals scored
Quantitative continuous	Any value in a range. Measured, not counted.	Height; temperature; time; mass

✅ Population and sampling

Population: The entire group being studied. A census collects data from every member. Example: testing every light bulb in a factory batch (often impractical).
Sample: A subset of the population. Used when a census is too costly, slow, or destructive. The sample must be representative.
Simple random sampling: Every member has an equal chance of selection. Use a random number generator or lottery. Avoids selection bias but requires a complete list of the population.
Systematic sampling: Select every kth member after a random start. Example: every 10th person on a list. Simple to implement, but can give a biased sample if the list has a repeating pattern that lines up with k.
Stratified sampling: Divide the population into subgroups (strata), then sample from each in proportion to its size. Ensures each group is represented. Example: if 40% are female, 40% of the sample should be female.
Convenience/quota sampling: Non-random. Convenience = whoever is easiest to reach. Quota = fill a pre-set number from each category. Both introduce bias — results may not generalise.
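
The three random methods above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the numbered population of 100 members, the 40/60 split into two strata, and the function names are all hypothetical, not part of the syllabus.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Simple random sampling: every member has an equal chance of selection."""
    rng = random.Random(seed)
    return rng.sample(population, n)

def systematic_sample(population, k, seed=None):
    """Systematic sampling: random start within the first k members, then every kth."""
    rng = random.Random(seed)
    start = rng.randrange(k)
    return population[start::k]

def stratified_sample(strata, n, seed=None):
    """Stratified sampling: sample each stratum in proportion to its size.

    strata maps a stratum name to the list of its members.
    Sizes are rounded, so they may need adjusting by hand to total n.
    """
    rng = random.Random(seed)
    total = sum(len(members) for members in strata.values())
    return {
        name: rng.sample(members, round(len(members) * n / total))
        for name, members in strata.items()
    }

# Hypothetical population: members numbered 1 to 100, 40% in one stratum.
population = list(range(1, 101))
strata = {"female": population[:40], "male": population[40:]}

srs = simple_random_sample(population, 10, seed=1)
sys_s = systematic_sample(population, 10, seed=1)
strat = stratified_sample(strata, 20, seed=1)
```

With these assumed numbers the stratified sample takes 8 members from the 40% stratum and 12 from the other, matching the "40% of the sample" rule.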

Reliability: A random sample that is large enough tends to give reliable results, because random selection keeps bias low and a larger sample reduces chance variation. Non-random methods are faster but less reliable.
Outlier impact: A single extreme value (outlier) can distort the mean significantly. Always identify outliers before drawing conclusions.
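
A small numerical sketch of how one outlier shifts the mean. The data set is made up for illustration, and the median is shown only as a point of comparison.

```python
from statistics import mean, median

goals = [1, 2, 2, 3, 3, 4]      # hypothetical data: goals scored in six matches
with_outlier = goals + [30]     # the same data with one extreme value added

mean_before = mean(goals)             # 2.5
mean_after = mean(with_outlier)       # 45 / 7, about 6.43
median_before = median(goals)         # 2.5
median_after = median(with_outlier)   # 3

print(mean_before, mean_after)
print(median_before, median_after)
```

One extra value more than doubles the mean here, while the median barely moves.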

Paper 1: Questions often ask you to identify the data type or explain why a sampling method is biased. Write a specific reason: 'convenience sampling means people who are easy to reach are over-represented' earns the mark; vague answers do not.
Paper 2: You may need to calculate the sample size for each stratum. Use:
$n_{\text{stratum}} = \frac{\text{stratum size}}{\text{population size}}\times \text{total sample size}$

IB-style question [6 marks]

A town with 2000 adult residents is surveyed about a proposed cycle lane. The residents are grouped by age: 800 are under 30, 700 are aged 30 to 60, and 500 are over 60. A stratified sample of 80 residents is taken.
(a) Explain why stratified sampling is more suitable here than simple random sampling.
(b) Find the number of residents that should be selected from each age group.
(c) State one type of data being collected (support for the cycle lane) and classify it.

Step by step:

(a) Stratified sampling guarantees every age group is represented in proportion to its size, so the views of older and younger residents are not under- or over-counted by chance.
$\text{each stratum sampled in proportion} \to \text{representative}$
(b) Use the stratified-sample rule for each group. State it first.
$n_{\text{group}} = \frac{\text{group size}}{2000}\times 80$
Under 30: 800 out of 2000.
$\frac{800}{2000}\times 80 = 32$
Aged 30–60: 700 out of 2000.
$\frac{700}{2000}\times 80 = 28$
Over 60: 500 out of 2000 (or 80 − 32 − 28). Check the total.
$\frac{500}{2000}\times 80 = 20,\qquad 32+28+20 = 80\ \checkmark$
(c) 'Support for the cycle lane' (yes / no) is a category, not a number.
$\text{support (yes/no)} \to \text{qualitative (categorical)}$

Final answer:

(a) It keeps each age group represented in proportion, avoiding chance over-representation. (b) 32 under 30, 28 aged 30–60, 20 over 60. (c) Support (yes/no) is qualitative (categorical) data.
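
The arithmetic in part (b) can be checked with a short Python sketch. The function name and dictionary labels are illustrative choices; the numbers are those given in the question.

```python
def stratum_sizes(strata, sample_size):
    """Apply (stratum size / population size) x total sample size to each stratum."""
    population = sum(strata.values())
    return {name: size * sample_size / population for name, size in strata.items()}

town = {"under 30": 800, "30 to 60": 700, "over 60": 500}
allocation = stratum_sizes(town, 80)

print(allocation)
print(sum(allocation.values()))   # should equal the total sample size, 80
```

Here every result is a whole number. When it is not, round each value and then check that the rounded sizes still add up to the required total.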
