You can't survey everyone — so sample fairly: A school has 1200 students but you only have time to survey 60. The whole 1200 is the population; the 60 you actually ask are the sample. The goal is a sample that looks like the population in miniature, so your conclusions generalise.
How you pick those 60 is the sampling method — and a careless method quietly skews every number that follows.
Which method gives a fair picture, and which one is fastest but biased?
The five methods at a glance
- Simple random — everyone has an equal chance (names from a hat / GDC random numbers)
- Systematic — order the list, pick every kᵗʰ person from a random start
- Stratified — split into groups (strata), sample each in proportion to its size
- Quota — like stratified but the interviewer chooses who fills each quota (non-random)
- Convenience — just ask whoever is easiest (most biased, least representative)
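To see the two random methods in action on the school example (1200 students, sample of 60), here is a minimal Python sketch. It assumes student ID numbers 1 to 1200 stand in for the names on the register; the function names and the fixed seed are illustrative choices, not part of any syllabus.

```python
import random

def simple_random_sample(population, n, rng):
    """Every member has an equal chance; nobody is picked twice."""
    return rng.sample(population, n)

def systematic_sample(population, n, rng):
    """Pick every k-th member of an ordered list from a random start."""
    k = len(population) // n      # sampling interval (1200 // 60 = 20)
    start = rng.randrange(k)      # random start among the first k positions
    return population[start::k][:n]

# Hypothetical register: ID numbers 1..1200 stand in for student names.
students = list(range(1, 1201))
rng = random.Random(2024)         # fixed seed so the run is repeatable

random_60 = simple_random_sample(students, 60, rng)
systematic_60 = systematic_sample(students, 60, rng)

print(len(random_60), len(systematic_60))   # 60 60
print(systematic_60[1] - systematic_60[0])  # 20
```

Quota and convenience sampling are left out on purpose: the interviewer's choice replaces the random step, so there is nothing random to code.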
IB-style question — stratified sample
A college has 600 Year 12 and 400 Year 13 students. A stratified sample of 50 students is taken across the two year groups.
Find how many Year 12 students should be in the sample.
Step by step
- Stratified means each group is sampled in proportion to its size. First find the total population: 600 + 400 = 1000.
- Multiply Year 12's share of the population by the sample size: 600/1000 × 50 = 30.
- (Check: Year 13 gives 400/1000 × 50 = 20, and 30 + 20 = 50.)
Final answer
30 Year 12 students (and 20 Year 13). Stratifying keeps the sample's mix the same as the college's.
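The proportional split in this question can be checked with a short Python sketch. The function name `stratified_allocation` is made up for illustration; it simply applies "group size ÷ total × sample size" to each stratum.

```python
def stratified_allocation(strata_sizes, sample_size):
    """Share of the sample for each stratum, in proportion to its size."""
    total = sum(strata_sizes.values())
    return {name: sample_size * size / total
            for name, size in strata_sizes.items()}

college = {"Year 12": 600, "Year 13": 400}
allocation = stratified_allocation(college, 50)

print(allocation)                # {'Year 12': 30.0, 'Year 13': 20.0}
print(sum(allocation.values()))  # 50.0
```

When the shares are not whole numbers, they need rounding, and it is worth checking that the rounded values still add up to the sample size.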
Two different ways data can go wrong: Imagine a bathroom scale.
If it reads 70.1, 70.0, 70.2 kg every time you step on it, it is reliable — it gives the same answer consistently.
But if your true mass is 65 kg, that scale is not valid: every reading is about 5 kg too high, so it reliably gives the wrong answer.
Reliability = consistency (small random error). Validity = measuring what you intend (no systematic error / bias). A measure can be reliable but not valid; to be trustworthy it must be both.
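The bathroom-scale example can be put into numbers with a rough Python sketch. It assumes the spread of the readings (sample standard deviation) is a fair stand-in for reliability, and the gap between the mean reading and the true mass is a fair stand-in for validity; the names `spread` and `bias` are illustrative.

```python
from statistics import mean, stdev

readings = [70.1, 70.0, 70.2]   # the three scale readings, in kg
true_mass = 65.0                # the true mass, in kg

spread = stdev(readings)            # small spread -> reliable
bias = mean(readings) - true_mass   # large offset -> not valid

print(f"spread = {spread:.1f} kg")  # spread = 0.1 kg
print(f"bias   = {bias:.1f} kg")    # bias   = 5.1 kg
```

A spread of about 0.1 kg next to a bias of about 5 kg is the "reliable but not valid" pattern.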
IB-style question — spot the bias
A researcher wants the average daily screen-time of all teenagers in a town. She stands outside a gaming cafe at 6 pm and asks everyone who walks out.
Identify the sampling method and explain why the estimate is biased.
Step by step
- She asks whoever is easiest to reach at one place and time — no random selection.
- People leaving a gaming cafe are likely to be far heavier screen-users than typical teenagers, so one end of the population is over-represented.
- So the mean screen-time will be pushed up — a systematic over-estimate.
Final answer
Convenience sampling. The sample over-represents heavy screen-users, so the estimate is biased upwards (too high) and is not a valid estimate of the average for all teenagers in the town.
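To see how large the upward bias could be, here is a toy Python sketch. Every number in it is invented purely for illustration (a town of 100 teenagers, 90 typical ones at 3 hours a day and 10 gaming-cafe regulars at 8 hours a day); none of it comes from the question.

```python
from statistics import mean

# Invented town, for illustration only: hours of screen-time per day.
typical = [3.0] * 90         # 90 typical teenagers
cafe_regulars = [8.0] * 10   # 10 gaming-cafe regulars
population = typical + cafe_regulars

population_mean = mean(population)       # what the researcher wants
convenience_mean = mean(cafe_regulars)   # what asking outside the cafe gives

print(population_mean)                     # 3.5
print(convenience_mean)                    # 8.0
print(convenience_mean - population_mean)  # 4.5
```

Asking more people outside the same cafe would not help: the sample would grow, but the over-estimate would stay.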
Reliable vs valid in one line: Reliable but not valid → a clock that's always exactly 10 minutes fast (consistent, but wrong).
Valid but not reliable → a clock with the right average time but that jumps around randomly.
You want both: consistent AND correct.