A test weighs the evidence against H₀
Imagine a coffee chain claims its cups hold 250 ml on average. A sample comes out a bit low, but is that a real shortfall or just random cup-to-cup variation?
A hypothesis test answers this. You set up two rival statements:
H₀ (null): nothing unusual — the mean is exactly the claimed value (μ = 250).
H₁ (alternative): the claim you suspect (μ < 250, or μ ≠ 250).
The GDC returns a p-value: the probability of getting data at least this extreme if H₀ were true.
The decision rule:
p < significance level → reject H₀ (the data are too surprising for H₀ to stand).
p ≥ significance level → do not reject H₀ (the data are consistent with H₀).
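The decision rule can be written as a very small function. This is only an illustrative sketch: the name `decide` and the default α = 0.05 are assumptions made for the example, not part of any GDC or library.

```python
def decide(p_value: float, alpha: float = 0.05) -> str:
    """Apply the decision rule: reject H0 only when p is strictly below alpha."""
    if not (0.0 <= p_value <= 1.0):
        raise ValueError("p_value must lie in [0, 1]")
    if not (0.0 < alpha < 1.0):
        raise ValueError("alpha must lie strictly between 0 and 1")
    return "reject H0" if p_value < alpha else "do not reject H0"


print(decide(0.018))  # p is below 0.05
print(decide(0.16))   # p is not below 0.05
```

Note that a p-value exactly equal to the significance level falls in the 'do not reject' branch, matching the p ≥ significance level line of the rule.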
One-tailed vs two-tailed
One-tailed (μ < 250 or μ > 250): you only care about one direction. Use when the claim is directional ('cups are short-filled').
Two-tailed (μ ≠ 250): you care about a difference either way. Use when the claim is just 'the mean has changed'.
The GDC needs to know which — pick the tail that matches H₁.
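To see why the choice of tail matters, the sketch below computes the p-value for each option from the same standardised test statistic. It assumes a statistic that follows the standard normal distribution (as in a z-test), and the value z = -1.8 is hypothetical, chosen only for illustration. The function name `z_p_value` and its tail labels are likewise assumptions.

```python
from statistics import NormalDist


def z_p_value(z: float, tail: str) -> float:
    """p-value for a standard normal statistic; tail is 'lower', 'upper' or 'two'."""
    cdf = NormalDist().cdf  # standard normal: mean 0, standard deviation 1
    if tail == "lower":   # H1: mu < claimed value
        return cdf(z)
    if tail == "upper":   # H1: mu > claimed value
        return 1.0 - cdf(z)
    if tail == "two":     # H1: mu != claimed value
        return 2.0 * (1.0 - cdf(abs(z)))
    raise ValueError("tail must be 'lower', 'upper' or 'two'")


z = -1.8  # hypothetical statistic, not taken from the text
print(round(z_p_value(z, "lower"), 4))
print(round(z_p_value(z, "two"), 4))
```

For this statistic the lower-tailed p-value is about 0.036 and the two-tailed p-value is twice that, about 0.072. At the 5% level the same data would lead to rejecting H₀ in the one-tailed test but not in the two-tailed test, which is why the tail must match H₁.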
IB-style question — set up and conclude
A machine should fill bottles to 500 ml. A consumer group suspects under-filling and tests at the 5% significance level. Their test gives p = 0.018.
State the hypotheses and the conclusion in context.
Step by step
- 'Suspects under-filling' is directional, so this is a one-tailed (lower) test: H₀: μ = 500, H₁: μ < 500.
- Compare the p-value to α = 0.05.
- Since p < α, reject H₀.
Final answer
Reject H₀. There is significant evidence (at the 5% level) that the machine under-fills the bottles — the mean is below 500 ml.
Which test? Look at the standard deviation
Both tests compare means, but they differ in what you know about the spread:
z-test — use when the population standard deviation σ is given (known).
t-test — use when σ is unknown and you only have the sample standard deviation. This is the common case in real data, so the t-test dominates AI HL.
Then count the groups:
One-sample — compare one group's mean to a fixed claimed value (e.g. 'is the mean weight 250 g?').
Two-sample — compare two independent groups' means (e.g. 'do brand A and brand B last equally long?').
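The two questions above (is σ known, and how many groups are there?) can be combined into one lookup. This is a minimal sketch of that decision; the function `choose_test` and its parameter names are hypothetical.

```python
def choose_test(sigma_known: bool, groups: int) -> str:
    """Name the test from two facts: is sigma known, and how many groups are compared."""
    if groups not in (1, 2):
        raise ValueError("this sketch only covers one or two groups")
    kind = "z-test" if sigma_known else "t-test"            # spread known or not
    size = "one-sample" if groups == 1 else "two-sample"    # number of groups
    return f"{size} {kind}"


print(choose_test(sigma_known=False, groups=1))
print(choose_test(sigma_known=False, groups=2))
```

The first call corresponds to comparing one sample mean with a claimed value when only the sample standard deviation is available; the second corresponds to comparing two independent groups.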
IB-style question — one-sample t-test
A nutritionist claims a snack bar contains 200 kcal on average. A sample of 12 bars has mean 207 kcal and sample standard deviation 9 kcal. The population σ is unknown. Test at the 5% level whether the mean differs from 200 kcal.
Step by step
- σ unknown → t-test. 'Differs' (either way) → two-tailed: H₀: μ = 200, H₁: μ ≠ 200.
- Enter the summary stats into the GDC's one-sample t-test (μ₀ = 200, x̄ = 207, sₙ₋₁ = 9, n = 12, two-tailed).
- The GDC returns t ≈ 2.69 and p ≈ 0.021.
- Compare to α = 0.05: since p < α, reject H₀.
Final answer
Reject H₀. There is significant evidence at the 5% level that the mean calorie content differs from the claimed 200 kcal (the sample suggests it is higher).
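As a cross-check on the GDC, the snack bar calculation can be reproduced from the summary statistics. The sketch below uses only the Python standard library and approximates the t-distribution tail area by Simpson's rule, which is an assumption of this example rather than how a GDC works internally. The helper names (`t_pdf`, `t_area`, `one_sample_t`) are hypothetical.

```python
import math


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    # math.gamma overflows for very large df; fine for small samples like this one.
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def t_area(a: float, b: float, df: int, steps: int = 2000) -> float:
    """Area under the t density between a and b (Simpson's rule, steps must be even)."""
    h = (b - a) / steps
    total = t_pdf(a, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(a + i * h, df)
    return total * h / 3


def one_sample_t(mean: float, mu0: float, s: float, n: int, tail: str = "two"):
    """Return (t, p) for a one-sample t-test from summary statistics."""
    t = (mean - mu0) / (s / math.sqrt(n))   # standardised distance from the claim
    df = n - 1
    beyond = 0.5 - t_area(0.0, abs(t), df)  # P(T > |t|), using symmetry about 0
    if tail == "two":
        p = 2 * beyond
    elif tail == "upper":
        p = beyond if t >= 0 else 1 - beyond
    elif tail == "lower":
        p = beyond if t <= 0 else 1 - beyond
    else:
        raise ValueError("tail must be 'two', 'upper' or 'lower'")
    return t, p


# Snack bar example: claimed mean 200, sample mean 207, s = 9, n = 12, two-tailed.
t, p = one_sample_t(mean=207, mu0=200, s=9, n=12, tail="two")
print(round(t, 2), round(p, 3))
```

With these inputs the statistic is t = 7 / (9 / √12) ≈ 2.69 on 11 degrees of freedom, and the two-tailed p-value comes out at roughly 0.021, which is below 0.05 and so agrees with the decision to reject H₀.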
IB-style question — two-sample t-test
Two factories produce LED bulbs. A sample from factory A lasts longer on average than a sample from factory B. A two-sample t-test of H₀: μA = μB against H₁: μA > μB, carried out at the 5% significance level, gives p = 0.16.
State the conclusion in context.
Step by step
- Two independent groups, σ unknown → two-sample t-test (one-tailed, since H₁ is directional).
- Compare the p-value to α = 0.05.
- Since 0.16 ≥ 0.05, do not reject H₀.
Final answer
Do not reject H₀. There is not enough evidence at the 5% level to conclude that factory A's bulbs last longer — the observed difference could be due to chance.