Key Idea: The chi-squared test of independence (χ²) tests whether two categorical variables are independent or associated. You set up a contingency table of observed frequencies, calculate the frequencies you would expect if the variables were independent, and compare the two. A p-value below the significance level is evidence against independence: there is a statistically significant association between the variables.
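As a compact reference, the comparison described above can be written as follows. The symbols are notation assumed here, not taken from the notes: O is an observed frequency, E the matching expected frequency, R and C the row and column totals, N the grand total, and r and c the numbers of rows and columns.

```latex
E_{ij} = \frac{R_i \, C_j}{N},
\qquad
\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}},
\qquad
\text{df} = (r - 1)(c - 1)
```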
✅ Hypothesis test structure
📊 GDC method
Example: 2×2 contingency table, Gender vs Preferred sport.
Observed: Male/Football = 40, Male/Tennis = 20, Female/Football = 30, Female/Tennis = 50.
Row totals: Male = 60, Female = 80. Column totals: Football = 70, Tennis = 70. Grand total = 140.
Expected (Male, Football) = (60 × 70)/140 = 30. In the same way, expected (Male, Tennis) = 30, (Female, Football) = 40 and (Female, Tennis) = 40.
GDC gives: χ² ≈ 11.7, p ≈ 0.00064. Since p < 0.05, reject H₀. There is evidence that gender and sport preference are associated.
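The arithmetic behind the GDC output can be reproduced by hand or in a few lines of code. The following is a minimal Python sketch, not the GDC's own method. The function names are illustrative, and the p-value shortcut using `math.erfc` is only valid for 1 degree of freedom, which is the case for a 2×2 table.

```python
import math

# Observed counts from the example: rows = Male, Female; columns = Football, Tennis.
observed = [[40, 20],
            [30, 50]]

def expected_frequencies(table):
    """Expected count for each cell: (row total * column total) / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

def chi_squared(table):
    """Sum of (observed - expected)^2 / expected over every cell."""
    expected = expected_frequencies(table)
    return sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, expected)
               for o, e in zip(obs_row, exp_row))

def p_value_df1(stat):
    """Upper-tail probability for a chi-squared variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2))

expected = expected_frequencies(observed)
stat = chi_squared(observed)
p = p_value_df1(stat)

print("Expected:", expected)
print("chi-squared:", round(stat, 2))
print("p-value:", round(p, 5))
```

Comparing the printed expected table and test statistic with the GDC's expected matrix and χ² value is a quick way to catch a mistyped entry in the observed matrix.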
The conclusion must always be in context: name the two variables. 'Reject H₀' alone is not a full answer. A significant result only gives evidence that an association exists; it does not say how strong the association is or what form it takes.
Paper 2 (GDC allowed): State both hypotheses in full before running the test. After: write χ², p-value, compare with α, and state conclusion with the variable names. Check expected frequencies: After running the test on GDC, view the expected matrix and verify all values ≥ 5. If not, state this as a limitation of the test.
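The expected-frequency check can be sketched in Python as below. The table is hypothetical and chosen only to trigger the warning, the function names are illustrative, and the threshold of 5 is the one stated above.

```python
def expected_frequencies(table):
    """Expected count for each cell: (row total * column total) / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

def cells_below_threshold(table, threshold=5):
    """Return (row index, column index, expected value) for each cell under the threshold."""
    expected = expected_frequencies(table)
    return [(i, j, e)
            for i, row in enumerate(expected)
            for j, e in enumerate(row)
            if e < threshold]

# Hypothetical observed table, used only to illustrate a failed check.
hypothetical = [[12, 3],
                [9, 6]]

flagged = cells_below_threshold(hypothetical)
if flagged:
    print("Expected frequencies below 5:", flagged)
else:
    print("All expected frequencies are at least 5.")
```

Note that the check applies to the expected frequencies, not the observed ones: an observed count of 3 is acceptable as long as its expected value is at least 5.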
IB-style question [7 marks]
A college investigates whether a student's faculty is associated with how they travel to college. A sample of 150 students is classified by faculty (Arts, Science) and by main mode of travel (walk, cycle, bus):
Arts: walk 24, cycle 12, bus 24 (60 students).
Science: walk 26, cycle 28, bus 36 (90 students).
A χ² test for independence is carried out at the 5% significance level.
(a) Write down the null hypothesis.
(b) Show that the expected number of Arts students who walk is 20.
(c) Write down the number of degrees of freedom.
(d) The test gives a p-value of 0.223. State the conclusion in context.
Step by step:
(a) The null hypothesis is that the two variables are independent.
(b) Write down the expected-frequency formula before substituting numbers: expected frequency = (row total × column total) / grand total.
The Arts row total is 60, the walk column total is 24 + 26 = 50, and the grand total is 150.
(c) Degrees of freedom use the table dimensions (2 rows, 3 columns): df = (rows − 1)(columns − 1) = (2 − 1)(3 − 1) = 2.
(d) Compare the p-value with the significance level. Here 0.223 > 0.05, so do not reject the null hypothesis.
(a) H₀: faculty and mode of travel are independent. (b) E = (60 × 50)/150 = 20. (c) df = 2. (d) Since p = 0.223 > 0.05, there is insufficient evidence at the 5% level to conclude that a student's faculty and their mode of travel are associated.
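The answers to (b), (c) and (d) can be cross-checked with a short Python sketch. It is an illustration rather than part of the mark scheme: the variable names are assumptions, and the p-value formula exp(−χ²/2) holds only when there are exactly 2 degrees of freedom, as there are here.

```python
import math

# Observed counts from the question: rows = Arts, Science; columns = walk, cycle, bus.
observed = [[24, 12, 24],
            [26, 28, 36]]

row_totals = [sum(row) for row in observed]        # [60, 90]
col_totals = [sum(col) for col in zip(*observed)]  # [50, 40, 60]
grand_total = sum(row_totals)                      # 150

# Expected frequency for each cell under independence.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Degrees of freedom: (rows - 1) * (columns - 1).
df = (len(observed) - 1) * (len(observed[0]) - 1)

# Test statistic: sum of (O - E)^2 / E over all six cells.
stat = sum((o - e) ** 2 / e
           for obs_row, exp_row in zip(observed, expected)
           for o, e in zip(obs_row, exp_row))

# Upper-tail probability; this closed form is valid only for df = 2.
p = math.exp(-stat / 2)

print("Expected (Arts, walk):", expected[0][0])
print("df:", df)
print("chi-squared:", round(stat, 3))
print("p-value:", round(p, 3))
print("Reject H0 at 5%:", p < 0.05)
```

The sketch gives a test statistic of 3.0 and reproduces the p-value of 0.223 quoted in part (d), so the decision not to reject H₀ is consistent with the data in the table.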