Key Idea: Topic 5.1 is not new biology — it is the data-handling and maths toolkit that runs through every experiment and every data question. It is assessed directly on Paper 1B (the short data-based section) and underpins the whole Internal Assessment (IA). Unlike Themes A–D, this topic is quantitative: you will design experiments, calculate magnification, percentage change and rates, find means and standard deviations, run a chi-squared or t-test, judge correlation vs causation, compute diversity and population indices, and pick the right sampling or lab technique. The good news: on the real paper the formulae and critical-value tables are GIVEN — so the marks are for method and reasoning, not memorising sums. This page recaps every skill, gives you a one-stop formula toolkit, and ends with mixed calculation questions.
Show your working — method marks are awarded for the substitution even if the final number slips. Always quote units (µm, g h⁻¹, %) and the right number of decimal places / significant figures. Back every conclusion with the data — quote the value you read off, don't just describe a shape.
🧮 Formula toolkit (everything in one place)
Every key formula for this topic is collected below. On Paper 1B and in the IA these are provided — your job is to substitute correctly, show working, and interpret the answer. Skim this table first, then each section shows the formula in action.
| Skill | Formula | When you use it |
|---|---|---|
| Magnification | $M = \dfrac{\text{image size}}{\text{actual size}}$ | Rearranges 3 ways: image = M × actual; actual = image ÷ M. Convert mm ↔ µm first (×1000). |
| Percentage change | $\%\,\text{change} = \dfrac{\text{final} - \text{initial}}{\text{initial}} \times 100$ | Always divide by the ORIGINAL (initial) value; a fall is negative. |
| Rate from a graph | $\text{rate} = \text{gradient} = \dfrac{\Delta y}{\Delta x}$ | Read two points off the line; rate of reaction = how steep the curve is. |
| Mean | $\bar{x} = \dfrac{\sum x}{n}$ | Add all values, divide by how many — the central/typical value. |
| Significance (overlap rule) | error bars OVERLAP → NOT significant; clear GAP → significant | The fast visual check before any formal test. |
| Chi-squared (χ²) | $\chi^2 = \sum \dfrac{(O-E)^2}{E}$ | Genetics ratios & association. Compare to the critical value at df, p = 0.05. |
| t-test | compare two means; significant if $t_{\text{calc}} \ge t_{\text{crit}}$ | Two means. Critical value read at df, p = 0.05 (formula & table are GIVEN). |
| Coefficient of determination | $R^2$ (0 → 1); e.g. $R^2 = 0.92$ → 92% of the variation is explained by the fit | How well the trend line fits; $R^2 = r^2$. |
| Simpson's reciprocal index | $D = \dfrac{N(N-1)}{\sum n(n-1)}$ | Biodiversity. $N$ = total of ALL individuals; $n$ = individuals of one species; higher $D$ = more diverse. |
| Lincoln (mark–recapture) index | $N = \dfrac{n_1 \times n_2}{n_3}$ | Population estimate. $n_1$ marked & released, $n_2$ second catch, $n_3$ marked in second catch. |
| Rf (chromatography) | $Rf = \dfrac{\text{distance moved by spot}}{\text{distance moved by solvent}}$ | Always between 0 and 1; identifies a pigment/molecule. |
Magnification triangle: $M = \dfrac{\text{image}}{\text{actual}}$ rearranges to image = M × actual and actual = image ÷ M. Cover the one you want. Percentage change is always ÷ the ORIGINAL value: $\dfrac{\text{final} - \text{initial}}{\text{initial}} \times 100$ — a decrease comes out negative.
🧪 Experimental design: variables & controls (5.1.1)
Every good experiment changes one thing and measures one thing while keeping everything else the same. The independent variable (IV) is what you deliberately change; the dependent variable (DV) is what you measure; controlled variables are kept constant so they cannot affect the result; and a control treatment (often a set-up with the treatment left out) gives a baseline to compare against. Naming all four, including a sensible control, is exactly what Paper 1B asks for.
| Type of variable | What it is | Worked example (light → photosynthesis) |
|---|---|---|
| Independent (IV) | The ONE thing you deliberately change | Light intensity (the distance of the lamp) |
| Dependent (DV) | The thing you measure as a result | Bubbles of oxygen per minute |
| Controlled | Variables kept CONSTANT so they don't affect the DV | Temperature, CO₂ level, same pondweed, same time |
| Control treatment | A comparison with NO treatment (or a known result) | A tube kept in the dark — shows the effect is due to light |
Key Idea: 'I change the IV, I depend on the DV.' Whatever you put on the x-axis is usually the IV; whatever you measure on the y-axis is the DV. Everything else must be controlled — list the obvious ones (temperature, pH, time, volume, same organism).
🔄 Reliability, replicates & evaluating method (5.1.2)
Reliable results are repeatable; valid results actually measure what they claim to. You improve reliability by taking replicates and a mean, and validity by controlling variables and including a control. Evaluating a method means naming a specific limitation (too few repeats, a narrow range, one site, a short time) and a matched improvement — vague answers like 'do it more carefully' score nothing.
How to make a method more reliable / valid
- Repeat (replicates) and take a mean — repeats reduce the effect of random error and let you spot anomalies.
- Control all other variables so only the IV affects the DV — this is what makes the result valid.
- Include a control treatment so any change can be attributed to the IV, not to something else.
- Use precise instruments and standard methods (same observer, same timing) to cut measurement error.
- Evaluate honestly: name a real limitation (small sample, narrow range, one site) and a matched improvement.
Reliable = repeatable (same result if you do it again) → fixed by more replicates. Valid = measures the right thing with no confounding variable → fixed by controlling variables and adding a control.
🔬 Magnification, scale bars & microscope measurement (5.1.3)
Magnification links the size you see to the real size: $M = \dfrac{\text{image size}}{\text{actual size}}$. The same triangle rearranges to find a real size ($\text{actual} = \text{image} \div M$) or an image size ($\text{image} = M \times \text{actual}$). The classic trap is units: measure the image and the scale bar in the same unit. Remember 1 mm = 1000 µm, so a 40 mm bar that represents 80 µm gives $M = 40\,000 \div 80 = ×500$.
Magnification from a scale bar: measure the bar on the image (40 mm = 40 000 µm), read what it stands for in real life (80 µm), then DIVIDE — magnification = ×500. The same triangle rearranges to give the real size of anything else on the image.
Convert before you divide: × 1000 to go mm → µm, ÷ 1000 to go µm → mm. Measure the scale bar with a ruler (in mm), turn it into µm, then divide by what the bar says it represents. Magnification has no units (it is a ratio) and is written with a × (e.g. ×500).
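The scale-bar method can also be checked with a few lines of code. This is only a minimal sketch: the function names are made up for illustration, and the 25 mm structure in the second call is a hypothetical measurement, not one from this page.

```python
MICROMETRES_PER_MM = 1000  # 1 mm = 1000 µm


def magnification_from_scale_bar(bar_length_mm, bar_represents_um):
    """Magnification = image size / actual size, with both in µm."""
    bar_length_um = bar_length_mm * MICROMETRES_PER_MM  # convert mm -> µm first
    return bar_length_um / bar_represents_um


def actual_size_um(image_size_mm, magnification):
    """Rearranged triangle: actual = image / M (answer in µm)."""
    image_size_um = image_size_mm * MICROMETRES_PER_MM
    return image_size_um / magnification


# The scale bar from the text: 40 mm on the page, labelled 80 µm.
m = magnification_from_scale_bar(40, 80)
print(f"Magnification = ×{m:.0f}")  # Magnification = ×500

# Hypothetical structure measuring 25 mm on the same image.
print(f"Actual size = {actual_size_um(25, m):.0f} µm")  # Actual size = 50 µm
```

Both functions convert to µm before dividing, which is the step most often missed in exam answers.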
📊 Reading & interpreting graphs, charts & tables (5.1.4)
This is the single most-tested data skill (around 90 logged questions). Most marks are for reading off a value accurately or describing a trend with figures. Match your answer to the command term: state/identify wants one value (with units); describe wants the shape of the trend; compare and contrast wants a similarity and a difference; deduce/explain wants a conclusion backed by quoted data.
| Command term | What examiners want | How to score it |
|---|---|---|
| State / Identify / Read off | ONE value or label straight from the data | Quote the number WITH its unit; no explanation needed |
| Describe | The overall trend / shape | Direction + a quoted value: 'rises steeply to ~40, then plateaus' |
| Compare & contrast | Two series — a similarity AND a difference | One 'both…' point and one 'whereas…' point, with figures |
| Deduce / Explain | A conclusion BACKED by the data | State the conclusion, then quote the data that supports it |
Key Idea: Use the pattern direction → quoted figures → any change in pattern: e.g. 'the rate rises steeply from 0 to about 40 units, then plateaus after 30 °C.' Always quote at least one value with its unit — a bare 'it goes up' rarely scores.
📈 Percentage change, ratios & rate from graphs (5.1.5)
Percentage change = $\dfrac{\text{final} - \text{initial}}{\text{initial}} \times 100$ — always divide by the original value; a fall is negative. Rate is the gradient of a graph: $\text{rate} = \dfrac{\Delta y}{\Delta x}$. For a curve, the rate at a point is the slope of the tangent; over an interval, pick two clear points and divide the rise by the run (remember the time unit, e.g. per minute).
An enzyme reaction releases 8 cm³ of gas in the control and 20 cm³ with an activator. $\%\,\text{change} = \dfrac{20 - 8}{8} \times 100 = \dfrac{12}{8} \times 100 = +150\%$. Dividing by the wrong value (20 instead of 8) is the usual error — always use the starting value.
A steeper line means a faster rate. Read two points off the line, do $\dfrac{\Delta y}{\Delta x}$, and attach the unit (e.g. cm³ min⁻¹). For a reaction that slows down, the curve flattens — the gradient (rate) falls toward zero.
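If you want to check your arithmetic, the two calculations can be sketched in code as follows. The function names are illustrative, and the two graph points in the second call are hypothetical readings, not data from this page.

```python
def percentage_change(initial, final):
    """(final - initial) / initial * 100; a fall comes out negative."""
    return (final - initial) / initial * 100


def rate_from_two_points(x1, y1, x2, y2):
    """Gradient = change in y / change in x between two points on a line."""
    return (y2 - y1) / (x2 - x1)


# Values from the text: 8 cm³ in the control, 20 cm³ with the activator.
print(percentage_change(8, 20))  # 150.0

# Hypothetical points read off a graph: (2 min, 6 cm³) and (5 min, 18 cm³).
print(rate_from_two_points(2, 6, 5, 18))  # 4.0, i.e. 4 cm³ min⁻¹
```

Note that `percentage_change` always divides by the first argument (the initial value), which mirrors the rule in the text.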
📏 Mean, standard deviation & error bars (5.1.6)
Descriptive statistics summarise a data set: the mean is the average, the median the middle value, the mode the commonest value, the range the spread from lowest to highest, and the standard deviation (s) how tightly the data cluster around the mean. On a chart, error bars usually show ± one standard deviation. The quick significance read is the overlap rule: if two bars' error bars overlap, the difference is probably not significant; if there is a clear gap, it probably is. This is a visual rule of thumb, so back it up with a formal test (such as a t-test) when the question asks for one. Small error bars also signal reliable/precise results.
| Statistic | What it tells you | How to read / find it |
|---|---|---|
| Mean (x̄) | The average — a typical central value | Add the values, divide by n: $\bar{x} = \dfrac{\sum x}{n}$ |
| Median | The middle value when ordered (the box-plot line) | Less affected by outliers than the mean |
| Mode | The most common value / the tallest bar | Read the most frequent class straight off a frequency graph |
| Range | Spread from lowest to highest | Largest value − smallest value |
| Standard deviation (s) | How spread out the data are around the mean | Small s = tightly clustered; large s = widely spread |
| Error bar overlap | A quick significance check | Overlap → not significant; clear gap → significant |
Three means, each capped with an error bar of ± one standard deviation. Blue and purple OVERLAP → that difference is NOT significant. Yellow sits clearly higher with a CLEAR GAP → that difference IS significant. The overlap rule is the quickest read of significance on a bar chart.
Key Idea: Error bars overlap → the means could be the same → difference NOT significant. Clear gap (no overlap) → the difference IS significant. Big error bars = lots of spread = less reliable; small bars = tight data = more reliable.
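The mean, the standard deviation and the overlap rule can be sketched with Python's standard `statistics` module. The three data sets below are hypothetical replicate readings invented for illustration, and the overlap check is only the quick visual rule, not a formal significance test.

```python
from statistics import mean, stdev


def summarise(values):
    """Return (mean, sample standard deviation) for one set of replicates."""
    return mean(values), stdev(values)


def error_bars_overlap(values_a, values_b):
    """True if the mean ± 1 SD ranges of the two groups overlap."""
    mean_a, sd_a = summarise(values_a)
    mean_b, sd_b = summarise(values_b)
    low_a, high_a = mean_a - sd_a, mean_a + sd_a
    low_b, high_b = mean_b - sd_b, mean_b + sd_b
    return low_a <= high_b and low_b <= high_a


# Hypothetical replicate readings (e.g. leaf length in mm), not real data.
group_a = [10, 12, 11, 13, 14]
group_b = [11, 13, 12, 14, 15]
group_c = [20, 22, 21, 23, 24]

print(error_bars_overlap(group_a, group_b))  # True  -> probably not significant
print(error_bars_overlap(group_a, group_c))  # False -> clear gap
```

`stdev` gives the sample standard deviation (dividing by n − 1), which is the usual choice when your replicates are a sample rather than the whole population.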
A box-and-whisker plot shows the same spread a different way — read the median off the line inside the box, the middle 50% (IQR) as the box, the range as the whiskers, and spot any outlier as a separate marked point.
A box-and-whisker plot packs five numbers into one shape: the line inside the box is the MEDIAN, the box spans the lower-to-upper quartile (the middle 50% of the data = the IQR), the whiskers reach the minimum and maximum, and any separate marked point is an OUTLIER. Read the median straight off the line inside the box.
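The five numbers behind a box plot can be found with a short sketch like the one below. The data are hypothetical, and the quartile convention used here (Python's "inclusive" method) is an assumption: calculators and textbooks sometimes use slightly different rules, so small differences in the quartiles are possible.

```python
from statistics import median, quantiles


def five_number_summary(values):
    """Minimum, lower quartile, median, upper quartile and maximum."""
    ordered = sorted(values)
    q1, _, q3 = quantiles(ordered, n=4, method="inclusive")
    return {
        "min": ordered[0],
        "q1": q1,
        "median": median(ordered),
        "q3": q3,
        "max": ordered[-1],
    }


# Hypothetical data set of nine measurements.
summary = five_number_summary([3, 5, 6, 7, 8, 9, 10, 12, 14])
print(summary["median"])               # 8
print(summary["q3"] - summary["q1"])   # 4.0 (the IQR, i.e. the box)
```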
🧮 Statistical significance: chi-squared & t-test (5.1.7)
A significance test turns 'these look different' into a number you can judge. Use chi-squared (χ²) to compare observed vs expected counts (genetic ratios, association); use a t-test to compare two means. Both follow the same logic: state H₀/H₁, calculate the statistic, look up the critical value at your df and p = 0.05, then compare. If the calculated value ≥ the critical value, the result is significant (p < 0.05) and you reject H₀.
| Feature | Chi-squared (χ²) | t-test |
|---|---|---|
| What it compares | Observed vs expected counts (e.g. a 3 : 1 genetic ratio) | Two means (e.g. shade vs sun leaf length) |
| Formula | $\chi^2 = \sum \dfrac{(O-E)^2}{E}$ | compares the two means using their spread → gives $t_{\text{calc}}$ |
| Significant when | $\chi^2_{\text{calc}} \ge \chi^2_{\text{crit}}$ (p < 0.05) | $t_{\text{calc}} \ge t_{\text{crit}}$ (p < 0.05) |
| Conclusion if significant | Reject H₀ — the difference is real | Reject H₀ — the means really differ |
The 5 steps (the same shape for both tests)
- State the hypotheses. H₀ = 'no difference / no association (any difference is chance)'; H₁ = 'there is a real difference / association'.
- Work out the expected values (E) — e.g. from the predicted ratio, or the no-effect prediction.
- Calculate the statistic: $\chi^2 = \sum \dfrac{(O-E)^2}{E}$ for counts, or $t_{\text{calc}}$ for two means.
- Find the degrees of freedom (df) and read the critical value at p = 0.05 from the GIVEN table.
- Compare and conclude. Calculated ≥ critical → significant (p < 0.05) → reject H₀; smaller → not significant → do not reject H₀.
Degrees of freedom for a ratio test = (number of categories − 1). Read the critical value at p = 0.05:
| Degrees of freedom (df) | Critical χ² at p = 0.05 |
|---|---|
| 1 (e.g. a 3 : 1 ratio → 2 categories) | 3.84 |
| 2 (3 categories) | 5.99 |
| 3 (4 categories) | 7.81 |
Calculated ≥ critical → SIGNIFICANT (p < 0.05) → reject H₀ (the difference/association is real). Calculated < critical → NOT significant (p > 0.05) → do not reject H₀ (any difference is likely chance). The same sentence works for χ² and t.
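The steps for a chi-squared ratio test can be sketched in code. This is a minimal illustration, not an exam method: the function names are made up, the critical values are the three from the table above, and the counts (156 green, 44 yellow, expected 3 : 1) are the ones used in the worked example on this page.

```python
# Critical chi-squared values at p = 0.05, keyed by degrees of freedom.
CRITICAL_CHI2_P05 = {1: 3.84, 2: 5.99, 3: 7.81}


def chi_squared(observed, expected_ratio):
    """Return (chi-squared statistic, expected counts) for a ratio test."""
    total = sum(observed)
    ratio_total = sum(expected_ratio)
    expected = [total * part / ratio_total for part in expected_ratio]
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return statistic, expected


def is_significant(statistic, df):
    """Calculated >= critical means significant, so reject H0."""
    return statistic >= CRITICAL_CHI2_P05[df]


observed = [156, 44]               # green, yellow
statistic, expected = chi_squared(observed, [3, 1])
df = len(observed) - 1             # number of categories - 1

print(expected)                        # [150.0, 50.0]
print(round(statistic, 2))             # 0.96
print(is_significant(statistic, df))   # False -> do not reject H0
```

On the paper you still need to write out each (O − E)² ÷ E term, because that is where the method marks sit.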
🔗 Correlation, causation & coefficient of determination (5.1.8)
A correlation means two variables change together. The correlation coefficient r runs from −1 to +1: the sign gives the direction (positive = both rise; negative = one rises as the other falls) and closeness to ±1 gives the strength. R² (the coefficient of determination) is the fraction of the variation explained by the trend line — e.g. R² = 0.92 means 92% of the variation in y is explained by x. Crucially, a strong correlation does NOT prove causation — only a controlled experiment can do that.
| Term | What it means | Key point |
|---|---|---|
| Correlation coefficient r | Direction + strength of a straight-line link (−1 to +1) | Sign = direction (↑↑ or ↑↓); closeness to ±1 = strength |
| R² (coefficient of determination) | Fraction of the variation the trend line explains (0 → 1) | $R^2 = 0.92$ → 92% of the variation is explained; $R^2 = r^2$ |
| Correlation | Two variables change together | Does NOT prove one causes the other |
| Causation | One variable actually produces the change in the other | Needs a controlled experiment to establish, not just a graph |
A positive correlation: as leaf area increases, transpiration rate increases, so the best-fit line slopes UP and r ≈ +0.9 (close to +1 = strong). A strong correlation still does NOT prove that bigger leaves CAUSE faster transpiration — correlation ≠ causation.
Key Idea: r carries a sign (direction) and a size (strength): r = +0.91 is a strong positive link; r = −0.95 is a strong negative one. R² is never negative: it runs from 0 to 1 and gives the fraction of variation explained, often quoted as a percentage (R² = r²). High R² = the points hug the line. Neither value proves one thing causes the other.
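To see how r and R² relate, here is a minimal sketch that computes both. It uses the standard Pearson formula for r, which is not given on this page and is not something you need to memorise; the leaf-area and transpiration values are hypothetical numbers invented for illustration.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient for paired values (runs from -1 to +1)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical paired data: leaf area (cm²) and transpiration rate.
leaf_area = [10, 20, 30, 40, 50]
transpiration = [2.1, 3.9, 6.2, 7.8, 10.1]

r = pearson_r(leaf_area, transpiration)
r_squared = r ** 2  # for a straight-line fit, R² = r²

print(f"r = {r:+.2f}, R² = {r_squared:.2f}")
```

Here r comes out close to +1 (a strong positive correlation), but the code says nothing about cause: that still needs a controlled experiment.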
🐢 Biodiversity & population indices (5.1.9)
Two indices quantify communities. Simpson's reciprocal diversity index $D = \dfrac{N(N-1)}{\sum n(n-1)}$ combines richness (how many species) and evenness (how evenly individuals are spread) into one number; a higher D means more diverse. Here $N$ is the total of all individuals and $n$ is the count of one species. The Lincoln (capture–mark–recapture) index $N = \dfrac{n_1 \times n_2}{n_3}$ estimates a population of mobile animals: $n_1$ are marked and released, $n_2$ are caught later, and $n_3$ of those carry a mark.
| Index | Formula | What the symbols mean / what it tells you |
|---|---|---|
| Simpson's reciprocal diversity (D) | $D = \dfrac{N(N-1)}{\sum n(n-1)}$ | $N$ = total of ALL individuals; $n$ = individuals of one species. Higher D = more diverse (depends on richness AND evenness). |
| Lincoln mark–recapture (N) | $N = \dfrac{n_1 \times n_2}{n_3}$ | $n_1$ marked & released, $n_2$ second catch, $n_3$ marked in second catch. Assumes no births/deaths/migration and full mixing. |
Simpson's: $N$ is the total of ALL individuals, NOT the number of species. Build the n(n−1) column first, sum it, then divide — that is where the method marks are. Lincoln: the estimate only holds if there are no births, deaths or migration between samples and marked animals mix back fully and aren't easier (or harder) to recatch.
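Both indices are short enough to sketch in code. The function names are illustrative only. The meadow counts are the ones from the Simpson's worked example on this page; the Lincoln numbers (40 marked, 50 in the second catch, 10 of them marked) are hypothetical.

```python
def simpson_reciprocal_index(counts):
    """D = N(N-1) / sum of n(n-1), where counts holds one n per species."""
    counts = list(counts)
    total = sum(counts)                                  # N = ALL individuals
    sum_n_n_minus_1 = sum(n * (n - 1) for n in counts)   # the n(n-1) column
    return total * (total - 1) / sum_n_n_minus_1


def lincoln_index(marked_first, second_catch, marked_in_second):
    """Population estimate N = (n1 * n2) / n3."""
    return marked_first * second_catch / marked_in_second


meadow = {"grasshopper": 10, "ladybird": 6, "hoverfly": 5, "spider": 4}
print(round(simpson_reciprocal_index(meadow.values()), 2))  # 3.95

# Hypothetical mark-recapture data: n1 = 40, n2 = 50, n3 = 10.
print(lincoln_index(40, 50, 10))  # 200.0
```

Notice that `total` is the sum of all the counts (25 here), not the number of species (4), which is the usual slip with Simpson's index.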
🦫 Sampling & laboratory techniques (5.1.10)
Field and lab methods give you the valid data to analyse. Quadrats (placed randomly) estimate abundance; transects and kite diagrams show how a community changes along a gradient; chromatography separates pigments (report an Rf); gel electrophoresis separates DNA/proteins by size; PCR amplifies DNA; and a respirometer measures respiration rate. Across all of them the same quality rules apply: random/representative sampling, a control, replicates, and standardised conditions.
| Technique | What it samples / measures | Key detail |
|---|---|---|
| Quadrat (random) | Abundance / % cover of non-motile organisms | Place randomly (random coordinates) to avoid bias; estimate density per m² |
| Transect / kite diagram | How organisms change ALONG a gradient (e.g. up a shore) | A line or belt; the kite's width shows abundance at each distance |
| Chromatography (Rf) | Separates pigments/molecules by solubility | $Rf = \dfrac{\text{distance spot}}{\text{distance solvent}}$ (0–1); furthest = most soluble |
| Gel electrophoresis | Separates DNA / proteins by size (& charge) | Smaller fragments move FURTHER through the gel toward the +ve electrode |
| PCR | Amplifies (copies) a tiny amount of DNA | Repeated heating/cooling cycles double the DNA each time |
| Respirometer | Rate of respiration (O₂ used / CO₂ made) | Use a control tube and keep temperature constant |
Rf = $\dfrac{\text{distance the spot moved}}{\text{distance the solvent moved}}$ — always between 0 and 1; the spot that travels furthest is most soluble in the solvent. In gel electrophoresis, smaller fragments travel further toward the positive electrode — so the band nearest the far end is the smallest molecule.
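Two of these techniques involve a small calculation or a randomising step that can be sketched in code. The function names, the distances and the 20 m × 30 m sampling grid are all hypothetical; the sketch only assumes that quadrats are placed at random whole-metre coordinates, as the table suggests.

```python
import random


def rf_value(spot_distance, solvent_distance):
    """Rf = distance moved by spot / distance moved by solvent (0 to 1)."""
    return spot_distance / solvent_distance


def random_quadrat_coordinates(width_m, length_m, count, seed=None):
    """Pick random (x, y) positions on a grid so quadrat placement is unbiased."""
    rng = random.Random(seed)  # a seed makes the choice repeatable
    return [
        (rng.randint(0, width_m), rng.randint(0, length_m))
        for _ in range(count)
    ]


# Hypothetical chromatogram: spot moved 45 mm, solvent front moved 90 mm.
print(rf_value(45, 90))  # 0.5

# Five random quadrat positions on a hypothetical 20 m x 30 m field.
for x, y in random_quadrat_coordinates(20, 30, count=5, seed=1):
    print(f"Place quadrat at ({x} m, {y} m)")
```

Using random coordinates rather than choosing spots by eye is what removes sampling bias.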
✍️ Mixed worked examples (each needs a calculation)
IB-style question — magnification from a scale bar
A light micrograph of a plant root cell is printed with a scale bar that measures 50 mm long on the page and is labelled '100 µm'. The cell measures 35 mm across on the same image. (a) Calculate the magnification of the image. (b) Calculate the real width of the cell in µm. [4]
Model answer:
Convert to the same units. The scale bar is 50 mm on the page = 50 × 1000 = 50 000 µm of page, and it represents 100 µm of real tissue.
(a) Magnification $= \dfrac{\text{image size}}{\text{actual size}} = \dfrac{50\,000}{100} = ×500$ (no units — it is a ratio).
(b) Rearrange for the real width: $\text{actual} = \dfrac{\text{image}}{M}$. The cell is 35 mm = 35 000 µm on the image.
Substitute: $\text{actual} = \dfrac{35\,000}{500} = 70\ µm$. (Mark 1: convert mm → µm. Mark 2: M = ×500. Mark 3: correct rearrangement. Mark 4: 70 µm with units.)
(a) Magnification = 50 000 µm ÷ 100 µm = ×500. (b) Real width = image ÷ M = 35 000 µm ÷ 500 = 70 µm.
IB-style question — chi-squared on a genetic cross
A monohybrid cross is predicted to give offspring in a 3 : 1 ratio of green to yellow. A student counts 156 green and 44 yellow seedlings (200 total). Carry out a chi-squared test, $\chi^2 = \sum \dfrac{(O-E)^2}{E}$, and state your conclusion. The critical χ² at df = 1, p = 0.05 is 3.84. [4]
Model answer:
State H₀. H₀: the offspring fit a 3 : 1 ratio (any deviation is due to chance).
Expected values (E). Of 200, a 3 : 1 ratio predicts $\tfrac{3}{4} \times 200 = 150$ green and $\tfrac{1}{4} \times 200 = 50$ yellow.
Calculate χ². Green: $\dfrac{(156-150)^2}{150} = \dfrac{36}{150} = 0.24$. Yellow: $\dfrac{(44-50)^2}{50} = \dfrac{36}{50} = 0.72$. So $\chi^2 = 0.24 + 0.72 = 0.96$.
Compare and conclude. df = (2 − 1) = 1, critical value = 3.84. Calculated χ² = 0.96 is smaller than 3.84, so p > 0.05: not significant → do not reject H₀. The offspring do fit the 3 : 1 ratio. (Mark 1: H₀. Mark 2: E = 150 & 50. Mark 3: χ² ≈ 0.96. Mark 4: compares to 3.84 and concludes 'fits 3 : 1'.)
Expected = 150 green, 50 yellow. χ² = (6²/150) + (6²/50) = 0.24 + 0.72 = 0.96. df = 1, critical = 3.84. 0.96 < 3.84, so the result is not significant — do not reject H₀; the offspring fit a 3 : 1 ratio.
IB-style question — Simpson's diversity index
A student sweep-nets a meadow and records: grasshopper 10, ladybird 6, hoverfly 5, spider 4 (25 individuals in total). Calculate Simpson's reciprocal diversity index, $D = \dfrac{N(N-1)}{\sum n(n-1)}$, to 2 decimal places, and state what a higher value of D would mean. [4]
Model answer:
Find N. N = the total of ALL individuals = 10 + 6 + 5 + 4 = 25 (NOT the number of species).
Build the n(n−1) column. Grasshopper: 10 × 9 = 90; ladybird: 6 × 5 = 30; hoverfly: 5 × 4 = 20; spider: 4 × 3 = 12. Sum $\sum n(n-1) = 90 + 30 + 20 + 12 = 152$.
Substitute. $D = \dfrac{N(N-1)}{\sum n(n-1)} = \dfrac{25 × 24}{152} = \dfrac{600}{152} = 3.95$ (no units).
Interpret. A higher D means greater biodiversity — more species and/or a more even spread of individuals. (Mark 1: N = 25 & Σ n(n−1) = 152. Mark 2: substitution (25×24)/152. Mark 3: D = 3.95. Mark 4: higher D = more diverse.)
N = 25. Σ n(n-1) = 90 + 30 + 20 + 12 = 152. D = (25×24)/152 = 600/152 = 3.95. A higher D means greater biodiversity (more species and/or more even abundance).
✅ Quick self-check
How do you turn a scale bar into a magnification? Measure the bar on the image and what it represents in the SAME unit (convert mm → µm with ×1000), then magnification = image size ÷ actual size. It has no units and is written ×.
What is the formula for percentage change, and what do you divide by? % change = (final − initial) ÷ initial × 100. Always divide by the ORIGINAL (initial) value; a decrease gives a negative answer.
What does it mean if two bars' error bars overlap? The two means could be the same — the difference is NOT statistically significant. A clear gap (no overlap) suggests the difference IS significant. Small error bars also mean more reliable data.
When is a chi-squared or t-test result significant? When the calculated value is greater than or equal to the critical value at the right degrees of freedom and p = 0.05 (so p < 0.05). Then you reject H₀ — the difference/association is real.
Does a strong correlation prove causation? No. r near ±1 (or a high R²) shows a strong association, but only a controlled experiment can establish that one variable actually CAUSES the change in the other. Correlation ≠ causation.
In Simpson's index D = N(N−1)/Σn(n−1), what is N? N is the TOTAL number of individuals of ALL species added together (not the number of species). n is the count of one species. A higher D means greater biodiversity.
Exam Tips
- On Paper 1B the formulae and critical-value tables are GIVEN — marks are for method and interpretation, so always SHOW your working and quote units.
- Magnification: convert to one unit first (1 mm = 1000 µm); M = image ÷ actual (no units, write ×); rearrange to actual = image ÷ M for a real size.
- Percentage change = (final − initial) ÷ initial × 100 — divide by the ORIGINAL value; a fall is negative.
- Rate = gradient of a graph = Δy ÷ Δx; a steeper line is a faster rate; attach a time unit (e.g. cm³ min⁻¹).
- Error-bar overlap rule: overlap → difference NOT significant; clear gap → significant. Small bars = reliable/precise data.
- Box plot: the line in the box is the median, the box is the middle 50% (IQR), whiskers reach min/max, a separate point is an outlier.
- Significance tests share one rule: calculated ≥ critical (at df, p = 0.05) → significant (p < 0.05) → reject H₀. χ² for counts/ratios, t-test for two means.
- Correlation ≠ causation: r gives direction (sign) and strength (closeness to ±1); R² is the fraction of variation explained (R² = r²).
- Simpson's D = N(N−1) ÷ Σn(n−1): N is the TOTAL of all individuals, build the n(n−1) column first; higher D = more diverse. Lincoln N = (n₁ × n₂) ÷ n₃ for mobile populations.
- Match graph answers to the command term: state = one value (with unit); describe = trend with figures; compare and contrast = a similarity AND a difference; deduce/explain = conclusion backed by quoted data.