Year 12 Biology Module 8 · IQ3 Lesson 13 of 21 45 min

Epidemiology — Data Analysis, Treatment Outcomes and Study Evaluation

A new cancer drug reduces mortality by 50% — sounds dramatic. But if the baseline risk was 2%, the absolute reduction is only 1 percentage point. Understanding the difference between relative and absolute risk is what separates critical reading of medical evidence from being misled by statistics.


Think First — Critical Reading

"This Drug Cuts Heart Attack Risk by 40%" — Should You Take It?

A newspaper headline reads: "New cholesterol drug cuts heart attack risk by 40%." The drug costs $200/month and has moderate side effects in 10% of users.

The actual trial data: in the placebo group, 5 out of every 1,000 patients had a heart attack over 5 years. In the drug group, 3 out of every 1,000 patients had a heart attack over 5 years.

Before reading on:

Q1: The 40% figure is the relative risk reduction. Calculate the absolute risk reduction (the actual difference in risk between the two groups). How does the absolute figure compare to the relative figure in terms of what it means for an individual patient?

Q2: If 1,000 patients take this drug for 5 years, how many heart attacks are prevented? How does this change your assessment of the drug's value?

✏️ Calculate and reason before reading on.

Key Terms — scan before reading

Study design: Is it the right design for the question? (RCT for intervention; cohort for long-term exposure; case-control for rare disease)

Sample size: Is it large enough to detect a real effect? Small samples are underpowered: they may miss real effects (false negative) or produce spurious results.

Control group: Is there an appropriate comparison group? Placebo vs active control vs no treatment: the choice affects what conclusions can be drawn.


Know

How to calculate relative risk, absolute risk reduction, and NNT from trial data
How to read and interpret a basic survival curve
The hierarchy of evidence from case reports to systematic reviews
What a p-value represents and its limitation as the only measure of significance

Understand

Why relative risk reduction can be misleading without absolute risk context
Why NNT is more useful for clinical decision-making than relative risk
Why systematic reviews and meta-analyses provide stronger evidence than individual studies
How to identify limitations of studies and what they prevent you from concluding

Can Do

Calculate RR, ARR, RRR, and NNT from a 2×2 table or trial data
Interpret a survival curve and identify what the gap between curves represents
Evaluate whether a study's conclusions are supported by its data and design
Identify when a statistically significant result may not be clinically meaningful

Core Content

Key Point

Connect this concept back to the broader homeostasis and disease framework you have built across the course.

Relative Risk, Absolute Risk Reduction and Number Needed to Treat

The three most important numbers for evaluating any treatment or prevention claim

Interactive

Try this: Enter values into the 2×2 table or load an example, then observe how RR, attributable risk, and odds ratio change. Try RR = 1, RR > 1, and RR < 1 to see how the interpretation shifts.

This calculator demonstrates why relative risk alone can be misleading without absolute risk context.

Interactive: Relative Risk Calculator

Key Takeaway

Relative Risk tells you how many times more likely the exposed group is to develop the outcome. RR = 1 means no association. RR > 1 indicates increased risk. RR < 1 indicates protection. Always interpret RR alongside absolute risk reduction and NNT to understand real-world clinical significance.

Relative risk tells you how much more or less likely an outcome is in one group compared to another — expressed as a ratio. Absolute risk reduction tells you the actual size of that difference in real-world terms. Number needed to treat translates that difference into a clinically meaningful statement about how many patients benefit. All three are needed to evaluate a treatment honestly.

Epidemiology showing study types, measures and evaluation

Bradford Hill criteria for establishing causation

Relative Risk (RR) = Risk in exposed group ÷ Risk in unexposed (control) group

Absolute Risk Reduction (ARR) = Risk in control group − Risk in treatment group

Relative Risk Reduction (RRR) = ARR ÷ Risk in control group (expressed as %)

Number Needed to Treat (NNT) = 1 ÷ ARR

Worked Example — Statin Drug for Heart Disease Prevention

An RCT follows 10,000 patients (5,000 statin, 5,000 placebo) for 5 years. Results:

Statin group: 100 heart attacks out of 5,000 = risk of 0.02 (2%)

Placebo group: 150 heart attacks out of 5,000 = risk of 0.03 (3%)

RR = 0.02 ÷ 0.03 = 0.67 — the statin group has 67% of the risk of the placebo group (a 33% lower relative risk).

ARR = 0.03 − 0.02 = 0.01 (1%) — the statin reduces absolute heart attack risk by 1 percentage point over 5 years.

RRR = 0.01 ÷ 0.03 = 33% — the statin reduces relative risk by 33%.

NNT = 1 ÷ 0.01 = 100 — 100 patients must take the statin for 5 years to prevent one additional heart attack.

Interpretation: A headline saying "statins reduce heart attack risk by 33%" is technically accurate (RRR) but can be misleading — the absolute reduction is only 1%. Whether NNT = 100 is acceptable depends on the drug's cost, side effects, and the severity of the outcome prevented. For a condition as serious as heart attack, NNT = 100 may well be worthwhile. For a minor condition, it may not be.
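The four formulas above can be checked with a few lines of Python. This is a minimal sketch rather than part of the original lesson: the function name `risk_measures` and its argument names are illustrative assumptions, and the inputs are the statin trial counts from the worked example.

```python
def risk_measures(events_treated, n_treated, events_control, n_control):
    """Return RR, ARR, RRR and NNT for a two-arm trial."""
    risk_treated = events_treated / n_treated
    risk_control = events_control / n_control
    arr = risk_control - risk_treated          # absolute risk reduction
    if arr == 0:
        raise ValueError("Risks are equal, so NNT is undefined")
    return {
        "RR": risk_treated / risk_control,     # relative risk
        "ARR": arr,
        "RRR": arr / risk_control,             # relative risk reduction
        "NNT": 1 / arr,                        # number needed to treat
    }

# Statin worked example: 100/5,000 events on statin, 150/5,000 on placebo
statin = risk_measures(100, 5000, 150, 5000)
for name, value in statin.items():
    print(f"{name}: {value:.2f}")
```

Running it prints RR 0.67, ARR 0.01, RRR 0.33 and NNT 100.00, matching the hand calculation. Working from raw counts rather than rounded percentages avoids rounding errors carrying through to the NNT.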

Why relative risk can mislead

Relative risk amplifies small effects in low-risk populations. "This supplement reduces cancer risk by 50%" sounds impressive — but if the baseline risk is 0.002% (2 in 100,000), a 50% reduction means an absolute reduction from 0.002% to 0.001%. NNT would be 100,000 — you would need to treat 100,000 people to prevent one cancer. The relative figure is truthful but decontextualised from real-world importance.

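Conversely, an ARR of 5% (NNT = 20) represents a very effective treatment: for every 20 people treated, one extra bad outcome is prevented. As a rough guide, NNT values below 10 are often described as highly effective, 10–100 as moderately effective, and above 100 as marginal. There is no universal cut-off, however: an acceptable NNT depends on the severity of the outcome prevented and on the treatment's cost and side effects.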
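To see how strongly the baseline risk drives the NNT, the short sketch below converts a baseline risk and a relative risk reduction into an NNT. The helper name `nnt_from_baseline` is an assumption made for illustration. The first scenario is the supplement example above; the second (20% baseline with a 25% RRR) is one assumed way of arriving at a 5% ARR.

```python
def nnt_from_baseline(baseline_risk, rrr):
    """NNT implied by a baseline (control) risk and a relative risk reduction."""
    arr = baseline_risk * rrr      # ARR = RRR x control risk
    return 1 / arr

scenarios = [
    ("Supplement: baseline 0.002%, RRR 50%", 0.00002, 0.50),
    ("Treatment: baseline 20%, RRR 25%", 0.20, 0.25),
]
for label, baseline, rrr in scenarios:
    print(f"{label} -> NNT = {nnt_from_baseline(baseline, rrr):,.0f}")
```

The larger relative reduction (50%) produces the far larger NNT, because it is applied to a much smaller baseline risk.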

Common Error Students confuse relative risk with relative risk reduction. RR is the ratio of risks (exposed ÷ unexposed). RRR is how much the risk decreases proportionally: (control risk − treatment risk) ÷ control risk. If RR = 0.67, then RRR = 1 − 0.67 = 33% — the treatment reduces risk by 33% relative to the control. A drug with RR = 0.8 has a 20% relative risk reduction (1 − 0.8 = 0.2 = 20%), not an 80% reduction.

Survival Curves — Reading Kaplan-Meier Graphs

The most common graph type in clinical trial reporting — used to show how long patients survive or remain disease-free

A survival curve (Kaplan-Meier plot) shows the proportion of a study population that has not yet experienced the primary outcome (often death, but also disease recurrence, hospitalisation, or other events) over time. Survival curves appear in almost every major clinical trial and in many HSC exam questions about epidemiological data.

How to read a survival curve

Y-axis: Proportion of participants who have not yet experienced the outcome (usually 0–1 or 0–100%). Starts at 1.0 (100% event-free) and falls over time as participants experience the outcome.
X-axis: Time (months, years).
Multiple lines: Each line represents a different group (e.g. treatment vs placebo; smokers vs non-smokers). A line that falls more steeply = more events happening faster = worse outcome.
Gap between lines: The vertical distance between lines at any time point represents the difference in survival probability between groups at that time. A widening gap over time suggests the treatment benefit increases; a converging gap suggests diminishing benefit.
Plateau: A line that flattens indicates no further events are occurring — either participants have reached long-term survival, the study has ended, or participants are being lost to follow-up.
Censoring marks (tick marks on lines): Small vertical marks on the survival line indicate participants who were 'censored' — lost to follow-up, withdrew, or the study ended before they experienced the outcome. These people's outcomes are unknown.
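For readers curious about where the steps in a Kaplan-Meier curve come from, here is a minimal sketch of the standard estimate: at each time an event occurs, the running survival proportion is multiplied by (1 − events ÷ number still at risk). Censored participants leave the at-risk group without causing a drop. The ten participants below are made-up data for illustration only, and the function name is an assumption.

```python
def kaplan_meier(observations):
    """Kaplan-Meier steps from (time, event_occurred) pairs.

    event_occurred is True if the participant had the outcome at that time
    and False if they were censored. Returns a list of (time, survival).
    """
    at_risk = len(observations)
    survival = 1.0
    curve = [(0, 1.0)]
    for t in sorted({time for time, _ in observations}):
        events = sum(1 for time, e in observations if time == t and e)
        censored = sum(1 for time, e in observations if time == t and not e)
        if events:
            survival *= 1 - events / at_risk   # the curve steps down
            curve.append((t, survival))
        at_risk -= events + censored           # both groups leave the risk set
    return curve

# Hypothetical data: (month, True = outcome occurred / False = censored)
example = [(3, True), (5, False), (8, True), (8, True), (12, False),
           (15, True), (20, False), (24, False), (24, False), (24, False)]
example_curve = kaplan_meier(example)
for month, proportion in example_curve:
    print(f"month {month:>2}: {proportion:.3f} event-free")
```

With these made-up data the curve steps down to 0.900, 0.675 and 0.540. The censored participant at month 5 does not lower the curve, but does shrink the at-risk group, which is why the two events at month 8 produce a drop of 2 in 8 rather than 2 in 9.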

Worked Example — Interpreting a Survival Curve

A melanoma trial shows two survival curves over 5 years. The immunotherapy group starts at 1.0 and falls gradually to 0.52 (52% surviving at 5 years). The chemotherapy group starts at 1.0 and falls more steeply to 0.28 (28% surviving at 5 years). The curves diverge from 6 months onward.

What can you conclude:

At 5 years, 52% of immunotherapy patients were still alive vs 28% of chemotherapy patients — a difference of 24 percentage points (absolute difference in 5-year survival).

The curves diverge from 6 months — suggesting immunotherapy benefit begins early and increases over time. This divergence pattern is consistent with immunotherapy's mechanism (stimulating durable immune responses that continue killing cancer cells).

You cannot conclude that all immunotherapy patients will survive long-term: the curves show 48% of immunotherapy patients also died within 5 years. What the study shows is that the proportion surviving at 5 years was nearly twice as high with immunotherapy as with chemotherapy (52% vs 28%, a ratio of about 1.9).

HSC Exam Tip When asked to interpret a survival curve in an HSC exam: (1) Identify what the y-axis represents (proportion surviving/event-free) and the x-axis (time); (2) Describe the trend for each group; (3) Quantify the difference — read off specific values at key time points; (4) State what can be concluded from the gap between curves; (5) Note any limitations (censoring, sample size, follow-up period). Never just say "the treatment group did better" — always quote the numbers.

The Evidence Hierarchy — From Single Case to Systematic Review

Not all evidence is equal — understanding the hierarchy explains why some claims are more reliable than others

Interactive

Try this: Select each study description card, then place it in the correct study type bin. Check your answers when all six cards are placed.

Recognising study designs from their description is an essential HSC skill tested directly in exam questions.

Interactive: Study Type Classifier

Key Takeaway

Cohort studies follow exposed groups forward in time. Case-control studies compare past exposures between cases and controls. Cross-sectional studies measure exposure and disease at one time point. Recognising these designs from their description is essential for evaluating epidemiological evidence.

In medicine and public health, evidence is graded by quality. Evidence from a single patient case report is informative but cannot establish general truths. Evidence from a well-conducted systematic review of dozens of RCTs provides the most reliable basis for clinical decisions. Understanding this hierarchy allows you to evaluate claims critically — and to recognise when media reports cherry-pick weak evidence to make strong claims.

Level	Study type	Strength	Limitation	Example
1 (strongest)	Systematic review and meta-analysis of RCTs	Pools results of multiple high-quality trials; greatest statistical power; controls for individual study quirks	Quality depends on quality of included studies; publication bias can distort results	Cochrane review of statin trials
2	Single well-designed RCT	Randomisation controls confounders; establishes causation	May not generalise to all populations; can be underpowered	UKPDS trial — metformin for Type 2 diabetes
3	Cohort study	Prospective; establishes temporal sequence; large populations possible	Observational — cannot control all confounders	Nurses' Health Study — diet and cancer
4	Case-control study	Efficient for rare diseases; retrospective	Recall bias; cannot establish incidence	Case-control study of HPV and cervical cancer
5	Cross-sectional study	Cheap; generates hypotheses	Cannot establish temporal sequence	National Health Survey — diet and diabetes prevalence
6 (weakest)	Case report / expert opinion	Identifies novel phenomena; hypothesis-generating	No comparison group; no statistical analysis; highly susceptible to bias	"Patient who ate X recovered from Y"
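The "pooling" in level 1 of the table can be illustrated with one common approach, fixed-effect inverse-variance weighting, in which more precise trials count for more. This is a simplified sketch under stated assumptions: the three trial results (risk differences with their standard errors) are invented, the function name is illustrative, and real meta-analyses also assess study quality and heterogeneity.

```python
import math

def pool_fixed_effect(estimates):
    """Inverse-variance (fixed-effect) pooling of (estimate, standard_error) pairs."""
    weights = [1 / se ** 2 for _, se in estimates]
    pooled = sum(w * est for w, (est, _) in zip(weights, estimates)) / sum(weights)
    pooled_se = math.sqrt(1 / sum(weights))
    return pooled, pooled_se

# Hypothetical trials: (absolute risk reduction, standard error)
trials = [(0.012, 0.010), (0.008, 0.006), (0.011, 0.004)]
pooled, pooled_se = pool_fixed_effect(trials)
print(f"pooled ARR = {pooled:.4f}, standard error = {pooled_se:.4f}")
```

The pooled standard error comes out smaller than that of any single trial, which is the sense in which combining trials increases statistical power.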

Statistical significance vs clinical significance

A p-value below 0.05 (the conventional threshold for 'statistical significance') means that, if the null hypothesis (no effect) were true, there would be less than a 5% probability of observing a result at least as extreme as this one by chance. It does NOT mean the effect is clinically important. With very large sample sizes, even trivially small differences become statistically significant.

Example: A study of 500,000 patients finds that a new drug reduces blood pressure by an average of 0.3 mmHg compared to placebo (p = 0.001 — highly statistically significant). A 0.3 mmHg reduction in blood pressure is clinically meaningless — no patient would benefit detectably from such a small change. The study found a real effect, but not a useful one. Statistical significance tells you whether an effect exists; clinical significance (effect size, ARR, NNT) tells you whether it matters.
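The effect of sample size on the p-value can be sketched with a simple normal-approximation test for a difference in means. The standard deviation of 15 mmHg and the equal group sizes are assumptions chosen for illustration, so the sketch does not attempt to reproduce the p-value quoted in the example; it only shows the same 0.3 mmHg difference moving from non-significant to highly significant as the groups grow.

```python
import math

def two_sided_p(mean_difference, sd, n_per_group):
    """Two-sided p-value for a difference in means (normal approximation)."""
    standard_error = sd * math.sqrt(2 / n_per_group)
    z = abs(mean_difference) / standard_error
    return math.erfc(z / math.sqrt(2))   # equals 2 x (1 - Phi(z))

# Same 0.3 mmHg difference, assumed SD of 15 mmHg, increasing group sizes
p_small = two_sided_p(0.3, 15, 100)
p_medium = two_sided_p(0.3, 15, 10_000)
p_large = two_sided_p(0.3, 15, 250_000)
for n, p in ((100, p_small), (10_000, p_medium), (250_000, p_large)):
    print(f"n per group = {n:>7,}: p = {p:.3g}")
```

The size of the effect never changes across the three runs; only the ability to detect it does.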

Critical Reading When a media report claims "scientists have proved X causes/prevents Y," always ask: (1) What level of evidence is this? (single study, systematic review?); (2) Is the effect statistically AND clinically significant? (what is the ARR/NNT?); (3) Was this an RCT or observational study?; (4) Was the study replicated?; (5) Was there a plausible biological mechanism? A single observational study showing association is not 'proof' — it is one piece of a larger puzzle.

Evaluating a Study — A Systematic Approach

A checklist for critically evaluating any epidemiological study or trial — directly tested in HSC extended response questions

The HSC regularly asks students to evaluate study quality. This is not about finding flaws for the sake of it — it is about identifying what a study can and cannot establish, so that claims based on its results can be appropriately qualified.

Key evaluation criteria

Study design: Is it the right design for the question? (RCT for intervention; cohort for long-term exposure; case-control for rare disease).
Sample size: Is it large enough to detect a real effect? Small samples are underpowered — they may miss real effects (false negative) or produce spurious results.
Representativeness: Does the study population reflect the target population? RCTs often exclude elderly, pregnant, or multi-morbid patients — limiting generalisability.
Blinding: Were participants and/or researchers blind to treatment allocation? Single-blind (participants unaware); double-blind (participants and assessors unaware). Unblinded studies are more susceptible to placebo effect and assessment bias.
Control group: Is there an appropriate comparison group? Placebo vs active control vs no treatment — the choice affects what conclusions can be drawn.
Follow-up: Was the follow-up period long enough? Diseases with long latency periods (cancer, cardiovascular disease) require years of follow-up — short studies miss delayed outcomes.
Confounding: Were potential confounders identified and controlled? In observational studies, residual confounding is always a risk.
Outcome measurement: Were outcomes measured objectively and consistently? Subjective outcomes (pain, quality of life) are more susceptible to bias than objective outcomes (death, laboratory values).
Statistical analysis: Was the appropriate statistical method used? Were confidence intervals reported alongside p-values?
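The last criterion asks whether confidence intervals were reported. The sketch below shows why they matter, using a simple normal-approximation (Wald) 95% interval for the absolute risk reduction. Both trials are invented for illustration and have the same event rates (15% vs 25%); only the sample size differs. The function name is an assumption.

```python
import math

def arr_with_ci(events_treated, n_treated, events_control, n_control, z=1.96):
    """ARR with an approximate 95% confidence interval (Wald method)."""
    risk_t = events_treated / n_treated
    risk_c = events_control / n_control
    arr = risk_c - risk_t
    se = math.sqrt(risk_t * (1 - risk_t) / n_treated
                   + risk_c * (1 - risk_c) / n_control)
    return arr, arr - z * se, arr + z * se

# Hypothetical trials with identical event rates but different sizes
small_trial = arr_with_ci(6, 40, 10, 40)
large_trial = arr_with_ci(600, 4000, 1000, 4000)
for label, (arr, low, high) in (("small", small_trial), ("large", large_trial)):
    print(f"{label} trial: ARR = {arr:.3f}, 95% CI {low:.3f} to {high:.3f}")
```

In the small trial the interval includes zero, so the data are compatible with no benefit at all; in the large trial the same ARR has a narrow interval well above zero.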

Worked Example — Evaluating a Hypothetical Study

Study: A 6-week RCT of 200 patients found that a new anti-inflammatory drug reduced self-reported knee pain by 35% more than placebo (p = 0.03). The study was single-blind (patients did not know which group they were in, but researchers did). Patients with severe kidney disease were excluded.

Strengths: RCT design — randomisation controls for most confounders. Appropriate study design for testing a new treatment.

Limitations to note: (1) Single-blind — researchers who knew treatment allocation could unconsciously bias their assessments of patient-reported pain (assessment bias). Double-blinding would be stronger. (2) 6 weeks is short — many musculoskeletal conditions improve spontaneously over 6 weeks (regression to the mean). A longer trial would be more convincing. (3) Self-reported pain is subjective — placebo effect is substantial for pain outcomes even with blinding. (4) Excluded severe kidney disease patients — results may not generalise to this group who may have different drug metabolism. (5) p = 0.03 is statistically significant but close to the threshold — with a small sample (200), there is more risk this reflects sampling variation.

HSC Exam Strategy When asked to "evaluate" a study in HSC exams, structure your answer as: (1) State the design type; (2) Identify 2–3 specific strengths with reasoning; (3) Identify 2–3 specific limitations with reasoning; (4) State what can and cannot be concluded. Use the language of epidemiology: confounding, bias, temporal sequence, statistical significance, sample size, representativeness. Avoid vague statements like "the study was good" — be specific.

Real-World Anchor — The Heart Protection Study and NNT

How NNT Changed the Way Doctors Prescribe Statins

The Heart Protection Study (HPS), published in 2002, was one of the largest cardiovascular trials ever conducted: 20,536 patients with existing cardiovascular disease or high risk, followed for 5 years. It found that simvastatin reduced the relative risk of major vascular events (heart attacks, strokes, revascularisation procedures) by about 24% compared to placebo.

The headline figure — 24% relative risk reduction — was used extensively to promote statin prescribing. But the absolute figures were equally important: the event rate fell from approximately 25.2% in the placebo group to 19.8% in the statin group — an ARR of 5.4 percentage points, giving an NNT of approximately 19 over 5 years. This means treating 19 high-risk patients with simvastatin for 5 years prevents one additional major vascular event.

For high-risk patients with existing cardiovascular disease, NNT = 19 is considered highly clinically significant — statins rapidly became standard of care for this group. But when the same relative risk reduction (24%) was applied to lower-risk primary prevention populations (people without existing CVD), the absolute event rate in the placebo group was much lower (~5% over 5 years), producing an ARR of only ~1.2% and an NNT of ~83. The same drug, the same relative risk reduction, but very different absolute benefit — which is why prescribing decisions for primary prevention are more nuanced than for secondary prevention. This is precisely why NNT matters.
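Another way to express the same contrast is as events prevented per 1,000 people treated. The sketch below uses the approximate figures quoted above (25.2% vs 19.8% for the HPS population, and an assumed 5% baseline with a 24% relative risk reduction for primary prevention). The function name is an illustrative assumption.

```python
def prevented_per_1000(control_risk, treated_risk):
    """Events avoided if 1,000 people are treated rather than left untreated."""
    return round((control_risk - treated_risk) * 1000, 1)

# High-risk (HPS) population: event rates taken from the text above
hps = prevented_per_1000(0.252, 0.198)

# Lower-risk population: ~5% baseline with the same 24% relative reduction
primary = prevented_per_1000(0.05, 0.05 * (1 - 0.24))

print(f"High-risk patients: {hps} events prevented per 1,000 treated")
print(f"Lower-risk patients: {primary} events prevented per 1,000 treated")
```

This gives 54 events prevented per 1,000 in the high-risk group against 12 per 1,000 in the lower-risk group, which corresponds to the NNTs of roughly 19 and 83.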

Priority Misconceptions — Data Analysis and Study Evaluation

✗

"A large relative risk reduction means the treatment is very effective." — Relative risk reduction must always be considered alongside the absolute baseline risk. A 50% relative risk reduction from 0.002% to 0.001% (ARR = 0.001%, NNT = 100,000) is far less clinically meaningful than a 25% relative risk reduction from 20% to 15% (ARR = 5%, NNT = 20).

✗

"A p-value below 0.05 means the result is important." — Statistical significance (p < 0.05) means the result is unlikely to be due to chance. It says nothing about clinical importance. With large enough samples, trivially small and meaningless differences become statistically significant. Always assess clinical significance (effect size, ARR, NNT) alongside statistical significance.

✗

"A survival curve that falls to zero means all patients died." — A survival curve that reaches zero means all patients in the study eventually experienced the primary outcome within the follow-up period. However, studies usually end before all participants experience the outcome — a curve that plateaus does not mean those patients are cured; it means the follow-up period ended. Censoring marks indicate patients who left the study before the outcome — their fate is unknown, not assumed to be survival.

✗

"A systematic review is just a literature review." — A systematic review uses pre-specified, reproducible methods to identify and critically appraise ALL relevant studies on a question, minimising selection bias in which studies are included. A narrative (regular) literature review is selective — the author chooses which studies to discuss, which can introduce bias. Meta-analysis within a systematic review statistically pools study results. These methodological distinctions place systematic reviews at the top of the evidence hierarchy.

✗

"If a study found no effect (p > 0.05), the treatment definitely doesn't work." — A non-significant result means there was insufficient evidence to reject the null hypothesis — not that the null hypothesis is true. A small study may be underpowered to detect a real but modest effect. The absence of evidence is not evidence of absence. Confidence intervals around the null result tell you more than the p-value alone — a wide confidence interval crossing zero indicates uncertainty, not definitive null effect.

Image Slot 1: Side-by-side comparison of relative risk reduction vs absolute risk reduction using two scenarios: (A) High-risk population — 20% vs 15% event rate — ARR = 5%, NNT = 20; (B) Low-risk population — 0.4% vs 0.3% event rate — ARR = 0.1%, NNT = 1000. Both have the same RRR = 25%. Visual should show the same headline ("25% risk reduction") but starkly different real-world impact.

Image Slot 2: Annotated Kaplan-Meier survival curve with labels for: y-axis (proportion event-free), x-axis (time in months/years), two diverging lines (treatment vs control), the gap between lines at specific time points (annotated with values), censoring tick marks, and a shaded region showing the difference in 5-year survival. Arrows pointing to key features with explanatory notes.

Group	Patients	Progressed to T2D
Drug group	1,000	60
Placebo group	1,000	100

Multiple Choice

Test Your Understanding

Apply · Band 3

1. In a vaccine trial, 2% of the unvaccinated group developed the disease compared to 0.5% of the vaccinated group. What is the Number Needed to Vaccinate (equivalent to NNT) to prevent one case?

A. 4, because the vaccinated group had 4× lower risk

B. 75, because relative risk reduction is 75%

C. 67, because ARR = 2% − 0.5% = 1.5%, and NNT = 1 ÷ 0.015 ≈ 67

D. 200, because absolute risk in unvaccinated group is 2 per 100

Understand · Band 3

2. A Kaplan-Meier survival curve shows two lines that converge after year 3, having diverged in years 1–3. What does the convergence most likely indicate?

A. The treatment caused long-term harm to the treatment group after year 3

B. The survival advantage of the treatment is diminishing over time: the treatment and control groups are experiencing similar rates of the outcome after year 3, suggesting the treatment benefit does not persist long-term for all patients

C. All patients in both groups died after year 3, causing both curves to reach zero

D. The study ended at year 3 and the data after that point is extrapolated

Analyse · Band 4

3. A very large study of 1 million people finds that people who drink 3+ cups of coffee per day have a statistically significantly lower rate of Type 2 diabetes (p = 0.001, RR = 0.97). The ARR is 0.3%. Which statement best evaluates this finding?

A. The finding is highly significant because p = 0.001, which is much lower than the 0.05 threshold

B. Coffee should be recommended to all patients at risk of Type 2 diabetes based on this evidence

C. The finding disproves any link between coffee and diabetes since the relative risk is so close to 1.0

D. The finding is statistically significant but clinically trivial: RR = 0.97 represents only a 3% relative risk reduction, and ARR = 0.3% gives NNT ≈ 333. The large sample size makes even this tiny effect detectable statistically. This is an observational study, so confounders (diet, lifestyle) cannot be excluded. No clinical recommendation should be made from this alone

Understand · Band 3

4. Why are systematic reviews and meta-analyses considered the highest level of evidence for treatment efficacy?

A. They pool results from multiple well-designed studies, increasing statistical power to detect true effects, and use pre-specified methods to minimise selection bias in which studies are included, making their conclusions more reliable than any individual study

B. They are conducted by more experienced researchers than individual trials, which makes their results more accurate

C. They include all types of studies (case reports, expert opinions) and give equal weight to each

D. They always produce statistically significant results because of their large combined sample sizes

Evaluate · Band 5

5. A 4-week trial of a new pain medication finds a statistically significant reduction in pain scores (p = 0.02, NNT = 8). However, the trial was single-blind, recruited only 80 patients from one hospital, excluded patients over 70, and had a 25% dropout rate. A doctor argues: "The NNT of 8 is excellent — I should prescribe this drug." Evaluate this reasoning.

A. The reasoning is fully justified: NNT = 8 is highly effective and the statistical significance is confirmed

B. The NNT of 8 is clinically promising, but the reasoning is premature because the trial has significant limitations. The single-blind design allows clinician bias; the small sample (80 patients) from one site reduces generalisability; exclusion of patients over 70 means results may not apply to elderly pain patients (a major demographic); and the 25% dropout rate could introduce attrition bias if those who dropped out did so due to side effects or lack of efficacy. Larger, multi-site, double-blind trials are needed before widespread prescribing

C. The reasoning is unjustified because a 4-week trial can never produce valid evidence about a pain medication

D. The dropout rate makes the results invalid regardless of other considerations

Short Answer

✍

Short Answer Questions

Apply · Band 4

6. A newspaper headline reads: "New cancer drug slashes tumour recurrence by 45%." The underlying trial data shows: recurrence rate in placebo group = 20%; recurrence rate in drug group = 11%. Calculate the absolute risk reduction and NNT for this drug. Then explain why the headline's "45%" figure, while mathematically accurate, could mislead a patient trying to understand their personal benefit from the drug. 4 MARKS

✏️ Calculate ARR and NNT, verify the 45% figure, and explain the misleading nature of RRR in your book.

Analyse · Band 4–5

7. A researcher presents survival curve data from a lung cancer trial showing that a new targeted therapy group has significantly better 3-year survival than standard chemotherapy (60% vs 35%, p < 0.001). A colleague argues: "These results prove the targeted therapy should immediately replace chemotherapy for all lung cancer patients." Evaluate this claim by discussing what the survival curve data does and does not show, and what additional information is needed before making the recommendation. 5 MARKS

✏️ Evaluate what the survival curve does and does not show, and state what additional data is needed, in your book.

Evaluate · Band 5–6

8. "A single well-designed RCT showing a positive result is sufficient to change clinical practice." Evaluate this claim by discussing the strengths and limitations of individual RCTs, the role of replication and systematic review, and when it might be appropriate to act on a single trial versus waiting for more evidence. 6 MARKS

✏️ Evaluate the claim covering RCT strengths/limits, replication, and context-dependent action in your book.

Revisit Your Thinking

Return to your Think First responses at the start of this lesson.

Q1 — absolute vs relative risk: Placebo: 5/1000 = 0.5%. Drug: 3/1000 = 0.3%. ARR = 0.5% − 0.3% = 0.2%. RRR = 0.2% ÷ 0.5% = 40% (the headline figure). The absolute reduction is tiny — 0.2 percentage points. Did you see how the relative figure (40%) makes the drug sound far more impressive than the absolute figure does?
Q2 — heart attacks prevented in 1,000 patients: NNT = 1 ÷ 0.002 = 500. Treating 1,000 patients prevents only 2 heart attacks. Whether that justifies $200/month and a 10% side effect rate is a value judgement that depends on how severe heart attacks are — but it reframes the "40% reduction" dramatically.
Write the four risk measure formulas from memory, and in one sentence explain why NNT is more useful for clinical decisions than RRR.

Comprehensive Answers


Activity 1 — Risk Calculations and Survival Curve

1. T2D drug trial. (a) Risk (drug) = 60/1000 = 0.06 (6%); Risk (placebo) = 100/1000 = 0.10 (10%). (b) RR = 0.06 ÷ 0.10 = 0.60. The drug group has 60% of the risk of the placebo group — i.e. 40% lower relative risk. (c) ARR = 0.10 − 0.06 = 0.04 (4%). (d) RRR = 0.04 ÷ 0.10 = 0.40 = 40%. (e) NNT = 1 ÷ 0.04 = 25. Plain language: to prevent one extra person progressing to Type 2 diabetes over 3 years, 25 patients with insulin resistance must take this drug for 3 years. Whether this is clinically worthwhile depends on the drug's cost, side effects, and the severity/cost of Type 2 diabetes if it develops. Given the serious long-term complications of T2D (blindness, kidney failure, cardiovascular disease), NNT = 25 over 3 years could well be considered clinically meaningful.

2. Melanoma survival curve. (a) Absolute difference in 5-year survival = 55% − 25% = 30 percentage points: an extra 30 in every 100 patients in the immunotherapy group were alive at 5 years compared to the chemotherapy group. (b) The curves diverge from month 4 and continue to separate throughout follow-up. This diverging pattern (rather than parallel or converging) suggests the treatment benefit grows over time, consistent with immunotherapy's mechanism: it stimulates the patient's own immune system to recognise and kill melanoma cells. This immune response, once established, can continue killing cancer cells and providing durable disease control even after the treatment has been given. This contrasts with chemotherapy, which kills cancer cells directly but does not establish immunological memory. (c) Limitation 1: the follow-up is only 5 years, so we cannot conclude whether immunotherapy confers long-term or permanent benefit; curves may converge after 5 years as patients relapse. Limitation 2: 45% of immunotherapy patients also died within 5 years, so the drug clearly does not 'cure' all patients; it prolongs survival for a proportion. The study shows improved survival probability, not a cure. Additionally: no information about side effects, patient selection criteria, or whether results apply to all melanoma subtypes (e.g. BRAF-mutated vs non-mutated).

Activity 2 — Critical Appraisal

Antidepressant trial evaluation. (a) ARR = 42% − 28% = 14 percentage points (0.14); NNT = 1 ÷ 0.14 ≈ 7. For every 7 patients treated with this drug, one extra patient achieves a meaningful reduction in depression symptoms compared to placebo. NNT = 7 is clinically meaningful for a condition as disabling as depression. (b) Limitation 1, single-blind design: clinicians who know which patients are in the drug group may unconsciously rate those patients more favourably on depression scales (assessment bias). For a subjective outcome like depression (assessed by clinician interview or self-report), this is a significant limitation. Double-blinding (where both patients and assessors are unaware of treatment allocation) would produce more reliable results. Limitation 2, small sample from one clinic: 120 patients from a single clinic is a relatively small, geographically limited sample. The patient population at one clinic may not be representative of all patients with moderate depression, who differ in age, ethnicity, comorbidity, and medication history. Single-site studies are also more susceptible to local biases in patient selection and management. (c) Evaluation of the company's conclusion: the conclusion is partially justified but overstated. p = 0.04 is statistically significant and NNT = 7 is clinically meaningful, so there is reasonable evidence of a real short-term effect. However, 'significantly more effective' is too broad a claim based on this single study because: (1) the single-blind design introduces potential bias in outcome measurement; (2) 12 weeks is a short follow-up: depression is often a long-term condition, and a short trial may capture only the initial response or natural fluctuation in symptoms rather than sustained benefit; (3) the trial compared drug vs placebo and did not compare with existing antidepressants, so relative advantage over standard treatment is unknown; (4) the 25% dropout rate raises the question of why people dropped out (side effects or lack of efficacy?); either reason would bias results in the drug's favour if the data were not analysed by intention-to-treat. (d) Before widespread prescribing: a larger (500+ patient), multi-site, double-blind RCT with longer follow-up (6–12 months minimum); comparison against existing first-line antidepressants (not just placebo); independent replication by researchers without financial ties to the company; safety data on longer-term use; intention-to-treat analysis accounting for all dropouts; ideally inclusion in a systematic review or meta-analysis.
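The arithmetic in part (a) can be checked with a short Python sketch. The helper name `absolute_difference_and_nnt` is illustrative and not part of the lesson; the 42% and 28% response rates are the figures quoted in the answer above.

```python
def absolute_difference_and_nnt(rate_treated, rate_control):
    """Return (absolute difference, number needed to treat) for two proportions.

    Rates are fractions between 0 and 1. Hypothetical helper for illustration.
    """
    difference = abs(rate_treated - rate_control)
    if difference == 0:
        raise ValueError("No difference between groups: NNT is undefined")
    return difference, 1 / difference


# Response rates quoted in the model answer: 42% on the drug, 28% on placebo.
difference, nnt = absolute_difference_and_nnt(0.42, 0.28)
print(f"Absolute difference: {difference * 100:.0f} percentage points")
print(f"NNT: {nnt:.1f} (about {round(nnt)})")
```

The rates are entered as fractions, so the 14 percentage point difference becomes 0.14 before it is inverted to give the NNT.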

Multiple Choice

1. C — ARR = 2% − 0.5% = 1.5% = 0.015. NNT = 1 ÷ 0.015 = 66.7 ≈ 67. Option A incorrectly uses the relative risk ratio directly. Option B uses the RRR (75%) as if it were the NNT denominator. Option D uses an incorrect calculation.

2. B — Converging curves after year 3 mean the survival advantage established in years 1–3 is diminishing — both groups are experiencing similar event rates after year 3. This could mean the treatment benefit doesn't persist, or responders have already been identified and those remaining have similar prognosis regardless of treatment. Option A is wrong — convergence doesn't necessarily mean harm. Option C is wrong — convergence means similar rates, not zero survival. Option D is wrong — convergence is a data observation, not an extrapolation artefact.

3. D — RR = 0.97 corresponds to a relative risk reduction of only 3%. ARR = 0.3%, NNT ≈ 333. The p = 0.001 result is almost certainly driven by the massive sample size (1 million), which can detect a real but trivially small effect. Clinical significance is negligible. Option A mistakes statistical significance for clinical importance. Option B ignores the tiny effect size and the observational design. Option C misinterprets an RR close to 1.0: this does not disprove a link, but shows that the link, if real, is tiny.
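To illustrate why a very large sample can make a tiny effect statistically significant, here is a minimal Python sketch using a pooled two-proportion z-test. The risks of 10% and 9.7% are the values implied by ARR = 0.3% and RRR = 3%. The even split into two groups and the choice of test are assumptions, so the p-values printed are not the study's own.

```python
import math


def two_proportion_p_value(risk_a, risk_b, n_per_group):
    """Two-sided p-value from a pooled two-proportion z-test (equal group sizes)."""
    pooled = (risk_a + risk_b) / 2
    standard_error = math.sqrt(pooled * (1 - pooled) * 2 / n_per_group)
    z = abs(risk_a - risk_b) / standard_error
    return math.erfc(z / math.sqrt(2))


# Risks implied by the answer: ARR 0.3% with RRR 3% gives 10% vs 9.7%.
comparison_risk, exposed_risk = 0.10, 0.097

# Assumption: participants are split evenly into two groups.
p_large = two_proportion_p_value(comparison_risk, exposed_risk, 500_000)
p_small = two_proportion_p_value(comparison_risk, exposed_risk, 500)

print(f"500,000 per group: p = {p_large:.2g}")
print(f"500 per group:     p = {p_small:.2g}")
print(f"NNT in both cases: {1 / (comparison_risk - exposed_risk):.0f}")
```

The effect size and NNT are identical in both runs; only the sample size changes, and with it the p-value.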

4. A — Systematic reviews use pre-specified methods to minimise selection bias and pool results for greater power. Option B is wrong — researcher experience is not the basis for the hierarchy. Option C is wrong — systematic reviews typically focus on high-quality studies and assess quality explicitly. Option D is wrong — large combined samples increase power but don't guarantee significance, and the value is in quality not just size.

5. B — NNT = 8 is promising but the trial limitations (single-blind, 80 patients, one site, exclusion of over-70s, 25% dropout) substantially reduce confidence in the result. The doctor's reasoning acknowledges the NNT but ignores the methodological caveats. Option A ignores the limitations. Option C is wrong — 4-week trials can produce valid evidence for acute conditions. Option D overstates the impact of dropout — it is a limitation, not an automatic invalidation, especially if an intention-to-treat analysis was done.

Short Answer Model Answers

Q6 (4 marks): ARR = 20% − 11% = 9 percentage points (0.09) [1 mark]. NNT = 1 ÷ 0.09 = 11.1 ≈ 11. For every 11 patients treated with this drug, one extra tumour recurrence is prevented [1 mark]. Verification of headline: RRR = 9% ÷ 20% = 45%, confirming that the headline figure is the relative risk reduction. Why the headline misleads: the 45% figure is a relative measure that expresses the reduction as a proportion of the baseline risk (20%). A patient reading "slashes risk by 45%" is likely to interpret this as their personal risk dropping by 45 percentage points (e.g. from 20% to near zero), which is incorrect. The actual personal risk drops from 20% to 11%, a 9 percentage point absolute reduction [1 mark]. The NNT of 11 means that out of 11 patients treated, 10 experience the same outcome regardless of treatment. Only 1 in 11 patients benefits specifically from the drug vs placebo. This is still a clinically useful NNT, but the benefit is more accurately framed for the patient as "an 11 in 100 chance rather than 20 in 100", not "45% better" [1 mark — 4 marks total].
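The three measures in Q6 can be computed together, as in the following Python sketch. The function name `risk_measures` is illustrative, and the 2% baseline in the second call is a hypothetical comparison added to show how the same relative reduction yields a very different absolute benefit.

```python
def risk_measures(control_risk, treated_risk):
    """Return ARR, RRR and NNT for risks given as fractions (0 to 1)."""
    arr = control_risk - treated_risk
    rrr = arr / control_risk
    nnt = 1 / arr
    return arr, rrr, nnt


# Q6 figures: recurrence falls from 20% to 11%.
arr, rrr, nnt = risk_measures(0.20, 0.11)
print(f"Baseline 20%: ARR = {arr * 100:.1f} points, RRR = {rrr:.0%}, NNT = {nnt:.0f}")

# Hypothetical comparison: the same 45% RRR applied to a 2% baseline risk.
low_arr, low_rrr, low_nnt = risk_measures(0.02, 0.02 * (1 - 0.45))
print(f"Baseline 2%:  ARR = {low_arr * 100:.1f} points, RRR = {low_rrr:.0%}, NNT = {low_nnt:.0f}")
```

Both calls report the same 45% relative reduction, yet the NNT rises from about 11 to about 111 when the baseline risk is ten times lower.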

Q7 (5 marks): What the survival curve data shows: at 3 years, 60% of targeted therapy patients remained alive compared to 35% of chemotherapy patients, an absolute difference of 25 percentage points. This is both statistically significant (p < 0.001) and clinically meaningful. The proportion surviving to 3 years was about 1.7 times higher with targeted therapy (60% vs 35%), which is a substantial improvement for a disease as serious as lung cancer [1 mark]. What the data does NOT show: (1) Survival beyond 3 years: the trial follow-up is 3 years, so we do not know if the curves converge later, whether all responders eventually relapse, or what the 5-year survival rates are. (2) The side effect and quality-of-life profile of the targeted therapy: a treatment with dramatically better survival but severe chronic toxicity may not be preferable for all patients. (3) Whether these results apply to all lung cancer patients: targeted therapies (e.g. EGFR inhibitors, ALK inhibitors) typically only benefit patients whose tumours carry specific genetic mutations. The trial may have enrolled a mutation-selected population, making results non-generalisable to unselected patients [2 marks]. Additional information needed: (1) mutation profiling to show which patients carry the targetable mutation; (2) longer follow-up data (5-year, 10-year survival curves); (3) toxicity data to compare quality-adjusted survival; (4) cost-effectiveness data, since targeted therapies are typically very expensive; (5) head-to-head comparison in both mutation-positive and mutation-negative populations to define which patients actually benefit [1 mark]. 
Conclusion: the claim that this therapy should "immediately replace chemotherapy for all lung cancer patients" is not supported — the 3-year survival data is promising and justifies prioritising this therapy for mutation-positive patients, but widespread adoption across all patients requires evidence of benefit in unselected populations, longer-term data, and consideration of toxicity and cost [1 mark — 5 marks total].
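As a quick check on the Q7 survival figures, the sketch below computes both the absolute and the relative comparison at the 3-year time point. The function name `compare_survival` is hypothetical; the 60% and 35% values are those quoted in the model answer.

```python
def compare_survival(survival_new, survival_standard):
    """Return (absolute difference in percentage points, relative ratio)."""
    absolute_points = (survival_new - survival_standard) * 100
    ratio = survival_new / survival_standard
    return absolute_points, ratio


# Q7 figures: 60% alive at 3 years on targeted therapy, 35% on chemotherapy.
points, ratio = compare_survival(0.60, 0.35)
print(f"Absolute difference: {points:.0f} percentage points")
print(f"Relative ratio: {ratio:.2f} times the chemotherapy survival")
print(f"Extra survivors per 100 patients treated: {points:.0f}")
```

Reporting both figures side by side guards against the same relative-versus-absolute confusion discussed for risk reductions.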

Q8 (6 marks): Strengths of individual RCTs: randomisation distributes known and unknown confounders evenly between groups on average, so any observed difference is likely to be due to the intervention rather than pre-existing differences. Double-blinding reduces both performance bias (participants behaving differently if they know their allocation) and detection bias (assessors rating outcomes differently). A well-powered, well-designed RCT with a pre-specified primary outcome is the strongest single study for establishing efficacy and causation of benefit [1.5 marks]. Limitations of individual RCTs: (1) Chance variation: if a treatment truly has no effect, a trial tested at the p < 0.05 threshold still has a 5% probability of producing a false positive result. A single positive trial may therefore reflect sampling variation. (2) Publication bias: negative RCTs are less frequently published, so the literature may systematically overestimate treatment efficacy if only positive trials are reported. (3) Limited generalisability: trials often use narrow patient eligibility criteria (excluding elderly, multi-morbid, pregnant patients) that may not reflect real-world prescribing populations. (4) Potentially underpowered: a small RCT may miss a real effect, and a statistically significant result in a single subgroup may arise by chance [2 marks]. Role of systematic review and replication: systematic reviews pool results from multiple independent RCTs, dramatically increasing statistical power to detect true effects while averaging out chance findings in individual trials. Pre-specified inclusion criteria minimise selection bias. Publication bias assessment (e.g. funnel plots) partially addresses the overestimation of positive results. If an effect is genuine, it should appear consistently across multiple trials in different populations; consistency is a key Bradford Hill criterion for causation [1.5 marks]. 
When a single trial may justify action vs when more evidence is needed: a single RCT may appropriately change practice when: the disease is severe and life-threatening with no existing effective treatment; the trial is large, well-powered, double-blinded, and shows a very large effect size (NNT very low); and the biological mechanism is clearly understood, making the result biologically plausible. It may be appropriate to wait for replication when: existing effective treatments are available; the effect size is modest; the trial population is narrow; there are concerns about funding bias; or the disease is relatively minor and the drug has significant side effects [1 mark — 6 marks total].
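The point about chance variation in single trials can be illustrated with a small simulation. The sketch below runs many hypothetical trials in which both arms share the same true event risk, then counts how often a pooled two-proportion z-test gives p < 0.05. All parameters (2000 trials, 200 patients per arm, 30% risk, the seed) are assumptions chosen for illustration only.

```python
import math
import random


def p_value(events_a, events_b, n):
    """Two-sided p-value from a pooled two-proportion z-test, n patients per arm."""
    pooled = (events_a + events_b) / (2 * n)
    if pooled in (0, 1):
        return 1.0  # identical outcomes in both arms: no evidence of a difference
    standard_error = math.sqrt(pooled * (1 - pooled) * 2 / n)
    z = abs(events_a - events_b) / n / standard_error
    return math.erfc(z / math.sqrt(2))


def simulate_null_trials(n_trials=2000, n_per_arm=200, true_risk=0.30, seed=1):
    """Fraction of trials reaching p < 0.05 when both arms share the same true risk."""
    rng = random.Random(seed)
    false_positives = 0
    for _ in range(n_trials):
        events_a = sum(rng.random() < true_risk for _ in range(n_per_arm))
        events_b = sum(rng.random() < true_risk for _ in range(n_per_arm))
        if p_value(events_a, events_b, n_per_arm) < 0.05:
            false_positives += 1
    return false_positives / n_trials


rate = simulate_null_trials()
print(f"Share of 'significant' trials with no real effect: {rate:.1%}")
```

The share should land close to 5%, which is why replication and pooled evidence carry more weight than any one positive trial.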
