Year 12 Biology · Module 8 · IQ3 · Lesson 12 of 21 · 45 min

Epidemiology — Incidence, Prevalence, Mortality and Study Design

Every claim about disease — "smoking causes lung cancer," "obesity increases heart disease risk," "this vaccine reduces infection by 95%" — comes from epidemiology. This lesson builds the tools to understand how those claims are generated, what makes them reliable, and how to critically evaluate them.


Think First — Data Literacy

Is "More People Have Diabetes" the Same as "Diabetes Is Getting Worse"?

Between 2000 and 2022, the total number of Australians diagnosed with Type 2 diabetes more than doubled. Headlines reported this as evidence that Australia's diabetes epidemic was worsening catastrophically.

But over that same period, Australia's population also grew substantially — and aged significantly. More people were also being screened and diagnosed than ever before. A researcher argues that when you adjust for population size and age structure, the age-standardised incidence rate of Type 2 diabetes has actually been relatively stable or even declining in some age groups.

Before reading on:

Q1: What is the difference between the total number of cases of a disease and the rate of disease in a population? Why does this distinction matter for public health decisions?

Q2: Why might improved screening and diagnosis make a disease appear to be increasing even if the underlying rate is unchanged?

✏️ Write your reasoning before reading on.

Key Terms — scan before reading

Confounding variable: a variable that is associated with both the exposure and the outcome, which can create a spurious or distorted apparent association between them.

Know

The definitions and formulas for incidence, prevalence, and mortality rate
The key features of cohort, case-control, cross-sectional, and randomised controlled trial (RCT) study designs
What a confounding variable is and why it matters
The difference between correlation and causation in epidemiological data

Understand

Why incidence and prevalence give different pictures of disease burden
Why age-standardised rates are needed for valid population comparisons
Why observational studies can establish an association but cannot, on their own, prove causation
Why RCTs are considered the gold standard and when they cannot be used

Can Do

Calculate or interpret incidence, prevalence, and mortality rates from data
Identify the most appropriate study design for a given research question
Identify confounding variables in epidemiological scenarios
Evaluate whether epidemiological data supports a causal or merely associative conclusion

Core Content

Key Point

Connect this concept back to the broader homeostasis and disease framework you have built across the course.

Measuring Disease — Incidence, Prevalence and Mortality

Three different measurements of disease burden, each answering a different question about population health

Before any analysis of disease patterns can be done, the disease must be measured consistently. Epidemiologists use three core measures — incidence, prevalence, and mortality — each capturing a different aspect of disease burden. Confusing these three measures is one of the most common errors in interpreting health data.

Incidence — New Cases Over Time

What it measures: The rate at which new cases of a disease arise in a population over a defined time period. Answers the question: "How fast is this disease spreading or developing?"


Incidence rate = (Number of NEW cases in time period ÷ Population at risk) × 100,000

Example: If 1,800 people are newly diagnosed with melanoma in a population of 10 million in one year, the incidence rate is 18 per 100,000 per year.

Best used for: Measuring the risk of developing a disease; assessing whether a disease is becoming more or less common; evaluating the impact of prevention programs.

Interpretation note: Rising incidence can reflect genuinely increased disease burden, OR improved screening/diagnosis detecting cases that previously went undetected.
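The incidence formula and the melanoma example above can be checked with a few lines of Python. This is a minimal sketch: the function name `incidence_rate` and its `per` parameter are illustrative choices, not standard terminology.

```python
def incidence_rate(new_cases, population_at_risk, per=100_000):
    """New cases per `per` people at risk over the chosen time period."""
    return new_cases / population_at_risk * per

# Melanoma example from the text: 1,800 new cases among 10 million people in one year
melanoma = incidence_rate(1_800, 10_000_000)
print(f"{melanoma:.0f} per 100,000 per year")  # 18 per 100,000 per year
```

Multiplying by 100,000 only rescales the answer so that small proportions are easier to read and compare.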

Prevalence — All Existing Cases at a Point in Time

What it measures: The total proportion of a population that has a condition at a specific time point (point prevalence) or during a specified period (period prevalence). Answers: "How much of this disease exists in the community right now?"

Prevalence = (Number of EXISTING cases ÷ Total population) × 100

Example: If 1.3 million of Australia's 26 million people have Type 2 diabetes at a given time, prevalence is 5%.

Best used for: Healthcare planning (how many people need treatment?); allocating health resources; understanding disease burden on the healthcare system.

Key relationship: Prevalence ≈ Incidence × Average duration of disease. This approximation holds when incidence and duration are roughly stable over time. A disease with low incidence but long duration (e.g. Type 2 diabetes, which is chronic and lifelong) has high prevalence. A disease with high incidence but short duration (e.g. influenza, which resolves or kills quickly) has lower prevalence relative to its incidence.
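The prevalence formula and its link to incidence and duration can be sketched numerically. The function names are illustrative, and the two comparison diseases are hypothetical, with numbers chosen only to show how duration drives prevalence when incidence is the same.

```python
def prevalence_percent(existing_cases, total_population):
    """Existing cases as a percentage of the total population."""
    return existing_cases / total_population * 100

def steady_state_prevalence(incidence_per_100k, mean_duration_years):
    """Approximate prevalence per 100,000, assuming stable incidence and duration."""
    return incidence_per_100k * mean_duration_years

# Type 2 diabetes example from the text
t2d = prevalence_percent(1_300_000, 26_000_000)
print(f"Type 2 diabetes prevalence: {t2d:.1f}%")  # 5.0%

# Hypothetical diseases with the same incidence but very different durations
chronic = steady_state_prevalence(50, 10)     # lasts about 10 years
acute = steady_state_prevalence(50, 2 / 52)   # lasts about 2 weeks
print(f"Chronic: {chronic:.0f} per 100,000; acute: {acute:.1f} per 100,000")
```

With identical incidence, the long-lasting condition ends up with a prevalence hundreds of times higher than the short-lived one.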

Mortality Rate — Deaths From Disease

What it measures: The number of deaths attributable to a specific disease per unit of population per unit of time. Distinct from case fatality rate (proportion of cases that die).

Mortality rate = (Number of deaths from disease ÷ Population) × 100,000 per year

Example: If 1,800 people die of coronary heart disease per year in a population of 10 million, mortality rate is 18 per 100,000 per year.

Best used for: Measuring the population-level impact of a disease; assessing the impact of treatment advances; comparing the death toll of different diseases (lethality per case is measured by the case fatality rate instead).

Important distinction: A disease can have high incidence but low mortality (e.g. most skin cancers — common but rarely fatal if caught early) or low incidence but high mortality (e.g. pancreatic cancer — rare but ~90% mortality within 5 years).
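The difference between mortality rate (deaths per head of population) and case fatality rate (deaths per case) is easier to see with numbers. The sketch below uses the coronary heart disease example from the text, plus two hypothetical diseases in a population of 1 million; all function names and the hypothetical counts are assumptions made for illustration.

```python
def mortality_rate(deaths, population, per=100_000):
    """Deaths from the disease per `per` people in the whole population per year."""
    return deaths / population * per

def case_fatality_percent(deaths, cases):
    """Percentage of people WITH the disease who die from it."""
    return deaths / cases * 100

# Coronary heart disease example from the text
chd = mortality_rate(1_800, 10_000_000)
print(f"CHD mortality: {chd:.0f} per 100,000 per year")  # 18 per 100,000 per year

# Hypothetical diseases in a population of 1 million
diseases = {
    "common, mild": {"cases": 20_000, "deaths": 20},
    "rare, lethal": {"cases": 100, "deaths": 90},
}
for name, d in diseases.items():
    print(name,
          f"mortality {mortality_rate(d['deaths'], 1_000_000):.0f} per 100,000;",
          f"case fatality {case_fatality_percent(d['deaths'], d['cases']):.1f}%")
```

Both hypothetical diseases have low mortality rates in the population, but one kills 0.1% of the people who get it and the other kills 90%.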

Age-standardisation — making fair comparisons

Raw (crude) rates cannot always be fairly compared between populations with different age structures. Older populations tend to have higher crude rates of age-related diseases (cancer, cardiovascular disease, dementia) simply because they contain more older people, not necessarily because those diseases are more common at any given age. Age-standardisation applies the same standard age distribution to both populations, allowing the underlying disease rates to be compared on a level playing field.

This is why Australia's age-standardised cancer mortality has been falling for decades even as the total number of cancer deaths has risen — improved treatment has reduced the death rate per case, but the population is larger and older, producing more total deaths despite the improved rate.
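One common way to age-standardise is direct standardisation: calculate the rate within each age group, then weight those rates by the age shares of a single reference population. The sketch below illustrates the idea only. The age bands, the standard weights and both populations are hypothetical, and the source does not specify which standardisation method is used in official statistics.

```python
# All numbers are hypothetical and chosen only to illustrate the method.
STANDARD_WEIGHTS = {"0-39": 0.50, "40-64": 0.30, "65+": 0.20}  # shares of a reference population

def crude_rate(deaths_by_age, pop_by_age, per=100_000):
    """Total deaths divided by total population, ignoring age structure."""
    return sum(deaths_by_age.values()) / sum(pop_by_age.values()) * per

def age_standardised_rate(deaths_by_age, pop_by_age, weights=STANDARD_WEIGHTS, per=100_000):
    """Direct standardisation: weight each age-specific rate by the standard share."""
    return sum(weights[g] * deaths_by_age[g] / pop_by_age[g] for g in weights) * per

older = {"pop": {"0-39": 300_000, "40-64": 300_000, "65+": 400_000},
         "deaths": {"0-39": 30, "40-64": 300, "65+": 2_000}}
younger = {"pop": {"0-39": 600_000, "40-64": 300_000, "65+": 100_000},
           "deaths": {"0-39": 90, "40-64": 450, "65+": 700}}

for name, p in [("Older population", older), ("Younger population", younger)]:
    print(name,
          f"crude {crude_rate(p['deaths'], p['pop']):.1f},",
          f"age-standardised {age_standardised_rate(p['deaths'], p['pop']):.1f}",
          "per 100,000")
```

In this made-up data the older population has the higher crude rate, yet the younger population has the higher death rate within every age band, so its age-standardised rate is higher.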

Common Error: Students often confuse incidence and prevalence. Key distinction: incidence = NEW cases in a time period (a rate of new events); prevalence = ALL existing cases at a point in time (a snapshot). A disease with effective treatment that extends life will see rising prevalence even if incidence is stable or falling, because people live longer with the disease. HIV is the classic example: effective antiretroviral therapy means fewer people die, so prevalence rises even as incidence (new infections) falls in high-income countries.

Epidemiological Study Designs — From Observation to Experiment

Different questions require different study designs — each with characteristic strengths, limitations, and appropriate uses


Key Takeaway

The best study design depends on what question you are asking. Cohort studies establish temporal sequence for common outcomes. Case-control studies are efficient for rare diseases. Cross-sectional studies give population snapshots. RCTs provide the strongest causal evidence but cannot be used for harmful exposures.

Epidemiologists cannot randomly assign people to smoke cigarettes or eat unhealthy diets for decades to study the effect on health — most important questions about disease and exposure must be studied observationally. The choice of study design determines what questions can be answered and what conclusions can be drawn.

Cohort Study (Prospective)

Design: A group of disease-free people is followed over time. Exposed and unexposed subgroups are compared for disease development.

Strength: Establishes temporal sequence (exposure before disease); good for common outcomes; can study multiple outcomes from one exposure.

Limitation: Slow and expensive; loss to follow-up; not practical for rare diseases.

Example: British Doctors Study (Doll and Hill) — followed 40,000 doctors from 1951, comparing smoking status to lung cancer rates over decades.
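Comparing exposed and unexposed subgroups in a cohort usually means comparing their risks, often as a relative risk. The sketch below uses a hypothetical cohort; the counts are invented for illustration and are not the British Doctors Study data.

```python
def risk(cases, group_size):
    """Proportion of a group that develops the disease during follow-up."""
    return cases / group_size

def relative_risk(exposed_cases, exposed_n, unexposed_cases, unexposed_n):
    """Risk in the exposed group divided by risk in the unexposed group."""
    return risk(exposed_cases, exposed_n) / risk(unexposed_cases, unexposed_n)

# Hypothetical cohort: 10,000 smokers and 30,000 non-smokers followed for the same period
rr = relative_risk(exposed_cases=200, exposed_n=10_000,
                   unexposed_cases=30, unexposed_n=30_000)
print(f"Relative risk: {rr:.0f}")  # 20
```

A relative risk of 1 would mean no difference between the groups; here the exposed group is 20 times more likely to develop the disease.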

Case-Control Study (Retrospective)

Design: People with a disease (cases) are compared with disease-free people (controls). Past exposures are compared between groups.

Strength: Efficient for rare diseases; quick and inexpensive; can study multiple exposures simultaneously.

Limitation: Relies on recalled exposure (recall bias); cannot establish temporal sequence as clearly; cannot directly calculate incidence.

Example: Comparing asbestos exposure history in mesothelioma patients vs controls without mesothelioma.

Cross-Sectional Study

Design: Measures both exposure and disease at the same time point — a population snapshot.

Strength: Quick and cheap; good for measuring prevalence; generates hypotheses for further study.

Limitation: Cannot establish which came first (exposure or disease); susceptible to prevalence bias; cannot calculate incidence.

Example: National Health Survey measuring smoking status and cardiovascular disease in a sample of Australians at one time point.

Randomised Controlled Trial (RCT)

Design: Participants randomly allocated to intervention (treatment/exposure) or control (placebo/no treatment) groups. Outcomes compared after defined follow-up period.

Strength: Randomisation controls for confounders — the gold standard for establishing causation. Double-blinding reduces bias.

Limitation: Cannot be used for harmful exposures (unethical); expensive; may lack real-world generalisability.

Example: HPV vaccine trials — participants randomly assigned to vaccine or placebo, HPV infection and precancerous lesion rates compared.
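Why does random allocation deal with confounders? A small simulation can make the idea concrete. This sketch assumes a hypothetical trial of 10,000 participants, about 30% of whom are smokers; the participant records and the helper `smoker_share` are invented for illustration.

```python
import random

random.seed(42)  # fixed seed so the sketch is repeatable

# Hypothetical participants: about 30% are smokers (a potential confounder)
participants = [{"id": i, "smoker": random.random() < 0.30} for i in range(10_000)]

# Random allocation: shuffle, then split the list in half
random.shuffle(participants)
treatment, control = participants[:5_000], participants[5_000:]

def smoker_share(group):
    """Fraction of a group who are smokers."""
    return sum(p["smoker"] for p in group) / len(group)

print(f"Smokers in treatment arm: {smoker_share(treatment):.1%}")
print(f"Smokers in control arm:   {smoker_share(control):.1%}")
```

Nobody chose which arm the smokers went into, yet the two arms end up with very similar proportions of smokers. The same balancing happens for confounders the researchers never measured, which is what observational designs cannot guarantee.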

HSC Tip: When asked to choose or evaluate a study design in an exam, always address: (1) whether it can establish temporal sequence (exposure before disease); (2) whether it controls for confounders; (3) whether it is ethical and practical. RCTs are the gold standard but are often impossible for disease risk questions. Cohort studies are the next best option for establishing causation; case-control studies suit rare diseases; cross-sectional studies suit measuring prevalence.

Confounding Variables, Bias and the Limits of Epidemiological Evidence

Association ≠ causation — understanding what can go wrong in epidemiological studies

Epidemiology measures associations between exposures and diseases in real populations — which means it must contend with all the complexity of real life. Confounding variables, biases, and chance findings can all produce apparent associations that are not genuinely causal. Critical evaluation of epidemiological evidence requires recognising these limitations.

Confounding variables

A confounding variable is one that is associated with both the exposure being studied and the disease outcome, and whose presence can create a spurious or distorted apparent relationship. Classic example: a study finds that coffee drinking is associated with lung cancer. Apparent conclusion: coffee causes lung cancer. But coffee drinkers in the 1950s–1980s were also far more likely to smoke. Smoking is the confounder — it is associated with both coffee drinking (same social context) and lung cancer (causally). When you control for smoking status, the coffee-cancer association largely disappears.

Confounders can be controlled by: matching cases and controls on confounding variables; statistical adjustment; stratified analysis; or — best of all — randomisation (which distributes confounders equally between groups by chance).
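Stratified analysis can be sketched with the coffee and lung cancer example. The counts below are hypothetical and deliberately constructed so that coffee has no effect within either smoking group; only the method is the point.

```python
# Hypothetical counts: (lung cancer cases, people) for each group
data = {
    "smokers":     {"coffee": (160, 8_000), "no_coffee": (40, 2_000)},
    "non_smokers": {"coffee": (2, 2_000),   "no_coffee": (8, 8_000)},
}

def relative_risk(exposed, unexposed):
    """Risk in the exposed group divided by risk in the unexposed group."""
    (cases_e, n_e), (cases_u, n_u) = exposed, unexposed
    return (cases_e / n_e) / (cases_u / n_u)

def pooled(group):
    """Add the strata together, ignoring smoking status."""
    cases = sum(stratum[group][0] for stratum in data.values())
    people = sum(stratum[group][1] for stratum in data.values())
    return cases, people

crude_rr = relative_risk(pooled("coffee"), pooled("no_coffee"))
print(f"Crude RR (smoking ignored): {crude_rr:.1f}")  # 3.4

stratum_rr = {name: relative_risk(g["coffee"], g["no_coffee"]) for name, g in data.items()}
for name, rr in stratum_rr.items():
    print(f"RR among {name}: {rr:.1f}")  # 1.0 in both strata
```

Ignoring smoking, coffee drinkers appear to have more than three times the risk. Within smokers and within non-smokers the relative risk is 1, so in this invented data the whole crude association comes from coffee drinkers being more likely to smoke.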

Types of bias

Selection bias: The sample does not represent the target population. Example: the healthy worker effect. Employed people are, on average, healthier than the general population, so comparing workers with the general population can underestimate the harm caused by an occupational exposure.
Recall bias: Cases (who have a disease) may remember past exposures differently from controls (who do not). For example, people who have developed cancer may recall exposure to potential carcinogens more carefully than healthy controls.
Information bias: Systematic errors in measuring exposure or outcome, such as misclassification of disease status or exposure level.
Reporting bias: Certain outcomes are more likely to be reported or published. In publication bias, positive results are more likely to be published than null findings.

Correlation vs causation

Two variables can be correlated (statistically associated) without one causing the other. Classic examples: ice cream sales correlate with drowning rates (both rise in summer, so the association is confounded by hot weather), and countries with higher chocolate consumption have more Nobel Prize winners per capita (confounded by wealth and education). In epidemiology, establishing causation requires more than statistical association: the evidence is weighed against the Bradford Hill criteria (from L08), which are strength, consistency, specificity, temporality, dose-response, biological plausibility, coherence, experiment, and analogy.

Criterion	Meaning	Example (tobacco-lung cancer)
Strength	Large relative risk	Smokers have 15–25× higher lung cancer risk than non-smokers
Consistency	Association replicated in multiple studies/populations	Found in studies across dozens of countries and populations
Temporality	Exposure precedes disease	Smoking precedes cancer by 20–40 years
Dose-response	More exposure = more disease	More pack-years = higher lung cancer risk; quitting reduces risk
Biological plausibility	Known biological mechanism	PAHs form DNA adducts → G→T mutations in TP53 (L08)
Specificity	Exposure linked to specific disease(s)	Tobacco specifically causes lung and other cancers, not all diseases equally

Epidemiological measures showing incidence, prevalence and mortality rate definitions

The three core epidemiological measures and why age-standardised rates are essential for valid comparisons between populations.

IQ3 Framing: The IQ3 inquiry question asks you to "investigate the treatment of non-infectious diseases." Epidemiology is the foundation of that investigation: you cannot evaluate whether a treatment or prevention strategy works without measuring disease rates, identifying risk factors, designing studies to test interventions, and critically evaluating the evidence. The skills in this lesson apply to everything in IQ3 and IQ4.

Reading and Interpreting Epidemiological Data

The practical skills needed to interpret tables, graphs, and data from studies — tested directly in HSC exams

HSC Biology exams regularly include tables or graphs of epidemiological data and ask students to interpret, analyse, and evaluate them. These questions test whether you can read what the data shows (describe), identify patterns and relationships (analyse), and assess whether the data supports a conclusion (evaluate).

Worked example — interpreting a data table

The following table shows hypothetical data on Type 2 diabetes in Australia:

Year	Total diagnosed cases	Population (millions)	Crude prevalence (%)	Age-standardised prevalence (%)
2000	640,000	19.2	3.3%	4.1%
2010	970,000	22.3	4.4%	4.3%
2022	1,300,000	25.9	5.0%	4.2%

What you should notice and state:

Total cases increased by ~100% from 2000 to 2022 — but this partly reflects population growth.
Crude prevalence increased from 3.3% to 5.0% — but this partly reflects the ageing of the population (older people have higher T2D rates).
Age-standardised prevalence changed much less (4.1% → 4.2%), suggesting the underlying disease rate in comparable age groups has been relatively stable, not dramatically increasing. Much of the apparent increase reflects demographic change rather than a worsening epidemic.
This illustrates why age-standardised rates are essential for valid comparisons over time and between populations.
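The comparisons in the points above can be reproduced directly from the table. The sketch below copies the hypothetical table values as printed; the dictionary layout and the helper `percent_change` are illustrative choices.

```python
# Rows copied from the hypothetical table in the text
rows = {
    2000: {"cases": 640_000, "crude_pct": 3.3, "age_std_pct": 4.1},
    2010: {"cases": 970_000, "crude_pct": 4.4, "age_std_pct": 4.3},
    2022: {"cases": 1_300_000, "crude_pct": 5.0, "age_std_pct": 4.2},
}

def percent_change(old, new):
    """Relative change from old to new, as a percentage of old."""
    return (new - old) / old * 100

start, end = rows[2000], rows[2022]
cases_change = percent_change(start["cases"], end["cases"])
crude_change = end["crude_pct"] - start["crude_pct"]        # percentage points
age_std_change = end["age_std_pct"] - start["age_std_pct"]  # percentage points

print(f"Total cases: {cases_change:+.0f}%")
print(f"Crude prevalence: {crude_change:+.1f} percentage points")
print(f"Age-standardised prevalence: {age_std_change:+.1f} percentage points")
```

Note the two kinds of change: total cases are compared as a relative (percent) change, while the two prevalence columns are compared as differences in percentage points. Stating which one you are quoting avoids a common source of confusion in exam answers.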

Exam Technique: When asked to "analyse" epidemiological data in HSC exams: (1) Describe the overall trend; (2) Quote specific data values from the table/graph to support your description; (3) Identify any patterns, anomalies, or differences between groups; (4) If asked to evaluate, state what conclusions can and cannot be drawn, and always note limitations (confounders, correlation vs causation, age-standardisation).

Real-World Anchor — The Doll and Hill British Doctors Study

How a Cohort Study Changed Medicine and Public Policy

In 1951, Richard Doll and Austin Bradford Hill sent questionnaires to every doctor on the British Medical Register asking about their smoking habits. They then followed these ~40,000 doctors for decades, recording causes of death. This was one of the first large prospective cohort studies — and it produced the most compelling epidemiological evidence for the smoking-lung cancer causal link.

Within 4 years, the data were clear enough that Doll himself — a smoker — quit. After 50 years of follow-up, the study had quantified that smoking reduced life expectancy by approximately 10 years, established the dose-response relationship between pack-years and lung cancer mortality, and documented the survival benefit of quitting at different ages. Doctors who quit before age 35 had near-normal life expectancy; those who quit at 65 had reduced but still significant benefit.

The study design was crucial: by following people forward in time (prospective cohort), it established that smoking preceded lung cancer — ruling out reverse causation. By following a large, well-defined professional cohort with reliable death certification, it minimised selection bias and information bias. The results were consistent across subgroups, showed a clear dose-response, and had an identified biological mechanism (carcinogens in smoke). This is exactly how Bradford Hill's criteria for causation are applied in practice.

Priority Misconceptions — Epidemiology

✗

"Prevalence and incidence mean the same thing." — Incidence is the rate of NEW cases arising per unit time. Prevalence is the total existing cases at a point in time. A disease with effective treatment that extends life (e.g. HIV in high-income countries) will have rising prevalence even if incidence is falling — because people live longer with the disease. Always specify which measure you are using.

✗

"Correlation means causation." — Statistical association between an exposure and disease does not prove causation. A third variable (confounder) may explain the association. Causation requires temporal sequence, dose-response, biological plausibility, consistency, and ideally experimental confirmation — the Bradford Hill criteria.

✗

"RCTs can always be used to test hypotheses about disease causes." — RCTs cannot ethically be used to study harmful exposures. You cannot randomly assign people to smoke for 20 years to study lung cancer. For questions about harmful exposures, observational studies (cohort, case-control) are the only ethical approach. RCTs are used for testing treatments and preventive interventions, not for studying harmful exposures.

✗

"A study with more participants is always better." — Sample size matters, but study design matters more. A very large cross-sectional study cannot establish temporal sequence — it cannot determine whether the exposure preceded the disease. A large observational study with uncontrolled confounders will produce a large, precisely wrong answer. Design quality, control of bias, and appropriate methods are more important than size alone.

✗

"If a disease rate is rising, the disease is becoming more common." — Rising rates can reflect: genuinely increasing disease burden; population growth (more absolute cases from the same rate); population ageing (more age-susceptible people); improved screening and diagnosis (detecting cases that previously went unrecognised); changes in diagnostic criteria (new definition includes previously uncounted cases). Always ask whether rates are crude (unadjusted) or age-standardised before interpreting a trend.

Image Slot 1: Visual comparison of incidence vs prevalence using a bathtub analogy — water flowing IN = incidence (new cases); water in the tub = prevalence (all existing cases); water flowing OUT = recovery or death. Show how effective treatment (slower outflow of death) raises the water level (prevalence) even if inflow (incidence) is constant. Label with disease examples.

Image Slot 2: Study design hierarchy diagram — pyramid showing RCT at top (strongest evidence), cohort study, case-control study, cross-sectional study, and case reports at the base. For each level: direction of time (prospective/retrospective), ability to establish causation, and practical limitations. Annotated with examples of each from Australian public health context.

Country	CVD deaths	Population	Crude mortality (per 100k)	Age-standardised mortality (per 100k)
Country A	48,000	24 million	200	145
Country B	18,000	12 million	150	190

Multiple Choice

Test Your Understanding

Understand · Band 3

1. A chronic disease has an annual incidence rate of 50 per 100,000 and an average disease duration of 10 years (before recovery or death). Which statement about this disease is most likely to be correct?

A. The prevalence will be approximately equal to the incidence rate: 50 per 100,000

B. The prevalence will be approximately 500 per 100,000, because prevalence ≈ incidence × duration (50 × 10)

C. The prevalence will be lower than the incidence rate because chronic diseases accumulate deaths

D. Prevalence and incidence cannot be compared because they are measured in different units

Apply · Band 3

2. Researchers want to study whether long-term exposure to air pollution causes chronic obstructive pulmonary disease (COPD). They cannot randomly assign people to breathe polluted or clean air for decades. Which study design is most appropriate, and why?

A. Cross-sectional study, because it can quickly measure both air pollution exposure and COPD rates at one time point

B. Case-control study, because it is retrospective and can quickly compare past pollution exposure between COPD cases and controls

C. Cohort study, because it follows people forward in time, establishing that pollution exposure preceded COPD development; it can measure cumulative exposure over years and calculate incidence rates in high- vs low-exposure groups, providing the strongest observational evidence for causation

D. RCT, because randomisation is needed to control for confounders in respiratory disease research

Analyse · Band 4

3. A study finds that countries with higher rates of mobile phone ownership have higher rates of brain cancer. A newspaper runs the headline: "Mobile phones cause brain cancer." Which statement best evaluates this conclusion?

A. The conclusion is valid because the correlation is strong and consistent

B. The conclusion is invalid because correlation studies can never provide useful information about disease causes

C. The conclusion is valid if the association has been replicated in multiple studies

D. The conclusion is not supported. This is an ecological correlation between country-level data, which cannot establish causation at the individual level. Countries with high mobile phone ownership are typically wealthier, with better healthcare and diagnosis, different diets, and different exposures, any of which could confound the association. Individual-level studies with controlled confounders and biological plausibility would be needed before any causal claim could be made

Understand · Band 3

4. The crude cardiovascular disease mortality rate in Country X is 220 per 100,000 per year. The age-standardised rate is 140 per 100,000 per year. What does this difference most likely indicate?

A. Country X has an older-than-average population: the crude rate is inflated by the large proportion of elderly people who have higher CVD mortality rates; the age-standardised rate more accurately reflects the underlying disease risk after accounting for age structure

B. The country has been undercounting CVD deaths and the age-standardised rate is the more accurate measure of true mortality

C. CVD affects younger people disproportionately in Country X, causing the crude rate to underestimate the true burden

D. Age-standardised rates are always lower than crude rates because standardisation removes all disease cases from older age groups

Evaluate · Band 5

5. A large cohort study follows 50,000 people for 20 years and finds that people who eat processed meat 5+ times per week have a 40% higher relative risk of developing bowel cancer than those who eat it less than once per week. A researcher concludes that this study proves processed meat causes bowel cancer. Evaluate this conclusion.

A. The conclusion is fully justified: a 40% higher risk from a large cohort study is definitive proof of causation

B. The conclusion is unjustified because cohort studies can never provide evidence about disease causes

C. The conclusion is partially supported but overstated. The cohort study demonstrates a consistent association with temporal sequence (exposure precedes disease), which is stronger evidence than a cross-sectional study. However, residual confounding is possible (processed meat eaters may have other dietary and lifestyle differences), and 'proves' is too strong: epidemiological studies establish evidence, not proof. The finding is consistent with biological plausibility (heterocyclic amines in processed meat are carcinogens) and is supported by multiple other studies, making a causal interpretation reasonable, but the word 'proves' is inappropriate for any observational study

D. The conclusion is invalid because the relative risk increase of 40% is too small to be meaningful

Short Answer

✍

Short Answer Questions

Apply · Band 4

6. Distinguish between incidence and prevalence, and explain why effective treatment for a disease can cause its prevalence to rise even if its incidence is falling. Use a specific example in your answer. (4 marks)

✏️ Define both measures, explain the treatment effect, and give an example in your book.

Analyse · Band 4–5

7. A researcher is investigating whether regular physical activity reduces the risk of Type 2 diabetes. Describe how you would design a cohort study to investigate this question. In your answer, identify the cohort, the exposure and outcome variables, how data would be collected, and what would constitute evidence of an association. Identify one confounding variable and explain how it would be controlled. (5 marks)

✏️ Design the cohort study in full in your book.

Evaluate · Band 5–6

8. Evaluate the following claim using your knowledge of epidemiological evidence and study design: "Because an RCT is the gold standard for medical evidence, we should require RCT evidence before accepting any claim that an environmental exposure causes disease." (5 marks)

✏️ Evaluate the claim — strengths and limitations of the argument — in your book.

Revisit Your Thinking

Return to your Think First responses at the start of the lesson.

Q1 — total cases vs rate: Total case count is influenced by population size. Rate (cases per 100,000) controls for this — allowing valid comparison between different-sized populations and over time. Did you identify that rate = cases divided by population size?
Q2 — improved screening making disease appear to increase: Screening detects cases that previously existed but were undiagnosed. When screening uptake increases, the diagnosed (recorded) prevalence rises even if true prevalence is stable. Did you identify this 'ascertainment bias' in your prediction?
Write the formulas for incidence rate and prevalence from memory, and state in one sentence why age-standardised rates are more useful than crude rates for comparing populations.

Comprehensive Answers


Activity 1 — Calculations and Interpretation

1. Bowel cancer calculations. (a) Incidence rate = 450 ÷ 5,000,000 × 100,000 = 9 per 100,000 per year. (b) Prevalence = 8,500 ÷ 5,000,000 × 100 = 0.17%. (c) Case fatality rate = 90 ÷ 8,500 × 100 = 1.06% per year. This means approximately 1 in 100 existing bowel cancer patients dies from the disease each year — reflecting that many patients are diagnosed early (stage I–II) and survive for many years, while a smaller proportion with advanced disease contribute most deaths.

2. CVD country comparison. From crude rates: Country A has a higher crude CVD mortality rate (200 vs 150 per 100,000), suggesting more CVD deaths per person in Country A. However, from age-standardised rates: Country B actually has a higher age-standardised CVD mortality rate (190 vs 145 per 100,000). This reversal — where Country A has a higher crude rate but lower age-standardised rate than Country B — indicates that Country A has an older population. The large proportion of elderly people in Country A elevates the crude rate even though the underlying disease risk at each age is lower than in Country B. Age-standardised rates are more valid for comparing the underlying disease burden between populations because they remove the confounding effect of different age structures. If you wanted to know which country has riskier cardiovascular conditions for its residents, Country B's higher age-standardised rate indicates it has the greater underlying risk, despite having fewer total deaths per 100,000 in the raw data.

Activity 2 — Study Design Evaluation

1. New drug for T2D. Best design: Randomised Controlled Trial (RCT). Justification: the drug has been safety-tested and is believed to be beneficial, so it is ethical to assign participants to the drug vs placebo. Randomisation minimises confounding: both groups should have similar baseline characteristics (age, diet, activity, genetics) by chance, so any difference in T2D progression can be attributed to the drug. Double-blinding (neither participant nor researcher knows which group a participant is in) reduces measurement and assessment bias. Limitation: the trial population may not represent all T2D patients (often excludes very old, pregnant, or multi-morbid patients), limiting generalisability to these groups. Also, the trial duration may be insufficient to detect longer-term effects.

2. Childhood sun exposure and melanoma. Best design: Case-control study. Justification: participants are adults aged 40–60 — we cannot follow children prospectively for 30–40 years to study adult melanoma (too slow, too expensive, too much loss to follow-up). Case-control design efficiently recruits adults who already have melanoma (cases) and compares them to adults without melanoma (controls), asking both groups to recall their childhood sun exposure history. This is retrospective — looking back at exposure rather than following forward. Limitation: recall bias — melanoma patients may more carefully recall and report sun exposure in childhood than controls (who have no particular reason to think carefully about their childhood sun habits). This asymmetric recall can artificially inflate the apparent association between sun exposure and melanoma. This can be partially controlled by using objective measures (e.g. geographical records of sun exposure at birth location) rather than self-report.

3. Red wine and CVD. Confounder 1: socioeconomic status (SES). People who regularly drink moderate amounts of red wine tend to have higher SES. Higher SES is independently associated with lower CVD risk (better healthcare access, healthier diet, more physical activity, lower stress). SES is therefore associated with both the exposure (wine drinking) and the outcome (lower CVD risk) — a classic confounder. Confounder 2: diet quality. People who drink red wine moderately often follow a Mediterranean-style diet (high in vegetables, fish, olive oil, whole grains) which independently reduces CVD risk. Wine drinking may be a marker for this overall dietary pattern rather than a cause of reduced CVD risk. Why the study cannot establish causation: the study is observational — it shows that wine drinkers have lower CVD rates, but cannot determine whether the wine is causal or whether the association is entirely explained by confounders like SES and diet. Without controlling for these confounders (through statistical adjustment, matching, or preferably an RCT of moderate wine consumption), the headline claim is unjustified. An RCT randomising people to drink red wine vs not drink wine would be needed to establish causation — but such trials face practical challenges (compliance, ethics). Existing RCT evidence from polyphenol supplementation (the proposed active ingredient in red wine) does not support a strong protective effect.
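One way to see how a confounder such as SES can manufacture an association is to compare a pooled (crude) relative risk with the relative risk inside each SES stratum. The sketch below uses invented counts, chosen so that wine has no effect within either stratum; the stratum names, group sizes and event counts are all hypothetical.

```python
# Hypothetical cohort counts, invented for illustration only.
# Each entry: (people, CVD events) for one group within one SES stratum.
STRATA = {
    "high SES": {"drinkers": (800, 40), "non_drinkers": (200, 10)},
    "low SES": {"drinkers": (200, 30), "non_drinkers": (800, 120)},
}


def risk(people, events):
    """Proportion of a group that developed CVD."""
    return events / people


def relative_risk(exposed, unexposed):
    """Risk in the exposed group divided by risk in the unexposed group."""
    return risk(*exposed) / risk(*unexposed)


def pooled(group):
    """Add people and events across strata, ignoring SES."""
    people = sum(stratum[group][0] for stratum in STRATA.values())
    events = sum(stratum[group][1] for stratum in STRATA.values())
    return people, events


crude_rr = relative_risk(pooled("drinkers"), pooled("non_drinkers"))
stratum_rr = {
    name: relative_risk(stratum["drinkers"], stratum["non_drinkers"])
    for name, stratum in STRATA.items()
}

print(f"Crude RR (ignoring SES): {crude_rr:.2f}")
for name, rr in stratum_rr.items():
    print(f"RR within {name}: {rr:.2f}")
```

In this made-up data the crude relative risk is well below 1 (drinkers look protected), yet within each SES stratum it is 1.00. The apparent benefit comes entirely from drinkers being concentrated in the lower-risk, high-SES group.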

Multiple Choice

1. B — Prevalence ≈ incidence × duration: 50 × 10 = 500 per 100,000. A chronic disease accumulates cases over its duration, so prevalence is much higher than incidence. Option A would only be correct if duration were 1 year (very short, acute). Option C is wrong — chronic diseases have high prevalence because people live with them long-term. Option D is wrong — while measured differently, the relationship between incidence and prevalence is a key concept.
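The arithmetic in this answer can be checked with a one-line function. This is only a sketch of the steady-state approximation, which assumes a roughly stable population and a disease that affects a small share of it; the function name is illustrative.

```python
def approx_prevalence(incidence_per_100k, mean_duration_years):
    """Steady-state approximation: prevalence ≈ incidence × mean duration."""
    return incidence_per_100k * mean_duration_years


chronic = approx_prevalence(50, 10)  # the question's figures
acute = approx_prevalence(50, 1)     # a one-year illness, as in Option A's scenario

print(f"Chronic (10-year duration): {chronic} per 100,000")
print(f"Acute (1-year duration): {acute} per 100,000")
```

The second call shows why Option A would only hold for a short illness: prevalence matches incidence only when the average duration is about one year.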

2. C — Cohort study is most appropriate: it establishes temporal sequence (pollution exposure measured before COPD develops); can measure cumulative exposure; calculates incidence rates in exposed vs unexposed groups. Option A (cross-sectional) cannot establish which came first. Option B (case-control) is less able to establish temporal sequence and is better for rare diseases. Option D (RCT) is unethical for harmful exposures.

3. D — Ecological correlation (country-level association) cannot establish individual-level causation. Wealthier countries have both more mobile phones and better cancer detection; many confounders exist. Option A is wrong — correlation strength alone does not establish causation. Option B is wrong — observational studies can provide useful information, just not proof of causation. Option C is wrong — replication of an ecological association still does not establish individual-level causation without controlling for confounders.

4. A — Crude rate 220, age-standardised 145: the crude rate is higher than the age-standardised rate, meaning that after accounting for age structure, the underlying rate is lower. This indicates Country X has an older-than-average population: the reference population used for age-standardisation is younger on average, so applying the standard removes the contribution of the large elderly population. Option B confuses standardisation with accuracy of recording. Option C reverses the logic. Option D is wrong: age-standardisation adjusts for age distribution; it does not remove cases from older groups.

5. C — The cohort study provides strong observational evidence: temporal sequence established, large sample, dose-response likely present, consistent with biological plausibility (carcinogens in processed meat). However, 'proves' overstates what any observational study can establish — residual confounding is always possible. The finding is consistent with causation and would be appropriately described as 'provides strong evidence for' or 'is consistent with a causal role for processed meat in bowel cancer.' Option A incorrectly uses 'proof.' Option B is wrong — cohort studies are valuable evidence. Option D is wrong — a 40% relative risk increase is clinically and epidemiologically significant.

Short Answer Model Answers

Q6 (4 marks): Incidence is the rate of new cases of a disease arising in a defined population over a specified time period, calculated as: (number of new cases ÷ population at risk) × 100,000. It measures how fast disease is developing. Prevalence is the proportion of a population with a disease at a given time, calculated as: (number of existing cases ÷ total population) × 100. It measures how much disease exists in the community [2 marks]. Why effective treatment raises prevalence despite falling incidence: prevalence ≈ incidence × average disease duration. Effective treatment extends survival, so patients live longer with the disease and remain in the 'existing cases' pool for longer. Even if incidence (new cases per year) falls because of prevention programs, the total pool of people living with the disease grows whenever duration rises proportionally more than incidence falls. Prevalence thus rises [1 mark]. Example: HIV in high-income countries. Antiretroviral therapy introduced in the mid-1990s dramatically extended life expectancy for HIV-positive people. Annual new infections (incidence) fell due to prevention programs. But people lived with HIV for decades rather than dying within years, so the total number living with HIV (prevalence) rose substantially through the 2000s despite falling incidence. Similar patterns are seen with Type 2 diabetes (better treatment → longer survival with disease → rising prevalence despite stable incidence) [1 mark — 4 marks total].
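The "pool" reasoning in this answer can be sketched as a simple year-by-year simulation. The model and all figures are assumptions made for illustration: each year a fixed number of new cases per 100,000 joins the pool, and a fraction 1 / mean duration of existing cases leaves it (through death or cure).

```python
def simulate_prevalence(start_pool, years, incidence, mean_duration):
    """Return existing cases per 100,000 at the start and after each year."""
    pool = start_pool
    history = [pool]
    for _ in range(years):
        # New cases enter; a fraction 1 / mean_duration of existing cases leaves.
        pool = pool + incidence - pool / mean_duration
        history.append(pool)
    return history


# Hypothetical figures, all per 100,000 people.
# Before treatment: 50 new cases a year, 10-year average duration (stable at 500).
before = simulate_prevalence(start_pool=500, years=1, incidence=50, mean_duration=10)

# After treatment and prevention: incidence falls to 40, duration rises to 25 years.
after = simulate_prevalence(start_pool=500, years=30, incidence=40, mean_duration=25)

print(f"Before: {before[0]:.0f} -> {before[-1]:.0f} per 100,000")
print(f"After 30 years: {after[0]:.0f} -> {after[-1]:.0f} per 100,000")
```

Under these assumptions the pool keeps growing towards 40 × 25 = 1,000 per 100,000, even though fewer new cases arise each year than before.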

Q7 (5 marks): Cohort: recruit a large sample (e.g. 50,000+) of adults aged 35–65 who do NOT currently have Type 2 diabetes and who are willing to be followed for 15–20 years. Diversity in physical activity levels is important for adequate comparison groups [1 mark]. Exposure variable: measure physical activity level at baseline and at regular intervals (e.g. every 2 years) — using standardised questionnaires or accelerometers — recording type, duration, intensity, and frequency of exercise per week. Classify participants into physical activity categories (e.g. sedentary, moderate, high) [1 mark]. Outcome variable: development of Type 2 diabetes, defined as fasting blood glucose ≥7.0 mmol/L, HbA1c ≥48 mmol/mol, or physician diagnosis. Measured at each follow-up visit [1 mark]. Evidence of association: calculate and compare annual T2D incidence rates in high-activity vs low-activity groups. Calculate relative risk (incidence in high-activity ÷ incidence in low-activity) — a relative risk significantly less than 1.0 would support a protective effect of physical activity. Test for dose-response: does increasing activity further decrease T2D risk? [1 mark]. Confounding variable: dietary habits (healthier eaters tend to exercise more AND have lower T2D risk through independent mechanisms). Control: collect detailed dietary data at baseline and follow-up; statistically adjust for dietary quality index in the analysis. Alternatively, restrict the analysis to participants with similar dietary patterns [1 mark — 5 marks total].
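The "evidence of association" step can be sketched numerically. The counts below are hypothetical (no real study is being quoted), and the three activity categories follow the example classification in the answer.

```python
# Hypothetical follow-up results; none of these counts come from a real study.
# Each entry: (new T2D cases, person-years of follow-up).
GROUPS = {
    "sedentary": (300, 50_000),
    "moderate": (180, 45_000),
    "high": (120, 40_000),
}


def incidence_per_1000(cases, person_years):
    """New cases per 1,000 person-years of follow-up."""
    return cases / person_years * 1000


rates = {name: incidence_per_1000(*counts) for name, counts in GROUPS.items()}

# Relative risk of each group compared with the sedentary (reference) group.
relative_risks = {name: rate / rates["sedentary"] for name, rate in rates.items()}

# A dose-response pattern means incidence falls at each step up in activity.
dose_response = rates["sedentary"] > rates["moderate"] > rates["high"]

for name in GROUPS:
    print(f"{name}: {rates[name]:.1f} per 1,000 person-years, RR {relative_risks[name]:.2f}")
print(f"Dose-response pattern present: {dose_response}")
```

A relative risk below 1.0 for the more active groups, falling further as activity rises, is the pattern that would support a protective effect. A real analysis would also need confidence intervals and adjustment for confounders such as diet.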

Q8 (5 marks): Why RCTs are the gold standard: randomisation distributes known and unknown confounders equally between groups by chance, and it is the only method that controls for variables not yet identified. Double-blinding prevents measurement bias. RCTs establish causation because the only systematic difference between groups is the intervention [1 mark]. Why RCTs cannot be required for environmental exposures: it is unethical to randomly assign people to harmful exposures. Decades of smoking, asbestos inhalation, high UV exposure, or toxic chemical exposure cannot be deliberately assigned to participants, and an ethical review board would never approve such trials. Requiring RCT evidence before accepting that a harmful exposure causes disease would mean we could never accept causation for any environmental carcinogen or toxin [2 marks]. What observational evidence can establish: the Bradford Hill criteria provide a framework for inferring causation from observational evidence: strength of association, consistency across studies and populations, temporal sequence (exposure precedes disease), dose-response relationship, biological plausibility, and specificity. When multiple criteria are satisfied simultaneously, causal inference is reasonable. The smoking-lung cancer causal link was established without any RCT, through observational studies (primarily cohort studies such as Doll and Hill's) combined with molecular mechanistic evidence; an RCT was neither possible nor necessary [1 mark]. Conclusion: the claim is partially valid in the sense that RCTs are the ideal study design when ethically possible: for drug trials, vaccination programs, dietary interventions, and preventive strategies, RCT evidence should be sought. But requiring RCT evidence for harmful environmental exposures is an inappropriate standard that would paralyse public health action. A more appropriate standard is convergent evidence from multiple study types (observational studies establishing association and temporal sequence, mechanistic evidence establishing biological plausibility, and dose-response data) that together satisfies the Bradford Hill criteria. The field of epidemiology has developed precisely because RCTs are often impossible, and its methods are capable of establishing causation to the standard required for public health and clinical decision-making [1 mark — 5 marks total].
