Why ANOVA: The Multiple Comparisons Problem
When comparing three or more group means, the naive approach of running a t-test for every pair (Group A vs. B, A vs. C, B vs. C) inflates the Type I error rate. With 3 groups and α = 0.05 per comparison, the family-wise error rate (FWER), the probability of at least one false positive across all comparisons, rises to 1 − (1 − 0.05)³ ≈ 0.143. With 10 groups there are 45 pairwise comparisons and the FWER exceeds 0.90: false positives are virtually guaranteed. Analysis of Variance (ANOVA) tests all group means simultaneously in a single test, keeping the overall Type I error rate at α.

The logic of ANOVA: if all group means are equal (H₀), the variance among the group means should be no larger than what we expect from sampling variation alone. The F-statistic is the ratio of between-group variance to within-group (error) variance: F = MS_between / MS_within. A large F indicates the between-group variance is much larger than chance would produce, suggesting at least one group mean differs.

ANOVA's H₀ is omnibus: μ₁ = μ₂ = μ₃ (all means are equal). The alternative (H₁) states only that at least one mean differs; it does not specify which. A significant F-test therefore prompts post hoc tests (e.g., Tukey's HSD) to identify the specific pairs that differ.

ANOVA assumes: (1) independence of observations, (2) normality within each group (or large n per group), and (3) homogeneity of variance across groups. Check homogeneity with Levene's test; if it is violated, use Welch's ANOVA or the Brown-Forsythe test.
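The FWER arithmetic above can be checked with a few lines of Python. The formula 1 − (1 − α)^m treats the m comparisons as independent, so for correlated pairwise t-tests it is an approximation; the helper name `fwer` is illustrative, not a library function.

```python
from math import comb

def fwer(m, alpha=0.05):
    """Family-wise error rate for m comparisons at per-test alpha,
    assuming the comparisons are independent."""
    return 1 - (1 - alpha) ** m

print(round(fwer(3), 3))            # 3 groups, 3 pairwise comparisons -> 0.143
print(round(fwer(comb(10, 2)), 3))  # 10 groups, C(10,2) = 45 comparisons -> 0.901
```

Note how quickly the error compounds: each added comparison multiplies the chance of getting through with no false positive by another factor of 0.95.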
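The F-ratio can be built by hand for a small example. This pure-Python sketch (made-up data, illustrative function name) computes MS_between and MS_within exactly as defined above.

```python
def one_way_f(groups):
    """One-way ANOVA F-statistic: F = MS_between / MS_within."""
    all_vals = [x for g in groups for x in g]
    grand_mean = sum(all_vals) / len(all_vals)
    group_means = [sum(g) / len(g) for g in groups]

    # Between-group sum of squares: how far each group mean sits
    # from the grand mean, weighted by group size.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Within-group sum of squares: spread of observations
    # around their own group mean (error variance).
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)

    ms_between = ss_between / (len(groups) - 1)            # df_between = k - 1
    ms_within = ss_within / (len(all_vals) - len(groups))  # df_within = N - k
    return ms_between / ms_within

# Two similar groups and one clearly shifted group:
print(one_way_f([[1, 2, 3], [2, 3, 4], [6, 7, 8]]))  # 21.0
```

In practice one would use `scipy.stats.f_oneway`, which returns the same F plus a p-value from the F distribution with k − 1 and N − k degrees of freedom, and `scipy.stats.levene` to check the equal-variance assumption.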