
Confidence Intervals and Regression Diagnostics

Regression Assumptions and Residual Plots

Linear regression's validity rests on four key assumptions, which are checked mainly through the residuals: the differences between observed values and model-predicted values, which serve as estimates of the unobservable errors.

(1) Linearity: the relationship between the predictors and the outcome is linear. Check it with a residuals-vs-fitted plot: residuals should scatter randomly around the horizontal zero line with no curved pattern. A U-shaped or inverted-U pattern indicates a non-linear relationship; add a quadratic term or use a non-linear model.

(2) Independence of errors: residuals are not correlated with each other. This matters most in time series data, where autocorrelation is common. Test it with the Durbin-Watson statistic, which ranges from 0 to 4: values near 2 indicate no autocorrelation, and as a rule of thumb, values below 1.5 (positive autocorrelation) or above 2.5 (negative autocorrelation) indicate concern.

(3) Homoscedasticity (constant variance): residuals have a consistent spread across all fitted values. Check it with the scale-location plot, which plots the square root of the absolute standardized residuals against the fitted values: a roughly horizontal trend with evenly spread points indicates homoscedasticity. A funnel shape, where the spread grows with the fitted value, indicates heteroscedasticity. Remedies include weighted least squares, robust standard errors, or a log transformation of the outcome.

(4) Normality of residuals: residuals are approximately normally distributed. Check it with a Q-Q plot of the residuals, not of the raw outcome. With large n, inference on the coefficients is robust to modest violations, because the sampling distributions of the coefficient estimates are approximately normal by the central limit theorem.
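The Durbin-Watson statistic used for the independence check is simple to compute by hand: the sum of squared differences between successive residuals divided by the residual sum of squares. The sketch below is a minimal pure-Python illustration, assuming the residuals are already listed in time order; the function names `durbin_watson` and `dw_concern` are hypothetical, and the 1.5/2.5 thresholds are the rule of thumb quoted above. Libraries such as statsmodels ship an equivalent (`statsmodels.stats.stattools.durbin_watson`).

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for residuals listed in time order.

    DW = sum_{t=2..n} (e_t - e_{t-1})^2 / sum_{t=1..n} e_t^2, which lies in [0, 4].
    """
    if len(residuals) < 2:
        raise ValueError("need at least two residuals")
    # Squared differences between each residual and the one before it.
    num = sum((residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals)))
    # Residual sum of squares.
    den = sum(e ** 2 for e in residuals)
    if den == 0:
        raise ValueError("residuals are all zero; DW is undefined")
    return num / den


def dw_concern(dw, low=1.5, high=2.5):
    """Rule-of-thumb flag: True when DW falls outside [low, high]."""
    return dw < low or dw > high


if __name__ == "__main__":
    slow = [1, 1, -1, -1]    # residuals drift slowly: positive autocorrelation
    flip = [1, -1, 1, -1]    # sign flips every step: negative autocorrelation
    print(durbin_watson(slow), dw_concern(durbin_watson(slow)))  # 1.0 True
    print(durbin_watson(flip), dw_concern(durbin_watson(flip)))  # 3.0 True
```

Slowly drifting residuals make successive differences small, pulling DW toward 0; alternating residuals make them large, pushing DW toward 4.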
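The scale-location plot for the homoscedasticity check can be built from a few quantities. The sketch below is a minimal pure-Python illustration, assuming a single predictor fitted by ordinary least squares and using one common definition of the standardized residual, e_i / (s * sqrt(1 - h_ii)), where s is the residual standard error and h_ii is the point's leverage. The names `scale_location_points` and `spread_trend` are hypothetical, and the correlation returned by `spread_trend` is only a rough numeric stand-in for eyeballing the plot, not a formal hypothesis test.

```python
import math


def scale_location_points(x, y):
    """Fit y = b0 + b1*x by OLS and return (fitted, sqrt(|standardized residual|)) pairs."""
    n = len(x)
    if n < 3 or n != len(y):
        raise ValueError("need at least three paired observations")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    fitted = [intercept + slope * xi for xi in x]
    resid = [yi - fi for yi, fi in zip(y, fitted)]
    # Residual standard error on n - 2 degrees of freedom (intercept + slope).
    s = math.sqrt(sum(e ** 2 for e in resid) / (n - 2))
    if s == 0:
        raise ValueError("perfect fit: standardized residuals are undefined")
    points = []
    for xi, fi, ei in zip(x, fitted, resid):
        h = 1 / n + (xi - mean_x) ** 2 / sxx   # leverage (hat value) of point i
        r = ei / (s * math.sqrt(1 - h))         # internally standardized residual
        points.append((fi, math.sqrt(abs(r))))
    return points


def spread_trend(points):
    """Pearson correlation between fitted values and sqrt(|standardized residual|).

    Heuristic only: a clearly positive value hints at a widening funnel.
    """
    fx = [p[0] for p in points]
    sy = [p[1] for p in points]
    mx = sum(fx) / len(fx)
    my = sum(sy) / len(sy)
    cov = sum((a - mx) * (b - my) for a, b in zip(fx, sy))
    var_f = sum((a - mx) ** 2 for a in fx)
    var_s = sum((b - my) ** 2 for b in sy)
    return cov / math.sqrt(var_f * var_s)


if __name__ == "__main__":
    # Synthetic funnel: the noise magnitude grows in proportion to x.
    x = list(range(1, 11))
    y = [2 * xi + (-1) ** xi * 0.5 * xi for xi in x]
    pts = scale_location_points(x, y)
    for fitted_value, root_abs_r in pts:
        print(f"{fitted_value:6.2f}  {root_abs_r:5.3f}")
    print("trend:", round(spread_trend(pts), 2))  # clearly positive for this funnel
```

Taking the square root of the absolute value folds positive and negative residuals together and compresses large values, so a rising trend in the plotted points reflects growing spread rather than the sign of individual residuals.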
