Regression Assumptions and Residual Plots
Linear regression's validity rests on four key assumptions, each of which can be checked using the residuals (errors): the differences between observed values and model-predicted values.

(1) Linearity: the relationship between the predictors and the outcome is linear. Check it with a residuals-vs-fitted plot: residuals should scatter randomly around the horizontal zero line with no curved pattern. A U-shaped or inverted-U pattern indicates a non-linear relationship; add a quadratic term or use a non-linear model.

(2) Independence of errors: residuals are not correlated with each other. This is especially important in time series data, where consecutive errors are often correlated (autocorrelation). Test with the Durbin-Watson statistic, which ranges from 0 to 4: values near 2 indicate no first-order autocorrelation, and as a rule of thumb values below 1.5 or above 2.5 are cause for concern.

(3) Homoscedasticity (constant variance): residuals have consistent spread across all fitted values. Check it with the scale-location plot (square root of the absolute standardized residuals vs. fitted values): a roughly horizontal trend line with evenly spread points indicates homoscedasticity. A funnel shape (for example, spread that increases with the fitted value) indicates heteroscedasticity. Remedies: weighted least squares, robust standard errors, or a log transformation of the outcome.

(4) Normality of residuals: residuals are approximately normally distributed. Check it with a Q-Q plot of the residuals (not of the raw outcome). Inference is robust to modest violations when n is large.
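
The Durbin-Watson check in (2) and the vertical axis of the scale-location plot in (3) can both be computed directly from the residuals. The following is a minimal sketch, assuming a single predictor and using only the Python standard library. The function names (fit_line, durbin_watson, scale_location) and the toy data are hypothetical and chosen for illustration; they are not taken from any particular library. The toy errors alternate in sign on purpose, so the residuals are negatively autocorrelated.

```python
import math


def fit_line(x, y):
    """Ordinary least squares with one predictor; returns (intercept, slope)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def residuals(x, y, intercept, slope):
    """Observed value minus model-predicted value for each point."""
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


def durbin_watson(resid):
    """Sum of squared successive differences divided by sum of squares.

    Values near 2 suggest no first-order autocorrelation; values toward 0
    suggest positive and values toward 4 suggest negative autocorrelation.
    """
    num = sum((resid[i] - resid[i - 1]) ** 2 for i in range(1, len(resid)))
    den = sum(e ** 2 for e in resid)
    return num / den


def scale_location(x, resid):
    """sqrt(|standardized residual|) per point: the y-axis of a scale-location plot."""
    n = len(x)
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    s2 = sum(e ** 2 for e in resid) / (n - 2)  # residual variance estimate
    values = []
    for xi, e in zip(x, resid):
        leverage = 1 / n + (xi - x_bar) ** 2 / sxx  # hat value, simple regression
        standardized = e / math.sqrt(s2 * (1 - leverage))
        values.append(math.sqrt(abs(standardized)))
    return values


# Toy data (an assumption for illustration): y = 2 + 3x plus errors that
# alternate between +0.5 and -0.5.
x = [float(i) for i in range(10)]
y = [2 + 3 * xi + (0.5 if i % 2 == 0 else -0.5) for i, xi in enumerate(x)]

intercept, slope = fit_line(x, y)
resid = residuals(x, y, intercept, slope)
dw = durbin_watson(resid)
sl = scale_location(x, resid)

print(f"intercept={intercept:.3f} slope={slope:.3f}")
print(f"Durbin-Watson={dw:.2f}")  # well above 2.5 here: negative autocorrelation
```

Plotting the fitted values against resid gives the residuals-vs-fitted plot from (1), and plotting them against sl gives the scale-location plot from (3). With several predictors, the leverage term would instead come from the diagonal of the hat matrix, so a statistics library is the more practical choice in that case.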