Simple Linear Regression: Fitting the Line
Simple linear regression models the relationship between a predictor variable (X) and a response variable (Y) with a straight line: Ŷ = β₀ + β₁X, where β₀ is the y-intercept and β₁ is the slope. The line is fit by the method of least squares: choose β₀ and β₁ to minimize the residual sum of squares, RSS = Σ(yᵢ − ŷᵢ)².

The slope is β₁ = Σ[(xᵢ − x̄)(yᵢ − ȳ)] / Σ(xᵢ − x̄)², which can equivalently be written β₁ = r × (sᵧ/sₓ), where r is the sample correlation and sᵧ, sₓ are the sample standard deviations. The intercept is β₀ = ȳ − β₁x̄, which guarantees the fitted line passes through the point of means (x̄, ȳ).

Interpreting the coefficients: β₁ is the average change in Y associated with a one-unit increase in X. (The caveat "holding all other factors constant" applies to multiple regression; with a single predictor there are no other variables in the model to hold fixed.) Example: if we regress salary (Y, in thousands) on years of experience (X) and obtain Ŷ = 42 + 2.3X, then a new hire with 0 years of experience has an expected salary of $42k, and each additional year of experience is associated with $2,300 more salary on average.

Residuals (eᵢ = yᵢ − ŷᵢ) represent the unexplained variation: the vertical distance between each data point and the regression line. A good model has residuals that are small, randomly scattered (no pattern), and approximately normally distributed.
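The formulas above can be sketched in a few lines of NumPy. The experience/salary numbers below are made up for illustration (they are not the data behind the Ŷ = 42 + 2.3X example), but the computation follows the slope, intercept, and residual definitions exactly:

```python
import numpy as np

# Hypothetical data: years of experience (x) and salary in thousands (y).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([44.0, 47.5, 48.0, 51.5, 53.0, 56.5])

x_bar, y_bar = x.mean(), y.mean()

# Slope: beta1 = sum((x_i - x_bar)(y_i - y_bar)) / sum((x_i - x_bar)^2)
beta1 = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)

# Intercept: beta0 = y_bar - beta1 * x_bar
beta0 = y_bar - beta1 * x_bar

# Equivalent slope form: beta1 = r * (s_y / s_x), using sample std devs.
r = np.corrcoef(x, y)[0, 1]
beta1_alt = r * (y.std(ddof=1) / x.std(ddof=1))

# Fitted values, residuals, and the residual sum of squares.
y_hat = beta0 + beta1 * x
residuals = y - y_hat           # e_i = y_i - yhat_i
rss = np.sum(residuals ** 2)    # the quantity least squares minimizes
```

A useful sanity check on any implementation: the two slope formulas must agree, and with an intercept in the model the residuals always sum to (numerically) zero.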