Overview

Linear Regression is a statistical method used to model the relationship between a dependent variable ( y ) and one or more independent variables ( X ). It captures this relationship by fitting a line (or hyperplane) whose coefficients minimize the difference between the observed and predicted values.

Key Concepts

Simple Linear Regression

Models the relationship between two variables by fitting a straight line:

[ y = \beta_0 + \beta_1 x + \epsilon ]

  • ( y ): Dependent variable
  • ( x ): Independent variable
  • ( \beta_0 ): Intercept
  • ( \beta_1 ): Slope
  • ( \epsilon ): Error term
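
A minimal numeric sketch of fitting this model with the closed-form least-squares formulas is shown below; the data values are made up purely for illustration.

import numpy as np

# Illustrative data (made-up values)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.3, 6.2, 8.0, 9.9])

# Closed-form least-squares estimates for the slope and intercept
beta_1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
beta_0 = y.mean() - beta_1 * x.mean()

print(f'Intercept (beta_0): {beta_0:.3f}')
print(f'Slope (beta_1): {beta_1:.3f}')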

Multiple Linear Regression

Extends the simple linear model to include multiple independent variables:

[ y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p + \epsilon ]

  • ( y ): Dependent variable
  • ( x_1, x_2, \ldots, x_p ): Independent variables
  • ( \beta_0, \beta_1, \ldots, \beta_p ): Coefficients
  • ( \epsilon ): Error term

Assumptions of Linear Regression

  1. Linearity: The relationship between the independent variables and the dependent variable is linear.
  2. Independence: Observations are independent of each other.
  3. Homoscedasticity: Constant variance of errors.
  4. Normality: The residuals (errors) are normally distributed.
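
These assumptions are typically checked after fitting, using the residuals. The sketch below (on synthetic data chosen only for illustration) plots residuals against fitted values to eyeball linearity and homoscedasticity, and draws a normal Q-Q plot to check normality.

import numpy as np
import matplotlib.pyplot as plt
from scipy import stats
from sklearn.linear_model import LinearRegression

# Synthetic data for illustration only
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))
y = 2.0 + 1.5 * X[:, 0] + rng.normal(scale=1.0, size=100)

model = LinearRegression().fit(X, y)
fitted = model.predict(X)
residuals = y - fitted

# Residuals vs. fitted values: a flat, even band around zero supports
# linearity and homoscedasticity
plt.scatter(fitted, residuals)
plt.axhline(0, color='red')
plt.xlabel('Fitted values')
plt.ylabel('Residuals')

# Q-Q plot of residuals against the normal distribution to check normality
plt.figure()
stats.probplot(residuals, dist='norm', plot=plt)
plt.show()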

Estimating Coefficients

  • Ordinary Least Squares (OLS): Method to estimate the coefficients by minimizing the sum of squared residuals (errors): [ \hat{\beta} = (X^T X)^{-1} X^T y ]
  • Gradient Descent: Iterative optimization algorithm to minimize the cost function.
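
The sketch below applies both approaches to the same synthetic data (values chosen only for illustration): the normal equation is solved with np.linalg.solve rather than an explicit matrix inverse, and plain batch gradient descent on the mean squared error recovers essentially the same coefficients.

import numpy as np

# Synthetic data: an intercept column plus two features (illustrative only)
rng = np.random.default_rng(42)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0, -3.0]) + rng.normal(scale=0.5, size=n)

# Ordinary Least Squares via the normal equation: (X^T X) beta = X^T y
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)

# Batch gradient descent on the mean squared error
beta_gd = np.zeros(X.shape[1])
learning_rate = 0.1
for _ in range(2000):
    gradient = (2.0 / n) * X.T @ (X @ beta_gd - y)
    beta_gd -= learning_rate * gradient

print('Normal equation: ', beta_ols)
print('Gradient descent:', beta_gd)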

Evaluating the Model

  • R-squared (( R^2 )): Proportion of variance in the dependent variable that is predictable from the independent variables. [ R^2 = 1 - \frac{SS_{res}}{SS_{tot}} ]
  • Adjusted R-squared: Adjusts ( R^2 ) for the number of predictors in the model.
  • Mean Squared Error (MSE): Average of the squares of the residuals.
  • Root Mean Squared Error (RMSE): Square root of MSE.
  • Residual Plots: Graphical analysis to check assumptions.
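
The sketch below computes these metrics by hand on made-up true and predicted values; n is the number of observations and p the number of predictors (both assumed here for illustration).

import numpy as np

# Made-up true and predicted values
y = np.array([3.0, 5.0, 7.0, 9.0, 11.0, 13.0])
y_pred = np.array([2.8, 5.3, 6.9, 9.4, 10.7, 13.1])
n, p = len(y), 2  # p: number of predictors (assumed)

ss_res = np.sum((y - y_pred) ** 2)    # residual sum of squares
ss_tot = np.sum((y - y.mean()) ** 2)  # total sum of squares

r2 = 1 - ss_res / ss_tot
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)
mse = ss_res / n
rmse = np.sqrt(mse)

print(f'R-squared: {r2:.4f}, Adjusted R-squared: {adj_r2:.4f}')
print(f'MSE: {mse:.4f}, RMSE: {rmse:.4f}')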

Common Techniques and Extensions

  • Polynomial Regression: Extends linear regression by adding polynomial terms.
  • Ridge Regression: Adds L2 regularization to the cost function to prevent overfitting.
  • Lasso Regression: Adds L1 regularization to the cost function, promoting sparsity.
  • Elastic Net: Combines L1 and L2 regularization.
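
A brief sketch of how these variants look with scikit-learn follows; the synthetic data and the alpha values are arbitrary and would normally be tuned, for example by cross-validation.

import numpy as np
from sklearn.linear_model import Ridge, Lasso, ElasticNet
from sklearn.preprocessing import PolynomialFeatures

# Synthetic data for illustration only
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.5, 0.0, -2.0]) + rng.normal(scale=0.5, size=100)

# Polynomial regression: expand the features, then fit a linear model on them
X_poly = PolynomialFeatures(degree=2, include_bias=False).fit_transform(X)

# Regularized variants; alpha controls the regularization strength
ridge = Ridge(alpha=1.0).fit(X_poly, y)
lasso = Lasso(alpha=0.1).fit(X_poly, y)
enet = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X_poly, y)

print('Ridge coefficients:      ', ridge.coef_.round(2))
print('Lasso coefficients:      ', lasso.coef_.round(2))
print('Elastic Net coefficients:', enet.coef_.round(2))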

Python Implementation (Example)

import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
 
# Load data
data = pd.read_csv('data.csv')
X = data[['feature1', 'feature2']]
y = data['target']
 
# Initialize and fit model
model = LinearRegression()
model.fit(X, y)
 
# Predict and evaluate
y_pred = model.predict(X)
mse = mean_squared_error(y, y_pred)
r2 = r2_score(y, y_pred)
 
print(f'MSE: {mse}')
print(f'R-squared: {r2}')
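
Note that the example above evaluates the model on the same data it was trained on, which tends to give an optimistic picture. A common refinement, sketched below reusing X, y and the imports from the example above (and assuming the same data.csv and column names), is to hold out a test set.

from sklearn.model_selection import train_test_split

# Hold out 20% of the data for evaluation
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

model = LinearRegression()
model.fit(X_train, y_train)

# Evaluate on unseen data
y_pred = model.predict(X_test)
print(f'Test MSE: {mean_squared_error(y_test, y_pred)}')
print(f'Test R-squared: {r2_score(y_test, y_pred)}')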

Summing up

  • We can learn a linear regression model by minimizing a loss function, for example the squared loss.
  • For the squared loss, this minimization is an optimization problem whose solution reduces to solving a set of linear equations (the normal equations).
  • Alternatively, gradient descent is an iterative optimization method that minimizes the loss function directly.