Overview

Linear Regression is a statistical method used to model the relationship between a dependent variable ( y ) and one or more independent variables ( X ). It captures this relationship by fitting a line (or hyperplane) whose coefficients minimize the difference between the observed and predicted values.

Key Concepts

Simple Linear Regression

Models the relationship between two variables by fitting a straight line:

[ y = \beta_0 + \beta_1 x + \epsilon ]

  • ( y ): Dependent variable
  • ( x ): Independent variable
  • ( \beta_0 ): Intercept
  • ( \beta_1 ): Slope
  • ( \epsilon ): Error term
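
A minimal numeric sketch of fitting this model with the closed-form least-squares formulas is shown below; the data values are made up purely for illustration.

import numpy as np

# Illustrative data (made-up values)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.3, 6.2, 8.0, 9.9])

# Closed-form least-squares estimates for the slope and intercept
beta_1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
beta_0 = y.mean() - beta_1 * x.mean()

print(f'Intercept (beta_0): {beta_0:.3f}')
print(f'Slope (beta_1): {beta_1:.3f}')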

Multiple Linear Regression

Extends the simple linear model to include multiple independent variables:

[ y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p + \epsilon ]

  • ( y ): Dependent variable
  • ( x_1, x_2, \ldots, x_p ): Independent variables
  • ( \beta_0, \beta_1, \ldots, \beta_p ): Coefficients
  • ( \epsilon ): Error term

Assumptions of Linear Regression

  1. Linearity: The relationship between the independent variables and the dependent variable is linear.
  2. Independence: Observations are independent of each other.
  3. Homoscedasticity: Constant variance of errors.
  4. Normality: The residuals (errors) are normally distributed.
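
These assumptions are typically checked after fitting, using the residuals. The sketch below (on synthetic data chosen only for illustration) plots residuals against fitted values to eyeball linearity and homoscedasticity, and draws a normal Q-Q plot to check normality.

import numpy as np
import matplotlib.pyplot as plt
from scipy import stats
from sklearn.linear_model import LinearRegression

# Synthetic data for illustration only
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))
y = 2.0 + 1.5 * X[:, 0] + rng.normal(scale=1.0, size=100)

model = LinearRegression().fit(X, y)
fitted = model.predict(X)
residuals = y - fitted

# Residuals vs. fitted values: a flat, even band around zero supports
# linearity and homoscedasticity
plt.scatter(fitted, residuals)
plt.axhline(0, color='red')
plt.xlabel('Fitted values')
plt.ylabel('Residuals')

# Q-Q plot of residuals against the normal distribution to check normality
plt.figure()
stats.probplot(residuals, dist='norm', plot=plt)
plt.show()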

Estimating Coefficients

  • Ordinary Least Squares (OLS): Method to estimate the coefficients by minimizing the sum of squared residuals (errors): [ \hat{\beta} = (X^T X)^{-1} X^T y ]
  • Gradient Descent: Iterative optimization algorithm to minimize the cost function.
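
The sketch below applies both approaches to the same synthetic data (values chosen only for illustration): the normal equation is solved with np.linalg.solve rather than an explicit matrix inverse, and plain batch gradient descent on the mean squared error recovers essentially the same coefficients.

import numpy as np

# Synthetic data: an intercept column plus two features (illustrative only)
rng = np.random.default_rng(42)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0, -3.0]) + rng.normal(scale=0.5, size=n)

# Ordinary Least Squares via the normal equation: (X^T X) beta = X^T y
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)

# Batch gradient descent on the mean squared error
beta_gd = np.zeros(X.shape[1])
learning_rate = 0.1
for _ in range(2000):
    gradient = (2.0 / n) * X.T @ (X @ beta_gd - y)
    beta_gd -= learning_rate * gradient

print('Normal equation: ', beta_ols)
print('Gradient descent:', beta_gd)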

Evaluating the Model

  • R-squared (( R^2 )): Proportion of variance in the dependent variable that is predictable from the independent variables. [ R^2 = 1 - \frac{SS_{res}}{SS_{tot}} ]
  • Adjusted R-squared: Adjusts ( R^2 ) for the number of predictors in the model.
  • Mean Squared Error (MSE): Average of the squares of the residuals.
  • Root Mean Squared Error (RMSE): Square root of MSE.
  • Residual Plots: Graphical analysis to check assumptions.
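
The sketch below computes these metrics by hand on made-up true and predicted values; n is the number of observations and p the number of predictors (both assumed here for illustration).

import numpy as np

# Made-up true and predicted values
y = np.array([3.0, 5.0, 7.0, 9.0, 11.0, 13.0])
y_pred = np.array([2.8, 5.3, 6.9, 9.4, 10.7, 13.1])
n, p = len(y), 2  # p: number of predictors (assumed)

ss_res = np.sum((y - y_pred) ** 2)    # residual sum of squares
ss_tot = np.sum((y - y.mean()) ** 2)  # total sum of squares

r2 = 1 - ss_res / ss_tot
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)
mse = ss_res / n
rmse = np.sqrt(mse)

print(f'R-squared: {r2:.4f}, Adjusted R-squared: {adj_r2:.4f}')
print(f'MSE: {mse:.4f}, RMSE: {rmse:.4f}')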

Common Techniques and Extensions

  • Polynomial Regression: Extends linear regression by adding polynomial terms.
  • Ridge Regression: Adds L2 regularization to the cost function to prevent overfitting.
  • Lasso Regression: Adds L1 regularization to the cost function, promoting sparsity.
  • Elastic Net: Combines L1 and L2 regularization.
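
A brief sketch of how these variants look with scikit-learn follows; the synthetic data and the alpha values are arbitrary and would normally be tuned, for example by cross-validation.

import numpy as np
from sklearn.linear_model import Ridge, Lasso, ElasticNet
from sklearn.preprocessing import PolynomialFeatures

# Synthetic data for illustration only
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.5, 0.0, -2.0]) + rng.normal(scale=0.5, size=100)

# Polynomial regression: expand the features, then fit a linear model on them
X_poly = PolynomialFeatures(degree=2, include_bias=False).fit_transform(X)

# Regularized variants; alpha controls the regularization strength
ridge = Ridge(alpha=1.0).fit(X_poly, y)
lasso = Lasso(alpha=0.1).fit(X_poly, y)
enet = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X_poly, y)

print('Ridge coefficients:      ', ridge.coef_.round(2))
print('Lasso coefficients:      ', lasso.coef_.round(2))
print('Elastic Net coefficients:', enet.coef_.round(2))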

Python Implementation (Example)

import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
 
# Load data
data = pd.read_csv('data.csv')
X = data[['feature1', 'feature2']]
y = data['target']
 
# Initialize and fit model
model = LinearRegression()
model.fit(X, y)
 
# Predict and evaluate
y_pred = model.predict(X)
mse = mean_squared_error(y, y_pred)
r2 = r2_score(y, y_pred)
 
print(f'MSE: {mse}')
print(f'R-squared: {r2}')
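
Note that the example above evaluates the model on the same data it was trained on, which tends to give an optimistic picture. A common refinement, sketched below reusing X, y and the imports from the example above (and assuming the same data.csv and column names), is to hold out a test set.

from sklearn.model_selection import train_test_split

# Hold out 20% of the data for evaluation
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

model = LinearRegression()
model.fit(X_train, y_train)

# Evaluate on unseen data
y_pred = model.predict(X_test)
print(f'Test MSE: {mean_squared_error(y_test, y_pred)}')
print(f'Test R-squared: {r2_score(y_test, y_pred)}')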

Summing up

  • We can learn a linear regression model by minimizing a loss function, for example the squared loss.
  • For the squared loss, this minimization is an optimization problem whose solution reduces to solving a set of linear equations (the normal equations).
  • Alternatively, gradient descent is an iterative optimization method that minimizes the loss function directly.