Overview
Linear Regression is a statistical method used to model the relationship between a dependent variable ( y ) and one or more independent variables ( X ). It aims to find the linear relationship between these variables by fitting a line (or hyperplane) that minimizes the difference between the observed and predicted values.
Key Concepts
Simple Linear Regression
Models the relationship between two variables by fitting a straight line (a small worked sketch follows this list):
[ y = \beta_0 + \beta_1 x + \epsilon ]
- ( y ): Dependent variable
- ( x ): Independent variable
- ( \beta_0 ): Intercept
- ( \beta_1 ): Slope
- ( \epsilon ): Error term
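As a minimal sketch (on synthetic data made up here, not taken from the text), the slope and intercept can be estimated directly from the sample covariance and variance:
import numpy as np
# Hypothetical synthetic data: y = 2 + 3x + noise
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=100)
y = 2.0 + 3.0 * x + rng.normal(0, 1, size=100)
# Closed-form estimates for simple linear regression
beta_1 = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
beta_0 = y.mean() - beta_1 * x.mean()
print(f'Intercept: {beta_0:.2f}, Slope: {beta_1:.2f}')  # should be close to 2 and 3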
Multiple Linear Regression
Extends the simple linear model to include multiple independent variables (a small prediction sketch follows this list):
[ y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \ldots + \beta_p x_p + \epsilon ]
- ( y ): Dependent variable
- ( x_1, x_2, \ldots, x_p ): Independent variables
- ( \beta_0, \beta_1, \ldots, \beta_p ): Coefficients
- ( \epsilon ): Error term
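Written as code, each prediction is a dot product between the coefficient vector and a row of features; the numbers below are hypothetical, purely for illustration:
import numpy as np
# Hypothetical coefficients (beta_0, beta_1, beta_2, beta_3) and two observations with p = 3 features
beta = np.array([1.0, 0.5, -2.0, 3.0])
X = np.array([[1.0, 0.2, 1.5, 0.7],
              [1.0, 1.1, 0.3, 2.4]])   # leading column of ones carries the intercept
y_hat = X @ beta                        # y_hat_i = beta_0 + beta_1 * x_i1 + ... + beta_p * x_ip
print(y_hat)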
Assumptions of Linear Regression
- Linearity: The relationship between the independent variables and the dependent variable is linear.
- Independence: Observations are independent of each other.
- Homoscedasticity: The errors have constant variance across all values of the predictors.
- Normality: The residuals (errors) are normally distributed. (A quick residual-diagnostics sketch follows this list.)
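The last two assumptions are usually checked informally on the residuals of a fitted model; the sketch below uses made-up residuals and hypothetical variable names, with matplotlib and scipy assumed available:
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats
# Made-up fitted values and residuals, standing in for a real fitted model
rng = np.random.default_rng(1)
y_pred = rng.uniform(0, 10, size=200)
residuals = rng.normal(0, 1, size=200)
# Homoscedasticity: residuals vs. fitted values should show no funnel shape
plt.scatter(y_pred, residuals)
plt.axhline(0, color='red')
plt.xlabel('Fitted values')
plt.ylabel('Residuals')
plt.show()
# Normality: Shapiro-Wilk test (a small p-value suggests non-normal residuals)
stat, p_value = stats.shapiro(residuals)
print(f'Shapiro-Wilk p-value: {p_value:.3f}')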
Estimating Coefficients
- Ordinary Least Squares (OLS): Method to estimate the coefficients by minimizing the sum of squared residuals (errors): [ \hat{\beta} = (X^T X)^{-1} X^T y ]
- Gradient Descent: Iterative optimization algorithm that minimizes the cost function directly; both approaches are sketched below.
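Both approaches fit in a few lines of NumPy; this is a minimal sketch on hypothetical synthetic data, with the learning rate and iteration count chosen arbitrarily:
import numpy as np
# Hypothetical synthetic data; the first column of ones carries the intercept
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(200), rng.normal(size=(200, 2))])
true_beta = np.array([1.0, 2.0, -3.0])
y = X @ true_beta + rng.normal(0, 0.5, size=200)
# OLS via the normal equations: solve (X^T X) beta = X^T y
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)
# Gradient descent on the mean squared error cost
beta_gd = np.zeros(X.shape[1])
learning_rate = 0.05
for _ in range(5000):
    gradient = 2 / len(y) * X.T @ (X @ beta_gd - y)
    beta_gd -= learning_rate * gradient
print(beta_ols, beta_gd)   # both should be close to true_beta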
Evaluating the Model
- R-squared (( R^2 )): Proportion of variance in the dependent variable that is predictable from the independent variables. [ R^2 = 1 - \frac{SS_{res}}{SS_{tot}} ]
- Adjusted R-squared: Adjusts ( R^2 ) for the number of predictors in the model.
- Mean Squared Error (MSE): Average of the squares of the residuals.
- Root Mean Squared Error (RMSE): Square root of MSE.
- Residual Plots: Graphical analysis to check the model assumptions. (A short sketch computing these metrics by hand follows this list.)
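To make the definitions concrete, the metrics can be computed by hand from a handful of made-up observed and predicted values:
import numpy as np
# Made-up observed and predicted values
y = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.8, 5.3, 6.6, 9.4])
residuals = y - y_pred
mse = np.mean(residuals ** 2)              # Mean Squared Error
rmse = np.sqrt(mse)                        # Root Mean Squared Error
ss_res = np.sum(residuals ** 2)            # residual sum of squares
ss_tot = np.sum((y - y.mean()) ** 2)       # total sum of squares
r2 = 1 - ss_res / ss_tot                   # R-squared
print(f'MSE: {mse:.3f}, RMSE: {rmse:.3f}, R-squared: {r2:.3f}')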
Common Techniques and Extensions
- Polynomial Regression: Extends linear regression by adding polynomial terms.
- Ridge Regression: Adds L2 regularization to the cost function to prevent overfitting.
- Lasso Regression: Adds L1 regularization to the cost function, promoting sparsity.
- Elastic Net: Combines L1 and L2 regularization. (A scikit-learn sketch of these extensions follows this list.)
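All of these extensions are available in scikit-learn; the snippet below is a sketch on made-up data, with the penalty strengths (alpha, l1_ratio) chosen arbitrarily:
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression, Ridge, Lasso, ElasticNet
# Made-up data: the third feature has no effect on the target
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.5, -2.0, 0.0]) + rng.normal(0, 0.1, size=100)
# Polynomial regression: expand the features, then fit an ordinary linear model on them
X_poly = PolynomialFeatures(degree=2).fit_transform(X)
poly_model = LinearRegression().fit(X_poly, y)
# Regularized variants
ridge = Ridge(alpha=1.0).fit(X, y)                      # L2 penalty shrinks coefficients
lasso = Lasso(alpha=0.1).fit(X, y)                      # L1 penalty can zero coefficients out
enet = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X, y)    # mix of L1 and L2
print(lasso.coef_)   # the irrelevant third coefficient should be shrunk toward zero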
Python Implementation (Example)
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
# Load data
data = pd.read_csv('data.csv')
X = data[['feature1', 'feature2']]
y = data['target']
# Initialize and fit model
model = LinearRegression()
model.fit(X, y)
# Predict and evaluate
y_pred = model.predict(X)
mse = mean_squared_error(y, y_pred)
r2 = r2_score(y, y_pred)
print(f'MSE: {mse}')
print(f'R-squared: {r2}')
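One caveat: the example above evaluates on the same rows it was fitted on, which overstates performance. A common refinement, sketched here under the same assumed data.csv columns, is to hold out a test set:
from sklearn.model_selection import train_test_split
# Continues from the example above (X, y, LinearRegression, r2_score already defined)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model = LinearRegression().fit(X_train, y_train)
print(f'Test R-squared: {r2_score(y_test, model.predict(X_test))}')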
Summing up
- We can learn a linear regression model by minimizing a loss function, for example the squared loss.
- Minimizing the squared loss is an optimization problem; setting its gradient to zero yields a set of linear equations (the normal equations) that can be solved in closed form.
- Alternatively, gradient descent minimizes the loss function iteratively, without solving the normal equations directly.