Alternative title: Benchmarking

Overview by class of model

Classification

  • Accuracy
  • Precision
  • Recall (Sensitivity)
  • F1 Score

Regression

  • Mean Squared Error (MSE)
  • Root Mean Squared Error (RMSE)
  • Mean Absolute Error (MAE)

Clustering

Time-series

  • Mean Absolute Percentage Error (MAPE)
  • Mean Absolute Scaled Error (MASE)
  • Symmetric Mean Absolute Percentage Error (SMAPE)

NLP

  • Perplexity

Generative model

  • Negative log-likelihood

Accuracy

Accuracy measures the proportion of correct predictions made by the model. It is calculated as the sum of true positives (TP) and true negatives (TN) divided by the total number of samples:

Accuracy = (TP + TN) / (TP + TN + FP + FN)

where TP is the number of true positives, TN is the number of true negatives, FP is the number of false positives, and FN is the number of false negatives.
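
As a minimal sketch of the formula above, the following Python function computes accuracy from the four confusion-matrix counts. The function name `accuracy` and the example counts are illustrative assumptions, not part of the original notes.

```python
def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all samples that were classified correctly."""
    total = tp + tn + fp + fn
    if total == 0:
        # With no samples the ratio is undefined, so fail loudly.
        raise ValueError("accuracy is undefined for zero samples")
    return (tp + tn) / total


# Hypothetical counts: 40 TP, 50 TN, 5 FP, 5 FN out of 100 samples.
print(accuracy(tp=40, tn=50, fp=5, fn=5))  # 0.9
```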

Precision

Precision measures the proportion of true positives among all positive predictions made by the model. It is calculated as the number of true positives (TP) divided by the sum of true positives (TP) and false positives (FP):

Precision = TP / (TP + FP)
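
The precision formula can be sketched the same way. The function name `precision` and the counts below are hypothetical, chosen only to illustrate the arithmetic.

```python
def precision(tp: int, fp: int) -> float:
    """Fraction of positive predictions that were actually positive."""
    predicted_positive = tp + fp
    if predicted_positive == 0:
        # The model made no positive predictions, so the ratio is undefined.
        raise ValueError("precision is undefined with no positive predictions")
    return tp / predicted_positive


# Hypothetical counts: 50 positive predictions, 40 of them correct.
print(precision(tp=40, fp=10))  # 0.8
```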

Negatives and positives

                         Actual Negative (0)   Actual Positive (1)
Predicted Negative (0)   True negative         False negative
Predicted Positive (1)   False positive        True positive
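
To make the four cells concrete, here is a small sketch that counts them from paired lists of actual and predicted labels, assuming binary labels encoded as 0 (negative) and 1 (positive). The helper name `confusion_counts` and the sample labels are hypothetical.

```python
from collections import Counter


def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels (0 = negative, 1 = positive)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    # Each key is an (actual, predicted) pair; missing pairs count as 0.
    pairs = Counter(zip(y_true, y_pred))
    return {
        "TP": pairs[(1, 1)],  # actual 1, predicted 1
        "TN": pairs[(0, 0)],  # actual 0, predicted 0
        "FP": pairs[(0, 1)],  # actual 0, predicted 1
        "FN": pairs[(1, 0)],  # actual 1, predicted 0
    }


# Hypothetical labels for eight samples.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]
print(confusion_counts(y_true, y_pred))
# {'TP': 3, 'TN': 3, 'FP': 1, 'FN': 1}
```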

False Negative Rate: FN / (TP + FN) - a woman is pregnant, but the doctor says she isn’t.

False Positive Rate: FP / (FP + TN) - the doctor says a woman is pregnant, but she isn’t.
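
The two rates differ only in their denominator: the false negative rate divides by all actual positives, the false positive rate by all actual negatives. A minimal sketch, with hypothetical function names and invented counts for the pregnancy example:

```python
def false_negative_rate(fn: int, tp: int) -> float:
    """Share of actual positives that were predicted negative."""
    actual_positive = tp + fn
    if actual_positive == 0:
        raise ValueError("FNR is undefined with no actual positives")
    return fn / actual_positive


def false_positive_rate(fp: int, tn: int) -> float:
    """Share of actual negatives that were predicted positive."""
    actual_negative = fp + tn
    if actual_negative == 0:
        raise ValueError("FPR is undefined with no actual negatives")
    return fp / actual_negative


# Invented numbers: 100 pregnant women, 10 of them told they are not;
# 200 non-pregnant women, 10 of them told they are.
print(false_negative_rate(fn=10, tp=90))   # 0.1
print(false_positive_rate(fp=10, tn=190))  # 0.05
```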