Matthews Correlation Coefficient
The Matthews Correlation Coefficient (MCC) measures the quality of binary classifications. It takes into account true and false positives and negatives, and is generally regarded as a balanced measure that can be used even when the classes are of very different sizes.
The MCC is in essence a correlation coefficient between the observed and predicted binary classifications, and it returns a value between -1 and +1. A coefficient of +1 represents a perfect prediction, 0 indicates a prediction no better than random, and -1 indicates total disagreement between prediction and observation (an inverse prediction).
Mathematically, it is defined as:
MCC = (TP × TN − FP × FN) / sqrt((TP + FP)(TP + FN)(TN + FP)(TN + FN))
where:
TP is the number of true positives
TN is the number of true negatives
FP is the number of false positives
FN is the number of false negatives
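As a minimal sketch, the formula can be translated directly into Python. The function name `mcc` and the example counts are illustrative assumptions, not part of any particular library. Returning 0 when the denominator is zero follows a common convention rather than the definition itself.

```python
import math


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Compute the Matthews Correlation Coefficient from confusion-matrix counts.

    Hypothetical helper for illustration; not a library API.
    """
    numerator = tp * tn - fp * fn
    # Square root of the product of the four row/column sums of the confusion matrix
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        # MCC is undefined here; returning 0 is a common convention (assumption).
        return 0.0
    return numerator / denominator


# Illustrative counts only, not real results
print(round(mcc(tp=90, tn=80, fp=20, fn=10), 3))  # 0.704
```

Checking the extremes, `mcc(50, 50, 0, 0)` gives 1.0 and `mcc(0, 0, 50, 50)` gives -1.0, matching the range described earlier.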
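The denominator in the formula is a normalization factor that ensures the MCC always falls between -1 and +1. If any of the four sums in the denominator is zero (for example, when the classifier never predicts the positive class), the MCC is undefined; it is conventionally set to 0 in that case.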
The balance noted above matters most on imbalanced datasets. If positive instances are rare, metrics like accuracy can be misleadingly high: a classifier that labels every instance as negative scores 95% accuracy on a dataset where only 5% of instances are positive, yet its MCC is 0 (the denominator vanishes, and MCC is conventionally set to 0 in that case). MCC can therefore be a more reliable indicator of performance in such cases.
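The accuracy pitfall can be illustrated with a small, made-up example of 100 instances, only 5 of which are positive. The helper names below are hypothetical, and reporting an undefined MCC as 0 is an assumed convention.

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for labels encoded as 1 (positive) and 0 (negative)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy_and_mcc(y_true, y_pred):
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # An undefined MCC (zero denominator) is reported as 0 by convention
    mcc = (tp * tn - fp * fn) / denominator if denominator else 0.0
    return accuracy, mcc


# Hypothetical imbalanced data: 5 positives followed by 95 negatives
y_true = [1] * 5 + [0] * 95

# Always predicts negative
always_negative = [0] * 100
# Finds 1 of the 5 positives and raises 4 false alarms on negatives
weak_model = [1, 0, 0, 0, 0] + [1] * 4 + [0] * 91

for name, y_pred in [("always negative", always_negative), ("weak model", weak_model)]:
    acc, score = accuracy_and_mcc(y_true, y_pred)
    print(f"{name}: accuracy={acc:.2f}, MCC={score:.3f}")
# always negative: accuracy=0.95, MCC=0.000
# weak model: accuracy=0.92, MCC=0.158
```

Both models look accurate, yet MCC shows that neither does much better than chance on the rare positive class.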
MCC is also often considered more informative than the F1 score, because it takes into account all four values in the confusion matrix (TP, TN, FP, FN), whereas the F1 score only uses three (TP, FP, FN) and ignores true negatives entirely.
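To see what ignoring true negatives means in practice, the sketch below holds TP, FP, and FN fixed and varies only TN. The counts are invented for illustration, and `f1` and `mcc` are hypothetical helper names. F1 is computed with the standard formula 2TP / (2TP + FP + FN).

```python
import math


def f1(tp: int, fp: int, fn: int) -> float:
    # F1 = 2TP / (2TP + FP + FN); true negatives do not appear
    return 2 * tp / (2 * tp + fp + fn)


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    # Same MCC formula; an undefined result is reported as 0 (assumed convention)
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denominator if denominator else 0.0


# TP=40, FP=10, FN=10 held fixed; only TN changes
for tn in (10, 1000):
    print(f"TN={tn}: F1={f1(40, 10, 10):.3f}, MCC={mcc(40, tn, 10, 10):.3f}")
# TN=10: F1=0.800, MCC=0.300
# TN=1000: F1=0.800, MCC=0.790
```

F1 stays at 0.800 in both cases, while MCC moves from 0.300 to 0.790 because it also reflects how well the negative class is handled.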
Updated 5 months ago