Log Loss, also known as Logistic Loss or Cross-Entropy Loss, is a common loss function used in machine learning algorithms that deal with binary or multi-class classification problems. It quantifies the accuracy of a classifier by penalising false classifications.

In essence, Log Loss measures the uncertainty of the probabilities of your model in terms of how far they are from the true values. A perfect model would have a log loss of 0. Log Loss is not symmetric, and its value increases significantly as the predicted probability diverges from the actual label.

To better understand Log Loss, let's first define it mathematically. For binary classification, Log Loss is defined as:

Log Loss = - (1/n) Σ [y_i . log(p_i) + (1-y_i) . log(1-p_i)]


n is the number of observations
y_i is the binary target (true label)
p_i is the predicted probability of the observation being a positive class
In simple terms, this formula can be described as follows:

For each instance, it considers the logarithm of the predicted probability for the actual class (either positive or negative). So, if the instance is positive, it takes into account log(p_i), and if the instance is negative, it takes log(1 - p_i).
These logarithmic losses for each instance are then averaged across all instances to calculate the total log loss.
When dealing with multiclass classification problems, we use a generalisation of the Log Loss formula:

Log Loss = - (1/n) Σ Σ y_ij.log(p_ij)


n is the number of observations
y_ij is a binary indicator of whether class j is the correct classification for observation i
p_ij is the predicted probability that observation i is of class j
Key things to note about Log Loss:

Log Loss penalises both types of errors, but especially those predictions that are confident and wrong.
Lower Log Loss is better, with 0 representing a perfect Log Loss. However, achieving a Log Loss of 0 means perfect predictions, which is highly unlikely in real-world scenarios.
Log Loss heavily penalises classifiers that are confident about an incorrect classification. For example, if for a particular observation, the actual label is 1 but the model predicts the probability of the label being 1 as close to 0, then the log loss for that observation will be a very large positive number.


Consider a binary classification problem where we have 3 observations:

ObservationActual ClassPredicted Probability of Class 1

Let's compute the Log Loss for each observation:

For observation 1:

The actual class (y_i) is 1, and the predicted probability (p_i) is 0.9.
So, the log loss for this observation is: - [1 log(0.9) + (1 - 1) log(1 - 0.9)] = -log(0.9) = 0.105

For observation 2:

The actual class (y_i) is 0, and the predicted probability (p_i) is 0.3.
So, the log loss for this observation is: - [0 log(0.3) + (1 - 0) log(1 - 0.3)] = -log(0.7) = 0.357

For observation 3:

The actual class (y_i) is 1, and the predicted probability (p_i) is 0.2.
So, the log loss for this observation is: - [1 log(0.2) + (1 - 1) log(1 - 0.2)] = -log(0.2) = 1.609

Now, to compute the total log loss, we average these three individual log losses:

Total Log Loss = (0.105 + 0.357 + 1.609) / 3 = 0.690

Note that log loss values are not bounded, and can range from 0 to infinity. A perfect model would have a log loss of 0. Here, our model's log loss is 0.690, indicating room for improvement, especially in its prediction for the third observation, where it predicted a low probability for the actual positive class.