How Gauss would compute a Confusion matrix for their classification model

May 19, 2022 · 2 min read

xkcd confusion matrix comic — Source: xkcd.com

A confusion matrix is a practical and conceptually simple tool for evaluating a classification model, so it deserves an equally simple way to compute it. Without the magic of ~~`from sklearn.metrics import confusion_matrix`~~, Gauss would have done it with simple linear algebra operations:

A confusion matrix is the matrix product of the true and predicted labels, both encoded as one-hot vectors.

If we have the true labels of 4 observations in the vector $𝐲 = [1, 0, 2, 1]$ and 3 different classes (0, 1 and 2), their one-hot encoding is:

$$𝐓 = \begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{bmatrix} \in \{0,1\}^{4 \times 3}$$

Suppose a classification model gives us the predicted label for each observation in the vector $\hat{𝐲} = [2, 0, 2, 0]$. By the same logic, its one-hot encoding is:

$$\hat{𝐓} = \begin{bmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{bmatrix} \in \{0,1\}^{4 \times 3}$$

We now have everything we need to compute the confusion matrix, which is

$$𝐓^{⊤} \hat{𝐓} \in 𝐙_{0 +}^{3 \times 3},$$

where $𝐙_{0 +}$ denotes the non-negative integers. So, again:

A confusion matrix is the matrix product of the true and predicted labels, both encoded as one-hot vectors.

$$𝐓^{⊤} \hat{𝐓} = \begin{bmatrix} 1 & 0 & 0 \\ 1 & 0 & 1 \\ 0 & 0 & 1 \end{bmatrix} \in 𝐙_{0 +}^{3 \times 3}$$
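A quick entrywise check shows why the product counts label pairs. With $N$ observations, $T_{ni}$ the entry of $𝐓$ in row $n$ and column $i$, and $[\,\cdot\,]$ equal to 1 when its condition holds and 0 otherwise:

$$(𝐓^{⊤} \hat{𝐓})_{ij} = \sum_{n=1}^{N} T_{ni}\,\hat{T}_{nj} = \sum_{n=1}^{N} [y_n = i]\,[\hat{y}_n = j] = \#\{\, n : y_n = i,\ \hat{y}_n = j \,\}$$

So row $i$ collects the observations whose true class is $i$, and column $j$ those predicted as $j$. For example, with rows and columns indexed from 0 like the classes, entry $(1, 0)$ equals 1 because only the fourth observation has $y = 1$ and $\hat{y} = 0$.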

As you can see, the confusion matrix correctly summarizes the information in both vectors (rows are the true classes, columns the predicted ones):

$$\begin{matrix} 𝐲 = [1,0,2,1] \\ \hat{𝐲} = [2,0,2,0] \end{matrix}$$

- The sum of the diagonal elements ($1 + 0 + 1 = 2$) tells us that the model classified two observations correctly.
- The model correctly classified one observation of class 0 (and one of class 2).
- The model never predicted class 1 (the second column is all zeros), so both class-1 observations are errors.
- The model confused the two class-1 observations, labelling one as 2 and the other as 0; look at the second row.
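The worked example above can be checked numerically. Here is a minimal sketch, assuming NumPy is available, that types in $𝐓$ and $\hat{𝐓}$ exactly as written above and reads off the same facts as the list:

```python
import numpy as np

# One-hot matrices typed in by hand from y = [1, 0, 2, 1] and y_hat = [2, 0, 2, 0]
T = np.array([[0, 1, 0],
              [1, 0, 0],
              [0, 0, 1],
              [0, 1, 0]])
T_hat = np.array([[0, 0, 1],
                  [1, 0, 0],
                  [0, 0, 1],
                  [1, 0, 0]])

# Rows = true class, columns = predicted class
C = T.T @ T_hat
print(C)

correct = np.trace(C)                   # 1 + 0 + 1 = 2 correct predictions
times_class1_predicted = C[:, 1].sum()  # 0: class 1 is never predicted
class1_row = C[1]                       # [1 0 1]: one class-1 sample sent to 0, one to 2
```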

Now with `import numpy as np`

We need two steps to compute our confusion matrix.

First, we need a way to transform a vector $𝐯$ of labels from $K$ classes into its one-hot encoding, `v_one_hot = one_hot_encoding(v)`:

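```python
import numpy as np

def one_hot_encoding(v, num_classes=None):
  '''Return the one-hot encoding matrix for a vector of labels 0..K-1'''
  v = np.asarray(v)
  # Max label + 1, not np.unique(v).size: [2, 0, 2, 0] has 2 distinct labels but needs 3 columns
  num_classes = v.max() + 1 if num_classes is None else num_classes
  return np.eye(num_classes, dtype=int)[v]
```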
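If `np.eye(num_classes)[v]` looks like magic, it is only row selection: label $k$ picks row $k$ of the $K \times K$ identity matrix. The sketch below builds the same matrix with an explicit loop (the helper `one_hot_loop` is hypothetical, not part of NumPy) and checks it against the indexing trick. Note that the number of columns has to be the largest label plus one rather than the number of distinct labels: `[2, 0, 2, 0]` contains only two distinct labels but still needs three columns.

```python
import numpy as np

def one_hot_loop(v, num_classes):
    """Hypothetical helper: one-hot encode integer labels 0..num_classes-1 with a plain loop."""
    out = np.zeros((len(v), num_classes), dtype=int)
    for row, label in enumerate(v):
        out[row, label] = 1  # a single 1 per row, in the column of its label
    return out

y_pred = np.array([2, 0, 2, 0])
K = y_pred.max() + 1  # 3 columns, even though only 2 distinct labels appear

explicit = one_hot_loop(y_pred, K)
fancy = np.eye(K, dtype=int)[y_pred]  # row k of the identity for each label k
print(np.array_equal(explicit, fancy))  # True
```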

Second, compute the confusion matrix $𝐓^{⊤} \hat{𝐓} \in 𝐙_{0 +}^{K \times K}$ for $K$ classes. There are several ways to do it with NumPy, as the following code shows. Below I use the conventional names for the true labels (`y`) and the predicted ones (`y_pred`):

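```python
y = np.array([1, 0, 2, 1])       # true labels from the example above
y_pred = np.array([2, 0, 2, 0])  # predicted labels from the example above

# 1st option: Using the matrix multiplication '@' operator
one_hot_encoding(y).T @ one_hot_encoding(y_pred)

# 2nd option: Using np.dot()
np.dot(one_hot_encoding(y).T, one_hot_encoding(y_pred))

# 3rd option: Using np.matmul()
np.matmul(one_hot_encoding(y).T, one_hot_encoding(y_pred))
```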
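One thing to watch: both one-hot matrices must have the same number of columns, otherwise the product has the wrong shape or fails. If the predictions happen to miss the largest class (or the true labels do), encoding each vector on its own gives too few columns. A minimal sketch, using a hypothetical `confusion_matrix_np` helper that takes $K$ from both vectors unless `num_classes` is given explicitly:

```python
import numpy as np

def confusion_matrix_np(y, y_pred, num_classes=None):
    """Hypothetical helper: C[i, j] = number of observations with true label i predicted as j."""
    y, y_pred = np.asarray(y), np.asarray(y_pred)
    if num_classes is None:
        # Shared K, taken from both vectors so T and T_hat have matching columns
        num_classes = max(y.max(), y_pred.max()) + 1
    eye = np.eye(num_classes, dtype=int)
    return eye[y].T @ eye[y_pred]  # T^T @ T_hat

print(confusion_matrix_np([1, 0, 2, 1], [2, 0, 2, 0]))
# Predictions never use class 2 here, but K = 3 still comes from y
print(confusion_matrix_np([1, 0, 2, 1], [1, 0, 0, 1]))
```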

And we are done! Of course, you can always get your confusion matrix from your favourite store ;)

```python
from sklearn.metrics import confusion_matrix
confusion_matrix(y, y_pred)
```
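scikit-learn's `confusion_matrix` uses the same orientation as $𝐓^{⊤} \hat{𝐓}$: rows are true labels and columns are predicted labels. For a store-free cross-check of the matrix product, a direct count also works. The sketch below swaps in a plain counting approach with `np.add.at` (not the matrix product) that adds 1 to cell $(y_n, \hat{y}_n)$ for every observation, then compares it with the one-hot product:

```python
import numpy as np

y = np.array([1, 0, 2, 1])
y_pred = np.array([2, 0, 2, 0])
K = max(y.max(), y_pred.max()) + 1

# Direct count: add 1 to cell (true, predicted) for every observation.
# np.add.at is unbuffered, so repeated (i, j) pairs are all counted.
counts = np.zeros((K, K), dtype=int)
np.add.at(counts, (y, y_pred), 1)

# Gauss-style product of the one-hot matrices
eye = np.eye(K, dtype=int)
product = eye[y].T @ eye[y_pred]

print(np.array_equal(counts, product))  # True
```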

That's the way computers talk to each other.