VEDANT JORE
Jun 5, 2021

The confusion matrix is a fairly common term in machine learning. Today I am going to relate the importance of the confusion matrix to cyber crime.

In this article, I'm going to feed you a delicious breakfast about the confusion matrix in cyber security.

Before we dive deep, let's first figure out what a confusion matrix is.

What is the Confusion Matrix?

A confusion matrix is a matrix that sets the number of correct predictions against the number of incorrect predictions. For a binary classifier, this means the number of true negatives and true positives (correct predictions) versus the number of false negatives and false positives (incorrect predictions).

Architecture of the Confusion Matrix

The size of the matrix depends on the number of output classes: for N classes, it is an N x N matrix. Here we take the column headers as actual values and the row headers as model predictions. Predicted positive and actually positive are True Positives (TP), predicted negative and actually negative are True Negatives (TN), predicted positive but actually negative are False Positives (FP), and predicted negative but actually positive are False Negatives (FN).

For a binary classifier, it is a table with 4 different combinations of predicted and actual values. It is extremely useful for measuring Recall, Precision, Specificity, and Accuracy, and the same counts, taken across different decision thresholds, underlie the AUC-ROC curve.
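As a rough sketch of this layout, the following Python snippet tallies the four cells for a binary classifier and arranges them with predictions as rows and actual values as columns. The function name `confusion_counts` and the sample labels are made up for illustration. Some libraries, scikit-learn among them, put actual values on the rows instead, so check the convention before reading a matrix.

```python
def confusion_counts(actual, predicted, positive=1):
    """Count TP, FP, FN, TN for a binary classifier."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1  # predicted positive, actually positive
        elif p == positive:
            fp += 1  # predicted positive, actually negative
        elif a == positive:
            fn += 1  # predicted negative, actually positive
        else:
            tn += 1  # predicted negative, actually negative
    return {"TP": tp, "FP": fp, "FN": fn, "TN": tn}


# Made-up labels: 1 = positive, 0 = negative.
actual = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 1, 0, 1, 0]

counts = confusion_counts(actual, predicted)

# Rows are predictions, columns are actual values (the layout used in this article).
matrix = [
    [counts["TP"], counts["FP"]],  # predicted positive
    [counts["FN"], counts["TN"]],  # predicted negative
]
print(matrix)  # [[3, 1], [1, 3]]
```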

Let's understand TP, FP, FN, and TN with a pregnancy analogy.

  • True Positive:

Interpretation: You predicted positive and it’s true.

You predicted that a woman is pregnant and she actually is.

  • True Negative:

Interpretation: You predicted negative and it’s true.

You predicted that a man is not pregnant and he is actually not.

  • False Positive: (Type 1 Error)

Interpretation: You predicted positive and it’s false.

You predicted that a man is pregnant but he is actually not.

  • False Negative: (Type 2 Error)

Interpretation: You predicted negative and it’s false.

You predicted that a woman is not pregnant but she actually is.

Just remember: Positive and Negative describe the predicted value, while True and False describe whether that prediction matched the actual value.
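The naming rule can be sketched in a few lines of Python: the second word comes from the prediction, and the first word says whether the prediction matched reality. The function name `outcome` is hypothetical and only serves the pregnancy example.

```python
def outcome(predicted_positive: bool, actually_positive: bool) -> str:
    """Name the confusion-matrix cell for a single prediction."""
    # "True"/"False": did the prediction match the actual value?
    first = "True" if predicted_positive == actually_positive else "False"
    # "Positive"/"Negative": what did the model predict?
    second = "Positive" if predicted_positive else "Negative"
    return f"{first} {second}"


# Predicted pregnant, actually pregnant.
print(outcome(True, True))    # True Positive
# Predicted pregnant, actually not pregnant (Type 1 Error).
print(outcome(True, False))   # False Positive
# Predicted not pregnant, actually pregnant (Type 2 Error).
print(outcome(False, True))   # False Negative
# Predicted not pregnant, actually not pregnant.
print(outcome(False, False))  # True Negative
```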

What can we gain from this?

Ohh, I think I forgot to tell you some basic terminology related to this.

  1. Precision: It is the fraction of the model's positive predictions that are actually positive. In other words, out of everything the model flagged as positive, it is the share that is truly positive. The formula is TP / (TP + FP).
  2. Recall: It is the fraction of actual positives that the model correctly identifies as positive. It is also called the True Positive Rate or Sensitivity. The formula is TP / (TP + FN).
  3. F1 Score: It is the harmonic mean of Precision and Recall. Because the harmonic mean is pulled toward the smaller of the two values, a model scores well only when both are reasonably high, so this metric accounts for False Positives and False Negatives at the same time when comparing two models. The formula is 2 * Precision * Recall / (Precision + Recall).
  4. Accuracy: It is the fraction of all values that are identified correctly, irrespective of whether they are positives or negatives. It therefore counts both True Positives and True Negatives. The formula is (TP + TN) / (TP + TN + FP + FN).
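The four formulas translate almost directly into code. Here is a minimal sketch, assuming the counts are already known. The function name `metrics`, the sample counts, and the choice to return 0.0 when a denominator is zero are all illustrative assumptions, not a fixed convention.

```python
def metrics(tp, fp, fn, tn):
    """Compute precision, recall, F1 and accuracy from the four counts."""
    # Assumption: a metric with a zero denominator is reported as 0.0.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "accuracy": accuracy,
    }


# Made-up counts for illustration.
result = metrics(tp=30, fp=10, fn=20, tn=40)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
# precision: 0.750
# recall: 0.600
# f1: 0.667
# accuracy: 0.700
```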

Out of all these terms, precision and recall are the most widely used. Their tradeoff is a useful measure of how successful a prediction is. The ideal model should have high precision and high recall, but this is only possible with perfectly separable data. In practical use cases, the data is very messy and imbalanced.
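To see why imbalance matters, consider a small worked example in Python. The traffic log below is entirely hypothetical: 1000 connections, of which only 10 are attacks, and a lazy model that labels everything as benign.

```python
# Hypothetical traffic log: 1000 connections, only 10 of them are attacks.
actual = ["attack"] * 10 + ["benign"] * 990
# A lazy model that labels every connection as benign.
predicted = ["benign"] * 1000

pairs = list(zip(actual, predicted))
tp = sum(a == "attack" and p == "attack" for a, p in pairs)
fn = sum(a == "attack" and p == "benign" for a, p in pairs)
tn = sum(a == "benign" and p == "benign" for a, p in pairs)
fp = sum(a == "benign" and p == "attack" for a, p in pairs)

accuracy = (tp + tn) / len(actual)
recall = tp / (tp + fn)

print(f"accuracy = {accuracy:.2f}")  # accuracy = 0.99
print(f"recall   = {recall:.2f}")    # recall   = 0.00
```

The model looks excellent on accuracy alone, yet it misses every single attack, which is why it helps to read more than one metric on imbalanced data.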

How Is the Confusion Matrix Used in Cyber Crime?

With the help of the confusion matrix, we can see which categories are classified correctly and which are classified incorrectly. In the same way, we can use it to evaluate the accuracy of a model that identifies different types of cyberattacks.

Consider the following example from a cybercrime case study. We need to classify cases into their classes, and some features are selected accordingly.

Let's understand this case by considering the result…

The criminal court case label can be predicted with an accuracy of 76%. This means 24% of all criminal court cases get misclassified as another class. However, since this figure is the weighted average of each class's f1_score, it may be better to calculate accuracies per class, as some classes perform better than others. It appears that ‘child pornography’ can be determined with high accuracy.
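Per-class scores can be read straight off a multi-class confusion matrix. The sketch below uses made-up class names and counts, not the case study's data, and keeps this article's layout with predictions as rows and actual values as columns. The function name `per_class_scores` is hypothetical.

```python
def per_class_scores(labels, matrix):
    """Return {label: (precision, recall)} for a multi-class confusion matrix.

    matrix[i][j] = number of cases predicted as labels[i]
                   whose actual class is labels[j].
    """
    scores = {}
    n = len(labels)
    for k, name in enumerate(labels):
        tp = matrix[k][k]
        predicted_total = sum(matrix[k])                    # row sum
        actual_total = sum(matrix[i][k] for i in range(n))  # column sum
        precision = tp / predicted_total if predicted_total else 0.0
        recall = tp / actual_total if actual_total else 0.0
        scores[name] = (precision, recall)
    return scores


# Hypothetical classes and counts, for illustration only.
labels = ["fraud", "hacking", "phishing"]
matrix = [
    [45, 4, 1],   # predicted fraud
    [3, 25, 2],   # predicted hacking
    [2, 6, 12],   # predicted phishing
]

total = sum(sum(row) for row in matrix)
overall_accuracy = sum(matrix[k][k] for k in range(len(labels))) / total
print(f"overall accuracy: {overall_accuracy:.2f}")

scores = per_class_scores(labels, matrix)
for name, (precision, recall) in scores.items():
    print(f"{name}: precision={precision:.2f}, recall={recall:.2f}")
```

With these made-up numbers, the overall accuracy hides the fact that one class is recognised far more reliably than the others, which is the same effect described for the case study.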

In this way, the confusion matrix helps in solving various challenges of cyber crime.

Thanks for reading… stay tuned for the next one…
