36

I would like to get a confidence score for each prediction the classifier makes, showing how sure it is that the prediction is correct.

I want something like this:

How sure is the classifier on its prediction?

Class 1: 81% that this is class 1
Class 2: 10%
Class 3: 6%
Class 4: 3%

Samples of my code:

from time import time

from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

features_train, features_test, labels_train, labels_test = train_test_split(main, target, test_size=0.4)

# Determine amount of time to train
t0 = time()
model = SVC()
#model = SVC(kernel='poly')
#model = GaussianNB()

model.fit(features_train, labels_train)

print('training time: ', round(time() - t0, 3), 's')

# Determine amount of time to predict
t1 = time()
pred = model.predict(features_test)

print('predicting time: ', round(time() - t1, 3), 's')

accuracy = accuracy_score(labels_test, pred)

print('Confusion Matrix: ')
print(confusion_matrix(labels_test, pred))

# Accuracy in the 0.9333, 0.9667, 1.0 range
print(accuracy)

# Determine amount of time to predict on the new samples
t1 = time()
pred = model.predict(sub_main)

print('predicting time: ', round(time() - t1, 3), 's')

print('')
print('Prediction: ')
print(pred)

I suspect that I would use the score() function, but I can't seem to implement it correctly. I don't know if that's the right function or not, but how would one get the confidence percentage of a classifier's prediction?

2
  • 1
    really helpful question. is there a way to associate the Class names with probabilities as well? for example if i get the following list of probabilities for a input [0.33 0.25 0.75]. i know that the third one will be picked, but which class does the third one refer to?
    – AbtPst
    Dec 18, 2015 at 15:28
  • 1
    The probabilities correspond to classifier.classes_. But they can be nonsense if the dataset is small :-( . Moreover, they are not guaranteed to agree with classifier.predict() :'( . (link to docs page) Jun 23, 2017 at 16:43

3 Answers

33

Per the SVC documentation, it looks like you need to change how you construct the SVC:

model = SVC(probability=True)

and then use the predict_proba method:

class_probabilities = model.predict_proba(sub_main)
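
A minimal sketch of the approach above, run end to end. The iris dataset is a stand-in (an assumption), since the question's `main`, `target`, and `sub_main` are not shown. Each column of the `predict_proba` output lines up with the matching entry of `model.classes_`, and every row sums to 1, so the values can be printed directly as percentages.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Stand-in data (assumption): the question's own arrays are not shown.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=0
)

# probability=True enables predict_proba; fitting gets slower because
# scikit-learn calibrates the probabilities with internal cross-validation.
model = SVC(probability=True, random_state=0)
model.fit(X_train, y_train)

# One row per sample, one column per class.
class_probabilities = model.predict_proba(X_test[:1])

# Column j of predict_proba corresponds to model.classes_[j].
for label, p in zip(model.classes_, class_probabilities[0]):
    print(f"Class {label}: {p:.1%}")
```

As the scikit-learn documentation notes, these probabilities can occasionally disagree with `model.predict()` and tend to be unreliable on very small datasets.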
4
  • 2
    Ah okay, thanks! And how would you translate class_probabilities into percentage form? For example, I got [[1.614297e-03 3.99785477e-04 5.44054423e-02 9.9254921e-01]] as the output, but I don't know how to interpret these values, let alone convert them myself. What exactly do these values mean? Jun 30, 2015 at 15:57
  • 1
    @user3377126 How did you interpret the values? Mar 12, 2019 at 12:05
  • Is the probability the same as confidence? predict_proba returns the probability/likelihood of an observation belonging to a particular class. How can we find the confidence with which that likelihood is determined?
    – The Great
    Jan 17, 2022 at 13:53
  • If you have time, can you help with this related question? - stats.stackexchange.com/questions/560774/…
    – The Great
    Jan 17, 2022 at 13:55
16

For estimators that implement the predict_proba() method, as Justin Peel suggested, you can just use predict_proba() to get a probability for each prediction.

For estimators that do not implement predict_proba(), you can construct a confidence interval yourself using the bootstrap (repeatedly computing your point estimate on many resampled subsets of the data).

Let me know if you need any detailed examples to demonstrate either of these two cases.
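
One rough way to read the bootstrap idea above, as a sketch under stated assumptions rather than a definitive recipe: refit the classifier on many bootstrap resamples of the training data and record how often each class is predicted for a given sample. `LinearSVC` is used because it has no `predict_proba()`; the iris data, the 50 rounds, and the names `x_new` and `vote_share` are all illustrative choices.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.svm import LinearSVC
from sklearn.utils import resample

# Stand-in data (assumption). x_new is taken from the training set only
# to keep the sketch short; in practice use a held-out sample.
X, y = load_iris(return_X_y=True)
x_new = X[:1]

n_rounds = 50  # number of bootstrap refits (illustrative)
votes = []
for seed in range(n_rounds):
    # Draw a resample of the same size as the data, with replacement.
    X_boot, y_boot = resample(X, y, random_state=seed)
    clf = LinearSVC(max_iter=10000, random_state=seed)
    clf.fit(X_boot, y_boot)
    votes.append(clf.predict(x_new)[0])

# Fraction of bootstrap models that voted for each class.
labels, counts = np.unique(votes, return_counts=True)
vote_share = dict(zip(labels.tolist(), (counts / n_rounds).tolist()))
print(vote_share)
```

The vote share measures how stable the prediction is under resampling. It is not a calibrated probability.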

7
  • Ah okay, thanks! And how would you translate class_probabilities into percentage form? For example, I got [[1.614297e-03 3.99785477e-04 5.44054423e-02 9.9254921e-01]] as the output, but I don't know how to interpret these values, let alone convert them myself. What exactly do these values mean? Jun 30, 2015 at 15:57
  • 5
    @user3377126 They are already in percentage form. :) Each row should sum to 1. The last element is actually 0.992, which means the algorithm predicts that the sample belongs to this class with probability 99.2%. Note that e-03 is just scientific notation.
    – Jianxun Li
    Jun 30, 2015 at 16:00
  • Ah I see now, thank you! :) I would have accepted your answer, but since Justin Peel commented first with the example that worked for me, I decided to give it to him, sorry about that but thanks for the advice! Jun 30, 2015 at 17:36
  • 1
    No problem at all. :) Glad that we both could help.
    – Jianxun Li
    Jun 30, 2015 at 17:37
  • 1
    is there a way to associate the Class names with probabilities as well? for example if i get the following list of probabilities for a input [0.33 0.25 0.75]. i know that the third one will be picked, but which class does the third one refer to?
    – AbtPst
    Dec 18, 2015 at 15:28
0

Using the code below, you will get the top 4 class names with the predicted probability of each, for every sample. You can change no_of_class to as many as you need.

import numpy as np

probas1 = model.predict_proba(sub_main)
no_of_class = 4

# Column indices of the no_of_class most probable classes per sample, highest first
top_classes1 = np.argsort(-probas1, axis=1)[:, :no_of_class]

# Map the column indices to class labels via model.classes_
class_labels1 = [model.classes_[top_classes1[i]] for i in range(len(top_classes1))]

# Probabilities in the same order as the labels
top_confidence1 = [probas1[i][top_classes1[i]] for i in range(len(top_classes1))]

for i in range(len(class_labels1)):
    for j in range(no_of_class):
        print(f"Sample {i}: {class_labels1[i][j]} :: {top_confidence1[i][j]}")

NOTE: you can also simply convert this into a DataFrame, with one column for the predicted class and another for its predicted probability.
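
A possible sketch of that DataFrame conversion, assuming pandas is available. The iris data and the column names `predicted_class` and `confidence` are illustrative, not from the original answer.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.svm import SVC

# Stand-in data and model (assumption): the question's arrays are not shown.
X, y = load_iris(return_X_y=True)
model = SVC(probability=True, random_state=0).fit(X, y)
probas = model.predict_proba(X[:5])

# One column per class, named after the entries of model.classes_.
df = pd.DataFrame(probas, columns=model.classes_)

# Most probable class per row and its probability.
df["predicted_class"] = df.idxmax(axis=1)
df["confidence"] = probas.max(axis=1)
print(df)
```

Here `predicted_class` is the class with the highest probability in each row, which for `SVC` may in rare cases differ from what `model.predict()` returns.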
