python - Probabilities for the computer's decision in ML?
This is phrased quite badly, but I'm trying to ask: how can I display the percentage of confidence in the prediction made by a classification algorithm? I'm using scikit-learn.
Say I'm trying to identify whether a fruit is an apple or an orange based on its texture and weight:
from sklearn import tree

# features: [weight, texture], texture 0 = "bumpy", 1 = "smooth"
# labels: 0 = apple, 1 = orange
features = [[140, 1], [130, 1], [150, 0], [170, 0]]
labels = [0, 0, 1, 1]

# using a decision tree in this instance
clf = tree.DecisionTreeClassifier()
clf = clf.fit(features, labels)
print(clf.predict([[160, 0]]))
So when predicting [160, 0], the computer would, based on the pattern, predict an orange. Is there a way for scikit-learn to report how confident the computer is in returning either 1 or 0? This becomes important when I have more parameters in the feature vector.
Yep. Just use the predict_proba(x) function (instead of predict()):
probability = clf.predict_proba([[160, 0]])
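A minimal runnable sketch, reusing the toy data from the question, that turns the predict_proba output into the percentages the question asks about. The names dictionary is an assumption added only to get readable output; note that the columns of the result follow clf.classes_.

```python
from sklearn import tree

# Toy data from the question: [weight, texture], texture 0 = bumpy, 1 = smooth
features = [[140, 1], [130, 1], [150, 0], [170, 0]]
labels = [0, 0, 1, 1]  # 0 = apple, 1 = orange
names = {0: "apple", 1: "orange"}  # hypothetical display names

clf = tree.DecisionTreeClassifier(random_state=0)
clf.fit(features, labels)

probability = clf.predict_proba([[160, 0]])  # shape (1, n_classes)
# Column order follows clf.classes_, so pair each column with its label
for label, p in zip(clf.classes_, probability[0]):
    print(f"{names[int(label)]}: {p:.0%}")
```

On this tiny, perfectly separable dataset a fully grown tree ends in pure leaves, so this prints apple: 0% and orange: 100%. Less extreme percentages appear once leaves contain training samples of more than one class.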
Certain classifiers in scikit-learn have the ability to do this; others don't.
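One way to check whether a given estimator offers probabilities is to test for the predict_proba attribute. A small sketch, with three estimators picked purely for illustration: LinearSVC does not implement predict_proba (it offers decision_function scores instead, which are not probabilities).

```python
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC
from sklearn.tree import DecisionTreeClassifier

candidates = [DecisionTreeClassifier(), LogisticRegression(), LinearSVC()]

# predict_proba only exists on estimators that can produce class probabilities
supports_proba = {type(est).__name__: hasattr(est, "predict_proba") for est in candidates}
print(supports_proba)
```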
In the case of DecisionTreeClassifier, the model, when asked for the probability of a given class, gives the fraction of training samples of that class in the particular "leaf" the example falls into.
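To see that the leaf fraction and predict_proba agree, the sketch below counts the training labels in the query's leaf by hand (via apply, which returns the leaf index each sample lands in) and compares the result with predict_proba. The data is hypothetical and deliberately noisy, and max_depth=1 is an assumption chosen so that leaves stay mixed; a fully grown tree on six distinct points would end in pure leaves.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Hypothetical noisy data: [weight, texture], 0 = apple, 1 = orange
X = np.array([[140, 1], [130, 1], [150, 0], [170, 0], [155, 1], [145, 0]])
y = np.array([0, 0, 1, 1, 0, 0])

# max_depth=1 stops the tree early, so at least one leaf holds both classes
clf = DecisionTreeClassifier(max_depth=1, random_state=0).fit(X, y)

query = np.array([[160, 0]])
train_leaves = clf.apply(X)       # leaf index reached by each training sample
query_leaf = clf.apply(query)[0]  # leaf index reached by the query

# Fraction of each class among the training samples sharing the query's leaf
in_leaf = y[train_leaves == query_leaf]
manual = np.bincount(in_leaf, minlength=2) / len(in_leaf)

print("manual fraction:", manual)
print("predict_proba:  ", clf.predict_proba(query)[0])
```

With this data both lines should show roughly [0.33, 0.67]: the query's leaf holds two oranges and one apple.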
A leaf in a decision tree is reached through a set of conditions (rules) that together represent a path down the tree.
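The path of conditions that leads a sample to its leaf can be read off a fitted tree with decision_path. A sketch using the question's data; the feature names "weight" and "texture" are assumptions for readable output. In scikit-learn a sample goes left at a node when its feature value is <= the node's threshold.

```python
from sklearn.tree import DecisionTreeClassifier

features = [[140, 1], [130, 1], [150, 0], [170, 0]]
labels = [0, 0, 1, 1]
feature_names = ["weight", "texture"]  # assumed names, for display only

clf = DecisionTreeClassifier(random_state=0).fit(features, labels)
sample = [[160, 0]]

# Sparse indicator matrix: row i marks every node that sample i passes through
node_indicator = clf.decision_path(sample)
path = node_indicator.indices[node_indicator.indptr[0]:node_indicator.indptr[1]]
leaf = clf.apply(sample)[0]

tree_ = clf.tree_
conditions = []
for node in path:
    if node == leaf:
        continue  # the leaf stores class counts, not a test
    f, t = tree_.feature[node], tree_.threshold[node]
    op = "<=" if sample[0][f] <= t else ">"
    conditions.append(f"{feature_names[f]} {op} {t:.1f}")

print(" AND ".join(conditions), "->", clf.predict_proba(sample)[0])
```

Which feature appears depends on the split the tree picked; here either feature separates the classes perfectly, so the path contains a single condition.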
For example, if the example [0, 160] represents [x1, x2], the rule might have been:
if x1 < 10:
    if x2 > 150:
        # in our training set of `n` examples, 100 fell under
        # this rule set. 75 of them were apples and 25 were oranges, thus:
        probability = [0.75, 0.25]  # p(apple) = .75, p(orange) = .25
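To see the rules a real tree learned, alongside how many training samples of each class ended up in each leaf, scikit-learn's export_text can print the whole tree. A sketch on the same hypothetical noisy data used above, with max_depth=1 assumed so the leaves stay mixed; show_weights=True adds the per-class weights of each leaf, which play the role of the 75/25 split in the rule above.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical noisy data: [weight, texture], 0 = apple, 1 = orange
X = np.array([[140, 1], [130, 1], [150, 0], [170, 0], [155, 1], [145, 0]])
y = np.array([0, 0, 1, 1, 0, 0])

clf = DecisionTreeClassifier(max_depth=1, random_state=0).fit(X, y)

# One line per condition; leaves show the predicted class and class weights
rules = export_text(clf, feature_names=["weight", "texture"], show_weights=True)
print(rules)
```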
And of course, in the binary classification case (two classes), scikit-learn returns both probabilities, but you only need one or the other because they are complementary (1 - .75 = .25).
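In code, that usually means keeping just the column for the class of interest. A sketch on the same hypothetical data; looking the column up through clf.classes_ (rather than assuming it is column 1) is a design choice that keeps it correct if the labels are ordered differently.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Hypothetical noisy data: [weight, texture], 0 = apple, 1 = orange
X = np.array([[140, 1], [130, 1], [150, 0], [170, 0], [155, 1], [145, 0]])
y = np.array([0, 0, 1, 1, 0, 0])
clf = DecisionTreeClassifier(max_depth=1, random_state=0).fit(X, y)

proba = clf.predict_proba([[160, 0], [135, 1]])

# Find the column for "orange" (label 1) instead of hard-coding its position
orange_col = list(clf.classes_).index(1)
p_orange = proba[:, orange_col]
p_apple = 1 - p_orange  # binary case: the two columns sum to 1

print(p_orange, p_apple)
```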
Check out the scikit-learn docs to learn more.
Hope this helps.