sklearn log_loss different number of classes
I'm using log_loss from sklearn:
from sklearn.metrics import log_loss
print log_loss(true, pred, normalize=False)
and I get the following error:
ValueError: y_true and y_pred have different number of classes 38, 2
This is really strange to me, since the arrays look valid:
print pred.shape
print np.unique(pred)
print np.unique(pred).size
(19191L,)
[ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
25 26 27 28 29 30 31 32 33 34 35 36 37]
38
print true.shape
print np.unique(true)
print np.unique(true).size
(19191L,)
[ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
25 26 27 28 29 30 31 32 33 34 35 36 37]
38
What is wrong with the log_loss call? Why does it throw this error?
Sample data:
pred: array([ 0, 1, 2, ..., 3, 12, 16], dtype=int64)
true: array([ 0, 1, 2, ..., 3, 12, 16])
Can you post some data for pred and true? It looks like your labels are being passed incorrectly.
– ryanmc
Nov 9 '15 at 19:25
Added to the original post
– Ablomis
Nov 9 '15 at 19:28
Log loss is meant to assess the accuracy of probabilities: it expects an array of probabilities for every possible label (you are passing only a label). I believe your pred variable should be an array of n-element arrays, where n is the number of labels.
– ryanmc
Nov 9 '15 at 20:03
3 Answers
It's simple: you are passing the predictions, not the predicted probabilities. Your pred variable contains [1, 2, 1, 3, ...], but to use log_loss it should contain something like [[0.1, 0.8, 0.1], [0.0, 0.79, 0.21], ...]. To obtain these probabilities, use the predict_proba method:
pred = model.predict_proba(x_test)
loss = log_loss(y_true, pred)
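For a self-contained illustration, here is a sketch on synthetic data (the dataset and model are made up for the example, not taken from the question):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss

# Toy 4-class problem standing in for the asker's 38-class one.
X, y = make_classification(n_samples=200, n_classes=4,
                           n_informative=6, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X, y)

# predict_proba returns one row of class probabilities per sample.
proba = model.predict_proba(X)   # shape (200, 4)
loss = log_loss(y, proba)        # works: shapes now agree
print(proba.shape, loss)
```

Passing `model.predict(X)` (hard labels) in place of `proba` is exactly what reproduces the asker's mismatch.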
Inside the log_loss function, the true array is fit and transformed by a LabelBinarizer, which changes its dimensions. So the fact that true and pred have matching dimensions on the way in doesn't mean log_loss will work, because true's dimensions change internally. If you have only binary classes, this call can work; for multiple classes, it doesn't.
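To make the expected input explicit, the multiclass cost that log_loss computes can be sketched by hand (toy numbers; the helper name and the clipping epsilon are arbitrary choices for this sketch):

```python
import numpy as np

def multiclass_log_loss(y_true, proba, eps=1e-15):
    # Clip to avoid log(0), then average -log of the probability
    # assigned to each sample's true class.
    proba = np.clip(proba, eps, 1 - eps)
    n = len(y_true)
    return -np.mean(np.log(proba[np.arange(n), y_true]))

y_true = np.array([0, 2, 1])
proba = np.array([[0.7, 0.2, 0.1],    # one row per sample,
                  [0.1, 0.3, 0.6],    # one column per class,
                  [0.2, 0.6, 0.2]])   # rows sum to 1
print(multiclass_log_loss(y_true, proba))
```

Note that the formula needs a full probability row per sample, which is why a 1-D array of hard labels cannot work for 38 classes.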
From the log_loss documentation:
y_pred : array-like of float, shape = (n_samples, n_classes) or (n_samples,)
Predicted probabilities, as returned by a classifier’s predict_proba method. If y_pred.shape = (n_samples,) the probabilities provided are assumed to be that of the positive class. The labels in y_pred are assumed to be ordered alphabetically, as done by preprocessing.LabelBinarizer.
You need to pass probabilities, not the prediction labels.
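A small sketch of the two accepted shapes described in the documentation above (toy numbers, not the asker's data):

```python
import numpy as np
from sklearn.metrics import log_loss

y = np.array([0, 1, 1, 0])

# Binary case: a 1-D y_pred is read as P(positive class) per sample.
p1 = np.array([0.1, 0.9, 0.8, 0.3])
binary_loss = log_loss(y, p1)

# Equivalent 2-D form: one column per class, rows summing to 1.
# This (n_samples, n_classes) shape is the one required once
# there are more than two classes.
p2 = np.column_stack([1 - p1, p1])
print(binary_loss, log_loss(y, p2))
```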