sklearn log_loss different number of classes












I'm using log_loss from sklearn:

from sklearn.metrics import log_loss
print log_loss(true, pred, normalize=False)

and I get the following error:

ValueError: y_true and y_pred have different number of classes 38, 2

This seems strange to me, since the arrays look valid:

print pred.shape
print np.unique(pred)
print np.unique(pred).size
(19191L,)
[ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
25 26 27 28 29 30 31 32 33 34 35 36 37]
38

print true.shape
print np.unique(true)
print np.unique(true).size
(19191L,)
[ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
25 26 27 28 29 30 31 32 33 34 35 36 37]
38

What is wrong with log_loss? Why does it throw this error?

Sample data:

pred: array([ 0,  1,  2, ...,  3, 12, 16], dtype=int64)
true: array([ 0, 1, 2, ..., 3, 12, 16])






python scikit-learn














edited Nov 9 '15 at 19:29 by Ablomis
asked Nov 9 '15 at 18:52 by Ablomis













  • Can you post some data for pred and true? It looks like your labels are being passed incorrectly.
    – ryanmc, Nov 9 '15 at 19:25

  • Added to the original post.
    – Ablomis, Nov 9 '15 at 19:28

  • Log loss is used to assess the accuracy of probabilities: it expects an array of probabilities for every possible label (you are passing only the labels). I believe your pred variable should be an array of n-element arrays, where n is the number of labels.
    – ryanmc, Nov 9 '15 at 20:03
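
To make the last comment concrete, here is a minimal sketch of what log loss computes, written in plain Python rather than with sklearn. The helper name multiclass_log_loss and the sample values are made up for illustration. It assumes the labels are integer column indices 0..n_classes-1 and that no true-class probability is exactly 0 (there is no clipping here).

```python
import math


def multiclass_log_loss(y_true, y_prob):
    """Mean negative log of the probability assigned to each sample's true class."""
    total = 0.0
    for label, row in zip(y_true, y_prob):
        # row holds one probability per class; label picks the true class column
        total += -math.log(row[label])
    return total / len(y_true)


# Three samples, three classes: each row has one probability per class.
y_true = [0, 1, 2]
y_prob = [
    [0.80, 0.10, 0.10],
    [0.10, 0.80, 0.10],
    [0.25, 0.25, 0.50],
]

loss = multiclass_log_loss(y_true, y_prob)
print(loss)
```

The point is that every sample needs a full row of class probabilities. A 1-D array of hard labels carries no such information, so there is nothing for the metric to take the logarithm of.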



































3 Answers
You are passing the predicted labels rather than the predicted probabilities. Your pred variable contains something like [1 2 1 3 ...], but log_loss expects something like [[0.1, 0.8, 0.1], [0.0, 0.79, 0.21], ...]. To obtain these probabilities, use the predict_proba method:

from sklearn.metrics import log_loss
# model is your fitted classifier; predict_proba returns shape (n_samples, n_classes)
pred = model.predict_proba(x_test)
loss = log_loss(y_true, pred)








edited Nov 21 '18 at 9:44
answered Mar 22 '17 at 13:37 by deltascience
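
For anyone who wants to try this end to end, here is a small self-contained sketch of the predict versus predict_proba difference. The synthetic 3-class dataset and the choice of LogisticRegression are assumptions made only for illustration; they stand in for the asker's real data and model, which are not shown in the question.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss
from sklearn.model_selection import train_test_split

# Synthetic 3-class problem standing in for the real data (assumption).
X, y = make_classification(
    n_samples=600, n_features=10, n_informative=6,
    n_classes=3, random_state=0,
)
x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

model = LogisticRegression(max_iter=1000).fit(x_train, y_train)

labels = model.predict(x_test)       # shape (n_samples,): hard class labels
proba = model.predict_proba(x_test)  # shape (n_samples, n_classes): probabilities

print(labels.shape, proba.shape)

# Passing model.classes_ tells log_loss which label each column of proba refers to.
loss = log_loss(y_test, proba, labels=model.classes_)
print(loss)
```

Only proba is a valid second argument here. Passing labels instead is the mistake that produces the error in the question.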

























Inside log_loss, the true array is fit and transformed by a LabelBinarizer, which changes its dimensions. So checking that true and pred have the same shape does not mean log_loss will work, because the shape of true changes internally. With only two classes, a 1-D pred is accepted and each value is read as the probability of the positive class. With more than two classes, a 1-D pred does not work: pred must have one column of probabilities per class.










answered Jun 29 '16 at 8:01 by Hima Varsha
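
The shape change described above can be sketched with plain NumPy. This is only an illustration of the idea, not sklearn's actual code: the one-hot step stands in for what LabelBinarizer does to the true labels, the two-column expansion stands in for how a 1-D prediction array is treated, and the tiny arrays are made up.

```python
import numpy as np

y_true = np.array([0, 2, 1, 2])         # 3 distinct classes
y_pred_labels = np.array([0, 2, 1, 1])  # hard labels, same 1-D shape as y_true

# One-hot encode the true labels: one column per distinct class.
classes = np.unique(y_true)
y_true_onehot = (y_true[:, None] == classes[None, :]).astype(float)

# A 1-D prediction array is read as "probability of the positive class",
# so it gets expanded into two columns: [1 - p, p].
y_pred_expanded = np.column_stack([1 - y_pred_labels, y_pred_labels])

print(y_true.shape == y_pred_labels.shape)  # the shapes agree before the transform
print(y_true_onehot.shape)                  # one column per class
print(y_pred_expanded.shape)                # always two columns
```

With 38 classes in the true labels, the same mismatch shows up as the "38, 2" in the error message from the question.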























From the log_loss documentation:

y_pred : array-like of float, shape = (n_samples, n_classes) or (n_samples,)

Predicted probabilities, as returned by a classifier’s predict_proba method. If y_pred.shape = (n_samples,) the probabilities provided are assumed to be that of the positive class. The labels in y_pred are assumed to be ordered alphabetically, as done by preprocessing.LabelBinarizer.

You need to pass probabilities, not the predicted labels.










answered Mar 22 '17 at 7:37 by ug2409
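
The (n_samples,) case from the documentation quote can be checked with a short sketch. The four labels and probabilities below are invented for illustration; the only claim is that, for a binary problem, a 1-D array of positive-class probabilities and the equivalent two-column array should give the same loss.

```python
import numpy as np
from sklearn.metrics import log_loss

y_true = [0, 1, 1, 0]

# Probability of the positive class (label 1) for each sample, shape (n_samples,)
p_positive = np.array([0.1, 0.9, 0.8, 0.3])

# The same information as explicit columns [P(class 0), P(class 1)], shape (n_samples, 2)
p_both = np.column_stack([1 - p_positive, p_positive])

loss_1d = log_loss(y_true, p_positive)
loss_2d = log_loss(y_true, p_both)
print(loss_1d, loss_2d)
```

This shortcut only exists for two classes. With 38 classes, as in the question, the full (n_samples, 38) probability array is required.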





























