Predicting all zeros


























I've developed a neural network for classification and I'm getting an accuracy of 0.93, but the problem is that the model is predicting all zeros because of the distribution of the data.



Data distribution



How can I fix it? Should I switch from a neural network to another algorithm?



Thanks in advance



Edit: I've just checked, and my model is predicting the same probability for every row.



The model is a neural network with 5 layers, using tf.nn.relu6 as the activation function. The cost function is tf.nn.sigmoid_cross_entropy_with_logits.



To predict the values I use:



predicted = tf.nn.sigmoid(Z5)
correct_pred = tf.equal(tf.round(predicted), Y)
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))


EDIT 2



I have 'fixed' the class imbalance problem (by undersampling and upsampling the 0s and 1s), but the net is still predicting the same value for every row:



Prediction



I have tried changing the activation function to tanh or sigmoid, but then the network outputs NaNs.






























  • Please supply info on what model you are using, what the data is, and the code.
    – Dinari, Nov 18 '18 at 7:50

  • Google "class imbalance"...
    – desertnaut, Nov 18 '18 at 8:10

  • In practice I have seen two possible fixes for that (although I dislike both of them). The first is to throw away parts of the dataset so that all outcomes are equally likely. The other is to put some samples - namely those with a prediction outcome that does not occur too often - multiple times into the dataset. Both ways make all prediction outcomes about equally likely in the training set. If someone knows another way, I would be interested in it too.
    – quant, Nov 18 '18 at 8:10

  • @quant class imbalance is a huge subtopic, and there are several other approaches, like creating artificial samples of the minority class; look for SMOTE & RUSBoost.
    – desertnaut, Nov 18 '18 at 8:16
















python tensorflow machine-learning deep-learning

asked Nov 18 '18 at 7:42 by A. Esquivias, edited Nov 18 '18 at 9:40
1 Answer
































There are multiple solutions for unbalanced data. But first, accuracy is not a good metric for unbalanced data: if you only had 5 positives and 95 negatives, a model that always predicts negative would already reach 95% accuracy. You should check sensitivity and specificity, or other metrics that work well with unbalanced data, such as the LIFT score.
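To make the point concrete, here is a minimal pure-Python sketch (the labels are made up for illustration) showing that an all-zeros predictor can score high accuracy while its sensitivity is zero:

```python
# Hypothetical imbalanced labels: 95 negatives, 5 positives
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100  # a model that predicts all zeros

# Confusion-matrix counts
tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

accuracy    = (tp + tn) / len(y_true)  # 0.95 -- looks great
sensitivity = tp / (tp + fn)           # 0.0  -- catches no positives
specificity = tn / (tn + fp)           # 1.0

print(accuracy, sensitivity, specificity)
```

If you already use scikit-learn, sklearn.metrics.recall_score gives you sensitivity directly.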



To train the model with unbalanced data, there are multiple solutions. One of them is to up-sample the minority class.




Up-sampling is the process of randomly duplicating observations from
the minority class in order to reinforce its signal.




You can upsample the data with code like this:



import pandas as pd
from sklearn.utils import resample

# Separate majority and minority classes
df_majority = df[df.balance==0]
df_minority = df[df.balance==1]

# Upsample minority class
df_minority_upsampled = resample(df_minority,
                                 replace=True,     # sample with replacement
                                 n_samples=576,    # to match majority class
                                 random_state=123) # reproducible results

# Combine majority class with upsampled minority class
df_upsampled = pd.concat([df_majority, df_minority_upsampled])

# Display new class counts
df_upsampled.balance.value_counts()
# 1    576
# 0    576
# Name: balance, dtype: int64


You can find more information and other solutions that are well explained here.
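Another option, instead of resampling, is to weight the positive class directly in the loss; TensorFlow provides tf.nn.weighted_cross_entropy_with_logits for this. A minimal pure-Python sketch of the idea, with a made-up pos_weight chosen as the negatives-per-positive ratio:

```python
import math

def weighted_sigmoid_cross_entropy(logit, label, pos_weight):
    """Binary cross-entropy on a single logit, with the positive term
    scaled by pos_weight (pos_weight > 1 penalizes missed 1s more)."""
    p = 1.0 / (1.0 + math.exp(-logit))  # sigmoid
    return -(pos_weight * label * math.log(p)
             + (1 - label) * math.log(1 - p))

# With pos_weight = 19 (about 95/5 negatives per positive), a confident
# mistake on a positive example costs far more than one on a negative:
loss_missed_pos = weighted_sigmoid_cross_entropy(-2.0, 1, pos_weight=19.0)
loss_missed_neg = weighted_sigmoid_cross_entropy(2.0, 0, pos_weight=19.0)
print(loss_missed_pos, loss_missed_neg)
```

The upshot is that the gradient pushes the network much harder to stop collapsing onto the majority class, without touching the dataset itself.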






answered Nov 18 '18 at 11:42 by Manrique