Predicting all zeros
I've developed a neural network for classification and I'm getting 0.93 accuracy, but the problem is that the model is predicting all zeros because of the distribution of the data.
How can I fix it? Should I switch from a neural network to another algorithm?
Thanks in advance.
Edit: I've just checked, and my model is predicting the same probability for every row.
The model is a NN with 5 layers, with tf.nn.relu6
as the activation function. The cost function is tf.nn.sigmoid_cross_entropy_with_logits.
To predict the values I use:
predicted = tf.nn.sigmoid(Z5)
correct_pred = tf.equal(tf.round(predicted), Y)
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))
EDIT 2
I have 'fixed' the class imbalance problem (undersampling the 0s and upsampling the 1s), but the net is still predicting the same value for every row.
I have tried changing the activation function to tanh or sigmoid, but then the network outputs NaNs.
python tensorflow machine-learning deep-learning
Please supply info on what model you are using, what the data is, and the code.
– Dinari
Nov 18 '18 at 7:50
Google "class imbalance"...
– desertnaut
Nov 18 '18 at 8:10
In practice I have seen two possible fixes for that (although I dislike both of them). The first is to throw away parts of the dataset so that all outcomes are equally likely. The other is to put some samples - namely those with a prediction outcome that does not occur too often - into the dataset multiple times. Both ways make all prediction results roughly equally likely in the training set. If someone knows another way I would be interested in it too.
– quant
Nov 18 '18 at 8:10
@quant class imbalance is a huge subtopic, and there are several other approaches, like creating artificial samples of the minority class; look for SMOTE & RUSBoost
– desertnaut
Nov 18 '18 at 8:16
asked Nov 18 '18 at 7:42 by A. Esquivias; edited Nov 18 '18 at 9:40
1 Answer
There are multiple solutions for imbalanced data. But first, accuracy is not a good metric for imbalanced data: if you had only 5 positives and 95 negatives, your accuracy would be 95% just from predicting all negatives. You should check sensitivity and specificity, or other metrics that work well with imbalanced data, such as the lift score.
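To make that concrete, here is a minimal pure-Python sketch (the labels below are made up for illustration) showing how a model that predicts all zeros gets high accuracy while its sensitivity is zero:

```python
# Toy example: 95 negatives, 5 positives; a model that predicts all zeros.
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100  # "predict all zeros"

# Confusion-matrix counts.
tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

accuracy = (tp + tn) / len(y_true)  # 0.95 -- looks great
sensitivity = tp / (tp + fn)        # 0.0  -- catches no positives at all
specificity = tn / (tn + fp)        # 1.0

print(accuracy, sensitivity, specificity)
```

Accuracy alone hides the failure; sensitivity (recall on the positive class) exposes it immediately.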
To train the model with imbalanced data, there are multiple solutions. One of them is to up-sample the minority class.
Up-sampling is the process of randomly duplicating observations from
the minority class in order to reinforce its signal.
You can upsample data with code like this:
import pandas as pd
from sklearn.utils import resample

# Separate majority and minority classes
df_majority = df[df.balance == 0]
df_minority = df[df.balance == 1]

# Upsample minority class
df_minority_upsampled = resample(df_minority,
                                 replace=True,      # sample with replacement
                                 n_samples=576,     # to match majority class
                                 random_state=123)  # reproducible results

# Combine majority class with upsampled minority class
df_upsampled = pd.concat([df_majority, df_minority_upsampled])

# Display new class counts
df_upsampled.balance.value_counts()
# 1    576
# 0    576
# Name: balance, dtype: int64
You can find more information and other solutions that are well explained here.
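If you prefer not to depend on pandas/sklearn, the same random oversampling with replacement can be sketched in plain Python (the class sizes and feature values here are illustrative, not from the question):

```python
import random

random.seed(123)  # reproducible results

# Illustrative imbalanced dataset: (features, label) pairs.
majority = [([i], 0) for i in range(576)]
minority = [([i], 1) for i in range(49)]

# Sample the minority class with replacement until it matches the majority.
minority_upsampled = [random.choice(minority) for _ in range(len(majority))]

balanced = majority + minority_upsampled

# Display new class counts.
counts = {0: 0, 1: 0}
for _, label in balanced:
    counts[label] += 1
print(counts)  # {0: 576, 1: 576}
```

Note that duplicating minority rows makes overfitting on those rows more likely, so evaluate on a held-out set that is resampled separately (or not at all).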
answered Nov 18 '18 at 11:42 by Manrique