Predicting all zeros
I've developed a neural network for classification and I'm getting 0.93 accuracy, but the problem is that the model is predicting all zeros because of the distribution of the data.
How can I fix it? Should I switch from a neural network to another algorithm?
Thanks in advance.
Edit: I've just checked, and my model is predicting the same probability for every row.
The model is a NN with 5 layers, with tf.nn.relu6
as the activation function. The cost function is tf.nn.sigmoid_cross_entropy_with_logits.
To predict the values I use:
predicted = tf.nn.sigmoid(Z5)
correct_pred = tf.equal(tf.round(predicted), Y)
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))
EDIT 2
I have 'fixed' the class imbalance problem (undersampling the 0s and upsampling the 1s), but the net is still predicting the same value for every row.
I have tried changing the activation function to tanh or sigmoid, but then the network outputs NaNs.
python tensorflow machine-learning deep-learning
Please supply info on what model you are using, what the data is, and the code.
– Dinari
Nov 18 '18 at 7:50
Google "class imbalance"...
– desertnaut
Nov 18 '18 at 8:10
In practice I have seen two possible fixes for that (although I dislike both of them). The first is to throw away parts of the dataset so that all outcomes are equally likely. The other is to put some samples - namely those with a prediction outcome that does not occur too often - into the dataset multiple times. Both ways make all prediction results roughly equally likely in the training set. If someone knows another way I would be interested in it too.
– quant
Nov 18 '18 at 8:10
@quant class imbalance is a huge subtopic, and there are several other approaches, like creating artificial samples of the minority class; look for SMOTE & RUSBoost
– desertnaut
Nov 18 '18 at 8:16
asked Nov 18 '18 at 7:42 by A. Esquivias; edited Nov 18 '18 at 9:40
1 Answer
There are multiple solutions for imbalanced data. But first, accuracy is not a good metric for imbalanced data: if you had only 5 positives and 95 negatives, your accuracy would be 95% just from predicting all negatives. You should check sensitivity and specificity, or other metrics that work well with imbalanced data, such as the lift score.
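To make that concrete, here is a minimal pure-Python sketch (the labels below are made up for illustration) showing how a model that predicts all zeros gets high accuracy while its sensitivity is zero:

```python
# Toy example: 95 negatives, 5 positives; a model that predicts all zeros.
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100  # "predict all zeros"

# Confusion-matrix counts.
tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

accuracy = (tp + tn) / len(y_true)  # 0.95 -- looks great
sensitivity = tp / (tp + fn)        # 0.0  -- catches no positives at all
specificity = tn / (tn + fp)        # 1.0

print(accuracy, sensitivity, specificity)
```

Accuracy alone hides the failure; sensitivity (recall on the positive class) exposes it immediately.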
To train the model with imbalanced data, there are multiple solutions. One of them is to up-sample the minority class.
Up-sampling is the process of randomly duplicating observations from
the minority class in order to reinforce its signal.
You can upsample data with code like this:
import pandas as pd
from sklearn.utils import resample

# Separate majority and minority classes
df_majority = df[df.balance == 0]
df_minority = df[df.balance == 1]

# Upsample minority class
df_minority_upsampled = resample(df_minority,
                                 replace=True,      # sample with replacement
                                 n_samples=576,     # to match majority class
                                 random_state=123)  # reproducible results

# Combine majority class with upsampled minority class
df_upsampled = pd.concat([df_majority, df_minority_upsampled])

# Display new class counts
df_upsampled.balance.value_counts()
# 1    576
# 0    576
# Name: balance, dtype: int64
You can find more information and other solutions that are well explained here.
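If you prefer not to depend on pandas/sklearn, the same random oversampling with replacement can be sketched in plain Python (the class sizes and feature values here are illustrative, not from the question):

```python
import random

random.seed(123)  # reproducible results

# Illustrative imbalanced dataset: (features, label) pairs.
majority = [([i], 0) for i in range(576)]
minority = [([i], 1) for i in range(49)]

# Sample the minority class with replacement until it matches the majority.
minority_upsampled = [random.choice(minority) for _ in range(len(majority))]

balanced = majority + minority_upsampled

# Display new class counts.
counts = {0: 0, 1: 0}
for _, label in balanced:
    counts[label] += 1
print(counts)  # {0: 576, 1: 576}
```

Note that duplicating minority rows makes overfitting on those rows more likely, so evaluate on a held-out set that is resampled separately (or not at all).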
answered Nov 18 '18 at 11:42 by Manrique