Is decision threshold a hyperparameter in logistic regression?

Predicted classes from (binary) logistic regression are determined by using a threshold on the class membership probabilities generated by the model. As I understand it, typically 0.5 is used by default.

But varying the threshold will change the predicted classifications. Does this mean the threshold is a hyperparameter? If so, why is it (for example) not possible to easily search over a grid of thresholds using scikit-learn's GridSearchCV (as you would do for the regularisation parameter C)?

asked Jan 31 at 17:17

Nick

886

machine-learning logistic scikit-learn hyperparameter

"As I understand it, typically 0.5 is used by default." Depends on the meaning of the word "typical". In practice, no one should be doing this.

– Matthew Drury
Jan 31 at 17:27

Very much related: Classification probability threshold

– Stephan Kolassa
Jan 31 at 18:17

Strictly you don't mean logistic regression, you mean using one logistic regressor with a threshold for binary classification (you could also train one regressor for each of the two classes, with a little seeded randomness or weighting to avoid them being linearly dependent).

– smci
Jan 31 at 19:53


2 Answers
The decision threshold creates a trade-off between the number of positives that you predict and the number of negatives that you predict -- because, tautologically, increasing the decision threshold will decrease the number of positives that you predict and increase the number of negatives that you predict.

The decision threshold is not a hyper-parameter in the sense of model tuning because it doesn't change the flexibility of the model.

The way you're thinking about the word "tune" in the context of the decision threshold is different from how hyper-parameters are tuned. Changing $C$ and other model hyper-parameters changes the model (e.g., the logistic regression coefficients will be different), while adjusting the threshold can only do two things: trade off TP for FN, and FP for TN. However, the model remains the same, because this doesn't change the coefficients. (The same is true for models which do not have coefficients, such as random forests: changing the threshold doesn't change anything about the trees.) So in a narrow sense, you're correct that finding the best trade-off among errors is "tuning," but you're wrong in thinking that changing the threshold is linked to other model hyper-parameters in a way that is optimized by GridSearchCV.
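To make the distinction concrete, here is a minimal sketch in plain Python. The intercept and coefficient are hypothetical values standing in for a fitted logistic regression, not anything taken from the question. The point is only that the decision rule sits on top of the model: sweeping the threshold changes the labels, while the coefficients are never touched.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical fitted model: one intercept and one coefficient (illustrative values).
INTERCEPT, COEF = -1.0, 2.0

def predict_proba(x):
    """Probability of the positive class for a single feature value x."""
    return sigmoid(INTERCEPT + COEF * x)

def classify(x, threshold):
    """Decision rule layered on top of the model; the model itself is unchanged."""
    return int(predict_proba(x) >= threshold)

xs = [-1.0, 0.0, 0.4, 0.6, 1.0, 2.0]
for threshold in (0.3, 0.5, 0.7):
    labels = [classify(x, threshold) for x in xs]
    # Raising the threshold can only turn predicted positives into negatives.
    print(threshold, labels, "positives:", sum(labels))
```

Changing $C$, by contrast, would require refitting and would generally produce different values for the intercept and coefficient.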

Stated another way, changing the decision threshold reflects a choice on your part about how many false positives and false negatives you are willing to accept. Consider the hypothetical case where you set the decision threshold to a completely implausible value like -1. All probabilities are non-negative, so with this threshold you will predict "positive" for every observation. From a certain perspective, this is great, because your false negative rate is 0.0. However, your false positive rate is also at its extreme of 1.0, so in that sense your choice of a threshold at -1 is terrible.

The ideal, of course, is to have a TPR of 1.0, an FPR of 0.0, and an FNR of 0.0. But this is usually impossible in real-world applications, so the question then becomes "how much FPR am I willing to accept for how much TPR?" This is the motivation for ROC curves.
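As a rough illustration of that trade-off, the sketch below sweeps a few thresholds over a small set of hypothetical labels and predicted probabilities (made up for this example) and reports the resulting FPR and TPR. Each threshold gives one point on the ROC curve, and the implausible threshold of -1 lands at the corner where both rates are 1.0.

```python
def roc_points(y_true, scores, thresholds):
    """Return (threshold, FPR, TPR) tuples, predicting positive when score >= threshold."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    points = []
    for t in thresholds:
        tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= t)
        fp = sum(1 for y, s in zip(y_true, scores) if y == 0 and s >= t)
        points.append((t, fp / n_neg, tp / n_pos))
    return points

# Hypothetical labels and predicted probabilities, for illustration only.
y_true = [0, 0, 1, 0, 1, 1, 0, 1]
scores = [0.1, 0.3, 0.35, 0.4, 0.6, 0.7, 0.8, 0.9]

for t, fpr, tpr in roc_points(y_true, scores, [-1.0, 0.25, 0.5, 0.75, 1.1]):
    print(f"threshold={t:5.2f}  FPR={fpr:.2f}  TPR={tpr:.2f}")
```

In practice a library routine such as scikit-learn's `roc_curve` would compute these points over all distinct scores; the hand-rolled loop here is only meant to show what is being traded.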

edited Feb 1 at 15:34

answered Jan 31 at 17:27

Sycorax

39.9k12102201

Thanks for the answer @Sycorax. You have almost convinced me. But can't we formalise the idea of "how much FPR am I willing to accept for how much TPR", e.g. by using a cost matrix? If we have a cost matrix, would it not be desirable to find the optimal threshold via tuning, as you would tune a hyperparameter? Or is there a better way to find the optimal threshold?

– Nick
Feb 1 at 8:32

The way you're using the word "tune" here is different from how hyper-parameters are tuned. Changing $C$ and other model hyper-parameters changes the model (e.g., the logistic regression coefficients will be different), while adjusting the threshold can only do two things: trade off TP for FN, and FP for TN (but the model remains the same -- same coefficients, etc.). You're right that you want to find the best trade-off among errors, but you're wrong that such tuning takes place inside GridSearchCV.

– Sycorax
Feb 1 at 13:49

@Sycorax Aren't the threshold and the intercept (bias term) doing basically the same thing? I.e. you can keep the threshold fixed at 0.5 but change the intercept accordingly; this will "change the model" (as per your last comment) but will have the identical effect in terms of binary predictions. Is this correct? If so, I am not sure the strict distinction between "changing the model" and "changing the decision rule" is so meaningful in this case.

– amoeba
Feb 1 at 16:16

@amoeba This is a thought-provoking remark. I'll have to consider it. I suppose your suggestion amounts to "keep the threshold at 0.5 and treat the intercept as a hyperparameter, which you tune." There's nothing mathematically to stop you from doing this, except the observation that the model no longer maximizes its likelihood. But achieving the MLE may not be a priority in some specific context.

– Sycorax
Feb 1 at 16:26
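The equivalence raised in the last two comments can be checked numerically. Because the sigmoid is monotone, a probability $p \ge t$ exactly when the linear predictor $z \ge \log\frac{t}{1-t}$, so subtracting $\log\frac{t}{1-t}$ from the intercept and thresholding at 0.5 should give the same labels. The sketch below assumes a hypothetical one-feature model with made-up coefficients.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def logit(p):
    return math.log(p / (1.0 - p))

# Hypothetical fitted intercept and coefficient (illustrative values only).
intercept, coef = -1.0, 2.0
threshold = 0.7

# Shift the intercept so that a 0.5 cutoff reproduces the 0.7 cutoff.
shifted_intercept = intercept - logit(threshold)

xs = [-1.0, 0.0, 0.4, 0.6, 1.0, 2.0]
by_threshold = [int(sigmoid(intercept + coef * x) >= threshold) for x in xs]
by_intercept = [int(sigmoid(shifted_intercept + coef * x) >= 0.5) for x in xs]
print(by_threshold, by_intercept)
```

The labels agree, but the probabilities produced with the shifted intercept no longer come from the maximum-likelihood fit, which is the caveat noted in the reply.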

> But varying the threshold will change the predicted classifications. Does this mean the threshold is a hyperparameter?

Yup, it does, sorta. It's a hyperparameter of your decision rule, but not of the underlying regression.

> If so, why is it (for example) not possible to easily search over a grid of thresholds using scikit-learn's GridSearchCV method (as you would do for the regularisation parameter C).

This is a design error in sklearn. The best practice for most classification scenarios is to fit the underlying model (which predicts probabilities) using some measure of the quality of these probabilities (like the log-loss in a logistic regression). Afterwards, a decision threshold on these probabilities should be tuned to optimize some business objective of your classification rule. The library should make it easy to optimize the decision threshold based on some measure of quality, but I don't believe it does that well.
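A minimal sketch of that second step, assuming the probabilities have already been produced by a fitted model on held-out data. The labels, probabilities, and costs below are hypothetical, and the helper names `expected_cost` and `best_threshold` are made up for this example rather than taken from any library.

```python
def expected_cost(y_true, probs, threshold, cost_fp, cost_fn):
    """Average misclassification cost when predicting positive for prob >= threshold."""
    total = 0.0
    for y, p in zip(y_true, probs):
        pred = int(p >= threshold)
        if pred == 1 and y == 0:
            total += cost_fp  # false positive
        elif pred == 0 and y == 1:
            total += cost_fn  # false negative
    return total / len(y_true)

def best_threshold(y_true, probs, candidates, cost_fp, cost_fn):
    """Return the candidate with the lowest average cost (the first one wins ties)."""
    return min(candidates, key=lambda t: expected_cost(y_true, probs, t, cost_fp, cost_fn))

# Hypothetical held-out labels and predicted probabilities.
y_val = [0, 0, 1, 0, 1, 1, 0, 1]
p_val = [0.1, 0.3, 0.35, 0.4, 0.6, 0.7, 0.8, 0.9]
candidates = [i / 20 for i in range(1, 20)]  # 0.05, 0.10, ..., 0.95

# Assumed business costs: a missed positive is five times as costly as a false alarm.
chosen = best_threshold(y_val, p_val, candidates, cost_fp=1.0, cost_fn=5.0)
print(chosen, expected_cost(y_val, p_val, chosen, cost_fp=1.0, cost_fn=5.0))
```

The model is fitted once and only the cutoff is searched, so the search should use held-out or cross-validated probabilities; choosing the threshold on the training data would tend to give an optimistic estimate of the cost.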

I think this is one of the places sklearn got it wrong. The library includes a method, predict, on all classification models that thresholds at 0.5. This method is useless, and I strongly advocate for not ever invoking it. It's unfortunate that sklearn is not encouraging a better workflow.

edited Feb 1 at 16:52

answered Jan 31 at 17:28

Matthew Drury

25.6k261102

I also share your skepticism of the predict method's default choice of 0.5 as a cutoff, but GridSearchCV accepts scorer objects which can tune models with respect to out-of-sample cross-entropy loss. Am I missing your point?

– Sycorax
Jan 31 at 17:32

Right, agreed that is best practice, but it doesn't encourage users to tune decision thresholds.

– Matthew Drury
Jan 31 at 17:32

Gotcha. I understand what you mean!

– Sycorax
Jan 31 at 17:33

@Sycorax tried to edit to clarify!

– Matthew Drury
Jan 31 at 17:35

