To overfit, or not to overfit, that's the question











1 vote












I hope this is not a stupid question. Say I have a data-generating process that is quite stationary, and I care less about arriving at generalizable knowledge than about accurate predictions. Would it be acceptable in this scenario to overfit a powerful model (e.g. a random forest, i.e. a fully saturated-ish model) by refreshing it daily with all retrospective data and using it to predict the next day's dependent variable?


































  • You can do that, but of what value are overfitted and potentially false predictions?
    – Todd D
    Nov 14 at 17:17










  • But the process is fairly stationary, so new data should not be unexpected and thus should not lead to massively 'false' predictions.
    – cs0815
    Nov 14 at 17:20












  • Related stats.stackexchange.com/q/249493/35989
    – Tim
    Nov 14 at 17:58










  • @cs0815 You have accepted a rather ordinary and simple answer very quickly. I posted mine more as a temporary answer, in a process where I was hoping that you would give some more information about your question. What is the deal with the 'refreshing it daily'? That would be essential to make this question not a duplicate with just a fancy title.
    – Martijn Weterings
    Nov 14 at 17:58






  • Frequency of model updates and overfitting are separate concerns. If the model doesn't overfit, it can benefit from consuming new data frequently, provided that the new data contain information and not only noise. Overfitting is fitting to the noise; if you somehow prevent it, you'll be fitting to the new daily information, which is good.
    – Aksakal
    Nov 14 at 21:24















regression multiple-regression modeling model














edited Nov 14 at 17:20 by Penguin_Knight










asked Nov 14 at 17:03 by cs0815






















3 Answers

















3 votes













Ultimately it is a balance that you need to test (e.g. with cross-validation).

  • If the model is too conservative (too simple), it won't capture the underlying relationship, and the predictions will be bad.

  • If the model is too liberal (too flexible), it will capture too much of the noise in addition to the underlying relationship, and the predictions will be bad.

It can be that a model slightly more conservative than the 'real' model works better (e.g. the true model is a polynomial of order 5 and the best model to fit is of order 4), but this depends entirely on the specific circumstances and needs to be tested case by case. In general, however, it is better to add a little bias, because done correctly it reduces the variance.
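To make the "test it with cross-validation" advice concrete, here is a minimal sketch using only the Python standard library. It swaps the polynomial-order example for k-nearest-neighbour regression, where a small k plays the role of the "liberal" model and a large k the "conservative" one. The data (a sine curve plus noise) and the helper names `knn_predict` and `cv_mse` are assumptions made up for illustration, not anything from the question.

```python
import math
import random


def knn_predict(train, x, k):
    """Average the targets of the k training points nearest to x."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / len(nearest)


def cv_mse(data, k, n_folds=5):
    """Mean squared error of k-NN regression under n_folds-fold cross-validation."""
    total = 0.0
    for fold in range(n_folds):
        test = data[fold::n_folds]  # every n_folds-th point is held out
        train = [p for i, p in enumerate(data) if i % n_folds != fold]
        total += sum((knn_predict(train, x, k) - y) ** 2 for x, y in test)
    return total / len(data)


rng = random.Random(0)
# Hypothetical data: a smooth signal plus noise.
data = [(x, math.sin(x) + rng.gauss(0.0, 0.5))
        for x in (rng.uniform(0.0, 2 * math.pi) for _ in range(200))]

# Small k = flexible ("liberal"); large k = rigid ("conservative").
cv_errors = {k: cv_mse(data, k) for k in (1, 5, 15, 40, 160)}
best_k = min(cv_errors, key=cv_errors.get)

for k, err in cv_errors.items():
    print(f"k={k:3d}  CV MSE={err:.3f}")
print("selected k:", best_k)
```

Under these assumptions both extremes (k=1, which chases the noise, and k=160, which just predicts the training mean) should score worse than an intermediate k. Which intermediate value wins depends on the data, which is the point of testing it.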





In case your question is about adding new data to the data that you used to train your model, then I would guess that this will rarely be a problem. In most cases adding more data should make the model better, unless the fit cannot improve with more data (e.g. when the underlying process is not constant in time, but then the predictions are not going to be good anyway).
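If the question is about the daily refresh, one way to check it is walk-forward evaluation: on each day, refit on everything seen so far, predict the next day, and only then add that day to the history. The sketch below is an illustration under invented assumptions (a stationary process y = 2x + noise, and the hypothetical helpers `fit_line`, `fit_memorizer` and `walk_forward_mse`). It compares a simple least-squares line with a 1-nearest-neighbour "memorizer" that is refreshed the same way.

```python
import random


def fit_line(history):
    """Ordinary least squares fit of y = a + b*x; returns a predictor."""
    n = len(history)
    mean_x = sum(x for x, _ in history) / n
    mean_y = sum(y for _, y in history) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in history)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in history)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return lambda x: a + b * x


def fit_memorizer(history):
    """1-nearest-neighbour: return the target of the closest past x."""
    return lambda x: min(history, key=lambda p: abs(p[0] - x))[1]


def walk_forward_mse(days, fit, warmup=20):
    """Refit on all past days each day, then score the next day's prediction."""
    errors = []
    for t in range(warmup, len(days)):
        model = fit(days[:t])  # only data available before day t
        x, y = days[t]
        errors.append((model(x) - y) ** 2)
    return sum(errors) / len(errors)


rng = random.Random(1)
# Hypothetical stationary process: y = 2x + noise, one observation per day.
days = [(x, 2.0 * x + rng.gauss(0.0, 1.0))
        for x in (rng.uniform(0.0, 10.0) for _ in range(300))]

mse_line = walk_forward_mse(days, fit_line)
mse_memorizer = walk_forward_mse(days, fit_memorizer)
print(f"daily-refit line:      MSE={mse_line:.2f}")
print(f"daily-refit memorizer: MSE={mse_memorizer:.2f}")
```

In this toy setup the refreshed memorizer keeps paying for the noise it copies from its nearest past observation, even though the process never changes. Refreshing daily adds data but does not by itself remove that cost.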





























  • Thanks, I think I will still use CV to optimize hyperparameters, but refresh daily with all data.
    – cs0815
    Nov 14 at 17:21










  • Is your question about updating or about overfitting?
    – Martijn Weterings
    Nov 14 at 17:23


















3 votes













We say that a model overfits when it performs well on the training data but not on unseen data. This is not a statement about the data-generating process, but about the sample that you use for training versus any other sample that could be drawn. So if a model has good predictive performance on unseen data, it does not overfit.
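That definition can be checked directly by comparing the error on the training sample with the error on a fresh sample from the same process. Below is a rough standard-library sketch. The data-generating function `sample` and the k-nearest-neighbour model are assumptions chosen for illustration, with k=1 standing in for a model that memorizes its training data.

```python
import math
import random


def knn_predict(train, x, k):
    """Average the targets of the k training points nearest to x."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / len(nearest)


def mse(train, points, k):
    """Mean squared error of a k-NN model fitted on `train`, scored on `points`."""
    return sum((knn_predict(train, x, k) - y) ** 2 for x, y in points) / len(points)


rng = random.Random(2)


def sample(n):
    """Draw n points from one hypothetical stationary process."""
    return [(x, math.sin(x) + rng.gauss(0.0, 0.5))
            for x in (rng.uniform(0.0, 2 * math.pi) for _ in range(n))]


train = sample(150)
unseen = sample(150)  # same process, different sample

train_mse, unseen_mse, gap = {}, {}, {}
for k in (1, 15):
    train_mse[k] = mse(train, train, k)
    unseen_mse[k] = mse(train, unseen, k)
    gap[k] = unseen_mse[k] - train_mse[k]
    print(f"k={k:2d}  train MSE={train_mse[k]:.3f}  "
          f"unseen MSE={unseen_mse[k]:.3f}  gap={gap[k]:.3f}")
```

With k=1 every training point is its own nearest neighbour, so the training error is exactly zero while the unseen-data error is not. That gap appears even though both samples come from the same stationary process.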



Overfitting would not be a problem if you didn't want to make predictions on unseen data or draw any conclusions about it from the model. You are right that if you could be perfectly sure that future data would be identical to your training sample, it wouldn't matter, but I can't imagine any scenario where you could be sure of that. Notice that even if you had a perfectly representative sample, or the population data, the phenomenon of interest could still change over time, and the past data would no longer be relevant.



See also the "Which model is better: One that overfits or one that underfits?" thread.



























  • Thanks. Sorry, I would disagree a bit. If the training sample is representative of the data-generating process and the unseen data are as well, then memorizing the data (i.e. overfitting) should not be a major issue. I guess the more dimensions there are, the more representative samples are needed ... I also said that I refit the model regularly, so even a change in the data-generating process should be picked up?
    – cs0815
    Nov 14 at 20:15








  • @cs0815 If you re-fit the model, you seem to be assuming that the data can change over time, don't you? If so, then every time you train the model you are inevitably using historical data to predict the future, so something could have changed in between. If that's not the case, don't re-fit your model: train it once and don't monitor the performance, as you'd be wasting your time.
    – Tim
    Nov 14 at 20:24




















-1 votes













Overfitting is bad because it means the model you learned from your training data may not work well for new data points. You can imagine a perfectly overfit model that simply memorizes each training point and returns the appropriate output. When confronted with data that it wasn't trained on, it outputs a random number. You could train a model like this on a ton of retrospective data, but unless you get identical data tomorrow, you'll do no better than random guessing. I suppose an approach like this could work with a limited and discrete input space, but you don't really need machine learning models for that anyway.
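The thought experiment above can be written down almost literally. The sketch below is a hypothetical `Memorizer` class (a lookup table with a random fallback) applied to an invented noise-free relationship y = 3x + 1, so every name and number in it is an assumption for illustration. It contrasts continuous inputs, where tomorrow's x essentially never matches a stored one, with a small discrete input space where every possible x has been seen.

```python
import random


class Memorizer:
    """Lookup table: exact recall for seen inputs, a random guess otherwise."""

    def __init__(self, seed=0):
        self.table = {}
        self.rng = random.Random(seed)

    def fit(self, pairs):
        self.table = dict(pairs)
        ys = list(self.table.values())
        self.low, self.high = min(ys), max(ys)
        return self

    def predict(self, x):
        if x in self.table:
            return self.table[x]
        # Never seen this input: guess within the range of stored outputs.
        return self.rng.uniform(self.low, self.high)


def target(x):
    return 3 * x + 1  # hypothetical noise-free relationship


def mse(model, xs):
    return sum((model.predict(x) - target(x)) ** 2 for x in xs) / len(xs)


rng = random.Random(3)

# Continuous inputs: new x values almost never repeat a stored x.
train_xs = [rng.uniform(0, 10) for _ in range(200)]
new_xs = [rng.uniform(0, 10) for _ in range(200)]
continuous = Memorizer().fit((x, target(x)) for x in train_xs)
hits = sum(x in continuous.table for x in new_xs)
continuous_mse = mse(continuous, new_xs)

# Small discrete input space: every possible x has been seen.
discrete = Memorizer().fit((x, target(x)) for x in range(10))
discrete_mse = mse(discrete, [rng.randrange(10) for _ in range(200)])

print(f"continuous inputs: {hits} exact matches, MSE={continuous_mse:.1f}")
print(f"discrete inputs:   MSE={discrete_mse:.1f}")
```

Real overfit models such as a fully grown random forest interpolate between stored points rather than guessing at random, so this is the extreme case rather than a description of what a forest does.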




















































First answer above: answered Nov 14 at 17:18 by Martijn Weterings, edited Nov 14 at 17:45












Second answer above: answered Nov 14 at 18:24 by Tim












    • Thanks. Sorry I would disagree a bit. If the training sample is representative of the data generation process and the unseen data are as well, then memorizing data (i.e. over-fitting) should not be a major issue. I guess the more dimensions there are the more representative samples there have to be ... I also said, that I refit the model regularly, so even a change in the data generation process should be picked up?
      – cs0815
      Nov 14 at 20:15








    • 1




      @cs0815 If you re-fit the model, you seem to be assuming that the data can change over time, don't you? If so, then inevitably every time you train the model on historical data, to predict the future. So something could have changed. If that's not the case, don't re-fit your model, train it once and don't monitor the performance, as you're waisting your time.
      – Tim
      Nov 14 at 20:24




















    • Thanks. Sorry I would disagree a bit. If the training sample is representative of the data generation process and the unseen data are as well, then memorizing data (i.e. over-fitting) should not be a major issue. I guess the more dimensions there are the more representative samples there have to be ... I also said, that I refit the model regularly, so even a change in the data generation process should be picked up?
      – cs0815
      Nov 14 at 20:15








    • 1




      @cs0815 If you re-fit the model, you seem to be assuming that the data can change over time, don't you? If so, then inevitably every time you train the model on historical data, to predict the future. So something could have changed. If that's not the case, don't re-fit your model, train it once and don't monitor the performance, as you're waisting your time.
      – Tim
      Nov 14 at 20:24






























    up vote
    -1
    down vote













    Overfitting is bad because it means the model you learned from your training data may not work well for new data points. You can imagine a perfectly overfit model that simply memorizes each training point and returns the appropriate output. When confronted with an input that it wasn't trained on, it outputs a random number. You could train a model like this on a ton of retrospective data, but unless tomorrow's inputs are identical to ones it has already seen, you'll do no better than random. I suppose an approach like this could work with a limited and discrete input space, but you don't really need machine learning models for that anyway.
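
    To make the memorizing model concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the class name `MemorizingModel`, the helper functions, and the toy process y = 2x + noise are hypothetical and are not taken from the question. It is an extreme caricature of overfitting, not a description of how a random forest behaves.

```python
import random


class MemorizingModel:
    """Hypothetical 'perfectly overfit' model: a lookup table over the training set."""

    def __init__(self, seed=0):
        self._table = {}
        self._rng = random.Random(seed)

    def fit(self, X, y):
        # Store every training input together with its exact output.
        for features, target in zip(X, y):
            self._table[tuple(features)] = target
        return self

    def predict(self, X):
        # Seen inputs get the memorized output; unseen inputs get a random
        # number drawn from the range of outputs observed during training.
        lo = min(self._table.values())
        hi = max(self._table.values())
        predictions = []
        for features in X:
            key = tuple(features)
            if key in self._table:
                predictions.append(self._table[key])
            else:
                predictions.append(self._rng.uniform(lo, hi))
        return predictions


def mean_abs_error(pred, truth):
    return sum(abs(p - t) for p, t in zip(pred, truth)) / len(truth)


def make_data(n, rng):
    # Assumed toy process: y = 2x + small Gaussian noise. x is continuous,
    # so an exact repeat of a training input is practically impossible.
    X = [(rng.uniform(0, 10),) for _ in range(n)]
    y = [2 * x[0] + rng.gauss(0, 0.1) for x in X]
    return X, y


rng = random.Random(42)
X_train, y_train = make_data(200, rng)
X_test, y_test = make_data(200, rng)

model = MemorizingModel(seed=1).fit(X_train, y_train)
train_mae = mean_abs_error(model.predict(X_train), y_train)
test_mae = mean_abs_error(model.predict(X_test), y_test)

print("train MAE:", train_mae)  # zero: every training point is looked up
print("test MAE:", test_mae)    # large: new inputs only get random guesses
```

    With a continuous input, tomorrow's points never match a stored key, so refreshing the table daily does not help. With a small discrete input space the table would eventually cover every case, which is the exception mentioned above.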
















        answered Nov 14 at 21:01









        Nuclear Wang

        2,482819



