To overfit, or not to overfit, that's the question
I hope this is not a stupid question. Let us say I have a data-generating process that is quite stationary, and I care less about arriving at generalizable knowledge than about accurate predictions. Would it be acceptable in this scenario to overfit a powerful model (e.g. a random forest, i.e. a fully saturated-ish model) by refreshing it daily using all retrospective data and using it to predict the next day’s dependent variable?
regression multiple-regression modeling model
edited Nov 14 at 17:20
Penguin_Knight
asked Nov 14 at 17:03
cs0815
You can do that, but of what value are overfitted and potentially false predictions?
– Todd D
Nov 14 at 17:17
But the process is fairly stationary, so new data should not be unexpected and thus should not lead to massively 'false' predictions.
– cs0815
Nov 14 at 17:20
Related: stats.stackexchange.com/q/249493/35989
– Tim♦
Nov 14 at 17:58
@cs0815 You have accepted a rather ordinary and simple answer very quickly. I posted it more as a temporary answer, in a process where I was hoping that you were going to give some more information about your question. What is the deal with the 'refreshing it daily'? That would be essential to make this question not a duplicate with just a fancy title.
– Martijn Weterings
Nov 14 at 17:58
Frequency of model updates and overfitting are separate concerns. If the model doesn't overfit, then it can benefit from consuming new data frequently, provided that the new data contains information and not only noise. Overfitting is fitting to the noise, and if you somehow prevent it, then you'll be fitting to new daily information, which is good.
– Aksakal
Nov 14 at 21:24
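One way to check the daily-refresh idea against the concerns raised in these comments is a walk-forward backtest: each day, refit on all data seen so far and score only the next, still unseen, day. The sketch below is a minimal, standard-library-only illustration on simulated data. The linear process, the noise level, and the helper names (`knn_predict`, `walk_forward_mse`) are assumptions made for the example, and a k-nearest-neighbour average stands in for the random forest from the question.

```python
import random
import statistics


def knn_predict(history, x, k):
    """Predict y at x as the mean target of the k nearest past observations."""
    nearest = sorted(history, key=lambda point: abs(point[0] - x))[:k]
    return statistics.fmean([y for _, y in nearest])


def walk_forward_mse(series, k, first_day):
    """Refit on all data before each day, predict that day, average the squared errors."""
    errors = []
    for day in range(first_day, len(series)):
        history = series[:day]      # only data available before `day`
        x_new, y_new = series[day]  # the next day's observation
        errors.append((knn_predict(history, x_new, k) - y_new) ** 2)
    return statistics.fmean(errors)


# Simulated stationary process (an assumption for illustration): y = 2x + noise.
rng = random.Random(42)
series = []
for _ in range(300):
    x = rng.uniform(0.0, 1.0)
    series.append((x, 2.0 * x + rng.gauss(0.0, 0.5)))

mse_memorizer = walk_forward_mse(series, k=1, first_day=50)  # copies the nearest past point
mse_smoothed = walk_forward_mse(series, k=10, first_day=50)  # averages over 10 neighbours

print(f"next-day MSE, k=1 (memorizes noise): {mse_memorizer:.3f}")
print(f"next-day MSE, k=10 (smoothed):       {mse_smoothed:.3f}")
```

Both models see exactly the same daily refresh here, so any difference in next-day error comes from how much noise each one fits rather than from the update schedule.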
3 Answers
Ultimately it is a balance that you need to test (e.g. with cross-validation).
- If you are too conservative, then you won't capture the underlying model and the predictions will be bad.
- If you are too liberal, then you will capture too much of the noise (in addition to the model) and the predictions will be bad.
It can be that a model slightly more conservative than the 'real' model works better (e.g. the true model is a polynomial of order 5 and the optimal model to fit is of order 4), but this depends entirely on the specific circumstances and needs to be tested on a case-by-case basis. In general, however, it is better to add a little bias (it will reduce the variability, if done correctly).
If your question is about adding new data to the data that you used to train your model, then I would guess that this is rarely going to be a problem. In most cases adding more data should make the model better, unless the model fit is such that it is not going to improve with more data (e.g. when the model is not constant in time, but then the predictions are not going to be good anyway).
edited Nov 14 at 17:45
answered Nov 14 at 17:18
Martijn Weterings
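The trade-off described in this answer is commonly written as the bias-variance decomposition of the expected squared prediction error. As a sketch, assume (notation introduced here, not in the answer) that $y = f(x) + \varepsilon$ with $\mathbb{E}[\varepsilon] = 0$ and $\operatorname{Var}(\varepsilon) = \sigma^2$, that $\hat f$ is fitted on a random training sample, and that $\varepsilon$ is independent of that sample. Then, at a fixed input $x$:

$$\mathbb{E}\big[(y - \hat f(x))^2\big] = \big(\mathbb{E}[\hat f(x)] - f(x)\big)^2 + \operatorname{Var}\big(\hat f(x)\big) + \sigma^2$$

The first term is the squared bias, the second is the variance of the fitted model across training samples, and the third is irreducible noise. A slightly too simple model raises the first term and lowers the second, and the sum can end up smaller, which is the sense in which a little bias may help.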
Thanks. I think I will still use CV to optimize hyperparameters, but refresh daily with all data.
– cs0815
Nov 14 at 17:21
Is your question about updating or about overfitting?
– Martijn Weterings
Nov 14 at 17:23
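The cross-validation mentioned in the answer and in the comment above can be sketched as follows. This is a minimal, standard-library-only illustration, not the method of anyone in this thread: the model is a piecewise-constant fit (one mean per equal-width bin), the number of bins plays the role of model complexity, and the simulated sine process, the noise level, and all function names are assumptions made for the example.

```python
import math
import random
import statistics

X_MAX = 6.0  # inputs are simulated on [0, X_MAX]


def fit_binned_means(train, n_bins):
    """Fit a piecewise-constant model: one mean per equal-width bin of x."""
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for x, y in train:
        b = min(int(x / X_MAX * n_bins), n_bins - 1)
        sums[b] += y
        counts[b] += 1
    overall = statistics.fmean([y for _, y in train])
    # Empty bins fall back to the overall training mean.
    return [s / c if c else overall for s, c in zip(sums, counts)]


def predict(bin_means, x):
    """Return the fitted mean of the bin that x falls into."""
    n_bins = len(bin_means)
    return bin_means[min(int(x / X_MAX * n_bins), n_bins - 1)]


def cv_mse(data, n_bins, n_folds=5):
    """Mean squared error under n_folds-fold cross-validation."""
    errors = []
    for fold in range(n_folds):
        test = data[fold::n_folds]
        train = [p for i, p in enumerate(data) if i % n_folds != fold]
        model = fit_binned_means(train, n_bins)
        errors.extend((predict(model, x) - y) ** 2 for x, y in test)
    return statistics.fmean(errors)


# Simulated data (an assumption for illustration): y = sin(x) + noise.
rng = random.Random(0)
data = []
for _ in range(200):
    x = rng.uniform(0.0, X_MAX)
    data.append((x, math.sin(x) + rng.gauss(0.0, 0.5)))

# 1 bin is the most conservative model (a constant); 128 bins is the most liberal.
scores = {n_bins: cv_mse(data, n_bins) for n_bins in (1, 2, 4, 8, 16, 32, 64, 128)}
best_bins = min(scores, key=scores.get)

for n_bins, mse in scores.items():
    print(f"{n_bins:>3} bins: CV MSE = {mse:.3f}")
print("selected number of bins:", best_bins)
```

Once a complexity has been selected this way, the model can be refit on all available data with that setting, which matches the plan of tuning by CV and then refreshing with everything.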
We say that a model overfits when it performs well on the training data but not on unseen data. It is not a statement about the data-generating process, but about the sample that you use for training versus any other sample that could be drawn. So if a model has good predictive performance on unseen data, it does not overfit.
Overfitting would not be a problem if you didn't want to make predictions on unseen data and didn't want to draw any conclusions about it from the model. You are right that if you could be perfectly sure that future data would be identical to your training sample, then it wouldn't matter, but I can't imagine any scenario where you could be sure of that. Notice that even if you had a perfectly representative sample, or population data, the phenomenon of interest could still change over time, and the past data would no longer be relevant.
See also the thread "Which model is better: One that overfits or one that underfits?"
answered Nov 14 at 18:24
Tim♦
Thanks. Sorry, I would disagree a bit. If the training sample is representative of the data-generating process and the unseen data are as well, then memorizing data (i.e. overfitting) should not be a major issue. I guess the more dimensions there are, the more representative samples there have to be ... I also said that I refit the model regularly, so even a change in the data-generating process should be picked up?
– cs0815
Nov 14 at 20:15
@cs0815 If you re-fit the model, you seem to be assuming that the data can change over time, don't you? If so, then every time you inevitably train the model on historical data to predict the future, so something could have changed in between. If that's not the case, don't re-fit your model: train it once and don't monitor the performance, as you're wasting your time.
– Tim♦
Nov 14 at 20:24
Overfitting is bad because it means the model you learned from your training data may not work well for new data points. You can imagine a perfectly overfit model that simply memorizes each training point and returns the appropriate output. When confronted with data that it wasn't trained on, it outputs a random number. You could train a model like this on a ton of retrospective data, but unless you get identical data tomorrow, you'll do no better than random. I suppose an approach like this could work with a limited and discrete input space, but you don't really need machine learning models for that anyway.
answered Nov 14 at 21:01
Nuclear Wang
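The memorizing model described in this answer can be sketched as a plain lookup table, which also makes the point about discrete input spaces concrete. This is a hypothetical illustration using only the standard library: the class `LookupModel`, its fallback to the training mean for unseen inputs (instead of a random number), and the simulated data are all assumptions made for the example.

```python
import random
import statistics


class LookupModel:
    """Memorize the mean target for each exact input seen in training."""

    def fit(self, xs, ys):
        grouped = {}
        for x, y in zip(xs, ys):
            grouped.setdefault(x, []).append(y)
        self.table = {x: statistics.fmean(v) for x, v in grouped.items()}
        # Hypothetical fallback for unseen inputs: the overall training mean.
        self.default = statistics.fmean(ys)
        return self

    def predict(self, x):
        return self.table.get(x, self.default)

    def coverage(self, xs):
        """Fraction of inputs that appeared exactly in the training data."""
        return sum(x in self.table for x in xs) / len(xs)


rng = random.Random(7)

# Limited, discrete input space: ten possible values.
discrete_train = [rng.randrange(10) for _ in range(500)]
discrete_test = [rng.randrange(10) for _ in range(200)]
discrete_model = LookupModel().fit(
    discrete_train, [x + rng.gauss(0.0, 0.5) for x in discrete_train]
)

# Continuous input space: exact repeats are practically impossible.
continuous_train = [rng.random() for _ in range(500)]
continuous_test = [rng.random() for _ in range(200)]
continuous_model = LookupModel().fit(
    continuous_train, [x + rng.gauss(0.0, 0.5) for x in continuous_train]
)

print("share of test inputs seen before (discrete):  ", discrete_model.coverage(discrete_test))
print("share of test inputs seen before (continuous):", continuous_model.coverage(continuous_test))
```

With only ten possible input values, every test input is expected to have been seen in training, so the table has something to return. With continuous inputs the table is almost never hit, and every prediction comes from the fallback.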