Dave Harris

No, it is not true. Bayesian methods can certainly overfit the data. There are a couple of things that make Bayesian methods more robust against overfitting, but you can make them more fragile as well.

The combinatoric nature of Bayesian hypotheses, rather than the binary hypotheses of null hypothesis methods, allows for multiple comparisons when someone lacks the "true" model. A Bayesian posterior effectively penalizes an increase in model structure, such as adding variables, while rewarding improvements in fit. The penalties and gains are not optimizations, as would be the case in non-Bayesian methods, but shifts in probability coming from the new information.
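
As a sketch of where that built-in penalty comes from (standard Bayesian model comparison, nothing specific to this answer): the posterior probability of a model $M_k$ weights its prior by a marginal likelihood that integrates over the model's entire parameter space, so added structure only pays off if the improved fit outweighs the prior mass spread across the extra parameters,

$$\Pr(M_k \mid y)\;\propto\;\Pr(M_k)\int \Pr(y \mid \theta_k, M_k)\,\Pr(\theta_k \mid M_k)\,d\theta_k .$$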

While this generally gives a more robust methodology, there is an important constraint: you must use proper prior distributions. Although there is a tendency to mimic Frequentist methods by using flat priors, this does not assure a proper solution. There are articles on overfitting in Bayesian methods, and it appears to me that the sin is usually trying to be "fair" to non-Bayesian methods by starting with strictly flat priors. The difficulty is that the prior is important in normalizing the likelihood.
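
As a minimal illustration of what a proper, informative prior buys you (a toy conjugate-normal example I am adding here, with made-up numbers): with a flat prior the posterior mean is just the sample mean, while a proper prior shrinks the estimate and limits how far a small sample can drag it.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup (my own numbers): normal data with known variance sigma^2 and a
# conjugate N(mu0, tau0^2) prior on the mean.
sigma, n = 1.0, 5
x = rng.normal(0.0, sigma, size=n)
xbar = x.mean()

def posterior_mean(xbar, n, sigma, mu0, tau0):
    """Posterior mean of mu under a N(mu0, tau0^2) prior and N(mu, sigma^2) data."""
    precision = 1 / tau0**2 + n / sigma**2
    return (mu0 / tau0**2 + n * xbar / sigma**2) / precision

print("sample mean (flat-prior limit):", xbar)
print("informative prior N(0, 0.5^2) :", posterior_mean(xbar, n, sigma, 0.0, 0.5))
print("nearly flat prior N(0, 100^2) :", posterior_mean(xbar, n, sigma, 0.0, 100.0))
```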

Bayesian models are intrinsically optimal in Wald's admissibility sense of the word, but there is a hidden bogeyman in there. Wald assumes the prior is your true prior, not some prior you are using so that editors won't ding you for putting too much information in it. Bayesian models are also not optimal in the same sense that Frequentist models are. Frequentist methods begin with the optimization of minimizing the variance while remaining unbiased.
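
For readers who want the formal version of the admissibility notion invoked here: with risk defined as expected loss, an estimator is admissible when no competitor does at least as well for every parameter value and strictly better for some,

$$R(\theta,\delta)=\operatorname{E}_{\theta}\big[L(\theta,\delta(X))\big],\qquad \delta\ \text{is admissible if no}\ \delta'\ \text{has}\ R(\theta,\delta')\le R(\theta,\delta)\ \text{for all}\ \theta\ \text{with strict inequality for some}\ \theta .$$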

This is a costly optimization in that it discards information, and it is not intrinsically admissible in the Wald sense, though it frequently is admissible. So Frequentist models provide an optimal fit to the data, given unbiasedness. Bayesian models are neither unbiased nor optimal fits to the data. That is the trade you are making to minimize overfitting.

Bayesian estimators are intrinsically biased estimators, unless special steps are taken to make them unbiased, and they are usually a worse fit to the data. Their virtue is that they never use less information than an alternative method to find the "true model," and this additional information makes Bayesian estimators never more risky than alternative methods, particularly when working out of sample. That said, there will always exist a sample that could have been randomly drawn that would systematically "deceive" the Bayesian method.
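
A quick simulation sketch of that trade (again my own toy setup, not something from the original answer): the shrunken posterior mean is biased, yet when the prior is roughly right its squared-error risk beats the unbiased sample mean at small $n$; move the prior far from the truth and the comparison can flip.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy risk comparison (illustrative assumptions): repeat a small-sample experiment
# many times and compare the mean squared error of the unbiased sample mean with a
# posterior mean shrunk toward a prior guess mu0.
true_mu, sigma, n, reps = 0.3, 1.0, 5, 50_000
mu0, tau0 = 0.0, 0.5                                # prior centred near, not at, the truth

x = rng.normal(true_mu, sigma, size=(reps, n))
xbar = x.mean(axis=1)

w = (n / sigma**2) / (n / sigma**2 + 1 / tau0**2)   # weight given to the data
post_mean = w * xbar + (1 - w) * mu0                # biased toward mu0

print("MSE of the sample mean   :", np.mean((xbar - true_mu) ** 2))
print("MSE of the posterior mean:", np.mean((post_mean - true_mu) ** 2))
```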

As to the second part of your question, if you were to analyze a single sample, the posterior would be forever altered in all its parts and would not revert to the prior unless there were a second sample that exactly cancelled out all the information in the first sample. At least theoretically this is true. In practice, if the prior is sufficiently informative and the observation sufficiently uninformative, then the impact could be so small that a computer could not measure the difference because of the limit on the number of significant digits. It is possible for an effect to be too small for a computer to register as a change in the posterior.
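
A tiny numerical illustration of that last point (a hypothetical Beta-Binomial example I am adding): the posterior mean genuinely moves, but by less than the spacing between adjacent double-precision numbers near 0.5, so the stored value does not change.

```python
from fractions import Fraction

# Hypothetical illustration: an extremely informative Beta(a, a) prior on a
# probability, updated with a single success (standard Beta-Binomial conjugacy).
a = 10**16
prior_mean = Fraction(a, 2 * a)          # exactly 1/2
post_mean = Fraction(a + 1, 2 * a + 1)   # strictly greater than 1/2

print("exact shift in the mean:", float(post_mean - prior_mean))        # about 2.5e-17
print("as float64, prior then posterior:", a / (2 * a), (a + 1) / (2 * a + 1))
# Both float64 values print 0.5: the update is real but below machine resolution.
```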

So the first answer is "yes," you can overfit a sample using a Bayesian method, particularly if you have a small sample size and improper priors. The second answer is "no," Bayes' theorem never forgets the impact of prior data, though the effect could be so small that you miss it computationally.
