All Questions
Tagged with prior regularization · 24 questions
3 votes · 1 answer · 120 views
Parameter distribution of $\theta$ from a rectangular matrix multiplication $C\theta$
I am struggling to see where this problem fits, i.e., which topics it relates to, so I am unable to find the right literature. I want to use some particular information as a prior to a ...
0 votes · 0 answers · 21 views
Rationale for deriving the corresponding prior from a regularizer by taking the exponential of the negative regularizer
In equation (5.112) of textbook "Pattern Recognition and Machine Learning" by Christopher M. Bishop, the simple regularizer takes the form $\frac{\lambda}{2}{\bf w}^T{\bf w}$. The author ...
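For the quadratic regularizer the step Bishop takes is direct: exponentiating the negative regularizer produces a Gaussian kernel, so the implied prior is a zero-mean Gaussian with precision $\lambda$. A sketch of that correspondence (not a quote from the book):
$$p(\mathbf{w}) \propto \exp\!\left(-\frac{\lambda}{2}\mathbf{w}^T\mathbf{w}\right) \quad\Longleftrightarrow\quad \mathbf{w} \sim \mathcal{N}(\mathbf{0}, \lambda^{-1}I).$$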
0 votes · 0 answers · 13 views
Beta distribution equivalence with two redundant parameters [duplicate]
Context: In factor graphs on discrete variables, the parameters are contained in factors, each associated with a subset of the random variables in the system. Each factor provides a different positive ...
2 votes · 1 answer · 40 views
Bayes prior in MAP estimation corresponding to $\ell^0$ penalization
I gather that in the context of penalized least squares, we can interpret a penalty term as corresponding to a prior $\pi(\beta)\propto \exp\{-\text{pen}\}.$
Is this also true for $\ell^0$ ...
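Taken literally, the same recipe applied to the $\ell^0$ penalty would read as below; whether this defines a legitimate prior is exactly what the question is probing, since the right-hand side is constant on each support pattern and is not normalizable with respect to Lebesgue measure:
$$\pi(\beta) \propto \exp\{-\lambda\|\beta\|_0\}, \qquad \|\beta\|_0 = \#\{j : \beta_j \neq 0\}.$$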
4 votes · 1 answer · 653 views
Bayesian priors associated with regularization penalties
I gather that adding a penalty term to (linear) least squares minimization typically corresponds to choosing some prior for Bayes estimation in the normal linear regression model. A couple of questions ...
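The general correspondence behind the question, sketched for the normal linear model with noise variance $\sigma^2$: minimizing a penalized sum of squares is the same as maximizing a posterior whose prior is the exponentiated negative penalty,
$$\hat\beta = \arg\min_\beta \left[\|y - X\beta\|_2^2 + \operatorname{pen}(\beta)\right] \quad\Longleftrightarrow\quad \hat\beta = \arg\max_\beta\, p(y \mid X, \beta)\,\pi(\beta), \quad \pi(\beta) \propto \exp\!\left\{-\frac{\operatorname{pen}(\beta)}{2\sigma^2}\right\}.$$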
6 votes · 1 answer · 362 views
If LASSO is equivalent to Bayesian Regression with a Laplace (double exponential) prior, what would be the prior for non-negative LASSO? Exponential?
We know that the LASSO penalty is equivalent to a Laplace prior. So what would be the corresponding prior for a non-negative LASSO? Is it the exponential distribution?
More generally, is it true that every ...
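The density calculation makes the conjecture plausible: restricting the Laplace density to the nonnegative half-line and renormalizing yields exactly the exponential density (a sketch of the prior side only, not of the constrained optimization):
$$\pi(\beta) = \frac{\lambda}{2} e^{-\lambda|\beta|},\ \beta \in \mathbb{R} \quad\longrightarrow\quad \pi_+(\beta) = \lambda e^{-\lambda\beta},\ \beta \ge 0.$$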
6 votes · 1 answer · 214 views
What prior would lead to $\ell_\infty$ regularization of model weights?
A Gaussian prior on the weights of a GLM leads to ridge / squared $\ell_2$ regularization.
A Laplace prior leads to $\ell_1$ regularization.
Question
What prior would lead to $\ell_\infty$ regularization?
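Following the same $\exp(-\text{penalty})$ template as the Gaussian and Laplace cases, the candidate density would be the one below; note that, unlike those two cases, it does not factorize over coordinates, which is part of what makes the question interesting:
$$p(\mathbf{w}) \propto \exp\!\left(-\lambda\|\mathbf{w}\|_\infty\right) = \exp\!\left(-\lambda \max_i |w_i|\right).$$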
2 votes · 1 answer · 563 views
Random-walk prior with ridge-like regularization?
I am working with a model that contains a large number of coefficients, arranged in an ordered vector $\beta_1, \dots, \, \beta_N $. I have some prior knowledge that could be used to improve the ...
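One standard way to encode ordering information of this kind is a first-order random-walk prior, whose negative log-density penalizes successive differences much as ridge penalizes magnitudes (a sketch of the construction the title suggests; the model in the question may differ):
$$\beta_i \mid \beta_{i-1} \sim N(\beta_{i-1}, \sigma^2) \quad\Longrightarrow\quad -\log\pi(\beta_2,\dots,\beta_N \mid \beta_1) = \frac{1}{2\sigma^2}\sum_{i=2}^{N}(\beta_i - \beta_{i-1})^2 + \text{const}.$$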
0 votes · 1 answer · 83 views
How to select variables when using shrinkage priors?
I am fitting a linear regression model using shrinkage priors (Horseshoe and Laplace/LASSO). This shrinks many of the variables close to zero, but I would like to select the variables. Can I use the ...
1 vote · 0 answers · 154 views
Deriving posterior mean with horseshoe prior
I want to decompose a matrix $S \in \mathbb{R}^{D \times D}$ as below
$$S=vv^T $$
where $v_i \mid \lambda_i, \tau_i \sim N(0,\lambda^2_i\tau^2_i)$, $\lambda_i \sim \mathrm{Cauchy}^+(0,1)$, i.e. $v$ has a horseshoe ...
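For reference, simulating from a horseshoe prior of this form is a two-line computation. A minimal NumPy sketch, assuming a single fixed global scale tau as in the standard horseshoe (in the full model it is often given its own half-Cauchy prior):

import numpy as np

def sample_horseshoe(d, tau=1.0, seed=None):
    """Draw one d-dimensional vector v with a horseshoe prior:
    lambda_i ~ Cauchy+(0, 1), v_i | lambda_i, tau ~ N(0, (lambda_i * tau)^2)."""
    rng = np.random.default_rng(seed)
    lam = np.abs(rng.standard_cauchy(d))  # half-Cauchy local scales
    return rng.normal(0.0, lam * tau)     # conditionally Gaussian draw

v = sample_horseshoe(5, tau=0.5, seed=0)
print(v)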
4 votes · 2 answers · 268 views
In "A Topology Layer for Machine Learning," are the topological priors learned by the network or imposed by humans?
In this paper by Gabrielsson, Nelson, et al. the authors "present a differentiable topology layer that can, among other things, construct a loss on the output of a deep generative network to ...
2 votes · 1 answer · 336 views
How does L2 penalize large weights?
The L2 regularization term is useful because it penalizes large weights more heavily than small ones, which helps prevent overfitting. I'm having a hard time understanding how exactly it does this.
This ...
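A quick numeric illustration of the point in question: because the penalty is quadratic, one large weight costs more than several small weights of the same total magnitude. A minimal sketch (not from the original thread):

import numpy as np

def l2_penalty(w):
    # squared L2 norm: the sum of squared weights
    return float(np.sum(np.square(w)))

# Both vectors have L1 norm 2, but very different L2 penalties:
print(l2_penalty([2.0, 0.0]))  # 4.0 -> one large weight is expensive
print(l2_penalty([1.0, 1.0]))  # 2.0 -> spreading the magnitude is cheaper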
9 votes · 1 answer · 6k views
MAP estimation as regularisation of MLE
Going through the Wikipedia article on maximum a posteriori estimation, I got confused after reading this:
It is closely related to the method of maximum likelihood (ML) estimation, but employs ...
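The relationship the article alludes to fits in one line: the MAP objective is the maximum-likelihood objective plus a log-prior term, which acts as a regularizer (a standard identity, written out here for concreteness):
$$\hat\theta_{\text{MAP}} = \arg\max_\theta \left[\log p(x \mid \theta) + \log p(\theta)\right],$$
so that $R(\theta) = -\log p(\theta)$ plays the role of the penalty, and dropping it recovers the MLE.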
4 votes · 1 answer · 803 views
Difference between random effect and fixed effect with regularization/prior
Let's say I have a random effect intercept. For example:
lme4::lmer(yield ~ 1 + (1|Batch))
How is that different from just ordinary regression using ...
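Conditional on the variance components, the formal link is that the fitted random intercepts maximize a penalized likelihood in which the Gaussian prior contributes exactly a ridge penalty on the batch effects (a sketch, assuming a Gaussian model with known variances $\sigma^2$ and $\sigma_b^2$):
$$y_{ij} = \mu + b_j + \varepsilon_{ij}, \quad b_j \sim N(0, \sigma_b^2), \qquad \hat b = \arg\min_b \left[\frac{1}{\sigma^2}\sum_{i,j}(y_{ij} - \mu - b_j)^2 + \frac{1}{\sigma_b^2}\sum_j b_j^2\right].$$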
2 votes · 1 answer · 240 views
Marginal prior derivation in hierarchical Bayesian model
I am working on a model that is closely related to the normal gamma shrinkage prior setup discussed in Griffin & Brown (2010). Suppose we want to draw $P$ parameters $\beta_p$ with $p=1,...,P$. ...
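For orientation, the normal-gamma construction the question builds on places a Gamma mixing distribution on per-coefficient variances and obtains the marginal prior by integrating them out; the shape/rate symbols below are placeholders for whichever parameterization the question follows:
$$\beta_p \mid \psi_p \sim N(0, \psi_p), \qquad \psi_p \sim \text{Gamma}(\lambda, \theta), \qquad \pi(\beta_p) = \int_0^\infty N(\beta_p \mid 0, \psi)\,\text{Gamma}(\psi \mid \lambda, \theta)\,d\psi.$$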