Assume I have a random variable $Y=X_1+X_2$. I want to estimate the distribution $f$ of $Y$ given a sample $y_1,\ldots,y_N$. If this were all that was known about $Y$, the best approach would probably be some type of kernel density estimator.
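For concreteness, such a baseline KDE might look like the following sketch (the choice of $X_1\sim\mathrm{Normal}$, $X_2\sim\mathrm{Exponential}$ is purely illustrative, not part of the question):

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
# Illustrative data: Y = X1 + X2 with X1 ~ Normal(0, 1), X2 ~ Exponential(1)
y = rng.normal(0.0, 1.0, 1000) + rng.exponential(1.0, 1000)

kde = gaussian_kde(y)  # Gaussian kernel, bandwidth via Scott's rule by default
grid = np.linspace(y.min() - 1.0, y.max() + 1.0, 400)
f_hat = kde(grid)      # estimated density evaluated on the grid
```

Note that this baseline ignores everything we know about $X_1$; the question is whether we can do better.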
However, also assume that $X_1\sim g_\theta$ for some known family of distributions parametrized by an unknown parameter vector $\theta$ and that the distribution of $X_2$ is unknown.
Is there a way to use this additional information about $Y$ to get a better estimate of the distribution of $Y$ and possibly to also infer the parameters $\theta$?
It is clear that $g_\theta$ constrains the possible distributions $f$. Let $X_2\sim h$; then $f=g_\theta * h$, so $f$ cannot be arbitrary. For example, if $X_1\sim \mathrm{Normal}(\mu,\sigma)$, then $f$ cannot have any "sharp edges": it must be at least as smooth as the normal density.
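The convolution constraint is easy to verify numerically. A sketch with illustrative (hypothetical) choices $g_\theta = \mathrm{Normal}(0,1)$ and $h = \mathrm{Exponential}(1)$, discretizing both densities on a grid:

```python
import numpy as np
from scipy.stats import norm, expon

# Illustrative choices: g_theta = Normal(0, 1), h = Exponential(1)
dx = 0.01
x = np.arange(-8.0, 12.0, dx)

g = norm.pdf(x)    # density of X1
h = expon.pdf(x)   # density of X2 (zero for x < 0)

# f = g * h: discrete approximation of the convolution integral,
# valid because X1 and X2 are independent
f = np.convolve(g, h) * dx
x_f = 2 * x[0] + dx * np.arange(f.size)  # grid on which f lives
```

Even though $h$ has a jump at $0$, the resulting $f$ is smooth, illustrating the "no sharp edges" point above.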
However, I don't completely see how to use this information in an algorithm. One idea is to write the KDE as a convolution: let $\chi=\frac{1}{N}\sum_{i=1}^N \delta_{y_i}$ be the empirical measure of the sample, i.e. a sum of Dirac point masses at the observations (assuming all $y_i$ are distinct, each gets weight $1/N$). Then for some kernel $k$, the KDE can be written as $\hat{f}=\chi * k$. So if we know that $f=g_\theta*h$, we could try to match $\hat{g}_\theta*\hat{h}=\chi*k$ (where $\hat{g}_\theta$ and $\hat{h}$ are some kind of estimates). If $g_\theta$ and $h$ were invertible under convolution (i.e. if we could deconvolve, for instance via Fourier transforms), we could write $\hat{h}=\chi*k*\hat{g}_\theta^{-1}$ and $\hat{g}_\theta=\chi*k*\hat{h}^{-1}$, but I am not sure whether this could be turned into some form of EM algorithm or similar.
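One natural place for the deconvolution step is the Fourier domain: the characteristic function $\varphi_Y(t)=E[e^{itY}]$ turns convolution into multiplication, so independence gives $\varphi_Y=\varphi_{g_\theta}\cdot\varphi_h$, and for a candidate $\theta$ one can estimate $\varphi_h$ by a ratio. A rough sketch of this idea (the distributions, the true $\theta=(\mu,\sigma)=(0.5,1)$, and the frequency grid are all hypothetical choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical setup: X1 ~ Normal(mu, sigma) with theta = (mu, sigma) treated
# as unknown, X2 ~ Exponential(1) standing in for the unknown h
n = 20_000
y = rng.normal(0.5, 1.0, n) + rng.exponential(1.0, n)

t = np.linspace(-2.0, 2.0, 81)                    # frequency grid
phi_y = np.exp(1j * np.outer(t, y)).mean(axis=1)  # empirical cf of Y

def phi_h_estimate(mu, sigma):
    """By independence phi_Y = phi_g * phi_h, so phi_h = phi_Y / phi_g."""
    phi_g = np.exp(1j * t * mu - 0.5 * (sigma * t) ** 2)  # cf of Normal(mu, sigma)
    return phi_y / phi_g

# With the true theta, the ratio should be close to the cf of Exponential(1)
phi_h = phi_h_estimate(0.5, 1.0)
true_phi_h = 1.0 / (1.0 - 1j * t)
```

One could imagine fitting $\theta$ by requiring that the ratio behaves like a valid characteristic function, but the division blows up where $\varphi_{g_\theta}$ is small (large $|t|$ for a normal $g_\theta$), which is why the frequency grid is truncated here; this is exactly the ill-posedness of deconvolution.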
Edit: Thanks to whuber for pointing this out. One central assumption I am making is that $X_1$ and $X_2$ are independent. This was used implicitly when I wrote $f$ as the convolution $g_\theta * h$, which holds only if they are independent.