Let $X\sim\operatorname{Poisson}(\lambda)$ and $Y\sim\mathcal N(\mu,\sigma^2)$ be independent, and define $Z=X+Y$. The density of $Z$ is the infinite Gaussian mixture $$ f_Z(z)=\sum_{k=0}^\infty\frac{\lambda^ke^{-\lambda}}{k!}\phi(z,\mu+k,\sigma^2), $$ where $\phi(z,\mu+k,\sigma^2)$ denotes the Gaussian density with mean $\mu+k$ and variance $\sigma^2$. Note that $\mathsf EZ=\lambda+\mu$ and $\mathsf{Var}Z=\lambda+\sigma^2$. I am curious about methods for estimating the parameters $\lambda$, $\mu$, and $\sigma$ from observations of $Z$. Here is a plot of the density for $\lambda=3$, $\mu=1$, $\sigma=0.2$:
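For concreteness, here is a minimal sketch that evaluates the truncated mixture density and numerically checks the stated mean (the function name `fz`, the truncation level `kmax`, and the grid are my own choices, not part of the problem statement):

```python
import math

def fz(z, lam, mu, sigma, kmax=50):
    """Density of Z = X + Y as a Poisson-weighted Gaussian mixture, truncated at kmax."""
    total = 0.0
    for k in range(kmax + 1):
        w = math.exp(-lam) * lam**k / math.factorial(k)   # Poisson weight
        total += w * math.exp(-0.5 * ((z - mu - k) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))
    return total

# Riemann-sum sanity check: total mass ~ 1 and mean ~ lambda + mu = 4.
lam, mu, sigma = 3.0, 1.0, 0.2
dz = 0.005
grid = [-2.0 + i * dz for i in range(4001)]   # covers [-2, 18]
mass = sum(fz(z, lam, mu, sigma) for z in grid) * dz
mean = sum(z * fz(z, lam, mu, sigma) for z in grid) * dz
print(mass, mean)
```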
Context: Deep sub-electron read noise (DSERN) image sensors are image sensors that exhibit very low noise. Photons arriving at the sensor are accurately described by a Poisson distribution $X\sim\operatorname{Poisson}(\lambda)$, where $\lambda$ represents the expected number of photon arrivals over the exposure time. Reading out the charge accumulated from these photon interactions introduces read noise, which is usually modeled as normal, $Y\sim\mathcal N(\mu,\sigma^2)$, and independent of the photon-induced signal. As such, the output signal of these devices is accurately described by the distribution above. The ability to estimate the parameters of this distribution tells an experimenter about key performance specifications of a DSERN sensor.
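Simulating such a sensor output is straightforward: draw a Poisson photon count and add Gaussian read noise. A sketch using only the standard library (Knuth's product-of-uniforms Poisson sampler, which is fine for small $\lambda$; the sample size and seed are arbitrary):

```python
import math
import random

def rpois(lam, rng):
    """Poisson sampler via Knuth's product-of-uniforms method (adequate for small lam)."""
    L = math.exp(-lam)
    k, p = 0, 1.0
    while p > L:
        p *= rng.random()
        k += 1
    return k - 1

def sample_z(lam, mu, sigma, n, seed=0):
    """Simulate n sensor outputs: Poisson photon count plus Gaussian read noise."""
    rng = random.Random(seed)
    return [rpois(lam, rng) + rng.gauss(mu, sigma) for _ in range(n)]

z = sample_z(3.0, 1.0, 0.2, 100_000)
zbar = sum(z) / len(z)                               # ~ lambda + mu = 4
s2 = sum((x - zbar) ** 2 for x in z) / (len(z) - 1)  # ~ lambda + sigma^2 = 3.04
print(zbar, s2)
```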
Thoughts: Expectation maximization (EM) is a commonly used method for estimating the parameters of (finite) Gaussian mixtures. If we truncate $f_Z$ to only its most dominant terms, i.e. $$ f_Z(z;m,n)=\sum_{k=m}^n\frac{\lambda^ke^{-\lambda}}{k!}\phi(z,\mu+k,\sigma^2), $$ with $m$ and $n$ chosen so that $\int f_Z(z;m,n)\,\mathrm dz\approx 1$, then the parameters can be decently approximated using EM. In DSERN devices the read noise is typically small enough ($\sigma<0.5$) that each Gaussian component is distinct from its neighbors (cf. figure above). This makes for a fairly straightforward application of EM, without the usual issue of determining the appropriate number of Gaussian components. In this particular example, however, there are only three unknowns, so even though the mixture is infinite, I'm wondering if there are other tricks that can be employed to estimate them. For example, in the special case $\mu=0$, the parameter $\lambda$ can be estimated directly from the sample mean, $$ \hat\lambda=\frac{1}{n}\sum_{k=1}^nZ_k. $$ Likewise, since $\mathsf{Var}Z=\lambda+\sigma^2$, $$ \hat\sigma^2= \frac{1}{n-1}\sum_{k=1}^n(Z_k-\bar Z)^2 -\frac{1}{n}\sum_{k=1}^nZ_k, $$ i.e. the sample variance minus the sample mean. The simplicity of the parameter estimators in this special case leads me to believe that there may be a better approach than EM for the general case.
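As a quick sanity check of the $\mu=0$ special case, the moment estimators above can be verified on simulated data. A self-contained sketch (the Poisson sampler, sample size, and seed are my own choices for illustration):

```python
import math
import random

def rpois(lam, rng):
    """Poisson sampler via Knuth's product-of-uniforms method (fine for small lam)."""
    L = math.exp(-lam)
    k, p = 0, 1.0
    while p > L:
        p *= rng.random()
        k += 1
    return k - 1

# Simulate Z = X + Y with mu = 0, then recover lambda and sigma^2
# from the first two sample moments.
lam, sigma = 3.0, 0.2
rng = random.Random(42)
z = [rpois(lam, rng) + rng.gauss(0.0, sigma) for _ in range(200_000)]

n = len(z)
zbar = sum(z) / n
s2 = sum((x - zbar) ** 2 for x in z) / (n - 1)

lam_hat = zbar        # E[Z] = lambda when mu = 0
sig2_hat = s2 - zbar  # Var[Z] - E[Z] = sigma^2
print(lam_hat, sig2_hat)
```

With $n$ this large both estimates land close to the true values $\lambda=3$ and $\sigma^2=0.04$, which illustrates why the special case feels too easy for EM to be the right general-purpose tool.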