
I have some data to which I have fitted a normal distribution using the `fit` method of `scipy.stats.norm`, like so:

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

x = np.random.normal(size=50000)

fig, ax = plt.subplots()

nbins = 75
mu, sigma = norm.fit(x)
n, bins, patches = ax.hist(x, nbins, density=True, facecolor='grey', alpha=0.5, label='before')
y0 = norm.pdf(bins, mu, sigma)  # line of best fit
ax.plot(bins, y0, 'k--', linewidth=2, label='fit before')
ax.set_title(r'$\mu$={}, $\sigma$={}'.format(mu, sigma))

plt.show()

I would now like to extract the uncertainty/error in the fitted mu and sigma values. How can I go about this?

2 Answers


You can use scipy.optimize.curve_fit: this method not only returns the estimated optimal values of the parameters, but also the corresponding covariance matrix:

popt : array

Optimal values for the parameters so that the sum of the squared residuals of f(xdata, *popt) - ydata is minimized

pcov : 2d array

The estimated covariance of popt. The diagonals provide the variance of the parameter estimate. To compute one standard deviation errors on the parameters use perr = np.sqrt(np.diag(pcov)).

How the sigma parameter affects the estimated covariance depends on absolute_sigma argument, as described above.

If the Jacobian matrix at the solution doesn’t have a full rank, then ‘lm’ method returns a matrix filled with np.inf, on the other hand ‘trf’ and ‘dogbox’ methods use Moore-Penrose pseudoinverse to compute the covariance matrix.

You can calculate the standard deviation errors of the parameters from the square roots of the diagonal elements of the covariance matrix as follows:

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm
from scipy.optimize import curve_fit

x = np.random.normal(size=50000)
fig, ax = plt.subplots()
nbins = 75
n, bins, patches = ax.hist(x, nbins, density=True, facecolor='grey', alpha=0.5, label='before')

centers = 0.5 * (bins[1:] + bins[:-1])
pars, cov = curve_fit(lambda x, mu, sig: norm.pdf(x, loc=mu, scale=sig), centers, n, p0=[0, 1])

ax.plot(centers, norm.pdf(centers, *pars), 'k--', linewidth=2, label='fit before')
ax.set_title(r'$\mu={:.4f}\pm{:.4f}$, $\sigma={:.4f}\pm{:.4f}$'.format(pars[0], np.sqrt(cov[0, 0]), pars[1], np.sqrt(cov[1, 1])))

plt.show()
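
If you want the one-standard-deviation errors as an array rather than reading them off the plot title, you can apply the recipe from the docs quoted above directly to the covariance matrix (continuing from the snippet above, where cov was returned by curve_fit):

perr = np.sqrt(np.diag(cov))  # [error in mu, error in sigma]
print('mu = {:.4f} +/- {:.4f}'.format(pars[0], perr[0]))
print('sigma = {:.4f} +/- {:.4f}'.format(pars[1], perr[1]))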

This results in the following plot:

[plot: histogram of the data with the fitted Gaussian overlaid; the title reports μ and σ with their uncertainties]

  • Please note that the uncertainties reported here are due entirely to the sampling of the data into 75 bins. There is no source of noise or non-normal distribution besides the otherwise arbitrary binning.
    – M Newville
    Commented Mar 7, 2018 at 1:30
  • @MNewville So does norm.fit not suffer from these uncertainties? And other than reporting these uncertainties, how are curve_fit and norm.fit different? Commented Oct 4, 2018 at 6:26
  • @AlwaysLearningForever I think my earlier comment is not correct: there is a natural distribution, and with a large enough number of bins the uncertainties on the centroid and width will stabilize to non-zero values. As for what norm.fit does: I'm not 100% certain, but I believe scipy.stats.norm.fit() uses Nelder-Mead to do the fit, while curve_fit uses Levenberg-Marquardt. I do not know whether scipy.stats.norm.fit() even tries to estimate uncertainties, but I suspect not.
    – M Newville
    Commented Oct 5, 2018 at 4:22
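
Following up on the last comment above: whatever norm.fit reports, for a normal distribution fitted by maximum likelihood the asymptotic standard errors have a well-known closed form from the Fisher information, se(μ) = σ/√n and se(σ) = σ/√(2n), so they can be computed without binning at all. A minimal sketch, assuming the same x as in the question:

import numpy as np
from scipy.stats import norm

x = np.random.normal(size=50000)
mu, sigma = norm.fit(x)  # maximum-likelihood estimates (sample mean, biased sample std)

n = x.size
se_mu = sigma / np.sqrt(n)         # asymptotic standard error of mu
se_sigma = sigma / np.sqrt(2 * n)  # asymptotic standard error of sigma
print('mu = {:.4f} +/- {:.4f}'.format(mu, se_mu))
print('sigma = {:.4f} +/- {:.4f}'.format(sigma, se_sigma))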

See also lmfit (https://github.com/lmfit/lmfit-py), which provides an easier interface and reports uncertainties in fitted variables. For fitting data to a normal distribution, see http://lmfit.github.io/lmfit-py/builtin_models.html#example-1-fit-peak-data-to-gaussian-lorentzian-and-voigt-profiles

and use something like

from lmfit.models import GaussianModel

model = GaussianModel()

# create parameters with initial guesses:
params = model.make_params(center=9, amplitude=40, sigma=1)

# xdata, ydata are your data, e.g. bin centers and bin heights of a histogram
result = model.fit(ydata, params, x=xdata)
print(result.fit_report())

The report will include the 1-sigma errors, like:

[[Variables]]
    sigma:       1.23218358 +/- 0.007374 (0.60%) (init= 1.0)
    center:      9.24277047 +/- 0.007374 (0.08%) (init= 9.0)
    amplitude:   30.3135620 +/- 0.157126 (0.52%) (init= 40.0)
    fwhm:        2.90157055 +/- 0.017366 (0.60%)  == '2.3548200*sigma'
    height:      9.81457817 +/- 0.050872 (0.52%)  == '0.3989423*amplitude/max(1.e-15, sigma)'
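
If you want the values and uncertainties programmatically rather than parsing the report, each fitted parameter in result.params carries value and stderr attributes (a small sketch, assuming the result object from above):

par = result.params['center']
print(par.value, par.stderr)  # best-fit value and its 1-sigma uncertainty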
  • What method did you use to determine the initial parameter values in the example code that you provided? Commented Mar 6, 2018 at 16:52
  • @JamesPhillips I looked at the data (not even posted here, but in the lmfit example) and guessed. lmfit's GaussianModel actually has a guess method to help guess center, amplitude, and sigma; the linked example uses that method. Peak-finding utilities in scipy or other libraries can also be useful for identifying peak centers. But also: for isolated Gaussian peaks, you don't need to be that close in the initial guess for the fit to converge.
    – M Newville
    Commented Mar 6, 2018 at 18:52
  • It does look like your guessing worked well in this case. Commented Mar 6, 2018 at 20:58
