$\begingroup$

Let $Y$ be a random variable that obeys the Tweedie distribution for parameter $\alpha = 1.1$. Let the link function be the natural log. Assume that we have a database of numbers of the form

$(y_1, x_{1,1}, x_{1,2}, ..., x_{1,m})$

$(y_2, x_{2,1}, x_{2,2}, ..., x_{2,m})$

...

$(y_n, x_{n, 1}, x_{n, 2}, ..., x_{n, m})$.

The variables are a mix of categorical variables and continuous variables. Because this is a GLM, we know that

$E[Y] = e^{X\beta}$. So here is my question: given the database of numbers and using the fact that this is a Tweedie distribution with a given parameter, what algorithm do I use to best choose $\beta$? Is there an error function that I need to minimize, or do I estimate the parameters by maximum likelihood?

$\endgroup$
  • 1
    $\begingroup$ Maximum likelihood is correct. See en.wikipedia.org/wiki/…. $\endgroup$
    – amoeba
    Commented Nov 20, 2017 at 19:37
  • 2
    $\begingroup$ You can use GLM to fit it by ML; you just need to supply the right functions to GLM. These are available in statmod (and some additional useful functions are in the tweedie package in R, such as AICtweedie). While you can manage without these if you know how to drive glm well enough, I'd suggest you use the packages. $\endgroup$
    – Glen_b
    Commented Nov 23, 2017 at 6:55

1 Answer

$\begingroup$

Are you familiar with generalized linear models in R? If so, you can fit Tweedie glms just like any other glms. The glm family definition necessary to make this happen is provided by the statmod R package from CRAN.

Tweedie glms assume that the variance function is a power function: $${\rm var}(y)=V(\mu)\phi=\mu^\alpha \phi$$ Special cases include normal glms ($\alpha=0$), Poisson glms ($\alpha=1$), gamma glms ($\alpha=2$) and inverse-Gaussian glms ($\alpha=3$).

Here is an example of R code:

> library(statmod)
> y <- c(4.0, 5.9, 3.9, 13.2, 10.0, 9.0)
> x <- 1:6
> fit <- glm(y~x, family = 
           tweedie(var.power=1.1, 
           link.power=0))
> summary(fit)

Call:
glm(formula = y ~ x, family = 
         tweedie(var.power = 1.1, 
         link.power = 0))

Deviance Residuals: 
      1        2        3        4        5        6  
-0.2966   0.1183  -1.0742   1.4985   0.1205  -0.6716  

Coefficients:
            Estimate Std. Error t value Pr(>|t|)  
(Intercept)   1.3625     0.4336   3.143   0.0348 *
x             0.1794     0.1008   1.779   0.1498  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for Tweedie family taken to be 1.056557)

    Null deviance: 7.3459  on 5  degrees of freedom
Residual deviance: 3.9670  on 4  degrees of freedom
AIC: NA

Number of Fisher Scoring iterations: 4

The tweedie family function in statmod allows you to fit a glm with any power variance function and any power link. In the glm family call, var.power is the $\alpha$ parameter, so that var.power=1.1 specifies $\alpha=1.1$. The var.power refers to the exponent of the glm variance function, so that var.power=0 specifies a normal family, var.power=1 means Poisson family, var.power=2 means gamma family, var.power=3 means inverse Gaussian family and so on. Values strictly between 0 and 1 are not permitted, but virtually anything else is allowed.

link.power=0 specifies a log-link. The link is specified in terms of Box-Cox transformation powers, so link.power=1 is the identity link and link.power=0 means log.
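Written out, and assuming the link is the plain power $\mu^q$ rather than the shifted Box-Cox form $(\mu^q-1)/q$ (here $q$ is just shorthand for link.power, introduced for this note), the family of links is $$g(\mu)=\begin{cases}\mu^{q}, & q\neq 0\\ \log\mu, & q=0\end{cases}$$ so that $q=1$ gives the identity link, $q=-1$ the inverse link and $q=0$ the log link.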

The above model assumes that $y_i\sim {\rm Tweedie}_\alpha(\mu_i,\phi)$ where $$\log \mu_i=\beta_0+\beta_1 x_i$$ and $${\rm var}(y_i)=\mu_i^{1.1} \phi$$

The regression coefficients $\beta_j$ have been estimated by maximum likelihood. The dispersion parameter $\phi$ has been estimated by the sum of squared Pearson residuals divided by the residual degrees of freedom; this is called the Pearson estimator.
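On the "error function" part of the question: for a fixed $\alpha$, maximizing the Tweedie likelihood over $\beta$ is equivalent to minimizing the total deviance. As a sketch for the log link and $1<\alpha<2$, using the notation above with $x_{i0}=1$ for the intercept, the deviance is $$D(\beta)=2\sum_{i=1}^n\left[\frac{y_i^{2-\alpha}}{(1-\alpha)(2-\alpha)}-\frac{y_i\,\mu_i^{1-\alpha}}{1-\alpha}+\frac{\mu_i^{2-\alpha}}{2-\alpha}\right]$$ Setting its derivative with respect to each $\beta_j$ to zero, and using $\partial\mu_i/\partial\beta_j=\mu_i x_{ij}$, gives the estimating equations $$\sum_{i=1}^n (y_i-\mu_i)\,\mu_i^{1-\alpha}\,x_{ij}=0$$ Neither $\phi$ nor the normalizing part of the density enters these equations, which is why $\beta$ can be estimated without evaluating the Tweedie density itself.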

Regardless of what $\alpha$ or link you use, any of the downstream functions provided in R for glms will work on the glm fitted model object produced by glm().
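If you want to see what the fitting algorithm is doing, here is a minimal base-R sketch of Fisher scoring (iteratively reweighted least squares) for the toy data above. It is not statmod's or glm()'s actual code: the helper name tweedie_deviance, the starting value and the stopping rule are my own choices for illustration, and the deviance formula assumes $1<\alpha<2$.

```r
# Toy data copied from the example above.
y <- c(4.0, 5.9, 3.9, 13.2, 10.0, 9.0)
x <- 1:6
X <- cbind(1, x)      # design matrix with an intercept column
alpha <- 1.1          # variance power: V(mu) = mu^alpha

# Hypothetical helper: total Tweedie deviance, valid for 1 < alpha < 2.
tweedie_deviance <- function(y, mu, alpha) {
  2 * sum(y^(2 - alpha) / ((1 - alpha) * (2 - alpha)) -
          y * mu^(1 - alpha) / (1 - alpha) +
          mu^(2 - alpha) / (2 - alpha))
}

# Fisher scoring with a log link, started from the intercept-only fit.
beta <- c(log(mean(y)), 0)
for (iter in 1:50) {
  eta <- drop(X %*% beta)
  mu  <- exp(eta)
  z   <- eta + (y - mu) / mu    # working response (d eta / d mu = 1 / mu)
  w   <- mu^(2 - alpha)         # working weights: (d mu / d eta)^2 / V(mu)
  beta_new <- drop(solve(t(X) %*% (w * X), t(X) %*% (w * z)))
  converged <- max(abs(beta_new - beta)) < 1e-10
  beta <- beta_new
  if (converged) break
}

# Quantities reported by summary(fit), recomputed at the final estimate.
mu      <- exp(drop(X %*% beta))
w       <- mu^(2 - alpha)
phi_hat <- sum((y - mu)^2 / mu^alpha) / (length(y) - ncol(X))  # Pearson estimator
se      <- sqrt(phi_hat * diag(solve(t(X) %*% (w * X))))       # standard errors
dev     <- tweedie_deviance(y, mu, alpha)                      # residual deviance

print(unname(beta))
print(unname(se))
print(phi_hat)
print(dev)
```

The printed values should agree, up to rounding, with the Estimate and Std. Error columns, the dispersion parameter and the residual deviance in the summary(fit) output above.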

$\endgroup$
