0
$\begingroup$

The rough idea is that I am trying to compare linguistic properties (e.g. readability) of texts written by two authors. For this, I thought an ANOVA would be appropriate, but I know my data do not follow a Gaussian distribution. I know there is a frequentist non-parametric counterpart to the one-way ANOVA (Kruskal-Wallis), but I was wondering whether a Bayesian alternative exists, so that I can also get the effect size. I am looking for a package in either Python or R, if possible. I know the JASP software exists, but I don't think it has a Bayesian non-parametric one-way (or even two-way) ANOVA yet. Any advice would be much appreciated.

$\endgroup$

2 Answers

1
$\begingroup$

I think you are conflating two ideas here.

What is non-parametric and why do we use it?

First, in the frequentist realm people often appeal to "non-parametric" methods when they don't know the exact sampling distribution of the error term. This is less of a problem than people imagine. The operating characteristics of the alternatives to the t-test or ANOVA are usually large-sample results. But if we are relying on large-sample theory, then what should concern us is the asymptotic distribution of the estimator rather than the raw distribution of the error term: even with highly skewed error terms, the sample mean tends to normality by the CLT, which justifies using the t-test. This becomes evident if you explore resampling alternatives, such as the bootstrap or a permutation test. With a moderate sample size, you often find that the sampling distribution of the test statistic is much closer to normal than the distribution of the error term.
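To make the CLT point concrete, here is a minimal sketch using only the Python standard library. The data are made up (exponential draws standing in for a skewed readability score), and the helper names `skewness` and `bootstrap_means` are my own, not from any package. It compares the skewness of the raw values with the skewness of bootstrap resample means.

```python
import random

def skewness(values):
    """Sample skewness: third central moment divided by the second to the power 1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

def bootstrap_means(sample, n_resamples=2000, seed=0):
    """Means of resamples drawn with replacement from the observed sample."""
    rng = random.Random(seed)
    n = len(sample)
    return [sum(rng.choices(sample, k=n)) / n for _ in range(n_resamples)]

# Hypothetical, strongly right-skewed data: 50 exponential draws.
data_rng = random.Random(1)
scores = [data_rng.expovariate(1.0) for _ in range(50)]
means = bootstrap_means(scores)

print("skewness of raw scores:     ", round(skewness(scores), 2))
print("skewness of bootstrap means:", round(skewness(means), 2))
```

With skewed data of this kind, the second number should come out much closer to zero than the first, which is the sense in which the statistic behaves "more normally" than the errors do.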

In fact, it is just such a resampling test that provides a viable alternative in many other scenarios. I see the permutation test as an ideal candidate for a non-parametric alternative to the ANOVA. The effects and CIs from the model-based ANOVA can be reported, along with the robust p-value obtained from permutation. The mean is what we cared about a priori; it is the error that we must handle without distorting our investigation.
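A permutation version of the one-way ANOVA could look roughly like the sketch below. It uses only the standard library. The function names, the number of permutations, and the example scores are assumptions for illustration, not a vetted package. The idea is to compute the usual F statistic, then shuffle the group labels many times and count how often the shuffled F is at least as large as the observed one.

```python
import random

def f_statistic(groups):
    """One-way ANOVA F statistic for a list of groups (each a list of numbers)."""
    pooled = [x for g in groups for x in g]
    n_total, k = len(pooled), len(groups)
    grand_mean = sum(pooled) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))

def permutation_anova(groups, n_permutations=5000, seed=0):
    """Return the observed F and a Monte Carlo permutation p-value."""
    rng = random.Random(seed)
    observed = f_statistic(groups)
    pooled = [x for g in groups for x in g]
    sizes = [len(g) for g in groups]
    exceed = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # reassign observations to groups at random
        shuffled, start = [], 0
        for size in sizes:
            shuffled.append(pooled[start:start + size])
            start += size
        if f_statistic(shuffled) >= observed:
            exceed += 1
    # The +1 terms count the observed labelling as one of the permutations.
    return observed, (exceed + 1) / (n_permutations + 1)

# Hypothetical readability scores for two authors (made-up numbers).
author_a = [42.1, 39.5, 40.2, 61.0, 41.3, 38.8, 40.9]
author_b = [47.4, 45.0, 70.2, 46.1, 48.3, 44.9, 46.6]
f_obs, p_perm = permutation_anova([author_a, author_b])
print("F =", round(f_obs, 3), " permutation p =", round(p_perm, 4))
```

The group means and their difference are still the effect you report; the permutation step only replaces the reference distribution used for the p-value.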

Kruskal-Wallis introduces the issue that most analysts don't know how to state its null hypothesis or interpret its findings. We are given a p-value and little else. From a reviewer's perspective, it is perplexing to try to understand the meaning and impact of any analysis that uses a rank-based test.

What is the impact of using a non-conjugate prior on a Bayesian analysis?

Given the nature of the investigation, I commend you on the choice of a Bayesian analysis! Bayesian probability is exactly the right paradigm to explore this kind of retrospective investigation and quantification of belief. If I compared the writings of Plato and Socrates quantitatively, it becomes really convoluted to imagine a $p$-value because there weren't many Socrates and Platos to compare, and yet finite population corrections are unobtainable here... we can't put a theoretic "cap" to the data we might find and say definitively that we would converge on a point estimate in a finite or infinite $n$.

So the error is non-normal, but you're interested in the analogue of a linear regression (ANOVA with categorical effect(s)). If we were doing pen-and-paper math, we would specify that the regression parameters have a normal prior and that the error variance has an inverse-gamma prior, and we could then calculate the distributional form of the posterior by hand and do inference. That's too 1900s for our purposes. We use numerical solvers and, with Gibbs sampling, estimate the posterior using a suitable prior on the error term and parameters as before. Suppose the data are inconsistent with these distributional assumptions? We update the posterior accordingly. The posterior may have some interesting and difficult aspects to report and interpret, but that is exactly what you are charged to deal with as a statistician.
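For a flavour of what the Gibbs step involves, here is a minimal sketch for the simplest case only: two groups, a normal likelihood with a shared variance, a normal prior on each group mean and an inverse-gamma prior on the variance. Those modelling choices, the vague prior settings, the function name `gibbs_two_groups` and the data are all my assumptions for illustration. A non-normal error model would change the likelihood and would usually need a different sampler.

```python
import random

def gibbs_two_groups(group_a, group_b, n_draws=4000, burn_in=1000, seed=0,
                     prior_mean=0.0, prior_sd=100.0,
                     prior_shape=0.01, prior_rate=0.01):
    """Gibbs sampler for y_ij ~ Normal(mu_j, sigma2) with conjugate priors.

    Returns a list of (difference in means, standardized difference) draws.
    """
    rng = random.Random(seed)
    groups = [group_a, group_b]
    n_total = sum(len(g) for g in groups)
    mu = [sum(g) / len(g) for g in groups]  # start at the sample means
    sigma2 = 1.0
    draws = []
    for iteration in range(n_draws + burn_in):
        # Update each group mean given the current variance.
        for j, g in enumerate(groups):
            precision = 1.0 / prior_sd ** 2 + len(g) / sigma2
            cond_mean = (prior_mean / prior_sd ** 2 + sum(g) / sigma2) / precision
            mu[j] = rng.gauss(cond_mean, precision ** -0.5)
        # Update the variance given the current means (inverse-gamma draw).
        sse = sum((y - mu[j]) ** 2 for j, g in enumerate(groups) for y in g)
        shape = prior_shape + n_total / 2
        rate = prior_rate + sse / 2
        # random.gammavariate takes (shape, scale), so scale = 1 / rate.
        sigma2 = 1.0 / rng.gammavariate(shape, 1.0 / rate)
        if iteration >= burn_in:
            diff = mu[1] - mu[0]
            draws.append((diff, diff / sigma2 ** 0.5))
    return draws

# Hypothetical scores for two authors (made-up numbers).
author_a = [10.1, 9.8, 10.3, 9.9, 10.0, 10.2, 9.7, 10.0]
author_b = [20.1, 19.8, 20.3, 19.9, 20.0, 20.2, 19.7, 20.0]
posterior = gibbs_two_groups(author_a, author_b)
diffs = sorted(d for d, _ in posterior)
print("posterior mean difference:", round(sum(diffs) / len(diffs), 2))
print("95% credible interval:",
      (round(diffs[int(0.025 * len(diffs))], 2), round(diffs[int(0.975 * len(diffs))], 2)))
```

The draws of the difference and of the standardized difference are what give you an effect size with a credible interval, which is what the question was after.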

Non-parametric Bayes is still something quite different, quite a bit more sophisticated, and likely not applicable to this analysis.

$\endgroup$
2
  • $\begingroup$ Thank you very much for your reply! Since I am a relative beginner to Bayes and statistics in general (as you probably guessed from my post), if I am not mistaken then, you recommend I change my approach to doing a permutation test correct? This might be out of the scope of Cross Validated (forgive me it's my first post), but do you know if there is a package implementing a permutation test or a resource that can show me how to execute one, work out the effect size and confidence intervals? Thank you in advance. $\endgroup$ Commented Mar 9, 2020 at 16:48
  • $\begingroup$ @BeginnerByron I would prefer to go Bayes, but a Bayesian analysis is really high-level stuff, I would say MS stats level at least, but that's because when I got my MS we learned how to implement a Gibbs sampler in R before RStudio and before BUGS was ported to R... so who knows what's been boiled down since then. Permutation is easy; I haven't vetted the R packages, but I know they're out there. $\endgroup$
    – AdamO
    Commented Mar 9, 2020 at 17:25
1
$\begingroup$

There may be, but I haven't seen it.

I fear that there is a semantic confusion, as "non-parametric" means two different things (https://en.wikipedia.org/wiki/Nonparametric_statistics), depending on whether you look at the right-hand side of $y \sim f(x)$, or the left-hand side:

"Non-parametric" in the Bayesian machine-learning or Bayesian finite mixture context means that the model, i.e. the right-hand side, is non-parametric (e.g. https://blog.statsbot.co/bayesian-nonparametrics-9f2ce7074b97). In other words, the model structure $f(.)$ is not fixed and adapts to the data (think splines, Gaussian random fields).

When your response is non-Gaussian, you are looking at a "non-parametric" response, which is what Kruskal-Wallis and the like address. Here "parametric" refers to the parameters of the distribution from which you assume your data derive. For such a model you could have priors for the response terms, making it potentially Bayesian, but you still need a likelihood (or an approximation) for the "non-parametric" response.

The idea of the rank-transformation (which, without ties, yields the Kruskal-Wallis when used in an ANOVA) is to get the response variable into something uniformly distributed. So if you are willing to assume a uniform distribution for your rank-transformed response, and present priors for your model terms, then you are in the land of "Bayesian non-parametric response".
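The link between the rank transformation and Kruskal-Wallis can be checked numerically. The sketch below is standard library only; the function names and the data are made up for illustration. It computes the textbook $H$ statistic without tie correction, and also $(N-1)\,SS_\text{between}/SS_\text{total}$ from a one-way ANOVA decomposition of the ranks. Without ties the two agree.

```python
def midranks(values):
    """Ranks starting at 1; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks

def split_like(ranks, groups):
    """Cut the pooled rank list back into pieces matching the group sizes."""
    pieces, start = [], 0
    for g in groups:
        pieces.append(ranks[start:start + len(g)])
        start += len(g)
    return pieces

def kruskal_wallis_h(groups):
    """Textbook H statistic, without tie correction."""
    pooled = [x for g in groups for x in g]
    n = len(pooled)
    rank_groups = split_like(midranks(pooled), groups)
    total = sum(sum(r) ** 2 / len(r) for r in rank_groups)
    return 12.0 / (n * (n + 1)) * total - 3 * (n + 1)

def h_from_anova_on_ranks(groups):
    """(N - 1) * SS_between / SS_total, computed on the ranks."""
    pooled = [x for g in groups for x in g]
    n = len(pooled)
    ranks = midranks(pooled)
    grand_mean = sum(ranks) / n
    ss_total = sum((r - grand_mean) ** 2 for r in ranks)
    ss_between = sum(len(r) * (sum(r) / len(r) - grand_mean) ** 2
                     for r in split_like(ranks, groups))
    return (n - 1) * ss_between / ss_total

# Hypothetical data with no ties.
data = [[2.1, 3.4, 1.9, 5.0], [4.2, 6.1, 5.5, 7.3], [8.0, 6.6, 9.1, 7.7]]
print(kruskal_wallis_h(data), h_from_anova_on_ranks(data))
```

Note that after ranking, only the ordering of the observations enters the computation, which is why prior knowledge on the original scale has little to attach to.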

One point of Bayesian priors is to use previous knowledge. If you rank-transform your data, this knowledge is largely eliminated, or at least reduced to something like "group A is larger than group B". I wonder whether it is worth going Bayes in this case.

$\endgroup$
1
  • $\begingroup$ +1 "For such a model you could have priors for the response terms" This seems to be the nub of the issue for KW and other nonparametric tests: the prior distributions would be about distributions of rank sums and similar, which are seldom of any substantive interest in themselves, being instrumental to specific inferences about the unranked data, and typically constructed as functions of $N$. $\endgroup$
    – Alexis
    Commented Apr 1, 2022 at 19:17
