Questions tagged [agreement-statistics]
Agreement is the degree to which two raters, instruments, etc., give the same value when applied to the same object. Special statistical methods have been designed to quantify it.
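For instance, the simplest chance-corrected agreement statistic for two raters on nominal labels is Cohen's kappa; a minimal pure-Python sketch (the function name and example labels are illustrative):

```python
def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items (nominal data)."""
    n = len(rater_a)
    labels = set(rater_a) | set(rater_b)
    # observed proportion of agreement
    p_o = sum(x == y for x, y in zip(rater_a, rater_b)) / n
    # chance agreement from each rater's marginal label frequencies
    p_e = sum((rater_a.count(l) / n) * (rater_b.count(l) / n) for l in labels)
    return (p_o - p_e) / (1 - p_e)

print(cohens_kappa(["yes", "no", "yes", "yes"], ["yes", "no", "no", "yes"]))  # 0.5
```

Note that kappa is undefined when both raters use a single label ($p_e = 1$), which is the degenerate case several questions below run into.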
449 questions
0 votes · 0 answers · 9 views
Per-Item inter rater reliability with multiple values and raters
I am trying to find the best statistical calculation to measure the agreement between 72 different raters on one item. My goal is to convey in a statistic how spread the raters are on their rating and ...
0 votes · 0 answers · 30 views
Is it possible to use unweighted Kappa for some questions and weighted for others to measure interrater agreement in the same questionnaire?
I intend to conduct an interrater agreement analysis on a questionnaire about student essays that contains binary options, mainly yes-no questions, and ordinal variables, primarily a Likert agreement ...
2 votes · 0 answers · 14 views
What is the best metric to use to discard annotators with low IAA (inter-annotator agreement) with all others?
This question is specific to ordinal data collected on the Likert scale.
What is the best metric for discarding annotators with low inter-annotator agreement (IAA) with all the others? E.g., Cohen's Kappa, ...
0 votes · 0 answers · 11 views
How is Krippendorff's alpha defined when expected disagreement is 0?
I'm wondering what the definition of Krippendorff's Alpha statistic is when the expected disagreement is 0.
The general form of Krippendorff's Alpha is:
$\alpha = 1 - \frac{D_{o}}{D_{e}}$
I'm ...
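For context, nominal $\alpha$ for two raters with complete data can be sketched in pure Python, with an explicit guard for $D_{e} = 0$; returning NaN there is a convention choice, not a settled definition (some implementations return 1.0 instead):

```python
from collections import Counter

def krippendorff_alpha_nominal(pairs):
    """Nominal Krippendorff's alpha for two raters, no missing data.
    pairs: list of (rating1, rating2) tuples, one per unit."""
    values = [v for pair in pairs for v in pair]
    n = len(values)  # total number of ratings (2 per unit)
    counts = Counter(values)
    # observed disagreement D_o: each disagreeing unit contributes
    # two ordered pairs to the coincidence matrix
    d_o = 2 * sum(a != b for a, b in pairs) / n
    # expected disagreement D_e from the marginal value counts
    d_e = sum(counts[c] * counts[k]
              for c in counts for k in counts if c != k) / (n * (n - 1))
    if d_e == 0:  # no variation at all: alpha is undefined
        return float("nan")
    return 1 - d_o / d_e
```

For example, `krippendorff_alpha_nominal([("a", "a"), ("b", "b"), ("a", "b")])` gives $4/9 \approx 0.44$, while an all-identical input hits the $D_{e} = 0$ branch.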
1 vote · 0 answers · 63 views
Calculate inter-rater noise using Kahneman's (2021) approach
I need help calculating signal and noise based on the method described by Kahneman et al. (2021) in their book "Noise." They provide a technique for quantifying noise between raters ...
3 votes · 1 answer · 67 views
How to improve testing method B, if Bland-Altman analysis shows a near-perfect correlation between the difference and the mean of methods A and B?
I have two sets of measurements (two different methods), A and B.
Correlation between the measurements is only modest.
Therefore, I wondered whether there is some inherent bias in (one or both) of ...
1 vote · 0 answers · 20 views
How to calculate an ICC for test-retest, with more than one rater
In my data, I have two raters who each gave multiple scores for a test-retest analysis. The method fulfills the requirements for test-retest (short period between the two visits).
I would like to calculate ...
0 votes · 0 answers · 13 views
Is it possible to calculate inter-rater reliability for one item rated by multiple raters with weights?
I have a survey with a number of statements that had participants categorizing the statements into one of the 4 options they were provided.
The participants were then asked to rate the confidence of ...
2 votes · 0 answers · 83 views
Comparing human estimates and algorithm on normalized score
By means of an algorithm, we want to predict a metric value $Y$ that can vary for different items $i=1,\ldots,n$. The quantities $y_i$ have been estimated by $N$ human annotators and we want to ...
2 votes · 1 answer · 63 views
method comparison/agreement - is Bland-Altman or equivalence of the mean best
I am interested in the appropriate technique for assessing the agreement of the paired values of measurements made by 2 measuring devices, an equivalence test of the mean of the difference in the ...
0 votes · 0 answers · 22 views
How should I approach calculating sample size for number of raters required for a study?
I am proposing a study for evaluating reports generated by 3 human experts vs 3 different LLMs on a set of 10 situations. Basically, we're trying to determine whether the human experts are better or if the LLMs ...
0 votes · 0 answers · 20 views
How to format data for one-way intraclass correlation coefficient?
I'm testing inter-rater reliability of an instrument on a sample of subjects from several sites. Each subject is assessed by two raters, different between each site.
According to this paper, I should ...
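Under the one-way model implied here (each subject assessed by a different set of raters), one workable layout is simply one row of $k$ scores per subject, with no rater identity tracked. A pure-Python sketch of ICC(1,1) on that layout (the numbers are made up):

```python
def icc_one_way(ratings):
    """ICC(1,1), one-way random effects.
    ratings[i] is the list of k scores for subject i; because the model is
    one-way, column position carries no rater identity."""
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    means = [sum(row) / k for row in ratings]
    # between-subject and within-subject mean squares
    msb = k * sum((m - grand) ** 2 for m in means) / (n - 1)
    msw = sum((x - m) ** 2
              for row, m in zip(ratings, means) for x in row) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)

print(icc_one_way([[1, 1], [2, 2], [3, 3]]))  # 1.0 (perfect agreement)
```

With perfect agreement $MS_W = 0$ and the ICC is 1; any within-subject disagreement pulls it down.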
2 votes · 0 answers · 40 views
Correct approach for evaluating correlation between multiple measurement devices used on the same subject (repeated measures)
My scenario is as follows. I am using three different devices (TempDeviceA, TempDeviceB, and TempDeviceC) to take the temperature of each animal in a group every day over several weeks. TempDeviceC is ...
3 votes · 1 answer · 49 views
Proof of lower bound of Cohen's kappa
Can anyone refer me to a paper or book showing that Cohen's kappa is bounded below by $-1$? I've read various papers stating this, but I have never found a complete proof. This question was asked here,...
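Not a published reference, but a compact argument can be assembled from the definitions (a sketch, worth checking against the literature): with joint proportions $p_{ij}$ and marginals $p_{i\cdot}$, $p_{\cdot i}$, write $p_o=\sum_i p_{ii}$, $p_e=\sum_i p_{i\cdot}p_{\cdot i}$, $\kappa=\frac{p_o-p_e}{1-p_e}$. Since $1-p_e>0$, the bound $\kappa\ge-1$ is equivalent to $2p_e\le 1+p_o$. Let $s_i=p_{i\cdot}+p_{\cdot i}$, so $\sum_i s_i=2$ and $0\le s_i\le 2$. Two elementary facts: (i) $p_{ii}\ge\max(0,\,s_i-1)$, because $p_{i\cdot}-p_{ii}=\sum_{j\ne i}p_{ij}\le\sum_{j\ne i}p_{\cdot j}=1-p_{\cdot i}$; (ii) $2p_{i\cdot}p_{\cdot i}\le s_i^2/2$ by AM-GM. Combining,
$2p_e-p_o\le\sum_i\left(\frac{s_i^2}{2}-\max(0,\,s_i-1)\right)$.
For $s\in[0,2]$ one has $\frac{s^2}{2}-\max(0,s-1)\le\frac{s}{2}$ (for $s\le1$ this is $s^2\le s$; for $s\in[1,2]$ it is $(s-1)(s-2)\le0$). Hence $2p_e-p_o\le\sum_i\frac{s_i}{2}=1$, i.e. $\kappa\ge-1$.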
1 vote · 0 answers · 28 views
Extending paired t-test to compare agreement between three or more analytical instruments over time
I have three instruments that measure the concentration of dust in the air, and I want to test whether the mean difference between their measurements is zero (i.e., that they give the same measurement). ...
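One common route for three instruments is a repeated-measures ANOVA; a cruder alternative is all pairwise paired $t$ statistics with a multiplicity correction such as Bonferroni. A pure-Python sketch of the paired $t$ statistic (instrument names and data are made up):

```python
import math
import statistics

def paired_t_stat(a, b):
    """t statistic for H0: mean paired difference is zero (df = n - 1)."""
    d = [x - y for x, y in zip(a, b)]
    return statistics.mean(d) / (statistics.stdev(d) / math.sqrt(len(d)))

# three instruments measured on the same days (hypothetical readings);
# with 3 pairwise tests, compare each p-value against alpha / 3 (Bonferroni),
# or fit a repeated-measures ANOVA instead of multiple t-tests
inst = {"A": [5.1, 6.0, 4.8, 5.5],
        "B": [5.0, 6.2, 4.9, 5.4],
        "C": [5.3, 6.1, 5.0, 5.6]}
t_ab = paired_t_stat(inst["A"], inst["B"])
t_ac = paired_t_stat(inst["A"], inst["C"])
t_bc = paired_t_stat(inst["B"], inst["C"])
```

Each $t$ statistic is then compared against the $t_{n-1}$ critical value; note the pairwise approach tests mean equality only, not agreement in the Bland-Altman sense.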