Questions tagged [agreement-statistics]
Agreement is the degree to which two raters, instruments, etc., give the same value when applied to the same object. Special statistical methods have been designed to quantify it.
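For instance, the simplest chance-corrected agreement statistic for two raters on nominal labels is Cohen's kappa; a minimal pure-Python sketch (the function name and example labels are illustrative):

```python
def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items (nominal data)."""
    n = len(rater_a)
    labels = set(rater_a) | set(rater_b)
    # observed proportion of agreement
    p_o = sum(x == y for x, y in zip(rater_a, rater_b)) / n
    # chance agreement from each rater's marginal label frequencies
    p_e = sum((rater_a.count(l) / n) * (rater_b.count(l) / n) for l in labels)
    return (p_o - p_e) / (1 - p_e)

print(cohens_kappa(["yes", "no", "yes", "yes"], ["yes", "no", "no", "yes"]))  # 0.5
```

Note that kappa is undefined when both raters use a single label ($p_e = 1$), which is the degenerate case several questions below run into.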
449 questions
0 votes · 0 answers · 9 views
Per-Item inter rater reliability with multiple values and raters
I am trying to find the best statistical calculation to measure the agreement between 72 different raters on one item. My goal is to convey in a statistic how spread the raters are on their rating and ...
0 votes · 0 answers · 30 views
Is it possible to use unweighted Kappa for some questions and weighted for others to measure interrater agreement in the same questionnaire?
I intend to conduct an interrater agreement analysis on a questionnaire about student essays that contains binary options, mainly yes-no questions, and ordinal variables, primarily a Likert agreement ...
2 votes · 0 answers · 14 views
What is the best metric to use to discard annotators with low IAA (inter-annotator agreement) with all others?
This question is specific to ordinal data collected on the Likert scale.
What is the best metric for discarding annotators with low inter-annotator agreement (IAA) with all the others? E.g., Cohen's Kappa, ...
0 votes · 0 answers · 11 views
How is Krippendorff's alpha defined when expected disagreement is 0?
I'm wondering what the definition of Krippendorff's Alpha statistic is when the expected disagreement is 0.
The general form of Krippendorff's Alpha is:
$\alpha = 1 - \frac{D_{o}}{D_{e}}$
I'm ...
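For context, nominal $\alpha$ for two raters with complete data can be sketched in pure Python, with an explicit guard for $D_{e} = 0$; returning NaN there is a convention choice, not a settled definition (some implementations return 1.0 instead):

```python
from collections import Counter

def krippendorff_alpha_nominal(pairs):
    """Nominal Krippendorff's alpha for two raters, no missing data.
    pairs: list of (rating1, rating2) tuples, one per unit."""
    values = [v for pair in pairs for v in pair]
    n = len(values)  # total number of ratings (2 per unit)
    counts = Counter(values)
    # observed disagreement D_o: each disagreeing unit contributes
    # two ordered pairs to the coincidence matrix
    d_o = 2 * sum(a != b for a, b in pairs) / n
    # expected disagreement D_e from the marginal value counts
    d_e = sum(counts[c] * counts[k]
              for c in counts for k in counts if c != k) / (n * (n - 1))
    if d_e == 0:  # no variation at all: alpha is undefined
        return float("nan")
    return 1 - d_o / d_e
```

For example, `krippendorff_alpha_nominal([("a", "a"), ("b", "b"), ("a", "b")])` gives $4/9 \approx 0.44$, while an all-identical input hits the $D_{e} = 0$ branch.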
1 vote · 0 answers · 63 views
Calculate inter-rater noise using Kahneman's (2021) approach
I need help calculating signal and noise based on the method described by Kahneman et al. (2021) in their book "Noise." They provide a technique for quantifying noise between raters ...
3 votes · 1 answer · 67 views
How to improve testing method B, if Bland-Altman analysis shows a near-perfect correlation between the difference and the mean of methods A and B?
I have two sets of measurements (two different methods), A and B.
Correlation between the measurements is only modest.
Therefore, I wondered whether there is some inherent bias in (one or both) of ...
1 vote · 0 answers · 20 views
How to calculate an ICC for test-retest, with more than one rater
In my data, I have two raters who each gave multiple scores for a test-retest analysis. The method fulfills the requirements for test-retest (short period between the two visits).
I would like to calculate ...
0 votes · 0 answers · 13 views
Is it possible to calculate inter-rater reliability for one item rated by multiple raters with weights?
I have a survey with a number of statements that had participants categorizing the statements into one of the 4 options they were provided.
The participants were then asked to rate the confidence of ...
2 votes · 0 answers · 83 views
Comparing human estimates and algorithm on normalized score
By means of an algorithm, we want to predict a metric value $Y$ that can vary for different items $i=1,\ldots,n$. The quantities $y_i$ have been estimated by $N$ human annotators and we want to ...
2 votes · 1 answer · 63 views
method comparison/agreement - is Bland-Altman or equivalence of the mean best
I am interested in the appropriate technique for assessing the agreement of the paired values of measurements made by 2 measuring devices, an equivalence test of the mean of the difference in the ...
0 votes · 0 answers · 22 views
How should I approach calculating sample size for number of raters required for a study?
I am proposing a study for evaluating reports generated by 3 human experts vs 3 different LLMs on a set of 10 situations. Basically, we're trying to determine whether the human experts are better or if the LLMs ...
0 votes · 0 answers · 20 views
How to format data for one-way intraclass correlation coefficient?
I'm testing inter-rater reliability of an instrument on a sample of subjects from several sites. Each subject is assessed by two raters, different between each site.
According to this paper, I should ...
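Under the one-way model implied here (each subject assessed by a different set of raters), one workable layout is simply one row of $k$ scores per subject, with no rater identity tracked. A pure-Python sketch of ICC(1,1) on that layout (the numbers are made up):

```python
def icc_one_way(ratings):
    """ICC(1,1), one-way random effects.
    ratings[i] is the list of k scores for subject i; because the model is
    one-way, column position carries no rater identity."""
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    means = [sum(row) / k for row in ratings]
    # between-subject and within-subject mean squares
    msb = k * sum((m - grand) ** 2 for m in means) / (n - 1)
    msw = sum((x - m) ** 2
              for row, m in zip(ratings, means) for x in row) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)

print(icc_one_way([[1, 1], [2, 2], [3, 3]]))  # 1.0 (perfect agreement)
```

With perfect agreement $MS_W = 0$ and the ICC is 1; any within-subject disagreement pulls it down.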
2 votes · 0 answers · 40 views
Correct approach for evaluating correlation between multiple measurement devices used on the same subject (repeated measures)
My scenario is as follows. I am using three different devices (TempDeviceA, TempDeviceB, and TempDeviceC) to take the temperature of each animal in a group every day over several weeks. TempDeviceC is ...
3 votes · 1 answer · 49 views
Proof of lower bound of Cohen's kappa
Can anyone refer me to a paper or book showing that Cohen's kappa is bounded below by $-1$? I've read various papers stating this, but I have never found a complete proof. This question was asked here,...
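Not a published reference, but a compact argument can be assembled from the definitions (a sketch, worth checking against the literature): with joint proportions $p_{ij}$ and marginals $p_{i\cdot}$, $p_{\cdot i}$, write $p_o=\sum_i p_{ii}$, $p_e=\sum_i p_{i\cdot}p_{\cdot i}$, $\kappa=\frac{p_o-p_e}{1-p_e}$. Since $1-p_e>0$, the bound $\kappa\ge-1$ is equivalent to $2p_e\le 1+p_o$. Let $s_i=p_{i\cdot}+p_{\cdot i}$, so $\sum_i s_i=2$ and $0\le s_i\le 2$. Two elementary facts: (i) $p_{ii}\ge\max(0,\,s_i-1)$, because $p_{i\cdot}-p_{ii}=\sum_{j\ne i}p_{ij}\le\sum_{j\ne i}p_{\cdot j}=1-p_{\cdot i}$; (ii) $2p_{i\cdot}p_{\cdot i}\le s_i^2/2$ by AM-GM. Combining,
$2p_e-p_o\le\sum_i\left(\frac{s_i^2}{2}-\max(0,\,s_i-1)\right)$.
For $s\in[0,2]$ one has $\frac{s^2}{2}-\max(0,s-1)\le\frac{s}{2}$ (for $s\le1$ this is $s^2\le s$; for $s\in[1,2]$ it is $(s-1)(s-2)\le0$). Hence $2p_e-p_o\le\sum_i\frac{s_i}{2}=1$, i.e. $\kappa\ge-1$.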
1 vote · 0 answers · 28 views
Extending paired t-test to compare agreement between three or more analytical instruments over time
I have three instruments that measure the concentration of dust in the air, and I want to test whether the mean difference between their measurements is zero (i.e., that they give the same measurement). ...
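One common route for three instruments is a repeated-measures ANOVA; a cruder alternative is all pairwise paired $t$ statistics with a multiplicity correction such as Bonferroni. A pure-Python sketch of the paired $t$ statistic (instrument names and data are made up):

```python
import math
import statistics

def paired_t_stat(a, b):
    """t statistic for H0: mean paired difference is zero (df = n - 1)."""
    d = [x - y for x, y in zip(a, b)]
    return statistics.mean(d) / (statistics.stdev(d) / math.sqrt(len(d)))

# three instruments measured on the same days (hypothetical readings);
# with 3 pairwise tests, compare each p-value against alpha / 3 (Bonferroni),
# or fit a repeated-measures ANOVA instead of multiple t-tests
inst = {"A": [5.1, 6.0, 4.8, 5.5],
        "B": [5.0, 6.2, 4.9, 5.4],
        "C": [5.3, 6.1, 5.0, 5.6]}
t_ab = paired_t_stat(inst["A"], inst["B"])
t_ac = paired_t_stat(inst["A"], inst["C"])
t_bc = paired_t_stat(inst["B"], inst["C"])
```

Each $t$ statistic is then compared against the $t_{n-1}$ critical value; note the pairwise approach tests mean equality only, not agreement in the Bland-Altman sense.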