$\begingroup$

I work for a public child welfare agency. We have something called a level of care tool that categorizes children based on the amount of care they need. This helps us identify appropriate care options for the child. A level of care 1 is a child who needs the "standard" amount of care (these are usually children who are not developmentally delayed, who do not have severe behavioral issues, and so on). A level of care 10 is a child who needs constant care.

The level of care tool that categorizes these children is essentially a survey about behavioral issues, diagnoses, history of trauma, and delays, as well as strengths.

We are re-designing the survey to more accurately reflect the needs of children, and I have been asked to help assign weights to the questions so that the categorization is accurate. I don't have a PhD or even a master's in survey design, psychometrics, or item response theory, and I feel a bit out of my depth. Is there anyone out there who has worked on something similar and can share their approach? Are there any books or articles that you'd recommend?

$\endgroup$
  • $\begingroup$ Hi @ra_learns, welcome to CV! While I probably can't be very useful with a response, perhaps you can clarify, in the question, what "I have been asked to help assign weights to the questions so that the categorization is accurate" means. For example, do you have a bunch of questions which you then use to generate the Level of Care? And how do the questions relate to the Level of Care at the moment? This is a bit unclear to people (like me) who don't work in your field. $\endgroup$
    – Alex J
    Commented Mar 7 at 0:19

2 Answers

$\begingroup$

I agree with @Preston Botter that this is an advanced application of IRT, and I support the advice to seek outside consultation.

However, I am aware of a possible implementation: the R package TAM provides a (not officially supported or documented) solution to this problem via the so-called Q-matrix (the same idea may well work in other software packages).

First, note that in the case of the two-parameter logistic (2PL) model, the factor loadings (the discrimination parameters in IRT) are estimated so as to best fit the data. More importantly, the discrimination parameters (the B-matrix in TAM) act as the relative weights of the respective items in the total score.
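
To make this concrete (standard 2PL notation, not TAM-specific): the probability of endorsing item $i$ is

$$P(X_i = 1 \mid \theta) = \frac{\exp\big(a_i(\theta - b_i)\big)}{1 + \exp\big(a_i(\theta - b_i)\big)},$$

and, for fixed item parameters, the likelihood of $\theta$ depends on the response pattern only through the weighted sum $\sum_i a_i x_i$. In this sense each item enters the score with weight $a_i$; the Rasch model is the special case $a_i = 1$ for all items.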

Next, the Q-matrix is typically used as a binary matrix that assigns items to different latent dimensions. In TAM, however, it is possible to put values other than zero or one into the Q-matrix. The values of Q are multiplied by the factor loadings (the discrimination parameters in IRT; fixed at 1 in the Rasch model). Thus, we can force a loading other than 1 onto items in the Rasch model (or multidimensional versions thereof).

> library(TAM)
> # simulated Rasch data that ships with TAM; keep the first 15 items
> data(data.sim.rasch)
> data.sim <- data.sim.rasch[, 1:15]
> head(data.sim)
     I1 I2 I3 I4 I5 I6 I7 I8 I9 I10 I11 I12 I13 I14 I15
[1,]  1  1  1  1  1  1  0  1  0   1   1   0   0   1   1
[2,]  0  1  0  0  0  1  1  1  0   1   1   1   1   1   0
[3,]  1  1  1  1  1  1  1  1  1   1   1   1   1   1   1
[4,]  1  0  1  1  0  1  1  1  0   0   1   1   1   0   0
[5,]  1  1  1  1  1  1  1  1  0   0   1   1   1   1   1
[6,]  1  1  1  0  0  1  0  0  0   0   0   0   0   0   0
> # fit a Rasch model, but force item loadings of 0.5, 1, and 2
> # (five items each) via a non-binary Q-matrix
> mod1 <- TAM::tam.mml(resp = data.sim,
+                      Q = data.frame("Loading" = c(rep(.5, 5), rep(1, 5), rep(2, 5))),
+                      verbose = FALSE)
> # the forced loadings show up in the B.Cat1.Dim1 column
> mod1$item
    item    N      M   xsi.item AXsi_.Cat1 B.Cat1.Dim1
I1    I1 2000 0.8270 -1.6257163 -1.6257163         0.5
I2    I2 2000 0.8145 -1.5382826 -1.5382826         0.5
I3    I3 2000 0.8000 -1.4422434 -1.4422434         0.5
I4    I4 2000 0.7860 -1.3542285 -1.3542285         0.5
I5    I5 2000 0.7725 -1.2731382 -1.2731382         0.5
I6    I6 2000 0.7710 -1.3979913 -1.3979913         1.0
I7    I7 2000 0.7430 -1.2253834 -1.2253834         1.0
I8    I8 2000 0.7435 -1.2283610 -1.2283610         1.0
I9    I9 2000 0.7295 -1.1462542 -1.1462542         1.0
I10  I10 2000 0.6945 -0.9509658 -0.9509658         1.0
I11  I11 2000 0.6905 -1.2062157 -1.2062157         2.0
I12  I12 2000 0.6615 -1.0102233 -1.0102233         2.0
I13  I13 2000 0.6515 -0.9443237 -0.9443237         2.0
I14  I14 2000 0.6415 -0.8791659 -0.8791659         2.0
I15  I15 2000 0.6000 -0.6152801 -0.6152801         2.0

A word of caution: I am not sure what will happen if both a relative loading structure via the Q-matrix is supplied and the discrimination parameters are estimated.
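
If you wanted to probe that interaction empirically, one quick (and admittedly untested) check might be to fit TAM's 2PL variant with the same non-binary Q-matrix and inspect where the loadings end up, e.g.:

> # untested sketch: does the non-binary Q still scale the loadings
> # when the discrimination parameters are freely estimated?
> mod2 <- TAM::tam.mml.2pl(resp = data.sim,
+                          Q = data.frame("Loading" = c(rep(.5, 5), rep(1, 5), rep(2, 5))),
+                          irtmodel = "2PL",
+                          verbose = FALSE)
> mod2$item

Whether the Q values then act as fixed multipliers on the estimated discriminations, or are simply absorbed by the estimation, is exactly the undocumented part.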

$\endgroup$
  • $\begingroup$ +1 Good idea @Tom. I was not thinking about diagnostic classification models (because the question referred to IRT), though it is still a good idea! $\endgroup$ Commented Mar 7 at 22:56
  • 1
    $\begingroup$ Q matrices are present in IRT models as well in order to model multidimensional IRT models; also TAM is the IRT tool in Alex' package framework (that does indeed include the CDM package as well; which makes greater use of the Q matrix); thanks for the +1! $\endgroup$
    – Tom
    Commented Mar 8 at 9:07
  • $\begingroup$ Thank you so much Tom! This is very helpful as I do program in R primarily. I appreciate your time and assistance! $\endgroup$
    – ra_learns
    Commented Mar 19 at 17:38
$\begingroup$

Preamble

It is possible to incorporate item-level weights (e.g., a correct response to item 1 is worth 2 points, a correct response to item 2 is worth 1 point, etc.) within an item response theory (IRT) framework. However, I must say that this is a rather advanced application of IRT, so you may need to look for external help. The difficulty mainly comes from the fact that doing this takes extensive programming and precise knowledge of the intricacies of IRT, as no software I am aware of can do it on its own (and even with easy-to-use software the task would be difficult).

Your Question

Yes, assigning weights to items is possible. This can be done by extending the Lord-Wingersky algorithm (Lord & Wingersky, 1984). Generally speaking, the Lord-Wingersky algorithm is a way to obtain IRT scale scores that correspond to number-correct (or summed) scores. Implicit in the original formulation of the algorithm is that each item is weighted equally (i.e., every correct response counts the same). Without going into too much detail, the algorithm can be modified to weight items however one wants. Stucky (2009) does something similar in his master's thesis, though he differentially weights sections of a test, as opposed to individual items. For a general introduction to the algorithm, I recommend Stucky (2009), as well as Cai (2015) and Huang & Cai (2021).
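
For intuition, here is a minimal sketch of the recursion with positive integer item weights, written in R since that seems to be your working language. The function name and interface are my own invention, not from any package; the only change relative to the classic algorithm is that a correct response shifts the score by w[i] instead of by 1.

# Weighted Lord-Wingersky recursion for dichotomous 2PL items.
# theta: a single latent-trait value; a, b: item discriminations and
# difficulties; w: positive integer item weights. Returns the conditional
# distribution P(S = s | theta) of the weighted summed score, s = 0..sum(w).
lw_weighted <- function(theta, a, b, w) {
  p <- 1 / (1 + exp(-a * (theta - b)))  # 2PL response probabilities at theta
  lik <- c(1, rep(0, sum(w)))           # lik[s + 1] = P(score = s); start at 0
  max_s <- 0                            # highest score reachable so far
  for (i in seq_along(p)) {
    new <- lik * (1 - p[i])             # incorrect response: score unchanged
    idx <- seq_len(max_s + 1)           # indices of currently reachable scores
    new[idx + w[i]] <- new[idx + w[i]] + lik[idx] * p[i]  # correct: add w[i]
    lik <- new
    max_s <- max_s + w[i]
  }
  lik
}

# Example: score distribution at theta = 0 for three items weighted 2, 1, 1
round(lw_weighted(theta = 0, a = c(1.2, 0.8, 1.5), b = c(-0.5, 0, 0.5), w = c(2, 1, 1)), 3)

Integrating these conditional distributions over a population distribution for $\theta$ then yields the ingredients for converting weighted summed scores to scale scores, which is what the papers below develop in full generality.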

References

Cai, L. (2015). Lord–Wingersky algorithm version 2.0 for hierarchical item factor models with applications in test scoring, scale alignment, and model fit testing. Psychometrika, 80, 535-559.

Huang, S., & Cai, L. (2021). Lord–Wingersky algorithm version 2.5 with applications. Psychometrika, 86, 973-993.

Lord, F. M., & Wingersky, M. S. (1984). Comparison of IRT true-score and equipercentile observed-score "equatings". Applied Psychological Measurement, 8(4), 453-461.

Stucky, B. D. (2009). Item response theory for weighted summed scores (Master's thesis, The University of North Carolina at Chapel Hill).

$\endgroup$
  • 1
    $\begingroup$ Thank you so much for these references and for your helpful insights, Preston. I have begun the journey of reading these articles. I so appreciate your time and assistance. $\endgroup$
    – ra_learns
    Commented Mar 19 at 17:37
  • $\begingroup$ @ra_learns No problem, I'm glad I could help! $\endgroup$ Commented Mar 19 at 17:39
