
The California community college where I teach has a prescribed system for collecting and analyzing data on students' success and then disaggregating the data by race. As a department, we are required to use these numbers to calculate an "achievement gap," defined as the highest success rate minus the lowest success rate. This tends to produce statistically silly results, since many of the racial categories are very small. For example, we may have one student in our department in a given year who clicked on "other non-white" to describe their ethnicity, so that racial category is then assigned a success rate of either 0% or 100%, depending on whether that particular student passed the class they took. I have raised this issue with the relevant faculty committee but was rebuffed, apparently because any proposal to change these statistical practices was perceived by some members of the committee as bad for promoting racially equitable outcomes. Disaggregation in general is required by our accrediting body, but the requirements are not specific: e.g., we could disaggregate according to categories that have nothing to do with race, or we could use race but not define an achievement gap in this way. Their requirement is:

The institution disaggregates and analyzes learning outcomes and achievement for subpopulations of students. When the institution identifies performance gaps, it implements strategies, which may include allocation or reallocation of human, fiscal and other resources, to mitigate those gaps and evaluates the efficacy of those strategies. (2014 ACCJC Standards, Standard I.B.6)
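To make the freshman-statistics point concrete, here is a minimal sketch (the numbers are invented for illustration). A one-student category is forced to report 0% or 100%, while a standard small-sample interval, such as the Wilson score interval, shows that such an estimate carries almost no information:

```python
import math

def success_rate(passed, total):
    """Raw success rate as reported under the prescribed system."""
    return passed / total

def wilson_interval(passed, total, z=1.96):
    """Approximate 95% Wilson score interval for a proportion,
    a standard choice for small samples."""
    p = passed / total
    denom = 1 + z**2 / total
    center = (p + z**2 / (2 * total)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2))
    return max(0.0, center - half), min(1.0, center + half)

# One student in a category: the reported "success rate" must be 0% or 100%...
print(success_rate(1, 1))      # 1.0
# ...but the interval spans most of the possible range:
print(wilson_interval(1, 1))   # roughly (0.21, 1.0)
```

With one observation, the interval covers everything from roughly 21% to 100%, so subtracting such a number from another department's rate to form an "achievement gap" is not a meaningful calculation.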

Googling shows that the national K-12 establishment seems to have considered these issues very carefully, including consideration of privacy issues that arise when a racial group is so small that individuals can be identified. The 2015 Every Student Succeeds Act (ESSA) makes the specific requirement that:

the statistical rigor that informed the selection of the minimum n-size should be documented and how this minimum number is statistically sound should be described.

However, I'm not turning up much information relating to higher education. I came across legislation in my state that, in its original version, would have required statistical significance in both health and educational statistics, but the educational part was stripped out during debate by the Senate before the bill was passed into law.

Are there laws or recognized best practices that apply at the college level? The whole thing seems to me like a pretty trivial exercise in freshman statistics, but it seems that mathematical facts and competence, in and of themselves, do not count for much unless stated by some authoritative body. I would think that academic publications on this topic would also command at least some respect.

[EDIT] After some discussions in comments with Buffy, I think the following clarification may be helpful. I do not advocate the use of these statistics for these purposes, so I'm not asking for a magic prescription as to how to use them so that they will be valid indicators of something. We are currently being forced to use them for a particular purpose, as part of a required program review process that occurs at intervals of a few years. These statistics are used by administrators for making decisions that affect us. For example, if we want money to be spent on lab apparatus or a new tenure-track position, we are expected to explain why that spending will improve our students' success, according to various measures, including the "achievement gap" described above.

What I'm hoping for is basically suggestions for harm reduction, such as publications that I might be able to point to in order to show that the practices being imposed on us are not in line with best practices. It may be that the entire exercise is so totally flawed as to be unusable for any serious purpose. If so, then it would be great to have an answer making the case that that is so, but in order to be helpful, such an answer should point to something like a peer-reviewed paper, procedures at a specific institution that the answerer considers to be a model, or guidelines by a national organization. Our accrediting body requires us to do something with data like this, so I don't think it's going to fly if I tell my colleagues that we need to lobby for change at our accrediting body because anonymous-internet-person makes a good argument.

  • It would be novel for the ACCJC to use a robust methodology.
    – user14140
    Commented Jan 7, 2019 at 1:04

2 Answers


Basing a proportion on one observation is silly. If your funding body needs a reference for that, then they are a lost cause. It is unlikely any argument based on that will reach them, so I would (initially) forget about that angle.

One possible alternative would be to require that the results be anonymous, i.e., that individual students' results cannot be recovered from the numbers you report. Statistical agencies often invoke exactly this requirement to avoid reporting results for small groups. This style of argument may fit the bureaucratic mindset of your funding agency better.
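A minimal sketch of what such a small-cell suppression rule looks like in practice (the threshold of 10 and the category labels are invented for illustration; actual agencies document and justify their own minimum n-size, as the ESSA language quoted in the question requires):

```python
# Illustrative minimum cell size; real agencies choose and justify their own.
MIN_N = 10

def report(group_counts):
    """Return reported success rates, suppressing cells below MIN_N.

    group_counts maps a category label to a (passed, total) pair.
    Suppressed cells are reported as None rather than as a rate,
    so individual students cannot be identified from the numbers.
    """
    out = {}
    for group, (passed, total) in group_counts.items():
        if total < MIN_N:
            out[group] = None  # suppressed: too few students to report
        else:
            out[group] = passed / total
    return out

rates = report({"A": (45, 60), "other non-white": (1, 1)})
print(rates)  # {'A': 0.75, 'other non-white': None}
```

The one-student category from the question simply drops out of the report, which sidesteps both the privacy problem and the 0%/100% artifact at once.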


I think you are missing the purpose of gathering such data. While statistics has quite valid procedures and rules for measuring the probability that a sample accurately reflects characteristics of a population, that isn't the purpose here. In particular, your samples most likely aren't random, so even with larger samples the interpretation becomes problematic.

Even more problematic is the question of how the population itself is determined. If it is just the local students, then you aren't in the world of inferential statistics at all, as you are querying the entire population.

However, the purpose of gathering such numbers is to provide warnings when things seem out of balance leading to possible policy steps to correct them. All such things are fraught, of course, but they are really just triggers that might generate action. On the flip side, they can be "soothing tones" that suggest all is well when perhaps it isn't.

In the best case, "bad" numbers lead to discussions and further "research," which may lead to corrective action if it is deemed necessary. But scientific "accuracy" isn't necessary unless the numbers are being built into automatic actions applied without thought.

However, with such small numbers, one should not try to publish "scientific-ish" papers suggesting that something has been "proven" when it really hasn't.

  • This answer seems to be in two halves. (1) You point out that there are issues with sampling procedures that would tend to make the results statistically suspect, regardless of sample size. This is a great point, thanks, but I don't think it helps with the problem that our institutionalized practices are already statistically wrong because of sample size. (2) In the second half, you suggest that statistically insignificant results can generate discussion. I don't follow. If a result is at the 0.2 sigma level, it's a meaningless fluctuation, not an indication that we should investigate.
    – user1482
    Commented Aug 26, 2018 at 21:33
  • @BenCrowell Read the paragraph about populations. This isn't statistics at all. There is no basis that I see here for even discussing sigma levels. If your student body has 3 students who identify as (just for fun) gamers, and all of them have failed, what can you conclude about gamers? You can only conclude something about those three people in your population. You aren't trying to extrapolate to the total California student population, I expect. If you were, you would need to use some sort of randomized selection, which this isn't. You are misusing statistics in a classic way.
    – Buffy
    Commented Aug 26, 2018 at 21:40
  • You are misusing statistics in a classic way. I don't quite understand what you mean here, since I don't advocate using these data in any way, for any purpose. I'm describing an administrative system that requires us to (mis)use them in a particular way. In the second half of your answer, you do seem to advocate using these data in a particular way (to initiate discussions, etc.), which seems odd since you seem only to give reasons why the data would not be meaningful. You have not given any reason why they would be meaningful if used for some purpose -- as you seem to claim, and I don't.
    – user1482
    Commented Aug 26, 2018 at 21:51
  • What I'm saying here is what your school is doing is not statistics at all. It is gathering data about a population, but not statistical data. Therefore the "computations" of "statistical measures" is entirely meaningless. You are plugging numbers in to formulas without understanding those formulas and their meaning. It is just noise. What you have is a census, not a statistical model.
    – Buffy
    Commented Aug 26, 2018 at 22:02
  • Graduation rates clearly are descriptive statistics for the population of students at that institution and can serve many useful purposes even when they can't be used for statistical inference about a much larger population. Commented Aug 26, 2018 at 22:37
