31

By definition, grading by a curve usually means that students are assigned grades based on the statistical distribution of the test/exam results. No matter what, say, 20% of students will always fail, and only, say, 10% will get a perfect mark.
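The fixed-quota scheme described above can be sketched in Python. This is a minimal illustration under stated assumptions, not any particular institution's method: the grade bands and their fractions are hypothetical, and ties are broken arbitrarily by sort order. It shows the property criticized below: grades depend only on rank, so two classes with very different raw scores receive identical grade lists.

```python
def quota_grades(scores, quotas):
    """Assign letter grades purely by rank, filling fixed quotas.

    quotas: (grade, fraction) pairs ordered best-first; fractions should sum to 1.
    The bands used below are hypothetical, chosen only for illustration.
    """
    n = len(scores)
    # Rank students from highest to lowest raw score (ties broken arbitrarily).
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    grades = [None] * n
    start, cumulative = 0, 0.0
    for grade, fraction in quotas:
        cumulative += fraction
        end = min(n, round(cumulative * n))
        for i in order[start:end]:
            grades[i] = grade
        start = max(start, end)
    # Rounding leftovers, if any, fall into the lowest band.
    for i in order[start:]:
        grades[i] = quotas[-1][0]
    return grades


# Hypothetical quotas echoing the question: 10% top marks, 20% failures.
QUOTAS = [("A", 0.10), ("B", 0.30), ("C", 0.40), ("F", 0.20)]

strong_class = [95, 94, 93, 92, 91, 90, 89, 88, 87, 86]
weak_class = [s - 50 for s in strong_class]

print(quota_grades(strong_class, QUOTAS))
print(quota_grades(weak_class, QUOTAS))  # identical list: only rank matters
```

In the strong class, the student with 87/100 ends up in the failing band, while in the weak class a 37/100 earns exactly the same grade.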

I see a number of serious problems with this, and I fail to see anything in its favor. In my view, a fair system requires the following, and none of these conditions is fulfilled by grading by curve:

  1. A test/exam should have clear and absolute rules and requirements. Here, the students do not know the exact requirements for the result they want, as the point thresholds are, obviously, only known after the test/exam.
  2. The grade of the test should directly correspond to how much of the course material the student learned and understood. The fact that the same proportion of comprehended material can give an A one year and a C the next shows that the requirements of the course itself are also very unclear.
  3. Similarly, the grade should directly and absolutely reflect the knowledge of the student. An A+ should be worth the same every time on the same test. But here, the grade only reflects the student's knowledge relative to their classmates. It can't be correlated with the amount of knowledge learned.
  4. The grade should only be influenced by factors under the student's direct control. Here, the mark depends on factors like how "bright" the other students are, how much effort they put into preparation, and even whether they cheated. Hence, the student's mark is influenced by circumstances completely outside of his control, and unrelated to his knowledge.
  5. No matter the collective results, a given number of students will inevitably fail. If almost everyone is a genius, there will be the same number of failures as if everyone did poorly. Good students may fail, and bad students may pass. As with the previous point, a student shouldn't face the negative, possibly serious, consequences of failing just because, by chance, there were a lot of very bright students in the class.
  6. It also creates and encourages an unhealthy and harmful class dynamic. Ideally, students help each other prepare. If an otherwise smart student doesn't understand some part and gets stuck, others explain it. If a student missed a class for a valid reason, classmates share and explain the missed material. But grading by curve creates an artificial race, in which students actively harm themselves by helping others. By reducing the total knowledge the class gains, this goes against the very spirit of education.

Given all these problems, I'm struggling to come up with any argument in favor of grading by curve. In fact, I see only two cases where it may be used, and both of them are failures of administration:

  1. The department expects an exact number of students to pass the course. This is against the spirit of education IMO. If a student is good enough, he should pass. If not, he shouldn't. Regardless of others.
  2. The responsible teacher failed to set exact, absolute, and clear requirements, and/or is too lazy or unable to do so. Because of this failure, the students are exposed to an unfair grading system.

Am I missing something here? Are there any objective arguments in favour of grading by curve, and what are these? If not, or they are weak, what are the reasons it's still used, apparently widely?

  • 6
    "The department expects an exact number of students to pass the course": alas, this is exactly what many departments expect, because virtually everywhere nowadays having more students who complete their studies in a reasonably short time means more funding for the university (whether from student fees, government funding, improvement in the international rankings, etc.) Commented May 1, 2021 at 13:52
  • 9
    "No matter what, say 20% of students will always fail, and only say 10% will get a perfect mark" - this is not a form of "curved grading" I have ever seen; usually the curve is only to increase and never to decrease the grades. It sets the top grades to the top scores in the class presuming that scores lower than perfect are more due to the difficulty of testing or inadequacy of instruction.
    – Bryan Krause
    Commented May 1, 2021 at 14:11
  • 3
    Does this answer your question? Why do educators use curve to adjust the performance?
    – Bryan Krause
    Commented May 1, 2021 at 14:12
  • 11
    Many people writing about education--notably Alfie Kohn--have made the observation that a lot of educational practices are intended to sort students by "ability" rather than to educate them. Curving is one of those unfortunate practices. Commented May 1, 2021 at 22:12
  • 2
    @AzorAhai-him- That was just a number I put there; I don't know what's common. Anecdotally, we had some courses with absolute grading and almost a 50% fail rate. Rightfully, I must say: it was a freshman course, and most students hadn't yet figured out what university is about apart from carpe diem.
    – Neinstein
    Commented May 3, 2021 at 21:57

7 Answers

9

I agree with you for two types of courses:

  • Very small classes with fewer than, say, ten students. It is very possible that these students all deserve As or Fs and we should not try to impose a bell curve where there is none.
  • Very large classes like calculus I and physics I, particularly when there are multiple sections of each class. These classes are taught so often that it is very possible to design a sensible, static "absolute" grading scheme. Such a scheme will also give highest grades to the students who know the most, without regard to their instructors' competence or peers' ability (among the other advantages you list).

But for classes that do not fall in either of these categories, I submit that "curving" and setting "absolute" standards are just two ways of framing the same practice (and "curving" is more honest). It is a statistical reality that most medium-sized and large courses have students that master, partially master, and fail to master the material, and the instructor naturally wants the grade distribution to reflect this. So "absolute grades" in this case are somewhat disingenuous -- if everyone is failing or getting an A at midterm, for example, the instructor is likely to make some changes (e.g., making the final exam easier or harder than anticipated) in order to force the desired histogram. In this regard, explicitly curving is much more transparent; we are simply admitting that we will make changes as needed to ensure the desired curve. Further, even "curving" does leave some discretion to the instructor: if I feel that I have a great group of students that has learned a lot and is working hard, I might curve such that the average is a B+ rather than a B. Similarly, I would always consider the tails of the distribution manually.

Given that they are equivalent, I agree that "absolute" grading gives a healthier class dynamic. As a student, I recognized that both grading schemes were effectively the same thing, and always found purportedly "absolute" grades somewhat disingenuous (unless the professor had actually taught the course often enough to be able to say that the exams were already written and nothing would change under any circumstances). But as a teacher, I learned quickly that most students do not realize this, and that they tend to respond better to the purportedly absolute system. In the same way, a very hard test with a massive curve is mathematically equivalent to an easy test with an unforgiving curve, but students respond much better to the latter. Irrational, but there you are.

Personally, I used both. When I was teaching a course for the first time, it was difficult to predict what grade distributions would look like, so I would curve. But I would also make adjustments to the curve based on my observations. When I was on surer footing, I would say things like "900 points is a guaranteed A, but I might give As for fewer points; we'll see how it goes". I found that this was both honest and led to a healthy class dynamic.
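One common way to implement a curve such as "the average is a B+ rather than a B" is a linear standard-score rescaling: shift and stretch the raw scores so the class mean and standard deviation land on values the instructor chooses. The sketch below assumes that technique and uses hypothetical targets on a 0 to 100 scale; the answer itself does not say how its curve was computed. Raising the target mean moves every student up by the same amount and leaves the ranking unchanged.

```python
from statistics import mean, pstdev


def rescale(scores, target_mean, target_sd):
    """Linearly map raw scores so the class mean and spread hit chosen targets.

    This is a standard-score (z-score) curve: ranks are preserved and only the
    location and spread of the distribution change.
    """
    m, sd = mean(scores), pstdev(scores)
    if sd == 0:
        # Everyone scored the same, so there is no spread to rescale.
        return [float(target_mean)] * len(scores)
    return [target_mean + target_sd * (s - m) / sd for s in scores]


raw = [38, 45, 52, 60, 71]              # hypothetical raw marks on a hard exam
b_average = rescale(raw, 83, 7)         # hypothetical target: class average "B"
b_plus_average = rescale(raw, 87, 7)    # the more generous "B+" version

print([round(x, 1) for x in b_average])
print([round(x, 1) for x in b_plus_average])
```

Which numeric targets correspond to which letters is an instructor's choice; the point is that "curving generously" amounts to picking a higher target mean, with any manual review of the tails done afterwards.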

  • 10
    Actually, I'd be incredibly happy to give all A's. Sadly, I never found it possible except with doctoral students.
    – Buffy
    Commented May 1, 2021 at 21:40
  • 1
    Well, I am assuming the class is more-or-less typical; of course I’d be delighted to have a genius class that universally masters the material and all deserve As...I would also like a friendly pet dragon :-)
    – cag51
    Commented May 2, 2021 at 0:41
  • 2
    If everyone is getting an A or an F by the midterm, the instructor seriously needs to look at whether the course objectives are clear and whether the assessments are aligned to the objectives. Yes, it can happen that a whole class is awesome or horrible, but it's much more likely the instruction is off track and should have been fixed before the midpoint of the term.
    – Kathy
    Commented May 3, 2021 at 14:29
  • 4
    "we all want to have a nice bell curve that separates the top, average, and weak students" ...ehh, no? I would like to have students that are each, individually, able to demonstrate a certain level of understanding and/or skill.
    – Servaes
    Commented Jan 30, 2023 at 6:59
  • 2
    I think that the argument for "they are equivalent" is wrong and also the claim itself is wrong. I also think that many people do not have "a desired curve" at all. I have standards that I would like the students to reach. That's it. No curve.
    – Dirk
    Commented Jun 6 at 11:18
20

My firm belief and policy is that a student's grade should depend on their own efforts and nothing else. In that case, curving the class is actually just wrong.

But my grading policy is pretty different from most. First, I use (used, actually, as I'm retired) cumulative grading. Every student task was assigned a point value and those added to, say, 1000. If you did a 100 point task, you might get 90 points based on assessments of quality. The breaks between grades were defined, say 900 to earn an A.

Second, I permitted rework if a student earned less than they expected to earn, but rework only got you back part of the lost marks. Therefore, I didn't get complaints about strict grading. "If you want more points on this, do a better job of it". This also allowed me to encourage students to do better work and rethink poor assumptions.

Third, I assumed that I wasn't perfect and that I occasionally was overly strict, so a student never missed a final grade by what I considered an infinitesimal amount. 899 points didn't mean a B.

When it came time to assign final grades, I'd look at the earned distribution and decide whether it represented what I thought the students had actually learned. If I thought I'd been too strict along the way, I might adjust all grades upwards (only upwards) by a few points.

I had the reputation of being very demanding, and I was, both in the quantity and the difficulty of the work. But I was a bit less strict than my reputation suggested. After the last class, students were never disappointed when they saw their final grade. They were able to compute it along the way, and they might learn that they had done a bit better than they expected.
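The cumulative scheme described above can be sketched as follows. Only the 1000-point total and the 900-point A come from the answer; the lower grade floors, the share of lost marks that rework can recover (`recovery`), and the near-miss `tolerance` are hypothetical parameters chosen for illustration.

```python
# Grade floors on a 1000-point scale. Only "900 for an A" comes from the answer;
# the lower floors are hypothetical.
CUTOFFS = [("A", 900), ("B", 800), ("C", 700), ("D", 600)]


def rework_score(original, max_points, new_quality, recovery=0.5):
    """Points for a task after rework.

    new_quality: fraction of max_points the reworked version would merit.
    recovery: hypothetical cap on the share of lost marks that can be won back.
    Rework never lowers the original score.
    """
    merited = new_quality * max_points
    if merited <= original:
        return original
    lost = max_points - original
    return min(merited, original + recovery * lost)


def final_grade(total, bump=0, tolerance=0):
    """Letter grade from cumulative points.

    bump: across-the-board adjustment; only upward adjustments are applied.
    tolerance: hypothetical slack so a near miss still earns the higher grade.
    """
    adjusted = total + max(bump, 0)
    for letter, floor in CUTOFFS:
        if adjusted >= floor - tolerance:
            return letter
    return "F"


print(rework_score(60, 100, 0.95))    # capped at 60 + 0.5 * 40 = 80
print(final_grade(899))               # strict floors: "B"
print(final_grade(899, tolerance=1))  # near miss forgiven: "A"
print(final_grade(899, bump=-20))     # downward bumps are ignored: "B"
```

Because the tolerance is applied once against fixed floors, it does not cascade: with `tolerance=1`, 898 points is still a B.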


No student should suffer in a class simply because, just by chance, there are a number of genius level other students.

Curving is, I think, an admission of failure by the admissions system itself, which tries to predict that admitted students will be successful. If the system is set up so that some are guaranteed to fail, then something is extremely wrong.

Expecting that the distribution from one year to another should be the same in an individual course is also foolish. The students aren't randomly selected from a known distribution in large enough numbers (for a single class) that statistical assumptions have any validity. Some years the students just work harder than in others. Some years they have to deal with things (pandemics, say) that make their work much more difficult.

Grade individuals, based on their performance. Make it possible that the performance can be good or can improve. Don't assume that the students in your class don't really need to be there. Be a teacher, not a grader.


Also see this question and my answer there.


Note: To make rework possible and not overly burdensome on the grader, students turned in work on paper. For rework they also turned in (in a folder) all of the previous versions that had been commented by the grader. They also highlighted, in the new work, changes to the most recent version. It was easy and quick to see if additional points should be awarded and easy to mark up the latest document with suggestions, etc.

This may not scale beyond 30 students per grader, I realize. I know that at 40, scale starts to be an issue.

  • "... a student never missed a final grade by what I considered an infinitesimal amount. 899 points didn't mean a B." But that then means an A is from 899, not 900, and then your rule becomes "898 points didn't mean a B", which in turn means an A is from 898. And so on.
    – user95861
    Commented Jun 10 at 13:09
3

I don't agree with grading on the curve in most cases, and I agree with most of the criticisms highlighted in the original post, but that doesn't mean there aren't any arguments in its favor. Generally these arguments can be summarized as "No human is perfect, and students/employers* shouldn't suffer because it's impossible for a professor to be perfect". (* delete as your biases see fit)

No exam can test every single thing on a syllabus, so no two exams are likely to test exactly the same selection of material. Further, no two questions on the same material are of exactly the same difficulty. It's not just that the setter failed to write two questions of identical difficulty; having two questions of exactly the same difficulty is impossible.

So we must conclude that no testing system fits all the criteria you set out above once you go beyond trivial knowledge-recall or mechanical-process-application tasks, which shouldn't be what university-level education is about. For example, two of the Learning Outcomes on the genomics module I teach are:

"Be able to construct a convincing argument using evidence from high-throughput/genome-wide/genomic experiments"

"Create reasonable, testable hypotheses from genomic science research questions and design realistic experiments to test them"

There is no fully objective test of these with a precisely defined difficulty. For the first, there will always be a judgement on whether an argument is "convincing", but also some judgement of the extent to which an argument uses high-dimensional data. For the second, while one can imagine an answer that does or does not meet the criteria (although actually designing one is much harder than you'd imagine), it's impossible to come up with two questions of precisely matched difficulty.

So all systems of assessment are compromises between different priorities. What compromise you will settle on will depend on where your priorities lie. The design of valid and fair assessment is an entire academic discipline of its own.

To decide on the proper priorities for assessment design, we must consider what the purpose of assessment is. I can think of three possibilities:

  1. Guide the student as to where their strengths and weaknesses are, help them focus their efforts
  2. Ensure to a "consumer" (e.g. an employer) of the student that they meet a certain minimum standard on some task
  3. Measure the "ability" of students, either for this particular subject or just in general.

So at one end of the spectrum you have things like a driving test, where a person must demonstrate they are capable of performing a predetermined list of skills: it's not about whether you are a better driver than your friend, just whether you meet the basic safety standards. At the other end, NASA wants to recruit the best candidates for the space program. It will spend years training them, so what they can actually do now is not as relevant as simply selecting the "best" people.

I, and I like to think most educationalists, don't particularly care for reason 3. But many employers do, and because the employer does, students do.

Curve grading prioritizes measuring "how good" a student is over whether they can perform a predetermined set of tasks. It means, for example, that if the teacher is ill one day and delivers the lecture poorly, the students from that class will not lose out to students from a different class when they come to apply for a job. Or students who take hard classes will not see their grades suffer for that choice.

It relies on the assumption that your sample size is large enough that a change in test difficulty is a more likely explanation for a change in the mean score than a change in average student ability. Thus, it was originally implemented in massive, standardized tests, like the GRE in the US, or the nationwide GCSEs (General Certificate of Secondary Education, a set of exams taken by all 16-year-olds in the UK). These exams are taken by hundreds of thousands of students at a time, are generally marked by more than one person, and are used to decide which subset of students gets access to some limited resource. Curving may well be appropriate in these circumstances.

To come back to the actual question: how is it fair?

It is fair under three possible conditions:

  • If the aim of the exam is to sort/filter students. In particular if the student wishes to be compared to students from other classes/universities when applying for a job.
  • You believe that variation in test difficulty is greater than the variation in average student ability year-to-year.
  • You believe that teaching effectiveness varies more from year to year than the ability of students does.

Note that the final two reasons are the fault of the instructor. But this is immaterial. Given that instructors will be imperfect, what is the best way of negating the effects of this for students?

All that said, I am not particularly convinced of the arguments for curve grading in university class type situations.

3

One example where a curve could not only be warranted, but also beneficial is in cases where the course/exam is purposefully designed to be extremely difficult or 'discriminatory' towards 'worse' students.

In my graduate school experience, we had exams that were designed to be extremely difficult, so much so that almost all the students would score an F. But then these were placed on a curve. One of the underlying purposes of the course was for the professors to figure out who the strongest students were in order to offer them acceptance into the PhD program. At the end of each semester, the professor basically ensured everyone passed the course and got a degree, but he was also able to discern who the brightest students in the class were.

1

Professors sometimes forget how difficult their material can be. As a student, I sometimes hear my professors say after a particular exam that they "probably over-did the exam," or "made it way too long by mistake," etc. If almost everyone taking the exam requests extra time, it is likely that the exam was overly difficult, and the professor can correct their own mistake by curving the exam.

For smaller classes, the professor can use students that they are familiar with as a baseline: "student X is the best in the class, and even they couldn't solve this problem...I should probably give that problem less weight."

  • I can understand a retrospective, positive-only correction on an exam that turned out to be more difficult than it should have been. But I have encountered grading by curve (GBC) as the per-syllabus method of grading several times: twice at my university, and many times reading about it, here or elsewhere. That kind of application is what confuses me more. The former, sure, I can support.
    – Neinstein
    Commented Jun 10 at 9:52
  • 1
    @Neinstein To be fair, the question asked was "are there any objective arguments in favour of grading by curve." This is one. I think I agree with you that many exams are curved that shouldn't be; however, I don't think it's as detrimental as you suggest in the question. Commented Jun 10 at 14:08
0

Suppose that students can choose to take either course A or course B. After the exams have been sat, the results for course A are significantly higher than course B. Now there are two possible reasons for this (and, of course, they could both apply):

  1. The exam for course A was (unintentionally) easier than the exam for course B.
  2. The students who chose course A were generally stronger than those who chose course B.

Now if you adjust the grades ("grade on a curve") and 2 applies, you unfairly penalise students who happened to choose the same course as most of the stronger students. However, if you don't adjust the grades and 1 applies, you unfairly penalise students who happened to choose the course where the instructor set the harder exam.

So it sounds like the arguments for and against more-or-less balance out, with maybe a reason for preferring the status quo (that you shouldn't intervene unless you are confident your intervention is an improvement). However, there are a few reasons why I don't agree with this.

  • Actually, 1 is more likely to happen than 2, simply because it only depends on the actions of one person and not a group of people (and getting the difficulty level precisely right is hard).
  • If you do, by prior arrangement, adjust the grades, then 2 becomes even less likely, since you remove the incentive for weak students to choose "easy" courses.
  • You can get an approximate idea of to what extent 2 applies, based on previous grades for the students, and take that into account. When I was examining in an institution that required adjustment of grades, each course was given (in automated fashion) a target average score based on the predicted strength of students enrolled on that course. If the actual average result was significantly higher (or lower) than this, that suggested the exam was too easy (or hard).
  • Adjusting grades is important if you want to be able to compare courses from very different fields, even if the average marks in those courses are comparable. This is because STEM subjects naturally produce a much wider spread of marks than humanities subjects.
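The automated check described in the third bullet can be sketched as below. The answer does not say how predicted strength was computed or what counted as "significantly" different, so here each student's expected mark is simply taken as given (for example, an average of their previous grades), and `threshold` is a hypothetical number of marks.

```python
from statistics import mean


def moderation_check(predicted, actual, threshold=5.0):
    """Compare a course's mean mark with a target set from its students' records.

    predicted: each enrolled student's expected mark, e.g. from previous grades.
    actual: the marks the same students received on this course.
    threshold: hypothetical tolerance (in marks) before the gap counts as significant.
    Returns the gap between the actual and target means, and a verdict.
    """
    target = mean(predicted)
    gap = mean(actual) - target
    if gap > threshold:
        verdict = "exam possibly too easy"
    elif gap < -threshold:
        verdict = "exam possibly too hard"
    else:
        verdict = "consistent with cohort strength"
    return gap, verdict


# Hypothetical cohort whose prior record predicts an average of 65.
predicted = [58, 62, 65, 68, 72]
print(moderation_check(predicted, [70, 76, 80, 84, 90]))  # gap 15
print(moderation_check(predicted, [57, 63, 66, 67, 72]))  # gap 0
```

Because the target moves with the cohort's prior record, a course that attracts stronger students is expected to post a higher average, which is how this approach separates reason 2 from reason 1 instead of forcing every course to the same mean.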
0

Check out the four publicly available Physics GRE practice exams.

ETS have provided the % of students who get individual questions right. If you examine that statistic, it should be clear that two of the four tests are relatively hard because there were more questions which few students got right, and one of the four is noticeably harder still.

ETS got around the different difficulties by grading on a curve. To score 990 (the maximum score) in these exams, you need to get 84/100, 76/100, 67/100, and 85/100 respectively.

If you don't grade on a curve, how do you propose to accommodate the varying test difficulty? You could tell the students beforehand "to get 990 in these exams you need to score 85/100", and then 0% of the students in three of the four years will score 990. That makes the exam useless as a standardized test.

Some responses to your arguments as well:

  • The exact requirements are clear: answer as many questions correctly as you can. This is the case regardless of how the exam is graded.
  • The grade can be correlated to the amount of learned knowledge. If you know more you get a higher grade (for that class). Comparing students between different classes doesn't work, but then it doesn't work without curved grading either, as you can see from the Physics GRE practice exams. You can't easily compare students who took different exams.
  • Curved grading usually sets the top grade; it doesn't say "10% of students get A's and 10% of students get F's". Hence curved grading does not a priori cause students to fail if there are lots of bright students in the class. For the same reason it doesn't a priori cause student vs. student competition.
  • Curved grading eliminates grade inflation. It handles both "exam too easy, everyone getting A" and "exam too difficult, everyone failing" situations. This maintains value for the grade, since you can reasonably e.g. say someone who fails did not understand the material as opposed to simply had a very difficult exam.
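A curve of the kind described in the third bullet, which sets the top grade and only ever raises marks, can be sketched like this. It assumes a flat additive shift, so that the best score in the class becomes full marks; a multiplicative scale factor is another common variant. The function name is hypothetical.

```python
def curve_to_top(scores, full_marks=100):
    """Add the same bonus to everyone so the best score becomes full marks.

    The bonus is never negative, so nobody's mark goes down.
    """
    bonus = max(full_marks - max(scores), 0)
    return [s + bonus for s in scores]


print(curve_to_top([48, 61, 70, 82]))  # hard exam: everyone gains 18
print(curve_to_top([90, 95, 100]))     # someone already has full marks: unchanged
```

This upward-only version addresses only the "exam too difficult" case; handling an exam that was too easy would need a curve that can also lower marks.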

Edit: here's a link to an article explaining how curved grading is implemented in practice. It also explains why curved grading is used. Relevant part of the article is:

Setting such an exam [for a large group of students] is, by no means, easy. Pitch it tough, most students will fail. Set it too easy, and many will score very high grades, and the resulting scores are hardly differentiated.

Differentiation is necessary for CAP purposes, and for Honours classification, and these are here to stay for the foreseeable future. Most if not all major universities have variants of degree classes or GPA scores. And because of the need for differentiation, many institutions from North America to Asia, use the bell curve as a mechanism to moderate marks.

Module requirements may encompass different modes of assessment such as tutorial presentations, laboratory reports, projects, essays, as well as mid-term and final examinations. Grading may be based on absolute performance, relative performance, or a combination of the two. Higher-level modules with small enrolments typically grade a student based on his absolute performance; larger lower-level modules take into account a student’s performance vis-à-vis the other students in the same module. Where necessary, the final grade which a student receives for a module may be subject to moderation.

One important reason for grade moderation is that examiners come from diverse academic backgrounds and may be accustomed to different marking regimes. While we do make every effort to make sure modules are designed with clear learning outcomes, and professors are responsible to ensure their exams are pitched at the right level, grade moderation will prevent grade inflation or deflation, and helps to achieve consistency in assessment grading across modules.

  • 13
    The GRE is not at all the same thing. The numbers of people taking the exam are large enough that statistical assumptions are valid. That is hardly the case for course exams.
    – Buffy
    Commented May 1, 2021 at 16:03
  • 9
    Curving can't undo "exam too easy, everyone getting A" in a remotely fair way. At the most extreme, if everyone gets 100%, there is no way to assign different grades. But even if everyone gets between 90-100%, are you really going to give the 90%-scorers anything below an A-? Commented May 1, 2021 at 17:00
  • 4
    @Allure It should be expected that the teacher knows his own syllabus well; therefore it can also be expected that he has a good idea whether his own exam question is easy, hard, or impossible to answer for someone who obtained all their knowledge from the syllabus, even if it's the first time (s)he has set an exam. Sure, lack of experience can make it harder, but the solution to lack of experience is not implementing an unfair grading system, but research, consultation with more experienced colleagues, etc.
    – Neinstein
    Commented May 3, 2021 at 6:48
  • 4
    @Allure Sure, it happens a lot; examining is hard! But it's still the fault of the exam setter. And such cases can be fixed by the students complaining, the exam setter admitting the fault, and compensating accordingly within the otherwise absolute grading system. No need to implement an unfair grading system just because the teacher may make mistakes...
    – Neinstein
    Commented May 3, 2021 at 7:10
  • 9
    @Neinstein you are asking people to do the impossible. I have been teaching for many years and still am occasionally unable to predict how well students will do on an exam question I design. Friends of mine who are senior professors have complained of the same difficulty. Sorry, your expectations of what professors are able to do when they are experienced are grossly unrealistic.
    – Dan Romik
    Commented May 3, 2021 at 7:19
