
Arum and Roksa (p. 7) say:

Research on course evaluations by Valen Johnson has convincingly demonstrated that "higher grades do lead to better course evaluations" and "student course evaluations are not very good indicators of how much students have learned."

I don't have access to Johnson's book, but a review states:

[Johnson] found the "grade-attribution" theory the most useful: "Students attribute success in academic work to themselves, but attribute failure to external sources" (96). Regardless of the reason, the analysis provides "conclusive evidence of a biasing effect of student grades on student evaluations of teaching" (118).

Johnson did his work in the US. If I'm understanding correctly based on the fairly brief descriptions I have available, he managed to get permission to spy on students' actions over time, so that he could actually detect not just correlations but the time-ordering of events, which could help to tease apart questions of causation.

Johnson says that evaluations are "not very good" indicators of learning. My question is basically about what the available evidence says "not very good" means. It's possible that someone could answer this simply by having access to Johnson's book and flipping to p. 118.

If "not very good" means low correlation, then it would be interesting to know whether the correlation is statistically different from zero, and, if so, what its sign is. My guess, which encountered a very skeptical reaction in comments here, was that the correlation might be negative, since improved learning might require higher standards, which would tend to result in lower grades.

If the correlation is nonzero, it would also be interesting to understand whether one can infer that learning has any causal effect on evaluations. These two variables could be correlated due to the grade-attribution effect, but that wouldn't mean higher learning caused higher evaluations; it could just mean that better students learn more, and better students also give higher evaluations.
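To make the confounding story concrete, here is a minimal simulation (not based on any of the studies discussed; all the numbers are invented) in which a single lurking variable, student ability, drives both learning and evaluations, so the two end up correlated even though neither causes the other:

    # Minimal sketch of confounding by student ability; all parameters are invented.
    import numpy as np

    rng = np.random.default_rng(0)
    n = 10_000

    ability = rng.normal(size=n)                           # unobserved confounder
    learning = ability + rng.normal(scale=1.0, size=n)     # ability raises learning
    evaluation = ability + rng.normal(scale=1.0, size=n)   # ability also raises the rating given

    r = np.corrcoef(learning, evaluation)[0, 1]
    print(f"correlation(learning, evaluation) = {r:.2f}")  # about +0.5, with no causal link between them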

If we had, for example, a study in which students were randomly assigned to different sections of a course, we might be able to tell whether differences between sections in learning were correlated with differences between sections in evaluations. However, my understanding is that most of these "value added" analyses (which are often done in K-12 education) are statistically bogus. Basically you're subtracting two measurements from one another, and the difference is very small compared to the random and systematic errors.
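Here is a rough numerical illustration of that concern, with made-up class sizes and effect sizes: even under random assignment, the sampling noise in each section's mean score can dwarf a plausible difference in teaching effectiveness.

    # Toy illustration of why differencing noisy section means is problematic;
    # the class size and the true teaching effect are assumptions, not data.
    import numpy as np

    rng = np.random.default_rng(1)
    n_students = 30      # students per section (assumed)
    true_gap = 0.05      # true teaching effect, in standard-deviation units (assumed)
    n_trials = 10_000

    measured_gaps = []
    for _ in range(n_trials):
        section_a = rng.normal(loc=0.0, scale=1.0, size=n_students)
        section_b = rng.normal(loc=true_gap, scale=1.0, size=n_students)
        measured_gaps.append(section_b.mean() - section_a.mean())
    measured_gaps = np.asarray(measured_gaps)

    print(f"true gap: {true_gap:.2f}")
    print(f"spread (SD) of the measured gap: {measured_gaps.std():.2f}")                # about 0.26, five times the true gap
    print(f"fraction of trials with the wrong sign: {(measured_gaps < 0).mean():.2f}")  # roughly 0.4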

My anecdotal experience is that when I first started teaching, I was a relatively easy grader, I got very high teaching evaluations, and my students did badly on an internationally standardized test that I gave at the end of the term. Over time, I got confident enough to raise my standards, my teaching evaluations went down, and my students' learning got dramatically better, as measured by this test.

References

Arum and Roksa, Academically Adrift: Limited Learning on College Campuses

Valen Johnson, Grade Inflation: A Crisis in College Education, 2003

related: Do teaching evaluations lead to lower standards in class?

  • I suspect that teaching evaluations are designed to evaluate what the faculty wants to evaluate about its teaching body, and the outcomes might not be correlated with learning by design. Commented Aug 21, 2016 at 22:18
  • @MassimoOrtolano: Good point. The forms used at my school have remarkably little material on them that is actually about learning. It's mostly stuff like, "The instructor's grading criteria were made clear," and "All students were treated equally." (These are not real quotes, just my recollection of the style of the questions.)
    – user1482
    Commented Aug 21, 2016 at 22:21
  • At my R1 univ, teaching evals are required by the administration, but completely ignored except when someone wants to sabotage the tenure of a junior person. And some decades of observation do tend to corroborate the skeptical viewpoint mentioned above, in any case! ... and that more personable instructors are better liked. Gosh, who could have predicted that? I guess the thing to remember is that the goal of teaching is not to induce the students to "like" the teacher, but to ... impart knowledge? Get them past some gateway exams? Help them, even if against their own will? :) Commented Aug 21, 2016 at 22:51
  • ... and, yes, the "evaluation forms" at my place have recently been redesigned on-line, by apparently very naive people (hired at great expense, etc.), into glossy-but-irrelevant "UX" thingies. I imagine that people of students' ages have long ago become disenchanted with such stuff... But, again, the only real point is that "teachers" are not teachers, literally, but (status) gatekeepers. Unfortunately. Not my fave role. Commented Aug 21, 2016 at 22:58
  • It looks like you want answers supported by citations, not anecdotes. In which case, seems like this should be tagged reference-request?
    – ff524
    Commented Aug 28, 2016 at 23:10

2 Answers


The answer, from new research approaches dating from 2010, seems to be that increased learning tends to cause lower scores on students' evaluations of teaching (SET), but this is a complicated issue that has historically been a bone of contention.

There is a huge literature on this topic. The people who study this kind of thing most intensely are psychometricians. There are several points on which they seem to agree nearly universally, many of which simply reflect the consensus view of professional psychometricians about their field in general:

  • The surveys used for students' evaluations of teaching (SET) should be designed by professionals, and are basically useless if created by people who lack professional expertise in psychometrics. Certain common practices, such as treating evaluation scores as if they were linear (and can therefore meaningfully be averaged), show a lack of competence in measurement. (A small numerical sketch of this point follows the list.)

  • It's a terrible idea to use SETs as the sole measure of a teacher's effectiveness. Multiple measures are always better than a single measure. But, as is often the case, administrators tend to prefer a single measure that is cheap to administer and superficially appears impartial and scientific.

  • SETs are increasingly being given online rather than being administered in class on paper. This is a disaster, because the response rates for the online evaluations are extremely low (usually 20-40%), so the resulting data are basically worthless.

  • The difficulty of a course or the workload, as measured by SET scores, has nearly zero correlation with achievement.

  • SET scores are multidimensional measures of multidimensional traits, but they seem to break down into two main dimensions, professional and personal, which are weighted about the same. The personal dimension is subject to biases based on sex, race, ethnicity, and sexual orientation (Calkins).
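On the first bullet: as a small numerical sketch (the rating distributions are invented), two instructors can have identical mean ratings on a 1-5 scale while being evaluated very differently, which is one reason treating the scores as a linear scale to be averaged is considered poor measurement practice.

    # Two invented rating distributions with the same mean but very different shapes.
    import numpy as np

    scores = np.arange(1, 6)
    instructor_a = np.array([0.05, 0.15, 0.30, 0.35, 0.15])   # clustered near the middle
    instructor_b = np.array([0.30, 0.05, 0.05, 0.15, 0.45])   # polarized: many 1s and many 5s

    for name, dist in [("A", instructor_a), ("B", instructor_b)]:
        mean = (scores * dist).sum()
        print(f"instructor {name}: mean = {mean:.2f}, "
              f"share of 1s = {dist[0]:.0%}, share of 5s = {dist[4]:.0%}")
    # Both means come out to 3.40, yet 30% of instructor B's students gave the lowest rating.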

Getting down to the main question: does better learning affect teaching evaluations?

Before 2010, the best studies on this topic were ones in which students were randomly assigned to different sections of the same course and then given an identical test at the end to measure achievement. These studies tended to show that SET ratings had correlations with achievement of about +0.30 to +0.44. But Cohen says, "There is one study finding of a strong negative relationship between ratings and achievement: the highest rated instructors had the lowest performing students. There is also one study finding showing the opposite, a near perfect positive relationship between ratings and achievement." This lack of consistency is not surprising, because we're talking about different fields of education and different SET forms. A typical positive correlation of +0.4 would indicate that 16% of the variance in students' performance could be attributed to differences between teachers that could be measured by SETs. Although 16% isn't very high, the sign of the correlation in most of the studies is positive and statistically significant.
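For concreteness, the 16% figure is just the squared correlation coefficient, i.e. the fraction of variance explained under a simple linear model:

    r^2 = (0.40)^2 = 0.16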

But starting in 2010, new evidence arrived that turned this whole picture upside down (Carrell, Braga). In these newer studies, students were randomly assigned to different sections of a class such as calculus, and were then followed later in their careers as they took required follow-on classes such as aeronautical engineering. The Carrell study was done at the US Air Force Academy, where, due to the academy's structure, attrition was low and students could be required to take the follow-on courses.

Carrell constructed a measure of added value for each teacher based on their students' performance on a test given at the end of the class (contemporaneous value-added), and a different measure (follow-on course value-added) based on performance in the later, required follow-on courses.

Academic rank, teaching experience, and terminal degree status of professors are negatively correlated with contemporaneous value-added but positively correlated with follow-on course value-added.

We find that less experienced and less qualified professors produce students who perform significantly better in the contemporaneous course being taught, whereas more experienced and highly qualified professors produce students who perform better in the follow-on related curriculum.
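As a toy illustration of the two measures (entirely simulated data; Carrell and West's actual analysis is a regression with extensive controls, not a simple average of section means), the sketch below builds in an assumed trade-off between "teaching to the test" and deeper teaching, which pushes contemporaneous and follow-on value-added in opposite directions:

    # Toy value-added computation; every number, and the trade-off itself, is an assumption.
    import numpy as np

    rng = np.random.default_rng(2)
    n_instructors = 40
    students_per_section = 50

    # Invented instructor traits: "teaching to the test" boosts the intro exam,
    # "deep teaching" boosts the follow-on course, and the two trade off.
    teach_to_test = rng.normal(size=n_instructors)
    deep_teaching = -0.8 * teach_to_test + rng.normal(scale=0.6, size=n_instructors)

    contemporaneous_va = np.empty(n_instructors)
    followon_va = np.empty(n_instructors)
    for i in range(n_instructors):
        ability = rng.normal(size=students_per_section)    # randomly assigned students
        intro_score = ability + 0.5 * teach_to_test[i] + rng.normal(scale=0.5, size=students_per_section)
        followon_score = ability + 0.5 * deep_teaching[i] + rng.normal(scale=0.5, size=students_per_section)
        contemporaneous_va[i] = intro_score.mean()   # proxy: section mean on the end-of-course exam
        followon_va[i] = followon_score.mean()       # proxy: section mean in the follow-on course

    r = np.corrcoef(contemporaneous_va, followon_va)[0, 1]
    print(f"correlation between the two value-added measures: {r:.2f}")  # clearly negative in this setup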

Braga's study at Bocconi University in Italy produces similar findings:

[We] find that our measure of teacher effectiveness is negatively correlated with the students' evaluations: in other words, teachers who are associated with better subsequent performance receive worst evaluations from their students. We rationalize these results with a simple model where teachers can either engage in real teaching or in teaching-to-the-test, the former requiring higher students' effort than the latter.

References

Abrami, d'Apollonia, and Rosenfield, "The dimensionality of student ratings of instruction: what we know and what we do not," in The Scholarship of Teaching and Learning in Higher Education: An Evidence-Based Perspective, eds. Perry and Smart, Springer 2007 - link

Braga, Paccagnella, and Pellizzari, "Evaluating Students' Evaluations of Professors," IZA Discussion Paper No. 5620, April 2011 - link

Calkins and Micari, "Less-Than-Perfect Judges: Evaluating Student Evaluations," Thought & Action, fall 2010, p. 7 - link

Carrell and West, "Does Professor Quality Matter? Evidence from Random Assignment of Students to Professors," J Political Economy 118 (2010) 409 - link

Marsh and Roche, "Making Students' Evaluations of Teaching Effectiveness Effective: The Critical Issues of Validity, Bias, and Utility," American Psychologist, November 1997, p. 1187 - link

Stark and Freishtat, "An Evaluation of Course Evaluations," ScienceOpen, https://www.scienceopen.com/document/vid/42e6aae5-246b-4900-8015-dc99b467b6e4?0 - link

  • Something I noticed as a student but have never heard of being looked at in evaluations: poor students had very different notions of who the good teachers were than good students did. Commented Mar 27, 2019 at 4:46

Commenting only on your own experience, I can say that mine corresponds (roughly, very roughly) with the second half of yours. This is so even though the students were (literally) Ivy League material. I did notice a tad more interest and engagement than at a state university, but only from a few. The group was more concerned with checking boxes for the class, and if I got too critical-thinky, the bad comments came out.

That said, better learning does (arguably) lead to better grades, and so it can help the evaluations. But here we may have to distinguish between the immediate learning that leads to grades and the longer-term kind that produces better thought.

Don't kill yourself: simply mix in both short- and long-term concerns in your teaching. The latter may cost you some points, but it is the right thing to do.

  • Thanks for sharing your experiences, but this isn't an answer. This would have been better as a comment.
    – user1482
    Commented Aug 28, 2016 at 20:57
