Student Evaluation of Teaching (SET) handout
Handout distributed during Red Balloon Faculty Forum #4: Student Evaluations
November 22, 2010
Posted with permission from Dr. Elaine Blakemore, Associate Dean, College of Arts and Sciences, IPFW
Student Evaluations of Teaching (SET)
Summary of Research Findings and Recommendations
Elaine Blakemore, November 22, 2010
Important Part of a System of Faculty Evaluation
Virtually every scholar who summarizes research on student evaluations and evaluation of faculty concludes two things:
- That student evaluations should be part of the evaluation of university teaching.
- That they should only be a part, and that there are many other factors that ought to be used to evaluate the teaching role.
What Do Student Evaluations Measure?
- They are a measure of student satisfaction with instruction.
- They generally agree with other measures of impressions of effectiveness or satisfaction (alumni, peer ratings under certain conditions, expert ratings of videotapes, etc.).
- Well-constructed instruments measure four factors (three of them meaningful):
  - The instructor's role in delivering information (clarity, enthusiasm, organization; the global items also load on this factor, which accounts for most of the variance).
  - The instructor's role in facilitating a social environment (concern, respect, fairness).
  - The instructor's role in evaluating student work.
  - Odds and ends that are hard to interpret (e.g., knowledge of subject matter, maintaining classroom discipline, choosing appropriate materials).
- They measure satisfaction predictably and well.
- Most variance is related to things that faculty do in the classroom and how they construct the course.
Are They Affected by Extraneous Factors?
- Course-related factors (class size, rigor, discipline, reason for taking class, level of class, etc.)
- Instructor-related factors (gender, age, physical appearance, personality)
- Yes, these factors affect student evaluations and so cannot be ignored, but their influence is not nearly as large as that of what faculty do in the classroom and in constructing the course.
- There is not much evidence for a leniency effect (trading easy grades for good evaluations), but there is consistent evidence for a reciprocity effect (good students give higher evaluations).
- Instructor enthusiasm has a notable effect on student evaluations (on average raising them by more than 1 SD), but it also affects student learning.
Are They Related to Student Learning?
The most compelling evidence comes from the multi-section study – different instructors of several sections of a single course with a common final measure of learning, usually a common final exam. Between 50 and 100 such studies exist, and there have been about half a dozen meta-analyses of this research. Meta-analysts conclude that there is a consistent small to moderate relationship between SET and the final common measure of learning. The students of higher-rated instructors perform somewhat better on average on the final measure of learning.
How to Use and for What Purpose?
- For summative review (P&T, performance evaluations, reappointment), a single global item, or a combination of items measuring the factors above, should be used to make "crude" judgments (e.g., exceptional / adequate / unacceptable); finer distinctions are not necessary for these purposes. Norm comparisons and open-ended items should generally be avoided in summative review.
- For formative review and improvement, specific items on both rating scales and open-ended questions (e.g., about the textbook, tests, or class discussions) are most helpful. Norms can help here by letting faculty know where they tend to be weaker or stronger than colleagues, which can guide improvement.