Student Ratings of Instruction: Validity and Normative Interpretations
Using 3,355 class-section means, the relationship between six predictor variables and
student ratings of instruction (CEQ) was investigated by computing the intercorrelation
matrix among all variables and by performing several regression analyses. The results
indicated that all linear interactions were negligible and that more than one-fourth
of the criterion variance was shared with the full set of predictor variables. Two of
the predictors, however, expected grade and required-elective, provided extremely large
contributions to the prediction of the criterion measure. Implications of the validity
results for the normative data, and thus for the administrative use of the ratings,
were illustrated.
by their expected grade, others (Garverick and Carter, 1962; Holmes,
1971; Kennedy, 1975) did not. The relationship of class size and
student ratings was found nonsignificant by Aleamoni and Graham
(1974), while Gage (1961) and Lovell and Haner (1955) found a
significant relationship. Some studies (Doyle and Whitely, 1974;
Hildebrand, Wilson and Dienst, Note 3) have reported that the
required-elective nature of the course was not related to ratings. Other
researchers, on the other hand (Gage, 1961; Lovell and Haner, 1955;
Magoon and Bausell, 1973), found that instructors who teach required
courses receive lower ratings than those teaching nonrequired courses.
Course level was found to be a statistically significant variable by
Aleamoni and Graham (1974) and Jiobu and Pollis (1971) but
nonsignificant by Grant (1971). Rayder (1968) has indicated that, in
general, instructor characteristics are more strongly related to
ratings than are student characteristics.
The methodology used when investigating the validity of student
rating data has varied considerably from one study to another. Aside
from studies using instruments of unknown or unreported technical
quality, a number of studies report results based on a small number of
classes or on different sections of a single course. Other studies report
results which used only teaching assistants as ratees, while still other
studies have used students from only one or two class levels.
Furthermore, almost all correlational studies have failed to report
cross-validation results or even to examine the appropriateness of
assuming linearity (e.g., by inspecting scatterplots). Also, several
studies have used ad hoc rating instruments. Generalizability from such
studies tends to be limited; the methodological differences between
studies may help explain a sizable portion of the mixed results that
have been reported.
In addition to methodological differences between studies, statistical
analyses have also varied from one study to another. Analyses have
ranged from nonparametric, such as sign and chi-square tests, to
multivariate analysis of variance and canonical correlations. The most
frequently used methods include simple correlations, linear regressions,
and univariate analysis of variance. The wide range of both
methodological and statistical procedures across studies may help
explain the current difficulty in clearly determining which variables
are related to student ratings, and to what extent. Also, more useful
information regarding the variables affecting student ratings can be
gathered if the variables proposed to have an effect are considered
simultaneously.
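To make the simultaneous approach concrete, consider a multiple regression in which all proposed predictors enter a single model, so that each coefficient reflects a predictor's contribution with the others held constant. The following is a minimal sketch using simulated data and hypothetical stand-in predictors, not the study's actual variables or coefficients:

    import numpy as np

    # Simulated class-section data; columns are hypothetical stand-ins for
    # predictors such as expected grade, required-elective, class size, and
    # course level (not the study's actual data).
    rng = np.random.default_rng(0)
    n = 200
    X = rng.normal(size=(n, 4))
    y = 0.6 * X[:, 0] + 0.4 * X[:, 1] + rng.normal(scale=0.8, size=n)

    # Simultaneous multiple regression: add an intercept column and solve
    # for all weights at once, rather than correlating one predictor at a
    # time with the criterion.
    Xa = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(Xa, y, rcond=None)

    # Squared multiple R: the proportion of criterion variance shared with
    # the full set of predictors.
    resid = y - Xa @ beta
    print(beta, 1 - resid.var() / y.var())

A series of simple correlations, by contrast, credits overlapping variance to each predictor separately and so can overstate the importance of correlated predictors.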
Sheehan (1975) has reviewed a limited number of research findings
and concluded that using student ratings for administrative decisions
(e.g., pay, rank, and tenure) is questionable. In a reply to Sheehan,
Aleamoni (1976) pointed out that whether the criterion variation
accounted for by various predictor variables results in significantly
different ratings is still undetermined. The purposes of this study,
therefore, were to investigate the validity of student ratings of
instruction and to assess the effect of the validity results on the raw
and scaled rating scores. As with most previous research, it would be
more accurate to say that this study is concerned with the invalidity
of student ratings rather than their validity; that is, with the extent
to which extraneous variables bias the underlying construct being
measured in such a way as to favor certain instructors over others.
METHOD

[The Method section is not legible in this copy.]

RESULTS

[Tables I and II, containing the intercorrelation matrix and the regression analyses discussed below, are not legible in this copy.]
The difference between the squared multiple correlations of two nested
regression models, often called incremental validity, yields the
proportion of additional variance in the criterion that is predictable
from the "new" predictor variable(s) beyond the variance accounted for
by the "initial" variable(s). The incremental validity was obtained for
the interaction effects, but given the trivial increase in R from
including them, the beta weights and cross-validation results for the
interaction effects were not reported. Given the results for the regression analysis
involving all individual predictor variables, the incremental validity was
obtained for expected grade with required-elective and for the
remaining variables. Both values, taken together, provide further
information about the importance of these two sets of predictor
variables. For example, the difference between squared multiple R's
for Analyses I and II given in Table II is 0.039. This indicates that a
rather small amount of predictable variance is added by variables other
than expected grade or required-elective. The difference of 0.191
between squared multiple R's for Analyses I and III of Table II
indicates that a sizable amount of predictable variance is added by the
addition of expected grade and required-elective. Thus, these two
variables are extremely important predictors of student ratings, and
they also include almost all the information given by the entire set of
predictor variables.
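In computational terms, the incremental validity used here is just the difference between squared multiple R's for two nested regressions. The following minimal sketch, again on simulated data with hypothetical names, mirrors that computation; the 0.039 and 0.191 values above are the study's own results, not outputs of this code:

    import numpy as np

    def r_squared(X, y):
        """Squared multiple correlation from a least-squares fit with intercept."""
        Xa = np.column_stack([np.ones(len(y)), X])
        beta, *_ = np.linalg.lstsq(Xa, y, rcond=None)
        return 1 - (y - Xa @ beta).var() / y.var()

    rng = np.random.default_rng(1)
    n = 500
    grade = rng.normal(size=n)            # expected grade (simulated)
    elective = rng.normal(size=n)         # required-elective (simulated)
    others = rng.normal(size=(n, 4))      # remaining predictors (simulated)
    y = (0.5 * grade + 0.3 * elective + 0.1 * others[:, 0]
         + rng.normal(size=n))            # criterion: section mean rating

    r2_full = r_squared(np.column_stack([grade, elective, others]), y)
    r2_core = r_squared(np.column_stack([grade, elective]), y)

    # Incremental validity of the remaining predictors over expected grade
    # and required-elective (the analogue of the 0.039 figure above).
    print(r2_full - r2_core)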
The effect of the two most important variables on the raw and scaled
CEQ scores is illustrated with the results reported in Table III, where a
combination of the expected grade and required-elective variables is
used to predict CEQ ratings. The predicted CEQ's were referred to the
overall norms table contained in Illinois Course Evaluation
Questionnaire (CEQ) Results Interpretation Manual Form 73
(Brandenburg and Aleamoni, Note 2) and both the predicted values and
associated deciles are recorded in Table III. It can be seen from Table
III that if a class has greater-than-average elective enrollment and
greater-than-average expected grade (upper left), decile ratings are
substantially different from those for a class with lower-than-average
elective enrollment and lower-than-average expected grade (lower right). For
the most extreme case illustrated in Table III, there is a difference
of about 1.2 standard deviation units, or five deciles, between the two
predicted CEQ scores. Needless to say, differences of even half this
magnitude can have quite an impact on administrative decision-making.
Of course the decile differences are not as dramatic when the
required-elective norms are used, which is due to the common variance
between expected grade and required-elective, but the deciles based on
the overall norms are the values most often consulted for
administrative use.

[Table III, giving predicted CEQ scores and associated deciles for combinations of the expected-grade and required-elective variables, is not legible in this copy.]
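The normative step itself is a simple table lookup: a section mean, observed or predicted, is referred to the distribution of all section means on file and reported as a decile. The sketch below uses invented cut points, not the CEQ manual's norms:

    import numpy as np

    # Hypothetical decile boundaries (the 10th through 90th percentiles of
    # all section means); these values are invented for illustration.
    decile_cuts = np.array([3.2, 3.4, 3.5, 3.6, 3.7, 3.8, 3.9, 4.0, 4.2])

    def decile(score):
        """Decile (1-10) of a section mean relative to the norms table."""
        return int(np.searchsorted(decile_cuts, score)) + 1

    # Two predicted means a bit over one criterion SD apart, as in the most
    # extreme Table III contrast, can land about five deciles apart.
    print(decile(4.10), decile(3.55))   # 9 and 4 under these invented cuts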
CONCLUSIONS
The decile differences reported above are large; the raw score
differences, on the other hand, are somewhat small except in the
moderate to extreme cases. But even small raw score differences
can rank order instructors quite differently. In addition, the raw score
differences can be made to appear larger merely by applying a linear
transformation. This is why differences were reported in terms of the
criterion standard deviation; in this form, the differences remain the
same no matter what the linear scaling. Furthermore, under a linear
transformation, the deciles will also remain unchanged. Increasing the
number of scale points is, of course, not the same as a linear
transformation, but to the extent that it operates roughly in this
fashion, the results reported here will also generalize to scales
with additional points.
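The invariance can be written out directly. If ratings are rescaled as Y' = aY + b with a > 0 (generic symbols, not the paper's notation), then

    \frac{\bar{Y}'_{1} - \bar{Y}'_{2}}{\sigma_{Y'}}
      = \frac{(a\bar{Y}_{1} + b) - (a\bar{Y}_{2} + b)}{a\,\sigma_{Y}}
      = \frac{\bar{Y}_{1} - \bar{Y}_{2}}{\sigma_{Y}},

and because the rescaling is strictly increasing, it preserves rank order, so every decile boundary carries over to the transformed scale unchanged.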
It has been demonstrated that significant shifts in ratings can occur
as the result of extraneous variables. In addition, the results also
illustrate the problem of using deciles and the problem of criterion
insensitivity. Deciles tend to exaggerate raw score differences in the
middle of the distribution but squeeze them together at the extremes.
Criterion insensitivity refers to raters using only a relatively small
portion of the scale and thus to a small criterion standard deviation,
which, at least from our experience, is characteristic of omnibus forms
such as the CEQ.
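Both points are easy to exhibit with a roughly normal criterion. The sketch below uses simulated section means; the sample size echoes the study's 3,355 sections, but the values are invented, and the small standard deviation relative to a 5-point scale mirrors the criterion insensitivity just described:

    import numpy as np

    rng = np.random.default_rng(2)
    # Simulated section means on a 5-point scale: raters use only a narrow
    # band of the scale, so the criterion standard deviation is small.
    means = rng.normal(loc=3.7, scale=0.25, size=3355)

    # Decile boundaries of the simulated norms distribution.
    cuts = np.percentile(means, np.arange(10, 100, 10))

    # Band widths are narrowest near the median and widest in the tails, so
    # a fixed raw-score difference crosses several decile boundaries in the
    # middle of the distribution but perhaps none at the extremes.
    print(np.round(np.diff(cuts), 3))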
The problem of extraneous variables brings up a final concern about
using expected grade as a control variable, e.g., in a norms table or in
a regression equation which uses the difference between predicted and
actual criterion values to control for extraneous variables. We argue for
excluding expected grade as a control variable not only because it can
be influenced by the instructor, but also because it is correlated with
obtained grade and thus with final achievement. It appears quite
contradictory to argue for using expected grade as a control variable
and yet use final achievement as a criterion for the validation of
student ratings.
As previously indicated, it is often the case, unfortunately, that
administrative decisions will be influenced to a great extent by the
normative information provided by students' ratings. This being the
case, the strong relationship of expected grade and required-elective
with the ratings, and its consequences for the normative data,
indicates that certain instructors will be at a disadvantage relative
to their peers. Student rating information in
conjunction with self-evaluation (see, for example, Batista and
Brandenburg, Note 1), peer, and administrative information, however, is
likely to provide a reasonable evaluation of one's instructional
performance.
REFERENCE NOTES
1. Batista, E. E., and Brandenburg, D. C. (1976). The instructor self-evaluation
form: Development and validation of an ipsative forced-choice measure of
self-perceived faculty performance. Unpublished.
2. Brandenburg, D. C., and Aleamoni, L. M. (1976). Illinois Course Evaluation
Questionnaire (CEQ) Results Interpretation Manual, Form 73. Urbana,
Illinois: University of Illinois at Urbana-Champaign, Measurement
and Research Division of the Office of Instructional Resources.
3. Hildebrand, M., Wilson, R. C., and Dienst, E. R. (1971). Evaluating
University Teaching. Berkeley, Calif.: Center for Research and Development
in Higher Education.
REFERENCES
Aleamoni, L. M. (1976). On the invalidity of student ratings for administrative
personnel decisions (comment). Journal of Higher Education 47:607-610.
Aleamoni, L. M., and Graham, M. H. (1974). The relationship between CEQ
ratings and instructor's rank, class size and course level. Journal of
Educational Measurement 11:189-202.
Aleamoni, L. M., and Spencer, R. E. (1973). The Illinois course evaluation
questionnaire: A description of its development and a report of some of its
results. Educational and Psychological Measurement 33:669-684.
Aleamoni, L. M., and Yimer, M. (1973). An investigation of the relationship
between colleague rating, student rating, research productivity, and academic
rank in rating instructional effectiveness. Journal of Educational
Psychology 64:274-277.
Bausell, R. B., and Magoon, J. (1972). Expected grade in a course, grade point
average, and student ratings of the course and the instructor. Educational
and Psychological Measurement 32:1013-1023.
Downie, N. M. (1952). Student evaluation of faculty. Journal of Higher
Education 23:495-496.
Doyle, K. O., and Whitely, S. E. (1974). Student ratings as criteria for
effective teaching. American Educational Research Journal 11:259-274.
Gage, N. L. (1961). The appraisal of college teaching: An analysis of ends and
means. Journal of Higher Education 32:17-22.
Garverick, C. M., and Carter, H. D. (1962). Instructor ratings and expected
grades. California Journal of Educational Research 13:218-221.
Grant, C. W. (1971). Faculty allocation of effort and student course
evaluations. Journal of Educational Research 64:405-411.
Granzin, K. I., and Painter, J. J. (1973). A new explanation for students'
course evaluation tendencies. American Educational Research Journal
10:115-124.
Holmes, D. S. (1971). The relationship between expected grades and students'
evaluation of their instructors. Educational and Psychological Measurement
31:951-957.
Jiobu, R. M., and Pollis, C. A. (1971). Student evaluations of courses and
instructors. The American Sociologist 6:317-321.
Kennedy, W. R. (1975). Grades expected and grades received--their
relationship to students' evaluation of faculty performance. Journal of
Educational Psychology 67:109-115.
Kerlinger, F. N., and Pedhazur, E. J. (1973). Multiple Regression in Behavioral
Research. New York: Holt, Rinehart and Winston.
Kooker, E. W. (1968). The relationship of known college grades to course
ratings on student selected items. The Journal of Psychology 69:209-215.
Lovell, G. D., and Haner, C. F. (1955). Forced-choice applied to college
faculty rating. Educational and Psychological Measurement 15:291-304.
Magoon, J., and Bausell, R. B. (1973). Required versus elective course ratings.
College Student Journal 7:29-33.
McKeachie, W. J., and Lin, Y. (1971). Sex differences in student response to
college teachers: Teacher warmth and teacher sex. American Educational
Research Journal 8:221-226.
Pohlman, J. T. (1975). A multivariate analysis of selected class characteristics
and student ratings of instruction. Multivariate Behavioral Research 10:81-91.
Rayder, N. F. (1968). College student ratings of instructors. Journal of
Experimental Education 37:76-81.
Sheehan, D. S. (1975). On the invalidity of student ratings for administrative
personnel decisions. Journal of Higher Education 46:687-700.
Weaver, C. H. (1960). Instructor rating by college students. Journal of
Educational Psychology 51:21-25.