
A Lesson from Buffalo in Understanding Cut Scores in Standardized Testing

Understanding cut scores is key to rational discussion about the performance of Buffalo schools


Mark Garrison
Buffalo Education Technology Examiner
August 5th, 2010 11:37 pm ET.

Following my July 29 article on how New York State imposed mass failure on Buffalo schools, a teacher confessed to me that he did not understand the concept of a “cut score” or its significance as discussed in that article.

Little public discussion of the limits of testing technology

This confession reveals several things. First, it shows that the technical language surrounding standardized test development, administration and interpretation is unnecessarily opaque, making this “social technology” harder to understand.

The phrase “cut score” is one such example. A more obvious, but equally appropriate, phrase would be “passing score.” So, a “cut score” is the minimum score a student must earn to be considered as performing at a certain level. “Cut” or passing scores answer the question: how much is good enough?
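
In code, a cut score is nothing more than a threshold comparison. Here is a minimal sketch in Python, assuming an invented cut score of 65 on an invented scale; it is illustrative only, not NYSED’s actual scoring:

    # Minimal sketch: a cut score is just a pass/fail threshold.
    # The value 65 is invented for illustration, not an actual NYSED cut score.
    CUT_SCORE = 65

    def performance(scale_score):
        """Classify a score against the cut score."""
        return "meets standard" if scale_score >= CUT_SCORE else "below standard"

    for score in [58, 64, 65, 72]:
        print(score, "->", performance(score))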

The written portion of New York’s driver’s exam offers a useful example. Knowing that most people “pass” the written portion tells us little about the test or about people’s driving ability. We need to know what the passing score is and how it was determined; we also need to review sample test items to evaluate the test. Knowing that one must answer 80 percent of the questions correctly to pass helps, but it raises the question of how 80 percent was chosen as the passing score. Why not 90 percent, or 70 percent? Further, are all questions on this written portion equally difficult, as an aggregate percentage of correct answers assumes? Are all items equally important? Knowing what a stop sign looks like seems far more important than knowing the symbol for a slippery road.
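
The importance point can be made concrete with a small sketch. The items and weights below are invented for illustration; the code simply contrasts a flat percent-correct score with one that weights items by how much they matter:

    # Invented test items and weights, purely illustrative.
    # Each tuple: (item, answered_correctly, importance_weight)
    items = [
        ("identify a stop sign",            True,  3.0),
        ("slippery-road symbol",            False, 1.0),
        ("right-of-way at a 4-way stop",    True,  2.0),
        ("parking distance from a hydrant", False, 1.0),
    ]

    # Flat scoring treats every item as equally important.
    percent_correct = 100 * sum(c for _, c, _ in items) / len(items)

    # Weighted scoring lets critical items count more.
    total_weight = sum(w for _, _, w in items)
    weighted = 100 * sum(w for _, c, w in items if c) / total_weight

    print(f"flat score:     {percent_correct:.0f}%")  # 50%
    print(f"weighted score: {weighted:.0f}%")         # 71%

The same student can “pass” under one scheme and “fail” under the other, which is why an aggregate percentage alone tells us so little.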

When it comes to educational testing, the New York State Education Department (NYSED) offers no public answers to these types of questions. We don’t know what the passing scores are for this year’s grades 3-8 math and English tests, nor do we know how those passing scores were determined.

On July 29, this examiner requested from NYSED the analysis it purports to have conducted as the basis for raising cut scores in New York. As of this posting, the department has not responded to the request.

Setting passing scores is political

This reveals a second trend: “States,” education policy analyst Andrew Rotherham observes, “rarely explain what it actually means for a student to pass a state test, to be ‘proficient,’ or how passing scores are established.” Media outlets simply report the results without explanation; educators are forced to respond to the reports as if test scores were a force of nature.

Yet Rotherham emphasizes that determining these passing scores is subjective -- meaning it is a judgment call. There are a number of technical procedures to help in determining “cut scores” -- some of which are outlined in Rotherham’s article -- but they all rest on the judgment of a relatively small group of people, largely hidden from and unaccountable to the public.
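
One of the better-known procedures is the (modified) Angoff method: panelists estimate, item by item, the probability that a borderline “just proficient” student would answer correctly, and the cut score is derived from those judgments. A minimal sketch, with invented ratings, shows how mechanical the arithmetic is -- and how completely it rests on the panelists’ judgment:

    # Sketch of Angoff-style standard setting; all ratings are invented.
    # Each row holds one panelist's estimates, per item, of the probability
    # that a borderline-proficient student answers that item correctly.
    ratings = [
        [0.9, 0.6, 0.4, 0.8, 0.5],  # panelist 1
        [0.8, 0.7, 0.3, 0.9, 0.6],  # panelist 2
        [0.9, 0.5, 0.5, 0.7, 0.4],  # panelist 3
    ]

    # Each panelist's implied cut score is the sum of their item estimates;
    # the panel's recommendation is the average of those sums.
    implied = [sum(row) for row in ratings]
    cut_score = sum(implied) / len(implied)
    print(f"recommended cut score: {cut_score:.1f} of 5 raw points")

Change who sits on the panel, or how “borderline proficient” is described to them, and the recommended cut score changes with it.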

Thus, test scores do not represent some absolute, infallible truth. Standardized tests offer only an estimate of a student’s knowledge. Because it is an estimate, it is by definition fallible, just as weather forecasts are fallible. Yet these simple caveats are hidden from the public.
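
Psychometricians quantify this fallibility with a “standard error of measurement” (SEM). A hypothetical sketch, with invented numbers, shows why a score just below a cut score may not be a meaningful “failure”:

    # Hypothetical: all values are invented, not from any actual New York test.
    observed = 648   # a student's reported scale score
    sem = 12         # standard error of measurement for this test
    cut_score = 650  # the passing threshold

    # Roughly 95% of the time, the "true" score lies within 2 SEMs.
    low, high = observed - 2 * sem, observed + 2 * sem
    print(f"~95% band around {observed}: {low} to {high}")
    if low <= cut_score <= high:
        print("Cut score sits inside the band: this 'failure' may be noise.")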

Possibly more important for understanding present trends is the political nature of setting cut scores. Rotherham observes:

Political considerations can also influence the setting of cut scores -- and sometimes do. As a general rule, state policymakers want to look good, and this can create a downward pressure on passing scores. States also often set cut scores lower than they otherwise might in order to create buy-in from educators and the public. While high passing scores might earn plaudits from some educators and school reformers, they can erode public and educator confidence in various reforms because progress appears daunting. Political influences on cut-score setting can be subtle. Decisions about the composition of score-setting panels, for example, can affect the process in largely untraceable but potentially powerful ways.

It is no coincidence that new cut scores appeared with the arrival of a new commissioner of education in New York. Dramatic changes in test scores often accompany political transitions, a pattern education researcher Robert Linn has called the “saw tooth effect”: school systems learn a test and show score gains on it over time; when a new administration introduces a new test, scores drop (graphing this trend produces a “saw tooth” pattern). The failure induced by the new administration provides it with a sense of urgency that can be used to mobilize support for its agenda.
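
A toy illustration of the pattern, with entirely made-up proficiency rates: gains accumulate while a test stays in use, then the baseline resets when a new test arrives:

    # Made-up proficiency rates illustrating Linn's "saw tooth" pattern.
    history = [
        ("2004, test A", 62), ("2005, test A", 67), ("2006, test A", 72),
        ("2007, test A", 76),
        ("2008, test B", 60),  # new test introduced: scores drop
        ("2009, test B", 65), ("2010, test B", 70),
    ]
    for label, pct in history:
        print(f"{label}: {'#' * (pct - 55):<25} {pct}%")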

What is even less well known is that forces like the Business Roundtable work to have their executives sit on “cut score committees” so that they can manipulate the scores. Typically, as evidenced in this earlier effort, they seek to produce more failure in order to bolster the need for their “reform” agenda.

To help inform readers, here is some recommended reading:

Measuring Up: What Educational Testing Really Tells Us, by Daniel Koretz

The Paradoxes of High Stakes Testing: How They Affect Students, Their Parents, Teachers, Principals, Schools, and Society, by George Madaus, Michael Russell & Jennifer Higgins

A Measure of Failure: The Political Origins of Standardized Testing, by Mark J. Garrison
