Commentary

A New Blueprint for State Standardized Testing

This piece originally appeared in Education Next.

Statewide standardized testing has played a central role in education policy for decades, as policymakers have sought to get a clearer picture of how schools are performing and to promote improvement. But support for state testing has been steadily eroding. If testing advocates hope to preserve state testing and its many benefits, it’s time for policymakers to rethink the role of the tests, including the possibility of abandoning the federal requirement that every state use test results to identify schools for improvement.

State tests have been attacked from many directions and for many reasons. They’re a time sink, critics charge. They encourage schools to prioritize low-level skills, test prep, and tested subjects at the expense of a richer curriculum, and they cause teachers to focus disproportionately on “bubble students”—those who are close to testing at proficiency. They prompt school districts to clog the calendar with additional tests to gauge students’ readiness, and they fail to help teachers in their classrooms.

At the heart of these and other indictments lies the fact that different stakeholders want state tests to serve two distinct, equally legitimate, and largely incompatible roles. Some want the tests to provide policymakers with information on student achievement that’s comparable across schools and school districts, with the goal of holding schools accountable for results. Others want the tests to give educators and families detailed information to improve instruction and monitor individual student progress. States and test developers have tried to reconcile these competing demands, but it has proven impossible to achieve both goals.

Solving the Stalemate

As a result, many states have abandoned the high-quality state tests developed at substantial public cost by the Partnership for Assessment of Readiness for College and Careers (PARCC) and Smarter Balanced consortia. The competing priorities have stymied nascent testing innovations and paralyzed the national discussion on how to do the critical work of helping students acquire the academic skills, knowledge, and habits of mind necessary to pursue meaningful postsecondary options. And they have played into the hands of those who would strip all state testing provisions from federal law, putting at risk state testing’s contributions to research, school improvement, and educational equity. Recent moves by Wisconsin and other states to lower the proficiency bar on their tests suggest the importance of public scrutiny of student performance.

The best way out of the testing stalemate is to reduce the demands on state testing by revamping the federal testing provisions designed to identify low-performing schools for improvement, and then to lean into understanding individual student performance at the local level to inform parents and strengthen instruction. This two-part strategy would weaken the case of testing abolitionists by improving the quality and shrinking the scale of state testing while preserving its core mission: helping policymakers, parents, and taxpayers understand public school performance against state standards.

Using state tests to compare schools’ performance presents a challenge because it is predicated on high levels of test security, testing students on equally demanding grade-level content, and standardized rules for test administration and scoring.

That means teachers, parents, and students cannot see test questions or answers without the costly process of crafting new items every year. It means testing students in every school and school district on comparable content under the same conditions. And it means presenting a student’s answers as a single, year-end score that’s aligned to the state’s standards. Only with these features can states confidently—and legitimately—use test results to target schools for improvement or otherwise hold them accountable for their students’ performance.

But these requirements conflict with the demand, enshrined in the federal Every Student Succeeds Act of 2015, or ESSA, and echoed by many testing advocates, that state tests yield “diagnostic” information that teachers and parents can use to help individual students improve. While that’s a worthy goal, it’s virtually impossible for large-scale, end-of-year state tests to capture individual performance in sufficient detail to guide teachers’ work with each student—not to mention that teachers typically receive state test results long after the school year has ended.

A better strategy would shift the focus of state testing to giving policymakers, parents, and the public an annual window into student and school performance, while stopping short of tying test results to consequences for schools and expecting them to yield a teaching plan for every student.

This shift would allow states to scale back testing. ESSA currently requires that they test every student every year in reading and math in grades 3 through 8 and once in high school, and in science once per grade span (elementary, middle, and high school). States could reduce the amount of testing by borrowing the sampling approach used by pollsters. States could test a representative sample of students in key grades, or they could adopt what psychometricians call matrix sampling, in which each student is tested in greater depth on only a sample of the relevant standards. Matrix sampling could allow states to improve test quality and test a wider range of curriculum content, because not every student would have to answer every question.
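For readers who want to see the mechanics of matrix sampling, the short Python sketch below may help. It is only an illustration under simplifying assumptions, not a description of how any state or testing program actually assembles its test forms. The function name, the standard codes, the roster, and the choice of three forms are all hypothetical. The sketch splits a list of standards into non-overlapping forms and rotates those forms across a shuffled roster, so that each student sees only part of the content while the student body as a whole covers all of it.

```python
import random


def assign_matrix_forms(student_ids, standards, num_forms, seed=0):
    """Illustrative matrix sampling: one form (a subset of standards) per student.

    All names here are hypothetical; real programs use far more elaborate
    designs (overlapping blocks, item-level balancing, and so on).
    """
    rng = random.Random(seed)

    # Shuffle the standards, then deal them round-robin into forms so that
    # every standard appears on exactly one form.
    shuffled = list(standards)
    rng.shuffle(shuffled)
    forms = [shuffled[i::num_forms] for i in range(num_forms)]

    # Shuffle the roster and rotate forms across it, so the number of
    # students taking each form differs by at most one.
    roster = list(student_ids)
    rng.shuffle(roster)
    assignment = {sid: forms[i % num_forms] for i, sid in enumerate(roster)}
    return assignment, forms


# Hypothetical inputs: 12 standards, 30 students, 3 forms.
standards = [f"STD-{n}" for n in range(1, 13)]
students = [f"student-{n}" for n in range(1, 31)]
assignment, forms = assign_matrix_forms(students, standards, num_forms=3)

# Each student answers questions on 4 of the 12 standards, yet the
# school as a whole is measured on all 12.
for form_number, form in enumerate(forms, start=1):
    takers = sum(1 for assigned in assignment.values() if assigned == form)
    print(f"Form {form_number}: {len(form)} standards, {takers} students")
```

The design choice worth noticing is the tradeoff the sketch makes visible: results aggregate cleanly at the school level, but no single student produces a score on the full set of standards.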

The NAEP Model

The highly regarded, federally funded National Assessment of Educational Progress uses matrix sampling to capture student performance at the national, state, and local levels, as do national testing systems in other countries. An alternative to sampling, proposed by Scott Marion, director of the nonprofit Center for Assessment, would test every student every other year or every other grade. Both the matrix and Marion models make sense to us.

Less testing would free up time to gauge other student experiences and outcomes that many stakeholders in the testing debate want measured, including, for example, whether schools are creating a sense of belonging among students.

As a practical matter, the move wouldn’t have much impact on school accountability, which in most states has been substantially weakened under ESSA.

The No Child Left Behind Act of 2002 required consequences for schools if students weren’t performing up to state standards, as measured largely by state testing. ESSA maintained the frequency of state tests but defined accountability very differently, requiring only that bottom-performing schools be identified and that improvement plans be drawn up for them. States would decide which steps, if any, schools should take to improve. In other words, the new law eliminated NCLB’s strongest improvement measures and devolved accountability decisions to states—and many states have declined to act decisively on low-performing schools. A 2024 federal Government Accountability Office study found that, eight years after ESSA’s enactment, states hadn’t produced complete improvement plans for even half of their lowest-performing schools—serving 2.5 million students—identified under ESSA. And improvement efforts are underway in far fewer schools than that.

Nor, given education politics today, are more stringent federal accountability mandates likely to return any time soon.

But states that want to use state test data to target schools for improvement could do so under the modified regime we’re proposing. Data on demographic subgroups within schools would be more limited under either matrix sampling or Marion’s approach, making it difficult to use the data for measuring school performance because sample sizes would be too small. But states could compensate for that by reporting scores for the bottom 25 percent of students in the school. The reality is that there’s not much evidence that disaggregating scores by race and socioeconomic status has made a significant difference in closing achievement gaps.

Local Diagnostics

The focus on diagnostics, meanwhile, could shift to local testing, where tests would be tied more closely than their state counterparts to instruction and teachers would get results in time to help their students, rather than receiving what amounts to autopsy reports after schools close at the end of the year.

Many school districts already use this approach. Alison Timberlake, deputy director for assessment and accountability in the Georgia Department of Education, told us that given the widespread emergence of locally adopted tests woven into instructional materials and designed to deliver diagnostics, ESSA’s expectation that states yield diagnostics by testing “every kid every year on the full depth and breadth of [state] standards . . . isn’t necessary anymore.” States could establish review panels of psychometricians, curriculum specialists, and local educators to ensure that the local standardized tests are of high quality and are aligned to state standards.

Reducing the demands on state testing would yield another benefit: enabling the implementation of testing innovations that have struggled to meet the technical requirements for validity, comparability, and reliability demanded of state tests by ESSA. These include “performance assessments” that probe deeper levels of learning by asking students to show what they know by completing an experiment or conducting an analysis rather than merely answering multiple-choice questions; student surveys of school climate; and “competency- or skills-based assessments” that provide students immediate results and permit them to progress at their own pace based on demonstrated mastery.

Compared to current state tests, these new forms of measurement are able to gauge a wider range of student competencies, from career and technical skills to interpersonal skills to digital problem solving. Federal policymakers could require that school districts measure students against the same standards and learning progressions used in state tests and that parents receive clear score reports so they understand exactly how their children are performing.

NCLB roughly tripled the amount of state testing in the nation’s schools. It also led to significantly greater school-district use of commercially developed interim and benchmark tests to measure whether students were on track to do well on state tests. And these assessments were frequently layered on top of existing local testing. By reducing the state testing footprint and incentives for school districts to test students’ readiness for state tests, the new model we’re proposing would likely lessen standardized testing significantly in many schools and allow more time for instruction.

The Federal Role

These changes would require a revision of federal testing requirements or, in the short term, a willingness by federal officials to let states adopt the model under the U.S. Department of Education’s Innovative Assessment Demonstration Authority. Designed to encourage novel approaches to state testing, IADA has attracted few takers since its creation nine years ago because it still requires states to meet federal accountability mandates. However, Education Secretary Miguel Cardona announced late in 2023 that he was relaxing the program’s requirements to encourage more states to take part, an approach the new administration could be inclined to continue given its stated commitment to reducing federal oversight.

We know that the new testing paradigm we’re proposing would spark controversy because it would require tradeoffs. Even though our approach would provide a significant level of transparency—and transparency itself serves as a form of accountability—many accountability advocates insist on the need for consequences for schools whose students perform poorly, even if accountability under ESSA has fallen short of that expectation.

Also, without annual testing of every student, measuring growth in student achievement over time poses difficulties. Federal law now rightly encourages that metric in order to more fairly evaluate the work of schools that serve large percentages of vulnerable students who start school behind their more privileged peers. What’s more, a good deal of education research depends on the data derived from measuring student performance year after year. State testing of students every other year would permit policymakers to continue to measure student growth with a meaningful degree of confidence while providing an audit on local reporting. States and school districts could administer performance assessments and other innovative measures in the alternate grades or years.

We’re encouraged that Dale Chu and Chad Aldeman agree with us that standardized testing is excessive, that state tests are not the best vehicle to provide detailed instructional roadmaps for educators or diagnostics on individual students, and that local educators and parents need test results more quickly.

But they sidestep the dilemma at the crux of our essay: that the current configuration of state testing is playing into the hands of testing opponents even as the school accountability system it powers has stalled in many states.

Nor in the course of critiquing matrix sampling do Chu and Aldeman acknowledge that testing every student every other year or every other grade—a second path to the more manageable testing system that we propose—preserves the capacity to measure student growth.

And their singular state testing fix—quicker “pulse tests” throughout the school year—runs up against the federal accountability-driven requirement that state tests produce a single, year-end score tied to state standards. That’s why we propose faster, more frequent testing at the local level.

We support testing, but we also try to think realistically. Accountability advocates have not been inclined to compromise. More broadly, stakeholders have yet to engage in a clear-eyed national conversation about how much a single test can accomplish; policymakers continue to search for a unicorn assessment that can be all things to all people. Until they begin to explore alternatives such as what we have proposed here, the stalemate on standardized testing will continue—and the likelihood of losing state testing altogether under the next reauthorization of the federal elementary and secondary education law will increase.

This essay is part of a two-part forum. Read Dale Chu and Chad Aldeman’s response here.
