How To Build an Online High-Stakes Assessment System

As one of the pioneers in instituting statewide online assessments, Delaware provides an excellent case study in how to plan and implement such an ambitious initiative.

At first glance it appears the state of Delaware has pushed through online testing in all of its public schools--a feat in and of itself--in less than a year. But according to Wayne Hartschuh, the seeds for its success were actually planted 15 years ago, when the state began planning for a centralized networking infrastructure that would power not only the state government but also all 19 school districts.

The Delaware Center for Educational Technology, where Hartschuh is executive director, was formed specifically with the goal of wiring every classroom. The Center swept through the state, retrofitting buildings to deliver Internet access to every classroom and providing professional development to teachers, librarians, and administrators to help them understand what going online meant.


By the late 1990s, once the infrastructure was in place, a second influx of state funds helped buy computers for the classrooms. In the early 2000s, the state consolidated 45 separate student information systems into a single one.

All of that earlier work, said Hartschuh, "is what allowed us to ramp up so quickly" with online testing.

But it was the ending of a vendor contract that drove the immediate need. The company delivering Delaware's end-of-year paper-and-pencil tests had held that job since 1998, and its contract was set to end with the 2009-2010 school year. The state put in place a task force charged with designing its new student assessment system. As Hartschuh recalled, "They pulled the stakeholders together and came up with the recommendation to go to online testing."

A Shift in Approaching Assessment
That task force--made up of educators, administrators, teachers, legislators, business people, and state Department of Education staff--focused on two things: immediate scoring and more instructional information measured across time. Previously, the lone test would be issued in March; the results would come back in June.

Linda Rogers, Associate Secretary for the Teaching and Learning Branch at Delaware's Department of Education, compared that testing experience to taking a snapshot--and one that took up to two months to develop before anyone could see the results. And then someone would complain that "their eyes were closed."

By shifting to multiple tests throughout the year, "it's more like a picture album," she explained. "You can see growth. That was missing in the high-stakes environment where you sat once for a test. And let's hope you have a good day and your eyes are open." The state decided to pursue a testing cycle consisting of three exam periods: the first a couple of weeks into the new school year, another at mid-year, and the final one beginning in April, during which a student could take a given test twice. For federal accountability, the state would use the best score from that spring testing round.

The task force also decided to make the tests adaptive, which would shape how the online versions behaved, "a light year away from a fixed-form paper and pencil test," she noted. When a student takes an adaptive exam, he or she answers about three questions; after those three items, the test adjusts, asking questions that gauge the test-taker's performance above or below grade level over the course of about 50 questions. "That's not possible in a paper and pencil test," she said.
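The article does not describe the selection algorithm AIR actually uses, so the sketch below is only a toy illustration of the adaptive idea Rogers describes: start near grade level, then step item difficulty up or down in response to each answer over roughly 50 items. The function names, the simple step rule, and the simulated student are all assumptions made for illustration.

```python
import random

def simulate_student(item_difficulty, true_level=2):
    # Stand-in for a real response: more likely correct on items at or
    # below the student's (hypothetical) true level.
    return random.random() < (0.8 if item_difficulty <= true_level else 0.3)

def run_adaptive_test(item_bank, num_items=50):
    """item_bank: list of (difficulty, question_text) pairs, where
    difficulty 0 means on grade level, negative below, positive above."""
    ability = 0                       # start the estimate at grade level
    remaining = list(item_bank)
    history = []

    for _ in range(min(num_items, len(remaining))):
        # Ask the unanswered item whose difficulty is closest to the estimate.
        item = min(remaining, key=lambda it: abs(it[0] - ability))
        remaining.remove(item)
        difficulty, question = item

        correct = simulate_student(difficulty)
        history.append((question, difficulty, correct))

        # Crude update: drift toward harder items after a correct answer,
        # easier items after a miss.
        ability += 1 if correct else -1

    return ability, history

# Example: a small bank spanning two grade levels below to two above.
bank = [(d, f"question {i}") for i, d in enumerate([-2, -1, 0, 1, 2] * 10)]
final_estimate, answers = run_adaptive_test(bank)
print(final_estimate, sum(correct for _, _, correct in answers))
```

A production adaptive engine would use calibrated item statistics from field testing rather than a simple step rule, but the overall loop--estimate, select, ask, update--is the same idea.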

The request for proposals to find a new testing vendor that could fulfill the state's specifications went out, and, by December 2009, the contract was awarded to American Institutes for Research (AIR), a testing company in Washington, DC. AIR held a kickoff meeting in January 2010 and informed the state's Department of Education that, in order to meet the aggressive schedule for the 2010-2011 school year, it would have to do item analysis in field testing during April and May 2010, a scant three months away. Each test would last about 90 minutes, and rather than all third graders, for example, sitting down at the same time to take the math exam, as they would with paper and pencil, classes would have to be cycled through computer labs, one class on one day and another the next.

To make sure there were plenty of computers available for the field testing, the state purchased and distributed 8,700 Dell Inspiron 11zs, a device sized between a netbook and a laptop.

Initial Hurdles and Minor Problems
Given the rapid pace of the whole initiative, Hartschuh acknowledged that he had to deal with pushback from the district IT people. "I have a pretty good rapport with tech coordinators across the state," he explained. "They'd say, 'We can't do this in our district.' I'd respond, 'Hey, you don't have a choice. If something goes wrong, and people point fingers, they'd better not be pointing at the technology.' That's the way it was. They did a fantastic job, on a short timeframe, of getting machines ready and getting them into the places where kids were going to go."

Hartschuh estimated that about 189,000 field tests were delivered during that period, the results of which went into developing a baseline of questions that would appear in the real exams starting in fall 2010. But ultimately, he added, the field test was about getting the process and technology in place. "The system was set up. Information was distributed. Teachers were trained. Tests were administered. And we got results."

Minor problems did surface. At a couple of schools, the computers in one lab would work while the computers next door wouldn't, owing to small differences in configuration. Or a wireless base station would lock up in the middle of tests. "All of a sudden, 25 computers would stop working," he sighed. The base station would be reset, the students would log off, then back on, and they'd be taking the test again. "From the perspective of technology, that's a minor thing. To the kids sitting there and the teacher administering the test, that's a major thing. So we had to adjust to those types of situations."

The first real testing cycle started up in October, not September as planned, owing to last-minute modifications in the technology. The IT staff at the Center for Ed Tech had to swap the Mozilla browser used during the field test for a more secure edition, to prevent students from going out to the Web or others from getting into the tests. They also locked the screen to a set resolution to keep the test's look consistent. Tweaks had to be made so the browser wouldn't update its various components on its own schedule; likewise, Windows updates needed to be turned off during the school day.
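The article doesn't spell out how those tweaks were applied. One common way to keep automatic updates from firing during class is to set the standard Windows Update policy value in the registry, as in the hedged sketch below; whether Delaware's technicians used this mechanism, a Group Policy object, or something else is not stated, so treat the script purely as an illustration.

```python
import winreg  # standard library; Windows only

# Windows Update Group Policy values live under this registry key.
AU_POLICY_KEY = r"SOFTWARE\Policies\Microsoft\Windows\WindowsUpdate\AU"

def disable_automatic_updates():
    """Set NoAutoUpdate = 1 so the Automatic Updates agent stops
    downloading and installing on its own schedule. Must be run
    with administrative rights."""
    key = winreg.CreateKeyEx(winreg.HKEY_LOCAL_MACHINE, AU_POLICY_KEY,
                             0, winreg.KEY_SET_VALUE)
    try:
        winreg.SetValueEx(key, "NoAutoUpdate", 0, winreg.REG_DWORD, 1)
    finally:
        winreg.CloseKey(key)

if __name__ == "__main__":
    disable_automatic_updates()
```

At district scale the same setting would more likely be pushed through Group Policy or a systems-management tool, and re-enabled outside testing hours, rather than scripted machine by machine.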

But, Hartschuh pointed out, the field test gave the state a chance to shake out its technology. By the time the first real testing window opened, fewer issues surfaced, and each window since has brought fewer still. "With each window we have, things go a little smoother."

Now, after a student takes his or her test, the results are shown immediately. As classes wrap up their testing day, the schools, districts, and the state get reporting on results as well, within 48 hours if not sooner. But can instructors actually adapt their teaching that quickly to have an impact on test results as the school year goes by? "It is their professional responsibility to do so," insisted Rogers.

Don't Call It a 'Test!'
As she noted, Delaware is a winner of Race to the Top funds, which is spurring many of the reform programs being put in place in the state. That encompasses ample teacher professional development through professional learning communities and the use of "data coaches" to help teachers understand what the test results mean and how to use them to guide their work in the classroom. But those assessments "aren't meant to be the sole indicator," she added. "The data coaches are there to say, 'Does this look like what you're seeing in the classroom? Does this look like what you're getting in some other piece of assessment--be it DIBELS or SAT results or whatever other information you have?' It's a comparative thing where you're looking for patterns to see how much you can learn about a student across multiple pieces of data."

In fact, Rogers bristled at the phrase "state test." "You'll hear remarks [such as], 'What does their state test look like?' Delaware has an assessment system," she said, emphasizing that last word. "It's not one test at one point in time. It's a system with multiple components that we've developed, all of them important."

Rogers said Delaware could never have pulled off its fast track to online testing without the strong alignment of interests and commitment from the governor's office, the state legislative bodies, and the educational community. "Any one of these three groups can stop these efforts in their tracks," she pointed out. "We have been able to enjoy discussion, promotion, encouragement, fiscal resources, people resources--we've just had a commitment from all those levels to get it done."
