DOE Software Study: Are the Numbers Flawed?


4/6/2007—The United States Department of Education (ED) Thursday released the results of a study measuring the impact of math and reading software on student achievement. The results? "Test scores were not significantly higher in classrooms using selected reading and mathematics software products." This didn't sit well with organizations (or individuals) in education technology. Several responded to the study, denouncing the research as flawed or urging careful scrutiny of the data.

The report for Congress released by ED, titled Effectiveness of Reading and Mathematics Software Products, was based on research that commenced in 2003 in conjunction with Mathematica Policy Research Inc. (MPR) and SRI International. Participants included 33 districts, 132 schools, and 439 teachers using 16 different software products, which were selected based on recommendations by "expert review panels" and the study team. Products covered first- and fourth-grade reading, sixth-grade math, and algebra. The sample included 9,424 students, who were tested in 2004 and 2005.

In a joint statement issued Thursday, the Consortium for School Networking (CoSN), the International Society for Technology in Education (ISTE), and the State Educational Technology Directors Association (SETDA) cautioned readers of the report "to scrutinize the findings carefully, as even [ED] states that the study 'was not designed to assess the effectiveness of educational technology across its entire spectrum of uses.'" These organizations also noted that there are several "scientifically-based studies" from independent research organizations funded by ED that show technology's positive impact on learning and instruction.

But essentially, said critics of the study, ED didn't take into account the critical factors of proper implementation and curriculum integration, professional development for teachers, planning, or infrastructure issues, among others.

The Software & Information Industry Association (SIIA) also released a statement on the research. "As this study recognizes, proper implementation of education software is essential for success. Unfortunately, it appears the study itself may not have adequately accounted for this key factor, leading to results that do not accurately represent the role and impact of technology in education."

The results of the study were not uniformly null. But on the whole, each of the four categories in the study showed no statistically significant differences between the test groups using software and the control groups. This runs counter to other studies, said critics, that have shown the positive impact technology has had on education.

"Across the nation, educators are using a range of learning technologies to transform teaching and learning and improve student achievement. We have solid evidence that these efforts are working," said Mary Ann Wolf, Executive Director of SETDA, in the joint statement.

"It is important to remember that educational software, like textbooks, is only one tool in the learning process. Neither can be a substitute for well-trained teachers, leadership, and parental involvement," said Keith Krueger, CEO of CoSN, in the same joint statement release.

First-grade reading
According to the report, "First grade reading products did not affect test scores by amounts that were statistically different from zero. Figure 1 shows observed score differences on the SAT-9 reading test, and Figure 2 shows observed score differences on the Test of Word Reading Efficiency. The differences are shown in “effect size” units, which allow the study to compare results for tests whose scores are reported in different units. (The study’s particular measure of effect size is the score difference divided by the standard deviation of the control group test-score.) Effect sizes are consistent for the two tests and their subtests, in the range of -0.01 to 0.06. These effect sizes are equivalent to increases in student percentile ranks of about 0 to 2 points. None is statistically significant."

Figs. 1 and 2 show the impact of first-grade reading software on SAT-9 and the Test of Word Reading Efficiency results. Data were compiled from five software reading products--Destination Reading (Riverdeep), the Waterford Early Reading Program (Pearson Digital Learning), Headsprout (Headsprout), Plato Focus (Plato), and the Academy of Reading (Autoskill)--used in 11 districts and 43 schools, involving 158 teachers and 2,619 students.

The results are shown in units of "effect size," allowing ED to compare results across systems that use different measurements. (The complete report, linked at the end of this article, shows actual test figures.) These numbers are compared with the results of a control group that was not using the selected software.
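The study's effect-size measure, as quoted above, is the score difference divided by the standard deviation of the control group's test scores, and its percentile-point equivalents assume scores follow a normal distribution. A minimal sketch of that arithmetic (the function names here are illustrative, not from the report):

```python
import math
import statistics

def effect_size(treatment_scores, control_scores):
    """Effect size as the study defines it: the mean score difference
    divided by the control group's standard deviation."""
    diff = statistics.mean(treatment_scores) - statistics.mean(control_scores)
    return diff / statistics.stdev(control_scores)

def percentile_gain(d):
    """Approximate percentile-rank gain for an average student, assuming
    normally distributed scores: 100 * (Phi(d) - Phi(0))."""
    phi = 0.5 * (1.0 + math.erf(d / math.sqrt(2)))
    return 100.0 * (phi - 0.5)

# An effect size of 0.06 -- the top of the range the report cites for
# first-grade reading -- works out to roughly 2 percentile points.
print(round(percentile_gain(0.06), 1))  # prints 2.4
```

This is why the report can describe effect sizes of -0.01 to 0.06 as "increases in student percentile ranks of about 0 to 2 points."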

Fourth-grade reading
Similarly, the ED study found no statistically significant effects from the use of fourth-grade reading software. For this portion of the study, nine districts and 43 schools were involved, with 118 teachers and 2,265 students.

Fig. 3 shows the results of the SAT-10 tests for the fourth-graders in the sample as compared with results from the control group.

Software put under the microscope for this portion of the study included LeapTrack (LeapFrog), Read 180 (Scholastic), Academy of Reading (Autoskill), and KnowledgeBox (Pearson Digital Learning).

Sixth-grade math and algebra
For the sixth-grade math portion of the study, 10 districts and 28 schools participated, including 81 teachers and 3,136 students. Three software products were used in this portion of the study: Larson Pre-Algebra (Houghton-Mifflin), Achieve Now (Plato), and iLearn Math (iLearn).

Fig. 4 shows the results of the analysis of math test results for the SAT-10 Math exam. Again, according to ED's reckoning, none of the improvements found were statistically significant.

The story was similar with the results from the study of algebra-related software (fig. 5). The study involved 10 districts and 23 schools, including 69 classrooms and 1,404 students. The software products studied included Cognitive Tutor Algebra (Carnegie Learning), Plato Algebra (Plato), and Larson Algebra (Houghton-Mifflin). Again, according to the study, there were no statistically significant differences in results between those who used the software and those who did not.

Training, support and school/classroom characteristics
In all of the study areas of this latest research, ED said results varied widely between schools, but that variation showed few correlations with differences in school and classroom characteristics.

Said the study: "For first and fourth grade reading products, the study found several school and classroom characteristics that were correlated with effectiveness, including student-teacher ratios (for first grade) and the amount of time products were used (for fourth grade). The study did not find characteristics related to effectiveness for sixth grade math or algebra. The study also found that products caused teachers to be less likely to lecture and more likely to facilitate, while students using reading or mathematics software products were more likely to be working on their own."

Also worth noting was that the study involved teachers who had not used the software products in the year prior to the research period. There will, however, be a second-year follow-up study that will determine whether effectiveness increases when teachers have more experience with the tools.

For this first-year study, teachers received, on average, 7.5 hours of training on the software used, ranging from two to 18 hours, depending on the software. Training began in summer and fall of 2004, with 94 percent of participating teachers attending the initial meeting. In addition to training, teachers received help from vendors during the school year in various ways, including:

  • Help desk: 69 percent;
  • Vendors visiting teachers: 55 percent; and
  • Additional training at school: 39 percent.

The report also noted that the study group worked with districts during the period of the survey "to identify hardware and software needs, such as computers, headphones, memory, and operating system upgrades, and the study purchased the upgrades as needed. Common upgrades included desktop and laptop computers, servers, memory, and headphones. The study did not upgrade networking infrastructures, though some purchases of servers enabled products to operate more smoothly on local networks. As noted in the previous chapter, providing hardware and software upgrades may have contributed to higher levels of measured effectiveness, if districts normally would not have been able to purchase the upgrades. The study did not provide software or hardware support for control group teachers."

However, the support and training may not have been enough. Said the SIIA: "A strong body of research demonstrates that implementation is crucial to the success of any technology. Whether a given school experiences the full benefits of a software application depends just as much on the planning, teacher training, school leadership, technology infrastructure, support and technology use as it does the technology itself.

"There are questions about whether these issues were adequately addressed in this study. SIIA learned of a number of concerns, including inadequate student time on task, limitations in providing comprehensive product training over the time of the implementation and inappropriate match of technology design to local curriculum. It is also well recognized that year one of a technology implementation is too early to draw conclusions. Time is needed for teachers to be trained and gain comfort in integrating technology into their lessons."

In addition, the joint statement from CoSN, ISTE, and SETDA cited examples of the positive impact technology can make on student achievement in environments that include adequate professional development and support.

Said the statement: "In Utah, Missouri and Maine, the eMINTS program, which provides schools and teachers with education technology, curriculum and [more than] 200 hours of professional development, was responsible for students in the eMINTS classroom achieving [more than] 10 to 20 percentage points higher than students in the control classroom. Additionally, in Iowa, after connecting teachers with sustainable professional development and technology-based curriculum interventions, student scores increased by 14 percentage points in eighth grade math, 16 points in fourth grade math, and 13 points in fourth grade reading, when compared with control groups."

SIIA, for its part, said that the study does not diminish the importance of technological advances in the classroom. And the CoSN/ISTE/SETDA statement concluded, "As we consider America's competitiveness, we cannot allow one narrow study to derail the progress technology is making in education in our 21st Century global economy."

About the author: Dave Nagel is the executive editor for 1105 Media's educational technology online publications and electronic newsletters. He can be reached at
