Automated Engines Score Essays Like Humans

A study released by PARCC found that scores assigned to essays by computers matched those assigned by human readers across a range of performance metrics.

A report from a national testing organization found that the performance of automated scoring engines matches that of human scorers.

The Partnership for Assessment of Readiness for College and Careers (PARCC), a consortium of states working to create a common set of K-12 assessments in mathematics and English language arts aligned with the Common Core State Standards Initiative, recently released a report on the viability of computer-scored essays. The study was conducted in 2014 and published in 2015, but the report was not widely available to the public until the Parent Coalition for Student Privacy and other parties wrote a letter to state commissioners urging PARCC to release it more broadly.

Pearson Education and Educational Testing Service (ETS) participated jointly in the research study, which tested PARCC’s automated scoring against human scoring. The study covered 75 prompts spanning multiple grade levels and task types. Pearson and ETS first trained their scoring engines on the prompts using the corresponding human-scored responses. They then fed an unseen set of student essays to the engines and compared the results with human scores on the same set, breaking out performance by grade level, trait and type of prompt. The study found that, on average, the automated scoring engines matched the human scorers; only on grade 3 essays did the engines perform slightly below the human benchmark.
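The report summarizes agreement with "various performance metrics" but does not spell them out here. In automated-scoring research of this kind, human-machine agreement is often quantified with quadratic weighted kappa, which penalizes large score disagreements more than small ones. The sketch below is purely illustrative; the scores and the 1-4 rubric are hypothetical and are not taken from the PARCC study.

```python
# Illustrative sketch: quadratic weighted kappa (QWK), a common measure of
# agreement between human and machine essay scores. The data below is made up.
from collections import Counter

def quadratic_weighted_kappa(human, machine, min_score, max_score):
    """QWK between two lists of integer scores on the same scale."""
    assert len(human) == len(machine)
    n = len(human)
    k = max_score - min_score + 1  # number of possible score points

    # Observed joint distribution of (human, machine) score pairs.
    observed = [[0.0] * k for _ in range(k)]
    for h, m in zip(human, machine):
        observed[h - min_score][m - min_score] += 1.0 / n

    # Expected joint distribution if the two raters scored independently.
    human_hist = Counter(h - min_score for h in human)
    machine_hist = Counter(m - min_score for m in machine)
    expected = [[(human_hist[i] / n) * (machine_hist[j] / n)
                 for j in range(k)] for i in range(k)]

    # Quadratic disagreement weights: penalty grows with score distance.
    weight = [[((i - j) ** 2) / ((k - 1) ** 2) for j in range(k)]
              for i in range(k)]

    num = sum(weight[i][j] * observed[i][j] for i in range(k) for j in range(k))
    den = sum(weight[i][j] * expected[i][j] for i in range(k) for j in range(k))
    return 1.0 - num / den

# Hypothetical scores on a 1-4 rubric for ten essays.
human_scores   = [3, 2, 4, 1, 3, 2, 4, 3, 2, 1]
machine_scores = [3, 2, 3, 1, 3, 2, 4, 4, 2, 1]
print(round(quadratic_weighted_kappa(human_scores, machine_scores, 1, 4), 3))
```

A value near 1 indicates that machine scores track human scores closely; studies like this one typically compare the machine-human agreement statistic with the agreement observed between two independent human scorers.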

In the letter, parents and advocates raised concerns about automated scoring, citing the “inability of computers to assess the creativity and critical thought that the Common Core standards were supposed to demand.” They asked PARCC for more information, such as the percentage of computer-scored tests that are re-checked by humans and what happens when machine scores differ significantly from human scores.

Despite the call for more information, several of the PARCC states will start using scoring engines to judge essays this year, according to an online report. This year, about two-thirds of all student essays will be scored automatically, while one-third will be scored by humans. In addition, about 10 percent of all responses will be randomly selected to receive a second score as a precaution. States can still opt to have all essays hand-scored.

About the Author

Sri Ravipati is Web producer for THE Journal and Campus Technology. She can be reached at [email protected].
