Automated Engines Score Essays Like Humans -- THE Journal

Automated Engines Score Essays Like Humans

A study from the PARCC found that essay scores assigned by computers matched those assigned by humans across various performance metrics.

By Sri Ravipati
04/14/16

A report from a national testing organization found that the performance of automated scoring engines matches that of human scorers.

The Partnership for Assessment of Readiness for College and Careers (PARCC), a consortium of states developing a common set of K-12 assessments in mathematics and English language arts aligned with the Common Core State Standards Initiative, recently released a report on the viability of computer-scored essays. The study was conducted in 2014 and published in 2015, but the report was not widely available to the public until the Parent Coalition for Student Privacy and other parties wrote a letter to state commissioners urging the PARCC to make it more widely available.

Pearson Education and Educational Testing Service (ETS) jointly participated in the research study to test and compare the PARCC’s automated scoring against human scoring. The joint study included 75 prompts, spanning multiple grade levels and task types. Both Pearson and ETS first trained their scoring engines on the prompts using the corresponding human-scored responses. Then, Pearson and ETS fed an unseen set of student essays to their scoring engines and compared the results to human scores on the same unseen set. Performance was broken down by grade level, trait and type of prompt. The study revealed that, on average, the performance of the automated scoring engines matched that of the human scorers; only on grade three essays did the engines perform slightly below the human scorers.
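The comparison described above, machine scores checked against human scores on the same held-out essays, can be illustrated with a short Python sketch. The article does not name the metrics the study used, so the two shown here (exact agreement and quadratic weighted kappa, a measure commonly reported in automated scoring research) are assumptions, as are the function names and the sample scores, which are invented purely for illustration and are not data from the report.

```python
from collections import Counter


def exact_agreement(human, machine):
    """Fraction of essays where both scorers gave the identical score."""
    if len(human) != len(machine) or not human:
        raise ValueError("score lists must be non-empty and of equal length")
    return sum(h == m for h, m in zip(human, machine)) / len(human)


def quadratic_weighted_kappa(human, machine, min_score, max_score):
    """Agreement corrected for chance; larger score gaps are penalized more.

    Returns 1.0 for perfect agreement, about 0.0 for chance-level agreement,
    and negative values for agreement worse than chance.
    """
    if len(human) != len(machine) or not human:
        raise ValueError("score lists must be non-empty and of equal length")
    levels = max_score - min_score + 1
    if levels < 2:
        raise ValueError("the score scale needs at least two levels")

    n = len(human)
    observed = Counter(zip(human, machine))  # joint counts of (human, machine)
    human_hist = Counter(human)              # marginal counts per scorer
    machine_hist = Counter(machine)

    observed_disagreement = 0.0
    expected_disagreement = 0.0
    for i in range(min_score, max_score + 1):
        for j in range(min_score, max_score + 1):
            weight = (i - j) ** 2 / (levels - 1) ** 2
            observed_disagreement += weight * observed[(i, j)] / n
            expected_disagreement += weight * human_hist[i] * machine_hist[j] / n**2

    if expected_disagreement == 0:
        # Both scorers used one and the same score for every essay.
        return 1.0
    return 1.0 - observed_disagreement / expected_disagreement


if __name__ == "__main__":
    # Hypothetical held-out scores on a 1-4 scale (not from the PARCC study).
    human_scores = [3, 2, 4, 1, 3, 2]
    machine_scores = [3, 2, 3, 1, 3, 3]
    print(exact_agreement(human_scores, machine_scores))
    print(quadratic_weighted_kappa(human_scores, machine_scores, 1, 4))
```

In a study of this kind, the same statistics would typically also be computed between two human scorers on the same essays, giving a baseline against which the engine's agreement with humans can be judged.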

In the letter, parents and advocates raised concerns about automated scoring, citing the “inability of computers to assess the creativity and critical thought that the Common Core standards were supposed to demand.” They wanted more information from the PARCC, such as the percentage of computer-scored tests that were re-checked by humans and what happens when machine scores differ significantly from human scores.

Despite the call for more information, several of the PARCC states will start using scoring engines to judge essays this year, according to an online report. About two-thirds of all student essays will be scored automatically, while one-third will be scored by humans. In addition, about 10 percent of all responses will be randomly selected to receive a second score as a precaution. States can still opt to have all essays hand-scored.

About the Author

Sri Ravipati is Web producer for THE Journal and Campus Technology. She can be reached at [email protected].
