The Surprising Truth About Writing Auto-Graders

The technology promises to save time and eliminate tedium, but can English papers truly be assessed without the human touch of a teacher?

This article originally appeared in T.H.E. Journal's June 2013 digital edition.

The West Virginia Department of Education's auto-grading initiative dates back to 2004--a time when school districts were making their first forays into automation. The Charleston-based WVDE had instituted a statewide writing assessment in 1984 for students in fourth, seventh, and 10th grades, and was looking to expand that program without having to pay more teachers to hand-score the tests over the summer.

"We brought teachers to Charleston and paid them daily rates plus lodging and meals," explains Sandy Foster, lead coordinator in the WVDE's Office of Assessment and Accountability. "It was great professional development for the teachers, but the process was expensive; we were only able to do it for the three grade levels."

Seeking a more affordable solution, the WVDE spent years exploring auto-grading--programming computers to scan and score student essays. It ultimately selected CTB/McGraw-Hill's Writing Roadmap, an online writing tool that measures and tracks writing proficiency for students in grades 3 through 12. Foster says the off-the-shelf product came with 100 writing assignments, or sets of instructions for essays (called "prompts"), as well as a generic scoring engine that was customized to meet the WVDE's needs with additional prompts.

The customization also included training the scoring engine on the additional prompts (a process handled by the vendor) and then field-testing several thousand student papers to ensure that the engine was properly trained on them. The end result was the WV Writes program, which has been scoring students in grades 3 through 11 since 2008. Second-graders also use the auto-grading system, but only for practice. In the first three months of 2013 alone, the system scored about 500,000 essays for the 180,000 students the WVDE serves.

Grading the Graders
Using WV Writes, teachers make assignments, then students sit down at a computer to read the prompt and type their reading/language arts essays into the system. Students use a locked-down keyboard that doesn't allow access to the internet, says Foster, thus eliminating the chance of students "spoofing" tests by cutting and pasting information from the web.

Once the essays are complete, students hit a single key to have their work immediately assessed and scored on a six-point rubric that includes organization, development, sentence structure, word choice, grammar usage, and mechanics. "It scores essays holistically based on each of those traits," Foster says, "and spits out a number between one and six."
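The scoring engine itself is proprietary, and the article doesn't describe how the six trait scores become a single holistic number. As a purely illustrative sketch in Python (not Writing Roadmap's actual code), the roll-up could be as simple as averaging the trait scores and rounding to the 1-6 scale:

    # Illustrative only: assumes each trait gets its own 1-6 score and the
    # holistic score is the rounded average. The real engine is proprietary.
    TRAITS = ["organization", "development", "sentence structure",
              "word choice", "grammar usage", "mechanics"]

    def holistic_score(trait_scores: dict[str, float]) -> int:
        """Roll six 1-6 trait scores into a single holistic 1-6 score."""
        missing = [t for t in TRAITS if t not in trait_scores]
        if missing:
            raise ValueError(f"missing trait scores: {missing}")
        avg = sum(trait_scores[t] for t in TRAITS) / len(TRAITS)
        return max(1, min(6, round(avg)))

    # A paper that is strong on mechanics but thin on development:
    print(holistic_score({
        "organization": 4, "development": 2, "sentence structure": 4,
        "word choice": 3, "grammar usage": 5, "mechanics": 5,
    }))  # -> 4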

To ensure scoring reliability and validity, the WVDE conducts an annual review in which 50 to 60 teachers are trained on the scoring rubric and then spend several days hand-scoring student papers that have already been scored by WV Writes. The teachers don't know the automated scores for the papers they are grading. According to Foster, the similarities between the auto-grades and the hand-scored grades are remarkable.

"In most cases there's a closer correlation between the human and computer scores than there is between two human scores," says Foster. "It's pretty amazing." The annual validity studies helped quell one of the WVDE's biggest challenges with auto-grading: the fact that English teachers don't believe that computers can score writing. "When teachers saw for themselves how close the WV Writes scores were to theirs," says Foster, "more of them became believers."

Gaming the System
With automation gaining ground in the K-12 environment, it was only a matter of time before the tedious, time-consuming task of grading English papers was turned over to robo-graders that are fast, cheap, and tireless. But as WVDE found out, the notion that a computer can effectively judge student writing is one that's regularly contested by English teachers and others.

Ease of cheating can be another major obstacle to auto-grading papers, according to Jim Klein, director of information services and technology for the Saugus Union School District in Santa Clarita, CA. In 2008, the district started using the MY Access! writing and assessment software from Vantage Learning to help fourth-grade students pre-assess and hone their essays before submitting them to teachers for hand-grading.

Klein says it didn't take long for students to figure out ways to game the system. "Use well-constructed sentences and million-dollar words and your score goes up," he says. Foster says WVDE has faced similar challenges with its auto-grading system. Essay length counts, for example, so some students develop a single, well-constructed paragraph and then duplicate it multiple times throughout the essay--a trick that the auto-grader doesn't always pick up on. (Foster says this is a "technology issue that the vendor is currently working on resolving.")
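CTB/McGraw-Hill hasn't said how it plans to catch the copy-and-paste trick, but the underlying check is easy to imagine: flag essays in which the same paragraph appears more than once. A simple illustration, not the vendor's method:

    # Illustrative only: flags essays where a (lightly normalized) paragraph repeats.
    from collections import Counter

    def duplicated_paragraphs(essay: str) -> list[str]:
        paragraphs = [p.strip().lower() for p in essay.split("\n\n") if p.strip()]
        counts = Counter(paragraphs)
        return [p for p, n in counts.items() if n > 1]

    essay = (
        "The novel explores ambition through its central character.\n\n"
        "The novel explores ambition through its central character.\n\n"
        "In conclusion, ambition drives the plot."
    )
    print(duplicated_paragraphs(essay))
    # ['the novel explores ambition through its central character.']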

"The reality is the kids will always find a way to game these systems," Klein says, "and manipulate their scores." On a positive note, he says auto-grading has alleviated at least some of the pressure of grading English papers. After hitting "submit" on their keyboards, for example, students can instantly see and fix small errors like the fact that they forgot to single-space after every period.

"When a first draft is lousy, and hand-graded, the teacher has to go through a series of steps to get the student through the revision process," says Klein. "MY Access! serves as a refinement tool that provides immediate feedback, saves time shuttling essays back and forth between student and teacher, and preps students for Smarter Balanced assessments."

Klein says that while auto-grading has its role in pre-assessing writing, it's not likely to replace hand-grading anytime soon. "A tool cannot look at an individual student and what he or she is capable of, not to mention the fact that students have learned how to work around the technology to boost their scores," says Klein. "In the end, you still need that individual touch that only a teacher can provide."

A Work in Progress
Rob Primmer, a ninth- and 12th-grade English teacher at Brookline High School in Brookline, MA, is not a proponent of auto-grading. "I'm wary of giving a student a pass or fail mark based on a machine," says Primmer. "I need to know more about how a machine really stacks up against a human grader; I'm not ready to give that over."

Primmer says he handles the time-consuming process of grading papers on his own, with help from Canvas, the school's open-source online course management system. Using the system, Primmer can annotate essays and other writing projects that students have proofread, finalized, and uploaded. "I provide commentary, highlight passages, and give feedback," says Primmer, "but I do all of the grading."

That doesn't mean Primmer hasn't at least thought about handing over some of the grading work to a machine. He says that if he could develop a checklist of elements that are missing from or inserted into English papers (incorrect book titles within citations, for example), and then turn that list over to a computer program to point out those basics to the student for revision, it would facilitate--but not completely replace--the human grading process.
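Primmer's checklist is only an idea, but a bare-bones version is easy to picture: give the program the exact titles of the assigned texts and have it flag citations that don't match, so the student can revise before the teacher ever sees the paper. A hypothetical sketch:

    # Hypothetical: neither Primmer nor his school uses this code.
    ASSIGNED_TITLES = {"To Kill a Mockingbird", "Of Mice and Men"}

    def flag_title_errors(citations: list[str]) -> list[str]:
        """Return citations that don't contain any assigned title exactly as written."""
        return [c for c in citations
                if not any(title in c for title in ASSIGNED_TITLES)]

    citations = [
        "Lee, Harper. To Kill a Mockingbird. 1960.",
        "Steinbeck, John. Of Mice And Men. 1937.",   # capitalization is off
    ]
    for problem in flag_title_errors(citations):
        print("Check this title:", problem)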

"Who knows how things will look in 10 years?" says Primmer. "From my perspective the core of English will still be books, reading, and discussion. The classroom is a culmination of thoughts that come together and can't be reproduced. From my perspective, this aspect of [English] instruction doesn't seem to lend itself to a digital pathway."
