NIST Introduces National Generative AI Testing Program

The National Institute of Standards and Technology (NIST) is moving toward establishing a more standardized national approach to AI safety. The government agency has announced the launch of NIST GenAI, described as an "evaluation program to support research in Generative AI technologies."

The launch comes six months after the Biden White House signed an Executive Order requiring LLM makers to implement guardrails around AI technologies that protect the privacy and security of consumer data. Among other provisions, the order mandated the development of "standards, tools, and tests to help ensure that AI systems are safe, secure, and trustworthy," and of "standards and best practices for detecting AI-generated content and authenticating official content."

The NIST GenAI program is part of the agency's effort to address those mandates.

A companion NIST program, dubbed Aria, is set to launch soon. Aria's stated goal is "to advance measurement science for safe and trustworthy AI."

In a press release Monday, the U.S. Department of Commerce, of which NIST is part, described the GenAI program as a platform to "evaluate and measure generative AI technologies."

"The NIST GenAI program will issue a series of challenge problems designed to evaluate and measure the capabilities and limitations of generative AI technologies," said the agency. "These evaluations will be used to identify strategies to promote information integrity and guide the safe and responsible use of digital content."

The first of these challenges aims to evaluate the efficacy of text-to-text (T2T) AI models: those that generate human-like text ("generators") and those that purport to detect AI-generated text ("discriminators"). Findings from the challenge will help guide NIST's eventual recommendations to LLM makers on how to convey the provenance of content made using their AI systems. This is how NIST describes the challenge on its Overview page:

NIST GenAI T2T is an evaluation series that supports research in Generative AI Text-to-Text modality. Which generative AI models are capable of producing synthetic content that can deceive the best discriminators as well as humans? The performance of generative AI models can be measured by (a) humans and (b) discriminative AI models. To evaluate the "best" generative AI models, we need the most competent humans and discriminators. The most proficient discriminators are those that possess the highest accuracy in detecting the "best" generative AI models. Therefore, it is crucial to evaluate both generative AI models (generators) and discriminative AI models (discriminators).
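To make the generator-versus-discriminator framing concrete, here is a minimal Python sketch of how the two sides might be scored against each other. It is an illustration only, not NIST's published scoring method: the `Judgment` record, both function names, and the choice of plain accuracy and miss rate as metrics are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Judgment:
    """One discriminator verdict on one text sample (hypothetical record format)."""
    is_ai_generated: bool  # ground truth: was the text produced by a generator?
    flagged_as_ai: bool    # the discriminator's verdict on that text


def discriminator_accuracy(judgments: list[Judgment]) -> float:
    """Fraction of samples, human or AI, that the discriminator labeled correctly."""
    if not judgments:
        raise ValueError("no judgments to score")
    correct = sum(j.is_ai_generated == j.flagged_as_ai for j in judgments)
    return correct / len(judgments)


def generator_deception_rate(judgments: list[Judgment]) -> float:
    """Fraction of AI-generated samples that the discriminator failed to flag."""
    ai_samples = [j for j in judgments if j.is_ai_generated]
    if not ai_samples:
        raise ValueError("no AI-generated samples to score")
    missed = sum(not j.flagged_as_ai for j in ai_samples)
    return missed / len(ai_samples)


if __name__ == "__main__":
    # Four made-up verdicts: two AI-written texts, two human-written texts.
    sample = [
        Judgment(is_ai_generated=True, flagged_as_ai=True),
        Judgment(is_ai_generated=True, flagged_as_ai=False),
        Judgment(is_ai_generated=False, flagged_as_ai=False),
        Judgment(is_ai_generated=False, flagged_as_ai=True),
    ]
    print("discriminator accuracy:", discriminator_accuracy(sample))
    print("generator deception rate:", generator_deception_rate(sample))
```

Under this framing, the two numbers pull against each other on the AI-written samples: a generator scores well when a discriminator misses its output, which is why the challenge description calls for evaluating both sides together.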

The challenge is open to academics, researchers and LLM makers; those interested can consult NIST's participation guidelines. A similar challenge to evaluate text-to-image models is set to start soon.

Besides the GenAI program launch, NIST this week released preliminary versions of four papers about the secure development and implementation of AI. The papers are described as "initial drafts."

Each draft is still subject to change based on public input. NIST is accepting feedback on each publication until June 2, and plans to publish final versions "later this year."

About the Author

Gladys Rama (@GladysRama3) is the editorial director of Converge360.
