Yahoo Develops Interface Classification System for Hadoop

Developers at Yahoo have been working on a new interface classification system in Hadoop to distinguish two facets of an interface from the perspective of backward compatibility: the audience of the interface and the stability of the interface.

The "audience" of an interface refers to its scope or visibility, that is, who its intended consumers are. The classifications in the new system include "public," "limited private" (for hooks exposed to peer frameworks or systems), and "private." The "stability" of an interface refers to whether, and how, it may change in ways that break compatibility. The classifications include "stable," "evolving," and "unstable."

Hadoop is a popular Java-based open-source framework for data-intensive distributed computing, designed to support parallel computations over large data sets on so-called unreliable computer clusters. It's based on Google's MapReduce, a programming model for processing and generating large data sets, which divides an application into multiple units of work, each of which can be executed on any node in a server cluster. Hadoop also includes the Hadoop Distributed File System (HDFS), which is designed to scale to petabytes of storage and to run on top of the file systems of the underlying OS.

In his Yahoo Developer Network Blog, Hadoop team member Sanjay Radia wrote: "Hadoop is increasingly being used to run large, long-lived, enterprise-class applications. Porting these applications to non-compatible upgrades of Hadoop is an arduous, expensive task that distracts teams from finding new and better ways of using Hadoop to bring value to their companies. Today, Hadoop users are demanding backwards compatibility and interface stability; these features are necessary for the next growth phase of Hadoop, as it gains wider enterprise adoption."

According to Radia, an interface can be a Java API, a configuration variable, the parameters or output of a command, or a metrics variable. The system tags Java APIs using Java annotations, while other types of interfaces (configuration options and output formats, for example) are tagged using informal documentation conventions. The upcoming release 0.21 of Hadoop will be the first to expose this classification, Radia said.
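To make the annotation approach concrete, here is a minimal sketch of how audience and stability tags could be declared and read back in Java. The annotation and class names below (Public, LimitedPrivate, Stable, JobSubmitter, RpcHook and so on) are illustrative assumptions modeled on the classifications described in this article; they are not taken from the Hadoop source, whose actual annotation types may be named and structured differently.

```java
import java.lang.annotation.Documented;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

public class ClassificationSketch {

    // Hypothetical audience markers, one per level named in the article.
    @Documented @Retention(RetentionPolicy.RUNTIME)
    public @interface Public {}

    @Documented @Retention(RetentionPolicy.RUNTIME)
    public @interface LimitedPrivate {
        String[] value(); // the peer frameworks allowed to use the interface
    }

    @Documented @Retention(RetentionPolicy.RUNTIME)
    public @interface Private {}

    // Hypothetical stability markers.
    @Documented @Retention(RetentionPolicy.RUNTIME)
    public @interface Stable {}

    @Documented @Retention(RetentionPolicy.RUNTIME)
    public @interface Evolving {}

    @Documented @Retention(RetentionPolicy.RUNTIME)
    public @interface Unstable {}

    // An API that application developers could rely on.
    @Public @Stable
    public static class JobSubmitter {}

    // A hook exposed only to named peer frameworks, still subject to change.
    @LimitedPrivate({"HDFS", "MapReduce"}) @Evolving
    public static class RpcHook {}

    // Builds a label such as "public-stable" from the tags on a class.
    public static String describe(Class<?> c) {
        String audience =
            c.isAnnotationPresent(Public.class) ? "public"
            : c.isAnnotationPresent(LimitedPrivate.class) ? "limited-private"
            : c.isAnnotationPresent(Private.class) ? "private"
            : "unclassified";
        String stability =
            c.isAnnotationPresent(Stable.class) ? "stable"
            : c.isAnnotationPresent(Evolving.class) ? "evolving"
            : c.isAnnotationPresent(Unstable.class) ? "unstable"
            : "unclassified";
        return audience + "-" + stability;
    }

    public static void main(String[] args) {
        System.out.println(describe(JobSubmitter.class)); // public-stable
        System.out.println(describe(RpcHook.class));      // limited-private-evolving
    }
}
```

Because the tags in this sketch are retained at run time, a build or review tool could in principle use a check like describe to flag application code that depends on anything other than public-stable interfaces.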

Yahoo's recommendation to app developers: stick to "public-stable" interfaces. "If you are early adopter, you may use a public-evolving interface," Radia wrote, "but be aware that the interface may change slightly in the future, forcing a change to your application." If you're a framework developer on Hadoop: "You can of course safely use any of the public interfaces, but can also use limited-private interfaces targeted to your framework. For example the Hadoop RPC layer provides limited-private interfaces for HDFS and MapReduce."

The new classification system, which is derived from OpenSolaris and Yahoo's own internal system, has been in the works for the last year. It's part of Yahoo's plan to provide stronger backward compatibility, Radia said.

Details of the interface classification system can be found here.

About the Author

John K. Waters is the editor in chief of a number of Converge360.com sites, with a focus on high-end development, AI and future tech. He's been writing about cutting-edge technologies and culture of Silicon Valley for more than two decades, and he's written more than a dozen books. He also co-scripted the documentary film Silicon Valley: A 100 Year Renaissance, which aired on PBS.  He can be reached at [email protected].
