Yahoo Develops Interface Classification System for Hadoop

Developers at Yahoo have been working on a new interface classification system in Hadoop to distinguish two facets of an interface from the perspective of backward compatibility: the audience of the interface and the stability of the interface.

The "audience" of an interface refers to its scope or visibility, that is, who its potential consumers are. The classifications in the new system include "public," "limited private" (for hooks exposed to peer frameworks or systems), and "private." The "stability" of an interface refers to whether, and when, changes to it are allowed to break compatibility. The classifications include "stable," "evolving," and "unstable."

Hadoop is the popular Java-based open-source framework for data-intensive distributed computing, designed to support parallel computations over large data sets on clusters of unreliable computers. It's based on Google's MapReduce, a programming model for processing and generating large data sets, which divides an application into multiple units of work, each of which can be executed on any node in a server cluster. Hadoop includes the Hadoop Distributed File System (HDFS), which is designed to scale to petabytes of storage and to run on top of the file systems of the underlying OS.

In his Yahoo Developer Network Blog, Hadoop team member Sanjay Radia wrote: "Hadoop is increasingly being used to run large, long-lived, enterprise-class applications. Porting these applications to non-compatible upgrades of Hadoop is an arduous, expensive task that distracts teams from finding new and better ways of using Hadoop to bring value to their companies. Today, Hadoop users are demanding backwards compatibility and interface stability; these features are necessary for the next growth phase of Hadoop, as it gains wider enterprise adoption."

According to Radia, an interface can be a Java API, a configuration variable, the parameters or output of a command, or a metrics variable. The system tags Java APIs using Java annotations, while other types of interfaces (configuration options and output formats, for example) are tagged using informal documentation conventions. The upcoming release 0.21 of Hadoop will be the first to expose this classification, Radia said.
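To make the annotation-based tagging concrete, here is a minimal sketch of how a Java API might carry both an audience and a stability label. The annotation types (`AudienceTag`, `StabilityTag`), the enums, and the tagged interfaces (`JobSubmitter`, `RpcHook`) are hypothetical names invented for illustration; they are not Hadoop's actual annotation classes, which the article does not spell out.

```java
import java.lang.annotation.Documented;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

public class ClassificationSketch {

    // The three audience levels and three stability levels named in the article.
    enum Audience { PUBLIC, LIMITED_PRIVATE, PRIVATE }
    enum Stability { STABLE, EVOLVING, UNSTABLE }

    // Hypothetical annotation recording who an interface is meant for.
    @Documented
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    @interface AudienceTag {
        Audience value();
        // For limited-private interfaces: the peer frameworks allowed to use them.
        String[] consumers() default {};
    }

    // Hypothetical annotation recording how freely an interface may change.
    @Documented
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    @interface StabilityTag {
        Stability value();
    }

    // An illustrative API intended for any application developer.
    @AudienceTag(Audience.PUBLIC)
    @StabilityTag(Stability.STABLE)
    interface JobSubmitter {
        void submit(String jobName);
    }

    // An illustrative hook exposed only to named peer frameworks.
    @AudienceTag(value = Audience.LIMITED_PRIVATE, consumers = {"HDFS", "MapReduce"})
    @StabilityTag(Stability.EVOLVING)
    interface RpcHook {
        void call(String method);
    }

    // Reads the tags back via reflection and formats them as "Name: AUDIENCE/STABILITY".
    static String describe(Class<?> type) {
        AudienceTag audience = type.getAnnotation(AudienceTag.class);
        StabilityTag stability = type.getAnnotation(StabilityTag.class);
        if (audience == null || stability == null) {
            return type.getSimpleName() + ": unclassified";
        }
        return type.getSimpleName() + ": " + audience.value() + "/" + stability.value();
    }

    public static void main(String[] args) {
        System.out.println(describe(JobSubmitter.class));
        System.out.println(describe(RpcHook.class));
        System.out.println(describe(String.class));
    }
}
```

Because the tags in this sketch are retained at runtime, tooling could in principle read them to flag uses of non-public or unstable APIs; whether Hadoop's own annotations are used that way is not stated in the article.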

Yahoo's recommendation to app developers: stick to "public-stable" interfaces. "If you are early adopter, you may use a public-evolving interface," Radia wrote, "but be aware that the interface may change slightly in the future, forcing a change to your application." If you're a framework developer on Hadoop: "You can of course safely use any of the public interfaces, but can also use limited-private interfaces targeted to your framework. For example the Hadoop RPC layer provides limited-private interfaces for HDFS and MapReduce."
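Radia's guidance amounts to a small decision table, sketched below under some assumptions. The `Consumer` categories, the `isRecommended` method, and the `targetedAtConsumer` flag are hypothetical names introduced here, and the rule for framework developers follows the quote literally (any public interface, plus limited-private interfaces aimed at that framework).

```java
public class InterfaceUsagePolicy {

    enum Audience { PUBLIC, LIMITED_PRIVATE, PRIVATE }
    enum Stability { STABLE, EVOLVING, UNSTABLE }

    // Hypothetical consumer categories drawn from the article's recommendations.
    enum Consumer { APPLICATION, EARLY_ADOPTER, FRAMEWORK }

    /**
     * Returns true if the article's guidance would allow this consumer to
     * depend on an interface with the given classification.
     * targetedAtConsumer only matters for limited-private interfaces: it says
     * whether the interface was exposed specifically for this framework.
     */
    static boolean isRecommended(Consumer who, Audience audience,
                                 Stability stability, boolean targetedAtConsumer) {
        switch (who) {
            case APPLICATION:
                // App developers: stick to public-stable.
                return audience == Audience.PUBLIC && stability == Stability.STABLE;
            case EARLY_ADOPTER:
                // Early adopters may also accept public-evolving, at some risk.
                return audience == Audience.PUBLIC
                        && (stability == Stability.STABLE || stability == Stability.EVOLVING);
            case FRAMEWORK:
                // Framework developers: any public interface, plus
                // limited-private interfaces targeted at their framework.
                return audience == Audience.PUBLIC
                        || (audience == Audience.LIMITED_PRIVATE && targetedAtConsumer);
            default:
                return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isRecommended(Consumer.APPLICATION,
                Audience.PUBLIC, Stability.EVOLVING, false));
        System.out.println(isRecommended(Consumer.FRAMEWORK,
                Audience.LIMITED_PRIVATE, Stability.EVOLVING, true));
    }
}
```

Private interfaces are never recommended in this sketch, regardless of who is asking.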

The new classification system, which is derived from OpenSolaris and Yahoo's own internal system, has been in the works for the last year. It's part of Yahoo's plan to provide stronger backward compatibility, Radia said.

Details of the interface classification system can be found here.

About the Author

John K. Waters is the editor in chief of a number of Converge360.com sites, with a focus on high-end development, AI and future tech. He's been writing about cutting-edge technologies and the culture of Silicon Valley for more than two decades, and he's written more than a dozen books. He also co-scripted the documentary film Silicon Valley: A 100 Year Renaissance, which aired on PBS.