Sharing = Performance + Savings
By Neal Starkman
Think of a time-share home. The home is nice enough, but the company managing it says that you can have it for only two days one week, a day and a half the following week, and three nonconsecutive days the following month. Not only that, but once you're in the home, you can be kicked out at any time if someone “more important” wants it.
That's the way researchers used to look at how they were allotted time on their universities' supercomputers: a few hours here, a few hours there, and always knowing there was a possibility of getting bumped by someone with “more important” research.
In at least some universities, however, that's all changing. Now researchers are staying in their time-shares for months at a time — consecutively. They don't get kicked out because there are plenty of time-shares to go around. And, in fact, researchers are now planning bigger and better vacations together.
It's all because of what are called “high-performance computer clusters,” or HPCCs, which are, essentially, lots of computers — lots of computers — connected by a high-speed network. It looks something like this:
The nodes all work together and users can tap into any of them. The cluster selects the node that's most able to handle the work. The head node, or master node, is the doorman who lets you into your time-share.
Andrew Binstock, principal analyst at Pacific Data Works LLC, explains it like this (in “Multiprocessors, Clusters, Grids and Parallel Computing: What's the Difference?”): “Load-balancing clusters process heavy volumes of transactions of a similar type.” For example, “the cluster routes the incoming transaction stream to whichever node in the system is most able to handle it. … Some database clusters can divide access to a large database according to record keys, so that on a four-node system, for instance, records beginning with a first digit of 0-2 might be handled by one node, 3-5 by another, 6-7 by a third and 8-9 by the remaining node.”
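The key-range routing Binstock describes can be sketched in a few lines. This is an illustrative toy, not code from Dell or from any real database cluster; the node names and the exact digit ranges are assumptions chosen to cover the digits 0-9 across four nodes.

```python
def route_by_key(record_key: str) -> str:
    """Pick a cluster node based on the record key's first digit.

    A toy version of key-range load balancing: each node owns a
    contiguous range of leading digits (ranges here are illustrative).
    """
    first_digit = int(record_key[0])
    if first_digit <= 2:      # keys starting 0-2
        return "node-1"
    elif first_digit <= 5:    # keys starting 3-5
        return "node-2"
    elif first_digit <= 7:    # keys starting 6-7
        return "node-3"
    else:                     # keys starting 8-9
        return "node-4"

if __name__ == "__main__":
    for key in ["0481", "3377", "6025", "9194"]:
        print(key, "->", route_by_key(key))
```

Because each node owns a fixed slice of the key space, no central coordinator has to track where every record lives; the routing decision is a constant-time lookup on the key itself.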
Dell is probably the major supplier of HPCCs. As such, it works closely with people from universities and businesses around the world to determine the exact configuration that will meet their needs. The clusters are usually based on either Fast Ethernet or Gigabit Ethernet network connections with Dell PowerEdge 1850 configurations. But as Dell points out on its Web site, no one cluster can satisfy everyone's needs. The diagram below presents an example of the basic components required in a high-performance computing cluster. Each layer in the diagram illustrates several options; each path from the bottom to the top represents a common configuration of a system.
George Jones, one of Dell's business development managers, says the advantages of HPCCs are that they're easy to deploy (a typical interval between ordering a cluster and actually doing work on it is about 60 days), they're cost-effective (maybe $50,000 for 16 processors), and they lead to more and better science (more researchers have access to the computers more quickly and with more power). Jones says, “They provide tools to solve engineering problems, educate the next generation, and help universities recruit and maintain faculty.”
Here's another way to look at the cost: An IBM Regatta, which is a supercomputer-class Unix machine with 64 processors that CGG Americas used in the late 1990s, cost the company $1.5 million to $2 million. By comparison, CGG Americas paid Dell $830,000 for an initial cluster of 256 machines, or 512 processors. The difference comes out to about $31,250 per processor for the supercomputer and $1,621 per processor for the cluster.
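The per-processor figures above are simple division, using the upper end of the Regatta's price range:

```python
# Back-of-the-envelope cost comparison from the CGG Americas example.
supercomputer_cost = 2_000_000   # IBM Regatta, upper estimate
supercomputer_procs = 64
cluster_cost = 830_000           # Dell cluster: 256 dual-processor machines
cluster_procs = 512

print(supercomputer_cost / supercomputer_procs)  # 31250.0 dollars per processor
print(round(cluster_cost / cluster_procs, 2))    # 1621.09 dollars per processor
```

On a per-processor basis, the cluster works out to roughly one-twentieth the cost of the supercomputer.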
Stephen Itoga is the chair of the University of Hawaii at Manoa's Department of Information and Computer Sciences (http://www.ics.hawaii.edu). The department has a cluster from Dell — 96 nodes, meaning 192 PowerEdge 1850 processors since each node is a dual-processor. Engineers, marine biologists, meteorologists and others all set up their systems on this cluster. Itoga says, “They can do things faster and do more things.” What kinds of things? The meteorologists are feeding grids of information into the model so they can fine-tune daily weather predictions, which means that the program has to account for a vast amount of data very quickly. The marine biologists are interested in coral growth, which is sensitive to thousands of interrelated factors including El Niño. Therefore, their need is for the program to track these interrelations over time.
Itoga points out another considerable advantage of the cluster: “It may enhance the learning environment, simply because the technology is now affordable. … With a little sweat equity, some of the instructors may come up with more captivating learning environments.” Returning to our time-share analogy, once professors know that a racquetball court is available, they might take up racquetball.
This feeling is echoed by Matt Wolf, a co-director of Georgia Tech's Interactive High Performance Computing Laboratory (http://www.cc.gatech.edu/projects/ihpcl). He and his staff have been working with clusters for 10 years; today, the laboratory has hundreds of processors. Wolf says, “We can help people to think bigger, instead of doing the same things faster.” And the Interactive High Performance Computing Laboratory is a paragon of helping people to think bigger — the atmosphere is positively, well, collegial. Professors and students — several hundred in all — are constantly talking with and e-mailing each other not only to figure out how best to use the computers, but also to think about what kinds of research they could do given the enormous potential of the equipment. Physicists, chemists, mechanical engineers — everyone is involved in what is sometimes referred to as “blue-collar computing,” i.e., computing that lies somewhere between a PC on your desk and a supercomputer that encompasses thousands of nodes. Wolf calls this an “intellectual mind share.”
Another intellectual mind share of sorts is going on at the Texas Advanced Computing Center (TACC — online at http://www.tacc.utexas.edu), located at the University of Texas at Austin. There, Tommy Minyard manages the High Performance Computing Group. They have several different clusters including a 512-node cluster (1,024 processors). As at Georgia Tech, practically any research scientist associated with the university can apply for computer hours; the typical request is for about 15,000 hours. That may seem like a lot, but Minyard's group has about 8 million hours at their disposal. One million of those hours are used by the university's Institute for Computer and Engineering Sciences, but that still leaves a lot of time for biologists, physicists, chemists and engineers. They all log in to the master node and submit their jobs; a batch scheduler does the rest.
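The submit-and-wait workflow Minyard describes can be sketched as a toy first-in-first-out scheduler. Real batch schedulers used on clusters are far more sophisticated (priorities, backfilling, per-user limits); the class and method names below are hypothetical, invented only to illustrate the idea of jobs queuing on a master node until enough processors free up.

```python
from collections import deque

class BatchScheduler:
    """Toy FIFO batch scheduler: jobs request processors and wait in line."""

    def __init__(self, total_procs: int):
        self.free_procs = total_procs
        self.queue = deque()    # jobs waiting for processors
        self.running = []       # jobs currently holding processors

    def submit(self, job_name: str, procs_needed: int):
        self.queue.append((job_name, procs_needed))
        self._dispatch()

    def finish(self, job_name: str):
        # Release a finished job's processors and try to start waiting jobs.
        for job in self.running:
            if job[0] == job_name:
                self.running.remove(job)
                self.free_procs += job[1]
                break
        self._dispatch()

    def _dispatch(self):
        # Start queued jobs in order while the head of the line fits.
        while self.queue and self.queue[0][1] <= self.free_procs:
            job = self.queue.popleft()
            self.free_procs -= job[1]
            self.running.append(job)

if __name__ == "__main__":
    sched = BatchScheduler(total_procs=1024)   # e.g., a 512-node dual-processor cluster
    sched.submit("weather-model", 512)
    sched.submit("protein-folding", 512)
    sched.submit("coral-growth", 256)          # queued until processors free up
    sched.finish("weather-model")              # now coral-growth can start
```

The point of the abstraction is exactly what Minyard describes: researchers never negotiate with each other for time. They hand their job to the master node, and the scheduler decides when and where it runs.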
Minyard clearly revels in the amount of power and memory the clusters can provide scientists. “It's allowed people to run much bigger high-resolution programs,” he says. “More memory, more processors. More weather simulations. More what-if scenarios.” Again, as at Georgia Tech, scientists from many different disciplines use the cluster for various reasons: molecular modeling, protein folding, RNA structures. It was a team from Georgia Tech that was able to use HPCCs to simulate the impact of the foam coming off the space shuttle Columbia.
The 512 nodes at the Texas facility cost about $3 million. But as Minyard says, “Cost versus performance is really where clusters win.” A one-piece supercomputer would probably have cost over $10 million.
The potential for science is enormous. One of Minyard's colleagues, climatologist Charles Jackson from the University of Texas at Austin Institute for Geophysics (UTIG), works with models that predict climate change. “The earth's climate is influenced by everything from the brief periodic El Niño episodes in the Pacific to the planet's orbital wobbles over tens and hundreds of thousands of years,” he says in an article written by Merry Maisel (http://www.tacc.utexas.edu/research/users/features/climatechange.php). He and his team will be working at TACC on a machine called “Wrangler,” consisting of 128 Dell two-processor nodes with a total of 512 GB of memory and 6.6 terabytes of disk storage, 3.2 GHz EM64T Xeon chips as processors, Myrinet and InfiniBand interconnects, and a memory subsystem running at 800 MHz.
Jackson believes that his group's activities will result in an important advance: a much better understanding of the contributions and weightings of the many interacting processes represented in climate models. “We're hoping that our findings will be of benefit to the entire climate-modeling community, whether the topic is paleoclimate or current or future climate, and whether the time scales are measured in decades or centuries or longer.”
Increasingly, businesses as well as universities are relying on high-performance computing (HPC) to stay competitive. The July 2004 “High Performance Computing Users Conference: Supercharging U.S. Innovation & Competitiveness,” which drew more than 200 attendees from business, government and academia, reported the following:
- HPC tools are considered indispensable by 97% of businesses surveyed prior to the conference.
- Benefits from HPC include accelerated product development cycles and reduced time to market.
- Business obstacles, such as the inability to accurately quantify the return on investment in HPC, often inhibit more aggressive use.
- Strong partnerships are needed between government, industry and academia.
The conclusion, according to Deborah Wince-Smith, president of the Council on Competitiveness: “In today's globally competitive environment, this advanced technology is essential to business survival. A country that wishes to out-compete in any market must also be able to out-compute its rivals.”
Dell and other companies are only too ready to supply HPCCs to businesses as well as research facilities such as those at the University of Hawaii, Georgia Tech and the University of Texas. They provide educational discounts as well as warranties on their hardware, and end-users like Tommy Minyard and Matt Wolf appreciate their responsiveness. The only question Minyard, Wolf, Itoga and the others seem to have these days is, “How much better can this get?”
Time-shares have never been so accommodating.