Researchers at IBM’s Almaden research lab in California are working on a 120 petabyte hard disk array. For those of you playing along at home, that’s 120 million gigabytes spread across 200,000 hard drives. The array is being assembled for an as-yet-unnamed client, and will be used to run complex modeling simulations, like those used in meteorology.
Here are some stats to drool over: with 120 petabytes of storage, Technology Review says the array could hold a trillion files, 24 billion 5-megabyte MP3s, or 60 copies of the 150-billion-page Internet Archive Wayback Machine. The largest arrays currently available are around 15 petabytes, about an eighth the size of the IBM array. Bruce Hillsberg, the lead on the project, must be very proud of his water-cooled storage monstrosity.
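Those headline figures hold up to some quick back-of-the-envelope math. A minimal sketch, assuming decimal units (1 PB = 10^15 bytes); the bytes-per-page figure is our own inference, not something the article states:

```python
PB = 10**15                 # decimal petabyte, in bytes
total = 120 * PB            # the array's stated capacity

mp3 = 5 * 10**6             # one 5-megabyte MP3, in bytes
mp3_count = total // mp3
print(mp3_count)            # 24000000000 -- i.e. 24 billion MP3s

# 60 copies of a 150-billion-page archive implies roughly this many
# bytes per page (the per-page size itself isn't stated in the article):
pages = 150 * 10**9
per_page = total // (60 * pages)
print(per_page)             # 13333 -- about 13 KB per page, which is plausible
```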
Beyond its voluminous size, the IBM array has a few software tricks up its sleeve. One of these, the General Parallel File System (GPFS), allows super-fast indexing of massive amounts of information. With GPFS, the computers attached to the array can also read and write different parts of individual files at the same time, even though those files are split across the myriad drives. Technology Review writes that a recent GPFS demonstration indexed 10 billion files in 43 minutes, shattering the previous record of one billion files in three hours.
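The core idea behind that parallel access is striping: a file is chopped into fixed-size stripes laid out round-robin across many drives, so different stripes can be fetched from different drives simultaneously. Here's a toy illustration of the concept, not GPFS itself (real GPFS uses large block sizes and far more machinery); the stripe size and helper names are ours:

```python
from concurrent.futures import ThreadPoolExecutor

STRIPE = 4  # bytes per stripe in this toy example (GPFS uses much larger blocks)

def stripe_file(data, n_drives):
    """Split a file into stripes laid out round-robin across n drives."""
    drives = [bytearray() for _ in range(n_drives)]
    for i in range(0, len(data), STRIPE):
        drives[(i // STRIPE) % n_drives] += data[i:i + STRIPE]
    return drives

def read_stripe(drives, n_drives, k):
    """Fetch stripe k; each call can hit a different drive concurrently."""
    d = k % n_drives                  # which drive holds stripe k
    off = (k // n_drives) * STRIPE    # where it sits on that drive
    return bytes(drives[d][off:off + STRIPE])

data = b"hello, parallel file system!"
drives = stripe_file(data, n_drives=3)
n_stripes = -(-len(data) // STRIPE)   # ceiling division

# Read all stripes concurrently, then reassemble the original file.
with ThreadPoolExecutor() as pool:
    parts = list(pool.map(lambda k: read_stripe(drives, 3, k), range(n_stripes)))
print(b"".join(parts) == data)  # True
```

Because no two stripes of a read land on the same drive until the stripes wrap around, total throughput scales with the number of drives rather than being capped by any single disk.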
The massive disk array also has a unique system that accounts for the inevitable death of hard disks. From Technology Review:
When a lone disk dies, the system pulls data from other drives and writes it to the disk’s replacement slowly, so the supercomputer can continue working. If more failures occur among nearby drives, the rebuilding process speeds up to avoid the possibility that yet another failure occurs and wipes out some data permanently. Hillsberg says that the result is a system that should not lose any data for a million years without making any compromises on performance.
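The policy described above can be sketched as a simple rate function: rebuild gently after a lone failure so the supercomputer keeps its performance, and ramp up as failures cluster, shrinking the window in which one more death could destroy data. This is purely illustrative; the numbers and the doubling rule are our assumptions, as the article doesn't detail the real algorithm:

```python
def rebuild_rate(failed_in_group, base_mb_s=50, max_mb_s=400):
    """Toy rebuild policy: slow background rebuild for a lone failure,
    escalating (doubling per additional nearby failure, capped) when
    failures cluster and the risk of permanent data loss rises.
    Rates in MB/s are made-up placeholders."""
    if failed_in_group <= 1:
        return base_mb_s
    return min(max_mb_s, base_mb_s * 2 ** (failed_in_group - 1))

print(rebuild_rate(1))   # 50  -- lone failure: gentle, non-disruptive rebuild
print(rebuild_rate(3))   # 200 -- clustered failures: rebuild much faster
print(rebuild_rate(10))  # 400 -- capped at the maximum the system allows
```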
While currently ludicrous and over-the-top, Hillsberg points out that arrays like this are sure to become commonplace as the world moves toward a more cloud-based computing environment. Cloud networks demand high-capacity storage, high reliability (after all, why would you trust your precious family photos to a system that can’t guarantee their safety?), and speeds fast enough to feel natural to users. IBM’s 120 petabyte array is a monster, to be sure, but it will soon be eclipsed by more and greater machines.