Monday, June 09, 2008

Roadrunner -- Over a petaflop, and will the bombs work if Michele and George want to try them in Iran?



This is news, in a sense more interesting than Al and Betty, and more about the mind of the human animal than visceral reactions, sophistry in politics, bruised egos, offense taken, apologies given, etc.

I would not hesitate to replace Norm Coleman with a Roadrunner.

It uses the Cell chip, AMD Opteron chips, and miles of fiber cable, and it would take more electricity to run for a day than your home probably consumes in five years. That's just a guess, but in terms of calculations per watt, it's green.
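A rough check of this guess, as a minimal Python sketch. It assumes the three megawatt draw reported further down holds constant, and it assumes a typical U.S. household uses about 11,000 kWh a year (my assumption, not a figure from the reports).

```python
# Rough energy comparison: Roadrunner's reported draw vs. a household.
# ASSUMPTIONS: constant 3 MW draw (figure from the articles quoted below) and
# ~11,000 kWh/year for a typical U.S. home (not from the reports).

ROADRUNNER_KW = 3_000.0        # 3 megawatts expressed in kilowatts
HOME_KWH_PER_YEAR = 11_000.0   # assumed household consumption per year


def roadrunner_kwh(minutes: float) -> float:
    """Energy in kWh used by Roadrunner running for the given number of minutes."""
    return ROADRUNNER_KW * minutes / 60.0


def home_years(kwh: float) -> float:
    """How many years of household use the given energy represents."""
    return kwh / HOME_KWH_PER_YEAR


if __name__ == "__main__":
    for label, minutes in [("one minute", 1), ("one hour", 60), ("one day", 24 * 60)]:
        kwh = roadrunner_kwh(minutes)
        print(f"{label:>10}: {kwh:>9,.0f} kWh = {home_years(kwh):6.3f} home-years")
```

Under those assumptions one minute is 50 kWh, under two days of household use, while one day is 72,000 kWh, roughly six and a half household years.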

Here's the heart of the beast, the Cell chip:


Here's an online pic of the Roadrunner. When you reach that scale, it does not fit on a desktop; it looks like the pics from the old days.


More oomph than needed to beat Kasparov. It will be used to model the reliability of the nuclear arsenal, because the U.S. has observed a moratorium on even underground testing since 1992.

Why model the nukes at all? We built them in differing designs, power levels, neutron bombs, tactical nukes, etc., and did the underground testing to prove things worked as intended. Why would we want to model them again, after having done it once?

Because of Tritium.

The petaflop barrier first, then the Tritium situation. If you care. If not, it would be a surprise if you got this far before abandoning the post.

The online stuff on the Roadrunner and supercomputers, in the order I read it, is here, here, here, here, here, here, here, here, and elsewhere, via Google News, for example.

In summary, piecing together the reports, a petaflop is a thousand trillion (10^15) calculations per second; more specifically, one "flop" is a floating point operation. The numbers are [presumably] in the standard 64-bit floating point representation, with the leading bit a sign bit, and an "operation" is presumably a single add, subtract, or multiply, since division algorithms all require multiple "operation" steps to take two floating point numbers, dividend and divisor, and produce a floating point quotient.

It tops what was the most powerful computer in the world, the BlueGene/L. That one runs at 478.2 teraflops (trillions of calcs/second, not thousands of trillions of calcs/second), and uses 212,992 processors. But believe it or not, Roadrunner needs only 20,000 chips as the special design will utilize "both conventional Opteron processors made by AMD and the PlayStation 3's Cell processor." This project took "several years" of work by engineers from Sony, IBM and Toshiba, and the result is a machine that uses 13,000 of the PS3's Cell processors, with each one of the 8-core chips performing at speeds of 4GHz. Roadrunner's first practical task? To show up at the Los Alamos National Laboratory in New Mexico and "monitor the country's nuclear stockpile." Sounds important. It'll be housed in 288 refrigerator-sized cases and linked together with 57 miles of fibre-optic cable. Oh, and while we all know the PS3 consumes a giant amount of power, that pales in comparison to the three megawatts of power Roadrunner is going to need. And here's the final set of statistics that will make your head swim, as offered by the administrator of the National Nuclear Security Administration, Thomas P. D'Agostino: If everybody on earth (that's about 6 billion people) used a regular ol' calculator 24 hours a day, 7 days a week, 365 days a year, it would take them 46 years to perform the calculations Roadrunner could manage in one day.
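Two things above can be checked in a few lines of Python: what the sign bit of a 64-bit float looks like, and the arithmetic behind the calculator comparison. This sketch assumes the 64-bit format meant is IEEE 754 double precision, and it treats Roadrunner as sustaining exactly one petaflop for 24 hours (an assumption; the quote doesn't say). The function names are my own, not from any of the articles.

```python
import struct

PETA = 10**15   # one petaflop: 10^15 floating point operations per second
TERA = 10**12   # one teraflop: 10^12 floating point operations per second


def ieee754_fields(x: float) -> tuple[int, int, int]:
    """Split a 64-bit IEEE 754 double into its sign, exponent, and fraction fields."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))  # raw 64 bits as an integer
    sign = bits >> 63                   # leading bit: 1 means negative
    exponent = (bits >> 52) & 0x7FF     # next 11 bits, stored with a bias of 1023
    fraction = bits & ((1 << 52) - 1)   # last 52 bits (implied leading 1 for normal numbers)
    return sign, exponent, fraction


def implied_calcs_per_second(people: float, years: float, machine_flops: float) -> float:
    """Calculator rate each person needs so that `people` working nonstop for
    `years` (365-day years, as quoted) match `machine_flops` sustained for one day."""
    seconds_per_day = 86_400
    machine_ops = machine_flops * seconds_per_day
    person_seconds = people * years * 365 * seconds_per_day
    return machine_ops / person_seconds


if __name__ == "__main__":
    print(ieee754_fields(-2.5))
    print(PETA / (478.2 * TERA))
    print(implied_calcs_per_second(6e9, 46, PETA))
```

ieee754_fields(-2.5) returns (1, 1024, 1125899906842624): sign bit 1 for negative, a biased exponent of 1024 (that is, 2^1), and a fraction field holding the .25 of 1.25. The petaflop is about 2.09 times BlueGene/L's 478.2 teraflops. The calculator rate comes out a little under 10, so the quoted comparison works if each person does about ten operations a second; at one per second it would take more like 457 years.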


That's from the first link, above, and other links add:

The Livermore Blue Gene/L system has held the title of fastest supercomputer since November 2004, when it made the No. 1 spot on the Top 500 Supercomputer Sites list. Following several expansions, that system was listed at 478.2 trillion floating operations per second on the most recent Top 500 list. The Top 500 list is compiled by Erich Strohmaier and Horst Simon of NERSC/Lawrence Berkeley National Laboratory, Hans Meuer of the University of Mannheim, Germany, and Jack Dongarra of the University of Tennessee, Knoxville. The RoadRunner is the result of six years of work by Los Alamos researchers and IBM. Fremont-based Panasas Inc. provided the storage system that makes operations of such magnitude possible.
Its speed is an important milestone in high-performance computing, which is now used for aircraft design, oil exploration, financial forecasting, biotechnology, and other applications. [... The petaflop barrier was a sought-after milestone for years, and once installed as scheduled in August at Los Alamos National Lab, it is expected to be capable of 1.5 petaflop operation.] After being loaded onto 21 tractor trailer trucks and shipped from New York to Los Alamos in New Mexico, Roadrunner will perform at speeds equivalent to 100,000 laptops combined. [According to John Morrison, leader of Los Alamos's high performance computing division, simulation complexity talked about for a number of years in the high-performance computing industry will be possible with Roadrunner. Besides testing ongoing nuclear weapon reliability, pharmaceutical and other complex molecular modeling [protein folding, for example] will be feasible in ways previously unattainable, and the Wall Street quants can have a go at more complex models than at present. My guess, though, is that with greed built into human nature, especially Wall Street human nature, no degree of modeling skill or number crunching capability is likely to curb bubbles and cycles.] Roadrunner cost about $100 million and combines 6,948 dual-core AMD Opteron chips and 12,960 Cell engines, all housed in IBM blade servers [the big boxes holding multiple processing cards, each card called a "blade"]. Eighty terabytes of memory are kept in 288 "refrigerator-sized" racks occupying 6,000 square feet [it being unclear whether the processing part or the memory part will be the bigger consumer of floor space and power, and I saw no mention of secondary storage or of I/O to more conventional machines delivering simulation images and other output to users]. Roadrunner weighs 500,000 pounds and has 10,000 Infiniband and Gigabit Ethernet connections requiring 57 miles of fiber optic cable.
[So off-the-shelf InfiniBand and Ethernet networking runs between cards, and presumably handles I/O to the host computer interface(s). The following is not fully clear to me, but...] IBM built 3,456 "tri-blades," each consisting of two IBM QS22 blade servers using Cell engines and one LS21 blade server based on AMD chips. "Standard processing (e.g., file system I/O) is handled by the Opteron processors [while] mathematically and CPU-intensive elements are directed to the Cell processors," IBM states in a press release. "Each tri-blade unit can run at 400 billion operations per second." Linux. What else did you expect? "Roadrunner uses open source Linux software from Red Hat and is more efficient than most supercomputers, delivering 376 million calculations per watt, according to IBM. That should be enough to place Roadrunner among the most energy-efficient systems on the Green 500 list coming out later this month, IBM says."
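The quoted figures can be cross-checked against each other with simple multiplication. This sketch uses only numbers reported in the articles; applying the 376 million per watt at the full three megawatts is my assumption, since the reports don't say at what draw that efficiency was measured.

```python
# Every input below is a figure quoted in the articles; only the arithmetic is added.
TRI_BLADES = 3_456            # "IBM built 3,456 tri-blades"
OPS_PER_TRI_BLADE = 400e9     # "400 billion operations per second" per tri-blade
FLOPS_PER_WATT = 376e6        # "376 million calculations per watt" (per second, per watt)
POWER_WATTS = 3e6             # "three megawatts" (assumed to be the relevant draw)
LAPTOPS = 100_000             # "equivalent to 100,000 laptops combined"


def aggregate_tri_blade_flops() -> float:
    """Machine rate implied by tri-blade count times the per-tri-blade rate."""
    return TRI_BLADES * OPS_PER_TRI_BLADE


def flops_from_efficiency(watts: float = POWER_WATTS) -> float:
    """Machine rate implied by the per-watt efficiency at a given power draw."""
    return FLOPS_PER_WATT * watts


def implied_laptop_flops(machine_flops: float) -> float:
    """Per-laptop rate implied by the '100,000 laptops' comparison."""
    return machine_flops / LAPTOPS


if __name__ == "__main__":
    print(f"tri-blades: {aggregate_tri_blade_flops() / 1e15:.2f} petaflops")
    print(f"efficiency: {flops_from_efficiency() / 1e15:.2f} petaflops")
    print(f"per laptop: {implied_laptop_flops(1e15) / 1e9:.0f} gigaflops")
```

The tri-blades multiply out to about 1.38 petaflops, near the 1.5 petaflop expectation quoted earlier; the efficiency figure at three megawatts gives about 1.13 petaflops; and 100,000 laptops matching one petaflop implies about 10 gigaflops per laptop.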

[If you recall the $100 million price mentioned above, note that $133 million was also reported. One article indicated IBM "is working on another Petaflop computer called Blue Gene/P." Intel and Cray are working on an alternate architecture.] The top 10 supercomputers today are reported in one item as:

1. IBM Roadrunner
2. IBM Blue Gene
3. NEC Earth Simulator
4. IBM ASCI White
5. Intel ASCI Red
6. Hitachi CP-PACS
7. Hitachi SR2201
8. Fujitsu Numerical Wind Tunnel - II
9. Intel Paragon XP/S140
10. Fujitsu Numerical Wind Tunnel - I

[One report noted that the catch with Moore's law, with device performance rising exponentially, is that power consumption varies roughly as the cube of clock speed; hence doubling the clock speed consumes about eight times the power, and at the current 65 nanometer feature size, heat dissipation imposes limits. Hence the move to parallel processing. Los Alamos personnel are credited with the design management, with IBM and other vendors playing a role. The Cell chip is optimized for floating point operation; the one pictured has eight processing "cells" (IBM calls them synergistic processing elements, or SPEs), with the remaining on-chip circuitry being a PowerPC control core, memory control, and I/O. Each of those eight is a "flophouse" in a sense; Google "cell chip" for specifics.]
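The cube law is usually explained with the textbook dynamic-power model for CMOS chips, P = C * V^2 * f; if the supply voltage V has to rise in step with the clock frequency f, power grows as f^3. A minimal sketch under those idealized assumptions (leakage ignored, work that parallelizes perfectly) compares the two ways to double throughput.

```python
# Idealized dynamic-power model behind the "cube of clock speed" remark.
# ASSUMPTIONS: P = C * V**2 * f per core, supply voltage V scaled in
# proportion to clock frequency f, leakage ignored, perfectly parallel work.


def relative_power(freq_scale: float, cores: int = 1) -> float:
    """Power relative to one core at base frequency.

    With V proportional to f, C * V**2 * f grows as f**3 for each core.
    """
    return cores * freq_scale ** 3


def relative_throughput(freq_scale: float, cores: int = 1) -> float:
    """Work per second relative to one core at base frequency."""
    return cores * freq_scale


if __name__ == "__main__":
    options = {
        "one core, double clock": (2.0, 1),
        "two cores, same clock": (1.0, 2),
    }
    for name, (freq, cores) in options.items():
        print(f"{name}: throughput x{relative_throughput(freq, cores):.0f}, "
              f"power x{relative_power(freq, cores):.0f}")
```

Doubling the clock of one core gives twice the throughput at eight times the power; two cores at the original clock give the same doubled throughput at twice the power, which is the argument for going parallel.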


An arguably interesting site, http://www.top500.org/, will link you to info on the top computers in Russia, China, Europe, and North America, and it has a list of top ten systems that you can compare to the above list, from a different site:

TOP 10 Systems
1 BlueGene/L - eServer Blue Gene Solution, IBM
2 JUGENE - Blue Gene/P Solution, IBM
3 SGI Altix ICE 8200, Xeon quad core 3.0 GHz, SGI
4 Cluster Platform 3000 BL460c, Xeon 53xx 3GHz, Infiniband, Hewlett-Packard
5 Cluster Platform 3000 BL460c, Xeon 53xx 2.66GHz, Infiniband, Hewlett-Packard
6 Red Storm - Sandia/ Cray Red Storm, Opteron 2.4 GHz dual core, Cray Inc.
7 Jaguar - Cray XT4/XT3, Cray Inc.
8 BGW - eServer Blue Gene Solution, IBM
9 Franklin - Cray XT4, 2.6 GHz, Cray Inc.
10 New York Blue - eServer Blue Gene Solution, IBM

Each list must be true, or it would not be on the Internet.

Add to that, from here: apparently the fourth fastest system, and the top system in Asia, is in India, not China or Japan. True again, since it's on the Net: "An interesting fact to be proud of: India's EKA (the Sanskrit name for number one) supercomputer is ranked as the 4th fastest in the world and is the fastest supercomputer in Asia, according to the Top 500 Supercomputer list announced at SC07, the International Conference for High Performance Computing, Networking, Storage and Analysis at Reno, Nevada, USA." I wonder how this differs from listing the top ten restaurants in the Twin Cities. Each list is true.

That's as far as I went on the hardware. On the purpose, bomb modeling, and tritium, there is a transitional DOE link here, and the other links I accessed are here, here, here, and here. The "executive summary" is that they don't know the half-life of tritium for certain and experiments that way are ongoing, or they know but for national security reasons are only giving an approximate number. Wikipedia cites the NIST figure, 4500±8 days (approximately 12.32 years). That is an uncertainty in the fourth significant digit, about 0.2%, and while it arguably is small, all the modeling quality depends on the decay of tritium to helium-3.

The device is a conventional enriched uranium or plutonium core, a tritium-deuterium shell, and an outer shell of unenriched uranium. The inner two layers produce a large flux of high-energy [high-speed] neutrons, which will fission the common uranium isotope, U-238, which does not fission with ordinary low-energy "thermal" neutrons. And it all has to happen in a fraction of an eyeblink before the entire mess comes apart, destroying much in its proximity, with the design aim to maximize the bang for the buck, in a manner of speaking: to maximize the yield by varying the triggering, layer configurations, etc. Figure that a maximized device, after about 12 years, has half as much tritium as when made, and you can see how you might get a largely sub-optimal bang, perhaps even for some designs a failure to trigger the larger outer core explosion [or, depending on engineering learning, perhaps the outer layer is the trigger and the massive amount of regular uranium is at the center. I have never designed one nor had any designer route plans through me to the Israelis or Chinese, so I can only guess].
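The half-life arithmetic is the standard exponential decay law, N/N0 = 2^(-t/T), with T the half-life. This sketch uses the NIST 4500±8 day figure quoted above and converts years to days with a 365.25-day year (my assumption); it shows the fraction of tritium remaining and how far the ±8 days moves it.

```python
# Fraction of tritium remaining after t years: N/N0 = 2**(-t / T_half).
HALF_LIFE_DAYS = 4500.0           # NIST value quoted above
HALF_LIFE_UNCERTAINTY_DAYS = 8.0  # the quoted +/- 8 days
DAYS_PER_YEAR = 365.25            # assumption: Julian year for the conversion


def tritium_fraction(years: float, half_life_days: float = HALF_LIFE_DAYS) -> float:
    """Fraction of the original tritium left after `years` of decay."""
    return 2.0 ** (-(years * DAYS_PER_YEAR) / half_life_days)


def fraction_bounds(years: float) -> tuple[float, float]:
    """Fraction left at the short and long ends of the half-life range.

    A shorter half-life means faster decay, so it gives the smaller fraction.
    """
    short_hl = tritium_fraction(years, HALF_LIFE_DAYS - HALF_LIFE_UNCERTAINTY_DAYS)
    long_hl = tritium_fraction(years, HALF_LIFE_DAYS + HALF_LIFE_UNCERTAINTY_DAYS)
    return short_hl, long_hl


if __name__ == "__main__":
    for years in (1, 5, 12, 12.32, 20):
        low, high = fraction_bounds(years)
        print(f"{years:>5} yr: {tritium_fraction(years):.4f} remaining "
              f"(range {low:.4f} to {high:.4f})")
```

After one year about 94.5 percent remains (roughly 5.5 percent decays each year), and at 12.32 years the fraction is essentially one half.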

So, if Michele Bachmann and George Bush, during lame-duck days, want to October surprise us and the Iranians, they have to know whether it will be "pop" or "BANG."

So we need a Roadrunner to help those two poor souls.

***

They don't say, but Roadrunner probably would have picked Big Brown to win the Triple Crown. And the Patriots to win the last Super Bowl. Or us to have accomplished the mission, whatever it was supposed to be Iraq-wise [ask Condi, ask Rummy], by May 1, 2003.


No matter how much number crunching a machine (hardware plus software) can do, the quality of the model is a limit, and good models require judgment and a limitation to what is needed: all that is needed to avoid over-simplification, but no more. Not easy. Not intuitive, when you get to astrophysics, particle physics, or trading assets on a globalized basis against other models on other platforms of varying number crunching capability. Game playing systems and trading systems are frequently modeled because the quality of a model has a good benchmark surrogate: is the game won or lost frequently against other models, and is mock trading profitable or a loss against other systems, or against the market?

Now, a closing homework assignment: contemplate how more crunching might affect genome research, and how machines such as Roadrunner might help the NSA monitor your phone call, library, medical, and Internet records, yours and mine, while we remember that the NSA minions, happy with teraflops to watch things, must totally love the petaflop capability, and hence are a proven bunch of petaphiles. Put those names in a registry and watch them more carefully; it's politically correct to do that if they're petaphiles. Or did I make a linguistic mistake somewhere?