As a long-time member of the HPC community, I have seen a few orders of magnitude of performance gains in my career. In the early 90s, when the Top500.org list was starting, performance was measured in Gflops. In the late 90s, with the introduction of Sandia's ASCI Red, we hit the Teraflop, and by the mid-2000s I was on the review committee approving LANL's Roadrunner (first to a Petaflop). Having a front row seat to leading-edge supercomputing during this time was fascinating. I witnessed the enormous effort expended to take a design from white board to system implementation and the huge lift necessary to port and certify applications. I can literally say I have seen and been involved in billions of dollars' worth of HPC procurements and code modernization efforts over the years.
In the early 90s, as we were starting with parallel computing, I remember a dressing down one of the NERSC physicists gave a superkid (1). The superkid had dared to suggest the future would involve systems with 100K+ processing elements. The physicist was adamant that this was not the direction HPC was going, that it was not feasible, and that it was an absurd thing to even speculate on. Now that systems are being built with millions of processing elements, I am tempted to look for that scientist and have a chat.
Parallelism is, in fact, the trend most responsible for performance growth, as HPC has grown at roughly 2X Moore’s Law for much of my career. The rapid growth in parallel computing forced us to learn and invent methods to harness the capability. As we succeeded, our appetite grew along with the number of cores in the systems we deployed. Terms such as scale-up vs scale-out and strong scaling vs weak scaling came into common practice as we progressed. The legendary "hero programmer" was (and remains) very real, with huge efforts expended to prove out technology in architecture and to harness the compute power for applications. These successes, in turn, helped justify funds to push the edge of capability.
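For readers less familiar with the strong vs weak scaling distinction, here is a minimal sketch using the two textbook models usually attached to those terms: Amdahl's law (fixed problem size, strong scaling) and Gustafson's law (problem size grows with the machine, weak scaling). The parallel fraction of 0.999 is an illustrative assumption of mine, not a measurement from any system mentioned here.

```python
def amdahl_speedup(parallel_fraction: float, n: int) -> float:
    """Strong scaling: fixed problem size spread over n processing elements."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n)


def gustafson_speedup(parallel_fraction: float, n: int) -> float:
    """Weak scaling: the parallel part of the work grows with n."""
    serial_fraction = 1.0 - parallel_fraction
    return serial_fraction + parallel_fraction * n


# Assumed, purely illustrative parallel fraction.
P = 0.999

for n in (1_000, 100_000, 1_000_000):
    print(f"n={n:>9,}  strong: {amdahl_speedup(P, n):10.1f}x  "
          f"weak: {gustafson_speedup(P, n):12.1f}x")
```

Under this assumption the strong-scaling speedup can never exceed 1000x no matter how many processing elements are added, while the weak-scaling speedup keeps growing with the machine. That is one way to see why larger systems and larger problems have tended to go together.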
There have been some notable “step functions” in the past 30 years, where a system (both hardware and software) takes such a leap forward that it sits in the number 1 spot on the Top500 for several iterations. One of these was NEC/Japan’s Earth Simulator at 36 TFlops with big vectors (which, at the time, I would describe to people as equivalent to one flop for every gallon of water in Lake Tahoe). BlueGene/L was an architectural push in the parallel direction, with over 128,000 low-powered processors. It held the top spot at 280 TFlops for several generations of the Top500 list. China's Tianhe-2 system, the next step function, came in at 34 petaflops. This is between 5 and 6 operations for every gallon of water in all the Great Lakes, which hold 6 quadrillion (6x10^15) gallons.
This Sunday is 10/18, which gives us a reason to focus on exascale, or 10^18 flops, the next performance target the community is focused on achieving. To continue the water analogy, the Mediterranean Sea consists of 1.1x10^18 gallons of water. The Supercomputer Fugaku, at the RIKEN Center for Computational Science in Kobe, Japan, looks to be a step function system in many ways. It has lifted the current performance chart by a good amount despite being about half an exaflop in size. With new architectural features such as High Bandwidth Memory (HBM) and an on-chip vector implementation (2x512 Scalable Vector Extension), the system is the first described as an “Exascale Architecture”.
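The water analogy is simple division, and it can be checked in a few lines. The sketch below uses only the figures quoted in this post (34 petaflops, 6x10^15 gallons, 1.1x10^18 gallons, and "about half an exaflop"); the helper name `flops_per_gallon` is mine and purely illustrative.

```python
def flops_per_gallon(flops: float, gallons: float) -> float:
    """Operations per second available for each gallon of water."""
    return flops / gallons


# Water volumes as quoted in the text (gallons).
GREAT_LAKES = 6e15
MEDITERRANEAN = 1.1e18

# System performance as quoted in the text (flops).
TIANHE_2 = 34e15          # 34 petaflops
HALF_EXAFLOP = 0.5e18     # "about half an exaflop" (Fugaku, approximate)
EXAFLOP = 1e18            # the exascale target

comparisons = [
    ("Tianhe-2 vs Great Lakes", TIANHE_2, GREAT_LAKES),
    ("Half an exaflop vs Mediterranean", HALF_EXAFLOP, MEDITERRANEAN),
    ("One exaflop vs Mediterranean", EXAFLOP, MEDITERRANEAN),
]

for label, flops, gallons in comparisons:
    print(f"{label}: {flops_per_gallon(flops, gallons):.2f} flops per gallon")
```

By these figures, a full exaflop works out to a little under one operation per second for every gallon in the Mediterranean.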
Consistent with advancements in the past, visionaries in the community see great potential for advancement in human knowledge with this latest step up. For much of my career, I have been able to directly see the excitement growing among scientists who want access to the latest technology. It is still the case that the science excites and the road ahead seems difficult and steep. With great strides in capability in the cloud, and the availability of Arm-IP for the community, I am witness to an elevated level of cloud HPC activity. And we're seeing an increasingly diverse set of cloud HPC workloads and an international community of players.
Fundamentally, advancement in science and in understanding our world and our impact on it has never been more important to humanity. This year has seen such devastation and disruption in our lives, more so than at any time in the past. With the pandemic, extreme weather and fire, we are reminded of this daily. The COVID-19 HPC Consortium, which has banded together to help fight the pandemic, has been an amazing example of the community coming together to address a common enemy. Exascale capability would certainly aid in the fight.
I’ve been spending some time thinking about weather modeling and prediction. The state of the art in prediction today is quite impressive, as evidenced by the recent 3.5-day landfall prediction for Hurricane Laura on the US Gulf Coast being accurate to within a mile. The implication is that emergency planning has both more time and more precision, saving lives and protecting property. Another impressive example from current events is the team at SDSC with their Firemap application, which they use to predict wildfire paths. This helps emergency responders know where to deploy scarce resources to protect life and property. These are but a few examples of the dramatic impact HPC is having on our everyday lives.
Looking forward to the step function promise of exascale, I am drawn to the topic of climate change. I am fortunate to be connected to some of the world’s leading climate scientists in both the US and the EU. Their applications can take advantage of leadership class systems today and they look forward to additional capability. These are problems that are critical to our environment and thus our long-term existence on the planet. Recently, as I spoke with them about Supercomputer Fugaku and the coming exascale era, I was smitten with their enthusiasm and their thirst for knowledge. A clear desire for exascale exists in these groups. With it, they can make real advancements in science and help answer key questions to guide policy on environmental issues. Their knowledge of Fugaku was strong: an Arm architecture from Fujitsu featuring the first implementation of High-Bandwidth Memory (HBM, with 1 TB/s memory bandwidth) and the Scalable Vector Extension (SVE). This combination endows a CPU-only architecture with capabilities found only in recent GPUs. With a simpler programming interface, this represents an interesting path forward. This technology is also available today from HPE as the Apollo-80.
In closing, I want to say that the race to exascale is alive and well. Which country or organization gets there first is yet to be determined, but the need and desire to advance knowledge is strong. I am looking forward to being involved and helping users in the new era.