VERTEX Main Page | Challenges | Architecture | Open VERTEX 1.0 | VERTEX Services |
Challenges in Today's Hybrid HPC Systems
The availability of new compute-intensive processors, or accelerators, such as SIMD units, GPGPUs, and many-core CPUs, has enabled hybrid HPC systems in which many different instruction set architectures coexist in a single cluster. These hybrid HPC systems provide substantial cost-performance advantages over traditional, homogeneous HPC clusters, but they also present significant new challenges:
The new compute-intensive processors are not designed to perform IO and support
functions well, so they depend on traditional CPUs for those functions. Moreover,
the accelerators or compute-intensive processors usually need to sit on separate
cards or blades from those that host the CPU because of power, memory, and bandwidth
limitations. Determining which processor does what, and how the processors communicate
from an OS, IO, and systems perspective, is not simple.
The CPU-based IO and support nodes clearly need a full-function operating
system to handle the system complexity. But running a full-function OS on the
accelerator or compute nodes distracts them from their primary computing task,
degrading compute node performance (OS jitter).
Existing HPC cluster software architectures expose run-time management of platform
heterogeneity (hybrid nodes) to the end user. This causes serious usability problems
for systems that deploy a variety of instruction set architectures and processors
in a single, large platform environment.
In addition to these difficulties associated with heterogeneity, there are several scalability-related issues when an HPC application demands the use of hundreds, thousands, or even millions of processing cores:
A large number of nodes demanding bandwidth makes the IO subsystem inefficient, unscalable,
and unreliable.
Full-function operating systems on compute nodes introduce more OS jitter as
the system size increases, reducing overall performance and making the application
unscalable.
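
As a rough illustration of why OS jitter becomes more damaging as the system grows, consider a simplified model that is an assumption made here for illustration and is not taken from the VERTEX design: the application runs in barrier-synchronized steps, each node is independently interrupted by its OS during a step with a small probability, and a single interrupted node delays the whole step. The function names and parameters below are hypothetical.

```python
def prob_step_delayed(p_node: float, nodes: int) -> float:
    """Probability that at least one of `nodes` independent nodes is
    interrupted during a step, given per-node probability `p_node`."""
    return 1.0 - (1.0 - p_node) ** nodes


def expected_slowdown(p_node: float, nodes: int,
                      step_time: float, jitter_delay: float) -> float:
    """Expected step time divided by the jitter-free step time.

    Assumes (for illustration only) that any interruption anywhere in the
    system stretches the step by a fixed `jitter_delay`.
    """
    delayed = prob_step_delayed(p_node, nodes)
    return (step_time + delayed * jitter_delay) / step_time


if __name__ == "__main__":
    # Hypothetical numbers: 0.1% chance of an OS interruption per node per step,
    # 1 ms of useful work per step, 0.5 ms lost when an interruption occurs.
    for n in (1, 100, 10_000, 1_000_000):
        p = prob_step_delayed(0.001, n)
        s = expected_slowdown(0.001, n, step_time=1.0, jitter_delay=0.5)
        print(f"nodes={n:>9}  P(step delayed)={p:.4f}  slowdown={s:.3f}x")
```

Under these assumptions, an interruption that is rare on any single node becomes nearly certain to hit some node in every step once the node count is large, which is one way to read the claim that jitter limits scalability.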