Friday, 17th September 2021

NVIDIA and partners launch HGX A100 systems

NVIDIA is turbocharging the NVIDIA HGX™ AI supercomputing platform with new technologies that fuse AI with high performance computing, making supercomputing more useful to a growing number of industries.

To accelerate the new era of industrial AI and HPC, NVIDIA has added three key technologies to its HGX platform: the NVIDIA® A100 80GB PCIe GPU, NVIDIA NDR 400G InfiniBand networking, and NVIDIA Magnum IO™ GPUDirect™ Storage software. Together, they provide the extreme performance to enable industrial HPC innovation.

Atos, Dell Technologies, Hewlett Packard Enterprise (HPE), Lenovo, Microsoft Azure and NetApp are among dozens of partners using the NVIDIA HGX platform for next-generation systems and solutions.

“The HPC revolution started in academia and is rapidly extending across a broad range of industries,” said Jensen Huang, founder and CEO of NVIDIA. “Key dynamics are driving super-exponential, super-Moore’s law advances that have made HPC a useful tool for industries. NVIDIA’s HGX platform gives researchers unparalleled high performance computing acceleration to tackle the toughest problems industries face.”

Industry Leaders Use HGX Platform to Power Innovation Breakthroughs

The HGX platform is being used by high-tech industrial pioneer General Electric, applying HPC innovation to computational fluid dynamics simulations that guide design innovation in large gas turbines and jet engines. The HGX platform has achieved order-of-magnitude acceleration for breakthrough CFD methods in GE’s GENESIS code, which employs Large Eddy Simulations to study the effects of turbulent flows inside turbines that are composed of hundreds of individual blades and require uniquely complex geometry.

Besides driving industrial HPC transformation, the HGX platform is also accelerating scientific HPC systems around the world, including the next-generation supercomputer at the University of Edinburgh, also announced today.

NVIDIA A100 80GB PCIe Performance Enhancements for AI and HPC

NVIDIA A100 Tensor Core GPUs deliver unprecedented HPC acceleration to solve complex AI, data analytics, model training and simulation challenges relevant to industrial HPC. A100 80GB PCIe GPUs increase GPU memory bandwidth 25 percent compared with the A100 40GB, to 2TB/s, and provide 80GB of HBM2e high-bandwidth memory.
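
As a quick arithmetic check on these figures, the short Python sketch below works backwards from the stated 2TB/s and 25 percent uplift to the bandwidth they imply for the A100 40GB, and estimates how long it takes to read the full 80GB once at peak rate. It assumes decimal units (1TB = 10^12 bytes) and treats 2TB/s as exact even though it is a rounded figure, so the results are approximations rather than specifications.

```python
# Back-of-envelope figures taken from the paragraph above.
# Decimal units are assumed: 1 GB = 1e9 bytes, 1 TB/s = 1e12 bytes per second.
A100_80GB_BW_TBPS = 2.0     # stated bandwidth of A100 80GB PCIe (rounded)
UPLIFT = 1.25               # stated 25 percent increase over A100 40GB
A100_80GB_CAPACITY_GB = 80  # stated HBM2e capacity


def implied_baseline_bw(new_bw_tbps: float, uplift: float) -> float:
    """Bandwidth of the older part implied by a percentage uplift."""
    return new_bw_tbps / uplift


def full_sweep_ms(capacity_gb: float, bw_tbps: float) -> float:
    """Milliseconds needed to read every byte of memory once at peak bandwidth."""
    return capacity_gb * 1e9 / (bw_tbps * 1e12) * 1e3


if __name__ == "__main__":
    base = implied_baseline_bw(A100_80GB_BW_TBPS, UPLIFT)
    sweep = full_sweep_ms(A100_80GB_CAPACITY_GB, A100_80GB_BW_TBPS)
    print(f"implied A100 40GB bandwidth: {base:.2f} TB/s")  # 1.60 TB/s
    print(f"time to stream 80GB once: {sweep:.0f} ms")      # 40 ms
```

Reading the whole 80GB in roughly 40 ms is one way to picture what the bandwidth figure means for a memory-bound kernel.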

The A100 80GB PCIe’s large memory capacity and high memory bandwidth allow more data and larger neural networks to be held in memory, minimizing internode communication and energy consumption. With less data moved between nodes and faster access to the data held locally, researchers can achieve higher throughput and faster results, maximizing the value of their IT investments.
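
To give a rough sense of what holding larger neural networks in memory means, the sketch below computes an upper bound on how many model weights fit in 40GB versus 80GB at common numeric precisions. It is deliberately simplistic: it counts weights only and ignores activations, optimizer state and framework overhead, so models that actually fit in practice will be considerably smaller.

```python
# Bytes needed to store one parameter in common numeric formats.
BYTES_PER_PARAM = {"fp32": 4, "fp16/bf16": 2, "int8": 1}


def max_params_billions(capacity_gb: float, bytes_per_param: int) -> float:
    """Upper bound, in billions, on parameters that fit in `capacity_gb`.

    With 1 GB = 1e9 bytes, gigabytes divided by bytes-per-parameter gives
    billions of parameters directly. Weights only: activations, optimizer
    state and workspace memory are ignored.
    """
    return capacity_gb / bytes_per_param


if __name__ == "__main__":
    for fmt, nbytes in BYTES_PER_PARAM.items():
        small = max_params_billions(40, nbytes)
        large = max_params_billions(80, nbytes)
        print(f"{fmt:>9}: 40GB -> {small:g}B params, 80GB -> {large:g}B params")
```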

A100 80GB PCIe is powered by the NVIDIA Ampere architecture, which features Multi-Instance GPU (MIG) technology to deliver acceleration for smaller workloads such as AI inference. MIG partitions a single A100 into as many as seven isolated GPU instances, each with its own compute and memory, allowing HPC systems to scale resources down with guaranteed quality of service. In addition to the PCIe form factor, A100 is available in four- and eight-way NVIDIA HGX A100 configurations.
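
MIG partitioning can be pictured as a resource budget. The sketch below uses the `<compute slices>g.<memory>gb` naming of A100 80GB GPU-instance profiles and the A100's split into 7 compute slices and 8 memory slices, but the `fits` function is a hypothetical helper, not part of any NVIDIA tool. It compares totals only; the driver also enforces placement rules, so passing this check is necessary but not sufficient. On real systems, instances are created with the `nvidia-smi mig` subcommand.

```python
# Simplified model of MIG partitioning on an A100 80GB GPU.
# profile name -> (compute slices used, memory slices used); one memory slice is 10GB.
PROFILES = {
    "1g.10gb": (1, 1),
    "2g.20gb": (2, 2),
    "3g.40gb": (3, 4),
    "4g.40gb": (4, 4),
    "7g.80gb": (7, 8),
}
TOTAL_COMPUTE_SLICES = 7
TOTAL_MEMORY_SLICES = 8


def fits(plan: list[str]) -> bool:
    """Return True if the requested instances fit within the GPU's slice budget.

    Only the totals are compared. The real driver also enforces placement
    rules, so passing this check is necessary but not sufficient.
    """
    compute = sum(PROFILES[name][0] for name in plan)
    memory = sum(PROFILES[name][1] for name in plan)
    return compute <= TOTAL_COMPUTE_SLICES and memory <= TOTAL_MEMORY_SLICES


if __name__ == "__main__":
    plans = [
        ["1g.10gb"] * 7,         # seven small instances, e.g. for inference
        ["3g.40gb", "4g.40gb"],  # two medium instances
        ["4g.40gb", "4g.40gb"],  # needs 8 compute slices: too many
    ]
    for plan in plans:
        print(plan, "fits" if fits(plan) else "does not fit")
```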

NVIDIA partner support for the A100 80GB PCIe includes Atos, Cisco, Dell Technologies, Fujitsu, H3C, HPE, Inspur, Lenovo, Penguin Computing, QCT and Supermicro. The HGX platform featuring A100-based GPUs interconnected via NVLink is also available via cloud services from Amazon Web Services, Microsoft Azure and Oracle Cloud Infrastructure.

Next-Generation NDR 400Gb/s InfiniBand Switch Systems

HPC systems that require unparalleled data throughput are supercharged by NVIDIA InfiniBand, the world’s only fully offloadable in-network computing interconnect. NDR InfiniBand scales performance to tackle the massive challenges in industrial and scientific HPC systems. The NVIDIA Quantum™-2 fixed-configuration switch systems deliver 64 ports of NDR 400Gb/s InfiniBand (or 128 ports of NDR200), providing 3x higher port density versus HDR InfiniBand.
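
The two port configurations above describe the same switching capacity divided into different port speeds, which the arithmetic below makes explicit. The 40-port figure for the previous-generation HDR switch is an assumption not stated in this article; with it, comparing 128 ports at 200Gb/s against 40 HDR ports at the same speed is one reading that is consistent with the 3x port-density claim.

```python
def switch_capacity_tbps(ports: int, gbps_per_port: int) -> float:
    """Aggregate capacity of a switch in one direction, in Tb/s."""
    return ports * gbps_per_port / 1000


ndr = switch_capacity_tbps(64, 400)      # 64 ports of NDR 400Gb/s
ndr200 = switch_capacity_tbps(128, 200)  # the same switch split into NDR200 ports

# Assumption (not from the article): previous-generation HDR switch with 40 ports of 200Gb/s.
HDR_PORTS = 40
density_gain = 128 / HDR_PORTS  # 200Gb/s ports per switch, NDR200 vs HDR

if __name__ == "__main__":
    print(f"NDR: {ndr} Tb/s, NDR200: {ndr200} Tb/s")
    print(f"200Gb/s port density gain over HDR: {density_gain:.1f}x")
```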

The NVIDIA Quantum-2 modular switches provide scalable port configurations up to 2,048 ports of NDR 400Gb/s InfiniBand (or 4,096 ports of NDR200) with a total bidirectional throughput of 1.64 petabits per second, 5x that of the previous generation. The 2,048-port switch provides 6.5x greater scalability than the previous generation, with the ability to connect more than a million nodes in just three hops using a DragonFly+ network topology.
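
Two of the figures above can be reproduced with simple arithmetic. Counting both directions of every port, 2,048 ports at 400Gb/s (or 4,096 at 200Gb/s) gives the quoted 1.64Pb/s. The scalability figures are consistent with the standard maximum-size DragonFly+ construction, assuming switches with 64 NDR ports in place of the previous generation's 40 HDR ports. The article does not say how the 6.5x and one-million-node figures were derived, so the sizing formula in the hypothetical `dragonfly_plus_max_hosts` below is an assumption.

```python
def bidirectional_pbps(ports: int, gbps_per_port: int) -> float:
    """Total throughput in Pb/s, counting both directions of every port."""
    return 2 * ports * gbps_per_port / 1_000_000


def dragonfly_plus_max_hosts(radix: int) -> int:
    """Hosts in the largest balanced DragonFly+ built from `radix`-port switches.

    Assumed construction: each leaf switch uses half its ports for hosts and
    half for spines; each spine switch uses half for leaves and half for
    global links. A group of radix/2 leaves and radix/2 spines then serves
    (radix/2)**2 hosts and owns (radix/2)**2 global links, one to each other
    group, so at most (radix/2)**2 + 1 groups can be fully connected.
    """
    half = radix // 2
    hosts_per_group = half * half
    groups = half * half + 1
    return hosts_per_group * groups


if __name__ == "__main__":
    print(f"{bidirectional_pbps(2048, 400):.2f} Pb/s")  # 2,048 NDR ports
    print(f"{bidirectional_pbps(4096, 200):.2f} Pb/s")  # 4,096 NDR200 ports
    ndr_hosts = dragonfly_plus_max_hosts(64)  # assumed NDR switch radix
    hdr_hosts = dragonfly_plus_max_hosts(40)  # assumed HDR switch radix
    print(f"{ndr_hosts:,} vs {hdr_hosts:,} hosts: {ndr_hosts / hdr_hosts:.1f}x")
```

In this assumed construction, traffic between groups crosses at most three switch-to-switch links (leaf to local spine, local spine to remote spine, remote spine to leaf), which matches the three-hop description.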

The third generation of NVIDIA SHARP In-Network Computing data reduction technology boosts performance for industrial and scientific applications, with 32x higher AI acceleration power than the previous generation.
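
The idea behind in-network data reduction can be illustrated with a toy model: instead of every host sending its data to one root host, switches sum the vectors arriving from their children and forward a single partial result upward, so each link carries one vector regardless of how many hosts sit below it. The Python sketch below is a conceptual illustration only; it does not model SHARP's actual protocol, and the tree layout and function names are hypothetical.

```python
from typing import Union

# A tree node is either a host id (int) or a switch, given as a list of child nodes.
Node = Union[int, list]


def reduce_up(node: Node, data: dict[int, list[float]], transfers: list[int]) -> list[float]:
    """Return the element-wise sum of every host vector below `node`.

    Each switch adds up the vectors received from its children; each
    child-to-parent link carries exactly one vector, recorded in `transfers`.
    """
    if isinstance(node, int):
        return list(data[node])  # a host contributes its own vector
    partials = []
    for child in node:
        partials.append(reduce_up(child, data, transfers))
        transfers.append(1)  # one vector sent up this link
    return [sum(column) for column in zip(*partials)]


def in_network_allreduce(tree: Node, data: dict[int, list[float]]) -> tuple[dict[int, list[float]], int]:
    """Reduce up the switch tree, then hand the root's sum back to every host.

    Returns each host's result and the number of vectors sent upward.
    """
    transfers: list[int] = []
    total = reduce_up(tree, data, transfers)
    # The root switch would multicast `total` back down the same tree.
    return {host: list(total) for host in data}, len(transfers)


if __name__ == "__main__":
    # Two leaf switches with two hosts each, joined by one root switch.
    tree = [[0, 1], [2, 3]]
    data = {h: [float(h), 1.0] for h in range(4)}
    results, sent_up = in_network_allreduce(tree, data)
    print(results[0], sent_up)  # [6.0, 4.0] 6
```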

Advanced management features include self-healing network capabilities and NVIDIA In-Network Computing acceleration engines. Data center downtime is further minimized with the NVIDIA UFM® Cyber-AI platform.

Based on industry standards, the NVIDIA Quantum-2 switches — which are expected to sample by year end — are backward- and forward-compatible, enabling easy migration and expansion of existing systems and software.

Industry-leading infrastructure manufacturers — including Atos, DDN, Dell Technologies, Excelero, GIGABYTE, HPE, Lenovo, Penguin Computing, QCT, Supermicro, VAST and WekaIO — plan to integrate the Quantum-2 NDR 400Gb/s InfiniBand switches into their enterprise and HPC offerings. Cloud service providers including Azure are also taking advantage of InfiniBand technology.

Introducing Magnum IO GPUDirect Storage

Providing unrivaled performance for complex workloads, Magnum IO GPUDirect Storage enables direct memory access between GPU memory and storage. Because data no longer has to be staged through a bounce buffer in CPU system memory, applications benefit from lower I/O latency and can use the full bandwidth of the network adapters, while the load on the CPU drops and the growing volume of data these workloads consume becomes easier to manage.
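
The difference between the two data paths can be sketched as a toy model in Python, with byte arrays standing in for storage, a CPU bounce buffer and GPU memory. This is not the GPUDirect Storage (cuFile) API, and the function names and chunk size are hypothetical; the point is only that the conventional path copies every byte through CPU system memory, while the direct path moves data from storage to GPU memory in a single step.

```python
def read_via_bounce_buffer(storage: bytes, chunk: int) -> tuple[bytearray, int]:
    """Conventional path: storage -> CPU bounce buffer -> GPU memory.

    Returns the simulated GPU buffer and the number of bytes staged in host memory.
    """
    gpu = bytearray(len(storage))
    host_buffer = bytearray(chunk)
    staged = 0
    for off in range(0, len(storage), chunk):
        n = min(chunk, len(storage) - off)
        host_buffer[:n] = storage[off:off + n]  # copy 1: storage into host memory
        gpu[off:off + n] = host_buffer[:n]      # copy 2: host memory into GPU memory
        staged += n
    return gpu, staged


def read_direct(storage: bytes, chunk: int) -> tuple[bytearray, int]:
    """Direct path: each chunk goes from storage straight into GPU memory."""
    gpu = bytearray(len(storage))
    for off in range(0, len(storage), chunk):
        gpu[off:off + chunk] = storage[off:off + chunk]  # single DMA-style transfer
    return gpu, 0  # nothing staged in host memory


if __name__ == "__main__":
    storage = bytes(range(256)) * 16  # 4 KiB of sample data
    via_cpu, staged = read_via_bounce_buffer(storage, chunk=1024)
    direct, staged_direct = read_direct(storage, chunk=1024)
    assert via_cpu == direct == bytearray(storage)
    print(staged, staged_direct)  # 4096 0
```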
