NVIDIA and partners launch HGX A100 systems

NVIDIA is turbocharging the NVIDIA HGX™ AI supercomputing platform with new technologies that fuse AI with high performance computing, making supercomputing more useful to a growing number of industries.

Tuesday, 29th June 2021. Posted in Server Technology, Storage + Servers by Phil Alsop

To accelerate the new era of industrial AI and HPC, NVIDIA has added three key technologies to its HGX platform: the NVIDIA® A100 80GB PCIe GPU, NVIDIA NDR 400G InfiniBand networking, and NVIDIA Magnum IO™ GPUDirect™ Storage software. Together, they provide the extreme performance needed for industrial HPC innovation.

Atos, Dell Technologies, Hewlett Packard Enterprise (HPE), Lenovo, Microsoft Azure and NetApp are among dozens of partners using the NVIDIA HGX platform for next-generation systems and solutions.

“The HPC revolution started in academia and is rapidly extending across a broad range of industries,” said Jensen Huang, founder and CEO of NVIDIA. “Key dynamics are driving super-exponential, super-Moore’s law advances that have made HPC a useful tool for industries. NVIDIA’s HGX platform gives researchers unparalleled high performance computing acceleration to tackle the toughest problems industries face.”

Industry Leaders Use HGX Platform to Power Innovation Breakthroughs

The HGX platform is being used by high-tech industrial pioneer General Electric, which is applying HPC to computational fluid dynamics (CFD) simulations that guide the design of large gas turbines and jet engines. The HGX platform has achieved order-of-magnitude acceleration for breakthrough CFD methods in GE's GENESIS code, which uses Large Eddy Simulation to study the effects of turbulent flows inside turbines that are composed of hundreds of individual blades and require uniquely complex geometry.

Besides driving industrial HPC transformation, the HGX platform is also accelerating scientific HPC systems around the world, including the next-generation supercomputer at the University of Edinburgh, also announced today.

NVIDIA A100 80GB PCIe Performance Enhancements for AI and HPC

NVIDIA A100 Tensor Core GPUs deliver unprecedented HPC acceleration to solve complex AI, data analytics, model training and simulation challenges relevant to industrial HPC. A100 80GB PCIe GPUs increase GPU memory bandwidth 25 percent compared with the A100 40GB, to 2TB/s, and provide 80GB of HBM2e high-bandwidth memory.
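The bandwidth and capacity figures above can be related with simple arithmetic. The sketch below uses only the article's rounded numbers (2TB/s, a 25 percent gain, 80GB) and decimal units, so the implied A100 40GB bandwidth is an approximation derived from those figures rather than an official specification, and real workloads rarely sustain peak bandwidth.

```python
# Worked arithmetic from the figures quoted in the article (decimal units: 1 TB = 1,000 GB).
A100_80GB_BANDWIDTH_TBPS = 2.0   # "to 2TB/s" (rounded figure)
BANDWIDTH_GAIN = 1.25            # "increase ... 25 percent compared with the A100 40GB"
A100_80GB_CAPACITY_GB = 80.0     # 80GB of HBM2e

# Bandwidth implied for the A100 40GB if the 80GB part is 25 percent faster.
implied_40gb_bandwidth_tbps = A100_80GB_BANDWIDTH_TBPS / BANDWIDTH_GAIN

# Time to read the entire 80GB of HBM2e once at the quoted peak bandwidth.
full_sweep_seconds = A100_80GB_CAPACITY_GB / (A100_80GB_BANDWIDTH_TBPS * 1000.0)

print(f"implied A100 40GB bandwidth: {implied_40gb_bandwidth_tbps:.2f} TB/s")
print(f"one full pass over 80GB: {full_sweep_seconds * 1000:.0f} ms")
```

Under these assumptions the 40GB part works out to roughly 1.6TB/s, and a single pass over all 80GB takes about 40 ms at peak.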

The A100 80GB PCIe's enormous memory capacity and high memory bandwidth allow more data and larger neural networks to be held in memory, minimizing internode communication and energy consumption. Together, these let researchers achieve higher throughput and faster results, maximizing the value of their IT investments.

A100 80GB PCIe is powered by the NVIDIA Ampere architecture, which features Multi-Instance GPU (MIG) technology to deliver acceleration for smaller workloads such as AI inference. MIG allows HPC systems to scale compute and memory down with guaranteed quality of service. In addition to PCIe, there are four- and eight-way NVIDIA HGX A100 configurations.
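To make the idea of scaling compute and memory down concrete, here is a simplified budget check for splitting one GPU into MIG instances. It is an illustration, not NVIDIA's placement algorithm: it assumes an A100 offers seven compute slices, and the profile names and sizes in `PROFILES` are our reading of NVIDIA's MIG documentation for the A100 80GB, so treat them as assumptions. The function `fits_budget` is hypothetical. Real MIG placement also restricts where each instance may start, so a request that passes this check can still be rejected by the driver.

```python
from typing import Iterable

# Assumed A100 budget: 7 compute slices and 80GB of memory.
TOTAL_COMPUTE_SLICES = 7
TOTAL_MEMORY_GB = 80

# Assumed A100 80GB MIG profiles: name -> (compute slices, memory in GB).
PROFILES = {
    "1g.10gb": (1, 10),
    "2g.20gb": (2, 20),
    "3g.40gb": (3, 40),
    "4g.40gb": (4, 40),
    "7g.80gb": (7, 80),
}


def fits_budget(requested: Iterable[str]) -> bool:
    """Hypothetical check: do the requested instances fit the slice and memory budget?

    Only totals are compared; real MIG placement rules are stricter.
    """
    names = list(requested)
    compute = sum(PROFILES[name][0] for name in names)
    memory = sum(PROFILES[name][1] for name in names)
    return compute <= TOTAL_COMPUTE_SLICES and memory <= TOTAL_MEMORY_GB


print(fits_budget(["1g.10gb"] * 7))          # seven small inference instances
print(fits_budget(["1g.10gb"] * 8))          # one compute slice too many
print(fits_budget(["3g.40gb", "4g.40gb"]))   # two larger instances
```

The first and third requests fit within the assumed budget. The second needs eight compute slices and does not fit.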

NVIDIA partner support for the A100 80GB PCIe includes Atos, Cisco, Dell Technologies, Fujitsu, H3C, HPE, Inspur, Lenovo, Penguin Computing, QCT and Supermicro. The HGX platform featuring A100-based GPUs interconnected via NVLink is also available via cloud services from Amazon Web Services, Microsoft Azure and Oracle Cloud Infrastructure.

Next-Generation NDR 400Gb/s InfiniBand Switch Systems

HPC systems that require unparalleled data throughput are supercharged by NVIDIA InfiniBand, the world's only fully offloadable in-network computing interconnect. NDR InfiniBand scales performance to tackle the massive challenges in industrial and scientific HPC systems. The NVIDIA Quantum™-2 fixed-configuration switch systems deliver 64 ports of NDR 400Gb/s InfiniBand (or 128 ports of NDR200), providing 3x higher port density versus HDR InfiniBand.

The NVIDIA Quantum-2 modular switches provide scalable port configurations of up to 2,048 ports of NDR 400Gb/s InfiniBand (or 4,096 ports of NDR200), with a total bidirectional throughput of 1.64 petabits per second, 5x that of the previous generation. The 2,048-port switch provides 6.5x greater scalability than the previous generation, with the ability to connect more than a million nodes in just three hops using a DragonFly+ network topology.
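The quoted throughput follows from the port counts and line rates. This sketch assumes that "bidirectional throughput" means ports × per-port line rate × two directions. It reproduces the 1.64 petabits per second figure for the modular switch and gives the matching total for the 64-port fixed-configuration system.

```python
# Aggregate-bandwidth arithmetic for the Quantum-2 figures quoted in the article.
# Assumption: bidirectional throughput = ports x line rate x 2 directions.
NDR_GBPS = 400      # NDR line rate per port, Gb/s
NDR200_GBPS = 200   # NDR200 line rate per port, Gb/s


def bidirectional_tbps(ports: int, gbps_per_port: int) -> float:
    """Total traffic in both directions, in Tb/s."""
    return ports * gbps_per_port * 2 / 1000


fixed = bidirectional_tbps(64, NDR_GBPS)              # fixed-configuration switch
fixed_split = bidirectional_tbps(128, NDR200_GBPS)    # same switch as NDR200 ports
modular = bidirectional_tbps(2048, NDR_GBPS)          # largest modular configuration
modular_split = bidirectional_tbps(4096, NDR200_GBPS)

print(f"fixed: {fixed:.1f} Tb/s (NDR200 split: {fixed_split:.1f} Tb/s)")
print(f"modular: {modular / 1000:.2f} Pb/s (NDR200 split: {modular_split / 1000:.2f} Pb/s)")
```

Running NDR ports as two NDR200 ports doubles the port count without changing the aggregate: 51.2Tb/s for the fixed switch and about 1.64Pb/s for the modular one.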

The third generation of NVIDIA SHARP In-Network Computing data reduction technology boosts performance for industrial and scientific HPC applications, with 32x higher AI acceleration power than the previous generation.
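SHARP performs data reductions, such as the summation in an allreduce, inside the switches rather than on the hosts. The toy model below illustrates that idea only. It is not NVIDIA's SHARP API, and the `Host`, `Switch`, `reduce_up` and `allreduce` names are hypothetical. Each switch sums the vectors arriving from below and forwards a single partial result upward, so each uplink carries one vector instead of one per host. The final sum is then broadcast back down.

```python
# Toy model of in-network aggregation (the idea behind SHARP-style reductions).
# Hypothetical classes; this is not NVIDIA's SHARP API.
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Host:
    values: List[float]


@dataclass
class Switch:
    children: List[Union["Switch", Host]] = field(default_factory=list)


def reduce_up(node: Union[Switch, Host]) -> List[float]:
    """Sum the vectors below `node`; each switch emits a single partial sum."""
    if isinstance(node, Host):
        return list(node.values)
    partials = [reduce_up(child) for child in node.children]
    # Element-wise sum of the children's vectors, done "in the switch".
    return [sum(column) for column in zip(*partials)]


def allreduce(root: Switch, hosts: List[Host]) -> None:
    """Reduce up the tree, then broadcast the result back to every host."""
    total = reduce_up(root)
    for host in hosts:
        host.values = list(total)


# Two leaf switches with two hosts each, under one root switch.
hosts = [Host([1.0, 2.0]), Host([3.0, 4.0]), Host([5.0, 6.0]), Host([7.0, 8.0])]
root = Switch([Switch(hosts[:2]), Switch(hosts[2:])])
allreduce(root, hosts)
print(hosts[0].values)
```

After the allreduce, every host holds the element-wise sum `[16.0, 20.0]`.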

Advanced management features include self-healing network capabilities and NVIDIA In-Network Computing acceleration engines. Data center downtime is further minimized with the NVIDIA UFM® Cyber-AI platform.

Based on industry standards, the NVIDIA Quantum-2 switches — which are expected to sample by year end — are backward- and forward-compatible, enabling easy migration and expansion of existing systems and software.

Industry-leading infrastructure manufacturers — including Atos, DDN, Dell Technologies, Excelero, GIGABYTE, HPE, Lenovo, Penguin Computing, QCT, Supermicro, VAST and WekaIO — plan to integrate the Quantum-2 NDR 400Gb/s InfiniBand switches into their enterprise and HPC offerings. Cloud service providers including Azure are also taking advantage of InfiniBand technology.

Introducing Magnum IO GPUDirect Storage

Providing unrivaled performance for complex workloads, Magnum IO GPUDirect Storage enables direct memory access between GPU memory and storage. The direct path enables applications to benefit from lower I/O latency and use the full bandwidth of the network adapters while reducing the load on the CPU and managing the impact of increased data consumption.
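The benefit of the direct path can be seen by counting how much data passes through system memory. The toy comparison below assumes that a conventional read stages data in a CPU bounce buffer in host memory before copying it to the GPU, while GPUDirect Storage lets the DMA engine write straight into GPU memory. Applications reach GPUDirect Storage through NVIDIA's cuFile API, which this sketch does not use. The `host_memory_traffic` helper and the path lists are hypothetical.

```python
# Toy comparison of two data paths; this does not use NVIDIA's cuFile API.
from typing import List

# Assumed paths: conventional I/O bounces through host memory, GPUDirect does not.
CONVENTIONAL: List[str] = ["storage", "host_memory", "gpu_memory"]
GPUDIRECT: List[str] = ["storage", "gpu_memory"]


def host_memory_traffic(path: List[str], nbytes: int) -> int:
    """Bytes written to or read from system memory while moving `nbytes` along `path`."""
    hops = zip(path, path[1:])
    return sum(nbytes for src, dst in hops if "host_memory" in (src, dst))


one_gib = 1 << 30
print(host_memory_traffic(CONVENTIONAL, one_gib) // one_gib)  # GiB through host RAM
print(host_memory_traffic(GPUDIRECT, one_gib))                # bytes through host RAM
```

In this model every gibibyte read conventionally is written to host memory and read back out (2 GiB of host-memory traffic), while the direct path uses none. That is the CPU and memory load the direct path is meant to remove.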
