Improved data movement for GPUs with NVIDIA GPUDirect RDMA technology


Whether you’re exploring mountains of data, researching scientific problems, training neural networks, or modeling financial markets, you need a computing platform with the highest data throughput. And because GPUs consume data much faster than CPUs, you also need extra bandwidth while maintaining low latency.

InfiniBand is the ideal network for powering and scaling the data-hungry GPUs you need to maintain efficiency in the AI exascale era, which is already in full swing in data centers around the world.

InfiniBand remains the world’s most widely adopted high-performance network technology for high-performance computing (HPC), and over the years it has also become the most widely adopted high-speed network for AI. This includes systems used for advanced research, development, and critical commercial deployments that fully integrate the latest NVIDIA Ampere architecture A100 GPUs.

GPUDirect RDMA: direct communication between NVIDIA GPUs

InfiniBand’s Remote Direct Memory Access (RDMA) engines can be leveraged to provide direct access to GPU memory. Designed specifically for the needs of GPU acceleration, GPUDirect RDMA provides a direct communication path between NVIDIA GPUs in remote systems using InfiniBand. This takes the system CPUs, and the buffer copies they would otherwise have to make through system memory, out of the data path, resulting in superior performance.

Figure 1: Block diagram of NVIDIA GPUDirect RDMA connectivity.
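
As a rough illustration of that direct path, the toy Python model below lists the stops a message makes on its way from one GPU to another, with and without GPUDirect RDMA. The stage names and the helper `staging_copies` are hypothetical labels chosen for this sketch, not part of any NVIDIA API, and the model deliberately ignores details such as pipelining and PCIe topology.

```python
# Toy model of the data path between two GPUs in different servers.
# Stage names are illustrative labels only, not API objects.

STAGED_PATH = [
    "sender GPU memory",
    "sender host memory",      # staging copy in system memory
    "sender InfiniBand adapter",
    "receiver InfiniBand adapter",
    "receiver host memory",    # second staging copy in system memory
    "receiver GPU memory",
]

# With GPUDirect RDMA the adapter reads and writes GPU memory directly,
# so the host-memory stages drop out of the path.
DIRECT_PATH = [stage for stage in STAGED_PATH if "host memory" not in stage]


def staging_copies(path):
    """Count how many times the payload lands in host (system) memory."""
    return sum("host memory" in stage for stage in path)


if __name__ == "__main__":
    for name, path in (("staged", STAGED_PATH), ("GPUDirect RDMA", DIRECT_PATH)):
        print(f"{name}: {' -> '.join(path)}")
        print(f"  host-memory staging copies: {staging_copies(path)}")
```

In this simplified view, the staged path touches system memory on both the sending and the receiving server, while the direct path touches it on neither.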

In recent years, hardware technology has improved dramatically. For example, InfiniBand has moved up to 400 Gb/s, platforms have transitioned to PCIe Gen4, and GPUs process data more than 20 times faster. Yet there has been one constant: NVIDIA technology has steadily improved, keeping pace through generations of software.

GDRCopy: fast copy library

GPUDirect RDMA also received a performance improvement with GDRCopy, a fast, low-latency copy library based on NVIDIA GPUDirect RDMA technology. While GPUDirect RDMA is intended for direct access to GPU memory from the network, it is possible to use these same APIs to create perfectly valid CPU mappings of GPU memory. CPU-driven copying requires only a small amount of overhead and dramatically improves performance.

Modern communication libraries such as NVIDIA HPC-X, Open MPI, and MVAPICH2 can easily take advantage of GPUDirect RDMA and GDRCopy to achieve the lowest latency and highest bandwidth when transferring data between NVIDIA A100 GPUs, with their unprecedented acceleration capabilities.

A test drive at the HPC-AI Advisory Council Performance Center

Recently, we took the latest versions of these libraries for a test drive on the Tessa cluster, which was just introduced at the HPC-AI Advisory Council. The HPC-AI Advisory Council High Performance Center provides an environment for developing, testing, benchmarking, and optimizing products based on clustering technology. The Tessa cluster is somewhat unique in that it is fully equipped with NVIDIA A100 PCIe 40 GB GPUs and populated with ConnectX-6 HDR InfiniBand adapters, running on highly flexible servers from Colfax International, the CX41060t-XK7, a PCIe Gen3-based platform. Running HDR 200 Gb/s InfiniBand instead of HDR100 is uncommon for a PCIe Gen3 configuration, but it certainly squeezes every ounce of performance out of the platform.
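
To see why HDR 200 Gb/s on a PCIe Gen3 platform is an unusual pairing, a back-of-the-envelope calculation helps. The sketch below compares the theoretical bandwidth of a PCIe slot with the InfiniBand link rate. It assumes an x16 slot, which the description of the Tessa cluster does not state, and it ignores packet and protocol overhead, so real throughput will be lower. The function names are made up for this example.

```python
# Theoretical link-rate comparison; protocol overhead is ignored.
# ASSUMPTION: the adapter sits in an x16 slot (not stated in the article).

PCIE_GT_PER_S = {3: 8.0, 4: 16.0}   # raw transfer rate per lane, GT/s
PCIE_ENCODING = 128 / 130           # 128b/130b line coding (Gen3 and Gen4)
INFINIBAND_GBPS = {"HDR100": 100.0, "HDR": 200.0}


def pcie_gbps(gen, lanes=16):
    """Theoretical one-direction PCIe bandwidth in Gb/s."""
    return PCIE_GT_PER_S[gen] * lanes * PCIE_ENCODING


def bottleneck_gbps(gen, link, lanes=16):
    """The slower of the PCIe slot and the InfiniBand link, in Gb/s."""
    return min(pcie_gbps(gen, lanes), INFINIBAND_GBPS[link])


if __name__ == "__main__":
    for gen in (3, 4):
        for link in ("HDR100", "HDR"):
            print(f"PCIe Gen{gen} x16 ({pcie_gbps(gen):.1f} Gb/s) + {link}: "
                  f"limited to {bottleneck_gbps(gen, link):.1f} Gb/s")
```

Under these assumptions a Gen3 x16 slot tops out at roughly 126 Gb/s, which is more than HDR100 can carry but less than a full HDR link. An HDR adapter therefore lets the platform use all the bandwidth the slot can supply, even though the link itself is never saturated.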


Figure 2: GPUDirect + GDRCopy performance on the HPC-AI Advisory Council “Tessa” cluster with NVIDIA HPC-X

Speed up the most important work of our time

The combination of NVIDIA Magnum IO™, InfiniBand, and A100 Tensor Core GPUs delivers unmatched acceleration across the spectrum of research, scientific computing, and industry. To learn more about how NVIDIA is accelerating the world’s highest-performing data centers for AI, data analytics, and HPC applications, check out the resources below to get started:

Access this tutorial for a complete overview of GPUDirect RDMA and GDRCopy:

Check out more resources on GPUDirect RDMA:

Presentation of the NVIDIA GPUDirect RDMA solution

Get more information about GDRCopy:

Learn more about NVIDIA Magnum IO, the modern data center IO subsystem:

Scot Schultz | Senior Director, HPC and Technical Computing | NVIDIA

Scot Schultz is an HPC technologist specializing in artificial intelligence and machine learning systems. Schultz has extensive knowledge of distributed computing, operating systems, AI frameworks, high-speed interconnects, and processor technologies. Over a career of more than 25 years in high-performance computing systems, he has held various engineering and leadership roles, including strategic enablement of the HPC technology ecosystem. Scot has been instrumental in the growth and development of many industry standards organizations.

