Over the past decade, workflows on High Performance Computing (HPC) systems have diversified considerably, often mixing AI/ML processing with traditional HPC. In response, a wide variety of specialized HPC computing systems (cluster nodes) have been designed and deployed to optimize the performance of specific applications and frameworks. Queues targeting these different systems allow each user to instruct the batch scheduler to dispatch work to hardware that closely matches the computational requirements of their application. High-memory nodes, nodes with one or more accelerators, nodes supporting a high-performance parallel file system, interactive nodes, and hosts designed to support containerized or virtualized workflows are just a few examples of the groups of specialized nodes developed for HPC. Nodes can also be grouped into queues depending on how they are interconnected.
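As an illustration, the mapping from a job's resource requirements to a specialized queue can be sketched as a simple ordered lookup. The queue names and resource fields below are hypothetical, loosely modeled on the partitions a scheduler such as Slurm might expose; this is a sketch of the idea, not any real scheduler's logic:

```python
# Toy dispatcher: match a job's resource needs to a specialized queue.
# Queue names, fields, and thresholds are illustrative only.

QUEUES = {
    "gpu":         lambda j: j.get("gpus", 0) > 0,
    "highmem":     lambda j: j.get("mem_gb", 0) > 512,
    "interactive": lambda j: j.get("interactive", False),
    "standard":    lambda j: True,  # fallback for everything else
}

def select_queue(job):
    """Return the first queue whose predicate matches the job."""
    for name, matches in QUEUES.items():
        if matches(job):
            return name

print(select_queue({"gpus": 4}))       # gpu
print(select_queue({"mem_gb": 1024}))  # highmem
print(select_queue({"cores": 16}))     # standard
```

A real scheduler weighs many more factors (priority, fair share, node state), but the core idea is the same: route each job to hardware that matches its needs.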
The density and traffic requirements of interconnected systems in a data center hosting an HPC cluster call for topologies such as the spine/leaf architecture shown in Figure 1. The picture becomes even more complex when the HPC systems exceed the capacity of a single location and are distributed across multiple buildings or data centers. Traffic patterns involving inter-process communication, interactive access, shared file system I/O, and service traffic such as NTP, DNS, and DHCP, some of which are highly sensitive to latency, would otherwise compete for the available bandwidth. Spine/leaf connectivity addresses this problem by enabling routing algorithms that provide a consistent, unimpeded path for all node-to-node communication.
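The appeal of spine/leaf can be made concrete with a little arithmetic: in a two-tier fabric where every leaf connects to every spine, any leaf-to-leaf path is exactly two hops, and the number of equal-cost paths equals the number of spines. A minimal sketch (the topology sizes are hypothetical):

```python
# Two-tier spine/leaf arithmetic: every leaf connects to every spine,
# so any leaf-to-leaf flow has one equal-cost path per spine switch.

def fabric_stats(spines, leaves, uplink_gbps):
    """Basic properties of a non-blocking two-tier spine/leaf fabric."""
    return {
        "leaf_to_leaf_hops": 2,                    # leaf -> spine -> leaf, for every pair
        "ecmp_paths": spines,                      # one equal-cost path per spine
        "leaf_uplink_gbps": spines * uplink_gbps,  # aggregate uplink per leaf
    }

stats = fabric_stats(spines=4, leaves=16, uplink_gbps=400)
print(stats)
# {'leaf_to_leaf_hops': 2, 'ecmp_paths': 4, 'leaf_uplink_gbps': 1600}
```

The uniform two-hop distance is what makes latency predictable, and the equal-cost paths are what let ECMP routing spread competing traffic classes across the fabric.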
Figure 1: Fabric topologies
HPC is now evolving from an almost exclusively purpose-built, on-premises infrastructure to hybrid or even fully cloud-resident architectures. Over the past two decades, the high cost of building, operating, and maintaining dedicated HPC infrastructure has pushed many government labs, businesses, and universities to rethink their purpose-built HPC strategy. Rather than buying the space, racks, power, cooling, data storage, servers, and networking needed to create on-premises HPC clusters, not to mention paying for the staff and expense of maintaining and updating these systems, all but the largest HPC practitioners are moving toward a model built on cloud providers that offer HPC services. These changes have spurred a refocusing of investment on internet connectivity and the bandwidth needed to enable cloud bursting, data migration, and interactivity with cloud-resident infrastructure. This creates new challenges for developers working to establish custom environments in which to develop and run application frameworks, which often involve complex interdependencies between software versions. Containerization has helped isolate many of these software and library dependencies, simplifying migration to the cloud by relaxing constraints on the host image.
HPC Network Infrastructure Considerations for 400G/800G Ethernet
The internet service providers and operators responsible for delivering all this traffic depend on technologies that evolve at a steady, reliable rate, and they are of course highly cost-conscious, since their bottom line is tied to the cost of building, upgrading, and operating the network infrastructure. Hyperscalers and cloud service providers face similar financial pressure to consolidate switching devices and reduce their number, power usage, and cooling demands in the data center.
Cost isn’t the only factor to consider when driving Ethernet to these new heights of speed. The PAM-4 signaling shown in Figure 2 was initially introduced at a 25 Gb/s signaling rate as a catalyst for 100G Ethernet, but this method requires forward error correction (FEC) due to its higher bit error rates. Signaling changes that incorporate FEC add both latency overhead and complexity to the physical layer design, yet even faster signaling rates make the use of FEC mandatory. Although link aggregation of multiple 100 Gb/s ports to achieve higher bandwidth, still possible at NRZ signaling rates, might be a temporary workaround, it is not a long-term solution because of the density constraints it implies and the high cost of the exponentially larger number of ports required. Beyond 400G Ethernet, alternatives to PAM-4 offering even greater signal density and longer reach must be exploited.
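The appeal of PAM-4 over NRZ is easy to quantify: four amplitude levels carry two bits per symbol instead of one, doubling the data rate at the same symbol rate. The arithmetic can be sketched as follows (the 26.5625 GBd symbol rate is the one used per lane in several Ethernet PHYs):

```python
import math

def bits_per_symbol(levels):
    """PAM-N carries log2(N) bits per symbol (NRZ is effectively PAM-2)."""
    return math.log2(levels)

def lane_rate_gbps(symbol_rate_gbaud, levels):
    """Raw per-lane data rate in Gb/s for a given symbol rate and PAM order."""
    return symbol_rate_gbaud * bits_per_symbol(levels)

# Same symbol rate, twice the data: this is why PAM-4 displaced NRZ.
print(lane_rate_gbps(26.5625, 2))  # NRZ:   26.5625 Gb/s
print(lane_rate_gbps(26.5625, 4))  # PAM-4: 53.125 Gb/s (a 50G-class lane)
```

The cost of squeezing levels closer together is a reduced eye opening and thus a higher bit error rate, which is exactly why FEC becomes mandatory at these rates.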
Figure 2: High-speed Ethernet signaling
Cabling is another challenge for high-speed Ethernet. Copper cables are often too noisy and power-hungry at these speeds, even over short distances. One use case requires breakout cabling options, because a single switch port with high enough bandwidth can support multiple computer systems. Another use case focuses on switch-to-switch aggregation-layer or site-to-site connectivity. Dense wavelength division multiplexing (DWDM) for long-distance connections (approximately 80 km per repeated segment) and single-mode fiber (SMF) for shorter-range connections will gradually replace multimode fiber and copper technologies to enable signaling rates of 200 Gb/s, but the 100G electrical signaling rates and cost advantages of multimode fiber will be difficult to displace in the coming years. CWDM and DWDM introduce coherent optical signaling as an alternative to PAM-4, but involve even greater power, cost, and complexity to achieve the longer ranges they allow. Within the data center, the pressures of backward compatibility, switch aggregation with fewer switches, and the potential for energy savings are strong incentives for a flexible on-board optics design that could also accommodate existing pluggable modules for lower-rate connectivity.
Enabling 400G/800G Ethernet with IP
So how do SoC designers develop chips to support 400G Ethernet and beyond? Network switches and computer systems must use components that support these high data rates to deliver the application acceleration they promise. Whether reducing the complexity of a network fabric to achieve higher levels of aggregation, expanding a hyperscaler's infrastructure beyond the limits previously imposed by slower network technologies, or speeding up the delivery of data to a neural network running on a group of network-connected computers, all elements of the data path must be able to support the lower latencies and higher bandwidth required without excessive power or cost penalties. And of course, backward compatibility with slower components will ensure the seamless adoption and integration of 400G/800G Ethernet and beyond into existing data centers.
Providing this performance in 400G/800G networks involves multiple challenges in the physical and electrical domains. Electrical efficiency with faster clock speeds, parallel paths, and complex signaling requirements is difficult to achieve, and the inherently high error rates at faster communication speeds create the need for a highly efficient FEC to ensure minimum latency with low retransmission rates. As mentioned earlier, the cabling options need to support high data rates at rack, data center, and even metro scales. No single cabling technology is ideal over such a diverse range of lengths, so any solution developed must support multiple types of media.
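To give a feel for the FEC cost mentioned above: the Reed-Solomon code commonly used on PAM-4 Ethernet lanes, RS(544,514), trades a few percent of line rate (plus some encode/decode latency) for error correction. A rough overhead calculation, as a sketch:

```python
def rs_overhead(n, k):
    """Rate overhead of an RS(n, k) code: parity symbols sent per payload symbol."""
    return (n - k) / k

# RS(544,514), the "KP" FEC paired with PAM-4 Ethernet lanes:
overhead = rs_overhead(544, 514)
print(f"{overhead:.2%}")  # ~5.84% of the line rate is spent on parity
```

That overhead is already baked into lane rates such as 53.125 Gb/s, which is why an efficient, low-latency FEC implementation matters so much for the silicon.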
SoC designers need silicon IP developed with all of these considerations in mind. Synopsys has been a leading developer of Ethernet silicon IP for many generations of the protocol and remains instrumental in advancing standardization for 400G/800G Ethernet and beyond. Synopsys offers an integrated, standards-compliant, configurable 400G/800G Ethernet IP solution that meets the diverse needs of today's HPC, including AI/ML workloads, while maintaining backward compatibility with lower speeds and older standards.
About the Author: Jerry Lotto, Senior Technical Marketing Director
Jerry Lotto brings over 35 years of scientific and high-performance computing experience. Jerry built the first HPC teaching cluster in Harvard’s Department of Chemistry and Chemical Biology with an InfiniBand backbone. In 2007, Jerry helped establish the Harvard Arts and Sciences Research Computing Group. In an unprecedented collaborative effort between five universities, industry, and state government, Jerry also helped design the Massachusetts Green High-Performance Computing Center in Holyoke, MA, which was completed in November 2012.
Blog: What is driving the demand for 200G, 400G and 800G Ethernet?
Video: Complete DesignWare 400G/800G Ethernet IP Product Update
Article: Anatomy of an Integrated Ethernet IP PHY for High Performance Computing SoCs
Webinar: Minimize Latency with 400G/800G Ethernet IP