Researchers Test a Reference Experimental RISC-V Supercomputer


A group of researchers from the University of Bologna and Cineca explored an experimental eight-node, 32-core RISC-V cluster. The demonstration showed that even a group of SiFive’s humble Freedom U740 SoCs can run supercomputer applications at relatively low power. The cluster also performed reasonably well and supported a basic high-performance computing stack.

Need for RISC-V

One of the benefits of the open-source RISC-V instruction set architecture is the relative ease of building a highly customized RISC-V core for a particular application that delivers a very competitive balance of performance, power, energy, and cost. That makes RISC-V suitable for emerging applications and for high-performance computing projects tailored to a particular workload. The group explored the cluster to prove that RISC-V-based platforms can work for high-performance computing (HPC) from a software perspective.

“Monte Cimone is not intended to achieve strong floating point performance, but it was built with the purpose of ‘priming the pipe’ and exploring the challenges of integrating a multi-node RISC-V cluster capable of delivering a production HPC stack including interconnect, storage, and power monitoring infrastructure on RISC-V hardware,” the project description reads (via NextPlatform).

For its experiments, the team took a ready-to-use Monte Cimone cluster consisting of four dual-board blades in a 1U form factor built by E4, an Italian HPC company (note that E4’s full Monte Cimone cluster consists of six blades). Monte Cimone is a platform “for porting and tuning software stacks and HPC applications relevant to the RISC-V architecture,” so the choice was well justified.

Cluster

Each 1U Monte Cimone machine uses two SiFive HiFive Unmatched development motherboards powered by SiFive’s heterogeneous Freedom U740 multi-core SoC, which combines four U74 cores running at up to 1.4 GHz and one S7 core using SiFive’s Mix + Match technology, along with 2MB of L2 cache. Additionally, each platform has 16GB of DDR4-1866 memory and a 1TB NVMe SSD.

(Image credit: E4)

Two of the eight nodes are also equipped with a Mellanox ConnectX-4 FDR 56 Gbps InfiniBand Host Channel Adapter (HCA) to maximize the available inter-node bandwidth. However, RDMA did not work, even though the Linux kernel could recognize the device driver and load the kernel module for the Mellanox OFED stack.

(Image credit: E4)

One of the critical parts of the experiment was porting the essential HPC services required to run compute-intensive workloads. The team reported that porting NFS, LDAP, and the SLURM job scheduler to RISC-V was relatively straightforward; they then installed an ExaMon plugin dedicated to data sampling, a broker for transport-layer management, and a database for storage.
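ExaMon’s monitoring pipeline is essentially sensor plugins feeding a broker, with a database behind it for storage. As a rough illustration only (this is not the project’s actual plugin code; the broker hostname, MQTT topic layout, and sensor path below are assumptions), a minimal data-sampling plugin boils down to reading a counter and publishing it:

```python
# A minimal sketch of an ExaMon-style sampling plugin. The broker hostname,
# topic layout, and sensor file path are hypothetical placeholders.
import time
import socket

import paho.mqtt.publish as publish  # pip install paho-mqtt

BROKER_HOST = "broker.local"                          # assumed transport-layer broker
SENSOR_PATH = "/sys/class/hwmon/hwmon0/power1_input"  # hypothetical power sensor file
TOPIC = f"cluster/node/{socket.gethostname()}/plugin/power/data"

def read_power_uw(path: str) -> int:
    """Read one instantaneous power sample (microwatts) from a sysfs-style file."""
    with open(path) as f:
        return int(f.read().strip())

while True:
    sample = read_power_uw(SENSOR_PATH)
    # Publish "<value>;<unix timestamp>" so the storage side can build a time series.
    publish.single(TOPIC, f"{sample};{time.time()}", hostname=BROKER_HOST)
    time.sleep(1)  # 1 Hz sampling; a production collector would likely sample faster
```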

Results

Since using a low-power cluster designed for software porting for real HPC workloads doesn’t make much sense, the team ran the HPL and STREAM benchmarks to measure GFLOPS performance and memory bandwidth. The results were mixed.

(Image credit: University of Bologna)

The maximum theoretical performance of SiFive’s U74 core is 1 GFLOPS, which means the theoretical peak of a Freedom U740 SoC is 4 GFLOPS. Unfortunately, each node only achieved a sustained 1.86 GFLOPS in HPL, so the maximum compute capacity of the eight-node cluster should be around 14.88 GFLOPS assuming perfect linear scaling. The entire cluster reached a maximum sustained performance of 12.65 GFLOPS, or 85% of that extrapolated peak. Meanwhile, due to the SoC’s relatively poor scaling, 12.65 GFLOPS is just 39.5% of the theoretical peak of the whole machine, which is not so bad for an experimental system if the U740’s poor scaling is set aside.
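For clarity, the efficiency percentages follow directly from the figures the team reported; a quick back-of-the-envelope check (all inputs are the reported numbers):

```python
# Reproducing the HPL efficiency arithmetic from the reported figures.
cores_per_soc = 4
peak_per_core_gflops = 1.0                       # U74 theoretical peak
peak_per_node = cores_per_soc * peak_per_core_gflops  # 4 GFLOPS per Freedom U740
nodes = 8

sustained_per_node = 1.86                        # measured HPL result per node
extrapolated_peak = nodes * sustained_per_node   # 14.88 GFLOPS with perfect scaling
cluster_sustained = 12.65                        # measured HPL result, whole cluster

print(cluster_sustained / extrapolated_peak)       # ~0.85  -> 85% of the extrapolated peak
print(cluster_sustained / (nodes * peak_per_node))  # ~0.395 -> 39.5% of the theoretical peak
```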

(Image credit: University of Bologna)

As for memory bandwidth, each node should offer around 14.928 GB/s from its DDR4-1866 module. In practice, it never exceeded 7,760 MB/s, which is not a good result. The results with the unmodified upstream STREAM benchmark are even less impressive: a four-thread run achieved no more than 15.5% of the maximum available bandwidth, which is much lower than on comparable clusters. On the one hand, these results expose the Freedom U740’s mediocre memory subsystem; on the other hand, they also show that software optimizations could improve things.
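The 14.928 GB/s figure is simply the DDR4-1866 transfer rate multiplied by the 64-bit bus width. The sketch below reproduces that math; the percentage comparisons against the measured numbers are derived here for illustration, not quoted from the paper:

```python
# Back-of-the-envelope DDR4-1866 bandwidth math behind the figures above.
transfers_per_sec = 1866e6   # DDR4-1866: 1866 MT/s
bus_width_bytes = 8          # 64-bit memory channel
theoretical = transfers_per_sec * bus_width_bytes / 1e9  # GB/s

measured_best = 7.760        # GB/s, best case reported per node
stream_fraction = 0.155      # unmodified 4-thread STREAM result

print(theoretical)                    # ~14.93 GB/s theoretical peak
print(measured_best / theoretical)    # ~0.52 -> roughly half of peak (derived here)
print(stream_fraction * theoretical)  # ~2.3 GB/s sustained in upstream STREAM (derived here)
```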

(Image credit: University of Bologna)

In terms of power consumption, the Monte Cimone cluster keeps its promise: it is low. For example, the actual power consumption of a SiFive Freedom U740 peaks at 5.935W under CPU-intensive HPL workloads, while at idle it consumes approximately 4.81W.
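As a rough, derived illustration (not a figure reported by the team), dividing the per-node HPL result by the SoC’s peak power hints at the energy efficiency involved, keeping in mind this counts SoC power only:

```python
# Illustrative energy-efficiency estimate from the reported numbers; this considers
# SoC power only (not memory, SSD, or NIC), so treat it as a rough upper bound.
hpl_sustained_gflops = 1.86   # per node, from the HPL runs above
soc_power_peak_w = 5.935      # Freedom U740 peak power under HPL
soc_power_idle_w = 4.81       # Freedom U740 idle power

print(hpl_sustained_gflops / soc_power_peak_w)  # ~0.31 GFLOPS/W per SoC
print(soc_power_peak_w - soc_power_idle_w)      # ~1.1 W between idle and full load
```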

Summary

The Monte Cimone cluster used by the researchers is perfectly capable of running an HPC software stack and suitable test applications, which is good enough for its purpose. SiFive’s HiFive Unmatched board and E4’s systems are, after all, intended for software porting work, so the smooth running of NFS, LDAP, SLURM, ExaMon, and other programs was a pleasant surprise. The lack of RDMA support was not.

(Image credit: E4)

“To our knowledge, this is the first fully operational RISC-V cluster supporting a basic HPC software stack, proving the maturity of the RISC-V ISA and the first generation of commercially available RISC-V components,” the team wrote in its report. “We also evaluated support for Infiniband network adapters that are recognized by the system, but are not yet capable of supporting RDMA communication.”

But the actual performance of the cluster did not meet expectations. That is partly down to the U740’s mediocre performance and capabilities, but the immaturity of the software stack also played a role. In other words, while HPC software can already run on RISC-V-based systems, its performance still falls short of expectations. This should change once developers optimize their programs for the open-source architecture and more suitable hardware is released.

Indeed, the researchers say their future work involves improving the software stack, adding RDMA support, implementing dynamic power and thermal management, and integrating RISC-V-based accelerators.

As for hardware, SiFive can build SoCs with up to 128 high-performance cores. These CPUs are aimed at data center and HPC workloads, so expect them to offer decent performance scalability and a decent memory subsystem. Additionally, once SiFive enters those markets it will need to provide software compatibility and optimizations, so expect the chipmaker to encourage software developers to tune their programs for the RISC-V ISA.
