HBM3 dons the crown of bandwidth king


How the latest version of the High Bandwidth Memory standard adapts to increasingly demanding applications.

With the release of the HBM3 update to the High Bandwidth Memory (HBM) standard, a new bandwidth king is crowned. The scorching performance demands of advanced workloads, with AI/ML training leading the pack, are driving the need for ever faster bit delivery. Memory bandwidth is a critical enabler of compute performance, hence the need for accelerated evolution of the standard with HBM3 representing the new benchmark.

Here is what HBM3 offers:

  • Above all, provides higher data throughput. HBM3 increases the data rate per pin to 6.4 Gigabits per second (Gb/s), which is twice that of HBM2 (and a 78% increase over the 3.6 Gb/s data rate of HBM2E).
  • Retains the 1024-bit wide interface of previous generations. Bandwidth is the product of data rate and interface width, so 6.4 Gb/s x 1024 is roughly 6,554 Gb/s. Dividing by 8 bits/byte yields a bandwidth of 819 gigabytes per second (GB/s) between a host processor and a single HBM3 DRAM device.
  • Doubles the number of memory channels to 16 and supports 32 virtual channels (with two pseudo channels per channel). With more memory channels, HBM3 can support taller DRAM stacks per device and finer access granularity.
  • Supports 3D DRAM devices in stacks up to 12 high (with possible future extension to 16 devices per stack) with device densities up to 32 gigabits (Gb). A 12-high stack of 32 Gb devices translates to a single HBM3 DRAM device with a capacity of 48 GB.
  • Retains 2.5D architecture of host processor and interposer-mounted HBM3 DRAM devices to support routing of thousands of signal traces. Thus, as with previous generations, HBM3 is a 2.5D/3D architecture.
  • Improves power efficiency by lowering the operating voltage to 1.1V and using 0.4V low swing signaling.
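
The bandwidth, channel, and capacity figures in the list above follow from simple arithmetic. The Python sketch below reproduces them. The helper names are illustrative and not part of any HBM tooling, and the per-die density is taken in gigabits, which is the reading under which a 12-high stack comes to 48 GB.

```python
# Back-of-the-envelope HBM3 figures. Helper names are illustrative only.

def device_bandwidth_gbytes(pin_rate_gbps: float, width_bits: int = 1024) -> float:
    """Peak bandwidth of one HBM device in GB/s: data rate x width / 8 bits per byte."""
    return pin_rate_gbps * width_bits / 8


def stack_capacity_gbytes(stack_height: int, die_density_gbit: int) -> float:
    """Capacity of one DRAM stack in GB, given the per-die density in Gb."""
    return stack_height * die_density_gbit / 8


def pseudo_channel_count(channels: int = 16, pseudo_per_channel: int = 2) -> int:
    """Number of pseudo channels exposed by one device."""
    return channels * pseudo_per_channel


if __name__ == "__main__":
    print(device_bandwidth_gbytes(6.4))    # 819.2 (GB/s)
    print(stack_capacity_gbytes(12, 32))   # 48.0 (GB)
    print(pseudo_channel_count())          # 32
    print(round((6.4 / 3.6 - 1) * 100))    # 78 (% increase over 3.6 Gb/s)
```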

Let’s wrap it all up in a potential use case. A future AI accelerator implementation includes six (6) HBM3 DRAM devices. Total aggregate memory bandwidth at 6.4 Gb/s is 4.9 terabytes per second (TB/s). Each 12-high stack of 32 Gb devices has a capacity of 48 GB, so the AI accelerator can access 288 GB of direct-attached HBM3 memory.
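
As a quick check on the use case above, the following sketch scales the per-device numbers to a whole accelerator. The function name and its defaults are hypothetical, chosen to mirror the figures in this article, with per-die density taken in gigabits.

```python
# Aggregate bandwidth and capacity for an accelerator with several HBM3 devices.
# accelerator_totals() is a hypothetical helper, not a vendor API.

def accelerator_totals(
    num_devices: int,
    pin_rate_gbps: float = 6.4,
    width_bits: int = 1024,
    stack_height: int = 12,
    die_density_gbit: int = 32,
) -> tuple[float, float]:
    """Return (total bandwidth in TB/s, total capacity in GB)."""
    per_device_bw_gbytes = pin_rate_gbps * width_bits / 8       # GB/s per device
    per_device_cap_gbytes = stack_height * die_density_gbit / 8  # GB per device
    total_bw_tbytes = num_devices * per_device_bw_gbytes / 1000
    total_cap_gbytes = num_devices * per_device_cap_gbytes
    return total_bw_tbytes, total_cap_gbytes


if __name__ == "__main__":
    bw, cap = accelerator_totals(6)
    print(f"{bw:.1f} TB/s, {cap:.0f} GB")  # 4.9 TB/s, 288 GB
```

Changing `num_devices` or `pin_rate_gbps` shows how the totals move with a different device count or data rate.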

That is a huge capacity. HBM3 extends the bandwidth performance established by what was originally called the “slow and wide” HBM memory architecture. While the interface is still wide, HBM3 running at 6.4 Gb/s is anything but slow. The motivation for the wide interface (which required the more complex 2.5D architecture) was to deliver high bandwidth at low power by operating at low data rates. All things being equal, however, higher speeds mean higher power. To compensate, HBM3 lowers the operating voltage (the last bullet in our list above) for higher power efficiency.
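
To give a rough feel for the speed-versus-voltage trade-off, here is a minimal sketch based on the first-order CMOS dynamic power model, in which power scales with voltage squared times switching rate. Both the model and the 1.2 V baseline for the previous generation are assumptions made for illustration, not figures from this article. The sketch ignores I/O swing, static power, and implementation details, so treat the output as directional only.

```python
# First-order estimate: dynamic power ~ C * V^2 * f (capacitance held constant).
# ASSUMPTION: 1.2 V baseline for the previous generation; 1.1 V is the HBM3 figure.

def relative_energy_per_bit(v_new: float, v_old: float) -> float:
    """Energy per bit scales with V^2 in this simplified model."""
    return (v_new / v_old) ** 2


def relative_power(v_new: float, v_old: float, rate_new: float, rate_old: float) -> float:
    """Relative dynamic power when both voltage and data rate change."""
    return relative_energy_per_bit(v_new, v_old) * (rate_new / rate_old)


if __name__ == "__main__":
    energy = relative_energy_per_bit(1.1, 1.2)
    power = relative_power(1.1, 1.2, 6.4, 3.6)
    print(f"energy per bit: {energy:.2f}x")  # 0.84x
    print(f"dynamic power:  {power:.2f}x")   # 1.49x for 1.78x the data rate
```

Under these assumptions, the lower voltage trims energy per bit by about 16%, so power grows more slowly than bandwidth.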

But there’s no free lunch: lower voltage means lower design margin for what is already a difficult 2.5D design. Luckily, Rambus has your back with our 8.4 Gb/s HBM3 memory subsystem, which offers plenty of design headroom and room to grow. To help you exploit the full potential of HBM3 memory, Rambus also offers interposer and package reference designs.

The Rambus memory subsystem includes a modular and highly configurable memory controller. The controller is optimized to maximize throughput and minimize latency, and its memory parameters are programmable at run time. With a pedigree of over 50 HBM2 and HBM2E customer implementations, it has proven its effectiveness across a wide variety of configurations and data traffic scenarios.

While the road to higher performance is a journey, not a destination, the latest generation of HBM promises to deliver some extraordinary capabilities. All hail the new king of memory bandwidth, HBM3.

Frank Ferro

Frank Ferro is Senior Product Marketing Manager for IP Cores at Rambus.

