Power analysis focused on emulation of SoC designs


Verification expert Lauro Rizzatti recently interviewed Jean-Marie Brunet, Senior Marketing Director, Scalable Verification Solutions Division (SVSD), Siemens EDA, on the importance of accurate power estimation and optimization for system-on-chip (SoC) designs.

What is the problem facing the semiconductor industry today regarding pre-silicon power estimation?

The problem is the discrepancy between the estimated pre-silicon dynamic power consumption in SoC designs and the actual power dissipated by the fabricated SoC. Over the past few years, customers have noticed that when newly designed SoCs are plugged into the sockets of end products, the actual dynamic power consumption exceeds the estimated power by an order of magnitude.

It has become essential to accurately predict actual power consumption when designing and verifying new designs.

The main cause of this gap is the shift from traditional planar CMOS technology to finFET semiconductor technology. Historically, traditional CMOS technology suffered from significant current leakage in standby or static mode. Moving to lower nodes, below 32 nm, the idle current increased exponentially and became unmanageable. FinFET technology has greatly reduced static current. Unfortunately, this did not significantly change the switching or dynamic current.

Can you elaborate a bit on dynamic power dissipation in finFETs?

The finFET transistor significantly mitigates power leakage from planar devices via a 3D approach. By raising the channel and wrapping the gate around it, the resulting structure provides more efficient channel control that lowers threshold and supply voltages (Figure 1).

Fig. 1: The graph highlights FinFET gate capability versus planar processes. (Source: Cavium Networks)

In finFETs, dynamic power consumption accounts for most of the total power dissipation due to higher pin capacitances compared to planar transistors. This results in higher dynamic power numbers.
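To make the dependence on capacitance concrete, the textbook first-order model of CMOS switching power is P = alpha * C * V^2 * f, where alpha is the activity factor, C the switched capacitance, V the supply voltage, and f the clock frequency. The Python sketch below only illustrates that relationship. The function name and every number in it are hypothetical; none of them come from the interview or from finFET process data.

```python
def dynamic_power_watts(activity_factor, switched_capacitance_farads,
                        supply_volts, clock_hz):
    """First-order CMOS switching power: P = alpha * C * V^2 * f."""
    return (activity_factor * switched_capacitance_farads
            * supply_volts ** 2 * clock_hz)


# Made-up values, chosen only to show how the terms interact.
baseline = dynamic_power_watts(0.10, 1.0e-9, 0.8, 1.0e9)
# Same activity, voltage, and frequency, but 30% more switched capacitance.
higher_cap = dynamic_power_watts(0.10, 1.3e-9, 0.8, 1.0e9)

print(f"baseline:   {baseline:.4f} W")
print(f"higher cap: {higher_cap:.4f} W")
print(f"ratio:      {higher_cap / baseline:.2f}")
```

In this model, power scales linearly with capacitance and activity but quadratically with supply voltage, which is why both the stimulus (activity) and the process (capacitance, voltage) matter for the estimate.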

Designing with finFET technology requires more stringent design rules that account for finFET process requirements. The new rules constrain synthesis, placement, floorplanning, and optimization, which in turn affects design metrics.

Power analysis at RTL is now mandatory. It should start early and be performed at every stage of the design flow, alongside other design metrics such as performance and area. Cross-analysis between RTL, embedded software code, and layout is essential for identifying and debugging issues early in the design flow.

What other issues are contributing to the discrepancy in estimated dynamic power consumption before silicon?

Another significant issue stems from the intrinsic limitations of the stimulus exercising the design under test (DUT) when verifying the pre-silicon design.

Today, the electronics industry makes extensive use of benchmarks to evaluate the performance and power consumption of new designs. Different industry segments use different types of benchmarks.

In the mobile industry, a very popular benchmark called AnTuTu evaluates the performance/power of smartphones and tablets. For GPU-centric designs, the most popular are Car Chase, Manhattan, and other Kishonti benchmarks.

In the Artificial Intelligence/Machine Learning (AI/ML) industry, the MLPerf benchmark suite measures the performance/power of ML software frameworks, ML hardware accelerators, and ML cloud platforms. It is popular for both training and inference. In storage, IOPS (input/output operations per second) measurements provide an accurate performance assessment of new devices.

It is imperative to run these benchmarks in pre-silicon validation. Full design visibility can identify areas of excessive power consumption long before silicon is fabricated and enable design corrections.

How do you measure power consumption in pre-silicon validation?

Traditionally, power consumption has been estimated at the gate level by tracking the DUT switching activity exercised by testbenches made up of regression vectors. This approach has two problems.

First, testing takes place very late in the design cycle. Although the gap with silicon is only 5%, there is not enough flexibility to correct the problem in the design. A better trade-off is to evaluate dynamic power consumption at RTL, which results in a larger deviation from silicon on the order of 15%, but provides more flexibility to accommodate design changes.

Second, testbench vectors are not a good representation of how the design is going to be used. To get an accurate power estimate, it is important to capture switching activity as accurately as possible in the context of the target system running real-world workloads and the performance/power benchmarks described earlier.

What is the setup to perform a power analysis and how do I achieve it?

Clearly, RTL simulation alone can no longer handle this demanding workload. What is needed is a hierarchical approach, starting at a high level of design abstraction and moving in stages to RTL and the gate level. No single tool can do all the work anymore. Instead, multiple tools with optimal feature trade-offs can speed up power estimation and optimization (see Table 1 below).

Table 1: A hierarchical approach to power estimation and analysis is needed to speed up the process. (Source: Lauro Rizzatti)

In the first step, the entire DUT, described in high-level C/C++, is quickly validated against the hardware/software specifications, and a very rough power estimate is produced.

Next, power dissipation is validated in a hybrid configuration in which one part of the design is described at a high level of abstraction, typically the processing cores and memories (for example, Arm Fast Models), and the other part is at RTL. The high-level portion runs on a host server, the RTL runs on a hardware emulator, and the two are connected through a transaction-based interface.

While the emulator alone runs at very low megahertz speeds, the hybrid setup can reach speeds of around 50 MHz, fast enough to quickly boot Android or Linux, including the underlying kernel, and to run benchmarks and real apps.

The configuration provides a head start in profiling the entire design for power consumption in a relatively short time. By plotting switching activity over a long period of billions of clock cycles, the design team can identify high and low power dissipation hotspots in ranges of a few million clock cycles. Similarly, by tiling the power dissipation areas in an activity map, the team can visually identify high and low power dissipation design sections.
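The windowed hotspot search described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the per-cycle toggle counts, the window size, and the threshold rule (a window is a hotspot when its activity exceeds a multiple of the mean window activity) are all hypothetical choices, not the method used by any particular tool.

```python
def window_activity(toggles_per_cycle, window):
    """Sum toggle counts over consecutive, non-overlapping cycle windows."""
    return [sum(toggles_per_cycle[i:i + window])
            for i in range(0, len(toggles_per_cycle), window)]


def find_hotspots(toggles_per_cycle, window, threshold_ratio=1.5):
    """Return (start_cycle, total_toggles) for windows well above the mean.

    threshold_ratio is an assumed tuning knob: a window counts as a hotspot
    when its total exceeds threshold_ratio times the mean window total.
    """
    totals = window_activity(toggles_per_cycle, window)
    if not totals:
        return []
    mean = sum(totals) / len(totals)
    return [(index * window, total)
            for index, total in enumerate(totals)
            if total > threshold_ratio * mean]


# Tiny made-up trace: a burst of activity in cycles 8..11.
trace = [1] * 8 + [10] * 4 + [1] * 4
print(find_hotspots(trace, window=4))
```

In practice the trace would span billions of cycles and the window a few million, but the principle is the same: reduce the raw activity to a coarse profile first, then look closely only at the windows that stand out.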

Once hotspots and critical tiles are identified, the team can move to full RTL and gain precise, detailed visibility into every net in the design. By correlating the activity plot with the embedded software code, and the activity map with the RTL code, the team can quickly zoom in on areas of potential power issues.

It is extremely important to capture complete design activity for all workload processing and to avoid sampling, which is typically done with FPGA-based platforms that lack complete internal visibility (Figure 2).
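A toy Python example shows why sampling is risky. It assumes a deliberately simplified model of sampling, in which only every Nth activity window is observed. The window totals and function names are hypothetical and do not represent how any specific platform samples.

```python
def peak_activity(window_totals):
    """Highest activity seen across the observed windows."""
    return max(window_totals)


def sample_windows(window_totals, every, offset=0):
    """Keep only every `every`-th window, starting at `offset`."""
    return window_totals[offset::every]


# Made-up per-window toggle totals with one short burst in the third window.
full_capture = [4, 4, 40, 4, 4, 4]
sampled = sample_windows(full_capture, every=3)

print("peak with full capture:", peak_activity(full_capture))
print("peak with sampling:    ", peak_activity(sampled))
```

With these numbers the sampled view never observes the burst, so the reported peak is the quiet baseline. A different offset could happen to catch it, which is exactly the problem: the result depends on luck rather than on the workload.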

Fig. 2: Power tools can track power trend analysis with maps and activity plots. (Source: Siemens EDA)

It’s worth mentioning that a major semiconductor house changed its mind about early power profiling at RTL after witnessing the “Angry Birds” benchmark running on one of its SoC designs in an emulator. I had to laugh thinking that my daughter is having fun playing “Angry Birds” on her iPad, and this big semiconductor company is running the same program on an emulator.

What developments do you anticipate next?

A new design aspect that is very complex to manage in the pre-silicon stage concerns chiplets, die stacking, and the packaging of 3D integrated circuits.

My previous discussion of power profiling and analysis was based on a monolithic design where all components are combined on a single die. What we look at next are designs implemented in a complex 3D IC package. In many of these designs, the CPU cores are on one die, the GPU cores on another, the memories on a third, and so on, and they communicate with each other through an interconnect substrate or an embedded multi-die interconnect bridge (EMIB) (Figure 3).

Fig. 3: An embedded multi-die interconnect bridge (EMIB) enables communication between CPU cores on one die, GPU cores on another, and memories on a third. (Source: Intel)

Performing power profiling and analysis, as well as thermal analysis, on a design whose hardware hierarchy and configurable embedded software stack are distributed across multiple dies is complex and challenging.

We need to devise a modular, hierarchical build of a complete design targeting a specific hardware emulation platform, and develop the ability to browse, identify, and debug hardware/software-based activity across the design hierarchy.

