Emulation-focused power analysis of SoC designs


What you will learn:

  • How FinFET technology has changed power-consumption analysis.
  • The steps involved in taking a hierarchical approach to power analysis.

Verification expert Lauro Rizzatti recently interviewed Jean-Marie Brunet, Senior Marketing Director, Scalable Verification Solutions Division (SVSD), Siemens EDA, about the importance of accurate power estimation and optimization for system-on-chip (SoC) designs.

What problem does the semiconductor industry face today with pre-silicon power estimation?

The problem is the gap between the dynamic power consumption estimated for an SoC design before silicon and the power actually dissipated by the manufactured SoC. In recent years, customers have noticed that when newly designed SoCs are installed in their target end products, the actual dynamic power consumption exceeds the estimate by an order of magnitude.

It has become essential to accurately predict actual power consumption while designing and verifying new SoCs.

The main cause of the gap is the shift from traditional planar semiconductor technology to FinFET technology. Historically, planar CMOS suffered from significant static (standby) leakage current. As processes moved to nodes below 32 nm, the standby current increased exponentially and became unmanageable. FinFET technology has significantly reduced static current but, unfortunately, did little to change the switching, or dynamic, current.

Can you expand on the dynamic power dissipation in FinFETs a bit?

Compared with planar devices, the FinFET transistor dramatically reduces leakage through a 3D structure. By raising the channel and wrapping the gate around it, the resulting structure gives the gate more effective control of the channel, which allows lower threshold and supply voltages (Fig. 1).

Figure 1. The chart highlights FinFET gate capacitance compared with planar processes. (Source: Cavium Networks)

In FinFETs, dynamic power makes up the bulk of total power dissipation. Because FinFETs also have higher pin capacitances than planar transistors, their dynamic power numbers are higher.
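Why higher capacitance raises dynamic power follows from the standard first-order CMOS switching-power model, P_dyn = α·C·V²·f, where α is the activity factor, C the switched capacitance, V the supply voltage, and f the clock frequency. The sketch below is an illustration added here, not a tool or method described in the interview, and all numeric values are hypothetical. It shows that dynamic power grows linearly with capacitance, while a lower supply voltage pulls it back down quadratically.

```python
def dynamic_power(alpha: float, cap_farads: float, vdd: float, freq_hz: float) -> float:
    """First-order CMOS switching power in watts: P = alpha * C * V^2 * f."""
    return alpha * cap_farads * vdd ** 2 * freq_hz


# Hypothetical values, for illustration only (not measured data).
ALPHA = 0.15   # fraction of the capacitance switched per cycle
FREQ = 2.0e9   # 2 GHz clock
VDD = 0.8      # supply voltage in volts

base = dynamic_power(ALPHA, 1.0e-9, VDD, FREQ)      # 1 nF of switched capacitance
more_cap = dynamic_power(ALPHA, 1.3e-9, VDD, FREQ)  # 30% more capacitance, same voltage
lower_v = dynamic_power(ALPHA, 1.3e-9, 0.7, FREQ)   # same extra capacitance at a lower VDD

print(f"baseline:          {base:.4f} W")
print(f"+30% capacitance:  {more_cap:.4f} W")
print(f"+30% C at 0.7 V:   {lower_v:.4f} W")
```

Because capacitance enters linearly and voltage enters squared, a process that raises capacitance can only hold dynamic power steady if the supply voltage drops enough to compensate.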

Designing with FinFET technology requires design rules that account for the requirements of the FinFET process. These rules constrain synthesis, placement, planning, and optimization, all of which affect design metrics.

RTL power analysis is now mandatory. It should start early in the design flow and be performed at every stage, alongside other design metrics such as performance and area. Examining the intersection of RTL, embedded software code, and layout is essential for identifying and debugging issues early in the design flow.

What other issues contribute to the gap between pre-silicon dynamic power estimates and actual silicon power?

Another important problem arises from the intrinsic limitations of the stimulus used to exercise the design under test (DUT) during pre-silicon verification.

Today, the electronics industry makes extensive use of benchmarks to assess the performance and power consumption of new designs. Different industry segments use different types of benchmarks.

In the mobile industry, a very popular benchmark called AnTuTu assesses the performance/power of smartphones and tablets. For GPU-centric designs, the most popular are Car Chase, Manhattan, and the other Kishonti benchmarks.

In the artificial intelligence/machine learning (AI/ML) industry, the MLPerf benchmark suite measures the performance/power of ML software frameworks, ML hardware accelerators, and ML cloud platforms. It is popular for both training and inference. In storage, IOPS (input/output operations per second) measurements provide an accurate assessment of the performance of new devices.

It is imperative to run these benchmarks during pre-silicon validation. With full visibility into the design, the team can identify areas of excessive power consumption long before silicon is fabricated and still make design corrections.

How is power consumption measured in pre-silicon validation?

Traditionally, power estimation has been performed at the gate level by tracking the switching activity of the DUT as it is exercised by testbenches made up of regression vectors. This approach has two problems.

First, gate-level analysis takes place very late in the design cycle. Although gate-level estimates come within about 5% of silicon, by then there is not enough flexibility left to fix problems in the design. A better compromise is to estimate dynamic power at RTL. The deviation from silicon grows to around 15%, but there is much more flexibility to accommodate design changes.

Second, testbench vectors are not a good representation of how the design will actually be used. To get an accurate power estimate, it is important to capture switching activity as accurately as possible in the context of the target system running real workloads and the performance/power benchmarks described earlier.
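Switching activity is usually summarized per net as a toggle count, or as an activity factor (toggles per clock-cycle transition) that feeds the α term of the standard P = α·C·V²·f dynamic-power model. The following is a minimal sketch, assuming a hypothetical trace format in which each clock cycle is a dict mapping net names to logic values; real flows read activity from waveform or activity files produced by simulation or emulation. The two traces are made up to show how strongly the stimulus shapes the measured activity.

```python
from collections import Counter
from typing import Dict, List

Trace = List[Dict[str, int]]  # hypothetical format: one {net_name: 0/1} snapshot per clock cycle


def toggle_counts(trace: Trace) -> Counter:
    """Count value changes (0->1 or 1->0) per net between consecutive cycles."""
    counts: Counter = Counter()
    for prev, curr in zip(trace, trace[1:]):
        for net, value in curr.items():
            if net in prev and prev[net] != value:
                counts[net] += 1
    return counts


def activity_factors(trace: Trace) -> Dict[str, float]:
    """Toggles per cycle transition for each net (0.0 for nets that never toggle)."""
    transitions = max(len(trace) - 1, 1)
    counts = toggle_counts(trace)
    nets = {net for cycle in trace for net in cycle}
    return {net: counts[net] / transitions for net in sorted(nets)}


# Hypothetical traces: a short directed regression test vs. a busier "real workload".
regression = [{"bus": 0, "alu": 0}, {"bus": 1, "alu": 0}, {"bus": 1, "alu": 0}, {"bus": 1, "alu": 0}]
workload = [{"bus": 0, "alu": 0}, {"bus": 1, "alu": 1}, {"bus": 0, "alu": 0}, {"bus": 1, "alu": 1}]

print(activity_factors(regression))  # bus toggles once in three transitions, alu never
print(activity_factors(workload))     # both nets toggle on every transition
```

The same logic yields very different activity factors under the two stimuli, which is why power estimates driven by vectors that do not reflect real usage can be misleading.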

What is the setup for performing power analysis, and how do you perform it?

RTL simulation alone can no longer handle this demanding work. What is needed is a hierarchical approach that starts at a high level of design abstraction and progresses in stages to RTL and gate level. No single tool can do all the work anymore. Instead, combining multiple tools, each with the best feature trade-offs for its stage, speeds up power estimation and optimization (see the table below).

Table. A hierarchical approach to power estimation and analysis is necessary to accelerate the process. (Source: Lauro Rizzatti)

In the first step, the entire DUT, described in C/C++ at a high level of abstraction, is quickly validated against the hardware/software specifications, and a very approximate power consumption is estimated.
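At this level of abstraction there are no nets to toggle, so one common way to get a rough number is an energy-per-event model: count high-level events (ALU operations, cache and memory accesses, and so on) and multiply each count by an assumed energy per event. The interview does not say how the approximate estimate is produced; the sketch below is only an assumed illustration, and the event names, energies, and counts are hypothetical.

```python
from typing import Dict


def estimate_average_power(event_counts: Dict[str, int],
                           energy_per_event_j: Dict[str, float],
                           runtime_s: float) -> float:
    """Rough average power (W) = total energy of all counted events / runtime."""
    total_energy_j = sum(count * energy_per_event_j[event]
                         for event, count in event_counts.items())
    return total_energy_j / runtime_s


# Hypothetical per-event energies (joules) and event counts from a C/C++ model run.
ENERGY = {"alu_op": 5e-12, "cache_access": 20e-12, "dram_access": 1e-9}
counts = {"alu_op": 2_000_000_000, "cache_access": 500_000_000, "dram_access": 10_000_000}

power_w = estimate_average_power(counts, ENERGY, runtime_s=1.0)
print(f"approximate average power: {power_w:.3f} W")
```

A model like this can be evaluated as fast as the C/C++ model runs, which is what makes it suitable for a first, coarse pass before moving to RTL.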

Next, power dissipation is analyzed in a hybrid configuration: one part of the design is described at a high level of abstraction, typically the processor cores and memories (for example, Arm Fast Models), and the rest of the design is at RTL. The high-level portion executes on a host server, the RTL executes on a hardware emulator, and the two are connected through a transaction-based interface.

While the emulator on its own runs at a low megahertz speed, the hybrid configuration can reach around 50 MHz, fast enough to quickly boot Android or Linux with all the underlying kernel code, as well as run benchmarks and real-life applications.
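A back-of-the-envelope calculation shows why this speed matters when running real software for billions of cycles. The 50 MHz hybrid speed comes from the text above; the 2 MHz RTL-only emulation speed and the 10-billion-cycle workload are assumptions chosen only for illustration.

```python
def wall_clock_seconds(cycles: float, clock_hz: float) -> float:
    """Time needed to execute a given number of design clock cycles."""
    return cycles / clock_hz


WORKLOAD_CYCLES = 10e9  # assumed: an OS boot plus a benchmark run
RTL_ONLY_HZ = 2e6       # assumed speed of a full-RTL emulation run
HYBRID_HZ = 50e6        # hybrid configuration speed cited in the text

for label, hz in [("RTL-only emulation", RTL_ONLY_HZ), ("hybrid", HYBRID_HZ)]:
    minutes = wall_clock_seconds(WORKLOAD_CYCLES, hz) / 60
    print(f"{label}: {minutes:.1f} minutes")
```

Under these assumptions, the hybrid setup turns a run of more than an hour into a few minutes, which is what makes iterating on full software workloads practical.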

This setup provides a head start on profiling the entire design for power consumption in a relatively short time. By plotting switching activity over long runs of billions of clock cycles, the design team can identify high- and low-power-dissipation hotspots within ranges of a few million clock cycles. Likewise, by tiling the areas of power dissipation into an activity map, the team can visually identify the design sections with high and low power dissipation.
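Locating hotspots in a long run amounts to aggregating per-cycle activity into fixed-size windows and ranking the windows. The sketch below assumes a hypothetical input, a list of total toggle counts per cycle, and uses four-cycle windows so the example stays readable; in practice the windows would span millions of cycles.

```python
from typing import List, Tuple


def window_totals(activity: List[int], window: int) -> List[int]:
    """Sum per-cycle activity into consecutive, non-overlapping windows."""
    return [sum(activity[i:i + window]) for i in range(0, len(activity), window)]


def hottest_windows(activity: List[int], window: int, top_n: int) -> List[Tuple[int, int, int]]:
    """Return (start_cycle, end_cycle_exclusive, total) for the top_n busiest windows."""
    totals = window_totals(activity, window)
    ranked = sorted(range(len(totals)), key=lambda k: totals[k], reverse=True)[:top_n]
    return [(k * window, min((k + 1) * window, len(activity)), totals[k]) for k in ranked]


# Hypothetical toggle counts per cycle; a burst of activity sits at cycles 8-11.
activity = [3, 2, 4, 3, 2, 3, 2, 4, 40, 38, 42, 39, 3, 2, 3, 4]
print(hottest_windows(activity, window=4, top_n=1))
```

The same windowing applied per design block instead of per run, with the totals arranged in a grid, gives the kind of activity map the text describes.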

Once the hotspots and critical tiles are located, the team can move to the full RTL and get precise, detailed visibility into every net in the design. By correlating the activity plot with the embedded software code, and the activity map with the RTL code, the team can quickly zoom in on potential power issues.
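Correlating the activity plot with the embedded software can be as simple as mapping a hot cycle window back to the software routines that were executing at the time. This sketch assumes a hypothetical software trace of (start_cycle, function_name) markers sorted by cycle, with each function running until the next marker; it illustrates the idea only and does not describe any specific tool's correlation feature.

```python
import bisect
from typing import List, Tuple


def functions_in_window(markers: List[Tuple[int, str]], start: int, end: int) -> List[str]:
    """Names of functions active at any point in the cycle range [start, end).

    markers: hypothetical (start_cycle, function_name) pairs sorted by start_cycle;
    each function is assumed to run until the next marker begins.
    """
    starts = [cycle for cycle, _ in markers]
    # The function already running at `start`, plus every function starting inside the window.
    first = max(bisect.bisect_right(starts, start) - 1, 0)
    last = bisect.bisect_left(starts, end)
    return [name for _, name in markers[first:last]]


# Hypothetical software trace markers.
markers = [(0, "boot"), (5_000_000, "idle_loop"), (9_000_000, "gpu_render"), (12_000_000, "idle_loop")]
print(functions_in_window(markers, 8_000_000, 12_000_000))
```

Binary search keeps the lookup cheap even when the trace holds millions of markers from a long workload run.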

It is extremely important to capture the full design activity over the entire workload and to avoid sampling, which is typically required on FPGA-based platforms because they lack full internal visibility (Fig. 2).
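A toy calculation shows why sampling is risky: a short burst of activity that falls between sampling points never appears in the sampled profile, while a full capture sees it. The activity values and sampling interval below are hypothetical.

```python
from typing import List


def peak_full(activity: List[int]) -> int:
    """Peak activity when every cycle is captured."""
    return max(activity)


def peak_sampled(activity: List[int], interval: int, offset: int = 0) -> int:
    """Peak activity seen when only every `interval`-th cycle (from `offset`) is captured."""
    return max(activity[offset::interval])


# Hypothetical per-cycle toggle counts: a 3-cycle burst at cycles 41-43.
activity = [5] * 100
activity[41:44] = [90, 95, 88]

print("full capture peak:", peak_full(activity))
print("sampled peak (every 10th cycle):", peak_sampled(activity, interval=10))
```

Whether the sampled profile catches the burst depends entirely on where the sampling points happen to fall, which is exactly the uncertainty that full capture removes.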

Figure 2. Power tools can track power trends with activity maps and plots. (Source: Mentor, a Siemens Business)

It's worth mentioning that one big semiconductor company changed its mind about early power profiling at RTL after watching the Angry Birds benchmark execute on one of its SoCs running on an emulator. I couldn't help laughing at the thought that my daughter has fun playing Angry Birds on her iPod, while this big semiconductor company was running the same program on an emulator.

What changes do you anticipate next?

One innovative area of design that is very complex to manage at the pre-silicon stage involves chiplets, die stacking, and 3D IC packaging.

My earlier discussion of power profiling and analysis assumed a monolithic design, where all components are combined on a single die. What we look at next are designs implemented in a complex 3D IC package. In many of these designs, the CPU cores are on one die, the GPU cores on another, the memories on a third, and so on, and they communicate with each other through a substrate or an embedded multi-die interconnect bridge (EMIB) (Fig. 3).

Figure 3. An embedded multi-die interconnect bridge (EMIB) enables communication between CPU cores on one die, GPU cores on another, and memories on a third. (Source: Intel)

Performing power profiling and analysis, as well as thermal analysis, on a design's hardware hierarchy and its configurable embedded software stack when they are spread across multiple dies is complex and difficult.

What is needed is a modular, hierarchical compilation of the complete design targeting a specific hardware-emulation platform, together with the ability to navigate, identify, and debug hardware/software activity across the design hierarchy.
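Navigating power across a multi-die hierarchy relies on rolling leaf-level estimates up through each level (package, die, block) so that a hotspot can be traced from the top down. The sketch below assumes a hypothetical tree structure and made-up per-block numbers; it illustrates the roll-up idea only, not how any emulation platform compiles or partitions a 3D IC.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """One level of a hypothetical design hierarchy (package, die, or block)."""
    name: str
    power_w: float = 0.0                                  # used only for leaf blocks
    children: List["Node"] = field(default_factory=list)

    def total(self) -> float:
        """Leaf power, or the sum of all descendants' power for a parent node."""
        if not self.children:
            return self.power_w
        return sum(child.total() for child in self.children)

    def hottest_path(self) -> List[str]:
        """Follow the highest-power child at each level, from the top down."""
        path = [self.name]
        node: Node = self
        while node.children:
            node = max(node.children, key=lambda child: child.total())
            path.append(node.name)
        return path


# Hypothetical three-die package with made-up block-level estimates (watts).
package = Node("package", children=[
    Node("cpu_die", children=[Node("core0", 1.2), Node("core1", 0.9)]),
    Node("gpu_die", children=[Node("shader_array", 2.5), Node("raster", 0.6)]),
    Node("memory_die", children=[Node("hbm_stack", 1.1)]),
])

print(f"package total: {package.total():.1f} W")
print(" -> ".join(package.hottest_path()))
```

Keeping each die as its own subtree mirrors the modular organization the text calls for: each die's totals can be inspected in isolation or in the context of the whole package.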

