

### An Energy-Efficient Near/Sub-Threshold FPGA Interconnect Architecture Using Dynamic Voltage Scaling and Power-Gating

He Qi, Oluseyi Ayorinde, and Benton H. Calhoun

Charles L. Brown Department of Electrical and Computer Engineering, University of Virginia, Charlottesville, Virginia

{hq5tj, oaa4bj, bhc2bi}@virginia.edu

Robust Low Power VLSI

### Motivation

By 2020, more than 50 billion electronic devices, or 6.58 per person, will be connected to the internet.



Source: Evans, Dave. "The internet of things: How the next evolution of the internet is changing everything." CISCO white paper 1 (2011): 1-11.

The majority of these electronic devices will be low-power sensors for ubiquitous computing.

- Health Sensors
- Environmental Sensors



https://encrypted-tbn1.gstatic.com/images?q=tbn:ANd9G cQCNBXqldhnsSVYp4y1A4MnKeoVOVfLomVOqtyQT-wQRZij\_sy7



http://www.valencell.com/blog/2013/12/wearable-technology-all-about-people

## Motivation

#### **Requirements on Hardware**

- Low Power/Energy Consumption
- Substantial Processing Capability
- Flexible Hardware
- Low Development and Deployment Cost

**FPGAs meet all of these requirements.**

[Figure: power versus performance]

The power of existing LP FPGAs exceeds the energy budget of sensor applications.

#### Solution

• ULP FPGA operating in sub/near-threshold

### Background

#### FPGA Energy Breakdown

[Figure: pie chart of FPGA energy split among logic, clock, and interconnect; surviving labels: 9%, 21%]

- The interconnect dominates FPGA delay and energy.
- To reduce energy, our prior work proposed a low-swing interconnect that removes buffers and properly sizes the circuits for near/sub-threshold operation.

#### Low-Swing Interconnect



Our low-swing interconnect consumes 42.7% less energy than a traditional unidirectional interconnect at 0.4V.

However, energy waste still exists in the low-swing interconnect.

### Problems

#### **Energy Waste in Low-Swing Interconnect**

• Energy Waste #1: Powering circuits on non-critical paths from the same supply voltage as circuits on critical paths wastes energy.



### Observations

- The delay of the non-critical paths is smaller than it needs to be. Reducing the supply voltage of circuits on non-critical paths saves energy without affecting the overall FPGA speed.
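The observation above can be read as a slack check: a path may move to the lower supply only if its delay at that supply still fits within the critical path delay. The Python sketch below is illustrative only. The function name `assign_supplies`, the path names, and all delay values are hypothetical and are not taken from the authors' tool flow.

```python
def assign_supplies(paths, critical_delay_us):
    """Choose a supply rail for each path.

    paths maps a path name to a (delay_at_vddh_us, delay_at_vddl_us) pair.
    A path goes to VDDL only if its slower VDDL delay still meets the
    critical path delay, so the overall FPGA speed is unchanged.
    """
    assignment = {}
    for name, (delay_h, delay_l) in paths.items():
        if delay_l <= critical_delay_us:
            assignment[name] = "VDDL"
        else:
            assignment[name] = "VDDH"
    return assignment


# Hypothetical delays in microseconds (not measured data).
example_paths = {
    "critical": (0.40, 0.65),
    "near_critical": (0.35, 0.55),
    "short": (0.10, 0.18),
}
# The critical path delay is the largest delay with every path on VDDH.
critical_delay = max(d_h for d_h, _ in example_paths.values())
result = assign_supplies(example_paths, critical_delay)
print(result)
```

This sketch treats paths as independent. A real flow would also have to handle interconnect segments shared between a critical and a non-critical path, which is not modeled here.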

### Problems

#### **Energy Waste in Low-Swing Interconnect**

• Energy Waste #2: Idle interconnect resources consume substantial leakage energy, especially in the sub-threshold region.



[Figure: energy distribution at VDD = 0.6V]

#### **Observations**

- For the benchmarks shown, over 40% of the total FPGA energy is wasted as idle-circuit leakage.
- The idle circuit leakage energy mostly comes from configuration bitcells.

## Problems

### **Typical Solutions**

- **Dual-VDD**: apply a lower VDD to the circuits on non-critical paths
- **Power-Gating**: cut off the connections between the idle circuits and supply voltages using headers.

### However

- Due to the large area overhead, no existing work has applied dual-VDD to the traditional interconnect.
- No existing work has applied power-gating to configuration bitcells.



### Contributions

#### Contributions

- We applied the dual-VDD technique to the low-swing FPGA interconnect at near/sub-threshold.
- We applied the power-gating technique to the idle configuration bitcells.
- We developed a new dynamic voltage scaling (DVS) architecture for the low-swing interconnect.
- We designed a power management unit that enables dual-VDD and DVS.

#### Tasks

- SPICE Simulation
- Energy Saving Evaluation
- Overhead Evaluation
- Tool Development
- Chip Measurement of a Custom 512-LUT FPGA

### **Proposed Architecture**



- VDDH and VDDL are generated by an LDO and, together with the headers, are used to perform dual-VDD and power-gating.
- VDDC is generated by delay-chain-based control logic to perform DVS.
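One way to picture how headers can serve both dual-VDD and power-gating is to treat each interconnect block as having one header per rail. The Python sketch below assumes exactly that arrangement. The names `Rail`, `select_rail`, and `header_enables`, and the two-header-per-block structure, are assumptions for illustration rather than details confirmed by the slides.

```python
from enum import Enum


class Rail(Enum):
    VDDH = "VDDH"
    VDDL = "VDDL"
    OFF = "OFF"  # both headers open: the block is power-gated


def select_rail(is_used, is_critical):
    """Idle blocks are gated; used blocks get VDDH only on critical paths."""
    if not is_used:
        return Rail.OFF
    return Rail.VDDH if is_critical else Rail.VDDL


def header_enables(rail):
    """Return (vddh_header_on, vddl_header_on) for the selected rail.

    At most one header conducts at a time, so the two rails are never
    connected to each other through a block.
    """
    return (rail is Rail.VDDH, rail is Rail.VDDL)


for used, critical in [(False, False), (True, False), (True, True)]:
    rail = select_rail(used, critical)
    print(rail.value, header_enables(rail))
```

The point of the sketch is that a single pair of header controls covers all three cases: critical, non-critical, and idle.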

### **Proposed Architecture**

Details of the delay-chain-based control logic



## Methodology



### **Results --- Dual-VDD**



#### Observations

\* VRO: the energy overhead of the voltage regulator

- The energy-optimal VDDL is 0.1V lower than VDDH.
- The energy reduction from dual-VDD is about 20% on average, but drops to about 10% once voltage regulator overhead is included.
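As a rough, first-order illustration of why a lower VDDL saves energy, assume that the switching energy of a circuit scales as $C V^2$. This is a textbook approximation that ignores leakage, level conversion, and the low-swing behavior of this interconnect. Taking VDDH = 0.6V and VDDL 0.1V below it:

$$
\frac{E_{VDDL}}{E_{VDDH}} \approx \left(\frac{V_{DDL}}{V_{DDH}}\right)^2 = \left(\frac{0.5}{0.6}\right)^2 \approx 0.69
$$

Under that assumption, a circuit moved to VDDL would use about 31% less switching energy. This is consistent with smaller savings for the full FPGA, since only circuits on non-critical paths move to VDDL.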

### **Results --- Dual-VDD & Power-Gating**



#### **Observations**

- With coarse-grained power-gating and dual-VDD combined, and with voltage regulator overhead included, the energy reduction reaches 17.5 ~ 21.9%. With fine-grained power-gating, the energy reduction reaches 43.7 ~ 62.2%.
- Measurements of a custom 512-LUT FPGA show a 91.1% leakage energy reduction from coarse-grained power-gating alone.

### **Results --- DVS**

[Figure: ED-curves of the FPGA when using DVS ($V_{DD} = 0.6V$) for apex2 without repeaters (w/o R) and with repeaters (w/ R); x-axis: Delay (us), y-axis: Energy/Op (pJ). Points are labeled with $V_{DDC} - V_{DD}$, starting from $V_{DDC} = V_{DD}$. Annotated endpoints: Delay 0.22us, Energy/Op 35.7pJ; Delay 0.43us, Energy/Op 21.9pJ.]

\* R: repeater

#### Observations

For apex2 at VDD = 0.6V, sweeping VDDC from VDD to 0.7V above VDD adjusts the critical path delay over the range 0.22us ~ 0.43us and the total FPGA energy per operation over the range 21.9pJ ~ 35.7pJ.
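The two endpoints of this sweep can be turned into a speed range and a speed/energy trade-off with simple arithmetic. The Python sketch below does this under the assumption that one operation completes per critical path delay, with no extra timing margin. The helper name `max_frequency_mhz` is hypothetical.

```python
def max_frequency_mhz(critical_path_delay_us):
    """Upper bound on operating frequency: one operation per critical path delay."""
    return 1.0 / critical_path_delay_us  # 1/us is MHz


# Endpoints reported for apex2 at VDD = 0.6V.
fast = {"delay_us": 0.22, "energy_pj": 35.7}
slow = {"delay_us": 0.43, "energy_pj": 21.9}

for name, point in (("fast", fast), ("slow", slow)):
    f_mhz = max_frequency_mhz(point["delay_us"])
    edp = point["delay_us"] * point["energy_pj"]  # energy-delay product, pJ*us
    print(f"{name}: {f_mhz:.2f} MHz, EDP = {edp:.2f} pJ*us")

speedup = slow["delay_us"] / fast["delay_us"]
energy_cost = fast["energy_pj"] / slow["energy_pj"]
print(f"{speedup:.2f}x faster for {energy_cost:.2f}x energy per operation")
```

Under this assumption the sweep spans roughly 2.3 MHz to 4.5 MHz, and the fast end roughly halves the delay at about 1.6 times the energy per operation.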

### Conclusions

#### Contributions

- We applied the dual-VDD technique to the low-swing FPGA interconnect at near/sub-threshold, with tool support.
- We applied the power-gating technique to the idle configuration bitcells.
- We developed a new dynamic voltage scaling (DVS) architecture for the low-swing interconnect.
- We designed a power management unit that enables dual-VDD and DVS.

#### Limitations & Future work

- **Dual-VDD:** We have not yet developed a tool for configuring dual-VDD on chips, and we have no measurement results for dual-VDD so far.
- **Power-Gating:** We have not optimized the layout of switch boxes that use fine-grained power-gating.
- **Benchmarks:** We have not evaluated the proposed architecture using IoT applications.

# Thank you! Questions?

# **Backup Slides**

### **Noise & Crosstalk**


| Metric | Worst Case Crosstalk | No Crosstalk |
|---|---|---|
| Critical Path Delay (us) | 0.23 | 0.14 |
| Energy Reduction of the Full FPGA when Using Dual-VDD (%) | 9.8 | 11.0 |
| Sensitivity of Critical Path Delay to VDDH & VDDL Noise (%/10mV) | +2.1 | +3.1 |
| Sensitivity of Full FPGA Energy to VDDH & VDDL Noise (%/10mV) | -3.2 | +0.4 |
| Sensitivity of Critical Path Delay to VDDC Noise (%/10mV) | +1.3 | +0.9 |
| Sensitivity of Full FPGA Energy to VDDC Noise (%/10mV) | +0.9 | +0.7 |

### **Benchmark Characterization**


| Benchmark | LUT Count | FF Count | I/O Count |
|-----------|-----------|---------------|-----------|
| alu4 | 1522 | Combinational | 22 |
| apex2 | 1878 | Combinational | 41 |
| apex4 | 1262 | Combinational | 28 |
| des | 1591 | Combinational | 501 |
| ex5p | 1064 | Combinational | 71 |
## **Comparisons with Prior Art**


| Specs | [6] | [7] | [5] | This work |
|---|---|---|---|---|
| VDDH/VDDL (V) | 1.1/0.9 | 1.8/1.26 ~ 1.57 | 1.3/0.8 ~ 1.0 | 0.6/0.45 ~ 0.6 |
| Interconnect type | Unidirectional | Unidirectional | Bidirectional | Unidirectional low-swing |
| Relative interconnect energy at the same VDD and technology node (x) | 1 | 1.47 | 1.39 | 0.64 ~ 0.86 |
| Adjustable speed range using DVS (MHz) | DVS not supported | Not provided | DVS not supported | 2.3 ~ 7.1 |
| Adjustable energy range using DVS (pJ/Op) | DVS not supported | Not provided | DVS not supported | 5.5 ~ 35.7 |