

# Modeling and Analysis of Power Supply Noise Tolerance with Fine-grained GALS Adaptive Clocks

Divya Akella Kamakshi<sup>\*</sup>, Matthew Fojtik<sup>†</sup>, Brucek Khailany<sup>†</sup>, Sudhir Kudva<sup>†</sup>, Yaping Zhou<sup>†</sup>, Benton H. Calhoun<sup>\*</sup>

Robust Low Power VLSI

<sup>\*</sup>University of Virginia, <sup>†</sup>NVIDIA Corporation

#### Outline

- Problem: Power supply noise
- Existing solution
  - Traditional adaptive clocking scheme
  - Drawbacks
- Proposed solution: Fine-grained GALS adaptive clocking
- Quantification of benefits
  - Experimental setup
  - Simulation results
- Summary



Figure courtesy: <u>http://www.theregister.co.uk/2012/05/18/inside\_nvidia\_kepler2\_gk110\_gpu\_tesla/</u>, <u>http://www.ansys.com/Products/Electronics/Option-SIwave-PSI-Solver</u>



Parasitics  $\rightarrow$  power supply noise !!!

• Resistive component  $\rightarrow$  IR drop



• Reactive component  $\rightarrow$  $L\,di/dt$ droop (first, second, third droop, ...)





Supply noise  $\rightarrow$  timing errors  $\rightarrow$  performance degradation.

Operate at frequency that can handle worst-case noise !!!

### **Existing Solution: Traditional Adaptive Clocking**

Adaptive clocking [1][2]



Frequency tracks voltage !!!

### **Existing Solution: Traditional Adaptive Clocking**

Fixed clocking vs. adaptive clocking



# Metric: Uncompensated Voltage Noise (UVN)

#### **Uncompensated voltage noise: UVN = V<sub>mean</sub> – V<sub>req</sub>**

- *V<sub>mean</sub>*: available voltage, averaged over a clock cycle
- *V<sub>req</sub>*: required voltage (for operation of circuits at the required frequency)

When  $V_{mean} > V_{req}$, no problem !!!

When  $V_{mean} < V_{req}$  $\rightarrow$  additional margin needed for failure-free operation (UVN)

#### Lower UVN → lower margin

#### Lower UVN is better!!!
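As a rough illustration of the metric, the sketch below averages a sampled supply waveform over each clock cycle to get V<sub>mean</sub> and reports how far it falls below V<sub>req</sub>. The function name, the sample values, and the choice to report the shortfall as a non-negative number (zero when V<sub>mean</sub> is at or above V<sub>req</sub>) are assumptions made for this example, not details taken from the slides.

```python
def per_cycle_uvn(v_supply, samples_per_cycle, v_req):
    """Per-cycle shortfall (V) of the cycle-averaged supply below v_req.

    Assumption: the shortfall is reported as a non-negative magnitude and is
    0 whenever V_mean >= V_req ("no problem" case on the slide).
    """
    uvn = []
    last_start = len(v_supply) - samples_per_cycle
    for start in range(0, last_start + 1, samples_per_cycle):
        cycle = v_supply[start:start + samples_per_cycle]
        v_mean = sum(cycle) / len(cycle)      # V_mean: average over one clock cycle
        uvn.append(max(v_req - v_mean, 0.0))  # margin needed only when V_mean < V_req
    return uvn


# Hypothetical waveform: three clock cycles, four samples per cycle.
waveform = (
    [1.00, 1.00, 1.00, 1.00]    # nominal supply
    + [0.96, 0.94, 0.94, 0.96]  # droop, V_mean = 0.95 V
    + [1.02, 1.02, 1.02, 1.02]  # overshoot
)
shortfall = per_cycle_uvn(waveform, samples_per_cycle=4, v_req=1.0)
print([round(x * 1e3, 1) for x in shortfall], "mV per cycle")
```

Only the middle cycle needs extra margin here; the worst cycle sets the margin the design has to carry.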

### **Existing Solution: Traditional Adaptive Clocking**

Fixed clocking vs. adaptive clocking



Fixed clocking:

- Voltage droops: voltage available ($V_{mean}$)
- Voltage corresponding to fixed frequency ($V_{req}$)
- $UVN = V_{mean} - V_{req}$

Adaptive clocking:

- Voltage droops: voltage available ($V_{mean}$)
- Voltage corresponding to adaptive frequency ($V_{req}$)
- UVN = 0 (expected)

Adaptive clocking is a great solution !!!

- Large chips with a few clock domains
- But each clock domain is still many mm<sup>2</sup>



Drawback #1: Effect of Clock-tree Insertion Delay





Adaptive clock responds to supply noise

The stretched pulses take time $\Delta t$ (~1-2 ns) to reach the load

Higher UVN: additional margin for failure-free operation !!!
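A minimal sketch of why insertion delay leaves noise uncompensated, assuming a sinusoidal supply ripple: the clock arriving at the load was tuned to the voltage seen $\Delta t$ earlier, so the shortfall at time $t$ is $V(t - \Delta t) - V(t)$. The sinusoidal shape, the 100 mV amplitude, and the function name are assumptions for illustration; the 30 MHz noise frequency and the 0.3 ns and 1.5 ns delays are the values used elsewhere in this deck.

```python
import math


def worst_case_uvn(amplitude_v, noise_freq_hz, insertion_delay_s, steps=10_000):
    """Largest shortfall V(t - dt) - V(t) over one noise period, found by sampling.

    Assumption: supply ripple is -amplitude * sin(2*pi*f*t) around nominal.
    """
    period = 1.0 / noise_freq_hz
    omega = 2.0 * math.pi * noise_freq_hz
    worst = 0.0
    for k in range(steps):
        t = k * period / steps
        v_at_load = -amplitude_v * math.sin(omega * t)
        # Voltage the clock generator saw when it produced this clock edge.
        v_tracked = -amplitude_v * math.sin(omega * (t - insertion_delay_s))
        worst = max(worst, v_tracked - v_at_load)
    return worst


AMPLITUDE_V = 0.1  # hypothetical ripple amplitude
for delay_ns in (0.3, 1.5):
    uvn = worst_case_uvn(AMPLITUDE_V, 30e6, delay_ns * 1e-9)
    print(f"insertion delay {delay_ns} ns -> worst-case shortfall {uvn * 1e3:.1f} mV")
```

For this ripple model the sampled result matches the closed form $2A\sin(\pi f \Delta t)$, which grows almost linearly with $\Delta t$ when $f\,\Delta t$ is small.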

Drawback #2: Effect of Spatial Workload Variations

Current variations across chip  $\rightarrow$  Variations in voltage fluctuation across chip



Higher UVN, additional margin for failure-free operation !!!



- Larger clock domain area
  - Effect of clock-tree insertion delay is higher
  - Spatial difference in voltage fluctuations is higher
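The spatial effect can be sketched with a toy grid: a single adaptive clock can track only one voltage per domain, so any cell whose local voltage sits below the tracked value is left uncompensated. The voltages, the cell counts, and the assumption that the clock tracks the domain-average voltage are all hypothetical choices for this example.

```python
def domain_uvn(cell_voltages):
    """Shortfall between the voltage a domain's clock tracks and its worst cell.

    Assumption: the clock generator tracks the average voltage of its domain.
    """
    tracked = sum(cell_voltages) / len(cell_voltages)
    return max(tracked - min(cell_voltages), 0.0)


# Hypothetical local voltages: the bottom half of the die droops more.
top_half = [0.98] * 4
bottom_half = [0.92] * 4

single_domain = domain_uvn(top_half + bottom_half)                  # one large clock domain
split_domains = max(domain_uvn(top_half), domain_uvn(bottom_half))  # one clock per half

print(f"single domain: {single_domain * 1e3:.0f} mV")
print(f"two domains:   {split_domains * 1e3:.0f} mV")
```

Splitting the area into smaller domains lets each clock follow its own local voltage, which is the intuition behind the fine-grained scheme.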

# Proposed Solution: Fine-grained GALS Adaptive Clock



- Traditional adaptive clock: clock domain of many mm<sup>2</sup>
- Fine-grained GALS adaptive clock: clock domain as small as a mm<sup>2</sup>

# Proposed Solution: Fine-grained GALS Adaptive Clock



A) Asynchronous boundary crossing:

- Pausible bisynchronous FIFO design (B. Keller et al.)
- Easily integrated into standard tool flows
- Average latency of 1.34 cycles

B) Myriad local clocks:

- Ring oscillators: mW-range power

# Proposed Solution: Fine-grained GALS Adaptive Clock

- Smaller clock domain area
  - Lower insertion delay (a few hundred ps).
  - Lower variation in voltage fluctuation.



Fine-grained GALS adaptive clocks → lower UVN (lower margin)!!!











#### **Traditional Adaptive Clocking**

- Long clock-tree → up to 2 ns
- Set PDN area to many mm<sup>2</sup>

#### **Fine-grained GALS Adaptive Clocking**

- Short clock-tree → as low as 300 ps
- Set PDN area to just a mm<sup>2</sup>



# **Power Distribution Network**

#### Simple lumped PDN model



- Cannot model spatial voltage variations
- Need a distributed PDN model: Voltspot [3]
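For context on the lumped model, here is a small sketch of how a single package inductance and on-die decoupling capacitance set the PDN resonance, $f = 1/(2\pi\sqrt{LC})$. Treating the lumped PDN as one L and one C, and the 500 nF capacitance, are assumptions for illustration; the 30 MHz target is the resonance quoted in the workload slide.

```python
import math


def resonance_hz(inductance_h, capacitance_f):
    """Resonance frequency of a series L-C pair: 1 / (2*pi*sqrt(L*C))."""
    return 1.0 / (2.0 * math.pi * math.sqrt(inductance_h * capacitance_f))


def inductance_for(target_hz, capacitance_f):
    """Inductance that places the L-C resonance at target_hz."""
    return 1.0 / ((2.0 * math.pi * target_hz) ** 2 * capacitance_f)


C_DIE_F = 500e-9  # hypothetical on-die decoupling capacitance
L_PKG_H = inductance_for(30e6, C_DIE_F)

print(f"L = {L_PKG_H * 1e12:.1f} pH gives f = {resonance_hz(L_PKG_H, C_DIE_F) / 1e6:.1f} MHz")
```

A single L-C pair like this yields one voltage for the whole die, which is why it cannot express spatial voltage variations.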

# **Power Distribution Network**

#### Distributed PDN model using Voltspot



- Total chip area
- PDN divided into an array (47 x 47)



### **Adaptive Clock Generator**



- Verilog-A model
- Voltage averaged over a cycle : V<sub>mean</sub>
- Voltage vs. frequency (VF) curve

How is VF curve generated?

- Critical path: longest circuit path on an SoC
- Emulate critical path using a 45 nm PDK
- Simulate for max frequency vs. voltage → VF curve
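One way to use a VF curve inside a behavioral clock-generator model is as a lookup table with linear interpolation: look up the maximum frequency for a given V<sub>mean</sub>, or invert the table to get V<sub>req</sub> for a given frequency. The table below is hypothetical except for the 850 MHz at 1 V operating point, which matches the system frequency and supply voltage quoted in this deck; the function names are also invented for this sketch.

```python
# (voltage in V, max frequency in Hz); all points other than 1.00 V are hypothetical.
VF_CURVE = [(0.80, 550e6), (0.90, 700e6), (1.00, 850e6), (1.10, 1000e6)]


def interpolate(x, points):
    """Piecewise-linear interpolation over (x, y) points sorted by x, clamped at the ends."""
    if x <= points[0][0]:
        return points[0][1]
    if x >= points[-1][0]:
        return points[-1][1]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= x <= x1:
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    return points[-1][1]  # not reached for sorted input


def fmax_at(voltage_v):
    """Frequency the adaptive clock would generate for a given V_mean."""
    return interpolate(voltage_v, VF_CURVE)


def vreq_for(freq_hz):
    """Voltage required to sustain freq_hz (inverse lookup on the same curve)."""
    return interpolate(freq_hz, [(f, v) for v, f in VF_CURVE])


print(f"Fmax at 0.95 V: {fmax_at(0.95) / 1e6:.0f} MHz")
print(f"Vreq for 850 MHz: {vreq_for(850e6):.2f} V")
```

The inverse lookup works because the curve is monotonic: a higher voltage always permits a higher frequency.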



#### **Clock-tree**

- Global and local clock distribution
- Insertion delay vs. voltage





#### Workload

- Current switching activity  $\rightarrow$  voltage rail fluctuation
- Resonating current profiles have the worst effect on supply noise



- Frequency of interest: 10 to 40 MHz (resonance at 30 MHz)
- Current slew rate: 10 A to 90 A over 10 clock cycles
- System frequency: 850 MHz, supply voltage: 1 V
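The workload parameters above could be turned into a current waveform along the following lines. The trapezoidal shape (ramp up, hold high, ramp down, hold low, with equal high and low halves) is an assumption; the slides specify only the 10 A to 90 A swing, the 10-cycle ramp at 850 MHz, and the workload frequency range.

```python
F_CLK_HZ = 850e6               # system clock frequency
I_LOW_A, I_HIGH_A = 10.0, 90.0
RAMP_S = 10 / F_CLK_HZ         # 10 clock cycles, about 11.8 ns


def load_current(t_s, workload_hz):
    """Load current (A) at time t_s for a trapezoidal profile repeating at workload_hz."""
    period = 1.0 / workload_hz
    half = period / 2.0
    if RAMP_S > half:
        raise ValueError("ramp does not fit in half a workload period")
    phase = t_s % period
    if phase < RAMP_S:          # ramp up
        return I_LOW_A + (I_HIGH_A - I_LOW_A) * phase / RAMP_S
    if phase < half:            # hold high
        return I_HIGH_A
    if phase < half + RAMP_S:   # ramp down
        return I_HIGH_A - (I_HIGH_A - I_LOW_A) * (phase - half) / RAMP_S
    return I_LOW_A              # hold low


for t_ns in (0, 6, 15, 22, 32):
    print(f"t = {t_ns:2d} ns: {load_current(t_ns * 1e-9, 30e6):5.1f} A")
```

With these numbers the ramp fits inside half a period across the whole 10 to 40 MHz sweep, so the same function can be reused for each workload frequency.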

A. Effect of Clock-tree Insertion Delay

- Uniform current distribution throughout PDN area
- Sweep workload frequency: 10 to 40 MHz; insertion delay: 0.3 to 1.5 ns



B. Effect of Spatial Workload Variations



- Lower half : 80% of power (top half: 20%)
- Workload frequency: 30 MHz





- Traditional adaptive clocking: uncompensated voltage noise = 111 mV
- Fine-grained GALS adaptive clocking: uncompensated voltage noise = 33 mV

### Summary

- Model and analyze power supply noise tolerance
  - Traditional adaptive clocking
  - Fine-grained GALS adaptive clocking
- Effects of clock-tree insertion delay and spatial workload variations.
- UVN savings of ~78 mV
- Equivalent to power saving of ~15% for same performance (@1 V)
- Overheads
  - Myriad local clocks
  - Good candidates are digitally-controlled / ring oscillators
  - Only a few mW of power (<1%).
- Future work
  - Overall savings dependent on the GALS partition size.
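The ~78 mV and ~15% figures in the summary can be cross-checked with a short calculation. The sketch assumes that dynamic power scales with the square of the supply voltage at a fixed frequency, and that the reduction in margin translates directly into a lower supply voltage; both are simplifying assumptions rather than the authors' stated method.

```python
def dynamic_power_saving(v_nominal, margin_reduction_v):
    """Fractional dynamic power saving from lowering the supply by the reduced
    margin, assuming P is proportional to V^2 at constant frequency."""
    v_new = v_nominal - margin_reduction_v
    return 1.0 - (v_new / v_nominal) ** 2


uvn_traditional_v = 0.111  # traditional adaptive clocking
uvn_gals_v = 0.033         # fine-grained GALS adaptive clocking
uvn_saving_v = uvn_traditional_v - uvn_gals_v

saving = dynamic_power_saving(1.0, uvn_saving_v)
print(f"UVN saving: {uvn_saving_v * 1e3:.0f} mV")  # 78 mV
print(f"Power saving at 1 V: {saving:.1%}")        # 15.0%
```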

#### References

1. A. Grenat, S. Pant, R. Rachala, and S. Naffziger, "5.6 Adaptive clocking system for improved power efficiency in a 28nm x86-64 microprocessor," in 2014 IEEE International Solid-State Circuits Conference Digest of Technical Papers (ISSCC), pp. 106-107, 9-13 Feb. 2014.
2. N. Kurd, P. Mosalikanti, M. Neidengard, J. Douglas, and R. Kumar, "Next Generation Intel Core<sup>™</sup> Micro-Architecture (Nehalem) Clocking," in IEEE Journal of Solid-State Circuits, vol. 44, no. 4, pp. 1121-1129, April 2009.
3. R. Zhang, K. Wang, B. H. Meyer, M. R. Stan, and K. Skadron, "Architecture implications of pads as a scarce resource," in 2014 ACM/IEEE 41st International Symposium on Computer Architecture (ISCA), pp. 373-384, 14-18 June 2014.

#### **Thank You!**