# A Sub-nW 93% Peak Efficiency Buck Converter With Wide Dynamic Range, Fast DVFS, and Asynchronous Load-Transient Control

Xinjian Liu<sup>®</sup>, *Graduate Student Member, IEEE*, Benton H. Calhoun<sup>®</sup>, *Fellow, IEEE*, and Shuo Li<sup>®</sup>, *Member, IEEE* 

Abstract—This article presents a highly efficient buck converter with sub-nW quiescent power and wide dynamic range for ultra-low-power (ULP) Internet-of-Things (IoT) systems-on-chip (SoCs). To optimize the SoC power consumption and performance, this buck converter supports fast dynamic voltage and frequency scaling (DVFS) and fast load-transient response (FLTR) using an asynchronous control scheme. To achieve robust and high-efficiency power delivery over process, voltage, and temperature variations, an adaptive deadtime controller (ADTC) is proposed with minimized area and power overhead. The power stage and gate drivers are optimized by a length split technique and a strong-up weak-down (SuWd) scheme to achieve low quiescent power. In addition, the buck converter is fully self-contained with a bias generator (BG), clock, and power-on-reset (PoR) integrated on-chip. Fabricated in 65-nm CMOS, the buck converter achieves a measured 802-pW quiescent power and 93% peak efficiency at 1.5-V input voltage. The measured dynamic range is from 0.5 nW to 2.75 mW, which is over six orders of magnitude. The measured voltage droop is 56 mV for a 45-nA-to-1-mA load current step, and DVFS up- and down-tracking take 10.57 and 19.81 $\mu$s for 88- and 92-mV reference steps, respectively. This sub-nW buck converter integrates fast DVFS and FLTR features with a wide dynamic range, making it suitable for ULP IoT applications.

Index Terms—Asynchronous control, buck converter, fast dynamic voltage and frequency scaling (DVFS), fast load-transient response (FLTR), high efficiency, sub-nW quiescent power, wide dynamic range.

# I. INTRODUCTION

WITH growing research interest and rapid development in wireless sensor nodes, healthcare devices, green and efficient industry, and smart homes and cities, systems-on-chip (SoCs) need to work autonomously with energy harvesters to

Manuscript received December 1, 2021; revised February 6, 2022; accepted March 12, 2022. This article was approved by Associate Editor Rabia Tugce Yazicigil. This work was supported in part by the U.S. Department of Energy's Office of Energy Efficiency and Renewable Energy (EERE) under Award DE-EE0008225 and in part by the NSF Nanosystems Engineering Research Center for Advanced Self-Powered Systems of Integrated Sensors and Technologies (NERC ASSIST) Center under Grant EEC-1160483. This article was presented in part at the IEEE 47th European Solid-State Circuits Conference, September 2021 [DOI: 10.1109/ESSCIRC53450.2021.9567744]. (Corresponding author: Shuo Li.)

Xinjian Liu and Benton H. Calhoun are with the Department of Electrical and Computer Engineering, University of Virginia, Charlottesville, VA 22903 USA (e-mail: xl5sp@virginia.edu).

Shuo Li is with the Coordinated Science Laboratory, University of Illinois at Urbana-Champaign, Urbana, IL 61801 USA (e-mail: shuoli@illinois.edu). Color versions of one or more figures in this article are available at https://doi.org/10.1109/JSSC.2022.3161617.

Digital Object Identifier 10.1109/JSSC.2022.3161617

use energy directly from the environment or stay active on batteries with a minimized form factor and a lifetime of tens of years with minimum maintenance. Therefore, aggressive power scaling techniques for Internet-of-Things (IoT) SoCs are needed and are projected to continue for a longer system lifetime, especially as those devices are powered by energy harvesters of shrinking size. A commercial CR1025 lithium coin battery [1] provides only a 30-mAh capacity, which yields a <15-month lifetime at a 3-$\mu$A load current. An on-chip photovoltaic (PV) cell can only provide 10 nW–1 $\mu$W under 10–1000 lx for indoor applications [2]. With the limited form factor and diverse environmental conditions, the power dissipation of the SoCs needs to decrease to a few nanowatts or even the picowatt level [3], [4] to meet the ultra-low-power (ULP) budget for next-generation IoT devices. To optimize the SoC power and performance, several power management techniques have been used, including dynamic voltage and frequency scaling (DVFS) [3]–[5] and multimodal control [4], [6]–[8]. However, these trends and new techniques bring tremendous challenges to the dc-dc converter design. With DVFS, the supply voltage and clock frequency of the SoC adaptively scale according to the workloads and energy conditions to save energy while maintaining the performance requirements. Therefore, the dc-dc converter should support fast reference tracking to enable DVFS and adapt to load current changes with a fast response. Furthermore, to optimize system energy and performance, components usually operate in different modes, with a wide range of dynamic power consumption. For example, the SoC in [4] covers a power range from 10 nW to 4 mW (4 $\times$ 10<sup>5</sup>) with three operation modes and Lin et al. [6] reported an SoC that has a power range from 1.85 nW to 343 $\mu$W (1.8 $\times$ 10<sup>5</sup>) with six different operation modes. 
Besides, the dc-dc converter potentially also powers different types of components, such as power scalable ADC [9] and clock [10]. All those applications motivate the necessity of designing a ULP, high efficiency, and wide dynamic range dc-dc converter with fast DVFS tracking and fast load-transient response (FLTR) to avoid large voltage droop and long settling time during load operation mode switching.

Thus far, a wide variety of ULP voltage regulators have been reported to achieve those goals [11]–[19]. The sub-nA digital low dropout (DLDO) regulator in [11] utilizes a hybrid synchronous binary-searching and asynchronous linear-searching

0018-9200 © 2022 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See https://www.ieee.org/publications/rights/index.html for more information.



Fig. 1. Conventional architectures of ULP dc-dc converters. (a) Using an event-driven asynchronous control loop [16]. (b) Using a PFM-based synchronous control loop [15], [17], [23].

control to achieve a wide dynamic range and FLTR. However, its power efficiency intrinsically degrades for a large input-to-output voltage ratio. A switched-capacitor (SC) voltage regulator [12] utilizes power gating and event-driven asynchronous control to achieve picowatt leakage power, but is limited to 50% peak efficiency and a $1.3 \times 10^4$ dynamic range. Another sub-nW SC regulator [13] proposes a recursive current injection scheme where the input current is injected into each stage of the voltage booster, but achieves only 58% peak efficiency and a 10<sup>2</sup> dynamic range. The inductor-based switching voltage regulator (SVR) has the advantage of high conversion efficiency, which makes it suitable for energy-efficient systems. Bandyopadhyay et al. [14] proposed a 1.1-nW boost converter with 544-pW quiescent power and 4-nW maximum output power. Paidimarri and Chandrakasan [15] proposed a 92% peak efficiency and wide dynamic range buck converter with 240-pW quiescent power. The optimization of power FET length and pulsewidth is explored to achieve low quiescent current and minimized energy loss. In [16], a 3.2-nW buck-boost converter is presented for solar energy harvesting and battery power management. To achieve low switching loss, the system operates in an asynchronous fashion with a continuous-time (CT) comparator. Sadagopan et al. [17] presented a Wi-Fi energy harvesting boost converter with ∼1-nW quiescent power. Although prior inductor-based SVRs have achieved sub-nW or near sub-nW quiescent power, they either suffer from a small output power range [14], [16], [17] due to the limited clock frequency range and slow speed or require manual control of the clock and deadtime [15]. Furthermore, none of the prior art supports both fast DVFS and FLTR. Conventional work [18]–[21] utilizes a multilevel structure or current-mode scheme to achieve FLTR and DVFS with a small inductor and capacitor. 
However, those techniques usually require a high-frequency clock (~MHz) and target the milliwatt power range, which makes them unsuitable for ULP applications.

To address the existing challenges, we presented a sub-nW buck converter with high peak efficiency, wide dynamic range, fast DVFS, and FLTR [22]. By using a hybrid synchronous and asynchronous feedback control loop, reusing existing signals and circuits, and multiplexing the control paths, the buck converter achieves those features with minimum power and area

overhead. The synchronous loop is able to achieve frequency modulation and output regulation with high efficiency and the asynchronous loop automatically tracks the maximal frequency the circuit can operate at to achieve fast reference tracking. The control paths of the DVFS up-tracking and FLTR are multiplexed to achieve low area and power cost. Using thick oxide devices, a wide frequency range clock, and leakage-optimized power stage and drivers, this buck converter achieves sub-nW quiescent power and 93% peak efficiency with wide dynamic power range  $(5.5 \times 10^6)$ . In this article, we expand the work [22] to further illustrate the design details, tradeoffs, and circuit implementations in the context of energy-constrained ULP IoT SoC applications. The rest of this article is organized as follows. Section II describes the system architecture, design considerations, and control loops. Section III explains the circuit implementations followed by the measurement results and comparison with the state-of-the-art buck converters in Section IV. Finally, Section V concludes this article.

# II. SYSTEM ARCHITECTURE AND CONTROL LOOPS

## A. Conventional Versus Proposed Architectures

The conventional dc-dc converters [15]–[17], [23] achieve low quiescent power and high conversion efficiency by using the two types of architectures shown in Fig. 1. Fig. 1(a) shows the event-driven asynchronous control scheme [16]. The input/output voltages are regulated by amplifier-based CT comparators. Based on the voltage value, the comparator generates the enable signal and the pulse control block triggers the power delivery operation, where the inductor is charged and then discharged to deliver power to the load side. The second architecture uses a synchronous pulse frequency modulation (PFM)-based buck/boost scheme, as shown in Fig. 1(b), where the clock frequency is adjusted based on the input/output power. When the output power is high, the clock frequency increases to allow more frequent comparisons between $V_{REF}$ and $V_{OUT}$ and more frequent power delivery operations for output regulation. When the output power is low, the clock frequency decreases to reduce dynamic power loss. Therefore, the PFM can scale the frequency to balance power loss against speed. To achieve FLTR and fast DVFS for the two prior architectures, the asynchronous control scheme needs to



Fig. 2. Proposed buck converter with hybrid synchronous/asynchronous control loops.

increase the bias current of the comparators to achieve higher speed, and the synchronous scheme must always run at a high frequency to improve the bandwidth and response speed. Due to those intrinsic drawbacks, the conventional schemes are not suitable for ULP DVFS applications.

The proposed architecture utilizes hybrid synchronous and asynchronous control loops to achieve fast DVFS and FLTR, as shown in Fig. 2. The system consists of four parts: a power stage with off-chip inductors and capacitors, a synchronous PFM-based loop, an asynchronous control loop, and a pulse generator with drivers that controls the power stage to deliver power from $V_{\rm IN}$ to $V_{\rm OUT}$. The synchronous PFM loop is implemented to regulate the output of the buck converter. At the rising edge of CLK<sub>BUCK</sub>, the strong-arm-based comparator compares $V_{REF}$ and $V_{OUT}$. When $V_{OUT}$ is smaller than $V_{REF}$, the output of the comparator, EN<sub>BUCK</sub>, goes to 1 and enables exactly one power delivery operation. Otherwise, EN<sub>BUCK</sub> stays at 0 and no power delivery operation is executed. When CLK<sub>BUCK</sub> is 0, EN<sub>BUCK</sub> is reset to 0, ready for the next comparison. The frequency controller with the PFM control algorithm adaptively tunes the frequency of CLK<sub>OSC</sub> so that the frequency of CLK<sub>BUCK</sub> stays 3–4 times that of EN<sub>BUCK</sub>. The mode controller (MC) switches the circuit among three modes: synchronous mode, DVFS mode, and FLTR mode. By default, the MC disables the asynchronous loop and the system clock is controlled by the PFM and equal to CLK<sub>OSC</sub>. When SEL<sub>VREF</sub>, which selects the voltage reference, changes or EN<sub>FLTR</sub> is high, the MC enables the asynchronous loop to achieve fast DVFS and FLTR, as shown in Fig. 3.
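The synchronous loop can be illustrated with a short behavioral sketch. The Python below is not the authors' controller: the observation window, the factor-of-two frequency steps, and the names `clocked_compare`, `pfm_update`, `f_min_hz`, and `f_max_hz` are assumptions made for illustration. Only the one-shot comparison at the CLK<sub>BUCK</sub> rising edge and the 3 to 4 target ratio between the CLK<sub>BUCK</sub> and EN<sub>BUCK</sub> frequencies come from the text.

```python
def clocked_compare(v_out, v_ref):
    """Strong-arm comparator decision at a CLK_BUCK rising edge.

    Returns EN_BUCK: 1 requests exactly one power delivery, 0 requests none.
    """
    return 1 if v_out < v_ref else 0


def pfm_update(freq_hz, clk_edges, en_pulses, f_min_hz=1.0, f_max_hz=1.0e6):
    """Hypothetical PFM step over one observation window.

    clk_edges: CLK_BUCK rising edges counted in the window.
    en_pulses: EN_BUCK pulses (power deliveries) counted in the window.
    The frequency is doubled or halved (an assumed step size) to steer the
    CLK_BUCK/EN_BUCK ratio back into the 3..4 band, then clamped to the
    assumed oscillator range [f_min_hz, f_max_hz].
    """
    ratio = float("inf") if en_pulses == 0 else clk_edges / en_pulses
    if ratio < 3.0:
        # Too many comparisons end in a delivery: the load is heavy, speed up.
        freq_hz *= 2.0
    elif ratio > 4.0:
        # Most comparisons are idle: the load is light, slow down.
        freq_hz /= 2.0
    return min(max(freq_hz, f_min_hz), f_max_hz)
```

In this sketch a window with eight clock edges and four deliveries (ratio 2) doubles the frequency, while a window with no deliveries halves it until the assumed lower bound is reached.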

When the asynchronous loop is enabled, unlike the traditional way where the CT comparator in the asynchronous loop regulates the output [16], the proposed asynchronous loop generates a fast asynchronous pulse frequency to clock the strong-arm comparator in the synchronous loop to regulate the output. The asynchronous loop has four functions: 1) the CT comparator detects the voltage drop at the output; 2) the asynchronous pulse generator (APG) reuses the existing



Fig. 3. Flowchart for mode control algorithm.

DONE and low side (LS) signals to generate the maximal frequency at which the circuit can operate for fast DVFS and FLTR; 3) based on the circuit mode, the clock selector chooses an asynchronous pulse, CLK<sub>UP</sub> or CLK<sub>DN</sub>, as the system clock, CLK<sub>BUCK</sub>, which is used by the strong-arm comparator and MC; and 4) the feedforward signal, EN<sub>DVFS</sub>, generates a pulse at CLK<sub>BUCK</sub> to let the MC respond quickly. Therefore, the FLTR and DVFS tracking speed is no longer dependent on the bias current of the CT comparator and is maximized by the APG. In this design, a 50-pA bias current is used for the CT comparator to achieve low static power. The APG is digitally implemented and is only enabled when a DVFS or FLTR event is detected.

Furthermore, since the FLTR and DVFS up-tracking both track a higher voltage reference, their control paths can be merged to achieve lower area and power cost, as shown in Fig. 3. The DVFS is enabled when  $SEL_{VREF}$  changes, which is controlled externally by the processor or users based on performance requirements and power-saving demand, while the FLTR is enabled internally by the CT comparator based on the voltage difference between  $V_{REF}$  and  $V_{OUT}$ . With all those features, the asynchronous loop adds negligible power and area overhead to the whole system while enabling fast reference tracking.
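The mode control described in this section can be summarized as a small state machine. The sketch below is a behavioral reading of the text, not a transcription of Fig. 3: the `Mode` enum, the `next_mode` function, the `vref_step` argument, and the priority given to a reference change over EN<sub>FLTR</sub> are assumptions for illustration. The exit conditions follow the tracking descriptions in Sections II-C and II-D.

```python
from enum import Enum


class Mode(Enum):
    SYNC = "sync"        # PFM-regulated default mode
    DVFS_UP = "dvfs_up"  # reference stepped up
    DVFS_DN = "dvfs_dn"  # reference stepped down
    FLTR = "fltr"        # load step detected; shares the up-tracking path


# System clock selected for CLK_BUCK in each mode.
CLOCK_FOR_MODE = {
    Mode.SYNC: "CLK_OSC",
    Mode.DVFS_UP: "CLK_UP",
    Mode.FLTR: "CLK_UP",
    Mode.DVFS_DN: "CLK_DN",
}


def next_mode(mode, vref_step, en_fltr, en_buck):
    """Mode decision taken at a CLK_BUCK rising edge.

    vref_step: +1 / -1 / 0 for a raised, lowered, or unchanged SEL_VREF
               (hypothetical encoding of the reference change).
    en_fltr:   1 when the CT comparator has flagged a load step.
    en_buck:   comparator result, 1 when V_OUT < V_REF.
    """
    if mode is Mode.SYNC:
        if vref_step > 0:
            return Mode.DVFS_UP
        if vref_step < 0:
            return Mode.DVFS_DN
        return Mode.FLTR if en_fltr else Mode.SYNC
    if mode in (Mode.DVFS_UP, Mode.FLTR):
        # Keep delivering energy until V_OUT has risen above V_REF.
        return mode if en_buck else Mode.SYNC
    # DVFS_DN: keep discharging until V_OUT has fallen below V_REF.
    return Mode.SYNC if en_buck else mode
```

Because FLTR and DVFS up-tracking map to the same clock and the same exit condition, the table and the second branch make the path sharing explicit.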

## B. Power Loss Analysis and Component Parameter Selection

Adding an extra control loop inevitably increases area and power. To achieve high efficiency in the near-nanowatt range, the power loss of the circuits should be carefully considered and optimized, especially at light load. For discontinuous conduction mode (DCM) operation, as shown in Fig. 4, where the inductor current stays at zero during the OFF state ($T_{\rm OFF}$), the efficiency of the dc–dc converter can be calculated by the



Fig. 4. Leakage loss paths and key waveforms for DCM operation.

following equation:

$$\eta = \frac{P_{\text{OUT}}}{P_{\text{IN}}} = \frac{P_{\text{OUT}}}{P_{\text{OUT}} + P_{\text{DYN}} + P_Q + P_{\text{SW}} + P_{\text{COND}}} \tag{1}$$

where $P_{\rm IN}$ and $P_{\rm OUT}$ stand for input and output power, respectively, and $P_Q$, $P_{\rm DYN}$, $P_{\rm SW}$, and $P_{\rm COND}$ stand for quiescent power, dynamic power, switching loss, and conduction loss, respectively. Notably, $P_{\rm DYN}$ is proportional to the output current, and $P_Q$ is the power consumption when the output current is zero. $P_Q$ consists of the dynamic loss of the control logic, leakage power, and bias current. As the load power shrinks to the nanowatt–picowatt range, $P_{\rm SW}$, $P_{\rm COND}$, and $P_{\rm DYN}$ become negligible, while $P_Q$ dominates. Therefore, to improve the light-load efficiency, the dynamic loss of the control logic, leakage, and bias current should be minimized.
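To make the light-load behavior of (1) concrete, the following sketch evaluates the efficiency with only the quiescent term retained. The function name `efficiency` and the chosen load points are illustrative assumptions. The 802-pW quiescent power is the measured value quoted in the abstract. Setting the other loss terms to zero gives an upper bound, not a prediction of the measured curve.

```python
def efficiency(p_out, p_q, p_dyn=0.0, p_sw=0.0, p_cond=0.0):
    """Efficiency from (1); all powers are in watts."""
    return p_out / (p_out + p_dyn + p_q + p_sw + p_cond)


P_Q = 802e-12  # measured quiescent power quoted in the abstract

# Upper-bound efficiency when only quiescent loss is counted.
eta_10nw = efficiency(10e-9, P_Q)  # 10-nW load (illustrative)
eta_1nw = efficiency(1e-9, P_Q)    # 1-nW load (illustrative)
```

With these assumed load points, the bound falls from about 93% at 10 nW to about 55% at 1 nW, which shows why $P_Q$ sets the lower end of the usable load range.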

The leakage power and bias current, together called static power, mainly come from three parts, as shown in Fig. 4: buffer stage leakage, power stage leakage, and static power from the control circuits. Several techniques have been proposed to suppress the static power, including transistor length modulation [15] and power stage over-driving [14], [17]–[23]. In this work, the three parts are optimized by utilizing a strong-up weak-down (SuWd) driver, a length split technique, digital implementation, signal reuse, and path multiplexing, which are discussed in detail in Section III.

In terms of the dynamic loss minimization, a clock source with a wide frequency range and PFM control scheme is desired to be able to scale the clock frequency down to a few hertz. To allow low-frequency operation, an inductor with large inductance value ( $>\mu$ H) is needed since a large inductor can deliver more power per operation, which allows the circuit to operate at a lower frequency at light load. The following equations show the methods to calculate the desired components parameters for the purpose of decreasing power delivery frequency and dynamic power loss. The inductor peak current can be calculated by the following equation:

$$I_{\text{PEAK}} = \frac{V_{\text{IN}} - V_{\text{OUT}}}{L} T_{\text{HS}} \tag{2}$$

where L represents the inductance of the off-chip inductor and  $T_{\rm HS}$  represents the width of the high side (HS) signal.

The energy delivered to the load per power delivery operation,  $E_{\text{PULSE}}$ , can be calculated by (3), if the inductor peak current is given

$$E_{\text{PULSE}} = \frac{V_{\text{IN}} L I_{\text{PEAK}}^2}{2(V_{\text{IN}} - V_{\text{OUT}})}. \tag{3}$$

The energy delivered per pulse increases with the inductance. Therefore, with a larger inductance, the power delivery frequency can be decreased according to the following equation:

$$P_{\text{LOAD}} = P_{\text{PULSE}} = \text{FREQ}_{\text{PULSE}} \times E_{\text{PULSE}}. \tag{4}$$

However, the ripple voltage  $V_{\text{RIPPLE}}$  increases with  $E_{\text{PULSE}}$  according to the following equation:

$$C_{\text{LOAD}}((V_{\text{OUT}} + V_{\text{RIPPLE}})^2 - V_{\text{OUT}}^2) = 2E_{\text{PULSE}}. \tag{5}$$

For example, assume that $V_{\rm IN}=2$ V, $V_{\rm OUT}=0.5$ V, and $I_{\rm PEAK}=20$ mA. If we want the frequency of the power delivery operation to be 10 Hz for a 100-nW output power, (4) gives $E_{\text{PULSE}} = 10$ nJ, and solving (3) for L gives 37.5 $\mu$H. If a <10-mV ripple voltage is required, then by (5), $C_{\rm LOAD}$ needs to be at least 1.98 $\mu$F.
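The worked example can be reproduced numerically by rearranging (3) for L and (5) for $C_{\rm LOAD}$, with the pulse energy taken from (4). The function and variable names in this sketch are assumptions for illustration.

```python
def inductance_for_pulse_energy(e_pulse, v_in, v_out, i_peak):
    """Solve (3) for L given the energy per pulse and the peak current."""
    return 2.0 * e_pulse * (v_in - v_out) / (v_in * i_peak ** 2)


def capacitance_for_ripple(e_pulse, v_out, v_ripple):
    """Solve (5) for C_LOAD given the energy per pulse and allowed ripple."""
    return 2.0 * e_pulse / ((v_out + v_ripple) ** 2 - v_out ** 2)


V_IN, V_OUT, I_PEAK = 2.0, 0.5, 20e-3  # values from the example in the text
P_LOAD, FREQ_PULSE = 100e-9, 10.0      # 100 nW delivered at 10 Hz

E_PULSE = P_LOAD / FREQ_PULSE          # (4): energy per power delivery
L = inductance_for_pulse_energy(E_PULSE, V_IN, V_OUT, I_PEAK)
C_LOAD = capacitance_for_ripple(E_PULSE, V_OUT, 10e-3)

print(f"E_PULSE = {E_PULSE * 1e9:.1f} nJ")
print(f"L       = {L * 1e6:.1f} uH")
print(f"C_LOAD  = {C_LOAD * 1e6:.2f} uF")
```

A larger capacitance than the computed value reduces the ripple further, so the result is a lower bound for the 10-mV target.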

## C. Fast Reference Tracking for DVFS

Fast DVFS up-tracking allows the circuit to quickly switch to a high-power and high-performance mode, which is critical for performance regulation. For DVFS down-tracking, although the slow free-discharging method [24] can fully utilize the energy stored on the capacitor, moving slowly into the low-power mode could lead to more power consumption in the components that communicate with the free-discharged blocks, because frequent communication is likely to dissipate more power at interfaces and other components, especially in SoC applications. Therefore, fast DVFS down-tracking is desired as well. However, the constrained current budget significantly limits the bandwidth and frequency of the circuits, which leads to slow reference tracking and long settling time. When reference tracking is demanded, the dc–dc converter should maximize the power delivery to the load or the discharging current at the load. When $T_{OFF}$ is negligibly small, the buck converter can continuously deliver energy to the load, and the energy delivery speed can be calculated by dividing the energy per pulse by the pulse period. Therefore, when the load current is much smaller than the up-tracking current, the up-tracking speed, $S_{PULSE}$, can be expressed as follows:

$$S_{\text{PULSE}} = \frac{E_{\text{PULSE}}}{T_{\text{PULSE}}} = \frac{V_{\text{OUT}}I_{\text{PEAK}}}{2} \tag{6}$$

where  $T_{\text{PULSE}}$  represents the time period of one power delivery, which equals  $T_{\text{HS}} + T_{\text{LS}}$  when  $T_{\text{OFF}}$  is zero.
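As a consistency check, (6) can be recovered from (2) and (3). The sketch below assumes ideal DCM operation with zero-current switching, so that the LS on-time is $T_{\rm LS} = L I_{\rm PEAK}/V_{\rm OUT}$; that expression is a standard ideal-buck relation and is not stated in the text. The function names are illustrative, and the component values are those of the worked example in Section II-B, not the measured chip.

```python
def pulse_times(l, v_in, v_out, i_peak):
    """HS and LS on-times for one ideal DCM power delivery."""
    t_hs = l * i_peak / (v_in - v_out)  # rearranged from (2)
    t_ls = l * i_peak / v_out           # assumed ideal discharge slope V_OUT/L
    return t_hs, t_ls


def pulse_energy(l, v_in, v_out, i_peak):
    """Energy delivered to the load per pulse, from (3)."""
    return v_in * l * i_peak ** 2 / (2.0 * (v_in - v_out))


def uptracking_speed(v_out, i_peak):
    """Energy delivery speed from (6), valid when T_OFF is zero."""
    return v_out * i_peak / 2.0


L, V_IN, V_OUT, I_PEAK = 37.5e-6, 2.0, 0.5, 20e-3  # Section II-B example
T_HS, T_LS = pulse_times(L, V_IN, V_OUT, I_PEAK)
S_FROM_RATIO = pulse_energy(L, V_IN, V_OUT, I_PEAK) / (T_HS + T_LS)
S_FROM_EQ6 = uptracking_speed(V_OUT, I_PEAK)
```

Both routes give the same value for these inputs, and the inductance cancels in the ratio, which matches (6) depending only on $V_{\rm OUT}$ and $I_{\rm PEAK}$.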

According to (6), faster energy delivery can be achieved with a larger peak inductor current. However, it causes a larger conduction loss and degrades efficiency [15]. Therefore, with the analysis in Section II-B, the choice of the off-chip component parameters is a tradeoff between speed, efficiency, and ripple voltage. For a given peak inductor current which is decided by the efficiency requirements, larger inductance and



Fig. 5. Operation timing diagram of the reference up-tracking.

capacitance decrease the quiescent power and ripple voltage yet at a cost of slower DVFS tracking speed.

Fig. 5 shows the waveform of the DVFS reference up-tracking. To achieve fast up-tracking, the asynchronous loop generates the maximal clock frequency at which the circuit can run to deliver energy quickly. As analyzed above, $T_{OFF}$ needs to be near zero to enable continuous power delivery. Therefore, the LS falling edge, which represents the completion of one power delivery, is used to generate the asynchronous pulse, CLK<sub>UP</sub>. The signal EN<sub>DVFS</sub> is used to generate a pulse, $T_P$, at CLK<sub>BUCK</sub> to let the circuit quickly move into the DVFS mode since the MC makes its decision at the rising edge of CLK<sub>BUCK</sub>. When SEL<sub>VREF</sub> changes, the MC enables the DVFS mode. Then, the UP signal equals 1 and CLK<sub>UP</sub> is selected as the system clock. At the end of each power delivery operation, the LS falling edge triggers a pulse, $T_{PULSE}$, after a time delay, $T_D$, to generate CLK<sub>UP</sub>. Then, the asynchronous pulse enables power delivery to the output to increase the output voltage until the comparator output, EN<sub>BUCK</sub>, is 0 at the rising edge of CLK<sub>BUCK</sub>, indicating that $V_{OUT} > V_{REF}$. Then, the MC stops the up-tracking, and CLK<sub>BUCK</sub> switches from CLK<sub>UP</sub> back to CLK<sub>OSC</sub>. To regulate the output at the higher $V_{REF}$, the CLK<sub>OSC</sub> frequency is initially set to its maximum after the up-tracking completes. Then, the frequency is controlled by the PFM. The period generated by the APG is decided by the HS pulsewidth ($T_{\rm HS}$), LS pulsewidth ($T_{\rm LS}$), $T_D$, and $T_{\rm PULSE}$. Notably, in the real design, $T_D$ and $T_{PULSE}$ can be negligible compared with $T_{\rm HS}$ and $T_{\rm LS}$.

Fig. 6 shows the waveform of the DVFS down-tracking. Once triggered by the pulse created by EN<sub>DVFS</sub>, the MC detects the decrease of SEL<sub>VREF</sub>, and then the DN signal goes to 1 and turns on a switch between $V_{OUT}$ and GND to generate a large current sink at the output that quickly decreases the output voltage. Therefore, the comparator is required to work at the fastest speed to avoid potential undershoot. When the clock of the comparator is 0, the DONE signal is reset. When the comparator finishes one comparison, the DONE signal goes to 1, indicating the completion of the comparison. The DONE signal is used to create the asynchronous pulse, CLK<sub>DN</sub>. The rising edge of the DONE signal followed by a short delay ($T_D$) enables the next comparison until $EN_{BUCK} = 1$, representing $V_{\rm OUT} < V_{\rm REF}$. Then, the down-tracking stops, and the buck clock switches to CLK<sub>OSC</sub> again. During DVFS down-tracking, the clock period is equal to $2 \times T_D$.



Fig. 6. Operation timing diagram of the reference down-tracking.



Fig. 7. Operation timing diagram of the FLTR.

## D. Fast Load-Transient Response

Fig. 7 shows the waveform of the asynchronous FLTR, which reuses the DVFS up-tracking path to achieve low power and area cost. When the output current steps up, the output voltage drops below the guard band, and EN<sub>FLTR</sub> goes to 1 to generate a pulse, $T_P$, on CLK<sub>BUCK</sub> to push the circuit into the FLTR mode. Once EN<sub>FLTR</sub> is equal to 1, the MC enables FLTR and the circuit starts to track the reference. The total settling time is $T_{\text{SETTLE}} = T_{\text{DISCH}} + T_{\text{COMP}} + T_{\text{UP-TRACK}}$. $T_{\text{DISCH}}$ is the time for the output to discharge through the guard band voltage, $\Delta V_{\rm GB}$, and depends on the output capacitance, load current, and $\Delta V_{\rm GB}$. $T_{\rm COMP}$ is the delay, mostly from the CT comparator. $T_{\text{UP-TRACK}}$ is the time needed to charge the output to the voltage reference and is decided by the tracking speed, load current, and output capacitance. In this design, due to the low bias current and large ($\sim \mu$F) off-chip decoupling capacitor, the discharging time of the guard band voltage and the delay of the CT comparator dominate the total settling time. Also, the voltage droop at the output is decided by the guard band and $T_{\text{COMP}}$ since the FLTR is enabled only after the CT comparator output changes to 1. Thus, as shown in Fig. 7, the voltage keeps dropping after it reaches the guard band due to $T_{\text{COMP}}$. After the FLTR is enabled, similarly, once the output voltage reaches the reference voltage, EN<sub>BUCK</sub> stays at 0 after the comparison, the circuit exits the FLTR mode, and the system clock, CLK<sub>BUCK</sub>, switches back to CLK<sub>OSC</sub>. When there is a current step down, the FLTR is not enabled, and theoretically there is no overshoot even if the clock frequency is high, since a power delivery operation is enabled only once and only if the comparator output is 1. After the load current steps down, there is no power delivery operation if $V_{OUT} > V_{REF}$.



Fig. 8. System block diagram of the proposed buck converter [22].

Also, the PFM is enabled to decrease the clock frequency for power saving.
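The settling-time decomposition in this section can be turned into a first-order estimate. The model below is a sketch under stated assumptions, not the authors' analysis: the load is a constant current, no energy is delivered before the FLTR mode starts, and the average charging current during up-tracking is $I_{\rm PEAK}/2$, which follows from (6) when $T_{\rm OFF}$ is zero. The function name and all numeric inputs are hypothetical and are not measured values.

```python
def fltr_settling(c_load, i_load, dv_gb, t_comp, i_peak):
    """First-order FLTR estimate; returns (droop, t_disch, t_up, t_settle).

    c_load: output capacitance (F)       i_load: load current after the step (A)
    dv_gb:  guard band voltage (V)       t_comp: CT comparator delay (s)
    i_peak: inductor peak current (A)
    """
    # Time for the load to discharge the output through the guard band.
    t_disch = c_load * dv_gb / i_load
    # The output keeps falling while the CT comparator is still deciding.
    droop = dv_gb + i_load * t_comp / c_load
    # Net charging current once back-to-back power delivery starts.
    i_charge = i_peak / 2.0 - i_load
    if i_charge <= 0.0:
        raise ValueError("up-tracking current must exceed the load current")
    t_up = c_load * droop / i_charge
    return droop, t_disch, t_up, t_disch + t_comp + t_up


# Hypothetical operating point chosen only to exercise the model.
DROOP, T_DISCH, T_UP, T_SETTLE = fltr_settling(
    c_load=2e-6, i_load=1e-3, dv_gb=20e-3, t_comp=10e-6, i_peak=20e-3
)
```

For this assumed operating point, $T_{\rm DISCH}$ and $T_{\rm COMP}$ together account for most of the settling time, in line with the observation that those two terms dominate when the decoupling capacitor is large.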

# III. CIRCUIT IMPLEMENTATION

## A. System Implementation

The detailed block diagram of the proposed buck converter is shown in Fig. 8. During general operation, the strong-arm latch, LATH, compares $V_{\rm OUT}$ with $V_{\rm REF}$ to generate the control signal, EN<sub>BUCK</sub>, and enable power delivery under DCM. Triggered by the rising edge of EN<sub>BUCK</sub>, the pulse generator generates the HS and LS pulses to enable power delivery. At the end of each power delivery operation, the ZCD comparator compares the switching node, SWD, with VSS and adjusts the LS pulsewidth through a counter to achieve zero-current switching (ZCS). The hybrid pulse-frequency control block switches the circuit between the synchronous and asynchronous loops to achieve output regulation, clock frequency modulation, and fast DVFS and FLTR. For the PoR, once $V_{\rm IN}$ is powered up, a voltage detector block generates an enable signal to turn on a 7-bit counter and the current-starving oscillator (CS-OSC). After a certain number of clock cycles has been counted, an output pulse is generated to reset the buck converter. To minimize the quiescent power, all the digital blocks are implemented with 500-nm-length 2.5-V thick-oxide devices at the cost of a larger area.

## B. Driver and Power Stage Optimization

The power stage design is critical to achieve low power and high efficiency for buck converter. To optimize the tradeoffs among conduction loss, switching loss, and quiescent power, the width and length of the FET need to be carefully picked. Traditionally, the method of directly increasing the length of the transistor is utilized and analyzed in [15] for low quiescent power, as shown in Fig. 9(a). However, the large length of the



Fig. 9. Power stage optimization methods. (a) Traditional length modulation [15]. (b) Width split technique [25]. (c) Length split technique [29].

CMOS inevitably increases the gate capacitance, which incurs extra switching loss and degrades the efficiency at heavy load with PFM control. The gate width split technique is introduced in [25] to balance conduction loss against switching loss, targeting a load current range from 500 $\mu$A to 20 mA, as shown in Fig. 9(b). At heavy load, more power stage FETs are turned on to decrease the conduction loss, while in PFM, they are turned off to decrease the switching loss. However, in the nanowatt power range, where leakage power is dominant, the width split method cannot suppress the leakage. Hence, by leveraging the stack effect [26], [29], we implement a length split technique to balance the switching loss and leakage power, as shown in Fig. 9(c). Since the leakage current from the PMOS of the power stage partly flows into the load side, where it can be regarded as load charging current, here we only consider the SWD-to-GND leakage loss.

Fig. 10 shows the simulated power stage leakage power and estimated area when we use a traditional structure with a size of W/L versus the length split technique with a W/(L/2) for each split FET. When the length is small, the length split technique may suffer from short channel effects leading to



Fig. 10. Simulated power stage leakage loss and estimated area cost for traditional length modulation versus length split techniques at  $V_{\rm IN}=1.5~{\rm V}$  and  $V_{\rm OUT}=0.5~{\rm V}$ .

large leakage current from SWD to GND. However, as the channel length increases, the length split scheme achieves better leakage suppression ability compared to the traditional length modulation. When  $V_{\rm OUT}$  is 0.5 V and  $V_{\rm IN}$  is 1.5 V, at 700-nm length, the stacked FETs have a 350-nm length and leakage is 78 pA, while the traditional length modulation has a 104-pA leakage at the same transistor length. Furthermore, by always turning on one of the two stacked FETs at heavy load, the proposed structure can achieve 40.4% less switching loss in simulation when  $V_{\rm IN}=1.5$  V and output power is 250 nW. In addition, compared with the minimal length transistors, the length split technique with 700-nm length can achieve  $13\times$  smaller leakage current with a  $4.6\times$  area cost.

For the length split technique, each power FET requires a dedicated driver for equal transition time [25] to avoid potential timing glitches such as deadtime mismatch caused by the HS/LS feedthrough, which increases the number of drivers. Therefore, the driver stage takes a large portion of leakage, which needs to be optimized. Drivers with smaller size have less leakage and gate capacitance but lack driving ability, which leads to large transition time when turning on/off the power FET and increases conduction loss, especially when the inductor current is at the peak. To balance the tradeoff, as shown in Fig. 11(a) and (b), the proposed SuWd driver minimizes the pulling down transistors for low quiescent power while maintaining the pulling up ability to achieve negligible penalty in conduction loss since the long transition time only happens when inductor current is still small. Therefore, for the HS driver, the design is straightforward where a smaller NMOS (8  $\mu$ m/500 nm) can be selected, while the PMOS stays constant (80  $\mu$ m/300 nm) at the last stage. For LS driver, the PMOS desires small leakage and large driving ability at the same time. Therefore, we increase both the width and length (100  $\mu$ m/500 nm) for the PMOS of the driver. To avoid large reverse current at the ZCS point, a medium size (16  $\mu$ m/300 nm) is selected for the NMOS. Compared to a traditional 300-nm length fan-out-four (FO4) driver, as shown in Fig. 11(c), the simulated result shows that it can improve the efficiency by 3%-4% at light load due to the transistors with larger length on the leakage path. At heavy load, the efficiency is also improved by 1%-2% since smaller size NMOS of the driver decreases the switching loss.



Fig. 11. (a) Schematic of the proposed SuWd driver circuit and leakage path. (b) Related timing waveform and principles. (c) Efficiency improvement compared with traditional 300-nm length FO4 drivers.

## C. Load-Transient Detector and Asynchronous Timing Generator

Fig. 12(a) shows the schematic of the asynchronous load-transient detector (LTD). The LTD consists of a CT comparator and a D flip-flop (DFF) that stores the EN<sub>FLTR</sub> signal. The CT comparator uses an amplifier-based structure with a 50-pA bias current and 4-bit offset tuning to generate a guard band, $\Delta V_{\rm GB}$, as shown in Fig. 7. To generate the maximal clock frequency for DVFS tracking, the APG has two separate loops for up-tracking and down-tracking, as shown in Fig. 12(b). In the FLTR and DVFS mode, CLK<sub>UP</sub> and CLK<sub>DN</sub> are both set to 1 to make sure that the clock does not change during mode transition, since mode transition happens at the rising edge of CLK<sub>BUCK</sub>. For up-tracking, the falling edge of LS, which represents the completion of power delivery, generates a negative pulse at CLK<sub>UP</sub> to enable the next power delivery. A similar scheme could be used for CLK<sub>DN</sub> generation. However, in our design, a simpler oscillating scheme is used. When CLK<sub>BUCK</sub> is 0, the LATH resets the EN<sub>BUCK</sub> and DONE signals. During comparison, either DONE or EN<sub>BUCK</sub> goes up to 1 according to the input voltages. Therefore, during down-tracking, when $V_{REF} < V_{OUT}$, DONE always toggles between 0 and 1. This feature is used to create an oscillating loop for CLK<sub>DN</sub> generation.

Fig. 12(c) shows the simulated delay of the CT comparator under different input voltage differences and bias currents. These two knobs significantly affect the speed of the comparator. To trade off speed against power, a 50-pA bias current and a 20–50-mV programmable guard band are used in our design.
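The detector behavior described above can be illustrated with a small behavioral model. This is only a sketch under stated assumptions, not the circuit itself: the linear 2-mV-per-code mapping of the 4-bit offset setting onto the 20–50-mV guard band, the names `guard_band_mv` and `LoadTransientDetector`, and the reset condition (the flag clears once the output recovers to the reference) are all hypothetical, and the comparator delay of Fig. 12(c) is ignored.

```python
from dataclasses import dataclass


def guard_band_mv(code: int) -> float:
    """Map a 4-bit offset code to a guard band in mV.

    Assumption: a linear mapping that spans the 20-50 mV range quoted in the text.
    """
    if not 0 <= code <= 15:
        raise ValueError("offset code must fit in 4 bits")
    return 20.0 + 2.0 * code


@dataclass
class LoadTransientDetector:
    v_ref: float            # reference voltage in volts
    code: int = 0           # hypothetical 4-bit guard-band setting
    en_fltr: bool = False   # latched flag, standing in for the DFF output

    def sample(self, v_out: float) -> bool:
        """Update and return the latched flag for one observation of v_out."""
        threshold = self.v_ref - guard_band_mv(self.code) * 1e-3
        if v_out < threshold:
            self.en_fltr = True      # droop exceeded the guard band
        elif v_out >= self.v_ref:
            self.en_fltr = False     # assumed reset: output is back at the reference
        return self.en_fltr


ltd = LoadTransientDetector(v_ref=0.7, code=15)
for v in (0.69, 0.64, 0.68, 0.70):
    print(f"V_OUT = {v:.2f} V -> EN_FLTR = {ltd.sample(v)}")
```

The latch keeps the fast mode asserted while the output sits between the guard-band threshold and the reference, which mirrors the role the text gives to the DFF.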

## D. Pulse Generators

Fig. 13 shows the pulse generator schematic and design optimization for parasitics, which can deteriorate the power efficiency of the dc–dc converter. The HS pulsewidth is generated at the rising edge of  $\rm EN_{BUCK}$  controlled by the time that the pulse generator needs to charge  $X_{\rm CHG}$  from 0 to the threshold, which is around  $V_{\rm IN-REAL}/2$ . The charging time is dependent on the RC constant, which is fine-grained tuned by a



Fig. 12. (a) Schematic of the LTD with a CT amplifier-based comparator. (b) Schematic of the asynchronous timing generator with a pulse generator. (c) CT comparator delay versus input voltage differences and bias current.



Fig. 13. Schematic of the pulse generator with an extra MP to solve the pulse generator threshold deviation caused by the parasitic CGS and bonding wire inductance at the rising edge of HS signal.

6-bit capacitance tuning bit [15]. The DN signal is also added as an input signal of HS pulse generator. Referring to Fig. 8, EN<sub>BUCK</sub> goes to 1 first, and then, the MC detects it at the rising edge of the delayed clock signal, CLK<sub>BUCKD</sub>. To avoid the down-tracking mode and power delivery happening at the same time and causing short-circuit current, the pulse generator will not be enabled until DN goes back to 0. Therefore, there is no power delivery operation at the end of the down-tracking mode if CLK<sub>OSC</sub> is 0, as shown in Fig. 6.
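The dependence of the HS pulsewidth on the RC constant can be sketched with the textbook first-order charging relation, $t = RC\ln\left(V/(V - V_{\rm th})\right)$, which reduces to $RC\ln 2$ when the threshold sits at half the supply. The resistance, base capacitance, and per-code capacitance step below are hypothetical values chosen only for illustration, and the ideal RC model is an assumption rather than the implemented circuit.

```python
import math


def rc_charge_time(r_ohm, c_farad, v_supply, v_threshold):
    """Time for an RC node charging from 0 V toward v_supply to reach v_threshold."""
    if not 0.0 < v_threshold < v_supply:
        raise ValueError("threshold must lie strictly between 0 and the supply")
    return r_ohm * c_farad * math.log(v_supply / (v_supply - v_threshold))


def tuned_capacitance(code, c_base=50e-15, c_step=10e-15):
    """Capacitance selected by a 6-bit code (c_base and c_step are hypothetical)."""
    if not 0 <= code <= 63:
        raise ValueError("tuning code must fit in 6 bits")
    return c_base + code * c_step


def hs_pulse_width(code, r_ohm=1e6, v_in=1.5):
    """Pulsewidth for a threshold at half the supply (r_ohm is hypothetical)."""
    return rc_charge_time(r_ohm, tuned_capacitance(code), v_in, v_in / 2.0)


for code in (0, 31, 63):
    print(f"code {code:2d}: {hs_pulse_width(code) * 1e9:.1f} ns")
```

In this idealized model the pulsewidth does not depend on the supply, because the threshold scales with it; only the tuning code and the resistance set the charging time.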

Due to the large parasitic C<sub>GS</sub> of the large power FET, the bonding-wire parasitic inductance, and the picofarad-level on-chip decoupling capacitance, when $X_{CHG}$ charges up to the threshold and HS goes from 0 to 1, the large $C_{GS}$ and the remaining inductor current cause the supply voltage, $V_{\rm IN-REAL}$, to bump up to a higher value, which raises the threshold as well. This leads to ringing and generates a large short-circuit current at the power stage, degrading the efficiency. Therefore, an extra transistor $M_P$ (shown at the top of Fig. 13) is added to speed up the charging of $X_{\rm CHG}$ once it reaches the threshold. Considering the delay introduced by the drivers, $M_P$ has enough time to charge $X_{\text{CHG}}$ to a safer voltage before HS flips, provided that $M_P$ is large enough. After EN<sub>BUCK</sub> goes down to zero, the pulse generator is reset by the transistor $M_{\rm DC}$. This is important because, for FLTR and DVFS up-tracking, when the clock goes to 0, the LATH and the pulse generator need to be reset and ready for the next power delivery. The LS pulse generator uses a similar structure to generate the pulse but without the feedback transistor $M_P$.

## E. Clock and Bias Generator

The clock generator is implemented by a CS-OSC with an optimized driver stage, shown at the top of Fig. 14. The CS-OSC has the benefit of ultra-low quiescent current but suffers from weak driving ability and large short-circuit currents at the output driver stages. To improve the driving ability at low power cost, capacitances are added at the last CS stages to speed up the transition, and the normal buffer stage uses stacked drivers to suppress the short-circuit current. The bottom of Fig. 14 shows the simulated power consumption of the CS-OSC as $C_2$ increases at different values of $V_{\rm IN}$. A tradeoff exists between the short-circuit current at the driver stage and the extra power needed to charge/discharge the



Fig. 14. Schematic of the CS-OSC and BG, and simulation results of the impact from  $C_2$  capacitance across  $V_{\rm IN}$ .

added extra capacitors. In our design, a 600-fF $C_2$ value is selected to balance the tradeoff, which achieves 25.2-pW power at 20 Hz and 1.5-V $V_{\rm IN}$. A 10-bit bias-current tuning is added to cover a range of 20 Hz–78.8 kHz according to the simulation. The bias generator (BG) utilizes a beta-multiplier current reference to generate supply-independent current and voltage references. The voltage reference value can be calculated using the following equations. The current flowing through M1 and M2 is equal due to the PMOS-based current mirror. The bias current can be solved by using the following equation:

$$\sqrt{\frac{2I_{\text{BIAS}}}{\mu_n C_{\text{ox}}(W_2/L_2)}} + V_{\text{TH2}} = \sqrt{\frac{2I_{\text{BIAS}}}{\mu_n C_{\text{ox}}(W_1/L_1)}} + V_{\text{TH1}} + I_{\text{BIAS}}R \quad (7)$$

where $I_{\rm BIAS}$ represents the bias current of M1 and M2, $\mu_n$ represents the carrier mobility, $C_{\rm ox}$ represents the gate-oxide capacitance of the transistor per unit area, and $W/L$ represents the ratio of transistor width to length. Then, M4 copies the current reference to generate a voltage reference through M6. The reference voltage can be solved by using the following equation:

$$V_A = V_{GS,M6} = \sqrt{\frac{2I_{\text{BIAS}}}{\mu_n C_{\text{ox}}(W_6/L_6)}} = \sqrt{\frac{4\left(\sqrt{L_2/W_2} - \sqrt{L_1/W_1}\right)^2}{(\mu_n C_{\text{ox}})^2 (W_6/L_6)R^2}}. \quad (8)$$

With a 30-M $\Omega$  resistor, the BG works across 1.4–2.5 V and achieves 397 pW at 1.5-V supply in simulation. For DVFS

reference, two voltage references with 6.4-pF on-chip decoupling capacitors are connected to a MUX for reference selection. When SEL<sub>VREF</sub> changes, the BG changes the voltage through the MUX and the MC enters DVFS mode to track the new reference voltage.
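The algebra linking (7) and (8) can be checked numerically. The sketch below assumes $V_{\rm TH1} = V_{\rm TH2}$, so that (7) solves to $I_{\rm BIAS} = 2\left(\sqrt{L_2/W_2} - \sqrt{L_1/W_1}\right)^2 / (\mu_n C_{\rm ox} R^2)$, and then evaluates (8). The values of $\mu_n C_{\rm ox}$ and the W/L ratios are hypothetical placeholders; only the 30-M$\Omega$ resistor comes from the text. The script checks the square-law expressions as written and makes no claim about the operating region of the real devices.

```python
import math


def bias_current(mu_cox, wl1, wl2, r_ohm):
    """Solve (7) for I_BIAS, assuming V_TH1 == V_TH2 and W1/L1 > W2/L2."""
    if wl1 <= wl2:
        raise ValueError("(7) as written needs W1/L1 > W2/L2 for a nonzero solution")
    diff = math.sqrt(1.0 / wl2) - math.sqrt(1.0 / wl1)
    return 2.0 * diff ** 2 / (mu_cox * r_ohm ** 2)


def reference_voltage(mu_cox, wl1, wl2, wl6, r_ohm):
    """Evaluate (8) through its first form, sqrt(2*I_BIAS / (mu_cox * W6/L6))."""
    i_bias = bias_current(mu_cox, wl1, wl2, r_ohm)
    return math.sqrt(2.0 * i_bias / (mu_cox * wl6))


# Hypothetical process and sizing values, for illustration only.
MU_COX = 200e-6          # A/V^2
WL1, WL2, WL6 = 4.0, 1.0, 1.0
R = 30e6                 # ohms, from the text

i_bias = bias_current(MU_COX, WL1, WL2, R)
lhs = math.sqrt(2.0 * i_bias / (MU_COX * WL2))
rhs = math.sqrt(2.0 * i_bias / (MU_COX * WL1)) + i_bias * R
print(f"I_BIAS = {i_bias:.3e} A, (7) residual = {lhs - rhs:.3e} V")
print(f"V_A    = {reference_voltage(MU_COX, WL1, WL2, WL6, R):.3e} V")
```

The residual of (7) is zero to within floating-point error, which confirms that the closed form inside the second square root of (8) follows from (7) under the equal-threshold assumption.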

## F. Adaptive Deadtime Controller

SVRs need a deadtime between the HS and LS pulses to prevent short-circuit current. If the deadtime is too long, the SWD node in Fig. 15 goes down to a negative voltage (near negative $V_{\rm TH}$) and forces M2 on, which causes a large conduction loss. If the deadtime is too short, the switching loss increases [27], [28]. The ideal situation is to turn on the power-stage NMOS right after the SWD node reaches zero. Besides, due to process variation and the large supply voltage range, the deadtime usually varies and is traditionally tuned manually, which is undesirable in real applications and large-scale deployment. In [27], the deadtime is adaptively tuned by indirectly detecting the SWD point to decide the moment to enable the LS driver, but it requires an extra bias current, which is not suitable for a dc-dc converter with nanowatt-level quiescent power. In this work, we propose a switched-capacitor-based adaptive deadtime controller (ADTC), shown in Fig. 15, which consists of an SC-based pre-charger, a comparator, and a digital counter. The $V_{\rm DT}$ node is first pre-charged to a fixed programmable voltage. During the deadtime, to maintain the inductor current, $I_L$, the voltage at SWD falls from $V_{\rm IN}$ to a negative voltage to generate $I_{LS}$. Adding a small transistor M3 in parallel with M2 generates a small current $I_{\rm DT}$, which discharges the $C_{\rm PRE}$ capacitor. The time needed to discharge $V_{\rm DT}$ to GND through $I_{\rm DT}$ is the time that SWD stays at around negative $V_{\rm TH}$.



Fig. 15. Schematic of the proposed ADTC.

In this design, M2 and M3 keep the same length, but the width of M3 is 1/375 of M2 making  $I_{\rm DT}$  equal to  $I_{\rm LS}/375$ . By setting proper  $C_{\rm PRE}$  and  $C_{\rm TUNE}$  values, the deadtime can be adaptively tuned by the ADTC through the delay cell,  $T_D$ . Compared with the traditional work, the SC-based ADTC utilizes the existing HS and CLK<sub>ZCD</sub> as its clock source achieving low area and power overhead, and at the same time, it can scale its power based on the load current since the HS pulse frequency is proportional to the load power.
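A rough feel for the sensing mechanism can be obtained from $t = C_{\rm PRE} V_{\rm DT} / I_{\rm DT}$ with $I_{\rm DT} = I_{\rm LS}/375$. The sketch below assumes a constant discharge current, and the capacitor value, pre-charge voltage, and load-side current are hypothetical. The `update_delay_code` function is likewise only an assumed bang-bang policy for the digital counter (a larger code is taken to mean a longer deadtime); the text does not specify the counter's actual update rule.

```python
MIRROR_RATIO = 375  # width ratio M2 : M3, from the text


def sense_current(i_ls):
    """Current through M3 for a given current through M2."""
    return i_ls / MIRROR_RATIO


def discharge_time(c_pre, v_pre, i_ls):
    """Time to discharge C_PRE from v_pre to GND at a constant I_DT (an idealization)."""
    return c_pre * v_pre / sense_current(i_ls)


def update_delay_code(code, v_dt_discharged, n_bits=4):
    """Assumed bang-bang update: shorten the deadtime if V_DT reached GND, else lengthen.

    The counter width and the update direction are hypothetical.
    """
    max_code = (1 << n_bits) - 1
    step = -1 if v_dt_discharged else 1
    return min(max(code + step, 0), max_code)


# Hypothetical operating point: 100 fF pre-charged to 0.5 V, 37.5 mA through M2.
t = discharge_time(100e-15, 0.5, 37.5e-3)
print(f"I_DT = {sense_current(37.5e-3) * 1e6:.1f} uA, discharge time = {t * 1e9:.2f} ns")
print("next code:", update_delay_code(7, v_dt_discharged=True))
```

Because the discharge time scales inversely with the LS current, a fixed pre-charge voltage maps to a longer tolerated negative-SWD interval at lighter inductor currents in this simplified model.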

# IV. EXPERIMENTAL RESULTS

The buck converter is fabricated in a 65-nm CMOS process. The die photograph is shown in Fig. 16 with an active area of 0.237 mm². Fig. 17 shows the testing setup. The chip is packaged and tested with a 100-pin quad flat no-lead (QFN) socket. The off-chip components are a 22-$\mu$H inductor (Coilcraft LPS5030-223MRC, 4.9 mm × 4.9 mm × 3 mm), a 4.7-$\mu$F output decoupling capacitor $C_{\rm OUT}$ (0805 package size), and a 10-$\mu$F input decoupling capacitor $C_{\rm IN}$ (0805 package size). A Keithley 6430 Sub-Femto Remote Sourcemeter is used to measure the quiescent power of all the blocks.

## A. Load-Transient Response and Fast DVFS

The measured load-transient response is shown in Fig. 18(a) and (b). By connecting the output of the buck converter to an 840-$\Omega$ off-chip load resistor, a current step can be provided to measure the load-transient response. When the load current steps from 45 nA to around 1 mA, the voltage droop is 56 mV and the settling time is 183 $\mu$s, where the sum of $T_{\text{DISCH}}$ and $T_{\text{COMP}}$ takes 173 $\mu$s and $T_{\text{UP-TRACKING}}$ takes 10 $\mu$s. As analyzed in Sections II-C and III-C, $T_{\text{DISCH}}$ and $T_{\text{COMP}}$ account for most of the settling time. After the output voltage drops below the guard band (<56 mV), the CT comparator detects the voltage drop and its output changes after a delay time. During the delay time, $V_{\text{OUT}}$ keeps decreasing. Then, after the FLTR is enabled, it takes 10 $\mu$s for the output voltage to be charged to the reference. As a comparison, when the FLTR function is disabled, even for a small 7-to-105-$\mu$A current step, the output voltage has a 400-mV droop and a 33.8-ms settling time due to the slow synchronous feedback loop, as shown in Fig. 18(c). When



Fig. 16. Chip micrograph of the buck converter.



Fig. 17. Testing and measurement setups.

the load current steps down from 1 mA to 45 nA, as shown in Fig. 18(a), there is no voltage droop observed, which matches with the analysis in Section II-C.
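The settling-time budget quoted above can be tallied in a few lines. The function name is hypothetical, and the last ratio compares two measurements taken with different load steps (45 nA to 1 mA with FLTR, 7 to 105 $\mu$A without), so it is indicative only.

```python
def settling_budget(t_detect_us, t_track_us):
    """Return the total settling time and the fraction spent before tracking starts."""
    total = t_detect_us + t_track_us
    return total, t_detect_us / total


# Measured values from the text: T_DISCH + T_COMP = 173 us, T_UP-TRACKING = 10 us.
total_us, detect_fraction = settling_budget(173.0, 10.0)
print(f"total settling time: {total_us:.0f} us")
print(f"share of T_DISCH + T_COMP: {100 * detect_fraction:.1f}%")

# FLTR disabled: 33.8 ms settling time (for a different, smaller load step).
print(f"settling-time ratio, FLTR off vs on: {33.8e3 / total_us:.0f}x")
```

The tally makes explicit that the discharge and comparator-delay phases dominate, so shrinking the guard band or raising the comparator bias would be the main levers on settling time, at the cost of power.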

The measured fast DVFS tracking is shown in Fig. 19. Through a JTAG debugger (SEGGER J-Link) and a RISC-V processor, we changed EN<sub>DVFS</sub> and SEL<sub>VREF</sub> at the same time to let the MC enable the DVFS tracking. The fast DVFS tracking achieves 10.57 µs for an 88-mV up-tracking step and 19.81 µs for a 92-mV down-tracking step. Without the fast DVFS function, the measured up-/down-tracking time increases to 923 ms and 2.65 s, respectively, due to the slow feedback loop and small discharging load current. A ring OSC, VCO<sub>LOAD</sub>, which provides the clock frequency to the loads, is used in the testing to show the frequency scaling ability. As the voltage scales down, the frequency of the ring OSC, powered by the buck converter, also decreases from 2.2 to 0.95 kHz, achieving dynamic frequency scaling. The undershoot and overshoot voltages are less than 4 mV, which is only 0.5% of $V_{\rm OUT}$ at 700 mV and is close to the measured ripple voltage.
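The tracking figures above translate into average slew rates and speedups as follows. This is plain arithmetic on the measured numbers; the helper names are hypothetical.

```python
def slew_mv_per_us(step_mv, time_us):
    """Average reference-tracking rate in mV per microsecond."""
    return step_mv / time_us


def speedup(slow_s, fast_s):
    """Ratio of tracking times without and with fast DVFS."""
    return slow_s / fast_s


up_rate = slew_mv_per_us(88.0, 10.57)
down_rate = slew_mv_per_us(92.0, 19.81)
up_gain = speedup(923e-3, 10.57e-6)
down_gain = speedup(2.65, 19.81e-6)

print(f"up-tracking:   {up_rate:.2f} mV/us, {up_gain:.2e}x faster than the slow loop")
print(f"down-tracking: {down_rate:.2f} mV/us, {down_gain:.2e}x faster than the slow loop")
print(f"ring-OSC frequency ratio: {2.2 / 0.95:.2f}x")
```

Up-tracking averages roughly 8.3 mV/µs and down-tracking roughly 4.6 mV/µs, and both are about five orders of magnitude faster than the synchronous loop alone.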

The speed of the DVFS and load-transient response is affected by the off-chip decoupling capacitance value, load current, and the energy delivery speed to the load. Smaller capacitance is helpful for achieving faster transition speed. However, it causes larger voltage ripples at the output. Larger  $V_{\rm IN}$  is desired for both reference up-tracking and down-tracking because it increases  $V_{\rm GS}$  of M5 (in Fig. 6), allowing a large discharging current. Also, higher supply increases the



Fig. 18. Measured transient waveform for FLTR. (a) FLTR waveform for an output current step from 45 nA to 1 mA. (b) Detailed waveform from (a). (c) Waveform with FLTR disabled for current step from 7 to 105  $\mu$ A.



Fig. 19. Measured transient waveform for DVFS up and down reference tracking along with the dynamic frequency scaling.

inductor peak current to increase the energy per power delivery according to (2), (3), and (6). Notably, unlike the traditional method, where the power-stage LS FET is used to discharge the current from $V_{\text{OUT}}$ to GND, this design adds an extra tunable transistor, M5, to control the discharging current amplitude. The power-stage FET can generate a large discharging current due to its large W/L ratio. For example, when $V_{\rm IN}=2~{\rm V}$ and $V_{\rm OUT}=0.8~{\rm V}$, the power-stage LS transistor can discharge a 4.7-μF capacitor from 0.8 V to 700 mV within 600 ns with a discharging current of nearly 800 mA. This requires the LATH to operate at 34 MHz at least for a $4\times$ oversampling rate if the undershoot needs to be less than 20 mV. To meet this high operating frequency requirement, the LATH needs a larger W/L and bias current for faster speed, which increases the quiescent current and dynamic power loss. Therefore, an extra transistor M5 with a small 20 $\mu$m $\times$ 60 $\mu$m area overhead is implemented to trade off speed against power. Fig. 20(a) shows the measured tracking speed versus supply voltage. When the supply voltage increases from 1.5 to 2.3 V, the up-



Fig. 20. (a) Measured DVFS tracking speed versus supply voltage changes. (b) Tradeoffs between the tracking speed and ripple voltage in terms of off-chip output capacitance.

tracking and down-tracking can achieve  $2.1\times$  and  $1.7\times$  faster speed. Fig. 20(b) shows the measured tradeoffs between the speed of the DVFS transition and the output voltage ripples across different off-chip decoupling capacitance. When the off-chip decoupling capacitor changes from 4.7 to 0.87  $\mu$ F, though the ripple voltage increases, the tracking speed achieves  $3-5\times$  improvement.
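The 800-mA and 34-MHz figures in the LS-FET discharge example follow from $I = C\,\Delta V/\Delta t$ and from requiring a fixed number of comparisons within the time it takes the output to fall by the allowed undershoot. The sketch below reproduces that arithmetic; it assumes a constant discharge current, and the function names are hypothetical.

```python
def discharge_current(c_farad, delta_v, delta_t):
    """Average current needed to move a capacitor by delta_v in delta_t."""
    return c_farad * delta_v / delta_t


def min_comparison_rate(c_farad, i_discharge, undershoot_v, oversample):
    """Comparison rate needed to take `oversample` decisions within the undershoot window."""
    window = c_farad * undershoot_v / i_discharge
    return oversample / window


C_OUT = 4.7e-6  # F, output capacitor from the text

i_ls = discharge_current(C_OUT, 0.8 - 0.7, 600e-9)
rate = min_comparison_rate(C_OUT, 0.8, 20e-3, 4)

print(f"discharge current: {i_ls * 1e3:.0f} mA")
print(f"required LATH rate at 0.8 A: {rate / 1e6:.1f} MHz")
```

With the rounded 800-mA current, a 20-mV undershoot window lasts about 118 ns, so four comparisons per window call for roughly 34 MHz, in line with the figure quoted in the text.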

## B. Power Breakdown and Efficiency

Fig. 21 shows the simulated power loss breakdown at light load (25 nW) and heavy load (10 $\mu$W). The control power of the circuits takes only 7% of the total power loss at the heavy load. When the output power decreases to 25 nW, the control power increases to 46% of the total power loss



Fig. 21. Simulated power loss breakdown at light load and heavy load with three categories: conduction loss, switching loss, and control power.



Fig. 22. Measured quiescent power across  $V_{\rm IN}$  from 1.5 to 2.3 V for all the subblocks and power breakdown at  $V_{\rm IN}=1.5$  V.

due to the infrequent power delivery operation. If the output power decreases further, the proportion of switching loss and conduction loss decreases as well, since fewer power delivery operations are needed, and the control power accounts for a larger share of the total power loss. Therefore, decreasing the quiescent power of the circuit is critical for light-load efficiency improvement.
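The weight of quiescent power at light load can be illustrated with a simple ceiling, $\eta \le P_{\rm OUT}/(P_{\rm OUT} + P_Q)$, obtained by treating the quiescent power as the only loss. This is a deliberately optimistic sketch: conduction and switching losses are ignored, and the control power is assumed to stay at the 802-pW quiescent level reported for this converter even though the clock runs faster under load.

```python
def efficiency_ceiling(p_out_w, p_q_w):
    """Upper bound on efficiency when quiescent power is the only loss."""
    if p_out_w < 0 or p_q_w < 0:
        raise ValueError("powers must be non-negative")
    if p_out_w == 0:
        return 0.0
    return p_out_w / (p_out_w + p_q_w)


P_Q = 802e-12  # W, measured quiescent power at 1.5-V V_IN

for p_out in (0.5e-9, 4.3e-9, 25e-9, 10e-6):
    ceiling = efficiency_ceiling(p_out, P_Q)
    print(f"P_OUT = {p_out * 1e9:9.1f} nW -> ceiling {100 * ceiling:.1f}%")
```

Even under this idealization the ceiling at a few nanowatts of output sits only in the mid-80% range, which shows why each additional picowatt of control power matters at the bottom of the load range.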

The quiescent power of the buck system is measured when there is no output power and the system clock frequency is modulated to its lowest value, 14.7 Hz. Fig. 22 shows the measured quiescent power across different values of  $V_{\rm IN}$  for different sub-blocks and the lowest quiescent power for the buck converter is 802 pW at 1.5-V  $V_{\rm IN}$ . The pie chart shows the detailed power breakdown of the buck converter. Due to the length split power stage, clock driver optimization, and reusing control paths, the buck converter control, clock, and



Fig. 23. Measured power efficiency across different values of  $V_{\rm IN}$  and  $V_{\rm OUT}$ .

power stage in sum take 43% of the total quiescent power, while the BG takes 52% of the total power. Fig. 23 shows the measured power efficiency across different values of  $V_{\rm IN}$  and  $V_{\rm OUT}$ , with a 93% peak efficiency. The buck converter achieves a measured dynamic range from 0.5 nW to 2.75 mW, which provides over six orders of magnitude and still keeps an 80% efficiency down to 4.3-nW output power.
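The dynamic-range and quiescent-power-share figures above can be reproduced with a short calculation. The helper names are hypothetical; the inputs are the measured values quoted in the text.

```python
import math


def dynamic_range(p_max_w, p_min_w):
    """Return the load-power ratio and its size in decades."""
    ratio = p_max_w / p_min_w
    return ratio, math.log10(ratio)


def share_pw(total_pw, fraction):
    """Absolute power corresponding to a fractional share of the total."""
    return total_pw * fraction


ratio, decades = dynamic_range(2.75e-3, 0.5e-9)
print(f"dynamic range: {ratio:.2e} ({decades:.2f} decades)")

TOTAL_PW = 802.0  # measured quiescent power in pW
print(f"BG share (52%): {share_pw(TOTAL_PW, 0.52):.0f} pW")
print(f"control + clock + power stage (43%): {share_pw(TOTAL_PW, 0.43):.0f} pW")
```

The ratio of 5.5 × 10<sup>6</sup> corresponds to about 6.7 decades, consistent with the "over six orders of magnitude" statement.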

## C. Comparison to State of the Arts

Table I compares this work with state-of-the-art picowatt–nanowatt dc-dc converters, including the inductor-, SC-, and LDO-based converters. Compared with the LDO- and SC-based converters, our converter achieves higher efficiency and a wider dynamic range. However, in terms of area, the LDO and SC converters usually require only one off-chip decoupling capacitor at the output [11], [12] or can be free of off-chip components [31]. An inductor-based dc-dc converter needs an extra microhenry-level off-chip inductor along with microfarad-level decoupling capacitors at both the input and output for ULP applications, as analyzed in Section II-B. Nevertheless, the volume of the off-chip capacitors can be scaled to the <1 mm<sup>3</sup> level for most commercial products, for example, 10-µF ceramic capacitors with a 0402 footprint (1.00 mm $\times$ 0.50 mm) [32]. Also, the inductor (47 $\mu$H) can reach a sub-cm<sup>3</sup> size [33]. Therefore, the proposed inductor-based sub-nW dc-dc converter is still one of the best candidates for cm<sup>3</sup>-level ULP IoT applications due to its high efficiency, wide dynamic output range, and ultra-low quiescent power.

Fig. 24 benchmarks this work against prior nanowatt- and picowatt-level dc-dc converters in terms of dynamic power range and quiescent power with a consideration of FLTR and DVFS tracking capability. Our buck converter achieves the highest peak efficiency and widest dynamic range among all the sub-nW SVRs. Due to the hybrid loop control scheme, the buck converter also features fast DVFS and FLTR which previous work does not support. In addition, the buck converter integrates all the features on chip, including the proposed ADTC, PoR, BG, and clock generator. All the measured performance and integrated features make the buck converter well suited as the power management solution for ULP IoT applications.

|                                                                     | [12]<br>JSSC'17                           | [30]<br>ISSCC'16                  | [11]<br>ISSCC'19                       | [14]<br>JSSC'14                | [16]<br>JSSC'16                   | [23]<br>JSSC'16                  | [15]<br>JSSC'17                     | [17]<br>JSSC'18                      | This Work                              |
|---------------------------------------------------------------------|-------------------------------------------|-----------------------------------|----------------------------------------|--------------------------------|-----------------------------------|----------------------------------|-------------------------------------|--------------------------------------|----------------------------------------|
| Technology (nm)                                                     | 65                                        | 180                               | 65                                     | 180                            | 130                               | 65                               | 65                                  | 65                                   | 65                                     |
| Topology                                                            | SC(Boost)                                 | SC(BUCK)                          | DLDO                                   | SVR(Boost)                     | SVR(Buck-Boost)                   | SVR(Buck)                        | SVR(Buck)                           | SVR(Boost)                           | SVR(Buck)                              |
| Regulation Scheme                                                   | ACRM                                      | Binary-Reconfig                   | ABS-SLS                                | PWM                            | PFM                               | PWM/PFM/AM                       | PFM                                 | PFM                                  | PFM/LM/PWM                             |
| L (μH) / C <sub>OUT</sub> (μF)                                      | N/A                                       | N/A                               | N/A/0.1                                | 47/0.2                         | 47/NR                             | 4.7/4.7                          | 47/0.35                             | 47/1                                 | 22/4.7                                 |
| V <sub>IN</sub> (V)                                                 | 0.25-0.65                                 | 0.9-4                             | 0.5-1                                  | 0.02-0.07                      | 2.9-4.1                           | 0.55-1.0                         | 1.2-3.3                             | 0.08-0.8*                            | 1.5-2.3                                |
| V <sub>OUT</sub> (V)                                                | 3.8-4                                     | 0.6/1.2/3.3                       | 0.4-0.95                               | 1                              | 0.8-1.1                           | 0.35-0.5                         | 0.7-0.9                             | 0.7-1.1                              | 0.5-0.9                                |
| Quiescent Power (nW)                                                | 0.015                                     | 4.6*                              | 0.373                                  | 0.544                          | 3.2                               | 50.7*                            | 0.24                                | 1.05                                 | 0.802                                  |
| Peak Efficiency (%)                                                 | 50                                        | 81                                | V <sub>OUT</sub> /V <sub>IN</sub>      | 53                             | 87                                | 92                               | 92                                  | 78                                   | 93                                     |
| Dynamic Range<br>(P <sub>LOAD, MAX</sub> / P <sub>LOAD, MIN</sub> ) | 45.2pW - 0.6µW*<br>(1.3×10 <sup>4</sup> ) | 5nW-500μW<br>(1×10 <sup>5</sup> ) | 0.32-121.5µW<br>(3.8×10 <sup>5</sup> ) | 0.544-4nW<br>(7.4)             | 10nW-1μW*<br>(1×10 <sup>2</sup> ) | 45nW-9mW<br>(2×10 <sup>5</sup> ) | 0.4nW-0.8mW<br>(2×10 <sup>6</sup> ) | 0.12-160nW<br>(1.3×10 <sup>3</sup> ) | 0.5nW-2.75mW<br>(5.5×10 <sup>6</sup> ) |
| Dynamic Range<br>with η > 80%                                       | No                                        | No                                | 3.6nW-121.5µW                          | N/A                            | 40nW-1μW*                         | 13.5µW-9mW                       | 0.9nW-0.8mW*                        | N/A                                  | 4.3nW-2.75mW                           |
| Fast Load response                                                  | No                                        | Yes                               | Yes                                    | No                             | No                                | Yes                              | No                                  | No                                   | Yes                                    |
| Voltage droop @<br>ΔI <sub>LOAD</sub>                               | N/A                                       | 160mV@1µA                         | 76.5mV@0.27mA                          | 40mV*@N/R                      | N/A                               | N/A                              | 30mV*@0.4mA                         | N/A                                  | 56mV@1mA                               |
| Settling Time @ ΔI <sub>LOAD</sub>                                  | N/A                                       | >20ms@1µA                         | 48µs@0.27mA                            | >40s@N/R                       | N/A                               | N/A                              | 2ms@0.4mA                           | N/A                                  | 0.183ms@1mA                            |
| Fast DVFS<br>(T <sub>UP</sub> / T <sub>DOWN</sub> )                 | No                                        | No                                | No                                     | No                             | No                                | No                               | No                                  | No                                   | Yes<br>(6μs/10.76μs <sup>†</sup> )     |
| Off-chip Components                                                 | 1 cap                                     | N/A                               | 1 cap                                  | 2 caps + inductor<br>+ antenna | 2 caps + inductor                 | 1 cap + inductor                 | 2 caps + inductor                   | 3 caps + inductor                    | 2 caps +<br>inductor                   |

TABLE I

COMPARISON WITH STATE-OF-THE-ART PICOWATT–NANOWATT DC–DC CONVERTERS

\*Observed/Calculated from the waveform †For a 50mV reference tracking step



Fig. 24. Comparison with the existing nanowatt dc-dc converters in terms of dynamic power range and quiescent power with a consideration of FLTR and DVFS tracking capability.

## V. CONCLUSION

To achieve high efficiency, low quiescent power, and wide dynamic range while supporting fast DVFS and FLTR, we presented a sub-nW buck converter with hybrid synchronous and asynchronous loop control. The synchronous loop achieves frequency modulation and output regulation with high efficiency, and the asynchronous loop generates the maximal timing frequency to achieve FLTR and fast DVFS. By reusing the existing signals and blocks and multiplexing the DVFS up-tracking and FLTR paths, the two control loops achieve a 248-pW power overhead in total. The power stage and gate drivers are analyzed and optimized using the length split technique and the SuWd scheme to achieve higher efficiency and lower leakage. The clock driving stage is optimized to suppress the short-circuit current, achieving 15-pW lowest power and a 20-Hz-to-78.8-kHz frequency range. An ADTC with load-proportional scaling is proposed to enhance the efficiency. The measurement results show that this buck converter achieves an 802-pW quiescent power at 1.5-V $V_{\rm IN}$, 93% peak efficiency, and a $5.5 \times 10^6$ dynamic range with fast DVFS and FLTR. This buck converter offers a well-suited power management solution for powering ULP IoT SoCs.

# REFERENCES

- [1] Energizer. *CR1025-Lithium Coin Battery*. Accessed: Nov. 2021. [Online]. Available: https://data.energizer.com/pdfs/cr1025.pdf
- [2] I. Lee, W. Lim, A. Teran, J. Phillips, D. Sylvester, and D. Blaauw, "A >78%-efficient light harvester over 100-to-100 klux with reconfigurable PV-cell network and MPPT circuit," in *IEEE ISSCC Dig. Tech. Papers*, Feb. 2016, pp. 370–371.
- [3] D. S. Truesdell et al., "A 6–140-nW 11 Hz–8.2-kHz DVFS RISC-V microprocessor using scalable dynamic leakage-suppression logic," *IEEE Solid-State Circuits Lett.*, vol. 2, no. 8, pp. 57–60, Aug. 2019.
- [4] P. Prabhat et al., "27.2 M0N0: A performance-regulated 0.8-to-38 MHz DVFS ARM cortex-M33 SIMD MCU with 10 nW sleep power," in *IEEE ISSCC Dig. Tech. Papers*, Feb. 2020, pp. 422–424.
- [5] T. D. Burd, T. A. Pering, A. J. Stratakos, and R. W. Brodersen, "A dynamic voltage scaled microprocessor system," *IEEE J. Solid-State Circuits*, vol. 35, no. 11, pp. 1571–1580, Nov. 2000.
- [6] L. Lin, S. Jain, and M. Alioto, "Integrated power management for battery-indifferent systems with ultra-wide adaptation down to nW," *IEEE J. Solid-State Circuits*, vol. 55, no. 4, pp. 967–976, Apr. 2020.
- [7] D. Bol et al., "19.6 A 40-to-80 MHz sub-4µW/MHz ULV Cortex-M0 MCU SoC in 28 nm FDSOI with dual-loop adaptive back-bias generator for 20µs wake-up from deep fully retentive sleep mode," in *IEEE ISSCC Dig. Tech. Papers*, Feb. 2019, pp. 322–324.
- [8] J. Myers, A. Savanth, D. Howard, R. Gaddh, P. Prabhat, and D. Flynn, "8.1 An 80 nW retention 11.7 pJ/cycle active subthreshold ARM cortex-M0+ subsystem in 65 nm CMOS for WSN applications," in *IEEE ISSCC Dig. Tech. Papers*, Feb. 2015, pp. 1–3.
- [9] F. M. Yaul and A. P. Chandrakasan, "A 10 bit SAR ADC with data-dependent energy reduction using LSB-first successive approximation," *IEEE J. Solid-State Circuits*, vol. 49, no. 12, pp. 2825–2834, Dec. 2014.
- [10] I. Lee, D. Sylvester, and D. Blaauw, "A constant energy-per-cycle ring oscillator over a wide frequency range for wireless sensor nodes," *IEEE J. Solid-State Circuits*, vol. 51, no. 3, pp. 697–711, Mar. 2016.
- [11] S. Li and B. H. Calhoun, "A 745 pA hybrid asynchronous binary-searching and synchronous linear-searching digital LDO with 3.8×10<sup>5</sup> dynamic load range, 99.99% current efficiency, and 2 mV output voltage ripple," in *IEEE ISSCC Dig. Tech. Papers*, Feb. 2019, pp. 232–234.
- [12] X. Wu et al., "A 20-pW discontinuous switched-capacitor energy harvester for smart sensor applications," *IEEE J. Solid-State Circuits*, vol. 52, no. 4, pp. 972–984, Apr. 2017.

- [13] H. Lee et al., "A sub-nW fully integrated switched-capacitor energy harvester for implantable applications," in *Proc. IEEE 44th Eur. Solid State Circuits Conf. (ESSCIRC)*, Sep. 2018, pp. 50–53.
- [14] S. Bandyopadhyay, P. P. Mercier, A. C. Lysaght, K. M. Stankovic, and A. P. Chandrakasan, "A 1.1 nW energy-harvesting system with 544 pW quiescent power for next-generation implants," *IEEE J. Solid-State Circuits*, vol. 49, no. 12, pp. 2812–2824, Dec. 2014.
- [15] A. Paidimarri and A. P. Chandrakasan, "A wide dynamic range buck converter with sub-nW quiescent power," *IEEE J. Solid-State Circuits*, vol. 52, no. 12, pp. 3119–3131, Dec. 2017.
- [16] D. El-Damak and A. P. Chandrakasan, "A 10 nW–1 μW power management IC with integrated battery management and self-startup for energy harvesting applications," *IEEE J. Solid-State Circuits*, vol. 51, no. 4, pp. 943–954, Apr. 2016.
- [17] K. R. Sadagopan, J. Kang, Y. Ramadass, and A. Natarajan, "A cm-scale 2.4-GHz wireless energy harvester with nanowatt boost converter and antenna-rectifier resonance for WiFi powering of sensor nodes," *IEEE J. Solid-State Circuits*, vol. 53, no. 12, pp. 3396–3406, Dec. 2018.
- [18] X. Liu, C. Huang, and P. K. T. Mok, "A high-frequency three-level buck converter with real-time calibration and wide output range for fast-DVS," *IEEE J. Solid-State Circuits*, vol. 53, no. 2, pp. 582–595, Feb. 2018.
- [19] J.-G. Kang, M.-G. Jeong, J. Park, and C. Yoo, "A 10 MHz time-domain-controlled current-mode buck converter with 8.5% to 93% switching duty cycle," in *IEEE ISSCC Dig. Tech. Papers*, Feb. 2018, pp. 424–426.
- [20] S. Pan and P. K. T. Mok, "A 10-MHz hysteretic-controlled buck converter with single on/off reference tracking using turning-point prediction for DVFS application," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 66, no. 11, pp. 4502–4515, Nov. 2019.
- [21] S.-H. Chien, T.-H. Hung, S.-Y. Huang, and T.-H. Kuo, "A monolithic capacitor-current-controlled hysteretic buck converter with transient-optimized feedback circuit," *IEEE J. Solid-State Circuits*, vol. 50, no. 11, pp. 2524–2532, Nov. 2015.
- [22] X. Liu, S. Li, and B. H. Calhoun, "An 802 pW 93% peak efficiency buck converter with 5.5×10<sup>6</sup> dynamic range featuring fast DVFS and asynchronous load-transient control," in *Proc. IEEE 47th Eur. Solid State Circuits Conf. (ESSCIRC)*, Sep. 2021, pp. 347–350.
- [23] P.-H. Chen, C.-S. Wu, and K.-C. Lin, "A 50 nW-to-10 mW output power tri-mode digital buck converter with self-tracking zero current detection for photovoltaic energy harvesting," *IEEE J. Solid-State Circuits*, vol. 51, no. 2, pp. 523–532, Feb. 2016.
- [24] Y. Lu, "Digitally assisted low dropout regulator design for low duty cycle IoT applications," in *Proc. IEEE Asia Pacific Conf. Circuits Syst. (APCCAS)*, Oct. 2016, pp. 33–36.
- [25] Y. Park et al., "A design of a 92.4% efficiency triple mode control DC–DC buck converter with low power retention mode and adaptive zero current detector for IoT/wearable applications," *IEEE Trans. Power Electron.*, vol. 32, no. 9, pp. 6946–6960, Sep. 2017.
- [26] M. C. Johnson, D. Somasekhar, and K. Roy, "Models and algorithms for bounds on leakage in CMOS circuits," *IEEE Trans. Comput.-Aided Design Integr. Circuits Syst.*, vol. 18, no. 6, pp. 714–725, Jun. 1999.
- [27] H. Chen et al., "A single-inductor dual-output converter with the stacked MOSFET driving technique for low quiescent current and cross regulation," *IEEE Trans. Power Electron.*, vol. 34, no. 3, pp. 2758–2770, Mar. 2019.
- [28] W. Qiu, S. Mercer, Z. Liang, and G. Miller, "Driver deadtime control and its impact on system stability of synchronous buck voltage regulator," *IEEE Trans. Power Electron.*, vol. 23, no. 1, pp. 163–171, Jan. 2008.
- [29] S. S. Amin and P. P. Mercier, "MISIMO: A multi-input single-inductor multi-output energy harvesting platform in 28-nm FDSOI for powering net-zero-energy systems," *IEEE J. Solid-State Circuits*, vol. 53, no. 12, pp. 3407–3419, Dec. 2018.
- [30] W. Jung et al., "A 60%-efficiency 20 nW–500 μW tri-output fully integrated power management unit with environmental adaptation and load-proportional biasing for IoT systems," in *IEEE ISSCC Dig. Tech. Papers*, Feb. 2016, pp. 154–155.
- [31] W. Jung et al., "A 3 nW fully integrated energy harvester based on self-oscillating switched-capacitor DC–DC converter," in *IEEE ISSCC Dig. Tech. Papers*, Feb. 2014, pp. 398–399.
- [32] Kyocera AVX. 04026D106MAT2A—Ceramic Capacitors. Accessed: Feb. 2022. [Online]. Available: https://www.digikey.com/en/products/detail/kyocera-avx/04026D106MAT2A/6564251
- [33] Coilcraft. LPS4018—Shielded Power Inductors. Accessed: Feb. 2022. [Online]. Available: https://www.mouser.com/datasheet/2/597/lps4018-270702.pdf



**Xinjian Liu** (Graduate Student Member, IEEE) received the B.Eng. degree in microelectronics from Fudan University, Shanghai, China, in 2019. He is currently pursuing the Ph.D. degree in electrical engineering with the University of Virginia, Charlottesville, VA, USA.

His research interests include low-power dc-dc converters, power management unit design, and Internet-of-Things healthcare system prototyping.



**Benton H. Calhoun** (Fellow, IEEE) received the B.S. degree from the University of Virginia, Charlottesville, VA, USA, in 2000, and the M.S. and Ph.D. degrees from the Massachusetts Institute of Technology, Cambridge, MA, USA, in 2002 and 2006, respectively, all in electrical engineering.

In January 2006, he joined the Department of Electrical and Computer Engineering, University of Virginia, where he is currently a Professor. His research has emphasized energy-efficient and subthreshold circuit design for self-powered, batteryless wireless sensing systems. Starting from fundamental advances in subthreshold circuits, he has expanded his work to include complete self-powered nodes for the Internet-of-Things (IoT) and body-worn applications. He is the Campus Director and the Technical Thrust Leader of the NSF Nanosystems Engineering Research Center (ERC) for Advanced Self-Powered Systems of Integrated Sensors and Technologies (ASSIST). He co-founded and is the co-CTO at Everactive, Inc., Charlottesville, which is selling self-powered, energy-harvesting wireless sensing solutions in the industrial IoT market. He is a coauthor of *Sub-threshold Design for Ultra-Low-Power Systems* (Springer, 2006) and an author of *Design Principles for Digital CMOS Integrated Circuit Design* (NTS Press, 2012). He has over 200 peer-reviewed publications and holds 22 issued U.S. patents that contribute to the field of energy-efficient circuits and systems for self-powered and energy-constrained applications. His research interests include self-powered wireless sensors for the IoT, batteryless systems, body area sensor networks, low-power digital circuit design, system-on-chip architecture and circuits for energy-constrained applications, system-driven embedded hardware/software design, wakeup receivers, energy-harvesting power management units, subthreshold digital circuits, subthreshold SRAM, energy-efficient communication, power harvesting and delivery circuits, low-power mixed-signal design, and medical applications for low-energy electronics.



**Shuo Li** (Member, IEEE) received the B.Eng. degree in microelectronics from the University of Electronic Science and Technology of China, Chengdu, China, in 2013, the M.S. degree in microelectronics from Fudan University, Shanghai, China, in 2016, and the Ph.D. degree in electrical engineering from the University of Virginia, Charlottesville, VA, USA, in 2021.

He is currently a Post-Doctoral Research Associate with the Coordinated Science Laboratory, University of Illinois at Urbana–Champaign, Urbana, IL, USA.

His research interests include analog/digital/mixed-signal integrated circuits and systems, energy harvesting and power management units, dc-dc converters, energy-efficient machine-learning accelerators, in-memory computing techniques, and ultra-low-power systems-on-chip for the Internet-of-Things applications.

Dr. Li was a recipient of the IEEE International Symposium on Circuits and Systems (ISCAS) Best Paper Award in 2020 and a winner of the IEEE SSCS 2019–2020 International Student Circuit Contest. He also serves as a Reviewer for IEEE JOURNAL OF SOLID-STATE CIRCUITS, IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS I: REGULAR PAPERS, and IEEE TRANSACTIONS ON BIOMEDICAL CIRCUITS AND SYSTEMS.