# A 1.3µW, 5pJ/cycle Sub-threshold MSP430 Processor in 90nm xLP FDSOI for Energy-efficient IoT Applications

Abhishek Roy<sup>1</sup>, Peter J. Grossmann<sup>2</sup>, Steven A. Vitale<sup>2</sup>, Benton H. Calhoun<sup>1</sup>

<sup>1</sup>University of Virginia, Charlottesville, VA USA

<sup>2</sup>MIT Lincoln Laboratory, Lexington, MA USA

Email: {ar9ch, bcalhoun}@virginia.edu, {grossmann, steven.vitale}@ll.mit.edu

### Abstract

This paper presents an implementation of a 16-bit MSP430 processor for ultra-low-power (ULP) systems such as battery-less wireless sensor nodes, biomedical devices, and other IoT applications. Implemented in a custom extremely low power (xLP) 90nm FDSOI process, the processor consumes $1.3\mu$W at 0.4V while executing a peak detection algorithm at 250 kHz. It supports the standard MSP430 instruction set architecture (ISA) and demonstrates QRS peak detection for an electrocardiogram (ECG) application. The measured energy while executing peak detection at 250 kHz and 0.4V was 5pJ per cycle. The fabricated xLP devices show a 55% reduction in threshold voltage (V<sub>th</sub>) variation compared to similar-sized transistors in a traditional FDSOI process.

### Keywords

Energy-Efficiency, Subthreshold processors, FDSOI, wireless sensor nodes, internet-of-things

## 1. Introduction

The increasing focus on IoT-specific applications, such as wearable sensors, portable biomedical electronics (e.g., ECG monitors), and self-sustaining surveillance systems, demands energy-efficient system operation. Because these systems require a small form factor and largely self-powered, near-perpetual operation over a long lifetime, they are severely energy-constrained. Within this limited energy budget, they must still run application-specific programs and subroutines such as ECG monitoring [3][5]. Energy-efficient processing at both the circuit and the system level is therefore essential to minimize the energy per operation. Prior work has reported systems and processors consuming nW to $\mu W$ power levels by operating near the threshold voltage ($V_{th}$) of the transistor [1][2][3][4][5].

Operating a digital circuit in the subthreshold regime makes transistor leakage a dominant source of energy consumption: gate delays grow exponentially as the supply voltage drops, so leakage current is integrated over ever-longer clock periods. Prior work such as [2] has proposed digital logic styles that suppress the subthreshold leakage of conventional bulk devices. Because leakage dominates, optimizing the leakage characteristics of the device itself can also yield significant benefits at the system level. However, low-voltage transistor operation presents four key challenges: 1) minimizing the subthreshold swing to achieve maximum ON current below threshold, 2) minimizing static leakage current, 3) minimizing $V_{th}$ variation, and 4) minimizing device capacitances. The extremely low power (xLP) FDSOI process provides CMOS transistors optimized for lower subthreshold leakage and reduced $V_{th}$ variation, with minimal degradation in performance.
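The leakage argument can be made concrete with the textbook first-order model of energy per cycle, $E = E_{dyn} + E_{leak}$, where $E_{leak} = I_{leak} V_{DD} T_{clk}$ and $T_{clk}$ grows exponentially as $V_{DD}$ falls below $V_{th}$. The sketch below is a minimal illustration under assumed, hypothetical parameters (not xLP FDSOI data): it sweeps $V_{DD}$ and locates the minimum-energy supply voltage, below which leakage energy dominates.

```python
import math

# Illustrative parameters only (assumed, not measured xLP FDSOI values).
V_T = 0.0259       # thermal voltage kT/q at about 300 K (V)
N_SLOPE = 1.5      # subthreshold slope factor n (assumed)
VTH = 0.35         # threshold voltage (V, assumed)
I0 = 1e-6          # drain current at Vgs = Vth (A, assumed)
C_GATE = 2e-15     # load capacitance per gate (F, assumed)
N_GATES = 5000     # gates in the design (assumed)
DEPTH = 50         # logic depth of the critical path (assumed)
ACTIVITY = 0.05    # fraction of gates switching per cycle (assumed)


def on_current(vdd: float) -> float:
    """Subthreshold drive current of a gate with Vgs = Vdd."""
    return I0 * math.exp((vdd - VTH) / (N_SLOPE * V_T))


def off_current() -> float:
    """Leakage current of a gate with Vgs = 0 (DIBL ignored)."""
    return I0 * math.exp(-VTH / (N_SLOPE * V_T))


def cycle_time(vdd: float) -> float:
    """Critical-path delay: DEPTH stages, each charging C_GATE to Vdd."""
    return DEPTH * C_GATE * vdd / on_current(vdd)


def energy_per_cycle(vdd: float) -> tuple[float, float]:
    """Return (dynamic, leakage) energy per clock cycle in joules."""
    e_dyn = ACTIVITY * N_GATES * C_GATE * vdd ** 2
    e_leak = N_GATES * off_current() * vdd * cycle_time(vdd)
    return e_dyn, e_leak


voltages = [round(0.20 + 0.01 * i, 2) for i in range(41)]  # 0.20 V .. 0.60 V
totals = [sum(energy_per_cycle(v)) for v in voltages]
v_opt = voltages[totals.index(min(totals))]
print(f"minimum-energy supply ~ {v_opt:.2f} V")
```

In this simplified model `I0` and `VTH` cancel out of the leakage energy, so the location of the minimum is set by the ratio of logic depth to switching activity; real devices add DIBL and temperature effects that the sketch ignores.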

In this paper, we implement a 16-bit MSP430 processor [8] for subthreshold operation in diverse IoT applications in the 90nm xLP FDSOI process using logic synthesis and automatic place-and-route (APR) tools. A library of logic gates and sequential circuits such as flip-flops and latches was characterized to operate at 0.36V, and timing closure was achieved at 200 kHz using static timing analysis (STA) tools. Silicon measurements show an energy consumption of 5pJ/cycle at 0.4V while the processor runs a QRS peak detection algorithm on ECG data at 250 kHz. This paper is organized as follows: Section 2 describes the xLP FDSOI process and the devices used in the MSP430 processor implementation. Section 3 describes the overall processor architecture, including processor operation and functional waveforms. Section 4 presents the measurement results, and Section 5 concludes the paper.

## 2. xLP FDSOI Process and Device Description

The custom low power 90nm (xLP) FDSOI process technology is optimized for near- and subthreshold operation [6].



Fig 1: Cross-Sectional comparison of a standard PDSOI transistor and the xLP FDSOI transistor

Fig. 1 shows a schematic of the xLP FDSOI transistor and compares it against a typical commercial 90nm PDSOI transistor. xLP transistors are fabricated using 30-nm Si on 145-nm BOX. The gate dielectric is 3.5-nm SiON. Minimum gate lengths of 90 nm and gate widths of 120 nm are supported. Device engineering includes a 1020 °C, 5 s rapid thermal anneal, ~10 nm of CoSi<sub>2</sub>, and a 20-min, 400 °C hydrogen passivation anneal. The back-end consists of 5 metal layers of aluminum interconnect with SiO<sub>2</sub> dielectric, and interconnect widths as small as 140nm are supported. Near-ideal subthreshold swing is obtained by using moderately thin FDSOI and maintaining gate lengths of 90 nm and longer. Eliminating channel doping reduces the threshold voltage variation caused by non-uniformity in SOI thickness and by random dopant fluctuations. The threshold voltage, and thus the leakage current, of the transistors is set by a work-function-tuned TiN metal gate, deposited with a custom plasma-enhanced atomic layer deposition (PE-ALD) process developed for this purpose. PE-ALD TiN causes less plasma damage than typical sputtered TiN metal gates, resulting in lower gate leakage and less device-to-device variation [7]. By eliminating the source/drain extensions and employing wide nitride spacers, the xLP technology reduces device capacitances by 76% compared to commercial FDSOI technology [6]. Fig. 2 shows the $I_{ds}$-$V_{gs}$ characteristics of 8$\mu$m-wide, 150nm-long xLP FDSOI NMOS and PMOS devices. The inset shows a TEM image of a 150nm-long device.
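Near-ideal subthreshold swing matters because subthreshold current changes by one decade for every SS of gate voltage. The sketch below computes the swing from $SS = n (kT/q) \ln 10$, where $n \geq 1$ is the body factor and $n = 1$ is the ideal limit; the function names are hypothetical and the $n$ values discussed are illustrative, not measured xLP numbers.

```python
import math

K_B = 1.380649e-23      # Boltzmann constant (J/K)
Q_E = 1.602176634e-19   # elementary charge (C)


def subthreshold_swing_mv_per_decade(temp_k: float, body_factor: float = 1.0) -> float:
    """SS = n * (kT/q) * ln(10), returned in mV/decade."""
    return body_factor * (K_B * temp_k / Q_E) * math.log(10) * 1e3


def current_ratio(delta_v_mv: float, ss_mv: float) -> float:
    """Factor by which subthreshold current changes for a gate-voltage step."""
    return 10 ** (delta_v_mv / ss_mv)


for n in (1.0, 1.3):  # ideal vs. an illustrative non-ideal body factor
    ss = subthreshold_swing_mv_per_decade(300.0, n)
    print(f"n = {n}: SS = {ss:.1f} mV/dec, 0.4 V swing spans "
          f"{math.log10(current_ratio(400.0, ss)):.1f} decades")
```

At 300 K the ideal limit is about 59.5 mV/decade, so with $n = 1$ a 0.4 V gate swing spans about 6.7 decades of current, whereas $n = 1.3$ (about 77 mV/decade) spans only about 5.2. At a 0.4 V supply, every mV/decade of swing directly costs ON/OFF ratio.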



Fig 2: Ids-Vgs characteristics of FDSOI devices

## 3. MSP430 Processor Architecture

The openMSP430 is an open-source implementation of the MSP430 architecture available from opencores.org [8]. It is a 16-bit RISC microcontroller based on the von Neumann architecture, with a single address space for instructions and data, and is compatible with the MSP430 microcontroller family from Texas Instruments [9]. The core supports a 16×16 multiplier, a watchdog, and a UART debug interface using standard RS232 serial communication. Fig. 3 shows the overall architecture of the openMSP430. The UART module in the standard debug interface (SDI) provides 8N1 serial communication with a host computer and allows the processor to be programmed serially [8].
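8N1 denotes the common UART framing of one start bit, eight data bits sent least-significant bit first, no parity bit, and one stop bit. A minimal sketch of that framing follows; it models only the bit-level frame on the serial line (which idles high), not the command set that the openMSP430 debug interface layers on top, and the function names are hypothetical.

```python
def frame_8n1(byte: int) -> list[int]:
    """Line levels for one 8N1 frame: start bit, 8 data bits (LSB first), stop bit."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("an 8N1 frame carries exactly one byte")
    data_bits = [(byte >> i) & 1 for i in range(8)]  # least-significant bit first
    return [0] + data_bits + [1]                     # start = 0 (space), stop = 1 (mark)


def deframe_8n1(bits: list[int]) -> int:
    """Recover the byte from a 10-bit frame, checking start and stop bits."""
    if len(bits) != 10 or bits[0] != 0 or bits[9] != 1:
        raise ValueError("framing error")
    return sum(bit << i for i, bit in enumerate(bits[1:9]))


print(frame_8n1(0x41))  # ASCII 'A'
```

Since each byte costs 10 bit periods on the line, the payload rate is one tenth of the baud rate; at 9600 baud, for example, at most 960 bytes/s reach the host.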



Fig 3: OpenMSP430 Architecture [8]

The MSP430 architecture from OpenCores supports 1kB of program memory (PMEM) and 128 bytes of data memory (DMEM) [8], both of which can be accessed through the SDI. The frontend module fetches each 16-bit instruction from the PMEM and decodes it, and the execution unit, comprising the ALU and the register file, executes the decoded instruction. The memory backbone arbitrates access to the PMEM and DMEM among the frontend, the execution unit, and the SDI [8]. The architecture also provides 512B of address space for peripherals, which include a basic clock module, a hardware multiplier, special function registers (SFRs), and a watchdog unit [9].
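The address ranges implied by this paragraph can be sketched as a simple decoder, the kind of comparison the memory backbone performs when routing a 16-bit byte address. The region sizes come from the text; the base addresses are assumptions based on the usual MSP430-family layout (peripherals from 0x0000, data RAM from 0x0200, program memory ending at 0xFFFF) and are not stated in the paper. The function name is hypothetical.

```python
PER_SIZE = 512                    # bytes of peripheral space (from the text)
DMEM_SIZE = 128                   # bytes of data memory (from the text)
PMEM_SIZE = 1024                  # bytes of program memory (from the text)
DMEM_BASE = 0x0200                # assumed: MSP430-family RAM start
PMEM_BASE = 0x10000 - PMEM_SIZE   # assumed: program memory ends at 0xFFFF


def decode(addr: int) -> tuple[str, int]:
    """Map a 16-bit byte address to (region, offset within that region)."""
    if not 0 <= addr <= 0xFFFF:
        raise ValueError("MSP430 addresses are 16 bits")
    if addr < PER_SIZE:
        return "PER", addr
    if DMEM_BASE <= addr < DMEM_BASE + DMEM_SIZE:
        return "DMEM", addr - DMEM_BASE
    if addr >= PMEM_BASE:
        return "PMEM", addr - PMEM_BASE
    return "UNMAPPED", addr


print(decode(0xFFFE))  # reset vector location
```

Under these assumptions the MSP430 reset vector at 0xFFFE falls in the last word of the 1kB PMEM, so the frontend's first fetch after reset is served by program memory.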



Fig 4: Functional waveforms of MSP430 processor: (b) Executing a Fibonacci sequence program

Fig. 4(a) shows the functional waveforms of the processor as the system comes out of reset and the PMEM and DMEM are loaded through the UART interface with the instructions and data of a Fibonacci sequence program. Fig. 4(b) shows that, once the PMEM and DMEM are loaded, program execution is initiated through the UART and the output is transmitted over the UART to the host computer.
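To check results read back over the UART, a host-side golden model is convenient. The sketch below is hypothetical: it assumes the program stores the sequence as consecutive 16-bit words starting from 0, 1 and that additions wrap modulo $2^{16}$ as an MSP430 `ADD` does (the carry goes to the status register, not into the stored result). The same 16-bit add also models the simple adder program described in Section 4.

```python
MASK16 = 0xFFFF


def add_u16(a: int, b: int) -> tuple[int, int]:
    """16-bit unsigned add as a 16-bit ALU performs it: returns (sum, carry)."""
    total = (a & MASK16) + (b & MASK16)
    return total & MASK16, total >> 16


def fibonacci_words(n: int) -> list[int]:
    """First n Fibonacci terms as stored 16-bit words (wrapping on overflow)."""
    words = []
    a, b = 0, 1
    for _ in range(n):
        words.append(a)
        a, b = b, add_u16(a, b)[0]  # keep only the 16-bit result
    return words


print(fibonacci_words(10))
```

Comparing the DMEM dump against `fibonacci_words(n)` word by word flags both functional failures and timing failures at low supply voltage; the first wrap occurs at the 26th term, since 75025 exceeds 65535.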

## 4. Measurement Results

Fabricated in the 90nm xLP FDSOI process, the test chip was packaged in a 132-pin PGA package for testing convenience. The chip was programmed and tested using a Tektronix TLA7012 pattern generator/logic analyzer, and current measurements were performed using a Keithley 2401 SourceMeter. To demonstrate processor functionality, measurements were taken while executing three different programs: a simple adder program that adds and stores two unsigned 16-bit integers; a program that generates and stores the n<sup>th</sup>-order Fibonacci sequence, where n is a programmable input set by the user; and a QRS peak detection algorithm [10] that detects sparse spikes in a measured ECG data stream. In this implementation, PMEM and DMEM are register-based rather than SRAMs or custom memories, so the performance and energy of the memory system are neither measured nor considered in this work.
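Reference [10] compares nine QRS detectors, and the text does not say which one was ported to the processor. The sketch below is therefore only a generic stand-in: an amplitude threshold, a local-maximum test, and a refractory period that prevents one heartbeat from being counted twice, all in integer arithmetic as a 16-bit microcontroller without an FPU would do it. The function name, `threshold`, and `refractory` are hypothetical tuning parameters.

```python
def detect_peaks(samples: list[int], threshold: int, refractory: int) -> list[int]:
    """Return indices of R-peak candidates in an integer ECG trace.

    Illustrative amplitude-threshold detector (not necessarily the
    algorithm used on the chip). A sample is reported as a peak if:
      * it exceeds `threshold`,
      * it is a local maximum (greater than the previous sample and
        not smaller than the next one),
      * it lies more than `refractory` samples after the last peak.
    """
    peaks: list[int] = []
    last_peak = -(refractory + 1)  # lets the very first peak through
    for i in range(1, len(samples) - 1):
        x = samples[i]
        is_local_max = samples[i - 1] < x >= samples[i + 1]
        if x > threshold and is_local_max and i - last_peak > refractory:
            peaks.append(i)
            last_peak = i
    return peaks
```

A physiologically motivated refractory period of about 200 ms is common in QRS detectors; at a sampling rate $f_s$ that corresponds to roughly `0.2 * fs` samples. Because QRS complexes are sparse, most loop iterations exit on the threshold comparison alone, which keeps the cycle count per sample low.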





Fig. 5 shows the measured functional waveforms of the UART debug interface transmitted by the processor and captured on an oscilloscope. Fig. 6 shows the energy-delay trends of the MSP430 processor for the three programs discussed above. The instructions for the three programs were loaded onto the on-chip register-based PMEM, and the processor was configured to fetch instructions and data from the on-chip PMEM and DMEM, respectively. For the peak detection program, the processor consumes 5pJ per cycle at 0.4V and 250 kHz. If higher performance is needed for ECG detection at the system level, the processor can operate at 1MHz at 0.6V, consuming 6.7pJ per cycle; a 4x performance improvement therefore costs 34% more energy per cycle. Since the chip was fabricated in an FDSOI process, a back-gate bias ranging from -5V to 5V was applied to tune the V<sub>th</sub> of the transistors for optimum performance.
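The two operating points quoted above can be checked with the relation $P = E_{cycle} \cdot f_{clk}$. The sketch below uses only the measured numbers from the text; the helper name is hypothetical.

```python
def power_w(energy_per_cycle_j: float, freq_hz: float) -> float:
    """Average power = energy per cycle x clock frequency."""
    return energy_per_cycle_j * freq_hz


# Operating points quoted in the text: (energy per cycle in J, clock in Hz).
low = (5e-12, 250e3)    # 0.4 V
high = (6.7e-12, 1e6)   # 0.6 V

p_low = power_w(*low)    # about 1.25e-6 W, i.e. the ~1.3 uW quoted
p_high = power_w(*high)  # about 6.7e-6 W

energy_penalty = high[0] / low[0] - 1   # about 0.34: 34% more energy per cycle
speedup = high[1] / low[1]              # 4.0

# Energy-delay product per cycle: E * T_clk = E / f
edp_low = low[0] / low[1]     # about 2.0e-17 J*s
edp_high = high[0] / high[1]  # about 6.7e-18 J*s
print(f"P = {p_low * 1e6:.2f} uW vs {p_high * 1e6:.2f} uW, "
      f"+{energy_penalty:.0%} energy for {speedup:.0f}x speed")
```

The 0.4V point remains the lowest-energy choice for a fixed workload (fixed cycle count), while the 0.6V point has roughly a 3x lower energy-delay product per cycle, which is why it is the better trade when throughput is the constraint.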



Fig 6: Measured Energy vs. Delay trends of the processor

Measured results show a 55% reduction in $V_{th}$ variation for devices fabricated in the xLP FDSOI process compared to a standard FDSOI process. The measured minimum energy across 8 functional dies shows a $\sigma/\mu$ of 0.0405. Fig. 7 shows the $I_{ds}$-$V_{gs}$ measurements of 46 PMOS transistors across two wafers. The $3\sigma$ variation in $V_{th}$ was found to be 8 mV for devices with channel length $L_{\rm g}$ = 180nm at $V_{\rm ds}$ = 0.3V. This reduced variation results from the lower sensitivity of V<sub>th</sub> to silicon thickness; the absence of random dopant fluctuations and the reduced sensitivity of the channel length to source/drain anneal variations further minimize V<sub>th</sub> variation. Table I compares the performance of the proposed implementation with state-of-the-art processors published in the literature. This 16-bit MSP430 implementation in the xLP FDSOI process consumes roughly two-thirds less energy per cycle than [12] (5pJ vs. 14.8pJ). The implementation in [1] consumes 6-10pJ/cycle at 0.5V for individual CPU instructions such as LOAD/STORE, AND, and XOR, while the proposed implementation consumes 5pJ/cycle executing a complete peak detection algorithm.
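The variability figures above ($\sigma/\mu$ of the minimum energy across dies, $3\sigma$ of $V_{th}$, and a percentage reduction relative to a reference process) reduce to a few lines of statistics. The helper names below are hypothetical, and the use of the sample ($n-1$) standard deviation is an assumption, since the text does not state which estimator was used.

```python
import statistics


def coefficient_of_variation(values: list[float]) -> float:
    """sigma/mu of a set of measurements (sample standard deviation assumed)."""
    return statistics.stdev(values) / statistics.mean(values)


def three_sigma(values: list[float]) -> float:
    """3-sigma spread, e.g. of extracted Vth values in mV."""
    return 3 * statistics.stdev(values)


def relative_reduction(sigma_new: float, sigma_ref: float) -> float:
    """Fractional reduction in spread; 0.55 means a 55% smaller sigma."""
    return 1 - sigma_new / sigma_ref
```

Read this way, the reported 55% reduction means the xLP $\sigma(V_{th})$ is 0.45 times that of the standard FDSOI process for similar-sized devices, and a $\sigma/\mu$ of 0.0405 means the die-to-die standard deviation of minimum energy is about 4% of its mean.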



Fig 7: Measured  $V_{th}$  variation in xLP FDSOI and comparison with standard FDSOI process



Fig 8: (a) Chip micrograph; (b) measurement setup

|                     | [1]                      | [2]                 | [4]                 | [11]                                          | [12]                     | This Work                                 |
|---------------------|--------------------------|---------------------|---------------------|-----------------------------------------------|--------------------------|-------------------------------------------|
| Technology          | 65nm                     | 180nm               | 180nm               | 65nm                                          | 130nm                    | 90nm FDSOI                                |
| Architecture        | 16-bit MSP430 compatible | ARM Cortex-M0+      | ARM Cortex-M0       | 16-bit MSP430 compatible                      | 16-bit MSP430 compatible | 16-bit MSP430 compatible                  |
| Operating Voltage   | 0.3-0.6V                 | 0.16-1.15V          | 0.6V                | 0.32-0.48V                                    | 0.55-1.2V                | 0.38-0.9V                                 |
| Min Energy          | 6-10pJ/cycle @ 0.5V      | 44.7pJ/inst @ 0.55V | 17.2pJ/inst @ 0.26V | 2.6pJ/cycle @ 0.375V executing FIR filtering | 14.8pJ/cycle @ 0.6V      | 5pJ/cycle @ 0.4V executing peak detection |
| Operating Frequency | 8.7kHz-1MHz              | 2Hz-15Hz            | 160-330kHz          | 25-71MHz                                      | -                        | 100kHz-10MHz                              |
| Area                | 1.62mm²                  | 2.04mm² (CPU+MEM)   | 1.7mm² (CPU+MEM)    | 0.42mm²                                       | 5.13mm² (CPU+Mem+Accl)   | 0.44mm²                                   |

Table I: Comparison of proposed work with state-of-the-art

The implementation in [11] operates over a limited supply range of 0.32-0.48V, while [2] and [4] support only limited ranges of operating frequency and report higher minimum energy (44.7pJ and 17.2pJ per instruction, respectively) than the current implementation.

Fig. 8(a) shows the chip micrograph and Fig. 8(b) shows the measurement setup. The 16-bit MSP430 processor occupies an area of $0.44 \text{ mm}^2$. The total die size, including the processor and the register-based memories, is $3.372 \text{ mm} \times 3.372 \text{ mm}$.

## 5. Conclusion

We implemented an open-source 16-bit MSP430 processor in a custom 90nm xLP FDSOI process optimized for ULP operation and executed on it a QRS peak detection algorithm suitable for ECG applications. The processor consumes $1.3\mu$W at 250 kHz and 0.4V, shows improved tolerance to process variations, and can be used in wearable ULP systems for IoT applications.

## 6. Acknowledgement

The authors would like to thank Mr. Daniel Truesdell for helping with programming and chip testing and Dr. Dilip Vasudevan for helpful discussions. The Lincoln Laboratory portion of this work is sponsored by the Assistant Secretary of Defense for Research & Engineering (ASD R&E) under Air Force Contract #FA8721-05-C-0002. Opinions, interpretations, conclusions and recommendations are those of the authors and are not necessarily endorsed by the United States Government.

## 7. References

- [1] Kwong, J.; et al., "A 65 nm Sub-V<sub>t</sub> Microcontroller With Integrated SRAM and Switched Capacitor DC-DC Converter," *JSSC* 2009
- [2] Wootaek Lim; et al., "8.2 Batteryless Sub-nW Cortex-M0+ processor with dynamic leakage-suppression logic," *ISSCC* 2015
- [3] Dongsuk Jeon; et al., "24.3 An implantable 64nW ECG-monitoring mixed-signal SoC for arrhythmia diagnosis," *ISSCC* 2014
- [4] Yoonmyung Lee; et al., "A modular 1mm<sup>3</sup> die-stacked sensing platform with optical communication and multi-modal energy harvesting," *ISSCC* 2012
- [5] Klinefelter, A.; et al., "21.3 A 6.45μW self-powered IoT SoC with integrated energy-harvesting power management and ULP asymmetric radios," *ISSCC* 2015
- [6] Vitale, S.A.; et al., "FDSOI Process Technology for Subthreshold-Operation Ultralow-Power Electronics," *Proceedings of the IEEE*, vol. 98, no. 2, pp. 333-342, Feb. 2010
- [7] Brennan, C.J.; et al., "Comparison of Gate Dielectric Plasma Damage from Plasma-Enhanced Atomic Layer Deposited and Magnetron Sputtered TiN Metal Gates," *Journal of Applied Physics*, vol. 118, 045307, 2015
- [8] OpenMSP430 Architecture, http://opencores.org/project,openmsp430
- [9] Texas Instruments, MSP430x1xx Family User's Guide, http://www.ti.com/lit/ug/slau049f/slau049f.pdf
- [10] Friesen, G.M.; et al., "A comparison of the noise sensitivity of nine QRS detection algorithms," *IEEE Transactions on Biomedical Engineering*, Jan. 1990
- [11] Bol, D.; et al., "SleepWalker: A 25-MHz 0.4-V Sub-mm<sup>2</sup> 7-μW/MHz Microcontroller in 65-nm LP/GP CMOS for Low-Carbon Wireless Sensor Nodes," *JSSC* 2013
- [12] Kyong Ho Lee; et al., "A Low-Power Processor With Configurable Embedded Machine-Learning Accelerators for High-Order and Adaptive Analysis of Medical-Sensor Signals," *JSSC* 2013