# SleepRunner: A 28-nm FDSOI ULP Cortex-M0 MCU With ULL SRAM and UFBR PVT Compensation for 2.6–3.6-µW/DMIPS 40–80-MHz Active Mode and 131-nW/kB Fully Retentive Deep-Sleep Mode

David Bol, *Senior Member, IEEE*, Maxime Schramme, *Graduate Student Member, IEEE*, Ludovic Moreau, *Student Member, IEEE*, Pengcheng Xu, *Student Member, IEEE*, Rémi Dekimpe, *Graduate Student Member, IEEE*, Roghayeh Saeidi, *Member, IEEE*, Thomas Haine, *Member, IEEE*, Charlotte Frenkel, *Member, IEEE*, and Denis Flandre, *Senior Member, IEEE*

Abstract-Preventing device obsolescence in the Internet-of-things (IoT) is mandatory for its massive deployment to be ecologically sustainable. This calls for ultralow-power (ULP) reprogrammable microcontroller units (MCUs) with a long lifetime, yet with sufficient computing performance to extract the meaningful information from the sensed data before transmitting it to the cloud. In this article, we present the SleepRunner MCU with logic/memory/power management co-optimization for the best exploitation of the forward back biasing (FBB) capability of fully-depleted silicon-on-insulator (FDSOI) technologies. For low active power, we use ultralow-voltage (ULV) low-$V_t$ logic with upsized gate length and asymmetric FBB, a ULP SRAM macro with low read-access energy, and switched-capacitor voltage regulators (SCVRs) for ULV supply generation from a single I/O voltage. The custom ULP SRAM macro is based on an ultralow-leakage (ULL) FBB-compatible bitcell for low SRAM retention power. In addition, a dual-loop digital unified frequency/back-bias regulation (UFBR) system efficiently compensates for process and temperature variations with a short wakeup from the zero-back-bias deep-sleep mode. Performance is measured for a synthetic benchmark and biomedical inference applications. The measured 40-MHz 2.6-$\mu$W/DMIPS (3.3-$\mu$W/MHz) active and 131-nW/kB deep-sleep power consumptions with CPU state retention are respectively $3\times$ and $2.5\times$ lower than for a conventional MCU design in this technology. This demonstrates the interest of 28-nm FDSOI with the proposed FBB-driven system optimization for ULP MCUs.

Manuscript received July 9, 2020; revised October 27, 2020 and January 4, 2021; accepted January 21, 2021. Date of publication February 24, 2021; date of current version June 29, 2021. This article was approved by Associate Editor Jonathan Chang. This work was supported in part by the Fonds européen de développement régional (FEDER), in part by the Wallonia within the Wallonie-2020.EU program, in part by the Plan Marshall, and in part by the FRS-FNRS of Belgium. (*Corresponding author: David Bol.*)

David Bol, Maxime Schramme, Ludovic Moreau, Pengcheng Xu, Rémi Dekimpe, Roghayeh Saeidi, and Denis Flandre are with the Electronic Circuits and Systems (ECS) Group, ICTEAM Institute of Université catholique de Louvain, 1348 Louvain-la-Neuve, Belgium (e-mail: david.bol@uclouvain.be).

Thomas Haine was with the Electronic Circuits and Systems (ECS) Group, ICTEAM Institute of Université catholique de Louvain, 1348 Louvain-la-Neuve, Belgium. He is now with E-peas Semiconductors, 1435 Mont-Saint-Guibert, Belgium.

Charlotte Frenkel is with the Electronic Circuits and Systems (ECS) Group, ICTEAM Institute of Université catholique de Louvain, 1348 Louvain-la-Neuve, Belgium, and also with the Institute of Neuroinformatics, University of Zurich and ETH Zurich, 8057 Zurich, Switzerland.

Color versions of one or more figures in this article are available at https://doi.org/10.1109/JSSC.2021.3056219.

Digital Object Identifier 10.1109/JSSC.2021.3056219

*Index Terms*—Adaptive process, voltage, and temperature (PVT) compensation, CMOS digital integrated circuits, fully-depleted silicon-on-insulator (FDSOI), microcontroller, near-threshold computing, SRAM, ultralow power (ULP), ultralow voltage (ULV).

## I. INTRODUCTION

THE massive deployment of smart sensors and connected objects according to the Internet-of-things (IoT) vision faces ecological [1], [2] and societal issues with respect not only to the eco-toxicity of battery replacement but also to the geopolitical conflicts, carbon footprint, and local pollution associated with critical metal extraction [3], the growing energy footprint of chip production [4], and the disposal of e-waste [5]. As all these impacts are sensitive to device obsolescence, we need to pursue a very long lifetime for smart sensors, enabled by energy-harvesting supply and over-the-air firmware updates to keep up with the evolution of the applications. This calls for ultralow-power (ULP) microcontroller units (MCUs) with large programmable memory that are capable of edge computing to locally extract the meaningful information from the sensor data before transmitting it, in order to avoid a data deluge in the cloud [6].

In smart sensor applications, MCUs alternate between sleep mode to save power and active mode to process sensor data. ULP MCU design thus faces the key tradeoff represented in Fig. 1 between computing performance, active-mode power, deep-sleep retention power, and wakeup time/energy, while keeping silicon area under control for cost and production carbon footprint concerns [7], [8].<sup>1</sup> Near-threshold circuits operating at ultralow voltage (ULV) [10]–[12] to improve the performance/active power tradeoff have matured from the first full ULP MCUs [13], [14] to integration in commercial products [15]. However, ULV operation raises the extra challenge of preserving this tradeoff over process, voltage, and temperature (PVT) corners. Previous papers on ULP MCUs focused on PVT compensation with unified frequency and voltage regulation [8], [17], ultralow deep-sleep power through custom SRAM/DRAM macros [17]–[20], multi-mode embedded power management (ePM) [17], [21], [22], custom standard-cell libraries [23], timing error detection [24]–[26], the use of fully-depleted silicon-on-insulator (FDSOI) with exploitation of its forward back bias (FBB) capability [27]–[30], [52], technology scaling to 14-nm FinFET [31], and on-chip loops for active-mode energy minimization [32], [33]. However, we demonstrate here that the best tradeoffs can only be reached by co-optimizing the architecture, logic, memory, and power management.

<sup>1</sup>The global direct greenhouse gas (GHG) emissions of MCU production are estimated at 10 MtCO<sub>2</sub>e per year for an annual production volume of 30 billion units at 0.35 kgCO<sub>2</sub>e per unit [9], excluding the extraction of raw material. Massive IoT deployment is expected to increase this production volume and the associated emissions.

Fig. 1. Performance tradeoffs for ULP MCUs.

0018-9200 © 2021 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See https://www.ieee.org/publications/rights/index.html for more information.

In this article, which extends [34], [35], we aim at supporting this claim by presenting a 40–80-MHz 64-kB Cortex-M0 ULP MCU in 28-nm FDSOI codenamed SleepRunner. It advances the performance tradeoff beyond the state of the art by fully exploiting the unique FDSOI back-biasing capability through co-optimization of the logic, memory, and power management. The specific contributions are: i) a switchable back-bias scheme between zero back bias (ZBB) in deep-sleep mode for low retention power and adaptive asymmetric FBB in active mode for speed at ULV, combined with ii) a custom ULP SRAM macro based on a low-voltage ultralow-leakage (ULL) FBB-compatible bitcell, and iii) a digital unified frequency and back-bias regulation (UFBR) system offering PVT compensation in active mode at low power overhead and with a sub-20-$\mu$s wakeup time. The ePM includes dual-mode switched-capacitor voltage regulators (SCVRs), and the computing architecture embeds a fast Fourier transform (FFT) hardware (HW) accelerator for spectral sensor data processing.

This article is organized as follows. In Section II, we introduce the MCU architecture with its specifications. Sections III, IV, V, and VI are focused on the design of the ULV logic, ULP SRAM, UFBR system and SCVRs, respectively. Experimental validation is presented in Section VII and the system co-optimization is summarized with conclusions in Section VIII.

## II. MCU ARCHITECTURE AND SPECIFICATIONS

As shown in Fig. 2, the ULV MCU logic at 0.4 V ($V_{DDL}$) includes a Cortex-M0 CPU from ARM (DesignStart obfuscated version), an FFT accelerator, a wakeup interrupt controller (WIC) and various interfaces including a JTAG slave for programming, a dual SPI master for control of a radio and a sensor, a digital-camera module interface (DCMI) slave and GPIOs.

Fig. 2. SleepRunner MCU architecture and power modes.

The 64-kB memory is implemented with two different 32-kB SRAM macros for the program memory (PMEM) holding the firmware and the data memory (DMEM). The PMEM is based on a custom ULP SRAM macro [35] operated at 0.5 V ( $V_{DDS}$ ) and specifically designed for a low read access energy combined with a low leakage at the cost of a limited density of 125 kB/mm<sup>2</sup>. The design of this memory is introduced in Section IV. Compared to the PMEM, the DMEM in Cortex-M0 MCUs contributes less to the active power because of less frequent access. Therefore, to limit silicon die area, we selected as DMEM a high-density (HD) 32-kB foundry SRAM supplied at 0.8 V ( $V_{DDH}$ ), which leads to a limited active-mode power overhead despite its much higher access energy.

The main system clock (MCLK), with a register-programmable frequency, is generated on-chip from an external 12-MHz reference clock (REF\_CLK). It is generated by a tunable ring oscillator (TRO) whose frequency is controlled by the NMOS back-bias (BBN) and PMOS back-bias (BBP) voltages through UFBR. The ePM includes three dual-mode SCVRs to generate the $V_{DDL}$, $V_{DDS}$, and $V_{DDH}$ internal supplies from the single 1.8-V I/O voltage $V_{DDIO}$.

The power modes of SleepRunner are summarized in Fig. 2. In active mode, the MCU runs at a frequency programmable between 48 and 72 MHz at nominal voltage. The SCVRs support  $\pm 5\%$  output voltage over/underdrive to expand the frequency range from 48–72 to 40–80 MHz. Their output power range is 150  $\mu$ W for each output voltage supply  $V_{DDL}$ ,  $V_{DDS}$ , and  $V_{DDH}$ . In active mode, FBB is used adaptively for the logic and the ULP SRAM as controlled by the UFBR system to preserve the target clock frequency over PVT corners while minimizing leakage power.



Fig. 3. Optimization of the tradeoff for the MCU logic between (a) active energy per cycle ( $E_{cycle}$ ), (b) deep-sleep leakage power ( $P_{leak}$ ), and (c) SCVR PCE under 72-MHz timing constraints (SPICE simulation results at TT corner, 25°C, with timing/power model calibrated from post-layout place/route results).

A first simple low-power mode is entered by putting the Cortex-M0 CPU to sleep through its embedded architectural clock gating. Power savings are limited, but the wakeup is instantaneous and does not consume energy. This mode is useful when waiting for the FFT or a peripheral to complete a short task. Deep-sleep mode saves much more power by using a 750-kHz system clock (divided from REF\_CLK) and by using ZBB for the logic and ULP SRAM, which increases the threshold voltage ($V_t$) of the transistors for low leakage. Note that ZBB corresponds to BBN = BBP = 0 V for low-$V_t$ (LVT) MOSFETs in 28-nm FDSOI [38]. The wakeup time (20 $\mu$s) is longer, as the UFBR needs to restore the MCLK frequency and the BBN/BBP voltages before activating the CPU. However, such a switchable back-bias scheme allows significant leakage reduction in deep-sleep mode without resorting to power gating. Full state retention in deep sleep is thus guaranteed in both the memory and the logic, which avoids software initialization at wakeup. The DMEM HD SRAM accounts for most of the power in deep-sleep mode; it can thus be power gated at the cost of a software initialization at wakeup.

## III. ULV LOGIC IMPLEMENTATION

Previous ULP MCUs in FDSOI mostly use regular-$V_t$ (RVT) transistors because of leakage concerns [27], [29]. However, it is shown in [37] and [36] that LVT transistors can move the minimum-energy point (MEP) of the MCU logic toward higher frequencies, e.g., to 25 MHz in 65-nm LP/GP CMOS [8] or to 100 MHz in 28-nm FDSOI [28]. To support 40–80-MHz operation at ULV, we thus select LVT devices. At ULV, the leakage power integrated over the relatively long cycle time ($P_{\text{leak}} \times T_{\text{cycle}}$) typically contributes 10%–30% of the logic energy per cycle ($E_{\text{cycle}}$). Upsizing the transistor gate length ($L_g$) thus reduces $E_{\text{cycle}}$ at the MEP, as it reduces the leakage power more than it degrades the cycle time, thanks to an improved subthreshold swing and lower DIBL and variability [39]. In FDSOI, gate-length upsizing in logic standard cells is enabled by poly biasing (PB) [50]. Fig. 3(a) shows the total energy per cycle ($E_{\text{cycle}}$) of the logic of the whole MCU. Applying a 16-nm poly bias (PB16) yields an $E_{\text{cycle}}$ close to 1 pJ/cycle for frequencies below 10 MHz. FBB applied to both the logic and the ULP SRAM is used in active mode to shift the MEP of the PB16 library to the target frequency range.
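The gate-length argument can be made concrete with a back-of-the-envelope sketch. It assumes, as a simplification, that $E_{\text{cycle}}$ splits into a dynamic term and the leakage term $P_{\text{leak}} \times T_{\text{cycle}}$, with the cycle time set by the critical-path delay as at the MEP. The function name and all factors passed to it are hypothetical illustrations, not values extracted from the PB16 library.

```python
def relative_energy_per_cycle(leak_share: float,
                              leak_reduction: float,
                              delay_increase: float,
                              cap_increase: float = 1.0) -> float:
    """E_cycle after a gate-length upsize, normalized to the baseline E_cycle.

    Simplified model (assumption): E_cycle = E_dyn + P_leak * T_cycle.
    leak_share:     fraction of the baseline E_cycle due to P_leak * T_cycle.
    leak_reduction: factor by which P_leak drops (2.0 means halved).
    delay_increase: factor by which T_cycle grows (critical path at the MEP).
    cap_increase:   factor by which the switched capacitance (E_dyn) grows.
    """
    if not 0.0 <= leak_share <= 1.0:
        raise ValueError("leak_share must be in [0, 1]")
    dynamic = (1.0 - leak_share) * cap_increase
    leakage = leak_share * delay_increase / leak_reduction
    return dynamic + leakage


# Hypothetical example: 30% leakage share, leakage halved, 10% slower cycle.
print(relative_energy_per_cycle(0.3, 2.0, 1.1))
```

For example, with a 30% leakage share, halving $P_{\text{leak}}$ at the cost of a 10% longer cycle gives $0.7 + 0.3 \times 1.1/2 = 0.865$ of the baseline energy; a 5% growth in switched capacitance would still leave it below 1.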

SCVRs generating the ULV supplies from  $V_{\text{DDIO}}$  can only achieve high power conversion efficiency (PCE) in a limited output voltage range, which depends on their topology. As represented in Fig. 3(c), a divide-by-3 SCVR topology is efficient in the 0.49–0.56-V range and a divide-by-4 one is efficient in the 0.38–0.43-V range. We thus aim for 0.4 V and reach 72-MHz operation using asymmetric FBB with a stronger FBB level for PMOS than for NMOS transistors (i.e., BBP voltage is more negative), as PMOS transistors are slower and have a lower back bias effect in this technology. In nominal conditions, i.e., typical-NMOS/typical-PMOS process (TT) corner at 25°C and 72 MHz, the target BBN/BBP voltages are +1 V/-2 V, respectively.
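The choice of topology follows from the intrinsic efficiency limit of an ideal SC converter: a divide-by-$N$ network has a no-load output of $V_{\text{DDIO}}/N$, and regulating below it caps the PCE at $V_{\text{out}}/(V_{\text{DDIO}}/N)$. The sketch below evaluates this bound; it ignores switching, conduction, and bottom-plate losses (so real PCE is lower), and the function name is hypothetical.

```python
V_DDIO = 1.8  # single I/O supply voltage [V] (from the text)


def intrinsic_efficiency_bound(v_out: float, ratio: int,
                               v_in: float = V_DDIO) -> float:
    """Ideal upper bound on the PCE of a divide-by-`ratio` SC converter.

    An ideal SC network behaves as a v_in : v_in/ratio transformer with a
    series output resistance; regulating v_out below the no-load voltage
    v_in/ratio dissipates the difference, so PCE <= v_out / (v_in/ratio).
    """
    v_no_load = v_in / ratio
    if not 0.0 < v_out <= v_no_load:
        raise ValueError("v_out must lie in (0, v_in/ratio]")
    return v_out / v_no_load


# Compare the candidate topologies for the three core supplies (no losses).
for name, v_out in (("V_DDL", 0.40), ("V_DDS", 0.50), ("V_DDH", 0.80)):
    for ratio in (2, 3, 4):
        if v_out <= V_DDIO / ratio:
            bound = intrinsic_efficiency_bound(v_out, ratio)
            print(f"{name} = {v_out:.2f} V, div-by-{ratio}: PCE <= {bound:.1%}")
```

With $V_{\text{DDIO}} = 1.8$ V, the no-load voltages are 0.6 V (÷3) and 0.45 V (÷4), just above the efficient windows quoted above, and the bound for $V_{\text{DDL}} = 0.4$ V is about 89% with ÷4 versus about 67% with ÷3.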

As shown in Fig. 3(b), a high FBB level leads to a 30-$\mu$W logic leakage, which is much higher than with ZBB for the PB16 library. This is acceptable in active mode (30% of the logic power) but prohibitive in deep-sleep mode. Therefore, BBN and BBP are driven to 0 V in deep-sleep mode by the on-chip FBB drivers to cut leakage power, while preserving full state retention in both the logic and the ULP SRAM, which avoids software initialization at wakeup. ZBB reduces the leakage power by $30\times$ compared to the +1/−2-V FBB condition. Fig. 3(b) further shows that the switchable back bias between FBB in active mode and ZBB in deep-sleep mode, combined with PB16 upsized-$L_g$ LVT transistors, results in both lower active power and lower deep-sleep power than with RVT transistors. Of course, lower leakage in deep-sleep mode could be achieved with RVT transistors with poly bias or reverse back bias (RBB). However, poly bias with RVT would suffer from an active power penalty due to a higher minimum $V_{DD}$ to operate at 72 MHz. RBB would suffer from a deep-sleep power penalty to generate these reverse biases, especially the negative voltage for the NMOS transistors. Indeed, previously reported back-bias generators consume quiescent powers of 56 $\mu$W [45], 10 $\mu$W [46], and 2.5 $\mu$W [47]. FBB in active mode avoids this pitfall, as the power cost of operating the FBB drivers is negligible compared to the active power.

## IV. ULP SRAM MACRO BASED ON ULL BITCELL COMPATIBLE WITH ADAPTIVE FBB AND ULV OPERATION

In ULP MCUs, memory macros are critical, as their access energy, leakage power, and area can easily dominate the active-mode energy per cycle, deep-sleep retention power, and chip area, respectively. ULP SRAM macros typically aim at ULV operation for low access energy and use bitcells with 8–10 transistors [40]. The general idea, illustrated in Fig. 4 with a dual-$V_t$ 8-T bitcell architecture, is to use a latch with a relatively high $V_t$ to limit leakage<sup>2</sup> and a decoupled read port to avoid read-disturb problems at ULV. To preserve speed at ULV, the read port uses low-$V_t$ transistors and the write word lines (WWL) can be boosted to a voltage higher than the ULV $V_{DD}$ [18], [40].

In 28-nm FDSOI, RVT transistors (i.e., the core transistor type with the highest $V_t$) are formed without channel doping but with NMOS transistors lying over a P-well and PMOS over an N-well, both below the buried oxide [38]. In contrast, LVT transistors are formed with NMOS transistors over an N-well and PMOS over a P-well. With the dual-$V_t$ bitcell architecture from Fig. 4, the RVT PMOS transistors of the low-leakage latch share the BBN N-well back bias with the LVT NMOS transistors of the ULV read port. In this case, it is not possible to compensate for temperature variations with adaptive FBB, because a higher FBB to speed up the read port at low temperature would slow down the write operation in the latch. Therefore, dual-$V_t$ bitcells cannot be used with adaptive FBB in 28-nm FDSOI. The challenge is thus to design an LVT bitcell without suffering from the associated leakage penalty in deep-sleep mode. Whereas the write and read ports can be gated in deep-sleep mode, the latch is the critical part, as it needs to stay on for data retention.

<sup>2</sup>In extreme cases, SRAM macros use thick-oxide I/O transistors for even lower leakage at the cost of density and access energy penalties due to a higher bitline capacitance and a higher required $V_{DD}$ to meet the speed target [17].

Fig. 4. Conventional dual-$V_t$ ULP SRAM bitcell (see, e.g., [40]). In FDSOI, the N-well sharing between RVT PMOS and LVT NMOS leads to a conflict for temperature compensation through UFBR (adaptive) FBB.

Fig. 5. ULL SRAM bitcell from [35] using exclusively LVT transistors for compatibility with the UFBR FBB scheme: (a) architecture and (b) hold retention characteristics (SPICE simulation results at 0.5 V with BBN=1 V, BBP=-2 V). No poly bias is used in the bitcell but gate length of the three marked transistors is upsized in the layout and the other transistors have a gate length of 32 nm.

We thus propose a ULL bitcell compatible with adaptive FBB and low-voltage operation [35], as shown in Fig. 5(a). It is based on the LVT latch from [41] that uses two ULP negative-differential-resistance (NDR) structures. Each NDR structure is composed of an NMOS/PMOS transistor pair in which both transistors self-bias at $V_{\rm gs} = -\Delta V/2$, with $\Delta V$ the voltage difference across the NDR structure [41]. In this latch, when $V_{\rm cell}$ increases from 0 V to the SRAM supply voltage $V_{\text{DDS}}$, the current of the pull-down NDR structure $I_{\text{NDR,PD}}$ first increases because the $V_{\text{ds}}$ of its internal transistors increases, but then decreases as their $V_{\text{gs}}$ progressively become more negative, which leads to the NDR characteristics [Fig. 5(b)]. The pull-up NDR structure behaves symmetrically, which results in two stable points on $V_{\text{cell}}$ for holding logic-0 and logic-1 data at $V_{\text{hold0}}$ and $V_{\text{hold1}}$, respectively, and a metastable trip point at $V_{\text{meta}}$, as shown in Fig. 5(b). We add a transmission-gate LVT write port for fast write operation and a decoupled 2-T LVT read port to avoid the read-disturb problem at ULV.

Fig. 6. ULL bitcell results: (a) statistical hold robustness (post-layout SPICE simulation results at 0.5 V with BBN=1 V, BBP= -2 V in active and BBN=BBP=0 V in sleep) and (b) layout compliant with logic DRC rules. The post-shrink area of the bitcell is 0.76 $\mu$m<sup>2</sup>.
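The origin of the two hold points and the trip point can be illustrated with an idealized model; this is not the authors' SPICE model. It assumes matched transistors in pure subthreshold, each NDR transistor at $V_{\rm gs} = -\Delta V/2$ and $V_{\rm ds} \approx \Delta V/2$, and it ignores DIBL, body effect, and mismatch; the slope factor `N_SUB` is hypothetical. The sketch finds the zero crossings of the net node current (pull-up minus pull-down).

```python
import math

U_T = 0.026    # thermal voltage near 300 K [V]
N_SUB = 1.3    # hypothetical subthreshold slope factor
V_DDS = 0.5    # ULP SRAM supply voltage [V] (from the text)


def i_ndr(delta_v: float) -> float:
    """Normalized current of one NDR structure with delta_v across it."""
    if delta_v <= 0.0:
        return 0.0
    gate_term = math.exp(-delta_v / (2.0 * N_SUB * U_T))   # Vgs = -delta_v/2
    drain_term = 1.0 - math.exp(-delta_v / (2.0 * U_T))    # Vds ~ delta_v/2
    return gate_term * drain_term


def net_current(v_cell: float) -> float:
    """Pull-up minus pull-down NDR current charging the storage node."""
    return i_ndr(V_DDS - v_cell) - i_ndr(v_cell)


def equilibria(steps: int = 999, iters: int = 80) -> list[float]:
    """Zero crossings of net_current on [0, V_DDS], refined by bisection.

    A + to - crossing is stable (hold point); - to + is metastable. An odd
    step count keeps the symmetric midpoint V_DDS/2 off the scan grid.
    """
    roots = []
    grid = [V_DDS * k / steps for k in range(steps + 1)]
    for a, b in zip(grid, grid[1:]):
        fa, fb = net_current(a), net_current(b)
        if fa * fb < 0.0:
            for _ in range(iters):
                m = 0.5 * (a + b)
                fm = net_current(m)
                if fa * fm <= 0.0:
                    b = m
                else:
                    a, fa = m, fm
            roots.append(0.5 * (a + b))
    return roots


v_hold0, v_meta, v_hold1 = equilibria()
snm_hold0 = v_meta - v_hold0   # nominal SNM; mismatch is not modeled
print(f"V_hold0 = {v_hold0:.2e} V, V_meta = {v_meta:.3f} V, "
      f"V_hold1 = {v_hold1:.5f} V, nominal SNM = {snm_hold0:.3f} V")
```

Without mismatch, the symmetric model places $V_{\text{meta}}$ at $V_{\text{DDS}}/2$ and gives a nominal SNM near 0.25 V, far above the statistical worst-case values discussed next, which is precisely why the hold failure rate must be evaluated statistically.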

In the proposed ULL bitcell, the NDR currents responsible for data retention are subthreshold transistor currents. They are thus very sensitive to local mismatch, which can lead to retention failure due to hold instability.<sup>3</sup> As memories are made of thousands of bitcells, the hold failure rate of a single bitcell has to be very low to get a high yield for the macro [42]. If we consider a 32-kB memory with a yield target of 99%, we can obtain the failure rate for a single bitcell with

$$1 - \eta_{\text{bitcell}} = 1 - \eta_{\text{macro}}^{1/N_{\text{bitcell}}} \tag{1}$$

with $N_{\text{bitcell}}$ the number of bitcells in the macro, and $\eta_{\text{bitcell}}$ and $\eta_{\text{macro}}$ the yields of the bitcell and of the macro, respectively. This gives a specification on the bitcell hold failure rate of $3.85 \times 10^{-8}$, equivalent to $5.37\sigma$, which we conservatively round down to 0.03 ppm. To reach such a low hold failure rate for the ULL bitcell, we upsize the width of the NDR structures ($W = 140$ nm, $L_g = 32$ nm) and the gate length of the write-port transistors to limit their leakage [35]. We also upsize the gate length of one of the read-port transistors to avoid spurious read bitline (RBL) discharge [35]. The bitcell yield was verified by computing the cumulative distribution function (CDF) of the failure rate with the Gradient Importance Sampling methodology from [42], in both active and sleep modes. Results are shown in Fig. 6(a) and can be interpreted as follows: for the hold-0 state, the cumulative failure rate increases when specifying that the maximum stable hold-0 point $V_{\text{hold0}}$ should be closer to 0 V, as this is a stricter constraint. As long as the 0.03-ppm worst-case $V_{\text{hold0}}$ is below the worst-case $V_{\text{meta}}$ trip point, the hold-0 state is stable with a static noise margin (SNM) corresponding to the voltage difference $\text{SNM} = V_{\text{meta}} - V_{\text{hold0}}$. The same reasoning applies to the hold-1 state. Statistical hold robustness is thus ensured with a 0.03-ppm worst-case SNM of 92 mV in active mode and 66 mV in sleep mode, both set by the hold-1 state since the hold-0 state is less critical. The resulting 0.76-$\mu$m<sup>2</sup> bitcell layout is depicted in Fig. 6(b). The area penalty compared to the foundry HD bitcell is significant, around $6\times$. It is due to the two additional transistors, the transistor upsizing, the tangled cell structure, and the fact that we did not use the pushed SRAM DRC rules.

<sup>3</sup>Global PVT variations can further degrade the yield, but as the UFBR system presented in Section V compensates for global PVT variations, we do not discuss their impact on the ULP SRAM here.

TABLE I. Simulated impact of process and temperature variations on the logic delay and leakage current at the SleepRunner logic design point (normalized to TT, 25 °C)

| Temperature | Process corner | Normalized delay | Normalized leakage |
|-------------|----------------|------------------|--------------------|
| −40 °C      | SS             | 2.47×            | 0.01×              |
| +25 °C      | SS             | 1.48×            | 0.39×              |
| +25 °C      | TT             | 1×               | 1×                 |
| +25 °C      | FF             | 0.72×            | 2.58×              |
| +85 °C      | FF             | 0.55×            | 21.5×              |
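Equation (1) can be evaluated numerically. This sketch assumes 32 kB = 32 × 1024 bytes with one bitcell per bit (262 144 bitcells) and uses a one-sided Gaussian tail for the σ equivalence; both are our assumptions about how the quoted numbers were derived. It uses `math.expm1` to avoid cancellation, since $\eta_{\text{macro}}^{1/N_{\text{bitcell}}}$ is extremely close to 1.

```python
import math
from statistics import NormalDist


def bitcell_failure_spec(macro_bytes: int, macro_yield: float) -> float:
    """Maximum bitcell failure rate from Eq. (1): 1 - eta_macro**(1/N)."""
    n_bitcells = 8 * macro_bytes              # one bitcell per bit
    # -expm1(log(y)/N) equals 1 - y**(1/N) but avoids cancellation near 1.
    return -math.expm1(math.log(macro_yield) / n_bitcells)


def equivalent_sigma(failure_rate: float) -> float:
    """One-sided standard-normal quantile with the same tail probability."""
    return NormalDist().inv_cdf(1.0 - failure_rate)


p_fail = bitcell_failure_spec(32 * 1024, 0.99)   # 32-kB macro, 99% yield
sigma = equivalent_sigma(p_fail)
print(f"bitcell failure rate <= {p_fail:.3g} ({p_fail * 1e6:.4f} ppm), "
      f"about {sigma:.2f} sigma")
```

Under these assumptions, the sketch gives a failure rate of about $3.8 \times 10^{-8}$ and about $5.4\sigma$, in line with the values quoted above; rounding the specification down to 0.03 ppm makes it slightly stricter.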

The proposed ULP SRAM is supplied by the 0.5-V  $V_{DDS}$  instead of the 0.4-V  $V_{DDL}$  of the logic because memories have a lower activity factor than logic and their MEP supply voltage under a system timing constraint is thus typically 100–200 mV higher than the logic MEP [8]. To further decrease the access energy and avoid the half-write problem, the 32-kB macro uses a divided-WL 16-bank 512-bitcell/column architecture [35].
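The bank organization can be checked with simple arithmetic. Here we read 32 kB as 32 × 1024 bytes and "512-bitcell/column" as 512 bitcells sharing one bitline column within each bank; this interpretation is our assumption.

```python
MACRO_BYTES = 32 * 1024   # 32-kB PMEM macro (binary kB assumed)
N_BANKS = 16              # divided-WL banks (from the text)
CELLS_PER_COLUMN = 512    # assumed: bitcells sharing one bitline column

bits_total = 8 * MACRO_BYTES                          # bitcells in the macro
bits_per_bank = bits_total // N_BANKS                 # bitcells per bank
columns_per_bank = bits_per_bank // CELLS_PER_COLUMN  # bitline columns per bank

# Consistency check: the three factors rebuild the full macro.
assert bits_total == N_BANKS * CELLS_PER_COLUMN * columns_per_bank
print(bits_total, bits_per_bank, columns_per_bank)
```

Under this reading, each bank has 32 columns, which matches the 32-bit word of the Cortex-M0, although the text does not state how words map onto banks.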

## V. DESIGN OF THE UFBR SYSTEM

At ULV, the impact of process and temperature variations on the logic and SRAM delay is magnified. The most severe effect comes from low operating temperature, which significantly increases the logic delay and its local variability [43]. Process and temperature variations also influence the leakage current, which strongly affects not only the sleep power but also the active power, since leakage usually accounts for a significant portion of the active power at ULV. The simulated impact of process and temperature variations on the logic delay and leakage current at the design point of the SleepRunner logic is provided in Table I. In the worst cases, the delay increases by $2.47\times$ (SS, −40 °C) and the leakage by $21.5\times$ (FF, +85 °C).

### A. Previous Unified Frequency and Voltage Regulation

Conventionally, digital circuits rely on independent regulation of their clock frequency and supply voltage, as sketched in Fig. 7(a), which requires guardbands on either the maximum clock frequency or the minimum supply voltage in the worst process and temperature corner [37]. Limiting this guardband calls for adaptive techniques that are usually implemented by either *in situ* error monitors and error correction schemes [24]–[26], or the unified voltage/frequency regulation (UFVR) scheme represented in Fig. 7(b) [8], [26].



Fig. 7. Previous architectures for regulation of clock frequency and supply voltage in ULV digital systems. (a) Conventional independent frequency/voltage regulation. (b) Unified frequency/voltage regulation (UFVR) [8].

In such a scheme, the clock is generated by a TRO directly supplied by the ULV $V_{DD}$ rail so that its cycle time tracks the process/temperature-induced variations of the logic delay. A feedback loop uses the cycle time or frequency information from a timing sensor to control the output of the voltage regulator so as to compensate for logic delay variations. As process/temperature variations are slow phenomena, the bandwidth of the UFVR loop is usually limited and primarily dictated by wakeup time constraints. Interestingly, UFVR systems are robust against fast supply voltage droops because the TRO frequency drops instantaneously when a droop occurs [8], [26]. In [8], the UFVR scheme saves 25% active power in nominal conditions by avoiding the guardband linked to SS −40 °C operation while preserving robust timing closure and constant frequency over the whole process and temperature range. However, when operating at FF 85 °C, UFVR schemes fail to compensate for the very high leakage despite a small reduction in the supply voltage. In [8], the active power is significantly increased at 85 °C compared to room temperature.

### B. Proposed Dual-Loop Digital UFBR System

To overcome the limitation of the UFVR scheme, we aim at compensating the delay and leakage variations through adaptive back biasing in FDSOI in a UFBR scheme. Indeed, it was shown that adaptive back or body biasing is capable of compensating not only the low-temperature increased delay but also the high-temperature increased leakage [37], [44]. Therefore, we keep the supply voltage mostly constant as generated by the SCVRs.

The UFBR system we propose, represented in Fig. 8, is digital and takes inspiration from UFVR by relying on a TRO to generate MCLK and track both slow and fast delay variations. It operates in a frequency-locked-loop (FLL) fashion similar to [8]. First, a counter senses and digitizes the TRO frequency during six REF\_CLK cycles. Second, a proportional controller acts on BBN in PWM mode by activating a current charge pump (CP) during a variable number of REF\_CLK cycles. To cover the range between 0 and 1.8 V, the CP is supplied by the $V_{\text{DDIO}}$ supply and uses thick-oxide I/O transistors.
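A behavioral sketch of the BBN frequency-locked loop helps visualize the counter/proportional-PWM structure. Only REF\_CLK (12 MHz), the six-cycle counting window, and the 0 to 1.8-V BBN range come from the text; the linear TRO model, the CP step per REF\_CLK cycle, the gain `KP`, the bidirectional pump, and the saturation at one window are hypothetical. The BBP loop and the BBN/BBP coupling are not modeled.

```python
F_REF = 12e6          # REF_CLK frequency [Hz] (from the text)
WINDOW_CYCLES = 6     # counting window in REF_CLK cycles (from the text)
F_TARGET = 72e6       # target MCLK frequency [Hz]
V_BBN_MAX = 1.8       # BBN CP supplied by V_DDIO [V]

# Hypothetical plant/actuator parameters, for illustration only.
F0, K_BBN = 30e6, 42e6      # linear TRO model: f = F0 + K_BBN * V_BBN
DV_PER_PUMP_CYCLE = 5e-3    # BBN change per REF_CLK cycle of CP activity [V]
KP = 4                      # proportional gain: pump cycles per count of error


def tro_frequency(v_bbn: float) -> float:
    """TRO (MCLK) frequency as a function of the BBN voltage."""
    return F0 + K_BBN * v_bbn


def count_tro_edges(f_tro: float) -> int:
    """Digitized frequency: TRO periods counted during the REF_CLK window."""
    return int(f_tro * WINDOW_CYCLES / F_REF)


def run_fll(v_bbn: float = 0.0, n_updates: int = 200) -> tuple[float, int]:
    """Wake up from ZBB (BBN = 0 V) and run the FLL; return (BBN, final count)."""
    target = int(F_TARGET * WINDOW_CYCLES / F_REF)   # 36 counts for 72 MHz
    for _ in range(n_updates):
        error = target - count_tro_edges(tro_frequency(v_bbn))
        # PWM actuation: enable the CP for a number of REF_CLK cycles
        # proportional to the error (sign = pump up/down), saturated at
        # one counting window.
        pump_cycles = max(-WINDOW_CYCLES, min(WINDOW_CYCLES, KP * error))
        v_bbn += pump_cycles * DV_PER_PUMP_CYCLE
        v_bbn = min(max(v_bbn, 0.0), V_BBN_MAX)     # CP output range
    return v_bbn, count_tro_edges(tro_frequency(v_bbn))


v_lock, count_lock = run_fll()
print(f"BBN = {v_lock:.3f} V, count = {count_lock}, "
      f"MCLK = {tro_frequency(v_lock) / 1e6:.2f} MHz")
```

With a six-cycle window, one count corresponds to 12 MHz/6 = 2 MHz, so this simplified loop only locks MCLK within [72, 74) MHz; keeping the frequency change per update below one count prevents the loop from skipping over the lock window.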

The UFBR system features a second loop to balance the rising and falling transitions, which are skewed by independent (crossed) process variations between PMOS and NMOS transistors leading to faster-NMOS/slower-PMOS (FS) process conditions or vice versa. This is performed in a delay-locked-loop (DLL) fashion by using delay-based NMOS/PMOS imbalance sensors and a bang-bang controller acting on BBP through a second CP. To cover a range between 0 and −3 V, this CP is based on switched capacitors (SC) with two stages according to the voltage doubler topology [46]. It is also supplied by the $V_{\text{DDIO}}$ rail with I/O transistors, reaching an open-circuit voltage of −3.6 V (i.e., −2 × 1.8 V).

Fig. 8. Proposed dual-loop digital UFBR system. BBN is the N-well bias applied to the LVT NMOS transistors and BBP is the P-well bias applied to the LVT PMOS transistors. The 390-pF coupling capacitance between BBN and BBP comes from the deep-N-well connected to BBN that isolates the P-well from the P-substrate. BBN is regulated in a FLL and BBP is regulated in a differential DLL. (a) Architecture. (b) Operation.

The UFBR system generates BBN and BBP for both the logic and the ULP SRAM, which account for a significant portion of the MCU area. In this area, a triple well is used to isolate the LVT PMOS P-well (flipped-well configuration [38]) connected to BBP from the grounded P-substrate. As a result, there is a large well diode between BBN and BBP, which results in an estimated coupling capacitance of 390 pF between them. This capacitance causes parasitic charge injection between BBN and BBP, which makes the dual loop unstable: when the FLL tries to increase the TRO frequency by pumping positive charge onto BBN, the parasitic charge injection spuriously charges up BBP, which counteracts the TRO frequency increase.

We adopted a simple solution to this problem: an external capacitor on BBN to stabilize it. The drawback of this capacitor is the energy overhead of charging it at wakeup, so its value should not be oversized. The reason to stabilize BBN instead of BBP is that BBP has a higher voltage swing between sleep (0 V) and active (−2 V) modes, so charging a large capacitance on BBP would consume more wakeup energy. Fig. 9(a) shows the UFBR wakeup with external decoupling capacitance values from 0.5 to 2 nF. A low capacitance value leads to overshoot because of lower stability. Increasing the capacitance to 2 nF eliminates the overshoot, which actually reduces the wakeup energy by 10% despite the larger switched capacitance. Fig. 9(b) also shows that the proposed architecture can generate independent asymmetric BBN/BBP voltages to compensate for skewed process corners.
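The choice of stabilizing BBN rather than BBP can be quantified with ideal charge-pump energy accounting. Assuming a lossless current-source CP on BBN that draws the charge $C\Delta V$ from $V_{\text{DDIO}}$, and a lossless ×(−2) SC doubler on BBP whose input charge is twice its output charge, the sketch compares the supply energy needed to charge the same 2-nF capacitor over each swing. Losses, the 390-pF coupling, and the overshoot dynamics of Fig. 9(a) are ignored, and the function names are hypothetical, so the numbers are only indicative.

```python
V_DDIO = 1.8     # CP supply voltage [V] (from the text)
C_EXT = 2e-9     # external decap value retained in Fig. 9(a) [F]


def energy_current_cp(c: float, dv: float, v_supply: float = V_DDIO) -> float:
    """Ideal current-source CP: charge c*|dv| is drawn from v_supply."""
    return c * abs(dv) * v_supply


def energy_doubler_cp(c: float, dv: float, v_supply: float = V_DDIO) -> float:
    """Ideal x(-2) SC charge pump: input charge is twice the output charge."""
    return 2.0 * c * abs(dv) * v_supply


e_bbn = energy_current_cp(C_EXT, 1.0)    # decap on BBN: 0 V -> +1 V
e_bbp = energy_doubler_cp(C_EXT, -2.0)   # same decap on BBP: 0 V -> -2 V
print(f"BBN decap: {e_bbn * 1e9:.1f} nJ, BBP decap: {e_bbp * 1e9:.1f} nJ")
```

Under these assumptions, the decap on BBN costs about 3.6 nJ per wakeup, while the same decap on BBP would cost four times more (twice the swing and twice the input charge per unit of output charge).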

### C. Proposed Process Imbalance Sensor

Previous NMOS/PMOS process imbalance sensors were based on analog NMOS/PMOS current comparison, which leads to a hard tradeoff between dc power and response time [45], [47]. In the proposed BBP DLL, we designed a digital NMOS/PMOS process imbalance sensor to overcome this challenge, while using only digital standard cells for good correlation with the MCU logic to be calibrated. Fig. 10(a) shows the sensor schematic, which is based on two delay lines with selective sensitivity to NMOS/PMOS logic delay. Comparison of their delay provides binary information on the relative FS/SF conditions.

Fig. 9. UFBR startup: (a) the use of a 2-nF external BBN decap $C_{ext}$ allows a faster lock without overshoot and (b) compensation of skewed process corner (mixed-signal HDL/SPICE simulations at 25 °C).

The NMOS/PMOS selective sensitivity is obtained by alternating strong (×38) and weak (×2) driving cells, such that the low current of the weak cells has to charge the large input capacitance of the strong cells. Fig. 10(b) shows that a uniform delay line, i.e., with identical ×38 inverter cells, has the same delay sensitivity to NMOS and PMOS $V_t$ variations, around 6%/10 mV. This results from the balanced rise/fall times in the TT corner with the asymmetric BBN/BBP values we selected for this corner. There is thus no selectivity. In contrast, the proposed delay lines, which alternate strong/weak inverter cells and differ in their input edge, are selective: the NMOS-sensitive delay line is $3\times$ more sensitive to NMOS $V_t$ variations (9%/10 mV) than to PMOS $V_t$ variations, and vice versa for the PMOS-sensitive line. Fig. 10(b) also shows that using transistor stacking in NAND2/NOR2 gates with shorted inputs as weak cells further improves the selective sensitivity.



Fig. 10. Proposed NMOS/PMOS imbalance sensor: (a) schematic, (b) sensitivity to NMOS/PMOS independent  $V_t$  variations, and (c) D2D variability of the closed-loop BBP voltage (100-run MC SPICE simulation results at TT corner, 25 °C).

The drawback of weak cells is that they suffer from a high local $V_t$ mismatch due to their small transistors. This can lead to strong die-to-die (D2D) variations in the resulting closed-loop BBP voltage and thus a significant leakage overhead: 47% in Fig. 10(c). Increasing the number of stages in the delay lines reduces this variability by averaging, at the cost of slower sensing. We thus use an accurate sensor with more stages in UFBR lock conditions, while a fast yet less accurate sensor with fewer stages is used during UFBR startup. The fast and accurate sensors can be sampled at up to 6 and 3 MS/s, respectively, which significantly improves the loop response time compared to the state of the art (e.g., 200  $\mu$s in [47]).
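The averaging effect of a longer delay line can be checked with a small Monte Carlo sketch. It assumes (hypothetically) that each stage delay has an independent Gaussian local variation with a 5% relative sigma; under that assumption the relative sigma of an $N$-stage line falls as $1/\sqrt{N}$, while the traversal time, and hence the minimum sampling period, grows as $N$. The function name and the 5% figure are illustrative, not extracted from Fig. 10(c).

```python
"""Monte Carlo sketch: stage count versus local-mismatch averaging.

Assumption (illustrative): independent Gaussian per-stage delay variation
with a hypothetical 5 % relative sigma.
"""
import random
import statistics


def line_sigma(n_stages, stage_sigma=0.05, runs=4000, seed=1):
    """Relative sigma of the total delay of an n-stage line."""
    rng = random.Random(seed)
    totals = []
    for _ in range(runs):
        total = sum(1.0 + rng.gauss(0.0, stage_sigma) for _ in range(n_stages))
        totals.append(total / n_stages)   # normalize by the nominal line delay
    return statistics.stdev(totals)


for n in (8, 32):
    print(n, round(line_sigma(n), 4))
```

Quadrupling the stage count roughly halves the mismatch-induced spread, which is the tradeoff behind using a short line for fast startup and a long line for accurate lock.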

Finally, bang-bang control in the BBP DLL can lead to spurious CP activity, which results in an active power overhead. To avoid this, we added a small deadband in the process imbalance sensor with additional delays in the sensor comparison logic, as illustrated in Fig. 10(a). The timing closure of the MCU was performed by taking this BBP deadband and variability into account when recharacterizing the standard-cell libraries at ULV, which adds negligible pessimism compared to the full process/temperature corner spread.
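The benefit of the deadband can be seen in a toy bang-bang loop. In this hypothetical model, the sensor returns +1, -1, or 0 (inside the deadband) from a noisy delay difference, and each non-zero decision triggers one fixed charge-pump step. The step size, noise level, and function names are assumptions for illustration only. Without a deadband, the loop toggles the CP every cycle around lock; with a deadband wider than half a step, it settles and stays quiet.

```python
"""Bang-bang BBP correction with and without a comparison deadband.

Hypothetical model: step size, noise and names are illustrative only.
"""
import random


def sensor_decision(delay_diff, deadband):
    if delay_diff > deadband:
        return +1
    if delay_diff < -deadband:
        return -1
    return 0   # inside the deadband: no charge-pump activity


def count_cp_events(deadband, cycles=1000, seed=0):
    """Number of charge-pump activations over `cycles` sensor samples."""
    rng = random.Random(seed)
    offset = 0.0    # residual delay imbalance around lock (arbitrary units)
    step = 0.01     # imbalance correction per charge-pump step
    events = 0
    for _ in range(cycles):
        noisy = offset + rng.gauss(0.0, 0.002)
        d = sensor_decision(noisy, deadband)
        if d:
            events += 1
            offset -= d * step
    return events


print(count_cp_events(deadband=0.0), count_cp_events(deadband=0.02))
```

The price of the deadband is a small residual imbalance, which is why it has to be accounted for in timing closure.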

## VI. DESIGN OF DUAL-MODE I/O-INPUT SWITCHED-CAPACITOR VOLTAGE REGULATION

The MCU requires three core supply voltages:  $V_{\text{DDH}}$  (0.8 V) for the DMEM HD SRAM,  $V_{\text{DDS}}$  (0.5 V) for the PMEM ULP SRAM, and  $V_{\text{DDL}}$  (0.4 V) for the logic. Three SCVRs generate these supplies from the single I/O supply voltage  $V_{\text{DDIO}}$  at 1.8 V. As illustrated in Fig. 11(a), we use switched-capacitor networks (SCNs) with divide-by-2 ( $\div$2), divide-by-3 ( $\div$3), and divide-by-4 ( $\div$4) topologies, respectively. The SCNs use a mix of metal-insulator-metal (MiM) and metal-oxide-metal (MoM) capacitors. MiM capacitors are denser and have lower bottom-plate parasitic capacitance, and are thus preferred over MoM capacitors [48]. However, the maximum voltage they tolerate is 1.1 V, and some capacitors in the SCNs see a higher voltage across them at startup; we thus use MoM capacitors for these. The switches in the SCNs are I/O transistors controlled by 1.8-V signals generated by non-overlapping clock (NoC) generators. Regulation is performed through pulse-skip modulation (PSM), as it ensures that switching losses scale with the output load power. It is based on voltage references generated from an analog 1.8-V input ( $V_{REF}$ ) and dynamic comparators clocked by the 12-MHz reference clock REF\_CLK. The SCNs were sized with the methodology from [48].

Fig. 11. Dual-mode I/O-input SCVRs: (a) architecture and (b) quiescent current breakdown in active and sleep modes (SPICE simulation, TT corner at 25 °C). Supplying  $V_{\text{DDS}}$  and  $V_{\text{DDL}}$  comparators from the 0.8-V  $V_{\text{DDH}}$  output instead of the primary 1.8-V  $V_{\text{DDIO}}$  allows the use of thin-oxide core transistors instead of thick-oxide I/O transistors.
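The PSM principle can be sketched with a simplified behavioral model of one SCVR, here a divide-by-4 converter targeting 0.4 V from 1.8 V. The assumptions are hypothetical and not SleepRunner values: at each regulator clock edge the comparator enables one switching cycle only if $V_{out} < V_{ref}$; an enabled cycle moves a charge packet $C_{fly}(V_{in}/N - V_{out})$ to the output (slow-switching approximation), and the load draws a constant current. The 500-pF flying and 1-nF output capacitances are illustrative.

```python
"""Pulse-skip modulation (PSM) sketch for a divide-by-N SC converter.

Hypothetical simplified model; all component values are illustrative.
"""


def simulate_psm(i_load, v_in=1.8, n_div=4, v_ref=0.4, f_clk=12e6,
                 c_fly=500e-12, c_out=1e-9, cycles=4000):
    """Return (fraction of clock cycles with a switching pulse, mean V_out)."""
    v_nl = v_in / n_div        # ideal no-load output of the SCN
    v_out = v_ref
    pulses = 0
    v_sum = 0.0
    for _ in range(cycles):
        if v_out < v_ref:      # comparator decision at the clock edge
            v_out += c_fly * (v_nl - v_out) / c_out
            pulses += 1
        v_out -= i_load / (f_clk * c_out)   # load discharge over one period
        v_sum += v_out
    return pulses / cycles, v_sum / cycles


for i_load in (25e-6, 100e-6):
    duty, v_avg = simulate_psm(i_load)
    print(f"I_load={i_load * 1e6:.0f} uA: pulse rate {duty:.3f}, "
          f"mean V_out {v_avg:.3f} V")
```

In this model, the number of switching events, and hence the switching loss, grows roughly in proportion to the load current, while the output stays close to the reference.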

There are two challenges for these SCVRs. First, as  $V_{\text{DDIO}}$  is the only primary supply, the control logic consumes a significant amount of quiescent power because of its relatively high supply voltage (1.8 V) and because it must be implemented with thick-oxide I/O transistors that are significantly larger than core transistors. Fig. 11(b) shows that the quiescent power (< 20  $\mu$W), dominated by the I/O comparator power consumption (control losses), is indeed high compared to the expected load power at low MCLK frequency ( $\approx$  120  $\mu$W). Second, the maximum output load power of these SCVRs varies by 30× between active and sleep modes, which sets a strong constraint on the quiescent power in sleep mode while still requiring sufficient regulation capability (output impedance and bandwidth) in active mode.

Fig. 11 shows the proposed solution to these challenges. First, we generate the  $V_{\text{DDH}}$  supply from  $V_{\text{DDIO}}$  and use it to supply the comparators of the  $V_{\text{DDS}}$  and  $V_{\text{DDL}}$  SCVRs, which can thus be implemented with thin-oxide core transistors. This saves 2/3 of the comparator power.<sup>4</sup> Second, the SCVRs are designed for dual-mode operation with a different clock frequency in active and sleep modes: the 12-MHz REF\_CLK is used in active mode and is divided by 64 in sleep mode, which reduces the quiescent power proportionally except for the leakage and frequency-divider contributions.
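The effect of the divide-by-64 clock can be captured by a two-term model, $P_q(f) = P_{dyn}\,f/f_{active} + P_{fixed}$, where only $P_{dyn}$ scales with the comparator clock and $P_{fixed}$ lumps leakage and the frequency divider. As a sketch, the snippet below fits this model to the measured total quiescent powers reported in Sec. VII-A (10 µW active, 1.65 µW sleep). The split it produces is an inference from this simple model, not a measured breakdown.

```python
"""Split SCVR quiescent power into a clock-scaled and a fixed part.

Assumption: P_q = P_dyn * (f / f_active) + P_fixed. The two totals are
the measured values from Sec. VII-A; the split is a model inference.
"""

DIVIDE = 64           # REF_CLK division factor in sleep mode
P_ACTIVE_UW = 10.0    # measured total quiescent power, active mode
P_SLEEP_UW = 1.65     # measured total quiescent power, sleep mode

# Solve P_dyn + P_fixed = P_ACTIVE and P_dyn / DIVIDE + P_fixed = P_SLEEP.
p_dyn = (P_ACTIVE_UW - P_SLEEP_UW) / (1 - 1 / DIVIDE)
p_fixed = P_ACTIVE_UW - p_dyn
print(f"P_dyn ~ {p_dyn:.2f} uW, P_fixed ~ {p_fixed:.2f} uW")
```

Under this model, the clock-independent part (about 1.5 µW) dominates in sleep, which is consistent with the higher die-to-die spread attributed to leakage in sleep mode.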

<sup>4</sup>Let us mention for full transparency that I/O comparators in this chip were sized conservatively with respect to mismatch and that their power could thus be reduced by  $3-5\times$  with sizing optimization. Nevertheless, their power would still be  $\approx 20\times$  higher than the comparators implemented with core transistors.



Fig. 12. Microphotograph of SleepRunner MCU die in 28-nm FDSOI with superimposed layout view. Active MCU area is below 0.6 mm<sup>2</sup>. Sizes and area numbers are provided post optical shrink.



Fig. 13. SCVR measurement results: (a) load regulation in active mode, (b) efficiency in active mode, and (c) quiescent current for the tested dies in active and sleep modes.

## VII. EXPERIMENTAL VALIDATION

The SleepRunner MCU SoC was prototyped in 28-nm FDSOI on a 1.6-mm<sup>2</sup> die, shown in Fig. 12, using a 10-metal process featuring dense MiM capacitors. The MCU area is below 0.6 mm<sup>2</sup>, including all digital and mixed-signal blocks from Fig. 2. It is packaged in QFN80. In this section, we provide experimental validation of the SCVR, UFBR, ULP SRAM macro, and computing subsystems before assessing the ULP application potential by mapping two biomedical applications on SleepRunner and comparing it to state-of-the-art ULP MCUs.

### A. SCVR Performance

SCVR characterization shows good load regulation for the three core supplies  $V_{\text{DDH}}$ ,  $V_{\text{DDS}}$ , and  $V_{\text{DDL}}$  up to 150–200  $\mu$W of output load power, as illustrated in Fig. 13(a). The global PCE, defined as the ratio between the sum of the three output powers on the core supplies  $V_{\text{DDH}}$ ,  $V_{\text{DDS}}$ , and  $V_{\text{DDL}}$  and the input power on the  $V_{\text{DDIO}}$  primary supply, is shown in Fig. 13(b), with a decent peak of 76.3%. Fig. 13(c) shows the total SCVR quiescent power for the tested dies, with a reduction from 10 to 1.65  $\mu$W between active and sleep modes thanks to clock frequency division. Note the higher variability of the quiescent power in sleep mode (3× higher standard deviation), which comes from the proportionally higher leakage contribution.

Fig. 14. UFBR system measurement results: (a) wakeup and go-to-sleep power mode transitions, (b) generated frequency, and (c) active power as a function of the temperature.

### B. UFBR Performance

The UFBR functionality was tested for several dies over a temperature range from -40 to +85 °C. Fig. 14(a) shows typical transitions from active to sleep mode and vice versa. It shows a test output signal consisting of MCLK divided by 64 on-chip, as well as the BBN and BBP voltages. At wakeup, the closed-loop UFBR system locks in less than 10  $\mu$s, with MCLK, generated by the on-chip TRO, stabilizing at 64 MHz and BBN/BBP stabilizing around +0.75/-1.8 V, respectively. The total wakeup time is less than 20  $\mu$s. The time required to go to sleep mode is similar, and MCLK then runs at 750 kHz, which results from the use of REF\_CLK divided by 16 in sleep mode. BBN is quickly discharged to 0 V by the current CP driving it, whereas BBP takes time to settle at 0 V. This comes from the switched-capacitor discharge pump driving BBP, which is gated after a few  $\mu$s: as this discharge pump is not a static circuit, it cannot be kept active in sleep mode for power reasons. This is not an issue, as leakage is dominated by NMOS transistors in sleep mode, so the incomplete BBP discharge does not significantly affect the leakage power. The 20- $\mu$s mode transition times are very competitive compared to previous designs using closed-loop adaptive FBB<sup>5</sup> (200  $\mu$s in [45], 200  $\mu$s in [47]).

Fig. 14(b) shows the MCLK frequency as a function of temperature. With both ZBB and static asymmetric FBB (BBN = 1 V, BBP = -2 V), the frequency of the TRO generating MCLK decreases significantly as temperature decreases, because of the associated  $V_t$  increase [43]. However, with static FBB, the frequency drops above +25 °C. Moreover, we observe a degradation of the absolute maximum frequency (i.e., at the optimum FBB level) as temperature increases. We attribute this either to prohibitive IR drops or to the impact of carrier mobility degradation on the transistor current becoming larger than that of the  $V_t$  reduction. At +85 °C, the MCU is not even functional with static FBB, because robustness issues appear, due either to the poor  $I_{on}/I_{off}$  ratio resulting from the very low  $V_t$  or to excessive IR drops caused by prohibitive leakage. At low temperature, the MCU is not functional with static ZBB, even with the reduced TRO frequency. This is due either to robustness issues in the ULP SRAM caused by too low subthreshold current or to timing violations on the external JTAG clock, which uses a fixed frequency.

<sup>5</sup>Let us mention that this time depends on the biased load area, which is around 0.4 mm<sup>2</sup> post-shrink in this chip.

In any case, Fig. 14(b) shows that the MCU remains functional over a wide temperature range when using the adaptive FBB generated by the UFBR system. It also shows the UFBR capability to preserve the target frequency over this temperature range. Fig. 14(c) shows the power of the logic and ULP SRAM, which both use the adaptive FBB from the UFBR, as a function of temperature. When using a static asymmetric FBB (BBN = 1 V, BBP = -2 V), the power explodes at high temperature because leakage power becomes dominant. Unlike adaptive supply voltage scaling (UFVR) [8], adaptive FBB (UFBR) keeps the active power constant at low temperatures while significantly limiting the active power overhead of high-temperature operation compared to static FBB, e.g., 3–4  $\mu$W/MHz for adaptive FBB at +55 °C versus 7.5  $\mu$W/MHz for static FBB. Of course, static ZBB preserves very low active power up to +85 °C, but this comes at the cost of a lower and uncontrolled clock frequency and even functional failure at low temperature, as shown in Fig. 14(b).

### C. ULP SRAM Performance

The power of the proposed ULP SRAM was measured in two versions on the prototyped chip: the 32-kB macro used as PMEM in the MCU and an independent 8-kB macro connected to a BIST interface. The 8-kB macro reaches an outstanding read energy of 0.66 pJ per 32-bit word access. As shown in Table II, the 32-kB version uses a write byte mask (i.e., to independently select the bytes to write in the selected 32-bit word, according to the AHB standard). Together with the leakage power of the unaccessed bitcells, the switching power and routing of this write byte mask increase the access energy to 1.6 pJ. However, as shown in Table II, this read access energy is still  $5 \times$  lower than that of the foundry HD SRAM based on a conventional 6T bitcell, prototyped on the same chip. This reduction is obtained thanks to ULV operation, a divided-WL architecture, and single-ended bitlines [35]. Although we selected a low-leakage flavor with RVT transistors for the HD SRAM, the retentive power of the proposed ULP SRAM is  $2 \times$  lower despite its full LVT implementation, thanks to its unique ULL bitcell with negative  $V_{gs}$  self-biasing in the NDR-MOS structures. These power savings come at the cost of a  $5 \times$  lower density for the ULP SRAM, mostly due to the larger ULL bitcell layout.

TABLE II Comparison to ULP SRAMs in FDSOI

|                                        | HD SRAM with 6T bitcell [foundry] | ULP SRAM with 7T ULL bitcell [19] | Proposed ULP SRAM with 8T ULL bitcell | Proposed ULP SRAM with 8T ULL bitcell | ULP SRAM with 6T bitcell [49] |
|----------------------------------------|-----------------|--------------|---------|---------|--------------------------------------------------|
| Technology                             | 28nm FDSOI      | 28nm FDSOI   | 28nm FDSOI | 28nm FDSOI | 65nm SOTB                                    |
| Macro size [kB]                        | 32              | 8            | 8       | 32      | 16                                               |
| I/O word width [bits]                  | 32              | 32           | 32      | 32      | 32                                               |
| Write byte mask                        | ✓               | ×            | ×       | ✓       | N/A                                              |
| V<sub>DD</sub> [V]                     | 0.8             | 0.5          | 0.55    | 0.5     | 0.75                                             |
| Macro density [kB/mm<sup>2</sup>]      | 600             | 89           | 118     | 125     | 180                                              |
| Read access time [ns]                  | < 2             | < 10         | < 10    | < 10    | < 4.6                                            |
| Access energy [pJ/32-bit access]       | 8.2 @80 MHz     | 0.64 @80 MHz | 0.66 @80 MHz | 1.6 @48 MHz | 6.3 @150 MHz                           |
| Retentive power [nW/kB]                | 138             | 2600†        | 61      | 69      | 0.034° (macro)<br>0.076 (BB gen)<br>0.11 (total) |

Retentive power is reported in deep-sleep mode with zero back bias, except for †, where FBB is required to preserve stability as retention errors appear at zero back bias, and °, where RBB is used in deep-sleep mode, which requires keeping the BB generator active.



Fig. 15. Power consumption breakdown in (a) active and (b) deep-sleep modes. The SCVR inefficiency is not considered here for the sake of comparison to the state-of-the-art. In active mode, the higher power consumption of the ULP SRAM compared to the HD SRAM is due to more frequent access. In CPU sleep mode, the power is reduced by  $\approx 2\ \mu$W/MHz compared to the active mode. In deep-sleep mode, the power gating of the HD SRAM results in a power consumption of 4.4  $\mu$W with 32-kB and CPU state retention.

Table II also compares the proposed macro with a previous ULP SRAM based on NDR structures and with a ULP SRAM that uses reverse BB in deep-sleep mode in a more relaxed FDSOI technology (65-nm SOTB). The ULP SRAM from [19], based on a 7T ULL NDR bitcell, achieves similar access energy for 8-kB macros at a lower density. Its data retention was ensured under FBB but showed errors at zero BB, so its retentive power is reported with FBB, which leads to a strong retention power overhead. The ULP SRAM from [49] offers a dense layout in 65-nm technology with outstanding retentive power thanks to RVT transistors, the use of RBB, and a more relaxed technology node. As this comes at the cost of  $4 \times$  higher access energy, it is an interesting option for applications with a very low duty cycle, whereas the proposed ULP SRAM is better suited to duty cycles above 1%.
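The duty-cycle crossover between a low-access-energy and a low-retention-power SRAM follows from a simple average-power model, $P = D\,E_{acc}\,f_{acc} + (1-D)\,P_{ret}$. As a sketch, the snippet below uses the Table II access energies and retention powers (1.6 pJ and 69 nW/kB for the proposed 32-kB macro, 6.3 pJ and 0.11 nW/kB total for [49]) and assumes, hypothetically, a 32-kB memory accessed once per cycle at 48 MHz while active, for both macros.

```python
"""Duty-cycle crossover: low access energy versus low retention power.

Assumptions (illustrative): a 32-kB memory, one 32-bit access per cycle
at 48 MHz while active, retention power otherwise. Per-access energy and
per-kB retention power are the Table II values.
"""

F_ACCESS = 48e6   # accesses per second while active (assumed)
SIZE_KB = 32


def avg_power(duty, e_access, p_ret_per_kb):
    """Average power (W) for active fraction `duty`."""
    return duty * e_access * F_ACCESS + (1 - duty) * p_ret_per_kb * SIZE_KB


def crossover_duty(a, b):
    """Duty cycle where memories a and b, given as (E_access, P_ret/kB), match."""
    (ea, pa), (eb, pb) = a, b
    # duty * (Eb - Ea) * F = (1 - duty) * (Pa - Pb) * SIZE  ->  solve for duty
    num = (pa - pb) * SIZE_KB
    return num / (num + (eb - ea) * F_ACCESS)


proposed = (1.6e-12, 69e-9)
sotb_49 = (6.3e-12, 0.11e-9)
print(f"crossover duty cycle ~ {crossover_duty(proposed, sotb_49):.2%}")
```

Under these assumptions the crossover falls just below 1%, consistent with the statement above; a lower access rate would shift it toward higher duty cycles.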

### D. Computing Power

Active power was measured independently for the logic ( $V_{DDL}$ ), the PMEM ULP SRAM ( $V_{DDS}$ ), the DMEM HD SRAM ( $V_{DDH}$ ), and the FBB drivers ( $V_{DDIO}$ ) for a simple synthetic benchmark. The results are provided in Fig. 15(a) for different target frequencies from 40 to 80 MHz, generated internally by the UFBR system. At 40 and 80 MHz, we use a -5% underdrive and a +5% overdrive on the supply voltages, respectively. The power consumption of the FBB drivers is negligible as the UFBR is in lock.<sup>6</sup>

The active power normalized to the clock frequency is roughly stable over the frequency range and below 4  $\mu$W/MHz, which confirms the observation from [28], [36], [37], [47] that FBB can shift the MEP over a frequency range. When FBB increases, we observe a slight increase of the logic and ULP SRAM power normalized to the clock frequency.<sup>7</sup> This is because the leakage current increases faster with a  $V_t$  reduction than the clock frequency does, as the transistors operate in the near-threshold regime. This would not happen in the subthreshold regime [51]. This increase is larger for the ULP SRAM because leakage accounts for a larger share of its power than for the logic. Let us recall that the ULP SRAM power is higher than the HD SRAM power because the PMEM is accessed much more frequently than the DMEM in Cortex-M0 MCUs. If the PMEM were implemented with an HD SRAM macro, its power alone in active mode would be 4  $\mu$W/MHz instead of 1  $\mu$W/MHz for the ULP SRAM.
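The near-threshold versus subthreshold argument can be illustrated with an EKV-style current interpolation, $I \propto \ln^2\!\left(1 + e^{(V_{gs}-V_t)/(2 n \phi_t)}\right)$. This is an assumed textbook model, not the paper's: leakage uses $V_{gs} = 0$, the on-current uses $V_{gs} = V_{DD}$, and since the cycle time scales as $1/I_{on}$, the leakage energy per cycle scales as $I_{leak}/I_{on}$. The slope factor and $V_t$ values are hypothetical. In deep subthreshold both currents move by the same exponential factor when $V_t$ changes, so the ratio stays flat; near threshold, $I_{on}$ grows sub-exponentially and the ratio rises.

```python
"""Leakage energy per cycle versus Vt, subthreshold and near-threshold.

Illustrative EKV-style model (an assumption): slope factor and Vt values
are hypothetical.
"""
import math

PHI_T = 0.0259   # thermal voltage at ~300 K (V)
N_SLOPE = 1.3    # subthreshold slope factor (assumed)


def ekv_current(vgs, vt):
    """Normalized drain current, smooth from subthreshold to strong inversion."""
    x = (vgs - vt) / (2 * N_SLOPE * PHI_T)
    return math.log1p(math.exp(x)) ** 2


def leak_per_cycle(vdd, vt):
    """Relative leakage energy per cycle ~ I_leak / I_on (arbitrary units)."""
    return ekv_current(0.0, vt) / ekv_current(vdd, vt)


for vdd in (0.1, 0.4):   # deep subthreshold vs. near-threshold (V_DDL = 0.4 V)
    growth = leak_per_cycle(vdd, 0.30) / leak_per_cycle(vdd, 0.40)
    print(f"VDD={vdd} V: lowering Vt 0.40->0.30 V multiplies it by {growth:.2f}")
```

With these parameters the leakage energy per cycle is nearly unchanged at 0.1 V but grows severalfold at 0.4 V, matching the qualitative trend described above.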

When executing the CoreMark benchmark, the active power is 3.3  $\mu$W/MHz at 48 MHz, which is 25% higher than for the simple counter benchmark due to more frequent DMEM accesses, but still well below the 4- $\mu$W/MHz threshold.

Deep-sleep power was measured for 10 dies, with an average value of 8.4  $\mu$W. As shown in Fig. 15(b), it is dominated by the HD SRAM leakage despite the low-leakage flavor we selected, while the ULP SRAM again shows its power advantage. Note that the HD SRAM features internal power switches to power gate it. The deep-sleep power can thus be reduced to 4.4  $\mu$W in this mode, at the cost of losing the HD SRAM data, which requires a software initialization at startup.
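The per-kB deep-sleep figure used in the comparison tables follows from normalizing this measured average by the retained SRAM capacity (the 32-kB ULP SRAM PMEM plus the 32-kB HD SRAM DMEM). The short calculation below makes that normalization explicit.

```python
"""Normalize the measured deep-sleep power to the retained memory size."""

P_DEEP_SLEEP_UW = 8.4   # average measured deep-sleep power (10 dies)
RETAINED_KB = 64        # 32-kB ULP SRAM (PMEM) + 32-kB HD SRAM (DMEM)

nw_per_kb = P_DEEP_SLEEP_UW * 1e3 / RETAINED_KB
print(f"{nw_per_kb:.0f} nW/kB")   # -> 131 nW/kB
```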

### E. Applicative Power

To quantify the benefit of SleepRunner's energy efficiency in applications, we ported two biomedical application algorithms to it. First, we ported an epileptic seizure onset detection algorithm with an FFT-based spatio-temporal feature extraction running on 2-s epochs of 23 EEG channels. Classification is performed by a linear SVM on the features of three consecutive, overlapping epochs. Fig. 16(a) shows the energy consumed by the logic and memories in different cases. First, when running everything in software with HD SRAM used as PMEM and DMEM, the total energy is 28.7  $\mu$J. Executing the feature extraction on the FFT HW accelerator increases the active power but significantly speeds up the computation, which reduces the total energy by  $28 \times$  thanks to much lower FFT energy. Setting the CPU in sleep mode (clock gating) during the FFT computation further saves 20% of the active power. Finally, using the ULP SRAM as PMEM saves another 33% of the active power.



Fig. 16. ULP performance in biomedical applications: (a) energy for batch execution of epileptic seizure onset detection on three 2-s epochs and (b) always-on power for real-time arrhythmia detection from 200-S/s ECG. Both are run at 48-MHz with UFBR system regulating MCLK and BBN/BBP.

We also ported an arrhythmia-detection algorithm to illustrate the average power of SleepRunner in always-on applications. It is based on heartbeat detection, temporal feature extraction, and linear-SVM-based classification triggered when a heartbeat is detected. Fig. 16(b) shows the evolution of the total average power as functionality is progressively added: from deep-sleep mode to 1-channel ECG data acquisition through SPI at 200 S/s (i.e., 200 wakeup events per second), 3-channel ECG data acquisition, and 3-channel ECG data acquisition with subsequent data processing. It shows that frequent wakeup events at 200 S/s have a significant impact on the always-on power, but also that the ultralow active power of SleepRunner, combined with its high speed for a ULP MCU, allows rich computation to be performed while preserving a low duty cycle, with a very limited increase of the average power.
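Why a low active power keeps the average power close to the sleep floor can be seen with a simple duty-cycling model, $P_{avg} = (1-D)\,P_{sleep} + D\,P_{active}$ with $D = f_{event}\,t_{active}$. As a sketch, the snippet takes the 8.4-µW deep-sleep power and the 3.3-µW/MHz active power at 48 MHz from Sec. VII-D. The per-event active times (which here lump wakeup, SPI transfer, and processing) are hypothetical placeholders, not measured SleepRunner values, so the printed averages are not the Fig. 16(b) results.

```python
"""Average power of a duty-cycled always-on sensing loop (illustrative).

P_sleep and the active power come from Sec. VII-D; the per-event active
times are hypothetical placeholders, not measured values.
"""

P_SLEEP = 8.4e-6          # W, deep-sleep power
P_ACTIVE = 3.3e-6 * 48    # W, 3.3 uW/MHz at 48 MHz


def average_power(event_rate_hz, t_active_s):
    """Average power (W) for a given wakeup rate and active time per wakeup."""
    duty = event_rate_hz * t_active_s
    assert duty <= 1.0, "active time exceeds the event period"
    return P_SLEEP * (1 - duty) + P_ACTIVE * duty


# Hypothetical active time per 200-S/s wakeup: acquisition only vs. with processing.
for label, t_on in (("acquisition only", 20e-6), ("acquisition + processing", 200e-6)):
    p = average_power(200, t_on)
    print(f"{label}: {p * 1e6:.1f} uW average")
```

In this model, each wakeup adds $P_{active}\,t_{active}$ of energy, so a faster, lower-power core shortens $t_{active}$ and keeps the duty cycle small even at 200 events per second.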

### F. Comparison to the State of the Art

Let us now compare SleepRunner to previous ULP MCUs in Table III. From the functionality point of view, SleepRunner and the 90-nm ULL MCU from [25] are among the rare research MCUs featuring closed-loop PVT compensation, ePM, CPU state retention in deep-sleep mode, and memory capacity above 16 kB for rich application processing. Compared with [25], SleepRunner features  $7 \times$  lower active power (3.3  $\mu$W/MHz) at  $5 \times$  higher computing performance (51 DMIPS at 40 MHz).
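The three active figures of merit quoted for SleepRunner (µW/MHz, µW/DMIPS, and DMIPS/mW) are related by simple conversions, sketched below from the 51-DMIPS-at-40-MHz and 3.3-µW/MHz values. The result, about 386 DMIPS/mW, agrees with the 385 DMIPS/mW listed in Table III up to rounding of the 2.6-µW/DMIPS figure.

```python
"""Relating the normalized active-power figures of merit."""

F_MHZ = 40.0          # clock frequency
DMIPS = 51.0          # Dhrystone performance at 40 MHz
P_PER_MHZ_UW = 3.3    # active power at the MEP (uW/MHz)

dmips_per_mhz = DMIPS / F_MHZ                 # ~1.28 DMIPS/MHz
uw_per_dmips = P_PER_MHZ_UW / dmips_per_mhz   # ~2.6 uW/DMIPS
dmips_per_mw = 1e3 / uw_per_dmips             # ~386 DMIPS/mW
total_power_uw = P_PER_MHZ_UW * F_MHZ         # ~132 uW at 40 MHz
print(f"{uw_per_dmips:.2f} uW/DMIPS, {dmips_per_mw:.0f} DMIPS/mW, "
      f"{total_power_uw:.0f} uW")
```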

For visual performance comparison, we provide power/performance tradeoff plots in Fig. 17. The MCUs in 28/22-nm FDSOI CMOS technology clearly stand out in Fig. 17(a) as much more efficient than MCUs in 90/65/40-nm ULP/ULL/LP bulk CMOS for the tradeoff between active power and computing performance.<sup>8</sup> With 2.6  $\mu$W/DMIPS at 51 DMIPS (40 MHz), SleepRunner clearly outperforms previous 28-nm FDSOI MCUs, with 3× lower power yet higher speed than the MCU from [28]. In [50], 22-nm FDSOI is shown to reduce the logic energy per cycle under timing constraints by 3× compared to 28-nm FDSOI. Despite this, SleepRunner achieves 30% lower active power than the 22-nm FDSOI MCU from [30] thanks to the proposed FBB-driven logic/memory/power management co-optimization. The 22-nm FDSOI MCU from [29] reaches the best active power at the cost of limited speed and memory capacity. Let us also mention that SleepRunner is based on an easy-access obfuscated Cortex-M0 CPU, which has less optimized energy efficiency than the full Cortex-M0+ CPU used in other references.

<sup>6</sup>Let us mention that in [34] we wrongly attributed a parasitic power to the FBB drivers. This parasitic power was due to a different block sharing the same supply pin that was erroneously left active.

<sup>7</sup>The HD SRAM power does not vary with the FBB voltages because it uses a fixed zero BB.

<sup>8</sup>The relatively high active power of the MCU in 14-nm FinFET CMOS from [31] is hard to explain. It could be due to its more complex x86 CPU architecture compared to the Cortex-M CPU architecture used in other MCUs, or to larger guardbands for industrial robustness concerns.

TABLE III Comparison to State-of-the-Art ULP MCUs

|                                             | Bol,<br>JSSC,<br>2013 | Myers,<br>VLSIC,<br>2017 | Prabhat,<br>ISSCC,<br>2020 | Lee,<br>JSSC,<br>2020 | Paul,<br>JSSC,<br>2017 | Ambiq,<br>ApolloBlue3,<br>2019 | Salvador,<br>ESSCIRC,<br>2018 | Abouzeid,<br>ESSCIRC,<br>2015 | Uytterhoeven,<br>ESSCIRC,<br>2018 | Lallement,<br>JSSC,<br>2018 | Lallement,<br>SSCL,<br>2019 | Höppner,<br>ESSDERC,<br>2019 | This<br>work |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| CMOS technology | 65nm LP/GP | 65nm LP | 65nm LP | 55nm DDC | 14nm FinFET | 40nm ULP eFlash | 90nm ULL eFlash | 28nm FDSOI | 28nm FDSOI | 28nm FDSOI | 22nm FDSOI | 22nm FDSOI | 28nm FDSOI |
| CPU | oMSP430 | CM0+ | CM33 SIMD | CM0 | x86 IA | CM4F | CM3 | CM4F | Zscale | CM0+ | CM0+ | CM4F | CM0DS |
| Memory | 18kB SRAM | 16kB SRAM | 128kB ROM<br>+ 20kB RAM | 8kB SRAM | 16kB ROM<br>+ 80kB SRAM | 384kB SRAM<br>+ 1MB Flash | 32kB SRAM<br>+ 256kB Flash | 16kB SRAM | 64kB SRAM | 8kB SRAM | 12kB SRAM | 84kB SRAM | 64kB SRAM |
| Closed-loop PVT<br>compensation | UFVR<br>(AVS) | AFS | UFVR<br>(AVS) | AVS+ABB with<br>MEP tracking | × | N/A | UFVR<br>(AVS) | × | × | × | Limited ABB | ABB | UFBR<br>(ABB) |
| Embedded PM | ✓ | ~ | ✓ | ~ | × | ✓ | ✓ | × | × | × | × | × | ~ |
| Max. frequency at<br>MEP supply [MHz] | 32 | 0.2 | 0.8 | 5 | 3.5 | 96 | 16 | 45 | 66 | 16 | 20 | 180 | 80 |
| Active power<br>at MEP [µW/MHz] | 6.1<br>@25 MHz | 7.6<br>@0.2 MHz | 20<br>@4 MHz | 6.4<br>@0.5 MHz | 27*<br>@3.5 MHz | 32.8†<br>@48 MHz | 23<br>@5 MHz | 8.9<br>@45 MHz | 8.8<br>@22 MHz | 2.7<br>@16 MHz | 1.13<br>@20 MHz | 6.9<br>@180 MHz | 3.3<br>@40 MHz |
| Peak efficiency<br>[DMIPS/mW] | ~ 66*<br>(10 DMIPS) | 180<br>(0.24 DMIPS) | 95<br>(7.6 DMIPS) | × (not<br>enough RAM) | ~ 74*<br>(7 DMIPS) | 58†<br>(93 DMIPS) | 82<br>(9 DMIPS) | 215<br>(86 DMIPS) | 126<br>(24 DMIPS) | × (not<br>enough RAM) | 841<br>(19 DMIPS) | 278<br>(344 DMIPS) | 385<br>(51 DMIPS) |
| Logic state retention<br>in deep sleep mode | × | ~ | × | N/A | ~ | ~ | ✓ | N/A | N/A | × | ✓ | ~ | ~ |
| Deep-sleep retention<br>power [nW/kB] | 95‡<br>(18kB RAM) | 16<br>(4kB RAM) | 2.5‡<br>(4kB RAM) | - | 79<br>(80 kB RAM) | 220† (8 kB RAM)<br>17† (384kB RAM) | 4.3<br>(8kB RAM) | - | - | 121<br>(8kB RAM) | 308<br>(12kB RAM) | > 548°<br>(84 kB RAM) | 131<br>(64 kB RAM) |
| Wake-up time | 30 µs | N/A | 180 µs | - | > 1 ms | 15 µs | N/A | - | - | N/A | N/A | N/A | < 20 µs |

When the data are available, the CoreMark benchmark is selected for active power numbers. For the peak energy efficiency, Dhrystone performance numbers are used because they were available for most MCUs, except for \*, where we assume 0.4 DMIPS/MHz for oMSP430 and 2 DMIPS/MHz for x86.

Efficiency numbers are not provided for MCUs with memory capacity below 10 kB, which does not allow Dhrystone/CoreMark execution.

<sup>†</sup> Power data including the ePM losses, as the numbers without ePM are not available. <sup>‡</sup> Deep-sleep power reported without CPU state retention. <sup>°</sup> Leakage only (excluding switching). Red fonts highlight MCU limitations: RAM capacity below 32 kB, MEP frequency below 10 MHz, wake-up time above 50 µs, absence of PVT compensation, ePM or CPU state retention. Blue fonts highlight top performance: freq. at MEP supply above 50 MHz, active power below 5 µW/MHz, efficiency above 200 DMIPS/mW and deep-sleep power below 200 nW/kB.

Fig. 17. Comparison to previous ULP MCUs with respect to (a) active power and (b) deep-sleep retention power versus computing performance. The legend indicates the technology node and (a) the RAM size due to its impact on active power and (b) the wakeup time due to its impact on retention power. Red fonts highlight MCU limitations: (a) RAM capacity below 32 kB and (b) absence of CPU state retention in deep-sleep mode. For the sake of comparison, the ePM power losses (here SCVRs) are not taken into account except for [15] and [16].

Fig. 17(b) shows the opposite trend for deep-sleep retention power: the MCUs in 90/65/40-nm ULP/ULL/LP bulk CMOS achieve much lower retention power than those in 28/22-nm FDSOI CMOS because these processes are optimized for low leakage. Nevertheless, with a total deep-sleep power of 131 nW/kB at a short wakeup time of 20  $\mu$s, SleepRunner achieves competitive results, with a retention power  $4 \times$  lower than the leakage power<sup>9</sup> of the MCU from [30], at the cost of lower speed performance. This low deep-sleep retention power is achieved by the switchable FBB scheme with ZBB applied in this mode and the unique low-leakage capability of the ULL SRAM bitcell, while the short wakeup time is enabled by the digital UFBR scheme.

### VIII. CONCLUSION

In this article, we introduced a ULP Cortex-M0 MCU with an FBB-driven logic/memory/power management co-optimization, illustrated in Fig. 18, for the best exploitation of FBB in FDSOI. In particular, ULV operation with upsized gate length and adaptive asymmetric FBB, combined with the digital dual-loop FLL/DLL UFBR system and the FBB-compatible ULP SRAM macro, enables an active power as low as 2.6 $\mu$W/DMIPS at the MEP, with computing performance up to 100 DMIPS. Its total deep-sleep power of 131 nW/kB, with retention of both the 64-kB SRAM data and the CPU state, is enabled by the ZBB voltage applied in this mode and by the unique custom ULL SRAM bitcell, while the digital UFBR system keeps the wakeup time short. Compared to a conventional design with RVT logic and two 32-kB HD SRAMs fully supplied at 0.8 V with ZBB as the MCU starting point, the proposed techniques reduce the total active power and deep-sleep power by $3 \times$ and $2.5 \times$, respectively.

<sup>9</sup>Only leakage power is reported in [30], which thus excludes any always-on switching activity for control purposes.

Fig. 18. Proposed logic/memory/power management co-optimization to exploit FDSOI FBB capability for 100-DMIPS 64-kB Cortex-M0 MCUs.

Although some of these techniques can hardly be applied to FinFET technologies, we believe that they are applicable to 22/18/12-nm FDSOI technologies.

### ACKNOWLEDGMENT

The authors would also like to thank François Stas for the CP design and Adrian Kneip for the statistical simulations of the ULL bitcell.

#### REFERENCES

- [1] D. Bol *et al.*, "Green SoCs for a sustainable Internet-of-Things," in *Proc. IEEE Faible Tension Faible Consommation*, Jun. 2013, pp. 1–4.
- [2] D. Bol, G. de Streel, and D. Flandre, "Can we connect trillions of IoT sensors in a sustainable way? A technology/circuit perspective (Invited)," in *Proc. IEEE SOI-3D-Subthreshold Microelectron. Technol. Unified Conf. (S3S)*, Oct. 2015, pp. 1–3.
- [3] F. Wall, A. Rollat, and R. Pell, "Responsible sourcing of critical materials," *GeoScienceWorld Elements*, vol. 13, no. 5, pp. 313–318, 2017.
- [4] K. Chou, D. Walther, and H. Liou, "The conundrums of sustainability: Carbon emissions and electricity consumption in the electronics and petrochemical industries in Taiwan," *Sustainability*, vol. 11, no. 20, p. 5664, Oct. 2019.
- [5] H. Boni, M. Schluep, and R. Widmer, "Recycling of ICT equipment in industrialized and developing countries," in *ICT Innovations for Sustainability.* Cham, Switzerland: Springer, 2015, pp. 223–241.
- [6] D. Bol, "Ultra-low-power SoCs for local sensor data processing," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2018.
- [7] D. Bol, S. Boyd, and D. Dornfeld, "Application-aware LCA of semiconductors: Life-cycle energy of microprocessors from high-performance 32 nm CPU to ultra-low-power 130 nm MCU," in *Proc. IEEE Int. Symp. Sustain. Syst. Technol.*, May 2011.
- [8] D. Bol et al., "SleepWalker: A 25-MHz 0.4-V Sub-mm<sup>2</sup> 7-μW/MHz microcontroller in 65-nm LP/GP CMOS for low-carbon wireless sensor nodes," *IEEE J. Solid-State Circuits*, vol. 48, no. 1, pp. 20–32, Jan. 2013.
- [9] STMicroelectronics. Footprint of a Microcontroller. Accessed: Apr. 9, 2020. [Online]. Available: https://www.st.com/content/st_com/en/about/st_approach_to_sustainability/sustainability-priorities/sustainable-technology/eco-design/footprint-of-a-microcontroller.html

- [10] R. G. Dreslinski, M. Wieckowski, D. Blaauw, D. Sylvester, and T. Mudge, "Near-threshold computing: Reclaiming Moore's law through energy efficient integrated circuits," *Proc. IEEE*, vol. 98, no. 2, pp. 253–266, Feb. 2010.
- [11] D. Bol, "Robust and energy-efficient ultra-low-voltage circuit design under timing constraints in 65/45 nm CMOS," *J. Low-Power Electron. Appl.*, vol. 1, pp. 1–19, 2011.
- [12] M. Alioto, "Ultra-low power VLSI circuit design demystified and explained: A tutorial," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 59, no. 1, pp. 3–29, Jan. 2012.
- [13] J. Kwong *et al.*, "A 65 nm sub-V<sub>t</sub> microcontroller with integrated SRAM and switched capacitor DC-DC converter," *IEEE J. Solid-State Circuits*, vol. 44, no. 1, pp. 115–126, Jan. 2009.
- [14] S. Hanson *et al.*, "A low-voltage processor for sensing applications with picowatt standby mode," *IEEE J. Solid-State Circuits*, vol. 44, no. 4, pp. 1145–1155, Apr. 2009.
- [15] Apollo3 Blue MCU Datasheet, Revision 0.11.0, AmbiqMicro, Austin, TX, USA, 2020.
- [16] Apollo2 Blue Datasheet, Revision 1.0, AmbiqMicro, Austin, TX, USA, 2019.
- [17] P. Prabhat *et al.*, "M0N0: A performance-regulated 0.8-to-38MHz DVFS ARM cortex-M33 SIMD MCU with 10nW sleep power," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2020, pp. 422–423.
- [18] D. Kim, G. Chen, M. Fojtik, M. Seok, D. Blaauw, and D. Sylvester, "A 1.85fW/bit ultra low leakage 10T SRAM with speed compensation scheme," in *Proc. IEEE Int. Symp. Circuits Syst. (ISCAS)*, May 2011, pp. 69–72.
- [19] T. Haine, Q.-K. Nguyen, F. Stas, L. Moreau, D. Flandre, and D. Bol, "An 80-MHz 0.4 V ULV SRAM macro in 28 nm FDSOI achieving 28-fJ/bit access energy with a ULP bitcell and on-chip adaptive back bias generation," in *Proc. 43rd IEEE Eur. Solid State Circuits Conf. (ESSCIRC)*, Sep. 2017, pp. 241–244.
- [20] R. Giterman, A. Teman, and P. Meinerzhagen, "Hybrid GC-eDRAM/SRAM bitcell for robust low-power operation," *IEEE Trans. Circuits Syst. II, Exp. Briefs*, vol. 64, no. 12, pp. 1362–1366, Dec. 2017.
- [21] S. Clerc et al., "A 0.33V/-40°C process/temperature closed-loop compensation SoC embedding all-digital clock multiplier and DC-DC converter exploiting FDSOI 28nm back-gate biasing," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2015, pp. 422–423.
- [22] J. Myers *et al.*, "A subthreshold ARM cortex-M0+ subsystem in 65 nm CMOS for WSN applications with 14 power domains, 10T SRAM, and integrated voltage regulator," *IEEE J. Solid-State Circuits*, vol. 51, no. 1, pp. 31–44, Jan. 2016.
- [23] N. Reynders and W. Dehaene, "Variation-resilient building blocks for ultra-low-energy sub-threshold design," *IEEE Trans. Circuits Syst. II, Exp. Briefs*, vol. 59, no. 12, pp. 898–902, Dec. 2012.
- [24] H. Reyserhove and W. Dehaene, "Margin elimination through timing error detection in a near-threshold enabled 32-bit microcontroller in 40nm CMOS," *IEEE J. Solid-State Circuits*, vol. 53, no. 7, pp. 2101–2113, Jul. 2018.
- [25] R. Salvador *et al.*, "A cortex-M3 based MCU featuring AVS with 34 nW static power, 15.3pJ/inst. active energy, and 16% power variation across process and temperature," in *Proc. IEEE 44th Eur. Solid State Circuits Conf. (ESSCIRC)*, Sep. 2018, pp. 278–281.
- [26] K. Bowman, "Adaptive and resilient circuits: A tutorial on improving processor performance, energy efficiency, and yield via dynamic variation," *IEEE Solid-State Circuits Mag.*, vol. 10, no. 3, pp. 16–25, Summer 2018.
- [27] G. Lallement *et al.*, "A 2.7 pJ/cycle 16 MHz, 0.7 μW deep sleep power ARM cortex-M0+ core SoC in 28 nm FD-SOI," *IEEE J. Solid-State Circuits*, vol. 53, no. 7, pp. 2088–2100, Jul. 2018.
- [28] R. Uytterhoeven and W. Dehaene, "A sub 10 pJ/cycle over a 2 to 200 MHz performance range RISC-V microprocessor in 28 nm FDSOI," in *Proc. IEEE 44th Eur. Solid State Circuits Conf. (ESSCIRC)*, Sep. 2018, pp. 326–329.
- [29] G. Lallement *et al.*, "A 1.1-pJ/cycle, 20-MHz, 0.42-V temperature compensated ARM cortex-M0+ SoC with adaptive self body-biasing in FD-SOI," *IEEE Solid-State Circuits Lett.*, vol. 1, no. 7, pp. 174–177, Jul. 2018.
- [30] S. Hoppner *et al.*, "How to achieve world-leading energy efficiency using 22FDX with adaptive body biasing on an arm cortex-M4 IoT SoC," in *Proc. 49th Eur. Solid-State Device Res. Conf. (ESSDERC)*, Sep. 2019, pp. 66–69.
- [31] S. Paul *et al.*, "A sub-cm<sup>3</sup> energy-harvesting stacked wireless sensor node featuring a near-threshold voltage IA-32 microcontroller in 14nm tri-gate CMOS for always-on always-sensing applications," *IEEE J. Solid-State Circuits*, vol. 52, no. 4, pp. 961–971, Apr. 2017.

- [32] J. Lee *et al.*, "A self-tuning IoT processor using leakage-ratio measurement for energy-optimal operation," *IEEE J. Solid-State Circuits*, vol. 55, no. 1, pp. 87–97, Jan. 2020.
- [33] F. U. Rahman, R. Pamula, and V. S. Sathe, "Computationally enabled minimum total energy tracking for a performance regulated subthreshold microprocessor in 65-nm CMOS," *IEEE J. Solid-State Circuits*, vol. 55, no. 2, pp. 494–504, Feb. 2020.
- [34] D. Bol et al., "A 40-to-80MHz sub-4μW/MHz ULV cortex-M0 MCU SoC in 28 nm FDSOI with dual-loop adaptive back-bias generator for 20μs wake-up from deep fully retentive sleep mode," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Mar. 2019, pp. 322–323.
- [35] T. Haine, D. Flandre, and D. Bol, "8-T ULV SRAM macro in 28nm FDSOI with 7.4 pW/bit retention power and back-biased-scalable speed/energy trade-off," in *Proc. IEEE SOI-3D-Subthreshold Microelectron. Technol. Unified Conf. (S3S)*, Oct. 2018, pp. 1–3.
- [36] G. de Streel and D. Bol, "Impact of back gate biasing schemes on energy and robustness of ULV logic in 28nm UTBB FDSOI technology," in *Proc. Int. Symp. Low Power Electron. Design (ISLPED)*, Sep. 2013, pp. 255–260.
- [37] D. Bol, D. Flandre, and J.-D. Legat, "Technology flavor selection and adaptive techniques for timing-constrained 45 nm subthreshold circuits," in *Proc. 14th ACM/IEEE Int. Symp. Low power Electron. Design (ISLPED)*, Aug. 2009, pp. 21–26.
- [38] A. Cathelin, "Fully depleted silicon on insulator devices CMOS: The 28-nm node is the perfect technology for analog, RF, mmW, and mixed-signal system-on-chip integration," *IEEE Solid-State Circuits Mag.*, vol. 9, no. 4, pp. 18–26, Fall 2017.
- [39] D. Bol, D. Kamel, D. Flandre, and J.-D. Legat, "Nanometer MOSFET effects on the minimum-energy point of 45 nm subthreshold logic," in *Proc. 14th ACM/IEEE Int. Symp. Low power Electron. Design (ISLPED)*, Aug. 2009, pp. 3–8.
- [40] H. Yamauchi, "Embedded SRAM design in nanometer-scale technologies," in *Embedded Memories for Nano-Scale VLSIs* (Integrated Circuits and Systems). Cham, Switzerland: Springer, 2009, pp. 69–72.
- [41] D. Levacq, V. Dessard, and D. Flandre, "Low leakage SOI CMOS static memory cell with ultra-low power diode," *IEEE J. Solid-State Circuits*, vol. 42, no. 3, pp. 689–702, Mar. 2007.
- [42] T. Haine, J. Segers, D. Flandre, and D. Bol, "Gradient importance sampling: An efficient statistical extraction methodology of high-sigma SRAM dynamic characteristics," in *Proc. Design, Autom. Test Eur. Conf. Exhib. (DATE)*, Mar. 2018.
- [43] D. Bol, C. Hocquet, D. Flandre, and J.-D. Legat, "The detrimental impact of negative celsius temperature on ultra-low-voltage CMOS logic," in *Proc. ESSCIRC*, Sep. 2010, pp. 522–525.
- [44] R. G. Gomez, E. Bano, and S. Clerc, "Comparative evaluation of body biasing and voltage scaling for low-power design on 28 nm UTBB FD-SOI technology," in *Proc. IEEE/ACM Int. Symp. Low Power Electron. Design (ISLPED)*, Jul. 2019, pp. 1–6.
- [45] G. de Streel *et al.*, "SleepTalker: A ULV 802.15.4a IR-UWB transmitter SoC in 28-nm FDSOI achieving 14 pJ/b at 27 Mb/s with channel selection based on adaptive FBB and digitally programmable pulse shaping," *IEEE J. Solid-State Circuits*, vol. 52, no. 4, pp. 1163–1177, Apr. 2017.
- [46] M. Blagojevic, M. Cochet, B. Keller, P. Flatresse, A. Vladimirescu, and B. Nikolic, "A fast, flexible, positive and negative adaptive body-bias generator in 28nm FDSOI," in *Proc. IEEE Symp. VLSI Circuits (VLSI-Circuits)*, Jun. 2016, pp. 60–61.
- [47] A. Quelen *et al.*, "A 2.5μW 0.0067 mm<sup>2</sup> automatic back-biasing compensation unit achieving 50% leakage reduction in FDSOI 28 nm over 0.35-to-1V V<sub>DD</sub> range," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, 2018, pp. 304–305.
- [48] J. De Vos, D. Flandre, and D. Bol, "A sizing methodology for on-chip switched-capacitor DC/DC converters," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 61, no. 5, pp. 1597–1606, May 2014.
- [49] M. Yabuuchi *et al.*, "A 65 nm 1.0 V 1.84 ns silicon-on-thin-box (SOTB) embedded SRAM with 13.72 nW/Mbit standby power for smart IoT," in *Proc. Symp. VLSI Technol.*, Jun. 2017, pp. 220–221.
- [50] G. Lallement, "Extension of SoCs mission capabilities by offering near-zero-power performances and enabling continuous functionality for IoT systems," Ph.D. dissertation, Aix-Marseille Univ., Marseille, France, 2019.
- [51] D. Bol, R. Ambroise, D. Flandre, and J.-D. Legat, "Interests and limitations of technology scaling for subthreshold logic," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 17, no. 10, pp. 1508–1519, Oct. 2009.

- [52] F. Abouzeid *et al.*, "28nm FD-SOI technology and design platform for sub-10pJ/cycle and SER-immune 32bits processors," in *Proc. 41st Eur. Solid-State Circuits Conf. (ESSCIRC)*, Sep. 2015, pp. 108–111.



**David Bol** (Senior Member, IEEE) received the Ph.D. degree in engineering science from the Université catholique de Louvain (UCLouvain), Louvain-la-Neuve, Belgium, in 2008.

In 2005, he was a Visiting Ph.D. Student at the CNM National Center for Microelectronics, Seville, Spain, in advanced logic design. In 2009, he was a Post-Doctoral Researcher at intoPIX, Louvain-la-Neuve, in low-power image processing. In 2010, he was a Visiting Post-Doctoral Researcher at the UC Berkeley Laboratory for Manufacturing and Sustainability, Berkeley, CA, USA, in life-cycle assessment of the environmental impacts of semiconductors. He is currently an Associate Professor at UCLouvain. In 2015, he participated in the creation of E-peas Semiconductors, Louvain-la-Neuve. He leads the Electronic Circuits and Systems (ECS) Group, focused on ultralow-power design of integrated circuits for the IoT and biomedical applications, including computing, power management, sensing, and wireless communications. He is engaged in a social-ecological transition in the field of information and communication technology (ICT) research. He co-teaches four M.S. courses on digital, analog, and mixed-signal ICs, sensors, and systems, and two B.S. courses, including the course on sustainable development and transition. He has authored or coauthored more than 120 technical papers and conference contributions and holds three granted patents. On the private side, he pioneered parental leave for male professors in his faculty to spend time connecting to nature with his family.

Dr. Bol (co-)received four Best Paper/Poster/Design Awards at IEEE conferences (ICCD 2008, SOI Conference 2008, FTFC 2014, ISCAS 2020). He serves as a Reviewer for various journals and conferences, such as the IEEE JOURNAL OF SOLID-STATE CIRCUITS, IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, and IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS I/II. Since 2008, he has presented several invited papers and keynote tutorials at international conferences, including a forum presentation at IEEE ISSCC 2018.



Maxime Schramme (Graduate Student Member, IEEE) received the M.Sc. degree in electrical engineering from the Université catholique de Louvain (UCLouvain), Louvain-la-Neuve, Belgium, in 2016, where he is currently pursuing the Ph.D. degree in microelectronics on the exploitation of the back-gate biasing capability of FDSOI technology to reduce power, increase circuit speed, ensure good performance, and/or compensate process–voltage–temperature variations.

In this context, his current research focuses on the design of ultralow-power clock generation and frequency synthesis circuits.



Ludovic Moreau (Student Member, IEEE) received the M.Sc. degree in electrical engineering from the Université catholique de Louvain (UCLouvain), Louvain-La-Neuve, Belgium, in 2014.

He is currently a Teaching Assistant at UCLouvain and a Ph.D. student with Prof. D. Bol. His research focuses on ultralow-power digital IC design for biomedical applications. From 2014 to 2018, he was involved in the UCLouvain IEEE Student Branch, especially as chairman (2014–2017), where he helped revive and develop the branch and its activities within the ICTEAM research institute.



**Pengcheng Xu** (Student Member, IEEE) received the bachelor's degree (Hons.) in physics from Shanghai Normal University, Shanghai, China, in 2013, the master's degree (Hons.) in integrated circuit engineering from Tongji University, Shanghai, in 2016, and the Ph.D. degree in engineering sciences from the Université catholique de Louvain (UCLouvain) in 2021.

In 2015, he was an Exchange Student with the University of Erlangen-Nuremberg, Erlangen, Germany. Since 2017, he has been a Research Assistant in electronic engineering with UCLouvain. He was awarded the National Scholarship three times. He has authored or coauthored six technical papers. His research interests include analog and mixed-signal integrated circuit design.

Mr. Xu is a Student Member of the *Solid-State Circuits Magazine*. He was a recipient of the Shanghai Outstanding Graduates Student Award and the Meritorious Winner in the 2013 Mathematical Contest in Modeling of America (MCM). He serves as a Reviewer for various journals and conferences, such as the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS I, IEEE WIRELESS COMMUNICATIONS LETTERS, and IEEE ACCESS.



His current research interests include mixed-signal IC design and ultralow-power smart sensors, focusing on biomedical applications.



**Roghayeh Saeidi** (Member, IEEE) received the M.Sc. degree in electrical and computer engineering from the University of Tehran, Tehran, Iran, in 2007, and the Ph.D. degree from Sharif University of Technology, Tehran, in 2014.

From 2009 to 2015, she was a member of the Advanced Integrated Circuit Design Laboratory, Sharif University of Technology. She was an Assistant Professor at the Iran Telecommunication Research Center (ITRC) from 2015 to 2019. Since 2019, she has been a Post-Doctoral Research Assistant with the Université catholique de Louvain (UCLouvain). Her current research interests include ultralow-power integrated circuit design for biomedical applications, dc–dc converters, low-power SRAM circuits, and analog and mixed-signal integrated circuits. She enjoys working on engineering products that improve the quality of life without discrimination.



**Thomas Haine** (Member, IEEE) received the M.S. degree in electrical engineering from the Université catholique de Louvain (UCL), Louvain-la-Neuve, Belgium, in 2014. He is currently pursuing the Ph.D. degree at UCL.

His interests lie in ultralow-power systems on chips, especially ultralow-power (ULP) SRAMs and ULP CMOS imagers. He is also interested in algorithms for the fast extraction of low failure rates.

**Charlotte Frenkel** (Member, IEEE) received the M.Sc. degree (*summa cum laude*) in electromechanical engineering and the Ph.D. degree in engineering science from the Université catholique de Louvain (UCLouvain), Louvain-la-Neuve, Belgium, in 2015 and 2020, respectively.

In February 2020, she joined the Institute of Neuroinformatics, University of Zurich and ETH Zurich, Zurich, Switzerland, as a Post-Doctoral Researcher. Her current research focuses on low-power high-density spiking neural network processor design and aims at bridging the bottom-up and top-down design approaches toward neuromorphic intelligence.

Ms. Frenkel serves as a Technical Program Committee (TPC) member for the IEEE International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC) Conference and as a Reviewer for various conferences and journals, including the IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS I, IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, IEEE ACCESS, and Frontiers in Neuroscience.



**Denis Flandre** (Senior Member, IEEE) received the M.S. degree in electrical engineering, the Ph.D. degree, and the Research Habilitation from the Université catholique de Louvain (UCL), Louvain-la-Neuve, Belgium, in 1986, 1990, and 1999, respectively. His doctoral research was on the modeling of silicon-on-insulator (SOI) MOS devices for characterization and circuit simulation, and his post-doctoral thesis on a systematic and automated synthesis methodology for MOS analog circuits. Since 2001, he has been a Full-Time Professor at UCL. He is involved in the research and development of SOI MOS devices, digital and analog circuits, as well as sensors, MEMS, and solar cells for special applications, more specifically ultralow-voltage low-power, microwave, biomedical, radiation-hardened, and high-temperature electronics and microsystems. He has authored or coauthored more than 900 technical papers or conference contributions. He is a co-inventor of 12 patents. He has organized or lectured many short courses on SOI technology, devices, and circuits in universities, industrial companies, and conferences.

Dr. Flandre has received several scientific prizes and best paper awards. He has participated in or coordinated numerous research projects funded by regional and European institutions. He has been a member of several EU Networks of Excellence on High-Temperature Electronics, SOI technology, Nanoelectronics, and Micro-nano-technology. He is a co-founder of CISSOID, a spin-off company of UCL focusing on SOI and high-reliability integrated circuit design and products. He is a scientific advisor to two other UCL start-ups: INCIZE (semiconductor characterization and modeling for the design of digital, analog/RF, and harsh-environment applications) and E-peas (energy harvesting and processing solutions for longer battery life and increased robustness in all IoT applications). He is an active member of the SOI Industry Consortium and of the European Silicon-on-Insulator Conference (EUROSOI) network.