# Low-Power Half-Rate Dual-Loop Clock-Recovery System in 28-nm FDSOI

C. Gimeno, Member, IEEE, D. Flandre, Senior Member, IEEE and D. Bol, Member, IEEE

*Abstract*— In this paper, a new dual-loop half-rate clock-recovery system is proposed for chip-to-chip communications. The half-rate topology relaxes the speed requirements of the blocks that constitute the clock-recovery system while achieving the required data rate. The proposed topology is formed by a frequency-locked loop (FLL), active at startup for coarse PVT compensation, and a phase-locked loop (PLL) that takes over after startup to provide phase alignment between the clock and the data. The PLL uses a multi-level bang-bang phase detector for low power consumption. The proposed clock-recovery circuit is designed for a 5-Gb/s data rate in a 28-nm FDSOI CMOS technology with two supply voltages (1 V and 1.8 V). It reaches an average power consumption of 1.43 mW under PLL operation.

*Index Terms*— Chip-to-chip communication, clock recovery circuit, half-rate, low-power.

## I. INTRODUCTION

CLOCK and data recovery (CDR) circuits are crucial in many communication systems, as they recover the clock and retime the incoming data. They are especially critical for high-speed signaling because of the high timing precision required.

Reference-less CDRs are used in many applications. They are widely used in both optical and electrical wired communications [1-3]. More recently, they have also been introduced in wireless communication. For example, in [4], we proposed using a CDR to generate the clock required by a wireless chip-to-chip communication transceiver, as a low-power alternative to wireline links such as PCI Express.

There are many ways to implement a CDR: delay-locked loop (DLL), phase-locked loop (PLL), phase-interpolator-based CDR (PI-CDR), etc. All of them can be implemented in either the analog or the digital domain. Among these, PLL-based CDRs are the most widespread for high-speed reference-less CDRs.

Fig. 1 shows the general block diagram of a PLL-based CDR. It is typically formed by a phase detector (PD) that compares the phase of the input data with that of the signal generated by a voltage-controlled oscillator (VCO). A charge pump (CP) is also necessary to charge or discharge a loop filter (LF), which provides a smooth signal controlling the VCO.

Fig. 1. Block diagram of a PLL-based CDR.

At high data rates, a half-rate CDR is very useful to relax the requirements on the internal blocks while maintaining the high throughput of the system [5-7]. In this case, a half-rate PD (HR-PD) senses the full-rate input data but uses a VCO running at half the input rate, for example by sampling the data on both clock edges. Therefore, the speed requirements of the phase detector and the oscillator are relaxed.
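To illustrate the half-rate principle, the following Python sketch models an ideal half-rate clock (period equal to 2 UI, i.e., 400 ps at 2.5 GHz for 5 Gb/s) whose rising and falling edges each sample every other bit. It is an idealized behavioral illustration under stated assumptions (ideal NRZ data, clock edges centered in the bits); the function name `half_rate_sample` is hypothetical and does not describe the circuit of this work.

```python
def half_rate_sample(bits, ui_ps=200.0, clk_offset_ps=100.0):
    """Return (even_bits, odd_bits) captured on the rising and falling clock edges.

    Assumptions: ideal NRZ data, bit k occupies [k*UI, (k+1)*UI); the
    half-rate clock has period 2*UI (400 ps, i.e. 2.5 GHz, for 5 Gb/s) and
    its rising edge sits clk_offset_ps into the first bit (UI/2 = mid-bit).
    """
    period_ps = 2.0 * ui_ps
    stop_ps = len(bits) * ui_ps
    even, odd = [], []
    t_rise = clk_offset_ps
    while t_rise < stop_ps:
        even.append(bits[int(t_rise // ui_ps)])      # rising edge samples bit 2k
        t_fall = t_rise + period_ps / 2              # falling edge, half a clock period later
        if t_fall < stop_ps:
            odd.append(bits[int(t_fall // ui_ps)])   # falling edge samples bit 2k+1
        t_rise += period_ps
    return even, odd


bits = [1, 0, 1, 1, 0, 0, 1, 0]
even, odd = half_rate_sample(bits)
print(even, odd)  # [1, 1, 0, 1] [0, 1, 0, 0]
```

Each edge handles only one bit out of two, which is why the sampling latches and the oscillator only need to operate at 2.5 GHz; the two outputs form a 1:2 demultiplexed version of the data stream.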

In this paper, we present the design of a half-rate dual-loop FLL+PLL-based clock recovery operating at an input data rate of 5 Gb/s with very low power consumption. The paper is organized as follows. Section II describes the architecture of the proposed clock recovery. In Section III, the design of the building blocks is discussed. Finally, Section IV provides the performance results.

## II. ARCHITECTURE OF THE PROPOSED CLOCK RECOVERY

The role of the clock recovery is to generate a clock that is phase-aligned with the input data.

The phase detector is one of the critical blocks of the clock recovery, as it determines the phase error that has to be compensated by the PLL. A bang-bang (BB), or binary, phase detector is preferred over a linear phase detector because of its simplicity, good phase adjustment, high-speed operation and, most importantly, low power consumption.
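For reference, a minimal sketch of the decision logic of the classic binary (Alexander) phase detector is given below. It is a generic full-rate example, not the multi-level half-rate detector used in this work [9]; the function name and the sign convention (+1 for a late clock) are assumptions for illustration.

```python
def alexander_bbpd(d_prev, edge, d_next):
    """Binary phase decision from three consecutive samples (hypothetical helper).

    d_prev: data sample at the centre of bit n
    edge:   sample taken at the nominal boundary between bits n and n+1
    d_next: data sample at the centre of bit n+1
    Returns +1 if the clock is late, -1 if it is early, 0 if there is no
    data transition (no phase information available).
    """
    if d_prev == d_next:
        return 0  # no transition: the detector stays silent
    # If the edge sample already equals the new bit, the transition happened
    # before the edge-sampling instant, i.e. the clock is late.
    return 1 if edge == d_next else -1


print(alexander_bbpd(0, 1, 1))  # 1  (late)
print(alexander_bbpd(0, 0, 1))  # -1 (early)
print(alexander_bbpd(1, 1, 1))  # 0  (no transition)
```

Only the sign of the phase error is produced, and nothing is output when the data does not toggle, which keeps the logic simple and the switching activity low.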

As previously mentioned, a half-rate clock recovery is used in our design to reduce the speed requirements of its main internal blocks and, therefore, a half-rate phase detector (HR-PD) is needed.

Although a BB-PD in a half-rate clock recovery can compensate any variation in the phase difference between its inputs, [8] demonstrates that the system is not able to lock when the frequency difference between its input signals is large. The reason is as follows. With a large frequency offset, the correction applied by the loop to the phase difference becomes small compared with the phase change caused by the frequency offset itself. The frequency offset then dominates the total change in the phase error, and the VCO frequency remains untuned.
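The effect described in [8] can be visualized with a simple open-loop sketch: because of the frequency offset, the sampling phase drifts by a fixed fraction of a UI per bit, and the sign of the wrapped phase error mimics the BB-PD output. This is an illustrative model under strong assumptions (no loop feedback, a data transition on every bit, a nominal half-rate clock of 2.5 GHz); the helper names are hypothetical.

```python
import math


def bb_decisions(f_clk_hz, f_nominal_hz=2.5e9, n_bits=2000):
    """Open-loop BB-PD sign decisions for a half-rate clock running at f_clk_hz.

    Assumptions (illustrative only): no loop feedback, a data transition on
    every bit, and a sampling phase that starts exactly centred (error = 0).
    """
    drift_ui = f_nominal_hz / f_clk_hz - 1.0       # sampling-instant drift per bit, in UI
    decisions = []
    for k in range(n_bits):
        phi = k * drift_ui
        phi_wrapped = phi - math.floor(phi + 0.5)   # wrap into [-0.5, 0.5) UI
        decisions.append(1 if phi_wrapped >= 0 else -1)
    return decisions


def longest_run(decisions):
    """Length of the longest run of identical consecutive decisions."""
    best = run = 1
    for prev, cur in zip(decisions, decisions[1:]):
        run = run + 1 if cur == prev else 1
        best = max(best, run)
    return best


slow = bb_decisions(2.525e9)  # +1 % frequency error
fast = bb_decisions(3.25e9)   # +30 % frequency error (edge of the PVT range)
print(longest_run(slow), longest_run(fast))
print(sum(fast) / len(fast))  # mean decision, close to zero
```

With a 1% offset, the decisions keep the same sign for roughly 50 consecutive bits, giving the loop time to pull the frequency. With a 30% offset, the sign flips every two or three bits, so the average correction is close to zero and the loop cannot steer the VCO.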

This is an issue here because, when a ring oscillator is used as the VCO for low power and low area in scaled CMOS technologies, PVT variations result in frequency variations of up to $\pm 30\%$. We therefore require a wide locking frequency range (here 1.75-3.25 GHz, i.e., 2.5 GHz $\pm 30\%$, for a 5-Gb/s data rate with a half-rate clock).

This work was supported in part by F.R.S.-F.N.R.S. of Belgium under the crédit de fonctionnement n° 1.B.209.16F and by MINECO-FEDER (TEC2014-52840-R).

C. Gimeno, D. Flandre and D. Bol are with the ICTEAM Institute, Université catholique de Louvain, Place du Levant 3-L5.03.02, 1348 Louvain-la-Neuve, Belgium (e-mail: cecilia.gimenogasca@uclouvain.be).

Fig. 2. Block diagram of the proposed dual-loop clock recovery system.

To address this, we propose to include a frequency-locked loop (FLL) together with the PLL. The FLL operates at the startup of the data transmission and initially locks the frequency. Next, the FLL turns off and the PLL turns on to perform phase alignment while reducing the power consumption. The PLL is then able to compensate the small frequency variations of the VCO due to temperature, supply noise and jitter, as well as phase mismatches between the input data and the VCO.

The complete block diagram of the proposed clock recovery is shown in Fig. 2. As previously mentioned, it is formed by an FLL and a PLL. A reset signal controls the on and off states of both loops. Each loop has its own charge pump, since the two loops require different operating conditions, but the loop filter is shared to save area.

When data transmission starts, the reset signal turns on the FLL so that the VCO can reach the correct frequency range. To avoid locking the system in a false-lock state, the control voltage of the VCO is also reset to 0 every time the reset signal turns on. Once the VCO frequency has reached the target value, the FLL turns off and the PLL starts its operation. The PLL then compensates the small variations that can arise during the transmission of the data frame.
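The startup sequence above can be summarized as a small state machine. The sketch below is a hypothetical behavioral model (class and field names are assumptions) that uses the time-based hand-over of 1 $\mu$s adopted in Section IV rather than an explicit frequency-lock detector.

```python
from dataclasses import dataclass


@dataclass
class LoopController:
    """Hypothetical FLL/PLL hand-over sequencer (times in microseconds)."""
    fll_window_us: float = 1.0     # startup window, assumed from the 1-us setting in Section IV
    t_since_reset_us: float = 0.0
    fll_on: bool = False
    pll_on: bool = False
    v_ctrl_v: float = 0.0          # VCO control voltage (only the reset value is modelled)

    def reset(self):
        """Start of a data frame: FLL on, PLL off, control voltage forced to 0 V."""
        self.t_since_reset_us = 0.0
        self.fll_on, self.pll_on = True, False
        self.v_ctrl_v = 0.0        # avoids starting from a false-lock state

    def tick(self, dt_us):
        """Advance time; hand over from the FLL to the PLL after the window."""
        self.t_since_reset_us += dt_us
        if self.fll_on and self.t_since_reset_us >= self.fll_window_us:
            self.fll_on, self.pll_on = False, True


ctrl = LoopController()
ctrl.reset()
ctrl.tick(0.5)
print(ctrl.fll_on, ctrl.pll_on)   # True False
ctrl.tick(0.5)
print(ctrl.fll_on, ctrl.pll_on)   # False True
```

A fixed window keeps the control logic trivial; replacing it with a frequency-lock detector would be an alternative design choice not covered here.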

Because the system operates at half rate, several clock phases are used in both the FLL and the PLL. A ring oscillator is therefore preferred, as it directly generates multiple clock phases.

## III. CIRCUIT DESIGN

This section focuses on the circuit implementation of the clock-recovery building blocks. Simulations have been performed in SPICE at schematic level in a 28-nm FDSOI CMOS technology. As shown in Fig. 2, the clock recovery is formed by a frequency detector (FD), a bang-bang phase detector (BB-PD), two charge pumps (CP1 and CP2), a loop filter (LF) and a voltage-controlled oscillator (VCO), mostly operated at the core supply voltage of 1 V. The LF is based on the conventional lead-lag low-pass filter [5].

Fig. 3. Frequency detector implemented with a linear half-rate phase detector.

Fig. 4. Operation principle of (a) the frequency detector, and (b) the multi-level bang-bang half-rate phase detector [9].
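As a rough aid for sizing such a filter, the sketch below computes the zero and pole of a lead-lag network. The topology (a resistor R in series with C1, shunted by a ripple capacitor C2) is an assumption, since only the filter type is stated here, and the component values in the example are hypothetical.

```python
import math


def lead_lag_corners(r_ohm, c1_f, c2_f):
    """Zero and pole frequencies (Hz) of R in series with C1, shunted by C2.

    Z(s) = (1 + s*R*C1) / (s*(C1 + C2)*(1 + s*R*Cs)),  Cs = C1*C2/(C1 + C2)
    """
    f_zero = 1.0 / (2 * math.pi * r_ohm * c1_f)
    c_series = c1_f * c2_f / (c1_f + c2_f)
    f_pole = 1.0 / (2 * math.pi * r_ohm * c_series)
    return f_zero, f_pole


def impedance(r_ohm, c1_f, c2_f, f_hz):
    """Complex impedance of the same network at frequency f_hz."""
    s = 2j * math.pi * f_hz
    z_branch = r_ohm + 1 / (s * c1_f)   # R in series with C1
    z_c2 = 1 / (s * c2_f)               # ripple capacitor C2
    return z_branch * z_c2 / (z_branch + z_c2)


# Hypothetical values: R = 1 kOhm, C1 = 10 pF, C2 = 1 pF
fz, fp = lead_lag_corners(1e3, 10e-12, 1e-12)
print(f"zero = {fz / 1e6:.1f} MHz, pole = {fp / 1e6:.1f} MHz")
```

The zero provides the phase lead that stabilizes the loop, while the pole introduced by C2 attenuates the ripple caused by the charge-pump current pulses; keeping C2 much smaller than C1 places the pole well above the zero.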

### A. Frequency detector

To implement the frequency detector, we use the circuit shown in Fig. 3 [5]. It consists of four latches, each of which tracks its input for half a clock period and holds it for the other half, and two XOR gates. The input data is applied to the two cascaded latches that are clocked by the half-rate clock. Fig. 4 (a) shows the operation principle of the FD.

If there is no phase difference between the clock and the data, the error signal has a width equal to 1/4 of the clock period, whereas the reference signal width is 1/2 of the clock period. This disparity is removed by halving the current source driven by the reference output in the charge pump, which scales down its effect by a factor of two.
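The need for the 2:1 current ratio can be checked with an idealized charge balance per data transition, assuming the error pulse widens linearly with the clock-to-data offset and using the CP1 currents reported in Section IV (1 mA PMOS, 0.5 mA NMOS) with a 2.5-GHz (400-ps) clock. The function name and the linear pulse-width model are assumptions.

```python
def fd_net_charge(delta_t_s, t_clk_s=400e-12, i_p_a=1e-3, i_n_a=0.5e-3):
    """Net CP1 charge (C) per data transition in an idealised linear FD model.

    The error (Late) pulse drives the PMOS source (i_p_a) and the reference
    (Early) pulse drives the NMOS source (i_n_a). At lock the error pulse is
    T/4 wide and the reference pulse T/2; a clock-to-data offset delta_t_s is
    assumed to change only the error-pulse width. Positive = charge into the LF.
    """
    t_err = t_clk_s / 4 + delta_t_s
    t_ref = t_clk_s / 2
    return i_p_a * t_err - i_n_a * t_ref


print(fd_net_charge(0.0))               # 0 at lock with the 2:1 current ratio
print(fd_net_charge(0.0, i_n_a=1e-3))   # about -1e-13 C with equal currents
print(fd_net_charge(10e-12))            # about +1e-14 C for a 10-ps offset
```

With the 2:1 ratio, the net charge is zero at lock and grows linearly with the offset, whereas equal currents would leave a residual negative charge at lock that biases the control voltage.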

It therefore operates as a linear phase detector but tolerates a large frequency offset, which allows it to lock the frequency over a wide range of frequency offsets (1.75-3.25 GHz).

Fig. 5. Charge pumps schematic design: (a) charge pump for the FLL (CP1) and (b) charge pump for the PLL (CP2).

### B. Phase detector

To implement the PD, we propose to use a multi-level half-rate phase detector (ML-HR-PD) [9]. Because it provides information about both the sign and the magnitude of the phase difference between its inputs, this finer control yields lower jitter and bit error rate (BER) when the detector is included in a PLL-type CDR.

Fig. 4 (b) shows its operation principle. It provides four quantization levels: the sign of the phase shift (positive or negative) and its magnitude, quantized into two levels. Five different clock phases are needed for the correct operation of the ML-HR-PD, and they can be generated with a four-stage differential ring oscillator.
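The quantization described above can be sketched as follows, together with the corresponding CP2 current (one or two identical unit sources switched on). The magnitude threshold, the unit current, the sign convention and the function names are hypothetical, chosen only for illustration; the actual thresholds are defined by the circuit of [9].

```python
def ml_hr_pd_level(phase_err_ui, transition=True, threshold_ui=0.25):
    """Four-level quantisation of the phase error (in UI), hypothetical model.

    Returns 0 when there is no data transition, otherwise +/-1 (small error)
    or +/-2 (large error). threshold_ui is an assumed magnitude boundary.
    """
    if not transition:
        return 0
    sign = 1 if phase_err_ui >= 0 else -1
    magnitude = 2 if abs(phase_err_ui) >= threshold_ui else 1
    return sign * magnitude


def cp2_current(level, i_unit_a=100e-6):
    """CP2 output current: |level| identical unit sources are switched on
    (PMOS sources for positive levels, NMOS sources for negative ones).
    i_unit_a is a hypothetical unit current."""
    return level * i_unit_a


for err in (-0.4, -0.1, 0.1, 0.4):
    lvl = ml_hr_pd_level(err)
    print(err, lvl, cp2_current(lvl))
```

Compared with a binary detector, large phase errors inject twice the charge, which speeds up the correction, while small errors inject only one unit and limit the dithering around lock.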

### C. Charge pump

Two charge pumps are used, one for the FLL and another for the PLL. Fig. 5 shows their circuit-level implementations. They are formed by simple CMOS switched current sources that inject positive and negative current pulses into the loop filter. Inverters and transmission gates are added at the inputs to match the delays of the switching paths. In the first charge pump (CP1), the current generated by the PMOS source is twice the current generated by the NMOS source to compensate for the disparity between the error (Late) and the reference (Early) signals of the FD. In the second charge pump (CP2), two identical NMOS sources and two identical PMOS sources are used to generate the four quantization levels.

The 1.8-V I/O supply is used for the charge pumps to have a wide voltage range for the control node of the VCO. Hence the charge pumps are implemented with low-VT I/O MOSFETs and level shifters are used at their inputs.

### D. Voltage-controlled oscillator

LC-based oscillators can offer good jitter performance, but ring oscillators (ROs) achieve lower power, smaller area and shorter settling time. Furthermore, ROs can readily generate the multiple clock phases needed by the multi-level HR-PD. We thus use an RO with four differential stages as the VCO.
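For the four-stage differential ring used here, the required stage delay and the available clock phases follow from the standard ring-oscillator relation $f_{osc} = 1/(2 N t_d)$. The helper below is a hypothetical sketch of that arithmetic.

```python
def ring_oscillator_timing(f_osc_hz, n_stages):
    """Stage delay (s), number of phases and phase spacing (deg) of an
    N-stage differential ring oscillator, using f = 1 / (2 * N * t_d)."""
    t_d = 1.0 / (2 * n_stages * f_osc_hz)
    n_phases = 2 * n_stages              # N stages x 2 differential outputs
    phase_step_deg = 360.0 / n_phases
    return t_d, n_phases, phase_step_deg


t_d, n_ph, step = ring_oscillator_timing(2.5e9, 4)
print(f"t_d = {t_d * 1e12:.0f} ps, {n_ph} phases, {step:.0f} deg apart")
# t_d = 50 ps, 8 phases, 45 deg apart
```

At 2.5 GHz, each stage must therefore provide about 50 ps of delay (a quarter of the 200-ps UI), and the eight available phases are spaced by 45°, from which the five phases required by the ML-HR-PD can be taken.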

Fig. 6 shows the delay stage used to implement the RO. The stage delay is controlled by the back bias applied to the transistors in FDSOI, resulting in a back-bias-controlled ring oscillator (BBRO) [10]. Indeed, a forward back bias (FBB) between 0 and 1.8 V applied to the NMOS transistors modifies their threshold voltage and thus the stage delay and the oscillation frequency.

Fig. 6. Schematic of the proposed delay stage in the back-bias-controlled ring oscillator.

Fig. 7. Simulated oscillator frequency tuning range with FBB to compensate for PVT variations for different temperatures.

The BBRO consumes 460 $\mu$W when operating at 2.5 GHz and features a phase noise of -100.4 dBc/Hz at a 30-MHz offset.

The impact of FBB on the oscillation frequency of the BBRO is shown in Fig. 7 for the typical corner. A wide frequency range from 1.74 to 3.63 GHz can be achieved. The tuning sensitivity of the oscillator is 1.5 GHz/V. Fig. 7 also shows that operation at 2.5 GHz remains possible for temperatures down to -40 °C and up to 80 °C.

Corner simulations have also been performed for the BBRO, as it is the most critical block with respect to process variations. Fig. 8 shows the frequency tuning range at the SS (slow PMOS, slow NMOS) and FF (fast PMOS, fast NMOS) corners. The resulting deviations are small enough to maintain 2.5-GHz operation.

## IV. SYSTEM RESULTS

The proposed dual-loop clock recovery system has been simulated in Verilog-AMS with behavioral models that include the specific non-idealities of each block. Schematic-level blocks implemented in a 28-nm FDSOI CMOS technology have been used for power and noise estimation. The proposed topology requires two supply voltages: 1.8 V for the charge pumps and 1 V for the rest of the circuitry.

The clock recovery system consumes an average of 3.2 mW and 1.43 mW in FLL and PLL modes, respectively. The power saving comes mainly from the lower consumption of the PLL charge pump compared with the FLL charge pump, thanks to the lower activity of the UP/DOWN signals generated by the BB-PD.
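The relative saving between the two modes, and the energy per bit in PLL mode, follow directly from the reported averages. The short calculation below is only an arithmetic check (the function name is hypothetical); the energy-per-bit figure is derived here and is not reported above.

```python
def power_summary(p_fll_mw=3.2, p_pll_mw=1.43, bit_rate_bps=5e9):
    """Relative power saving of PLL mode and PLL-mode energy per bit (pJ/b)."""
    saving = 1.0 - p_pll_mw / p_fll_mw
    energy_pj_per_bit = (p_pll_mw * 1e-3) / bit_rate_bps * 1e12
    return saving, energy_pj_per_bit


saving, e_bit = power_summary()
print(f"saving = {saving:.1%}, energy = {e_bit:.3f} pJ/b")
# saving = 55.3%, energy = 0.286 pJ/b
```

The 55% saving matches the figure given in the conclusion, and the PLL-mode consumption corresponds to roughly 0.29 pJ/b at 5 Gb/s.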

To perform a functional validation of the FLL, Fig. 9 shows the frequency locking for the two extreme initial frequencies (1.75 GHz and 3.25 GHz): the final maximum peak-to-peak frequency error is below 3% after less than 0.4 $\mu$s.

Fig. 8. Simulated oscillator frequency tuning range with FBB to compensate for SS and FF corners.

Fig. 9. FLL operation for two extreme initial frequencies: 1.75 and 3.25 GHz.

When the FLL is on, the cycle-to-cycle jitter of the recovered clock is 6.5 ps rms for a 5-Gb/s input data rate.

A maximum FLL locking time of 0.5 $\mu$s is obtained when currents of 1 mA and 0.5 mA are used for the PMOS and NMOS current sources of the FLL charge pump (CP1), respectively. After this startup time (which we set to 1 $\mu$s to include a safety margin), the FLL can be turned off and the PLL turned on to reduce the power consumption.

Fig. 10 shows the complete transient response of both the frequency and phase loops, together with the frequency of the BBRO. The FLL is on during the first 1 $\mu$s to reach the required control voltage for the VCO. At that moment, the PLL is turned on and the FLL turned off. An instantaneous frequency deviation of 15 MHz is added every 1 $\mu$s to reproduce the small variations that can occur once the FLL has locked. The PLL compensates these variations almost instantaneously, as shown in the zoomed view of Fig. 10.
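To put the 15-MHz steps in perspective, the sketch below estimates the control-voltage correction the PLL must apply, assuming the 1.5-GHz/V tuning sensitivity holds around 2.5 GHz, and how quickly an uncorrected step would shift the sampling point by half a UI (100 ps). Loop dynamics are ignored, and the function name is hypothetical.

```python
def pll_step_budget(delta_f_hz=15e6, kvco_hz_per_v=1.5e9, f0_hz=2.5e9, ui_s=200e-12):
    """Control-voltage step for a frequency step, and cycles to drift half a UI.

    Assumes a linear VCO gain around f0 and ignores the loop dynamics.
    """
    dv_v = delta_f_hz / kvco_hz_per_v
    drift_per_cycle_s = 1.0 / f0_hz - 1.0 / (f0_hz + delta_f_hz)  # edge shift per clock cycle
    cycles_to_half_ui = 0.5 * ui_s / drift_per_cycle_s
    return dv_v, drift_per_cycle_s, cycles_to_half_ui


dv, drift, n_cyc = pll_step_budget()
print(f"dV = {dv * 1e3:.1f} mV, drift = {drift * 1e12:.2f} ps/cycle, "
      f"half UI after {n_cyc:.0f} cycles")
```

A correction of about 10 mV on the control node suffices, but an uncorrected offset would move the sampling instant by half a UI in roughly 42 clock cycles (about 17 ns), which is why the PLL must respond quickly.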

When the PLL is working and the FLL is off, the cycle-to-cycle jitter of the recovered clock is 6.2 ps rms for a 5-Gb/s input data rate, which is within the specifications for data transmission at that rate.

Fig. 10. PLL operation for small frequency variations.

## V. CONCLUSION

In this paper, a new dual-loop half-rate clock recovery system is proposed. It works at 5 Gb/s and was designed in a 28-nm FDSOI CMOS technology.

It is based on an FLL, which requires less than 1 $\mu$s to provide the coarse tuning of the clock frequency and mitigate large PVT variations, and on a multi-level half-rate PLL. The PLL turns on after 1 $\mu$s, when the FLL turns off, which reduces the power consumption by 55%. A cycle-to-cycle jitter of only 6.2 ps rms is obtained for the generated clock.

## REFERENCES

- [1] F.-T. Chen *et al.*, "A 10-Gb/s low jitter single-loop clock and data recovery circuit with rotational phase frequency detector," *IEEE Transactions on Circuits and Systems I: Regular Papers*, vol. 61, no. 11, pp. 3278–3287, 2014.
- [2] E. Guerrero, C. Sánchez-Azqueta, C. Gimeno, J. Aguirre, and S. Celma, "An adaptive bitrate clock and data recovery circuit for communication signal analyzers," *IEEE Transactions on Instrumentation and Measurement*, vol. 66, no. 1, pp. 191–193, 2017.
- [3] H. Ju, W. Bae, G.-S. Jeong, and D.-K. Jeong, "A 800-Mb/s 0.89-pJ/b reference-less optical receiver with pulse-position modulation scheme," *IEEE ISCAS 2016*, pp. 2346–2349, 2016.
- [4] C. Gimeno, D. Flandre, and D. Bol, "Analysis and specification of an IR-UWB transceiver for high-speed chip-to-chip communication in a server chassis," *IEEE Transactions on Circuits and Systems I: Regular Papers*, DOI: 10.1109/TCSI.2017.2765312.
- [5] B. Razavi, "Challenges in the design of high-speed clock and data recovery circuits," *IEEE Communications Magazine*, vol. 40, no. 8, pp. 94–101, 2002.
- [6] M. Ramezani and C. A. T. Salama, "Analysis of a half-rate bang-bang phase-locked-loop," *IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing*, vol. 49, no. 7, pp. 505–509, 2002.
- [7] G. Shu et al., "A reference-less clock and data recovery circuit using phase-rotating phase-locked loop," *IEEE Journal of Solid-State Circuits*, vol. 49, no. 4, pp. 1036–1047, 2014.
- [8] M. S. Jalali, A. Sheikholeslami, M. Kibune, and H. Tamura, "A reference-less single-loop half-rate binary CDR," *IEEE Journal of Solid-State Circuits*, vol. 50, no. 9, pp. 2037–2047, 2015.
- [9] C. Gimeno, D. Bol, and D. Flandre, "5-Gb/s input-data multi-level half-rate phase detector," *IEEE Transactions on VLSI*, under review.
- [10] G. de Streel, F. Stas, T. Gurné, F. Durant, C. Frenkel, A. Cathelin, and D. Bol, "SleepTalker: A ULV 802.15.4a IR-UWB Transmitter SoC in 28-nm FDSOI Achieving 14 pJ/b at 27 Mb/s With Channel Selection Based on Adaptive FBB and Digitally Programmable Pulse Shaping," *IEEE Journal of Solid-State Circuits*, vol. 52, no. 4, pp. 1163–1177, 2017.