
This article has been accepted for inclusion in a future issue of this journal.

Content is final as presented, with the exception of pagination.

IEEE JOURNAL OF SOLID-STATE CIRCUITS 1

A 6.5–12.5-Gb/s Half-Rate Single-Loop All-Digital Referenceless CDR in 28-nm CMOS
Changzhi Yu, Member, IEEE, Euije Sa, Student Member, IEEE, Soowan Jin, Student Member, IEEE, Himchan Park, Member, IEEE, Jongshin Shin, Member, IEEE, and Jinwook Burm, Senior Member, IEEE

Abstract— This article presents a novel method for frequency tracking based on an extended bang-bang phase detector (XBBPD) in a referenceless clock and data recovery (CDR) circuit. The XBBPD-based structure has a frequency tracking range that completely covers the tuning range of the digitally controlled oscillator (DCO) with a fast locking feature. To minimize the loop delay and thereby improve the jitter tolerance, the CDR design includes an additional proportional path that is realized by directly controlling the phase of the oscillator with the output signal of the phase detector. The design is all-digital, including digital filters that simplify the design. The CDR occupies an active area of 0.031 mm², implemented in a 28-nm CMOS process. The receiver operates up to 12.5 Gb/s. The frequency locking time, measured as the time required for every 1-Gb/s change in the input data, is 320 ns. The power consumption is only 21.13 mW, corresponding to an energy efficiency of 2.11 pJ/bit.

Index Terms— Digital loop filter, digitally controlled oscillator (DCO), extended bang-bang phase detector (XBBPD), half-rate sampling, high-speed integrated circuits, referenceless clock and data recovery (CDR).

I. INTRODUCTION

WITH the development of serial communication technology in wired-line and optical communications, clock and data recovery (CDR) has become an increasingly critical module in receivers, and the ongoing diversification of applications has stimulated focus on the versatility necessary to meet various specifications. In this context, CDRs must advance beyond the current operational requirement for pre-defined data rates to enable automatic adaptation to any input data rate [1]–[12]. In some repeater applications, the number of pins in a chip is strictly limited, and CDRs that use frequency references cannot be applied readily to these applications because of the excessive pin overhead of such CDRs [3]. In addition, in the case of an input signal with a low signal-to-noise ratio (SNR), the use of a reference clock introduces additional noise and reduces cost-effectiveness [4], [5]. The shortcomings of CDRs that require a reference frequency can be improved with referenceless CDRs.

Studies on referenceless CDRs aim to achieve high energy efficiency, fast locking, wide frequency capture ranges, and, in particular, simplified circuit structures. Among these aims, the conflicting requirements of expanding the frequency locking range and reducing the lock time have not been satisfied simultaneously. In [6], a wide frequency acquisition range is achieved with stochastic subharmonic frequency extraction. However, the low reference frequency that is generated leads to a long acquisition time. In [7], a dual bang-bang phase detector (BBPD)-based phase frequency detector (PFD) is used to achieve a large frequency capture range. However, the tasks of generating and buffering eight clocks with evenly distributed phases increase the power consumption and the area overhead. Another frequency acquisition scheme [8] scans the digitally controlled oscillator (DCO) frequency from lowest to highest until frequency acquisition is achieved, but setting the initial frequency of the DCO to its lowest value to prevent the harmonic lock causes a long frequency acquisition time.

To address the limitations of the above schemes, this article presents a new type of referenceless frequency detection and tracking scheme that maximizes the frequency pull-in and lock range with a reduced locking time and fewer sampling clock phases. The proposed scheme is also insensitive to the encoding form of the input data and guarantees stable operation even with sudden changes in input data rates or standards. The proposed receiver can be applied to video quality conversion by channel switching. For example, when a 4k video operating at 11.88 Gb/s is switched to ultra high definition (UHD) video operating at 5.94 Gb/s, the proposed referenceless CDR continues to operate after re-locking frequency and phase in a very short time.

The remainder of this article is organized as follows. Section II presents the feasibility analysis of the proposed scheme for improving the frequency detection characteristics of the existing BBPD, as well as a frequency detection performance analysis of the extended BBPD (XBBPD). Section III details the construction of the receiver and CDR and the implementation of the building blocks. Section IV describes the experimental setup and shows the measurement results from the fabricated receiver. Section V presents the conclusions.

Manuscript received September 6, 2019; revised January 5, 2020 and March 19, 2020; accepted May 13, 2020. This article was approved by Guest Editor Daniel Friedman. This work was supported in part by the Brain Korea 21 Plus Project; in part by Grant NRF-2018R1D1A1B07049663, Grant IITP-2020-2018-0-01421, and Grant 10080622. The EDA tool and MPW were supported by the IC Design Education Center (IDEC), Daejeon, South Korea. (Corresponding author: Jinwook Burm.)
Changzhi Yu, Himchan Park, and Jinwook Burm are with the Department of Electronic Engineering, Sogang University, Seoul 04107, South Korea (e-mail: burm@sogang.ac.kr).
Euije Sa is with SK Hynix, Icheon 17336, South Korea.
Soowan Jin is with LG Electronics, Seoul 06772, South Korea.
Jongshin Shin is with Foundry Division, Samsung Electronics Company, Ltd, Hwaseong 18448, South Korea.
Color versions of one or more of the figures in this article are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/JSSC.2020.3005750
0018-9200 © 2020 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See https://www.ieee.org/publications/rights/index.html for more information.

Authorized licensed use limited to: City, University of London. Downloaded on July 10,2020 at 01:54:15 UTC from IEEE Xplore. Restrictions apply.

II. FREQUENCY DETECTION

For the proposed design, an improved scheme is introduced to extend the frequency detection range of the existing BBPD. Then, the frequency detection function and the performance of the XBBPD are analyzed.

The conventional BBPD and its timing diagram in lock state are shown in Fig. 1 [13]. A CDR loop with a BBPD can pull the clock frequency into the target frequency only when the frequency deviation between the input data ($D_{IN}$) and the clock ($CK_{0/180}$) is small, typically within several thousands of ppm depending on the input jitter and the loop bandwidth [9]. For a large frequency deviation, a converging solution cannot be found due to random occurrences of phase up (PUP)/phase down (PDN) signals. For example, as shown in Fig. 2(a), when the recovered clock phase leads the input data, PDN is generated to decrease the oscillator frequency even if the recovered frequency is lower than the input data rate (i.e., $UI_{DATA} < T_{CK}$, where $UI_{DATA}$ is the unit interval of the input data and $T_{CK}$ is the period of the clock). This makes the clock frequency deviate away from the desired value. Similarly, another detrimental result can occur when a PUP signal results in a frequency increase even when the recovered frequency is higher than the input data rate ($UI_{DATA} > T_{CK}$), as shown in Fig. 2(b).

Fig. 1. (a) Block diagram of the BBPD and (b) its timing diagram in lock state.

Fig. 2. Operation principle of the BBPD. $D_0$, $D_{180}$, and $D_{360}$ are the data sampled by $CK_0$, $CK_{180}$, and $CK_{360}$, respectively. (a) Output error 1: clock phase leads, but recovered frequency is slow ($UI_{DATA} < T_{CK}$). (b) Output error 2: clock phase lags, but recovered frequency is high ($UI_{DATA} > T_{CK}$).

Therefore, we now extend the basic BBPD structure to propose a new phase and frequency detector designed for CDRs. The proposed XBBPD has more sampling points than the conventional BBPD and can output frequency and phase error signals simultaneously. The XBBPD's frequency pull-in range is from approximately 0% to 175% of the input data rate. Therefore, the lowest clock frequency that can be captured with the XBBPD is close to 0, although this minimum is often limited by the minimum oscillation frequency of the DCO. The highest trackable clock frequency is 1.75× the input data rate. The XBBPD's two-way lock capability enables a stable frequency lock even when the input data jitter is large.

A. XBBPD for Extended Frequency Detection

For a random input binary sequence, the UI is defined by the time it takes to transmit one bit. Because the input signal contains alternating data such as 010 (or 101), we can correct the DCO frequency by comparing the input data UI ($UI_{DATA}$) of the middle value with the period of the sampling clock ($T_{CK}$). The proposed frequency detection scheme depends on single isolated "1" or "0" pulses of the input data.

The XBBPD simultaneously outputs the frequency-up/frequency-down (FUP/FDN) correction signal and the phase correction signal (PUP/PDN). Fig. 3 shows a random binary bit sequence containing alternating data inputs such as 010 (or 101). If the recovered clock frequency is lower than the input data rate ($F_{CK} < R_{DATA}$ or $UI_{DATA} < T_{CK}$, where $R_{DATA}$ is the input data rate), and if the rising edge of $CK_{180}$ is located at the center of the input data interval, then the sampling results ($D_0$ and $D_{360}$ in Fig. 3) of the two consecutive rising edges at $CK_0$ and $CK_{360}$ are the same as each other and different from the sampling result $D_{180}$ at $CK_{180}$, as shown in Fig. 3(a). The three clock sampling points at $CK_0$, $CK_{180}$, and $CK_{360}$ produce the 010 (or 101) signal for the corresponding 010 (or 101) input data sequence [Fig. 3(a)]. When the frequency of the recovered clock is determined to be lower than the input data rate ($F_{CK} < R_{DATA}$), the BBPD will output an FUP signal (FUP if $D_0 D_{180} D_{360}$ = 010 or 101) to increase the clock frequency. In contrast, if the recovered clock frequency is higher than the input data rate ($F_{CK} > R_{DATA}$ or $UI_{DATA} > T_{CK}$), and if the rising edge of $CK_{180}$ is located at the center of the input data interval, the sampling results of the three clock phases will be the same (111 or 000), as shown in Fig. 3(b). As a result, the BBPD produces an FDN signal (FDN if $D_0 D_{180} D_{360}$ = 111 or 000) to decrease the recovered clock frequency.

Fig. 3. Timing diagram of frequency detection when (a) the input data rate is higher than the recovered clock frequency, giving 010 sampling results for a 010 data input sequence, and (b) the input data rate is lower than the recovered clock frequency, giving 111 sampling results for a 010 data input sequence.

B. FUP Generation

Applying the approach described in Section II.A to generate an FUP signal, we observed an interesting phenomenon. The results of three consecutive clock phase samplings at 0°, 180°, and 360° were completely random when the recovered clock frequency was much lower than the input data rate.
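Before this random regime is quantified, the three-sample decision rule of Section II.A can be written as a short sketch. It is an illustration only: the function name `classify` and the pattern enumeration are assumptions made here, not part of the article. Enumerating all eight sample patterns also shows which of them map to FUP when the three samples are effectively random and equally likely.

```python
from itertools import product

def classify(d0, d180, d360):
    """Three-sample frequency decision of Section II.A (illustrative helper)."""
    if d0 != d180 and d180 != d360:   # 010 or 101: clock too slow
        return "FUP"
    if d0 == d180 == d360:            # 111 or 000: clock too fast (basic rule only)
        return "FDN"
    return None                       # 001, 011, 100, 110: no frequency decision

# All eight equally likely patterns for fully random samples.
patterns = list(product((0, 1), repeat=3))
fup_patterns = [p for p in patterns if classify(*p) == "FUP"]
fup_fraction = len(fup_patterns) / len(patterns)

print(fup_patterns)   # the two alternating patterns
print(fup_fraction)   # share of random patterns that raise FUP
```

Only the two alternating patterns raise FUP, so a fully random sampling still produces FUP for one quarter of the patterns.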


Fig. 4. Timing diagram of FUP generation. (a) and (b) When $UI_{DATA} < T_{CK}$. (c) and (d) When $UI_{DATA} > T_{CK}$.

Fig. 5. Number of FUP as a function of relative frequency. A random phase alignment between data and clock is assumed.

Fig. 6. Timing diagram of (a) FDN error due to random sampling, (b) FDN error due to CIDs in low frequency, (c) FDN error due to CIDs when $F_{CK} = R_{DATA}$, and (d) FDN error correction with the additional samples of $D_{-180}$ and $D_{540}$.

Fig. 7. (a) Timing diagram of FDN generation when $F_{CK} > R_{DATA}$ and (b) no FDN generated as the input data are CIDs when $F_{CK} < R_{DATA}$ as expected.

In this regime ($UI_{DATA} \ll T_{CK}$, or equivalently $R_{DATA} \gg F_{CK}$), the probability of FUP ($D_0 D_{180} D_{360}$ = 010 or 101) occurring at any time is $2 \times (1/2)^3$, assuming a completely random data sequence, as shown in Fig. 4(a) and (b). Therefore, the BBPD can still output the FUP signal, which causes the lower limit of the frequency capture range to extend to near dc ($F_{CK} \sim 0$).

When $F_{CK}$ is close to $R_{DATA}$, but still $F_{CK} < R_{DATA}$, the FUP signal may not be produced for every input data sequence of 010 (or 101). In addition, FUP signals corresponding to 010 or 101 sampling results may not occur when the rising edge of $CK_{180}$ is no longer at the center of a data period and/or when the input data sequences are different from 010 (or 101) (e.g., 0110).

When $F_{CK}$ is much smaller than $R_{DATA}$, as the frequency of the recovered clock increases, the number of FUP signal occurrences due to random sampling continues to increase. Assuming that both the input data and the recovered clock are random in phase, the number of FUP signal occurrences ($N_{FUP}$) with the change in the clock frequency from dc to 2× the input data rate is shown in Fig. 5. In Fig. 5, the statistical analysis results of the normalized $N_{FUP}$ with relative frequency ($F_{CK}/R_{DATA}$) indicate that the above theory is feasible for generating FUP signals. This increase in FUP signal occurrences continues until the clock frequency reaches half of the input data rate (region 1 in Fig. 5). As the frequency of the recovered clock continues to increase, that is, as the frequency deviation between the recovered clock and the input data rate is continuously reduced, $N_{FUP}$ is continuously reduced (region 2 in Fig. 5).

When the frequency of the recovered clock is higher than the input data rate, the sampling result will be neither 010 nor 101, and $N_{FUP}$ will be reduced to 0 because at least two clock phases then fall within the same UI of the input data, as shown in Fig. 4(c) and (d). Thus, the equation for generating an FUP signal is

$$\mathrm{FUP} = (D_0 \oplus D_{180}) \cdot (D_{180} \oplus D_{360}) \tag{1}$$

where $\oplus$ represents the XOR operation and $\cdot$ represents the AND operation.

C. FDN Generation

When the sampled data $D_0 D_{180} D_{360}$ = 111 or 000 are used as shown in Fig. 3(b) to generate an FDN signal, there are two reasons for undesirable FDN generation: 1) a random occurrence of data sampling due to a large difference between $R_{DATA}$ and $F_{CK}$, as shown in Fig. 6(a), and 2) data sampling of consecutive identical digits (CIDs), which produces the results of 111 (or 000) regardless of the recovered clock frequency, as shown in Fig. 6(b) and (c).

Since a 111 (or 000) sample is not in itself sufficient to determine an FDN condition, two more sampling results, $D_{-180}$ and $D_{540}$, are introduced to prevent erroneous FDN generation, as shown in Fig. 6(d). Thus, five sampling results are used for FDN generation: $D_{-180}$, $D_0$, $D_{180}$, $D_{360}$, and $D_{540}$.
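The statistics discussed in Sections II.B and II.C can be illustrated with a small Monte Carlo sketch. It is not the authors' simulation: it assumes an ideal, jitter-free random NRZ stream with UI = 1, a uniformly random clock phase for every observed cycle, and a hypothetical helper `detection_fractions`. It counts how often (1) raises FUP and how often the plain three-sample 111/000 test would raise FDN for a given $F_{CK}/R_{DATA}$.

```python
import random

def fup(d0, d180, d360):
    """Equation (1): FUP = (D0 xor D180) and (D180 xor D360)."""
    return (d0 ^ d180) & (d180 ^ d360)

def naive_fdn(d0, d180, d360):
    """Three-sample FDN candidate: all samples equal (111 or 000)."""
    return int(d0 == d180 == d360)

def detection_fractions(ratio, n_cycles=20000, seed=1):
    """Fractions of clock cycles raising FUP and naive FDN at F_CK/R_DATA = ratio.

    Assumptions of this sketch: UI = 1, ideal random data, random clock phase.
    """
    rng = random.Random(seed)
    t_ck = 1.0 / ratio                      # clock period in UI
    n_fup = n_fdn = 0
    for _ in range(n_cycles):
        t0 = rng.random()                   # position of CK0 inside the first bit
        bits = [rng.getrandbits(1) for _ in range(int(t0 + t_ck) + 1)]
        # Samples at CK0, CK180, CK360, spaced by half a clock period.
        d0, d180, d360 = (bits[int(t0 + k * t_ck / 2)] for k in range(3))
        n_fup += fup(d0, d180, d360)
        n_fdn += naive_fdn(d0, d180, d360)
    return n_fup / n_cycles, n_fdn / n_cycles

for ratio in (0.05, 0.5, 0.9, 1.5):
    print(ratio, detection_fractions(ratio))
```

Under these assumptions the FUP fraction is close to $2 \times (1/2)^3$ for a very slow clock and exactly zero once $F_{CK} > R_{DATA}$, while the three-sample FDN test still fires frequently for $F_{CK} < R_{DATA}$, which is the motivation for the additional samples.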


Fig. 8. Number of FDN occurrences with the fifth sampling result taken as $D_{540}$ (red) and $D_{450}$ (green). The FDN signals use $D_{-180} D_0 D_{180} D_{360} D_{540}$ for the red line and $D_{-180} D_0 D_{180} D_{360} D_{450}$ for the green line.

Fig. 9. Occurrence of FUP ($N_{FUP}$) and FDN ($N_{FDN}$) versus frequency ratio from 0 to 2 with different TDs of input data.

These five samples are taken at the clock phases of −180°, 0°, 180°, 360°, and 540°, respectively. As shown in Fig. 7(a), when $F_{CK} > R_{DATA}$ (an FDN condition) and the rising edge of $CK_{180}$ is just in the middle of three alternating data values (010 or 101), the middle three sampling results ($D_0$, $D_{180}$, and $D_{360}$) are identical and differ from the sampling results on either side ($D_{-180}$ and $D_{540}$). The sampled sequence $D_{-180} D_0 D_{180} D_{360} D_{540}$ is either 01110 or 10001 for FDN signals. Therefore, the equation for generating the FDN signal is

$$\mathrm{FDN} = (D_{-180} \oplus D_0) \cdot (D_0 \odot D_{180}) \cdot (D_{180} \odot D_{360}) \cdot (D_{360} \oplus D_{540}) \tag{2}$$

where $\odot$ represents the XNOR operation.

To further check the FDN operation, we performed FDN sampling analysis on a random sequence with a changing input data rate. From a simulation that used (2), the FDN generation over the frequency ratio ($F_{CK}/R_{DATA}$) from 0 to 2 is shown by the solid red line in Fig. 8. However, the FDN signal exhibits a symmetric shape around $F_{CK}/R_{DATA} = 1$, indicating the same occurrence of FDN signals in both the intervals of $0.8 < F_{CK}/R_{DATA} < 1$ (an FUP condition) and $1 < F_{CK}/R_{DATA} < 1.2$ (an FDN condition), which is incorrect. This result occurred because when the input data were CIDs (e.g., 0110) and $CK_{180}$ was in the middle of the four UIs, as shown in Fig. 7(b), the sampling result of $D_{-180} D_0 D_{180} D_{360} D_{540}$ was 01110 (or 10001), generating an FDN signal for an FUP condition.

Therefore, the fifth sample point needed to be changed to another clock phase. We analyzed the effect of a clock phase of 450° ($CK_{450}$) as the fifth sampling clock on a new FDN, and the result is shown by the green line in Fig. 8. Using $CK_{450}$ as the fifth sampling clock (a phase delay of 90° from $CK_{360}$, where the sampling result is $D_{450}$), the number of FDN occurrences in the interval of $F_{CK}/R_{DATA} < 1$ was significantly reduced. Furthermore, within the interval of $0.8 \le F_{CK}/R_{DATA} \le 1$, there were zero FDN occurrences. Therefore, $CK_{450}$ was considered suitable as the fifth sampling clock, and (2) was revised as

$$\mathrm{FDN} = (D_{-180} \oplus D_0) \cdot (D_0 \odot D_{180}) \cdot (D_{180} \odot D_{360}) \cdot (D_{360} \oplus D_{450}). \tag{3}$$

Although with (3), a small number of FDN signals still appeared within the range of $F_{CK}/R_{DATA} < 0.8$ (an FUP condition), the overall number of FUP occurrences was much greater than the number of FDN occurrences ($N_{FUP} \gg N_{FDN}$). Therefore, the small number of errors does not affect the direction of the frequency tracking process. The combined results obtained with the FUP from (1) and the FDN from (3) are shown in Fig. 9. Because this approach uses extra sampling points at −180° and 450° in comparison to the conventional BBPD, we call this method the XBBPD for frequency detection. In addition, the number of FUP and FDN occurrences varies with the transition density (TD) of the input data; we analyzed the effect of different TDs on $N_{FUP}$ and $N_{FDN}$ for random input data and included the results in Fig. 9.

The feasibility analysis showed that $N_{FUP}$ and $N_{FDN}$ can guide the frequency correction direction correctly. Furthermore, $N_{FUP}$ and $N_{FDN}$ are linear with respect to the relative frequency in the interval of $0.8 \le F_{CK}/R_{DATA} \le 1.25$ and depend on the TD of the input data (Fig. 9), as follows:

$$\begin{aligned} N_{FUP} &= k_{UP} \cdot \left(1 - \frac{F_{CK}}{R_{DATA}}\right), & 0.8 \le \frac{F_{CK}}{R_{DATA}} \le 1 \\ N_{FDN} &= k_{DN} \cdot \left(\frac{F_{CK}}{R_{DATA}} - 1\right), & 1 \le \frac{F_{CK}}{R_{DATA}} \le 1.25 \end{aligned} \tag{4}$$

where $k_{UP}$ and $k_{DN}$ are the gains of the XBBPD in the frequency acquisition process and $k_{UP} = k_{DN}$ when $0.8 \le F_{CK}/R_{DATA} \le 1.25$. Because $k_{UP} = k_{DN} = k$, (4) can be expressed as

$$\Delta N = N_{FDN} - N_{FUP} = k \cdot \left(\frac{F_{CK}}{R_{DATA}} - 1\right) \quad \text{for } 0.8 \le \frac{F_{CK}}{R_{DATA}} \le 1.25 \tag{5}$$

where $\Delta N > 0$ for the FDN condition and $\Delta N < 0$ for the FUP condition. The frequency locked loop (FLL) with the proposed XBBPD has the advantageous feature of bidirectional locking. The lower limit of the frequency detection range of the XBBPD is near dc and the upper limit is 1.75× the input data rate, as shown in Fig. 9. This provides a reliable solution for designing referenceless CDR circuits.

D. DCD Analysis

When duty cycle distortion (DCD) occurs on the input data, the results obtained by our model are as shown in Fig. 10.
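As an aside on the FDN logic of (2) and (3), the effect of moving the fifth sample from 540° to 450° can be checked on two hand-built timing cases. This is a sketch under stated assumptions: an ideal NRZ waveform with UI = 1, clock periods of 1.1 UI and 0.8 UI chosen only for illustration, and hypothetical helper names (`sample`, `samples`, `fdn_eq2`, `fdn_eq3`).

```python
def sample(bits, t):
    """Ideal NRZ waveform with UI = 1: value of the bit that contains time t."""
    return bits[int(t)]

def xnor(a, b):
    return 1 ^ a ^ b

def fdn_eq2(dm180, d0, d180, d360, d540):
    """Equation (2): fifth sample taken at 540 degrees."""
    return (dm180 ^ d0) & xnor(d0, d180) & xnor(d180, d360) & (d360 ^ d540)

def fdn_eq3(dm180, d0, d180, d360, d450):
    """Equation (3): fifth sample taken at 450 degrees."""
    return (dm180 ^ d0) & xnor(d0, d180) & xnor(d180, d360) & (d360 ^ d450)

def samples(bits, t180, t_ck, phases):
    """Sample `bits` at the given clock phases (degrees), with CK180 at time t180."""
    return [sample(bits, t180 + (p - 180) / 360 * t_ck) for p in phases]

# Case 1 (assumed timing): CIDs 0110, F_CK < R_DATA (T_CK = 1.1 UI),
# CK180 in the middle of the four UIs. This is an FUP condition.
cid = [0, 1, 1, 0]
false_fdn_eq2 = fdn_eq2(*samples(cid, 2.0, 1.1, (-180, 0, 180, 360, 540)))
false_fdn_eq3 = fdn_eq3(*samples(cid, 2.0, 1.1, (-180, 0, 180, 360, 450)))

# Case 2 (assumed timing): single pulse 010, F_CK > R_DATA (T_CK = 0.8 UI),
# CK180 at the center of the pulse. This is a genuine FDN condition.
pulse = [0, 1, 0]
true_fdn_eq3 = fdn_eq3(*samples(pulse, 1.5, 0.8, (-180, 0, 180, 360, 450)))

print(false_fdn_eq2, false_fdn_eq3, true_fdn_eq3)
```

In the first case the five samples read 01110, so (2) raises FDN even though the clock is too slow, whereas the 450° sample still lies inside the second "1" and (3) stays low. In the second case (3) raises FDN as intended.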


We analyzed the effect of DCD on frequency lock as the DCD changes from 0% to 15%. With the increase in DCD, $N_{FUP}$ and $N_{FDN}$ are no longer zero at the relative frequency of 1. The loop can still be locked, but a frequency offset (slightly higher than the target frequency, about 25 000 ppm) occurs. When locked with DCD, jitter generation will be introduced because FUP and FDN are not zero and appear alternately. A maximum of 10% DCD can be tolerated to track its frequency offset within the pull-in range of the phase-locked loop (PLL) without a significant cost in lock time, according to the post-layout simulation with $2^{31} - 1$ pseudo-random bit sequence (PRBS) input streams. Once the frequency locking is achieved, frequency tracking is disabled and the subsequent phase tracking (PLL) reduces the jitter generation. Further increasing the DCD to more than 10% would result in the failure of frequency tracking.

Fig. 10. $N_{FUP}$ and $N_{FDN}$ in the presence of DCD.

E. Sub-Rate Analysis

In the full-rate open-loop state, we derived a method of generating FUP and FDN signals by scanning the interval of the relative frequency from 0 to 2 and analyzed the sampling results. The resulting XBBPD showed a large dynamic frequency detection range and also produced phase correction signals at the same time. At the full rate, the XBBPD required two clock phases with a difference of 180°, in combination with a third phase of 90°, to generate the FUP and FDN signals, as shown in Fig. 11(a). This method can similarly be used with the sub-rate sampling CDR. As shown by the half-rate sampling timing diagram in Fig. 11(b), two input data and their transition edges were simultaneously sampled by four clock phases with a phase interval of 90°; however, only one additional clock phase of 315° was needed to assist in identifying the duration of one data UI. Therefore, in the half-rate mode, the oversampling ratio was 5/2. The relationship between the above CDR structure and oversampling ratio and the required clock phase is given in Table I.

TABLE I
OVERSAMPLING RATIO AND CLOCK PHASE NEEDED FOR DIFFERENT ARCHITECTURES

Fig. 11. Timing diagram and sampling clock phase for frequency detection of (a) full-rate, (b) half-rate, and (c) quarter-rate.

III. ARCHITECTURE AND CIRCUIT IMPLEMENTATION

The overall architecture of the proposed referenceless CDR is shown in Fig. 12. The input signal is amplified using a limiting amplifier and then fed to the phase and frequency detector, which in this circuit is the XBBPD. The PUP/PDN and FUP/FDN signals from the XBBPD are processed in a digital loop filter, and with assistance from the integrated XBBPD, the FLL and PLL are combined into one control loop to maximally exploit automated tools for synthesis. The output of the digital loop filter controls the DCO to achieve phase and frequency lock on the input signal. The proposed all-digital referenceless CDR was implemented using half-rate sampling. To facilitate the digitization process, we convert the early and late phase and frequency signals into signed binary digits and input them into the digitally implemented gain controller. The XBBPD samples the input signal and outputs two pairs of phase correction signals (PUP[1:0]/PDN[1:0]) and a pair of frequency error signals (FUP/FDN). The two pairs of phase correction signals are reduced to one pair by a majority voter and then input to the deserializer because, for example, with an input data rate of 10 Gb/s, the bit rate of the voted phase error signal is still very high (5 Gb/s). Therefore, a deserializer is used to deserialize the voted signals and output a pair of 8-bit 625-Mb/s parallel signals for a 10-Gb/s input data rate. The pair of 8-bit parallel signals is then processed by an arithmetic logic unit to generate a 5-bit signed binary phase correction signal. The frequency error signal does not need to pass the majority vote circuit, and the signed binary frequency correction signal is obtained using an arithmetic logic unit after deserialization.

Fig. 12. Block diagram of referenceless all digital CDR implementation.

The digital synthesis logic module thus includes an adjustable gain controller, a set of arithmetic logic


units, a digital loop filter with frequency and phase filtering, and a lock detector.

Fig. 13. Implementation of XBBPD for phase and frequency detection. The 1 clock phase (5 GHz).

Fig. 14. Phase–frequency detection logic. (a) Timing diagram and operation principle of PUP/PDN and FUP/FDN generation. (b) Majority voter.

A. XBBPD Modules

The proposed XBBPD consists of four components: high-speed sampler, retimer, phase–frequency detection logic, and majority voter modules, as detailed in Fig. 13.

1) High-Speed Sampler: For a CDR circuit designed for a half-rate operation with simultaneous detection of frequency and phase errors, five sense-amplifier-based flip-flops [14] are used to sample the input signal. The five flip-flops use clocks corresponding to five clock phases: 0°, 90°, 180°, 270°, and 315°. Because the oscillator consists of four differential delay units, a total of eight clock phases are produced with a phase interval of 45°, and thus a 315° phase clock can be readily provided. The proposed structure uses only one additional sample at 315° compared with the existing half-rate BBPD and achieves simultaneous frequency and phase detection.

2) Retimer for Phase Alignment: Because of the sub-rate sampling, five phase clocks are required to alternately sample the input signal in one clock cycle. Therefore, all the sampling results must be synchronized to one clock phase for further processing. All five sampled signals are used for frequency detection, and only four of them are needed to detect the phase error. These five signals are sent to the retimer, which aligns the sampled signals to a single clock phase.

3) Phase–Frequency Detection Logic: The timing diagram and the operating principle for the phase–frequency detection logic in one DCO cycle are shown in Fig. 14(a). The input data are sampled by five samplers and fed to the re-timing circuit, which aligns the sampled signals to the same clock phase. Of the five aligned signals, two are recovered data: $D_0$ and $D_{180}$; the remaining three signals are edge signals: $D_{90}$, $D_{270}$, and $D_{315}$. Because this design is for a half-rate CDR, two pairs of PUP and PDN signals are generated by comparing the results sampled by two clocks with orthogonal phases, following the logic that

$$\begin{aligned} \mathrm{PUP}[1] &= D_0 \oplus D_{90}; & \mathrm{PDN}[1] &= D_{90} \oplus D_{180} \\ \mathrm{PUP}[0] &= D_{180} \oplus D_{270}; & \mathrm{PDN}[0] &= D_{270} \oplus D_{360}. \end{aligned} \tag{6}$$

Here $D_{360}$ is the sampling result of the next clock of $CK_0$. The frequency correction signal can be generated only when $CK_{180}$ is located in the middle of a single-pulse pattern (X010X or X101X, where X can be either 0 or 1) of a random binary sequence. From (1) and (3), the frequency detection logic expression can be modified for a half-rate CDR as follows:

$$\begin{aligned} \mathrm{FUP} &= (D_{90} \oplus D_{180}) \cdot (D_{180} \oplus D_{270}) = \mathrm{PDN}[1] \cdot \mathrm{PUP}[0] \\ \mathrm{FDN} &= (D_0 \oplus D_{90}) \cdot (D_{90} \odot D_{180}) \cdot (D_{180} \odot D_{270}) \cdot (D_{270} \oplus D_{315}) \\ &= \mathrm{PUP}[1] \cdot \overline{(\mathrm{PDN}[1] + \mathrm{PUP}[0])} \cdot (D_{270} \oplus D_{315}). \end{aligned} \tag{7}$$

The sampling result $D_{315}$ at the 315° clock phase is required when detecting whether $CK_{180}$ is located in the middle of a single-pulse pattern.

4) Majority Voter: For the proposed design, in one cycle of the DCO, two UIs of input data are sampled by five clock phases. The sampled signals are re-timed and input to the phase–frequency detection logic to generate the phase correction signals, which are PUP'[1:0] and PDN'[1:0], and two frequency correction signals, which are FUP and FDN. The four phase correction signals need to be simplified and combined into two: PUP and PDN [Fig. 14(b)]. The simplification needs to consider the effects of jitter in the input data in the phase tracking loop. That is, when the input data signal exhibits jitter, the two decision results of the phase detection logic may be different. In this case, the output of the majority voter should be zero to avoid introducing unnecessary jitter.

B. Digital Synthesis Logic

1) Phase–Frequency Correction Code Generation: The high-speed phase and frequency correction signals are demultiplexed by the deserializer to generate an 8-bit parallel signal at one-eighth of the rate. These signals need to be converted into two's complement form for an adder array. The output of the digitized phase and frequency correction signals is in the range of [−8, +8].

Digital Loop Filter: The digital loop filter consists of proportional and integral paths [15], [16].
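The detection logic of (6) and (7), together with the voting and code-generation steps of Sections III.A and III.B, can be sketched as follows. The gate equations follow (6) and (7); the voter rule (net count of early and late decisions, zero on conflict) and the count-based code generation are assumptions made for illustration, since the article does not give their gate-level form, and all function names are hypothetical.

```python
from itertools import product

def xbbpd_half_rate(d0, d90, d180, d270, d315, d360):
    """Half-rate detection logic of (6) and (7) for one DCO cycle (bits are 0 or 1)."""
    pup1, pdn1 = d0 ^ d90, d90 ^ d180
    pup0, pdn0 = d180 ^ d270, d270 ^ d360
    fup = pdn1 & pup0
    fdn = pup1 & (1 - (pdn1 | pup0)) & (d270 ^ d315)
    return (pup1, pup0), (pdn1, pdn0), fup, fdn

def majority_vote(pup, pdn):
    """Assumed voter: net early/late count; conflicting decisions give (0, 0)."""
    net = sum(pup) - sum(pdn)
    return int(net > 0), int(net < 0)

def correction_code(up_word, dn_word):
    """Assumed code generation: 8 UP bits minus 8 DN bits, a value in [-8, +8]."""
    return sum(up_word) - sum(dn_word)

# Check that the compact form of FDN in (7) matches its expanded XOR/XNOR form
# for every combination of the six sample values.
forms_agree = all(
    xbbpd_half_rate(d0, d90, d180, d270, d315, d360)[3]
    == ((d0 ^ d90) & int(d90 == d180) & int(d180 == d270) & (d270 ^ d315))
    for d0, d90, d180, d270, d315, d360 in product((0, 1), repeat=6)
)
print(forms_agree)
print(majority_vote((1, 0), (0, 1)))           # conflicting decisions
print(correction_code([1] * 8, [0] * 8))       # all eight votes are UP
```

With this voter, two conflicting phase decisions cancel, which matches the stated requirement that the voter output be zero when jitter makes the two decisions disagree.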


Fig. 15. Detailed block diagram of the digital loop filter.

Fig. 16. Post-layout simulation result of the frequency acquisition process. Two frequency changes 1) from 4.75 to 3.25 GHz and 2) from 4.75 to 6.25 GHz are shown for three different input data sets (0101, PRBS7, and PRBS31).

The proportional path is only for the PLL, and the integral path is shared by the PLL and FLL. The gain of the proportional path is $K_P$, and that of the integral path is $K_I$ for the PLL and $K_F$ for the FLL. A detailed view of the digital synthesis filter is shown in Fig. 15. Its implementation is divided into two parts: a high-speed phase-processing module and a low-speed synthesis module; these modules have different operating rates and are capable of controlling the DCO separately. The high-speed phase-processing module receives the PUP/PDN signal from the majority voter, and then directly controls the DCO after amplification by the gain of $K_{PP}$, set by the gain controller. To maintain the stability of the loop, the simplest possible high-speed phase-processing module must be implemented to reduce the loop delay. Therefore, we use a 7-bit binary number to represent $K_{PP}$ and implement a high-speed proportional module simply by enabling or disabling $K_{PP}$ based on the output of the majority voting circuit. The resulting high-speed 7-bit binary signal directly controls the DCO. The adjustable gain control range of $K_{PP}$ is an integer between 0 and $2^7$. Thus, the loop gain for the PLL in Fig. 15 is as follows:

$$L(z^{-1}) = K_{PD} K_V \left[ K_{PP} + \left( K_P + \frac{K_I}{1 - z^{-1}} \right) z^{-N} \right] \cdot \frac{K_{DCO}}{1 - z^{-1}} \tag{8}$$

where $K_{PD}$, $K_V$, $K_P$, $K_I$, and $K_{DCO}$ are the gains of the XBBPD, voter, proportional path, integral path, and DCO, respectively; the term $z^{-N}$ denotes the total delay of the proportional and integral paths. During FLL, PUP and PDN are still generated and affect the loop through the $K_{PP}$ factor as in (8). However, the effect of PUP and PDN on the DCO output is minimal, since the average of PUP − PDN is 0 owing to the random nature of their occurrences. In addition, $K_{PP}$ is set to a minimal value to prevent frequency disturbance during FLL.

The clock frequency used in the filter is one-eighth of the DCO frequency, which benefits the implementation of high-precision accumulators. Therefore, the phase and frequency [...] different frequencies. In this article, to cover the wide tuning range of the DCO, the 11 next-to-highest bits are used to control the DCO, because the highest bit represents the sign bit.

2) Lock Detector: In addition, the digital synthesis module can be set to operate in the frequency acquisition mode or in the phase tracking mode adaptively according to the input data rate. When the digital CDR is powered on or when the input data rate changes suddenly, a lock detector controls the multiplexer to select the frequency acquisition mode. As soon as the frequency lock is achieved, the lock detector changes the control signal to enable the phase tracking loop.

The lock detector tracks the FUP/FDN signals and determines when to switch the state between lock and unlock. The criterion for locking is the relationship between the maximum and minimum values of the integral path within a certain period of time (configurable from 128 to 2048 clock cycles, commonly 128 or 256 clock cycles). This range is determined based on the results of the CDR's behavior simulation in the frequency locked state. The frequency lock condition is met when the change in the integral path output that is intercepted to control the DCO frequency is within ±2, a value that can be programmed externally. For the cases where the input signal has a low TD, this range needs to be reduced to prevent false locks. The post-layout simulation results of the CDR frequency acquisition are shown in Fig. 16. The lock detector operates as follows.

1) When the CDR is reset by power on reset (POR) or a lack of input data, the lock detector directly enters the unlock state to enable the frequency acquisition loop. In addition, with no data input, PUP or PDN is zero (PUP + PDN = 0), and the lock detector will output the unlock state.
2) During the frequency acquisition process, the maximum
correction signals are demultiplexed by a factor of eight. value (max) and the minimum value (min) of the integral
The internal accumulator has a high resolution of 21 bits, path will be generated within the detection period. If the
which allows the gain control signals (K P , K I , and K F ) to difference between the two values is greater than a
use a wide range of proportional and integral gain. The gain predefined value (tolerable frequency: ftol; max − min >
control is achieved by shifting the corresponding number of ftol), the lock detector will still output the unlock state.
bits according to the value set by the control signal. The 3) Frequency locking: If the output frequency of the DCO
configurable gain range is between 0 and 2n (0 ≤ n ≤ 15), approaches or reaches the target frequency, the differ-
which is sufficient to cover the difference in K DCO at ence between the maximum and minimum values will

Authorized licensed use limited to: City&#44; University of London. Downloaded on July 10,2020 at 01:54:15 UTC from IEEE Xplore. Restrictions apply.
This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.

8 IEEE JOURNAL OF SOLID-STATE CIRCUITS

Fig. 18. Die photograph with block description.

Fig. 17. Circuit diagram of the four-stage DCO.

be less than or equal to ftol (i.e., max − min ≤ ftol)


and will satisfy PUP + PDN
= 0. At this point, the lock
detector outputs the lock state.
4) When the input data rate is too low, below the
lower limit of the XBBPD frequency detection range
(i.e., FCK > 1.75× RDATA), the number of occurrences of
FUP and FDN is zero as shown in Fig. 9, which implies
no adjustment of the frequency. Therefore, the maximum Fig. 19. Measurement setup.
and minimum values of the frequency are the same as
Therefore, a total of 29 bits (7 + 7 + 15) control words,
the initial value. If PUP + PDN
= 0 (i.e., input data
corresponding to a resolution of 11.0875 binary weighted bits.
exist), the lock detector will remain in the unlock state
until the input data rate enters the frequency acquisition IV. M EASUREMENT R ESULTS
range.
The prototype was designed and manufactured using a
28-nm CMOS technology. A micrograph of the chip is shown
C. DCO in Fig. 18. The chip’s active area was 0.031 mm2 . As shown in
To generate five clock phases (0◦ , 90◦ , 180◦ , 270◦, and 315◦ ) the measurement setup in Fig. 19, an Anritsu MU18020A gen-
to oversample the input data, the DCO is implemented as a ring erated a pseudorandom bit sequence 231 − 1 (PRBS31) pattern
oscillator with a four-stage differential delay cell. Each stage as the input data. The fabricated chip was characterized using
of the differential cell generates a 45◦ phase delay. Because of a chip-on-board assembly. The input data channel consisted of
its differential structure, the DCO generates total eight clock a 1.2-m RG402 coaxial cable and a 1.4-cm onboard FR-4 trace
phases: 0◦ 45◦ 90◦ 135◦ 180◦ 225◦ 270◦ and 315◦ among with a channel loss of 8.6 dB at 6.25 GHz
which just five of the eight phases are used The delay cell All modules were powered by a 1.0-V supply and consumed
and its schematic are shown in Fig. 17 Each stage consists 21.7 mW at a input data rate of 10 Gb/s. The power consump-
of a differential current starved inverter and a cross-coupled tion is detailed in Fig. 20: the XBBPD consumed 50.5% of
latch, and frequency adjustment is achieved by controlling the the power because it operated at a high frequency. The digital
current flowing through a set of digitally controlled resistors synthesis module for frequency acquisition and CDR logic
composed of P/NMOS consumed only 8.7% of the total power.
An 11-bit binary weighted frequency control word (FCW) The capture range was measured by initializing the DCO
generated by the synthesis module at an updating rate of frequency to 4.75 GHz (corresponding to RDATA of 9.5 Gb/s)
one-eighth of the oscillator frequency is assigned as follows. and measuring the CDR’s lockable data rate. The measurement
Because of the large number of DCO frequency control bits, results showed that the proposed CDR’s capture range was
if all the FCW bits are flipped simultaneously, a large glitch 6.5–12.5 Gb/s. In addition, the time required for a frequency
will be generated. lock decreased as the gain of the integral path increased.
Therefore, to reduce the glitches by decreasing the number However, overly increased integral gain will result in an
of segments participating in switching and to reduce jitter unstable lock frequency. As shown in Fig. 21, with the input
generation, 4-bit MSBs of 11-bit binary code are converted of the PRBS31 pattern and an integral gain of 23 , the lock
into thermometer codes that correspond to FCW[21:7], and time was 1.5 µs. The lock time increased to 2.5 µs when the
the remaining 7 bits are still binary weighted, corresponding integral gain was set to 22 .
to FCW[6:0]. Additional DCO controls of FCW_DP[6:0] and The effect of channel loss on the frequency acquisition
FCW_DN[6:0] are implemented to be directly controlled by behavior is also measured to show some errors in lock
the PUP and PDN signals generated by the XBBPD, of which frequency as shown in Fig. 22. As the channel loss increases,
the updating rate is the same as the DCO frequency [17]. the effect of channel loss and ISI becomes obvious.
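The FCW segmentation described in Section III-C (4 MSBs converted to a 15-bit thermometer code, 7 LSBs kept binary weighted) can be illustrated with a minimal Python sketch. The helper names `segment_fcw` and `changed_bits` are hypothetical, and the sketch assumes an unsigned 11-bit code whose thermometer segment fills from the low-order side; the text does not specify these details.

```python
from __future__ import annotations


def segment_fcw(code: int) -> tuple[int, int]:
    """Split an 11-bit binary FCW into (thermometer field, binary field).

    Assumptions (not from the paper): the code is unsigned and the
    thermometer bits are set starting from the least significant position.
    """
    if not 0 <= code < 2 ** 11:
        raise ValueError("FCW must fit in 11 bits")
    msb = code >> 7           # upper 4 bits, value 0..15
    lsb = code & 0x7F         # lower 7 bits stay binary weighted (FCW[6:0])
    thermo = (1 << msb) - 1   # 'msb' ones in a 15-bit field (FCW[21:7])
    return thermo, lsb


def changed_bits(a: int, b: int) -> int:
    """Count the control lines that toggle between two words."""
    return bin(a ^ b).count("1")


if __name__ == "__main__":
    t0, _ = segment_fcw(1023)
    t1, _ = segment_fcw(1024)
    # Binary-coded MSBs (7 -> 8) toggle four lines; the thermometer field toggles one.
    print(changed_bits(1023 >> 7, 1024 >> 7), changed_bits(t0, t1))
```

Each thermometer bit carries the weight of 128 LSB steps, so the number of set thermometer bits times 128 plus the binary field reproduces the original code. Only one of the large segments switches at an MSB carry, which is the glitch-reduction property the text refers to.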

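The lock-detector rules 1) to 4) listed in Section III-B can be summarized in a short behavioral sketch in Python. This is an illustration under stated assumptions, not the authors' implementation: the function name `lock_state`, the list of integral-path samples, the `initial_value` argument, and the default `ftol = 2` (borrowed from the ±2 window in the text) are all hypothetical. Rule 4 is modeled by one possible reading, namely that an integral path that never leaves its initial value while data are present keeps the detector unlocked.

```python
from typing import Sequence


def lock_state(integral_samples: Sequence[int], pup_count: int, pdn_count: int,
               initial_value: int, ftol: int = 2) -> str:
    """Return "lock" or "unlock" for one detection period (behavioral sketch)."""
    if not integral_samples:
        raise ValueError("need at least one integral-path sample")
    # Rule 1: no phase decisions at all (PUP + PDN = 0) means no input data.
    if pup_count + pdn_count == 0:
        return "unlock"
    hi, lo = max(integral_samples), min(integral_samples)
    # Rule 4 (assumed reading): data exist, but the integral path never moved
    # from its initial value because no FUP/FDN was produced.
    if hi == lo == initial_value:
        return "unlock"
    # Rule 2: the integral path still moves by more than ftol.
    if hi - lo > ftol:
        return "unlock"
    # Rule 3: max - min <= ftol and PUP + PDN != 0.
    return "lock"


if __name__ == "__main__":
    print(lock_state([100, 101, 100], pup_count=5, pdn_count=4, initial_value=0))
    print(lock_state([100, 110, 120], pup_count=5, pdn_count=4, initial_value=0))
```

In hardware the detection period would be one of the configurable windows (128 to 2048 filter clock cycles); here it is simply the length of the sample list.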

TABLE II
PERFORMANCE SUMMARY AND COMPARISON TO THE STATE-OF-THE-ART REFERENCELESS CDRs

Fig. 20. (a) Test results of power consumption as a function of data rate. (b) Power breakdown.

Fig. 21. Measurement result of lock time with different $K_F$ settings.

Fig. 22. Measured (a) channel loss and (b) corresponding frequency acquisition behavior with 12.5-Gb/s input data rate.

The prototype includes a limiting amplifier, which can tolerate a maximum channel loss of 9.3 dB at 6.25 GHz. Increasing the channel loss further results in a corresponding increase in the captured frequency. The final frequency in this case is outside the pull-in range of the PLL, causing erroneous frequency locking, as shown in the measurement result of channel 3 in Fig. 22. For jitter tolerance (JTOL) measurements, the recovered data were fed into the error detector (MU183040B) of the signal quality analyzer. The JTOL curve obtained by this measurement is shown in Fig. 23. The corner frequency of 10 MHz was realized under PRBS31 with a bit error rate (BER) < $10^{-12}$. At the corner frequency, as $K_{PP}$ increased, the delay between the input data and the recovered clock decreased and the JTOL increased; however, a further increase in $K_{PP}$ led to phase overcorrection and thus reduced JTOL. With the optimal setting of $K_{PP}$, JTOL was 0.34 UI at high frequency. Table II summarizes the performance of the referenceless CDR in comparison to the state-of-the-art works. Fig. 24(a) and (b) show the measured waveforms of the recovered data and clock. The measured root-mean-square (rms) jitter and peak-to-peak jitter of the recovered clock
were 1.15 and 9.88 ps, respectively, when the input data rate was 11.8 Gb/s and the external jitter source was disabled, as shown in Fig. 24(c).

Fig. 23. Measured JTOL with different gains of direct pulling path.

Fig. 24. Recovered clock and data at (a) 6.5 and (b) 12.5 Gb/s. (c) Recovered clock jitter at 11.8 Gb/s.

V. CONCLUSION

We presented a 2.5× oversampling phase–frequency detection scheme and a continuous half-rate digital referenceless CDR circuit based on this scheme. The maximum tolerable channel loss is currently 9.3 dB at 6.25 GHz, which can be improved with an enhanced equalizer and limiting amplifier. Compared with other referenceless CDRs, the largest frequency acquisition and phase tracking ranges were achieved using fewer samplers, resulting in high power and area efficiency. The only additional overhead required compared with the existing BBPD was one more clock phase and one sampler. Because the scheme used digital processing, the FLL and PLL shared the same integrator, further optimizing the chip area. Thus, the proposed CDR had a minimal chip area of 0.03 mm², and the prototype was manufactured in a 28-nm CMOS process. The frequency tracking scheme is achieved by detecting the input data sequences of 010/101; compared with other referenceless CDRs, the proposed 2.5× oversampling referenceless all-digital CDR had the shortest locking time of 1.5 µs and achieved an excellent power efficiency of 2.11 pJ/bit at an input rate of 10 Gb/s.

ACKNOWLEDGMENT

The authors would like to thank Foundry Division, Samsung Electronics, Hwaseong, South Korea, for the opportunity of chip fabrication.

REFERENCES

[1] D. Dalton et al., “A 12.5-Mb/s to 2.7-Gb/s continuous-rate CDR with automatic frequency acquisition and data-rate readback,” IEEE J. Solid-State Circuits, vol. 40, no. 12, pp. 2713–2725, Dec. 2005.
[2] J. Jin, X. Jin, J. Jung, K. Kwon, J. Kim, and J.-H. Chun, “A 0.75–3.0-Gb/s dual-mode temperature-tolerant referenceless CDR with a deadzone-compensated frequency detector,” IEEE J. Solid-State Circuits, vol. 53, no. 10, pp. 2994–3003, Oct. 2018.
[3] M. S. Jalali, A. Sheikholeslami, M. Kibune, and H. Tamura, “A reference-less single-loop half-rate binary CDR,” IEEE J. Solid-State Circuits, vol. 50, no. 9, pp. 2037–2047, Sep. 2015.
[4] N. Kocaman, S. Fallahi, M. Kargar, M. Khanpour, and A. Momtaz, “An 8.5-11.5-Gbps SONET transceiver with referenceless frequency acquisition,” IEEE J. Solid-State Circuits, vol. 48, no. 8, pp. 1875–1884, Aug. 2013.
[5] G. Shu et al., “A reference-less clock and data recovery circuit using phase-rotating phase-locked loop,” IEEE J. Solid-State Circuits, vol. 49, no. 4, pp. 1036–1047, Apr. 2014.
[6] R. Inti, W. Yin, A. Elshazly, N. Sasidhar, and P. K. Hanumolu, “A 0.5-to-2.5 Gb/s reference-less half-rate digital CDR with unlimited frequency acquisition range and improved input duty-cycle error tolerance,” IEEE J. Solid-State Circuits, vol. 46, no. 12, pp. 3150–3162, Dec. 2011.
[7] K. Park, W. Bae, J. Lee, J. Hwang, and D.-K. Jeong, “A 6.7–11.2 Gb/s, 2.25 pJ/bit, single-loop referenceless CDR with multi-phase, oversampling PFD in 65-nm CMOS,” IEEE J. Solid-State Circuits, vol. 53, no. 10, pp. 2982–2993, Oct. 2018.
[8] G. Shu et al., “A 4-to-10.5 Gb/s continuous-rate digital clock and data recovery with automatic frequency acquisition,” IEEE J. Solid-State Circuits, vol. 51, no. 2, pp. 428–439, Feb. 2016.
[9] W. Rahman et al., “A 22.5-to-32-Gb/s 3.2-pJ/b referenceless baud-rate digital CDR with DFE and CTLE in 28-nm CMOS,” IEEE J. Solid-State Circuits, vol. 52, no. 12, pp. 3517–3531, Dec. 2017.
[10] P. K. Hanumolu, G.-Y. Wei, and U.-K. Moon, “A wide-tracking range clock and data recovery circuit,” IEEE J. Solid-State Circuits, vol. 43, no. 2, pp. 425–439, Feb. 2008.
[11] S.-K. Lee, Y.-S. Kim, H. Ha, Y. Seo, H.-J. Park, and J.-Y. Sim, “A 650Mb/s-to-8Gb/s referenceless CDR circuit with automatic acquisition of data rate,” in IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers, Feb. 2009, pp. 184–185.
[12] S. Huang, J. Cao, and M. M. Green, “An 8.2 Gb/s-to-10.3 Gb/s full-rate linear referenceless CDR without frequency detector in 0.18 µm CMOS,” IEEE J. Solid-State Circuits, vol. 50, no. 9, pp. 2048–2060, Sep. 2015.
[13] R. Behzad, “Designing bang-bang PLLs for clock and data recovery in serial data transmission systems,” in Proc. High-Perform. Syst., Feb. 2003, pp. 34–45.
[14] B. Nikolic, V. G. Oklobdzija, V. Stojanovic, W. Jia, J. Kar-Shing Chiu, and M. Ming-Tak Leung, “Improved sense-amplifier-based flip-flop: Design and measurements,” IEEE J. Solid-State Circuits, vol. 35, no. 6, pp. 876–884, Jun. 2000.
[15] J. L. Sonntag and J. Stonick, “A digital clock and data recovery architecture for multi-Gigabit/s binary links,” IEEE J. Solid-State Circuits, vol. 41, no. 8, pp. 1867–1875, Aug. 2006.
[16] V. Kratyuk, P. K. Hanumolu, U.-K. Moon, and K. Mayaram, “A design procedure for all-digital phase-locked loops based on a charge-pump phase-locked-loop analogy,” IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 54, no. 3, pp. 247–251, Mar. 2007.
[17] M. Verbeke et al., “A 1.8-pJ/b, 12.5–25-Gb/s wide range all-digital clock and data recovery circuit,” IEEE J. Solid-State Circuits, vol. 53, no. 2, pp. 470–483, Feb. 2018.

Changzhi Yu (Member, IEEE) received the B.S. degree from Konkuk University, Seoul, South Korea, in 2012, and the M.S. degree from Sungkyunkwan University, Suwon, South Korea, in 2014. He is currently pursuing the Ph.D. degree in electronics engineering with Sogang University, Seoul.
His research interests include mixed-mode signal integrated circuits, clock and data recovery circuits, and high-speed I/O links.


Euije Sa (Student Member, IEEE) was born in Cheongju, South Korea, in 1991. He received the B.S. and M.S. degrees in electronics engineering from Sogang University, Seoul, South Korea, in 2016 and 2018, respectively.
He is currently working with SK Hynix, Icheon, South Korea. His research interests include wide-range phase-locked loops (PLLs) for serial interface and clocking circuits.

Soowan Jin (Student Member, IEEE) was born in Eumseong, South Korea, in 1990. He received the B.S. degree from Chungbuk University, Cheongju, South Korea, in 2016, and the M.S. degree from Sogang University, Seoul, South Korea, in 2018.
He is currently working with LG Electronics, Seoul. His current research interests include high-speed interface circuits (SERDES) and phase-locked loops.

Himchan Park (Member, IEEE) was born in Seoul, South Korea, in 1984. He received the B.S. and M.S. degrees in electronics engineering from Sogang University, Seoul, in 2012 and 2014, respectively, where he is currently pursuing the Ph.D. degree.
His current interests are in the design of integrated sensors and analog front-ends for CMOS image sensors and sensor applications.

Jongshin Shin (Member, IEEE) received the B.S., M.S., and Ph.D. degrees in electronics and electrical engineering from Seoul National University, Seoul, South Korea, in 1997, 1999, and 2004, respectively.
He joined Samsung Electronics, Hwaseong, South Korea, in 2004, as a member of the Technical Staff, where he is currently the Vice President of Foundry Division. His research interests include clock generators, high-speed IO, and clock and data recovery circuits.

Jinwook Burm (Senior Member, IEEE) was born in South Korea, in 1964. He received the B.S. degree in physics from Seoul National University, Seoul, South Korea, in 1987, the M.S. degree in physics from the University of Michigan, Ann Arbor, MI, USA, in 1989, and the Ph.D. degree in applied physics from Cornell University, Ithaca, NY, USA, in 1995.
He did postdoctoral work at Cornell University and Bell Labs, Lucent Technologies, Murray Hill, NJ, USA. He joined the Department of Electronics Engineering, Sogang University, Seoul, as an Assistant Professor in 1998, where he is currently a Professor. He also worked as a Principal Scientist with Pixelplus Semiconductor, Inc., San Jose, CA, USA, for one year starting in August 2006. He worked on millimeter-wave ICs and high-speed GaN transistors at Cornell University, and high-speed optoelectronic circuits at Bell Labs. His current research interests include high-speed interface circuits and CMOS implementation of various sensors.
