Li 2016
FOR 16QAM
HAO LI, ZHI-GANG WANG, HOU-JUN WANG
School of Automation Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China
E-MAIL: haoli_uestc@163.com
[Figure: block diagram; recoverable labels: P1 to P4, timing adjust, Gardner error detector, loop filter, parallel NCO, enable signals en1 to en4, outputs yout01 to yout04 and yout31 to yout34]
Fig.1 Four-parallel timing synchronization structure

2.1. Gardner error detector

The Gardner error detector is used to detect the loop timing error. The Gardner algorithm was derived for QPSK-modulated signals, but it is also suitable for 16QAM-modulated signals, as verified in [11].

In addition, $y(4n-1)$ and $y(4n+i)$, $i = 1, 3$, are the symbol sample points, and $y(4n)$ and $y(4n+2)$ are the intermediate sample points. The parallel Gardner algorithm therefore calculates two parallel timing error values for four parallel input samples in one clock cycle. Figure 2 shows the implementation structure of the four-parallel Gardner error detector. In Figure 2, the average of the timing error, $H(n)$, is calculated in order to simplify the subsequent processing in the loop filter.
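The four-parallel error computation described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the standard Gardner form (intermediate sample times the difference of the two neighbouring symbol samples), two samples per symbol, and the index layout $y(4n-1), y(4n), \ldots, y(4n+3)$. The function name `parallel_gardner` and the sign convention of the error are assumptions.

```python
def gardner_error(prev_sym, mid, next_sym):
    """Standard Gardner timing error for one symbol interval.

    Works for real or complex samples: for complex input it returns
    I_mid * (I_next - I_prev) + Q_mid * (Q_next - Q_prev).
    The sign convention (next minus previous) is an assumption.
    """
    return (mid.conjugate() * (next_sym - prev_sym)).real


def parallel_gardner(y_prev, block):
    """Two timing errors and their average for one block of four samples.

    y_prev : y(4n-1), the last symbol sample of the previous block
    block  : (y(4n), y(4n+1), y(4n+2), y(4n+3))
    Returns (e(2n), e(2n+1), average of the two).
    """
    y0, y1, y2, y3 = block
    e_even = gardner_error(y_prev, y0, y1)   # uses intermediate sample y(4n)
    e_odd = gardner_error(y1, y2, y3)        # uses intermediate sample y(4n+2)
    return e_even, e_odd, 0.5 * (e_even + e_odd)
```

With ideal timing the intermediate samples at symbol transitions are zero, so both errors vanish; a non-zero intermediate sample produces an error whose sign indicates the direction of the timing offset.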
2.2. Loop filter

The role of the loop filter is to filter the timing error. It reduces the influence of high-frequency noise and thereby the jitter of the timing error signal. This smooths the timing error value and makes the whole timing synchronization loop more stable. Here we give only a common loop filter structure, as shown in Figure 3.

[Figure: block diagram; recoverable labels: input $e(n)$, coefficients $k_1$ and $k_2$]

Fig.3 Loop filter structure

In Figure 3, $k_1$ and $k_2$ are the loop filter coefficients, which determine the convergence rate of the timing loop. Their calculation is given in detail in [12].

2.3. Parallel NCO

The parallel NCO control unit provides the parallel fractional intervals for the Farrow interpolation filter, which determine the interpolation positions. It also provides the parallel overflow enable signals for the timing adjust unit.

As pointed out in [4], the NCO control unit is essentially a difference equation. It is expressed as follows

$$K(m) = [K(m-1) - w(m)] \bmod 1 \qquad (4)$$

where mod 1 denotes the modulo operation that keeps only the remainder. Specifically, it is given by (5).

$$K(m) = \begin{cases} K(m-1) - w(m) + 1, & K(m-1) - w(m) < 0 \\ K(m-1) - w(m), & \text{otherwise} \end{cases} \qquad (5)$$

Here $K(m)$ is the value of the NCO register at the $m$-th working clock, and $w(m)$ is the input control word of the NCO control unit.

The working clock of the NCO is $1/T_s$, and $1/T_i$ is the output data rate after the interpolator. The input control word $w(m)$ is adjusted by the loop filter. When the loop reaches equilibrium, $w(m)$ is approximately constant, that is, $w(m) \approx T_s/T_i$. From the preceding analysis of the Gardner algorithm, $1/T_i = 2R_s$, where $R_s$ is the symbol rate. Therefore, at the initial moment of the loop, we set $w(1) = 2R_sT_s$. Because the sample clock $1/T_s$ is very high in a high-speed demodulation system, the traditional NCO structure cannot be implemented in an FPGA. We therefore propose an improved parallel NCO algorithm, which uses the principle of sign-bit decision.

Suppose the parallel NCO control unit has $M$ branches and generates $M$ sets of data $K_1(n), K_2(n), \ldots, K_M(n)$. Each set of data is given by

$$K_i(n) = K(nM + i), \quad i = 1, 2, 3, \ldots, M \qquad (6)$$

Combining this with equation (4) gives the following expression

$$K(nM + i) = [K(nM + i - 1) - w(n)] \bmod 1, \quad i = 1, 2, \ldots, M \qquad (7)$$

Equation (7) can also be written as

$$\begin{cases} K(nM+1) = [K(nM) - w(n)] \bmod 1 \\ K(nM+2) = [K(nM+1) - w(n)] \bmod 1 \\ K(nM+3) = [K(nM+2) - w(n)] \bmod 1 \\ \quad\vdots \\ K(nM+M) = [K(nM+M-1) - w(n)] \bmod 1 \end{cases} \qquad (8)$$

where the register variable $K(nM+i)$, $i = 1, 2, 3, \ldots, M$, is initialized to 0.

The process is as follows. The control word $w(n)$ is input at the initial moment; then, according to equation (8), all $M$ values of $K$ are to be calculated in one clock cycle, and the sign of each of them is examined at the same time. Because all $M$ values must be obtained within one clock cycle, a pipelined design is not used, and equation (8) is simplified as follows

$$\begin{cases} K(nM+1) = [K(nM) - w(n)] \\ K(nM+2) = [K(nM) - 2w(n)] \\ K(nM+3) = [K(nM) - 3w(n)] \\ \quad\vdots \\ K(nM+M) = [K(nM) - Mw(n)] \end{cases} \qquad (9)$$

It can be seen that the $M$ values of $K$ needed at the current time depend only on $K(nM)$ and $w(n)$. The $M$ values can therefore be calculated independently of each other, which achieves parallel processing. In equation (9), there is no need to test the sign of the $M$ values of $K$ or to use the modulo function, since binary subtraction is converted into addition of the complement. This means that the parallel processing can be carried out in every clock cycle.

The $M$-parallel NCO overflow enable signals are given by
$$\begin{cases} en[1] = \operatorname{sgn}[K(nM)] \oplus \operatorname{sgn}[K(nM+1)] \\ en[2] = \operatorname{sgn}[K(nM+1)] \oplus \operatorname{sgn}[K(nM+2)] \\ en[3] = \operatorname{sgn}[K(nM+2)] \oplus \operatorname{sgn}[K(nM+3)] \\ \quad\vdots \\ en[M] = \operatorname{sgn}[K(nM+M-1)] \oplus \operatorname{sgn}[K(nM+M)] \end{cases} \qquad (10)$$

where $\operatorname{sgn}[K]$ means taking the sign bit of $K$ and $\oplus$ represents exclusive OR (XOR).

[Figure: scatter plots, In-Phase versus Quadrature: (a) before timing synchronization, (b) after timing synchronization]
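The relation between the serial NCO of equations (4) and (5) and the parallel form of equations (9) and (10) can be illustrated with a small fixed-point sketch. It is one possible reading, not the authors' FPGA design: it assumes the register is a two's-complement word with one sign bit and `FRAC_BITS` fractional bits (range $[-1, 1)$) that wraps on overflow, so the sign bit toggles each time the modulo-1 register would underflow. The word length, the names `serial_nco` and `parallel_nco`, and the requirement $0 \le w < 1$ are assumptions.

```python
FRAC_BITS = 16                 # assumed fractional word length (not from the paper)
ONE = 1 << FRAC_BITS           # fixed-point representation of 1.0
WRAP = ONE << 1                # one extra sign bit: register range is [-1, 1)


def serial_nco(k, w, steps):
    """Reference modulo-1 NCO, equations (4)-(5): one update per sample clock.

    Returns a list of (register value, underflow flag) pairs.
    """
    out = []
    for _ in range(steps):
        diff = k - w
        underflow = diff < 0
        k = diff + ONE if underflow else diff
        out.append((k, int(underflow)))
    return out


def parallel_nco(k_reg, w, m):
    """M-parallel update in the spirit of equations (9)-(10).

    k_reg holds K(nM) as a (FRAC_BITS + 1)-bit two's-complement pattern.
    Each K(nM + i) depends only on k_reg and w, so all M values could be
    computed side by side. Returns the new register and M pairs of
    (fractional part, enable).
    """
    # i = 0..M; the modulo models the hardware word wrapping around
    vals = [(k_reg - i * w) % WRAP for i in range(m + 1)]
    signs = [v >> FRAC_BITS for v in vals]          # sign bit of each value
    out = [(vals[i] & (ONE - 1), signs[i - 1] ^ signs[i])
           for i in range(1, m + 1)]
    return vals[m], out
```

For any control word in the assumed range, running `parallel_nco` block by block yields the same register fractions and enable flags as stepping `serial_nco` one sample at a time, without any per-step comparison or modulo logic beyond the natural wrap of the word.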
Fractional interval
P0 en[1] 0
interval.

[Figure: BER curves; legend: float simulation BER, fixed simulation BER, theoretical BER; vertical axis BER (logarithmic), horizontal axis from 0 to 14]
parallel timing synchronization structure after FPGA implementation. In this experiment, the running clock of the entire system in the FPGA is 1440/8 = 180 MHz. Thus, for high-speed data streams, existing FPGAs can meet the implementation requirements through a parallel processing architecture, and the hardware resource consumption is acceptable.

4 Conclusions

This paper presents a new high-speed parallel timing synchronization structure for 16QAM. The parallel structure has proved effective and well suited to FPGA implementation. The simulation indicates that the system performance loss is very small, with results very close to the theoretical value. Meanwhile, the hardware implementation results show that the proposed parallel architecture does not consume a lot of resources. This simple, efficient, low-complexity parallel timing synchronization algorithm can be widely used in high-speed communication systems.

Acknowledgements

This paper is supported by National Natural Science Foundation of China (Grant No. 60934002).

References

[7] Zhou X, Chen X, Zhou W, et al. “All-Digital Timing Recovery and Adaptive Equalization for 112 Gbit/s POLMUX-NRZ-DQPSK Optical Coherent Receivers” [J]. Journal of Optical Communications & Networking, 2010, 2(11):984-990.
[8] Schmidt D, Lankl B. “Parallel architecture of an all digital timing recovery scheme for high speed receivers” [C]. International Symposium on Communication Systems Networks and Digital Signal Processing. IEEE, 2010:31-34.
[9] Higashino S, Kobayashi S, Yamagami T. “A parallel architecture of interpolated timing recovery for high-speed data transfer rate and wide capture-range” [C]. Optical Data Storage. International Society for Optics and Photonics, 2007:66200Y-66200Y-6.
[10] Harris F. “Performance and design of Farrow filter used for arbitrary resampling” [C]. Int Conf on Digital Signal Processing. 1997:595-599.
[11] D'Andrea A N, Luise M. “Optimization of symbol timing recovery for QAM data demodulators” [J]. IEEE Transactions on Communications, 1996, 44(3):399-406.
[12] Landgrebe D A. Phaselock Techniques, 3rd Edition [J]. 2003.
[13] Guo J, Shi Y, Wang Z. “A Novel Design of DDR-based Data Acquisition Storage Module in a Digitizer” [C]. International Conference on Communications, Circuits and Systems. IEEE, 2007:995-998.