Optics Communications
journal homepage: www.elsevier.com/locate/optcom
∗ Corresponding author.
E-mail address: hzt@bupt.edu.cn (Z. Huang).
https://doi.org/10.1016/j.optcom.2020.125272
Received 25 November 2019; Received in revised form 4 January 2020; Accepted 7 January 2020
Available online 10 January 2020
0030-4018/© 2020 Elsevier B.V. All rights reserved.
X. Wu, Z. Huang and Y. Ji Optics Communications 462 (2020) 125272
2.1. VLC system

The architecture of the VLC system based on DCO-OFDM is illustrated in Fig. 1. Note that the channel estimation (CE) block is replaced by the trained DNN model; the workflow is introduced below. To keep the focus on channel estimation, we did not add forward error correction coding. By applying a suitable direct current (DC) bias to 𝑥(𝑛), non-negative signals are transmitted through the LED. At the receiver, the optical intensity is captured by a photodiode (PD) with direct detection. After analog-to-digital conversion and data recovery, 𝑌(𝑘) is ready for demodulation; the CE result produced by the DNN model is applied to recover 𝑌(𝑘) before demodulation. Finally, we obtain the time-domain signal, which should be identical to 𝑥(𝑛) under ideal conditions. We assume a sample-spaced channel expressed as ℎ(𝑛). The received output signal of the channel is calculated as Eq. (1):

𝑦(𝑛) = 𝑥(𝑛) ⊗ ℎ(𝑛) + 𝑤(𝑛), (1)

where ⊗ denotes circular convolution. After removing the cyclic prefix (CP) and performing the DFT, the received frequency-domain signal is given by Eq. (2):

𝑌(𝑘) = 𝑋(𝑘)𝐻(𝑘) + 𝑊(𝑘). (2)

2.2. O-OFDM

Since the VLC system adopts the IM/DD modulation and detection method, the time-domain signals obtained by the inverse discrete Fourier transform (IDFT) must be real-valued. Therefore, zero-padding and a Hermitian conjugate operation are required: the frequency-domain input 𝑋(𝑘) must fulfill Hermitian symmetry, and the IDFT size must be at least twice the subcarrier number 𝑁. 𝑥(𝑛) is generated by Eq. (3) so that IM/DD does not lose information; Eq. (4) defines the vector 𝐶 used in Eq. (3):

𝑥(𝑛) = (1∕√(2𝑁)) ∑𝑘=0…2𝑁−1 𝑅{𝐶(𝑘) exp(𝑗2𝜋𝑛𝑘∕(2𝑁))}, (3)

𝐶 = [𝐗ᵀ, (𝑍(𝐗))ᵀ]ᵀ. (4)

Visible light communication wireless links can be divided into two categories: the line-of-sight (LoS) channel and the non-line-of-sight (NLoS) channel. Indoor communication often uses the LoS channel, whose path loss can be calculated from the link parameters. The visible light source is modeled as a Lambertian emitter. The channel impulse response of the visible light signal at the receiving end can be expressed as Eq. (5):

ℎ𝐿𝑜𝑆(0) = ((𝑚+1)𝐴𝑟∕(2𝜋𝑑²)) cosᵐ(𝜙) cos(𝜓) 𝑔(𝜓) for 0 < 𝜓 ⩽ 𝜓𝑐, and ℎ𝐿𝑜𝑆(0) = 0 for 𝜓 > 𝜓𝑐, (5)

where 𝑚 is the Lambertian radiation order, 𝐴𝑟 is the physical area of the detector in the PD, 𝑑 is the distance between the transmitter and the receiver, 𝜙 is the angle of irradiance, 𝜓 is the angle of incidence, and 𝑔(𝜓) is the gain of the optical concentrator. The noise is assumed to be additive white Gaussian noise (AWGN).

3. Channel estimation based deep learning method

3.1. Least squares and minimum mean square error estimation

The conventional channel estimation scheme based on the training sequence (TS) technique inserts sub-carriers as pilots that are known to both sides. The receiver estimates the channel from the difference between the known pilot signal and the corresponding received signal, so the instantaneous channel response can be calculated. The more pilots are added, the more accurate the estimate; however, the inserted pilots occupy band resources and may also suffer from noise, which degrades the recovery. The optimality criterion of the LS method is to minimize the squared error between the data block and the target data block, yielding an optimal estimator of the unknown parameters. The channel estimated by LS is written as follows:

𝐇̂𝐿𝑆 = [𝐘(𝑝₁)∕𝐗(𝑝₁), …, 𝐘(𝑝𝑛)∕𝐗(𝑝𝑛), …, 𝐘(𝑝𝑁)∕𝐗(𝑝𝑁)]ᵀ, (6)

where 𝑝𝑛 is the 𝑛th pilot index of the OFDM symbol and 𝑋(𝑝𝑛), 𝑛 = {1, 2, …, 𝑁}, is a pilot signal known to both the transmitter and the receiver for channel estimation.
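To make the signal model concrete, the chain of Eqs. (1)–(6) can be sketched in a few lines of NumPy. The channel taps, pilot indices, and Lambertian parameter values below are illustrative assumptions, not the paper's experimental settings:

```python
import numpy as np

rng = np.random.default_rng(0)

# Lambertian LoS gain, Eq. (5); parameter values passed in are illustrative.
def h_los(m, A_r, d, phi, psi, psi_c, g=1.0):
    if 0 < psi <= psi_c:
        return (m + 1) * A_r / (2 * np.pi * d**2) * np.cos(phi)**m * np.cos(psi) * g
    return 0.0  # receiver outside the field of view psi_c

# Hermitian-symmetric frequency-domain vector C, Eqs. (3)-(4):
# zeros at DC and Nyquist, data on bins 1..N, conjugate mirror on the rest.
N = 7
X = (rng.choice([-1, 1], N) + 1j * rng.choice([-1, 1], N)) / np.sqrt(2)  # QPSK
C = np.concatenate([[0], X, [0], np.conj(X[::-1])])   # length 2(N + 1)
x = np.fft.ifft(C)                                    # real-valued by symmetry
x_tx = x.real - x.real.min()                          # DC bias -> non-negative drive

# Sample-spaced channel, Eqs. (1)-(2), noiseless for clarity:
h = np.array([1.0, 0.4, 0.1])                         # assumed impulse response
H = np.fft.fft(h, len(C))
y = np.fft.ifft(np.fft.fft(x_tx) * H).real            # circular convolution x (*) h
Y = np.fft.fft(y)

# LS estimate at the pilot sub-carriers, Eq. (6):
pilots = np.array([1, 3, 5, 7])                       # hypothetical pilot indices
H_ls = Y[pilots] / C[pilots]
```

In this noiseless sketch the LS estimate equals the true frequency response at the pilot bins; once the AWGN term 𝑤(𝑛) of Eq. (1) is included, it becomes the noisy estimate that the MMSE and DNN methods aim to improve. Note that the DC bias only perturbs the 𝑘 = 0 bin, so the pilot bins are unaffected.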
The purpose of the MMSE estimator is to find the unknown parameters by minimizing the mean square error (MSE). The MMSE channel estimate is generated by Eq. (7):

𝐇̂𝑀𝑀𝑆𝐸 = 𝑅𝐻𝐻 (𝑅𝐻𝐻 + (𝜎𝑛²∕𝜎𝑥²)𝐼)⁻¹ 𝐇̂𝐿𝑆, (7)

where 𝑅𝐻𝐻 is the covariance matrix of the channel coefficients in the frequency domain and 𝜎𝑛² is the noise variance.

These two pilot-based channel estimation algorithms are adopted as traditional schemes for comparison with the deep learning method.

3.2. DNN method

"Deep" refers to the number of layers in the neural network: shallow neural networks have a single hidden layer, while deep neural networks have more than one. The basis of deep learning is the distributed representation used in machine learning. It assumes that observations are generated by interactions of different factors and that this interaction process can be divided into multiple levels, representing a multi-layered abstraction of the observed values. Simple features (such as two pixels) can be superimposed layer by layer to form more complex features (such as a straight line).

Usually, deep learning comprises two stages: training the model and deploying the trained model. In this paper, training is offline and deployment is online. In offline training, the model is trained with samples and labels; here the samples are OFDM signals and the labels are the corresponding received OFDM signals after passing through the optical channel. The channel influence between transmitted and received signals can thus be represented by the model. In the online stage, the DNN model trained offline takes unknown received signals as input and recovers them to approach the transmitted signals using the trained knowledge.

DNNs are popular for non-convex and nonlinear problems. The structure of a typical DNN model is shown in Fig. 2. Generally speaking, the layers of a DNN can be divided into three types: the input layer, the hidden layers, and the output layer. As shown in Fig. 2, the first layer is the input layer, the middle layers are all hidden, and the last layer is the output layer. The neuron, also called a node, is the fundamental unit of the neural network. The middle layers are fully connected, that is, any node of the 𝑖th layer is connected to every node of the (𝑖 + 1)th layer.

Forward propagation performs a series of linear operations and activation operations through the weight coefficient matrix 𝐰, the bias vector 𝐛, and the input value vector, starting from the input layer, proceeding layer by layer, and continuing to the output layer to obtain the output result. The forward propagation calculation is shown below as two steps:

𝑧𝑖(𝑙) = 𝐰𝑖(𝑙) 𝐲(𝑙−1) + 𝑏𝑖(𝑙), (8)

𝑦𝑖(𝑙) = 𝑓(𝑧𝑖(𝑙)), (9)

where 𝑙 denotes the 𝑙th layer, 𝑖 denotes the 𝑖th node of a layer, and 𝑓(·) is the activation function.

Backpropagation first requires a loss function to measure the difference between the output calculated from a training sample by the forward propagation algorithm and the actual training sample output. We use the L2 loss function:

𝐿2 = (1∕𝑁) ∑𝑘 (𝑍̂(𝑘) − 𝑍(𝑘))², (10)

where 𝑍̂(𝑘) is the prediction and 𝑍(𝑘) is the supervision, which is the transmitted symbols in this experiment. For each sample, the following objective is expected to be minimized:

𝐽(𝛩) = ½‖𝑦𝑖(𝑙) − 𝑎‖₂² = ½‖𝑓(𝐰𝑖(𝑙) 𝐲(𝑙−1) + 𝑏𝑖(𝑙)) − 𝑎‖₂². (11)

The neurons in each layer of the neural network produce predictions. Therefore, the gradient descent method of the traditional regression problem cannot be used directly to minimize the loss function; the prediction error needs to be considered and optimized layer by layer. In a multi-layer neural network, the backpropagation algorithm is used to optimize the prediction. First, the prediction error of each layer is defined as the vector 𝛿(𝑙), expressed as Eq. (12):

𝛿(𝑙) = 𝐲(𝑙) − 𝑎 for 𝑙 = 𝐿, and 𝛿(𝑙) = (𝐰(𝑙+1))ᵀ 𝛿(𝑙+1) ⊙ 𝑓′(𝐳(𝑙)) for 𝑙 = 2, 3, …, 𝐿 − 1. (12)

The activation function must be non-linear; otherwise, the output of each layer is a linear function of the previous layer. When a nonlinear function is introduced as the activation function, the output of the neural network is no longer a linear combination of the inputs and can approximate an arbitrary function.
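The forward pass of Eqs. (8)–(9) and the error recursion of Eq. (12) can be sketched in NumPy as follows. The toy layer sizes, learning rate, and training sample are illustrative assumptions, not the paper's configuration:

```python
import numpy as np

rng = np.random.default_rng(1)

def relu(z):    return np.maximum(0.0, z)
def sigmoid(z): return 1.0 / (1.0 + np.exp(-z))

# Toy layer sizes; the paper's CE model is larger (256-500-500-120-16).
sizes = [4, 8, 3]
W = [rng.normal(0.0, 0.5, (sizes[i + 1], sizes[i])) for i in range(2)]
b = [np.zeros(sizes[i + 1]) for i in range(2)]

def forward(x):
    """Eqs. (8)-(9): z^(l) = W^(l) y^(l-1) + b^(l), y^(l) = f(z^(l))."""
    ys, zs = [x], []
    for l in range(2):
        z = W[l] @ ys[-1] + b[l]
        zs.append(z)
        ys.append(sigmoid(z) if l == 1 else relu(z))  # ReLU hidden, sigmoid output
    return ys, zs

def train_step(x, a, lr=0.1):
    """One gradient step using the layer-wise error recursion of Eq. (12)."""
    ys, zs = forward(x)
    delta = ys[-1] - a                         # output-layer error, Eq. (12), l = L
    for l in reversed(range(2)):
        gW, gb = np.outer(delta, ys[l]), delta
        if l > 0:                              # delta^(l) = (W^(l+1))^T delta^(l+1) ⊙ f'(z^(l))
            delta = (W[l].T @ delta) * (zs[l - 1] > 0)
        W[l] -= lr * gW
        b[l] -= lr * gb
    return 0.5 * np.sum((ys[-1] - a) ** 2)     # loss of Eqs. (10)-(11)

x, a = rng.random(4), np.array([0.0, 1.0, 0.0])
losses = [train_step(x, a) for _ in range(200)]   # the loss decreases over iterations
```

Note that the error is propagated with the pre-update weights of each layer before that layer's parameters are changed, matching the layer-by-layer structure described above.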
The activation functions used are the ReLU function and the sigmoid function, defined as below:

𝜎𝑅𝑒𝐿𝑈(𝑥) = max(0, 𝑥), (13)

𝜎𝑆𝑖𝑔𝑚𝑜𝑖𝑑(𝑥) = 1∕(1 + 𝑒⁻ˣ). (14)

ReLU is a commonly used activation function in deep neural networks and makes training faster. Sigmoid is used in the last layer to map a continuous real-valued input to an output between 0 and 1.

Algorithm 1 DNN Training Algorithm
Input: The total number of layers 𝐿; the number of neurons in each hidden layer and the output layer; the activation function; the loss function; the maximum iteration number 𝑀𝐴𝑋 and the threshold 𝜀 to stop iteration; 𝑚 samples and labels {(𝑠1, 𝑟1), (𝑠2, 𝑟2), …, (𝑠𝑚, 𝑟𝑚)}
Output: The linear relationship coefficient matrix 𝐰 and the bias vector 𝐛 of each hidden layer and the output layer
1: Initialize 𝐰 and 𝐛 of each hidden layer and the output layer to random values;
2: for epoch = 1 to 𝑀𝐴𝑋 do
3: for i = 1 to 𝑚 do
4: Initialize the DNN input layer, 𝐲¹ = 𝐱;
5: for l = 2 to 𝐿 do
6: do forward propagation, calculate 𝐲(𝑙);
7: end for
8: calculate the output-layer error 𝛿(𝐿) through the loss function;
9: for l = 𝐿 − 1 down to 2 do
10: do back propagation, calculate 𝛿(𝑙);
11: end for
12: end for
13: for l = 2 to 𝐿 do
14: update 𝐰 and 𝐛 of the 𝑙th layer;
15: if all changes of 𝐰 and 𝐛 are less than the stop-iteration threshold 𝜀 then
16: jump out of the iteration loop;
17: end if
18: end for
19: end for

3.3. DNN model training

A neuron receives input from external sources or other nodes (usually the previous layer) and calculates its output through an activation function, which contributes to delinearization. Each input has a corresponding weight, which represents the relative importance of that input to the node; the bias can be understood as a special additional input, analogous to additive noise.

In this paper, the CE model includes five layers, three of which are hidden layers. The numbers of neurons in the layers are 256, 500, 500, 120, and 16. The numbers of neurons in the input and output layers correspond to the lengths of the input and output data vectors, respectively; the input layer has 𝑛 neurons, where 𝑛 is the number of sub-carriers of the OFDM signal. A random binary number 𝑏𝑖 is generated as the transmitted data and modulated by m-mod into a complex signal, which serves as the training data; the label is the corresponding constellation point generated by 𝑏𝑖. The model is trained to minimize the difference between its output and the label, where the difference is expressed by the 𝐿2 loss. In the middle layers we use the ReLU function as the activation function, and the output layer uses the sigmoid function to map the output into the interval [0, 1].

To avoid over-fitting, dropout is introduced into the training: when training a batch of data, some hidden-layer neurons are randomly removed, the reduced network is used to fit the training data, and one round of parameter updates is performed. Before the next batch is trained, the DNN model is restored to the original fully connected model, and then another random subset of hidden-layer neurons is removed and the parameters are updated again.

To better evaluate the generalization ability of the model, we use cross-validation (the leave-one-out cross-validation, LOOCV, method). In its simplest form, the data are divided into two parts, one for training and one for verification. By training the model on the training set and observing the MSE obtained on the test set for different models and parameters, the appropriate model and parameters are selected.

4. Experiment results and discussion

We implemented a series of experiments to evaluate the performance of the deep learning technique and to analyze the results. As Fig. 3 shows, the OFDM signal is generated in MATLAB and transmitted by an arbitrary waveform generator (AWG, Tektronix AWG5012). The signal bandwidth is set to 100 Mb/s, the bias voltage of the LED (OSRAM LUW W5SM) is set to 6 V, and the drive voltage of the signal is set to 2 V. At the receiver, a commercial avalanche photodiode (Hamamatsu C12702-12) receives the optical signals and performs the photoelectric conversion. Then the signal, captured by an oscilloscope (LeCroy SDA760Zi), is demodulated in MATLAB. The different channel estimation methods are compared in terms of BER under different conditions expressed by signal-to-noise ratios (SNR).

4.1. Impact of pilot number

Fig. 4 shows that the DNN model provides the best performance when only 8 pilots are used to complete channel estimation.
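Before turning to the results, the offline training loop of Algorithm 1 combined with the batch-wise dropout scheme of Section 3.3 can be sketched as follows. This is a minimal NumPy illustration: the shrunken layer sizes, learning rate, dropout rate, and synthetic data are all assumptions, not the paper's configuration:

```python
import numpy as np

rng = np.random.default_rng(2)

# Shrunken stand-in for the paper's 256-500-500-120-16 CE network.
sizes = [16, 32, 32, 8]
W = [rng.normal(0.0, 0.3, (sizes[i + 1], sizes[i])) for i in range(3)]
b = [np.zeros(sizes[i + 1]) for i in range(3)]

def train_step(xb, ab, lr=0.05, p_drop=0.2):
    """One batch update in the spirit of Algorithm 1, with inverted dropout:
    a fresh random mask removes hidden neurons for each batch."""
    ys, zs, masks = [xb], [], []
    for l in range(3):
        z = ys[-1] @ W[l].T + b[l]
        zs.append(z)
        if l < 2:                                    # hidden layers: ReLU + dropout
            m = (rng.random(z.shape) > p_drop) / (1.0 - p_drop)
            masks.append(m)
            ys.append(np.maximum(0.0, z) * m)
        else:                                        # output layer: sigmoid
            ys.append(1.0 / (1.0 + np.exp(-z)))
    delta = ys[-1] - ab                              # output-layer error
    for l in reversed(range(3)):
        gW = delta.T @ ys[l] / len(xb)               # batch-averaged gradients
        gb = delta.mean(axis=0)
        if l > 0:                                    # back-propagate through ReLU + mask
            delta = (delta @ W[l]) * (zs[l - 1] > 0) * masks[l - 1]
        W[l] -= lr * gW
        b[l] -= lr * gb
    return np.mean((ys[-1] - ab) ** 2)               # L2 training loss

X = rng.random((64, 16))                             # synthetic training samples
A = (X[:, :8] > 0.5).astype(float)                   # synthetic binary labels
losses = [train_step(X, A) for _ in range(300)]
```

Rescaling the surviving activations by 1/(1 − p_drop) (inverted dropout) means the fully connected network can be used at inference time without any extra scaling, which matches the "restore, then re-drop" procedure described above.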
Fig. 4. BER performance for LS, MMSE and DNN model with 8 pilots.
Fig. 5. BER performance for LS, MMSE and DNN model with 64 pilots.
Fig. 6. BER performance for LS, MMSE and DNN model without CP.

With such a lack of pilots, the LS and MMSE estimators both perform worse than with 64 pilots because less channel information is available, whereas the DNN model performs excellently throughout. As indicated before, the DNN model has already learned the characteristics of the channel from the training data, so its performance does not depend on the pilots.

From Fig. 5, when 64 pilots are added, the LS estimator provides the worst performance, which degrades as the noise increases. With enough pilots, the MMSE estimator has the best performance since it has accurate channel statistical properties, and the DNN model is comparable to the MMSE estimator. Turning to complexity, the DNN model can be trained offline, so the online optimization cost is eliminated; in contrast, the MMSE estimator requires time and resources to compute at run time.

4.2. Impact of CP

Fig. 6 illustrates the results for a DCO-OFDM system without CP, which further reduces spectrum occupancy. The CP transforms linear convolution into circular convolution and maintains frequency orthogonality; it guarantees that the delayed copy of the OFDM symbol contains an integer number of waveform periods within the FFT window. With the CP removed, the performance of the LS and MMSE estimators declines as the noise increases. The MMSE estimator still performs better than the LS estimator, but much worse than the DNN model. Moreover, their accuracy tends to saturate as the SNR rises, while the DNN model keeps improving as the SNR ascends. Compared with the performance with CP, neither MMSE nor LS works as effectively, but the trained DNN still captures the characteristics of the channel well.

5. Conclusion

CRediT authorship contribution statement

Xi Wu: Data curation, Writing - original draft. Zhitong Huang: Methodology, Writing - review & editing, Project administration. Yuefeng Ji: Conceptualization, Writing - review & editing.

Acknowledgments

This work was supported in part by the National Key Research and Development Program of China under Grant 2017YFB0403605, in part by the National 973 Program of China under Grant 2013CB329205, and in part by the National Natural Science Foundation of China under Grant 61801165.