
ROBUST VARIABLE-REGULARIZED RLS ALGORITHMS

Camelia Elisei-Iliescu†, Cristian Stanciu†, Constantin Paleologu†, Jacob Benesty‡, Cristian Anghel†, and Silviu Ciochină†


† University Politehnica of Bucharest, Romania, e-mail: {cristian, pale, canghel, silviu}@comm.pub.ro
‡ INRS-EMT, University of Quebec, Montreal, Canada, e-mail: benesty@emt.inrs.ca

ABSTRACT

The regularization parameter is required in most (if not all) adaptive algorithms, and its role becomes very critical in the presence of additive noise. In this paper, we focus on the regularized recursive least-squares (RLS) algorithm and present a method to find its regularization parameter, which is related to the signal-to-noise ratio (SNR). Also, using a proper estimation of the SNR, we further propose a variable-regularized RLS (VR-RLS) algorithm. In addition, a low-complexity version of the VR-RLS algorithm is developed, based on the dichotomous coordinate descent (DCD) method. Due to their nature, the proposed algorithms have good robustness features against additive noise, which make them behave well in all noisy conditions. Simulations performed in the context of acoustic echo cancellation support these findings.

Index Terms— Echo cancellation, adaptive filters, recursive least-squares (RLS), variable-regularized RLS, dichotomous coordinate descent (DCD).

1. INTRODUCTION

The performance of the recursive least-squares (RLS) algorithm [1], [2] is mainly controlled by two important parameters, i.e., the forgetting factor and the regularization term. The first one controls the "memory" of the algorithm, and its value leads to a compromise between the performance criteria, i.e., low misadjustment versus fast tracking [3]. In order to properly address this compromise, several variable forgetting factor RLS (VFF-RLS) algorithms were developed, e.g., see [4]–[6] and the references therein.

As compared to the forgetting factor, the regularization parameter has been less addressed in the literature. Apparently, it is required in the matrix inversion when this matrix is ill conditioned, especially in the initialization stage of the algorithm. However, its role is of great importance in practice, since regularization is a must in all ill-posed problems (like in adaptive filtering), especially in the presence of additive noise [7]–[9].

In this paper, we focus on the regularized RLS algorithm [2]. First, following the development from [8], a method to select an optimal regularization parameter is presented, so that the algorithm behaves well in all noisy conditions. Second, since the value of this parameter is related to the signal-to-noise ratio (SNR), we propose a simple and practical way to estimate the SNR, which leads to a variable-regularized RLS (VR-RLS) algorithm. Third, a low-complexity version of the proposed VR-RLS algorithm is developed, based on the dichotomous coordinate descent (DCD) method [10], [11]. Simulations performed in the context of acoustic echo cancellation [12], [13] indicate the good performance of the proposed algorithms, especially in terms of their robustness against the additive noise (i.e., the near-end signal).

This work was supported by UEFISCDI Romania under Grant PN-II-RU-TE-2014-4-1880 and by UPB Romania under Grants UPB-GEX 100/26.09.2016 code 220 & 97/26.09.2016 code 533.

2. REGULARIZED RLS ALGORITHM

Let us consider a system identification problem, where the desired signal at the discrete-time index n is obtained as

d(n) = \mathbf{h}^T \mathbf{x}(n) + v(n) = y(n) + v(n),   (1)

where \mathbf{h} = [h_0 \; h_1 \; \cdots \; h_{L-1}]^T is the impulse response (of length L) of the system that we need to identify, superscript T denotes the transpose of a vector or a matrix,

\mathbf{x}(n) = [x(n) \; x(n-1) \; \cdots \; x(n-L+1)]^T   (2)

is a vector containing the most recent L samples of the zero-mean input signal x(n), and v(n) is a zero-mean additive noise signal, which is independent of x(n). From (1), we can define the SNR as

\mathrm{SNR} = \sigma_y^2 / \sigma_v^2,   (3)

where \sigma_y^2 = E[y^2(n)] and \sigma_v^2 = E[v^2(n)] are the variances of y(n) and v(n), respectively, with E[\cdot] denoting mathematical expectation.

The objective is to estimate or identify \mathbf{h} with an adaptive filter \hat{\mathbf{h}}(n) = [\hat{h}_0(n) \; \hat{h}_1(n) \; \cdots \; \hat{h}_{L-1}(n)]^T. Let us consider the regularized least-squares criterion:

J(n) = \sum_{i=0}^{n} \lambda^{n-i} \left[ d(i) - \hat{\mathbf{h}}^T(n) \mathbf{x}(i) \right]^2 + \delta \left\| \hat{\mathbf{h}}(n) \right\|_2^2,   (4)

where \lambda (0 < \lambda < 1) is the exponential forgetting factor, \delta is the regularization parameter, and \|\cdot\|_2 is the \ell_2 norm. From (4), the update of the regularized RLS algorithm [2] results in

\hat{\mathbf{h}}(n) = \hat{\mathbf{h}}(n-1) + \left[ \hat{\mathbf{R}}_x(n) + \delta \mathbf{I}_L \right]^{-1} \mathbf{x}(n) e(n),   (5)

where

\hat{\mathbf{R}}_x(n) = \sum_{i=0}^{n} \lambda^{n-i} \mathbf{x}(i) \mathbf{x}^T(i) = \lambda \hat{\mathbf{R}}_x(n-1) + \mathbf{x}(n) \mathbf{x}^T(n)   (6)

is an estimate of the correlation matrix of \mathbf{x}(n) at time n, \mathbf{I}_L is the identity matrix of size L \times L, and

e(n) = d(n) - \hat{\mathbf{h}}^T(n-1) \mathbf{x}(n) = d(n) - \hat{y}(n)   (7)

is the a priori error signal. We will assume that the matrix \hat{\mathbf{R}}_x(n) has full rank, although it can be very ill conditioned. As a result, if there is no noise, regularization is not really required; however, in terms of robustness, the more the noise, the larger the value of \delta should be.

Summarizing, the regularized RLS algorithm is defined by the relations (5)–(7). In the next section, we present one reasonable way to find the regularization parameter \delta.
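To make the update concrete, the following is a minimal NumPy sketch of (5)–(7) (an illustration added here, not part of the original paper; the function and variable names are ours). The direct solve of the regularized system costs O(L^3) per sample, which is exactly the burden addressed by the low-complexity version in Section 5.

```python
import numpy as np

def regularized_rls(x_seq, d_seq, L, lam=0.9998, delta=1.0):
    """Minimal sketch of the regularized RLS update (5)-(7).

    x_seq: input (far-end) samples; d_seq: desired (microphone) samples.
    The regularized system is solved directly at each step, which costs
    O(L^3) per sample; Section 5 reduces this cost via the DCD method.
    """
    h_hat = np.zeros(L)          # adaptive filter \hat{h}(n)
    R = np.zeros((L, L))         # correlation-matrix estimate \hat{R}_x(n)
    x_vec = np.zeros(L)          # regressor x(n), newest sample first
    for n in range(len(d_seq)):
        x_vec = np.roll(x_vec, 1)
        x_vec[0] = x_seq[n]
        R = lam * R + np.outer(x_vec, x_vec)          # update (6)
        e = d_seq[n] - h_hat @ x_vec                  # a priori error (7)
        h_hat = h_hat + np.linalg.solve(R + delta * np.eye(L), x_vec * e)  # (5)
    return h_hat
```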



3. OPTIMAL REGULARIZATION PARAMETER

It can be noticed that the update equation of the regularized RLS can be rewritten as [8]

\hat{\mathbf{h}}(n) = \mathbf{P}(n) \hat{\mathbf{h}}(n-1) + \tilde{\mathbf{h}}(n),   (8)

where

\mathbf{P}(n) = \mathbf{I}_L - \left[ \hat{\mathbf{R}}_x(n) + \delta \mathbf{I}_L \right]^{-1} \mathbf{x}(n) \mathbf{x}^T(n)   (9)

and

\tilde{\mathbf{h}}(n) = \left[ \hat{\mathbf{R}}_x(n) + \delta \mathbf{I}_L \right]^{-1} \mathbf{x}(n) d(n)   (10)

is the correctiveness component of the algorithm, which depends on the new observation d(n). In this context, we can notice that \mathbf{P}(n) does not depend on the noise signal, and \mathbf{P}(n) \hat{\mathbf{h}}(n-1) in (8) can be seen as a good initialization of the adaptive filter. In fact, (10) is the solution of the noisy linear system of L equations:

\left[ \hat{\mathbf{R}}_x(n) + \delta \mathbf{I}_L \right] \tilde{\mathbf{h}}(n) = \mathbf{x}(n) d(n).   (11)

Let us define

\tilde{e}(n) = d(n) - \tilde{\mathbf{h}}^T(n) \mathbf{x}(n),   (12)

the error signal between the desired signal and the estimated signal obtained from the filter optimized in (10). Consequently, we can find \delta in such a way that the expected value of \tilde{e}^2(n) is equal to the variance of the noise, i.e.,

E[\tilde{e}^2(n)] = \sigma_v^2.   (13)

This is reasonable if we want to attenuate the effects of the noise in the estimator \tilde{\mathbf{h}}(n).

For the sake of simplicity, let us assume that x(n) is stationary and white. Apparently, this assumption is quite restrictive, even if it was widely used in many developments in the context of adaptive filtering [1], [2]. However, the resulting VR-RLS algorithms will still use the full matrix \hat{\mathbf{R}}_x(n) and, consequently, they will inherit the good performance of the RLS family in the case of correlated inputs. In this case, and for n large enough (also considering that the forgetting factor \lambda is on the order of 1 - 1/L), we have

\hat{\mathbf{R}}_x(n) + \delta \mathbf{I}_L \approx \left( \frac{\sigma_x^2}{1 - \lambda} + \delta \right) \mathbf{I}_L \approx \left( L \sigma_x^2 + \delta \right) \mathbf{I}_L   (14)

and \mathbf{x}^T(n) \mathbf{x}(n) \approx L \sigma_x^2, where \sigma_x^2 = E[x^2(n)] is the variance of the input signal. Developing (13) and based on the previous approximations, we obtain the quadratic equation:

\delta^2 - \frac{2 L \sigma_x^2}{\mathrm{SNR}} \delta - \frac{L^2 \sigma_x^4}{\mathrm{SNR}} = 0,   (15)

with the obvious (positive) solution:

\delta = \frac{L \left( 1 + \sqrt{1 + \mathrm{SNR}} \right)}{\mathrm{SNR}} \sigma_x^2 = \beta \sigma_x^2,   (16)

where

\beta = \frac{L \left( 1 + \sqrt{1 + \mathrm{SNR}} \right)}{\mathrm{SNR}}   (17)

is the normalized regularization parameter of the RLS algorithm.

As we can notice from (16), the regularization parameter \delta depends on three elements, i.e., the length of the adaptive filter, the variance of the input signal, and the SNR. In most applications, the first two elements (L and \sigma_x^2) are known, while the SNR can be estimated. Using a proper evaluation of the SNR, the algorithm should possess good robustness against the additive noise.
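As a quick numerical check (our illustration, not from the paper), (16)–(17) can be evaluated directly; for L = 512, this reproduces, up to rounding, the constant parameters β20, β10, and β0 used later in Section 6:

```python
import math

def optimal_beta(L, snr):
    """Normalized regularization parameter from (17); snr on a linear scale."""
    return L * (1.0 + math.sqrt(1.0 + snr)) / snr

def optimal_delta(L, snr, sigma_x2):
    """Optimal regularization parameter from (16)."""
    return optimal_beta(L, snr) * sigma_x2

# For L = 512, this reproduces (up to rounding) the constants quoted in
# Section 6: beta_20 = 56.57, beta_10 = 221.01, beta_0 = 1236.07.
for snr_db in (20, 10, 0):
    print(snr_db, "dB ->", optimal_beta(512, 10.0 ** (snr_db / 10.0)))
```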
4. VARIABLE-REGULARIZED RLS ALGORITHM

Let us assume that the adaptive filter has converged to a certain degree, so that we can use the approximation \hat{y}(n) \approx y(n). Hence,

\sigma_{\hat{y}}^2 \approx \sigma_y^2,   (18)

where \sigma_{\hat{y}}^2 = E[\hat{y}^2(n)]. Since the output of the unknown system and the noise can be considered uncorrelated, (1) can be expressed in terms of powers as

\sigma_d^2 = \sigma_y^2 + \sigma_v^2,   (19)

where \sigma_d^2 = E[d^2(n)]. Using (18) in (19), we obtain

\sigma_v^2 \approx \sigma_d^2 - \sigma_{\hat{y}}^2.   (20)

The power estimates can be evaluated in a recursive manner as

\hat{\sigma}_d^2(n) = \gamma \hat{\sigma}_d^2(n-1) + (1 - \gamma) d^2(n),   (21)

\hat{\sigma}_{\hat{y}}^2(n) = \gamma \hat{\sigma}_{\hat{y}}^2(n-1) + (1 - \gamma) \hat{y}^2(n),   (22)

where \gamma = 1 - 1/(KL), with K \geq 1, and \hat{\sigma}_d^2(0) = \hat{\sigma}_{\hat{y}}^2(0) = 0. Therefore, based on (18), (20), and (21), an estimate of the SNR is obtained as

\widehat{\mathrm{SNR}}(n) = \frac{\hat{\sigma}_{\hat{y}}^2(n)}{\left| \hat{\sigma}_d^2(n) - \hat{\sigma}_{\hat{y}}^2(n) \right|},   (23)

so that the variable regularization parameter results in

\delta(n) = \frac{L \left[ 1 + \sqrt{1 + \widehat{\mathrm{SNR}}(n)} \right]}{\widehat{\mathrm{SNR}}(n)} \sigma_x^2 = \beta(n) \sigma_x^2,   (24)

where

\beta(n) = \frac{L \left[ 1 + \sqrt{1 + \widehat{\mathrm{SNR}}(n)} \right]}{\widehat{\mathrm{SNR}}(n)}   (25)

is the variable normalized regularization parameter. Consequently, based on (24), we obtain a variable-regularized RLS (VR-RLS) algorithm, with the update:

\hat{\mathbf{h}}(n) = \hat{\mathbf{h}}(n-1) + \left[ \hat{\mathbf{R}}_x(n) + \delta(n) \mathbf{I}_L \right]^{-1} \mathbf{x}(n) e(n),   (26)

where \hat{\mathbf{R}}_x(n) is recursively evaluated according to (6) and \delta(n) is computed based on (21)–(24).

Finally, some practical issues should be outlined. The absolute value in (23) guards against minor deviations of the power estimates from the true values, which could make the denominator negative; a practical alternative is to floor the term \hat{\sigma}_d^2(n) - \hat{\sigma}_{\hat{y}}^2(n) to a small positive value. The resulting algorithm is non-parametric, since all the quantities in (23) are available. Also, good robustness against additive noise variations is expected. The main drawback is due to the approximation in (18); this assumption will be biased in the initial convergence phase or when there is a change of the unknown system. Concerning the initial convergence, we can use a constant regularization parameter \delta in the first steps of the algorithm (e.g., in the first L iterations).


5. LOW-COMPLEXITY IMPLEMENTATION

The VR-RLS algorithm developed in the previous section faces two main challenges in terms of computational complexity. The first one is the update of the matrix \hat{\mathbf{R}}_x(n) from (6), while the second issue is related to the evaluation of the last term on the right-hand side of (26), which contains both the matrix inversion and the product with the input vector.

The complexity of (6) can be greatly reduced by taking into account that the vector \mathbf{x}(n) has the time-shift property [see (2)] and the matrix \hat{\mathbf{R}}_x(n) is symmetric. Thus, only the first column of this matrix has to be computed, i.e.,

\hat{\mathbf{R}}_x^{(1)}(n) = \lambda \hat{\mathbf{R}}_x^{(1)}(n-1) + \mathbf{x}(n) x(n),   (27)

since the lower-right (L-1) \times (L-1) block of \hat{\mathbf{R}}_x(n) can be obtained by copying the (L-1) \times (L-1) upper-left block of the matrix \hat{\mathbf{R}}_x(n-1). A sketch of this update is given below.
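The following is our illustration of this reduced update (names are ours); only the first column and row are recomputed, while the shifted block is reused:

```python
import numpy as np

def update_corr_first_column(R_prev, x_vec, lam):
    """Sketch of the reduced-complexity update of \hat{R}_x(n) via (27).

    x_vec is x(n) = [x(n), x(n-1), ..., x(n-L+1)]^T. Only the first
    column is recomputed; the lower-right (L-1)x(L-1) block is copied
    from the upper-left block of \hat{R}_x(n-1) (time-shift property),
    and symmetry gives the first row.
    """
    R = np.empty_like(R_prev)
    R[1:, 1:] = R_prev[:-1, :-1]                     # shifted-block copy
    R[:, 0] = lam * R_prev[:, 0] + x_vec * x_vec[0]  # first column, (27)
    R[0, :] = R[:, 0]                                # symmetry
    return R
```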
The evaluation of the last term on the right-hand side of (26) is more challenging. In fact, the basic problem can be interpreted in terms of solving the normal equations [2]:

\mathbf{Q}(n) \hat{\mathbf{h}}(n) = \mathbf{p}(n),   (28)

where

\mathbf{Q}(n) = \hat{\mathbf{R}}_x(n) + \delta(n) \mathbf{I}_L   (29)

and

\mathbf{p}(n) = \lambda \mathbf{p}(n-1) + \mathbf{x}(n) d(n).   (30)

As an alternative to the classical approaches [1], [2], the normal equations (28) can be recursively solved using the dichotomous coordinate descent (DCD) method [10]. The basic idea is to express the problem in terms of auxiliary normal equations with respect to increments of the filter weights [11]. In our case, we need to solve

\mathbf{Q}(n) \, \Delta \hat{\mathbf{h}}(n) = \mathbf{p}_0(n),   (31)

where \Delta \hat{\mathbf{h}}(n) is the increment of the filter weights and

\mathbf{p}_0(n) = \lambda \mathbf{r}(n-1) + \mathbf{x}(n) e(n),   (32)

with \mathbf{r}(n) representing the so-called residual vector associated with the solution [11]. Consequently, following the development from Section 4 and the steps presented in [11], the low-complexity version of the proposed VR-RLS algorithm, namely VR-RLS-DCD, is summarized in Table 1, where step 6 involves the DCD iterations.

Table 1. VR-RLS-DCD algorithm.

Initialization: \hat{\mathbf{h}}(0) = \mathbf{0}, \; \mathbf{r}(0) = \mathbf{0}, \; \hat{\mathbf{R}}_x(0) = \mathbf{0}_L
For n = 1, 2, \ldots:
  Step 1: \hat{\mathbf{R}}_x(n) = \lambda \hat{\mathbf{R}}_x(n-1) + \mathbf{x}(n) \mathbf{x}^T(n)  [using (27)]
  Step 2: compute \delta(n) based on (21)–(24)
  Step 3: \mathbf{Q}(n) = \hat{\mathbf{R}}_x(n) + \delta(n) \mathbf{I}_L
  Step 4: e(n) = d(n) - \hat{\mathbf{h}}^T(n-1) \mathbf{x}(n)
  Step 5: \mathbf{p}_0(n) = \lambda \mathbf{r}(n-1) + \mathbf{x}(n) e(n)
  Step 6: \mathbf{Q}(n) \, \Delta \hat{\mathbf{h}}(n) = \mathbf{p}_0(n) \Rightarrow \Delta \hat{\mathbf{h}}(n), \mathbf{r}(n)  (solved with DCD iterations [11])
  Step 7: \hat{\mathbf{h}}(n) = \hat{\mathbf{h}}(n-1) + \Delta \hat{\mathbf{h}}(n)
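To connect the pieces, one iteration of Table 1 might be sketched as follows (an illustration reusing the VariableRegularizer and update_corr_first_column sketches from above; dcd_solve is sketched after the DCD parameter discussion below). Forming Q(n) explicitly is for clarity only; a practical implementation folds δ(n) into the diagonal instead.

```python
import numpy as np

def vr_rls_dcd_step(h_hat, r, R, x_vec, x_n, d_n, lam, reg):
    """One iteration of Table 1 (illustrative sketch only)."""
    x_vec = np.roll(x_vec, 1)
    x_vec[0] = x_n
    R = update_corr_first_column(R, x_vec, lam)   # Step 1, using (27)
    y_hat = h_hat @ x_vec                         # \hat{y}(n)
    delta_n = reg.update(d_n, y_hat)              # Step 2, (21)-(24)
    Q = R + delta_n * np.eye(len(h_hat))          # Step 3
    e = d_n - y_hat                               # Step 4, a priori error
    p0 = lam * r + x_vec * e                      # Step 5
    dh, r = dcd_solve(Q, p0)                      # Step 6, DCD iterations
    h_hat = h_hat + dh                            # Step 7
    return h_hat, r, R, x_vec, e
```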
The DCD algorithm [10] is based on coordinate descent iterations with a power-of-two variable step size, α. It does not need multiplications or divisions (these operations are simply replaced by bit-shifts), but only additions, so that it is well suited for hardware implementation. In our case, the auxiliary normal equations from step 6 are solved by using the DCD with a leading element [11]. Due to the lack of space, we do not further detail the DCD algorithm. An insightful analysis of this algorithm can be found in [11]; detailed implementation aspects are discussed in [14].

Here, we briefly outline some of the important parameters of the DCD algorithm (using the notation from [11]), with a sketch given after this paragraph. First, the parameters H and Mb represent the maximum amplitude expected for the values of \Delta \hat{\mathbf{h}}(n) and the number of bits used for their representation, respectively. If the value of H is chosen accordingly, the values of the step size α correspond to powers of 2 and are associated with the bits comprising the binary representation of each computed value in the solution vector. In this case, any multiplication with α can be replaced by a bit-shift. Second, the parameter Nu represents the maximum number of allowed (or "successful") iterations performed for \Delta \hat{\mathbf{h}}(n) [11]; in practice, Nu ≪ L. The arithmetic complexity of the DCD algorithm is proportional to L Nu, using only additions. Consequently, the complexity associated with the matrix inversion is greatly reduced as compared to the classical method [which requires O(L^3) operations] and even to the regular RLS algorithm [1], [2] [which is based on the matrix inversion lemma and needs O(L^2) operations]. Therefore, the DCD-based algorithms are very appealing for real-world applications.

Nevertheless, the RLS-DCD algorithm proposed in [11] uses a constant regularization for \hat{\mathbf{R}}_x(0), whose influence is negligible due to the forgetting factor in the update of the matrix \hat{\mathbf{R}}_x(n). On the other hand, using a proper estimation of the regularization parameter within the algorithm (i.e., steps 2 and 3 in Table 1), the robustness against additive noise can be improved. Thus, the proposed VR-RLS-DCD algorithm possesses this robustness feature, together with the low-complexity advantage inherited from the DCD method.
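To make the roles of H, Mb, and Nu concrete, a rough floating-point sketch of the DCD iterations with a leading element is given below. It follows the structure described in [11] but is not a bit-exact transcription; in a fixed-point implementation, the multiplications by α reduce to bit-shifts.

```python
import numpy as np

def dcd_solve(Q, p0, H=1.0, Mb=16, Nu=8):
    """Floating-point sketch of DCD with a leading element (cf. [11]).

    H: maximum expected amplitude of the solution; Mb: number of bits;
    Nu: maximum number of "successful" coordinate updates.
    """
    dh = np.zeros(len(p0))
    r = p0.copy()                       # residual vector
    alpha = H / 2.0                     # power-of-two step size
    m = 1                               # current bit index
    for _ in range(Nu):
        p = int(np.argmax(np.abs(r)))   # leading element
        # halve the step size until an update would be "successful"
        while m <= Mb and abs(r[p]) <= (alpha / 2.0) * Q[p, p]:
            m += 1
            alpha /= 2.0
        if m > Mb:
            break                       # bit budget exhausted
        s = 1.0 if r[p] > 0.0 else -1.0
        dh[p] += s * alpha              # update one coordinate
        r -= s * alpha * Q[:, p]        # keep the residual consistent
    return dh, r
```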
6. SIMULATION RESULTS

Simulations are performed in the context of acoustic echo cancellation [12], [13]. The unknown system, i.e., the echo path, is a measured acoustic impulse response. It has 512 coefficients, and the same length is used for the adaptive filter (L = 512); the sampling frequency is 8 kHz. The input signal (i.e., the far-end) is a speech sequence, and the output of the echo path is corrupted by a white Gaussian noise with different SNRs, i.e., 20 dB, 10 dB, and 0 dB. Based on (17), we can determine the values of the optimal normalized regularization parameter in these cases. Using the corresponding notation, we obtain β20 = 56.57, β10 = 221.01, and β0 = 1236.07, respectively. In the simulations, we compare the regularized RLS algorithm using these constant regularization parameters with the proposed VR-RLS and VR-RLS-DCD algorithms. Also, the RLS-DCD algorithm [11] is included for comparison, using Nu = 8, Mb = 16, and H = 1 (the same parameters are used in the VR-RLS-DCD algorithm). The forgetting factor is set to λ = 1 - 1/(16L) for all the algorithms. The performance measure is the normalized misalignment (in dB), evaluated as 20 \log_{10} \left[ \| \mathbf{h} - \hat{\mathbf{h}}(n) \|_2 / \| \mathbf{h} \|_2 \right].
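For reference, the performance measure translates into a one-line computation (a trivial sketch, names ours):

```python
import numpy as np

def misalignment_db(h, h_hat):
    """Normalized misalignment in dB: 20*log10(||h - h_hat||_2 / ||h||_2)."""
    return 20.0 * np.log10(np.linalg.norm(h - h_hat) / np.linalg.norm(h))
```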


[Figure 1: misalignment (dB) versus time (seconds), panels (a) and (b).]
Fig. 1. Misalignment of the regularized RLS (using β20), RLS-DCD, VR-RLS, and VR-RLS-DCD algorithms. The input signal is speech, L = 512, and SNR = 20 dB. (a) Echo path changes at time 30 seconds. (b) Near-end speech appears between time 27 and 30 seconds (double-talk scenario).

[Figure 2: misalignment (dB) versus time (seconds), panels (a) and (b).]
Fig. 2. Misalignment of the regularized RLS (using β10), RLS-DCD, VR-RLS, and VR-RLS-DCD algorithms. The input signal is speech, L = 512, and SNR = 10 dB. (a) Echo path changes at time 30 seconds. (b) Near-end speech appears between time 27 and 30 seconds (double-talk scenario).

In the first set of experiments, the value of the SNR is set to 20 dB. In Fig. 1(a), an echo path change scenario is simulated in the middle of the experiment, by shifting the impulse response to the right by 25 samples. First, it can be noticed that the VR-RLS and VR-RLS-DCD algorithms behave very similarly and are close to the regularized RLS algorithm using the constant (optimal) parameter β20, which is associated with this value of the SNR. As expected, there is an inherent delay in the initial convergence rate and tracking reaction of the variable-regularized algorithms (as compared to the RLS-DCD algorithm), due to the approximation in (18). In Fig. 1(b), a double-talk scenario is considered; the near-end speech appears between time 27 and 30 seconds. It is clear that the VR-RLS and VR-RLS-DCD algorithms are more robust in this case, since the estimated SNR from (23) also includes the contribution of the near-end signal.

In the second set of experiments, we select a lower SNR value, i.e., 10 dB. In this case, the importance of regularization becomes more apparent. As we can see from Fig. 2(a), the VR-RLS and VR-RLS-DCD algorithms behave similarly to the regularized RLS using the constant (optimal) parameter β10, and outperform the RLS-DCD algorithm (in terms of misalignment). Also, as we can notice in Fig. 2(b), the variable-regularized algorithms are much more robust to double-talk, as compared to their counterparts.

Finally, in the last set of experiments, we consider SNR = 0 dB. As expected, according to the results in Fig. 3(a), the VR-RLS and VR-RLS-DCD algorithms now behave similarly to the regularized RLS using the constant (optimal) parameter β0, and are much better than the RLS-DCD algorithm. Besides, according to Fig. 3(b), the variable-regularized algorithms outperform by far their counterparts in terms of double-talk robustness.

[Figure 3: misalignment (dB) versus time (seconds), panels (a) and (b).]
Fig. 3. Misalignment of the regularized RLS (using β0), RLS-DCD, VR-RLS, and VR-RLS-DCD algorithms. The input signal is speech, L = 512, and SNR = 0 dB. (a) Echo path changes at time 30 seconds. (b) Near-end speech appears between time 27 and 30 seconds (double-talk scenario).
7. CONCLUSIONS

Most of the works reported in the literature (even in popular books such as [1], [2]) treated the regularization parameter of the RLS algorithm as a small positive constant used only in the initialization stage. In this paper, we have focused on the regularized RLS algorithm, first presenting a method to find an optimal regularization parameter depending on the SNR. Then, a variable-regularized solution was proposed, using a proper estimation of the SNR, thus resulting in a VR-RLS algorithm. Moreover, a low-complexity version of this algorithm was derived, based on the DCD method, namely the VR-RLS-DCD algorithm. Simulations performed in an acoustic echo cancellation scenario indicate that the variable-regularized algorithms possess good robustness against the near-end signal. In other words, the robustness of the algorithm against SNR variations (e.g., double-talk) can be controlled in terms of the regularization parameter. Without proper regularization, the misalignment of the adaptive filter may fluctuate a lot and may even never converge, especially for low SNRs. Using the proposed estimation of the SNR and, consequently, of the regularization parameter, the variable-regularized RLS algorithms behave very well at all SNR levels or variations, thus being reliable candidates for real-world applications.


8. REFERENCES
[1] S. Haykin, Adaptive Filter Theory, 4th ed. Upper Saddle River, NJ: Prentice-Hall, 2002.
[2] A. H. Sayed, Adaptive Filters. New York, NY: Wiley, 2008.
[3] S. Ciochină, C. Paleologu, J. Benesty, and A. A. Enescu, “On the influ-
ence of the forgetting factor of the RLS adaptive filter in system identi-
fication,” in Proc. IEEE ISSCS, 2009, pp. 205–208.
[4] Y. J. Chu and S. C. Chan, “A new local polynomial modeling-based
variable forgetting factor RLS algorithm and its acoustic applications,”
IEEE/ACM Trans. Audio, Speech, Language Processing, vol. 23, pp.
2059–2069, Nov. 2015.
[5] C. Paleologu, J. Benesty, and S. Ciochină, “A robust variable forget-
ting factor recursive least-squares algorithm for system identification,”
IEEE Signal Processing Lett., vol. 15, pp. 597–600, 2008.
[6] S.-H. Leung and C. F. So, “Gradient-based variable forgetting factor
RLS algorithm in time-varying environments,” IEEE Trans. Signal Pro-
cessing, vol. 53, pp. 3141–3150, Aug. 2005.
[7] P. C. Hansen, Rank-Deficient and Discrete Ill-Posed Problems: Numer-
ical Aspects of Linear Inversion. Philadelphia, PA: SIAM, 1998.
[8] J. Benesty, C. Paleologu, and S. Ciochină, “Regularization of the RLS
algorithm,” IEICE Trans. Fundamentals, vol. E94-A, pp. 1628–1629,
Aug. 2011.
[9] Y. V. Zakharov and V. H. Nascimento, “Sparse sliding-window RLS
adaptive filter with dynamic regularization,” in Proc. EUSIPCO, 2016,
pp. 145–149.
[10] Y. V. Zakharov and T. C. Tozer, “Multiplication-free iterative algo-
rithm for LS problem,” IEE Electronics Lett., vol. 40, pp. 567–569,
Apr. 2004.
[11] Y. V. Zakharov, G. P. White, and J. Liu, “Low-complexity RLS algo-
rithms using dichotomous coordinate descent iterations,” IEEE Trans.
Signal Processing, vol. 56, pp. 3150–3161, July 2008.
[12] J. Benesty, T. Gaensler, D. R. Morgan, M. M. Sondhi, and S. L. Gay,
Advances in Network and Acoustic Echo Cancellation. Berlin, Ger-
many: Springer-Verlag, 2001.
[13] C. Paleologu, J. Benesty, and S. Ciochină, Sparse Adaptive Filters for
Echo Cancellation. Morgan & Claypool Publishers, 2010.
[14] J. Liu, Y. V. Zakharov, and B. Weaver, “Architecture and FPGA design
of dichotomous coordinate descent algorithms,” IEEE Trans. Circuits
and Systems I: Regular Papers, vol. 56, pp. 2425–2438, Nov. 2009.


