PSO-LSSVM Model For Hy-Line Brown Laying-Type Hens Egg-Laying Rate Prediction Based On PCA
ABSTRACT Statistical models that predict laying patterns are important for economically optimal
breeding strategies in egg production. The aim of this study was to establish an optimal model for
describing egg production using room temperature, feed consumption, layer weight, and age during the
production period. Two methods were combined: PSO-LSSVM (particle swarm optimization-least squares
support vector machines) and PCA (principal component analysis). Daily egg production records from
19,666 laying-type hens were used. Hen-day egg production was described by the egg-laying rate on
successive days after sexual maturity (120 days of age), together with daily records of room
temperature, feed consumption, layer weight, and age. PCA was used to study the correlation among
these data. Using the Pearson correlation coefficient between each of the five factors (maximum and
minimum shed temperatures, layer weight, feed consumption, and age) and the egg-laying rate, each
factor was weighted according to its influence on the egg-laying rate. LSSVM was then used to build a
regression model on the weighted data, and PSO was used to optimize the LSSVM parameters (penalty
coefficient c and kernel parameter g). The experimental results showed that the goodness-of-fit
criterion (MSE) was lower than that of existing prediction models. The PSO-LSSVM model fitted the
egg-laying rate of the whole Hy-Line Brown laying-type hen flock well.
INDEX TERMS Least squares support vector machines (LSSVM), principal component analysis (PCA),
egg-laying rate, particle swarm optimization (PSO).
I. INTRODUCTION
Among food products, eggs provide the human body with a rich source of protein, fat, minerals, and
various vitamins, giving them an extremely high nutritional value [1]. According to the Food and
Agriculture Organization (FAO), the global production of eggs exceeded 1370 million metric tons in
2018, with the top five egg-producing countries accounting for 56.13% of the total demand. The
factors that affect the egg-laying rate are complex, and changes in the egg-laying rate reflect
factors such as environmental differences and the growth of laying hens over a period of time.
Predicting egg production data in advance improves the economic performance of the farm and provides
important insight into the production resources the farm requires at the next stage [2].
Studies by ZHANG Houchen [3], KONG Lijun [4], ZHU Jianguo [5], WANG Haitao [6], and WANG Shaofeng [7]
have shown that the egg-laying rate correlates with age, temperature, humidity, layer weight, feed,
and other characteristics. In poultry research, the use of mathematical models to fit the egg-laying
rate began in 1981 [8], when McMillan first proposed the McMillan compartment model to establish a
nonlinear prediction curve of egg production over time. In subsequent egg production studies, ZHANG
Yudong [9] discussed the fitting curve between hormones, egg-laying rate, egg layer weight, and feed
conversion rate. LUO Lei [10] studied the curve fitting between egg-laying rate and cumulative egg
production. LIANG Zhongjun [11] established a fitting curve between egg-laying rate and amino acid
intake level.
However, the goodness-of-fit of these models was not satisfactory. Because the egg-laying rate is
affected by a number of factors, a regression model established using a single influencing feature or
some kind of average of features offers low accuracy and poor resistance to interference. Only by
establishing an accurate mathematical model to predict laying curve of laying hens can it be
The associate editor coordinating the review of this manuscript and approving it for publication was
Nadeem Iqbal.
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License. For more information, see https://creativecommons.org/licenses/by-nc-nd/4.0/
VOLUME 8, 2020
M. Jiang et al.: PSO-LSSVM Model for Hy-Line Brown Laying-Type Hens ‘Egg-Laying Rate Prediction Based on PCA
FIGURE 1. Flow chart of principal component analysis (PCA).
$Y = (y_1, y_2, y_3, \cdots, y_i)^T$ $(i \in N^+)$ was the main component:
$$A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1m} \\ a_{21} & a_{22} & \cdots & a_{2m} \\ \vdots & \vdots & & \vdots \\ a_{i1} & a_{i2} & \cdots & a_{im} \end{pmatrix} \tag{1}$$
Matrix A required the principal components to be independent of each other. The combination
coefficient vector $a = (a_1, a_2, a_3, \cdots, a_i)^T$ was a unit vector, i.e., $a_i a_i^T = 1$.
How well the main component $Y_i$ retained the original information was measured by its variance: the
larger the variance, the more information from the original data it retained. The component with the
greatest variance was called the first principal component, followed by the second principal
component, and so on. The flow chart of PCA was shown in Figure 1.
In order to explore the key indexes influencing the egg-laying rate, the egg-laying rate, age, maximum
shed temperature, minimum shed temperature, feed consumption, and layer weight were taken into
consideration by using principal component analysis.
Sample data was recorded as $X_{nomal} = (x_1, x_2, x_3, \cdots, x_j, \cdots, x_{226})^T$,
$j = 1, \cdots, 226$. Each data sample $x_j$ had 6-dimensional characteristics: age c1, maximum shed
temperature c2, minimum shed temperature c3, feed consumption c4, layer weight c5, and egg-laying
rate c6. Thus, $x_{nomal} = (x_{jc1}, x_{jc2}, x_{jc3}, \cdots, x_{jc6})$. To analyze the data, we
centralized it as $X_* = (x_1 - u, x_2 - u, x_3 - u, \cdots, x_{226} - u)$, where
$$u = \frac{1}{226} \sum_{i=1}^{226} x_i \tag{2}$$
After the centralization, the data was most widely distributed in the $u_1$ direction of the first
principal axis. The direction of $u_1$ can be obtained by inner product of $X_*$ and
Set
$$XX^T = \sum_{i=1}^{n} X_* \cdot X_*^T$$
$$\text{The objective function} = \max \frac{1}{226} u_1^T XX^T u_1 \tag{4}$$
The objective function could also be expressed as the L2 norm of the vector:
$$\max \frac{1}{226} (X^T u_1)^T X^T u_1 = \max \langle X^T u_1, X^T u_1 \rangle = \max \|X^T u_1\|_2^2 = \max \left( \frac{\|X^T u_1\|_2}{\|u_1\|_2} \right)^2 \tag{5}$$
Since the maximum value of the ratio of vector lengths under a matrix mapping was the maximum singular
value of that matrix, it followed that:
$$\frac{\|X^T u_1\|_2}{\|u_1\|_2} \le \sigma(X^T) \tag{6}$$
Suppose the eigenvalues of $XX^T$ are $\lambda_1, \lambda_2, \ldots, \lambda_n$, so that
$$\frac{\|X^T u_1\|_2}{\|u_1\|_2} \le \sqrt{\lambda_1} = \sigma_1 \tag{7}$$
Therefore, the solution to the objective function was $\sigma_1$, the maximum singular value of the
matrix $X^T$, and the corresponding direction $u_1$ can be called the first principal component. The
second and third principal components were obtained by the same method. The percentage of the
principal component in the original data can be expressed by equation (8)
$$\varepsilon = \sqrt{\frac{\sum_{i=1}^{k} \sigma_i}{\sum_{i=1}^{n} \sigma_i}} \tag{8}$$
where $\sum_{i=1}^{n} \sigma_i$ was the sum of squares of all singular values of $XX^T$, and
$\sum_{i=1}^{k} \sigma_i$ was the sum of squares of the k singular values selected.
C. LEAST SQUARE SUPPORT VECTOR MACHINE
The support vector machine (SVM) was a machine learning method proposed by Vapnik and Cortes et al.
[14]. Based on statistical learning theory, this method made use of the compromise between
approximation accuracy and model complexity to minimize risk and so achieve a better modeling effect.
Least square support vector regression [17], [18] used a nonlinear transformation, defined through a
kernel function, to map the actual sample data to a high-dimensional feature space and then
constructed a decision function to perform linear regression.
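The PCA steps described above (centering as in equation (2), finding the direction u1 of greatest variance, and measuring a component's share of the total variance) can be sketched in a few lines. This is a minimal illustration using only the Python standard library, not the authors' implementation: the toy samples are hypothetical, the function names are invented for this sketch, power iteration stands in for a full eigendecomposition, and the variance share is computed as the conventional eigenvalue ratio, which is an assumption about how the contribution rates reported in the paper were obtained.

```python
import math

def center(rows):
    """Subtract the column mean u (equation (2)) from every sample."""
    n, d = len(rows), len(rows[0])
    u = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - u[j] for j in range(d)] for r in rows]

def scatter_matrix(centered):
    """Return (1/n) times the sum of outer products of the centered samples."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[a] * r[b] for r in centered) / n for b in range(d)]
            for a in range(d)]

def first_component(cov, iters=500):
    """Power iteration: unit vector u1 maximizing u1^T C u1, plus its eigenvalue."""
    d = len(cov)
    u = [1.0 / math.sqrt(d)] * d
    for _ in range(iters):
        v = [sum(cov[a][b] * u[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in v))
        u = [x / norm for x in v]
    lam = sum(u[a] * sum(cov[a][b] * u[b] for b in range(d)) for a in range(d))
    return u, lam

# Hypothetical toy data: two strongly correlated features.
samples = [[1.0, 2.1], [2.0, 3.9], [3.0, 6.2], [4.0, 8.1], [5.0, 9.8]]
cov = scatter_matrix(center(samples))
u1, lam1 = first_component(cov)
total_variance = cov[0][0] + cov[1][1]  # trace equals the sum of all eigenvalues
share = lam1 / total_variance           # variance share of the first component
print(u1, share)
```

Because the two toy features move almost in lockstep, nearly all of the variance falls on the first component, which is the situation the text describes when one component "contains more information" than the others.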
LSSVM replaced the inequality constraint of the optimization problem in SVM with an equality
constraint, as shown in equation (9):
$$\min_{w,b,e} J(w, e) = \frac{1}{2} w^T w + \frac{1}{2} \gamma \sum_{k=1}^{N} e_k^2, \tag{9}$$
subject to
$$y_k = \left[ w^T \varphi(x_k) + b \right] + e_k, \quad k = 1, 2, \ldots, N,$$
where $e_k$ denoted each error variable and $\varphi(x_k)$ mapped the original data $x_k$ to a
higher-dimensional space.
The Lagrange multiplier method was used to transform the original problem into a single Lagrangian
function:
$$L(w, b, e; \alpha) = J(w, e) - \sum_{k=1}^{N} \alpha_k \left\{ \left[ w^T \varphi(x_k) + b \right] + e_k - y_k \right\} \tag{10}$$
where $\alpha_k$ was a Lagrange multiplier.
Setting the partial derivatives of L with respect to $w$, $b$, $e_k$, and $\alpha_k$ to zero gave:
$$\frac{\partial L}{\partial w} = 0 \tag{11}$$
$$\frac{\partial L}{\partial b} = 0 \tag{12}$$
$$\frac{\partial L}{\partial e_k} = 0 \tag{13}$$
$$\frac{\partial L}{\partial \alpha_k} = 0 \tag{14}$$
Solving equations (11)–(14) gave the following equations:
$$w = \sum_{k=1}^{N} \alpha_k \varphi(x_k), \tag{15}$$
$$\sum_{k=1}^{N} \alpha_k = 0 \tag{16}$$
$$\alpha_k = \gamma e_k \tag{17}$$
$$\frac{\partial L}{\partial \alpha_k} = 0 \tag{18}$$
According to equations (15)–(18), $\alpha$ and $b$ could be obtained by solving the linear system
$$\begin{pmatrix} 0 & 1_v^T \\ 1_v & \Omega + \frac{I}{\gamma} \end{pmatrix} \begin{pmatrix} b \\ \alpha \end{pmatrix} = \begin{pmatrix} 0 \\ y \end{pmatrix} \tag{19}$$
where $\Omega$ was the kernel matrix shown in equation (20)
$$\Omega_{kl} = \varphi^T(x_k) \varphi(x_l) = K(x_k, x_l) = e^{-\frac{\|x_k - x_l\|^2}{\sigma^2}}, \tag{20}$$
and $\sigma^2$ was the kernel parameter, $1_v = [1, 1, \ldots, 1, 1]^T$, and I was the identity matrix.
As a result, the function used by the LSSVM for regression prediction could be expressed as follows:
$$y(x) = \sum_{k=1}^{N} \alpha_k K(x, x_k) + b \tag{21}$$
D. PARTICLE SWARM OPTIMIZATION ALGORITHM
The PSO algorithm was a global optimization algorithm designed by simulating the predatory behavior of
birds. The birds in the flock sent messages to each other throughout the search, letting the other
birds know where they were. Through such collaboration, the birds decided whether they had found the
optimal solution and transmitted the information about the optimal solution to the whole flock.
Eventually, the entire flock could converge around the food source, which corresponded to finding the
optimal solution.
The particle swarm algorithm simulated birds in a flock by designing a massless particle with only two
properties: velocity and position. Velocity represented how fast or slow the particle moved and
position represented the direction of the movement. Each particle searched for the optimal solution
separately in the search space and recorded it as its current individual extreme value (pbest). The
individual extreme value was shared with the other particles in the whole particle swarm, and the
best individual extreme value was taken as the current global optimal solution (gbest) of the whole
particle swarm. All particles in the swarm adjusted their speed and position according to the current
individual extreme value found by themselves and the current global optimal solution shared by the
whole particle swarm.
Based on the foregoing, the PSO algorithm summarized the foraging behavior of birds. There were only
two principles to follow when looking for food in a flock [19].
1) Information was shared among the birds, and they flew to the place where the most food was known
to the group.
2) The birds' own experience of finding the most food determined their destination.
According to these behaviors, the velocity and position formulas were abstracted as shown in
equations (22) and (23):
$$v_i^t = q \, v_i^{t-1} + a_1 r_1 \left( pbest_i^{t-1} - x_i^{t-1} \right) + a_2 r_2 \left( gbest_i^{t-1} - x_i^{t-1} \right) \tag{22}$$
$$x_i^t = x_i^{t-1} + v_i^t, \tag{23}$$
where $v_i$ was the velocity, $x_i$ was the position, t represented the current state, q was the
inertia factor, $a_1$ and $a_2$ were learning factors, $r_1$ and $r_2$ were random numbers in [0, 1],
pbest was the individual optimal position, and gbest was the global optimal position. Compared with
other algorithms, this algorithm had the advantage of memory and good solution preservation. In
genetic algorithms, previous solutions change as the population changes, whereas PSO had no crossover
or mutation operations. Particles were only updated according to their internal velocity, which made
the algorithm simpler, with fewer parameters and an easy implementation.
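To make the combination of the two methods concrete, the following is a minimal sketch, using only the Python standard library, of how the linear system (19), the RBF kernel (20), the predictor (21), and the PSO updates (22) and (23) fit together. It is an illustration under stated assumptions rather than the authors' code: the function names, the toy sine data, the search bounds, the swarm size, the values of q, a1, and a2, and the clamping of positions to the bounds are all choices made for this sketch. The two tuned quantities are the penalty coefficient γ and the kernel parameter σ², which play the role of the paper's c and g.

```python
import math
import random

def rbf(a, b, sigma2):
    """RBF kernel of equation (20): exp(-||a - b||^2 / sigma^2)."""
    return math.exp(-sum((p - q) ** 2 for p, q in zip(a, b)) / sigma2)

def solve(A, rhs):
    """Solve A z = rhs by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [r] for row, r in zip(A, rhs)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (M[r][n] - sum(M[r][k] * z[k] for k in range(r + 1, n))) / M[r][r]
    return z

def lssvm_fit(X, y, gamma, sigma2):
    """Build and solve the linear system of equation (19) for b and alpha."""
    n = len(X)
    A = [[0.0] + [1.0] * n]
    for i in range(n):
        row = [1.0] + [rbf(X[i], X[j], sigma2) for j in range(n)]
        row[i + 1] += 1.0 / gamma  # Omega + I / gamma
        A.append(row)
    z = solve(A, [0.0] + list(y))
    return z[0], z[1:]  # b, alpha

def lssvm_predict(x, X, b, alpha, sigma2):
    """Equation (21): y(x) = sum_k alpha_k K(x, x_k) + b."""
    return sum(a * rbf(x, xk, sigma2) for a, xk in zip(alpha, X)) + b

def pso(objective, bounds, n_particles=10, iters=30, q=0.7, a1=1.5, a2=1.5, seed=0):
    """Minimize objective with the updates of equations (22) and (23)."""
    rng = random.Random(seed)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * len(bounds) for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(iters):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (q * vel[i][d]
                             + a1 * r1 * (pbest[i][d] - pos[i][d])
                             + a2 * r2 * (gbest[d] - pos[i][d]))
                # Assumption of this sketch: clamp positions to the search bounds.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

# Hypothetical toy data: samples of sin(x), split into training and validation sets.
X = [[0.3 * i] for i in range(20)]
y = [math.sin(x[0]) for x in X]
X_tr, y_tr = X[::2], y[::2]
X_va, y_va = X[1::2], y[1::2]

def validation_mse(params):
    """Fitness for PSO: validation MSE of an LSSVM trained with (gamma, sigma^2)."""
    gamma, sigma2 = params
    b, alpha = lssvm_fit(X_tr, y_tr, gamma, sigma2)
    return sum((lssvm_predict(x, X_tr, b, alpha, sigma2) - t) ** 2
               for x, t in zip(X_va, y_va)) / len(X_va)

best, best_mse = pso(validation_mse, bounds=[(0.1, 1000.0), (0.05, 10.0)])
print(best, best_mse)
```

Using a held-out validation error as the fitness keeps the swarm from simply rewarding parameters that interpolate the training points. In practice a numerical library would normally replace the hand-written linear solver.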
III. RESULTS
A. PCA RESULTS
Table 3 showed the resulting principal component eigenvalues and variance contribution rates. Table 4
showed the principal component matrix. As can be seen in Figure 2, Components 1 and 2 were relatively
steep and contained more information, while Component 3 was flat and contained less information.
Table 3 showed that the cumulative variance contribution rate of Components 1 and 2 was 93.708%,
which represented most of the information in the original data. Tables 3 and 4 together showed that
the variance contribution rate of the first principal component was 71.276%, and that this component
was positively correlated with the egg-laying rate of laying hens and the five other characteristics
of age, maximum shed temperature, minimum shed temperature, feed consumption, and layer weight. The
proportions of layer weight (0.969), age (0.918), and minimum shed temperature (0.906) were slightly
higher, indicating a slightly larger correlation among the three. The variance contribution rate of
the second principal component was 22.432%, and this component was positively correlated with feed
consumption, layer weight, and egg-laying rate. The proportions of egg-laying rate and feed
consumption were 0.622 and 0.664, indicating that there was a greater correlation between them.
FIGURE 2. Component information contains diagrams. The higher the slope, the more information the
principal component contains.
Tables 3 and 4 showed a positive correlation between the first principal component and the five
factors. The proportions of layer weight and age were slightly higher, but there was no significant
difference between the highest and lowest temperature, so this component could only be called the
comprehensive factor. In order to better explain this component, we used the maximum variance method
to rotate the component matrix. After rotation, the common factors were still unrelated. The rotation
factor matrix was shown in Table 6. In the first principal component, the proportions of 0.951 for
age, 0.977 for maximum shed temperature, and 0.931 for minimum shed temperature were higher, so the
first principal component could be called the seasonal factor. The second principal component, with a
high proportion of feed consumption of
FIGURE 3. Cluster pedigree diagram. The smallest distance between egg-laying rate and layer weight
represents the largest correlation, followed by feed consumption, then by age, maximum temperature,
and minimum temperature.
TABLE 6. Rotation component matrix.
difference was significant. If the significance test value was lower than 0.05, the correlation
between features derived from the previous experiment was not accidental. The significance test
values for egg-laying rate, daily age, layer weight, maximum and minimum shed temperatures, and feed
consumption were all 0.0001, indicating that the egg-laying rate correlated significantly with the
factors and that the correlation coefficients were reliable.
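The correlation coefficient and the statistic behind such a significance test can be sketched as follows. This is a minimal example with the Python standard library, not the authors' procedure: the feed and laying-rate values are hypothetical and the function names are invented for illustration. It computes the Pearson coefficient r and the usual t statistic with n - 2 degrees of freedom; converting t into a significance value additionally requires the t distribution, which is left out here.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def t_statistic(r, n):
    """t = r * sqrt((n - 2) / (1 - r^2)), with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))

# Hypothetical toy values, not data from the study.
feed = [100.0, 104.0, 109.0, 113.0, 118.0, 121.0]
rate = [0.80, 0.83, 0.85, 0.88, 0.90, 0.93]
r = pearson_r(feed, rate)
t = t_statistic(r, len(feed))
print(r, t)
```

A large absolute t for a given sample size corresponds to a small significance value, which is the sense in which a value below 0.05 indicates that the observed correlation is unlikely to be accidental.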
FIGURE 6. Model with original data. The data used in this model was not weighted by equation (21).
FIGURE 7. Fitness. The figure shows that the PSO algorithm has obtained the best c and g.
FIGURE 9. SVM prediction.
This study also used some other machine learning models for comparison; the fitting results were
shown in Figure 9 and Figure 10. Table 10 showed a comparison of the model presented in this paper
with the others. In the comparative experiments, the BP model paid more attention to the early stage
of laying, and its prediction of the early stage of laying was the best among all the models.
However, the BP model could not work well when the egg-laying rate fluctuated at 88%. Both the LSSVM
and SVM models gave a balanced prediction over the whole egg-producing period, but the accuracy of
the SVM model could not match that of the LSSVM model. The PSO-LSSVM model based on PCA had the
lowest MSE and the highest R-square, revealing that our PSO-LSSVM model using PCA had higher accuracy
and better predictive ability for the egg-laying rate.
IV. CONCLUSION
The egg-laying rate of hens was a nonlinear system affected by many characteristics that were
difficult to describe with explicit mathematical relations. Previous studies have presented fitting
approaches but with shortcomings including low precision and susceptibility to interference.
This study analyzed the correlation between factors affecting the egg-laying rate using principal
component analysis and correlation analysis and established the correlation coefficient formula in
turn. The present study weighted the five factors of age, maximum and minimum shed temperatures, feed
consumption, and weight, and provided them as input to an SVM to obtain the egg-laying rate as an
output. Then PSO was employed to find the most suitable hyperparameters c and g for the LSSVM
algorithm. These changes improved the accuracy of our prediction model to 6.451. The prediction
results were broadly consistent with the actual laying data, meaning our method offers better
prediction of the egg-laying rate.
REFERENCES
[1] L. Yingying and Z. Nan, "Research on prediction model of egg freshness based on image processing," Food Machinery, vol. 33, pp. 103–109, 2017.
[2] L. Fei and J. Minlan, "A layer yield prediction model based on support vector machine regression," Jiangsu Agricult. Sci., vol. 47, pp. 249–252, 2019.
[3] Z. Houcheng, "Statistical analysis of laying rate and environmental factors in winter," J. Anhui Agricult. Sci., vol. 35, pp. 4532–4533, 2007.
[4] K. Lijun and Z. Dongli, "Nutritional factors affecting laying rate and countermeasures," J. Animal Husbandry Vet. Med., vol. 11, pp. 77–78, 2013.
[5] Z. Jianguo, "Observation on the effect of biological feed on the production performance of laying hens," Chin. Livestock Poultry Breeding, vol. 9, pp. 169–170, 2019.
[6] W. Haitao, "Influencing factors and improving measures of laying rate of laying hens," Livestock Poultry Ind., vol. 30, p. 31, 2019.
[7] W. Shaofeng, "Influencing factors of laying rate in summer and improving measures," Jilin Animal Husbandry Vet. Med., vol. 6, pp. 23–25, 2019.
[8] I. McMillan, "Compartmental model analysis of poultry egg production curves," Poultry Sci., vol. 60, no. 7, pp. 1549–1551, Jul. 1981.
[9] Z. Yudong, "Effects of Daidzein on the performance and hormone levels of the whole laying cycle of laying-type hens," Sci. Technol. Eng., vol. 19, pp. 5637–5642, 2009.
[10] L. Lei, Z. Long, and H. Qinke, "Curve fitting analysis of underlying group for laying rate and total egg number of local hens new strains (HS)," J. Sichuan Agricult. Univ., vol. 31, no. 4, pp. 438–443, 2013.
[11] L. Zhongjun, H. Xuejiao, M. A. Digang, "Effects of dietary levels of sulphur amino acids on performance of Jinghong laying hens during peak production period," Chin. J. Animal Nutrition, vol. 12, pp. 3720–3725, 2015.
[12] I. T. Joliffe, Principal Component Analysis. New York, NY, USA: Springer, 2002.
[13] V. Cherkassky, "The nature of statistical learning theory," IEEE Trans. Neural Netw., vol. 8, no. 6, pp. 1564–1573, Nov. 1997.
[14] C. Cortes and V. Vapnik, "Support-vector networks," Mach. Learn., vol. 20, no. 3, pp. 273–297, 1995.
[15] R. Poli, J. Kennedy, and T. Blackwell, "Particle swarm optimization," Swarm Intell., vol. 1, no. 1, pp. 33–57, Jun. 2007.
[16] J. Kennedy, Particle Swarm Optimization, Encyclopedia of Machine Learning. New York, NY, USA: Springer, 2010, pp. 760–766.
[17] D. Hanbay, A. Baylar, and M. Batan, "Prediction of aeration efficiency on stepped cascades by using least square support vector machines," Expert Syst. Appl., vol. 36, no. 3, pp. 4248–4252, Apr. 2009.
[18] L. Hongchao, W. Weigang, and D. Xuemei, "Mixed least square support vector regression, M-LS-SVR," Electr. Eng., vol. 1, pp. 76–80, 2006.
[19] X.-S. Yang and S. Deb, "Cuckoo search: Recent advances and applications," Neural Comput. Appl., vol. 24, no. 1, pp. 169–174, Jan. 2014.
[20] J. A. K. Suykens, "Least squares support vector machines," Int. J. Circuit Theory Appl., vol. 27, pp. 605–615, Nov. 2002.
[21] L. Shengliang and L. Zhi, "Parameter selection in SVM with RBF kernel function," J. Zhejiang Univ. Technol., vol. 2, pp. 163–167, 2007.
[22] C. Sun, Y. Jin, R. Cheng, J. Ding, and J. Zeng, "Surrogate-assisted cooperative swarm optimization of high-dimensional expensive problems," IEEE Trans. Evol. Comput., vol. 21, no. 4, pp. 644–660, Aug. 2017.
[23] Q. Lin, S. Liu, Q. Zhu, C. Tang, R. Song, J. Chen, C. A. C. Coello, K.-C. Wong, and J. Zhang, "Particle swarm optimization with a balanceable fitness estimation for many-objective optimization problems," IEEE Trans. Evol. Comput., vol. 22, no. 1, pp. 32–46, Feb. 2018.