Advances in Intelligent Systems and Computing 923

Hybrid Intelligent Systems
18th International Conference on Hybrid Intelligent Systems (HIS 2018)
Held in Porto, Portugal, December 13–15, 2018
Advances in Intelligent Systems and Computing
Volume 923
Series editor
Janusz Kacprzyk, Systems Research Institute, Polish Academy of Sciences,
Warsaw, Poland
The series “Advances in Intelligent Systems and Computing” contains publications on theory,
applications, and design methods of Intelligent Systems and Intelligent Computing. Virtually all
disciplines such as engineering, natural sciences, computer and information science, ICT, economics,
business, e-commerce, environment, healthcare, life science are covered. The list of topics spans all the
areas of modern intelligent systems and computing such as: computational intelligence, soft computing
including neural networks, fuzzy systems, evolutionary computing and the fusion of these paradigms,
social intelligence, ambient intelligence, computational neuroscience, artificial life, virtual worlds and
society, cognitive science and systems, perception and vision, DNA and immune-based systems,
self-organizing and adaptive systems, e-Learning and teaching, human-centered and human-centric
computing, recommender systems, intelligent control, robotics and mechatronics including
human-machine teaming, knowledge-based paradigms, learning paradigms, machine ethics, intelligent
data analysis, knowledge management, intelligent agents, intelligent decision making and support,
intelligent network security, trust management, interactive entertainment, Web intelligence and multimedia.
The publications within “Advances in Intelligent Systems and Computing” are primarily proceedings
of important conferences, symposia and congresses. They cover significant recent developments in the
field, both of a foundational and applicable character. An important characteristic feature of the series is
the short publication time and world-wide distribution. This permits a rapid and broad dissemination of
research results.
** Indexing: The books of this series are submitted to ISI Proceedings, EI-Compendex, DBLP,
SCOPUS, Google Scholar and Springerlink **
Advisory Editors
Nikhil R. Pal, Indian Statistical Institute, Kolkata, India
Rafael Bello Perez, Faculty of Mathematics, Physics and Computing, Universidad Central de Las Villas, Santa
Clara, Cuba
Emilio S. Corchado, University of Salamanca, Salamanca, Spain
Hani Hagras, Electronic Engineering, University of Essex, Colchester, UK
László T. Kóczy, Department of Automation, Széchenyi István University, Gyor, Hungary
Vladik Kreinovich, Department of Computer Science, University of Texas at El Paso, El Paso, TX, USA
Chin-Teng Lin, Department of Electrical Engineering, National Chiao Tung University, Hsinchu, Taiwan
Jie Lu, Faculty of Engineering and Information Technology, University of Technology Sydney, Sydney,
NSW, Australia
Patricia Melin, Graduate Program of Computer Science, Tijuana Institute of Technology, Tijuana,
Mexico
Nadia Nedjah, Department of Electronics Engineering, University of Rio de Janeiro, Rio de Janeiro,
Brazil
Ngoc Thanh Nguyen, Faculty of Computer Science and Management, Wrocław University of Technology,
Wrocław, Poland
Jun Wang, Department of Mechanical and Automation Engineering, The Chinese University of Hong Kong,
Shatin, Hong Kong
Editors

Ana Maria Madureira, School of Engineering, Instituto Superior de Engenharia (ISEP/IPP), Porto, Portugal
Ajith Abraham, Machine Intelligence Research Labs (MIR Labs), Auburn, WA, USA
This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Preface
Our special thanks go to the Springer publication team for their wonderful support in publishing these proceedings. Enjoy reading!
A Machine Learning Approach to Contact Databases' Importation

D. Coelho et al.
1 Introduction
In recent years, traditional marketing has been gradually losing relevance to e-marketing, the process of advertising and selling products and services on the internet [7].
At its core, e-marketing retains the same goal as traditional marketing: organizations must meet their customers' needs. More than that, the internet arguably completed the shift of power from organizations to users. By giving users full control over what they want to do, it made it impossible for marketers to keep people captive in front of advertisements, i.e. it greatly raised users' awareness [7].
2 Previous Work
The idea of using machine learning to filter or block spam is not new. Over the last decade, partly due to the rise of spam in prominence, work in this area has become common. Normally, spam classification problems focus on judging the contents of messages rather than on the classification of databases. That being the case, two previous works will be presented ([8] and [1]): one focusing on the comparison of algorithms in "regular" spam detection, and another focusing on algorithms' performance in binary classification (since we want to find out whether a CDB is that of a spammer or not).
In the paper A comparative study for content-based dynamic spam classification using four machine learning algorithms [8], the authors carried out an empirical evaluation of four different machine learning algorithms with respect to their spam-classification capabilities. The algorithms included: one based on a Naïve Bayes (NB) approach; an artificial neural network (NN); an SVM approach; and a Relevance Vector Machine (RVM) approach.
The various approaches were evaluated on different data-sets and feature sizes in terms of: accuracy, the percentage of e-mail correctly classified by the algorithm; spam precision, the ratio of correctly classified spam among all e-mail classified as spam; and spam recall, the proportion of the true spam e-mails present in the testing set which the algorithm managed to classify as spam [8].
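To make these three metrics concrete, the following is a minimal sketch computing them from a hypothetical confusion matrix (the counts are invented for illustration, with spam as the positive class):

```python
# Hypothetical confusion-matrix counts: spam = positive class.
tp, fp, fn, tn = 180, 20, 30, 770

accuracy = (tp + tn) / (tp + fp + fn + tn)  # share of e-mail classified correctly
spam_precision = tp / (tp + fp)             # correct among all e-mail flagged as spam
spam_recall = tp / (tp + fn)                # share of true spam actually caught

print(f"accuracy={accuracy:.3f} precision={spam_precision:.3f} recall={spam_recall:.3f}")
```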
From the results reported in that paper, it was possible to conclude that: the NN classifier is unsuited to the task by itself, as it consistently obtained the lowest results; both the Support Vector Machine (SVM) and RVM classifiers seemed to slightly outperform the NB classifier; and, compared to SVM, RVM provided similar results with fewer relevance vectors and a faster testing time, although its learning procedure was slower overall.
One work comparing the performance of supervised learning algorithms on binary classification scenarios is An Empirical Comparison of Supervised Learning Algorithms Using Different Performance Metrics [1]. In this work the authors used various data-sets and performance metrics to compare several supervised learning algorithms based on different approaches: K Nearest Neighbours (KNN), Artificial Neural Networks (ANN), Decision Trees (DT), Bagged Decision Trees (BAG-DT), Random Forests (RF), Boosted Decision Trees (BST-DT), Boosted Stumps (BST-STMP), and SVM.
These algorithms were then evaluated according to three different metric groups (each containing three metrics): threshold metrics, ordering/rank metrics, and probability metrics. Additionally, this comparison between …
3 Solution
The first step to finding a solution is understanding the problem. In this case, some contact database importations are performed by spammers, who then use those lists to produce spam e-mail campaigns. So, how can this be prevented? The answer is straightforward: information. If enough information about a subject is available, the subject can be understood. However, due to the context of the problem, information can only be extracted from a CDB's characteristics, not from a campaign's content. Various parameters can be extracted from a single CDB. From a global perspective, factors such as the overall size of an importation are important, as spammers' importations will probably differ in size from those made by both individual and corporate users. From a sharper, more granular outlook, characteristics of the e-mails present in a CDB also become important; e.g. a regular CDB shouldn't contain any spam-trap addresses, while a spammer's might.
Many different features can be extracted from a CDB importation. The important thing to keep in mind, however, is that not every one contributes to the final decision to the same degree. This becomes clear if we consider, for example, two features: the first representing the percentage of e-mails similar to a known domain but not an actual one (e.g. example@gmail.com), and the second representing the percentage of spam-traps in the imported CDB. By definition, a spam-trap classifies any domain that tries to send messages to it as a possible spammer, which means that even a low percentage of such e-mails in a CDB can carry more weight than a greater percentage of look-alike e-mails, for which the worst outcome is a bounce.
The problem now becomes how to identify these relevant features in order to train the algorithm that will classify each CDB importation attempt. A good starting point is sensitivity analysis, normally defined as: "The study of how uncertainty in the output of a model (numerical or otherwise) can be apportioned to different sources of uncertainty in the model input" [6]. However, this definition is not of much use without knowing what a model means in this context. Simply put, a model is a mathematical representation of a given system or problem (e.g. a simple body-mass-index prediction model could be accomplished by computing the formula BMI = weight/height²). Knowing this, we can say that sensitivity analysis is an activity that aims to investigate the degree to which a variable used in a model (e.g. the height in the BMI example) affects that model's results (i.e. the BMI itself).
Although various ways of conducting this type of analysis exist, the most straightforward one is direct observation of the data, namely through scatter-plots and similar methods. The presence of a possible pattern in a scatter-plot is usually a good sign of sensitivity between a variable and a model's output: the clearer the pattern, the more sensitive the output is to that variable. For instance, in Fig. 1, Y is increasingly more sensitive to Z1, Z2, Z3 and Z4, since a pattern gradually becomes clearer in each case's plot [6].
This method may be adequate if the relations between the various input features and the generated outputs are distinct; however, if the relation between them does not show a well-defined pattern, then using it as a basis for
decision may prove difficult. A plausible way of countering this problem is to compute the correlation coefficients of input features versus their expected outputs. This allows a solid, easy-to-understand hierarchy to be formed between the input features, since, by definition, correlation coefficients are measured on a [−1, 1] scale. However, if this method is used, one should consider the assumptions that any correlation coefficient method carries, as well as what it stands for. For example, Pearson's correlation coefficient is not adequate for measuring non-linear relations, since it measures the strength of the linear relationship between two variables; Spearman's coefficient, on the other hand, assesses how well an arbitrary monotonic function can describe the relationship between two variables, without making any assumptions about their frequency distribution [5].
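As an illustration of this feature-ranking idea, the sketch below computes Pearson and Spearman coefficients for a few hypothetical CDB features against a synthetic spam label; the feature names and data are stand-ins for illustration, not the paper's actual feature set:

```python
# Rank candidate CDB features by correlation with the spam label.
# All feature names and distributions here are hypothetical.
import numpy as np
from scipy.stats import pearsonr, spearmanr

rng = np.random.default_rng(0)
n = 1000
features = {
    "import_size": rng.lognormal(8, 1, n),        # overall size of the importation
    "pct_spam_traps": rng.beta(1, 50, n),         # share of spam-trap addresses
    "pct_lookalike_domains": rng.beta(2, 20, n),  # share of near-miss domains
}
# Hypothetical label: 1 = spammer importation, 0 = regular.
label = (features["pct_spam_traps"] * 40 + rng.normal(0, 0.5, n) > 1).astype(int)

for name, values in features.items():
    r_p, _ = pearsonr(values, label)   # strength of a *linear* relation
    r_s, _ = spearmanr(values, label)  # strength of any *monotonic* relation
    print(f"{name:24s} Pearson={r_p:+.2f} Spearman={r_s:+.2f}")
```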
Having decided which algorithms to use for the classifier prototypes, as well as how to deal with the problem of class imbalance (training weights), the next step is to find which hyper-parameters would be ideal for training the classifiers. To do that, it is necessary to understand which classifier metric should be optimized. The performance of a classifier in an unbalanced classification problem is normally determined by the recall and precision of its various classes. For clarity's sake, precision can be defined as the proportion of classifications of a given class which were actually correct, while recall expresses the proportion of actual class instances which were identified correctly. Taking the problem's context into account, the most important metric to optimize is the recall of the spammer class: if a regular case is misclassified there is no major repercussion, while a misclassified spam importation can lead to the black-listing of e-mails/addresses.
To perform this optimization/tuning of the classifier prototypes' hyper-parameters, the chosen approach was nested cross-validation. Nested cross-validation estimates the generalization error of the underlying model together with its hyper-parameter search. It counters the fact that model selection without nested cross-validation uses the same data to tune model parameters and to evaluate model performance, which may let information "leak" into the model and cause over-fitting. It operates through a series of train/validation/test set splits: in an inner loop, the score is approximately maximized by fitting a model to each training set and then directly maximized by selecting hyper-parameters over the validation set; in an outer loop, the generalization error is estimated by averaging test-set scores over several data-set splits.
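A minimal sketch of this nested cross-validation setup, using scikit-learn as a stand-in for the authors' implementation (the data, parameter grid, and fold counts are illustrative assumptions):

```python
# Nested cross-validation: tune hyper-parameters in an inner loop and
# estimate generalization error in an outer loop. X, y and the grid are
# hypothetical stand-ins for the paper's data and configuration.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, cross_val_score

X, y = make_classification(n_samples=2000, weights=[0.9, 0.1], random_state=0)

# Inner loop: select hyper-parameters on validation folds, optimizing the
# recall of the positive (spammer) class, as argued above.
inner = GridSearchCV(
    RandomForestClassifier(class_weight="balanced", random_state=0),
    param_grid={"n_estimators": [100, 300], "max_depth": [None, 10]},
    scoring="recall",
    cv=3,
)
# Outer loop: estimate the generalization error of the whole tuning procedure.
outer_scores = cross_val_score(inner, X, y, scoring="recall", cv=5)
print("nested CV recall: %.3f +/- %.3f" % (outer_scores.mean(), outer_scores.std()))
```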
By applying training weights to the prototype classifiers’ training phase and
optimizing their hyper-parameters through the use of nested cross-validation, all
four prototypes were implemented [2].
4 Results
Now that the prototypes have been implemented, it is necessary to evaluate how well they perform relative to each other. For clarity, in this section "positive class" will refer to spam importations, while "negative class" will refer to "regular" importations. By comparing intermediate results obtained during the development phase, it was possible to establish a rough performance order: Naïve Bayes < SVM < Random forest < Ada-boost. However, this order is based on the single runs used for hyper-parameter optimization, so a better comparison method is necessary. It was therefore decided that three different factors would be compared between prototypes: training time, precision/recall, and area under the ROC curve.
To verify the training-time differences between classifiers, ten runs over training sets with varying numbers of instances were executed for each classifier. A timer was started when each training session began and stopped when it finished. In this way, a set of points relating each classifier's required training time to the number of instances was created. By fitting a generalized linear model to each group of observations, the following plot was created (Fig. 2):
By itself, this plot does not constitute statistical evidence, hence an appropriate test is necessary. In this case, taking the data's characteristics into account, a two-sample sign test was used. Applying this test to each pair of classifiers produced Table 1:
Table 1. p-values for two-sample sign test between each classifier’s populations
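The sign test itself is straightforward to reproduce. Below is a minimal sketch for one classifier pair, with hypothetical paired training times (not the study's measurements); scipy's binomtest supplies the two-sided p-value:

```python
# Paired two-sample sign test on training times of two classifiers.
# Timings are hypothetical placeholders for one pair of prototypes.
from scipy.stats import binomtest

times_a = [1.2, 1.5, 1.1, 1.8, 1.4, 1.6, 1.3, 1.7, 1.5, 1.9]  # classifier A (s)
times_b = [0.9, 1.6, 0.8, 1.2, 1.0, 1.1, 0.9, 1.3, 1.2, 1.4]  # classifier B (s)

diffs = [a - b for a, b in zip(times_a, times_b) if a != b]    # drop ties
n_pos = sum(d > 0 for d in diffs)
result = binomtest(n_pos, n=len(diffs), p=0.5)                 # H0: median difference = 0
print(f"positive differences: {n_pos}/{len(diffs)}, p-value = {result.pvalue:.4f}")
```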
Finally, to test the area under the ROC curve of the various classifiers, the first step was to extract the classification probabilities generated by the same train/test split in each case. This split was based on a data-set with 26240 entries, divided at a ratio of 70% training data to 30% testing data, meaning that the classification probabilities were based on 7872 instances.
Once that was concluded, the next step was to use that data to plot the ROC curve for each individual classifier and calculate its area under the curve (see Fig. 3). The computed areas were: 0.86 for the Naïve Bayes classifier, 0.87 for the SVM classifier, 0.95 for the Random-forest classifier, and 0.97 for the Ada-boost classifier.
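A minimal sketch of this ROC/AUC evaluation, with synthetic data and placeholder models standing in for the four prototypes (only the 26240-entry size and the 70/30 split follow the text):

```python
# Compute test-set AUC from classification probabilities on a common
# 70/30 split. Data and models are illustrative stand-ins.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=26240, weights=[0.9, 0.1], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

for model in (RandomForestClassifier(random_state=0), AdaBoostClassifier(random_state=0)):
    probs = model.fit(X_tr, y_tr).predict_proba(X_te)[:, 1]  # positive-class probabilities
    print(type(model).__name__, "AUC = %.3f" % roc_auc_score(y_te, probs))
```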
A direct comparison without further analysis would establish the following order between algorithms with regard to the Area Under the Curve (AUC): Naïve Bayes < Support Vector Machine < Random Forest < Ada-boost. This ordering, however, may not be correct, as the differences between the ROC curves could be caused by random factors. To solve this problem, the methodology described in [4] was employed, allowing us to conclude, with a confidence level of 95%, that the order correctly expressing the relation between the areas under the various ROC curves was: Naïve Bayes = SVM < Random-forest < Ada-boost.
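For reference, the sketch below applies the Hanley–McNeil standard-error formula to compare two of the AUCs above. Note the hedges: the test in [4] for curves derived from the same cases additionally subtracts a covariance term (2·r·SE1·SE2) with r looked up from their table, which is omitted here, and the positive/negative split of the 7872 test cases is an assumption:

```python
# Uncorrelated simplification of the Hanley-McNeil z-test for two AUCs.
# The correlated version in [4] subtracts 2*r*SE1*SE2 inside the sqrt.
import math
from scipy.stats import norm

def hanley_mcneil_se(auc, n_pos, n_neg):
    """Standard error of an AUC estimate (Hanley & McNeil formula)."""
    q1 = auc / (2 - auc)
    q2 = 2 * auc**2 / (1 + auc)
    var = (auc * (1 - auc) + (n_pos - 1) * (q1 - auc**2)
           + (n_neg - 1) * (q2 - auc**2)) / (n_pos * n_neg)
    return math.sqrt(var)

n_pos, n_neg = 1000, 6872          # hypothetical split of the 7872 test cases
se1 = hanley_mcneil_se(0.86, n_pos, n_neg)   # Naive Bayes AUC
se2 = hanley_mcneil_se(0.87, n_pos, n_neg)   # SVM AUC
z = (0.87 - 0.86) / math.sqrt(se1**2 + se2**2)
print(f"z = {z:.2f}, two-sided p = {2 * (1 - norm.cdf(abs(z))):.3f}")
```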
5 Conclusion
Through the process described above, a significant problem of online marketing platforms was identified, and a possible solution was implemented. Using machine learning, the developed solution obtained success comparable to the currently employed rule-based approach to spam prevention in contact database importations. Additionally, various prototypes were developed, allowing multiple approaches to the same problem to exist and be compared with one another, which in turn benefits further optimization of the system. This comparison made it possible to assert that, for the problem at hand, the best-performing algorithms were the ones based on Ada-boost, Random forest, SVMs and Naïve Bayes, in that order. This performance order is not immutable: as the system evolves, more efficient prototypes may be created, or the existing prototypes' proficiency may change.
References
1. Caruana, R., Niculescu-Mizil, A.: An empirical comparison of supervised learning
algorithms. In: Proceedings of the 23rd International Conference on Machine Learn-
ing, pp. 161–168. ACM (2006)
2. Coelho, D.: Intelligent analysis of contact databases’ importation. Master’s thesis.
Instituto Superior de Engenharia do Porto (2018)
3. Gonçalves, M.: E-goi (2018). https://www.e-goi.pt/
4. Hanley, J.A., McNeil, B.J.: A method of comparing the areas under receiver oper-
ating characteristic curves derived from the same cases. Radiology 148(3), 839–843
(1983)
5. Hauke, J., Kossowski, T.: Comparison of values of Pearson’s and Spearman’s corre-
lation coefficients on the same sets of data. Quaestiones Geographicae 30(2), 87–93
(2011)
6. Saltelli, A., Ratto, M., Andres, T., Campolongo, F., Cariboni, J., Gatelli, D.,
Saisana, M., Tarantola, S.: Global Sensitivity Analysis: The Primer. Wiley, Hoboken
(2008)
7. Strauss, J., et al.: E-Marketing. Routledge, Abingdon (2016)
8. Yu, B., Xu, Z.B.: A comparative study for content-based dynamic spam classification
using four machine learning algorithms. Knowl.-Based Syst. 21(4), 355–362 (2008)
Post-processing of Wind-Speed Forecasts Using the Extended Perfect Prog Method with Polynomial Neural Networks to Elicit PDE Models

L. Zjavka et al.
1 Introduction
Continual fluctuations in wind speed arise from complex chaotic interaction processes in the large-scale air circulation, primarily induced by global pressure and temperature differences. Numerical Weather Prediction (NWP) systems solve sets of primitive differential equations, simplifying the atmosphere's behavior to ideal gas flow, starting from initial conditions to predict the time-change of a variable in each central grid cell with respect to its neighbors. NWP models cannot reliably recognize conditions near the ground, as they simulate multi-layer physical processes in …
Artificial Neural Networks (ANN) are simple 1- or 2-layer structures which cannot model the complex patterns in local weather described by a mass of data. ANN require data pre-processing to significantly reduce the number of input variables, which usually leads to over-simplified models. The number of parameters in polynomial regression, in contrast to ANN, grows exponentially with the number of variables. Polynomial Neural Networks (PNN) decompose the general connections between input and output variables expressed by the Kolmogorov-Gabor polynomial (1).
$$Y = a_0 + \sum_{i=1}^{n} a_i x_i + \sum_{i=1}^{n}\sum_{j=1}^{n} a_{ij} x_i x_j + \sum_{i=1}^{n}\sum_{j=1}^{n}\sum_{k=1}^{n} a_{ijk} x_i x_j x_k + \ldots \tag{1}$$

n – number of input variables $x_i$; $a_i, a_{ij}, a_{ijk}, \ldots$ – polynomial parameters
$$y = a_0 + a_1 x_i + a_2 x_j + a_3 x_i x_j + a_4 x_i^2 + a_5 x_j^2 \tag{2}$$

$x_i, x_j$ – input variables of polynomial neuron nodes
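As an illustration, the sketch below evaluates and least-squares-fits the node polynomial (2); the coefficient values and sample data are placeholders for illustration, not part of the GMDH training procedure described here:

```python
# Evaluate and fit the 2-variable GMDH node polynomial (2).
# Sample data and target coefficients are hypothetical.
import numpy as np

def gmdh_node(xi, xj, a):
    """Quadratic GMDH node polynomial: a holds the 6 coefficients a0..a5."""
    return (a[0] + a[1] * xi + a[2] * xj + a[3] * xi * xj
            + a[4] * xi**2 + a[5] * xj**2)

rng = np.random.default_rng(0)
xi, xj = rng.normal(size=200), rng.normal(size=200)
y = 1 + 2 * xi - xj + 0.5 * xi * xj + rng.normal(0, 0.1, 200)  # synthetic target

# Least-squares fit of the node parameters on the design matrix of (2).
A = np.column_stack([np.ones_like(xi), xi, xj, xi * xj, xi**2, xj**2])
a, *_ = np.linalg.lstsq(A, y, rcond=None)
print(gmdh_node(0.5, -0.3, a))
```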
The proposed method is a 2-stage process analogous to the PP approach. In the 1st stage, training is performed on real observations to derive the PDE model (Fig. 1, left), which is then applied to the forecasts to calculate 24-h output predictions (Fig. 1, right) at the corresponding times in the 2nd PP stage. Models formed with data from the last few days and applied on a prediction day with similar data patterns usually give more accurate predictions than models developed with large data sets covering various weather conditions. The 1st PP stage is extended into 2 steps:
1. The optimal number of training days, used to elicit the prediction model, is initially estimated by an assistant test model.
2. The parameters of the prediction model are adapted on inputs->output data samples to process forecasts of the training input variables.
[Fig. 1 diagram: observed input time-series (temperature, relative humidity, pressure, wind speed & direction) train the D-PNN against the desired wind speed over the last 2–x days, with 6-h tests on the last and previous day; the resulting model then post-processes 24-h NWP forecast series into wind-speed predictions.]

Fig. 1. D-PNN is trained with spatial data observations (blue, left) over the last few days to form daily prediction PDE models that process forecast series of the input variables (red, right)
The assistant model is an additional D-PNN test model, formed before the 1st PP step with observations from an increasing number of previous days (2, 3, …, x). It processes NWP data from the previous day and regularly compares its output with the latest 6-h observations, in order to detect a change in the atmosphere state over the previous days, which determines the training period length. The number of data samples giving the minimal test error is used to form the daily prediction model.
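A minimal sketch of this window-selection step, with a generic regressor standing in for the D-PNN and hypothetical data-access helpers (load_window, load_latest_6h):

```python
# Assistant-model idea: try an increasing number of training days, test each
# candidate on the latest 6-h observations, keep the window with minimal error.
# LinearRegression is a stand-in for the D-PNN; the loaders are hypothetical.
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

def pick_training_window(load_window, load_latest_6h, max_days=10):
    X_test, y_test = load_latest_6h()                 # latest 6-h observations
    best_days, best_err = None, float("inf")
    for days in range(2, max_days + 1):
        X_tr, y_tr = load_window(days)                # last `days` of observations
        model = LinearRegression().fit(X_tr, y_tr)    # D-PNN stand-in
        err = mean_squared_error(y_test, model.predict(X_test))
        if err < best_err:
            best_days, best_err = days, err
    return best_days, best_err                        # window for the daily model
```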
The D-PNN models can post-process and improve forecasts in more or less settled weather, which tends to prevail and whose patterns do not change fundamentally within short time intervals. The model derived from the last few days is not valid in the case of a significant overnight change in the atmosphere state, because the spatial data relations differ. If the assistant model cannot obtain an acceptable test error, defined in consideration of failed forecast revisions in the previous days, the post-processing model is flawed and should not be applied to the current NWP data.
D-PNN defines and substitutes for the general linear PDE (3), whose exact form is unknown and which can describe uncertain dynamic systems. It decomposes the n-variable general PDE into 2nd-order specific sub-PDEs in the PNN nodes, to model the unknown functions $u_k$ whose sum gives the complete n-variable model u (3).
$$a + b u + \sum_{i=1}^{n} c_i \frac{\partial u}{\partial x_i} + \sum_{i=1}^{n}\sum_{j=1}^{n} d_{ij} \frac{\partial^2 u}{\partial x_i \partial x_j} + \ldots = 0, \qquad u = \sum_{k=1}^{\infty} u_k \tag{3}$$
Considering 2 input variables in the PNN nodes, the derivatives of the 2nd-order PDE (4) correspond to the variables of the GMDH polynomial (2). This type of PDE is most often used to model the non-linearities of physical or natural systems.
$$\sum \frac{\partial u_k}{\partial x_1},\quad \sum \frac{\partial u_k}{\partial x_2},\quad \sum \frac{\partial^2 u_k}{\partial x_1^2},\quad \sum \frac{\partial^2 u_k}{\partial x_1 \partial x_2},\quad \sum \frac{\partial^2 u_k}{\partial x_2^2} \tag{4}$$

$u_k$ – node partial sum functions of an unknown separable function u
The polynomial conversion of the 2nd-order PDE (4) using procedures of Operational Calculus is based on the proposition for the Laplace transform (L-transform) of a function's nth derivatives, taking the initial conditions into consideration (5).
$$L\{f^{(n)}(t)\} = p^n F(p) - \sum_{i=1}^{n} p^{\,n-i} f^{(i-1)}(0), \qquad L\{f(t)\} = F(p) \tag{5}$$

$f(t), f'(t), \ldots, f^{(n)}(t)$ – originals continuous on $(0^+, \infty)$; $p, t$ – complex and real variables
This polynomial substitution for the nth derivatives of the function f(t) in an Ordinary Differential Equation (ODE) leads to algebraic equations, from which the L-transform image F(p) of the unknown function f(t) is separated in the form of a pure rational function (6). These fractions represent the L-transforms F(p), expressed with the complex variable p, so the inverse L-transformation is applied to them to obtain the original functions f(t) of the real variable t (6) described by the ODE.
$$F(p) = \frac{P(p)}{Q(p)} = \sum_{k=1}^{n} \frac{P(a_k)}{Q_k(a_k)} \cdot \frac{1}{p - a_k}, \qquad f(t) = \sum_{k=1}^{n} \frac{P(a_k)}{Q_k(a_k)}\, e^{a_k t} \tag{6}$$

$a_k$ – simple real roots of the multinomial Q(p); F(p) – L-transform image
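As a worked illustration of (6), the sketch below inverts a pure rational L-transform image numerically via the Heaviside expansion; for simple roots, Q'(a_k) plays the role of Q_k(a_k) above, and the example polynomials are placeholders:

```python
# Inverse L-transformation of F(p) = P(p)/Q(p) with simple real roots, via
# the Heaviside expansion f(t) = sum_k P(a_k)/Q'(a_k) * exp(a_k * t).
import numpy as np

def inverse_laplace_rational(P, Q, t):
    """P, Q: coefficient lists (highest power first), deg P < deg Q,
    assuming simple real roots of Q."""
    roots = np.roots(Q)                    # the roots a_k of Q(p)
    dQ = np.polyder(np.poly1d(Q))          # Q'(p)
    Pp = np.poly1d(P)
    return sum(Pp(a) / dQ(a) * np.exp(a * t) for a in roots)

# Example: F(p) = 1/(p^2 + 3p + 2)  ->  f(t) = e^{-t} - e^{-2t}
t = np.linspace(0, 5, 6)
print(inverse_laplace_rational([1.0], [1.0, 3.0, 2.0], t))
```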
[Fig. 2 diagram: a block with inputs x1, x2 feeds derivative neurons that solve 2nd-order sub-PDEs and a GMDH polynomial; the block outputs Composite Terms (CT).]

Fig. 2. A block of derivative neurons – 2nd-order sub-PDE solutions in the PNN nodes
Each block contains a GMDH output polynomial (2), which is used to form neurons that can be included in the total network output sum of a general PDE solution.
$$F\!\left(x_1, x_2, u, \frac{\partial u}{\partial x_1}, \frac{\partial u}{\partial x_2}, \frac{\partial^2 u}{\partial x_1^2}, \frac{\partial^2 u}{\partial x_1 \partial x_2}, \frac{\partial^2 u}{\partial x_2^2}\right) = 0 \tag{7}$$

where $F(x_1, x_2, u, p, q, r, s, t)$ is a function of 8 variables
When using 2 input variables in the PNN nodes, the 2nd-order PDE can be expressed as an equality of 8 variables (7), including derivative terms formed with respect to the variables corresponding to the GMDH polynomial (2).
$$y_1 = w_1\, \frac{a_0 + a_1 x_1 + a_2 x_2 + a_3 x_1 x_2 + a_4\,\mathrm{sig}(x_1^2) + a_5\,\mathrm{sig}(x_2^2)}{b_0 + b_1 x_1}\, e^{\varphi} \tag{8}$$

$$y_3 = w_3\, \frac{a_0 + a_1 x_1 + a_2 x_2 + a_3 x_1 x_2 + a_4\,\mathrm{sig}(x_1^2) + a_5\,\mathrm{sig}(x_2^2)}{b_0 + b_1 x_2 + b_2\,\mathrm{sig}(x_2^2)}\, e^{\varphi} \tag{9}$$

$$y_5 = w_5\, \frac{a_0 + a_1 x_1 + a_2 x_2 + a_3 x_1 x_2 + a_4\,\mathrm{sig}(x_1^2) + a_5\,\mathrm{sig}(x_2^2)}{b_0 + b_1 x_1 + b_2 x_1^2 + b_3 x_1 x_2}\, e^{\varphi} \tag{10}$$

$\varphi = \mathrm{arctg}(x_1/x_2)$ – phase representation of the 2 input variables $x_1, x_2$
$a_i, b_i$ – polynomial parameters; $w_i$ – weights; sig – sigmoidal transformation
Each D-PNN block can form 5 simple derivative neurons, with respect to the single variables $x_1, x_2$ (8), the squared variables $x_1^2, x_2^2$ (9), and the combination $x_1 x_2$ (10), which can solve specific 2nd-order sub-PDEs in the PNN nodes (7). The Root Mean Squared Error (RMSE) is calculated in each iteration step of training and testing to select and optimize the parameters (11).
$$RMSE = \sqrt{\frac{\sum_{i=1}^{M} \big(Y_i^d - Y_i\big)^2}{M}} \tag{11}$$

$Y_i^d$ – desired output values; $Y_i$ – model output values; M – number of data samples
Multi-layer networks form composite functions (Fig. 2). The blocks of the 2nd and subsequent hidden layers can produce additional Composite Terms (CT), equivalent to the neurons. The CTs substitute for the node sub-PDEs with respect to the variables of back-connected blocks of the previous layers, according to the partial derivation rules (13) of a composite function (12).
$$\frac{\partial F}{\partial x_k} = \sum_{i=1}^{m} \frac{\partial f(z_1, z_2, \ldots, z_m)}{\partial z_i} \cdot \frac{\partial \varphi_i(X)}{\partial x_k}, \qquad k = 1, \ldots, n \tag{13}$$
The 2nd-layer blocks can form and select one of the neurons, or one of 10 additional CTs using applicable neurons of the 2 blocks in the 1st layer in the products (13), for composite sub-PDEs (14). The 3rd-layer blocks can select from an additional 10 + 20 CTs, using neurons of the 2 and 4 back-connected blocks in the previous 2nd and 1st layers, etc. The number of possible block CTs doubles with each joined preceding layer (Fig. 3). The L-transform image F(p) is expressed in complex form, so the phase of the complex representation of the 2 variables is applied in the inverse L-transformation.
$$y_{3p} = w_{3p}\, \frac{a_0 + a_1 x_{21} + a_2 x_{22} + a_3 x_{21} x_{22} + a_4 x_{21}^2 + a_5 x_{22}^2}{x_{21}}\, e^{\varphi_{31}} \cdot c_{21}\, \frac{x_{11}}{b_0 + b_1 x_{12}}\, e^{\varphi_{11}} \tag{15}$$

$y_{kp}$ – p-th Composite Term (CT) output; $\varphi_{21} = \mathrm{arctg}(x_{11}/x_{13})$, $\varphi_{31} = \mathrm{arctg}(x_{21}/x_{22})$
$c_{kl}$ – complex representation of the l-th block inputs $x_i, x_j$ in the k-th layer
The CTs include the L-transformed fraction of the external-function sub-PDE (left) in the starting block, and the selected leaf neuron (right) for the internal function from a block in the preceding layers (14). The complex representation of the 2 inputs (16) of blocks in the CTs' inter-connected layers can substitute for the node internal-function PDEs (15). The pure rational fractions (6) correspond to the amplitude r (radius) in Euler's notation of a complex number c (16) in polar coordinates.
[Fig. 3 diagram: a network of blocks over inputs x1 … xn (B = max. number of blocks in all layers) with backward connections; the total output Y = y1 + y2 + … + yk + … sums the L-transformed fraction terms, i.e. the node sub-PDE solutions substituting for the general PDE.]

Fig. 3. D-PNN selects from possible 2-variable combination blocks in each hidden layer