Nonlinear Dynamic Modeling of Physiological Systems - Marmarelis


Nonlinear Dynamic
Modeling of
Physiological Systems
IEEE Press
445 Hoes Lane
Piscataway, NJ 08855

IEEE Press Editorial Board


Stamatios V. Kartalopoulos, Editor in Chief

M. Akay          R. J. Herrick    M. S. Newman
R. J. Baker      D. Kirk          M. Padgett
J. E. Brewer     R. Leonardi      W. D. Reeve
M. E. El-Hawary  G. Zobrist       S. Tewksbury

Kenneth Moore, Director of Book and Information Services (BIS)


Catherine Faduska, Senior Acquisitions Editor
Christina Kuhnen, Associate Acquisitions Editor

Technical Reviewers
Nonlinear Dynamic
Modeling of
Physiological Systems

Vasilis Z. Marmarelis

IEEE Engineering in Medicine
and Biology Society, Sponsor

IEEE Press Series on Biomedical Engineering


Metin Akay, Series Editor
IEEE Press

Wiley-Interscience
A John Wiley & Sons, Inc., Publication
Copyright © 2004 by the Institute of Electrical and Electronics Engineers, Inc. All rights reserved.

Published by John Wiley & Sons, Inc., Hoboken, New Jersey.


Published simultaneously in Canada.

No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any
means, electronic, mechanical, photocopying, recording, scanning or otherwise, except as permitted under
Section 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the
Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center,
Inc., 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 646-8600, or on the web at
www.copyright.com. Requests to the Publisher for permission should be addressed to the Permissions
Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-
6008.

Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in
preparing this book, they make no representation or warranties with respect to the accuracy or completeness of
the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a
particular purpose. No warranty may be created or extended by sales representatives or written sales materials.
The advice and strategies contained herein may not be suitable for your situation. You should consult with a
professional where appropriate. Neither the publisher nor author shall be liable for any loss of profit or any
other commercial damages, including but not limited to special, incidental, consequential, or other damages.

For general information on our other products and services please contact our Customer Care Department
within the U.S. at 877-762-2974, outside the U.S. at 317-572-3993 or fax 317-572-4002.

Wiley also publishes its books in a variety of electronic formats. Some content that appears in print, however,
may not be available in electronic format.

Library of Congress Cataloging-in-Publication Data is available.

ISBN 0-471-46960-2

Printed in the United States of America.

10 9 8 7 6 5 4 3 2 1
To my father Zissis and my mother Elpida
for instilling in me the love for learning

To my brother Panos for guiding my first steps
and for being my sage adviser all my life

To my love Melissa and my sons Zissis and Myrl
for being the inspiration and joy in my life
Contents

Prologue xiii

1 Introduction 1
1.1 Purpose of this Book 1
1.2 Advocated Approach 4
1.3 The Problem of System Modeling in Physiology 6
1.3.1 Model Specification and Estimation 10
1.3.2 Nonlinearity and Nonstationarity 12
1.3.3 Definition of the Modeling Problem 13
1.4 Types of Nonlinear Models of Physiological Systems 13
Example 1.1. Vertebrate Retina 15
Example 1.2. Invertebrate Photoreceptor 18
Example 1.3. Volterra Analysis of Riccati Equation 19
Example 1.4. Glucose-Insulin Minimal Model 21
Example 1.5. Cerebral Autoregulation 22
1.5 Deductive and Inductive Modeling 24
Historical Note #1: Hippocratic and Galenic Views of Integrative Physiology 26

2 Nonparametric Modeling 29
2.1 Volterra Models 31
2.1.1 Examples of Volterra Models 37
Example 2.1. Static Nonlinear System 37
Example 2.2. L-N Cascade System 38
Example 2.3. L-N-M "Sandwich" System 39
Example 2.4. Riccati System 40


2.1.2 Operational Meaning of the Volterra Kernels 41


Impulsive Inputs 42
Sinusoidal Inputs 43
Remarks on the Meaning of Volterra Kernels 45
2.1.3 Frequency-Domain Representation of the Volterra Models 45
2.1.4 Discrete-Time Volterra Models 47
2.1.5 Estimation of Volterra Kernels 49
Specialized Test Inputs 50
Arbitrary Inputs 52
Fast Exact Orthogonalization and Parallel-Cascade Methods 55
Iterative Cost-Minimization Methods for Non-Gaussian Residuals 55
2.2 Wiener Models 57
2.2.1 Relation between Volterra and Wiener Models 60
The Wiener Class of Systems 62
Examples of Wiener Models 63
Comparison of Volterra/Wiener Model Predictions 64
2.2.2 Wiener Approach to Kernel Estimation 67
2.2.3 The Cross-Correlation Technique for Wiener Kernel Estimation 72
Estimation of h0 73
Estimation of h1(τ) 73
Estimation of h2(τ1, τ2) 74
Estimation of h3(τ1, τ2, τ3) 75
Some Practical Considerations 77
Illustrative Example 78
Frequency-Domain Estimation of Wiener Kernels 78
2.2.4 Quasiwhite Test Inputs 80
CSRS and Volterra Kernels 84
The Diagonal Estimability Problem 85
An Analytical Example 86
Comparison of Model Prediction Errors 88
Discrete-Time Representation of the CSRS Functional Series 89
Pseudorandom Signals Based on m-Sequences 89
Comparative Use of GWN, PRS, and CSRS 92
2.2.5 Apparent Transfer Function and Coherence Measurements 93
Example 2.5. L-N Cascade System 96
Example 2.6. Quadratic Volterra System 97
Example 2.7. Nonwhite Gaussian Inputs 98
Example 2.8. Duffing System 98
Concluding Remarks 99
2.3 Efficient Volterra Kernel Estimation 100
2.3.1 Volterra Kernel Expansions 101
Model Order Determination 104
2.3.2 The Laguerre Expansion Technique 107
Illustrative Examples 112
2.3.3 High-Order Volterra Modeling with Equivalent Networks 122
2.4 Analysis of Estimation Errors 125
2.4.1 Sources of Estimation Errors 125

2.4.2 Estimation Errors Associated with the Cross-Correlation Technique 127
Estimation Bias 128
Estimation Variance 130
Optimization of Input Parameters 131
Noise Effects 134
Erroneous Scaling of Kernel Estimates 136
2.4.3 Estimation Errors Associated with Direct Inversion Methods 137
2.4.4 Estimation Errors Associated with Iterative Cost-Minimization Methods 139
Historical Note #2: Vito Volterra and Norbert Wiener 140

3 Parametric Modeling 145


3.1 Basic Parametric Model Forms and Estimation Procedures 146
3.1.1 The Nonlinear Case 150
3.1.2 The Nonstationary Case 152
3.2 Volterra Kernels of Nonlinear Differential Equations 153
Example 3.1. The Riccati Equation 157
3.2.1 Apparent Transfer Functions of Linearized Models 158
Example 3.2. Illustrative Example 160
3.2.2 Nonlinear Parametric Models with Intermodulation 161
3.3 Discrete-Time Volterra Kernels of NARMAX Models 164
3.4 From Volterra Kernel Measurements to Parametric Models 167
Example 3.3. Illustrative Example 169
3.5 Equivalence Between Continuous and Discrete Parametric Models 171
Example 3.4. Illustrative Example 175
3.5.1 Modular Representation 177

4 Modular and Connectionist Modeling 179


4.1 Modular Form of Nonparametric Models 179
4.1.1 Principal Dynamic Modes 180
Illustrative Examples 186
4.1.2 Volterra Models of System Cascades 191
The L-N-M, L-N, and N-M Cascades 194
4.1.3 Volterra Models of Systems with Lateral Branches 198
4.1.4 Volterra Models of Systems with Feedback Branches 200
4.1.5 Nonlinear Feedback Described by Differential Equations 202
Example 1. Cubic Feedback Systems 204
Example 2. Sigmoid Feedback Systems 209
Example 3. Positive Nonlinear Feedback 213
Example 4. Second-Order Kernels of Nonlinear Feedback Systems 215
Nonlinear Feedback in Sensory Systems 216
Concluding Remarks on Nonlinear Feedback 220
4.2 Connectionist Models 223
4.2.1 Equivalence between Connectionist and Volterra Models 223
Relation with PDM Modeling 230
Illustrative Examples 232

4.2.2 Volterra-Equivalent Network Architectures for Nonlinear System Modeling 235
Equivalence with Volterra Kernels/Models 238
Selection of the Structural Parameters of the VEN Model 238
Convergence and Accuracy of the Training Procedure 240
The Pseudomode-Peeling Method 244
Nonlinear Autoregressive Modeling (Open-Loop) 246
4.3 The Laguerre-Volterra Network 246
Illustrative Example of LVN Modeling 249
Modeling Systems with Fast and Slow Dynamics (LVN-2) 251
Illustrative Examples of LVN-2 Modeling 255
4.4 The VWM Model 260

5 A Practitioner's Guide 265


5.1 Practical Considerations and Experimental Requirements 265
5.1.1 System Characteristics 266
System Bandwidth 266
System Memory 267
System Dynamic Range 267
System Linearity 268
System Stationarity 268
System Ergodicity 268
5.1.2 Input Characteristics 269
5.1.3 Experimental Characteristics 270
5.2 Preliminary Tests and Data Preparation 272
5.2.1 Test for System Bandwidth 272
5.2.2 Test for System Memory 272
5.2.3 Test for System Stationarity and Ergodicity 273
5.2.4 Test for System Linearity 274
5.2.5 Data Preparation 275
5.3 Model Specification and Estimation 276
5.3.1 The MDV Modeling Methodology 277
5.3.2 The VEN/VWM Modeling Methodology 278
5.4 Model Validation and Interpretation 279
5.4.1 Model Validation 279
5.4.2 Model Interpretation 281
Interpretation of Volterra Kernels 281
Interpretation of the PDM Model 282
5.5 Outline of Step-by-Step Procedure 283
5.5.1 Elaboration of the Key Step #5 284

6 Selected Applications 285


6.1 Neurosensory Systems 286
6.1.1 Vertebrate Retina 287
6.1.2 Invertebrate Retina 296
6.1.3 Auditory Nerve Fibers 302
6.1.4 Spider Mechanoreceptor 307
6.2 Cardiovascular System 320

6.3 Renal System 333


6.4 Metabolic-Endocrine System 342

7 Modeling of Multiinput/Multioutput Systems 359


7.1 The Two-Input Case 360
7.1.1 The Two-Input Cross-Correlation Technique 362
7.1.2 The Two-Input Kernel-Expansion Technique 362
7.1.3 Volterra-Equivalent Network Models with Two Inputs 364
Illustrative Example 366
7.2 Applications of Two-Input Modeling to Physiological Systems 369
7.2.1 Motion Detection in the Invertebrate Retina 369
7.2.2 Receptive Field Organization in the Vertebrate Retina 370
7.2.3 Metabolic Autoregulation in Dogs 378
7.2.4 Cerebral Autoregulation in Humans 380
7.3 The Multiinput Case 389
7.3.1 Cross-Correlation-Based Method for Multiinput Modeling 390
7.3.2 The Kernel-Expansion Method for Multiinput Modeling 393
7.3.3 Network-Based Multiinput Modeling 393
7.4 Spatiotemporal and Spectrotemporal Modeling 395
7.4.1 Spatiotemporal Modeling of Retinal Cells 398
7.4.2 Spatiotemporal Modeling of Cortical Cells 401

8 Modeling of Neuronal Systems 407


8.1 A General Model of Membrane and Synaptic Dynamics 408
8.2 Functional Integration in the Single Neuron 414
8.2.1 Neuronal Modes and Trigger Regions 417
Illustrative Examples 427
8.2.2 Minimum-Order Modeling of Spike-Output Systems 432
The Reverse-Correlation Technique 432
Minimum-Order Wiener Models 435
Illustrative Example 439
8.3 Neuronal Systems with Point-Process Inputs 439
8.3.1 The Lag-Delta Representation of P-V or P-W Kernels 445
8.3.2 The Reduced P-V or P-W Kernels 446
8.3.3 Examples from the Hippocampal Formation 450
Single-Input Stimulation in Vivo and Cross-Correlation Technique 450
Single-Input Stimulation in Vitro and Laguerre-Expansion Technique 455
Dual-Input Stimulation in the Hippocampal Slice 457
Nonlinear Modeling of Synaptic Dynamics 461
8.4 Modeling of Neuronal Ensembles 463

9 Modeling of Nonstationary Systems 467


9.1 Quasistationary and Recursive Tracking Methods 468
9.2 Kernel Expansion Method 469
9.2.1 Illustrative Example 474
9.2.2 A Test of Nonstationarity 475

9.2.3 Linear Time-Varying Systems with Arbitrary Inputs 479


9.3 Network-Based Methods 480
9.3.1 Illustrative Examples 481
9.4 Applications to Nonstationary Physiological Systems 484

10 Modeling of Closed-Loop Systems 489


10.1 Autoregressive Form of Closed-Loop Model 490
10.2 Network Model Form of Closed-Loop Systems 491

Appendix I Function Expansions 495

Appendix II Gaussian White Noise 499

Appendix III Construction of the Wiener Series 503

Appendix IV Stationarity, Ergodicity, and Autocorrelation Functions of Random Processes 505

References 507

Index 535
Prologue

Although this book has the rather specific purpose of providing methodological tools for
mathematical modeling of physiological systems, the broad subject matter of "system
modeling" has wide-ranging scientific, epistemological, and philosophical implications.
At the heart of it is the primordial urge of the human mind to understand the surrounding
world in a way that can be articulated (verbally or symbolically) and communicated to
others. The process by which experience and observation are amalgamated into a capsule
of knowledge, "the model," is the motor that drives the evolution of scientific thought.
Thus, the model can be viewed as a conceptual articulation of distilled knowledge that
can be effectively communicated.
The conceptual and mathematical elaboration of the modeling process underpins the
development of natural sciences and the articulation of "natural laws." In the second half
of the 20th century, it gave birth to "cybernetics" and ushered in the "information age,"
on the crest of which we are currently riding. It is not hyperbole to state that the rapid
pace of scientific and technological developments of the last 40 years would not have
been possible had it not been for the strides that cybernetics and systems science made,
not only as disciplines in their own right but also, and most importantly, as a new way
of thinking that expands the scope of scientific inquiry.
In this general context, the present book aspires to make a contribution to the state of
the art in quantitative analysis of data collected in physiological systems for the purpose
of constructing dynamic models (mathematical and computational) that can benefit our
understanding of physiological function. The modeling problem is often formulated at
first as the search for the stimulus-response dynamic relationships imprinted in broadband
time-series data; however, other (equivalent) mathematical formalisms can also be
sought. The driving goal of this endeavor is to develop reliable mathematical models of
physiological function under natural operating conditions. This entails formidable chal-
lenges arising from the nonlinear dynamic (and often nonstationary) characteristics of
physiological systems and sets an exacting standard for this book, unmet by previous
efforts. In attempting to reach this ambitious goal, the accumulated contributions of many
investigators will be used that have built on the foundation of the seminal Volterra and
Wiener theories.
It should be evident from this preamble that I view this undertaking with daring ambi-
tion and sobering responsibility, rooted in a deep-seated sense that the time has come for a
"great leap forward" in systems physiology. The scientific and technological milieu is
ripe, and there is an increasing recognition (and pressing need) that long-standing obsta-
cles should be removed and the full potential of present-day scientific means should be
utilized. Whether we succeed or not, this promises to be an exciting journey. But succeed
we must.
At this point, I would like to turn to a more personal reflection on the course of events
that led to the writing of this book and to give proper credit to the people who had a piv-
otal influence on my work and way of thinking.
Thirty years ago today, Sgt. Pepper had already taught the band to play and I arrived at
the California Institute of Technology (Caltech) from Athens, Greece as a new graduate
student to pursue a Ph.D. in Engineering Science. It was a time of high anxiety and even
higher expectation: anxiety to prove that I could be successful in the highly competitive
environment of one of the most renowned research universities in the world, and expecta-
tion because the outstanding academic environment at Caltech offered a unique opportu-
nity for truly exciting research. My research interests focused on the emerging science of
systems and cybernetics. Having decided that engineering systems were not exciting
enough for my ambitious mind, I explored the research prospects for biological systems
or socioeconomic systems because of their potential impact on human life or society, re-
spectively. Soon, I realized that the prevailing mindset in the social sciences (including
economics) was somewhat stifling and restrictive relative to my socioeconomic views,
and I decided to focus on the study of living systems.
Thus, I joined the research group of Prof. Gilbert McCann that was studying the early
stages of visual systems from the cybernetic viewpoint of input-output signal transforma-
tion. The essential issue was the development of practicable methodologies for obtaining
accurate/reliable mathematical models of input-output transformations in the visual sys-
tem from experimental data in a nonlinear dynamic context. It was a great challenge and a
fundamental scientific issue that could have tremendous implications for advancing our
knowledge in numerous fields (including, but not limited to, biological systems).
An additional motivating factor was the fact that my (only) brother, Panos, was the ris-
ing young star in this group (he had just completed his Ph.D. on the same subject) and
was playing a pivotal role in fusing and leading the ambitious efforts of three research
groups (McCann's, Naka's, and Fender's) to "unlock the mysteries of visual information
processing." In fact, Panos had initiated the primary thrust of this effort through his Ph.D.
work on the application of the Volterra-Wiener approach to visual system modeling. Al-
though Panos left the following year to take a faculty position at Carnegie-Mellon
University, the initial comfort of a "family environment" provided a welcome level of se-
curity and valuable initial guidance through frequent interactions. This was one of the
most creative and intellectually enjoyable years of my life.
In addition to Panos, the generous support provided by Prof. McCann (my Ph.D. advi-
sor) generated an exceptional academic environment and helped motivate my productive
engagement with the core research of the Bioinformation Systems group. The group had
attained favorable national and international visibility, thanks to Panos' pioneering work
with his close associate, Prof. Ken Naka, in conjunction with Prof. McCann's visionary
leadership. A dozen graduate students and several collaborating faculty (chief among
them, Profs. Derek Fender and Thomas Caughey) formed a vibrant peer community in the
best tradition of Caltech's research excellence.
To the surprise of everyone around me, Prof. McCann demonstrated an uncharacteris-
tic level of support for my research efforts and elevated me in status among my peers.
This and the fact that I was Panos' "heir apparent" caused some (understandable) envy
among my fellow graduate students. However, the perfect GPA I achieved on a higher
than normal course load (in trying to make up for a delayed arrival from Greece) seemed
to vindicate me in the eyes of my fellow graduate students and established me as a "first
among equals." These events played a pivotal role in my subsequent development as they
endowed me with a firm sense of self-worth and the potent confidence of nearly limitless
achievement.
As exaggerated as this view may have been, it played a constructive role in propelling
my research efforts beyond the ordinary scope of a graduate student. As a result, my
Ph.D. Thesis and related work received considerable recognition and helped produce the
first book on the subject, coauthored with Panos as the senior partner. The book was pub-
lished two years after my graduation and received considerable acclaim, while Panos
(who was primarily responsible for its success through his outstanding research results
and his clever strategy of dissemination) was completing his medical studies and was
changing his career path toward medical research and practice. This was a career change
that appeared gratifying to him but represented the loss of a brilliant intellect for our field
of research.
Thus, I was left in the early 1980s as the sole beneficiary of a successful book (soon
translated into Russian and Chinese) and the "only Marmarelis brother" in the field that
was facing changing tides in national research priorities (a combination of Reagan's de-
fense build-up with a sharp turn toward reductionist molecular biology and away from in-
tegrative systems physiology). Ironically, the Bioinformation Systems group (headed by
Prof. McCann) that spearheaded this promising new approach, was dissolved by a school
administration lacking foresight, and I found welcome refuge among kindred spirits on
the faculty of the neighboring University of Southern California (USC) in the Fall of
1978. There, a pioneering Department of Biomedical Engineering had formed under the
enlightened leadership of Prof. Fred Grodins to pursue the grand vision of modeling in
systems physiology.
My initial contact with USC was Prof. George Moore (a brilliant neurophysiologist
with an endearing personality and sharp intellect attracted by the systems viewpoint) who
had served as the series Editor for Plenum on the book I coauthored with Panos. Because
of Caltech's elitist mindset, I initially felt like "stepping down" when I moved to USC.
However, I soon realized that the Biomedical Engineering faculty at USC were at least as
good as their Caltech counterparts (in spite of the difference in reputation and the facili-
ties of the two schools). In fact, among my new colleagues I found the impressive intellects
of Donald Marsh, Eugene Yates, George Bekey, and Robert Kalaba (a close associate of
the late Richard Bellman) who provided, along with Fred Grodins and George Moore, a
very stimulating academic environment of the highest caliber. Yates was leading at the
time an NIH-funded Center for Systems Physiology that provided an immediate "home"
for anchoring my initial research program. I was soon joined by some of the brightest of
the younger generation of systems physiologists: David D'Argenio, Michael Khoo, and
Ted Berger.

In 1985, I was able to establish my own NIH-funded Center, the Biomedical Simula-
tions Resource (BMSR), which is dedicated to modeling and simulation of physiological
systems. The BMSR has remained active through five cycles of continuous multimillion
dollar funding and has fostered high-caliber research in various areas of physiological
system modeling, including the work of my close associates, Professors D'Argenio,
Khoo, and Berger (with whom I have interacted extensively). The BMSR has been the
primary base of research support for extending the work started at Caltech and bringing it
to a level that vindicates the ambitious aspirations of the 1970s. This book represents the
culmination of this thirty-year effort and seeks to provide the critical link onto the next
generation of the "ambitious new breed of systems physiologists" to whom Panos dedi-
cated our first book.
The first book was pioneering (and somewhat controversial) as it broke new ground
and challenged much of the established thinking. Its pioneering nature raised many new
questions that were subject to intense debate. This book represents a sequel that seeks to
answer many of these "second-generation" questions, resolve longstanding arguments re-
garding the applicability of this approach, and put to rest much of the surrounding "con-
troversy."
Although this book is primarily methodological in its focus, it also addresses the im-
portant issue of physiological interpretability of nonlinear dynamic models through spe-
cific illustrative examples and places the subject matter in the historical context of the
evolution of physiological science.
It is obvious from the foregoing that I owe deep gratitude to my brother Panos for his
sage advice and enlightening mentorship and to my Ph.D. advisor Gilbert McCann for his
generous support and strong confidence in my abilities. Without their pivotal contribu-
tions, this field would not have developed to the present promise of revolutionizing sys-
tems physiology and I would not have been afforded the opportunity of exciting and grat-
ifying research accomplishments. Furthermore, I must acknowledge the constructive
influence of my senior colleagues at Caltech: Ken Naka, Derek Fender, and Thomas
Caughey, as well as the influence and valuable support of my colleagues at USC: David
D'Argenio, Ted Berger, Michael Khoo, George Moore, Don Marsh, Gene Yates, Bob
Kalaba, George Bekey, and Fred Grodins. I consider myself fortunate to have also re-
ceived strong support from the broader peer community throughout my research efforts,
and especially from my distinguished colleagues: Andrew French, Jose Segundo, David
Brillinger, Larry Stark, Ted Lewis, Aage Moller, Dennis O'Leary, Bob Sclabassi, Mark
Citron, Bob Emerson, Stan Klein, Berj Bardakian, and Rob Kearney, to name but a few. I
was also fortunate to have many loyal and productive graduate students and research
staff, especially Spiros Courellis and Georgios Mitsis. Valuable for the preparation of this
manuscript has been the earnest and competent assistance of the BMSR Administrative
Coordinator, Marcos Briano. Last, but not least, this undertaking would not have been
possible without the most precious and irreplaceable moral support of my wife Melissa
and of my twin sons Zissis and Myrl, the breath of my life.
I would like to close the Prologue by borrowing Isaac Newton's closing sentence in his
Preface to the Principia: "I heartily beg that what I have here done may be read with for-
bearance; and that my labors in a subject so difficult may be examined, not so much with
the view to censure, as to remedy their defects."

VASILIS Z. MARMARELIS
Los Angeles, June 2003
1
Introduction

In medicine, one must pay attention not to plausible theorizing (λογισμός) but to experience
and reason (λόγος) together. I agree that theorizing is to be approved, provided that it is
based on facts and is systematically induced from what is observed; but conclusions drawn
by the unaided reason can hardly be serviceable, only those drawn from observed facts.

-Hippocrates, "Precepts," Athens, 5th Century B.C.

1.1 PURPOSE OF THIS BOOK

The purpose of this book is to bring to the attention of the biomedical community the fact
that an effective methodology exists for quantifying the dynamic interrelationships
among physiological variables of interest from natural observations (data).
The book presents the conceptual framework and the mathematical foundation of this
methodology that render it general in its application and rigorous in its approach. This
methodology yields mathematical models of the dynamic interrelationships among the
observed variables in a nonlinear and nonstationary context that is appropriate for physi-
ological systems. Unlike many previous approaches, the advocated approach resists the
temptation of simplifying the model to fit the method but retains, instead, the full com-
plexity depicted in the data.
The book is focused on the modeling of nonlinear time-invariant physiological sys-
tems, unlike most modeling studies to date, which have focused on the limited class of
linear time-invariant systems, due to the relative simplicity of the methods of estimation
and analysis associated with the latter. Nonlinearities are ubiquitous in physiology and of-
ten essential in subserving critical aspects of physiological function. Although few will
argue with the importance and necessity of addressing the nonlinear and dynamic aspects
of physiological systems, most will view this task as a daunting challenge owing to its
considerable complexity. The purpose of this book is to alter this confining view, so
prevalent in the scientific community, by removing the perceived methodological barriers
and offering the practicable prospect of analyzing nonlinear physiological systems within
the prevailing experimental and computational constraints.
We seek to achieve this goal by presenting recently developed methodologies in a
clear and reasoned manner that is rigorous but not unduly burdened by mathematical for-
malism. The emphasis is on key operational issues that allow the practical application of
the advocated approach by interested investigators in a manner that avoids critical pitfalls
and enhances the understanding of the physiological function of the system under study.
Illustrative examples are used extensively to elucidate important methodological points
and to assist the reader in understanding why and how the method works. The examples
also instruct the reader in the manner through which the utility of the obtained results can
be realized by elucidating their physiological interpretation.
The mathematical models are derived inductively (from the data) using a general and
rigorous mathematical framework in order to secure methodological rigor and fidelity to
the data. Unlike deductive methods previously used, the advocated methodology does not
require a priori model postulates and avoids the potential pitfall of (possibly inaccurate)
preconceived notions that may unduly constrain the achievable model form.
This point is most critical when limited experimental or natural data are used for the
estimation and validation of the model, thereby limiting the modeling task to a portion of
the operating space of the system. The advocated approach espouses the fundamental re-
quirement of a broad ensemble of input-output data that covers as densely as possible the
entire operating space of the system. Therefore, in the advocated approach, the inductive-
ly derived models are "true to the data" that are collected under broad experimental or
natural operating conditions, and not ad hoc mathematical postulates (reflecting specific
preconceived notions) fitted to experimental data that may probe only part of the total
operating space of the system. Essential for the inductive approach is a general methodolog-
ical framework (such as the one advocated herein) that does not constrain the possible
form of the emerging model.
It is evident that the advocated methodology departs from the conventional way of
thinking and practice in the study of physiological systems with regard to the undesirabil-
ity of a priori model postulates and specialized/experimental input waveforms. It chal-
lenges the established deductive approach, especially when limited experimental data are
used, as potentially misleading and inherently incapable of exploring the full complexity
of the system. It rejects the notion that practical necessity dictates the simplification of the
employed model to stay within the "tractable boundaries" of linear, stationary or static
analysis as unjustifiable in light of recent developments that make nonlinear, nonstation-
ary, and dynamic analysis feasible. It insists on the use of a broad repertoire of data (be
they experimental or natural, although the latter are preferable) instead of limited experi-
mental data using specialized waveforms (e.g., pulses, sinusoids) that are often contrived
to "simplify" the experiment and facilitate its interpretation. The risks of such experimen-
tal or methodological "simplifications" cannot be overstated, since they are likely to cre-
ate the illusion of knowledge while providing results that have limited utility (at best) or
are potentially misleading. For instance, the response to a pulse or sinusoidal stimulus
cannot be used to extrapolate or infer the response of nonlinear dynamic systems to any
other stimulus.
The validity of the aforementioned arguments rests on whether physiological systems
(which have been shown to be almost always nonlinear and dynamic) can be represented
1.1 PURPOSE OF THIS BOOK
meaningfully by approximate linear and/or static models. The definitive answer to this question can be given only after performing the complete nonlinear dynamic analysis of the system under broad operating conditions and, subsequently, assessing the adequacy of linear/static analysis. In other words, linear/static analysis is inherently incapable of answering this question because it does not probe the nonlinear/dynamic alternative (i.e., it does not contain explicit information about the possible nonlinear dynamic characteristics of the system). Note that the assessment of the adequacy (or lack thereof) of the linear/static model depends on the specific data ensemble used and the prevailing ambient noise. Therefore, nonlinear dynamic analysis with a broad data repertoire is the only way to reach the correct assessment of the adequacy of the linear/static model and obtain globally valid models (even if the latter end up being static and/or linear).
The expressed views seem almost self-evident but represent a strong challenge to the
status quo. They seek to facilitate the advent of a new era in systems physiology that con-
stitutes a quantum leap in the way physiological systems are modeled and understood.
They are not meant to denigrate the efforts of past investigators who did not follow the
advocated approach, since a review of history establishes the fact that the process of scientific progress requires a multitude of viewpoints and reveals that even unsuccessful efforts contribute to the advancement of knowledge by identifying dead ends.
The advocated approach cannot be the definitive view on the subject of physiological
system modeling, since it simply represents another stage in the evolution of knowledge
(with many improvements and refinements certain to follow). Nonetheless, it can advance
the state of the art by a "quantum leap," since it represents a drastic departure from con-
ventional thinking and practice. It is interesting to note that the advocated approach is
consistent with the basic tenets of the Hippocratic teachings that formed the foundation of
medicine as a scientific discipline, separated from priestcraft and groundless speculation,
in the 5th century B.C. (see Historical Note #1).
Hippocrates first asserted the cardinal importance of clinical observation against "the-
orizing" (speculation not supported by data) and established the fundamental concept of
the "unity of organism" against the fragmented approach of the "Empiricists" and the
"Anatomists." The latter distinction remains the key issue underpinning the ongoing de-
bate between the integrative systems approach advocated herein and the reductionist ap-
proach prevalent in the physical and biological sciences to date. The static view of the
Anatomists and the Empiricists was countered by the Hippocratic dynamic view of the
"disease process" and the "recuperative faculties" of living organisms (an early version of
the concepts of homeostasis and system dynamics). The Hippocratic views survived the
challenge of the Empiricists, as well as the "Methodists" and the "Atomicists" of antiquity (all espousing reductionist views), and were restored to their rightful place of universal
acceptance by the great Greek physician, scientist, and philosopher, Galen of Pergamos
(Galenos or Γαληνός in Greek) in the 2nd century A.D. Galenos' extensive writings on
human physiology defined medicine until the 16th century when William Harvey founded
modern physiology with his seminal experiments in Padua (building on earlier seminal
anatomical contributions by Vesalius and others in northern Italy).
Galenos fully espoused the key Hippocratic views on the "unity of the organism" and
the cardinal importance of clinical observation of the "disease process" (as opposed to the
static view of examining the composing parts and the isolated symptoms, espoused by all
other schools of thought at his time) that led him to an integrative and dynamic view of physiology. These basic tenets form the foundation of the advocated approach of dynamic
system physiology under natural operating conditions.
4 INTRODUCTION

This approach stands in contrast to the reductionist and static viewpoints that remain
dominant in biological sciences, and questions the validity (or even the purpose) of the
oversimplified experimental preparations often used in designing experiments for con-
ventional "hypothesis-driven" research. These strong statements do not seek to invalidate
the significant contributions made by hypothesis-driven research or to reject the valuable
knowledge acquired over the centuries through the reductionist approach that have ad-
vanced scientific progress. The advocated viewpoint simply seeks to establish the proper
balance between the two approaches in pursuing scientific knowledge, so that their syner-
gistic contributions serve scientific progress and prevent the establishment of a mutually
impeding antagonism so often fostered in the past by an unwise inclination for polarized
thinking.
It is the ambitious goal of this book to contribute to a new movement (with ancient
roots) that will restore the key Hippocratic-Galenic tenets to their rightful position in sci-
entific thinking and practice through adoption of the dynamic systems viewpoint, in order
to bring about the desirable leap of progress in physiology and medicine.

1.2 ADVOCATED APPROACH

The advocated approach offers the effective methodological framework for obtaining reliable and objective (i.e., devoid of subjective modeling notions) descriptors of the system nonlinear dynamics based on the available experimental or natural data. This approach employs general model forms that do not require specific model postulates and yield inductively "data-true" models in the stochastic broadband context of natural operating conditions.
Due to the complexity of this fundamental problem, we have taken a gradualist step-
by-step approach, building on the rigorous and general mathematical foundation of the
Volterra-Wiener approach as extended and modified for various applications over the last
thirty years. It is gratifying to note that our efforts have succeeded in developing a solid foundation for a general modeling approach capable of tackling this problem in a practical
context.
This novel modeling methodology has been tested in pilot applications from the ner-
vous, cardiovascular, renal, respiratory and metabolic/endocrine systems. These applica-
tions have showcased the efficacy of the developed methodology and have allowed im-
portant advances in systems physiology by assigning physiological significance to the
obtained model components in a manner that deepens the scientific understanding of the
system under study. This demonstrates the potential benefits of the advocated approach
that are expected to enable improvements in diagnostic as well as therapeutic procedures.
Standing on this solid foundation, we are poised to address the next generation of chal-
lenges that pertain to the multivariate and highly interconnected nature of physiological
systems. This direction represents the natural extension of our efforts in reaching our ulti-
mate objective of modeling the true physiological systems, and not their simplified "sur-
rogates" born of methodological inadequacy. This forward-looking task is commenced
with the important case of multiple inputs and multiple outputs that is discussed in Chap-
ter 8. The complexity that emerges from the interconnections among multiple physiologi-
cal variables of interest (often in closed-loop or nested-loop configurations) must be
placed in a nonlinear dynamic context, with possible nonstationarities. Since we seek to
study the physiological system under "natural" operating conditions (i.e., exposed to an
ensemble of natural stimuli and unconstrained by arbitrary experimental manipulations), the ultimate modeling task must be placed in a multivariate stochastic broadband context,
without artificially imposed constraints (e.g., fixed operating points or specialized input
waveforms), in order to achieve a globally valid model of the real system.
A measure of the posed challenge is attained when we note that proper study of the
real physiological systems requires reliable modeling methods that are capable of dealing
with:

• Nonlinear dynamics (of arbitrary order)
• Multiple variables of interest (observable/measurable)
• Multiple interconnections (possibly closed-loop)
• Possible nonstationarities in system dynamics
• Broadband stochastic input/output signals
• Considerable measurement noise and systemic interference

Last, but not least, the obtained models must be amenable to meaningful physiological in-
terpretation and offer the prospect of achieving significant diagnostic and/or therapeutic
improvements.
The critical task of meaningful interpretation of the obtained models and their proper
utilization in clinical practice is formidable, as much as it is important, because the wealth
of information contained within these complicated models may be overwhelming and difficult to harness. Nonetheless, it is incumbent on us to perform this task in order to realize the benefits of the advocated new approach and achieve the ambitious goal of a quantum leap in the state of the art.
Interpretation of the obtained models will focus on relating their characteristics to spe-
cific physiological mechanisms that are either known qualitatively or can be explored ex-
perimentally. It will also examine the effects of changes in certain physiological variables
(in a dynamic context) and the robustness of the overall system in the event of internal or external perturbations (homeostasis). The latter study may demarcate the bounds of normal versus pathological states.
Utilization of the obtained models in a clinical context will seek to examine their use
for improved diagnosis of disease (i.e., more and better clinically relevant information)
and for the quantitative assessment of pharmaceutical treatment or therapeutic interven-
tion. The latter will allow optimization of treatment (with regard to specific clinical goals)
and the design of improved therapeutic procedures, consistent with the key Hippocratic
exhortation: "first, do no harm . . ."
The progress made to date in the development of effective methodologies for nonlinear
and/or nonstationary modeling of physiological systems is summarized in this book and
has given rise to a new generation of issues associated with the study of greater system complexity and the analysis of expanded experimental/clinical databases (both direct consequences of advances in the state of the art), consistent with Socrates' aphorism: "the more I learn, the more I realize how much I do not know."
The advocated approach stands on the confluence of methodological (nonlinear and
nonstationary) and technological (computational and experimental) advancements, and
seeks to leverage on their synergistic utilization in order to tackle the formidable chal-
lenges in physiological system modeling from natural (i.e., random broadband) data. Pilot
applications are selected from physiological domains that exhibit essential nonlineari-
ties/nonstationarities (neural, cardiovascular, respiratory, renal, endocrine, and metabolic
systems) in order to demonstrate the wide applicability and unique capabilities of the ad-
vocated novel methodologies. In this sense, the advocated approach is at the cutting edge
of scientific developments and has a universal appeal to all physiological domains in
terms of scientific advancement, as well as potential impact across a broad swath of clini-
cal applications.
The immense variety of nonlinear behavior makes it desirable that the developed
methodologies retain a high degree of generality and the ability to gracefully transition
into interpretable models for each particular application.
The complexity of nonlinear behavior also makes it imperative that the ensemble of
input signals used for probing/observing the system behavior be spectrally and tempo-
rally rich, so that the maximum possible interactions among different values or frequen-
cies of the input signal be observed at the recorded output. This is the fundamental rea-
son why we favor random broadband input signals over specialized deterministic
signals, following on the pivotal suggestion of Norbert Wiener regarding the use of
Gaussian white noise inputs that is discussed in Section 2.2. For instance, the customary
use of rectangular pulses or sinusoids as stimuli by experimental physiologists may be
appealing to the eye of the observer and offer a comprehensible response waveform, but
they are very poor probing signals in terms of the extracted amount of information for
given experimentation time duration. This important point will be elaborated in Sections
2.1.5, 2.2.1, and 5.1.
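The information-content argument above can be illustrated numerically. The following sketch is our own, not from the book, and the helper `fraction_excited` is a made-up diagnostic: it compares how much of the frequency band a pure sinusoid excites versus a Gaussian white-noise input of the same length.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1024                                   # samples in each probing signal
t = np.arange(n)

sinusoid = np.sin(2 * np.pi * 50 * t / n)  # probes a single frequency
gwn = rng.standard_normal(n)               # broadband Gaussian white noise

def fraction_excited(x, threshold=0.01):
    """Fraction of frequency bins carrying more than `threshold` of the
    strongest bin's power (a crude measure of spectral richness)."""
    p = np.abs(np.fft.rfft(x)) ** 2
    return np.mean(p > threshold * p.max())

frac_sin = fraction_excited(sinusoid)
frac_gwn = fraction_excited(gwn)
print(frac_sin, frac_gwn)   # the GWN excites far more of the band
```

For equal record lengths, the white-noise input deposits power across essentially the whole band, while the sinusoid interrogates one frequency only; this is the quantitative sense in which specialized deterministic stimuli extract little information per unit of experimentation time.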
In addition, the practical modeling task is complicated by the fact that the observed op-
erating conditions are often burdened by severe interference and noise, as well as nonsta-
tionarities in the system behavior. Since we will mainly address the case of stationary
(time-invariant) models, the issue of systemic and measurement nonstationarities should
be borne in mind as a complicating factor that may compromise the quality of the results
or limit the record length of the data used for model estimation. The subject of explicit
nonstationary modeling will be discussed in Chapter 9.
The practical challenge posed by the ubiquitous presence of noise and/or interference
necessitates robust estimation algorithms that minimize the effect of noise/interference on
the obtained model estimates for a given data record length. Increasing the data record
length will normally improve the model estimates unless possible nonstationarities de-
grade the benefits of the extended data record. Repetition of identical experiments and
proper averaging may mitigate the effects of noise/interference even for certain types of
systemic nonstationarity but is more time-consuming and vulnerable to lack of ergodicity
(see Chapter 5). In all cases, intelligent design of experiments can maximize the output
signal-to-noise ratio by putting more input signal power at those frequency bands that are
most critical for the system dynamics and are less contaminated by noise/interference.
This is an important practical issue that has not received adequate attention in the litera-
ture and will be discussed in Chapter 5.
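The noise-mitigating effect of repeating identical experiments and averaging can be sketched with a toy simulation (our own illustration, under the stated assumptions of stationary, ergodic, additive noise):

```python
import numpy as np

rng = np.random.default_rng(4)
clean = np.sin(2 * np.pi * np.arange(256) / 64)   # hypothetical noise-free response
repeats = 100
# `repeats` identical experiments, each corrupted by independent additive noise
trials = clean + 0.5 * rng.standard_normal((repeats, clean.size))

single_err = np.std(trials[0] - clean)            # noise level in one trial
avg_err = np.std(trials.mean(axis=0) - clean)     # noise level after averaging
print(single_err, avg_err)  # averaging shrinks the noise roughly by sqrt(repeats)
```

The averaged response suppresses the noise standard deviation by roughly the square root of the number of repetitions, which is exactly the benefit that nonstationarity or lack of ergodicity can erode.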

1.3 THE PROBLEM OF SYSTEM MODELING IN PHYSIOLOGY

The purpose of physiological system modeling is to advance our quantitative understanding of biological function and improve medical science and practice by utilizing the ac-
quired quantitative knowledge. In modeling physiological systems, we seek to summarize
all available experimental evidence about the functional characteristics of a system in the
form of mathematical relations among variables of physiological interest. The resulting mathematical models ought to emulate the observed functional behavior of the system un-
der natural operating conditions when simulated on the computer.
Ideally, such a model must be accurate (i.e., reproduce precisely the observed data),
global (i.e., be accurate under all natural operating conditions), compact (i.e., have the
minimum mathematical and computational complexity), and interpretable (i.e., be
amenable to physiological interpretation that advances our understanding of the mecha-
nisms subserving the system function). Implicit in the first attribute is the robustness of
the model, which provides for stable behavior in the face of internal or external random
perturbations. The latter, and the omnipresence of noise, require that our modeling efforts
be cast in a stochastic context. Furthermore, the development of a global model (valid un-
der all natural operating conditions) presumes our ability to observe and measure the
spontaneous activity of the variables of interest with sufficient accuracy and sampling res-
olution over representative time intervals. These variables can be viewed as inputs, out-
puts, or internal state variables depending on our specific modeling goals.
Such an "ideal" model would offer a succinct quantitative representation of the func-
tional characteristics of the system and would allow the study of its behavior under arbi-
trary conditions through computer simulations (thus maximizing the "yield" of physiological research). In addition to being a complete "capsule of knowledge" that advances
scientific understanding of how and why physiological systems function in the way they
do, such a model can improve clinical diagnosis (by providing better and more relevant
information) and treatment (by properly guiding therapeutic procedures and assessing
their effects). The implications are immense and promise to usher a new era of advanced
medical care.
The development of such an "ideal" model is a formidable task, because of the func-
tional complexity of physiological systems, which are typically dynamic, nonlinear, non-
stationary, highly interconnected (often in closed-loop configurations), and subject to sto-
chastic perturbations and noise. Furthermore, the modeling process has been constrained
in practice by limitations of the available experimental, computational, and analytical
methods, leading heretofore to the development of "less than ideal" models in the sense
defined above, that is, inability to reach the desired attributes of accuracy, globality, com-
pactness, and interpretability.
This sobering reality is put in perspective when one notes that many of the current
modeling efforts are still confined to rudimentary methods of static and linear analysis.
The course of methodological evolution started with static linear analysis (linear regres-
sion among variables of interest) and gradually progressed into dynamic (differential or
integral equations) and nonlinear analysis. Although linear dynamic analysis has been in-
creasingly used, the use of nonlinear dynamic analysis remains rather limited due to the
scarcity of practical methods and the intrinsic complexity of the problem. This shortcom-
ing would not be alarming, if it were not for the fact that most physiological systems ex-
hibit significant and essential dynamic nonlinearities. This problem is compounded by the
fact that physiological systems are also often nonstationary (i.e., their functional charac-
teristics vary with time) necessitating models whose parameters also change with time.
The need for proper modeling methodologies in this realistic context is increasingly
pressing and provides the motivation for this book.
Physiological system modeling provides the means of summarizing vast amounts of
experimental data into relatively compact mathematical (or computational) forms that al-
low the formulation and testing of scientific hypotheses regarding the functional proper-
ties of the system-an iterative process that should lead to successive refinement and evo-
lution of the system model. Thus, system modeling attains a central role in the scientific
process of generation and dissemination of knowledge, consistent with the credo "model
or muddle." Models can be applied to arbitrary levels of system decomposition or integra-
tion depending on the availability of appropriate data, hence providing the conceptual and
methodological means for developing a hierarchical understanding of integrative systems
physiology.
The fundamental question of the functional relationships between observed physiolog-
ical variables drives the system modeling effort. The variables are observed experimental-
ly over time (occasionally over space or wavelength) and are viewed as signals (or
datasets) that are linked by a causal relationship. The direction of this causal relationship
(cause-to-effect sequence) defines some of them as inputs and some of them as outputs of
a conceptual operator (the system) that transforms the inputs into the outputs (Figure 1.1).
This transformation is generally dynamic in the sense that the present values of the out-
puts depend not only on the present but also on the past values of the inputs. Another way
of describing the same (defining) characteristic of dynamic systems is to state that the ef-
fect of an input change upon the output signal spreads over time.
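A minimal numerical sketch of this defining property (our own illustration, with a hypothetical four-lag impulse response): a brief input change produces an output that persists after the input has returned to zero, whereas a static system would respond only at the instant of the change.

```python
import numpy as np

# Hypothetical memory kernel h: a dynamic system's present output depends on
# past input values through h; a static system would use only the current sample.
h = np.array([0.5, 0.3, 0.15, 0.05])
x = np.zeros(20)
x[5] = 1.0                        # a brief input "change" at n = 5

y = np.convolve(x, h)[: len(x)]   # causal convolution
print(y[5:9])                     # the effect of the pulse spreads over time
```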
It is worth noting that the system is defined by means of its inputs and outputs (and not
by a physical entity). Thus, we may define many different systems using the same physi-
cal entity by altering the selected inputs and outputs, as long as causality is not violated.
This point is illustrated in Figure 1.2, where a physical entity is comprised of five components (A, B, C, D, and E) and the designated connections (arrows) represent directional causal relationships. If we stimulate component A with input xA(t) and record output yC(t) from component C, then we define the system SAC as the signal transformation from A to C. However, if we record from another component [e.g., yD(t) out of component D] and stimulate another component [e.g., xB(t) into component B], then we define a different system SBD representing the signal transformation from input B to output D, even though the underlying physical entity remains the same.
The input-output signal transformation describes the functional properties of the sys-
tem and may be linear or nonlinear, time-invariant (stationary) or time-varying (nonsta-
tionary), and deterministic or stochastic. A mathematical expression describing quantita-
tively this input-output signal transformation is the sought model of the system. For our
purposes, the sought model will be deterministic, implying that possible stochastic varia-
tions of the system characteristics will be relegated to the status of systemic noise or mod-
eling errors and will be incorporated in the stochastic error term of the model. The latter
will also incorporate other modeling errors and possible measurement noise or
external/systemic interference.

Figure 1.1 Schematic of a "black-box" system operator S transforming M input signals {x1(t), ..., xM(t)} into N output signals {y1(t), ..., yN(t)}.
Figure 1.2 Schematic diagram of causal connections (denoted by arrows) among five physical variables (A, B, C, D, E). The selected input(s) and output(s) define the particular system operator. For instance, an input xA(t) stimulating A and an output yC(t) recorded from C define a system operator SAC (a), or an input xB(t) stimulating B and an output yD(t) recorded from D define a different system operator SBD (b).

If we limit ourselves, at first, to the single-input/single-output case, we can use the convenient mathematical notation of a functional S[·] to denote the causal relationship between past and present values of the input x and the present value of the output y as

y(t) = S[x(t'), t' ≤ t] + ε(t)          (1.1)

where the functional S[·] represents the deterministic system as it maps the input past and present values onto the output present value, and ε(t) represents the stochastic error term.
The error term is assumed to be stochastic and additive-the latter is a common assump-
tion that simplifies the estimation task but may not correspond to reality in some cases
where the error may have a multiplicative or other modulatory effect on the input/output
signals. The error term is also called the "residual" and may contain modeling errors (in-
cluding possible stochastic variations of the system characteristics), systemic noise or in-
terference, and measurement noise. For analysis and estimation purposes, the residual is
usually treated as a stationary random process (with zero mean) that is statistically inde-
pendent from the input and the noise-free output signals (although it is contained in the
output data measurements). Note that deviation from this assumption regarding the resid-
ual term may cause significant estimation errors.
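Equation (1.1) can be made concrete with a small simulation. The construction below is entirely our own: the particular functional S (a convolution followed by a tanh squashing nonlinearity) and its kernel are hypothetical stand-ins for a causal nonlinear dynamic system with an additive, input-independent residual.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
x = rng.standard_normal(n)            # input record
h = np.exp(-np.arange(8) / 2.0)       # hypothetical memory kernel

def S(x):
    """A hypothetical causal nonlinear dynamic functional:
    convolution with h followed by a static nonlinearity."""
    u = np.convolve(x, h)[: len(x)]
    return np.tanh(u)

e = 0.05 * rng.standard_normal(n)     # zero-mean additive residual, independent of x
y = S(x) + e                          # Equation (1.1): y(t) = S[x(t'), t' <= t] + e(t)
print(y[:3])
```

Note that S here depends only on present and past input samples, so perturbing future inputs leaves earlier outputs unchanged, which is the causality requirement built into the notation of Equation (1.1).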
The convenient mathematical notation of Equation (1.1) can be extended to the case of
causal systems with multiple inputs and multiple outputs by adopting a vector notation
(shown in bold) for the input/output signals, the residual terms, and the functionals:

y(t) = S[x(t'), t' ≤ t] + ε(t)          (1.2)

Clearly each output signal must have its own distinct functional, implicating (poten-
tially) all inputs. The case of multiinput/multioutput systems is discussed extensively in
Chapter 8. However, the main methodological developments are presented in Chapters
2-5 for the single-input/single-output case to avoid the burden of the inevitably increased complexity of mathematical expressions resulting from the multiple inputs and outputs.

1.3.1 Model Specification and Estimation


The goal of mathematical modeling is to obtain an explicit mathematical expression for
the functional S[·] using experimental or natural input-output data and all other available
knowledge about the system. This goal is generally achieved in two steps. At first, a suit-
able mathematical form is selected for S[·], containing unknown parameters and/or func-
tions, which are subsequently estimated by use of input-output data in the second step.
This two-step procedure is often referred to as "system identification" in the engineering
literature, and the two steps are termed "model specification" and "model estimation," re-
spectively.
The model specification task is generally more challenging and the critical step to suc-
cessful modeling. It utilizes all prior knowledge and information regarding the system un-
der study and seeks to select the appropriate model form in each case. Typically, the se-
lection is made from among four classes of models: nonparametric, parametric, modular,
and connectionist (see Table 1.1). The criteria for selection of the appropriate model class
are discussed in Chapters 2-5 and constitute a critical issue.
The model estimation task employs estimation methods suitable for the specific char-
acteristics of each case and seeks to maximize the accuracy of the resulting model predic-
tion for given data types and noise conditions. Various measures and norms of model prediction accuracy can be used, but the most common is the mean-square error of the output prediction (the average of the squared residuals).
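As a concrete sketch of this criterion (toy numbers of our own choosing):

```python
import numpy as np

# Mean-square error of the output prediction: mean of the squared residuals.
y_observed = np.array([1.0, 2.1, 2.9, 4.2])
y_predicted = np.array([1.1, 2.0, 3.0, 4.0])

residuals = y_observed - y_predicted
mse = np.mean(residuals ** 2)
print(mse)
```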
The model estimation task is important because it yields the desired result, but it is
meaningful only when the model specification task is performed successfully. The highly
technical nature of the model estimation task has attracted most of the attention in the en-

Table 1.1 Nonlinear System Modeling Methodologies: Strengths (+) and Weaknesses (−)

                             Nonparametric   Parametric   Modular   Connectionist
Model specification                +              −           ±            ±
Interpretability                   ±              +           +            −
Robustness to noise                +              −           +            ±
Compactness                        −              +           +            −
Adaptable to time variance         +              ±           ±            +
gineering literature, and the contents of this book reflect this fact by delving into the many technical details of the various estimation methods. Nonetheless, it is useful to remember that, although the estimation methods are necessary tools for accomplishing the modeling task, the art of modeling and the impact of its scientific applications hinge primarily on the successful performance of the model specification task and the meaningful interpretation of the obtained model. It is for this reason that the overall modeling philosophy advocated by this book places the emphasis on securing first the best possible model specification results and then performing accurate model estimation.
Since system modeling seeks to find and quantify the causal functional relationships
among observed variables, it occupies a central position in the process of scientific dis-
covery. Although it addresses only the functional aspects of the system under study,
structural information can be used to assist the model specification task by properly con-
straining the postulated model. Conversely, system models can be used to examine alter-
native hypotheses regarding the structural composition of a given system (e.g., intercon-
nections of neuronal circuitry) and thus can advance our structural knowledge of
physiological systems.
It is critical to note that the selection of the model form must not constrain unduly the
range of functional possibilities of a system, lest it lead to biased and inaccurate results.
Thus extreme care must be taken to avoid such inappropriate constraining of the model
either in terms of its mathematical form or in terms of its operational range (i.e., dynamic
range and bandwidth). At the same time, the efficiency of the employed estimation methods and the utility of the obtained model in terms of scientific interpretability are compro-
mised when the model is not adequately constrained. This key trade-off between model
parsimony and its global validity pervades all modeling studies and is of fundamental im-
portance, as discussed later in connection with the various classes of model forms.
Related to the model specification task is the issue of inductive versus deductive mod-
el development, discussed in Section 1.5. Deductive modeling is possible only in those
rare occasions where sufficient knowledge exists about the detailed functional properties
of the system, so that its internal workings can be described accurately by use of first
physical and/or chemical principles. In these fortunate, but rare, occasions the system
model can be reliably postulated in the form of precise mathematical expressions using a
deductive process. The complexity of physiological systems rarely affords this kind of
opportunity and, consequently, modeling of physiological systems is usually inductive
(i.e., it is based on accumulated empirical evidence from experimental data). This distinc-
tion between inductive and deductive modeling has been made before with the use of the
terms "empirical or external" and "axiomatic or internal" models respectively [Arbib et
al., 1969; Astrom & Eykhoff, 1971; Bellman & Astrom, 1969; Rosenblueth & Wiener,
1945; Yates, 1973; Zadeh, 1956].
Although prior knowledge about the system can be used to assist the model specifica-
tion task, the modeling approaches presented herein are based entirely on experimental or
natural input-output data and exclude the favorable (but rare) occasions where the model
form can be derived from first principles or can be reliably postulated on the basis of pri-
or knowledge about the system. Therefore, our approach to physiological system model-
ing is inductive and data driven.
The necessity of inductive modeling for most physiological systems elevates the im-
portance of the type and quality of experimental or natural data needed for our modeling
purposes. It is imperative, for instance, that the data cover the entire functional space of
the system (e.g., the entire bandwidth and dynamic range of interest) and that the noise
levels remain low relative to the power of the signal of interest. It must be noted in this
connection that the systemic noise or interference (which is prevalent in physiological
systems) is usually more harmful for the estimation process than the measurement noise.

1.3.2 Nonlinearity and Nonstationarity


A critical issue that complicates the modeling task is the presence of nonlinearities in the
physiological system. Since this is the focus of the book, it is discussed in detail in the fol-
lowing section and in Chapters 2-7. Here, we limit ourselves to the definition of nonlin-
earities and nonstationarities with regard to the functional notation of Equation (1.1). A
note should be made about static nonlinearities, whereby y(t) depends only on the value of
x(t) and the functional S[·] reduces to a function. Static nonlinearities are easy to model
(graphically or with simple numerical fitting procedures), leaving dynamic nonlinearities
as the true challenge.
With reference to the functional notation of Equation (1.1), linearity of the system im-
plies that S[·] obeys the "superposition principle," which states that if y1(t) and y2(t) are
the system outputs for inputs x1(t) and x2(t), respectively, then for an input x(t) = A1x1(t) +
A2x2(t) the output is y(t) = A1y1(t) + A2y2(t). This can be expressed by the mathematical
condition

S[A1x1(t) + A2x2(t)] = A1S[x1(t)] + A2S[x2(t)]     (1.3)

where x1(t) and x2(t) are linearly independent signals, and A1 and A2 are nonzero scalar co-
efficients. The above condition can be tested experimentally with any pair of linearly in-
dependent inputs and scalar coefficients, and it must be satisfied by all such pairs. This
condition for linear superposition can be extended to any number of linearly independent
signals but remains, by practical necessity, a necessary but not sufficient condition (since
all possible combinations cannot be practically tested). Appendix 1 provides the definition
of linear independence between two or more signals. Elaboration on the experimental
testing of system linearity is given in Section 5.2.4.
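As a numerical illustration of this test (the two example systems, input signals, scalar coefficients, and tolerance below are hypothetical, chosen only to show the mechanics of checking Equation (1.3)):

```python
import numpy as np

rng = np.random.default_rng(0)

def linear_system(x):
    # hypothetical example system: a causal moving-average filter (linear)
    return np.convolve(x, np.ones(5) / 5)[:len(x)]

def nonlinear_system(x):
    # hypothetical example system: the same filter followed by a square
    return linear_system(x) ** 2

def violates_superposition(S, x1, x2, A1=2.0, A2=-3.0, tol=1e-9):
    """Check Equation (1.3) for one pair of inputs and coefficients.
    A single violation proves nonlinearity; a pass is only necessary
    (never sufficient) evidence of linearity."""
    lhs = S(A1 * x1 + A2 * x2)
    rhs = A1 * S(x1) + A2 * S(x2)
    return np.max(np.abs(lhs - rhs)) > tol

x1 = rng.standard_normal(100)
x2 = rng.standard_normal(100)
print(violates_superposition(linear_system, x1, x2))     # False
print(violates_superposition(nonlinear_system, x1, x2))  # True
```

In practice one would repeat this check over many input pairs and coefficient values, since, as noted above, no finite number of passes can establish linearity conclusively.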
Returning to the functional notation of Equation (1.1), we can point out that stationarity
(or time-invariance) of the system implies that the input-output mapping rule represented
by S[·] remains invariant through time. It should be stressed that S[·] denotes the rule by
which the system constructs the output at time t using the input values at time t and before.
Thus, nonstationarity (or time-variance) should not be confused with the inevitable tempo-
ral changes in the output signal caused by temporal changes in the input signal that occur
whether the system is stationary or not. Experimental testing of the stationarity of a system
requires the repetition of identical experiments at different times and the comparison of the
obtained results. The test for stationarity of a system can be based on the following condi-
tional statement: if the system output for input x(t) is y(t), then the output for input x(t - u)
is y(t - u) for every time shift u. Since all experimental studies are subject to random influ-
ences and disturbances, this assessment is not straightforward in practice and requires sta-
tistical analysis of sufficient data, as discussed in Section 5.2.3 and in Chapter 9.
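A minimal discrete-time sketch of this conditional statement (both example systems and the shift value are hypothetical; a practical test would add the statistical analysis mentioned above):

```python
import numpy as np

def is_time_invariant(S, x, u=7):
    """Necessary test for stationarity: the response to the delayed
    input x(n - u) must equal the delayed output y(n - u)."""
    y = S(x)
    y_delayed_input = S(np.concatenate([np.zeros(u), x]))
    return np.allclose(y_delayed_input[u:], y)

def stationary_system(x):
    # hypothetical causal, time-invariant filter
    return np.convolve(x, np.exp(-np.arange(20) / 4.0))[:len(x)]

def nonstationary_system(x):
    # hypothetical system whose gain drifts over time
    return x * (1 + 0.01 * np.arange(len(x)))

x = np.random.default_rng(2).standard_normal(200)
print(is_time_invariant(stationary_system, x))     # True
print(is_time_invariant(nonstationary_system, x))  # False
```

With noisy experimental data the strict equality above would be replaced by a statistical comparison of repeated trials, as discussed in Section 5.2.3.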
An experimental complication, often encountered in physiological systems, is the pres-
ence of possible nonstationarities in the experimental preparation caused by inadvertent
injuries in surgical procedures. Thus an important distinction must be made between non-
stationarities that are intrinsic to the system operation (e.g., endocrine/hormonal cycles or
biological rhythms) and pertain to the actual physiological function of the system, and

those that affect our measurement but are of no interest vis-à-vis the physiological func-
tion of the system under study. The former type of nonstationarity ought to be incorporat-
ed into the estimated model by proper methodological means, as discussed in Chapter 9.
The latter type of nonstationarity degrades the quality of the data and, although it can
often be viewed as low-frequency noise, it may not have a simple additive effect on the ob-
served output data (e.g., it may have a multiplicative or modulatory effect). Therefore, it
is important to remember that the employed modeling methodologies for physiological
systems must be robust in the presence of noise or systemic interference and must not re-
quire very long experimentation time, over which measurement nonstationarities may de-
velop or have significant effect.
Since the input-output data are collected and processed in sampled digital form, it is
evident that the actual implementation of these modeling approaches takes place in dis-
crete time. Thus, the continuous-time input-output signals x(t) and y(t) must be convert-
ed into discrete-time signals x(n) and y(n) using a fixed sampling interval T, where n de-
notes the discrete-time index (i.e., t = nT). Provided that proper attention is paid to the
issue of aliasing (by sampling at a sufficiently high rate that secures a Nyquist frequency
greater than the bandwidth of the sampled input-output signals, as discussed in Section
5.1.3), the presented mathematical methods generally transfer to the discrete-time case
with minor modifications (e.g., converting an integral into a summation). Both discrete
and continuous cases are presented throughout the book, and we make special note of the
few cases where the transition from continuous to discrete time requires special attention
(e.g., from nonlinear differential to difference equations).
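For instance, the choice of the sampling interval T can be sketched as follows (the bandwidth value and safety factor are illustrative assumptions, not values from the text):

```python
# Choosing the sampling interval T so that the Nyquist frequency exceeds
# the bandwidth of the sampled input-output signals (hypothetical numbers).
bandwidth_hz = 20.0        # assumed bandwidth of x(t) and y(t)
safety_factor = 2.5        # sample comfortably above the minimum Nyquist rate

fs = 2.0 * bandwidth_hz * safety_factor  # sampling rate (Hz)
T = 1.0 / fs                             # sampling interval (s), t = n*T
nyquist = fs / 2.0

assert nyquist > bandwidth_hz            # in-band power is not aliased
print(f"T = {T * 1000:.1f} ms, Nyquist frequency = {nyquist:.1f} Hz")
```

The safety factor reflects the practical point that real signals are never strictly band-limited, so one samples somewhat faster than the theoretical minimum.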

1.3.3 Definition of the Modeling Problem


With all these considerations in mind, the physiological system modeling problem is de-
fined as:

Given a set of input-output experimental or natural data, find a mathematical model that
describes the dynamic input-output relationship with sufficient accuracy (using a mean-
square-error criterion for the output model prediction) under the following conditions:

• No prior knowledge is available about the internal workings of the system


• Nonlinearities and/or nonstationarities may be present in the system
• Extraneous noise and/or systemic interference may be present in the data
• The obtained model must be amenable to physiological interpretation
• Experimentation time and computational requirements may not be excessive

A practical guide for the solution of the modeling problem in the context of physiolog-
ical systems is provided in Chapter 5.

1.4 TYPES OF NONLINEAR MODELS OF PHYSIOLOGICAL SYSTEMS

The challenge of modeling nonlinear physiological systems derives from the immense va-
riety of nonlinearities in living systems and the complex interactions among multiple
mechanisms linked together within these systems.
We summarize in Table 1.1 the four main nonlinear modeling approaches and classes
of nonlinear models used to date: nonparametric, parametric, modular, and connectionist.
In the nonparametric approach, the input-output relation is represented either analyti-
cally in integral equation form, where the unknown quantities are kernel functions (e.g.,
Volterra-Wiener expansions), or computationally as input-output mapping combinations
(e.g., look-up tables or operational surfaces/subspaces in phase space). Nonparametric
models are easy to postulate (because of their generality) but typically lack parsimony of
representation. Of the various nonparametric approaches, the discrete-time Volterra-
Wiener (kernel) formulation has been used most extensively for nonlinear modeling of
physiological systems and will form the mathematical foundation of this book, along with
its relations to the other modeling approaches.
In the parametric approach, algebraic or differential/difference equation models are
typically used to represent the input-output relation for static or dynamic systems, respec-
tively. These models typically contain a small number of unknown parameters that may
be constant or time-varying, depending on whether the model is stationary or nonstation-
ary. The specific form of these parametric models is usually postulated a priori, but the se-
lection of certain structural parameters (e.g., degree/order of equation) is guided by the
data.
The modular approach is a hybrid between the parametric and the nonparametric ap-
proaches that makes use of block-structured models composed of parametric and/or non-
parametric components properly connected to represent the input-output relation in a
manner that reflects our evolving understanding of the functional organization of the sys-
tem. The model specification task for this class of models is more demanding and may
utilize previous parametric and/or nonparametric modeling results. A promising variant
of this approach, which derives from the general Volterra-Wiener formulation, employs
principal dynamic modes as a minimum set of filters to represent parsimoniously a non-
linear dynamic system of the broad Volterra class (see Section 4.1.1).
The connectionist approach has recently acquired considerable popularity and makes
use of generic model configurations/architectures known as artificial neural networks to
represent input-output nonlinear mappings in discrete time. These connectionist models
are fully parameterized, making this approach akin to parametric modeling, although typ-
ically lacking the parsimony and interpretability of parametric models. A hybrid nonpara-
metric/connectionist approach is at the core of the modeling methodology that is advocat-
ed as the best overall option at present.
The relations among these four approaches are of critical practical importance, since
considerable benefits may accrue from the combined use of these approaches in a cooper-
ative manner. This synergistic use aims at securing the full gamut of advantages specific
to each approach. The relative advantages and disadvantages in practical applications of
the four modeling approaches will be discussed in Chapters 2-4.
The ultimate selection of a particular methodology (or combinations thereof) hinges
upon the specific characteristics of the application at hand and the prioritization of objec-
tives by the individual investigator. Nonetheless, it is appropriate to state that no single
methodology is globally superior to all others (i.e., excelling with regard to all criteria and
under all possible circumstances) and much can be gained by the synergistic use of the
various modeling approaches in a combination that takes into account the specific charac-
teristics/requirements of each application. Judgment must be exercised in each individual
case to select the combination of methods that yields the greatest insight within the given
experimental constraints. Since the general nonlinear system identification problem re-
mains a challenge of considerable complexity, one must not be lulled into the risk-prone
complacency of blind algorithmic processing. Many challenging issues remain that re-
quire vigilant attention, since there is no substitute for intelligence and educated judgment
in resolving these issues. Five examples are given below to illustrate the various model
forms employed by these approaches in different physiological domains.

Example 1.1. Vertebrate Retina


The early stage of the vertebrate visual system (retina) is chosen as the first illustrative
example to honor the historical fact that this was the first physiological system extensive-
ly studied with the Volterra-Wiener approach. A schematic of the neuronal architecture
of the retina is shown in Figure 1.3. The natural input to the retina is light intensity (pho-
ton flux per unit surface) impinging on the photoreceptor cells that convert the photon en-
ergy into intracellular potential through a chain of photochemical and biochemical reac-
tions. The generated photoreceptor potential is synaptically transferred to downstream
horizontal cells and bipolar cells through triadic synapses in the photoreceptor pedicle.
The intracellular potentials thus generated within horizontal and bipolar cells are synapti-
cally transferred to the postsynaptic ganglion cells and to the interneurons of the inner
plexiform layer (various types of amacrine cells).
Thus, visual information conveyed by the time variations of light intensity is converted
(encoded) into sequences of action potentials by the ganglion cells and transmitted to
higher levels of the visual system via the optic nerve. This "encoding" process represents
cascaded signal transformations effected by the various retinal neurons and their multiple
interconnections. Therefore, one "system of interest" can be defined by considering as in-
put signal the variations of light intensity impinging on the photoreceptors and as output

[Figure 1.3: light stimulus S(x, y, t); cone and rod receptor responses; horizontal, bipolar, and amacrine cell responses; ganglion cell response]

Figure 1.3 Schematic of the neuronal organization of the retina (after Dowling & Boycott, 1966).
16 INTRODUCTION

signal the resulting sequence of action potentials generated by the ganglion cells. The in-
duced intracellular potential in any other retinal neuron along this cascade of signal trans-
formations can be used as an output signal in order to define a different "system of inter-
est." A block diagram depicting the main neuronal interconnections in the retina is shown
in Figure 1.4, suggesting many different possibilities for defining input/output signals and
the corresponding "systems of interest."
In the foregoing, we described briefly the temporal sequence of causal effects from in-
put photons impinging on the photoreceptor cell to the output action potentials generated
by the ganglion cells. However, it is clear that complicated causal effects also occur in
space through the multiple lateral interconnections among retinal neurons and interneu-
rons. Thus, space can be viewed as another independent variable (in addition to time) to
define retinal input-output signal transformations. This leads to the advanced spatiotem-
poral analysis of the visual system discussed in Section 7.4.1. Note that the wavelength of
the input photons can be viewed as yet another independent variable in studies of color vi-
sion.
The first successful application of the cross-correlation technique of nonparametric
modeling (see Section 2.2.3) on visual neuronal pathways using band-limited Gaussian
white noise (GWN) test inputs was performed on the catfish retina [Marmarelis & Naka,
1972]. The recorded output signal was the sequence of action potentials generated by a gan-
glion cell (actually the "probability of firing" measured by superimposing the outputs of re-
peated trials with the same GWN input, also called the "peristimulus histogram"). The ob-
tained nonparametric model took the form of a discrete-time, second-order Wiener model:
y(n) = h0 + Σm h1(m) x(n - m) + Σm1 Σm2 h2(m1, m2) x(n - m1) x(n - m2) - P Σm h2(m, m)     (1.4)

where h0, h1, and h2 denote the discretized Wiener kernels of zeroth, first, and second or-
der, P is the power level of the discretized GWN input x(n) (light intensity), and the sum-

[Figure 1.4 diagram: light stimulus (white noise) input to the retinal network]
Figure 1.4 A block diagram depicting the main signal-flow pathways in the vertebrate retina and a
light stimulus with the resulting ganglion cell response. Other stimulus-response pairs can be chosen
experimentally (after Marmarelis & Marmarelis, 1978).
1.4 TYPES OF NONLINEAR MODELS OF PHYS/OLOG/CAL SYSTEMS 17

mation over m, m1, and m2 covers the entire memory of the system kernels. The dis-
cretized output y(n) represents the probability of firing an action potential by a single gan-
glion cell at discrete-time index n (t = nT, where T is the sampling interval).
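A direct, if naive, implementation of Equation (1.4) can be sketched as follows (the kernels in the usage example are hypothetical toys, not the retinal estimates; efficient estimation of the kernels themselves is the subject of Chapter 2):

```python
import numpy as np

def wiener_2nd_order(x, h0, h1, h2, P):
    """Second-order discrete Wiener model of Equation (1.4).
    h1: (M,) first-order kernel; h2: (M, M) second-order kernel;
    P: power level of the GWN input x(n)."""
    M = len(h1)
    # constant terms: h0 minus P times the trace of h2
    y = np.full(len(x), h0 - P * np.trace(h2), dtype=float)
    for n in range(len(x)):
        xm = x[max(0, n - M + 1):n + 1][::-1]  # x(n - m), m = 0, 1, ...
        xm = np.pad(xm, (0, M - len(xm)))      # zero-pad before data start
        y[n] += h1 @ xm + xm @ h2 @ xm         # first- and second-order sums
    return y

# toy usage: with h2 = 0 the model reduces to h0 plus a linear convolution
rng = np.random.default_rng(0)
x = rng.standard_normal(200)
h1 = np.exp(-np.arange(30) / 6.0)
y_lin = wiener_2nd_order(x, 0.5, h1, np.zeros((30, 30)), P=1.0)
```

This brute-force evaluation scales with the square of the kernel memory M per output sample; more economical representations are discussed in later chapters.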
The first-order and second-order Wiener kernels of the horizontal-to-ganglion neu-
ronal pathway, estimated through the cross-correlation technique, are shown in Figure
1.5. The interpretation of these kernels became a key issue from the very beginning, insti-
gating intensive debates regarding the potential utility of this approach. Many arguments
were voiced for and against this approach, some of which remain the fodder of lively de-
bate to date. However, the fact remains that this general nonparametric approach repre-
sents a quantum leap of improvement over any other method used heretofore and, al-
though some interpretation issues remain open, it has already elucidated immensely our
understanding of retinal function. For instance, it is evident from the waveform of h1 that
the system is encoding both intensity and rate-of-change information (discussed further in
Section 6.1.1). It is also evident from the form of h2 that the system exhibits rectifying
nonlinearities, consistent with the presence of a threshold for the generation of action po-
tentials (see the contribution of h2 to the model prediction in Figure 1.6). The validation
for this model is provided by its ability to predict the output signal to any given input sig-
nal, as demonstrated in Figure 1.6.
Many other "systems of interest" have been defined in the retina by considering the
outputs of other retinal neurons (e.g., horizontal, bipolar, amacrine) and/or other inputs
(e.g., light intensity, current injected into various cells, or light stimuli of different wave-
lengths). Extensive modeling studies have been reported for these systems by Naka and
his associates following the nonparametric approach (see Section 6.1.1).

rhZ

~
SECONO-ORQER KERNEL
25

h, (Tl , FIRST-ORDER KERNEL

s
.::::::

0.2
-5

-10

Figure 1.5 The first- and second-order Wiener kernel estimates of the horizontal-to-ganglion cell
system in the catfish retina, obtained via the cross-correlation technique using band-limited GWN
stimuli of current injected into the horizontal cell layer (after Marmarelis & Naka, 1972).
18 INTRODUCTION

[Figure 1.6: four traces — GWN current stimulus; recorded ganglion cell response; linear (h1) model prediction; nonlinear (h1 + h2) model prediction]
Figure 1.6 The recorded experimental response of the ganglion cell (second trace) represented as
"frequency of firing" (or peristimulus histogram) for repeated trials of the band-limited GWN current
stimulus shown in the top trace. The predictions of the linear (i.e., first-order) model and of the non-
linear (i.e., second-order) model using the Wiener kernels of Figure 1.5 are shown in the third and
fourth traces, respectively (after Marmarelis & Naka, 1972).

Example 1.2. Invertebrate Photoreceptor


As a second illustrative example, we consider the modular (block-structured) model de-
rived for the photoreceptor of the fly eye (retinula cells 1-6 in an ommatidium of the com-
posite eye of the fly Calliphora erythrocephala) using CSRS quasiwhite stimuli under
light-adapted conditions [Marmarelis & McCann, 1977]. We have found that the
input-output relation of this system can be described by means of a modular model com-
prised of the cascade of a linear filter L followed by a quadratic static nonlinearity N (see
Figure 1.7). If the impulse response function of the filter L is denoted by g(m) and the stat-
ic nonlinearity N is the quadratic function y = c1v + c2v², then the equivalent discrete-time
nonparametric model takes the form of a discrete Volterra model similar to the Wiener
model of Equation (1.4), except for the first and last terms of the right-hand side, which be-
come zero in the Volterra model (see Section 2.2.1). The first- and second-order discrete
Volterra kernels of this model (all other kernels are zero) are given by the expressions
k1(m) = c1 g(m) u(m)     (1.5)

k2(m1, m2) = c2 g(m1) g(m2) u(m1) u(m2)     (1.6)

[Figure 1.7: cascade diagram x → linear filter L → static nonlinearity N → y, with the impulse response of L shown]
Figure 1.7 The L-N cascade model (a linear filter L followed by a static nonlinearity N) obtained for
the photoreceptor of the fly Calliphora erythrocephala (after Marmarelis & McCann, 1977).
1.4 TYPES OF NONLINEAR MODELS OF PHYS/OLOG/CAL SYSTEMS 19

where u(m) denotes the discrete step function (zero for m < 0 and 1 for m ≥ 0), manifest-
ing the causality of the model, and g(m) is shown in Figure 1.7 (for the proof, see Section
4.1.2).
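A numerical check of Equations (1.5)-(1.6) can be sketched as follows, with a hypothetical g(m) and coefficients c1, c2 (not the fly-photoreceptor estimates): the cascade output and the output of the equivalent second-order Volterra model should coincide.

```python
import numpy as np

rng = np.random.default_rng(1)

# hypothetical linear filter g(m) and quadratic nonlinearity y = c1*v + c2*v^2
M = 40
g = np.exp(-np.arange(M) / 8.0)
c1, c2 = 1.0, 0.3

x = rng.standard_normal(300)
v = np.convolve(x, g)[:len(x)]   # output of the linear filter L
y_cascade = c1 * v + c2 * v**2   # output of the static nonlinearity N

# equivalent Volterra kernels, Equations (1.5)-(1.6) (u(m) is implicit
# in the causal convolution below)
k1 = c1 * g
k2 = c2 * np.outer(g, g)

y_volterra = np.convolve(x, k1)[:len(x)]      # first-order functional
for n in range(len(x)):
    xm = x[max(0, n - M + 1):n + 1][::-1]     # x(n - m), zero-padded
    xm = np.pad(xm, (0, M - len(xm)))
    y_volterra[n] += xm @ k2 @ xm             # second-order functional

print(np.allclose(y_cascade, y_volterra))  # True
```

The agreement is exact (up to floating-point rounding) because the L-N cascade has no Volterra kernels beyond second order, as stated above.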
We can obtain the equivalent parametric discrete-time model through "parametric real-
ization" of g(m) (see Section 3.4), given by the two equations

v(n) = a1 v(n - 1) + ... + aK v(n - K) + β x(n)     (1.7)

y(n) = c1 v(n) + c2 v²(n)     (1.8)

where the linear difference equation (1.7) describes the linear filter L (for properly cho-
sen order K) and the algebraic equation (1.8) describes the static nonlinearity N. It is ev-
ident that if we seek to substitute v in terms of y in Equation (1.7) by solving Equation
(1.8) with respect to v, we arrive at an irrational expression in terms of y (i.e., this sys-
tem cannot be represented by a single rational nonlinear difference equation). In control
engineering terminology, Equation (1.7) can be viewed as a "state equation" and
Equation (1.8) as the "output equation." This parametric model is also isomorphic to the
L-N modular model in this case.
The equivalent connectionist model for this system requires an infinite number of hid-
den units if the conventional sigmoidal activation functions are used (see Section 4.2.1).
However, the use of polynomial activation functions allows for an equivalent connection-
ist model with a single hidden unit, as discussed in Section 4.2.1.

Example 1.3. Volterra Analysis of Riccati Equation


The third example is chosen to be a parametric model of a nonlinear differential system
described by the well-studied Riccati equation:

dy/dt + ay + by² = cx     (1.9)

where x(t) is viewed as the input and y(t) as the output of the system. This model exhibits
a squared-output nonlinearity and can also be written as

L(D)y = cx - by²     (1.10)

where L(D) represents the differential operator D + a, with D denoting differentiation
over time. The form of Equation (1.10) implies that this model can also be viewed as a
nonlinear feedback model with a linear feedforward component cL⁻¹(D) and a static non-
linear (square) negative feedback, as shown in Figure 1.8. This negative feedback formu-
lation represents a modular (block-structured) model, equivalent to the parametric model
of Equation (1.9). In Figure 1.8, we also show an equivalent "circuit model," since this
type of equivalent model form has been used extensively in physiology for parametric
models described by differential equations. We consider the equivalent circuit model as
another form of a parametric model (since it can be directly converted into a system of
differential equations, and vice versa).
The equivalent nonparametric model for the Riccati equation is derived in Section 3.2
and corresponds to an infinite-order Volterra series. However, if we assume that |b| is
very small, then the higher-order Volterra functional terms (higher than second order) can

[Figure 1.8: feedback block diagram (left panel) and equivalent circuit (right panel)]
Figure 1.8 Equivalent modular (block-structured) model for the parametric model defined by the
Riccati equation (1.9), depicting linear dynamic feedforward and nonlinear static feedback compo-
nents (left panel). The equivalent "circuit model" (right panel) has a current source x flowing through a
fixed conductance c, and the output is represented by the voltage y applied to a unit capacitance
and a voltage-dependent conductance: G = a + by.

be neglected and the equivalent nonparametric Volterra model becomes approximately of
second order, expressed in continuous time as

y(t) ≈ k0 + ∫0∞ k1(τ) x(t - τ) dτ + ∫0∞ ∫0∞ k2(τ1, τ2) x(t - τ1) x(t - τ2) dτ1 dτ2     (1.11)

where x(t) and y(t) denote the input and output signals, respectively, and the Volterra ker-
nels are given by the expressions

k0 = 0     (1.12)

k1(τ) = c e^(-aτ) u(τ)     (1.13)

k2(τ1, τ2) = (bc²/a) e^(-a(τ1 + τ2)) [1 - e^(a·min(τ1, τ2))] u(τ1) u(τ2)     (1.14)

where u(τ) denotes the continuous-time step function (0 for τ < 0, 1 for τ ≥ 0). Note that
the Volterra kernels depend on the Riccati parameters (as expected), but terms of order b²
or higher have been neglected. In practice, these Volterra kernels are estimated from sam-
pled input-output data (i.e., discretized signals) and they yield a discrete-time Volterra
model. An equivalent continuous-time model can be obtained subsequently (if needed) by
means of the "kernel invariance method" presented in Section 3.5.
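As a sketch of this second-order approximation (the parameter values, test input, and tolerance are illustrative assumptions), one can integrate Equation (1.9) directly and compare it with the Volterra model; the second-order functional is computed here in an equivalent cascade form, y2(t) = -b ∫ exp(-aλ) y1²(t - λ) dλ, which is consistent with Equation (1.14):

```python
import numpy as np

# parameters of the Riccati system dy/dt + a*y + b*y^2 = c*x, Equation (1.9)
a, b, c = 1.0, 0.05, 1.0
T = 0.005                 # integration / sampling step
t = np.arange(0, 20, T)
x = np.sin(1.5 * t)       # arbitrary bounded test input

# reference: direct Euler integration of the Riccati equation
y_ode = np.zeros_like(t)
for n in range(1, len(t)):
    y_ode[n] = y_ode[n-1] + T * (-a*y_ode[n-1] - b*y_ode[n-1]**2 + c*x[n-1])

# first-order Volterra functional with k1(tau) = c*exp(-a*tau), Equation (1.13)
tau = np.arange(0, 10, T)              # memory extent (~10/a)
k1 = c * np.exp(-a * tau)
y1 = T * np.convolve(x, k1)[:len(t)]

# second-order functional in cascade form: y2 = -b * conv(exp(-a*lam), y1^2)
g = np.exp(-a * tau)
y2 = -b * T * np.convolve(y1**2, g)[:len(t)]

y_volt = y1 + y2                       # k0 = 0, Equation (1.12)
err = np.sqrt(np.mean((y_ode - y_volt)**2)) / np.sqrt(np.mean(y_ode**2))
print(f"relative RMS error: {err:.4f}")
```

The residual error is dominated by the discretization step and the neglected third- and higher-order terms, which scale with b²; shrinking |b| or T shrinks the error accordingly.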
The discrete-time Volterra kernels of first order and second order can be obtained by dis-
cretization of their continuous-time counterparts of Equations (1.13) and (1.14):

k1(m) = γ α^m u(m)     (1.15)

k2(m1, m2) = -(βγ²/ln α) α^(m1 + m2) [1 - α^(-min(m1, m2))] u(m1) u(m2)     (1.16)

where the discrete-time parameters (α, β, γ) of the equivalent discrete-time parametric
model, which takes the form of a first-order nonlinear difference equation,

y(n) - α y(n - 1) + β y²(n - 1) = γ x(n)     (1.17)


are distinct from the continuous-time parameters (a, b, c) and are expressed in terms of the
continuous parameters of the Riccati equation as α = exp(-aT), β = bT, γ = cT, where T
is the sampling interval.
The mathematical analysis of the equivalence between nonlinear differential and dif-
ference equation models is based on the "kernel invariance method" presented in Section
3.5. We note that the nonparametric model is not as compact as its parametric counterpart.
On the other hand, the model specification task is greatly simplified in the nonparametric
approach, and the estimation of the kernels of the nonparametric model can be accom-
plished by various methods using input-output data, as described in Chapter 2.

Example 1.4. Glucose-Insulin Minimal Model


The fourth example is drawn from the metabolic/endocrine system and concerns the ex-
tensively studied dynamic interrelationship between blood glucose and insulin. The wide-
ly accepted "minimal model," used in connection with glucose tolerance tests, is a good
example of a parametric model for this nonlinear dynamic interrelationship [Bergman et
al., 1981; Carson et al., 1983; Cobelli & Mari, 1983]. This model is comprised of the fol-
lowing two first-order differential equations:

dG(t)/dt = -p1 [G(t) - Gb] - X(t) G(t)     (1.18)

dX(t)/dt = -p2 X(t) + p3 [I(t) - Ib]     (1.19)

where G(t) is the glucose plasma concentration (in mg/dl), X(t) is the insulin action (in
min⁻¹), I(t) is the insulin plasma concentration (in μU/ml), Gb is the basal glucose plasma
concentration (in mg/dl), Ib is the basal insulin plasma concentration (in μU/ml), p1 and
p2 are two characteristic parameters describing the kinetics of glucose and insulin action,
respectively (in min⁻¹), and p3 (in min⁻² ml/μU) is a parameter determining the modula-
tory influence of insulin action on glucose uptake dynamics. It should be noted that this
model does not take into consideration the pancreatic secretion of insulin induced by
changes in glucose plasma concentration or the production of new glucose from internal
organs (e.g., liver in response to elevation of plasma insulin), which can be described by
separate differential equations (although this is far from a trivial task). The physiological
parameters of glucose effectiveness SG = p1 (in min⁻¹) and insulin sensitivity SI = p3/p2 (in
ml min⁻¹/μU) have been defined and used extensively in the literature for
physiological/clinical purposes.
The above system is nonlinear, due to the bilinear term present in Equation (1.18),
which gives rise to an equivalent nonparametric Volterra model of infinite order. Howev-
er, it can be shown that, for the physiological range of the parameter values, a second-or-
der Volterra model approximation is adequate for all practical purposes. Considering the
variations of I(t) around Ib as the input of the system and G(t) as the output, we can derive
the Volterra kernels of the system analytically using the generalized harmonic balance
method (see Section 3.2). The resulting expressions for the zeroth-, first-, and second-
order kernels are (in first approximation, for p3 ≪ p1):

k0 = Gb     (1.20)

k1(τ) = -p3 Gb h(τ)     (1.21)

k2(τ1, τ2) = (Gb p3²/2) { h(τ1) h(τ2) + p1 ∫0^min(τ1,τ2) exp(-p1 λ) h(τ1 - λ) h(τ2 - λ) dλ }     (1.22)

where

h(τ) = [1/(p2 - p1)] [exp(-p1 τ) - exp(-p2 τ)]     (1.23)

The rth-order Volterra kernel is proportional to p3^r and can be neglected if |p3| is very
small. Many other parametric models have been proposed for this system, typically com-
prising a larger number of "compartments" [Cobelli & Pacini, 1988; Vicini et al., 1999].
The equivalent first- and second-order Volterra kernels of the "minimal model" are shown
in Figure 1.9 for typical parameters: p1 = 0.023, p2 = 0.033, p3 = 1.783 × 10⁻⁵, and Gb =
80.25. An equivalent modular model is shown in Figure 1.10, utilizing two linear filters, a
multiplier, an adder, and a feedback pathway. This modular (block-structured) model de-
picts the fact that the minimal model can be viewed as expressing a nonlinear (modulato-
ry) control mechanism implemented by the feedback pathway into the multiplier.
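As a sketch (the integration step, impulse area, and tolerance are illustrative assumptions), the first-order kernel of Equation (1.21) can be checked by integrating Equations (1.18)-(1.19) for a small insulin impulse and normalizing the glucose deviation by the impulse area:

```python
import numpy as np

# typical minimal-model parameters, as quoted in the text
p1, p2, p3, Gb = 0.023, 0.033, 1.783e-5, 80.25
T = 0.05                          # integration step (min)
t = np.arange(0, 600, T)

# small insulin deviation i(t) = I(t) - Ib: an impulse of unit area
A = 1.0                           # impulse area (uU/ml * min)
i = np.zeros_like(t)
i[0] = A / T

# Euler integration of Equations (1.18)-(1.19) in deviation form g = G - Gb
g = np.zeros_like(t)
X = np.zeros_like(t)
for n in range(1, len(t)):
    X[n] = X[n-1] + T * (-p2 * X[n-1] + p3 * i[n-1])
    g[n] = g[n-1] + T * (-p1 * g[n-1] - X[n-1] * (Gb + g[n-1]))

# analytical first-order kernel, Equations (1.21) and (1.23)
h = (np.exp(-p1 * t) - np.exp(-p2 * t)) / (p2 - p1)
k1 = -p3 * Gb * h

err = np.max(np.abs(g / A - k1)) / np.max(np.abs(k1))
print(f"max deviation from k1: {err:.3%}")
```

For a small impulse the normalized glucose deviation tracks k1(τ) closely, because the bilinear term contributes only at second order in p3, consistent with the kernel expressions above.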

Example 1.5. Cerebral Autoregulation


The fifth example is drawn from the cardiovascular system and concerns cerebral au-
toregulation. This is a challenging example of a physiological system with multiple
closed (nested) loops that involve biomechanical, neural, endocrine, and metabolic mech-
anisms interacting with each other. A simplified schematic of the protagonists in cerebral
autoregulation is shown in Figure 1.11. A modular model obtained from real data is
shown in Figure 1.12, depicting three parallel branches that correspond to the "principal
dynamic modes" of this system (see Section 4.1.1). This modular model was obtained via

[Figure 1.9: plots of the first-order kernel versus time (minutes) and of the second-order kernel versus m1, m2 (minutes)]

Figure 1.9 The equivalent first- and second-order Volterra kernels given by Equations (1.21) and
(1.22) for the insulin-glucose "minimal" model defined by Equations (1.18)-(1.19).
[Figure 1.10: block diagram — insulin deviation (I(t) - Ib) passes through two linear filters, a multiplier, and an adder to produce the glucose output G(t)]
Figure 1.10 Equivalent modular (block-structured) model for the insulin-glucose minimal model, utilizing
two linear filters with impulse response functions p3 exp(-p2 τ) and exp(-p1 τ), an adder, and a multi-
plier for negative multiplicative (modulatory) feedback (p1 Gb is a fixed-reference level defined by the
basal glucose value).

Figure 1.11 Schematic of the main protagonists in cerebral flow autoregulation.

[Figure 1.12: three parallel branches, each showing a principal dynamic mode in the frequency domain (f in Hz, 0-0.5) followed by a static nonlinearity]

Figure 1.12 Modular model of cerebral flow autoregulation in a normal human subject, using the ad-
vocated methodology that starts with the general Volterra model and derives an equivalent "principal
dynamic mode" model of lower complexity. Each of the three branches is composed of a linear filter
(a "principal dynamic mode" of the system) followed by a static nonlinearity. The model was obtained
from 6 min long data of mean arterial blood pressure (input) and mean cerebral blood flow velocity
(output) sampled every 1 sec (for details, see Section 6.2).


the Laguerre-Volterra network approach presented in Section 4.3, using mean arterial
blood pressure data as input and mean cerebral blood flow velocity data as output. The
model reveals the presence of (at least) three nonlinear dynamic mechanisms of cerebral
autoregulation (see Section 6.2). Equivalent parametric, connectionist, and nonparametric
models can be obtained from this modular model (see Chapter 4).
The many technical details surrounding the derivation of these equivalent model forms
are discussed in Chapters 2-4, along with the corresponding estimation methods and their
relative performance characteristics. Important practical considerations for the successful
application of these modeling methodologies and the required preliminary testing and er-
ror analysis in actual modeling applications are given in Chapter 5.

1.5 DEDUCTIVE AND INDUCTIVE MODELING

The hypothesis-driven approach to scientific research offers a time-honored deductive
path to scientific discovery that has been proven effective in the development of the phys-
ical sciences, closely associated with the reductionist viewpoint (i.e., proceeding deduc-
tively from first principles). However, this traditional approach encounters difficulties as
the complexity of the problem under study increases, making the effective use of first
principles unwieldy. This fact has given rise to a complementary approach that follows an
inductive method based on the available data. In the inductive (data-driven) approach, pri-
ority is given to the data and the research effort is directed toward developing rigorous
and robust mathematical and computational methods that extract the relevant information
(models in our case) in a general context. This general context minimizes the a priori as-
sumptions made about the system under study and, therefore, avoids possible "biasing" of
the results by preconceived (and possibly restrictive) notions of the individual investiga-
tor. To the extent that "unbiased knowledge" is acquired by this data-true inductive
process, it can be incorporated in subsequent hypothesis-driven research to answer specif-
ic questions of interest and ultimately derive "general laws" that can be used deductively
to advance scientific knowledge.
A synergistic approach is advocated in this book that commences with inductive (data-
driven) modeling, following the methodologies presented in this book, and then formu-
lates specific hypotheses that can be tested to unambiguously answer specific scientific
questions of interest. In this manner, we secure the advantages of both approaches (induc-
tive and deductive) and avoid their mutual shortcomings. In addition, this synergistic ap-
proach is time-efficient and cost-effective, because the inductive method places us quick-
ly in the "neighborhood" of the correct model that can be further elaborated with regard to
specific scientific questions of interest by use of hypothesis-driven research. The specific
hypotheses depend, of course, on the goals of the study and, therefore, cannot be pre-
scribed beforehand, other than to indicate that they will be structured in a manner compat-
ible with the available models.
Examples of the advocated synergistic approach are given in Chapter 6, where specific
parametric or modular (block-structured) models are examined along with the previously
obtained (data-true) nonparametric models in order to answer specific scientific questions
and assist in the interpretation of the models. Another class of examples pertains to the ef-
fects caused by the experimental change of a key controlling variable and the assessment
of the resulting quantitative changes in the model characteristics (e.g., effect of various
drugs on cardiovascular, neural, or metabolic function).

The advocated synergistic approach is appropriate for complex systems (because it ob-
viates the need for vast reductionist/hierarchical structures) and protects the investigator
from possible misleading results (when the assumptions made are restrictive or the testing
conditions are not "natural"). It is hard to imagine a "downside" to this synergistic ap-
proach. In the worst case, when it is not necessary because the investigator is able to con-
struct "perfect" hypothesis-driven tests, then we get definitive validation of the hypothe-
sis-based results-hardly a useless outcome. In all other cases, where existing
reductionist knowledge and subjective intuition are either limited or yield unwieldy mod-
el postulates, the initial use of the data-driven approach can protect from potentially mis-
leading results and can accelerate the pace of progress by placing us in the "neighbor-
hood" of the correct answer. Further refinements/elaborations using the hypothesis-driven
approach are subsequently possible and desirable.
One may wonder, then, why the advocated data-driven approach has not been used
more extensively. The answer is that appropriate methodologies capable of tackling the
true complexity of the problem (i.e., nonlinear, dynamic, nonstationary, multivariate,
nested-loop) have not been available heretofore. Lack of such methodologies forces in-
vestigators to use inadequate (restrictive) methods that are often unable to achieve the in-
tended goals (i.e., the results are "biased" and often obtained under restrictive experimen-
tal conditions that do not place us in the "neighborhood" of the correct answer). As a
result, the data-driven approach has not yet "proven" itself for lack of appropriate practi-
cable methodologies, and investigators have seen no compelling reason yet to depart from
the traditional hypothesis-driven approach. The overall goal of this book is to make avail-
able such appropriate methodologies to the biomedical research community at large to
help usher in a new era of advanced research on the true physiological systems-and help
the peer community avoid unrealistic simplifications, born of perceived necessity, that
may breed misconceptions and perpetuate a state of studious confusion.
It is useful to remind the reader that the long debate between the reductionist and the
integrative viewpoints (originating with Hippocrates in the 5th century B.C. and lasting
with undiminished intensity until the present time) is intertwined with the issue of hy-
pothesis-driven research. The latter fosters a tendency toward fragmentation and static in-
quiry in a legitimized effort to construct and test clear and comprehensible hypotheses.
This approach has borne considerable benefits but also has placed serious limitations on
those cases in which multipart dynamic interactions are of critical importance. For in-
stance, when Erasistratus and the Anatomists were generating a wealth of anatomic
knowledge in Alexandria in the 3rd century B.C., they were inadvertently diverting med-
ical thought away from the fundamental Hippocratic concepts of the unity of organism
and the dynamic disease process. This fact should not detract from the indisputable con-
tributions of the Anatomists to medical science but should serve as a constructive re-
minder of the balance required in pursuing scientific research. Advancing knowledge is a
multifaceted endeavor and requires a multiprong approach.
This point is not a mere historical curiosity, because a similar debate is taking place in
our times (only on a much larger scale) between the reductionist approach espoused by
molecular biology and the integrative approach advocated by systems biology/physiolo-
gy. Nor is this debate an idle intellectual exercise, since it affects critically the direction of
future research efforts. Although the integrative approach may follow either hypothesis-
driven or data-driven methods, this book argues for a synergistic approach that gives pri-
ority to the data-driven methods in order to avoid self-entrapment within the comfortable
confines of established viewpoints. A synergistic approach is also sensible in order to
combine the benefits of the reductionist and integrative viewpoints. Although the desir-
ability of this combination is self-evident, the historical record shows that even the best
human minds have a tendency toward polarized binary thinking. The reasons for this ten-
dency toward "binary thinking" are beyond the scope of this book, but certainly the root
causes must be searched for in the psychophilosophical plexus of the human mind. The
reader is urged to contemplate a way out of this "failing of the human mind" by taking
into consideration the Galenic philosophical exhortations (see Historical Note #1 below).

HISTORICAL NOTE #1: HIPPOCRATIC AND GALENIC VIEWS OF
INTEGRATIVE PHYSIOLOGY*

Hippocrates is considered the founder of the medical profession, since he was the first to
separate medicine from priestcraft and give it the independent status of a scientific disci-
pline in Greece in the 5th century B.C. He was affiliated with the Asclepeion of Cos (a
Greek island in the southeast Aegean) but also lived and worked in post-Periclean Athens.
By providing a rational basis for medical practice and emphasizing the importance of
clinical observation, Hippocrates did to medical thought what his contemporary, Socrates,
did to thought in general: separated it from cosmological speculation. He gave the physi-
cian an independent status but held him to a high professional standard embodied in the
"Hippocratic oath" that still defines the elevated duties of physicians worldwide.
Hippocrates observed that the human organism responds to external stresses/assaults
(including disease) in a homeostatic manner (recuperation, in the case of disease); that is,
the living organism possesses self-preserving powers and tends to maintain stable opera-
tion through complicated intrinsic mechanisms. He observed that each disease tends to
follow a specific course through time and, therefore, it is a dynamic process (as opposed
to a static state, which was the prevailing view at the time). Consequently, he emphasized
prognosis over diagnosis and believed in the recuperative powers of the living organism.
He advocated "giving nature a chance" to effect over time the adjustments that will cure
the disease and restore health in most cases.
This broadly defined homeostatic view led Hippocrates to the notion of the "unity of
organism" that underpins integrative systems physiology to the present day. This integra-
tive view tended to overlook the importance of the constituent parts and set him apart
from the "reductionist" viewpoint espoused by the Anatomists of Alexandria. The labors
of the latter in the 3rd century B.C. (especially Erasistratus) made outstanding contribu-
tions to medicine and gave rise to the sect of the Empiricists, who proclaimed their con-
cern with "the cure, and not the cause, of disease."
The reductionist viewpoint promulgated by the Empiricists was reinforced by the
atomic theory of Leucippus and Democritus, as adapted to medicine by Asclepiades (1st
century B.C.), who introduced Greek medicine to Rome. A man of forceful personality
and broad education, Asclepiades combined flamboyance with sharp intellect to achieve
professional success and promote the view that physiological processes depend upon the
particular way in which the indivisible particles of atomic theory come together. Al-
though the validity of this fundamental view is indisputable (and self-evident), it did not
lead to any constructive proposition for medical practice but served only as a public-rela-

*Following A. J. Brock's introductory comments in his translation of Galen's "On the Natural Faculties," Har-
vard University Press, 1979 (first printed in 1916).

tions vehicle for self-promotion in the intellectually shallow and "faddish prone" society
of 1st-century Rome, a phenomenon that is frequently recurring in history and all too fa-
miliar in our times as well. In fact, it can be argued that the disbelief of Asclepiades in the
self-maintaining powers of the living organism and the intrinsic obstructionism of his
maximalist approach caused a serious regression in the progress of medicine at that time.
His views gave rise to the "Methodists" (founded by his pupil Themison), who espoused
the simplistic pathological theory that all diseases belonged to two classes: one caused by
constricted and the other by dilated pores traversing the molecular groups that compose
all tissues. Another dubious trait established by the Methodists (and still prevalent to pre-
sent time) is the tendency to invent a label for a perceived "disease" and then "treat the la-
bel" with no regard to the actual physiological processes that underpin the perceived dis-
ease.
The Empiricists and the Methodists were dominating Graeco-Roman medicine when
Galen was born (circa 131 A.D.) as Claudius Galenos in Pergamos (a major Greek cultur-
al center in Asia Minor during the Roman period). Galen, or more appropriately, Galenos
(Γαληνός, meaning "tranquil" in Greek) had a benevolent and well-educated father,
Nicon, who was a distinguished architect, mathematician, and philosopher. Galenos re-
ceived an eclectic liberal education and studied the four main philosophical systems: Pla-
tonic, Aristotelian, Stoic, and Epicurean. He pursued medical studies under the best teach-
ers in Pergamos and afterward in the other Hellenic centers of medical studies: Smyrna,
Corinth, and Alexandria. At the age of 27, he returned to Pergamos and was appointed
surgeon to the gladiators. Four years later, driven by professional ambition, he went to
Rome, where he quickly achieved high distinction, rising to the coveted position of physi-
cian to the emperor Marcus Aurelius. Despite his broad acceptance and popularity,
Galenos made no effort to conceal his contempt for the ignorance and charlatanism of
most physicians in Rome. His courageous stand against corrupt medical practice, com-
bined with professional envy, earned him many enemies in the medical circles of Rome,
who conspired against his life. To save his life, he fled Rome secretly at the age of 37 and
returned to his old home in Pergamos, where he settled down to a literary life of philo-
sophical contemplation and medical research. Even an imperial mandate a year later was
not able to summon him back to Italy. Galenos pleaded vigorously to be excused and the
emperor eventually consented, while trusting to his care the young prince Commodus.
During the remaining 30 years of his life, Galenos wrote extensively on physiology,
anatomy, and logic, providing the foundation for medieval medicine as the supreme au-
thority (he was called the "Medical Pope of the Middle Ages") until Vesalius and Harvey
disproved some of Galenos' basic cardiovascular premises with their seminal experiments
and laid the foundation of modern anatomy and physiology, respectively, in the 16th and
17th centuries A.D.
In the six centuries that elapsed between Hippocrates and Galenos, the big debate in
medicine revolved around the interrelated issues of the integrative versus reductionist
viewpoints of physiology (Hippocrates' view of the unity of organism vs. the Atomists'
view of decomposition into indivisible particles) and the dynamic versus static views of
disease (Hippocrates' view of disease as a process vs. the anatomical view of the Empiri-
cists). Galenos managed to put this debate to rest for 14 centuries by convincingly making
the Hippocratic case. He reestablished the Hippocratic ideas of the unity of the organism,
the dynamic interdependence of its parts, and its interaction with the environment (home-
ostasis). This constitutes the common conceptual foundation with our contemporary view
of integrative systems physiology in a dynamic and homeostatic context that is espoused
by this book. The living system can only be understood as a dynamic whole and not by
static isolation of its component parts.
This fundamental principle is in direct opposition to the widespread view (even in our
own times) that the whole can be understood by careful summation of the elaborated parts
(reductionist viewpoint). The key difference concerns the emerging physiological proper-
ties from the dynamic interactions of the component parts and the interaction of the whole
living system with its environment. In this, we stand today with Hippocrates and Galenos
in uncompromising solidarity.
Galenos was not only a man of great intellect but also possessed a strong moral consti-
tution. In his book "That the best Physician is also a Philosopher," he stipulated that a
physician should have perfect self-control, should live laborious days, and should be dis-
interested in money and the weak pleasures of the senses. Clearly, he would be a "hard
sell" today. He calls on the physicians to be versed in: (a) logic, the science of how to
think; (b) physics, the science of what is nature; and (c) ethics, the science of what to do.
We must always remain aware of his concerns that medicine should not be allowed to fall
into the hands of competing specialists without any organizing scientific, philosophical,
and moral principles. His recorded thoughts remain an inspiration forever and guide the
constant evolution of medical science and practice.
2
Nonparametric Modeling

Nonparametric modeling constitutes, at present, the most general and mature (i.e., tested
and reliable) methodology for nonlinear modeling of physiological systems. Its main
strengths and weaknesses are summarized below.
The main strengths of nonparametric modeling are:

• It simplifies the model specification task and yields nonlinear dynamic "data-true"
models for almost all physiological systems.
• It yields robust model estimates from input-output data in the presence of ambient
noise and systemic interference.
• It allows derivation of equivalent parametric and modular model forms that facili-
tate physiological interpretation of the model.
• It is extendable to nonlinear dynamic modeling of physiological systems with multi-
ple inputs and outputs (including spatiotemporal, spectrotemporal, and spike-train
or point-process data).
• It is extendable to nonlinear dynamic modeling of many nonstationary physiologi-
cal systems (including adapting and cyclical behavior).

The main weaknesses of nonparametric modeling are:

• It requires judicious attention to maintain the compactness of the model, lest it be-
come unwieldy for highly nonlinear systems.
• It requires appropriate input-output experimental or natural data (i.e., broadband
and in sufficient quantity).
• Physiological interpretation of the model may require derivation of equivalent para-
metric or modular (e.g., PDM) model forms.

Nonlinear Dynamic Modeling of Physiological Systems. By Vasilis Z. Marmarelis 29


ISBN 0-471-46960-2 © 2004 by the Institute of Electrical and Electronics Engineers.

Nonparametric modeling employs the mathematical tool of a functional (a term due to
Hadamard) that is a function of a function. As an alternative, the term "function of a line"
was used by Volterra in his early work. A functional can represent mathematically the in-
put-output transformation performed by a causal system. The functional defines the map-
ping F of the past and present values of the input signal x(t) (a function) onto the present
value of the output signal y(t) (a scalar):

y(t) = F[x(t′), t′ ≤ t]    (2.1)

The objective of nonparametric modeling is to obtain an explicit mathematical representa-
tion of the functional F using input-output data (i.e., to derive inductively an empirical,
true-to-the-data mathematical model of the system input-output transformation).
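As a concrete (hypothetical) discrete-time illustration of such a causal functional, consider a system whose present output is a static nonlinear readout of an exponentially fading summary of the input past and present; the fading factor and the quadratic readout are arbitrary choices for the sketch.

```python
import numpy as np

def causal_functional(x, alpha=0.8):
    """A simple causal functional F: the present output y(n) depends on
    the input values x(n'), n' <= n, through an exponentially fading
    memory, followed by a static quadratic readout (both are arbitrary
    illustrative choices)."""
    y = np.zeros(len(x))
    u = 0.0
    for n, xn in enumerate(x):
        u = alpha * u + xn        # fading-memory summary of the input past
        y[n] = u + 0.5 * u**2     # static nonlinear readout
    return y

x = np.sin(0.2 * np.arange(50))
y = causal_functional(x)
```

Causality here means that changing future input values leaves earlier output values unchanged, which is exactly the mapping restriction expressed by Equation (2.1).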
This conceptual/mathematical framework can be extended to the case of multiple in-
puts and multiple outputs, whereby each output is characterized by its own functional op-
erating on all inputs that have a causal link to this specific output. Thus, the "system" is
defined by its inputs and outputs, which are selected by the investigator to serve the ob-
jectives of the specific study. It is important to realize the immense flexibility afforded the
investigators in defining the "system of interest" and the critical ramifications of this se-
lection vis-à-vis the objectives of their study.
When the system is stationary (time-invariant), this mapping rule F remains fixed
through time, facilitating the modeling task. The reader is alerted to avoid the pitfall of
confusing an output signal varying through time (which is always the case) with temporal
variation of the rule F. The case of nonstationary systems (whereby the mapping rule F
varies through time) is far more challenging from the point of view of obtaining an ex-
plicit mathematical model from input-output data. This book focuses on stationary sys-
tem modeling that has been the subject of most studies to date, although nonstationary
modeling methods are also discussed in Chapter 9.
It is important to note that the mathematical formulation of Equation 2.1 applies to dy-
namic systems, although the latter are often associated with differential equation models
(parametric models). A system is dynamic (and causal) when the present value ofthe out-
put depends on the present and past values of the input, as indicated by the functional no-
tation of Equation 2.1. Another definition of dynamic systems, beyond the conventional
differential equation fonnalism, can be based on whether or not the effects of an instanta-
neous (impulsive) input on the output spread over time. This is illustrated in Figure 2.1,
where the distinction is also made between amplitude nonlinearity and dynamic nonlin-
earity.
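The interaction-term test illustrated in Figure 2.1 is easy to state computationally: apply two impulses separately and together, and check whether the combined response equals the sum of the individual responses. The toy second-order system below is a hypothetical example whose nonzero interaction term reveals dynamic nonlinearity.

```python
import numpy as np

def toy_system(x):
    """Hypothetical second-order system: a linear convolution plus the
    square of a second convolution, so separate impulses interact."""
    t = np.arange(20.0)
    h1, h2 = np.exp(-t / 4.0), np.exp(-t / 8.0)
    u1 = np.convolve(x, h1)[:len(x)]
    u2 = np.convolve(x, h2)[:len(x)]
    return u1 + u2**2

N = 60
xa, xb = np.zeros(N), np.zeros(N)
xa[5], xb[12] = 1.0, 1.0             # impulses at times t1 and t2
ya, yb = toy_system(xa), toy_system(xb)
yc = toy_system(xa + xb)             # both impulses applied together
delta = yc - yb - ya                 # interaction term of Figure 2.1
```

For this system, delta is nonzero wherever the squared-branch responses to the two impulses overlap in time, which is the signature of dynamic (as opposed to merely amplitude) nonlinearity.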
The general approach to nonparametric modeling of nonlinear dynamic systems from
input-output data is based on the Volterra functional expansion (or Volterra series) and
its many elaborations or variants, including the orthogonal Wiener series for Gaussian
white noise inputs. Because of its fundamental importance, we begin the chapter with a
thorough discussion of the Volterra series and models (Section 2.1) in continuous and dis-
crete time, as well as practical issues of Volterra kernel estimation. We continue with the
orthogonal Wiener series (Section 2.2), although its early importance is rapidly waning
due to novel estimation methods that do not require orthogonality. Methodologies for the
estimation of the Wiener kemels (the unknown quantities in Wiener models) or kernels
from other orthogonal functional expansions using quasiwhite test inputs (e.g., CSRS) are
also presented in Section 2.2. Efficient methodologies for the estimation of Volterra ker-
nels, which are critically important in actual applications, are discussed in Section 2.3, in-

[Figure 2.1 graphic: impulse inputs xa(t), xb(t), and xc(t) = xa(t) + xb(t), with responses ya(t), yb(t), and yc(t) compared against ya(t) + yb(t).]

Figure 2.1 Illustration of dynamic effects of an impulsive input xa(t) at time t1, manifested as the
spread of the elicited response ya(t) beyond time t1 (top trace). The third trace illustrates the violation
of the superposition principle by this system for two impulsive inputs, which indicates the presence of
dynamic nonlinearity in this system by means of the nonzero interaction term δ1,2(t) = yc(t) − yb(t) −
ya(t) (in contradistinction to an amplitude nonlinearity that will be manifested as a violation of a linear
scaling relation between input and output) [Marmarelis & Marmarelis, 1978].

cluding the most efficient method to date (Laguerre expansion technique). Emerging al-
ternative methods for Volterra system modeling, based on equivalent network models and
iterative estimation techniques, are discussed in Section 2.3. Chapter 2 concludes with a
discussion of model estimation errors in a practical context using actual experimental data
(Section 2.4).
The extension of these methodologies to the case of multiple inputs and multiple out-
puts is presented separately in Chapter 7 (including the case of spatiotemporal inputs and
outputs), because it is attracting growing attention and importance. Likewise, the case of
neural systems with spike inputs and/or outputs (sequences of action potentials) is dis-
cussed separately in Chapter 8, because of the significant methodological modifications
required by this distinct signal modality.

2.1 VOLTERRA MODELS

The development of Volterra models relies on the mathematical notion of the Volterra se-
ries (a functional power series expansion) introduced by the distinguished Italian mathe-
matician Vito Volterra* about a century ago [Volterra, 1930]. The term "Volterra series"
is used to denote the (generally infinite) functional expansion of an analytic functional
representing the input-output relation of any continuous and stable nonlinear dynamic
system with finite memory.
The requirement of finite memory is necessary for the convergence of the Volterra
functional series expansion (i.e., for the stability of the system) and excludes chaotic sys-
tems and autonomous (but not forced) nonlinear oscillators from this mathematical repre-
sentation. The term "continuous" is appropriate for physiological systems and seeks to
waive the mathematical requirement of analyticity of the system functional, because even
nonanalytic, but continuous, functionals (corresponding to systems with nondifferentiable

*For a brief bio of Vito Volterra, see Historical Note #2 at the end of this chapter.
continuous input-output relations) can be approximated to any desired degree of accuracy
by an analytic functional (and therefore a Volterra series) in a manner akin to the Weier-
strass polynomial approximation theorem for nonanalytic continuous functions. Thus, the
applicability of the Volterra series expansion to system modeling is very broad and re-
quires a minimum of prior assumptions. We advocate its use because it offers practically
a nearly universal modeling framework for physiological systems.
It is unclear whether Volterra anticipated the profound implications of the proposed
functional series expansion on the system modeling problem. At that time, his work on
functionals and integrodifferential equations was primarily motivated by the desire to un-
derstand phenomena in nonlinear mechanics (e.g., hereditary elasticity and plasticity ex-
tending to hysteresis) and later in population dynamics (e.g., prey-predator models).
Since the conceptual framework of system science (i.e., the notion of a system as an oper-
ator transforming an input into an output) was not part ofthe scientific thinking in the be-
ginning of the 20th century, it is likely that Volterra never conceived his functional ex-
pansion ideas in the context of the system modeling problem as we currently
conceptualize it. This was evidently done in the 1940s by another great thinker and math-
ematician of the 20th century, Norbert Wiener, the father of cybernetics.* Wiener, in his
seminal mono graph [Wiener, 1958], made this critical connection between the nonlinear
system modeling/identification problem and functional expansions (surprisingly, without
acknowledgement of Volterra's preceding work). Wiener's fundamental contributions
will be discussed in the following section.
Returning to Volterra's fundamental contributions to our subject, we note that the
introduction of the Volterra "functional power series" (as he termed it) occupies only
one page in his seminal monograph, Theory of Functionals and Integro-Differential
Equations (p. 21 in the Dover edition). Two more pages (pp. 19-20) are devoted to the
introduction of the "regular homogeneous functionals" of higher degree (what we now
call the Volterra functionals) and the extension of the Weierstrass theorem to continu-
ous functionals, also a Fréchet contribution around the same time [Fréchet, 1928]. It is
worth noting the disproportionate impact of these three pages (out of a 160-page mono-
graph) on the future development of system modeling theory and on Volterra's posteri-
ty. Nonetheless, in acknowledging him, we honor the great impact of his entire scientif-
ic work, as well as the brilliance and overall intellectual stature of a remarkable
individual.
Volterra's pivotal idea is the transition from a finite dimensional vector space to an
enumerably infinite vector space and then to a continuous function space. In other
words, the Volterra series may be viewed as a generalization of the well-known Taylor
multivariate series expansion of an analytic function, f, of m variables as m → ∞. The
multivariate Taylor series expansion of an analytic function f(x_1, ..., x_m) about a refer-
ence point (x_1^0, ..., x_m^0) in the m-dimensional vector space defined by these m vari-
ables as

f(x_1, ..., x_m) = f(x_1^0, ..., x_m^0) + Σ_{i=1}^m a_i (x_i − x_i^0)
        + Σ_{i_1=1}^m Σ_{i_2=1}^m a_{i_1 i_2} (x_{i_1} − x_{i_1}^0)(x_{i_2} − x_{i_2}^0) + ⋯    (2.2)

and evolves into the Volterra functional power series as m → ∞, where the origin of the
real axis is used as the reference point (i.e., x_i^0 = 0). Then the vector [x_1, ..., x_m] turns into

*For a brief bio of Norbert Wiener, see Historical Note #2 at the end of this chapter.
a continuous function x(λ) for λ in the interval [a, b], and the analytic function f turns into
the analytic functional F that can be expressed as the Volterra series expansion:

F[x(λ)] = k_0 + ∫_a^b k_1(λ) x(λ) dλ + ∫_a^b ∫_a^b k_2(λ_1, λ_2) x(λ_1) x(λ_2) dλ_1 dλ_2
        + ⋯ + ∫_a^b ⋯ ∫_a^b k_r(λ_1, ..., λ_r) x(λ_1) ⋯ x(λ_r) dλ_1 ⋯ dλ_r + ⋯    (2.3)

where k_r represents the limit of the multivariate Taylor expansion coefficients a_{i_1 ... i_r} and
is termed the "Volterra kernel" of rth order. The multiple integrals are termed the "regular
homogeneous functionals" or, simply, "Volterra functionals."
The coefficients of the Taylor series expansion in Equation (2.2) are determined by
the partial derivatives of the analytic function f evaluated at the reference point.
Likewise, the kernels of the Volterra series expansion in Equation (2.3) are determined
by the partial derivatives of the analytic functional F as defined by Volterra and others.
This does not necessarily provide the requisite physical or physiological insight into
the meaning of the Volterra kernels for an actual system-an important issue that will
be addressed later in the context of nonlinear system dynamics using illustrative exam-
ples.
With regard to the analyticity requirement of the Volterra series expansion, we al-
ready indicated that it can be relaxed to continuous system functionals based on an ex-
tension of the Weierstrass theorem due to Fréchet and Volterra. This implies broad ap-
plicability of the Volterra expansion in practice, since it is difficult to disprove
experimentally the continuity of a physiological system. Nonetheless, if discontinuity
exists, then either separate Volterra representations can be sought for the different oper-
ational regions where the system is continuous or a satisfactory continuous model ap-
proximation may be obtained for the entire operational region of the system. After all,
the difference between a "discontinuous" jump and a "very rapid" (but continuous)
change is expected to be experimentally opaque in most cases. This remains a topic of
open inquiry, although it is not expected to affect practically more than a minute percentage of
applications to physiological systems.
With regard to the issue of convergence of the Volterra series, we note that lack of
convergence is experimentally manifested either as an instability of the system output for
certain inputs or as the inability of the Volterra model to represent the system output with
satisfactory accuracy. The former manifestation is experimentally evident; however, the
latter may be ambiguous since other factors (e.g., noise or inadequate model specification
andlor estimation) may prevent the model from achieving satisfactory accuracy. The
mathematical condition for convergence of the Volterra series can be expressed as fol-
lows for the case of uniformly bounded inputs (i.e., |x(t)| ≤ B):

∫_0^∞ ⋯ ∫_0^∞ |k_r(τ_1, ..., τ_r)| dτ_1 ⋯ dτ_r ≤ A_r / B^r    (2.4)

where {Ar} is a convergent series of nonnegative scalars. Note that the absolute integra-
bility condition of Equation (2.4) incorporates the requirement of finite memory that was
placed earlier on the applicability of the Volterra modeling approach. It is evident that the
finite-memory requirement is satisfied both in the strict sense of the finite kernel support
(i.e., finite domain of nonzero kernel values) and in the asymptotic sense of kernel ab-
solute integrability.
The form of the Volterra series that we will use throughout the book is slightly differ-
ent from Equation (2.3) to conform with established practice of how the input signal is
represented. Connecting also with the system perspective depicted by Equation (2.1), we
can state that the output y(t) of a stationary stable causal system (and all physiological
systems are assumed causal) can be expressed in terms of its input signal x(t) by means of
the Volterra series expansion as:

$$y(t) = k_0 + \int_0^{\infty} k_1(\tau)\, x(t-\tau)\, d\tau + \int_0^{\infty}\!\!\int_0^{\infty} k_2(\tau_1, \tau_2)\, x(t-\tau_1)\, x(t-\tau_2)\, d\tau_1\, d\tau_2 + \cdots$$
$$+ \int_0^{\infty} \cdots \int_0^{\infty} k_r(\tau_1, \ldots, \tau_r)\, x(t-\tau_1) \cdots x(t-\tau_r)\, d\tau_1 \cdots d\tau_r + \cdots \tag{2.5}$$

where the range of integration (τ_i from 0 to ∞) indicates that the input past and present
values affect the output present value in a manner determined by the Volterra kernels (i.e.,
the kernels should be viewed as weighting functions in this integration). Therefore, the
kernels represent the system-specific patterns of input-output causal effects that practical-
ly extend over a finite time interval μ (i.e., τ_i takes values from 0 to μ) termed the "mem-
ory extent" of the system.
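In discrete time (anticipating Section 2.1.4), a truncated form of the expansion in Equation (2.5) can be sketched directly. The kernels, sampling step, and signals below are hypothetical placeholders, not values from the text:

```python
import numpy as np

def volterra2(x, k0, k1, k2, dt=1.0):
    """Output of a discrete-time second-order Volterra model:
    y(n) = k0 + sum_m k1[m] x[n-m] dt + sum_{m1,m2} k2[m1,m2] x[n-m1] x[n-m2] dt^2
    with zero initial conditions (the input is taken as 0 before the record starts)."""
    M = len(k1)                                   # memory extent in samples
    xp = np.concatenate([np.zeros(M - 1), np.asarray(x, float)])
    y = np.empty(len(x))
    for n in range(len(x)):
        epoch = xp[n:n + M][::-1]                 # x[n], x[n-1], ..., x[n-M+1]
        y[n] = k0 + (k1 @ epoch) * dt + (epoch @ k2 @ epoch) * dt**2
    return y
```

With k2 set to zero this reduces to the familiar discrete convolution of k1 with the input, i.e., the linear (first-order) part of the model.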
Note that the Volterra kernels are causal functions (i.e., zero for negative arguments)
and symmetric with respect to their arguments (i.e., invariant to permutations of their ar-
guments). They characterize completely the system dynamics and form a hierarchy of
nonlinear terms (the Volterra functionals) representing system nonlinearity in ascending
order. The order of each kernel corresponds to the multiplicity of distinct values of the in-
put past and present (termed the input "epoch") that partake in forming the system present
output. In practical terms, the kernels ought to be estimated over the finite memory extent
of the system, using input-output data.
The zeroth-order Volterra kernel, k_0, is the value of F (and of the output) when the in-
put is absent (i.e., the input is the null function). For a stationary system, k_0 is constant.
However, in an actual physiological context, the system output will not be generally con-
stant even in the absence of input, due to the effects of a multitude of unobservable factors
that act as noise or systemic interference. In this practical context, the constant ko, select-
ed for a stationary Volterra model, becomes the average value of the output variations due
to this noise/interference (or, in other words, the average of the spontaneous activity of
the system in the absence of any input). Note that k_0 can be expressed explicitly as a func-
tion of time for a nonstationary system/model (along with all other kernels), as discussed
in Chapter 9.
The first-order Volterra kernel, k_1(τ), is akin to the "impulse response function" of lin-
ear system theory, although it does not determine the response to an impulsive input in the
nonlinear context (see Section 2.1.2). It should be viewed as the linear component of the
nonlinear system, describing the pattern by which the system weighs the input past and
present values (termed hereafter the "input epoch") in order to generate the linear portion
of the output signal (i.e., no nonlinear interactions are included).
As an illustrative example, consider the first-order kernel of the fly photoreceptor
model shown in Figure 1.7 (i.e., the impulse response function of the filter L). It depicts
the fact that input values of light intensity impinging on the photoreceptor have maximum
positive impact on the output value of intracellular potential generated at the photorecep-
tor soma about 12 ms later (lag) and maximum negative impact at about 20 ms lag (in a
linear context). The total impact is quantified for all time lags between input and output
by the values of the first-order kernel (in a linear context). The actual impact on the output
depends, of course, on the input epoch and the static nonlinearity N. The linear compo-
nent of the system output can be computed by means of the convolution between the first-
order kernel and any given input signal, as indicated by the first-order Volterra functional.
The first-order kernel provides a complete quantitative description of the linear dynamic
characteristics of the system, and its Fourier transform provides the frequency response
characteristics of the system in a linear context (akin to the "transfer function" or "fre-
quency response function" of linear system theory). Most physiological systems exhibit
linearized behavior for very small input signals; therefore, the first-order kernel usually
yields a good approximation of the entire system output for small input signals.
An illustration of the convolution operation performed by the first-order Volterra func-
tional is given in Figure 2.2, where an arbitrary input signal x(t) is approximated by con-
tiguous rectangular pulses of narrow width Δt. As Δt approaches zero, r(t − t_n) tends to
[k_1(t − t_n) x(t_n) Δt] and the first-order Volterra functional computes for each time t the sum-
mation of all r(t − t_n) for t_n = nΔt between t and t − μ, where μ is the extent of the first-or-
der kernel k_1(τ).
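This limiting argument can be checked numerically. The exponential kernel and the unit-step input below are illustrative assumptions, with the exact convolution 1 − e^{−t} serving as the reference:

```python
import numpy as np

def pulse_superposition(dt, T=5.0):
    """Approximate a unit-step input by contiguous pulses of width dt and
    superpose the scaled pulse responses r(t - t_n) ~ k1(t - t_n) x(t_n) dt."""
    t = np.arange(0.0, T, dt)
    k1 = np.exp(-t)                     # assumed first-order kernel k1(tau) = e^{-tau}
    x = np.ones_like(t)                 # unit-step input
    y = np.convolve(k1, x)[:len(t)] * dt
    return t, y

t, y = pulse_superposition(dt=0.001)
err = np.max(np.abs(y - (1.0 - np.exp(-t))))    # exact convolution is 1 - e^{-t}
print(err)                                      # error shrinks as dt -> 0
```

Halving the pulse width roughly halves the maximum discrepancy, consistent with the Riemann-sum interpretation of the convolution integral.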
The second-order kernel, k_2(τ_1, τ_2), represents the lowest-order nonlinear interactions
(i.e., between two values of the input epoch as they affect the output present value) in the
Volterra modeling framework. It can be viewed as the two-dimensional pattern (a sur-
face) by which the system weighs all possible pairwise product combinations of input
epoch values in order to generate the second-order component of the system output (see
Figure 2.3). For a second-order Volterra system, this can be illustrated by the interaction
term depicted in Figure 2.1, which represents the second-order interaction between
the two unitary impulses at the input and is equal to twice a slice of the second-order ker-
nel of this system (parallel to the diagonal), as shown in Figure 2.4. For higher-order sys-
tems, there are also higher-order interactions between the two input impulses that are rep-
resented by "slices" of higher-order kernels.

[Figure 2.2 diagram: the input x(t) ≈ Σ_n x(nΔt) p(t − nΔt) produces the output y(t) ≈ Σ_n x(nΔt) r(t − nΔt), where r(t − t_n) = ∫_0^μ k_1(τ) p(t − t_n − τ) dτ and t_n = nΔt.]
Figure 2.2 Illustration of the convolution operation performed by the first-order Volterra functional.
The input signal x(t) is approximated by contiguous rectangular pulses p(t − t_n) of narrow width Δt (t_n =
nΔt) and the resulting output signal y(t) is the summation (superposition) of the individual responses
r(t − t_n) over the "memory" extent μ of the first-order kernel k_1(τ). Thus, the output at time t_1 contains
contributions from all input pulses from (t_1 − μ) to t_1.

Figure 2.3 Second-order interaction of input epoch values x(t − τ_1) = x_b and x(t − τ_2) = x_a as they af-
fect the output y(t) in the manner determined by the value k_2(τ_1, τ_2) of the second-order Volterra ker-
nel. Contributions from all possible pairs of input epoch values are integrated to produce the second-
order output component according to Equation (2.5).

As an example from a real second-order Volterra system, consider the aforementioned
model of the fly photoreceptor. Its second-order kernel is shown in Figure 6.9 and depicts
the fact that the maximum second-order effect on the present value of the system output is
from the negative square of the input epoch values at about 12 ms lag. The maximum op-
posite (positive) effect is from the product combination of input values at about 12 ms and
20 ms lag. The entire second-order impact of the input epoch values on the system present
output is quantified by the second-order kernel for all possible pairwise combinations of
input lagged values over the memory extent of the system.
The second-order component of the system output can be computed for any given
input using the double convolution operation indicated by the second-order Volterra func-
tional. The two-dimensional Fourier transform of the second-order kernel provides the
complete bifrequency response characteristics of the system, as discussed in Section
2.1.3. An illustration of the double convolution operation performed by the second-
order Volterra functional is given in Figure 2.3. The operational meaning of the second-
order kernels is discussed further in Sections 2.1.2 and 2.1.3.

Figure 2.4 For a second-order Volterra system, the nonlinear interaction between two unitary im-
pulsive inputs (see Figure 2.1) is equal to twice the "slice" of the second-order Volterra kernel parallel
to the diagonal at τ_1 = t_2 − t_1 or at τ_2 = t_2 − t_1 (see Section 2.1.5).
2.1 VOLTERRA MODELS 37

Higher-order kernels represent the patterns of nonlinear interactions among a number
of input epoch values equal to the order of the kernel. Their operational meaning will be
further discussed in Sections 2.1.2 and 2.1.3. They are seldom obtained in actual applica-
tions (for order higher than second) since their estimation and interpretation becomes a
formidable challenge due to the increasing dimensionality of the argument space. For this
reason, we have developed an alternate approach to modeling higher-order nonlinearities
that facilitates estimation and interpretation (see Section 2.3.3).
Note that the Volterra functional expansion can be defined around any reference func-
tion (in addition to the null reference function for which it was initially defined) as long as
we remain within regions of continuity and convergence. For instance, it is often the case
in practice that a physical input signal cannot attain negative values (e.g., light intensity in
visual stimulation). Then a nonzero (positive) reference level can be used, so that physical
stimuli are delivered as positive and negative deviations from this reference level while
the physical stimulus values themselves remain positive. For stochastic stimuli (e.g., white noise), this reference
level usually represents the mean of the random input signal and its value is greater than
the maximum deviation from the mean.
This reference level can be used as a "computational zero" and the Volterra series ex-
pansion is defined around it. Of course, the obtained Volterra kernels depend on the spe-
cific reference level (in the same manner that the Taylor series coefficients depend on the
reference point of the expansion) and the employed reference level must be explicitly re-
ported when such kernels are published. For a nonzero reference level c, the associated set
of Volterra kernels {k_r^{(c)}} is related to the zero-reference-level set {k_r} through the relation
[Marmarelis & Marmarelis, 1978]

$$k_r^{(c)}(\tau_1, \ldots, \tau_r) = \sum_{j=r}^{\infty} \binom{j}{r} c^{\,j-r} \int_0^{\infty} \cdots \int_0^{\infty} k_j(\tau_1, \ldots, \tau_r, \tau_{r+1}, \ldots, \tau_j)\, d\tau_{r+1} \cdots d\tau_j \tag{2.6}$$
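For a second-order model, Equation (2.6) gives k_0^{(c)} = k_0 + c ∫ k_1 + c² ∫∫ k_2 and k_1^{(c)}(τ) = k_1(τ) + 2c ∫ k_2(τ, τ_2) dτ_2, with k_2 unchanged. The discrete sketch below, using random hypothetical kernels, verifies that the transformed kernels reproduce the response to inputs delivered about the reference level c:

```python
import numpy as np

rng = np.random.default_rng(0)
M, c = 8, 0.5
k0 = 0.3
k1 = rng.normal(size=M)
A = rng.normal(size=(M, M))
k2 = (A + A.T) / 2                       # symmetric second-order kernel

def y2(x, k0, k1, k2):
    """Discrete second-order Volterra model (zero-padded initial conditions)."""
    xp = np.concatenate([np.zeros(M - 1), x])
    out = np.empty(len(x))
    for n in range(len(x)):
        e = xp[n:n + M][::-1]
        out[n] = k0 + k1 @ e + e @ k2 @ e
    return out

# kernels about the reference level c, per Equation (2.6) truncated at order 2
k0c = k0 + c * k1.sum() + c**2 * k2.sum()
k1c = k1 + 2 * c * k2.sum(axis=1)

x = rng.normal(size=50)
lhs = y2(c + x, k0, k1, k2)     # absolute input = reference level + deviation
rhs = y2(x, k0c, k1c, k2)       # deviation input with transformed kernels
# the two agree once the input epoch lies fully inside the record (n >= M-1)
print(np.max(np.abs(lhs[M-1:] - rhs[M-1:])))
```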

Some analytical examples of Volterra models are given below, followed by further
discussion on the operational meaning of the Volterra kernels (Section 2.1.2) and the
frequency-domain representation of Volterra models (Section 2.1.3). Since inductive
modeling of physiological systems (i.e., directly from data) is performed in practice us-
ing sampled data, the mathematical expressions of the Volterra models must be adapted
to discrete-time form, as discussed in Section 2.1.4. The practical estimation of the dis-
crete-time Volterra kernels that are involved in the discretized Volterra models is ini-
tially discussed in Section 2.1.5 and elaborated further in Sections 2.3 and 2.4.

2.1.1 Examples of Volterra Models


To illustrate the use of Volterra models for nonlinear systems and their relation to equiva-
lent parametric models, we use the following four examples.

Example 2.1. Static Nonlinear System


We start with the simplest case of a static nonlinear system:

$$y = f(x) \tag{2.7}$$

If the function f is analytic, then we can use its Taylor series expansion:

$$y = a_0 + a_1 x + a_2 x^2 + \cdots + a_r x^r + \cdots \tag{2.8}$$


38 NONPARAMETRIC MODELING

where a_r = f^{(r)}(0)/r!, with f^{(r)}(0) denoting the rth derivative of f(x) at x = 0. This Taylor
expansion determines the equivalent Volterra series with kernels (r ≥ 0):

$$k_r(\tau_1, \ldots, \tau_r) = a_r\, \delta(\tau_1) \cdots \delta(\tau_r) \tag{2.9}$$

where δ(τ) denotes the Dirac delta (impulse) function.*


If the function f is continuous (but not analytic), then a finite-degree polynomial
approximation of any desired accuracy can be obtained based on the Weierstrass theo-
rem:

$$f(x) \approx \alpha_0 + \alpha_1 x + \cdots + \alpha_Q x^Q \tag{2.10}$$

and the Volterra kernels are given by Equation (2.9) for 0 ≤ r ≤ Q.
The evaluation of the coefficients {α_r} of the Weierstrass approximation can be made
by use of polynomial expansions on bases defined over an expansion interval equal to the
amplitude range of the system input (see Appendix I). This is illustrated in Figure 2.5 for
one analytic and two nonanalytic nonlinearities (one continuous and one discontinuous).
Note that the expansion coefficients of polynomial approximations depend on the selected
expansion interval, unlike the Taylor expansion, which only depends on the derivatives of
the (analytic) function at the reference point.
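The dependence of the coefficients on the expansion interval can be seen with a quick least-squares fit. The nonanalytic function |x| and the interval endpoints below are illustrative choices, not taken from the text:

```python
import numpy as np

# Quadratic least-squares approximation of the continuous, nonanalytic
# function f(x) = |x| over two expansion intervals [-A, A]: the fitted
# coefficients change with A (for |x| they are roughly 3A/16 + (15/(16A)) x^2),
# unlike Taylor coefficients of an analytic function, which are fixed.
for A in (1.0, 4.0):
    x = np.linspace(-A, A, 2001)
    c = np.polynomial.polynomial.polyfit(x, np.abs(x), deg=2)  # [c0, c1, c2]
    print(A, np.round(c, 4))
```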

Example 2.2. L-N Cascade System


We continue with a simple nonlinear dynamic system given by the cascade of a linear fil-
ter followed by a static nonlinearity (like the L-N cascade model of the fly photoreceptor
presented in Section 1.4 and shown in Figure 1.7). The equivalent Volterra model for this
cascade system can be found by considering a polynomial representation of the static
nonlinearity acting on the filter output v(t):
$$y(t) = \sum_{r=0}^{Q} a_r\, v^r(t) \tag{2.11}$$

where Q may be finite or infinite. Then, substitution of v(t), given by Equation (2.12) as the
convolution integral describing the input-output relation for the linear filter:

$$v(t) = \int_0^{\infty} g(\tau)\, x(t-\tau)\, d\tau \tag{2.12}$$

into Equation (2.11) yields the following analytical expression for the rth-order Volterra
kernel of this L-N cascade:

$$k_r(\tau_1, \ldots, \tau_r) = a_r\, g(\tau_1) \cdots g(\tau_r) \tag{2.13}$$

*The Dirac delta function δ(t − t_0) (also known as the impulse function) is defined as zero whenever its argu-
ment is nonzero (i.e., t ≠ t_0) and tends to infinity at t = t_0. The key defining relation is:

$$\int_{t_0-\varepsilon}^{t_0+\varepsilon} f(t)\, \delta(t - t_0)\, dt = f(t_0)$$

for any continuous function f(t) in the ε-neighborhood of t_0.



[Figure 2.5 diagrams: left panel, the analytic function f(x) = exp(−λx) = Σ_{m=0}^∞ (−λ)^m x^m / m!; middle panel, a continuous nonanalytic f(x) with quadratic approximation a_0(A, δ) + a_1(A, δ)x + a_2(A, δ)x²; right panel, a discontinuous f(x) with quadratic approximation b_0(A, δ) + b_1(A, δ)x + b_2(A, δ)x².]
Figure 2.5 An analytic function and its Taylor series expansion valid for all x (exponential in left pan-
el). The second-degree polynomial (quadratic) approximations of the nonanalytic continuous function
(middle panel) and of the discontinuous function (right panel) in the interval [−A, A] are shown with
dotted lines. Note that the coefficients depend on A and δ (see Appendix I).

where g(τ) is the impulse response function of the linear filter L (for detailed derivation see
Section 4.1.2). It is evident that the Volterra kernels of such a cascade system have a very
particular structure that is amenable to interpretation and can be quickly ascertained by vis-
ual inspection; that is, for specific values of j of the τ arguments, the rth-order Volterra kernel
is proportional to the (r − j)th-order Volterra kernel of this system. This is typically ascer-
tained in practice by examining a possible scaling relation between a "slice" of the second-
order kernel estimate (along τ_1 or τ_2) and the first-order kernel estimate. This scaling rela-
tionship is illustrated in Figure 2.6 for the Volterra model of the fly photoreceptor, and it is
only approximate for this real sensory system (under certain light-adapted conditions).
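The slice test is easy to verify on synthetic kernels constructed per Equation (2.13); the filter g and polynomial coefficients below are hypothetical:

```python
import numpy as np

tau = np.arange(50)
g = np.exp(-tau / 10.0) - np.exp(-tau / 5.0)   # assumed impulse response of L
a1, a2 = 2.0, -0.7                             # polynomial coefficients of N

k1 = a1 * g                                    # Equation (2.13), r = 1
k2 = a2 * np.outer(g, g)                       # Equation (2.13), r = 2

# any slice k2(tau1, tau2 = const) is proportional to k1(tau1)
s = k2[:, 3]
ratio = s[k1 != 0] / k1[k1 != 0]
print(np.allclose(ratio, ratio[0]))            # constant ratio: a2 g(3) / a1
```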

Example 2.3. L-N-M "Sandwich" System


Another modular cascade model that has been developed in the study of the visual system
is the so-called "sandwich" model comprised of two linear filters L and M separated by a
static nonlinearity N (see Figure 2.7). This cascade model has been extensively studied
analytically and experimentally [Korenberg, 1973a; Marmarelis & Marmarelis, 1978; Ko-
renberg & Hunter, 1986]. Its Volterra kernels can be derived by combining the results of
the two previous examples and are given by (see Section 4.1.2 for detailed derivation)

$$k_r(\tau_1, \ldots, \tau_r) = a_r \int_0^{\min(\tau_1, \ldots, \tau_r)} h(\lambda)\, g(\tau_1 - \lambda) \cdots g(\tau_r - \lambda)\, d\lambda \tag{2.14}$$

where g(τ) and h(τ) are the impulse response functions of the prior filter L and the poste-
rior filter M, respectively, and {a_r} are the polynomial coefficients of the static nonlinear-
ity N (as before).


Figure 2.6 Comparison of the first-order kernel (A) with a "slice" of the second-order kernel (B) for
the fly photoreceptor [Marmarelis & McCann, 1977].
40 NONPARAMETRIC MODELING

x-1 H H L
v=g®x
N
z = Larvr
M
y =h®z
~Y
Figure 2.7 The L-N-M cascade (sandwich) model, comprised of two linear filters (L and M) and a
static nonlinearity (N) "sandwiched" between them (⊗ denotes convolution).

We observe that the simple scaling relation established in the L-N cascade between
"cuts" of Volterra kernels of various orders (at arbitrary values of τ_i) does not hold for the
L-N-M cascade. However, in this case, the kernel values along any axes are proportional
to each other (as long as they have the same dimensionality), because of the causality of
the filters L and M [Chen et al., 1985]. For instance, if τ_r = 0, then g(τ_r − λ) ≠ 0 only for λ
= 0, and therefore (for sampled kernel values)

$$k_r(\tau_1, \ldots, \tau_{r-1}, 0) = a_r\, h(0)\, g(0)\, g(\tau_1) \cdots g(\tau_{r-1}) \tag{2.15}$$

which is proportional to k_{r+j}(τ_1, ..., τ_{r−1}, 0, ..., 0) = a_{r+j} h(0) [g(0)]^{j+1} g(τ_1) ··· g(τ_{r−1}) for
any j.
L-N-M cascade model, because it requires at least a third-order kernel estimate (i.e., it
cannot be applied between the first-order and the second-order kernels that are typically
estimated in practice). Therefore, an alternative test has been developed for this purpose,
as described in Section 4.1.2 along with further elaboration on the analysis of the L-N-M
cascade model.
One of the initial applications of the L-N-M cascade model was in the vertebrate reti-
na, where it was found suitable for modeling the amacrine cells (see Section 6.1.1).
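A discrete sketch of Equations (2.14)-(2.15), with hypothetical filters g and h, confirms the cross-order proportionality of kernel slices along an axis (here between the second- and third-order kernels):

```python
import numpy as np

tau = np.arange(40)
g = np.exp(-tau / 8.0)                 # prior filter L (assumed)
h = np.exp(-tau / 4.0)                 # posterior filter M (assumed)
a2, a3 = 1.5, -0.4

def gv(i):                             # causal g: zero for negative lags
    return g[i] if 0 <= i < len(g) else 0.0

def k2(t1, t2):                        # discrete analogue of Equation (2.14)
    return a2 * sum(h[l] * gv(t1 - l) * gv(t2 - l)
                    for l in range(min(t1, t2) + 1))

def k3(t1, t2, t3):
    return a3 * sum(h[l] * gv(t1 - l) * gv(t2 - l) * gv(t3 - l)
                    for l in range(min(t1, t2, t3) + 1))

# Equation (2.15): with one argument fixed at 0, only lambda = 0 survives, so
# k2(t, 0) = a2 h(0) g(0) g(t) and k3(t, 0, 0) = a3 h(0) g(0)^2 g(t) are
# proportional to each other: a2 * k3(t,0,0) = a3 * g(0) * k2(t,0)
edge2 = np.array([k2(t, 0) for t in tau])
edge3 = np.array([k3(t, 0, 0) for t in tau])
print(np.allclose(a2 * edge3, a3 * g[0] * edge2))
```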

Example 2.4. Riccati System


Consider now the simple parametric model presented as Example 1.3 in Section 1.4,
where the relation between the input x(t) and the output y(t) can be described by the first-
order nonlinear differential equation

$$\frac{dy}{dt} + a\,y + b\,y^2 = c\,x \tag{2.16}$$

In the absence of any input x, Equation (2.16) reduces to a Bernoulli equation, whose
integration yields (for nonzero initial conditions) the sigmoidal "logistic curve" encountered
frequently in biology to describe saturation of growth processes. This equation was also
used by Verhulst to describe the population dynamics of a species with internal
competition.
With a forcing function on the right-hand side (represented by the input x), Equation
(2.16) becomes the Riccati equation and has been used to describe nonlinear kinetics,
where the kinetic constant depends linearly on the output (i.e., it is equal to a + by). It
constitutes a parametric model with three parameters (a, b, c) of a system that is nonlinear
(because of the y² term), stationary (because all coefficients are time-invariant), and dy-
namic (because of the derivative term).

The equivalent nonparametric Volterra model can be obtained by use of the "general-
ized harmonic balance method" presented in Section 3.2. An infinite-order Volterra mod-
el (Volterra series) is required for complete representation of this system. The first two
Volterra kernels are derived to be
$$k_1(\tau) = c\, e^{-a\tau}\, u(\tau) \tag{2.17}$$

$$k_2(\tau_1, \tau_2) = -\frac{bc^2}{a}\, e^{-a(\tau_1+\tau_2)} \left[1 - e^{a\,\min(\tau_1,\tau_2)}\right] u(\tau_1)\, u(\tau_2) \tag{2.18}$$

where u(τ) denotes the step function (i.e., zero for τ < 0 and unity elsewhere). The higher-
order Volterra kernels have more complicated expressions and are omitted in the interest
of space. Note, however, that the rth-order kernel is proportional to b^{r−1} and, thus, terms
higher than second order are negligible if |b| is very small.
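This hierarchy can be checked by simulation. In the sketch below (all numerical values are illustrative assumptions), the Riccati equation is integrated by forward Euler and compared with truncated Volterra predictions; the second-order component is computed through the perturbation recursion y_2 = e^{−aτ} ∗ (−b y_1²), which is equivalent to the double convolution with the kernel of Equation (2.18):

```python
import numpy as np

a, b, c = 1.0, 0.5, 1.0
dt, T = 5e-4, 6.0
t = np.arange(0.0, T, dt)
x = 0.5 * np.sin(2 * np.pi * 0.5 * t)            # small, slow test input

# forward-Euler integration of dy/dt = c x - a y - b y^2 (zero initial state)
y = np.zeros_like(t)
for n in range(len(t) - 1):
    y[n + 1] = y[n] + dt * (c * x[n] - a * y[n] - b * y[n] ** 2)

e = np.exp(-a * t)
y1 = np.convolve(c * e, x)[:len(t)] * dt         # first-order part, k1 = c e^{-a tau}
y2 = np.convolve(e, -b * y1 ** 2)[:len(t)] * dt  # second-order part

err1 = np.max(np.abs(y - y1))                    # first-order model error
err2 = np.max(np.abs(y - (y1 + y2)))             # second-order model error
print(err1, err2)                                # err2 should be much smaller
```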

2.1.2 Operational Meaning of the Volterra Kernels


In this section, we seek to provide an operational meaning to the Volterra kernels, so that
they do not remain abstract mathematical objects but become useful instruments for en-
hancing our understanding of the functional properties of physiological systems.
Let us begin by pointing out that the zeroth-order kernel is a simple reference constant
(for stationary systems), representing the output value when no input is applied to the sys-
tem. The first-order Volterra functional is the well-known convolution integral that repre-
sents the input-output relation of linear time-invariant (LTI) systems. In this convolution,
the first-order kernel plays the role of the "impulse response function" for LTI systems,
which is the system response (output) to an impulsive input for LTI systems but not for
nonlinear systems. As will be seen below, the response of a nonlinear system to an impul-
sive input involves additionally the diagonal values of all higher-order kernels. The first-
order kernel represents the pattern by which the system weights the input epoch values
(past and present) to generate the present value of the system output through linear super-
position (weighted summation in discrete time or integration in continuous time). An il-
lustration of this is presented in Figure 2.2.
The nonlinear behavior is represented by the multiple convolution integrals of the
Volterra functionals of second order and higher. The rth-order Volterra functional

$$V_r[x(t)] = \int_0^{\infty} \cdots \int_0^{\infty} k_r(\tau_1, \ldots, \tau_r)\, x(t-\tau_1) \cdots x(t-\tau_r)\, d\tau_1 \cdots d\tau_r \tag{2.19}$$

is an r-tuple convolution of r time-shifted versions of the input signal with an r-dimen-
sional function k_r, termed the rth-order Volterra kernel. The latter describes the weighting
pattern of rth-order nonlinear interactions that the system uses in order to generate the
present value of the system output through integration of all product combinations of r in-
put epoch values.
An illustration of this is given in Figure 2.3 for the second-order kernel. Each value of
this kernel (depicted as a surface), for instance k_2(λ_1, λ_2) at lags λ_1 and λ_2, represents the
weight (i.e., the relative importance) of the product x(t − λ_1) x(t − λ_2) in constructing the
system output y(t). All such weighted product combinations are integrated in order to con-
struct the second-order nonlinear component of the system output described by the sec-
ond-order Volterra functional. A large positive value of k_2(λ_1, λ_2) implies strong mutual
facilitation of the input-lagged values x(t − λ_1) and x(t − λ_2) in the way they affect the sys-
tem output y(t). Note that this often occurs at the diagonal points (λ_1 = λ_2), as in the case
of the retinal ganglion cell whose second-order kernel peaks around λ_1 = λ_2 = 35 ms, as
shown in Figure 1.5. Conversely, a negative value of k_2(λ_1, λ_2) implies mutual inhibition
between the input-lagged values x(t − λ_1) and x(t − λ_2) in the way they affect the system
output y(t); see, for instance, the negative "trough" along the diagonal of the second-or-
der kernel of the ganglion cell between a lag of 80 ms and 160 ms (shown in Figure 1.5).
Small kernel values imply that the corresponding combinations of input-lagged values do
not affect significantly the present output value.
It is critical to note that the Volterra kernel values are fixed (for a given stationary
system) and represent characteristic system signatures (i.e., they are unique in the
Volterra context). Therefore, they can be used for unambiguous classification of nonlin-
ear physiological systems and hold great promise for improved diagnostic purposes. The
Volterra kernels of a given system are also complete descriptors of the system nonlinear
dynamics of the corresponding order (i.e., for second-order interactions, there is nothing
left out of the second-order Volterra kernel pertaining to the manner in which pairs of
input-lagged values affect the output of the system). Therefore, the Volterra kernels con-
tain complete and reliable information regarding the system function at the respective
order of nonlinear interactions (nothing missing, nothing spurious) and offer the ulti-
mate tools for proper understanding of physiological function, upon successful interpre-
tation.
Thus, the Volterra kernels of the system form a hierarchy of system nonlinearities (ac-
cording to multiplicity rank of input interactions) and constitute a canonical representa-
tion of the system nonlinear dynamics. They are the complete and reliable descriptors of
the system function (i.e., allow accurate prediction of the system output for any given in-
put). Their estimation from input-output data is the objective of the system identification
task in the nonparametric context. Methodologies to this purpose are discussed in Sec-
tions 2.1.5 and 2.3.
The elegant and insightful hierarchical organization of the Volterra functionals is
clearly depicted in Example 2.2, where the rth-degree term of the polynomial static non-
linearity gives rise to the rth-order Volterra functional (involving the associated rth-order
Volterra kernel). An instructive look into the operational meaning of the rth-order Volter-
ra functional (and kernel) is provided by the use of sinusoidal and impulsive inputs, as
discussed below.

Impulsive Inputs. For an impulsive input x(t) = Aδ(t), the output of the Volterra model
is

$$y(t) = k_0 + A\,k_1(t) + A^2 k_2(t, t) + \cdots + A^r k_r(t, \ldots, t) + \cdots \tag{2.20}$$

that is, it involves all the main diagonal values of all kernels in the system and forms a
power series (or polynomial, if the model order is finite) in terms of the amplitude A of
the impulsive input. This fact draws a clear distinction between the impulse response of
a nonlinear system (which contains the main diagonal values of all kernels) and the im-
pulse response of a linear system (which corresponds to the first-order Volterra kernel).

Thus, we must avoid the use of the term "impulse response function" to denote the first-
order kernel of a nonlinear system, since this is clearly misleading.
Let us now consider a pair of impulses in the input: x(t) = A5(t) + B8(t - to). Then the
Volterra model output is

y(t) = ko + Ak1(t) + Bk 1(t - to)


+ A 2k2(t, t) + B 2k2(t - to, t - to) + 2ABk2(t, t - to) + A 3k3(t, t, t)
+ B 3k3(t - to, t - to, t - to) + 3A 2Bk3(t, t, t - to) + 3AB 2k3(t, t - to, t - to) + ... (2.21)

where the first three terms are the first-order ("linear") component of the output, the fol-
lowing three terms are the second-order component of the output, the following four
terms are the third-order component of the output, and so on. Note that the expression of
Equation (2.21) is condensed by using the fact that the Volterra kernels are symmetric
about the diagonals (i.e., invariant for any permutation of their arguments).
Clearly, for the pair-of-impulses input, the high-order kernels generate interaction
terms in the output (e.g., the term 2AB k_2(t, t − t_0) for second-order interaction) that repre-
sent the nonlinear "cross talk" between the two impulses in generating the output. This is
the effect of "dynamic nonlinearity" that is manifested in the output as interactions of in-
put values at different (lagged) points in time [e.g., at time-points t and (t − t_0) in this ex-
ample]. This effect of "dynamic nonlinearity" can be contrasted to the "amplitude nonlin-
earity" effect manifested in this example by the terms involving only the main diagonal
values of the kernels and the powers of only A or only B. Note that both effects spread
over time in a manner determined by the kernel values, as illustrated in Figure 2.1.
It is evident that if the output components due to each impulse (applied separately) are
subtracted from the pair-of-impulses output (the test for the superposition principle), then
the residual corresponds strictly to the "dynamic nonlinearity" terms and depends on the
off-diagonal values of the kernels. This is further elaborated in Section 2.1.5, in connec-
tion with the issue of Volterra kernel estimation using multiple impulses as experimental
test inputs.
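This superposition test can be sketched for a discrete second-order Volterra system with hypothetical kernels: the residual left after subtracting the two single-impulse responses (and restoring the doubly subtracted k_0) is exactly the cross-term 2AB k_2(t, t − t_0) of Equation (2.21):

```python
import numpy as np

M = 20
tau = np.arange(M)
g = np.exp(-tau / 5.0)
k0 = 0.1
k1 = 1.2 * g
k2 = -0.5 * np.outer(g, g)              # symmetric second-order kernel (assumed)

def respond(x):
    """Discrete second-order Volterra response with zero initial conditions."""
    xp = np.concatenate([np.zeros(M - 1), x])
    y = np.empty(len(x))
    for n in range(len(x)):
        e = xp[n:n + M][::-1]
        y[n] = k0 + k1 @ e + e @ k2 @ e
    return y

A, B, t0, N = 2.0, -1.5, 4, 40
xA = np.zeros(N); xA[0] = A             # impulse of amplitude A at time 0
xB = np.zeros(N); xB[t0] = B            # impulse of amplitude B at time t0

residual = respond(xA + xB) - respond(xA) - respond(xB) + k0
cross = np.array([2 * A * B * k2[n, n - t0] if t0 <= n < M else 0.0
                  for n in range(N)])
print(np.allclose(residual, cross))
```

Note that the residual involves only off-diagonal kernel values, consistent with the discussion above.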
It is also evident from Equation (2.21) that "amplitude nonlinearities" cannot be de-
tected with the pair-of-impulses test, unless the amplitude of the impulses is varied. The
same is true for any sequence of impulses (including random or pseudorandom se-
quences) that maintains fixed magnitude. This implies that the main diagonal values of the
kernels cannot be estimated by use of impulses with fixed magnitude (although the re-
maining kernel values can be estimated), a fact that has important implications in model-
ing studies of neuronal systems with spike-train inputs (as elaborated in Chapter 8). This
limitation can be overcome, of course, if the impulses are allowed to vary in amplitude or
a judicious form of interpolation can be used.
This type of analysis can be extended to three or more impulses and can yield Volterra
kernel estimates for finite-order systems, as discussed for a limited experimental context in
Section 2.1.5, or be extended to arbitrary impulse sequences in a general methodological
context germane to neuronal systems, as discussed in Chapter 8. Due to its relative sim-
plicity, the study of the system response to impulsive stimuli can provide useful prelimi-
nary information about the system memory and dynamics, as discussed in Section 5.2.

Sinusoidal Inputs. For a sinusoidal input of frequency ω_0, the rth-order Volterra func-
tional V_r generates an rth harmonic [i.e., a sinusoidal signal of frequency (r·ω_0)] and low-
er harmonics of the same parity (i.e., odd or even), namely the harmonics r, (r − 2), (r −
4), ..., (r − 2q), where q is the integer part of r/2, as can be shown by use of trigonomet-
ric formulae [Bedrosian & Rice, 1971]. The amplitude and phase of these harmonics are
determined by the values of the rth-order Volterra kernel k_r. This is demonstrated below
for the second-order case.
For a sinusoidal input x(t) = cos ω_0 t, the second-order Volterra functional is

$$V_2[x(t)] = \int_0^{\infty}\!\!\int_0^{\infty} k_2(\tau_1, \tau_2) \cos \omega_0(t - \tau_1) \cos \omega_0(t - \tau_2)\, d\tau_1\, d\tau_2$$
$$= \frac{1}{2} \int\!\!\int k_2(\tau_1, \tau_2) \cos \omega_0(\tau_1 - \tau_2)\, d\tau_1\, d\tau_2 + \frac{1}{2} \int\!\!\int k_2(\tau_1, \tau_2) \cos \omega_0(2t - \tau_1 - \tau_2)\, d\tau_1\, d\tau_2 \tag{2.22}$$

Clearly, the first term is constant over time (zeroth harmonic) and the second term yields
the second harmonic:

$$\frac{1}{2} \cos 2\omega_0 t \int\!\!\int k_2(\tau_1, \tau_2) \cos \omega_0(\tau_1 + \tau_2)\, d\tau_1\, d\tau_2 + \frac{1}{2} \sin 2\omega_0 t \int\!\!\int k_2(\tau_1, \tau_2) \sin \omega_0(\tau_1 + \tau_2)\, d\tau_1\, d\tau_2$$
$$= \frac{1}{2} \operatorname{Re}\{K_2(\omega_0, \omega_0)\} \cos 2\omega_0 t + \frac{1}{2} \operatorname{Im}\{K_2(\omega_0, \omega_0)\} \sin 2\omega_0 t \tag{2.23}$$

where K_2(ω_1, ω_2) is the two-dimensional Fourier transform of the second-order Volterra
kernel. It is evident from Equation (2.23) that the amplitude and the phase of the second
harmonic depend on the value of the two-dimensional Fourier transform (2D-FT) of the
second-order Volterra kernel at the bifrequency (ω_0, ω_0). It can be further shown that, if
two sinusoidal frequencies (ω_1, ω_2) are used at the input, x_2(t) = cos ω_1 t + cos ω_2 t, then
the second-order Volterra functional will generate sinusoidal components at frequencies
(2ω_1), (2ω_2), (ω_1 + ω_2), and (ω_1 − ω_2), in addition to constant terms, indicating that the
complex values of K_2(ω_i, ±ω_j), where i, j = 1, 2, determine the second-order response of
the system. This result can be generalized for any number M of sinusoids in the input sig-
nal by letting the indices i and j take all integer values from 1 to M. Specifically,

$$V_2[x_2(t)] = \text{const.} + \frac{1}{2} \sum_{i=1}^{M} \sum_{j=1}^{M} \left[ \operatorname{Re}\{K_2(\omega_i, \pm\omega_j)\} \cos(\omega_i \pm \omega_j)t + \operatorname{Im}\{K_2(\omega_i, \pm\omega_j)\} \sin(\omega_i \pm \omega_j)t \right] \tag{2.24}$$

This expression (2.24) governs the second-order response of any Volterra system to an arbitrary input waveform expressed in terms of its Fourier decomposition. Thus, second-order nonlinear interactions in the frequency domain (intermodulation effects) involve all possible pair combinations (ωᵢ ± ωⱼ) of sinusoidal components of the input signal, weighted by the values of the 2D-FT of the second-order kernel at the respective bifrequency points (ωᵢ, ±ωⱼ).
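The intermodulation frequencies predicted by Equation (2.24) can be checked numerically. The following sketch (illustrative only, not from the text) drives the simplest second-order Volterra system, the memoryless squarer y(t) = x²(t), with two sinusoids and confirms that output power appears only at DC and at the pair combinations 2f₁, 2f₂, f₁ + f₂, and f₂ − f₁; the sampling rate and test frequencies are arbitrary choices.

```python
import numpy as np

# Memoryless quadratic system y = x^2, i.e., the second-order Volterra
# system with k2(t1, t2) = delta(t1)delta(t2). Two input sinusoids should
# produce output power only at the intermodulation frequencies of
# Equation (2.24): DC, 2*f1, 2*f2, f1 + f2, f2 - f1.
fs = 1000.0                      # sampling rate (Hz), arbitrary choice
t = np.arange(0, 1.0, 1.0 / fs)  # 1-second record -> 1 Hz FFT resolution
f1, f2 = 40.0, 95.0              # hypothetical test frequencies
x = np.cos(2 * np.pi * f1 * t) + np.cos(2 * np.pi * f2 * t)
y = x ** 2                       # second-order (quadratic) response

spectrum = np.abs(np.fft.rfft(y)) / len(y)
freqs = np.fft.rfftfreq(len(y), d=1.0 / fs)
peaks = freqs[spectrum > 0.1]    # frequencies carrying significant power
print(sorted(peaks.tolist()))    # -> [0.0, 55.0, 80.0, 135.0, 190.0]
```

The peak list matches 2f₁ = 80, 2f₂ = 190, f₁ + f₂ = 135, f₂ − f₁ = 55, plus the DC term, exactly the (ωᵢ ± ωⱼ) combinations described above.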
Following the same line of analysis, we can show that the rth-order Volterra functional generates output components at all frequencies (ω_{i1} ± ω_{i2} ± ··· ± ω_{ir}), weighted by the respective values of Kᵣ(ω_{i1}, ±ω_{i2}, ..., ±ω_{ir}), where the indices i₁ through iᵣ take all integer values from 1 to M. The frequency response characteristics of the Volterra functionals
2.1 VOLTERRA MODELS 45

are discussed more broadly in the following section dealing with frequency-domain representations of the Volterra models.

Remarks on the Meaning of Volterra Kernels. From the foregoing discussion, it is evident that the Volterra kernels (of any order) can be viewed as the multidimensional weighing patterns by which the system weighs all product combinations of input-lagged values or sum/difference combinations of multisinusoid input frequencies in order to produce the system output through weighted integration (or summation, in discrete time). These patterns of nonlinear interactions among different values of the input signal (as they are encapsulated by the system kernels) allow prediction of the system output to any given input and constitute a complete representation of the system functional properties, as well as characteristic "signatures" of the system function. As such, they can be used to simulate and analyze the functional properties of the system, as well as to characterize it for classification or diagnostic purposes.
The far-reaching implications for physiology and medicine are evident (physiological understanding, hypothesis testing, clinical diagnosis and monitoring, closed-loop treatment, therapy assessment, design of prosthetics and implants, tissue characterization, physiological control and regulation, etc.), if we can only harness this modeling power in an experimental and clinical context.
It should be emphasized that the Volterra kernel representation is not an ad hoc scheme based on intuition or serendipitous inspiration but a complete, rigorous, canonical representation of the system functional properties that possesses the requisite credibility and reliability for critical, life-affecting applications.

2.1.3 Frequency-Domain Representation of the Volterra Models

The useful insight gained by frequency-domain analysis provides the motivation for studying the Volterra models in the frequency domain. This is accomplished with the use of multidimensional Fourier transforms for the high-order kernels. It has been found that the Volterra series can be expressed in the frequency domain as [Brillinger, 1970; Rugh, 1981]

$$Y(\omega) = 2\pi k_0 \delta(\omega) + K_1(\omega)X(\omega) + \frac{1}{2\pi}\int_{-\infty}^{\infty} K_2(u, \omega - u)X(u)X(\omega - u)\,du + \cdots$$
$$\cdots + \frac{1}{(2\pi)^{r-1}}\int\!\!\cdots\!\!\int_{-\infty}^{\infty} K_r(u_1, \ldots, u_{r-1}, \omega - u_1 - \cdots - u_{r-1})\,X(u_1)\cdots X(u_{r-1})\,X(\omega - u_1 - \cdots - u_{r-1})\,du_1 \cdots du_{r-1} + \cdots \qquad (2.25)$$

for deterministic inputs and kernels that have proper Fourier transforms. The latter are guaranteed because the kernels must satisfy the absolute integrability condition (Dirichlet condition) for purposes of Volterra series convergence and system stability (asymptotic finite-memory requirement), as indicated by Equation (2.4). Although certain input signals may not have proper Fourier transforms (e.g., stationary random signals such as white noise), the use of finite data records in practice makes this mathematical issue moot. Note that ω and uᵢ denote frequency in rad/sec, giving rise to the powers of (2π) scaling terms in Equation (2.25). If frequency is measured in Hz, then these scaling factors are eliminated.
46 NONPARAMETRIC MODELING

Equation (2.25) indicates that for a generalized sinusoidal input x(t) = Ae^{jω₀t}, the rth-order Volterra functional generates at the system output the rth harmonic:

$$Y_r(\omega) = 2\pi A^r K_r(\omega_0, \omega_0, \ldots, \omega_0)\,\delta(\omega - r\omega_0) \qquad (2.26)$$

since X(uᵢ) = 2πAδ(uᵢ − ω₀) in this case. Note that no lower harmonics are generated here because of the complex analytic form (phasor) of the generalized sinusoidal input that simplifies the mathematical expressions. However, in practice, the input is not complex analytic and the resulting output components include lower harmonics of the same parity (odd or even). For instance, the fifth-order Volterra functional will give rise to a first, third, and fifth harmonic. This odd/even separation of the Volterra functionals can be used in practice to gain additional insight into the possible odd/even symmetries of the system nonlinearity.
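As a quick numerical illustration of this parity property (a sketch with arbitrary choices, not from the text), a memoryless cubic nonlinearity, the simplest third-order Volterra system, driven by a real sinusoid produces only the first and third harmonics, since cos³a = (3/4)cos a + (1/4)cos 3a:

```python
import numpy as np

# A real sinusoid through y = x^3 (third-order, odd-parity system):
# the output spectrum should contain only the first and third harmonics.
fs, f0 = 1000.0, 50.0                        # hypothetical rates (Hz)
t = np.arange(0, 1.0, 1.0 / fs)              # 1-second record
y = np.cos(2 * np.pi * f0 * t) ** 3          # third-order response
spectrum = np.abs(np.fft.rfft(y)) / len(y)
freqs = np.fft.rfftfreq(len(y), d=1.0 / fs)
print(freqs[spectrum > 0.05].tolist())       # -> [50.0, 150.0]
```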
If we consider an input comprised of a pair of complex analytic sinusoids x₂(t) = Ae^{jω₁t} + Be^{jω₂t}, then X₂(ω) = 2π[Aδ(ω − ω₁) + Bδ(ω − ω₂)], and intermodulation terms are generated by the Volterra functionals due to nonlinear interactions. For instance, the second-order functional contributes the following three terms to the system output in the frequency domain:

$$Y_2(\omega) = 2\pi A^2 K_2(\omega_1, \omega_1)\delta(\omega - 2\omega_1) + 2\pi B^2 K_2(\omega_2, \omega_2)\delta(\omega - 2\omega_2) + 4\pi A B\, K_2(\omega_1, \omega_2)\delta(\omega - \omega_1 - \omega_2) \qquad (2.27)$$

that represent second harmonics at frequencies (2ω₁) and (2ω₂), as well as an intermodulation term at frequency (ω₁ + ω₂). In the time domain, the second-order Volterra functional for this input is

$$V_2[x_2(t)] = A^2 K_2(\omega_1, \omega_1)e^{j2\omega_1 t} + B^2 K_2(\omega_2, \omega_2)e^{j2\omega_2 t} + 2AB\, K_2(\omega_1, \omega_2)e^{j(\omega_1+\omega_2)t} \qquad (2.28)$$

The resulting second-order output component has three generalized sinusoidal terms at the frequencies (2ω₁), (2ω₂), and (ω₁ + ω₂), with amplitudes and phases determined by the values of K₂ at the respective frequencies, as illustrated in Figure 2.8 for a second-order kernel from renal autoregulation.
The expressions for inputs with multiple sinusoids and higher-order functionals are, of course, more complicated. For an input with M complex analytic sinusoids,

$$x_M(t) = A_1 e^{j\omega_1 t} + \cdots + A_M e^{j\omega_M t} \qquad (2.29)$$

the rth-order Volterra functional contributes in the frequency domain the rth-order output component:

$$Y_r(\omega) = 2\pi \sum_{m_1=1}^{M} \cdots \sum_{m_r=1}^{M} A_{m_1}\cdots A_{m_r}\, K_r(\omega_{m_1}, \ldots, \omega_{m_r})\,\delta(\omega - \omega_{m_1} - \cdots - \omega_{m_r}) \qquad (2.30)$$

which yields in the time domain complex analytic sinusoidal components at all possible sums of r frequencies (with repetitions) from the M frequencies present in the input signal:

$$V_r[x_M(t)] = \sum_{m_1=1}^{M} \cdots \sum_{m_r=1}^{M} A_{m_1}\cdots A_{m_r}\, K_r(\omega_{m_1}, \ldots, \omega_{m_r})\, e^{j(\omega_{m_1}+\cdots+\omega_{m_r})t} \qquad (2.31)$$

Figure 2.8 The magnitude (left) of the two-dimensional Fourier transform of the second-order kernel shown in the right panel (from renal autoregulation). For a given pair of stimulation frequencies (ω₁, ω₂), the resulting second-order output component has three terms at frequencies (2ω₁), (2ω₂), and (ω₁ + ω₂), as indicated in Equation (2.28). The corresponding magnitudes depend on the values of K₂(ω₁, ω₁), K₂(ω₂, ω₂), and K₂(ω₁, ω₂), as marked in the figure with solid circles.

The main point is that the harmonics and intermodulation terms generated by the system nonlinearities can be predicted explicitly and accounted for quantitatively by the system kernels.
Equation (2.25) also shows that a broadband input may lead to an output with even broader bandwidth, depending on the spectral characteristics of the kernels. For instance, a static nonlinearity will generally lead to broadening of the output bandwidth (relative to the input bandwidth), but this may not happen for certain types of kernels. Another interesting possibility raised by Equation (2.25) is that the various orders of kernels of the same system may have different bandwidths and, therefore, different nonlinearities may be activated by higher or lower input frequencies.

2.1.4 Discrete-Time Volterra Models

In actual applications, the input-output signals are sampled at a fixed sampling rate that must exceed the bandwidths of both signals (Nyquist frequency). These sampled data constitute discrete-time signals, also referred to as time-series data. These discrete-time signals are used to model the system in practice and to estimate the requisite Volterra models in discrete form, as discussed in Sections 2.1.5 and 2.3. This gives rise to the discrete-time Volterra model of the form

$$y(n) = k_0 + T\sum_{m} k_1(m)x(n-m) + T^2 \sum_{m_1}\sum_{m_2} k_2(m_1, m_2)x(n-m_1)x(n-m_2) + \cdots \qquad (2.32)$$

where n represents the discrete-time index (n = t/T), m denotes the discrete-time lag (m = τ/T), and T is the sampling interval. The discretized kernels k₁(m), k₂(m₁, m₂), ... are sampled versions of the true continuous-time Volterra kernels of the system. Thus, the

discretization of the Volterra model is straightforward as long as T is sufficiently small relative to the bandwidth B_s of the system [i.e., T ≤ 1/(2B_s)]. A note of caution is in order regarding the selection of T when the bandwidth of the input signal is narrower than the system bandwidth. The reader should be alerted that kernel estimation is not possible in this case (without aliasing) if T is selected on the basis of the input Nyquist frequency. Therefore, T must be selected according to the maximum bandwidth of the system and of the input-output signals.
It is evident from Equation (2.32) that the discrete-time Volterra series expansion depends on the sampling interval T. Since the latter is not usually incorporated into the discrete Volterra model used for kernel estimation, the resulting discrete kernel estimates are scaled by a power of T equal to the order of the kernel. In order to remove the dependence of the estimated discrete kernel values on T, we must either include T explicitly in the discrete Volterra model, as shown in Equation (2.32), or divide the obtained kernel values by Tʳ, where r is the order of the kernel. We adopt the latter convention, which is also important in order to maintain the proper physical units for the estimated kernel values. Under this convention, the physical units for the rth-order kernel remain what they ought to be: (output units)/[(input units)ʳ × (time units)ʳ]. Note, however, that under this convention the estimated values of the discrete Volterra kernels must be divided by Tʳ in order to retain their physical units.
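A minimal sketch of the discrete-time model of Equation (2.32), truncated to second order, may clarify the bookkeeping; the kernel values, memory, and input below are hypothetical, chosen only to exercise the formula.

```python
import numpy as np

# Sketch of the second-order truncation of Equation (2.32):
# y(n) = k0 + T * sum_m k1(m) x(n-m)
#           + T^2 * sum_{m1, m2} k2(m1, m2) x(n-m1) x(n-m2)
def volterra2_predict(x, k0, k1, k2, T):
    M = len(k1)                                  # kernel memory in lags
    xp = np.concatenate([np.zeros(M - 1), x])    # zero initial conditions
    y = np.empty(len(x))
    for n in range(len(x)):
        w = xp[n + M - 1::-1][:M]                # x(n), x(n-1), ..., x(n-M+1)
        y[n] = k0 + T * (k1 @ w) + T**2 * (w @ k2 @ w)
    return y

# Hypothetical kernels: exponential first-order, separable second-order
T = 0.1
m = np.arange(5)
k1 = np.exp(-m)
k2 = 0.2 * np.outer(k1, k1)
x = np.random.default_rng(0).standard_normal(50)
y = volterra2_predict(x, k0=0.5, k1=k1, k2=k2, T=T)
```

For a unit impulse input, the formula reduces to y(n) = k₀ + T k₁(n) + T² k₂(n, n), which is a handy sanity check on any implementation.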
An important practical attribute of the discrete Volterra kernels is the number of samples (lags) along each dimension of the kernel that are required in order to represent properly the kernel in discrete time. This number M is determined as the ratio of the effective kernel memory μ (i.e., the domain of τ over which the kernel has significant values) to the sampling interval T. Since the latter may not exceed the inverse of twice the system bandwidth B_s (in Hz) to avoid aliasing, we conclude that the minimum M is equal to twice the memory-bandwidth product of the system. In general,

$$M \geq 2 B_s \mu \qquad (2.33)$$

where μ is measured in sec and B_s in Hz. The parameter M is critical in practice because it determines the number of unknowns that need be estimated when the kernel is represented by its discrete-time samples. This number increases geometrically with the order r of the kernel (i.e., as Mʳ). Therefore, high-order kernels with large memory-bandwidth product are difficult to estimate. This problem is mitigated by the introduction of appropriate kernel expansions that reduce the number of unknowns to be estimated, as discussed in Section 2.3.
The equivalence between discrete Volterra models (kernels) and discretized parametric models of differential equations (i.e., nonlinear difference equations) is discussed in Section 3.3. In general, discrete Volterra kernels of difference-equation models can be derived analytically using the "generalized harmonic balance" method. This provides a methodological bridge between parametric and nonparametric models in discrete time. However, this methodology tends to be rather cumbersome in the general case and may prove practical only in a limited number of applications; this issue deserves further investigation, since it has received only a rudimentary treatment to date.
The frequency-domain analysis of the discrete Volterra models/kernels employs the discrete Fourier transform (DFT) or its computationally efficient version, the fast Fourier transform (FFT). The DFT or FFT of the discrete Volterra kernels are approximations of the kernel Fourier transforms discussed in Section 2.1.3, within the frequency resolution defined by the fundamental frequency of the DFT/FFT (i.e., the inverse of the kernel length). The minimum frequency resolution is established by the kernel memory, but higher resolution can be had by increasing the length of the kernel estimate with zero padding (i.e., decreasing the fundamental frequency of the DFT/FFT), if so desired. A parsimonious approach is always recommended, due to the dimensionality of high-order kernels (e.g., a doubling of frequency resolution quadruples the number of points in the 2D-FFT of the second-order kernel).
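The zero-padding trade-off can be illustrated with a small sketch (hypothetical random kernel): padding a discrete second-order kernel to twice its length before the 2D-FFT halves the frequency spacing but quadruples the number of bifrequency points to store, while the coarse-grid values remain a subset of the fine grid.

```python
import numpy as np

# Doubling the padded length of a (hypothetical) second-order kernel
# doubles the frequency resolution of its 2D-FFT but quadruples the
# number of bifrequency points; every other fine-grid bin reproduces
# the coarse-grid value.
M = 32
k2 = np.random.default_rng(1).standard_normal((M, M))
K2_coarse = np.fft.fft2(k2, s=(M, M))        # resolution 1/(M*T)
K2_fine = np.fft.fft2(k2, s=(2 * M, 2 * M))  # resolution 1/(2*M*T)
print(K2_coarse.size, K2_fine.size)          # -> 1024 4096
```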
For analytical manipulations in discrete time, the z-transform can be used as indicated in Sections 3.3-3.5. Note that the primary utility of the z-transform is in analytically solving difference equations.

2.1.5 Estimation of Volterra Kernels

From the point of view of system modeling, the critical issue is the practical and accurate estimation of the discrete-time Volterra kernels from input-output experimental data. It is evident from Equation (2.32) that the unknown sampled-kernel values enter linearly in this estimation problem (although the model is nonlinear in terms of the input-output relation), thus facilitating the solution of the set of linear equations written in matrix form as

$$\mathbf{y} = \mathbf{X}\mathbf{k} + \boldsymbol{\varepsilon} \qquad (2.34)$$

where y' = [y(1), y(2), ..., y(N)] is the output data vector (with prime denoting "transpose"); k' = [k₀, Tk₁(0), Tk₁(1), ..., Tk₁(M), T²k₂(0, 0), 2T²k₂(1, 0), T²k₂(1, 1), 2T²k₂(2, 0), 2T²k₂(2, 1), T²k₂(2, 2), ..., 2T²k₂(M, M − 1), T²k₂(M, M)] is the vector of unknown kernel values (to be estimated) for a second-order discrete-time Volterra model with memory-bandwidth product M (μ = MT); X is the input data matrix constructed according to Equation (2.32) using the above definition of the kernel vector k that takes into account the kernel symmetries; and ε is the error vector [ε(1), ε(2), ..., ε(N)]' defined from Equation (2.32) for each discrete time n as the difference between the model-predicted and the measured output value (the error terms are also called "residuals"). Note that the estimated discrete kernel values are scaled by constants lTʳ dependent on the specific kernel order r and point location, where l accounts for the intrinsic kernel symmetries (e.g., l = r! for all off-diagonal points but l = 1 for full-diagonal points). As discussed in the previous section, the estimated discrete kernel values depend on the sampling interval T, which is determined by the system bandwidth and the sampling requirements of the input-output signals. The size P of the unknown kernel vector k (which determines the computational burden of this estimation problem) depends on the system memory-bandwidth product M and on the nonlinear order of the system Q = max{r}, roughly increasing as M^Q.
There are two confounding issues in practice. The first issue concerns the rapid increase of the number of discrete kernel values that need be estimated as the model order and/or the system memory-bandwidth product increase. For a kernel with M discrete sampled values in each dimension (representing a fixed characteristic of the system determined by the product of the system bandwidth with the system memory), the number of estimated discrete values for the rth-order kernel is [M(M + 1) ··· (M + r − 1)]/r!, when the kernel symmetries are taken into account. By summing all kernel orders from 0 to Q, we find that the total number of discrete kernel values for a Qth-order system is

$$P = \frac{(M + Q)(M + Q - 1)\cdots(M + 1)}{Q!} \qquad (2.35)$$

For Q ≪ M, this number is approximately M^Q/Q!, indicating a geometric increase with Q and an exponential dependence of the total number of estimated kernel values (free parameters) on the model order and the logarithm of M.
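Equation (2.35) is simply the binomial coefficient C(M + Q, Q), which can be checked directly; the values below are illustrative, with M = 100 matching the typical memory-bandwidth product cited later in this section.

```python
from math import comb

# Total number of distinct kernel values for a Qth-order model with M lags,
# Equation (2.35): P = (M+Q)(M+Q-1)...(M+1)/Q! = C(M+Q, Q).
# It equals the sum over r of the per-order counts C(M+r-1, r).
def n_kernel_values(M, Q):
    return comb(M + Q, Q)

print(n_kernel_values(100, 2))   # -> 5151 values for a second-order model
print(n_kernel_values(100, 3))   # -> 176851 for a third-order model
```

The jump from 5151 to 176851 unknowns when moving from Q = 2 to Q = 3 is the "curse of dimensionality" discussed next.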
This "curse of dimensionality" (as it is often lamented by investigators confronted with this problem) represents the most serious limitation in the practical application of the Volterra modeling approach to nonlinear systems of high order and/or large memory-bandwidth product (MBP). This limitation has motivated the introduction of kernel expansions (see Section 2.3), which mitigate the effects of the dimensionality problem by compacting the kernel representation for systems with large MBP. However, the kernel expansion still faces limitations for high-order systems. This problem can be effectively addressed only by use of equivalent structured models that constrain the number of free parameters, such as the network models discussed in Section 2.3.3. The latter represent our best answer to this problem at the present time.
The second confounding issue arises from the practical necessity of selecting a finite-order Volterra model, even if the system is actually of infinite order. This implies that there exists some correlation among the residuals of the estimation (fitting) procedure due to the model truncation errors that depend on the input signal. This correlation of the residuals leads to biases in the kernel estimates obtained via least-squares estimation procedures. Of course, this estimation bias becomes significant only when the model truncation error is significant. Therefore, the severity of this problem depends on the specific application and becomes serious only for high-order systems that are not represented with satisfactory model order approximation.
Note that the adequacy of the model order approximation also depends on the dynamic range (or power level) of the input signals used or anticipated in each application. Naturally, the greater the dynamic range (or power level) of the input signal, the greater the relative importance of the higher-order kernels. Furthermore, the form of the resulting estimation bias depends on the spectral characteristics of the particular input data used for kernel estimation. Most applications to date have been limited to the estimation of up to second-order kernels and are liable to this "model truncation" problem (i.e., the possible presence of significant higher-order kernels will cause estimation biases in the obtained first-order and second-order kernels). However, the advocated use of equivalent high-order structured (network) models alleviates the model truncation problem by allowing estimation of high-order nonlinearities with small computational cost (see Section 2.3.3).
We describe below several methodologies that have been used thus far for the estimation of the Volterra kernels. These methodologies can be clustered in two groups: one employing specialized experimental inputs (e.g., multiple impulses or sums of sinusoids) and the other applicable to arbitrary input signals. These methods are not recommended for efficacy but are presented for completeness of methodological background.

Specialized Test Inputs. Since impulsive and sinusoidal inputs have been used extensively in the study of linear systems, it was natural that early attempts to estimate Volterra kernels employed similar test input waveforms. In order to account for multiple time or frequency interactions, these inputs took the form of sequences of impulses (with variable interimpulse intervals) and sums of sinusoids (with incommensurate frequencies), as discussed below.

In the case of impulse sequences, the order Q of the Volterra model has to be determined first from single-impulse experiments using the variable-strength impulse input x_A(t) = Aδ(t) that elicits the output [Schetzen, 1965a]:

$$y_A(t) = A k_1(t) + A^2 k_2(t, t) + \cdots + A^Q k_Q(t, \ldots, t) \qquad (2.36)$$

The model order Q is determined by finding the polynomial dependence of y_A on A for some t (usually near the peak output value).
The diagonal values of the kernels can be estimated, along with k₁(t), from single-impulse elicited output data by solving a system of linear simultaneous equations given by Equation (2.36) for various A values. The number of A values ought to be at least Q times the memory-bandwidth product of the system to have a critically determined set of equations, although a larger number of A values is welcome since it offers noise-suppressing possibilities through an overdetermined set of equations.
In order to estimate the off-diagonal values of the kernels, a sequence of two, three, ..., up to Q impulses of variable timing and amplitude is presented to the system in a manner that covers all timing and amplitude combinations of interest. Upon completion of this long sequence of such experiments, the various kernels are estimated in descending order, starting with the highest-order kernel, through the proper subtraction of multiple experimental outputs, as demonstrated below.
For a second-order system, a two-impulse sequence

$$x_{AB}(t) = A\delta(t) + B\delta(t - t_0) \qquad (2.37)$$

is used as experimental input to produce the output (omitting k₀ for simplicity):

$$y_{AB}(t) = A k_1(t) + B k_1(t - t_0) + A^2 k_2(t, t) + B^2 k_2(t - t_0, t - t_0) + 2AB\, k_2(t, t - t_0) \qquad (2.38)$$

for all values of t₀ from T (the sampling interval) to μ (the system memory). Note that the second-order kernel is symmetric about its diagonal [i.e., k₂(t − t₀, t) = k₂(t, t − t₀)]. Although only single values of A and B are required in theory, several values may be used to cover the input amplitude range of interest (in order to secure the global validity of the model) and improve its accuracy in the presence of inevitable noise by affording some averaging. The values A and B can be randomly selected according to a prescribed probability distribution representing our understanding of the likelihood of input amplitudes occurring under the real operating conditions of the system. Then, it is evident that

$$y_{AB}(t) - y_A(t) - y_B(t - t_0) = 2AB\, k_2(t, t - t_0) \qquad (2.39)$$

which yields an estimate of a slice k₂(t, t − t₀) of the second-order kernel parallel to the diagonal for given values of A and B (which can be divided out). Using different values of A and B, we can obtain multiple estimates of the same paradiagonal slice of the second-order kernel and suppress inevitable errors (from measurements and noise) through averaging. By varying t₀, we cover the entire second-order kernel slice by slice (up to the system memory, for t₀ = μ), except for its diagonal.
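The subtraction scheme of Equation (2.39) can be verified on a simulated second-order system; everything below (kernels, amplitudes, timing) is hypothetical and in discrete time with unit sampling interval.

```python
import numpy as np

# Simulated second-order Volterra response (k0 = 0, unit sampling interval).
def respond(x, k1, k2):
    M = len(k1)
    xp = np.concatenate([np.zeros(M - 1), x])
    y = np.empty(len(x))
    for n in range(len(x)):
        w = xp[n + M - 1::-1][:M]        # x(n), x(n-1), ..., x(n-M+1)
        y[n] = k1 @ w + w @ k2 @ w
    return y

# Hypothetical symmetric kernels and impulse amplitudes/timing
M, t0, A, B = 8, 3, 1.5, -0.7
m = np.arange(M)
k1 = np.exp(-0.5 * m)
k2 = 0.3 * np.outer(k1, k1)              # symmetric second-order kernel

N = M + t0 + 1
xA = np.zeros(N); xA[0] = A              # single impulse A at n = 0
xB = np.zeros(N); xB[t0] = B             # single impulse B at n = t0
xAB = xA + xB                            # the two-impulse input, Eq. (2.37)

# Equation (2.39): subtract the two single-impulse responses and divide
# out 2AB to recover the paradiagonal slice k2(n, n - t0).
slice_est = (respond(xAB, k1, k2) - respond(xA, k1, k2)
             - respond(xB, k1, k2)) / (2 * A * B)
```

For n = t₀, ..., M − 1, `slice_est[n]` reproduces k₂(n, n − t₀) exactly in this noise-free simulation; repeating the run over a range of t₀ fills in the kernel slice by slice, as described above.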
The apparent simplicity of this approach conceals the onerous experimentation required, especially for high-order systems, and its vulnerability to noise. Nonetheless, it may represent an attractive option in certain cases. A variant of this procedure can be used in the study of neural systems with spike inputs (action potentials), as discussed in Chapter 8.
Another specialized test input that was introduced in the early years of this approach is the sum of sinusoids of incommensurate frequencies [Victor et al., 1977; Victor, 1979]:

$$x(t) = \sum_i A_i e^{j\omega_i t} \qquad (2.40)$$

where the frequencies {ωᵢ} are not multiples of the fundamental frequency ω₀ defined by the inverse of the record length R (ω₀ = 2π/R). The summation index i takes the same positive and negative values, and Aᵢ = A*₋ᵢ because the input signal is real.
When this input is presented to a nonlinear (Volterra) system, the rth-order Volterra kernel will give rise to sinusoidal terms at the output that have frequencies defined by the sums or differences of these incommensurate frequencies (see Section 2.1.3). For instance, a second-order system will generate the output

$$y(t) = \sum_i A_i K_1(\omega_i) e^{j\omega_i t} + \sum_{i_1}\sum_{i_2} A_{i_1} A_{i_2} K_2(\omega_{i_1}, \omega_{i_2})\, e^{j(\omega_{i_1}+\omega_{i_2})t} \qquad (2.41)$$

where K₁(ωᵢ) and K₂(ω_{i1}, ω_{i2}) are the one-dimensional and two-dimensional Fourier transforms of the first- and second-order Volterra kernels, respectively, sampled at the input frequencies. Note that the summation indices in Equation (2.41) account for sums and differences of input frequencies, since the summation indices take symmetric positive and negative values. Because the input-output signals are real, the differences between these input frequencies also arise (e.g., ω_{i1} + ω_{−i2} = ω_{i1} − ω_{i2}).
Based on Equation (2.41), the Fourier transform of y(t) will reveal the values of the Fourier transforms of the Volterra kernels at the input frequencies and the sum/difference combinations thereof. The fact that the input frequencies are incommensurate guarantees that no overlapping frequencies will exist at the output because of high-order sum/difference combinations (i.e., the sum or the difference of two or more input frequencies will not coincide with another input frequency). A key practical issue is the ability to reconstruct accurately the values of the kernels at these incommensurate frequencies, using DFT or FFT computed values that are found by numerical necessity at commensurate frequencies (leakage correction).
Note also that the highest significant harmonic of each of the input frequencies indicates the order of the system, since the Qth-order kernel will give rise to a maximum Qth harmonic. This is an important practical advantage of this method, since it obviates the need for selecting a priori the order of the Volterra model. Another advantage of this method is that it is rather robust in the presence of noise, since it concentrates the signal power at a few specific frequencies. In addition, it is experimentally and computationally efficient.
The main drawback of this method is the leakage correction problem and the fact that it estimates the Volterra kernels of the system only at a few specific points in the frequency domain. Thus, if the kernels are relatively smooth in the frequency domain, this approach can be very efficient. However, if the kernels have multiple resonances and/or dissonances, then this approach is rendered ineffective for a limited number of input frequencies.

Arbitrary Inputs. The adjective "arbitrary" implies input signals that are not specially designed to attain specific waveforms, but are input signals that occur naturally during the normal operation of the system. The only desirable (and, in fact, necessary) feature of these inputs is that they be relatively broadband (but not necessarily white), so that they cover the entire range of frequencies of interest, which determines the system bandwidth. Naturally, the special case of random inputs (white or nonwhite) is covered by these methods.
The direct approach to the estimation of the discrete Volterra kernels using arbitrary (sampled) inputs, x(n), and the resulting outputs, y(n), is to formulate the problem in the vector-matrix form of Equation (2.34). Then the simplest solution to this estimation problem is given by the ordinary least-squares (OLS) estimate [Eykhoff, 1963, 1974; Hsieh, 1964]:

$$\hat{\mathbf{k}}_{OLS} = [\mathbf{X}'\mathbf{X}]^{-1}\mathbf{X}'\mathbf{y} \qquad (2.42)$$

where the prime denotes transpose. This OLS estimate is unbiased, consistent, and of minimum variance if the error terms in the vector ε are uncorrelated, input-independent, and zero-mean Gaussian, an assumption that cannot be made easily, especially for truncated Volterra models, where the residuals are correlated, input-dependent, and possibly non-Gaussian. In addition, physiological noise may exhibit non-Gaussian statistical characteristics that call for robust estimation methods, discussed later in this section.
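The OLS formulation of Equations (2.34) and (2.42) can be sketched for a small second-order model; the kernels, record length, and noise-free setting below are hypothetical, and `np.linalg.lstsq` is used in place of the explicit Gram-matrix inverse for numerical safety.

```python
import numpy as np

# Build the regression matrix X of Equation (2.34) for a second-order
# discrete Volterra model (T = 1, noise-free data) and solve the
# least-squares problem of Equation (2.42).
rng = np.random.default_rng(2)
M = 4
k1_true = np.array([1.0, 0.6, 0.3, 0.1])
k2_true = 0.2 * np.outer(k1_true, k1_true)   # symmetric second-order kernel

N = 400
x = rng.standard_normal(N)
xp = np.concatenate([np.zeros(M - 1), x])
rows = []
y = np.empty(N)
for n in range(N):
    w = xp[n + M - 1::-1][:M]                # x(n), ..., x(n-M+1)
    y[n] = k1_true @ w + w @ k2_true @ w     # simulated output (k0 = 0)
    # regressors: 1, x(n-m), and x(n-m1)x(n-m2) for m1 >= m2 (symmetry)
    quad = [w[i] * w[j] for i in range(M) for j in range(i + 1)]
    rows.append(np.concatenate([[1.0], w, quad]))
X = np.array(rows)
k_est, *_ = np.linalg.lstsq(X, y, rcond=None)
```

Because the kernel symmetries are folded into the regressors, the recovered coefficients are k₂(m, m) on the diagonal and 2k₂(m₁, m₂) off the diagonal, exactly the lTʳ scaling convention described for the vector k above.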
This OLS estimate represents the most basic and direct approach to the Volterra kernel estimation problem and may yield satisfactory results if the [P × P] "Gram matrix" [X'X] is not ill-conditioned (i.e., a numerically stable inverse exists) and the error terms (also termed "residuals") are approximately uncorrelated and Gaussian. Let us examine the estimation errors associated with the OLS estimator of Equation (2.42). By substitution of the OLS estimate of Equation (2.42) into the model of Equation (2.34), the estimated model output becomes

$$\hat{\mathbf{y}} = \mathbf{X}[\mathbf{X}'\mathbf{X}]^{-1}\mathbf{X}'\mathbf{y} \qquad (2.43)$$

and the estimated output residuals are

$$\hat{\boldsymbol{\varepsilon}} = \mathbf{y} - \hat{\mathbf{y}} = [\mathbf{I} - \mathbf{X}[\mathbf{X}'\mathbf{X}]^{-1}\mathbf{X}']\mathbf{y} \qquad (2.44)$$

where I is the [N × N] identity matrix. Then the following measure of the OLS estimation errors can be derived, which is given by the covariance matrix of the estimated parameter vector:

$$E[(\hat{\mathbf{k}} - \mathbf{k})(\hat{\mathbf{k}} - \mathbf{k})'] = \sigma^2 [\mathbf{X}'\mathbf{X}]^{-1} \qquad (2.45)$$

where σ² is the computed variance of the output residuals. It is evident that the kernel estimation variance is determined by the Gram matrix, which relates to the input autocorrelation. The key practical problem is the possible ill-conditioning of the Gram matrix [X'X], which requires robust inversion methods such as singular value decomposition (SVD) or other generalized inverse methods [Fan & Kalaba, 2003; Kalaba & Tesfatsion, 1990; Udwadia & Kalaba, 1996]. In cases where the ill-conditioning may be caused by insufficient input bandwidth, the issue of proper testing or observation of the system must

be addressed. This is particularly important for nonlinear systems, which are prone to pitfalls regarding the adequacy of the input ensemble, and deserves careful attention in practice.
Note that when the input vectors are orthogonal, the Gram matrix [X'X] becomes diagonal and the matrix inversion problems are avoided. This apparently attractive situation exists when the input signal is white (or quasiwhite) and the Volterra functional expansion is orthogonalized, as discussed in Section 2.2 for the Wiener modeling approach.
If the residuals are correlated, then they can be prewhitened with a linear transformation and the kernel estimate becomes the "generalized least-squares" (GLS) solution given by

$$\hat{\mathbf{k}}_{GLS} = [\mathbf{X}'\mathbf{S}^{-1}\mathbf{X}]^{-1}\mathbf{X}'\mathbf{S}^{-1}\mathbf{y} \qquad (2.46)$$

where S is the covariance matrix of the residuals. Note that this GLS estimate will minimize the estimation variance under the Gaussian assumption and remains unbiased and consistent for input-independent residuals. The GLS estimate of Equation (2.46) implies a change in the coordinate system defined by the columns of the input matrix X,

$$\mathbf{Z} = \mathbf{B}\mathbf{X} \qquad (2.47)$$

where the coordinate transformation matrix B is based on the residual covariance matrix as

$$\mathbf{B}'\mathbf{B} = \mathbf{S}^{-1} \qquad (2.48)$$

so that the residuals become uncorrelated. Note that the application of the GLS method requires knowledge of the residual covariance matrix, which is not a simple requirement in practice.
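A sketch of the GLS estimate of Equation (2.46) under the (strong) assumption that S is known: form B from the Cholesky factor of S⁻¹ so that B'B = S⁻¹, then apply ordinary least squares to the whitened data Z = BX and By. The AR(1) covariance and regression setup below are hypothetical.

```python
import numpy as np

# GLS via whitening, Equations (2.46)-(2.48), with a known (hypothetical)
# residual covariance S.
rng = np.random.default_rng(3)
N, P = 200, 3
X = rng.standard_normal((N, P))
k_true = np.array([2.0, -1.0, 0.5])

# AR(1)-correlated residuals with covariance S[i, j] = rho**|i - j|
rho = 0.8
S = rho ** np.abs(np.subtract.outer(np.arange(N), np.arange(N)))
e = np.linalg.cholesky(S) @ rng.standard_normal(N)
y = X @ k_true + e

# B such that B'B = S^{-1}: transpose of the Cholesky factor of S^{-1}
B = np.linalg.cholesky(np.linalg.inv(S)).T
k_gls, *_ = np.linalg.lstsq(B @ X, B @ y, rcond=None)
```

Applying OLS to (BX, By) is algebraically identical to Equation (2.46), since [X'B'BX]⁻¹X'B'By = [X'S⁻¹X]⁻¹X'S⁻¹y.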
If the residuals are not Gaussian, then an estimate with smaller variance can be ob-
tained by utilizing the log-likelihood function of the residuals (if such can be evaluated)
as the cost function, which is distinct from the quadratic cost function used in least-
squares estimation methods (see below). This cost function is minimized through iterative
procedures using gradient descent, as discussed in Section 4.2.2, or other minimization
methods. This case has been receiving increasing attention in recent years, especially
when the residuals exhibit outliers (caused by impulsive noise or spurious measurements)
that affect significantly the quadratic cost function (robust estimation).
It is evident from the foregoing that the size [N x P] of the matrix X must remain with-
in computationally feasible bounds. This may become a serious problem for high-order
systems (large Q) with large memory-bandwidth product (M). This problem is com-
pounded by the fact that N must be much larger than P in order to obtain an estimate of
reasonable accuracy in the presence of noise. The latter constraint impacts the length of
the experimental data record and prolongs the experiment as M and/or Q increase.
For instance, a typical physiological system will have M ~ 10^2 and, therefore, the data
record must be much larger than ~10^{2Q}/Q! for a Qth-order model. Thus, if Q = 3 the min-
imum number of input-output samples is N ~ 10^6 and probably closer to 10^7, depending
on the noise level. This requirement can be experimentally onerous or even infeasible
(due, for instance, to system nonstationarities that prevent the collection of long data
records under the stationary assumption).
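The growth of the coefficient count P can be checked with a small calculation; the multiset count below for the distinct entries of symmetric kernels is a standard convention and an assumption here, not a formula quoted from the text.

```python
from math import comb

# Number of free coefficients P of a Qth-order discrete Volterra model with
# memory M, counting only the distinct entries of each symmetric kernel
# (the qth-order kernel contributes C(M + q - 1, q) distinct values).
def n_coefficients(M, Q):
    return 1 + sum(comb(M + q - 1, q) for q in range(1, Q + 1))

P_count = n_coefficients(100, 3)   # M ~ 10^2, Q = 3: dominant term ~ M^3/3!
```

With M = 100 and Q = 3 this gives P of about 1.8 x 10^5, consistent with the requirement that N reach 10^6-10^7 samples for N >> P.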
In order to address the practically serious problem of required record length, we advo-
cate the key idea of kernel expansions on properly chosen bases to reduce the size of the
unknown coefficient vector and, consequently, reduce the length of the required experi-
mental data record. This idea was originally proposed by Wiener and various implemen-
tations have been explored by several investigators [Bose, 1956; Lee, 1964; Amorocho &
Brandstetter, 1971; Watanabe & Stark, 1975]. We have developed and advocate a variant
of demonstrated efficacy that employs discrete-time Laguerre expansions [Marmarelis,
1993].
The kernel expansion approach represents the core idea of the advocated modeling
methodology, based on extensive experience with various physiological systems. It usual-
ly reduces the required record length by a factor of 10Q and results in significant benefits
in terms of estimation accuracy and reduction in the experimental, as well as the computa-
tional, burden. Because of its importance, this methodology is discussed in detail in Sec-
tion 2.3.
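A minimal sketch of such a basis, using the standard discrete-time Laguerre recursion; the decay parameter alpha and the sizes chosen below are illustrative assumptions.

```python
import numpy as np

def laguerre_basis(alpha, L, M):
    """L orthonormal discrete-time Laguerre functions over lags 0..M-1."""
    b = np.zeros((L, M))
    b[0] = alpha ** (np.arange(M) / 2.0) * np.sqrt(1 - alpha)   # b_0(m)
    for j in range(1, L):
        b[j, 0] = np.sqrt(alpha) * b[j - 1, 0]
        for m in range(1, M):
            # standard recursion linking orders j-1 and j
            b[j, m] = (np.sqrt(alpha) * b[j, m - 1]
                       + np.sqrt(alpha) * b[j - 1, m] - b[j - 1, m - 1])
    return b

B = laguerre_basis(alpha=0.5, L=5, M=80)
gram = B @ B.T                       # ~ identity: the basis is orthonormal
n_full = 100 * 101 // 2              # 2nd-order kernel coefficients, memory M = 100
n_expanded = 5 * 6 // 2              # same kernel expanded on L = 5 Laguerre functions
```

Expanding a second-order kernel of memory M = 100 on L = 5 such functions reduces its free coefficients from 5050 to 15, which is the mechanism behind the record-length reduction discussed above.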

Fast Exact Orthogonalization and Parallel-Cascade Methods. Solution of the
least-squares problem of Equation (2.34) can also be achieved with a variant of the
Cholesky (QR) decomposition that is computationally efficient [Korenberg, 1988,
1989a,b]. This technique, termed "fast exact orthogonalization," develops the kernel es-
timates by successive orthogonalization of input data vectors (having the structure of a
column of matrix X) with respect to each other and to the output residuals. This method
has been applied successfully to various biological systems but remains computationally
intensive when the system memory-bandwidth product is large or when kernels of order
higher than second need to be estimated. It also remains vulnerable to noise in the data, es-
pecially noise corrupting the input data.
In order to make the practical estimation of high-order kernels more efficient, the "par-
allel cascade" approach was introduced by the same investigator, whereby the system
model is developed by adding successive parallel branches of L-N cascades (a linear filter
followed by a static nonlinearity) until a satisfactory prediction error criterion is met [Ko-
renberg, 1991]. The linear filters of these parallel cascades are estimated by "cross-corre-
lation slices" and the unknown parameters of the static nonlinearities are determined by
fitting procedures. This method is computationally efficient, even for high-order models
with large memory-bandwidth product, but remains sensitive to noise in the data and usu-
ally yields a very large number of parallel cascades (in the hundreds). Although the latter
can be consolidated in the form of equivalent kernels computed from the parameters of
the parallel cascades, this "parallel-cascade" model does not lend itself to physiological
interpretation, unlike the method of "principal dynamic modes" discussed in Section 4.1.1
that employs a small number of parallel cascades.
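The structure of such a model can be sketched as below; the filters and polynomial coefficients are illustrative stand-ins, not the result of Korenberg's cross-correlation and fitting procedure.

```python
import numpy as np

# Structural sketch of a parallel-cascade model: each branch is a linear FIR
# filter (L stage) followed by a static polynomial nonlinearity (N stage),
# and branch outputs are summed. Filters and coefficients are illustrative.
def ln_branch(x, g, coeffs):
    u = np.convolve(x, g)[: len(x)]                        # linear (L) stage
    return sum(c * u ** i for i, c in enumerate(coeffs))   # static (N) stage

rng = np.random.default_rng(1)
x = rng.standard_normal(500)
g1 = np.exp(-0.3 * np.arange(20))
g2 = np.exp(-0.1 * np.arange(20)) * np.cos(0.5 * np.arange(20))
y_model = ln_branch(x, g1, [0.0, 1.0, 0.5]) + ln_branch(x, g2, [0.0, 0.2, 0.0, 0.1])
```

In the actual method, branches are added one at a time and their parameters fitted until the residual prediction error falls below a chosen criterion.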

Iterative Cost-Minimization Methods for Non-Gaussian Residuals. We begin
with the linear estimation problem described by Equation (2.34) for the case of residuals
with arbitrary joint probability density function (joint PDF) p(ε). Following the maximum
likelihood framework, which has been proven to yield optimal (i.e., minimum variance)
estimates, we seek to maximize over the parameter vector k the likelihood function:

L(k) = p(y - Xk) (2.49)

In practice, we often minimize the negative log-likelihood function (viewed as a cost
function) because many PDFs take a decaying exponential form. Furthermore, the joint
PDF is avoided as impractical by performing "prewhitening" of the model residuals (if
necessary) with the transformation indicated in Equation (2.47). It is understood that the
latter transformation simply provides uncorrelated residuals, which implies true
"prewhitening" (i.e., statistical independence) only in the Gaussian case. Nonetheless, in
the spirit of pragmatic accommodation, we replace in practice the joint PDF with the
product of the single-variable PDFs for all residuals (assuming statistical independence of
the residuals after prewhitening).
For instance, in the Gaussian case, the likelihood function under the assumption of
white (i.e., statistically independent) residuals becomes

L(k) = (2\pi\sigma^2)^{-N/2} \exp\left[ -\frac{1}{2\sigma^2} \sum_{n=1}^{N} \left( y(n) - x_n' k \right)^2 \right]   (2.50)

where σ² is the residual variance, N is the number of samples, and x_n' is the transpose of
the nth row of the matrix X. Instead of maximizing L over k, we can minimize (-log L)
over k. This leads to a quadratic cost function:

C(k) \equiv -\log L = \frac{N}{2} \log(2\pi\sigma^2) + \frac{1}{2\sigma^2} \sum_{n=1}^{N} \left[ y(n) - x_n' k \right]^2   (2.51)

which can be minimized in closed form by the ordinary least-squares estimate of Equa-
tion (2.42). Note that the GLS estimate of Equation (2.46) is the closed-form solution of
this cost minimization problem for correlated residuals with covariance matrix S. Of
course, Equation (2.51) can also be solved iteratively using gradient descent (or any other
minimization) methods.
Let us now consider a case of non-Gaussian, white residuals with PDF:

p(\varepsilon) = A \exp[-\alpha |\varepsilon|^{\beta}]   (2.52)

where α, β > 0 are dispersion and shape parameters, respectively, and the value of A that
satisfies the PDF normalization condition is

A = \frac{\beta \, \alpha^{1/\beta}}{2 \, \Gamma(1/\beta)}   (2.53)

where Γ denotes the Gamma function. This class of PDFs includes the Gaussian (β = 2)
and the Laplacian (β = 1), and yields the log-likelihood cost function
C(k) = -N \log A + \alpha \sum_{n=1}^{N} \left| y(n) - x_n' k \right|^{\beta}   (2.54)

which can be minimized over k through gradient-descent iterative methods, since it is dif-
ferentiable except at y(n) = x_n' k. Note that the gradient components are given by

\frac{\partial C}{\partial k_i} = -\alpha \beta \sum_{n=1}^{N} \mathrm{sgn}[\varepsilon(n)] \, x_{n,i} \, |\varepsilon(n)|^{\beta - 1}   (2.55)

when ε(n) ≠ 0 [the gradient should be set to zero if ε(n) = 0], where sgn[·] denotes the
signum function and x_{n,i} is the ith element of the vector x_n (for i = 1, ..., P).
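A minimal sketch of this iteration for the Laplacian case (β = 1) follows; the simulated data, step size, and iteration count are illustrative assumptions.

```python
import numpy as np

# Sketch: minimize the cost of Eq. (2.54) by gradient descent, using the
# gradient of Eq. (2.55) with beta = 1 (Laplacian residuals). The data,
# step size, and iteration count are illustrative assumptions.
rng = np.random.default_rng(2)
N, P = 400, 2
X = rng.standard_normal((N, P))
k_true = np.array([1.5, -0.7])
y = X @ k_true + rng.laplace(scale=0.3, size=N)

alpha, beta = 1.0, 1.0
k = np.zeros(P)
for _ in range(3000):
    e = y - X @ k
    nz = e != 0                          # gradient is set to zero where e(n) = 0
    grad = -alpha * beta * X[nz].T @ (np.sign(e[nz]) * np.abs(e[nz]) ** (beta - 1))
    k -= 1e-3 * grad
```

Because the absolute-value cost penalizes large residuals less severely than the quadratic cost, the resulting estimate is far less sensitive to outliers, which is the robust-estimation rationale mentioned above.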
The estimation errors associated with iterative cost-minimization procedures have
been studied extensively in the literature in connection with a variety of procedures. Since
a host of available procedures exists (based on gradient descent or random search), we
will defer to the vast literature on the subject [Eykhoff, 1974; Haykin, 1994; Hassoun,
1995]. We simply note that the key issues are: (a) avoidance of local minima, and (b)
rapid convergence of the iterative algorithm.
These iterative cost-minimization procedures can also be used to solve the daunting
nonlinear regression problem, where the nonlinearity arises from the model form and not
from the non-Gaussian residuals. An example is the training (i.e., the iterative parameter
estimation) of the network models discussed in Section 2.3.3, which are equivalent to the
Volterra models of nonlinear systems. In these network models, certain unknown parame-
ters enter nonlinearly and, therefore, the simple formulation of Equation (2.34) is not ap-
plicable. The chain rule of differentiation has to be used in this context (referred to as "er-
ror back-propagation") for iterative estimation of the unknown network parameters.
Although this iterative method has been used extensively, it still offers challenges in some
applications.

2.2 WIENER MODELS

The motivation for the introduction of the Wiener series (and the associated Wiener mod-
els) is found in the desire to diagonalize the Gram matrix [X'X] of the previous section by
orthogonalizing the "input vectors." This also addresses the "model truncation" problem
by decoupling the various kernels through orthogonalization of their corresponding func-
tionals, and subsequently facilitates their separate estimation and reduces the size of the
estimation problem. This is similar to the procedure followed in order to facilitate the es-
timation of the expansion coefficients of a function expansion on a basis of functions by
orthogonalizing the expansion basis over the selected domain of the independent variable
(see Appendix I).
Wiener proposed this approach in the context of functionals (systems) by orthogonal-
izing the Volterra functionals for a Gaussian white noise (GWN) input using a
Gram-Schmidt orthogonalization procedure (see Appendix III). The basic properties of
GWN are discussed in Appendix II. The GWN input power level defines the region of
functional orthogonality (i.e., the range of input power for which orthogonality holds) in a
manner akin to the role of the domain of the independent variable in defining orthogonal
basis functions. Wiener studied extensively the stochastic process of Brownian motion
and the mathematical properties of its "derivative" (the GWN), including its stochastic in-
tegrals, which led him to the introduction of what he termed "homogeneous chaos," a hi-
erarchy of stochastic integrals involving GWN that was a forerunner of the Wiener series
[Wiener, 1938].
Wiener's idea extends to functional spaces the logic established in function spaces by
the introduction of orthogonal function bases to facilitate the evaluation of the expansion
coefficients of square-integrable functions. This logic entails the decoupling of simultane-
ous equations through orthogonalization and was extended by Wiener to functional ex-
pansions of unknown system functionals by combining Volterra's key idea of extending
the mathematical formalism from enumerably infinite vector spaces to continuous func-
tion spaces on one hand, with the statistical properties of GWN and its integrals (homoge-
neous chaos) on the other hand. It is critical for the comprehension of the functional ex-
pansions to view a function as a "vector" with an enumerably infinite number of dimensions.
If one draws the analogy between the Volterra series expansion of an analytic func-
tional and a Taylor series expansion of an analytic function, then the analogy can also be
drawn between the Wiener series of orthogonal functionals with GWN input and a Her-
mite orthogonal expansion of a square-integrable function, because the latter employs a
Gaussian weighting function. In fact, the structure of the Wiener functionals resembles
the structure of the Hermite polynomials. It must be noted again that the Wiener kernels
of a system are generally different from its Volterra kernels, although specific analytical
relations exist between the two sets that are presented below.
Even though Wiener's ideas had great influence and shaped constructively our think-
ing on nonlinear system identification/modeling, the practical relevance of the orthogonal
Wiener series (for GWN inputs) has diminished in recent years due to the advent of supe-
rior kernel estimation methodologies that are applicable for non-GWN inputs, and the
practical necessity of utilizing non-GWN inputs in the study of physiological systems un-
der natural operating conditions. Nonetheless, we will present Wiener's seminal ideas in
this section, because they still exert considerable influence and are instructive in under-
standing the evolution of this field.
Wiener's critical contributions to the problem of nonlinear system identification/mod-
eling are two: (1) the suggestion that GWN is an effective test input for identifying nonlin-
ear dynamic systems of a very broad class, and (2) the introduction of specific procedures
for the estimation of the unknown system kernels from input-output data in the frame-
work of the orthogonal Wiener series. Even though better kernel estimation procedures
(which do not require orthogonalization of the functional expansion or white-noise in-
puts) have been developed in recent years, Wiener's seminal contributions gave tremen-
dous initial impetus to the field and "blazed the trail" for many investigators who fol-
lowed his lead and advanced the state of the art. For this, he is properly considered a
pioneer and a prominent founder of the field.
The idea that GWN is an effective test input for nonlinear system identification and
modeling (the same way the impulse function is an effective test input for linear time-in-
variant system identification) is of particular importance and interest. Aside from the mathe-
matical properties of GWN that facilitate the Wiener kernel estimation, the idea engen-
ders the notion that the nonlinear system must be tested by all possible input waveforms
that are expected to stimulate the system under normal operation or by a dense, represen-
tative subset of this "natural input ensemble." This fundamental idea is revisited through-
out the book in a context broader than the original Wiener suggestion (i.e., only a subset
of GWN comprises the "natural input ensemble" and, therefore, GWN, even band-limit-
ed, may exhibit unnecessary redundancy). The concept is clear but the practical implica-
tions depend on the degree of redundancy of GWN relative to the natural input ensemble
of the system.
In principle, the Volterra kernels of a system cannot be directly determined from in-
put-output data unless the Volterra expansion is of finite order. For a finite-order Volterra
expansion, kernel measurement methods through least-squares fitting procedures or by use
of specialized inputs (e.g., multiple impulses or multiple sinusoids) were discussed in the
previous section. These methods have numerical or experimental limitations and potential
pitfalls, related primarily to the effects of the model truncation error (correlated residuals
leading to estimation biases) and the "richness" of the utilized input ensemble (misleading
results, if the system functional space is not probed densely by the input signals).
These two fundamental limitations motivated Wiener to introduce the GWN as an "ef-
fective test input" (i.e., an input signal that probes densely the operational space of all sys-
tems) and to propose the orthogonalization of the Volterra functional expansion (i.e., the
orthogonal Wiener expansion makes the residuals orthogonal to the estimated model pre-
diction for a GWN input). The latter results in a new set of kernels (Wiener kernels) that
are distinct from the Volterra kernels of the system, in general. This can be viewed as a
"structural bias" of the Wiener kernels relative to the Volterra kernels of a system, since
the residuals of a truncated Wiener model remain correlated (i.e., nonwhite). The differ-
ence is that the "structural bias" of the Wiener kernels is determined by the GWN input
power level (one parameter), whereas the estimation bias of the Volterra kernels (in a
truncated model) depends on the utilized input ensemble, which can be different from case to
case, thus introducing a source of inconsistency in the obtained results (estimated ker-
nels).
For these reasons, Wiener suggested the orthogonalization of the Volterra series for a
GWN test input (see Appendix III and Historical Note #2 at the end of this chapter). The
resulting orthogonal functional series is termed the "Wiener series" and exhibits the
aforementioned advantages. Additional advantages, due to its orthogonality, are the "fi-
nality" of the Wiener kernel estimates (i.e., they do not change if additional higher-order
terms are added) and the rapid convergence of the expansion for a GWN input (i.e., least
truncation error for given model order). Note, however, that the latter advantage is true
only for GWN inputs (as discussed later).
The functional terms of the Wiener series are termed the "Wiener functionals" and are
constructed on the basis of a Gram-Schmidt orthogonalization procedure requiring that
the covariance between any two Wiener functionals be zero for a GWN input, as detailed
in Appendix III. The resulting Wiener series expansion of the output signal takes the form
[Wiener, 1958]:

y(t) = \sum_{n=0}^{\infty} G_n[h_n; x(t'), t' \le t]

     = \sum_{n=0}^{\infty} \sum_{m=0}^{[n/2]} \frac{(-1)^m n! P^m}{(n-2m)! \, m! \, 2^m} \int_0^{\infty} \cdots \int_0^{\infty} h_n(\tau_1, \ldots, \tau_{n-2m}, \lambda_1, \lambda_1, \ldots, \lambda_m, \lambda_m)

       \times x(t - \tau_1) \cdots x(t - \tau_{n-2m}) \, d\tau_1 \cdots d\tau_{n-2m} \, d\lambda_1 \cdots d\lambda_m   (2.56)

where [n/2] is the integer part of n/2 and P is the power level of the GWN input. The lead-
ing integral term of the nth-order Wiener functional has the form of the nth-order Volter-
ra functional (of course with a different kernel). The Wiener kernel is integrated in the
nonleading integral terms (of lower homogeneous order) for each Wiener functional to re-
duce appropriately the dimensionality and secure the orthogonality of the Wiener func-
tionals. Note that the nth-order Wiener functional has [n/2] + 1 integral terms that contain
the same Wiener kernel convolved with the input n, (n - 2), ..., n - 2[n/2] times (i.e.,
each of these integral terms has the form of a homogeneous functional of order equal to
the number of convolved inputs).
The Wiener functionals {G_n(t)} are constructed orthogonal in the statistical sense of
zero covariance: E[G_n(t)G_m(t')] = 0, for m ≠ n and for all values of t and t', where E[·]
denotes the "expected value" operator, which forms the statistical average of the random
quantity within the brackets over the entire ensemble of this random quantity. For ergodic
and stationary random processes, this ensemble average can be replaced by a time aver-
age over the entire time axis (from -∞ to +∞). In practice, of course, these averages (both
over ensemble and over time) form incompletely, because of the inevitably finite ensem-
ble and/or time record of data, leading to inaccuracies that are discussed in detail in Sec-
tion 2.4.2.
The orthogonality of the Wiener functionals is also compromised in practice by the ne-
cessity of using band-limited GWN inputs (instead of the ideal GWN that has infinite
bandwidth and is, therefore, not physically realizable). This situation is akin to the com-
mon approximation of the Dirac delta function (a mathematical idealization that is not
physically realizable) with an impulse waveform of finite time-support (width) that is suf-
ficiently small for the requirements of each specific application. In the same vein, the ide-
al GWN input is approximated in practice by a band-limited GWN signal with sufficient-
ly broad bandwidth as to cover the bandwidth of the system under study.

2.2.1 Relation Between Volterra and Wiener Models


The set of Wiener kernels {h_n} is, in general, different from the set of Volterra kernels {k_n}
of the system and dependent on the GWN input power level P. Specific mathematical rela-
tions exist between the two sets of kernels (when they both exist) that can be derived by
equating the two series expansions. These relations are given in the time domain by

h_n(\tau_1, \ldots, \tau_n) = \sum_{m=0}^{\infty} \frac{(n+2m)! \, P^m}{n! \, m! \, 2^m} \int_0^{\infty} \cdots \int_0^{\infty} k_{n+2m}(\tau_1, \ldots, \tau_n, \lambda_1, \lambda_1, \ldots, \lambda_m, \lambda_m) \, d\lambda_1 \cdots d\lambda_m   (2.57)

or in the frequency domain by

H_n(\omega_1, \ldots, \omega_n) = \sum_{m=0}^{\infty} \frac{(n+2m)! \, P^m}{n! \, m! \, 2^m (2\pi)^m} \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} K_{n+2m}(\omega_1, \ldots, \omega_n, u_1, -u_1, \ldots, u_m, -u_m) \, du_1 \cdots du_m   (2.58)

where ω and u denote frequency in rad/sec. It is evident from Equation (2.57) that the nth-
order Wiener kernel depends not only on the nth-order Volterra kernel, but also on all
higher-order Volterra kernels of the same parity. Note that the parity (odd/even) separa-
tion in the expressions of the Wiener kernels provides that the even/odd-order Wiener
kernels are polynomials in P with coefficients depending on all the higher even/odd-order
Volterra kernels. Thus, a system with an even-symmetric (or odd-symmetric) nonlinearity
will have only even-order (or odd-order) Volterra and Wiener kernels.
Similar expressions can be derived for the Volterra kernels of the system in terms of
the Wiener kernels of higher (and equal) order and the respective power level P, by col-
lecting, from all the Wiener functionals, the terms with the same number of input product
terms [Marmarelis, 1976]. Note that, for finite-order models, the
Volterra and Wiener kernels of the two highest orders (odd and even) are identical, be-
cause of the absence of higher-order kernels of the same parity.
As an illustrative example, consider the L-N cascade system of Example 2.2, but with
a cubic nonlinearity. Its Volterra kernels are given by Equation (2.13) for r = 1, 2, 3, with
k_0 = 0. According to Equation (2.57), the equivalent Wiener kernels of this system for
GWN input power level P are
h_0 = P \int_0^{\infty} k_2(\lambda, \lambda) \, d\lambda = P a_2 \int_0^{\infty} g^2(\lambda) \, d\lambda   (2.59)

h_1(\tau) = k_1(\tau) + 3P \int_0^{\infty} k_3(\tau, \lambda, \lambda) \, d\lambda = a_1 g(\tau) + 3P a_3 g(\tau) \int_0^{\infty} g^2(\lambda) \, d\lambda   (2.60)

h_2(\tau_1, \tau_2) = k_2(\tau_1, \tau_2) = a_2 g(\tau_1) g(\tau_2)   (2.61)

h_3(\tau_1, \tau_2, \tau_3) = k_3(\tau_1, \tau_2, \tau_3) = a_3 g(\tau_1) g(\tau_2) g(\tau_3)   (2.62)

If we wish to express the Volterra kernels in terms of the Wiener kernels of this system,
then (the highest two orders are identical)

k_0 = h_0 - P \int_0^{\infty} h_2(\lambda, \lambda) \, d\lambda   (2.63)

k_1(\tau) = h_1(\tau) - 3P \int_0^{\infty} h_3(\tau, \lambda, \lambda) \, d\lambda   (2.64)

since the first three Wiener functionals have the structure

G_1(t) = \int_0^{\infty} h_1(\tau) x(t - \tau) \, d\tau   (2.65)

G_2(t) = \int_0^{\infty} \int_0^{\infty} h_2(\tau_1, \tau_2) x(t - \tau_1) x(t - \tau_2) \, d\tau_1 d\tau_2 - P \int_0^{\infty} h_2(\lambda, \lambda) \, d\lambda   (2.66)

G_3(t) = \int_0^{\infty} \int_0^{\infty} \int_0^{\infty} h_3(\tau_1, \tau_2, \tau_3) x(t - \tau_1) x(t - \tau_2) x(t - \tau_3) \, d\tau_1 d\tau_2 d\tau_3 - 3P \int_0^{\infty} \int_0^{\infty} h_3(\tau, \lambda, \lambda) x(t - \tau) \, d\tau \, d\lambda   (2.67)
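These relations can be checked numerically in discrete time, where the integrals become sums over lags; the impulse response g, the coefficients a1, a2, a3, and the power level P below are illustrative assumptions.

```python
import numpy as np

# Discrete-time check of Eqs. (2.59)-(2.64) for the cubic L-N cascade;
# g, a1, a2, a3, and the GWN power level are illustrative assumptions.
g = np.exp(-0.2 * np.arange(50))
a1, a2, a3, P_lvl = 1.0, 0.4, -0.1, 0.5
Eg2 = np.sum(g ** 2)                       # discrete analogue of the integral of g^2

# Volterra -> Wiener, Eqs. (2.59)-(2.62)
h0 = P_lvl * a2 * Eg2
h1 = a1 * g + 3 * P_lvl * a3 * g * Eg2
h2_diag = a2 * g ** 2                      # h2(lambda, lambda)
h3_int = 3 * P_lvl * a3 * g * Eg2          # 3P * sum over lambda of h3(tau, lambda, lambda)

# Wiener -> Volterra, Eqs. (2.63)-(2.64): the original kernels are recovered
k0_rec = h0 - P_lvl * np.sum(h2_diag)
k1_rec = h1 - h3_int
```

The round trip recovers k_0 = 0 and k_1(τ) = a_1 g(τ), confirming that the lower-order corrections in (2.63)-(2.64) exactly undo the higher-order "projections" in (2.59)-(2.60).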

One interesting implication of the derived relation between the Volterra and the
Wiener kernels of the system is that the first-order Wiener kernel of a nonlinear system is,
in general, different from the linear part of the system (the first-order Volterra kernel),
and it is actually dependent on all higher odd-order Volterra kernels, i.e., contains some
of the odd-order system nonlinearities. This demonstrates the faster convergence of the
Wiener orthogonal expansion, where even the first-order functional term reflects some of
the nonlinear characteristics of the system [see Equation (2.57) for n = 1 or Equation
(2.60) in the example]. At the same time, this "projection" of higher odd-order Volterra
kernels on the first-order Wiener kernel may obscure the interpretation of "linearized ap-
proximations" obtained in the Wiener framework. This point has important practical im-
plications for "apparent transfer function" measurements often used in practice and the
corresponding coherence measurements, as discussed in Section 2.2.5.
With regard to possible pitfalls in the interpretation of Wiener kernels, the reader must
be reminded that the Wiener series is constructed orthogonally with respect to GWN in-
puts of certain power level P, which determines the range of validity of the orthogonality
between any two Wiener functionals. This "range of orthogonality" in function space is
determined by the product of the input bandwidth and variance. Therefore, since P deter-
mines the range of the orthogonal "coordinate system" represented by the Wiener func-
tionals, the obtained Wiener kernel estimates depend on the specific P value of the uti-
lized white (or quasiwhite) input, as indicated by Equation (2.57), and should be expected
to provide good model predictions for input signals with bandwidth-variance products
comparable to P (if the model is truncated). If the model is complete, then the predictions
will be good for any input signal. Since the estimated Wiener kernels are generally differ-
ent for different P values, they should be reported in the literature with reference to the P
value for which they were obtained.
The reader may wonder why orthogonality is sought. As mentioned briefly earlier,
there are three main reasons why orthogonality is desirable. The first reason is that an
orthogonal basis spans the functional space (within the range of its validity) most effi-
ciently. That is, the Wiener series is expected to have faster convergence than the
Volterra series for a GWN input (i.e., smaller output prediction error for given order of
truncated model). However, this cannot be guaranteed for an arbitrarily chosen input
signal. Recall that GWN is an ergodic random process with the same power over all fre-
quencies, and thus constitutes an exhaustive test input; that is, it tests the system with all
possible input waveforms, given sufficient time. Consequently, it can be expected to
provide a better truncated model of the system over all possible inputs. This is the ra-
tionale for Wiener's suggestion of using GWN test inputs for kernel estimation.
However, this fact does not exclude the possibility of the opposite being true for certain
specific input signals, raising the important issue of defining a "natural ensemble" of in-
puts for each specific system, which should be used for kernel estimation in order to
provide the truncated model with the best convergence for the system at hand (the issue
is moot for a complete model).
The second reason for seeking orthogonality is that, if the expansion basis is orthog-
onal, then the truncated model can be extended to include higher-order terms without af-
fecting the lower-order terms already estimated (finality of orthogonal expansions). The
third reason is that the orthogonality allows the estimation of the system kernels in a rel-
atively simple way using cross-correlation (or covariance) when the input is GWN (as
discussed in Section 2.2.3) or through diagonalization of the Gram matrix (as discussed
in Section 2.1.5). This is analogous to the determination of the expansion coefficients of
a given vector or function on an orthogonal vector or function basis (discussed in
Appendix I).
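The cross-correlation idea can be sketched in discrete time: for a zero-mean white input x with variance P per sample, the first-order Wiener kernel is estimated as h1(τ) ≈ E[y(n)x(n - τ)]/P (the approach detailed in Section 2.2.3). The test system and record length below are illustrative assumptions; for this particular system, h1 equals the linear filter g.

```python
import numpy as np

# Sketch of first-order Wiener kernel estimation by cross-correlation with a
# white Gaussian input; the test system (linear filter plus a second-order
# distortion) and the record length are illustrative assumptions.
rng = np.random.default_rng(3)
P_lvl, N, M = 1.0, 200_000, 30
x = rng.normal(scale=np.sqrt(P_lvl), size=N)    # discrete "GWN", variance P per sample
g = np.exp(-0.3 * np.arange(M))
u = np.convolve(x, g)[:N]
y = u + 0.3 * u ** 2                            # mildly nonlinear test system

# h1(tau) ~ E[y(n) x(n - tau)] / P for tau = 0, ..., M-1
h1_est = np.array([np.mean(y[m:] * x[: N - m]) for m in range(M)]) / P_lvl
```

The even-order (quadratic) term contributes nothing to this cross-correlation for a Gaussian input, so the estimate converges to g as the record grows; with odd-order nonlinearities present it would instead converge to the h1 of Equation (2.60).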
This last advantage of orthogonality has been the primary motivation for the initial use
of the Wiener series in the actual identification of nonlinear systems, although the first ad-
vantage of orthogonality (faster convergence) was the primary motivation for its introduc-
tion in connection with GWN test inputs.
It must be noted that the GWN input is not the only signal with respect to which the
Volterra functional series can be orthogonalized. The orthogonalization can be achieved
for other classes of input signals that possess suitable autocorrelation properties, such as
the CSRS class of quasiwhite signals discussed in Section 2.2.4. For each such signal
class, a corresponding orthogonal functional series can be constructed and the associated
set of kernels can be estimated.

The Wiener Class of Systems. The class of nonlinear time-invariant systems for
which a Wiener series expansion exists is different from the Volterra class defined by the
absolute integrability condition of Equation (2.4). The Wiener class comprises sys-
tems that generate outputs with finite variance in response to a GWN input.
Since the output variance of the Wiener model for a GWN input is (see Appendix III
for derivation)

\sigma_y^2 = \sum_{r=0}^{\infty} r! \, P^r \int_0^{\infty} \cdots \int_0^{\infty} h_r^2(\tau_1, \ldots, \tau_r) \, d\tau_1 \cdots d\tau_r   (2.69)

the condition for finite output variance becomes the square-integrability condition on the
Wiener kernels:

\int_0^{\infty} \cdots \int_0^{\infty} h_r^2(\tau_1, \ldots, \tau_r) \, d\tau_1 \cdots d\tau_r \le \frac{c_r}{r! \, P^r}   (2.70)

where {c_r} is a convergent series of nonnegative scalars.


Therefore, the Wiener class of systems is defined by a square-integrability condition
for the Wiener kernels with a radius of convergence determined by the power level of the
GWN input (unlike the absolute integrability condition for the Volterra kernels, which has a
radius of convergence determined by the uniform bound on the amplitude of the input sig-
nal). This square-integrability condition excludes kernels with delta functions from the
Wiener class (e.g., solitary static nonlinearities), which are admissible in the Volterra
class of systems.
We should note at this point that the output of a Wiener model is a stationary random
process, in general nonwhite. Therefore the analysis of such output signals has to be sta-
tistical and make use of probability density functions, correlation functions, and spectra
(including high-order ones because of the system nonlinearities). Thus, it is deemed use-
ful to provide a brief overview of the basic tools for the characterization and analysis of
stationary random processes (signals) in Appendix IV. Because of its pivotal importance,
we discuss the basic properties of the GWN process separately in Appendix II.

Examples of Wiener Models. As illustrative examples, let us examine the equivalent
Wiener models for the systems used previously as Examples 2.1-2.4 for Volterra models.
Example 2.1 of a static nonlinear system has no formal equivalent Wiener model be-
cause its kernels are not square integrable (being composed of delta functions) and do not
satisfy the condition (2.70).
Example 2.2 of the L-N cascade system has Volterra kernels given by Equation
(2.13). The Wiener kernels for this system (in the general case of infinite-order nonlin-
earity and not the case of cubic nonlinearity discussed earlier) are found by use of
Equation (2.57) as

h_r(\tau_1, \ldots, \tau_r) = \sum_{m=0}^{\infty} \frac{(r+2m)! \, P^m}{r! \, m! \, 2^m} \, a_{r+2m} \left[ \int_0^{\infty} g^2(\lambda) \, d\lambda \right]^m g(\tau_1) \cdots g(\tau_r)   (2.71)

This illustrates again the general observation that the Wiener series has a different pattern
of convergence than the Volterra series and, therefore, truncated models will yield differ-
ent prediction accuracies depending on the specific input. The closer the input signal
comes to the GWN input that was used to estimate the Wiener kernels, the better the rela-
tive performance of the Wiener model. However, this performance advantage may turn
into a deficit for certain inputs that deviate significantly from the aforementioned GWN
input.
Example 2.3 of the L-N-M cascade has Wiener kernels given by combining Equation
(2.14) with Equation (2.57) as

h_r(\tau_1, \ldots, \tau_r) = \sum_{m=0}^{\infty} \frac{(r+2m)! \, P^m}{r! \, m! \, 2^m} \, a_{r+2m} \int_0^{\min(\tau_1, \ldots, \tau_r)} \left[ \int_0^{\infty} g^2(\lambda') \, d\lambda' \right]^m h(\lambda) \, g(\tau_1 - \lambda) \cdots g(\tau_r - \lambda) \, d\lambda   (2.72)

Note that as the posterior filter h(λ) tends to a delta function (i.e., its memory decreas-
es or its bandwidth increases), the Wiener kernels for this "sandwich" model tend to their
counterparts of Equation (2.71), as expected, since the L-N-M cascade tends to the L-N
cascade in this case.
Example 2.4 presents a greater challenge in evaluating its Wiener kernels because of
the complexity of the expressions for its high-order Volterra kernels. This subject will be
examined in Section 3.2, when we analyze the relation between Volterra models and non-
linear differential equations, in connection with studies of nonlinear feedback.

Comparison of Volterra/Wiener Model Predictions. It is important to reempha-
size that the Wiener kernels depend on the GWN input power level, whereas the Volterra
kernels are independent of any input characteristics. This is due to the fact that the Wiener
kernels are associated with an orthogonal functional expansion (when the input is GWN
of some power level P), whereas the Volterra kernels are associated with an analytic func-
tional expansion that only depends on the functional derivatives, which are characteristic
of the system but independent of the specific input. This situation can be likened to the
difference between the coefficients of an orthogonal and an analytic expansion of a func-
tion, where the coefficients of the orthogonal expansion depend on the interval of expan-
sion, whereas the coefficients of the analytic expansion depend only on the derivatives of
the function at the reference point (see Appendix I). It is therefore imperative that Wiener
kernel estimates be reported in the literature with reference to the GWN input power level
that was used to estimate them. On the other hand, the Volterra kernels are fixed for a giv-
en system and their estimates are input-invariant for a complete model.
When a complete set of Wiener kernels is obtained for a given system, then the complete
set of Volterra kernels of the system can be evaluated using the following relationship:

k_n(\tau_1, \ldots, \tau_n) = \sum_{m=0}^{\infty} \frac{(-1)^m (n+2m)! \, P^m}{n! \, m! \, 2^m} \int_0^{\infty} \cdots \int_0^{\infty} h_{n+2m}(\tau_1, \ldots, \tau_n, \lambda_1, \lambda_1, \ldots, \lambda_m, \lambda_m) \, d\lambda_1 \cdots d\lambda_m   (2.73)

which bears an astonishing resemblance to the reverse relationship (expressing the Wiener kernels in terms of the Volterra kernels) given by Equation (2.57); the only difference is the (-1)^m factor in the series.
When a complete set of Wiener kernels cannot be obtained, then approximations of Volterra kernels can be obtained from Wiener kernels of the same order measured with various input power levels, utilizing the polynomial dependence of the Wiener kernels on the GWN input power level P, as described by Equation (2.57). For instance, the first-order Wiener kernel as a function of different values of P is given by Equation (2.57) as

h_1(\tau; P) = \sum_{m=0}^{\infty} \frac{(2m+1)! \, P^m}{m! \, 2^m} \int_0^{\infty} \cdots \int_0^{\infty} k_{2m+1}(\tau, \lambda_1, \lambda_1, \ldots, \lambda_m, \lambda_m) \, d\lambda_1 \cdots d\lambda_m    (2.74)

which can be used to eliminate the contribution of k_3 from two measurements of h_1 for two different P_1 and P_2 values (following a form of Gaussian elimination):

h_1(\tau; P_1) - \frac{P_1}{P_2} h_1(\tau; P_2) = \frac{P_2 - P_1}{P_2} k_1(\tau) + \sum_{m=2}^{\infty} \frac{(2m+1)! \, P_2^m}{m! \, 2^m} \left( \frac{P_1^m}{P_2^m} - \frac{P_1}{P_2} \right) \int_0^{\infty} \cdots \int_0^{\infty} k_{2m+1}(\tau, \lambda_1, \lambda_1, \ldots, \lambda_m, \lambda_m) \, d\lambda_1 \cdots d\lambda_m    (2.75)

This procedure can be continued with a third measurement for P = P_3 in order to eliminate the contribution of k_5 by computing the expression:

\left[ h_1(\tau; P_1) - \frac{P_1}{P_2} h_1(\tau; P_2) \right] - \frac{P_1 - P_2}{P_1 - P_3} \left[ h_1(\tau; P_1) - \frac{P_1}{P_3} h_1(\tau; P_3) \right] = \frac{(P_2 - P_1)(P_3 - P_2)}{P_2 P_3} k_1(\tau) + \{\text{terms involving } k_7 \text{ and higher-order Volterra kernels}\}    (2.76)

Therefore, this procedure can be continued until the contribution of all significant Volterra kernels is eliminated, yielding a good estimate of k_1(τ). This procedure can be used for any order of estimated Wiener kernel and can be formulated mathematically as an inversion of a Vandermonde matrix defined by the various values of the GWN input power level used [Marmarelis & Sams, 1982].
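As a numerical sketch of this elimination idea (all kernel values below are hypothetical, chosen only for the demonstration): at a fixed lag τ, Equation (2.74) makes h_1(τ; P) a polynomial in P whose constant term is k_1(τ), so measurements at several power levels can be combined through a Vandermonde solve.

```python
import numpy as np

# Hypothetical values at one fixed lag tau (illustrative only):
# truncating Eq. (2.74) at m = 2 gives h1(tau; P) = k1 + 3*P*I3 + 15*P**2*I5,
# where I3 and I5 denote the diagonal integrals of k3 and k5 at this lag.
k1, I3, I5 = 0.8, -0.25, 0.05

def h1_measured(P):
    """First-order Wiener kernel value at lag tau for GWN power level P."""
    return k1 + 3.0 * P * I3 + 15.0 * P**2 * I5

P_levels = np.array([0.5, 1.0, 2.0])             # three GWN input power levels
b = np.array([h1_measured(P) for P in P_levels])
V = np.vander(P_levels, 3, increasing=True)      # rows [1, P_i, P_i^2]
c = np.linalg.solve(V, b)                        # c = [k1, 3*I3, 15*I5]
print(c[0])   # recovers the Volterra kernel value k1(tau)
```

With noisy kernel estimates, one would use more power levels than unknowns and solve the Vandermonde system in the least-squares sense.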
Complete sets of either the Wiener or the Volterra kernels can be used to predict the
system output to any given input (for which the series converge). However, if the ob-
tained Wiener or Volterra model is incomplete (truncated), then the accuracy of the pre-
dicted system output will be, in general, different for the two models and for each differ-
ent input signal.
For instance, using the cascade example above, the complete third-order Volterra or Wiener models will predict precisely the output for any given input. However, if an incomplete model (e.g., truncated at the second order) is used, then the difference in output prediction between the second-order Wiener (ŷ_W) and Volterra (ŷ_V) models is

\hat{y}_V(t) - \hat{y}_W(t) = P \int_0^{\infty} k_2(\lambda, \lambda) \, d\lambda + 3P \int_0^{\infty} \int_0^{\infty} k_3(\tau, \lambda, \lambda) x(t - \tau) \, d\lambda \, d\tau = P \int_0^{\infty} g^2(\lambda) \, d\lambda \left[ a_2 + 3 a_3 \int_0^{\infty} g(\tau) x(t - \tau) \, d\tau \right]    (2.77)

i.e., the difference depends on the second-order and third-order Volterra kernels (because of the lower-order projections of higher-order terms in the Wiener functionals), which reduces to a simpler relation in this specific example, given by Equation (2.77) as proportional to the first-order Volterra functional. This model prediction difference depends generally on P and the specific input x(t), as expected. The truncated Wiener model will have the minimum prediction mean-square error for a GWN input with power level P, due to its orthogonality. However, for arbitrary input signals, the relative prediction accuracy of the two truncated models (of the same order) will vary.
The proper way of comparing the Volterra/Wiener model predictions is to evaluate the mean-square errors in the two cases for a certain ensemble of inputs. This task tends to be rather cumbersome analytically when we go beyond the second-order functionals. Therefore, we will use here a second-order example for simplicity of derivations, with the understanding that the essential conclusions hold for higher-order cases as well.
We will compare the mean-square errors (MSEs) of the two types of first-order model
predictions for arbitrary random inputs. For the first-order Volterra model, the MSE is

Q_V \triangleq E[\tilde{y}_V^2(t)] = \int_0^{\infty} \int_0^{\infty} \int_0^{\infty} \int_0^{\infty} k_2(\tau_1, \tau_2) \, k_2(\tau_1', \tau_2') \, E[x(t - \tau_1) x(t - \tau_2) x(t - \tau_1') x(t - \tau_2')] \, d\tau_1 \, d\tau_2 \, d\tau_1' \, d\tau_2'    (2.78)

and depends on the fourth-order autocorrelation function of the input. The MSE of the
first-order Wiener model prediction is

Q_W \triangleq E[\tilde{y}_W^2(t)] = \int_0^{\infty} \int_0^{\infty} \int_0^{\infty} \int_0^{\infty} h_2(\tau_1, \tau_2) \, h_2(\tau_1', \tau_2') \, E[x(t - \tau_1) x(t - \tau_2) x(t - \tau_1') x(t - \tau_2')] \, d\tau_1 \, d\tau_2 \, d\tau_1' \, d\tau_2'
- 2P \int_0^{\infty} h_2(\lambda, \lambda) \, d\lambda \cdot \int_0^{\infty} \int_0^{\infty} h_2(\tau_1, \tau_2) \, E[x(t - \tau_1) x(t - \tau_2)] \, d\tau_1 \, d\tau_2 + \left\{ P \int_0^{\infty} h_2(\lambda, \lambda) \, d\lambda \right\}^2    (2.79)

and depends on the fourth-order and second-order autocorrelation functions of the input.
It is evident from these expressions that a comparison between the two MSEs would not be a simple matter for an arbitrary ensemble of inputs, if it were not for the fact that k_2(τ_1, τ_2) = h_2(τ_1, τ_2) for a second-order system. Therefore, for this example of a second-order system, we have

Q_V - Q_W = 2P \int_0^{\infty} h_2(\lambda, \lambda) \, d\lambda \cdot \int_0^{\infty} \int_0^{\infty} h_2(\tau_1, \tau_2) \, \phi(\tau_1 - \tau_2) \, d\tau_1 \, d\tau_2 - \left\{ P \int_0^{\infty} h_2(\lambda, \lambda) \, d\lambda \right\}^2    (2.80)

where φ denotes the second-order autocorrelation function of the input signal.
For an arbitrary input ensemble, Equation (2.80) indicates that a reduction in prediction MSE will occur for the Wiener model (Q_V > Q_W) if

\int_0^{\infty} \int_0^{\infty} h_2(\tau_1, \tau_2) \, \phi(\tau_1 - \tau_2) \, d\tau_1 \, d\tau_2 > \frac{P}{2} \int_0^{\infty} h_2(\lambda, \lambda) \, d\lambda    (2.81)

Therefore, the improvement of the Wiener model prediction depends on the autocorrelation properties of the input ensemble and its relation to the second-order kernel (i.e., the nonlinear characteristics of the system). It is conceivable that for some inputs the Wiener model prediction may be worse than the Volterra model prediction of the same order. However, as the input ensemble tends to GWN [i.e., φ(τ_1 - τ_2) = Pδ(τ_1 - τ_2)], the improvement of the Wiener model prediction becomes guaranteed, since the left-hand side of (2.81) becomes twice the right-hand side. It must be emphasized that these statements apply only to truncated models; the complete models (Volterra or Wiener) give the same output prediction.
For an alternate GWN test input of power level P', the MSE difference between the two model predictions is given by

\Delta Q(P') = (2P' - P) \, P \left\{ \int_0^{\infty} h_2(\lambda, \lambda) \, d\lambda \right\}^2    (2.82)

which indicates that the reduction in the MSE of the Wiener model prediction increases as the alternate GWN input power level P' increases (relative to the power level P of the GWN input for which the Wiener series was orthogonalized), but it may become negative if P' < P/2. For P = P', a reduction in prediction MSE occurs (ΔQ > 0), as expected. Note that the rate of this reduction is proportional to the square of the integral of the second-order kernel diagonal (i.e., it depends on the system nonlinear characteristics). This illustrative result is strictly limited to second-order systems and must not be generalized to higher-order systems and models.
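The sign behavior of Equation (2.82) can be checked numerically with a discrete-time second-order system (the kernel values below are arbitrary illustrations, not from the text); in discrete time, the power level corresponds to the variance of the white input and the kernel-diagonal integral to the trace of the kernel matrix.

```python
import numpy as np

rng = np.random.default_rng(0)

# Arbitrary symmetric second-order kernel (memory of 3 lags) -- illustrative only
k2 = np.array([[0.5, 0.1, 0.0],
               [0.1, 0.3, 0.1],
               [0.0, 0.1, 0.2]])
P = 1.0      # power level for which the Wiener series is orthogonalized
Pp = 2.0     # power level P' of the alternate GWN test input

N = 100_000
x = rng.normal(0.0, np.sqrt(Pp), N)
L = k2.shape[0]
# Xlag[i, t] = x[t - i], zero-padded at the record start
Xlag = np.stack([np.concatenate([np.zeros(i), x[:N - i]]) for i in range(L)])
y = np.einsum('ij,it,jt->t', k2, Xlag, Xlag)   # pure second-order system

h0 = P * np.trace(k2)        # zeroth-order Wiener kernel for power level P
Qv = np.mean(y**2)           # first-order Volterra prediction is identically zero
Qw = np.mean((y - h0)**2)    # first-order Wiener prediction is the constant h0
dQ_mc = Qv - Qw
dQ_theory = (2.0 * Pp - P) * P * np.trace(k2)**2   # Eq. (2.82)
```

Choosing Pp below P/2 should flip the sign of both quantities, in line with the discussion above.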

2.2.2 Wiener Approach to Kernel Estimation


The estimation of the unknown system kernels is the key task in nonparametric system modeling and identification. As discussed earlier, the primary motivation for the introduction of the Wiener series has been the facilitation of the kernel estimation task.
In the general case of an infinite Volterra series, the accurate (unbiased) estimation of the Volterra kernels from input-output data using the methods of Section 2.1.5 is not possible, in principle, due to the unavoidable truncation of the Volterra series that results in input-correlated residuals. The severity of this kernel estimation bias depends on the size of the input-correlated residuals relative to the prediction of the truncated Volterra model. If more high-order terms are included in the truncated model in order to reduce the size of the residuals, then the estimation task becomes more cumbersome and often impractical. Although a practicable solution for this important problem has been recently proposed in the form of trainable network models of high-order systems (see Section 2.3.3), this serious limitation gave initial impetus to the Wiener approach and its variants that were developed to overcome specific practical shortcomings of the initial Wiener methodology.
The Wiener approach addresses the problem of biased kernel estimation in the general
context of an infinite series expansion by decoupling the various Wiener kernels through
orthogonalization of the Wiener functionals. Nonetheless, the resulting Wiener kernels are distinct from the Volterra kernels of the system and can be viewed as having a "structured bias" related to the employed GWN input, as discussed previously.
In this section, we present Wiener's original approach to kernel estimation in order to provide historical perspective and allow the reader to appreciate the many subtleties of this modeling problem, although Wiener's approach is not deemed the best choice at present, in light of recent developments that have made available powerful methodologies for the estimation of Volterra (not Wiener) models with superior performance in a practical context.
Following presentation of the rudiments of the Wiener approach, we will elaborate on its most popular implementation (the cross-correlation technique) in Section 2.2.3 and discuss its practically useful variants using quasiwhite test inputs in Section 2.2.4. We note that the recommended and most promising estimation methods at present (using kernel expansions, iterative estimation techniques, and equivalent network models) are presented in Sections 2.3 and 4.3, and they yield Volterra (not Wiener) models.
The orthogonality of the Wiener series allows decoupling of the Wiener functionals through covariance computations and estimation of the Wiener kernels from the GWN input and the corresponding output data. Since the orthogonality of the Wiener functionals is independent of the specific kernel functions involved, a known "instrumental" Wiener functional can be used to isolate each term in the Wiener series (by computing its covariance with the system output) and subsequently obtain the corresponding kernel. For instance, if an mth-order instrumental functional Q_m[q_m; x(t'), t' ≤ t], constructed with a known kernel q_m(τ_1, ..., τ_m), is used to compute the covariance with the output signal y(t), then

E[y(t) Q_m(t)] = \sum_{n=0}^{\infty} E[G_n(t) Q_m(t)] = E[G_m(t) Q_m(t)] = m! \, P^m \int_0^{\infty} \cdots \int_0^{\infty} h_m(\tau_1, \ldots, \tau_m) \, q_m(\tau_1, \ldots, \tau_m) \, d\tau_1 \cdots d\tau_m    (2.83)

since Q_m is orthogonal (i.e., it has zero covariance) with all G_n functionals for m ≠ n. Note that the instrumental functional Q_m(t) has the form of the mth-order Wiener functional given by Equation (2.56) with the kernel q_m replacing the kernel h_m; hence, it can be computed for a given input signal x(t).
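A discrete-time sketch of Equation (2.83) for m = 1 (the system kernels and the instrumental kernel q_1 below are our own arbitrary choices): the covariance of the output with a first-order instrumental functional isolates the projection of h_1 onto q_1, while the higher-order functionals average out.

```python
import numpy as np

rng = np.random.default_rng(5)

# Discrete toy system specified directly by its Wiener kernels (arbitrary choices)
M, P, N = 15, 1.0, 200_000
h1 = np.exp(-0.2 * np.arange(M))
h2 = 0.3 * np.outer(h1, h1)
x = rng.normal(0.0, np.sqrt(P), N)
Xlag = np.stack([np.concatenate([np.zeros(s), x[:N - s]]) for s in range(M)])
G1 = h1 @ Xlag
G2 = np.einsum('ij,it,jt->t', h2, Xlag, Xlag) - P * np.trace(h2)
y = 1.0 + G1 + G2            # zeroth-order kernel h0 = 1

# First-order instrumental functional Q1(t) with a known (arbitrary) kernel q1
q1 = np.cos(0.4 * np.arange(M)) * np.exp(-0.1 * np.arange(M))
Q1 = q1 @ Xlag

lhs = np.mean(y * Q1)        # E[y(t) Q1(t)] by time averaging
rhs = P * np.sum(h1 * q1)    # Eq. (2.83) for m = 1
```

The same covariance computed against G_2 alone comes out near zero, which is the orthogonality that makes the isolation work.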
The "instrumental" kernel q_m(τ_1, ..., τ_m) is judiciously chosen in order to facilitate the evaluation of the unknown kernel h_m(τ_1, ..., τ_m), after the left-hand side of Equation (2.83) is evaluated from input-output data. Wiener suggested the use of a multidimensional orthonormal basis for defining the instrumental kernels. So, if {b_j(τ)} is a complete orthonormal (CON) basis over the range of system memory, τ ∈ [0, μ], then instrumental kernels of the form

q_m(\tau_1, \ldots, \tau_m) = b_{j_1}(\tau_1) \cdots b_{j_m}(\tau_m)    (2.84)

can be used to obtain the expansion coefficients {a_{j_1, ..., j_m}} of the unknown kernel over the specified CON basis as

a_{j_1, \ldots, j_m} = \frac{1}{m! \, P^m} E[y(t) Q_m(t)]    (2.85)

where

h_m(\tau_1, \ldots, \tau_m) = \sum_{j_1} \cdots \sum_{j_m} a_{j_1, \ldots, j_m} b_{j_1}(\tau_1) \cdots b_{j_m}(\tau_m)    (2.86)

Note that in this case

Q_m(t) = \sum_{l=0}^{[m/2]} \frac{(-1)^l \, P^l \, m!}{(m - 2l)! \, l! \, 2^l} \, v_{j_1}(t) \cdots v_{j_{m-2l}}(t) \, \delta_{j_{m-2l+1}, j_{m-2l+2}} \cdots \delta_{j_{m-1}, j_m}    (2.87)

[Figure 2.9: GWN input x(t) → Laguerre filter bank (CON basis) → Hermite expansion signals H_0(t), H_1(t), ..., H_M(t) → expansion coefficients → output y(t).]

Figure 2.9 The block-structured Wiener model that is equivalent to the Wiener series for a GWN input when L and M tend to infinity. The filter bank {b_j} suggested by Wiener was the complete orthonormal (CON) Laguerre basis of functions. The Hermite basis generates the signals {H_j(t)}. Since the Laguerre and Hermite bases are known (selected), the problem reduces to determining the expansion coefficients {γ_{j_1}, ..., γ_{j_R}} from input-output data when the input is GWN.

where

v_j(t) = \int_0^{\mu} b_j(\tau) x(t - \tau) \, d\tau    (2.88)

and μ and δ_{i,j} denote the system memory and the Kronecker delta,* respectively. Since the input x(t) is GWN and the basis functions {b_j} are orthonormal, the signals v_j(t) are independent Gaussian (nonwhite) random processes with zero mean and variance P [Marmarelis, 1979b]. Therefore, the instrumental functionals {Q_m} can be seen as orthogonal multinomials in the variables (v_1, ..., v_m), with a structure akin to multivariate Hermite polynomials.
Motivated by this observation, Wiener proposed a general model (for the Wiener class of systems) comprised of a known set of parallel linear filters with impulse response functions {b_j(τ)} (i.e., a filter bank comprising an orthonormal complete basis of functions, such as the Laguerre set) receiving the GWN input signal and feeding the filter-bank outputs into a multiinput static nonlinearity, y = f(v_1, v_2, ...), which he decomposed into the cascade of a known "Hermite-polynomial" component and an unknown "coefficients" component to be determined from the data, as shown in Figure 2.9. The latter decomposition was viewed as an efficient way of implementing the Hermite-like structure of the functionals {Q_m}, although one may question this assertion of efficiency in light of the complexity introduced by the Hermite expansion. The system identification/modeling task then reduces to evaluating the "coefficients" component from input-output data for a GWN input, since all other components are known and fixed.
In the original Wiener formulation, the output values of the filter bank {v_j} are viewed as variables completely describing the input signal from -∞ up to the present time [x(τ); τ ≤ t]. Wiener chose the Laguerre set of functions to expand the past (and present) of the input signal because these functions form a complete orthonormal (CON) basis in the semi-infinite interval [0, ∞) and have certain desirable mathematical properties (which will be described in Section 2.3.2). In addition, the outputs of the Laguerre "filter bank" can be easily generated in analog form by linear ladder R-C circuits (an important issue at that time).

*The Kronecker delta δ_{i,j} is the discrete-time equivalent of the Dirac delta function (impulse), defined as 1 for i = j and as zero elsewhere.
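In discrete time, the Laguerre filter-bank outputs can be generated recursively rather than by explicit convolution; the sketch below uses the standard discrete-Laguerre recursion (the decay parameter α = 0.2 and the function name are our own choices) and checks that the resulting impulse responses form an orthonormal basis.

```python
import numpy as np

def laguerre_filterbank(x, n_filters, alpha):
    """Outputs v_j(n) of the first n_filters discrete-time Laguerre filters
    (decay parameter 0 < alpha < 1), via the standard recursion:
       v_0(n) = sqrt(alpha) v_0(n-1) + sqrt(1-alpha) x(n)
       v_j(n) = sqrt(alpha) v_j(n-1) + sqrt(alpha) v_{j-1}(n) - v_{j-1}(n-1)"""
    a = np.sqrt(alpha)
    V = np.zeros((n_filters, len(x)))
    prev = np.zeros(n_filters)            # v_j(n-1) for all j
    for n in range(len(x)):
        cur = np.zeros(n_filters)
        cur[0] = a * prev[0] + np.sqrt(1.0 - alpha) * x[n]
        for j in range(1, n_filters):
            cur[j] = a * cur[j - 1] + a * prev[j] - prev[j - 1]
        V[:, n] = cur
        prev = cur
    return V

# Feeding a unit impulse exposes the discrete Laguerre functions b_j(n);
# the Gram matrix of the truncated basis should be close to the identity.
impulse = np.zeros(400); impulse[0] = 1.0
B = laguerre_filterbank(impulse, 5, alpha=0.2)
G = B @ B.T
```

The same routine applied to an input record yields the filter-bank signals v_j(t) directly, with a cost per sample that is linear in the number of filters.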
When we employ L filters in the filter bank, we use L variables {v_j(t)} to describe the past (and present) of the input signal at each time t. Thus, the system output can be considered a function of L variables and it can be expanded in terms of the CON basis of Hermite polynomials {H_j} as

y(t) = \lim_{R \to \infty} \sum_{j_1=0}^{M} \cdots \sum_{j_R=0}^{M} \gamma_{j_1 \cdots j_R} H_{j_1}(v_1) \cdots H_{j_R}(v_R) + h_0    (2.89)

Clearly, in practice, both M and R must be finite and determine the maximum order of nonlinearity approximated by the finite-order model, where M is the maximum order of Hermite polynomials used in the expansion. Note that the nonlinear order r is defined by the sum of the indices (j_1 + ... + j_R). The Hermite polynomial of jth order is given by

H_j(v) = e^{2\beta v^2} \frac{d^j}{dv^j} \left[ e^{-2\beta v^2} \right]    (2.90)

where the parameter β determines the Gaussian weighting function exp[-2βv^2] that defines the orthogonality of the Hermite polynomials as:

\int_{-\infty}^{\infty} e^{-2\beta v^2} H_{j_1}(v) H_{j_2}(v) \, dv = \delta_{j_1, j_2}    (2.91)

The proper selection of the parameter β is important in practice, because it determines the convergence of the Hermite expansion of the static nonlinearity f(v_1, ..., v_L), in conjunction with the GWN input power level P, since the variance of each v_j(t) process is P.
For any GWN input, the terms of the multidimensional Hermite expansion in Equation (2.89) are statistically orthogonal (i.e., have zero covariance) and are normalized to unit Euclidean norm, because the joint PDF of the {v_j} processes has the same form as the Hermite weighting function (i.e., multivariate Gaussian). Therefore, the expansion coefficients in Equation (2.89) can be evaluated through the ensemble average:

\gamma_{i_1 \cdots i_R} = E\{[y(t) - h_0] \, H_{i_1}[v_1(t)] \cdots H_{i_R}[v_R(t)]\}    (2.92)

where all terms of the expansion of Equation (2.89) with indices (j_1, ..., j_R) distinct from the indices (i_1, ..., i_R) vanish. The indices i_1 through i_R take values from 0 to M and add up to the order of the estimated Wiener functional component. Note that the multiindex subscript (i_1 ... i_R) is ordered (i.e., it is nonpermutable) and all possible combinations of L functions {v_j} taken by R must be considered in the expansion of Equation (2.89). The mean h_0 of the output y(t) must be subtracted in Equation (2.92) because it is separated out in Equation (2.89).
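As a minimal single-filter (L = 1) sketch of this time-averaging idea (the filter, the toy nonlinearity y = v^2, and the normalization are all our own choices): with a unit-variance filter output, the probabilists' Hermite polynomials normalized to unit norm under the Gaussian density are statistically orthonormal, so each coefficient is obtained by one time average in the spirit of Equation (2.92).

```python
import numpy as np

rng = np.random.default_rng(3)

# One filter with unit-norm impulse response, so v(t) has unit variance
# for unit-power GWN input (toy setup)
g = np.exp(-0.3 * np.arange(25)); g /= np.linalg.norm(g)
N = 200_000
x = rng.normal(0.0, 1.0, N)
v = np.convolve(x, g)[:N]
y = v**2                     # toy static nonlinearity of the filter output

# Probabilists' Hermite polynomials normalized so E[he_i(v) he_j(v)] = delta_ij
# for v ~ N(0,1):  he_0 = 1, he_1 = v, he_2 = (v^2 - 1)/sqrt(2)
h0_hat = y.mean()                                             # separated-out mean
gamma2 = np.mean((y - h0_hat) * (v**2 - 1.0) / np.sqrt(2.0))  # time-average coefficient
# Since y - E[y] = v^2 - 1 = sqrt(2) he_2(v), gamma2 should come out near sqrt(2).
```

The coefficient estimate converges as the record length grows, illustrating the long-data-record requirement noted below.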
According to the Wiener approach, the coefficients γ_{i_1 ... i_R} characterize the system completely and the identification problem reduces to the problem of determining these coefficients through the averaging operation indicated in Equation (2.92). The ensemble average of Equation (2.92) can be implemented by time averaging in practice, due to the ergodicity and stationarity of the input-output processes. Once these coefficients have been determined, they can be used to synthesize (predict) the output of the nonlinear model for any given input, according to Equation (2.89). Of course, the output prediction for any given input will be accurate only if the model is (nearly) complete, as discussed earlier [Bose, 1956; Wiener, 1958].
This approach, which was deciphered for Wiener's MIT colleagues in the 1950s by Amar Bose (see Historical Note #2), is difficult to apply to physiological systems in a practical context for the following reasons:

1. The form of the output expression is alienating to many physiologists, because it is difficult to assign some physiological meaning to the characterizing coefficients that would reveal some functional features of the system under study.
2. The experimentation and computing time required for the evaluation of the characterizing coefficients is long, because long data records are required in general for reducing the variance of the estimates down to acceptable levels.

For these reasons, this original Wiener approach has been viewed by most investigators as impractical, and has not found any applications to physiology in the originally proposed form. However, variants of this approach have found many applications, primarily by means of the cross-correlation technique discussed in Section 2.2.3 and multinomial expansions of the multiinput static nonlinearity in terms of the orthogonal polynomials {Q_m} given by Equation (2.87), to yield estimates of the expansion coefficients {a_{j_1, j_2, ...}} of the Wiener kernels, as described by Equation (2.85) [Marmarelis, 1987b].
To avoid the complexity introduced by the Hermite expansion, Bose proposed the use of an orthogonal class of functions that he called "gate" functions, which are simply square unit pulses that are used to partition the output function space into nonoverlapping cells (hence, the orthogonality of the gate functions) [Bose, 1956]. This formulation is conceptually simple and appears suitable for systems that have strong saturating elements. Nonetheless, it has found very limited applications, due to the still cumbersome model form and the demanding requirement for long input-output data records.
It must be emphasized that the main contribution of Wiener's formulation is in suggesting the decomposition of the general nonlinear model into a linear filter bank (using a complete set of filters that span the system functional space) and a multiinput static nonlinearity receiving the outputs of the filter bank and producing the system output (see Figure 2.9). This is a powerful statement in terms of nonlinear dynamic system modeling, because it separates the dynamics (the filter-bank stage) from the nonlinearities and reduces the latter to a static form that is much easier to represent/estimate, for any given application.
In the many possible variants of the Wiener approach, different orthogonal or nonorthogonal bases of functions can be used both for the linear filter bank and the static nonlinearity. We have found that the Laguerre basis (in discrete time) is a good choice for the filter bank in general, as discussed in Section 2.3.2. We have also found that polynomial nonlinearities are good choices in the general case, although combinations with specialized forms (e.g., sigmoidal) may also be suitable in certain cases. These important issues are discussed further in Section 2.3 and constitute the present state of the art, in connection with iterative (gradient-based) estimation methods and equivalent network structures (see Sections 4.2-4.4).

It should be noted that a general, yet rudimentary, approach in discretized input-output space (having common characteristics with the Bose approach) may utilize a grid of the discrete input values that cover the memory of the system and the dynamic range of amplitudes of the input. At any discrete time t, the present and past values of the input are described by a vector of real numbers (x_0, x_1, ..., x_M) that can be put in correspondence with the value, y_0, of the system output at this time, thus forming the "mapping" input-output vector (x_0, x_1, ..., x_M, y_0). As the system is being tested with an ergodic input (e.g., white noise), input-output vectors are formed until the system is densely tested by combinations of values of the input vectors. All these input-output vectors define an input-output mapping that represents a "digital model" of the system in a most rudimentary form.
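A toy sketch of such a rudimentary "digital model" (the system, the bin edges, and the one-lag memory are all invented for illustration): quantize the input vector (x_t, x_{t-1}) on a grid, store the average observed output per cell, and use the resulting table as the model.

```python
import numpy as np
from collections import defaultdict

rng = np.random.default_rng(4)
N = 40_000
x = rng.normal(0.0, 1.0, N)
y = x**2 + 0.5 * np.roll(x, 1)   # toy system with one lag of memory
# (y[0] wraps around via roll, so t = 0 is simply skipped below)

edges = np.linspace(-2.5, 2.5, 11)   # amplitude grid, bin width 0.5

def cell(t):
    """Quantized input vector (x_t, x_{t-1})."""
    return (int(np.digitize(x[t], edges)), int(np.digitize(x[t - 1], edges)))

# Build the input-output mapping: average output per quantized input vector
sums = defaultdict(float); counts = defaultdict(int)
for t in range(1, N):
    c = cell(t)
    sums[c] += y[t]; counts[c] += 1
table = {c: sums[c] / counts[c] for c in sums}

# Use the table as the "digital model" to predict the output
y_pred = np.array([table[cell(t)] for t in range(1, N)])
nmse_val = np.mean((y[1:] - y_pred)**2) / np.var(y[1:])
```

On data the table has seen, the residual error reflects only the quantization of the grid; for unseen inputs, empty cells would need interpolation or a denser experiment, which is precisely the data-hunger noted in the text.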
In the next two sections, we complete the traditional approaches to Wiener kernel estimation that have found many applications to date but have seen their utility eclipsed by more recent methodologies presented in Sections 2.3 and 4.3.

2.2.3 The Cross-Correlation Technique for Wiener Kernel Estimation


Lee and Schetzen (1965) proposed a different implementation of Wiener's original idea for kernel estimation that has been widely used because of its relative simplicity. The Lee and Schetzen method, termed the "cross-correlation technique," is based on the observation that the product of m time-shifted versions of the GWN input can be written in the form of the leading term of the mth-order Wiener functional using delta functions:

x(t - \tau_1) \cdots x(t - \tau_m) = \int_0^{\infty} \cdots \int_0^{\infty} \delta(\tau_1 - \lambda_1) \cdots \delta(\tau_m - \lambda_m) \, x(t - \lambda_1) \cdots x(t - \lambda_m) \, d\lambda_1 \cdots d\lambda_m    (2.93)

The expression of Equation (2.93) has the form of a homogeneous (Volterra) functional of mth order and, therefore, it is orthogonal to all Wiener functionals of higher order. Based on this observation, they were able to show that, using this product as the leading term of an "instrumental functional" in connection with Equation (2.83), Wiener kernel estimation is possible through input-output cross-correlations of the respective order. The resulting expression for the estimation of the mth-order Wiener kernel is

h_m(\tau_1, \ldots, \tau_m) = \frac{1}{m! \, P^m} E[y_m(t) \, x(t - \tau_1) \cdots x(t - \tau_m)]    (2.94)

where y_m(t) is the mth-order output residual defined by the expression

y_m(t) = y(t) - \sum_{n=0}^{m-1} G_n(t)    (2.95)

The use of the output residual in the cross-correlation formula of Equation (2.94) is necessitated by the fact that the expression of Equation (2.93), having the form of a homogeneous (Volterra) functional of mth order, is orthogonal to all higher-order Wiener functionals but not to the lower-order ones, whose contributions must be subtracted. It is seen later that this subtraction is required in principle only for the lower-order terms of the same parity (odd/even). Failure to use the output residual in Equation (2.94) leads to severe misestimation of the Wiener kernels at the diagonal values, giving rise to impulse-like errors along the kernel diagonals. In practice, this output residual is computed on the basis of the previously estimated lower-order Wiener kernels and functionals. Thus, the use of the output residual in Equation (2.94) implies the application of the cross-correlation technique in ascending order of Wiener kernel estimation.
The statistical ensemble average denoted by the "expected value" operator E[·] in Equation (2.94) can be replaced in practice by time averaging over the length of the data record, assuming stationarity of the system. Since the data record is finite, these time averages are formed with certain statistical variance (i.e., they are not precise). This variance depends on the record length and the GWN input power level, in addition to the ambient noise and the system characteristics, as detailed in Section 2.4.2. At the risk of becoming somewhat pedantic, we detail below the successive steps in the actual implementation of the cross-correlation technique for Wiener kernel estimation [Marmarelis & Marmarelis, 1978; Schetzen, 1980].

Estimation of h_0. The expected value of each Wiener functional G_n[h_n; x(t)] (for n ≥ 1) is zero if x(t) is GWN, since the Wiener functionals for n ≥ 1 are constructed orthogonal to the constant zeroth-order Wiener functional h_0. Therefore, taking the expected value of the output signal y(t) yields

E[y(t)] = h_0    (2.96)

which indicates that h_0 is the ensemble average (mean) or the time average of the output signal for a GWN input.

Estimation of h_1(τ). The shifted input x(t - σ) is a first-order homogeneous (Volterra) functional according to Equation (2.93), where it is written as a convolution of the GWN input with a delta function. Since x(t - σ) can also be viewed as a Wiener functional of first order (no other terms are required in the first-order case), its covariance with any other order of Wiener functional will be zero. Thus,

E[y(t) x(t - \sigma)] = E\left[ x(t - \sigma) \int_0^{\infty} h_1(\tau) x(t - \tau) \, d\tau \right] = \int_0^{\infty} h_1(\tau) E[x(t - \sigma) x(t - \tau)] \, d\tau = \int_0^{\infty} h_1(\tau) \, P \delta(\tau - \sigma) \, d\tau    (2.97)

since the second-order autocorrelation function of GWN is a delta function with strength P. Therefore, the first-order Wiener kernel is given by the cross-correlation between the GWN input and the respective output, normalized by the input power level P:

h_1(\sigma) = (1/P) \, E[y(t) x(t - \sigma)]    (2.98)

Note that the cross-correlation formula (2.98) for the estimation of h_1 does not require in principle the use of the first-order output residual prescribed by Equation (2.94). The reason for this is the parity (odd/even) separation of the homogeneous functionals (and, by extension, of the Wiener functionals), due to the fact that the odd-order autocorrelation functions of GWN are uniformly zero (see Appendix II). Nonetheless, since the orthogonality between Wiener functionals is only approximate in practice due to the finite data records, it is advisable to always use the output residual as prescribed by Equation (2.94). This implies subtraction of the previously estimated h_0 from y(t) prior to cross-correlation with x(t - σ) for the practical estimation of h_1(σ).
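A discrete-time sketch of steps (2.96)-(2.98) on a simulated LN cascade (the filter g and the nonlinearity y = v + 0.5v^2 are our own toy choices; for this cascade the first-order Wiener kernel equals g and h_0 = 0.5 P Σ g^2):

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy LN cascade: v(t) = sum_i g(i) x(t - i),  y = v + 0.5 v^2
M, P, N = 20, 1.0, 200_000
g = np.exp(-0.3 * np.arange(M))
x = rng.normal(0.0, np.sqrt(P), N)
v = np.convolve(x, g)[:N]
y = v + 0.5 * v**2

# Eq. (2.96): h0 is the output mean; here it should be 0.5 * P * sum(g^2)
h0_hat = y.mean()

# Eq. (2.98), applied to the de-meaned output as the text advises:
# h1(s) = (1/P) E[(y(t) - h0) x(t - s)]
y1 = y - h0_hat
h1_hat = np.array([np.mean(y1[s:] * x[:N - s]) for s in range(M)]) / P
# For this cascade, h1_hat should reproduce g itself (up to estimation noise).
```

The residual scatter of h1_hat around g shrinks as the record length grows, which is the finite-record variance discussed above.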

Estimation of h_2(τ_1, τ_2). Since x(t - σ_1) x(t - σ_2) is a second-order homogeneous (Volterra) functional of x(t) [see Equation (2.93)], it is orthogonal to all Wiener functionals of higher order, i.e., E[G_i(t) x(t - σ_1) x(t - σ_2)] = 0 for i > 2. Thus, the cross-correlation between [x(t - σ_1) x(t - σ_2)] and the output y(t) eliminates the contributions of all Wiener functionals except G_0, G_1, and G_2. Furthermore,

E[G_0 \, x(t - \sigma_1) x(t - \sigma_2)] = h_0 P \delta(\sigma_1 - \sigma_2)    (2.99)

indicating that the estimate of h_0 ought to be subtracted from y(t) prior to cross-correlation, and

E[G_1(t) x(t - \sigma_1) x(t - \sigma_2)] = \int_0^{\infty} h_1(\tau) E[x(t - \sigma_1) x(t - \sigma_2) x(t - \tau)] \, d\tau = 0    (2.100)

since all the odd-order autocorrelation functions of GWN are zero (see Appendix II). For the second-order Wiener functional, we have

E[G_2(t) x(t - \sigma_1) x(t - \sigma_2)] = P^2 \int_0^{\infty} \int_0^{\infty} h_2(\tau_1, \tau_2) [\delta(\tau_1 - \tau_2)\delta(\sigma_1 - \sigma_2) + \delta(\tau_1 - \sigma_1)\delta(\tau_2 - \sigma_2) + \delta(\tau_2 - \sigma_1)\delta(\tau_1 - \sigma_2)] \, d\tau_1 \, d\tau_2 - P^2 \delta(\sigma_1 - \sigma_2) \int_0^{\infty} h_2(\tau_1, \tau_1) \, d\tau_1 = 2P^2 h_2(\sigma_1, \sigma_2)    (2.101)

using the Gaussian decomposition property (see Appendix II) and the symmetry of the second-order kernel. Thus, the cross-correlation between the output y(t) and [x(t - σ_1) x(t - σ_2)] yields

E[y(t) x(t - \sigma_1) x(t - \sigma_2)] = P h_0 \delta(\sigma_1 - \sigma_2) + 2P^2 h_2(\sigma_1, \sigma_2)    (2.102)

which demonstrates the previously stated fact of impulse-like estimation errors along the kernel diagonals when the output residual is not used in the cross-correlation formula of Equation (2.94). This example also demonstrates the fact that there is an odd/even separation of the Wiener functionals (i.e., h_1 is not present in Equation (2.102) but h_0 is). In order to eliminate the contribution of h_0 to this second-order cross-correlation, it suffices in practice to subtract from the output signal y(t) the previously estimated value of h_0 (which is the average value of the output signal). However, it is generally advisable to subtract all lower-order functional contributions from the output signal, as prescribed by Equation (2.95), because the theoretical orthogonality between [x(t - σ_1) x(t - σ_2)] and G_1[h_1; x(t)] is only approximate for finite data records in practice. Therefore, in order to minimize the estimation errors due to the finite data records, we cross-correlate the second-order output residual (the "hats" denote estimates of the Wiener kernels):

y_2(t) = y(t) - \hat{h}_0 - \int_0^{\infty} \hat{h}_1(\tau) x(t - \tau) \, d\tau    (2.103)

with [x(t - σ_1) x(t - σ_2)] to obtain the second-order Wiener kernel estimate as

\hat{h}_2(\sigma_1, \sigma_2) = (1/2P^2) \, E[y_2(t) x(t - \sigma_1) x(t - \sigma_2)]    (2.104)

It should be noted that a possible error Δh_0 in the estimate of h_0 used for the computation of the output residual causes some error Δh_2 at the diagonal points of the second-order Wiener kernel estimate that takes the form of a delta function along the diagonal:

\Delta h_2(\sigma_1, \sigma_2) = (1/2P) \, \Delta h_0 \, \delta(\sigma_1 - \sigma_2)    (2.105)
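Continuing the same kind of discrete-time toy LN system (y = v + 0.5v^2, so that h_2(τ_1, τ_2) = 0.5 g(τ_1) g(τ_2)), the sketch below forms the residual of Equation (2.103), applies Equation (2.104), and also shows the diagonal bias of about h_0/(2P) predicted by Equation (2.102) when the raw output is cross-correlated instead:

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy LN cascade: h1 = g and h2(i, j) = 0.5 g(i) g(j)
M, P, N = 10, 1.0, 200_000
g = np.exp(-0.3 * np.arange(M))
x = rng.normal(0.0, np.sqrt(P), N)
v = np.convolve(x, g)[:N]
y = v + 0.5 * v**2

Xlag = np.stack([np.concatenate([np.zeros(s), x[:N - s]]) for s in range(M)])
h0_hat = y.mean()
h1_hat = Xlag @ (y - h0_hat) / (N * P)

# Eq. (2.103): second-order residual; Eq. (2.104): second-order cross-correlation
y2 = y - h0_hat - h1_hat @ Xlag
h2_hat = np.einsum('t,it,jt->ij', y2, Xlag, Xlag) / (2.0 * P**2 * N)

# Skipping the residual instead leaves an impulse-like bias on the diagonal,
# of height about h0/(2P) in this unit-sampling discretization [Eq. (2.102)]
h2_raw = np.einsum('t,it,jt->ij', y, Xlag, Xlag) / (2.0 * P**2 * N)
diag_bias = np.mean(np.diag(h2_raw) - np.diag(h2_hat))
```

The off-diagonal entries of the raw and residual-based estimates agree, which is why the bias shows up only as the delta-like ridge described in the text.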

Estimation of h_3(τ_1, τ_2, τ_3). Following the same reasoning as for h_2, we first compute the third-order output residual y_3(t):

y_3(t) = y(t) - G_2(t) - G_1(t) - G_0    (2.106)

using the previously estimated h_0, h_1, and h_2 (even though the subtraction of G_2 and G_0 is not theoretically required due to the odd/even separation), and then we estimate the third-order Wiener kernel by computing the third-order cross-correlation:

\hat{h}_3(\sigma_1, \sigma_2, \sigma_3) = (1/6P^3) \, E[y_3(t) x(t - \sigma_1) x(t - \sigma_2) x(t - \sigma_3)]    (2.107)

It must be noted that any imprecision Δh_1 in the estimation of h_1, which is used for the computation of the output residual, will cause some error Δh_3 along the diagonals of the h_3 estimate that will have the impulsive form

\Delta h_3(\sigma_1, \sigma_2, \sigma_3) = (1/6P)[\Delta h_1(\sigma_1)\delta(\sigma_2 - \sigma_3) + \Delta h_1(\sigma_2)\delta(\sigma_3 - \sigma_1) + \Delta h_1(\sigma_3)\delta(\sigma_1 - \sigma_2)]    (2.108)

because

E[G_1(t) x(t - \sigma_1) x(t - \sigma_2) x(t - \sigma_3)] = P^2 [h_1(\sigma_1)\delta(\sigma_2 - \sigma_3) + h_1(\sigma_2)\delta(\sigma_3 - \sigma_1) + h_1(\sigma_3)\delta(\sigma_1 - \sigma_2)]    (2.109)

The successive steps of Wiener kernel estimation using the cross-correlation technique
are illustrated in Figure 2.10. The aforementioned errors due to the finite data records [see
Equations (2.105) and (2.108)] occur along the diagonals, but additional estimation errors
occur throughout the kernel space for a variety of reasons detailed in Section 2.4.2 and
become more severe as the order of kernel estimation increases. A complete analysis of
estimation errors associated with the cross-correlation technique and their dependence on
the input and system characteristics is given in Section 2.4.2.
Figure 2.10 Illustration of the successive steps for the estimation of the zeroth-, first-, and second-order Wiener kernels via the cross-correlation technique [Marmarelis & Marmarelis, 1978]: averaging the response to the white-noise stimulus yields ĥ_0; cross-correlating the de-meaned response with the stimulus yields ĥ_1(τ); after subtracting the kernel-predicted lower-order components from the response, cross-correlation of the residual with x(t - τ_1) x(t - τ_2) yields ĥ_2(τ_1, τ_2).

The computation of the output residuals in ascending order also provides in practice a measure of model adequacy at each order of Wiener kernel estimation. This measure of model adequacy is typically the normalized mean-square error (NMSE) Q_r of the rth-order output prediction, defined as the ratio of the variance of the rth-order output residual y_r(t) to the output variance (note that the first-order output residual is simply the de-meaned output):

Q_r = E[y_r^2(t)] / E[y_1^2(t)]    (2.110)
2.2 WIENER MODELS 77

The NMSE measure Q_r lies between 0 and 1 (for r ≥ 1) and quantifies the portion of
the output signal power that is not predicted by the model. If this quantity Q_r drops below
a selected critical value (e.g., a value Q_r = 0.1 would correspond to 10% NMSE of the rth-
order model prediction, indicating that 90% of the output signal power is explained/pre-
dicted by the model), then the kernel estimation procedure can be terminated and a trun-
cated Wiener model of rth order is obtained. Obviously, the selection of this critical
NMSE value depends on the specific requirements of each application and on the prevail-
ing signal-to-noise ratio (which provides a lower bound for this value). In applications to
date, this value has ranged from 0.02 to 0.20.
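Equation (2.110) translates directly into code; the residual-power ratio below is a sketch, with illustrative signals of my own construction:

```python
import numpy as np

def nmse(y_residual, y):
    """Q_r of Equation (2.110): power of the r-th order output residual
    divided by the power of the de-meaned output."""
    y1 = y - np.mean(y)
    return np.mean(y_residual**2) / np.mean(y1**2)

# Example: a residual carrying exactly 25% of the de-meaned output power,
# i.e. a model that explains 75% of the output signal power.
rng = np.random.default_rng(1)
y = rng.normal(size=1000) + 5.0
residual = 0.5 * (y - y.mean())
Q = nmse(residual, y)               # 0.25 by construction
```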
As a practical matter, the actual estimation of Wiener kernels has been limited to sec-
ond order to date (with rare occasions of third-order Wiener kernel estimation but no fur-
ther) due to the multidimensional structure of high-order kernels. This is dictated by
practical limitations of data-record length, computational burden, and estimation accuracy
that become rather severe for multidimensional high-order kernels.

Some Practical Considerations. Wiener kernel estimation through the cross-corre-
lation technique has several advantages over the original Wiener formulation because: (1)
it directly estimates the Wiener kernels, which can reveal interpretable features of the sys-
tem under study; (2) it is much simpler computationally, since it does not involve the con-
catenated Laguerre and Hermite expansions.
Because of its apparent simplicity, the cross-correlation technique has found many ap-
plications in physiological system modeling and fomented the initial interest in the
Wiener approach. Nonetheless, the application of the original cross-correlation technique
exposed a host of practical shortcomings that were addressed over time by various inves-
tigators. These shortcomings primarily concerned issues of estimation errors regarding
the dependence of estimation variance on the input length and bandwidth, as well as the
generation and application of physically realizable approximate GWN inputs. This
prompted the introduction of a broad class of quasiwhite test input signals that address
some of these applicability issues, as discussed in Section 2.2.4.
Among the important practical issues that had to be explored in actual applications of
the cross-correlation technique are the generation of appropriate quasiwhite test signals
(since ideal GWN is not physically realizable), the choice of input bandwidth, the accura-
cy of the obtained kernel estimates as a function of input bandwidth and record length,
and the effect of extraneous noise and experimental transducer errors. A comprehensive
and detailed study of these practical issues was first presented in Marmarelis & Mar-
marelis (1978). A summary of the types of estimation errors associated with the cross-
correlation technique is provided in Section 2.4.2, since the cross-correlation technique is
a "legacy methodology" but not the best choice at present time.
It is evident that in actual applications the ideal GWN input has to be approximated in
terms of practical limitations on bandwidth and amplitude. Since the latter are both infi-
nite in the ideal GWN case, we have to accept in practice a band-limited and amplitude-
truncated GWN input.
In light of these inevitable approximations, non-Gaussian, quasiwhite input signals
were introduced that exhibited whiteness within a given finite bandwidth and remained
within a specified finite amplitude range. In addition, these quasiwhite test signals were
easy to generate and offered computational advantages in certain cases (e.g., binary or
ternary signals). A broad family of such quasiwhite test input signals was introduced in
1975 by the author during his Ph.D. studies. These signals can be used in connection with

the cross-correlation technique for kernel estimation and give rise to their own orthogonal
functional series, as discussed in Section 2.2.4.
We must note that a fair amount of variance is introduced into the Wiener kernel esti-
mates (obtained via cross-correlation) because of the stochastic nature of the (quasi-)
white noise input. This has prompted the use of pseudorandom m-sequences which, how-
ever, present some problems in their high-order autocorrelation functions (see Section
2.2.4). The use of deterministic inputs (such as multiple impulses or sums of sinusoids of
incommensurate frequencies) alleviates this problem of stochastic inputs but places us in
the context of Volterra kernels, as discussed in Section 2.1.5.
An interesting estimation method that reduces the estimation variance for arbitrary sto-
chastic inputs is based on exact orthogonalization of the expansion terms for the given in-
put data record [Korenberg, 1988]. However, this method (and any other method based on
least-squares fitting for arbitrary input signals) yields a hybrid between the Wiener and
the Volterra kernels of the system, which depends on the spectral characteristics of the
specific input signal. This hybrid can be viewed as a biased estimate of the Wiener and/or
Volterra kernels if significant higher-order terms exist beyond the order of the estimated
truncated model.
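A direct least-squares fit of a truncated second-order model illustrates the idea. Ordinary least squares is used here as a simple stand-in for Korenberg's exact orthogonalization (it is not his algorithm, but both minimize the same squared error for the given record):

```python
import numpy as np

def volterra_ls(x, y, M):
    """Ordinary least-squares fit of a truncated second-order Volterra model.
    Coefficients are ordered: constant, first-order terms x(t-m) for
    m = 0..M-1, then second-order terms x(t-a) x(t-b) for a <= b."""
    N = len(x)
    cols = [np.ones(N - M + 1)]
    for m in range(M):
        cols.append(x[M - 1 - m:N - m])
    for a in range(M):
        for b in range(a, M):
            cols.append(x[M - 1 - a:N - a] * x[M - 1 - b:N - b])
    X = np.column_stack(cols)
    coef, *_ = np.linalg.lstsq(X, y[M - 1:], rcond=None)
    return coef

# Example with a known second-order system: y(t) = 2 x(t-1) + x(t) x(t-1).
rng = np.random.default_rng(2)
N = 20_000
x = rng.normal(size=N)
y = np.zeros(N)
y[1:] = 2.0 * x[:-1] + x[1:] * x[:-1]
coef = volterra_ls(x, y, M=2)       # recovers the known coefficients
```

Because the fit is performed for the specific input record, the recovered coefficients are Volterra-like for a deterministic fit but become input-dependent hybrids when significant unmodeled higher-order terms are present, as noted above.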
Another obstacle in the broader use of the cross-correlation technique has been the
heavy computational burden associated with the estimation of high-order kernels. The
amount of required computations increases geometrically with the order of estimated
kernel, since the number of estimated kernel values is roughly proportional to M^Q, where Q is the
kernel order and M is the kernel memory-bandwidth product. This prevents the practi-
cal estimation of kernels above a certain order, depending on the kernel memory-band-
width product (i.e., the number of sample values per kernel dimension). An additional
practical limitation is imposed by the fact that the kernels with more than three dimen-
sions are difficult to represent, inspect, or interpret meaningfully. As a result, successful
application of the cross-correlation technique has been limited to weakly nonlinear sys-
tems (typically second-order or rarely third-order) to date.

Illustrative Example. To illustrate the application of the cross-correlation technique
on a real system, we present below one of its first successful applications to a physiologi-
cal system that transforms light-intensity variations impinging upon the catfish retina into
a variable ganglion cell discharge rate [Marmarelis & Naka, 1973b]. The latter is mea-
sured by superimposing multiple spike-train responses (sequences of action potentials
generated by the ganglion cell) to the same (repeated) light-intensity stimulus signal. The
physical stimulus is monochromatic light intensity modulated in band-limited GWN fash-
ion around a reference level of mean light intensity (light-adapted preparation). The re-
sponse is the extracellularly recorded ganglion cell response (spike trains superimposed
and binned to yield a measure of instantaneous firing frequency). The stimulus-response
data are processed using the cross-correlation technique to estimate the first-order and
second-order Wiener kernels of the light-to-ganglion cell system shown in Figure 2.11.

Frequency-Domain Estimation of Wiener Kernels. Wiener kernel estimation is
also possible in the frequency domain through the evaluation of high-order cross-spectra
[Brillinger, 1970; French & Butz, 1973; Barker & Davy, 1975; French, 1976], which rep-
resent the frequency-domain counterparts of the high-order cross-correlations. The prob-
lem of estimation variance due to the stochastic nature of the GWN input persists in this
approach as well.

Figure 2.11 The Wiener kernel estimates of first order (left) and second order (right) obtained via
the cross-correlation technique for the light-to-ganglion cell system in the catfish retina [Marmarelis
& Naka, 1973b].

The efficient implementation of the frequency-domain approach calls for the use of
fast Fourier transforms (FFTs) with a data-length size equal to the smallest power of two
that is greater than or equal to twice the memory-bandwidth product of the system. Thus,
if the FFT size is 2M (a power of two by algorithmic necessity), then the input-output
data record is segmented into K contiguous segments of 2M data points each. An estimate
of the rth-order cross-spectrum,

Ŝ_r,i(ω₁, ..., ω_r) = Y_r,i(ω₁ + ... + ω_r) X_i*(ω₁) ... X_i*(ω_r)        (2.111)

can be obtained from the ith segment, where X_i*(ω) denotes the conjugate FFT of the ith
input segment and Y_r,i(ω) denotes the FFT of the corresponding ith segment of the rth out-
put residual. Then, the rth-order Wiener kernel can be estimated in the frequency domain
by averaging the various segment estimates of Equation (2.111):

" 1 ~"
Hr(wJ, ... , wr) = -,-r-LSr,z(wJ, ... , wr) (2.112)
r.PK i = l

where P is the input power level and (2M) is the total number of input-output data points.
The rth-order Wiener kerneI estimate can be converted into the time domain via an r-
dimensional inverse FFT.
This approach offers some computational advantages over direct cross-correlation (es-
pecially for systems with large memory-bandwidth product) but exhibits the same vulner-
abilities in terms of kerneI estimation errors.
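For the first-order kernel, the segment-averaging recipe can be sketched as follows. A purely linear system is used so that the first-order cross-spectrum is the only nonzero term; the segment size, segment count, and kernel are arbitrary choices for the sketch:

```python
import numpy as np

rng = np.random.default_rng(3)
L = 256                      # FFT segment size (>= twice the system memory)
K = 400                      # number of contiguous segments
N = L * K
P = 1.0
x = rng.normal(0.0, np.sqrt(P), N)

mem = 32
h1_true = np.exp(-np.arange(mem) / 5.0)
y = np.convolve(x, h1_true)[:N]

# First-order cross-spectrum averaged over segments: S_yx = <Y_i X_i*> / L.
S = np.zeros(L // 2 + 1, dtype=complex)
for i in range(K):
    xi = x[i * L:(i + 1) * L]
    yi = y[i * L:(i + 1) * L]
    S += np.fft.rfft(yi) * np.conj(np.fft.rfft(xi))
S /= K * L

H1 = S / P                          # frequency-domain kernel estimate (r = 1)
h1 = np.fft.irfft(H1, n=L)[:mem]    # one-dimensional inverse FFT back to time
```

A small bias arises at the segment edges (of order m/L for lag m), which is why the segment size should comfortably exceed the system memory.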

2.2.4 Quasiwhite Test Inputs


The aforementioned quasiwhite random signals, which can be used as test inputs in the
context of Wiener modeling, have been termed "constant-switching-pace symmetric ran-
dom signals" (CSRS) and they are defined by the following generation procedure [Mar-
marelis, 1975, 1976, 1977]:

1. Select the bandwidth of interest, B_x (in Hz), and the desirable finite-range symmet-
ric amplitude distribution (with zero mean), p(x), for the discrete-time CSRS input
signal x(n).
2. Draw an independent sample x(n) at every time step Δt = 1/(2B_x) (in sec) from a
random number generator according to the probability distribution p(x).

It has been shown [Marmarelis, 1975, 1976, 1977] that a quasiwhite signal thus gener-
ated possesses the appropriate autocorrelation properties of all orders that allow its use for
Wiener-type modeling of nonlinear systems with bandwidths B ≤ 1/(2Δt) via the cross-
correlation technique.

In analog form, the CSRS remains constant at the selected value, x(n), for the duration
of each interval Δt [i.e., for nΔt ≤ t < (n + 1)Δt] and switches at t = (n + 1)Δt to the sta-
tistically independent value x(n + 1), where it stays constant until the next switching time
t = (n + 2)Δt. The fundamental time interval Δt is called the "step" of the CSRS, and it de-
termines the bandwidth of the signal within which whiteness is secured: B_x = 1/(2Δt).
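The two-step generation procedure can be sketched as follows (equiprobable discrete levels are used here as one possible choice of p(x); the bandwidth and record length are arbitrary):

```python
import numpy as np

def generate_csrs(n_steps, levels, rng):
    """Generate a discrete-time CSRS: one independent draw per step from a
    symmetric, zero-mean, finite-range amplitude distribution (here a set of
    equiprobable levels, e.g. [-A, A] binary or [-A, 0, A] ternary)."""
    return rng.choice(levels, size=n_steps)

# Ternary CSRS with amplitude A = 1; a bandwidth of 500 Hz gives dt = 1 ms.
Bx = 500.0
dt = 1.0 / (2.0 * Bx)
x = generate_csrs(50_000, np.array([-1.0, 0.0, 1.0]), np.random.default_rng(4))

M2 = np.mean(x**2)           # second moment of p(x); 2/3 for this ternary PDF
Ppower = M2 * dt             # CSRS power level P = M2 * dt
```

The independence of successive steps is what secures whiteness up to B_x; the lag-1 sample correlation of the generated record is correspondingly near zero.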
It has been shown that the statistical independence of any two steps of a CSRS is suffi-
cient to guarantee its whiteness within the bandwidth determined by the specified step
size Δt. The symmetric probability density function (PDF) p(x) with which a CSRS is
generated has zero mean, and, consequently, all its odd-order moments are zero. The
even-order moments of p(x) are nonzero and finite, because the CSRS has finite ampli-
tude range. The power level of the CSRS is P = M₂·Δt, where M₂ is the second moment of
p(x)--also the variance, since p(x) has zero mean.
This random quasiwhite signal approaches the ideal white-noise process as Δt tends to
zero. Note that in order to maintain a constant CSRS power level as Δt decreases, the signal
amplitude must be divided by √Δt. This implies that the distribution of the asymptotic
white-noise process as Δt → 0 is √Δt·p(x·√Δt) and tends to infinite variance. For in-
stance, if the standardized normal/Gaussian distribution is used to generate the CSRS
(zero mean and unit variance), then the asymptotic PDF is

lim_{Δt→0} √(Δt/(2π)) exp[−x²Δt/2]

having variance equal to 1/Δt.


We note that every member of the CSRS family is by construction a stationary and er-
godic process. For a given step size Δt and a proper amplitude PDF p(x), an ensemble of
random processes is defined within the CSRS family. Because of the ergodicity and sta-
tionarity of the CSRS, the ensemble-through and time-through statistics of the signal are
invariant over the ensemble and over time, respectively. This allows us to define its auto-
correlation functions in both the temporal and the statistical (ensemble) sense.
In practice, we have finite-length signals and we are practically restricted to obtaining
only estimates of the autocorrelation functions, usually by time averaging over the record
length R, as

φ̂_n(τ₁, ..., τ_{n−1}) = [1/(R − τ_m)] ∫_{τ_m}^{R} x(t) x(t − τ₁) ... x(t − τ_{n−1}) dt        (2.113)

where τ_m = max{τ₁, ..., τ_{n−1}} and stationarity has suppressed one tau argument (e.g., no
shift for the first term of the product). For discrete-time (sampled) data, the integral of
Equation (2.113) becomes a summation.
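In discrete time, the estimator of Equation (2.113) amounts to averaging lagged products over the record; a quick sketch on a white record (the lags and record length are arbitrary choices):

```python
import numpy as np

def autocorr_estimate(x, taus):
    """Time-average estimate of the nth-order autocorrelation
    phi_n(tau_1, ..., tau_{n-1}) from a single record: the discrete form of
    Equation (2.113). `taus` are non-negative sample lags."""
    tm = max(taus)
    prod = x[tm:].copy()
    for tau in taus:
        prod = prod * x[tm - tau:len(x) - tau]
    return prod.mean()

rng = np.random.default_rng(5)
x = rng.normal(0.0, 1.0, 100_000)

phi2_0 = autocorr_estimate(x, [0])     # second order, zero lag: the variance
phi2_3 = autocorr_estimate(x, [3])     # second order, off-diagonal: near zero
phi3 = autocorr_estimate(x, [1, 2])    # third (odd) order: near zero
```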
The estimate φ̂_n(τ₁, ..., τ_{n−1}) is a random variable itself and its statistical properties
must be studied in order to achieve an understanding of the statistical properties of the
kernel estimates obtained by the cross-correlation technique. It was found that the expect-
ed value of φ̂_n(τ₁, ..., τ_{n−1}) is φ_n(τ₁, ..., τ_{n−1}), which makes it an unbiased estimate, and
the variance of φ̂_n(τ₁, ..., τ_{n−1}) at all points tends to zero asymptotically with increasing
record length R, which makes it a consistent estimate [Marmarelis, 1977].
It has been shown [Marmarelis, 1975, 1977] that the autocorrelation functions of a
CSRS are those of a quasiwhite signal; that is, the odd-order autocorrelation functions are
uniformly zero, whereas the even-order ones are zero everywhere except at the diagonal
strips. Note that the diagonal strips are areas within ±Δt (the step size of a CSRS) around
every full diagonal of the argument space (i.e., where all the arguments form pairs of
identical values).
If we visualize the autocorrelation function estimate as a surface in n-dimensional
space, then φ̂_n(τ₁, ..., τ_{n−1}) appears to have apexes corresponding to the nodal points of
the n-dimensional space (i.e., the points with coordinates that are multiples of Δt). These
apexes are connected with n-dimensional surface segments that are of first degree (planar)
with respect to each argument τᵢ. The direct implication of this morphology is that the
"extrema" of the surface φ̂_n(τ₁, ..., τ_{n−1}) must be sought among its apexes (i.e., among
the nodal points of the argument space). This simplifies the analysis of this multidimen-
sional surface that determines the statistical characteristics of the CSRS kernel estimates.
To illustrate this, we consider the second-order autocorrelation function of a CSRS:

φ₂(τ₁) = M₂(1 − |τ₁|/Δt)   for |τ₁| ≤ Δt
φ₂(τ₁) = 0                 for |τ₁| > Δt        (2.114)

shown in Figure 2.12 along with its power spectrum (i.e., the Fourier transform of the au-
tocorrelation function). The quasiwhite autocorrelation properties of the CSRS family are
manifested by the impulse-like structure of their even-order autocorrelation functions and
justify their use in kernel estimation via the cross-correlation technique. However, the
kernel estimates that are obtained through the use of CSRS test inputs correspond to an
orthogonal functional series that is slightly different in structure from the original Wiener
series. This structural difference is due to the statistical properties of the CSRS, as ex-
pressed by the moments of its amplitude PDF, which are different in general from the mo-
ments of the Gaussian distribution (although the latter is included in the CSRS family).
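The triangular shape of Equation (2.114) is easy to verify numerically by oversampling the piecewise-constant ("analog") form of a CSRS; the step size and level set below are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(6)
oversample = 20                        # samples per CSRS step ("analog" form)
steps = 20_000
x = np.repeat(rng.choice([-1.0, 1.0], size=steps), oversample)   # binary, M2 = 1

def phi2(x, lag):
    return np.mean(x[lag:] * x[:len(x) - lag])

# Equation (2.114): phi2(tau) = M2 (1 - |tau|/dt) for |tau| <= dt, else 0.
quarter = phi2(x, oversample // 4)     # tau = dt/4 -> expected 0.75
full = phi2(x, oversample)             # tau = dt   -> expected 0
double = phi2(x, 2 * oversample)       # tau = 2 dt -> expected 0
```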
The decomposition property of even products of Gaussian random variables (see Ap-
pendix II) results in great simplification of the expressions describing the orthogonal
Wiener functionals. In the general case of a non-Gaussian CSRS, however, the decompo-
sition property does not hold and complete description of the amplitude PDF may require
several of its moments, which results in greater complexity of the form of the CSRS or-
thogonal functionals. Nevertheless, the construction of the CSRS functionals can be made

Figure 2.12 Portion of a CSRS quasiwhite signal (top left), its second-order autocorrelation func-
tion (top right), and its power spectrum Φ(f) = P·[sin(πfΔt)/(πfΔt)]² in linear and logarithmic scales
(bottom) [Marmarelis & Marmarelis, 1978].

routinely on the basis of an orthogonalization (Gram-Schmidt) procedure similar to the
one that was used in the construction of the Wiener series, under the assumption that the
CSRS bandwidth is broader than the system bandwidth.
In the special case where a Gaussian amplitude PDF is chosen for the CSRS, the CSRS
functional series takes the form of the Wiener series, where the power level of the quasi-
white input is equal to the product of the second moment and the step size of the CSRS (P
= M₂Δt). Thus, it becomes clear that the CSRS functional series is a more general orthog-
onal functional expansion than the Wiener series, extending the basic idea of orthogonal
expansions using Volterra-type functionals throughout the space of symmetric probability
distributions. The advantages of such a generalization are those that accrue in any opti-
mization problem where the parameter space is augmented. This generality is achieved at
the expense of more complexity in the expressions for the orthogonal functionals if a non-
Gaussian PDF is chosen.
Under the assumption that the CSRS bandwidth is broader than the system bandwidth,
the first four orthogonal functionals {G_i*} that correspond to a CSRS quasiwhite input
take the form [Marmarelis, 1977]

G₀*[g₀; x(t′), t′ ≤ t] = g₀        (2.115)

G₁*[g₁(τ₁); x(t′), t′ ≤ t] = ∫₀^∞ g₁(τ₁) x(t − τ₁) dτ₁        (2.116)

G₂*[g₂(τ₁, τ₂); x(t′), t′ ≤ t] = ∫₀^∞ ∫₀^∞ g₂(τ₁, τ₂) x(t − τ₁) x(t − τ₂) dτ₁ dτ₂
        − (M₂Δt) ∫₀^∞ g₂(τ₁, τ₁) dτ₁        (2.117)

G₃*[g₃(τ₁, τ₂, τ₃); x(t′), t′ ≤ t] = ∫₀^∞ ∫₀^∞ ∫₀^∞ g₃(τ₁, τ₂, τ₃) x(t − τ₁) x(t − τ₂) x(t − τ₃) dτ₁ dτ₂ dτ₃
        − 3(M₂Δt) ∫₀^∞ ∫₀^∞ g₃(τ₁, τ₂, τ₂) x(t − τ₁) dτ₁ dτ₂
        − [(M₄/M₂ − 3M₂)Δt²] ∫₀^∞ g₃(τ₁, τ₁, τ₁) x(t − τ₁) dτ₁        (2.118)

where x(t) is a CSRS, Δt is its step size, {g_i} are its associated CSRS kernels, and M₂, M₄,
and so on are the second, fourth, and so on moments of its amplitude PDF p(x). It is evi-
dent that the deviation of the CSRS functionals (and associated kernels) from their
Wiener counterparts diminishes as the CSRS amplitude distribution approaches the
Gaussian profile, since then M₄ = 3M₂² and G₃* attains exactly the form of a third-order
Wiener functional with power level P = M₂Δt.
The expressions for higher-order functionals become quite complicated since they in-
volve all the higher even moments of the CSRS input, but their derivation can be made
routinely on the basis of a Gram-Schmidt orthogonalization procedure. Notice that the
basic even/odd separation in the structural form of the CSRS functionals is the same as in
the Wiener functionals, i.e., the odd- (even-) order functionals consist solely of all odd-
(even-) order homogeneous functionals of equal and lower order.
The CSRS functionals should be viewed as slightly modified Wiener functionals. The
integral terms (i.e., the homogeneous functionals) of the CSRS functionals that contain
higher even moments (>2) contain also a higher power of Δt as a factor. This makes them
significantly smaller than the terms containing only the second moment, since Δt attains
small values. Therefore, the CSRS functionals become (for all practical purposes) the
same as the Wiener functionals for very small Δt. This implies in turn that whenever Δt is
very small, the CSRS kernels are approximately the same as the Wiener kernels, except
possibly at the diagonal points where the finite number of CSRS amplitude levels (e.g.,
binary, ternary, etc.) limits estimability, as discussed below.
The power spectrum of a CSRS with step size Δt and second moment M₂ is shown in
Fig. 2.12 and is independent of the amplitude PDF. The bandwidth of the signal is in-
versely proportional to Δt, and it approaches the ideal white noise (infinite bandwidth) as
Δt approaches zero (provided that the power level P = M₂Δt remains finite). The orthogo-
nality of the CSRS functionals is satisfactory when the bandwidth of the CSRS exceeds
the bandwidth of the system under study.
In order to estimate the rth-order CSRS kernel, we cross-correlate the rth-order output
residual y_r(t) with r time-shifted versions of the CSRS input (as in the GWN input case)
and scale the outcome:

ĝ_r(σ₁, ..., σ_r) = C_r E[y_r(t) x(t − σ₁) ... x(t − σ_r)]        (2.119)

where

y_r(t) = y(t) − Σ_{i=0}^{r−1} G_i*[g_i(τ₁, ..., τ_i); x(t′), t′ ≤ t]        (2.120)

and C_r is the proper scaling factor that depends on the even moments and the step size of
the CSRS, as well as the location of the estimated point in the kernel (e.g., diagonal vs.
nondiagonal).
It was shown earlier that in the case of GWN inputs, the scaling factor C_r is (r! P^r)⁻¹.
However, in the case of CSRS inputs, the scaling factors differ for the diagonal and non-
diagonal points of the kernels. For the nondiagonal points (i.e., when all σᵢ are distinct)
the scaling factor is the same as in the GWN case, where P = M₂Δt. However, the deter-
mination of the appropriate scaling factors for the diagonal points involves higher even
moments. For example, in the second-order case the scaling factor for the diagonal points
(σ₁ = σ₂) is found to be:

C₂ = 1 / [(M₄ − M₂²)Δt²]        (2.121)
CSRS and Volterra Kernels. It is instructive to examine the relation between the
Volterra and the CSRS kernels of a system (at the nondiagonal points, for simplicity of
expression) in order to demonstrate the dependence of the CSRS kernels upon the even
moments and the step size of the specific CSRS that is used to estimate the kernels.
Recall that the Volterra kernels of a system are independent of input characteristics, un-
like the Wiener or CSRS kernels that depend on the GWN or CSRS input characteris-
tics (i.e., the power level or the step size and even moments, respectively) [Marmarelis
& Marmarelis, 1978]. The resulting expressions for the CSRS kernels in terms of the
Volterra kernels {k_i} of the system are more complicated than their Wiener counterparts
given by Equation (2.57). The general expression for the nondiagonal points of the
even-order CSRS kernels is

g2n(CTb ... , CT2n) = I Cm.n(p])


00

m=n
{(X)

0
··l k
0
00

2m( Tl> .•• , T2m-2m CTb •.• , CT2n)dT] ... dT2m-2n

+ I~tIDI.m.n(Pl"" ,pl+l)I~· ·fk2m(Tb " " T2m-2n--2b CTI"'" CT2n)dTI ... dT2m-2n-2/}
~l 0 0
(2.122)

where PI is a "generalized power level of Ith order" for CSRS inputs, defined as

P/=M2 /fl r (2.123)

The function C_{m,n} depends only on the conventional power level P = P₁, but the function
D_{l,m,n} is a rational expression of the generalized power levels. Note, however, that the
terms of the second summation in the expression (2.122) are negligible in comparison
with the terms of the first summation, since Δt is very small. Furthermore, the function
C_{m,n}(P₁) tends to the coefficient found in the relation between the Wiener and the Volter-
ra kernels given by Equation (2.57) as Δt tends to zero. Hence, the CSRS kernels at the
nondiagonal points are approximately the same as the Wiener kernels (as long as they
have the same power level P = M₂Δt). The only significant difference between the CSRS
and the Wiener kernels of a system is found at some diagonal points whenever the CSRS
attains a finite number of amplitude levels, as discussed below.

The Diagonal Estimability Problem. Every CSRS waveform attains (by construc-
tion) a finite number L of amplitude levels. This limits our ability to estimate via cross-
correlation the kernel values at diagonal points that have dimension L or above. For in-
stance, a binary CSRS input x(t) attains the values of ±A. Therefore, x²(t) is a constant A²
for this binary CSRS input and the cross-correlation formula of Equation (2.119) yields at
the diagonal points (σ₁ = σ₂ = σ) of the second-order binary kernel

ĝ₂ᵇ(σ, σ) = C₂ A² E[y₂(t)] = 0        (2.124)

since the second-order output residual has zero mean by construction. The same zero val-
ue is obtained for all diagonal points of higher-order binary kernels with even parity (i.e.,
when an even number of arguments are the same) because x²ⁿ(t) = A²ⁿ and the high-order
cross-correlations reduce to lower-order ones that are orthogonal to the output residual.
By the same token, the estimated values of binary kernels at diagonal points with odd
parity are zero, because all the odd-order cross-correlations reduce to a first-order one that
is zero for all residuals of order higher than first. For instance, the third-order binary ker-
nel estimate at the diagonal of parity two (i.e., σ₁ = σ′, σ₂ = σ₃ = σ) is

ĝ₃ᵇ(σ′, σ, σ) = C₃ A² E[y₃(t) x(t − σ′)] = 0        (2.125)

since the third-order output residual is orthogonal to x(t − σ′). The estimates at the diago-
nal points of parity three (σ₁ = σ₂ = σ₃) also yield zero result, because the third-order
cross-correlation reduces to the first-order one indicated by Equation (2.125).
It can be stated, in general, that the binary CSRS input "folds" all diagonal kernel esti-
mates of dimension two or above (obtained via cross-correlation) into lower-order projec-
tions and yields a zero estimated value (inability to estimate these diagonal kernel values)
because of the orthogonality between the output residual and the reduced order of the in-
strumental homogeneous functional (i.e., the number of cross-correlated input product
terms is smaller than the order of the output residual).
The same can be stated for a ternary CSRS input with regard to the diagonal of the
third-order ternary kernel estimate or all diagonal values of dimension three and above in
higher-order ternary kernels.
In general, a CSRS input with L amplitude levels cannot yield kernel estimates (via
cross-correlation) at diagonal points of dimension L or above. The contributions of these
diagonal points to the system output are folded into lower-order functional terms and the
resulting cross-correlation estimates of the CSRS kernels for these diagonal points are
zero. An analytical example is given below that elucidates these points.
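The "folding" of binary-input diagonal estimates to zero is easy to reproduce numerically. The sketch below uses a toy discrete second-order system of my own construction (not from the text) and compares a binary with a ternary CSRS:

```python
import numpy as np

rng = np.random.default_rng(7)
N, dt = 200_000, 1.0

# Second-order system with diagonal and off-diagonal kernel content:
# y(t) = x(t)^2 + x(t) x(t-2), i.e. k2(0,0) = 1 and k2(0,2) = k2(2,0) = 1/2.
def respond(x):
    y = x**2
    y[2:] += x[2:] * x[:-2]
    return y

def kernel_slices(x):
    """Off-diagonal estimate g2(0,2) (non-diagonal scale 1/(2 P^2)) and the
    raw diagonal cross-correlation E[y2(t) x(t)^2], plus the moments M2, M4."""
    M2, M4 = np.mean(x**2), np.mean(x**4)
    y = respond(x)
    y2 = y - y.mean()                  # output residual (no odd-order part here)
    off = np.mean(y2[2:] * x[2:] * x[:-2]) / (2 * (M2 * dt)**2)
    diag_raw = np.mean(y2 * x**2)
    return off, diag_raw, M2, M4

off_b, diag_b, _, _ = kernel_slices(rng.choice([-1.0, 1.0], size=N))           # binary
off_t, diag_t, M2t, M4t = kernel_slices(rng.choice([-1.0, 0.0, 1.0], size=N))  # ternary

# Binary: the off-diagonal value is recovered, but the diagonal estimate is ~0
# (Equation (2.124)); the diagonal scale C2 of Equation (2.121) even diverges,
# since M4 - M2^2 = 0 for a binary signal.
# Ternary: applying C2 = 1/((M4 - M2^2) dt^2) recovers k2(0,0) = 1.
diag_t_scaled = diag_t / ((M4t - M2t**2) * dt**2)
```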
Since kernel estimation via cross-correlation has been limited in practice to second or-
der (and rarely extending to third order), the practical implications of the aforementioned
fact of "diagonal estimability" attain importance only for binary (and occasionally
ternary) quasiwhite inputs. The binary test inputs have been rather popular among investi-
gators, in either the CSRS form or in the pseudorandom m-sequence form (discussed be-
low). Therefore, the issue of "diagonal estimability" attains practical importance, since
many investigators have been perplexed in the past by the so-called "diagonal problem"
when binary test inputs are used to estimate second-order kernels. This discussion eluci-
dates the problem and provides useful guidance in actual applications.
An illustrative example from a real physiological system (the fly photoreceptor) is giv-
en in Figure 2.13, where the second-order kernel estimates obtained via cross-correlation

Figure 2.13 The first- and second-order CSRS kernel estimates of the fly photoreceptor obtained
with the use of binary (right) and ternary (left) test inputs. The differences along the diagonal of the
second-order kernel are evident [Marmarelis & McCann, 1977].

using binary and ternary CSRS test inputs are shown. The differences along the diagonal
points are evident and point to the potential pitfalls of the common (but questionable)
practice of interpolating the diagonal values of the binary kernel estimate to overcome the
"diagonal problem." It is evident from Figure 2.13 that diagonal interpolation of the bina-
ry kernel estimate would not reproduce the correct kernel values (represented by the
ternary estimate) especially along two segments of the diagonal (the part between the two
positive humps and for latencies shorter than 8 msec). There may be certain cases in
which interpolation may yield reasonable approximations along the diagonal (depending
on the true kernel morphology) but this cannot be asserted in general. However, even in
those cases where interpolation is justifiable, adjustments must be made to the lower-
order kernel estimates of the same parity (e.g., the zero-order kernel estimate when the
second-order kernel is interpolated along the diagonal) in order to balance properly the
contributions of these kernel values to the system output.

An Analytical Example. As an analytical example of the relation between CSRS and
Volterra kernels, consider a third-order Volterra system [i.e., a system for which kₙ(τ₁, τ₂,
..., τₙ) = 0 for n > 3]. If k₀ = 0, the system response to a CSRS x(t) is

y(t) = ∫₀^∞ k₁(τ₁) x(t − τ₁) dτ₁ + ∫₀^∞ ∫₀^∞ k₂(τ₁, τ₂) x(t − τ₁) x(t − τ₂) dτ₁ dτ₂

        + ∫₀^∞ ∫₀^∞ ∫₀^∞ k₃(τ₁, τ₂, τ₃) x(t − τ₁) x(t − τ₂) x(t − τ₃) dτ₁ dτ₂ dτ₃        (2.126)

If we consider a CSRS input with more than three amplitude levels (to avoid the afore-
mentioned problem of "diagonal estimability"), then the CSRS kernels of the system (for
the nondiagonal points) are

g₀ = (M₂Δt) ∫₀^∞ k₂(τ₁, τ₁) dτ₁        (2.127)

g₁(σ₁) = k₁(σ₁) + 3(M₂Δt) ∫₀^∞ k₃(σ₁, τ₁, τ₁) dτ₁ + [(M₄/M₂ − 3M₂)Δt²] k₃(σ₁, σ₁, σ₁)        (2.128)

g₂(σ₁, σ₂) = k₂(σ₁, σ₂)        (2.129)

g₃(σ₁, σ₂, σ₃) = k₃(σ₁, σ₂, σ₃)        (2.130)
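Equation (2.127) is easy to check numerically for a purely second-order discrete system; the kernel and level set below are arbitrary choices, and the discrete convention Δt = 1 turns the integral into a sum over the kernel diagonal:

```python
import numpy as np

rng = np.random.default_rng(8)
N, M, dt = 100_000, 8, 1.0

# Four-level CSRS (more than three levels, so no diagonal estimability issue).
x = rng.choice([-1.5, -0.5, 0.5, 1.5], size=N)
M2 = np.mean(x**2)                      # ~1.25 for these equiprobable levels

# Arbitrary symmetric second-order Volterra kernel (k1 = k3 = 0 here).
k2 = rng.normal(size=(M, M))
k2 = (k2 + k2.T) / 2.0

# Discrete second-order Volterra response.
y = np.zeros(N)
for a in range(M):
    for b in range(M):
        y[M:] += k2[a, b] * x[M - a:N - a] * x[M - b:N - b]

g0_est = y[M:].mean()                   # zero-order CSRS kernel = output mean
g0_theory = M2 * dt * np.trace(k2)      # discrete form of Equation (2.127)
```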

The second-order and third-order CSRS kernels are identical to the Volterra kernels be-
cause of the absence of nonlinearities of order higher than third and the multilevel struc-
ture of the CSRS input (more than three levels). The zeroth-order CSRS kernel depends
upon the second-order Volterra kernel, and the first-order CSRS kernel depends upon the
first-order and third-order Volterra kernels in a manner similar to the dependence on the
Wiener kernels as Δt tends to zero.
If a binary CSRS input is used, then the diagonal values of the estimated second-or-
der and third-order binary kernels are zero, as indicated in Eqs. (2.124)-(2.125). The
"nulling" of the diagonal values of g~ and g~ simplifies the form of the CSRS func-
tionals given by Equations (2.115)-(2.118), because only the leading term of each func-
tional remains nonzero for a binary CSRS input:

oq*[g~; x(t'), t'::5 t] = ff ~(Tl> T2)x(t- TI)x(t- T2)dTl dT2 (2.131)

~*[g~; x(t'), t'::5 t] = frf


o
rl(TI> T2' T3)x(t- TI)x(t- T2)x(t- T3)dTl d T2dT3 (2.132)

where the superscript "b" denotes "binary" functionals or kernels. This simplification ap-
plies to all higher-order functionals that may be present in a system. This modified form
of the CSRS functional series for binary inputs will be revisited in the study of neuronal
systems with spike-train inputs (see Chapter 8).
If a ternary CSRS input is used, then only the main diagonal values of g3 become zero
and only the third-order CSRS functional is simplified by elimination of its last term (for
this example of a third-order system). For a high-order system, the kernel values for all
diagonal points of parity three or above attain zero values, with concomitant simplifica-
tions of the respective ternary functionals. Since most applications have been limited to
second-order models thus far, the ternary CSRS inputs appear to be a better choice than
their more popular binary counterparts in terms of avoiding the "diagonal problem" in the
second-order kernel estimate.
It is instructive to explore also the form of the Wiener kernels of this third-order sys-
tem as an illustrative example. The second-order and third-order Wiener kernels are iden-
tical to their Volterra (and nondiagonal CSRS) counterparts for this system. The zeroth-
order and first-order Wiener kernels are given by

h_0 = P ∫_0^∞ k_2(τ_1, τ_1) dτ_1     (2.133)

h_1(σ_1) = k_1(σ_1) + 3P ∫_0^∞ k_3(σ_1, τ_1, τ_1) dτ_1     (2.134)

where P is the power level of the GWN input. In other words, the Wiener kernels can be
viewed as a special case of CSRS kernels, when M_4 = 3M_2² (i.e., when the amplitude dis-
tribution is Gaussian), or when Δt is very small [i.e., when the third term of g_1 in Equation
(2.128) becomes negligible relative to the second term].
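The Gaussian condition M_4 = 3M_2² quoted above is easy to confirm by direct sampling; the following sketch (Python, with an arbitrarily chosen standard deviation and sample size) estimates the second and fourth moments of a zero-mean Gaussian amplitude distribution:

```python
import numpy as np

rng = np.random.default_rng(1)
w = 2.0 * rng.standard_normal(1_000_000)  # zero-mean Gaussian amplitudes, sigma = 2

M2 = np.mean(w**2)                         # second moment of the amplitude PDF
M4 = np.mean(w**4)                         # fourth moment of the amplitude PDF
print(M4 / M2**2)   # close to 3 for a Gaussian amplitude distribution
```

For a non-Gaussian CSRS (e.g., binary levels) this ratio differs from 3, which is precisely what keeps the third term of g_1 in Equation (2.128) alive for finite Δt.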

Comparison of Model Prediction Errors. Of importance in system modeling is the


accuracy (in the mean-square error sense) of the model-predicted response to a given stim-
ulus. This issue was discussed previously with regard to Wiener models and is examined
here with regard to the predictive ability of a CSRS model relative to a Volterra model of
the same order. To simplify the mathematical expressions (without loss of conceptual gen-
erality), we consider the zeroth-order model of the system in the previous analytical exam-
ple. The mean-square error (MSE) of the zeroth-order Volterra model prediction is

Q_V = E[y²(t)]     (2.135)

because we considered k_0 = 0, while the MSE of the zeroth-order CSRS model prediction
is

Q_C = E{[y(t) − g_0]²}
    = E[y²(t)] + g_0² − 2g_0 E[y(t)]     (2.136)

Therefore, the improvement in accuracy of the zeroth-order model prediction, using a
CSRS model instead of a Volterra model for an arbitrary input, is

i_0 = Q_V − Q_C
    = 2g_0 E[y(t)] − g_0²     (2.137)

If the input signal is the CSRS with which the model was estimated (a signal of power
level P_1 = M_2 Δt), then

i_0 = g_0²
    = [P_1 ∫_0^∞ k_2(τ_1, τ_1) dτ_1]² ≥ 0     (2.138)

As expected, we always have improvement in predicting the system output for a CSRS in-
put of the same power level with which kernel estimation was performed.
If other CSRS inputs of different power level P_i are used to evaluate the zeroth-order
model, then

i_0 = P_1(2P_i − P_1) [∫_0^∞ k_2(τ_1, τ_1) dτ_1]²     (2.139)

which is similar to the result for the Wiener model [see Equation (2.82)]. Thus, we can
have improvement or deterioration in the accuracy of the zeroth-order model prediction,
depending on the relative size of the power levels.
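Equation (2.139) makes the sign of this effect explicit: with I denoting the diagonal kernel integral, i_0 = P_1(2P_i − P_1)I² is positive whenever P_i > P_1/2 and negative otherwise. A minimal sketch (Python; all numerical values are arbitrary examples):

```python
def improvement(P1, Pi, I):
    """Eq. (2.139): MSE improvement of the zeroth-order CSRS model over the
    zeroth-order Volterra model.

    P1 is the power level used for kernel estimation, Pi the power level of
    the evaluation input, and I the integral of k2 along its diagonal.
    """
    return P1 * (2.0 * Pi - P1) * I**2

print(improvement(1.0, 1.0, 2.0))   # same power level: improvement (> 0)
print(improvement(1.0, 0.2, 2.0))   # much weaker test input: deterioration (< 0)
```

The crossover at P_i = P_1/2 is the quantitative content of the statement above: the CSRS model helps or hurts depending on how the test power level compares with the estimation power level.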
2.2 WIENER MODELS 89

If the input x(t) is an arbitrary signal, then

i_0 = P_1 ∫_0^∞ k_2(τ_1, τ_1) dτ_1 { 2μ ∫_0^∞ k_1(τ_1) dτ_1 + 2 ∫∫∫ k_3(τ_1, τ_2, τ_3) φ_3(τ_1, τ_2, τ_3) dτ_1 dτ_2 dτ_3

    + ∫∫ k_2(τ_1, τ_2) [2φ_2(τ_1, τ_2) − P_1 δ(τ_1 − τ_2)] dτ_1 dτ_2 }     (2.140)

where φ_2 and φ_3 are the second- and third-order autocorrelation functions of the input sig-
nal and μ is the input mean. Equation (2.140) clearly demonstrates the fact that the im-
provement (or deterioration) in the case of an arbitrary input signal depends upon the au-
tocorrelation functions of this signal. This establishes the important fact that the
performance of models obtained with quasiwhite test inputs (including band-limited
GWN) depends crucially on the relation of the autocorrelation functions of the specific in-
put (in combination with the system kernels) to those of the quasiwhite signal used to obtain the model.

Discrete-Time Representation of the CSRS Functional Series. Since sampled


data are used in practice to perform the modeling task, it is useful to examine the form of
the CSRS functional series in discrete time. This form is simplified when the sampling in-
terval T is equal to the CSRS step size Δt. Because of aliasing considerations, T cannot be
greater than Δt and, if T < Δt, the integrals of the continuous-time representation of the
CSRS functionals attain complicated discretized forms. Therefore, we adopt the conven-
tion that T = Δt in actual applications, which allows the conversion of the integrals of the
CSRS functionals into summations to yield the discrete-time representation (for Δt = T):

G̃_1(n) = T Σ_m g̃_1(m) x̃(n − m)     (2.141)

G̃_2(n) = T² Σ_{m_1} Σ_{m_2} g̃_2(m_1, m_2) x̃(n − m_1) x̃(n − m_2) − M_2 T² Σ_m g̃_2(m, m)     (2.142)

G̃_3(n) = T³ Σ_{m_1} Σ_{m_2} Σ_{m_3} g̃_3(m_1, m_2, m_3) x̃(n − m_1) x̃(n − m_2) x̃(n − m_3)

    − 3M_2 T³ Σ_m Σ_{m′} g̃_3(m, m′, m′) x̃(n − m) − (M_4/M_2 − 3M_2) T³ Σ_m g̃_3(m, m, m) x̃(n − m)     (2.143)

where the tilde denotes the discrete-time (sampled) counterparts of the continuous-time
variables. Aside from possible scaling factors involving T, the discretized values of the
kernels and input variables are simply the corresponding sampled values. These dis-
cretized forms should be used in practice to interpret the CSRS kernels and functionals
when the cross-correlation technique is used for CSRS kernel estimation.
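The discretized functionals above translate directly into code. The following sketch (Python; the kernel and all values are hypothetical) evaluates G̃_2(n) of Equation (2.142) once as a literal double sum and once in the factored form available for a separable kernel, confirming that the two agree:

```python
import numpy as np

rng = np.random.default_rng(2)
T = 1.0                                    # sampling interval = CSRS step
A = 0.5                                    # binary CSRS levels +/-A, M2 = A**2
x = A * rng.choice([-1.0, 1.0], size=500)

M = 20
h = np.exp(-np.arange(M) / 4.0)
g2 = np.outer(h, h)                        # hypothetical kernel g2(m1, m2) = h(m1) h(m2)

def G2_direct(n):
    """Eq. (2.142) evaluated literally as a double sum at time index n."""
    s = 0.0
    for m1 in range(M):
        for m2 in range(M):
            s += g2[m1, m2] * x[n - m1] * x[n - m2]
    return T**2 * s - (A**2) * T**2 * np.trace(g2)   # trace = sum of diagonal values

# for a separable kernel the double sum factors into a square
n = 300
v = T * sum(h[m] * x[n - m] for m in range(M))
G2_factored = v**2 - (A**2) * T**2 * np.sum(h**2)
print(G2_direct(n), G2_factored)   # identical up to round-off
```

The subtracted term (M_2 T² times the kernel trace) is what renders the second-order functional zero-mean over the CSRS ensemble, in keeping with the orthogonality construction of the series.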

Pseudorandom Signals Based on m-Sequences. In order to reduce the natural


redundancy of the random quasiwhite signals, while still preserving quasiwhite autocorre-
lation properties, one may employ specially crafted pseudorandom signals (PRSs) based
on m-sequences, which are deterministic periodic signals with quasiwhite autocorrelation
properties within a period of the PRS. These PRS signals are generated with linear auto-
recursive relations designed to yield sequences with maximum period (see below) [Zier-

ler, 1959; Gyftopoulos & Hooper, 1964; Barker, 1967; Golomb, 1967; Davies, 1970;
Moller, 1973; Sutter, 1975].
An important advantage of the PRS is the fact that their second-order autocorrelation
function is zero outside the neighborhood of the origin (zero lag) and within, of course,
the period of the signal (since the autocorrelation functions are also periodic). This is an
advantage over random quasiwhite signals (such as the CSRS), which exhibit small
nonzero values in this region of their second-order autocorrelation function for finite data
records. The latter cause some statistical error in CSRS kernel estimation (see Section
2.4.2). However, the PRS exhibit significant imperfections in their higher even-order au-
tocorrelation functions, which offset their superiority in the second-order autocorrelation
properties and may cause significant errors in the estimation of high-order kernels [Bark-
er & Pradisthayon, 1970]. For this reason, the PRS are most advantageous in identifica-
tion of linear systems, whereas the presence of nonlinearities in the system makes the
choice between random and pseudorandom quasiwhite test signals dependent upon the
specific characteristics of the system at hand.
The PRS exhibit the same stair-like form as the CSRS, i.e., they remain constant with-
in small time intervals defined by the step size Δt and switch abruptly at multiples of Δt.
Their values at each step are determined by a linear recurrence formula of the form

x_i = (a_1 ⊗ x_{i−1}) ⊕ (a_2 ⊗ x_{i−2}) ⊕ ··· ⊕ (a_m ⊗ x_{i−m})     (2.144)

where the coefficients a_j and the signal values x_i correspond to the elements of a finite
Galois field (i.e., a finite set whose number of elements is a power of a prime number). The op-
erations {⊗, ⊕} are defined to be internal operations of multiplication and addition for the
specified Galois field. For example, in the case of a binary pseudorandom signal, the Galois
field has two elements (0 and 1) and the operations ⊕ and ⊗ are defined to be modulo-2
addition and multiplication (i.e., corresponding to the "exclusive OR" and "AND" Boolean logic
operations, respectively), so that the outcome of the recurrence formula (2.144) is also an
element of the same Galois field (i.e., binary).
It is evident that a sequence {x_i} constructed on the basis of the linear recurrence for-
mula (2.144) is periodic, because the root string of m consecutive values of x_i will repeat
after a finite number of steps. The length of this period depends on the specific values of
the coefficients a_j and the order m of the recurrence formula (for a given Galois field).
Among all the sequences {x_i} constructed from the L members of a certain Galois field (L
being a power of a prime number) and with linear recurrence formulae of order m, there
are some that have the maximum period. Since L^m is the number of all possible distinct
arrangements with repetitions of L elements in strings of length m, this maximum period is
(L^m − 1), where the null string is excluded.
These maximum-period sequences are called "maximum-length" or "m-sequences,"
and they correspond to a special choice of the coefficients {a_1, ..., a_m} that coincide with
the coefficients of a primitive (or irreducible) polynomial of degree (m − 1) in the respec-
tive Galois field [cf. Zierler, 1959]. Thus, we can always select the number of elements L
and the order of the recurrence formula m in such a way that we get an m-sequence with a
desirable period (within the limitations imposed by the integer nature of m and the prime-power nature of L).
The generation of PRS in the laboratory is a relatively simple task. Suppose we have
decided upon the prime number L of amplitude values that the signal will attain and the
required maximum period (L^m − 1) [i.e., the order m of the linear recurrence formula
(2.144)]. Now, we only need the coefficients of a primitive polynomial of degree (m − 1)
in the respective L-element Galois field. If such a primitive polynomial is available (such
polynomials have been tabulated, cf. Church, 1935), then we choose an initial string of

values and construct the corresponding m-sequence. Any initial string (except the null
one) will give the same m-sequence (for a given set of coefficients a_j), merely shifted.
Specialized pieces of hardware can be used for real-time generation. For example, a bi-
nary m-sequence can be generated through a digital shift register, composed of a cascade
of flip-flops (0 or 1) and an "exclusive OR" feedback connection, as shown in Figure
2.14. Upon receipt of a shift (or clock) pulse, the content of each flip-flop is transferred to
its neighbor and the input to the first stage is received from the output of the "exclu-
sive OR" gate (which generates a 0 when the two inputs are the same, or a 1 otherwise).
We note that a 15-bit m-sequence can be generated by this four-stage shift register,
corresponding to the maximum number (2⁴ − 1) of possible four-bit binary numbers, ex-
cept for the null one (since, if 0000 ever occurred, the output thereafter would be zero).
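A software analog of the shift register in Figure 2.14 is a few lines of code. In the sketch below (Python), the feedback is assumed, for illustration, to be the "exclusive OR" of stages 3 and 4 (a primitive choice for a four-stage register), which yields the full period of 2⁴ − 1 = 15 bits:

```python
def shift_register(taps, state, nbits):
    """Binary m-sequence generator: shift register with XOR ("exclusive OR")
    feedback from the listed stages (1-indexed), per Eq. (2.144) over GF(2)."""
    state = list(state)
    out = []
    for _ in range(nbits):
        out.append(state[-1])                    # output taken from the last stage
        feedback = 0
        for t in taps:
            feedback ^= state[t - 1]             # modulo-2 sum of the tapped stages
        state = [feedback] + state[:-1]          # shift contents by one position
    return out

seq = shift_register(taps=[3, 4], state=[1, 0, 0, 0], nbits=30)
print(seq[:15])   # one full period of the 15-bit m-sequence
```

Any nonzero initial state produces the same periodic sequence, merely shifted; the all-zero state is the one fixed point that must be avoided, as noted above.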
Such pseudorandom sequences, produced by shift registers with feedback, have been
studied extensively (cf. Davies, 1970; Golomb, 1967; Barker, 1967) in connection with
many engineering applications, especially in communication systems (spread-spectrum
communications and CDMA protocols of wireless telephony). Table 2.1 gives the possi-
ble stage numbers in the shift register from which the output, along with the output from
the last stage, could be fed into the "exclusive-OR" gate and fed back into the first stage,
in order to obtain a maximum-length binary sequence [Davies, 1970].
The quasiwhiteness of a pseudorandom signal (and, consequently, its use for
Wiener/CSRS kernel estimation in connection with the cross-correlation technique) is due
to the shift-and-add property of the m-sequences (cf. Ream, 1970). According to this
property, the product (of the proper modulo) of any number of sequence elements is an-
other sequence element:

x_{k−j_1} ⊗ x_{k−j_2} ⊗ ··· ⊗ x_{k−j_m} = x_{k−j_0}     (2.145)

where j_0 depends on j_1, j_2, ..., j_m but not on k. A slight modification must be made in the
m-sequences with even numbers of levels in order to possess the antisymmetric property,
which entails the inversion of every other bit of the m-sequence (doubling the period of
the sequence) and makes the odd-order autocorrelation functions uniformly zero. As a re-
sult of the shift-and-add property and the basic structural characteristics of the m-se-
quences (i.e., maximum period and antisymmetry), the odd-order autocorrelation func-
tions are uniformly zero everywhere (within a period of the signal) and the even-order
ones approximate quasiwhiteness.
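For a binary m-sequence, the shift-and-add property takes a directly testable form: the bit-by-bit modulo-2 sum of the sequence and any circularly shifted copy of itself is again a circular shift of the same sequence. A sketch (Python, regenerating a minimal four-stage m-sequence with illustrative taps):

```python
def msequence(taps, state, n):
    """n bits of a binary m-sequence from an XOR shift register (1-indexed taps)."""
    state, out = list(state), []
    for _ in range(n):
        out.append(state[-1])
        fb = 0
        for t in taps:
            fb ^= state[t - 1]
        state = [fb] + state[:-1]
    return out

seq = msequence([3, 4], [1, 0, 0, 0], 15)        # one period, 2**4 - 1 = 15 bits
shifts = [seq[k:] + seq[:k] for k in range(15)]  # all circular shifts of the period

for j in range(1, 15):
    summed = [a ^ b for a, b in zip(seq, shifts[j])]   # shift-and-add over GF(2)
    print(summed in shifts)    # shift-and-add: the sum is again a shift of seq
```

It is this closure under shifted addition that underlies the favorable second-order autocorrelation properties exploited in cross-correlation kernel estimation.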
Note that the even-order autocorrelation functions of order higher than second exhibit
some serious imperfections (termed "anomalies" in the literature), which constitute a seri-
ous impediment in the use of PRS test inputs for nonlinear system identification using the


Figure 2.14 Shift register with "exclusive OR" feedback for the generation of pseudorandom se-
quences [Marmarelis & Marmarelis, 1978].

Table 2.1 Stages that Can Be Combined with the Last Stage of Various Shift Registers to
Generate Maximum-Length Binary Sequences*
Number of stages Stage number Sequence length
in shift-register giving feedback in bits
 5                2                 31
 6                1                 63
 7                1 or 3            127
 9                4                 511
10                3                 1,023
11                2                 2,047
15                1, 4, or 7        32,767
18                7                 262,143
20                3                 1,048,575
21                2                 2,097,151
22                1                 4,194,303
23                5 or 9            8,388,607
25                3 or 7            33,554,431
28                3, 9, or 13       268,435,455
31                3, 6, 7, or 13    2,147,483,647
33                13                8,589,934,591
*From Davies (1970).
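The entries of Table 2.1 can be verified by brute force: starting from any nonzero state, the register must return to that state after exactly 2^n − 1 shifts and not sooner. A sketch for the first three rows (Python; the feedback is assumed to be the listed stage XORed with the last stage, following the table's convention):

```python
def period(nstages, tap):
    """Number of shifts before a register with XOR feedback from stages
    `tap` and `nstages` first returns to its starting state."""
    start = (1,) + (0,) * (nstages - 1)
    state, count = start, 0
    while True:
        fb = state[tap - 1] ^ state[-1]        # feedback: listed stage XOR last stage
        state = (fb,) + state[:-1]             # shift contents by one position
        count += 1
        if state == start:
            return count

for n, tap in [(5, 2), (6, 1), (7, 3)]:
    print(n, period(n, tap), 2**n - 1)   # period should equal 2**n - 1
```

Because the state update is invertible, the state trajectory is purely periodic, so the first return time is the period; for a maximal-length configuration it visits all 2^n − 1 nonzero states.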

cross-correlation technique [Barker et al., 1972]. These anomalies, first observed by


Gyftopoulos and Hooper (1964, 1967), have been studied extensively by Barker and
Pradisthayon (1970). Barker et al. (1972) studied several PRS (binary, ternary, and
quinary) and compared their relative performance, showing that these anomalies are due to
existing linear relationships among the elements of the m-sequence and that their exact po-
sition and magnitude can be determined from the generating recurrence equation through a
laborious algorithm related to polynomial division. Since these anomalies are proven to be
inherent characteristics of the m-sequences related to their mathematical structure, their ef-
fect can be anticipated and potentially mitigated through an elaborate iterative scheme.
The kernels estimated by the use of a PRS and the corresponding functional series are
akin to the CSRS kernels of the associated multilevel amplitude distribution correspond-
ing to the specific PRS [Marmarelis & Marmarelis, 1978].

Comparative Use of GWN, PRS, and CSRS. The quasiwhite test input signals that
have been used so far to estimate the Wiener/CSRS kernels of nonlinear systems through
cross-correlation are band-limited GWN, PRS, and CSRS. Each one of these classes of
quasiwhite input signals exhibits its own characteristic set of advantages and disadvan-
tages, summarized below.

For the GWN. The main advantage of band-limited GWN derives from its Gaussian na-
ture, which secures the simplest expressions for the orthogonal functional series (the
Wiener series) because of the decomposition property of Gaussian random variables
(which allows all the even-order autocorrelation functions to be expressed in terms of the
second-order one). Additionally, the use of a GWN test input avoids the estimation prob-
lems at diagonal kernel points, associated with the use of binary or ternary PRS/CSRS.

The main disadvantages of GWN are the actual generation and application in the labo-
ratory (including the unavoidable truncation of the Gaussian amplitude distribution), as
well as the imperfections in the autocorrelation functions due to its stochastic nature and
the finite data records.

For the PRS. There are two main advantages of binary or ternary PRS:

1. Easy generation and application.


2. Short records required to form the desirable autocorrelation functions.

The main disadvantages ofPRS are:

1. The anomalies in their higher (>2) even-order autocorrelation functions, which may
induce considerable kernel estimation errors if the system contains significant non-
linearities.
2. The inability to estimate the diagonal kernel values using binary PRS.

For the CSRS. The main advantages of the CSRS are:

1. Easy generation and application.


2. Their autocorrelation functions do not exhibit any "anomalies," as in the case of PRS.
3. Error analysis is facilitated by the simple structure of CSRS, allowing the design of
an optimum test input.
4. The user is given flexibility in choosing the signal with the number of levels and
amplitude PDF that best fits the specific case at hand.

The main disadvantages of the CSRS are:

1. They require fairly long records in order to reduce the statistical error in the kernel
estimates (as in the case of GWN).
2. The analytical expressions concerning the corresponding functional series and ker-
nels are fairly complicated (e.g., relation of CSRS kernels with Volterra kernels,
normalizing factors of the cross-correlation estimates, etc.).
3. The inability to estimate the diagonal kernel values using binary CSRS. Note that
for a ternary test input (CSRS or PRS), this inability concerns the estimation of the
third-order kernel main diagonal (or higher-order diagonals).

Besides these basic advantages and disadvantages of GWN, PRS, and CSRS, there
may be other factors that become important in a specific application because of particular
experimental or computational considerations [Marmarelis, 1978a,b].

2.2.5 Apparent Transfer Function and Coherence Measurements


One of the questionable habits forced upon investigators by the lack of effective method-
ologies for the study of nonlinear systems is the tendency to "linearize" physiological sys-
tems with intrinsic nonlinearities by uncritically applying linear modeling methods in the
frequency domain. An "apparent transfer function" measurement is typically sought in
94 NONPARAMETRIC MODELING

those cases, often accompanied by "coherence function" measurements in order to test the
validity of the linear assumption or establish the extent of the "linearized" approximation.
Specifically, the "coherence function" is computed over the entire frequency range of in-
terest using Equation (2.149), and the proximity of its values to unity is examined (coher-
ence values are by definition between 0 and 1). If the coherence values are found to be
close to unity over the frequency range of interest, then the inference is made that the lin-
ear time-invariant assumption is valid and the noise content of the experimental data is
low. In the opposite case, the reduced coherence is thought to indicate the presence of ei-
ther system nonlinearities (and/or nonstationarities) and/or high noise content in the data.
Distinguishing among those possible culprits for the reduction in coherence values re-
quires specialized testing and analysis (e.g., repetition of identical experiments and aver-
aging to reduce possible noise effects, or application of nonlinear/nonstationary analysis
to assess the respective effects). As a rule of thumb, coherence values above 0.8 are
thought to validate the linearized approximation.
In this section, we examine the apparent transfer function and coherence function mea-
surements in the general framework of nonlinear systems with GWN inputs following the
Wiener approach [Marmarelis, 1988a], thus offering a rigorous guide for the proper inter-
pretation of these two popular experimental measurements.
In the Wiener series representation of nonlinear systems/models, the Wiener function-
als are constructed to be mutually orthogonal (or have zero covariance) for a GWN input
with power level P. Thus, using this statistical orthogonality (zero covariance) and the ba-
sic properties of high-order autocorrelation functions of GWN summarized in Appendix
II, we can find the output spectrum to be

S_y(f) = h_0² δ(f) + P|H_1(f)|²

    + Σ_{r=2}^∞ r! P^r ∫_{−∞}^{∞} ··· ∫ |H_r(u_1, ..., u_{r−1}, f − u_1 − ··· − u_{r−1})|² du_1 ··· du_{r−1}     (2.146)

where H_r is the r-dimensional Fourier transform of the rth-order Wiener kernel h_r of the sys-
tem, and f denotes frequency in Hz. Since the input-output cross-correlation is the first-
order Wiener kernel scaled by P [see Equation (2.98)], its Fourier transform yields the
cross-spectrum when the input is GWN:

S_yw(f) = P H_1(f)     (2.147)

Therefore, the commonly evaluated "apparent transfer function" (ATF) is the first-order
Wiener kernel in the frequency domain:

Ĥ_app(f) = S_yw(f)/S_w(f) = H_1(f)     (2.148)

since Sw(f) = P for a GWN input. The coherence function becomes

γ²(f) = |S_yw(f)|² / [S_w(f) S_y(f)]

    = |H_1(f)|² / { |H_1(f)|² + Σ_{r=2}^∞ r! P^{r−1} ∫ ··· ∫ |H_r(u_1, ..., u_{r−1}, f − u_1 − ··· − u_{r−1})|² du_1 ··· du_{r−1} }     (2.149)

for all frequencies other than f = 0. Since the summation in the denominator of Equation
(2.149) includes only nonnegative terms, it is clear that the coherence function will be
less than unity to the extent determined by the GWN input power level and the indicated
integrals of the high-order kernels of the system. Equation (2.149) shows that for a linear
system or very small P, the coherence function is close to unity.
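These effects are easy to reproduce in simulation. The sketch below (Python with NumPy/SciPy; the filter and the quadratic coefficient are arbitrary choices, not taken from the text) drives a linear system and a quadratic L-N cascade with the same approximately white Gaussian input and compares Welch-type coherence estimates. The noiseless linear system gives coherence near unity, whereas the quadratic term depresses it well below unity even though no noise is present:

```python
import numpy as np
from scipy import signal

rng = np.random.default_rng(3)
n = 2**16
x = rng.standard_normal(n)               # approximately white Gaussian input

b, a = signal.butter(2, 0.2)             # arbitrary linear stage g(tau)
v = signal.lfilter(b, a, x)

y_lin = v                                # linear system
y_quad = v + v**2                        # L-N cascade with a quadratic term

f, c_lin = signal.coherence(x, y_lin, nperseg=1024)
f, c_quad = signal.coherence(x, y_quad, nperseg=1024)

band = f < 0.15                          # in-band frequencies (normalized, fs = 1)
print(c_lin[band].mean(), c_quad[band].mean())
```

The reduction for y_quad mirrors Equation (2.149): the quadratic functional contributes output power that is uncorrelated with the input and therefore acts, from the viewpoint of a linear analysis, like noise.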
Note that the coherence function is further reduced in the presence of noise. For in-
stance, in the presence of output-additive noise n(t) the output signal is

ỹ(t) = y(t) + n(t)     (2.150)

which leaves the input-output cross-spectrum unchanged (S_ỹw ≡ S_yw) if the noise has zero
mean and is statistically independent of the input signal, but the output spectrum be-
comes

S_ỹ(f) = S_y(f) + S_n(f)     (2.151)

where Sn(f) is the noise spectrum. Thus, the coherence function for the noise-contaminat-
ed output data is reduced according to the relation

γ̃²(f) = γ²(f) S_y(f) / [S_y(f) + S_n(f)]     (2.152)
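Equation (2.152) gives a quick quantitative feel for the rule of thumb quoted earlier. A one-line sketch (Python; the SNR value is an arbitrary example):

```python
def noisy_coherence(gamma2, snr):
    """Eq. (2.152) with snr = Sy/Sn: coherence after output-additive noise."""
    return gamma2 * snr / (snr + 1.0)

# even a perfectly linear, noiseless-system coherence (gamma2 = 1) drops to the
# 0.8 "rule of thumb" threshold when the output signal-to-noise ratio is 4:1
print(noisy_coherence(1.0, 4.0))   # prints 0.8
```

This is one reason reduced coherence by itself cannot distinguish noise contamination from genuine nonlinearity: either mechanism lowers the measured values.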

Since the ATF equals the Fourier transform of the first-order Wiener kernel, we can ex-
press it in terms of the Volterra kernels of the system:

H_app(f) = Σ_{m=0}^∞ [(2m + 1)!/(m! 2^m)] P^m ∫_{−∞}^{∞} ··· ∫ K_{2m+1}(f, u_1, −u_1, ..., u_m, −u_m) du_1 ··· du_m     (2.153)

which indicates that the ATF of a nonlinear system depends on all odd-order Volterra ker-
nels of the system and the GWN input power level. Therefore, measurements of H_app(f)
with GWN inputs differ from the linear transfer function of the system [represented by
the first-order Volterra kernel K_1(f)] and vary with the power level of the GWN input as
a power series, according to Equation (2.153).
In many studies of physiological systems, experimental considerations dictate that
GWN test inputs of various nonzero mean levels be used (e.g., in the visual system the in-
put light intensity can assume, physically, only positive values). In those cases, the com-
putation of the Wiener kernels requires de-meaning of the input and the resulting kernel
estimates depend on the input mean level.
Thus, the ATF in this case becomes

H^μ_app(f) = Σ_{m=0}^∞ Σ_{ℓ=0}^∞ [(2m + ℓ + 1)!/(m! ℓ!)] (P/2)^m (μ − μ_0)^ℓ

    ∫_{−∞}^{∞} ··· ∫ K^0_{2m+ℓ+1}(f, u_1, −u_1, ..., u_m, −u_m, 0, ..., 0) du_1 ··· du_m     (2.154)

where μ is the nonzero mean of the GWN test input and μ_0 is the reference mean level
(possibly, but not necessarily, zero) for the nominal Volterra kernels {K^0_i}. The coher-

ence function measurement is also affected, because the Wiener kernel estimates depend
on the GWN input mean μ used in the experiment.

Example 2.5. L-N Cascade System


As an illustrative example, we consider the case of a simple L-N cascade of a linear filter,
with impulse response function g(τ), followed by a static nonlinearity that is represented
by a Taylor series expansion with coefficients {a_r}. The system output is

y(t) = Σ_r a_r v^r(t) = Σ_r a_r [∫_0^∞ g(τ) x(t − τ) dτ]^r     (2.155)

where v(t) is the output of the linear filter.


The Volterra kernels of this system in the frequency domain are (see Section 4.1.2)

K_r(f_1, ..., f_r) = a_r G(f_1) ··· G(f_r)     (2.156)

and the ATF can be found using Equation (2.153) to be

H_app(f) = { Σ_{m=0}^∞ [(2m + 1)!/(m! 2^m)] P^m a_{2m+1} [∫_{−∞}^{∞} |G(u)|² du]^m } G(f)

        = c · G(f)     (2.157)

i.e., it is a scaled version of G(f), which is the transfer function of the linear component of
the cascade. The scaling factor c depends on the static nonlinearity, the input power level,
and the Euclidean norm of G(f).
The coherence function in this example is found from Equations (2.157) and (2.149) to
be (for f ≠ 0)

γ²(f) = c² |G(f)|² / { c² |G(f)|² + Σ_{r=2}^∞ r! P^{r−1} A_r² ∫ ··· ∫ |G(u_1) ··· G(u_{r−1}) G(f − u_1 − ··· − u_{r−1})|² du_1 ··· du_{r−1} }

(2.158)

where

A_r = Σ_{m=0}^∞ [(r + 2m)!/(r! m! 2^m)] P^m a_{r+2m} [∫ |G(u)|² du]^m     (2.159)

This indicates that, even in this relatively simple case, the coherence function is a rather
complicated expression, dependent on the system nonlinearities and the input power level
in the manner described by the denominator of Equation (2.158). To reduce the complex-
ity, let us consider a quadratic nonlinearity (i.e., a_r = 0 for r > 2). Then,

H_app(f) = a_1 G(f)     (2.160)



and:

γ²(f) = |G(f)|² / { |G(f)|² + 2P (a_2/a_1)² ∫ |G(u) G(f − u)|² du }     (2.161)

for f ≠ 0. Equation (2.161) indicates clearly that the quadratic nonlinearity reduces the
coherence values more as P and/or (a_2/a_1)² increase.
An illustration of this is given in Figure 2.15, where part (a) shows the ATF measure-
ments of a quadratic cascade system for three different values of GWN input power level
(P = 0.25, 1, and 4) obtained from input-output records of 4,096 data points each. We ob-
serve slight variations in the ATF measurements due to the finite data record, which leads
to imperfectly formed averages. Part (b) of Figure 2.15 shows the coherence function
measurements obtained from the same input-output data records and values of P. The ob-
served effect of P on the coherence function measurement is in agreement with Equation
(2.161) [Marmarelis, 1988a].

Example 2.6. Quadratic Volterra System


Since quadratic nonlinear systems are quite common in physiology (e.g., in the visual sys-
tem), we use them in a second example that is more general than the first in that the sys-
tem is described by two general kernels (K}, K 2 ) and is not limited to a cascade arrange-
ment. Note that these kernels are the same for the Volterra and the Wiener expansions, a
fact that holds for all second-order systems (K_r ≡ 0 for r > 2). For this general second-or-
der system, we have (for f ≠ 0)

H_app(f) = K_1(f)     (2.162)

γ²(f) = |K_1(f)|² / { |K_1(f)|² + 2P ∫ |K_2(u, f − u)|² du }     (2.163)

Equation (2.162) shows that the ATF in this case is identical to the linear portion (first-or-
der Volterra kernel) of the system. However, if a nonzero-mean GWN test input is used
(as in the visual system), then we can find from Equation (2.154) that

H^μ_app(f) = K^0_1(f) + 2(μ − μ_0) K^0_2(f, 0)     (2.164)

i.e., the ATF measurement is affected by the second-order kernel and the nonzero mean of
the GWN test input signal.
Equation (2.163) shows that the coherence function values are reduced as the relative
nonlinear contribution increases. The latter is different for different GWN input mean lev-
el μ, since the corresponding K^μ_1(f) or H^μ_1(f) will vary with μ as

K^μ_1(f) = K^0_1(f) + 2(μ − μ_0) K^0_2(f, 0)     (2.165)

while K_2 (or H_2) remains the same for all values of μ.



Figure 2.15 (a) The gain of the apparent transfer function of a quadratic cascade system for three
different power levels of GWN input (P = 0.25, 1, and 4); (b) the coherence function measurements
for the three GWN power input levels [Marmarelis, 1988a].

Example 2.7. Nonwhite Gaussian Inputs


We can use the case of quadratic nonlinear systems to demonstrate also the effect of non-
linearities on coherence and apparent transfer function measurements when the input x(t)
is zero-mean Gaussian but nonwhite. In this case, the cross-spectrum is

S_yx(f) = K_1(f) S_x(f)     (2.166)

and the output spectrum is (for f ≠ 0)

S_y(f) = |K_1(f)|² S_x(f) + 2 ∫_{−∞}^{∞} |K_2(u, f − u)|² S_x(u) S_x(f − u) du     (2.167)

where S_x is the input spectrum.


Consequently, the ATF for quadratic systems is exactly the linear component of the
system, K_1(f), regardless of the whiteness of the input. On the other hand, the coherence
function is (for f ≠ 0)

γ²(f) = 1 / { 1 + 2 ∫_{−∞}^{∞} |K_2(u, f − u)/K_1(f)|² [S_x(u) S_x(f − u)/S_x(f)] du }     (2.168)

which is clearly less than unity to the extent determined by the degree of nonlinearity and
the input spectrum, as indicated in the second term of the denominator.

Example 2.8. Duffing System


As a final example, let us consider a system described by a relatively simple nonlinear
differential equation:

Ly + ay³ = x     (2.169)

where L is a linear differential operator of qth order with constant coefficients [i.e., L(D)
= c_q D^q + ··· + c_1 D + c_0, where D denotes the differential operator d(·)/dt]. When L is of
second order, Equation (2.169) is the Duffing equation, popular in nonlinear mechanics
because it describes a mass-spring system with cubic elastic characteristics.
The Volterra kernels of this nonlinear differential system are [Marmarelis et al., 1979]

K_1(f) = 1/L(f)     (2.170)

K_2(f_1, f_2) = 0     (2.171)

K_3(f_1, f_2, f_3) = −a K_1(f_1) K_1(f_2) K_1(f_3) K_1(f_1 + f_2 + f_3)     (2.172)

The general expression for the odd-order Volterra kernels of this system is

K_{2r+1}(f_1, ..., f_{2r+1}) = [3(−a)^r / (4r² − 1)] K_1(f_1) ··· K_1(f_{2r+1}) K_1(f_1 + ··· + f_{2r+1}) Σ_{j_1, j_2, j_3} K_1(f_{j_1} + f_{j_2} + f_{j_3})

(2.173)

where the summation is taken over all combinations of three indices from the integers 1
through (2r + 1). All the even-order Volterra kernels of this system are zero. Since the
(2r + 1)th-order Volterra kernel is proportional to a^r, we can simplify the equivalent
Volterra model when a ≪ 1 by neglecting the odd-order Volterra kernels of order higher
than third. Then,

H_app(f) ≅ K_1(f) − A K_1²(f) = [1 − A K_1(f)] K_1(f)     (2.174)

and

γ²(f) = |1 − A K_1(f)|² / { |1 − A K_1(f)|² + 6a²P² ∫∫ |K_1(u_1) K_1(u_2) K_1(f − u_1 − u_2)|² du_1 du_2 }     (2.175)

where

A = 3Pa ∫ |K_1(u)|² du     (2.176)

Note that Equation (2.174) provides a practical tool for exploring a class of nonlinear
feedback systems, as discussed in Section 4.1.5.

Concluding Remarks. In conclusion, it has been shown that for the case of GWN in-
puts:

1. The coherence function measurements reflect the presence of high-order (nonlin-


ear) kernels and depend on the GWN input power level. Specifically, the coherence
values are reduced from unity as the input power level P and/or the degree of non-
linearity increase.

2. The apparent transfer function is identical to the Fourier transform of the first-order
Wiener kernel of the system, which is not, in general, the same as the linear compo-
nent of the system (formally represented by the first-order Volterra kernel). The ap-
parent transfer function depends on the system odd-order nonlinearities and on the
power level of the GWN input.
3. In those physiological studies where the physical test input is a GWN perturbation
around a nonzero mean μ (e.g., in vision), the apparent transfer function and the co-
herence function measurements depend on μ. Therefore, they will vary as μ may
vary from experiment to experiment.
4. The same observations hold true for nonwhite broadband inputs, where, additional-
ly, the specific form of the input spectrum affects the apparent transfer function (un-
less it is a second-order system) and the coherence measurements.

It is hoped that the presented analysis will assist investigators of physiological systems in interpreting their experimental results, obtained with these traditional frequency-domain measurements of the apparent transfer function and the coherence function, in the presence of intrinsic system nonlinearities.

2.3 EFFICIENT VOLTERRA KERNEL ESTIMATION

In Section 2.2, we described methods for the estimation of the Wiener kernels of a nonlinear system that receives a GWN (or quasiwhite) input. For such inputs, the most widely used method to date has been the cross-correlation technique, in spite of its numerous limitations and inefficiencies. Part of the reason is that, although the Wiener functional terms are decoupled (i.e., they are orthogonal for GWN inputs), their orthogonality (zero covariance) is approximate in practice, since their covariance is not exactly zero for finite data records, which causes estimation errors in the Wiener kernel estimates. Additional estimation errors associated with the cross-correlation technique are caused by the finite bandwidth of the input and the presence of noise/interference. These errors depend on the system characteristics and decrease with increasing data-record length, as discussed in Section 2.4.2.

The most important limitations of the cross-correlation technique are: (a) the stringent requirement of a band-limited white-noise input; (b) the input dependence of the estimated Wiener (instead of Volterra) kernels; (c) the experimental and computational burden of long data records; and (d) the considerable estimation variance of the obtained kernel estimates (especially of order higher than first). The latter two limitations are an inevitable consequence of the stochastic nature of the employed GWN (or quasiwhite) input and the fact that the cross-correlation estimates are computed from input-output data records of finite length. These estimates converge to the true values at a rate proportional to the square root of the record length. Note that this limitation also applies to the initially proposed Wiener implementation using time-averaging computation of the covariance between any two Wiener functionals over finite data records [Marmarelis & Marmarelis, 1978]. Thus, long data records are required to obtain cross-correlation or covariance estimates of satisfactory accuracy, resulting in heavy experimental and computational burden.
In addition to these various estimation errors, the Wiener kernels are not input-independent; therefore, the Volterra kernels are deemed more desirable in actual applications from the viewpoint of physiological interpretation. Methods for Volterra kernel estimation were discussed in Section 2.1.5, but they were also shown to exhibit practical limitations and inefficiencies. More efficient methods for Volterra kernel estimation are discussed in this section. These methods can be applied to systems with arbitrary (broadband) inputs typical of spontaneous/natural operation, thus removing a serious practical limitation of the Wiener approach (i.e., the requirement for white or quasiwhite inputs) or the need for specialized inputs (e.g., impulses or sinusoids).
It is evident that, in practice, we can obtain unbiased estimates of the Volterra kernels only for systems of finite order. These Volterra kernel estimates do not depend on input characteristics when a complete (nontruncated) model is obtained for the system at hand. However, the Volterra kernel estimates for a truncated model generally have biases resulting from the correlated residuals, which depend on the specific input used for kernel estimation. Thus, the attraction of obtaining the Volterra kernels of a system directly has been tempered by the fact that model truncation is necessitated in the practical modeling of high-order systems, which in turn introduces input-dependent estimation biases, because the residuals are correlated and input-dependent since the functional terms of the Volterra models are coupled.
Naturally, the biases and other estimation errors of the Volterra kernels depend on the specific characteristics of each application (e.g., system order, data-record length, input characteristics) and should be minimized in each particular case. To accomplish this error minimization, we must have a thorough understanding of the methodological issues regarding kernel estimation in a general, yet realistic, context.
The emphasis here is placed on the most promising methods of Volterra kernel estimation, which are applicable to nearly arbitrary inputs and high-order models. These methods employ Volterra kernel expansions, as well as direct-inversion and iterative estimation methods in connection with equivalent network model structures.

2.3.1 Volterra Kernel Expansions


The introduction of the kernel expansion approach was prompted by the fact that the use of the cross-correlation technique for kernel estimation revealed serious practical problems with the estimation of multidimensional (high-order) kernels. These problems are rooted in the unwieldy multidimensional representation of high-order kernels and the statistical variation of covariance estimates.
The Volterra kernel expansion approach presented in this section mitigates some of the problems of kernel estimation by compacting the kernel representation and avoiding the computation of cross-correlation (or covariance) averages, performing, instead, least-squares fitting of the actual data to estimate the expansion coefficients of the Volterra kernels. This alternative strategy also removes the whiteness requirement on the experimental input, although broadband inputs remain desirable. The difference between this approach and the direct least-squares estimation of Volterra kernels discussed in Section 2.1.5 is that the representation of the Volterra kernels in expansion form is more compact than the discrete-time formulation of Section 2.1.5, especially for systems with large memory-bandwidth products, as discussed below. The more compact kernel representations result in higher estimation accuracy and reduced computational burden, especially in high-order kernel estimation.
The basic kernel expansion methodology employs a properly selected basis of L causal
102 NONPARAMETRIC MODELING

funetions {bj ( T)} defined over the kernel memory, whieh ean be viewed as the impulse re-
sponse funetions of a linear filterbank reeeiving the input signal x(t). This basis is as-
sumed to span eompletely and effieiently (i.e., with fast eonvergenee) the kerne I function
spaee over [0, J.L]. The outputs {vj(t)} of this filterbank (j = 1, ... , L) are given by

vj(t) = ∫0^μ bj(τ) x(t − τ) dτ    (2.177)

and can be used to express the system output y(t) according to the Volterra model of
Equation (2.5), through a multinomial expression:

y(t) = k0 + Σ_{r=1}^{Q} Σ_{j1=1}^{L} ... Σ_{jr=1}^{L} ar(j1, ..., jr) vj1(t) ... vjr(t)    (2.178)

which results from substituting the kernels in Equation (2.5) with their expansions:

kr(τ1, ..., τr) = Σ_{j1=1}^{L} ... Σ_{jr=1}^{L} ar(j1, ..., jr) bj1(τ1) ... bjr(τr)    (2.179)

where {ar} are the coefficients of the rth-order kernel expansion. Note that this modified expression of the Volterra model (using the kernel expansions) is isomorphic to the block-structured model of Figure 2.16, which is equivalent to the Volterra class of systems if and only if the kernel expansions of Equation (2.179) hold for all r. In other words, the selected basis {bj(τ)} must be complete for the expansion of the particular kernels of the system under study, or at least provide adequate approximations for the requirements of the study. The kernel estimation problem thus reduces to the estimation of the unknown expansion coefficients using the expression (2.178), which is linear in terms of the unknown coefficients. Note that the signals vj(t) are known as convolutions of the

[Figure: the input x(t) feeds a linear filterbank with outputs v1(t), ..., vL(t), which drive a multi-input static nonlinearity f[·], producing y(t) = f[v1(t), ..., vL(t)].]

Figure 2.16 The modular form of the Volterra model, akin to the modular Wiener model of Figure 2.9. This modular model is isomorphic to the modified Volterra model of Equation (2.178), where the multi-input static nonlinearity f[·] is represented/approximated by a multinomial expansion.

input signal with the selected filterbank [see Equation (2.177)], but are available only in sampled (discretized) form in practice.
For this reason, the advocated methodology is cast in a discrete-time context by letting n replace t, m replace τ, and summation replace integration. This leads to the modified discrete Volterra (MDV) model:

y(n) = c0 + Σ_{r=1}^{Q} Σ_{j1=1}^{L} Σ_{j2=1}^{j1} ... Σ_{jr=1}^{j_{r−1}} cr(j1, ..., jr) vj1(n) ... vjr(n) + ε(n)    (2.180)

where the expansion coefficients {cr} take into account the symmetries of the Volterra kernels, that is,

cr(j1, ..., jr) = Λr ar(j1, ..., jr)    (2.181)

where Λr depends on the multiplicity of the specific indices (j1, ..., jr). For instance, if all indices are distinct, then Λr = r!; if the indices form p groups with multiplicities mi (i = 1, ..., p, and m1 + ... + mp = r), then Λr = r!/(m1! ... mp!). The error term ε(n) incorporates possible model truncation errors and noise/interference in the data. The filterbank outputs are
vj(n) = T Σ_{m=0}^{M−1} bj(m) x(n − m)    (2.182)

where M denotes the memory-bandwidth product of the system and T is the sampling interval.
For estimation efficiency, the number of required basis functions L must be much smaller than the memory-bandwidth product M of the system, which is the number of discrete samples required for representing each kernel dimension. With regard to the estimation of the unknown expansion coefficients, the methods discussed in Section 2.1.5 apply here as well, since the unknown coefficient vector must be estimated from the matrix equation:

y = Vc + ε    (2.183)

where the matrix V is constructed from the outputs of the filterbank according to Equation (2.180). Note that the symmetries of the kernels have been taken into account by requiring that ji ≤ ji−1 in Equation (2.180). For instance, for a second-order system, the nth row of the matrix V is

{1, v1(n), ..., vL(n), v1²(n), v2(n)v1(n), ..., vL(n)v1(n), v2²(n), v3(n)v2(n), ..., vL(n)vL−1(n), vL²(n)}

The number of columns in matrix V for a Qth-order MDV model is

P = (L + Q)!/(L!Q!)    (2.184)

and the number of rows is equal to the number of output samples N. Solution of the estimation problem formulated in Equation (2.183) can be achieved by direct inversion of the square Gram matrix G = [V'V] if it is nonsingular, or through pseudoinversion of the rectangular matrix V if G is singular (or ill-conditioned), as discussed in Section 2.1.5. Iterative schemes based on gradient descent can also be used instead of matrix inversion, especially if the residuals are non-Gaussian, as discussed in Section 2.1.5. The number of columns of the matrix V determines the computational burden in the direct-inversion approach and depends on the parameters L and Q, as indicated in Equation (2.184). Therefore, it is evident that practical solution of this estimation problem is subject to the "curse of dimensionality" for high-order systems, since the number of columns P in matrix V increases geometrically with Q or L. For this reason, our efforts should aim at minimizing L by judicious choice of the expansion basis, since Q is an invariant system characteristic.

If the matrix V is full-rank, then the coefficient vector can be estimated by means of ordinary least squares as

c = [V'V]⁻¹V'y    (2.185)

This unbiased and consistent estimate is also efficient (i.e., it has minimum variance among all linear estimators) if the residuals are white (i.e., statistically independent with zero mean) and Gaussian. If the residuals are not white, then the generalized least-squares estimate of Equation (2.46) can be used, as discussed in Section 2.1.5. In that same section, an iterative procedure was described for obtaining efficient estimates when the residuals are not Gaussian, by minimizing a cost function defined by the log-likelihood function (not repeated here in the interest of space).
A practical problem arises when the matrix V is not full-rank or, equivalently, when the Gram matrix G is singular. In this case, a generalized inverse (or pseudoinverse) V⁺ can be used to obtain the coefficient estimates as [Fan & Kalaba, 2003]:

c = V⁺y    (2.186)

Another problematic situation arises when the Gram matrix G is ill-conditioned (a frequent occurrence in practice). In this case, a generalized inverse can be used, or a reduced-rank inverse can be found by means of singular-value decomposition to improve numerical stability. The resulting solution in the latter case depends on the selection of the threshold used for determining the "significant" singular values.
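A sketch of the reduced-rank (pseudoinverse) solution of Equation (2.186); the relative singular-value threshold used below is an arbitrary illustrative choice.

```python
import numpy as np

def reduced_rank_solve(V, y, sv_threshold=1e-8):
    """Solve y = V c when the Gram matrix G = V'V may be singular.

    Singular values below sv_threshold * (largest singular value) are
    discarded, giving the reduced-rank (pseudoinverse) solution of
    Eq. (2.186); the threshold is the user's judgment call.
    """
    U, s, Vt = np.linalg.svd(V, full_matrices=False)
    keep = s > sv_threshold * s[0]
    return Vt[keep].T @ ((U[:, keep].T @ y) / s[keep])

# Rank-deficient example: the third column duplicates the first
rng = np.random.default_rng(1)
A = rng.standard_normal((50, 2))
V = np.column_stack([A[:, 0], A[:, 1], A[:, 0]])
y = V @ np.array([1.0, 2.0, 0.0])

c = reduced_rank_solve(V, y)
print(c)   # minimum-norm solution: the unit weight splits over columns 0 and 2
```

The result agrees with `np.linalg.pinv(V) @ y`, i.e., the minimum-norm least-squares solution.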

Model Order Determination. Of particular practical importance is the determination of the structural parameters L and Q of the MDV model, which determine the size of matrix V. A statistical criterion for MDV model order selection has been developed by the author that proceeds sequentially in ascending order, as described below for the case of Gaussian white output-additive noise (i.e., the residual vector for the true model order). If the residuals are not white, then the prewhitening indicated by Equation (2.47) must be applied.
First, we consider a sequence of MDV models of increasing order "r," where r is a sequential index sweeping through the double loop of increasing L and Q values (L being the fast loop). Thus, starting with (Q = 1, L = 1) for r = 1, we go to (Q = 1, L = 2) for r = 2, and so on. After (Q = 1, L = Lmax) for r = Lmax, we have (Q = 2, L = 1) for r = Lmax + 1, and so on, until we reach (Q = Qmax, L = Lmax) for r = Qmax · Lmax. For the true model order r = R (corresponding to the true Q and L values), we have

y = VR cR + εR    (2.187)

where the residual vector εR is white and Gaussian, and the coefficient vector cR reconstructs the true Volterra kernels of the system according to

kq(m1, ..., mq) = Σ_{j1=1}^{L} Σ_{j2=1}^{j1} ... Σ_{jq=1}^{j_{q−1}} cq(j1, ..., jq) bj1(m1) ... bjq(mq)    (2.188)

However, for an incomplete order r, we have the truncated model

y = Vr cr + εr    (2.189)

where the residual vector contains part of the input-output relationship and is given by

εr = [I − Vr[V'rVr]⁻¹V'r] y ≜ Hr y    (2.190)

where the "projection" matrix Hr is idempotent (i.e., Hr² = Hr) and of rank (N − Pr), with Pr denoting the number of free parameters given by Equation (2.184) for the respective L and Q. Note that

εr = Hr εr−1 ≜ Sr εr−1    (2.191)

which relates the residuals at successive model orders through the matrix Sr, which depends on the input data. Since ε0 is the same as y (by definition), the rth-order residual vector can be expressed as

εr = Sr Sr−1 ... S1 y = Hr y    (2.192)

because the concatenation of the linear operators [Sr Sr−1 ... S1] simply reduces to Hr. The residuals of an incomplete model r are composed of an input-dependent term ur (the unexplained part of the system output) and a stochastic term wr that represents the output-additive noise after being transformed by the matrix Hr. Therefore,

εr = ur + wr    (2.193)

where ur is input-dependent and

wr = Hr w0    (2.194)

ur = Hr u0    (2.195)

Note that w0 denotes the output-additive noise in the data and u0 represents the noise-free output data. For the true order R, we must have uR = 0, since all the input-dependent information in the output data has been captured and explained by the MDV model of order R.
The expected value of the sum of the squared residuals (SSR) at order r is given by

E[θr] = E[ε'r εr] = u'r ur + σ0² Tr{H'r Hr}    (2.196)

where σ0² is the variance of the initial white residuals, and Tr{·} denotes the trace of the subject matrix. The latter is found to be

Tr{H'r Hr} = N − Pr = N − (Lr + Qr)!/(Lr!Qr!)    (2.197)

where Lr and Qr are the structural parameter values that correspond to model order r.
To test whether r is the true order of the system, we postulate the null hypothesis that it is the true order, which implies that ur = 0, and an estimate of the initial noise variance can be obtained from the computed SSR as

σ̂0² = θr/(N − Pr)    (2.198)

Then, the expected value of the reduction in the computed SSR for the next order (r + 1) is

E[θr − θr+1] = σ0² · (Pr+1 − Pr)    (2.199)

under this null hypothesis. Therefore, using the estimate of σ0² given by Equation (2.198), we see that this reduction in computed SSR ought to be greater than the critical value

γr = A θr (Pr+1 − Pr)/(N − Pr)    (2.200)

in order to reject this null hypothesis and continue the search to higher orders. We can select A = 1 + 2√2, because the SSR follows approximately a chi-square distribution. If the reduction in computed SSR is smaller than the critical value given by Equation (2.200), then the order r is accepted as the true order of the MDV model for this system; otherwise, we proceed with the next order (r + 1) and repeat the test.
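The sequential test of Equations (2.198)–(2.200) can be sketched as follows. For brevity, plain regressor columns stand in for successive MDV terms, so each "order" adds a single free parameter (Pr+1 − Pr = 1); all data and names are illustrative assumptions.

```python
import numpy as np

A_CRIT = 1.0 + 2.0 * np.sqrt(2.0)   # the suggested choice A = 1 + 2*sqrt(2)

def ssr(V, y):
    """Sum of squared residuals of the least-squares fit y ~ V c."""
    c, *_ = np.linalg.lstsq(V, y, rcond=None)
    r = y - V @ c
    return float(r @ r)

def critical_value(theta_r, N, P_r, P_next):
    # Eq. (2.200): A * theta_r * (P_next - P_r) / (N - P_r), which uses
    # the noise-variance estimate sigma0^2 = theta_r / (N - P_r) of Eq. (2.198)
    return A_CRIT * theta_r * (P_next - P_r) / (N - P_r)

# Toy data: y truly depends on the first two candidate regressors only
rng = np.random.default_rng(2)
N = 500
X = rng.standard_normal((N, 3))
y = X[:, 0] + 0.5 * X[:, 1] + 0.1 * rng.standard_normal(N)

theta1, theta2 = ssr(X[:, :1], y), ssr(X[:, :2], y)
drop12 = theta1 - theta2            # large: the second regressor is needed
print(drop12 > critical_value(theta1, N, 1, 2))   # True -> continue the search
```

Repeating the same comparison for the third (irrelevant) regressor typically yields an SSR reduction below the critical value, at which point the search stops and the current order is accepted.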
As an alternative approach, we can employ a constrained configuration of the MDV model in the form of equivalent feedforward networks, which significantly reduce the number of free parameters for high-order systems regardless of the value of L, as discussed in Section 2.3.3. However, the estimation problem then ceases to be linear in terms of the unknown parameters, and estimation of the network parameters is achieved with iterative schemes based on gradient descent, as discussed in Section 4.2.2. The use of gradient-based iterative estimation of system kernels was first attempted with "stochastic approximation" methods [Goussard et al., 1991], but it became more attractive and more widely used with the introduction of the equivalent network models discussed in Section 2.3.3. It should be noted that these methods offer computational advantages when the size of the Gram matrix G becomes very large, but they are exposed to problems of convergence (speed and local minima). They also allow robust estimation in the presence of noise outliers (i.e., when the model residuals are not Gaussian and exhibit occasional outliers) by utilizing nonquadratic cost functions compatible with the log-likelihood function of the residuals, since least-squares estimation methods are prone to serious errors in those cases (as discussed in Sections 2.1.5 and 2.4.4).
The key to the efficacy of the kernel expansion approach lies in finding the proper basis {bj} that reduces the number L to a minimum for a given application (since Q is fixed for a given system, the number of free parameters depends only on L). The search for the putative "minimum set" of such basis functions may commence with the use of a general complete orthonormal basis (such as the Laguerre basis discussed in Section 2.3.2 below) and advance with the notion of "principal dynamic modes" (discussed in Section 4.1.1), in order to extract the significant dynamic components of the specific system and eliminate spurious or insignificant terms.

2.3.2 The Laguerre Expansion Technique


To date, the best implementation of the kernel expansion approach (discussed in the previous section) has been made with the use of discrete Laguerre functions. The Laguerre basis (in continuous time) was Wiener's original suggestion for expansion of the Wiener kernels, because the Laguerre functions are orthogonal over a domain from zero to infinity (consistent with the kernel domain) and have a built-in exponential (consistent with the relaxation dynamics of physical systems). Furthermore, the Laguerre expansions can be easily generated in continuous time with analog means (a ladder R-C network), which enhanced their popularity at a time when analog processing was in vogue. With advancing digital computer technology, the data-processing focus shifted to discretized signals and, consequently, to sampled versions of the continuous-time Laguerre functions. The first applications of Laguerre kernel expansions were made in the early 1970s on the eye pupil reflex [Watanabe & Stark, 1975] and on hydrological rainfall-runoff processes [Amorocho & Branstetter, 1971] using sampled versions of the continuous-time Laguerre functions. Note that these discretized Laguerre functions are distinct from the "discrete Laguerre functions" (DLFs) advocated herein, which are constructed to be orthogonal in discrete time [Ogura, 1985] (the discretized versions are not generally orthogonal in discrete time, but tend to orthogonality as the sampling interval tends to zero). Wide application of the Laguerre kernel expansion approach commenced in the 1990s with the successful introduction of the DLFs discussed below [Marmarelis, 1993].
The Laguerre expansion technique (LET) for Volterra kernel estimation is cast in discrete time by the use of the orthonormal set of discrete Laguerre functions (DLFs) given by [Ogura, 1985]:

bj(m) = α^{(m−j)/2} (1 − α)^{1/2} Σ_{k=0}^{j} (−1)^k C(m, k) C(j, k) α^{j−k} (1 − α)^k    (2.201)

where C(·, ·) denotes the binomial coefficient, bj(m) denotes the jth-order orthonormal DLF, the integer m ranges from 0 to M − 1 (the memory-bandwidth product of the system), and the real positive number α (0 < α < 1) is the critical DLF parameter that determines the rate of exponential (asymptotic) decline of these functions. We consider a DLF filterbank (for j = 0, 1, ..., L − 1) receiving the input signal x(n) and generating at the output of each filter the key variables {vj(n)} as

the discrete-time convolutions given by Equation (2.182) between the input and the respective DLF.
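A direct implementation of the closed form (2.201) allows the discrete-time orthonormality of the DLFs to be verified numerically (with T = 1); the values of α, L, and M below are illustrative. The final check anticipates a property illustrated in Figure 2.17: the number of zero crossings of bj equals its order j.

```python
import math
import numpy as np

def dlf(j, m, alpha):
    """Discrete Laguerre function b_j(m) of Eq. (2.201)."""
    s = sum((-1) ** k * math.comb(m, k) * math.comb(j, k)
            * alpha ** (j - k) * (1 - alpha) ** k
            for k in range(j + 1))
    return alpha ** ((m - j) / 2.0) * math.sqrt(1.0 - alpha) * s

alpha, L, M = 0.2, 5, 60
B = np.array([[dlf(j, m, alpha) for m in range(M)] for j in range(L)])

# Orthonormality in discrete time: sum_m b_i(m) b_j(m) = delta_ij.
# The identity holds on m in [0, inf); M = 60 lags make the truncated
# tail negligible for alpha = 0.2.
G = B @ B.T
print(np.max(np.abs(G - np.eye(L))) < 1e-10)   # True

# The number of sign changes of b_j over the lags equals its order j
for j in range(L):
    assert int(np.sum(np.diff(np.sign(B[j])) != 0)) == j
```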
Since the sampled versions of the traditional continuous-time Laguerre functions, which were originally proposed by Wiener and used by Watanabe and Stark in the first known application of Laguerre kernel expansions to physiological systems [Watanabe & Stark, 1975], are not necessarily orthogonal in discrete time (depending on the sampling interval), Ogura constructed the DLFs to be orthogonal in discrete time and introduced their use in connection with Wiener-type kernel estimation that involves the computation of covariances by time-averaging over the available data records (with all the shortcomings of the covariance approach discussed earlier) [Ogura, 1985]. The advocated LET combines Ogura's DLFs (which are easily computed using an autorecursive relation) with least-squares fitting (instead of covariance computation), as was initially done by Stark and Watanabe. An important enhancement of the LET was introduced by the author in terms of adaptive estimation of the critical Laguerre parameter α from the input-output data, as elaborated below. This is a critical issue in actual applications of the Laguerre expansion approach, because it determines the rate of convergence of the DLF expansion and the efficiency of the method.
It should be noted that LET was introduced in connection with Volterra kernel estimation, but it could also be used for Wiener kernel estimation if the Wiener series is employed as the model structure. However, there is no apparent reason to use the Wiener series formulation, especially when a complete (nontruncated) model is employed. Of course, the Volterra formulation must be used when the input signal is nonwhite.
When the discretized Volterra kernels of the system are expanded on the DLF basis as

kr(m1, ..., mr) = Σ_{j1=0}^{L−1} Σ_{j2=0}^{j1} ... Σ_{jr=0}^{j_{r−1}} cr(j1, ..., jr) bj1(m1) ... bjr(mr)    (2.202)

then the resulting MDV model of Qth order is given by Equation (2.180), where a finite number L of DLFs is used to represent the kernels as in Equation (2.202).
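For the second-order case, the double expansion sum of Equation (2.202) is just a matrix product, which the following sketch illustrates with a generic orthonormal basis standing in for the DLFs; the coefficient values are random placeholders.

```python
import numpy as np

# Stand-in basis: any set of orthonormal rows works for this illustration
# (the DLFs of Eq. (2.201) are one such choice); a random orthonormal
# basis keeps the sketch short.
rng = np.random.default_rng(3)
L, M = 4, 50
Q, _ = np.linalg.qr(rng.standard_normal((M, L)))
B = Q.T                                   # row j holds b_j(m), m = 0..M-1

# Symmetric (unordered) expansion coefficients a2(j1, j2); the ordered
# coefficients c2 of Eq. (2.180) fold in the multiplicity factor of
# Eq. (2.181), e.g., c2(j1, j2) = 2 a2(j1, j2) for j1 != j2.
a2 = rng.standard_normal((L, L))
a2 = 0.5 * (a2 + a2.T)

# Second-order kernel via the double expansion sum, as a matrix product
k2 = B.T @ a2 @ B                          # k2[m1, m2], shape (M, M)
print(k2.shape, bool(np.allclose(k2, k2.T)))   # (50, 50) True
```

Because the rows of B are orthonormal, the coefficients are recovered from the kernel as a2 = B @ k2 @ B.T, which is the quantity that the least-squares fit estimates in practice from input-output data.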
The task of Volterra system modeling now reduces to estimating the unknown kernel expansion coefficients {cr(j1, ..., jr)} from Equation (2.180), where the output data {y(n)} and the transformed input data {vj(n)} (which are the outputs of the DLF filterbank) are known. This task can be performed either through direct inversion of the matrix V in the formulation of Equation (2.183), or using iterative (gradient-based) techniques that can be more robust when the errors are non-Gaussian (with possible outliers) and/or the size of the V matrix is very large.
The total number of free parameters P that must be estimated [which is equal to the number of columns of matrix V given by Equation (2.184)] remains the critical consideration with regard to estimation accuracy and computational burden. Estimation accuracy generally improves as the ratio P/N decreases, where N is the total number of input-output data points. The computational burden increases with P or N, but is more sensitive to increases in P. Thus, minimizing L is practically important in order to minimize P, since the model order Q is dictated by the nonlinear characteristics of the system (beyond our control). A complete model (i.e., one where Q is the highest order of significant nonlinearity in the system) is required in order to avoid estimation biases in the obtained Volterra kernel estimates. This also alleviates the strict requirement of input whiteness and, thus, naturally occurring data can be used for Volterra kernel estimation.

The computations required for given L and Q can be reduced by computing the key variables {vj(n)} using the autorecursive relation [Ogura, 1985]:

vj(n) = √α vj(n − 1) + √α vj−1(n) − vj−1(n − 1)    (2.203)

which is due to the particular form of the DLFs. Computation of this autorecursive relation must be initialized by the following first-order autorecursive equation, which yields v0(n) for a given stimulus x(n):

v0(n) = √α v0(n − 1) + T√(1 − α) x(n)    (2.204)

where T is the sampling interval. These computations can be performed rather fast for n = 1, ..., N and j = 0, 1, ..., L − 1, where L is the total number of DLFs used in the kernel expansion. The DLFs themselves can also be generated by these autorecursive relations.
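A minimal sketch of the autorecursive computation of the key variables per Equations (2.203)–(2.204) (with T = 1), cross-checked against the direct convolution of Equation (2.182) evaluated with the closed-form DLFs of Equation (2.201); all parameter values are illustrative.

```python
import math
import numpy as np

def dlf(j, m, alpha):
    """Closed-form DLF b_j(m) of Eq. (2.201), used here only as a cross-check."""
    s = sum((-1) ** k * math.comb(m, k) * math.comb(j, k)
            * alpha ** (j - k) * (1 - alpha) ** k for k in range(j + 1))
    return alpha ** ((m - j) / 2.0) * math.sqrt(1.0 - alpha) * s

def laguerre_filterbank(x, L, alpha, T=1.0):
    """Key variables v_j(n) via the autorecursions (2.203)-(2.204)."""
    ra = math.sqrt(alpha)
    v = np.zeros((L, len(x)))
    for n in range(len(x)):
        v0_prev = v[0, n - 1] if n > 0 else 0.0
        v[0, n] = ra * v0_prev + T * math.sqrt(1.0 - alpha) * x[n]   # Eq. (2.204)
        for j in range(1, L):
            v_prev = v[j, n - 1] if n > 0 else 0.0
            vlow_prev = v[j - 1, n - 1] if n > 0 else 0.0
            v[j, n] = ra * v_prev + ra * v[j - 1, n] - vlow_prev     # Eq. (2.203)
    return v

rng = np.random.default_rng(4)
x = rng.standard_normal(80)
alpha, L = 0.2, 4
v = laguerre_filterbank(x, L, alpha)

# Cross-check against the direct convolution of Eq. (2.182) at the last
# sample, summing over the full past so that no truncation error enters
n = len(x) - 1
err = max(abs(v[j, n] - sum(dlf(j, m, alpha) * x[n - m] for m in range(n + 1)))
          for j in range(L))
print(err < 1e-9)   # True: the recursion reproduces the convolution
```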
The choice of the Laguerre parameter α is critical in achieving efficient kernel expansions (i.e., minimizing L) and, consequently, fast and accurate kernel estimation. Its judicious selection was initially made on the basis of the parameter values L and M [Marmarelis, 1993] or by successive trials. However, the author recently proposed its estimation through an iterative adaptive scheme based on the autorecursive relation of Equation (2.203). This estimation task is embedded in the broader estimation procedure for the kernel expansion coefficients using iterative gradient-based methods.
Specifically, we seek the minimization of a selected nonnegative cost function F [usually the square of the residual term in Equation (2.180)] by means of the iterative gradient-based expression:

cr^(i+1)(j1, ..., jr) = cr^(i)(j1, ..., jr) − γ ∂F(ε)/∂cr |_{ε=ε^(i)}    (2.205)

where γ denotes the adaptation step (learning constant), i is the iteration index, and the gradient ∂F/∂cr is evaluated for the ith-iteration error [i.e., the model residual or prediction error computed from Equation (2.180) for the ith-iteration parameter estimates]. In the case of the Laguerre expansion, the iterative estimation of the expansion coefficients based on expression (2.205) can be combined with the iterative estimation of the square root of the DLF parameter, β = α^{1/2}:

β^(i+1) = β^(i) − γβ ∂F(ε)/∂β |_{ε=ε^(i)}    (2.206)

Clearly, changes in β (or α = β²) alter the values of the key variables {vj(n)} according to the autorecursive relation (2.203), and thus

∂vj(n)/∂β = vj(n − 1) + vj−1(n)    (2.207)

This simple gradient relation (with respect to β = α^{1/2}) allows iterative estimation of α along with the expansion coefficients in a practical context. For instance, in the case of a quadratic cost function F = ε² (corresponding to least-squares fitting), the gradient components are

∂F(ε)/∂cr(j1, ..., jr) = −2ε(n) vj1(n) ... vjr(n)    (2.208)

and

∂F(ε)/∂β = −2ε(n) Σ_{r=1}^{Q} Σ_{j1=0}^{L−1} ... Σ_{jr=0}^{j_{r−1}} cr(j1, ..., jr) Σ_{p∈{j1,...,jr}} vj1(n) ... [vp(n − 1) + vp−1(n)] ... vjr(n)    (2.209)
Although this procedure appears laborious for high-order models, it becomes drastically accelerated when an equivalent network structure is used to represent the high-order model, as discussed in Section 2.3.3.
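The coefficient iteration of Equations (2.205) and (2.208) can be sketched for a first-order (Q = 1) model as per-sample gradient descent on F = ε²; the β-gradient ingredient of Equations (2.206)–(2.207) is shown as a helper, but β itself is held fixed here for brevity. All system parameters and step sizes are illustrative assumptions.

```python
import math
import numpy as np

def filterbank(x, L, beta):
    """v_j(n) via Eqs. (2.203)-(2.204), parameterized by beta = alpha**0.5 (T = 1)."""
    v = np.zeros((L, len(x)))
    for n in range(len(x)):
        v[0, n] = beta * (v[0, n - 1] if n else 0.0) + math.sqrt(1 - beta ** 2) * x[n]
        for j in range(1, L):
            v[j, n] = (beta * (v[j, n - 1] if n else 0.0)
                       + beta * v[j - 1, n] - (v[j - 1, n - 1] if n else 0.0))
    return v

def dv_dbeta(v, j, n):
    """The simple gradient of Eq. (2.207): dv_j(n)/dbeta = v_j(n-1) + v_{j-1}(n)."""
    return (v[j, n - 1] if n else 0.0) + (v[j - 1, n] if j else 0.0)

# First-order (Q = 1) toy system with known expansion coefficients
rng = np.random.default_rng(5)
x = rng.standard_normal(2000)
beta = math.sqrt(0.2)
c_true = np.array([1.0, -0.5, 0.25])
v = filterbank(x, 3, beta)
y = c_true @ v

# Eq. (2.205) with the quadratic-cost gradient of Eq. (2.208):
# c <- c + gamma * 2 * e(n) * v(n), swept repeatedly over the record
c, gamma = np.zeros(3), 0.01
for _ in range(10):
    for n in range(len(x)):
        e = y[n] - c @ v[:, n]
        c += gamma * 2.0 * e * v[:, n]
# The beta update of Eq. (2.206) would add, inside the same loop,
#   beta += gamma_b * 2.0 * e * sum(c[j] * dv_dbeta(v, j, n) for j in range(3))
# followed by recomputing v with the updated beta.
print(np.allclose(c, c_true, atol=1e-3))   # True: coefficients are recovered
```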
It is instructive to illustrate certain basic properties of the DLFs. The first five DLFs for α = 0.2 are shown in Figure 2.17 (T = 1). We note that the number of zero crossings (roots) of each DLF equals its order. Furthermore, the higher the order, the longer the significant values of a DLF spread over time, and the time separation between zero crossings increases. This is further illustrated in Figure 2.18, where the DLFs of order 4, 8, 12, and 16 are shown for α = 0.2. An interesting illustration of the first 50 DLFs for α = 0.2 is given in Figure 2.19, plotted as a square matrix over 50 time lags in three-dimensional perspective (top display) and contour plot (bottom display). We note the symmetry of this matrix and the fact that higher-order DLFs are increasingly "delayed"

Figure 2.17 Discrete-time Laguerre functions (DLFs) of order 0 (solid), 1 (dotted), 2 (dashed), 3 (dot-dash), and 4 (dot-dot-dot-dash) for α = 0.2, plotted over the first 25 lags [Marmarelis, 1993].

Figure 2.18 Discrete-time Laguerre functions (DLFs) of order 4 (solid), 8 (dotted), 12 (dashed), and 16 (dot-dash) for α = 0.2, plotted over the first 50 lags [Marmarelis, 1993].

in their significant values. Note that Volterra kernels with a pure delay may require the use of "associated DLFs" of appropriate order for efficient kernel representation [Ogura, 1985].
In the frequency domain, the FFT magnitude of all DLFs for a given α is identical, as illustrated in Figure 2.20 for the five DLFs of Figure 2.17. However, the FFT phase of these DLFs is different, as also illustrated in Figure 2.20. We note that the nth-order DLF exhibits a maximum phase shift of nπ, with the odd-order ones commencing at π radians and the even-order ones commencing at 0 radians (at zero frequency). Thus, the "minimum-phase" DLF is always the one of order zero.
In order to examine the effect of α on the form of the DLFs, we show in Figure 2.21 the fourth-order DLF for α = 0.1, 0.2, and 0.4. We observe that increasing α results in a longer spread of significant values and zero crossings. Thus, kernels with longer memory may require a larger α for efficient representation.
In the frequency domain, the FFT magnitudes and phases of the DLFs of Figure 2.21 are shown in Figure 2.22. We observe that for larger α, the lower frequencies are emphasized more and the phase lags more rapidly, although the total phase shift is the same (4π for all fourth-order DLFs).
A more complete illustration of the effect of α is given in Figure 2.23, where the matrix of the first 50 DLFs for α = 0.1 is plotted in the same fashion as in Figure 2.19 (for α = 0.2). It is clear from these figures that increasing α increases the separation between the zero crossings (ripple in the three-dimensional perspective plots) and broadens the "fan" formation evident in the contour plots.
These observations provide an initial guide to selecting an approximate value of α for a given kernel memory-bandwidth product (M) and number of DLFs (L). For instance, we

Figure 2.19 The first 50 DLFs for α = 0.2, plotted from 0 to 49 lags in a three-dimensional perspective plot (left panel) and contour plot (right panel) [Marmarelis, 1993].

may select the value of α for which the significant values of the DLFs extend from 0 to M
while, at the same time, the DLF values are diminished for m ≫ M (in order to secure
their orthogonality). In other words, for given values of M and L, α must be chosen so that
the point (M, L) in the contour plane is near the edge of the "fan formation" but outside
this "fan."

Illustrative Examples. We now demonstrate the use of DLF expansions for kernel
estimation. Consider first a second-order nonlinear system with the first- and second-
order Volterra kernels shown in Figure 2.24, corresponding to a simple L-N cascade.

Figure 2.20 (a) FFT magnitude of the first five (orders 0 to 4) DLFs (shown in Figure 2.17) for α = 0.2, plotted up to the normalized Nyquist frequency of 0.5. (b) FFT phase of the first five (orders 0 to 4) DLFs (shown in Figure 2.17) for α = 0.2, plotted up to the normalized Nyquist frequency of 0.5 [Marmarelis, 1993].
2.3 EFFICIENT VOLTERRA KERNEL ESTIMATION 113

Figure 2.21 The fourth-order DLFs for α = 0.1 (solid line), 0.2 (dotted line), and 0.4 (dashed line), plotted from 0 to 49 lags [Marmarelis, 1993].

This system is simulated for a band-limited GWN input of 512 data points, and the
kernels are estimated using both the cross-correlation technique (CCT) and the advocated
Laguerre expansion technique (LET). The first-order kernel estimates are shown
in Figure 2.25, where the LET estimate is plotted with a solid line (almost exactly
identical to the true kernel) and the CCT estimate is plotted with a dashed line. The
second-order kernel estimates are shown in Figure 2.26, demonstrating the superiority of
the LET estimates. Note that for a second-order system, the Volterra and Wiener kernels

Figure 2.22 (a) FFT magnitude of the fourth-order DLFs shown in Figure 2.21, for α = 0.1 (solid line), α = 0.2 (dotted line), and α = 0.4 (dashed line). (b) FFT phase of the fourth-order DLFs shown in Figure 2.21, for α = 0.1 (solid line), α = 0.2 (dotted line), and α = 0.4 (dashed line) [Marmarelis, 1993].
Figure 2.23 The first 50 DLFs for α = 0.1, plotted from 0 to 49 lags in a three-dimensional perspective plot (left panel) and contour plot (right panel) [Marmarelis, 1993].

are identical (except for the zeroth-order one). Thus, the Volterra kernel estimates obtained
via LET can be directly compared with Wiener kernel estimates obtained via
CCT.
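Once the expansion coefficients are posed as unknowns, LET reduces to linear least squares on the filter-bank outputs and their pairwise products. The sketch below illustrates this step for a second-order model; for simplicity, the simulated L-N cascade is constructed so that its linear part lies exactly in the Laguerre subspace, and the data length, α, and L are arbitrary illustrative choices rather than the book's simulation settings:

```python
import numpy as np

def laguerre_filter_bank(x, alpha, L):
    """Outputs v_j(n) of the first L discrete Laguerre filters driven by x."""
    sa = np.sqrt(alpha)
    N = len(x)
    v = np.zeros((L, N))
    v_prev = np.zeros(L)
    for n in range(N):
        vn = np.zeros(L)
        vn[0] = sa * v_prev[0] + np.sqrt(1.0 - alpha) * x[n]
        for j in range(1, L):
            vn[j] = sa * v_prev[j] + sa * vn[j - 1] - v_prev[j - 1]
        v[:, n] = vn
        v_prev = vn
    return v

rng = np.random.default_rng(0)
N, alpha, L = 2000, 0.2, 3
x = rng.standard_normal(N)            # GWN-like input
v = laguerre_filter_bank(x, alpha, L)

# Simulated L-N cascade whose linear part lies in the Laguerre subspace:
# z(n) = v_0(n) - 0.5 v_1(n),  y(n) = z(n) + z(n)^2  (second-order system)
z = v[0] - 0.5 * v[1]
y = z + z**2

# Regression matrix: constant, first-order terms v_j, and products v_i v_j
cols = [np.ones(N)] + [v[j] for j in range(L)]
cols += [v[i] * v[j] for i in range(L) for j in range(i, L)]
V = np.column_stack(cols)
coef, *_ = np.linalg.lstsq(V, y, rcond=None)

c1 = coef[1:1 + L]        # first-order expansion coefficients
print(np.round(c1, 6))    # recovers [1, -0.5, 0] for this system
```

The first-order kernel estimate then follows as k1(m) = Σ_j c1(j) b_j(m), and the second-order kernel from the symmetric expansion of the product coefficients.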
These LET estimates were obtained for L = 10 and α = 0.1, and the estimates of the
Laguerre coefficients in this case are shown in Figure 2.27 for the first-order and second-order
kernels. Note that the second-order coefficients are plotted as a symmetric matrix,
even though only the entries of one triangular region are estimated. It is evident from Figure
2.27 that an adequate estimation of these kernels can also be accomplished with L = 8
(due to the very small values of the ninth and tenth coefficients), which demonstrates the
significant compactness in kernel representation accomplished by the Laguerre expansion
(for M = 50 and L = 8, the savings factor is about 35 for a second-order model and grows
rapidly for higher-order models).
Recall that the number L of required DLFs for accurate representation of the kernels of

Figure 2.24 (a) Exact first-order kernel of the simulated system. (b) Exact second-order kernel of the simulated system [Marmarelis, 1993].

Figure 2.25 The estimated first-order kernel via LET (solid line) and CCT (dashed line) [Marmarelis, 1993].

a given system critically affects the computational burden associated with this method. In
general, the total number of resulting expansion coefficients (free parameters of the model)
for a system of order Q is (L + Q)!/(L!Q!). Note that this number includes all kernels
up to order Q and takes into account the symmetries in the high-order kernels.
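As a quick arithmetic check of this count (using the (L, Q) values quoted in this section), note that (L + Q)!/(L!Q!) is simply the binomial coefficient C(L + Q, Q):

```python
from math import comb

def mdv_params(L, Q):
    # Total expansion coefficients for all kernels up to order Q,
    # with symmetry taken into account: (L + Q)! / (L! Q!) = C(L + Q, Q)
    return comb(L + Q, Q)

print(mdv_params(10, 2))   # L = 10, Q = 2 -> 66 coefficients in total
print(comb(10 + 1, 2))     # distinct second-order coefficients alone: 55
print(mdv_params(50, 2) / mdv_params(8, 2))   # M = 50 model vs L = 8 expansion
```

The second line matches the 55 distinct second-order coefficients of Figure 2.27, and the last ratio comes out near 30, the same order as the savings factor quoted above (the exact figure depends on which terms are counted).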
The required L in a given application can be determined by the statistical procedure described
in the previous section [see Equation (2.200)] or can be empirically selected by
estimating the first-order kernel for a large L and then, by inspecting the obtained coefficient
estimates, selecting the minimum number that corresponds to significant values.

Figure 2.26 (a) Second-order kernel estimated via LET. (b) Second-order kernel estimated via CCT [Marmarelis, 1993].

The same reasoning can be applied to high-order kernels (e.g., the second-order coefficients
shown in Figure 2.27), although the pattern of convergence of the Laguerre expansion
depends on α, a fact that prompted the introduction of the iterative scheme (discussed
above) for estimating α (along with the expansion coefficients) on the basis of the
data. Selection of the required maximum kernel order Q can also be made using the statistical
procedure of Equation (2.200) or can be empirically based on preliminary tests or on
the adequacy of the output prediction accuracy of a given model order (as in all previous
applications of the Volterra-Wiener approach). We strongly recommend the selection of
the key model parameters L and Q using the statistical criterion of Equation (2.200) and
successive trials in ascending order, as discussed above. The use of the popular criteria
(e.g., Akaike information criterion or minimum description length), although rather common
in established practice, is discouraged as potentially misleading.
We now examine the effect of noise on the obtained kernel estimates by adding independent
GWN to the output signal for a signal-to-noise ratio (SNR) of 10 dB. The first-order
kernel estimates obtained by LET and CCT are shown in Figure 2.28 as a solid line
for LET and as a dotted line for CCT. The corresponding second-order kernel estimates
are shown in Figure 2.29 and demonstrate the superiority of the LET approach in terms of
robustness under noisy conditions. Note that the LET estimates in this demonstration
were computed with L = 8 and required very short computing time. The fact that kernel
estimates of this quality can be obtained from such short data records (512 input-output
data points), even in cases with significant noise (SNR = 10 dB), can have important implications
in actual applications of the advocated modeling approach to physiological systems.
Another important issue in actual applications is the effect of higher-order nonlinearities
on the obtained lower-order kernel estimates when a truncated (incomplete)
Volterra model is used. Since most applications limit themselves to the first two kernels,
the presence of higher-order (>2) Volterra functionals acts as a source of "correlated
noise" that is dependent on the input. To illustrate this effect, we add third-order and
fourth-order nonlinearities to the previous system and recompute the kernel estimates.

Figure 2.27 (a) Estimates of the first 10 (orders 0 to 9) Laguerre expansion coefficients for the first-order kernel. (b) Estimates of the Laguerre expansion coefficients (orders 0 to 9 in each dimension) for the second-order kernel, plotted as a symmetric two-dimensional array (total number of distinct coefficients is 55) [Marmarelis, 1993].

Figure 2.28 First-order kernel estimates obtained via LET (solid line) and CCT (dotted line) for noisy data (SNR = 10 dB) [Marmarelis, 1993].

Note that the simulated system is a simple L-N cascade of a linear filter followed by a
static nonlinearity of the form y = z + z² (for the second-order system) and y = z + z²
+ z³/3 + z⁴/4 (for the fourth-order system), where z(t) is the output of the linear filter.
The obtained first-order kernel estimates are shown in Figure 2.30, where the exact
Wiener kernel is plotted with a solid line, the LET estimate with a dotted line, and the
CCT estimate with a dashed line. The LET estimate is much better than its CCT counterpart,
but it exhibits certain minor deviations from the exact kernel due to the presence
of the third-order and fourth-order terms. The exact second-order Volterra kernel of the
fourth-order system has the same form as in Figure 2.24, but its second-order Wiener
kernel is scaled by a factor of 2.23 because of the presence of the fourth-order nonlinearity
[see Equation (2.57) for P = 1]. The second-order kernel estimates obtained via

Figure 2.29 (a) Second-order kernel estimate obtained via LET for noisy data (SNR = 10 dB). (b) Second-order kernel estimate obtained via CCT for noisy data [Marmarelis, 1993].

Figure 2.30 First-order kernel estimates for the fourth-order simulated system obtained via LET (dotted line) and CCT (dashed line). The exact kernel is plotted with the solid line and is almost identical to the LET estimate [Marmarelis, 1993].

LET and CCT are shown in Figure 2.31 and demonstrate that the LET estimate is better
than its CCT counterpart. Note that the LET estimates closely resemble the exact
Wiener or Volterra kernels in form (since both have the same shape in this simulated
system), but have the size of the Wiener kernels (for this cascade system) because the
input is GWN and the estimated model is truncated to the second order.
An important advantage of the advocated technique (LET) is its ability to yield accu-
rate kernel estimates even when the input signal deviates from white noise (as long as the
model order is correct). This is critically important in experimental studies where a white-
noise (or quasiwhite) input cannot be easily secured. As an illustrative example, consider
the previous second-order system being simulated for a nonwhite (broadband) input with
two resonant peaks [Marmarelis, 1993]. The first-order kernel estimates obtained via LET

Figure 2.31 (a) Second-order kernel estimate for the fourth-order simulated system obtained via LET. (b) Second-order kernel estimate for the fourth-order simulated system obtained via CCT [Marmarelis, 1993].

Figure 2.32 First-order kernel estimates for the nonwhite stimulus obtained via LET (dotted line) and CCT (dashed line). The exact kernel (solid line) is nearly superimposed on the LET estimate [Marmarelis, 1993].

and CCT are shown in Figure 2.32, where the LET estimate (dotted line) is almost identical
to the exact kernel (solid line), while the CCT estimate (dashed line) shows the effects
of the nonwhite stimulus in terms of estimation bias (in addition to the anticipated estimation
variance). The second-order kernel estimates obtained via LET and CCT are shown
in Figure 2.33 and clearly demonstrate that the LET estimate is far better than the CCT estimate
and is not affected by the nonwhite spectral characteristics of the input. Note, however,
that the LET estimates will be affected by the spectral characteristics of a nonwhite
input in the presence of higher-order nonlinearities. This is illustrated by simulating the
previous fourth-order system with the nonwhite input. The obtained first-order kernel estimates
via LET and CCT are shown in Figure 2.34, along with the exact kernel. The LET
estimate is affected by showing some additional estimation bias relative to the previous

Figure 2.33 (a) Second-order kernel estimated for the nonwhite stimulus via LET. (b) Second-order kernel estimated for the nonwhite stimulus via CCT [Marmarelis, 1993].

Figure 2.34 First-order kernel estimates for the nonwhite stimulus and fourth-order system obtained via LET (dotted line) and CCT (dashed line). The exact kernel is plotted with a solid line [Marmarelis, 1993].

case of the white input for the fourth-order system (see Figure 2.30), but it still remains much
better than its CCT counterpart. The same is true for the second-order kernel estimates
shown in Figure 2.35. It is interesting to note that the overall form of the CCT second-order
kernel estimates for the nonwhite input (see Figures 2.33 and 2.35) is not affected
much by the presence of higher-order terms, even though the estimates are rather poor in
both cases.
These results demonstrate the fact that the advocated LET approach yields accurate
kernel estimates from short experimental data records, even for nonwhite (but broadband)
inputs, when there are no significant nonlinearities higher than the estimated ones (nontruncated
models). However, these kernel estimates may be seriously affected when the

Figure 2.35 (a) The second-order kernel estimate for the nonwhite stimulus and fourth-order system obtained via LET. (b) The second-order kernel estimate for the nonwhite stimulus and fourth-order system obtained via CCT [Marmarelis, 1993].

experimental input is nonwhite and significant higher-order nonlinearities exist beyond
the estimated model order (truncated model). This is due to the fact that the model residuals
are correlated and input-dependent, resulting in certain estimation bias owing to lower-order
"projections" from the omitted higher-order terms (which are distinct from the
projections associated with the structure of the Wiener series). Of course, this problem is
alleviated when all significant nonlinearities (kernels) are included in the estimated model.
This is illustrated below with the accurate estimation of third-order kernels from short
data records (made possible by the ability of the LET approach to reduce the number of
required parameters for kernel representation). Although this compactness of kernel representation
cannot be guaranteed in every case, the structure of the DLFs (i.e., exponentially
weighted polynomials) makes it likely for most physiological systems, since their
kernels usually exhibit asymptotically exponential structure.
Let us now consider the previously simulated cascade system with the third-order nonlinearity
y = z + z² + z³, receiving a band-limited GWN input of 1024 data points. The resulting
third-order kernel estimate via LET is shown in Figure 2.36 as a three-dimensional
"slice" for m3 = 4 (note that visualization of third-order kernels requires taking
"three-dimensional slices" for specific values of m3). Comparison with the exact third-order
kernel "slice" at m3 = 4, also shown in Figure 2.36, indicates the efficacy of the LET
technique for third-order kernel estimation. Results of similar quality were obtained for
all other values of m3. It has been shown that the third-order kernel estimate obtained via
the traditional CCT in this case is rather poor [Marmarelis, 1993].
The ability of the advocated LET approach to make the estimation of third-order kernels
practical and accurate from short data records creates a host of exciting possibilities
for the effective nonlinear analysis of physiological systems with significant third-order
nonlinearities. This can give good approximations of nonlinearities with an inflection
point (such as sigmoid-type nonlinearities that have gained increasing prominence in recent
years) that are expected to be rather common in physiology because of the requirement
for bounded operation for very large positive or negative input values. The efficacy
of the advocated kernel estimation technique (LET) is further demonstrated in
Chapter 6 with results obtained from actual experimental data in various physiological
systems.

Figure 2.36 (a) Third-order kernel estimate (two-dimensional "slice" at m3 = 4) obtained via LET. (b) Exact third-order kernel "slice" at m3 = 4 [Marmarelis, 1993].

2.3.3 High-Order Volterra Modeling with Equivalent Networks


The use of a basis of functions for kernel expansion (discussed in the previous two sections)
is equivalent to using a linear filter bank to preprocess the input in order to arrive at
a static nonlinear relation between the system output and the multiple outputs of the filter
bank, in accordance with Wiener's original suggestion (see Figure 2.9).
As discussed in Section 2.2.2, Wiener further suggested the expansion of the multiinput
static nonlinearity into a set of Hermite multinomials, giving rise to a network of adjustable
coefficients of these multinomials that are characteristic of each system and must
be estimated from the input-output data (i.e., a parametric description of the system).
This model structure (often referred to as the Wiener-Bose model and shown in Figure
2.9) is an early network configuration that served as an equivalent model for the Wiener
class of nonlinear dynamic systems. For reasons discussed in Section 2.2.2, this particular
model form was never adopted by users. However, Wiener's idea has been adapted successfully
in the Volterra context, where the multiinput static nonlinear relation can be expressed
in multinomial form, as shown in Equation (2.178) for the general Volterra model
representation (see Figure 2.16). This modular structure of the general Volterra model
can also be cast as an equivalent feedforward network configuration that may offer certain
practical/methodological advantages, presented below.
Various architectures can be utilized to define equivalent networks for Volterra models
of physiological systems. Different types of input preprocessing can be used (i.e., different
filter banks) depending on the dynamic characteristics of the system, and different
numbers or types of hidden units (e.g., activation functions) can be used depending on the
nonlinear characteristics of the system. The selected architecture affects the performance
of the model in terms of parsimony, robustness, and prediction accuracy, as elaborated in
Chapter 4, where the connectionist modeling approach is discussed further. In this section,
we limit ourselves to discussing the network architectures that follow most directly
from the Volterra model forms discussed in Section 2.3.1.
Specifically, we focus on the use of filter banks for input preprocessing and polynomial-type
decompositions of the static nonlinearity (i.e., polynomial activation functions of
the hidden units) transforming the multiple outputs of the filter bank into the system output.
One fundamental feature of these basic architectures is that there are no recurrent
connections in these equivalent networks and, therefore, they attain forms akin to feedforward
"artificial neural networks" that have received considerable attention in recent
years. Nonetheless, recurrent connections may offer significant advantages in certain
cases and are discussed in Chapter 10. Due to the successful track record of the Laguerre
expansion technique (presented in Section 2.3.2), the particular feedforward network architecture
that utilizes the discrete-time Laguerre filter bank for input preprocessing,
termed the "Laguerre-Volterra network," has found many successful applications in recent
years and is elaborated in Section 4.3.
The modular representation of the general Volterra model shown in Figure 2.16 is
equivalent to a feedforward network architecture that can be used to obtain accurate and
robust high-order Volterra models using relatively short data records. One particular im-
plementation that has been used successfully in several applications to physiological
system modeling is the aforementioned Laguerre-Volterra Network (discussed in
Section 4.3). In this section, we discuss the general features of the feedforward
"Volterra-Wiener Network" (VWN) which implements directly the block-structured
model of Figure 2.16 with a single hidden layer having polynomial activation functions.

The VWN employs a discrete-time filter bank with trainable parameter(s) for input preprocessing
and is equivalent to a "separable Volterra network," discussed below. The
applicability of the VWN is premised on the separability of the multiinput static nonlinearity
f into a sum of polynomial transformations of linear combinations of the L outputs
{v_j} of the filter bank, as

y(n) = f(v_1, . . . , v_L)

     = c_0 + Σ_{h=1..H} Σ_{q=1..Q} c_{h,q} [ Σ_{j=1..L} w_{h,j} v_j(n) ]^q          (2.210)

where {w_{h,j}} are the connection weights from the L outputs of the filter bank to the H
hidden units and {c_{h,q}} are the coefficients of the polynomial activation functions of the
hidden units.
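Equation (2.210) is straightforward to express in code. The sketch below is a minimal illustration (all shapes and parameter values are arbitrary); it evaluates the VWN output for one time instant given the filter-bank outputs and checks the vectorized form against the explicit double sum:

```python
import numpy as np

def vwn_output(v, W, C, c0):
    """Evaluate y(n) of Eq. (2.210) at one time instant.
    v  : (L,)   filter-bank outputs v_j(n)
    W  : (H, L) connection weights w_{h,j}
    C  : (H, Q) polynomial coefficients c_{h,q} for powers q = 1..Q
    c0 : scalar offset
    """
    u = W @ v                                    # hidden-unit inputs, shape (H,)
    Q = C.shape[1]
    powers = u[:, None] ** np.arange(1, Q + 1)   # u_h**q, shape (H, Q)
    return c0 + np.sum(C * powers)

rng = np.random.default_rng(1)
H, L, Q = 3, 10, 3
v = rng.standard_normal(L)
W = rng.standard_normal((H, L))
C = rng.standard_normal((H, Q))
y = vwn_output(v, W, C, c0=0.5)

# Brute-force check against the explicit double sum of Eq. (2.210)
y_ref = 0.5 + sum(C[h, q - 1] * (W[h] @ v) ** q
                  for h in range(H) for q in range(1, Q + 1))
print(abs(y - y_ref) < 1e-9)
```

Training the w's and c's by gradient descent (Section 4.2) requires only this forward pass plus the chain rule for the error back-propagation.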
Although this separability cannot be guaranteed in the general case for finite H, it has
been found empirically that such separability is possible (with satisfactory approximation
accuracy) for all cases of physiological system modeling attempted to date. In fact, it has
been found that reasonable approximation accuracy is achieved even for small H, leading
to the notion of "principal dynamic modes" discussed in Section 4.1.1.
It can be shown that exact representation of any Volterra system can be achieved by
letting L and H approach infinity [Marmarelis & Zhao, 1994, 1997]. However, this mathematical
result is of no practical utility, as we must restrict L and H to small values by
practical necessity.
Estimation of the unknown parameters of the VWN (i.e., the w's and c's) is achieved
by iterative schemes based on gradient descent and the chain rule of differentiation for error
back-propagation, as discussed in detail in Section 4.2. Here we limit ourselves to discussing
the main advantage of the VWN model (i.e., the reduction in the number of unknown
parameters) relative to conventional Volterra or MDV models, as well as the
major considerations governing their applicability to physiological systems.
In terms of model compactness, the VWN has [H(L + Q) + 1] free parameters (excluding
any trainable parameters of the filter bank), whereas the MDV model of Equation
(2.180) based on kernel expansions has [(L + Q)!/(L!Q!)] free parameters (in addition to
any filter bank parameters, such as the DLF parameter α). The conventional discrete
Volterra model of Equation (2.32) with memory-bandwidth product M has [(M +
Q)!/(M!Q!)] free parameters. Since typically M ≫ L, the conventional discrete Volterra
model is much less compact than the MDV model, and we will focus on comparing the
MDV with the VWN. It is evident that the VWN is more compact when

H < (L + Q − 1) · · · (L + 1)/Q!

Clearly, the potential benefits of using the VWN increase for larger Q, given that H is expected
to retain small values in practice (less than five). Since L is typically less than ten,
the VWN is generally expected to be more compact than the MDV model (provided the
separability property applies to our system). As an example, for H = 3, L = 10, and Q = 3
we have a compactness ratio of 7 in favor of the VWN, which grows to about 67 for Q = 5.
This implies significant savings in data-record length requirements when the VWN is
used, especially for high-order models. Note that the savings ratio is much higher relative
to conventional discrete Volterra models (about 4,000 for Q = 3).
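These parameter counts are easy to verify numerically; a short arithmetic sketch using the section's own formulas and example values (the Q = 5 ratio comes out near 65 with these counts, close to the rounded figure quoted above):

```python
from math import comb

def vwn_params(H, L, Q):
    return H * (L + Q) + 1     # Volterra-Wiener network

def mdv_params(L, Q):
    return comb(L + Q, Q)      # kernel-expansion (MDV) model

def volterra_params(M, Q):
    return comb(M + Q, Q)      # conventional discrete Volterra model

H, L, M = 3, 10, 100
print(mdv_params(L, 3) / vwn_params(H, L, 3))       # ~7 for Q = 3
print(mdv_params(L, 5) / vwn_params(H, L, 5))       # ~65 for Q = 5
print(volterra_params(M, 3) / vwn_params(H, L, 3))  # ~4400 for Q = 3
```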
Another network model architecture emerges when the separability concept used in
the VWN is applied to the conventional discrete Volterra model, without a filter bank for preprocessing
the input. This leads to the "separable Volterra network" (SVN), which exhibits
performance characteristics between the VWN and the MDV, since its number of free parameters
is [H(M + Q) + 1]. For instance, the VWN is about eight times more compact than the
SVN when M = 100, L = 10, and Q = 3, regardless of the number of hidden units. The
SVN is also useful in demonstrating the equivalence between discrete Volterra models
and three-layer perceptrons or feedforward artificial neural networks [Marmarelis &
Zhao, 1997], as discussed in Section 4.2.1.
It is evident that a variety of filter banks can be used in given applications to achieve
the greatest model compactness. Naturally, the performance improvement in the modeling
task depends on how well the chosen filter bank matches the dynamics of the particular
system under study. This has given rise to the concept of the "principal dynamic
modes," which represent the optimal (minimal) set of filters in terms of MDV model
compactness for a given system, as discussed in Section 4.1.1.
It is also worth noting that different types of activation functions (other than polynomial)
can be used in these network architectures, as long as they retain the equivalence with
the Volterra class of models. Specifically, any complete set of analytic functions will retain
this equivalence (e.g., exponential, trigonometric, polynomial, and combinations
thereof). Of course, the key practical criterion in assessing suitability for a specific system
is the resulting model compactness for a given level of prediction accuracy. The latter is
determined by the combination of the selected filter bank and activation functions (in
form and number). A promising approach to near-optimal selection of a network model
for physiological systems is presented in Section 4.4.
The model compactness determines the required length of experimental data and the
robustness of the obtained parameter estimates (i.e., estimation accuracy) for a given signal-to-noise
ratio (SNR), both important practical considerations. In general, more free
parameters and/or lower SNR in the data imply longer data-record requirements, a fact
that impacts critically the experiment design and our fundamental ability to achieve our
scientific goal of understanding the system function under the prevailing experimental/
operational conditions. For instance, issues of nonstationarity of the experimental preparation
may impose strict limits on the length of the experimental data that can be collected
in a stationary context, with obvious implications for model parameter estimation. In addition,
lower SNR in the data raises the specter of unintentional overfitting when the model
is not properly constrained (i.e., having the minimum necessary number of free parameters).
This risk is mitigated by the use of statistical criteria for model order selection, like
the one presented in Section 2.3.1 for MDV models. Finally, it must be noted that the
model compactness may affect our ability to properly interpret the results in a physiologically
meaningful context.
It is conceivable that the introduction of more "hidden layers" in the network architecture
(i.e., more layers of units with nonlinear activation functions) may lead to greater
overall model compactness by allowing a reduction in the number of hidden units in each
hidden layer (so that the total number of parameters is reduced). The introduction of additional
hidden layers also relaxes the aforementioned requirement of separability expressed
by Equation (2.210). The estimation/training procedure is minimally impacted by
the presence of multiple hidden layers. The case of multiple hidden layers will be discussed
in Sections 4.2 and 4.4 and may attain critical importance in connection with multiinput
and spatiotemporal models (see Chapter 7).

2.4 ANALYSIS OF ESTIMATION ERRORS

In this section, we elaborate on the specific estimation errors associated with the use of
each of the presented kernel estimation methods: (1) the cross-correlation technique (Section
2.4.2), (2) the direct inversion methods (Section 2.4.3), and (3) the iterative cost minimization
methods (Section 2.4.4). We begin with an overview of the various sources of
estimation errors in the following section.

2.4.1 Sources of Estimation Errors


The main sources of estimation errors in nonparametric modeling can be classified into
three categories:

1. Model specification errors, due to mismatch between the system characteristics and
the specified model structure
2. Estimation method errors, due to imperfections of the selected estimation method
for the given input and model structure
3. Noise/interference errors, due to the presence of ambient noise (including measurement
errors) and systemic interference

In the advocated modeling approach that employs kernel expansions or equivalent network structures, the model specification errors arise from incorrect selection of the key structural parameters of the model. For instance, in the Laguerre expansion technique (LET), the key structural parameters are the number L of discrete-time Laguerre functions (DLFs), the order of nonlinearity Q, and the DLF parameter a. In the case of a Volterra-Wiener Network (VWN), the key structural parameters are the number L of filters in the filterbank, the number H of hidden units, and the order of nonlinearity Q in the polynomial activation functions. Misspecification of these structural parameters will cause modeling errors whose severity will depend on the specific input signals and, of course, on the degree of misspecification. This type of error is very important, as it has broad ramifications both on the accuracy and the interpretation of the obtained models. It can be avoided (or at least minimized) by the use of a statistical search procedure utilizing successive trials, where the parameter values are gradually increased until a certain statistical criterion on the incremental improvement of the output prediction (typically quantified by the sum of the squared residuals) is met. This procedure is described in detail in Section 2.3.1.
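The successive-trial search described above can be sketched in code. This is a minimal illustration, not the book's exact algorithm: `build_regressors` is a hypothetical user-supplied routine that returns the regressor matrix for a given structural parameter value, and the stopping threshold is arbitrary.

```python
import numpy as np

def select_structure(build_regressors, y, param_values, threshold=0.05):
    """Grow the model structure until the fractional drop in the sum of
    squared residuals (SSR) falls below `threshold`.

    `build_regressors(p)` is a hypothetical user-supplied routine that
    returns the [N x P] regressor matrix for structural parameter p.
    """
    prev_ssr = float(np.sum(y ** 2))   # SSR of the empty (zero) model
    chosen = None
    for p in param_values:
        V = build_regressors(p)
        c, *_ = np.linalg.lstsq(V, y, rcond=None)
        ssr = float(np.sum((y - V @ c) ** 2))
        if (prev_ssr - ssr) / prev_ssr < threshold:
            break                      # improvement no longer significant
        prev_ssr, chosen = ssr, p
    return chosen
```

On synthetic data generated from three regressors plus small noise, the search stops once a fourth candidate regressor fails to improve the fit appreciably.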
The errors associated with the selected estimation method also depend on the specific input signals as they relate to the selected model structure. In the case of LET, the direct inversion method is typically used, since the estimation problem is linear in the unknown parameters. The required matrix inversion (or pseudoinversion if the matrix is ill-conditioned) is subject to all the subtleties and pitfalls of this well-studied subject. In short, the quality of the result depends on the input data and the structural parameters of the model that determine the Gram matrix to be inverted. Generally, if the input is a broadband random signal and the model structure is adequate for the system at hand, then the results are expected to be very good, provided that the effects of noise/interference are not excessive (see the third type of error below). However, if the model structure is inadequate or the input signal is a simple waveform with limited information content (e.g., pulse, impulse, sinusoid, or narrowband random), then the resulting Gram matrix is likely to be ill-conditioned and the results are expected to be poor unless a pseudoinverse of reduced rank is used.
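The benefit of a reduced-rank pseudoinverse in this situation can be illustrated with a small synthetic example (the nearly collinear regressor and the `rcond` cutoff below are illustrative choices, not values from the text):

```python
import numpy as np

rng = np.random.default_rng(0)
N, P = 200, 8
V = rng.standard_normal((N, P))
V[:, -1] = V[:, 0] + 1e-6 * rng.standard_normal(N)  # nearly collinear column
c_true = np.ones(P)
y = V @ c_true + 0.1 * rng.standard_normal(N)

# Direct inversion of the ill-conditioned Gram matrix amplifies the noise
# along the nearly degenerate direction ...
c_direct = np.linalg.solve(V.T @ V, V.T @ y)

# ... whereas a pseudoinverse of reduced rank (rcond discards the tiny
# singular values) keeps the estimate well-behaved.
c_pinv = np.linalg.pinv(V, rcond=1e-4) @ y
```

The direct estimate is wildly wrong along the degenerate direction, while the reduced-rank estimate stays close to the true coefficients.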
In the case of VWN, the iterative cost-minimization method is typically used, since the estimation problem is nonlinear in some of the unknown parameters (i.e., the weights of the filter-bank outputs). A variety of algorithms are available for this purpose, most of them based on gradient descent. Generally, the key issues with these algorithms are the rate of convergence and the search for the global minimum (avoidance of local minima), both of which depend on the model structure and the input characteristics. The proper definition of the minimized cost function depends on the prediction-error statistics that include the noise/interference contaminating the data. Although a quadratic cost function is commonly used, the log-likelihood function of the model residuals can define the cost function in order to achieve estimates with minimum variance. Many important details concerning the application of these algorithms are discussed in Sections 4.2 and 4.3, or can be found in the extensive literature on this subject, which has received a lot of attention in recent years in the application context of artificial neural networks (see, for instance, Haykin, 1994 and Hassoun, 1995).
The last type of estimation error is due to the effects of ambient noise and/or systemic interference contaminating the input-output data. The extent of this error depends on the statistical and spectral characteristics of the noise/interference as they relate to the respective data characteristics, the selected model structure, and the employed estimation method. For instance, if there is considerable noise/interference at high frequencies but little noise-free output power at these high frequencies, then the resulting estimation errors will be significant if the selected model structure allows high-frequency dynamics. On the other hand, these errors will be minimal if the selected model structure constrains the high-frequency dynamics (e.g., a large DLF parameter a and/or small L in the LET method) or if the input-output data are low-pass filtered prior to processing in order to eliminate the high-frequency noise/interference. It should be noted that the iterative cost-minimization methods allow for explicit incorporation of the specific noise/interference statistics (if known or estimable) in the minimized cost function to reduce the estimation variance and achieve robust parameter estimates (see Sections 2.1.5, 4.2, and 4.3).
Naturally, there is a tremendous variety in the possible noise/interference characteristics that can be encountered in practice, and too many possible interrelationships with the model/method characteristics, to provide general practical guidelines for this type of error. However, it can be generally said that the noise/interference characteristics should be carefully studied and intelligently managed by proper preprocessing of the data (e.g., filtering) and judicious selection of the model structural parameters for the chosen estimation method and available data. This general strategy is summarized in Chapter 5 and illustrated with actual applications in Chapter 6. It is accurate to say that the proper management of the effects of noise/interference is of paramount importance in physiological system modeling and often determines the success of the whole undertaking. Note that, unlike man-made systems in which high-frequency noise is often dominant, physiological systems are often burdened with low-frequency noise/interference that has traditionally been given less attention.
The modeling errors that result from possible random fluctuations in system character-
istics are viewed as being part of the systemic noise, since the scope of the advocated
methodology is limited to deterministic models. Such stochastic system variations can be
studied statistically on the basis of the resulting model residuals.

The effects of noise and interference are intrinsic to the experimental preparation and do not depend on the analysis procedure. However, the specification and estimation errors depend on the selected modeling and estimation methodology. The issue of model specification errors runs through the book as a central issue best addressed in the nonparametric context (minimum prior assumptions about the model structure). A major thrust of this book is to emphasize this point and to develop the synergistic aspects with other modeling approaches (parametric, modular, and connectionist).

The model specification and estimation errors are discussed in the following sections for the various nonparametric methods presented in this book: the cross-correlation technique, the direct-inversion methods, and the iterative cost-minimization methods.

2.4.2 Estimation Errors Associated with the Cross-Correlation Technique


In this section, we concentrate on the estimation errors associated with the use of the cross-correlation technique for the estimation of the Wiener or CSRS kernels of a nonlinear system (see Sections 2.2.3 and 2.2.4). Although the introduction of the cross-correlation technique is theoretically connected to GWN inputs, it has been extended (by practical necessity) to the case of quasiwhite (CSRS or PRS) inputs used in actual applications, as discussed in Section 2.2.4. Since we are interested in studying the estimation errors during actual applications of the cross-correlation technique, we focus here on the broad class of CSRS quasiwhite inputs (which includes the band-limited GWN as a special case) and on the discrete-time formulation of the problem, as relevant to practice. Detailed analysis was first provided in Marmarelis and Marmarelis (1978).
The estimation of the rth-order CSRS kernel g_r requires the computation of the rth-order cross-correlation between sampled input-output data of finite record length N and its subsequent scaling as

$$\hat{g}_r(m_1, \ldots, m_r) = \frac{C_r}{N} \sum_{n=1}^{N} y(n)\, x(n - m_1) \cdots x(n - m_r) \qquad (2.211)$$

where C_r is the scaling constant, x(n) is the discretized CSRS input, and y(n) is the discretized output, used here instead of the rth-order output residual (as properly defined in Section 2.2.4) to facilitate the analytical derivations; i.e., the analysis applies to the nondiagonal points of the kernel argument space. It is evident that the estimation error is more severe at the diagonal points, because of the presence of low-order terms in the CSRS orthogonal functionals; however, the number of diagonal points in a kernel is much smaller than the number of nondiagonal points. Thus, the overall kernel estimation error will be determined primarily by the cumulative errors at the nondiagonal points. We note that the study of the errors at the diagonal points of the kernels makes use of the higher moments of the input CSRS, thus complicating considerably the analytical derivations. The important issues related to CSRS kernel estimation at the diagonal points are discussed in Section 2.2.4.
The estimation errors for the rth-order CSRS kernel estimate ĝ_r given by Equation (2.211) depend on the step size Δt of the particular quasiwhite CSRS input x(n) and on the record length N. Note that Δt may not be equal to the sampling interval T (T ≤ Δt), and that the initial portion of the actual experimental data record (equal to the extent of the computed kernel memory) is not included in the summation interval of Equation (2.211) in order to prevent estimation bias resulting from null input values.
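As a minimal sketch of the estimator in Equation (2.211) (not a full CSRS experiment), the following computes a first-order kernel estimate by cross-correlation for a simulated linear system driven by a binary quasiwhite input; with unit step size and second moment M2 = 1, the scaling constant reduces to one, and the initial samples are excluded as described above:

```python
import numpy as np

rng = np.random.default_rng(1)
N, M = 100_000, 20                     # record length, kernel memory
x = rng.choice([-1.0, 1.0], size=N)    # binary quasiwhite input, M2 = 1
k1 = np.exp(-np.arange(M) / 4.0)       # "true" first-order kernel
y = np.convolve(x, k1)[:N]             # noise-free system output

# g1(m) = (C1/N') * sum_n y(n) x(n - m); the first M samples are left out
# of the summation to avoid bias from the null initial conditions.
g1 = np.array([np.mean(y[M:] * x[M - m:N - m]) for m in range(M)])
```

The estimates track the true kernel to within the statistical fluctuation expected for this record length.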

The use of Equation (2.211) for the estimation of the CSRS kernels results in two main types of errors, which are due to the finite record length and the finite input bandwidth. These two limitations of finite record and bandwidth are present in any actual application of the method and, consequently, the study of their effect on the accuracy of the obtained kernel estimates becomes of paramount importance. In addition to the aforementioned sources of estimation error, there are errors due to the finite rise-time of input transducers, which along with the discretization and digitization errors can be made negligible in practice [Marmarelis & Marmarelis, 1978]. Of particular importance is the sampling rate, which must be higher than the Nyquist rate of the input-output signals in order to alleviate the aliasing problem (T ≤ Δt) and must exceed twice the system bandwidth B_s (T ≤ (2B_s)^{-1}). Also, a digitization word length of at least 12 bits is needed to provide the numerical accuracy sufficient for most applications.
We concentrate on the two main types of errors caused by the finite record length and
bandwidth of the CSRS family of quasiwhite test signals [Marmarelis, 1975, 1977, 1979].
The relatively simple statistical structure of the CSRS facilitates the study of their auto-
correlation properties (as discussed in Section 2.2.4), which are essential in the analysis of
the kernel estimation errors.
Substituting y(n) in terms of its Volterra expansion in Equation (2.211), we obtain the following expression for the nondiagonal points of the rth-order CSRS kernel estimate:

$$\hat{g}_r(m_1,\ldots,m_r) = C_r \sum_{i=0}^{\infty} \sum_{\sigma_1} \cdots \sum_{\sigma_i} k_i(\sigma_1,\ldots,\sigma_i)\, \hat{\phi}_{r+i}(\sigma_1,\ldots,\sigma_i, m_1,\ldots,m_r) \qquad (2.212)$$

where the factor Δt^i has been incorporated in the discretized Volterra kernel k_i, and

$$\hat{\phi}_{r+i}(\sigma_1,\ldots,\sigma_i, m_1,\ldots,m_r) = \frac{1}{N} \sum_{n=1}^{N} x(n-\sigma_1) \cdots x(n-\sigma_i)\, x(n-m_1) \cdots x(n-m_r) \qquad (2.213)$$

is the estimate of the (r + i)th-order autocorrelation function of the CSRS input x(n).
Clearly, the error analysis for the obtained CSRS kernel estimate ĝ_r necessitates the study of the statistical properties of the autocorrelation function estimate φ̂_{r+i}, as they affect the outcome of the summation in Equation (2.212). The estimation error in ĝ_r consists of two parts: the bias and the variance.

Estimation Bias. The bias in the CSRS kernel estimate is due to the finite bandwidth of the quasiwhite input (i.e., the finite step size Δt). The possible deviation of the CSRS input amplitude distribution from Gaussian may also cause a bias (with respect to the Wiener kernels), but it is not actually an error, since it represents the legitimate difference between the Wiener kernels and the CSRS kernels of the system. Therefore, this type of bias will not be included in the error analysis and was studied separately in Section 2.2.4. Note that the difference between Wiener and CSRS kernels at the nondiagonal points vanishes as Δt tends to zero [see, for instance, Equations (2.128) and (2.134)].
The estimation bias due to the finite input bandwidth amounts to an attenuation of high frequencies in the kernel estimate. The extent of this attenuation can be studied in the time domain by considering the 2rth-order autocorrelation function φ_2r of a CSRS at the nondiagonal points (m_1, ..., m_r), which was found to be [Marmarelis, 1977]

$$\phi_{2r}(\sigma_1,\ldots,\sigma_r; m_1,\ldots,m_r) = \begin{cases} M_2^r \displaystyle\prod_{i=1}^{r} \left(1 - \frac{|m_i - \sigma_i|}{\Delta t}\right) & \text{for } |m_i - \sigma_i| \le \Delta t \\ 0 & \text{elsewhere} \end{cases} \qquad (2.214)$$

Assuming that the rth-order Volterra kernel k_r is analytic in the neighborhood of the nondiagonal point (m_1, ..., m_r), we can expand it in a multivariate Taylor series about that point. With the use of Equation (2.214), we can obtain

$$E[\hat{g}_r(m_1,\ldots,m_r)] = k_r(m_1,\ldots,m_r) + \sum_{l=1}^{\infty} \Delta t^{2l} \sum_{j_1,\ldots,j_l} D_l(j_1,\ldots,j_l)\, k_r^{(2l)}(m_1,\ldots,m_r) \qquad (2.215)$$

where k_r^{(2l)}(m_1, ..., m_r) denotes the 2lth-order partial derivative of k_r, taken twice with respect to each of the arguments indexed by (j_1, ..., j_l) and evaluated at the discrete point (m_1, ..., m_r), and D_l(j_1, ..., j_l) depends on the multiplicity of the set of indices (j_1, ..., j_l). More specifically, if a certain combination (j_1, ..., j_l) consists of I distinct groups of identical indices, and p_i is the population number (multiplicity) of the ith group, then

$$D_l(j_1,\ldots,j_l) = \prod_{i=1}^{I} \frac{1}{(2p_i)!\,(p_i+1)(2p_i+1)} \qquad (2.216)$$

In all practical situations, Δt is much smaller than unity, which allows us to obtain a simpler approximate expression for the expected value of the kernel estimate, given by

$$E[\hat{g}_r(m_1,\ldots,m_r)] \cong k_r(m_1,\ldots,m_r) + \frac{\Delta t^2}{12} \sum_{j=1}^{r} \left.\frac{\partial^2 k_r(\tau_1,\ldots,\tau_r)}{\partial \tau_j^2}\right|_{\tau_j = m_j} \qquad (2.217)$$

Therefore, if higher-order Volterra functionals are present in the system (note that the lower-order ones vanish at the nondiagonal points), then we can show that, in first approximation, they give rise to terms of the form [Marmarelis, 1979]

$$E[\hat{g}_r(m_1,\ldots,m_r)] \cong \sum_{l=0}^{\infty} \frac{(r+2l)!}{r!\, l!\, 2^l} (M_2 \Delta t)^l\, \Delta t^l \sum_{m_{r+1}} \cdots \sum_{m_{r+l}} \Big\{ k_{r+2l}(m_1,\ldots,m_r, m_{r+1}, m_{r+1}, \ldots, m_{r+l}, m_{r+l})$$
$$\qquad\qquad + \frac{\Delta t^2}{12} \sum_{j=1}^{r} \left.\frac{\partial^2 k_{r+2l}(\tau_1,\ldots,\tau_r, \tau_{r+1}, \tau_{r+1}, \ldots, \tau_{r+l}, \tau_{r+l})}{\partial \tau_j^2}\right|_{\tau_j = m_j} \Big\} \qquad (2.218)$$

where (r + 2l) represents the order of each existing higher-order Volterra kernel. We conclude that, for small Δt, the bias of the rth-order CSRS kernel estimate at the nondiagonal points can be approximately expressed as

$$E[\hat{g}_r(m_1,\ldots,m_r)] - g_r(m_1,\ldots,m_r) \cong A_r(m_1,\ldots,m_r)\, \Delta t^2 \qquad (2.219)$$

The function A_r(m_1, ..., m_r) depends on the second partial derivatives of the Volterra kernels of the same or higher order (of the same parity), as indicated by Equations (2.217) and (2.218). We must note that the validity of the derived expressions is limited to the analytic regions of the kernel (a point of only theoretical interest).
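The leading bias term of Equation (2.217) can be checked numerically: convolving a smooth test kernel with the unit-area triangular weighting implied by the autocorrelation of Equation (2.214) should shift it by approximately (Δt²/12)·k″. The grid choices below are illustrative:

```python
import numpy as np

dt, h = 0.05, 0.001                       # CSRS step size, fine grid step
tau = np.arange(0.0, 6.0, h)
k = np.sin(tau)                           # test kernel, with k'' = -sin(tau)

n_half = int(round(dt / h))
s = np.arange(-n_half, n_half + 1) * h    # triangle support [-dt, dt]
w = 1.0 - np.abs(s) / dt                  # triangular weights
w /= w.sum()                              # normalize to unit total mass
k_smooth = np.convolve(k, w, mode="same") # finite-bandwidth "estimate"

bias = k_smooth - k                       # observed bias
pred = (dt ** 2 / 12.0) * (-np.sin(tau))  # predicted (dt^2/12) * k''
```

Away from the grid edges, the observed bias matches the second-derivative prediction closely, with the residual mismatch of order Δt⁴.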

Estimation Variance. The variance of the CSRS kernel estimate ĝ_r is due to the statistical variation of the estimates of the input autocorrelation functions for finite data records. This variance diminishes as N increases, but it remains significant for the values of N used in practice. Thus, if we consider the random deviation γ_i of the ith-order autocorrelation function estimate, then the variance of the rth-order kernel estimate at the nondiagonal points is equal to the second moment of the random quantity

$$\varepsilon_r(m_1,\ldots,m_r) = C_r \sum_{i=0}^{\infty} \int \cdots \int k_i(\sigma_1,\ldots,\sigma_i)\, \gamma_{r+i}(\sigma_1,\ldots,\sigma_i, m_1,\ldots,m_r)\, d\sigma_1 \cdots d\sigma_i = \sum_{i=0}^{\infty} U_{r,i}(m_1,\ldots,m_r) \qquad (2.220)$$
where the scaling factor C_r for nondiagonal points is [Marmarelis, 1977]

$$C_r = \frac{1}{r!\,(M_2\,\Delta t)^r} \qquad (2.221)$$

Each of the random functions U_{r,i} can be evaluated by utilizing the basic properties of the random deviation functions γ_{r+i}; namely, that γ_{r+i} is of first degree (i.e., piecewise linear) with respect to each of its arguments and that its values at the nodal points (i.e., the points where each argument is a multiple of Δt) determine uniquely its values over the whole argument space through multilinear interpolation [Marmarelis, 1975, 1977; Marmarelis & Marmarelis, 1978]. Furthermore, the probability density function of the γ_{r+i} random values at the nodal points will tend to the zero-mean Gaussian, due to the Central Limit Theorem and the statistical properties of the CSRS input. If we expand the kernel k_i into a multivariate Taylor series about the nodal points and evaluate the integral over each elementary segment of the space within which γ_{r+i} remains analytic (and piecewise multilinear), then the variance of each random quantity U_{r,i} (for small Δt) is approximately (cf. Marmarelis, 1979)

$$\text{var}[U_{r,i}(m_1,\ldots,m_r)] \cong \frac{P^i}{N\,\Delta t^r}\, u_{r,i}(m_1,\ldots,m_r) \qquad (2.222)$$

where P = M_2 Δt is the power level of the input CSRS, and u_{r,i} is a square-integrable quantity dependent on the ith-order Volterra kernel of the system. Combining Equations (2.222) and (2.220), we find that the variance of the rth-order CSRS kernel estimate at the nondiagonal points is approximately

$$\text{var}[\hat{g}_r(m_1,\ldots,m_r)] \cong \frac{B_r^2(P;\, m_1,\ldots,m_r)}{N\,\Delta t^r} \qquad (2.223)$$

where

$$B_r^2(P;\, m_1,\ldots,m_r) = \sum_{i=0}^{\infty} P^i\, u_{r,i}(m_1,\ldots,m_r) \qquad (2.224)$$

depends on the Volterra kernels of the system and the input power level (for details, see
Marmarelis, 1979).

Optimization of Input Parameters. The combined error, due to the estimation variance and bias, can be minimized by judicious choice of the CSRS input parameters Δt and N. This optimization procedure is based on Equations (2.219) and (2.223); therefore, it is valid for very small values of Δt, a condition that is always satisfied in actual applications.

We seek to minimize the total mean-square error Q_r of the rth-order CSRS kernel estimate, which can be found by summation of A_r² and B_r² over the effective memory extent of the kernel:

$$Q_r = \alpha_r\,\Delta t^4 + \frac{\beta_r(P)}{N\,\Delta t^r} \qquad (2.225)$$

where α_r and β_r are the summations of A_r² and B_r², respectively, over the points (m_1, ..., m_r) that cover the entire kernel memory. Note that Q_r decreases monotonically with increasing N but, in practice, limitations are imposed on N by experimental or computational considerations. Therefore, the optimization task consists of selecting Δt so as to minimize Q_r for a given value of N (set at the maximum possible for a given application).

The function Q_r always has a single minimum with respect to Δt, because α_r and β_r are positive constants. The position of this minimum defines the optimum Δt for each order r and is given by

$$(\Delta t_{\mathrm{opt}})_r = \left[\frac{r\,\beta_r(P)}{4N\,\alpha_r}\right]^{1/(r+4)} \qquad (2.226)$$

Consideration must be given to the fact that the optimum Δt depends implicitly, through β_r, on the power level P, which is proportional to Δt. Therefore, the optimization of Δt for a fixed P necessitates the adjustment of the second moment of the CSRS input signal as Δt changes (since P = M_2 · Δt), by dividing its amplitude by √Δt.

Note also that Equation (2.226) gives a different optimum Δt for each order of kernel. The determination of each optimum Δt requires knowledge of the corresponding constants α_r and β_r, which depend on the (unknown) Volterra kernels of the system. Therefore, these constants must be estimated through properly designed preliminary tests, on the basis of the analysis given above. For instance, if we obtain several rth-order kernel estimates for various values of Δt while keeping the input power level constant and varying the record length so as to keep the second term of Equation (2.225) constant, then α_r can be estimated through a least-squares regression procedure. A regression procedure can also be used to estimate β_r by varying the input record length while keeping Δt and P constant. These procedures have been illustrated with computer-simulated examples [Marmarelis, 1977] and elaborated extensively in the first monograph by the Marmarelis brothers [Marmarelis & Marmarelis, 1978].
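The resulting optimization, Equations (2.225)-(2.226), can be sketched as follows; the constants α_r and β_r would come from the preliminary regressions just described, and the numerical values used here are purely illustrative:

```python
import numpy as np

def Q(r, alpha_r, beta_r, N, dt):
    """Total mean-square error of Eq. (2.225)."""
    return alpha_r * dt ** 4 + beta_r / (N * dt ** r)

def dt_optimum(r, alpha_r, beta_r, N):
    """Minimizer of Q_r with respect to dt, Eq. (2.226)."""
    return (r * beta_r / (4.0 * N * alpha_r)) ** (1.0 / (r + 4))

r, alpha_r, beta_r, N = 2, 5.0, 3.0, 2000   # illustrative constants
dt_opt = dt_optimum(r, alpha_r, beta_r, N)
```

Because Q_r is the sum of an increasing and a decreasing power of Δt, the closed-form minimizer beats every other step size on the error curve.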
We illustrate below the findings of the presented analysis through computer-simulated examples and demonstrate the proposed optimization method in an actual application. The dependence of the estimation bias upon the step size Δt of the CSRS input [cf. Equation (2.219)] is illustrated in Figure 2.37, where the kernel estimates obtained for three different step sizes are shown. The bias of those estimates at each point is approximately

Figure 2.37 Dependence of first-order CSRS kernel estimation bias on CSRS step size (top) for the simulated example defined by the first-order Volterra kernel shown as curve D in the bottom panel, along with the CSRS kernel estimates (bottom) for Δt equal to 0.1 (A), 0.2 (B), and 0.4 (C) [Marmarelis & Marmarelis, 1978].

proportional to the second derivative of the kernel and the square of the respective step size, confirming Equation (2.219).

The dependence of the estimation variance upon the record length [cf. Equation (2.223)] is illustrated in Figure 2.38, where the first- and second-order CSRS kernel estimates, obtained for three different record lengths, are shown. The step size Δt of the CSRS input in all three cases is 0.1 sec. The normalized mean-square errors of those estimates (as percentages of the respective summed squared kernels) are given in Table 2.2. These numerical values confirm, within some statistical deviation, the theoretical relation given by Equation (2.223).
The dependence of the total estimation error (bias and variance) upon the step size Δt of the CSRS input [see Equation (2.225)] is illustrated in Figure 2.39, where the second-order CSRS kernel estimates obtained for three different step sizes are shown. The data-record length in all three cases is 500 CSRS steps (i.e., N = 500). The statistical error (estimation variance) for multiple kernel estimates of first and second order is plotted in Figure 2.40 for various step sizes. Note that the same data-record length is used for these estimates and the input power level is kept constant while varying Δt by adjusting the variance of the CSRS input signal. The total mean-square errors of several kernel estimates, obtained for various CSRS step sizes and record lengths, are shown in Figure 2.41, along with the curves representing the theoretical relation described by Equation (2.225).

Table 2.2 Normalized Mean-Square Errors of First- and Second-Order CSRS Kernel Estimates for Various Record Lengths

                       Normalized mean-square error (%) of CSRS kernel estimate
Record length              First order          Second order
500 CSRS steps                0.834                 23.74
1000 CSRS steps               0.488                 12.22
2000 CSRS steps               0.211                  6.82

Figure 2.38 First- and second-order CSRS kernel estimates for various data-record lengths: 500 (A), 1000 (B), and 2000 (C) data points. The exact kernels are shown in (D) [Marmarelis & Marmarelis, 1978].

Figure 2.39 Second-order CSRS kernel estimates for various step sizes of the CSRS input (Δt = 0.15, 0.45, and 0.60 sec). The exact kernel is shown in Figure 2.38 (D) [Marmarelis & Marmarelis, 1978].

Noise Effects. The effect of noise in the data is examined in the context of the cross-correlation technique by considering first the most common case of output-additive noise:

$$\tilde{y}(n) = y(n) + \varepsilon(n) \qquad (2.227)$$

Then the rth-order CSRS kernel estimate obtained via cross-correlation becomes

$$\tilde{g}_r(m_1,\ldots,m_r) = \hat{g}_r(m_1,\ldots,m_r) + \frac{C_r}{N} \sum_{n} \varepsilon(n)\, x(n-m_1) \cdots x(n-m_r) \qquad (2.228)$$

The last term of Equation (2.228) represents the effect of output-additive noise on the CSRS kernel estimate and will diminish as the record length N increases, because the noise ε(n) is assumed to have zero mean and finite variance.
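A quick Monte Carlo sketch of this claim: the noise term of Equation (2.228) has zero mean, and its spread shrinks roughly as 1/√N. The binary quasiwhite input and Gaussian noise below are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(2)

def noise_term(N, m=3):
    """One realization of the first-order (r = 1) noise term of Eq. (2.228)."""
    x = rng.choice([-1.0, 1.0], size=N)   # binary quasiwhite input
    e = rng.standard_normal(N)            # zero-mean output-additive noise
    return np.mean(e[m:] * x[:-m])

std_small = np.std([noise_term(1_000) for _ in range(200)])
std_large = np.std([noise_term(16_000) for _ in range(200)])
```

Multiplying the record length by 16 shrinks the standard deviation of the noise term by roughly a factor of 4.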

Figure 2.40 The statistical error (estimation variance) of first- and second-order CSRS kernel estimates for various CSRS step sizes (mean and standard deviation bars) [Marmarelis & Marmarelis, 1978].

In the case of input-additive noise, the effect is more complicated, since substitution of x̃ = x + δ into the cross-correlation formula (2.211) for CSRS kernel estimation gives rise to many multiplicative terms involving the noise term and the input signal. Furthermore, some of the input-additive noise (if not simply measurement error) may propagate through the system and give rise to noise-generated deviations in the output signal. This further complicates matters and causes multiplicative terms that may not have zero mean (e.g., products of correlated noise values), whose effect will not diminish as fast with increasing N. Fortunately, in practice, input-additive noise is not significant in most cases, and care must be taken to ensure that it remains minimized by paying proper attention to the way the input is applied to the system (e.g., careful design of D-to-A devices and transducers).

The effect of systemic noise or interference (i.e., random variations of the system characteristics due to interfering factors) is potentially the most detrimental and the most difficult to alleviate in practice, since its effect on the output signal is generally dependent on the input and may result in significant estimation errors when the cross-correlation technique is used for kernel estimation. Nonetheless, if the systemic noise/interference has zero

Figure 2.41 The total mean-square error of CSRS kernel estimates for various CSRS step sizes Δt and record lengths T: Q_0 (zeroth order), Q_1 (first order), Q_2 (second order), Q_tot (all orders together) [Marmarelis & Marmarelis, 1978].

mean, then sufficiently long data records may alleviate its effect considerably. Careful attention must be given to the nature of the systemic noise/interference, and experimental care must be taken to minimize its effects.

Erroneous Scaling of Kernel Estimates. In the cross-correlation method, we obtain the rth-order CSRS kernel estimate by scaling the rth-order cross-correlation function estimate with the appropriate factors that depend on the even moments of the CSRS, its step size Δt, and the location of the estimated kernel point (i.e., the multiplicity of indices for diagonal points).
The problem that is encountered in practice, with regard to the accurate scaling of the CSRS kernel estimates, derives from the fact that the actual input signal that stimulates the system under study deviates somewhat from the exact theoretical CSRS waveform that is intended to be delivered as the test input to the system. This deviation is usually caused by the finite response time of the experimental transducers, which convert the string of numbers generated by the digital computer into an analog physical signal that stimulates the system under study. This does not affect the quasiwhite autocorrelation properties of the test input signal, but it causes some changes in the values of the even moments (which define the scaling factors) from the ones that are theoretically anticipated.

If the measurement of the actual even moments of the applied input signal is not possible, then a final correctional procedure can be used that is based on a least-squares fit between the actual system output and the estimated contributions of the estimated CSRS orthogonal functionals, in order to determine the accurate scaling of the CSRS kernel estimates [Marmarelis & Marmarelis, 1978].
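This correctional rescaling can be sketched as an ordinary least-squares fit of the measured output on the estimated functional contributions. The contributions below are synthetic stand-ins, not outputs of an actual CSRS experiment:

```python
import numpy as np

rng = np.random.default_rng(3)
N = 5000
# Synthetic stand-ins for the estimated contributions of the zeroth-,
# first-, and second-order CSRS orthogonal functionals to the output.
contrib = rng.standard_normal((N, 3))
true_gain = np.array([1.0, 0.8, 1.3])    # unknown scaling errors to recover
y = contrib @ true_gain + 0.05 * rng.standard_normal(N)

# Least-squares fit of the measured output on the estimated contributions
# recovers the correct scaling of each functional term.
gain, *_ = np.linalg.lstsq(contrib, y, rcond=None)
corrected = contrib * gain               # rescaled contributions
```

The fitted gains recover the true scaling factors to within the noise-induced uncertainty.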

2.4.3 Estimation Errors Associated with Direct Inversion Methods


The direct inversion methods for Volterra kernel estimation employ the modified discrete Volterra (MDV) model of Equation (2.180) in its matrix formulation of Equation (2.183). Although pseudoinversion may be required when the matrix V is not full rank or is simply ill-conditioned (as discussed in Section 2.3.1), the error analysis will be performed below for the case when the matrix V is full rank and, equivalently, the Gram matrix G = [V′V] is nonsingular. In this case, we can define the "projection" matrix

$$H_r = I - V_r [V_r' V_r]^{-1} V_r' \qquad (2.229)$$

for an MDV model of order r that corresponds to structural parameter values of L basis functions for the expansion of the Volterra kernels of highest order Q. The matrix H_r is idempotent and has rank (N − P_r), where N is the number of output data points and P_r = (Q + L)!/(Q! L!) is the total number of free parameters in the MDV model (i.e., the total number of kernel expansion coefficients that need to be estimated).
In Section 2.3.1, we showed that the residual vector for model order r is given by

$$\varepsilon_r = H_r\, y = u_r + w_r \qquad (2.230)$$

where u_r is input-dependent (the unexplained part of the system output, which vanishes for the correct model order), and w_r is the transformation of the input-independent noise w_0 (which is added to the output data and assumed to be white and Gaussian) by the "projection" matrix:

$$w_r = H_r\, w_0 \qquad (2.231)$$

Note that the matrix H_r depends on the input data and the MDV model structure of order r. Therefore, even if w_0 is white, the residuals for r ≠ 0 are not generally white. Furthermore, we note that the residuals for an incomplete model order r contain a term

$$u_r = H_r\, u_0 \qquad (2.232)$$

that represents the model "truncation error" of order r (because it is the part of the noise-free output that is not captured by the model of order r), where u_0 represents the noise-free output data.
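The algebraic properties claimed above (idempotency, rank N − P_r, and the residual decomposition) are easy to verify numerically for a random stand-in regressor matrix; the sizes below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(4)
N, P = 40, 6
V = rng.standard_normal((N, P))                     # stand-in for V_r
H = np.eye(N) - V @ np.linalg.solve(V.T @ V, V.T)   # Eq. (2.229)

u0 = rng.standard_normal(N)        # stand-in noise-free output
w0 = rng.standard_normal(N)        # white output-additive noise
residual = H @ (u0 + w0)           # eps_r = H_r y = u_r + w_r
```

H is idempotent, its trace equals its rank N − P, it annihilates anything in the column space of V (so the truncation error vanishes for the correct model order), and the residual splits linearly into the two parts of Equation (2.230).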
Thus, for an MDV model of order r, the "model specification" error is represented by u_r and vanishes for the true system order R, i.e., u_R = 0. The size of this error in the estimated expansion coefficients for a truncated model (r < R) is given by

$$\theta_r \triangleq E[\hat{c}_r] - c_r = A_r A_R^{+} c_R - c_r \qquad (2.233)$$

where c_r and c_R denote the kernel expansion coefficients for the model orders r and R, respectively (R is the true system order), A_R^{+} denotes the generalized inverse of the [P_R × N] matrix A_R, and A_r is the input-dependent matrix that yields the coefficient vector estimate for model order r:

$$\hat{c}_r = A_r\, y \qquad (2.234)$$

where

$$A_r = [V_r' V_r]^{-1} V_r' \qquad (2.235)$$

Note that the coefficient vector estimate is composed of two parts:

$$\hat{c}_r = A_r\, u_0 + A_r\, w_0 \qquad (2.236)$$

where the first term represents the deterministic part of the estimate ĉ_r (which carries the bias due to the model truncation), and the second term represents the random variation in the estimate due to the output-additive noise w_0 (statistical error). Taking the expectation gives

$$E[\hat{c}_r] = A_r\, u_0 = A_r A_R^{+} c_R \qquad (2.237)$$

which yields the result of Equation (2.233), quantifying the estimation bias due to the truncation of the model order. It is evident from Equation (2.233) that the model specification error depends on the model structure and the input data.
The statistical estimation error A_r w_0 has zero mean and covariance matrix

cov[ĉ_r] ≜ E[A_r w_0 w_0′ A_r′] = σ_0² · A_r A_r′
                                = σ_0² · [V_r′ V_r]⁻¹    (2.238)

when the output-additive noise is white with variance σ_0² (note that the SNR = u_0′u_0/(N σ_0²)). Thus, the coefficient estimates are correlated with each other and their covariance depends on the input data and the structure of the model. The estimation variance will be minimum for Gaussian noise (for given input data). The statistics of the estimation error are multivariate Gaussian if w_0 is Gaussian, or will tend to multivariate Gaussian even if w_0 is not Gaussian (but reasonably concentrated) because of the Central Limit Theorem.
When the output-additive noise w_0 is not white but has some covariance matrix S, then we can perform prewhitening of the output data in a manner similar to the one described earlier by Equations (2.47)-(2.48). This results in alteration of the fundamental matrix A_r as

A_r = [V_r′ S⁻¹ V_r]⁻¹ V_r′ S⁻¹    (2.239)

which determines the estimation bias in Equation (2.233) and the estimation variance in Equation (2.238). Thus, the estimation bias and variance of the kernel expansion coefficients are affected by the covariance matrix of the output-additive noise when the latter is not white.
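As a numerical illustration of this prewhitened (generalized least-squares) estimator, the following Python sketch uses synthetic data; the dimensions, names, and noise covariance are illustrative choices of our own:

```python
import numpy as np

rng = np.random.default_rng(0)
N, P = 200, 4
V = rng.standard_normal((N, P))           # regression (expansion) matrix V_r
c_true = np.array([1.0, -0.5, 0.25, 2.0]) # "true" expansion coefficients

# correlated (non-white) output-additive noise with tridiagonal covariance S
S = (0.1 * np.eye(N)
     + 0.05 * np.diag(np.ones(N - 1), 1)
     + 0.05 * np.diag(np.ones(N - 1), -1))
w = rng.multivariate_normal(np.zeros(N), S)
y = V @ c_true + w

# A_r = [V' S^-1 V]^-1 V' S^-1, cf. Equation (2.239)
S_inv = np.linalg.inv(S)
A = np.linalg.solve(V.T @ S_inv @ V, V.T @ S_inv)
c_hat = A @ y
print(np.round(c_hat, 2))
```

For white noise (S = σ_0²·I) this expression reduces to the ordinary least-squares form of Equation (2.235).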
The case of non-Gaussian output-additive noise is discussed in the following section in connection with iterative cost-minimization methods, since the estimation problem becomes nonlinear in the unknown expansion coefficients when nonquadratic cost functions are used.

2.4.4 Estimation Errors Associated with Iterative Cost-Minimization Methods
These iterative estimation methods are typically used in connection with the network models introduced in Section 2.3.3 and elaborated further in Sections 4.2-4.4, because the estimation problem becomes nonlinear with respect to the unknown network parameters in those cases. The estimation problem also becomes nonlinear when nonquadratic cost functions are used (see Section 2.1.5), even if the model remains linear with respect to the unknown parameters. It is also plausible that these methods can be used for practical advantage in certain cases with quadratic cost function in which the model is linear with respect to the unknown parameters (e.g., for a very large and/or ill-conditioned Gram matrix), resulting in the "iterative least-squares" method [Goodwin & Sin, 1984; Ljung, 1987; Haykin, 1994].
The model specification error for these methods relates to the selection of the structural parameters (e.g., the number L of basis filters, the number H of hidden units, the nonlinear order Q). This error can be minimized by use of a statistical selection criterion such as the one described by Eq. (2.200) in Section 2.3.1 in connection with the modified discrete Volterra (MDV) model. However, the size of the resulting specification error depends on the particular iterative method used. The same is true for the parameter estimation error, which depends on the ability of the selected iterative method to converge to the global minimum of the cost function (for the specified model). Since there is a great variety of such iterative methods (from gradient-based to random-search or genetic algorithms), a comprehensive error analysis is not possible within the bounds of this section. The reader can find ample information on this subject in the extensive literature on the training of "artificial neural networks" that have recently come into vogue, or on general minimization methods dating all the way back to the days of Newton [Eykhoff, 1974; Haykin, 1994; Hassoun, 1995].
For the modest objectives of this section, we limit our discussion to simple gradient-based methods in connection with the MDV model [see Equations (2.180) and (2.205)]. Generally speaking, the efficacy of all these iterative cost-minimization methods relies on the ability of the employed algorithm to find the global minimum (thus avoiding being trapped in possible local minima) within a reasonable number of iterations (fast convergence). Clearly, this task depends on the "morphology of the surface defined by the cost function" in the parameter space (to use a geometric notion). If this surface is convex with a single minimum, the task is trivial. However, this is seldom the case in practice, where the task is further complicated by the fact that the "surface morphology" changes at each iteration. In addition, the presence of noise introduces additional "wrinkles" onto this surface. The presence of multiple local minima (related to noise or not) makes the choice of the initialization point rather critical. Therefore, this task is formidable and remains an open challenge in its general formulation (although considerable progress has been made in many cases with the introduction of clever algorithms for specific classes of problems).
As an example, we consider the case of the MDV model of Equation (2.180) and the gradient-descent method of Equation (2.205) with a quadratic cost function F(e) yielding Equation (2.208). Then, the "update rule" of the iterative procedure for the expansion coefficient c_r(j_1, . . . , j_r) is

c_r^{(i+1)}(j_1, . . . , j_r) = c_r^{(i)}(j_1, . . . , j_r) + 2ρ e^{(i)}(n) v_{j_1}^{(i)}(n) · · · v_{j_r}^{(i)}(n)    (2.240)

where i is the iteration index, ρ is the step-size constant of the gradient method, and v_j^{(i)}(n) denotes the ith-iteration estimate of v_j(n). It is evident from Equation (2.240) that the method utilizes an estimate of the gradient of the cost-function surface at each iteration. This estimate can be rather poor and further degraded by the presence of output-additive noise [i.e., a term could be added at the end of Equation (2.240) representing the gradient of the input-dependent residual w_r in Equation (2.231)]. Therefore, the reader should be disabused of any simplistic notions of reliable gradient evaluation of the putative cost-function surface or of the certitude of smooth convergence in general.
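A minimal Python sketch of such a gradient-based (LMS-type) iteration for a model that is linear in its coefficients, with a quadratic cost; the step-size constant `rho`, the data, and all variable names are illustrative choices of our own:

```python
import numpy as np

rng = np.random.default_rng(1)
N, P = 2000, 3
V = rng.standard_normal((N, P))      # regressors v_j(n)
c_true = np.array([0.8, -0.3, 0.5])  # "true" coefficients
y = V @ c_true + 0.05 * rng.standard_normal(N)

c = np.zeros(P)                      # initial guess c^(0)
rho = 0.01                           # step-size constant
for i in range(20):                  # iterations (passes over the data)
    for n in range(N):
        e = y[n] - V[n] @ c          # residual e^(i)(n)
        c = c + 2 * rho * e * V[n]   # update rule, cf. Equation (2.240)
print(np.round(c, 2))
```

For this convex (quadratic) cost the iteration converges smoothly; the pitfalls discussed above arise when the parameters enter nonlinearly and the cost surface is no longer convex.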
It is evident that, even in this relatively simple case where the model is linear with respect to the unknown parameters and the cost function is quadratic, the convergence of the gradient-descent algorithm raises a host of potential pitfalls. Nonetheless, experience has shown that gradient-descent methods perform (on the average) at least as well as any existing alternative search methods, which have their own set of potential pitfalls.
It should also be noted that the three types of error defined in the introduction of this section (model specification, model estimation, and noise-related) are intertwined in their effects following the iterative cost-minimization approach. For instance, the use of the proper model order does not guarantee elimination of the model specification error (since entrapment in a local minimum is still possible), and the use of very long data records does not guarantee diminution of the model estimation error (no claim of "consistent estimation" can be made). Likewise, it is not possible to quantify separately the effects of noise based on a measure of signal-to-noise ratio in the data, as these effects depend on the unpredictable distortion of the cost-function surface in the neighborhood of the global minimum. For all these reasons, reliable quantitative analysis of estimation errors is not generally possible for the iterative cost-minimization methods.

HISTORICAL NOTE #2: VITO VOLTERRA AND NORBERT WIENER

The field of mathematical modeling of nonlinear dynamic systems was founded on the pioneering ideas of the Italian mathematician Vito Volterra and his American counterpart Norbert Wiener, the "father of cybernetics." Vito Volterra was born at Ancona, Italy in 1860 and was raised in Florence. His strong interest in mathematics and physics convinced his reluctant middle-class family to let him attend university, and he received a Doctorate in Physics from the University of Pisa in 1882. He rose quickly in mathematical prominence to become the youngest Professor of the University of Pisa in 1883, at the age of 23. In 1890, he was invited to assume the Chair of Mathematical Physics at the University of Rome, reaching the highest professional status in Italian academe.
In his many contributions to mathematical physics and nonlinear mechanics, Volterra emphasized a method of analysis that he described as "passing from the finite to the infinite." This allowed him to extend the mathematical body of knowledge on vector spaces to the study of functional spaces. In doing so, he created a general theory of functionals (a function of a function) as early as 1883 and applied it to a class of problems that he termed "the inversion of definite integrals." This allowed solution of long-standing problems in nonlinear mechanics (e.g., hereditary elasticity and hysteresis) and other fields of mathematical physics in terms of integral and integrodifferential equations. In his seminal monograph "Theory of Functionals and of Integral and Integro-differential Equations," Volterra discusses the conceptual and mathematical extension of the Taylor series expansion to the theory of analytic functionals, resulting in what we now call the Volterra series expansion, which is at the core of our subject matter.
The introduction of the Volterra series as a general explicit representation of a functional (output) in terms of a causally related arbitrary function (input) occupies only one page in Volterra's seminal monograph (p. 21 in the Dover edition of 1930), and the definition of the homogeneous Volterra functionals occupies only two pages (pp. 19-20). Nonetheless, the impact of these seminal ideas on the development of methods for nonlinear system modeling from input-output data has been immense, starting with the pivotal influence on Wiener's ideas on this subject (as discussed below).
The breadth of Volterra's intellect and sociopolitical awareness led him to become a Senator of the Kingdom of Italy in 1905 and to take active part in World War I (1914-1918). He was instrumental in bringing Italy to the side of the Allies (the Entente) and, although in his mid-50s, he joined the Air Force and developed its capabilities, flying himself with youthful enthusiasm in the Italian skies. He assiduously promoted scientific and technical collaboration with the French and English allies.
At the end of World War I in 1918, Volterra returned to university teaching and commenced research on mathematical biology regarding the population dynamics of competitive species in a shared environment. This work was stimulated by extensive interactions with Umberto D'Ancona, a Professor of Biology at the University of Siena, and remained his primary research interest until the end of his life in 1940.
Volterra's later years were shadowed by the oppressive grip of fascism rising over Italy in the 1920s. He was one of the few Senators who had the courage to oppose vocally the rise of fascism at great personal risk, offering to all of us a life model of scientific excellence coupled with moral convictions and liberal thinking. When democracy was completely abolished by the Fascists in 1930, he was forced to leave the University of Rome and resign from all positions of influence. His life was saved by the international renown of his scientific stature. From 1930 until his death in 1940, he lived mostly in Paris and other European cities, returning only occasionally to his country house in Ariccia, where he took refuge from the turmoil of a tormented Europe.
Volterra lived a productive and noble life. He remains an inspiration not only for his pioneering scientific contributions but also for his highly ethical and principled conduct in life. In the words of Griffith Evans in his Preface to Volterra's monograph (Dover edition), "His career gives us confidence that the Renaissance ideal of a free and widely ranging knowledge will not vanish, however great the pressure of specialization." Let me add that Volterra's noble life offers a model of the struggle of enlightened intellect against the dark impulses of human nature, which remains the only means for sustaining the advancement of civilization against reactionary forces.
It is intriguing that with Vito Volterra's passing in 1940, and as the world was getting embroiled in the savagery of World War II, Norbert Wiener was joining the war effort against the Axis on the other side of the Atlantic and was assuming the scientific mantle of advancing Volterra's pivotal ideas (regrettably without explicit acknowledgement) with regard to the problem of nonlinear system modeling from input-output data.
Norbert Wiener was born in 1894 and demonstrated an early aptitude in mathematics (he was a child prodigy), receiving his Ph.D. at Harvard at the age of 19 and becoming a Professor of Mathematics at MIT at the age of 25, where he remained until the end of his life in 1964. Among his many contributions, the ones with the highest scientific impact have been the solution of the "optimal linear filtering" problem (the Wiener-Hopf equation)

and its extension to nonlinear filtering and system identification (the Wiener functional series) that is germane to our subject. However, the most influential and best known of his intellectual contributions has been the introduction of "cybernetics" in 1948 as the science of "communication and control in the animal and the machine."
Wiener authored several books and monographs on the mathematical analysis of data, including The Fourier Integral and Certain of its Applications (1933) and Extrapolation, Interpolation, and Smoothing of Stationary Time Series with Engineering Applications (1949). He also authored a two-volume autobiography, Ex-Prodigy: My Childhood and Youth (1953) and I Am a Mathematician (1956), as well as a novel, The Tempter (1959). In the last year of his life, he published God and Golem (1964), which received posthumously the National Book Award in 1965.
Wiener was an original thinker and a grand intellectual force throughout his life. His ideas had broad impact that extended beyond science, as they shaped the forward perspective of our society with regard to cybernetics and ushered in the "information age." Germane to our subject matter is Wiener's fundamental view ". . . that the physical functioning of the living individual and the operation of some of the newer communication machines are precisely parallel in their analogous attempts to control entropy through feedback" [from The Human Use of Human Beings: Cybernetics and Society (1950), p. 26; emphasis added]. This view forms the foundation for seeking mathematical or robotic emulators of living systems and asserts the fundamental importance of feedback in physiological system function (i.e., homeostatic mechanisms constraining disorganization, which is analogous to "controlling entropy"). This view is the conceptual continuation of the Hippocratic views on "the unity of organism" and "the recuperative mechanisms against the disease process" with regard to the feasibility of prognosis. Furthermore, Wiener's specific mathematical ideas for modeling nonlinear dynamic systems in a stochastic context, presented in his seminal monograph Nonlinear Problems in Random Theory (1958), have provided great impetus to our collective efforts for modeling physiological systems.
Wiener's seminal ideas on systems and cybernetics have shaped the entire field of systems science and have inspired its tremendous growth in the last 50 years. This development was greatly assisted by his interactions with the MIT "statistical communication theory group" under the stewardship of Y. W. Lee (Wiener's former doctoral student). Through these interactions, Wiener's ideas were adapted to the practical context of electrical engineering and were elaborated for the benefit of the broader research community by Lee's graduate students (primarily Amar Bose, who first expounded on Wiener's theory of nonlinear systems in a 1956 RLE Technical Report). Lee himself worked with Wiener (initially for his doctoral thesis) on the use of Laguerre filters for the synthesis of networks/systems (a pivotal Wiener idea that has proven very useful in recent applications) and the use of high-order cross-correlations for the estimation of Wiener kernels (in collaboration with Lee's doctoral student, Martin Schetzen). Wiener's ideas on nonlinear system modeling were presented in a series of lectures to Lee's group in 1956 and were transcribed by this group into Wiener's seminal monograph Nonlinear Problems in Random Theory.
Wiener's ideas on nonlinear system identification were first recorded in a brief 1942 paper in which the Volterra series expansion was applied to a nonlinear circuit with Gaussian white noise excitation. Following the orthogonalization ideas of Cameron and Martin (1947), and combining them with his previously published ideas on the "homogeneous chaos" (1938), he developed the orthogonal Wiener series expansion for Gaussian white noise inputs that is the main thrust of the aforementioned seminal monograph and his main contribution to our subject matter (see Section 2.2).
Wiener's ideas were further developed, elaborated, and promulgated by Lee's group in the 1950s [Bose, 1956; Brilliant, 1958; George, 1959] and in the 1960s, with the Lee-Schetzen cross-correlation technique (1965) having the greatest impact on future developments. However, the overall impetus of the MIT group began to wane in the 1970s, while at the same time the vigor of Wiener's ideas sprouted considerable research activity at Caltech on the subject of visual system modeling, spearheaded by my brother Panos in collaboration with Ken Naka and Gilbert McCann (see Prologue). As these ideas were put to the test of actual physiological system modeling, the weaknesses of the Wiener approach (as well as its strengths) became evident and stimulated an intensive effort for the development of variants that offered more effective methodologies in a practical context. This effort spread nationally and bore fruit in the form of more effective methodologies by the late 1970s, when the Caltech group was dissolved through administrative fiat in one of the least commendable moments of the school's administration. The state of the art at that time was imprinted on the monograph Analysis of Physiological Systems: The White-Noise Approach, authored by the Marmarelis brothers.
The torch was passed on to a nationwide community of researchers who undertook to apply this methodology to a variety of physiological systems and further explore its efficacy. It became evident by this time that only some core elements of the original Wiener ideas remained essential. For instance, it is essential to have broadband (preferably stochastic) inputs and kernel expansions for improved efficiency of estimation, but there is no need to orthogonalize the Volterra series or necessarily use Gaussian white noise inputs. This process culminated in the early 1990s, with the development of efficient methodologies that are far more powerful than the original methods in a practical application context (see Section 2.3). On this research front, a leading role was played by the Biomedical Simulations Resource Center at the University of Southern California (a research center funded by the National Institutes of Health since 1985) through the efforts of the author and his many collaborators worldwide.
As we pause at the beginning of the new century to reflect on this scientific journey that started with Volterra and Wiener, we can assert with a measure of satisfaction that the roots planted by Hippocrates and Galen grew through numerous efforts all the way to the point where we can confidently avail the present generation of scientists with the powerful methodological tools to realize the "quantum leap" in systems physiology through reliable models that capture the essential complexity of living systems.
3
Parametric Modeling

Parametric modeling typically takes the form of algebraic or differential/difference equations, depending on whether the system is static or dynamic, respectively. The term "parametric" derives from the free-parameter status of the coefficients of the model equation that must be estimated from data. Both continuous-time and discrete-time parametric models can be used, although the discrete-time form (difference equations) is the one obtained directly from the sampled input-output data. Equivalent continuous-time models (in the form of nonlinear differential equations) can be derived from their discrete-time counterparts (nonlinear difference-equation models) using a specialized technique discussed in Section 3.5.
Historically, differential-equation models have been developed (actually postulated) for various physiological systems based on first physical/chemical principles, following the reductionist approach of the physical sciences. These parametric models are extremely useful if accurate, because they are physiologically interpretable and often compact. However, the deductive process by which they are developed is often fraught with dangers of misrepresentation and oversimplification. The result can be an inaccurate model with pretensions of scientific pedigree that does not capture the true dynamics of the physiological system under study. Therefore, the validity of postulated parametric models must be tested rigorously with a broad ensemble of input signals and under various operational conditions, representative of the actual operating environment of the system.
The difficulty of developing accurate parametric models of physiological systems from first principles (due to the complexity of such systems) has motivated a more practical procedure, whereby we postulate a class of plausible parametric models and select the most likely member of this class on the basis of the available input-output data. This process incorporates all existing knowledge (and intuition) about the characteristic properties and the functional organization of the system. The free parameters of the selected model are estimated from the data as part of the modeling process.

Nonlinear Dynamic Modeling of Physiological Systems. By Vasilis Z. Marmarelis
ISBN 0-471-46960-2 © 2004 by the Institute of Electrical and Electronics Engineers.

Another approach that leads to parametric models in a "data-true" fashion commences with inductive nonparametric modeling using the input-output data and then transitions to an equivalent parametric model, as discussed in Section 3.4. The reverse route is also possible (i.e., going from a parametric model to an equivalent nonparametric model), as discussed in Section 3.2 for continuous-time models and in Section 3.3 for discrete-time models.
The important case of nonlinear feedback and its relation to parametric models of nonlinear differential equations is discussed in Section 4.1.5 in the modular modeling context. Nonlinear feedback is an essential functional component of many physiological systems, used to achieve homeostasis through autoregulation or improved performance (e.g., enhancement of sensory/neural information processing).
Note that the connectionist models discussed in Chapter 4 may also be viewed as parametric models, since the modeling task reduces to estimating the free parameters of the respective connectionist model/network. However, the distinction is made herein between the two types of model structures: networks (connectionist) versus equations (parametric), because the latter maintain the claim of physical lineage and physiological interpretability (as well as parsimonious representation).

3.1 BASIC PARAMETRIC MODEL FORMS AND ESTIMATION PROCEDURES

Consider the discrete-time input and output data, x(n) and y(n), respectively. If the system is static and linear, then we can employ the simple model of linear regression,

y(n) = a x(n) + b + e(n)    (3.1)

to represent the input-output relation for every discrete-time index n, where e(n) denotes the error term (or model residual) at each n. The unknown parameters (a, b) can be estimated through least-squares linear regression methods, as discussed in Section 2.1.5.
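For instance, (a, b) in Equation (3.1) can be obtained by ordinary least squares; a small self-contained Python sketch with synthetic data (the true values a = 2, b = 1 are our own illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100
x = rng.standard_normal(n)
y = 2.0 * x + 1.0 + 0.1 * rng.standard_normal(n)   # true a = 2, b = 1

# regression matrix with a constant column for the intercept b
R = np.column_stack([x, np.ones(n)])
(a_hat, b_hat), *_ = np.linalg.lstsq(R, y, rcond=None)
print(round(a_hat, 2), round(b_hat, 2))
```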
This model can be extended to multiple inputs {x_1, x_2, . . . , x_k} and outputs {y_i} as

y_i(n) = a_{i,1} x_1(n) + a_{i,2} x_2(n) + . . . + a_{i,k} x_k(n) + b_i + e_i(n)    (3.2)

and various linear regression techniques can be used for the estimation of the unknown parameters (a_{i,1}, . . . , a_{i,k}, b_i) for each output y_i [Eykhoff, 1974; Soderstrom & Stoica, 1989; Ljung & Soderstrom, 1983; Ljung, 1987; Goodwin & Payne, 1977].
If the system is static and nonlinear, then a nonlinear input-output relation

y(n) = Σ_{j=1}^{J} c_j F_j[x(n)] + e(n)    (3.3)

can be used as a parametric model for the single-input/single-output case, where the functions {F_j} represent a set of selected nonlinear functions (e.g., polynomials, sinusoids, sigmoids, exponentials, or any other suitable set of functions over the range of values of x), and {c_j} are the unknown parameters that can be estimated through linear regression, provided that the {F_j} functions do not contain other unknown parameters in a nonlinear fashion. In the latter case, nonlinear regression methods must be used (e.g., gradient-based methods), as discussed in Chapter 4. Naturally, the choice of the functions {F_j} is critical in terms of modeling efficiency and depends on the characteristics of the system or the type of available data.
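With polynomials chosen as the functions F_j, Equation (3.3) becomes a linear regression on the powers of x(n); a Python sketch with synthetic data (the cubic system is our own illustrative choice):

```python
import numpy as np

rng = np.random.default_rng(3)
N = 500
x = rng.uniform(-1, 1, N)
y = 0.5 + 1.5 * x - 2.0 * x**3 + 0.05 * rng.standard_normal(N)

# F_j[x] = x^j for j = 0..3; the coefficients c_j enter linearly
F = np.column_stack([x**j for j in range(4)])
c_hat, *_ = np.linalg.lstsq(F, y, rcond=None)
print(np.round(c_hat, 2))
```

Had the F_j contained unknown parameters nonlinearly (e.g., a sigmoid slope), this one-shot regression would no longer apply and an iterative method would be needed, as noted above.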
The relatively simple case of static systems (discussed above) has been extensively studied to date but has only limited applicability to actual physiological systems, since the latter are typically dynamic; i.e., the output value at time n depends also on input and/or output values at previous times (lags). Note that the possible dependence of the present output value on previous output values (autoregressive terms) can also be expressed as a dependence on previous input values. Thus, we now turn to the practically important case of dynamic systems.
We begin with the case of linear (stationary) dynamic systems, where the discrete-time parametric model takes the form of an autoregressive moving-average with exogenous variable (ARMAX) model described by the difference equation:

y(n) = a_1 y(n − 1) + . . . + a_k y(n − k) + β_0 x(n) + β_1 x(n − 1)
     + . . . + β_m x(n − m) + w(n) + γ_1 w(n − 1) + . . . + γ_q w(n − q)    (3.4)

where w(n) represents a white-noise sequence. This ARMAX model is a difference equation that expresses the present value of the output, y(n), as a linear combination of k previous values of the output (autoregressive part), previous (and the present) values of the input (exogenous part), and q previous (and the present) values of the white-noise disturbance sequence (moving-average part). The latter is assumed independent of the input-output data and composes the model residual (error term). When γ_i = 0 for all i ≠ 0, then the residuals form a white sequence, and the coefficients (a_1, . . . , a_k, β_0, β_1, . . . , β_m) can be estimated through the ordinary least-squares estimator of Eq. (2.42). However, if any γ_i is nonzero, then minimum-variance estimation of the coefficients requires the generalized least-squares estimator of Eq. (2.46), also required for the multiple regression model of Equation (3.2) when e(n) is a nonwhite error sequence (see Section 2.1.5). In the case of noise present at the input data, as well as at the output data, "total least-squares" procedures must be used, which introduce additional complexity in the estimation of the model parameters.
Although the estimation of ARMAX model parameters can be straightforward through linear regression, the model-order determination remains a challenge, i.e., determining the maximum lag values (k, m, q) in the difference equation (3.4) from given input-output data, x(n) and y(n). A number of statistical procedures have been devised for this purpose (e.g., weighted residual variance, Akaike information criterion, minimum description length criterion, or our model-order selection criterion presented in Section 2.3.1), all of them based on minimizing a quantity that involves the prediction error for a given model order (k, m, q), compensated properly by the total number of model parameters. Typically, the sum of the squared residuals is used to quantify the prediction error in the statistical model-order criterion. It is critical that the prediction error be evaluated on a set of input-output data distinct from the one used for the estimation of the model parameters ("out-of-sample" prediction). Application of these statistical criteria is repeated successively for ascending values of the model order (k, m, q), and the order selection process is completed when the minimum criterion value is achieved [Soderstrom & Stoica, 1989].
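The ARX special case (no moving-average part) can be fitted by ordinary least squares as described above; the following Python sketch (with illustrative orders, names, and a synthetic system of our own choosing) sets up the regression of Equation (3.4) and reports the residual variance that a model-order criterion would compare across candidate orders:

```python
import numpy as np

def fit_arx(x, y, k, m):
    """Least-squares estimate of ARX coefficients (a_1..a_k, b_0..b_m)."""
    p = max(k, m)
    rows = [np.concatenate([y[n - k:n][::-1], x[n - m:n + 1][::-1]])
            for n in range(p, len(y))]
    R = np.array(rows)
    theta, *_ = np.linalg.lstsq(R, y[p:], rcond=None)
    resid = y[p:] - R @ theta
    return theta, float(np.mean(resid**2))

rng = np.random.default_rng(4)
N = 1000
x = rng.standard_normal(N)
y = np.zeros(N)
for n in range(2, N):                      # synthetic system: k = 2, m = 1
    y[n] = (0.5 * y[n - 1] - 0.2 * y[n - 2]
            + 1.0 * x[n] + 0.3 * x[n - 1]
            + 0.05 * rng.standard_normal())

theta, mse = fit_arx(x, y, k=2, m=1)
print(np.round(theta, 2), round(mse, 4))
```

For a proper order criterion, the residual variance should of course be evaluated on out-of-sample data, as emphasized above.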
Another point of frequent confusion in practice is whether the prediction errors (residuals) are computed using the autoregressively estimated model output or the observed output values in the autoregressive terms of the model. These are often referred to as the "open-loop" or "closed-loop" conditions for error estimation. We use the observed output values (open-loop conditions) to avoid problems from the accumulation of output estimation errors in a closed-loop recursive scheme.
It should be noted that the term more commonly used in the present practice of linear parametric modeling is ARMA (without the letter X denoting the exogenous variable), where the "moving-average" part refers to the exogenous variable instead of the residual term. Although this is not consistent with the original definition of the ARMAX model, it makes practical sense because actual investigations seldom delve into the detailed modeling of the residuals, limiting themselves instead to what should properly be termed ARX modeling. Therefore, the reader should be alerted to the possible use of the term "ARMA model" in the literature instead of what should properly be called "ARX model" according to the original definition.
This potential confusion is bound to increase as more investigations extend to multiple exogenous variables. In the latter case, the autoregressive (AR) part involving lagged values of the "output variable" remains structurally the same, but more exogenous variables and their appropriate lagged values are included on the right-hand side of the difference equation to represent the multiple "input variables," as shown below for the two-input (x_1, x_2) case:

y(n) = a_1 y(n − 1) + . . . + a_k y(n − k) + β_{0,1} x_1(n) + . . . + β_{m1,1} x_1(n − m_1)
     + β_{0,2} x_2(n) + . . . + β_{m2,2} x_2(n − m_2) + e(n)    (3.5)

where the structure of the residual term remains the same as before:

e(n) = w(n) + γ_1 w(n − 1) + . . . + γ_q w(n − q)    (3.6)

In the original definition, this model would be called an "ARMAX_1X_2" model, but it is more likely nowadays to be called a model with "two moving-average components." This recently adopted convention of ARMA terminology to describe the input-output relation attains greater importance for our purposes, as the discrete-time Volterra model of Eq. (2.32) is now referred to (on occasion) as a "nonlinear moving-average model."
The multiple linear regression problem defined by any ofthe Equations (3.2)-(3.5) can
be written in a vector form:

y(n) = r'(n)8 + e(n) (3.7)

where r(n) represents the vector of all M regression variables in each case, the prime
denotes "transpose," and θ denotes the unknown parameter vector. For a set of data points
(n = 1, ..., N), Equation (3.7) yields the matrix formulation

y = Rθ + ε    (3.8)

where y is the output vector, R is the (N × M) rectangular regression matrix, and ε is the
vector of the residuals.
The general solution to this regression problem, which minimizes the sum of the
squared residuals, is given by means of the "generalized inverse" R+ of the rectangular
matrix R [Fan & Kalaba, 2003]:

θ̂ = R+ y    (3.9)
3.1 BASIC PARAMETRIC MODEL FORMS AND ESTIMATION PROCEDURES 149

This generalized inverse (or pseudoinverse) yields the following output prediction:

ŷ = RR+ y    (3.10)

resulting in the residuals

ε̂ = [I − RR+]y    (3.11)

where the (N × N) matrix [I − RR+] is idempotent and of rank less than or equal to (N −
M). The sum of squared residuals (SSR) for this general solution is

Ψ = y′[I − RR+]y    (3.12)

If the output data vector y is the sum of a noise-free vector y0 and a zero-mean noise
vector w, then the mean of the parameter vector estimate is

E[θ̂] = R+ y0    (3.13)

and its covariance matrix is

Cov[θ̂] = R+ C (R+)′    (3.14)

where C is the covariance matrix of the noise vector w. The mean and variance of the
SSR are

E[Ψ] = y0′[I − RR+]y0    (3.15)

var[Ψ] = y0′[I − RR+]′ C [I − RR+] y0    (3.16)

When the residuals are uncorrelated and stationary (C = σ0² I), then

var[Ψ] = σ0² · E[Ψ]    (3.17)

and Ψ/σ0² follows a chi-square distribution with (N − M) degrees of freedom, if the matrix
R is full-rank and the residuals are Gaussian. If the latter is not true but N is very large,
then the Central Limit Theorem stipulates an approximately Gaussian distribution for Ψ
(provided the residuals do not have many outliers).
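The chi-square behavior of the SSR can be checked numerically; the sketch below (using NumPy, with an arbitrary synthetic regression and illustrative values) simulates repeated noise realizations and compares the average of Ψ/σ0² to the N − M degrees of freedom:

```python
import numpy as np

rng = np.random.default_rng(8)
N, M, sigma0, trials = 50, 4, 0.5, 2000
R = rng.standard_normal((N, M))              # arbitrary full-rank regression matrix
theta = rng.standard_normal(M)
P = np.eye(N) - R @ np.linalg.pinv(R)        # the idempotent matrix I - R R+

psi = np.empty(trials)
for k in range(trials):
    y = R @ theta + sigma0 * rng.standard_normal(N)  # correct model + white noise
    psi[k] = y @ P @ y                       # SSR, Eq. (3.12)

# Psi / sigma0^2 should average close to its chi-square mean, N - M
assert abs(psi.mean() / sigma0**2 - (N - M)) < 1.5
```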
When the (N × M) matrix R is of full rank (N > M), then the generalized inverse R+ takes
the form that yields the ordinary least-squares (OLS) estimate of θ (see Section 2.1.5):

θ̂_OLS = [R′R]⁻¹ R′ y    (3.18)

which yields unbiased and consistent estimates if the residuals ε(n) are uncorrelated and
independent from the regression variables. These estimates are also efficient (i.e., they
have the minimum variance among all linear estimators) if the uncorrelated residuals are
Gaussian. As discussed in Section 2.1.5, the only practical complications here may arise
from the numerical inversion of the Gram matrix [R′R] if the latter is ill-conditioned or
singular. However, the use of the pseudoinverse R+ obviates this concern. If the residuals
are not Gaussian, then the efficient estimates result from minimization of a cost function
that is defined by the minus log-likelihood function of the non-Gaussian residuals (see
Section 2.1.5).
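The pseudoinverse solution of Equation (3.9) and the normal-equations form of Equation (3.18) can be compared on a small synthetic regression. The sketch below (NumPy; all variable names and values are illustrative) confirms that they coincide for a full-rank R:

```python
import numpy as np

rng = np.random.default_rng(0)
N, M = 200, 3
R = rng.standard_normal((N, M))              # full-rank (N x M) regression matrix
theta_true = np.array([1.5, -0.7, 0.3])
y = R @ theta_true + 0.1 * rng.standard_normal(N)   # output with white residuals

theta_pinv = np.linalg.pinv(R) @ y           # generalized-inverse solution, Eq. (3.9)
theta_ols = np.linalg.solve(R.T @ R, R.T @ y)  # normal equations, Eq. (3.18)

assert np.allclose(theta_pinv, theta_ols)    # identical estimates for full-rank R
```

The pinv route remains usable when [R′R] is ill-conditioned, which is the practical advantage noted above.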
If the residuals are correlated and Gaussian, then the generalized least-squares (GLS)
estimator

θ̂_GLS = [R′C⁻¹R]⁻¹ R′C⁻¹ y    (3.19)

ought to be used to obtain unbiased and consistent estimates with minimum variance,
where C denotes the covariance matrix of the residuals. Practical complications arise
from the fact that C is not a priori known and, therefore, must be either postulated or
estimated from the data. The latter case, which is more realistic in actual applications,
leads to an iterative procedure that may not be convergent or may not yield satisfactory
results (i.e., start with an OLS estimate of θ, obtain a first estimate of ε and C, evaluate
a first GLS estimate of θ, obtain a second estimate of ε and C, and iterate until the
process converges to a final GLS estimate of θ).
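A minimal sketch of this iterative procedure is given below, assuming (for illustration only) that the residuals follow an AR(1) model whose single correlation coefficient is re-estimated from the current residuals at each pass; the model, data, and number of iterations are all hypothetical:

```python
import numpy as np

rng = np.random.default_rng(1)
N, M = 400, 2
R = rng.standard_normal((N, M))
theta_true = np.array([2.0, -1.0])

# Correlated (AR(1)) residuals with correlation rho, for illustration
rho = 0.8
w = rng.standard_normal(N)
eps = np.zeros(N)
for n in range(1, N):
    eps[n] = rho * eps[n - 1] + w[n]
y = R @ theta_true + 0.3 * eps

theta = np.linalg.pinv(R) @ y                # step 0: OLS estimate
for _ in range(5):                           # iterate: residuals -> C -> GLS
    r = y - R @ theta
    rho_hat = (r[1:] @ r[:-1]) / (r[:-1] @ r[:-1])   # AR(1) fit to residuals
    i, j = np.indices((N, N))
    C = rho_hat ** np.abs(i - j)             # AR(1) covariance (up to scale)
    Ci = np.linalg.inv(C)
    theta = np.linalg.solve(R.T @ Ci @ R, R.T @ Ci @ y)  # GLS, Eq. (3.19)

assert np.allclose(theta, theta_true, atol=0.15)
```

Note that only the shape of C matters here: any scale factor cancels in Equation (3.19).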
As an alternative to this iterative procedure, we can obtain an estimate of the moving-
average model of the residual term given by Equation (3.6), using the residual values
resulting from initial OLS estimation of the model parameters, and then evaluate the
covariance matrix C from this moving-average model for subsequent use in GLS estimation.
Although this alternative approach avoids iterations and problems of convergence, it does
not necessarily yield satisfactory results, because it is very sensitive to possible errors in
the residuals model of Equation (3.6). Note that the residual term of the model of Equation
(3.5) contains all the noise and errors associated with all input-output variables and
their lagged values involved in the model.
Equivalent to this latter procedure is the residual whitening method, which amounts to
prefiltering the data with the inverse of the transfer function corresponding to Equation
(3.6) (prewhitening filter), prior to OLS estimation. The quality of the results depends on
the accuracy and time-invariance of the noise model. In this case, of course, the
input-output variables get linearly transformed by the prewhitening filter.
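The idea can be sketched as follows, under the assumption of a known MA(1) noise model ε(n) = w(n) + γ1 w(n − 1): both the output and the regressors are passed through the inverse filter before OLS. The model structure and coefficient values below are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(2)
N = 500
x = rng.standard_normal(N)
theta_true = np.array([0.9, 0.4])         # y(n) = 0.9 x(n) + 0.4 x(n-1) + noise
w = rng.standard_normal(N)
gamma1 = 0.7                              # assumed-known MA(1) noise model, Eq. (3.6)
eps = w + gamma1 * np.concatenate(([0.0], w[:-1]))
R = np.column_stack([x, np.concatenate(([0.0], x[:-1]))])
y = R @ theta_true + 0.5 * eps

def prewhiten(v, g):
    """Apply the inverse of (1 + g z^-1), i.e. u(n) = v(n) - g u(n-1)."""
    u = np.zeros_like(v)
    for n in range(len(v)):
        u[n] = v[n] - (g * u[n - 1] if n else 0.0)
    return u

yw = prewhiten(y, gamma1)                 # prewhitened output
Rw = np.column_stack([prewhiten(R[:, k], gamma1) for k in range(R.shape[1])])
theta = np.linalg.pinv(Rw) @ yw           # OLS on the linearly transformed data
assert np.allclose(theta, theta_true, atol=0.1)
```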
As another alternative, the parameter vector can be augmented to include the coefficients
{γi} of the moving-average model of the residuals in Equation (3.6), leading to a
pseudolinear regression problem [since the estimates of w(n) depend on θ] that is
implemented as an iterative extended least-squares (ELS) procedure [Billings & Voon,
1984, 1986; Ljung, 1987; Ljung & Soderstrom, 1983; Soderstrom & Stoica, 1989]. This
procedure can be also used for the estimation of the parameters of a NARMAX model
(see below) and it is subject to a host of potential problems arising from the combination
of the noise (intrinsic to the data) with the nonlinear terms of the NARMAX model
structure.

3.1.1 The Nonlinear Case


In the more general case of nonlinear (stationary) systems, the ARMAX model can be
extended to the NARMAX model (nonlinear ARMAX) that includes nonlinear multinomial
expressions of the variables on the right-hand side of Equation (3.4) [Billings & Voon,
1984; Leontaritis & Billings, 1985a,b]. This may include difference equations that exhib-
it chaotic behavior [Barahona & Poon, 1996]. For instance, a second-degree multinomial
NARMAX model of order (k = 2, m = 1, q = 0) takes the form

y(n) = α1 y(n − 1) + α2 y(n − 2) + α_{1,1} y²(n − 1) + α_{1,2} y(n − 1)y(n − 2) + α_{2,2} y²(n − 2) + β0 x(n)
   + β1 x(n − 1) + β_{0,0} x²(n) + β_{0,1} x(n)x(n − 1) + β_{1,1} x²(n − 1) + γ_{1,0} y(n − 1)x(n)
   + γ_{2,0} y(n − 2)x(n) + γ_{1,1} y(n − 1)x(n − 1) + γ_{2,1} y(n − 2)x(n − 1) + w(n)    (3.20)

It is evident from Equation (3.20) that the form of a NARMAX model may become
rather unwieldy, and the model specification task is very challenging (i.e., selecting the
appropriate form and degree of nonlinear terms, as well as the number of input, output,
and noise lags involved in the model). Several approaches have been proposed for this
purpose [Billings & Voon, 1984, 1986; Haber, 1989; Haber & Unbehauen, 1990; Koren-
berg, 1983, 1988; Leontaritis & Billings, 1985a,b; Zhao & Marmarelis, 1997, 1998] and
they are all rather sensitive to the presence of data-contaminating noise. Computationally,
when the structure of the NARMAX model is established, then the parameter estimation
task can be straightforward using multiple linear regression (as discussed above), since
the unknown parameters enter linearly in the NARMAX model. However, the results are
reliable only for cases of additive and/or limited noise. Otherwise, the iterative "extended
least-squares" procedure ought to be employed, which is subject to a host of potential
problems, including the convergence of the iterative procedure and the detrimental effect
of multiplicative noise terms (emerging from the nonlinear output terms of the NARMAX
model, even in the relatively simple case of output-additive noise).
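To illustrate the linear-in-the-parameters property, the sketch below simulates a simple second-degree NARX model (a hypothetical structure, with process noise only so that OLS stays consistent) and recovers its coefficients by ordinary least squares on the lagged and product regressors:

```python
import numpy as np

rng = np.random.default_rng(3)
N = 2000
x = 0.5 * rng.standard_normal(N)
true = np.array([0.5, -0.1, 0.8, 0.2])   # coefficients of a hypothetical NARX model
y = np.zeros(N)
for n in range(1, N):
    # y(n) = 0.5 y(n-1) - 0.1 y^2(n-1) + 0.8 x(n) + 0.2 x(n-1) + process noise
    y[n] = (true[0] * y[n - 1] + true[1] * y[n - 1] ** 2
            + true[2] * x[n] + true[3] * x[n - 1]
            + 0.01 * rng.standard_normal())

# With the structure fixed, the parameters enter linearly in the regressors:
R = np.column_stack([y[:-1], y[:-1] ** 2, x[1:], x[:-1]])
theta = np.linalg.pinv(R) @ y[1:]
assert np.allclose(theta, true, atol=0.05)
```

With heavier or output-additive noise, this direct regression degrades for the reasons discussed next.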
In order to appreciate the potential complications arising from the presence of noise in
NARMAX models, the reader should be reminded that noise (or errors, in general) addi-
tive to the input and/or output data results in multiplicative terms (with the input and/or
output) in the context of NARMAX modeling, although it results only in additive residual
terms in the context of the ARMAX model. For instance, output data contaminated by
additive noise, ỹ = y + ε, will generate the square term ỹ² = y² + 2εy + ε², which gives rise
to the output-multiplicative noise term (2εy). In addition, the possible Gaussian
characteristics of the noise are transformed by the ε² term into non-Gaussian (chi-square)
characteristics for the model residuals. This example can be extended to all nonlinear terms of
the NARMAX model (products of input/output lagged values) to depict the resulting
complexity of the noise effects on parameter estimation.
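This noise transformation is easy to demonstrate numerically; the sketch below (NumPy, arbitrary values) squares a Gaussian-contaminated output value and shows that the resulting residual term 2εy + ε² is biased and strongly skewed:

```python
import numpy as np

rng = np.random.default_rng(4)
y0 = 1.0                                  # a fixed noise-free output value
eps = rng.standard_normal(100_000)        # zero-mean, unit-variance Gaussian noise
resid = (y0 + eps) ** 2 - y0 ** 2         # = 2*eps*y0 + eps**2

# The eps^2 term biases the residual mean (E = 1 here, not 0) ...
assert abs(resid.mean() - 1.0) < 0.05
# ... and skews the distribution (a Gaussian would have skewness near 0)
skew = np.mean((resid - resid.mean()) ** 3) / resid.std() ** 3
assert skew > 1.0
```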
The problem of multiplicative noise/error terms is confounding in practice, because
existing estimation methods assume that the noise/error (residual) terms are additive to
the input and/or output. In fact, even the case of simultaneous input and output additive
noise remains practically challenging in the context of "total least-squares" methods,
which necessitate assumptions about the noise statistics and the relative input/output
noise power. The complication arising from non-Gaussian residuals can be resolved with
iterative cost-minimization procedures wherein the cost function is determined by the sta-
tistics of the residuals, as discussed in Section 2.1.5 for Volterra kernel estimation, and in
Section 4.2.2 in connection with gradient-based training of the Volterra-equivalent net-
work (connectionist) models that are intrinsically nonlinear in the parameters.
A key practical issue in the use of NARMAX models is the determination of the model
structure (i.e., the number and type of nonlinear combinations of input and output
lagged terms). For this purpose, the prediction-error stepwise-regression (PESR) method
was proposed by Billings and Voon (1984), according to which the model structure is de-
termined by computing the partial correlations between the output and all possible candi-
date terms involving input and output lagged values in multinomial combinations. By ap-
plying the F-ratio test on the computed partial correlations, we can select the

"significant" terms of the NARMAX structure. The PESR method works well under low-
noise conditions, but it may encounter convergence problems and severe inaccuracies
under high-noise conditions. Moreover, when the PESR method is used to determine the
structure of the NARMAX model, the partial correlations of the output with all the
candidate terms may be rather numerous and may result in heavy computational burden.
Another promising method for determining the NARMAX model structure and estimating
its parameters has been proposed by Korenberg (1983, 1988) and appears to offer some
advantages in certain cases.
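A much-simplified stand-in for this kind of structure selection is forward stepwise regression, which greedily adds the candidate term giving the largest SSR reduction. Note that this is not the full PESR procedure (no partial correlations or F-ratio test), and the candidate set and model below are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(5)
N = 1000
x = rng.standard_normal(N)
y = np.zeros(N)
for n in range(1, N):
    y[n] = (0.6 * y[n - 1] + 0.9 * x[n] + 0.3 * x[n] ** 2
            + 0.02 * rng.standard_normal())

# Candidate dictionary of lagged/product terms (illustrative, not exhaustive)
cands = {
    "y1": y[:-1], "x0": x[1:], "x1": x[:-1],
    "x0^2": x[1:] ** 2, "y1*x0": y[:-1] * x[1:], "y1^2": y[:-1] ** 2,
}
target = y[1:]

def ssr(cols):
    """Sum of squared residuals after OLS on the given regressor columns."""
    Rc = np.column_stack(cols)
    resid = target - Rc @ (np.linalg.pinv(Rc) @ target)
    return float(resid @ resid)

chosen, cols = [], []
for _ in range(3):                     # greedily add the 3 best terms
    best = min(cands, key=lambda name: ssr(cols + [cands[name]]))
    cols.append(cands.pop(best))
    chosen.append(best)

# The greedy search should recover the three terms actually in the model
assert set(chosen) == {"y1", "x0", "x0^2"}
```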
An alternative identification method of NARMAX models can be based on Volterra kernel
estimates (of up to third order), under the condition that the NARMAX model can be
satisfactorily approximated by a third-order Volterra model. For the third-order NARMAX
model to be described adequately by the first three orders of Volterra kernels, the
coefficients of the nonlinear autoregressive terms must retain small magnitudes relative to
the linear terms, which implies that the Volterra kernels of order higher than third can be
neglected [Zhao & Marmarelis, 1994a, 1997]. Since computationally efficient techniques
currently exist by which Volterra kernels can be estimated accurately (up to third order)
even with rather noisy data [Marmarelis, 1993], it is plausible that more accurate NARMAX
models can be obtained indirectly from these kernel estimates, rather than directly
from the input-output data [Zhao & Marmarelis, 1998]. This method is presented in
Section 3.4 and separates the terms of different order in the NARMAX model based on the
mathematical relations between NARMAX and Volterra models discussed in Section 3.3.

3.1.2 The Nonstationary Case


If the system exhibits nonstationarities, then the regression coefficient vector θ of model
parameters in Eq. (3.7) will vary through time and can be estimated either in a piecewise-
stationary fashion over a sliding time window (batch processing) or in a recursive fashion
using an adaptive estimation formula (recursive processing). The latter approach has been
favored in recent years for parametric modeling [Goodwin & Sin, 1984; Ljung, 1987;
Ljung & Soderstrom, 1983; Ljung & Glad, 1994] and is briefly outlined below.
An interesting alternative is to introduce a specific parameterized time-varying structure
for the parameters, thereby augmenting the unknown parameter vector to include the
additional parameters of the time-varying structure (e.g., a pth-degree polynomial structure
of time-varying model parameters will introduce p additional parameters for each
time-varying term in the vector θ that need be estimated from the data via batch processing).
Implementation of the batch processing approach folds back to the previously dis-
cussed least-squares estimation methods. However, the recursive approach requires a new
methodological framework that seeks to update continuously the parameter estimates on
the basis of new time-series data. This adaptive or recursive approach has gained consid-
erable popularity in recent years, although certain important practical issues remain open
regarding the speed of algorithmic convergence and the effect of correlated noise [Good-
win & Sin, 1984; Ljung, 1987; Ljung & Soderstrom, 1983; Ljung & Glad, 1994]. The
basic formulae for the recursive least-squares (RLS) algorithm are

θ̂(n) = θ̂(n − 1) + s(n)[y(n) − r′(n)θ̂(n − 1)]    (3.21)

s(n) = γ(n)P(n − 1)r(n)    (3.22)

P(n) = P(n − 1) − γ(n)P(n − 1)r(n)r′(n)P(n − 1)    (3.23)

γ(n) = [r′(n)P(n − 1)r(n) + λ(n)]⁻¹    (3.24)

where the matrix P(n), the vector s(n), and the scalar γ(n) are updating instruments of the
algorithm computed at each step n, and {λ(n)} denotes a selected sequence of scalar
weights (often taken to be unity) for the squared prediction errors that define the cost
function. The initialization of this algorithm is made by setting P(0) = c0 I, where c0 is a
large positive constant (to suppress the effect of initial conditions) and I is the identity
matrix, as well as by giving very small random values to the initial parameter vector θ̂(0).
This recursive algorithm can be used for on-line identification of slowly varying
nonstationary systems to track changes in (near) real time. A critical issue for the efficacy of this
approach is the speed of algorithmic convergence relative to the speed of time variation of
the system/model parameters.
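The recursion of Equations (3.21)-(3.24) can be sketched as follows (NumPy, with unit weights λ(n) = 1 and a stationary test system, so the recursive estimate should converge to the OLS solution; all values are illustrative):

```python
import numpy as np

def rls(ys, rs, lam=1.0, c0=1e6):
    """Recursive least squares, Eqs. (3.21)-(3.24), with constant weights lam."""
    rng = np.random.default_rng(6)
    M = rs.shape[1]
    theta = 1e-6 * rng.standard_normal(M)   # small random initial parameter vector
    P = c0 * np.eye(M)                      # P(0) = c0 * I, c0 large
    for y_n, r_n in zip(ys, rs):
        g = 1.0 / (r_n @ P @ r_n + lam)     # gamma(n), Eq. (3.24)
        s = g * (P @ r_n)                   # s(n),     Eq. (3.22)
        theta = theta + s * (y_n - r_n @ theta)   # parameter update, Eq. (3.21)
        P = P - g * np.outer(P @ r_n, r_n @ P)    # covariance update, Eq. (3.23)
    return theta

rng = np.random.default_rng(7)
N, M = 500, 3
rs = rng.standard_normal((N, M))
theta_true = np.array([0.5, -1.2, 2.0])
ys = rs @ theta_true + 0.05 * rng.standard_normal(N)

theta_hat = rls(ys, rs)
assert np.allclose(theta_hat, theta_true, atol=0.05)
```

For tracking time-varying parameters, λ(n) would be replaced by a forgetting scheme rather than the unit weights used here.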
It is important to note that when output autoregressive terms exist in the model of
Equation (3.7), the regression vector r(n) is correlated with the residual ε(n) and, thus,
none of the aforementioned least-squares estimates of the parameter vector will converge
(in principle) to the actual parameter values. This undesirable effect (estimation bias) is
somewhat mitigated when the predicted output values are used at each step for the
autoregressive lagged terms (closed-loop condition or global predictive model) instead of the
observed output values (open-loop condition or one-step predictive model). To remedy
this problem, the instrumental variable (IV) method has been introduced, which selects
an IV uncorrelated with the residuals but strongly correlated with the regression vector
r(n) in order to evaluate the least-squares estimate [Soderstrom & Stoica, 1989]. The IV
estimates can be obtained in recursive fashion, as well as in batch-processing mode.

3.2 VOLTERRA KERNELS OF NONLINEAR DIFFERENTIAL EQUATIONS

This section discusses the fundamental issue of the mathematical relation (and
equivalence) between nonlinear parametric and nonparametric models for stable nonlinear
stationary systems. The issue is addressed by deriving the equivalent nonparametric (Volterra)
models for the broad class of stable nonlinear dynamic systems described by the
nonlinear differential equation model of Equation (3.25). Note that if nonlinear terms
involving the input are present on the right-hand side of Equation (3.25), they will give rise
to singularities (delta functions) in the high-order kernels of the equivalent Volterra model.
Since such singularities have not been observed thus far in second-order kernel estimates
from a broad variety of physiological systems, we have not included such nonlinear
terms on the right-hand side of Equation (3.25).
Additional classes of nonlinear differential equations can be studied with the same ap-
proach if they have stable solutions. One such important class involves bilinear terms (be-
tween input and output or internal state variables) that may be appropriate for describing
modulatory effects of the nervous, metabolic, or endocrine systems. Its nonparametric
analysis is discussed below with an illustrative example drawn from the so-called "mini-
mal" glucose-insulin model.
We consider the broad class of nonlinear dynamic systems described by the ordinary
differential equation

L(D)y + Σ_{i≥0} Σ_{j≥0, i+j≥2} c_{i,j} y^i (Dy)^j = M(D)x    (3.25)

where L(D) and M(D) are polynomials of the differential operator D = d(·)/dt, and x(t)
and y(t) are the system input and output, respectively. The polynomial L(D) is assumed to
be of degree higher than first (i.e., it involves at least the second derivative of y), and the
polynomial M(D) is of lower degree than L(D) in order to avoid the emergence of singular
(delta) functions in the equivalent Volterra kernels. The nonlinear terms are of degree two
and higher, and may be viewed as terms of a Taylor expansion of an analytic function of y
and its derivative. Furthermore, we assume that the coefficients {c_{i,j}} of the nonlinear
terms are of small magnitude (i.e., |c_{i,j}| ≤ ε ≪ 1 for all i and j) for stable operation of the
system, a fact that also simplifies the approximate analytical expressions for the
equivalent Volterra kernels.
For the region of stable solutions of Equation (3.25), there exists a Volterra functional
expansion [see Equation (2.5)] that represents the system output in terms of a series of
multiple convolution integrals of the input. The kernel functions {k_n} characterize the
dynamics of the nonlinear system and can be derived analytically for the system of Equation
(3.25) using the method of "generalized harmonic balance" [Marmarelis, 1982, 1989a,f,
1991]. According to this method, the mth-order Volterra kernel can be evaluated in the
m-dimensional Laplace domain by considering the generalized harmonic inputs:

x_m(t) = e^{s1 t} + ... + e^{sm t}    (3.26)

where s_i are distinct complex (Laplace) variables. Substituting the input x_m(t) into the
Volterra series expansion of Equation (2.5), we obtain the output expression (for k0 = 0)

y_m(t) = Σ_{n=1}^∞ Σ_{j1=1}^m ... Σ_{jn=1}^m ∫...∫ k_n(τ1, ..., τn) e^{(s_{j1}+...+s_{jn})t} e^{−s_{j1}τ1 − ... − s_{jn}τn} dτ1 ... dτn

       = Σ_{n=1}^∞ Σ_{j1=1}^m ... Σ_{jn=1}^m e^{(s_{j1}+...+s_{jn})t} K_n(s_{j1}, ..., s_{jn})    (3.27)

where K_n is the n-dimensional Laplace transform of the Volterra kernel k_n. The rth
derivative of the output is given by the expression

D^r y_m(t) = Σ_{n=1}^∞ Σ_{j1=1}^m ... Σ_{jn=1}^m (s_{j1} + ... + s_{jn})^r e^{(s_{j1}+...+s_{jn})t} K_n(s_{j1}, ..., s_{jn})    (3.28)

Note that the output terms that contain the complex exponential e^{(s1+...+sm)t} with all
distinct complex frequencies (Laplace variables) of the input are terms associated with the
mth-order kernel. Therefore, substitution of the output expression and its derivatives [given
by Equations (3.27) and (3.28)] into Equation (3.25) yields an expression that can be
used for the evaluation of the mth-order Volterra kernel K_m(s1, ..., sm) on the basis of
harmonic balance (i.e., by selecting only those terms in the equation that contain the
complex exponential e^{(s1+...+sm)t}). The contributions of these terms in the equation must
balance out, according to the harmonic balance approach, and yield a set of equations that
can be solved to achieve our stated goals.

Let us follow this approach for increasing values of m. For m = 1 we have

y_1(t) = Σ_{n=1}^∞ e^{n s1 t} K_n(s1, ..., s1)    (3.29)

and

D^r y_1(t) = Σ_{n=1}^∞ (n s1)^r e^{n s1 t} K_n(s1, ..., s1)    (3.30)

The nonlinear terms y_1^i (Dy_1)^j yield

y_1^i(t)[Dy_1(t)]^j = Σ_{n1=1}^∞ ... Σ_{ni=1}^∞ e^{(n1+...+ni)s1 t} K_{n1}(s1, ..., s1) ... K_{ni}(s1, ..., s1)

    × Σ_{n′1=1}^∞ ... Σ_{n′j=1}^∞ n′1 ... n′j s1^j e^{(n′1+...+n′j)s1 t} K_{n′1}(s1, ..., s1) ... K_{n′j}(s1, ..., s1)    (3.31)

When Equations (3.29)-(3.31) are substituted in Equation (3.25), then, by balancing
terms containing simply the factor e^{s1 t}, we obtain the equation

L(s1)K_1(s1)e^{s1 t} = M(s1)e^{s1 t}    (3.32)

which can be solved to find the first-order Volterra kernel in terms of the polynomials
L(s1) and M(s1) defined in Equation (3.25). Therefore, the equivalent first-order Volterra
kernel in the Laplace domain is

K_1(s1) = M(s1)/L(s1)    (3.33)

The Volterra kernel expression in the time domain can be found by inverse Laplace
transform. Note that the nonlinear terms in Equation (3.25) do not contribute to K_1; i.e.,
the first-order Volterra kernel of the system represents strictly the linear portion of the
nonlinear differential equation.
The equivalent second-order Volterra kernel of this system can be found by following the
same procedure for m = 2. Then,
y_2(t) = Σ_{n=1}^∞ Σ_{j1=1}^2 ... Σ_{jn=1}^2 e^{(s_{j1}+...+s_{jn})t} K_n(s_{j1}, ..., s_{jn})    (3.34)

D^r y_2(t) = Σ_{n=1}^∞ Σ_{j1=1}^2 ... Σ_{jn=1}^2 (s_{j1} + ... + s_{jn})^r e^{(s_{j1}+...+s_{jn})t} K_n(s_{j1}, ..., s_{jn})    (3.35)

Since the expression for y_2^i(t)[Dy_2(t)]^j is rather unwieldy, we focus only on the
contributions of those terms that contain the exponential factor e^{(s1+s2)t}. Since no such terms
exist for i + j > 2, the only such contributions will come from the expressions for the
nonlinear terms y_2(t)Dy_2(t), y_2²(t), and [Dy_2(t)]², in addition, of course, to such
contributions from the linear terms y_2(t) and D^r y_2(t). Noting that the kernels are
symmetric functions of their arguments, we obtain the following equation from second-order
harmonic balance:

2L(s1 + s2)K_2(s1, s2) + 2c_{2,0}K_1(s1)K_1(s2) + c_{1,1}(s1 + s2)K_1(s1)K_1(s2)
    + 2c_{0,2}s1 s2 K_1(s1)K_1(s2) = 0    (3.36)

Solving the algebraic Equation (3.36) for K_2, we obtain

K_2(s1, s2) = −[c_{2,0} + c_{1,1}(s1 + s2)/2 + c_{0,2}s1 s2] K_1(s1)K_1(s2) / L(s1 + s2)    (3.37)

To derive the equivalent third-order kernel, we repeat this procedure for m = 3. We see
that terms containing the factor e^{(s1+s2+s3)t} are contributed by y_3(t), D^r y_3(t), and
y_3^i(t)[Dy_3(t)]^j for (i + j) = 2, 3. Note, however, that the contributions from expressions for
(i + j) = 2 are of order ε². Since we have assumed that |c_{i,j}| ≤ ε ≪ 1, we can neglect these
higher-order terms, and the resulting approximate third-order harmonic balance equation
is

3!L(s1 + s2 + s3)K_3(s1, s2, s3) + [3!c_{3,0} + 2!c_{2,1}(s1 + s2 + s3)
    + 2!c_{1,2}(s1 s2 + s2 s3 + s3 s1) + 3!c_{0,3}s1 s2 s3] K_1(s1)K_1(s2)K_1(s3) = 0    (3.38)

where terms of order ε² or higher have been omitted in first approximation. Solving
Equation (3.38) for K_3, we obtain the equivalent third-order Volterra kernel:

K_3(s1, s2, s3) = −[c_{3,0} + c_{2,1}(s1 + s2 + s3)/3 + c_{1,2}(s1 s2 + s2 s3 + s3 s1)/3
    + c_{0,3}s1 s2 s3] K_1(s1)K_1(s2)K_1(s3) / L(s1 + s2 + s3)    (3.39)

in first approximation (i.e., omitting terms of order ε² or higher).


We observe that, for the mth-order harmonic balance of terms containing the factor
e^{(s1+...+sm)t}, the only nonnegligible terms (i.e., of order ε) will be contributed by the
expressions for y_m(t), D^r y_m(t), and y_m^i(t)[Dy_m(t)]^j for (i + j) = m. Therefore, in first
approximation, we have the general expression (for m > 1):

K_m(s1, ..., sm) = −{Σ_{n=0}^m [(m − n)!n!/m!] c_{m−n,n} R_{m,n}(s1, ..., sm)} K_1(s1) ... K_1(sm) / L(s1 + ... + sm)    (3.40)

where R_{m,n}(s1, ..., sm) denotes the sum of all distinct products (s_{j1}s_{j2} ... s_{jn}) that can be
formed with combinations of the indices (j1, j2, ..., jn) from the set (1, 2, ..., m). Note
that R_{m,0} = 1 by definition. Equation (3.40) yields the approximate general expression for
the equivalent high-order Volterra kernels of this class of nonlinear differential systems,
under the stated assumption of small-magnitude coefficients for the nonlinear terms of
Equation (3.25).
The derived analytical expressions for the equivalent Volterra kernels allow the
nonparametric study of this broad class of parametric nonlinear systems, which may also
describe important cases of nonlinear feedback, as discussed in Example 3.1 below and in
Section 4.1.5.

Another application of the derived analytical expressions is the study of linearized
models with GWN inputs using the first-order Wiener kernel (apparent transfer function),
discussed in Section 3.2.1. The important case of intermodulation mechanisms described
by bilinear terms in differential systems (e.g., the "minimal" model of insulin-glucose
interactions discussed in Sections 1.4 and 6.4) is discussed in Section 3.2.2. This type of
model may have broad applications in physiological autoregulation and neuronal dynamics
(see Section 8.1).

Example 3.1 The Riccati Equation


As a first analytical example, consider the well-studied Riccati equation:

Dy + ay + by² = cx    (3.41)

that represents the parametric model of a system exhibiting a nonlinearity in the squared
output term by². As discussed in Section 1.4, this parametric model can be also cast in a
modular (block-structured) form of a nonlinear feedback model with a linear feedforward
component and a static nonlinear (square) negative feedback component (see Figure 1.8).
The equivalent Volterra kernels of this nonlinear parametric model can be obtained
with the generalized harmonic balance method presented above. In order to simplify the
resulting analytical expressions and limit the equivalent Volterra model to the second
order, let us assume that |b| ≪ 1 (which is also required to secure stability of the system for
a broad variety of inputs). Then the first-order Volterra kernel is found to be, in the
Laplace domain [cf. Equation (3.33)]:

K_1(s) = c / (s + a)    (3.42)

or in the time domain:

k_1(τ) = c e^{−aτ} u(τ)    (3.43)

where u(τ) denotes the step function (0 for τ < 0, and 1 for τ ≥ 0). The second-order
Volterra kernel is found to be, in the two-dimensional Laplace domain [cf. Equation (3.37)]:

K_2(s1, s2) = −bc² / [(s1 + a)(s2 + a)(s1 + s2 + a)]    (3.44)

By two-dimensional inverse Laplace transform, we find the expression for the second-order
Volterra kernel in the time domain:

k_2(τ1, τ2) = −(bc²/a) e^{−a(τ1+τ2)} [e^{a·min(τ1,τ2)} − 1] u(τ1)u(τ2)    (3.45)

It is evident that the equivalent nonparametric model (even the second-order
approximation for |b| ≪ 1) is much less compact than its parametric counterpart given by the Riccati
equation (3.41). Nonetheless, it should be also noted that the simplification of the model
specification task is the primary motivation for using the nonparametric approach.
Therefore, in the absence of sufficient prior knowledge about the system functional
characteristics, the nonparametric approach will yield an inductive model "true to the
data." On the other hand, the parametric approach requires firm prior knowledge to
allow the specification of the parametric model form. Clearly, whenever this is possible,
the parametric approach is more advantageous. Unfortunately, this is rarely possible
in a realistic physiological context, although this fact has not prevented many
questionable attempts in the past to postulate parametric models on very thin (or even
nonexistent) supporting evidence. One of the primary thrusts of this book is to advocate
that this questionable practice be discontinued, as it engenders serious risks for the proper
development of scientific understanding of physiological function. (First impressions,
even when erroneous, are hard to change.) Parametric models ought to be a desirable
objective (because of their specific advantages discussed herein) but their derivation
must be based on firm evidence. The advocated approach involves the transition from
initial nonparametric models to equivalent parametric models, as discussed in Section
3.4.
The mathematical relation between the Riccati equation and the Volterra functional
expansion can be also explored through the integral equation that results from integration
of the Riccati equation after multiplication (throughout) with the exponential function
exp(at). Applying integration-by-parts on the derivative term, we obtain the integral
equation (for t ≥ 0)

y(t) = y(0)e^{−at} − b ∫_0^t y²(λ)e^{−a(t−λ)}dλ + c ∫_0^t x(λ)e^{−a(t−λ)}dλ    (3.46)

which also includes the effect of the initial condition y(0). For noncausal inputs (extending
to −∞), the dependence on the initial value can be eliminated and the upper integration
limits can be extended to infinity. Then, the integral equation (3.46) yields [by iterative
substitution of the approximate linear solution y_1(t) into the square integral term of the
right-hand side] the following second-order Volterra-like approximation of the output:

y(t) ≈ c ∫_0^∞ e^{−aτ} x(t − τ) dτ − bc² ∫_0^∞ e^{−aλ} dλ ∫_0^∞ ∫_0^∞ e^{−a(λ1+λ2)} x(t − λ − λ1) x(t − λ − λ2) dλ1 dλ2    (3.47)

if terms of order b² and above can be ignored for very small values of |b|. Changing to the
integration variables τ1 = λ + λ1 and τ2 = λ + λ2, we obtain the expressions for the
first- and second-order Volterra kernels given by Equations (3.43) and (3.45).
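These relations can be checked numerically. The sketch below integrates the Riccati equation (Euler scheme, with illustrative parameter values and a moderately small b) and computes the second-order correction by iterative substitution, i.e., by filtering −b·y1² through the linear part, which is equivalent to the double-convolution term of Equation (3.47):

```python
import numpy as np

# Riccati system: Dy + a*y + b*y^2 = c*x, with a moderately small b
a, b, c, dt = 1.0, 0.3, 1.0, 1e-3
t = np.arange(0.0, 5.0, dt)
x = np.sin(2 * np.pi * 0.7 * t)

y = np.zeros_like(t)    # full nonlinear solution (Euler integration)
y1 = np.zeros_like(t)   # first-order Volterra output: Dy1 + a*y1 = c*x
y2 = np.zeros_like(t)   # second-order correction:     Dy2 + a*y2 = -b*y1^2
for n in range(len(t) - 1):
    y[n + 1] = y[n] + dt * (c * x[n] - a * y[n] - b * y[n] ** 2)
    y1[n + 1] = y1[n] + dt * (c * x[n] - a * y1[n])
    y2[n + 1] = y2[n] + dt * (-b * y1[n] ** 2 - a * y2[n])

err1 = np.max(np.abs(y - y1))          # first-order (linearized) model alone
err2 = np.max(np.abs(y - (y1 + y2)))   # with the second-order correction
assert err2 < 0.5 * err1               # the correction clearly helps
```

The remaining error after the correction is of order b², consistent with the truncation argument above.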
Note that a nonlinear feedback model (such as the one shown in Figure 1.8) is an
equivalent modular (block-structured) model for the Riccati equation and can be viewed
also as a feedforward model with many parallel branches that represent the iterative
"looping" of the signal, akin to the iterative substitution method described above. This is
depicted in Figure 3.1 and can be useful in understanding the effect of nonlinear feedback
in certain cases (provided the system remains stable).

3.2.1 Apparent Transfer Functions of Linearized Models


An interesting situation arises in practice when an estimate of the "impulse response
function" of a "linearized model" of the nonlinear system is obtained for a GWN input
through cross-correlation or cross-spectral analysis. This is formally the first-order
Wiener kernel of the system. In the frequency domain, this is commonly referred to as the
"apparent transfer function" (ATF) of the system, and yields the best linear model (in the

Figure 3.1 On the left is the block-structured nonlinear feedback model of the Riccati equation,
where L denotes the first-order (linear filter) part of Equation (3.41) and N denotes a negative square
(static) nonlinearity -b(·)2. On the right is the equivalent feedforward "signal-looping" model with an
infinite number of parallel cascade branches that is stable for small values of b (allowing
convergence of the series composed of the output contributions {y_i} of the parallel branches).

output mean-square error sense) for GWN inputs (see Section 2.2.5). For the class of
systems described by Equation (3.25), this ATF is given by [Marmarelis, 1989b]

H_1(jω) = K_1(jω) − [K_1(jω)/L(jω)] Σ_{m=1}^∞ Σ_{n=0}^{2m+1} [(2m + 1 − n)!n!/m!] (P/4π)^m c_{2m+1−n,n}
    · ∫_{−∞}^{∞} ... ∫ R_{2m+1,n}(jω, ju1, −ju1, ..., jum, −jum) |K_1(u1) ... K_1(um)|² du1 ... dum    (3.48)

where the previously derived analytical expressions (3.40) for the equivalent Volterra
kernels of this class of systems are combined with Equation (2.153) under the assumption
of small nonlinear coefficients. It is evident from Equation (3.48) that the ATF
H_1(jω) is a power series in P and depends on the coefficients of the nonlinear terms of
Equation (3.25). Note that H_1(jω) coincides with K_1(jω) (which represents the linear
portion of the differential equation) for P = 0, as expected.
Inspection of the function R_{2m+1,n}(jω, ju1, −ju1, ..., jum, −jum), as defined following
Equation (3.40), indicates that its values for n even do not depend on ω, whereas its
values for n odd depend linearly on jω. This leads to the alternative expression for the
ATF:

H_1(j\omega) = K_1(j\omega) - \frac{K_1(j\omega)}{L(j\omega)} \sum_{m=1}^{\infty} \frac{(P/2)^m}{m!}

    \cdot \sum_{l=0}^{m} \left[(2m-2l+1)!\,(2l)!\, c_{2m-2l+1,2l} + (j\omega)(2m-2l)!\,(2l+1)!\, c_{2m-2l,2l+1}\right] Q_{m,l}    (3.49)

where

Q_{m,l} = \left(\frac{1}{2\pi}\right)^m \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} R_{m,l}(u_1^2, \ldots, u_m^2)\, |K_1(u_1) \cdots K_1(u_m)|^2\, du_1 \cdots du_m    (3.50)
160 PARAMETRIC MODELING

Considering the definition of R_{m,l}, we see that the constants Q_{m,l} depend on the Euclidean
norms of |K_1(u)| and |u K_1(u)|. For these quantities to be finite, the degree of the polynomial
L(D) in Equation (3.16) must be at least two degrees higher than the degree of the
polynomial M(D). Thus, Equation (3.49) can be also written as

H_1(j\omega) = K_1(j\omega) - \frac{K_1(j\omega)}{L(j\omega)} \left[A(P) + j\omega B(P)\right]

           = K_1(j\omega) \left[1 - \frac{A(P) + j\omega B(P)}{L(j\omega)}\right]    (3.51)

where A(P) and B(P) are power series in P with coefficients dependent on {Q_{m,l}} and
{c_{i,j}} for (i + j) odd [i.e., i odd and j even for A(P), and i even and j odd for B(P) coefficients].
Equation (3.51) indicates that the ATF H_1(j\omega) is affected only by the nonlinear
terms of Equation (3.25) for which (i + j) is odd, and depends on the power level P of the
GWN input. Therefore, the linearized model obtained through cross-correlation or cross-
spectral analysis may deviate considerably from the linear portion of the system (represented
by K_1) if odd-degree nonlinear terms are present in Equation (3.25). The extent of
this deviation depends on the values A and B, which in turn depend on the input power
level. An illustrative example is given below [Marmarelis, 1989b].

Example 3.2 Illustrative Example


Consider as an example a system of this class described by the differential equation

(a_2 D^2 + a_1 D + a_0)\, y + c_{3,0}\, y^3 + c_{2,1}\, y^2 (Dy) + c_{3,2}\, y^3 (Dy)^2 + c_{0,5}\, (Dy)^5 = b_0 x    (3.52)

where |c_{i,j}| << 1. Then, the only nonnegligible quantities {Q_{m,l}} are

Q_{1,0} = \frac{1}{2\pi} \int_{-\infty}^{\infty} |K_1(u_1)|^2\, du_1 = K    (3.53)

Q_{2,1} = 2 K \Lambda    (3.54)

Q_{2,2} = \Lambda^2    (3.55)

using Eq. (3.50), where

\Lambda = \frac{1}{2\pi} \int_{-\infty}^{\infty} |u K_1(u)|^2\, du    (3.56)

|K_1(u)|^2 = \frac{b_0^2}{(a_0 - a_2 u^2)^2 + a_1^2 u^2}    (3.57)

Therefore, the ATF of the linearized model of this system is

H_1(j\omega) = b_0\, \frac{j\omega(a_1 - B) + (a_0 - A - a_2\omega^2)}{\left[j\omega a_1 + (a_0 - a_2\omega^2)\right]^2}    (3.58)
3.2 VOLTERRA KERNELS OF NONLINEAR DIFFERENTIAL EQUATIONS 161

where

A(P) = 3\, c_{3,0}\, Q_{1,0}\, P + \frac{3}{2}\, c_{3,2}\, Q_{2,1}\, P^2    (3.59)

B(P) = c_{2,1}\, Q_{1,0}\, P + 15\, c_{0,5}\, Q_{2,2}\, P^2    (3.60)

For very small values of P, which make A and B negligible relative to a_0 and a_1, respectively,
the measured ATF H_1(j\omega) has the poles and zeros of K_1(j\omega). However, for values
of P for which A and B become significant relative to a_0 and a_1, respectively, two new
zeros emerge for H_1(j\omega) and its poles double, as indicated by Eq. (3.58). These effects become,
in general, more pronounced as the value of P increases (i.e., the nonlinear effects
are increasing in importance as the GWN input power increases).
To illustrate these effects, we use computer simulations for the parameter values a_0 =
2, a_1 = 3, a_2 = 1, b_0 = 1, c_{3,0} = 1/12, c_{2,1} = 1/12, c_{3,2} = -1/12, and c_{0,5} = 1/10, which yield
Q_{1,0} = 1/12, Q_{2,1} = 1/36, Q_{2,2} = 1/36, and

A(P) = \frac{P}{48}\left(1 - \frac{P}{6}\right)    (3.61)

B(P) = \frac{P}{24}\left(\frac{1}{6} + P\right)    (3.62)

The magnitude of the ATF |H_1(j\omega)|^2 is shown in Figure 3.2, plotted for P values ranging
from 0 to 50. Clearly, as P increases, the system shifts into an unstable mode.
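These effects can be checked numerically. The following sketch (not from the book's own code; the function and variable names are ours) evaluates the ATF of Eq. (3.58) with A(P) and B(P) from Eqs. (3.61)-(3.62), and confirms that the linear limit P = 0 recovers K_1(j\omega):

```python
import numpy as np

# Sketch (not from the book's code): the ATF of Eq. (3.58) for the example
# parameter values a0 = 2, a1 = 3, a2 = 1, b0 = 1, with A(P) and B(P) taken
# from Eqs. (3.61)-(3.62).
a0, a1, a2, b0 = 2.0, 3.0, 1.0, 1.0

def A_of_P(P):
    return (P / 48.0) * (1.0 - P / 6.0)          # Eq. (3.61)

def B_of_P(P):
    return (P / 24.0) * (1.0 / 6.0 + P)          # Eq. (3.62)

def H1(w, P):
    """Apparent transfer function H1(jw) of Eq. (3.58) at GWN power level P."""
    A, B = A_of_P(P), B_of_P(P)
    num = b0 * (1j * w * (a1 - B) + (a0 - A - a2 * w**2))
    den = (1j * w * a1 + (a0 - a2 * w**2)) ** 2
    return num / den

# For P = 0 the ATF must reduce to K1(jw) = b0 / (a0 + a1(jw) + a2(jw)^2)
w = np.linspace(0.0, 2 * np.pi * 10, 512)        # 0 to 10 Hz, as in Figure 3.2
K1 = b0 / (a0 + a1 * (1j * w) + a2 * (1j * w) ** 2)
print(np.max(np.abs(H1(w, 0.0) - K1)))           # ~0: linear limit recovered
```

Evaluating |H1(w, P)|**2 on a grid of P values reproduces the qualitative behavior of Figure 3.2.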

3.2.2 Nonlinear Parametric Models with Intermodulation


As mentioned earlier, another class of nonlinear parametric models that can have wide
applicability to physiological systems is defined by a system of ordinary differential
equations that contain bilinear terms representing intermodulatory effects. Such effects
abound in physiology and, therefore, the relation of this model form with nonparamet-
ric Volterra models may find wide and useful application. An example of such a model
is the so-called "minimal model" of insulin-glucose interactions that has received considerable
attention in the diabetes literature [Bergman et al., 1981; Carson et al., 1983;
Bergman & Lovejoy, 1997]. This model is comprised of two differential equations of
first order, which are presented in Section 1.4 [Equations (1.18) and (1.19)]. Its equivalent
Volterra kernels can be derived by use of the generalized harmonic balance method
and are given by Equations (1.20)-(1.22).
In this section, we seek to generalize the modeling approach to systems with intermod-
ulatory mechanisms represented by bilinear terms between the output variable (y) and an
internal "regulatory" variable (z) that exerts the modulatory action. For instance, in the
aforementioned "minimal model," the internal regulatory variable is termed "insulin ac-
tion" and its dynamics are described by Equation (1.19), whereas the output is plasma
glucose, whose dynamics are described by Equation (1.18) and are subject to the modula-
tory action depicted by the bilinear term of Equation (1.18). Additional "regulatory" vari-
ables may represent the effects of glucagon, epinephrine, free fatty acids, cortisol, and so
on. This model form is also proposed in Chapter 8 as a more compact and useful representation
of voltage-dependent and ligand-dependent dynamics of neuronal function (replacing
the cumbersome Hodgkin-Huxley model and its many variants), which extends
also to synaptic junctions.

Figure 3.2 Changes in the shape of the magnitude of the apparent transfer function |H_1(j\omega)|^2 for the
example in the text as the value of the input power level P changes from 0 to 50. The plotted frequency
range is from 0 to 10 Hz [Marmarelis, 1989].
Let us assume that the internal regulatory variables {z_i} have first-order kinetics described
by the linear differential equation

\frac{dz_i}{dt} + a_i z_i = b_i x    (3.63)

where i denotes any number of such internal regulatory variables that are all driven by the
same input x. Let us also assume that the output dynamics are described by the first-order
differential equation

\frac{dy}{dt} + c y = y_0 + \sum_i z_i\, y    (3.64)

that includes as many bilinear terms as regulatory variables. Note the presence of the output
basal value y_0 that is indispensable for these systems to maintain sustained operation.

The output equation (3.64) is nonlinear and gives rise to an infinite number of Volterra
kernels (all orders of Volterra functionals). This model form includes the aforementioned
"minimal model" as a special case for i = 1.
We will now derive the equivalent Volterra kernels of this system. To this purpose, we
can solve Equation (3.63) for an arbitrary (noncausal) input x(t):

z_i(t) = b_i \int_0^{\infty} e^{-a_i \lambda}\, x(t - \lambda)\, d\lambda    (3.65)

Substitution of the integral expression (3.65) of the regulatory variables {z_i(t)} into the
output Equation (3.64) yields the integrodifferential equation

\frac{dy(t)}{dt} + c\, y(t) = y_0 + \sum_i b_i\, y(t) \int_0^{\infty} e^{-a_i \lambda}\, x(t - \lambda)\, d\lambda    (3.66)

that can be solved by the generalized harmonic balance method, or by substitution of a
Volterra expression of the output

y(t) = k_0 + \int_0^{\infty} k_1(\tau)\, x(t-\tau)\, d\tau + \int_0^{\infty}\!\!\int_0^{\infty} k_2(\tau_1, \tau_2)\, x(t-\tau_1)\, x(t-\tau_2)\, d\tau_1 d\tau_2 + \cdots    (3.67)

into the integral equation

y(t) = \frac{y_0}{c} + \sum_i b_i \int_0^{\infty} e^{-c\sigma}\, y(t-\sigma) \int_0^{\infty} e^{-a_i \lambda}\, x(t-\sigma-\lambda)\, d\lambda\, d\sigma    (3.68)

that results from convolving Equation (3.66) with its homogeneous solution exp[-c\tau].
The latter approach yields the following Volterra kernels:

k_0 = \frac{y_0}{c}    (3.69)

because k_0 is obtained for a null input x(t) \equiv 0. The first-order kernel is obtained by equating
first-order terms [i.e., terms containing one input signal x(t)]:

k_1(\tau) = k_0 \sum_i b_i \int_0^{\tau} e^{-c\sigma - a_i(\tau - \sigma)}\, d\sigma

        = \frac{y_0}{c} \sum_i \frac{b_i}{a_i - c} \left[e^{-c\tau} - e^{-a_i\tau}\right]    (3.70)

It is evident from Equation (3.70) that the kinetic constants {a_i} of all regulatory mechanisms
are imprinted on the first-order Volterra kernel, defining distinct time constants in
the first-order dynamics with relative importance proportional to b_i/(a_i - c).
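As a numerical illustration of Eqs. (3.69)-(3.70), the sketch below evaluates k_0 and k_1(\tau) for two regulatory variables; the values of c, y_0, a_i, b_i are hypothetical, chosen only for illustration:

```python
import numpy as np

# Sketch of Eqs. (3.69)-(3.70) for two regulatory variables; the values of
# c, y0, a_i, b_i are hypothetical (chosen only for illustration).
c, y0 = 1.0, 0.5
a = np.array([2.0, 5.0])     # kinetic constants a_i
b = np.array([0.1, -0.05])   # gains b_i

k0 = y0 / c                  # Eq. (3.69)

def k1(tau):
    """First-order Volterra kernel of Eq. (3.70)."""
    tau = np.asarray(tau, dtype=float)
    terms = (b / (a - c)) * (np.exp(-c * tau[..., None]) - np.exp(-a * tau[..., None]))
    return k0 * terms.sum(axis=-1)

tau = np.linspace(0.0, 5.0, 501)
print(k1(tau[:1]))           # the kernel starts at zero at tau = 0
```

Each exponential pair e^{-c\tau} - e^{-a_i\tau} contributes one distinct time constant, as noted in the text.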
The second-order Volterra kernel is obtained by equating second-order terms (i.e.,
terms containing two input factors) after substitution of y(t) from Equation (3.67) into
Equation (3.68):

k_2(\tau_1, \tau_2) = \frac{1}{2} \sum_i b_i \int_0^{\tau_m} e^{-c\lambda} \left[e^{-a_i(\tau_1 - \lambda)}\, k_1(\tau_2 - \lambda) + e^{-a_i(\tau_2 - \lambda)}\, k_1(\tau_1 - \lambda)\right] d\lambda

    = \frac{y_0}{2c} \sum_i \sum_j \frac{b_i b_j}{a_j - c} \left\{ \left[e^{-c\tau_1 - a_i\tau_2} + e^{-c\tau_2 - a_i\tau_1}\right] \frac{e^{a_i\tau_m} - 1}{a_i} - \left[e^{-a_i\tau_1 - a_j\tau_2} + e^{-a_i\tau_2 - a_j\tau_1}\right] \frac{1 - e^{-(c - a_i - a_j)\tau_m}}{c - a_i - a_j} \right\}    (3.71)

where i and j are the indices of all regulatory variables, and \tau_m = min(\tau_1, \tau_2). This expression
for the second-order kernel demonstrates the complexity that arises in the nonlinear
dynamics of the system because of the bilinear terms.
Note that, if |b_i| << 1 for all i, the higher-order kernels (higher than second) become
negligible and the system can be approximated well by a second-order Volterra model.
To get a better feeling about the shape of the second-order kernel, we may simplify the
expression (3.71) by looking at the diagonal (\tau_1 = \tau_2 = \tau_m = \tau):

k_2(\tau, \tau) = \frac{y_0}{c} \sum_i \sum_j \frac{b_i b_j}{a_j - c} \left\{ e^{-c\tau}\, \frac{1 - e^{-a_i\tau}}{a_i} - e^{-(a_i + a_j)\tau}\, \frac{1 - e^{-(c - a_i - a_j)\tau}}{c - a_i - a_j} \right\}    (3.72)
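The diagonal slice of Eq. (3.72) can be evaluated directly; in the sketch below the values of c, y_0, a_i, b_i are hypothetical (illustration only, not from the text):

```python
import numpy as np

# Sketch of the diagonal slice k2(tau, tau) of Eq. (3.72); the values of
# c, y0, a_i, b_i are hypothetical (illustration only, not from the text).
c, y0 = 1.0, 0.5
a = np.array([2.0, 5.0])
b = np.array([0.1, -0.05])

def k2_diag(tau):
    """k2(tau, tau) of Eq. (3.72), summed over all regulatory pairs (i, j)."""
    tau = np.asarray(tau, dtype=float)
    total = np.zeros_like(tau)
    for i in range(len(a)):
        for j in range(len(a)):
            pref = b[i] * b[j] / (a[j] - c)
            term1 = np.exp(-c * tau) * (1.0 - np.exp(-a[i] * tau)) / a[i]
            term2 = (np.exp(-(a[i] + a[j]) * tau)
                     * (1.0 - np.exp(-(c - a[i] - a[j]) * tau))
                     / (c - a[i] - a[j]))
            total += pref * (term1 - term2)
    return (y0 / c) * total

print(k2_diag(np.array([0.0, 1.0, 5.0])))   # zero at the origin, then decays
```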

The purpose of this example is to demonstrate the fact that the presence of a bilinear
term gives rise to rather complicated nonlinear dynamics, as attested by the expression for
the second-order Volterra kernel. Nonetheless, these dynamics can be measured directly
by means of kernel estimates obtained from input-output data collected in a natural or experimental
context (see Chapter 2). This gives us the powerful capability to validate parametric
models that are widely accepted (e.g., the aforementioned "minimal model") or
modify them properly if their validity cannot be affirmed.
This approach can also be used to delineate the various regulatory mechanisms from
the structure of the estimated Volterra kernels, although this task requires considerable
sophistication of analysis, an example of which was presented above. It should be emphasized
that, despite the complexity of the task, we finally have the realistic prospect of deriving
nonlinear parametric models from physiological data in a methodical manner that
removes the heuristic and desultory character of previous efforts. As elaborated in Section
3.4, we can commence with kernel measurements (a nonparametric model) obtained from
the data and transition to an equivalent parametric model using the rigorous mathematical
analysis presented in this section.

3.3 DISCRETE-TIME VOLTERRA KERNELS OF NARMAX MODELS

The fundamental relations between discrete-time Volterra kernels and the parameters of
equivalent NARMAX models (nonlinear difference equations) provide the methodological
key for utilizing the equivalence between parametric and nonparametric models in the
context of actual applications (sampled data). This equivalence can be used to facilitate
the determination of the structure of the NARMAX model that is appropriate for a given
system and to achieve accurate estimation of the NARMAX model parameters, using input-output
sampled data. These relations also make possible the transition from nonparametric
to parametric models in discrete time [Zhao & Marmarelis, 1998]. The discrete-time
finite-memory Volterra model is given by (for \tilde{k}_0 = 0)
y(n) = \sum_{m_1=0}^{M-1} \tilde{k}_1(m_1)\, x(n - m_1) + \sum_{m_1=0}^{M-1} \sum_{m_2=0}^{M-1} \tilde{k}_2(m_1, m_2)\, x(n - m_1)\, x(n - m_2) + \cdots

    + \sum_{m_1=0}^{M-1} \sum_{m_2=0}^{M-1} \cdots \sum_{m_r=0}^{M-1} \tilde{k}_r(m_1, m_2, \ldots, m_r)\, x(n - m_1)\, x(n - m_2) \cdots x(n - m_r) + \cdots    (3.73)

where the tilde distinguishes the discrete from the continuous Volterra kernels in this
chapter, the discrete values of the rth-order kernel are assumed multiplied by T for simplicity
of notation (T is the sampling interval), and M is the length of the discretized system
memory or the memory-bandwidth product of the system (i.e., the maximum lag corresponding
to significant kernel values is M - 1).
We can estimate the kernels \tilde{k}_1, \tilde{k}_2, and \tilde{k}_3 from input-output data using the methods
presented in Chapter 2 for a finite-order model [max(r) = q < \infty]. Note that there are (M +
q)!/(M!q!) discrete kernel values to be estimated in the qth-order model of Equation
(3.73). This is a large number of values to be estimated when M is large (50-100 for a
typical physiological system) and q >= 2, a fact that can lead to reduced accuracy when the
available input-output data are limited and corrupted with noise.
In order to decrease the data-record length requirement and increase the accuracy of
kernel estimation, the kernel expansion method was introduced in Section 2.3.1, which
generally leads to considerable computational savings and increased estimation accuracy.
The current implementation of choice employs L discrete-time Laguerre functions to expand
the Volterra kernels of the system, as discussed in Section 2.3.2. Because L can be
much smaller than M for most physiological systems (owing to the asymptotically exponential
form of their kernel functions), the savings in kernel representation/estimation are
significant, being approximately proportional to (M/L)^q for a qth-order model. In addition,
this method is rather robust in the presence of noise, a fact that provides additional motivation
for the indirect NARMAX model estimation from Volterra kernel measurements
presented below [Zhao & Marmarelis, 1998].
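These counts can be checked with a short computation; the values of M, L, and q below are chosen for illustration only:

```python
from math import comb

# Illustrative count: a q-th order discrete Volterra model with memory M has
# (M + q)!/(M! q!) = C(M + q, q) free kernel values; expanding the kernels over
# L Laguerre functions replaces M by L. (M, L, q below are hypothetical.)
M, L, q = 50, 5, 2
direct   = comb(M + q, q)    # kernel values estimated directly
expanded = comb(L + q, q)    # Laguerre expansion coefficients estimated instead
print(direct, expanded)      # 1326 vs. 21 unknowns, roughly a 63-fold reduction
```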
The general NARMAX model is a multinomial nonlinear difference equation of the
form

A(d)\, y(n) - f[x(n), \ldots, d^Q x(n); d\, y(n), \ldots, d^R y(n)] = B(d)\, x(n)    (3.74)

where d denotes the delay/shift operator, i.e., d^m x(n) = x(n - m); f(.) is a multinomial
function without linear terms; and A(.) and B(.) are polynomials in d representing linear
difference operators.
In order to determine the discrete-time first-order kernel of the above NARMAX model,
we follow the discrete-time version of the "generalized harmonic balance" method presented
in Section 3.2 for continuous-time models. This method yields discrete-time kernel
expressions in the multidimensional z-domain in terms of the coefficients of the equivalent
nonlinear difference-equation model of Equation (3.74). To obtain the first-order discrete-time
Volterra kernel, we consider the input x(n) = z^n, where z = exp(jsT) is the complex variable
of the z-domain (the discrete counterpart of the Laplace domain). If we substitute this
input into the Volterra model (3.73) and subsequently insert the resulting expressions for
y(n) and x(n) into the NARMAX model (3.74), then an expression of \tilde{K}_1(z) in terms of the
parameters of the NARMAX model can be obtained by balancing all the terms with factor
z^n in Equation (3.74). The resulting first-order discrete Volterra kernel is expressed in terms
of A(z^{-1}) and B(z^{-1}), which constitute the linear part of the NARMAX model, as

\tilde{K}_1(z) = \frac{B(z^{-1})}{A(z^{-1})}    (3.75)

Note the similarity of this result with its continuous counterpart in Equation (3.33).
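Equation (3.75) can be checked numerically on an illustrative linear part A(d)y = B(d)x: the impulse response of the difference equation (i.e., the first-order kernel) should have z-transform B(z^{-1})/A(z^{-1}). The coefficients below are hypothetical:

```python
import numpy as np

# Sketch checking Eq. (3.75) on an illustrative linear part A(d)y = B(d)x:
# the first-order discrete kernel (the impulse response of the linear part)
# should have z-transform B(z^-1)/A(z^-1). Coefficients are hypothetical.
a = [1.0, -0.3, 0.02]   # A(z^-1) = 1 - 0.3 z^-1 + 0.02 z^-2  (a0 = 1)
b = [0.4, 0.1]          # B(z^-1) = 0.4 + 0.1 z^-1

def impulse_response(a, b, N):
    """k1(n) from a0 k1(n) = b_n - sum_{i>=1} a_i k1(n - i)."""
    k1 = np.zeros(N)
    for n in range(N):
        drive = b[n] if n < len(b) else 0.0
        fb = sum(a[i] * k1[n - i] for i in range(1, len(a)) if n - i >= 0)
        k1[n] = (drive - fb) / a[0]
    return k1

k1 = impulse_response(a, b, 50)
# Compare the z-transform of k1 with B(z^-1)/A(z^-1) on the unit circle
z = np.exp(1j * 2 * np.pi * np.arange(50) / 50)
K1_sum = np.array([np.sum(k1 * zi ** (-np.arange(50))) for zi in z])
K1_ratio = (b[0] + b[1] / z) / (a[0] + a[1] / z + a[2] / z**2)
print(np.max(np.abs(K1_sum - K1_ratio)))   # ~0 (truncation tail is negligible)
```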
This approach can be extended to second-order discrete Volterra kernel evaluation by
considering the input x(n) = z_1^n + z_2^n and repeating the process described above by balancing
the terms in Equation (3.74) that contain the factor (z_1^n z_2^n) in order to obtain an expression
for \tilde{K}_2(z_1, z_2). In general, for the rth-order discrete Volterra kernel evaluation, the input
x(n) = z_1^n + \cdots + z_r^n is considered and all the terms with the factor (z_1^n \cdots z_r^n) are
balanced to obtain an expression for \tilde{K}_r(z_1, \ldots, z_r).
It has been shown [Zhao & Marmarelis, 1998] that the resulting high-order discrete
Volterra kernels can be expressed in terms of the first-order kernel and the coefficients of
the multinomial function f(.). Specifically, if we decompose the function f(.) into nonlinear
terms g_r(.) that represent distinct degrees of nonlinearity as

f(\cdot) = g_2(\cdot) + g_3(\cdot) + \cdots    (3.76)

where

g_2(x, \ldots, d^Q x; d y, \ldots, d^R y) = \sum_{j_1=0}^{Q} \sum_{j_2=0}^{Q} \theta^{2,1}_{j_1,j_2}\, d^{j_1}x\, d^{j_2}x + \sum_{j=0}^{Q} \sum_{m=1}^{R} \theta^{2,2}_{j,m}\, d^{j}x\, d^{m}y

    + \sum_{m_1=1}^{R} \sum_{m_2=1}^{R} \theta^{2,3}_{m_1,m_2}\, d^{m_1}y\, d^{m_2}y    (3.77)

g_3(x, \ldots, d^Q x; d y, \ldots, d^R y) = \sum_{j_1=0}^{Q} \sum_{j_2=0}^{Q} \sum_{j_3=0}^{Q} \theta^{3,1}_{j_1,j_2,j_3}\, d^{j_1}x\, d^{j_2}x\, d^{j_3}x

    + \sum_{j=0}^{Q} \sum_{m_1=1}^{R} \sum_{m_2=1}^{R} \theta^{3,2}_{j,m_1,m_2}\, d^{j}x\, d^{m_1}y\, d^{m_2}y + \sum_{j_1=0}^{Q} \sum_{j_2=0}^{Q} \sum_{m=1}^{R} \theta^{3,3}_{j_1,j_2,m}\, d^{j_1}x\, d^{j_2}x\, d^{m}y

    + \sum_{m_1=1}^{R} \sum_{m_2=1}^{R} \sum_{m_3=1}^{R} \theta^{3,4}_{m_1,m_2,m_3}\, d^{m_1}y\, d^{m_2}y\, d^{m_3}y    (3.78)

Note that the nonlinear term g_2(.) determines the second-order kernel, g_2(.) and g_3(.)
together determine the third-order kernel, and so on. The expressions for the second-order
and third-order discrete Volterra kernels are obtained after considerable analytical derivations
in the z-domain (for details see [Zhao & Marmarelis, 1998]):

\tilde{K}_2(z_1, z_2) = \frac{1}{2 A(z_1^{-1} z_2^{-1})} \Bigg[ \sum_{j_1=0}^{Q} \sum_{j_2=0}^{Q} \theta^{2,1}_{j_1,j_2} \left(z_1^{-j_1} z_2^{-j_2} + z_1^{-j_2} z_2^{-j_1}\right)

    + \sum_{j=0}^{Q} \sum_{m=1}^{R} \theta^{2,2}_{j,m} \left(z_1^{-m} z_2^{-j}\, \tilde{K}_1(z_1) + z_1^{-j} z_2^{-m}\, \tilde{K}_1(z_2)\right)

    + \sum_{m_1=1}^{R} \sum_{m_2=1}^{R} \theta^{2,3}_{m_1,m_2} \left(z_1^{-m_1} z_2^{-m_2} + z_1^{-m_2} z_2^{-m_1}\right) \tilde{K}_1(z_1)\, \tilde{K}_1(z_2) \Bigg]    (3.79)
ml=l m2=1

\tilde{K}_3(z_1, z_2, z_3) = \frac{1}{3!\, A(z_1^{-1} z_2^{-1} z_3^{-1})} \Bigg[ \sum_{j=0}^{Q} \sum_{m=1}^{R} \theta^{2,2}_{j,m} \sum_{i_1,i_2,i_3} z_{i_1}^{-j} (z_{i_2} z_{i_3})^{-m}\, \tilde{K}_2(z_{i_2}, z_{i_3})

    + \sum_{m_1=1}^{R} \sum_{m_2=1}^{R} \theta^{2,3}_{m_1,m_2} \sum_{i_1,i_2,i_3} \frac{z_{i_1}^{-m_1} (z_{i_2} z_{i_3})^{-m_2} + z_{i_1}^{-m_2} (z_{i_2} z_{i_3})^{-m_1}}{2}\, \tilde{K}_1(z_{i_1})\, \tilde{K}_2(z_{i_2}, z_{i_3})

    + \sum_{j_1=0}^{Q} \sum_{j_2=0}^{Q} \sum_{j_3=0}^{Q} \theta^{3,1}_{j_1,j_2,j_3} \sum_{i_1,i_2,i_3} z_{i_1}^{-j_1} z_{i_2}^{-j_2} z_{i_3}^{-j_3}

    + \sum_{j=0}^{Q} \sum_{m_1=1}^{R} \sum_{m_2=1}^{R} \theta^{3,2}_{j,m_1,m_2} \sum_{i_1,i_2,i_3} \frac{z_{i_2}^{-m_1} z_{i_3}^{-m_2} + z_{i_3}^{-m_1} z_{i_2}^{-m_2}}{2}\, z_{i_1}^{-j}\, \tilde{K}_1(z_{i_2})\, \tilde{K}_1(z_{i_3})

    + \sum_{j_1=0}^{Q} \sum_{j_2=0}^{Q} \sum_{m=1}^{R} \theta^{3,3}_{j_1,j_2,m} \sum_{i_1,i_2,i_3} \frac{z_{i_1}^{-j_1} z_{i_2}^{-j_2} + z_{i_1}^{-j_2} z_{i_2}^{-j_1}}{2}\, z_{i_3}^{-m}\, \tilde{K}_1(z_{i_3})

    + \sum_{m_1=1}^{R} \sum_{m_2=1}^{R} \sum_{m_3=1}^{R} \theta^{3,4}_{m_1,m_2,m_3} \sum_{i_1,i_2,i_3} z_{i_1}^{-m_1} z_{i_2}^{-m_2} z_{i_3}^{-m_3}\, \tilde{K}_1(z_1)\, \tilde{K}_1(z_2)\, \tilde{K}_1(z_3) \Bigg]    (3.80)

where \sum_{i_1,i_2,i_3} denotes the sum over all distinct triplets of the indices (i_1, i_2, i_3) taking values
from 1 to 3.
In practice, the estimated discrete Volterra kernels can be used to identify the equivalent
NARMAX model based on Equations (3.75), (3.79), and (3.80), as outlined in the
following section.

3.4 FROM VOLTERRA KERNEL MEASUREMENTS TO PARAMETRIC MODELS

Comparative studies of parametric models in the form of nonlinear differential or difference
equations and nonparametric models in the form of Volterra functional expansions
have been motivated by the potential benefits that may accrue from the synergistic use of
these two modeling approaches [Barrett, 1963, 1965; Korenberg, 1973b,c; Billings &
Leontaritis, 1982; Billings & Voon, 1984; Marmarelis, 1982, 1989a,f, 1991; Zhao & Marmarelis,
1994a,b, 1997, 1998]. Among their main advantages, the parametric models are
usually more compact and subject to easier physiological interpretation, whereas the nonparametric
models do not require prior knowledge of the model structure and yield models
that are true to the data. The main advantages of each approach are also the main disadvantages
of its counterpart. Thus, it is sensible to combine the two approaches in a
synergistic manner in order to secure the maximum set of possible advantages.
Specifically, if insufficient knowledge prevents the postulation of parametric model
structures (a condition often encountered in complex physiological systems), then the
nonparametric approach can be used first to obtain the Volterra kernels of the system
from input-output data in the "model-free" context of a "minimally constrained" nonparametric
model and, subsequently, a method like the one proposed below can be used to
obtain an equivalent parametric model. This parametric model is first obtained in discrete
time using the method outlined below, and then can be transitioned to continuous time using
the "kernel invariance" method outlined in Section 3.5.
In the previous section, we presented the general methodology by which we can derive
the mathematical relations between discrete-time Volterra kernels (nonparametric model)
and the equivalent NARMAX model (parametric model). Note that the discrete-time
Volterra model can be viewed as a nonlinear moving-average (NMA) model.
In this section, we will present two different approaches for transitioning from nonparametric
to parametric models. The first uses the equivalence relations derived in the
previous section, and the second uses a modular form of the Volterra model that can be
easily converted into continuous-time differential equations (thus facilitating physiological
interpretation). Note that the NARMAX models exhibit two inherent shortcomings:
(1) the presence of nonlinear autoregressive terms leads to considerable prediction errors
even for small estimation errors in the autoregressive coefficients or in the presence of
considerable noise; (2) physiological interpretation of the NARMAX model is inhibited
by the difficulty of transitioning to continuous time (i.e., to derive equivalent nonlinear
differential equations), as discussed in Section 3.5.
In the first approach, we use the inverse z-transform of Equation (3.75) to obtain a linear
difference equation involving the first-order discrete Volterra kernel in the time domain:

\sum_{i=0}^{I} a_i\, \tilde{k}_1(n - i) = \sum_{j=0}^{J} b_j\, \delta(n - j)    (3.81)

where \delta(.) is the Kronecker delta, a_0 \equiv 1, and the coefficients a_i and b_j describe the linear
portion of the NARMAX model [Equation (3.74)]:

A(z^{-1}) = \sum_{i=0}^{I} a_i z^{-i}    (3.82)

B(z^{-1}) = \sum_{j=0}^{J} b_j z^{-j}    (3.83)

If \tilde{k}_1(n) has been previously estimated from input-output data, then we can use existing
ARMAX model-order determination and parameter-estimation methods to determine the
appropriate values of I and J in Equation (3.81) and to estimate the coefficients {a_i, b_j} of
Eq. (3.81), which is the linear part of the NARMAX model of Equation (3.74).
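A minimal sketch of this first step follows, with illustrative coefficients and a noise-free "estimated" kernel: \tilde{k}_1 is generated from known {a_i, b_j} (here I = 2, J = 1), and the fit of Eq. (3.81) recovers them.

```python
import numpy as np

# Sketch of this first KBR step with illustrative coefficients: k1 is generated
# from known {a_i, b_j} (I = 2, J = 1), and the fit of Eq. (3.81) should
# recover them (noise-free case).
a_true = [1.0, -0.3, 0.02]
b_true = [0.4, 0.1]
N = 40
k1 = np.zeros(N)
for n in range(N):
    drive = b_true[n] if n < len(b_true) else 0.0
    k1[n] = drive - sum(a_true[i] * k1[n - i] for i in (1, 2) if n - i >= 0)

# For n > J, Eq. (3.81) reads k1(n) + a1 k1(n-1) + a2 k1(n-2) = 0:
rows = np.array([[-k1[n - 1], -k1[n - 2]] for n in range(2, N)])
rhs = np.array([k1[n] for n in range(2, N)])
a_est, *_ = np.linalg.lstsq(rows, rhs, rcond=None)

b0_est = k1[0]                      # n = 0 in Eq. (3.81): b0 = k1(0)
b1_est = k1[1] + a_est[0] * k1[0]   # n = 1: b1 = k1(1) + a1 k1(0)
print(a_est, b0_est, b1_est)        # recovers a1, a2, b0, b1
```

With noisy kernel estimates, the same regression would be combined with the ARMAX order-determination methods mentioned above.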
Similarly, the mathematical relation between the second-order discrete Volterra kernel
and the second-order parameters of the NARMAX model given in Equation (3.79) in the
two-dimensional z-domain can be converted to the time domain in the form of the two-dimensional
autoregressive relation

\sum_{i=0}^{I} a_i\, \tilde{k}_2(n_1 - i, n_2 - i) = \theta^{2,1}_{n_1,n_2} + \frac{1}{2} \sum_{m=1}^{R} \left[\theta^{2,2}_{n_2,m}\, \tilde{k}_1(n_1 - m) + \theta^{2,2}_{n_1,m}\, \tilde{k}_1(n_2 - m)\right]

    + \frac{1}{2} \sum_{m_1=1}^{R} \sum_{m_2=1}^{R} \theta^{2,3}_{m_1,m_2} \left[\tilde{k}_1(n_1 - m_1)\, \tilde{k}_1(n_2 - m_2) + \tilde{k}_1(n_1 - m_2)\, \tilde{k}_1(n_2 - m_1)\right]    (3.84)

Note that this Equation (3.84) is linear in terms of the unknown parameters {\theta^{2,1}, \theta^{2,2},
\theta^{2,3}} and, if \tilde{k}_1(n) and \tilde{k}_2(n_1, n_2) have been previously estimated from input-output data,
we can use linear regression methods (like the ones discussed in Sec. 3.1) to determine
the order and estimate the coefficients of the second-order terms of the NARMAX model
[Zhao & Marmarelis, 1998].

Although laborious, this procedure can be extended to the third-order Volterra kernel
expression of Equation (3.80), in order to obtain estimates of the third-order coefficients
of the NARMAX model nonlinearity in Equation (3.78). This procedure can be continued
until the identification task is completed by including all the terms of the NARMAX
model. It is evident that this procedure involves increasingly complex kernel expressions
as the order of nonlinearity increases, and its practical efficacy is contingent upon our
ability to obtain accurate kernel estimates of the required order from input-output data.
We term this method "kernel-based realization" (KBR) and we illustrate its use with an
example below. Note that the term "realization" indicates the process by which a parametric
model is developed from data or from an equivalent nonparametric model.
The reader may be wondering why we appear to identify the system twice (first deriving
a nonparametric model in the form of Volterra kernels, and then developing an equivalent
parametric model in NARMAX form). The reason is that we ultimately seek to obtain
a parametric NARMAX model, and the nonparametric model (Volterra kernels) is
estimated as an intermediate step, because we believe that this "indirect" procedure offers
significant practical advantages in terms of robustness to noise and accuracy of estimation
(as illustrated below).

Example 3.3. Illustrative Example


A computer-simulated example of a NARMAX model (which has been previously published
in the literature) is used to demonstrate the efficacy of the proposed indirect (KBR)
approach and to compare it with the PESR method previously proposed by Billings and
Voon (1986). In the interest of simplicity, the example involves only a simple nonlinearity,
two output lags, and output-additive noise:

y(n) = 0.5\, y(n - 1) - 0.3\, y(n - 2) + 0.2\, x(n)\, y(n - 1) + 0.4\, x(n)

z(n) = y(n) + w(n)    (3.85)

where the input x(n) is a uniformly distributed white-noise signal with |x(n)| <= \sqrt{3} (i.e.,
unit variance), and the output-additive noise w(n) is an independent GWN sequence with
variance of 0.2, resulting in signal-to-noise ratio (SNR) of 0 dB in the output data. The
key Equation (3.84) becomes in this case

\tilde{k}_2(n_1, n_2) = 0.5\, \tilde{k}_2(n_1 - 1, n_2 - 1) - 0.3\, \tilde{k}_2(n_1 - 2, n_2 - 2) + 0.1\left[\tilde{k}_1(n_1 - 1) + \tilde{k}_1(n_2 - 1)\right]    (3.86)
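The mechanics of this KBR step can be sketched in a few lines (noise-free and purely illustrative): \tilde{k}_1 is generated from the linear part of Eq. (3.85), \tilde{k}_2 from the recursion (3.86), and the coefficients are then recovered by least squares.

```python
import numpy as np

# Sketch of the KBR regression for Example 3.3 (noise-free, illustrative):
# k1 from the linear part of Eq. (3.85), k2 from the recursion (3.86), then the
# three coefficients are recovered by least squares.
N = 20
k1 = np.zeros(N)
for n in range(N):
    k1[n] = (0.4 if n == 0 else 0.0) \
            + 0.5 * (k1[n - 1] if n >= 1 else 0.0) \
            - 0.3 * (k1[n - 2] if n >= 2 else 0.0)

k2 = np.zeros((N, N))
K1 = lambda n: k1[n] if n >= 0 else 0.0
K2 = lambda n1, n2: k2[n1, n2] if (n1 >= 0 and n2 >= 0) else 0.0
for n1 in range(N):
    for n2 in range(N):
        k2[n1, n2] = 0.5 * K2(n1 - 1, n2 - 1) - 0.3 * K2(n1 - 2, n2 - 2) \
                     + 0.1 * (K1(n1 - 1) + K1(n2 - 1))

# Treat k1, k2 as "kernel estimates" and regress for the three coefficients:
rows, rhs = [], []
for n1 in range(N):
    for n2 in range(N):
        rows.append([K2(n1 - 1, n2 - 1), K2(n1 - 2, n2 - 2),
                     K1(n1 - 1) + K1(n2 - 1)])
        rhs.append(k2[n1, n2])
coef, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
print(coef)   # recovers [0.5, -0.3, 0.1]
```

With real data the kernels would instead be estimated from input-output records (e.g., via the Laguerre expansion technique), and the same regression applied.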

The results obtained from applying the KBR method are compared with the results
from the PESR method in Table 3.1, for a data record of 500 input-output data points. It

Table 3.1 Estimation results using the "prediction-error stepwise-regression" (PESR) method and
the "kernel-based realization" (KBR) method, for the NARMAX model of Equation (3.85) and a data
record of 500 input-output data points (SNR = 0 dB for GWN output-additive noise)

Equation terms    Coefficient values    PESR estimates    KBR estimates
y(n - 1)          0.5                   0.46              0.48
y(n - 2)          -0.3                  -0.25             -0.30
x(n)              0.4                   0.42              0.42
x(n)y(n - 1)      0.2                   0.16              0.16

is evident that the kernel-based (KBR) method yields more accurate parameter estimates
for the linear terms of this example, while it also facilitates the model-order determination
task. The requisite kernel estimates for the KBR method were obtained via the Laguerre
expansion technique discussed in Section 2.3.2. Note that the estimation error for the coefficient
of the nonlinear term is the same in both methods for this example.
The presented results for the KBR method demonstrate that NARMAX model specification
and estimation can be accomplished indirectly using Volterra kernel estimates
obtained from input-output data as an intermediate step. The KBR method relies on accurate
kernel estimates (obtained, for instance, by use of the Laguerre expansion technique
discussed in Section 2.3.2) and exhibits the following main advantages over previous
methods: (1) recursive or iterative algorithms are not required and, therefore, there
are no problems of convergence as in previous algorithms (for instance, PESR); (2) the
terms of different orders of nonlinearity are treated separately, so it is easier to establish
the structure of the model (order determination); and (3) the KBR approach may yield
more accurate parameter estimates in the presence of noise, because the effective SNR
of the kernel estimates is usually higher than the SNR of the output data. The main disadvantage
is that application of the KBR method is limited to third-order models, because
of the impracticality of estimating kernels of order higher than third and the complexity
of the analytical expressions relating higher-order kernels to NARMAX model
coefficients.
The KBR method can be used in a practical context (up to third-order models) to make
the transition from nonparametric (kernel-based) to parametric (NARMAX) models, thus
securing some of the combined advantages of the two modeling approaches. However,
there is a second approach to transitioning from nonparametric to parametric models (discussed
below) that also allows easier transition to continuous-time parametric models
(which are more amenable to physiological interpretation), unlike the cumbersome transition
of NARMAX models to continuous-time nonlinear differential equations, as discussed
in Section 3.5.
The second approach seeks the parsimonious representation of the estimated discrete-time
Volterra kernels in terms of a set of linear filters that may be viewed as a basis of
functions (akin to the Laguerre expansion technique discussed in Section 2.3.2). This
leads to the modular model form of Figure 2.16, where each of the discrete-time filters
h_j(n) generates an output u_j(n) that is fed into a multiinput static nonlinearity. Each of
these filters can be easily converted into an equivalent continuous-time differential equation
following the "impulse invariance method" described in Section 3.5. This offers an
alternative "realization" approach whereby the nonparametric model (obtained in the
form of discrete Volterra kernels) can be converted to a parameterized modular model.
In order to accomplish this, we fit an ARMAX model (linear difference equation) to
each of the estimated filter data h_j(n):

h_j(n) = \alpha_{j,1}\, h_j(n-1) + \cdots + \alpha_{j,P_j}\, h_j(n-P_j) + \beta_{j,0}\, \delta(n) + \cdots + \beta_{j,m_j}\, \delta(n-m_j) + \varepsilon_j(n)    (3.87)

following the linear estimation procedures outlined in Section 3.1. Having estimated the
coefficients {\alpha_{j,1}, \ldots, \alpha_{j,P_j}, \beta_{j,0}, \ldots, \beta_{j,m_j}} for the selected model order (P_j, m_j), we proceed
with the estimation of the multiinput static nonlinearity:

y(n) = f[u_1(n), \ldots, u_j(n), \ldots, u_H(n)]    (3.88)

where

u_j(n) = \sum_{m=0}^{M-1} h_j(m)\, x(n - m)    (3.89)

This realization method is completed when the static nonlinearity is parameterized. For
instance, a multinomial representation,

f(u_1, \ldots, u_H) = \sum_{j_1, \ldots, j_H} c_{j_1,\ldots,j_H}\, u_1^{j_1} \cdots u_H^{j_H}    (3.90)

was discussed in Section 2.3.2, in connection with the Laguerre expansion technique.
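A minimal sketch of such a modular model follows: two hypothetical first-order filters feeding a quadratic multinomial nonlinearity (all filter shapes and coefficients below are illustrative, not from the text).

```python
import numpy as np

# Minimal sketch of the modular model of Eqs. (3.88)-(3.90): two hypothetical
# first-order filters feeding a quadratic multinomial nonlinearity (all filter
# shapes and coefficients below are illustrative, not from the text).
M = 64
m = np.arange(M)
h1 = 0.8 ** m                        # filter 1 impulse response h_1(m)
h2 = 0.5 ** m                        # filter 2 impulse response h_2(m)
c = {(1, 0): 1.0, (0, 1): -0.5,      # multinomial coefficients c_{j1,j2}
     (2, 0): 0.2, (1, 1): 0.1}       # of Eq. (3.90)

def model_output(x):
    u1 = np.convolve(x, h1)[: len(x)]   # Eq. (3.89): u_j(n) = sum h_j(m) x(n-m)
    u2 = np.convolve(x, h2)[: len(x)]
    return sum(cj * u1 ** j1 * u2 ** j2 for (j1, j2), cj in c.items())

x = np.zeros(10)
x[0] = 1.0                           # unit impulse input
y = model_output(x)
print(y[0])                          # 1 - 0.5 + 0.2 + 0.1 = 0.8
```

In an actual application the filters h_j and the coefficients c_{j1,...,jH} would be estimated from data, as outlined above.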

3.5 EQUIVALENCE BETWEEN CONTINUOUS AND DISCRETE PARAMETRIC MODELS

Although the analysis of discrete-time (sampled) data yields discrete parametric models,
the actual physiological processes described by these models occur in continuous time.
The physical or physiological interpretation of the estimated model parameters may
change considerably between discrete-time (difference equation) and continuous-time
(differential equation) representations, especially for nonlinear models. Thus, methods
for defining the equivalence between discrete and continuous parametric models are
needed to assist in model development and interpretation. The "impulse invariance"
method has been introduced for this purpose in the linear case, as discussed below. We
have extended this method to the nonlinear case by introducing the "kernel invariance"
method that offers the means for defining the equivalence between continuous and discrete
nonlinear models [Zhao & Marmarelis, 1997]. This method uses the general Volterra
model form as a canonical representation of nonlinear systems and requires that the
continuous-time kernels (corresponding to the differential equation) be identical to the
discrete-time kernels (corresponding to the equivalent difference equation) at the sampling
points. The actual implementation of this method may become unwieldy in the general
case, but it appears to be tractable in certain cases of low-order nonlinear systems, as
described below.
We begin with the presentation of the "impulse invariance" method for linear time-invariant
systems that is based on the principle of maintaining the direct correspondence between
the poles in the continuous s-domain and the poles in the discrete z-domain for the
respective transfer functions, so that the values of the respective impulse response functions
are the same at the sampling points.
According to this method, the partial fraction expansions of the two transfer functions
H(s) and \tilde{H}(z) (using their P poles) are equated, so that the discrete impulse response function
\tilde{h}(m) is the sampled version of its continuous counterpart h(\tau), where \tau = mT (the
sampling interval T is assumed to be sufficiently short in order to avoid aliasing). Thus,
for P distinct poles, we have

H(s) = \sum_{i=1}^{P} \frac{A_i}{s - p_i}    (3.91)

\tilde{H}(z) = \sum_{i=1}^{P} \frac{A_i}{1 - e^{p_i T} z^{-1}}    (3.92)

where p_i are the continuous poles. Equations (3.91) and (3.92) can be readily translated
into equivalent linear differential and difference equation models, respectively.
For example, suppose that a discrete-time impulse response function h(n) is described
by the difference equation

lien) = 0.3h(n - 1) - 0.02h(n - 2) + 8 (n) (3.93)

that has two poles detennined by the z-domain transfer function

H̃(z) = 1 / (1 - 0.3z⁻¹ + 0.02z⁻²) = 2/(1 - 0.2z⁻¹) - 1/(1 - 0.1z⁻¹)    (3.94)

The two discrete poles 0.2 and 0.1 lead to the two continuous poles p₁ = ln(0.2)/T and p₂ = ln(0.1)/T, according to Equation (3.92) that governs the impulse-invariance method. Therefore, the s-domain transfer function of this linear system in continuous time is (for T = 1):

H(s) = (s + ln(20)) / (s² - ln(0.02)s + ln(0.2)ln(0.1))    (3.95)

leading to the equivalent differential-equation model for the system:

d²y/dt² - ln(0.02) dy/dt + ln(0.2)ln(0.1)y = dx/dt + ln(20)x    (3.96)

where y(t) and x(t) are the continuous-time output and input of this system.
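The pole correspondence worked out in Equations (3.93)-(3.96) can be verified numerically. The sketch below (a hypothetical check, not part of the text) iterates the difference equation (3.93) and compares its impulse response against the sampled continuous response implied by H(s); the residues 2 and -1 follow from the partial fraction expansion in Equation (3.94), with T = 1 as in the text.

```python
import numpy as np

# Hypothetical numerical check of the impulse-invariance example of
# Equations (3.93)-(3.96); recursion coefficients and poles come from the text.

def h_discrete(n_max):
    """Impulse response of h(n) = 0.3 h(n-1) - 0.02 h(n-2) + delta(n)."""
    h = np.zeros(n_max)
    for n in range(n_max):
        h[n] = 1.0 if n == 0 else 0.0        # delta(n)
        if n >= 1:
            h[n] += 0.3 * h[n - 1]
        if n >= 2:
            h[n] -= 0.02 * h[n - 2]
    return h

def h_continuous(t):
    """Continuous impulse response implied by H(s): residues 2 and -1 at
    the poles p1 = ln(0.2) and p2 = ln(0.1), for T = 1."""
    return 2.0 * np.exp(np.log(0.2) * t) - np.exp(np.log(0.1) * t)

n = np.arange(20)
match = np.allclose(h_discrete(20), h_continuous(n))
```

With T = 1 the discrete poles 0.2 and 0.1 equal e^{p₁T} and e^{p₂T}, so the two responses coincide at the sampling points, which is exactly the "impulse invariance" requirement stated above.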
The developed "kernel invariance method" is the conceptual and mathematical extension to the nonlinear case of the presented "impulse invariance method," which offers a
practical approach for defining the equivalence between continuous and discrete nonlin-
ear parametric models in the time-invariant case [Zhao & Marmarelis, 1997].
According to the kernel invariance method, the equivalence between nonlinear differential and difference equation models is defined on the basis of the principle that the discrete Volterra kernels (corresponding to nonlinear difference equations) be the sampled versions of their continuous counterparts (corresponding to the equivalent nonlinear dif-
ferential equations). Implementation of this method requires derivation of the formal
mathematical relations between nonlinear differential/difference equations and continu-
ous/discrete Volterra models (kernels) that are summarized below. In the following derivations, the "tilde" denotes the discrete variables, functions, and operators.
For the nonlinear differential equation model of the form

L(D)y - f(x, y, Dx, Dy, ..., D^R x, D^Q y) = M(D)x    (3.97)


3.5 EQUIVALENCE BETWEEN CONTINUOUS AND DISCRETE PARAMETRIC MODELS 173

where D denotes the differential operator d(·)/dt, f(·) is a multivariate continuous function without linear terms, and L(D), M(D) are polynomials in D, with L of higher degree than max(R, Q) and M of lower degree than L. We seek an equivalent nonlinear difference equation model of the form
ence equation model of the form

L̃(Δ)ỹ - f̃(x̃, ỹ, Δx̃, Δỹ, ..., Δ^R x̃, Δ^Q ỹ) = M̃(Δ)x̃    (3.98)

where Δ denotes the delay/shift operator, Δx(n) = x(n - 1). The key requirement that defines this equivalence according to the kernel invariance method is

k̃_i(m₁, ..., m_i) = T^i k_i(m₁T, ..., m_iT)    (3.99)

where the scaling constant T^i is introduced to remove the dependence of the measured
discrete kernel values on the sampling interval T.
Although the concept of "kernel invariance" is rather clear and straightforward, its ac-
tual methodological implementation may be rather complex. This is demonstrated below
in the relatively simple case of quadratic (i.e., second-order) models.
We consider the class of nonlinear systems described by Equation (3.97), for nonlin-
earities of second degree:

f(x, Dx, ..., D^R x; y, Dy, ..., D^Q y) =

    Σ_{l₁=0}^{R} Σ_{l₂=0}^{R} a_{l₁,l₂} D^{l₁}x D^{l₂}x + Σ_{m₁=0}^{Q} Σ_{m₂=0}^{Q} b_{m₁,m₂} D^{m₁}y D^{m₂}y + Σ_{l=0}^{R} Σ_{m=0}^{Q} c_{l,m} D^{l}x D^{m}y    (3.100)

This class of nonlinear systems can be represented by a Volterra series expansion (if stable solutions exist) with kernels that can be analytically evaluated using the "generalized harmonic balance" method described in Section 3.2. The first two continuous Volterra kernels (in the Laplace domain) are given by the expressions

K₁(s) = M(s)/L(s)    (3.101)

K₂(s₁, s₂) = (1/2) Σ_{l₁=0}^{R} Σ_{l₂=0}^{R} a_{l₁,l₂} (s₁^{l₁}s₂^{l₂} + s₁^{l₂}s₂^{l₁}) / L(s₁ + s₂)

    + (1/2) Σ_{m₁=0}^{Q} Σ_{m₂=0}^{Q} b_{m₁,m₂} (s₁^{m₁}s₂^{m₂} + s₁^{m₂}s₂^{m₁}) K₁(s₁)K₁(s₂) / L(s₁ + s₂)

    + (1/2) Σ_{l=0}^{R} Σ_{m=0}^{Q} c_{l,m} (s₁^{l}s₂^{m}K₁(s₂) + s₁^{m}s₂^{l}K₁(s₁)) / L(s₁ + s₂)    (3.102)

The equivalent discrete first-order kernel K̃₁(z) in the z-domain can be obtained from K₁(s) using the impulse invariance method for linear systems, discussed above. Consequently, the linear part of the equivalent difference equation can be constructed from K̃₁(z).
To determine the nonlinear (quadratic) part of the equivalent difference equation, we can apply the kernel invariance method of second order. If we assume, for simplicity of derivations, that K₁(s) and K₂(s₁, s₂) have distinct poles, then K₂(s₁, s₂) can be represented in terms of the partial fraction expansion in the (s₁, s₂) domain [Crum & Heinen, 1974]:

K₂(s₁, s₂) = Σ_{k=1}^{N} (b₀,₀ + b₁,₀s₁ + b₀,₁s₂) / [(s₁ - α_k)(s₂ - β_k)(s₁ + s₂ - γ_k)]    (3.103)

where (α_k, β_k, γ_k) are N distinct triplets formed by the P distinct poles of K₁(s) with repetitions (i.e., N = P³). In order to avoid singular functions in the inverse Laplace transform of the second-order kernel, we have to assume that the order of s₁ and s₂ in the denominator of K₂(s₁, s₂) is higher than the order in the numerator. If the nonlinearity of the equivalent difference equation is

f̃(x̃, Δx̃, ..., Δ^R x̃; ỹ, Δỹ, ..., Δ^Q ỹ) =

    Σ_{l₁=0}^{R} Σ_{l₂=0}^{R} ã_{l₁,l₂} Δ^{l₁}x̃ Δ^{l₂}x̃ + Σ_{m₁=1}^{Q} Σ_{m₂=1}^{Q} b̃_{m₁,m₂} Δ^{m₁}ỹ Δ^{m₂}ỹ + Σ_{l=0}^{R} Σ_{m=1}^{Q} c̃_{l,m} Δ^{l}x̃ Δ^{m}ỹ    (3.104)

then the corresponding second-order discrete Volterra kernel in the (z₁, z₂) domain can be found by generalized harmonic balance to be

K̃₂(z₁, z₂) = (1/2) Σ_{l₁=0}^{R} Σ_{l₂=0}^{R} ã_{l₁,l₂} (z₁^{-l₁}z₂^{-l₂} + z₁^{-l₂}z₂^{-l₁}) / L̃(z₁⁻¹z₂⁻¹)

    + (1/2) Σ_{m₁=1}^{Q} Σ_{m₂=1}^{Q} b̃_{m₁,m₂} (z₁^{-m₁}z₂^{-m₂} + z₁^{-m₂}z₂^{-m₁}) K̃₁(z₁)K̃₁(z₂) / L̃(z₁⁻¹z₂⁻¹)

    + (1/2) Σ_{l=0}^{R} Σ_{m=1}^{Q} c̃_{l,m} (z₁^{-l}z₂^{-m}K̃₁(z₂) + z₁^{-m}z₂^{-l}K̃₁(z₁)) / L̃(z₁⁻¹z₂⁻¹)    (3.105)

According to the second-order kernel-invariance method, the equivalent K̃₂(z₁, z₂) ought to conform with the partial fraction expansion of K₂(s₁, s₂) given by Equation (3.103) and, therefore, by equating respective fractions we obtain
K̃₂(z₁, z₂) = T² Σ_{k=1}^{N} [(b₀,₀ + b₁,₀ + b₀,₁) e^{(α_k+β_k+γ_k)T} z₁⁻¹z₂⁻¹ + (γ_k - α_k - β_k)[b₁,₀(1 - e^{α_kT}z₁⁻¹) + b₀,₁(1 - e^{β_kT}z₂⁻¹)]] / [(γ_k - α_k - β_k)(1 - e^{α_kT}z₁⁻¹)(1 - e^{β_kT}z₂⁻¹)(1 - e^{γ_kT}z₁⁻¹z₂⁻¹)]    (3.106)

where the discrete model parameters are expressed in terms of the continuous model parameters.
Comparison of Equations (3.105) and (3.106) yields the "discrete" coefficients (ã_{l₁,l₂}, b̃_{m₁,m₂}, c̃_{l,m}) in terms of the pole triplets (α_k, β_k, γ_k) and the constants (b₀,₀, b₁,₀, b₀,₁) of the partial fraction expansion of K₂ shown in Equation (3.103) or, equivalently, in terms of the "continuous" coefficients (a_{l₁,l₂}, b_{m₁,m₂}, c_{l,m}) in Eq. (3.102). The relations between the two sets of coefficients define the equivalence between the quadratic parts of the nonlinear differential and difference equation models.

For more procedural details see Zhao and Marmarelis (1997). It is evident that this
procedure can be rather unwieldy in general, but represents a general and powerful ap-
proach that has not been utilized heretofore. To assist the reader in understanding the pro-
cedural steps, we provide an illustrative example below.

Example 3.4. Illustrative Example


Consider the differential equation model that has a single square nonlinearity:

D²y + 2a₁Dy + a₀y - εy² = bx    (3.107)

This model has the following equivalent first-order and second-order Volterra kernels in the s-domain:

K₁(s) = b / [(s - α)(s - β)]    (3.108)

K₂(s₁, s₂) = (ε/b) K₁(s₁)K₁(s₂)K₁(s₁ + s₂)    (3.109)

where α = -a₁ + (a₁² - a₀)^{1/2} and β = -a₁ - (a₁² - a₀)^{1/2} are the two poles of the model. Note that this model has a full Volterra series expansion (i.e., an infinite number of kernels), but here we limit ourselves to a second-order approximation (truncated model) on the assumption that |ε| ≪ 1. The partial fraction expansions for these two kernels are

K₁(s) = (b/d) [1/(s - α) - 1/(s - β)]    (3.110)

K₂(s₁, s₂) = (εb²/d³) [ 1/((s₁ - α)(s₂ - α)(s₁ + s₂ - α)) + 1/((s₁ - α)(s₂ - β)(s₁ + s₂ - β))

    + 1/((s₁ - β)(s₂ - α)(s₁ + s₂ - β)) + 1/((s₁ - β)(s₂ - β)(s₁ + s₂ - α))

    - 1/((s₁ - β)(s₂ - β)(s₁ + s₂ - β)) - 1/((s₁ - α)(s₂ - β)(s₁ + s₂ - α))

    - 1/((s₁ - β)(s₂ - α)(s₁ + s₂ - α)) - 1/((s₁ - α)(s₂ - α)(s₁ + s₂ - β)) ]    (3.111)

where d = (α - β). Therefore, based on first-order kernel invariance, the first-order kernel of the equivalent discrete model in the z-domain is

K̃₁(z) = M̃(z⁻¹)/L̃(z⁻¹) = bT(e^{αT} - e^{βT})z⁻¹ / [d(1 - e^{αT}z⁻¹)(1 - e^{βT}z⁻¹)]    (3.112)

and the linear part of the equivalent nonlinear difference equation is expressed as

y(n) - (e^{αT} + e^{βT})y(n - 1) + e^{(α+β)T}y(n - 2) = [bT(e^{αT} - e^{βT})/d] x(n - 1)    (3.113)
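As a sanity check (not part of the text), the linear part (3.113) can be iterated with a discrete impulse input and compared against the sampled continuous kernel T·h(nT), where h(t) = (b/d)(e^{αt} - e^{βt}) is the inverse transform of Equation (3.108). The parameter values a₀, a₁, b below are illustrative (they give two real poles), and T is an assumed sampling interval.

```python
import numpy as np

# Illustrative parameters (a0, a1 give two real poles); T is an assumed step.
a0, a1, b, T = 12.0, 3.5, 5.0, 0.05
alpha = -a1 + np.sqrt(a1**2 - a0)
beta = -a1 - np.sqrt(a1**2 - a0)
d = alpha - beta

ea, eb = np.exp(alpha * T), np.exp(beta * T)
gain = b * T * (ea - eb) / d                 # right-hand-side coefficient of (3.113)

N = 50
y = np.zeros(N)                              # response of (3.113) to x = delta
for n in range(1, N):
    y[n] = (ea + eb) * y[n - 1] + (gain if n == 1 else 0.0)
    if n >= 2:
        y[n] -= ea * eb * y[n - 2]           # e^{(alpha+beta)T} = ea * eb

k = np.arange(N)
k1_sampled = T * (b / d) * (np.exp(alpha * T * k) - np.exp(beta * T * k))
match = np.allclose(y, k1_sampled)
```

The impulse response of the difference equation and the T-scaled samples of the continuous first-order kernel coincide, as required by first-order kernel invariance.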

Based on second-order kernel invariance, the partial fraction expansion of the equivalent second-order discrete Volterra kernel in the (z₁, z₂) domain is obtained using the general Equation (3.106). After performing the calculations to merge the various partial fractions that correspond to Equation (3.111), we obtain the following expression for the second-order discrete Volterra kernel:

K̃₂(z₁, z₂) = (εb²T²/d³) × [φ₁ + φ₂(z₁⁻¹ + z₂⁻¹) + φ₃z₁⁻¹z₂⁻¹ + φ₄z₁⁻¹z₂⁻¹(z₁⁻¹ + z₂⁻¹) + φ₅z₁⁻²z₂⁻²] / [L̃(z₁⁻¹)L̃(z₂⁻¹)L̃(z₁⁻¹z₂⁻¹)]    (3.114)

where

L̃(z⁻¹) = (1 - e^{αT}z⁻¹)(1 - e^{βT}z⁻¹)    (3.115)

and the values of the parameters (φ₁, φ₂, φ₃, φ₄, φ₅) in terms of α, β, and T are given in Zhao and Marmarelis (1997).
Thus, the nonlinear difference equation model that has the same first-order and second-order Volterra kernels as the differential equation model of Equation (3.107) (in accordance with the kernel invariance method) is

y(n) - (e^{αT} + e^{βT})y(n - 1) + e^{(α+β)T}y(n - 2)

    - [2εb / (d²(e^{αT} - e^{βT})²)] [θ₂x(n - 1)y(n - 1) + θ₃x(n - 1)y(n - 2)]

    - [ε / (d(e^{αT} - e^{βT}))] [θ₄y²(n - 1) + 2θ₅y(n - 1)y(n - 2) + θ₆y²(n - 2)]

    = [bT(e^{αT} - e^{βT})/d] x(n - 1) + (εb²/d²)θ₁x²(n - 1)    (3.116)

where the parameters (θ₁, θ₂, θ₃, θ₄, θ₅, θ₆) are expressed in terms of α, β, and T, as detailed in Zhao and Marmarelis (1997). These expressions are rather unwieldy and are omitted here in the interest of space.
We observe that the nonlinear second-degree term in the differential equation (3.107) has given rise to several nonlinear second-degree terms in the equivalent difference equation (3.116); namely, all the square and cross-product terms between the lagged input-output values x(n - 1), y(n - 1), and y(n - 2) that participate in the equivalent linear (first-order) portion of the difference equation. Thus, although the continuous differential-equation model of Equation (3.107) has only one nonlinear term limited to the current value of y, the equivalent difference-equation model of Equation (3.116) has six nonlinear terms that involve the lagged values of the discrete input and output. These lagged values are also the ones present in the linear portion of the difference equation.
This result has important practical implications because it shows that, if one had obtained experimentally the discrete (NARMAX) model of the nonlinear difference equation (3.116) from sampled data, then the presence of nonlinear terms involving lagged input-output values should not necessarily be interpreted as nonlinearities in the derivative(s) of continuous input-output variables of the actual physical process governed by the differential equation (3.107). This example demonstrates that such terms may be simply the result of a nonlinearity in y undergoing discretization in the dynamic context of
the differential equation. Therefore, proper interpretation of empirically obtained nonlin-
ear difference equation (NARMAX) models of continuous nonlinear dynamic systems re-
quires the in-depth and rather complicated analysis presented in this section and elaborat-
ed in Zhao and Marmarelis (1997). It is the considerable complexity of this type of
analysis that has confounded the fruitful application of the NARMAX approach to date
and has inhibited practical parametric modeling in a nonlinear context.
In order to test the validity of the presented analytical results, the first-order and second-order Volterra kernels for the continuous and discrete models of this example can be evaluated for specific parameters of the differential and difference equations, and compared with the kernel estimates obtained from data resulting from simulation of the difference equation with a Gaussian white-noise input. Specifically, in Zhao and Marmarelis (1997), we evaluate analytically the kernels of the continuous model of Equation (3.107) for parameter values a₀ = 12, a₁ = 3.5, ε = 2, and b = 5 using the Laplace-domain expressions of Equations (3.110) and (3.111). Their discrete counterparts of the equivalent difference equation model of Equation (3.116) are evaluated using the corresponding analytical expressions (3.112) and (3.114). The results are shown to be identical for all practical purposes, in terms of the first-order and the second-order Volterra kernels. These results confirm the validity of the proposed "kernel invariance method" and corroborate the analytical derivations presented above.

3.5.1 Modular Representation


The material presented above makes it clear that derivation of continuous nonlinear para-
metric models from input-output data is a rather complicated task, unless the modular
model approach that was presented in the previous section is used. According to this mod-
ular approach, the discrete Volterra kernel estimates are used to find a set of H linear fil-
ters that form a complete basis for the system at hand. Each of these filters is converted
into an equivalent linear difference equation [see Equation (3.87)]. Then, each difference
equation is converted into an equivalent differential equation using the impulse-invari-
ance method.
The outputs u_j(t) of these continuous filters (which can be viewed as "internal" or "state" variables) are fed into the multi-input static nonlinearity f(·) of the modular model to produce the continuous system output y(t). Therefore, the equivalent continuous parametric model takes the form

A_j(D)u_j = B_j(D)x    (for j = 1, ..., H)    (3.117)

y = f(u₁, ..., u_H)    (3.118)

where A_j(D) and B_j(D) are the obtained differential operators for the jth filter [i.e., polynomials in D = d(·)/dt] using the impulse invariance method on Equation (3.87). It is evident that this continuous modular model separates the dynamics from the nonlinearities and can also be viewed as a parametric model when the static nonlinearity f(·) of Qth degree is parameterized. The dynamics of this model are contained only in the H "state equations" (3.117), and the nonlinearities are contained only in the static nonlinearity of the "output equation" (3.118). This modular/parametric model form facilitates the study of stability and control for the subject system. Note that the jth "state equation" is of order P_j and can be written as a system of P_j first-order differential equations to conform with the conventional formulation of state space in control theory, if this is deemed necessary or appropriate.
In spite of the many advantages of this modular/parametric model form, the potential complexity of the output nonlinearity (due to high dimensionality H and/or nonlinear order Q) remains a practical problem. The key requirement for obtaining this modular/parametric model in a practical context is the parsimonious representation of the kernels in terms of an efficient expansion and its estimation from input-output data. This relates to the fundamental issue of "principal dynamic modes" discussed in Section 4.1.1 as the most efficient representation of the system dynamics.
4
Modular and Connectionist Modeling

In this chapter, we review the other two modeling approaches (modular and connection-
ist) that supplement the traditional parametric and nonparametric approaches discussed in
the previous two chapters. The potential utility of these approaches depends on the char-
acteristics of each specific application (as discussed in Section 1.4). Generally, the con-
nectionist approach offers methodological advantages in a synergistic context, especially
in conjunction with the nonparametric modeling approach as discussed in Sections 4.2–4.4. The primary motivation for connectionist (network-structured) modeling is the
desire for greater efficiency in extracting the causal relationships between input and out-
put data without the benefit of prior knowledge (as discussed in Sections 4.2 and 4.3). On
the other hand, the primary motivation for modular (block-structured) modeling is the de-
sire to compact the estimated nonparametric models and facilitate their physiological in-
terpretation (as discussed in Sections 4.1 and 4.4).
The equivalence among these model forms is established through detailed mathematical analysis in order to allow their synergistic use, since each approach exhibits its own blend of advantages and disadvantages vis-à-vis the specific requirements of a given application.
Some modular forms that have been proven useful in practice and derive from non-
parametric models are discussed in Section 4.1. The connectionist modeling approach and
its relation to nonparametric (Volterra) models is discussed in Section 4.2, and Section
4.3 presents its most successful implementation to date (the Laguerre-Volterra network).
The chapter concludes with a general model form that is advocated as the ultimate tool for
open-loop physiological system modeling from input-output data (the VWM model).

4.1 MODULAR FORM OF NONPARAMETRIC MODELS

In this section, we present a general modular form of nonparametric (Volterra) models


employing the concept of "principal dynamic modes," as well as specific modular forms

Nonlinear Dynamic Modeling of Physiological Systems. By Vasilis Z. Marmarelis 179

ISBN 0-471-46960-2 © 2004 by the Institute of Electrical and Electronics Engineers.
180 MODULAR AND CONNECTIONIST MODELING

(cascades and feedback) derived from nonparametric models that have been found useful
in actual applications as they assist in physiological interpretation of the obtained models.
We begin with the general modular form that is equivalent to a Volterra model and employs the concept of "principal dynamic modes" in order to facilitate model interpretation.

4.1.1 Principal Dynamic Modes


The general modular form that is equivalent to a Volterra model derives from the block-structured model of Figure 2.16, which results from the modified discrete Volterra model of Equation (2.180). The discrete impulse response functions {b_j(m)} of the filter bank constitute a complete basis for the functional space of the system kernels and are selected a priori as a general "coordinate system" for kernel representation. However, this is not generally the most efficient representation of the system in terms of parsimony.
The pursuit of parsimony raises the important practical issue of finding the "minimum set" of linear filters {p_j(m)} that yield an adequate approximation of the system output and can be used in connection with the modular form of Figure 2.16 as an efficient/parsimonious model of the system. This "minimum set" is termed the "principal dynamic modes" (PDMs) of the system and defines a parsimonious modular model (in conjunction with its respective static nonlinearity) that is equivalent to the discrete Volterra model and shown schematically in Figure 4.1 [Marmarelis, 1997].
It is evident that the reduction in dimensionality of the functional space defined by the filter bank requires a criterion regarding the adequacy of the PDM model prediction for the given ensemble of input-output data. This criterion must also take into account the possible presence of noise in the data (see Section 2.3.1).
The general approach to this question can be based on the matrix formulation of the
modified discrete Volterra (MDV) model of Equation (2.180) that is given in Equation
(2.183). For the correct model order R, the MDV model is

y = V_R c_R + H_R w₀    (4.1)

where c_R is the true vector of Volterra kernel expansion coefficients for the general basis employed in the filter bank, V_R is the input-dependent matrix of model order R, H_R is the "projection" matrix defined in Equation (2.190), and w₀ is the output-additive noise vector.


Figure 4.1 Structure of the PDM model composed of a filter bank of H filters (PDMs) that span the dynamics of the system and whose outputs {u_j} feed into an H-input static nonlinearity f(u₁, ..., u_H) to generate the output of the model.

We seek a parsimonious representation in terms of the size P_R of the coefficient vector c_R, so that the sum of the squared residuals of the model output prediction remains below an acceptable level δ₀. Let us use the following notation for the "parsimonious" model:

y = Ψ_s γ_s + ζ_s    (4.2)

where the input-dependent matrix Ψ_s is composed of the outputs of the H PDMs in the MDV model formulation (H < L), γ_s is the vector of the respective expansion coefficients that can reconstruct the Volterra kernel estimates using the PDMs {p_i(m)} (i = 1, ..., H), and ζ_s is the resulting residual vector whose Euclidean norm cannot exceed δ₀.
The task is to find a [P_R × P_s] matrix Γ that transforms V_R into Ψ_s:

V_R Γ = Ψ_s    (4.3)

where P_R = (Q + L)!/(Q!L!) and P_s = (Q + H)!/(Q!H!), so that the resulting prediction error remains below the acceptable level δ₀. Note that the residual vector ζ_s generally contains a deterministic input-dependent component that results from the reduction in dimensionality of the filter bank and a stochastic component that is due to the output-additive noise but is different (in general) from H_R w₀.
The transformation of Equation (4.3) gives rise to a different "projection" matrix for
the PDM model:

G_s = I - Ψ_s[Ψ_s′Ψ_s]⁻¹Ψ_s′
    = I - V_RΓ[Γ′V_R′V_RΓ]⁻¹Γ′V_R′    (4.4)

which defines the deterministic error in the output prediction as G_sV_Rc_R, and the stochastic component of ζ_s as G_sw₀. Therefore, the prediction error for the PDM model is

ζ_s = G_sV_Rc_R + G_sw₀    (4.5)

which implies that the mean of the squared Euclidean norm of this prediction error is

E[ζ_s′ζ_s] = c_R′V_R′G_s′G_sV_Rc_R + σ₀² · Tr{G_s′G_s}    (4.6)

where σ₀² is the variance of the output-additive noise (assumed white for simplification) and Tr{·} denotes the trace of the subject matrix. Thus, the task becomes one of finding a matrix Γ [which determines the matrix G_s from Equation (4.4)] so that the right-hand side of Equation (4.6) is no more than δ₀ for a given system (c_R), input data (V_R, H_R), and output-additive noise variance σ₀².
It is evident from Equations (4.4) and (4.6) that the general solution to this problem is rather complicated; however, an approximate solution can be obtained with equivalent model networks (see Section 4.2.2), and a practical solution has been proposed for second-order systems [Marmarelis & Orme, 1993; Marmarelis, 1994a, 1997], which is discussed below.
For a second-order MDV model, the output signal can be expressed as a quadratic
form:

y(n) = v′(n)Cv(n)    (4.7)



where v(n) is the augmented vector of the filter-bank outputs {v_j(n)} at each discrete time n:

v′(n) = [1 v₁(n) v₂(n) ... v_L(n)]    (4.8)

and C is the symmetric coefficient matrix:

C = [ c₀          (1/2)c₁(1)  (1/2)c₁(2)  ...  (1/2)c₁(L)
      (1/2)c₁(1)  c₂(1, 1)    c₂(1, 2)    ...  c₂(1, L)
      (1/2)c₁(2)  c₂(2, 1)    c₂(2, 2)    ...  c₂(2, L)
      ...
      (1/2)c₁(L)  c₂(L, 1)    c₂(L, 2)    ...  c₂(L, L) ]    (4.9)

Because of the symmetry of the square matrix C, there always exists an orthonormal (unitary) matrix R (composed of the eigenvectors of C) such that

C = R′ΛR    (4.10)

where Λ is the diagonal matrix of the eigenvalues (i.e., a diagonal matrix with the (L + 1) distinct eigenvalues {λ_i} of matrix C as diagonal elements). Therefore,

y(n) = v′(n)R′ΛRv(n) = u′(n)Λu(n) = Σ_{i=0}^{L} λ_i u_i²(n)    (4.11)

where

u(n) = Rv(n) (4.12)

Therefore, each component u_i(n) is the inner product between the ith eigenvector μ_i of matrix C (i.e., the ith row of matrix R) and the vector v(n), which is defined by the outputs of the filter bank at each discrete time n. The first element, μ_{i,0}, of each eigenvector μ_i corresponds to the constant that is the first element of the vector v(n). The other eigenvector elements define the transformed filter-bank outputs:

u_i(n) = μ_{i,0} + Σ_{j=1}^{L} μ_{i,j} v_j(n)    (4.13)
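The diagonalization in Equations (4.7)-(4.13) is easy to check numerically. The sketch below is an illustration with a random symmetric matrix, not the book's data; note that NumPy's eigh returns the eigenvectors as columns of V, so the book's R corresponds to V′.

```python
import numpy as np

rng = np.random.default_rng(0)
L = 4                                       # number of filter-bank outputs (illustrative)
C = rng.standard_normal((L + 1, L + 1))
C = 0.5 * (C + C.T)                         # symmetric coefficient matrix, Eq. (4.9)

lam, V = np.linalg.eigh(C)                  # C = V diag(lam) V'; the book's R is V'
v = np.concatenate(([1.0], rng.standard_normal(L)))   # augmented vector, Eq. (4.8)

y_quadratic = v @ C @ v                     # Eq. (4.7): quadratic-form output
u = V.T @ v                                 # Eq. (4.12): u = R v
y_modes = np.sum(lam * u**2)                # Eq. (4.11): sum of lambda_i * u_i^2
```

The two output values agree exactly, confirming that the quadratic form decomposes into a sum of λ_i u_i²(n) terms.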

Because the eigenvectors have unity Euclidean norm, the eigenvalues {λ_i} in Equation (4.11) quantify the relative contribution of the components u_i²(n) to the model output. Thus, inspection of the relative magnitude of the ordered eigenvalues (by absolute value) allows us to determine which components u_i²(n) in Equation (4.11) make significant contributions to the output y(n).
Naturally, the selection of the "significant" eigenvalues calls for a criterion, such as |λ_i| being greater than a set percentage (e.g., 5%) of the maximum absolute eigenvalue |λ₀|. Having made the selection of H "significant" eigenvalues, the model output signal is approximated as

y(n) = Σ_{i=0}^{H-1} λ_i u_i²(n)

     = Σ_{i=0}^{H-1} λ_i [μ_{i,0} + Σ_{m=0}^{M-1} p_i(m)x(n - m)]²

     = Σ_{i=0}^{H-1} λ_i {μ_{i,0}² + Σ_{m=0}^{M-1} 2μ_{i,0}p_i(m)x(n - m) + Σ_{m₁=0}^{M-1} Σ_{m₂=0}^{M-1} p_i(m₁)p_i(m₂)x(n - m₁)x(n - m₂)}    (4.14)

where
p_i(m) = Σ_{j=1}^{L} μ_{i,j} b_j(m)    (4.15)

is the ith "principal dynamic mode" (PDM) of the system. It is evident that the selection of H PDMs compacts the MDV model representation by reducing the number of filter-bank outputs and, consequently, the dimensionality of the static nonlinearity that generates the model output. The resulting reduction in the number of free parameters for the second-order MDV model is (L - H)(L + H + 3)/2. This reduction in the number of free parameters has multiple computational and methodological benefits that become even more dramatic for higher-order models (provided that the PDMs can be determined for higher-order models). As a matter of practice, the PDMs thus determined by the second-order model can be used for higher-order models as well, on the premise that the PDMs of a system are reflected in all kernels. However, this may not always be true, and the issue of determining the PDMs of a higher-order model is addressed more properly in the context of equivalent network models, as discussed in Section 4.2.
The resulting Volterra kernels of the PDM model are

k₀ = Σ_{i=0}^{H-1} λ_i μ_{i,0}²    (4.16)

k₁(m) = Σ_{i=0}^{H-1} 2λ_i μ_{i,0} p_i(m) = Σ_{i=0}^{H-1} Σ_{j=1}^{L} 2λ_i μ_{i,0} μ_{i,j} b_j(m)    (4.17)

k₂(m₁, m₂) = Σ_{i=0}^{H-1} λ_i p_i(m₁)p_i(m₂) = Σ_{i=0}^{H-1} Σ_{j₁=1}^{L} Σ_{j₂=1}^{L} λ_i μ_{i,j₁} μ_{i,j₂} b_{j₁}(m₁) b_{j₂}(m₂)    (4.18)

which indicates the effect of the elimination of the "insignificant" eigenvalues by the PDM analysis on the original expansion coefficients of the Volterra kernels. Specifically, the kernel expansion coefficients after the PDM analysis are

c₁(j) = Σ_{i=0}^{H-1} 2λ_i μ_{i,0} μ_{i,j}    (4.19)

c₂(j₁, j₂) = Σ_{i=0}^{H-1} λ_i μ_{i,j₁} μ_{i,j₂}    (4.20)
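A small synthetic illustration (not from the text) of Equations (4.16)-(4.20): when the coefficient matrix C has two dominant eigenvalues, retaining only those H = 2 modes reconstructs C, and hence the reduced expansion coefficients, with small relative error.

```python
import numpy as np

rng = np.random.default_rng(1)
L = 6
Q, _ = np.linalg.qr(rng.standard_normal((L + 1, L + 1)))          # orthonormal eigenvectors
lam_true = np.array([5.0, -4.0, 0.05, 0.03, -0.02, 0.01, 0.005])  # two dominant modes
C = Q @ np.diag(lam_true) @ Q.T

w, V = np.linalg.eigh(C)
order = np.argsort(-np.abs(w))              # rank eigenvalues by |lambda_i|
keep = order[:2]                            # the H = 2 "significant" ones

# Reduced coefficient matrix: sum of lambda_i mu_i mu_i' over the kept modes;
# its entries are exactly the reduced k0, c1/2, c2 of Eqs. (4.16)-(4.20).
C_pdm = sum(w[i] * np.outer(V[:, i], V[:, i]) for i in keep)
rel_err = np.linalg.norm(C - C_pdm) / np.linalg.norm(C)
```

The relative (Frobenius-norm) error equals the ratio of the discarded to the total eigenvalue energy, which is about 1% here: the penalty paid for the (L - H)(L + H + 3)/2 fewer free parameters.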

Note that another criterion for selecting the PDMs (i.e., the "significant" eigenvalues) can be based on the mean-square value of the model output for a GWN input, after the zero-order Volterra kernel is subtracted (this represents a measure of output signal power for the broadest possible ensemble of input epochs). Implementation of this criterion requires an analytical expression of this mean-square value θ in terms of the expansion coefficients for the orthonormal basis {b_j(m)} for any H from 1 to L. This is found to be

θ(H, P) ≜ E[(y(n) - k₀)²] = P Σ_{j=1}^{L} c₁²(j) + 2P² Σ_{j₁=1}^{L} Σ_{j₂=1}^{L} c₂²(j₁, j₂) + {P Σ_{j=1}^{L} c₂(j, j)}²    (4.21)

where P is the power level of the GWN input, and c₁ and c₂ depend on H as indicated by Equations (4.19) and (4.20), respectively. The quantity θ(H, P) is evaluated for H = 1, ..., L, and the ratio θ(H, P)/θ(L, P) (that is between 0 and 1) is compared to a critical threshold value that is slightly less than unity (e.g., 0.95), representing the percentage of output signal power captured by the H PDMs for the input power level of interest. The minimum H for which the ratio exceeds the threshold value determines the number of PDMs.
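The selection rule of Equation (4.21) can be sketched as follows. The eigenvalues and vectors below are hypothetical (unit-norm random vectors standing in for the eigenvectors, pre-ordered by decreasing |λ_i|), so the routine only illustrates the mechanics of the criterion.

```python
import numpy as np

def theta(c1, c2, P):
    """Output-power measure of Eq. (4.21) for a GWN input of power level P."""
    return (P * np.sum(c1**2)
            + 2.0 * P**2 * np.sum(c2**2)
            + (P * np.trace(c2))**2)

def coeffs_from_modes(lam, mu, H):
    """Eqs. (4.19)-(4.20) using the H leading eigenvalue/eigenvector pairs."""
    c1 = sum(2.0 * lam[i] * mu[i][0] * mu[i][1:] for i in range(H))
    c2 = sum(lam[i] * np.outer(mu[i][1:], mu[i][1:]) for i in range(H))
    return c1, c2

rng = np.random.default_rng(2)
L, P = 5, 1.0
lam = np.array([3.0, -1.0, 0.05, 0.02, -0.01, 0.005])    # ordered by |lambda_i|
mu = [m / np.linalg.norm(m) for m in rng.standard_normal((L + 1, L + 1))]

full = theta(*coeffs_from_modes(lam, mu, L + 1), P)
ratios = [theta(*coeffs_from_modes(lam, mu, H), P) / full for H in range(1, L + 2)]
# The smallest H whose ratio exceeds the threshold sets the number of PDMs.
H_selected = next(H for H, r in zip(range(1, L + 2), ratios) if r > 0.95)
```

Using the full mode set reproduces the full output power exactly (ratio = 1), so a qualifying H always exists; with dominant leading eigenvalues the selected H is correspondingly small.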
The presented PDM analysis can be performed directly in the discrete-time representation of the kernels without employing any expansion on a discrete-time basis [Marmarelis, 1997]. However, the use of a general orthonormal filter bank (i.e., a complete basis of expansion) usually improves the computational efficiency of this task.
Note that a similar type of analysis can be performed through eigen-decomposition of the second-order kernel alone and selection of its "significant" eigenvalues that determine the PDMs in the form of the respective eigenvectors [Westwick & Kearney, 1992, 1994]. The first-order kernel is then included as another (separate) PDM in this approach, unless it can be represented as a linear combination of the PDMs selected from the eigen-decomposition of the second-order kernel.
This approach can be extended to higher-order kernels, whereby the eigen-decomposition is replaced by singular-value decomposition of rectangular matrices, properly constructed to represent the contribution of the respective discrete Volterra functional to the output of the model. The column vectors that correspond to the "significant" singular values are the selected PDMs for each order of kernel.
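A minimal sketch of the kernel-decomposition route just described, with two assumed modes rather than the book's kernels: a second-order kernel of rank two is eigen-decomposed, and the eigenvectors passing a 5% significance threshold reconstruct it completely.

```python
import numpy as np

M = 40
m = np.arange(M)
p1 = np.exp(-m / 5.0) * np.sin(m / 3.0)     # two assumed underlying modes
p2 = np.exp(-m / 8.0)
K2 = 2.0 * np.outer(p1, p1) - 0.7 * np.outer(p2, p2)   # rank-2 symmetric kernel

w, V = np.linalg.eigh(K2)
order = np.argsort(-np.abs(w))
significant = [i for i in order if abs(w[i]) > 0.05 * abs(w[order[0]])]
pdms = V[:, significant]                    # recovered PDMs
K2_rec = sum(w[i] * np.outer(V[:, i], V[:, i]) for i in significant)
```

Because p1 and p2 are not orthogonal, the recovered eigenvectors are linear combinations of them rather than the modes themselves, but they span the same two-dimensional mode space, which is what the PDM representation requires.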

The fundamental tradeoff in the PDM modeling approach regards the compactness
versus the accuracy of the model as the number of PDMs changes. This issue can be ad-
dressed in a rigorous mathematical framework by considering the following matrix-vector
representation of the input-output relation:

y(n) = y₀ + x₁′(n)k₁ + x₁′(n)K₂x₁(n) + x₁′(n)K₃x₂(n) + ... + x₁′(n)K_R x_{R-1}(n)    (4.22)

where x₁′(n) = [x(n) x(n - 1) ... x(n - M + 1)] is the vector of input epoch values affecting the output at discrete time n, k₁′ = [k₁(0) k₁(1) ... k₁(M - 1)] is the vector of first-order kernel values, K₂ is a symmetric matrix defined by the values of the second-order Volterra kernel, and, generally, the vector x_{r-1}(n) and the matrix K_r are constructed so that they represent the contribution of the rth-order Volterra functional term. For instance, if x₁′(n) = [x(n) x(n - 1)], then

x~(n)K3x2(n) = [x(n) x(n - 1)][ °


k3(0, 0, 0) 3k3(0, 0, 1) ° J
x2(n) ]
x(n)x(n _ 1) (4.23)
3kiO, 1, 1) k3(l , 1, 1) [ x 2(n _ 1)

Following this formulation, we may pose the mathematical question of how to reduce the rank of the matrices K₂, K₃, ..., K_R for a certain threshold of significant "singular values" when a canonical input (such as GWN) is used. Then, singular-value decomposition (SVD) of these matrices:

K_R = U_R′S_RW_R    (4.24)

where S_R is the diagonal matrix of singular values, can yield the singular column vectors of matrix W_R corresponding to the "significant" singular values that are greater than the specified threshold (rank reduction). All these "singular vectors" thus selected from all matrices K₂ through K_R (and the vector k₁) can form a new rectangular matrix that can be subjected again to rank reduction through SVD in order to arrive at the final PDMs of the system. To account for the relative importance of different nonlinear orders, we can weigh each singular vector by the square root of the product of the respective singular value with the input power level. This process represents a rigorous way of determining the PDMs of a system (or the structural parameter H in Volterra-equivalent network models) but requires knowledge of the Volterra kernels of the system, which is impractical. It is presented here only as a general mathematical framework to assist the reader in understanding the meaning and the role of the PDMs in Volterra-type modeling.
A more practical approach to the problem of selecting the PDMs is to use the estimated
coefficients from the direct inversion of the MDV model estimation to form a rectangular
matrix C that comprises all the column vectors of all estimated kernel expansions
[i.e., the first-order kernel expansion has one column vector, the second-order kernel
expansion has L column vectors, the third-order kernel expansion has L(L + 1)/2 column
vectors, etc.]. Then application of SVD on the matrix C yields the PDMs of the system as
the singular vectors corresponding to the most significant singular values. To account for
the different effect of input power on the various orders of nonlinearity, we may weigh
the column vectors of the rth-order kernel by the rth power of the root-mean-square value
of the de-meaned input signal.
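This coefficient-based procedure can be sketched in a few lines of numerical code. The function and names below (extract_pdms, coeff_blocks, and the toy rank-2 coefficient structure) are hypothetical illustrations, not from the text: the sketch stacks the weighted expansion-coefficient columns into the rectangular matrix C, applies SVD, and keeps the singular vectors above a relative threshold.

```python
import numpy as np

def extract_pdms(coeff_blocks, sigma, threshold=0.1):
    """Sketch of PDM selection: SVD of the stacked kernel-expansion
    coefficient vectors, with the rth-order columns weighted by sigma**r
    (sigma = RMS of the de-meaned input), followed by rank reduction."""
    weighted = [sigma**r * block for r, block in enumerate(coeff_blocks, start=1)]
    C = np.hstack(weighted)                       # rectangular matrix of all columns
    U, s, _ = np.linalg.svd(C, full_matrices=False)
    keep = s > threshold * s[0]                   # "significant" singular values
    return U[:, keep]                             # PDMs (orthonormal columns)

# toy example: a rank-2 structure hidden in 1st- and 2nd-order coefficients
L = 8
p1, p2 = np.eye(L)[:, 0], np.eye(L)[:, 1]
c1 = (p1 + p2)[:, None]                           # first-order kernel: one column
c2 = np.outer(p1, p2) + np.outer(p2, p1)          # second-order kernel: L columns
pdms = extract_pdms([c1, c2], sigma=1.0)
print(pdms.shape)                                 # (8, 2): two PDMs recovered
```

Although the underlying kernels involve two modes mixed across orders, the rank reduction recovers exactly two orthonormal PDMs spanning them.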
186 MODULAR AND CONNECTIONIST MODELING

Illustrative Examples. Two simulated examples are given below to illustrate the PDM
analysis using a second-order and a fourth-order Volterra system. Examples of PDM
analysis using real data from physiological systems are given in Chapter 6.
In the first example, a second-order system with two PDMs is described by the output
equation:

$$ y(n) = 1 + v_1(n) + v_2(n) + v_1(n) v_2(n) \tag{4.25} $$

where y(n) is the system output and (v_1, v_2) are the convolutions of the system input
x(n), a 1,024-point GWN signal in this simulation, with the two impulse response functions,
g_1 and g_2, respectively, shown in Figure 4.2. The first-order and second-order kernels
of this system are shown in Figure 4.3 and can be precisely estimated from these data
via the Laguerre expansion technique (LET) (see Section 2.3.2). As anticipated by theory,
these kernels can be expressed as (note that k_0 = 1)

$$ k_1(m) = g_1(m) + g_2(m) \tag{4.26} $$

$$ k_2(m_1, m_2) = \frac{1}{2} [g_1(m_1) g_2(m_2) + g_1(m_2) g_2(m_1)] \tag{4.27} $$

Equation (4.25) can be viewed as the static nonlinearity of a system with two PDMs g_1
and g_2 (and their corresponding outputs v_1 and v_2) having a bilinear cross-term v_1 v_2.
This nonlinearity can also be expressed without cross-terms using the "decoupled"


Figure 4.2 Impulse response functions of the two filters {g1, g2} used in the simulation example
[Marmarelis, 1997].
4.1 MODULAR FORM OF NONPARAMETRIC MODELS 187

(a) [First-order kernel plot, time lag 0 to 25.] (b) [Second-order kernel surface; X: 0 to 25, Y: 0 to 25, Z: -0.04543 to 0.2165.]

Figure 4.3 First-order (a) and second-order (b) kernels of the simulated system described by Equation (4.25) [Marmarelis, 1997].
188 MODULAR AND CONNECTIONIST MODELING

PDMs: (g_1 + g_2) and (g_1 - g_2), and their corresponding outputs u_1 = (v_1 + v_2) and u_2 =
(v_1 - v_2), with offsets μ_{1,0} = 2 and μ_{2,0} = 0, respectively [see Equation (4.13)] as

$$ y = \frac{1}{4}(u_1 + 2)^2 - \frac{1}{4} u_2^2 = 1 + u_1 + \frac{1}{4} u_1^2 - \frac{1}{4} u_2^2 \tag{4.28} $$

Note that if we apply the convention of normalizing the PDMs to unity Euclidean norm,
the resulting normalized PDMs are p_1 = 0.60(g_1 + g_2) and p_2 = 0.92(g_1 - g_2) in this case,
and have an associated nonlinearity:

$$ y = 1 + 1.67 u_1 + 0.69 u_1^2 - 0.30 u_2^2 \tag{4.29} $$
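The algebraic equivalence between the cross-term form of Equation (4.25) and the decoupled form of Equation (4.28) is easy to check numerically. The short sketch below, with arbitrary random values standing in for the PDM outputs, is only a sanity check of the identity:

```python
import numpy as np

rng = np.random.default_rng(1)
v1, v2 = rng.standard_normal(1000), rng.standard_normal(1000)

y_cross = 1 + v1 + v2 + v1 * v2                   # Equation (4.25), with cross-term
u1, u2 = v1 + v2, v1 - v2                         # decoupled PDM outputs
y_decoupled = 0.25 * (u1 + 2)**2 - 0.25 * u2**2   # Equation (4.28), no cross-term

print(np.allclose(y_cross, y_decoupled))          # True
```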

Application of the eigendecomposition (ED) approach based on LET kernel estimates
(outlined above) yields the two PDMs shown in Figure 4.4. Note that the first PDM (solid
line) has the same form as the first-order kernel in this case. As indicated previously,
these two PDMs are the normalized sum and difference of g_1 and g_2 of Figure 4.2. The
estimated nonlinear function (through least-squares regression) for these PDM estimates is
precisely the one given by Equation (4.29) and has no cross-terms (separable).
The presence of contaminating noise may introduce estimation errors (random fluctuations)
in the obtained PDMs. The point is illustrated by adding independent GWN to the
output signal for an SNR of 6 dB and, subsequently, estimating the PDMs using LET-ED,
as shown in Figure 4.5. We observe excellent noise resistance of the LET-ED ap-


Figure 4.4 Two PDMs obtained via eigendecomposition (ED) using the first two LET kernel estimates. They correspond to the normalized functions p1 = 0.60(g1 + g2) and p2 = 0.92(g1 - g2) associated with the nonlinearity of Equation (4.29) [Marmarelis, 1997].


Figure 4.5 Estimated PDMs from noisy data (SNR = 6 dB) using LET-ED [Marmarelis, 1997].

proach, especially for the first PDM (solid line) that corresponds to the highest eigenvalue.
The obtained output nonlinearity associated with the normalized PDMs estimated via
LET-ED is y = 1.02 + 1.66 u_1 + 0.70 u_1^2 - 0.32 u_2^2, which is very close to the exact
output nonlinearity of Equation (4.29). Although these estimates are somewhat affected by
the noise in the data, the estimation accuracy of the resulting nonlinear model in the presence
of considerable noise (SNR = 6 dB) demonstrates the robustness of the proposed approach.
Next, we examine the efficacy of this approach for high-order systems by considering
the fourth-order system described by the output nonlinearity:

$$ y = v_1 + v_2 + v_1 v_2 - \frac{1}{3} v_1^3 + \frac{1}{4} v_2^4 \tag{4.30} $$

where (v_1, v_2) are as defined previously. This example serves the purpose of demonstrating
the performance of the proposed method in a high-order case where it is not feasible
to estimate all of the system kernels in a conventional manner, whereas the proposed
method may yield a complete model (i.e., one containing all nonlinear terms present in
the system).
Note that the quadratic part of Equation (4.30) is identical to the output nonlinearity
of the previous example given by Equation (4.25). The addition of the third-order and
fourth-order terms will introduce some bias into the first-order and second-order kernel
estimates obtained for the truncated second-order model. This bias is likely to prevent
precise estimation of the previous PDMs (g_1 + g_2) and (g_1 - g_2) via LET-ED based on
the first two kernel estimates. Furthermore, more than two PDMs will be required for a
model of this system without cross-terms in the output nonlinearity. For instance, the

use of three PDMs corresponding to g_1 and g_2 and a linear combination (g_1 + a g_2)
should be adequate, because g_1 and g_2 give rise to v_1 and v_2, respectively, and the cross-
term (v_1 v_2) can be expressed in terms of these three PDMs as [(v_1 + a v_2)^2 - v_1^2 -
a^2 v_2^2]/(2a). Thus, three separate polynomial functions corresponding to these three PDMs
can be found to fully represent the system output in a separable form. The LET-ED
analysis based on the first two kernel estimates obtained from the simulated data still
yields two PDMs corresponding to two significant eigenvalues λ_1 = 2.38 and λ_2 = -0.65
(with the subsequent eigenvalues being λ_3 = 0.14, λ_4 = -0.12, etc.). Note that the sign
of each eigenvalue signifies how the respective PDM output contributes to the system
output (excitatory or inhibitory).
The PDMs obtained via LET-ED for λ_1 and λ_2 are shown in Figure 4.6, and resemble
the PDMs of the previous example, although considerable distortion (estimation bias) is
evident due to the aforementioned influence of the high-order nonlinearities. These two
PDMs correspond to a bivariate output nonlinearity that only yields an approximation of
the system output. This distortion can be removed by increasing the order of the estimated
nonlinearities, which can be done by means of the equivalent network models discussed in
Section 4.2. In this simulated example, the use of a fourth-order model yields the three PDMs
shown in Figure 4.7 that correspond precisely to g_1, g_2, and a linear combination of g_1 and
g_2, as discussed above [Marmarelis, 1997].
These results are extendable to systems of arbitrary order of nonlinearity with multiple
PDMs, endowing this approach with considerable power in modeling highly nonlinear
systems. An illustrative example of an infinite-order Volterra system is given
for a simulated system described by the output nonlinearity:


Figure 4.6 The two estimated PDMs for the fourth-order system described by Equation (4.30) using
the LET-ED method, corresponding to two significant eigenvalues: λ1 = 2.38 and λ2 = -0.65. Considerable
distortion relative to the previous two PDMs is evident, due to the influence of the higher-order
terms (third and fourth order) [Marmarelis, 1997].


Figure 4.7 The three estimated PDMs for the fourth-order system described by Equation (4.30) using
a fourth-order model. The obtained PDMs correspond to g1 and g2, and a linear combination of
g1 and g2, as anticipated by theory [Marmarelis, 1997].

$$ y = \exp(v_1) \, \sin[(v_1 + v_2)/2] \tag{4.31} $$

where v_1 and v_2 are as defined previously. This Volterra system has kernels of all orders,
with magnitudes that decline as the order increases. This becomes evident when the Taylor
series expansions of the exponential and trigonometric functions are used in Equation
(4.31). Application of the LET-ED method yields only two PDMs (i.e., only two significant
eigenvalues λ_1 = 1.86 and λ_2 = 1.09, with the remaining eigenvalues being at least
one order of magnitude smaller). The prediction of the fifth-order PDM model (based on
the two PDMs) is shown in Figure 4.8 along with the exact output of the infinite-order system
described by Equation (4.31). Note that the corresponding normalized mean-square error
(NMSE) of the model prediction is only 6.8% for the fifth-order PDM model, demonstrating
the potential of the PDM modeling approach for high-order Volterra systems.
In closing this section, we must emphasize that the use of the PDM modeling approach
is motivated not only by our desire to achieve an accurate model (of possibly high-order
Volterra systems) but also by the need to provide meaningful physiological interpretation
for the obtained nonlinear models, as demonstrated in Chapter 6.
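The adequacy of a low-order polynomial in only two PDM outputs for this infinite-order system can be checked numerically. The sketch below assumes the PDM outputs v1 and v2 are known (in practice they would be estimated from data) and fits a fifth-order bivariate polynomial by least squares; the signal length and standard deviations are illustrative choices, not from the text, so the resulting NMSE is not expected to match the 6.8% figure above.

```python
import numpy as np

rng = np.random.default_rng(2)
v1 = 0.5 * rng.standard_normal(2000)              # stand-ins for the PDM outputs
v2 = 0.5 * rng.standard_normal(2000)
y = np.exp(v1) * np.sin((v1 + v2) / 2)            # Equation (4.31), infinite order

# bivariate polynomial regressors up to total degree 5 in (v1, v2)
terms = [v1**i * v2**j for i in range(6) for j in range(6 - i)]
X = np.column_stack(terms)
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

nmse = np.mean((y - X @ coef)**2) / np.var(y)     # normalized mean-square error
print(f"NMSE of fifth-order bivariate polynomial fit: {nmse:.2e}")
```

For inputs of this moderate amplitude the fifth-order fit leaves only a small residual, consistent with the rapidly declining kernel magnitudes of this system.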

4.1.2 Volterra Models of System Cascades


The cascade of two Volterra systems A and B (shown in Figure 4.9) with Volterra kernels
{a_0, a_1, a_2, ...} and {b_0, b_1, b_2, ...}, respectively, has Volterra kernels that can be
expressed in terms of the Volterra kernels of the cascade components. The analytical
expressions can be derived by the following procedure, where the shorthand notation k_r ⊗
x^r is employed to denote the r-tuple convolution of the rth-order kernel k_r with the signal


Figure 4.8 Actual output of the infinite-order Volterra system (trace 1) and output prediction by a fifth-
order PDM model using two estimated PDMs (trace 2). The resulting NMSE of the output prediction is
only 6.8% [Marmarelis, 1997].

x(t). It is evident that the output of the A-B cascade system can be expressed by the
Volterra series expansion:

$$ y = b_0 + b_1 \otimes z + b_2 \otimes z^2 + \cdots \tag{4.32} $$

where z(t) is the output of the first cascade component A that can be expressed in terms of
the input x(t) as

$$ z = a_0 + a_1 \otimes x + a_2 \otimes x^2 + \cdots \tag{4.33} $$

By substitution of z from Equation (4.33) into Equation (4.32), we obtain the input-output
expression for the cascade system:

x → [ A ] → z → [ B ] → y
Figure 4.9 The cascade configuration of two Volterra systems, A and B.

$$ y = b_0 + b_1 \otimes [a_0 + a_1 \otimes x + a_2 \otimes x^2 + \cdots] + b_2 \otimes [a_0 + a_1 \otimes x + a_2 \otimes x^2 + \cdots]^2 + \cdots $$
$$ = [b_0 + b_1 \otimes a_0 + b_2 \otimes a_0^2 + \cdots] + [b_1 \otimes a_1 + b_2 \otimes (a_0 \otimes a_1 + a_1 \otimes a_0) + \cdots] \otimes x + [b_1 \otimes a_2 + b_2 \otimes (a_1^2 + a_0 \otimes a_2 + a_2 \otimes a_0) + \cdots] \otimes x^2 + \cdots \tag{4.34} $$

which directly determines the Volterra kernels {k_0, k_1, k_2, ...} of the cascade system by
equating functional terms of a given order. Thus,

$$ k_0 = b_0 + a_0 \int_0^\infty b_1(\lambda) \, d\lambda + a_0^2 \int_0^\infty \!\! \int_0^\infty b_2(\lambda_1, \lambda_2) \, d\lambda_1 d\lambda_2 + \cdots + a_0^r \int_0^\infty \!\! \cdots \!\! \int_0^\infty b_r(\lambda_1, \ldots, \lambda_r) \, d\lambda_1 \cdots d\lambda_r + \cdots \tag{4.35} $$

$$ k_1(\tau) = \int_0^\infty a_1(\tau - \lambda) u(\tau - \lambda) b_1(\lambda) \, d\lambda + 2 a_0 \int_0^\infty \!\! \int_0^\infty a_1(\tau - \lambda_1) u(\tau - \lambda_1) b_2(\lambda_1, \lambda_2) \, d\lambda_1 d\lambda_2 + \cdots + r a_0^{r-1} \int_0^\infty \!\! \cdots \!\! \int_0^\infty a_1(\tau - \lambda_1) u(\tau - \lambda_1) b_r(\lambda_1, \ldots, \lambda_r) \, d\lambda_1 \cdots d\lambda_r + \cdots \tag{4.36} $$

$$ k_2(\tau_1, \tau_2) = \int_0^\infty \!\! \int_0^\infty a_2(\tau_1 - \lambda_1, \tau_2 - \lambda_2) u(\tau_1 - \lambda_1) u(\tau_2 - \lambda_2) b_1(\lambda_1) \delta(\lambda_1 - \lambda_2) \, d\lambda_1 d\lambda_2 + \int_0^\infty \!\! \int_0^\infty a_1(\tau_1 - \lambda_1) a_1(\tau_2 - \lambda_2) u(\tau_1 - \lambda_1) u(\tau_2 - \lambda_2) b_2(\lambda_1, \lambda_2) \, d\lambda_1 d\lambda_2 + a_0 \int_0^\infty \!\! \int_0^\infty [a_2(\tau_1 - \lambda_1, \tau_2 - \lambda_2) + a_2(\tau_1 - \lambda_2, \tau_2 - \lambda_1)] u(\tau_1 - \lambda_1) u(\tau_2 - \lambda_2) b_2(\lambda_1, \lambda_2) \, d\lambda_1 d\lambda_2 + \cdots \tag{4.37} $$

where u(τ - λ) denotes the step function (1 for τ ≥ λ, 0 otherwise). These expressions can
also be given in terms of the multidimensional Fourier (or Laplace) transforms of these
causal Volterra kernels (denoted with capital letters):

$$ k_0 = b_0 + a_0 B_1(0) + a_0^2 B_2(0, 0) + \cdots + a_0^r B_r(0, \ldots, 0) + \cdots \tag{4.38} $$

$$ K_1(\omega) = A_1(\omega) B_1(\omega) + 2 a_0 A_1(\omega) B_2(\omega, 0) + \cdots + r a_0^{r-1} A_1(\omega) B_r(\omega, 0, \ldots, 0) + \cdots \tag{4.39} $$

$$ K_2(\omega_1, \omega_2) = A_1(\omega_1) A_1(\omega_2) B_2(\omega_1, \omega_2) + A_2(\omega_1, \omega_2) B_1(\omega_1 + \omega_2) + 2 a_0 A_2(\omega_1, \omega_2) B_2(\omega_1, \omega_2) + \cdots \tag{4.40} $$

It is evident that these expressions are simplified considerably when a_0 = 0 (i.e., when the
first subsystem has no output basal value), especially the expressions for the higher-order
kernels. In this case, the third-order Volterra kernel of the cascade is given in the frequency
domain by

$$ K_3(\omega_1, \omega_2, \omega_3) = A_1(\omega_1) A_1(\omega_2) A_1(\omega_3) B_3(\omega_1, \omega_2, \omega_3) + \frac{2}{3} [A_1(\omega_1) A_2(\omega_2, \omega_3) B_2(\omega_1, \omega_2 + \omega_3) + A_1(\omega_2) A_2(\omega_3, \omega_1) B_2(\omega_2, \omega_3 + \omega_1) + A_1(\omega_3) A_2(\omega_1, \omega_2) B_2(\omega_3, \omega_1 + \omega_2)] + A_3(\omega_1, \omega_2, \omega_3) B_1(\omega_1 + \omega_2 + \omega_3) \tag{4.41} $$

The frequency-domain expressions of the Volterra kernels indicate how the frequency-response
characteristics of the two cascade components combine at multiple frequencies in
a nonlinear context. For instance, the interaction between input power at two frequencies
ω_1 and ω_2 is reflected on the cascade output in a manner determined by the first-order
response characteristics of A at these two frequencies combined in product with the second-
order response characteristics of B (represented by the second-order kernel) at the bi-
frequency point (ω_1, ω_2) [i.e., the term A_1(ω_1) A_1(ω_2) B_2(ω_1, ω_2)], and so on.
For cascade systems with more than two components, the associative property can be
applied, whereby the kernels of the first two components are evaluated first and then the
result is combined with the third component, and so on. The reverse route can be followed in
decomposing a cascade into its constituent components [Brilliant, 1958; George, 1959].
For finite-order cascade components, the order of the cascade is the product of the orders
of the cascade components. For instance, in the A-B cascade, if Q_A and Q_B are the finite
orders of A and B, respectively, then the order of the overall cascade is Q_A · Q_B, as can
be easily derived from Equations (4.32) and (4.33).
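The order-multiplication rule can be demonstrated with a toy discrete-time cascade. In the sketch below, both A and B are hypothetical second-order systems of the form w + w^2 (with w a linear convolution), so these components and kernels are illustrative, not from the text. A single output sample, viewed as a polynomial in the input scale eps, should then have degree Q_A · Q_B = 4, so its fifth finite difference over integer eps vanishes while the fourth does not.

```python
import numpy as np

def second_order_system(x, k):
    """Toy second-order Volterra system: y = k*x + (k*x)^2."""
    w = np.convolve(x, k)[:len(x)]
    return w + w**2

a1 = np.array([1.0, 0.5, 0.25])      # linear kernel inside component A
b1 = np.array([0.8, -0.3])           # linear kernel inside component B
x = np.zeros(16); x[0] = 1.0         # unit impulse input

# one cascade output sample as a function of the input scale eps
eps = np.arange(7, dtype=float)
y = np.array([second_order_system(second_order_system(e * x, a1), b1)[2]
              for e in eps])

# degree-4 polynomial in eps: 5th finite difference vanishes, 4th does not
print(np.allclose(np.diff(y, n=5), 0.0, atol=1e-9),
      np.allclose(np.diff(y, n=4), 0.0))          # True False
```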

The L-N-M, L-N, and N-M Cascades. The most widely studied cascade systems to
date are the L-N-M, L-N, and N-M cascades, where L and M represent linear filters and
N is a static nonlinearity. These three types of cascades have been used extensively to
model physiological systems in a manner that lends itself to interpretation and control
(see Chapter 6). Note that the L-N-M cascade (also called "the sandwich model") was
initially used for the study of the visual system in the context of "describing functions"
[Spekreijse, 1969, 1982], and several interesting results were obtained outside the context
of Volterra-Wiener modeling. In the Volterra-Wiener context, the pioneering work is
that of Korenberg (1973a,d), elaborated in Korenberg & Hunter (1986).
The expressions for the Volterra kernels of the L-N-M cascade can be easily adapted
to the other two types of cascades by setting one filter to a unity transfer function (all-
pass). Thus, we will derive the kernel expressions for the L-N-M cascade
first; the expressions for the other two cascades, L-N and N-M, then follow immediately.
The L-N-M cascade is composed of two linear filters L and M with impulse response
functions g(τ) and h(τ), respectively, separated by a static nonlinearity N defined by the
function f(·) that can be represented as a polynomial or a power series. The latter can be
viewed as the Taylor series expansion of any analytic nonlinearity. If the nonlinearity is
not analytic, a polynomial approximation of arbitrary accuracy can be obtained over the
domain of values defined by the output of the first filter L using any polynomial basis in
the context of the function expansions detailed in Appendix I. The constitutive equations for
the L-N-M cascade shown in Figure 4.10 are

$$ v(t) = \int_0^\infty g(\tau) x(t - \tau) \, d\tau \tag{4.42} $$

$$ z(t) = f[v(t)] = \sum_{r=0}^{Q} \alpha_r v^r(t) \tag{4.43} $$

$$ y(t) = \int_0^\infty h(\lambda) z(t - \lambda) \, d\lambda \tag{4.44} $$

x → [ L ] → v = g ⊗ x → [ N ] → z = f(v) → [ M ] → y = h ⊗ z

Figure 4.10 The L-N-M configuration composed of two linear filters L and M separated by a static
nonlinearity N (see text).

where Q may tend to infinity for a general analytic nonlinearity. Combining these three
equations to eliminate v(t) and z(t), we can obtain the input-output relation in the form of
a Volterra model of order Q as follows. Substitution of z(t) from Equation (4.43) into
Equation (4.44) yields

$$ y(t) = \sum_{r=0}^{Q} \alpha_r \int_0^\infty h(\lambda) v^r(t - \lambda) \, d\lambda \tag{4.45} $$

and substitution of v(t) from Equation (4.42) into Equation (4.45) yields the input-output
relation

$$ y(t) = \sum_{r=0}^{Q} \alpha_r \int_0^\infty \!\! \cdots \!\! \int_0^\infty \left[ \int_0^{\min(\tau_1, \ldots, \tau_r)} h(\lambda) g(\tau_1 - \lambda) \cdots g(\tau_r - \lambda) \, d\lambda \right] x(t - \tau_1) \cdots x(t - \tau_r) \, d\tau_1 \cdots d\tau_r \tag{4.46} $$

Therefore, the rth-order Volterra kernel of the L-N-M cascade is

$$ k_r^{LNM}(\tau_1, \ldots, \tau_r) = \alpha_r \int_0^{\min(\tau_1, \ldots, \tau_r)} h(\lambda) g(\tau_1 - \lambda) \cdots g(\tau_r - \lambda) u(\tau_1 - \lambda) \cdots u(\tau_r - \lambda) \, d\lambda \tag{4.47} $$

This expression yields the Volterra kernels of the L-N and N-M cascades by letting h(λ)
= δ(λ) in the former case and g(λ) = δ(λ) in the latter case. Thus, the L-N cascade
(sometimes called the "Wiener model," somewhat confusingly, since Wiener's modeling
concept is far broader than this extremely simple model) has the rth-order Volterra kernel

$$ k_r^{LN}(\tau_1, \ldots, \tau_r) = \alpha_r g(\tau_1) \cdots g(\tau_r) u(\tau_1) \cdots u(\tau_r) \tag{4.48} $$

and the N-M cascade (also called the "Hammerstein model") has the rth-order Volterra
kernel

$$ k_r^{NM}(\tau_1, \ldots, \tau_r) = \alpha_r \left\{ \frac{1}{r} \sum_{(j_1, \ldots, j_r)} h(\tau_{j_1}) \delta(\tau_{j_2} - \tau_{j_1}) \cdots \delta(\tau_{j_r} - \tau_{j_1}) \right\} u(\tau_1) \cdots u(\tau_r) \tag{4.49} $$

where the summation over (j_1, ..., j_r) takes place for j_1 = 1, ..., r, with (j_2, ..., j_r) being all
the other (r - 1) indices except j_1. This rotational summation is necessary to make the Volterra
kernel invariant to any permutation of its arguments (i.e., symmetric by definition).

The relative simplicity of these kernel expressions has made the use of these cascade
models rather popular. Some illustrative examples were given in Sections 1.4 and 2.1.
Additional applications of these cascade models are given in Chapter 6, and more can be
found in the literature [Hunter & Korenberg, 1986; Kearney & Hunter, 1990; Naka et al.,
1988; Sakai & Naka, 1987a,b, 1988a,b; Sakuranaga & Naka, 1985a,b,c].
It is evident from the Volterra kernel expressions (4.47), (4.48), and (4.49) that the
different cascade structures can be distinguished by simple comparisons of the first- and
second- (or higher) order Volterra kernels. For instance, any "slice" of the second-order kernel
at a fixed τ_2 value in the L-N model is proportional to the first-order kernel, an observation
used in the fly photoreceptor model discussed in Section 2.1.1 (Fig. 2.6). The same is
true for slices of higher-order kernels of the L-N model. Furthermore, the first-order
Volterra kernel is proportional to the impulse response function g(τ) of the linear filter L.
Likewise, the N-M model is distinguished by the fact that the second-order kernel is
zero everywhere except at the main diagonal (τ_1 = τ_2), where the second-order kernel
values are proportional to the first-order kernel, which is in turn proportional to the impulse
response function h(τ) of the linear filter M. Similar facts hold for the higher-order kernels
of the N-M cascade (i.e., zero everywhere except at the main diagonal, τ_1 = τ_2 = ...
= τ_r, where the kernel values are proportional to the first-order kernel). Note that kernels
of order higher than second have rarely been estimated in practice and, therefore, these
useful observations have been used so far primarily in comparisons between first- and
second-order kernels.

$$ K_1^{LNM}(\omega) = \alpha_1 G(\omega) H(\omega) \tag{4.50} $$

$$ K_2^{LNM}(\omega_1, \omega_2) = \alpha_2 G(\omega_1) G(\omega_2) H(\omega_1 + \omega_2) \tag{4.51} $$

and, consequently,

$$ K_2^{LNM}(\omega, 0) = \frac{\alpha_2}{\alpha_1} G(0) \cdot K_1^{LNM}(\omega) \tag{4.52} $$

This relation can be used to distinguish the L-N-M structure, provided that G(0) ≠ 0. The
frequency-domain expression for the rth-order Volterra kernel of the L-N-M cascade is

$$ K_r(\omega_1, \ldots, \omega_r) = \alpha_r G(\omega_1) \cdots G(\omega_r) H(\omega_1 + \cdots + \omega_r) \tag{4.53} $$

A time-domain relation can also be found for the L-N-M model, when we observe
that the values of the (r + 1)th-order kernel along any two axes (i.e., when any two of its
arguments are zero) are proportional to the values of the rth-order kernel along any one
axis, due to the causality of g(τ) [Chen et al., 1985, 1986]:

$$ k_{r+1}^{LNM}(\tau_1, \ldots, \tau_{r-1}, 0, 0) = \alpha_{r+1} h(0) g^2(0) \cdot g(\tau_1) \cdots g(\tau_{r-1}) = \frac{\alpha_{r+1}}{\alpha_r} g(0) \cdot k_r^{LNM}(\tau_1, \ldots, \tau_{r-1}, 0) \tag{4.54} $$

This relation, however, requires at least third-order kernel estimates for meaningful
implementation, provided also that g(0) ≠ 0 and h(0) ≠ 0. These requirements have limited
its practical use to date. Nonetheless, the kernel values along any axis offer an easy way
of estimating the prior filter g(τ) (within a scalar), provided that g(0)h(0) ≠ 0, since

$$ k_2^{LNM}(\tau_1, 0) = \alpha_2 h(0) g(0) \cdot g(\tau_1) \tag{4.55} $$

Subsequently, the posterior filter h(τ) can be estimated (within a scalar) through deconvolution
of the estimated g(τ) from the first-order Volterra kernel [see Equation (4.50)].
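In discrete time, this two-step identification (prior filter from the kernel slice, posterior filter by deconvolution) can be sketched as follows. The kernels here are built synthetically from known g, h, and polynomial coefficients, so the recovery can be checked exactly; all specific values are illustrative choices, not from the text.

```python
import numpy as np

# synthetic L-N-M cascade with known components (illustrative values)
g = np.array([1.0, 0.5, 0.25, 0.125])    # prior filter L
h = np.array([1.0, -0.4, 0.16])          # posterior filter M
alpha1, alpha2 = 1.0, 0.8                 # first two coefficients of N

M = len(g)
k1 = alpha1 * np.convolve(g, h)           # first-order kernel: a1 (g * h)

# discrete analogue of Eq. (4.47) for r = 2:
# k2(t1, t2) = a2 * sum_l h(l) g(t1 - l) g(t2 - l)
k2 = np.zeros((M, M))
for l in range(len(h)):
    g_shift = np.r_[np.zeros(l), g[:M - l]]
    k2 += alpha2 * h[l] * np.outer(g_shift, g_shift)

# Step 1: prior filter from the slice k2(t1, 0)                [Eq. (4.55)]
g_est = k2[:, 0] / k2[0, 0]               # normalized so that g_est[0] = 1
# Step 2: posterior filter by FFT deconvolution of K1 = a1 G H [Eq. (4.50)]
N = len(k1)
h_est = np.fft.irfft(np.fft.rfft(k1, N) / np.fft.rfft(g_est, N), N)[:len(h)]

print(np.allclose(g_est, g), np.allclose(h_est, h))   # True True
```

The FFT division works here because g has no spectral zeros on the sampled frequency grid; with noisy kernel estimates, a regularized deconvolution would be needed instead.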
It is evident that the prior filter L and/or the posterior filter M can be estimated (within
a scalar) from the first-order and second-order Volterra kernels of any of these cascades
(L-N-M, L-N, N-M), following the steps described above. Subsequently, the static
nonlinearity can be estimated by plotting (or regressing) the reconstructed internal signal z(t)
against the reconstructed internal signal v(t) in the case of the L-N-M cascade, or the
corresponding signals in the other two cases [e.g., plotting the output y(t) versus the
reconstructed internal signal v(t) in the case of the L-N cascade]. The "reconstructed" v(t) signal
implies convolution of the input signal with the estimated prior filter g(τ), and the "reconstructed"
z(t) signal implies deconvolution of the estimated posterior filter h(τ) from the
output signal.
Clearly, there is a scaling ambiguity between the filters and the static nonlinearity [i.e.,
the filters can be multiplied by an arbitrary nonzero scalar and the input-output relation of
the cascade remains intact by properly adjusting the scale(s) of the static nonlinearity].
For instance, let us assume that the prior and the posterior filters of the L-N-M cascade
are multiplied by the scalars c_L and c_M, respectively. Then the static nonlinearity z = f(v)
is adjusted to the nonlinearity

$$ \bar{f}(v) = \frac{1}{c_M} f(c_L v) \tag{4.56} $$

in order to maintain the same input-output relation in the overall cascade model. This fact
implies that the estimated filters can be normalized without any loss of generality.
The foregoing discussion elucidates how to achieve complete identification
of these three types of cascade systems, provided that their first-order and second-order
Volterra kernels can be estimated accurately (within a scalar). This raises the issue of
possible estimation bias due to higher-order terms when a truncated second-order Volterra
model is used for kernel estimation. For such bias to be avoided, either a complete Volterra
model (with all the significant nonlinear terms) ought to be used, or a Wiener model of
second order, which offers an unbiased solution in this particular case. It is this latter case
that deserves proper attention for these cascade systems, because it offers an attractive
practical solution to the problem of high-order model estimation. The reason for this is
found in the fact that the Wiener kernels of first order and second order are proportional
to their Volterra counterparts for these cascade systems [Marmarelis & Marmarelis,
1978]. Specifically, the Wiener kernel expressions for the L-N-M cascade are found
using Equation (2.57) to be

$$ h_1(\tau) = k_1(\tau) \cdot \frac{1}{\alpha_1} \sum_{m=0}^{\infty} \frac{(2m+1)!}{m! \, 2^m} \, \alpha_{2m+1} \left[ P \int_0^\infty g^2(\lambda) \, d\lambda \right]^m \tag{4.57} $$

$$ h_2(\tau_1, \tau_2) = k_2(\tau_1, \tau_2) \cdot \frac{1}{\alpha_2} \sum_{m=1}^{\infty} \frac{(2m)!}{(m-1)! \, 2^m} \, \alpha_{2m} \left[ P \int_0^\infty g^2(\lambda) \, d\lambda \right]^{m-1} \tag{4.58} $$

The proportionality factor between the Volterra and the Wiener kernels depends on the
coefficients of the polynomial (or Taylor series) static nonlinearity of the same parity
(i.e., odd for h_1 and even for h_2). The proportionality factor also depends on the variance
of the prior filter output for a GWN input with power level P, which is given by

$$ \text{Var}[v(t)] = P \int_0^\infty g^2(\lambda) \, d\lambda \tag{4.59} $$

Thus, estimation of the normalized first-order and second-order Volterra kernels can
be achieved in practice for an L-N-M cascade system of any order by means of estimation
of their Wiener counterparts when a GWN input is available. Subsequently, the estimation
of each cascade component separately can be achieved by the aforementioned
methods for any order of nonlinearity.
The feasibility of estimating such cascade models of arbitrary order of nonlinearity has
contributed to their popularity. Some illustrative examples are given in Chapter 6.
Because of its relative simplicity and the fact that cascade operations appear natural for
information processing in the nervous system, the L-N-M cascade (or "sandwich model")
received early attention in the study of sensory systems by Spekreijse and his colleagues,
who used a variant of the "describing function" approach employing a combination
of sinusoidal and noise stimuli [Spekreijse, 1969]. A few years later, Korenberg
analyzed the sandwich model in the Volterra-Wiener context [Korenberg, 1973a]. This
pioneering work was largely ignored until it was properly highlighted by Marmarelis and
Marmarelis (1978), leading to a number of subsequent applications to physiological systems.
We conclude this section by pointing out that the aforementioned three types of cascade
systems cannot be distinguished by means of the first-order kernel alone (Volterra or
Wiener): the second-order kernel is necessary if the static nonlinearity has an even
component, or the third-order kernel is required if the static nonlinearity is odd. Longer
cascades can also be studied using the general results on cascaded Volterra systems presented
earlier; however, the attractive simplifications of the L-N-M cascade (and its L-N and N-M
offspring) are lost when more nonlinearities are appended to the cascade.

4.1.3 Volterra Models of Systems with Lateral Branches


The possible presence of lateral feedforward branches in a system may take the form of
additive parallel branches (the simplest case) or modulatory feedforward branches that
either multiply the output of another branch or affect the characteristics (parameters or
kernels) of another system component (see Figure 4.11).
In the simple case of additive parallel branches (see Figure 4.11a), the Volterra kernels
of the overall system are simply the sum of the component kernels of the respective order:

$$ k_r(\tau_1, \ldots, \tau_r) = a_r(\tau_1, \ldots, \tau_r) + b_r(\tau_1, \ldots, \tau_r) \tag{4.60} $$

where {a_r} and {b_r} are the rth-order Volterra kernels of A and B, respectively.

Figure 4.11 Configurations of modular models with lateral branches. (a) Two parallel branches converging
at an adder. (b) Two parallel branches converging at a multiplier. (c) A lateral branch B modulating
component A.

In the case of a multiplicative branch (see Figure 4.11b), the system output is given by

$$ y = [a_0 + a_1 \otimes x + a_2 \otimes x^2 + \cdots][b_0 + b_1 \otimes x + b_2 \otimes x^2 + \cdots] \tag{4.61} $$

Thus, the Volterra kernels of the overall system are

$$ k_0 = a_0 b_0 \tag{4.62} $$

$$ k_1(\tau) = a_0 b_1(\tau) + b_0 a_1(\tau) \tag{4.63} $$

$$ k_2(\tau_1, \tau_2) = a_0 b_2(\tau_1, \tau_2) + b_0 a_2(\tau_1, \tau_2) + \frac{1}{2}[a_1(\tau_1) b_1(\tau_2) + a_1(\tau_2) b_1(\tau_1)] \tag{4.64} $$

$$ k_r(\tau_1, \ldots, \tau_r) = \sum_{j=0}^{r} a_j(\tau_1, \ldots, \tau_j) \, b_{r-j}(\tau_{j+1}, \ldots, \tau_r) \tag{4.65} $$

for τ_1 ≤ τ_2 ≤ ... ≤ τ_r, so that the general expression for the rth-order kernel need not be
symmetrized with respect to the arguments (τ_1, ..., τ_r).
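For the special case in which both branches are linear with offsets (a_2 = b_2 = 0), Equations (4.62) through (4.64) can be verified directly against a simulated multiplier output. The filters and signal below are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(3)
a0, b0 = 0.5, -1.0
a1 = np.array([1.0, 0.4, 0.0])            # branch A linear kernel (padded to 3 lags)
b1 = np.array([0.7, -0.2, 0.1])           # branch B linear kernel

x = rng.standard_normal(64)
conv = lambda k, s: np.convolve(s, k)[:len(s)]
y_true = (a0 + conv(a1, x)) * (b0 + conv(b1, x))    # multiplier configuration

# overall Volterra kernels from Eqs. (4.62)-(4.64) with a2 = b2 = 0
k0 = a0 * b0
k1 = a0 * b1 + b0 * a1
k2 = 0.5 * (np.outer(a1, b1) + np.outer(b1, a1))    # symmetrized second-order kernel

# evaluate the second-order Volterra model directly
M = 3
X = np.column_stack([np.r_[np.zeros(i), x[:len(x) - i]] for i in range(M)])
y_vol = k0 + X @ k1 + np.einsum('ni,ij,nj->n', X, k2, X)
print(np.allclose(y_true, y_vol))                   # True
```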
In the case of the "regulatory" branch B of Figure 4.11c, the Volterra kernel expressions
for the overall system will depend on the specific manner in which the output z of
component B influences the internal characteristics of the output-generating component
A. For instance, if z(t) multiplies (modulates) the first-order kernel of component A, then

$$ y = a_0 + [b_0 + b_1 \otimes x + b_2 \otimes x^2 + \cdots](a_1 \otimes x) + a_2 \otimes x^2 + \cdots \tag{4.66} $$

and the Volterra kernels of this system are

$$ k_r(\tau_1, \ldots, \tau_r) = a_r(\tau_1, \ldots, \tau_r) + a_1(\tau_1) \, b_{r-1}(\tau_2, \ldots, \tau_r) \tag{4.67} $$

for τ_1 ≤ τ_2 ≤ ... ≤ τ_r, to avoid symmetrizing the last term of Equation (4.67).
This latter category of "regulatory" branches may attain numerous diverse forms that
will define different kernel relations. However, the method by which the kernel expressions
are derived remains the same in all cases and relies on expressing the output in
terms of the input using Volterra representations.
Note that the component subsystems may also be expressed in terms of parametric
models (e.g., differential equations). Then the equivalent nonparametric model of each

component must be used to derive the Volterra kemels of the overall system in tenns of
the component kernels. An example of this was given in Section 1.4 for the "minimal
model" of insulin-glucose interactions.

4.1.4 Volterra Models of Systems with Feedback Branches


Systems with feedback branches constitute a very important class of physiological systems
because of the critical role of feedback mechanisms in maintaining stable operation
under normal or perturbed conditions (homeostasis and autoregulation). Feedback mechanisms
may attain diverse forms, including the closed-loop and nested-loop configurations
discussed in Chapter 10. In this section, we derive the Volterra kernels for certain basic
feedback configurations depicted in Figure 4.12.
The simplest case (Figure 4.12a) exhibits additive feedback that can be expressed as
the integral input-output equation:

$$ y = a_1 \otimes [x + b_1 \otimes y + b_2 \otimes y^2 + \cdots] + a_2 \otimes [x + b_1 \otimes y + b_2 \otimes y^2 + \cdots]^2 + \cdots \tag{4.68} $$

where we have assumed that ao = 0 and bo = 0 to simplify matters (which implies that ko =
0). This integral equation contains Volterra functionals of the input and output, suggest-
ing the rudiments of the general theory presented in Chapter 10. The explicit solution of
this integral equation (i.e., expressing the output signal as a Volterra series of the input
signal) is rather complicated but it may be achieved by balancing Volterra tenns of the
same order. Thus, balancing terms of first order yields

k_1 ⊗ x = a_1 ⊗ x + a_1 ⊗ b_1 ⊗ k_1 ⊗ x    (4.69)

which can be solved in the frequency domain to yield

K_1(ω) = A_1(ω)/[1 − A_1(ω)B_1(ω)]    (4.70)

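Equation (4.70) is straightforward to evaluate numerically. The sketch below assumes simple exponential impulse responses for the forward and feedback components (illustrative choices, not taken from the text) and computes the closed-loop first-order kernel in the frequency domain:

```python
import numpy as np

# Assumed first-order components (illustrative, not from the text):
# forward a1(t) = e^{-t}, feedback b1(t) = 0.5*e^{-2t}.
dt, n = 0.01, 4096
t = np.arange(n) * dt
A1 = np.fft.rfft(np.exp(-t)) * dt              # A1(w), forward frequency response
B1 = np.fft.rfft(0.5 * np.exp(-2.0 * t)) * dt  # B1(w), feedback frequency response

# Equation (4.70): first-order Volterra kernel of the closed-loop system
K1 = A1 / (1.0 - A1 * B1)

# At w = 0: A1(0) ~ 1 and B1(0) ~ 0.25, so K1(0) should be close to 1/(1-0.25) = 4/3
print(abs(K1[0]))
```

The zero-frequency value illustrates the familiar closed-loop gain reduction of negative feedback.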
Equation (4.70) is well known from linear-feedback system theory. Balancing terms of
second order, we obtain

k_2 ⊗ x² = a_1 ⊗ [b_1 ⊗ k_2 ⊗ x² + b_2 ⊗ (k_1 ⊗ x)²] + a_2 ⊗ [x² + (b_1 ⊗ k_1 ⊗ x)² + 2x(b_1 ⊗ k_1 ⊗ x)]    (4.71)


Figure 4.12 Configurations of modular models with feedback branches: (a) additive feedback
branch B; (b) multiplicative feedback branch B; (c) modulatory feedback branch B; all acting on the
forward component A.
4.1 MODULAR FORM OF NONPARAMETRIC MODELS 201

which can be solved in the frequency domain to yield the second-order Volterra kernel of
the feedback system:

K_2(ω_1, ω_2) = {A_1(ω_1 + ω_2)B_2(ω_1, ω_2)K_1(ω_1)K_1(ω_2)
              + A_2(ω_1, ω_2)[1 + B_1(ω_1)B_1(ω_2)K_1(ω_1)K_1(ω_2) + B_1(ω_1)K_1(ω_1) + B_1(ω_2)K_1(ω_2)]}
              × [1 − A_1(ω_1 + ω_2)B_1(ω_1 + ω_2)]^{-1}    (4.72)

This approach can be extended to any order, resulting in kernel expressions of increasing
complexity. Obviously, these expressions are simplified when either A or B is linear. For
instance, if the forward component A is linear, then the second-order kernel becomes

K_2(ω_1, ω_2) = A_1(ω_1 + ω_2)B_2(ω_1, ω_2)K_1(ω_1)K_1(ω_2)[1 − A_1(ω_1 + ω_2)B_1(ω_1 + ω_2)]^{-1}    (4.73)

and the third-order kernel is given by

K_3(ω_1, ω_2, ω_3) = A_1(ω_1 + ω_2 + ω_3){(2/3)[B_2(ω_1, ω_2)K_2(ω_1, ω_2)K_1(ω_3)
                   + B_2(ω_2, ω_3)K_2(ω_2, ω_3)K_1(ω_1) + B_2(ω_3, ω_1)K_2(ω_3, ω_1)K_1(ω_2)]
                   + B_3(ω_1, ω_2, ω_3)K_1(ω_1)K_1(ω_2)K_1(ω_3)}[1 − A_1(ω_1 + ω_2 + ω_3)B_1(ω_1 + ω_2 + ω_3)]^{-1}    (4.74)

This case of the linear forward and nonlinear feedback is discussed again in the following
section in connection with nonlinear differential equation models.
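Equation (4.73) is easy to evaluate on a frequency grid. The sketch below assumes a first-order low-pass forward component and static feedback coefficients (all values illustrative, not from the text) and checks that the resulting second-order kernel is symmetric in its two frequency arguments, as any properly symmetrized Volterra kernel must be:

```python
import numpy as np

# Illustrative components: A1(w) = 1/(1 + jw), static feedback B1 = 0.2, B2 = 0.1.
def A1(w):
    return 1.0 / (1.0 + 1j * w)

B1, B2 = 0.2, 0.1

def K1(w):
    # Equation (4.70): closed-loop first-order kernel
    return A1(w) / (1.0 - A1(w) * B1)

def K2(w1, w2):
    # Equation (4.73): second-order kernel for a linear forward component
    return A1(w1 + w2) * B2 * K1(w1) * K1(w2) / (1.0 - A1(w1 + w2) * B1)

w = np.linspace(-5.0, 5.0, 21)
W1, W2 = np.meshgrid(w, w)
K = K2(W1, W2)
print(np.allclose(K, K.T))   # symmetry check: K2(w1, w2) = K2(w2, w1)
```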
We now examine the multiplicative feedback of Figure 4.12b. The input-output integral equation (assuming a_0 = k_0 = 0 but b_0 ≠ 0) is

y = a_1 ⊗ [x(b_0 + b_1 ⊗ y + b_2 ⊗ y² + ...)] + a_2 ⊗ [x(b_0 + b_1 ⊗ y + b_2 ⊗ y² + ...)]² + ...    (4.75)

which yields the first-order balance equation

k_1 ⊗ x = a_1 ⊗ (b_0 x)    (4.76)

from which the first-order Volterra kernel of the multiplicative feedback system is de-
rived to be

K_1(ω) = b_0 A_1(ω)    (4.77)

The second-order balance equation is

k_2 ⊗ x² = a_1 ⊗ [x(b_1 ⊗ k_1 ⊗ x)] + a_2 ⊗ (b_0 x)²    (4.78)

which yields the second-order Volterra kernel

K_2(ω_1, ω_2) = b_0² A_2(ω_1, ω_2) + (b_0/2) A_1(ω_1 + ω_2)[B_1(ω_1)A_1(ω_1) + B_1(ω_2)A_1(ω_2)]    (4.79)

Note that the kernel expressions for multiplicative nonlinear feedback are simpler than
their counterparts for additive nonlinear feedback. The case of "regulatory" feedback of
Figure 4.12c depends on the specific manner in which the feedback signal z(t) influences
the characteristics of the forward component A and will not be discussed further in the in-
terest of saving space.

4.1.5 Nonlinear Feedback Described by Differential Equations


This case was first discussed in Section 3.2 and is revisited here in order to elaborate on
the relationship between parametric models described by nonlinear differential equations
and modular feedback models. It is evident that any of the component subsystems (A
and/or B), discussed above in connection with modular feedback systems/models, can be
described equivalently by a parametric or nonparametric model and converted into the
other type using the methods presented in Sections 3.2-3.5. In this section, we will elabo-
rate further on the case of a system with a linear forward and weak nonlinear feedback,
shown in Figure 4.13, that is described by the differential equation

L(D)y + εf(y) = M(D)x    (4.80)

where |ε| ≪ 1, and L(D), M(D) are polynomials in the differential operator D ≡ d(·)/dt.
If the function f(·) is analytic or can be approximated to an arbitrary degree of accuracy by
a power series (note that the linear term is excluded, since it can be absorbed into L) as
[Marmarelis, 1991]

f(y) = Σ_{n=2}^∞ a_n yⁿ    (4.81)

then the resulting Volterra kernels are

K_1(s) = M(s)/L(s)    (4.82)

K_n(s_1, ..., s_n) = −ε a_n K_1(s_1) ⋯ K_1(s_n)/L(s_1 + ⋯ + s_n)    (4.83)

where terms of order ε² or higher have been considered negligible.
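Equations (4.82)–(4.83) can be checked by direct simulation. The sketch below assumes L(D) = D + 1, M(D) = 1, and f(y) = y³ (so k_1(t) = e^{−t}), and verifies that the response to a weak impulse is dominated by the first-order kernel, as the weak-feedback expansion predicts:

```python
import numpy as np

# Assumed example (not from the text): y' + y + eps*y^3 = x,
# i.e. L(D) = D + 1, M = 1, f(y) = y^3.
eps, dt, n = 0.001, 0.01, 4000
t = np.arange(n) * dt

def impulse_response(amp):
    """Euler integration of the response to x(t) = amp * delta(t)."""
    y = np.zeros(n)
    y[0] = amp                      # impulse of area amp sets y(0+) = amp
    for k in range(n - 1):
        y[k + 1] = y[k] + dt * (-y[k] - eps * y[k] ** 3)
    return y

k1 = np.exp(-t)                     # from Eq. (4.82): K1(s) = 1/(s + 1)
y = impulse_response(0.1)           # weak impulse: higher-order terms ~ eps*amp^3
rel_err = np.max(np.abs(y - 0.1 * k1)) / 0.1
print(rel_err < 0.01)
```

The small residual reflects the Euler discretization plus the O(ε) cubic contribution of Eq. (4.83).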

Figure 4.13 The modular form of the nonlinear feedback system described by Equation (4.80). L
and M are linear dynamic operators and f(·) is an analytic function (or a function approximated by a
polynomial). The factor −ε in the feedback component denotes weak negative feedback.

The first-order Wiener kernel in this case is

H_1(jω) = K_1(jω){1 − [ε/L(jω)] Σ_{m=1}^∞ [(2m + 1)!/m!] (Pκ/2)^m a_{2m+1}}

        = K_1(jω)[1 − (ε/L(jω)) C_1(P)]    (4.84)

where κ is the integral of the square of k_1. The second-order Wiener kernel is:

H_2(jω_1, jω_2) = −ε [K_1(jω_1)K_1(jω_2)/L(jω_1 + jω_2)] Σ_{m=0}^∞ [(2m + 2)!/(2 m!)] (Pκ/2)^m a_{2m+2}

              = −ε [K_1(jω_1)K_1(jω_2)/L(jω_1 + jω_2)] C_2(P)    (4.85)

We observe that as the GWN input power level P varies, the waveform of the first-order
Wiener kernel changes, but the second-order Wiener kernel remains unchanged in
shape and changes only in scale. Note that the functions C_1(P) and C_2(P) are power series
(or polynomials) in (Pκ) and characteristic of the system nonlinearities. The Wiener ker-
nels approach their Volterra counterparts as P diminishes (as expected).
These results indicate that, for a system with linear forward and weak nonlinear feed-
back (i.e., |ε a_i| ≪ 1), the first-order Wiener kernel in the time domain will be

h_1(τ) = k_1(τ) − ε C_1(P) ∫_0^τ k_1(τ − λ)g(λ) dλ    (4.86)

and the second-order Wiener kernel will be

h_2(τ_1, τ_2) = −ε C_2(P) ∫_0^{min(τ_1, τ_2)} k_1(τ_1 − λ)k_1(τ_2 − λ)g(λ) dλ    (4.87)

where g(λ) is the inverse Fourier transform of 1/L(jω).
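For a purely cubic feedback (a_3 = 1, all other a_n = 0), C_1(P) = 3Pκ and C_2(P) = 0, so only the first-order correction of Equation (4.86) survives. A discrete-convolution sketch (with an assumed exponential g, not one of the text's simulated subsystems) shows the correction term growing with P:

```python
import numpy as np

# Assumed kernel g(t) = k1(t) = e^{-t} (M = 1), cubic feedback f(y) = y^3,
# so C1(P) = 3*P*kappa with kappa = integral of k1^2.
eps, dt = 0.001, 0.02
tau = np.arange(0, 15, dt)
g = np.exp(-tau)
kappa = np.sum(g ** 2) * dt                     # ~ 0.5 for g(t) = e^{-t}

def h1(P):
    conv = np.convolve(g, g)[: len(tau)] * dt   # (g * g)(tau), the integral in Eq. (4.86)
    return g - eps * (3.0 * P * kappa) * conv   # Eq. (4.86) with C1(P) = 3*P*kappa

# larger P -> larger deviation of the Wiener kernel from the Volterra kernel g
d1 = np.max(np.abs(h1(1.0) - g))
d4 = np.max(np.abs(h1(4.0) - g))
print(d4 > d1)
```

With a band-pass g this deviation appears as the deepening undershoot discussed in the examples below; with this low-pass g it is a growing departure from g.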


A companion issue to that of changing input power level is the effect of changing the
mean level of the experimental input (with white noise or other perturbations superim-
posed on it) in order to explore different ranges of the system function. The resulting
kernels for each different mean level μ of the input will vary if the input data are defined
as the deviations from the mean level each time. To reconcile these different measure-
ments, we can use a reference mean level μ_0 in order to refer the kernels {k_n^μ} obtained
from different mean levels μ to the reference kernels {k_n^0} according to the relation

k_n^μ(τ_1, ..., τ_n) = Σ_{i=0}^∞ [(n + i)!/(n! i!)] (μ − μ_0)^i ∫_0^∞ ⋯ ∫_0^∞ k_{n+i}^0(τ_1, ..., τ_n, σ_1, ..., σ_i) dσ_1 ⋯ dσ_i    (4.88)

The first-order Wiener kernel for this class of systems with static nonlinear feedback is
given in terms of the reference Volterra kernels (when μ_0 = 0) by the expression

h_1^μ(τ) = k_1^0(τ) − ε Λ ∫_0^∞ g(λ)k_1^0(τ − λ) dλ    (4.89)

where

Λ = Σ_{m=0}^∞ Σ_{i=0}^∞ [(2m + i + 1)!/(m! i!)] (Pκ/2)^m (μγ)^i a_{2m+i+1}    (m + i ≥ 1)    (4.90)

and γ is the integral of k_1^0. Note that the first-order Wiener kernel for μ ≠ 0 is also affect-
ed by the even-order terms of the nonlinearity in this case, unlike the case of μ = 0, where
it is affected only by the odd-order terms of the nonlinearity.
Below, we use computer simulations of systems with cubic and sigmoidal feedback to
demonstrate the effect of changing GWN input power level and/or mean level on the
waveform of the first-order Wiener kernel. This will help us explain the changes observed
in the first-order Wiener kernels of some sensory systems when the GWN input power
level and/or mean level is varied experimentally.

Example 4.1. Cubic Feedback Systems


First, we consider a system with a low-pass forward linear subsystem (L^{-1}) and a cubic
negative feedback −εy³, as shown in Figure 4.13 (for M ≡ 1). For |ε| ≪ 1, the first-
order Wiener kernel is given by Equation (4.86) as

h_1(τ) = g(τ) − 3εPκ ∫_0^τ g(λ)g(τ − λ) dλ    (4.91)

where the first-order Volterra kernel k_1(τ) is identical to the impulse response function
g(τ) of the low-pass linear forward subsystem in this case. For a zero-mean GWN input
with power levels of P = 1, 2, 4 and cubic feedback coefficient ε = 0.001, the first-order
Wiener kernel estimates are shown in Figure 4.14 along with the estimate for ε = 0 (i.e.,
no cubic feedback) or P → 0, which corresponds to k_1(τ) ≡ g(τ). We observe a gradual
decrease of damping (i.e., emergence of an increasing "undershoot") in the kernel esti-
mates as P increases, consistent with Equation (4.91). This corresponds to a gradual in-
crease of their bandwidth as P increases, as shown in Figure 4.15, where the FFT magni-
tudes of these kernel estimates are shown up to normalized frequency 0.1 Hz (Nyquist
frequency is 0.5 Hz). We observe the gradual transition from an overdamped to an under-
damped mode and a companion decrease of zero-frequency gain as P increases, similar to
what has been observed in certain low-pass sensory systems such as retinal horizontal
cells. Note that this system becomes unstable when P increases beyond a certain value.
Next, we explore the effect of varying the GWN input mean level μ while keeping ε
and P constant (ε = 0.001 and P = 1), using input mean levels of μ = 0, 1, 2, and 3, suc-
cessively. The obtained first-order Wiener kernel estimates are shown in Figure 4.16 and
exhibit changes in their waveform as μ increases that are qualitatively similar to the ones
induced by increasing P (i.e., increasing bandwidth and decreasing damping). According
to the general expression of Equation (4.89), we have for this system
h_1^μ(τ) = g(τ) − 3ε[Pκ + (μγ)²] ∫_0^τ g(λ)g(τ − λ) dλ    (4.92)

We see that the effect of increasing P is similar to the effect of increasing μ², and the dif-
ferential effect is proportional to κ and γ², respectively. Another point of practical interest

Figure 4.14 First-order Wiener kernel estimates of the system with negative cubic feedback (ε =
0.001) and a low-pass forward subsystem g(τ), obtained for P = 1, 2, and 4, along with the first-order
Volterra kernel of the system (P → 0), which is identical to g(τ) in this case. Observe the increasing
undershoot in the kernel waveform as P increases [Marmarelis, 1991].


Figure 4.15 FFT magnitudes of the first-order kernels in Figure 4.14, plotted up to a normalized fre-
quency of 0.1 Hz. Observe the gradual transition from overdamped to underdamped mode and the
increase of bandwidth as P increases [Marmarelis, 1991].


Figure 4.16 First-order Wiener kernel estimates of the system with negative cubic feedback and a low-
pass forward subsystem, obtained for μ = 0, 1, 2, and 3 (P = 1, ε = 0.001 in all cases). The changes
in kernel waveform follow the same qualitative pattern as in Figure 4.14 [Marmarelis, 1991].

is the difference between the first-order kernel (Volterra or Wiener) and the system re-
sponse to an impulse. This point is often a source of confusion due to misconceptions in-
grained by linear system analysis. For a third-order system, such as in this example for
small ε, the response to an impulse input x(t) = Aδ(t) is

r_δ(t) = Ag(t) − εA³ ∫_0^t g(λ)g³(t − λ) dλ    (4.93)

which is clearly different from the first-order Volterra kernel k_1(t) ≡ g(t), or its Wiener
counterpart given by Equation (4.91). Another point of practical interest is the response
of this nonlinear feedback system to a step/pulse input x(t) = Au(t), since pulse inputs
have been used extensively in physiological studies. The system response to the pulse
input is

r_u(t) = A ∫_0^t g(τ) dτ − εA³ ∫_0^t g(τ){∫_τ^t g(λ − τ) dλ}³ dτ    (4.94)

and the changes in the response waveforms as the pulse amplitude increases are demon-
strated in Figure 4.17, where the responses of this system are shown for pulse amplitudes
of 1, 2, and 4. The observed changes are qualitatively consistent with the previous discus-
sion (i.e., the responses are less damped for stronger pulse inputs). However, the reader
must note that we cannot obtain the first-order kernel (Volterra or Wiener) or the response

Figure 4.17 Responses of the negative cubic feedback system (ε = 0.001) to pulse inputs of different
amplitudes (1, 2, and 4). Observe the gradual decrease of damping and latency of the onset re-
sponse as the pulse amplitude increases, as well as the difference between onset and offset tran-
sient responses [Marmarelis, 1991].

to an impulse by differentiating the pulse response over time, as in the linear case. Ob-
serve also the sharp difference between onset and offset transient response, characteristic
of nonlinear systems and so often seen in physiological systems.
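The amplitude dependence of the pulse response can be reproduced by direct integration of the feedback loop. The sketch below assumes the simplest low-pass loop y' + y + εy³ = x with ε = 0.001 and an illustrative pulse duration (not the exact subsystem simulated in the text), and confirms that the response to a stronger pulse is not a scaled copy of the weak-pulse response, as it would be for a linear system:

```python
import numpy as np

# Assumed loop: y' + y + eps*y^3 = x, pulse input x(t) = A for 0 <= t < T, else 0.
eps, dt, n, T = 0.001, 0.01, 12500, 60.0

def pulse_response(A):
    y = np.zeros(n)
    for k in range(n - 1):
        x = A if k * dt < T else 0.0
        y[k + 1] = y[k] + dt * (x - y[k] - eps * y[k] ** 3)
    return y

r1, r4 = pulse_response(1.0), pulse_response(4.0)
# linear scaling would give r4 = 4*r1; the cubic feedback breaks it:
dev = np.max(np.abs(r4 - 4.0 * r1)) / np.max(np.abs(r4))
print(dev > 0.01)
```

The deviation grows with pulse amplitude, consistent with the compression of the steady-state values in Figure 4.18.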
The steady-state value of the step response for various values of A is given by

L(0)y + εy³ = A    (4.95)

(in the region of stability of this system), where L(0) ≡ 1/K_1(0) for this system. The steady-
state values of the pulse response as a function of pulse amplitude are shown in Figure
4.18. Note that these values are different, in general, from the mean response values when
the GWN input has a nonzero mean.
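The steady-state value in Equation (4.95) is the real root of a cubic in y. Assuming L(0) = 1 (an illustrative normalization) and ε = 0.001, it can be obtained with a polynomial root finder:

```python
import numpy as np

def steady_state(A, L0=1.0, eps=0.001):
    """Real root of eps*y^3 + L(0)*y - A = 0, i.e. Eq. (4.95)."""
    roots = np.roots([eps, 0.0, L0, -A])
    real = roots[np.abs(roots.imag) < 1e-9].real
    return real[0]

y8 = steady_state(8.0)
# plugging the root back into Eq. (4.95) recovers the pulse amplitude
print(abs(y8 + 0.001 * y8 ** 3 - 8.0) < 1e-6)
```

For negative (compressive-like) cubic terms the real root grows more slowly than A, reproducing the saturating shape of Figure 4.18.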
Having examined the behavior of this nonlinear feedback system with a low-pass
(overdamped) forward subsystem, we now examine the case of a band-pass (under-
damped) forward subsystem with negative cubic feedback of ε = 0.008. The resulting
first-order Wiener kernel estimates for increasing GWN input power level (viz., P = 1, 2,
and 4) are shown in Figure 4.19, along with the first-order Volterra kernel of the system
(which is the same as the impulse-response function of the linear forward subsystem) that
corresponds to the case of P = 0. We observe a gradual deepening of the undershoot por-
tion of the band-pass kernel, accompanied by a gradual shortening of its duration, as P in-
creases (i.e., we see a gradual broadening of the system bandwidth and upward shift of the
resonance frequency as P increases). This is demonstrated in Figure 4.20 in the frequency
domain, where the FFT magnitudes of the kernels of Figure 4.19 are shown. The changes

Figure 4.18 The steady-state values of the pulse responses as a function of input pulse amplitude
for the negative cubic feedback system (ε = 0.001) [Marmarelis, 1991].


Figure 4.19 First-order Wiener kernels of the negative cubic feedback system (ε = 0.008) with the
band-pass (underdamped) linear forward subsystem (corresponding to P = 0), for P = 0, 1, 2, and 4.
Observe the increasing undershoot as P increases [Marmarelis, 1991].


Figure 4.20 FFT magnitudes of the first-order Wiener kernels shown in Figure 4.19. Observe the
gradual increase of bandwidth and upward shift of resonant frequency as P increases [Marmarelis,
1991].

in the waveform of these kernels with increasing P are consistent with our theoretical
analysis and mimic changes observed experimentally in some band-pass sensory systems
(e.g., primary auditory fibers).
Note that the effect of increasing GWN input mean level on the first-order Wiener ker-
nels is not significant, due to the fact that γ (i.e., the integral of k_1) is extremely small in
this case; cf. Equation (4.92). Finally, the system responses to input pulses of increasing
amplitude (A = 1, 2, and 4) are shown in Figure 4.21 and demonstrate increasing reso-
nance frequency and decreasing damping in the pulse response as A increases. Note also
that the steady-state values of the pulse responses are extremely small, and the onset/off-
set response waveforms are similar (with reverse polarity), due to the very small value of
γ = K_1(0) [cf. Equation (4.95)].

Example 4.2. Sigmoid Feedback Systems


The next example deals with a sigmoid feedback nonlinearity which, unlike the cubic
one, is bounded for any output signal amplitude. The arctangent function

f(y) = (2/π) arctan(αy)    (4.96)

was used in the simulations (α = 0.25) with the previous low-pass forward subsystem, and
the resulting first-order Wiener kernels for P = 1 and ε = 0, 0.125, and 0.5 are shown in
Figure 4.22. The qualitative changes in waveform are similar to the cubic feedback case
for increasing input power level P or feedback strength ε. However, for fixed sigmoid


Figure 4.21 Responses of the negative feedback system (ε = 0.008) with underdamped forward
subsystem to pulse inputs of different amplitudes A = 1, 2, and 4. Observe the increasingly under-
damped response as A increases, and the negligible steady-state responses [Marmarelis, 1991].


Figure 4.22 First-order Wiener kernel estimates of the negative sigmoid feedback system with the pre-
vious low-pass forward subsystem for ε = 0, 0.125, and 0.5 (P = 1, α = 0.25 in all cases). Observe
the similarity in changes of kernel waveform with the ones shown in Figure 4.14 [Marmarelis, 1991].

feedback strength (ε), the kernels resulting from increasing GWN input power level P
follow the reverse transition in waveform, as demonstrated in Figure 4.23, where the ker-
nels obtained for P = 1, 4, 16, and 64 are shown for ε = 0.25 in all cases.
Bear in mind that the first-order Volterra kernel of this sigmoid feedback system is
not the same as the impulse response function of the forward subsystem, but it is the im-
pulse response function of the overall linear feedback system when the linear term of
the sigmoid nonlinearity (i.e., its slope at zero) is incorporated in the (negative) feed-
back loop. Thus, the kernel waveform follows the previously described gradual changes
from the impulse response function of the linear feedback system to that of the linear
forward subsystem as P increases (i.e., the kernel waveform changes gradually from un-
derdamped to overdamped as P increases and the gain of the equivalent linearized feed-
back decreases).
Because of the bounded nature of the (negative) sigmoid nonlinearity, large values of
ε and/or P do not lead to system instabilities, as in the case of cubic feedback. Increasing
values of ε result in decreasing damping, eventually leading to oscillatory behavior. This
is demonstrated in Figure 4.24, where the kernels for ε = 0.5, 1, 2, and 4 are shown (P =
1). The oscillatory behavior of this system, for large values of ε, is more dramatically
demonstrated in Figure 4.25, where the actual system responses y(t) for ε = 100 and 1000
are shown (P = 1). The system goes into perfect oscillation regardless of the GWN input,
due to the overwhelming action of the negative sigmoid feedback that is bounded and
symmetric about the origin. The amplitude of this oscillation is proportional to ε, but is
independent of the input power level. In fact, the oscillatory response remains the same in
amplitude and frequency for any input signal (regardless of its amplitude and waveform)


Figure 4.23 First-order Wiener kernel estimates of the negative sigmoid feedback system with the pre-
vious low-pass forward subsystem for P = 1, 4, 16, and 64 (ε = 0.25, α = 0.25 in all cases). Observe the
reverse pattern of kernel waveform changes from the ones in Figures 4.22 or 4.14 [Marmarelis, 1991].


Figure 4.24 First-order Wiener kernels of the negative sigmoid feedback system with low-pass (over-
damped) forward subsystem, for ε = 0.5, 1, 2, and 4. Observe the transition to oscillatory behavior as ε
increases [Marmarelis, 1991].


Figure 4.25 Oscillatory response of the negative sigmoid feedback system for very large feedback
gain ε = 100 and 1000, and GWN input (P = 1) [Marmarelis, 1991].

as long as the value of ε is much larger than the maximum value of the input. The initial
transient and the phase of the oscillation, however, may vary according to the input pow-
er and waveform. The frequency of oscillation depends on the dynamics (time constants)
of the linear forward subsystem. For instance, a low-pass subsystem with shorter memory
(i.e., shorter time constants) leads to a higher frequency of oscillation, and so does an un-
derdamped system with the same memory extent.
Although the case of oscillatory behavior is not covered formally by the
Volterra-Wiener analysis because it violates the finite-memory requirement, it is of great
interest in physiology because of the numerous and functionally important physiological
oscillators. Therefore, it is a subject worthy of further exploration in the context of large
negative compressive (e.g., sigmoid) feedback, because the foregoing observations are
rather intriguing. For instance, can an oscillation of fixed frequency be initiated by a
broad ensemble of stimuli that share only minimal attributes irrespective of waveform
(e.g., having bandwidth and dynamic range within certain bounds) as long as the feedback
gain is large?
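This behavior is easy to reproduce numerically. The sketch below assumes a third-order low-pass forward subsystem (three cascaded first-order lags, an illustrative choice with enough phase lag to sustain a limit cycle, not the exact subsystem used in the text) with the arctangent feedback of Equation (4.96), and shows that for very large feedback gain the output settles into a sustained oscillation whose amplitude scales with the gain rather than with the input:

```python
import numpy as np

# Assumed components: forward = three cascaded lags, G(s) = 1/(1+s)^3,
# negative sigmoid feedback -E*(2/pi)*arctan(a*y), as in Eq. (4.96).
def simulate(E, x, dt=0.01, a=0.25):
    y1 = y2 = y3 = 0.0
    out = np.empty(len(x))
    for k in range(len(x)):
        u = x[k] - E * (2.0 / np.pi) * np.arctan(a * y3)  # input plus feedback
        y1 += dt * (-y1 + u)
        y2 += dt * (-y2 + y1)
        y3 += dt * (-y3 + y2)
        out[k] = y3
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal(40000)              # GWN input; its level barely matters here
amp100 = np.ptp(simulate(100.0, x)[20000:]) / 2   # steady-state half peak-to-peak
amp200 = np.ptp(simulate(200.0, x)[20000:]) / 2
print(amp200 / amp100)   # roughly doubles: amplitude grows with feedback gain E
```

The oscillation frequency is set by the forward time constants (here near the frequency where the three lags contribute 180° of phase), in line with the observations above.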
The effect of varying the slope of the sigmoid nonlinearity was also studied, and a gradual-
ly decreasing damping with increasing slope was observed [Marmarelis, 1991]. This tran-
sition reaches asymptotically a limit in both directions of changing α values, as expected.
For α → ∞, the sigmoid nonlinearity becomes the signum function and leads to perfect
oscillations; for α → 0, the gain of the feedback loop diminishes, leading to a kernel
identical to the impulse response function of the forward linear subsystem.
The effect of a nonzero GWN input mean μ is similar to the effect of increasing P; that
is, the first-order Wiener kernels become more damped as μ increases, which indicates
decreasing gain of the equivalent linearized negative feedback.
In the case of the underdamped forward subsystem and negative sigmoid feedback, the
changes in the kernel waveform undergo a gradual transition from the linearized feedback
system to the forward linear subsystem as the GWN input power level P increases. The
two limit waveforms (for P → 0 and P → ∞) of the first-order Wiener kernel are shown in
panel (a) of Figure 4.26 for ε = 1, α = 0.25. The effect of the negative sigmoid feedback
is less dramatic in this case, since the kernel retains its underdamped mode for all values
of P. There is, however, a downward shift of resonance frequency and an increase of damp-
ing when P increases, as indicated by the FFT magnitudes of the "limit" kernel wave-
forms shown in panel (b) of Figure 4.26.

Example 4.3. Positive Nonlinear Feedback


The reverse transition in the first-order Wiener kernel waveform is observed when the po-
larity of the weak nonlinear feedback is changed, as dictated by Equation (4.91). Positive
decompressive (e.g., cubic) feedback leads to a decrease in resonance frequency and
higher gain values in the resonant region. Also, the reverse transition in kernel waveform
occurs (i.e., upward shift of resonance frequency and decrease of damping with increas-
ing P values) when the compressive (e.g., sigmoid) feedback becomes positive.
The great advantage of sigmoid versus cubic feedback is that stability of the system
behavior is retained over a broader range of the input power level. For this reason, sig-
moid (or other bounded) feedback is an appealing candidate for plausible models of phys-
iological feedback systems. For those systems that exhibit transitions to broader band-
width and decreased damping as P increases, candidate models may include either
negative decompressive (e.g., cubic) or positive compressive (e.g., sigmoid) feedback.
For those systems that exhibit the reverse transition pattern (i.e., to narrower bandwidth

Figure 4.26 The two limit waveforms of the first-order Wiener kernel for the negative sigmoid feed-
back system (ε = 1, α = 0.25) with underdamped forward subsystem, obtained for P → 0 and P → ∞
(a), and their FFT magnitudes (b). Observe the lower resonance frequency and increased damping for
P → ∞ [Marmarelis, 1991].


and increased damping as P increases), candidate models may include either positive de-
compressive or negative compressive feedback.

Example 4.4. Second-Order Kernels of Nonlinear Feedback Systems


Our examples so far have employed nonlinear feedback with odd symmetry (cubic and
sigmoid), and our attention has focused on first-order Wiener kernels because these sys-
tems do not have even-order kernels. However, if the feedback nonlinearity is not odd-
symmetric, then even-order kernels exist. An example of this is given for negative qua-
dratic feedback of the form εy² (for ε = 0.08, P = 1), where the previous band-pass
(underdamped) forward subsystem is used. The resulting second-order Wiener kernel is
shown in Figure 4.27 and has the form and size predicted by the analytical expression of
Equation (4.87). The first-order Wiener kernel is not affected significantly by the quadrat-
ic feedback for small values of ε.
It is important to note that Wiener analysis with a nonzero GWN input mean yields
even-order Wiener kernels (dependent on the nonzero input mean μ), even for cubic or
sigmoid feedback systems, because a nonzero input mean defines an "operating point"
that breaks the odd symmetry of the cubic or sigmoid nonlinearity. For instance, a nega-
tive cubic feedback system, where only K_1 and K_3 are assumed to be significant for small
values of ε, has the second-order Wiener kernel

H_2^μ(ω_1, ω_2) = 3μK_3(ω_1, ω_2, 0)

            = −3εμγ K_1(ω_1)K_1(ω_2)K_1(ω_1 + ω_2)    (4.97)

Equation (4.97) implies that the second-order Wiener kernel will retain its shape but in-
crease linearly in size with increasing μ (provided, of course, that ε is small).
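A minimal numerical check of this scaling, assuming an illustrative K_1(ω) = 1/(1 + jω) and γ = K_1(0) = 1 (not the kernels of the simulated system in the text):

```python
import numpy as np

# Illustrative first-order kernel and gamma = K1(0) = 1.
def K1(w):
    return 1.0 / (1.0 + 1j * w)

def H2(w1, w2, mu, eps=0.001, gamma=1.0):
    # Equation (4.97)
    return -3.0 * eps * mu * gamma * K1(w1) * K1(w2) * K1(w1 + w2)

# The kernel shape is mu-invariant; its size scales linearly with mu:
a = H2(1.0, 2.0, mu=1.0)
b = H2(1.0, 2.0, mu=3.0)
print(abs(b / a))   # linear scaling with mu
```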

Figure 4.27 Second-order Wiener kernel of the negative quadratic feedback system with a band-
pass (underdamped) forward subsystem (ε = 0.08, P = 1) [Marmarelis, 1991].

Nonlinear Feedback in Sensory Systems. The presented analysis of nonlinear
feedback systems is useful in interpreting the Wiener kernel measurements obtained for
certain visual and auditory systems under various conditions of GWN stimulation, as dis-
cussed below.
The Wiener approach has been applied extensively to the study of retinal cells using
band-limited GWN stimuli [e.g., Marmarelis & Naka, 1972, 1973a, b, c, d, 1974a, b]. In
these studies, the experimental stimulus consists of band-limited GWN modulation of
light intensity about a constant level of illumination, and the response is the intracellular
or extracellular potential of a certain retinal cell (receptor, horizontal, bipolar, amacrine,
ganglion). Wiener kernels (typically of first and second order) are subsequently computed
from the stimulus-response data. The experiments are typically repeated for different lev-
els of mean illumination (input mean) and various GWN input power levels in order to
cover the entire physiological range of interest. It has been observed that the waveform of
the resulting kernels generally varies with different input mean and/or power level. We
propose that these changes in waveform may be explained by the presence of a nonlinear
feedback mechanism, in accordance with the previous analysis. Note that these changes
cannot be explained by the simple cascade models discussed in Section 4.1.2.
The first such observation was made in the early 1970s [Marmarelis & Naka, 1973b]
on the changing waveform of first-order Wiener kernel estimates of horizontal cells in the
catfish retina, obtained for two different levels of stimulation (low and high mean illumi-
nation levels with proportional GWN modulation). The kernel corresponding to the high
level of stimulation was less damped and had shorter latency (shorter peak-response time),
as shown in Figure 6.1. This observation was repeated later (e.g., [Naka et al., 1988; Sakai &
Naka, 1985, 1987a, b]) for graduated values of increasing P and μ. The observed changes
are qualitatively similar to the ones observed in our simulations of negative decompres-
sive (cubic) feedback systems with an overdamped forward subsystem. However, the
changes in latency time and kernel size are much more pronounced in the experimental
kernels than in our simulations of negative cubic feedback systems presented earlier.
To account for the greater reduction in kernel size observed experimentally, we may in-
troduce a compressive (static) nonlinearity in cascade with the overall feedback system that
leads to an additional reduction of the gain of the overall cascade system as P and/or μ in-
crease. On the other hand, a greater reduction in the peak-response (latency) time may re-
quire the introduction of another dynamic component in cascade with the feedback system.
Led by these observations, we propose the modular (block-structured) model, shown
in Figure 4.28, for the light-to-horizontal-cell system [Marmarelis, 1987d, 1991]. This
model is comprised of the cascade of three negative decompressive (cubic) feedback sub-
systems with different overdamped forward components (shown in Figure 4.29) and a
compressive (sigmoidal) nonlinearity between the outer and the inner segments of the
photoreceptor model component. The first part of this cascade model, comprised of the
PL/PN feedback loop and the compressive nonlinearity CN, corresponds to the transfor-
mations taking place in the outer segment of the photoreceptor and represents the nonlin-
ear dynamics of the phototransduction process. The second part, comprised of the RL/RN
feedback loop, represents the nonlinear dynamic transformations taking place in the inner
segment of the photoreceptor (including the receptor terminals). The third part, comprised
of the HL/HN feedback loop, represents the nonlinear dynamic transformations taking
place in the horizontal cell and its synaptic junction with the receptor. Note that this mod-
el does not differentiate between cone and rod receptors and does not take into account
spatial interactions or the triadic synapse with bipolar cells (see below).


Figure 4.28 Schematic of the modular (block-structured) model of the light-to-horizontal-cell system.
Input x(t) represents the light stimulus and output y(t) the horizontal cell response. Each of the three
cascaded segments of the model contains negative decompressive feedback and an overdamped
forward component (shown in Figure 4.29). The static nonlinearity CN between the outer and inner
segments of the receptor model component is compressive (sigmoidal) [Marmarelis, 1991].

The first-order Wiener kernels of this model are shown in Figure 4.30 for GWN input
power levels P = 0.5, 1, 2, and 4, for the parameter values indicated in the caption of Fig-
ure 4.29. We observe kernel waveform changes that resemble closely the experimentally
observed changes (note that hyperpolarization is plotted as a positive deflection) that are
discussed in Section 6.1.1. Since experimentally obtained horizontal-cell kernels are usu-
ally plotted in the contrast sensitivity scale (i.e., scaled by the respective GWN input pow-
er level), we show in the same figure the kernels plotted in the contrast sensitivity scale.
The purpose of this demonstration is to show that the experimentally observed kernel
waveform changes can be reproduced fairly well by a model of this form employing non-
linear feedback. The selected model components and parameters depend on the
specific species, and the precise parameter values can be determined by repeated experi-
ments (for different values of P and μ) and kernel analysis in the presented context for
each particular physiological preparation.
We can extend the light-to-horizontal cell model to explain the experimentally ob-
served changes in the waveform of first-order Wiener kernels of the light-to-bipolar-cell
system (for increasing GWN input power level) [Marmarelis, 1991]. As shown in Figure
4.31, the response of the horizontal cell is subtracted from the response of the receptor
(inner segment), and the resulting signal is passed through a nonlinear feedback compo-
nent representing the triadic synapses (from the receptor terminals to the horizontal
processes and bipolar dendrites) as well as the transformation of the postsynaptic poten-
tial through the bipolar dendrites. The resulting first-order Wiener kernels are similar to
the experimentally observed ones (i.e., shorter latency, increased bandwidth, and in-
creased sensitivity with increasing P) that are presented in Section 6.1.1.
Beyond these mechanistic explanations, an important scientific question can be posed
about the teleological reasons for the existence of decompressive feedback in retinal cells,
in tandem with compressive nonlinearities. The presented analysis suggests that this is an
effective functional design that secures sensory transduction over a very broad range of
stimulus intensities while, at the same time, providing an adequate (practically undiminish-
ing) dynamic range of operation about a dynamically changing operating point (attuned to
changing stimulus conditions). Furthermore, the gradual transition of the system function-
al characteristics towards a faster response when the stimulus intensity and temporal
changes are greater would be a suitable attribute for a sensory system that has evolved
Figure 4.29 Impulse response functions of the linear overdamped forward components PL (top),
RL (middle), and HL (bottom) used in the model of Figure 4.28. Note that the feedback nonlinearities
PN, RN, and HN used in the model are decompressive (cubic) with coefficients ε = 0.05, 0.10, and
0.01, respectively. The static nonlinearity CN is compressive (sigmoidal) of the form described by
Equation (4.96) with α = 0.2 (ε = 2) [Marmarelis, 1991].
Figure 4.30 (a) First-order Wiener kernels of the light-to-horizontal cell model shown in Figure 4.28,
for P = 0.5, 1, 2, and 4. Observe the gradual transition in kernel waveform akin to the experimentally
observed. (b) The same kernels plotted in contrast sensitivity scale (i.e., each kernel scaled by its
corresponding power level) [Marmarelis, 1991].
220 MODULAR AND CONNECTIONIST MODELING

Figure 4.31 Schematic of the modular (block-structured) model of the light-to-bipolar cell system
described in the text [Marmarelis, 1991].

under the requirements of rapid detection of changes in the visual field (threat detection)
for survival purposes.
Another interesting example of a sensory system is provided by the auditory nerve fibers, whose
band-pass first-order Wiener kernels undergo a transition to lower resonance frequencies
as the input power level increases [Marmarelis, 1991]. This has been observed experi-
mentally in primary auditory nerve fibers that have center (resonance) frequencies be-
tween 1.5 and 6 kHz [Moller, 1983; Lewis et al., 2002a,b], as discussed in Section 6.1.3.
To explore whether nonlinear feedback may constitute a plausible model in this case,
we consider a band-pass, linear forward subsystem and negative sigmoid feedback, like
the one discussed earlier. The obtained first-order Wiener kernel estimates for GWN in-
put power levels P = 1, 16, 256, and 4096 are shown in Figure 4.32 (with appropriate plot-
ting offsets to allow easier visual inspection) and replicate the gradual shift in resonance
frequency and contraction of the "envelope" of the kernel with increasing P, which were
also observed experimentally [Moller, 1975, 1977, 1978]. Since these changes are more
easily seen in the frequency domain (the preferred domain in auditory studies), we also
show the FFT magnitudes of these kernels in Figure 4.32. Note that the latter are approxi-
mate "inverted tuning curves" exhibiting decreasing resonance frequency and broadening
fractional bandwidth as P increases, similar to the experimental observations in auditory
nerve fibers.
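The ingredients of this exploration are simple enough to sketch in code. The following is an illustrative sketch, not the simulation used in the text: the forward component is a hypothetical damped resonator, the negative sigmoid feedback is modeled as ε·tanh(α·y), all parameter values (lengths, power level P, damping, resonance frequency, ε, α) are arbitrary choices, and the first-order Wiener kernel is estimated by input-output cross-correlation.

```python
import numpy as np

# Illustrative sketch (not the exact simulation of the text): band-pass forward
# filter g(m) (a hypothetical damped resonator) with negative sigmoid feedback,
# here modeled as eps*tanh(alpha*y); all parameter values are assumptions.
rng = np.random.default_rng(0)
M, N, P = 64, 20000, 4.0
m = np.arange(M)
g = np.exp(-0.05 * m) * np.sin(2 * np.pi * 0.1 * m)   # forward impulse response

def simulate(x, eps=1.0, alpha=0.25):
    # e(n) = x(n) - eps*tanh(alpha*y(n-1));  y(n) = sum_m g(m) e(n-m)
    y = np.zeros(len(x))
    e = np.zeros(len(x))
    for n in range(len(x)):
        fb = eps * np.tanh(alpha * y[n - 1]) if n > 0 else 0.0
        e[n] = x[n] - fb
        mm = min(n + 1, M)
        y[n] = g[:mm] @ e[n - mm + 1: n + 1][::-1]
    return y

x = np.sqrt(P) * rng.standard_normal(N)   # GWN input with power level P
y = simulate(x)
# First-order Wiener kernel estimate: k1(m) = E[y(n) x(n - m)] / P
k1 = np.array([np.mean(y[M:] * x[M - i: N - i]) for i in range(M)]) / P
```

Repeating the estimation for several values of P and comparing the FFT magnitudes of the resulting kernel estimates would permit the kind of comparison shown in Figure 4.32.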
This nonlinear feedback model appears to capture the essential functional characteris-
tics of primary auditory nerve fibers that have been observed experimentally. The nega-
tive compressive feedback can be thought of as intensity-reduced stiffness, which has been
observed in studies of the transduction properties of cochlear hair cells. Accurate quanti-
tative measures of the functional components and parameters of this feedback system
(e.g., the precise form of the feedback nonlinearity) can be obtained on the basis of the
analysis presented earlier and will require a series of properly designed experiments for
various values of P. Furthermore, the presence of a negative compressive feedback in the
auditory fiber response characteristics may provide a plausible explanation for the onset
of pathological conditions such as tinnitus, a situation in which the strength of the nega-
tive compressive feedback increases beyond normal values and leads to undiminishing
oscillatory behavior irrespective of the specific auditory input waveform, as demonstrated
earlier (see Figure 4.25).

Concluding Remarks on Nonlinear Feedback. Nonlinear feedback has long been
thought to exist in many important physiological systems and to be of critical impor-
tance for maintaining proper physiological function. However, its systematic and rigor-
Figure 4.32 (a) First-order Wiener kernels of negative sigmoid feedback system (ε = 1, α = 0.25)
with a band-pass forward component, for P = 1 (trace 1), 16 (trace 2), 256 (trace 3), and 4096 (trace
4), emulating primary auditory fibers. Observe the contracting envelope and decreasing resonance
frequency as P increases (arbitrary offsets for easier inspection). (b) FFT magnitudes of the kernels
shown in (a). We observe decreasing resonance frequency and gain as P increases, as well as
broadening of the tuning curve in reverse relation to the envelope of the kernel. When these curves
are plotted in contrast sensitivity scale (i.e., each scaled by its corresponding P value), then the reso-
nance-frequency gain will appear increasing with increasing P [Marmarelis, 1991].


ous study has been hindered by the complexity of the subject matter and the inadequa-
cy of practical methods of analysis. The study of Volterra-Wiener expansions of non-
linear differential equations has led to some analytical results that begin to shed light on
the analysis of nonlinear feedback systems in a manner that advances our understanding
of the system under study. The results obtained for a class of nonlinear feedback sys-
tems relate Volterra or Wiener kernel measurements to the effects of nonlinear feedback
under various experimental conditions. Explicit mathematical expressions were derived
that relate Wiener kernel measurements to the characteristics of the feedback system and
the stimulus parameters. The theoretical results were tested with simulations, and their
validity was demonstrated in a variety of cases (cubic and sigmoid feedback with over-
damped or underdamped forward subsystem). These test cases were chosen as to sug-
gest possible interpretations of experimental results, including results that have been
published in recent years for two types of sensory systems: retinal horizontal and bipo-
lar cells, and primary auditory nerve fibers. It was shown that relatively simple nonlin-
ear feedback models can reproduce the qualitative changes in kernel waveforms ob-
served experimentally in these sensory systems. Precise quantitative determination of
the parameters of the feedback models requires analysis (in the presented context) of
data from a series of properly designed experiments.
Specifically, it was shown that negative decompressive feedback (e.g., cubic) or pos-
itive compressive feedback (e.g., sigmoid) results in gradually decreasing damping (in-
creasing bandwidth) of the first-order Wiener kernel as the GWN input power level
and/or mean level increase. Conversely, positive decompressive or negative compressive
feedback results in the reverse pattern of changes. The extent of these effects depends, of
course, on the exact type of feedback nonlinearity and/or the dynamics of the linear for-
ward subsystem. It was demonstrated through analysis and computer simulations that
the experimentally observed changes in the waveform of the first-order Wiener kernel
measurements for retinal horizontal and bipolar cells can be explained with the use of
negative decompressive (cubic) feedback and low-pass forward subsystems (viz., the
gradual transition from an overdamped to an underdamped mode as the GWN stimulus
power and/or mean level increase). In the case of auditory nerve fibers, it was shown
that the use of negative compressive (sigmoid) feedback and a band-pass forward sub-
system can reproduce the effects observed experimentally on their "tuning curves" for
increasing stimulus intensity (viz., a gradual downward shift of the resonance frequency
and broadening of the bandwidth of the tuning curve with increasing stimulus power
level) [Marmarelis, 1991].
It is hoped that this work will stimulate an interest among systems physiologists to
explore the possibility of nonlinear feedback models in order to explain changes in re-
sponse characteristics when the experimental stimulus conditions vary. This is critical
when such changes cannot be explained by simple cascade models of linear and static
nonlinear components (like the ones discussed earlier), which are currently popular in ef-
forts to construct equivalent block-structured models from kernel measurements. For in-
stance, in the case of the auditory nerve fibers, the suggested model of negative sigmoid
feedback may offer a plausible explanation for pathological states of the auditory system,
such as tinnitus. Likewise, in the case of retinal cells, negative decompressive feedback in
tandem with compressive nonlinearities may explain the ability of the "front end" of the
visual system to accommodate a very broad range of visual stimulus intensities while pre-
serving adequate dynamic range for effective information processing, as well as to retain the
ability to respond rapidly to changes in stimulus intensity.
4.2 CONNECTIONIST MODELS

The idea behind connectionist models is that the relationships among variables of interest
can be represented in the form of connected graphs with generic architectures, so that
claims of universal applicability can be supported for certain broad classes of problems.
The most celebrated example of this approach has been the class of "artificial neural net-
works" (ANN) with forward and/or recurrent (feedback) interconnections. The latter
types with recurrent interconnections have found certain specialized applications (e.g.,
the Hopfield-nets solution to the notorious "traveling salesman problem") and have made
(largely unsubstantiated) claims of affinity to biological neural networks. However, it is
fair to say that they have not lived up to their promise (until now) and their current use in
modeling applications is rather limited. The former types of ANN with forward intercon-
nections have found (and continue to find) numerous applications and have demonstrated
considerable utility in various fields. These types of ANN can be used for modeling pur-
poses by representing arbitrary input-output mappings, and they derive their scientific
pedigree from Hilbert's "13th Problem" and Kolmogorov's "representation theorem" of
the early part of the 20th century [Kolmogorov, 1957; Sprecher, 1972].
The fundamental mathematical problem concerns the mapping of a multivariate func-
tion onto a univariate function by means of a reference set of "activation functions" and
interconnection weights. Kolmogorov's constructive theorem provided theoretical impe-
tus to this effort, but the practical solution of this problem came through the methodolog-
ical evolution of the concept of a "perceptron," which was proposed by Rosenblatt
(1962). The field was further advanced through the pioneering work of Widrow, Gross-
berg, and the contributions of numerous others, leading to the burgeoning field of feedforward
ANN (for review see [Rumelhart & McClelland, 1986; Grossberg, 1988; Widrow
& Lehr, 1990; Haykin, 1994; Hassoun, 1995]).
The adjective "neural" is used for historical reasons, since some of the pioneering
work alluded to similarities with information processing in the central nervous system, a
point that remains conjectural and largely wishful rather than corroborated by real data in
a convincing manner (yet). Nonetheless, the mere allusion to analogies with natural
"thinking processes" seems to magnetize people's attention and to lower the threshold of
initial acceptance of related ideas. Although this practice was proven to offer promotional
advantages, the tenuous connection with reality and certain distaste for promotional hype
has led us to dispense with this adjective in the sequel and refer to this type of connection-
ist model as a "Volterra-equivalent network" (VEN).
For our purposes, the problem of nonlinear system modeling from input-output data
relates intimately to the problem of mapping multivariate functions onto a univariate
function, when the input-output data are discretized (sampled). Since the latter is always
the case in practice, we have explored the use of Volterra-equivalent network architec-
tures as an alternative approach to achieve nonlinear system modeling in a practical con-
text. We have found that certain architectures offer practical advantages in some cases, as
discussed below.

4.2.1 Equivalence between Connectionist and Volterra Models


We start by exploring the conditions for equivalence between feedforward connectionist
models and discrete Volterra models [Marmarelis, 1994c; Marmarelis & Zhao, 1994,
1997]. The latter generally represent a mapping of the input epoch vector x(n) = [x(n),

x(n - 1), ..., x(n - M + 1)]' onto the present scalar value of the output, y(n), where M is the
memory-bandwidth product of the system. This mapping of the [M x 1] input epoch vec-
tor onto the output scalar present value can be expressed in terms of the discrete Volterra
series expansion of Equation (2.32). On the other hand, this mapping can be implemented
by means of a feedforward network architecture that receives as input the input epoch
vector and generates as output the scalar value y(n). The general architecture of a "Volter-
ra-equivalent network" (VEN) is shown in Figure 4.33 and employs an input layer of M
units (introducing the input epoch values into the network) and two hidden layers of units
(that apply nonlinear transformations on weighted sums of the input values). The VEN
output is formed by the sum of the outputs of the hidden units of the second layer and an
offset value.
A more restricted but practically useful VEN class follows the general architecture of a
"three-layer perceptron" (TLP) with a "tapped-delay" input and a single hidden layer
(shown in Figure 4.34) that utilizes polynomial activation functions instead of the con-
ventional sigmoidal activation functions employed by TLP and other ANN. The output
may have its own nonlinearity (e.g., a hard threshold in the initial perceptron architecture
that generated binary outputs, consistent with the all-or-none data modality of action po-
tentials in the nervous system).

Figure 4.33 The general architecture of a "Volterra-equivalent network" (VEN) receiving the input
epoch and generating the corresponding system output after processing through two hidden layers.
Each hidden unit performs a polynomial transformation of the weighted sum of its inputs. Arbitrary
activation functions can be used and be approximated by polynomial expressions within the range of
interest. The output unit performs a simple summation of the outputs of the second hidden layer
(also called "interaction layer") and an output offset.
The TLP architecture shown in Figure 4.34 corresponds to the class of "separable
Volterra networks" (SVNs) whose basic operations are described below [Marmarelis &
Zhao, 1997]. The weights {wj,m} are used to form the input uj(n) into the nonlinear "acti-
vation function" fj of the jth hidden unit as

uj(n) = Σ_{m=0}^{M-1} wj,m x(n - m)    (4.98)

leading to the output of the jth hidden unit:

zj(n) = fj[uj(n)]    (4.99)

where the activation function fj is a static nonlinear function. For SVN or VEN models,
the activation functions are chosen to be polynomials:

fj(uj) = Σ_{q=1}^{Q} cj,q uj^q    (4.100)

although any analytic function, or a nonanalytic function approximated by polynomials (or
power series), can be used.

Figure 4.34 The single-layer architecture of the special class of VEN that corresponds to the "sep-
arable Volterra network" (SVN). The output unit is a simple adder. This network configuration is simi-
lar to the traditional "three-layer perceptron" (TLP), albeit with polynomial activation functions {fj} in
the hidden units, instead of the conventional sigmoidal activation functions used in TLP.
The SVN/VEN output unit is a simple adder (i.e., no output weights are necessary) that
sums the outputs {zj} of the hidden units and an offset y0 as

y(n) = y0 + Σ_{j=1}^{H} zj(n)    (4.101)

Combining Equations (4.98), (4.99), (4.100), and (4.101), we obtain the SVN/VEN
input-output relation:

y(n) = y0 + Σ_{q=1}^{Q} Σ_{j=1}^{H} cj,q Σ_{m1=0}^{M-1} ... Σ_{mq=0}^{M-1} wj,m1 ... wj,mq x(n - m1) ... x(n - mq)    (4.102)

which is isomorphic to the discrete Volterra model (DVM) of order Q as H tends to in-
finity. Equation (4.102) demonstrates the equivalence between SVN/VEN and DVM,
which is expected to hold in practice (with satisfactory approximation) even for finite H.
It has been found empirically that satisfactory DVM approximations can be obtained
with small H for many physiological systems.
It is evident that the discrete Volterra kernels can be evaluated by means of the
SVN/VEN parameters as

kq(m1, ..., mq) = Σ_{j=1}^{H} cj,q wj,m1 ... wj,mq    (4.103)

offering an alternative for Volterra kernel estimation through SVN/VEN training that has
proven to have certain practical advantages [Marmarelis & Zhao, 1997]. Naturally, the
same equivalence holds for the broader class of VEN models shown in Figure 4.33 as H
and I tend to infinity. However, the Volterra kernel expressions become more complicated
for the general form of the VEN model in Figure 4.33.
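The equivalence expressed by Equations (4.102) and (4.103) is easy to verify numerically. The sketch below uses random weights and coefficients with Q = 2 (sizes chosen only for illustration): it computes the SVN output directly through its hidden units, and again through the equivalent Volterra kernels of Equation (4.103), and checks that the two agree.

```python
import numpy as np

rng = np.random.default_rng(1)
M, H, Q, N = 4, 3, 2, 50      # illustrative sizes, second-order model (Q = 2)
W = rng.normal(size=(H, M))   # weights w_{j,m}
C = rng.normal(size=(H, Q))   # polynomial coefficients c_{j,q}
y0 = 0.5
x = rng.normal(size=N)

def svn_output(x, W, C, y0):
    # Eqs. (4.98)-(4.101): u_j = sum_m w_{j,m} x(n-m); y = y0 + sum_{j,q} c_{j,q} u_j^q
    y = np.full(len(x), y0)
    for n in range(W.shape[1] - 1, len(x)):
        epoch = x[n - W.shape[1] + 1: n + 1][::-1]   # [x(n), ..., x(n-M+1)]
        u = W @ epoch
        for q in range(C.shape[1]):
            y[n] += np.sum(C[:, q] * u ** (q + 1))
    return y

def volterra_output(x, W, C, y0):
    # kernels from Eq. (4.103), then the explicit Volterra sum of Eq. (4.102)
    k1 = np.einsum('j,jm->m', C[:, 0], W)
    k2 = np.einsum('j,ja,jb->ab', C[:, 1], W, W)
    y = np.full(len(x), y0)
    for n in range(W.shape[1] - 1, len(x)):
        epoch = x[n - W.shape[1] + 1: n + 1][::-1]
        y[n] += k1 @ epoch + epoch @ k2 @ epoch
    return y

assert np.allclose(svn_output(x, W, C, y0), volterra_output(x, W, C, y0))
```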
The use of polynomial activation functions in connection with SVN/VEN models
directly maintains the mathematical affinity with the Volterra models [Marmarelis,
1994c, 1997], and typically reduces the required number H of hidden units (relative to
perceptron-type models). This is demonstrated below by comparing the two classes of
models.
The activation functions used for conventional TLP architectures are selected to have
the sigmoidal shape of the "logistic" function:

Sj(uj) = 1 / {1 + exp[-λ(uj - θj)]}    (4.104)

or the "hyperbolic tangent" function:

Sj(uj) = {1 - exp[-λ(uj - θj)]} / {1 + exp[-λ(uj - θj)]}    (4.105)

depending on whether we want a unipolar (between 0 and 1) or a bipolar (between -1 and
+1) output, as uj tends to ±∞. The parameter λ defines the slope of this sigmoidal curve at
the inflection point uj = θj and is typically specified by the user in the conventional TLP
architectures (i.e., it is not estimated from the data). However, the offset parameter θj is
4.2 CONNECTIONIST MODELS 227

estimated from the data for each hidden unit separately, during the "training" process of
the TLP. Since the specific value of λ can affect the stability and the convergence rate of
the training algorithm (e.g., through error back-propagation discussed in Section 4.2.2),
we recommend that λ be trained along with the other network parameters (contrary to es-
tablished practice). As λ increases, the sigmoidal function approaches a "hard threshold"
at θj and can be used in connection with binary output variables. In the latter case, it is
necessary to include output weights and a hard threshold θ0 in the output unit as

y(n) = T0[Σ_{j=1}^{H} rj zj(n)]    (4.106)

where {rj} are the output weights and T0 denotes a hard-threshold operator (i.e., equal to 1
when its argument is greater than a threshold value θ0, and 0 otherwise).
In order to simplify the comparison and the study of equivalence between the SVN/VEN
and the TLP classes of models, we consider a TLP output unit without a threshold:

y(n) = Σ_{j=1}^{H} rj zj(n)    (4.107)

Then, combining Equations (4.98), (4.99), (4.104), and (4.107), we obtain the input-out-
put relation of the TLP model:

y(n) = Σ_{j=1}^{H} rj Sj[Σ_{m=0}^{M-1} wj,m x(n - m)]    (4.108)

which can be put in a DVM form by representing each sigmoidal activation function Sj
with its respective Taylor expansion:

Sj(uj) = Σ_{i=0}^{∞} ai,j(θj) uj^i(n)    (4.109)
where the Taylor expansion coefficients depend on the offset parameter θj of the sig-
moidal activation function (and on the slope parameter λ, if it is allowed to be trained). If
the selected activation function is not analytic (e.g., a hard-threshold function that does
not have a proper Taylor expansion), then a polynomial approximation of arbitrary accu-
racy can be obtained according to the Weierstrass theorem, following the method present-
ed in Appendix I.
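As a concrete illustration of such a polynomial approximation, the sketch below fits the logistic function of Equation (4.104) over a bounded range by least squares; the range [-3, 3], the degree 7, and the slope/offset values λ = 1, θ = 0 are arbitrary choices for the demonstration.

```python
import numpy as np

# Least-squares polynomial approximation of the logistic function over a bounded
# range, in the spirit of the Weierstrass theorem; range, degree, and the values
# lam = 1, theta = 0 are illustrative assumptions.
u = np.linspace(-3, 3, 201)
lam, theta = 1.0, 0.0
s = 1.0 / (1.0 + np.exp(-lam * (u - theta)))
coeffs = np.polyfit(u, s, deg=7)      # polynomial coefficients, highest power first
approx = np.polyval(coeffs, u)
max_err = float(np.max(np.abs(approx - s)))
assert max_err < 1e-2   # uniform error is small over the fitted range
```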
Combining Equations (4.108) and (4.109), we obtain the equivalent DVM:

y(n) = Σ_{i=0}^{∞} Σ_{j=1}^{H} rj ai,j(θj) Σ_{m1=0}^{M-1} ... Σ_{mi=0}^{M-1} wj,m1 ... wj,mi x(n - m1) ... x(n - mi)    (4.110)

for this class of TLP models, where the ith-order discrete Volterra kernel is given by

ki(m1, ..., mi) = Σ_{j=1}^{H} rj ai,j(θj) wj,m1 ... wj,mi    (4.111)

Therefore, an SVN/VEN or a TLP model has an equivalent DVM whose kernels are de-
fined by Equation (4.103) or (4.111), respectively. The possible presence of an activation
function at the output unit (e.g., a hard threshold) does not alter this fundamental fact, but it
makes the analytical expressions for the equivalent Volterra kernels more complicated. The
fundamental question remains as to the relative efficiency of an equivalent SVN/VEN or
TLP representation of a given DVM. This question can be answered by considering the to-
tal number of free parameters for each model type that yields the same approximation of the
kernel values for a Qth-order DVM. It has been found that the TLP model generally
requires a much larger number of hidden units and, therefore, more free parameters.
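Equation (4.111) can be checked numerically for a small example. The sketch below uses tanh activations centered at θj = 0, whose low-order Taylor coefficients are a1 = 1 and a3 = -1/3 (even-order terms vanish), and verifies that the truncated equivalent DVM matches the TLP output for a small-amplitude input epoch; all sizes and weights are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(5)
M, H = 3, 4                      # illustrative sizes
W = rng.normal(size=(H, M))      # weights w_{j,m}
r = rng.normal(size=H)           # output weights r_j

# tanh(u) = u - u^3/3 + ..., so a_1 = 1 and a_3 = -1/3 in Eq. (4.109) (theta_j = 0)
a1, a3 = 1.0, -1.0 / 3.0
k1 = np.einsum('j,jm->m', r * a1, W)                 # Eq. (4.111), i = 1
k3 = np.einsum('j,ja,jb,jc->abc', r * a3, W, W, W)   # Eq. (4.111), i = 3

# For a small input epoch, the truncated DVM should match the TLP output closely
x = 1e-3 * rng.normal(size=M)
y_tlp = np.sum(r * np.tanh(W @ x))
y_dvm = k1 @ x + np.einsum('abc,a,b,c->', k3, x, x, x)
assert abs(y_tlp - y_dvm) < 1e-6
```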
This important point can be elucidated geometrically by introducing a hard threshold
at the output of both models and letting λ → ∞ in the sigmoidal activation functions of the
TLP, so that each Sj becomes a hard-threshold operator Tj(uj - θj). Furthermore, to facili-
tate the demonstration, consider the simple case of Q = 2 and M = 2, where the input-out-
put relation before the application of the output threshold is simply given by

y(n) = x^2(n) + x^2(n - 1)    (4.112)

Application of a hard threshold θ0 = 1 at the output of this DVM yields a circular binary
boundary defining the input-output relation, as shown in Figure 4.35. To approximate
this input-output relation with a TLP architecture, we need to use a very large number H
of hidden units, since each hidden unit yields a rectilinear segment after network training

Figure 4.35 Illustrative example of a circular output "trigger boundary" (solid line) being approxi-
mated by a three-layer perceptron (TLP) with three hidden units defining the piecewise rectilinear
(triangular) approximation of the "trigger boundary" marked by the dotted lines. The training set is
generated by 500 data points of uniform white-noise input that lies within the square domain demar-
cated by dashed lines. The piecewise rectilinear approximation improves with increasing number of
hidden units of the TLP, assuming polygonal form and approaching asymptotically more precise rep-
resentations of the circular boundary. Nonetheless, a VEN with two hidden units having quadratic ac-
tivation functions yields a precise and parsimonious representation of the circular boundary.
4.2 CONNECTIONIST MODELS 229

for the best mean-square approximation of the output, according to the binary output ap-
proximation:

Σ_{j=1}^{H} rj Tj(uj - θj) ≷ θ0    (4.113)

where Tj(uj - θj) = 1 when wj,0 x(n) + wj,1 x(n - 1) ≥ θj; otherwise Tj is zero. Thus, the linear
equation

wj,0 x(n) + wj,1 x(n - 1) = θj    (4.114)

defines each rectilinear segment of the output approximation due to the jth hidden unit,
and the resulting TLP polygonal approximation is defined by the output Equation (4.113).
For instance, if H is only 3, then the TLP triangular approximation shown in Figure 4.35
with a dotted line results. If the training of the TLP is perfect, then the nonoverlapping area is min-
imized in a mean-square sense and the resulting polygonal approximation is canonical
(i.e., symmetric for a symmetric boundary). This, of course, is seldom the case in practice
because the TLP training is imperfect due to noise in the data or incomplete training con-
vergence, and an approximation similar to the one depicted in Figure 4.35 typically
emerges. Naturally, as H increases, the polygonal TLP approximation of the circle im-
proves, reaching asymptotically a perfect representation when H → ∞, if the training
data set is noise-free and fully representative of the actual input ensemble of the system.
Note, however, that a perfect SVN/VEN representation of this input-output relation (for
noise-free data) requires only two hidden units with quadratic activation functions!
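This claim is easy to verify directly. In the sketch below (with illustrative weights), the first hidden unit receives x(n) and the second receives x(n - 1), each with a quadratic activation f(u) = u², so the adder output reproduces Equation (4.112) exactly and the thresholded network recovers the circular trigger boundary without error.

```python
import numpy as np

rng = np.random.default_rng(2)

# Two hidden units with quadratic activations: unit 1 has weights [1, 0] (sees
# x(n)), unit 2 has weights [0, 1] (sees x(n-1)); f(u) = u^2, simple adder output.
W = np.array([[1.0, 0.0], [0.0, 1.0]])
pts = rng.uniform(-2, 2, size=(500, 2))               # samples of [x(n), x(n-1)]
u = pts @ W.T
y = np.sum(u ** 2, axis=1)                            # y = x^2(n) + x^2(n-1), Eq. (4.112)
inside_true = pts[:, 0] ** 2 + pts[:, 1] ** 2 < 1.0   # circular boundary, theta0 = 1
inside_net = y < 1.0
assert np.array_equal(inside_true, inside_net)        # boundary reproduced exactly
```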
This simple illustrative example underscores a very important point that forms the foun-
dation for understanding the relative efficacy of SVN/VEN and TLP models in represent-
ing nonlinear dynamic input-output relationships/mappings. The key point is that the use
of sigmoidal activation functions unduly constrains the ability of the network to represent
a broad variety of input-output mappings with a small number of hidden units. This is
true even when multiple hidden layers are used, because the aforementioned rectilinear
constraint remains in force. On the other hand, if polynomial activation functions are
used, then a much broader variety of input-output mappings can be represented by a
small number ofhidden units.
This potential parsimony in the complexity of the required network architecture (in
terms of the number of hidden units) is of critical importance in practice, because the
complexity of the modeling task is directly related to the number of hidden units (both in
terms of estimation and interpretation). This holds true for single or multiple hidden lay-
ers. However, the use of polynomial activation functions may give rise to some additional
problems in the network training process by introducing more local minima during mini-
mization of the cost function. It is also contrary to Kolmogorov's constructive ap-
proach in his representation theorem (requiring monotonic activation functions), which re-
tains a degree of reverence within the peer community. With all due respect to
Kolmogorov's seminal contributions, it is claimed herein that nonmonotonic activation
functions (such as polynomials) offer, on balance, a more efficient approach to the prob-
lem of modeling arbitrary input-output mappings with feedforward network architec-
tures.
This proposition gives rise to a new class of feedforward Volterra-equivalent network
architectures that employ polynomial (or, generally, nonmonotonic) activation functions
and can be efficient models of nonlinear input-output mappings. Note that this is consis-
tent with Gabor's proposition of a "universal" input-output model, akin in form to a dis-
crete Volterra model [Eykhoff, 1974].

Relation with PDM Modeling. If a feedforward network architecture with polyno-
mial activation functions is shown to be appropriate for a certain system, then the num-
ber of hidden units in the first hidden layer defines the number of PDMs of this system.
This is easily proven for the single hidden-layer SVN/VEN models shown in Figure
4.34 by considering as the jth PDM output the "internal variable" uj(n) of the jth hidden
unit, given by the convolution of Equation (4.98), where the jth PDM pj(m) is defined by
the respective weights {wj,m} that form a discrete "impulse response function." This in-
ternal variable uj(n) is subsequently transformed by the polynomial activation function
fj(uj) to generate the output of the jth hidden unit zj(n) according to Equation (4.100),
where the polynomial coefficients {cj,q} are estimated from the data during SVN/VEN
training.
It is evident that the equivalent Volterra kernels of the SVN/VEN model are given
by

kq(m1, ..., mq) = Σ_{j=1}^{H} cj,q pj(m1) ... pj(mq)    (4.115)

Therefore, this network model corresponds to the case of a "separable PDM model" where the static nonlinearity associated with the PDMs can be "separated" into individual polynomial nonlinearities corresponding to each PDM and defined by the respective activation functions, as shown in Figure 4.34. This separable PDM model can be viewed as a special case of the general PDM model of Figure 4.1 and corresponds to the "separable Volterra network" (SVN) architecture. The PDMs capture the system dynamics in a most efficient manner (although other filterbanks can also be used) and the nonlinearities may or may not be separable. In the latter case, the general VEN model of Figure 4.33 must be used to represent the general nonseparable PDM model of Figure 4.1.
The SVN architecture is obviously very convenient in practice but cannot be proposed as having general applicability. Even though it has been found to be appropriate for many actual applications to date (see Chapter 6), the general VEN model would require multiple hidden layers to represent static nonlinearities of arbitrary complexity in the PDM model. The additional hidden layers may incorporate other analytic functions (such as sigmoidal, Gaussian, etc.), although polynomial functions would directly yield the familiar multinomial form of the modified Volterra model with cross-terms.
The relation of PDM models with the VEN architectures also points to the relation with the modified discrete Volterra (MDV) models that employ kernel expansions on selected bases. In this case, the input can be viewed as being preprocessed through the respective filterbank prior to weighting and processing by the hidden layer, resulting in the architecture of Figure 4.36. The only difference between this and the previous case of the VEN model shown in Figure 4.33 is that the internal variables of the first hidden layer are now weighted sums of the filterbank outputs {vl(n)}:

uj(n) = Σ_{l=1}^{L} wj,l vl(n)    (4.116)
4.2 CONNECTIONIST MODELS 231

[Figure 4.36: block diagram — input x(n) → filterbank {b1, ..., bL} → hidden layer → interaction layer → output y(n).]

Figure 4.36 The VEN model architecture for input preprocessing by the filterbank {bl}.

instead of the weighted sum of the input lags shown in Equation (4.98). The critical point is that when L ≪ M, VEN model parsimony results. By definition, the use of the PDMs in the filterbank yields the most parsimonious VEN model (minimum L).
In order to establish a clear terminology, we will use the term SVN for the VEN model of Figure 4.34 with a single hidden layer and polynomial activation functions, and the term TLP when the activation functions are sigmoidal. Note that the VEN model can generally have multiple hidden layers to represent arbitrary nonlinearities (not necessarily separable), as shown in Figure 4.33 and Figure 4.36.
The important practical issue of how we determine the appropriate memory-bandwidth product M and the degree of polynomial nonlinearity Q in the activation functions of the SVN model is addressed by preliminary experiments discussed in Section 5.2. For instance, the degree of polynomial nonlinearity can be established by preliminary testing of the system with sinusoidal inputs and subsequent determination of the highest harmonic in the output via the discrete Fourier transform, or by varying the power level of a white-noise input and fitting the resulting output variance to a polynomial expression. Another critical issue, the number of hidden units, can be determined by successive trials in ascending order and application of the statistical criterion of Section 2.3.1 to the resulting reduction of residual variance. Note that the input weights in the SVN model are normalized to unity Euclidean norm for each hidden unit, so that the polynomial coefficients give a direct measure of the relative importance of each hidden unit.
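The sinusoidal-testing idea can be sketched as follows in Python/NumPy: drive the system with a sinusoid at frequency f0 and find the highest output harmonic that rises above a small fraction of the fundamental. The function name and the 1% relative threshold are our own illustrative choices:

```python
import numpy as np

def polynomial_degree_estimate(y, f0, fs, rel_threshold=0.01):
    """Estimate the degree Q of a polynomial nonlinearity from the steady-state
    response y to a sinusoidal input at frequency f0 (sampling rate fs):
    an order-Q nonlinearity generates harmonics no higher than Q*f0."""
    N = len(y)
    spectrum = np.abs(np.fft.rfft(y - np.mean(y)))
    bin0 = int(round(f0 * N / fs))       # DFT bin of the fundamental
    fundamental = spectrum[bin0]
    degree = 1
    k = 2
    while k * bin0 < len(spectrum):
        if spectrum[k * bin0] > rel_threshold * fundamental:
            degree = k                   # highest significant harmonic so far
        k += 1
    return degree
```

For example, a cubic system y = x^3 driven at f0 produces energy only at f0 and 3·f0, so the estimate is 3; recording an integer number of input cycles avoids spectral leakage.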

Illustrative examples are given below for a second-order and an infinite-order simulated system with two PDMs [Marmarelis, 1997; Marmarelis & Zhao, 1997].

Illustrative Examples. First we consider a second-order Volterra system with memory-bandwidth product M = 25, having the first-order kernel shown in Figure 4.37 with a solid line and a second-order kernel similar to the one shown in the top panel of Figure 4.38. This system is simulated using a uniform white-noise input of 500 data points. We estimate the first-order and second-order Volterra kernels of this system using TLP and SVN models, as well as LET, which was introduced in Section 2.3.2 to improve Volterra kernel estimation by use of Laguerre expansions of the kernels and least-squares estimation of the expansion coefficients [Marmarelis, 1993]. In the noise-free case, the LET and SVN approaches yield precise first-order and second-order Volterra kernel estimates, although at considerably different computational cost (LET is about 20 times faster than SVN in this case). Note that the LET approach requires five discrete Laguerre functions (DLFs) in this example (i.e., 21 free parameters need to be estimated), while the SVN approach needs only one hidden unit with a second-degree activation function (resulting in 28 free parameters).
As expected, the TLP model requires more free parameters in this example (i.e., more hidden units) and its predictive accuracy is rather inferior, although it incrementally improves with an increasing number H of hidden units. This incremental improvement gradually diminishes because of the finite data record. Since the computational burden for net-

[Figure 4.37 plot: "1st-ORDER KERNELS FOR NOISY CASE (SNR = 0 dB)"; kernel amplitude (approximately -0.2 to 0.42) versus TIME LAG (0 to 25).]

Figure 4.37 The exact first-order Volterra kernel (solid line) and the three estimates obtained in the noisy case (SNR = 0 dB) via LET (dashed line), SVN (dot-dashed line), and TLP (dotted line). The LET estimate is the best in this example, followed closely by the SVN estimate in terms of accuracy. The TLP estimate (obtained with four hidden units) is the worst in accuracy and computationally most demanding [Marmarelis & Zhao, 1997].

[Figure 4.38: three panels, (a), (b), and (c), showing second-order kernel estimate surfaces.]

Figure 4.38 The second-order Volterra kernel estimates obtained in the noisy case (SNR = 0 dB) via (a) LET, (b) SVN, and (c) TLP. The relative performance of the three methods is the same as described in the case of first-order kernels (see caption of Figure 4.37) [Marmarelis & Zhao, 1997].

work training increases with increasing H, we are faced with an important trade-off: incremental improvement in accuracy versus additional computational burden. By varying H, we determine a reasonable compromise for a TLP model with four hidden units, where the number of free parameters is 112 and the required training time is about 20 times longer than SVN (or 400 times longer than LET). The resulting TLP kernel estimates are not as accurate as their SVN or LET counterparts, as illustrated in Figures 4.37 and 4.38 for the first-order and second-order kernels, respectively, for a signal-to-noise ratio of 0 dB in the output data (i.e., the output-additive independent GWN variance is equal to the noise-free, de-meaned output mean-square value). Note that the SVN training required 200 iterations in this example versus 2000 iterations required for TLP training. Thus, SVN appears to be preferable to TLP in terms of accuracy and computational effort in this example of a second-order Volterra system.
The obtained Volterra kernel estimates via the three methods (LET, SVN, TLP) demonstrate that the LET estimates are the most accurate and quickest to obtain, followed by the SVN estimates in terms of accuracy and computation, although SVN requires longer computing time (by a factor of 20). The TLP estimates are clearly inferior to either the LET or SVN estimates in this example and require longer computing time (about 20 times longer than SVN for H = 4). These results demonstrate the considerable benefits of using SVN configurations instead of TLP for Volterra system modeling purposes, although there may be some cases in which the TLP configuration has a natural advantage (e.g., systems with sigmoidal output nonlinearities).
Although LET appears to yield the best kernel estimates, its application is practically limited to low-order kernels (up to third) and, therefore, it is the preferred method only for systems with low-order nonlinearities. On the other hand, SVN offers not only an attractive alternative for low-order kernel estimation and modeling, but also a unique practical solution when the system nonlinearities are of high order. The latter constitutes the primary motivation for introducing the SVN configuration for nonlinear system modeling.
To demonstrate the efficacy of SVN modeling for high-order systems, we consider an infinite-order nonlinear system described by the output equation

y = (v1^2 + 0.8 v2^2 - 0.6 v1 v2) sin[(v1 + v2)/5]    (4.117)

where the sine function can be expressed as a Taylor series expansion, and the "internal" variables (v1, v2) are given by the difference equations:

v1(n) = 1.2 v1(n - 1) - 0.6 v1(n - 2) + 0.5 x(n - 1)    (4.118)

v2(n) = 1.8 v2(n - 1) - 1.1 v2(n - 2) + 0.2 x(n - 3) + 0.1 x(n - 1) + 0.1 x(n - 2)    (4.119)
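This simulated system can be generated directly from Equations (4.117)-(4.119), as in the Python/NumPy sketch below; we assume zero initial conditions for the internal variables and the input history, and transcribe the coefficients as printed above:

```python
import numpy as np

def simulate_system(x):
    """Simulate the infinite-order test system of Eqs. (4.117)-(4.119),
    with zero initial conditions for the internal variables and the input."""
    N = len(x)
    v1 = np.zeros(N + 2)                      # two-sample pad for v(n-1), v(n-2)
    v2 = np.zeros(N + 2)
    xp = np.concatenate([np.zeros(3), x])     # three-sample pad for x(n-3)
    for n in range(N):
        i, k = n + 2, n + 3                   # indices into the padded arrays
        v1[i] = 1.2 * v1[i - 1] - 0.6 * v1[i - 2] + 0.5 * xp[k - 1]     # Eq. (4.118)
        v2[i] = (1.8 * v2[i - 1] - 1.1 * v2[i - 2] + 0.2 * xp[k - 3]
                 + 0.1 * xp[k - 1] + 0.1 * xp[k - 2])                   # Eq. (4.119)
    v1, v2 = v1[2:], v2[2:]
    # output equation, Eq. (4.117)
    return (v1 ** 2 + 0.8 * v2 ** 2 - 0.6 * v1 * v2) * np.sin((v1 + v2) / 5.0)
```

For instance, an impulse input x = (1, 0, 0, ...) gives v1(1) = 0.5 and v2(1) = 0.1, so y(1) = (0.25 + 0.008 - 0.03) sin(0.12).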

The discrete-time input signal x(n) is chosen in this simulation to be a 1024-point segment of GWN with unit variance. Use of LET with six DLFs to estimate the truncated second-order and third-order Volterra models yields output predictions with normalized mean-square errors (NMSEs) of 47.2% and 34.7%, respectively. Note that the obtained kernel estimates are seriously biased because of the presence of higher-order terms in the output equation, which are treated by LET as correlated residuals in least-squares estimation. Use of the SVN approach (employing five hidden units of seventh degree) yields a model of improved prediction accuracy (NMSE = 6.1%) and mitigates the problem of kernel estimation bias by allowing estimation of nonlinear terms up to seventh order. Note that, although the selected system is of infinite order, the higher-order Volterra kernels are of gradually diminishing size, consistent with the Taylor series expansion of the sine function. Training of a TLP model with these data yields less prediction accuracy than the SVN model for comparable numbers of hidden units. For instance, a TLP model with H = 8 yields an output prediction NMSE of 10.3% (an error that can be gradually, but slowly, reduced by increasing the number of hidden units) corresponding to 216 free parameters, which compares with 156 free parameters for the aforementioned SVN model with H = 5 and Q = 7 that yields an output prediction NMSE of 6.1%.

4.2.2 Volterra-Equivalent Network Architectures for Nonlinear System Modeling
This section discusses the basic principles and methods that govern the use of Volterra-
equivalent network (VEN) architectures for nonlinear system modeling. The previously
established Volterra modeling framework will remain the mathematical foundation for
evaluating the performance of alternative network architectures. The key principles that
will be followed are

1. The network architecture must retain equivalence to the Volterra class of models.
2. Generality and parsimony will be sought, so that the model is compact but not unduly constrained.

The study will be limited here to feedforward network architectures of single-input/single-output models for which broadband time-series data are available. The case of multiple inputs and outputs, as well as the case of autoregressive models with recurrent network connections, will be discussed in Chapters 7 and 10, respectively. The underlying physiological system is assumed to be stationary and to belong to the Volterra class. The nonstationary case will be discussed in Chapter 9.
The network architectures considered herein may have multiple hidden layers and arbitrary activation functions (as long as the latter can be expressed or approximated by polynomials or Taylor series expansions in order to maintain equivalence with the Volterra class of models). For simplicity of representation, all considered networks will have a filterbank for input preprocessing, which can be replaced by the basis of sampling functions if no preprocessing is desired. In general, the selection of the filterbank will be assumed "judicious" in order to yield compact representations of the system kernels (see Section 2.3.1). Although the filterbank may incorporate trainable parameters (see the next section on the Laguerre-Volterra network), this will not be a consideration here. With these stipulations in mind, we consider the general VEN architecture shown in Figure 4.36 that has two hidden layers, {fj} and {gi}, and a filterbank {bl} for input convolutional preprocessing according to

vl(n) = Σ_{m=0}^{M-1} bl(m) x(n - m)    (4.120)

The first hidden layer has H units with activation functions {fj} transforming the internal variables

uj(n) = Σ_{l=1}^{L} wj,l vl(n)    (4.121)

into the hidden unit output

zj(n) = fj[uj(n)] = Σ_{q=1}^{Q} cj,q uj^q(n)    (4.122)

The outputs {zj(n)} of the first hidden layer are the inputs to the second hidden layer (also termed the "interaction layer") that has I units with activation functions {gi} transforming the ith internal variable

φi(n) = Σ_{j=1}^{H} ρi,j zj(n)    (4.123)

into the ith interaction unit output

ψi(n) = gi[φi(n)] = Σ_{r=1}^{R} γi,r φi^r(n)    (4.124)

Note that R and/or Q may tend to infinity if the activation function is expressed as a Taylor series expansion. Therefore, activation functions other than polynomials (e.g., sigmoidal, exponential, sinusoidal) are admissible under this network architecture (note that monotonicity is not a requirement, in contrast to the conventional approach). For instance, a sensible choice might involve polynomial activation functions in the first hidden layer (for the reason expounded in the previous section), cascaded with sigmoidal activation functions to secure stability of the model output, as discussed in Section 4.4.
It is evident that the presence of a second hidden layer distinguishes this architecture from the separable Volterra networks (SVN) discussed in the previous section and endows it with broader applicability. However, as previously for the SVN model, the "principal dynamic modes" (PDMs) corresponding to this VEN model architecture remain the equivalent filters generating the internal variables uj(n) of the first hidden layer, i.e., the jth PDM is

pj(m) = Σ_{l=1}^{L} wj,l bl(m)    (4.125)

and the PDM outputs

uj(n) = Σ_{m=0}^{M-1} pj(m) x(n - m)    (4.126)

are fed into a multiinput static nonlinearity that maps the H hidden-unit outputs {z1(n), ..., zH(n)} onto the VEN output y(n) after transformation through the interaction layer. Thus, the nonseparable nonlinearity of the equivalent PDM model is represented by the cascaded operations of the hidden and interaction layers, yielding the input-output relation

y(n) = y0 + Σ_{i=1}^{I} gi{ Σ_{j=1}^{H} ρi,j fj[ Σ_{l=1}^{L} wj,l Σ_{m=0}^{M-1} bl(m) x(n - m) ] }    (4.127)
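For concreteness, a forward pass through this VEN architecture, Equations (4.120)-(4.124) assembled as in Equation (4.127), can be sketched in Python/NumPy as follows; the array shapes and names are our own conventions:

```python
import numpy as np

def ven_forward(x, b, W, rho, C, G, y0=0.0):
    """Output of the VEN of Figure 4.36 for an input record x, per Eq. (4.127).
    b   : (L, M) filterbank impulse responses b_l(m)
    W   : (H, L) first-hidden-layer weights w_{j,l}
    C   : (H, Q) polynomial coefficients c_{j,q} of the activations f_j
    rho : (I, H) interaction-layer weights rho_{i,j}
    G   : (I, R) polynomial coefficients gamma_{i,r} of the activations g_i
    """
    L, M = b.shape
    N = len(x)
    v = np.array([np.convolve(x, b[l])[:N] for l in range(L)])    # Eq. (4.120)
    u = W @ v                                                     # Eq. (4.121)
    Q = C.shape[1]
    z = np.einsum('jq,qjn->jn', C,
                  np.stack([u ** (q + 1) for q in range(Q)]))     # Eq. (4.122)
    phi = rho @ z                                                 # Eq. (4.123)
    R = G.shape[1]
    psi = np.einsum('ir,rin->in', G,
                    np.stack([phi ** (r + 1) for r in range(R)])) # Eq. (4.124)
    return y0 + psi.sum(axis=0)                                   # Eq. (4.127)
```

With a trivial one-tap filterbank and unit weights throughout, the network reduces to the identity map, which is a convenient sanity check on the indexing.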

which provides guidance for the application of the chain rule of differentiation for network training through the error back-propagation method (see below). When the quadratic cost function

J(n) = (1/2) ε²(n)    (4.128)

is sought to be minimized for all n in the training set of data, where

ε(n) = y(n) - ȳ(n)    (4.129)

is the output prediction error (ȳ denotes the output measurements), we need to evaluate the gradient of the cost function with respect to all network parameters over the network parameter space, which is composed of the weights {wj,l} and {ρi,j}, as well as the parameters of the activation functions {fj} and {gi}. For instance, the gradient component with respect to the weight wk,s is

∂J(n)/∂wk,s = ε(n) ∂ε(n)/∂wk,s    (4.130)

By application of the chain rule of differentiation, we have

∂ε(n)/∂wk,s = ∂y(n)/∂wk,s = Σ_{i=1}^{I} gi′[φi(n)] ∂φi(n)/∂wk,s
            = Σ_{i=1}^{I} gi′[φi(n)] Σ_{j=1}^{H} ρi,j fj′[uj(n)] ∂uj(n)/∂wk,s
            = Σ_{i=1}^{I} gi′[φi(n)] ρi,k fk′[uk(n)] vs(n)    (4.131)

where f′ and g′ denote the derivatives of f and g, respectively. These gradient components are evaluated, of course, for the current parameter values that are continuously updated through the training procedure. For instance, the value of the weight wk,s is updated at the ith iteration as

wk,s^(i+1) = wk,s^(i) - γw [∂ε(n)/∂wk,s]^(i) ε^(i)(n)    (4.132)

where the gradient component is given by Equation (4.131), γw denotes the "training step" or "learning constant" for the weights {wj,l}, and the superscript (i) denotes quantities evaluated for the ith-iteration parameter values. The update schemes that are based on local gradient information usually employ a "momentum" term that reduces the random variability from iteration to iteration by performing first-order low-pass filtering (exponentially weighted smoothing).
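The gradient expression of Equation (4.131) is straightforward to implement and to verify numerically. The Python/NumPy sketch below (our own naming conventions) computes, at a single time instant, the derivative of the network output with respect to every first-layer weight w_{k,s} for polynomial activations in both layers:

```python
import numpy as np

def ven_out(v, W, rho, C, G):
    """Network output at one time instant, given the filterbank outputs v (L,)."""
    u = W @ v
    z = np.sum(C * np.stack([u ** (q + 1) for q in range(C.shape[1])], axis=1), axis=1)
    phi = rho @ z
    psi = np.sum(G * np.stack([phi ** (r + 1) for r in range(G.shape[1])], axis=1), axis=1)
    return psi.sum()

def grad_w(v, W, rho, C, G):
    """Eq. (4.131): dy/dw_{k,s} = sum_i g_i'[phi_i] rho_{i,k} f_k'[u_k] v_s."""
    u = W @ v
    z = np.sum(C * np.stack([u ** (q + 1) for q in range(C.shape[1])], axis=1), axis=1)
    phi = rho @ z
    # derivatives of the polynomial activations f_j and g_i
    fprime = np.sum(C * np.stack([(q + 1) * u ** q for q in range(C.shape[1])], axis=1), axis=1)
    gprime = np.sum(G * np.stack([(r + 1) * phi ** r for r in range(G.shape[1])], axis=1), axis=1)
    return np.outer((gprime @ rho) * fprime, v)   # (H, L) matrix of dy/dw_{k,s}
```

A finite-difference perturbation of any single weight reproduces the analytic gradient, which is a useful sanity check before coding the full iterative update of Equation (4.132).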
Analogous expressions can be developed for the other network parameters using the chain rule of differentiation. A specific example is given in the following section for the Laguerre-Volterra network, which has been used extensively in actual applications to date. In this section, we will concentrate on three key issues:

1. Equivalence with Volterra kernels/models
2. Selection of the structural parameters of the network model
3. Convergence and accuracy of the training procedure
3. Convergence and accuracy ofthe training procedure

Note that the training of the network is based on the training dataset (using either single error/residual points or summing many squared residuals in batch form); however, the cost function computation is based on the testing dataset (different residuals than the training dataset), which has been randomly selected according to the method described later in order to reduce possible correlations among the residuals. Note also that, if the actual prediction errors are not Gaussian, then a nonquadratic cost function can be used to attain efficient estimates of the network parameters (i.e., minimum estimation variance). The appropriate cost function in this case is determined by the minus log-likelihood function of the actual prediction errors, as described in Section 2.1.5.

Equivalence with Volterra Kernels/Models. The input-output relation of the VEN model shown in Figure 4.36 is given by Equation (4.127). The equivalent Volterra model, when the activation functions are expressed either as polynomials or as Taylor series expansions, yields the Volterra kernel expressions

k0 = y0    (4.133)

k1(m) = Σ_{i=1}^{I} γi,1 Σ_{j=1}^{H} ρi,j cj,1 Σ_{l=1}^{L} wj,l bl(m)    (4.134)

k2(m1, m2) = Σ_{i=1}^{I} γi,1 Σ_{j=1}^{H} ρi,j cj,2 Σ_{l1=1}^{L} Σ_{l2=1}^{L} wj,l1 wj,l2 bl1(m1) bl2(m2)
           + Σ_{i=1}^{I} γi,2 Σ_{j1=1}^{H} Σ_{j2=1}^{H} ρi,j1 ρi,j2 cj1,1 cj2,1 Σ_{l1=1}^{L} Σ_{l2=1}^{L} wj1,l1 wj2,l2 bl1(m1) bl2(m2)    (4.135)

The expressions for the higher-order kernels grow more complicated but are not needed in practice, since the interpretation of high-order nonlinearities will rely on the PDM model form and not on individual kernels. It is evident that the complete representation of the general Volterra system will require an infinite number of hidden units and filterbank basis functions. However, we posit that, for most physiological systems, finite numbers of L, H, and I will provide satisfactory model approximations. The same is posited for the order of nonlinearity, which is determined by the product (QR) in the network of Figure 4.36.
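For example, the first-order kernel of Equation (4.134) collapses into a few matrix products, as in the following Python/NumPy sketch (array conventions are our own):

```python
import numpy as np

def first_order_kernel(b, W, rho, C, G):
    """Eq. (4.134): k1(m) = sum_i gamma_{i,1} sum_j rho_{i,j} c_{j,1}
                             sum_l w_{j,l} b_l(m).
    b: (L, M) filterbank; W: (H, L) weights; rho: (I, H) weights;
    C[:, 0] holds c_{j,1}; G[:, 0] holds gamma_{i,1}."""
    p = W @ b                             # PDMs: p_j(m) = sum_l w_{j,l} b_l(m)
    weight_j = (G[:, 0] @ rho) * C[:, 0]  # sum_i gamma_{i,1} rho_{i,j}, times c_{j,1}
    return weight_j @ p                   # (M,) first-order kernel
```

Note that the inner sum over l is exactly the jth PDM of Equation (4.125), so k1(m) is simply a weighted sum of the PDMs.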

Selection of the Structural Parameters of the VEN Model. The selection of the structural parameters (L, H, Q, I, R) that define the architecture of the VEN model in Figure 4.36 is a very crucial matter, because it determines the ability of the network structure to approximate the function of the actual physiological system (with regard to the input-output mapping) for properly selected parameter values and for a broad ensemble of inputs. It should be clearly understood that the ability of a given network model to achieve a satisfactory approximation of the input-output mapping (with properly selected parameter values) is critically constrained by the selected network structure.
In the case of the VEN models, this selection task is as formidable as it is crucial, because some of the model parameters enter nonlinearly in the estimation process, unlike the case of the discrete Volterra model, where the parameters enter linearly and a model-order selection criterion can be rigorously applied (see Section 2.3.1). Therefore, even if one assumes that proper convergence can be achieved in the iterative cost-minimization procedure (discussed below), the issue of rigorously assessing the significance of residual reduction with increasing model complexity remains a formidable challenge.
To address this issue, we establish the following guiding principles:

1. The assessment of significance of residual reduction must be statistical, since the data-contaminating noise/interference is expected to be stochastic (at least in part).
2. The simplest measure of residual reduction is the change in the sum of the squared residuals (SSR) as the model complexity increases (i.e., for increasing values of L, H, Q, I, R).
3. Proceeding in ascending model complexity, a statistical-hypothesis test is performed at each step, based on the "null hypothesis" that the current model structure is the right one and examining the residual reduction in the next step (i.e., the next model order) using a statistical criterion constructed under the null hypothesis.
4. To maximize the statistical independence of the model residuals used for the SSR computation (a fact that simplifies the construction of the statistical criterion by assuming whiteness of these residuals), we evaluate the SSR from randomly selected data points of the output (the "testing dataset") while using the remaining output data points for network training (the "training dataset").
5. The statistics of the residuals used for the SSR computation are assumed to be approximately Gaussian, in order to simplify the statistical derivations and justify the use of a quadratic cost function.

Based on these principles, we examine a sequence of network model structures {Sk} in ascending order of complexity, starting with L = H = Q = I = R = 1 and incrementing each structural parameter sequentially in the presented rightward order (i.e., first we increment L all the way to Lmax and then increment H, etc.). At the kth step, the network structure Sk is trained with the "training dataset" and the resulting SSR Jk is computed from the "testing dataset." Because the residuals are assumed Gaussian and white (see principles 4 and 5 above), Jk follows a chi-square distribution with degrees of freedom equal to the size of the "testing dataset" minus the number of free parameters in Sk. Subsequently, an F statistic can be used to test the ratio Jk/Jk+1 against a statistical threshold for a specified level of confidence. If the threshold is not exceeded, then the null hypothesis is accepted and the network structure Sk is anointed the "right one" for this system; otherwise, the statistical testing procedure continues with the next network structure of higher complexity.
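This ascending-complexity search can be sketched as follows in Python; the names are our own, and the F-distribution critical value is passed in precomputed (e.g., from tables or scipy.stats.f.ppf):

```python
def select_structure(ssr, dof, f_critical):
    """Sequential structure selection from a testing-set SSR sequence.
    ssr[k] : SSR J_k of candidate structure S_k on the testing set
    dof[k] : degrees of freedom (testing-set size minus free parameters of S_k)
    f_critical : threshold for the F ratio at the chosen confidence level
    Returns the index k of the first structure whose successor does not
    reduce the residuals significantly (the "null hypothesis" is kept)."""
    for k in range(len(ssr) - 1):
        f_ratio = (ssr[k] / dof[k]) / (ssr[k + 1] / dof[k + 1])
        if f_ratio < f_critical:
            return k        # insignificant reduction: keep S_k
    return len(ssr) - 1     # every step was significant: keep the largest model
```

In a typical SSR sequence, the ratio is large for the first few steps (real structure being captured) and then hovers near 1 once added complexity only chases noise.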
This selection procedure appears straightforward but is subject to various pitfalls, rooted primarily in the stated assumptions regarding the nature of the actual residuals and their interrelationship with the specific input data used in this procedure. It is evident that the pitfalls are minimized when the input data are close to band-limited white noise (covering the entire bandwidth and dynamic range of the system) and the actual residuals are truly white and Gaussian, as well as statistically independent from both the input and the output.
The application of this procedure is demonstrated in Section 4.3 in connection with the Laguerre-Volterra network, which is the most widely used Volterra-equivalent network model to date.

Convergence and Accuracy of the Training Procedure. Having selected the structural parameters (L, H, I, Q, R) of the VEN model, we must "train" it using the "training set" of input-output data. The verb "train" is used to indicate the iterative estimation of the VEN model parameters through minimization of a cost function defined by the "testing set" of the input-output data.
As indicated above, the available input-output data are divided into a "training set" (typically about 80% of the total) and a complementary "testing set" using random sampling, to maximize the statistical independence of the model prediction errors/residuals at the points of the testing set. This random sampling is also useful in mitigating the effects of possible nonstationarities in the system, as discussed in Chapter 9. Note that the input data comprise the vector of preprocessed data of the filterbank outputs v(n) = [v1(n), ..., vL(n)]′, which are contemporaneous with the corresponding output value y(n). Thus, the random sampling selects about 20% of the time indices for the testing set prior to commencing the training procedure on the basis of the remaining 80% of input-output samples comprising the training set.
For each data point in the training set, the output prediction residual is computed for the current values of the network parameters. This residual error is used to update the values of the VEN parameter estimates, based on a gradient-descent procedure, such as the one shown in Equation (4.132) or a variant of it, as discussed below. Search procedures, either deterministic or stochastic (e.g., genetic algorithms), are also possible candidates for this purpose but are typically more time-consuming. The reader is urged to explore the multitude of interesting approaches and algorithms that are currently available in the extensive literature on artificial neural networks [Haykin, 1994; Hassoun, 1995] and on the classic problem of nonlinear cost minimization that has been around since the days of Newton and still defies a "definitive solution."
In this section, we will touch on some of the key issues germane to the training of feedforward Volterra-equivalent network models of the type depicted in Figure 4.36. These issues are

1. Selection of the training and testing data sets
2. Network parameter initialization
3. Enhanced convergence for fixed-step algorithms
4. Variable-step algorithms

The selection of the training and testing data sets entails, in addition to the aforementioned random sampling, the sorting of the input data vectors {v(n)} so that, if their Euclidean distance in the L-dimensional space is shorter than a specified "minimal proximity" value, the data can be consolidated by using the vector averages within each "proximity cell." The rationale for this consolidation is that proximal input vectors v(n) are expected to have small differential effects on the output (in which case their "training value" is small) or, if the respective observed outputs are considerably different, then this difference is likely due to noise/interference and will be potentially misleading in the training context. This "data consolidation" is beneficial in the testing context as well, because it improves the signal-to-noise ratio and makes the measurements of the quadratic cost function more robust.
The "proximity cells" can be defined either through a Cartesian grid in the L-dimensional space or through a clustering procedure. In both cases, a "minimal proximity" value must be specified that quantifies our assessment of the input signal-to-noise ratio (which determines the input vector "jitter") and the differential sensitivity of the input-output mapping for the system at hand. This value dv defines the spacing of the grid or the cluster size. Note that this "minimal proximity" value dv may vary depending on an estimate of the gradient of the output at each specific location in the L-dimensional space. Since this gradient is not known a priori, a conservative estimate can be used or the "data consolidation" procedure can be applied iteratively. The downside risk of this procedure is the possibility of excessive smoothing of the surface that defines the mapping of the input vector v(n) onto the output y(n). Finally, the random sampling of the consolidated data for the selection of the testing dataset is subject to a minimum time separation between selected data points in order to minimize the probability of correlated residuals. The remaining data points form the training dataset.
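A minimal version of the grid-based consolidation might look as follows (Python/NumPy sketch; the cell assignment by flooring v/dv onto a Cartesian grid is one of the two options mentioned above, and the function name is ours):

```python
import numpy as np

def consolidate(V, y, dv):
    """Average together all (input vector, output) pairs whose input vectors
    fall in the same Cartesian grid cell of spacing dv (a "proximity cell").
    V : (N, L) array of input vectors v(n);  y : (N,) array of outputs."""
    cells = {}
    for vn, yn in zip(V, y):
        key = tuple(np.floor(vn / dv).astype(int))   # grid cell index of v(n)
        cells.setdefault(key, []).append((vn, yn))
    Vc = np.array([np.mean([vv for vv, _ in pts], axis=0) for pts in cells.values()])
    yc = np.array([np.mean([yy for _, yy in pts]) for pts in cells.values()])
    return Vc, yc
```

Averaging within a cell raises the output signal-to-noise ratio, at the risk noted above of oversmoothing the input-output surface when dv is chosen too large.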
The network parameter initialization concerns the critical issue of possible entrapment in local minima during the training procedure. This is one of the fundamental pitfalls of gradient-based iterative procedures, since unfortunate initialization may lead to a "stable" local minimum (i.e., a locally "deep" trough of the cost-function surface that remains much higher than the global minimum). This risk is often mitigated by selecting multiple initialization points (that "sample" the parameter space with sufficient density, either randomly or with a deterministic grid) and comparing the resulting minima in order to select the global minimum. This procedure is sound but can be very time-consuming when the parameter space is multidimensional. This problem also gave impetus to search algorithms (including genetic algorithms that make random "mutation" jumps) which, however, remain rather time-consuming.
In general, there are no definitive solutions to this problem. However, for Volterra-type models of physiological systems, one may surmise that the higher-order Volterra terms/functionals will be of gradually declining importance relative to the first two orders in most cases. Consequently, one may obtain initial second-order Volterra approximations (using direct inversion or iterative methods) and use these approximations to set many of the initial network parameter values in the "neighborhood" of the global minimum. Subsequently, the correct model order can be selected without the second-order limitation and the training procedure can be properly performed to yield the final parameter estimates with reduced risk of "local minimum" entrapment.
It is worth noting that the aforementioned "data consolidation" procedure is expected to alleviate some of the local minima that are due to noise in the data (input and output). However, the morphology of the minimized cost function depends on the current estimates of the network parameters and, therefore, changes continuously throughout the testing process. The general morphology also changes for each different data point in the training set, and the location of the global minimum may shift depending on the actual residual/noise at each data point of the testing set. It is expected that the global minimum for the entire testing set will be very close to the location defined by the true parameter values of the network model. The basic notion of the changing surface morphology of the cost function during the training process is not widely understood or appreciated, although its implications for the training process can be very important (e.g., appearance and disappearance of local minima during the iterative process). A static notion of the cost function can be seriously misleading, as it supports an unjustifiable faith in the chancy estimates of the gradient, which is constantly changing. When the cost function is formed by the summation of all squared residuals in the testing set, then this morphology remains invariant, at least with respect to the individual data points, but still changes with respect to the continuously updated parameter values. Note that these updates are based on gradient estimates of the ever-changing cost-function surfaces for the various data points in the training set. Although one may be tempted to combine many training data points in batch form in order to make the cost-function surface less variable in this regard, this has been shown empirically to retard the convergence of the training algorithm. Somewhat counterintuitively, the individual training data points seem to facilitate the convergence speed of the gradient-descent algorithm.
Enhanced convergence algorithms for fixed training steps have been extensively stud-
ied, starting with the Newton-Raphson method that employs "curvature information" by
means of the second partial derivatives forming the Hessian matrix [Eykhoff, 1974;
Haykin, 1994]. This approach has also led to the so-called "natural gradient" method that
takes into account the coupling between the updates of the various parameters during the
training procedure using eigendecomposition of the Hessian matrix in order to follow a
"most efficient" path to cost minimization. Generally, the gradient-based update of the
parameter p_k at the (i + 1)th iteration step is given by

Δp_k^(i) ≡ p_k^(i+1) − p_k^(i) = −γ ∂J^(i)(n)/∂p_k     (4.136)

However, this update of parameter p_k changes the cost-function surface that is used for
the update of the next parameter p_{k+1} by approximately

ΔJ_k^(i)(n) ≈ [∂J^(i)(n)/∂p_k] Δp_k^(i) = −γ {∂J^(i)(n)/∂p_k}²     (4.137)

Thus, the update of the p_{k+1} parameter should be based on the gradient of the "new" cost
function J̃^(i)(n):

∂J̃^(i)/∂p_{k+1} = ∂J^(i)/∂p_{k+1} + [∂²J^(i)/(∂p_{k+1}∂p_k)] Δp_k^(i) ≈ ∂J^(i)/∂p_{k+1} − γ (∂/∂p_{k+1}){∂J^(i)/∂p_k}²     (4.138)

It is evident that the "correction" of the cost-function gradient depends on the second par-
tial derivative (curvature) of the ith update of the cost function (i.e., depends on the Hess-
ian matrix of the cost-function update, if the entire parameter vector is considered), lead-
ing to the second-order ith update of p_{k+1}:

Δp_{k+1}^(i) = −γ ∂J^(i)/∂p_{k+1} + γ² (∂/∂p_{k+1}){∂J^(i)/∂p_k}²     (4.139)

which reduces back to the first-order ith update of the type indicated in Equation (4.136)
when γ is very small.
4.2 CONNECTIONIST MODELS 243

Because of the aforementioned fundamental observation regarding the changeability
of the cost-function surface during the training process, it appears imprudent to place high
confidence in these gradient estimates (or their Hessian-based corrections). Nonetheless,
the gradient-based approaches have found many useful applications and their refinement
remains an active area of research. Since these refinements are elaborate and deserve
more space than we can dedicate here, we refer the reader to numerous excellent sources
in the extensive bibliography on this subject. We note the current popularity of the Leven-
berg-Marquardt algorithm (in part because it is available in MATLAB) and the useful no-
tion, embedded in the "normalized least mean-squares" method, that the fixed step size
may be chosen inversely proportional to the mean-square value of the input. The use of a
momentum term in the update formula has also been found useful, whereby the update in-
dicated by Equation (4.136) is not directly applied but is subject to first-order autorecur-
sive filtering. The choice of the fixed-step value remains a key practical issue in this iter-
ative gradient-based approach.
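The momentum idea (first-order autorecursive filtering of the raw update) and the normalized-LMS step-size choice can be sketched as follows. This is an illustrative sketch, not code from the text; the function names and default constants are our own choices.

```python
import numpy as np

def momentum_update(grad, velocity, gamma=0.01, mu=0.9):
    """One momentum step: the raw update -gamma*grad is passed through a
    first-order autorecursive (low-pass) filter before being applied."""
    velocity = mu * velocity + (1.0 - mu) * (-gamma * grad)
    return velocity  # add this to the parameter instead of -gamma*grad

def nlms_step(u, err, beta=0.5, eps=1e-8):
    """Normalized-LMS flavor: the effective step size is inversely
    proportional to the mean-square value of the current input regressor u."""
    return beta * err * u / (np.dot(u, u) + eps)
```

For example, with `mu = 0.9` the applied update is a smoothed average of recent raw updates, which damps the erratic per-sample gradient estimates discussed above.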
The variable-step algorithms for enhanced convergence deserve a brief overview be-
cause they were found to perform well in certain cases where convergence with fixed-step
algorithms proved to be problematic [Haykin, 1994]. Of these algorithms, some use alter-
nate trials (e.g., the "beta rule") and others use previous updates of the parameters to ad-
just the step size (e.g., the "delta-bar-delta" rule). Note that the idea of reducing the step
size as a monotonic function of the iteration index, originating in "stochastic approxima-
tion" methods, was found to be of very limited utility.
The "beta rule" provides that the step size for the training of a specific parameter is ei-
ther multiplied or divided by a fixed scalar β, depending on which of the alternate trials
yields a smaller cost function. Thus, both options are evaluated at each step and the one
that leads to greater reduction of the cost function is selected. The proper value of the
fixed scalar β has been determined empirically to be about 1.7 in the case of artificial
neural networks.
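A toy sketch of the "beta rule" as just described (the function name and the one-dimensional cost interface are illustrative assumptions):

```python
def beta_rule_step(cost, p, grad, step, beta=1.7):
    """One 'beta rule' iteration: try both candidate step sizes (step*beta
    and step/beta), evaluate the cost after a gradient-descent move with
    each, and keep whichever yields the smaller cost."""
    candidates = [step * beta, step / beta]
    trials = [(cost(p - s * grad), s) for s in candidates]
    _, best_step = min(trials)          # pick the trial with the smaller cost
    return p - best_step * grad, best_step
```

Each iteration thus costs two function evaluations, the price paid for the adaptive step.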
The "delta-bar-delta" rule is rooted in two heuristic observations by Jacobs, who sug-
gested that if the value of the gradient retains its algebraic sign for several consecutive it-
erations, then the corresponding step size should be increased. Conversely, if the algebra-
ic sign of the gradient alternates over several successive iterations, then the corresponding
step size should be decreased. These ideas were first implemented in the "delta-delta
rule," which changes the step size according to the product of the last two gradient values.
However, observed deficiencies in the application of the "delta-delta rule" led to the vari-
ant of the "delta-bar-delta" rule, which increases the step size by a small fixed quantity κ
if the gradient has the same sign as a low-pass filtered (smoothed) measure of previous
gradient values; otherwise, it decreases the step size by a quantity proportional to its cur-
rent value (so that the step size always remains positive but may diminish asymptotically)
[Haykin, 1994].
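The delta-bar-delta mechanics can be sketched in a few lines. The constants κ, φ, and θ below are illustrative placeholders, not values prescribed by the text:

```python
def delta_bar_delta(step, grad, grad_bar, kappa=0.01, phi=0.1, theta=0.7):
    """One delta-bar-delta adjustment: grow the step by a fixed increment
    kappa when the current gradient agrees in sign with a smoothed history
    of past gradients (grad_bar); shrink it by a fraction phi of its current
    value on a sign flip, so it stays positive but may shrink asymptotically."""
    if grad * grad_bar > 0:
        step = step + kappa            # consistent sign: additive increase
    elif grad * grad_bar < 0:
        step = step * (1.0 - phi)      # sign alternation: proportional decrease
    grad_bar = (1.0 - theta) * grad + theta * grad_bar  # low-pass filtered history
    return step, grad_bar
```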
Breaking altogether with the conventional thinking of incremental updates, we propose
a method that uses successive parabolic fits (based on local estimates of first and second
derivatives) to define variable "leaps" in search of the global minimum. According to this
method, the morphology of the cost-function surface with respect to a specific parameter
p_k may be one of the three types shown in Figure 4.39. The parameter change (leap) is de-
fined by the iterative relation

p_k^(i+1) = p_k^(i) − J′(p_k)/[|J″(p_k)| + ε]     (4.140)

where J′ and J″ denote the first and second partial derivatives of the cost function evalu-
ated at p_k^(i), and ε is a very small positive "floor value" used to avoid numerical instabili-
ties when J″ approaches zero. Alternatively, the parameter is not changed when J″ is
very close to zero, since the cost-function surface is continuously altered by the updates
of the other parameters and, consequently, J″ is likely to attain nonzero values at the next
iteration(s) (moving away from an inflection point on the surface). Clearly, this method is
a slight modification of the classic Newton-Raphson method that extends to concave

[Figure 4.39 panels: Case I (J″ > 0, convex), Case II (J″ < 0, concave), Case III (J″ ≈ 0, inflection point); each panel plots J(p) versus p.]
Figure 4.39 Illustration of the three main cases encountered in the use of the "parabolic leap"
method of cost minimization. The simplified two-dimensional drawings show the cost function J(p)
with a solid line and the parabolic local fits with a dashed line. The abscissa is the value of the trained
parameter p, whose starting value p_r at the rth iteration is marked with a circle and the "landing" val-
ue (after the "leap") is marked with an asterisk. Case I of a convex morphology (left) illustrates the ef-
ficiency and reliable convergence of defining the "leap" using the minimum of the local parabolic fit
(the classic Newton-Raphson method). Case II of a concave morphology (middle) illustrates the use
of the symmetric point relative to the maximum of the local parabolic fit. Case III of convex/concave
morphology with an inflection point (right) illustrates the low likelihood of a potential pitfall at the in-
flection point. The local parabolic fit has the same first and second derivative values as the cost func-
tion at the pivot point p_r and consequently has the analytical form

f(p) = (1/2) J″(p_r)(p − p_r)² + J′(p_r)(p − p_r) + J(p_r)

which defines a "leap-landing" point p*_r = p_r − J′(p_r)/|J″(p_r)| as long as J″(p_r) ≠ 0 (i.e., avoiding inflec-
tion points). This is a slightly modified form of the Newton-Raphson method that covers concave
morphologies.

morphologies and inflection points. The rationale of this approach is depicted in Figure
4.39.
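A minimal numeric sketch of the "parabolic leap" update of Equation (4.140); the quadratic test functions below are our own illustrative choices:

```python
def parabolic_leap(p, d1, d2, eps=1e-8):
    """One 'parabolic leap' (modified Newton-Raphson): dividing the gradient
    d1 by |d2| covers both convex (J''>0) and concave (J''<0) morphologies;
    near an inflection point (J''~0) the parameter is left unchanged, since
    the surface will shift as the other parameters are updated."""
    if abs(d2) < eps:
        return p                      # Case III: stay put this iteration
    return p - d1 / abs(d2)

# Convex quadratic J(p) = (p - 2)^2: one leap lands exactly on the minimum.
J1 = lambda p: 2.0 * (p - 2.0)       # first derivative J'
J2 = lambda p: 2.0                   # second derivative J''
```

Starting from p = 0 on the convex quadratic, a single leap reaches the minimum at p = 2; on a concave quadratic (J″ < 0) the same formula moves the parameter downhill away from the maximum.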

The Pseudomode-Peeling Method. One practical way of addressing the problem
of local minima in the training of separable VEN (SVN) models is the use of the "pseudo-
mode-peeling" method, whereby a single-hidden-unit SVN model is initially fitted to the
input-output data and, subsequently, another single-hidden-unit SVN model is fitted to
the input-residual data, and so on until the residual is minimized (i.e., meets our criterion
for model-order selection given in Section 2.3.1). Each of the single-hidden-unit SVN
"submodels" thus obtained corresponds to a "pseudo-PDM" of the system (termed the
"pseudomode") with its associated nonlinearity. All the obtained SVN "submodels" are
combined by summation of their outputs to form the output of the overall SVN model of
the system.
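The peeling loop can be sketched generically. Here `fit_one` and `predict_one` are hypothetical stand-ins for the single-hidden-unit SVN training and prediction steps (not functions from the text), and the stopping tolerance is an illustrative substitute for the model-order criterion of Section 2.3.1:

```python
import numpy as np

def peel_pseudomodes(x, y, fit_one, predict_one, max_modes=5, tol=0.01):
    """Pseudomode-peeling sketch: fit one submodel to the input-output data,
    then refit further submodels to the remaining residual until its
    normalized variance falls below `tol` (or max_modes is reached)."""
    residual = y.copy()
    modes = []
    for _ in range(max_modes):
        model = fit_one(x, residual)            # fit next pseudomode
        residual = residual - predict_one(model, x)  # peel its contribution
        modes.append(model)
        if np.var(residual) / np.var(y) < tol:  # residual small enough
            break
    return modes, residual
```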
Although this method protects the user from getting "trapped" in a local minimum (by
virtue of its ability to repeat the training task on the residual error and "peel" new pseudo-
modes as needed), it is generally expected to yield different pseudomodes for different
initializations of the successive "peeling" steps. Also, there is no guarantee that this pro-
cedure will always converge to the correct overall model, because it is still subject to the
uncertainties of iterative minimization methods. Nonetheless, it is expected to reach the
neighborhood of the correct overall model in most cases.
Another practical advantage of this method is that it simplifies the model-order selec-
tion task by limiting it to the structural parameters L and Q at each "peeling" step. Note
that the overall model (comprised of all the "peeled" pseudomodes) can be used to con-
struct the unique Volterra kernels of the system, which can be used subsequently to deter-
mine the PDMs of the system using the singular-value decomposition method presented
in Section 4.1.1. In this manner, the proposed approach is expected to arrive at similar
overall PDM models regardless of the particular initializations used during the pseudo-
mode-peeling procedure.
An alternative utilization of the overall model resulting from the pseudomode-peeling
method is to view it as a "denoising" tool, whereby the output noise is equated with the fi-
nal residuals of the overall model. The "denoised" data can be subsequently used to per-
form any other modeling or analysis procedure at a higher output-SNR. This can offer
significant practical advantages in low-SNR cases.
Another variant of the pseudomode-peeling method that may be useful in certain cases
involves the use of a "posterior" filter in each mode branch (akin to an L-N-M cascade
discussed in Section 4.1.2). This method can be implemented by the network model archi-
tecture shown in Figure 4.40, whereby two filter banks are employed to represent the two
filtering operations: one preprocesses the input signal (prior filter) and the other processes
the output of the hidden unit for each "peeling pseudomode" (posterior filter). It is evident
that this network model architecture gives rise to an overall model of parallel L-N-M
cascades that deviates from the basic Volterra-Wiener modular form or from the equiva-
lent PDM model form of Figure 4.1. This model form can be more efficient (i.e., fewer
modes are required) in certain cases [Palm, 1979].

[Figure 4.40 diagram: x(t) feeds the prior filter bank {w_1, ..., w_L}, whose weighted sum drives the hidden unit (HU); its output z(t) feeds the posterior filter bank {β_1, ..., β_M}, and the weighted outputs {r_m} plus the offset y_0 sum to y(t).]
Figure 4.40 The L-N-M equivalent network model for a "peeling mode," employing two filter banks
{b_l} and {β_m}, where l = 1, ..., L and m = 1, ..., M. The second filter bank processes the output z(t)
of the hidden unit (HU) and captures the "posterior" filter operation through the output weights {r_m}.

Nonlinear Autoregressive Modeling (Open-Loop). The Volterra-equivalent net-
work models discussed above can also be deployed in an autoregressive context without
recurrent connections (i.e., open-loop configurations), whereby the "input" is defined as
the past epoch of the signal from discrete time (n − 1) to (n − M) and the output is de-
fined as the value of the signal at discrete time n. Thus, we can obtain the nonlinear
mapping of the past epoch of a signal [y(n − 1), ..., y(n − M)] onto its present value
y(n) in the form of a feedforward network model that constitutes an "open-loop," non-
linear autoregressive (NAR) model of the signal y(n). Clearly, this NAR model attains
the form of an autoregressive discrete Volterra model (with well-defined autoregressive
kernels) that is equivalent to a nonlinear difference equation and describes the "internal
dynamics" of a single process/signal. The residual of this NAR model can be viewed as
the "innovations process" of conventional autoregressive terminology (i.e., the unknown
signal that drives the generation of the observed signal through the NAR model). It is
evident that this class of models can have broad utility in physiology, since it pertains
to the characterization of single processes (e.g., heart rate variability, neuronal rhythms,
endocrine cycles, etc.).
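The open-loop NAR mapping amounts to building a lagged regressor matrix from the signal itself. A minimal sketch of that construction (the function name is ours):

```python
import numpy as np

def nar_regressors(y, M):
    """Build the open-loop NAR data: each row of X holds the past epoch
    [y(n-1), ..., y(n-M)] and the corresponding target is y(n)."""
    X = np.column_stack([y[M - k : len(y) - k] for k in range(1, M + 1)])
    t = y[M:]                      # present values y(n), n = M, ..., N-1
    return X, t
```

Any of the feedforward network models above can then be trained on (X, t); the training residual plays the role of the innovations process.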
The estimation approach for this NAR model form is determined by the criterion im-
posed on the properties of the residuals. For instance, if we seek to minimize the vari-
ance of these residuals, then the estimation approach is similar to the aforementioned
ones (i.e., least-squares estimation). However, residual variance minimization may not
be a sensible requirement in the autoregressive case, because the residuals represent an
"innovation process" and do not quantify a "prediction error" as before. Thus, we may
require that the residuals be white (i.e., statistically independent innovations that yield
the "maximum entropy" solution), or that they have minimum correlations with certain
specified variables (maximizing the "mutual information" criterion), or that they exhib-
it specific statistical and/or spectral properties (as prescribed by previous knowledge
about the process). This important issue is revisited in Chapter 10 in connection with the
modeling of multiple interconnected physiological variables operating in closed-loop or
nested-loop configurations that represent the ultimate modeling challenge in systems
physiology.

4.3 THE LAGUERRE-VOLTERRA NETWORK

The most widely used Volterra-equivalent network model to date has been the so-called
"Laguerre-Volterra network" (LVN) which employs a filter bank of discrete-time La-
guerre functions (DLFs) to preprocess the input signal (as discussed in Section 2.3.2 in
connection with the Laguerre expansion technique) and a single hidden layer with poly-
nomial activation functions, as shown in Figure 4.41 [Marmarelis, 1997; Alataris et al.,
2000; Mitsis & Marmarelis, 2002]. The LVN can be viewed as a network-based imple-
mentation of the Laguerre expansion technique (LET) for Volterra system modeling that
employs iterative cost-minimization algorithms (instead of direct least-squares inversion
employed by LET) to estimate the unknown kernel expansion coefficients (through the
estimates of the LVN parameters) and the DLF parameter α. The latter has proven to be
extremely important in actual applications, because it determines the efficiency of the La-
guerre expansion of the kernels and constitutes a critical contribution by the author to the
long evolution of Laguerre expansion methods. The LVN approach has yielded accurate
and reliable models of various physiological systems since its recent introduction, using
relatively short input-output data records (see Chapter 6).

[Figure 4.41 diagram: x(n) feeds the discrete Laguerre filter bank, whose outputs v_0(n), ..., v_{L−1}(n) are weighted by {w_{j,h}} to form the input u_h(n) = Σ_{j=0}^{L−1} w_{j,h} v_j(n) of each hidden unit in the hidden layer; the hth hidden unit outputs z_h(n) = Σ_{q=1}^{Q} c_{q,h} u_h^q(n).]
Figure 4.41 The Laguerre-Volterra Network (LVN) architecture employing a discrete Laguerre filter
bank for input preprocessing and a single hidden layer with polynomial activation functions distinct
for each hidden unit HU h (see text).

The basic architecture of the LVN is shown in Figure 4.41 and follows the standard ar-
chitecture of a single-layer, fully connected feedforward artificial neural network, with
three distinctive features that place it in the SVN/VEN class of models: (1) the DLF filter
bank that preprocesses the input, with trainable parameter α; (2) the polynomial (instead
of the conventional sigmoidal) activation functions in the hidden units; (3) the nonweight-
ed summative output unit (no output weights are necessary because the coefficients of the
polynomial activation functions in the hidden units are trainable). Note that the polynomi-
al activation functions do not have constant terms, but the output unit has a trainable off-
set value. The LVN has been shown to be equivalent to the Volterra class of finite-order
models [Marmarelis, 1997] and yields interpretable PDM models (see Chapter 6).
As discussed in Section 2.3.2, the output v_j(n) of the jth DLF can be computed by
means of the recursive relation (2.203) that improves the computational efficiency of the
DLF expansion, provided the parameter α can be properly selected. A major advantage of
the LVN approach over the Laguerre expansion technique is the iterative estimation of α
(along with the other LVN parameters).
The input of the hth hidden unit is the weighted sum of the DLF filter bank outputs:
u_h(n) = Σ_{j=0}^{L−1} w_{j,h} v_j(n)     (4.141)

where h = 1, 2, ..., H and the DLF index j ranges from 0 to L − 1, instead of the conven-
tional range from 1 to L used in Section 2.3.1 (since the zero-order DLF is defined as the
first term in the filter bank). The output of the hth hidden unit is given by the polynomial
activation function
248 MODULAR AND CONNECTIONIST MODELING

z_h(n) = Σ_{q=1}^{Q} c_{q,h} u_h^q(n)     (4.142)

where Q is the nonlinear order of the equivalent Volterra model. The LVN output is given
by the nonweighted summation of the hidden-unit outputs, including a trainable offset
value y_0:

y(n) = Σ_{h=1}^{H} z_h(n) + y_0     (4.143)

The parameter α is critical for the efficacy of this modeling procedure because it de-
fines the form of the DLFs and it determines the convergence of the DLF expansion,
which in turn determines the computational efficiency of this approach. In the original in-
troduction of the Laguerre expansion technique, the key parameter α was specified be-
forehand and remained constant throughout the model estimation process. This presented
a serious practical impediment in the application of the method, because it required te-
dious and time-consuming trials to select the proper α value that yielded an efficient La-
guerre expansion (i.e., with rapid convergence and capable of accurate kernel representa-
tion). As discussed in Section 2.3.2, the recursive relation (2.203) allows the
computationally efficient evaluation of the prediction-error gradient with respect to α, so
that the iterative estimation of the parameter α from the data is feasible, thus removing a
serious practical limitation of the original Laguerre expansion approach.
It is evident from Equations (2.206) and (2.207) that the parameter β = √α can be es-
timated using the iterative expression

β^(r+1) = β^(r) − γ_β e^(r)(n) Σ_{h=1}^{H} f_h′^(r)(u_h) Σ_{j=0}^{L−1} w_{h,j} [v_j(n − 1) + v_{j−1}(n)]     (4.144)

where e^(r)(n) is the output prediction error at the rth iteration, γ_β is the fixed learning con-
stant (update step size), and f_h′^(r)(u_h) is the derivative of the polynomial activation function
of the hth hidden unit at the rth iteration:

f_h′^(r)[u_h^(r)(n)] = Σ_{q=1}^{Q} q c_{q,h}^(r) [u_h^(r)(n)]^{q−1}     (4.145)

where the superscript (r) denotes the value of the subject variable/parameter at the rth it-
eration.
The iterative relations for the estimation of the other trainable parameters of the LVN
are

w_{j,h}^(r+1) = w_{j,h}^(r) − γ_w e^(r)(n) f_h′^(r)[u_h^(r)(n)] v_j^(r)(n)     (4.146)

c_{q,h}^(r+1) = c_{q,h}^(r) − γ_c e^(r)(n) [u_h^(r)(n)]^q     (4.147)

y_0^(r+1) = y_0^(r) − γ_y e^(r)(n)     (4.148)

where j = 0, 1, ..., L − 1; h = 1, 2, ..., H; q = 1, 2, ..., Q; and γ_w, γ_c, and γ_y denote the
respective learning constants for the weights, the polynomial coefficients, and the output
offset.

In order to assist the reader in making the formal connection between the LVN and the
Volterra models, we note that successive substitution of the variables {z_h} in terms of
{u_h} and, in turn, {v_j}, yields an expression of the output in terms of the input that is
equivalent to the discrete Volterra model. Then it can be seen that the Volterra kernels can
be expressed in terms of the LVN parameters as

k_0 = y_0     (4.149)

k_1(m_1) = Σ_{h=1}^{H} c_{1,h} Σ_{j=0}^{L−1} w_{h,j} b_j(m_1)     (4.150)

k_2(m_1, m_2) = Σ_{h=1}^{H} c_{2,h} Σ_{j1=0}^{L−1} Σ_{j2=0}^{L−1} w_{h,j1} w_{h,j2} b_{j1}(m_1) b_{j2}(m_2)     (4.151)

k_Q(m_1, ..., m_Q) = Σ_{h=1}^{H} c_{Q,h} Σ_{j1=0}^{L−1} ... Σ_{jQ=0}^{L−1} w_{h,j1} ... w_{h,jQ} b_{j1}(m_1) ... b_{jQ}(m_Q)     (4.152)

If so desired, once the training of the LVN is performed, the Volterra kernels can be
evaluated from Equations (4.149)-(4.152). However, it is recommended that the physio-
logical interpretation of the obtained model be based on the equivalent PDM model, and,
thus, the Volterra kernels serve only as a general framework of reference and a means of
validation/evaluation.
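A sketch of the kernel reconstruction of Equations (4.149)-(4.151) for the zeroth-, first-, and second-order kernels. The matrix B holding the DLFs as columns is an assumed input (its computation is outside this snippet), as are the array shapes:

```python
import numpy as np

def volterra_kernels_from_lvn(W, C, y0, B):
    """Reconstruct k0, k1, k2 from trained LVN parameters.
    W: (L, H) hidden-unit weights; C: (Q, H) polynomial coefficients;
    B: (M, L) matrix whose columns are the DLFs b_j(m) over M lags."""
    k0 = y0                                         # Eq. (4.149)
    k1 = B @ (W * C[0]).sum(axis=1)                 # Eq. (4.150)
    M = B.shape[0]
    k2 = np.zeros((M, M))
    for h in range(W.shape[1]):
        g = B @ W[:, h]                             # impulse response of unit h
        k2 += C[1, h] * np.outer(g, g)              # Eq. (4.151)
    return k0, k1, k2
```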
A critical practical issue is the selection of the LVN structural parameters, namely the
number L of DLFs, the number H of hidden units, and the degree Q of the polynomial ac-
tivation functions, so that the LVN model is not underspecified or overspecified. This is
done by successive trials in ascending order (i.e., moving from lower to higher parameter
values, starting with L = 1, H = 1, and Q = 1 and incrementing from left to right) using a
statistical criterion to test the reduction in the computed mean-square error of the output
prediction achieved by the model, properly balanced against the total number of free para-
meters. To this purpose, we have developed the Model Order Selection (MOS) criterion
of Eq. (2.200) that is described in Section 2.3.1.
Concerning the number L of DLFs, it was found that it can affect the convergence of
the iterative estimation of the α parameter in certain cases, where more DLFs in the LVN
make α converge to a smaller value. The reason is that increasing the DLF order results in a
longer spread of significant values and increased distance between zero crossings in each
DLF, which is also what happens when α is increased. Note that Q is independent of the L
selection, since it represents the intrinsic nonlinearity of the system. Likewise, the selec-
tion of L does not affect (in principle) the selection of H, which determines the number of
required PDMs in the system under study. We must emphasize that the selection of these
structural parameters of the LVN model is not unique in general, but the equivalent
Volterra model ought to be unique.

Illustrative Example of LVN Modeling. To illustrate the efficacy of the LVN model-
ing approach, we simulate a fifth-order system described by the following difference and
algebraic equations:

v(n) + A_1 v(n − 1) + A_2 v(n − 2) = B_0 x(n) + B_1 x(n − 1)     (4.153)



y(n) = v(n) + (1/2)v²(n) + (1/3)v³(n) + (1/4)v⁴(n) + (1/5)v⁵(n)     (4.154)

where the coefficient values are arbitrarily chosen to be A_1 = −1.77, A_2 = 0.78, B_0 = 0.25,
and B_1 = −0.27. This system is equivalent to the cascade of a linear filter with two poles
[defined by Equation (4.153)] followed by a fifth-degree polynomial static nonlinearity
[defined by Equation (4.154)]. In this example, we know neither the appropriate value of
α nor the appropriate number L of DLFs, but we do know the appropriate values of the
structural parameters: H = 1 and Q = 5, knowledge that can be used as "ground truth"
for testing and validation purposes. The first-order Volterra kernel can be found analyti-
cally by solving the difference equation (4.153), which yields

k_1(m) = (1/4)(2p_1^m − p_2^m)     (4.155)

where p_1 = exp(−0.2) and p_2 = exp(−0.05). The high-order Volterra kernels of this system
are expressed in terms of the first-order kernel as

k_r(m_1, ..., m_r) = (1/r) k_1(m_1) ... k_1(m_r)     (4.156)

for r = 2, 3, 4, 5 (of course, k_r is zero for r > 5).
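A direct simulation sketch of Equations (4.153)-(4.154) with the coefficient values quoted above (the function name is ours; the input signal is left to the caller):

```python
import numpy as np

def simulate_system(x, A1=-1.77, A2=0.78, B0=0.25, B1=-0.27):
    """Simulate the fifth-order test system: a two-pole linear filter,
    Eq. (4.153), in cascade with the fifth-degree polynomial static
    nonlinearity of Eq. (4.154)."""
    v = np.zeros(len(x))
    for n in range(len(x)):
        v[n] = (-A1 * v[n - 1] if n >= 1 else 0.0) \
             + (-A2 * v[n - 2] if n >= 2 else 0.0) \
             + B0 * x[n] \
             + (B1 * x[n - 1] if n >= 1 else 0.0)
    return sum(v**q / q for q in range(1, 6))    # Eq. (4.154)
```

Driving this function with a unit-variance Gaussian input of 1024 points reproduces the training data configuration used in this example.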


The simulation of this system is performed with a unit-variance, 1024-point Gaussian
CSRS input, and the selected LVN model has structural parameters L = 4, H = 1, and Q =
5. The obtained Volterra kernel estimates are identical to the true Volterra kernels of the
system given by Equations (4.155) and (4.156). In order to examine the performance of
the LVN approach in the presence of noise, the simulation is repeated for noisy output
data, whereby an independent GWN signal is added to the output for a signal-to-noise ra-
tio (SNR) equal to 0 dB (i.e., the output signal power is equal to the noise variance). The
α learning curves for both the noise-free and noisy cases are shown in Figure 4.42. We
can see that α converges to nearby values: 0.814 in the noise-free case and 0.836 in the
noisy case. Its convergence is not affected significantly by the presence of noise. The
large values of α in this example reflect the fact that this system has slow dynamics (the
spread of significant values of the first-order kernel is about 100 lags, which corresponds
to the memory-bandwidth product of the system).
The estimated first-order and second-order kernels in the noisy case are shown along
with the true ones (which are identical to the noise-free estimates) in Figure 4.43. The
noisy estimates exhibit excellent resemblance to the noise-free estimates (which are iden-
tical to the true kernels) despite the low-SNR data and the relatively short data record of
1024 samples. However, it is evident that the second-order kernel estimate is affected
more than its first-order counterpart by the presence of noise. The NMSE values of the
LVN model prediction for a different GWN input and output data (out-of-sample predic-
tion) are 0.2% and 49.41% for the noise-free and noisy cases, respectively. Note that a
perfect output prediction in the noisy case for SNR = 0 dB corresponds to a 50% NMSE
value. These results demonstrate the efficacy and the robustness of the LVN modeling ap-
proach, even for high-order systems (fifth-order in this example), low SNR (0 dB), and
short data records (1024 samples).
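The 50% NMSE floor at SNR = 0 dB can be verified numerically: when the prediction equals the noise-free signal, the residual is pure noise, so the NMSE approaches (noise variance)/(output power) = 1/2. A quick sketch (synthetic data, our own construction):

```python
import numpy as np

rng = np.random.default_rng(0)
signal = rng.standard_normal(100_000)
noise = rng.standard_normal(100_000) * signal.std()   # SNR = 0 dB
observed = signal + noise

# NMSE of the *perfect* prediction (the noise-free signal) against the
# noisy output: the residual is pure noise.
nmse = np.mean((observed - signal) ** 2) / np.mean(observed ** 2)
print(f"NMSE = {nmse:.3f}")   # close to 0.5, i.e., 50%
```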

[Figure 4.42 plot: the α learning curves over 300 iterations, with a solid line for the noise-free case and a dashed line for SNR = 0 dB.]

Figure 4.42 The learning curves of α for the simulated fifth-order system for noise-free and noisy
outputs. Note that the learning constant for the noisy case is ten times smaller [Mitsis & Marmarelis,
2002].

In actual applications to physiological systems, the SNR rarely drops below 4 dB and
almost never below 0 dB. Likewise, it is very rare to require a model order higher than
fifth and it is not unusual to have data records of size comparable to 1024 (in fact, in the
early days of the cross-correlation technique, the data records had typically tens of thou-
sands of samples). Therefore, this illustrative example offers a realistic glimpse at the
quality of the modeling results achievable by the LVN approach.
The application of this modeling approach to actual physiological systems is illustrated
in Chapter 6. It is accurate to say that the LVN approach has yielded the best Volterra
modeling results to date with real physiological data and, therefore, points to a promising
direction for future modeling efforts, without precluding further refinements or enhanced
variants of this approach in the future.

Modeling Systems with Fast and Slow Dynamics (LVN-2). Of the many possi-
ble extensions of the LVN approach, the most immediate concerns the use of multiple La-
guerre filter banks (with distinct parameters α) in order to capture multiple time scales of
dynamics intrinsic to a system. This is practically important because many physiological
systems exhibit vastly different scales of fast and slow dynamics that may also be interde-
pendent, a fact that makes their simultaneous estimation a serious challenge in a practi-
cal context. Note that fast dynamics require high sampling rates and slow dynamics ne-
cessitate long experiments, resulting in extremely long data records (with all the
burdensome experimental and computational ramifications). A practical solution to this
problem can be achieved by a variant of the LVN approach with two filter banks (one for
fast and one for slow dynamics) discussed below [Mitsis & Marmarelis, 2002].

[Figure 4.43 plots: (a) the true (solid) and estimated (dash-dot) first-order kernel k_1(m) over lags m = 0 to 25; (b) surface plots of the true (left) and estimated (right) second-order kernel over (m_1, m_2).]

Figure 4.43 (a) The true and estimated first-order Volterra kernel of the simulated fifth-order system
using LVN (L = 4, H = 1, Q = 5) for the noisy case of SNR = 0 dB [Mitsis & Marmarelis, 2002]. (b) The
true (left) and estimated (right) second-order Volterra kernel of the simulated fifth-order system using
LVN (L = 4, H = 1, Q = 5) for the noisy case of SNR = 0 dB [Mitsis & Marmarelis, 2002].

The proposed architecture of the LVN variant with two filter banks (LVN-2) is shown
in Figure 4.44. The two filter banks preprocess the input separately and are characterized
by different Laguerre parameters (α_1 and α_2), corresponding generally to different num-
bers of DLFs (L_1 and L_2). A small value of α_1 for the first filter bank and a large value of
α_2 for the second filter bank allows the simultaneous modeling of the fast and the slow
components of a system, as well as their interaction.
As was discussed in Section 2.3.2, the asymptotically exponential structure of the
DLFs makes them a good choice for modeling physiological systems, since the latter of-
ten exhibit asymptotically exponential structure in their Volterra kernels. However, one
cannot rule out the possibility of system kernels that do not decay smoothly, a situation
that will require either a large number of DLFs or an alternate (more suitable) filter
bank. The reader must be reminded that the parameter α defines the exponential relax-
ation rate of the DLFs and determines the convergence of the Laguerre expansion for a
given kernel function. Larger α values result in a longer spread of significant values (slow
dynamics). Therefore, the choice of the DLF parameters (α_1 and α_2) for the two filter
banks of the LVN-2 model must not be arbitrary and is critical in achieving an efficient
model representation of a system with fast and slow dynamics. This choice is made
automatically by an iterative estimation procedure using the actual experimental data,

[Figure 4.44 diagram: x(n) feeds both Laguerre filter banks; their outputs, weighted by {w_{h,j}^(1)} and {w_{h,j}^(2)}, drive the hidden units whose outputs z_1(n), ..., z_H(n) sum with the offset y_0 to form y(n).]

Figure 4.44 The LVN-2 model architecture with two Laguerre filter banks {b_j^(1)} and {b_j^(2)} that pre-
process the input x(n). The hidden units in the hidden layer have polynomial activation functions {f_h}
and receive input from the outputs of both filter banks. The output y(n) is formed by summation of the
outputs of the hidden units {z_h} and the output offset y_0 [Mitsis & Marmarelis, 2002].

as discussed earlier for the LVN model. For the LVN-2 model, the iterative estimation
formula is

β_i^(r+1) = β_i^(r) − γ_i e^(r)(n) Σ_{h=1}^{H} Σ_{q=1}^{Q} Σ_{j=0}^{L_i−1} q [c_{q,h} w_{h,j}^(i) u_h^{q−1}(n) (v_j^(i)(n − 1) + v_{j−1}^(i)(n))]_r     (4.157)

where i = 1, 2 is the filter bank index, e^(r)(n) is the output prediction error at the rth itera-
tion, β_i = √α_i, and {γ_i} are fixed positive learning constants. The notation [·]_r means that
the quantity in the brackets is evaluated for the parameter estimates at the rth iteration.
The remaining variables and parameters are defined in a manner similar to the LVN
case, as indicated in Figure 4.44. Note, however, that the filter bank index i appears as a
subscript of L (since the two filter banks may have different numbers of DLFs in general)
and as a superscript of the weights {w_{h,j}^(i)} and of the DLF outputs {v_j^(i)(n)}, indicating their
dependence on the respective filter bank. The inputs to the polynomial activation func-
tions of the hidden units are
u_h(n) = Σ_{i=1}^{2} Σ_{j=0}^{L_i−1} w_{h,j}^(i) v_j^(i)(n)     (4.158)

and the LVN-2 output is given by

y(n) = y_0 + Σ_{h=1}^{H} f_h[u_h(n)]     (4.159)

where each polynomial activation function performs the polynomial transformation

f_h[u_h] = Σ_{q=1}^{Q} c_{q,h} u_h^q    (4.160)

that scrambles the contributions of the two filter banks and generates output components
that mix (or solely retain, if appropriate) the characteristics of the two filter banks.
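To make the forward computation concrete, here is a minimal Python sketch of the LVN-2 forward pass of Equations (4.158)-(4.160). The function and variable names are illustrative (they do not come from the text), and the filter-bank outputs are generated with the standard discrete Laguerre filter recursion rather than explicit convolution:

```python
import numpy as np

def laguerre_outputs(x, alpha, L):
    """Outputs v_j(n) of a bank of L discrete Laguerre filters with
    parameter alpha, computed with the standard DLF recursion."""
    sa, sb = np.sqrt(alpha), np.sqrt(1.0 - alpha)
    v = np.zeros((L, len(x)))
    for n in range(len(x)):
        prev = v[:, n - 1] if n > 0 else np.zeros(L)
        v[0, n] = sa * prev[0] + sb * x[n]
        for j in range(1, L):
            v[j, n] = sa * prev[j] + sa * v[j - 1, n] - prev[j - 1]
    return v

def lvn2_forward(x, alpha1, alpha2, W1, W2, C, y0=0.0):
    """LVN-2 output, Eqs. (4.158)-(4.159).  W1: (H, L1) and W2: (H, L2)
    weight matrices; C: (H, Q), where C[h, q-1] holds c_{q,h}."""
    v1 = laguerre_outputs(x, alpha1, W1.shape[1])
    v2 = laguerre_outputs(x, alpha2, W2.shape[1])
    u = W1 @ v1 + W2 @ v2                 # Eq. (4.158), shape (H, N)
    y = np.full(len(x), y0, dtype=float)
    for q in range(1, C.shape[1] + 1):    # polynomial activations, Eq. (4.160)
        y += (C[:, q - 1:q] * u**q).sum(axis=0)
    return y                              # Eq. (4.159)
```

Gradient-descent training of the weights, polynomial coefficients, and the two Laguerre parameters would be layered on top of this forward pass, as in Equation (4.157).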
For instance, the equivalent first-order Volterra kernel of the LVN-2 model is composed of a fast component (denoted by subscript "f" and corresponding to i = 1):

k_f(m) = Σ_{h=1}^{H} c_{1,h} Σ_{j=0}^{L_1−1} w_{h,j}^{(1)} b_j^{(1)}(m)    (4.161)

and a slow component (denoted by subscript "s" and corresponding to i = 2):


k_s(m) = Σ_{h=1}^{H} c_{1,h} Σ_{j=0}^{L_2−1} w_{h,j}^{(2)} b_j^{(2)}(m)    (4.162)

where

k_1(m) = k_f(m) + k_s(m)    (4.163)

4.3 THE LAGUERRE-VOLTERRA NETWORK 255

The equivalent higher-order Volterra kernels of the LVN-2 model also contain components that mix the fast with the slow dynamics (cross-terms). For instance, the second-order kernel is composed of three components: a fast k_ff, a slow k_ss, and a fast-slow cross-term k_fs, which are given by the expressions
k_ff(m_1, m_2) = Σ_{h=1}^{H} c_{2,h} Σ_{j1=0}^{L_1−1} Σ_{j2=0}^{L_1−1} w_{h,j1}^{(1)} w_{h,j2}^{(1)} b_{j1}^{(1)}(m_1) b_{j2}^{(1)}(m_2)    (4.164)

k_ss(m_1, m_2) = Σ_{h=1}^{H} c_{2,h} Σ_{j1=0}^{L_2−1} Σ_{j2=0}^{L_2−1} w_{h,j1}^{(2)} w_{h,j2}^{(2)} b_{j1}^{(2)}(m_1) b_{j2}^{(2)}(m_2)    (4.165)

k_fs(m_1, m_2) = Σ_{h=1}^{H} c_{2,h} Σ_{j1=0}^{L_1−1} Σ_{j2=0}^{L_2−1} [ w_{h,j1}^{(1)} w_{h,j2}^{(2)} b_{j1}^{(1)}(m_1) b_{j2}^{(2)}(m_2) + w_{h,j1}^{(1)} w_{h,j2}^{(2)} b_{j1}^{(1)}(m_2) b_{j2}^{(2)}(m_1) ]    (4.166)

The second-order kernel is the summation of these three components. In general, the
equivalent qth-order Volterra kernel can be reconstructed from the LVN-2 parameters as

k_q(m_1, . . . , m_q) = Σ_{h=1}^{H} c_{q,h} Σ_{i1=1}^{2} . . . Σ_{iq=1}^{2} Σ_{j1=0}^{L_{i1}−1} . . . Σ_{jq=0}^{L_{iq}−1} w_{h,j1}^{(i1)} . . . w_{h,jq}^{(iq)} b_{j1}^{(i1)}(m_1) . . . b_{jq}^{(iq)}(m_q)    (4.167)

It is evident that there is a wealth of information in these kernel components that cannot
be retrieved with any other existing method. This approach can be extended to any num-
ber of filter banks.
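As a sketch (in Python, with hypothetical names), the first-order component reconstruction of Equations (4.161)-(4.163) reduces to weighted sums over the DLF impulse responses, which can themselves be obtained by driving the filter-bank recursion with a unit impulse:

```python
import numpy as np

def dlf_basis(alpha, L, M):
    """Impulse responses b_j(m), j = 0..L-1, m = 0..M-1, of the discrete
    Laguerre functions with parameter alpha (unit-impulse response of the
    DLF filter-bank recursion)."""
    B = np.zeros((L, M))
    sa, sb = np.sqrt(alpha), np.sqrt(1.0 - alpha)
    prev = np.zeros(L)
    for m_idx in range(M):
        cur = np.zeros(L)
        cur[0] = sa * prev[0] + (sb if m_idx == 0 else 0.0)
        for j in range(1, L):
            cur[j] = sa * prev[j] + sa * cur[j - 1] - prev[j - 1]
        B[:, m_idx] = cur
        prev = cur
    return B

def first_order_components(W1, W2, c1, alpha1, alpha2, M=100):
    """k_f, k_s and k_1 = k_f + k_s per Eqs. (4.161)-(4.163);
    c1[h] holds the linear coefficients c_{1,h}."""
    kf = (c1[:, None] * (W1 @ dlf_basis(alpha1, W1.shape[1], M))).sum(axis=0)
    ks = (c1[:, None] * (W2 @ dlf_basis(alpha2, W2.shape[1], M))).sum(axis=0)
    return kf, ks, kf + ks
```

Higher-order components (Eqs. 4.164-4.167) follow the same pattern, with outer products of the DLF basis functions replacing the single sums.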
A critical practical issue for the successful application of the LVN-2 model is the proper selection of its structural parameters, that is, the size of the DLF filter banks L_1 and L_2, the number H of hidden units, and the degree Q of the polynomial activation functions. As in the LVN case discussed previously, this selection can be made by successive trials in ascending order (i.e., moving from lower to higher values of the parameters) until a proper criterion is met (e.g., the statistical MOS criterion presented in Section 2.3.1) that defines statistically the minimum reduction in the normalized mean-square error (NMSE) of the output prediction achieved by the model for an increment in the structural parameters. Specifically, we commence the LVN-2 training with structural parameter values L_1 = L_2 = 1, H = 1, and Q = 1 and increment the structural parameters sequentially (starting with L_1 and L_2, and continuing with H and Q) until the MOS criterion is met. The training of the LVN-2 is performed in a manner similar to the training of the LVN (based on gradient descent) and has not presented any additional problems.
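The ascending-order search can be sketched generically as follows (Python, hypothetical names; `fit_nmse` stands in for training a model of the given size and returning its prediction NMSE, and the fixed `min_drop` threshold is a simplification of the statistical MOS criterion; L_1 and L_2 are merged into a single L for brevity):

```python
def ascending_search(fit_nmse, Lmax=8, Hmax=4, Qmax=4, min_drop=0.01):
    """Starting from (L, H, Q) = (1, 1, 1), increment one structural
    parameter at a time, keeping the increment only if the prediction
    NMSE drops by more than `min_drop`."""
    L = H = Q = 1
    best = fit_nmse(L, H, Q)
    for name, cap in (("L", Lmax), ("H", Hmax), ("Q", Qmax)):
        while True:
            trial = {"L": L, "H": H, "Q": Q}
            trial[name] += 1
            if trial[name] > cap:
                break
            nmse = fit_nmse(trial["L"], trial["H"], trial["Q"])
            if best - nmse > min_drop:       # increment accepted
                L, H, Q = trial["L"], trial["H"], trial["Q"]
                best = nmse
            else:                            # criterion met: stop growing
                break
    return (L, H, Q), best
```

In practice, `fit_nmse` would train the LVN-2 to convergence at each candidate size, so the search cost is dominated by the number of accepted increments.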

Illustrative Examples of LVN-2 Modeling. To illustrate the performance of the
LVN-2 modeling approach, we use three simulated nonlinear systems: the first system is
isomorphic to the LVN-2 model shown in Figure 4.44 (used as a "ground-truth" validation test), the second is a high-order modular system, and the third is a nonlinear parametric model defined by the differential equations (4.176)-(4.178) that are frequently used to
describe biophysical and biochemical processes [Mitsis & Marmarelis, 2002].
The first simulated system has Volterra kernels that are composed of linear combinations of the first three DLFs with two distinct Laguerre parameters: α_1 = 0.2 and α_2 = 0.8.
It can also be represented by the modular model of Figure 4.45 (with two branches of linear filters h_1 and h_2, representing the fast and slow dynamics, respectively) feeding into
the output static nonlinearity N given by the second-order expression

y(n) = u_1(n) + u_2(n) + u_1^2(n) − u_2^2(n) + u_1(n)u_2(n)    (4.168)



Figure 4.45 The modular model of the first simulated example of a second-order system [Mitsis & Marmarelis, 2002].

where u_1 and u_2 are the outputs of h_1 and h_2, respectively, given by

h_1(m) = b_0^{(1)}(m) + 2b_1^{(1)}(m) + b_2^{(1)}(m)    (4.169)

h_2(m) = b_0^{(2)}(m) − b_1^{(2)}(m) + 2b_2^{(2)}(m)    (4.170)

where b_j^{(i)}(m) denotes the jth-order DLF with parameter α_i.


By substituting the convolutions of the input with h_1 and h_2 for u_1 and u_2, respectively,
in Equation (4.168), the first-order and second-order Volterra kernels of this system are
found to be

k_1(m) = h_1(m) + h_2(m)    (4.171)

k_2(m_1, m_2) = h_1(m_1)h_1(m_2) − h_2(m_1)h_2(m_2) + (1/2)[h_1(m_1)h_2(m_2) + h_1(m_2)h_2(m_1)]    (4.172)
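The kernel relation of Equation (4.171) can be checked numerically: for a small impulse input, the quadratic terms of Equation (4.168) scale with the square of the input amplitude, so the scaled impulse response converges to k_1 = h_1 + h_2. A sketch in Python (the exponential impulse responses are illustrative stand-ins, not the DLF combinations of Eqs. 4.169-4.170):

```python
import numpy as np

# Stand-in fast and slow impulse responses (illustrative):
m = np.arange(60)
h1 = np.exp(-m / 3.0)          # fast branch
h2 = 0.5 * np.exp(-m / 20.0)   # slow branch

def simulate_modular(x):
    """Modular model of Figure 4.45 with the static nonlinearity of Eq. (4.168)."""
    u1 = np.convolve(x, h1)[:len(x)]
    u2 = np.convolve(x, h2)[:len(x)]
    return u1 + u2 + u1**2 - u2**2 + u1 * u2

# For a small impulse eps*delta(n), the response divided by eps approaches
# k1(m) = h1(m) + h2(m), since the quadratic terms scale as eps^2:
eps = 1e-6
x = np.zeros(80); x[0] = eps
k1_est = simulate_modular(x)[:60] / eps
```

The same perturbation idea, applied with pairs of impulses, recovers the second-order kernel of Equation (4.172).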

These kernels contain fast and slow components corresponding to the two distinct Laguerre parameters, as indicated in Equations (4.171) and (4.172).
The system is simulated with a Gaussian CSRS input of unit variance and length of 1024
data points (Δt = T = 1). Following the ascending-order MOS procedure described earlier,
we determine that three DLFs in each filter bank (L_1 = L_2 = 3) and three hidden units (H = 3)
with distinct second-degree polynomial activation functions (Q = 2) are adequate to model
the system, as theoretically anticipated. The number of free parameters of this LVN-2 model is (L_1 + L_2 + Q)H + 3 = 27. Note that the obtained structural parameter values in the selected LVN-2 model are not unique in general, although in this example they match the values anticipated by the construction of the simulated system. Different structural parameters
(i.e., LVN-2 model configurations) may be selected for different data from the same system,
and the corresponding LVN-2 parameter values (e.g., weights and polynomial coefficients)
will be generally different as well. However, what remains constant is the equivalent
Volterra representation of the system (i.e., the Volterra kernels of the system), regardless of
the specific LVN-2 configuration selected or the corresponding parameter values.
In the noise-free case, the estimated LVN-2 kernels of first and second order are identical to their true counterparts, given by Equations (4.171) and (4.172), respectively, and
the normalized mean-square error (NMSE) of the output prediction achieved by the LVN-2 model is on the order of 10^−3, demonstrating the excellent performance of this modeling
procedure. The first-order kernel is shown in Figure 4.46, along with its slow and fast
components estimated by LVN-2. The estimated second-order kernel and its three components are shown in Figure 4.47, demonstrating the ability of the LVN-2 to capture slow
and fast dynamics.
The utility of employing two filter banks in modeling systems with fast and slow dynamics can be demonstrated by comparing the performance of the LVN-2 model with an
LVN model (one filter bank) having the same total number of free parameters (L = 6, H =
3, Q = 2). In order to achieve comparable performance with a single filter bank, it was
found that we have to almost double the total number of free parameters, to 44 from 27
[Mitsis & Marmarelis, 2002].
In order to examine the effect of noise on the performance of the LVN-2 model, we
add independent GWN to the system output for a signal-to-noise ratio (SNR) equal to 0
dB (i.e., the noise variance equals the mean-square value of the de-meaned noise-free output). Convergence occurs in about 600 iterations (peculiarly, faster than in the noise-free
case) and the final estimates of α_1 and α_2 are not affected much by the presence of severe
noise (α̂_1 = 0.165 and α̂_2 = 0.811). The resulting NMSE value for the LVN-2 model prediction is 53.78% in the noisy case, which is deemed satisfactory, since the ideal NMSE
level is 50% for SNR = 0 dB. The NMSEs of the estimated first-order and second-order
Volterra kernels in the noisy case are given in Table 4.1, along with the estimates of α_1
and α_2 that corroborate the previous favorable conclusion regarding the efficacy of the
LVN approach, especially when compared to the estimates obtained via the conventional
cross-correlation technique (also given in Table 4.1).
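The 50% "ideal NMSE" figure follows directly from the definition of the NMSE: at SNR = 0 dB the additive noise carries as much power as the noise-free output, so even a perfect prediction leaves half of the noisy output's power unexplained. A quick numerical check (Python, illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.standard_normal(200_000)       # de-meaned "noise-free" output
noise = rng.standard_normal(200_000)   # SNR = 0 dB: noise variance equals
                                       # the mean-square value of y
y_noisy = y + noise

def nmse(pred, target):
    """Normalized mean-square error of `pred` against `target`."""
    return np.sum((target - pred) ** 2) / np.sum(target ** 2)

# Even the perfect model (pred = y) cannot explain the additive noise,
# so its NMSE against the noisy output approaches 50%:
print(nmse(y, y_noisy))   # ≈ 0.5 (to within sampling error)
```

Prediction NMSEs slightly above 50%, such as the 53.78% obtained here, therefore reflect a small residual modeling error on top of the irreducible noise power.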
The second simulated system also has the modular architecture shown in Figure 4.45
but with a fourth-order nonlinearity given by

y(n) = u_1(n) + 2u_2(n) + 4u_1^2(n) − 4u_2^2(n) + 4u_1(n)u_2(n) + (1/2)u_1^3(n) + (1/3)u_2^3(n) + (1/4)u_1^4(n) + (3/4)u_2^4(n)    (4.173)

Figure 4.46 Estimated first-order Volterra kernel for the first simulated system and its slow/fast components [Mitsis & Marmarelis, 2002].

Figure 4.47 Estimated second-order Volterra kernel for the first simulated system (a), and its three components: (b) fast component, (c) slow component, and (d) fast-slow cross-component [Mitsis & Marmarelis, 2002].

and linear filter impulse responses that are not linear combinations of DLFs but are given by

h_1(m) = exp(−m/5) sin(m/7)    (4.174)

h_2(m) = exp(−m/30) − exp(−m/5)    (4.175)

Employing the ascending-order MOS procedure for the selection of the structural parameters of the LVN-2 model, we select L_1 = L_2 = 7, H = 4, and Q = 4 (a total of 75 LVN-2
model parameters). The results for a Gaussian CSRS input of 4096 data points are excellent in the noise-free case, as before, and demonstrate the efficacy of the method for this
high-order system [Mitsis & Marmarelis, 2002].
The effect of output-additive noise on the performance of the LVN-2 model is examined for this system by adding 20 different independent GWN signals to the output for an
SNR of 0 dB. The resulting NMSE values (computed over the 20 independent trials) for
Table 4.1 Normalized mean-square errors (NMSEs) for model prediction and kernel estimates
using LVN-2 and conventional cross-correlation for SNR = 0 dB

                                               Prediction        Kernel NMSEs
                        α_1        α_2          NMSE (%)    k_1(m) (%)   k_2(m_1, m_2) (%)
  LVN-2                0.165      0.811          53.78         7.69           4.10
  Cross-correlation      —          —            86.38       421           1919

the output prediction and for the estimated kernels (mean value ± standard deviation) are
48.42 ± 2.64% for the output prediction, 3.72 ± 2.37% for the k_1 estimate, and 6.22 ±
3.82% for the k_2 estimate. The robustness of the method is evident, since the prediction
NMSE is close to 50% (the ideal NMSE for SNR = 0 dB) and the kernel NMSEs are low
compared to the variance of the output-additive noise. In fact, the NMSE of the kernel estimates can be used to define an SNR measure for the kernel estimates, computed as −10
log(NMSE), which yields about 14 dB for the k_1 estimate and about 12 dB for the k_2 estimate.
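This SNR measure is trivial to compute; a quick check against the mean kernel NMSEs quoted above (Python):

```python
import math

def kernel_snr_db(nmse):
    """SNR measure for a kernel estimate, defined as -10*log10(NMSE)."""
    return -10.0 * math.log10(nmse)

print(round(kernel_snr_db(0.0372), 1))   # k1 estimate: 14.3 dB
print(round(kernel_snr_db(0.0622), 1))   # k2 estimate: 12.1 dB
```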
Finally, a third system of different structure is simulated, which is described by the following differential equations (the input-output data are properly discretized after continuous-time simulation of these differential equations):

dy(t)/dt + b_0 y(t) = [c_1 z_1(t) − c_2 z_2(t)] y(t) + y_0    (4.176)

dz_1(t)/dt + b_1 z_1(t) = x(t)    (4.177)

dz_2(t)/dt + b_2 z_2(t) = x(t)    (4.178)

where y_0 is the output baseline (basal) value and z_1(t) and z_2(t) are internal state variables
whose products with the output y(t) in the bilinear terms of Equation (4.176) constitute
the nonlinearity of this system, which gives rise to an equivalent Volterra model of infinite order. The nonlinearities of this system [i.e., the bilinear terms of Equation (4.176)]
may represent modulatory effects that are often encountered in physiological regulation
(neural, endocrine, cardiovascular, metabolic, and immune systems) or in intermodulatory interactions of cellular/molecular mechanisms (including voltage-dependent conductances of ion channels in neuronal membranes or ligand-dependent conductances in
synapses; see Chapter 8).
The contribution of the qth-order Volterra kernel to the output of this system is proportional to the qth powers of Rc_1 and Rc_2, where R is the root-mean-square value of the input. When the magnitudes of c_1 and c_2 are smaller than one, a truncated Volterra model
can be used to approximate the system. For the parameter values of b_0 = 0.5, b_1 = 0.2, b_2 =
0.02, c_1 = 0.3, and c_2 = 0.1, it was found that a fourth-order LVN-2 model was sufficient
to represent the system output for a Gaussian CSRS input with unity power level and
length of 2048 data points.
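A forward-Euler sketch of Equations (4.176)-(4.178) in Python is given below; the baseline value y_0 = 1 is an assumption chosen so that k_0 = y_0/b_0 = 2, consistent with the estimated zeroth-order kernel reported later in this section:

```python
import numpy as np

def simulate(x, dt=0.01, b0=0.5, b1=0.2, b2=0.02, c1=0.3, c2=0.1, y0_base=1.0):
    """Forward-Euler integration of Eqs. (4.176)-(4.178)."""
    y = z1 = z2 = 0.0
    out = np.empty(len(x))
    for n, xn in enumerate(x):
        dy = -b0 * y + (c1 * z1 - c2 * z2) * y + y0_base   # Eq. (4.176)
        dz1 = -b1 * z1 + xn                                # Eq. (4.177)
        dz2 = -b2 * z2 + xn                                # Eq. (4.178)
        y, z1, z2 = y + dt * dy, z1 + dt * dz1, z2 + dt * dz2
        out[n] = y
    return out

# With zero input, z1 = z2 = 0, the bilinear terms vanish, and y settles
# at the basal level y0/b0 = k0 = 2:
y = simulate(np.zeros(5000))
print(y[-1])   # ≈ 2.0
```

A finer integration step (or a higher-order scheme) would be advisable before discretizing the input-output data for actual kernel estimation.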
Following the proposed MOS search procedure for the selection of the structural parameters of the model, an LVN-2 with L_1 = L_2 = 5, H = 3, and Q = 4 was selected (a total of
45 free parameters). The obtained results for the noise-free and noisy (SNR = 0 dB) conditions are given in Table 4.2, demonstrating the excellent performance of the LVN-2
Table 4.2 LVN-2 model performance for the system described by Equations (4.176)-(4.178)

                                                    Output prediction   First-order kernel
                               α_1        α_2          NMSE (%)             NMSE (%)
  Noise-free output           0.505      0.903           0.28                 0.10
  Noisy output (SNR = 0 dB)   0.366      0.853          47.49                 4.13

model for this system as well. It should be noted that the estimated zeroth-order kernel
was equal to 1.996, very close to its true value of 2.
The equivalent Volterra kernels for this system can be analytically derived by using
the generalized harmonic-balance method described in Section 3.2. The resulting analytical expressions for the zeroth-order and first-order kernels are

k_0 = y_0 / b_0    (4.179)

k_1(m) = (y_0/b_0) { [c_1/(b_1 − b_0)] [exp(−b_0 m) − exp(−b_1 m)] − [c_2/(b_2 − b_0)] [exp(−b_0 m) − exp(−b_2 m)] }    (4.180)

The analytical forms of the higher-order kernels are rather complex and are not given
here in the interest of space, but the general expression for the second-order Volterra kernel is given by Equation (3.71). The fast component of the first-order kernel corresponds
to the first exponential difference in Equation (4.180), whereas the slow component corresponds to the second exponential difference in Equation (4.180) (recall that b_1 is ten times
bigger than b_2 in this simulated example).
We close this section with the conclusion that the use of the LVN offers a powerful means
for efficient modeling of nonlinear physiological systems from short input-output data
records. The problem of efficient modeling of nonlinear systems with fast and slow dynamics can also be addressed by employing two filter banks characterized by distinct Laguerre parameters (the LVN-2 model). The efficiency and robustness of this approach
were demonstrated in the presence of severe output-additive noise.

4.4 THE VWM MODEL

The basic rationale for the use of the Volterra-equivalent network models is twofold: (1)
the separation of the dynamics from the nonlinearities occurring at the first hidden layer,
and (2) the compact representation of the dynamics with a "judiciously chosen" filter
bank and of the nonlinearities with a "properly chosen" structure of activation functions
in the hidden layer(s).
This basic rationale is encapsulated in the proposed Volterra-Wiener-Marmarelis
(VWM) model in a manner considered both general and efficient that obviates the need
for a preselected basis in the filter bank and provides additional stability of operation with
the use of a trainable compressive transformation cascaded with the polynomial activation
functions. The overall structure of the VWM model exhibits close affinity with the
Volterra-equivalent network models shown in Figure 4.36 and discussed in Section 4.2.2.
Specifically, the VWM model is composed of cascaded and lateral operators (both lin-
ear and nonlinear) forming a layered architecture, as shown in Figure 4.48 and described
below.
The previously employed filter bank for input preprocessing is replaced in the VWM
model with a set of linear difference equations of the form

v_j(n) = α_{j,1} v_j(n − 1) + . . . + α_{j,K_j} v_j(n − K_j) + x(n) + β_{j,1} x(n − 1) + . . . + β_{j,M_j} x(n − M_j)    (4.181)

Figure 4.48 Schematic of the VWM model. Each input-preprocessing mode #j (j = 1, . . . , L) is an autoregressive with exogenous variable (ARX) difference equation of order (K_j, M_j); the activation functions {f_h} and {g_i} are polynomials of degree Q_h and R_i, respectively, as shown in Equations (4.182) and (4.184). {S_h} are sigmoidal functions given by Equation (4.183), and the weights {w_{j,h}} and {ζ_{h,i}} are applied according to Equations (4.182) and (4.184), respectively.

which define the "modes" of the system, where x(n) is the input and v_j(n) is the output of
the jth mode. Note that β_{j,0} is set equal to 1, because subsequent weighting of the mode
outputs in Equation (4.182) makes these coefficients redundant. The modes defined in
Equation (4.181) for j = 1, . . . , L serve as the input preprocessing filter bank but take the
form of trainable ARX models instead of the preselected filter bank basis in the VEN architecture. The mode outputs are fed into the units of the first hidden layer (after proper
weighting) and are transformed first polynomially and subsequently sigmoidally to generate the output of the hth hidden unit as

z_h(n) = S_h{ Σ_{q=1}^{Q_h} c_{h,q} [ Σ_{j=1}^{L} w_{j,h} v_j(n) ]^q }    (4.182)

where the sigmoidal transformation (denoted by S_h) is distinct for each hidden unit (with
trainable slope λ_h and offset θ_h) and is given by the bipolar sigmoidal expression of the
hyperbolic tangent:

S_h{u} = (1 − exp[−λ_h(u − θ_h)]) / (1 + exp[−λ_h(u − θ_h)]) = tanh[λ_h(u − θ_h)/2]    (4.183)

The outputs of these hidden units are combined linearly with weights {ζ_{h,i}} and entered
into the units of a second hidden layer that will be termed the "interaction layer" for clarity of communication. Each "interaction unit" generates an output

ψ_i(n) = Σ_{r=1}^{R_i} γ_{i,r} [ Σ_{h=1}^{H} ζ_{h,i} z_h(n) ]^r    (4.184)

by means of a polynomial transformation of degree R_i.


The VWM model output is formed simply by summing the outputs of the interaction
units and an output offset y_0:

y(n) = y_0 + Σ_{i=1}^{I} ψ_i(n)    (4.185)
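Putting Equations (4.181)-(4.185) together, a minimal forward-pass sketch of the VWM model in Python is given below (all names are hypothetical; the sigmoid exploits the identity (1 − e^{−a})/(1 + e^{−a}) = tanh(a/2)):

```python
import numpy as np

def vwm_forward(x, modes, W, C, lam, theta, Z, G, y0=0.0):
    """VWM forward pass, Eqs. (4.181)-(4.185).
    modes: list of (a, b) ARX coefficient arrays for Eq. (4.181);
    W: (L, H) mode-to-hidden weights w_{j,h}; C: (H, Q) polynomial
    coefficients c_{h,q}; lam, theta: sigmoid slopes/offsets (Eq. 4.183);
    Z: (H, I) hidden-to-interaction weights zeta; G: (I, R) interaction
    polynomial coefficients gamma (Eq. 4.184)."""
    N, L = len(x), len(modes)
    v = np.zeros((L, N))
    for j, (a, b) in enumerate(modes):           # Eq. (4.181), beta_{j,0} = 1
        for n in range(N):
            v[j, n] = x[n]
            v[j, n] += sum(a[k] * v[j, n - 1 - k] for k in range(len(a)) if n - 1 - k >= 0)
            v[j, n] += sum(b[k] * x[n - 1 - k] for k in range(len(b)) if n - 1 - k >= 0)
    u = W.T @ v                                   # (H, N) hidden-unit inputs
    Q = C.shape[1]
    poly = sum(C[:, q - 1:q] * u**q for q in range(1, Q + 1))
    z = np.tanh(lam[:, None] * (poly - theta[:, None]) / 2.0)   # Eq. (4.183)
    w = Z.T @ z                                   # (I, N) interaction inputs
    R = G.shape[1]
    psi = sum(G[:, r - 1:r] * w**r for r in range(1, R + 1))    # Eq. (4.184)
    return y0 + psi.sum(axis=0)                   # Eq. (4.185)
```

All parameters entering this forward pass would be trained jointly by gradient descent via back-propagation, as discussed in the text.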

No output weights are needed because the scaling of the contributions of the interaction
units is absorbed by the coefficients {γ_{i,r}}.
The equivalence of the VWM model with the discrete Volterra model is evident from
the discussion of Section 4.2.1. Note that the cascaded polynomial-sigmoidal nonlinear
transformation in the hidden layer endows the VWM model with the potential capabilities
of both polynomial and sigmoidal activation functions in a data-adaptive manner, so that
the combined advantages can be secured. We expounded the advantages of polynomial
activation functions previously. The additional sigmoidal operation is meant to secure stability of bounded outputs in the hidden units (which is a potential drawback of polynomial transformations) and facilitate the convergence of the training process. However, it
may not be needed, in which case the trained values of the slopes {λ_h} become very small
(being reduced to a linear scaling transformation) and the respective {ζ_{h,i}} values become
commensurately large to compensate for the scaling of the small slope values. The key
practical questions are: (1) how to select appropriately the structural parameters of this
model, and (2) how to estimate the model parameters from input-output data (via cost-function minimization, since many parameters enter nonlinearly in this estimation problem).
The selection of the structural parameters follows the guidelines of the search procedure presented in Section 4.2.2. A simplification occurs when we accept a uniform structure for the same type of units in each layer (i.e., K_j = K, M_j = M, Q_h = Q, R_i = R). Then the
structural parameters of the VWM model are L, K, M, H, Q, I, and R. Certain additional
simplifications are advisable in practice. For instance, K and M can be fixed to 4 or 6,
since most physiological systems to date have exhibited modes with no more than two or
three resonances (from the PDM analysis of actual physiological systems). Although this
is not asserted as a general rule and there are no guarantees of general applicability, one
has to maintain a practicable methodological sense rooted in the accumulated experience.
Naturally, if this rule appears inappropriate in a certain case, different values of K and M
should be explored. Following the same reasoning, R can be limited to 2, since the purpose of the interaction units is to introduce the "cross-terms" missing in the "separable
Volterra network" architecture and achieve a better approximation of the multiinput static
nonlinearity of the system. This is largely achieved by setting R = 2. Finally, Q can be set
to 3 on the basis of the accumulated experience to date regarding the observable order of
the actual physiological systems in a practical context. Recall that in the presence of the
cascaded polynomial-sigmoidal transformations, the nonlinear order of the VWM model
becomes infinite. However, lower-order models are possible when the trained slopes of
the sigmoidal transformations become very small (effectively, a linear transformation
within the dynamic range of the input). Note that the inclusion of the sigmoidal transformation (of trainable slope) at the outputs of the hidden units is motivated by the desire to
avoid possible numerical instabilities (due to the polynomial transformations) by bounding the outputs of the hidden units. Because of the trainable slope of the sigmoidal functions, no such "bounding" is applied if not needed (corresponding to a very small value of
λ_h, compensated by the trained values of ζ_{h,i}).
With these practical simplifications in the structure of the VWM model, the remaining
structural parameters are L, H, and I. The model-order selection procedure presented in
Section 4.2.2 (utilizing the statistical criterion presented in Section 2.3.1) can be applied
for these three structural parameters in the prescribed rightward ascending order (i.e.,
starting with L = H = I = 1, continuing with increasing L up to L_max, and then increasing H
and I in successively slower "looping speeds").

Having determined the structure of the VWM model, the parameter estimates are obtained via the training procedures discussed in the previous section by application of the
chain rule of differentiation in error back-propagation. It should be emphasized that the
training of the mode equations obviates the need for "judicious selection" of the input filter bank basis (a major practical advantage) and is expected to yield the minimum set of
filters for input preprocessing based on guidance from the input-output data. One cannot
ask for more, provided the training procedure converges properly.
The equivalent Volterra kernels for the VWM model can be obtained by expressing the
sigmoidal functions in terms of Taylor series expansions or finite polynomial approximations within the range of their abscissa values. For instance, the analytical expression for
the first-order Volterra kernel of the VWM model is

k_1(m) = Σ_{i=1}^{I} γ_{i,1} Σ_{h=1}^{H} ζ_{h,i} a_{h,1}(θ_h, λ_h) c_{h,1} p_h(m)    (4.186)

where p_h(m) denotes the hth PDM defined by Equation (4.187) and a_{h,1} is the first-order
Taylor coefficient of S_h, which depends on the respective slope λ_h and offset θ_h. The resulting analytical expressions for the higher-order kernels are rather complex in the general
case and are omitted in the interest of space. Furthermore, their practical utility is marginal because the physiological interpretation of the VWM model ought to be based on the
equivalent PDM model shown in Figure 4.1. Note that the hth PDM is given by

p_h(m) = Σ_{j=1}^{L} w_{j,h} g_j(m)    (4.187)

where g_j(m) denotes the impulse response function of the ARX (mode) Equation (4.181).
The number of free parameters in the VWM model is

P = L(K + M) + H(L + Q + 2) + I(H + R) + 1    (4.188)

which indicates that the VWM model complexity is linear with respect to each structural
parameter separately, although it is bilinear with respect to pairs of parameters. This fact
is of fundamental practical importance, especially with regard to the nonlinear orders Q
and R, and constitutes the primary advantage (model compactness) of the use of the
VWM model and, generally, Volterra-equivalent networks for nonlinear dynamic system
modeling. In most actual applications, we expect K = M = 4, Q = 3, R = 2, and L ≤ 4, H
≤ 3, I ≤ 2. Therefore, the maximum number of free parameters in a practical context is
expected to be about 70 (for L = 4, H = 3, I = 2). In the few cases where K = M = 6, the
maximum number of free parameters becomes 86. In most cases, we expect P to be between 30 and 50. This implies that effective modeling is possible with a few hundred input-output data points.
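Equation (4.188) can be checked directly against the two configurations quoted above:

```python
def vwm_param_count(L, K, M, H, Q, I, R):
    """Free-parameter count of the VWM model, Eq. (4.188)."""
    return L * (K + M) + H * (L + Q + 2) + I * (H + R) + 1

print(vwm_param_count(L=4, K=4, M=4, H=3, Q=3, I=2, R=2))   # 70
print(vwm_param_count(L=4, K=6, M=6, H=3, Q=3, I=2, R=2))   # 86
```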
The structural parameter H is the most critical because it defines the number of "principal dynamic modes" (PDMs) in the Volterra-equivalent PDM model that is used for
physiological interpretation (i.e., each hidden unit corresponds to a distinct PDM). A
smaller number of PDMs facilitates the physiological interpretation of the VWM model
and affects significantly the computational complexity of the VWM model, because
∂P/∂H = L + Q + I + 2. Likewise, the computational complexity is affected significantly
by the necessary number of modes L, because ∂P/∂L = K + M + H. Each of the PDMs

corresponds to the linear combination of the VWM modes given by Equation (4.187),
which defines the input entering into each hidden unit in the VWM model.
In closing, we note that the mode equations of the VWM model can also be viewed as
"state equations," since the mode outputs {v_j(n)} describe the dynamic state of the system
at time n. This dynamic state is subject to a static nonlinear transformation to generate the
system output. The nonlinear transformation is implemented (approximated) by means of
the activation functions of the hidden and interaction layers in the combination prescribed
by the VWM model structure of Figure 4.48.
Finally, we note that the VWM approach can be extended to the case of multiple inputs
and multiple outputs with immense scalability advantages through the use of the interaction layer, as discussed in Chapter 7.
5
A Practitioner's Guide

". . . explanations of natural phenomena must be as simple as possible . . . but no simpler . . ."
—Attributed to Albert Einstein

In this chapter, we summarize the key practical issues confronting the actual application
of the advocated modeling approach, and we present proper procedures for addressing
these issues in a realistic context. The first two sections deal with initial groundwork and
the last two sections deal with the core issues of model specification, estimation, validation, and interpretation. This is meant to be a "how to" chapter, useful for the practitioner
of this methodology. User-friendly software packages implementing this methodology are
distributed (free of charge) by the Biomedical Simulations Resource at USC
(http://bmsr.usc.edu).
Before delving into the specifics of the advocated methodology, the reader should be
alerted to the "golden rules" that ought to govern the three main facets of the modeling
task in the case of physiological systems, regarding the proper (1) operating conditions
and input ensemble, (2) model specification and estimation, and (3) model interpretation
and utilization.

Rule #1: We must try to secure natural operating conditions and a broad, representative input ensemble.
Rule #2: We must minimize a priori constraints on the selected model structure and use robust estimation methods associated with proper means of validation and accuracy assessment.
Rule #3: We must interpret physiologically the obtained model in the realistic nonlinear dynamic context (not simplistic cases of contrived experimental paradigms) and with regard to the intended utilization (diagnosis, control, assessment, scientific knowledge).

Nonlinear Dynamic Modeling of Physiological Systems. By Vasilis Z. Marmarelis 265
ISBN 0-471-46960-2 © 2004 by the Institute of Electrical and Electronics Engineers.

5.1 PRACTICAL CONSIDERATIONS AND EXPERIMENTAL REQUIREMENTS

In this section, we discuss the main practical considerations that arise in actual applications of nonlinear modeling of physiological systems. We are concerned with modeling
studies of physiological systems where the input-output signals are recorded (either spontaneously or in controlled experiments) as evenly sampled time-series datasets and
processed to obtain a mathematical model of the dynamic input-output relationship. Multiple inputs and outputs are possible, including cases of spatiotemporal patterns (e.g., a
visual stimulus comprised of time-varying spatial patterns) or spectrotemporal patterns
(e.g., an auditory stimulus comprised of time-varying spectral patterns). The input signal
may represent the natural spontaneous activity of the system or be controlled experimentally so that its key attributes (bandwidth and dynamic range) may be selected to cover the
operational system bandwidth and dynamic range. The sampling interval is assumed to be
the same for the input and output signals, and it is selected so that the Nyquist frequency
(i.e., the inverse of twice the sampling interval) exceeds the system bandwidth. Analog
low-pass filtering is applied to the input and output signals prior to discretization (sampling) to avoid aliasing due to extraneous noise.
Of particular importance are the input signal characteristics, which must be selected in accordance with the system characteristics and the objectives of the modeling study. For instance, the highest frequency of interest in the system response determines the system
bandwidth and, consequently, the minimum input bandwidth that should be used. Likewise, the lowest frequency of interest in the system response corresponds to the system
memory and, consequently, determines the minimum experimental data record required
(typically a small multiple of the system memory). These practical considerations are organized in three categories regarding: (1) system characteristics (Section 5.1.1), (2) input
characteristics (Section 5.1.2), and (3) experimental considerations (Section 5.1.3).

5.1.1 System Characteristics

We must first examine some basic system characteristics that influence the proper design
of the modeling study. These characteristics can be clustered in two groups: one comprising key system parameters (i.e., bandwidth, memory, and dynamic range) and the other
pertaining to key system functional properties (linearity, stationarity, and ergodicity). The
actual determination of these characteristics is subject to experimental variations and
noise/interference and, therefore, calls for appropriate selection criteria that take into account this uncertainty in the context of the specific data (spontaneous activity or controlled experiments). Generally, the ability to perform preliminary experimental tests for
this purpose is desirable, as discussed in Section 5.2.

System Bandwidth. The system bandwidth is defined as the range of input frequencies over which the system generates significant response. In an experimental setting, it can be established through preliminary testing with narrow-band (sinusoidal) inputs of sweeping frequencies or with broadband white-noise inputs that cover the frequency range of interest, as discussed in Section 5.2.1. The use of white-noise (or, generally, broadband) inputs is appropriate for testing nonlinear systems since it can generate nonlinear interactions among simultaneously applied multiple frequencies, whereas narrow-band inputs do not generate such interactions and, therefore, do not test the system exhaustively.
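A band-limited GWN test input of the kind recommended here can be sketched in a few lines (a minimal illustration that imposes the band limit with a brick-wall FFT mask; the function name and parameter choices are ours, not from the text):

```python
import numpy as np

def bandlimited_gwn(n_samples, fs, bandwidth, sigma=1.0, seed=0):
    """Approximately band-limited Gaussian white noise: generate GWN,
    then zero all FFT components above `bandwidth` (Hz)."""
    rng = np.random.default_rng(seed)
    x = rng.normal(0.0, sigma, n_samples)
    X = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(n_samples, d=1.0 / fs)
    X[freqs > bandwidth] = 0.0          # brick-wall low-pass in the frequency domain
    y = np.fft.irfft(X, n=n_samples)
    return y * (sigma / y.std())        # rescale to the requested standard deviation

x = bandlimited_gwn(4096, fs=100.0, bandwidth=20.0)
```

A practical stimulus would additionally be range-limited (clipped to the transducer's amplitude range), as discussed later in this section.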

The definition of "significant response" is heuristic and calls for an empirical or statistical criterion. The convention usually followed in the engineering literature is based on a 3 dB reduction of the frequency response gain (i.e., the half-power point) but appears inappropriate for the purpose of determining the bandwidth of physiological systems, in which frequency response gains as low as 1% of the maximum may be of interest. It is recommended that the selection criterion be applied to the spectrum of the elicited response signal for a broadband or white-noise stimulus and be set as low as -40 dB (which corresponds to a -20 dB threshold for the frequency response gain). If the presence of significant ambient noise prevents the application of such a low threshold by placing a higher "noise floor" on the spectrum of the measured response, then either the input-signal power should be increased to improve the signal-to-noise ratio at the output over a wider frequency range, or the "noise floor" should be accepted as the threshold. If controlled experiments are not possible and only natural data are available, then the ratio of the magnitudes of the input-output Fourier transforms can be used as a measure of frequency-response gain to determine the system bandwidth.
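This Fourier-ratio idea can be sketched as follows (a hypothetical helper that uses a segment-averaged cross-spectral "H1" gain estimate to tame the variance of raw per-bin FFT ratios; the function names, the synthetic first-order test system, and the -20 dB threshold are illustrative assumptions):

```python
import numpy as np

def gain_bandwidth(x, y, fs, nseg=64, threshold_db=-20.0):
    """Highest frequency at which the empirical frequency-response gain
    stays above `threshold_db` relative to its maximum. The gain is the
    averaged cross-spectrum over input spectrum (H1 estimate)."""
    n = len(x) // nseg * nseg
    X = np.fft.rfft(x[:n].reshape(nseg, -1), axis=1)
    Y = np.fft.rfft(y[:n].reshape(nseg, -1), axis=1)
    Sxy = (np.conj(X) * Y).mean(axis=0)
    Sxx = (np.abs(X) ** 2).mean(axis=0)
    gain_db = 20.0 * np.log10(np.abs(Sxy) / Sxx + 1e-15)
    freqs = np.fft.rfftfreq(n // nseg, d=1.0 / fs)
    return freqs[gain_db >= gain_db.max() + threshold_db].max()

# Synthetic check: a first-order low-pass system driven by white noise
rng = np.random.default_rng(1)
fs, n = 200.0, 16384
x = rng.normal(size=n)
y = np.zeros(n)
for k in range(1, n):
    y[k] = 0.9 * y[k - 1] + 0.1 * x[k]   # gain falls to -20 dB near 35 Hz
bw = gain_bandwidth(x, y, fs)
```

For noisy natural data, longer records and more averaging segments would be needed before the -20 dB crossing becomes reliable.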

System Memory. The system memory (or settling time) is defined as the time elapsed for the diminution of the causal effect of an impulsive input stimulus. Again, the "diminution of the causal effect" in the system response calls for a threshold criterion, which is subject to the same measurement noise ambiguities discussed above for the system bandwidth. It is recommended that this determination be made with a generous disposition, using either an impulsive or a step input. If neither of those deterministic inputs is experimentally feasible or desirable, but broadband input-output data are available (either natural or experimental), then the input-output cross-correlation can be used to establish the system memory (see Section 5.2.2).
Another issue related to the system memory is the low end of the frequency range of interest in the system response. The latter can be used to determine the minimum length of the system kernels, which is equivalent to the system memory (all kernels of a system are assumed to have approximately the same effective memory extent). Note that a very important parameter in nonparametric model estimation is the memory-bandwidth product of the system, which determines the minimum number of discrete-time samples along each kernel dimension required for adequate discrete-time representation of the kernels (equivalent to the ratio of the highest to the lowest significant frequency in the frequency response characteristics of the system).
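As a back-of-the-envelope illustration of the memory-bandwidth product (our own arithmetic, not from the text): a system with 0.5 s memory and 50 Hz bandwidth needs about 0.5 × (2 × 50) = 50 samples per kernel dimension, and the number of distinct kernel values then grows combinatorially with the nonlinear order:

```python
from math import comb

def kernel_samples(memory_s, bandwidth_hz):
    """Minimum samples per kernel dimension: memory times the Nyquist rate."""
    return int(round(memory_s * 2 * bandwidth_hz))

def volterra_coeffs(M, Q):
    """Distinct coefficients of a discrete Volterra model of order Q with M
    samples per kernel dimension (symmetric kernels, i.e., combinations
    with repetition), including the zeroth-order term."""
    return sum(comb(M + q - 1, q) for q in range(Q + 1))

M = kernel_samples(memory_s=0.5, bandwidth_hz=50.0)   # 50 samples
n_coeffs = volterra_coeffs(M, 2)                      # 1 + 50 + 1275 = 1326
```

This count is what makes kernel expansions on compact bases (Section 2.3.1) so valuable in practice.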

System Dynamic Range. The system dynamic range is defined as the input amplitude range over which the natural operation of the system occurs and the system response properties are modeled. It is usually determined by the maximum and minimum amplitude values of the input signal that the physiological system is expected to experience in the course of its natural operation (also called the physiological range), since it is desirable to use experimental stimuli that resemble as much as possible the natural input ensemble. In engineering, the dynamic range is often given in dB and defined as 20 log [max/min] (if min > 0), a convention that is not compelling to adopt in the context of physiological systems, for which it is more appropriate to simply specify the min-max range.
Although the sought model ought to be valid over the entire physiological range, many studies to date have resorted to segmentation of this range experimentally in order to simplify the modeling task when the system exhibits significant nonlinearities. This range segmentation often seeks to approximate the local behavior of the nonlinear system in the neighborhood of an operating point by means of a linearized model. The characteristics of this linearized model change, of course, for each different operating point in order to fit the global nonlinear characteristics of the system. This linearization approach may be useful in certain cases but generally results in unwieldy model representations and potentially serious misunderstandings due to the multiple "localized" models. Our goal is to seek "global" models that are valid over the entire system dynamic range and avoid range segmentation.

System Linearity. The issue of system linearity is critical for modeling purposes, since nonlinearities require more elaborate methodologies and present themselves with far greater diversity of model forms. The humorous phrase "to divide all systems into linear and nonlinear is like dividing the world into bananas and nonbananas" punctuates the point of the preponderance of possible system nonlinearities over the very particular case of linearity. Linearity, of course, is attractive because of its relative simplicity and was defined in Section 1.3.2 by means of the superposition principle. Although most physiological systems do not strictly obey the superposition principle, many can be approximated fairly well by means of linear models under certain (possibly restrictive) operating conditions.

For a given dynamic range of interest, the linearity of a system can be examined experimentally using one of many possible tests based on the superposition principle or other fundamental properties of a linear system (e.g., a fundamental property of linear time-invariant systems is the generation of a sinusoidal output in response to a sinusoidal input at the same frequency), as discussed in Section 5.2.4. White-noise inputs can also be used to test the linearity of a system and possess the advantage of simultaneously testing the system over all frequencies, a far more vigorous test than with deterministic inputs such as sinusoids or impulses/pulses (see Section 5.2.4). Another important aim of the linearity test can be to provide insight into the type/order of nonlinearity, if linearity is rejected. These aspects of the linearity test and the required mathematical derivations for rigorous analysis are elaborated in Section 5.2.4 due to their considerable importance for actual applications.

System Stationarity. The issue of system stationarity (or time invariance) is likewise critical for modeling purposes, since possible system nonstationarities are likely to present a daunting modeling challenge when combined with system nonlinearities. The test for stationarity is briefly described in Section 5.2.3 and its application is further discussed in Chapter 9 in connection with nonstationary system modeling. The key issue that serves as a methodological branching point is the relation between the rapidity of nonstationary change and the extent of the system memory (or the time constants of the system dynamics). When the system nonstationarity is slow relative to the low-frequency end of the system dynamics, it is possible to obtain approximate "piecewise stationary" models over successive (and possibly overlapping) time segments of the data or use some tracking method based on recursive relations (e.g., recursive least squares). However, when the nonstationarity is fast relative to the system dynamics, only certain cases can be treated with the specialized methods discussed in Chapter 9.
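A recursive-least-squares tracker of the kind alluded to above can be sketched as follows (a generic textbook RLS update with exponential forgetting, applied to a toy slowly drifting FIR system; the names, forgetting factor, and drift rate are illustrative choices, not prescriptions from the text):

```python
import numpy as np

def rls_update(w, P, phi, d, lam=0.99):
    """One recursive-least-squares step with forgetting factor `lam`.
    w: weight estimate, P: inverse-correlation matrix,
    phi: regressor vector, d: desired (measured) output sample."""
    Pphi = P @ phi
    k = Pphi / (lam + phi @ Pphi)      # gain vector
    e = d - w @ phi                    # a priori prediction error
    w = w + k * e
    P = (P - np.outer(k, Pphi)) / lam
    return w, P

# Track a 3-tap FIR system whose first tap drifts slowly (slow nonstationarity)
rng = np.random.default_rng(0)
n, m = 2000, 3
x = rng.normal(size=n)
w_hat, P = np.zeros(m), np.eye(m) * 100.0
for k in range(m, n):
    h = np.array([1.0 + 0.0005 * k, -0.5, 0.25])   # slowly drifting "true" system
    phi = x[k - m + 1:k + 1][::-1]                 # [x_k, x_{k-1}, x_{k-2}]
    w_hat, P = rls_update(w_hat, P, phi, h @ phi)
```

With lam = 0.99 the effective estimation window is roughly 1/(1 - lam) = 100 samples, so the tracker follows the drift with a small lag; faster nonstationarities would defeat this scheme, as the text notes.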

System Ergodicity. The issue of system ergodicity is one that is seldom acknowledged in practice despite its fundamental importance. It refers to the invariance of the functional properties for various experimental preparations of a given physiological system. It is widely accepted that such invariance is not possible in the real world, because each experimental preparation exhibits some systemic variation due to a myriad of uncontrollable (intrinsic and extrinsic) factors, regardless of our best efforts to have identical experimental preparations. The same is true for intersubject variability in clinical data. Thus, ergodicity does not usually hold and the best we can hope for is that these random systemic variations across different preparations exhibit fixed statistical characteristics. In this case, appropriate statistical analysis of the results obtained from different preparations is possible and the formation of proper averages remains meaningful.
In spite of this fact, the ergodic property is usually postulated in practice for physiological systems, since it is often impractical or even impossible to actually perform appropriate statistical analysis across a broad ensemble of results from real data. An ergodicity test will have to involve a large number of different preparations that are not usually available. Nonetheless, the issue should be acknowledged and addressed in the proper manner in each practical context, given that it may call into question the results obtained through customary practice (averaging results from different preparations). It is our belief that the lack of ergodicity across physiological preparations or clinical subjects need not be confounding in actual modeling studies and can be properly addressed with the current methodological means.

5.1.2 Input Characteristics


A key input requirement for nonlinear system modeling is that the ensemble of input waveforms contains most, if not all, frequencies and amplitudes of interest (i.e., covers the entire bandwidth and dynamic range of the system under study). This can be achieved with a natural ensemble of inputs (if such is available) or, in an experimental setting, the system must be tested exhaustively over its entire bandwidth and dynamic range in order to observe all possible nonlinear interactions. This clearly implies that the system cannot be tested separately for different frequency bands and amplitude ranges, since the superposition principle does not hold for nonlinear systems and nonlinear interactions must be observed, lest they remain unmodeled. It is for this reason that we favor random (or pseudorandom) broadband input signals that cover the entire system bandwidth and dynamic range of interest, or a natural ensemble of inputs (if available), which tends to be broadband and cover the physiological range of system operation.
Although deterministic input signals can be constructed that satisfy the broadband requirements (e.g., chirp waveforms), most attention has been directed toward random or pseudorandom input signals that approximate white noise over the bandwidth of interest (i.e., nearly constant power spectral density over the system bandwidth). This tendency is due in part to Norbert Wiener's pioneering proposition of more than 50 years ago that the "effective test input" for nonlinear systems is Gaussian white noise (GWN). However, the use of white-noise or quasiwhite random inputs is also supported by the fact that natural stimuli (constituting the ideal input ensemble) are typically broadband and appear stochastic in form. Another important consideration is the availability of appropriate methodologies for the analysis of such data. This motivated the early use of specialized test inputs (e.g., multiple impulses or sinusoids, GWN or quasiwhite signals, etc.) that facilitated the modeling task. It is only recently that the requisite methodologies became available for effective modeling under natural inputs and operating conditions.
In addition to the input signal waveform, we must select the dynamic range of input amplitudes over which we intend to study the system. We advocate the use of the full physiological range of input amplitudes (i.e., the normal operating range of the system) but this may be constrained by experimental considerations. It should be remembered that the input amplitude range and frequency bandwidth define the region of validity of the obtained model in terms of a compact function space (i.e., inputs with smaller dynamic range and/or bandwidth will be compatible with the estimated model, but not necessarily inputs with larger dynamic range and/or bandwidth). The full bandwidth requirement can be established with preliminary testing, as discussed in Section 5.2.1, but the full dynamic range must be determined from physiological considerations of the "normal operating range." This issue is moot when natural (spontaneous) activity data are used for our modeling purposes.
Wiener's pivotal contributions to nonlinear system identification are detailed in Section 2.2, where the reasons for recommending GWN inputs are also discussed. Here we simply note that band-limited and range-limited GWN signals (the physical realizations of ideal GWN) satisfy the aforementioned input requirements. However, other properly constructed random/pseudorandom and deterministic signals may satisfy these input requirements as well. Among such random signals, a large family of quasiwhite (i.e., approximately white over the bandwidth of interest) signals with symmetric amplitude probability density functions (termed the CSRS) offer an attractive alternative in practice, due to the afforded flexibility in selecting the appropriate amplitude distribution and other advantages discussed in Section 2.2.4. Note that all random test input signals are selected as stationary and ergodic processes. Among the candidate pseudorandom signals, binary and ternary m-sequences have been the primary choice to date, exhibiting a mix of potential advantages and disadvantages discussed in Section 2.2.4. Among the deterministic signals, the choice has been limited so far to sums of sinusoids (of nearly incommensurate frequencies) and properly constructed sequences of impulses discussed in Sections 2.1.5 and 8.3 for neuronal systems, although chirp waveforms (linear or hyperbolic FM) may also have some potential that is heretofore largely unexplored.
Illustrative examples of various "quasiwhite" input signals are given in Section 2.2.4 in connection with nonparametric modeling. It should be noted that input whiteness (or quasiwhiteness) is not a strict requirement for recently developed methodologies (discussed in Section 2.3), although it was a strict requirement for the initial kernel estimation technique (i.e., the cross-correlation technique discussed in Section 2.2.3) for nonparametric modeling of nonlinear systems.

It must be emphasized that, whenever possible, a natural ensemble of inputs should be used (which is typically stochastic and broadband). If such is not available, then a quasiwhite test signal of appropriate bandwidth and amplitude distribution for each given application is advisable.
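A minimal sketch of one simple member of this family of quasiwhite signals (a constant-switching-pace signal with a symmetric uniform amplitude distribution; the function and its parameters are illustrative, and a practical CSRS design would follow the guidelines of Section 2.2.4):

```python
import numpy as np

def csrs_uniform(n_steps, step_samples, amp, seed=0):
    """Quasiwhite test signal in the spirit of the CSRS family:
    i.i.d. amplitudes drawn from a symmetric uniform distribution on
    [-amp, amp], each held for `step_samples` samples (constant
    switching pace)."""
    rng = np.random.default_rng(seed)
    levels = rng.uniform(-amp, amp, n_steps)
    return np.repeat(levels, step_samples)

u = csrs_uniform(n_steps=1000, step_samples=4, amp=2.0)
```

The switching step sets the effective bandwidth of the signal, and the amplitude bound is matched to the dynamic range selected for the study.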

5.1.3 Experimental Considerations


Having selected the type of input signal (i.e., waveform, bandwidth, and dynamic range), we must ascertain the capability of digital-to-analog signal transducers to deliver the selected waveform at the required bandwidth and dynamic range of the experimental stimulus. Depending on the particular characteristics of each experimental preparation, this may or may not be a challenge. Typically, as the bandwidth and dynamic range increase, the experimental implementation of the stimulus becomes more challenging, since the input signal transducers may introduce significant distortions. For this reason, the experimental stimulus that is actually delivered to the system must be recorded and used for the modeling task during data analysis (and not the originally designed input to the transducer). Alternatively, the transfer characteristics of the input transducer can be modeled separately and accounted for as a precascaded filter in the overall model obtained from the originally designed input.
Similar transduction issues exist on the output side. The actual experimental output is measured with an output transducer and discretized with an analog-to-digital converter (ADC). The transfer characteristics of the output transducer can be modeled separately and accounted for as a postcascaded filter in the resulting overall model. The effects of the ADC are usually deemed negligible by securing a sufficient number of bits for quantization levels (typically, 12 or more bits alleviate the quantization error).
The sampling interval T is chosen to be the same for the input and output signals, and is selected on the basis of the system bandwidth Bs and the input bandwidth Bx (in Hz), following the Nyquist theorem as T ≤ 1/(2Bx) ≤ 1/(2Bs) (in seconds). In order to avoid aliasing from possible high-frequency noise or interference, the output signal is low-pass filtered at Bx prior to sampling and digitization. It is recommended that Bx be selected to be equal to Bs, and T = 1/(2Bx), to maximize the efficiency of the data collection and analysis process (i.e., minimize the number of required samples).
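These choices reduce to a couple of lines of arithmetic (a sketch; the helper name and the explicit check that Bx covers Bs are our reading of the text):

```python
def sampling_params(input_bw_hz, system_bw_hz, record_s):
    """Nyquist-based design: T <= 1/(2*Bx) <= 1/(2*Bs); most efficient
    when Bx = Bs and T = 1/(2*Bx)."""
    if input_bw_hz < system_bw_hz:
        raise ValueError("input bandwidth must cover the system bandwidth")
    T = 1.0 / (2.0 * input_bw_hz)     # sampling interval in seconds
    n_samples = round(record_s / T)   # samples in the data record
    return T, n_samples

T, n = sampling_params(input_bw_hz=50.0, system_bw_hz=50.0, record_s=60.0)
# T = 0.01 s, n = 6000 samples
```
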
Preliminary measurements of the ambient noise and systemic interference are strongly recommended in order to guide possible preprocessing of the recorded output signal by means of appropriate filtering. For instance, many physiological systems exhibit significant low-frequency noise and/or interference (e.g., the cardiovascular system exhibits interference from the endocrine and metabolic systems below 0.01 Hz) that can be removed by appropriate high-pass filtering. Likewise, high-frequency noise may be introduced by the instrumentation or by systemic sources, and must be low-pass filtered to avoid aliasing. It is also possible to tailor the spectrum of the input signal in order to maintain a fairly constant signal-to-noise ratio (SNR) across all frequencies for an anticipated noise/interference spectrum.
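A crude zero-phase preprocessing step in this spirit can be sketched with an ideal FFT filter (illustrative only; a practical pipeline would use properly designed high-pass/low-pass filters, and the cutoffs and test frequencies below are arbitrary choices rather than the 0.01 Hz figure cited for the cardiovascular example):

```python
import numpy as np

def precondition(y, fs, hp_hz, lp_hz):
    """Zero out spectral content below hp_hz (slow systemic interference)
    and above lp_hz (high-frequency noise) with a brick-wall FFT filter."""
    Y = np.fft.rfft(y)
    f = np.fft.rfftfreq(len(y), d=1.0 / fs)
    Y[(f < hp_hz) | (f > lp_hz)] = 0.0
    return np.fft.irfft(Y, n=len(y))

fs = 100.0
t = np.arange(6000) / fs                        # 60 s record
sig = np.sin(2 * np.pi * 5.0 * t)               # in-band component
drift = 2.0 * np.sin(2 * np.pi * 0.05 * t)      # slow systemic interference
hiss = 0.5 * np.sin(2 * np.pi * 45.0 * t)       # high-frequency noise
clean = precondition(sig + drift + hiss, fs, hp_hz=0.5, lp_hz=20.0)
```

Note that anti-aliasing low-pass filtering must in reality be applied in the analog domain before digitization; the digital version shown here only suppresses noise that is already within the sampled band.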
The length of the experimental data record is determined by the key trade-off between stationarity/stability of the experimental preparation and data-analysis burden on one hand, arguing for shorter records, and estimation accuracy requirements favoring longer data records on the other hand. Obviously, low-SNR conditions require longer data records or repetition of identical experiments and averaging of the output records (under an ergodic assumption) in order to improve the SNR in the output data. For nonlinear systems, another important consideration is that the input signal must have adequate opportunity to stimulate the system with a sufficient variety of input waveforms (as the only indisputable means for model validation). This consideration favors random broadband or quasiwhite inputs, since they offer a broader repertoire of stimulation epochs to the system. Since these conditions cannot be precisely quantified, this judgment remains qualitative. Typically, the longest possible data record is collected under the prevailing experimental constraints, and the other issues are addressed and assessed in the analysis process. It is useful to note that recently developed estimation methodologies have reduced dramatically the data-length requirements and yield accurate estimates for rather short data records, thus relieving the investigator of considerable experimental burden.
The effect of noise/interference is a critical practical consideration in the modeling of physiological systems and cannot be easily separated from possible system nonstationarities or stochastic systemic variations that are widely viewed as part of the natural milieu of physiological systems. Therefore, special attention must be given to the robustness and noise-suppressing properties of the employed modeling methodologies (especially the estimation methods). The resulting residuals (i.e., the model prediction errors) must be carefully examined for clues that enhance our understanding of the statistics of the ambient noise/interference environment of a physiological system and for confirmation of the validity of the obtained model. This critical issue is further discussed in Sections 5.3 and 5.4.

5.2 PRELIMINARY TESTS AND DATA PREPARATION

In this section, we discuss the preliminary testing required in order to establish some key attributes of the system that affect the modeling process; namely, the system bandwidth, memory, linearity, stationarity, and ergodicity.

As a starting point, it is assumed that a stable experimental preparation is available for testing with the selected input waveforms (or for monitoring the spontaneous activity of the naturally operating system). The means of applying the experimental test input signal (D-to-A transducers) and recording the elicited output signal (sensors and A-to-D transducers) are assumed to be available for reliable data collection, including provisions for minimizing the noise in the recorded/digitized data. Finally, the dynamic range is selected based on the natural operating range of interest.

5.2.1 Test for System Bandwidth


Having selected the input dynamic range, we proceed with the determination of the system bandwidth in an experimental setting by performing tests with successively broader input bandwidths until inspection of the resulting output spectra reveals the highest frequency fmax at which the system ceases to respond appreciably. The latter judgment is complicated by the presence of noise that places a "floor" in the spectral values at high frequencies. Typically, we consider the frequency at which the output spectral density drops below a specified threshold (e.g., -20 dB from peak value) to be fmax, provided that the noise "floor" is lower; otherwise the threshold is set at the noise "floor." Averaging of responses to repetitive identical input signals will generally reduce this noise "floor" and bring it below -20 dB (from peak value) for most applications. Band-limited GWN (or other quasiwhite) test inputs are ideal for this purpose, but other broadband inputs can be used as well (e.g., multiple sinusoids), although the use of a finite number of sinusoids may not test all possible interactions of interest among multiple frequencies. The system bandwidth Bs can be set equal to fmax, and the input signal bandwidth Bx ought to be greater than or equal to fmax in our random broadband experiments (otherwise the system is not fully tested). Since the required sampling interval (to avoid aliasing) is T ≤ 1/(2Bx), it is most efficient to select Bx = fmax and T = 1/(2Bx).

5.2.2 Test for System Memory


The next task is to determine the system memory (or settling time), which represents the time required for the effects of the input on the output to diminish below a specified threshold value. A simple preliminary test that can be used for this purpose is the application of an impulsive input with amplitude determined by the dynamic range of the system input that we wish to study. Inspection of the elicited response can determine the time beyond which the output amplitude drops below a specified threshold (e.g., -20 dB from peak value). The value of this threshold depends on the signal-to-noise ratio (SNR) in the output signal because, as in the case of bandwidth determination above, the noise places a "floor" in the observable output values. Averaging of multiple responses to repetitive impulsive inputs (sufficiently separated in time to exceed any conservative estimate of the system memory) may be required in practice to raise the output SNR above 20 dB, at a minimum. Obviously, step inputs or rectangular pulses can be used for the same purpose as impulses (if they are more convenient experimentally); however, the use of such specialized inputs engenders a certain risk of limiting our view of the system behavior to specialized testing conditions (impulses, pulses, sinusoids).

To avoid this risk, a GWN (or other quasiwhite) test input can be used and the system memory can be determined by the extent of significant values in the kernel estimates. This is most practical for the first-order kernel estimate. In the case of broadband random inputs, the system memory can be determined by the application of a statistical threshold on the cross-correlation values between input and output for positive lags (selecting the maximum lag above the threshold as a conservative estimate of the memory extent). The threshold can be evaluated from the computed cross-correlation values for negative lags (assuming causality and open-loop operating conditions that allow us to use the latter values to construct the null hypothesis). The same goal can be achieved with the use of moving-average models (see Section 3.1), instead of cross-correlations, to determine the maximum lag of input impact on the output.
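The cross-correlation recipe above can be sketched as follows (the 3-sigma threshold built from negative lags and the synthetic first-order test system are illustrative assumptions, not prescriptions from the text):

```python
import numpy as np

def memory_from_xcorr(x, y, max_lag, n_sigma=3.0):
    """Largest positive lag (in samples) at which the input-output
    cross-correlation exceeds a noise threshold estimated from the
    negative-lag values (causal, open-loop assumption)."""
    N = len(x)
    lags = np.arange(-max_lag, max_lag + 1)
    r = np.empty(len(lags))
    for i, k in enumerate(lags):
        if k >= 0:
            r[i] = np.mean(x[:N - k] * y[k:])     # E[x(n) y(n+k)]
        else:
            r[i] = np.mean(x[-k:] * y[:N + k])    # E[x(n) y(n-|k|)], noise-only
    threshold = n_sigma * np.std(r[lags < 0])
    above = (lags > 0) & (np.abs(r) > threshold)
    return int(lags[above].max()) if above.any() else 0

# Synthetic check: first-order system with a time constant of ~10 samples
rng = np.random.default_rng(2)
n = 16384
x = rng.normal(size=n)
y = np.zeros(n)
for k in range(1, n):
    y[k] = 0.9 * y[k - 1] + 0.1 * x[k]
mem = memory_from_xcorr(x, y, max_lag=100)
```

The result is deliberately conservative: it is the last lag at which any statistically significant input-output coupling is detected.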
In the context of the modified Volterra models, resulting from kernel expansions on appropriate bases (see Section 2.3.1), the determination of the system memory is subsumed into the issue of selecting the proper parameters for the expansion bases. For instance, in the case of the Laguerre basis, the Laguerre parameter a indirectly determines the effective system memory (and vice versa). When iterative procedures are used (as discussed in Section 4.3 in the context of the Laguerre-Volterra network), the Laguerre parameter a is estimated adaptively from the data along with the other model parameters, thus obviating the need for a priori determination of the system memory.
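For concreteness, the discrete-time Laguerre basis itself can be generated with the commonly used recursion (our implementation sketch; the recursion and initial conditions below are the standard ones for discrete Laguerre functions, and orthonormality can be verified numerically):

```python
import numpy as np

def laguerre_basis(alpha, n_funcs, n_points):
    """Discrete-time Laguerre functions b_j(m), built via the recursion
    b_j(m) = sqrt(a) b_j(m-1) + sqrt(a) b_{j-1}(m) - b_{j-1}(m-1),
    with b_0(m) = sqrt(1-a) a^(m/2). The parameter `alpha` controls the
    effective memory extent (larger alpha, slower decay)."""
    sa = np.sqrt(alpha)
    B = np.zeros((n_funcs, n_points))
    m = np.arange(n_points)
    B[0] = np.sqrt(1.0 - alpha) * sa ** m
    for j in range(1, n_funcs):
        B[j, 0] = sa * B[j - 1, 0]
        for t in range(1, n_points):
            B[j, t] = sa * B[j, t - 1] + sa * B[j - 1, t] - B[j - 1, t - 1]
    return B

B = laguerre_basis(alpha=0.25, n_funcs=4, n_points=200)
```

Because the functions decay like alpha^(m/2), the number of points needed for the basis to be effectively complete is itself a statement about the system memory, which is the point made in the text.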
In the general context of the discrete Volterra model (see Section 2.1.4), the problem of memory determination is also subsumed into the broader issue of model-order selection, since M constitutes one of the key structural parameters of the model (the memory-bandwidth product of the system). The advocated model-order-selection algorithm (see Section 2.3.1) can be used for determining M along with the nonlinear order Q on the basis of statistical hypothesis testing.
The estimated system memory influences the selection of the required experimental data record length. The latter ought to be at least a low multiple of the system memory (no less than three times, and preferably more than 10 times). Obviously, the longer the data record, the better, provided the system remains stationary. This is a critical experimental consideration, since it relates to issues of stability of the preparation (including possible nonstationarities) and feasibility of the experiment. The record length also determines the experimental and computational burden associated with the collection and processing of the data. The advocated "efficient kernel estimation" methods (see Section 2.3) offer considerable advantages over other existing methods in this regard by reducing significantly the requirements of data-record length. Experience with various applications to date has confirmed potential savings by factors of 10 to 100, relative to traditional methods. The importance of these savings can hardly be overstated, since they affect the experimental feasibility, the quality of the results, the possibility of piecewise nonstationary analysis (if needed), and the total required effort (experimental and computational).

5.2.3 Test for System Stationarity and Ergodicity


With regard to the issue of system stationarity, preliminary testing involves the repetition of identical experiments with the same preparation at different times and the assessment of the observed differences in the recorded outputs and/or in the obtained models. Since some of the output differences will be due to stochastic factors (ambient noise and systemic interference), this assessment is far from trivial and generally less reliable when based on the observed output data. However, it is more robust when based on observed changes in the estimated kernels (which reduce the effect of noise/interference). In both cases, we must employ statistical analysis in order to make this assessment. This issue is discussed further in Chapter 9, where sophisticated methods for nonstationary modeling are presented. Here, we limit ourselves to the fundamental observation that if the nonstationarities are slow relative to the system dynamics (i.e., when observable changes in the system functional characteristics take longer than about ten times the system memory), the effective tracking of the changes in the system model can be accomplished with piecewise stationary analysis of the data over a sliding (and possibly overlapping) time window or with recursive methods (e.g., recursive least squares). However, if the nonstationarities are fast relative to the system dynamics, then the problem is far more difficult and one of the advanced methods presented in Chapter 9 may be applicable.
With respect to the system ergodicity, the assessment must also be statistical by virtue of the stochastic nature of this attribute and requires repetition of identical experiments with different preparations (of the same experimental setup) in order to compare the resulting models. The consistency of the obtained models for different preparations ought to be quantified with statistical measures (e.g., mean, standard deviation, skewness, covariance, distributions, etc.) that can be used to assess the degree of ergodicity. Naturally, the key practical question regarding ergodicity is the amount of variability expected from preparation to preparation in terms of the obtained models for given experimental and computational parameters. When this assessment can be made in a quantitative manner, the degree of ergodicity of the system can be established. Although this is rarely done in current practice, we wish to emphasize its importance and encourage future investigations to assess the degree of ergodicity of the system under study.

5.2.4 Test for System Linearity


The fundamental issue of system linearity has to be carefully examined with preliminary testing in order to establish the need for elaborate nonlinear modeling and determine the required order of the nonlinear model. This testing may utilize the superposition principle or some basic property of linear or nonlinear systems. For instance, under the assumption of stationarity (time invariance), linear systems produce sinusoidal outputs in response to sinusoidal inputs (of the same frequency). This basic test can be extended also to nonlinear systems, as depicted in the analysis of Section 2.1.2, where the Volterra kernel of nth order was shown to generate the n, (n - 2), (n - 4), and so on harmonics of a sinusoidal input frequency. Therefore, a simple preliminary test may employ a sinusoidal input and Fourier analysis of the resulting output that ought to reveal the highest order of Volterra kernel present in the system (nonlinear order) based on the detectable presence of harmonics of the stimulating frequency in the output. It is evident that linearity is asserted when only the first harmonic (i.e., the same frequency as in the input) is detected. This task, of course, is practically subject to output SNR considerations that may obscure this judgment. High output SNR can be achieved by averaging the results of identical repetitive experiments. It is important to note that the sinusoidal input frequency must be varied and sweep through the system bandwidth in successive trials in order to test the system thoroughly. The amplitude of the sinusoidal input must also cover the amplitude range of natural operation.
natural operation. Even though this is a simple and practicable test, it is not a eomplete
5.2 PRELIMINARY TESTS AND DATA PREPARATION 275

test in the sense that it does not probe the nonlinear interactions among various input fre-
quencies (sine the input is a single sinusoid at each trial). The sum-of-sinusoids input, de-
scribed in Section 2.1.5, can be used in similar fashion to alleviate this deficiency to some
extent, since it probes the system for nonlinear interactions among all frequencies present
in the input signal.
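To make the harmonic test concrete, the following sketch applies it to a simulated quadratic system; the system, sampling parameters, and detection threshold are illustrative assumptions, not prescriptions from the text:

```python
import numpy as np

def harmonic_orders(y, fs, f0, n_harmonics=5, rel_threshold=0.01):
    """Return the harmonics of f0 (1st, 2nd, ...) detectably present in y.

    A harmonic counts as present when its FFT magnitude exceeds
    rel_threshold times the largest spectral peak (an assumed criterion).
    """
    Y = np.abs(np.fft.rfft(y))
    freqs = np.fft.rfftfreq(len(y), d=1.0 / fs)
    ref = Y.max()
    present = []
    for k in range(1, n_harmonics + 1):
        idx = np.argmin(np.abs(freqs - k * f0))   # nearest FFT bin to k*f0
        if Y[idx] > rel_threshold * ref:
            present.append(k)
    return present

# Hypothetical second-order (quadratic) system: y = x + 0.5 x^2
fs, f0, T = 1000.0, 10.0, 2.0
t = np.arange(0, T, 1.0 / fs)
x = np.sin(2 * np.pi * f0 * t)
y = x + 0.5 * x**2

print(harmonic_orders(y, fs, f0))  # -> [1, 2]: the quadratic term generates a 2nd harmonic
```

For a purely linear system (the input `x` itself), only the first harmonic survives the threshold, which is exactly the assertion of linearity described above.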
Impulsive inputs can also be used to test the superposition principle by comparing the resulting outputs against the expression of Equation (2.36) and determining the best polynomial fit for various values of the impulse strength A. The degree of the best polynomial fit determines the nonlinear order of the required Volterra model, provided the diagonal kernel values are not zero. Comparison of the elicited response for a linear combination of a pair of impulses to the same linear combination of the individual responses to each impulse separately (see Figure 2.1) is particularly suitable for neuronal systems with spikes (action potentials) at the input. Of course, any combination of linearly independent waveforms can be used for the same purpose (i.e., to test the system linearity by means of the superposition principle). These waveforms must cover the system bandwidth and dynamic range of natural operation if the test is intended to be complete. However, any specialized waveform allows only partial testing of the system linearity, which engenders some risk of erroneous assessment.
The only input waveform that can assure complete testing and definitive assessment of linearity is the white-noise input signal, covering the entire bandwidth and dynamic range of the system (e.g., the uniform CSRS quasiwhite test inputs discussed in Section 2.2.4). This definitive assessment can be based on the existence of high-order kernels (higher than first-order) and the linear/nonlinear dependence of the output variance on the input power level, which is given by Equation (2.69) in the case of a GWN input. It is evident from Equation (2.69) that the nonlinear order of the required Volterra-Wiener model is the degree of the best polynomial fit of the output variance for different values of the input power level P. The coefficients of this polynomial fit depend on the Euclidean norm of the respective Wiener kernels, which is a reliable measure of kernel significance.
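The variance-versus-power test can be sketched as follows for a hypothetical second-order system driven by GWN; the system, its coefficient, and the power levels are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def output_variance(P, n=200_000, a=0.5):
    # Hypothetical second-order system y = x + a*x^2 driven by GWN of power P
    x = rng.normal(0.0, np.sqrt(P), n)
    return np.var(x + a * x * x)

P_levels = np.array([0.5, 1.0, 2.0, 4.0, 8.0])
v = np.array([output_variance(P) for P in P_levels])

def fit_nmse(deg):
    # Normalized residual power of a degree-`deg` polynomial fit of var(y) vs P
    c = np.polyfit(P_levels, v, deg)
    r = v - np.polyval(c, P_levels)
    return np.sum(r**2) / np.sum((v - v.mean())**2)

# The output variance of a second-order system grows quadratically with P:
# a linear fit leaves substantial residuals, a quadratic fit is nearly exact.
print(fit_nmse(1), fit_nmse(2))
```

For this particular system, var(y) = P + 2a²P² (since var(x²) = 2P² for Gaussian x and x, x² are uncorrelated), so the degree of the best polynomial fit correctly identifies the nonlinear order as two.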
Another criterion for testing system linearity is the invariance (or not) of the estimated first-order Wiener or CSRS kernel for different power levels of the quasiwhite test input, in accordance with the analysis presented in Section 2.2.4. A related and commonly used test involves coherence function measurements to assess system linearity (and stationarity), as elaborated in Section 2.2.5.
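A minimal coherence check along these lines can be run with `scipy.signal.coherence` on simulated data; the filter, signal length, and segment size below are arbitrary illustrative choices:

```python
import numpy as np
from scipy.signal import coherence, lfilter

rng = np.random.default_rng(1)
fs = 100.0
x = rng.normal(size=50_000)

y_lin = lfilter([1.0, 0.5], [1.0], x)   # linear FIR system
y_sq = x**2 - 1.0                       # purely quadratic (even) system

f, C_lin = coherence(x, y_lin, fs=fs, nperseg=1024)
f, C_sq = coherence(x, y_sq, fs=fs, nperseg=1024)

print(C_lin.mean(), C_sq.mean())  # near 1 for the linear system, near 0 for the quadratic one
```

A coherence close to unity across the band is consistent with a linear, stationary relation; values well below unity flag nonlinearity, nonstationarity, or low SNR (the coherence test alone cannot distinguish among these causes).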
The test of linearity is practically important because it determines the employed modeling methodology (linear or nonlinear). It is most useful when it also yields the required order of nonlinearity, which determines the order of the utilized Volterra model.

5.2.5 Data Preparation


The key step in data preparation concerns the possible enhancement of the signal-to-noise ratio (SNR) by proper filtering of some of the noise/interference contaminating the input-output data, and the removal of possible artifacts in the output measurements (typically in the form of baseline/calibration drifts or impulsive outliers).
276 A PRACTITIONER'S GUIDE

In terms of noise/interference reduction, the most common approach involves appropriate low-pass or high-pass filtering in the frequency bands where the SNR is low. In the presence of high-frequency noise, low-pass filtering may also allow the lowering of the Nyquist frequency by down-sampling (if such is acceptable in terms of the required temporal resolution). In the event of low-frequency noise/interference (which is a frequent occurrence in many physiological systems subject to numerous autoregulatory mechanisms and unknown systemic or extraneous influences), the necessary high-pass filtering must be viewed in the context of the data record length to avoid numerical artifacts (i.e., the cut-off frequency should be considerably higher than the inverse of the record length).
Low-frequency baseline drifts (whether physiological in origin or experimental artifacts) can also be removed by fitting and subtracting linear or polynomial baselines.
The effect of possible impulsive outliers associated with measurement artifacts can be
mitigated by appropriate clipping (hard or soft) of the signal amplitude values. Proper
clipping requires prior estimation of the natural amplitude distribution of the recorded sig-
nal and selection of a hard or soft clipping threshold based on the dispersion properties of
this distribution.
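A sketch of these two preparation steps (polynomial detrending followed by soft clipping at a dispersion-based threshold) follows; the polynomial degree, the MAD-based threshold, and the tanh clipper are illustrative choices, not prescriptions from the text:

```python
import numpy as np

def prepare_output(y, t, poly_degree=1, clip_sigmas=4.0):
    """Remove a low-order polynomial baseline drift, then softly clip outliers.

    Soft clipping compresses samples beyond clip_sigmas robust standard
    deviations (estimated via the median absolute deviation) with tanh.
    """
    # Baseline removal: fit and subtract a polynomial trend
    coeffs = np.polyfit(t, y, poly_degree)
    y = y - np.polyval(coeffs, t)
    # Robust dispersion estimate (MAD scaled to the Gaussian sigma)
    sigma = 1.4826 * np.median(np.abs(y - np.median(y)))
    limit = clip_sigmas * sigma
    return limit * np.tanh(y / limit)

t = np.linspace(0.0, 10.0, 2000)
y = np.sin(2 * np.pi * t) + 0.3 * t    # signal plus linear baseline drift
y[500] += 25.0                          # impulsive measurement artifact

y_clean = prepare_output(y, t)
print(np.max(np.abs(y_clean)))  # the outlier is compressed toward the clip limit
```

The robust (median-based) dispersion estimate is used here so that the clipping threshold itself is not inflated by the very outliers it is meant to suppress.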
Of particular importance in terms of data preparation is the "data consolidation"
method advocated in Section 4.2.2. According to this method, we must first define the
space of input vectors (using the input epoch in the simplest case or the filter-bank out-
puts in the case of the modified Volterra model) and then determine the "minimal proximity" value that represents an estimate of the random variability of the input vectors due to
extraneous stochastic factors (i.e., a measure of uncertainty in the input vector measure-
ment). Then, the input vectors that are within this "minimal proximity" value can be aver-
aged and the resulting input vector average can be corresponded to the average value of
the respective outputs. This "data consolidation" procedure is based on the rationale that
input vectors within the vicinity of measurement uncertainty cannot elicit significantly
different outputs. The resulting "smoothing" may introduce some estimation bias in the
model but is expected to have a far more important (and beneficial) effect on reducing in-
put and output noise in the data used for model estimation. Since the latter tends to be a
critical factor in practice, especially for iterative estimation procedures (i.e., giving rise to
local minima), it is expected that the overall effect will be highly beneficial. Naturally, the
trade-off between estimation bias and variance must be carefully examined in practice
vis-a-vis the specified "minimal proximity" value that controls the degree of this
"smoothing" operation.
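A greedy sketch of this consolidation idea follows; the particular grouping rule (first-come grouping by Euclidean distance) is a simplifying assumption, since the text does not prescribe a specific clustering of the input vectors:

```python
import numpy as np

def consolidate(X, y, min_proximity):
    """Average input vectors closer than min_proximity (Euclidean distance)
    and pair each averaged vector with the mean of the corresponding outputs.
    """
    X, y = np.asarray(X, float), np.asarray(y, float)
    used = np.zeros(len(X), bool)
    Xc, yc = [], []
    for i in range(len(X)):
        if used[i]:
            continue
        d = np.linalg.norm(X - X[i], axis=1)     # distances to all vectors
        group = (d < min_proximity) & ~used      # vectors within proximity
        used |= group
        Xc.append(X[group].mean(axis=0))         # consolidated input vector
        yc.append(y[group].mean())               # averaged output value
    return np.array(Xc), np.array(yc)

# Two noisy clusters of input vectors with noisy outputs
X = np.array([[0.0, 0.0], [0.01, -0.01], [1.0, 1.0], [0.99, 1.02]])
y = np.array([0.1, 0.3, 1.1, 0.9])
Xc, yc = consolidate(X, y, min_proximity=0.1)
print(Xc, yc)  # two consolidated points with averaged outputs
```

The `min_proximity` argument plays the role of the measurement-uncertainty value discussed above and directly controls the bias/variance trade-off of the resulting "smoothing."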

5.3 MODEL SPECIFICATION AND ESTIMATION

Having completed the preliminary testing to determine the basic attributes of the system
model and having prepared the input-output time-series data, we are now ready to tackle
the core tasks of model specification and estimation. These tasks will be described for the
two types of nonlinear dynamic models that are deemed most promising at the present
time: the modified discrete Volterra (MDV) models and the Volterra-equivalent network
(VEN) models, exemplified by the VWM model in Section 4.4. The former class of mod-
els (MDV) is linear in the unknown parameters and both the specification and estimation
tasks are tackled in the context of linear algebra (matrix formulation), as discussed in Section 2.3.1. The latter class of models (VEN/VWM) is nonlinear in some unknown parameters because of the imposed equivalent network structure, and both the specification and estimation tasks require iterative search and cost-minimization procedures.
The two classes of models will be addressed separately and cover the two primary approaches for cases of natural, as well as controlled, input ensembles. The fact that they do
not require specialized test inputs (e.g., white noise) but are applicable for nearly arbitrary
natural inputs is of tremendous practical importance as it relaxes the experimental re-
quirements and broadens the scope of potential applications. It should be reemphasized

that these general methodologies can incorporate specialized knowledge (if available) for
practical advantage and, thus, they cannot be criticized for inefficiencies resulting from
their generality. They are both general and efficient. They can also be extended to multi-
ple inputs and multiple outputs, as discussed in Chapter 7.

5.3.1 The MDV Modeling Methodology


The MDV model of order R can be put in the matrix form

y = V_R c_R + ε_R        (5.1)

as elaborated in Section 2.3.1. A statistical method for the selection of the model order was presented in Section 2.3.1 that can be used to determine the true order R of the MDV model (incorporating the filter-bank size L and the nonlinear order Q). The number of free parameters in the Rth-order model is

P_R = (L + Q)! / (L! Q!)        (5.2)

and they are contained in the parameter vector c_R (i.e., the model is linear in terms of the unknown parameters). As discussed in Section 2.3.1, the residual vector ε_R for the true model order R is a linear transformation of the output-additive noise/interference and allows the application of a statistical criterion for determining the true model order in an ascending-order search procedure. The rectangular matrix V_R depends on the input data and the selected filter bank. If it is of reduced rank (because the input ensemble is not "rich" enough), then pseudoinversion can be used to estimate the parameter vector c_R. The parameter estimates can be used to reconstruct the estimates of the Volterra kernels of the system, utilizing the respective expansion basis (defining the filter bank).
The statistical properties of the parameter (or kernel) estimates depend on the statistical properties of the residuals, as discussed in Section 2.3.1. Generally, zero-mean, input-independent, uncorrelated residuals secure unbiased and consistent estimates. If the residuals are also Gaussian, then the estimates are efficient (i.e., have minimum variance) when least-squares estimation methods are used. For non-Gaussian uncorrelated residuals, minimum estimation variance can be achieved through minimization of a cost function proportional to the negative log-likelihood function (i.e., maximum-likelihood estimation), which is not quadratic in the non-Gaussian case. In the Gaussian case, minimum variance is not achieved when the residuals are correlated. Prewhitening of the data is required in order to achieve minimum variance in this case, which leads to the "generalized least-squares" estimator of Equation (2.46), employing the residual covariance matrix.
In general, for nearly Gaussian residuals, the parameter vector estimate with minimum
variance is given by

ĉ_R = [G_R V_R]+ G_R y        (5.3)

where G_R denotes the prewhitening matrix of the residuals and the superscript + denotes the generalized inverse (or pseudoinverse). An alternative to prewhitening is the random selection of batches of input-output data points (i.e., input vectors composed of samples at randomly selected times and the corresponding output values) to form the matrix V_R in order to reduce the correlation among residuals and facilitate the estimation procedure by making

G_R the identity matrix. The random selection procedure can be repeated and the estimates obtained from each batch can be averaged to yield the final estimates. The size of each batch must be greater than P_R and equal to a fraction of the data record length N (e.g., a batch size of N/8 > P_R yields up to eight distinct batches through random selection).
As in the case of network training, where the input-output data points are divided into
a training set and a testing set, one may perform the aforementioned estimation task using
most of these data batches and use the remaining batches to test the predictive ability of
the estimated model ("out of sample" prediction). This procedure is recommended for
model validation (as discussed in Section 5.4) because of possible low-frequency
noise/interference and/or nonstationarities in physiological systems.
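The estimation pipeline above can be sketched as follows, using raw input lags in place of a filter bank (a simplifying assumption) and a train/test split for the out-of-sample prediction error:

```python
import numpy as np

rng = np.random.default_rng(3)

def mdv_matrix(x, L=3):
    """Second-order Volterra regressor matrix from L input lags:
    constant, linear, and quadratic product terms (no filter bank)."""
    lags = np.column_stack([np.roll(x, k) for k in range(L)])[L:]
    cols = [np.ones(len(lags))]
    cols += [lags[:, i] for i in range(L)]
    cols += [lags[:, i] * lags[:, j] for i in range(L) for j in range(i, L)]
    return np.column_stack(cols)

# Simulated system: y(n) = x(n) - 0.5 x(n-1) + 0.3 x(n) x(n-1) + noise
x = rng.normal(size=4000)
y_full = x - 0.5 * np.roll(x, 1) + 0.3 * x * np.roll(x, 1) + 0.05 * rng.normal(size=4000)
V = mdv_matrix(x, L=3)
y = y_full[3:]                       # align output with the regressor rows

# Split into training and testing sets; estimate c by pseudoinversion
half = len(y) // 2
c, *_ = np.linalg.lstsq(V[:half], y[:half], rcond=None)
resid = y[half:] - V[half:] @ c
nmse = np.sum(resid**2) / np.sum((y[half:] - y[half:].mean())**2)
print(nmse)  # small, since the model class contains the true system
```

`np.linalg.lstsq` computes the minimum-norm least-squares solution, which coincides with the pseudoinverse solution when `V` is rank-deficient, mirroring the pseudoinversion remark above.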
It is evident that the estimation of the MDV model is greatly simplified by the fact that the unknown parameters enter linearly into the estimation problem, which allows the use of well-developed matrix inversion methods. However, the size of the pseudoinverted rectangular matrix often becomes very large (especially for high-order systems), which raises issues of practical applicability. This has prompted the introduction of the Volterra-equivalent network (VEN) models, such as the VWM model discussed in the following section.

5.3.2 The VEN/VWM Modeling Methodology


The VWM model was described in Section 4.4 and is used to exemplify the VEN modeling approach. It generally comprises a set of "state equations" (linear difference equations transforming the input into a set of internal/state variables) and the cascaded operation of a "hidden layer" followed by an "interaction layer" that jointly apply a static nonlinear transformation on the vector of the internal/state variables to generate the system output. The hidden layer employs cascaded polynomial and sigmoidal activation functions in each hidden unit (acting on the weighted sum of the state variables) and the interaction layer employs polynomial activation functions. These nonlinear operations make the estimation problem nonlinear in terms of most of the unknown parameters. This complication (nonlinear, instead of linear, parameter estimation) is viewed as the "price we have to pay" in order to compact the model representation (i.e., reduce the number of free parameters in the model). This is especially important for high-order systems, as indicated in Equation (4.188), where the total number of free parameters of a VWM model is shown to be linear with respect to the order of nonlinearity Q and/or R.
The key trade-off between model compactness and ease of estimation must be exam-
ined in the context of each application, as it depends on the specific characteristics of each
system. Our cumulative experience to date shows that model compactness is usually far more important and justifies the additional methodological burden of nonlinear (as opposed to linear) estimation of the unknown parameters. The latter has to be achieved with
iterative cost-minimization methods that are subject to various potential pitfalls, as dis-
cussed in Section 4.2.2. Nonetheless, many mature methodologies exist for this purpose,
albeit requiring careful handling and attention to many procedural details.
At the core of the VEN/VWM model specification and estimation tasks is the iterative
algorithm by which the values of the model parameters are updated using the output-pre-
diction error. This algorithm was discussed in Section 4.2.2 and may take various forms,
depending on how the information about the local first and second derivative of the cost
function is utilized. Generally, methods that utilize this information to adjust the (vari-
able) step size are more efficient than fixed-step methods in terms of convergence. The

potential entrapment in local minima remains the most serious problem of these iterative
estimation methods, imposing the burden of multiple randomized or grid-based trials/ini-
tializations or "disentrapment" techniques (e.g., simulated annealing or genetic algo-
rithms).
We favor the use of multiple initializations within a subspace of the parameter space
defined by a preliminary second-order approximation of the system model that places us
in the "neighborhood" of the global minimum. In terms of specific parameter update algo-
rithms, we favor either variable-step algorithms using previous update information (i.e.,
beta rule and delta-bar-delta rule) or the "parabolic leap" algorithm described in Section
4.2.2. The use of these algorithms is expected to provide rapid convergence for different
initializations. The global minimum is identified among the individual results from multi-
ple initializations in the aforementioned subspace defined by preliminary second-order
approximation. The parameter values that correspond to the global minimum cannot be
proven to be unique; however, the equivalent Volterra kernels (constructed from these parameters) are unique. This is the source of the great appeal and power of the Volterra
modeling framework.
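The multiple-initialization strategy with a variable step size can be sketched on a toy multimodal cost; the cost surface and the simple grow/shrink step rule below are illustrative stand-ins for the model's prediction-error surface and the update rules named above:

```python
import numpy as np

rng = np.random.default_rng(4)

def cost(w):
    # Illustrative multimodal cost surface with its global minimum at w = 2
    return (w - 2.0)**2 + 1.5 * np.sin(4.0 * (w - 2.0))**2

def grad(w, h=1e-6):
    # Central finite-difference gradient of the scalar cost
    return (cost(w + h) - cost(w - h)) / (2 * h)

def descend(w, step=0.05, n_iter=500, grow=1.1, shrink=0.5):
    """Variable-step gradient descent: grow the step while the cost
    decreases, shrink it when an update overshoots."""
    for _ in range(n_iter):
        w_new = w - step * grad(w)
        if cost(w_new) < cost(w):
            w, step = w_new, step * grow
        else:
            step *= shrink
    return w

# Multiple random initializations; keep the best local minimum found
starts = rng.uniform(-2.0, 6.0, size=50)
solutions = [descend(w0) for w0 in starts]
best = min(solutions, key=cost)
print(best, cost(best))
```

Each descent is monotone (a step is accepted only if it lowers the cost), so the restart loop can only improve on the best initialization; this is the essence of the multiple-initialization defense against local minima, while the book's preliminary second-order approximation additionally restricts where the initializations are drawn.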
The global minimum of the cost function is found for each selected model order for the
training dataset, and the corresponding sum of squared residuals of the testing dataset is
used to determine the correct model for the system, based on the statistical criterion de-
scribed in Section 2.3.1, where the number of free parameters is given by Equation
(4.188).
The selection of the correct VEN/VWM model order yields a Volterra-equivalent model that can be interpreted either through its equivalent Volterra kernels or through the equivalent PDM formulation, as discussed in the following section.

5.4 MODEL VALIDATION AND INTERPRETATION

Having obtained a model for the system under study, we must now validate it to assure its
acceptance by the peer community, and we seek to interpret it in a way that is physiologi-
cally meaningful and advances the scientific objectives of the study. Our efforts remain
focused on the MDV and VWM classes of models.

5.4.1 Model Validation


The validation of the obtained MDV and VEN/VWM models (following the procedures described in the previous section) is performed initially through the normalized mean-square error (NMSE) of the output prediction for the testing dataset. The NMSE is the sum of the squared residuals over all data points in the testing dataset divided by the sum of the squares of the respective de-meaned output values. This NMSE (a scalar) lies between 0 and 1 and represents the portion of the output signal power that is not "explained" by the model (it can be viewed as a percentage when multiplied by 100).
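The NMSE computation is straightforward and can be stated directly:

```python
import numpy as np

def nmse(y_true, y_pred):
    """Normalized mean-square error: residual power divided by the power
    of the de-meaned output; 0 = perfect prediction, 1 = no better than
    predicting the output mean."""
    y_true = np.asarray(y_true, float)
    resid = y_true - np.asarray(y_pred, float)
    return np.sum(resid**2) / np.sum((y_true - y_true.mean())**2)

y = np.array([1.0, 2.0, 3.0, 4.0])
print(nmse(y, y))                      # 0.0 for a perfect prediction
print(nmse(y, np.full(4, y.mean())))   # 1.0 when predicting the mean
```

Normalizing by the de-meaned output power is what bounds the value by 1 for any predictor at least as good as the output mean.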
Of course, this NMSE incorporates all the effects of noise and interference that are detrimental to the prediction task, either because they affect the estimated model or because they are simply present in the output signal. Herein lies the key point: a small prediction NMSE is not a necessary means of validation (although it is clearly a sufficient one), because a low SNR at the output will result in a large NMSE of the output prediction even for a perfect model of the system. Therefore, an additional means of

validation (a necessary condition for stationary systems) is the consistency of the obtained
models for different experiments, regardless of the resulting NMSE of output prediction.
Note that the "consistency" of the models can be quantified by means of the mean-square difference of the obtained Volterra kernels for different experiments. To elaborate on these validation means, we consider the four possible outcomes of the two-by-two combinations of low/high prediction NMSE and consistent/inconsistent models resulting from various experiments on the same preparation. If different preparations are used, then ergodic considerations should enter into this assessment.
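One possible quantification of this consistency is sketched below; normalizing the mean-square kernel difference by the mean kernel power is a hypothetical choice, not a formula taken from the text:

```python
import numpy as np

def kernel_consistency(k1, k2):
    """Normalized mean-square difference between two kernel estimates
    from different experiments; 0 = identical, larger = less consistent."""
    k1, k2 = np.asarray(k1, float), np.asarray(k2, float)
    return np.sum((k1 - k2)**2) / (0.5 * (np.sum(k1**2) + np.sum(k2**2)))

t = np.arange(50)
k_exp1 = np.exp(-t / 10.0)                      # kernel estimate, experiment 1
k_exp2 = np.exp(-t / 10.0) + 0.01 * np.sin(t)   # small experiment-to-experiment variation
print(kernel_consistency(k_exp1, k_exp2))  # close to 0 (consistent estimates)
```

The same metric applies elementwise to second-order kernels (two-dimensional arrays), since the sums run over all kernel values.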
Clearly, if the prediction NMSE is low and the consistency is high, then the validation
task has been completed. The same is true when the consistency is high but the NMSE is
not low (as long as it is not so extremely high as to call into question the basic validity of
the experiment). In this case, we conclude that the model is valid because it is consistent
from experiment to experiment and has some predictive ability (albeit possibly limited by
the low SNR in the output data).
The case of low prediction NMSE (high predictive ability) but low consistency of the
models obtained for different experiments points to the likelihood of a nonstationary sys-
tem for which a predictive quasistationary model (i.e., nearly stationary over the time in-
terval of each experiment) can be obtained for each experiment on the same preparation at
different times. However, the nonstationarity of the system leads to low consistency of
modeling results from experiment to experiment. The nonstationary characteristics of the
system can be studied in this case by tracking the changes ofthe obtained models over time
using successive (or overlapping) time windows of data. A time-varying pattern of these
observed changes in model characteristics ought to be established in order to validate the
nonstationary modeling results; otherwise, the level of acceptance by the peer community
may be low, as no compelling evidence is offered to distinguish the obtained results from
an apparent random behavior. This is not an unlikely occasion in physiology, where slow
changes are known to occur in the characteristics of almost all systems due to various slow
modulatory factors (e.g., effects of the endocrine or metabolic systems) or due to degradation processes such as aging, wear, and fatigue. These occasions can make full use of the
power of our approach to capture and quantify nonstationary effects that demonstrate the
unparalleled efficacy of the advocated methodology. Two illustrative examples (from the
cardiovascular and the endocrine/metabolic systems) are presented in Chapter 6.
Finally, the case of low predictive ability (high prediction NMSE) and low consistency
of models from experiment to experiment (on the same preparation) clearly represents the
worst of the four possible outcomes. There are three main implications of this outcome:
(1) the quality of the data is low (low SNR); (2) the system possibly exhibits characteristics that are not compatible with practicable Volterra modeling (i.e., the system is of very
high order, or it contains nonlinear oscillators or chaotic components that endow it with
infinite memory); and (3) the system is nonstationary (either slow or fast nonstationarity
relative to the system dynamics). It is advisable in this case to improve the SNR as the
foremost priority, since only then can the other two possibilities be explored. This can be
achieved either by increasing the input power (if controllable), or by repetitive experi-
ments and proper data preparation (e.g., filtering or consolidation). The task of improving
the SNR in the data remains formidable in the presence of nonstationarities and/or nonlinear oscillators (or chaotic dynamics). Any improvement in SNR can be detected as a reduction in the prediction NMSE and allows the exploration of possible nonstationarities with
the methods presented in Chapter 9. The exploration of quantitative models for nonlinear
oscillators or chaotic dynamics requires substantial improvements in SNR and may employ a host of methods cited in the literature [Bassingthwaighte et al., 1994; Barahona &
Poon, 1996] or some specialized methods briefly described in Chapter 10.
An additional means of validation is our ability to interpret the obtained models in a
manner that is compatible with existing knowledge about the system functions (if such re-
liable knowledge is available) or appears plausible to our best scientific judgment (if spe-
cific reliable knowledge does not preexist). This brings us to the important issue of model
interpretation.

5.4.2 Model Interpretation


Clearly, the whole enterprise of physiological system modeling would be without a clear
scientific purpose if we are unable to interpret the obtained models in a manner that is
physiologically meaningful and advances scientific knowledge. Even though an uninter-
pretable predictive model can still be useful for certain purposes (e.g., control or feature-
based diagnosis), the full utility of the modeling enterprise is realized only when we can
interpret the obtained model in a manner that advances the scientific understanding of the function of the system in the evolving context of systems physiology.
This goal is a critical part of the integrative systems viewpoint in physiological re-
search (discussed in Chapter 1) and provides the ultimate justification for the advocated
inductive approach to modeling, since physiological relevance is the cornerstone of the
alternative deductive approach. Demonstrating that physiological relevance can be ulti-
mately achieved by the inductive approach leaves the competing viewpoint of deductive
modeling with no competitive advantage. Although we generally favor a synergistic ap-
proach that utilizes the relative strengths of both inductive and deductive modeling
methodologies, it is clear that the methodological impetus of our work has been built on
the inductive approach.
There are two ways in which the obtained models can be interpreted: one is focused on
the obtained Volterra kernels and the other is based on the PDM formulation of the
Volterra model. The former is appropriate only for low-order models (up to second order), since it is impractical to interpret multidimensional functions. The PDM formulation, however, offers itself for interpretation of high-order models, especially in the case
of a separable nonlinearity, resulting in the relatively simple configuration of a small
number of parallel L-N cascades.

Interpretation of Volterra Kernels. The interpretation of the first-order and second-order Volterra kernels can be made either in the time domain or in the frequency domain, depending on the salient features of the system and the culture of its home discipline. For
instance, the time domain is preferred in discussing models of the effects of injected in-
sulin on the concentration of blood glucose, but the frequency domain is preferred in dis-
cussing models of the response of primary auditory fibers to acoustic stimuli (tones or
multitones). Specific examples will be given in Chapter 6.
The first-order kernel in the time domain represents the weighting pattern of input past values in generating (upon integration) the linear component of the system output, as discussed in the introduction to the Volterra series in Section 2.1. Likewise, the second-order kernel in the time domain represents the weighting pattern of products of input past values in generating (upon double integration) the quadratic component of the system output. Naturally, peaks or troughs in these kernels attract attention as indicating the lag values for which the input past epoch maximally influences the output (in a positive manner for a

peak and in a negative manner for a trough). The relative strength of these effects is quantified by the respective kernel values.
In the frequency domain, the morphology of these two kernels depicts the presence of possible resonances (peaks of the magnitude) or dissonances (troughs of the magnitude). Although this interpretation is straightforward for the first-order kernel, it is less intuitive for the second-order kernel, where one must invoke the notion of bifrequency. A bifrequency location attains the meaning of a pair of frequencies whose stimulating combination at the input has the effect on the output that is depicted in the second-order kernel value at this bifrequency point. For instance, if the bifrequency point (ω1, ω2) corresponds to a magnitude peak of the Fourier transform (or FFT, in practice) of the second-order kernel, then the combination of input power at frequencies ω1 and ω2 results in a large effect on the output (positive or negative, depending on the respective phase). This effect is very small when the respective bifrequency point corresponds to a magnitude trough of the second-order kernel FFT.
The bifrequency locations of magnitude peaks and troughs indicate the presence of
nonlinearities in individual physiological mechanisms, if the latter can be associated with
specific frequency bands. For instance, resonances or dissonances appearing at the fre-
quency bands associated with the dominant breathing rate or heart rate can be interpreted
in terms of interactions with the respiratory or cardiac activity. Similar interpretations can
be developed for frequency bands associated with myogenic, sympathetic, parasympathetic, endothelial, metabolic, and endocrine activity, as discussed in Chapter 6 for the
specific cases of renal and cerebral autoregulation.
In general, it should be noted that magnitude peaks (or troughs) on the diagonal bifrequencies (ω1, ω1) of the second-order kernel reflect amplitude nonlinearities of the individual mechanisms residing in the respective frequency band ω1, and off-diagonal peaks (or troughs) at bifrequency locations (ω1, ω2) reflect the presence of nonlinear interactions between the (possible) individual mechanisms residing at the frequency bands ω1 and ω2. Of course, if an individual mechanism resides in two or more frequency bands (i.e., it has two or more resonances or dissonances), then the amplitude nonlinearities of this mechanism will give rise to multiple on-diagonal and off-diagonal bifrequency combinations, as mathematically predicted by the frequency-domain analysis of the Volterra models presented in Section 2.1.3.
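The bifrequency analysis can be sketched for a synthetic separable second-order kernel built from a single mode near 5 Hz; the mode, sampling rate, and memory length are all illustrative assumptions:

```python
import numpy as np

fs, M = 100.0, 128
tau = np.arange(M) / fs
g = np.exp(-tau / 0.2) * np.cos(2 * np.pi * 5.0 * tau)  # damped-cosine mode near 5 Hz
k2 = np.outer(g, g)                                     # separable second-order kernel

K2 = np.abs(np.fft.fft2(k2))                 # bifrequency magnitude of the kernel
freqs = np.fft.fftfreq(M, d=1.0 / fs)
half = M // 2                                # keep non-negative bifrequencies only
i, j = np.unravel_index(np.argmax(K2[:half, :half]), (half, half))
print(freqs[i], freqs[j])  # on-diagonal magnitude peak near the (5 Hz, 5 Hz) bifrequency
```

Because this kernel contains a single mode, its dominant peak falls on the diagonal; a kernel with interacting mechanisms at two distinct bands would additionally show off-diagonal peaks at the corresponding (ω1, ω2) locations.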
The morphology of the magnitude of the first-order and second-order kernels in the frequency domain also depicts possible low-pass or high-pass response characteristics of the system (in addition to the resonances and dissonances). An example of this is given in Chapter 6 for a mechanoreceptor.

Interpretation of the PDM Model. In the "principal dynamic mode" (PDM) formulation of the Volterra model, the estimated PDMs constitute the minimum set of linear filters that can represent the system dynamics. They result either from eigendecomposition of the Volterra kernels or from the in-bound weights of each hidden unit in the first hidden layer of the Volterra-equivalent network model (in combination, of course, with the respective filter bank), as discussed in Section 4.1.1. Therefore, the PDMs directly depict any dynamic characteristics of the system (e.g., low-pass, high-pass, or band-pass).
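A sketch of PDM extraction by eigendecomposition follows, for a synthetic rank-2 symmetric kernel; the two modes and their weights are illustrative, and the significance threshold is an assumed cutoff:

```python
import numpy as np

M = 64
t = np.arange(M)
g1 = np.exp(-t / 8.0)                      # hypothetical low-pass mode
g2 = np.exp(-t / 20.0) * np.sin(t / 4.0)   # hypothetical oscillatory mode
k2 = 2.0 * np.outer(g1, g1) + 0.5 * np.outer(g2, g2)   # rank-2 symmetric kernel

eigvals, eigvecs = np.linalg.eigh(k2)
order = np.argsort(-np.abs(eigvals))       # sort by eigenvalue magnitude
n_sig = np.sum(np.abs(eigvals) > 1e-8 * np.abs(eigvals[order[0]]))
pdms = eigvecs[:, order[:n_sig]]           # columns = candidate PDMs
print(n_sig)  # -> 2: two dynamic modes suffice for this kernel
```

The number of eigenvalues of significant magnitude indicates how many PDMs are needed, and the associated eigenvectors (sampled impulse responses) can then be inspected for the low-pass, high-pass, or band-pass character discussed above.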
The outputs of the PDMs are transformed nonlinearly by the multiinput nonlinearity
that generates the system/model output and is represented in the model by the combina-
tion of the activation functions in the hidden and interaction layers, or by the individual
static nonlinearities in the separable case. The interpretation of the static nonlinearities is

based on visual inspection (e.g., saturation, threshold, decompressive, or other characteristics). This visual inspection is easier, of course, in the separable case.
The physiological interpretation of the PDM model hinges primarily on our ability to associate features of the individual PDMs with distinct physiological mechanisms. This is not a simple task and it cannot be supported by general theoretical considerations (i.e., there is no mathematical proof or any guarantee that individual PDMs will correspond to distinct physiological mechanisms). This task is now exposed to the evolutionary influences of the accumulated evidence in current and future studies. The aforementioned conjecture regarding correspondence of PDM features to physiological mechanisms is expected to be tested by this accumulating experience and will be either affirmed or evolved.
Illustrative examples of PDM model interpretation are given in Chapter 6 for neural,
cardiovascular, and endocrine/metabolic systems. These examples constitute only the ini-
tial step in this ambitious undertaking of physiological interpretation, with crucial impor-
tance for the ultimate impact of nonlinear modeling on systems physiology. It is plausible
that additional insights and useful interpretation clues may result from developing equiva-
lent parametric models or modular models (other than the PDM formulation), following
the theory and methods presented in Chapters 3 and 4, respectively.

5.5 OUTLINE OF STEP-BY-STEP PROCEDURE

In this section, we provide a brief outline of the advocated step-by-step procedure for
physiological system modeling in order to assist the reader in developing a comprehen-
sive and useful understanding of this approach in a practical context.

Step 1: Determine the system characteristics of bandwidth, memory, and dynamic range.
Step 2: Examine the system linearity, stationarity, and ergodicity. Establish the degree of nonlinearity (if nonlinear) and the suitability of stationary or quasistationary modeling.
Step 3: Select the input characteristics of waveform class, bandwidth, amplitude distribution, and dynamic range, if the input is experimental and controllable. A natural input ensemble is recommended, if possible.
Step 4: Determine the data-record length requirements and prepare the input-output
data according to the system characteristics and the noise/interference condi-
tions (i.e., data filtering or consolidation).
Step 5: Perform the core tasks of model specification and estimation, following ei-
ther direct inversion methods (for discrete Volterra models) or iterative cost-
minimization methods (for Volterra-equivalent network models).
Step 6: Repeat Steps 1-5 for as many data segments (or experiments) of the same and different preparations as possible, in order to explore the consistency of the results and possible nonstationary or nonergodic characteristics.
Step 7: Perform the model validation task based on output prediction accuracy (for
the testing datasets) and also based on the degree of consistency of the ob-
tained models from experiment to experiment.
Step 8: Perform the interpretation task for the validated model, either in the PDM formulation (recommended as the most promising option) or by inspecting the low-order Volterra kernels. The interpretation results can be enriched by incorporating prior knowledge about the system and the results of supplemental parametric and alternative modular analysis.
Step 9: Based on the final interpretation results and analysis of the key physiological issues, design the next set of experiments or natural data collections that can elucidate possible ambiguities or explore additional facets of interest.
Step 10: Disseminate the results in the prescribed rigorous context and utilize the obtained model for the specific purpose of the study (e.g., advancement of scientific knowledge, diagnosis, control, and outcome assessment). (For this, do not forget to read the following chapters as well.)

5.5.1 Elaboration of the Key Step #5


The recommended procedure for the core tasks of model specification and estimation performed by Step #5 is:

(a) Select the "training" and "testing" datasets through four-to-one random sampling of the prepared input-output data.
(b) Compute the second-order or third-order MDV model through direct inversion [see Equation (2.185)] using the model-order-selection (MOS) criterion of Equation (2.200) to determine the appropriate structural parameters α and L through an ascending-order search procedure.
(c) Find the PDMs of the obtained second-order or third-order MDV model using the decomposition approach outlined in Section 4.1.1.
(d) Compute the PDM outputs {uj(n)} (j = 1, . . . , M < L) through convolution of each PDM with the input data.
(e) Determine the static mapping nonlinearity of the "state vector" u(n) = [u1(n) . . . uM(n)]′ onto the output data y(n). This can be done either computationally (by defining proper bins of a grid in the state space and computing the average output value for each bin) or through least-squares fitting of a postulated analytical expression for the multivariate static nonlinearity (e.g., multinomial).
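Steps (d) and (e) above lend themselves to a compact computational sketch. The Python fragment below is illustrative only: the function names, the particular polynomial form of the static nonlinearity (self-terms plus pairwise cross-terms), and the use of NumPy least squares are our assumptions, not the book's prescribed implementation.

```python
import numpy as np

def pdm_outputs(pdms, x):
    """Step (d): convolve each PDM (rows of `pdms`, length-L impulse
    responses) with the input x to obtain the PDM outputs u_j(n)."""
    return np.stack([np.convolve(x, p)[:len(x)] for p in pdms])

def fit_static_nonlinearity(U, y, degree=3):
    """Step (e), least-squares variant: fit a multinomial static map
    from the PDM outputs U (M x N) to the output y (N,).
    Returns the coefficients and a predictor function."""
    M, _ = U.shape

    def design(Um):
        cols = [np.ones(Um.shape[1])]
        for j in range(M):
            for q in range(1, degree + 1):
                cols.append(Um[j] ** q)          # self-terms u_j^q
        for j in range(M):
            for k in range(j + 1, M):
                cols.append(Um[j] * Um[k])       # pairwise cross-terms
        return np.column_stack(cols)

    coeffs, *_ = np.linalg.lstsq(design(U), y, rcond=None)
    predict = lambda Um: design(Um) @ coeffs
    return coeffs, predict
```

The alternative computational route of step (e), binning the state space and averaging the output per bin, avoids postulating an analytical form but requires more data for the same accuracy.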

An alternative route after step 5(c) is the following:

(d′) Use the PDMs to determine the equivalent difference equations of input preprocessing for the VWM model [see Equation (4.181)].
(e′) Use the obtained difference equations to initialize the coefficient values of the VWM model, and the kernel decompositions (eigenvalues or singular values) to initialize the polynomial coefficients of the hidden units of the VWM model.
(f′) Initialize the sigmoidal functions of the VWM hidden units in the "quasilinear" region (i.e., small curvature within the input range) and set the number of interaction units equal to [H/2], with quadratic activation functions having a very small initial coefficient of second degree (for unity coefficient of first degree).
(g′) Train the VWM model for higher orders in the hidden units (Q ≥ 3) using the MOS criterion of Eq. (2.200) on the testing dataset.

Note that the resulting VWM model is equivalent to the PDM model, the only difference
being that the static nonlinearity is constrained in the VWM model for greater parsimony.
6
Selected Applications

In this chapter, we present examples of the application of the advocated methodology to actual physiological systems. These systems represent various physiological domains (neural, cardiovascular, renal, endocrine/metabolic) for which actual Volterra-Wiener type of modeling has been undertaken over the course of the last 30 years. Naturally, the specific variant of the advocated modeling approach taken in each case depends on the evolutionary stage of the subject methodology at the time of each application and, of course, on the particular characteristics of the system under study.
The illustrative applications presented herein should not be viewed as a complete or exhaustive review of the relevant bibliography, but are primarily drawn from work with which the author has been directly involved (with a few exceptions). This affords the requisite familiarity with the procedural and methodological details of each application, as well as an intimate knowledge of the physiological objectives. The applications presented in this chapter are for single-input/single-output cases, since the multiple-input/multiple-output cases are discussed in Chapter 7. Examples from the particular case of neural systems with spike-train inputs are discussed separately in Chapter 8, since this case requires specialized variants of the general methodology that are compatible with the binary signal modality of spike trains. All the examples presented in this chapter have discretized (sampled) continuous inputs, but they can have either discretized continuous or point-process (binary spike-train) outputs (as in the case of neurosensory systems).
Section 6.1 deals with neurosensory systems (visual, auditory, and somatosensory)
that receive continuous inputs and generate either continuous (graded) or point-process
(spike-train) outputs. Neural systems with spike-train inputs are addressed separately in
Chapter 8. Examples of cardiovascular and renal systems are drawn from blood-flow au-
toregulation studies and presented in Sections 6.2 and 6.3, respectively. The dynamic ef-
fects of injected insulin on blood glucose concentration is used as an example of a meta-
bolic/endocrine system in Section 6.4.

Nonlinear Dynamic Modeling of Physiological Systems. By Vasilis Z. Marmarelis


ISBN 0-471-46960-2 © 2004 by the Institute of Electrical and Electronics Engineers.

Many additional applications of the Volterra-Wiener approach can be found in the literature that cannot be reviewed here in the interest of space. Among them, it is worth noting the first known application of this approach to a physiological system (the pupil reflex) by L. Stark and his associates [Sandberg & Stark, 1968; Stark, 1968, 1969], the nonlinear modeling of Aplysia neurons by J. Segundo and his associates [Bryant & Segundo, 1976; Brillinger et al., 1976; Brillinger & Segundo, 1979], of the muscle spindle by G. Moore [Moore et al., 1975], of the ankle reflex by R. Kearney and I. Hunter (1990), of growth dynamics in Phycomyces by E. Lipson (1975), of ganglion cells in the cat retina by J. Victor and R. Shapley (1979a, b, 1980), and of auditory neurons by A. Moller (1973, 1975, 1976, 1977, 1983, 1987). Of particular note are the additional applications (not presented herein) by K. Naka et al., A. French et al., and T. Lewis et al., as well as the groundbreaking work on neuronal systems with point-process inputs by T. Berger and R. Sclabassi and their associates that is reviewed in Chapter 8.
Note that the important cases of spatiotemporal and spectrotemporal receptive fields
(whose analysis is enabled by this approach) are discussed in Section 7.4.

6.1 NEUROSENSORY SYSTEMS

As indicated above, most neural systems receive as input presynaptic action potentials
(spike trains) that represent a particular data modality requiring specialized treatment
(discussed separately in Chapter 8). In this section, we will present examples from neu-
rosensory systems that have discretized continuous inputs (e.g., light intensity for the vis-
ual system, sound pressure for the auditory system, mechanical strain for the somatosen-
sory system). Some of these neurosensory systems have continuous output (e.g., graded
intracellular potential in retinal receptor, horizontal, and bipolar cells) and others generate
sequences of action potentials (e.g., ganglion retinal cells, primary auditory nerve fibers,
or cuticular mechanoreceptors). In the latter case, a threshold-trigger operator is appended
to the Volterra model output in order to generate the binary spike-train output correspond-
ing to the recorded sequence of action potentials (see also Section 8.2).
An alternative to thresholding that has been used widely in the past is to define the out-
put of the model as the "likelihood of firing an action potential" instead of the binary nu-
merical value denoting the presence (1) or absence (0) of an action potential at each dis-
crete time bin n. The "likelihood of firing" at each time bin n (outside the absolute
refractory period) is quantified by the continuous output of the Volterra model and can be
converted into "probability of firing" through proper normalization into the range [0, 1].
Of course, the application of a hard threshold on the graded output signal representing the
likelihood or probability of firing can yield a binary (spike train) representation of the
output prediction as a sequence of action potentials.
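The thresholding and normalization operations just described can be sketched in a few lines of code. This is an illustrative fragment: the min-max normalization into [0, 1] and the simple refractory counter are our assumptions (any monotone map into [0, 1] and any refractory mechanism could be substituted).

```python
import numpy as np

def firing_probability(v):
    """Normalize the graded model output v(n) into [0, 1] as a
    'probability of firing'. Min-max normalization is one simple
    choice of the 'proper normalization' mentioned in the text."""
    v = np.asarray(v, dtype=float)
    return (v - v.min()) / (v.max() - v.min())

def threshold_trigger(v, theta, refractory=0):
    """Hard threshold-trigger appended to the model output: emit a
    spike (1) whenever v(n) >= theta, then suppress output for
    `refractory` time bins (absolute refractory period)."""
    spikes = np.zeros(len(v), dtype=int)
    hold = 0
    for n, vn in enumerate(v):
        if hold > 0:
            hold -= 1          # still within the refractory period
        elif vn >= theta:
            spikes[n] = 1      # threshold crossed: fire
            hold = refractory
    return spikes
```

Applying `threshold_trigger` to the graded prediction yields the binary spike-train representation; `firing_probability` yields the graded alternative.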
For historical reasons, we begin with illustrative examples from the extensive model-
ing work that Ken Naka and his associates performed on the vertebrate retina over the last
30 years. This includes the pioneering applications to catfish retinal cells that the author's
brother, Panos, performed in collaboration with Ken Naka that gave the initial impetus to
this field in the early 1970s. These pioneering applications, along with the earlier contri-
butions of Larry Stark and his associates that represent the first known application of the
Volterra-Wiener approach to physiological systems (the pupil reflex), should be credited
with establishing the importance of this modeling approach to systems physiology and
have been seminal in the formation of this entire field of study.

Note that a major contribution of Panos Marmarelis to this field has been his extension of the Volterra-Wiener modeling methodology to systems with two (or more) inputs, which is discussed in Chapter 7, where illustrative examples are also given from his groundbreaking applications to motion-sensitive cells in the fly compound eye (with Gilbert McCann) and to the bipolar cells of the catfish retina (with Ken Naka). We conclude the illustrative examples of neurosensory systems with a nonlinear model of the fly photoreceptor retinula cells 1-6, which was the author's first application of Volterra-Wiener modeling to a physiological system (with Gilbert McCann) during his Ph.D. studies at Caltech in the mid-1970s, along with some additional applications to photoreceptors from other research groups.
Following these examples from the visual system, we present a nonlinear model of primary auditory nerve fibers due to Ted Lewis and his associates as an illustrative example from the auditory system. We close this section with a nonlinear model of a cuticular mechanoreceptor as a simple example of a somatosensory system, which was obtained by the author in collaboration with Andrew French. The latter example includes intracellular and extracellular output recordings.
These illustrative examples from neurosensory systems are a fitting tribute to the historical importance of these applications for the establishment of this field and demonstrate the important fact of common functional attributes of neurosensory systems (e.g., the presence of two principal dynamic modes encoding, respectively, the intensity and the rate of change of input information).

6.1.1 Vertebrate Retina


As a first example, we select for historical reasons the Wiener models of catfish retinal
cells obtained by Panos Marmarelis and Ken Naka in the pioneering applications of this
approach [Marmarelis & Naka, 1972, 1973a, b, c, d, 1974a, b]. A schematic of the retina
organization is shown in Figure 1.3 and a simplified block diagram of neuronal signal
flow is shown in Figure 1.4 (note that more elaborate modular models of the outer retinal
layers are shown in Figures 4.28 and 4.31 in connection with nonlinear feedback analy-
sis).
The first set of Wiener kernel estimates (of first and second order) obtained in catfish retinal ganglion cells is shown in Figure 1.5, where the analyzed output data represent "probability of firing" measured by superimposition of recorded spike trains from repetitive identical trials (peristimulus histogram) using a segment of band-limited GWN current stimulus injected into the horizontal cell layer. As indicated in Section 1.4, the "biphasic" waveform of the first-order Wiener kernel exhibits the two main functional characteristics of sensory systems: (1) the initial positive lobe (peaking in this case at about 30 ms) indicates that the system responds to the input intensity after weighted integration (smoothing) over a 50-60 ms time window (using the weighting pattern defined by the shape of the positive lobe); and (2) the subsequent undershoot (the negative lobe extending from about 60 ms to about 200 ms) indicates that the system also responds to changes in the smoothed input intensity, thus providing a robust running measure of the rate of change of input intensity through time.
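These two functional roles of the biphasic kernel can be reproduced with a synthetic example. The kernel below is built as a difference of two alpha functions with made-up time constants (chosen only to mimic a positive lobe near 30 ms and a later undershoot); it is not fitted to the catfish data.

```python
import numpy as np

def biphasic_kernel(dt=0.001, T=0.2, t_pos=0.03, t_neg=0.09, w=0.6):
    """Illustrative biphasic first-order kernel: a fast positive alpha
    lobe (intensity smoothing) minus a slower, weaker alpha lobe (the
    undershoot conveying rate-of-change sensitivity). All constants
    here are assumptions for illustration only."""
    t = np.arange(0.0, T, dt)
    alpha = lambda tp: (t / tp) * np.exp(1.0 - t / tp)  # peaks at t = tp
    return t, alpha(t_pos) - w * alpha(t_neg)

def linear_response(k, x, dt=0.001):
    """First-order (linear) model output: discrete convolution of the
    kernel with the input, scaled by the sampling interval."""
    return dt * np.convolve(x, k)[:len(x)]
```

Convolving this kernel with a step input produces an initial overshoot (the response to the smoothed intensity) followed by partial decay, as the undershoot subtracts the recent past of the input, which is precisely the rate-of-change sensitivity described above.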
The biphasic form of the first-order Wiener kernel in retinal ganglion cells is more evident for higher mean and/or power levels of stimulation (a fact that carries over to other retinal cells as well). This fact suggests the presence of explicit (i.e., through physical neuronal processes) or implicit (i.e., through nonlinearities of the underlying biophysical

mechanisms) nonlinear feedback, as discussed in Section 4.1.5. This biphasic form often
diminishes for low levels of stimulation (e.g., dark-adapted preparations exposed to low-
intensity light stimulation), taking a monophasic (low-pass) form. It is important to note
that the specific kernel waveform can be related to the dynamics of distinct ionic channel
mechanisms, in accordance with the detailed analysis presented in Section 8.1.
The shape of the second-order Wiener kernel exhibits a primary positive mount (peaking at the diagonal point with lags/coordinates τ1 = τ2 = 30 ms) and a secondary positive mount at lags/coordinates τ1 = 70 ms and τ2 = 100 ms (and the symmetric mount about the diagonal). Negative "valleys" run along the diagonal after a τ2 lag of about 75 ms and originating at τ1 = 60 ms (also the symmetric "valley" originating at τ2 = 60 ms for τ1 > 75 ms). The fact that the primary positive mount peaks at the same lag as the main positive lobe of the first-order kernel suggests a rectifying nonlinearity (consistent with the notion of a threshold mechanism generating the action potentials at the axon hillock of the
notion of a threshold mechanism generating the action potentials at the axon hillock of the
ganglion cell). However, the subtler features of the second-order kernel require more nu-
anced interpretation that was not possible at that early time. At the present time, the use of
a PDM model can offer this kind of refined and elaborate interpretation, extending also to
higher-order nonlinearities. Unfortunately, these data have not been analyzed in the ad-
vanced context of PDM modeling (a fact applying to all retinal cell models to date, with
the exception of the related L-N and L-N-M cascade models of some retinal cells dis-
cussed below). The presented Wiener model of the horizontal-to-ganglion cell pathway
was validated by means of its ability to reduce the mean-square error of the model predic-
tion for the experimental input-output data, as illustrated in Figure 1.6.
Recently, the author and his associates attempted some preliminary PDM analysis of
retinal data provided by Ken Naka for this purpose. The initial results indicate the presence of only two PDMs, associated with rectifying- and saturating-type nonlinearities. If confirmed with additional data analysis, this result offers the attractive prospect of interpreting the PDM models of retinal cells in the context of biophysical mechanisms, thereby identifying specific ionic channels associated with the electrochemical activity of retinal cells under various operating conditions (following the procedures presented in Sections 8.1 and 8.2). The time is ripe to examine these issues thoroughly and establish the basic facts of retinal cell function in light of our current improved estimation and rigorous understanding of kernels.
We should note that these Wiener kernel estimates depend on the mean level and the power level of the GWN stimulus with which they have been estimated. Therefore, the aforementioned kernel morphology may change if different mean and/or power levels are used for the GWN test input (an important fact largely overlooked in the past). Of course, these changes in kernel morphology will be minimal for the second-order Wiener kernel if the system is well approximated by a second-order model (although this is not true for the first-order Wiener kernel estimate that can be affected significantly if the mean level of the GWN input is altered, even for a second-order system).
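This dependence on the input mean is easy to reproduce numerically. The sketch below is our own construction (the toy second-order system and the plain cross-correlation estimator are illustrative, not taken from the book's examples): for y = v + 0.5 v² with v = h * x and a GWN input of mean μ, the measured first-order kernel of this toy system is (1 + μ Σh) h(τ), so it scales with the input mean even though the system itself is fixed.

```python
import numpy as np

def xcorr_kernel_1(x, y, L):
    """First-order Wiener kernel estimate by cross-correlation:
    h1(tau) = E[(y - <y>) (x(n - tau) - <x>)] / P, with P the input
    power about its mean."""
    xz = x - x.mean()
    yz = y - y.mean()
    N, P = len(x), x.var()
    return np.array([np.dot(yz[tau:], xz[:N - tau]) / ((N - tau) * P)
                     for tau in range(L)])

def second_order_system(x, h=(1.0, -0.5)):
    """Toy second-order system: v = h * x, then y = v + 0.5 v^2."""
    v = np.convolve(x, h)[:len(x)]
    return v + 0.5 * v ** 2
```

Estimating the first-order kernel of this system with zero-mean GWN recovers h itself; repeating with the same GWN shifted to mean μ = 1 recovers (1 + μ Σh) h, a 50% larger kernel for h = (1, −0.5).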
An illustration of this is given in Figure 6.1, where the first- and second-order Wiener kernels of the light-to-horizontal cell system in the catfish retina are shown for two different mean and power levels of the GWN input [Marmarelis & Naka, 1973a]. The first-order Wiener kernel for the higher level of stimulation exhibits an undershoot after a lag of approximately 90 ms (unlike its lower-level counterpart that exhibits no undershoot) and has a shorter peak-response time (about 40 ms versus about 70 ms for its lower-level counterpart). This is indicative of nonlinear feedback in this system (see Section 4.1.5). Furthermore, the second-order Wiener kernel estimates (although of relatively poor quality) demonstrate that the higher level of stimulation gives rise to stronger nonlinearities (as depicted by the larger values of the second-order kernel for higher level of stimulation) and the negative sign of the primary trough suggests compressive nonlinearity (i.e., the second derivative of the nonlinearity is negative). The values of the second-order Wiener kernel for the lower level of stimulation are small, suggesting an operating point in a quasilinear region or close to an inflection point of the putative nonlinearity.

Figure 6.1 First-order and second-order Wiener kernels of the light-to-horizontal cell system for low (B) and high (A) mean light intensity levels [Marmarelis & Naka, 1973a].
It is interesting to note that the first-order Wiener kernel of the light-to-ganglion cell system shown in Figure 6.2 exhibits more differentiating characteristics (i.e., more sensitivity to input change and exhibiting broader bandwidth) than its counterpart for the light-to-horizontal cell system shown in Figure 6.1, an effect that can be attributed to neural processing performed by the mediating bipolar and amacrine cells. In addition, the second-order Wiener kernel of the light-to-ganglion cell system indicates stronger nonlinearities (than the light-to-horizontal cell system) and of decompressive character, attested to by the positive sign of the principal mount.

Figure 6.2 First-order and second-order Wiener kernels of the light-to-ganglion-cell system [Naka, 1973b]. The scale bar in the left panel corresponds to 64 ms.
The historical value of these examples notwithstanding, it would be remiss not to remind the reader that the advocated approach in this book favors the estimation of the Volterra kernels of the system with more accurate/reliable methods than cross-correlation (e.g., LET, LVN, or VWM), presented in Sections 2.3, 4.3, and 4.4. Therefore, equipped with our current knowledge and advanced methodologies, we should revisit the subject of nonlinear dynamic modeling of retinal cells, because it offers a stable experimental preparation of great neurophysiological importance.
Ken Naka and his associates (most notably Hiroko Sakai) performed extensive studies of catfish retinal cells with various types of GWN stimuli over many years, starting with the pioneering studies at Caltech in the early 1970s (in collaboration with Panos Marmarelis) cited above. These extensive and meticulous studies have elucidated many aspects of retinal function in the context of Wiener models and have yielded a valuable database for additional analysis. Experimental data included responses of horizontal, bipolar, amacrine, and ganglion cells in response to GWN light stimuli, as well as injected current stimuli into horizontal, bipolar, and amacrine cells. Studies included multiinput as well as multicolor testing and modeling [Sakai & Naka, 1985, 1987a, b, 1988a, b]. Some illustrative examples are given below for amacrine and ganglion cells.
Figure 6.3 shows a schematic of retinal organization depicting the main signal flow pathways and the neuronal classification proposed by Naka. Typical first-order and second-order Wiener kernel estimates for various types of bipolar, amacrine, and ganglion cells are shown in Figure 6.4 (on-center and off-center). An interesting demonstration of how the difference in the second-order kernels of type C and type NB amacrine cells can be explained by means of a postcascaded linear filter is presented in Figure 6.5, whereby the form of the second-order kernel of the type C amacrine cell corresponds to an L-N cascade, but that of the type NB amacrine cell corresponds to an L-N-M cascade model.

Figure 6.3 Schematic diagram of the neural organization of the catfish retina proposed by Naka and his associates. Arrows indicate signal pathways with sign-noninverting (+) or sign-inverting (−) transmission. Note that all monosynaptic inputs to ganglion cells are sign-noninverting [Sakai & Naka, 1988b].
Finally, we show in Figure 6.6 the first-order Wiener kernel estimates obtained for the neuronal pathways between different types of amacrine and ganglion cells when a GWN current stimulus is injected into the amacrine cell and the elicited response in the ganglion

Figure 6.4a Typical first-order Wiener kernels of each type of retinal neuron are shown. The stimulus was a full-field white-noise light stimulus whose mean luminance was 30 μW/cm². Column A shows first-order kernels from depolarizing (on-) cells; Column B shows the first-order kernels from hyperpolarizing (off-) cells. Four kernels from the same cell type were superimposed by normalizing the kernel amplitude. Kernels on the lowest traces were computed from extracellular spike discharges, whereas the others were from intracellular slow potentials. The kernel ordinate unit is mV · cm² · μW⁻¹ for intracellular responses and spikes · cm² · μW⁻¹ for extracellular spike discharges [Naka & Bhanot, 2002].
Figure 6.4b Typical second-order Wiener kernels obtained from amacrine cells (A), from intracellularly recorded ganglion cells (B), and from extracellular ganglion spike discharges (C). Solid lines in the kernel indicate a mount or a depolarization, and dashed lines indicate a valley or a hyperpolarization. Although there are some variations, the second-order kernels computed from amacrine cells are essentially of three kinds (type-C, type-NB, type-NA). The second-order kernels computed from ganglion cells, either from intracellular slow potentials (B) or from extracellular spike discharges (C), correspond to the same three kinds obtained from amacrine cells. The kernel ordinate unit is mV · cm² · μW⁻² for intracellular responses and spikes · cm² · μW⁻² for extracellular spike discharges [Naka & Bhanot, 2002].

cell is recorded as output. It is intriguing that homotypical cell pathways (e.g., NA → GA, NB → GB) exhibit the typical profile of neuronal transduction in neurosensory systems (intensity and rate sensitivity), but heterotypical cell pathways (e.g., NA → GB, NB → GA) exhibit a dynamic characteristic of double differentiation (implying acceleration or edge sensitivity). The latter should be expected in tasks specializing in detection and coding of on-off transient response characteristics that relate to the appearance and disappearance of sudden events (possible "threats"), as well as edge information related to motion detection and coding.
The physiological interpretation of these modeling results elucidates certain aspects of visual information processing in the vertebrate retina. In the outer plexiform layer (i.e., the receptor-horizontal-bipolar complex), we take into consideration the putative nonlinear feedback model presented in Section 4.1.5 that explains the experimentally observed first-order Wiener kernels and their changes for changing GWN input power. We posit that the receptor-horizontal complex (like most neurosensory systems) processes intensity and rate-of-change information of the input signal in a manner that retains adequate sensitivity (gain) over a broad range of stimulation by adjusting dynamically its response characteristics to the input conditions (mean and power level) using nonlinear feedback in tandem with a compressive nonlinearity (see Section 4.1.5). Furthermore, this arrange-

Figure 6.5 Validation of an L-N cascade model for a type-C amacrine cell (left) and of an L-N-M cascade model of a type-NB amacrine cell (right) based on their second-order Wiener kernels. The impulse responses of the prior filter L and posterior filter M are shown. The output of the first filter L is squared [Naka & Bhanot, 2002].

ment secures faster response and broader bandwidth when the input intensity and rate of change are greater. The latter attribute meets the critical behavioral requirement of rapid detection of threats for survival purposes. The spatial arrangement of the receptor-horizontal cell complex also provides processing capabilities for contrast enhancement and edge detection (center-surround organization of the receptive field), as discussed in Section 7.4 in connection with spatiotemporal modeling.
The addition of the bipolar cell (the main "bridge" between the outer and the inner plexiform layers of the retina) to the receptor-horizontal complex further enhances the differentiating characteristics of visual (pre)processing in time and space that emphasize contrast and rate-of-change information. The triadic synapses (bringing together horizontal processes and bipolar dendrites at the receptor terminals) assist with this differentiating task in time and space, and implement the continuous adjustment of processing characteristics to the changing conditions of visual stimulation (the morphology of the triadic invagination changes with changing stimulus intensity). The receptive field of the bipolar cell performs double differentiation in space (for edge enhancement) and single differentiation in time (for rate-of-change enhancement), both band-limited operations, as discussed further in Sections 7.2.2 and 7.4.1.
In the inner plexiform layer (i.e., the amacrine-ganglion complex), the obtained modeling results point to the existence of three channels of information processing (corresponding to the traditional classification into on-sustained, off-sustained, and on/off-transient response characteristics), denoted by Naka et al. as type A, B, and C, respectively. These three channels of information processing are also interacting through amacrine interneuronal connections in order to encode the rate-of-change and intensity of signals related to motion (translational and rotational) over spatial patches that retain localization of motion information. For instance, the differentiation-rectification-integration cascade of the type-NB amacrine cell shown in Figure 6.5 provides a running measure of speed, and a measure of acceleration is given by the double-differentiation characteristic shown in the first-order kernels of heterotypical amacrine-ganglion pathways (e.g., NA → GB or NB → GA) shown in Figure 6.6.

Figure 6.6 A1 shows the first-order Wiener kernels derived for the neuronal pathway from type-NA amacrine to GA ganglion cell (solid line) and from type-NB amacrine to GB ganglion cell (dashed line). The two kernel waveforms are almost identical. B1 shows the first-order Wiener kernels derived for the neuronal pathway from type-NB amacrine to GA ganglion cell (solid line) and from type-NA amacrine to GB ganglion cell (dashed line). Fourier transform of the kernels yielded the gain characteristics in A2 and B2, respectively [Sakai & Naka, 1988a].
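The differentiation-rectification-integration cascade just mentioned can be sketched as a simple L-N-M chain. The fragment below is illustrative only: the choice of a plain numerical differentiator, a half-wave rectifier, and a first-order leaky integrator with time constant `tau_m` are our assumptions, not the fitted type-NB dynamics.

```python
import numpy as np

def lnm_speed_measure(x, dt=0.001, tau_m=0.05):
    """Illustrative L-N-M cascade: band-limited differentiator (L),
    half-wave rectifier (N), leaky integrator (M). The output is a
    running measure of the positive rate of change of the input,
    i.e., a crude 'speed' signal."""
    v = np.gradient(x, dt)            # L: rate of change of the input
    r = np.maximum(v, 0.0)            # N: half-wave rectification
    a = np.exp(-dt / tau_m)           # M: first-order leaky integrator
    u = np.zeros_like(r)
    for n in range(1, len(r)):
        u[n] = a * u[n - 1] + (1.0 - a) * r[n]
    return u
```

Driving this cascade with a rising ramp yields an output that settles near the ramp slope (a speed estimate), while a falling ramp produces no output because negative rates are rectified away, consistent with the on/off channel specialization discussed above.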
An interesting example of a cascade model is also given by Emerson and his associates
for visual cortical cells in the cat [Emerson et al., 1989, 1992]. This work is revisited in
Section 7.4 in the context of spatiotemporal modeling. Here, we limit ourselves to single
cortical cells (both "simple" and "complex") for which the cascade models of Figures 6.7
and 6.8 were obtained [Emerson et al., 1989]. For a simple cortical cell, the cascade mod-
el of Figure 6.7 suggests that the aggregate effect of two long chains of concatenated op-
erations from the retina to the LGN (depicted within the component LI) can be approxi-
mated by a linear operator with the typical dynamic characteristics of visual sensory
neurons (e.g., the first-order kernel ofretinal ganglion cells shown earlier). This first qua-
silinear operation is followed by a half-wave rectifying static nonlinearity (ramp thresh-
old) and another linear filter, with the characteristics shown in Figure 6.7.

6.1 NEUROSENSORY SYSTEMS 295

Figure 6.7 An expanded version of the L1-N-L2 model of a simple cortical cell to show the compo-
nents within L1 that could account for the linear representation of stimulus luminance in an off area of
the cortical receptive field. The first three sub-boxes in L1 account for linear transduction of light to
an internally transmitted signal, linear filtering of the signal, and conversion into a spike train, all oc-
curring between receptors and the output of the ganglion cells in the retina. The knee of ganglion-cell
threshold (N1.1) is shown far to the left to signify that the threshold is so low that a nonlinearity would
appear only for the most extreme negative modulations of the steady-state signal at mean luminance
(indicated by the dashed vertical line). The last three sub-boxes account for transmission of the spike
train to the LGN, conversion through synaptic input to a higher-threshold LGN neuron, nonlinear re-
coding into another spike train through principal cells of the LGN, and transmission to the cortex.
The last sub-boxes, operations L1.3 and L1.4, include inversion of the on-pathway signal polarity,
which combines with the off-pathway signal to recreate a linear intracellular voltage in the cortical
neuron. This "push-pull" approach has the effect of cancelling the even-ordered nonlinearities dis-
tributed along each of the complementary pathways [Emerson et al., 1989].

For a complex
cortical cell, the resulting cascade model has two parallel branches corresponding to on
and off pathways, each associated with a ramp-threshold static nonlinearity. The combi-
nation of these two pathways gives the total behavior of full-wave rectification as shown
in Figure 6.8, consistent with an approximate "squaring" operation required by the "ener-
gy model" of Adelson and Bergen (1985).
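The "squaring" interpretation can be made concrete with a small numerical sketch (not part of the original study): if each of the on and off pathways is idealized as a one-sided quadratic ramp, their combination reproduces a full squarer exactly, which is the essence of the energy-model combination illustrated in Figure 6.8.

```python
import numpy as np

# Hypothetical idealization: each pathway applies a one-sided quadratic ramp
# ("half-squaring"); summing the on and off pathways yields a full squarer.
def on_pathway(v):
    return np.maximum(v, 0.0) ** 2   # responds only to positive (on) inputs

def off_pathway(v):
    return np.maximum(-v, 0.0) ** 2  # responds only to negative (off) inputs

v = np.linspace(-0.2, 0.2, 401)      # input range matching the figure panels
combined = on_pathway(v) + off_pathway(v)

assert np.allclose(combined, v ** 2)  # full-wave "squaring" behavior
```

Actual cortical nonlinearities are only approximately quadratic, so the measured combination is a close approximation of a squarer rather than an exact one.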
These initial interpretations of the modeling results will be further elucidated in the
discussion of spatiotemporal modeling of visual systems, given in Section 7.4 in the con-
text of multiinput modeling. They can also benefit significantly from the application of the
advocated PDM modeling methodology, which has not yet been applied to the wealth of data
available from numerous experimental studies of the visual system.
Additional results on retinal cell modeling and interpretation will be presented in
Chapter 7 in connection with two-input modeling, whereby valuable insight into the spa-
tiotemporal organization of the retina was obtained by the pioneering studies performed
by P. Marmarelis and Naka using spot/annulus stimuli, which opened the path for future
multiinput modeling studies. In particular, the basic characteristics of the spatiotemporal
receptive fields of bipolar, amacrine, and ganglion cells were revealed for the first time
through these studies. These efforts culminated with the measurements of complete

296 SELECTED APPLICATIONS

Figure 6.8 Model for generating the squaring function needed by complex cortical cells for motion
perception (energy model). A separate static nonlinear operation is provided for the linear on and off
pathways. Combination of the static nonlinearities produces a close approximation of a full squarer
at the output (see bottom panels) [Emerson et al., 1989].

spatiotemporal receptive fields of retinal ganglion cells by Citron, Kroeker, and McCann
[Citron et al., 1981], which was an extraordinary achievement and was extended to corti-
cal cells by Emerson and Citron [Citron & Emerson, 1983; Citron et al., 1981]. As will be
seen in Section 7.4, these spatiotemporal receptive field measurements are cast in a non-
linear context and contain valuable information about retinal processing unavailable in
any other measurements attained to date. Given the additional methodological power that
becomes available with the techniques advocated in this book (e.g., PDM modeling), it is
impossible to overstate the potential for significant advancements in the study of spatio-
temporal neural processing in the immediate future.

6.1.2 Invertebrate Retina


An example from invertebrate vision is the modular model of the retinula cells 1-6 of the
fly (Calliphora erythrocephala) photoreceptor in a single ommatidium of the composite
eye, which takes the form of the L-N cascade shown in Figure 1.7. This modular model
was developed by analyzing the CSRS kernels shown in Figure 6.9 that were obtained
with an eight-level CSRS test input during the author's Ph.D. studies with Gilbert Mc-
Cann [Marmarelis & McCann, 1977].

Figure 6.9 First- and second-order CSRS kernel estimates of the photoreceptor of Calliphora ery-
throcephala (with white eyes) as obtained through the use of an eight-level CSRS stimulus [Mar-
marelis & McCann, 1977].

These kernel estimates suggest an equivalent L-N
cascade model based on the analysis of Section 4.1.2 regarding the relationships between
kernels of different order in an L-N cascade model (viz., a "cut" of the second-order
kernel along one axis, for any given value of the other lag, is a scaled version of the
first-order kernel).
However, close inspection of the second-order kernel of Figure 6.9 indicates that the
aforementioned simple scaling relation holds true only for certain "cuts," specifically for
T2 "cut" values from 10 ms to 15 ms and from 20 ms to 25 ms. Outside these ranges of T2
"cut" values, the second-order "cuts" deviate from a scaled version of the first-order ker-
nel, suggesting a more complex model. Furthermore, the observed changes in first-order
CSRS kernel estimates for different quasiwhite input power levels are consistent with the
presence of nonlinear negative decompressive feedback. This becomes more evident in
dark-adapted cases, where the second-order model becomes inadequate to predict the sys-
tem output (i.e., higher-order nonlinearities are present). The inadequacies of the second-
order model are also evidenced by its inability to account for certain higher frequencies in
the system output, as illustrated in Figure 6.10 for a ternary input. It is worth noting that
the retinula cells 7-8 of the fly ommatidium have kernels distinct from the retinula cells
1-6 shown above [McCann, 1974; McCann et al., 1977; Eckert & Bishop, 1975], and
they project directly to the medulla along with the large monopolar cells (LMC) that are
postsynaptic to the retinula cells 1-6 at the laminar layer. The illumination spectral re-
sponse characteristics are also different, with the retinula cells 7-8 exhibiting a much
stronger UV response (than the cells 1-6), which is possibly used by the fly for naviga-
tional purposes [Fargason & McCann, 1978].
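The scaling relation invoked in this analysis can be verified numerically. The sketch below uses a hypothetical linear filter and a quadratic static nonlinearity (not the measured photoreceptor kernels): for such an L-N cascade, every "cut" of the second-order kernel at a fixed lag is exactly a scaled copy of the first-order kernel.

```python
import numpy as np

# Hypothetical L-N cascade: linear filter k followed by N(v) = c1*v + c2*v^2.
# Its Volterra kernels are k1(m) = c1*k(m) and k2(m1, m2) = c2*k(m1)*k(m2),
# so any "cut" of k2 at fixed m2 is a scaled copy of k1.
M = 50
m = np.arange(M)
k = np.exp(-m / 8.0) * np.sin(m / 4.0)   # assumed filter impulse response
c1, c2 = 1.0, 0.5

k1 = c1 * k
k2 = c2 * np.outer(k, k)

cut = k2[:, 12]                          # cut along m1 at m2 = 12 samples
assert np.allclose(cut, (c2 * k[12] / c1) * k1)   # scaled first-order kernel
```

Deviations from this proportionality in the measured "cuts," as noted above, are therefore direct evidence that the simple L-N structure is incomplete.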
Figure 6.10 Portion of a three-level CSRS stimulus (trace a), the corresponding system response
(trace b), and the second-order model-predicted response (trace c) [Marmarelis & Marmarelis, 1978].

These observations were generally confirmed by the extensive studies of Gilbert Mc-
Cann and his associates in the fly eye. For example, Figure 6.11 shows the first-order and
second-order Wiener kernels of retinula cells 1-6 for two different sizes of spot stimuli
(corresponding to two different power levels of GWN stimulation). The increasing under-
shoot in the first-order kernel and the rising magnitude of nonlinearity (peak of second-or-
der kernel) are evident for increasing stimulus size [McCann, 1974]. Another example of
first-order and second-order Wiener kernels of the LMC response to center/surround
stimulation in the fly Eristalis is given by Andrew James [James, 1992] and shown in Fig-
ure 6.12. The first-order Wiener kernels of the lamina in the fly eye change with stimula-
tion power level in a manner that suggests the presence of nonlinear decompressive feed-
back (i.e., increased undershoot or decreased damping) in accordance with the analysis
presented in Section 4.1.5.
Another interesting application of this approach to an invertebrate photoreceptor was
made by Andrew French and his associates to the locust Locusta migratoria using green
GWN stimuli of 50 Hz bandwidth. Several levels of stimulation were applied and a non-
linear feedback model was developed (using a ratio nonlinearity) for mean stimulus levels
above 2000 photons per second [French et al., 1989]. However, this feedback model was
not able to explain the behavior of the locust photoreceptor at lower levels of stimulation,
although nonlinearities have been detected at such low levels. The results confirmed the
previous qualitative observation of faster response (in terms of shorter latency and broad-
er bandwidth associated with the emergence of an undershoot) and reduced gain sensitiv-
ity with increasing stimulus intensity.
The possibility of feedback from a later stage to an earlier stage of a cascade was first
suggested by Borsellino and Fuortes (1968), following on the classic model of Fuortes
and Hodgkin (1964), and was later elaborated in Limulus photoreceptors by Grzywacz
and Hillman (1988). The cumulative experience indicates the presence of forward and
feedback nonlinearities in the process of phototransduction, intertwined with dynamics
that attain more complicated forms as the range of stimulation increases, and especially as
it extends into the lower intensity levels (i.e., when the model seeks to explain the dynam-
ic behavior from near the dark-adapted state to a high light-adapted state).
We postulate a photoreceptor model with negative decompressive feedback f(y) (see
Section 4.1.5) that is described by the nonlinear differential equation
Figure 6.11 Wiener kernels for retinula cell responses to single-spot stimuli. A: first-order kernels.
B, C: second-order kernels. Curve 1 of A and B is for 0.25° diameter spot at central field axis; dy-
namic range 80 cd/m², average 40 cd/m². Curves 2 and 3 of A and C are for 0.5° diameter spot 1.3°
from central field axis (half-sensitivity point); dynamic range 40 cd/m². Ordinate units for h_1 are
(μV)/[(intensity unit) · s]; for h_2, (μV)/[(intensity unit) · s]² [adapted from McCann, 1974].

Figure 6.12 First-order and second-order Wiener kernels for an Eristalis LMC. The upper panel shows
the first-order kernels for center (solid) and surround (dashed) stimuli in mV/(C ms). The lower panels
show the second-order kernels in units of mV/(C ms)² for the center (haa) and the surround (hbb) stimuli
[adapted from James, 1992].

Ly + f(y) = x    (6.1)

where L is a linear differential operator, L(D) = a_p D^p + ... + a_1 D + a_0. In the steady state,
for a fixed input x_0, we have a fixed output y_0 that is given by the sigmoidal function y_0 =
F(x_0), where F is the inverse of the decompressive function f(y) + a_0 y [because all other
terms in the linear operator L(D) vanish at the steady state]. For instance, if f(y_0) =
tan(A y_0) and a_0 = 0, then F(x_0) = arctan(x_0)/A. This is consistent with the experimentally
observed decreasing gain sensitivity of the photoreceptor with increasing mean level of
stimulation, because the gain sensitivity relates to the derivative of F.
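This steady-state relation can be checked with a short simulation. The sketch below assumes the illustrative case above, f(y) = tan(A y) with a first-order operator L(D) = a_1 D (so a_0 = 0), and integrates the feedback equation by forward Euler; all parameter values are arbitrary.

```python
import math

# Numerical check of the steady-state relation y0 = F(x0) for the feedback
# model L*y + f(y) = x, taking f(y) = tan(A*y) and L(D) = a1*D (a0 = 0),
# for which F(x0) = arctan(x0)/A.
A, a1 = 1.0, 1.0
x0 = 2.0                    # constant input level
dt, T = 1e-3, 30.0          # Euler step and total integration time

y = 0.0
for _ in range(int(T / dt)):
    y += dt * (x0 - math.tan(A * y)) / a1   # a1*dy/dt = x0 - tan(A*y)

assert abs(y - math.atan(x0) / A) < 1e-4    # converges to arctan(x0)/A
```

The local gain of the converged operating point, 1/(1 + x_0²) here, shrinks as x_0 grows, which is the decreasing gain sensitivity noted in the text.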
A perturbation x(t) about a fixed stimulus level x_0 yields

Ly + a_0 y_0 + f(y_0 + y) = x_0 + x    (6.2)

where y(t) is the response to the perturbation around a fixed response level y_0 correspond-
ing to x_0. If we use a Taylor expansion of f about y_0,

f(y_0 + y) = f(y_0) + (∂f/∂y)(y_0) y + (∂²f/∂y²)(y_0) y²/2 + ...    (6.3)

then Equation (6.2) yields the perturbation equation

Ly + Σ_{n=1}^∞ c_n y^n = x    (6.4)

because a_0 y_0 + f(y_0) = x_0, where

c_n = (1/n!) (∂^n f/∂y^n)(y_0)    (6.5)

Therefore, the dynamic response to an input perturbation depends on y_0 through the coef-
ficients {c_n}. This is consistent with the experimental finding that the estimated Volterra
kernels change for different mean levels of stimulation (and response). Specifically, the
first-order Volterra kernel for mean levels x_0 and y_0 is the solution of the linear differen-
tial equation

Ly + c_1(y_0) y = x    (6.6)

Since c_1 is the derivative of f at y_0 (i.e., the slope of the decompressive nonlinearity at y_0),
the anticipated changes in the waveform of the first-order Volterra kernel are greater for
the values of y_0 where the derivative changes faster. This relatively simple nonlinear mod-
el seems to explain the experimental observations to date.
The second-order Volterra kernel for mean levels x_0 and y_0 can be found (see
Section 4.1.5) in the bifrequency domain to be

K_2(ω_1, ω_2) = -c_2(y_0) K_1(ω_1) K_1(ω_2) K_1(ω_1 + ω_2)    (6.7)

where

K_1(ω) = 1/[L(ω) + c_1(y_0)]    (6.8)
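Equations (6.6)-(6.8) can be illustrated numerically. The sketch below assumes a first-order operator, L(ω) = a_0 + jω a_1, and the tangent nonlinearity used earlier; all parameter values are illustrative. Raising the operating point y_0 raises c_1(y_0) = f'(y_0), which lowers the low-frequency gain of K_1 and broadens its bandwidth, the behavior reported for increasing mean stimulus level.

```python
import numpy as np

# K1(w) = 1 / [L(w) + c1(y0)] with the assumed L(w) = a0 + a1*(jw)
# and c1(y0) = f'(y0) for f(y) = tan(A*y).
a0, a1, A = 1.0, 0.01, 1.0

def c1(y0):
    return A / np.cos(A * y0) ** 2          # slope of tan(A*y) at y0

w = 2 * np.pi * np.linspace(0.0, 500.0, 1000)   # rad/s grid

def K1_gain(y0):
    return np.abs(1.0 / (a0 + a1 * 1j * w + c1(y0)))

gain_low = K1_gain(0.2)    # low operating point
gain_high = K1_gain(1.0)   # high operating point

def bw_index(g):           # index of the -3 dB point
    return int(np.argmax(g < g[0] / np.sqrt(2)))

assert gain_high[0] < gain_low[0]           # DC gain drops at higher y0
assert bw_index(gain_high) > bw_index(gain_low)   # bandwidth broadens
```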

When the morphology of the second-order kernel in certain cases changes with mean
stimulation level differently from the manner stipulated by Equation (6.7), we must aug-
ment the model of Equation (6.1) to include the derivative term y' in the feedback nonlin-
earity as f(y, y'). Of course, the precise form of this nonlinearity with respect to y and y'
depends on the observed changes of the kernels and requires elaborate analysis using the
methodology presented in Section 4.1.5. Specifically, if we denote

c_{k,l} = (∂^{k+l} f/∂y^k ∂y'^l)(y_0, 0)    (6.9)

then

K_1(s) = 1/[L(s) + c_{0,1} s + c_{1,0}]    (6.10)

and

K_2(s_1, s_2) = -[c_{2,0} + c_{0,2} s_1 s_2 + c_{1,1}(s_1 + s_2)/2] K_1(s_1) K_1(s_2) K_1(s_1 + s_2)    (6.11)

or in the time domain, with τ_m = min(τ_1, τ_2),

k_2(τ_1, τ_2) = -c_{2,0} ∫_0^{τ_m} k_1(λ) k_1(τ_1 - λ) k_1(τ_2 - λ) dλ
               - c_{0,2} ∫_0^{τ_m} k_1(λ) k_1'(τ_1 - λ) k_1'(τ_2 - λ) dλ
               - (c_{1,1}/2) ∫_0^{τ_m} k_1(λ) [k_1'(τ_1 - λ) k_1(τ_2 - λ) + k_1(τ_1 - λ) k_1'(τ_2 - λ)] dλ    (6.12)
where k_1' denotes the derivative of k_1. It is evident from Equations (6.10) and (6.11) that
the augmented model affords greater flexibility in explaining the experimentally observed
changes in the first-order and second-order Volterra kernels. This model structure can be
validated by checking the analytical relation between the first-order and second-order
Volterra kernels, provided that accurate estimates can be obtained from the data. Note that
the key parameters {c_{k,l}} depend on the mean level of stimulation and on the morphology
of the function f(y, y') at y = y_0 and y' = 0.
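Equation (6.12) translates directly into discrete time. The sketch below constructs k_2 from an assumed first-order kernel and arbitrary coefficients c_{2,0}, c_{0,2}, c_{1,1} (none taken from real data), and confirms the symmetry in (τ_1, τ_2) that any second-order Volterra kernel must satisfy.

```python
import numpy as np

# Discrete-time construction of k2 per Equation (6.12), using an assumed k1.
M = 40
t = np.arange(M)
k1 = np.exp(-t / 6.0) * np.sin(t / 3.0)   # assumed first-order kernel
k1d = np.gradient(k1)                      # discrete derivative k1'
c20, c02, c11 = 0.5, 0.1, 0.2              # arbitrary feedback coefficients

k2 = np.zeros((M, M))
for t1 in range(M):
    for t2 in range(M):
        lam = np.arange(min(t1, t2) + 1)   # integration variable, 0..tau_m
        k2[t1, t2] = -(
            c20 * np.sum(k1[lam] * k1[t1 - lam] * k1[t2 - lam])
            + c02 * np.sum(k1[lam] * k1d[t1 - lam] * k1d[t2 - lam])
            + 0.5 * c11 * np.sum(k1[lam] * (k1d[t1 - lam] * k1[t2 - lam]
                                            + k1[t1 - lam] * k1d[t2 - lam]))
        )

assert np.allclose(k2, k2.T)   # symmetry in (tau1, tau2)
```

In a validation setting, a k_2 built this way from the estimated k_1 would be compared against the directly estimated second-order kernel to test the postulated feedback structure.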
Modeling of the fly eye has been popular because of the relative simplicity and stabili-
ty of the experimental preparation. It will be revisited in Chapter 7 in connection
with two-input modeling because it represents the seminal application of two-input mod-
eling following the Volterra-Wiener approach that was pioneered by Panos Marmarelis in
connection with Gilbert McCann's studies of directional selectivity in the fly eye [Mar-
marelis & McCann, 1973].

6.1.3 Auditory Nerve Fibers


As an initial example from the auditory system, we consider the second-order Wiener
models obtained by Ted Lewis and his associates from high-frequency axons of the basi-
lar papilla in auditory neurons of the American bullfrog (Rana catesbeiana) [Yamada et
al., 1997; Yamada & Lewis, 1999; Lewis et al., 2002a, b]. These auditory units do not
phase-lock to stimuli within an octave of their characteristic frequencies (i.e., their
best excitatory frequencies), which are above 1000 Hz. When presented with a short seg-
ment (100 ms) of band-limited noise that covers one octave above and below their charac-
teristic frequency (CF), the first-order Wiener kernel estimates obtained through cross-
correlation are weak and unreliable. However, the obtained second-order Wiener kernel
estimates are strong and reveal a consistent structure, illustrated in Figure 6.13 for two ex-
perimental preparations. Singular-value decomposition of this second-order Wiener ker-
nel shows that two singular values are dominant and their respective singular vectors ex-
hibit a quadrature relation (i.e., they have the same waveform but shifted by 90°), as
shown in Figure 6.14 for the two experimental preparations, along with the singular vec-
tors for the subsequent six singular values. It is evident from Figure 6.14 that no dis-
cernible information resides in the singular vectors below the fourth one. However, the
third and fourth singular vectors contain some interesting features, albeit weak, that are
also evident in the frequency domain shown in Figure 6.15 (i.e., low gain at CF, sugges-
tive of nonlinear spectrolateral interaction).

Figure 6.13 Second-order Wiener kernels from high-frequency axons of the basilar papilla in the
bullfrog. The stimulus noise was flat from approximately 100 Hz to 5.0 kHz, with an amplitude of
90 dB SPL measured over 95 Hz bandwidth. The kernels were estimated through reverse correla-
tion over the 15 ms immediately preceding each spike (producing a 300 x 300 matrix of values).
The grey scale here shows a 200 x 200 region of the original kernel, focusing on the significant
area. B: The sixteen highest-ranking singular vectors in the singular-value decomposition of the
original (300 x 300) Wiener kernel of panel A. Open circles denote singular vectors that make neg-
ative (inhibitory or suppressive) contributions to the predicted spike rate; crosses denote singular
vectors that make positive (excitatory) contributions. C: Second-order Wiener kernel from another
unit at lower density of 76 dB SPL. D: Reduced kernel of panel C, reconstructed from only the two
highest-ranking singular vectors (the highest-ranking quadrature pair). E: Reduced kernel of panel
C, reconstructed from the six highest-ranking singular vectors. F: The sixteen highest-ranking sin-
gular vectors in the singular-value decomposition of the Wiener kernel of panel C [Yamada & Lewis,
1999].

We observe that the phasic relation of the first
pair of singular vectors relative to the second pair of singular vectors (which can be
viewed as the four PDMs of this system) is such that the second pair provides a form of
spectral "lateral inhibition" (akin to the spatial lateral inhibition long studied in the visual
system), suitable for enhancing the ability of the system to perceive and distinguish single
tones in the presence of ambient noise or interference (a form of "spectral contrast en-
hancement").

Figure 6.14 Singular vectors for the two Wiener kernels in Figure 6.13 (A and C). The singular vec-
tors were computed for the entire 300 x 300 matrix of each kernel, and each was scaled by its asso-
ciated weighting factor. The 10 ms segments shown here correspond to the time frames shown in
Figure 6.13. Each of these segments is a 200-element vector. Panels A through D are the segments
for the Wiener kernel in Figure 6.13A; panels E through H are those for the Wiener kernel in Figure
6.13C. A, E: First-ranking singular vector (dark line) and second-ranking singular vector (grey line).
B, F: Third-ranking singular vector (dark line) and fourth-ranking singular vector (grey line). C, G:
Fifth-ranking singular vector (dark line) and sixth-ranking singular vector (grey line). D, H: Seventh-
ranking singular vector (dark line) and eighth-ranking singular vector (grey line) [Yamada & Lewis,
1999].
The quadrature relation between the two highest singular vectors suggests an envelope
detection mechanism following a band-pass filtering operation centered at CF. The Lewis
model is consistent with the van Dijk sandwich (L-N-M cascade) model of the same sys-
tem [van Dijk et al., 1994, 1997a, b], which conforms with the prevailing biophysical
view of the function of an auditory unit with a single hair cell; i.e., the first (band-pass)
filter represents the mechanical tuning structure peripheral to the hair cell, the static (rec-
tifying) nonlinearity represents the hair-cell transduction process, and the second (low-
pass) filter represents signal transformations through the afferent synapse and the primary
afferent axon to the spike trigger.

Figure 6.15 Discrete Fourier transforms of the complete (300-element) versions of the correspond-
ing singular vectors in Figure 6.14 [Yamada & Lewis, 1999].

Note that the singular-value decomposition of an am-
phibian papilla unit yielded only one significant singular value, which is a scaled version
of the first-order Wiener kernel [Lewis et al., 2002a], as shown in Figure 6.16. This im-
significant singular values for the second-order Wiener kernel of a midfrequency cochlear
unit of a Mongolian gerbil , corresponding to a quadrature pair of singular vectors, where
the strongest singular vector scales to the obtained first-order Wiener kernel of this sys-
tem [Lewis et al., 2002b]. The latter can be viewed as the "excitatory pathway" and the
other singular vector (lagging by 90%) can be viewed as the "inhibitory pathway" [Lewis
et al., 2002b].
Figure 6.16 Second-order kernel of an amphibian papilla unit (top left) and its only significant singular
vector (bottom left), which is identical to its first-order kernel, shown also in the frequency domain
(bottom right). The top right panel shows the second-order kernel reconstruction based on the singular
vector [Lewis et al., 2002a].

The observed quadrature relation between the singular vectors of nonphase-locking
units is viewed as a confirmation of the anticipated fact that envelope detection is the
critical operation for these units with high CF, since this is the only practical way to en-
code phase information of the stimulus for these high-frequency units. For units with
lower CF, the phase-locking characteristics are also encoded by the first-order kernel.
We must point out that these observations do not incorporate the effects of nonlinear
feedback in the cochlea, as expressed in the changes observed in first-order Wiener kernel
estimates for varying stimulus power, reported by Moller (1978) and, among others, by
de Boer and Nuttall (1997) as well as Recio et al. (1997) for basilar-membrane response
to noise and broadband stimuli, consistent with our analysis in Section 4.1.5, and shown
in Figures 6.17 and 6.18.
It is important to note that the two dominant singular vectors observed in auditory units
can be viewed as the "principal dynamic modes" of this system, and their quadrature rela-
tion depicts the critical role of the second-order nonlinearities in maintaining the phase
sensitivity of high-frequency sensors. Note, however, that the conventional "tuning curve"
cannot be equated with the minus log of the FFT magnitude of the first-order kernel, since
higher-order kernels also contribute to the tuning curve.
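The quadrature-pair structure and its envelope-detection role can be illustrated on a synthetic second-order kernel (not the bullfrog data). The sketch below builds a kernel from a damped cosine/sine pair at an assumed CF and verifies that its singular-value decomposition returns exactly two dominant singular values; all waveforms and parameters are made up for illustration.

```python
import numpy as np

# Synthetic second-order kernel: sum of two rank-one terms formed from a
# quadrature pair of damped oscillations at the characteristic frequency.
M, fs, cf = 300, 20000.0, 1500.0        # samples, sampling rate (Hz), CF (Hz)
t = np.arange(M) / fs
env = np.exp(-t / 2e-3)                 # assumed 2 ms decay envelope
u1 = env * np.cos(2 * np.pi * cf * t)   # first mode
u2 = env * np.sin(2 * np.pi * cf * t)   # quadrature (90-degree shifted) mode
u1 /= np.linalg.norm(u1)
u2 /= np.linalg.norm(u2)

k2 = 1.0 * np.outer(u1, u1) + 0.9 * np.outer(u2, u2)
s = np.linalg.svd(k2, compute_uv=False)  # singular values, descending

assert s[0] > 0.5 and s[1] > 0.5         # two dominant singular values
assert s[2] < 1e-8 * s[0]                # the rest are numerically zero
```

The squared projections of a stimulus onto such a pair, (u1·x)² + (u2·x)², are insensitive to the carrier phase at CF, which is the envelope-detection property discussed above.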
We should also note the pioneering efforts of Aage Moller, who obtained the first-or-
der Wiener kernel estimates for various auditory neurons with different characteristic fre-
quencies [Moller, 1973, 1975, 1976, 1977, 1978, 1983] and demonstrated the effect of in-
creasing power of stimulation (downshift in resonant frequency and broadening of
the bandwidth), consistent with Figure 6.17 and our analysis of negative compressive
feedback (see Section 4.1.5), as well as the extensive work of Peter Johannesma and his
associates Aertsen and Eggermont on spectrotemporal analysis (see Section 7.4) of the
auditory system [Aertsen & Johannesma, 1981; Eggermont et al., 1983].

Figure 6.17 Top panels show the input-output cross-correlation function (ccf) at a low level of
stimulation (attenuation 80 dB). Left panel (a): waveform of ccf, shown on an arbitrary amplitude
scale. Right panel (b): spectrum of ccf, shown on a 50 dB scale and normalized with its peak at 0 dB.
Bottom panels show the input-output cross-correlation function at a 60 dB higher level of stimulation
(attenuation 20 dB), measured in the same animal [de Boer & Nuttall, 1997].
As we move to the modeling study of a spider mechanoreceptor in the following sec-
tion, we should be reminded that the hair cells in the cochlea are also mechanoreceptors,
allowing the comparison between vertebrate and invertebrate mechanoreceptors.

6.1.4 Spider Mechanoreceptor


Transduction and detection of mechanical stimuli takes place in living systems for both
somatosensory and regulatory purposes. It is performed by mechanoreceptors that operate
with sufficient gain over a broad range of stimuli in order to detect small stimuli with ad-
equate sensitivity and also accommodate naturally occurring stimuli of high intensity.

Figure 6.18 First-order Wiener kernels (left column) obtained for different noise levels and their
magnitude and phase spectra (right column). Observe the broadening of the resonant peak and the
downshift of CF as the stimulation increases [Recio et al., 1997].

In addition to adapting their gains to the prevailing stimulus conditions, mechanoreceptors
must have sufficiently rapid response time for the intended task. The detailed dynamic be-
havior of mechanoreceptors, and the specific molecular mechanisms involved, are not yet
fully understood, mainly because of the small physical size of most mechanoreceptors
and their intrinsic dynamic nonlinearities.
As an illustrative example, we consider a spider mechanoreceptor (the cuticular recep-
tor in the slit-sense organ of the spider Cupiennius salei) tested with pseudorandom quasi-
white mechanical strain stimuli caused by forced displacement of the slit, and recording
three distinct outputs: (1) the induced intracellular current (under voltage-clamped condi-
tions), (2) the transmembrane potential (under current-clamped and spike-suppressing
conditions), and (3) the extracellular action potentials. In all cases, the experimental
preparation proved to be very stable and the obtained Volterra kernel estimates using the
Laguerre expansion technique (LET) were consistent from preparation to preparation
[Marmarelis et al., 1999a].
Details of the experimental preparation are given in Marmarelis et al. (1999a). Here,
we simply point out that the applied Gaussian quasiwhite mechanical strain stimulus has a
bandwidth of approximately 400 Hz and the input-output data were sampled at 1 ms. The
responses to 10 repetitions of the same quasiwhite stimulus segment were averaged to im-
prove the SNR of the output data (in the first two cases of continuous output data, but not
in the case of output action potentials). Sufficient time was allowed between recordings to
avoid transient effects, since a nonzero mean level of the stimulus was used. Experiments

were performed for two different mean levels (with the same bandwidth and power level
of pseudorandom stimulation in both cases). Although long datasets can be obtained due
to the stability of the preparation, only 4,000 datapoints were used in this analysis for
each kernel estimation via LET.
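The economy of LET comes from expanding each kernel on a small orthonormal basis of discrete Laguerre functions, so that a kernel spanning hundreds of lags is captured by a handful of expansion coefficients. The sketch below generates such a basis from the standard closed-form expression and checks its orthonormality; the decay parameter alpha here is illustrative, not the value used in the mechanoreceptor analysis.

```python
import numpy as np
from math import comb, sqrt

# Discrete Laguerre functions b_j(m), standard closed form:
# b_j(m) = alpha^((m-j)/2) * sqrt(1-alpha)
#          * sum_k (-1)^k C(m,k) C(j,k) alpha^(j-k) (1-alpha)^k
def laguerre_basis(alpha, n_funcs, mem):
    B = np.zeros((mem, n_funcs))
    for j in range(n_funcs):
        for m in range(mem):
            s = sum((-1) ** k * comb(m, k) * comb(j, k)
                    * alpha ** (j - k) * (1 - alpha) ** k
                    for k in range(j + 1))
            B[m, j] = alpha ** ((m - j) / 2.0) * sqrt(1 - alpha) * s
    return B

B = laguerre_basis(alpha=0.2, n_funcs=5, mem=200)

# The basis is orthonormal over the lag axis, so a first-order kernel with
# 200 lags is represented by only 5 coefficients: k1(m) = sum_j c_j b_j(m).
assert np.allclose(B.T @ B, np.eye(5), atol=1e-8)
```

In practice the coefficients c_j are obtained by least squares on the input-output data after convolving the input with each basis function, which is why a few thousand datapoints suffice.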
The first-order and second-order Volterra kernel estimates for the intracellular current
response are shown in Figures 6.19 and 6.20 for the two mean stimulus levels. Indicative-
ly, we note that the normalized mean-square errors (NMSE) of the model prediction are
about 34% for the first-order and about 8% for the second-order model, when evaluated
for a segment of data not used for the estimation of the kernels (out-of-sample prediction).
The results demonstrate the inadequacy of the first-order (linear) model and the good pre-
diction obtained by the second-order model.

Figure 6.19 The obtained first-order kernel estimates of the mechanoreceptor using LET on the in-
tracellular current response data (under voltage-clamped conditions) in the time (top) and frequency
(bottom) domains, for two different mean displacement stimulus levels (solid: 29.02; dashed: 31.90
μm). The peak kernel value for the higher mean displacement level is about 40% larger, but the
waveforms of the two kernels are similar, exhibiting mildly low-pass characteristics. The ordinate axis
units of the first-order kernel are nA/(μm ms) [Marmarelis et al., 1999a].

Figure 6.20 The obtained second-order kernel estimates of the mechanoreceptor in the time domain
for the two data sets discussed in Figure 6.19. The peak-to-peak kernel value for the higher mean dis-
placement (bottom panel) is about 25% larger, but the waveforms of the two kernels are similar. The
ordinate axis units for the second-order kernel are nA/(μm ms)² [Marmarelis et al., 1999a].
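The out-of-sample NMSE comparison can be emulated on synthetic data. In the sketch below, all kernels and signals are made up for illustration: a system with a genuine second-order component is predicted poorly by its first-order model alone, and the NMSE is evaluated on a data segment not used to define the model.

```python
import numpy as np

# Synthetic Volterra system: y = k1-term + k2-term, with made-up kernels.
rng = np.random.default_rng(0)
M = 30
k1 = np.exp(-np.arange(M) / 5.0)
k2 = 0.4 * np.outer(k1, k1)

x = rng.standard_normal(6000)
X = np.stack([np.roll(x, i) for i in range(M)])  # lagged inputs (circular)
v = k1 @ X                                       # first-order output term
q = np.einsum("it,ij,jt->t", X, k2, X)           # second-order output term
y = v + q

test = slice(4000, 6000)                         # "out-of-sample" segment
def nmse(pred):
    return np.sum((y[test] - pred[test]) ** 2) / np.sum(y[test] ** 2)

assert nmse(v) > nmse(v + q)   # second-order model predicts far better
```

With noisy data and estimated (rather than exact) kernels, both NMSE values would be nonzero, as in the 34% versus 8% figures reported above.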
Next, we compute the PDMs (principal dynamic modes) of this system using the pre-
sented first-order and second-order kernel estimates. Only two significant PDMs are
found in this system and they are shown in Figure 6.21 for the aforementioned two
6. 1 NEUROSENSORY SYSTEMS 311

1.125

0.931S

017S

0.5625

0.375

0.1875
... ,
0 ~~._~ .-2

-0.1875

-0.375

-0.5625

-0.75
0 10 20 JO 40 50
rlME LAG (USEC]

1.125

0.9375

0.75

0.5625 lf:, I

0.375 ~J :
I

0.1875 :c: I
I

0 ~'-}-"'i-'
, ,
-0.1875 ~
I
I ,,
,,
I,

-0.375 .:, 'I


"I,

-0.5625
:'
-0.75
0 10 20 30 40 50
TIME LAG (USECJ

(a)

Figure 6.21 (a) The two principal dynamic modes (PDMs) of the mechanoreceptor in the time domain, using the intracellular current kernels of Figures 6.19 and 6.20. The PDMs for the two mean displacement levels (low: top; high: bottom) are rather similar. The waveform of the first PDM (solid) is similar to the first-order kernel (with reverse polarity), exhibiting mild low-pass characteristics, whereas the second PDM (dashed) exhibits strong high-pass characteristics, as illustrated in (b), where the FFT magnitudes of the two PDMs are shown (see next page). The high-pass characteristic of the second PDM is evident but it resides entirely in the second-order kernel. The corresponding eigenvalues are both negative and indicate that the contribution of the first PDM to the current response is much larger than the contribution of the second PDM. The ordinate axis units for the PDMs are nA/(μm ms) [Marmarelis et al., 1999a].

[Two panels of PDM FFT magnitudes versus frequency (0–0.5 kHz) for the two recordings.]

(b)

recordings of low and high mean displacement level of stimulation. The consistency in
the form of these PDMs for different recordings is remarkable. The corresponding eigenvalues for the first (λ1) and second (λ2) PDMs are λ1 = -0.0317 and λ2 = -0.0046, and λ1 = -0.0632 and λ2 = -0.0078 for the two recordings, quantifying the relative contributions
of the two PDMs to the system output power. The FFT magnitudes in Figure 6.21(b)
demonstrate the high-pass characteristics of the second PDM (evident in the second-order
kernel), and the mildly low-pass characteristics of the first PDM that resembles the first-order kernel. This result also demonstrates that the high-pass characteristics of the
mechanoreceptor reside in the second-order kernel and are strictly nonlinear. We must
note, however, that the observed high-pass characteristic is due in part to the artificial numerical representation of an action potential with an impulse function (instead of its actu-

[3-D surface plot of the static nonlinearity over (u1, u2); u1 ranges from -3 to 3 and u2 from -2.5 to 2.5.]

Figure 6.22 The static nonlinearity associated with the two PDMs shown in Figure 6.21 for low mean displacement. The ramp-threshold characteristic with respect to u1 is altered when u2 becomes negative [Marmarelis et al., 1999a].

al waveform). When the actual waveform of the action potential is taken into account, the
high-pass PDM is attenuated to some extent at the highest frequencies.
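The PDM computation referred to above amounts to an eigendecomposition of a matrix assembled from the kernel estimates, keeping the eigenvectors with the largest eigenvalue magnitudes. The bordered-matrix construction below is one common variant and should be read as an illustrative sketch rather than the exact procedure used here:

```python
import numpy as np

def compute_pdms(k1, k2, n_modes=2):
    """Eigendecompose a symmetric matrix built from the first- and
    second-order kernel estimates; keep the dominant eigenvectors
    (largest |eigenvalue|) as the principal dynamic modes."""
    k1 = np.asarray(k1, dtype=float)
    k2 = np.asarray(k2, dtype=float)
    M = len(k1)
    Q = np.zeros((M + 1, M + 1))
    Q[0, 1:] = Q[1:, 0] = k1 / 2.0       # border carries the first-order kernel
    Q[1:, 1:] = (k2 + k2.T) / 2.0        # symmetrized second-order kernel
    lam, vec = np.linalg.eigh(Q)
    idx = np.argsort(-np.abs(lam))[:n_modes]
    # drop the border element of each eigenvector; columns are the PDMs
    return lam[idx], vec[1:, idx]
```

The signs of the eigenvalues carry meaning (here both are negative, which makes the associated nonlinear surface concave), so the selection sorts on absolute value only.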
The nonlinearity associated with these two PDMs is shown in Figure 6.22 for the
low-mean displacement level. It is evident from the plot that the nonlinear dependence
of the system output (intracellular current) upon the first PDM output, u1, follows a
ramp-threshold characteristic whose critical threshold value remains fairly close to zero
for positive values of the second PDM output, u2, but rapidly decreases for negative values of u2. This nonlinear surface is concave (because of the negative eigenvalues), reflecting the fact that the differential change of intracellular current in response to an increase of displacement forcing is negative. Note also that the slope of the
ramp-threshold curve with respect to u1 decreases with decreasing u2, especially for negative values of u2. The form of this nonlinearity indicates directionally selective behavior of this mechanoreceptor, since the response characteristics with respect to u1 (magnitude of displacement) are distinctly different for negative and positive values of u2
(velocity of displacement). Similar observations apply to the nonlinearity for the high-mean displacement level [Marmarelis et al., 1999a].
This interesting nonlinearity and the associated two PDMs constitute a complete non-
linear model for the system defined by the mechanical displacement input and the intra-
cellular current output (under voltage-clamped conditions).
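The structure of this complete model (two PDM filters feeding a two-input static nonlinearity) is easy to state in code. The sketch below is generic; the ramp_threshold surface is a hypothetical stand-in for the estimated surface of Figure 6.22, shaped only qualitatively like the behavior described above (threshold near zero for u2 > 0, dropping for u2 < 0, with a negative suprathreshold slope that becomes shallower as u2 decreases):

```python
import numpy as np

def two_pdm_model(x, pdm1, pdm2, f):
    """Two-PDM model: filter the displacement input through each PDM
    to obtain u1(n), u2(n), then apply the static nonlinearity f."""
    u1 = np.convolve(x, pdm1)[:len(x)]
    u2 = np.convolve(x, pdm2)[:len(x)]
    return f(u1, u2)

def ramp_threshold(u1, u2):
    """Hypothetical illustrative surface, not the fitted one."""
    thr = np.where(u2 >= 0.0, 0.0, u2)       # threshold drops when u2 < 0
    slope = -(1.0 + 0.5 * np.tanh(u2))       # negative, shallower for u2 < 0
    return np.where(u1 > thr, slope * (u1 - thr), 0.0)
```

In practice the surface f would be estimated from the data (as in Figure 6.22) rather than specified in closed form.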
The results for the transmembrane potential data collected under current-clamped conditions (and spike suppression with TTX) are also very consistent in form across recordings. Two representative first-order kernels obtained for the same stimuli used in the two
previously presented current recordings are shown in Figure 6.23 in the time and frequen-

cy domains. These first-order kernels look like low-pass filtered versions of their intracellular current counterparts of Figure 6.19 (due to the transmembrane capacitance), but their
size relation is more exaggerated in favor of the higher mean displacement level case (i.e.,
the peak value of the corresponding kernel is about double, compared to about 40% larger
for the current output case). The corresponding second-order kernels are shown in the
time domain in Figure 6.24. The consistency in the form of these kernels is again evident,
and the size relation is similar to their first-order kernel counterparts of Figure 6.23. The
first-order and second-order model predictions are shown in Figure 6.25 along with the
actual output for a segment of data not used for the estimation of the kernels (out-of-sample prediction). The significant contribution of the second-order kernel is again evident
(NMSE for second-order prediction is 25.9% versus 60.2% for the first-order prediction),
demonstrating the inadequacy of the linear (first-order) model and the presence of significant second-order nonlinearities.
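The second-order prediction discussed here is the discrete Volterra expansion truncated at second order. A straightforward (unoptimized) sketch of the predictor, assuming kernel estimates k0, k1(m), and k2(m1, m2) with memory M are already in hand:

```python
import numpy as np

def volterra2_predict(x, k0, k1, k2):
    """y(n) = k0 + sum_m k1(m) x(n-m)
                 + sum_{m1,m2} k2(m1,m2) x(n-m1) x(n-m2)."""
    x = np.asarray(x, dtype=float)
    k1 = np.asarray(k1, dtype=float)
    k2 = np.asarray(k2, dtype=float)
    M = len(k1)
    y = np.full(len(x), float(k0))
    for n in range(len(x)):
        past = x[max(0, n - M + 1):n + 1][::-1]   # x(n), x(n-1), ..., x(n-M+1)
        m = len(past)
        y[n] += k1[:m] @ past                      # first-order (linear) term
        y[n] += past @ k2[:m, :m] @ past           # second-order term
    return y
```

Dropping the last accumulation line gives the first-order (linear) prediction, so the NMSE comparison above amounts to running this predictor twice.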
Computation of the PDMs from these kernel estimates again yields only two significant PDMs for the case of output potential, shown in Figure 6.26 for the same two
recordings. We observe that the first PDMs are now more low-pass than their current-data counterparts of Figure 6.21, resembling the first-order kernel waveforms, whereas
the second PDMs remain distinctly high-pass and notably similar in waveform (although
of reverse polarity) to the second PDMs of the previously analyzed current data. The
corresponding eigenvalues for the first (λ1) and second (λ2) PDMs are λ1 = 2.877 and
λ2 = 0.145, and λ1 = 2.649 and λ2 = 0.247 for the two mean levels of stimulation, indicating the relative contributions of the two PDMs to the system output power (the first
is about one order of magnitude larger). These PDMs are shown in the frequency domain (FFT magnitude) in Figure 6.26(b). Clearly, the first PDM dominates for low frequencies up to about 160 Hz, and the second PDM is dominant above that frequency.
Thus, for the transmembrane potential data, the two PDMs seem to divide the operational bandwidth at about 160 Hz. It should be noted again that part of the high-pass
characteristic is due to the artificial representation of an action potential with an impulse
function.
The nonlinearities associated with these two PDMs for the two mean displacement levels are shown in Figure 6.27 and exhibit convex morphology, reflecting the fact that the
differential change of transmembrane potential in response to an increase of displacement
forcing is positive (this is also the reason for the positive eigenvalues). The form of the
nonlinear surface changes slightly with mean displacement level (in addition to the obvious and anticipated change in elevation), and exhibits less asymmetry with respect to the
output of the second PDM than in the previous case of current output.
These results demonstrate the efficacy and the utility of PDM modeling and analysis
(based on Volterra models) in enhancing our understanding of the nonlinear dynamic behavior of the mechanoreceptor system.
The form of the nonlinearity in the current data resembles threshold behavior akin to
half-wave rectification (with a negative slope) for positive values of the second PDM (u2 > 0); i.e., a positive displacement u1 (relative to the mean) elicits negative current for positive displacement velocity u2, but for negative displacement velocity the aforementioned
displacement threshold is reduced and the slope of the suprathreshold response is also reduced (i.e., negative current will continue flowing for negative velocity, even when the
displacement becomes negative, but the sensitivity will be lower). This response behavior
is both position sensitive and velocity sensitive.
For the potential data, the nonlinearity again exhibits rectifying behavior, having

[Top panel: first-order kernel estimates for potential data versus time lag (0–50 ms). Bottom panel: FFT magnitudes of the first-order kernels for potential data versus frequency (0–0.5 kHz).]

Figure 6.23 The obtained first-order kernel estimates using LET on the intracellular potential response data (under current-clamped conditions) in the time (top) and frequency (bottom) domains, for the same stimuli as in Figure 6.19. These first-order kernels look like low-pass versions of their intracellular current counterparts in Figure 6.19 (due to the transmembrane capacitance), although the size relation is more exaggerated in favor of the higher mean-displacement-level case. The ordinate axis units for the first-order kernel are mV/(μm ms) [Marmarelis et al., 1999a].

supralinear response characteristics for u1 > 0 (i.e., changing faster than linear) and flattened response for u1 < 0. The effect of u2 is slightly asymmetric and causes increased response potential for larger displacement speed, also in supralinear fashion. Thus, this response behavior is also position sensitive and velocity sensitive, but not very directionally
selective. Note that these experiments were made under clamped conditions and with
TTX-suppressed fast sodium channels, which may account for some of the differences in

[Two 3-D surface plots of the second-order kernel k2 over lags τ1, τ2 = 0–50 ms.]

Figure 6.24 The obtained second-order kernel estimates in the time domain for the two data sets discussed in Figure 6.23. The waveforms are similar for the two mean displacement levels, but the size for the higher mean displacement level (bottom panel) is about double, in rough correspondence to their first-order kernel counterparts. The ordinate axis units for the second-order kernel are mV/(μm ms)². The τ1 and τ2 axes are in ms units (10 ms bar shown) [Marmarelis et al., 1999a].

the nonlinear behavior evident in the current and potential response data. Finally, we
should note that, although the form of the nonlinearity for u1 > 0 appears initially to be
supralinear, it gradually becomes linear (possibly an inflection point) and tends to become
sublinear as u1 increases further and reaches the end of the dynamic range (i.e., sigmoidal
overall shape).
The presented PDM model is more compact than its Volterra counterpart (e.g., for second-order models the numbers of free parameters are 108 and 1378, respectively). However, the Volterra model includes the dynamics represented by the less significant eigenvalues/eigenvectors that are omitted from the PDM model. In this application, the

[Traces of the test data and model predictions versus time, 100–600 ms.]

Figure 6.25 A segment of transmembrane potential test data under current-clamped conditions
(trace 1) and the Volterra model predictions of first order (trace 2) and second order (trace 3). The sig-
nificant contribution of the second-order kernel to the response potential is evident. The normalized
mean-square errors are 60.2% for the first-order, and 25.9% for the second-order model prediction.
Note that action potentials are suppressed by use of TTX [Marmarelis et al., 1999a].

difference in prediction mean-square error was marginal, lending support to the notion of
a "minimal model" based on PDM analysis. Furthermore, the PDM model can be extended to nonlinear orders higher than second (even though limited to the selected PDMs in
terms of dynamics), whereas the Volterra models cannot be practically extended into
higher-order nonlinearities because of the computational burden associated with the rapid
increase in the number of free parameters.
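The two parameter counts quoted above are consistent with a 51-point kernel memory (lags 0 through 50) and a two-variable quadratic nonlinearity for the PDM model; both assumptions are our reading rather than an itemization given in the text. A quick arithmetic check:

```python
# second-order Volterra model: k0, k1, and a symmetric k2 over M lags
M = 51                                      # assumed memory (lags 0..50)
volterra_params = 1 + M + M * (M + 1) // 2  # 1 + 51 + 1326
assert volterra_params == 1378

# two-PDM model: two M-point filters plus a quadratic in (u1, u2)
# (1 constant + 2 linear + 3 quadratic coefficients)
pdm_params = 2 * M + (1 + 2 + 3)
assert pdm_params == 108
```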
The nonlinear dynamic behavior observed in this analysis agrees well with experiments using step displacements, where positive steps (indenting the slits) caused significant inward currents, while negative steps caused much smaller reductions in inward current. This asymmetric nonlinear behavior was more pronounced in the initial dynamic
responses to steps than in the late responses near the end of the step stimulus, as reflected
in the obtained model by the high-pass properties of the second PDM that is primarily responsible for the nonlinear transient behavior.
The physiological system responsible for the receptor current consists of the slit cuticle
between the stimulator and the dendritic sheath surrounding the neuron tip (a small, presumably fluid-filled, region between the dendritic sheath and the neuronal membrane) and
the mechanically activated ion channels in the neuronal membrane. The two important
questions in interpreting the obtained nonlinear dynamic model are: (1) what biophysical
mechanisms could correspond to the two PDMs, and (2) what is the biological basis of the
nonlinearity? Although neither question can be answered with certainty at present, it appears that the two distinct PDMs, by exhibiting low-pass and high-pass characteristics, respectively, may correspond to two types of mechanically activated ion channels in the
neuronal membrane that have fast (sodium) and slow (potassium) dynamics. The latter in-

[Two panels of PDM waveforms versus time lag (0–50 ms) for the two mean displacement levels.]

(a)

Figure 6.26 (a) The two PDMs in the time domain, using the transmembrane potential kernels of Figures 6.23 and 6.24. Again, the waveforms are similar for the two mean displacement levels (top: high level; bottom: low level), and the first PDMs (solid) resemble in waveform the first-order kernels. The second PDMs (dashed) are similar in waveform to their counterparts for the intracellular current data (with reverse polarity). The corresponding eigenvalues are both positive and indicate that the relative power contribution of the first PDM is about one order of magnitude larger. (b) (See next page.) The two PDMs for the transmembrane potential data in the frequency domain (i.e., FFT magnitude of the PDMs). As previously, the high-pass characteristic of the second PDM is evident. The two PDMs appear to divide the frequency response bandwidth, whereby the first PDM is dominant below 160 Hz and the second PDM is dominant above that frequency [Marmarelis et al., 1999a].

[Two panels of PDM FFT magnitudes versus frequency (0–0.5 kHz).]

(b)

Figure 6.26 (continued)

cludes the possibility of a calcium-activated potassium channel. Experiments that eliminate selectively the permeant ions can be elucidating in this regard. Another factor possibly inducing nonlinear behavior is the fluid between the dendritic sheath and the neuronal
membrane, which could conceivably cause nonlinear dashpot action. It is also likely that
the nonlinear dynamics measured here reflect the connection of the deformation of the
neuronal membrane to the molecular structures of the mechanically activated ion channels and their linkages to the cytoskeleton. Detailed models of mechanically activated
channels are starting to emerge and can benefit from the quantitative nonlinear dynamic
descriptions of mechanotransduction provided herein [French, 1984a, b; Sachs, 1992].

[Two 3-D surface plots of the static nonlinearity over (u1, u2); u1 ranges from -6 to 6 and u2 from -3 to 3.]

Figure 6.27 The static nonlinearities associated with the two PDMs of Figure 6.26 for high (bottom) and low (top) mean displacement levels. The axes (u1, u2) represent the two PDM outputs, and the vertical axis is the transmembrane potential response under current-clamped conditions and suppression of action potentials with TTX. The axes ranges are given at the bottom of each plot (1 mV bar shown). The convex nonlinear characteristic is evident, as well as the slight asymmetry with respect to u2 [Marmarelis et al., 1999a].

6.2 CARDIOVASCULAR SYSTEM

As an illustrative example from the cardiovascular system, we select the modeling study
of the nonlinear dynamics of cerebral blood flow autoregulation. The traditional concept
of cerebral autoregulation refers to the ability of the cerebrovascular bed to maintain a relatively constant cerebral blood flow despite changes in cerebral perfusion pressure. Because of the high aerobic metabolic rate of cerebral tissue, the maintenance of adequate
cerebral blood flow through cerebral autoregulation is critical for survival. Under normal
conditions, it has been observed that a sudden drop in the arterial blood pressure level
causes an initial drop in the level of cerebral blood flow that gradually returns to its previ-

ous value within a couple of minutes, due to multiple homeostatic regulatory mechanisms
that control cerebrovascular impedance over several time scales (from a few seconds to a
couple of minutes) [Edvinsson & Krause, 2002; Panerai et al., 1999, 2000; Poulin et al.,
1996, 1998; Zhang et al., 1998, 2000].
With the development of transcranial Doppler (TCD) ultrasonography for the noninvasive measurement of cerebral blood flow velocity in the middle cerebral artery with high
temporal resolution, it has been shown that blood flow velocity can vary in response to
variations of systemic arterial blood pressure over various time scales. We consider data
representing the mean arterial blood pressure (MABP) and mean cerebral blood flow velocity (MCBFV), computed as averages over each heartbeat interval (marked by the R-R
peaks in the ECG), and resampled evenly every second after proper low-pass filtering to
avoid aliasing [Mitsis et al., 2002; Zhang et al., 1998].
Spontaneous fluctuations in beat-to-beat MABP and MCBFV data possess broadband
spectral properties that offer the opportunity to study dynamic cerebral autoregulation
in humans, using the advocated nonlinear modeling methods. Impulse-response and
transfer-function analysis were initially utilized to show that cerebral autoregulation is
more effective in the low-frequency range (below 0.1 Hz), where most of the ABP spec-
tral power resides. These studies have also indicated the presence of significant nonlinear-
ities in this low-frequency range as attested to by low coherence function measurements.
The nonlinear dynamic relationship between beat-to-beat changes in MABP and
MCBFV reflects the combined effects of multiple mechanisms serving cerebral autoregu-
lation. Since the vasculature is influenced by metabolic, endocrine, myogenic, endothe-
lial, respiratory, and neural mechanisms, the dynamics of cerebral autoregulation are ac-
tive over widely different frequency bands. Specifically, metabolic or endocrine
mechanisms are active at very low frequencies and respiratory or myogenic mechanisms
are active at high frequencies, whereas endothelial and autonomic neural mechanisms are
found in the intermediate frequency bands.
For this reason, it is incumbent on the employed modeling methodology to be able to
capture reliably both fast and slow dynamics in a single processing task. To this purpose,
we employ the nonlinear modeling method presented in Section 4.3 that utilizes the La-
guerre-Volterra network with two filter banks (LVN-2) to model nonlinear systems with
fast and slow dynamics effectively (from 0.005 to 0.5 Hz in this case).
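The Laguerre filter banks at the heart of the LVN can be computed with a simple recursion commonly used for discrete Laguerre functions; the parameter α in (0, 1) controls how slowly the basis decays, so a bank with small α captures fast dynamics and one with α near 1 captures slow dynamics. A sketch (one recursion in common use, shown here as an illustration rather than the exact LVN-2 implementation):

```python
import numpy as np

def laguerre_bank_outputs(x, alpha, L):
    """Convolve input x with the first L discrete Laguerre functions
    via the standard recursion; returns an (L, len(x)) array."""
    x = np.asarray(x, dtype=float)
    sa = np.sqrt(alpha)
    v = np.zeros((L, len(x)))
    prev = np.zeros(L)                         # v_j(n-1)
    for n in range(len(x)):
        cur = np.empty(L)
        cur[0] = sa * prev[0] + np.sqrt(1.0 - alpha) * x[n]
        for j in range(1, L):
            cur[j] = sa * prev[j] + sa * cur[j - 1] - prev[j - 1]
        v[:, n] = cur
        prev = cur
    return v
```

Feeding an impulse through the bank recovers the basis functions themselves; their unit energy and mutual orthogonality are useful sanity checks.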
Arterial blood pressure was measured in the finger by photoplethysmography (Fi-
napres, Ohmeda). Cerebral blood flow velocity was measured in the middle cerebral
artery using TCD (2 MHz Doppler probe, DWL Electronische Systeme) placed over the
temporal window and fixed at a constant angle and position with adjustable head gear to
obtain optimal signals. End-tidal CO2 was also monitored continuously with a nasal can-
nula using a mass spectrometer (MGA 1100, Marquette Electronics). The analog signals
of blood pressure and flow velocity were sampled simultaneously at 100 Hz and were
digitized at 12 bits (Multi-Drop X2, DWL). Beat-to-beat mean values of MABP and
MCBFV were calculated by integrating the waveform of the sampled signals within each
cardiac cycle (R-R interval) and dividing by this interval. The beat-to-beat values were
then linearly interpolated and resampled at 1 Hz (after antialiasing low-pass filtering) to
obtain equally spaced time-series of MABP (input) and MCBFV (output) data for the
subsequent analysis (see Figure 6.28). After high-pass filtering at 0.005 Hz to remove
very slow trends in the data, 6 min input-output data segments (which correspond to 360
data points) are employed in the LVN-2 training procedure (see Section 4.3).
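The preprocessing just described (beat-to-beat averaging followed by interpolation to a 1 Hz grid) can be sketched as follows; the anti-alias low-pass filtering applied in the study before resampling is omitted here for brevity:

```python
import numpy as np

def beat_means(t, sig, r_peaks):
    """Mean of the sampled waveform within each cardiac cycle
    (between successive R-peak times), placed at the cycle midpoint."""
    centers, means = [], []
    for t0, t1 in zip(r_peaks[:-1], r_peaks[1:]):
        mask = (t >= t0) & (t < t1)
        centers.append(0.5 * (t0 + t1))
        means.append(sig[mask].mean())
    return np.array(centers), np.array(means)

def resample_1hz(centers, values, duration_s):
    """Linear interpolation of the beat-to-beat series onto 1 Hz samples."""
    grid = np.arange(duration_s, dtype=float)
    return grid, np.interp(grid, centers, values)
```

Applying this to both the pressure and velocity waveforms yields the equally spaced MABP (input) and MCBFV (output) series used below.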
The structural parameters of the LVN-2 model are selected by the model-order se-

lection criterion presented in Section 2.3.1, which ensures that we obtain an accurate
model representation of the system and avoid overfitting the model to the specific data
segment. Following this procedure, an LVN-2 model with L1 = L2 = 8, H = 3, and Q =
2 is selected for these data. Note that the total number of unknown parameters in this
model is 57, which is extremely low compared to the conventional cross-correlation
technique, which would require the estimation of 5151 values for the first-order and second-order kernels with the necessary memory of 100 lags. The achieved model parsimony is further accompanied by a significant improvement in the prediction NMSE relative to the conventional cross-correlation technique. In order to terminate properly the
training procedure and avoid overtraining the network, the prediction NMSE is minimized for a 2 min forward segment of testing data (adjacent to the 6 min training data
segment) [Mitsis et al., 2002].
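Both counts can be verified by simple bookkeeping. The Volterra count follows directly from the 100-lag memory; the LVN-2 breakdown below (hidden-unit weights, polynomial coefficients, the two Laguerre decay parameters, and an output offset) is one plausible accounting of the 57 parameters rather than an itemization given in the text:

```python
# cross-correlation (Volterra) estimate with 100-lag memory:
# k0, k1 (100 values), and a symmetric k2 (100*101/2 values)
M = 100
assert 1 + M + M * (M + 1) // 2 == 5151

# LVN-2 with L1 = L2 = 8, H = 3, Q = 2 (one plausible accounting)
L1 = L2 = 8
H, Q = 3, 2
params = H * (L1 + L2)   # weights from the 16 Laguerre outputs to 3 hidden units
params += H * Q          # polynomial coefficients of each hidden unit
params += 2              # the two Laguerre parameters (alpha_fast, alpha_slow)
params += 1              # output offset
assert params == 57
```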
The averages of the MABP and MCBFV data over the 2 hr recordings from each of the
five subjects are 82.3 ± 10.7 mmHg and 61.7 ± 9.0 cm/s, respectively. Typical 6 min segments of MABP and MCBFV data are shown in Figure 6.28 along with their corresponding spectra. Most of the signal power lies below 0.1 Hz. The average achieved NMSEs of
output prediction using first-order and second-order models are 49.1% ± 13.4% and
27.6% ± 9.5%, respectively. The reduction of the prediction NMSE from the first-order
(linear) model to the second-order (nonlinear) model is significant (over 20%), confirming the fact that the dynamics of cerebral autoregulation are nonlinear.
The performance of the LVN-2 model is illustrated in Figure 6.29, where we show the
actual MCBFV output (top trace) along with the obtained LVN-2 model prediction (second trace), as well as its first-order and second-order components (third and fourth traces,
respectively). For this specific data segment, the second-order prediction NMSE is 13%,
and the first-order prediction NMSE is 34% (i.e., the NMSE reduction due to the second-order kernel is 21%). We must note that the contribution of the second-order kernel (non-

[Four panels. Top: arterial blood pressure and cerebral blood flow velocity time series over 0–360 s. Bottom: the ABP and CBF velocity spectra over 0–0.2 Hz.]

Figure 6.28 Typical MABP and MCBFV data used for LVN-2 model estimation. Top panels: time
series; bottom panels: spectra after high-pass filtering at 0.005 Hz [Mitsis et al., 2002].

[Traces of the actual MCBFV output and model predictions over 0–360 s.]

Figure 6.29 Typical LVN-2 model prediction of MCBFV output (see text) [Mitsis et al., 2002].

linear term) to the output prediction NMSE demonstrated considerable variability among
data segments (as small as 8% and as large as 62%). This variability was also reflected in
the form of the second-order kernel estimates obtained for different segments and/or subjects. This finding suggests either nonstationary behavior in the nonlinearity of the system
or the presence of intermodulatory (nonlinear) influences of other exogenous variables
(e.g., changes in arterial CO2 tension or hormonal fluctuations).
The relative contributions of the linear and nonlinear terms of the model are also illus-
trated in Figure 6.30 for the same set of data in the frequency domain, where the output
spectrum and the spectra of the first-order and second-order residuals (output prediction
errors) are shown. The shaded area corresponds to the difference between the first-order
and second-order residuals in the frequency domain, indicating that the nonlinearities are
found below 0.1 Hz and are more pronounced below 0.04 Hz. This observation is consis-
tent with previous findings based on the estimated coherence function.
The fast Fourier transform (FFT) magnitudes of the first-order kernel and its two com-
ponents (fast and slow) are shown in Figure 6.31 in log-log scale. The fast component has
a high-pass (differentiating) characteristic with a peak around 0.2 Hz and a "shoulder"
around 0.075 Hz, whereas the slow component exhibits a peak around 0.03 Hz and a
trough around 0.01 Hz. The total first-order frequency response indicates that cerebral au-
toregulation attenuates the effects of MABP changes on MCBFV within the frequency
range where most of the ABP signal power resides, as expected.
The second-order kernel (describing the nonlinear dynamics of the system) is shown in
Figure 6.32 for the same data segment, along with its corresponding frequency-domain
representation (defined as the magnitude of the two-dimensional FFT of the second-order
kernel). The frequency-domain peak of the latter is located on the diagonal and is related
to the corresponding first-order frequency response peak (for this specific segment) at
0.03 Hz. Note that the off-diagonal peak at the bifrequency location (0.03, 0.01 Hz) implies nonlinear intermodulatory interactions between the mechanisms residing at the respective frequencies, whereas the diagonal peak at the bifrequency location (0.03, 0.03

[Overlaid spectra of the actual output and of the first-order and second-order model residuals, with the difference region shaded.]

Figure 6.30 Spectra of the output (MCBFV) and of the first-order and second-order model residuals. The shaded area shows the effect of the nonlinear term in the frequency domain [Mitsis et al., 2002].

Hz) implies nonlinearity of the single mechanism residing around 0.03 Hz. Secondary
peaks are also discernible at lower off-diagonal bifrequency points.
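The bifrequency representation used here is simply the magnitude of the zero-padded two-dimensional FFT of the second-order kernel. A sketch, with a helper that locates the dominant bifrequency peak:

```python
import numpy as np

def k2_frequency_domain(k2, fs=1.0, nfft=256):
    """Magnitude of the 2-D FFT of a second-order kernel, restricted
    to the positive-frequency quadrant, with the bifrequency axis."""
    mag = np.abs(np.fft.fft2(k2, s=(nfft, nfft)))[:nfft // 2, :nfft // 2]
    freqs = np.arange(nfft // 2) * fs / nfft
    return freqs, mag

def dominant_bifrequency(freqs, mag):
    """(f1, f2) location of the largest magnitude peak."""
    i, j = np.unravel_index(np.argmax(mag), mag.shape)
    return freqs[i], freqs[j]
```

A peak on the diagonal (f, f) points to nonlinearity of a single mechanism near f, whereas an off-diagonal peak (f1, f2) points to intermodulation between mechanisms at f1 and f2, as discussed above.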
Note that the locations of the various bifrequency peaks vary over time but stay within
certain bounded regions from segment to segment (e.g., the 0.03 Hz peak stays within the
0.02–0.04 Hz region). The nonstationarity of the system dynamics (i.e., the varying loca-

[Log-log plot of the FFT magnitudes of the total first-order kernel and its fast and slow components.]

Figure 6.31 FFT magnitude of the first-order kernel of cerebral autoregulation, and its fast and slow components, in log-log scale. Solid line: total; dotted line: fast component; dashed line: slow component. A trough around 0.01 Hz and peaks around 0.03 Hz and 0.20 Hz are evident [Mitsis et al., 2002].

[Upper panel: second-order kernel surface over lags 0–50 s. Lower panel: its two-dimensional FFT magnitude over 0–0.15 Hz.]

Figure 6.32 The second-order kernel of cerebral autoregulation for the data of Figure 6.28. Upper panel: time domain; lower panel: frequency domain [Mitsis et al., 2002].

tions of the spectral peaks and their respective strengths) can be tracked for successive
overlapping data segments [Mitsis et al., 2002].
The observed variations over time may be the result of nonstationary modulation of
the vascular impedance by the autonomic nervous system (sympathetic and parasympathetic activity) and other factors, such as endothelial, endocrine, or metabolic mechanisms that affect the vascular impedance (typically over longer time scales).
Nonetheless, the time variations of the first-order kernel estimates are limited, as
demonstrated in Figure 6.33, where the averaged first-order kernels over 20 successive 6
min segments of data for five different subjects are shown in log-linear scales along
with standard deviation bounds.
In order to facilitate the interpretation of the obtained nonlinear model, we resort to
PDM analysis. The key structural parameter is the number of hidden units (H), which is
also the number of "principal dynamic modes" (PDMs) and is found to be equal to 3 in all
cases. The LVN-2 model with structural parameters (L = 8, H = 3, Q = 2) was used to analyze data from human subjects (provided by our collaborators Dr. B. Levine and Dr. R.
Zhang from the University of Texas in Dallas). A typical PDM model of a normal subject

[Five panels of averaged first-order kernels versus time (log scale) with standard-deviation bounds.]

Figure 6.33 Average first-order kernels over 20 successive 6 min segments for five different subjects (solid lines) and corresponding standard deviation bounds (dashed lines) [Mitsis et al., 2002].

is shown in Figure 6.34, exhibiting resonant peaks that can be related to specific physiological mechanisms. In order to determine this correspondence, we resort to pharmacological manipulations whereby data are analyzed before and after the subjects are treated
with various medications (e.g., trimethaphan, phenylephrine, L-NMMA, etc.) that selectively affect different physiological mechanisms.
For instance, MABP and MCBFV data before and after the administration of
trimethaphan (which blocks the ganglia of both sympathetic and parasympathetic nerves)
were used to explore quantitatively the effect of the autonomic nervous system on cerebral autoregulation in the aforementioned nonlinear dynamic context. The average value
and dynamic range of MABP and MCBFV typically drop after the injection of
trimethaphan, and the effect of respiration is visually evident, as illustrated in Figure 6.35
for a normal subject. It is also evident that there are many occasions of flow fluctuations
(typically deep depressions) with no apparent causal link to preceding changes in pressure, possibly caused by other factors (e.g., the effect of blood gases is examined in Section 7.2.4 for CO2 tension in the context of two-input modeling). The resulting third-order
PDM models reveal intriguing information about the underlying mechanisms, as illustrated in Figure 6.36, which shows the results for a normal human subject after injection of
trimethaphan. It is evident by comparing Figure 6.34 with Figure 6.36 that the first and
third PDMs are affected drastically by the trimethaphan as a result of the ganglionic
blockade. The second PDM is not affected significantly and is surmised to represent
6.2 CARDIOVASCULAR SYSTEM 327

Figure 6.34 PDM model of cerebral autoregulation for a normal subject, where the PDMs are plotted in the frequency domain (magnitude only). Three PDMs are consistently identified that exhibit admittance peaks within certain frequency bands. Based on existing knowledge regarding the action of the sympathetic and parasympathetic nerves, we posit that the first PDM with peaks marked L1s, M1s, H1s corresponds primarily to sympathetic effects, and the third PDM with peaks marked L1p, M1p, H1p corresponds primarily to parasympathetic effects. The second PDM contains the effects of local endothelial mechanisms (NO and CO2) and respiratory coupling (RSA).


Figure 6.35 MABP and MCBFV data from a normal subject injected with trimethaphan.
328 SELECTED APPLICATIONS


Figure 6.36 Third-order PDM model of cerebral autoregulation for a normal human subject (PDMs
plotted in the frequency domain) after the injection of trimethaphan. Observe the changes relative to
Fig. 6.34, especially in the first (sympathetic) and third (parasympathetic) PDMs (see text).

mechanisms that are not related to the autonomic nervous system (e.g., endothelial, myogenic, metabolic, and hormonal factors, as well as effects related to the respiratory cycle).
Specifically, we interpret the effects of trimethaphan on the first PDM of Figure 6.36 to indicate the connection of the sympathetic nerve with the PDM peaks at the mid-frequency and high-frequency ranges (marked as M1s and H1s, respectively, in Fig. 6.34) that disappear after the injection of trimethaphan. The disappearance of these two peaks can be explained by the removal of the vasoconstrictive effect of sympathetic activity and can be viewed as a decrease of impedance at the respective frequency bands (because of the negative trend of the nonlinearity). However, the PDM peak at the low-frequency range (marked as L1s in Fig. 6.34) is enhanced, corresponding to an increase in vascular impedance because of the negativity of the associated nonlinearity.
One possible explanation of the broadband change in PDM values is that the reduction in average blood pressure resulting from the removal of the sympathetic cardiac excitation (i.e., reduction of cardiac output) is followed by a more than proportional reduction in average blood flow, because of the viscoelastic properties of the vascular wall (i.e., the vessel is not a firm, passive tube but exhibits viscoelastic and active characteristics). The increase of apparent vascular impedance that is evident in the very low frequencies of the first PDM (peak around 0.01 Hz, marked as L1s) may be due to the removal of the modulatory effect of the sympathetic activity on the humoral mechanism that is likely responsible for this resonant peak (a cyclical phenomenon with approximately 2 min period). For instance, a change in the hormonal or endothelial modulation of the way the sympathetic innervation of the vascular wall acts to control its viscoelastic properties may cause such an effect. Note that this peak is also evident in the other two PDMs but contributes differently to the flow output because of the different nonlinearities associated with the different PDMs. The reason we posit that the first PDM is related largely to the sympathetic activity is the negative contribution of this PDM to the flow output (i.e., the contribution of

the first PDM amounts to a decrease in MCBFV for an increase in MABP), consistent with the anticipated vasoconstrictive action of the sympathetic nerve. The magnitude of the negative contribution decreases after injection of trimethaphan.
The third PDM also exhibits peaks at the same frequency bands as the first PDM (marked as H1p, M1p, and L1p in Fig. 6.34), and they are ascribed to the activity of the parasympathetic nerve due to their positive contribution to the flow output (positivity of the associated nonlinearity) in the normal subject prior to the injection of trimethaphan. The positive sign of the output contribution of the third PDM implies that the peaks now correspond to admittance peaks (in contrast to the impedance peaks of the first PDM, owing to the negative sign of the respective nonlinearity). Therefore, the disappearance of the peaks M1p and H1p after the injection of trimethaphan means that the apparent impedance is increased at the respective frequency bands. The impedance is also increased at low frequencies because the L1p peak is decreased and the values of the associated nonlinearity are decreased after the trimethaphan injection. The latter change after trimethaphan signifies a considerable increase of impedance at the very low frequencies. The removal of the parasympathetic cardiac stimulation (after ganglionic blockade by trimethaphan) should counterbalance to some extent the respective removal of sympathetic cardiac stimulation. Therefore, we are led to a working hypothesis of a facilitative modulatory effect of the parasympathetic nerve on the humoral or endothelial mechanism likely responsible for the peak L1p ("facilitative" implies reduction of vascular impedance). This can be explored in a two-input context (see Section 7.2.4).
Finally, the main features of the second PDM (peaks marked as L1, L2, M2, H2 in Fig. 6.34) are only slightly altered after trimethaphan injection, with the corresponding output contribution retaining its basic trend (although reduced somewhat in size). This leads us to surmise that this PDM is not affected by the ganglionic blockade of the sympathetic and parasympathetic nerves. Therefore, the observed peaks must be due to other "local" mechanisms that reside in the endothelium (e.g., the effect of nitric oxide or CO2 tension in decreasing local vascular impedance that may relate to the L2 and/or M2 peaks) and hormonal effects that are distributed throughout the vasculature (L1 peak). The H2 peak is likely related to the effect of the respiratory cycle (marked as RSA) on the pulmonary vasculature that is not affected by the ganglionic blockade.
A third-order model of this system satisfies the MOS criterion of Section 2.3.1 part of the time (e.g., after trimethaphan injection), but not always. Nonetheless, we may consider a third-order model (Q = 3) as the correct structural parameter for cerebral autoregulation, because of the long-held view that the "steady-state" relation between pressure values and flow has an inflection point (i.e., the "steady-state" flow values exhibit a plateau for middle values of "steady-state" pressure, but the flow values are rapidly rising for very large values or declining for very small values of pressure). The existence of an inflection point necessitates the use of (at least) a third-order model. Another key structural parameter is the size of the DLF filterbank (L), which is set at L = 8 since it was found to range between 6 and 8 for various subjects and data segments.
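As a concrete illustration of the DLF filterbank, the discrete Laguerre functions can be generated with the standard recursive scheme. The sketch below is illustrative: the decay parameter `alpha = 0.5` and memory length `M = 200` are assumptions for demonstration, not values taken from the text.

```python
import numpy as np

def laguerre_basis(L=8, alpha=0.5, M=200):
    """Discrete Laguerre functions b_j(n), j = 0..L-1, n = 0..M-1,
    generated recursively; alpha in (0, 1) sets the exponential
    decay (memory) of the filterbank."""
    b = np.zeros((L, M))
    b[0, 0] = np.sqrt(1.0 - alpha)
    for n in range(1, M):                      # zeroth-order DLF: pure decay
        b[0, n] = np.sqrt(alpha) * b[0, n - 1]
    for j in range(1, L):
        b[j, 0] = np.sqrt(alpha) * b[j - 1, 0]
        for n in range(1, M):                  # order-recursive update
            b[j, n] = (np.sqrt(alpha) * b[j, n - 1]
                       + np.sqrt(alpha) * b[j - 1, n]
                       - b[j - 1, n - 1])
    return b

basis = laguerre_basis()                       # L = 8 orthonormal functions
```

Because the basis is orthonormal over the discrete lag axis, an L = 8 filterbank compresses a kernel of memory M into only 8 coefficients per input pathway.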
The particular subject matter of cerebral autoregulation is further elucidated in Chapter 7 (Section 7.2.4) by an additional example of PDM analysis with two inputs (end-tidal CO2 in addition to arterial blood pressure). This analysis sheds additional light on the aforementioned postulated hypotheses by delineating the effects of CO2 on cerebral flow autoregulation (confirming some of the interpretations presented above).
Additional insight is gained by the PDM modeling of data from another experiment

whereby various levels of orthostatic stress are applied on normal subjects by means of "low-body negative pressure" (LBNP). Some initial results are presented below to corroborate further the efficacy of PDM analysis.
The application of LBNP emulates changes in orthostatic stress and allows the controlled study of cerebral autoregulation under such conditions (due to its clinical importance for elderly subjects). The obtained first-order Volterra kernels from 6 min data records are shown in Figure 6.37 for different levels of LBNP. It is evident that an increase of orthostatic stress raises the admittance gain initially at lower frequencies (i.e., at -30 mmHg) and, for greater LBNP (i.e., at -50 mmHg), across all frequencies. The overall reduction of impedance is about five-fold. In order to achieve a more refined level of understanding of these effects, we employ again the equivalent PDM models for the baseline data and the two LBNP levels, shown in Figure 6.38. We observe decreased gain (cerebrovascular admittance) at the higher frequencies (above 0.04 Hz) for LBNP = -30 mmHg and increased gain at the low-frequency peak around 0.015 Hz for the first PDM. The latter remains true for LBNP = -50 mmHg; however, the other two PDMs exhibit increased gain at higher frequencies (above 0.10 Hz) as well as a new resonant peak at about 0.04 Hz in the second PDM. The associated nonlinearities remain about the same for LBNP = -30 mmHg, except for the first one, which shows a ten-fold increase of its negative branch. For LBNP = -50 mmHg, the other two PDM branches exhibit some significant changes, most notably the convexity of the second nonlinearity that makes the second PDM contribution to the output mostly positive. This model suggests that autonomic failure under severe orthostatic stress may be caused by the increased admittance around 0.015 Hz (possibly due to cerebellar activity) when combined with the drop in average MCBFV.
The complexity of these observations and the potential insight that they may offer to our understanding of cerebral autoregulation illustrate the potential utility of the PDM model and the high promise of the advocated approach. However, this task will require considerable effort and the careful examination of the results for extensive amounts of data, especially data collected from subjects under various pharmacological treatments that selectively target relevant mechanisms (e.g., propranolol or atropine for selective sympathetic or parasympathetic blocking, respectively, L-NMMA for blocking NO receptors, modulators of blood gases, etc.). This type of data can yield useful results in the near future within the presented methodological framework in a manner that systematizes rigorously the individual effects of the various protagonists.
In addition, the interpretation of the PDM model and our understanding of cerebral autoregulation can be greatly enhanced by multivariate analysis of data collected on additional relevant variables (e.g., respiratory and heart rate variability, blood gases, nitric oxide, catecholamines, etc.) when they can be measured with sufficient temporal resolution. It is understood that the multivariate type of data places a considerable (but not insurmountable) burden of measurement and will require multifactorial methods of analysis, such as the ones discussed in Chapters 7 and 10. No doubt, it represents the "cutting edge" in systems physiology and a most exciting prospect within the potential capabilities of the peer community.
This wealth of information can become available for the first time through the application of the advocated methodology and, if confirmed with sufficient amounts of data, it surely represents a quantum leap in the state of the art in this field. It is recognized, however, that the "quantum leap" nature of this advancement raises the threshold of acceptance by the peer community. Thus, many more experiments must be conducted and this


Figure 6.37 The first-order Volterra kernels of a normal subject for different levels of low-body negative pressure (LBNP) simulating orthostatic stress. First row: baseline; second row: LBNP = -30 mmHg; third row: LBNP = -50 mmHg. Left column: time domain; right column: frequency domain representation.

type of analysis must be successfully repeated (confirming these observations) before these postulated hypotheses can be accepted by the peer community. Far from being a burden, this is an exciting task that invites the coordinated effort of the peer community, because it promises to deliver (when successfully completed) the requisite breadth and depth of understanding of the mechanisms subserving cerebral autoregulation that is necessary for drastically improving clinical practice in this century.


Figure 6.38a PDM model of cerebral autoregulation in a normal subject for baseline data.


Figure 6.38b PDM model of cerebral autoregulation in a normal subject for LBNP = -30 mmHg.


Figure 6.38c PDM model of cerebral autoregulation in a normal subject for LBNP = -50 mmHg.

6.3 RENAL SYSTEM

Renal blood flow autoregulation (i.e., the process by which vascular hemodynamic impedance is adjusted to minimize fluctuations in renal blood flow caused by fluctuations in arterial blood pressure) is critical for maintaining fairly constant filtration rates by the kidneys [Chon et al., 1993, 1994a, b, 1998b; Marsh et al., 1990]. The two most important mechanisms subserving renal autoregulation are the myogenic and the tubuloglomerular feedback (TGF). The myogenic mechanism is vascular in nature and causes changes in blood vessel diameter and mechanical characteristics (e.g., stiffness) in response to changes in local vascular hydrostatic pressure. An increase in blood pressure causes constriction of the vascular smooth muscle that increases the vascular wall tension and limits the change in blood flow.
TGF is an intrarenal, local feedback mechanism governed by flow-rate-dependent changes in the concentration of NaCl at the macula densa. Changes in arterial pressure lead to changes in NaCl reabsorption in the renal proximal tubule that affect the flow rate of tubular fluid in subsequent segments of the nephron. Reabsorption of NaCl in the ascending limb of the loop of Henle is flow-rate dependent, whereby increases in flow rate cause the concentration of NaCl to increase, and vice versa. The NaCl concentration changes are sensed at the macula densa, which is a collection of specialized cells at the most distal point of the loop of Henle. Macula densa cells trigger a signaling chain that causes the arterioles to adjust the smooth muscle tone in order to limit variations in flow within the tubules. Both TGF and the myogenic mechanism are triggered by changes in arterial blood pressure and act on a common vascular segment (the afferent arteriole), so that interactions between the two mechanisms are to be expected. Therefore, nonlinear

analysis (like the one advocated herein) is especially well suited for the study of such interactions.
Both the myogenic mechanism and TGF generate autonomous oscillations in arteriolar diameter, which suggests that nonlinearities are functionally significant features of both mechanisms. Previous work using various linear and nonlinear methods of transfer function and coherence analysis gives ample confirmation of the importance of nonlinearities in the two components of renal autoregulation [Chon et al., 1993, 1994a, b, 1998a, b; Holstein-Rathlou et al., 1995; Marmarelis et al., 1993, 1999b], in a manner qualitatively similar to cerebral autoregulation discussed in the previous section.
The dynamic characteristics of the two mechanisms have been studied in rats using broadband arterial pressure forcing to separate the nonlinear dynamic properties of the two renal autoregulatory mechanisms [Chon et al., 1993; Marmarelis et al., 1993]. The general conclusion is that the myogenic mechanism causes decreased impedance over the frequency range from 0.1 to 0.2 Hz, whereas the TGF mechanism is active in the frequency range below 0.08 Hz, where increased impedance is observed. The combined action of the two mechanisms attenuates the effect of blood pressure fluctuations on renal blood flow at frequencies less than 0.1 Hz, where most of the power of the spontaneous pressure signal resides, while the opposite effect occurs in the 0.1-0.2 Hz frequency range (where very little power of spontaneous pressure fluctuations exists).
Experiments were performed on male Sprague-Dawley rats, and details of the setup are given in Marmarelis et al. (1993). Measurements of renal blood flow and arterial blood pressure were made while broadband fluctuations were induced in the arterial blood pressure with pseudorandom forcing. These were generated by a bellows pump through a cannula inserted into the distal aorta at the bifurcation. The input was derived from a constant-switching-pace symmetric random signal (CSRS), which exhibits the spectral properties of band-limited white noise up to 2 Hz (see Section 2.2.4). A unique seed was used for the random-number generator in each experiment to secure independent experimental runs. The arterial blood pressure was measured in the superior mesenteric artery with a catheter, and the renal blood flow was measured in the left renal artery with an electromagnetic flow probe. Three different power levels of forcing were used in each experiment on five different preparations, yielding a total of 15 arterial blood pressure and 15 blood flow data records. The mean and root-mean-square (RMS) values (after de-meaning) of the resulting pressure and flow fluctuations are given in Table 6.1 for the three levels of forcing: low (3.86-5.66 mmHg), medium (6.33-7.96 mmHg), and high (10.40-12.41 mmHg) in terms of the RMS level of the de-meaned arterial pressure forcing.
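A CSRS of this kind can be emulated in software as a step-hold random waveform: a new amplitude is drawn from a symmetric distribution at a constant switching pace and held constant in between. A minimal sketch (the uniform amplitude distribution, hold length, and seed are illustrative assumptions, not the experimental values):

```python
import numpy as np

def csrs(n_steps=2048, hold=4, seed=1):
    """Constant-switching-pace symmetric random signal (CSRS):
    independent draws from a zero-mean symmetric distribution,
    each held for `hold` samples, yielding a quasiwhite spectrum
    up to roughly fs / (2 * hold)."""
    rng = np.random.default_rng(seed)          # unique seed per run
    levels = rng.uniform(-1.0, 1.0, n_steps)   # symmetric amplitudes
    return np.repeat(levels, hold)             # hold each level constant

x = csrs()
```

Because successive held values are independent, the sample autocorrelation vanishes for lags of one switching interval or more, which is the quasiwhite property exploited in kernel estimation.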
The input-output records comprised 512 samples over 256 seconds with a sampling rate of 2 samples per second (Nyquist frequency of 1 Hz), after digital low-pass filtering (using a 20th-order Hamming window) to avoid aliasing. Each data record was subjected to second-order polynomial trend removal and was de-meaned (by subtracting out the mean value) as well as normalized by dividing with its RMS value prior to processing. Thus, regardless of the different power levels of arterial pressure forcing, all analyzed data sets had zero mean and unit variance, because we wish to study the form of the system kernels irrespective of scaling.
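The preprocessing chain described here (second-order polynomial trend removal, de-meaning, unit-variance scaling) can be sketched as:

```python
import numpy as np

def preprocess(x):
    """Second-order polynomial trend removal, then de-meaning and
    normalization to unit variance, as applied to each data record."""
    x = np.asarray(x, dtype=float)
    n = np.arange(x.size)
    trend = np.polyval(np.polyfit(n, x, 2), n)   # quadratic trend fit
    y = x - trend                                # detrend (also removes mean)
    y = y - y.mean()                             # explicit de-meaning
    return y / y.std()                           # scale to unit variance
```

After this step, kernel estimates from records with different forcing power are directly comparable in shape, as intended in the text.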
The mean and RMS values of the resulting pressure and flow signals are plotted in Figure 6.39 in order to demonstrate that conventional "steady-state" analysis reveals nothing beyond a crude trend and misses entirely the critical dynamic characteristics of renal autoregulation. Typical input-output (pressure-flow) data are shown in Figure 6.40 for three

power levels of CSRS forcing in a normotensive rat. The results of linear transfer function and coherence analysis are shown in Figure 6.41 and demonstrate reduced coherence (i.e., likely presence of nonlinearities) in the lower frequency range (below 0.1 Hz), consistent with our results in cerebral autoregulation. The obtained coherence values also demonstrate decreasing autoregulatory effects at higher power levels of forcing, indicating that strong forcing overwhelms the autoregulatory mechanisms.
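Transfer function (admittance) gain and ordinary coherence of this kind can be estimated from pressure-flow records with Welch cross-spectral averaging. A minimal sketch (the 2 Hz sampling rate matches the records described above; the segment length is an illustrative choice):

```python
import numpy as np
from scipy import signal

def tf_gain_coherence(p, q, fs=2.0, nperseg=128):
    """Welch estimates of the pressure-to-flow transfer function
    magnitude |S_qp / S_pp| and the ordinary coherence function."""
    f, Spp = signal.welch(p, fs=fs, nperseg=nperseg)      # input auto-spectrum
    _, Sqp = signal.csd(p, q, fs=fs, nperseg=nperseg)     # cross-spectrum
    _, coh = signal.coherence(p, q, fs=fs, nperseg=nperseg)
    return f, np.abs(Sqp) / Spp, coh
```

Coherence well below unity over a frequency band (as seen here below 0.1 Hz) flags either nonlinearity or the influence of noise and other inputs in that band.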
The obtained first-order Volterra kernel estimates for the three power levels of forcing are shown in Figure 6.42. The observed changes in waveform as the input power level increases are suggestive of negative compressive (sigmoidal) feedback (see Section 4.1.5). The efficacy of third-order Volterra modeling of this system using LET (L = 8) is corroborated by the model prediction errors given in Table 6.2 and illustrated in Figure 6.43, where the contributions of the various orders to the model prediction are shown. Note that the third-order model seems to capture some "flow depressions" that are not captured by the second-order model. This suggests that a third-order nonlinearity can account for the main nonlinear dynamic characteristics of renal autoregulation that are found in the frequency range where the TGF mechanism is active (below 0.08 Hz). Note also that the aforementioned "flow depressions" in the first-order residuals are more pronounced for medium and low levels of forcing, and the degree of third-order nonlinearity (as measured by the percentage contribution of the third-order kernel to the model output prediction) follows the same pattern, as shown in Table 6.2. This suggests that the presence of nonlinear autoregulation is least evident at the high level of forcing, leading us to posit that the TGF may become "overwhelmed" at high levels of pressure forcing.
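The way such order-by-order model predictions are computed in the LET framework can be sketched as follows: the input is convolved with each Laguerre function, and the first- and second-order output components are polynomial combinations of those convolutions. The coefficient arrays `c1` and `c2` below stand for estimated expansion coefficients (the third-order terms used in the text extend the same pattern); the basis and test values are illustrative.

```python
import numpy as np

def let_predict(x, basis, c1, c2, c0=0.0):
    """Volterra model prediction via Laguerre expansion (two orders):
    v_j(n) = (b_j * x)(n);
    y(n) = c0 + sum_j c1[j] v_j(n) + sum_{i,j} c2[i,j] v_i(n) v_j(n)."""
    V = np.array([np.convolve(x, bj)[:len(x)] for bj in basis])  # L x N
    y1 = c1 @ V                                   # first-order component
    y2 = np.einsum('ij,in,jn->n', c2, V, V)       # second-order component
    return c0 + y1 + y2, y1, y2
```

Separating `y1` and `y2` is what allows the order-by-order comparison of contributions reported in Table 6.2 and Figure 6.43.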
The demonstrated efficacy of the advocated methodology in normotensive rats led us to extend the study to spontaneously hypertensive rats (SHR) and examine the effects of hypertension on the obtained nonlinear models. Four SHR preparations were compared with four normotensive Sprague-Dawley rats (SDR) for two different levels of forcing. The re-

Table 6.1 Mean and standard deviation (SD) values of arterial pressure (AP) and renal blood flow (BF) for three different power levels of forcing and five experimental preparations (rats)

Data set #      Power     AP mean   AP SD    BF mean    BF SD
(exp. prep.)    level     (mmHg)    (mmHg)   (ml/min)   (ml/min)
1               High       94.19    10.40     5.72       0.83
                Medium    109.13     6.33     6.24       0.50
                Low       105.69     3.99     6.42       0.33
2               High       88.96    11.58     3.97       1.06
                Medium     99.21     6.92     4.12       0.65
                Low       102.38     3.86     4.54       0.39
3               High       92.26    12.41     4.15       0.88
                Medium     96.41     7.69     4.81       0.56
                Low       105.32     5.28     5.71       0.37
4               High      112.29    11.64    10.62       1.26
                Medium    115.08     7.96    10.11       0.82
                Low       118.54     5.66     9.24       0.71
5               High      108.54    11.41     6.73       1.17
                Medium    113.08     7.72     7.12       0.89
                Low       115.42     5.03     7.33       0.66


Figure 6.39 Comparison of mean blood flow versus mean arterial pressure (top display) and stan-
dard deviation of blood flow versus standard deviation of arterial pressure (bottom display) at high,
medium, and low power level of arterial pressure broadband forcing in the rat kidney. Observe the
formation of three clusters with respect to arterial pressure in the standard deviation plot: low (3.5-6
mmHg), medium (6-8 mmHg), and high (10-12.5 mmHg) [Chon et al., 1993].

sulting mean and RMS values are given in Table 6.3. Note that the RMS value is defined on
each de-meaned signal and it is not related to averaging over different preparations.
Typical pressure-flow data are shown in Figure 6.44 for the low power level of forcing. The observed differences between SDR and SHR in the first-order Volterra kernel estimates are subtle and insufficient to differentiate between the two cases. However, some discriminating differences are observed in the second-order and third-order Volterra kernel estimates obtained via LET (L = 8), as illustrated in Figures 6.45 and 6.46, respectively. Distinct differences exist for both low and high levels of forcing that are evident as distinct patterns of magnitude peaks in the bifrequency domain used for display in Figures 6.45 and 6.46 (magnitudes of the 2-D FFT of the second-order kernel and of a third-order kernel slice). Generally, hypertension removes the nonlinear interaction between the TGF and the myogenic mechanism seen in Figure 6.45(a) at the bifrequency point:


Figure 6.40 Typical input-output data (arterial blood pressure: left column; renal blood flow: right column) from the renal autoregulation experiments in rats for three power levels of CSRS (quasiwhite broadband) pressure forcing, marked as high (first row), medium (second row), and low (third row) [Chon et al., 1993].

f1 = 0.13 Hz, f2 = 0.03 Hz (and is symmetric about the diagonal). The peak at the diagonal point (f1 = f2 = 0.03 Hz) is indicative of the presence of nonlinearity in the TGF mechanism that is thought to reside around the 0.03 Hz frequency band. Because the second-order and third-order kernel estimates are quite variable from preparation to preparation (and also over time) and their inspection is a rather demanding task, we derive the equivalent PDM models and attempt the same comparison between SDR and SHR, as well as between high and low levels of forcing. Two PDMs are obtained in all
cases that seem to largely correspond (fortuitously) to the TGF and myogenic mechanisms.

Figure 6.41a Transfer function magnitude (gain of admittance function) obtained from flow/pressure data at high (top panels) and low (bottom panels) arterial pressure forcings for normotensive rats (left panels) and hypertensive rats (right panels). [Solid lines are means; dotted and dashed lines are standard error (SE) bounds.] Transfer function magnitude units are arbitrary since the flow-pressure data were normalized to unit variance [Chon et al., 1998].

The results are shown in Figures 6.47 and 6.48 and demonstrate the potential of
PDM analysis for improved physiological interpretation of nonlinear dynamic models of renal autoregulation.
For the low level of forcing, it is observed that the first PDM (which corresponds to the myogenic mechanism) remains essentially the same for SDR and SHR, but the second PDM (which corresponds to the TGF mechanism) is significantly altered by hypertension. Specifically, it is seen in Figures 6.47 and 6.48 that the second PDM exhibits two magnitude peaks in the frequency domain that have a distinctly different size relation (i.e., for SDR, the peak at ~0.02 Hz is dominant over the peak at ~0.07 Hz, but for SHR the two peaks are roughly equal in size). It is very interesting to see in Figure 6.49 that the second PDM for SDR exhibits the same parity between the two peaks for the high level of forcing, suggesting that this parity results from an elevated operating point of average arterial pressure. However, this parity is destroyed by high-level forcing in SHR, as attested to by the appearance of a dominant resonance peak in the second PDM at ~0.04 Hz shown in Figure 6.50 (as if the two previous peaks are merged into one under the combined effect of hypertension and high level of forcing).
Figure 6.41b The coherence function measurements for the four cases of renal autoregulation experiments with the transfer functions shown in Fig. 6.41a [Chon et al., 1998].

It is important to note also that the contribution of the second PDM is diminished relative to the contribution of the first PDM in the case of SHR with high-level forcing. This
does not imply that the role of the TGF in the latter case has diminished, but rather that it has likely changed into a chaotic mode, giving rise to a previously observed spontaneous chaotic oscillation between 0.03 and 0.04 Hz in hypertensive rats [Marsh et al., 1990]. This also explains the fact that the model prediction error increases in this case, as a result of this input-independent chaotic oscillation, which is not captured by the Volterra model. The first PDM is essentially the same in all cases, suggesting that the dynamic characteristics of the myogenic mechanism are not affected by the level of arterial blood pressure or hypertension.
We also note that the contribution of the first PDM to the model output power is 5 to
10 times larger than the contribution of the second PDM , as quantified by the respective
eigenvalues. The estimated nonlinearities associated with each PDM show that the second
PDM exhibits stronger nonlinear characteristics (relative to the respective linear charac-
teristics), consistent with the observed lower coherence at frequencies below 0.08 Hz.
This is visually evident in the PDM model with a three-dimensional static nonlinearity as-
sociated with the two PDMs in the normotensive case of renal autoregulation shown in
Figure 6.51. Note that in this PDM model, the spike at the zero lag of the first PDM has
been removed (as representing an all-pass characteristic) in order to enhance our ability to
examine the autoregulation dynamics separately from the all-pass characteristics of pas-

Figure 6.42 Typical first-order Volterra kernel estimates at the high (dashed line), medium (dotted line), and low (solid line) power levels of arterial pressure forcing, obtained from the same preparation (rat). Note the increased damping of the waveforms as the level of arterial pressure forcing increases. The amplitude units of these kernels are arbitrary, since the pressure-flow data were normalized to unit variance. In general, the first-order kernel units are (output units)/(input units)/(time units). Actual units can be obtained by use of the corresponding RMS or SD values of pressure/flow (input/output) given in Table 6.1 [Marmarelis et al., 1993].

Table 6.2 NMSEs of model prediction for three power levels of broadband input-forcing and various model orders for the renal autoregulation data. Note that the NMSE decreases with increasing power level and model order

Power Level    1st Order    2nd Order    3rd Order
High            6.86%        6.67%        1.51%
Medium         15.04%       14.70%        4.17%
Low            18.70%       18.63%        7.20%

Table 6.3 Averaged mean and de-meaned RMS (SD) of renal arterial pressure (AP) and blood flow (BF) for two different levels of forcing (computed over four SHR and four SDR preparations)

Sprague-Dawley rats (SDR)
Forcing Level    Mean ± RMS AP (mmHg)    Mean ± RMS BF (ml/min)
High             110.50 ± 7.88           11.75 ± 1.18
Low              117.00 ± 4.72           12.68 ± 1.34

Spontaneously hypertensive rats (SHR)
High             145.01 ± 6.76           12.76 ± 1.06
Low              147.64 ± 4.32           10.27 ± 1.21

Figure 6.43 Typical arterial blood pressure signal at the medium power level (trace 1), the associated blood flow signal (trace 2), the first-order model residuals (trace 3), the second-order model residuals (trace 4), and the third-order model residuals (trace 5) based on Volterra model predictions. Note the broadband nature of these experimental pressure-flow signals, and the absence in the third-order residuals of the "flow depressions" seen in the first-order and second-order residuals. The amplitude units for the pressure-flow signals are normalized to unit variance (see text) [Marmarelis et al., 1993].


Figure 6.44 Typical input-output data (arterial blood pressure: left column; renal blood flow: right column) from the renal autoregulation experiments for low power level of CSRS (quasiwhite broadband) pressure forcing in normotensive (top row) and hypertensive (bottom row) rat preparations [Chon et al., 1998].



Figure 6.45a Contour plots (left) and surface plots (right) of magnitude of two-dimensional fast Fourier transforms (2D-FFTs) of averaged second-order kernels for high (A) and low (B) power levels of forcing in normotensive (SDR) rats. The two axes of the plots represent the two frequency variables f₁ and f₂. Note the strong nonlinear interaction peak present at low power level at bifrequency location f₁ = 130 mHz and f₂ = 30 mHz (and its symmetric about the diagonal) [Chon et al., 1994b].

sive hydraulics. Thus, removing the zero-lag spike from the first PDM, we allow the remaining PDM waveform to represent strictly the myogenic mechanism. The normalized
mean-square error of the PDM model prediction for SDR is about 12% for low-level forcing and below 5% for high-level forcing (suggestive of the improved SNR in the latter
case). This error increases for SHR model predictions and is considerably higher for high-level forcing (as mentioned above in connection with the emergence of a chaotic oscillation in this case).
Finally, we should note that possible direct effects of the autonomic nervous system
on renal autoregulation are not examined in this study because the rat kidney is denervated in the experimental preparation, raising the exciting prospect of naturally innervated kidneys in future studies, as well as PDM modeling under conditions of spontaneous activity (similar to the presented studies of cerebral autoregulation in the previous
section).

6.4 METABOLIC-ENDOCRINE SYSTEM

As an illustrative example for the application of the advocated methodology to a metabolic-endocrine system, we model the dynamics of the insulin-glucose interactions during


Figure 6.45b Contour plots (left) and surface plots (right) of magnitudes of 2D-FFTs of averaged second-order kernels for high (A) and low (B) power-level forcings in hypertensive rats (SHR). The two axes of the plots represent the two frequency variables f₁ and f₂. Note absence of nonlinear interaction peak at f₁ = 130 mHz and f₂ = 30 mHz observed in normotensive rats for low power level of forcing [Chon et al., 1994b].

metabolic activity. This subject has received extensive attention for a long time, primarily
through the use of compartmental (parametric) modeling [Bergman et al., 1981; Carson et
al., 1983; Cobelli & Mari, 1983; Cobelli & Pacini, 1988; Vicini et al., 1999], because it is
of great importance for the diagnosis and treatment of diabetes. Unfortunately, despite the
considerable effort and resources dedicated to this task to date, the modeling results have
been modest in that no reliable diagnostic technique has emerged and the dream of an "artificial pancreas" regulating blood glucose levels through automated insulin infusions remains elusive. It is for this reason that the author and his associates have selected this system as a high-priority application of the advocated PDM modeling approach.
The analyzed data are of two types: (1) continuous (every 5 min) glucose concentra-
tion measurements and doses of insulin infusions through an implanted micropump in
Type I diabetics (provided by Medtronic-Minimed, Inc.); and (2) continuous (every 3
min) plasma glucose and insulin concentration measurements of spontaneous activity in
anesthetized dogs (provided by Richard Bergman and Katrin Huecking of the USC Med-
ical School). In both cases, the insulin is viewed as the input and the glucose as the output
of a Volterra model, since the underlying physiological mechanisms are nonlinear and dy-
namic. These various mechanisms can be grouped into two categories: (1) glucoleptic

A 1""1- , . , . ,- - - - - - - - - . . . "

"
!
~ ~
':J
05

o
o
•• rt".I (Hol o., 0.2 0.3 0.•

~PJGfz)
. .oo.oa
ll.J

B ffi i

! l.5
c~

I~ 0.5

.. u 0.1 0.2 lI.3 0..


nI[GU(JOC't rt (Ho)

~PJGfz)
Figure 6.46a Contour plots (left) and surface plots (right) of magnitude of 2D-FFTs of averaged third-order kernel slice for high (A) and low (B) power-level forcings in normotensive (SDR) rats. Note the strong nonlinear interaction peak for low power level of forcing at the bifrequency location f₁ = 130 mHz and f₂ = 30 mHz, as in the second-order kernel [Chon et al., 1994b].

(pertaining to the insulin-assisted uptake of glucose by various tissues and organs); and
(2) glucogenic (pertaining to the generation of new glucose from various tissues and organs in response to insulin elevation).
The basic physiological mechanisms of glucose metabolism and insulin secretion have
been studied extensively through compartmental modeling but no reliable quantitative
model has yet been advanced that explains well the existing data. Multiple regulatory
closed-loop mechanisms subserve the insulin-glucose interactions (e.g., insulin-dependent glucose utilization in muscles and adipose tissue, glucose-dependent pancreatic insulin secretion, liver glucose production and uptake, renal secretion of glucose, etc.). In
addition, there is insulin-independent glucose utilization in the central nervous system
and in the red blood cells, as well as additional hormonal controllers (such as glucagon,
which affects the liver glucose production and whose secretion by pancreatic α-cells is inhibited by elevated glucose and insulin). It is also qualitatively known that the concentrations of free fatty acids and epinephrine influence the nonlinear dynamic relation between
insulin and glucose. A preliminary quantitative model of the effect of free fatty acids is
presented in Section 7.2.3 as an example of a two-input PDM model.
Confronted with this rather complicated nested-loop system, early investigators resorted to modeling simplifications of the full physiological complexity in hope of partial but

Figure 6.46b Contour plots (left) and surface plots (right) of magnitudes of 2D-FFTs of averaged third-order kernel slice for high (A) and low (B) power level of forcing in hypertensive rats (SHR). Note the absence of the nonlinear interaction peak at f₁ = 130 mHz and f₂ = 30 mHz observed in normotensive rats for low power level of forcing [Chon et al., 1994b].

meaningful results. The most successful of these efforts is the so-called "minimal model"
that was discussed briefly in Section 1.4. This "minimal model" offered some useful results in the context of specialized clinical "glucose-tolerance" tests [Bergman et al., 1981;
Bergman & Lovejoy, 1997] but remains inadequate to represent the true complexity of
this system under natural operating conditions. Over the last 30 years, more than 50 parametric (compartmental) models have been published in the literature but none can claim
satisfactory performance (for various reasons) under natural operating conditions.
We take a different approach to this problem by dispensing with the unnecessary (and
potentially misleading) constraints of compartmental models and by adopting a true-to-
the-data inductive approach, as advocated throughout this book. It is recognized that the
obtained PDM model (based on the available insulin-glucose data) is nonstationary and
subject specific. Therefore, this PDM model is adapting through time (through tracking
algorithms) and is customized to each specific subject. In this framework, we have been
able to obtain PDM models of excellent predictive performance and remarkable inter-
pretability in terms of a glucoleptic and a glucogenic PDM branch, as discussed below.
These adaptive PDM models offer a realistic prospect for achieving the long-held
dream of an effective "artificial pancreas," because they can be used to control properly
the continuous infusions of an implanted insulin micropump in the context of a model-
reference control strategy based on an accurate nonlinear dynamic model of the underly-


Figure 6.47 The two obtained PDMs of renal autoregulation (averaged over three data records) for the low-forcing normotensive data in the time domain (top panels) and in the frequency domain (bottom panels). The first PDM (left panels) corresponds to the myogenic mechanism with a resonance peak at around 0.12 Hz, and the second PDM (right panels) corresponds to the TGF mechanism with a dominant resonance peak around 0.02 Hz. The standard error bounds in dashed lines correspond to ± one standard deviation [Marmarelis et al., 1999b].

ing insulin-glucose relation. The importance of this prospect for reliable regulation of
blood glucose concentration cannot be overstated.
First, we consider the data of injected insulin (input) and glucose concentration (output) from human subjects (8-hour data records sampled every 5 min). Two PDMs have
been consistently obtained for all subjects and segments, whereby the first PDM represents the insulin-assisted glucose uptake (glucolepsis) and the second represents the insulin-induced glucose production (glucogenesis). An illustrative example of these adaptive PDM models is shown in Figure 6.52 for two 8-hr records from a human subject at
successive days (i.e., separated by 16 hr). The glucoleptic PDM of both models depicts
the insulin-assisted reduction of glucose concentration, with minor differences in dynamics and magnitude for the two segments. Specifically, the dynamics of the second segment are a bit faster (peaking at about 30 min instead of 50 min) and show a slight overshoot after ~2 hr (in both cases the effect of insulin diminishes after ~5 hr). The
associated glucoleptic nonlinearity is decompressive in both cases, indicating a supralinear insulin-assisted reduction of glucose concentration. Similar observations apply to the


Figure 6.48 The two obtained PDMs (averaged over three data records) for the low-forcing hypertensive data in the time domain (top panels) and in the frequency domain (bottom panels). The first PDM (left panels) corresponds to the myogenic mechanism and the second PDM (right panels) corresponds to the TGF mechanism of renal autoregulation. The first PDM is similar to the low-forcing normotensive case (Figure 6.47) but the second PDM is different and resembles the high-forcing normotensive case (Figure 6.49). The standard error bounds in dashed lines correspond to ± one standard deviation [Marmarelis et al., 1999b].

glucogenic PDM and its associated nonlinearity, although the contribution of the glucogenic component is decreased by about 30% in the second segment (see the ordinate of
the associated nonlinearities). However, the peak response time of the glucogenic PDM is
about the same in both segments (~90 min) and the effects of insulin-induced glucogenesis last a bit longer (about 7 hr, instead of about 5 hr for glucolepsis) in both segments.
Similar results were obtained for three more human subjects, although some intersubject
variability was observed.
The consistency of these results and their direct interpretability (based on what is expected qualitatively from all previous knowledge) are remarkable and hold the promise of
a reliable clinical tool for improved diagnosis and (ultimately) effective glucose regulation with an implanted insulin micropump equipped with adaptive model-reference control capability (the "artificial pancreas"). For instance, the "insulin sensitivity" of a human
subject can be quantified clinically by the peak value (or the area) of the glucoleptic PDM
in combination with the values and curvature (second derivative) of the glucoleptic non-


Figure 6.49 The two obtained PDMs (averaged over three data records) for the high-forcing normotensive data in the time domain (top panels) and in the frequency domain (bottom panels). The first PDM (left panels) corresponds to the myogenic mechanism and the second PDM (right panels) corresponds to the TGF mechanism of renal autoregulation. Compared to the low-forcing normotensive case of Figure 6.47, the first PDM is not altered significantly but the second PDM exhibits marked changes (two resonant peaks at 0.02 and 0.07 Hz), resembling the low-forcing hypertensive case of Fig. 6.48 [Marmarelis et al., 1999b].

linearity. The respective values of the glucogenic branch can provide a clinical measure
of the subject's ability to produce new glucose in response to elevated insulin (indirectly
assessing possible malfunction of the liver and/or kidneys in this regard).
We now turn to the experimental data of spontaneous plasma glucose and insulin concentrations from the anesthetized dog (10-hour data record sampled every 3 min). The
concentration of free fatty acids (FFA) was also measured at the same times and subsequently analyzed. Single-input results are presented below, but the most intriguing result
is when FFA was analyzed as a second input in addition to insulin (see Section 7.2.3).
The experimental data records are shown in Figure 6.53. Two PDMs were again obtained for the insulin-to-glucose system from these data and shown in the PDM model of
Figure 6.54. Although the first PDM exhibits again the main glucoleptic characteristics of
an insulin-assisted reduction of glucose concentration (albeit with different dynamics than
before), the second PDM presents a more complicated biphasic behavior (both glucoleptic and glucogenic phases). This may be attributed to homeostatic mechanisms of the au-


Figure 6.50 The two obtained PDMs (averaged over three data records) for the high-forcing hypertensive data in the time domain (top panels) and in the frequency domain (bottom panels). The first PDM (left panels) corresponds to the myogenic mechanism and the second PDM (right panels) corresponds to the TGF mechanism of renal autoregulation. The first PDM is similar to the normotensive case and the second PDM resembles the low-forcing normotensive case, although the dominant resonance peak is shifted upward to about 0.04 Hz [Marmarelis et al., 1999b].

Figure 6.51 PDM model for renal autoregulation (without the all-pass component).

Figure 6.52a PDM model of injected insulin-glucose data in a diabetic subject (first 8 hr segment).


Figure 6.52b PDM model of injected insulin-glucose data in the same diabetic subject as in part
(a), for a second 8 hr segment starting 16 hr after the end of the first segment.


Figure 6.53 Spontaneous plasma insulin, glucose, and free fatty acid (FFA) data from anesthetized
dog, sampled every 3 min.

Figure 6.54 PDM model for spontaneous insulin-to-glucose data in dog.



toregulated state of the spontaneous metabolic-endocrine activity of the normal dog (giving rise to the so-called "counterregulation" phases that have been observed qualitatively
and seem to contradict notions of unidirectional autoregulatory mechanisms). Note that
this biphasic dynamical behavior was also observed in our studies of cerebral autoregulation under natural conditions of spontaneous activity (see Section 6.2). We also note that
the timing of the two peaks in the glucoleptic PDM coincides with the timing of the peak
and trough of the glucogenic PDM. Furthermore, the time scale of these dynamics of
spontaneous activity (in dogs) is longer than in the previous injected insulin-glucose
model in humans (almost double).
The spontaneous insulin-to-glucose PDM model (if confirmed with additional data)
offers great potential as a diagnostic tool, because the glucoleptic PDM quantifies the
aggregate effect of insulin on glucose uptake (a reliable measure of insulin sensitivity
that can diagnose Type II diabetes in rigorously defined grades), and the glucogenic
PDM quantifies the aggregate effect of insulin on new glucose production (a reliable diagnostic tool for possible malfunction of this process, related primarily to the liver and
the kidney).
Since the spontaneous activity data truly derive from a closed-loop system, we may reverse the input-output roles of insulin and glucose in order to derive a PDM model of glucose-induced insulin changes. The resulting PDM model of the glucose-to-insulin system
is shown in Figure 6.55 and exhibits biphasic dynamics in both PDMs, although the individual contributions of the two PDM branches (termed "insulinoleptic" and "insulinogenic") balance each other in a homeostatic fashion by virtue of the opposite signs of the
associated nonlinearities.
We can utilize the PDM model of the glucose-to-insulin system in a diagnostic context and use the measurement of the "insulinogenic" PDM (i.e., the PDM that represents
the glucose-induced production of insulin) to diagnose possible pancreatic deficiency/inadequacy and to quantify reliably various grades of Type I diabetes. A suitable diag-

Figure 6.55 PDM model for spontaneous glucose-to-insulin data in dog.



nostic role can also be assigned to the obtained "insulinoleptic" PDM, as a measure of
the subject's aggregate ability to utilize insulin for glucose catabolism. In this rigorous
and quantitative context, we can assess the effect of obesity, exercise, and other factors
(germane to metabolic processes) on the aggregate effect of insulin on blood glucose
concentration. This addresses a host of important health-related issues that have absorbed immense resources over many years with modest (and often equivocal) results to
date. Of course, the presented PDM results are preliminary.
It is interesting to note that the PDM models obtained consistently from human sub-
jects cannot be reproduced by the widely accepted "minimal model," pointing to its limit-
ed utility for prediction and control, unless it can be augmented to incorporate the other
important mechanisms (e.g., the glucogenic process and the effects of glucagon). In order
to demonstrate this point, we perform PDM analysis on the "minimal model" for parame-
ter values that are considered physiologically meaningful for normal humans in the litera-
ture. The resulting PDM model exhibits different characteristics from those observed in
the PDM models based on the actual data. This is to be expected since this "minimal mod-
el" only accounts for insulin-induced glucoleptic effects. The comparison can be made
also on the basis of the respective Volterra kernels and leads to the same conclusion. It
can be argued that the use of models that are not inductive (i.e., not based on the actual
data) but derive from abstract considerations (that are liable to subjective biases and possibly restrictive preconceived notions) may be misleading in certain cases and should be
avoided when lives are at risk.
Another issue of practical interest is the asymptotic behavior of the PDM model for
step inputs of variable strength (often referred to as the "steady-state" behavior). This
allows us to gain some insight into the functional importance of the PDM-associated
nonlinearities. The actual PDM dynamics are reflected on the transient portion of the
step response (e.g., the presence or absence of undershoot, and the extent of settling
time). However, the asymptotic value of the step response (i.e., after the settling time
has elapsed) reflects the form of the associated nonlinearities when viewed as a function
of the step amplitude. Thus, if a step input of amplitude A is applied, then the jth PDM
output reaches asymptotically (i.e., after the settling time) the value uⱼ = A pⱼ, where pⱼ
denotes the integral of the jth PDM. The contribution of this PDM to the system output
is yⱼ = fⱼ(uⱼ), where fⱼ denotes the associated nonlinearity for the jth PDM. The resulting
asymptotic value of the system output is

y(A) = y₀ + Σⱼ yⱼ = y₀ + Σⱼ fⱼ(A pⱼ)    (6.13)
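The asymptotic relation (6.13) can be checked numerically. The sketch below builds a hypothetical two-branch PDM model (the PDM waveforms, nonlinearities, and basal value y0 are illustrative assumptions, not quantities from the text) and compares the closed-form asymptote against a direct convolution simulation of a long step input.

```python
import numpy as np

# Hypothetical two-branch PDM model: waveforms, nonlinearities, and basal
# value y0 are illustrative assumptions, not fitted quantities from the text.
dt = 1.0
lags = np.arange(0.0, 50.0, dt)
pdm1 = -np.exp(-lags / 10.0)            # "glucoleptic" branch (negative integral)
pdm2 = 0.5 * np.exp(-lags / 20.0)       # "glucogenic" branch (positive integral)
f1 = lambda u: u - 0.05 * u**2          # associated static nonlinearities
f2 = lambda u: 0.8 * u
y0 = 90.0                               # basal output level

def asymptotic_output(A):
    """Eq. (6.13): y(A) = y0 + sum_j f_j(A * p_j), with p_j the PDM integral."""
    p1, p2 = dt * pdm1.sum(), dt * pdm2.sum()
    return y0 + f1(A * p1) + f2(A * p2)

def simulated_output(A, n=2000):
    """Brute-force check: convolve a long step input with each PDM branch."""
    x = A * np.ones(n)
    u1 = dt * np.convolve(x, pdm1)[n - 1]   # settled branch outputs
    u2 = dt * np.convolve(x, pdm2)[n - 1]
    return y0 + f1(u1) + f2(u2)
```

With these assumed branches, the two functions agree once the step response has settled, since the convolution of a long step with each PDM converges to A times the PDM integral.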

The "steady-state" behavior of the "minimal model" given by Equations (1.18) and
(1.19) is given for a step insulin input (relative to its basal value) whereby the asymptotic
value of insulin action is X = Py4/P2' and the asymptotic value of the glucose concentra-
tion is

G = Gb(1 + ~A]-l (6.14)


PtP2

It is evident from Equation (6.14) that the "minimal model" cannot reproduce the actual
354 SELECTED APPLICATlONS

steady-state behavior of the system as represented by the inductive (data-true) PDM model and given by Eq. (6.13).
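The "steady-state" prediction (6.14) can itself be verified by direct simulation. The sketch below integrates the minimal model in one common form (Ġ = -(p₁ + X)G + p₁Gb, Ẋ = -p₂X + p₃A for a step insulin deviation A) with illustrative parameter values (assumed, not taken from the text) and checks the asymptote against Eq. (6.14).

```python
def minimal_model_step(A, p1=0.03, p2=0.02, p3=1e-5, Gb=90.0, dt=0.1, T=5000.0):
    """Forward-Euler integration of the 'minimal model' in the common form
    dG/dt = -(p1 + X)G + p1*Gb, dX/dt = -p2*X + p3*A, for a step insulin
    deviation A above basal (parameter values are illustrative only)."""
    G, X = Gb, 0.0
    for _ in range(int(T / dt)):
        G, X = (G + dt * (-(p1 + X) * G + p1 * Gb),
                X + dt * (-p2 * X + p3 * A))
    return G

def steady_state(A, p1=0.03, p2=0.02, p3=1e-5, Gb=90.0):
    """Eq. (6.14): G = Gb * (1 + p3*A/(p1*p2))**(-1)."""
    return Gb / (1.0 + p3 * A / (p1 * p2))
```

Note that the asymptote depends on A only through the static map (6.14): a single sigmoid-free hyperbolic decrease, which is the restriction the text contrasts with the richer form (6.13).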
To achieve better agreement with the data-true inductive model, we propose a "new
minimal model" that incorporates in the "glucose balance" equation two more "state variables" in bilinear terms (representing internal glucose and insulin production) and assumes the form

dG/dt = -β(G - Gb) - γ₁(z₀ + z₁)G + γ₂z₂    (glucose balance)    (6.15)

dx₁/dt = -α₁x₁ + I,    z₁ = f₁(x₁) > 0    (insulin action)    (6.16)

dx₂/dt = -α₂x₂ + I,    z₂ = f₂(x₂) > 0    (glucose production)    (6.17)

dz₀/dt = -α₀z₀ + γ₀G    (insulin production)    (6.18)

Note that two of the "state variables" (insulin action and glucose production) are subjected to static nonlinear transformations f₁ and f₂ prior to entering into the bilinear terms of
the "glucose balance" equation. This model yields the "steady-state" expression

G = [Gb + (γ₂/β) f₂(A/α₂)] [(γ₀γ₁/(α₀β)) G + 1 + (γ₁/β) f₁(A/α₁)]⁻¹    (6.19)

that has a single positive solution G for each A, because the discriminant of this quadratic
equation in G is positive and the product of the two real roots is negative (the negative
root is not admissible because G must be positive). Proper selection of the functions f₁ and
f₂ can reproduce the steady-state behavior observed in the actual data and endow the "new
minimal model" with greater versatility of dynamics capable of explaining more complicated behavior.
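Equations (6.15)-(6.18) and the implicit steady-state relation (6.19) can likewise be checked by simulation. The sketch below uses illustrative parameter values and arbitrary positive nonlinearities f₁, f₂ (all assumptions, not from the text); after integrating to steady state, the simulated G is verified to satisfy the quadratic obtained by clearing the denominator in Eq. (6.19).

```python
import math

# Illustrative parameters and positive nonlinearities (assumed, not from the text)
beta, g0, g1, g2 = 0.05, 1e-4, 1e-3, 5e-4     # beta, gamma_0, gamma_1, gamma_2
a0, a1, a2, Gb = 0.10, 0.05, 0.02, 90.0       # alpha_0, alpha_1, alpha_2, basal G
f1 = lambda x: math.log1p(max(x, 0.0))        # z1 = f1(x1) > 0
f2 = lambda x: math.sqrt(max(x, 0.0))         # z2 = f2(x2) > 0

def new_minimal_model_step(A, dt=0.05, T=5000.0):
    """Forward-Euler integration of Eqs. (6.15)-(6.18) for a step input I = A."""
    G, x1, x2, z0 = Gb, 0.0, 0.0, 0.0
    for _ in range(int(T / dt)):
        z1, z2 = f1(x1), f2(x2)
        dG = -beta * (G - Gb) - g1 * (z0 + z1) * G + g2 * z2   # Eq. (6.15)
        G, x1, x2, z0 = (G + dt * dG,
                         x1 + dt * (-a1 * x1 + A),             # Eq. (6.16)
                         x2 + dt * (-a2 * x2 + A),             # Eq. (6.17)
                         z0 + dt * (-a0 * z0 + g0 * G))        # Eq. (6.18)
    return G

def quadratic_residual(G, A):
    """Eq. (6.19) with the denominator cleared: vanishes at steady state."""
    z1, z2 = f1(A / a1), f2(A / a2)
    return (g0 * g1 / a0) * G**2 + (beta + g1 * z1) * G - (beta * Gb + g2 * z2)
```

Because Eq. (6.19) is implicit in G (a quadratic), the steady state can also be obtained in closed form from its unique positive root; the residual check above is equivalent.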
At this juncture, it is instructive to note that a compartmental or parametric model,
such as the "minimal model" or any other, can be validated by means of Volterra analysis.
Specifically, the Volterra kernels of the parametric model can be compared with the estimated Volterra kernels using the input-output data. Agreement between all pairs of kernels of the same order validates the parametric model because of the uniqueness of the
Volterra representation for practically any system (provided that the Volterra kernel estimates are unbiased, being derived for a complete model). Disagreement between any pair
of Volterra kernels invalidates the parametric model, at least for the respective order of
nonlinearity and to the extent indicated by the observed discrepancy.
For instance, it can be shown that for p₃ ≪ 1, the "minimal model" can be approximated well by a second-order Volterra model with kernels [Marmarelis & Mitsis, 2000]:

k₀ = Gb    (6.20)

k₁(τ) = -Gb [p₃/(p₂ - p₁)] [exp(-p₁τ) - exp(-p₂τ)]    (6.21)

k₂(τ₁, τ₂) = (1/(2Gb)) {k₁(τ₁)k₁(τ₂) + p₁ ∫₀^τₘ exp(-p₁λ) k₁(τ₁ - λ) k₁(τ₂ - λ) dλ},  τₘ = min(τ₁, τ₂)    (6.22)

If the estimated first-order Volterra kernel cannot be fitted by the two-pole expression of
Equation (6.21), then the more general expression of Equation (3.70) can be applied that
was derived for systems with multiple "state variables" (i.e., bilinear terms in the glucose-balance equation). The second-order Volterra kernel can be used for finer validation and
to guide possible adjustments of the parametric model if discrepancies are observed.
Clearly, fitting the second-order kernel is more complicated than fitting the first-order
kernel, but it also provides unique insight into the system nonlinearities (i.e., many systems or models may look alike at the first order but be distinctly different at the second
order and exhibit drastically different input-output behavior).
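The validation idea can be illustrated with the first-order kernel: generate a "measured" kernel from the two-pole expression (6.21) with known poles plus a little noise, then recover the poles by a least-squares grid search. All numerical values here (parameters, noise level, grid) are illustrative assumptions.

```python
import numpy as np

Gb, p3 = 90.0, 1e-5
p1_true, p2_true = 0.03, 0.02            # illustrative 'true' poles
tau = np.arange(0.0, 400.0, 1.0)

def two_pole(tau, p1, p2):
    """First-order kernel of Eq. (6.21)."""
    return -Gb * p3 / (p2 - p1) * (np.exp(-p1 * tau) - np.exp(-p2 * tau))

rng = np.random.default_rng(0)
k1_measured = two_pole(tau, p1_true, p2_true) + rng.normal(0.0, 1e-5, tau.size)

# Least-squares grid search over candidate pole pairs (p1, p2)
grid = np.linspace(0.005, 0.060, 56)
sse, p1_fit, p2_fit = min(
    (float(np.sum((two_pole(tau, a, b) - k1_measured) ** 2)), a, b)
    for a in grid for b in grid if abs(a - b) > 1e-9)
```

Since (6.21) is symmetric under interchange of p₁ and p₂, the fit recovers the unordered pole pair; a validation of the parametric model would then compare this fitted kernel shape against the kernel estimated directly from input-output data.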
Another interesting observation on the expression of the second-order kernel in Equation (6.22) is that the first term within the brackets corresponds to a squarer applied to the
output of a filter k₁, and the second term corresponds to a cascaded operation of the square
of the k₁ filter output with another filter h(λ) = p₁ exp(-p₁λ). This equivalent modular
model is an approximation of the precise modular model shown in Figure 1.10 for p₃ ≪ 1,
and takes the form of an "Sm model" initially proposed by Baumgartner & Rugh (1975)
and studied by Chen et al. (1986).
The question naturally arises as to what is a PDM approximation of this modular mod-
el. Clearly, one PDM can be considered to be equal to k_1, associated with the quadratic
nonlinearity. However, the other branch of the modular model may be approximated by
an arbitrary number of PDMs due to the action of the posterior filter. If only two PDMs
are desired, then the second PDM g_2(τ) should yield the best least-squares approximation
by minimizing the mean-square error:

J = ∫_0^∞ ∫_0^∞ { g_2(τ_1) g_2(τ_2) - [p_1/(2G_b)] ∫_0^{min(τ_1,τ_2)} exp(-p_1 λ) k_1(τ_1 - λ) k_1(τ_2 - λ) dλ }^2 dτ_1 dτ_2    (6.23)

The solution of this minimization problem can be facilitated if we consider an expansion
of the second PDM g_2(τ) on an appropriate basis of functions {b_j(τ)} (e.g., Laguerre func-
tions) and then minimize J by differentiating with respect to the expansion coefficients. A
more practicable alternative is to minimize J in the frequency domain, since we know that

J = ∫_{-∞}^{∞} ∫_{-∞}^{∞} | G_2(ω_1) G_2(ω_2) - [1/(2G_b)] K_1(ω_1) K_1(ω_2) p_1/[p_1 + j(ω_1 + ω_2)] |^2 dω_1 dω_2    (6.24)

where G_2 and K_1 denote the Fourier transforms of g_2 and k_1, respectively. It is evident
from Equation (6.24) that if the "minimal model" can be approximated well by means of
two PDMs, then the second PDM has the transfer function

G_2(ω) = [2G_b]^{-1/2} Q(ω) K_1(ω)    (6.25)

where the filter Q is such that

Q(ω_1) Q(ω_2) ≈ p_1/[p_1 + j(ω_1 + ω_2)]    (6.26)
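Equation (6.26) cannot hold exactly for all frequency pairs, since its right-hand side depends on ω_1 and ω_2 only through their sum; this is why only an approximate equality is required. One natural candidate, offered here as an assumption rather than the book's prescription, is the square-root filter Q(ω) = [p_1/(p_1 + 2jω)]^{1/2}, which satisfies Equation (6.26) exactly along the diagonal ω_1 = ω_2:

```python
import numpy as np

p1 = 0.03                                 # hypothetical pole value
w = np.linspace(-1.0, 1.0, 201)           # frequency axis (rad/min)

Q = np.sqrt(p1 / (p1 + 2j * w))           # candidate square-root filter
lhs_diag = Q * Q                          # Q(w1) Q(w2) with w1 = w2 = w
rhs_diag = p1 / (p1 + 1j * (w + w))       # right-hand side of Eq. (6.26)
max_err = np.max(np.abs(lhs_diag - rhs_diag))
```

Off the diagonal the two sides differ, and that residual discrepancy is exactly what the cost J of Equation (6.24) measures.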

Figure 6.56 PDM model for spontaneous FFA-to-glucose data in dog.

The static nonlinearity associated with the second PDM is a simple squarer in this ap-
proximation. It is worth noting that the first (glucoleptic) PDM of the human subject in
Figure 6.52a resembles the first-order Volterra kernel of the "minimal model" (plotted in
Figure 1.9), but not for the second data segment shown in Figure 6.52b. The agreement
ceases also for the overall PDM model and for the second-order kernels.
Turning to the FFA data, we obtain a PDM model of the FFA impact (by considering it
as an input) on the glucose concentration (output), which is expected to be antagonistic to
the effect of insulin. This model has only one PDM, shown in Figure 6.56, depicting
the anticipated antagonistic effect on glucose (lasting over ~2 hr) by comparison with the
glucoleptic PDM of Figure 6.54, although a weak "counterregulation" phase follows after
the 2 hr lag and lasts up to ~5 hr. Since another point of interest is how changes in plas-
ma insulin concentration affect the FFA concentration, we obtain a PDM model with in-
sulin as input and FFA as output. The resulting model has one PDM and is shown in Fig-
ure 6.57, depicting an agonistic effect of insulin on FFA (observe the positivity of the
associated nonlinearity that turns both positive and negative phases of the biphasic PDM
into a positive effect on the FFA output). This result clearly shows that any change in plas-
ma insulin causes an elevation in FFA concentration.

Figure 6.57 PDM model for spontaneous insulin-to-FFA data in dog.



An intriguing result is also presented in Section 7.2.3 in connection with the nonlinear
dynamic impact on glucose concentration of the combined application of insulin and FFA
inputs (obtained from the spontaneous data in the dog). This result appears to elucidate in
a quantitative manner the effect of obesity on the insulin sensitivity of glucose utilization
(confirming the anticipated reduction of insulin sensitivity for elevated FFA concentra-
tion) [Rebrin et al., 1995; Marmarelis et al., 2002].
These results illustrate the wealth of knowledge that can be extracted from natural
(spontaneous) physiological data by the proper application of the advocated methodology
in the field of metabolic-endocrine systems. The author is aware that this field has been
dominated by compartmental modeling up to now, in part because of the sparsity of the
available time-series data. Nonetheless, as the measurement technologies improve (e.g.,
continuous glucose sensors), the use of the advocated methods becomes feasible and the
potential benefits for the advancement of scientific knowledge and clinical practice ap-
pear to be exceptional.
7
Modeling of
Multiinput/Multioutput Systems

This chapter is dedicated to the important problem of modeling physiological systems
with multiple inputs and multiple outputs, which is attracting growing attention as it
becomes increasingly evident that reliable understanding of the function of physiological
systems will be possible only when the full cast of protagonists can be taken into account
in a single model. This realization arises from the fact that physiological mechanisms are
often highly interconnected and subject to multiple influences. This complex interconnec-
tivity may take the form of cascaded and parallel operations (e.g., fan-in and fan-out for-
ward architectures of neurosensory and neuromotor systems) or closed-loop operations
(e.g., homeostatic mechanisms of cardiovascular-respiratory or metabolic-endocrine au-
toregulation). The latter case is far more challenging, as it often exhibits complicated in-
terconnectivity in nested loops that may require a different methodological approach such
as the one presented in Chapter 10.
In this chapter, we focus on the modeling problem of multiple causal pathways be-
tween selected input and output variables. For each output variable, an extended multiin-
put Volterra series expression is sought that describes the causal relationship between the
output variable and all the selected input variables. Note that a specific variable can be
viewed both as an output and as an input (for another output variable, or even in auto-re-
gressive relation with itself as discussed in Chapter 10.)
In the following section, we start with the two-input case, the simplest example of mul-
tiinput modeling, which has found several interesting applications to date (illustrated in
Section 7.2). The full multiinput case is discussed in Section 7.3 and culminates in the im-
portant spatiotemporal case in Section 7.4. The latter represents the best example so far of
multiinput modeling in the visual system.
It must be emphasized at the outset that the nonparametric modeling approach (and its
derivative network-based methodologies) offer a tremendous methodological advantage
over the parametric modeling approach in the case of multiple inputs. This was first rec-
ognized by my brother Panos in the realm of the visual system 30 years ago and remains
Nonlinear Dynamic Modeling of Physiological Systems. By Vasilis Z. Marmarelis
ISBN 0-471-46960-2 © 2004 by the Institute of Electrical and Electronics Engineers.

the most promising direction in our efforts to disentangle the complexity of multivariate
physiological function in a manner that remains "true to the data."

7.1 THE TWO-INPUT CASE

In the case of two inputs, the output can be expressed in terms of the extended Volterra
series that includes a set of kernels for each input (termed "self-kernels") and a set of
"cross-kernels" that describe the dynamic nonlinear interactions between the two inputs
as they affect the output [Marmarelis & McCann, 1973; Marmarelis & Naka, 1973c]:

y(t) = k_{0,0} + ∫_0^∞ k_{1,0}(τ) x_1(t-τ) dτ + ∫_0^∞ k_{0,1}(τ) x_2(t-τ) dτ
  + ∫_0^∞ ∫_0^∞ k_{2,0}(τ_1, τ_2) x_1(t-τ_1) x_1(t-τ_2) dτ_1 dτ_2
  + ∫_0^∞ ∫_0^∞ k_{0,2}(τ_1, τ_2) x_2(t-τ_1) x_2(t-τ_2) dτ_1 dτ_2
  + ∫_0^∞ ∫_0^∞ k_{1,1}(τ_1, τ_2) x_1(t-τ_1) x_2(t-τ_2) dτ_1 dτ_2
  + ...
  + ∫_0^∞ ... ∫_0^∞ k_{m,n}(τ_1, ..., τ_{m+n}) x_1(t-τ_1) ... x_1(t-τ_m) x_2(t-τ_{m+1}) ... x_2(t-τ_{m+n}) dτ_1 ... dτ_{m+n}
  + ...    (7.1)

where k_{m,n}(τ_1, ..., τ_{m+n}) denotes the kernel that convolves m times with the first input
x_1(t) and n times with the second input x_2(t). Clearly, when mn ≠ 0, k_{m,n} is a cross-kernel,
and when m = 0, k_{0,n} is the nth-order self-kernel for input x_2, and vice versa. The cross-
kernels describe the nonlinear interactions between the two inputs as they affect the out-
put and (unlike the self-kernels) are not symmetric about their diagonals (i.e., the tempo-
ral arrangement between the two inputs matters in terms of the elicited response).
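In discrete time, a second-order truncation of Equation (7.1) is straightforward to evaluate. The sketch below (Python; all kernels are arbitrary toy shapes chosen only for illustration) computes the output of such a model, including the cross-kernel term:

```python
import numpy as np

rng = np.random.default_rng(0)
N, M = 256, 16                          # record length and kernel memory
x1 = rng.standard_normal(N)
x2 = rng.standard_normal(N)

# Toy kernels (illustrative shapes only)
m = np.arange(M)
k10 = np.exp(-m / 4.0)
k01 = -np.exp(-m / 8.0)
k20 = 0.1 * np.outer(k10, k10)          # self-kernels: symmetric by construction
k02 = 0.1 * np.outer(k01, k01)
k11 = 0.2 * np.outer(k10, k01)          # cross-kernel: need not be symmetric

def lagged(x, M):
    """X[n, m] = x(n - m), with zeros before the record start."""
    X = np.zeros((len(x), M))
    for lag in range(M):
        X[lag:, lag] = x[:len(x) - lag]
    return X

X1, X2 = lagged(x1, M), lagged(x2, M)
y = (X1 @ k10 + X2 @ k01
     + np.einsum('ni,ij,nj->n', X1, k20, X1)     # (2,0) self-term
     + np.einsum('ni,ij,nj->n', X2, k02, X2)     # (0,2) self-term
     + np.einsum('ni,ij,nj->n', X1, k11, X2))    # (1,1) cross-term
```

Swapping the roles of the two inputs in the cross-term generally changes the output, which is the discrete counterpart of the statement that cross-kernels are not symmetric between the two inputs.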
The zeroth-order kernel k_{0,0} is common for both inputs. The self-kernels for either in-
put in the two-input Volterra model are the same as the kernels of the same input in a sin-
gle-input Volterra model, if and only if the models are complete (i.e., not truncated) and
the other input is not active. If the other input is active, then the single-input Volterra
model will attain a nonstationary form (see Chapter 9) because of the functional terms in
Equation (7.1) that contain the other input.
The two-input Volterra series can be orthogonalized for two independent GWN inputs
in a manner similar to the single-input Wiener series orthogonalization, yielding the two-
input Wiener series [Marmarelis & McCann, 1973]:

y(t) = h_{0,0} + ∫_0^∞ h_{1,0}(τ) x_1(t-τ) dτ + ∫_0^∞ h_{0,1}(τ) x_2(t-τ) dτ
  + ∫_0^∞ ∫_0^∞ h_{2,0}(τ_1, τ_2) x_1(t-τ_1) x_1(t-τ_2) dτ_1 dτ_2 - P_1 ∫_0^∞ h_{2,0}(λ, λ) dλ
  + ∫_0^∞ ∫_0^∞ h_{0,2}(τ_1, τ_2) x_2(t-τ_1) x_2(t-τ_2) dτ_1 dτ_2 - P_2 ∫_0^∞ h_{0,2}(λ, λ) dλ
  + ∫_0^∞ ∫_0^∞ h_{1,1}(τ_1, τ_2) x_1(t-τ_1) x_2(t-τ_2) dτ_1 dτ_2
  + ...    (7.2)

where the Wiener kernels {h_{i,j}} are generally distinct from their Volterra counterparts and
depend on both power levels P_1 and P_2 of the two independent GWN inputs x_1 and x_2. For
instance, the zeroth-order Wiener kernel is expressed in terms of the input-independent
Volterra kernels as

h_{0,0} = Σ_{m=0}^∞ Σ_{n=0}^m [(2m-2n)!(2n)! / ((m-n)! n! 2^m)] P_1^{m-n} P_2^n ∫_0^∞ ... ∫_0^∞ k_{2m-2n,2n}(λ_1, λ_1, ..., λ_m, λ_m) dλ_1 ... dλ_m    (7.3)

which implies that h_{0,0} contains components generally dependent on all even-order
Volterra kernels and on both power levels of the two GWN inputs.
This is an important point to keep in mind when the Wiener kernel estimates of the
two-input model are interpreted. To elaborate this point for the practically important first-
order Wiener kernel, we note that

h_{1,0}(τ) = Σ_{m=0}^∞ Σ_{n=0}^m [(2m+1-2n)!(2n)! / ((m-n)! n! 2^m)] P_1^{m-n} P_2^n ∫_0^∞ ... ∫_0^∞ k_{2m+1-2n,2n}(τ, λ_1, λ_1, ..., λ_m, λ_m) dλ_1 ... dλ_m    (7.4)

which implies that the first-order Wiener "self-kernel" actually depends on the higher
odd-order Volterra cross-kernels, if such exist. Therefore, in the Wiener formulation of
the two-input problem, the interpretation of the "self-kernels" must be made with great
caution and proper attention to the possible effects of higher-order cross-kernels repre-
senting interaction nonlinearities. Similar statements apply to the second-order Wiener
kernels and so on, observing the separation between odd and even terms.
The first application of the two-input modeling approach to physiological systems was
made by Panos Marmarelis and Gilbert McCann on directionally selective cells in the fly
eye [Marmarelis & McCann, 1973], followed shortly by an application to horizontal, bipo-
lar, and ganglion cells in the catfish retina by Panos Marmarelis and Ken Naka [Mar-
marelis & Naka, 1973c]. The first application used two spot stimuli to emulate directional
motion, and the second application used spot and annulus stimuli to test the center-sur-
round organization of retinal receptive fields. These applications are viewed as pivotal in
establishing the feasibility of the multiinput modeling approach in systems physiology,
culminating in the spatiotemporal modeling of the vertebrate retina by the same pioneers
(see Section 7.4). These pioneering applications, as well as two recent applications to the
metabolic and cardiovascular systems, are presented in Section 7.2 as illustrative exam-
ples of two-input modeling.
In this section, we present the three main methodologies for estimating the self-kernels
and cross-kernels in the two-input case. For historical reasons, we present first in Section
7.1.1 the cross-correlation technique that was used in the aforementioned pioneering ap-
plications for the estimation of Wiener self- and cross-kernels of a second-order model.
We follow in Section 7.1.2 with the adaptation of the kernel-expansion technique to the
two-input case, which yields the Volterra self- and cross-kernels of (possibly higher order)
models. The kernel-expansion approach naturally leads to the PDM variant for two or
multiple inputs, and this in turn leads to Volterra-equivalent network models that are dis-
cussed in Section 7.1.3. The kernel-expansion approach and its PDM or network-based
variants yield accurate Volterra-type high-order models from short data records and,
therefore, are recommended as the methods of choice for two-input modeling. Their effi-
ciency becomes critical in the case of multiinput modeling, as will be discussed in Sec-
tions 7.3 and 7.4.

7.1.1 The Two-Input Cross-Correlation Technique


As indicated above, this is the technique originally used for two-input modeling by Panos
Marmarelis and his colleagues. It employs two independent GWN (or other quasiwhite)
input signals and utilizes the orthogonal Wiener series formalism ofEquation (7.2) to ob-
tain estimates ofthe Wiener kernels as

1
hm.n{ Tb . . . ,Tm+n) = m!n!pmpn E[Ym.n(t)xI{t- Tl)' .. Xl{t - Tm)x2{t - Tm+l) ... X2{t- Tm+n)]
I 2 (7.5)

where y_{m,n}(t) denotes the output residual at the (m, n) estimation step (i.e., after sub-
tracting from the output signal the contributions of all previously estimated Wiener ker-
nels of lower order). The reason for using the output residual (instead of the output sig-
nal itself) was elaborated in the single-input case and pertains to the accuracy of the
estimation of the diagonal values of the Wiener kernels. The ensemble average indicat-
ed in Equation (7.5) is replaced in practice by time-averaging (over finite-length
records) on the assumption of stationarity and ergodicity of the input-output signals.
Note that, unlike the self-kernels, the cross-kernels are not symmetric about some of
their diagonals that separate the effects of the two inputs.
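A minimal numerical sketch of the estimator in Equation (7.5), using time-averaging in place of the ensemble average, is given below for a first-order self-kernel and the second-order cross-kernel. The toy system (a short linear filter on x_1 plus a single product term x_1(n-1)x_2(n-2)) is an assumption made purely to provide ground truth:

```python
import numpy as np

rng = np.random.default_rng(1)
N, M = 200_000, 6
P1 = P2 = 1.0                               # power levels of the two GWN inputs
x1 = rng.standard_normal(N)
x2 = rng.standard_normal(N)

# Toy two-input system with known kernels
g = np.array([1.0, 0.5, 0.25])              # first-order self-kernel for x1
y = np.convolve(x1, g)[:N] + np.roll(x1, 1) * np.roll(x2, 2)

# h_{1,0}(m) = E[y(n) x1(n-m)] / P1, estimated by time-averaging
h10 = np.array([np.mean(y[M:] * x1[M - m:N - m]) for m in range(M)]) / P1

# h_{1,1}(m1, m2) = E[y(n) x1(n-m1) x2(n-m2)] / (P1 P2)
h11 = np.array([[np.mean(y[M:] * x1[M - m1:N - m1] * x2[M - m2:N - m2])
                 for m2 in range(M)] for m1 in range(M)]) / (P1 * P2)
```

With this record length the estimates should approximately recover g in h10 and a single off-diagonal peak at h11(1, 2), while h11(2, 1) stays near zero, illustrating the cross-kernel asymmetry discussed above.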
The limitations of the cross-correlation technique and the associated estimation errors
were discussed extensively in Chapter 2, and the same analysis applies to the multiinput
case as well. The application of the two-input cross-correlation technique to actual physi-
ological systems is illustrated in Section 7.2.

7.1.2 The Two-Input Kernel-Expansion Technique


As in the single-input case, the Volterra kernels ofthe system can be expanded on ajudi-
ciously chosen basis of functions (e.g., the Laguerre basis) in order to reduce the number
of free parameters that need be estimated from input-output data. Note that due to the
asymmetry ofthe cross-kernels, the number of expansion coefficients for cross-kernels is
larger than for self-kemels of the same order. The expansion for the Volterra kernel of or-
der (m, n) is

km,n(TI,· .. ,Tm+m) =
L L
I ...jm+n=l
Jl=l
I Cm,nU},··· ,jm+n)bjI(TI) . . . bjm(Tm)bjm+I(Tm+l) . . . bjm+n(Tm+n) (7.6)

where b_j(τ) denotes the jth basis function and {c_{m,n}} are the expansion coefficients of k_{m,n}
(to be estimated from input-output data). Note that the indices j_1 to j_m correspond to the
input x_1 (m product terms) and the indices j_{m+1} to j_{m+n} correspond to the input x_2 (n prod-
uct terms).
It is evident that the kernel value is the same for any permutation of the first set of in-
dices (as in the case of a self-kernel) or for any permutation of the second set of indices.
However, permutations across the two sets of indices correspond to distinct kernel values
(i.e., asymmetry of cross-kernels about some diagonals). Note that the two sets of indices
remain segregated by convention (i.e., the first m correspond to input x_1 and the last n cor-
respond to input x_2). It follows from these observations that the kernel of order (m, n) has
L^{m+n}/(m!n!) distinct expansion coefficients, where L is the number of employed basis
functions.
The kernel expansion of Equation (7.6) leads to the following "two-input modified
Volterra model":

y(t) = k_{0,0} + Σ_{j_1=1}^L c_{1,0}(j_1) v_{j_1}^{(1)}(t) + Σ_{j_1=1}^L c_{0,1}(j_1) v_{j_1}^{(2)}(t) + ...
  + Σ_{j_1=1}^L ... Σ_{j_m=1}^L Σ_{j_{m+1}=1}^L ... Σ_{j_{m+n}=1}^L c_{m,n}(j_1, ..., j_{m+n}) v_{j_1}^{(1)}(t) ... v_{j_m}^{(1)}(t) v_{j_{m+1}}^{(2)}(t) ... v_{j_{m+n}}^{(2)}(t)
  + ...    (7.7)

where v_j^{(i)} denotes the convolution of b_j with x_i. Note that each input may also employ its
own set of basis functions (if such practice is deemed useful in a given application), as
shown later for multiinput Volterra-equivalent network models. The expansion coeffi-
cients in the model of Equation (7.7) can be estimated through linear least-squares proce-
dures because they enter linearly into the model (note that the signals v_j^{(i)}(t) are computed
as convolutions between the basis functions and the respective input signal). Although the
model of Equation (7.7) is written in continuous time, its discrete equivalent (resulting
through appropriate sampling) is the one used in practice for estimation and other compu-
tational purposes (e.g., prediction).
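The discrete-time estimation procedure can be sketched as follows, truncated to the first-order terms of Equation (7.7) for brevity. The discrete Laguerre filter bank is generated by a recursive implementation (one common convention; the sign convention of the all-pass stage varies across references), and the expansion coefficients are then obtained by ordinary least squares. The toy first-order target system is an assumption used only to provide ground truth:

```python
import numpy as np

def laguerre_outputs(x, L, alpha):
    """Outputs v_j(n), j = 0..L-1, of a discrete Laguerre filter bank
    (recursive implementation; one common sign convention)."""
    N, sa = len(x), np.sqrt(alpha)
    v = np.zeros((L, N))
    for n in range(N):
        prev = v[:, n - 1] if n > 0 else np.zeros(L)
        v[0, n] = sa * prev[0] + np.sqrt(1.0 - alpha) * x[n]
        for j in range(1, L):
            v[j, n] = sa * prev[j] + prev[j - 1] - sa * v[j - 1, n]
    return v

rng = np.random.default_rng(2)
N, L = 4096, 5
x1 = rng.standard_normal(N)
x2 = rng.standard_normal(N)

# Each input gets its own filter bank (distinct alpha), as the text allows
V = np.vstack([laguerre_outputs(x1, L, 0.3), laguerre_outputs(x2, L, 0.5)])

# Toy first-order two-input system with known expansion coefficients
c_true = rng.standard_normal(2 * L)
y = 0.7 + c_true @ V + 0.1 * rng.standard_normal(N)

# Linear least squares for k_{0,0} and the first-order coefficients of Eq. (7.7)
A = np.column_stack([np.ones(N), V.T])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
```

The impulse responses produced by this recursion form an (approximately, over a finite record) orthonormal set, which is what makes the least-squares problem well conditioned for GWN inputs.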
Thus, the estimation problem for the two-input Volterra model using the kernel-expan-
sion technique follows the procedural steps elaborated previously for the single-input
case. Likewise, in a manner similar to the single-input case, we can derive the equivalent
PDM model for the two-input case, as depicted in Figure 7.1. The PDM model can be
practically estimated for high-order systems and represents the most parsimonious form
of a Volterra-type model. The two-input PDM model is recommended as the key instru-

Figure 7.1 The PDM model for the two-input case. The PDM output u_j^{(i)} is the convolution of the PDM
(i, j) with the input x_i (i = 1, 2). The multiinput static nonlinearity f receives as inputs all the PDM outputs
and generates the model output. The PDMs and f are determined by means of procedures similar to
the ones described in Section 4.1.1 for the single-input case, using broadband input-output data.

ment for interpreting the observed dynamics and understanding the functional characteris-
tics of the subject system.
The PDMs and the associated multiinput static nonlinearity can be determined by
means of procedures similar to the ones outlined in Section 4.1.1 for the single-input case,
using the available input-output data. The latter ought to be broadband and the two input
signals ought to be weakly correlated in order to achieve high-quality estimates of the
PDMs and f. The reader should be reminded that the PDMs are determined either from
Volterra kernel estimates or through Volterra-equivalent network models (discussed in
the following section).

7.1.3 Volterra-Equivalent Network Models with Two Inputs


A most efficient approach to obtaining the two-input PDM model is the use of a "separa-
ble Volterra network" (SVN), whereby the multiinput static nonlinearity of Figure 7.1 is
"sliced" into individual dual-input static nonlinearities associated with each pair of PDMs
(one from each input). This leads to the two-input SVN model with a single hidden layer
shown in Figure 7.2, whereby the activation functions of the hidden units are the afore-
mentioned individual dual-input static nonlinearities. The PDMs are detennined by the
inbound weights of each hidden unit (each hidden unit corresponds to a distinct pair of
PDMs) in conjunction with the respective basis functions ofthe filter bank employed for
preprocessing the corresponding input. Note that the two filter banks may be distinct and
the mathematical formulation is now in discrete time (by computational necessity),
whereby n is the discrete-time index.
As indicated in Figure 7.2, the internal variable u_h(n) of the hth hidden unit is com-
posed as a weighted summation of all filter-bank outputs:

u_h(n) = Σ_{j=1}^{L_1} w_{h,j}^{(1)} v_j^{(1)}(n) + Σ_{j=1}^{L_2} w_{h,j}^{(2)} v_j^{(2)}(n)    (7.8)

Figure 7.2 The two-input separable Volterra network, where each input {x_1(n), x_2(n)} is preprocessed
through the respective filter bank {b_j^{(i)}} (i = 1, 2; j = 1, ..., L_i), and the respective filter-bank outputs
are fed into the hidden units of the hidden layer with polynomial activation functions {f_h}. The output
y(n) is formed by summation of the outputs of the hidden units {z_h(n)} and an offset y_0.

where w_{h,j}^{(i)} denotes the weight of the output v_j^{(i)}(n) of the jth filter in the ith filter bank, giv-
en by the convolution

v_j^{(i)}(n) = Σ_{m=0}^{M-1} b_j^{(i)}(m) x_i(n-m)    (7.9)

where b_j^{(i)} denotes the basis function that is the impulse response of the jth filter in the ith
filter bank. The PDM (i, h) in the two-input case is given by

p_{i,h}(m) = Σ_{j=1}^{L_i} w_{h,j}^{(i)} b_j^{(i)}(m)    (7.10)

where m denotes the discrete-time lag of the PDM. The internal variable u_h(n) of each
hidden unit is transformed by the polynomial static nonlinearity f_h to produce the output
of the hth hidden unit:

z_h(n) = f_h[u_h(n)] = Σ_{q=1}^Q c_{h,q} u_h^q(n)    (7.11)

The output of the network model is formed by simple summation of the outputs of the
hidden units and an offset y_0:

y(n) = y_0 + Σ_{h=1}^H z_h(n)    (7.12)

It is evident from Equations (7.8) and (7.10) that the PDM outputs for the two inputs
combine in summative pairs prior to the nonlinear transformation depicted in Equation
(7.11), which results in "nonlinear interaction" terms between the two inputs as they impact
the output of the model. Thus, even though the activation function f_h is univariate, it con-
tains nonlinear interaction terms between the two inputs.
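Equations (7.8)-(7.12) amount to a simple forward pass, sketched below in Python for hypothetical filter banks and randomly chosen weights (an illustration only, not a trained model):

```python
import numpy as np

rng = np.random.default_rng(3)
L1, L2, H, Q, N, M = 5, 5, 2, 2, 512, 32

# Hypothetical filter banks (any fixed basis works; DLFs are the usual choice)
m = np.arange(M)
B1 = np.array([np.exp(-m / (2.0 + j)) for j in range(L1)])
B2 = np.array([np.exp(-m / (4.0 + j)) for j in range(L2)])

x1 = rng.standard_normal(N)
x2 = rng.standard_normal(N)
v1 = np.array([np.convolve(x1, b)[:N] for b in B1])     # Eq. (7.9)
v2 = np.array([np.convolve(x2, b)[:N] for b in B2])

W1 = rng.standard_normal((H, L1))    # inbound weights w_{h,j}^{(1)}
W2 = rng.standard_normal((H, L2))    # inbound weights w_{h,j}^{(2)}
C = rng.standard_normal((H, Q))      # polynomial coefficients c_{h,q}
y0 = 0.5

u = W1 @ v1 + W2 @ v2                                     # Eq. (7.8)
z = sum(C[:, [q - 1]] * u ** q for q in range(1, Q + 1))  # Eq. (7.11)
y = y0 + z.sum(axis=0)                                    # Eq. (7.12)

n_params = (L1 + L2 + Q) * H + 3     # free-parameter count quoted in the text
```

The inbound weights and the filter banks together define the PDMs of Equation (7.10): by linearity of convolution, the same u would be obtained by convolving each input directly with p_{i,h} = W_i B_i.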
A critical advantage of this modeling approach over the previous two (cross-correla-
tion and kernel-expansion) is that the number of free parameters is linear with respect to
the nonlinear order Q (and with a rate of increase equal to H), making it very efficient
for high-order systems. This issue is further discussed in Section 7.3 in connection with
multiinput modeling. Note that the number of free parameters for the network model of
Figure 7.2 is (L_1 + L_2 + Q)H + 3, where one free parameter is allowed for each filter
bank.
The correspondence between the Volterra kernels of the two-input system and the pa-
rameters of the Volterra-equivalent network of Figure 7.2 is found in a manner similar to
the single-input case. By proper substitution of the analytical expressions (7.8)-(7.12), we
obtain

k_{0,0} = y_0    (7.13)

k_{1,0}(m) = Σ_{h=1}^H c_{h,1} p_{1,h}(m)    (7.14)

k_{0,1}(m) = Σ_{h=1}^H c_{h,1} p_{2,h}(m)    (7.15)

k_{2,0}(m_1, m_2) = Σ_{h=1}^H c_{h,2} p_{1,h}(m_1) p_{1,h}(m_2)    (7.16)

k_{0,2}(m_1, m_2) = Σ_{h=1}^H c_{h,2} p_{2,h}(m_1) p_{2,h}(m_2)    (7.17)

k_{1,1}(m_1, m_2) = Σ_{h=1}^H c_{h,2} [p_{1,h}(m_1) p_{2,h}(m_2) + p_{1,h}(m_2) p_{2,h}(m_1)]    (7.18)

etc.

Upon training of the network model of Figure 7.2 with the input-output data, the
PDMs and/or the Volterra kernels of the system can be constructed by means of the ex-
pressions given above. This methodology has been shown to perform exceptionally well,
even for short data records and in the presence of considerable noise. An illustrative ex-
ample of a simulated system is given below, and examples of applications to actual physi-
ological systems (neurosensory, cardiovascular, and metabolic) are given in Section 7.2.
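The reconstruction formulas (7.13)-(7.18) translate directly into array operations. The sketch below assumes a hypothetical (untrained) network with known weights and simply assembles the kernels:

```python
import numpy as np

rng = np.random.default_rng(4)
L1 = L2 = 4; H = 2; M = 20
m = np.arange(M)
B1 = np.array([np.exp(-m / (1.0 + j)) for j in range(L1)])   # filter bank 1
B2 = np.array([np.exp(-m / (3.0 + j)) for j in range(L2)])   # filter bank 2
W1 = rng.standard_normal((H, L1))
W2 = rng.standard_normal((H, L2))
C = rng.standard_normal((H, 2))      # columns hold c_{h,1} and c_{h,2}
y0 = 1.0

P1, P2 = W1 @ B1, W2 @ B2            # PDMs, Eq. (7.10)

k00 = y0                                            # Eq. (7.13)
k10 = C[:, 0] @ P1                                  # Eq. (7.14)
k01 = C[:, 0] @ P2                                  # Eq. (7.15)
k20 = np.einsum('h,hi,hj->ij', C[:, 1], P1, P1)     # Eq. (7.16)
k02 = np.einsum('h,hi,hj->ij', C[:, 1], P2, P2)     # Eq. (7.17)
E = np.einsum('h,hi,hj->ij', C[:, 1], P1, P2)
k11 = E + E.T                                       # Eq. (7.18)
```

Note that the right-hand side of Equation (7.18) is symmetric in (m_1, m_2); this symmetry of the reconstructed cross-kernel is a structural consequence of the summative pairing of the two PDM outputs in Equation (7.8).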
In closing this section, we should note that an alternative network configuration may
prove beneficial in certain cases (and especially in the multiinput case), whereby a second
hidden layer (termed the "interaction layer") is used to capture the interactions among
various inputs. Because this architecture is deemed to be primarily useful for scalability
purposes in the multiinput case, it is discussed in detail in Section 7.3.3.

Illustrative Example. To illustrate the performance of the network-based approach,
we use two Laguerre filter banks (with distinct parameters α) to model a simulated
system with two inputs (for which we have "ground truth") that is described by the differ-
ential equations

dy(t)/dt = [-a_0 + c_1 z_1(t) - c_2 z_2(t)] y(t) + y_0    (7.19)

dz_1(t)/dt = -a_1 z_1(t) + x_1(t)    (7.20)

dz_2(t)/dt = -a_2 z_2(t) + x_2(t)    (7.21)

where x_1(t) and x_2(t) are the two inputs of the system, y(t) is the output, and z_1(t), z_2(t) are
state variables. The values of the parameters used for the simulation are a_0 = 0.5, a_1 =
0.25, a_2 = 0.05, c_1 = 0.3, and c_2 = 0.15. Models of this form have been used in physiology
(e.g., in the metabolic and endocrine system [Bergman et al., 1981; Carson et al., 1983],
where the bilinear terms represent modulatory action). In the context of glucose regula-
tion, the first equation describes the dependence of blood glucose concentration y(t) on
glucagon action z_1(t) and insulin action z_2(t), which are assumed to have first-order kinet-
ics. A model of similar structure has been proposed by Robert Pinter to describe lateral in-
hibition in the visual system [Pinter, 1984, 1985, 1987; Pinter & Nabet, 1992].
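A simulation of Equations (7.19)-(7.21) can be sketched with a basic Euler scheme. In the sketch below the integration step and the constant drive y_0 are assumptions (the text does not list y_0; the choice y_0 = 1 makes the zero-input steady state y_0/a_0 = 2, matching the stated zeroth-order kernel):

```python
import numpy as np

def simulate(x1, x2, dt=0.25, a0=0.5, a1=0.25, a2=0.05,
             c1=0.3, c2=0.15, y0c=1.0):
    """Euler integration of Eqs. (7.19)-(7.21); y0c is the constant drive."""
    N = len(x1)
    y, z1, z2 = np.zeros(N), np.zeros(N), np.zeros(N)
    y[0] = y0c / a0                      # zero-input steady state
    for n in range(N - 1):
        z1[n + 1] = z1[n] + dt * (-a1 * z1[n] + x1[n])
        z2[n + 1] = z2[n] + dt * (-a2 * z2[n] + x2[n])
        y[n + 1] = y[n] + dt * ((-a0 + c1 * z1[n] - c2 * z2[n]) * y[n] + y0c)
    return y

rng = np.random.default_rng(5)
N = 4096
y = simulate(rng.standard_normal(N), rng.standard_normal(N))
```

The bilinear terms make the effective decay rate of y fluctuate with the filtered inputs, which is precisely what generates the infinite-order Volterra expansion discussed next.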
The nonlinearity of this system lies in the two bilinear terms in Equation (7.19), which
give rise to an infinite-order equivalent Volterra model. The relative contribution of the
two nonlinear terms to the qth-order Volterra functional is proportional to the qth powers

Table 7.1 The NMSEs for the network model prediction and for the kernel estimates of the two-input
system described by Equations (7.19)-(7.21) for two values of output SNR (10 and 0 dB). Note that the
10 dB results are for a single run of 1024 input-output data points, but the 0 dB results are the average
and standard deviation values over 50 independent runs.

Output   Prediction                     Kernel NMSEs (%)
SNR      NMSE (%)      k_1(m)      k_2(m)      k_11(m_1,m_2)   k_22(m_1,m_2)   α_1           α_2
10 dB    10.14         0.856       0.818       6.15            4.50            0.241         0.696
0 dB     49.70±2.24    2.41±1.52   3.47±3.55   26.80±16.20     13.50±7.14      0.173±0.060   0.648±0.104

of P_1^{1/2} c_1 and P_2^{1/2} c_2, where P_1 and P_2 are the power levels of the two inputs x_1 and x_2, re-
spectively. Since the magnitudes of c_1 and c_2 are both smaller than one in this example, a
truncated Volterra model can be used to approximate the system. For the above values of
c_1 and c_2, it was found that a second-order Laguerre-Volterra network model was suffi-
cient to model the system. The Volterra kernels of this system can be analytically derived
by the generalized harmonic balance method presented in Section 3.2.
This system is simulated for two independent unit-variance GWN inputs with a
length of 1024 data points (for zero initial conditions). Independent GWN signals are
added to the output signal for an SNR of 10 dB, in order to examine the effect of output-
additive noise and the robustness of this approach. A two-input Laguerre-Volterra net-
work with five DLFs in each filter bank (L_1 = L_2 = 5) and two hidden units (H = 2) with
second-degree polynomial activation functions (Q = 2) is selected using the model-order
selection criterion of Section 2.3.1. The NMSEs of the resulting model prediction and of
the kernel estimates are given in Table 7.1. Note that the ideal NMSE of the model pre-
diction for output SNR = 10 dB is 10%. The estimated first-order and second-order
Volterra kernels are shown in Figures 7.3 and 7.4, respectively. The estimated zeroth-
order kernel was equal to 1.996, very close to its true value of 2. The results demon-
strate the excellent performance of this approach, especially relative to the conventional
Figure 7.3 Estimated first-order Volterra kernels of the simulated two-input system for SNR = 10
dB and N = 1024 data points (solid line): left panel for x_1 and right panel for x_2. The exact kernels are
shown with dotted line.

Figure 7.4 Estimated second-order Volterra kernels of the simulated two-input system for SNR =
10 dB and N = 1024 data points. The cross-kernel (right panel) shows negative values, although the
self-kernel of the "negative" modulator z_2 (middle panel) shows positive values.

cross-correlation method, for which the kernel estimates are unacceptable for such short
data records, as shown in Figures 7.5 and 7.6.
The robustness of this approach is further demonstrated by decreasing the output SNR
to 0 dB and repeating the estimation procedure for 50 independent runs. The resulting
NMSEs of model prediction and kernel estimation are given in Table 7.1 as well (average
and standard deviation values over the 50 independent runs) and corroborate the robust-
ness of this approach (note that the ideal NMSE of the model prediction for SNR = 0 dB
is 50%).

Figure 7.5 Estimated first-order kernels of the simulated two-input system for SNR = 10 dB and N
= 1024 data points, using the cross-correlation technique (dashed line) or the Laguerre-Volterra net-
work-based approach (solid line).

Figure 7.6 Cross-correlation estimates of the second-order kernels of the simulated two-input sys-
tem for SNR = 10 dB and N = 1024 data points. Comparison with the respective Laguerre-Volterra
network-based estimates of Figure 7.4 demonstrates the superior performance of the advocated ap-
proach over the conventional cross-correlation technique.

We conclude this example with a note on the steady-state behavior of this system/mod-
el. For step inputs x_1(t) = A_1 u(t) and x_2(t) = A_2 u(t), the steady-state (asymptotic) value of
the output is y = y_0/[a_0 - (c_1/a_1)A_1 + (c_2/a_2)A_2], which explains the polarity of the first-
and second-order kernels for the two inputs (corresponding to the signs of the first and
second partial derivatives of this nonlinearity).
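This asymptotic formula is easy to check numerically; the sketch below integrates Equations (7.19)-(7.21) with a basic Euler scheme for step inputs (the constant drive y_0 = 1 and the step amplitudes are illustrative assumptions):

```python
import numpy as np

a0, a1, a2, c1, c2, y0c = 0.5, 0.25, 0.05, 0.3, 0.15, 1.0
A1, A2 = 0.2, 0.1                    # step amplitudes (illustrative)
dt, N = 0.1, 20_000                  # long horizon to reach steady state

y, z1, z2 = y0c / a0, 0.0, 0.0
for _ in range(N):
    z1 += dt * (-a1 * z1 + A1)       # Eq. (7.20) with step input
    z2 += dt * (-a2 * z2 + A2)       # Eq. (7.21) with step input
    y += dt * ((-a0 + c1 * z1 - c2 * z2) * y + y0c)   # Eq. (7.19)

# Asymptotic value predicted by the steady-state formula
y_pred = y0c / (a0 - (c1 / a1) * A1 + (c2 / a2) * A2)
```

Because z_1 and z_2 settle to A_1/a_1 and A_2/a_2, the denominator of the formula is just the effective decay rate of y at steady state.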

7.2 APPLICATIONS OF TWO-INPUT MODELING TO PHYSIOLOGICAL SYSTEMS

To honor the pioneering contributions of my brother Panos and his associates to the prob-
lem of two-input Volterra-Wiener modeling and to provide proper historical perspective,
we begin with the presentation of the first two applications to actual physiological sys-
tems: one on directionally selective cells in the fly eye [Marmarelis & McCann, 1973]
and the other on the center-surround organization of receptive fields in the catfish retina
[Marmarelis & Naka, 1973c]. Both of these initial applications employed the two-input
variant of the cross-correlation technique. Subsequently, we present two recent applica-
tions of two-input modeling using the Laguerre-Volterra network-based approach (which
is shown to be far more efficacious) to analyze natural data of spontaneous activity, which
offer valuable insight into the metabolic autoregulation in dogs (Section 7.2.3) and the
cerebral autoregulation in humans (Section 7.2.4).

7.2.1 Motion Detection in the Invertebrate Retina


The compound eye of insects consists of a matrix of ommatidia, each containing a small
number ofretinula cells (eight in the fly ommatidium) and having a distinct optical axis .

Motion-sensitive neurons, located in the optic lobe of the insect brain, respond maximally
to motions along two axes (approximately horizontal and vertical). For each axis, there is
a pair of neurons tuned to respond maximally to each direction (i.e., for the horizontal
axis, there is one fiber responding maximally to motions from left to right and one re-
sponding to motions from right to left). The functional properties (dynamics) of each fiber
type can be examined with a two-input experiment in which the stimulus consists of two
spots of light placed along one axis of motion and whose intensity is modulated by two
independent GWN signals (with bandwidth of 80 Hz in this fly experiment). The output is
the spike-train response of the motion-sensitive cell measured by an extracellular elec-
trode. The output spike record is converted into continuous form as a peristimulus his-
togram by repeating the GWN stimulus (60 sec duration) 12 times and histogramming the
spike frequency in each time bin [Marmarelis & McCann, 1973].
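With two independent GWN inputs, the first-order Wiener kernel of each input follows from the cross-correlation between the output and that input alone; independence keeps the two estimates uncoupled. A minimal numerical sketch (with hypothetical kernel shapes standing in for h1a and h1b):

```python
import numpy as np

rng = np.random.default_rng(0)
N, M, P = 200_000, 30, 1.0        # record length, kernel memory, GWN power

# Hypothetical first-order kernels playing the roles of h1a and h1b
# (illustrative shapes, not the estimated fly-eye kernels).
tau = np.arange(M)
g_a = -0.2 * np.exp(-tau / 5.0)
g_b = np.exp(-tau / 8.0) * np.sin(tau / 4.0)

xa = rng.normal(0.0, np.sqrt(P), N)           # GWN stimulus at spot a
xb = rng.normal(0.0, np.sqrt(P), N)           # independent GWN at spot b
y = (np.convolve(xa, g_a)[:N] + np.convolve(xb, g_b)[:N]
     + 0.1 * rng.normal(size=N))              # output with additive noise

# Two-input cross-correlation estimates: h1a(tau) = E[y(n) xa(n - tau)] / P,
# and likewise for input b.
h1a = np.array([np.mean(y[m:] * xa[:N - m]) for m in range(M)]) / P
h1b = np.array([np.mean(y[m:] * xb[:N - m]) for m in range(M)]) / P

assert np.max(np.abs(h1a - g_a)) < 0.05
assert np.max(np.abs(h1b - g_b)) < 0.05
```

The same logic extends to second-order self- and cross-kernels using second-order cross-correlations of the output with lagged products of the inputs.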
Figure 7.7 (top-left panel) shows the two first-order Wiener kernels h1a and h1b, corresponding to the two spots a and b, which are quite different. If an impulsive light stimulus
is given at input b, a positive response (increase from the mean response level) will be
evoked, while an impulse given at input a will elicit a very small negative response (slight
decrease from the mean response level).
Figure 7.7 also shows the obtained second-order Wiener kernels (a cross-kernel h2ab
and two self-kernels h2aa and h2bb) presented as arrays of numerical values (the self-kernels are symmetric). We observe that the cross-kernel exhibits the asymmetry expected in
directionally selective cells, and that the self-kernel h2aa is very small (nearly null),
whereas the self-kernel h2bb has significant values (almost as large as the cross-kernel values). The contribution of these kernels to the model response can be seen in Figure 7.8. It
should be noted that while the Wiener self-kernels describe the contribution of each input
separately, their effect is generally dependent upon the presence of the other input (see
earlier analysis of Wiener and Volterra kernels in the two-input case).
The cross-kernel h2ab(τ1, τ2) exhibits directional selectivity in motion detection because it has a large positive mount in the region τ1 > τ2 (forward motion), while in the region τ2 > τ1 it has a large negative valley (reverse motion). This cross-kernel describes
quantitatively the contribution to the system response that is due to the nonlinear dynamic
interaction between the two input signals. From this kernel we see that a pulse at a followed by a pulse at b (i.e., τ1 > τ2) will elicit a large positive response, while a pulse at b
followed by a pulse at a (i.e., τ2 > τ1) will produce a negative response. The temporal extent of such "cross-talk" between the two inputs (about 60 msec in duration) and the precise
time course of the elicited response are fully described by h2ab. The experimental and kernel-predicted responses shown in Figure 7.8 demonstrate that the system is strongly nonlinear and the cross-kernel contribution is significant.
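The sign flip with pulse order can be reproduced with any cross-kernel that has a positive mount above the diagonal and a negative valley below it. The sketch below builds such an illustrative kernel (not the measured one) and evaluates the second-order interaction term for the two pulse orderings.

```python
import numpy as np

M = 40
t1, t2 = np.meshgrid(np.arange(M), np.arange(M), indexing="ij")
# Illustrative cross-kernel with the directional-selectivity signature:
# positive mount for tau1 > tau2, negative valley for tau2 > tau1.
h2ab = (np.sign(t1 - t2) * np.exp(-(t1 - t2) ** 2 / 50.0)
        * np.exp(-(t1 + t2) / 20.0))

def cross_term(xa, xb):
    """Interaction output y_c(n) = sum h2ab(t1, t2) xa(n-t1) xb(n-t2)."""
    N = len(xa)
    y = np.zeros(N)
    for n in range(N):
        va = np.pad(xa[n::-1][:M], (0, max(0, M - n - 1)))  # past of input a
        vb = np.pad(xb[n::-1][:M], (0, max(0, M - n - 1)))  # past of input b
        y[n] = va[:M] @ h2ab @ vb[:M]
    return y

def pulse(at, N=80):
    x = np.zeros(N)
    x[at] = 1.0
    return x

d = 5                                          # inter-pulse delay (samples)
y_fwd = cross_term(pulse(0), pulse(d))         # a leads b: "forward" motion
y_rev = cross_term(pulse(d), pulse(0))         # b leads a: "reverse" motion
assert y_fwd.sum() > 0 and y_rev.sum() < 0
```

Swapping which input receives the leading pulse simply reflects the evaluation across the kernel diagonal, which is where the sign reversal lives.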

7.2.2 Receptive Field Organization in the Vertebrate Retina


The typical "receptive field" (RF) organization for a neuron in the vertebrate retina con-
sists of a central spot (center) and a concentric annulus (surround) that exhibit distinct re-
sponse characteristics. Thus, a natural way to study the RFs of retinal cells is to consider
them as two-input systems and employ two photic stimuli: a small spot of light covering
the center (in this case, 0.3 mm in diameter) and a concentric annulus of light covering the
surround (in this case, with inner diameter of 0.3 mm and an outer diameter of 5 mm),
each modulated by statistically independent GWN signals (light intensities modulated in
GWN fashion with bandwidth 55 Hz and dynamic range of 1.6 log units). The response
7.2 APPLICATIONS OF TWO-INPUT MODELING TO PHYSIOLOGICAL SYSTEMS 371

[Figure 7.7: panels of numerical kernel values. Top-left: first-order (linear) kernels h1a and h1b vs. time (sec), with "forward" direction indicated. Top-right: self-kernel h2aa(τ1, τ2). Bottom-left: self-kernel h2bb(τ1, τ2). Bottom-right: cross-kernel h2ab(τ1, τ2). Axes τ1, τ2 in msec.]

Figure 7.7 First- and second-order Wiener kernels of the two-input experiment in the fly eye with
two spot stimuli a and b. The first-order Wiener kernels are shown in the top-left panel, indicating
much stronger response for spot b. The second-order Wiener cross-kernel h2ab is shown in the bottom-right panel (as an array of numerical values) and exhibits the asymmetry expected in a directionally selective cell (positive mount above the diagonal and negative valley below the diagonal). The
second-order Wiener self-kernel h2bb (bottom-left panel) is much stronger than the other self-kernel
h2aa (top-right panel) [Marmarelis & McCann, 1973].

arising from the simultaneous stimulation of the two inputs can be separated into the components evoked by each stimulus separately, as well as the component due to the interaction of the two inputs, using the two-input modeling methodology to estimate the self-kernels for each input and the cross-kernel for a second-order model [Marmarelis & Naka,
1973c].

[Figure 7.8: response traces for spots a and b, with "forward" direction indicated; time bar 300 msec.]

Figure 7.8 Experimental and model-predicted responses of the two-input system of directionally
selective cell in the fly eye. Note the significant contribution of the second-order cross-kernel
(marked as "nonlinear interaction") and the minute contribution of the first-order kernels (marked as
"linear model"). The significant contribution of the h2bb self-kernel is shown in the trace marked "nonlinear model (only self-terms)" [Marmarelis & McCann, 1973].

Figure 7.9 shows the first-order Wiener kernels for the light-to-horizontal cell system
in the catfish retina obtained via cross-correlation in three cases: (1) single-input GWN
spot stimulus, (2) single-input GWN annular stimulus, and (3) dual-input composed of a
spot and an annulus stimulus independently modulated by GWN signals. Hyperpolarization of the cell has been plotted upward. We note that the annular first-order kernel h1a|s in
the presence of the spot stimulus is very similar to h1a (in the absence of the spot stimulus). However, the spot kernel in the presence of annular stimulus h1s|a is larger and faster
than h1s, which implies that the mechanism responsible for the generation of the horizontal cell response to a light spot stimulus becomes faster and its gain increases in the presence of a positive annular input. On the other hand, the presence of the spot stimulus does
not affect the annular response to any appreciable extent (unlike the bipolar cell response,
which is also shown).
Figure 7.9 also shows portions of the two GWN input signals (one for spot and the other for annulus) and the resulting horizontal cell response obtained experimentally, together with the corresponding model response to these same inputs. The model response was

[Figure 7.9: first-order kernels (time bar 100 msec) for the horizontal cell and the bipolar cell, single-input vs. two-input; legend: a = annulus, s = spot, a/s = annular component, s/a = spot component. Below: spot and annulus stimulus records with the horizontal-cell and bipolar-cell responses (time bar 500 msec).]

Figure 7.9 First-order Wiener kernels and experimental and model responses of the two-input system (spot/annulus) for the horizontal and the bipolar cell in the catfish retina (see text for details)
[Marmarelis & Naka, 1973c].

computed from all Wiener kernel estimates up to second order. Agreement between the
system response and the model prediction is very good (NMSE is about 5%). The first-order contribution brings the error down to 12%, suggesting that the system is fairly linear.
Similar results are shown in Figure 7.9 for the light-to-bipolar cell system in the catfish
retina [Marmarelis & Naka, 1973c].
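Prediction accuracy here and throughout is quantified by a normalized mean-square error. The text does not spell out the exact normalization, so the sketch below assumes one common convention (de-meaned output power) purely for illustration:

```python
import numpy as np

def nmse(y, y_pred):
    """Normalized mean-square error (%) between data and model prediction.

    Normalizing by the de-meaned output power is one common convention;
    this choice is an assumption for illustration.
    """
    y = np.asarray(y, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return 100.0 * np.mean((y - y_pred) ** 2) / np.var(y)

# Toy illustration: the closer the prediction, the smaller the NMSE.
t = np.linspace(0, 10, 1000)
y = np.sin(t)
assert nmse(y, y) == 0.0
assert nmse(y, 0.9 * y) < nmse(y, 0.5 * y)
```

Under this convention, "12% with first-order terms, 5% with second-order terms" means the second-order (mostly cross-kernel) contribution accounts for roughly 7% of the output power.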
It is evident from Figure 7.9 that the response time of the horizontal cell (latency of the
peak of the first-order kernel) is longer for the spot input, but decreases in the presence of
the other input. For both bipolar and horizontal cells, the presence of the other input amplifies the entire waveform of the respective kernel, except for the horizontal annular kernel. It is also evident that the bipolar cell has a biphasic RF (positive spot and negative annular kernels for this on-center bipolar cell) but the horizontal cell has monophasic RF
(positive spot and annular kernels) for this level of stimulation. It has been found that for
higher levels of stimulation, the annular first-order Wiener kernel of the horizontal cell
exhibits a negative undershoot after approximately 100 ms lag (consistent with our nonlinear feedback analysis of Sec. 4.1.5).
The results of the two-input experiment on the horizontal cell suggest an enhancement
of the spot response in the presence of an annulus stimulus but not vice versa. In the catfish retina, the interaction of the spot and annular stimuli in the external horizontal cells is facilitatory (the sum of the spot and annular responses is smaller than the response obtained
by simultaneous presentation of the two stimuli), whereas in the internal horizontal cells it
is neutral (insignificant). Marmarelis and Naka have shown, through an analytical solution of the spatial distribution of potential in the catfish horizontal cell layers, that these
results can be explained by an increase in the space constant due to the annular stimulus
in the case of the external horizontal cells. Thus, the catfish horizontal cells may perform
a dual function: produce the integrated receptive-field response (i.e., setting the operating
point at the average intensity level) and improve the frequency response of the initial retinal processing stages at high intensity levels by means of negative feedback to the receptors (see also Sec. 4.1.5).
We now turn to the study of the RF of the retinal ganglion cell, which is responsible
for encoding incoming visual information into spike trains transmitted through the optic
nerve to the LGN and the visual cortex. We begin by showing in Figure 7.10 the first-order and second-order Wiener kernels of the single-input, light-to-ganglion cell system for
three different GWN stimuli: (1) annulus, (2) spot, and (3) uniform field (i.e., covering
both annulus and spot area). It is evident that the annulus pathway input is dominant,
since the annular kernels are much closer to the field kernels than the spot kernels.
We proceed with the two-input study, in which the spot and annulus stimuli are independent GWN signals presented simultaneously to the retina. The obtained first-order
Wiener kernels are shown in Figure 7.11 along with their single-input counterparts and
demonstrate the fact that the presence of the other stimulus alters the dynamics for both
pathways (suggesting strong lateral interconnections between center and surround areas
of the RF). This effect of lateral interaction is much more pronounced on the spot pathway (as a result of the presence of the independent annular stimulus), as demonstrated
also by the power spectra of the various response components shown in Figure 7.12.
The second-order Wiener kernels for the two-input case are shown in Figure 7.13 for
Type A and Type B ganglion cells. These second-order kernels demonstrate the different
nonlinear dynamics of the two RF pathways (center and surround) and the differences in
the nonlinear dynamics of the two types (A and B) of ganglion cells. Most notably, it is
observed that only the Type A ganglion cell exhibits directional selectivity (asymmetric

[Figure 7.10a: first-order kernel traces for uniform light, spot of light, and annulus of light; ordinate in first-order kernel units, abscissa time τ (sec), time bar 60 msec.]

Figure 7.10a First-order Wiener kernels for the light-to-ganglion (type A) cell system in the catfish
retina obtained for three different GWN light stimuli: spot, annulus, and field. The field stimulus is the
combination of spot and annulus. Ordinate units are in (spikes/sec)/(μW/mm²) and 20 units correspond to a change of 50 spikes/sec from the mean firing rate caused by a brief flash at the mean intensity level [Marmarelis & Naka, 1973c].

[Figure 7.10b: contour plots of second-order kernels of the ganglion cell; panels: annulus h2(τ1, τ2), spot h2(τ1, τ2), and field h2(τ1, τ2); axes τ1, τ2 in sec.]

Figure 7.10b Contour plots of the second-order Wiener kernels of the light-to-ganglion cell system
in the catfish retina for three different GWN stimuli: annulus (left), spot (middle), and field (right). The
units of h2 are (spikes/sec)/(μW/mm²)². The elevation contours are drawn at equal increments [Marmarelis & Naka, 1973c].

[Figure 7.11: (A) spot component kernels, spot alone vs. spot and annulus; (B) annular component kernels, annulus alone vs. annulus and spot; abscissa time τ (sec), 0 to 0.4.]

Figure 7.11 First-order Wiener kernels for light-to-ganglion cell (type A) system in the catfish retina
obtained from one-input (spot or annulus) and two-input (spot and annulus) experiments. Ordinate
scales are similar to those in Figure 7.10(a) [Marmarelis & Naka, 1973c].

cross-kernel). We attribute these differences to the distinct connectivity of presynaptic
amacrine cells. We must also note that the obtained first-order Wiener kernels for the two
types (A and B) of ganglion cells in the two-input case have similar waveforms and reverse polarity (positive for spot and negative for annular kernels of the Type A ganglion
cell, and reverse polarities for the Type B ganglion cell) [Marmarelis & Naka, 1973c].

[Figure 7.12: power spectra (dB vs. frequency, Hz) of ganglion cell stimuli and responses; curves labeled experiment, model, spot (annulus), annulus (spot), spot, and annulus.]

Figure 7.12 Power spectra of the inputs and outputs of the light-to-ganglion cell (type A) system.
One-input response power spectra are compared with two-input power spectra with similar stimulus
configuration. Notation "annulus (spot)" indicates the annular component of the two-input model
prediction in the presence of the spot stimulus. Curves marked "model" and "experiment" are the
spectra of the two-input model and the experimental responses. Spectra marked "spot" and "annulus" are computed from the response of the one-input experiments [Marmarelis & Naka, 1973c].

[Figure 7.13a: contour plots for the Type A ganglion cell; panels: Sh2(τ1, τ2), Ah2(τ1, τ2), and SAh2(τ1, τ2); axes τ1, τ2 in sec.]

Figure 7.13a The second-order Wiener kernels for the two-input light-to-ganglion (Type A) cell system. Sh2(τ1, τ2) denotes the spot self-kernel, Ah2(τ1, τ2) the annular self-kernel, and SAh2(τ1, τ2) the
spot-annulus cross-kernel. Kernel units are in volts/(μW/mm²)² [Marmarelis & Naka, 1973c].

[Figure 7.13b: contour plots for the Type B ganglion cell; panels: Sh2(τ1, τ2), Ah2(τ1, τ2), and SAh2(τ1, τ2); axes τ1, τ2 in sec.]

Figure 7.13b The second-order Wiener kernels for the two-input light-to-ganglion (type B) cell system. Note the lack of significant asymmetry in the cross-kernel [Marmarelis & Naka, 1973c].

Representative first-order Wiener kernels of Type N amacrine cells in the catfish retina
(both NA and NB, connecting to Type A and B ganglion cells, respectively) for single-input and dual-input experiments are shown in Figure 7.14. We note the suppression of the
spot kernel in the presence of annular input for the Type NA amacrine cell (but not for the
Type NB amacrine cell), indicative of strong lateral interactions that are consistent with
the previously observed directional selectivity of the Type A ganglion cell (which is postsynaptic to the Type NA amacrine cell).
The observed differences in RF center-surround organization (in terms of response
characteristics) and in relative degree of nonlinearity (measured by the relation between
first-order and second-order responses) led Panos Marmarelis and Ken Naka to a clever
classification scheme portrayed in Figure 7.15, whereby various types of horizontal, bipolar, amacrine, and ganglion cells in the catfish retina are shown to form distinct clusters.
These results suggest that, throughout the catfish retina, the central RF mechanism is a
slower process when it is excited alone, but it becomes faster (both latency-wise and
bandwidth-wise) in the presence of an annular stimulus. This "speed-up" of the central

[Figure 7.14: amacrine-cell kernel traces; time bar 0.1 sec.]

Figure 7.14 (A) First-order Wiener kernels from type NA amacrine cells and (B) NB-type amacrine
cells in the catfish retina. Two sets of kernels are shown, one set from two-input spot/annulus experiments and the other from single-input experiments, under the same conditions. In A, traces 1 and 3
are h1a|s and h1a, and traces 2 and 4 are h1s|a and h1s. Note the complete suppression of the spot kernel in the presence of the annular input. In B, traces 1 and 3 are h1s|a and h1s, and traces 2 and 4 are
h1a|s and h1a. No other significant first-order effects are observed. Upward deflection is for hyperpolarization of the membrane potential [Marmarelis & Naka, 1973c].

RF mechanism most likely takes place at the level of the outer plexiform layer, since it is
first observed in the horizontal cell responses and is replicated at the ganglion cell level.
This study also shows that as the mean intensity level is increased, the retinal cell kernels
resulting from field or annular inputs become less damped, whereas the kernels resulting
from spot inputs remain overdamped. This can be explained by the stipulation of a nonlinear feedback mechanism which is inactive at low-intensity levels but becomes active at
higher intensity levels. The existence of a negative feedback mechanism results in an improvement of the frequency-response characteristics of the system as the bandwidth of the
system is extended, consistent with the analysis presented in Section 4.1.5.
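The bandwidth-extending effect of negative feedback can be illustrated with the simplest possible case: a first-order low-pass element placed inside a negative-feedback loop. All gains and corner frequencies below are arbitrary illustrative values.

```python
import numpy as np

def lowpass_gain(f, K=10.0, fc=1.0):
    """Open-loop first-order low-pass frequency response (illustrative)."""
    return K / (1.0 + 1j * f / fc)

def closed_loop_gain(f, beta=0.5, K=10.0, fc=1.0):
    """The same element inside a negative-feedback loop with gain beta."""
    G = lowpass_gain(f, K, fc)
    return G / (1.0 + beta * G)

def bandwidth(gain_fn):
    """-3 dB bandwidth found by scanning the magnitude response."""
    f = np.linspace(0.01, 100.0, 200_000)
    mag = np.abs(gain_fn(f))
    return f[mag <= mag[0] / np.sqrt(2)][0]

bw_open = bandwidth(lowpass_gain)
bw_closed = bandwidth(closed_loop_gain)
assert bw_closed > 2 * bw_open   # feedback trades DC gain for bandwidth
```

For this first-order case the closed-loop corner moves from fc to fc(1 + βK): the DC gain drops by the factor (1 + βK) while the bandwidth grows by the same factor, which is exactly the less-damped, wider-band behavior attributed here to the intensity-dependent feedback.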

7.2.3 Metabolic Autoregulation in Dogs


The effect of insulin on the concentration of blood glucose has been studied extensively
in the context of diabetes mellitus and the treatment of Type I diabetic patients with in-
sulin injections. The recent development of continuous glucose monitors and insulin mi-
cropumps has stimulated again the long-held interest in an "artificial pancreas," which re-
quires a thorough understanding of the dynamic insulin-glucose relationship and reliable
means to control the release of insulin in order to maintain fairly steady glucose levels in
the blood. Achieving these important objectives requires reliable nonlinear-dynamic
models that can be used also for improved diagnostic purposes in a clinical context. Ex-
amples of such models were given in Section 6.4.
In this section, we consider a two-input modeling task that seeks to reveal and quantify
the causal links between two simultaneous inputs (plasma insulin and free fatty acids) and

[Figure 7.15: cluster diagram of catfish retinal neurons; horizontal axis: relative annular vs. spot contribution (ratio [a]/[s] from 16 down to 1/16, with [a] = [s] at center); vertical axis: degree of nonlinearity (second-order relative to first-order response); labeled clusters include horizontal cells, bipolar Ba and Bb, Na, Nb, Ya, Yb, and C neurons, with annotations of depolarizing/hyperpolarizing responses (e.g., "a: hyp, s: dep").]

Figure 7.15 Clusters of functional classification of catfish retinal neurons. The lower-case designation "a" or "b" denotes depolarizing or hyperpolarizing cells, respectively (earlier denoted as Type A
and Type B). The Y cells are ganglions with strong nonlinearity [Marmarelis & Naka, 1973c].

the concentration of plasma glucose (output). The experimental data are from an anes-
thetized dog under conditions of spontaneous activity (i.e., normal closed-loop operation
without external infusion of insulin, free fatty acids or glucose). The samples were col-
lected every 3 min over a 10 hr period (200 data points for each of the three variables)
[Marmarelis et al., 2002].
Application of the method presented in Section 7.1.3 (for L = 5, H = 2, Q = 3) yielded
the PDM model shown in Figure 7.16 that has a single pair of PDMs (one for each of the
two inputs). The PDM corresponding to the insulin input exhibits glucoleptic characteristics (i.e., causes a reduction in glucose concentration as a result of an increase in insulin
concentration). The dynamics of this glucoleptic effect show maximum response after ~3
hr and an earlier smaller peak after ~1 hr, whereas the effect is diminished after 7-8 hr.
The other PDM, corresponding to the free fatty acids (FFA) input, exhibits a positive effect on glucose concentration (as expected), with the maximum response occurring almost
immediately (zero lag point in the PDM curve) and a secondary peak after ~1 hr. The
FFA effect is diminished after ~3 hr. The combined effect of these two PDMs is determined by the associated static nonlinearity shown on the right of the PDM model, which
transforms the sum of the outputs of the two PDMs. It is evident from the form of this
concave nonlinearity that an increase of FFA input will reduce the sensitivity of glucose
uptake to insulin (glucoleptic reduction), because it will move the "operating point" of the

[Figure 7.16: insulin PDM and FFA PDM waveforms vs. time (mins, 0 to 1000), feeding into a summing node followed by a concave static nonlinearity whose output is glucose.]

Figure 7.16 The PDM model of the two-input system defined by the spontaneous variations in the
concentrations of plasma insulin and free fatty acids (FFA) as inputs and glucose as output in dogs.
The two PDMs quantify the effect of elevated FFA on insulin sensitivity.

insulin-glucose relation higher, where the slope of the nonlinear curve (sensitivity gain) is
smaller. This lucid result (if confirmed by additional data and analysis) can explain the effect of obesity on insulin sensitivity and potentially elucidate one major factor for the onset of Type II (or adult-onset) diabetes in a quantitative manner.
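The structure of such a two-input PDM model (one PDM per input, outputs summed, then a single static nonlinearity) can be sketched compactly. The waveforms and the concave nonlinearity below are hypothetical placeholders chosen only to mimic the qualitative features described in the text, not the estimated model.

```python
import numpy as np

# Sketch of a two-input PDM model: one PDM per input, outputs summed and
# passed through one static nonlinearity. All shapes below are hypothetical.
M = 160                                   # memory in samples (illustrative)
m = np.arange(M)
pdm_insulin = -(m / 60.0) * np.exp(-m / 60.0)   # glucoleptic (negative)
pdm_ffa = np.exp(-m / 20.0)                      # positive, fast-acting

def static_nl(v):
    """Concave nonlinearity: its slope (sensitivity gain) falls as v rises."""
    return 4.0 * (1.0 - np.exp(-v / 4.0))

def pdm_model(x_insulin, x_ffa):
    """Output = nonlinearity applied to the sum of the two PDM outputs."""
    N = len(x_insulin)
    u1 = np.convolve(x_insulin, pdm_insulin)[:N]
    u2 = np.convolve(x_ffa, pdm_ffa)[:N]
    return static_nl(u1 + u2)

# Elevated FFA shifts the operating point to the flatter part of the
# concave curve, reducing the response to the same insulin perturbation.
slope_low = static_nl(0.1) - static_nl(0.0)
slope_high = static_nl(2.1) - static_nl(2.0)
assert slope_high < slope_low
```

The final assertion is the quantitative content of the "operating point" argument: on a concave curve, the incremental gain seen by the insulin pathway shrinks when a sustained FFA elevation biases the sum upward.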
The prediction of this model is shown in Figure 7.17 for this set of experimental data
and demonstrates good performance in terms of predicting the slow changes in glucose
concentration, but it does not capture the high-frequency variations that are viewed as
systemic noise/interference. These high-frequency fluctuations ought to be examined separately, since they may not be part of the causal relations among the three considered variables.

7.2.4 Cerebral Autoregulation in Humans


Multiple homeostatic mechanisms regulate cerebral blood flow in humans to maintain a
relatively constant level despite changes in cerebral perfusion pressure [Edvinsson &
Krause, 2002; Panerai et al., 2000; Poulin et al., 1998; Zhang et al., 2000]. A modeling
example of dynamic cerebral autoregulation was given in Section 6.2, whereby spontaneous fluctuations of beat-to-beat mean arterial blood pressure (MABP) were viewed as
the input and the corresponding mean cerebral blood flow velocity (MCBFV) was viewed
as the output [Mitsis et al., 2002, 2003a, c; Zhang et al., 1998, 2000].
A PDM model was obtained that revealed new information about the nonlinear dy-

[Figure 7.17: predicted glucose concentration trace vs. time (mins, 0 to 600).]

Figure 7.17 The output prediction of the two-input model shown in Figure 7.16, demonstrating the
prediction of glucose concentration based on insulin and FFA concentrations under spontaneous
operating conditions in dogs.

namic properties of cerebral autoregulation and the quantitative manner in which rapid
changes in arterial pressure induce rapid changes in cerebral flow. The observed data cast
doubt on the validity of the notion of "steady-state" analysis, since no such "steady state"
is ever observed in the natural operation of cerebral circulation, and advances the notion
of dynamic autoregulation (Le., frequency-dependent). Furthermore, the obtained models
have demonstrated that cerebral autoregulation is more effective in the low-frequency
range (below 0.1 Hz), where most of the MABP spectral power resides and where the
model exhibits significant nonlinearities. Therefore, most spontaneous MABP changes do
not cause large MCBFV variations, because of cerebral autoregulation mechanisms that
are more prominent in the low frequency range (up to 0.1 Hz) and exhibit dynamic (i.e.,
frequency-dependent) nonlinearities.
It is also well established in the literature that changes in arterial CO 2 tension cause
vascular responses in cerebral vessels [Poulin et al., 1996, 1998]. The putative physiolog-
ical mechanism is described by the pH hypothesis, which postulates that systemic CO 2
crosses the blood-brain barrier and modulates the extracellular and perivascular [H+],
thus changing the smooth muscle properties. Specifically, it is qualitatively known that
hypercapnia induces vasodilation and hypocapnia causes vasoconstriction, but precise
quantitative (dynamic) relations of this causal link were still lacking until the recent two-
input study described below.
A number of studies have examined MCBFV responses to step changes in CO 2 tension
[Panerai et al., 2000; Poulin et al., 1996, 1998], and it was shown that this response is not
instantaneous but lags the CO2 tension changes by several seconds. Poulin et al. devel-
oped a simple one-compartment model for the cerebrovascular response to hypercapnia,
382 MODELING OF MULTIINPUT/MULTIOUTPUT SYSTEMS

characterized by a time constant, a gain term, and a pure delay. A second compartment
with a larger time constant (on the order of 7 min) had to be included for the hypocapnic
response, since a secondary, slow adaptation process of MCBFV increase in response to
the hypocapnic stimulus was reported. An asymmetry in the on-transient and off-transient
responses to hypocapnia was also reported, with the on-transient being significantly faster
and with a smaller gain than the off-transient, whereas a pure time delay equal to 3.9 sec
was estimated [Poulin et al., 1998]. Nonlinear mathematical models were developed by
Ursino et al. (1998, 2000) in order to describe the interactions between cardiovascular im-
pedance, arterial CO 2 tension, and intracranial blood pressure, whereby the interaction be-
tween arterial CO 2 tension and cerebrovascular effects on autoregulation was modeled
with a sigmoidal relationship.
Since the easiest way to observe changes in arterial CO 2 tension is to monitor the breath-
to-breath end-tidal CO 2 (PETC02) variations, the latter measurements can be used as a sur-
rogate to study the dynamic effects of CO 2 tension on vascular impedance by introducing
the PETC02 signal as a second input (in addition to spontaneous MABP variations being the
first input) and obtain a nonlinear dynamic model of MCBFV variations (output). This has
been attempted in a modeling study by Panerai et al. (2000), which employed causal FIR fil-
ters and spontaneous breath-to-breath PETC02 variations to assess the effect of arterial CO 2
on MCBFV in a linear context. It was found that, when used along with beat-to-beat MABP
variations, PETC02 variations improve the prediction performance of the model consider-
ably. The dynamic characteristics of the MABP-to-MCBFV and PETCO2-to-MCBFV rela-
tions were obtained in the form of impulse response functions and no significant interac-
tions between the two input variables were reported [Panerai et al., 2000].
Here, we employ the two-input formulation of the LVN modeling approach, which is
suitable for nonlinear systems with two inputs, in order to assess the nonlinear dynamic
effects of MABP and PETCO2 on MCBFV as well as their nonlinear interactions [Mitsis et
al., 2002, 2003a, c].
Six-minute data segments (360 data points) from ten normotensive subjects were used
to train a two-input LVN with structural parameters L1 = L2 = 8, H = 3, Q = 3, resulting in
60 free parameters (a very low number of parameters for a nonlinear model with memory
extent of about 2 min or 160 lags). The achieved model parsimony results in significant
performance improvement relative to conventional methods, such as the cross-correlation
technique. To terminate the training procedure and avoid overtraining the two-input LVN
model, the prediction NMSE is minimized for a two-minute forward segment of testing
data (adjacent to the six-minute training data segment). The model is estimated for a slid-
ing six-minute data window (with five-minute overlap) over 40-min recordings in order to
track any nonstationary changes in the system.
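As a sketch of the sliding-window scheme just described (the 1 Hz sampling rate is an assumption inferred from the 360-point six-minute segments; with it, a five-minute overlap means a 60-sample step):

```python
# Sketch of the sliding six-minute estimation windows with five-minute overlap
# (assumes a 1 Hz sampling rate, so 6 min = 360 samples and the step is 60).

def sliding_windows(n_samples, win=360, step=60):
    """Return (start, end) sample-index pairs for successive windows."""
    return [(s, s + win) for s in range(0, n_samples - win + 1, step)]

windows = sliding_windows(40 * 60)        # a 40-min recording at 1 Hz
print(len(windows))                       # 35 windows over the recording
print(windows[0], windows[1])             # (0, 360) (60, 420)
```

Each window yields one model estimate, so slow (nonstationary) changes appear as window-to-window variation of the estimated kernels.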
The mean values of the MABP, PETCO2, and MCBFV data, averaged over the 40-min
recordings, are 77.2 ± 9.1 (mm Hg) for MABP, 40.0 ± 1.9 (mm Hg) for PETCO2, and
59.1 ± 12.3 (cm/sec) for MCBFV. Typical six-minute data segments are shown in
Figure 7.18, along with the spectra of the corresponding high-passed (at 0.005 Hz) data
sets. The high-pass filtering is performed in order to eliminate very slow trends. Most of
the signal power resides below 0.1 Hz, although the MCBFV signal exhibits some pow-
er up to 0.3 Hz.
The average NMSEs of in-sample model prediction are given in Table 7.2 for LVN
models with one input (MABP or PETCO2) and two inputs (MABP and PETCO2) of various
orders (up to third). The complexity of the one-input and two-input models, in terms of
the total number of free parameters, is the same. Although MABP variations explain most

Figure 7.18 Typical data segments used for two-input LVN (TI-LVN) model estimation. Top panels:
time series; bottom panels: spectra of high-passed (at 0.005 Hz) signals [Mitsis et al., 2003a].

of the MCBFV variations, the incorporation of PETCO2 variations as an additional input in
the model reduces the third-order model prediction NMSE by about 6%. The prediction
NMSE achieved by the third-order model in the two-input case (TI-LVN) satisfies our
model-order selection criterion and a third-order model is selected.
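The NMSE criterion used here and in Table 7.2 can be sketched as follows; normalizing the residual power by the de-meaned output power is one common convention, and the book's exact normalization may differ:

```python
# Minimal sketch of the normalized mean-square error (NMSE) used to compare
# model orders: residual power divided by the de-meaned output power
# (one common convention -- an assumption of this sketch).

def nmse(y, y_hat):
    mean_y = sum(y) / len(y)
    resid_power = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
    output_power = sum((yi - mean_y) ** 2 for yi in y)
    return resid_power / output_power

# Order selection: accept a higher order only if it reduces the
# out-of-sample NMSE on the forward testing segment.
y     = [1.0, 2.0, 3.0, 2.0, 1.0]
y_hat = [1.1, 1.9, 2.8, 2.1, 1.2]
print(round(nmse(y, y_hat), 3))   # about 0.039
```

An NMSE of 0 means perfect prediction; an NMSE near 1 means the model does no better than the output mean.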
The prediction performance achieved by the TI-LVN model for a typical data segment
is shown in Figure 7.19 (left panel), along with the contributions of the linear and nonlin-
ear self-kernels and cross-kernels. The overall model prediction is very good, as attested
to by the low value of the average NMSE (20%), and the contribution of the nonlinear
terms is significant (about 20% average NMSE reduction). The right panel of Figure 7.19,
where the spectrum of the MCBFV output is compared with the spectra of the third-order
and first-order model residuals, shows that the nonlinearities reside mainly in the low fre-
quencies below 0.08 Hz (the shaded area corresponds to the improvement achieved by the
nonlinear terms).

Table 7.2 Average in-sample prediction NMSEs (± standard deviation) for one-input and two-input
LVN models for cerebral autoregulation in 10 normal human subjects

                              Model inputs
Model order    MABP            PETCO2          MABP & PETCO2
1              42.2 ± 7.2%     93.2 ± 2.7%     38.2 ± 6.5%
2              25.7 ± 8.3%     78.2 ± 6.4%     22.0 ± 6.0%
3              26.8 ± 7.6%     71.7 ± 4.8%     20.2 ± 5.4%


Figure 7.19 Left panel: Actual MCBFV output and model predictions (total, linear, and nonlinear
terms) for a typical data segment. Right panel: Spectra of actual output and model residuals (first-or-
der and third-order total) [Mitsis et al., 2003a].

The contribution of each of the two model inputs as well as their nonlinear interaction
can be seen in the left panel of Figure 7.20 for the same data segment. The top trace cor-
responds to the total (third-order) model prediction, the second trace corresponds to the
contribution of the MABP input (linear and nonlinear terms), the third trace corresponds
to the contribution of the PETCO2 input, and the bottom trace corresponds to the nonlinear
interaction between the two inputs (second-order and third-order cross-kernels). The
MABP component accounts for about 60% of the total model prediction power, the PET-

Figure 7.20 Left panel: Total (third-order) TI-LVN model prediction and contributions of MABP, PET-
CO2, and nonlinear interaction terms for the data segment of Figure 7.19. Right panel: Spectra of ac-
tual output, first-order and total (third-order) residuals [Mitsis et al., 2003a].

CO2 component accounts for an additional 17% of the total model prediction power, and
the interaction component accounts for the remaining 23% in this example. The spectra of
the MABP and of the total model residuals are shown in the right panel of Figure 7.20
along with the MCBFV output spectrum. It is observed that most of the contribution of
the MABP input lies above 0.04 Hz, whereas the contribution of the PETCO2 input and the
two-input interaction lies primarily below 0.08 Hz, being most prominent below 0.04 Hz
(as illustrated by the shaded area).
The relative linear-nonlinear contribution of each input (MABP and PETCO2) in the to-
tal model prediction is illustrated in Figure 7.21, where each input contribution is decom-
posed into its linear (first-order Volterra) and nonlinear (second-order and third-order
self) components. For this specific data segment, the power of the linear MABP compo-
nent corresponds to about 80% of the total MABP power contribution, whereas the power
of the linear PETCO2 component is approximately equal to that of its nonlinear counterpart.
The aforementioned observations are consistent among different segments and/or sub-
jects. However, considerable variability over time is observed in the form of the nonlinear
self- and cross-Volterra kernels (see Section 9.4).
The first-order MABP Volterra kernel for one subject, averaged over the 40 min
recording (6-minute sliding data segments with a 5-min overlap), is shown in Figure 7.22,
both in the time and frequency domains (log-linear plots, whereby the time lag values are
incremented by one). The form of the kernel is consistent among different segments, as
demonstrated by the tight standard deviation bounds. The high-pass characteristic of the
first-order frequency response implies that slow MABP changes are attenuated more ef-
fectively (i.e., autoregulation of pressure variations is more effective in the low-frequency
range below 0.04 Hz where most of the power of spontaneous pressure variations re-
sides). Two resonant peaks are evident around 0.06 Hz and 0.2 Hz, with a secondary one
appearing around 0.025 Hz. Compared to the first-order frequency response obtained


Figure 7.21 Decomposition of the contributions of MABP (left) and PETCO2 (right) inputs in terms of
linear and nonlinear components for the data segment of Figure 7.19 [Mitsis et al., 2003a].


Figure 7.22 Average first-order MABP kernel (solid line) and corresponding standard deviation
bounds (dotted lines) for one subject over a 40-min data record (see text). Left panel: time domain.
Right panel: FFT magnitude [Mitsis et al., 2003a].

when only MABP is used as an input [Mitsis et al., 2002], the MABP-to-MCBFV first-or-
der frequency response in the two-input case exhibits reduced gain in the low-frequency
range, reflecting the fact that most of the low-frequency MCBFV variations are explained
by PETCO2 fluctuations. In the higher frequency ranges, the MABP first-order kernels are
not affected by the inclusion of PETCO2 as an additional input.
The average first-order PETCO2 kernel for the same subject is shown in Figure 7.23. It
should be noted that the results shown here are obtained without shifting the PETCO2 data.
Since it is known that a pure delay of 3-4 sec is present in the PETCO2 dynamics, signifi-
cant variance in the initial time lags of the first-order and second-order kernel estimates of
the PETCO2 input results. The first-order PETCO2 frequency response (right panel of Figure
7.23) has most of its power below 0.04 Hz and especially below 0.02 Hz (notice the
shoulder at 0.03 Hz) and exhibits a secondary peak around 0.15 Hz, giving the PETCO2
first-order frequency response a low-pass characteristic rather than the high-pass charac-
teristic of its MABP counterpart.
Typical second-order MABP and PETCO2 self-kernels and the corresponding cross-ker-
nel are shown in Figure 7.24. Most of the power of the second-order kernels lies below
0.04 Hz, with some additional peaks around 0.06 Hz and 0.16 Hz. There are two diagonal
peaks in the MABP second-order frequency response: a main peak at [0.02, 0.02 Hz] and
a secondary peak at [0.16, 0.16 Hz], as well as off-diagonal peaks at bifrequency points
[0.02, 0.06 Hz] and [0.02, 0.16 Hz]. The PETCO2 second-order frequency response has a
main diagonal peak also at [0.02, 0.02 Hz] and a secondary peak at [0.02, 0.06 Hz]. The
main cross-kernel peak occurs at [0.02, 0.02 Hz], and two secondary peaks at [0.16 Hz,
0.02 Hz] and [0.06, 0.02 Hz] (note the asymmetry of the cross-kernel), which are related
to the MABP and PETCO2 self-kernel main peaks and imply nonlinear interactions be-
tween the primary mechanisms of the two inputs acting at these specific frequency bands.
Although the second-order kernels are considerably variable among different data seg-
ments, the main diagonal peak of the second-order MABP and PETCO2 frequency respons-


Figure 7.23 Average first-order PETC02 kernel (solid line) and corresponding standard deviation
bounds (dotted lines) for one subject over a 40-min data record (see text). Left panel: time domain.
Right panel: FFT magnitude [Mitsis et al., 2003a].

es stays in the neighborhood of 0.02 Hz, and consistently defines one coordinate value for
the secondary off-diagonal peaks in the bi-frequency domain, while the other coordinate
value lies in the mid-frequency range (0.05-0.10 Hz) and the high-frequency range
(0.10-0.25 Hz). The cross-kernel peaks are related in general to the self-kernel peaks.
In order to illustrate the performance of the two-input model, we simulate it for hyper-
capnic and hypocapnic pulses with a magnitude of 1 mm Hg (onset at 20 sec and offset at
420 sec), followed by a shorter MABP pulse with a magnitude of 8 mm Hg applied be-
tween 150 and 300 sec. The onset/offset times are selected to allow sufficient settling
time, based on the estimated kernel memories. The corresponding MCBFV model re-
sponses for a typical subject/model are illustrated in Figure 7.25 and demonstrate that: (1)
hypercapnia increases MCBFV and hypocapnia reduces it (as expected); (2) the on-tran-
sient and off-transient responses to the MABP step are distinct in waveform (i.e., not
symmetric), and the magnitude of the off-transient peak deflection is slightly larger than
the corresponding on-transient peak deflection; (3) the settling time of the hypercapnic
on-response to the MABP step is larger (around 50 sec) than that of the normocapnic or
hypocapnic on-response (around 25 sec); (4) the settling time for the off-response tran-
sient to the MABP step is about the same (about 40 sec) in all cases; and (5) the on- and off-
responses to the PETCO2 step are roughly symmetrical in waveform, although the size of
the hypocapnic steady-state response is slightly larger (by about 20%).
The form of the responses to MABP and PETCO2 step changes demonstrates the autoreg-
ulatory characteristics of the obtained models in a quantitative manner, but exhibits some
differences compared to previously reported results, where the MCBFV responses to step
increases in PETCO2 were found to be much slower than the responses to step decreases in
PETCO2. This may be due to the fact that our model was estimated based on small fluctua-
tions of PETCO2 around its mean value, or that the model does not account properly for
closed-loop processes active in this system (see Chapter 10).
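The simulation inputs described above can be sketched as follows (1 Hz sampling and zero-mean baselines are assumptions; the estimated model itself is omitted):

```python
# Sketch of the simulation inputs: a +/-1 mm Hg PETCO2 pulse from 20 s to 420 s
# and an 8 mm Hg MABP pulse from 150 s to 300 s (1 Hz sampling assumed).

def pulse(n, onset, offset, amplitude):
    """Rectangular pulse: `amplitude` on [onset, offset), zero elsewhere."""
    return [amplitude if onset <= t < offset else 0.0 for t in range(n)]

n = 500                                   # 500 sec of simulation time
petco2_hyper = pulse(n, 20, 420, 1.0)     # hypercapnic PETCO2 pulse (+1 mm Hg)
petco2_hypo  = pulse(n, 20, 420, -1.0)    # hypocapnic PETCO2 pulse (-1 mm Hg)
mabp         = pulse(n, 150, 300, 8.0)    # MABP pulse (+8 mm Hg)

print(mabp[149], mabp[150], mabp[299], mabp[300])   # 0.0 8.0 8.0 0.0
```

Feeding these pairs (MABP alone, MABP plus hypercapnia, MABP plus hypocapnia) through the estimated model produces the three traces of Figure 7.25.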
The presented results demonstrate the efficacy of the advocated methodology for


Figure 7.24 Typical second-order self-kernels and cross-kernels in the time (left panels) and fre-
quency (right panels) domains for cerebral autoregulation in normal human subjects. Top: MABP
self-kernel; middle: PETCO2 self-kernel; bottom: cross-kernel [Mitsis et al., 2003a].

two-input modeling of nonlinear physiological systems, quantifying the linear and non-
linear effects of each input and their nonlinear interaction. This powerful approach can
be extended to include additional physiological variables affecting autoregulation
(which may lead to a better understanding of the underlying physiological mechanisms
under normal and pathophysiological conditions), using the methods outlined in the fol-
lowing section.
7.3 THE MULTIINPUT CASE 389


Figure 7.25 Model response to MABP single-input pulse (solid), to dual-input MABP pulse and hy-
percapnia (dotted), and to dual-input MABP pulse and hypocapnia (dashed-dotted) (see text for de-
tails) [Mitsis et al., 2003a].

7.3 THE MULTIINPUT CASE

Having examined the two-input case in some detail, we now extend this approach to the
case of multiple inputs. Note that there is no methodological difference between a single
output and multiple outputs, because the Volterra-type model is derived separately for
each output of interest. In this connection, we should note also that nonlinear autoregres-
sive (NAR) terms can be incorporated in this methodological framework, whereby the
past epoch of the output signal can be viewed as another "input." This attains particular
importance in closed-loop systems and is examined in more detail in Chapter 10.
With regard to multiple inputs, the mathematical formalism of the two-input Volterra
series is readily extendable to M inputs:

$$
y(t) = k_{0,\ldots,0} + \int_0^\infty k_{1,0,\ldots,0}(\tau)\,x_1(t-\tau)\,d\tau + \cdots + \int_0^\infty k_{0,\ldots,0,1}(\tau)\,x_M(t-\tau)\,d\tau + \cdots
$$
$$
+ \int_0^\infty \cdots \int_0^\infty k_{n_1,\ldots,n_M}(\tau_1,\ldots,\tau_{n_1+\cdots+n_M})\,x_1(t-\tau_1)\cdots x_M(t-\tau_{n_1+\cdots+n_M})\,d\tau_1 \cdots d\tau_{n_1+\cdots+n_M} + \cdots \qquad (7.22)
$$

where n1, ..., nM denote the multiplicity of product terms in the general Volterra func-
tional of order (n1 + ... + nM) that correspond to each input x1, ..., xM, respectively. The
complexity of this mathematical formalism renders the approach unwieldy and impracti-
cal as the number of inputs increases and/or the nonlinear order of the model increases.
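The combinatorial growth behind this remark can be made concrete in discrete time: pooling the L memory lags of each of the M inputs into a single M·L-dimensional regressor, a symmetric Volterra model of order Q has C(ML + Q, Q) distinct coefficients (a standard multiset count; the memory value L = 50 below is illustrative, not from the text):

```python
# Sketch of the "curse of dimensionality": for M inputs with L memory lags each,
# a symmetric discrete Volterra model of order Q has C(M*L + Q, Q) coefficients
# (the number of multisets of lagged-input terms up to order Q, constant included).
from math import comb

def volterra_coeff_count(M, L, Q):
    return comb(M * L + Q, Q)

print(volterra_coeff_count(1, 50, 2))   # 1 input, 2nd order:  1326
print(volterra_coeff_count(2, 50, 2))   # 2 inputs, 2nd order: 5151
print(volterra_coeff_count(2, 50, 3))   # 2 inputs, 3rd order: 176851
```

Adding a second input roughly quadruples the second-order count, and raising the order to three multiplies it by more than thirty, which is why direct estimation becomes impractical.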
Nonetheless, the multi input Volterra model retains a mathematical elegance and gener-

al applicability (no arbitrary constraints are imposed a priori) that is without peer in a
nonlinear context. For the special case of second-order systems/models, a large number of
inputs can be accommodated in practice. The best example of this type of model/system is
in the area of spatiotemporal modeling of visual systems that is discussed in Section 7.4.
No applications of multiinput modeling (more than two inputs) for higher-order sys-
tems are known to the author to date. This is because of the aforementioned "curse of di-
mensionality" for higher-order systems, compounded by the numerous cross-kernels aris-
ing from the presence of multiple inputs. We present in Section 7.3.3 a network-based
methodology that offers for the first time a realistic solution to the vexing problem of
multiinput modeling of high-order systems. This solution is viewed as a possible path to
understanding the full complexity of multivariate physiological systems. For instance, in
the case of cerebral autoregulation discussed earlier, we may be able to incorporate addi-
tional inputs of interest (e.g., heart rate variability, respiratory sinus arrhythmia, pH, O2
tension, nitric oxide, catecholamines, etc.), or, in the case of glucose metabolism, we
may include (in addition to insulin and free fatty acids) the concentrations of glucagon,
epinephrine, norepinephrine, cortisol, etc. Clearly, the methodological ability to extend
our modeling to multiple inputs is critical but also relies on the availability of appropriate
time-series data. The latter constitutes another major challenge in advancing the state of
the art, for which emerging micro- and nanotechnologies can be of great assistance.
Before we proceed with the network-based method that is advocated in this book, we
discuss in Section 7.3.1 the cross-correlation-based method because it has been used in
connection with spatiotemporal modeling in the visual system (illustrative examples are
given in Section 7.4). We also discuss the kernel-based method in Section 7.3.2, because
it constitutes the methodological foundation for the recommended network-based method
(discussed in Section 7.3.3).

7.3.1 Cross-Correlation-Based Method for Multiinput Modeling


The cross-correlation-based method for multiinput modeling is, of course, based on a
Wiener-type orthogonalized functional series when the inputs are independent white or
quasiwhite signals. The orthogonalized multiinput Wiener series for second-order sys-
tems/models takes the extended form of Equation (7.2) for M inputs:

$$
y(t) = h_{0,\ldots,0} + \int_0^\infty h_{1,0,\ldots,0}(\tau)\,x_1(t-\tau)\,d\tau + \cdots + \int_0^\infty h_{0,0,\ldots,1}(\tau)\,x_M(t-\tau)\,d\tau
$$
$$
+ \iint_0^\infty h_{2,0,\ldots,0}(\tau_1,\tau_2)\,x_1(t-\tau_1)\,x_1(t-\tau_2)\,d\tau_1 d\tau_2 - P_1 \int_0^\infty h_{2,0,\ldots,0}(\lambda,\lambda)\,d\lambda + \cdots
$$
$$
+ \iint_0^\infty h_{0,0,\ldots,2}(\tau_1,\tau_2)\,x_M(t-\tau_1)\,x_M(t-\tau_2)\,d\tau_1 d\tau_2 - P_M \int_0^\infty h_{0,0,\ldots,2}(\lambda,\lambda)\,d\lambda
$$
$$
+ \sum_{n_1=0}^{1} \cdots \sum_{n_M=0}^{1} \iint_0^\infty h_{n_1,\ldots,n_M}(\tau_{i_1},\tau_{i_2})\,x_{i_1}(t-\tau_{i_1})\,x_{i_2}(t-\tau_{i_2})\,d\tau_{i_1} d\tau_{i_2}, \quad n_1+\cdots+n_M = 2 \qquad (7.23)
$$

where the indices i1 and i2 in the last term correspond to the only two values of (n1, ...,
nM) that are nonzero in the multiple summation (note the constraint: n1 + ... + nM = 2 in
the last summation) [Marmarelis & Naka, 1974a].
The notation for the second-order cross-kernel terms of Equation (7.23) (the last term
on the right-hand side) can be extended to higher-order kernels, but this has no practical

utility since the cross-correlation-based method cannot be applied to high-order multiin-
put systems in a realistic context.
As mentioned earlier, the only actual applications of the multiinput cross-correlation-
based method to date have been in spatiotemporal modeling of visual systems. For this
reason, we adapt the formalism of Equation (7.23) to the spatiotemporal case where the
input s(x; y; t) is a function of two space variables (x, y) and time t, and the output is a spa-
tiotemporal signal r(x; y; t) in the general case, although in the spatiotemporal applica-
tions to date it is viewed only as a function of time r(t) (i.e., the output is recorded at a sin-
gle location, typically a neuron in the retina or in the visual cortex). The second-order
spatiotemporal model takes the orthogonalized Wiener-type form [Citron et al., 1981;
Marmarelis & Marmarelis, 1978; Yasui & Fender, 1975; Yasui et al., 1979]:

$$
r(x_0; y_0; t) = h_0 + \int_D d\tau \int_{D_x} dx \int_{D_y} dy \; h_1(x; y; \tau)\, s(x_0 - x; y_0 - y; t - \tau)
$$
$$
+ \iint_D d\tau_1 d\tau_2 \iint_{D_x} dx_1 dx_2 \iint_{D_y} dy_1 dy_2 \; h_2(x_1, x_2; y_1, y_2; \tau_1, \tau_2)\, s(x_0 - x_1; y_0 - y_1; t - \tau_1)\, s(x_0 - x_2; y_0 - y_2; t - \tau_2)
$$
$$
- P \int_D d\tau \int_{D_x} dx \int_{D_y} dy \; h_2(x, x; y, y; \tau, \tau) \qquad (7.24)
$$
where (x0, y0) is the output reference point in space, P is the power level of the spatiotem-
poral white (or quasiwhite) input, and the domains of integration are D for time lag τ, Dx
for space lag x, and Dy for space lag y. The formalism of the spatiotemporal model of
Equation (7.24) can be extended to higher order but with limited practical utility due to its
complexity (as discussed earlier).
The integration domain for the time lag extends from 0 to the memory μ of the system,
and for the space lags (x and y) it extends over the area of the "receptive field" (RF) of the
output neuron. Thus, Equation (7.24) gives a rigorous definition and clear understanding
of a complete nonlinear dynamic RF for a retinal or cortical neuron (applicable also to
any other sensory system that receives input information not only in time but also in space
and/or wavelength).
In the spectrotemporal case, space is replaced by wavelength (or, equivalently, by fre-
quency). For instance, the second-order response of an auditory unit may be modeled as
$$
r(\lambda_0; t) = h_0 + \int_D d\tau \int_{D_\lambda} d\lambda \; h_1(\lambda; \tau)\, s(\lambda_0 - \lambda; t - \tau)
$$
$$
+ \iint_D d\tau_1 d\tau_2 \iint_{D_\lambda} d\lambda_1 d\lambda_2 \; h_2(\lambda_1, \lambda_2; \tau_1, \tau_2)\, s(\lambda_0 - \lambda_1; t - \tau_1)\, s(\lambda_0 - \lambda_2; t - \tau_2)
$$
$$
- P \int_D d\tau \int_{D_\lambda} d\lambda \; h_2(\lambda, \lambda; \tau, \tau) \qquad (7.25)
$$

where λ denotes the wavelength of the acoustic stimulus and Dλ is its respective domain
that defines the support of the "spectrotemporal receptive field" (STRF). The practical
challenge in measuring the STRF is the generation and application of an acoustic stim-
ulus that is spectrotemporally white (or quasiwhite), which implies statistical indepen-
dence of multitone signals over a broad bandwidth (covering the bandwidth of the neu-
ron of interest). This type of experiment and modeling can reveal the complete STRF of
an auditory system in a nonlinear dynamic context [Aertsen & Johannesma, 1981;

Eggermont et al., 1983; Lewis & van Dijk, 2003]. Likewise, the same formalism and
modeling approach can be applied to spectroscopic measurements (e.g., time-resolved
fluorescence spectroscopy data) to yield a complete characterization of the spectrotem-
poral characteristics of the source (e.g., the spectrotemporal characteristics of the fluo-
rophores in the fluorescent tissue in the case of time-resolved fluorescence spec-
troscopy) [Papazoglou et al., 1990; Stavridi et al., 1995a, b].
The orthogonalized Wiener-type models in the spatiotemporal or spectrotemporal case
lend themselves to kernel estimation via cross-correlation, using the white (or quasiwhite)
statistical properties of the inputs (as in the single-input case). For instance, the Wiener
kernels in the spatiotemporal GWN input case are obtained as

$$
h_0 = E[r(x_0; y_0; t)] \qquad (7.26)
$$
$$
h_1(x; y; \tau) = \frac{1}{P}\, E[r_1(x_0; y_0; t)\, s(x_0 - x; y_0 - y; t - \tau)] \qquad (7.27)
$$
$$
h_2(x_1, x_2; y_1, y_2; \tau_1, \tau_2) = \frac{1}{2P^2}\, E[r_2(x_0; y_0; t)\, s(x_0 - x_1; y_0 - y_1; t - \tau_1)\, s(x_0 - x_2; y_0 - y_2; t - \tau_2)] \qquad (7.28)
$$

where r1 and r2 denote the output residuals of first order and second order, respectively
(as in the single-input case), and the indicated ensemble average is replaced in practice by
a time average and space average over the finite data record.
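A one-dimensional sketch of the estimator in Equation (7.27): with a white input of power level P, the first-order kernel is recovered by cross-correlating the response with the lagged input. The three-lag test kernel and the Gaussian pseudo-input are illustrative assumptions, and since the simulated system is purely linear, the response itself plays the role of the residual r1; space lags would be handled identically:

```python
# Temporal-only sketch of Equation (7.27): h1(tau) is estimated as the
# time-averaged product y(t)*s(t - tau), divided by the input power level P.
import random

random.seed(0)
P = 1.0                                    # input power level
n = 100_000
s = [random.gauss(0.0, P ** 0.5) for _ in range(n)]
h_true = [1.0, 0.5, -0.25]                 # known test kernel (illustrative)
m = len(h_true)

# Simulate a purely linear response, so y itself serves as the residual r1:
y = [sum(h_true[k] * s[t - k] for k in range(m)) for t in range(m, n)]

# Cross-correlation estimate of the first-order kernel:
h_est = [sum(y[i] * s[i + m - tau] for i in range(len(y))) / (P * len(y))
         for tau in range(m)]
print([round(v, 2) for v in h_est])        # close to [1.0, 0.5, -0.25]
```

The estimate converges to the true kernel as the record length grows, with variance inversely proportional to the number of samples.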
These kernel estimation formulae are based on the autocorrelation properties of the
spatiotemporal GWN input. Specifically, all the odd-order autocorrelation functions are
zero, and the second-order is

$$
E[s(x; y; t)\, s(x'; y'; t')] = P\,\delta(x - x')\,\delta(y - y')\,\delta(t - t') \qquad (7.29)
$$

where δ denotes the Dirac impulse function and P is the power level of the spatiotemporal
GWN input signal. The fourth-order autocorrelation function is also relevant and given by

$$
E[s(x_1; y_1; t_1)\, s(x_2; y_2; t_2)\, s(x_3; y_3; t_3)\, s(x_4; y_4; t_4)]
$$
$$
= P^2 \delta(x_1 - x_2)\delta(y_1 - y_2)\delta(t_1 - t_2)\,\delta(x_3 - x_4)\delta(y_3 - y_4)\delta(t_3 - t_4)
$$
$$
+ P^2 \delta(x_2 - x_3)\delta(y_2 - y_3)\delta(t_2 - t_3)\,\delta(x_4 - x_1)\delta(y_4 - y_1)\delta(t_4 - t_1)
$$
$$
+ P^2 \delta(x_1 - x_3)\delta(y_1 - y_3)\delta(t_1 - t_3)\,\delta(x_2 - x_4)\delta(y_2 - y_4)\delta(t_2 - t_4) \qquad (7.30)
$$

For quasiwhite inputs (e.g., binary or ternary that have been used in spatiotemporal appli-
cations), the fourth-order autocorrelation function and the second-order kernel estimation
formula must be adjusted to the specific second and fourth moments of the quasiwhite
signals (as in the single-input case with CSRS quasiwhite input signals).
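Why the formulae must be adjusted can be sketched with the moments of a simple quasiwhite signal: an equiprobable ternary signal {-a, 0, +a} (equiprobability is an assumption here; CSRS constructions may weight the levels differently) fails the Gaussian fourth-moment relation E[s^4] = 3E[s^2]^2 on which Equation (7.30) rests:

```python
# Second and fourth moments of an equiprobable ternary level set {-a, 0, +a},
# computed exactly, to show the departure from the Gaussian relation
# E[s^4] = 3 * E[s^2]^2 assumed by Equation (7.30).
from fractions import Fraction

a = Fraction(3, 2)                       # ternary amplitude (example value)
levels = [-a, Fraction(0), a]

m2 = sum(v ** 2 for v in levels) / len(levels)   # E[s^2] = 2a^2/3
m4 = sum(v ** 4 for v in levels) / len(levels)   # E[s^4] = 2a^4/3

print(m2, m4, 3 * m2 ** 2)   # Gaussian would give m4 == 3*m2^2; ternary does not
```

The second-order kernel estimation formula must therefore use the actual m2 and m4 of the quasiwhite signal in place of the Gaussian values.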
Obviously, the complexity of these expansions rises rapidly for higher-order kernels,
preventing the practical use of cross-correlation-based methods. Illustrative examples of
second-order spatiotemporal applications are given in Section 7.4.
As in the two-input case, the multiinput Wiener kernels are generally distinct from
their Volterra counterparts (except for the highest two orders of kernels in a finite-order

system that are the same for Volterra and Wiener) and depend on the power levels of the
white (or quasiwhite) test inputs with which they have been estimated. Thus estimating
the input-independent Volterra kernels remains an attractive prospect that is discussed in
the following section.

7.3.2 The Kernel-Expansion Method for Multiinput Modeling


The Volterra kernels of a multiinput system can be estimated with the kernel-expansion
method. As in the two-input case, distinct bases of functions can be used to expand the
kernels associated with each input, resulting in the multiinput modified Volterra model:
$$
y(t) = c_0 + \sum_{j_1=1}^{L_1} c_{1,0,\ldots,0}(j_1)\, v_{j_1}^{(1)}(t) + \cdots + \sum_{j_M=1}^{L_M} c_{0,0,\ldots,1}(j_M)\, v_{j_M}^{(M)}(t) + \cdots
$$
$$
+ \sum_{j_1=1}^{L_1} \cdots \sum_{j_{n_1+\cdots+n_M}=1}^{L_M} c_{n_1,\ldots,n_M}(j_1,\ldots,j_{n_1+\cdots+n_M})\, v_{j_1}^{(1)}(t) \cdots v_{j_{n_1+\cdots+n_M}}^{(M)}(t) + \cdots \qquad (7.31)
$$

where

$$
v_j^{(i)}(t) = \int_0^\infty b_j^{(i)}(\tau)\, x_i(t-\tau)\, d\tau \qquad (7.32)
$$

with bj(i) denoting the jth basis function for the expansion of the kernels corresponding to
input xi. The multivariate model of Equation (7.31) is linear with respect to the expansion
coefficients that can be estimated by means of least-squares regression using input-output
data.
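In discrete time, the regressors of Equations (7.31)-(7.32) can be assembled as sketched below; the exponential basis functions and toy signals are illustrative stand-ins (not, e.g., the Laguerre bases used elsewhere in the book), and the expansion coefficients would then follow from ordinary least squares on the assembled rows:

```python
# Sketch of the kernel-expansion regressors: each input is convolved with its
# basis functions (discrete form of Eq. (7.32)), and products of the resulting
# v's form the regressors of a model linear in the expansion coefficients.

def filter_output(b, x, t):
    """v(t) = sum_tau b(tau) * x(t - tau)."""
    return sum(b[tau] * x[t - tau] for tau in range(min(len(b), t + 1)))

basis_1 = [0.5 ** k for k in range(4)]    # basis for input x1 (illustrative)
basis_2 = [0.8 ** k for k in range(4)]    # basis for input x2 (illustrative)
x1 = [0.0, 1.0, 0.0, 0.0, 0.0, 0.0]       # impulse input
x2 = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0]       # step input

for t in range(3, 6):
    v1 = filter_output(basis_1, x1, t)
    v2 = filter_output(basis_2, x2, t)
    # First-order regressors: v1, v2; second-order: v1*v1, v1*v2, v2*v2
    row = [v1, v2, v1 * v1, v1 * v2, v2 * v2]
    print(row)
```

Stacking one such row per time sample gives the regression matrix; solving the least-squares problem for the coefficient vector yields the expansion coefficients of Equation (7.31).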
This model can be also parsimonious for judicious selection of expansion bases or by
determining the appropriate PDMs of the multiinput system, following the procedures
discussed in the two-input case. Note that the model of Equation (7.31) is valid for all in-
puts and does not require white (or quasiwhite) inputs, so long as the input signals are
fairly broadband (e.g., spontaneous or natural activity). As before, the concept of PDM
modeling leads us to Volterra-equivalent network models that are discussed in the follow-
ing section.

7.3.3 Network-Based Multiinput Modeling


This is the recommended approach to the multiinput modeling problem because of its
scalability advantage (i.e., its complexity, as measured by the number of free parameters,
is linear with respect to the number M of inputs and the order Q of nonlinearity).
To achieve this scalability advantage, we introduce the network architecture of Figure
7.26 that is equivalent to K Volterra models with M inputs and K outputs. This Volterra-
equivalent network employs the distinct filter banks {bj(m)} (j = 1, ..., Lm; m = 1, ..., M)
that preprocess the respective inputs and generate outputs {vj(m)(t)} that are fed into sepa-
rate hidden layers for each input (each having Hm hidden units with polynomial activation
functions {fh(m)} (h = 1, ..., Hm) of degree Qm). The outputs of the hidden units are fed into
a common "interaction layer" with I interaction units that operate in a manner similar to the

[Figure 7.26 schematic: inputs x1, ..., xM feed an input layer, a filter layer, per-input hidden layers, and a common interaction layer that forms the outputs.]

Figure 7.26 The Volterra-equivalent network model in the multiinput, multioutput case (see text).

hidden units, having polynomial activation functions {g_i} (i = 1, ..., I) of degree R_i. It is evident that the interaction layer generates the equivalent Volterra cross-kernels (in addition to the self-kernels that pass through). The outputs of the interaction units are summed with appropriate weights {γ_{i,k}} to form each output y_k (k = 1, ..., K) with offset value y_{k,0}. There are (L_m H_m) in-bound weights {w_{j,h}^(m)} of the hidden units for input x_m, and (H_m I) in-bound weights {r_{h,i}^(m)} of the interaction units from the hidden layer of input x_m. Therefore, the total number of free parameters for this network model is

P = \sum_{m=1}^{M} H_m (L_m + Q_m + I) + (I + 1)K + \sum_{i=1}^{I} R_i \qquad (7.33)

without counting possible parameters in the employed filter banks (e.g., the parameter for each Laguerre filter bank).
In order to demonstrate more easily the scalability advantage of this network, let us assume that all filter banks have the same size (L_m = L), all hidden layers have the same number of hidden units (H_m = H), all hidden units have the same order (Q_m = Q), and all interaction units have the same order (R_i = R). Then the total number of free parameters becomes

P = MH(L + Q + I) + I(K + R) + K + M \qquad (7.34)

where we have also included one free parameter for each filter bank (as in the case of Laguerre filter banks). The expression (7.34) is instructive with regard to the complexity of

this network model, since the total number of free parameters is linear with respect to Q and R, as well as L and M. Note that the multiinput modified Volterra model of Equation (7.31) has a much larger number of free parameters that grows exponentially with the order of nonlinearity and the number of inputs.
For example, in the two-input/two-output case (M = K = 2) and for typical values of the structural parameters in applications to date (L = 7, H = 2, I = 2, Q = R = 2), we have P = 56, whereas the modified Volterra model of Equation (7.31) has 4,232 free parameters, a "compactness" ratio of about 80. Note that the nonlinear order of the multiinput network model is equal to the product (QR).
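The parameter count is easy to check numerically. The sketch below (the function name is ours) encodes the simplified count P = MH(L + Q + I) + I(K + R) + K + M and reproduces the two-input/two-output example:

```python
def network_params(M, K, L, H, I, Q, R):
    """Free-parameter count of the multiinput Volterra-equivalent network
    under equal-size assumptions: P = M*H*(L + Q + I) + I*(K + R) + K + M."""
    return M * H * (L + Q + I) + I * (K + R) + K + M

# Two-input/two-output example from the text: 2*2*(7+2+2) + 2*(2+2) + 2 + 2
print(network_params(M=2, K=2, L=7, H=2, I=2, Q=2, R=2))  # -> 56
```

Doubling Q from 2 to 4 adds only M*H*2 = 8 parameters in this configuration, which is the linear growth in the nonlinear order claimed in the text.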
This immense scalability advantage comes at the price of a more challenging conver-
gence of the iterative training algorithm and is contingent on the assumption that the
structural constraints imposed by the network architecture are compatible with the system
at hand. The validity of the latter assumption is checked by the quality of the model per-
formance (in terms of predictive ability and consistency of results from experiment to ex-
periment) provided that the input ensemble is rich in information content and representa-
tive of the natural operation of the system.
This multiinput network architecture can be applied to the spatiotemporal or spec-
trotemporal modeling problem as well, with immense scalability advantages (especially
for higher-order systems).

7.4 SPATIOTEMPORAL AND SPECTROTEMPORAL MODELING

The multiinput modeling approach presented earlier in this chapter can be used for the
study of spatiotemporal and spectrotemporal dynamics, whereby the different space loca-
tions or spectral bands of stimulation and response represent the multiple inputs and out-
puts. The mathematical formalism of a Wiener-type model (orthogonalized functional se-
ries for independent GWN inputs) is given by Equation (7.24) for the spatiotemporal case
and by Equation (7.25) for the spectrotemporal case. The cross-correlation-based formulae for the estimation of the spatiotemporal Wiener kernels are given by Equations (7.26)-(7.28) for a spatiotemporal output signal.
For an output that is only a function of time (i.e., recorded at a single location xo, Yo),
the cross-correlation formula for the estimation of the qth-order spatiotemporal Wiener
kernel using a finite data record t E [0, R] is

\hat{h}_q(x_1, \ldots, x_q; y_1, \ldots, y_q; \tau_1, \ldots, \tau_q) = \frac{1}{q!\, P^q R} \int_0^R r_q(x_0; y_0; t)\, s(x_0 - x_1; y_0 - y_1; t - \tau_1) \cdots s(x_0 - x_q; y_0 - y_q; t - \tau_q)\, dt \qquad (7.35)

where r_q denotes the qth-order output residual, s denotes the GWN spatiotemporal input of power level P, and the integration over time is replaced by summation for the discretized data. This estimation formula has been used successfully in the study of the visual system up to the second order of nonlinearities, as discussed in the following two sections. However, the quality of the obtained kernel estimates is modest even for very long data records, and this approach becomes impractical for a large number of inputs (i.e., many spatial locations) or for higher-order nonlinearities.
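In discrete time, the estimator of Equation (7.35) amounts to cross-correlating the output with lagged copies of the stimulus at every pixel. The following minimal synthetic illustration covers the q = 1 case; the grid size, record length, and "true" kernel are all made up for the example:

```python
import numpy as np

rng = np.random.default_rng(1)
Nt, G, Mlag = 4000, 8, 6        # time samples, grid side, kernel memory in lags
P = 1.0                         # power level of the quasiwhite input

s = rng.choice([-1.0, 1.0], size=(Nt, G, G))    # binary quasiwhite stimulus

# Synthetic first-order system with a known spatiotemporal kernel h(x, y, tau)
h_true = rng.standard_normal((G, G, Mlag)) * np.exp(-np.arange(Mlag) / 2.0)
y = np.zeros(Nt)
for tau in range(Mlag):
    y[tau:] += np.einsum('txy,xy->t', s[:Nt - tau], h_true[:, :, tau])

# Discrete analogue of Eq. (7.35) for q = 1:
# h1(x, y, tau) ~ (1 / (P * N)) * sum_t y(t) * s(x, y, t - tau)
h_est = np.zeros_like(h_true)
for tau in range(Mlag):
    h_est[:, :, tau] = np.einsum('t,txy->xy', y[tau:], s[:Nt - tau]) / (P * Nt)

corr = np.corrcoef(h_est.ravel(), h_true.ravel())[0, 1]
print(corr)   # close to 1 for this record length; degrades for shorter records
```

Shortening Nt in this sketch makes the estimate visibly noisier, which mirrors the modest estimate quality reported in the text.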
These practical limitations motivated the introduction of more efficient Volterra-type
modeling formulations that have the additional advantage of being applicable to cases

where the multiple inputs are not independent white (or quasiwhite) noise. This is ex-
tremely important in practice since it corresponds to the case of natural spatiotemporal
stimuli (i.e., natural visual images changing in space and time). It is a key tenet of this
book that our modeling efforts should be directed (to the extent practically possible) to-
ward the natural paradigm of system operation (i.e., using the "natural" ensemble of in-
put-output signals).
The efficient Volterra-type formulation of the spatiotemporal or spectrotemporal modeling problem follows the path of the single-input case, whereby Volterra kernel expansions are used to reduce the estimation problem to multivariate linear regression with the minimum number of free parameters [see Equation (7.31)]. Further compaction of this formulation (in terms of reducing the number of free parameters) leads to the network-based formulation presented in Section 7.3.3 that represents the most efficient approach to the multiinput modeling problem to date (including the spatiotemporal and spectrotemporal cases).
The spatiotemporal formulation of the modified Volterra model using kernel expansions is given by

r(x_0; y_0; t) = c_0 + \sum_{j=1}^{L} c_1(j)\, v_j(x_0; y_0; t) + \cdots + \sum_{j_1=1}^{L} \cdots \sum_{j_q=1}^{L} c_q(j_1, \ldots, j_q)\, v_{j_1}(x_0; y_0; t) \cdots v_{j_q}(x_0; y_0; t) + \cdots \qquad (7.36)
where v_j(x_0; y_0; t) denotes the output of a spatiotemporal basis filter b_j(x; y; τ):

v_j(x_0; y_0; t) = \int_{D_\tau} d\tau \int_{D_x} \int_{D_y} dx\, dy\; b_j(x; y; \tau)\, s(x_0 - x; y_0 - y; t - \tau) \qquad (7.37)

The filter bank {b_j} forms a complete basis of the spatiotemporal dynamics of the system and is selected judiciously to minimize the filter-bank size L required for a satisfactory approximation (in direct extension of the kernel-expansion concept used in the single-input case). The spatiotemporal basis functions can be represented in terms of separate temporal and spatial basis functions {b_j^T, b_j^x, b_j^y} as

b_j(x; y; \tau) = b_j^x(x)\, b_j^y(y)\, b_j^T(\tau) \qquad (7.38)

where {b_j^T} can be a Laguerre basis and {b_j^x}, {b_j^y} can be a Hermite or Gabor basis (although there is ample choice for temporal or spatial bases) defined over the domains D_τ, D_x, and D_y, respectively. The spatiotemporal basis {b_j} need not be separable, as in Equation (7.38), but separability simplifies the computational algorithm.
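The computational saving from separability can be made concrete: each v_j can be computed with a spatial projection per frame followed by a 1-D temporal convolution, instead of a full three-dimensional sum at every time step. The sketch below verifies the equivalence with arbitrary, assumed Gaussian spatial and exponential temporal basis functions:

```python
import numpy as np

rng = np.random.default_rng(2)
Nt, G, Mlag = 200, 8, 10

s = rng.standard_normal((Nt, G, G))     # stimulus referenced to (x0, y0)

# Assumed separable basis: Gaussian spatial profiles, decaying-exponential temporal
xs = np.arange(G)
bx = np.exp(-(xs - 3.0) ** 2 / 4.0)     # b^x
by = np.exp(-(xs - 4.0) ** 2 / 6.0)     # b^y
bt = np.exp(-np.arange(Mlag) / 3.0)     # b^T

# Direct evaluation of Eq. (7.37) with the separable kernel of Eq. (7.38)
b3d = bt[:, None, None] * bx[None, :, None] * by[None, None, :]
v_direct = np.array([
    np.sum(b3d[:min(Mlag, t + 1)] * s[t::-1][:Mlag])
    for t in range(Nt)
])

# Separable evaluation: project each frame on b^x b^y, then convolve with b^T
u = np.einsum('txy,x,y->t', s, bx, by)
v_sep = np.convolve(u, bt)[:Nt]

print(np.allclose(v_direct, v_sep))     # True
```

The separable path replaces an O(G² · Mlag) sum per output sample with one O(G²) projection per frame plus a cheap 1-D convolution.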
The formulation of Equation (7.36) reduces the estimation problem to multivariate linear regression, since the unknown parameters (i.e., the kernel expansion coefficients) enter linearly in the modified Volterra model, as in the single-input case. The total number of free parameters in the model of Equation (7.36) for a Qth-order system (i.e., all kernels up to Qth order) is (L + Q)!/(L!Q!). However, if distinct bases are used for the temporal and spatial dynamics (with sizes L_τ, L_x, L_y) without regard to the simplifying constraint of Equation (7.38), then the total number of free parameters increases immensely, because the qth-order spatiotemporal Volterra kernel expansion becomes
k_q(x_1, \ldots, x_q; y_1, \ldots, y_q; \tau_1, \ldots, \tau_q) = \sum_{j_1^x=1}^{L_x} \cdots \sum_{j_q^x=1}^{L_x} \sum_{j_1^y=1}^{L_y} \cdots \sum_{j_q^y=1}^{L_y} \sum_{j_1^\tau=1}^{L_\tau} \cdots \sum_{j_q^\tau=1}^{L_\tau} a_q(j_1^x, \ldots, j_q^x; j_1^y, \ldots, j_q^y; j_1^\tau, \ldots, j_q^\tau)\, b_{j_1^x}^x(x_1) \cdots b_{j_q^x}^x(x_q)\, b_{j_1^y}^y(y_1) \cdots b_{j_q^y}^y(y_q)\, b_{j_1^\tau}^T(\tau_1) \cdots b_{j_q^\tau}^T(\tau_q) \qquad (7.39)

This unwieldy kernel expansion can be contrasted with the relative simplicity of the qth-order kernel expansion for the spatiotemporal basis of Equation (7.38):

k_q(x_1, \ldots, x_q; y_1, \ldots, y_q; \tau_1, \ldots, \tau_q) = \sum_{j_1=1}^{L} \cdots \sum_{j_q=1}^{L} c_q(j_1, \ldots, j_q)\, b_{j_1}(x_1; y_1; \tau_1) \cdots b_{j_q}(x_q; y_q; \tau_q) \qquad (7.40)

Nonetheless, it should be noted that the apparent simplicity of Equation (7.40) disguises the fact that the employed spatiotemporal basis is far more constrained than the general case depicted in Equation (7.39). In a practical setting, the choice of the proper basis depends on the specific characteristics of the system. However, unwieldy models generally emerge for high-order systems (higher than second order), and the network-based approach becomes far more attractive.
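The contrast between the two expansions can be quantified by counting coefficients. In the sketch below, the separable count follows the (L + Q)!/(L!Q!) expression from the text, while the distinct-bases count treats the free combination of spatial and temporal bases as one effective basis of size L_x·L_y·L_τ (an upper-bound sketch of our own; the specific basis sizes are illustrative):

```python
from math import comb

def n_coeffs_separable(L, Q):
    """Coefficients of all kernels up to order Q with a single spatiotemporal
    basis of size L (symmetric kernels): (L + Q)! / (L! Q!) = C(L + Q, Q)."""
    return comb(L + Q, Q)

def n_coeffs_distinct(Lx, Ly, Lt, Q):
    """Freely combined distinct bases act like one basis of size Lx*Ly*Lt."""
    return comb(Lx * Ly * Lt + Q, Q)

print(n_coeffs_separable(12, 2))       # -> 91
print(n_coeffs_distinct(4, 4, 6, 2))   # -> 4753
```

Even at second order and modest basis sizes, the distinct-bases expansion carries roughly fifty times more coefficients, and the gap widens rapidly with Q.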
The recommended Volterra-equivalent network model is shown in Figure 7.27 and employs a spatiotemporal filter bank for input preprocessing, as well as a hidden layer and an interaction layer that seek to approximate the static nonlinear mapping

r(x_0; y_0; t) = F[v_1(x_0; y_0; t), \ldots, v_L(x_0; y_0; t)] \qquad (7.41)

which is valid for all (x_0; y_0; t) data points. The presence of the interaction layer is the key to achieving parsimony in a general spatiotemporal modeling context.
The presented spatiotemporal models can be reduced to the spatial dimensions only by integration over the temporal dimension. This may be appropriate in certain applications, such as the study of visual receptive fields in response to static images or for the analysis of imaging apertures. In the space-only case, the spatial Volterra kernels {k_q^s} are related to their spatiotemporal counterparts {k_q} as

k_q^s(x_1, \ldots, x_q; y_1, \ldots, y_q) = \int_{D_\tau} \cdots \int_{D_\tau} k_q(x_1, \ldots, x_q; y_1, \ldots, y_q; \tau_1, \ldots, \tau_q)\, d\tau_1 \cdots d\tau_q \qquad (7.42)

The mathematical formalism for spectrotemporal models follows the same line of
thinking, where the different spatial locations are replaced by different wavelengths or
frequency bands of spectral interaction.
Although applications of the efficient spatiotemporal modeling approach using Volterra kernel expansions or Volterra-equivalent networks have not been published yet, we present in Section 7.4.2 an illustrative example of a second-order cross-kernel estimate from a cortical neuron obtained by use of Laguerre kernel expansions and compare it with its cross-correlation counterpart.
In closing this section, we must note the pioneering contribution to spatiotemporal

[Figure 7.27 graphic: spatiotemporal input s(x_0, y_0, t) → spatiotemporal filter bank → hidden layer → interaction layer → spatiotemporal output r(x_0, y_0, t).]

Figure 7.27 Spatiotemporal Volterra-equivalent network model with input preprocessing by spa-
tiotemporal filter bank.

modeling of Syozo Yasui, who first proposed the spatiotemporal extension of the
Volterra-Wiener approach in the pivotal 1975 symposium at Caltech [Yasui & Fender,
1975] organized by Panos Marmarelis and Gilbert McCann, and Sutter's pioneering
contribution in the same symposium [Sutter, 1975]. We must also acknowledge
McCann's foresight in pursuing the onerous implementation of this approach to retinal
ganglion cells in collaboration with Citron and Kroeker [Citron et al., 1981b], Citron's
work with Emerson [Citron & Emerson, 1983; Citron et al., 1981a], and Naka's pio-
neering efforts with Yasui, Davis, and Krausz [Davis & Naka, 1980; Krausz & Naka,
1980; Yasui et al., 1979].

7.4.1 Spatiotemporal Modeling of Retinal Cells


The study of visual receptive fields has been pursued conventionally through the tedious
use of many specialized stimuli of various shapes, sizes, orientations, and contrasts. This
laborious and time-consuming process has yielded useful knowledge but at great expense
of time and resources. The advocated spatiotemporal modeling framework offers a far
more efficient path to this goal by deriving quantitative descriptions of the visual recep-
tive fields (RFs) that are valid for all possible stimuli and have predictive ability in a non-
linear dynamic context. Furthermore, the obtained spatiotemporal models can be used to
study motion-related responses and transient behavior in response to arbitrary combina-
tions of stimuli. Therefore, there seldom exists a paradigm of scientific research that can
benefit so much from an emerging methodology as the case of spatiotemporal Volterra

modeling of the visual system. Sadly, this fact has been largely overlooked by the peer
community thus far. This is about to change.
In this section, we present as an example the first application of the spatiotemporal Volterra-Wiener approach to the study of the RF of retinal ganglion cells in the frog (class 3 cells), performed by Citron, Kroeker, and McCann more than 20 years ago [Citron et al., 1981a]. The class 3 ganglion cells in the frog retina exhibit an excitatory RF center (6-10° in diameter), surrounded by an inhibitory RF annulus (from about 12° to 20°), the sole illumination of which does not elicit a response. The experiment involves a spatiotemporal white-noise (STWN) stimulus composed of a 16 x 16 grid of square pixels, each subtending 1.2° on the retina (or a 75 μm square) and modulated by independent binary CSRS signals (on or off for maximum contrast). The CSRS signals switch synchronously (and, of course, randomly) every 42.4 ms for a temporal bandwidth of approximately 24 Hz (or 24 spatial frames per second). This STWN input tests the ganglion cell for a broad variety of stimuli over the length of the experiment (720 sec) and is suitable for estimating the spatiotemporal Wiener-like (binary CSRS) kernels of the system via cross-correlation, except at the diagonal points of the second-order kernels (see Section 2.2.4). The binary waveform was used to simplify the procedure but is unduly restrictive and was replaced by a 16-level CSRS in later experiments. The obtained spatiotemporal
model can predict the response of the ganglion cell to any spatiotemporal stimulus, in-
cluding moving objects and variable scenes.
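A stimulus ensemble of the kind described above is straightforward to synthesize. The sketch below generates the full 720-second STWN stimulus; the ±1 coding of the on/off contrast levels is an assumption on our part:

```python
import numpy as np

rng = np.random.default_rng(3)

dt = 0.0424                  # frame interval (42.4 ms, ~24 frames/s)
T_total = 720.0              # experiment length in seconds
G = 16                       # 16 x 16 grid of square pixels

n_frames = round(T_total / dt)
# Independent binary CSRS per pixel: each pixel is on (+1) or off (-1),
# all pixels switching synchronously (and randomly) every frame
stimulus = rng.choice([-1, 1], size=(n_frames, G, G))

print(stimulus.shape)        # (16981, 16, 16)
```

Each of the 256 pixel sequences is statistically independent of the others, which is the property that makes cross-correlation kernel estimation possible in the first place.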
The obtained spatiotemporal first-order Wiener (CSRS) kernel h_1 is shown in Figure 7.28B as a sequence of eight two-dimensional frames every 42.4 ms, extending 9.6° in either dimension (i.e., 8 x 8 pixels). The maximum (negative) value is observed at τ = 42.4 ms (Figure 7.28C), and the half-max contour is approximately elliptical with main axes of about 7° and 5°. The time course of h_1 (over τ) is shown in Figure 7.28A for the central pixel of the RF; it becomes slightly positive for τ ≥ 85 ms, returning to the zero baseline around τ = 300 ms. It is evident from the spatial dimensions of this measurement that the antagonistic surround of the RF is not included in the h_1 estimate (only the off-center excitatory portion of the RF is included). Since the total area of the stimulus grid is about 20°, we surmise that the annular (surround) portion of the RF was not captured properly in this study because of poor estimation accuracy due to the employed cross-correlation technique. This provides additional motivation for utilizing the improved estimation methods advocated in this book, especially in low-SNR cases.
Note that in the spatiotemporal case, the multiple "channels" of information that flow from the various inputs to the output serve as "interference" (i.e., when the kernels corresponding to the mth input are estimated, the contributions of all other inputs to the output signal act as interference and degrade the output SNR). This is particularly detrimental in the case of the cross-correlation technique, as evidenced by this result. An illustrative example of how much the kernel estimates improve with the use of the Laguerre expansion technique is given in the next section regarding cortical cells.
It must be appreciated that the number of possible cross-kernels in the spatiotemporal model (even for second-order nonlinearities only) increases rapidly with the size of the RF measured in number of pixels, because all possible pair combinations must be considered in principle. For instance, the 8 x 8 RF considered in the example above (even though it represents only the excitatory center) has 2,016 second-order cross-kernels in addition to only 64 self-kernels. This huge number of cross-kernels can be reduced by imposing symmetry and localization constraints, but remains daunting in general. For this reason, we advocate the equivalent PDM and network-based models (shown in Figure

[Figure 7.28 graphic: (A) time course of h₁ at a single pixel (spikes/sec over 0-297 msec); (B) complete spatiotemporal h₁ over eight successive frames (0-297 msec); (C) contour map of h₁ at a single time (T_max).]

Figure 7.28 (A) First-order spatiotemporal Wiener kernel of the class 3 ganglion cell in the frog retina at a single pixel near the center of the receptive field. A flash stimulus at this pixel causes an initial decrease from the mean firing rate of background activity, followed by a smaller but longer increase in firing rate. Error bars represent the standard deviation obtained from three independent trials. (B) First-order spatiotemporal Wiener kernel for all spatial positions in the off-center region of the receptive field over eight successive time frames (every 42.4 ms). Stippled areas represent negative kernel values. Note that each point in the receptive field appears to have the time course shown in A and differs only in magnitude. (C) Expanded view of the peak response frame of the first-order kernel. Abscissa and ordinate give the spatial coordinates of the 16 by 16 array of stimulus pixels. The size of an individual pixel is indicated by the small box in the lower right of the contour map [Citron et al., 1981a].

7.27) that avoid this problem and allow practical extension of spatiotemporal modeling
into higher-order nonlinearities (if needed).
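The combinatorics behind these kernel counts can be checked directly (the helper function below is ours):

```python
def second_order_kernel_counts(n_inputs):
    """Second-order kernels for a multiinput model with one input per pixel:
    n self-kernels plus n * (n - 1) / 2 pairwise cross-kernels."""
    n_self = n_inputs
    n_cross = n_inputs * (n_inputs - 1) // 2
    return n_self, n_cross

print(second_order_kernel_counts(8 * 8))   # -> (64, 2016)
```

Because the cross-kernel count grows quadratically with the number of pixels, a full 16 x 16 grid would already require 32,640 cross-kernels, which is what makes the PDM and network-based alternatives attractive.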
In this example, the temporal second-order kernels were found to be similar in form but of different scale, depending on the distance separating the two input points. A cut through a second-order cross-kernel at the point of maximum value is shown indicatively in Figure 7.29 and exhibits a similar waveform but reverse polarity from the first-order kernel. When the maximum (positive) value of the second-order kernel is plotted along one spatial dimension, the expanse of second-order interactions appears to be comparable to the first-order kernel expanse, as shown in Figure 7.30. However, for a fixed reference point, the peak values of the cross-kernels between neighboring points and the reference point appear to decline faster, as shown in Figure 7.30 with dashed lines in two cases. This intriguing localization of second-order interactions is consistent with the hypothesis of nonlinear subunits within the RF of ganglion cells, as previously posited for the ganglion cells in the cat retina [Victor & Shapley, 1979a, b, 1980].

[Figure 7.29 graphic: time courses of h₁ (top trace) and a cut of h₂ (bottom trace), in spikes/sec over 0-297 msec.]

Figure 7.29 Comparison of the time course of first- and second-order spatiotemporal Wiener kernels of the class 3 ganglion cell in the frog retina. The second-order kernel was cut at a lag of 0.0424 sec (bottom trace) for comparison with the first-order kernel (top trace) [Citron et al., 1981b].

[Figure 7.30 graphic: spatial profiles of the first-order kernel and of the peak two-point interactions along the receptive-field diameter (scale bar: 1.2 degrees).]

Figure 7.30 Comparison of the spatial profiles of first- and second-order spatiotemporal Wiener kernels of the same class 3 ganglion cell shown in Figure 7.29. For clarity, only values at T_max (time of peak response) are plotted. The abscissa represents location along the diameter of the receptive field. The lower solid curve represents the relative magnitude of the first-order kernel, which is a cut through the surface shown in Fig. 7.28C. The dotted and dashed lines represent two examples of two-point spatial interaction about the two reference points indicated by arrows. These spatial interactions were determined around each pixel in the receptive field, and the upper solid line represents the envelope of the peak values of each of these interaction curves [Citron et al., 1981b].

This possible organization of retinal ganglion cells makes them suitable for the net-
work-based spatiotemporal modeling approach advocated in this book, which promises
immense efficiency gains in this formidable problem. The time has come to tackle the full
complexity of this important modeling problem.

7.4.2 Spatiotemporal Modeling of Cortical Cells


The spatiotemporal modeling approach presented in the previous section was also applied by Citron, Emerson, and their associates to cortical cells in the cat striate cortex [Citron et al., 1981a; Citron & Emerson, 1983; Emerson et al., 1987, 1992]. The experimental stimulus for directionally selective cells was changed to randomly moving bars in the preferred direction (one space dimension instead of two), whose location and contrast were determined by two independent CSRS (quasiwhite) processes with 16 possible values for location and three possible values for contrast. The middle contrast value, corresponding to a mean luminance of 222 cd/m², was defined numerically as "zero level"; the low luminance value corresponding to dark was defined as -1, whereas the high luminance value corresponding to 444 cd/m² was defined as +1. The dimension of each bar was 0.5° by 11°, covering the whole RF in the long direction. The luminance and location values switched every 16 ms, and this "random grating" stimulus was applied for 7 min continuously. The peristimulus histogram of the cortical cell (over five repetitions) was used as the output signal [Emerson et al., 1992].
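The random-bar stimulus just described can be sketched as follows; the ±1/0 contrast coding follows the text, while the array layout (one column per bar position) is our own choice of representation:

```python
import numpy as np

rng = np.random.default_rng(4)

dt = 0.016                   # 16 ms switching interval
T_total = 7 * 60.0           # 7 minutes of continuous stimulation
n_steps = round(T_total / dt)

# Two independent CSRS processes: bar location (16 positions) and
# contrast (three levels: -1 = dark, 0 = mean luminance, +1 = bright)
location = rng.integers(0, 16, size=n_steps)
contrast = rng.choice([-1, 0, 1], size=n_steps)

# Equivalent multiinput representation: one input signal per bar position,
# nonzero only where the bar currently sits
frames = np.zeros((n_steps, 16))
frames[np.arange(n_steps), location] = contrast

print(frames.shape)          # (26250, 16)
```

Recast this way, the moving-bar experiment is exactly the multiinput problem of Section 7.3: sixteen quasi-independent input channels driving one output.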
A typical cross-kernel for a strongly directionally selective complex cell is shown in Figure 7.31, corresponding to two adjacent positions (9 and 10). The (τ₁, τ₂) representation is typical of directionally selective cells, as seen earlier in the fly (Section 7.2.1). The (τ, Δτ) representation allows the suppression of the τ variable through integration (summation), because the positive "mount" and negative "valley" are separated at Δτ = 0. Upon suppression of the τ dimension, we display in Figure 7.32 the cross-kernel values over Δτ (from -8 to +8) and space (from 1 to 16), representing the position of the moving bar (the reference bar position is at 9). The alternating "mounts" and "valleys" of the plotted surface reflect the directional selectivity of the complex cortical cell.
The direction of preferred motion is marked on Figure 7.32(c) and corresponds to the
main axis of the elongated main lobe. The null direction of motion is also marked on Fig-
ure 7.32(c) and corresponds to the line that connects the centers of the elliptical contours
(so that integration along this line yields very small values). The slope of the main axis
(preferred direction) determines the preferred velocity for motion detection by this com-
plex cell. As this slope increases, the preferred velocity becomes higher (other directions
result in smaller output values, monotonically decreasing toward the minimum at the null
direction).
The width of the "mounts" and "valleys" is comparable and determines the "optimal"
width of the bar that elicits the maximum response from this complex cell when moving
in the preferred direction. Another implication of this result is the critical value of the
width of a bar above which the directional selectivity of the cell will diminish (because of
attenuation caused by integration along the space dimension).
This example illustrates the wealth of information that can be extracted from cortical
complex (or other) cells by means of a fairly short experiment and rigorous analysis.
Since the presented results were obtained by means of the conventional cross-correlation
technique (with all its well-documented limitations), the advanced estimation methods
advocated in this book can enhance further the quality of the information extracted from
these data.


[Figure 7.31 graphic: (A) cross-kernel contour plot over (τ₁, τ₂); (B) the same values plotted over (τ, Δτ).]

Figure 7.31 (A) Second-order interaction cross-kernel values plotted as a two-dimensional function of the two time lags from two interacting stimuli. (B) Same interaction values as in A, but plotted as a function of stimulus temporal separation (Δτ) versus time after the second stimulus (τ). Square of A maps to triangle of B, and heavy square of B to heavy arrow-shaped region of A. Note that interaction values in Δτ versus τ coordinates of B are nearly separable, which permits suppression of the τ-variable (see text). Contour at zero is not shown in any contour plots because it is dominated by noise [Emerson et al., 1992].

[Figure 7.32 graphic: (A) perspective view of interaction values over neighbor position versus Δτ; (B) contour plot of the same surface; (C) schematic interaction contour pattern with the preferred (upward) and null (downward) motion trajectories marked.]

Figure 7.32 Spatiotemporal dependence of nonlinear interactions around position 9 in the same directionally selective complex cell as in Figure 7.31. (A) Second-order interaction values, such as in Figure 7.31, have been projected onto "neighbor position" versus Δτ by summing over the first 80 msec (5 lags) following the second stimulus, and plotted in perspective view as a function of the full range of 16 possible 0.5° neighbor positions and 15 temporal separations (Δτ) covering ±112 msec. (B) Contour plot shows that the ridge-valley structure of A is obliquely oriented in space-time (dashed contours are negative). (C) Schematic contour pattern shows locus of interactions elicited by a bar moving downward (broken arrow) at 15.5°/sec, which corresponds to the velocity that intersects the minimal interaction values. This cell was directionally selective because upward movement at this speed elicited strong positive interactions (open ellipse), while downward movement at the same speed elicited the strongest suppression in the RF. "Fast" arrow indicates the trajectory associated with a faster upward moving bar [Emerson et al., 1992].

[Figure 7.33 graphic: kernel surface plots (A) and (B) over a 0-20 × 0-20 lag grid, with peak values of order 10⁻³.]

Figure 7.33 Second-order cross-kernels between locations 4 and 5 in a random bar stimulus of a complex cortical cell obtained via Laguerre expansion (A) and cross-correlation (B), using a short input-output data record. The Laguerre-expansion results were consistent for various data segments. The cross-correlation estimate is evidently unreliable.

An illustration of this is given in Figure 7.33, where the cross-kernel between two adjacent input locations (4 and 5) of the random bar stimulus is shown for a simple cortical cell, as estimated by the Laguerre expansion technique in the left panel (L = 6, α = 0.2) and by the cross-correlation technique in the right panel (unpublished results; data provided to the author by Bob Emerson). The comparison attests to the potential benefits accrued by the use of the advocated kernel estimation methods.
8
Modeling of Neuronal Systems

Because of the particular signal modality employed by the nervous system (sequences of action potentials) and the rising interest among neuroscientists in quantitative methods of analysis, we dedicate a separate chapter to the modeling of neuronal systems. The mathematical modeling of neuronal function has been attracting increased attention as the quantitative means for understanding information processing and coding in the nervous system. Modeling efforts have been made from the subcellular and cellular levels to the integrated levels of neuronal networks and behavioral neuroscience. Techniques for neural modeling include parametric and nonparametric methods (i.e., the use of differential or integral equations, respectively) and remain rather diverse. The great diversity of modeling approaches is a natural consequence of the immense variety of neural systems and the diverse requirements of different applications. The approach presented herein must be viewed as a modeling tool suitable for a broad class of problems that involve the nonlinear dynamic relation between discernible and measurable stimuli and the corresponding observable response (output).
The nervous system is composed of a multitude of functional components interconnected in complex configurations that process information in order to perform specific vital tasks in its interactions with the environment and in preserving physiological homeostasis. Developments in systems science allow us to formulate the study of information flow within the nervous system as a "systems problem," whereby signals, representing information, travel between neuronal components and are dynamically transformed by them. The use of the systems approach is predicated on appropriate conceptual and mathematical formalism in describing the "transfer characteristics" of the neural system components, their interconnections, and the transformed neural signals. In the case of the nervous system, the level of system integration/decomposition (i.e., the chosen level for examining the natural hierarchy of functional organization) may range from the subcellular to the behavioral. In Sections 8.1-8.3, we focus on the level of individual neurons (viewed as the basic operational unit), which receive certain input signals and generate, in

Nonlinear Dynamic Modeling of Physiological Systems. By Vasilis Z. Marmarelis 407
ISBN 0-471-46960-2 © 2004 by the Institute of Electrical and Electronics Engineers.

a causal manner, certain output signals. Accurate understanding of the functional properties of the "neuronal unit" facilitates the study of neuronal ensembles and networks, leading to "integrated" neural system representations in Section 8.4.
Two types of neuronal systems will be studied from the modeling viewpoint: those
pertaining to the process of generation of postsynaptic and action potentials by individual
neurons, and those concerning the transformation of sequences of action potentials (and
related events, such as population spikes) between neurons in neuronal ensembles. The
former type of system performs encoding of input information into a sequence of binary
events (action potentials) by individual neurons, and the latter type of system performs
processing of this encoded information by neuronal ensembles. Both types of systems ex-
hibit significant nonlinearities that are essential for neurophysiological function. There-
fore, the advocated methodology appears to be suitable for the task, after appropriate ad-
justments for the particular signal modality of neuronal spike trains (sequences of action
potentials represented as binary variables).
The chapter begins with a general model form (proposed as an alternative to the
Hodgkin-Huxley (H-H) model of excitable membranes) that is suitable for practical data-based modeling of action and postsynaptic potentials. This model form is simpler than the
classic H-H model and can be applied to both voltage-dependent and ligand-dependent
processes at the synapse or the axon hillock. This general model form is used in Section
8.2 to integrate the functional dynamics of a single neuron. In Section 8.3, the method-
ological subtleties attendant to the use of point-process (spike) inputs are discussed and
examples are given from actual applications. In Section 8.4, the modeling problem of sig-
nal transformation and processing in neuronal ensembles is addressed in the
Volterra-Wiener context using the advocated PDM modeling approach. It is shown that
this methodological framework is unique in its power to offer a practicable solution to the
difficult problem of multiunit neuronal processing in the true nonlinear dynamic context.

8.1 A GENERAL MODEL OF MEMBRANE AND SYNAPTIC DYNAMICS

One of the most popular modeling problems in neurophysiology concerns the ion-channel
mechanisms that give rise to transmembrane currents and potentials, starting with the
seminal work of Hodgkin and Huxley on the generation of action potentials at the squid
axon membrane [Hodgkin & Huxley, 1952]. The H-H model is highly celebrated (and
was honored properly with a Nobel prize in 1963) because it delineated quantitatively the
specific mechanisms of distinct biophysical processes in excitable membranes (leading to
the key notion of ion channels) in addition to describing accurately the experimental data.
Thus, it influenced profoundly the scientific way of thinking about these biophysical
processes and illuminated a productive path for numerous subsequent studies, represent-
ing an outstanding example of the utility of mathematical modeling in biology. Nonethe-
less, the H-H model also has had several critics and detractors who questioned its validity
under different sets of conditions than the ones considered by the famous duo.
Numerous attempts have been made to simplify the original form of the H-H model (in
terms of its postulated nonlinearities and the number of free parameters) in order to make
it more useful in the practical context of neuronal ensembles and more accessible mathe-
matically to the broadest possible community of investigators. These attempts range from
the simplistic "integrate-and-fire" model to elaborate phase-space analysis, and this sub-
ject matter remains an active area of research 50 years after the seminal publication of the H-H model. The H-H model has been examined also under conditions of white-noise
stimulation in the Volterra-Wiener modeling context [Guttman et al., 1974, 1975, 1977;
Courellis & Marmarelis, 1989].
In this section, a general parametric model form is presented as a simpler alternative to
the H-H model that can be also extended to synaptic dynamics. The motivation for intro-
ducing this novel model form is to reduce somewhat the complexity of the H-H model and
extend its applicability to synaptic dynamics, while maintaining the essential functional
characteristics observed experimentally for the dynamics of postsynaptic and action potentials. Another motivation is the need for practical estimation of the dynamic characteristics of each ion channel from natural (broadband) input-output data without resorting to voltage-clamp experiments. This general model form aspires to facilitate the functional integration at the single-neuron level and eventually at the aggregate level of neuronal ensembles. Validation of this model relies on inductive, data-based modeling using broadband
experimental data and, therefore, it is also examined in the nonparametric context of the
Volterra-Wiener approach and its PDM modeling variant advocated in this book.
The key observation is that currently used voltage-clamping techniques do not allow the accurate measurement of the voltage-dependent conductances of the various ion channels (as claimed extensively over the last 50 years), even under pharmacological ion-specific blocking of channels. This is evident from the H-H equation, in which the required current for clamping at an imposed reference transmembrane voltage V depends not only on the various conductances at V but also on the partial derivatives of these conductances
with respect to voltage (evaluated at V). For instance, after blocking of the sodium channel with TTX, we expect to observe a clamping potassium current ΔI:

ΔI = [g_K(V) + (∂g_K(V)/∂V)(V − V_K)] ΔV        (8.1)

where ΔV = V − V_m, V_m is the membrane resting potential, V_K is the potassium equilibrium potential, and g_K is the voltage-dependent conductance of the potassium channel.
Note that Equation (8.1) is also simplified in that it neglects the effect of other possible nonsodium channels (that are not blocked by TTX) and the so-called gating current. Although numerical techniques can be construed to obtain an estimate of g_K(V) from voltage-clamp measurements governed by Equation (8.1), this is not a trivial matter in prac-
tice and is subject to many possible sources of error, in addition to requiring long and
laborious experimentation. This fact provides the rationale for the proposed alternative
model that allows the practical estimation of the dynamics of individual ion channels di-
rectly from the data under conditions of natural operation.
The general form of the proposed model is

ẏ + a y = F(v₁, …, v_M) y + b₀ − z        (8.2)

where dot denotes derivative and the negative feedback term z(t) is given by

ż + γ z = R(y)        (8.3)

where y denotes the "output" transmembrane potential, b₀ is a "basal" value of transmembrane current related to intrinsic ion pumps, z denotes the "resetting feedback" (responsible for the refractory behavior of the neuron and related to channel inactivation mechanisms), R denotes a sigmoidal function with high slope (a threshold operation), and F denotes a bounded nonlinear function of the individual ion-channel variables:

vᵢ(t) = gᵢ ⊛ x(t)        (8.4)

where ⊛ denotes convolution between the input current x(t) and an impulse response function gᵢ(τ) characteristic of the ith ion channel. The latter describes the dynamics of
each of the M ion channels. Critical in this formulation is the selection of the function F,
since it is responsible for defining the nonlinearities of the voltage-dependent ion-channel
conductances that lead to generation of an action potential. For synaptic dynamics, the
terms of this model attain different meaning and interpretation.
As an illustrative example, consider two channels (e.g., sodium and potassium) with
dynamics described by the simple biexponential functions

g₁(τ) = a₁(e^{−b₁τ} − e^{−c₁τ})        (8.5)

g₂(τ) = a₂(e^{−b₂τ} − e^{−c₂τ})        (8.6)

where c₁ > b₁ > c₂ > b₂ (b₁ and c₁ correspond to the faster sodium channel). We consider also F to be a sigmoidal function acting on the difference (v₁ − v₂) as

F(v₁ − v₂) = β / {1 + exp[−(v₁ − v₂ − θ)]}        (8.7)

Finally, we consider also a sigmoidal function for the "resetting" feedback function R:

R(y) = ρ / {1 + exp[−λ(y − ζ)]}        (8.8)

where λ attains a high value (sharp slope of the thresholding operation). This model can re-
produce quite accurately the time course of the action potential generated at the axonal
membrane of the squid or elsewhere (with proper adjustment of the parameters). The
main functional features of the model are: (1) the activation phase of spiking suprathreshold behavior (that generates the action potential) due to properly timed negative conductance (a − F), and (2) the inactivation phase of refractory behavior due to the negative suprathreshold feedback z.
For subthreshold behavior, the feedback component z is not active and the voltage-dependent "conductance" [a − F(v₁, v₂)] remains positive, so that the resulting output

y(t) = y(0) · exp{−∫₀ᵗ [a − F(v₁(t′), v₂(t′))] dt′}        (8.9)

does not give rise to an action potential. However, when the integral in the exponent of
Equation (8.9) becomes positive (negative "conductance"), then an action potential re-
sults that triggers the negative feedback loop and causes subsequent hyperpolarization
and refractory behavior.
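The interplay of Equations (8.2)-(8.8) can be sketched numerically. In the Euler simulation below, all parameter values (a, b₀, β, θ, γ, ρ, λ, ζ and the biexponential constants) are illustrative assumptions chosen for demonstration, not values fitted to any membrane: a zero input settles at the subthreshold resting level b₀/[a − F(0)], while a strong input pulse drives the "conductance" (a − F) transiently negative, producing a spike-like excursion that the feedback z then resets.

```python
import numpy as np

def biexp(tau, amp, b, c):
    # Biexponential channel impulse response, Eqs. (8.5)-(8.6)
    return amp * (np.exp(-b * tau) - np.exp(-c * tau))

def simulate(x, dt=1e-3, a=5.0, b0=1.0, beta=40.0, theta=3.0,
             gamma=5.0, rho=200.0, lam=20.0, zeta=1.0):
    """Euler integration of Eqs. (8.2)-(8.8) with two ion channels."""
    n = len(x)
    tau = np.arange(0.0, 2.0, dt)
    g1 = biexp(tau, 1.0, 8.0, 20.0)    # faster, sodium-like (c1 > b1)
    g2 = biexp(tau, 1.0, 2.0, 5.0)     # slower, potassium-like (c2 > b2)
    v1 = np.convolve(x, g1)[:n] * dt   # v_i = g_i convolved with x, Eq. (8.4)
    v2 = np.convolve(x, g2)[:n] * dt
    F = beta / (1.0 + np.exp(-(v1 - v2 - theta)))          # Eq. (8.7)
    y = np.zeros(n)
    z = np.zeros(n)
    for k in range(n - 1):
        arg = np.clip(-lam * (y[k] - zeta), -60.0, 60.0)   # avoid overflow
        R = rho / (1.0 + np.exp(arg))                      # Eq. (8.8)
        z[k + 1] = z[k] + dt * (R - gamma * z[k])          # Eq. (8.3)
        y[k + 1] = y[k] + dt * ((F[k] - a) * y[k] + b0 - z[k])   # Eq. (8.2)
    return y, z

# subthreshold: zero input settles at the resting level b0/(a - F(0))
y_rest, _ = simulate(np.zeros(4000))

# suprathreshold: a strong input pulse makes (a - F) transiently negative;
# y fires a spike-like excursion and the feedback z resets it
x = np.zeros(6000)
x[1000:1150] = 400.0
y_spike, _ = simulate(x)
```

After the excursion, y returns to the same resting level, since F falls back below a once the channel variables decay.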
The sigmoidal functions F and R secure stable operation and bounded output. Their
sigmoidal form can be attributed to the self-limiting process, intrinsic in biophysical processes, where the substrate for the subject species gets depleted (Bernoulli equation).
The negative feedback mechanism is attributed to the inactivation of the sodium channel.
The channel dynamics (or kinetics) described by the impulse response functions {gᵢ} may
attain any form consistent with the data, but the dynamics for y and z are set to first-order
(capacitive effects) to simplify the formalism without sacrificing much in generality. For
a step input, the variables v₁ and v₂ will reach steady asymptotic values that will trigger
repetitive action potentials, if above threshold. The frequency of these action potentials
will increase with increasing step input values until a limit is reached due to the saturation
of the sigmoidal function F.
The claim that this model can be used for estimation of individual channel dynamics is due to its mathematical relation with the Volterra kernels of this system (or their equivalent PDMs) that can be directly measured from the input-output data in the subthreshold region (as shown below).
Furthermore, the claim of applicability to various transmembrane processes and synap-
tic dynamics is based on the fact that any number of channels of arbitrary dynamics can
be used with various nonlinear relationships in this formulation. Note that the feedback
term can be dropped for synaptic dynamics (if no fast inactivation mechanisms are pre-
sent), or can be relaxed to a much "softer" feedback mechanism. The generality of this
model form is appealing if we can show that the individual channel characteristics can be
related to kernel or PDM measurements, as intended.
Let us now examine this model in the context of Volterra analysis. For the two-channel example, consider the Taylor expansions of the sigmoidal functions:

F(v₁ − v₂) = φ₀ + φ₁(v₁ − v₂) + φ₂(v₁ − v₂)² + ⋯        (8.10)

R(y) = r₀ + r₁ y + r₂ y² + ⋯        (8.11)

The analysis is simplified considerably in the subthreshold region, where the zero- and first-order Volterra kernels of this system are given by

k₀ = b₀ / (a − φ₀)        (8.12)

k₁(τ) = k₀φ₁ ∫₀^τ e^{−pλ} [g₁(τ − λ) − g₂(τ − λ)] dλ        (8.13)

where p = a − φ₀. The first-order Volterra kernel attains a simpler form in the Laplace do-
main:

K₁(s) = k₀φ₁ [G₁(s) − G₂(s)] / (s + p)        (8.14)

where G₁(s) and G₂(s) are the Laplace transforms of g₁ and g₂, respectively.
Note that the zero-order Volterra kernel k₀ is the resting potential and known in practice. Also, the basal value b₀ can be measured or estimated separately. Therefore, Equation (8.12) allows estimation of the "rate constant" (a − φ₀) = b₀/k₀. This rate constant is important because it appears as the pole of the first-order Volterra kernel, as indicated in Equation (8.14). Thus, the key function G(s) = [G₁(s) − G₂(s)] can be estimated from Equation (8.14) within the scalar φ₁, using the first-order Volterra kernel estimate that is obtained from the subthreshold input-output data (which ought to be broadband from "natural" operation of the system).
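This estimation step can be sketched in discrete time: Equation (8.14) gives (s + p)K₁(s) = k₀φ₁ G(s), so the key function follows as g(t) = [k̇₁(t) + p k₁(t)]/(k₀φ₁). In the sketch below, the channel kernels and the values of k₀, φ₁, and p are illustrative assumptions; k₁ is synthesized from Equation (8.13) and g is then recovered from it.

```python
import numpy as np

dt = 1e-3
tau = np.arange(0.0, 3.0, dt)
g1 = np.exp(-8.0 * tau) - np.exp(-20.0 * tau)   # assumed sodium-like channel
g2 = np.exp(-2.0 * tau) - np.exp(-5.0 * tau)    # assumed potassium-like channel
g = g1 - g2                                     # key function g(t)

k0, phi1, p = 0.3, 0.8, 3.0                     # illustrative values
# first-order Volterra kernel, Eq. (8.13): k1(t) = k0*phi1 * (e^{-p t} conv g)(t)
k1 = k0 * phi1 * np.convolve(np.exp(-p * tau), g)[:len(tau)] * dt

# invert Eq. (8.14): G(s) = (s + p) K1(s) / (k0*phi1),
# i.e., g(t) = (dk1/dt + p*k1) / (k0*phi1)
g_hat = (np.gradient(k1, dt) + p * k1) / (k0 * phi1)
```

The recovered g_hat matches g up to discretization error, illustrating why the pole p = b₀/k₀ and a first-order kernel estimate suffice to measure the channel dynamics within the scalar φ₁.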
The waveform of the key function g(t), which is the inverse Laplace transform of G(s),
contains the critical information regarding the dynamics of the individual ion channels
partaking in the generation of the action potential (or postsynaptic potentials in the case of
synapse models). The scaling by the parameter φ₁ does not affect this waveform. If pharmacological isolation of individual ion channels is possible, then each individual channel response function gᵢ(t) can be determined through this relatively simple procedure. For instance, in the aforementioned example of the two channels (sodium and potassium), blocking of the sodium channel with TTX allows estimation of the potassium-channel dynamics g₂(t), and blocking of the potassium channel with TEA allows estimation of the sodium-channel dynamics g₁(t), provided, of course, that no other ion channels are pre-
sent. These measurements are based on our ability to obtain reliable estimates of the first-
order Volterra kernel under subthreshold conditions.
Having evaluated the individual channel dynamics {gᵢ(t)}, we now turn to the estimation of the nonlinearity F. In the subthreshold region, the second-order Volterra kernel in the two-dimensional Laplace domain is given by

K₂(s₁, s₂) = {k₀φ₂ G(s₁)G(s₂) + (φ₁/2)[K₁(s₁)G(s₂) + K₁(s₂)G(s₁)]} / (s₁ + s₂ + p)        (8.15)

or in the time domain:

k₂(τ₁, τ₂) = k₀φ₂ ∫₀^{τ_m} e^{−pλ} g(τ₁ − λ) g(τ₂ − λ) dλ
        + (φ₁/2) ∫₀^{τ_m} e^{−pλ} [k₁(τ₁ − λ) g(τ₂ − λ) + k₁(τ₂ − λ) g(τ₁ − λ)] dλ        (8.16)

where τ_m = min(τ₁, τ₂). The second-order Volterra kernel measurement under subthreshold broadband conditions can be used to validate the structure of our model (since its dependence on the previously estimated k₁ and g is specific to the structure of the model). This validation process entails the least-squares fitting of the measured k₂(τ₁, τ₂) to the two components on the right-hand side of Equation (8.16) that can be constructed from the known p, g, and k₁. This least-squares fitting (if it validates the postulated model structure) yields estimates of the scalar quantities (k₀φ₂) and φ₁. Since k₀ is already estimated, estimates of the parameters φ₁ and φ₂ become available, which can be used to estimate the two unknown parameters (β and θ) of the sigmoidal nonlinearity F given by Equation (8.7).
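This least-squares validation step can also be sketched numerically: the two components on the right-hand side of Equation (8.16) are discretized into basis matrices, a "measured" k₂ is synthesized from assumed values of (k₀φ₂) and φ₁, and ordinary least squares recovers those scalars. All numerical values here are illustrative assumptions.

```python
import numpy as np

dt = 0.01
tau = np.arange(0.0, 1.5, dt)
n = len(tau)
g = (np.exp(-8*tau) - np.exp(-20*tau)) - (np.exp(-2*tau) - np.exp(-5*tau))
p = 3.0
k0, phi1, phi2 = 0.3, 0.8, -0.15    # assumed "true" values to be recovered

# first-order kernel from Eq. (8.13)
k1 = k0 * phi1 * np.convolve(np.exp(-p * tau), g)[:n] * dt

# discretize the two components on the right-hand side of Eq. (8.16)
B1 = np.zeros((n, n))
B2 = np.zeros((n, n))
for m in range(n):
    w = np.exp(-p * tau[m]) * dt
    gs = np.concatenate([np.zeros(m), g[:n - m]])    # g(tau - lambda)
    ks = np.concatenate([np.zeros(m), k1[:n - m]])   # k1(tau - lambda)
    B1 += w * np.outer(gs, gs)
    B2 += 0.5 * w * (np.outer(ks, gs) + np.outer(gs, ks))

# a "measured" second-order kernel consistent with the model structure
k2 = (k0 * phi2) * B1 + phi1 * B2

# least-squares fit of k2 onto the two components recovers (k0*phi2) and phi1
A = np.column_stack([B1.ravel(), B2.ravel()])
(k0phi2_hat, phi1_hat), *_ = np.linalg.lstsq(A, k2.ravel(), rcond=None)
```

With a real kernel estimate, a large fitting residual would indicate that the postulated model structure does not hold.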
This relatively simple procedure of estimating the model components in the subthresh-
old region (using Volterra kernel measurements up to second order, obtained from the
broadband input-output data) can be extended now to the suprathreshold region in order
to estimate the characteristic parameters γ, ρ, λ, and ζ of the feedback term z. There are
two routes to this goal: (1) the use of Volterra kernel measurements from data in the
suprathreshold region, and (2) the direct least-squares fitting of Equation (8.3) using esti-
mates of z(t) from suprathreshold data; since we have from Eq. (8.2)

z = F(v₁, v₂) y + b₀ − a y − ẏ        (8.17)



Recall that the right-hand side of Equation (8.17) is known because all its parameters
have been already estimated from subthreshold data. Thus, fitting of the integral equation

z(t) = ∫₀^∞ e^{−γτ} R[y(t − τ)] dτ        (8.18)

where z(t) and y(t) are known, can yield estimates of γ and the parameters of the sigmoidal nonlinearity R. This fitting procedure also validates (or does not) the feedback structure of the postulated model. Significant discrepancies will necessitate, of course, modification of the postulated feedback structure of the model.
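The second route can be sketched as follows. Assuming the sigmoidal R has already been estimated, γ follows from a linear least-squares step on Equation (8.3), ż + γz = R(y); in the sketch below a synthetic y(t) stands in for the trajectory that Equation (8.17) would supply from data, and all parameter values are illustrative assumptions.

```python
import numpy as np

dt = 1e-3
t = np.arange(0.0, 4.0, dt)
# stand-in suprathreshold trajectory y(t); in practice y and z come from
# Equation (8.17) applied to recorded input-output data
y = 1.2 + np.sin(2*np.pi*1.5*t) + 0.3*np.sin(2*np.pi*4.0*t)

rho, lam, zeta, gamma_true = 50.0, 20.0, 1.5, 4.0
R = rho / (1.0 + np.exp(-lam * (y - zeta)))        # Eq. (8.8)

# forward-Euler integration of Eq. (8.3): z' = R(y) - gamma*z
z = np.zeros_like(t)
for k in range(len(t) - 1):
    z[k + 1] = z[k] + dt * (R[k] - gamma_true * z[k])

# least-squares estimate of gamma from z' + gamma*z = R(y)
zdot = np.gradient(z, dt)
resid = R - zdot                                   # should equal gamma * z
gamma_hat = float(np.dot(resid, z) / np.dot(z, z))
```

When the parameters of R are unknown as well, the same residual can be minimized jointly over (γ, ρ, λ, ζ) with a nonlinear least-squares routine.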
The use of Volterra kernel measurements in the suprathreshold region requires the following analytical expressions that result from Volterra analysis of the model with the
feedback term active (suprathreshold behavior):

k₀ˢ = (γ b₀ − r₀) / (γ p)        (8.19)

k₁ˢ(τ) = k₀ˢ φ₁ ∫₀^τ h(λ) g(τ − λ) dλ        (8.20)

where the superscript "s" denotes "suprathreshold," and h(λ) is the inverse Laplace trans-
form of

H(s) = (s + γ) / [(s + p)(s + γ) + c₁]        (8.21)

with the constant c₁ given by

c₁ = Σ_{n=1}^∞ n rₙ (k₀ˢ)^{n−1}        (8.22)

For the second-order Volterra kernel in the suprathreshold case, the analytical expression
in the Laplace domain is

K₂ˢ(s₁, s₂) = {(φ₁/2)(s₁ + s₂ + γ)[K₁ˢ(s₁)G(s₂) + K₁ˢ(s₂)G(s₁) + (2k₀ˢφ₂/φ₁)G(s₁)G(s₂)]
        − c₂ K₁ˢ(s₁) K₁ˢ(s₂)} / [(s₁ + s₂ + p)(s₁ + s₂ + γ) + c₁]        (8.23)

where

c₂ = Σ_{n=2}^∞ [n! / (2!(n − 2)!)] rₙ (k₀ˢ)^{n−2}        (8.24)

The complexity of the second-order Volterra kernel expression makes it unlikely that it will be put into practical use. Therefore, the fitting procedure of Equations (8.17) and (8.18) appears more likely to be used in the suprathreshold region.
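The closed-loop filter h(λ) of Equations (8.20)-(8.21) is nonetheless easy to evaluate by partial-fraction expansion of H(s); the values of p, γ, and c₁ below are illustrative assumptions.

```python
import numpy as np

p, gamma, c1 = 3.0, 4.0, 6.0
# H(s) = (s + gamma) / [(s + p)(s + gamma) + c1],  Eq. (8.21)
# denominator: s^2 + (p + gamma) s + (p*gamma + c1)
s1, s2 = np.roots([1.0, p + gamma, p * gamma + c1])

# partial fractions: H(s) = A/(s - s1) + B/(s - s2)
A = (s1 + gamma) / (s1 - s2)
B = (s2 + gamma) / (s2 - s1)

dt = 1e-3
t = np.arange(0.0, 5.0, dt)
h = (A * np.exp(s1 * t) + B * np.exp(s2 * t)).real   # h(0+) = 1 since H ~ 1/s

dc_gain = np.sum(h) * dt    # approximates H(0) = gamma / (p*gamma + c1)
```

For these values the poles are complex conjugates with negative real part, so the closed loop is stable and h(λ) is a damped oscillation.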

With regard to synaptic dynamics, the general model retains the same form but without
the sharp feedback term of the inactivation process (feedback is possible but of milder
functional form). The individual processes of presynaptic release and postsynaptic bind-
ing of various neurotransmitters are represented by the functions {gi} that describe indi-
vidual "synaptic-channel" dynamics. The activation nonlinearity F describes both the in-
dividual self-limiting nonlinearities for each neurotransmitter (typically of the
compressive sigmoidal type) as well as the possible nonlinear interactions among differ-
ent neurotransmitter processes (including "synaptic channels" dependent on second-mes-
senger processes). The output y(t) represents the postsynaptic potential induced by a
presynaptic voltage or current input x(t). The analysis of this model replicates the proce-
dure outlined previously for the subthreshold region of the axonal membrane.
In the following section, we employ the general model form obtained in this section to
examine the functional integration of a single neuron and to derive a general, yet realistic
and practicable, model representation of this integration process that is fundamental to the
function of the nervous system.

8.2 FUNCTIONAL INTEGRATION IN THE SINGLE NEURON

We focus on the neuron as the basic operational neural unit from the viewpoint of signal
transformation and coding. The term "transformation" is used to indicate the dynamic
process of combining multiple channels of input information to produce a composite in-
tracellular potential or current at the site of the axon hillock where the generation of ac-
tion potentials takes place-the latter representing the "coding" of neural information.
Note that the representation of information flow in the nervous system (neural signals) is
either in the form of "graded potentials" (GP) or "action potentials" (AP). The latter are
often referred to as "spikes" (i.e., impulse-like waveforms) and it is accepted that their in-
formation content is in the interspike time intervals and not in their magnitude or shape.
The graded potentials are continuous-time signals, whose values (like those of the APs)
are measured relative to the "resting potential" of the neuron. The generation and trans-
formation of these potentials is accomplished through complex biophysical mechanisms
of excitable tissues (ionic currents arising from changing membrane permeabilities), re-
lease and binding of neurotransmitters in synaptic junctions, electrical coupling in "tight"
junctions, and passive "electrotonic" spread of potentials. These biophysical mechanisms
have been studied extensively and are the subject of a vast literature.
In this section, we limit ourselves to the fundamentals of neuronal structure and func-
tion that are common (with variations in size, geometry, and specific functional attributes)
in most neurons. The "prototypical" neuron is comprised of a cell body (soma), a tree-like
structure of receiving fibers (dendrites), and a long transmitting fiber (axon) with occa-
sional branches (collaterals). The axon is attached to the soma at the "axon hillock" and,
along with its collaterals, ends at synaptic terminals (boutons) that are used to relay infor-
mation onto other neurons through "synaptic junctions." The soma contains the nucleus
and is attached to the trunk of the dendritic tree from which it receives incoming information. The dendrites are conductors of input information to the soma (input ports) and usu-
ally exhibit a high degree of tree-like arborization (up to several thousand branches). In-
put information arrives at various synaptic sites in the dendritic tree and the soma through
axonal terminals of other neurons or electrical (tight) junctions. The axonal terminals re-
lease chemical neurotransmitters into the synaptic cleft upon stimulation by arriving pulses (the branched-out APs of presynaptic neurons), which diffuse across the synaptic gap (typically 200 Å wide) of the synaptic junction and induce the generation of a "postsynaptic potential" (PSP) at the postsynaptic site of the dendrite or soma by binding onto neuro-
transmitter-specific receptors. The neurotransmitter can be excitatory or inhibitory, which
determines the relative polarity between presynaptic and postsynaptic potentials. The PSP
propagates away from the synaptic junction in an electrotonic manner (i.e., in a way anal-
ogous to current flow in passive electric cables), leading to attenuation of amplitude and
spreading of the waveform. The resulting dendritic potentials (DPs) merge at the numer-
ous nodes of the dendritic tree and eventually arrive at the soma where, combined with
potentials generated directly at the soma by possible somatic junctions, they produce the
composite intracellular potential or current at the site of the axon hillock. Note that the
merging of DPs is, most likely, a nonlinear operation because these potentials are due pri-
marily to migration of sodium ions, which may lead to sigmoidal saturation. Although
passive dendritic fibers are more common, active (or semiactive) ones have been found
that have voltage-dependent permeabilities and allow at least partial regeneration of the
propagating signal, leading to less attenuation of amplitude. This partial regeneration
property of dendritic membrane should be contrasted with the full regeneration property
of axonal membrane that maintains the amplitude of the propagating AP along the axon.
There are two types of neurons: those that generate APs at the axon hillock through a
threshold-trigger mechanism, and those which do not. The latter transmit through their
axon a graded potential only over very short distances (<0.3 mm), owing to rapid attenua-
tion of GPs with distance. Therefore, the use of GPs for interneuronal communication is
limited to neurons with very short axons (e.g., photoreceptors and horizontal and bipolar
cells in the retina). On the other hand, APs can propagate practically unaltered over long
distances (especially in the myelinated axons) because they are regenerated by the ex-
citable membrane, thus offering an effective mode of communication in the nervous sys-
tem.
Having established in the previous section the general model form for the two key
transformational processes in the neuron (i.e., the generation of transmembrane and post-
synaptic potentials in response to transmembrane currents and presynaptic potentials, re-
spectively), we now proceed with the ambitious goal of modeling the entire process of
functional integration in the single neuron. This entails the quantitative description of the
manner in which a multitude of incoming synaptic inputs at the dendrites are integrated
within the neuron to generate an action potential output at the axon hillock. Accomplish-
ment of this task requires not only quantitative understanding of the two key transforma-
tional processes mentioned above, but also quantitative understanding of the process of
propagation of postsynaptic potentials through the dendritic branches and their integration
at the soma prior to arrival at the axon hillock.
In keeping with the basic tenet of this book for balancing fidelity and completeness of
description with practical considerations, the suitability of simplistic notions of linear
"cable theory" is questioned for the purpose of accurate modeling of dendritic propaga-
tion and integration. We hold the view that this process is rather complicated, not only be-
cause it is governed locally at each dendritic patch by nonlinear dynamic equations of the
general form previously discussed (some dendrites may also have excitable membranes,
and neuromodulators may also exert influences away from the synaptic clefts), but also
because of the extremely complicated geometry of the dendritic arborization. On the oth-
er hand, we see no compelling practical motivation to examine in detail the enormous
complexity of the dendritic geometry, and we advocate instead the notion of functional integration, whereby the modeling problem is cast as the "mapping" of all synaptic inputs
onto the transmembrane current entering the axon hillock. This process is nonlinear and
dynamic and, therefore, amenable to our methodology, which offers both practical advan-
tages and a realistic description of somato-dendritic integration. Consequently, the advo-
cated modeling approach appears to be suitable if our objective is the quantitative descrip-
tion of how a single neuron acts as integrator of incoming synaptic information to
produce (or not) an action potential at any given point in time. Clearly, if the objective is
the understanding of the detailed biophysical mechanisms subserving this integrative
process, then detailed modeling analysis must be applied to each dendritic or somatic location over the time course of the process.
The objective of functional integration at the single-neuron level is pursued by first
defining the synaptic inputs. There are numerous postsynaptic potentials (both excitatory
and inhibitory) that are generated at the synaptic junctions in response to incoming axon-
al impulses from many terminals of various presynaptic neurons (ignoring for now the
possible inputs from electric gap junctions). Let us consider first as an example the case
of a single presynaptic neuron with axonal spike-train activity denoted by

x(t) = Σᵢ Aᵢ p(t − Dᵢ)        (8.25)

where Aᵢ is the amplitude of the ith presynaptic pulse p(t − Dᵢ) (of fixed form) arriving at the ith axon terminal, and Dᵢ is the respective time delay due to propagation through dif-
ferent lengths of axon terminals. This presynaptic potential induces a postsynaptic poten-
tial zᵢ(t) upon transformation by a nonlinear dynamic operator of the form given by Equa-
tions (8.2)-(8.4) that describe the specific synaptic transmission. Subsequently, the ith
postsynaptic potential zᵢ(t) propagates along the dendritic branches, merging with other
propagating postsynaptic potentials to arrive eventually at the soma in the form of a single
aggregate potential.
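Equation (8.25) is straightforward to realize numerically; the rectangular pulse shape, the amplitudes Aᵢ, and the delays Dᵢ below are illustrative assumptions.

```python
import numpy as np

dt = 1e-4
t = np.arange(0.0, 0.2, dt)

def pulse(tt, width=1e-3):
    # fixed-form presynaptic pulse p(t): a 1-ms rectangle (an assumed shape)
    return ((tt >= 0.0) & (tt < width)).astype(float)

amplitudes = [1.0, 0.8, 1.2]       # A_i
delays = [0.020, 0.055, 0.120]     # D_i in seconds (propagation delays)

# Eq. (8.25): x(t) = sum_i A_i p(t - D_i)
x = np.zeros_like(t)
for A, D in zip(amplitudes, delays):
    x += A * pulse(t - D)
```

Any other fixed pulse waveform (e.g., a biexponential) can be substituted for the rectangle without changing the structure of Equation (8.25).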
As mentioned earlier, the propagation and merging process from the dendrites and
soma to the axon hillock is very complicated and can be lumped for practical reasons into
a single nonlinear dynamic operator S that acts on all N synaptic inputs [A}x(t - D}), ... ,
ANt(t - D N ) ] to produce the single signal s(t) that represents the input signal at the axon
hillock and generates an action potential (or does not) according to the nonlinear dynamic
relation defined by Equations (8.2)-(8.4). The operator S acts on the N synaptic inputs in
a manner determined by the geometry and biophysics of the full dendritic arborization,
the soma, and the morphology of the synaptic clefts.
In the interest of simplifying the modeling task, we may incorporate the process by
which the various presynaptic potentials are produced at the axon terminals of a presynap-
tic neuron and define an operator Q that represents the cascaded operations of the axon
terminal distribution and dendritic integration as a single operation on the activity {x(t)}
at the axon hillock of the presynaptic neuron:

s(t) = Q[x(t − τ); 0 ≤ τ ≤ μ]        (8.26)

The operator Q can be modeled as a Volterra system receiving input x(t) from the presy-
naptic neuron to produce the intracellular current s(t) that represents the input to the axon
hillock. The practical advantage of this formulation is that it allows estimation of the op-
erator Q (as a Volterra-type model) from actual data. When cascaded with the model of
Equations (8.2)-(8.4) for the generation of action potentials, this formulation provides
complete and efficient representation of realistic functional integration by a single neuron. Clearly, the existence of more presynaptic neurons does not alter any of these
methodological facts but simply requires use of the multiinput modeling methods of
Chapter 7 to describe the multiinput case:

s(t) = Q_M[x₁(t − τ₁), …, x_M(t − τ_M); τ₁, …, τ_M ∈ [0, μ]]        (8.27)

The application of the PDM modeling concept to the concatenation of the processes of somatodendritic integration and spike (action potential) generation has given rise to the notion of "neuronal modes" for information processing in the nervous system, as discussed in the following section along with the related issue of "trigger regions." Since the inputs provided by the presynaptic neurons are in the form of spike trains (sequences of action potentials), the modeling task can be adjusted (for simplification of processing) to the case of systems with point-process inputs discussed in Section 8.3.
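A discrete sketch of the operator Q of Equation (8.26) as a second-order Volterra system follows; the kernels q₁ and q₂ are illustrative assumptions, and a separable second-order kernel is chosen deliberately so the output can be checked in closed form.

```python
import numpy as np

rng = np.random.default_rng(0)
x = (rng.random(2000) < 0.05).astype(float)   # presynaptic spike train input

M = 40                                        # memory extent (mu in samples)
m = np.arange(M)
q1 = (m / 8.0) * np.exp(-m / 8.0)             # assumed first-order kernel of Q
q2 = -0.05 * np.outer(q1, q1)                 # assumed (separable) 2nd-order kernel

def Q(x, q1, q2):
    """Second-order discrete Volterra operator, a sketch of Eq. (8.26)."""
    n, M = len(x), len(q1)
    s = np.zeros(n)
    for t in range(n):
        past = x[max(0, t - M + 1): t + 1][::-1]     # x(t), x(t-1), ...
        past = np.pad(past, (0, M - len(past)))      # zero history before t = 0
        s[t] = q1 @ past + past @ q2 @ past          # 1st- + 2nd-order terms
    return s

s = Q(x, q1, q2)
```

Because q₂ = −0.05 q₁q₁ᵀ here, the output reduces to s = u − 0.05 u² with u the first-order convolution, which makes the compressive second-order effect explicit; in practice the kernels of Q would be estimated from data rather than assumed.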

8.2.1 Neuronal Modes and Trigger Regions


We conceptualize each neuron as a "black box" that receives certain input signals and
produces certain output signals on the basis of a nonlinear dynamic "rule" described by an
explicit mathematical expression (model). The use of the black-box concept for the repre-
sentation of a single neuron allows us to bypass the complex biophysical mechanisms that
are active at the subcellular level, and simply concentrate on the input-output transforma-
tion. This, in turn, allows the development of reduced-complexity models for aggregates
of neurons and the study of their functional properties.
In developing this mathematical model, we seek a compromise between mathematical
complexity and biological relevance in order to obtain tractable mathematical formula-
tions of the problem while preserving the essential functional features that have been ob-
served experimentally. Our goal is to search for unifying operational principles that ex-
plain the largest possible amount of current experimental evidence and conceptually
organize our understanding of the system function. There is no doubt that nonlinearities
are omnipresent in neural systems and their role is essential for many aspects of their
function. Compressive and decompressive nonlinearities observed in sensory receptors,
certain types of facilitatory or suppressive/depressive interaction, synaptic gap transmis-
sion, and the generation mechanism of action potentials are some of the most widely rec-
ognized examples. The challenge is the actual identification ofthese dynamic nonlineari-
ties from data, and their modeling in a manner that is practical and meaningful from the
point of view of advancing our understanding of neural function. Finally, one should note
the unavoidable presence of extraneous and intrinsic noise that places the modeling prob-
lem in a stochastic framework.
The modeling concept of "neuronal modes" (NMs) is introduced in order to provide
concise and general mathematical representations of the nonlinear dynamics involved in
signal transformation and coding by neuronal systems [Marmarelis & Orme, 1993]. This
modeling concept is based on the judicious selection of the principal dynamic modes of a
neuronal system to achieve a reasonable balance between mathematical complexity and
neurophysiological relevance. The proposed modeling approach decomposes into two op-
erational steps the nonlinear dynamics involved in the transformation of the input signal
into the output signal. The first operational step involves the aforementioned NMs that
perform all linear dynamic (filtering) operations on the input signal. The second operational step employs a multiinput static nonlinearity (MSN) to produce a continuous out-
put, or a multiinput threshold-trigger (MTT) operator (i.e., a binary operator with multiple
real-valued operands that are the outputs of the NMs) with a refractory mechanism that
produces the sequence of output spikes (action potentials). The two operational steps in-
volving NMs and MTT are depicted in the block-structured model of Figure 8.1, which is the modular form of the PDM model adapted to the case of spike-output systems [Marmarelis, 1989c; Marmarelis & Orme, 1993].
The NMs are the principal dynamic modes (PDMs) of a neuronal unit or system, when
the PDM analysis procedure presented in Section 4.1.1 is applied to the neuronal
input-output data. Thus, when the input is the axonal activity of a presynaptic neuron and
the output is the generated sequence of action potentials at the axon hillock of the postsy-
naptic neuron, then the NMs provide a succinct representation of the functional character-
istics of neuronal information processing. The same concept applies when the input is
provided by the graded potential of a sensory unit (where the output is the encoded senso-
ry information), or in the case of neurosensory transduction, where the input is the contin-
uous sensory stimulus.
The NM model shown in Figure 8.1 follows on the PDM model, where the output now
can be either continuous (a graded potential or probability of firing) or a point process (a
sequence of action potentials or "spike train"). The latter output modality requires ap-
pending a threshold-trigger operator at the output of the NM model, as discussed above.
It is critical to note that the employed NMs are properly defined filters that capture the
important dynamic characteristics of the system and reflect the integrated effect of all ax-
odendritic and axosomatic synaptic inputs (including propagation effects on the formation
of the transmembrane potential at the axon hillock preceding the generation of an action

[Block diagram: input x(n) → K parallel modes (MODE 1, MODE 2, . . . , MODE K) with outputs u1(n), u2(n), . . . , uK(n) → multiinput threshold T[u1, u2, . . . , uK] → output y(n)]

Figure 8.1 Block-structured model of a neuronal system with K neuronal modes (akin to the
PDMs of Sec. 4.1.1) that form a linear filter bank representing the important dynamic characteristics
of the system. The multiinput threshold operator incorporates all system nonlinearities and generates
the output spikes (multiinput threshold-trigger) [Marmarelis & Orme, 1993].
8.2 FUNCTIONAL INTEGRATION IN THE SINGLE NEURON 419

potential (or describe the transduction dynamics and conduction effects for a sensory sys-
tem). The employed MSN or MTT operator represents all nonlinear characteristics of the
system, which incorporate any nonlinearities involved in the creation of the aforemen-
tioned composite potential at the axon hillock, as well as the possible generation of the ac-
tion potential.
We will now illustrate the use and the meaning of NMs through examples. Consider as
a first example a simulated neuronal system that exhibits two distinct types of dynamics:
a (leaky) integrating NM and a (slowly) differentiating NM, shown in Figure 8.2 as im-
pulse response functions g1 and g2, respectively, given by the expressions

g1(τ) = 1.2 [exp(−τ/6) − exp(−τ/2)]        (8.28)

g2(τ) = −(τ − 10) exp[−(τ − 10)²/16]        (8.29)

where the time unit is 1 ms. Consider also a two-input static nonlinearity that has a sig-
moidal characteristic in terms of a linear combination of the two NM outputs v1 and v2:

y = 1 / [1 + α exp(−β1v1 − β2v2)]        (8.30)

where y is the system (continuous) output. For α = 0.4, β1 = 0.2, and β2 = 0.4 the resulting
(hyper) sigmoid surface is shown in Figure 8.3. The first-order and second-order Volterra
kernels of this nonlinear system are shown in Figure 8.4, and they reflect the combined
dynamics of the two NMs g1 and g2 under the (hyper) sigmoid nonlinear transformation
of Equation (8.30). Kernels of this approximate form have been measured in actual neu-


Figure 8.2 The impulse response functions of two NMs, g1 (trace 1) and g2 (trace 2) [Marmarelis,
1989c].


Figure 8.3 The form of the (hyper) sigmoidal static nonlinearity defined by Equation (8.30) [Mar-
marelis, 1989c].

ronal systems and can be used to determine the NMs in each particular case. If the NMs
are determined correctly, then the form of the associated static nonlinearity can be easily
estimated from the experimental data.
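The two-mode example above can be sketched numerically. The following is a minimal simulation of the NM cascade using the mode shapes of Eqs. (8.28)-(8.29) and the (hyper) sigmoid MSN of Eq. (8.30); the pulse input timing and the 75 ms mode memory are illustrative choices, not values from the text:

```python
import numpy as np

tau = np.arange(75)                      # mode memory (ms), illustrative

def g1(t):
    # Leaky-integrating mode, Eq. (8.28)
    return 1.2 * (np.exp(-t / 6.0) - np.exp(-t / 2.0))

def g2(t):
    # Slowly differentiating mode, Eq. (8.29)
    return -(t - 10.0) * np.exp(-((t - 10.0) ** 2) / 16.0)

def nm_model(x, alpha=0.4, beta1=0.2, beta2=0.4):
    """Two NMs (linear filter bank) followed by the MSN of Eq. (8.30)."""
    v1 = np.convolve(x, g1(tau))[:len(x)]    # NM outputs
    v2 = np.convolve(x, g2(tau))[:len(x)]
    y = 1.0 / (1.0 + alpha * np.exp(-beta1 * v1 - beta2 * v2))
    return v1, v2, y

# Pulse input, in the spirit of Figure 8.9 (timing is illustrative)
x = np.zeros(150)
x[30:90] = 1.0
v1, v2, y = nm_model(x)
```

The continuous output y stays within (0, 1), as the sigmoid form of Eq. (8.30) requires.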
We generally expect a great variety of NMs and static nonlinearities in the nervous sys-
tem, depending on individual neuron characteristics. For instance, a neurophysiological
phenomenon known as "shunt inhibition" may result in amplitude modulation of one NM
by another, leading to bilinear terms in the exponent of a (hyper) sigmoid nonlinearity as

y = 1 / [1 + α exp(−β1v1 − β2v2 + γv1v2)]        (8.31)

shown in Figure 8.5 (for α = 0.5, β1 = 0.5, β2 = 0.5, γ = 0.1) to exhibit regions of mutual
facilitation and suppression. Another example could be the case of full-wave rectification
of both NMs, giving rise to quadratic terms in the exponent of the (hyper) sigmoid nonlin-
earity and shown in Figure 8.6. Full-wave rectification in the presented phenomenological
context may arise from the combined synaptic inputs by two other neurons that have pre-
viously applied half-wave rectification to their respective input.
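The three MSN variants discussed here can be written side by side. The coefficients of the rectifying version are illustrative assumptions, since the text does not specify the parameters used for Figure 8.6:

```python
import numpy as np

def msn_sigmoid(v1, v2, a=0.4, b1=0.2, b2=0.4):
    # Eq. (8.30): linear combination of the NM outputs in the exponent
    return 1.0 / (1.0 + a * np.exp(-b1 * v1 - b2 * v2))

def msn_shunt(v1, v2, a=0.5, b1=0.5, b2=0.5, g=0.1):
    # Eq. (8.31): the bilinear term models shunt inhibition,
    # suppressing the response when both NM outputs are large
    return 1.0 / (1.0 + a * np.exp(-b1 * v1 - b2 * v2 + g * v1 * v2))

def msn_rectify(v1, v2, a=0.5, c1=0.5, c2=0.5):
    # Quadratic terms emulate full-wave rectification of both NM outputs
    # (coefficients c1, c2 are illustrative)
    return 1.0 / (1.0 + a * np.exp(-c1 * v1**2 - c2 * v2**2))
```

With these forms, the bilinear term of Eq. (8.31) lowers the output relative to the purely linear exponent when v1 and v2 are both large and positive, and the quadratic version responds identically to a mode output and its negative.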
In its simplest form, the AP-generation mechanism can be viewed as a threshold-trigger
device that produces a spike whenever its input exceeds a certain threshold value. There
must be a "reset" mechanism in this threshold trigger that returns the output to the resting
level after each spike for a short period of time following each firing (refractory period).
When this threshold trigger is combined with the multiinput static nonlinearity MSN, the
MTT operator results, which can be described by an M-dimensional binary function

y = 1/2 + (1/2) sgn{f(v1, v2, . . . , vM) − θ}        (8.32)

Figure 8.4 The first-order and second-order kernels of the nonlinear system defined by Eqs.
(8.28)-(8.30) that has the NMs of Figure 8.2 [Marmarelis, 1989c].

where f(v1, v2, . . . , vM) is the MSN with M inputs, θ is a fixed threshold, and "sgn" is the
signum function. In other words, MTT will produce a spike whenever the combination of
NM output values (v1, v2, . . . , vM) is such that f(v1, v2, . . . , vM) ≥ θ. These trigger values
of (v1, v2, . . . , vM) define "trigger regions" that are demarcated by the solutions of the
equation

f(v1, v2, . . . , vM) − θ = 0        (8.33)

The solutions of Equation (8.33) are the "trigger lines" whose form determines the re-
quired (minimum) order of the model (see Section 8.2.2). In actual applications, these
"trigger regions" (TRs) can be determined experimentally by computing the values of
NM outputs (v1, v2, . . . , vM) for which a spike is observed in the output of the neuron. The
locus of these values will form an estimate of the TR of the system. This is, of course,
practically possible only if a relatively small number of NMs can span effectively the dy-
namics of the system under study.
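The experimental construction of a trigger region can be sketched as follows; the linear MSN and the threshold value used to generate the example spikes are hypothetical stand-ins for an identified system:

```python
import numpy as np

def trigger_region(V, spikes):
    """Empirical trigger region: NM-output vectors observed at spike times.

    V      : (N, K) array of NM outputs v_1..v_K over time
    spikes : binary spike indicator of length N
    Returns the (n_spikes, K) locus whose extent estimates the TR.
    """
    spikes = np.asarray(spikes, dtype=bool)
    return V[spikes]

# Example: a hypothetical MTT with f(v1, v2) = 0.2*v1 + 0.4*v2, theta = 0.8
rng = np.random.default_rng(0)
V = rng.normal(size=(1000, 2))
spikes = (0.2 * V[:, 0] + 0.4 * V[:, 1]) >= 0.8
TR = trigger_region(V, spikes)
# every point of the locus satisfies the trigger condition of Eq. (8.33)
```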
This general formulation of the modeling problem for spike-output systems has impor-
tant implications in the study of neuronal systems. A spike-generating unit (neuron) is
seen as a dynamic element that codes input information into a sequence of spikes, where
the exact timing of the spikes contains the transmitted information. This coding operation
may be defined by a small number of NMs and by the TR of the unit. This representation
leads to a general and, at the same time, parsimonious description of the nonlinear dy-
namics of a neuronal unit. These units can be interconnected to form neuronal aggregates
with specifiable functional characteristics. These aggregates may be composed of classes
of interconnected units, with each class characterized by a specific type of representation
and connectivity.
Let us now see how these ideas apply to the example discussed above. When the
threshold-trigger operator (with threshold θ = 0.8) is appended to the output of the (hy-
per) sigmoid shown in Figure 8.3, then the MTT characteristic shown in Figure 8.7 re-
sults (i.e., the trigger line is a straight line). If the same is done for the MSN function
shown in Figure 8.6, then the MTT characteristic shown in Figure 8.8 results. Note that
the same MTT characteristic will result for all MSN surfaces that have the same inter-
section line(s) with the threshold plane, i.e., they yield the same solution for Equation


Figure 8.5 The form of the static nonlinearity defined by Equation (8.31), exhibiting regions of mutu-
al facilitation and suppression [Marmarelis, 1989c].

Figure 8.6 The form of the static nonlinearity containing quadratic terms of the NM outputs v1 and
v2 in the exponent of the (hyper) sigmoidal expression (full-wave rectification characteristics) [Mar-
marelis, 1989c].
Figure 8.7 The MTT characteristic for the MSN of Figure 8.3 [Marmarelis, 1989c].

Figure 8.8 The MTT characteristic for the MSN shown in Figure 8.6 [Marmarelis, 1989c].

(8.33). This clearly demonstrates that the detailed morphology of the MSN surface in
the subthreshold or suprathreshold region has no bearing on the pattern of the generated
spikes.
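This insensitivity to the detailed MSN morphology can be checked numerically with two deliberately different (hypothetical) surfaces that share the same trigger line in the (v1, v2) plane:

```python
import numpy as np

rng = np.random.default_rng(1)
v1, v2 = rng.normal(size=(2, 5000))

theta = 0.8
u = 0.2 * v1 + 0.4 * v2                      # common argument of both surfaces

f_a = 1.0 / (1.0 + np.exp(-(u - theta)))     # sigmoid surface (crosses 0.5 at u = theta)
f_b = (u - theta) ** 3 + theta               # cubic surface (crosses theta at u = theta)

# Both surfaces intersect their threshold plane on the same line u = theta,
# so the MTT outputs of Eq. (8.32) coincide even though the surfaces differ
# everywhere away from that line.
spikes_a = f_a >= 0.5
spikes_b = f_b >= theta
assert np.array_equal(spikes_a, spikes_b)
```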
Note that the NM outputs for this system, v1(t) and v2(t), provide finite-bandwidth in-
formation through time about the intensity and rate characteristics of the input signal, re-
spectively. The MTT then indicates which combinations of intensity and rate values of
the input signal will lead to the generation of a spike by the neuron. These combinations
define the TR of the specific neuron. Furthermore, subregions of the TR can be monitored
by "downstream" postsynaptic neurons using the temporal pattern of the generated spike
train and their own nonlinear dynamic characteristics (NMs and MTT). This "higher-lev-
el" coding can provide the means for refined clustering of spike events that reflect specif-
ic "features" of the input signal, leading to specialized detection and classification of in-
put information features and, eventually, to cognitive actions using these specialized
features as "primitives." The "fanning-out" of information to postsynaptic units for the
purpose of combinatorial higher-level processing is consistent with the known architec-
ture of the cortex.
In full awareness that this hypothesis of neural information processing is not yet tested
and is still in a seminal stage, we nevertheless propose it as a plausible theory that exhibits
some attractive characteristics; it incorporates the apparently complex nonlinear dynam-
ics and the signal modalities found in neuronal systems in a manner that is consistent with
current experimental evidence.
We conclude with a simple example of signal encoding based on the presented ideas.
Consider a pulse input, shown in Figure 8.9 along with the resulting NM outputs v1(t) and
v2(t). Application of a threshold trigger on v1(t), v2(t), −v2(t), and v2²(t) separately yields
the spike trains shown in Figure 8.10 (traces 1, 2, 3, and 4, respectively). These four


Figure 8.9 A pulse input (trace 1) and the resulting internal NM outputs v1(t) (trace 2) and v2(t) (trace
3), corresponding to the NMs shown in Figure 8.2 [Marmarelis, 1989c].


Figure 8.10 Spike-train response to the pulse input for four common types of neuronal responses:
(1) on-sustained, (2) on-transient, (3) off-transient, (4) on/off-transient [Marmarelis, 1989c].

cases emulate the "on-sustained," "on-transient," "off-transient," and "on/off-transient"
responses of neurons, respectively, that are often observed experimentally. They code an
event of significant magnitude and its duration, the onset of an event, the offset of an
event, and the onset and offset of an event, respectively. Application of an MTT of the
type shown in Figure 8.7 (with appropriate threshold) on a linear combination of the NM
outputs [v1(t) + 2v2(t)] yields an "on-mixed" response shown in Figure 8.11 along with
the input pulse and the combined continuous waveform of the NM outputs. The spike out-
put encodes the onset of the stimulus (event) and its duration in the same time record. The
higher-level postsynaptic neurons must "know" that this is an "on-mixed" cell, otherwise
they will mistakenly interpret the cell output as coding two distinct events. This can be ac-
complished by cross-examination of the outputs of several different types of neurons re-
ceiving and encoding the same input.
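The four response types of Figure 8.10 can be emulated with a simple threshold-trigger sketch on the NM outputs of the pulse example; the thresholds (half of each signal's peak) and the 5-sample refractory period are illustrative choices, not values given in the text:

```python
import numpy as np

tau = np.arange(75)
g1 = 1.2 * (np.exp(-tau / 6.0) - np.exp(-tau / 2.0))        # integrating NM
g2 = -(tau - 10.0) * np.exp(-((tau - 10.0) ** 2) / 16.0)    # differentiating NM

x = np.zeros(150)                    # pulse input, as in Figure 8.9
x[30:90] = 1.0
v1 = np.convolve(x, g1)[:150]
v2 = np.convolve(x, g2)[:150]

def trigger(u, theta, refractory=5):
    """Threshold trigger with an absolute refractory period (illustrative)."""
    spikes = np.zeros(len(u), dtype=int)
    last = -refractory
    for n, un in enumerate(u):
        if un >= theta and n - last >= refractory:
            spikes[n], last = 1, n
    return spikes

on_sustained  = trigger(v1, 0.5 * v1.max())             # event magnitude/duration
on_transient  = trigger(v2, 0.5 * v2.max())             # event onset
off_transient = trigger(-v2, 0.5 * v2.max())            # event offset
on_off        = trigger(v2 ** 2, 0.25 * v2.max() ** 2)  # onset and offset
```

The on-transient spikes cluster just after the pulse onset (n = 30), the off-transient spikes just after the offset (n = 90), and the on-sustained train persists for the duration of the pulse.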
To illustrate the idea of higher-level decoding by monitoring different subregions of
the MTT trigger regions for spike events (i.e., clustering of v1, v2 values resulting in a
spike), we consider the presented "on-mixed" spike train of Figure 8.11 and plot the val-
ues (v1, v2) corresponding to an output spike on the (v1, v2) plane. The result is shown in
Figure 8.12, where the abscissa is v1 values and the ordinate is v2 values. The combina-
tions of (v1, v2) values that lead to spike generation cluster in two groups. The upper-left
cluster corresponds to high v2 values (encoding significant positive rate of change in the
input signal) and the lower-right cluster corresponds to high v1 values (encoding signifi-
cant positive magnitude of the input signal). These two clusters could be delineated by
higher-level neurons with appropriate connectivity and dynamic characteristics, leading
to extraction of specific input features. This example illustrates the possibilities for high-
er-level decoding of complex input information afforded by this approach.
The immense variety of individual neuron characteristics (in terms of synaptic, histo-
logical, and biophysical characteristics) leads to a similar variety of NMs and associated


Figure 8.11 The pulse input (trace 1), combined internal variable (v1 + 2v2) (trace 2), and the "on-
mixed" spike output (trace 3) (details in the text) [Marmarelis, 1989c].


Figure 8.12 Combinations of (v1, v2) values leading to spike generation for the example of Figure
8.11. Two clusters of (v1, v2) trigger values appear [Marmarelis, 1989c].

static nonlinearities in different cases. Nonetheless, the presented modeling approach of-
fers a common framework that allows a unified approach to the problem of signal trans-
formation and coding by different types of neurons, and incorporates nonlinear dynamics
and spike generation in a fairly general yet parsimonious manner.
The actual identification of the NMs of a neuronal system from data remains a formi-
dable task in practice. The complexity of this task is compounded by the presence of data-
contaminating noise and experimental limitations in applying and measuring the appro-
priate input-output signals. Nonetheless, the proposed modeling approach has been
applied successfully and has been shown to provide accurate understanding of the func-
tional properties of individual neurons that will facilitate the study of neuronal aggregates
and, ultimately, the understanding of the functional organization of "integrated" neural
systems of greater complexity. Note the coincidence between NMs and PDMs of a neu-
ronal system. An example from a spider mechanoreceptor is provided below.

Illustrative Examples. An example of an integrated PDM model with trigger regions
of a single neuron with action-potential output is presented here, taken from the modeling
studies of the cuticular mechanoreceptor in the spider leg that was discussed in Section
6.1.4 for intracellular current and potential outputs.
The obtained first-order and second-order kernels of the cuticular mechanoreceptor are
shown in Figure 8.13, when the output is the sequence of action potentials generated by
the quasiwhite displacement (mechanical strain) stimulus [Mitsis et al., 2003c]. Three sig-
nificant PDMs are found in this case and shown in Figure 8.14. The most significant
PDM (corresponding to the highest eigenvalue of 2.28) exhibits high-pass characteristics,
and the other two PDMs (corresponding to eigenvalues of 0.31 and 0.22) exhibit band-
pass characteristics with resonant frequencies at approximately 110 Hz and 180 Hz, with
about 80% fractional bandwidths (secondary resonant peaks are also evident at about 40
Hz and 70 Hz in the second and first PDM, respectively). Our working hypothesis is that
the resonant peaks of the PDMs correspond to different ion channels with distinctive dy-
namics. We posit that the high-pass PDM corresponds to the fast sodium channel and re-
flects (in part) the artificial representation of the action potential with an impulse func-
tion. The main peaks of the second PDM (~180 Hz) and of the third PDM (~110 Hz) may
correspond to the sodium and potassium channels involved in the generation and re-
setting of the action potential. The secondary peak of the second PDM (~40 Hz) may be
due to the calcium-activated potassium channel, based on the time scale of the anticipated
dynamics. This leaves the secondary peak of the first PDM (~70 Hz) as an open question
regarding its origin and correspondence to a biological mechanism.
The PDM outputs {u1(n), u2(n), u3(n)} can be mapped in three-dimensional state space
for the combinations that lead to the generation of an action potential (AP) [Marmarelis,
1989c; Mitsis et al., 2003c]. The subspace thus defined is called the "trigger region" of
the system and is illustrated in Figure 8.15 for u1 versus u2, u2 versus u3, and u1 versus u3
separately (since visual display of the "trigger region" for all three PDM outputs is diffi-
cult) for a 10% "probability of firing" (PoF). The latter is computed on the basis of the
data as the ratio of the histogram of PDM output combinations generating an AP to the
histogram of all PDM output combinations in the data. Note that an absolute refractory
period of 5 ms is determined from the data and used for histogramming (binning) purpos-
es. The PoF surfaces are also shown in Figure 8.15 for the three pair combinations of
PDM outputs. It is evident that the selection of a hard threshold to produce the "trigger re-
gion" (TR) from a PoF can be determined by the mean firing rate of the neuron (the 10%

Figure 8.13 First-order and second-order Volterra kernels of the mechanoreceptor for action po-
tential output in the time domain (left column) and in the frequency domain (right column). Note the
high-pass characteristics in both kernels [Mitsis et al., 2003c].

level is chosen only for illustrative purposes in Figure 8.15). Accordingly, if the actual
mean firing rate can be established, the hard threshold is selected so that the resulting
mean firing rate of the model equals the actual one.
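The PoF computation described above can be sketched as a ratio of two-dimensional histograms over a pair of PDM outputs. The equal-width binning and the synthetic spike rule below are illustrative assumptions, and the refractory-period bookkeeping used with the real data is omitted for brevity:

```python
import numpy as np

def probability_of_firing(u1, u2, spikes, bins=12):
    """PoF surface: histogram of PDM-output combinations generating an AP
    divided by the histogram of all PDM-output combinations."""
    spikes = np.asarray(spikes, dtype=bool)
    edges1 = np.linspace(u1.min(), u1.max(), bins + 1)
    edges2 = np.linspace(u2.min(), u2.max(), bins + 1)
    h_all, _, _ = np.histogram2d(u1, u2, bins=[edges1, edges2])
    h_spk, _, _ = np.histogram2d(u1[spikes], u2[spikes], bins=[edges1, edges2])
    return np.where(h_all > 0, h_spk / np.maximum(h_all, 1), 0.0)

def trigger_region(pof, theta=0.1):
    # hard threshold on the PoF surface yields the trigger region (TR)
    return pof >= theta

# Synthetic check: spikes fire when a fixed combination exceeds a threshold
grid1, grid2 = np.meshgrid(np.linspace(-2, 2, 41), np.linspace(-2, 2, 41))
u1, u2 = grid1.ravel(), grid2.ravel()
spikes = (0.2 * u1 + 0.4 * u2) >= 0.8
pof = probability_of_firing(u1, u2, spikes)
```

Bins lying entirely inside the synthetic trigger region come out with PoF = 1, and bins entirely outside come out with PoF = 0.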
Another way to address this issue of hard-threshold selection is by computing the sen-
sitivity-specificity curves (SSC), which are akin to the "receiver operating characteristic"
(ROC) curves used traditionally in detection theory and applications. The SSC can be de-

Figure 8.14 Estimated PDMs of the mechanoreceptor for action potential output in the time do-
main (left) and in the frequency domain (right) [Mitsis et al., 2003c].

termined by computing the "sensitivity" of the model as the percentage of true positive
predictions (i.e., correct prediction of an output AP) for every threshold θ over all time
bins in the data (defined by the absolute refractory period) and plotting it versus the "speci-
ficity" of the model, computed as one minus the percentage of false positives (i.e., predic-
tions of an output AP when none exists) over all time bins for each threshold θ. Thus, the
SSC is computed for all θ values, and the performance of the model is deemed better
when the area under the SSC increases (ideally approaching unity). In our example, the
SSC is shown in Figure 8.16 and its area for 3 PDMs is found to be 0.976 for a bin size of
0.025. Note that the bin size of the nonlinearity in the PDM model affects the predictive
performance of the model. Maximization of the SSC area can be used as the criterion for
selecting the "optimal" bin size or other model parameters, including the number of
PDMs. Although the threshold θ determines the trade-off between sensitivity and speci-
ficity of the model, the natural firing rate of the neuron must be considered in selecting
the "natural threshold," because the notion of "optimality" is externally imposed (by our
tendency to seek optimal performance) and may not be applicable in a natural context.
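The SSC computation can be sketched as follows, with the area taken under sensitivity versus (1 − specificity) in the usual ROC manner; the synthetic "perfect" prediction at the end is only a sanity check, not data from the text:

```python
import numpy as np

def ssc_curve(pof_pred, spikes):
    """Sensitivity and specificity of the thresholded PoF prediction,
    swept over all candidate thresholds theta."""
    pos = np.asarray(spikes, dtype=bool)
    sens, spec = [], []
    for theta in np.unique(pof_pred):
        pred = pof_pred >= theta
        tp = np.sum(pred & pos)              # correctly predicted APs
        fp = np.sum(pred & ~pos)             # predicted AP where none exists
        sens.append(tp / max(pos.sum(), 1))
        spec.append(1.0 - fp / max((~pos).sum(), 1))
    return np.array(sens), np.array(spec)

def ssc_area(sens, spec):
    """Trapezoidal area under sensitivity vs (1 - specificity); ideally -> 1."""
    x = 1.0 - spec
    order = np.argsort(x)
    x, y = x[order], sens[order]
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * (x[1:] - x[:-1])))

rng = np.random.default_rng(3)
spikes = rng.random(2000) < 0.1
perfect = spikes.astype(float)               # an ideal PoF prediction
sens, spec = ssc_curve(perfect, spikes)
area = ssc_area(sens, spec)
```

A perfect prediction yields unit area, while a degraded prediction pushes the curve (and its area) downward.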
Another way of evaluating the PDM model for AP output is to compute the PoF for
each time bin (using the PoF surface) and compare it to a binary output value (1 if an AP
exists in the time bin and 0 otherwise). The computation of a normalized mean-square er-
ror of this model prediction can be used as an alternative metric for model evaluation.
It is important to observe that the most significant high-pass PDM for AP output is
very similar to the second PDM for the previous intracellular current and potential data
(see Figs. 6.21 and 6.26) and both PDMs for the transmembrane current data also bear
certain common features with the band-pass PDMs of the AP output (e.g., the resonant
peak around 40 Hz and the subtler peak around 70-80 Hz), as well as the resonant peak
around 100 Hz.
We must emphasize that the time course of the PDMs of the mechanoreceptor or any
other neuronal system can be analyzed further to infer the dynamics of the distinct ion

Figure 8.15 Probability of firing an action potential as a function of the PDM outputs taken by pairs
(left column) and the corresponding trigger region for θ = 0.1 (right column) [Mitsis et al., 2003c].

channels in the neuronal membrane. The particular methodology for this purpose is pre-
sented in Section 8.1 in connection with a class of nonlinear parametric models with bilin-
ear terms that represent the modulatory effects of different channels (voltage-activated or
ligand-activated conductances). This method can be extremely useful in connecting data-
true Volterra/PDM models with biophysically interpretable parametric models (nonlinear
differential equations).
It is important to note that the Wiener analysis of the Hodgkin-Huxley (H-H) model
yields kernels (of first and second order) with high-pass characteristics up to 500 Hz for

Figure 8.16 The SSC of the mechanoreceptor for three PDMs (best performance corresponding to
an area of 97.6%) and the pair combinations (lower circles corresponding to an area of 92.4%).

the squid axon membrane [Courellis & Marmarelis, 1989]. From the estimated first-order
Wiener kernel, we see that the frequency response declines after 500 Hz, a fact that at-
tributes band-pass characteristics to the squid axon dynamics over a broader frequency
range (the half-max passband is from about 200 Hz to about 1200 Hz). However, its high-
pass characteristics up to about 500 Hz are similar to the first PDM of the cuticular
mechanoreceptor, and, therefore, the first PDM dynamics of the mechanoreceptor can be
attributed to the sodium-potassium channels that are included in the H-H model. It was
also shown in the Wiener analysis of the H-H model that kernels of order higher than sec-
ond are required in order to predict the generated action potentials in response to broad-
band (quasiwhite) stimulation. This finding is consistent with the results obtained in the
PDM model of the cuticular mechanoreceptor.

8.2.2 Minimum-Order Modeling of Spike-Output Systems


Ever since the Volterra-Wiener approach was applied to the study of spike-output sys-
tems, it has been assumed that a large number of kernels (of high order) would be neces-
sary to produce a satisfactory model prediction of the timing of the output spikes. This
view was based on the rationale that the presence of a spike-generating mechanism con-
stituted a "hard nonlinearity," necessitating the use of high-order nonlinear terms. Al-
though this rationale is correct if we seek to reproduce the numerical binary values of the
system output using a Volterra-Wiener model, we have come to the important realization
that the inclusion of a threshold-trigger in our model reduces considerably the number of
kernels necessary for complete modeling in terms of predicting the timing of the output
spikes.

This realization gives rise to the notion of a "minimum-order" Volterra-Wiener mod-
el, which captures the necessary dynamic characteristics of the system when the effect of
the spike-generating threshold-trigger is separated out, as discussed in the previous sec-
tion. Thus, a wide class of spike-output systems can be effectively modeled with low-or-
der Volterra-Wiener models followed by a threshold-trigger operator. We came to this re-
alization by looking closer at the results of the "reverse correlation" technique for the
estimation of Wiener kernels and by studying the meaning of trigger regions of spike-out-
put systems containing a threshold-trigger mechanism for the generation of spikes [Mar-
marelis et al., 1986].

The Reverse-Correlation Technique. For a spike-output system with a Gaussian
white-noise (GWN) input, it has been shown that the Wiener kernels can be estimated
through the "reverse-correlation" technique [de Boer and Kuyper, 1968]. This technique
utilizes the fact that the output can be written as a sum of Kronecker deltas in discrete-
time notation:

y(n) = Σ_{i=1}^{I} δ(n − n_i)        (8.34)

where {n_i} are the locations of the output spikes. We can then obtain the Wiener kernel
estimates through cross-correlation as [Marmarelis et al., 1986]

ĥ_r(m_1, . . . , m_r) = (1/(r! σ_x^{2r})) { (1/N) Σ_{i=1}^{I} x(n_i − m_1) · · · x(n_i − m_r)
        − Σ_{j=0}^{r−1} (1/N) Σ_{n=1}^{N} G_j[ĥ_j; x(n′), n′ ≤ n] x(n − m_1) · · · x(n − m_r) }        (8.35)

where G_j are the estimates of the lower-order discrete Wiener functionals (j = 0, 1, . . . , r
− 1), N is the total number of points in the input-output data record, and I is the total num-
ber of spikes in the output. For the off-diagonal points of the kernel, the second term on
the right-hand side of Equation (8.35) vanishes as N tends to infinity.
To get a feeling about Equation (8.35), let us consider the estimation of the first three
Wiener kernels:

ĥ_0 = (1/N) Σ_{n=1}^{N} y(n) = I/N        (8.36)

ĥ_1(m) = (1/σ_x²) { (1/N) Σ_{i=1}^{I} x(n_i − m) − ĥ_0 (1/N) Σ_{n=1}^{N} x(n − m) }
       ≈ (1/(N σ_x²)) Σ_{i=1}^{I} x(n_i − m)        (for large N)        (8.37)

ĥ_2(m_1, m_2) = (1/(2σ_x⁴)) { (1/N) Σ_{i=1}^{I} x(n_i − m_1) x(n_i − m_2) − ĥ_0 (1/N) Σ_{n=1}^{N} x(n − m_1) x(n − m_2)
        − (1/N) Σ_{n=1}^{N} Σ_{m=0}^{M−1} ĥ_1(m) x(n − m) x(n − m_1) x(n − m_2) }
       ≈ (1/(2N σ_x⁴)) Σ_{i=1}^{I} x(n_i − m_1) x(n_i − m_2) − (I/(2N σ_x²)) δ(m_1 − m_2)        (for large N)        (8.38)

For large N the first-order model response is

y_1(n) = ĥ_0 + Σ_{m=0}^{M−1} ĥ_1(m) x(n − m)
       = I/N + (1/(N σ_x²)) Σ_{i=1}^{I} Σ_{m=0}^{M−1} x(n_i − m) x(n − m)
       = I/N + (M/N) Σ_{i=1}^{I} r(n − n_i)        (8.39)

where

r(n − n_i) = (1/(M σ_x²)) Σ_{m=0}^{M−1} x(n_i − m) x(n − m)        (8.40)

The critical observation is that r(n − n_i) is an estimate (over M samples) of the normalized
autocorrelation function of the input x(n), centered at the point n = n_i. Since x(n) is a
white process, this estimate will tend to a Kronecker delta as M (i.e., the memory-band-
width product of the kernel) increases. For given M, r(n − n_i) will have its peak value at n
= n_i and side values (i.e., for n ≠ n_i) randomly distributed with zero mean and variance
1/M. We can easily derive that

E[r(n − n_i)] = δ(n − n_i)        (8.41)

var[r(n − n_i)] = (1/M)[1 + δ(n − n_i)]        (8.42)

cov[r(n − n_i), r(n − n_j)] = (1/M)[δ(n_i − n_j) + δ(2n − n_i − n_j)]        (8.43)

The covariance of r(n − n_i) in Equation (8.43) depends on the autocorrelation of the system out-
put and the specific time instant n. This makes the variance of the first-order model pre-
diction dependent on the overall system characteristics. If we call ŷ_1(n) = y_1(n) − ĥ_0 the
contribution of the first-order Wiener functional, then

var[ŷ_1(n)] = (M/N²) [1 + 2 Σ_{i=2}^{I} Σ_{j=1}^{i−1} δ(2n − n_i − n_j)]        (8.44)

An upper bound for this variance can be found regardless of the system characteristics as

var[ŷ_1(n)] ≤ (M/N²)(1 + I)        (8.45)



Equation (8.45) indicates that the variance of the first-order model prediction may in-
crease in time for given N, although it tends to zero as N tends to infinity. Note, however,
that

E[ŷ_1(n)] = (M/N) Σ_{i=1}^{I} δ(n − n_i)        (8.46)

which implies that the size of the expected value of the predicted spikes also decreases as
N increases.
We can study similarly the prediction given by the second-order Wiener functional

ŷ_2(n) = y_2(n) − ŷ_1(n) − ĥ_0
       = Σ_{m_1=0}^{M−1} Σ_{m_2=0}^{M−1} ĥ_2(m_1, m_2) x(n − m_1) x(n − m_2) − σ_x² Σ_{m=0}^{M−1} ĥ_2(m, m)        (8.47)

For large N, substitution of h2 from Equation (8.38) into Equation (8.47) yields

$$\tilde{y}_2(n) = \frac{M^2}{2N} \sum_{i=1}^{I} r^2(n - n_i) - \frac{IM}{2N}\, r(0) + \frac{IM}{2N\sigma_x^2}\, \hat{\phi}_{xx}(0)$$

$$- \frac{I}{2N\sigma_x^4} \sum_{m_1=0}^{M-1} \sum_{m_2=0}^{M-1} \hat{\phi}_{xx}(m_1 - m_2)\, x(n - m_1)\, x(n - m_2) \qquad (8.48)$$

where
$$\hat{\phi}_{xx}(m) = \frac{1}{N} \sum_{n=1}^{N} x(n)\, x(n - m) \qquad (8.49)$$

Note that $\hat{\phi}_{xx}$ is different from r because of the different sample size (N versus M), and

$$E[\tilde{y}_2(n)] = \frac{M(M+1)}{2N} \sum_{i=1}^{I} \delta(n - n_i) \qquad (8.50)$$

which indicates that the prediction of the second-order Wiener functional will also repro-
duce (in the mean) the output spike train. This is the fundamental observation that has led
to the concept of minimum-order Wiener modeling for spike-output systems [Marmarelis
et al., 1986].

Minimum-Order Wiener Models. The analysis of the previous section revealed the interesting fact that the contribution of each nonzero Wiener functional to the model prediction of a spike output is an estimate of a scaled replica of that output. This fact becomes apparent when the reverse-correlation technique is used for the estimation of the system Wiener kernels. The model prediction of each individual Wiener functional places an estimate of the input autocorrelation function (obtained over M points, where M is the memory-bandwidth product of the respective kernel) at all locations where an input-induced spike is found in the output record. Since the input is white noise, the expected value of this input autocorrelation function is a Kronecker delta. This would appear to imply that if these input autocorrelation estimates are sufficiently accurate, then a single

nonzero Wiener functional would suffice in predicting the system output. However, the
variance of these input autocorrelation estimates can be so large as to prevent such a de-
sirable simplification of the modeling and prediction task. This variance depends, in gen-
eral, on the overall system characteristics and the output autocorrelation.
The important suggestion that emerges from these realizations is that if the aforemen-
tioned variance is sufficiently small, then the lowest-order nonzero Wiener functional will
be adequate for the purpose of predicting the system output. Otherwise, more Wiener
terms must be included in the model until the resulting prediction attains the required
clarity. No general rules have been formulated as to the minimum number of required Wiener terms in a given application, since it depends on the system characteristics. However, the notion of a minimum-order Wiener (MOW) model has become clear. We explore
below the application of this concept to several classes of continuous-input/spike-output
(CISO) systems, and demonstrate the considerable simplification that can be achieved in
many applications by employing the concept of a MOW model.
First we consider the class of CISO systems described by the cascade of a stable linear
time-invariant filter LTI followed by a threshold-trigger TT as shown in Figure 8.17. The
following relations hold:

$$v(n) = \sum_{m=0}^{M-1} h(m)\, x(n - m) \qquad (8.51)$$

$$y(v) = \begin{cases} 1, & v \ge \theta \\ 0, & v < \theta \end{cases} \qquad (8.52)$$

where h(m) is the discrete impulse response function of the filter (M being its memory-bandwidth product) and θ is the threshold value of TT. If x(n) is a GWN process with variance $\sigma_x^2$, then v(n) is a nonwhite Gaussian process with zero mean and variance:

$$\sigma_v^2 = \sigma_x^2 \sum_{m=0}^{M-1} h^2(m) \qquad (8.53)$$

If we now consider the Hermite expansion of y(v) over the entire real line, we have:

$$y(v) = \sum_{k=0}^{\infty} a_k\, H_k\!\left(\frac{v}{\sqrt{2}\,\sigma_v}\right) \qquad (8.54)$$

where {H_k} is the orthogonal set of Hermite polynomials and {a_k} are the Hermite expansion coefficients given by

$$a_k = \frac{1}{2^k\, k!\, \sqrt{2\pi}\,\sigma_v} \int_{\theta}^{\infty} e^{-v^2/2\sigma_v^2}\, H_k\!\left(\frac{v}{\sqrt{2}\,\sigma_v}\right) dv \qquad (8.55)$$

Figure 8.17 Cascade of a linear time-invariant filter (LTI) followed by a threshold trigger.

where the kth-order Hermite polynomial is given by

$$H_k(z) = \sum_{l=0}^{[k/2]} \frac{(-1)^l\, 2^{k-2l}\, k!}{l!\,(k - 2l)!}\, z^{k-2l} \qquad (8.56)$$
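Equation (8.56) is the standard explicit sum for the physicists' Hermite polynomials, so it can be cross-checked against a library implementation. A quick sketch (the test points and orders are arbitrary choices):

```python
import numpy as np
from math import factorial
from numpy.polynomial import hermite

def H(k, z):
    """Explicit sum of Eq. (8.56) for the physicists' Hermite polynomial H_k(z)."""
    return sum((-1)**l * 2**(k - 2*l) * factorial(k)
               / (factorial(l) * factorial(k - 2*l)) * z**(k - 2*l)
               for l in range(k // 2 + 1))

# cross-check against NumPy's physicists' Hermite basis (hermval)
for k in range(6):
    coeffs = [0.0] * k + [1.0]          # coefficient vector selecting H_k
    for z in (-1.3, 0.0, 0.7, 2.0):
        assert abs(H(k, z) - hermite.hermval(z, coeffs)) < 1e-8
```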

Then the Wiener kernels of the system shown in Figure 8.17 are

$$h_k(m_1, \ldots, m_k) = c_k\, h(m_1) \cdots h(m_k) \qquad (8.57)$$

where

$$c_k = \frac{2^{k/2}}{\sigma_v^k}\, a_k \qquad (8.58)$$

For instance, the zero-order Wiener kernel is given by

$$h_0 = \frac{1}{\sqrt{2\pi}\,\sigma_v} \int_{\theta}^{\infty} e^{-v^2/2\sigma_v^2}\, dv \qquad (8.59)$$

representing the average firing rate of the system for a GWN input, which clearly depends on the values of $\sigma_v$ and θ. The first-order Wiener kernel is given by

$$h_1(m_1) = c_1\, h(m_1) \qquad (8.60)$$

where

$$c_1 = \frac{1}{2\sigma_v^2\, \sqrt{2\pi}\,\sigma_v} \int_{\theta}^{\infty} e^{-v^2/2\sigma_v^2} \left(\frac{v^2}{\sigma_v^2} - 1\right) dv \qquad (8.61)$$

Note that the first-order Wiener kernel is a scaled replica of the impulse response function of the filter LTI. Consequently, all the information needed to characterize the system in Figure 8.17 can be found in the first-order Wiener kernel (within a scalar), in addition, of course, to the threshold value θ. By evaluating higher-order Wiener kernels we do not gain any additional information about this system; that is, the MOW model for this class of systems is first-order, although higher-order kernels exist. Although this is not true for all systems, we can generally assert that higher-order kernels are not needed above a certain finite order for the representation of the CISO class of systems.
An important observation regarding the unknown scalar c₁ and threshold value θ in the example above is that specific knowledge of these two parameters is not necessary from the prediction point of view. Since a single threshold estimate θ̂ is used in converting the Wiener model output into a spike train (which is our model prediction for the spike output), both of those unknown parameters (c₁ and θ) can be combined in determining θ̂.
The actual estimation of the threshold θ̂ that ought to be used in connection with the Wiener model in order to predict the system spike output is a critical task in actual applications that is performed by comparing the Wiener model output with the actual system output. In general, the model prediction for a given threshold θ̂ will fail to match some output spikes (false-negative α spikes) and predict some spikes that are not present in the system output (false-positive β spikes). A cost function that combines the number of α and β spikes for a given threshold θ̂ can be used in practice to yield an optimum threshold

estimate that minimizes this cost function. We should note that, in actual experimental data, a number of spikes that are not input-related will generally be present (point-process noise or spontaneous activity). These should be found among the α spikes of our model prediction, and could be effectively "filtered out" by use of the presented method.
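The threshold search described above can be sketched as a one-dimensional cost minimization. In the toy example below, all signals are synthetic stand-ins (the "true" threshold, the noise level, and the search grid are arbitrary assumptions), and the cost is simply the count of α (missed) plus β (spurious) spikes:

```python
import numpy as np

rng = np.random.default_rng(2)
N = 2000
u = rng.standard_normal(N)                    # stand-in internal variable of the system
spikes_true = u >= 1.5                        # "actual" spike output of the toy system
u_model = u + 0.1 * rng.standard_normal(N)    # Wiener-model output with prediction error

def cost(theta_hat):
    pred = u_model >= theta_hat
    alpha = np.sum(spikes_true & ~pred)       # missed spikes (false negatives)
    beta = np.sum(~spikes_true & pred)        # spurious spikes (false positives)
    return alpha + beta

grid = np.linspace(0.0, 3.0, 301)
theta_opt = grid[np.argmin([cost(t) for t in grid])]
```

The minimizing threshold lands near the value used to generate the toy spikes; in practice the same grid search is run against the recorded output.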
Another method for estimating θ and evaluating the predictive ability of this model was presented earlier in connection with the spider mechanoreceptor model (see Fig. 8.16) and involves computation of the sensitivity-specificity curves [akin to receiver operating characteristic (ROC) curves used in detection systems].
Let us now consider the class of CISO systems described by the cascade of a linear
time-invariant filter LTI followed by a zero-memory nonlinearity ZMN and a threshold
trigger TT, as shown in Figure 8.18. The cascade of ZMN and TT can be viewed as a single zero-memory, nonlinear subsystem NTT that produces a spike whenever its input v(n) attains values within specific ranges determined by the combination of ZMN and TT. For instance, if the characteristic of ZMN and the threshold value of TT are as shown in Figure 8.19(a), then the resulting characteristic of the composite zero-memory subsystem NTT is as shown in Figure 8.19(b). Consequently, the Wiener series representation of the overall system of Figure 8.18 will have kernels of the form described by Equation (8.57), where the coefficients {cₖ} will be given by the expression in Equation (8.58) with the Hermite expansion coefficients {aₖ} corresponding to the nonlinear function shown in Figure 8.19(b). Clearly, if the characteristic function f(v) of ZMN is monotonic, then the NTT characteristic will be a threshold-trigger characteristic with a single threshold value and the analysis presented for the class of systems in Figure 8.17 will apply. If the ZMN characteristic is not monotonic, then the analysis will be complicated in that higher-order Wiener terms may be necessary to achieve a certain accuracy if and only if the function f(v) − θ has multiple roots.
The critical observation is that the minimum order of the required Wiener model is equal to the number of "trigger points" in the NTT characteristic; that is, the number of real solutions of the equation f(v) − θ = 0. For instance, a characteristic of the form shown in Figure 8.19(b) would require a third-order MOW model. This observation is based on the fact that the MOW model must reproduce, in addition to the dynamics related to the LTI subsystem in this example, a zero-memory nonlinear characteristic p(v) that will intersect the model threshold line at the actual trigger points of the system. These trigger points represent, of course, the real roots of the polynomial p(v) − θ̂, where p(v) is the minimum-degree polynomial that satisfies this condition. These trigger points define the one-dimensional "trigger regions" discussed in the previous section.
For a ZMN characteristic with even symmetry (e.g., a full-wave rectifier or a squarer), the odd-order Hermite coefficients of its expansion will be zero and, consequently, a first-order Wiener model will be unable to predict any of the output spikes (since the first-order Wiener kernel will be zero). A second-order Wiener model, however, will be an excellent

Figure 8.18 Cascade of a linear time-invariant filter (LTI) followed by a zero-memory nonlinearity (ZMN) and a threshold trigger (TT). The composite box NTT is a zero-memory nonlinearity exhibiting the trigger regions shown in Fig. 8.19 [Marmarelis et al., 1986].


Figure 8.19 (a) The characteristic u = f(v) of ZMN in Figure 8.18 and its intersection points with the threshold line u = θ. (b) The characteristic of NTT exhibiting trigger regions that are defined by the intersection points v₁, v₂, and v₃ [Marmarelis et al., 1986].

predictor of output spikes when combined with the appropriate threshold, on the basis of arguments similar to the ones presented above. Generally, all ZMN characteristics that have only two intersection points with the line representing the threshold will yield a two-sided threshold characteristic for the composite subsystem NTT and, consequently, admit a second-order MOW model as a complete representation of the overall system.
It is important to observe that no matter how complex the ZMN characteristic is in the subthreshold or suprathreshold region, it does not affect the order of the MOW model and its predictive ability. The only thing that matters is how many times the ZMN characteristic intersects the threshold line.
A question naturally arises in connection with CISO systems that do not belong to the simple classes indicated in Figures 8.17 and 8.18. The general class of CISO systems can be described by the PDM (or NM) model followed by a TT, and the combined NTT will generally exhibit multidimensional "trigger regions" of dimensionality equal to the number of PDMs or NMs of the subject system.
An illustrative example of MOW modeling of an actual physiological system is given
below for the case of spatiotemporal dynamics of retinal ganglion cells discussed in Sec-
tion 7.4. An example of a three-dimensional trigger region is presented in the case of a
spider mechanoreceptor in Section 8.2.1 (Fig. 8.15).

Illustrative Example. As an example of MOW modeling of an actual physiological system, we present here some results obtained by a spatiotemporal minimum-order Wiener model of retinal ganglion cells in the frog [Citron & Marmarelis, 1987]. The response of the class 3 ganglion cell was predicted using a MOW spatiotemporal model of first and second order. Typical results are shown in Figure 8.20 and demonstrate the utility of the second-order Wiener functional in this case, since the latter predicts five of the eight output spikes in the data segment shown for the appropriate threshold (with no false positives), whereas the first-order Wiener functional predicts only two of the eight spikes (one output spike is predicted by both functionals). This result indicates that the minimum order for Wiener modeling of ganglion cells in the frog retina is second order, and that such a model predicts about 75% of the output spikes with no false positives, for the appropriate threshold.

8.3 NEURONAL SYSTEMS WITH POINT-PROCESS INPUTS

The nonparametric modeling of neuronal systems stimulated by temporal sequences of action potentials (spike trains) can be simplified (from the processing point of view) when
" PREDICTION U8INQ Fl'8T-0ADER WBER TEAM

x !
I
x x

xx
I
)0(

I
...
0
I
o.eo
I
UIO 2.AO
I
3.20
I I
4

Th1e lsecl

B SECOM)-OAD, ~ TERM
x x x x XX XX

I e

...
0 o.eo
I
UIO
I
2.AO
I
3.20
I I
4

Tme (sec)

C Fl'ST + SECOND-OADER TERMS


1+2

• •~
1 2 22

X X X X xx xx

I e

,
~
I I I
5" o.eo 1.80 2.AO 3.20

Tme lsecl

Figure 8.20 First- and second-order predictions of the spike response of a Class 3 frog retinal ganglion cell to a spatiotemporal white-noise stimulus. Measured spikes are marked by ×'s. Predicted spikes are indicated by arrows. Threshold value, θ, is drawn to maximize the number of predicted spikes without false positives [Citron & Marmarelis, 1987].


the action potentials are idealized as impulses of fixed intensity (Dirac delta functions in continuous time or Kronecker deltas in discrete time). Thus, these inputs can be represented mathematically in a stochastic context by "point processes," a class of random processes that are formed by sequences of identical impulses (spikes) representing random, instantaneous, and identical events.
Proper discretization of a continuous-time point process requires that the sampling interval (bin width) be equal to the absolute refractory period in order to allow the representation of each continuous-time action potential by one and only one Kronecker delta in the respective bin (the intensity of the recorded spike ought to be the integrated area under the action potential divided by the bin width T).
In practice, a sequence of action potentials (spike train) can be represented by a discrete-time point process:

$$x(n) = A \sum_{i=1}^{I} \delta(n - n_i) \qquad (8.62)$$

where n = 1, …, N denotes the discrete-time index (t = nT), A is the intensity of the Kronecker delta (spike), and nᵢ is the time index of the ith spike event (i.e., the discretized timing of the ith action potential). Note that this point process has I spike events over the available data record of N bins; that is, the mean rate of this point process is (I/N). We seek to address the problem of neural system modeling from input-output data, where the system input x(n) is a point process as described by Equation (8.62), but the system output y(n) may be either a discretized continuous signal or a point process.
This fundamental problem was addressed for the first time in the general input-output nonlinear modeling context of the Volterra-Wiener approach by Krausz (1975) and Ogura (1972) in the 1970s. Shortly thereafter, significant contributions were made by Kroeker (1977, 1979) and by Berger and Sclabassi and their associates [Berger et al., 1987, 1988a, b, 1989, 1991, 1993, 1994; Sclabassi et al., 1987, 1988a, b, 1989, 1994]. Although this methodology yielded promising initial results, it has found only limited use to date, partly due to its perceived complexity and the burdensome requirements of its initial applications (viz., the length of the required experimental data and the restrictive use of Poisson stimuli required by the cross-correlation technique that is used for model estimation). Note that in this context the Poisson process is the counterpart of GWN inputs in that it represents statistically independent events (no correlation or spectral structure).
In this section, we clarify some important methodological issues regarding the nature of the obtained kernels and introduce a more efficient model/kernel estimation method that reduces significantly the data-length requirements while increasing accuracy. Initial applications of this methodology to experimental data (using Laguerre expansions) have demonstrated its efficacy in a practical context [Alataris et al., 2000; Courellis et al., 2000; Gholmieh et al., 2001, 2002, 2003a, b; Song et al., 2002, 2003; Dimoka et al., 2003] and have corroborated the mathematical results presented herein. This approach to nonlinear modeling of neural systems with point-process inputs is the only general methodology currently available. This methodology is also extendable to systems with multiple inputs and multiple outputs in a nonlinear dynamic context (see Section 8.4), offering the realistic prospect of modeling neuronal ensembles.
In the general formulation of the modeling problem, we seek an explicit mathematical
description of the causal functional F that maps the input past (and present) upon the present value of the output in terms of the discrete-time Volterra series, which reduces, for the point-process input of Equation (8.62), to the output expression
$$y(n) = k_0 + A \sum_{i=1}^{I} k_1(n - n_i) + A^2 \sum_{i=1}^{I} \sum_{j=1}^{I} k_2(n - n_i,\, n - n_j) + \cdots \qquad (8.63)$$

where the high-order terms (second-order and above) are nonzero only for |n − nᵢ| ≤ M (M is the finite memory of the system) and i or j cover all event indices. In practice, rapid convergence of this functional power series is desirable, as it allows truncation of the Volterra series to a few terms for satisfactory model accuracy (i.e., output prediction) and yields relatively concise models. However, this convergence is often slow for point-process inputs.
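The output expression of Equation (8.63) is straightforward to implement for a given set of kernels. The sketch below (second-order truncation; the kernel shapes and spike times are hypothetical, chosen only for illustration) evaluates y(n) by summing first-order contributions over all spikes in the memory window and second-order contributions over all spike pairs:

```python
import numpy as np

def volterra_spike_response(n, spike_times, A, k0, k1, k2, M):
    """Second-order truncation of Eq. (8.63) for a spike-train input."""
    lags = [n - ni for ni in spike_times if 0 <= n - ni <= M]
    y = k0 + A * sum(k1(m) for m in lags)                      # single-spike responses
    y += A**2 * sum(k2(m1, m2) for m1 in lags for m2 in lags)  # pairwise interactions
    return y

# hypothetical kernels, for illustration only
k1 = lambda m: np.exp(-m / 4.0)
k2 = lambda m1, m2: -0.1 * np.exp(-(m1 + m2) / 4.0)  # sublinear pair interaction
y = volterra_spike_response(12, spike_times=[10, 11], A=1.0,
                            k0=0.0, k1=k1, k2=k2, M=20)
```

With the negative second-order kernel assumed here, two nearby spikes produce less than the sum of their individual responses, which is the kind of nonlinear interaction the series encodes.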
Following Wiener's approach in the case of continuous-input systems, the search for a kernel-estimation method that minimizes the model prediction error (as measured by the output prediction mean-square error) leads to the construction of an orthogonal hierarchy (series) of functionals using a variant of the Gram-Schmidt orthogonalization procedure. This approach seeks to decouple the kernels of various orders and secure maximum reduction of the prediction error at each successive model order. Orthogonalization of the functionals also facilitates the estimation of the kernels through cross-correlation.
Critical for this orthogonalization procedure is the selection of the proper input that ought to test the system as thoroughly as possible over the space of all possible point-process inputs. Thus, for stochastic inputs, ergodicity is required as well as appropriate autocorrelation properties of all orders up to twice the highest-order functional (kernel) of a given system. For systems with point-process inputs, the proper test input is the Poisson process, defined in the discrete-time context as a sequence of independent events (spikes) with fixed probability λ of occurrence at each time bin T. For a discrete-time Poisson point-process (PPP) input x(n), we have the following statistical moments:

$$E[x^r(n)] = \lambda A^r \qquad (8.64)$$

where E[·] denotes the expected-value operator or ensemble average. The qth-order autocorrelation function of a PPP is

$$E[x(n_1)\, x(n_2) \cdots x(n_q)] = \lambda^j A^q \qquad (8.65)$$

where j denotes the number of distinct time indices among the indices $(n_1, \ldots, n_q)$. This expression results from the statistical independence of the values of the Poisson process in each time bin. The parameter λ (the Poisson parameter) defines the mean rate of event occurrence and plays a role analogous to the power level of GWN in the case of continuous-time input signals. The statistical properties defined by Equation (8.65) are critical for the development of the orthogonal functional series with PPP input, which we term the Poisson-Wiener (P-W) series. The development of the P-W series is greatly simplified if we use the de-meaned input:

$$z(n) = x(n) - \lambda A \qquad (8.66)$$

The P-W orthogonal functionals {Q_i} are constructed by use of a Gram-Schmidt orthogonalization procedure and involve a set of characteristic P-W kernels {p_i}. They satisfy the orthogonality condition of zero covariance E[Q_iQ_j] = 0 for i ≠ j, and take the form

$$Q_0 = p_0 \qquad (8.67)$$

$$Q_1[z(n); p_1] = \sum_{m=0}^{M} p_1(m)\, z(n - m) \qquad (8.68)$$

$$Q_2[z(n); p_2] = \sum_{m_1=0}^{M} \sum_{m_2=0}^{M} p_2(m_1, m_2)\, z(n - m_1)\, z(n - m_2) - \frac{\mu_3}{\mu_2} \sum_{m=0}^{M} p_2(m, m)\, z(n - m) - \mu_2 \sum_{m=0}^{M} p_2(m, m) \qquad (8.69)$$

and so on.
The model output is composed of the sum of these orthogonal functionals up to the required order, which is finite by practical necessity, although the system output may correspond in general to an infinite series:

$$y(n) = \sum_{i=0}^{\infty} Q_i[z(n); p_i] \qquad (8.70)$$

Note that these orthogonal functionals depend on the statistical central moments of the de-meaned PPP input: $\mu_1 \triangleq E[z(n)] = 0$, $\mu_2 \triangleq E[z^2(n)] = \lambda(1 - \lambda)A^2$, $\mu_3 \triangleq E[z^3(n)] = \lambda(1 - \lambda)(1 - 2\lambda)A^3$, and so on. The normalized rth moment $\mu_r/A^r$ is an rth-degree polynomial of λ. The following key relation is noted:

$$\mu_4 \triangleq E[z^4(n)] = \mu_2^2 + \frac{\mu_3^2}{\mu_2} \qquad (8.71)$$

as it attains critical importance in proving the fundamental limitation in the estimability of the kernel diagonal values that was first observed empirically by Krausz [Krausz, 1975].
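The identity of Equation (8.71) can be verified directly from the two-point distribution of z(n), which takes the value A(1 − λ) with probability λ and −λA otherwise. A short closed-form check (the tested (λ, A) values are arbitrary):

```python
def central_moments(lam, A):
    """Central moments of z(n) = x(n) - lam*A for a Bernoulli spike bin."""
    p, q = lam, 1.0 - lam
    z_hi, z_lo = A * q, -A * p          # the two values z(n) can take
    mu2 = p * z_hi**2 + q * z_lo**2
    mu3 = p * z_hi**3 + q * z_lo**3
    mu4 = p * z_hi**4 + q * z_lo**4
    return mu2, mu3, mu4

# Eq. (8.71): mu4 = mu2^2 + mu3^2/mu2, for any lam in (0,1) and any A
for lam in (0.02, 0.1, 0.37):
    mu2, mu3, mu4 = central_moments(lam, 2.0)
    assert abs(mu4 - (mu2**2 + mu3**2 / mu2)) < 1e-12
```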
Emulating the cross-correlation technique for Wiener kernel estimation in the continuous case utilizing GWN inputs, we may estimate the unknown kernels {p_i} by evaluating the covariances between the output y(n) and known orthogonal "instrumental" functionals of the input z(n), that is, evaluating the "orthogonal projections" of the output signal upon each of these "instrumental" orthogonal functionals that form an orthogonal "coordinate system" in the functional space. If these "instrumental" functionals are chosen to be simple shift operators, then this approach results in the cross-correlation technique [Lee & Schetzen, 1965; Berger et al., 1987], which was first adapted to Poisson-input systems by Krausz in 1975, building on Ogura's earlier contributions [Ogura, 1972]. Unfortunately, the specific estimation formulae derived by Krausz contained some scaling errors that are corrected below.
The P-W kernel estimation formulae following the cross-correlation technique with
the appropriate instrumental functionals are given below. The zero-order P-W kernel Po
represents the average output value

Po = E[y(n)] (8.72)

The first-order P-W kernel is given by



$$p_1(m) = \frac{1}{\mu_2}\, E[y(n)\, z(n - m)] \qquad (8.73)$$

Note that the first-order P-W kernel estimation formula derived originally by Krausz has a different normalization constant, namely λ instead of $\mu_2 = \lambda(1 - \lambda)A^2$.
For the evaluation of the second-order P-W kernel, the key (and surprising) realization is that, because of the identity of Equation (8.71) relating $\mu_2$, $\mu_3$, and $\mu_4$ for any Poisson process, the diagonal values of the kernel (i.e., for $m_1 = m_2$) cannot be evaluated through cross-correlation and must be defined as zero! In other words, the "orthogonal projection" of the output signal y(n) upon the signal $z^2(n - m) - (1 - 2\lambda)A\, z(n - m) - \lambda(1 - \lambda)A^2$ is always zero for all m (i.e., they are orthogonal). Thus, the second-order P-W kernel estimate is given by

$$p_2(m_1, m_2) = \begin{cases} \dfrac{1}{2\mu_2^2}\, E[y(n)\, z(n - m_1)\, z(n - m_2)], & \text{for } m_1 \ne m_2 \\ 0, & \text{for } m_1 = m_2 \end{cases} \qquad (8.74)$$

In general, the rth-order P-W kernel is given by

$$p_r(m_1, \ldots, m_r) = \begin{cases} \dfrac{1}{r!\,\mu_2^r}\, E[y(n)\, z(n - m_1) \cdots z(n - m_r)], & \text{for distinct } (m_1, \ldots, m_r) \\ 0, & \text{otherwise} \end{cases} \qquad (8.75)$$

For ergodic and stationary processes, the ensemble averaging can be replaced by time averaging over infinite data records. Since, in practice, we only have the benefit of finite data records, the aforementioned time averaging is limited to a finite domain of time and results in estimates of the system kernels that represent approximations of the exact kernels (see Section 2.4.2). The P-W kernel estimation formulae can be adapted to the specific PPP input form of Equation (8.62) by utilizing the properties of the Kronecker delta, which reduce the multiplication operations of cross-correlation to additions:

$$p_0 = \frac{1}{N} \sum_{n=1}^{N} y(n) \qquad (8.76)$$

$$p_1(m) = \frac{A}{N\mu_2} \sum_{i=1}^{I} y(n_i + m) - \frac{\lambda A}{\mu_2}\, p_0 \qquad (8.77)$$

$$p_2(m_1, m_2) = \begin{cases} \dfrac{A^2}{2N\mu_2^2} \Bigg\{ \displaystyle\sum_{i_1=1}^{I} \sum_{i_2=1}^{I} \frac{1}{2}\big[y(n_{i_1} + m_1) + y(n_{i_2} + m_2)\big]\, \delta\big[(n_{i_1} - n_{i_2}) - (m_2 - m_1)\big] \\ \quad - \lambda \displaystyle\sum_{i=1}^{I} \big[y(n_i + m_1) + y(n_i + m_2)\big] + \lambda^2 N p_0 \Bigg\}, & \text{for } m_1 \ne m_2 \\ 0, & \text{for } m_1 = m_2 \end{cases} \qquad (8.78)$$
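The estimation formula of Equation (8.73) can be exercised on a simulated system. In the sketch below (a hypothetical linear system with an assumed kernel g; the record length, Poisson rate, and spike amplitude are arbitrary choices), the first-order P-W kernel estimated by cross-correlating the output with the de-meaned input recovers g, as expected for a linear system:

```python
import numpy as np

rng = np.random.default_rng(3)
N, M, lam, A = 400_000, 20, 0.05, 1.0
g = np.linspace(1.0, 0.0, M)            # assumed linear kernel (hypothetical)

x = A * (rng.random(N) < lam)           # discrete-time Poisson spike train
y = np.convolve(x, g)[:N]               # output of the linear system
z = x - lam * A                         # de-meaned input, Eq. (8.66)
mu2 = lam * (1 - lam) * A**2

# Eq. (8.73): p1(m) = E[y(n) z(n-m)] / mu2; for a linear system, p1 should equal g
p1 = np.array([np.mean(y[M:] * z[M - m:N - m]) for m in range(M)]) / mu2

err = np.max(np.abs(p1 - g))            # estimation error shrinks as N grows
```

Note the normalization by μ₂ = λ(1 − λ)A² rather than λ, in keeping with the scaling correction discussed above.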

The key definition that the diagonal values of the kernels be zero leads to considerable simplification of the form of the functionals by eliminating all the terms other than the leading term (that is, the last two terms of Q₂ in Equation (8.69)), since p₂(m, m) ≡ 0. Thus, the P-W functional series takes the simple form

$$y(n) = p_0 + \sum_{m} p_1(m)\, z(n - m) + \sum_{m_1} \sum_{m_2} p_2(m_1, m_2)\, z(n - m_1)\, z(n - m_2)$$

$$+ \sum_{m_1} \sum_{m_2} \sum_{m_3} p_3(m_1, m_2, m_3)\, z(n - m_1)\, z(n - m_2)\, z(n - m_3) + \cdots \qquad (8.79)$$

which is identical in form to the Volterra series for the de-meaned PPP input, but with the important distinction that the diagonal values of the kernels are zero by definition.
The P-W series of Equation (8.79) can be expressed in terms of the original PPP input x(n) by use of Equation (8.66). The resulting expression for the system output in terms of the P-W kernels and the original PPP input can be used to derive the Poisson-Volterra (P-V) kernels:

$$k_0 = \sum_{r=0}^{\infty} (-\lambda A)^r \sum_{m_1} \cdots \sum_{m_r} p_r(m_1, \ldots, m_r) \qquad (8.80)$$

$$k_1(m) = \sum_{r=1}^{\infty} r\,(-\lambda A)^{r-1} \sum_{l_1} \cdots \sum_{l_{r-1}} p_r(m, l_1, l_2, \ldots, l_{r-1}) \qquad (8.81)$$

$$k_2(m_1, m_2) = \sum_{r=2}^{\infty} \frac{r!}{2!\,(r - 2)!}\,(-\lambda A)^{r-2} \sum_{l_1} \cdots \sum_{l_{r-2}} p_r(m_1, m_2, l_1, \ldots, l_{r-2}) \qquad (8.82)$$

and so on, which are identical to the Volterra kernels of the system at the nondiagonal points. Note that, for a finite-order system, the highest-order P-V and P-W kernels are identical. The diagonal values of the P-V kernels are zero since the P-W kernels are zero at the diagonals, reflecting the fact that a point-process input (having fixed spike-magnitude values) is intrinsically unable to probe (and therefore estimate) the kernel diagonal values. Note that the latter would be possible if the spike-magnitude values were allowed to vary.
The resulting "Poisson-Volterra" (P-V) series attains the following meaning: the zero-
order term is the average value ofthe output; the first-order term accounts for the respons-
es to individual input impulses; the second-order term accounts for interactions between
pairs of input impulses within the memory epoch (defined by M); the third-order term ac-
counts for interactions among triplets of input impulses within the memory epoch, and so
on. One could not hope for a more orderly and elegant mathematical model of the hierar-
chy of nonlinear interactions of point-process inputs.

8.3.1 Lag-Delta Representation of P-V or P-W Kernels


Several investigators [Berger et al., 1987, 1988a, b, 1989; Sclabassi et al., 1987, 1988a, b, 1989] have found it more convenient to represent the P-V or P-W kernels in a different coordinate system that takes the interspike interval as an independent variable; that is, instead of $p_2(m_1, m_2)$ they use $\bar{p}_2(m, \Delta_1)$, where $m = m_1$ and $\Delta_1 = m_2 - m_1$. Clearly, $\bar{p}_2(m, \Delta_1) = 0$ for $\Delta_1 = 0$. The expressions for the P-V kernels in this lag-delta coordinate system become
8.3 NEURONAL SYSTEMS WITH POINT-PROCESS INPUTS 445

$$k_0 = \sum_{r=0}^{\infty} (-\lambda A)^r\, r! \sum_{m} \sum_{\Delta_1} \cdots \sum_{\Delta_{r-1}} \bar{p}_r(m, \Delta_1, \ldots, \Delta_{r-1}) \qquad (8.83)$$

$$k_1(m) = \sum_{r=1}^{\infty} r!\,(-\lambda A)^{r-1} \sum_{\Delta_1} \cdots \sum_{\Delta_{r-1}} \bar{p}_r(m, \Delta_1, \ldots, \Delta_{r-1}) \qquad (8.84)$$

$$\bar{k}_2(m, \Delta_1) = k_2(m, m + \Delta_1) = \sum_{r=2}^{\infty} \frac{r!\,(r-1)!}{2!\,(r-2)!}\,(-\lambda A)^{r-2} \sum_{\Delta_2} \cdots \sum_{\Delta_{r-1}} \bar{p}_r(m, \Delta_1, \Delta_2, \ldots, \Delta_{r-1}) \qquad (8.85)$$

where $\Delta_i = m_{i+1} - m_1$. The kernel values are zero whenever any of the arguments $\Delta_i$ is zero (kernel diagonals in the conventional notation).
The two sets of kernels, $\{k_i\}$ and $\{\bar{k}_i\}$, are equivalent in the sense that either set represents fully the input-output relationship. The difference is in the way the kernels are defined within the P-V functionals. The lag-delta representation indicates that the interspike intervals of the input point process are the critical factors in determining the nonlinear effects at the output of neuronal systems.
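The change of coordinates is a simple re-indexing; for a second-order kernel it can be sketched as follows (the toy kernel values are arbitrary, chosen only to make the re-indexing visible):

```python
import numpy as np

def to_lag_delta(k2, M):
    """Re-index a second-order kernel: kbar2(m, Delta) = k2(m, m + Delta)."""
    kbar = np.zeros((M, M))
    for m in range(M):
        for delta in range(M - m):      # Delta = m2 - m1 >= 0 within the memory
            kbar[m, delta] = k2[m, m + delta]
    return kbar

M = 8
k2 = np.add.outer(np.arange(M), np.arange(M)).astype(float)  # toy symmetric kernel
kbar2 = to_lag_delta(k2, M)   # kbar2[m, delta] holds k2 at interspike interval delta
```

In the new array the second axis is the interspike interval Δ, so the dependence of the pairwise interaction on interval length can be read along a single coordinate.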
We must emphasize that, in actual applications, the obtained model should be ultimately put in the P-V form, since the P-V kernels are independent of the specific parameters (λ, A) of the PPP input, unlike the estimated P-W kernels, which depend on the PPP input parameter (λA). The P-V kernels (unlike the P-W kernels) provide a system model that does not depend on the specific point-process input and can be used to predict the system response to any given point-process input (not only Poisson).
Equations (8.80)-(8.85) can be used to reconstruct the P-V kernels of a system from a complete set of P-W kernel estimates obtained via the cross-correlation technique. The mathematical relationships between P-W and P-V kernels also suggest practical means for estimating the nonlinear order of the required model by varying the input-specific parameter (λA) and observing the resulting effects on the obtained P-W kernels. Since the latter are polynomial expressions of (λA) with coefficients dependent on the input-independent P-V kernels, an indication of the order of the system (and of the required model order) can be obtained from the degree of the observed polynomial relation.
An important observation is in order regarding the diagonal values of the kernels. It was shown that, for point-process inputs of fixed spike magnitude A, the diagonal values of the P-W kernels cannot be estimated via cross-correlation. However, since two input events cannot occur at the same time, the diagonal values of the kernels are never used in model prediction (whether zero in value or otherwise) and may assume any values without affecting the model prediction, as long as the contributions of these diagonal values are balanced properly by the lower-order kernels. If the input spikes are allowed to have different magnitude values, then the diagonal kernel values become relevant and can be estimated. Of course, in order to estimate these diagonal kernel values from input-output data, we must either test the system with point-process inputs of variable spike magnitudes or interpolate the diagonal kernel values using the neighboring off-diagonal values (under the assumption of kernel smoothness at the diagonal points). The ultimate validation of these kernels has to be based on the predictive performance of the resulting model.

8.3.2 The Reduced P-V or P-W Kernels


In another simplification used frequently in practice [Berger et al., 1987, 1988a, b, 1989; Sclabassi et al., 1987, 1988a, b, 1989], the lag dimension (m) of the kernels in the lag-delta representation can be suppressed and the dimensionality of the kernels can be reduced by one, when the system dynamics in the lag dimension do not spread beyond one time bin (typically 5 to 10 ms in size). The latter situation also arises in those cases where the output is an event synchronized with each input spike (e.g., the measurement of a population spike in a recorded field potential). The lag dimension can be suppressed if we can assume fixed response latency in first approximation. This reduces significantly the complexity of the resulting model by suppressing one dimension in each kernel and has been proven useful in many applications to date.
It is evident that a modified Volterra model emerges in this case, termed "the reduced
Volterra model," whereby the magnitude ofthe synchronized output event is expressed in
terms ofthe "reduced P-V kernels" {kr} as

y(n_i) = A k_1^r + A^2 Σ_{n_j < n_i} k_2^r(n_i − n_j) + A^3 Σ_{n_j1 < n_i} Σ_{n_j2 < n_j1} k_3^r(n_i − n_j1, n_i − n_j2) + . . .    (8.86)

where n_i denotes the common time index of input-output events, and the summations take
place over all time indices n_j of events preceding the present/reference index n_i that lie
within the memory epoch of the system kernels (i.e., |n_i − n_j| ≤ M). Note that the reduced
P-V kernels are expressed in the lag-delta representation.
The use of the reduced Volterra model facilitates the practical modeling of neural sys-
tems for which the output events exhibit fixed latency relative to the corresponding input
event. If this latency d(n_i) is variable, then another reduced Volterra model equation can
be used to describe the dependence of the variable latency on the timing of the input
events as

d(n_i) = A l_1^r + A^2 Σ_{n_j < n_i} l_2^r(n_i − n_j) + A^3 Σ_{n_j1 < n_i} Σ_{n_j2 < n_j1} l_3^r(n_i − n_j1, n_i − n_j2) + . . .    (8.87)

where the reduced P-V kernels {l_r} characterize fully the dependence of the output event
latency upon the timing of the preceding input events.
The introduction of a separate model equation for the output event latency suggests
that it is possible to have a vector representation of the output containing any number of
output attributes (e.g., the initial slope or peak value of an EPSP of a field potential).
Each of these output attributes begets a reduced Volterra model equation and the associat-
ed set of kernels. Each equation (single-attribute data) is analyzed separately to estimate
the respective set of kernels. Interpretation of the kernels can be made either for single at-
tributes or conjointly for multiple attributes.
The resulting reduced Volterra model for the vector output case (i.e., when the depen-
dence of multiple output attributes on the input spike sequence is examined) takes the
vector form

a(n_i) = A g_1 + A^2 Σ_{n_j < n_i} g_2(n_i − n_j) + A^3 Σ_{n_j1 < n_i} Σ_{n_j2 < n_j1} g_3(n_i − n_j1, n_i − n_j2) + . . .    (8.88)

where a(n_i) = [a_1(n_i) a_2(n_i) . . . a_Q(n_i)]' denotes the output vector, with a_q(n_i) de-
noting the qth attribute of the output at event time n_i (q = 1, 2, . . . , Q), and g_r denotes the
vector of the corresponding rth-order reduced P-V kernels for the multiple output attrib-
utes considered.
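The per-attribute fitting implied by this vector formulation can be sketched as follows; the attribute names, regressors, and coefficient values are synthetic stand-ins chosen only to show that each attribute record is fitted independently by least squares:

```python
# Sketch of the vector-output idea: each output attribute (e.g., amplitude,
# latency) gets its own reduced Volterra equation and its own kernel set,
# fitted separately from single-attribute data.  All values are synthetic.
import numpy as np

rng = np.random.default_rng(1)
N, L = 500, 3
V = rng.standard_normal((L, N))            # stand-in first-order regressors
X = np.vstack([np.ones(N), V]).T           # regression matrix shared by all attributes

attributes = {                              # one synthetic record per attribute
    "amplitude": X @ np.array([2.0, 1.0, -0.5, 0.2]),
    "latency":   X @ np.array([5.0, -0.3, 0.1, 0.0]),
}

# One independent least-squares fit (i.e., one kernel set) per attribute
coeffs = {name: np.linalg.lstsq(X, y, rcond=None)[0]
          for name, y in attributes.items()}
print({k: np.round(v, 3) for k, v in coeffs.items()})
```

The fits share the input-dependent regressors but nothing else, so attributes can be interpreted singly or conjointly, as stated above.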

Having established the proper definitions, mathematical relations, and meaning for the
Volterra, P-V, and P-W kernels of a neuronal system, we now turn to the key issue in ac-
tual applications: the accurate and efficient estimation of these kernels from input-output
data. The cross-correlation technique (CCT) was the first to be used for P-W kernel esti-
mation in actual applications [Krausz, 1975], but it has been shown to provide limited es-
timation accuracy and to require long data records in order to achieve satisfactory P-W
kernel estimates, a serious burden in actual experimental studies where the preparation
can be kept stable only for a limited time. A better estimation method can be based on La-
guerre expansions of the kernels and least-squares fitting procedures, as in the continu-
ous-input case [Marmarelis, 1993]. This method was recently adapted to systems with
point-process inputs and was applied successfully to modeling studies of hippocampal
neurons [Gholmieh et al., 2001, 2002, 2003a, b; Dimoka et al., 2003; Song et al., 2002,
2003]. The method employs the orthonormal basis of discrete-time Laguerre functions
(DLFs) to expand the system kernels, and then uses least-squares fitting to estimate the
requisite expansion coefficients.
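A common recursive construction of the DLF basis can be sketched as follows; the recursion is one standard way to compute the b_j(m), and the value of α (0 < α < 1), which controls their rate of decay, is chosen here arbitrarily:

```python
# Sketch: generating the orthonormal discrete-time Laguerre functions (DLFs)
# b_j(m) used to expand the kernels, via a standard recursion.
import numpy as np

def laguerre_basis(L, alpha, M):
    """Return an (L, M) array whose rows are b_0 ... b_{L-1} over lags 0..M-1."""
    b = np.zeros((L, M))
    b[0] = np.sqrt(1 - alpha) * np.sqrt(alpha) ** np.arange(M)  # b_0(m)
    for j in range(1, L):
        b[j, 0] = np.sqrt(alpha) * b[j - 1, 0]
        for m in range(1, M):
            b[j, m] = (np.sqrt(alpha) * b[j, m - 1]
                       + np.sqrt(alpha) * b[j - 1, m]
                       - b[j - 1, m - 1])
    return b

B = laguerre_basis(L=4, alpha=0.8, M=300)
print(B @ B.T)   # close to the 4x4 identity (orthonormal basis)
```

Larger α stretches the basis over longer lags, which is why slow systems (long memory) call for α close to 1.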
In the context of point-process inputs, we may consider the Laguerre expansions of the
Volterra kernels in the lag-delta representation, using L DLFs {b_j}:
k_1(m) = Σ_{j=1}^{L} c_1(j) b_j(m)    (8.89)

k_2(m_1, m_2) = Σ_{j1=1}^{L} Σ_{j2=1}^{L} c_2(j_1, j_2) b_j1(m_1) b_j2(m_2)    (8.90)

The general input-output relation after the Laguerre expansion of the kernels is

y(n) = c_0 + A Σ_{i=1}^{I} Σ_{j=1}^{L} c_1(j) b_j(n − n_i)
       + 2A^2 Σ_{i1=1}^{I} Σ_{i2=1}^{i1} Σ_{j1=1}^{L} Σ_{j2=1}^{L} c_2(j_1, j_2) b_j1(n − n_i1) b_j2(n − n_i2) + . . .    (8.91)

where c_0 = k_0 (for uniformity of notation) and the summation over the indices (i_1, i_2, . . .)
extends over the intervals |n − n_i| that do not exceed the system memory (i.e., the extent of
the nonzero values of the kernels). It is evident that the diagonal values of the Volterra
kernels (or the zero-argument values in the equivalent lag-delta representation) are inter-
polated from neighboring estimated values using the Laguerre basis functions. The ex-
pansion coefficients of Equation (8.91) can be estimated by linear least-squares fitting of
the input-output data.
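The estimation step of Equation (8.91), truncated at first order for brevity, can be sketched as follows; the spike probability, basis parameters, and "true" coefficients are synthetic assumptions used only to verify that least squares recovers the expansion coefficients:

```python
# Sketch: expand the kernels on the DLF basis, convolve the spike train with
# each basis function to form regressors, and estimate the expansion
# coefficients of Eq. (8.91) by ordinary least squares (first order only).
import numpy as np

def laguerre_basis(L, alpha, M):
    b = np.zeros((L, M))
    b[0] = np.sqrt(1 - alpha) * np.sqrt(alpha) ** np.arange(M)
    for j in range(1, L):
        b[j, 0] = np.sqrt(alpha) * b[j - 1, 0]
        for m in range(1, M):
            b[j, m] = (np.sqrt(alpha) * b[j, m - 1]
                       + np.sqrt(alpha) * b[j - 1, m] - b[j - 1, m - 1])
    return b

rng = np.random.default_rng(0)
N, L, alpha, A = 2000, 4, 0.8, 1.0
B = laguerre_basis(L, alpha, M=200)

x = (rng.random(N) < 0.02).astype(float)          # sparse point-process input
V = np.array([np.convolve(x, B[j])[:N] for j in range(L)])  # sum_i b_j(n - n_i)

c_true = np.array([0.5, 1.2, -0.7, 0.3, 0.1])     # synthetic [c0, c1(1..4)]
y = c_true[0] + A * c_true[1:] @ V                # synthetic first-order output

X = np.vstack([np.ones(N), A * V]).T              # regression matrix
c_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.round(c_hat, 3))                          # should match c_true
```

Second-order terms would add the products of pairs of these regressors as extra columns, leaving the fitting step itself linear, which is the key computational advantage of the expansion.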
Having estimated the expansion coefficients in the model equation (8.91), we can
now reconstruct the estimates of the Volterra kernels of the system using Equations
(8.89) and (8.90). These Volterra kernel estimates have nonzero values at zero argu-
ments (in general), owing to the interpolation/extrapolation means provided by the
Laguerre expansion. As an example, for a single input spike of magnitude A at time n_i,
the model output is

y(n) = [k_0 + A k_1(n − n_i) + A^2 k_2(n − n_i, 0) + . . .] u(n − n_i)    (8.92)

where u denotes the discrete step function (0 for negative argument and 1 elsewhere), and
the Volterra kernels {k_0, k_1, k_2, . . .} are not defined as zero for Δ_j = 0 in the lag-delta rep-
resentation, unlike their P-V counterparts, which express the model output as

y(n) = [k_0 + A k_1(n − n_i)] u(n − n_i)    (8.93)

because k_2(n − n_i, 0) = 0, etc. This simple example illustrates the use of the interpolated
Volterra kernels instead of the P-V kernels (causing no harm).
It should be emphasized that the proposed kernel estimation method yields the Volter-
ra kernels of the system, whereas the cross-correlation method can only yield estimates of
the P-W kernels that are zero on the diagonal points. It is also evident that Volterra kernel
estimation using kernel expansions does not require Poisson inputs (although the latter
are efficient test inputs) and can be used in connection with spontaneous neural activity
data (arbitrary point-process inputs). This broadens immensely the scope of the advocated
approach and elevates it to a general methodology with no peers at the present time.
A note should be made regarding the relation of the advocated approach to the current-
ly popular approach of paired-pulse testing, whereby pairs of impulses of variable inter-
pulse interval are presented at the input of the system and the deviation of the system out-
put from linear superposition is computed as a quantitative measure of paired-pulse
facilitation or depression. This quantitative measure is also given by the second-order
Volterra kernel in a more efficient manner, when the system is of second order and only a
pair of impulses is allowed within the memory epoch. Greater efficiency results from the
reduction in the required experimentation and processing time using the advocated ap-
proach, because the long sequence of paired-pulse tests is avoided. However, if the sys-
tem is of higher order, even if no more than two impulses occur within the memory
epoch, the results will deviate for the two cases.
The presented Volterra formulation is valid for any number of impulses within the
memory epoch and is more rigorous because it is based on solid mathematical founda-
tions that allow modeling of the nonlinear interactions (facilitatory or depressive) of any
order among any number of impulses within the memory epoch (not just pairs), bringing
the experimental paradigm and the utility of the Volterra model much closer to the actual
natural operation of the system. This is extremely important for advancing scientific
knowledge in neuronal dynamics in a natural and unbiased context that is not constrained
by specialized inputs, since results obtained from paired-pulse testing can be severely bi-
ased by higher-order interactions. The results obtained with the advocated methodology
are more reliable than the results of the paired-pulse method because they separate explic-
itly the contributions of the various orders of kernels (i.e., the paired-pulse response con-
tains the contributions from all kernels of second order and higher without distinction)
and they avoid errors due to cross-term interactions or the effects of systemic/ambient
noise (because of the employed least-squares fitting).
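For a purely second-order reduced model, the equivalence between the paired-pulse measure and the second-order kernel can be checked directly; the kernel shape below is a hypothetical facilitation/depression curve, not a measured one:

```python
# Sketch (illustrative, strictly second-order system): the classical
# paired-pulse measure -- response to the second pulse of a pair minus the
# single-pulse response -- equals A^2 * k2r(delta), so one estimated kernel
# yields the whole facilitation/depression curve at once.
import numpy as np

A, k1 = 1.0, 2.0
k2r = lambda d: 0.8 * np.exp(-d / 30.0) - 0.2 * np.exp(-d / 200.0)  # hypothetical

def second_pulse_response(delta):
    """Output event synchronized with the second pulse of a pair at interval delta."""
    return A * k1 + A**2 * k2r(delta)

single = A * k1
deltas = np.arange(10, 500, 10)
ppf = np.array([second_pulse_response(d) - single for d in deltas])

# The paired-pulse curve coincides with A^2 * k2r(delta)
print(np.allclose(ppf, A**2 * k2r(deltas)))   # True
```

For a higher-order system the subtraction no longer isolates k_2^r, which is the bias the text warns about.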
Some illustrative examples of the application of this approach to the hippocampal
formation are given in the following section.

8.3.3 Examples from the Hippocampal Formation


As a first example of the power of the Poisson-Wiener modeling approach in elucidat-
ing the functional complexity of parts of the central nervous system, we choose one of
the pioneering applications to the hippocampal formation by the collaborating research
groups of Berger and Sclabassi [Berger et al., 1987, 1988a, b, 1989, 1991, 1993, 1994;
Sclabassi et al., 1987, 1988a, b, 1989, 1994]. We also present more recent modeling re-
sults obtained by the collaborating research groups of Berger and Marmarelis from me-
dial and lateral perforant-path stimulation in a hippocampal slice (in vitro) that make use
of the improved kernel estimation methods presented in the previous section (Laguerre
expansion technique to obtain directly the "reduced" Volterra kernels). The recent re-
sults include single-input and dual-input Volterra models [Gholmieh et al., 2001, 2002,
2003a, b; Dimoka et al., 2003]. Finally, we present an example of reduced Volterra
modeling of synaptic dynamics in the hippocampus and compare it to a widely accept-
ed parametric model [Song et al., 2002, 2003].

Single-Input Stimulation in Vivo and Cross-Correlation Technique. The neu-
roanatomical structure of many brain regions is organized in terms of highly interconnect-
ed subpopulations of neurons, like the well-studied hippocampal formation, which con-
sists of the entorhinal, hippocampal, and subicular cortices [Berger et al., 1987]. The
major projection neurons of these three cortical regions form a multisynaptic circuit (see
Figure 8.21) in which excitation of the entorhinal cortex results in the sequential excita-
tion through the perforant path of hippocampal and subicular neurons. In addition, many
feedforward and feedback pathways exist within this "master loop" that modulate the ac-
tivity of the projection neurons.
Berger and Sclabassi and their associates have used the aforementioned modeling ap-
proach to study the functional characteristics of the hippocampal formation [Berger et al.,
1987, 1988a, b, 1989; Sclabassi et al., 1987, 1988a, b, 1989]. Their experimental strategy
has involved the application of Poisson random impulse trains to axons of the perforant
path, which arise from entorhinal cortical neurons, and recording as outputs the evoked
population responses from postsynaptic granule cells of the dentate gyrus or pyramidal
cells in the CA3 or CA1 regions. The reduced Volterra model presented in Section 8.3.2
is used for representing the transformational characteristics of the hippocampal network
of neurons activated by the perforant path. The activity of dentate granule cells can serve
as an effective measure of total network activity because the other subpopulations of neu-
rons within the hippocampal formation contribute to the excitability of granule cells
through several feedback pathways.
The high degree of lamination of neuronal elements within the hippocampus results in
distinct advantages for applying this modeling approach. Subthreshold stimulation ap-
plied to the perforant path results in the generation of excitatory postsynaptic potentials
(EPSPs) within the dendrites of dentate granule neurons. The intracellular current is re-
flected extracellularly as a positive-going potential (i.e., current source), or "population
EPSP," within the cell body layer of the dentate [Figure 8.22 (A)]. If the stimulus is
suprathreshold, the action potential discharges of a population of granule cells are reflect-
ed as a negative-going "population spike" (i.e., current sink) that is superimposed on the
positive-going "population EPSP" [Figure 8.22 (B)]. Thus, the two components of the ex-
tracellular field potentials can be separated and analyzed in isolation [Figure 8.22 (C)].
This greatly simplifies the computation of "reduced" Poisson-Wiener kernels for the pop-
ulation spike in the lag-delta (or tau-delta) representation (see Section 8.3.2).
Methodological and experimental procedures are detailed in Berger et al. (1987,
1988a, b) and Sclabassi et al. (1988a, b). All experiments were conducted using male
New Zealand white rabbits, and data were collected either acutely or after animals had re-
covered from chronic implantation of stimulation and recording electrodes. During each

Figure 8.21a Schematic representation of a cross-section of the hippocampus showing the intrin-
sic trisynaptic pathway involving excitatory input from the medial and lateral perforant path (pp) to
granule cells in the dentate gyrus, to pyramidal cells of the CA3 regio inferior, and to pyramidal cells
of the CA1 regio superior [adapted from Berger et al., 1987].

Figure 8.21b Schematic representation of the major interconnectivity in the hippocampal forma-
tion. Abbreviations: comm. = commissural projections; bc = basket cells; pp = perforant path.

experiment, a Poisson train of 4064 impulses (biphasic, 0.10 ms duration) was delivered
to the perforant path with a mean rate of 2.0 impulses/sec (i.e., a mean interimpulse inter-
val of 500 ms), and a range of random (Poisson) interimpulse intervals of 1-5000 ms.
The majority of population spikes (typically 90% for each preparation) evoked during
random train stimulation occurred with latencies (τ) of 3-9 ms in both anesthetized (N =
8) and unanesthetized (N = 8) animals [Figure 8.23 (A) and (B), respectively]. Because of
this range of spike latencies, first-order kernels were computed initially with a temporal
resolution of 10 ms, so as to include all evoked spikes in a single tau bin, making the first-
order kernel a simple scalar [reduced Poisson-Wiener (P-W) kernel]. The value of the
first-order reduced P-W kernel is the average population spike amplitude evoked during
Poisson-train stimulation, which was 1.7 mV (±0.1) for anesthetized animals and 2.2 mV

Figure 8.22 Examples of population EPSP (A) and spike (B) components of field potentials evoked
in the cell body layer of the dentate gyrus in response to stimulation of the perforant path. The result
of a subthreshold stimulation is shown in A, and of a suprathreshold stimulation in B. Shown in C is
the manner in which the amplitude (A) of the population spike is determined [Berger et al., 1987].

(±0.3) for unanesthetized animals. Second-order P-W kernels were computed also in re-
duced form (single tau bin of 10 ms).
Representative reduced second-order P-W kernels for three anesthetized animals are
shown in Figure 8.24 (A-C), and reveal significant nonlinearities as a function of the in-
terimpulse interval Δ. For Δ < 50 ms they exhibit inhibitory effects on the population
spike, but between 50 ms and 300 ms they exhibit marked facilitation. The maximum fa-
cilitation for all anesthetized preparations ranged from 93-158% of the first-order kernel
value and occurred between 70-100 msec.
The facilitation was bimodal in some preparations. For example, Figure 8.24 (C)
shows a second-order P-W kernel exhibiting one peak of facilitation at 70 ms and a sec-
ond peak of facilitation at 210 ms. The presence of this bimodality was attributed to inad-
vertent stimulation of two different states or high estimation variance associated with the
employed cross-correlation technique.
For Δ values in the range of 300-700 ms, an inhibition in spike amplitude was observed
in six of the eight anesthetized preparations but was small and variable from animal to ani-
mal. At Δ values longer than 700 ms, the kernel values appear statistically insignificant.
Second-order P-W kernels computed from unanesthetized, chronically prepared ani-
mals revealed several significant differences from those obtained in anesthetized animals
[Figure 8.24 (D-F)]. First, data from unanesthetized animals revealed less facilitation in
response to interimpulse intervals in the range of 70-100 ms (mean peak facilitation = 74
± 8%). Second, inhibition did not occur in response to intervals longer than 300 ms.
Third, data from 5 of the 8 animals showed a slight facilitation that extended to approxi-
mately 1000 ms [see Figure 8.24 (F)]. Fourth, the bimodality observed in anesthetized an-
imals is removed.
Analyzing the amplitude of the evoked population spike in isolation of other compo-
nents also greatly simplifies the presentation and interpretation of third-order reduced
P-W kernels because the data can be presented in three dimensions instead of four dimen-
sions that would be necessary if the entire field potential had been analyzed. Reduced
third-order P-W kernels are shown in Figure 8.25, one for data collected from an unanes-
thetized animal and one for data collected from an anesthetized animal (both kernels were
computed with a tau bin of 35 ms). The second-order P-W kernels for these same prepa-
rations are shown in Figure 8.24 (A) and (D), respectively. The third-order kernels reveal

Figure 8.23 Distributions of latencies for population spikes elicited during random train stimulation
for anesthetized (A) and unanesthetized (B) preparations. First-order Poisson-Wiener kernels (values
are expressed in millivolts) computed with temporal resolutions of 10 msec (C and D) and 1 msec (E
and F) for data collected from the same two preparations [Berger et al., 1987].

that significant third-order nonlinearities exist in the system and are altered by anesthesia.
In general, third-order P-W kernels from anesthetized animals have significant positive
components, whereas their counterparts from unanesthetized animals contain almost all
negative components.
The observed bimodality in the facilitatory region of some kernels may be due to the
stimulation of two subpopulations of perforant-path fibers, since a medial and a lateral
perforant path can be distinguished on the basis of anatomical, physiological, and phar-
macological criteria (see example of dual stimulation below). One of those criteria is a
shorter latency to peak response of population EPSPs in dentate granule cells produced by
stimulation of the medial perforant path. Thus, the first peak may reflect predominant ac-
tivation of the medial perforant path, whereas the second peak may reflect predominant

Figure 8.24 Second-order reduced P-W kernels for population spikes of all latencies (tau bin = 10
msec) recorded from three different anesthetized (A-C) and two different unanesthetized (D and E)
preparations. For panels A-E, second-order kernel values are expressed as a function of interstimu-
lus intervals (Δ) extending to 1000 msec. The data shown in F are from the same preparation as in E,
except that interstimulus intervals extend to 2000 msec. Both unnormalized (mV) and normalized (rel-
ative to the peak of the first-order kernel) values are shown.

activation of the lateral perforant path. This is corroborated by the results of the two-input
study (medial/lateral) presented below.
Using this approach, Berger and Sclabassi determined how manipulation of an afferent
projection that is diffusely distributed throughout the hippocampus (i.e., the noradrener-
gic input from the locus coeruleus) influences functional properties at an integrated level
and investigated how epileptiform activity at one point in the system influences the trans-
formational properties of the entire hippocampal network. These results demonstrate the
utility of this approach to a comprehensive understanding of the function of the central
nervous system.
It is evident that the presented modeling methodology can be used to assess quantita-
tively the effect of pharmacological intervention or to diagnose neuropathologies in the
central nervous system, provided that the appropriate input-output measurements can be

Figure 8.25 Third-order reduced P-W kernels from an unanesthetized (top) and an anesthetized an-
imal (bottom). Kernel values are normalized relative to the respective first-order kernel for each set
of data, and plotted as a function of the least recent (Δ1) and the most recent (Δ2) of the two prior in-
terimpulse intervals. Only half of the plot is shown because only interval pairs in which Δ2 > Δ1 were
included in the calculations [Berger et al., 1987].

made reliably and safely. An illustrative example of quantifying pharmacological effects
in a hippocampal slice (in vitro) is presented in the following section.

Single-Input Stimulation in Vitro and Laguerre Expansion Technique. The
major excitatory projection from the entorhinal cortex to the granule cells of the dentate
gyrus is the perforant pathway and consists of two anatomically and functionally distinct
subdivisions: the medial and the lateral perforant path. The lateral perforant path (LPP)
arises in the ventrolateral entorhinal area and it synapses in the outer one-third of the mol-
ecular layer of the dentate. The medial perforant path (MPP) arises in the dorsomedial en-
torhinal cortex and it synapses to the granule cell dendrites in the middle one-third of the
molecular layer (see Figure 8.21a). The difference in the synaptic locations on the granule
cell dendrites of the terminals from the LPP and MPP results in different electrophysio-
logical characteristics of the granule cell responses to the independent activation of the
two pathways.
The medial and the lateral fibers of the perforant path were stimulated in a hippocam-
pal slice preparation in vitro and the population spike data recorded as output in the den-
tate gyrus were analyzed using the Laguerre expansion technique adapted for point-
process inputs (see Section 8.3.2). An array of 60 microelectrodes (3 × 20 electrode
arrangement with sizes of 28 μm, and center-to-center spacing of 50 μm) was used for
recording (MEA60, Multi Channel Systems, Germany) from a hippocampal slice of adult
male rats. Poisson sequences of 400 impulses were used to stimulate each pathway inde-
pendently and the induced responses of the dentate population spikes were analyzed.
The resulting reduced second- and third-order Volterra kernels are shown in Figure
8.26 for the medial pathway (k_1 = 144 μV) and in Figure 8.27 for the lateral pathway (k_1 =
197 μV). The distinct dynamic characteristics of the two pathways are evident. The medi-
al pathway exhibits biphasic dynamics but the lateral pathway is strictly inhibitory in the
second-order kernel. The third-order kernels are biphasic for both pathways and clearly
with distinct dynamics that appear to counteract the effects of their second-order counter-
parts. The predictive ability of the obtained models is excellent and the obtained kernel
estimates are consistent from experiment to experiment. These results corroborate the im-
provement in modeling performance due to the new estimation method (based on kernel
expansions). This also allows the extension of the modeling approach to the dual-stimula-
tion paradigm, discussed in the following subsection [Dimoka et al., 2003].
Another interesting demonstration of the efficacy of the new modeling methodology
is presented in Gholmieh et al. (2001), where the reduced second-order Volterra kernel
of the neuronal pathway from the Schaffer collateral (stimulation) to the CA1 region
(population spike response) is shown to remain rather consistent from experiment to ex-
periment in acute and cultured hippocampal slices. The pharmacological effects of pi-
crotoxin (a GABA_A receptor antagonist) are quantified by means of the respective re-
duced second-order Volterra kernels, as demonstrated in Figure 8.28. The effect of
picrotoxin on the neuronal dynamics is more pronounced for short lags (Δ < 100 ms),
consistent with the existing qualitative knowledge regarding GABA_A receptors (observe
the higher peak after picrotoxin).

Dual-Input Stimulation in the Hippocampal Slice. Two independent Poisson
trains of impulses stimulate simultaneously the medial and lateral perforant pathways
(MPP and LPP) of the hippocampal slice preparation and the induced population spike
(PS) in the dentate gyrus is viewed as the system output. The latency of the PS is within
10 ms of the respective stimulating impulse and, thus, the reduced form of the
Poisson-Volterra model is employed by suppressing the tau dimension in the tau-delta
kernel representation (see Section 8.3.2). The synchrony of the input-output events, the
independence of the timing of the impulses in the two input Poisson sequences, and the
extraction of the PS output data from the field recording are illustrated in Figure 8.29.
The two-input modeling methodology has to be slightly modified in the reduced Pois-
son-Volterra case in order to account for two distinct interaction components (cross-ker-
nels) in the second-order model. This modification is presented below and assigns a

Figure 8.26 Second-order and third-order reduced Volterra kernels for the medial pathway (K1
mean: 144 μV, st. dev.: 7.65).

Figure 8.27 Second-order and third-order reduced Volterra kernels for the lateral pathway (third-
order model; K1 mean: 197 μV, st. dev.: 3.41).

cross-kernel component to each input-synchronized output PS. If we denote by x_1 and
x_2 the Poisson input sequences for the MPP and LPP, respectively, then

x_1(n) = Σ_{n_i1} A_1 δ(n − n_i1)    (8.94)

x_2(n) = Σ_{n_i2} A_2 δ(n − n_i2)    (8.95)

where n denotes the discrete-time index (bin size of 10 ms), A_1 and A_2 denote the fixed
impulse amplitudes for the MPP and LPP, respectively, and {n_i1}, {n_i2} are the times of
occurrence of the impulse events in the two inputs [Courellis et al., 2000; Dimoka et al.,
2003].

Figure 8.28 Effects of picrotoxin (100 μM) on the reduced second-order kernels. (A) Second-order
kernel before drug addition (lower curve) and 15 min after perfusion with 100 μM picrotoxin (higher
curve). (B) Effect of picrotoxin on reduced first-order kernels (open bars: control; hatched bars: picro-
toxin (100 μM); means ± SD of five experiments). (C) Effects of picrotoxin on the nine Laguerre expan-
sion coefficients (open bars: control; hatched bars: picrotoxin) [Gholmieh et al., 2001].
For the second-order model, the output PS amplitude can be expressed in terms of two
components y_1 and y_2 representing the MPP and LPP contributions, respectively:

y(n) = y_1(n_i1) δ(n − n_i1) + y_2(n_i2) δ(n − n_i2)    (8.96)



Figure 8.29 Two-input experiment with medial and lateral perforant-path stimulation.

y_1(n_i1) = k_{1,0} + Σ_{n_i1−μ < n_j1 < n_i1} k_{1,1}(n_i1 − n_j1) + Σ_{n_i1−μ < n_j2 < n_i1} k_{1,2}(n_i1 − n_j2)    (8.97)

y_2(n_i2) = k_{2,0} + Σ_{n_i2−μ < n_j1 < n_i2} k_{2,1}(n_i2 − n_j1) + Σ_{n_i2−μ < n_j2 < n_i2} k_{2,2}(n_i2 − n_j2)    (8.98)

where k_{1,0} and k_{2,0} now denote the reduced P-V first-order kernels, k_{1,1} and k_{2,2} denote the
reduced P-V second-order self-kernels, and k_{1,2} and k_{2,1} are the two components of the re-
duced P-V second-order cross-kernel in the two-input case (note that the summation in-
dex accounts only for "other input" effects). The reduced first-order kernel is a constant
value and represents the average PS amplitude for each pathway under dual stimulation.
The reduced second-order self-kernel represents the nonlinear interaction between the pre-
sent stimulus impulse and every past stimulus impulse in each pathway within the memo-
ry epoch. Each component of the reduced second-order cross-kernel shows the effect of
one input pathway on the response generated by the other input pathway.
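A sketch of Equations (8.96)-(8.98) in code follows; the kernel shapes, amplitudes, memory extent μ, and event times are hypothetical, chosen only to echo the qualitative phases described in this subsection (fast MPP self-facilitation followed by slow inhibition, and a facilitatory cross effect):

```python
# Sketch of the two-input reduced P-V model, Eqs. (8.96)-(8.98): the PS
# amplitude synchronized with each event of one pathway combines a constant
# (first-order kernel), a self-kernel sum over that pathway's past events,
# and a cross-kernel sum over the other pathway's past events.
import numpy as np

def ps_amplitude(ni, own_past, other_past, k0, k_self, k_cross, mu):
    """k0: reduced first-order kernel; k_self/k_cross: callables; mu: memory (bins)."""
    y = k0
    y += sum(k_self(ni - nj) for nj in own_past if 0 < ni - nj < mu)
    y += sum(k_cross(ni - nj) for nj in other_past if 0 < ni - nj < mu)
    return y

# Hypothetical kernels (illustrative shapes, not estimates):
k_self_mpp  = lambda d: 40 * np.exp(-d / 40) - 10 * np.exp(-d / 400)  # fast facilitation, slow inhibition
k_cross_mpp = lambda d: 25 * np.exp(-d / 60)                          # facilitatory cross effect

mpp_events = [0, 50, 120]
lpp_events = [30, 300]
y1 = ps_amplitude(120, [0, 50], lpp_events, k0=106,
                  k_self=k_self_mpp, k_cross=k_cross_mpp, mu=800)
print(round(y1, 2))
```

An MPP event with no prior events inside the memory epoch would simply receive the first-order value k_{1,0}.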
The reduced P-V kernels were estimated using the Laguerre expansion technique for
L = 4, α = 0.994. The values of the first-order kernel were 106 μV for the MPP and 217
μV for the LPP in this example. The obtained second-order kernels are shown in Figure
8.30. The second-order self-kernel for the LPP exhibits only a slow inhibitory phase up
to about 600 ms. The second-order self-kernel for the MPP is characterized by a fast fa-
cilitatory phase up to 80 ms, followed by a smaller but longer inhibitory phase lasting
up to about 800 ms. The second-order cross-kernel for the LPP (representing the effects
of MPP on the LPP-synchronized output) is characterized by a fast inhibitory phase
(0-100 ms), followed by a small and relatively short (100-400 ms) facilitatory phase.
The second-order cross-kernel for the MPP is characterized by a fast facilitatory phase
up to 150 ms, followed by a small and short inhibitory phase up to 250 ms, and then a
small but longer facilitatory phase up to about 800 msec. The NMSE of the second-or-
der model prediction with only self-kernels is 16.05%, and the inclusion of the cross-
kernels drops it to 7.03%, as illustrated in Figure 8.31 [Courellis et al., 2000; Dimoka et
al., 2003].

Nonlinear Modeling of Synapfic Dynamies. As an example of nonlinear modeling


of synaptic dynamics, we consider the case of short-term plasticity (STP) in the hippocampus, whereby presynaptic action potentials in a Schaffer collateral elicit excitatory postsynaptic currents (EPSCs) at CA1 pyramidal cell synapses in a manner that depends on recent activity. A Poisson sequence of impulses (mean rate of 2 impulses/sec) was applied to the Schaffer collateral to stimulate orthodromically the synapses at voltage-clamped CA1 pyramidal cells. Due to the voltage clamp, the deconvolved EPSCs recorded in the soma of the pyramidal cell were used as the output measurement of presynaptic neurotransmitter release [Song et al., 2002].

Figure 8.30 Second-order self-kernels (top) and cross-kernels (bottom) when stimulating at the MPP (right) and the LPP (left) simultaneously (dual-site stimulation).
A third-order reduced Poisson-Volterra (P-V) model was estimated from the data, using 1000 input-output events and the Laguerre expansion technique. The resulting P-V kernel estimates for this nonparametric model are shown in Figure 8.32 and yield excellent prediction accuracy (NMSE = 2.4%). The reduced second-order P-V kernel exhibits early facilitation (up to ~100 ms) and late depression (between ~100 ms and ~1500 ms). The reduced third-order P-V kernel exhibits early depression (up to ~100 ms) and slight late facilitation (between ~100 ms and ~500 ms). These reduced P-V kernel estimates are compared in the same figure with their counterparts obtained from a widely accepted parametric model of STP for this synapse [Song et al., 2002], which is described briefly below.
The simulated parametric model of STP was proposed in Dittman et al. (2000) and is described by the following equations [Song et al., 2002]:

Figure 8.31 A: The actual response (amplitude of the population spikes) of the two-input system. B: The model-predicted response based on the estimated self-kernels. C: The model-predicted response upon inclusion of the estimated cross-kernels [Courellis et al., 2000].

Figure 8.32 The reduced P-V kernel estimates for third-order models estimated from Poisson random train data (EK) and optimized parametric model (PK) [Song et al., 2002].

EPSC(t) = α · N_T · F(t) · D(t)    (8.99)

∂CaX_F/∂t = −CaX_F(t)/τ_F + δ(t − t_0)    (8.100)

∂CaX_D/∂t = −CaX_D(t)/τ_D + δ(t − t_0)    (8.101)

F(t) = F_1 + (1 − F_1)/(1 + K_F/CaX_F(t))    (8.102)

K_F = (1 − F_1)/(ρF_1 − F_1) − 1    (8.103)

∂D/∂t = (1 − D(t)) · k_recov(CaX_D) − D(t_0) · F(t_0) · δ(t − t_0)    (8.104)

k_recov(CaX_D) = (k_max − k_0)/(1 + K_D/CaX_D(t)) + k_0    (8.105)

In this parametric model, the EPSC peaks are modeled as the product of the unitary EPSC (α), the total number of release sites (N_T), the facilitation factor (F), and the depression factor (D). F and D are calculated from the concentrations of two calcium-bound molecules, CaX_F and CaX_D, which are both driven by residual calcium. Model parameters are:

the initial release probability F_1; the maximum paired-pulse facilitation ratio ρ; the decay time constants τ_F and τ_D for CaX_F and CaX_D, respectively; the minimum and maximum recovery rates k_0 and k_max; and the affinities of CaX_F and CaX_D for the release sites, K_F and K_D, respectively. Note that K_F and ρ are equivalent. The model parameters (ρ, F_1, k_0, k_max, and K_D) were estimated from experimental data using the quasi-Newton optimization method, yielding the model parameter values ρ = 1.1, F_1 = 0.32, k_0 = 1.4 sec⁻¹, k_max = 15.2 sec⁻¹, K_D = 1.9, τ_F = 100 msec, τ_D = 50 msec, and (α · N_T) = 0.54 nA.
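To make the event-driven dynamics concrete, here is a sketch (not the authors' code) that simulates Eqs. (8.99)-(8.105) per spike: CaX_F and CaX_D decay exponentially between spikes and are incremented by one at each spike, D recovers between spikes at the CaX_D-dependent rate, and release F·D depletes D. The unit calcium increment and the per-millisecond rate conversion are our assumptions.

```python
import numpy as np

def simulate_stp(spike_times_ms, rho=1.1, F1=0.32, k0=1.4e-3, kmax=15.2e-3,
                 KD=1.9, tauF=100.0, tauD=50.0, aNT=0.54):
    """Event-driven sketch of the Dittman-type STP model (Eqs. 8.99-8.105).

    Times in ms; k0, kmax converted from sec^-1 to ms^-1. Returns the
    sequence of EPSC peak amplitudes (nA) at the given spike times.
    """
    KF = (1.0 - F1) / (rho * F1 - F1) - 1.0          # Eq. (8.103)
    CaXF = CaXD = 0.0
    D = 1.0
    t_prev = None
    epscs = []
    for t in spike_times_ms:
        if t_prev is not None:
            dt = t - t_prev
            CaXF *= np.exp(-dt / tauF)               # Eq. (8.100)
            CaXD *= np.exp(-dt / tauD)               # Eq. (8.101)
            # recovery of D at rate k_recov(CaXD), Eqs. (8.104)-(8.105),
            # holding CaXD fixed over the interval as an approximation
            krec = (kmax - k0) / (1.0 + KD / CaXD) + k0 if CaXD > 0 else k0
            D = 1.0 - (1.0 - D) * np.exp(-krec * dt)
        # facilitation driven by residual CaXF before this spike's increment
        F = F1 if CaXF <= 0 else F1 + (1.0 - F1) / (1.0 + KF / CaXF)  # (8.102)
        epscs.append(aNT * F * D)                    # Eq. (8.99)
        D *= 1.0 - F                                 # release depletes D (8.104)
        CaXF += 1.0
        CaXD += 1.0
        t_prev = t
    return np.array(epscs)
```

With two spikes separated by a very short interval, the second release probability is F_2 = ρF_1 but the EPSC is scaled by the depleted D ≈ 1 − F_1.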
The predictive performance of the parametric model is a little worse (NMSE = 3.5%) and its equivalent P-V kernels are slightly different from their counterparts of the nonparametric model (see Fig. 8.32). The main difference is in the early facilitation of the reduced second-order P-V kernel (representing two-pulse interactions), where the parametric model exhibits almost double the actual values measured by the nonparametric model (which is inductive and true to the data).
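The NMSE figures quoted throughout (e.g., 2.4% vs. 3.5%) follow the usual idea of residual power normalized by output power; one common form (our implementation, a sketch rather than the book's exact definition):

```python
import numpy as np

def nmse(y, y_pred):
    """Normalized mean-square error of a model prediction: sum of squared
    residuals divided by the sum of squared actual output values.
    """
    y = np.asarray(y, float)
    y_pred = np.asarray(y_pred, float)
    return float(np.sum((y - y_pred) ** 2) / np.sum(y ** 2))
```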
At this juncture, it is instructive to clarify the relation between the P-V kernels and the widely used "paired-pulse response" curves (obtained by successive experiments with two impulses of variable time separation). If Y(τ_1, τ_2) denotes the output predicted by the third-order P-V model with two input impulses preceding the present output by τ_1 and τ_2 sec, respectively, then

Y(τ_1, τ_2) = k_1 + r_2(τ_1) + r_2(τ_2) + 2k_3(τ_1, τ_2)    (8.106)

where

r_2(τ) = k_2(τ) + k_3(τ, τ)    (8.107)

is the paired-pulse response for a third-order system. Note that k_1 is equal to F_1, the initial release probability defined in the parametric model, and the maximum facilitation ratio ρ is the sum of the peak value of r_2 and 1.
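A small sketch of Eqs. (8.106)-(8.107), assembling the paired-pulse prediction from reduced kernels sampled on a common lag grid (the array shapes are our convention):

```python
import numpy as np

def paired_pulse(k1, k2, k3):
    """Two-pulse prediction from reduced P-V kernels (Eqs. 8.106-8.107).

    k1: scalar; k2: (M,) reduced second-order kernel; k3: (M, M) symmetric
    reduced third-order kernel. Returns r2 and the matrix Y[t1, t2].
    """
    k2 = np.asarray(k2, float)
    k3 = np.asarray(k3, float)
    r2 = k2 + np.diagonal(k3)                 # r2(tau) = k2(tau) + k3(tau, tau)
    Y = k1 + r2[:, None] + r2[None, :] + 2.0 * k3
    return r2, Y
```

The single-pulse curve r_2 is what a paired-pulse experiment traces out; the cross term 2k_3(τ_1, τ_2) captures the genuinely triple (present plus two past impulses) interaction.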

8.4 MODELING OF NEURONAL ENSEMBLES

The modeling methodology presented in the previous section can be applied to any level
of complexity (system integration) from the single neuron to any number of interconnect-
ed neurons (ensemble). In the latter case, a critical issue in developing an "external" mod-
el (or input-output nonparametric mapping) is the selection of the input and output points
in the histological configuration of the ensemble. The input points can be either locations
of external stimulation or the natural activity of a neuron in the ensemble. The output
points are selected locations where the induced activity (response to stimulation) is moni-
tored for the purpose of determining its causal link with the stimulating input point(s).
The input and output points can be intracellular or extracellular. The latter case includes
field recordings that reflect the collective contributions of a population of neurons in the
neighborhood of the electrode (a useful measurement in some cases, discussed in Section
8.3.3).
Adoption of the input-output approach facilitates the discovery of causal links be-
tween points of interest in the neuronal ensemble and yields quantitative descriptions
(models) of these causal relationships. These models can be used for prediction purposes
or to advance our understanding of the functional characteristics of the neuronal ensemble. We can also manipulate pharmacologically the experimental preparation in order to separate specific neurophysiological mechanisms, or we can redefine our input-output points to explore different functional aspects of the neuronal ensemble.
However, these input-output models do not probe specific aspects of the internal workings of the ensemble and do not reveal directly detailed information on the individual components. To achieve this more specific information, we need to develop "structural" or modular models of the neuronal ensemble based on histological knowledge of the neuronal structural interconnectivity, or we can use functional modules defined by the "integrated submodels" of single neurons. In both cases, it behooves us to consider the single neuron as the basic functional unit that maps presynaptic inputs onto the sequence of action potentials at the axon hillock (see Sec. 8.2). It is evident that the definition of this "basic functional unit" calls for an integrated, external model of the single neuron such as the Volterra-type (or PDM) models advocated in this book. It is also clear that by defining the output as the sequence of action potentials at the axon hillock, we also imply that the "presynaptic inputs" should be represented by the outputs of presynaptic neurons (i.e., their sequences of action potentials at their axon hillocks) and not by the numerous presynaptic events/pulses arriving at the various axonal terminals of each presynaptic neuron.
Generally, for an ensemble of M possibly interconnected neurons, we may consider a multiinput Volterra model with a threshold trigger to represent the activity of each neuron in terms of the activity of its cohorts. Thus, for the activity of the mth neuron:

x_m(t) = T{ k_0^(m) + Σ_{i≠m} ∫_0^∞ k_i^(m)(τ) x_i(t − τ) dτ
    + Σ_{i_1≠m} Σ_{i_2≠m} ∫_0^∞ ∫_0^∞ k_{i_1,i_2}^(m)(τ_1, τ_2) x_{i_1}(t − τ_1) x_{i_2}(t − τ_2) dτ_1 dτ_2 + ... }    (8.108)

where T denotes a threshold-trigger operator with refractoriness, and k_{i_1, ..., i_n}^(m) denotes the nth-order cross-kernel for the mth neuron describing the interactions among the neuronal activities (inputs) x_{i_1}, ..., x_{i_n} as they affect the activity x_m of the mth neuron.
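In discrete time and truncated at second order, Eq. (8.108) can be sketched as follows; the hard threshold with an absolute refractory period is a simple stand-in for the threshold-trigger operator T, and all names are illustrative:

```python
import numpy as np

def simulate_neuron_m(spikes_in, k0, k1, k2, theta, refractory):
    """Discrete second-order version of Eq. (8.108) for one neuron m.

    spikes_in: dict {i: 0/1 spike array} for the other neurons i != m.
    k1: dict {i: (M,) first-order kernel}; k2: dict {(i1, i2): (M, M) kernel}.
    Returns the 0/1 output spike train of neuron m.
    """
    N = len(next(iter(spikes_in.values())))
    M = len(next(iter(k1.values())))
    out = np.zeros(N, dtype=int)
    last_fire = -10 ** 9
    for n in range(N):
        u = k0                                          # "intracellular potential"
        for i, x in spikes_in.items():
            seg = x[max(0, n - M + 1):n + 1][::-1]      # x_i(n - tau), tau >= 0
            u += k1[i][:len(seg)] @ seg
        for (i1, i2), K in k2.items():
            s1 = spikes_in[i1][max(0, n - M + 1):n + 1][::-1]
            s2 = spikes_in[i2][max(0, n - M + 1):n + 1][::-1]
            u += s1 @ K[:len(s1), :len(s2)] @ s2
        if u > theta and n - last_fire > refractory:    # operator T
            out[n] = 1
            last_fire = n
    return out
```

The refractory test is the simplest possible form of the refractoriness the text requires; stochastic thresholds (as in the Brillinger formulation discussed later) would replace the hard comparison with a probabilistic one.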
The Volterra model, which defines the operand of T, can be viewed as generating an "intracellular potential" at the hillock of the mth neuron in response to the activities of all other neurons, which is subsequently converted to an action potential if it exceeds the threshold. The latter must have a refractory period and may have some stochastic characteristics (as in the Brillinger formulation of the problem [Brillinger, 1988]). Obviously, this Volterra model may take the modified form of Eq. (2.180) after kernel expansion or an
equivalent modular PDM form with a "trigger region" (see Fig. 8.1). The latter can be
also represented by an equivalent network form to achieve greater efficiencies in model
compactness with benefits in estimation and prediction.
This model can be viewed as a generalization of the simple "integrate-and-fire" model that takes into consideration the combined effects of multiple (possibly interconnected) neuronal units in a nonlinear dynamic context, where intermodulatory effects are accounted for (i.e., the nonlinearity is more than the operation of a threshold-trigger generating the action potential).
We can also define the basic functional unit of a single neuron by means of a PDM
model with a "trigger region" termed the "neuronal modes," as discussed in the previous section. The neuronal modes (NMs) incorporate all the dynamics of voltage-dependent and ligand-dependent channels implicated in the generation and processing of information through the neuronal unit (in the form of transmembrane voltage and currents) in-
cluding propagation effects throughout the neuronal structure. It is critical to emphasize
that the inputs of this "neuronal unit" are defined as the sequences of action potentials at
the axon hillocks of the presynaptic neurons, and the output is defined as the elicited se-
quence of action potentials at the axon hillock of the subject neuron. The latter may
branch out when it becomes the presynaptic input to other neuronal units (fan-out of axonal terminals), defining the possibly divergent "down-stream" connectivity of the subject neuron; however, the integrated dynamics of this process of axonal divergence and
postsynaptic connectivity are incorporated in the NMs of the postsynaptic neuronal units.
This is illustrated in Figure 8.33, where the NM models of several neurons (neuronal units
as previously defined) are shown to form a neuronal ensemble of intricate interconnectiv-
ity.
The emerging properties of such neuronal ensembles depend both on the characteris-
tics of the individual NMs and TRs as well as on the particular interconnectivity among
neuronal units. This subject matter deserves more elaboration than allowed by the space
limitations and overall economy of this book. Inevitably, it will have to await more elabo-
rate treatment in future publications. For now, it is simply presented as a plausible gener-
al framework for the detailed functional study of "internalized models" of neuronal en-
sembles.
A more practicable alternative is offered by "external" or "black-box" models of neu-
ronal ensembles defined by the selected input and output points. This approach is consis-
tent with the general nonparametric methodology that constitutes the foundation of the
Volterra-Wiener approach (with all its strengths and weaknesses discussed in Chapters 1
and 2). Although this input-output modeling approach is the origin of the advocated
methodology, the latter has also made significant strides in "internalizing" the model (i.e.,
opening the black box), primarily through the use of equivalent modular models (such as
the PDM or NM models) or equivalent parametric models (in the form of differential or
difference equations) whenever appropriate. This remains our guiding principle in the
gradual process of converting inductive models (i.e., nonparametric models true to the in-
put-output data) into interpretable model forms that reflect more closely the internal func-
tional organization of the subject system and elucidate detailed aspects of scientific or
clinical interest. This gradual process can be assisted by the use of PDM analysis or its
counterpart for neuronal systems (NM analysis).
An interesting method utilizing maximum-likelihood estimation was used by Brillinger to analyze multiunit data from Aplysia californica provided by Segundo and Bryant [Brillinger et al., 1976]. The spontaneous activity of three abdominal ganglia was monitored and the spike-train data were analyzed to examine possible causal links among them. The mathematical formalism follows the Volterra approach (up to second order) to define an "intracellular" potential at each neuron based on the activities of the other two neurons, and an action potential is generated if the intracellular potential exceeds a certain soft threshold that is defined by a normal cumulative [Brillinger, 1988]. The inflection point of the soft threshold is allowed to vary parabolically with the time elapsed from the last neuron firing (in an attempt to capture the refractoriness of the neuron) and is also allowed to exhibit some limited Gaussian variation (stochastic threshold). The likelihood function is defined by a Bernoulli-type binomial distribution (on the assumption of statistical independence of the successive values of the probability of firing) and is maximized
Figure 8.33 Interconnectivity in a neuronal ensemble of "neuronal units" (see text).

with respect to the unknown parameters (which are akin to kernel functions or related to the thresholding operation). The results are interesting but not compelling, because of various assumptions in the model structure that appear somewhat arbitrary (including the postulated statistical independence of successive values of the probability of firing or the assumed parabolic dependence of refractoriness).
We must also note the pioneering work of Moore, Segundo, and Perkel on cross-correlation analysis of neuronal firing patterns to determine causality in neuronal ensembles [Moore et al., 1966; Segundo et al., 1968]. This work retains some descriptive relevance to our scientific objectives until the present time, but it does not yield predictive models or deal with the intrinsic nonlinearities of the system. Last, but not least, we must point out that the presence of closed-loop connectivity raises a host of critical modeling issues that are addressed in Chapter 10.
9
Modeling of
Nonstationary Systems

The importance of nonstationarities in physiological function is widely recognized (e.g.,


time-varying characteristics of the cardiovascular or metabolic-endocrine systems), al-
though no general methodology has been accepted as broadly effective under the con-
straints imposed by physiological data. Some of the confounding factors are the diversity
in the form of nonstationarities encountered in physiology, the intertwining of nonstationarities with nonlinearities, and the ubiquitous presence of noise that obscures observation
and complicates the estimation task.
The first task in a practical context is to examine whether the subject system exhibits
significant nonstationarities under the operating conditions of interest. We must stress the
importance of examining the system behavior under meaningful operating conditions in
order to avoid serious misconceptions arising from contrived experimental conditions
(which often tend to assume a level of credibility in the literature beyond what is warrant-
ed by the artificially imposed experimental constraints).
The practical assessment of nonstationarity can be made either by means of preliminary testing or by comparing the relative performance of stationary and nonstationary
models on the same input-output data (in terms of prediction capability). In the former
case, the same experiment is repeated at different times and the obtained stationary mod-
els are compared for consistency. Obviously, significant differences between stationary
models obtained at different times imply either the presence of nonstationarities or poor
estimation accuracy (an issue that must be resolved by improving the estimation accuracy
and reevaluating the results). This approach can be experimentally laborious but can be
simplified somewhat by using specialized inputs. The comparison of the predictive per-
formance of stationary and nonstationary models on the same input-output data requires
the estimation of nonstationary models and, therefore, is subject to all the burdens and po-
tential pitfalls of this task. However, it is less laborious experimentally and becomes the
more attractive (or the only feasible) option in cases where the experimental constraints
on the duration of data collection are severe.

Nonlinear Dynamic Modeling of Physiological Systems. By Vasilis Z. Marmarelis
ISBN 0-471-46960-2 © 2004 by the Institute of Electrical and Electronics Engineers.

If a controlled input can be used, then an efficient and rigorous test of nonstationarity
can be performed as follows. A stationary broadband (preferably quasiwhite) random in-
put is applied to the system and the recorded output is analyzed to determine whether it is
a stationary random process or not. Barring the presence of significant output contaminat-
ing noise that happens to be nonstationary, the stationarity (or nonstationarity) of the out-
put signal provides an unambiguous answer to the question of system stationarity, be-
cause the latter reflects inevitably on the former. A practical and rigorous method for
establishing the stationarity (or nonstationarity) of the output signal has been proposed by
the author [Marmarelis, 1981b, 1983] and is outlined in Section 9.2 in connection with the
kernel expansion methodology of nonstationary modeling of Volterra systems.
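The formal test cited above is beyond this sketch, but a crude screen in the same spirit (our construction, not the method of Marmarelis, 1981b, 1983) is to segment the output record and check whether segment means stay within sampling error of the overall mean:

```python
import numpy as np

def nonstationarity_screen(y, n_seg=8, z_thresh=3.0):
    """Flag a record as (possibly) nonstationary if any segment mean deviates
    from the overall mean by more than z_thresh standard errors.
    """
    y = np.asarray(y, float)
    segs = np.array_split(y, n_seg)
    z = np.array([
        (s.mean() - y.mean()) / max(s.std(ddof=1) / np.sqrt(len(s)), 1e-12)
        for s in segs
    ])
    return bool(np.any(np.abs(z) > z_thresh)), z
```

A trending output fails the screen immediately, while a stationary quasiwhite output passes; variance or spectral changes would need analogous segment comparisons.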
In this chapter, we review existing methodologies for the purpose of nonstationary
modeling from input-output data and emphasize the ones that we consider most promis-
ing in terms of efficacy and practicability. We begin in Section 9.1 with the most com-
monly used approach, which entails the tracking of nonstationary changes through time
by applying either recursive algorithms or piecewise stationary methods over a sliding
window of data (quasistationary methods). In Section 9.2, we discuss the nonstationary
modeling methodology based on kernel expansions that was pioneered by the author in order to provide a general methodology for modeling nonstationary Volterra-Wiener systems with broadband inputs. In Section 9.3, the nonstationary modeling of nonlinear dynamic systems is cast in the context of Volterra-equivalent time-varying networks. Final-
ly, in Section 9.4, we present some illustrative examples from applications to actual
physiological systems.
The presented methods offer effective means for nonstationary modeling of certain
classes of systems. Specifically, the classes covered are systems with slow nonstationari-
ties (relative to the system dynamics) of nearly arbitrary form, or systems with fast non-
stationarities of restricted form (i.e., cyclical or transient). The strengths and limitations of
each case are discussed below.

9.1 QUASISTATIONARY AND RECURSIVE TRACKING METHODS

The simplest and most commonly used approach to the modeling of nonstationary sys-
tems is the use of piecewise stationary modeling methods within a limited time window
of the input-output data and the successive repetition of this task as the time window
slides through time. Typically, substantial overlap is allowed between successive time
windows, so that the evolution of changes in the obtained sequence of piecewise station-
ary models can be tracked reliably through time.
This quasistationary approach requires the proper selection of the length of the sliding
time window so that sufficient data exist for obtaining reliable stationary input-output
models at each position of the time window, without exceeding in length the time interval
over which significant nonstationary changes do not occur in the system. Therefore, a
critical trade-off must be exercised between the speed of nonstationary change and the
amount of required input-output data for reliable (stationary) model estimation. For this
reason, the quasistationary approach is effective when the speed of nonstationary change
is low relative to the system dynamics (i.e., its bandwidth), so that a time window of ade-
quate length can be secured for (stationary) model estimation at each time position ofthe
sliding window. The amount of overlap between successive windows is another parame-
ter that can be used to finesse the strain ofthese competing requirements.
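For a linear (first-order) kernel, the sliding-window scheme just described reduces to repeated least-squares fits; a sketch (window and step sizes are illustrative choices, not values from the text):

```python
import numpy as np

def sliding_window_kernels(x, y, M, win, step):
    """Quasistationary tracking: fit an M-lag first-order kernel by least
    squares in each (overlapping) window; returns one kernel per window.
    """
    x = np.asarray(x, float)
    y = np.asarray(y, float)
    # lagged-input design matrix: column j holds x(n - j)
    X = np.column_stack([np.concatenate([np.zeros(j), x[:len(x) - j]])
                         for j in range(M)])
    kernels = []
    for start in range(0, len(x) - win + 1, step):
        sl = slice(start, start + win)
        k, *_ = np.linalg.lstsq(X[sl], y[sl], rcond=None)
        kernels.append(k)
    return np.array(kernels)
```

Overlap (step < win) smooths the tracked sequence; the window must be short relative to the nonstationarity but long enough for a reliable fit, exactly the trade-off discussed above.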

On the positive side, the quasistationary approach can be applied to nonstationarities


of nearly arbitrary form (i.e., they need not assume specific patterns, like cyclical, tran-
sient, etc.) including cases of chaotic or stochastic variations of the system characteristics.
In addition, this approach does not require any additional methodological burden or eso-
teric mathematical formulations that tend to be discouraging to many biomedical investi-
gators. An illustrative example of this approach is given in Section 9.4 for the case of
cerebral autoregulation.
Therefore, the quasistationary approach is very attractive as a first option (provided
that the nonstationarity is slow relative to the system dynamics) because many physiolog-
ical systems exhibit nonstationarities with no particular temporal structure (e.g., the inter-
modulatory effects of endocrine secretions or intermittent activity of the autonomic nervous system on the cardiovascular system, subject to stochastic external stimulation). One
drawback of this approach is that interpretation of the observed nonstationary behavior re-
quires separate analysis, because the nonstationarity is not modeled directly (unlike the
cases of kernel expansion and network based approaches that model directly the form of
the nonstationarity). Ultimately, the validity and the utility of this approach must be
judged on the basis of its performance in each specific application (like all other ap-
proaches).
To improve the estimation accuracy of this quasistationary approach, parametric re-
cursive methods have been introduced to track the changes of the system in a continu-
ous fashion. One such method is outlined in Section 3.1 in connection with adaptive es-
timation of parametric models through the recursive least-squares (RLS) algorithm
[Goodwin & Sin, 1984]. This algorithm can be applied to the parameterized form of the
general Volterra model of Eq. (2.180) (termed the "modified Volterra model") after preprocessing of the input through a selected filter bank, which is equivalent to the kernel
expansion on the filter-bank basis of functions. The filter banks remain time-invariant
and simply the coefficients of the modified Volterra model (which can be viewed as the
kernel expansion coefficients) are updated through time via the RLS algorithm (recall
that the coefficients enter linearly in the model, although the latter is nonlinear in terms
of the input-output relation). The constraint of slow nonstationarity (relative to the sys-
tem dynamics) still applies in this case, but the estimation accuracy of the successive
stationary models generally improves with RLS estimation. An example of this ap-
proach is given by Michael Khoo and his associates in the modeling study of respirato-
ry sinus arrhythmia [Blasi et al., 2003].
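The RLS update itself is compact; a generic sketch with forgetting factor λ (the regressor vector φ(n) would hold the filter-bank products of the modified Volterra model; class and variable names are ours):

```python
import numpy as np

class RLSTracker:
    """Recursive least squares with forgetting factor lam: tracks slowly
    varying coefficients of a model that is linear in its parameters.
    """
    def __init__(self, p, lam=0.99, delta=100.0):
        self.w = np.zeros(p)           # coefficient estimates
        self.P = delta * np.eye(p)     # inverse-correlation matrix
        self.lam = lam

    def update(self, phi, y):
        phi = np.asarray(phi, float)
        Pphi = self.P @ phi
        g = Pphi / (self.lam + phi @ Pphi)     # gain vector
        e = y - self.w @ phi                   # a priori prediction error
        self.w = self.w + g * e
        self.P = (self.P - np.outer(g, Pphi)) / self.lam
        return e
```

Values of λ slightly below one discount old data exponentially, which is what lets the estimate follow slow nonstationarities; λ = 1 recovers ordinary growing-memory least squares.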

9.2 KERNEL EXPANSION METHOD

This is the general approach to the problem of modeling, instead of simply tracking (as in the previous section), the nonstationarities of Volterra systems that are represented by explicit time variation of the Volterra kernels [Marmarelis, 1979b, c, 1980a, c, 1981a]. The approach, termed the "temporal expansion method" (TEM), employs a judiciously selected basis of temporal functions {β_m(t)}, defined over the input-output data record [0, R], to represent the rth-order time-varying Volterra kernel as an expansion:

k_r(t; τ_1, ..., τ_r) = Σ_{m=1}^{M} a_m^(r)(τ_1, ..., τ_r) β_m(t)    (9.1)

where {a_m^(r)} can be viewed as time-invariant kernel components associated with the M functions of the temporal basis for each order of kernel. This expansion is valid for all kernels that are square integrable over time, i.e., satisfy the condition

lim_{R→∞} (1/R) ∫_0^R k_r^2(t; τ_1, ..., τ_r) dt < ∞    (9.2)

for all (τ_1, ..., τ_r). Note that, even if condition (9.2) is not satisfied, an approximation of the expansion (9.1) can be obtained in practice. This type of kernel expansion can be applied also to the time-varying Wiener kernels of the system:

h_r(t; τ_1, ..., τ_r) = Σ_{m=1}^{M} γ_m^(r)(τ_1, ..., τ_r) β_m(t)    (9.3)

provided that the Wiener kernels also satisfy the square-integrability condition (9.2). Recall that the Volterra or Wiener kernels must also satisfy integrability conditions with respect to (τ_1, ..., τ_r) in order to represent stable systems. Thus, for stable physiological systems, these integrability conditions imply bounded values of the kernels.
The reader must also be alerted to the fact that the manner by which the time-varying kernels are represented in Equations (9.1) and (9.3) is different from the mathematical formalism employed in control system theory. In the latter, following the pioneering work of Zadeh, the time-varying representation of the impulse response function h(t, τ') (in a linear system context) is defined by a convolution input-output relation where the variable τ' is different from our lag variable τ in that it integrates the input values in reverse order from −∞ to t. This distinction in mathematical formulation appears trivial but has profound implications in terms of the separability condition required by the kernel expansion. Therefore, it must be clarified that our formulation follows the established Volterra series formulation (with regard to the definition of the lag variables) and simply allows the kernels to change with time t. This is consistent with the pioneering work of Flake on time-varying Volterra models [Flake, 1963b].
The key result that was obtained by the author in 1979 (but has been largely overlooked by the peer community and periodically "reinvented" in various specialized forms, e.g., cyclostationary analysis) is that the expansion kernel components in Equations (9.1) and (9.3) can be estimated in practice by use of a single input-output data record, provided the appropriate expansion basis {β_m(t)} is selected.
The judicious selection of the temporal expansion basis {β_m(t)} is critical for the efficacy of this methodology in a practical context because it determines the number of required expansion components. Obviously, the approach is more efficacious when a small number of expansion components is sufficient for satisfactory representation of the nonstationary dynamics of the system. Therefore, the selection of the appropriate temporal basis depends on the nonstationary characteristics of the specific system under study. For instance, a system with periodic (or cyclical) nonstationarities may be modeled efficiently by use of a few components of the Fourier basis, or a system with transient nonstationarities may be modeled efficiently by use of polynomial (e.g., Legendre) or polynomial-exponential (e.g., Laguerre or Gamma) basis functions.
Having selected a temporal expansion basis, the estimation of the kernel expansion components can be accomplished either via crosscorrelation (if the input is white or quasiwhite) or via least-squares fitting (if the input is arbitrary). The two cases are elaborated below.

For a GWN (or quasiwhite) input, the expansion components of the Wiener kernels can be estimated by "weighted" crosscorrelation of the input-output data for a single data record as

γ_m^(r)(τ_1, ..., τ_r) = (1/(r! P^r R)) ∫_0^R β_m(t) y_r(t) x(t − τ_1) ... x(t − τ_r) dt    (9.4)

where the temporal expansion basis β_m(t) constitutes the "weighting function" that can be viewed as "modulating" the output data prior to crosscorrelation with the GWN or quasiwhite input x(t), P is the power level of the GWN or quasiwhite input, and y_r denotes the rth-order output residual (defined as in the stationary case by subtracting all previously estimated terms from the output signal).
It has been shown that the estimator of Equation (9.4) is unbiased and consistent (i.e., its variance tends to zero as R tends to infinity) for systems with "sustained" nonstationarities [Marmarelis, 1979b, c]. The latter are defined by the following condition:

0 < lim_{R→∞} (1/R) ∫_0^R h_r^2(t; τ_1, ..., τ_r) dt < ∞    (9.5)

which pertains also to the square-integrability condition discussed earlier for the existence of the temporal expansion (the right side of the inequality). The left side of the inequality simply states that the kernel must not vanish after a finite point in time (this must hold for at least one kernel of the system, so that the system operation does not "die out"). This "sustainability" condition is required for the statistical consistency of the estimator (9.4) but, in practice, it suffices to have a data record of adequate length to achieve satisfactory estimation variance. Therefore, all these mathematical conditions for R → ∞ need only be satisfied in practice for finite data records of sufficient length.
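For the first-order case (r = 1) in discrete time, the weighted crosscorrelation of Eq. (9.4) is a few lines; we assume unit input power (P = 1) and a temporal basis normalized so that the time average of β_m² is one (names are illustrative):

```python
import numpy as np

def weighted_xcorr(x, y1, beta_m, M, P=1.0):
    """Discrete first-order version of the estimator in Eq. (9.4): weight the
    first-order output residual y1 by beta_m(t), then crosscorrelate with
    the quasiwhite input x of power level P over M lags.
    """
    x = np.asarray(x, float)
    w = np.asarray(beta_m, float) * np.asarray(y1, float)
    N = len(x)
    g = np.empty(M)
    for tau in range(M):
        g[tau] = np.dot(w[tau:], x[:N - tau]) / (P * N)
    return g
```

On a simulated system y(t) = β(t) Σ k(τ)x(t−τ) with such a normalized β, the estimator recovers the expansion component a(τ) = k(τ) from a single record, illustrating the single-record result discussed above.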
The important result regarding estimator (9.4) is that its variance decreases with increasing R and, therefore, the kernel expansion components in Equation (9.3) can be estimated in practice using a single input-output data record, provided the input is GWN or quasiwhite. This result creates a host of exciting prospects in actual applications to physiological systems exhibiting nonstationarities with discernible structure that can be captured by the appropriate temporal expansion basis. However, in order to use the crosscorrelation-based estimator (9.4), the input must be GWN (or quasiwhite) and, thus, for arbitrary input waveforms, we must resort to the Volterra kernel-expansion formulation of the problem. By substituting the kernel expansion of Equation (9.1) into the time-varying Volterra model, we have the nonstationary input-output relation

y(t) = k_0(t) + ∫_0^∞ k_1(t; τ) x(t − τ) dτ + ∫_0^∞ ∫_0^∞ k_2(t; τ_1, τ_2) x(t − τ_1) x(t − τ_2) dτ_1 dτ_2
    + ... + ∫_0^∞ ... ∫_0^∞ k_r(t; τ_1, ..., τ_r) x(t − τ_1) ... x(t − τ_r) dτ_1 ... dτ_r + ...
    = Σ_m β_m(t) { a_m^(0) + ∫ a_m^(1)(τ) x(t − τ) dτ + ∫∫ a_m^(2)(τ_1, τ_2) x(t − τ_1) x(t − τ_2) dτ_1 dτ_2
    + ... + ∫ ... ∫ a_m^(r)(τ_1, ..., τ_r) x(t − τ_1) ... x(t − τ_r) dτ_1 ... dτ_r + ... }    (9.6)
472 MODELING OF NONSTATIONARY SYSTEMS

It is evident that each temporal expansion basis function β_m(t) is associated with a full sta-
tionary Volterra series comprising all of the mth expansion components of the time-vary-
ing Volterra kernels. In order to estimate practically all these expansion components of
the Volterra kernels for arbitrary input waveforms, we must resort again to yet another
kernel expansion over the lag variables (as in the stationary case of the modified Volterra
model). To this purpose, we use an expansion basis {b_j(τ)} over the lag variables that is
tailored to the dynamic characteristics of the system (e.g., the Laguerre basis) to further
expand the time-invariant kernel components as
a_r^{(m)}(\tau_1, \ldots, \tau_r) = \sum_{j_1=1}^{L} \cdots \sum_{j_r=1}^{L} c_r^{(m)}(j_1, \ldots, j_r)\, b_{j_1}(\tau_1) \cdots b_{j_r}(\tau_r)    (9.7)
Subsequently, the nonstationary Volterra model of Equation (9.6) takes the modified form

y(t) = \sum_{m=1}^{M} \beta_m(t) \Big\{ c_0^{(m)} + \sum_{j_1=1}^{L} c_1^{(m)}(j_1)\, v_{j_1}(t) + \sum_{j_1=1}^{L} \sum_{j_2=1}^{L} c_2^{(m)}(j_1, j_2)\, v_{j_1}(t)\, v_{j_2}(t)
 + \cdots + \sum_{j_1=1}^{L} \cdots \sum_{j_r=1}^{L} c_r^{(m)}(j_1, \ldots, j_r)\, v_{j_1}(t) \cdots v_{j_r}(t) + \cdots \Big\}    (9.8)

This modified Volterra model is linear in terms of the unknown expansion coefficients
{c_r^{(m)}}, which can be estimated by least-squares regression from the discretized data {x(n),
y(n)} (n = t/T), based on the output expression for the Qth-order system:
y(n) = \sum_{m=1}^{M} \sum_{r=0}^{Q} \sum_{j_1=1}^{L} \cdots \sum_{j_r=1}^{L} c_r^{(m)}(j_1, \ldots, j_r)\, \beta_m(n)\, v_{j_1}(n) \cdots v_{j_r}(n)    (9.9)

where n = 1, ..., N is the discrete-time index of the data, and c_0^{(m)} = a_0^{(m)} for uniformity of
notation. Note that v_j(n) is computed as the discrete-time convolution between b_j and x.
The number of unknown parameters in the model of Equation (9.9) is P = M(Q +
L)!/(Q!L!), and N must be sufficiently larger than P in order to achieve satisfactory esti-
mation variance. This key requirement places the onus for the successful implementation
of this approach on being able to find appropriate kernel expansion bases (over t and τ) so
that P ≪ N.
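To make the regression concrete, the sketch below (an illustration, not code from the book) builds the regressors β_m(n)v_j(n) for a first-order (Q = 1) example with hypothetical lag and temporal bases, fits the coefficients by least squares, and evaluates the parameter-count formula.

```python
import numpy as np
from math import comb

# Hedged sketch of the least-squares fit of Eq. (9.9), Q = 1. The exponential
# lag basis b_j and the two-function temporal basis beta_m are illustrative
# choices, not bases prescribed by the text.
rng = np.random.default_rng(0)
N, L, M = 2048, 3, 2
n = np.arange(N)
x = rng.standard_normal(N)                                   # quasiwhite input

tau = np.arange(32)
b = np.array([np.exp(-tau / (2.0 * (j + 1))) for j in range(L)])   # lag basis
beta = np.array([np.ones(N), np.cos(2 * np.pi * n / N)])           # temporal basis

# v_j(n): discrete convolution of each lag basis function with the input
v = np.array([np.convolve(x, bj)[:N] for bj in b])

# Simulated first-order time-varying system with known coefficients
c_true = np.array([[1.0, -0.5, 0.2], [0.3, 0.1, -0.4]])
y = sum(beta[m] * c_true[m, j] * v[j] for m in range(M) for j in range(L))

# Least-squares estimation: one regression column per (m, j) pair
X = np.column_stack([beta[m] * v[j] for m in range(M) for j in range(L)])
c_hat = np.linalg.lstsq(X, y, rcond=None)[0].reshape(M, L)
print(np.allclose(c_hat, c_true))            # exact recovery in the noise-free case

# Full model size for this (M, L) and Q = 1, including zero-order terms:
Q = 1
print(M * comb(Q + L, L))                    # P = M(Q+L)!/(Q!L!) = 8
```

With N = 2048 and P = 8, the requirement P ≪ N is comfortably satisfied here; for higher Q or larger L the column count grows combinatorially, which is exactly the pressure to minimize M discussed next.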
In order to meet this key requirement, it is usually necessary to perform preliminary
analysis that can guide the selection of the temporal expansion basis to minimize M. It is
evident that the use of a general (complete) basis, like the Fourier basis, cannot meet this
requirement because M = N in that case. If the minimization of M proves to be difficult,
then the use of the modified Volterra model of Equation (9.9) becomes problematic. In
that case, the crosscorrelation-based approach remains a viable option (if quasiwhite in-
puts are available), because it does not place any formal constraints on M relative to N,
although a much larger N is still desirable in order to achieve satisfactory estimation vari-
ance for the kernel expansion components of Equation (9.4). Nonetheless, it must be
emphasized that the estimator of Equation (9.4) can employ a general (com-
plete) temporal expansion basis and does not rely on a priori minimization of M.
In light of these observations, it is evident that a practical approach to the nonstation-
ary modeling problem may involve two steps: first, use the crosscorrelation-based
method (if quasiwhite inputs are available) to explore the structure of the system nonsta-
tionarities with a complete temporal expansion basis (e.g., Fourier); and second, minimize
M with appropriate selection of the system-specific functions {β_m(t)} that decompose ef-
ficiently the system nonstationary dynamics in the form of Equation (9.8), and estimate
the expansion coefficients of the time-varying Volterra kernels via least-squares regres-
sion of Equation (9.9).
A special note must be made for the case of using the complete Fourier basis {e^{jmω₀t}}
(m = 0, 1, ..., M = N/2; ω₀ = 2π/R) for the temporal expansion of the time-varying ker-
nels, because this represents an easily implementable and general method for exploring
the nonstationary characteristics of the system using the FFT. In fact, the resulting set of first-
order Wiener kernel expansion components {γ₁^{(m)}} in this case is termed "the half-spec-
trum" and has broad utility in connection with the nonstationary problem [Marmarelis,
1981a, b, 1983, 1987b, c; Sams & Marmarelis, 1988]. It is evident that the half-spectrum
components can be estimated via the discrete Fourier transform (or FFT) over time of the dis-
cretized product [y(t)x(t − τ)] for every τ within the system memory, properly scaled by
the input power level. In the estimation of the half-spectrum, special attention must be
given to the problem of "leakage" associated with the use of the discrete Fourier transform
(DFT) or FFT, for which proper corrections can be applied [Sams & Marmarelis, 1988].
Note that extrapolation of the estimated nonstationarity outside the interval of observation
[0, R] is not generally possible, but it may be possible in the case of periodic nonstationar-
ities.
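The half-spectrum computation just described can be sketched as follows; the simulated time-varying system, record lengths, and scaling details are illustrative assumptions, and no leakage correction is applied in this minimal version.

```python
import numpy as np

# Hedged sketch: half-spectrum estimate as the FFT over time t of the product
# y(t) x(t - tau), scaled by the input power level, for each lag tau within an
# assumed system memory. The demo system below is illustrative.
rng = np.random.default_rng(1)
N, memory = 4096, 32
x = rng.standard_normal(N)                       # unit-power quasiwhite input
P = np.var(x)

# Demo linear time-varying system with a periodic nonstationarity (period 512)
n = np.arange(N)[:, None]
tau = np.arange(memory)[None, :]
h = np.exp(-tau / 4.0) * (1.0 + 0.5 * np.cos(2 * np.pi * n / 512.0))
xlag = np.stack([np.concatenate((np.zeros(k), x[:N - k])) for k in range(memory)],
                axis=1)                          # xlag[n, k] = x(n - k)
y = np.sum(h * xlag, axis=1)

# Half-spectrum estimate: DFT over time of y(t) x(t - tau) / (N * P)
half_spec = np.fft.fft(y[:, None] * xlag, axis=0) / (N * P)

# The periodic nonstationarity should appear at FFT bin N/512 = 8 (and at DC)
mag = np.abs(half_spec[:, 0])                    # magnitude at lag tau = 0
print(mag[0], mag[8], np.median(mag))
```

All other bins carry only the background estimation "noise" floor, which is exactly what the statistical selection procedure described next must discriminate against.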
The selection of the "significant" components of the half-spectrum must be based on
statistical hypothesis testing that utilizes the following expression for the estimation vari-
ance of each component when the input is GWN (or quasiwhite):

\mathrm{Var}[\hat{\gamma}_1^{(m)}(\tau)] = \frac{1}{R^2} \int_0^R dt \int_0^{\mu} d\lambda \; h_1^2(t; \lambda) + \frac{1}{R} \int_0^R dt \int_{-R}^{R} d\lambda \left(1 - \frac{|\lambda|}{R}\right) h_1(t, \tau - \lambda)\, h_1(t + \lambda, \tau + \lambda)\, e^{-j2\pi m\lambda/R}    (9.10)

where μ denotes the system memory extent.

Since the variance of the half-spectrum component estimates depends on the first-order
time-varying Wiener kernel h₁(t, λ), we must proceed in practice sequentially by testing
first the largest (in peak absolute value) component estimate against a null hypothesis of
no significant components at all. If the null hypothesis is rejected for any τ value, then
this component is accepted as "significant" and we proceed with the next-largest compo-
nent estimate under a null hypothesis that incorporates the previously accepted "signifi-
cant" components for the evaluation of the confidence bounds [determined by the vari-
ance of Equation (9.10)]. This procedure terminates when the null hypothesis is accepted,
and allows selection of the significant components of the half-spectrum estimate that, in
turn, allows estimation of the first-order time-varying Wiener kernel of the nonstationary
system (see example in Section 9.2.1 below). Note that the half-spectrum is complex in
general.
It is evident that, for a stationary system, only the zero-order component, γ₁^{(0)}(τ), will
be selected. This fact suggests a rigorous test of stationarity for a given system, when a
GWN (or quasiwhite) input is available. When the input is arbitrary, a test for (non)sta-
tionarity is described in Section 9.2.2.
This statistical hypothesis testing procedure can also be applied to determine the non-
linear order of a (possibly) nonstationary system with GWN (or quasiwhite) input, utiliz-
ing the respective expression for the variance of the higher-order kernel components. The
nonlinear order is determined when the null hypothesis is accepted for the largest compo-
nent estimate of the Qth-order kernel, using confidence bounds determined by the respec-
tive variance:

\mathrm{Var}[\hat{\gamma}_Q^{(m)}(\tau_1, \ldots, \tau_Q)] = \frac{1}{R^2 P^{2Q}} \int_0^R \int_0^R dt\, dt' \; \beta_m(t)\, \beta_m(t')\, \phi_{\varepsilon}(t - t')\, \phi_{2Q}(\tau_1, \ldots, \tau_Q)    (9.11)

where φ_ε denotes the autocorrelation function of the residuals, φ_{2Q} denotes the 2Qth-
order autocorrelation function of the stationary input, and P denotes the input power level.

9.2.1 Illustrative Example


Computer-simulated examples (for which "ground truth" is available) are used to demon-
strate and validate the presented nonstationary modeling methodology. To simplify mat-
ters in the crosscorrelation-based approach, we choose initially a first-order (linear) sys-
tem that is described by the linear, periodically time-varying differential equation

\frac{dy}{dt} + (1 + \alpha \cos \beta t)\, y = x    (9.12)

and simulate it for a GWN input signal with unit power level [Sams & Marmarelis, 1988].
The exact time-varying kernel of this system is

h(t, \tau) = e^{-\tau - (2\alpha/\beta) \sin(\beta\tau/2) \cos(\beta t - \beta\tau/2)}\, u(\tau)    (9.13)

This kernel satisfies the integrability conditions for consistency of the kernel component
estimates. Note that this linear system has only this first-order kernel, which is not termed
an "impulse response function" because, for time-varying systems, it is not the response to
an impulse input [the latter being h(t, t)]. The periodic time-varying character of this sys-
tem makes the Fourier temporal expansion basis suitable. The Fourier expansion terms of
this kernel (over time) are nonzero at all multiples of the frequency β of the system non-
stationarity. However, only a few of these terms are expected to be distinguished from the
background "noise" level in the half-spectrum estimate, which is determined by the esti-
mation variance of Equation (9.10). The time increment (sampling interval) used is 0.2
sec, and the parameters of the model of Equation (9.12) are α = 2, β = 1.
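The system of Equation (9.12) and its exact kernel (9.13) can be cross-checked numerically. The sketch below is illustrative only: it uses a smooth deterministic test input and milder parameter values (α = 0.5, β = 1) rather than the GWN input and α = 2 of the actual simulation, so that a simple Euler integration is accurate.

```python
import numpy as np

# Hedged sketch: integrate dy/dt + (1 + a cos(b t)) y = x directly, then verify
# that the output matches a convolution with the exact kernel of Eq. (9.13).
a, b, dt = 0.5, 1.0, 0.002
t = np.arange(0.0, 40.0, dt)
x = np.sin(2.3 * t) + 0.5 * np.cos(0.7 * t)      # smooth deterministic test input

# Forward-Euler integration of the time-varying ODE
y = np.zeros_like(t)
for k in range(t.size - 1):
    y[k + 1] = y[k] + dt * (x[k] - (1.0 + a * np.cos(b * t[k])) * y[k])

# Exact time-varying kernel, Eq. (9.13)
def h(tt, tau):
    return np.exp(-tau - (2 * a / b) * np.sin(b * tau / 2) * np.cos(b * tt - b * tau / 2))

# y(t) = int h(t, tau) x(t - tau) dtau, truncated at tau = 10 sec
taus = np.arange(0.0, 10.0, dt)
idx = np.round(taus / dt).astype(int)
for k in (10000, 15000, 19990):
    y_conv = np.sum(h(t[k], taus) * x[k - idx]) * dt
    print(t[k], y[k], y_conv)                    # the two should nearly agree
```

The kernel (9.13) follows from h(t, τ) = exp[−∫ from t−τ to t of (1 + α cos βs) ds], which is why the direct integration and the kernel convolution agree.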
The effective memory of this system is approximately 6 sec (31 lags) and the period of
nonstationarity is 2π ≈ 6.28 sec, making the two comparable in size. The simulations
used 8192 input-output data points. The lower portion (up to a frequency of 0.5 Hz) of
the obtained half-spectrum magnitude is shown in Figure 9.1. This portion of the half-
spectrum contains all the significant kernel component estimates, as determined by the
previously outlined statistical procedure over the entire half-spectrum up to the Nyquist
frequency of 2.5 Hz. As indicated in Figure 9.1, four significant components {c₀, c₁, c₂,
c₃} of the kernel expansion are identified, corresponding to zero frequency and the first, sec-
ond, and third harmonics of the frequency of the periodic nonstationarity (1/2π Hz). Of
course, kernel expansion components exist at higher harmonics but cannot be detected
due to their small size relative to the background variance that is caused by the random
nature of the test input and the finite length of the data record. Corrections for the "leak-
age effect" were made to the half-spectrum estimates at the aforementioned three harmon-
ics, and were critical in obtaining an accurate kernel estimate [Sams & Marmarelis, 1988].

Figure 9.1 Estimated half-spectrum magnitude for the linear time-varying system in Equation
(9.12). The four significant expansion components are indicated [Sams & Marmarelis, 1988].
The resulting time-varying kernel estimate, composed of these four components (in-
cluding the respective phases), is shown in Figure 9.2 along with the exact kernel given
by Equation (9.13) over the same time interval of 20 sec near the middle portion of the
record. If stationary analysis were applied to this set of input-output data, the time-invari-
ant kernel (impulse response function) shown in Figure 9.3 would be obtained. This is
identical to the zero-frequency (m = 0) component of the half-spectrum. Segments of the
actual system output and the model predictions obtained from stationary and nonstation-
ary analysis for an independent GWN input are shown in Figure 9.4. This figure demon-
strates the importance of employing the nonstationary (time-varying) identification
method. The normalized mean-square error of the time-varying model prediction is 9.8%,
compared to 62.5% if a time-invariant model is used. The accuracy of this prediction (and
of the associated kernel estimate) can be improved by increasing the length of the in-
put-output data record, owing to the consistency of the estimator (9.4).
An example of the use of the modified Volterra model (when the additional expansion
over the lag variable τ is applied) is given in [Marmarelis, 1981a] for a second-order
system.

Figure 9.2 (a) Segment of the estimated time-varying kernel of the system of Equation (9.12), con-
structed on the basis of the four significant half-spectrum components shown in Figure 9.1. (b) The
same segment of the exact time-varying kernel given by Equation (9.13) [Sams & Marmarelis, 1988].

Figure 9.3 The time-invariant first-order kernel estimate for the time-varying system of Equation
(9.12), representing the zero-order component of the half-spectrum [Sams & Marmarelis, 1988].

Figure 9.4 Comparison of the actual response of the time-varying system of Equation (9.12) (trace
b) with the predicted response using the estimated time-varying model (trace a) and the time-invari-
ant model of Figure 9.3 (trace c) [Sams & Marmarelis, 1988].

9.2.2 A Test of Nonstationarity

The procedure described above for the selection of the statistically significant compo-
nents of the time-varying kernel expansion, when the input is GWN (or quasiwhite), can
also be used for testing the stationarity of the system. Clearly, stationarity is ascertained
in the case where only the zero-order expansion component (m = 0), corresponding to a
constant basis function, is selected as statistically significant for all kernels present in the
system.

When the input waveform is arbitrary (not GWN or quasiwhite), the aforementioned
procedure is not valid, and an alternative test for nonstationarity can be based on examin-
ing the stationarity of the output signal (provided that the input signal is a stationary
process). The test relies on the fact that the output signal will be a nonstationary random
process if and only if the system is nonstationary (assuming stationarity of the input sig-
nal and of possible output-additive noise). Note the distinction between a system and a
signal (a random process) with regard to defining stationarity. In order to test the station-
arity of the output signal, we must first estimate its autocorrelation function as follows
[Marmarelis, 1981b].
Consider the autocorrelation function of a nonstationary random process y(t):

\phi(t, \tau) = E[y(t)\, y(t - \tau)]    (9.14)

and its temporal expansion on the basis {β_m(t)} in the range [−R, +R]:

\phi(t, \tau) = \sum_{m=0}^{M} \psi_m(\tau)\, \beta_m(t)    (9.15)

We have shown [Marmarelis, 1981b] that the single-record estimator

\hat{\psi}_m(\tau) = \frac{1}{2R} \int_{-R}^{R} y(t)\, y(t - \tau)\, \beta_m(t)\, dt    (9.16)

yields unbiased and consistent estimates of the expansion components {ψ_m(τ)} under the
consistency condition described below for the variance of the estimator:

\mathrm{Var}[\hat{\psi}_m(\tau)] = \frac{1}{4R^2} \int_{-R}^{R} \int_{-R}^{R} \left\{ E[y(t)\, y(t - \tau)\, y(t')\, y(t' - \tau)] - \phi(t, \tau)\, \phi(t', \tau) \right\} \beta_m(t)\, \beta_m(t')\, dt\, dt'    (9.17)

which goes to zero as R tends to infinity. Consider the uniform bound B of the basis
functions {β_m(t)} and a "ceiling" function α(λ) that satisfies the inequality
478 MODELING OF NONSTATJONARY SYSTEMS

E[y(t)\, y(t - \tau)\, y(t')\, y(t' - \tau)] - \phi(t, \tau)\, \phi(t', \tau) \le \alpha^2(t - t')    (9.18)

for all t, t', and τ. Then, by using inequality (9.18) in Equation (9.17), we find that

\mathrm{Var}[\hat{\psi}_m(\tau)] \le \frac{B^2}{2R} \int_{-2R}^{2R} \left(1 - \frac{|\lambda|}{2R}\right) \alpha^2(\lambda)\, d\lambda \le \frac{B^2}{2R} \int_{-2R}^{2R} \alpha^2(\lambda)\, d\lambda    (9.19)

Therefore, the estimator is consistent if

\lim_{R \to \infty} \frac{1}{2R} \int_{-2R}^{2R} \alpha^2(\lambda)\, d\lambda = 0    (9.20)

Note that α²(λ) can be seen as a ceiling function for the autocorrelation of y(t):

E[y^2(t)\, y^2(t - \lambda)] - M_2^2 \le \alpha^2(\lambda)    (9.21)

where M₂ is an upper bound of the second moment of y(t). Therefore, the existence of a func-
tion α²(λ) that satisfies the integrability condition (9.20) is tied to the autocorrelation
pattern of y(t). We expect that, in most cases of practical interest, the correlation between
samples of y(t) will decline with increasing intersample distance at a sufficiently fast
pace to satisfy the integrability condition (9.20). These expressions can be further elab-
orated for Gaussian processes/signals [Marmarelis, 1983].
This rationale and the resulting estimator can be extended to single-record estimation
of high-order auto- and cross-correlation functions of nonstationary processes. However,
it should be noted that the obtained expansion of a nonstationary autocorrelation function
is valid only within the time range [−R, +R], and extrapolation outside this range must be
judged on the characteristics of each particular case at hand.
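The single-record estimator of Equation (9.16) is easy to exercise on a synthetic nonstationary signal; the amplitude-modulated noise process below is an illustrative assumption, not an example from the text.

```python
import numpy as np

# Hedged sketch: the single-record estimator of Eq. (9.16) with a Fourier
# temporal basis beta_m(n) = exp(-j m w0 n), w0 = 2*pi/N, applied to an
# amplitude-modulated white-noise process (a simple nonstationary signal).
rng = np.random.default_rng(3)
N = 1 << 16
n = np.arange(N)
y = (1.0 + 0.8 * np.cos(2 * np.pi * n / N)) * rng.standard_normal(N)

def psi_hat(m, tau):
    """(1/N) * sum_n y(n) y(n - tau) exp(-j m w0 n)  (discrete analog of 9.16)."""
    prod = y * np.roll(y, tau)
    return np.mean(prod * np.exp(-2j * np.pi * m * n / N))

# At lag 0: (1 + 0.8 cos)^2 = 1.32 + 1.6 cos + 0.32 cos(2.), so the m = 0, 1, 2
# components should come out near 1.32, 0.8, 0.16, and m = 3 near zero.
for m in range(4):
    print(m, np.round(abs(psi_hat(m, 0)), 3))
```

Because only the m = 0, 1, 2 components are nonnegligible, the sequential hypothesis test described next would stop after accepting those three, flagging the signal (and hence the generating system) as nonstationary.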
Having estimated the autocorrelation function of the output signal, we can now apply a
statistical test to the estimated expansion components {ψ_m(τ)} in sequential order ac-
cording to their peak absolute value, as in the case of the kernel expansion discussed pre-
viously, except that the largest component is now accepted untested (since this
is an autocorrelation function, at least one component must be nonzero). For the sec-
ond-largest component, the confidence bounds of the null hypothesis are based on the
variance expression of Equation (9.17), where the autocorrelation function φ(t, τ) is con-
structed from the expansion components accepted up to this point. The procedure continues
until the null hypothesis is accepted for some component (ranked according to peak ab-
solute value). Clearly, if any component for m ≠ 0 is selected, then the output signal is
nonstationary and the system is also deemed nonstationary under the previously stated as-
sumptions of input and noise stationarity.
This test of nonstationarity is rigorous and encompasses all the possible kernels in the
system. It can be applied for arbitrary (natural) inputs under the stated weak assumptions.
If the latter assumptions are not valid, then quasistationary estimation of Volterra models
(over successive time segments) can be used to examine the significance of possible
changes in the estimated kernels over time.

9.2.3 Linear Time-Varying Systems with Arbitrary Inputs


The case of linear time-varying systems deserves separate elaboration, because it offers a
relatively simple tool for nonstationary analysis of "linearized" systems with arbitrary in-
put waveforms and allows us to outline the analytical approach of parametric realization
of time-varying models in the form of difference equations with time-varying coeffi-
cients. The latter can be useful in applications such as speech processing or the analysis of
nonstationary physiological signals (e.g., cardiopulmonary data, EEG, ECG, etc.).
The general input-output relation in the linear case is given by

y(t) = \int_0^{\infty} h(t, \tau)\, x(t - \tau)\, d\tau    (9.22)

where the time-varying kernel (impulse response function) can be expanded on the tem-
poral basis {β_m(t)} as

h(t, \tau) = \sum_m a_m(\tau)\, \beta_m(t)    (9.23)

This time-varying kernel can be estimated from arbitrary (but broadband) input-output
data as the two-dimensional inverse Fourier transform (2D-IFT):

h(t, \tau) = \text{2D-IFT}\{\Phi_{yx}(u, \omega) / \Phi_{xx}(\omega)\}    (9.24)

where Φ_xx(ω) is the input spectrum and Φ_yx(u, ω) is the two-dimensional Fourier trans-
form of the nonstationary input-output crosscorrelation. Note that practical estimation of
the latter can be accomplished by means of the procedure outlined in the previous section
for the autocorrelation function of the output signal. Having estimated h(t, τ), we can now
determine the significant components in the expansion of Equation (9.23). This estimate
can assist the initialization of the Volterra modules {V_m} in the time-varying network ap-
proach discussed in the following section.
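The deconvolution behind Equation (9.24) can be sketched numerically. To keep the sketch short, it uses circular (FFT-friendly) signals and an ensemble average in place of the single-record procedure, and a demo system and coloring filter that are illustrative assumptions, not examples from the text.

```python
import numpy as np

# Hedged sketch of Eq. (9.24): estimate a time-varying kernel h(t, tau) from
# colored (nonwhite) broadband input-output data by dividing the Fourier
# transform (over tau) of the nonstationary crosscorrelation by the input
# spectrum Phi_xx(w), then inverse-transforming.
rng = np.random.default_rng(4)
N, K, mem = 64, 4000, 16
n = np.arange(N)

g = np.exp(-n) * (n < 8)                       # input-coloring filter
G2 = np.abs(np.fft.fft(g)) ** 2                # input power spectrum Phi_xx(w)

h_true = np.zeros((N, N))                      # h(n, tau), nonzero for tau < mem
h_true[:, :mem] = (1.0 + 0.5 * np.cos(2 * np.pi * n[:, None] / N)) \
                  * np.exp(-np.arange(mem)[None, :] / 2.0)

w = rng.standard_normal((K, N))                # white driving noise, K trials
x = np.real(np.fft.ifft(np.fft.fft(w, axis=1) * np.fft.fft(g), axis=1))

# y(n) = sum_tau h(n, tau) x(n - tau), circular, per trial
y = sum(h_true[:, t] * np.roll(x, t, axis=1) for t in range(mem))

# Nonstationary crosscorrelation phi_yx(n, tau), estimated by the ensemble mean
C = np.stack([np.mean(y * np.roll(x, t, axis=1), axis=0) for t in range(N)], axis=1)

# h_hat(n, .) = IFT over w of { FT over tau of phi_yx(n, .) / Phi_xx(w) }
h_hat = np.real(np.fft.ifft(np.fft.fft(C, axis=1) / G2, axis=1))
print(np.max(np.abs(h_hat - h_true)))          # small estimation error
```

Because φ_yx(n, ·) is the circular convolution of h(n, ·) with the input autocorrelation, the frequency-domain division removes the input coloring exactly in expectation; the residual error here is purely finite-ensemble noise.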
The derivation of equivalent parametric (time-varying difference equation) models from
the experimentally obtained nonparametric (time-varying kernel) measurements is a
very challenging and long-standing problem, for which a practical solution is outlined be-
low. For the discrete-time, linear time-varying case represented by the difference equation
model

c_i(n)\, y(n - i) + \cdots + c_1(n)\, y(n - 1) + c_0(n)\, y(n) = x(n)    (9.25)

we can substitute the output expression

y(n) = \sum_m \sum_l a_m(l)\, \beta_m(n)\, x(n - l)    (9.26)

and, using the discrete exponential input x(n) = z^n, we obtain after summation over l

\sum_m A_m(z) \left[ c_i(n)\, \beta_m(n - i)\, z^{-i} + \cdots + c_1(n)\, \beta_m(n - 1)\, z^{-1} + c_0(n)\, \beta_m(n) \right] = 1    (9.27)

If the discrete Fourier set β_m(n) = exp[−jmnω₀] is used as the temporal expansion basis, then
Equation (9.27) becomes, after summation over n,

\sum_m A_m(z)\, \Gamma_m(z) = 1    (9.28)

where

\Gamma_m(z) = \gamma_{i,m}\, e^{jmi\omega_0}\, z^{-i} + \cdots + \gamma_{1,m}\, e^{jm\omega_0}\, z^{-1} + \gamma_{0,m}    (9.29)

and γ_{j,m} denotes the mth Fourier coefficient of c_j(n). Since A_m(z) can be estimated from in-
put-output data (as described above), Equation (9.28) implies that the functions Γ_m(z) can
be determined within a scalar as

\Gamma_m(z) = \text{const}/A_m(z)    (9.30)

Knowledge of the functions Γ_m(z) allows estimation of the coefficients {γ_{j,m}}, from which
the time-varying coefficients {c_j(n)} (j = 0, ..., i) of Equation (9.25) can be reconstruct-
ed, yielding the desired difference equation model. Implementation of this procedure is
clearly a nontrivial task, but its far-reaching importance and unique capability justify the
effort in applications where parametric time-varying models are useful.

9.3 NETWORK-BASED METHODS

It is evident from Equation (9.6) that the general nonstationary nonlinear system (satisfy-
ing the weak requirements of the Volterra class) can be modeled as a parallel set of sta-
tionary Volterra modules {V_m} whose outputs are multiplied (modulated) by selected ba-
sis functions {β_m(t)} before being summed to form the system output, as depicted in
Figure 9.5.


Figure 9.5 Time-varying Volterra-equivalent network model.



Each of these modules can be represented in the modified Volterra model form using
an additional kernel expansion over the lag variables, as shown in Equation (9.8), or it can
be represented by a Volterra-equivalent network in order to achieve greater parsimony.
The compactness benefits (i.e., fewer free parameters in the model) are significant in actual
applications, especially when the input waveform is arbitrary (but sufficiently broad-
band). This is the rationale for the introduction of the network-based method, which has been
shown to be effective under stringent operating conditions where other alternative meth-
ods fail [Iatrou et al., 1999a].
Since Volterra-equivalent networks (VEN) were discussed extensively in Chapters 4
and 6, we will not elaborate on them further here. Any of the VEN forms discussed
earlier can be used for representation of the modules {V_m}, depending on the characteris-
tics of each case. However, we must emphasize that the selection of the "modulating"
functions at the outputs of the V_m modules is critical for the successful application of this
approach. Although these modulating functions (MFs) correspond mathematically to the
basis functions {β_m(t)} of the temporal expansion, their number has to be constrained in
practice in order to achieve satisfactory convergence of the training of the network-based
model within the customary constraints of applications. This selection is guided, of
course, by our preexisting knowledge of the nature of the system nonstationarity, which gen-
erally falls within three categories: (1) periodic or cyclical; (2) asymptotic transient; (3)
transient trend.
A finite number of MFs with a small number of unknown parameters must be se-
lected for each category. For the first category (periodic or cyclical), we typically select
a small number of sinusoidal MFs of unknown frequency and phase (to be determined
by the data through the iterative training procedure). For the second category (asymp-
totic transient), we typically select a small number of exponential functions of unknown
time constants, or sigmoidal functions with unknown slope and offset parameters (to be
determined by the training procedure), possibly combined with polynomials. For the
third category (transient trend), we may select polynomial functions or any other func-
tion deemed appropriate with a small number of free parameters (to be determined by
the training procedure).
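In code, these three MF families might be parameterized as follows; this is a hedged sketch, and the function names, parameter names, and values are illustrative, not taken from the text.

```python
import numpy as np

def mf_periodic(n, freq, phase):
    """Periodic/cyclical MF: sinusoid with trainable frequency and phase."""
    return np.sin(freq * n + phase)

def mf_asymptotic(n, slope, offset):
    """Asymptotic-transient MF: sigmoid with trainable slope and offset."""
    return 1.0 / (1.0 + np.exp(-slope * (n - offset)))

def mf_trend(n, coeffs):
    """Transient-trend MF: low-degree polynomial with trainable coefficients."""
    return np.polyval(coeffs, n)

n = np.arange(1000.0)
s = mf_asymptotic(n, 0.015, 580.0)    # "switches on" around sample 580
print(s[0], s[580], s[999])           # near 0, exactly 0.5, near 1
```

Each family has only two or a few free parameters, which is precisely the parsimony constraint that keeps the training of the network model tractable.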
It was found empirically that the inclusion of MFs in the VEN model architecture often
makes the convergence of the training algorithm more challenging, necessitating the use
of specialized algorithms for faster convergence (e.g., the delta-bar-delta rule) [Haykin,
1994; Iatrou et al., 1999a, b]. Illustrative examples from simulated systems are given be-
low, and from actual physiological data in Section 9.4.

9.3.1 Illustrative Examples


To validate and demonstrate the efficacy of the network-based method, we use computer-
simulated examples of time-varying network (TVN) models where the Volterra modules
{V_m} in Figure 9.5 (also called "subnets") are simply L_m-N_m cascades, and the MFs are
sinusoidal, exponential, sigmoidal, and polynomial functions. The filters L_m in these cas-
cades can be viewed as "principal dynamic modes" in the context of nonstationary model-
ing. In all examples, the zero-order module/subnet corresponds to β₀ = 1 (viewed as the
stationary component of the system).
Broadband random (both white and nonwhite) and deterministic (chirp) input signals
of 1024 data points, for systems with a memory-bandwidth product of 16, were used suc-
cessfully, and convergence was achieved generally after fewer than 1000 iterations of
training. The results were excellent in all cases, especially for exponential (tran-
sient) and sinusoidal (periodic) nonstationarities, where convergence was fast even
for relatively short data records. For polynomial nonstationarities, convergence of the
training algorithm required longer data records, especially for high-degree coefficients.
Sums of sinusoids and combinations of exponential, sinusoidal, and polynomial func-
tions were also tested successfully. This relaxes the specification requirements for mod-
el nonstationarities (i.e., the form of the MFs) in practice. For instance, if the system nonsta-
tionarity is periodic, sin(t), and the model nonstationarity is specified in the more
general form exp(−bt)sin(ωt), then the training algorithm will yield the estimates b = 0, ω
= 1.
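This robustness to over-parameterized MFs is easy to verify. Below, a coarse grid search, standing in for the iterative training algorithm and purely illustrative, fits the general form exp(−bt)sin(ωt) to a sin(t) modulator and lands on b = 0, ω = 1.

```python
import numpy as np

# Hedged sketch: fitting the over-parameterized MF exp(-b t) sin(w t) to a
# purely periodic target sin(t) via a coarse grid search over (b, w).
t = np.arange(0.0, 50.0, 0.05)
target = np.sin(t)                               # true periodic nonstationarity

def sse(b, w):
    """Sum-of-squares fitting error of the candidate MF against the target."""
    return np.sum((np.exp(-b * t) * np.sin(w * t) - target) ** 2)

grid = [(b, w) for b in np.linspace(0.0, 0.2, 21)
               for w in np.linspace(0.5, 1.5, 101)]
b_hat, w_hat = min(grid, key=lambda p: sse(*p))
print(b_hat, round(w_hat, 6))                    # b -> 0, w -> 1
```

Any nonzero decay rate b, or any frequency offset from ω = 1, inflates the squared error over the 50-second window, so the redundant exponential factor is simply driven to unity.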
The method was shown to be robust in the presence of output-additive noise. Illustra-
tive examples are given in Figures 9.6-9.8 for transient and periodic nonstationarities.
The robustness of this approach is illustrated in Figure 9.9, where the model prediction is
shown to be practically impervious to output-additive noise for SNR = 0 dB. More elabo-
ration on these examples can be found in [Iatrou et al., 1999a].

Figure 9.6 Results for a simulated TVN model (see Figure 9.5) with one stationary and one nonstation-
ary subnet with exponential MF: β₁(n) = exp(−0.005n); panels (a) and (b) are the actual and estimated
(+) impulse responses for filters L₀ and L₁, respectively; (c) convergence pattern of the MF exponent;
(d) convergence pattern of the output root-mean-square error [Iatrou et al., 1999a].
Figure 9.7 Results for a simulated TVN model with one stationary and one nonstationary subnet
with sigmoidal MF: β₁(n) = 1/{1 + exp[−0.015(n − 580)]}; panels (a) and (b) are the actual and estimat-
ed (+) impulse responses for filters L₀ and L₁, respectively; (c) convergence pattern of the MF slope;
(d) convergence pattern of the MF offset [Iatrou et al., 1999a].

Figure 9.8 Results for a simulated TVN model with one stationary and one nonstationary subnet with
sinusoidal MF: β₁(n) = sin(0.64n); panels (a) and (b) are the actual and estimated (+) impulse responses
for filters L₀ and L₁, respectively; (c) convergence pattern for the MF frequency; (d) convergence
pattern for the MF phase [Iatrou et al., 1999a].

Figure 9.9 TVN model prediction and system output for noise-free and noisy (SNR = 0 dB) cases.
The estimated output (model prediction) is obtained from the TVN trained with the noisy output data
and appears to be a very good approximation of the noise-free output [Iatrou et al., 1999a].

9.4 APPLICATIONS TO NONSTATIONARY PHYSIOLOGICAL SYSTEMS

The three classes of methodologies presented in the previous sections have found some
initial applications in the study of nonstationary physiological systems, but surely not as
many as the importance of the problem warrants. We expect, however, that increasing at-
tention will be given to the nonstationary aspects of physiological function in the near fu-
ture, as investigators increasingly realize their omnipresence and significance. The pre-
sented methodologies can be important tools in pursuing this purpose, and their
performance is illustrated below with initial applications.
The most commonly used approach is the quasistationary method, for which we show
an example taken from a study of cerebral autoregulation (elaborated in Section 6.2 for
the stationary analysis). The Volterra model of the nonlinear dynamic relation between
mean arterial blood pressure (input) and mean cerebral blood flow velocity (output) is es-
timated from a 6-min sliding window of data (having a 4-min overlap with the previous
window) using fixed structural parameters for a Laguerre-Volterra network (LVN) model
in each data window. The FFT magnitude of the estimated first-order Volterra kernel (i.e.,
the first-order frequency response function) is plotted in Figure 9.10 for successive 6-min
segments over a two-hour period for two different subjects [Mitsis et al., 2002]. The non-
stationarity of this kernel is evident, and it is more pronounced in the frequency range be-
low 0.08 Hz (where autoregulation is deemed to be more active). This nonstationarity is
even more pronounced for the second-order kernel estimates (not shown here in the inter-
est of space). No apparent temporal pattern is evident in the observed nonstationarities, al-
though the matter deserves more study [Mitsis et al., 2002].

Figure 9.10 The first-order frequency response functions tracked over 2 hours of data (6-min slid-
ing data segments with 4-min overlap) for two subjects. The nonstationarity is evident and has a ran-
dom appearance [Mitsis et al., 2002].
An example of the temporal expansion method is taken from a study of respiratory me-
chanics [Marmarelis, 1987c]. The obtained half-spectrum and the reconstructed time-
varying first-order kernel are shown in Figure 9.11 (the input is forced broadband tracheal
pressure and the output is the resulting tracheal airflow). The half-spectrum exhibits three
significant nonstationary kernel components that are clustered around the mean breathing
frequency, as expected, since the mechanical properties of the lungs (resistance and com-
pliance) change as a function of the phase of the breathing cycle (e.g., minimum compli-
ance at the end of the inspiratory phase, and maximum at the end of the expiratory phase).
Thus, the half-spectrum offers a quantitative measure of these phasic changes and can be
used for improved diagnosis of obstructive pulmonary disease or to optimize artificial res-
piration (among other possible clinical uses). The significance of these cyclical nonsta-
tionary components belies the scientific or clinical relevance of the simplified notion of
stationary analysis commonly used.
The network-based method is illustrated with an application taken from a study of potentiation in the hippocampus, whereby the perforant path is stimulated with a Poisson sequence of impulses (mean rate of five impulses per second) and the population field potential in the dendritic layer of the dentate gyrus is recorded as the output (see also Sec. 8.3.3). The relatively high mean rate of Poisson stimulation induces potentiation, which is modeled as an asymptotic transient nonstationarity using sigmoidal modulating functions (MFs) in a second-order TVN with three modules/subnets (one stationary and two sigmoidal nonstationary) [Iatrou et al., 1999b]. All subnets had a single hidden unit with second-degree polynomial activation functions; thus, each has one "principal dynamic mode" (PDM).

486 MODELING OF NONSTATIONARY SYSTEMS

Figure 9.11 Estimated half-spectrum magnitude (top) and time-varying kernel (bottom) of respiratory mechanics. The cluster of three significant nonstationary components is marked by "c" in the half-spectrum and lies in the neighborhood of the breathing frequency [Marmarelis, 1987c].

The changes in responses to impulsive stimulation are illustrated in Figure 9.12 for five segments of data (each segment is about 3.5 sec) spanning the course of the potentiation process. The obtained TVN model gave excellent prediction accuracy (4.8% normalized root-mean-square error) and its two MFs and three PDMs (corresponding to the three subnets) are shown in Figures 9.13 and 9.14.
Figure 9.12 The model-based response to an impulse for the first (a), second (b), third (c), fourth (d), and fifth (e) segments from the hippocampal data during potentiation. The time-dependent changes due to potentiation are evident [Iatrou et al., 1999b].

Figure 9.13 The three estimated PDMs of the trained TVN model of hippocampal potentiation with one stationary (a) and two nonstationary subnets with modulating functions β1(n) = 1/{1 + exp[−b1(n − q1)]} and β2(n) = 1/{1 + exp[−b2(n − q2)]}, panels (b) and (c), respectively [Iatrou et al., 1999b].

Figure 9.14 The estimated sigmoidal MFs for the hippocampal potentiation model corresponding to the PDMs of panels (b) and (c) of Figure 9.13 [Iatrou et al., 1999b].

The first MF "switches on" at the beginning of the third segment and lasts for almost 5000 sample points (or 1.5 sec), and the second MF "switches off" at the end of the second segment and lasts for almost 2500 sample points (or 750 msec) entering the third segment. Thus, the first nonstationary path is "off" during the first two segments and "on" during the last three segments (the transitional area extends over the first half of the third segment). The second nonstationary path is "on" during the first two segments and "off" during the last three segments (the transitional area extends briefly over the end of the second segment and the beginning of the third segment). The model-predicted output is almost identical to the actual output over the last three segments and a very good match over the first two segments. This result demonstrates the ability of the network-based methodology to model an important class of nonstationary systems with "on" and "off" switches that capture different states that the system may assume over time. Each switch will be a different subnet with a sigmoidal MF introduced in the structure of the TVN model.
The parameters of the sigmoidal MFs (slope and offset) reflect the dynamics of the
molecular and cellular processes underlying the transition from the low-excitability state
to the high-excitability state (e.g., ligand-dependent conductances). Effects dependent on
voltage-gated or ligand-gated channels will be reflected in the estimated PDMs and their
corresponding MFs modeling state transitions. This offers the exciting prospect of rigor-
ous hypothesis testing regarding these "switching" nonstationary phenomena.
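The sigmoidal MFs just described can be written down directly. The sketch below (with hypothetical slopes, offsets, and subnet outputs, not the fitted values from the study) shows how two such functions gate the nonstationary paths of a TVN: a positive slope b gives a path that "switches on" at sample q, and a negative slope gives one that "switches off."

```python
import numpy as np

def sigmoidal_mf(n, b, q):
    """Sigmoidal modulating function beta(n) = 1 / (1 + exp[-b(n - q)]):
    a soft switch located at sample q with transition slope b."""
    return 1.0 / (1.0 + np.exp(-b * (np.asarray(n, float) - q)))

def tvn_output(y_stat, y_ns1, y_ns2, b1, q1, b2, q2):
    """Stationary subnet output plus two nonstationary subnet outputs,
    each gated by its own modulating function (b1 > 0: switches on;
    b2 < 0: switches off), as in the hippocampal potentiation model."""
    n = np.arange(len(y_stat))
    return (y_stat
            + sigmoidal_mf(n, b1, q1) * y_ns1
            + sigmoidal_mf(n, b2, q2) * y_ns2)
```

The slope b and offset q of each MF are the parameters said in the text to reflect the dynamics of the underlying state transition.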
Another interesting application of nonstationary modeling of neural systems, using a method based on ensemble averaging, has been presented by Krieger et al. (1992).
10
Modeling of
Closed-Loop Systems

The study of closed-loop systems is fundamental in physiology because of the importance


of homeostatic and autoregulatory mechanisms that maintain the "normal" operation of
living systems. Normal operating conditions must be viewed in a dynamic context, since
everything is in a state of perpetual change within living systems, as well as in the surrounding environment that exerts multiple and ever-changing influences/stimulations on
each living system. Furthermore, the intertwined operation of multiple closed-loop mech-
anisms often presents the daunting challenge of a "nested-loop" system. The importance
of understanding closed-loop or nested-loop operation in physiology is rivaled by its
complexity, especially if we seek a quantitative understanding that is offered by mathe-
matical modeling.
One of the key issues in the study of closed-loop systems is the "circular causality" im-
plicit in these systems, which prevents the ordinary assumptions of causality underlying
existing modeling methodologies. Various ideas have been proposed and tested to over-
come this problem, typically attempting to "open the loop" experimentally or method-
ologically. In the approach advocated herein, the closed-loop or nested-loop system will
be modeled as such, without any attempt to "open the loop." The resulting model will
capture the nonlinear dynamic interrelationships among all variables of interest (for
which time-series data measurements are available) regardless of the complexity of the
interconnections. The successful application of this approach can have enormous poten-
tial implications for the proper study of highly interconnected physiological systems that
perform homeostatic tasks under natural operating conditions (spontaneous activity). It is
critical to note that the advocated approach is cast in the framework of natural operation
and is not subject to experimental or methodological constraints that may distort the ob-
tained understanding of physiological function.
It is obvious that the advocated closed-loop modeling approach represents a "paradigm shift" and can have immense impact on the development of a "new systems physiology," with a subsequent quantum advance in clinical practice. It is also obvious that the complexity of this subject matter deserves a lengthy treatment that is not practically possible within the confines of this monograph. Therefore, this chapter will only attempt to introduce the basic concepts and broad methodological approach, deferring the detailed elaboration and the necessary demonstrations (unavailable at present) to a future monograph dedicated solely to this subject matter.

Nonlinear Dynamic Modeling of Physiological Systems. By Vasilis Z. Marmarelis
ISBN 0-471-46960-2 © 2004 by the Institute of Electrical and Electronics Engineers.

10.1 AUTOREGRESSIVE FORM OF CLOSED-LOOP MODEL

The simplest formulation of the closed-loop model takes the form of a nonlinear autoregressive (NAR) model, incorporating all measured in-the-loop variables in a generalized NARMAX model form. The mathematical formulation of this problem for two "in-the-loop" variables in the discrete-time Volterra context is

$$y(n) = g_0 + \sum_{r=1}^{R} g_1(r)\, y(n-r) + \sum_{r_1=1}^{R}\sum_{r_2=1}^{R} g_2(r_1, r_2)\, y(n-r_1)\, y(n-r_2) + \cdots + k_0 + \sum_{m=0}^{M} k_1(m)\, x(n-m) + \sum_{m_1=0}^{M}\sum_{m_2=0}^{M} k_2(m_1, m_2)\, x(n-m_1)\, x(n-m_2) + \cdots + e(n) \qquad (10.1)$$

where y(n) and x(n) are the two in-the-loop variables, e(n) is the noise/interference term, and {g_i} and {k_i} are the respective sets of Volterra kernels for this autoregressive formulation. This formulation can also be viewed as a nonlinear feedback model:

y(n) = G[y(n - 1), ... ,y(n - R)] + K[x(n), ... ,x(n - M)] + e(n) (10.2)

where G denotes the "feedback" (autoregressive) Volterra operator and K denotes the
"forward" Volterra operator. A more general formulation of the closed-loop model is obtained when y(n) is moved to the right-hand side of Equation (10.2) and is combined with
G into a single Volterra operator:

F[y(n), ... ,y(n - R)] = G[y(n - 1), ... ,y(n - R)] - y(n) (10.3)

so that the closed-loop model takes the form

F[y(n), ... ,y(n - R)] + K[x(n), ... ,x(n - M)] + e(n) = 0 (10.4)

which can be considerably more general than Equation (10.2) [when y(n) enters into F in
more complicated ways than a subtractive term] and refers to the modular model form
shown in Figure 10.1 that has the direct conceptual interpretation of a nonlinear dynamic
closed-loop system with external perturbations/disturbances.
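For concreteness, the autoregressive formulation of Equation (10.1), truncated at second order, can be simulated recursively as below. This is an illustrative sketch only: the kernel values, initial conditions, and the treatment of pre-record input samples as zero are all assumptions of the illustration, not part of the text.

```python
import numpy as np

def nar_volterra_output(x, y_init, g0, g1, g2, k0, k1, k2, e=None):
    """Second-order NAR model of Eq. (10.1):
    y(n) = g0 + sum_r g1(r) y(n-r) + sum_{r1,r2} g2(r1,r2) y(n-r1) y(n-r2)
         + k0 + sum_m k1(m) x(n-m)
         + sum_{m1,m2} k2(m1,m2) x(n-m1) x(n-m2) + e(n)."""
    R, M = len(g1), len(k1) - 1
    N = len(x)
    y = np.zeros(N)
    y[:R] = y_init                         # initial conditions for recursion
    if e is None:
        e = np.zeros(N)                    # noise/interference term
    for n in range(R, N):
        yp = y[n - R:n][::-1]              # y(n-1), ..., y(n-R)
        xp = x[max(0, n - M):n + 1][::-1]  # x(n), ..., x(n-M)
        xp = np.pad(xp, (0, M + 1 - len(xp)))  # pre-record samples -> 0
        y[n] = (g0 + g1 @ yp + yp @ g2 @ yp
                + k0 + k1 @ xp + xp @ k2 @ xp + e[n])
    return y
```

With the second-order kernels set to zero, this reduces to an ordinary linear ARX recursion, which makes the sketch easy to sanity-check.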
Note that in Figure 10.1, an additional "disturbance term" ξ(n) has been included for greater generality. The operational importance of the "disturbance terms" e(n) and ξ(n) is critical because they are the "driving" processes, akin to the "innovation" processes used in autoregressive modeling. They represent the interaction of the closed-loop system with its environment and obviously attain critical importance in the context of human physiology. Elaboration on the role and interpretation of these "disturbance terms" is deferred to a future monograph, because of the aforementioned constraints of space.
Figure 10.1 The general closed-loop modular model form, for which we must estimate the Volterra operators K and F so that the disturbances e(n) and ξ(n) satisfy certain conditions.

10.2 NETWORK MODEL FORM OF CLOSED-LOOP SYSTEMS

The nonlinear autoregressive model of Equation (10.1) can also be cast in the form of the Volterra-equivalent network shown in Figure 10.2 (without the recursive connection). Activation of the recursive connection, shown with a dashed line in Figure 10.2, leads to the bivariate closed-loop network model of Figure 10.3, where the "interaction layer" is now termed the "fusion layer" because it generates outputs {ψ1(n), ..., ψJ(n)} that sum with an offset ψ0 to form the residual term e(n). The latter can be viewed as an equilibration variable (also termed "anisoropy," which means in Greek "lack of equilibrium") and must satisfy some key conditions with regard to the in-the-loop variables x(n) and y(n).

Figure 10.2 Volterra-equivalent network of an autoregressive component (right module). When the recursive connection (shown with dashed line) is activated, then we transition to the "equilibrating" network architecture of Figure 10.3.

Figure 10.3 The "equilibrating" network architecture involving two variables x(n) and y(n) in a closed loop (physiologically, not schematically). The closed-loop interrelationship is represented by the generation of the "balancing states" {ψi(n)} by the "fusion layer." The equilibration residual (anisoropy) e(n) must satisfy certain conditions vis-à-vis the data x(n) and y(n) (see text).

The particular form of these conditions between e(n) and {x(n), y(n)} is the key to the successful application of this approach, since satisfaction of these conditions will be used for training this network model. It must be emphasized that mean-square minimization is not necessarily a meaningful criterion for e(n) in this case. We may require instead that e(n) have maximum "cross-entropy" with x(n) and y(n) (an "information-theoretic" criterion), or have minimum "projection" on them according to some norm (other than the Euclidean norm). This subject deserves and shall receive the proper attention in the near future.
In the multivariate case, the closed-loop network model is shown in Figure 10.4 and exhibits the virtues of scalability that were discussed in connection with the multi-input models in Chapter 7. It is evident that this model, also termed the "multivariate homeodynamic model," can be potentially useful in modeling the full complexity of physiological systems, viewed as practically intractable heretofore.

As an example of a training criterion for the multivariate case, we may consider the minimization of the pth-order "dependence measure"

$$D_{i,p} = \sum_{m=0}^{M} \left[ \frac{1}{N} \sum_{n=1}^{N} x_i^p(n-m)\, e^p(n) - \left( \frac{1}{N} \sum_{n=1}^{N} x_i^p(n-m) \right) \left( \frac{1}{N} \sum_{n=1}^{N} e^p(n) \right) \right]^{2/p} \qquad (10.5)$$
Figure 10.4 Multivariate nonlinear dynamic closed-loop network model of M interconnected variables, also termed the multivariate "homeodynamic" model.

for each variable x_i and for one or more integer p values. Also, "classic" Euclidean projections can be used in minimizing the aggregate quantity:

$$\Phi = \sum_{i} \frac{\langle x_i, e \rangle^2}{\langle x_i, x_i \rangle} \qquad (10.6)$$

where ⟨·, ·⟩ denotes the "inner product" (see Appendix I). Obviously, many different possibilities exist for training the multivariate "homeodynamic" network, which must be evaluated in the future in the presented methodological context. It is critical to note that minimization of the mean-square value of the anisoropy e(n) is not a sensible criterion in this formulation of the modeling problem; rather, we must seek the minimization of metrics that minimize the "mutual information" or "mutual interdependence" between the anisoropy and the variables of interest.
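As a concrete instance of such a criterion, the Euclidean-projection quantity Φ of Equation (10.6) is straightforward to compute. The sketch below assumes the in-the-loop variables are given as rows of an array and uses the ordinary (Euclidean) inner product; it is an illustration of the formula, not of any particular training algorithm.

```python
import numpy as np

def euclidean_projection_criterion(X, e):
    """Aggregate Euclidean-projection criterion of Eq. (10.6):
    Phi = sum_i <x_i, e>^2 / <x_i, x_i>.
    Training would seek an equilibration residual e(n) with minimal
    projection on the in-the-loop variables, i.e., small Phi."""
    X = np.atleast_2d(X)            # rows: variables x_i
    num = (X @ e) ** 2              # <x_i, e>^2 for each variable
    den = np.sum(X * X, axis=1)     # <x_i, x_i>
    return float(np.sum(num / den))
```

Phi vanishes exactly when e is orthogonal to every x_i, which is the sense in which the residual is "decoupled" from the measured variables under this criterion.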
APPENDIX I
Function Expansions

Consider a set of M square-integrable functions {b_m(t)} (m = 1, ..., M), defined in the time interval [A, B], that have nonzero Euclidean norm:

$$\|b_m(t)\|^2 \triangleq \int_A^B b_m^2(t)\, dt > 0 \qquad (A1.1)$$

These functions form a basis if and only if they are linearly independent; that is, each of them cannot be expressed as a linear combination of the others. A necessary and sufficient condition for linear independence is that the Gram matrix defined by the inner products of these functions {⟨b_i, b_j⟩} be nonsingular (i, j = 1, ..., M). The definition of the inner product is

$$\langle b_i, b_j \rangle \triangleq \int_A^B b_i(t)\, b_j(t)\, dt \qquad (A1.2)$$

from which we see that ⟨b_i, b_i⟩ = ‖b_i‖². The basis {b_m} defines a "Hilbert space" over t ∈ [A, B] and can be viewed as a coordinate system of functions in the sense that any function of this Hilbert space can be expressed as a linear combination of the basis functions.
The key operational concept of function expansions is that any square-integrable function f(t), t ∈ [A, B] (not necessarily from the Hilbert space defined by the basis {b_m}) can be approximated by a linear combination of the basis functions as

$$\hat{f}(t) = \sum_{m=1}^{M} a_m b_m(t) \qquad (A1.3)$$

in the sense that the Euclidean norm of the difference [f(t) − f̂(t)] is minimized. The latter represents the "mean-square approximation error" that underlies most methods of function/signal approximation (or estimation in a Gaussian statistical context using maximum likelihood methods).
The expansion coefficients {a_m} in Equation (A1.3) can be determined by minimization of the mean-square error:

$$\|f(t) - \hat{f}(t)\|^2 \triangleq \int_A^B \left[ f(t) - \sum_{m=1}^{M} a_m b_m(t) \right]^2 dt \qquad (A1.4)$$

yielding the canonical equation

$$\mathbf{G}\, \underline{a} = \underline{c} \qquad (A1.5)$$

where G is the M × M Gram matrix {⟨b_i, b_j⟩} (i, j = 1, ..., M) defined by all inner products of the basis functions, a is the vector of the unknown expansion coefficients {a_m}, and c is the vector of the inner products {⟨f, b_m⟩} (m = 1, ..., M). The canonical equation (A1.5) yields the expansion coefficients upon inversion of the Gram matrix

$$\underline{a} = \mathbf{G}^{-1}\, \underline{c} \qquad (A1.6)$$

because the Gram matrix is nonsingular by definition (linear independence of the basis functions).

The solution of Equation (A1.6) is facilitated computationally when the Gram matrix is diagonal, which corresponds to the case of an "orthogonal basis": ⟨b_i, b_j⟩ = 0 for i ≠ j. This motivates the search for orthogonal bases in a practical context. An orthogonal basis {β_m(t)} can always be constructed from a nonorthogonal basis {b_m(t)} (i.e., spanning the same Hilbert space) using the Gram-Schmidt orthogonalization procedure. For an orthogonal basis {β_m(t)}, the expansion coefficients are given by

$$a_m = \frac{\langle f, \beta_m \rangle}{\langle \beta_m, \beta_m \rangle} \qquad (A1.7)$$
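Numerically, the canonical equation (A1.5)-(A1.6) amounts to assembling the Gram matrix and the inner-product vector on a grid and solving the linear system. A minimal sketch, using trapezoid-rule inner products (an assumption of this illustration, not of the text), is:

```python
import numpy as np

def inner(u, v, t):
    """Inner product <u, v> approximated by the trapezoid rule on grid t."""
    w = u * v
    return float(np.sum((w[:-1] + w[1:]) * np.diff(t)) / 2.0)

def expansion_coefficients(f, basis, t):
    """Solve G a = c of Eq. (A1.5): G_ij = <b_i, b_j>, c_m = <f, b_m>."""
    B = [b(t) for b in basis]
    G = np.array([[inner(bi, bj, t) for bj in B] for bi in B])
    c = np.array([inner(f(t), bi, t) for bi in B])
    return np.linalg.solve(G, c)
```

For f(t) = t on [0, 1] with the basis {1, t}, the coefficients come out near [0, 1], as they must for a function that lies inside the subspace.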

Furthermore, the orthogonal basis can be normalized to unity Euclidean norm, ‖β_m‖ = 1, for every m = 1, ..., M. This results in an "orthonormal basis" {β_m(t)} for which the expansion coefficients are given by

$$a_m = \langle f, \beta_m \rangle \qquad (A1.8)$$

The inner product operation of Equation (A1.8) for an orthonormal basis can also be viewed as a correlation of f(t) with orthonormal regression variables {β_m(t)}. The mean-square approximation error ε_M for an orthonormal basis can be expressed as

$$\varepsilon_M \triangleq \|f - \hat{f}\|^2 = \|f\|^2 - \sum_{m=1}^{M} a_m^2 \qquad (A1.9)$$

When this error tends to zero, the basis is called "complete." Completeness applies, of course, to nonorthogonal bases as well. When a basis is complete for a given Hilbert space of functions, then any function of this space can be represented precisely as a linear combination of the basis functions.

It is critical to note that the mean-square approximation (and the respective expansion coefficients) resulting from the solution of the canonical equation (A1.6) depends on the interval of expansion [A, B] over which the basis is defined. Thus, when the latter is changed (as in the example below), the expansion coefficients and the approximation error also change in general. This is akin to the dependence of Wiener or CSRS kernels on the input power level in the case of functional expansions (see Section 2.2).

Another critical distinction is between these mean-square expansions and analytical expansions (like the Taylor series). The latter are not based on error minimization but on the derivative values at the reference point of differentiable (analytic) functions only. This distinction corresponds to the distinction between Volterra and Wiener kernels in the case of functional expansions discussed in Section 2.2.1.
Complete bases are often enumerably infinite (e.g., the Fourier basis) and, therefore, are truncated in practice. This truncation naturally results in an approximation error that depends on the convergence pattern of the expansion and decreases monotonically with increasing number of basis functions (i.e., the dimensionality of the approximating subspace). Thus, in practice, the relevant issue is not the completeness but the convergence of the approximation error resulting from a truncated expansion for the specific case at hand.

Well-established complete orthonormal (CON) bases that have been used so far include the Fourier (sinusoidal), Legendre, and Chebyshev (polynomial) sets for finite expansion intervals [A, B]. For semiinfinite intervals (B → ∞), a well-established CON basis is the Laguerre set (polynomials multiplied with an exponential) that has found many useful applications in the expansion of kernels (see Section 2.3.2). For infinite intervals (A → −∞, B → ∞), a well-established CON basis is the Hermite set (polynomials multiplied with a Gaussian) that is currently finding application to image processing and studies of receptive fields in the visual system.

Example

We consider as an illustrative example a function that is frequently used to represent compressive static nonlinearities in biological systems, the Michaelis-Menten function:

$$f(x) = \frac{x}{x + c} \qquad (A1.10)$$

defined for x ∈ [0, ∞), which has the following analytical (Taylor) expansion about x = 0 (for x → 0):

$$f(x) = \sum_{n=1}^{\infty} (-1)^{n+1} \left( \frac{x}{c} \right)^n \qquad (A1.11)$$

where c represents the half-max value (note that the max value of 1 is attained asymptotically as x → ∞).
If we seek a linear mean-square approximation of this function over the finite interval [0, x_0], then we must consider the subspace defined by the basis {1, x} for x ∈ [0, x_0], and construct the Gram matrix

$$\mathbf{G} = \begin{bmatrix} \langle 1, 1 \rangle & \langle 1, x \rangle \\ \langle x, 1 \rangle & \langle x, x \rangle \end{bmatrix} = \begin{bmatrix} x_0 & x_0^2/2 \\ x_0^2/2 & x_0^3/3 \end{bmatrix} \qquad (A1.12)$$
whose inverse

$$\mathbf{G}^{-1} = \begin{bmatrix} 4/x_0 & -6/x_0^2 \\ -6/x_0^2 & 12/x_0^3 \end{bmatrix} \qquad (A1.13)$$

yields the two expansion coefficients a_0 and a_1 after multiplication with the vector:

$$\begin{bmatrix} \langle f, 1 \rangle \\ \langle f, x \rangle \end{bmatrix} = \begin{bmatrix} x_0 - c \ln\!\left( \frac{x_0 + c}{c} \right) \\ \frac{x_0^2}{2} - c x_0 + c^2 \ln\!\left( \frac{x_0 + c}{c} \right) \end{bmatrix} \qquad (A1.14)$$

This yields

$$a_0 = \frac{4}{x_0} \langle f, 1 \rangle - \frac{6}{x_0^2} \langle f, x \rangle \qquad (A1.15)$$

$$a_1 = \frac{12}{x_0^3} \langle f, x \rangle - \frac{6}{x_0^2} \langle f, 1 \rangle \qquad (A1.16)$$

which indicates that the slope of the linear mean-square approximation depends on x_0 and is distinct from the coefficient (1/c) of the linear term of the Taylor expansion in Eq. (A1.11). This illustrates the analogy between Volterra (analytical) and Wiener (orthogonal) expansions, as well as the dependence of the Wiener kernels (analogous to the orthogonal expansion coefficients) on the GWN input power level (analogous to the interval of expansion).
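The closed-form coefficients (A1.15)-(A1.16) are easy to evaluate and to check against a direct discrete least-squares fit; the sketch below does both (the particular values of c, x_0, and the grid density are arbitrary choices of this illustration).

```python
import numpy as np

def linear_ms_fit_coeffs(c, x0):
    """Coefficients a0, a1 of the linear mean-square approximation of
    f(x) = x/(x + c) over [0, x0], via Eqs. (A1.14)-(A1.16)."""
    f1 = x0 - c * np.log((x0 + c) / c)                       # <f, 1>
    fx = x0**2 / 2.0 - c * x0 + c**2 * np.log((x0 + c) / c)  # <f, x>
    a0 = (4.0 / x0) * f1 - (6.0 / x0**2) * fx
    a1 = (12.0 / x0**3) * fx - (6.0 / x0**2) * f1
    return a0, a1

# The slope a1 depends on the interval [0, x0]; it approaches the Taylor
# slope 1/c only as x0 -> 0 and falls below it for wider intervals.
```

This makes the point of the example concrete: the mean-square slope is an interval-dependent quantity, unlike the analytical (Taylor) slope 1/c.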
APPENDIX II
Gaussian White Noise

Gaussian white noise (GWN) is a stationary and ergodic random process with zero mean that is defined by the following fundamental property: any two values of GWN are statistically independent no matter how close they are in time.
The direct implication of this property is that the autocorrelation function of a GWN process w(t) is zero for nonzero arguments/shifts/lags:

$$\varphi_w(\tau) \triangleq E[w(t)\, w(t - \tau)] = E[w(t)]\, E[w(t - \tau)] = 0 \qquad (A2.1)$$

for all τ ≠ 0, where φ_w denotes the autocorrelation function and E[·] is the "expected value statistical operator." The value of φ_w at τ = 0 is the variance of the zero-mean process w(t), and has to be defined mathematically to be infinite for GWN because, otherwise, the GWN signal will have zero power. Recall that the power spectrum of a random process (signal) is the Fourier transform of its autocorrelation function and, therefore, the latter has to be defined as a delta function for GWN in order to retain nonzero power spectral density. That is, if the value of φ_w at τ = 0 were finite, then the power spectrum would be a null function. Thus, the autocorrelation function of GWN is defined by means of the Dirac delta function as

$$\varphi_w(\tau) = P\, \delta(\tau) \qquad (A2.2)$$

where P is a positive scalar termed the "power level" of GWN. The GWN power spectrum (also called the power spectral density function) is found as the Fourier transform of its autocorrelation function to be

$$S_w(\omega) = P \qquad (A2.3)$$


Evidently, the power spectrum of GWN is constant over all frequencies, hence the name
"white noise," in analogy to the white light that contains all (visible) wavelengths with the
same power.
The additional adjective "Gaussian" in GWN indicates that the amplitude distribution of the white-noise signal is Gaussian, like the independent steps in Brownian motion. However, any zero-mean amplitude distribution can define a non-Gaussian white-noise process (signal) as long as the values of the signal satisfy the aforementioned condition of statistical independence (see Section 2.2.4 for examples of non-Gaussian white processes with symmetric amplitude distributions).
Although the mathematical properties of GWN have been studied extensively and uti-
lized in many fields, the ideal GWN signal is not physically realizable because it has infi-
nite variance by definition (recall that the variance of a stationary zero-mean random sig-
nal is equal to the value of its autocorrelation function at zero lag). This situation is akin
to the mathematical use of the Dirac delta function that has found immense utility in
many fields of science and engineering but is, likewise, not physically realizable. Of
course, in practice, finite-bandwidth approximations of delta functions or GWN signals
can be used that are adequate for the requirements of each given application. Specifically,
the critical parameter is the bandwidth of the system under study, which has to be covered
by the bandwidth of the selected signal (i.e., the band-limited GWN input signal must
cover the bandwidth of interest).
The most common approximation for GWN is the band-limited GWN signal (with a
bandwidth equal to or exceeding the requirements of a given application) that has the "sinc function" as its autocorrelation function. The use of this and other GWN approximations (termed quasiwhite signals) is discussed in Section 2.2.4 in the context of nonlinear system identification.
It should be noted that the key property of GWN with regard to system identification is the "whiteness" and not the "Gaussianity." Thus, non-Gaussian white-noise signals (e.g., the CSRS family of quasiwhite signals discussed in Section 2.2.4) have symmetric amplitude probability density functions and may exhibit practical advantages in certain applications over band-limited GWN. For instance, multilevel CSRS quasiwhite signals (e.g., binary, ternary, etc.) may be easier to generate and apply through experimental transducers in certain applications than band-limited GWN waveforms.
In the context of the Wiener approach to nonlinear system identification, it is critical to understand the high-order statistical properties of GWN signals. We note that the high-order autocorrelation functions of GWN signals have a specific structure that is suitable for nonlinear system identification following the Wiener approach. Specifically, all the odd-order autocorrelation functions are uniformly zero and the even-order ones can be expressed in terms of sums of products of delta functions. This statistical "decomposition" property is critical for the development of the Wiener series and its application to nonlinear system identification, as elaborated in Appendix III and Section 2.2.3.
Let us illustrate the structure of the even-order autocorrelation functions using the fourth-order case:

$$\varphi_4(\tau_1, \tau_2, \tau_3) \triangleq E[w(t)\, w(t - \tau_1)\, w(t - \tau_2)\, w(t - \tau_3)] \qquad (A2.4)$$

Using the decomposition property of zero-mean Gaussian random variables (x_1, x_2, x_3, x_4), which states that

$$E[x_1 x_2 x_3 x_4] = E[x_1 x_2]\, E[x_3 x_4] + E[x_1 x_3]\, E[x_2 x_4] + E[x_1 x_4]\, E[x_2 x_3] \qquad (A2.5)$$

we obtain

$$\varphi_4(\tau_1, \tau_2, \tau_3) = E[w(t)\, w(t - \tau_1)] \cdot E[w(t - \tau_2)\, w(t - \tau_3)] + E[w(t)\, w(t - \tau_2)] \cdot E[w(t - \tau_1)\, w(t - \tau_3)] + E[w(t)\, w(t - \tau_3)] \cdot E[w(t - \tau_1)\, w(t - \tau_2)] \qquad (A2.6)$$

which reduces to

$$\varphi_4(\tau_1, \tau_2, \tau_3) = P^2 \{ \delta(\tau_1)\, \delta(\tau_2 - \tau_3) + \delta(\tau_2)\, \delta(\tau_3 - \tau_1) + \delta(\tau_3)\, \delta(\tau_1 - \tau_2) \} \qquad (A2.7)$$

for GWN, because of Equation (A2.2).
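The pairwise decomposition used in (A2.5)-(A2.7) is easy to verify by simulation. The following sketch draws zero-mean Gaussian vectors with an arbitrary covariance matrix (an assumption of this illustration) and compares the sample fourth moment with the sum of products of pairwise covariances.

```python
import numpy as np

rng = np.random.default_rng(0)
C = np.array([[1.0, 0.5, 0.2, 0.1],
              [0.5, 1.0, 0.3, 0.2],
              [0.2, 0.3, 1.0, 0.4],
              [0.1, 0.2, 0.4, 1.0]])   # assumed covariance (illustrative)
X = rng.multivariate_normal(np.zeros(4), C, size=500_000)

# Left side: sample estimate of E[x1 x2 x3 x4].
lhs = float(np.mean(X[:, 0] * X[:, 1] * X[:, 2] * X[:, 3]))
# Right side: E[x1 x2]E[x3 x4] + E[x1 x3]E[x2 x4] + E[x1 x4]E[x2 x3].
rhs = C[0, 1] * C[2, 3] + C[0, 2] * C[1, 3] + C[0, 3] * C[1, 2]
# The two agree to within Monte Carlo error (a few parts per thousand).
```

The same check applied to the time-shifted samples of a white process recovers the delta-function structure of Eq. (A2.7), since only coincident lags contribute.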


The decomposition property applies to any even number of zero-mean Gaussian variables and states that the expected value of the product of 2m Gaussian variables is equal to the sum of the products of the expected values of all possible pairs:

$$E[x_1 x_2 \cdots x_{2m}] = \sum \prod E[x_i x_j] \qquad (A2.8)$$

where ΣΠ denotes the sum of products of all possible pair combinations of (i, j) from the indices (1, ..., 2m). Since there are (2m)!/(m! 2^m) possible decompositions of 2m variables in m pairs, the 2mth-order autocorrelation function of GWN can be expressed as the sum of (2m)!/(m! 2^m) products of m delta functions (with the proper arguments resulting from all possible pair combinations of the time-shifted versions of w). This decomposition property is a reflection of the fact that the Gaussian process is a "second-order process" (i.e., its high-order statistics can be expressed entirely in terms of its second-order statistics).

With regard to the odd-order statistics, note that the expected value of any odd number of zero-mean Gaussian variables is zero. Therefore, the odd-order autocorrelation functions of zero-mean Gaussian processes (white or nonwhite) are uniformly zero. The same is true for all quasiwhite non-Gaussian processes with symmetric amplitude distributions (probability density functions). These properties find application in the kernel estimation method via cross-correlation presented in Sections 2.2.3 and 2.2.4.
APPENDIX III
Construction of the
Wiener Series

Wiener proposed the orthogonalization of the Volterra series for Gaussian white noise (GWN) inputs in a manner akin to a Gram-Schmidt orthogonalization procedure. Thus, the zero-order Wiener functional is a constant (like its Volterra counterpart, although of different value):

$$G_0 = h_0 \qquad (A3.1)$$

Then the first-order Wiener functional has a leading term similar to its Volterra counterpart (although involving a different kernel) plus a multiple of the zero-order term:

$$G_1(t) = \int_0^{\infty} h_1(\tau)\, x(t - \tau)\, d\tau + c_{1,0} h_0 \qquad (A3.2)$$

where the scalar c_{1,0} will be determined so that the covariance between G_1(t) and G_0 be zero (orthogonality) for a GWN input. This orthogonality condition yields c_{1,0} = 0, because a GWN signal has zero mean:

$$E[G_0 G_1(t)] = h_0 \int_0^{\infty} h_1(\tau)\, E[x(t - \tau)]\, d\tau + c_{1,0} h_0^2 = 0 \;\Rightarrow\; c_{1,0} h_0^2 = 0 \qquad (A3.3)$$

Following the same procedure for the second-order Wiener functional, we have

$$G_2(t) = \int_0^{\infty}\!\!\int_0^{\infty} h_2(\tau_1, \tau_2)\, x(t - \tau_1)\, x(t - \tau_2)\, d\tau_1 d\tau_2 + c_{2,1} \int_0^{\infty} h_1(\tau)\, x(t - \tau)\, d\tau + c_{2,0} h_0 \qquad (A3.4)$$

where c_{2,1} and c_{2,0} must be determined so that the following two orthogonality conditions be satisfied:


$$E[G_2(t)\, G_1(t)] = 0 \qquad (A3.5)$$

$$E[G_2(t)\, G_0] = 0 \qquad (A3.6)$$

From condition (A3.6) we have

$$h_0 \int_0^{\infty}\!\!\int_0^{\infty} h_2(\tau_1, \tau_2)\, E[x(t - \tau_1)\, x(t - \tau_2)]\, d\tau_1 d\tau_2 + c_{2,0} h_0^2 = 0 \;\Rightarrow\; c_{2,0} = -\frac{P}{h_0} \int_0^{\infty} h_2(\tau_1, \tau_1)\, d\tau_1 \qquad (A3.7)$$

because E[x(t − τ_1) x(t − τ_2)] = P δ(τ_1 − τ_2) and E[x(t − τ)] = 0 for a GWN input with power level P. From condition (A3.5) we get c_{2,1} = 0; otherwise, orthogonality between G_2 and G_1 cannot be secured. Therefore

$$G_2(t) = \int_0^{\infty}\!\!\int_0^{\infty} h_2(\tau_1, \tau_2)\, x(t - \tau_1)\, x(t - \tau_2)\, d\tau_1 d\tau_2 - P \int_0^{\infty} h_2(\tau_1, \tau_1)\, d\tau_1 \qquad (A3.8)$$
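The role of the subtracted constant in (A3.8) can be checked numerically: for a discrete white Gaussian input, the leading second-order term has sample mean near P·Σ_τ h2(τ, τ), so subtracting that constant leaves G2 with mean near zero, i.e., orthogonal to the constant G0. The sketch below uses an arbitrary illustrative kernel and a circular-lag shortcut, both assumptions of this illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
P = 1.0                                       # discrete white-noise power
x = rng.normal(0.0, np.sqrt(P), 200_000)
L = 8
h2 = np.outer(np.exp(-np.arange(L) / 3.0),    # illustrative separable kernel
              np.exp(-np.arange(L) / 3.0))

# Lagged copies x(n - t) for t = 0..L-1 (circular wrap is negligible for
# long records).
Xlags = np.array([np.roll(x, t) for t in range(L)])
lead = np.einsum('ij,in,jn->n', h2, Xlags, Xlags)
G2 = lead - P * np.trace(h2)                  # subtract P * sum_t h2(t, t)
# np.mean(lead) is near P * trace(h2), so np.mean(G2) is near zero.
```

This is the discrete analogue of the orthogonality condition E[G2 G0] = 0 enforced in the derivation above.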

This procedure can be continued for higher-order Wiener functionals and leads to separation of odd and even functional terms, because of the statistical properties of GWN discussed in Appendix II. For instance, the third-order Wiener functional is

$$G_3(t) = \int_0^{\infty}\!\!\int_0^{\infty}\!\!\int_0^{\infty} h_3(\tau_1, \tau_2, \tau_3)\, x(t - \tau_1)\, x(t - \tau_2)\, x(t - \tau_3)\, d\tau_1 d\tau_2 d\tau_3 - 3P \int_0^{\infty}\!\!\int_0^{\infty} h_3(\tau_1, \lambda, \lambda)\, x(t - \tau_1)\, d\tau_1 d\lambda \qquad (A3.9)$$
For additional details, see Marmarelis & Marmarelis (1978) or Schetzen (1980).
Using the orthogonality ofthe Wiener functionals, we can derive a general expression
for the autocorrelation function of the output of a nonlinear system:

E[y(t)y(t- u)] = IE[Glt)Glt- u)] (A3.10)


i=O

which indicates that the autocorrelation of the system output is composed of the partial
autocorrelations of the Wiener functionals.
A corollary ofthe general expression (A3.10) is the expression for the output variance
in terms of the Wiener kemels, since

Var[y(t)] = E[y2(t)] - hfj (A3.11)

and

E[y^2(t)] = \sum_{r=0}^{\infty} E[G_r^2(t)] = \sum_{r=0}^{\infty} r!\, P^r \int_0^\infty \cdots \int_0^\infty h_r^2(\tau_1, \ldots, \tau_r)\, d\tau_1 \cdots d\tau_r \qquad (A3.12)

utilizing the autocorrelation properties of the GWN process (see Appendix II).
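Expressions (A3.11) and (A3.12) can be verified by simulation. In this illustrative sketch (discrete time, unit spacing, arbitrary kernels; the setup is an assumption, not from the book), the sample variance of y = G_0 + G_1 + G_2 is compared with P \sum h_1^2 + 2P^2 \sum h_2^2, i.e., the r! P^r terms for r = 1, 2.

```python
# Illustrative check of Var[y] = sum_r r! P^r (sum of h_r^2) for a
# discrete-time output y = G0 + G1 + G2 driven by unit-spaced GWN.
import numpy as np

rng = np.random.default_rng(1)
N, M, P = 400_000, 4, 2.0
x = rng.normal(0.0, np.sqrt(P), N)   # GWN with power level P

h0 = 1.0
h1 = rng.normal(size=M)
h2 = rng.normal(size=(M, M))
h2 = 0.5 * (h2 + h2.T)               # symmetric second-order kernel

# Lagged data matrix X[n, tau] = x[n - tau]
X = np.stack([x[M - tau : N - tau] for tau in range(M)], axis=1)
y = h0 + X @ h1 + np.einsum('ni,ij,nj->n', X, h2, X) - P * np.trace(h2)

var_theory = P * np.sum(h1**2) + 2 * P**2 * np.sum(h2**2)   # r = 1 and r = 2 terms
var_sample = np.var(y)
print(var_sample, var_theory)
```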
APPENDIX IV
Stationarity, Ergodicity, and
Autocorrelation Functions of
Random Processes

The time course of continuous observations of variables associated with a natural phenomenon
or experiment can be described by a "random process," because of the stochastic
element intrinsic to these observations, which necessitates "probabilistic" (as opposed
to "deterministic") descriptions of these variables.
Thus a random process (RP) can be viewed as a function of time (or signal) whose value
at each point in time is a random variable and can be described only probabilistically
(i.e., we can assign a certain probability of occurrence to each possible value of this signal
at any given time). The RP, X(t), is often denoted with a capital letter as the "ensemble" of
all possible realizations {x_i(t)}. Each realization is termed a "sample function" and is denoted
by lowercase letters. The amplitude probability density function (APDF) p(x, t) of
the RP is defined as

\mathrm{Prob}\{x_0 \le X(t) < x_0 + dx\} = p(x_0, t)\, dx \qquad (A4.1)

and may generally depend on time t. Likewise, the kth joint APDF can be defined as

\mathrm{Prob}\{x_1 \le X(t_1) < x_1 + dx_1, \ldots, x_k \le X(t_k) < x_k + dx_k\} = p_k(x_1, \ldots, x_k; t_1, \ldots, t_k)\, dx_1 \cdots dx_k \qquad (A4.2)

When all joint APDFs are time-invariant, the RP is called "stationary" and the expres-
sions are simplified by omitting the explicit reference to time. This, in fact, is the main
class of RPs considered in practice.
In order to describe the statistical relations among the different samples/values of the
RP at different times, we introduce the autocorrelation functions


\phi_k(t_1, \ldots, t_k) = E[X(t_1) \cdots X(t_k)] = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} x_1 \cdots x_k\, p_k(x_1, \ldots, x_k; t_1, \ldots, t_k)\, dx_1 \cdots dx_k \qquad (A4.3)

which are somewhat simplified for stationary RPs by considering only the time differences
(t_i - t_1), for i = 2, \ldots, k, as affecting the value of the kth-order autocorrelation function.
In addition to stationarity, the other key attribute of RPs is ergodicity. An ergodic RP
retains the same statistical characteristics throughout the entire ensemble, the same way a
stationary process retains the same statistical characteristics over time. In practice, we
typically assume that the observed RP is ergodic, although this assumption ought to be
tested and examined. When the RP is ergodic and stationary, the ensemble averaging can
be replaced by time averaging. Therefore, the kth-order autocorrelation function of a stationary
and ergodic RP is given by

\phi_k(\tau_1, \ldots, \tau_{k-1}) = \lim_{R \to \infty} \frac{1}{2R} \int_{-R}^{R} x(t)\, x(t-\tau_1) \cdots x(t-\tau_{k-1})\, dt \qquad (A4.4)

It is evident that in practice (where only finite data records are available) estimates of
the autocorrelation functions are obtained for finite R. Likewise, only estimates of the
APDF can be obtained via amplitude histograms over the finite data records.
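A minimal numerical sketch of this time-averaging estimate (the AR(1) example and its parameters are illustrative assumptions, not from the book): for a stationary and ergodic first-order autoregressive process, the finite-record time average of x(t)x(t - \tau) approaches the known ensemble autocorrelation \sigma^2 a^\tau.

```python
# Time-averaged estimate of the second-order autocorrelation (finite-R analogue
# of Eq. (A4.4)) for a stationary, ergodic AR(1) process with known phi(tau).
import numpy as np

rng = np.random.default_rng(2)
N, a = 200_000, 0.8
w = rng.normal(size=N)               # white innovations
x = np.empty(N)
x[0] = w[0] / np.sqrt(1 - a**2)      # start in the stationary distribution
for n in range(1, N):
    x[n] = a * x[n - 1] + w[n]

sigma2 = 1.0 / (1 - a**2)            # stationary variance of the AR(1) process
for tau in range(5):
    phi_hat = np.mean(x[tau:] * x[: N - tau])   # finite-record time average
    print(tau, phi_hat, sigma2 * a**tau)        # estimate vs. ensemble value
```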
In a practical context, one or more realizations of the RP can be recorded over a finite
time interval, and estimates of the APDF and the autocorrelation functions can be obtained.
The respective multidimensional Fourier transform of the autocorrelation function
defines the "polyspectrum" or "high-order spectrum" of the RP for the respective order.
Typically, only the second-order autocorrelation function is estimated, yielding an
estimate of the RP spectrum via the finite Fourier transform. Occasionally, the bispectrum
and the trispectrum are estimated from the third- and fourth-order autocorrelation functions
via the respective Fourier transforms. This is meaningful only for non-Gaussian
processes, since Gaussian processes are fully described by the second-order autocorrelation
function (see Appendix II for the "decomposition" property of Gaussian variables).
Thus, the two aspects of an ergodic and stationary RP that are typically examined are
its spectrum (or the corresponding second-order autocorrelation function) and its APDF.
The former allows the classification into white or nonwhite RPs, and the latter determines
the amplitude characteristics (Gaussian or not, multilevel, etc.).
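The white/nonwhite classification can be sketched with a simple segment-averaged periodogram (the generated signals, segment length, and thresholds below are illustrative assumptions): a GWN record yields a nearly flat spectral estimate, whereas an AR(1) record is strongly low-pass and therefore nonwhite.

```python
# Segment-averaged periodogram as a crude spectral estimate, used to separate
# a white record from a nonwhite (AR(1)) record.  Parameters are illustrative.
import numpy as np

rng = np.random.default_rng(3)
N, L, a = 262_144, 1024, 0.9
white = rng.normal(size=N)           # GWN record
w = rng.normal(size=N)               # innovations for the AR(1) record
ar1 = np.empty(N)
ar1[0] = w[0] / np.sqrt(1 - a**2)
for n in range(1, N):
    ar1[n] = a * ar1[n - 1] + w[n]

def avg_periodogram(sig, seg_len):
    """Average the periodograms of consecutive non-overlapping segments."""
    segs = sig[: (len(sig) // seg_len) * seg_len].reshape(-1, seg_len)
    return np.mean(np.abs(np.fft.rfft(segs, axis=1)) ** 2, axis=0) / seg_len

Pw = avg_periodogram(white, L)
Par = avg_periodogram(ar1, L)
print(Pw[1:-1].max() / Pw[1:-1].min())   # close to flat for the white record
print(Par[1] / Par[-1])                  # large for the low-pass AR(1) record
```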
References

Aaslid, R., K.F. Lindegaard, W. Sorteberg, and H. Nornes. (1989). Cerebral autoregulation dynamics in humans. Stroke 20:45-52.
Abdel-Malek, A. and V.Z. Marmarelis. (1988). Modeling of task-dependent characteristics of human operator dynamics during pursuit manual tracking. IEEE Transactions on Systems, Man, and Cybernetics 18:163-172.
Abdel-Malek, A., C.H. Markham, P.Z. Marmarelis, and V.Z. Marmarelis. (1988). Quantifying deficiencies associated with Parkinson's disease by use of time-series analysis. Journal of Electroencephalography & Clinical Neurophysiology 69:24-33.
Adelson, E.H. and J.R. Bergen. (1985). Spatiotemporal energy models for the perception of motion. J. Opt. Soc. Am. A 2:284-299.
Aertsen, A.M. and P.I. Johannesma. (1981). The spectrotemporal receptive field: a functional characteristic of auditory neurons. Biological Cybernetics 69:407-414.
Alataris, K., T.W. Berger, and V.Z. Marmarelis. (2000). A novel network for nonlinear modeling of
neural systems with arbitrary point-process inputs. Neural Networks 13:255-266.
Amorocho, J. and A. Brandstetter. (1971). Determination of nonlinear functional response functions in rainfall-runoff processes. Water Resources Research 7:1087-1101.
Aracil, J. (1970). Measurements of Wiener kernels with binary random signals. IEEE Transactions on Automatic Control 15:123-125.
Arbib, M.A., P.L. Falb, and R.E. Kalman. (1969). Topics in Mathematical System Theory. McGraw-Hill, New York.
Astrom, K.J. and P. Eykhoff. (1971). System identification-a survey. Automatica 7:123-162.
Barahona, M. and C.S. Poon. (1996). Detection of nonlinear dynamics in short, noisy time series.
Nature 381:215-217.
Bardakjian, B.L., W.N. Wright, T.A. Valiante, and P.L. Carlen. (1994). Nonlinear system identification of hippocampal neurons. In: Advanced Methods of Physiological System Modeling, Volume III, V.Z. Marmarelis (Ed.), Plenum, New York, pp. 179-194.


Barker, H.A. (1967). Choice of pseudorandom binary signals for system identification. Electronics Letters 3:524-526.
Barker, H.A. (1968). Elimination of quadratic drift errors in system identification by pseudorandom binary signals. Electronics Letters 4:255-256.
Barker, H.A. and R.W. Davy. (1975). System identification using pseudorandom signals and the discrete Fourier transform. Proceedings IEE 122:305-311.
Barker, H.A. and S.N. Obidegwu. (1973). Effects of nonlinearities on the measurement of weighting functions by crosscorrelation using pseudorandom signals. Proceedings IEE 120:1293-1300.
Barker, H.A. and T. Pradisthayon. (1970). High-order autocorrelation functions of pseudorandom signals based on M-sequences. Proceedings IEE 117:1857-1863.
Barker, H.A., S.N. Obidegwu, and T. Pradisthayon. (1972). Performance of antisymmetric pseudorandom signals in the measurement of second-order Volterra kernels by crosscorrelation. Proceedings IEE 119:353-362.
Barrett, J.F. (1963). The use of functionals in the analysis of nonlinear physical systems. J. Electron. Control 15:567-615.
Barrett, J.F. (1965). The use of Volterra series to find the region of stability of a nonlinear differential equation. International Journal of Control 1:209-216.
Barrett, T.R. (1975). On linearizing nonlinear systems. Journal of Sound Vibration 39:265-268.
Bassingthwaighte, J.B., L.S. Liebovitch, and B.J. West. (1994). Fractal Physiology, Oxford Uni-
versity Press, Oxford.
Baumgartner, S.L. and W.J. Rugh. (1975). Complete identification of a class of nonlinear systems
from steady state frequency response. IEEE Trans. on Circuits and Systems 22:753-758.
Bedrosian, E. and S.O. Rice. (1971). The output properties of Volterra systems (nonlinear systems with memory) driven by harmonic and Gaussian inputs. Proceedings IEEE 59:1688-1707.
Bekey, G.A. (1973). Parameter estimation in biological systems: A survey. Proceedings of the Third IFAC Symposium-Identification and System Parameter Estimation, North-Holland, Amsterdam, pp. 1123-1130.
Bellman, R. and K.J. Astrom. (1969). On structural identifiability. Math. Biosci. 1:329-339.
Belozeroff, V., R.B. Berry, and M.C.K. Khoo. (2002). Model-based assessment of autonomic control in obstructive sleep apnea syndrome. Sleep 26(1):65-73.
Benardete, E.A. and J.D. Victor. (1994). An extension of the m-sequence technique for the analysis of multi-input nonlinear systems. In: Advanced Methods of Physiological System Modeling, Volume III, V.Z. Marmarelis (Ed.), Plenum, New York, pp. 87-110.
Bendat, J.S. (1976). System identification from multiple input/output data. Journal of Sound Vibration 49:293-308.
Bendat, J.S. (1998). Nonlinear Systems Techniques and Applications. Wiley, New York.
Bendat, J.S. and A.G. Piersol. (1986). Random Data: Analysis and Measurement Procedures, 2nd
Edition. Wiley, New York.
Berger, T.W. and R.J. Sclabassi. (1988). Long-term potentiation and its relation to changes in hippocampal pyramidal cell activity and behavioral learning during classical conditioning. In: Long-term Potentiation: From Biophysics to Behavior, Alan R. Liss, New York, pp. 467-497.
Berger, T.W., G. Chauvet, and R.J. Sclabassi. (1994). A biologically-based model of the functional properties of the hippocampus. Neural Networks 7:1031-1064.
Berger, T.W., G.B. Robinson, R.L. Port, and R.J. Sclabassi. (1987). Nonlinear systems analysis of the functional properties of hippocampal formation. In: Advanced Methods of Physiological System Modeling, Volume I, V.Z. Marmarelis (Ed.), Biomedical Simulations Resource, Los Angeles, pp. 73-103.

Berger, T.W., J.L. Eriksson, D.A. Ciarolla, and R.J. Sclabassi. (1988a). Nonlinear systems analysis of the hippocampal perforant path-dentate projection. II. Effect of random pulse stimulation. Journal of Neurophysiology 60:1077-1094.
Berger, T.W., J.L. Eriksson, D.A. Ciarolla, and R.J. Sclabassi. (1988b). Nonlinear systems analysis of the hippocampal perforant path-dentate projection. III. Comparison of random train and paired impulse stimulation. Journal of Neurophysiology 60:1095-1109.
Berger, T.W., T.P. Harty, G. Barrionuevo, and R.J. Sclabassi. (1989). Modeling of neuronal networks through experimental decomposition. In: Advanced Methods of Physiological System Modeling, Volume II, V.Z. Marmarelis (Ed.), Plenum, New York, pp. 113-128.
Berger, T.W., G. Barrionuevo, S.P. Levitan, D.N. Krieger, and R.J. Sclabassi. (1991). Nonlinear systems analysis of network properties of the hippocampal formation. In: Neurocomputation and Learning: Foundations of Adaptive Networks, J.W. Moore and M. Gabriel (Eds.), MIT Press, Cambridge, MA, pp. 283-352.
Berger, T.W., G. Barrionuevo, G. Chauvet, D.N. Krieger, S.P. Levitan, and R.J. Sclabassi. (1993). A theoretical and experimental strategy for realizing a biologically based model of the hippocampus. In: Synaptic Plasticity: Molecular, Cellular and Functional Aspects, R.F. Thompson and J.L. Davis (Eds.), MIT Press, Cambridge, MA, pp. 169-207.
Berger, T.W., T.P. Harty, C. Choi, X. Xie, G. Barrionuevo, and R.J. Sclabassi. (1994). Experimental basis for an input/output model of the hippocampal formation. In: Advanced Methods of Physiological System Modeling, Volume III, V.Z. Marmarelis (Ed.), Plenum, New York, pp. 29-54.
Berger, T.W., M. Baudry, R.D. Brinton, J.-S. Liaw, V.Z. Marmarelis, A.Y. Park, B.J. Sheu, and A.R. Tanguay, Jr. (2001). Brain-implantable biomimetic electronics as the next era in neural prosthetics. Proceedings IEEE 89:993-1012.
Bergman, R.N. and J.C. Lovejoy (Eds.). (1997). The Minimal Model Approach and Determinants of Glucose Tolerance. Pennington Center Nutrition Series, Vol. 7, Louisiana State Univ. Press, Baton Rouge, LA and London.
Bergman, R.N., C.R. Bowden, and C. Cobelli. (1981). The minimal model approach to quantification of factors controlling glucose disposal in man. In: Carbohydrate Metabolism, Wiley, New York, pp. 269-296.
Billings, S.A. (1980). Identification of nonlinear systems-a survey. Proceedings IEE 127:272-285.
Billings, S.A. and S.Y. Fakhouri. (1978). Identification of a class of nonlinear systems using correlation analysis. Proceedings IEE 125:691-695.
Billings, S.A. and S.Y. Fakhouri. (1979). Identification of nonlinear unity feedback systems. International Journal of System Science 10:1401-1408.
Billings, S.A. and S.Y. Fakhouri. (1981). Identification of nonlinear systems using correlation analysis and pseudorandom inputs. International Journal of System Science 11:261-279.
Billings, S.A. and S.Y. Fakhouri. (1982). Identification of systems containing linear dynamic and static nonlinear elements. Automatica 18:15-26.
Billings, S.A. and I.J. Leontaritis. (1982). Parameter estimation techniques for nonlinear systems. In: IFAC Symposium on Identification and System Parameter Estimation, Arlington, VA, pp. 427-432.
Billings, S.A. and W.S.F. Voon. (1984). Least-squares parameter estimation algorithms for nonlinear systems. International Journal of System Science 15:610-615.
Billings, S.A. and W.S.F. Voon. (1986). A prediction-error and stepwise-regression estimation algorithm for nonlinear systems. International Journal of Control 44:803-822.
Blasi, A., J. Jo, E. Valladares, B.J. Morgan, J.B. Skatrud, and M.C. Khoo. (2003). Cardiovascular variability after arousal from sleep: time-varying spectral analysis. Journal of Applied Physiology 95(4):1394-1404.

Blum, E.K. and L.K. Li. (1991). Approximation theory and feedforward networks. Neural Networks 4:511-515.
Boden, G., X. Chen, J. Ruiz, J.V. White, and L. Rossetti. (1994). Mechanism of fatty-acid induced inhibition of glucose uptake. Journal of Clinical Investigation 93:2438-2446.
Borsellino, A. and M.G. Fuortes. (1968). Responses to single photons in visual cells of Limulus. Journal of Physiology 196:507-539.
Bose, A.G. (1956). A theory of nonlinear systems. Technical Report No. 309, Research Laboratory of Electronics, M.I.T., Cambridge, MA.
Boyd, S. and L.O. Chua. (1985). Fading memory and problem of approximating nonlinear operators with Volterra series. IEEE Transactions on Circuits and Systems 32:1150-1160.
Boyd, S., L.O. Chua, and C.A. Desoer. (1984). Analytical foundation of Volterra series. J. Math. Contr. Info. 1:243-282.
Boyd, S., Y.S. Tang, and L.O. Chua. (1983). Measuring Volterra kernels. IEEE Transactions on Circuits and Systems 30:571-577.
Briggs, P.A. and K.R. Godfrey. (1966). Pseudorandom signals for the dynamic analysis of multivariable systems. Proceedings IEEE 113:1259-1267.
Brilliant, M.B. (1958). Theory of the analysis of nonlinear systems. Technical Report No. 345, Research Laboratory of Electronics, M.I.T., Cambridge, MA.
Brillinger, D.R. (1970). The identification of polynomial systems by means of higher order spectra. Journal of Sound Vibration 12:301-313.
Brillinger, D.R. (1975a). The identification of point process systems. Annals of Probability 3:909-929.
Brillinger, D.R. (1975b). Time Series: Data Analysis and Theory. Holt, Rinehart & Winston, New York.
Brillinger, D.R. (1976). Measuring the association of point processes: A case history. The American Mathematical Monthly 83:16-22.
Brillinger, D.R. (1987). Analyzing interacting nerve cell spike trains to assess causal connections. In: Advanced Methods of Physiological System Modeling, Volume I, V.Z. Marmarelis (Ed.), Biomedical Simulations Resource, Los Angeles, pp. 29-40.
Brillinger, D.R. (1988). The maximum likelihood approach to the identification of neuronal firing systems. Annals of Biomedical Engineering 16:3-16.
Brillinger, D.R. (1989). Parameter estimation for nongaussian processes via second and third order spectra with an application to some endocrine data. In: Advanced Methods of Physiological System Modeling, Volume II, V.Z. Marmarelis (Ed.), Plenum, New York, pp. 53-62.
Brillinger, D.R., H. Bryant, and J.P. Segundo. (1976). Identification of synaptic interactions. Biological Cybernetics 22:213-228.
Brillinger, D.R. and J.P. Segundo. (1979). Empirical examination of the threshold model of neuron firing. Biological Cybernetics 35:213-220.
Brillinger, D.R. and A.E.P. Villa. (1994). Examples of the investigation of neural information processing by point process analysis. In: Advanced Methods of Physiological System Modeling, Volume III, V.Z. Marmarelis (Ed.), Plenum, New York, pp. 111-127.
Brockett, R.W. (1976). Volterra series and geometric control theory. Automatica 12:167-176.
Bryant, H.L., A.R. Marcos, and J.P. Segundo. (1973). Correlations of neuronal spike discharges produced by monosynaptic connections and by common inputs. Journal of Neurophysiology 36:205-225.
Bryant, H.L. and J.P. Segundo. (1976). Spike initiation by transmembrane current: a white-noise analysis. Journal of Physiology 260:279-314.
Bussgang, J.J. (1952). Crosscorrelation functions of amplitude distorted Gaussian signals. Technical Report No. 216, M.I.T. Research Laboratory of Electronics, Cambridge, MA.

Bussgang, J.J., L. Ehrman, and J.W. Graham. (1974). Analysis of nonlinear systems with multiple inputs. Proceedings IEEE 62:1088-1119.
Cambanis, S. and B. Liu. (1971). On the expansion of a bivariate distribution and its relationship to the output of a nonlinearity. IEEE Transactions on Information Theory 17:17-25.
Cameron, R.H. and W.T. Martin. (1947). The orthogonal development of nonlinear functionals in series of Fourier-Hermite functionals. Annals of Mathematics 48:385-392.
Carson, E.R., C. Cobelli, and L. Finkelstein. (1983). The Mathematical Modeling of Metabolic and Endocrine Systems. Wiley, New York.
Caumo, A., P. Vicini, J.J. Zachwieja, A. Avogaro, K. Yarasheski, D.M. Bier, and C. Cobelli. (1996). Undermodeling affects minimal model indexes: insights from a two compartmental model. American Journal of Physiology 276:E1171-E1193.
Chan, R.Y. and K.-I. Naka. (1980). Spatial organization of catfish retinal neurons. II. Circular stimulus. Journal of Neurophysiology 43:832.
Chang, F.H.I. and R. Luus. (1971). A non-iterative method for identification using Hammerstein model. IEEE Transactions on Automatic Control 16:464-468.
Chappell, R.L., K.-I. Naka, and M. Sakuranaga. (1984). Turtle and catfish horizontal cells show different dynamics. Vision Research 24:117-128.
Chappell, R.L., K.-I. Naka, and M. Sakuranaga. (1985). Dynamics of turtle horizontal cells. Journal of General Physiology 86:423-453.
Chen, H.-W., N. Ishii, and N. Suzumura. (1986). Structural classification of non-linear systems by input and output measurements. International Journal of System Science 17:741-774.
Chen, H.-W., N. Ishii, M. Sakuranaga, and K.-I. Naka. (1985). A new method for the complete identification of some classes of nonlinear systems. In: 15th NIBB Conference on Information Processing in Neuron Network, K.-I. Naka and Y.I. Ando (Eds.), Okazaki, Japan.
Chen, H.-W., D. Jacobson, and J.P. Gaska. (1990). Structural classification of multi-input nonlinear systems. Biological Cybernetics 63:341-357.
Chen, H.-W., D. Jacobson, J.P. Gaska, and D.A. Pollen. (1993). Cross-correlation analyses of nonlinear systems with spatiotemporal inputs. IEEE Transactions on Biomedical Engineering 40:1102-1113.
Chian, M.T., V.Z. Marmarelis, and T.W. Berger. (1998). Characterization of unobservable neural circuitry in the hippocampus with nonlinear system analysis. In: Computational Neuroscience, J.M. Bower (Ed.), Plenum Press, New York, pp. 43-50.
Chian, M.T., V.Z. Marmarelis, and T.W. Berger. (1999). Decomposition of neural systems with nonlinear feedback using stimulus-response data. Neurocomputing 26-27:641-654.
Chon, K.H., N.H. Holstein-Rathlou, and V.Z. Marmarelis. (1998a). Comparative nonlinear modeling of renal autoregulation in rats: Volterra approach vs. artificial neural networks. IEEE Transactions on Neural Networks 9:430-435.
Chon, K.H., T.J. Mullen, and R.J. Cohen. (1996). A dual-input nonlinear system analysis of autonomic modulation of heart rate. IEEE Trans. on Biomedical Engineering 43:530-544.
Chon, K.H., N.-H. Holstein-Rathlou, D.J. Marsh, and V.Z. Marmarelis. (1994a). Parametric and nonparametric nonlinear modeling of renal autoregulation dynamics. In: Advanced Methods of Physiological System Modeling, Volume III, V.Z. Marmarelis (Ed.), Plenum, New York, pp. 195-210.
Chon, K.H., Y.M. Chen, N.H. Holstein-Rathlou, D.J. Marsh, and V.Z. Marmarelis. (1998b). Nonlinear system analysis of renal autoregulation in normotensive and hypertensive rats. IEEE Trans. Biomedical Engineering 45:342-353.
Chon, K.H., Y.M. Chen, N.H. Holstein-Rathlou, D.J. Marsh, and V.Z. Marmarelis. (1993). On the efficacy of linear system analysis of renal autoregulation in rats. IEEE Trans. Biomedical Engineering 40:8-20.

Chon, K.H., Y.M. Chen, V.Z. Marmarelis, D.J. Marsh, and N.H. Holstein-Rathlou. (1994b). Detection of interactions between myogenic and TGF mechanisms using nonlinear analysis. American Journal of Physiology 265:F160-F173.
Chua, L. and Y. Liao. (1989). Measuring Volterra kernels (II). International Journal of Circuit Theory and Applications 17:151-190.
Chua, L. and Y. Liao. (1991). Measuring Volterra kernels (III). International Journal of Circuit Theory and Applications 19:189-209.
Chua, L. and C. Ng. (1979). Frequency domain analysis of nonlinear systems: general theory, formulation of transfer functions. IEEE Transactions on Circuits and Systems 3:165-185, 257-269.
Church, R. (1935). Tables of irreducible polynomials for the first four prime moduli. Annals of Mathematics 36:198.
Citron, M.C. (1987). Spatiotemporal white noise analysis of retinal receptive fields. In: Advanced Methods of Physiological System Modeling, Volume I, V.Z. Marmarelis (Ed.), Biomedical Simulations Resource, Los Angeles, pp. 161-171.
Citron, M.C. and R.C. Emerson. (1983). White noise analysis of cortical directional selectivity in cat. Brain Research 279:271-277.
Citron, M.C. and V.Z. Marmarelis. (1987). Application of minimum-order Wiener modeling to retinal ganglion cell spatio-temporal dynamics. Biological Cybernetics 57:241-247.
Citron, M.C., R.C. Emerson, and L.A. Ide. (1981a). Spatial and temporal receptive-field analysis of the cat's geniculocortical pathway. Vision Research 21:385-397.
Citron, M.C., J.P. Kroeker, and G.D. McCann. (1981b). Non-linear interactions in ganglion cell receptive fields. Journal of Neurophysiology 46:1161-1176.
Citron, M.C., R.C. Emerson, and W.R. Levick. (1988). Nonlinear measurement and classification of receptive fields in cat retinal ganglion cells. Annals of Biomedical Engineering 16:65-77.
Clever, W.C. and W.C. Meecham. (1972). Time-dependent Wiener-Hermite base for turbulence. Physics of Fluids 15:244-255.
Cobelli, C. and A. Mari. (1983). Validation of mathematical models of complex endocrine-metabolic systems. A case study of a model of glucose regulation. Medical & Biological Engineering & Computing 21:390-399.
Cobelli, C. and G. Pacini. (1988). Insulin secretion and hepatic extraction in humans by minimal modeling of C-peptide and insulin kinetics. Diabetes 37:223-231.
Courellis, S.H. and V.Z. Marmarelis. (1989). Wiener analysis of the Hodgkin-Huxley equations. In: Advanced Methods of Physiological System Modeling, Volume II, V.Z. Marmarelis (Ed.), Plenum, New York, pp. 273-289.
Courellis, S.H. and V.Z. Marmarelis. (1990). An artificial neural network for motion detection and speed estimation. In: International Joint Conference on Neural Networks, Volume I, San Diego, CA, pp. 407-422.
Courellis, S.H. and V.Z. Marmarelis. (1991a). Sensitivity enhancement of elementary velocity estimators with self and lateral facilitation. In: Proceedings of the IEEE International Joint Conference on Neural Networks, Volume I, Seattle, WA, pp. 749-758.
Courellis, S.H. and V.Z. Marmarelis. (1991b). Speed ranges accommodated by network architectures of elementary velocity estimators. In: Proceedings of Visual Communication and Image Processing, Volume 1606, Boston, MA, pp. 336-349.
Courellis, S.H. and V.Z. Marmarelis. (1992a). Nonlinear functional representations for motion detection and speed estimation schemes. In: Nonlinear Vision, R. Pinter and B. Nabet (Eds.), CRC Press, Boca Raton, FL, pp. 91-108.
Courellis, S.H. and V.Z. Marmarelis. (1992b). Velocity estimators of visual motion in two spatial dimensions. In: International Joint Conference on Neural Networks, Volume III, Baltimore, MD, pp. 72-83.

Courellis, S.H., V.Z. Marmarelis, and T.W. Berger. (2000). Modeling event-driven nonlinear dynamics in biological neural networks. In: Proceedings of the 7th Symposium on Neural Computation, Los Angeles, CA, Volume 10, pp. 28-35.
Cronin, J. (1987). Mathematical Aspects of Hodgkin-Huxley Neural Theory. Cambridge University Press, Cambridge.
Crum, L.A. and J.A. Heinen. (1974). Simultaneous reduction and expansion of multidimensional Laplace transform kernels. SIAM Journal of Applied Mathematics 26:753-771.
Curlander, J.C. and V.Z. Marmarelis. (1983). Processing of visual information in the distal neurons of the vertebrate retina. IEEE Transactions on Systems, Man and Cybernetics 13:934-943.
Curlander, J.C. and V.Z. Marmarelis. (1987). A linear spatio-temporal model of the light-to-bipolar cell system and its response characteristics to moving bars. Biological Cybernetics 57:357-363.
Cybenko, G. (1989). Approximation by superpositions of a sigmoidal function. Mathematics of Control, Signals & Systems 2:303-314.
D'Argenio, D.Z. (Ed.). (1991). Advanced Methods of Pharmacokinetic and Pharmacodynamic Systems Analysis, Volume I. Plenum, New York.
D'Argenio, D.Z. (Ed.). (1995). Advanced Methods of Pharmacokinetic and Pharmacodynamic Systems Analysis, Volume II. Plenum, New York.
D'Argenio, D.Z. and V.Z. Marmarelis. (1987). Experiment design for biomedical system modeling. In: Systems & Control Encyclopedia: Theory, Technology, Applications, M.G. Singh (Ed.), Pergamon Press, Oxford, pp. 486-490.
Dalal, S.S., V.Z. Marmarelis, and T.W. Berger. (1997). A nonlinear positive feedback model of glutamatergic synaptic transmission in dentate gyrus. In: Proceedings of the 4th Joint Symposium on Neural Computation, Los Angeles, CA, 7:68-75.
Dalal, S.S., V.Z. Marmarelis, and T.W. Berger. (1998). A nonlinear systems approach of characterizing AMPA and NMDA receptor dynamics. In: Computational Neuroscience, J.M. Bower (Ed.), Plenum Press, New York, pp. 155-160.
Davies, W.D.T. (1970). System Identification for Self-Adaptive Control. Wiley, New York.
Davis, G.W. and K.-I. Naka. (1980). Spatial organizations of catfish retinal neurons. Single- and random-bar stimulation. Journal of Neurophysiology 43:807-831.
de Boer, E. and H.R. de Jongh. (1978). On cochlear encoding: potentialities and limitations of the reverse-correlation technique. Journal of the Acoustical Society of America 63:115-135.
de Boer, E. and P. Kuyper. (1968). Triggered correlation. IEEE Transactions on Biomedical Engineering 15:169-179.
de Boer, E. and A.L. Nuttall. (1997). On cochlear cross-correlation functions: connecting nonlinearity and activity. In: Diversity in Auditory Mechanics, E.R. Lewis, G.R. Long, R.F. Lyon, P.M. Narins, C.R. Steele, and E. Hecht-Poinar (Eds.), World Scientific Press, Singapore, pp. 291-304.
de Figueiredo, R.J.P. (1980). Implications and applications of Kolmogorov's superposition theorem. IEEE Trans. Autom. Contr. 25:1227-1231.
Deutsch, R. (1955). On a method of Wiener for noise through nonlinear devices. IRE Convention Record, Part 4, pp. 186-192.
Deutsch, S. and E. Micheli-Tzanakou. (1987). Neuroelectric Systems. New York University Press, New York.
Dimoka, A., S.H. Courellis, D. Song, V.Z. Marmarelis, and T.W. Berger. (2003). Identification of the lateral and medial perforant path of the hippocampus using single and dual random impulse train stimulation. In: Proceedings of the IEEE EMBS Conference, Cancun, Mexico, pp. 1933-1936.
Dittman, J.S., A.C. Kreitzer, and W.G. Regehr. (2000). Interplay between facilitation, depression and residual calcium at three presynaptic terminals. J. Neuroscience 20:1374-1385.

Dowling, I.E. and B.B. Boycott. (1966). Organization of the primate retina: electron microscopy.
Proc. Roy. Soc. (London) Sero B. 166:80-111.
Eckert, H. and L.G. Bishop. (1975). Nonlinear dynamic transfer characteristics of cells in the pe-
ripheral visual pathway of flies. Part I: The retinula cells. Biological Cybernetics 17:1-6.
Edvinsson, L. and D.N. Krause. (2002). Cerebral Blood Flow and Metabolism. Lippincott Williams
and Wilkins, Philadelphia.
Eggermont, 1.1. (1993). Wiener and Volterra analyses applied to the auditory system. Hearing Re-
search 66: 177-201.
Eggermont, 1.1., P.1. Johannesma, and A.M. Aertsen (1983). Quantitative characterization proce-
dure for auditory neurons based on the spectro-temporal receptive field. Hearing Research
10:167-190.
Eggermont, 1.1., P.I.M. Johannesma, and A.M. Aertsen. (1983). Reverse-correlation methods in au-
ditory research. Quarterly Reviews ofBiophysics 16:341.
Emerson, R.C., I.R. Bergen, and E.H. Adelson. (1992). Directionally selective complex cells and
the computation of motion energy in cat visual cortex. Vision Research 32:203-218.
Emerson, R.C. and M.C. Citron. (1988). How linear and nonlinear mechanisms contribute to direc-
tional selectivity in simple cells of cat striate cortex. Invest. Opthalmol. & Vis. Sci., Suppl.
29:23.
Emerson, R.C., M.l. Korenberg, and M.C. Citron. (1989). Identification of intensive nonlinearities
in cascade models of visual cortex and its relation to cell classification. In: Advanced Methods of
Physiological System Modeling, Volume IL V.Z. Marmarelis (Ed.), Plenum, New York, pp.
97-112.
Emerson, R.C., M.l. Korenberg, and M.C. Citron. (1992). Identification of complex-cell intensive
nonlinearities in a cascade model of cat visual cortex. Biological Cybernetics 66:291-300.
Emerson, R.C., M.C. Citron, W.J. Vaughn, and S.A. Klein. (1987). Nonlinear directionally selec-
tive subunits in complex cells of cat striate cortex. J. Neurophysiol. 58: 33-65.
Emerson, R.C. and G.L. Gerstein. (1977). Simple striate neurons in the cat. I. Comparison of re-
sponses to moving and stationary stimuli. J. Neurophysiol. 40:119-135.
Eykhoff, P. (1963). Some fundamental aspects ofprocess-parameter estimation. IEEE Transactions
on Automatie Control8:347-357.
Eykhoff, P. (1974). System Identification: Parameter and State Estimation. Wiley, New York.
Fakhouri, S.Y. (1980). Identification ofthe Volterra kernels ofnonlinear systems. Proceedings lEE,
Part D 127:246-304.
Fan, Y. and R. Kalaba. (2003). Dynamic programming and pseudo-inverses. Applied Mathematics
and Computation 139:323-342.
Fargason, R.D. and G.D. McCann. (1978). Response properties of peripheral retinula cells within
Drosophila visual mutants to monochromatic Gaussian white-noise stimuli. Vision Research
18:809-813.
Flake, R.H. (1963a). Volterra series representation of nonlinear systems. A. IEEE Trans.
81:330-335.
Flake, R.H. (1963b). Volterra series representation oftime-varying nonlinear systems. In: Proceed-
ings of2nd IFAC Congress., Basel, Switzerland. 2:91-99.
FitzHugh, R. (1969). Thresholds and plateaus in the Hodgkin-Huxley nerve equations. J. Gen.
Physiol. 43: 867-896.
Frechet, M.R. (1928). Les Espaces Abstrait. Academie Francaise, Paris.
Freckmann, G., B. Kalatz, B. Pfeiffer, U. Hoss, and C. Haug. (2001). Recent advances in continu-
ous glucose monitoring. Exp. Clin. Endocrinol. Diabetes Suppl. 2:S347-S357.
French, A.S. (1976). Practical nonlinear system analysis by Wiener kernel estimation in the fre-
quency domain. Biological Cybernetics 24:111-119.
REFERENCES 515

French, A.S. (1984a). The dynamic properties of the action potential encoder in an insect
mechanosensory neuron. Biophys. J. 46:285-290.
French, A.S. (1984b). The receptor potential and adaptation in the cockroach tactile spine. J.
Neuroscience 4:2063-2068.
French, A.S. (1989). Two components of rapid sensory adaptation in a cockroach mechanoreceptor
neuron. J. Neurophysiol. 62:768-777.
French, A.S. (1992). Mechanotransduction. Ann. Rev. Physiol. 54:135-152.
French, A.S. and E.G. Butz. (1973). Measuring the Wiener kernels of a nonlinear system using the
fast Fourier transform algorithm. International Journal of Control 17:529-539.
French, A.S. and E.G. Butz. (1974). The use of Walsh functions in the Wiener analysis of nonlinear
systems. IEEE Transactions on Computers 23:225-232.
French, A.S. and A.V. Holden. (1971). Frequency domain analysis of neurophysiological data.
Computer Programs in Biomedicine 1:219-234.
French, A.S. and M. Jarvilehto. (1978). The dynamic behavior of photoreceptor cells in the fly in
response to random (white noise) stimulation at a range of temperatures. Journal of Physiology
274:311-322.
French, A.S. and M.J. Korenberg. (1989). A nonlinear cascade model of action potential encoding
in an insect sensory neuron. Biophys. J. 55:655-661.
French, A.S. and M.J. Korenberg. (1991). Dissection of a nonlinear cascade model for sensory en-
coding. Ann. Biomed. Eng. 19:473-484.
French, A.S. and J.E. Kuster. (1987). Linear and nonlinear behavior of photoreceptors during the
transduction of small numbers of photons. In: Advanced Methods of Physiological System Modeling,
Volume I, V.Z. Marmarelis (Ed.), Biomedical Simulations Resource, Los Angeles, pp. 41-48.
French, A.S. and J.E. Kuster. (1981). Sensory transduction in an insect mechanoreceptor: extended
bandwidth measurements and sensitivity to stimulus strength. Biological Cybernetics 42:87-94.
French, A.S. and V.Z. Marmarelis. (1995). Nonlinear neuronal mode analysis of action potential
encoding in the cockroach tactile spine neuron. Biological Cybernetics 73:425-430.
French, A.S. and V.Z. Marmarelis. (1999). Nonlinear analysis of neuronal systems. In: Modern
Techniques in Neuroscience Research, U. Windhorst & H. Johansson (Eds.), Springer-Verlag,
New York.
French, A.S. and S.K. Patrick. (1994). Testing a nonlinear model of sensory adaptation with a range
of step input functions. In: Advanced Methods of Physiological System Modeling, Volume III,
V.Z. Marmarelis (Ed.), Plenum, New York, pp. 129-138.
French, A.S. and R.K.S. Wong. (1977). Nonlinear analysis of sensory transduction in an insect
mechanoreceptor. Biological Cybernetics 26:231-240.
French, A.S., A.E.C. Pece, and M.J. Korenberg. (1989). Nonlinear models of transduction and
adaptation in locust photoreceptors. In: Advanced Methods of Physiological System Modeling,
Volume II, V.Z. Marmarelis (Ed.), Plenum, New York, pp. 81-96.
Funahashi, K.-I. (1989). On the approximate realization of continuous mappings by neural net-
works. Neural Networks 2:183-192.
Fuortes, M.G. and A.L. Hodgkin. (1964). Changes in time scale and sensitivity in the ommatidia of
Limulus. Journal of Physiology 172:239-263.
Gallman, P.G. (1975). An iterative method for identification of nonlinear systems using a Uryson
model. IEEE Transactions on Automatic Control 20:771-775.
Garde, S., M.G. Regalado, V.L. Schechtman, and M.C.K. Khoo. (2001). Nonlinear dynamics of
heart rate variability in cocaine-exposed neonates during sleep. American Journal of Physiology -
Heart and Circulatory Physiology 280:H2920-H2928.
Gemperlein, R. and G.D. McCann. (1975). A study of the response properties of retinula cells of
flies using nonlinear identification theory. Biological Cybernetics 19:147-158.
George, D.A. (1959). Continuous nonlinear systems. Technical Report No. 355, Research Labora-
tory of Electronics, M.I.T., Cambridge, MA.
Ghazanshahi, S.D. and M.C.K. Khoo. (1997). Estimation of chemoreflex loop gain using pseudo-
random binary CO2 stimulation. IEEE Trans. on Biomedical Engineering 44:357-366.
Ghazanshahi, S.D., V.Z. Marmarelis, and S.M. Yamashiro. (1986). Analysis of the gas exchange
system dynamics during high-frequency ventilation. Annals of Biomedical Engineering
14:525-542.
Ghazanshahi, S.D., S.M. Yamashiro, and V.Z. Marmarelis. (1987). Use of random forcing for high
frequency ventilation. Journal of Applied Physiology 62:1201-1205.
Gholmieh, G., S.H. Courellis, S. Fakheri, E. Cheung, V.Z. Marmarelis, M. Baudry, and T.W. Berger.
(2003). Detection and classification of neurotoxins using a novel short-term plasticity quantifi-
cation method. Biosensors & Bioelectronics 18:1467-1478.
Gholmieh, G., S.H. Courellis, V.Z. Marmarelis, and T.W. Berger. (2002). An efficient method for
studying short-term plasticity with random impulse train stimuli. Journal of Neuroscience Meth-
ods 21:111-127.
Gholmieh, G., S.H. Courellis, D. Song, Z. Wang, V.Z. Marmarelis, and T.W. Berger. (2003). Char-
acterization of short-term plasticity of the dentate gyrus-CA3 system using nonlinear systems
analysis. In: Proceedings of the IEEE EMBS Conference, Cancun, Mexico, pp. 1929-1932.
Gholmieh, G., W. Soussou, S.H. Courellis, V.Z. Marmarelis, T.W. Berger, and M. Baudry. (2001).
A biosensor for detecting changes in cognitive processing based on nonlinear systems analysis.
Biosensors and Bioelectronics 16:491-501.
Gilbert, E.G. (1977). Functional expansions for the response of nonlinear differential systems. IEEE
Transactions on Automatic Control 22:909-921.
Godfrey, K.R. and P.A.N. Briggs. (1972). Identification of processes with direction-dependent dy-
namic responses. Proceedings IEE 119:1733-1739.
Godfrey, K.R. and W. Murgatroyd. (1965). Input-transducer errors in binary cross-correlation exper-
iments. Proceedings IEE 112:565-573.
Golomb, S.W. (1967). Shift Register Sequences. Holden-Day, San Francisco.
Goodwin, G.C. and R.L. Payne. (1977). Dynamic System Identification: Experiment Design and
Data Analysis. Academic Press, New York.
Goodwin, G.C. and K.S. Sin. (1984). Adaptive Filtering, Prediction and Control. Prentice-Hall, En-
glewood Cliffs, NJ.
Goussard, Y. (1987). Wiener kernel estimation: A comparison of cross-correlation and stochastic
approximation methods. In: Advanced Methods of Physiological System Modeling, Volume I,
V.Z. Marmarelis (Ed.), Biomedical Simulations Resource, Los Angeles, pp. 289-302.
Goussard, Y., W.C. Krenz, L. Stark, and G. Demoment. (1991). Practical identification of function-
al expansions of nonlinear systems submitted to non-Gaussian inputs. Annals of Biomedical En-
gineering 19:401-427.
Grossberg, S. (1988). Nonlinear neural networks: principles, mechanisms, and architectures. Neural
Networks 1:17-61.
Grzywacz, N.M. and P. Hillman. (1985). Statistical test of linearity of photoreceptor transduction
process: Limulus passes, others fail. Proc. Natl. Acad. Sci. USA 82:232-235.
Grzywacz, N.M. and P. Hillman. (1988). Biophysical evidence that light adaptation in Limulus pho-
toreceptors is due to a negative feedback. Biophysical Journal 53:337-348.
Guttman, R. and L. Feldman. (1975). White noise measurement of squid axon membrane imped-
ance. Biochemical and Biophysical Research Communications 67:427-432.
Guttman, R., L. Feldman, and H. Lecar. (1974). Squid axon membrane response to white noise
stimulation. Biophysical Journal 14:941-955.
Guttman, R., R. Grisell, and L. Feldman. (1977). Strength-frequency relationship for white-noise
stimulation of squid axons. Math. Biosci. 33:335-343.
Gyftopoulos, E.P. and R.J. Hooper. (1964). Signals for transfer function measurement in nonlinear
systems. Noise Analysis in Nuclear Systems. USAEC Symposium Series 4, TID-7679, Boston.
Haber, R. (1989). Structural identification of quadratic block-oriented models based on estimated
Volterra kernels. Int. J. Syst. Science 20:1355-1380.
Haber, R. and L. Keviczky. (1976). Identification of nonlinear dynamic systems. In: IFAC Sympo-
sium on Identification & System Parameter Estimation, Tbilisi, Georgia, pp. 62-112.
Haber, R. and H. Unbehauen. (1990). Structural identification of nonlinear dynamic systems: a
survey of input/output approaches. Automatica 26:651-677.
Haist, N.D., F.H.I. Chang, and R. Luus. (1973). Nonlinear identification in the presence of corre-
lated noise using a Hammerstein model. IEEE Transactions on Automatic Control 18:552-
555.
Hassoun, M.H. (1995). Fundamentals of Artificial Neural Networks. MIT Press, Cambridge, MA.
Haykin, S. (1994). Neural Networks: A Comprehensive Foundation. Macmillan, New York.
Hida, E., K.-I. Naka, and K. Yokoyama. (1983). A new photographic method for mapping spatio-
temporal receptive field using television snow stimulation. Journal of Neuroscience Methods
8:225.
Hodgkin, A.L. and A.F. Huxley. (1952). A quantitative description of membrane current and its ap-
plication to conduction and excitation in nerve. Journal of Physiology 117:500-544.
Holstein-Rathlou, N.-H., K.H. Chon, D.J. Marsh, and V.Z. Marmarelis. (1995). Models of renal
blood flow autoregulation. In: Modeling the Dynamics of Biological Systems, E. Mosekilde and
O.G. Mouritsen (Eds.), Springer Verlag, Berlin, pp. 167-185.
Hooper, R.J. and E.P. Gyftopoulos. (1967). On the measurement of characteristic kernels of a class
of nonlinear systems. Neutron Noise, Waves and Pulse Propagation. USAEC Conference Re-
port No. 660206, Boston.
Hornik, K., M. Stinchcombe, and H. White. (1989). Multilayer feedforward networks are universal
approximators. Neural Networks 2:359-366.
Hsieh, H.C. (1964). The least squares estimation of linear and nonlinear system weighting function
matrices. Inf. Control 7:84-115.
Huang, S.T. and S. Cambanis. (1979). On the representation of nonlinear systems with Gaussian in-
put. Stochastic Process. 2:173.
Hung, G., D.R. Brillinger, and L. Stark. (1979). Interpretation of kernels II. Mathematical Bio-
sciences 47:159-187.
Hung, G., L. Stark, and P. Eykhoff. (1977). On the interpretation of kernels: I. Computer simulation
of responses to impulse-pairs. Annals of Biomedical Engineering 5:130-143.
Hung, G. and L. Stark. (1977). The kernel identification method (1910-1977): Review of theory,
calculation, application and interpretation. Mathematical Biosciences 37:135-190.
Hung, G. and L. Stark. (1991). The interpretation of kernels: an overview. Mathematical Bio-
sciences 19:509-519.
Hunter, I.W. and M.J. Korenberg. (1986). The identification of nonlinear biological systems:
Wiener and Hammerstein cascade models. Biological Cybernetics 55:135-144.
Hunter, I.W. and R.E. Kearney. (1987). Quasi-linear, time-varying, and nonlinear approaches to the
identification of muscle and joint mechanics. In: Advanced Methods of Physiological System
Modeling, Volume I, V.Z. Marmarelis (Ed.), Biomedical Simulations Resource, Los Angeles,
pp. 128-147.
Iatrou, M., T.W. Berger, and V.Z. Marmarelis. (1999a). Modeling of nonlinear nonstationary dy-
namic systems with a novel class of artificial neural networks. IEEE Transactions on Neural
Networks 10:327-339.
Iatrou, M., T.W. Berger, and V.Z. Marmarelis. (1999b). Application of a novel modeling method to
the nonstationary properties of potentiation in the rabbit hippocampus. Annals of Biomedical En-
gineering 27:581-591.
Jacobson, L.D., J.P. Gaska, H.-W. Chen, and D.A. Pollen. (1993). Structural testing of multi-input
linear-nonlinear cascade models for cells in macaque striate cortex. Vision Research
33:609-626.
James, A.C. (1992). Nonlinear operator network models of processing in the fly lamina. In: Nonlinear
Vision, R.B. Pinter and B. Nabet (Eds.), CRC Press, Boca Raton, FL, Chapter 2, pp. 40-73.
Juusola, M. and A.S. French. (1995). Transduction and adaptation in spider slit sense organ
mechanoreceptors. Journal of Neurophysiology 74:2513-2523.
Kalaba, R.E. and K. Spingarn. (1982). Identification, Control, and Input Optimization. Plenum
Press, New York.
Kalaba, R.E. and L. Tesfatsion. (1990). Flexible least squares for approximately linear systems.
IEEE Trans. Syst. Man & Cyber. 20:978-989.
Kearney, R.E. and I.W. Hunter. (1986). Evaluation of a technique for the identification of time-
varying systems using experimental and simulated data. Digest 12th CMBEC 12:75-76.
Kearney, R.E. and I.W. Hunter. (1990). System identification of human joint dynamics. CRC Criti-
cal Reviews of Biomedical Engineering 18:55-87.
Khoo, M.C.K. (Ed.). (1989). Modeling and Parameter Estimation in Respiratory Control. Plenum,
New York.
Khoo, M.C.K. (Ed.). (1996). Bioengineering Approaches to Pulmonary Physiology and Medicine.
Plenum, New York.
Khoo, M.C.K. (2000). Physiological Control Systems: Analysis, Simulation, and Estimation. IEEE
Press, New York.
Khoo, M.C.K. and V.Z. Marmarelis. (1989). Estimation of chemoreflex gain from spontaneous sigh
responses. Annals of Biomedical Engineering 17:557-570.
Klein, S. and S. Yasui. (1979). Nonlinear systems analysis with non-Gaussian white stimuli: Gen-
eral basis functionals and kernels. IEEE Trans. Info. Theo. 25:495-500.
Klein, S.A. (1987). Relationships between kernels measured with different stimuli. In: Advanced
Methods of Physiological System Modeling, Volume I, V.Z. Marmarelis (Ed.), Biomedical Sim-
ulations Resource, Los Angeles, pp. 278-288.
Kolmogorov, A.N. (1957). On the representation of continuous functions of several variables by su-
perposition of continuous functions of one variable and addition. Doklady Akademii Nauk
SSSR 114:953-956; AMS Transl. 2:55-59 (1963).
Korenberg, M.J. (1973a). Identification of biological cascades of linear and static nonlinear sys-
tems. Proceedings 16th Midwest Symposium Circuit Theory 18.2:1-9.
Korenberg, M.J. (1973b). Identification of nonlinear differential systems. In: Proceedings Joint Au-
tomatic Control Conference, San Francisco, pp. 597-603.
Korenberg, M.J. (1973c). New methods in the frequency analysis of linear time-varying differential
equations. In: Proc. of IEEE International Symposium on Circuit Theory, pp. 185-188.
Korenberg, M.J. (1973d). Crosscorrelation analysis of neural cascades. In: Proc. 10th Annual Rocky
Mountain Bioengineering Symposium, Denver, pp. 47-52.
Korenberg, M.J. (1982). Statistical identification of parallel cascades of linear and nonlinear sys-
tems. In: IFAC Symposium on Identification and System Parameter Estimation, Arlington, VA,
pp. 580-585.
Korenberg, M.J. (1983). Statistical identification of difference equation representations for nonlin-
ear systems. Electronics Letters 19:175-176.
Korenberg, M.J. (1984). Statistical identification of Volterra kernels of high-order systems. In:
ICAS'84, pp. 570-575.
Korenberg, M.J. (1987). Functional expansions, parallel cascades and nonlinear difference equa-
tions. In: Advanced Methods of Physiological System Modeling, Volume I, V.Z. Marmarelis
(Ed.), Biomedical Simulations Resource, Los Angeles, pp. 221-240.
Korenberg, M.J. (1988). Identifying nonlinear difference equation and functional expansion repre-
sentations: The fast orthogonal algorithm. Annals of Biomedical Engineering 16:123-142.
Korenberg, M.J. (1989a). A robust orthogonal algorithm for system identification and time series
analysis. Biol. Cybern. 60:267-276.
Korenberg, M.J. (1989b). Fast orthogonal algorithms for nonlinear system identification and time-
series analysis. In: Advanced Methods of Physiological System Modeling, Volume II, V.Z. Mar-
marelis (Ed.), Plenum, New York, pp. 165-178.
Korenberg, M.J. (1991). Parallel cascade identification and kernel estimation for nonlinear systems.
Annals of Biomedical Engineering 19:429-455.
Korenberg, M.J. and I.W. Hunter. (1986). The identification of nonlinear biological systems: LNL
cascade models. Biological Cybernetics 55:125-134.
Korenberg, M.J. and I.W. Hunter. (1990). The identification of nonlinear systems: Wiener kernel
approaches. Ann. Biomed. Eng. 18:629-654.
Korenberg, M.J., S.B. Bruder, and P.J. McIlroy. (1988). Exact orthogonal kernel estimation from fi-
nite data records: Extending Wiener's identification of nonlinear systems. Annals of Biomedical
Engineering 16:201-214.
Krausz, H.I. (1975). Identification of nonlinear systems using random impulse train inputs. Biologi-
cal Cybernetics 19:217-230.
Krausz, H.I. and W.G. Friesen. (1977). The analysis of nonlinear synaptic transmission. Journal of
General Physiology 70:243.
Krausz, H.I. and K.-I. Naka. (1980). Spatio-temporal testing and modeling of catfish retinal neurons.
Biophysical Journal 29:13-36.
Krenz, W. and L. Stark. (1987). Interpretation of kernels of functional expansions. In: Advanced
Methods of Physiological System Modeling, Volume I, V.Z. Marmarelis (Ed.), Biomedical Sim-
ulations Resource, Los Angeles, pp. 241-257.
Krenz, W. and L. Stark. (1991). Interpretation of functional series expansions. Ann. Biomed. Eng.
19:485-509.
Krieger, D., T.W. Berger, and R.J. Sclabassi. (1992). Instantaneous characterization of time-vary-
ing nonlinear systems. IEEE Trans. Biomedical Engineering 39:420-424.
Kroeker, J.P. (1977). Wiener analysis of nonlinear systems using Poisson-Charlier cross-correla-
tion. Biological Cybernetics 27:221-227.
Kroeker, J.P. (1979). Synaptic facilitation in Aplysia explored by random presynaptic stimulation.
Journal of General Physiology 73:747.
Ku, Y.H. and A.A. Wolf. (1966). Volterra-Wiener functionals for the analysis of nonlinear systems.
Journal of the Franklin Institute 3:9-26.
Landau, M. and C.T. Leondes. (1975). Volterra series synthesis of nonlinear stochastic tracking
systems. IEEE Trans. Aerospace and Electronic Systems 10:245-265.
Lasater, E.M. (1982a). A white-noise analysis of responses and receptive fields of catfish cones.
Journal of Neurophysiology 47:1057.
Lasater, E.M. (1982b). Spatial receptive fields of catfish retinal ganglion cells. Journal of Neuro-
physiology 48:823.
Lee, Y.W. (1964). Contributions of Norbert Wiener to linear theory and nonlinear theory in en-
gineering. In: Selected Papers of Norbert Wiener, SIAM, MIT Press, Cambridge, MA, pp.
17-33.
Lee, Y.W. and M. Schetzen. (1965). Measurement of the Wiener kernels of a nonlinear system by
cross-correlation. International Journal of Control 2:237-254.
Leontaritis, I.J. and S.A. Billings. (1985). Input-output parametric models for nonlinear systems;
Part I: Deterministic nonlinear systems; Part II, pp. 328-344. Int. J. Control 41:303-327.
Lewis, E.R. and K.R. Henry. (1995). Nonlinear effects of noise on phase-locked cochlear-nerve re-
sponses to sinusoidal stimuli. Hearing Research 92:1-16.
Lewis, E.R., K.R. Henry, and W.M. Yamada. (2000). Essential roles of noise in neural coding and in
studies of neural coding. Biosystems 58:109-115.
Lewis, E.R., K.R. Henry, and W.M. Yamada. (2002a). Tuning and timing of excitation and inhibi-
tion in primary auditory nerve fibers. Hearing Research 171:13-31.
Lewis, E.R., K.R. Henry, and W.M. Yamada. (2002b). Tuning and timing in the gerbil ear: Wiener
kernel analysis. Hearing Research 174:206-221.
Lewis, E.R. and P. van Dijk. (2003). New variations on the derivation of spectro-temporal receptive
fields for primary auditory afferent axons. Hearing Research 186:30-46.
Lipson, E.D. (1975a). White noise analysis of Phycomyces light growth response system. I. Normal
intensity range. Biophysical Journal 15:989-1012.
Lipson, E.D. (1975b). White noise analysis of Phycomyces light growth response system. II. Ex-
tended intensity ranges. Biophysical Journal 15:1013-1031.
Lipson, E.D. (1975c). White noise analysis of Phycomyces light growth response system. III. Pho-
tomutants. Biophysical Journal 15:1033-1045.
Ljung, L. (1987). System Identification: Theory for the User. Prentice-Hall Inc., Englewood Cliffs,
NJ.
Ljung, L. and T. Glad. (1994). Modeling of Dynamic Systems. Prentice-Hall, Englewood Cliffs, NJ.
Ljung, L. and T. Soderstrom. (1983). Theory and Practice of Recursive Identification. MIT Press,
Cambridge, MA.
Marmarelis, P.Z. (1972). Nonlinear identification of bioneuronal systems through white-noise stim-
ulation. In: Thirteenth Joint Automatic Control Conference, Stanford University, Stanford, CA,
pp. 117-126.
Marmarelis, P.Z. (1975). The noise about white-noise: Pros and cons. In: Proceedings 1st Sympo-
sium on Testing and Identification of Nonlinear Systems, California Institute of Technology,
Pasadena, CA, pp. 56-75.
Marmarelis, P.Z. and V.Z. Marmarelis. (1978). Analysis of Physiological Systems: The White-Noise
Approach. Plenum, New York. (Russian translation: Mir Press, Moscow, 1981; Chinese transla-
tion: Academy of Sciences Press, Beijing, 1990.)
Marmarelis, P.Z. and G.D. McCann. (1973). Development and application of white-noise modeling
techniques for studies of insect visual nervous systems. Kybernetik 12:74-89.
Marmarelis, P.Z. and G.D. McCann. (1975). Errors involved in the practical estimation of nonlinear
system kernels. In: Proceedings 1st Symposium on Testing and Identification of Nonlinear Sys-
tems, California Institute of Technology, Pasadena, CA, pp. 147-173.
Marmarelis, P.Z. and K.-I. Naka. (1972). White noise analysis of a neuron chain: An application of
the Wiener theory. Science 175:1276-1278.
Marmarelis, P.Z. and K.-I. Naka. (1973a). Nonlinear analysis and synthesis of receptive-field re-
sponses in the catfish retina. I. Horizontal cell → ganglion cell chain. Journal of Neurophysiolo-
gy 36:605-618.
Marmarelis, P.Z. and K.-I. Naka. (1973b). Nonlinear analysis and synthesis of receptive-field re-
sponses in the catfish retina. II. One-input white-noise analysis. Journal of Neurophysiology
36:619-633.
Marmarelis, P.Z. and K.-I. Naka. (1973c). Nonlinear analysis and synthesis of receptive-field re-
sponses in the catfish retina. III. Two-input white-noise analysis. Journal of Neurophysiology
36:634-648.
Marmarelis, P.Z. and K.-I. Naka. (1973d). Mimetic model of retinal network in catfish. In: Confer-
ence Proceedings on Regulation and Control in Physiological Systems, A.S. Iberall and A.C.
Guyton (Eds.), Rochester, NY, pp. 159-162.
Marmarelis, P.Z. and K.-I. Naka. (1974a). Identification of multi-input biological systems. IEEE
Trans. Biomedical Engineering 21:88-101.
Marmarelis, P.Z. and K.-I. Naka. (1974b). Experimental analysis of a neural system: Two model-
ing approaches. Kybernetik 15:11-26.
Marmarelis, P.Z. and F.E. Udwadia. (1976). The identification of building structural systems; Part
II: The nonlinear case. Bulletin of the Seismological Society of America 66:153-171.
Marmarelis, V.Z. (1975). Identification of nonlinear systems through multi-level random signals.
In: Proceedings 1st Symposium on Testing and Identification of Nonlinear Systems, Pasadena,
CA, pp. 106-124.
Marmarelis, V.Z. (1976). Identification of Nonlinear Systems through Quasi-White Test Signals.
Ph.D. Thesis, California Institute of Technology, Pasadena, CA.
Marmarelis, V.Z. (1977). A family of quasi-white random signals and its optimal use in biological
system identification. Part I: Theory. Biological Cybernetics 27:49-56.
Marmarelis, V.Z. (1978a). Random vs. pseudorandom test signals in nonlinear system identifica-
tion. IEE Proceedings 125:425-428.
Marmarelis, V.Z. (1978b). The optimal use of random quasi-white signals in nonlinear system iden-
tification. Multidisciplinary Research 6:112-141.
Marmarelis, V.Z. (1979a). Error analysis and optimal estimation procedures in identification of
nonlinear Volterra systems. Automatica 15:161-174.
Marmarelis, V.Z. (1979b). Methodology for nonstationary nonlinear analysis of the visual system.
In: Proceedings U.S.-Japan Joint Symposium on Advanced Analytical Techniques Applied to the
Visual System, Tokyo, Japan, pp. 235-244.
Marmarelis, V.Z. (1979c). Practical identification of the general time-variant nonlinear dynamic
system. In: Proceedings International Conference on Cybernetics and Society, Denver, CO, pp.
727-733.
Marmarelis, V.Z. (1980a). Identification methodology for nonstationary nonlinear biological sys-
tems. In: Proceedings International Symposium on Circuits and Systems, Houston, TX, pp.
448-452.
Marmarelis, V.Z. (1980b). Identification of nonlinear systems by use of nonstationary white-noise
inputs. Applied Mathematical Modeling 4:117-124.
Marmarelis, V.Z. (1980c). Identification of nonstationary nonlinear systems. In: 14th Asilomar
Conference on Circuits, Systems and Computers, Pacific Grove, CA, pp. 402-406.
Marmarelis, V.Z. (1981a). Practicable identification of nonstationary nonlinear systems. IEE Pro-
ceedings, Part D 128:211-214.
Marmarelis, V.Z. (1981b). A single-record estimator for correlation functions of nonstationary ran-
dom processes. Proceedings of the IEEE 69:841-842.
Marmarelis, V.Z. (1982). Non-parametric validation of parametric models. Mathematical Model-
ling 3:305-309.
Marmarelis, V.Z. (1983). Practical estimation of correlation functions of nonstationary Gaussian
processes. IEEE Transactions on Information Theory 29:937-938.
Marmarelis, V.Z. (1987a). Advanced Methods of Physiological System Modeling, Volume I. Bio-
medical Simulations Resource, Los Angeles, California.
Marmarelis, V.Z. (1987b). Nonlinear and nonstationary modeling of physiological systems. In: Ad-
vanced Methods of Physiological System Modeling, Volume I, V.Z. Marmarelis (Ed.), Biomed-
ical Simulations Resource, Los Angeles, California, pp. 1-24.
Marmarelis, V.Z. (1987c). Recent advances in nonlinear and nonstationary analysis. In: Advanced
Methods of Physiological System Modeling, Volume I, V.Z. Marmarelis (Ed.), Biomedical Simu-
lations Resource, Los Angeles, California, pp. 323-336.
Marmarelis, V.Z. (1987d). Visual system nonlinear modeling. In: Systems and Control Encyclope-
dia: Theory, Technology, Applications, M.G. Singh (Ed.), Pergamon Press, Oxford, pp.
5065-5070.
Marmarelis, V.Z. (1988a). Coherence and apparent transfer function measurements for nonlinear
physiological systems. Annals of Biomedical Engineering 16:143-157.
Marmarelis, V.Z. (1988b). The role of nonlinear models in neurophysiological system analysis.
In: 1st IFAC Symposium on Modeling and Control in Biomedical Systems, Venice, Italy, pp.
25-35.
Marmarelis, V.Z. (1989a). Identification and modeling of a class of nonlinear systems. Mathemati-
cal Computer Modelling 12:991-995.
Marmarelis, V.Z. (1989b). Linearized models of a class of nonlinear dynamic systems. Applied
Mathematical Modelling 13:21-26.
Marmarelis, V.Z. (1989c). Signal transformation and coding in neural systems. IEEE Transactions
on Biomedical Engineering 36:15-24.
Marmarelis, V.Z. (1989d). The role of nonlinear models in neurophysiological system analysis. In:
Modelling and Control in Biomedical Systems, C. Cobelli and L. Mariani (Eds.), pp. 39-50,
Pergamon Press, Oxford.
Marmarelis, V.Z. (Ed.). (1989e). Advanced Methods of Physiological System Modeling, Volume II.
Plenum, New York.
Marmarelis, V.Z. (1989f). Volterra-Wiener analysis of a class of nonlinear feedback systems and
application to sensory biosystems. In: Advanced Methods of Physiological System Modeling,
Volume II, Plenum, New York, pp. 1-52.
Marmarelis, V.Z. (1991). Wiener analysis of nonlinear feedback in sensory systems. Annals of Bio-
medical Engineering 19:345-382.
Marmarelis, V.Z. (1993). Identification of nonlinear biological systems using Laguerre expansions
of kernels. Annals of Biomedical Engineering 21:573-589.
Marmarelis, V.Z. (1994a). Nonlinear modeling of physiological systems using principal dynamic
modes. In: Advanced Methods of Physiological System Modeling, Volume III, Plenum, New
York, pp. 1-28.
Marmarelis, V.Z. (1994b). On kernel estimation using non-Gaussian and/or non-white input data.
In: Advanced Methods of Physiological System Modeling, Volume III, Plenum, New York, pp.
229-242.
Marmarelis, V.Z. (1994c). Three conjectures on neural network implementation of Volterra models
(mappings). In: Advanced Methods of Physiological System Modeling, Volume III, Plenum, New
York, pp. 261-268.
Marmarelis, V.Z. (Ed.). (1994d). Advanced Methods of Physiological System Modeling, Volume III.
Plenum, New York.
Marmarelis, V.Z. (1995). Methods and tools for identification of physiological systems. In: Hand-
book of Biomedical Engineering, J.D. Bronzino (Ed.). Boca Raton, FL: CRC Press, pp.
2422-2436.
Marmarelis, V.Z. (1997). Modeling methodology for nonlinear physiological systems. Annals of
Biomed. Eng. 25:239-251.
Marmarelis, V.Z. (2000). Methods and tools for identification of physiological systems. In: The
Biomedical Engineering Handbook, 2nd Ed., Volume 2, J.D. Bronzino (Ed.), Chapter 163, CRC
Press, Boca Raton, FL, pp. 163.1-163.15.
Marmarelis, V.Z. and N. Herman. (1988). LYSIS: An interactive software system for nonlinear
modeling and simulation. In: 1988 SCS Multiconference: Modeling and Simulation on Micro-
computers, San Diego, CA, pp. 6-10.
Marmarelis, V.Z. and G.D. McCann. (1975). Optimization of test parameters for identification of
spike-train responses of biological systems through random test signals. In: Proceedings 1st
Symposium on Testing and Identification of Nonlinear Systems, Pasadena, CA, pp. 325-338.
Marmarelis, V.Z. and G.D. McCann. (1977). A family of quasi-white random signals and its opti-
mal use in biological system identification. Part II: Application to the photoreceptor of Callipho-
ra erythrocephala. Biological Cybernetics 27:57-62.
Marmarelis, V.Z. and G.D. Mitsis. (2000). Nonparametric modeling of the glucose-insulin system.
In: Annual Conference of the Biomedical Engineering Society, Seattle, WA.
Marmarelis, V.Z. and M.E. Orme. (1993). Modeling of neural systems by use of neuronal modes.
IEEE Transactions on Biomedical Engineering 40:1149-1158.
Marmarelis, V.Z. and A.D. Sams. (1982). Evaluation of Volterra kernels from Wiener kernel mea-
surements. In: 15th Annual Hawaii International Conference on System Sciences, Honolulu,
HI, pp. 322-326.
Marmarelis, V.Z. and S.M. Yamashiro. (1982). Nonparametric modeling of respiratory mechanics
and gas exchange. In: Proceedings 6th IFAC Symposium on Identification and System Parame-
ter Estimation, Arlington, VA, pp. 586-591.
Marmarelis, V.Z. and X. Zhao. (1994). On the relation between Volterra models and feedforward
artificial neural networks. In: Advanced Methods of Physiological System Modeling, Volume III,
V.Z. Marmarelis (Ed.), Plenum, New York, pp. 243-260.
Marmarelis, V.Z. and X. Zhao. (1997). Volterra models and three-layer perceptrons. IEEE Transac-
tions on Neural Networks 8:1421-1433.
Marmarelis, V.Z., M.C. Citron, and C.P. Vivo. (1986). Minimum-order Wiener modeling of spike-
output systems. Biological Cybernetics 54:115-123.
Marmarelis, V.Z., M. Juusola, and A.S. French. (1999a). Principal dynamic mode analysis of non-
linear transduction in a spider mechanoreceptor. Annals of Biomedical Engineering 27:391-402.
Marmarelis, V.Z., K.H. Chon, N.H. Holstein-Rathlou, and D.J. Marsh. (1999b). Nonlinear analysis
of renal autoregulation in rats using principal dynamic modes. Annals of Biomedical Engineer-
ing 27:23-31.
Marmarelis, V.Z., G.D. Mitsis, K. Huecking, and R.N. Bergman. (2002). Nonparametric modeling
of the insulin-glucose dynamic relationships in dogs. In: Proceedings of 2nd Joint IEEE/EMBS
Conference, Houston, TX, pp. 224-225.
Marmarelis, V.Z., K.H. Chon, Y.M. Chen, D.J. Marsh, and N.H. Holstein-Rathlou. (1993). Nonlin-
ear analysis of renal autoregulation under broadband forcing conditions. Annals of Biomedical
Engineering 21:591-603.
Marmarelis, V.Z., S.F. Masri, F.E. Udwadia, T.K. Caughey, and G.D. Jeong. (1979). Analytical and
experimental studies of the modeling of a class of nonlinear systems. Nuclear Engineering and
Design 55:59-68.
Marsh, D.J., J.L. Osborn, and W.J. Cowley. (1990). 1/f fluctuations in arterial pressure and regula-
tion of renal blood flow in dogs. American Journal of Physiology 258:F1394-F1400.
McCann, G.D. (1974). Nonlinear identification theory models for successive stages of visual ner-
vous systems in flies. Journal of Neurophysiology 37:869-895.
McCann, G.D. and J.C. Dill. (1969). Fundamental properties of intensity, form and motion percep-
tion in the visual nervous systems of Calliphora phaenicia and Musca domestica. Journal of
General Physiology 53:385-413.
524 REFERENCES

McCann, G.D. and P.Z. Marmarelis (Eds.). (1975). Proceedings of the First Symposium on
Testing and Identification of Nonlinear Systems, California Institute of Technology, Pasadena,
CA.
McCann, G.D., R.D. Fargason, and V.T. Shantz. (1977). The response properties of retinula cells in
the fly Calliphora erythrocephala as a function of the wave-length and polarization properties of
visible and ultraviolet light. Biological Cybernetics 26:93-107.
McCulloch, W.S. and W. Pitts. (1943). A logical calculus of the ideas immanent in nervous activity.
Bull. Math. Biophys. 5:115-133.
Mitsis, G.D. and V.Z. Marmarelis. (2002). Modeling of nonlinear physiological systems with fast
and slow dynamics. I. Methodology. Annals of Biomedical Engineering 30:272-281.
Mitsis, G.D., R. Zhang, B.D. Levine, and V.Z. Marmarelis. (2002). Modeling of nonlinear systems
with fast and slow dynamics. II. Application to cerebral autoregulation in humans. Annals of
Biomedical Engineering 30:555-565.
Mitsis, G.D., P.N. Ainslie, M.J. Poulin, P.A. Robbins, and V.Z. Marmarelis. (2003a). Nonlinear
modeling of the dynamic effects of arterial pressure and blood gas variations on cerebral blood
flow in healthy humans. In: The IXth Oxford Conference on Modeling and Control of Breathing,
Paris, France.
Mitsis, G.D., S. Courellis, A.S. French, and V.Z. Marmarelis. (2003b). Principal dynamic mode
analysis of spider mechanoreceptor action potentials. In: Proceedings 25th Anniversary Conference
of the IEEE EMBS, Cancun, Mexico, pp. 2051-2054.
Mitsis, G.D., A. Mahalingam, Z. Zhang, B.D. Levine, and V.Z. Marmarelis. (2003c). Nonlinear
analysis of dynamic cerebral autoregulation in humans under orthostatic stress. In: Proceedings
25th Anniversary Conference of the IEEE EMBS, Cancun, Mexico, pp. 398-401.
Moller, A.R. (1973). Statistical evaluation of the dynamic properties of cochlear nucleus units using
stimuli modulated with pseudorandom noise. Brain Research 57:443-456.
Moller, A.R. (1975). Dynamic properties of excitation and inhibition in the cochlear nucleus. Acta
Physiologica Scandinavica 93:442-454.
Moller, A.R. (1976). Dynamic properties of the responses of single neurones in the cochlear nucleus
of the rat. Journal of Physiology 259:63-82.
Moller, A.R. (1977). Frequency selectivity of single auditory-nerve fibers in response to broadband
noise stimuli. Journal of the Acoustical Society of America 62:135-142.
Moller, A.R. (1978). Responses of auditory nerve fibers to noise stimuli show cochlear nonlinearities.
Acta Oto-Laryngol (Stockholm) 86:1-8.
Moller, A.R. (1983). Frequency selectivity of phase-locking of complex sounds in the auditory
nerve of the rat. Hearing Research 11:267-284.
Moller, A.R. (1987). Analysis of the auditory system using pseudorandom noise. In: Advanced
Methods of Physiological System Modeling, Volume I, V.Z. Marmarelis (Ed.), Biomedical Simulations
Resource, Los Angeles, pp. 60-62.
Moller, A.R. (1989). Volterra-Wiener analysis from the whole-nerve responses of the exposed auditory
nerve in man to pseudorandom noise. In: Advanced Methods of Physiological System
Modeling, Volume II, V.Z. Marmarelis (Ed.), Plenum, New York, pp. 63-80.
Moller, A.R. and A. Rees. (1986). Dynamic properties of the responses of single neurons in the inferior
colliculus of the rat. Hearing Research 24:203-215.
Moore, G.P. and R.A. Auriemma. (1985). Testing a gamma-activated multiple spike-generator hypothesis
for the Ia afferent. In: The Muscle Spindle, I.A. Boyd & M.R. Gladden (Eds.), Stockton
Press, New York, pp. 391-395.
Moore, G.P. and R.A. Auriemma. (1985). Production of muscle stretch receptor behavior using
Wiener kernels. Brain Research 331:185-189.
Moore, G.P., D.R. Perkel, and J.P. Segundo. (1966). Statistical analysis and functional interpretation
of neuronal spike data. Annual Review of Physiology 28:493-522.
Moore, G.P., D.G. Stuart, E.K. Stauffer, and R. Reinking. (1975). White-noise analysis of mammalian
muscle receptors. In: Proceedings 1st Symposium on Testing and Identification of Nonlinear
Systems, G.D. McCann and P.Z. Marmarelis (Eds.), California Institute of Technology,
Pasadena, CA, pp. 316-324.
Naka, K.-I. (1971). Receptive field mechanism in the vertebrate retina. Science 171:691-693.
Naka, K.-I. (1976). Neuronal circuitry in the catfish retina. Invest. Ophthalmol. 15:926.
Naka, K.-I. (1977). Functional organization of the catfish retina. Journal of Neurophysiology 40:26.
Naka, K.-I. (1982). The cells horizontal cells talk to. Vision Res. 22:653.
Naka, K.-I. and V. Bhanot (2002). White-noise analysis in retinal physiology. New York Series,
Part III.
Naka, K.-I. and T. Ohtsuka. (1975). Morphological and functional identifications of catfish retinal
neurons. II. Morphological identification. Journal of Neurophysiology 38:72-91.
Naka, K.-I., M. Itoh, and N. Ishii. (1987). White-Noise Analysis in Visual Physiology. In: Advanced
Methods of Physiological System Modeling, Volume I, V.Z. Marmarelis (Ed.), Biomedical Simulations
Resource, Los Angeles, pp. 49-59.
Naka, K.-I., G.W. Davis, and R.Y. Chan. (1979). Receptive-field organization in catfish retina. Sensory
Proc. 2:366.
Naka, K.-I., H.M. Sakai, and N. Ishii. (1988). Generation and transformation of second-order nonlinearity
in catfish retina. Annals of Biomedical Engineering 16:53-64.
Naka, K.-I., P.Z. Marmarelis, and R.Y. Chan. (1975). Morphological and functional identifications of
catfish retinal neurons. III. Functional identifications. Journal of Neurophysiology 38:92-131.
Naka, K.-I., R.L. Chappell, and M. Sakuranaga. (1982). Wiener analysis of turtle horizontal cells.
Biomed. Res. 3(Suppl.):131.
Naka, K.-I., R.Y. Chan, and S. Yasui. (1979). Adaptation in catfish retina. Journal of Neurophysiology
42:441-454.
Narayanan, S. (1967). Transistor distortion analysis using Volterra series representation. The Bell
System Technical Journal 46:991-1024.
Narayanan, S. (1970). Application of Volterra series to intermodulation distortion analysis of a transistor
feedback amplifier. IEEE Transactions on Circuit Theory 17:518-527.
Narendra, K.S. and P.G. Gallman. (1966). An iterative method for the identification of nonlinear
systems using a Hammerstein model. IEEE Transactions on Automatic Control 11:546-550.
Neis, V.S. and J.L. Sackman (1967). An experimental study of a non-linear material with memory.
Trans. Soc. Rheology 11:307-333.
Ni, T.-C., M. Ader, and R.N. Bergman. (1997). Reassessment of glucose effectiveness and insulin
sensitivity from minimal model analysis: A theoretical evaluation of the single-compartment glucose
distribution assumption. Diabetes 46:1813-1821.
Nikias, C.L. and A.P. Petropulu. (1993). Higher-Order Spectra Analysis. Prentice-Hall, Englewood
Cliffs, NJ.
Ogura, H. (1972). Orthogonal functionals of the Poisson process. IEEE Transactions on Information
Theory 18:473-481.
Ogura, H. (1985). Estimation of Wiener kernels of a nonlinear system and a fast algorithm using
digital Laguerre filters. In: Proceedings 15th NIBB Conference on Information Processing in
Neuron Network, Okazaki, Japan, pp. 14-62.
O'Leary, D.P. and V. Honrubia. (1975). On-line identification of sensory systems using pseudorandom
binary noise perturbations. Biophysical Journal 15:505-532.
O'Leary, D.P., R. Dunn, and V. Honrubia. (1974). Functional and anatomical correlation of afferent
responses from the isolated semicircular canal. Nature 251:225-227.
O'Leary, D.P., R.F. Dunn, and V. Honrubia. (1976). Analysis of afferent responses from isolated
semicircular canal of the guitarfish using rotational acceleration white-noise inputs. Part I: Correlation
of response dynamics with receptor innervation. Journal of Neurophysiology 39:631-647.
Palm, G. (1979). On representation and approximation of nonlinear systems. Part II: Discrete time.
Biological Cybernetics 34:49-52.
Palm, G. and T. Poggio. (1977a). Wiener-like system identification in physiology. J. Math. Biol.
4:375-381.
Palm, G. and T. Poggio. (1977b). The Volterra representation and the Wiener expansion: validity
and pitfalls. SIAM Journal of Applied Mathematics 33:195-216.
Palm, G. and T. Poggio (1978). Stochastic identification methods for nonlinear systems: an extension
of the Wiener theory. SIAM Journal of Applied Mathematics 34:524-534.
Palmer, L.A., A. Gottschalk, and J.P. Jones. (1987). Constraints on the estimation of spatial receptive
field profiles of simple cells in visual cortex. In: Advanced Methods of Physiological System
Modeling, Volume I, V.Z. Marmarelis (Ed.), Biomedical Simulations Resource, Los Angeles,
pp. 205-220.
Panerai, R.B., S.L. Dawson, and J.F. Potter. (1999). Linear and nonlinear analysis of human dynamic
cerebral autoregulation. American Journal of Physiology 277:H1089-H1099.
Panerai, R.B., D.M. Simpson, S.T. Deverson, P. Mahony, P. Hayes, and D.H. Evans. (2000). Multivariate
dynamic analysis of cerebral blood flow regulation in humans. IEEE Transactions on Biomedical
Engineering 47:419-421.
Papazoglou, T.G., T. Papaioannou, K. Arakawa, M. Fishbein, V.Z. Marmarelis, and W.S. Grund-
fest. (1990). Control of excimer laser aided tissue ablation via laser-induced fluorescence moni-
toring. Applied Optics 29:4950-4955.
Patwardhan, A.R., S. Vallurupali, J.M. Evans, E.N. Bruce, and C.F. Knapp. (1995). Override of
spontaneous respiratory pattern generator reduces cardiovascular parasympathetic influence.
Journal of Applied Physiology 79:1048-1054.
Pinter, R.B. (1983). The electrophysiological bases for linear and for non-linear product-term lateral
inhibition and the consequences of wide-field textured stimuli. J. Theor. Biol. 105:233-243.
Pinter, R.B. (1984). Adaptation of receptive field spatial organization via multiplicative lateral inhibition.
J. Theor. Biol. 110:424-444.
Pinter, R.B. (1985). Adaptation of spatial modulation transfer functions via nonlinear lateral inhibition.
Biol. Cybern. 51:285-291.
Pinter, R.B. (1987). Kernel synthesis from nonlinear multiplicative lateral inhibition. In: Advanced
Methods of Physiological System Modeling, Volume I, V.Z. Marmarelis (Ed.), Biomedical Simulations
Resource, Los Angeles, pp. 258-277.
Pinter, R.B. and B. Nabet (Eds.) (1992). Nonlinear Vision: Determination of Neural Receptive
Fields, Function, and Networks. CRC Press, Boca Raton, FL.
Poggio, T. and V. Torre. (1977). Volterra representation for some neuron models. Biological Cybernetics
27:1113-1124.
Poggio, T. and W. Reichardt. (1973). Considerations on models of movement detection. Kybernetik
13:223-227.
Poggio, T. and W. Reichardt. (1976). Visual control of orientation behaviour in the fly. Part II: Towards
the underlying neural interactions. Quarterly Reviews of Biophysics 9:377-448. [For Part
I see Reichardt and Poggio (1976).]
Poulin, M.J., P.-J. Liang, and P.A. Robbins. (1996). Dynamics of the cerebral blood flow response
to step changes in end-tidal PCO2 and PO2 in humans. Journal of Applied Physiology
81:1084-1095.
Poulin, M.J., P.-J. Liang, and P.A. Robbins. (1998). Fast and slow components of cerebral blood
flow response to step decreases in end-tidal CO2 in humans. Journal of Applied Physiology
85:388-397.
Powers, R.L. and D.W. Arnett. (1981). Spatio-temporal cross-correlation analysis of catfish retinal
neurons. Biological Cybernetics 41:179.
Price, R.A. (1958). A useful theorem for nonlinear devices having Gaussian inputs. IRE Trans. In-
form. Theory 4:69-72.
Ratliff, F., B.W. Knight, and N. Graham. (1969). On tuning and amplification by lateral inhibition.
Proc. U.S. Nat. Acad. Sci. 62:733-740.
Ream, N. (1970). Nonlinear identification using inverse-repeat m-sequences. Proceedings IEE
117:213-218.
Rebrin K., G.M. Steil, W.P. van Antwerp, and J.J. Mastrototaro. (1991). Subcutaneous glucose predicts
plasma glucose independent of insulin: implications for continuous monitoring. American
Journal of Physiology 277:E561-E571.
Rebrin, K., G.M. Steil, L. Getty, and R.N. Bergman. (1995). Free fatty-acid as a link in the regulation
of hepatic glucose output by peripheral insulin. Diabetes 44:1038-1045.
Recio, A., S.S. Narayan, and M.A. Ruggero. (1997). Wiener-kernel analysis of basilar-membrane responses
to white noise. In: Diversity in Auditory Mechanics, E.R. Lewis, G.R. Long, R.F. Lyon,
P.M. Narins, C.R. Steele, and E. Hecht-Poinar (Eds.), World Scientific Press, Singapore, pp.
325-331.
Reichardt, W. and T. Poggio. (1976). Visual control of orientation behaviour in the fly. Part I: A
quantitative analysis. Quarterly Reviews of Biophysics 9:311-375. [For Part II see Poggio and
Reichardt (1976).]
Rissanen, J. (1978). Modeling by shortest data description. Automatica 14:465-471.
Rissanen, J. (1996). Information theory and neural nets. In: Mathematical Perspectives on Neural
Networks, P. Smolensky, M.C. Mozer, and D.E. Rumelhart (Eds.), Lawrence Erlbaum Associates,
Mahwah, NJ, pp. 567-602.
Robinson, G.B., R.J. Sclabassi, and T.W. Berger. (1991). Kindling-induced potentiation of excitatory
and inhibitory inputs to hippocampal dentate granule cells. I. Effects on linear and nonlinear
response characteristics. Brain Research 562:17-25.
Robinson, G.B., S.J. Fluharty, M.J. Zigmond, R.J. Sclabassi, and T.W. Berger. (1993). Recovery of
hippocampal dentate granule cell responsiveness to entorhinal cortical input following norepi-
nephrine depletion. Brain Research 614:21-28.
Rosenblatt, F. (1962). Principles of Neurodynamics: Perceptrons and the Theory of Brain Mechanisms.
Spartan Books, Washington, D.C.
Rosenblueth, A. and N. Wiener. (1945). The role of models in science. Philos. Sci. 12:316-321.
Rugh, W.J. (1981). Nonlinear System Theory: The Volterra/Wiener Approach. Johns Hopkins Uni-
versity Press, Baltimore.
Rumelhart, D.E. and J.L. McClelland (Eds). (1986). Parallel Distributed Processing, Volumes I
and II. MIT Press, Cambridge, MA.
Sachs, F. (1992). Stretch-sensitive ion channels: An update. Sensory Transduction 15:241-260.
Sakai, H.M. and K.-I. Naka. (1985). Novel pathway connecting the outer and inner vertebrate reti-
na. Nature (London) 315:570.
Sakai, H.M. and K.-I. Naka. (1987a). Signal transmission in the catfish retina. IV. Transmission to
ganglion cells. Journal of Neurophysiology 58:1307-1328.
Sakai, H.M. and K.-I. Naka. (1987b). Signal transmission in the catfish retina. V. Sensitivity and
circuit. Journal of Neurophysiology 58:1329-1350.
Sakai, H.M. and K.-I. Naka (1988a). Dissection of neuron network in the catfish inner retina. I.
Transmission to ganglion cells. Journal of Neurophysiology 60:1549-1567.
Sakai, H.M. and K.-I. Naka (1988b). Dissection of neuron network in the catfish inner retina. II. Interactions
between ganglion cells. Journal of Neurophysiology 60:1568-1583.
Sakai, H.M. and K.-I. Naka. (1990). Dissection of neuron network in the catfish inner retina. V. Interactions
between NA and NB amacrine cells. Journal of Neurophysiology 63:120-130.
Sakuranaga, M. and K.-I. Naka. (1985a). Signal transmission in catfish retina. I. Transmission in
the outer retina. Journal ofNeurophysiology 53:373-388.
Sakuranaga, M. and K.-I. Naka. (1985b). Signal transmission in catfish retina. II. Transmission to
type-N cell. Journal of Neurophysiology 53:390-406.
Sakuranaga, M. and K.-I. Naka. (1985c). Signal transmission in catfish retina. III. Transmission to
type-C cell. Journal of Neurophysiology 53:411-428.
Sakuranaga, M. and Y.-I. Ando. (1985). Visual sensitivity and Wiener kernels. Vision Research
25:509.
Saltzberg, B. and W.D. Burton Jr. (1989). Nonlinear filters for tracking chaos in neurobiological
time series. In: Advanced Methods of Physiological System Modeling, Volume II, V.Z. Marmarelis
(Ed.), Plenum, New York, pp. 201-214.
Sams, A.D. and V.Z. Marmarelis. (1988). Identification of linear periodically time-varying systems
using white-noise test inputs. Automatica 24:563-567.
Sandberg, A. and L. Stark. (1968). Wiener G-function analysis as an approach to nonlinear characteristics
of human pupil reflex. Brain Research 11:194-211.
Sandberg, I.W. (1982). Expansions for nonlinear systems. Bell Syst. Tech. J. 61:159-200.
Sandberg, I.W. (1983). The mathematical foundations of associated expansions for mildly nonlinear
systems. IEEE Transactions on Circuits and Systems 30:441-445.
Saridis, G.N. (1974). Stochastic approximation methods for identification and control: A survey.
IEEE Trans. Automatic Control 19:798-809.
Saul, J.P., R.D. Berger, P. Albrecht, S.P. Stein, M.H. Chen, and R.J. Cohen. (1991). Transfer function
of the circulation: unique insights into cardiovascular regulation. American Journal of Physiology
261:H1231-H1245.
Schetzen, M. (1965a). Measurement of the kernels of a nonlinear system of finite order. International
Journal of Control 2:251-263.
Schetzen, M. (1965b). Synthesis of a class of nonlinear systems. International Journal of Control
1:401-414.
Schetzen, M. (1974). A theory of nonlinear system identification. International Journal of Control
20:557-592.
Schetzen, M. (1980). The Volterra and Wiener Theories ofNonlinear Systems. Wiley, New York.
Schetzen, M. (1981). Nonlinear system modeling based on the Wiener theory. Proc. IEEE 69:1557.
Schetzen, M. (1986). Differential models and modeling. Int. J. Contr. 44:157-179.
Sclabassi, R.J., C.L. Hinman, J.S. Kroin, and H. Risch. (1985). A nonlinear analysis of afferent
modulatory activity in the cat somatosensory system. Electroenceph. Clin. Neurophysiol.
60:444-454.
Sclabassi, R.J., J.S. Kroin, C.L. Hinman, and H. Risch. (1986). The effect of cortical ablation on
modulatory activity in the cat somatosensory system. Electroenceph. Clin. Neurophysiol.
64:31-40.
Sclabassi, R.J., D.N. Krieger, and T.W. Berger. (1987). Nonlinear systems analysis of the somatosensory
system. In: Advanced Methods of Physiological System Modeling, Volume I, V.Z.
Marmarelis (Ed.), Biomedical Simulations Resource, Los Angeles, pp. 104-127.
Sclabassi, R.J., D.N. Krieger, and T.W. Berger. (1988a). A Systems Theoretic Approach to the
Study of CNS Function. Annals of Biomedical Engineering 16:17-34.
Sclabassi, R.J., D.N. Krieger, J. Solomon, J. Samosky, S. Levitan, and T.W. Berger. (1989). Theoretical
decomposition of neuronal networks. In: Advanced Methods of Physiological System
Modeling, Volume II, V.Z. Marmarelis (Ed.), Plenum, New York, pp. 129-146.
Sclabassi, R.J., J.L. Eriksson, R.L. Port, G.B. Robinson, and T.W. Berger. (1988b). Nonlinear systems
analysis of the hippocampal perforant path-dentate projection: I. Theoretical and interpretational
considerations. J. Neurophysiol. 60:1066-1076.
Sclabassi, R.J., B.R. Kosanovic, G. Barrionuevo, and T.W. Berger. (1994). Computational methods
of neuronal network decomposition. In: Advanced Methods of Physiological System Modeling,
Volume III, V.Z. Marmarelis (Ed.), Plenum, New York, pp. 55-86.
Segall, A. and T. Kailath (1976). Orthogonal functionals of independent-increment processes. IEEE
Transactions on Information Theory IT-22:287-298.
Segundo, J.P. and A.F. Kohn. (1981). A model of excitatory synaptic interactions between pace-
makers. Its reality, its generality and the principles involved. Biol Cybern 40:113-126.
Segundo, J.P., O. Diez Martinez, and H. Quijano. (1987). Testing a model of excitatory interactions
between oscillators. Biol Cybern 55:355-365.
Segundo, J.P. and O. Diez Martinez. (1985b). Dynamic and static hysteresis in crayfish mechanoreceptors.
Biol Cybern 52:291-296.
Segundo, J.P., D.H. Perkel, H. Wyman, H. Hegstad, and G.P. Moore. (1968). Input-output relations
in computer-simulated nerve cells. Influence of the statistical properties, strength, number and
interdependence of excitatory presynaptic terminals. Kybernetik 4:157-171.
Seyfarth, E.A. and A.S. French. (1994). Intracellular characterization of identified sensory cells in a
new mechanoreceptor preparation. Journal of Neurophysiology 71:1422-1427.
Shapley, R.M. and J.D. Victor. (1978). The effect of contrast on the transfer properties of cat retinal
ganglion cells. J. Physiol. (London) 285:275.
Shapley, R.M. and J.D. Victor. (1979). Nonlinear spatial summation and the contrast gain control of
the cat retina. Journal of Physiology 290:141.
Shi, J. and H.H. Sun. (1990). Nonlinear system identification for cascade block model: an applica-
tion to electrode polarization impedance. IEEE Trans. on Biomedical Eng. 37:574-587.
Shi, J. and H.H. Sun. (1994). Identification of nonlinear system with feedback structure. In: Advanced
Methods of Physiological System Modeling, Volume III, V.Z. Marmarelis (Ed.), Plenum,
New York, pp. 139-162.
Shi, Y. and K.E. Hecox. (1991). Nonlinear system identification by m-pulse sequences: application
to brainstem auditory evoked responses. IEEE Trans. on Biomedical Eng. 38:834-845.
Shingai, R., E. Hida, and K.-I. Naka. (1983). A comparison of spatio-temporal receptive fields of
ganglion cells in the retinas of the tadpole and adult frog. Vision Research 23:943.
Soderstrom, T. and P. Stoica. (1989). System Identification. Prentice-Hall International, London.
Sohrab, S. and S.M. Yamashiro. (1980). Pseudorandom testing of ventilatory response to inspired
carbon dioxide in man. Journal of Applied Physiology 49:1000-1009.
Song, D., V.Z. Marmarelis, and T.W. Berger. (2002). Parametric and non-parametric models of
short-term plasticity. In: 2nd Joint IEEE EMBS and BMES Conference, Houston, TX.
Song, D., Z. Wang, V.Z. Marmarelis, and T.W. Berger. (2003). Non-parametric interpretation and
validation of parametric short-term plasticity models. In: Proceedings of the IEEE EMBS Conference,
Cancun, Mexico, pp. 1901-1904.
Spekreijse, H. (1969). Rectification in the goldfish retina: Analysis by sinusoidal and auxiliary
stimulation. Vision Research 9:1461-1472.
Spekreijse, H. (1982). Sequential analysis of the visual evoked potential system in man: Nonlinear
analysis of a sandwich system. Annals of the New York Academy of Sciences 388:72-97.
Sprecher, D.A. (1972). An improvement in the superposition theorem of Kolmogorov. J. Math.
Anal. Appl. 38:208-213.
Stark, L. (1968). Neurological Control Systems: Studies in Bioengineering. Plenum Press, New
York.
Stauffer, E.K., R.A. Auriemma, and G.P. Moore (1986). Responses of Golgi tendon organs to concurrently
active motor units. Brain Research 375:157-162.
Stark, L. (1969). The pupillary control system: Its nonlinear adaptive and stochastic engineering de-
sign characteristics. Automatica 5:655-676.
Stavridi, M., V.Z. Mannarelis, and W.S. Grundfest. (1995a). Simultaneous monitoring of spectral
and temporal Xe-Cl excimer laser-induced fluorescence. Meas. Sei. Techno/. 7:87-95.
Stavridi, M., V.Z. Marmarelis, and W.S. Grundfest. (1995b). Spectro-temporal studies ofXe-CI ex-
cimer laser-induced arterial wall fluorescence. Medical Engineering & Physics 17:595-601.
Stein, R.B., A.S. French, and A.V. Holden. (1972). The frequency response coherence, and infor-
mation capacity oftwo neuronal models. Biophysical Journal 12:295-322.
Stoica, P. (1981). On the convergence ofiterative algorithm used for Hammerstein system identifi-
cation. IEEE Transactions on Automatie ControI26:967-969.
Suki, B., Q. Zhang, and K. Lutchen. (1995). Relationship between frequency and amplitude depen-
dence in the lung: a nonlinear block-structured modeling approach. Journal 0/ Applied Physiolo-
gy 79:600-671.
Sun, H.H. and J.H. Shi. (1989). New algorithm for Korenberg-Billings model of nonlinear system
identification. In: Advanced Methods of Physiological System Modeling, Volume II, V.Z. Marmarelis
(Ed.), Plenum, New York, pp. 179-200.
Sun, H.H., B. Onaral, and X. Wang. (1987). Bioelectrode polarization phenomena: a fractal approach.
In: Advanced Methods of Physiological System Modeling, Volume I, V.Z. Marmarelis
(Ed.), Biomedical Simulations Resource, Los Angeles, pp. 63-72.
Sutter, E.E. (1975). A revised conception of visual receptive fields based upon pseudorandom spatio-temporal
pattern stimuli. In: Proceedings 1st Symposium on Testing and Identification of
Nonlinear Systems, G.D. McCann & P.Z. Marmarelis (Eds.), California Institute of Technology,
Pasadena, CA, pp. 353-365.
Sutter, E.E. (1992). A deterministic approach to nonlinear system analysis. In: Nonlinear Vision,
R.B. Pinter and B. Nabet (Eds.), CRC Press, Boca Raton, FL, pp. 171-220.
Sutter, E.E. (1987). A practical nonstochastic approach to nonlinear time-domain analysis. In: Advanced
Methods of Physiological System Modeling, Volume I, V.Z. Marmarelis (Ed.), Biomedical
Simulations Resource, Los Angeles, pp. 303-315.
Swanson, G.D. and J.W. Belville. (1975). Step changes in end-tidal CO2: methods and implications.
Journal of Applied Physiology 39:377-385.
Taylor, M.G. (1966). Use of random excitation and spectral analysis in the study of frequency-dependent
parameters of the cardiovascular system. Circ. Res. 18:585-595.
Theunissen, F.E., K. Sen, and A.J. Doupe. (2000). Spectral-temporal receptive fields of nonlinear
auditory neurons obtained using natural sounds. J. Neuroscience 20:2315-2331.
Thomas, E.J. (1971). Some considerations on the application of the Volterra representation of non-
linear networks to adaptive echo cancellers. The Bell System Technical Journal 50:2797-2805.
Tiecks, F.P., A.M. Lam, R. Aaslid, and D.W. Newell. (1995). Comparison of static and dynamic
cerebral autoregulation measurements. Stroke 26:1014-1019.
Toffolo, G., R.N. Bergman, D.T. Finegood, C.R. Bowden, and C. Cobelli. (1980). Quantitative
estimation of beta cell sensitivity to glucose in the intact organism. Diabetes 29:979-990.
Tranchina, D., J. Gordon, R. Shapley, and J.-I. Toyoda. (1984). Retinal light adaptation: evidence
for a feedback mechanism. Nature (London) 310:314.
Tresp, V., T. Briegel, and J. Moody. (1999). Neural-network models for the blood glucose metabolism
of a diabetic. IEEE Transactions on Neural Networks 10:1204-1213.
Trimble, J. and G. Phillips. (1978). Nonlinear analysis of human visual evoked response. Biological
Cybernetics 30:55-61.
Udwadia, F.E. and R.E. Kalaba. (1996). Analytical Dynamics: A New Approach, Cambridge University
Press, Cambridge, U.K.
Udwadia, F.E. and P.Z. Marmarelis. (1976). The identification of building structural systems, I. The
linear case; II. The nonlinear case. Bulletin of the Seismological Society of America 66:125-171.
Ursino, M., and C.A. Lodi. (1998). Interaction among autoregulation, CO2 reactivity and intracranial
pressure: A mathematical model. American Journal of Physiology 274:H1715-H1728.
Ursino, M., A. Ter Minassian, C.A. Lodi, and L. Beydon. (2000). Cerebral hemodynamics during
arterial CO2 pressure changes: in vivo prediction by a mathematical model. American Journal of
Physiology 279:H2439-H2455.
van Dijk, P., H.P. Wit, J.M. Segenhout, and A. Tubis. (1994). Wiener kernel analysis of inner ear
function in the American bullfrog. The Journal of the Acoustical Society of America
95:904-919.
van Dijk, P., H.P. Wit, and J.M. Segenhout. (1997a). Dissecting the frog inner ear with Gaussian
noise. I. Application of high-order Wiener kernel analysis. Hearing Research 114:229-242.
van Dijk, P., H.P. Wit, and J.M. Segenhout. (1997b). Dissecting the frog inner ear with Gaussian
noise. II. Temperature dependence of inner-ear function. Hearing Research 114:243-251.
van Trees, H.L. (1964). Functional techniques for the analysis of the nonlinear behavior of phase-locked
loops. Proc. IEEE 32:891-911.
Vassilopoulos, L.A. (1967). The application of statistical theory of nonlinear systems to ship perfor-
mance in random seas. Int. Shipbuild. Prog. 14:54-65.
Vicini, P., A. Caumo, and C. Cobelli. (1999). Glucose effectiveness and insulin sensitivity from the
minimal model: consequences of undermodeling assessed by Monte Carlo simulation. IEEE
Transactions on Biomedical Engineering 46:130-137.
Victor, J.D. (1979). Nonlinear system analysis: comparison of white noise and sum of sinusoids in a
biological system. Proc. Natl. Acad. Sci. U.S.A. 76:996-998.
Victor, J.D. (1987). Dynamics of cat X and Y retinal ganglion cells, and some related issues in nonlinear
systems analysis. In: Advanced Methods of Physiological System Modeling, Volume I,
V.Z. Marmarelis (Ed.), Biomedical Simulations Resource, Los Angeles, pp. 148-160.
Victor, J.D. (1989). The geometry of system identification: fractal dimension and integration formulae.
In: Advanced Methods of Physiological System Modeling, Volume II, V.Z. Marmarelis
(Ed.), Plenum, New York, pp. 147-164.
Victor, J.D. (1991). Asymptotic approach of generalized orthogonal functional expansions to
Wiener kernels. Annals of Biomedical Engineering 19:383-399.
Victor, J.D. and B.W. Knight. (1979). Nonlinear analysis with an arbitrary stimulus ensemble. Q.
Appl. Math. 37:113-136.
Victor, J.D. and R.M. Shapley. (1979a). Receptive field mechanisms of cat X and Y retinal ganglion
cells. Journal of General Physiology 74:275.
Victor, J.D. and R.M. Shapley. (1979b). The nonlinear pathway of Y ganglion cells in the cat retina.
Journal of General Physiology 74:671-689.
Victor, J.D. and R.M. Shapley. (1980). A method of nonlinear analysis in the frequency domain.
Biophysical Journal 29:459-484.
Victor, J.D., R.M. Shapley, and B.W. Knight. (1977). Nonlinear analysis of cat retinal ganglion
cells in the frequency domain. Proc. Natl. Acad. Sci. U.S.A. 74:3068.
Volterra, V. (1930). Theory of Functionals and of Integral and Integro-Differential Equations,
Dover Publications, New York.
Waddington, J. and F. Fallside (1966). Analysis of nonlinear differential equations by the Volterra
series. International Journal of Control 3:1-15.
Watanabe, A. and L. Stark. (1975). Kernel method for nonlinear analysis: Identification of a biological
control system. Mathematical Biosciences 27:99-108.
Webster, J.G. (1971). Pupillary light reflex: The development of teaching models. IEEE Transactions
on Biomedical Engineering 18:187-194.
Weiss, P.L., I.W. Hunter, and R.E. Kearney. (1988). Human ankle joint stiffness over the full range
of muscle activation levels. Journal of Biomechanics 21:539-544.
Westwick, D.T. and R.E. Kearney. (1992). A new algorithm for the identification of multiple input
Wiener systems. Biol. Cybern. 68:75-85.
Westwick, D.T. and R.E. Kearney. (1994). Identification of multiple-input nonlinear systems using
non-white test signals. In: Advanced Methods of Physiological System Modeling, Volume III,
V.Z. Marmarelis (Ed.), Plenum, New York, pp. 163-178.
Wickesberg, R.E. and C.D. Geisler. (1984). Artifacts in Wiener kernels estimated using Gaussian
white noise. IEEE Transactions on Biomedical Engineering 31:454-461.
Wickesberg, R.E., J.W. Dickson, M.M. Gibson, and C.D. Geisler. (1984). Wiener kernel analysis
of responses from anteroventral cochlear nucleus neurons. Hearing Research 14:155-174.
Widrow, B. and M.A. Lehr. (1990). 30 years of adaptive neural networks: Perceptron, madaline and
backpropagation. Proc. IEEE 78:1415-1442.
Wiener, N. (1938). The homogeneous chaos. Am. J. Math. 60:897.
Wiener, N. (1942). Response of a nonlinear device to noise. Report No. 129, Radiation Laboratory,
M.I.T., Cambridge, MA.
Wiener, N. (1958). Nonlinear Problems in Random Theory. MIT Press, Cambridge, MA.
Wray, J. and G.G.R. Green. (1994). Calculation of the Volterra kernels of nonlinear dynamic systems
using an artificial neural network. Biol. Cybern. 71:187-195.
Wysocki, E.M. and W.J. Rugh. (1976). Further results on the identification problem for the class of
nonlinear systems SM. IEEE Trans. Circuits & Systems 23:664-670.
Yamada, W.M. and E.R Lewis. (1999). Predicting the temporal responses of non-phaselocking
bullfrog auditory units to complex acoustic waveforms. Hearing Research 130:155-170.
Yamada, W.M. and E.R. Lewis. (2000). Demonstrating the Wiener kernel description of tuning and
suppression in an auditory afferent fiber: Predicting the AC and DC response to a complex
novel stimulus. In: Recent Developments in Auditory Mechanics, H. Wada, T. Takasaka, K. Ikeda,
K. Ohyama, T. Koike (Eds.), World Scientific, Singapore, pp. 506-512.
Yamada, W.M., K.R. Henry, and E.R. Lewis. (2000). Tuning, suppression and adaptation in
auditory afferents, as seen with second-order Wiener kernels. In: Recent Developments in Auditory
Mechanics, H. Wada, T. Takasaka, K. Ikeda, K. Ohyama, and T. Koike (Eds.), World Scientific,
Singapore, pp. 419-425.
Yamada, W.M., G. Wolodkin, E.R. Lewis, and K.R. Henry. (1997). Wiener kernel analysis and the
singular value decomposition. In: Diversity in Auditory Mechanics. E.R. Lewis, G.R. Long, R.F.
Lyon, P.M. Narins, C.R. Steele, and E. Hecht-Poinar (Eds.), World Scientific Press, Singapore,
pp. 111-118.
Yasui, S. (1979). Stochastic functional Fourier series, Volterra series, and nonlinear systems
analysis. IEEE Trans. Autom. Contr. 24:230-242.
Yasui, S. (1982). Wiener-like Fourier kernels for nonlinear systems identification synthesis (non-
analytic cascade bilinear and feedback case). IEEE Trans. Automatic Control 27:667.
Yasui, S. and D.H. Fender. (1975). Methodology for measurement of spatiotemporal Volterra and
Wiener kernels for visual systems. In: Proceedings 1st Symposium on Testing and Identification
of Nonlinear Systems. California Institute of Technology, Pasadena, CA, pp. 366-383.
Yasui, S., W. Davis, and K.I. Naka. (1979). Spatio-temporal receptive field measurements of retinal
neurons by random pattern stimulation and cross-correlation. IEEE Transactions on Biomedical
Engineering 26:5-11.
Yates, F.E. (1973). Systems biology as a concept. In Engineering Principles in Physiology, Volume
1, H.V. Brown and D.S. Gann (Eds.), Academic Press, New York.
Yeshurun, Y., Z. Wollberg, and N. Dyn. (1987). Identification of MGB cells by Volterra kernels, III:
a glance into the black box. Biological Cybernetics 56:261-268.
Zadeh, L.A. (1956). On the identification problem. IRE Trans. Circuit Theory 3:277-281.
Zadeh, L.A. (1957). On the representation of nonlinear operators. In: IRE Wescon Conv. Rec., Part
2, pp. 105-113.
Zames, G.D. (1963). Functional analysis applied to nonlinear feedback systems. IEEE Transactions
on Circuit Theory 10:392-404.
Zhang, R., J.H. Zuckerman, C.A. Giller, and B.D. Levine. (1998). Transfer function analysis of
dynamic cerebral autoregulation in humans. American Journal of Physiology 274:H233-H241.
Zhang, R., J.H. Zuckerman, and B.D. Levine. (2000). Spontaneous fluctuations in cerebral blood
flow velocity: insights from extended duration recordings in humans. American Journal of
Physiology 278:H1848-H1855.
Zhao, X. and V.Z. Marmarelis. (1994a). Identification of parametric (NARMAX) models from
estimated Volterra kernels. In: Advanced Methods of Physiological System Modeling, Volume III,
V.Z. Marmarelis (Ed.), Plenum, New York, pp. 211-218.
Zhao, X. and V.Z. Marmarelis. (1994b). Equivalence between nonlinear differential and difference
equation models using kernel invariance methods. In: Advanced Methods of Physiological
System Modeling, Volume III, V.Z. Marmarelis (Ed.), Plenum, New York, pp. 219-228.
Zhao, X. and V.Z. Marmarelis. (1997). On the relation between continuous and discrete nonlinear
parametric models. Automatica 33:81-84.
Zhao, X. and V.Z. Marmarelis. (1998). Nonlinear parametric models from Volterra kernels
measurements. Math. Comput. Modelling 27:37-43.
Zierler, N. (1959). Linear recurring sequences. Journal of the Society for Industrial and Applied
Mathematics 7:31-49.
Index

Action potentials, 414 exogenous variable (ARMAX) model,


Additive parallel branches, 198 147, 168
Aliasing, 13 parameters, 147
Amplitude nonlinearity, 43 Axon hillock, 414
Analysis of estimation errors, 125 Axons, 414
Anatomists, 3, 25
Anisotropy, 492 Band-limited GWN, 92
ANN, see Artificial neural network advantages, 92
Apparent transfer function (ATF), 93, 158 disadvantages, 93
illustrative example, 160 input, 60
of linearized models, 158 Bandwidth, 270
Applications of two-input modeling to Beta rule, 243, 279
physiological systems, 369 Bose, Amar, 71, 142
Arbitrary inputs, 52 Broadband stochastic input/output signals, 5
ARMA model, 148 Brownian motion, 57
ARMAX model; see Autoregressive moving
average with exogenous variable Caltech, 143
model Cardiovascular system, 320
ARMAX tX2 model, 148 Causal systems, 10
Artificial neural network (ANN), 223 Cerebral autoregulation, 22
Artificial pancreas, 345, 347 in humans, 380
Ascending-order MOS procedure, 258 Closed-loop condition, 153
Asclepiades, 26, 27 Closed-loop model, 490
ATF, see Apparent transfer function autoregressive form, 490
Auditory nerve fibers, 302 Closed-loop systems, 489
Autocorrelation functions of random processes, network model form, 491
505 Coherence function, 94
Autoregressive moving average with Coherence measurements, 93

Nonlinear Dynamic Modeling of Physiological Systems. By Vasilis Z. Marmarelis 535


ISBN 0-471-46960-2 © 2004 by the Institute of Electrical and Electronics Engineers.
536 INDEX

Comparative use of GWN, PRS, and CSRS, 92 Efficient Volterra kernel estimation, 100
Comparison of Volterra/Wiener model Eigen-decomposition (ED) approach, 188
predictions, 64 ELS, see Extended least squares
Connectionist models, 223 Empiricists, 3, 26, 27
relation with PDM modeling, 230 Enhanced convergence algorithms for fixed
Constant zeroth-order Wiener functional h0, 73
Constant-switching-pace symmetric random Equivalence between connectionist and
signal (CSRS), 80 Volterra models, 223
advantages, 93 Equivalence between continuous and discrete
and Volterra kernels, 84 parametric models, 171
disadvantages, 93 illustrative example, 175
Cross talk, 43 Erasistratus, 25
Cross-correlation technique, 17, 78, 100, 449 Ergodicity, 273, 505
Cross-correlation technique (CCT), 113 Erroneous scaling ofkernel estimates, 136
for Wiener kernel estimation, 72 Error term, 9
of nonparametrie modeling, 16 Estimation bias, 128
Cross-correlation-based method for multiinput Estimation error, 132
modeling, 390 analysis of, 125
CSRS, see Constant-switching-pace symmetric Estimation errors associated with direct
random signal inversion methods, 137
CCT, see Cross-correlation technique Estimation errors associated with iterative
Cubic feedback, 222 cost-minimization methods, 139
systems, 204 Estimation errors associated with the cross-
Cybernetics, 32 correlation technique, 127
Estimation of h0, 73
Data consolidation method, 276 Estimation of h2(t1, t2), 74
Data preparation, 275 Estimation of h3(t1, t2, t3), 75
Data record length, 273 Estimation variance, 130, 132
Deductive modeling, 24 Extended least-squares (ELS) procedure, 150
Delta-bar-delta rule, 243, 279
Democritus, 26 Fast exact orthogonalization, 55
Dendrites, 414 Feedback branches, 200
Dendritic potentials (DPs), 415 Feedback mechanisms, 200
Diagonal estimability problem, 85 Feedforward Volterra-equivalent network
Differential-equation models, 145 architectures, 229
Discrete-time representation of the CSRS Filter banks, 251
functional series, 89 First-order Volterra functional, 35
Discrete-time Volterra kernels of NARMAX First-order Volterra kernel, 34
models, 164 Fly photoreceptor, 85
Discretized output, 17 F-ratio test, 151
Disease process, 3 Frequency-domain estimation of Wiener
DLF expansions for kernel estimation, 112 kernels, 78
DP, see Dendritic potentials Function expansions, 495-498
Dual-input stimulation in the hippocampal Functional integration in the single neuron, 414
slice, 455
Duffing system, 98 Galen of Pergamos (Galenos), 3, 27, 28, 143
Dynamic nonlinearity, 43 Gaussian white noise (GWN), 499
Dynamic range, 270 input, 30
Dynamic system physiology, 3 test input, 16
Dynamic systems, 30 General model of membrane and synaptic
dynamics, 408
ED, see Eigen decomposition Generalized harmonic balance method, 173
Global predictive model, 153 Iterative cost-minimization methods for non-


Glucose balance, 354 Gaussian residuals, 55
equation, 354 Iterative estimation methods, 139
Glucose metabolism, 344
Glucose production, 354 KBR method, see Kemel-based method
Glucose-insulin minimal model, 21; see also Kernel expansion approach, 55
Minimal model Kernel expansion method, 469
Graded potentials, 414 Kernel expansion methodology, 101
Gram matrix, 53, 54, 104 Kernel invariance method, 171, 172
GWN, see Gaussian white noise Kernel-based (KBR) method, 169, 170
Kernel-expansion method for multiinput
Harvey, William, 3, 27 modeling, 393
H-H model, see Hodgkin-Huxley model Kronecker delta, 69
Hidden layer, 278
Higher-order nonlinearities, 37, 116 Lag-delta representation of P-V or P-W
High-order kernels, 78 kernels, 444
High-order Volterra modeling with equivalent Laguerre expansion technique (LET), 31, 107,
networks, 122 113, 455
High-order Volterra models, 122 Laguerre functions, 107
Hippocampal formation, 448 Laguerre-Volterra Network (LVN), 246
Hippocrates, 1, 3, 25, 26, 27, 28, 143 illustrative example, 249
Hodgkin-Huxley (H-H) model, 162, 408
Homogeneous chaos, 57 Lee, Y. W., 142
Hypothesis-driven research, 4 Leucippus, 26
Light-to-horizontal cell model, 217
Impulse invariance method, 171 Likelihood of firing, 286
Impulse sequences, 51 Linear time-varying systems with arbitrary
Impulsive input, 42 inputs, 479
Inductive modeling, 24 Linearized models, apparent transfer functions
Inductively derived models, 2 of, 158
Input characteristics, 269 L-N cascade, 194
Input-additive noise, 135 system, 38, 96
Input-output data, 13, 30 L-N model, 196
Input-output signal transformation, 8 L-N-M cascade, 194, 198
Instrumental variable (IV) method, 153 L-N-M model, 196
Insulin action, 354 L-N-M "sandwich" system, 39
Insulin production, 354 LVN variant with two filter banks (LVN-2),
Insulin secretion, 344 253
Insulin sensitivity, 347 LVN-2 Modeling, 255
Insulin-glucose interactions, 342
Insulinogenic PDM, 352 Marmarelis, Panos, 143, 286, 287, 295, 361,
Insulinoleptic PDM, 353 369
Integrated PDM model with trigger regions, Marmarelis, Vasilis, 143, 288
427 Mathematical models, 1, 7
Integrative and dynamic view of physiology, "ideal," 7
3 "less than ideal," 7
Interaction layer, 278 bandwidth, 11
Interference, 8, 135, 271 compact,7
Interpretation of the PDM model, 282 datasets, 8
Interpretation of Volterra kernels, 281 dynamic range, 11
Invertebrate photoreceptor, 18 efficiency, 11
Invertebrate retina, 296 global, 7
Mathematical models (continued) Multiinput/multioutput systems, 10


interpretable, 7 Multiple interconnections, 5
operational range, 11 Multiple variables of interest, 5
practical considerations and experimental Multiplicative feedback, 201
requirements of, 266 Myogenic mechanism, 333
robustness, 7
scientific interpretability, 11 Naka, Ken, 143,286,287,288,290,295,361
signals, 8 NARMAX model, 151, 152, 164, 165, 168,
trade-off between model parsimony and its 169, 170
global validity, 11 Negative decompressive feedback, 222
McCann, Gilbert, 143, 287, 361 nervous system, 407
MDV model, see Modified discrete Volterra
(MDV) model applications to nonstationary physiological
MDV modeling methodology, 277 systems, 484
Measurement noise, 5, 8 illustrative examples, 481
Metabolic autoregulation in dogs, 378
Metabolic-Endocrine system, 342 Neuronal modes, (NMs), 417
Method of generalized harmonic balance, 154
Methodists, 27 438
Minimal model of insulin-glucose interaction, Neuronal unit, 408
161, 353, 354, 356 Neurosensory systems, 286
Minimum-order modeling of spike-output N-M cascades, 194
systems, 431 N-M model, 196
Minimum-order Wiener models, 434 Noise, 271
illustrative example, 438 effects, 134
MIT, 142 Non-Gaussian white-noise, 500
Model estimation, 10, 276, 284 Non-Gaussian, quasiwhite input signals, 77
Model interpretation, 281 Nonlinear (stationary) systems, 150
Model order determination, 104 Nonlinear autoregressive modeling (open-
Model specification, 10, 276, 284 loop), 246
inductive versus deductive model Nonlinear behavior, 6
development, 11 Nonlinear dynamic analysis, 3
Model validation, 279 Nonlinear dynamics, 5
Modeling errors, 8, 126 Nonlinear feedback, 220
Modeling of closed-loop systems, 489-493 described by differential equations, 202
Modeling of multiinput/output systems, in sensory systems, 216
359-406 Nonlinear modeling of physiological systems,
multiinput case, 389 29
two-input case, 360 conceptual/mathematical framework, 30
Modeling of neuronal ensembles, 462 objective, 30
Modeling of neuronal systems, 407-465 strengths, 29
Modeling of nonstationary systems, 467-488 weaknesses, 29
illustrative example, 474 Nonlinear modeling of synaptic dynamics, 459
Modeling physiological systems with multiple Nonlinear models ofphysiological systems, 13
inputs and multiple outputs, 359 connectionist, 14
Modified discrete Volterra (MDV) model, 103 modular, 14
Modular and connectionist modeling, 179-264 nonparametric, 14
Modular form of nonparametric models, 179 parametric, 14
Modular representation, 177 Nonlinear parametric models with
Modulatory feedforward branches, 198 intermodulation, 161
Motion detection in the invertebrate retina, Nonlinearity, 12
369 amplitude, 43
Nonparametric modeling, 29-143 test for system memory, 272


Nonstationarity, 12, 13, 152, 467 test for system stationarity and ergodicity,
modeling of, 467-488 273
modeling problem, 472 Prewhitening, 56
system, 30 filter, 150
in system dynamics, 5 Principal dynamic mode (PDM), 179, 180; also
system/model, 34 see PDM
Nonwhite Gaussian inputs, 98 Problems of modeling in physiology, 6
Pseudolinear regression problem, 150
One-step predictive model, 153 Pseudomode-peeling method, 245
Open-loop condition, 153 Pseudorandom sequences, 91
Optimization of input parameters, 131 Pseudorandom signals (PRSs), 89
Ordinary least-squares (OLS) estimate, 53 advantages, 93
Orthogonal Wiener series, 30 based on m-Sequences, 89
disadvantages, 93
Parabolic leap algorithm, 279
Parallel-cascade method, 55
Parametric model, 167, 168 Quadratic Volterra system, 97
Parametric modeling, 145-178 Quasistationary approach, 468, 469
basic parametric model forms and estimation Quasiwhite test input, 30, 80
procedures, 146 Quasiwhiteness, 270
PDM (principal dynamic mode) model
interpretation, 282
insulinogenic, 352 Random broadband input signals, 6
insulinoleptic, 353 Real physiological systems, 5
integrated, with trigger regions, 427 Receptive field organization in the vertebrate
Physiological system modeling, 3, 6, 7 retina, 370
complexity, 11 Recuperative faculties, 3
data driven, 11 Recursive tracking methods, 468
data-driven, 25 Reduced P-V or P-W kernels, 445
deductive, 24 Regulatory branches, 199
inductive, 11, 24 Regulatory feedback, 202
linearity, 12 Relation between Volterra and Wiener models,
nonlinearities in, 12 60
nonstationarities in, 12 analytical example, 86
superposition principle, 12 comparison of model prediction errors, 88
synergistic, 24, 25 Renal system, 333
Physiological system modeling problem, 13 Residual, 9
Physiological variables, 8 Residual whitening method, 150
inputs, 8 Reverse-correlation technique, 432
outputs, 8 Riccati equation, 19, 157, 158
Physiology, problems of modeling in, 6 Riccati system, 40
Piecewise stationary modeling methods, 468
Positive compressive feedback, 222 Sandwich model, 39
Positive nonlinear feedback, 213 Second-order kernels of nonlinear feedback
Posterior filter, 245 systems, 215
Practical considerations and experimental Second-order Volterra functional, 36
requirements of mathematical Second-order Volterra kernel, 44
modeling, 266 Second-order Volterra system, 35, 36
Preliminary testing, 272 Second-order Wiener model, 16
test for system bandwidth, 272 Separable Volterra network (SVN), 225
test for system linearity, 274 Settling time, 267
Sigmoid feedback, 222 TGF, see tubuloglomerular feedback


systems, 209 Themison, 27
Signal characteristics, 266 Three-layer perceptron (TLP), 224
Significant response, 267 Time invariance, 268
Single-input stimulation in vitro, 455 TLP, see three-layer perceptron
Single-input stimulation in vivo, 449 Transformation, 414
Sinusoidal input, 43 Trigger lines, 421
Socrates, 5 Trigger regions, 417, 421
Sources of estimation errors, 125 Tubuloglomerular feedback (TGF), 333
estimation method errors, 125 Two-dimensional Fourier transform, 36,44
model specification errors, 125 Two-input cross-correlation technique, 362
noise/interference errors, 125 Two-input kernel-expansion technique, 362
Spatiotemporal modeling, 395, 397
of cortical cells, 402 Variable step algorithms, 243
of retinal cells, 398 Vector notation, 10
Spectrotemporal model, 397, 395 VEN, see Volterra-equivalent network
Spider mechanoreceptor, 307 VENNWM modeling methodology, 278
Spontaneous insulin-to-glucose PDM model, Vertebrate retina, 15, 287
352 Vesalius, 3, 27
Stark, Larry, 286 Volterra, Vito, 31, 32, 33, 140, 141
Static nonlinear system, 37 Theory of Functionals and Integro-
Stationarity, 12, 273, 505 Differential Equations, 32
Stationary system, 30 Volterra analysis of Riccati equation, 19
Step-by-step procedure for physiological Volterra-equivalent network (VEN), 223,224
system modeling, 283 architectures, 223, 235
Stochastic error term, 8, 9 for nonlinear system modeling, 235
Sum of sinusoids of incommensurate convergence and accuracy of the training
frequencies, 52 procedure, 240
SVN, see separable Volterra network equivalence with Volterra kernels/models,
Synaptic junction, 414 238
System bandwidth, 266, 272 network parameter initialization, 241
System characteristics, 266 selection of the structural parameters, 238
System dynamic range, 267 selection of the training and testing data sets,
System ergodicity, 268 240
System linearity, 268, 274 with two inputs, 364
System memory, 267, 272 illustrative example, 366
System modeling, 8, 11 Volterra functional, 33, 34, 36, 42, 43, 44
System nonlinear dynamics, 4 expansion, 30, 37
System stationarity, 268 series expansion, 31
Systemic interference, 5 Volterra kernel, 18, 20, 22, 30, 33, 34, 37, 42,
Systemic noise, 135 167
discrete-time, 20
Taylor multivariate series expansion of an expansion, 101
analytic function, 32 estimation of, 49, 101
Taylor series coefficients, 37 first order, 20
Taylor series expansion, 33 meaning of, 45
Test of nonstationarity, 475 of nonlinear differential equations, 153
Test systems operational meaning, 41
for system bandwidth, 272 second order, 20
for system linearity, 274 Volterra modeling framework, 35
for system memory, 272 Volterra models, 31, 37, 223
for system stationarity and ergodicity, 273 discrete-time, 47
frequency-domain analysis, 48 Weierstrass theorem, 33


frequency-domain representation, 45 Whiteness, 270
of system cascades, 191 Wiener, Norbert, 6, 32, 140, 141, 142, 143
of systems with feedback branches, 200 approach to kernel estimation, 67
of systems with lateral branches, 198 Wiener class of systems, 62
Volterra series, 30, 31, 32 Wiener functionals, 58, 59
expansion, 32, 33, 37 Wiener kernel, 16, 30, 58, 59, 77
Volterra-Wiener approach, 4, 15 estimation, 77
Volterra/Wiener model predictions, Wiener model, 57, 30, 195
comparison of, 64 examples of, 63
Volterra-Wiener network (VWN), 122 Wiener series, 57, 58, 503
Volterra-Wiener-Marmarelis (VWM) model, construction of, 503
260 Wiener-Bose model, 122
VWN, see Volterra-Wiener network
