Information Theory and Neuroscience

Jonathan Waxman

November 18, 2009

1 Introduction

Neuroscience is the study of the structure and function of the brain and the nervous system and
their interactions and relationships to other physiological systems and biological phenomena. The
field is vast, ever-expanding, and spans an enormous range of inquiry, including investigations
into the subcellular mechanisms that subserve neuronal function, studies involving interactions
among groups of neurons and neuronal assemblies, as well as large-scale functions of the brain
and nervous system, and research into the dynamical interactions between the nervous system and
other physiological systems, such as the cardiovascular and respiratory systems. Neuroscience also
seeks to understand the structure and basis of complex bio-behavioral processes such as sleep,
cognition, and consciousness. Despite the seemingly boundaryless nature of neuroscience, much
research conducted under its purview is united by the guiding notion of the neuron as a basic
biological information processing element. Therefore, at every level of inquiry is the overarching
theme of information. This provides a natural entry point for the application of information theory
to the problems of neuroscience. Information theory provides a mathematical and conceptual
framework for quantifying, storing, processing, and transmitting information in a communication
system. Since the nervous system may be conceptualized as a communication system wherein
information is stored, processed, and transmitted, information theory provides a natural analytic
environment for neuroscience. The links between the work of Claude Shannon and Norbert Wiener,
who founded cybernetics, suggest that this may have been recognized around the time Shannon
published his first papers on information theory, if not soon after. In any case, the application of
information theory to the neurosciences has a long and rich history. Investigators too numerous
to mention here have utilized a wide range of concepts and techniques from information theory
to study and gain insight into various aspects of the nervous system. This paper provides a very
brief review of the application of information theory to the study of a few major active areas of
neuroscientific research at three organizational levels, namely, the level of the neuron, the level of
the brain, and the level of physiological systems.

2 The Nervous System

First, a short review of the nervous system is in order. The nervous system is a vast network of
interconnected cells adapted to send and receive signals to and from one another. It is thought that
the specific organization and interconnection of these cells underlies every aspect of physiological
regulation and homeostasis, as well as behavior, cognition, and consciousness. Naturally, due to
the incredible diversity that has evolved among different species, the structure and function of
the nervous system varies tremendously. Nevertheless, every nervous system appears to share
several basic invariant properties. Every nervous system, even that of the simplest invertebrate,
is composed of neurons, which are thought to be the basic units of biological information processing.
As animals evolved more complex behaviors, nervous systems developed from simple networks of
sensory neurons and interneurons to highly complex interconnected systems of many interacting
components along with an array of specialized neurons adapted to various forms of sensing internal
and external stimuli, motor neurons for effecting movement, numerous kinds of interneurons, and
support cells. Thus, we observe in highly evolved mammals a central nervous system and elaborate
brain with many specialized functional areas, a spinal cord enclosed in a sheath and protected by
the skeletal system, and a peripheral nervous system that connects the central nervous system to
every other system in the body.

Figure 1: The anatomy of a neuron.

The precise mechanisms which underlie neuronal signal transmission in all animals have been known
for a long time. Their mathematical description was famously elucidated by Alan Hodgkin and Andrew
Huxley in 1952.[1] For their work, they were awarded the Nobel Prize in Physiology or Medicine
in 1963. Briefly, all neurons possess three principal structures as illustrated in Figure 1, namely, a
soma, dendrites, and an axon. The soma is the cell body, which contains the nucleus of the cell.
Dendrites are highly branched extensions of the cell membrane that receive most of the neuron’s
inputs. The axon is a long, branched, cable-like projection of the cell membrane that can extend
distances many orders of magnitude greater than the length of the soma. Electrochemical impulses
or spikes, called action potentials, are generated at the soma and travel along the axon towards
other neurons. The ends of axonal branches, called axon terminals, connect to other neurons and
their dendrites via synapses. A synapse refers to the very narrow space between the axon terminal
and the neuron to which it communicates. When the action potential reaches the axon terminal, a
series of electrochemical events occur which cause the release of substances called neurotransmitters
from the axon terminal into the synapse. When enough of these neurotransmitters are present, an
action potential is triggered in an all-or-none fashion in the post-synaptic neuron. This new action
potential then travels down the post-synaptic neuron’s axon to yet more neurons. The number of
neurons and synapses, and hence, interconnections, between neurons may be huge. For instance,
each of the 10¹¹ neurons of the human brain has approximately 7000 synaptic connections to other
neurons. One of the central hypotheses of neuroscience is that information is encoded in the time-
frequency characteristics of trains of action potentials. For example, in many experiments in which
information theory is applied, the probability distribution and entropy of the time-varying firing
rate of a neuron is calculated.

3 Information Theory

There are many excellent texts on information theory.[2, 3] Therefore, only several major concepts
that are routinely used in neuroscience research will be briefly discussed here. The first concept is
that of entropy. Here, I will only consider discrete entropy. Entropy is a function of the probability
mass function of a random variable, 𝑋, and is defined as


𝐻(𝑋) = 𝐻(𝑝) = − ∑_{𝑥∈𝒳} 𝑝(𝑥) log 𝑝(𝑥)    (1)

where 𝑥 is a particular value taken on by the random variable 𝑋. It may be any value in the
alphabet, or set, 𝒳. 𝑝(𝑥) is the probability of observing 𝑋 = 𝑥. Since 0 ≤ 𝑝(𝑥) ≤ 1, discrete
entropy is always nonnegative. I will assume the log is to the base 2, so entropy will be measured
in bits. There are many equivalent ways of conceptualizing entropy. Most commonly entropy
is considered to be a measure of the amount of uncertainty in 𝑋 or the average number of bits
required to describe 𝑋. Hence, a random variable with higher entropy than another random variable
contains more uncertainty. Therefore, it requires, on average, more bits to represent its values.
Entropy is maximized when the probability mass function of the random variable is uniform. Then
𝐻(𝑋) = log ∣𝒳∣. Joint entropy, denoted by 𝐻(𝑋1, 𝑋2, ..., 𝑋𝑛), is the entropy of a joint probability
mass function, which describes the probability of making simultaneous observations of a set of 𝑛
random variables. It represents the uncertainty in a set of random variables, or the average number
of bits required to jointly describe an observation of the set of random variables. Conditional
entropy, denoted by 𝐻(𝑌∣𝑋1, 𝑋2, ..., 𝑋𝑛), is the entropy of a conditional probability mass function.
Equivalently, it is the entropy of the random variable 𝑌 given knowledge of the other random variables.
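
To make these definitions concrete, the short Python sketch below computes the discrete entropy, joint entropy, and conditional entropy of a small, entirely hypothetical joint distribution; the numbers are illustrative only and are not taken from any study discussed later.

import numpy as np

def entropy(p):
    # Shannon entropy in bits of a discrete distribution given as an array of probabilities.
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]                         # treat 0 * log 0 as 0
    return -np.sum(p * np.log2(p))

# Hypothetical joint distribution p(x, y) over two binary variables.
p_xy = np.array([[0.4, 0.1],
                 [0.1, 0.4]])

H_joint = entropy(p_xy)                  # joint entropy H(X, Y)
H_x = entropy(p_xy.sum(axis=1))          # marginal entropy H(X)
H_y = entropy(p_xy.sum(axis=0))          # marginal entropy H(Y)
H_y_given_x = H_joint - H_x              # conditional entropy H(Y | X), via the chain rule
print(H_x, H_y, H_joint, H_y_given_x)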

A simple example will demonstrate a common way in which these concepts are applied to neuro-
science. Consider a recording of neural activity, 𝑅, in response to some stimulus 𝑆. We assume 𝑅
and 𝑆 to be generated by some unknown and known stochastic processes, respectively. By estimat-
ing the probability distribution of some feature of 𝑅 we may calculate 𝐻(𝑅), or the entropy of the
neural response. Since the stimulus is designed by the experimenter, we are given its probability
distribution, and hence its entropy, 𝐻(𝑆). Suppose we wish to estimate how much uncertainty
remains in the neural response once the stimulus is known. We may represent this as 𝐻(𝑅∣𝑆).
Conversely, 𝐻(𝑆∣𝑅) can be thought of as the uncertainty remaining in the stimulus once the neural
response is observed. These quantities are necessary for understanding the next important concept
of information theory, mutual information.

Mutual information is defined by

𝐼(𝑋; 𝑌) = 𝐼(𝑌; 𝑋) = ∑_{𝑥∈𝒳, 𝑦∈𝒴} 𝑝(𝑥, 𝑦) log [ 𝑝(𝑥, 𝑦) / (𝑝(𝑥)𝑝(𝑦)) ].    (2)

and may be written in terms of entropies:

𝐼(𝑋; 𝑌 ) = 𝐻(𝑋) − 𝐻(𝑋∣𝑌 ) (3)


= 𝐻(𝑌 ) − 𝐻(𝑌 ∣𝑋) (4)
= 𝐻(𝑋) + 𝐻(𝑌 ) − 𝐻(𝑋, 𝑌 ). (5)

Mutual information may also be conceived of in many different ways. It is the reduction in uncer-
tainty of 𝑋 (or 𝑌 ) due to knowledge of 𝑌 (or 𝑋). Equivalently, it is the amount of information
shared by 𝑋 and 𝑌. Additionally, if 𝑋 is thought of as an input to and 𝑌 an output of a communi-
cation channel, 𝐼(𝑋; 𝑌 ) may be thought of as the rate of information transfer across the channel.
Finally, mutual information can be seen as a kind of distance between the joint probability dis-
tribution and product of marginal distributions of a set of random variables. Hence, when the
random variables are all independent from one another, the mutual information is 0. When mutual
information is nonzero, the variables share some information, or the input and output of a com-
munication channel share information, i.e., communication is taking place or information is being
transferred.
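
As a quick illustration, the sketch below evaluates both the definition in Eq. (2) and the identity in Eq. (5) for a hypothetical joint distribution and confirms that they give the same value; the distribution is again an arbitrary toy example.

import numpy as np

def entropy(p):
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

p_xy = np.array([[0.4, 0.1],             # hypothetical joint distribution p(x, y)
                 [0.1, 0.4]])
p_x = p_xy.sum(axis=1, keepdims=True)    # marginal p(x), as a column
p_y = p_xy.sum(axis=0, keepdims=True)    # marginal p(y), as a row

# Definition (2): sum of p(x, y) log [ p(x, y) / (p(x) p(y)) ]
mask = p_xy > 0
I_def = np.sum(p_xy[mask] * np.log2(p_xy[mask] / (p_x * p_y)[mask]))

# Identity (5): I(X; Y) = H(X) + H(Y) - H(X, Y)
I_ent = entropy(p_x) + entropy(p_y) - entropy(p_xy)
print(I_def, I_ent)                      # the two values agree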

In the context of the simple example given above, we may wish, for instance, to find the amount of
information about the stimulus encoded in the neuronal response, that is, 𝐼(𝑅; 𝑆). This quantity
can equivalently be seen as the reduction in uncertainty about the response due to presentation of
the stimulus. If the neuron is viewed as a communication channel, the mutual information may be
defined as the amount of information about the stimulus transmitted through the neuron to the
response.

Additional information-theoretic measures and concepts will be described below where appropriate.
I now proceed to discuss a number of studies that point out the varied nature of neuroscience and
ways in which information theory has been applied.

4 At the Level of the Neuron

As explained above, the basic unit of the nervous system is the neuron. Therefore, the study of
neuronal information processing began with the analysis of electrophysiological recordings of single
isolated neurons. This kind of research is still highly prevalent and relevant in the neurosciences,
and numerous intracellular and extracellular recording methods have been invented to facilitate
these studies.[4] The goal of these methods is to record electrical neural activity, in particular, the
action potentials, of a single neuron over time. While the study of single isolated neurons has
revealed a wealth of knowledge regarding how neurons function, process information, and enable
many complex behaviors, brain activity is not merely the sum of the activity of individual neurons.
The brain and nervous system are composed of a vast set of interconnected networks of neurons
and associated cells. Therefore, techniques have been developed to simultaneously and directly
record the activity of populations of neurons.[5] All of these techniques are highly invasive. They
require surgically implanted electrodes or are used to record activity from tissue extracted from
the nervous system. Hence, these techniques are generally reserved for animal research or studying
neurons removed from their natural biological environment. In the rare instances they are used in
humans, it is usually during a surgical procedure to treat a disease.

4.1 Neural Coding

In the study of single neurons and small neural populations, there are two overlapping questions of
central importance. The first question is, how much information can or do neurons possess about
a given stimulus? The second question is, precisely how do neurons encode that information
about stimuli, or which features of neural activity represent corresponding features of external or
internal stimuli? That is, what is the neural code? To answer the first question independently of
the second question requires assumptions. For instance, one of the earliest and most common
assumptions about neural coding is that sensory information is encoded in the patterns of spikes or
action potentials in the recordings of neural activity.[6] Therefore, in many studies, the probability
distribution of spike counts is analyzed. However, there are many ways in which spike coding
may be envisioned, and there are many other conceivable ways in which a neuron may encode
information. Furthermore, to facilitate these studies, assumptions must be made about which
features of the input stimuli are being encoded. If we are to make as few assumptions as possible and
attain model-independent methodologies, these two questions must be addressed simultaneously. A
deeper understanding of how neurons encode, process, and transmit information would not only add
to our scientific knowledge, but would help enable medical treatments and therapies for disorders of
the nervous system. For instance, the ability to decode neural activity and estimate input signals
might inspire the development of devices to replace or augment defective sensory systems.

Information theory has a long history of being applied to the study of neural coding. For good
reviews, see [7] and [8]. Here, only several important results will be discussed. A crucial approach
to the first question posed above is to measure neuronal information and information transfer. The
following studies do so using several different techniques.

Using a common set of techniques for applying information theory to neural coding, Bialek et
al. studied the temporal coding of visual information in the blowfly.[9] Blowflies alter their flight
behavior on the order of tens of milliseconds. The movement-sensitive neurons involved are only
able to fire a few action potentials in response to environmental stimuli. These few spikes must
therefore encode visual information very quickly and reliably. The approach taken in this study
was to compare the stimulus to an estimate of the stimulus obtained from the neuronal response.
The stimuli used were rigidly moving random patterns chosen from an ensemble of stimuli that
approximated Gaussian noise. Specifically, a linear estimate of the stimulus was obtained by con-
volving the spike train signal with a mean-square estimation filter. Thus the best linear estimate of
the stimulus was calculated, and the noise of the reconstruction was found by subtraction from the
original stimulus. An estimate of the signal-to-noise ratio was computed from the power spectra of
the noise and stimulus. Finally, the information rate was calculated and interpreted as the rate
at which information about the stimulus is gained from the neuron’s spike train.
Actually, it is better conceptualized as the amount of information their reconstruction algorithm
succeeded in extracting from the given spike train. They estimated that the neurons they recorded
from gained information about the stimulus at a rate of 64±1 bits per second. This, they claimed,
is the rate necessary to encode the rapidly changing and unpredictable visual stimuli a fly might
encounter.
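
The final step of such a reconstruction analysis can be sketched as follows. Assuming Gaussian signals and a linear decoder, as above, a lower bound on the information rate can be obtained from the signal-to-noise ratio of the reconstruction as the integral over frequency of log2(1 + SNR(f)). The stimulus and its estimate below are synthetic stand-ins, and the sampling rate, noise level, and spectral estimator are arbitrary choices rather than those of the study.

import numpy as np
from scipy.signal import welch

rng = np.random.default_rng(0)
fs = 1000.0                                        # sampling rate in Hz (illustrative)
stimulus = rng.standard_normal(60 * int(fs))       # stand-in for the random stimulus
estimate = stimulus + 0.5 * rng.standard_normal(stimulus.size)   # stand-in for the linear reconstruction

noise = estimate - stimulus                        # reconstruction error
f, S_stim = welch(stimulus, fs=fs, nperseg=1024)
_, S_noise = welch(noise, fs=fs, nperseg=1024)

snr = S_stim / S_noise                             # signal-to-noise ratio per frequency
df = f[1] - f[0]
rate = np.sum(np.log2(1.0 + snr)) * df             # lower bound: integral of log2(1 + SNR(f)) df
print(round(rate, 1), "bits per second (lower bound)")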

A similar approach was used to study visual information encoding in a primate.[10] The researchers
hypothesized that constant-velocity stimuli would demand a lower encoding rate than time-varying
stimuli. Single neuron recordings were acquired by microelectrode implants from awake, alert mon-
keys while visual stimuli were presented. As in the previous study, a reconstruction method to
recover the stimulus waveform from the neuronal recording was employed. The mutual informa-
tion between the stimulus and the stimulus estimate was computed by estimating the entropy of
the estimate and the conditional entropy of the estimate given the actual signal, or the entropy
of the errors in the reconstruction. The entropies were calculated in an unspecified manner from
probability distributions of firing rates. This method was used to analyze the neuronal response
to the time-varying stimuli. An alternative method directly measured the entropy of the stimulus
and the mutual information between the stimulus and the spike train. This was accomplished by
constructing probability distributions of firing rates in the following manner. The spike trains were
binned to some size. The number of spikes in successive bins was then counted, and words were
constructed from the number of spikes in each bin. The probability of occurrence of each word
was then calculated over all stimuli presentations. Two sets of stimuli were presented to calculate
mutual information. The first set consisted of different stimuli drawn from the same distribution.
The second set consisted of repeated presentations of the same stimulus. By subtracting the entropies obtained from the
two sets, the authors claimed an estimate of mutual information between the spike train and the
stimuli was obtained. The major finding of this study was that individual neurons in the area of
the visual system that was being studied encoded information at a much lower rate for constant-
motion stimuli than for time-varying stimuli. In particular, the neurons encoded the direction of
the constant-motion stimuli at a rate of 1-1.6 bits per second. For time-varying stimuli, the neurons
encoded information at rates of up to 29 bits per second.
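
The word-counting (“direct”) estimate described above might be sketched as follows: spike counts in successive bins are concatenated into words, the word distribution is estimated, and the entropy of words from responses to a repeated stimulus (the noise entropy) is subtracted from the entropy of words from responses to varied stimuli. The simulated counts, bin size, and word length are placeholders, not the parameters used in the study.

import numpy as np
from collections import Counter

def word_entropy(spike_counts, word_len):
    # Entropy (bits) of the distribution of "words" formed from successive bin counts.
    n_words = len(spike_counts) // word_len
    words = [tuple(spike_counts[i * word_len:(i + 1) * word_len]) for i in range(n_words)]
    counts = np.array(list(Counter(words).values()), dtype=float)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

rng = np.random.default_rng(1)
bin_ms, word_len = 2, 8
# Responses to varied stimuli give the total word entropy; responses to a repeated
# stimulus give the noise entropy. Both are simulated here as Poisson counts.
counts_varied = rng.poisson(0.3, size=50000)
counts_repeated = rng.poisson(0.1, size=50000)

H_total = word_entropy(counts_varied, word_len)
H_noise = word_entropy(counts_repeated, word_len)
bits_per_word = H_total - H_noise
bits_per_second = bits_per_word / (word_len * bin_ms * 1e-3)
print(bits_per_word, bits_per_second)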

The previous study assumed that visual information was encoded in neuron firing rates. However,
others have hypothesized that information is encoded by the precise timing of the spikes, that is, in
temporal firing patterns. In a study designed to test this hypothesis, the encoding of visual infor-
mation by individual neurons in the thalamus of cat brains was investigated.[11] As with the other
studies, neuronal recordings were obtained as visual stimuli were presented to the animal. Here,
the direct method used in the previous study was used to quantify the rate of neural information
encoding. However, the entropies were calculated over a set of word lengths. This allowed for the
possibility of any temporal pattern. The main result of this study was that individual neurons in
the cat thalamus have information rates in excess of 100 bits per second and rates of 3-4 bits per
spike. The authors conclude that there exist, in addition to neural firing rate codes, firing pattern
codes.

These studies are all limited in that each assumed a particular neural code. It is not yet known
if the animals studied actually implement the proposed neural decoding schemes. In addition, the
results of these studies do not indicate which stimulus features are actually being represented by the
neural code and precisely which features of neural activity correspond to stimulus features. A study
by Dimitrov et al aimed to answer this question by attempting to derive the neural codebook using
concepts inspired by rate distortion theory.[?] Their method was similar to quantization, where one
is given R bits to represent a random variable 𝑋. The optimal set of values to represent 𝑋, or the
reproduction set, must be found subject to some minimization constraint. In the proposed neural
decoding method, neural responses are optimally quantized to some reproduction set by minimizing
a distortion function. The representation set is then considered to contain the stimulus-response
codewords. In this study, the distortion function used was 𝐷𝐼 (𝑌, 𝑌𝑁 ) = 𝐼(𝑋; 𝑌 ) − 𝐼)𝑋; 𝑌𝑁 ), where
𝑋 is the input signal, 𝑌 is the response, and 𝑌𝑁 is the reproduction of 𝑌 . The idea is to minimize
the different between the input-output mutual information and the mutual information between
the input and quantized output. The elements of 𝑌𝑁 are then interpreted as associated with the
stimulus-response codewords we wish to find.
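
A minimal sketch of this distortion computation is given below: for a toy joint distribution p(x, y) and a candidate quantizer that lumps responses into classes, it evaluates D_I = I(X; Y) − I(X; Y_N). Finding the optimal quantizer would additionally require optimizing over such assignments, which is omitted here; the distribution and the assignment are hypothetical.

import numpy as np

def mutual_information(p_joint):
    p_joint = np.asarray(p_joint, dtype=float)
    px = p_joint.sum(axis=1, keepdims=True)
    py = p_joint.sum(axis=0, keepdims=True)
    mask = p_joint > 0
    return np.sum(p_joint[mask] * np.log2(p_joint[mask] / (px * py)[mask]))

# Toy joint distribution over 3 stimuli (rows) and 4 responses (columns).
p_xy = np.array([[0.10, 0.05, 0.05, 0.05],
                 [0.05, 0.15, 0.05, 0.05],
                 [0.05, 0.05, 0.10, 0.25]])

# A deterministic quantizer assigning each of the 4 responses to one of 2 classes.
assign = np.array([0, 0, 1, 1])
p_x_yn = np.zeros((p_xy.shape[0], assign.max() + 1))
for y, c in enumerate(assign):
    p_x_yn[:, c] += p_xy[:, y]       # merge responses that fall in the same class

distortion = mutual_information(p_xy) - mutual_information(p_x_yn)
print(distortion)                     # information lost by this particular quantization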

This method was applied to study the encoding characteristics of single sensory neurons of the
cercal sensory system of the cricket. This system is specialized to detect the direction and dynamic
properties of air currents in the horizontal plane in the local environment using two antenna-like
appendages, or cerci, located at the rear of the abdomen. The stimuli employed in the experiment
were complex but controlled air currents. Spike activity was recorded from single neurons of interest
and, using the algorithms described above and more fully in the reference, quantizers were derived to
identify classes of stimulus feature/spike-pattern pairs.

4.2 Synchrony

While the study of neural coding is essential to the understanding of how neurons process and re-
spond to external stimuli and interact with one another, there are other ways of elucidating neuronal
function, particularly among pairs or populations of neurons. A common method is to measure
how different neurons or neural populations synchronize with one another. It is thought by many
that neuronal synchronization and phase-locking underlie large-scale information transmission as
well as many processes that require the integration or binding of information such as learning,
attention, and cognition. Information theory has been used to derive measures of synchrony used
in such research.

Hurtado et al conducted a rare study involving individual neuronal recordings in human subjects.[12]
Data was obtained from Parkinsonian patients undergoing surgical resection of part of the brain
called the pallidum, which is involved in producing the uncontrollable tremors observed in the
disease. While this region of the brain was exposed during surgery, the activity of individual neu-
rons was recorded. Simultaneous electromyographic recordings of muscle activity in the subjects’
extremities were obtained. The fundamental aim of this research was to quantify phase-locking as
a function of time between pallidal neurons and between pallidal neurons and muscle activity.

The first step in quantifying the phase relations between two oscillatory time series is to estimate
the instantaneous phase of each signal as a function of time. There are several ways to do this,
however, the goal is to represent the time series as a rotation around a point in phase space and
then extract the angle of rotation. This requires narrowband signals, for signals with multiple
frequency components can be represented by trajectories about multiple points in phase space. For
such signals rotation angles will thus be nonunique. To address this issue, the authors determined
statistically significant oscillations in each recording and applied narrow bandpass filtering around
those frequencies of interest. A common method of reconstructing instantaneous phase, and the one
used in this study, is to consider the analytic extension of the filtered signal

𝑧(𝑡) = 𝑥(𝑡) + 𝑖𝑥̄(𝑡)    (6)

where 𝑥(𝑡) is the signal and 𝑥̄(𝑡) is the Hilbert transform of the signal. This operation removes all
negative frequency components from the Fourier spectrum, resulting in a complex representation
of the signal. Since the negative frequencies of the spectrum give no additional information, the
new signal 𝑧(𝑡) still contains all the spectral information about the original signal. However, the phase is now
conveniently exposed, since we may write

𝑧′(𝑡) = 𝑧(𝑡) / ∥𝑧(𝑡)∥ = 𝑒^{𝑖𝜙(𝑡)}    (7)

which amounts to the projection of the analytic signal onto the unit circle. 𝜙(𝑡) is thus the angle of
rotation about the origin and represents the time-varying instantaneous phase of the original signal.
From phases so obtained from two time series, a series of phase-differences, Φ(𝑡) = ∣𝜙2 (𝑡) − 𝜙1 (𝑡)∣,
may be calculated. By analyzing the distribution of phase differences, synchrony between the two
signals may be estimated.
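
The phase-extraction step might be implemented along the following lines, assuming narrowband filtering around a band of interest as described above; the filter order, band edges, and test signals are illustrative choices only.

import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def instantaneous_phase(x, fs, band):
    # Bandpass filter around the band of interest, then take the analytic-signal angle.
    b, a = butter(4, [band[0] / (fs / 2), band[1] / (fs / 2)], btype="bandpass")
    return np.angle(hilbert(filtfilt(b, a, x)))

rng = np.random.default_rng(2)
fs = 500.0
t = np.arange(0, 10, 1 / fs)
x1 = np.sin(2 * np.pi * 5 * t) + 0.3 * rng.standard_normal(t.size)
x2 = np.sin(2 * np.pi * 5 * t + 0.8) + 0.3 * rng.standard_normal(t.size)

phi1 = instantaneous_phase(x1, fs, (4.0, 6.0))
phi2 = instantaneous_phase(x2, fs, (4.0, 6.0))
phase_diff = np.angle(np.exp(1j * (phi2 - phi1)))   # phase differences, wrapped to (-pi, pi]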

Two information-theoretic approaches were used in this study. The first uses the entropy of the
phase difference distribution normalized to the maximum entropy over interval 𝑁 :

Ĥ𝑁 = (log 𝐿 − 𝐻𝑁) / log 𝐿.    (8)

𝐻𝑁 is the entropy of the phase differences for interval 𝑁 . It is computed from a histogram of Φ𝑁 (𝑡),
or the phase differences for interval 𝑁, where 𝐿 is the number of bins in the histogram. Therefore,
when the phase distribution is uniform, that is, when no phase synchronization is present, Ĥ = 0.
Ĥ has a maximum of 1, which occurs for perfect synchrony, when the distribution is a delta function.
Values between 0 and 1 quantify the degree of clustering of the phase distribution about the unit
circle.
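
A sketch of this normalized entropy index, applied to a single window of phase differences, is shown below; the number of histogram bins and the example phase-difference series are arbitrary.

import numpy as np

def phase_entropy_index(phase_diff, n_bins=16):
    # Returns a value near 1 for a tightly clustered (synchronized) phase-difference
    # distribution and near 0 for a uniform (unsynchronized) one.
    counts, _ = np.histogram(phase_diff, bins=n_bins, range=(-np.pi, np.pi))
    p = counts / counts.sum()
    p = p[p > 0]
    H = -np.sum(p * np.log(p))               # entropy of the phase-difference histogram
    return (np.log(n_bins) - H) / np.log(n_bins)

rng = np.random.default_rng(3)
clustered = 0.1 * rng.standard_normal(5000)             # nearly constant phase difference
uniform = rng.uniform(-np.pi, np.pi, 5000)              # no preferred phase difference
print(phase_entropy_index(clustered), phase_entropy_index(uniform))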

The second measure of synchrony makes use of the mutual information between individual phase
distributions estimated from individual and joint histograms. Again, the calculation is made over
a series of intervals, and the mutual information is normalized to its maximum:

Î𝑁 = (log 𝐿 − 𝐼𝑁(𝜙1(𝑡); 𝜙2(𝑡))) / log 𝐿    (9)

where, again, 𝐿 is the number of bins in the histograms and 𝐼𝑁 is the mutual information between
the individual phase distributions. So during perfect synchrony, when 𝜙1(𝑡) = 𝜙2(𝑡), Î = 0. When
𝜙1(𝑡) and 𝜙2(𝑡) are independent, Î = 1.

Finally, to determine statistically significant synchrony, surrogate data methods were employed to
test the null hypothesis of independent phase activities. For instance, synchrony values may be
statistically compared to those obtained from the same time series whose Fourier phases have been
shuffled.
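
One common construction of such surrogates, sketched below, randomizes the Fourier phases of a signal while preserving its amplitude spectrum, which destroys any phase relationship between channels but keeps each power spectrum intact; the test signal is illustrative.

import numpy as np

def phase_randomized_surrogate(x, rng):
    # Keep the amplitude spectrum of x but replace its Fourier phases with random ones.
    X = np.fft.rfft(x)
    phases = np.exp(1j * rng.uniform(0.0, 2.0 * np.pi, X.size))
    phases[0] = 1.0                          # keep the DC term real
    if x.size % 2 == 0:
        phases[-1] = 1.0                     # keep the Nyquist term real for even lengths
    return np.fft.irfft(np.abs(X) * phases, n=x.size)

rng = np.random.default_rng(4)
t = np.arange(0, 10, 0.002)
x = np.sin(2 * np.pi * 5 * t) + 0.3 * rng.standard_normal(t.size)
surrogate = phase_randomized_surrogate(x, rng)
# Recomputing a synchrony index over an ensemble of such surrogates yields the null distribution.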

Using these methods, Hurtado et al were able to reveal dynamic features in the neural networks
responsible for generating Parkinsonian tremors that conventional analyses have missed. They hope
that this research will lead to a deeper understanding of how the motor networks of the brain are
altered in Parkinson’s disease.

Another study conducted by Manyakov and Van Hulle used similar techniques to study synchroniza-
tion among neurons in the monkey visual cortex during a paired stimulus-reinforcement learning
task.[13] Recordings were obtained from multi-electrode arrays surgically implanted in the region
of the brain thought to be responsible for visual information processing. During the task, the
monkeys learn to differentiate between two visual patterns by associating a fluid reward with the
target stimulus. The researchers wanted to determine how neurons synchronized with one another
during the learning process. Phase distributions were estimated in the same way as in the previous
study. However, in this study mutual information between the individual phase distributions was
calculated and normalized by the individual entropies:

Î𝑁 = 2𝐼𝑁(𝜙1(𝑡); 𝜙2(𝑡)) / (𝐻(𝜙1(𝑡)) + 𝐻(𝜙2(𝑡)))    (10)

This particular normalization was chosen because in this study mutual information was estimated
using a binless estimator based on entropy estimates from k-nearest neighbor distances.[14] Average
phase synchrony was estimated using this measure between all possible pairs of electrodes in the
array over many recordings during presentation of the rewarded and unrewarded stimuli. The main
findings were that synchrony appeared significantly higher for the rewarded stimulus than for the
unrewarded stimulus and that synchrony appeared in waves whose propagation speed was faster
as a result of training. The authors speculate that enhanced synchrony underlies, to some extent,
an enhanced representation of the rewarded stimulus, and that the waves of synchrony mediate
information transfer in the visual cortex.

5 At the Level of the Brain

The study of neural activity in live humans typically requires noninvasive techniques. Of these
the most common are electroencephalography (EEG) and functional magnetic resonance imaging
(fMRI).[15, 16] EEG measures large-scale electrical brain activity. Unlike multi-neuron recordings,
which can measure the individual activity of up to several hundred neurons, EEG measures the
synchronous activity of thousands to millions of neurons. fMRI does not directly measure neuronal
activity. It measures regional brain hemodynamic activity, or changes in blood oxygenation in the
brain. Unlike many cells in the body, neurons do not have internal energy reserves. Therefore, as a
neuron becomes more active, the body must provide that neuron with more oxygen to sustain its activity.
This process of oxygen delivery can be measured by fMRI and is used as a surrogate for neural
activity. Due to the nature of the signals collected using these techniques, information theory
is applied to their analysis in very different ways than for individual neurons. With individual
neurons, we have direct access to individual information channels, so we may directly probe how
neurons encode information. Because EEG and fMRI signals represent the integrated activity of so
many neurons, this question is very difficult, if at all possible, to answer at the level of the brain.
Therefore, the application of information theory to the analysis of these signals attempts to answer
different questions.

5.1 Disease Discrimination

A large number of studies have aimed to distinguish groups of control subjects from patients
with a variety of neurological disorders using information-theoretic measures. The idea is that if
the disorder disrupts neural information processing, then various information-theoretic measures
should yield significantly different values compared to healthy individuals. Such research may then
inform diagnosis, screening, or prediction methodologies, or help generate hypotheses regarding the
neuropathology of the disease.

For instance, Sabeti et al used a measure of spectral entropy of EEG signals as well as several
other measures of signal complexity to attempt to distinguish a group of schizophrenic patients
from control subjects.[17] Spectral entropy is essentially the Shannon entropy of the signal’s power
spectral density, normalized so that it forms a probability distribution over frequency. In practice, it
is often further normalized by the maximum attainable entropy. That is,

𝐻𝑠𝑝 = − (1 / ln 𝑁) ∑_{𝑖=𝑓𝑙}^{𝑓ℎ} 𝑝𝑖 ln(𝑝𝑖)    (11)

where 𝑝𝑖 is the power density in frequency bin 𝑖, normalized so that ∑_{𝑖=𝑓𝑙}^{𝑓ℎ} 𝑝𝑖 = 1. The sum is
over the frequency range 𝑓𝑙 to 𝑓ℎ and 𝑁 is the number of frequency bins. The authors (as well
as many others) erroneously claim that the spectral entropy describes the irregularity of the signal
spectrum. Actually, it measures how concentrated the spectrum is, that is, how nearly sinusoidal the signal is.
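
A sketch of normalized spectral entropy in the sense of Eq. (11) is given below, using Welch's method to estimate the power spectrum; the band limits, window length, and test signals are assumptions made for illustration rather than the settings used in the study.

import numpy as np
from scipy.signal import welch

def spectral_entropy(x, fs, f_lo, f_hi):
    f, pxx = welch(x, fs=fs, nperseg=512)
    band = (f >= f_lo) & (f <= f_hi)
    p = pxx[band] / pxx[band].sum()          # normalized power in each frequency bin
    p = p[p > 0]
    return -np.sum(p * np.log(p)) / np.log(band.sum())   # divide by ln N, the maximum entropy

rng = np.random.default_rng(5)
fs = 256.0
t = np.arange(0, 30, 1 / fs)
sine = np.sin(2 * np.pi * 10 * t)            # nearly sinusoidal: spectral entropy close to 0
noise = rng.standard_normal(t.size)          # broadband: spectral entropy close to 1
print(spectral_entropy(sine, fs, 0.5, 40.0), spectral_entropy(noise, fs, 0.5, 40.0))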

The spectral entropy and several other complexity measures were computed over short windows
for a set of EEG signals. They were then used to construct input vectors for two different kinds
of classifier algorithms. A genetic programming algorithm was used to reduce the feature dimen-
sions. The result was a classification accuracy of up to approximately 91%. The authors conclude
that schizophrenic patients are characterized by less complex neuropsychology and neurobehavior.
However, they are unable to differentially assign importance to individual features. Like many
researchers, their conclusion is based on the illogical premise that the mathematical complexity or
entropy of physiological signals somehow reflects the complexity of neural processes in the nervous
system. While this may be true, due to the unknown nature of neural and measurement noise and
the precise relationship of the measures to the underlying dynamics, it is very difficult to infer any-
thing about the physiology of the nervous system from information measurements of EEG unless
appropriate experimental manipulations are employed. Nevertheless, as this study demonstrates,
information-theoretic measures can be useful for distinguishing some groups of individuals.

In another study on patients with schizophrenia, Na et al aimed to investigate how information
transmission between different areas of the brain might differ between people with schizophrenia
and control subjects.[18] To do this, they used estimates of the time-delayed auto and cross-mutual
information between 16 EEG signals from all participants. The time-delayed cross-mutual infor-
mation (CMI) is mutual information between two signals computed as a function of time delay.
Namely,

𝐼𝑋𝑌(𝜏) = ∑_{𝑥(𝑡), 𝑦(𝑡+𝜏)} 𝑝(𝑥(𝑡), 𝑦(𝑡 + 𝜏)) log [ 𝑝(𝑥(𝑡), 𝑦(𝑡 + 𝜏)) / ( 𝑝(𝑥(𝑡)) 𝑝(𝑦(𝑡 + 𝜏)) ) ].    (12)
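
Time-delayed mutual information of this kind might be estimated as sketched below, by discretizing both signals and evaluating Eq. (12) over a range of delays; the amplitude binning and the toy signals are illustrative choices only.

import numpy as np

def mutual_information_2d(a, b, bins=8):
    joint, _, _ = np.histogram2d(a, b, bins=bins)
    p = joint / joint.sum()
    p_a = p.sum(axis=1, keepdims=True)
    p_b = p.sum(axis=0, keepdims=True)
    mask = p > 0
    return np.sum(p[mask] * np.log2(p[mask] / (p_a * p_b)[mask]))

def cross_mutual_information(x, y, max_lag):
    # CMI(tau) = I( x(t); y(t + tau) ) for tau = 0 ... max_lag samples.
    return np.array([mutual_information_2d(x[:len(x) - tau], y[tau:])
                     for tau in range(max_lag + 1)])

rng = np.random.default_rng(6)
x = rng.standard_normal(10000)
y = np.roll(x, 25) + 0.5 * rng.standard_normal(10000)   # y echoes x with a 25-sample delay
cmi = cross_mutual_information(x, y, max_lag=100)
print(int(np.argmax(cmi)))                               # peaks near the imposed delay of 25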

If 𝑋 and 𝑌 are the same signal, then the time-delayed auto-mutual information (AMI) is obtained.
AMI normalized to the AMI at zero time delay was calculated for 16 EEG channels as a function of
time delay from 0 to 500 ms. The slopes of the AMI profiles were then calculated. The averaged CMI
(A-CMI) was also calculated between all 16 EEG signals over time delays from 0 to 500 ms. AMI
and A-CMI values were compared between the schizophrenic and control groups using standard
statistical techniques. Significant group differences were found for both AMI and A-CMI. It was
further found that schizophrenic patients have more slowly decreasing AMI slopes than controls.
As with the last study, the authors somewhat erroneously conclude that the neural dynamics of
individuals with schizophrenia are less complex than in controls. The authors also seem to equate
“less complex” dynamics with “decreased” dynamics in order to allow their results to coincide with
the results of other studies. A-CMI was found to be, in general, significantly higher in individuals
with schizophrenia. The authors suggest that these results may indicate dysfunctional integration
between different regions of the brain in schizophrenia.

Similar methods as described above have also been used to differentiate patients with Alzheimer’s
disease from control subjects based on the analysis of EEG signals. Abasolo et al used spectral en-
tropy and Student’s t-tests to directly discriminate between subject groups.[19] They then computed
classification sensitivities (true positive rates) and specificities (true negative rates). Receiver op-
erating characteristic (ROC) curves (plots of sensitivity against 1 − specificity) were used to determine
optimal measurement thresholds to make classification decisions. Using these standard methods,
they achieved a classification accuracy of up to approximately 85%.

Time-delayed mutual information (AMI slope profiles and A-CMI) has also been used to distinguish
Alzheimer’s patients from control subjects.[20] As with the previous study, standard statistical analy-
sis techniques demonstrated significant differences between the subject groups. As with schizophre-
nia, the authors concluded that the EEG signals in Alzheimer’s patients exhibited less complexity
as measured by AMI. However, the CMI measurements suggested significantly reduced information
transmission between regions of the brain. The authors propose that this result supports the notion
that the underlying pathophysiology of Alzheimer’s disease is neocortical disconnection.

5.2 Neuropsychology

Another way information theory has been used at the level of the brain is to probe brain activity
during the performance of different cognitive or psychological tasks. The study of the relationship
between the nervous system and cognition or mental processing is called neuropsychology. The
classical approach taken by neuropsychologists is to administer a test and measure various behav-
ioral or physiological parameters. The analysis of these measurements can be used to discriminate
between populations, ascertain which regions of the brain subserve cognition, or gain insight into
the neurological mechanisms that enable and support cognition.

Inouye et al conducted a study aimed at localizing the areas of brain activation during mental
arithmetic and at ascertaining patterns of directional information flow.[21] To do this, they analyzed
the EEG recordings from healthy subjects using spectral entropy and information flow. Information
flow is estimated using the coefficients of a two-dimensional, order-𝑚 autoregressive model fitted to
the jointly stationary process {𝑋(𝑡), 𝑌 (𝑡)}. Information flow from 𝑋 to 𝑌 at time 𝑛 is given by

𝐼𝑋𝑌(𝑛) = (1/2) log( 1 + 𝐴𝑌𝑋(𝑛)² / ∑_{𝑖=0}^{𝑛−1} ( 𝐴𝑌𝑋(𝑖)² + 𝐴𝑌𝑌(𝑖)² ) ),   𝑛 = 0, 1, ..., 𝑚    (13)

where 𝐴𝑌 𝑋 , 𝐴𝑌 𝑌 are the estimated 2-D autoregressive coefficients. The form of this equation sug-
gests the assumption of Gaussian processes. The term containing the autoregressive coefficients is
an estimate of the signal-to-noise ratio obtained from the autoregressive spectrum. The information
flow from 𝑌 to 𝑋, 𝐼𝑌𝑋, is obtained by substituting 𝐴𝑋𝑌 for 𝐴𝑌𝑋 and 𝐴𝑋𝑋 for 𝐴𝑌𝑌. All possible
pairwise combinations of electrode recordings were used for the processes 𝑋 and 𝑌 . Comparisons
of spectral entropy measurements between resting and mental arithmetic conditions using Student’s
t-tests showed significant differences in specific brain regions assumed to be involved with the pro-
cess. Information flow was found to be distributed more sparsely in those regions during mental
arithmetic than during resting. The authors use these results to support the claim made by others
that mental activity is accompanied by neural desynchronization in the regions of the brain that
underlie the activity.
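
A rough sketch of the information-flow computation is given below: a bivariate, order-m autoregressive model is fitted by ordinary least squares and an expression of the form of Eq. (13) is evaluated from the estimated coefficients. The fitting procedure and the test signals are my own illustrative choices and not necessarily those of Inouye et al.

import numpy as np

def fit_bivariate_ar(x, y, m):
    # Least-squares fit of z(t) = sum_i A[i] z(t - i - 1) + noise, with z(t) = [x(t), y(t)].
    z = np.column_stack([x, y])
    T = len(z)
    X = np.hstack([z[m - i - 1:T - i - 1] for i in range(m)])   # lagged regressors
    Y = z[m:]
    coefs, *_ = np.linalg.lstsq(X, Y, rcond=None)               # shape (2 * m, 2)
    return [coefs[2 * i:2 * i + 2].T for i in range(m)]         # A[i][row, col]

def information_flow_xy(A, n):
    # Evaluate the expression of Eq. (13); A[i][1, 0] couples past X into Y.
    a_yx = np.array([A[i][1, 0] for i in range(len(A))])
    a_yy = np.array([A[i][1, 1] for i in range(len(A))])
    denom = np.sum(a_yx[:n] ** 2 + a_yy[:n] ** 2)
    return 0.5 * np.log(1.0 + a_yx[n] ** 2 / denom)

rng = np.random.default_rng(7)
x = rng.standard_normal(5000)
y = np.zeros(5000)
for t in range(1, 5000):                     # y is driven by the past of x
    y[t] = 0.5 * y[t - 1] + 0.8 * x[t - 1] + 0.1 * rng.standard_normal()

A = fit_bivariate_ar(x, y, m=5)
# n = 0 is omitted since the denominator sum is then empty.
print([round(information_flow_xy(A, n), 3) for n in range(1, 5)])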

In another study, the mental process of hypothesis generation, or creative reasoning, in children was
explored.[22] The children were presented with images of 20 quail eggs of different sizes and with
different surface patterns. They were allowed to observe the eggs for 30 seconds and were then asked
a question like “Why do the eggs differ in size and surface pattern?” EEG was recorded and, once
again, time-delayed mutual information (A-CMI) was used to quantify information transmission
between different brain regions. Student’s t-tests were used to assess statistical differences between
resting and hypothesis generation conditions. The distribution of significant differences in A-
CMI across the brain allowed the researchers to posit which specific brain regions operate during
hypothesis generation and connect their work to other research in learning and memory.

5.3 Connectivity

One very active area of research in neuroscience aims to map out the connectivity of the various
regions or clusters of neurons in the brain. The ultimate goal is to develop something akin to
the “wiring diagram” of the brain. Success in this endeavor would yield an invaluable resource
for understanding how brain lesions and dysfunctional neural connectivity lead to pathological
conditions. It would also provide a theoretical basis for developing therapies for restoring normal
brain function.

In an fMRI study conducted by Salvador et al, frequency based measures of mutual information
were used to evaluate the degree to which clusters of brain regions were related to one another.[23]
A comparison between healthy individuals and individuals with schizophrenia revealed significant
connectivity differences. The authors start with the assumption that the time series of observations
of 𝑝 points in the brain can be described by a jointly normal multivariate stationary stochastic
process 𝑋(𝑡) = {𝑋1(𝑡), 𝑋2(𝑡), ..., 𝑋𝑝(𝑡)}. A discrete Fourier transform of the 𝑛-length time series
will yield the jointly normal set of values 𝑌(𝜔𝑘) = {𝑌1(𝜔𝑘), 𝑌2(𝜔𝑘), ..., 𝑌𝑝(𝜔𝑘)} at each 𝜔𝑘 for
𝑘 = 1, ..., 𝑛. Since all the 𝑌(𝜔𝑘), 𝑌(𝜔𝑞) pairs can be shown to be asymptotically independent, the
mutual information between any two 𝑌 will be


𝐼(𝑌𝑎; 𝑌𝑏) = ∑_{𝜔𝑘>0} 𝐼𝜔𝑘(𝑌𝑎; 𝑌𝑏)    (14)

This allows for the calculation of mutual information between two brain regions by estimating it
for each frequency separately. 𝐼𝜔𝑘 (𝑌𝑎 ; 𝑌𝑏 ) is calculated from estimates of the entropies 𝐻𝜔𝑘 (𝑌𝑎 ),
𝐻𝜔𝑘 (𝑌𝑏 ), and 𝐻𝜔𝑘 (𝑌𝑎 , 𝑌𝑏 ), which the authors derive. An analysis so described was carried out
between signals obtained while subjects quietly rested. Ninety brain regions grouped into 10 clusters
were then defined. Mutual information was averaged over three frequency bands and statistically
compared between healthy and schizophrenic subjects. The results of this study pointed to altered
connectivity in a specific brain region, suggesting that defects in this region play a key role in the
pathogenesis of schizophrenia.
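
A per-frequency mutual information in the spirit of Eq. (14) can be sketched using the magnitude-squared coherence: under a circularly symmetric, jointly Gaussian model of the Fourier coefficients, the mutual information at a single frequency is −log2(1 − C(ω)) bits, where C(ω) is the coherence. This shortcut is an assumption adopted here for illustration and is not necessarily the estimator the authors derive; the simulated “regional” time series are likewise placeholders.

import numpy as np
from scipy.signal import coherence

rng = np.random.default_rng(8)
fs = 2.0                                       # e.g., one sample every 0.5 s (illustrative)
shared = rng.standard_normal(2048)             # common signal shared by two "regions"
ya = shared + 0.8 * rng.standard_normal(2048)
yb = shared + 0.8 * rng.standard_normal(2048)

f, C = coherence(ya, yb, fs=fs, nperseg=256)   # magnitude-squared coherence per frequency
C = np.clip(C, 0.0, 0.999)                     # guard against log(0)
I_per_freq = -np.log2(1.0 - C)                 # per-frequency mutual information (bits), Gaussian assumption
I_total = I_per_freq[f > 0].sum()              # Eq. (14): sum over positive frequencies
print(I_total)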

6 At the Level of Physiological Systems

When considering the nervous system as a whole, it is insufficient to attend solely to the brain,
for the brain is extensively interconnected with every physiological system in the body via a
variety of sensors and feedback circuitry. For instance, the regulation of breathing involves intricate
interconnections between the various parts of the brain and the lungs as well as numerous kinds of
sensors and receptors, such as arterial and central chemoreceptors, which sense blood carbon dioxide
levels and arterial baroreceptors that sense circulatory volume. Another important set of receptors
involved in respiration are pulmonary stretch receptors in the lungs which continuously transmit
signals to the brain encoding rate and depth of respiration. The decoding of these signals was
studied from an information theoretic perspective by Rogers et al.[24] They followed an approach
similar to one described above, to directly measure the mutual information between a spike train
of stretch receptor activity and a stimulus in anesthetized rats. The stimuli consisted of controlled
manipulations of lung volume via a mechanical respirator. Using a fixed-width sliding window,
mutual information between the stimulus and spike train were analyzed in the following manner.
At each window position, the joint probability of the stimulus and response (spike count) was
calculated. From the joint probability, mutual information between the stimulus and response
normalized by its theoretical maximum was calculated. The main finding of this study was that
individual pulmonary stretch receptors were capable of transmitting up to 38% of the available
information about lung volume.
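
Such a sliding-window analysis might be sketched as follows: within each window the stimulus is discretized, the joint distribution with the spike count is tabulated, and the mutual information is divided by its maximum possible value for that window (taken here as the smaller of the two marginal entropies). The window length, bin counts, and simulated signals are illustrative placeholders.

import numpy as np

def normalized_windowed_mi(stimulus, spike_counts, win, step, n_stim_bins=8):
    edges = np.linspace(stimulus.min(), stimulus.max(), n_stim_bins)
    out = []
    for start in range(0, len(stimulus) - win, step):
        s = np.digitize(stimulus[start:start + win], edges)
        r = spike_counts[start:start + win]
        joint, _, _ = np.histogram2d(s, r, bins=(n_stim_bins, int(r.max()) + 1))
        p = joint / joint.sum()
        p_s = p.sum(axis=1, keepdims=True)
        p_r = p.sum(axis=0, keepdims=True)
        mask = p > 0
        mi = np.sum(p[mask] * np.log2(p[mask] / (p_s * p_r)[mask]))
        h_s = -np.sum(p_s[p_s > 0] * np.log2(p_s[p_s > 0]))
        h_r = -np.sum(p_r[p_r > 0] * np.log2(p_r[p_r > 0]))
        out.append(mi / min(h_s, h_r))        # fraction of the available information
    return np.array(out)

rng = np.random.default_rng(9)
lung_volume = np.sin(2 * np.pi * np.arange(20000) / 500.0)   # slow, periodic "stimulus"
spikes = rng.poisson(2.0 + 1.5 * lung_volume)                 # volume-modulated spike counts
print(normalized_windowed_mi(lung_volume, spikes, win=2000, step=1000)[:5])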

Another study on respiratory regulation focused on the coupling between respiratory signals and
respiratory muscle activity during breathing.[25] By comparing the cross-mutual information, de-
scribed above, between the signals, they investigated differences between healthy individuals and
people with obstructive sleep apnea syndrome (OSAS). OSAS is a prevalent form of sleep-disordered
breathing where individuals experience periodic apneas (cessation of breathing) and hypopneas (re-
duced breathing) while they sleep. As explained above, CMI is the mutual information between
two signals as a function of relative time delay. To obtain significant CMI values, CMI series ob-
tained from surrogate data techniques were subtracted from the original CMI series. Finally, several
variables were derived from this difference and compared between subject groups by two-sample
t-tests. Statistically significant differences between healthy and OSAS subjects were found. Since
breathing and respiratory muscle activity are coordinated by the nervous system, the results of this
study may point to dysfunctional coordination between these two processes in people with OSAS.

As a final example of information theory applied to the study of the interaction between the
nervous system and a physiological system, we turn to the interactions between the cardiac and
respiratory systems. Hoyer et al hypothesized that information theoretic measures, among other
nonlinear measures, of heart rate variability and cardiorespiratory interdependencies are superior
to conventional measures in classifying patients after a heart attack.[26] Heart rate variability (HRV) is
a measure of the variability in the heart’s interbeat intervals. It is widely considered to contain
information about the major influences on cardiac control, particularly autonomic activity. Reduced
HRV has been shown to be associated with mortality after a heart attack. Once again the auto-
mutual information and cross-mutual information were measured over a range of delays, in this case,
between heart rate and respiratory signals. Several variables were derived from the AMI and CMI
series. Compared to standard variables of cardiorespiratory interactions, the variables introduced
by the authors were found to provide improved identification of patients at risk of sudden death
after a heart attack or severe heart failure.

7 Conclusion

Information theory has been applied to the neurosciences in many different ways to try to answer
many different questions at various levels of abstraction. This paper merely touches on some
important and interesting applications and results. However, even from this limited treatment of
the field, we can observe some patterns and trends. As illustrated above, at the level of the neuron,
the relevant questions are how much information does a neuron encode about a stimulus and what
features of a stimulus and neural activity are important for encoding. Two approaches have been
taken to address the first question. First, mutual information between the stimulus and response can
be directly measured. The second method is to reconstruct the stimulus from the response, and then
calculate mutual information between the reconstruction and the original stimulus. Both methods
require assumptions regarding which stimulus and neural response features are important. The
second method requires a decoding algorithm, which may have nothing to do with actual neuronal
function. Even if a perfect reconstruction can be attained, this tells us nothing about how the neuron
actually encodes and decodes information. Furthermore, none of the studies mentioned give any
indication about how much of the encoded information is actually used by the nervous system and in
what manner. Thus, researchers need to be extremely careful not to overextend their conclusions
and interpretations beyond what their information-theoretic measures actually express. This is
particularly true when one treats the above two questions independently. To answer them jointly is
much more difficult, but only then can inferences be safely made regarding how neurons actually
encode and decode information and what information they encode and decode. At the level of
the brain and physiological systems, we lose direct access to the individual information channels.
Instead we can only observe the synchronous activity of huge populations of neurons. Of course, we
can treat the population as a channel; however, this is very difficult because we have very little
understanding of what form this channel takes. Nevertheless, information theory can provide a
useful tool for distinguishing between different distributions of large-scale activity. As described
above, the use of various information theoretic tools can successfully distinguish between health
and disease or between two physiological states. Again, it is extremely important not to make
unwarranted conclusions about actual neural function from the results of these studies. One must
distinguish characteristics of neural signals from actual neural functions. That a neural signal has
a certain amount of complexity or randomness does not necessarily imply anything about how the
neuron actually functions. Information theory has clearly proved to be an extremely useful tool
and framework for studying various problems in neuroscience. However, the precise meaning of
what it measures and of its limitations must be fully appreciated. Many open questions remain in
neuroscience that will undoubtedly benefit from the application of information theory. However,
care must be taken to construct experiments specifically designed for its application. Its usefulness
will be limited when it is applied in an ad hoc manner to experiments not specifically designed to expose
variables in ways amenable to information-theoretic analyses.

References
[1] Hodgkin and Huxley model. http://en.wikipedia.org/wiki/Hodgkin-Huxley model.
[2] C. E. Shannon. A mathematical theory of communication. Bell System Technical Journal,
27:379–423, 1948.
[3] T. M. Cover and Joy A. Thomas. Elements of information theory. John Wiley and Sons, June
2006.
[4] Electrophysiology. http://en.wikipedia.org/wiki/Electrophysiology.
[5] Multielectrode arrays. http://www.scholarpedia.org/article/Multielectrode arrays.
[6] E. D. Adrian. The impulses produced by sensory nerve endings. The Journal of Physiology,
61(1):49–72, March 1926.
[7] A Borst and F E Theunissen. Information theory and neural coding. Nature Neuroscience,
2(11):947–957, November 1999. PMID: 10526332.
[8] Ernesto Pereda, Rodrigo Quian Quiroga, and Joydeep Bhattacharya. Nonlinear multivariate
analysis of neurophysiological signals. Progress in Neurobiology, 77(1-2):1–37, October 2005.
PMID: 16289760.
[9] W Bialek, F Rieke, R R de Ruyter van Steveninck, and D Warland. Reading a neural code.
Science (New York, N.Y.), 252(5014):1854–1857, June 1991. PMID: 2063199.
[10] G T Buracas, A M Zador, M R DeWeese, and T D Albright. Efficient discrimination of
temporal patterns by motion-sensitive neurons in primate visual cortex. Neuron, 20(5):959–
969, May 1998. PMID: 9620700.
[11] P Reinagel and R C Reid. Temporal coding of visual information in the thalamus. The Journal
of Neuroscience: The Official Journal of the Society for Neuroscience, 20(14):5392–5400, July
2000. PMID: 10884324.
[12] Jose M Hurtado, Leonid L Rubchinsky, and Karen A Sigvardt. Statistical method for detection
of phase-locking episodes in neural oscillations. Journal of Neurophysiology, 91(4):1883–1898,
April 2004. PMID: 15010498.
[13] Nikolay V Manyakov and Marc M Van Hulle. Synchronization in monkey visual cortex analyzed
with an information-theoretic measure. Chaos (Woodbury, N.Y.), 18(3):037130, September
2008. PMID: 19045504.

[14] Alexander Kraskov, Harald Stögbauer, and Peter Grassberger. Estimating mutual information.
Physical Review E, 69(6):066138, June 2004.

[15] EEG. http://en.wikipedia.org/wiki/Eeg.

[16] Functional magnetic resonance imaging. http://en.wikipedia.org/wiki/Fmri.

[17] Malihe Sabeti, Serajeddin Katebi, and Reza Boostani. Entropy and complexity measures for
EEG signal classification of schizophrenic and control participants. Artificial Intelligence in
Medicine, April 2009. PMID: 19403281.

[18] Sun Hee Na, Seung-Hyun Jin, Soo Yong Kim, and Byung-Joo Ham. EEG in schizophrenic
patients: mutual information analysis. Clinical Neurophysiology: Official Journal of the Inter-
national Federation of Clinical Neurophysiology, 113(12):1954–1960, December 2002. PMID:
12464333.

[19] D. Abasolo, R. Hornero, P. Espino, D. Alvarez, and J. Poza. Entropy analysis of the EEG
background activity in Alzheimer’s disease patients. Physiological Measurement, 27(3):241–253,
2006.

[20] J Jeong, J C Gore, and B S Peterson. Mutual information analysis of the EEG in patients with
Alzheimer’s disease. Clinical Neurophysiology: Official Journal of the International Federation
of Clinical Neurophysiology, 112(5):827–835, May 2001. PMID: 11336898.

[21] T. Inouye, K. Shinosaki, A. Iyama, and Y. Matsumoto. Localization of activated areas and
directional EEG patterns during mental arithmetic. Electroencephalography and Clinical Neu-
rophysiology, 86(4):224–230, April 1993.

[22] Seung-Hyun Jin, Yong-Ju Kwon, Jin-Su Jeong, Suk Won Kwon, and Dong-Hoon Shin. In-
creased information transmission during scientific hypothesis generation: Mutual informa-
tion analysis of multichannel EEG. International Journal of Psychophysiology, 62(2):337–344,
November 2006.

[23] R Salvador, A Martínez, E Pomarol-Clotet, S Sarró, J Suckling, and E Bullmore. Frequency
based mutual information measures between clusters of brain regions in functional magnetic
resonance imaging. NeuroImage, 35(1):83–88, March 2007. PMID: 17240167.

[24] R F Rogers, J D Runyan, A G Vaidyanathan, and J S Schwaber. Information theoretic analysis
of pulmonary stretch receptor spike trains. Journal of Neurophysiology, 85(1):448–461, 2001.
PMID: 11152746.

[25] Joan Francesc Alonso, Miguel A Mañanas, Dirk Hoyer, Zbigniew L Topor, and Eugene N
Bruce. Evaluation of respiratory muscles activity by means of cross mutual information func-
tion at different levels of ventilatory effort. IEEE Transactions on Bio-Medical Engineering,
54(9):1573–1582, September 2007. PMID: 17867349.

[26] Dirk Hoyer, Uwe Leder, Heike Hoyer, Bernd Pompe, Michael Sommer, and Ulrich Zwiener.
Mutual information and phase dependencies: measures of reduced nonlinear cardiorespiratory
interactions after myocardial infarction. Medical Engineering & Physics, 24(1):33–43, 2002.