Professional Documents
Culture Documents
Full download Statistical Topics and Stochastic Models for Dependent Data With Applications Vlad Stefan Barbu file pdf all chapter on 2024
Full download Statistical Topics and Stochastic Models for Dependent Data With Applications Vlad Stefan Barbu file pdf all chapter on 2024
https://ebookmass.com/product/applied-modeling-techniques-and-
data-analysis-2-financial-demographic-stochastic-and-statistical-
models-and-methods-volume-8-yannis-dimotikalis/
https://ebookmass.com/product/statistical-learning-for-big-
dependent-data-wiley-series-in-probability-and-statistics-1st-
edition-daniel-pena/
https://ebookmass.com/product/from-statistical-physics-to-data-
driven-modelling-with-applications-to-quantitative-biology-
simona-cocco/
https://ebookmass.com/product/handbook-of-statistical-analysis-
and-data-mining-applications-second-edition-elder/
An Introduction to Statistical Learning with
Applications in R eBook
https://ebookmass.com/product/an-introduction-to-statistical-
learning-with-applications-in-r-ebook/
https://ebookmass.com/product/statistical-methods-for-survival-
data-analysis-3rd-edition-lee/
https://ebookmass.com/product/exact-statistical-inference-for-
categorical-data-1st-edition-shan/
https://ebookmass.com/product/synthetic-data-for-deep-learning-
generate-synthetic-data-for-decision-making-and-applications-
with-python-and-r-1st-edition-necmi-gursakal-2/
https://ebookmass.com/product/synthetic-data-for-deep-learning-
generate-synthetic-data-for-decision-making-and-applications-
with-python-and-r-1st-edition-necmi-gursakal/
Statistical Topics and Stochastic Models
for Dependent Data with Applications
Series Editor
Nikolaos Limnios
Edited by
Apart from any fair dealing for the purposes of research or private study, or criticism or review, as
permitted under the Copyright, Designs and Patents Act 1988, this publication may only be reproduced,
stored or transmitted, in any form or by any means, with the prior permission in writing of the publishers,
or in the case of reprographic reproduction in accordance with the terms and licenses issued by the
CLA. Enquiries concerning reproduction outside these terms should be sent to the publishers at the
undermentioned address:
www.iste.co.uk www.wiley.com
Preface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xi
Vlad Stefan BARBU and Nicolas V ERGNE
2.5. Proofs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
2.6. References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 257
Preface
This collective book stems from a workshop that took place in Rouen,
October 3–5, 2018, within the framework of the project Random Models and
Statistical Tools, Informatics and Combinatorics (MOUSTIC – Modèles aléatoires et
Outils Statistiques, Informatiques et Combinatoires), financed by the Region of
Normandy and the European Regional Development Fund. The main idea was to
bring together leading scientists working on probabilistic and statistical topics for
dependent data, as well as on associated applications.
This is the way this book was written, with the intention to offer to the scientific
community a part of the latest advances in the field of stochastic modeling for
dependent data, authored by leading experts in the field. This is a crucial aspect in a
time when we face an increasing need for more and more complex models capable of
capturing the main features of increasingly complex applications. From a technical
point of view, our book is important for gathering theoretical developments and
applications related to Markov type models (semi-Markov processes, autoregressive
processes, piecewise deterministic Markov processes, and variable length Markov
chains) as well as probabilistic/statistical techniques issued from information theory,
based on divergence measures and entropies.
This volume is divided into three parts: the first one examines Markov and
semi-Markov processes, the second one deals with autoregressive processes and the
last one presents divergence measures and entropies. Particular attention is given to
applications of these methods in various fields such as finance, DNA analysis,
quantum physics and survival analysis.
The workshop in Rouen and, consequently, the present book, would not have
been possible without the support of the Laboratory of Mathematics Raphaël Salem,
the Laboratory of Mathematics Nicolas Oresme, the University of Rouen Normandy,
the Region of Normandy, the French Statistical Society-SFdS, the MAS group of the
xii Statistical Topics and Stochastic Models for Dependent Data with Applications
We would like to thank all the speakers of the workshop who contributed, although
some of them indirectly, to the quality of the present volume. Our thanks go also to
the anonymous reviewers for their valuable work: without their support, this volume
could not have been successfully completed.
We would also like to thank Professor Nikolaos Limnios for proposing and
encouraging us to elaborate this collective volume as well as the editorial staff of
ISTE Ltd for their technical support.
Statistical Topics and Stochastic Models for Dependent Data with Applications,
First Edition. Vlad Stefan Barbu and Nicolas Vergne.
© ISTE Ltd 2020. Published by ISTE Ltd and John Wiley & Sons, Inc.
1
We consider a walker on the line that at each step keeps the same direction with a
probability that depends on the time already spent in the direction the walker is
currently moving. These walks with memories of variable length can be seen as
generalizations of directionally reinforced random walks (DRRWs) introduced in
Mauldin et al. (1996). We give a complete and usable characterization of the
recurrence or transience in terms of the probabilities to switch the direction. These
conditions are related to some characterizations of existence and uniqueness of a
stationary probability measure for a particular Markov chain: in this chapter, we define
the general model for words produced by a variable length Markov chain (VLMC) and
we introduce a key combinatorial structure on words. For a subclass of these VLMC,
this provides necessary and sufficient conditions for existence of a stationary
probability measure.
1.1. Introduction
This is the story of the encounter between two worlds: the world of random walks
and the world of VLMCs. The meeting point turns around the semi-Markov property
of underlying processes.
In a VLMC, unlike fixed-order Markov chains, the probability to predict the next
symbol depends on a possibly unbounded part of the past, the length of which depends
on the past itself. These relevant parts of pasts are called contexts. They are stored in
a context tree. With each context, a probability distribution is associated, prescribing
the conditional probability of the next symbol, given this context.
Chapter written by Peggy C ÉNAC, Brigitte C HAUVIN, Frédéric PACCAUT and Nicolas
P OUYANNE.
4 Statistical Topics and Stochastic Models for Dependent Data with Applications
VLMCs are now widely used as random models for character strings. They were
introduced in Rissanen (1983) to perform data compression. When they have a finite
memory, they provide a parsimonious alternative to fixed-order Markov chain models,
in which the number of parameters to estimate grows exponentially fast with the order;
they are also able to capture finer properties of character sequences. When they have
infinite memory – this will be our case of study in this chapter – they are a tractable
way to build non-Markov models and they may be considered as a subclass of “chaînes
à liaisons complètes” (Doeblin and Fortet 1937) or “chains with infinite order” (Harris
1955).
VLMCs are used in bioinformatics, linguistics and coding theory to model how
words grow or to classify words. In bioinformatics, both for protein families and DNA
sequences, identifying patterns that have a biological meaning is a crucial issue. Using
VLMC as a model enables one to quantify the influence of a meaning pattern by
giving a transition probability on the following letter of the sequence. In this way, these
patterns appear as contexts of a context tree. Note that their length may be unbounded
(Bejerano and Yona 2001).
As has already been said, VLMC are a natural generalization to infinite memory
of Markov chains. It is usual to index a sequence of random variables forming a
Markov chain with positive integers and to make the process grow to the right. The
main drawback of this habit for an infinite memory process is that the sequence of the
process is read from left to right, whereas the (possibly infinite) sequence giving the
past needed to predict the next symbol is read in the context tree from right to left,
Variable Length Markov Chains, Persistent Random Walks: A Close Encounter 5
thus giving rise to confusion and lack of readability. For this reason, in this chapter,
the VLMC grows to the left. In this way, both the process sequence and the memory
in the context tree are read from left to right.
These random walks that have an unbounded past memory can be seen as a
generalization of “directionally reinforced random walks (DRRW)” introduced by
Mauldin et al. (1996), in the sense that the persistence times are anisotropic ones. For
a one-dimensional random walk associated with a double comb, a complete
characterization of recurrence and transience, in terms of changing (or not) direction
probabilities, is given in section 1.3.1. More precisely, when one of the random times
spent in a given direction (the so-called persistence times) is an integrable random
variable, the recurrence property is equivalent to a classical drift-vanishing. In all
other cases, the walk is transient unless the weight of the tail distributions of both
persistent times are equal. In two-dimensional random walk, sufficient conditions of
transience of recurrence are given in section 1.3.2.
Actually, because of the very specific form of the underlying driving VLMC,
these PRWs turn out to be in one-to-one correspondence with so-called Markov
additive processes. Section 1.5 examines the close links between PRWs, Markov
additive processes, semi-Markov chains and VLMC.
In section 1.2, the definition of a general VLMC and a couple of examples are
given. In section 1.3, the PRWs are defined and known results on their recurrence
properties are collected. In view of section 1.5 where we show how PRW and VLMC
meet through the world of semi-Markov chains, section 1.4 is devoted to results –
together with a heuristic approach – on the existence and unicity of stationary
measures for a VLMC.
6 Statistical Topics and Stochastic Models for Dependent Data with Applications
Let A be a finite set, called the alphabet. Here A will most often be the standard
alphabet A = {0, 1}, but also A = {d, u} (for down and up) or A = {n, e, w, s} (for
the cardinal directions). Let
R = {αβγ · · · : α, β, γ, · · · ∈ A}
Giving a formal frame of such a process leads to the following definitions. For a
complete presentation of VLMC, one can also refer to Cénac et al. (2012).
The tree T is saturated whenever any internal node has # (A) children: for any
finite word u and for any α ∈ A, uα ∈ T =⇒ (∀β ∈ A, uβ ∈ T ). A right-infinite
word on A is an infinite branch of T when all its finite prefixes belong to T .
D EFINITION 1.2 (pref function).– Let T be a context tree. If w is any external node
or any context, the symbol pref w denotes the longest (finite or infinite) prefix of w
that belongs to T .
In other words, pref w is the only context c for which w = c · · · For a more visual
presentation, hang w by its head (its left-most letter) and insert it into the tree; the only
context through which the word goes out of the tree is its pref.
0 1 A context
An internal node
1000
Figure 1.1. A context tree on the alphabet A = {0, 1}. The dotted lines are possibly the
beginning of infinite branches. Any word that writes 1000 · · · , like the one represented
by the dashed line, admits 1000 as a pref. For a color version of this figure, see
www.iste.co.uk/barbu/data.zip
D EFINITION 1.3 (VLMC).– Let T be a context tree. For every context c of T , let qc be
a probability measure on A. The VLMC defined by T and by the (qc )c is the R-valued
discrete-time Markov chain (Un )n∈N defined by the following transition probabilities:
∀n ∈ N, ∀α ∈ A,
At each step of time n ≥ 0, one gets Un+1 by adding a random letter Xn+1 on the
left of Un :
Un+1 = Xn+1 Un
= Xn+1 Xn · · · X1 X0 X−1 X−2 · · ·
R EMARK 1.2.– Assume that the context tree is finite and denote its height by h; in
this condition, the VLMC is just a Markov chain of order h on A. On the contrary,
when the context tree is infinite, and this is mainly our case of interest, the VLMC is
generally not a Markov process on A.
n e w s
E XAMPLE 1.3.– Take A = {0, 1} (naturally ordered for the drawings). The left comb
of right combs, shown on the left side of Figure 1.4, is the context tree of a VLMC that
makes its transition probabilities depend on the largest prefix of Un of the form 0p 1q .
Variable Length Markov Chains, Persistent Random Walks: A Close Encounter 9
If one has to take into consideration the largest prefix of the form 0p 1q or 1p 0q , one has
to use the double comb of opposite combs, as shown on the right side of Figure 1.4.
Figure 1.4. Context trees on A = {0, 1}: the left comb of right combs
(on the left) and a double comb of opposite combs (on the right)
In this section, the so-called PRWs are defined. A PRW is a random walk driven
by some VLMC. In dimensions one and two, results on transience and the recurrence
of PRW are given. These results are detailed and proven in Cénac et al. (2018b, 2013)
in dimension one and in Cénac et al. (2020) in dimension two.
In this section, we deal with one-dimensional PRWs. Note that, contrary to the
classical random walk, a PRW is generally not Markovian. Let A := {d, u} = {−1, 1}
(d for down and u for up) and consider the double comb on this alphabet as a context
tree, probabilize it and denote by (Un )n a realization of the associated VLMC. The
nth increment Xn of the PRW is given as the first letter of Un : define the persistent
random walk S = (Sn )n≥0 by S0 = 0 and, for n ≥ 1,
n
Sn := X , [1.2]
=1
Furthermore, for the sake of simplicity and without loss of generality, we condition
the walk to start almost surely (a.s.) from {X−1 = u, X0 = d} – this amounts to
changing the origin of time. In this model, a walker on a line keeps the same direction
with a probability that depends on the discrete time already spent in the direction the
walker is currently moving (see Figure 1.5). This model can be seen as a generalization
of DRRWs introduced in Mauldin et al. (1996).
Sn
u d Y1 Y2 u
... u
d u d
... u
d
d u B0 B1 B2 B3 B4
n
Mn = Y
=1
In order to avoid trivial cases, we assume that S cannot be frozen in one of the two
directions with a positive probability. Therefore, we make the following assumption.
A SSUMPTION 1.1 (Finiteness of the length of runs).– For any α, β ∈ {u, d}, α = β,
n
lim qαk β (α) = 0. [1.3]
n→+∞
k=1
Another random document with
no related content on Scribd:
occur during their growth, until they are adult. The wings first appear
in the form of prolongations of the meso- and meta-nota, which
increase in size, the increment probably taking place at the moults.
The number of joints of the antennae increases during the
development; it is effected by growth of the third joint and
subsequent division thereof; hence the joints immediately beyond
the second are younger than the others, and are usually shorter and
altogether more imperfect. The life-histories of Termites have been
by no means completely followed; a fact we can well understand
when we recollect that these creatures live in communities
concealed from observation, and that an isolated individual cannot
thrive; besides this the growth is, for Insects, unusually slow.
The Termitidae of Africa are the most remarkable that have yet been
discovered, and it is probably on that continent that the results of the
Termitid economy have reached their climax. Our knowledge of the
Termites of tropical Africa is chiefly due to Smeathman, who has
described the habits of several species, among them T. bellicosus. It is
more than a century since Smeathman travelled in Africa and read an
account of the Termites to the Royal Society.[292] His information was the
first of any importance about Termitidae that was given to the world; it is,
as may be well understood, deficient in many details, but is nevertheless
of great value. Though his statements have been doubted they are
truthful, and have been confirmed by Savage.[293]
From the preceding brief sketch of some Termitidae we may gather the
chief points of importance in which they differ from other Insects, viz. (1)
the existence in the community of individuals—workers and soldiers—
which do not resemble their parents; (2) the limitation of the reproductive
power to a single pair, or to a small number of individuals in each
community, and the prolongation of the terminal period of life. There are
other social Insects besides Termitidae: indeed, the majority of social
Insects—ants, bees, and wasps—belong to the Order Hymenoptera, and
it is interesting to note that analogous phenomena occur in them, but
nevertheless with such great differences that the social life of Termites
must be considered as totally distinct from that of the true ants and other
social Hymenoptera.
Soldier.—All the parts of the head of the soldier undergo a greater or less
change of form; even the pieces at its base, which connect it by means of
the cervical sclerites with the prothorax, are altered. The parts that
undergo the greatest modification are the mandibles (Fig. 233, B); these
become much enlarged in size and so much changed in form that in a
great many species no resemblance to the original shape of these organs
can be traced. It is a curious fact that the specific characters are better
expressed in these superinduced modifications than they are in any other
part of the organisation (except, perhaps, the wings). The soldiers are not
alike in any two species of Termitidae so far as we know, and it seems
impossible to ascribe the differences that exist between the soldiers of
different species of Termitidae to special adaption for the work they have
to perform. Such a suggestion is justifiable only in the case of the Nasuti
(Fig. 234, 1), where the front of the head is prolonged into a point: a duct
opens at the extremity of this point, from which is exuded a fluid that
serves as a cement for constructing the nest, and is perhaps also used to
disable enemies. Hence the prolongation and form of the head of these
Nasuti may be fairly described as adaptation to useful ends. As regards
the great variety exhibited by other soldiers—and their variety is much
greater than it is in the Nasuti—it seems at present impossible to treat it
as being cases of special adaptations for useful purposes. On the whole it
would be more correct to say that the soldiers are very dissimilar in spite
of their having to perform similar work, than to state that they are
dissimilar in conformity with the different tasks they carry on.
Fig. 234.—Soldiers of different species of Termites. (After Hagen.) 1,
Termes armiger; 2, T. dirus; 3, Calotermes flavicollis; 4, T. bellicosus;
5, T. occidentis; 6, T. cingulatus (?); 7, Hodotermes quadricollis (?); 8,
T. debilis (?), Brazil.
Forms of Termes lucifugus. (After Grassi.) Zool. Anz. xii. 1889, p. 360.
On inspecting this table it will be perceived that the variety of forms is due
to three circumstances—(1) the existence of castes that are not present
in ordinary Insects; (2) the coexistence of young, of adolescents, and of
adults; and (3) the habit the Termites have of tampering with forms in their
intermediate stages, the result of which may be the substitution of
neoteinic individuals in place of winged forms.
This latter procedure is far from being completely understood, but to it are
probably due the various abnormal forms, such as soldiers with rudiments
of wings, that have from time to time been discovered in Termite
communities, and have given rise to much perplexity.
In connexion with this subject we may call attention to the necessity, when
examining Termite nests, of taking cognisance of the fact that more than
one species may be present. Bates found different Termites living
together in the Amazons Valley, and Mr. Haviland has found as many as
five species of Termitidae and three of true ants in a single mound in
South Africa. In this latter case observation showed that, though in such
close proximity, there was but little further intimacy between the species.
There are, however, true inquiline, or guest, Termites, of the genus
Eutermes, found in various parts of the world living in the nests of other
Termitidae.
The observations already made show that this is not the case. It has been
thoroughly well ascertained by Lespès and Fritz Müller that in various
species of Calotermes the soldiers are both males and females. Lespès
and Grassi have shown that the workers of Termes lucifugus are of male
and female sex, and that this is also true of the soldiers. Although the
view of the duality of the sexes of these forms was received at first with
incredulity, it appears to be beyond doubt correct. Grassi adds that in all
the individuals of the workers and soldiers of Termes lucifugus the sexual
organs, either male or female, are present, and that they are in the same
stage of development whatever the age of the individual. This statement
of Grassi's is of importance because it seems to render improbable the
view that the difference of form of the soldier and worker arises from the
arrest of the development of their sexual organs at different periods. The
fact that sex has nothing whatever to do with the determination of the
form of workers and soldiers may be considered to be well established.
The statement that the young are all born alike is much more difficult to
substantiate. Bates said that the various forms could be detected in the
new-born. His statement was made, however, merely from inspection of
the nests of species about which nothing was previously known, and as it
is then very difficult to decide that a specimen is newly hatched, it is
probable that all he meant was that the distinction of workers, soldiers,
and sexual forms existed in very small individuals—a statement that is no
doubt correct. Other observers agree that the young are in appearance all
alike when hatched, and Grassi reiterates his statement to this effect.
Hence it would appear that the difference of form we are discussing
arises from some treatment subsequent to hatching. It may be suggested,
notwithstanding the fact that the young are apparently alike when
hatched, that they are not really so, but that there are recondite
differences which are in the course of development rendered
conspicuous. This conclusion cannot at present be said with certainty to
be out of the question, but it is rendered highly improbable by the fact
ascertained by Grassi that a specimen that is already far advanced on the
road to being an ordinary winged individual can be diverted from its
evident destination and made into a soldier, the wings that were partially
developed in such a case being afterwards more or less completely
absorbed. This, as well as other facts observed by Grassi, render it
probable that the young are truly, as well as apparently, born in a state
undifferentiated except as regards sex. Fig. 230 (p. 363) is designed to
illustrate Grassi's view as to this modification; the individual A is already
far advanced in the direction of the winged form C, but can nevertheless
be diverted by the Termites to form the adult soldier B.
According to the facts we have stated, neither heredity nor sex nor arrest
of development are the causes of the distinctions between worker and
soldier, though some arrest of development is common to both; we are
therefore obliged to attribute the distinction between them to other
influences. Grassi has no hesitation in attributing the anatomical
distinctions that arise between the soldiers, workers, and winged forms to
alimentation. Food, or the mode of feeding, or both combined, are,
according to the Italian naturalist, the source of all the distinctions, except
those of sex, that we see in the forms of any one species of Termite.
The salivary food is white and of alkaline nature; when excreted it makes
its appearance on the upper lip. It is used either by other individuals or by
the specimen that produced it; in the latter case it is transferred to the
lower lip and swallowed by several visible efforts of deglutition. The
aliments we have mentioned are made use of to a greater or less extent
by all the individuals except the very young; these are nourished only by
saliva: they commence taking proctodaeal and stomodaeal food before
they can eat triturated wood.
It must be admitted that the duration of life of the king has not been
sufficiently established, for the coexistence of a king with a queen in the
royal cell is not inconsistent with the life of the king being short, and with
his replacement by another. Much that is imaginary exists in the literature
respecting Termites, and it is possible that the life of the king may prove
to be not so prolonged as has been assumed.
Fig. 235.—Royal pair of Termes sp. from Singapore, taken out of royal cell.
A, A, King, lateral and dorsal views; B, B, queen, dorsal and lateral
views. Natural sizes.