

Studies in Computational Intelligence 957

Jan W. Owsiński · Jarosław Stańczak · Karol Opara · Sławomir Zadrożny · Janusz Kacprzyk

Reverse Clustering
Formulation, Interpretation and Case Studies
Studies in Computational Intelligence

Volume 957

Series Editor
Janusz Kacprzyk, Polish Academy of Sciences, Warsaw, Poland
The series “Studies in Computational Intelligence” (SCI) publishes new developments
and advances in the various areas of computational intelligence, quickly and
with a high quality. The intent is to cover the theory, applications, and design
methods of computational intelligence, as embedded in the fields of engineering,
computer science, physics and life sciences, as well as the methodologies behind
them. The series contains monographs, lecture notes and edited volumes in
computational intelligence spanning the areas of neural networks, connectionist
systems, genetic algorithms, evolutionary computation, artificial intelligence,
cellular automata, self-organizing systems, soft computing, fuzzy systems, and
hybrid intelligent systems. Of particular value to both the contributors and the
readership are the short publication timeframe and the world-wide distribution,
which enable both wide and rapid dissemination of research output.
Indexed by SCOPUS, DBLP, WTI Frankfurt eG, zbMATH, SCImago.
All books published in the series are submitted for consideration in Web of Science.

More information about this series at http://www.springer.com/series/7092


Jan W. Owsiński, Polish Academy of Sciences, Systems Research Institute, Warsaw, Poland
Jarosław Stańczak, Polish Academy of Sciences, Systems Research Institute, Warsaw, Poland
Karol Opara, Polish Academy of Sciences, Systems Research Institute, Warsaw, Poland
Sławomir Zadrożny, Polish Academy of Sciences, Systems Research Institute, Warsaw, Poland
Janusz Kacprzyk, Polish Academy of Sciences, Systems Research Institute, Warsaw, Poland

Studies in Computational Intelligence
ISSN 1860-949X, ISSN 1860-9503 (electronic)
ISBN 978-3-030-69358-9, ISBN 978-3-030-69359-6 (eBook)
https://doi.org/10.1007/978-3-030-69359-6
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature
Switzerland AG 2021
This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether
the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of
illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and
transmission or information storage and retrieval, electronic adaptation, computer software, or by similar
or dissimilar methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this
publication does not imply, even in the absence of a specific statement, that such names are exempt from
the relevant protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this
book are believed to be true and accurate at the date of publication. Neither the publisher nor the
authors or the editors give a warranty, expressed or implied, with respect to the material contained
herein or for any errors or omissions that may have been made. The publisher remains neutral with regard
to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Preface

We are witnessing an explosive growth and development of methods and techniques
related to data analysis. This growth is conditioned, on the one hand, by the rapidly
expanding availability of data in virtually all domains of human activity and, on the
other hand, by the very substantial progress in the technical and scientific
capabilities of dealing with the increasing volumes of data. All this amounts to a
dramatic change, especially in quantitative terms.
Yet, as researchers and practitioners working on the methodological side of data
analysis know very well, many of the fundamental substantive problems in this domain
still require solutions, or at least better solutions than those available now. This
concerns, in particular, such fundamental areas as clustering, classification, rule
extraction, and so on. The primary issue here is the trade-off between precision or
accuracy and speed or computational cost (when the problem at hand is already truly
well-defined). Nor can one forget the very strong data dependence of the effectiveness
and efficiency of many of the methodologies applied nowadays, which makes the
situation even more difficult.
The present book addresses this nexus of issues. It is aimed, apparently, at the
interface of clustering and classification but is, in fact, relevant to a much broader
domain, with much broader implications in terms of applicability and interpretation.
Namely, it describes the paradigm of “reverse clustering”, introduced by the present
authors. The paradigm concerns the situation in which we are given a certain data set,
composed of entities, observations or objects, as is usual in data analysis, and, at
the same time, we are given, or we consider, a certain partition of this data set. We
do not assume anything a priori about the data set, about the partition or, most
importantly, about the relation between the data set and the partition. Thus, the
partition may be the result of a definite kind of analysis of the given data set, but
it may equally result from quite a different mechanism (e.g. a division of the set of
objects according to some variable or criterion not contained in the data set at hand).


Under these circumstances, the data set and the partition being given, we try to
reconstruct the partition on the basis of the data set, using cluster analysis. We try
to find the entire clustering procedure that will yield, for this given data set, a
partition that is as close to the given one as possible. Thus, the result is both the
clustering procedure, defined by a number of attributes (clustering method, its
parameters, variable selection, distance definition, …), and the concrete partition
found.
It is obvious that the paradigm borders upon classification (for a very specific
formulation/interpretation of the situation faced), but extends to a much broader
domain, in which the perception of the problem itself and the meaning of solutions
can vary very widely. This is, in particular, shown in the present book.
At the current stage of work, the results obtained, most of which are contained in this
book, pertain mainly to the substantive aspect of the paradigm, while the technical
aspects of the respective algorithms are, as of now, left to future research.
The reverse clustering paradigm constitutes a new perspective on quite a broad
spectrum of problems in data analysis and, as the book shows, it can provide very
interesting, instructive and significant results under a wide variety of
interpretational assumptions. We sincerely hope, therefore, that this book will not
only give Readers new material and fresh insight into some problems of data analysis,
but may also prompt them to deeper studies in the direction indicated here.

Warsaw, Poland
Jan W. Owsiński
Jarosław Stańczak
Karol Opara
Sławomir Zadrożny
Janusz Kacprzyk
Introduction

This book is devoted to an approach, or paradigm, developed by the authors and applied
to a series of cases of diverse character, mostly based on real-life data. The approach
belongs to the broadly understood domain of data analysis, more precisely to
classification and cluster analysis. We call the approach “reverse clustering” because
of its logic, which is formulated as follows:

Assume we have at our disposal a set of data, $X$, composed of $n$ objects or
observations, indexed $i$, $i = 1, \ldots, n$, each of these being described by a
vector of $m$ features or variables, indexed $k$, the respective vector being denoted
$x_i = (x_{i1}, \ldots, x_{ik}, \ldots, x_{im})$. At the same time, assume we have a
partition of the set $X$ of objects into subsets, this partition being denoted $P_A$.
For these data, we try to obtain a partition $P_B$ that is as close to $P_A$ as
possible, by applying clustering algorithms to the set $X$. Thereby, we find both the
partition $P_B$ that is as close as possible to $P_A$ and the concrete clustering
procedure, with all its parameters, which yields the partition $P_B$.

The above does not explicitly state the purpose of the exercise (to say nothing of the
technical details), but it can easily be deduced that the aim is closely related to the
notion of classification. While the close relation with classification is not only
obvious but definitely real, the paradigm has a much wider spectrum of applications and
meanings, as explained in Chap. 2 of the book, following the more precise presentation
of the paradigm given in Chap. 1.
The paradigm is constituted, first, by the above statement of the problem, which
then has to be expressed in pragmatic technical terms, involving
(1) the space of clustering algorithms with its granularity (which algorithms are
accounted for and which parameters, defining the entire clustering procedure, are
subject to the search for $P_B$);
(2) the measure of similarity between the partition of the set $X$ given at the outset,
i.e. $P_A$, and the partitions obtained from the clustering algorithms, this measure
being maximised (or the measure of distance between them, being minimised);
and
(3) the technique of search for $P_B$, given the data of the concrete problem.
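
The three components listed above can be illustrated with a minimal sketch. It is not the authors' implementation: the search technique here is a plain exhaustive grid search, the algorithm space is reduced to a naive hierarchical aggregation, and the similarity measure is the unadjusted Rand index. The function names, the toy data and the grid of values are all assumptions made for the example.

```python
from itertools import combinations, product

def rand_index(labels_a, labels_b):
    """Share of object pairs treated alike (together or apart) by both partitions."""
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)

def minkowski(x, y, weights, h):
    """Weighted Minkowski distance with exponent h; a zero weight drops a variable."""
    return sum(w * abs(a - b) ** h for w, a, b in zip(weights, x, y)) ** (1.0 / h)

def agglomerate(X, k, weights, h, linkage):
    """Naive hierarchical aggregation stopped at k clusters.

    linkage is the built-in min (single linkage) or max (complete linkage).
    """
    clusters = [[i] for i in range(len(X))]
    while len(clusters) > k:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            d = linkage(
                minkowski(X[i], X[j], weights, h)
                for i in clusters[a] for j in clusters[b]
            )
            if best is None or d < best[0]:
                best = (d, a, b)
        _, a, b = best
        clusters[a] += clusters[b]
        del clusters[b]
    labels = [0] * len(X)
    for q, members in enumerate(clusters):
        for i in members:
            labels[i] = q
    return labels

def reverse_clustering(X, P_A, grid):
    """Exhaustive search (an assumption; any search technique could be used)."""
    best_score, best_Z, best_P_B = -1.0, None, None
    for k, weights, h, linkage in product(*grid):
        P_B = agglomerate(X, k, weights, h, linkage)
        score = rand_index(P_A, P_B)
        if score > best_score:  # strict inequality: the first best setting is kept
            best_score = score
            best_Z = (k, weights, h, linkage.__name__)
            best_P_B = P_B
    return best_score, best_Z, best_P_B

# Toy data: the given partition P_A depends on the first variable only.
X = [(0.0, 0.0), (0.2, 5.0), (0.1, 9.0), (3.0, 0.1), (3.2, 5.1), (3.1, 9.2)]
P_A = [0, 0, 0, 1, 1, 1]
grid = (
    [2, 3],                    # (1) number of clusters
    [(1, 1), (1, 0), (0, 1)],  # (1) variable weights / selection
    [1, 2],                    # (1) Minkowski exponent
    [min, max],                # (1) single vs complete linkage
)
score, Z, P_B = reverse_clustering(X, P_A, grid)
print(score, Z, P_B)
```

On this toy data the search reproduces the given partition exactly only once the second variable is switched off, so the result consists of both the partition found and the description of the procedure that yields it.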
This paradigm is, however, also, and perhaps even more importantly, constituted by the
interpretation of the entire setting and the particular instances of this
interpretation, treated at length, as mentioned, in Chap. 2. This is important insofar
as it places the paradigm against the background of the data analysis domain, with
special emphasis on classification and related fields. These various instances of
interpretation are associated primarily with the status of the partition $P_A$, namely
its source, the degree of credibility we assign to it, as well as its actual or
presumed connection with the data set $X$. Depending on these, and on the results
obtained, the status of the obtained partition $P_B$, including its validity and
applicability, will also vary significantly.
Owing to this variety of interpretations, the paradigm may find application in a
broad spectrum of analytic, but also cognitive, situations. The subsequent chapters of
the book, starting with the third one, are devoted precisely to the presentation of the
cases treated, which definitely differ not only as to their subject matter (the domain
from which the data come), but also, largely, as to the interpretation of the actual
problem and the results obtained. The implication is that the paradigm can be used in
many data analytic circumstances for diverse purposes, whenever structuring the data
set into groups is appropriate.
The paradigm of reverse clustering has already been presented in several papers by the
same team of authors, e.g. in Owsiński et al. (2017a, b) and Owsiński, Stańczak and
Zadrożny (2018). The present book aims at a more complete presentation of the paradigm
and its interpretations. The book does not go into the computational and numerical
issues and details, which are, of course, of very high importance. Rather, the main
purpose of the book is to present the approach and its capacities in terms of various
kinds of situations, problems and interpretations of the respective results. We do
indeed hope it conveys the intended message in an effective and interesting manner.
The book is structured as follows. First, Chap. 1 presents the scheme of the approach,
characterised, in particular, as it has been used in the cases illustrated in this
book, along with the notation used. Then, Chap. 2 outlines the context of reverse
clustering, starting with other approaches that concern similar kinds of problems
related to data analysis, and including ample reference to the very general idea of
reverse engineering, as well as to explainable artificial intelligence or data
analysis. The context is then briefly analysed in terms of more detailed specific
problems, arising in connection with both the reverse clustering procedure and data
analytic methods in a more general perspective (e.g. selection of variables, or
definitions of distance). This chapter also contains a very important section on the
potential interpretations of the reverse clustering paradigm and its results.
Chapter 3 constitutes a very short introduction to the cases studied and illustrated in
the book, which are then presented in the consecutive chapters: Chap. 4 is devoted to
the motorway traffic data, Chap. 5 to environmental contamination data, Chaps. 6 and 7
to two separate cases of typologies or classifications of administrative units in
Poland, and, finally, Chap. 8 to some more academic exercises. The book closes with
Chap. 9, summarising the work done and proposing some new vistas.
This book is intended to offer Readers truly interesting and novel perspectives in data
analysis, regarding the diverse ways of formulating and approaching problems and of
understanding the results, and we shall be very satisfied if it does so to at least a
perceptible degree.

Jan W. Owsiński
Jarosław Stańczak
Karol Opara
Sławomir Zadrożny
Janusz Kacprzyk
Contents

1 The Concept of Reverse Clustering
1.1 The Concept
1.2 The Notation
1.3 The Elements of Vector Z: The Dimensions of the Search Space
1.4 The Criterion: Maximising the Similarity Between Partitions $P_A$ and $P_B$
1.5 The Search Procedure
References

2 Reverse Clustering: The Essence and The Interpretations
2.1 The Background and the Broad Context
2.2 Some More Specific Related Work
2.3 The Interpretations
References

3 Case Studies: An Introduction
3.1 A Short Characterisation of the Cases Studied
3.2 The Interpretations of the Cases Treated
References

4 The Road Traffic Data
4.1 The Setting
4.2 The Experiments
4.3 Conclusions
References

5 The Chemicals in the Natural Environment
5.1 The Data and the Background
5.2 The Procedure: Determining the Partition $P_A$
5.3 The Procedure: Reverse Clustering
5.4 Discussion and Conclusions
References

6 Administrative Units, Part I
6.1 The Background: Polish Administrative Division and the Province of Masovia
6.2 The Data
6.3 The Analysis Regarding the Administrative Categorization of Municipalities
6.4 A Verification
6.5 The Analysis Regarding the Functional Categorization of Municipalities
6.6 Conclusions and Discussion
References

7 Administrative Units, Part II
7.1 The Background
7.2 The Computational Experiments
7.3 Discussion and Conclusions
References

8 Academic Examples
8.1 Introduction
8.2 Fisher’s Iris Data
8.3 Artificial Data Sets
8.4 Conclusions
References

9 Summary and Conclusions
9.1 Interpretation and Use of Results
9.2 Some Final Observations
List of Figures

Fig. 2.1 The scheme of the reverse clustering problem formulation
Fig. 2.2 The scheme of potential cases of interpreting the paradigm of reverse clustering
Fig. 2.3 An illustration of division of a set of objects according to the rule of “putting together the dissimilar and separating the similar”: colours indicate the belongingness to three groups: blue, red and green
Fig. 3.1 Rough indication of interpretations of the cases treated against the framework of Fig. 2.1
Fig. 4.1 Median hourly profiles of traffic for the classes of the days of the week
Fig. 4.2 Hourly profiles of traffic intensity for individual hours of the week. Colours, assigned to successive days, denote the clusters, forming the initial partition $P_A$
Fig. 4.3 Visual interpretation of clusters described in Table 4.2
Fig. 5.1 Concentration levels for Pb: areas in the order of increasing Pb concentrations
Fig. 5.2 Concentration levels for Cd: areas in the order of increasing Cd concentrations
Fig. 5.3 Concentration levels for Zn: areas in the order of increasing Zn concentrations
Fig. 5.4 Concentration levels for S: areas in the order of increasing S concentrations
Fig. 5.5 The distribution of points (“areas”) in the space of concentrations for (a) particular elements and pairwise; (b) enlarged for Zn and S (upper box) and of Pb and Cd (lower box); see the text further on for the interpretation of colours
Fig. 6.1 Data on municipalities of the province of Masovia with administrative categorisation into three categories on the plane of the first two principal components (colours refer to the results from Table 6.6)
Fig. 6.2 Map of the province of Masovia with the indication of the municipalities classified in three clusters resulting from the reverse clustering according to the data from Table 6.3. Red area in the middle corresponds to Warsaw and its neighbourhood, the bigger red blobs correspond to subregional centres (Radom, Płock, Siedlce and Mińsk Mazowiecki)
Fig. 6.3 Map of Masovia province with the partition $P_B$ from Table 6.11
Fig. 6.4 Map of Masovia province with the partition $P_B$ from Table 6.12
Fig. 7.1 Two examples of the procedures, leading to the potential prior categorization of the sort similar to the one of interest here
Fig. 7.2 Map of Poland with indication of municipalities, which belonged in the solution of Table 7.2 to the “correct” categories from the initial partition and those that belonged to the other ones (“incorrect”)
Fig. 7.3 Map of Poland, showing the partition of the set of Polish municipalities obtained with the own evolutionary method and the k-means algorithm, composed of 12 clusters, corresponding to Table 7.2
Fig. 8.1 An example of the artificial data set with “nested clusters”, subject to experiments with reverse clustering
Fig. 8.2 An example of the artificial data set with “linear broken structure”, subject to experiments with reverse clustering
Fig. 9.1 Map of the province of Masovia showing the municipality types, obtained from the reverse clustering performed with DBSCAN algorithm, characterised in Table 9.1
Fig. 9.2 The meta-scheme of application of the reverse clustering paradigm
List of Tables

Table 1.1 Values of the Lance-Williams coefficients for the most popular of the hierarchical aggregation clustering algorithms
Table 1.2 Elements of calculation of the Rand index of similarity between partitions
Table 4.1 Summary of results for the first series of experiments with traffic data
Table 4.2 Results for traffic data for the entire vector of parameters Z, with the use of hierarchical aggregation (values of Rand index = 0.850, of adjusted Rand = 0.654). The upper part of the table shows the coincidence of patterns in particular $A_q$, based on the days of the week, and obtained $B_q$
Table 4.3 Results for the traffic data obtained with the “pam” algorithm
Table 5.1 Pollution data for Baden-Württemberg (Germany), used in the exemplary calculations: total concentrations, in mg/kg of dry weight (Pb-Lead, Cd-Cadmium, Zn-Zinc, S-Sulphur)
Table 5.2 Numbers of areas in the classes, defined for the elements Zn and S contents in the herb layer
Table 5.3 Contingency table for the partition $P_A$ assumed and the one obtained in Series 1 of calculations, $P_B$, with the k-means algorithm and data only for Pb and Cd
Table 5.4 Contingency table for the partition $P_A$ assumed and the one obtained in Series 1 of calculations, $P_B$, with the hierarchical aggregation algorithm and data only for Pb and Cd
Table 5.5 Contingency table for the partition $P_A$ assumed and the one obtained in Series 1 of calculations, $P_B$, with the DBSCAN algorithm and data for all four elements
Table 5.6 Contingency table for the partition $P_A$ assumed and the one obtained in Series 2 of calculations, $P_B$, with the hierarchical merger algorithm and data for all four elements
Table 6.1 Functional typology of municipalities of the province of Masovia (data as of 2009)
Table 6.2 Variables describing municipalities, accounted for in the study
Table 6.3 Contingency matrix for the administrative breakdown of municipalities of the province of Masovia in Poland and reverse clustering performed with own evolutionary algorithm using k-means
Table 6.4 Contingency matrix for the administrative breakdown of municipalities of the province of Masovia in Poland and reverse clustering performed with own evolutionary algorithm using hierarchical aggregation
Table 6.5 Contingency matrix for the administrative breakdown of municipalities of the province of Masovia in Poland and reverse clustering performed with own evolutionary algorithm using DBSCAN
Table 6.6 Contingency matrix for the administrative breakdown of municipalities of the province of Masovia in Poland and reverse clustering performed with DE algorithm using “pam”
Table 6.7 Contingency matrix for the administrative breakdown of municipalities of the province of Masovia in Poland and reverse clustering performed with DE algorithm using “agnes”
Table 6.8 Examples of variable weights for two runs of calculations, presented in Tables 6.3 and 6.4
Table 6.9 Contingency matrix for the administrative breakdown of municipalities of the province of Wielkopolska in Poland and clustering performed with the Z vector obtained for Masovia in the case shown in Table 6.3 (k-means algorithm)
Table 6.10 Contingency matrix for the administrative breakdown of municipalities of the province of Wielkopolska in Poland and clustering performed with the Z vector obtained for Masovia in the case shown in Table 6.4 (hierarchical aggregation algorithm)
Table 6.11 The contingency matrix for the functional typology of municipalities of Masovia from Table 6.1 and reverse clustering with own evolutionary method using the k-means algorithm
Table 6.12 The contingency matrix for the functional typology of municipalities of Masovia from Table 6.1 and reverse clustering with own evolutionary method using hierarchical aggregation algorithm
Table 6.13 The contingency matrix for the functional typology of municipalities of Masovia from Table 6.1 and reverse clustering with DE using “pam” algorithm
Table 6.14 The contingency matrix for the functional typology of municipalities of Masovia from Table 6.1 and reverse clustering with DE using “agnes” algorithm
Table 7.1 Functional typology of Polish municipalities
Table 7.2 Contingency table for the proposed functional typology of Polish municipalities and the reverse clustering partition obtained with own evolutionary method using k-means algorithm
Table 7.3 Variable weights in the solution illustrated in Table 7.2
Table 7.4 Contingency table for the proposed functional typology of Polish municipalities and the reverse clustering partition obtained with own evolutionary method using hierarchical aggregation algorithm
Table 8.1 The results obtained for the Iris data with the DE method: comparison of “pam” and “agnes” algorithms and two selections of vector Z components (notation as in Table 4.1)
Table 8.2 Contingency table for the DE method applied to the Iris data with the “pam” algorithm
Table 8.3 Contingency table for the DE method applied to the Iris data with the “agnes” algorithm
Table 8.4 The reverse clustering results for the Iris data obtained with the own evolutionary method using DBSCAN, k-means and hierarchical merger algorithms
Table 9.1 Contingency matrix for the typological categorisation of the municipalities of the province of Masovia in Poland obtained with reverse clustering using own evolutionary algorithm and the DBSCAN algorithm (for explanations see Chap. 6)
Chapter 1
The Concept of Reverse Clustering

1.1 The Concept

This book presents an approach, or a paradigm, within which we try to develop a
reverse-engineering type of procedure, aimed at reconstructing a certain partition¹
of a data set X = {x_i}, i = 1,…,n, into p subsets (clusters) A_q, q = 1,…,p. We
assume that each object, indexed by i, is characterized by m variables, so that x_i =
(x_i1,…,x_ik,…,x_im).
Having the partition P_A = {A_q}, given in some definite manner, we now try to
figure out the details of the clustering procedure which, when applied to X, would
have produced the partition P_A or the closest possible approximation of it.
That is, we search in the space of configurations, with a particular configuration
denoted by Z, this space being spanned by the following parameters:
(i) the choice of the clustering algorithm, and characteristic parameters of the
respective algorithm(s);
(ii) the selection or other operations on the set of variables (e.g. weighting,
subsetting, aggregation), and
(iii) the definition of a similarity/distance measure between objects, used in the
algorithm.
The partition resulting from applying the clustering procedure with a candidate
configuration of the above parameters is denoted P_B and is composed of clusters B_q',
q' = 1,…,p', P_B = {B_q'}. The search is performed by optimizing with respect to a
certain criterion, denoted Q(P_A, P_B), defined on the pairs of partitions.
So, as we denote the set of parameters comprising a configuration that is being
optimized in the search by Z (notwithstanding the potential differences in the actual
content of Z), and the space of values of these parameters by Ω, we are looking
in Ω for a Z* that minimizes Q(P_A, P_B), where P_B(Z*) is a partition of X obtained

1 The concept of a Reverse Cluster Analysis has been introduced by Ríos and Velásquez (2011) in
case of the SOM based clustering, but it is meant there in a rather different sense than associating
original data points with the nodes in the trained network.
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 1
J. W. Owsiński et al., Reverse Clustering, Studies in Computational Intelligence 957,
https://doi.org/10.1007/978-3-030-69359-6_1

using the configuration Z*. Formally, we can treat Z as a transformation of the data
set X (a cluster operator) and thus denote the optimization problem for a given data
set X and its known partition P_A, an element of the set of all partitions of X, as
follows:

(1.1)

$$Z^{*} = \arg\min_{Z \in \Omega} Q(P_A, Z(X)). \tag{1.2}$$

(Notice that this optimization problem is in line with the reverse engineering, or
backward engineering paradigm, i.e. a procedure that aims at finding out for some
object or process what has been the underlying design, architecture or implementation
process that led to the appearance of the object in question—more on this subject in
Chap. 2.)
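
The optimization problem above can be illustrated with a minimal Python sketch. It is not the procedure used in this book: it assumes a small, finite set of candidate configurations that can simply be enumerated, and it uses the share of disagreeing object pairs as a stand-in for Q. The names `reverse_search`, `rand_disagreement` and `split_at` are hypothetical and introduced only for this illustration.

```python
from itertools import combinations

def rand_disagreement(labels_a, labels_b):
    """Share of object pairs on which two partitions disagree (0 = identical)."""
    pairs = list(combinations(range(len(labels_a)), 2))
    disagree = sum(
        (labels_a[i] == labels_a[j]) != (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return disagree / len(pairs)

def reverse_search(X, labels_a, candidates, Q=rand_disagreement):
    """Return the name of the configuration Z minimising Q(P_A, Z(X))."""
    return min(candidates, key=lambda name: Q(labels_a, candidates[name](X)))

# Hypothetical toy "cluster operators": each maps the data set to cluster labels
# by cutting a one-dimensional data set at a threshold.
def split_at(threshold):
    return lambda X: [int(x > threshold) for x in X]

X = [0.1, 0.2, 0.3, 5.0, 5.1, 5.2]
P_A = [0, 0, 0, 1, 1, 1]                     # the given partition
candidates = {
    "t=0.15": split_at(0.15),
    "t=2.5": split_at(2.5),
    "t=5.05": split_at(5.05),
}
best = reverse_search(X, P_A, candidates)    # the configuration Z*
P_B = candidates[best](X)                    # the partition Z*(X)
```

In realistic settings the space of configurations is far too large and irregular for enumeration, which is why a heuristic search is needed instead of the exhaustive `min` used here.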
Because of the irregularity of circumstances of this search (the nature of the search
space and the values of the performance criterion, see later on for some more details
on this), the solution of the optimization problem defined above is a challenging task.
In our experiments, presented later on in the book, optimisation is performed with
the use of evolutionary algorithms.
Altogether, we try to reconstruct the way in which P_A has been obtained from X. This
way is represented through the configuration Z, pertaining to the broadly conceived
procedure of cluster analysis, for the very simple reason that clustering is a natural
way to produce partitions of sets of data. In some specific circumstances one might
imagine other approaches to this reconstruction, but we stick to the apparently
most natural one. A short discussion of this subject is also provided in Chap. 2 of
the book.
In order to bring the problem formulated here closer to real-life situations,
let us consider the following three examples:

Example 1: a car dealer. Assume a second-hand car dealer has at its disposal a set of
data on (potential) customers who visit the dealer's website, call this set Y,
and a set of data on those who actually bought a car from the dealer, the set X.
Naturally, the set X is much smaller than Y (it may constitute, say, less than 1% of
Y). In this case, a partition of X, P_A, might be based on the makes and/or types of
cars purchased by the customers represented in the data set X. The dealer might wish to
identify the “rules” leading to a partition P_B of the set of purchasing customers,
X, that disregards the labeling by car makes/types, yet approximates P_A as closely
as possible. The obvious objective would be to identify how the groups of customers
interested in particular makes/types of cars may form. Having such a procedure
(Z*) identified, one may apply it to the set Y and obtain the “classes” of (potential)
customers of particular makes/types, who, in turn, may be more effectively targeted
with promotional offers or just information regarding definite makes/types
of cars. Thus, upon finding the Z* that produces the P_B closest to P_A, one
might hope that by applying Z* to Y it would be possible to define the classes of
the (potential) customers, at whom appropriate offers could be effectively addressed
during their search through the website. These classes would form the partition PB
= Z * (Y ).

Example 2: categorization of chemical compounds. Assume that i ∈ I is the
index of a set X of chemical compounds, which are classified in P_A according to
their theoretically known properties, primarily related to their toxicity or,
more generally, their environmental impact. These properties, along with the
associated classification P_A, are based on the composition and structure of the
compounds and the known consequences thereof. On the other hand, let us assume that
for each compound i, a vector/object x_i of actual measurements and assessments
“from the field” is available, reflecting the actual action and characteristics of
the i-th compound in concrete, diverse environmental conditions. Thus, the set X may
be interpreted as a set of such vectors, X = {x_i}, i ∈ I, or as a matrix X = [x_ik],
where k is the index of an attribute characterizing object x_i. These attributes may
be related both to the (induced or deduced) impact on the biotic and abiotic
environment and to characteristics of a more physical nature, such as penetration
speed and reach, persistence, adhesion, etc. In addition, there may be multiple
observations for a single compound i, so that X is actually a bag (multiset). Now,
the (“best”, i.e., “closest” to P_A) partition P_B we obtain for X = {x_i},
especially regarding the clustering of the x_i's, partly reveals the influence of
the variety of environmental situations on the actual action of the compounds, but
it also sheds light “backwards” on the appropriateness of the categorization P_A,
perhaps motivating a search for additional dimensions characterizing the compounds
analyzed. This can take the form of an iterative back-and-forth procedure, with
subsequent P_A(t) and P_B(t), obtained in consecutive iterations t, hopefully
getting closer to each other, if not converging.

Example 3: the actual toxicity of mushrooms. Even though this case might be
regarded as anecdotal, mushrooms do constitute an important part of cuisine and
diet in many cultures, and in many of them they also lead, every year, to deaths or
hospitalizations for severe poisoning. It is also well known that, owing to the
biological properties of mushrooms, their toxicity is highly variable, and the
actual effects depend heavily upon the way they are prepared (e.g. boiling
mushrooms in water and then pouring this water away) and consumed, as well as upon
the consumer and her/his general and current characteristics (e.g., age, weight, or
alcohol recently consumed). The partition P_A is meant to correspond to the classes
of toxicity/edibility of the particular species, with the aim of communicating
these characteristics to the wide public as clearly as possible. Thus, P_A, prepared
by the experts, is juxtaposed with the partition P_B, obtained from the set X of
descriptions x_i of actual medically described poisoning cases, as well as
interviews with experienced cooks specialized in mushroom dishes. The juxtaposition
is intended to lead to a better justified and cogently characterized classification
P_B(Z*), to be communicated to the wide public, including general edibility
assessments, cooking indications, advice on identification and first aid, etc.

1.2 The Notation

We shall now sum up the notation already introduced, extending it whenever
necessary with the notions that will be used further on:
X = {x_i}, i ∈ I – a set of objects under consideration; this symbol, depending on the context, may be interpreted in a slightly different way (see further below);
n – number of objects (observations) in the data set considered;
i – index of the objects, i = 1,…,n;
I = {1,…,i,…,n} – the set of indices of the objects; this set of indices is often equated, for simplicity, with the set of objects;
m – number of variables (features, attributes, characteristics) describing the objects²;
k – index of the variables of the objects, k = 1,…,m;
x_ik – value of variable k for object i; this value belongs to the domain associated with variable k;
x_i – complete description of the object i in the form of the vector of values x_i = [x_i1,…,x_ik,…,x_im];
X – also: an n × m matrix, containing the descriptions of all n objects according to all m variables;
E_x – the Cartesian product of the domains of all m variables/attributes which are used to characterize an object x;
P – a partition of the set of objects X = {x_i}, i ∈ I, often understood as a partition of the set of their indices, I, into disjoint non-empty subsets (clusters), P = {A_q}, q = 1,…,p, jointly covering the whole set X, i.e., ∀q A_q ≠ ∅, ∀q1 ≠ q2 A_q1 ∩ A_q2 = ∅, ∪_q A_q = X;
A_q – a cluster (subset of I), indexed by q; q = 1,…,p, where p is the number of clusters; thus, P = {A_q}; the clusters are assumed to be disjoint and to exhaust (cover) the set I (hence, we do not consider, at least not in this book, the fuzzy or rough clusters);
P_A – the partition which is provided together with X as the datum of the concrete problem;
Z – the vector of parameters (a configuration) of the clustering procedure, comprising the very procedure itself, which, applied to X, yields a partition P = Z(X) of X;
Ω – the universe of possible/considered vectors (configurations) Z;
2 We do not consider here, in this book, the issue of missing data. Thus, it is assumed that for all
n objects each of m variable values is specified. Although the reverse clustering paradigm applies
also to the case of missing values, the book is devoted to the presentation of the main aspects and
implications of the paradigm, without delving into the multiple, even if important, side issues.

Q(.,.) – a measure of similarity or distance between two partitions; we shall also use the notation Q for the quality functions of partitions, when referred to explicitly;
P_B – the partition obtained from the entire procedure, as supposedly the closest to P_A;
d(.,.) – the distance measure between objects; for objects characterised in X we admit a simpler notation: d_ij, where i, j ∈ I;
D(.,.) – the distance measure between sets of objects;
A, B, … – a general notation for subsets of I;
X, Y, … – also: general notation for the data sets describing sets of objects.
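
The three conditions defining a partition P (non-empty clusters, pairwise disjoint, jointly covering the whole set) can be checked mechanically. The following short Python sketch is an illustration only; the function name `is_partition` and the representation of clusters as sets of indices are assumptions made for this example.

```python
def is_partition(clusters, universe):
    """Check that `clusters` are non-empty, pairwise disjoint and cover `universe`."""
    clusters = [set(c) for c in clusters]
    if any(len(c) == 0 for c in clusters):
        return False                       # an empty cluster is not allowed
    union = set().union(*clusters) if clusters else set()
    total = sum(len(c) for c in clusters)
    # The clusters are pairwise disjoint exactly when their sizes add up
    # to the size of their union.
    return total == len(union) and union == set(universe)
```

For instance, with I = {1, 2, 3}, the family {{1, 2}, {3}} passes the check, while {{1, 2}, {2, 3}} fails because the two clusters overlap.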

1.3 The Elements of Vector Z: The Dimensions of the Search Space

We shall now give some additional details, which are associated with the concrete
implementation of the concept introduced here, according to the three aspects of the
space of configurations, specified before. Thereby, we shall be specifying the content
of the vector Z, composed of the individual parameters, subject to choice.
The choice of the clustering algorithms and their parameters.
Concerning the search with respect to the clustering algorithm, throughout this
volume we shall be confined to three families of algorithms:
1. The k-means-type algorithms, including some of their varieties, e.g. k-medoids;
2. The classical progressive merger algorithms, such as single linkage, complete
linkage etc., and
3. A representative of the local density based algorithms, in this case the DBSCAN
algorithm.
No other kinds of clustering algorithms were accounted for in the experiments
reported in this volume but, as far as clustering algorithms proper are concerned,
the ones mentioned constitute the major part of the numerous clustering algorithms
that could be included in the search. It was important for us to consider approaches
which are by their very nature oriented at solving the clustering problem. It should
be mentioned, for clarification, that metaheuristics, very often used also
for clustering purposes, are by no means clustering algorithms themselves: they do
not embody a rationale oriented at a good partitioning of a data set, but, quite
generally, at finding an optimum solution.
We by no means provide here a review of clustering methods, this domain
being the subject of a multitude of books and papers, both general, survey-like, and
devoted to concrete methods and algorithms, to say nothing of a myriad of
applications. For the sake of completeness we mention such general references dealing
with clustering as Mirkin (1996), de Falguerolles (1977), Hayashi et al. (1996),
Banks et al. (2004, 2011), Wierzchoń and Kłopotek (2018), Bramer (2007), Owsiński
(2020), as well as, more focused on specific problems in clustering, Adolfsson et al.
(2019), Figueiredo et al. (1999), Guha et al. (2003), or Simovici and Hua (2019).
The k-means-type algorithms.
The k-means algorithms are based on the following general procedure:
1. for the given data set X = {x_i}, i ∈ I, generate in some way p points³ in E_x
(centroids), denoted x^q, q = 1,…,p;
2. assign each object x_i from X to the closest centroid: for each x_i the distances
d(x_i, x^q) are calculated for q = 1,…,p, and x_i is assigned to the centroid x^q*
for which d(x_i, x^q*) = min_q d(x_i, x^q); thereby, the clusters A_q are formed;
3. for the obtained clusters A_q determine the new centroids x^q, being the
“representatives” of the clusters, e.g. as the means of the objects assigned
to the respective clusters in the previous step;
4. if the stopping criterion, e.g. the lack of essential changes between the centroids
in subsequent steps of the algorithm, is not satisfied (yet), go to 2; otherwise
terminate.
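
The four steps above can be sketched in plain Python as follows. This is a minimal illustration, not the implementation used in the experiments reported in the book: the initial centroids are drawn at random from among the objects, the squared Euclidean distance is used, and the stopping criterion is taken to be the lack of any change in the assignments (which implies that the centroids no longer change either). The function names and the `seed` parameter are assumptions of this sketch.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((u - v) ** 2 for u, v in zip(a, b))

def k_means(X, p, max_iter=100, seed=0):
    """Return (labels, centroids) for the list of points X and p clusters."""
    rng = random.Random(seed)
    # Step 1: generate p initial centroids (here: p distinct objects of X).
    centroids = [list(x) for x in rng.sample(X, p)]
    labels = None
    for _ in range(max_iter):
        # Step 2: assign every object to its closest centroid.
        new_labels = [
            min(range(p), key=lambda q: sq_dist(x, centroids[q])) for x in X
        ]
        # Step 4: stop when the assignments no longer change.
        if new_labels == labels:
            break
        labels = new_labels
        # Step 3: recompute each centroid as the mean of its cluster.
        for q in range(p):
            members = [x for x, lab in zip(X, labels) if lab == q]
            if members:  # an emptied cluster keeps its previous centroid
                centroids[q] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

X = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centroids = k_means(X, p=2)
```

On this toy data set the two well-separated groups of three points are recovered; which of them receives the label 0 depends on the random initialization.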
This simple procedure was initially formulated by Steinhaus (1956), and soon
afterwards was also developed by Lloyd (1957), but the main impact came from
Forgy (1965), Ball and Hall (1965), and MacQueen (1967). The fuzzy-set based
version of the general k-means method, which became enormously popular and
known as fuzzy c-means, was formulated by Bezdek (1981) (see also, for fuzzy
partitions, Dunn 1974, and Bezdek et al. 1999), following which quite a number of
varieties and algorithmic proposals within the k-means-like algorithm family were
put forward (see, for instance, Lindsten et al. 2011, Dvoenko 2014, the recent work
of Kłopotek, 2020, or the discussions of equivalence with Kohonen's SOMs,
originally formulated by Kohonen 2001).
Nowadays, this generic procedure is being implemented in a variety of manners,
differing, in particular, as to the status of the x q —whether they are chosen from
among the objects x i (k-medoids version) or can be any elements of E x (the classical
k-means) and the way, in which they are determined, and it is available through a
number of open access and paid libraries.
The procedure, along with its varieties, is known to converge quickly (within a few
to a dozen or so iterations of the procedure above) to a local minimum, which depends
upon the starting point (the initial points, “centroids”, from step 1) and the nature
of the set X. Since it converges quickly, it remains feasible to start it many times
over from diverse initial sets of centroids in order to increase the chances of
finding the global optimum.
The local minimum that is reached through the functioning of the above procedure
is, naturally, the minimum of the following criterion function:

$$Q(P) = \sum_{q=1}^{p} \sum_{i \in A_q} d\left(x_i, x^{q}\right).$$

3 Usually, instead of p we would use k, as in “k-means”, but this would overlap with the earlier
assumed meaning of k as an index of the variables/attributes characterizing the objects to be clustered.

The distance function used is the Euclidean metric squared, in order to preserve
the properties, associated with the choice of cluster mean as the representative of the
cluster. It is obvious that the above Q(P) is monotonic with respect to p, its minimum
for consecutive p’s decreasing with the increase of p down to Q(P) = 0 for p = n.
That is why the k-means type algorithms are applied with the number of clusters, p,
specified.
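
The behaviour of Q(P) with respect to the number of clusters can be checked on a tiny example. The sketch below assumes the squared Euclidean distance and cluster means as representatives, as stated above; the function name `within_cluster_ss` and the four one-dimensional points are hypothetical.

```python
def within_cluster_ss(X, labels):
    """Q(P): sum over clusters of squared Euclidean distances to the cluster mean."""
    total = 0.0
    for q in set(labels):
        members = [x for x, lab in zip(X, labels) if lab == q]
        mean = [sum(col) / len(members) for col in zip(*members)]
        total += sum(
            sum((u - v) ** 2 for u, v in zip(x, mean)) for x in members
        )
    return total

X = [(0.0,), (1.0,), (4.0,), (5.0,)]
q_one = within_cluster_ss(X, [0, 0, 0, 0])   # p = 1: a single cluster
q_two = within_cluster_ss(X, [0, 0, 1, 1])   # p = 2: the two natural groups
q_all = within_cluster_ss(X, [0, 1, 2, 3])   # p = n: singletons only
```

Here the criterion falls from 17 for p = 1, through 1 for p = 2, to 0 for p = n, so that minimizing Q(P) alone would always favour the trivial partition into singletons.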
In the light of the above it becomes clear that the parameters of the vector Z
associated with the k-means algorithm are the very choice of the algorithm (k-means
or one of its varieties, usually k-medoids as an alternative) and the number of clusters.
Although the choice of the distance definition has an influence on the
results obtained from the k-means algorithms, it is not treated here, since it is
considered later on in this chapter.
The classical hierarchical merger algorithms.
The second group of algorithms accounted for in the study of reverse clustering
reported here is the group of the most classical clustering algorithms, consisting in
stepwise mergers of objects and then clusters. These algorithms are all constructed
as follows:
1. start from the set of objects, X, treating each object as a separate cluster (p = n);
calculate the distances d_qq' for all pairs of objects (indices) in I; these distances
are, therefore, treated in this step as inter-cluster distances, D_qq';
2. find the minimum distance D_q*q** = min_{q≠q'} D_qq'; merge the clusters indexed
by q* and q**, between which the distance is minimum, thereby forming a new
partition, with p := p − 1;
3. check whether p > 1; if not, terminate the procedure (all objects have been
merged into one all-embracing cluster);
4. recalculate the inter-cluster distances, i.e. the distances between the cluster
resulting from the merging of A_q* and A_q** in the previous step, on the one hand,
and all the other clusters on the other hand (the distance D_q*q** “disappearing”);
go to 2.
This—again—very simple procedure gives rise to a variety of concrete algorithms,
which differ by the inter-cluster distance recalculation step 4. The algorithms from
this group find their ancestor in the so-called “Wrocław taxonomy” by Florek et al.
(1956), who were the first to formulate what is now called “single-linkage” algorithm,
along with some of its more general properties. The essential step in the development
of the family of these algorithms came with the papers by Lance and Williams
(1966, 1967). They introduced the general formula, according to which the distance
recalculation step is performed:

$$D_{q^{*} \cup q^{**},\, q} = a_1 D_{q^{*} q} + a_2 D_{q^{**} q} + b\, D_{q^{*} q^{**}} + c\, \left| D_{q^{*} q} - D_{q^{**} q} \right|$$

where q* ∪ q** denotes the index of the cluster resulting from the merging of
clusters q* and q**. The values of the coefficients corresponding to the particular
implementations of the procedure, i.e. the particular progressive merger algorithms,
are shown in Table 1.1 for the most popular of these algorithms.

Table 1.1 Values of the Lance-Williams coefficients for the most popular of the hierarchical
aggregation clustering algorithms (n_q* and n_q** denote the numbers of objects in clusters q* and q**)
Algorithm | a1 | a2 | b | c
Single linkage (nearest neighbor) | 1/2 | 1/2 | 0 | −1/2
Complete linkage (farthest neighbor) | 1/2 | 1/2 | 0 | 1/2
Unweighted average (UPGMA) | n_q*/(n_q* + n_q**) | n_q**/(n_q* + n_q**) | 0 | 0
Weighted average (WPGMA) | 1/2 | 1/2 | 0 | 0
Centroid (UPGMC) | n_q*/(n_q* + n_q**) | n_q**/(n_q* + n_q**) | −n_q* n_q**/(n_q* + n_q**)² | 0
Median (WPGMC) | 1/2 | 1/2 | −1/4 | 0
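
The merger procedure with the Lance-Williams recalculation step can be sketched in a few lines of Python. The sketch below is only an illustration under simplifying assumptions: the full distance matrix is given as a list of lists, ties are broken by the order of the stored pairs, and only three of the coefficient sets (single linkage, complete linkage, UPGMA) are included. The function names and the parameter `p_target` (the number of clusters at which the merging stops, at least 1) are hypothetical.

```python
def single(n1, n2):
    return 0.5, 0.5, 0.0, -0.5

def complete(n1, n2):
    return 0.5, 0.5, 0.0, 0.5

def upgma(n1, n2):
    return n1 / (n1 + n2), n2 / (n1 + n2), 0.0, 0.0

def agglomerate(dist, coeffs, p_target=1):
    """Merge clusters until p_target (>= 1) remain; return them as sorted index lists."""
    n = len(dist)
    clusters = {i: [i] for i in range(n)}
    # Step 1: inter-cluster distances start as inter-object distances.
    D = {(i, j): dist[i][j] for i in range(n) for j in range(i + 1, n)}
    while len(clusters) > p_target:
        # Step 2: find the closest pair of clusters.
        q1, q2 = min(D, key=D.get)
        a1, a2, b, c = coeffs(len(clusters[q1]), len(clusters[q2]))
        d12 = D[(q1, q2)]
        # Step 4: Lance-Williams update of distances to the merged cluster.
        updated = {}
        for q in clusters:
            if q in (q1, q2):
                continue
            d1 = D[(min(q1, q), max(q1, q))]
            d2 = D[(min(q2, q), max(q2, q))]
            updated[q] = a1 * d1 + a2 * d2 + b * d12 + c * abs(d1 - d2)
        clusters[q1] = clusters[q1] + clusters.pop(q2)   # merged cluster keeps index q1
        D = {pair: v for pair, v in D.items() if q1 not in pair and q2 not in pair}
        for q, v in updated.items():
            D[(min(q1, q), max(q1, q))] = v
    return sorted(sorted(members) for members in clusters.values())

points = [0, 1, 2, 10, 11, 12]
dist = [[abs(a - b) for b in points] for a in points]
result = agglomerate(dist, single, p_target=2)
```

Passing the coefficients as a function of the two cluster sizes keeps the merging loop identical for all the algorithms of Table 1.1; only the function supplying (a1, a2, b, c) changes.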
These algorithms have become quite commonly used because of their intuitive
appeal and the fact that the consecutive mergers lead to the tree-like image (the
dendrogram), which, accompanied by the value of distance, for which the mergers
occur, provides very valuable information. Like in the case of k-means, a choice of
these algorithms is available from multiple sources. Yet, the applicability of these
algorithms is negatively affected by the fact that the entire distance matrix has to be
kept, searched through and updated.
It must be added here that the algorithms from the group differ as to the shape
of clusters they can detect or form, a clear difference separating, in particular,
single linkage from virtually all other algorithms. Namely, the single linkage has a
tendency towards the formation of chains of points (objects), of whatever shapes and
dimensions, while the remaining algorithms tend to form compact, usually spherical
groups.
The obvious parameters of this group of algorithms in terms of the elements of
vector Z are the above listed values of a1, a2, b and c. Thereby, no special distinction
between the particular algorithms is necessary. However, it must be added that in many
cases we allowed these coefficients to vary more freely than is envisaged by the
Lance-Williams formula and the corresponding table of coefficient values (i.e. only
with some constraints on the values of these coefficients), potentially implying
algorithms that do not, as of now, exist.⁴
The density based algorithms—DBSCAN.
The local density-based algorithms form a much less compact and consistent group
than the two previously considered types of algorithms. A more systematic approach
to the density-based techniques was initiated by Raymond Tremolières (Tremolières

4 Actually, the Lance-Williams parameterisation was extended later on in order to encompass yet
more similar algorithms, but this is of no interest for the main purpose of this book.

1979, 1981), but then they were virtually forgotten for a long time, mainly in view
of computational issues. They regained popularity when, on the one hand, the
requirement of single-pass analysis of data sets became important (even before
the time of data stream analysis), in view of the volumes of available data to consider,
and, on the other hand, new kinds of density techniques, much more computationally
effective than the earlier ones, were proposed (see, e.g., Yager and
Filev 1994, or, more recently, Rodriguez and Laio 2014). These algorithms, in
principle, analyse the interrelations, based on distances/proximities, among a limited
number of objects. One of the most commonly used algorithms in this group is DBSCAN,
due in its most popular form mainly to Ester et al. (1996), although it is claimed that
Ling (1972) had already proposed an algorithm very similar to DBSCAN.
In this algorithm, the objects (points in E_x) are classified into three categories: core
points (implying that they are “inside” clusters), density-reachable points (which may
form the “border” or the “edges” of clusters), and outliers or noise points. This
classification is based on an essentially heuristic procedure, which refers to two
parameters (these two parameters being, therefore, also elements of the vector Z in our
approach), namely: the radius ε, within which we look for the “closest neighbours”
of a given point, and the minimum number of points required to classify a given
region in E_x as “dense”, originally denoted minPts. Based on these two parameters the
procedure classifies the objects into the three categories mentioned, and afterwards
establishes the clusters on the basis of the notion of density connectedness.
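
A compact, brute-force Python sketch of this scheme is given below. It is meant only to make the roles of ε and minPts concrete and is not an efficient or reference implementation: all pairwise distances are computed, a point is counted among its own neighbours, and noise points receive the label −1. These conventions, as well as the names `dbscan`, `eps` and `min_pts`, are assumptions of the sketch.

```python
def dbscan(X, eps, min_pts, dist):
    """Return one label per object: 0, 1, ... for clusters, -1 for noise."""
    n = len(X)
    # eps-neighbourhood of every point (the point itself included).
    neighbours = [
        [j for j in range(n) if dist(X[i], X[j]) <= eps] for i in range(n)
    ]
    core = [len(nb) >= min_pts for nb in neighbours]
    labels = [-1] * n
    cluster = 0
    for i in range(n):
        if not core[i] or labels[i] != -1:
            continue
        # Start a new cluster from an unlabelled core point and expand it.
        labels[i] = cluster
        frontier = [i]
        while frontier:
            u = frontier.pop()
            for v in neighbours[u]:
                if labels[v] == -1:
                    labels[v] = cluster
                    if core[v]:
                        frontier.append(v)   # only core points extend the cluster
        cluster += 1
    return labels

X = [0.0, 0.5, 1.0, 5.0, 5.5, 6.0, 20.0]
labels = dbscan(X, eps=0.6, min_pts=3, dist=lambda a, b: abs(a - b))
```

In this example the points 0.5 and 5.5 are core points, their immediate neighbours are border (density-reachable) points, and the isolated point 20.0 is left as noise.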
The algorithm is popular due to its fast performance and also owing to its
independence of the shape of the clusters it identifies, or forms. On the other hand, it
depends strongly upon the choice of the two parameters. Although a
similar criticism applies to, say, k-means and its parameter p, executing k-means
for a (short) series of values of p is not a problem and may circumvent the arbitrariness
of the choice of p, while finding the right pair of ε and minPts is, in general, quite
challenging.
The weighting or selection of the variables.
In the search for the partition as similar as possible to the given P_A, operations may
also be performed on the set of variables accounted for. Thus, two alternative options
can be applied: (i) weighting of each of the variables, preferably on the scale between 0
(not considered at all) and 1 (considered as in the original data set), (ii) the binary
choice of variables, i.e. either considered or dropped (corresponding to the choice of
weights from among 1 and 0).
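
One simple way of realising both options is to multiply each variable by its weight before any distances are computed; this particular realisation is an assumption of the sketch below, not necessarily the one adopted in the experiments. The function names are hypothetical.

```python
def apply_weights(X, weights):
    """Multiply variable k of every object by its weight w_k."""
    return [[w * v for w, v in zip(weights, x)] for x in X]

def euclidean(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b)) ** 0.5

X = [[1.0, 10.0, 3.0], [4.0, 20.0, 7.0]]

# Option (i): weights on the scale between 0 and 1.
weighted = apply_weights(X, [1.0, 0.5, 0.0])

# Option (ii): binary selection, i.e. weights restricted to 0 and 1.
selected = apply_weights(X, [1.0, 0.0, 1.0])
```

With a weight of 0 the variable no longer contributes to any distance, so that, for distance-based algorithms, the binary option has the same effect as dropping the variable from the data set.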
It is definitely not typical for clustering to proceed explicitly with such operations
on variables. Usually, such operations are performed in the preprocessing phase, often
even without explicit consideration of clustering as a possible next phase. Yet, in the
framework of reverse clustering, this appears to be justified in some cases, especially
as it may not be known where the partition P_A comes from and what its relation
to the characterization of X is.

Distance definitions.
It is well known that some of the clustering procedures depend to an extent,
sometimes considerably, on the distance definitions used. This is absolutely clear for
the k-means family of algorithms, where the squared Euclidean distance is virtually a
“must”, for formal reasons, although in some variations of this algorithm this is
no longer a strict requirement. Some definite implementations of specific algorithms
(e.g. from the hierarchical aggregation family) also work differently, depending upon
the distance definitions. The most important aspects in this regard are connected with
the influence exerted by objects located far away from the other ones, the impact
of increasing dimensionality on the significance of distance, and the differences
in densities in various regions of E_x. In view of this influence, it was assumed in
the exercises in reverse clustering illustrated in this book that a flexible distance
definition would be adopted, namely the general Minkowski distance:

$$d(x_i, x_j) = \left( \sum_{k=1}^{m} \left| x_{ik} - x_{jk} \right|^{h} \right)^{1/h},$$

where for h = 1 we get the Manhattan (city-block) metric, and for h = 2 the Euclidean
metric. When h tends to infinity, the distance above approaches the Chebyshev
metric, according to which, simply,

$$d(x_i, x_j) = \max_{k} \left| x_{ik} - x_{jk} \right|.$$
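
The role of the exponent h can be illustrated with a few lines of Python. The sketch is a direct transcription of the two formulas above; the function names and the example points are hypothetical, and h is assumed to be strictly positive.

```python
def minkowski(x, y, h):
    """General Minkowski distance with exponent h > 0."""
    return sum(abs(u - v) ** h for u, v in zip(x, y)) ** (1.0 / h)

def chebyshev(x, y):
    """Limit of the Minkowski distance for h tending to infinity."""
    return max(abs(u - v) for u, v in zip(x, y))

x, y = (0.0, 0.0), (3.0, 4.0)
d_manhattan = minkowski(x, y, 1)    # |3| + |4|
d_euclidean = minkowski(x, y, 2)    # sqrt(9 + 16)
d_large_h = minkowski(x, y, 50)     # already very close to the Chebyshev value
d_chebyshev = chebyshev(x, y)       # max(|3|, |4|)
```

It is worth keeping in mind that for h < 1 the expression no longer satisfies the triangle inequality, so that it is a dissimilarity rather than a metric in the strict sense.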

Again, as with the Lance-Williams parameters of the hierarchical aggregation
algorithms, we allow for arbitrary (non-negative) values of h when trying to
reconstruct the way P_A has been obtained. Thereby, non-classical distance definitions
could ultimately be used.
Summing up the set of parameters constituting the vector Z, let us enumerate
them again:

1: the indicator of the choice of the clustering algorithm (k-means, hierarchical
merger, or DBSCAN);
2 to 5: the parameters of the clustering algorithms (maximum of 4 numbers,
for hierarchical merger algorithms);
6 to 6+m-1: the variables and their weights or binary indicators;
6+m: the exponent h.
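
The layout of Z enumerated above can be pictured as a flat vector of 6 + m numbers that is decoded before the clustering procedure is run. The following Python sketch shows one possible decoding; the class `Configuration`, the function `decode`, the integer coding of the algorithms and the 0-based indexing are all assumptions made for this illustration, not the encoding actually used in the experiments.

```python
from dataclasses import dataclass
from typing import List

ALGORITHMS = ("k-means", "hierarchical", "dbscan")   # hypothetical coding 0, 1, 2

@dataclass
class Configuration:
    algorithm: str            # position 1 in the enumeration above
    algo_params: List[float]  # positions 2 to 5 (up to 4 numbers)
    weights: List[float]      # positions 6 to 6+m-1, one per variable
    h: float                  # position 6+m, the Minkowski exponent

def decode(z, m):
    """Decode a flat vector z of length 6 + m into a Configuration."""
    if len(z) != 6 + m:
        raise ValueError("expected a vector of length 6 + m")
    return Configuration(
        algorithm=ALGORITHMS[int(z[0])],
        algo_params=list(z[1:5]),
        weights=list(z[5:5 + m]),
        h=z[5 + m],
    )

# Hierarchical merger with the single-linkage coefficients, three variables
# (the second one dropped), and the Euclidean exponent h = 2.
z = [1, 0.5, 0.5, 0.0, -0.5, 1.0, 0.0, 1.0, 2.0]
config = decode(z, m=3)
```

A flat numerical vector of this kind is convenient for evolutionary search operators, while the decoded structure is what the clustering routine actually consumes.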
Table 1.2 Elements of calculation of the Rand index of similarity between partitions (numbers of pairs of objects)
 | Partition P1: in the same cluster | Partition P1: in different clusters
Partition P2: in the same cluster | a | b
Partition P2: in different clusters | c | d

1.4 The Criterion: Maximising the Similarity Between Partitions P_A and P_B

The search, realised in the space outlined in the previous section, is performed
with respect to the fundamental criterion of the difference/affinity between the
two partitions, i.e. the partition P_A, which is given, and P, which is produced by the
clustering procedure defined by Z, that is, the partition P = Z(X). Ultimately, for
the assessment of the clustering results, the classical Rand index (see Rand 1971)
was selected.⁵ The Rand index measures the similarity of two partitions, P_1 and P_2, of
a set of objects in a simple and highly intuitive manner, based on
the categorisation of pairs of objects illustrated in Table 1.2. Namely, we
consider two partitions, P_1 and P_2, and check, for each pair of objects from X (or I),
whether they are in the same cluster or in different clusters.
Of course, a + b + c + d = n(n−1)/2. We aim at a (the number of pairs of objects placed
in the same cluster in both partitions) and d (the number of pairs placed in different
clusters in both partitions) being as high as possible, with b and c being as small as
possible, according to the formula

$$Q(P_1, P_2) = \frac{a + d}{a + b + c + d}.$$

Thus, if the two partitions are identical, then Q(P1 ,P2 ) = 1, while Q(P1 ,P2 )
= 0 when they are “completely different” (actually, this occurs only in the sole,
very specific case, for the two partitions, of which one is constituted by a single,
all-embracing cluster, and the other one is composed of all objects being separate
singleton clusters).
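
The pair counting behind the Rand index can be written down directly. The sketch below follows the layout of Table 1.2, with partitions represented as lists of cluster labels; this representation and the function names are assumptions of the illustration.

```python
from itertools import combinations

def pair_counts(labels_1, labels_2):
    """Return (a, b, c, d) as in Table 1.2 for two partitions given as label lists."""
    a = b = c = d = 0
    for i, j in combinations(range(len(labels_1)), 2):
        same_1 = labels_1[i] == labels_1[j]
        same_2 = labels_2[i] == labels_2[j]
        if same_1 and same_2:
            a += 1          # together in both partitions
        elif same_2:
            b += 1          # together in P2, separated in P1
        elif same_1:
            c += 1          # together in P1, separated in P2
        else:
            d += 1          # separated in both partitions
    return a, b, c, d

def rand_index(labels_1, labels_2):
    a, b, c, d = pair_counts(labels_1, labels_2)
    return (a + d) / (a + b + c + d)

identical = rand_index([0, 0, 1, 1], [1, 1, 0, 0])   # same partition, labels swapped
opposite = rand_index([0, 0, 0, 0], [0, 1, 2, 3])    # one cluster vs. all singletons
counts = pair_counts([0, 0, 1, 1], [0, 1, 1, 1])
```

The first example shows that only the grouping matters, not the cluster labels themselves; the second reproduces the sole case in which the index equals 0.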
In view of the probabilistic properties of the Rand index (its expected value for
two random partitions is not zero), its adjusted version (see Hubert and Arabie,
1985), denoted Q_a(.,.), is often used, which corrects the index for the agreement
expected by chance. This adjusted Rand index is defined as:

$$Q_a(P_1, P_2) = \frac{a - \mathrm{Exp}(a)}{\mathrm{Max}(a) - \mathrm{Exp}(a)}$$

5 Some more general remarks on this subject shall be put forward in the next chapter, when discussing
the broader background of the entire approach.



where Exp(a) is the expected value of a, while the introduction of Max(a)
ensures that the maximum value of the respective measure is equal to 1. These two
values can be calculated for two partitions, of which one consists of p_1 clusters
having, respectively, n_11, n_12,…,n_1p1 elements (objects), while the other partition
is composed of p_2 clusters having, respectively, n_21, n_22,…,n_2p2 elements, as
follows:

$$\mathrm{Exp}(a) = \frac{\sum_{q=1}^{p_1} \binom{n_{1q}}{2} \cdot \sum_{q=1}^{p_2} \binom{n_{2q}}{2}}{\binom{n}{2}}$$

and

$$\mathrm{Max}(a) = \frac{1}{2} \left[ \sum_{q=1}^{p_1} \binom{n_{1q}}{2} + \sum_{q=1}^{p_2} \binom{n_{2q}}{2} \right].$$
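
The adjusted index can be computed from the cluster sizes and the sizes of the cluster intersections. The Python sketch below follows the three formulas above, with partitions given as lists of labels; the function name and the convention of returning 1 in the degenerate case Max(a) = Exp(a) (which arises only when both partitions consist of a single cluster or both consist of singletons) are assumptions of the sketch. It requires Python 3.8 or later for `math.comb`.

```python
from collections import Counter
from math import comb

def adjusted_rand(labels_1, labels_2):
    """Adjusted Rand index Q_a for two partitions given as label lists."""
    n = len(labels_1)
    # a: pairs placed together in both partitions, counted via intersections.
    a = sum(comb(size, 2) for size in Counter(zip(labels_1, labels_2)).values())
    s1 = sum(comb(size, 2) for size in Counter(labels_1).values())
    s2 = sum(comb(size, 2) for size in Counter(labels_2).values())
    exp_a = s1 * s2 / comb(n, 2)
    max_a = 0.5 * (s1 + s2)
    if max_a == exp_a:
        return 1.0          # degenerate case: the two partitions are identical
    return (a - exp_a) / (max_a - exp_a)

same = adjusted_rand([0, 0, 1, 1], [1, 1, 0, 0])      # identical partitions
crossed = adjusted_rand([0, 0, 1, 1], [0, 1, 0, 1])   # every pair split differently
```

Unlike the original Rand index, the adjusted version can take negative values, as in the second example, where the agreement is lower than expected by chance.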

Denœud and Guénoche (2006) suggested that for larger datasets, this kind of
adjustment increases the discriminatory power of the Rand index. Therefore, in some
of the cases reported in this book, we use it as the similarity measure between
partitions. Likewise, in some calculations, explicit penalty terms were introduced
to constrain the values of the elements of Z where they might otherwise increase
without control. Generally, however, the original Rand index was retained as
the main criterion in the search for PB , and it is kept in virtually all cases as the
index of solution quality, if not as the actual optimisation criterion (which in some
cases boils down to simply the number of “wrongly classified” objects).
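The adjusted index defined above can be computed directly from cluster sizes. The sketch below is an illustration under stated assumptions rather than the code used in the book: partitions are assumed to be lists of cluster labels, the function name `adjusted_rand_index` is hypothetical, and a is obtained from the contingency table of the two partitions (the number of pairs placed together in both).

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels1, labels2):
    """Adjusted Rand index Qa = (a - Exp(a)) / (Max(a) - Exp(a))."""
    n = len(labels1)
    if n != len(labels2):
        raise ValueError("both partitions must label the same objects")
    if n < 2:
        raise ValueError("at least two objects are required")

    # a: pairs that share a cluster in both partitions, summed over the
    # cells of the contingency table.
    a = sum(comb(m, 2) for m in Counter(zip(labels1, labels2)).values())

    # Sums of "n_q choose 2" over the clusters of each partition.
    s1 = sum(comb(m, 2) for m in Counter(labels1).values())
    s2 = sum(comb(m, 2) for m in Counter(labels2).values())

    exp_a = s1 * s2 / comb(n, 2)   # Exp(a)
    max_a = 0.5 * (s1 + s2)        # Max(a)

    if max_a == exp_a:
        # Happens, for example, when both partitions are all singletons
        # or both consist of one cluster; the index is then undefined.
        raise ValueError("adjusted Rand index is undefined for these partitions")
    return (a - exp_a) / (max_a - exp_a)
```

For identical partitions the function returns 1; for the "crossing" partitions `[0, 0, 1, 1]` and `[0, 1, 0, 1]` it returns a negative value of about −0.5, showing that, unlike Q, the adjusted index can drop below zero when agreement is worse than chance.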

1.5 The Search Procedure

Although this book is not devoted to the numerical and computational aspects of
the reverse clustering approach (a definitely very important issue), within the
presentation of the gist and the interpretations of the paradigm we shall briefly
characterise the computational aspect here as well.
Thus, in view of the expected very cumbersome search landscape and highly complex
choice conditions (“constraints”), we decided to use evolutionary algorithms
as the search tools. In actual experiments two kinds of evolutionary algorithms were
used (see also a slightly ampler description in Sect. 4.2 of the book). The first of
them was developed by one of the authors of this book (see Stańczak 2003) and is
characterised by two-level adaptation, namely at the level of individuals, which
is standard for evolutionary algorithms, and also at the level of operators, which
are used in a highly flexible manner with respect to different individuals, depending
But to replace selfishness with social concern for the many, woman's
foresight, intuition, and thrift are indispensable.
Men, without women, will never manage to cut from the budget the sums
needed to carry out reforms.
In France, public money belongs above all to the cunning. Far more than
merit or need, it is guile that manages to lay claim to it, and the
misappropriations committed at everyone's expense are carried out without
dishonouring those who profit from them; the rich do not hesitate to take
what belongs to the destitute.
Men, even those who are thrifty with their own money, are prodigal with the
common wealth, which women would scrupulously save.
In the family, no more than in the State, could the budget be balanced if
the mistress of the house did not step in to regulate spending.
Since everyone acknowledges that women are able to increase the purchasing
value of money, to the point of meeting the needs of the household with a
modest sum, why should they not be allowed to perform in the State the
miracle of the multiplication of pennies that they perform in the home?
It is because Frenchwomen do not take part in the management of public
affairs that money is lacking to speed progress.
All the parties of the left, even united, will not be able, without the
cooperation of women, to satisfy the need for well-being that intellectual
development has aroused. It is therefore surprising that republicans do not
hasten to make use of the power that women represent.
The party that grants women full political rights will make itself
indispensable and will be master in France since, thanks to women, who
increase the value of money tenfold, it will have sufficient resources to
accomplish the desired reforms.

The Tool for Emancipation

The best locksmith cannot open a locked door without an instrument.
Likewise, women cannot, without holding that tool, the vote, force the doors
of common law and become men's equals before the law.
At a time when women could not yet be barristers, a woman law student who
was presented with a petition demanding for women the right to vote and to
stand for election refused to sign, for fear, she said, of compromising her
position. Yet, shortly afterwards, the council of the bar association cited
precisely the student's political incapacity as grounds for refusing to
enter her on the roll. The measure was iniquitous, unspeakable; but what a
lesson for women, and in particular for the would-be advocate! She could not
have been unaware that at every turn she would be asked whether she enjoyed
her political rights, and that by declining to claim them when it was
proposed to her, she was refusing to secure her position by trying too hard
to protect it.
Women with diplomas are happy to find the breach already made for them to
pass through. Only, once through, instead of holding out a hand to the
others, they would gladly raise the drawbridge. Selfishness is their
undoing, whereas solidarity would ensure their success. Their natural
clients, women, reason as follows: "If the female elite, whose knowledge has
been certified by diplomas, does not contest man's privileges by demanding
political equality, it is because it does not feel itself his equal. Now,
since for the care of my health and my legal affairs it is in my interest to
rely on superiors rather than inferiors, I turn to male doctors and male
lawyers."
That is how, by neglecting to concern themselves with existing civically,
women doctors and women lawyers sustain the prejudice of sex that drives
female clients to their male competitors.
At every step in life, woman runs up against this obstacle: political
capacity. To apply for such a post, to choose such a career, one must enjoy
one's political rights. Knowing this, is it not strange that Frenchwomen do
not take more trouble to win them? Are they waiting, then, for those who
took these rights from them to come and offer them back?
Yet, as the struggle for life intensifies, it is easy to foresee that man
will push forward the barrier he holds before woman, in order to secure for
himself the monopoly of posts and sinecures, and that soon, merely to be
given a soldier's shirt to sew, political capacity will be required.
In this country where politics is the engine of everything, woman is, by
the fact of being excluded from it, stripped of her Frenchness, since she
can take no part in anything done in France, nor have her share of any
French advantage.
The favoured woman who alone obtains a position lives in perpetual dread,
for what is granted at pleasure is withdrawn at pleasure. Ministers fall,
the wind turns!
Women will be insured against the fickleness of the wind only when they
hold in their hand the square of paper called the voter's card, which
consecrates the sovereignty of whoever possesses it.
If male voters are not protected against the shifts of the wind, it is
because they do not know how to use their vote. Depending on the hand that
holds it, the same pen writes very different sentences. So it is with
suffrage, which, limited to a tiny minority of men, cannot in any case give
the results of a suffrage made universal for all men and all women.
Because women are kept outside common political law, they risk losing their
position. In 1897, persons not enjoying their political rights, that is to
say women, were deprived of the right to run an employment agency.
In Paris alone, more than sixty women agents (60 intermediaries between
those offering and those seeking positions), widowed, single, abandoned, or
divorced, were dispossessed of their lucrative business without receiving
any compensation.
In making a law on employment agencies, the legislators had eliminated
women agents so that they would not compete with the male agents, to whom a
new licence was issued, which enabled them to obtain compensation when their
agency was abolished.
The dispossession of women agents on grounds of political nullity sets a
precedent that may tomorrow make it possible to prevent women lodging-house
keepers, hoteliers, grocers, wine merchants, and herbalists from practising
their trade because they do not enjoy their political rights.
Deputies can do anything to women. They have no reprisals to fear, since
women do not vote.
The eviction of the female sex from a position on grounds of political
inactivity is well suited to show that it is impossible for woman to lay
claim to economic equality so long as she does not possess that master key,
the vote, which opens every door to workers.
You say that you care little about politics. But politics does not wait for
you to go to it, for you to take an interest in it. It comes to find you in
your home, to take away your business, to snatch your earnings from your
hands because you do not help to direct it.
Men, even foreigners, can, by becoming naturalised, run their agencies;
whereas women born in France of French parents have been dispossessed of
theirs because they do not enjoy in their own country the rights that
foreigners can obtain.
Let anyone maintain, then, that politics does not concern women, when,
because they are not allowed to take part in it, they see the bread taken
from their mouths.
When access to politics becomes for woman a matter of life and death, the
prejudice of sex, which is today what the prejudice of money was before 1848
(the sole ground of exclusion), must disappear. Since political rights are
indispensable for getting by in life, since one must enjoy them even to
trade, both sexes must possess them.
Seeing the women owners of employment agencies dispossessed because they do
not enjoy their civil and political rights, will women understand that
tradeswomen, as much as housewives, working women, and schoolmistresses,
need to vote to safeguard their interests? It is even urgent for them to
vote, for we are at a social turning point that small tradeswomen will not
get past unless, to save themselves from being crushed, they put a hand to
the helm.
Whether one is for or against cooperatives, for or against the
monopolisation of the industries that supply society, it is difficult to
delude oneself about how long small trade will survive.
The most optimistic perceive that very soon the large department stores
will absorb the small shops. Now, retail shopkeepers are generally of the
female sex. What will become of the unfortunate tradeswomen when the
commercial death knell tolls for them?
If they do not hold in their hand the ballot that inspires the devotion of
municipal councillors and deputies, they will see their livelihood taken
away without compensation, because they do not vote, and they will be shut
out of the jobs created by monopolisation, because they will not vote.
Before long, for the large category of women in small and medium trade, the
lack or possession of the ballot will be a matter of life or death.
VII
The high cost of living is due to the exclusion of women
from the administration of public affairs

The political nullification of women does not harm the female sex alone. It
harms all of humanity, for men are far more concerned with exciting the
admiration of their contemporaries than with securing their subsistence.
They make locomotion rapid, they devour distance and fly through the air,
but without a care for filling stomachs. They abandon agriculture,
horticulture, poultry farming, fish farming, and the raising of livestock
small and large to the old customary ways, and believe that quails will fall
to them from heaven ready roasted.
Even the blindest masculinists are forced to acknowledge that, if so many
French people today live in straitened circumstances, it is because
feminine foresight, excluded from parliaments and from general and municipal
councils, has not been able to ward off the food shortage.
The torrid heat, the drought or the rains that are blamed for the rise in
food prices only aggravate the malaise resulting from a production that no
longer matches consumption.
Prices are high for the foodstuffs that are more in demand than in supply,
and these high prices will grow worse, since, at the same time as everyone
flees the toil of the fields, everyone has acquired a taste for the delicate
foods harvested in the countryside, foods that until recently were sold to
the wealthy and not eaten by the masses.
By developing the individual, we have refined his taste, made him sociable,
and awakened in him the need for stability and well-being.
Now, can the worker at present find in the countryside the guarantees he
desires?
No, because between the periods of overwork (ploughing, sowing, haymaking,
the harvest, the threshing of the wheat, etc.) there are weeks, there are
months, during which, his arms being unused, he must live without earning
anything.
To check the food crisis, which women voters and elected representatives
would have foreseen and averted, agriculture must be industrialised,
production must be intensified, and the livelihood of agricultural workers
must be secured.
Social evolution crushes the farmer with burdens without bringing him
profits, since people want to pay less for his butter than it costs him to
produce.
The farmer is unable to provide his employees with the material and moral
satisfactions demanded in our modern society. And so hired help, though very
dearly paid, often fails him.
Some people believe that, despite the scarcity of foodstuffs, price
controls would put an end to the rise. They ask the government (which gives
us worse than private suppliers what it makes us pay more for than they do)
to set up cooperatives for the sale of bread and meat.
This plan appeals to the masters of power who, instead of being forced to
curtail, along with the entry duties, the budget's takings, would find in
the cooperatives an electoral mine to exploit.
Naturally, the association of commercial and industrial groups protests
against this plan which, without benefit to the victims of the shortage,
would amount to a veritable expropriation of bakers and butchers.
What the commercial groups do not say is that the cooperative butcheries
and bakeries, to cover whose costs the towns and communes would have to run
a deficit (as Elbeuf did to operate its gas works), would increase public
spending by creating armies of new officials, electoral clients of the
councillors, deputies, and senators. Women would bear the burden of paying
these new employees, but they would not be admitted to these posts, since
they do not vote.
Men, though so satisfied with the way women provision the household, cannot
bring themselves to invite them to secure with them the food supply of the
commune and the State.
Yet, administering alone, they make water that is dangerous to drink flow
from our taps, and think it enough that we eat frozen meat. They let the
Germans, the Austrians, and the Italians sweep up the livestock in our
French markets.
At the butchers' congress held at the commercial court, a resolution was
passed calling for a tax of 10 francs on each head of livestock bound for
abroad.
When our herd is shrinking, when our stables are emptying and, for that
reason, milk is growing dearer and threatens to run short, it is not enough
to put a tax on the oxen and cows going abroad; these oxen and cows must be
forbidden to cross the border until France, tried and depleted, has rebuilt
its livestock.
Men, who do not feel responsible for human lives as women do, disdain these
details, the result of which is to satisfy, at our expense, the appetite of
our neighbours.
But it is not only the price of meat that is rising; sugar, which sold in
July at 70 centimes a kilo, is today worth 1 franc a kilo[13].
While men alone govern, sugar is made scarce in order to keep its price
high; that is to say, the interest of 39 million French consumers is
sacrificed to the interest of 300 sugar producers.
Those who clamour that the beet crop is insufficient this year forget that
they themselves limited the production of these plants by reducing the
extent and number of beet fields, so as not to obtain a yield that would
depress the price of sugar.
At a time when doctors were pointing out the nutritional value of sugar and
advising that plenty of it be included in our diet, women, housekeepers of
the nation, would not, like men indifferent to these questions, have allowed
the extent of the beet fields to be restricted to keep the price of sugar
high.
By shutting woman up, in order to demean her, in that laboratory, the
kitchen, she is denied the means of ensuring the proper running of the
department assigned to her. Yet this lowest place allotted to woman turns
out, at present, to be the first. Scientists say that medicine merely
completes cookery, since diet can develop the brain and strengthen the body.
Why does the city of Paris keep the octroi duties that the city of Lyon has
found a way to abolish?
Producers prefer to send their poultry, eggs, and butter to cities free of
octroi, such as Lyon, London, Basel... rather than to Paris, where heavy
entry duties and enormous transport costs must be paid!
Foodstuffs pay less to be transported to England than to be transported to
Paris. Thus egg merchants, to send fine fresh eggs from the west of France
to Paris, must pay 8 fr. 40 per hundred kilos of goods or packaging, whereas
to send these fine eggs to England they have to pay only 5 francs per
hundred kilos. And so, while the English eat our large eggs soft-boiled, we
in France must content ourselves with the small fresh eggs that Russia,
Italy, and Turkey send us.
Groups from all parties and consumers' leagues are meeting to protest
against the high price of food. But it must not be forgotten that it was
women, housewives, who took the initiative in protesting against the high
cost of living.
The first demonstrations by housewives were approved by the public
authorities. Women deliberated with municipalities; women persuaded the
deputy Basly to retail butter to buyers himself.
By fixing the price of butter on their own authority, the housewives were
offering to help ease the food crisis. They seemed to say to those in
government: so that under the Republic the French may find comfort and feed
themselves cheaply, we must take our place at your side.
The exclusion of women from the helm endangers the social vessel and leads
those it carries to famine. Since men alone in power are never sufficiently
concerned either with national security (the gunpowder affair proves it) or
with feeding the population, women, who with minimal resources provide for
the needs of the household, will preserve the State from shortage when they
are voters and elected representatives. The foresight with which they are
endowed will lead them to intensify production so that it meets consumption
and sets aside reserves.
VIII
The interests of France endangered by men

"Frenchmen and Frenchwomen complement one another because each has
particular qualities that the nation has the greatest interest in using."

Hubertine Auclert.

Because woman, steward of the family, has no right to be steward of the
city and the State, public spending is growing extraordinarily. Deputies, in
order to be elected, never refuse to vote for spending that favours their
constituency so as to secure their re-election, so that the deficit grows
and makes the task of balancing the budget ever more difficult.
The men who keep house for the nation cannot make receipts and expenditure
balance, and are incapable of administering our affairs without burdening us
with loans and hitting us with taxes. Let them hand over, then, to women,
who are accustomed to matching their spending to their means and who, by
eliminating waste, would spare us new loans and taxes.
Useless, harmful officials are multiplied, whose presence complicates the
administrative machinery and makes our simplest acts bristle with
difficulties.
Since men cannot manage it, women must in their turn try to check the
disorder and waste that swallow up public funds.
Women's vote would be, for the voter, the fruit of the tree of political
knowledge. It would make the one who gives the mandate fit to hold it.
Provident and thrifty women, kept outside the law, are neither voters nor
eligible for office, while incapable men who cannot manage their own affairs
and who have been placed under a court-appointed adviser, being voters and
eligible, are entrusted with managing the public wealth. These interdicted
persons, reduced once more to minors who cannot dispose of their own
property, are left the power to dispose of the property of France.
Interdicted or not, those in government who squander the public coffers are
treated as not answerable.

The interests of France are endangered by men. France is being dismembered
without the consent of women:
M. Caillaux gave the Congo to Germany; M. Delcassé gave part of Morocco to
Spain.
How could the government of France sanction these gifts of what belongs to
us, made by two men who happened for a time to be ministers? The Chamber and
the Senate, which debate for months before voting an expenditure, sanctioned
our despoilment without discussion.
Does anyone believe that women would let such enormities pass if they had
their share of power?
In this country where politics is the engine of everything, woman is, by
the fact of being excluded from it, stripped of her Frenchness, since she
can take no part in anything done in France, nor have her share of social
advantages. With women taking no interest in politics, the country's affairs
go to rack and ruin.
Territorial integrity is anything but assured by an exclusively male
administration:
Part of Morocco that was about to belong to us was alienated, secretly and
without profit, and everyone at the Ministry of Foreign Affairs was unaware
of what had happened, and the Chamber did not concern itself with what had
been done.
French diplomacy and the Caillaux ministry left Germany unaware of the
Delcassé treaty of 1904, which assigned part of Morocco to Spain, so that
the Franco-German agreement extends the protectorate of France over the
whole of Morocco.
For the French government to negotiate with Germany over the whole of
Morocco (when part of it had already been conceded) was to lead Germany to
demand greater compensation from France.
Unthinking men, who know not what they do, have made us give the Congo to
Germany in order to endow Spain with the best part of Morocco and to secure
the internationalisation of Tangier.
Just as the law is made without women, France is dismembered without the
consent of women. And women must pay for the mistakes made by men.
IX
France threatened by its many taverns

Since woman's order and thrift are lacking in the State, those in
government, to meet the cost of masculine waste, are reduced to seeking
revenue from alcohol, and thus to having the nation poison itself.
Women are more sober than men. They would vote better than the 76,000
voters whom the police pick up dead drunk on the public highway each year,
and than most of the other voters and legislators who, though not all
staggering, nonetheless make drinking their chief occupation when they meet
to discuss public affairs.
Come now, you rare sober men, who follow with dread, as we do, the advance
of alcoholism. Would you object to women bringing a little lucidity to
electoral meetings and to administrative and legislative assemblies? Would a
government that included sober women be worth less than the government of
drunkards?
Voters, for the most part unaware of their sovereign power, sell their
vote, modern Esaus, for credit opened at the tavern.
At the present hour, it is the wine merchants who are the masters of
France.
The exclusion of women from public life has as its consequence the
influence of the wine merchants.
So long as the mistress of the house does not vote, the compulsory
residence of politics will be the tavern, whose philtres numb minds and
prepare servitude.
Alcoholism reduces the number of births.
Alcoholism increases mortality in France. The departments where most people
die of tuberculosis are those where most alcohol is drunk.
Many are the children conceived in drunkenness who are not born viable, or
who, weak in body and mind, are unfit for military service and remain all
their lives a burden on society.
Doctors, who used to forbid wine and order water to be drunk, have
prescribed alcohol. They have thus encouraged people to intoxicate
themselves with this tonic cure-all in order to recapture the life that is
continually slipping away from them.
When weakened people have once been revived by their doctor with the aid of
cinchona and fine champagne brandy, at every new physical trial they resort
to cordials, and soon they are overindulging in aperitifs.
Doctors and politicians may therefore be counted among those who introduced
alcoholism into France: the former because they used alcohol as an
invigorating and curative remedy, the latter because they made alcohol an
agent of electoral corruption. How can one refuse to vote for a candidate
who has casks broached from which one may drink one's fill? With women, who
drink little, this manoeuvre would fail.
Gallantry has no worse enemy than alcohol, which reduces man to impotence,
morally and physically, and distances him from woman.
If alcohol distances man from woman, it may equally be observed that woman
distances man from alcohol. It is when he is removed from her beneficent
influence, far from her gaze, at the tavern, that man drinks, and not in his
home.
Ask the publicans; they will tell you that people never drink so much on
their premises as during election periods and times of political agitation.
Why does politics provide the occasion for intoxication?
Because women take no part in it.
The advent of women in politics would have the immediate effect of checking
alcoholism, for it would move public discussion from the tavern to the home,
where human couples could, in full lucidity, bend their minds towards the
general betterment.
If women took part in politics with their spirit of order and thrift, they
would considerably reduce public spending; their help would make it easier
to lighten taxation. With them, taxes would no longer be drawn from immoral
sources.
So long as Frenchwomen have no right to decide anything concerning alcohol,
whose abolition the deputies oppose, it is in vain that they will band
together to combat intemperance.
The true remedy for alcoholism lies in the vote for women. It is by
conferring on women the right to settle the alcohol question, that is to
say, the power to preserve for men the life that women gave them, that the
nation would be saved from imminent decline.
The woman voter would be the greatest force against alcoholism.
How, indeed, could this scourge be overcome without woman? For, at the same
time as the use of alcohol must be forbidden, the body must be put in a
condition to do without it.
Drinkers are generally feeble beings who hastily swallow the corrosive
liquid, not to delight the palate, but to fortify the body.
It is in order to juggle more easily with the mass of voters that the
cunning of all parties keep women out of the polling stations.
If woman took part in public life, before long everyone could read politics
like an open book: see where all his interests are concentrated and become
passionate about those interests, as the ploughman is passionate about the
wheat field whose harvest brings him less money than bad politics costs him.
Woman's participation in public life: why, that would before long mean
enlightened suffrage, competing efforts for the public good, and political
decisions ripened in the healthy atmosphere of the family, replacing the
blunders committed amid the alcoholic fumes of the tavern.
One cannot imagine what the delegates to power would be if they were chosen
by men and by women, and what those delegates would be capable of doing if
they felt themselves spurred on by all, Frenchmen and Frenchwomen together.
Voters! Do not sacrifice your interests any longer to a vain prejudice of
sex! Know well that so long as women do not vote, you, ever indifferent or
naive men, will always let yourselves be cheated of your vote. In the power
that the vote gives to those who possess it to settle public affairs in
their own best interests, the question of opinion has no place. When an
estate is opened at a notary's office, does anyone concern himself with the
way of thinking of those who inherit?
Well, it is with political rights as it is with rights of inheritance.
Nothing, neither opinion nor sex, can prevent the rightful claimants from
taking possession of the share of liberty that the generations before them
have bequeathed to them.
If everything were going so well in the world that a step forward might be
feared to disturb the harmony of society, one would understand the alarm
that certain people show at the idea of seeing women vote. But when we have
a budget of five billion[14], a shortage of work and rising food prices,
falling births, depopulation, and alcohol dissolving France, only the
narrow-minded or the hypocritical can say that the intervention of women in
public affairs would usher in an era of cataclysms.
