A Scikit-Learn Compatible Learning Classifier System
ABSTRACT
Parallel to genetic algorithms and genetic programming, the evolutionary computation (EC) sub-field of learning classifier systems (LCS) has been an active area of research, development, and application. LCSs are best known for their ability to flexibly adapt to perform well on a wide range of problems, and to outperform other machine learning (ML) algorithms on problems characterized as being complex and heterogeneous. However, to date LCSs remain poorly recognized and thus rarely considered for comparison to other ML approaches in or outside of the EC community. One likely reason for this is the general lack of easy-to-use LCS implementations to facilitate comparisons to other well-known ML approaches. The Python-based scikit-learn library has become a very popular way to implement, utilize, and compare ML algorithms in a simple, uniform manner. In this work, we develop and evaluate the first LCS scikit-learn package, called scikit-eLCS. The scikit-eLCS algorithm is a modern Michigan-style supervised-learning LCS descended from eLCS, UCS, and XCS. We demonstrate the efficacy and capabilities of this package over benchmark n-bit multiplexer problems. We expect scikit-eLCS to serve as an algorithmic benchmark to facilitate future ML comparisons, as well as a blueprint for the implementation of other scikit-learn compatible LCS algorithms.

CCS CONCEPTS
• Computing methodologies → Rule learning; Genetic algorithms; • Human-centered computing → Accessibility systems and tools;

KEYWORDS
learning classifier systems, rule-based machine learning, machine learning, data mining

ACM Reference Format:
Robert F. Zhang and Ryan J. Urbanowicz. 2020. A Scikit-learn Compatible Learning Classifier System. In Genetic and Evolutionary Computation Conference Companion (GECCO '20 Companion), July 8–12, 2020, Cancún, Mexico. ACM, New York, NY, USA, 8 pages. https://doi.org/10.1145/3377929.3398097

* Corresponding author

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
GECCO '20 Companion, July 8–12, 2020, Cancún, Mexico
© 2020 Copyright held by the owner/author(s). Publication rights licensed to the Association for Computing Machinery.
ACM ISBN 978-1-4503-7127-8/20/07. . . $15.00
https://doi.org/10.1145/3377929.3398097

1 INTRODUCTION
Learning classifier systems (LCSs) are a family of rule-based machine learning (RBML) algorithms within the field of evolutionary computation (EC). LCSs are distinguished from other EC and general machine learning (ML) approaches primarily by representing solutions in a 'piece-wise' manner [12]: specifically, through the discovery of conditional IF:THEN rules that are collectively applied to making decisions/predictions as an ensemble. Having originally been implemented primarily as reinforcement learning systems, LCSs were tasked with modeling complex, multi-step, and adaptive domains [13]. In the early 2000s, interest in developing LCSs specialized to single-step supervised-learning problems began with the development of UCS [3], a direct descendant of XCS [15], the best known and most popular LCS to date. Presently, a large variety of LCS algorithms have been proposed and implemented that take advantage of the modular and highly adaptive nature of an LCS framework to address an impressive variety of problem domains, from robotics and game strategy to data mining on classification and regression problems [6, 8, 13]. One particular advantage of the 'piece-wise' (i.e. niche-based) LCS approach to learning and modeling is its ability to facilitate and enable the detection and modeling of very complex patterns, including feature interactions and, most uniquely, 'heterogeneous patterns of association', which tend to confound other machine learning approaches. Heterogeneous association refers to situations where different features in the dataset or environment are relevant to making an appropriate decision/prediction depending on the instance at hand. Another key advantage is that LCSs are generally regarded as being able to yield human-interpretable solutions, an often essential characteristic in application domains such as biomedical data mining [1, 10].

While the LCS concept was introduced around the same time as the genetic algorithm (GA) [5], LCS research and application remains largely unknown and poorly adopted within the broader field of artificial intelligence and ML. Discussions among researchers at the annual International Workshop on Learning Classifier Systems (IWLCS) have often pointed out two contributing factors: (1) gaining an initial understanding of LCS algorithms has been characterized as challenging, likely due to their atypical approach to learning and solution representation as well as there being a large diversity of implementations targeted to a diversity of problem domains, and (2) a general lack of easy-to-use code/software with clear documentation facilitating comparison of LCS algorithms to new and established ML approaches.

In 2017 we sought to address the former issue through the publication of the first introductory LCS textbook [12]. This book was paired with an accessible new Python-coded LCS called educational LCS (eLCS). eLCS is a supervised learning LCS algorithmically similar to UCS that also adopted a number of abilities from ExSTraCS
[14], an LCS geared towards the challenges of biomedical data mining, e.g. large-scale, missing, noisy, and/or imbalanced data.

In the present work we seek to address the latter issue. Previously, LCS implementations almost exclusively existed as stand-alone code packages outside of any commonly used ML libraries such as scikit-learn [7]. This makes the comparison and application of LCS methodologies significantly more difficult, and thus less likely to occur, than for other widely used ML methods such as neural networks, decision trees, random forests, etc. This also means that there are few reliable and well-annotated example code-bases upon which future LCS algorithms might be developed and shared.

We address this issue through the development of scikit-eLCS, the first scikit-learn compatible LCS algorithm [7]. The scikit-learn Python library is one of the best known and most used ML libraries to date, making it substantially easier to implement an analysis pipeline where many different ML algorithms can be run, evaluated, and compared in a rigorous and homogeneous manner [4]. We selected eLCS as our target to be the first LCS implemented for the scikit-learn library because (1) it represents a largely generic Michigan-style LCS algorithm framework that can be easily adapted into other existing LCS implementations such as XCS, (2) a supervised learning LCS is easier to understand than its reinforcement learning counterparts and more comparable to other ML algorithms within scikit-learn, (3) it offers a simple but effective LCS benchmark algorithm for future studies to compare to as a baseline, (4) it offers a more advanced and generalizable set of capabilities in comparison to UCS, (5) it is an easy-to-understand LCS implementation utilized by the recent introductory LCS textbook, and (6) it is a code base we are very familiar with.

In the remaining sections of this paper we detail our methods for implementation and evaluation, present results demonstrating the fidelity of our rebuild of eLCS into scikit-eLCS, draw conclusions, and outline future work.

2 METHODS
In this section we (1) review the training cycle of a generic Michigan-style LCS, (2) review specifics of the eLCS algorithm, (3) identify the operation, key differences, and useful features of scikit-eLCS, and (4) describe the benchmark experimental evaluation.

As discussed, many variations of LCS algorithms have been developed over the years, some of which have revamped or refined the fundamental underlying architecture of most modern LCS implementations. In that time, two fundamental LCS architectures have persisted: Michigan-style and Pittsburgh-style LCSs [13]. While both architectures have their advantages and disadvantages, this work focuses on Michigan-style systems such as eLCS [12], UCS [3], XCS [15], and ExSTraCS [14]. Michigan-style systems are the more popular of the two architectures, while Pittsburgh-style systems have more in common with a traditional genetic algorithm [13].

A generic Michigan-style LCS evolves a population of IF:THEN rules incrementally (i.e. one training instance at a time). The 'IF', i.e. condition, of a rule includes one or more specified features and their corresponding values (required for the rule to match). The 'THEN', i.e. action, of a rule gives the predicted outcome, i.e. class, asserted by the given rule. Unlike most other EC algorithms, the resulting population of rules is the solution/model, rather than a single 'best' individual in the population. For simplicity, we describe LCS operation from the perspective of a supervised classification problem. Also unlike other EC algorithms, LCS populations typically start out empty rather than being randomly initialized. Each incremental training iteration follows these steps: (1) a training instance is taken from the dataset without replacement, (2) a match set [M] is formed that includes any rule in the population [P] whose condition matches the training instance, (3) [M] is divided into a correct set [C] and an incorrect set [I] depending on whether a given rule includes the correct or incorrect class, (4) if [C] is empty at this point, the covering mechanism is applied to randomly generate a rule (added to [M] and [C]) that matches the current instance and has the 'correct' class, (5) rules in [M] have their parameters updated, e.g. accuracy, fitness, and numerosity, where rule accuracy is the number of times a rule has been in a [C] divided by the number of times it has been in a [M], fitness is an exponential function of accuracy, and numerosity is the number of virtual copies of a given rule in [P], (6) subsumption, a rule-generalization mechanism, is applied to rules in [C], (7) a basic genetic algorithm uses tournament selection to pick two parent rules from [C] based on fitness and generates two offspring rules that are added to [P] along with rules in [M], and (8) if the size of [P] is greater than the specified maximum, a deletion mechanism probabilistically selects a rule to be removed from [P]. This set of steps is repeated for some user-specified number of training iterations.

For a more complete introduction to LCS algorithms we refer readers to the following: [12, 13].

2.1 The eLCS Algorithm
The eLCS, or Educational Learning Classifier System [12], is a modern variant of Bernado-Mansilla's sUpervised Classifier System (UCS) [3] coded in Python. As such, it is a Michigan-style, supervised, offline LCS that follows the learning cycle presented above. It can be applied to binary or multi-class classification problems, but has yet to be adapted for continuous-valued outcomes as found in regression problems. While the eLCS code was originally written to be paired with the introductory textbook [12], this is the first research publication in which it has been formally presented and evaluated. We outline key differences between UCS and eLCS below.

First, prior to training, eLCS shuffles the training data to remove bias that may be introduced from training data order. Second, eLCS rules adopt a mixed discrete-continuous attribute-list (DCAL) representation utilized by ExSTraCS [14], which was based on a similar representation in BioHEL [2]. DCAL enables eLCS to work with data that have a mix of categorical and ordinal features. This representation also stores only 'specified' features in rules, rather than storing some symbol for all features, which significantly reduces run time as a dataset's feature space scales up [11] (see Figure 1). Third, eLCS learns in the presence of missing data in a given dataset. Specifically, an instance will match when a rule's condition specifies a feature value that is missing in the current training instance; this was viewed as a less conservative approach to dealing with missingness. Fourth, like ExSTraCS, eLCS does not alternate learning iterations between explore/exploit phases like many earlier systems designed for reinforcement learning such as XCS [15]. Pseudocode for the eLCS algorithm is given in Algorithm 1.
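To make the incremental cycle concrete, the following minimal Python sketch implements its first few steps: forming the match set [M] and correct set [C], and invoking covering when [C] is empty. This is an illustration, not the scikit-eLCS implementation; the Rule fields and the covering heuristic (specifying each feature with probability p_spec) are assumptions chosen for brevity.

```python
# Illustrative sketch of one supervised Michigan-style LCS training
# iteration: match set, correct set, covering, and parameter counts.
import random

class Rule:
    def __init__(self, condition, action):
        self.condition = condition  # dict: feature index -> required value
        self.action = action        # the class this rule predicts
        self.numerosity = 1
        self.match_count = 0
        self.correct_count = 0

    def matches(self, instance):
        return all(instance[i] == v for i, v in self.condition.items())

def training_iteration(population, instance, label, p_spec=0.5):
    match_set = [r for r in population if r.matches(instance)]
    correct_set = [r for r in match_set if r.action == label]
    if not correct_set:
        # Covering: generate a rule that matches the instance and
        # asserts the correct class, specifying features at random.
        condition = {i: v for i, v in enumerate(instance)
                     if random.random() < p_spec}
        new_rule = Rule(condition, label)
        population.append(new_rule)
        match_set.append(new_rule)
        correct_set.append(new_rule)
    for r in match_set:       # accuracy bookkeeping (match experience)
        r.match_count += 1
    for r in correct_set:
        r.correct_count += 1
    return match_set, correct_set

population = []               # LCS populations start out empty
m, c = training_iteration(population, (0, 1, 1), 1)
print(len(population), len(c))  # covering fired: 1 1
```

A real system would follow these steps with subsumption, a GA on [C], and deletion, as described above.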
Figure 1: Traditional LCS rule representation uses ’#’ symbols to indicate that a given feature is ’wild’, i.e. will match any value.
Ternary representation uses [0,1,#] to encode values in binary, but features can be represented directly as specified values
(for categorical features) or a value range (for ordinal features) as illustrated on the left. eLCS applies a DCAL representation
tracking the zero-indexed location of the feature/attribute and its corresponding specified value (right side of figure).
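The two representations contrasted in Figure 1 can be sketched as follows. This is an illustrative approximation: the interval convention for ordinal features is assumed for the example rather than taken from the eLCS source.

```python
# Ternary vs. DCAL rule conditions. DCAL stores only the specified
# attributes as parallel index/value lists, so wildcards cost nothing.
def ternary_matches(condition, instance):
    # condition like "1##1": '#' is wild, otherwise an exact match
    return all(c == '#' or c == x for c, x in zip(condition, instance))

def dcal_matches(att_indexes, att_values, instance):
    # Each value is either a single categorical value or a
    # [low, high] range for an ordinal/continuous feature.
    for i, v in zip(att_indexes, att_values):
        x = instance[i]
        if isinstance(v, list):
            if not (v[0] <= x <= v[1]):
                return False
        elif x != v:
            return False
    return True

instance = ['1', '0', '1', '1']
print(ternary_matches("1##1", instance))           # True
print(dcal_matches([0, 3], ['1', '1'], instance))  # same rule, DCAL form: True
```

Because the DCAL loop touches only specified attributes, its cost scales with rule specificity rather than total feature count, which is the source of the run-time savings noted above.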
By the end of training, the eLCS rule population would typically contain hundreds or thousands of rules, each making their own class predictions. To make a prediction on a given training or testing instance, every matching rule serves as a fitness- and numerosity-weighted vote for the class it specifies as its action. Like other LCSs, eLCS has a number of tunable hyperparameters. However, only a handful are particularly important to optimize. These include the maximum number of training iterations, the maximum population size, and the nu parameter, which determines the exponential fitness function calculated based on rule accuracy. For a detailed, piece-by-piece explanation of the eLCS algorithm, see [12]. The original eLCS code is available at https://github.com/ryanurbs/eLCS and a Jupyter Notebook version of this same code is available at https://github.com/ryanurbs/eLCS_JupyterNotebook.

2.2 scikit-eLCS
The core algorithm of the original eLCS and scikit-eLCS is nearly identical, with three key exceptions. First, we have opted to handle missing data values differently in scikit-eLCS. Specifically, by default, missing values within instances will now trigger 'failed matches' whenever a rule specifies that corresponding feature. This is a more conservative approach to handling missingness within LCS. scikit-eLCS also includes a hyperparameter giving users the option to switch back to the original missing data strategy applied in eLCS. Second, we have removed a second round of correct set subsumption that occurs after the GA between offspring and correct set classifiers. Third, we have separated the original doSubsumption hyperparameter into doGASubsumption and doCorrectSetSubsumption, with correct set subsumption turned off by default.

Aside from these exceptions, the major differences between eLCS and scikit-eLCS have to do with how scikit-eLCS has been implemented to meet the standards and capabilities of other scikit-learn packages while preserving the unique capabilities and outputs of an LCS algorithm. We examine these differences and capabilities in the following subsections.

2.2.1 Hyperparameters. The original eLCS algorithm used a (.txt) configuration file along with an external ConfigParser object to load all algorithm hyperparameters and an external Constants object to store them. As such, a configuration file would need to be generated or updated for each algorithm run. It was set up this way in an effort to make all hyperparameter settings transparent to new users, as well as leave a permanent record of hyperparameter settings for a given algorithm run.

In line with scikit-learn standards, scikit-eLCS accepts hyperparameter values as arguments upon initialization and thus does not use a configuration file to load hyperparameters, nor does it require an external ConfigParser or Constants object as in eLCS. All tools needed to initialize, run, and evaluate the model are encapsulated within a single Python object.

Default, recommended hyperparameters are provided, so only hyperparameters that the user wishes to change need be specified. As mentioned previously, only a handful of LCS hyperparameters are important to consider optimizing for a given dataset. A complete list of tunable scikit-eLCS hyperparameters, along with brief descriptions and default settings, is given in Table 1. Among these, scikit-eLCS introduces a few new hyperparameter options: matchForMissingness, trackAccuracyWhileFit, rebootFilename, and specifiedAttributes. MatchForMissingness is the option mentioned above giving the user the choice as to whether a specified attribute in a rule will match with a missing attribute in a data instance or not. trackAccuracyWhileFit gives the user the choice of whether online training accuracy estimates should be computed to provide insight into the learning trajectory, post-training, at the expense of some computational efficiency. rebootFilename gives the user the option of loading a previously trained scikit-eLCS rule population to resume training (see section 2.2.4). Lastly, in the original eLCS, the discreteAttributeLimit hyperparameter was implemented to automatically determine which features should be treated as discrete (i.e. categorical) and which as continuous-valued (i.e. ordinal) based on a 'unique-value threshold'. While convenient in most situations, it will sometimes be necessary for the user to manually specify which features to treat as discrete and which as continuous. specifiedAttributes allows the user to explicitly specify the subset of features that should be treated as either discrete or continuous as an alternative to relying on discreteAttributeLimit to decide automatically.
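The scikit-learn convention that scikit-eLCS adopts, every hyperparameter as a keyword argument with a default stored verbatim on the instance, can be sketched with a bare-bones estimator skeleton. The names below echo hyperparameters discussed in the text, but the exact names and default values of the released package may differ; this is not the scikit-eLCS class itself.

```python
# Sketch of the scikit-learn hyperparameter convention: defaults in
# __init__, attributes stored as-is, retrievable as a dict (scikit-learn
# provides this via BaseEstimator.get_params; shown inline here).
class ELCSSketch:
    def __init__(self, learning_iterations=10000, N=1000, nu=1,
                 match_for_missingness=False,
                 track_accuracy_while_fit=False,
                 reboot_filename=None):
        self.learning_iterations = learning_iterations
        self.N = N          # maximum (micro) rule population size
        self.nu = nu        # exponent of the accuracy-based fitness
        self.match_for_missingness = match_for_missingness
        self.track_accuracy_while_fit = track_accuracy_while_fit
        self.reboot_filename = reboot_filename

    def get_params(self):
        return dict(vars(self))

# Only the hyperparameters the user wants to change need be passed;
# everything else keeps its default.
model = ELCSSketch(learning_iterations=5000, nu=10)
print(model.get_params()["nu"])  # 10
```

Storing constructor arguments unmodified is what lets scikit-learn tools such as grid search clone and reconfigure an estimator, which a configuration-file-driven design cannot support.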
2.2.2 Model Training. As mentioned, the underlying eLCS algorithm itself has been mainly preserved. However, there are some key implementation differences that impact the output of the system in line with scikit-learn standards. The original eLCS loaded both training and testing data during initialization, and automatically saved output statistics, tracked training performance, and the rule population model to .txt files. Scikit-eLCS replaces this with external training and testing evaluations exported after model fitting in line with scikit-learn standards (see section 2.2.3), making its operation easily comparable to other scikit-learn ML estimators such as decision trees, random forests, etc. The normal workflow involves (1) initializing the algorithm, (2) training the model with the fit function, and then (3) evaluating the model externally on training and testing data, or as part of a scikit-learn automated cross validation analysis.

Also, scikit-learn estimators require fully numeric data to train (i.e. no string values). Thus, we implemented an external DataCleanup.StringEnumerator object to make it easy to map discrete string attributes in a dataset into numeric attributes of the correct format. Lastly, scikit-learn standards do not pass feature labels from a dataset into the algorithm, but instead rely on feature index order. Since rule interpretability is a key aspect of LCS algorithms, post-training, scikit-eLCS includes methods that allow trained rules to be mapped back to feature names rather than indexes when the user wishes to inspect the rule population model (see section 2.2.4). These changes optimize training for speed, while retaining and adding new user-friendly evaluation tracking functionality that can be accessed post-training.

2.2.3 Testing and Evaluation. Akin to other scikit-learn compatible estimators, an n-fold cross validation can be performed on a trained scikit-eLCS model using the standard sklearn cross_val_score method. Scikit-eLCS also includes the standard set of methods familiar to scikit-learn users, such as predict, predict_proba, and score. Finally, scikit-eLCS includes a suite of evaluation and exporting methods that can be called after training to provide further insight into the training process. Examples of these methods can be found in a descriptive Jupyter Notebook user guide that pairs with this paper along with the scikit-eLCS source code (https://github.com/UrbsLab/scikit-eLCS). Further details are given below.
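The behavior behind an LCS predict and predict_proba follows the fitness- and numerosity-weighted voting scheme described earlier: each rule matching a test instance casts a vote for its class, weighted by fitness times numerosity. The following self-contained sketch shows how such a vote could be computed; it is illustrative, not scikit-eLCS internals.

```python
# Sketch of LCS class prediction via fitness*numerosity weighted voting.
from collections import namedtuple

Rule = namedtuple("Rule", "condition action fitness numerosity")

def matches(rule, x):
    return all(x[i] == v for i, v in rule.condition.items())

def predict_proba(population, x):
    # Accumulate each matching rule's weighted vote, then normalize.
    votes = {}
    for r in population:
        if matches(r, x):
            votes[r.action] = votes.get(r.action, 0.0) + r.fitness * r.numerosity
    total = sum(votes.values())
    return {cls: w / total for cls, w in votes.items()} if total else {}

def predict(population, x):
    proba = predict_proba(population, x)
    return max(proba, key=proba.get) if proba else None

pop = [Rule({0: 1}, 1, fitness=0.9, numerosity=5),
       Rule({1: 0}, 0, fitness=0.5, numerosity=1)]
x = (1, 0)
print(predict_proba(pop, x))  # class 1 dominates: {1: 0.9, 0: 0.1}
print(predict(pop, x))        # 1
```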
Algorithm 1 The eLCS algorithm
for all iteration ∈ maxIterations do
    get next instance from shuffled dataset
    for all rule ∈ rule population [P] do
        if rule condition matches instance then
            Add rule to match set [M]
        end if
    end for
    for all rule ∈ [M] do
        if rule action matches instance class then
            Add rule to correct set [C]
        end if
    end for
    if [C] is empty, i.e. no rules then
        Invoke covering
        Add new covered rule to [M] and [C]
    end if
    for all rule ∈ [M] do
        Update rule accuracy, fitness, experience (i.e. match count), and average match set size
    end for
    for all rule ∈ [C] do
        Update rule correct count
    end for
    Perform subsumption on rules in [C]
    if rules in [C] have not performed GA for theta_GA iterations on average then
        Tournament selection to select two parent rules from [C]
        Copy parent rules to form two offspring rules
        Probabilistically apply mutation and uniform crossover to offspring rules
        Perform subsumption between offspring and parent rules
        Add offspring rules into [P]
    end if
    while sum of rule numerosities > N do
        Perform deletion within [P]
    end while
end for

Following initialization, the model can be trained with the algorithm with the following command (where x_train is the matrix of instances by features, and y_train is an array of outcome labels):

model = algorithm.fit(x_train, y_train)

In the rest of this subsection we discuss example output from the aforementioned Jupyter Notebook eLCS User Guide, illustrating how scikit-eLCS can be easily applied as part of a broader ML analysis pipeline. This user guide walks through every element of the scikit-eLCS package in detail, including (1) advice on how to best format training data, (2) detailed hyperparameter descriptions, (3) initializing and training a new eLCS estimator, (4) testing its performance via cross validation, and (5) evaluation summaries with ROC/PRC curves and AUC.

The user guide also contains a walkthrough on how to access and export a model's incremental training performance metrics over the course of algorithm training iterations. These include tracking macro and micro population size [P], [M] size, [C] size, and cumulative operations and run-time counts for underlying eLCS mechanisms. Each can be easily exported post-training as a CSV file for plotting. Figures 2 to 5 illustrate examples of each.

Figure 2: Visualizing [M] and [C] sizes over training iterations.
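The CSV export of iteration-tracking metrics described in the user guide might be realized along the following lines. The column names here are illustrative rather than the package's own, and a real export would write to a file instead of an in-memory buffer.

```python
# Sketch of exporting per-iteration tracking metrics as CSV for plotting.
import csv
import io

tracking = [
    {"iteration": 1, "macro_pop_size": 12, "micro_pop_size": 15,
     "match_set_size": 3, "correct_set_size": 2},
    {"iteration": 2, "macro_pop_size": 14, "micro_pop_size": 18,
     "match_set_size": 4, "correct_set_size": 3},
]

buf = io.StringIO()  # stand-in for open("tracking.csv", "w")
writer = csv.DictWriter(buf, fieldnames=tracking[0].keys())
writer.writeheader()
writer.writerows(tracking)
print(buf.getvalue().splitlines()[0])  # the CSV header row
```

A file written this way loads directly into plotting tools for figures like the [M]/[C] size trajectories shown in Figure 2.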
Figure 6: Rule population with DCAL representation of rule conditions. Trained on an 11-bit Multiplexer dataset. Note the
original feature names are used in the "Specified Attribute Names" column.
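The feature-name mapping shown in Figure 6 amounts to translating each rule's zero-indexed specified attributes back through the dataset's column labels, since scikit-learn passes only numeric arrays into the estimator. A small sketch, with hypothetical column names:

```python
# Sketch of mapping DCAL attribute indexes back to dataset column names
# for rule inspection. Column names are hypothetical examples.
feature_names = ["A_0", "A_1", "A_2", "R_0"]

def specified_attribute_names(specified_indexes, names):
    return [names[i] for i in specified_indexes]

rule_condition_indexes = [0, 3]  # a rule specifying attributes 0 and 3
print(specified_attribute_names(rule_condition_indexes, feature_names))
# ['A_0', 'R_0']
```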
test since normal distributions and homogeneous variance are not necessarily expected. Tests were conducted locally on a 2016 MacBook Pro (2.6 GHz Quad-Core Intel Core i7 processor). The code to replicate this analysis is available in the Jupyter notebook found in the eLCSPerformanceTests directory of the following GitHub repository: https://github.com/UrbsLab/scikit-eLCS.

Table 3: Summary of average balanced testing accuracy

            eLCS     scikit-eLCS   Mann-Whitney (p-value)
    6-bit   0.9984   1.0           0.0108
    11-bit  0.9996   0.99997       0.0416
    20-bit  0.9395   0.9334        0.0362
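The n-bit multiplexer problems summarized in Table 3 are fully enumerable, which is what makes them convenient benchmarks. A sketch of the benchmark's standard definition: the first k address bits select one of the remaining 2^k register bits, whose value is the class (k = 2, 3, and 4 give the 6-, 11-, and 20-bit problems).

```python
# Sketch of the n-bit multiplexer benchmark used in the evaluation.
import itertools

def multiplexer_output(bits, k):
    # The first k bits, read as a binary number, address a register bit.
    address = int("".join(map(str, bits[:k])), 2)
    return bits[k + address]

def generate_multiplexer_dataset(k):
    n = k + 2 ** k  # 6-bit when k=2, 11-bit when k=3, 20-bit when k=4
    return [(bits, multiplexer_output(bits, k))
            for bits in itertools.product((0, 1), repeat=n)]

data = generate_multiplexer_dataset(2)  # full 6-bit multiplexer
print(len(data))                        # 64 instances
bits, label = data[0]
print(bits, label)                      # (0, 0, 0, 0, 0, 0) 0
```

The multiplexer is a canonical heterogeneous problem: which register bit matters depends entirely on the address bits of the instance at hand, the property LCS rule populations are well suited to capture.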