A Scikit-learn Compatible Learning Classifier System

Robert F. Zhang
University of Pennsylvania
Philadelphia, PA, USA
robertzh@wharton.upenn.edu

Ryan J. Urbanowicz∗
University of Pennsylvania Perelman School of Medicine
Philadelphia, PA, USA
ryanurb@upenn.edu

ABSTRACT

Parallel to genetic algorithms and genetic programming, the evolutionary computation (EC) sub-field of learning classifier systems (LCS) has been an active area of research, development, and application. LCSs are best known for their ability to flexibly adapt to perform well on a wide range of problems, and to outperform other machine learning (ML) algorithms on problems characterized as being complex and heterogeneous. However, to date LCSs remain poorly recognized and thus rarely considered for comparison to other ML approaches in or outside of the EC community. One likely reason for this is the general lack of easy-to-use LCS implementations to facilitate comparisons to other well-known ML approaches. The Python-based scikit-learn library has become a very popular way to implement, utilize, and compare ML algorithms in a simple, uniform manner. In this work, we develop and evaluate the first LCS scikit-learn package, called scikit-eLCS. The scikit-eLCS algorithm is a modern Michigan-style supervised-learning LCS descended from eLCS, UCS, and XCS. We demonstrate the efficacy and capabilities of this package over benchmark n-bit multiplexer problems. We expect scikit-eLCS to serve as an algorithmic benchmark to facilitate future ML comparisons, as well as a blueprint for the implementation of other scikit-learn compatible LCS algorithms.

CCS CONCEPTS

• Computing methodologies → Rule learning; Genetic algorithms; • Human-centered computing → Accessibility systems and tools;

KEYWORDS

learning classifier systems, rule-based machine learning, machine learning, data mining

ACM Reference Format:
Robert F. Zhang and Ryan J. Urbanowicz. 2020. A Scikit-learn Compatible Learning Classifier System. In Genetic and Evolutionary Computation Conference Companion (GECCO '20 Companion), July 8–12, 2020, Cancún, Mexico. ACM, New York, NY, USA, 8 pages. https://doi.org/10.1145/3377929.3398097

∗ Corresponding author

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
GECCO '20 Companion, July 8–12, 2020, Cancún, Mexico
© 2020 Copyright held by the owner/author(s). Publication rights licensed to the Association for Computing Machinery.
ACM ISBN 978-1-4503-7127-8/20/07...$15.00
https://doi.org/10.1145/3377929.3398097

1 INTRODUCTION

Learning classifier systems (LCSs) are a family of rule-based machine learning (RBML) algorithms within the field of evolutionary computation (EC). LCSs are distinguished from other EC and general machine learning (ML) approaches primarily by representing solutions in a 'piece-wise' manner [12], specifically through the discovery of conditional IF:THEN rules that are collectively applied to making decisions/predictions as an ensemble. Having originally been implemented primarily as reinforcement learning systems, LCSs were tasked with modeling complex, multi-step, and adaptive domains [13]. In the early 2000s, interest in developing LCSs that were specialized to single-step supervised-learning problems began with the development of UCS [3], a direct descendant of XCS [15], the best known and most popular LCS to date. Presently, a large variety of LCS algorithms have been proposed and implemented that take advantage of the modular and highly adaptive nature of an LCS framework to address an impressive variety of problem domains, from robotics and game strategy to data mining on classification and regression problems [6, 8, 13]. One particular advantage of the 'piece-wise' (i.e. niche-based) LCS approach to learning and modeling is the ability to facilitate and enable the detection and modeling of very complex patterns, including feature interactions and, most uniquely, 'heterogeneous patterns of association', which tend to confound other machine learning approaches. Heterogeneous association refers to situations in which different features in the dataset or environment are relevant to making an appropriate decision/prediction depending on the instance at hand. Another key advantage is that LCSs are generally regarded as being able to yield human-interpretable solutions, an often essential characteristic in application domains such as biomedical data mining [1, 10].

While the LCS concept was introduced around the same time as the genetic algorithm (GA) [5], LCS research and application remains largely unknown and poorly adopted within the broader field of artificial intelligence and ML. Discussions among researchers at the annual International Workshop on Learning Classifier Systems (IWLCS) have often pointed out two contributing factors: (1) gaining an initial understanding of LCS algorithms has been characterized as challenging, likely due to their atypical approach to learning and solution representation as well as there being a large diversity of implementations targeted to a diversity of problem domains, and (2) a general lack of easy-to-use code/software with clear documentation facilitating comparison of LCS algorithms to new and established ML approaches.

In 2017 we sought to address the former issue through the publication of the first introductory LCS textbook [12]. This book was paired with an accessible new Python-coded LCS called educational LCS (eLCS). eLCS is a supervised learning LCS algorithmically similar to UCS, that also adopted a number of abilities from ExSTraCS


[14], an LCS geared towards the challenges of biomedical data mining, e.g. large-scale, missing, noisy, and/or imbalanced data.

In the present work we seek to address the latter issue. Previously, LCS implementations almost exclusively existed as stand-alone code packages outside of any commonly used ML libraries such as scikit-learn [7]. This makes the comparison and application of LCS methodologies significantly more difficult and thus less likely to occur than for other widely used ML methods such as neural networks, decision trees, random forests, etc. This also means that there are few reliable and well annotated example code-bases upon which future LCS algorithms might be developed and shared.

We address this issue through the development of scikit-eLCS, the first scikit-learn compatible LCS algorithm [7]. The scikit-learn Python library is one of the best known and most used ML libraries to date, making it substantially easier to implement an analysis pipeline where many different ML algorithms can be run, evaluated, and compared in a rigorous and homogeneous manner [4]. We selected eLCS as our target to be the first LCS implemented for the scikit-learn library because (1) it represents a largely generic Michigan-style LCS algorithm framework that can be easily adapted into other existing LCS implementations such as XCS, (2) a supervised learning LCS is easier to understand than its reinforcement learning counterparts and more comparable to other ML algorithms within scikit-learn, (3) it offers a simple but effective LCS benchmark algorithm for future studies to compare to as a baseline, (4) it offers a more advanced and generalizable set of capabilities in comparison to UCS, (5) it is an easy-to-understand LCS implementation utilized by the recent introductory LCS textbook, and (6) it is a code base we are very familiar with.

In the remaining sections of this paper we detail our methods for implementation and evaluation, present results demonstrating the fidelity of our rebuild of eLCS into scikit-eLCS, draw conclusions, and outline future work.

2 METHODS

In this section we (1) review the training cycle of a generic Michigan-style LCS, (2) describe specifics of the eLCS algorithm, (3) identify the operation, key differences, and useful features of scikit-eLCS, and (4) describe the benchmark experimental evaluation.

As discussed, many variations of LCS algorithms have been developed over the years, some of which have revamped or refined the fundamental underlying architecture of most modern LCS implementations. In that time, two fundamental LCS architectures have persisted: Michigan-style and Pittsburgh-style LCSs [13]. While both architectures have their advantages and disadvantages, this work focuses on Michigan-style systems such as eLCS [12], UCS [3], XCS [15], and ExSTraCS [14]. Michigan-style systems are the more popular of the two architectures, while Pittsburgh-style systems have more in common with a traditional genetic algorithm [13].

A generic Michigan-style LCS evolves a population of IF:THEN rules incrementally (i.e. one training instance at a time). The 'IF', i.e. condition, of a rule includes one or more specified features and their corresponding values (required for the rule to match). The 'THEN', i.e. action, of a rule gives the predicted outcome, i.e. class, asserted by the given rule. Unlike most other EC algorithms, the resulting population of rules is the solution/model, rather than a 'best' individual in the population. For simplicity, we describe LCS operation from the perspective of a supervised classification problem. Also unlike other EC algorithms, LCS populations typically start out empty, rather than being randomly initialized. Each incremental training iteration follows these steps: (1) a training instance is taken from the dataset without replacement, (2) a match set [M] is formed that includes any rule in the population [P] with a condition matching the training instance, (3) [M] is divided into a correct set [C] and an incorrect set [I] depending on whether a given rule specifies the correct or incorrect class, (4) at this point, if [C] is empty, the covering mechanism is applied to randomly generate a rule (added to [M] and [C]) that matches the current instance and has the 'correct' class, (5) rules in [M] have their parameters updated, e.g. accuracy, fitness, and numerosity, where rule accuracy is the number of times a rule has been in a [C] divided by the number of times it has been in a [M], fitness is an exponential function of accuracy, and numerosity is the number of virtual copies of a given rule in [P], (6) subsumption, a rule-generalization mechanism, is applied to rules in [C], (7) a basic genetic algorithm uses tournament selection to pick two parent rules from [C] based on fitness and generates two offspring rules that are added to [P] along with rules in [M], and (8) if the size of [P] is greater than the specified maximum, a deletion mechanism probabilistically selects a rule to be removed from [P]. This set of steps is repeated for some user-specified number of training iterations.

For a more complete introduction to LCS algorithms we refer readers to the following [12, 13].

2.1 The eLCS Algorithm

The eLCS, or Educational Learning Classifier System [12], is a modern variant of Bernado-Mansilla's sUpervised Classifier System (UCS) [3] coded in Python. As such, it is a Michigan-style, supervised, offline LCS that follows the learning cycle presented above. It can be applied to binary or multi-class classification problems, but has yet to be adapted for continuous-valued outcomes as found in regression problems. While the eLCS code was originally written to be paired with the introductory textbook [12], this is the first research publication in which it has been formally presented and evaluated. We outline key differences between UCS and eLCS below.

First, prior to training, eLCS shuffles the training data to remove bias that may be introduced from training data order. Second, eLCS rules adopt the mixed discrete-continuous attribute-list (DCAL) representation utilized by ExSTraCS [14], which was based on a similar representation in BioHEL [2]. DCAL enables eLCS to work with data that have a mix of categorical or ordinal features. This representation also only stores 'specified' features in rules, rather than storing some symbol for all features. This significantly reduces run time as a dataset's feature space scales up [11] (see Figure 1). Third, eLCS learns in the presence of missing data in a given dataset. Specifically, an instance will match when a rule's condition specifies a feature value that is missing in the current training instance. This was viewed as a less conservative approach to dealing with missingness. Fourth, like ExSTraCS, eLCS does not alternate learning iterations between explore/exploit phases like many earlier systems designed for reinforcement learning such as XCS [15]. Pseudocode for the eLCS algorithm is given in Algorithm 1.
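As a concrete illustration, steps (2) through (5) of the generic Michigan-style training cycle described above can be sketched in a few lines of Python. The `Rule` class and `training_iteration` function below are hypothetical simplifications for illustration only, not the scikit-eLCS internals:

```python
import random

class Rule:
    """Hypothetical minimal rule: condition maps feature index -> required value."""
    def __init__(self, condition, action):
        self.condition = condition   # e.g. {0: 1, 3: 0}; unlisted features are wild
        self.action = action         # predicted class
        self.match_count = 0
        self.correct_count = 0
        self.numerosity = 1

    def matches(self, instance):
        # A rule matches when every specified feature value agrees with the instance.
        return all(instance[i] == v for i, v in self.condition.items())

def training_iteration(population, instance, label, p_spec=0.5):
    # (2) Form the match set [M] from rules whose condition matches the instance.
    M = [r for r in population if r.matches(instance)]
    # (3) Split [M]: the correct set [C] holds matching rules with the right class.
    C = [r for r in M if r.action == label]
    # (4) Covering: if [C] is empty, generate a matching rule with the correct class,
    # specifying each feature of the instance with probability p_spec.
    if not C:
        condition = {i: v for i, v in enumerate(instance) if random.random() < p_spec}
        new_rule = Rule(condition, label)
        population.append(new_rule)
        M.append(new_rule)
        C.append(new_rule)
    # (5) Update rule parameters; accuracy would be correct_count / match_count.
    for r in M:
        r.match_count += 1
    for r in C:
        r.correct_count += 1
    return M, C
```

Subsumption, the GA, and deletion (steps 6 to 8) would then operate on [C] and [P] as described above.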


Figure 1: Traditional LCS rule representation uses ’#’ symbols to indicate that a given feature is ’wild’, i.e. will match any value.
Ternary representation uses [0,1,#] to encode values in binary, but features can be represented directly as specified values
(for categorical features) or a value range (for ordinal features) as illustrated on the left. eLCS applies a DCAL representation
tracking the zero-indexed location of the feature/attribute and its corresponding specified value (right side of figure).
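The DCAL idea illustrated in Figure 1 can be sketched as a simple match check in Python: only the specified attribute indexes are examined, so unspecified features behave as wildcards without storing any '#' symbols. This is a hypothetical helper for illustration, not the package source:

```python
def dcal_matches(specified_indices, specified_values, instance):
    """Sketch of DCAL matching: only 'specified' attributes are checked.

    specified_values[i] is either a single categorical value or a
    (low, high) range for an ordinal/continuous feature.
    """
    for idx, val in zip(specified_indices, specified_values):
        x = instance[idx]
        if isinstance(val, tuple):           # ordinal feature: value range
            low, high = val
            if not (low <= x <= high):
                return False
        elif x != val:                       # categorical feature: exact value
            return False
    return True                              # unspecified features act as wildcards
```

Because the loop only touches the specified attributes, matching cost grows with rule specificity rather than with the dataset's full feature count, which is the run-time advantage noted above.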

By the end of training, the eLCS rule population would typically contain hundreds or thousands of rules, each making their own class predictions. To make a prediction on a given training or testing instance, every matching rule serves as a fitness and numerosity weighted vote for the class it specifies as its action. Like other LCSs, eLCS has a number of tunable hyperparameters. However, only a handful are particularly important to optimize. These include the maximum number of training iterations, the maximum population size, and the nu parameter, which determines the exponential fitness function calculated based on rule accuracy. For a detailed, piece-by-piece explanation of the eLCS algorithm, see [12]. The original eLCS code is available at https://github.com/ryanurbs/eLCS, and a Jupyter Notebook version of this same code is available at https://github.com/ryanurbs/eLCS_JupyterNotebook.

2.2 scikit-eLCS

The core algorithm of the original eLCS and scikit-eLCS is nearly identical, with three key exceptions. First, we have opted to handle missing data values differently in scikit-eLCS. Specifically, by default, missing values within instances will now trigger 'failed-matches' whenever a rule specifies that corresponding feature. This is a more conservative approach to handling missingness within LCS. scikit-eLCS also includes a hyperparameter giving users the option to switch back to the original missing data strategy applied in eLCS. Second, we have removed a second round of correct set subsumption that occurred after the GA between offspring and correct set classifiers. Third, we have separated the original doSubsumption hyperparameter into doGASubsumption and doCorrectSetSubsumption, with correct set subsumption turned off by default.

Aside from these exceptions, the major differences between eLCS and scikit-eLCS have to do with how scikit-eLCS has been implemented to meet the standards and capabilities of other scikit-learn packages while preserving the unique capabilities and outputs of an LCS algorithm. We examine these differences and capabilities in the following subsections.

2.2.1 Hyperparameters. The original eLCS algorithm used a (.txt) configuration file along with an external ConfigParser object to load all algorithm hyperparameters and an external Constants object to store them. As such, a configuration file would need to be generated or updated for each algorithm run. It was set up this way in an effort to make all hyperparameter settings transparent to new users, as well as to leave a permanent record of hyperparameter settings for a given algorithm run.

In line with scikit-learn standards, scikit-eLCS accepts hyperparameter values as arguments upon initialization and thus does not use a configuration file to load hyperparameters, nor does it require an external ConfigParser or Constants object as in eLCS. All tools needed to initialize, run, and evaluate the model are encapsulated within a single Python object.

Default, recommended hyperparameters are provided, so only hyperparameters that the user wishes to change need be specified. As mentioned previously, only a handful of LCS hyperparameters are important to consider optimizing for a given dataset. A complete list of tunable scikit-eLCS hyperparameters, along with brief descriptions and default settings, is given in Table 1. Among these, scikit-eLCS introduces a few new hyperparameter options: matchForMissingness, trackAccuracyWhileFit, rebootFilename, and specifiedAttributes. matchForMissingness is the option mentioned above giving the user the choice as to whether a specified attribute in a rule will match with a missing attribute in a data instance or not. trackAccuracyWhileFit gives the user the choice of whether online training accuracy estimates should be computed to provide insight into the learning trajectory, post-training, at the expense of some computational efficiency. rebootFilename gives the user the option of loading a previously trained scikit-eLCS rule population to resume training (see section 2.2.4). Lastly, in the original eLCS, the discreteAttributeLimit hyperparameter was implemented to automatically determine which features should be treated as discrete (i.e. categorical) and which as continuous-valued (i.e. ordinal) based on a 'unique-value threshold'. While convenient in most situations, it will sometimes be necessary for the user to manually specify which features to treat as discrete and which as continuous. specifiedAttributes allows the user to explicitly specify the subset of features that should be treated as either discrete or continuous, as an alternative to relying on discreteAttributeLimit to decide automatically.


Table 1: scikit-eLCS Hyperparameters

learningIterations: The number of training cycles to run. (default: 10000)
N: Maximum microclassifier population size (sum of classifier numerosities). (default: 1000)
p_spec: Probability of specifying an attribute during the covering procedure. (default: 0.5)
nu: Power parameter used to determine the importance of high accuracy when calculating fitness. (default: 5)
chi: The probability of applying crossover in the GA. (default: 0.8)
upsilon: The probability of mutating an allele within an offspring. (default: 0.04)
theta_GA: The GA threshold. The GA is applied in the correct set when the average time (number of iterations) since the last GA in the correct set is greater than theta_GA. (default: 25)
theta_del: The deletion experience threshold. The calculation of the deletion probability changes once this threshold is passed. (default: 20)
theta_sub: The subsumption experience threshold. (default: 20)
acc_sub: Subsumption accuracy requirement. (default: 0.99)
beta: Learning parameter. Used in calculating average correct set size. (default: 0.2)
delta: Deletion parameter. Used in determining deletion vote calculation. (default: 0.1)
init_fit: The initial fitness for a new classifier. (default: 0.01)
fitnessReduction: Initial fitness reduction in GA offspring rules. (default: 0.1)
doCorrectSetSubsumption: Determines if subsumption is done in the correct set. (default: False)
doGASubsumption: Determines if subsumption is done after GA between parent and offspring classifiers. (default: True)
selectionMethod: GA selection method (tournament or roulette). (default: tournament)
theta_sel: The fraction of the correct set to be included in tournament selection. (default: 0.5)
matchForMissingness: Whether missing values in an instance can match with specified attributes. (default: False)
trackAccuracyWhileFit: Determines if live accuracy tracking is done during model training. (default: False)
discreteAttributeLimit: Multipurpose parameter that determines if an attribute will be treated as continuous or discrete. (default: 10)
specifiedAttributes: If discreteAttributeLimit == "c", attributes specified by index in this parameter will be continuous and the rest will be discrete. If "d", attributes specified by index in this parameter will be discrete and the rest will be continuous. (default: numpy.array([]))
randomSeed: Set a constant random seed value to some integer in order to obtain reproducible results, or 'none' for pseudo-random algorithm runs. (default: none)
rebootFilename: File name of pickled model to be rebooted. (default: None)
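To illustrate how two of the Table 1 hyperparameters interact at prediction time, the sketch below combines nu (fitness as accuracy raised to the power nu, per the nu description above) with the fitness and numerosity weighted voting scheme described in section 2.2. The `(action, accuracy, numerosity)` rule tuples are hypothetical simplifications, not the package's data structures:

```python
from collections import defaultdict

def rule_fitness(accuracy, nu=5):
    """nu is the power parameter from Table 1: fitness grows as accuracy**nu,
    so high-accuracy rules dominate the vote (sketch, not the package source)."""
    return accuracy ** nu

def weighted_vote(matching_rules, classes, nu=5):
    """Each matching rule votes for its action with weight fitness * numerosity.
    The class with the largest total vote is the prediction."""
    votes = defaultdict(float)
    for action, accuracy, numerosity in matching_rules:
        votes[action] += rule_fitness(accuracy, nu) * numerosity
    return max(classes, key=lambda c: votes[c])
```

Raising nu (e.g. to 10, as in the benchmark experiments of section 2.3) sharpens the gap between near-perfect and merely good rules.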

2.2.2 Model Training. As mentioned, the underlying eLCS algorithm itself has been mainly preserved. However, there are some key implementation differences that impact the output of the system in line with scikit-learn standards. The original eLCS loaded both training and testing data during initialization and automatically saved output statistics, tracked training performance, and the rule population model to .txt files. Scikit-eLCS replaces this with external training and testing evaluations exported after model fitting in line with scikit-learn standards (see section 2.2.3). This makes its operation easily comparable to other scikit-learn ML estimators such as decision trees, random forests, etc. The normal workflow involves (1) initializing the algorithm, (2) training the model with the fit function, and then (3) evaluating the model externally on training and testing data, or as part of a scikit-learn automated cross validation analysis.

Also, scikit-learn estimators require fully numeric data to train (i.e. no string values). Thus, we implemented an external DataCleanup.StringEnumerator object to make it easy to map discrete string attributes in a dataset into numeric attributes of the correct format. Lastly, scikit-learn standards do not pass feature labels from a dataset into the algorithm, but instead rely on feature index order. Since rule interpretability is a key aspect of LCS algorithms, post-training, scikit-eLCS includes methods that allow trained rules to be mapped back to feature names rather than indexes when the user wishes to inspect the rule population model (see section 2.2.4). These changes optimize training for speed, while retaining and adding new user-friendly evaluation tracking functionality that can be accessed post-training.

2.2.3 Testing and Evaluation. Akin to other scikit-learn compatible estimators, an n-fold cross validation can be performed on a trained scikit-eLCS model using the standard sklearn cross_val_score method. Scikit-eLCS also includes the standard set of methods familiar to scikit-learn users, such as predict, predict_proba, and score. Finally, scikit-eLCS includes a suite of evaluation and exporting methods that can be called after training to provide further insight into the training process. Examples of these methods can be found in a descriptive Jupyter Notebook user guide that pairs with this paper, along with the scikit-eLCS source code (https://github.com/UrbsLab/scikit-eLCS). Further details are given below.
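The scikit-learn estimator contract that scikit-eLCS follows (hyperparameters passed to __init__, fit returning self, then predict and score) can be illustrated with a minimal stand-in estimator. The trivial majority-class model below is purely illustrative and is not the scikit-eLCS implementation:

```python
from collections import Counter

class MajorityClassEstimator:
    """Minimal object following scikit-learn's estimator conventions:
    hyperparameters stored in __init__, fit returns self, then predict/score.
    A trivial majority-class model stands in for the LCS internals."""
    def __init__(self, learning_iterations=10000):
        self.learning_iterations = learning_iterations

    def fit(self, X, y):
        # A real estimator would run its training iterations here.
        self.majority_ = Counter(y).most_common(1)[0][0]
        return self  # returning self enables model = est.fit(X, y)

    def predict(self, X):
        return [self.majority_ for _ in X]

    def score(self, X, y):
        preds = self.predict(X)
        return sum(p == t for p, t in zip(preds, y)) / len(y)
```

Because scikit-eLCS honors this same contract, it can be dropped into tools like cross_val_score wherever any other scikit-learn classifier would be used.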


Algorithm 1: The eLCS algorithm

for all iteration ∈ maxIterations do
    get next instance from shuffled dataset
    for all rule ∈ rule population [P] do
        if rule condition matches instance then
            Add rule to match set [M]
        end if
    end for
    for all rule ∈ [M] do
        if rule action matches instance class then
            Add rule to correct set [C]
        end if
    end for
    if [C] is empty, i.e. no rules then
        Invoke covering
        Add new covered rule to [M] and [C]
    end if
    for all rule ∈ [M] do
        Update rule accuracy, fitness, experience (i.e. match count), average match set size
    end for
    for all rule ∈ [C] do
        Update rule correct count
    end for
    Perform subsumption on rules in [C]
    if rules in [C] have not performed GA for theta_GA iterations on average then
        Tournament selection to select two parent rules from [C]
        Copy parent rules to form two offspring rules
        Probabilistically apply mutation and uniform crossover to offspring rules
        Perform subsumption between offspring and parent rules
        Add offspring rules into [P]
    end if
    while sum of rule numerosities > N do
        Perform deletion within [P]
    end while
end for

2.2.4 Using scikit-eLCS. Scikit-eLCS has been published to PyPI so it can easily be installed directly using pip install or similar. Assuming the user is using Python 3, installation of scikit-eLCS and its prerequisite packages is achieved with the following command:

    pip install scikit-eLCS

Once installed, scikit-eLCS can be applied easily like any other scikit-learn package. It can be imported using the following command:

    from skeLCS import eLCS

The algorithm and desired hyperparameters can be initialized with a command such as:

    algorithm = eLCS(learningIterations=5000)

Following initialization, the model can be trained with the following command (where x_train is the matrix of instances by features, and y_train is an array of outcome labels):

    model = algorithm.fit(x_train, y_train)

In the rest of this subsection we discuss example output from the aforementioned Jupyter Notebook eLCS User Guide, illustrating how scikit-eLCS can be easily applied as part of a broader ML analysis pipeline. This user guide walks through every element of the scikit-eLCS package in detail, including (1) advice on how to best format training data, (2) detailed hyperparameter descriptions, (3) initializing and training a new eLCS estimator, (4) testing its performance via cross validation, and (5) evaluation summaries with ROC/PRC curves and AUC.

The user guide also contains a walkthrough on how to access and export a model's incremental training performance metrics over the course of algorithm training iterations. These include tracking macro and micro population size [P], [M] size, [C] size, and cumulative operations and run-time counts for underlying eLCS mechanisms. Each can be easily exported post-training as a CSV file for plotting. Figures 2 to 5 illustrate examples of each.

Figure 2: Visualizing [M] and [C] sizes over training iterations.

The guide also demonstrates how to export and inspect the final rule population for interpretation of the model post-training. Scikit-eLCS can export the rule population such that rule conditions are in their native DCAL representation, or alternatively using a traditional representation utilizing # for features in rules that are not specified. Further, the rule population can be exported such that feature names are included instead of just indexes in the data, for readability. Figure 6 offers a screenshot of an exported rule population in DCAL format.

Further, the guide demonstrates how other rule population statistics, such as training coverage and accuracy, as well as feature importance estimates, can be obtained from the trained model.

Finally, the guide demonstrates how to reboot old scikit-eLCS rule populations to resume training. In some cases, the user might want to run training, pause training, and resume training at some point in the future. Alternatively, the user might want to train a

generated rule population on new datasets or with different hyperparameters. The population reboot feature allows the user to "save" the current rule population into a txt file for future use. The user can later initialize a new scikit-eLCS estimator with this txt file via the rebootFilename hyperparameter. When the user runs fit, the new scikit-eLCS estimator will effectively start off where the saved model left off, beginning with the saved rule population, starting at the next sequential learning iteration, and adding to the last model's training times.

Figure 3: Visualizing macro/micropopulation sizes over training iterations.

Figure 4: Visualizing cumulative learning operations counts over training iterations. Counts broken down by specific operations.

Figure 5: Visualizing cumulative training times over training iterations. Aside from 'Total Time', the other time values are stacked (i.e. lines are additive on top of each other). Evaluation time refers to online learning performance estimates.

2.3 Empirical benchmark comparison of eLCS and scikit-eLCS

It is important that the scikit-eLCS algorithm performs similarly to the original eLCS algorithm in terms of testing performance and training run time, so that it may serve as an effective benchmark algorithm. In order to make the time comparisons as fair as possible, a modified version of the original eLCS was implemented that removed some of its peripheral methods expected to slow it down. Specifically, all print statements, evaluation procedures, and export functionality were deactivated for this evaluation. Further, we substituted hyperparameter handling and training/testing dataset handling mechanisms to be more comparable to scikit-eLCS. With these modifications made to the original package, an equitable comparison between the two eLCS algorithms could be made.

For this empirical comparison we rely on a set of Multiplexer benchmark datasets that have been well studied and applied in the context of LCS research and development. For simplicity, to reduce the computational burden, and to allow for easy replication of this analysis in a Jupyter notebook, we focus on the three lower-order Multiplexer problems, i.e. 6-bit, 11-bit, and 20-bit, with 6, 11, or 20 features respectively in each [14]. Multiplexer datasets are noise-free, but notoriously challenging to solve given that they are characterized by highly epistatic and heterogeneous associations with a binary class outcome [15]. The results in this paper can also be directly compared to the performance of ExSTraCS, a more advanced LCS which had previously been demonstrated to solve the 6, 11, 20, 37, 70, and 135-bit Multiplexer problems over the course of up to a million and a half training instances (for the 135-bit problem) [14]. Of note, neither eLCS nor scikit-eLCS is expected to function as well as ExSTraCS on the more complex multiplexer problems, but we do expect them to effectively solve the lower-order ones. As in that previous study, the 6, 11, and 20-bit problem datasets were generated with 500, 5000, and 10000 training instances respectively for our performance comparisons. For reference, the 6-bit problem has 2^6 = 64 unique data instances, while the 11-bit problem has 2^11 = 2048, and the 20-bit problem has 2^20 = 1048576.

For each of the eLCS and scikit-eLCS runs, 10000 learning iterations were completed and the nu parameter was set to 10, placing additional emphasis on rule accuracy to accelerate convergence on optimal rules. Maximum rule micropopulation sizes of 500, 1000, and 2000 were applied to solve the 6, 11, and 20-bit problems, respectively. In total, each experiment was repeated 30 times with different random seeds, each time applying 3-fold cross validation. Pairwise statistical comparisons were made with the Mann-Whitney non-parametric


Figure 6: Rule population with DCAL representation of rule conditions. Trained on an 11-bit Multiplexer dataset. Note the
original feature names are used in the "Specified Attribute Names" column.
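The piece-wise rule representation shown in Figure 6 can be illustrated with a toy sketch: each rule constrains only its specified attributes (all other attributes are don't-cares), and the matching rules vote on the class, weighted here simply by numerosity (actual LCS voting schemes vary, often combining fitness and numerosity). The attribute names and rule values below are invented for illustration and are not drawn from a trained scikit-eLCS model.

```python
# Each rule: (condition mapping specified attribute -> required value,
#             predicted class, numerosity). Unlisted attributes are don't-cares.
rules = [
    ({"A_0": 0, "A_1": 0, "R_0": 1}, 1, 12),
    ({"A_0": 0, "A_1": 0, "R_0": 0}, 0, 9),
]

def predict(instance, rules):
    """Numerosity-weighted majority vote over all rules matching the instance."""
    votes = {}
    for condition, action, numerosity in rules:
        if all(instance.get(name) == value for name, value in condition.items()):
            votes[action] = votes.get(action, 0) + numerosity
    return max(votes, key=votes.get) if votes else None
```

With the toy rules above, any instance with A_0 = 0, A_1 = 0 is classified by the value of R_0, regardless of the remaining attributes.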

test since normal distributions and homogeneous variance are not necessarily expected. Tests were conducted locally on a 2016 MacBook Pro (2.6 GHz Quad-Core Intel Core i7 processor). The code to replicate this analysis is available in the Jupyter notebook found in the eLCSPerformanceTests directory of the following GitHub repository (https://github.com/UrbsLab/scikit-eLCS).

Table 3: Summary of average balanced testing accuracy

           eLCS      scikit-eLCS    Mann-Whitney (p-value)
  6-bit    0.9984    1.0            0.0108
  11-bit   0.9996    0.99997        0.0416
  20-bit   0.9395    0.9334         0.0362
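As context for these dataset sizes, the n-bit multiplexer labeling rule is easy to reproduce: the first k address bits (where n = k + 2^k, giving n = 6, 11, 20, ...) select which of the remaining register bits determines the class. The stand-alone sketch below is independent of the scikit-eLCS package, and its helper names are illustrative.

```python
from itertools import product

def multiplexer_output(bits):
    """n-bit multiplexer: the first k address bits (n = k + 2**k) index
    into the remaining 2**k register bits to produce the class label."""
    k = 0
    while k + 2 ** k < len(bits):
        k += 1
    if k + 2 ** k != len(bits):
        raise ValueError("length must equal k + 2**k for some k (6, 11, 20, ...)")
    address = int("".join(str(b) for b in bits[:k]), 2)
    return bits[k + address]

# Enumerate and label all 2^6 = 64 unique 6-bit instances; the study's
# training sets were drawn from instances of exactly this form.
data = [(bits, multiplexer_output(bits)) for bits in product([0, 1], repeat=6)]
```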

3 RESULTS AND DISCUSSION


Tables 2 and 3 summarize average balanced training and testing accuracy, respectively, across the three target multiplexer problems. Figure 7 presents the distribution of testing accuracy results over the 30 random seed runs. Overall performance was comparable in this set of preliminary empirical benchmark tests, suggesting that we have preserved the fidelity of the eLCS algorithm. Notice that the 20-bit multiplexer problem wasn't quite solved to the specified 0.99 threshold in the 10000 allotted training iterations. However, we note that both eLCS and scikit-eLCS performed comparably to how ExSTraCS performed after only 10000 training iterations in a previous study [14]. Further, as recognized in that previous work, we expect 100% testing accuracy to be easily achievable on these problems once rule filtering of the population has been applied to remove sub-optimal rules before application of the rule set [9, 14]. In this study we opted to leave out rule compaction in order to focus on the reliable implementation of a core LCS algorithm within scikit-learn. The small but statistically significant performance differences observed are likely the result of stochastic variation across the 30 runs, but may also reflect the minor noted algorithmic differences between eLCS and scikit-eLCS.

Figure 7: Boxplots comparing original eLCS to scikit-eLCS testing accuracy

Table 4 summarizes the average training run times over the 30 random seed runs and Figure 8 presents boxplots of this analysis. Overall, run times were comparable, although we observed a small significant speed improvement for scikit-eLCS for the 6-bit problem, and a small significant speed improvement for eLCS for the 11-bit problem.
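For reference, the Mann-Whitney U statistic behind these pairwise comparisons can be computed from midranks alone. The sketch below is a minimal stdlib version; the p-value lookup that the reported tests also require is omitted for brevity, and this is not the study's own implementation.

```python
def midranks(values):
    """Return a 1-based rank for each value, averaging ranks over ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j to cover the whole run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def mann_whitney_u(x, y):
    """U statistics for both samples via the rank-sum identity."""
    ranks = midranks(list(x) + list(y))
    r1 = sum(ranks[: len(x)])
    u1 = r1 - len(x) * (len(x) + 1) / 2
    return u1, len(x) * len(y) - u1
```

In practice one would use a library routine such as scipy.stats.mannwhitneyu, which also supplies the p-value.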

Table 2: Summary of average balanced training accuracy

           eLCS      scikit-eLCS    Mann-Whitney (p-value)
  6-bit    1.0       1.0            1.0
  11-bit   0.9997    1.0            0.0408
  20-bit   0.9437    0.9428         0.4151

Table 4: Summary of average training time (s)

           eLCS       scikit-eLCS    Mann-Whitney (p-value)
  6-bit    3.8206     3.4941         < 0.0001
  11-bit   13.1923    14.7741        < 0.0001
  20-bit   58.8103    61.1884        0.0103
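Balanced accuracy, the metric reported in Tables 2 and 3, is the unweighted mean of per-class recall, so a majority-class predictor cannot score well on imbalanced data. A minimal stdlib sketch (scikit-learn's balanced_accuracy_score computes the same quantity):

```python
def balanced_accuracy(y_true, y_pred):
    """Unweighted mean of per-class recall."""
    recalls = []
    for cls in sorted(set(y_true)):
        indices = [i for i, t in enumerate(y_true) if t == cls]
        correct = sum(1 for i in indices if y_pred[i] == cls)
        recalls.append(correct / len(indices))
    return sum(recalls) / len(recalls)
```

For example, predicting the majority class for every instance of a 3:1 imbalanced binary dataset yields a balanced accuracy of only 0.5.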


Figure 8: Boxplots comparing original eLCS to scikit-eLCS training time

Overall, these preliminary results suggest that eLCS is a relatively simple but effective generic supervised LCS algorithm that is capable of serving as a benchmark algorithm for comparing other LCS algorithms and, most importantly, other well known ML algorithms implemented within the scikit-learn library. Further, the fidelity observed in porting eLCS to scikit-eLCS suggests that this new implementation is sound and has the potential to serve as a blueprint for other scikit-learn compatible LCS implementations, and hopefully to foster a more collaborative community of LCS algorithm improvement and novel development.

4 CONCLUSIONS AND FUTURE WORK

In this work we have addressed the need for an LCS algorithm that is easy to use and compare to other established ML approaches. Specifically, we introduced and demonstrated the efficacy of the first scikit-learn compatible LCS algorithm (scikit-eLCS), based on the eLCS Michigan-style, offline, supervised learning algorithm. We anticipate that the scikit-eLCS framework will serve as an easy-to-use ML benchmark algorithm as well as a blueprint for developing and disseminating other LCS algorithms within the scikit-learn library to encourage their widespread application. In turn, we hope this leads to increased exposure of the LCS family of algorithms in the general ML zeitgeist, such that others come to recognize their many fundamental advantages as well as their as-yet largely unrealized potential.

Our future work will first prioritize further testing, evaluation, and comparison of scikit-eLCS over a broader range of problems, with many more training iterations, and against other ML algorithms within scikit-learn. Further, we will utilize scikit-eLCS to similarly implement, compare, and disseminate scikit-XCS and scikit-ExSTraCS implementations. We hope to collaborate with other research groups to improve scikit-eLCS as well as make other scikit-learn compatible LCS algorithms available to the research community. Finally, we intend to develop and apply rigorous ML analysis pipelines that compare LCS algorithms alongside other well known and cutting-edge ML algorithms in and outside of the EC community.

ACKNOWLEDGEMENTS

This work was supported by the Center for Undergraduate Research and Fellowships (CURF) at the University of Pennsylvania. We thank reviewers from the International Workshop on Learning Classifier Systems (IWLCS) for their constructive comments and feedback.

REFERENCES

[1] Jaume Bacardit, Edmund K Burke, and Natalio Krasnogor. 2009. Improving the scalability of rule-based evolutionary learning. Memetic Computing 1, 1 (2009), 55–67.
[2] Jaume Bacardit and Natalio Krasnogor. 2009. A mixed discrete-continuous attribute list representation for large scale classification domains. In Proceedings of the 11th Annual Conference on Genetic and Evolutionary Computation. 1155–1162.
[3] Ester Bernadó-Mansilla and Josep M Garrell-Guiu. 2003. Accuracy-based learning classifier systems: models, analysis and applications to classification tasks. Evolutionary Computation 11, 3 (2003), 209–238.
[4] Lars Buitinck, Gilles Louppe, Mathieu Blondel, Fabian Pedregosa, Andreas Mueller, Olivier Grisel, Vlad Niculae, Peter Prettenhofer, Alexandre Gramfort, Jaques Grobler, Robert Layton, Jake VanderPlas, Arnaud Joly, Brian Holt, and Gaël Varoquaux. 2013. API design for machine learning software: experiences from the scikit-learn project. In ECML PKDD Workshop: Languages for Data Mining and Machine Learning. 108–122.
[5] J.H. Holland. 1976. Adaptation. Progress in Theoretical Biology 4 (1976), 263–293.
[6] Pier L Lanzi. 2000. Learning Classifier Systems: From Foundations to Applications. Number 1813. Springer Science & Business Media.
[7] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. 2011. Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research 12 (2011), 2825–2830.
[8] Olivier Sigaud and Stewart W Wilson. 2007. Learning classifier systems: a survey. Soft Computing 11, 11 (2007), 1065–1078.
[9] Jie Tan, Jason Moore, and Ryan Urbanowicz. 2013. Rapid rule compaction strategies for global knowledge discovery in a supervised learning classifier system. In Artificial Life Conference Proceedings 13. MIT Press, 110–117.
[10] Ryan John Urbanowicz, Angeline S Andrew, Margaret Rita Karagas, and Jason H Moore. 2013. Role of genetic heterogeneity and epistasis in bladder cancer susceptibility and outcome: a learning classifier system approach. Journal of the American Medical Informatics Association 20, 4 (2013), 603–612.
[11] Ryan J Urbanowicz, Gediminas Bertasius, and Jason H Moore. 2014. An extended Michigan-style learning classifier system for flexible supervised learning, classification, and data mining. In International Conference on Parallel Problem Solving from Nature. Springer, 211–221.
[12] Ryan J Urbanowicz and Will N Browne. 2017. Introduction to Learning Classifier Systems. Springer.
[13] Ryan J Urbanowicz and Jason H Moore. 2009. Learning classifier systems: a complete introduction, review, and roadmap. Journal of Artificial Evolution and Applications 2009 (2009).
[14] Ryan J Urbanowicz and Jason H Moore. 2015. ExSTraCS 2.0: description and evaluation of a scalable learning classifier system. Evolutionary Intelligence 8, 2–3 (2015), 89–116.
[15] Stewart W Wilson. 1995. Classifier fitness based on accuracy. Evolutionary Computation 3, 2 (1995), 149–175.

