Using Simulations To Train and Test Machine Learning Applications - Transentis
One approach to dealing with a lack of real-world data is to generate data using simulations. This gives you
full control over both the features contained in the data and its volume and frequency.
In this post we show you how to monitor long-running simulations and use the data for machine
learning purposes without having to deploy on your own hardware, and with as little code
as possible! This example uses simulation data generated with our own BPTK-Py
framework. We employ various Amazon Web Services applications so that we can focus on analytics and ML
rather than on code, hardware and software.
https://www.transentis.com/using-simulations-train-test-machine-learning-applications/ 1/13
9/30/2019 Using Simulations To Train and Test Machine Learning Applications - transentis.com -
The Challenge
In earlier posts, you learned how we developed a Python-based simulation engine for System
Dynamics and Agent Based Modeling. It allows you to run complex simulations within the Jupyter
environment and access the simulation results for further analysis.
Machine Learning (ML) algorithms are becoming part of our everyday life. Advertisers improve their
targeting using the web surfing behavior of potential customers. Businesses use more and more data
sources, both internal and external, to forecast future sales and adapt their strategy, all
supported by ML. Manufacturers want to predict machine performance and faults to reduce
production downtime or even accidents.
Choosing the right Machine Learning algorithm is hard. It requires a lot of real-world data and
testing, but such data is not always available. Someone in the sales department might
be able to give you sales data for last year. But who has the market research? Can you get
access to the raw website analytics data? You might need that data at click-stream level, but the
data warehouse (DWH) might only store it preprocessed at minute granularity, or enriched with other
information that is useless for your problem.
With BPTK you can develop models and rapidly test them. What if we combine simulation and rapid
prototyping of machine learning algorithms? This way, you avoid the never-ending search for
real-world data. With this approach you can:
Below we present a small example of how we used agent-based modeling for a few ML use cases.
Why did we use AWS in this showcase? Because it allows us to concentrate on Machine Learning
algorithms and methods rather than wiring up infrastructure and writing large amounts of code
before even being able to test ML. We go into more detail on the specific services we used later
in this blog post.
In each simulation round, the car either drives, charges, or has its battery replaced. The battery might
fail. After a battery fault, it takes one more simulation time step to replace the battery. In this
simple model, a battery fails if its capacity falls below 14,000 amp, with a design capacity of 20,000
amp. The capacity decreases with every round due to driving and recharging, just like in the real world.
Battery states are stored in the “battery_state” field. When driving, the car uses 39.7 amp per
kilometer, resulting in around 500 km per full charge (while the full capacity is still available). The
number of kilometers driven in a simulation time step is a random value within certain bounds, given
by a driving strategy. Replacement only occurs if the battery has previously entered the failed state.
Whenever the car drives, the field “distance_in_period” stores the km driven within the given
simulation period. It defaults to zero when charging. The field “charge” measures the current charge
of the battery. The “cycles” field increases by one on a full recharge. The car charges once the charge
reaches a lower bound, which is also defined by the strategy.
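The rules above can be sketched as a small Python agent. This is only an illustration, not the BPTK-Py model used in the post: the capacity, failure threshold and consumption per kilometer come from the text, while the wear per action, the state names "ok" and "failed", and the strategy parameters (min_km, max_km, charge_bound) are assumptions made up for this sketch.

```python
import random
from dataclasses import dataclass

DESIGN_CAPACITY = 20_000.0    # from the post
FAILURE_CAPACITY = 14_000.0   # from the post
USE_PER_KM = 39.7             # from the post

# Assumptions, not from the post: how much capacity each action costs.
WEAR_PER_DRIVE = 1.0
WEAR_PER_CHARGE = 5.0


@dataclass
class Car:
    min_km: float              # hypothetical strategy bounds per time step
    max_km: float
    charge_bound: float        # hypothetical: recharge at this fraction of capacity
    capacity: float = DESIGN_CAPACITY
    charge: float = DESIGN_CAPACITY
    cycles: int = 0
    battery_state: str = "ok"  # assumed state names: "ok" / "failed"
    distance_in_period: float = 0.0

    def step(self, rng: random.Random) -> None:
        """Advance one round: replace the battery, charge, or drive."""
        self.distance_in_period = 0.0
        if self.battery_state == "failed":
            # Replacement takes up this whole time step.
            self.capacity = self.charge = DESIGN_CAPACITY
            self.battery_state = "ok"
            return
        if self.charge <= self.charge_bound * self.capacity:
            # Full recharge: costs some capacity and counts as one cycle.
            self.capacity -= WEAR_PER_CHARGE
            self.charge = self.capacity
            self.cycles += 1
        else:
            # Drive a random distance, limited by the remaining charge.
            km = min(rng.uniform(self.min_km, self.max_km), self.charge / USE_PER_KM)
            self.distance_in_period = km
            self.capacity -= WEAR_PER_DRIVE
            self.charge = min(max(self.charge - km * USE_PER_KM, 0.0), self.capacity)
        if self.capacity < FAILURE_CAPACITY:
            self.battery_state = "failed"


def simulate(car: Car, steps: int, seed: int = 0) -> list:
    """Run the car for a number of steps and return the capacity after each one."""
    rng = random.Random(seed)
    history = []
    for _ in range(steps):
        car.step(rng)
        history.append(car.capacity)
    return history
```

For example, simulate(Car(min_km=5, max_km=25, charge_bound=0.6), 10_000) returns a capacity series with the sawtooth shape described in the text; the parameter values here are purely illustrative.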
The selection of the strategy is random at model initialization. We define three driving strategies:
City | 25 | 60 % | A driver who lives in the city and does not drive many kilometers
In this article we focus on the battery’s capacity. The capacity determines whether the battery has to be
replaced, so it would be great to be able to forecast this value. Hence, we look at the
battery’s capacity over time for one agent. Figure 2 shows the results of a simulation that ran
for 100,000 steps. We clearly see a rather simple pattern: the capacity starts at 20,000 amp and
decreases almost linearly to 14,000, before the battery is replaced and the pattern restarts. If
you look more closely at one cycle, you notice that the capacity decrease is not really linear. This
is because the car drives a random number of kilometers within the strategy’s bounds, not a fixed
number of kilometers in each time step. However, one could easily construct a function to estimate a
battery fault event.
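One such function could be a plain linear extrapolation of the observed capacity. The sketch below is an assumption about what that function might look like, not code from the post; the name steps_until_fault and its arguments are hypothetical, and only the 14,000 threshold comes from the text.

```python
import math


def steps_until_fault(capacities, threshold=14_000.0):
    """Estimate how many more steps it takes until capacity reaches the threshold.

    Uses the average change per step between the first and last observation.
    Returns None if the capacity is not decreasing.
    """
    if len(capacities) < 2:
        raise ValueError("need at least two observations")
    slope = (capacities[-1] - capacities[0]) / (len(capacities) - 1)
    if slope >= 0:
        return None  # no downward trend, so no fault can be extrapolated
    remaining = capacities[-1] - threshold
    if remaining <= 0:
        return 0  # already at or below the threshold
    return math.ceil(remaining / -slope)
```

Because the real decrease is only almost linear, an estimate like this is a rough baseline against which a trained forecasting model can be compared.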
Figure 2: Battery capacity over time for the car simulation. Steps: 100,000
This model may be quite simple, but it is good enough for our purpose of testing methods for
forecasting battery faults. In the real world, a prediction engine for battery faults is useful for
prompting the driver to make an appointment with the nearest workshop before she breaks down in the
middle of the road.
To forecast such problems, car manufacturers may employ data science techniques. A car
could send its telemetry data to the manufacturer’s servers, which use a model trained on data
from millions of other cars. In response, the car may receive a warning if a battery fault is
forecast for the near future.
The output of the AWS Kinesis Analytics queries flows into another Kinesis Stream. These data are
consumed by AWS Kinesis Firehose, another component of Kinesis. It takes a Kinesis
Stream as input and delivers the data in a structured way to other AWS applications. Using Firehose
avoids coding your own stream consumer: it pushes the streaming data into the target applications
without much configuration. Here Firehose is used to write the data to S3, a scalable data
storage service (more on that in a later section).
AWS Lambda
For forecasting, we employ AWS Lambda. Lambda is a service that executes arbitrary
code, called a function, whenever a trigger fires. A function is stateless, which means that
all variables are lost after the execution finishes.
In this showcase, we use the Kinesis stream as the trigger. We configured Lambda to execute our
function whenever 100 events have arrived.
The function itself is a small Python script that receives the records, computes forecasts for each
agent and sends the data to another Kinesis stream. The environment in which the code
runs comes with only the most basic Python packages. Hence, you have to download the Linux
builds of all required packages and add them to the code package. Note that some packages such as
numpy and pandas need to be compiled. We used an Amazon Linux VM and installed the packages inside
a virtual environment.
Just copy the contents of the virtual environment’s site-packages into a zip file along with your
function.py. Next, we copied the zip to an S3 bucket (S3 is a storage service provided by AWS) and
configured the Lambda function to pull the file from the S3 location.
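The packaging step can also be scripted. The following is a minimal sketch using only the Python standard library; the function name build_lambda_package and the paths passed to it are hypothetical. It places the contents of site-packages and function.py at the root of the archive, which is where Lambda expects to find them.

```python
import zipfile
from pathlib import Path


def build_lambda_package(site_packages: Path, function_file: Path, out_zip: Path) -> Path:
    """Zip the installed packages and the function file into one deployment archive."""
    with zipfile.ZipFile(out_zip, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(site_packages.rglob("*")):
            if path.is_file():
                # Store each file relative to site-packages, i.e. at the zip root.
                archive.write(path, path.relative_to(site_packages).as_posix())
        # The handler module goes next to the packages.
        archive.write(function_file, function_file.name)
    return out_zip
```

The resulting zip file is what gets uploaded to the S3 bucket. Write the archive to a location outside site-packages so it does not end up containing itself.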
For forecasting the battery’s capacity, we employ autoregressive (AR) modeling. This method fits a
function that weights the influence of previous observations on future values. An alternative is the use of a
moving average. Both methods are widely used for forecasting. Extensions such as ARIMA
(which combines both methods) are also common. For training the models, we use the Python
package statsmodels, an easy-to-use package for statistical modeling. The basic training and
prediction code is simple and straightforward. For a great example, check out this blog post.
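To make the idea of an autoregressive model concrete, here is a hand-rolled first-order AR fit in plain Python. It is not the statsmodels code used in the post, and the function names are made up for this sketch. It estimates x[t] = c + phi * x[t-1] by ordinary least squares and then iterates that equation to forecast.

```python
def fit_ar1(series):
    """Fit x[t] = c + phi * x[t-1] by ordinary least squares; return (c, phi)."""
    previous, following = series[:-1], series[1:]
    n = len(previous)
    if n < 2:
        raise ValueError("need at least three observations")
    mean_prev = sum(previous) / n
    mean_next = sum(following) / n
    variance = sum((p - mean_prev) ** 2 for p in previous)
    if variance == 0:
        raise ValueError("series is constant, AR(1) cannot be fitted")
    covariance = sum((p - mean_prev) * (q - mean_next)
                     for p, q in zip(previous, following))
    phi = covariance / variance
    c = mean_next - phi * mean_prev
    return c, phi


def forecast(series, steps):
    """Forecast the next values by repeatedly applying the fitted AR(1) equation."""
    c, phi = fit_ar1(series)
    predictions = []
    last = series[-1]
    for _ in range(steps):
        last = c + phi * last
        predictions.append(last)
    return predictions
```

A library such as statsmodels supports more lags and provides diagnostics; the sketch only shows how previous observations are weighted to produce future values.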
Most of the code we had to write deals with decoding the incoming streaming data and preparing
it for model training. The data structure is straightforward. The Lambda function receives a
dictionary with a list named Records. Each list element contains a dictionary named kinesis, which
includes metadata for the given record and the raw data. Note that the data from a Kinesis stream
arrive base64-encoded in a Lambda function, because Kinesis is agnostic of the
type of data it carries. This means you need the base64 package to decode the JSON string. The
function extracts the capacity field from each record and appends it to a list. The model fitting itself
then requires only a few commands.
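The decoding step described above can be sketched as follows. The event layout (Records, kinesis, data) is the one Lambda uses for Kinesis triggers and the capacity field comes from the text; the helper name capacities_by_agent and the agent identifier field "id" are assumptions for illustration.

```python
import base64
import json
from collections import defaultdict


def capacities_by_agent(event, id_field="id"):
    """Decode the Kinesis records of a Lambda event and group capacities per agent."""
    grouped = defaultdict(list)
    for record in event["Records"]:
        # The payload arrives base64-encoded; decoding yields the JSON bytes.
        payload = base64.b64decode(record["kinesis"]["data"])
        row = json.loads(payload)
        grouped[row[id_field]].append(row["capacity"])
    return dict(grouped)
```

Each list in the result is a capacity time series for one agent and can be passed to the forecasting model.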
AWS S3
AWS Simple Storage Service, commonly referred to as “S3”, is a scalable online object store. It
allows you to store any data that other applications consume. Data are organized in buckets. Each
bucket can be configured separately, including access rights, visibility and read/write permissions.
Each of our Firehose processes (one for each query) writes its result data into a different S3 bucket.
S3 does not limit the storage you can use. Instead, you pay for what you store and how much you access it.
This makes S3 scalable. Furthermore, it comes with high availability and performance. Just as with all
the other services, Amazon provisions the hardware and software required to use the
resources, without the need for your own physical hardware. Libraries for S3 are available for many
programming languages, so even non-AWS applications can access the data in S3.
AWS Quicksight
The last tool in our chain is AWS Quicksight. Quicksight is a Business Intelligence tool that can
connect to many data sources for graphical analysis. The frontend is intuitive and simple. To
pull data from S3, just click “Add Data Source” and select “S3”. Quicksight requires a JSON manifest file
that specifies the S3 bucket(s) and subfolders from which it is supposed to pull the data. Let us look
at an example:
{
  "fileLocations": [
    {
      "URIPrefixes": [
        "s3://the-bucket/subfolder/"
      ]
    }
  ],
  "globalUploadSettings": {
    "format": "JSON"
  }
}
The URIPrefixes field sets the directory(ies) to search for data. Quicksight parses all entries and
pulls all subfolders and files under each prefix. In the given example, it parses all files and subfolders
under s3://the-bucket/subfolder/.
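A manifest with this structure is easy to get wrong by hand, for example through a trailing comma. One way to avoid that is to generate it; the sketch below uses the standard json module, and the helper name build_manifest is hypothetical.

```python
import json


def build_manifest(uri_prefixes, data_format="JSON"):
    """Return a manifest string with the structure shown in the example."""
    manifest = {
        "fileLocations": [{"URIPrefixes": list(uri_prefixes)}],
        "globalUploadSettings": {"format": data_format},
    }
    # json.dumps always produces syntactically valid JSON.
    return json.dumps(manifest, indent=2)
```

Calling build_manifest(["s3://the-bucket/subfolder/"]) yields a manifest equivalent to the example in the text.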
Now let us have a look at the most interesting graph, the battery capacity forecast, shown
in Figure 6. We see how the forecast resembles the pattern of the actual capacity development,
so the AR method is a good fit for this cycle. The algorithm always uses the last 100
observations to forecast the next 100 steps. The forecast successfully predicted the trend,
and we could use it as a warning for a battery fault. The numbers will not be 100% accurate, but a
forecast does not have to hit the exact numbers; it only has to follow the trend closely enough to
allow for measures that avoid a battery fault.
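Turning such a forecast into a warning can be as simple as scanning the predicted values for the first one below the failure threshold. This is a sketch under that assumption, not the logic used in the showcase; the function name first_fault_step is hypothetical, and the 14,000 threshold is the failure capacity from the model description.

```python
def first_fault_step(predicted_capacities, threshold=14_000.0):
    """Return the first forecast step (1-based) with capacity below the threshold.

    Returns None if no fault is predicted within the forecast horizon.
    """
    for step, value in enumerate(predicted_capacities, start=1):
        if value < threshold:
            return step
    return None
```

With a horizon of 100 steps, a result other than None means that a fault is expected within the next 100 simulation steps, and a warning could be raised.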
Learnings
The goal of this proof of concept was to develop a scalable data intelligence application using
simulation data. This approach supports machine learning projects at an early stage, when
real-world data are scarce or hard to come by. With cloud infrastructure, complex machine
learning applications are easy to deploy. Thanks to the cloud, we could easily extend this setup with
further applications that consume the simulation data stream. Furthermore, we showed that
simulation data is useful for prototyping machine learning.
As a data scientist, I could now test other forecasting methods such as moving average, ARIMA or
even deep learning using the same data we just generated in the simulation, and even test them
side by side and compare the results within Quicksight.
We believe that at early stages of Machine Learning / Data Science projects, simulation can
support decision making regarding the choice of data sources, AI methods and budgeting.