
ATM cash flow prediction: Exploratory Data Analysis and
results from sequence-based predictions

Abstract
Commercial banks are required to maintain a minimum amount of cash in their ATMs (Automated Teller Machines) in order to maintain customer satisfaction. Banks create an estimate for the daily replenishment of their ATMs; however, this often results in out-of-cash or overstock situations. Higher-accuracy models are needed to predict the cash inflow required for the following day by examining and learning from past transactional data. We first perform an in-depth descriptive analysis to determine the trends that a learning model needs to discriminate from noise in the data, and frame the problem as regression. A number of sequence prediction techniques are explored, including LSTMs for time series, which, to the best of our knowledge, have not previously been applied to the cash estimation problem. The LSTM models provide better results than the other methods, with accuracy standing at 98%.

© 2019 Published by Elsevier Ltd.


Keywords: ATM Cash Flow, Regression, Long Short Term Memory, Exploratory Data Analysis, Time Series

1. Introduction

Maintaining a minimum amount of cash in ATMs (Automated Teller Machines) is often a regulated
operating condition for commercial banks, imposed by central banks in order to ensure a continuously
satisfactory customer experience. The number of cardholders is growing day by day, and with these increasing
figures, cash withdrawal transactions too have become increasingly significant, especially in countries in a
state of transition to a paperless system [1]. However, the usual modus operandi of banks is to make an
estimate based on the expected value, without taking into account the nature of cash inflow and outflow at a
particular ATM. The replenishment amount may vary from ATM to ATM, depending on various factors such as
location (where the ATM is installed), peak factor (the peak hours of transactions), day of week, day of month,
and many more. For example, an ATM located in a mall has a higher transaction load on weekends compared
to an ATM located in a residential area. This way of handling cash may result in out-of-stock or over-stock
situations. According to a survey conducted in 2013, 21 percent of people use other banks' ATMs due to
out-of-cash situations [1]. With the introduction of white-label ATMs, this becomes a major business concern
as well. Hence there is a need for an effective cash management solution that enables a bank to forecast the
need for cash replenishment in its ATMs. The algorithms/techniques must also be flexible enough to allow the
bank to re-forecast future demand and perform what-if analyses to optimize the bank's ATM network for cash
distribution [12].
Some of the most widely deployed solutions for ATM networks are shown in Table 1. These would benefit
from the integration of algorithms that could predict cash flows in and out of an ATM.

Table 1: Popular commercial solutions in use.

Company                  Software           Web page link

Carreker Corporation     iCom               http://www.carreker.com/main/solutions/cash/icom.htm
Morphis, Inc             MorphisCM          http://www.morphisinc.com/product.php?pageIn=MorphisCM
Transoft International   OptiCa$h           http://www.transoftnic.com/site/index.php?option=com_content&task=view&id=18&Itemid=40
Wincor Nixdorf           Pro Cash Analyser  http://www.wincor-nixdorf.com/internet/com/Product/Software/Banking/CashManagement/Main.html

The key role of an ATM cash forecasting model is to statistically relate principally the date, but also
location and other ancillary factors, with the amount withdrawn. Although this is a classical continuous-
value prediction problem along a time series, the ancillary factors mentioned above add significant
stochasticity, and the prediction differs from one ATM to another. Hence it is difficult for banks to develop a
single forecasting model for their entire ATM network.
In fact, the location of an ATM may be one of the more important features. Hence, for the purpose of our
research, we have chosen ATMs with sporadic usage patterns, and our aim is to propose a novel data-driven
approach for ATM cash flow prediction on an indigenous dataset. The objectives of this work are:

• To collect, curate, understand and preprocess the data, and perform exploratory data analysis to generate
descriptive statistics that can guide algorithmic choices;
• To work with a dataset that incorporates local conditions, such as customer cash withdrawal trends and
the effect of local holidays;
• To propose a data-driven approach for ATM cash outflow prediction, framed as a non-iid problem.

2. Literature Review

Numerous authors have worked to alleviate the ATM cash problem. Ghodratia et al. present research
on ATM cash management in which a genetic algorithm is used to determine the refill amount of each ATM
[5]. The data in their research were collected via survey and consist of transaction records for the years
2011-2012 of the Iranian bank Ayandeh (Ayandeh Bank of Iran). On the basis of their exploration, the authors
concluded that some of the bank's ATMs need to be replenished every 3 days, while others should be
replenished with cash on a daily basis.
Armenise et al. identified the need for the availability of cash in ATMs and tried to improve cash flow
by modeling and predicting customer needs. They used a genetic algorithm as a tool for searching and
generating efficient replenishment strategies, both to avoid holding unnecessary amounts of cash in the
ATMs and to assure the cash withdrawal service. They experimented with data from 30 Italian ATMs located
in different areas of different Italian cities and characterized by different transaction volumes. Furthermore,
the ATMs in the data differ in location (i.e. rural area, shopping center, tourist location or other strategic
areas), position (i.e. in malls, through the wall) and cash dispensing capability. They divided the whole set
of 30 ATMs into two groups: the first group contained data from twenty ATMs used to train the model, while
the second group contained 10 ATMs for testing the model [1]. They reached the conclusion that the genetic
algorithm is better than a human-based approach to maintaining, predicting and inserting cash in ATMs.

Darwish also worked on improving the estimation accuracy of ATM cash demand. The approach he used
in 2013 is an extension of the ANN, namely an Interval Type-2 Fuzzy Neural Network (IT2FNN). He used
simulated data of 25 ATMs for his experiments. The data exhibit daily, weekly and monthly seasonality with
localized sudden changes (gazetted/public holidays and festival effects), and were used to imitate customers'
cash withdrawals from ATMs characterized by different transaction volumes. The experiments showed that
the average forecast accuracy (per week) of the proposed technique is about 97.72%, while the minimum
forecast accuracy is 94.15% [6].
In 2014, Abirami et al. used a data mining approach to deal with ATM cash prediction. The key objectives
of their work were to provide easy identification of ATM usage norms and to monitor the ATM peak usage
times, so that the ATM is available when it is needed most. The data initially consisted of 30 days of
transactions of one ATM for testing purposes, and was later extended to the transactions of 30 ATMs carried
out in a month. Usage was calculated on the basis of location for each ATM. The results in their work were
predicted from past data in order to track the appropriate solution needed. The types of transactions carried
out frequently are characterized, and on the basis of that characterization, services better than normal can be
provided to the customers. Peak hours were also identified, demonstrating on which day of a month a customer
will use a particular ATM most, so that more convenience can be provided during peak hours [2].
Simutis et al. demonstrate an approach based on an artificial neural network (ANN) to forecast the daily
cash demand for every ATM in a bank's network, and they devised a procedure for the cash upload of ATMs.
They discussed existing solutions for ATM cash flow prediction and studied a bank network comprising
1225 ATMs. In their technique, they considered the most important factors for ATM maintenance, such as
the cost of cash, the cost of uploading and the cost of daily services. Their work showed that in the case of
higher interest rates and a low cost of uploading money into ATMs, their procedure reduces ATM maintenance
costs by up to 15-20%. However, they pointed out that further experimental studies are necessary for practical
deployment [3].
Simutis et al. also examined two different techniques to forecast the cash needed for withdrawal trans-
actions in ATM networks. The first method was based on an artificial neural network (ANN), while the second
was based on the support vector regression (SVR) algorithm. They started from the common drawbacks
of the best-known cash management solutions for ATM networks around the world, and tried to address
these drawbacks in their experiments. They performed tests to gauge the efficiency of the two methods
using data from 15 real ATMs recorded over 2 years. Their results showed that the ANN provided better
results than SVR, despite SVR's known capabilities. They also used their experimental results to improve
some existing cash management solutions [3].
Genevois et al. highlighted the problems of ATM location and cash management in automated teller
machines in 2015. In their research they discussed two problems that banks face: finding suitable locations
for ATMs, and the cash management strategy. The authors suggested new ways to place new ATMs
according to location. In addition, they discussed in detail the various techniques adopted by other
researchers for optimal cash management [4].
Teddy and Ng propose a local learning model, the pseudo self-evolving cerebellar model articulation
controller (PSECMAC), to forecast ATM cash demand on a daily basis [15]. In their experiment they used
data from 111 ATM cash withdrawal series. They ignored patterns and seasonality in their model; to check
the performance of the model they compared it with local and global learning-based models. Ben Taieb et
al. conducted their experiments in the same fashion, but also included the effects of seasonality, input
selection, and forecast strategies.
Arora et al. suggested an application of the fuzzy ARTMAP network for analyzing and forecasting the daily
cash requirement of an ATM, to ensure maximum cash availability and withdrawal service. Historical
data of 2 years from a local bank, covering cash management at ATMs at different locations, were considered
for the experiment. The accuracy achieved in their experiment was more than 95%. The feature set used to
train the model had 13 features, and the prediction error was between 3-5 percent only [9].
Zandevakili et al. showed how ATM cash management can be done intelligently using type-II fuzzy logic.
They conducted their experiment in a simulated environment consisting of 25 ATMs and concluded that the
fuzzy approach reduces the cash deposit rate. The overall average prediction accuracy of the proposed method
(per week) is about 97.72%, and the minimum prediction accuracy is 94% [10].
Laukaitis demonstrated a functional autoregressive model technique to predict cash in ATMs and the
number of transactions in electronic payment channels (ATM: automated teller machine, and POS: point
of sale), treated as a continuous-time stochastic process over an entire time interval. The author focuses on
two linear wavelet methods in his research: regularized wavelet-vaguelette estimators and the projection
method. Furthermore, he used the credit card payment data of a Lithuanian bank to train the model. He
concludes that both models provide very close predictions in terms of MISE (Mean Integrated Squared
Error) [11].
Brentnall et al. construct methods for forecasting the daily amounts withdrawn from automated
teller machines (ATMs). The data they used consisted of two years of information from 190 ATMs
of a United Kingdom bank. They applied and compared different existing models, such as linear models,
autoregressive models, structural time series models and Markov-switching models. Furthermore, they
experimented with a different model for each ATM, and used a logarithmic scoring rule to determine the
most suitable seasonal and distributional assumptions for each ATM. In their work they mentioned that, by
using a different technique for each ATM, they had a chance to examine the pros and cons of each technique.
In their study, a performance indicator was used to evaluate each method/model on the available data [12].
In additional research, they used a random-effects model to predict the amount a customer typically
withdraws on each visit. They used a multinomial distribution for the distribution of amounts, and modeled
the distribution of the random effects with the Dirichlet distribution and the empirical distribution.
They used a sample of 5,000 UK bank accounts to perform several tests and to analyze the efficiency of their
models. They concluded that the empirical distribution of random effects works well with 5,000 accounts,
but noted that banks hold millions of accounts [13].
Castro worked on two problems related to cash management: one is cash management in ATMs,
and the other is the compensation of credit card transactions, where action must be taken according to
uncertain future customer demand. In his research, stochastic programming models are used for these
two problems. Two models were presented for ATMs: a short-term model with a single refilling, and a
mid-term model covering more than one refilling, with fixed and staircase costs. The research used MILP
equivalent deterministic formulations; the models were written in the AMPL modeling language, and the
CPLEX 9.1 solver was used to find efficient or near-efficient solutions [14].
Managing cash in ATMs is an important subject in the banking world. Money should be arranged in a
timely manner and each ATM should be refilled according to the forecasts. If the forecast is higher than the
actual demand, money is wasted; if the forecast is lower than the actual demand, ATMs will frequently go
out of cash, leading to customer dissatisfaction [20]. Therefore, it is important to forecast accurately and
efficiently. However, the related literature on ATM cash management solutions is very limited. Ekinci et al.
suggested grouping ATMs into nearby-location clusters and forecasting the aggregate daily cash withdrawals
of each cluster. Their studies showed that this integrated forecasting and optimization procedure performs
well in minimizing the costs of loading cash, customer dissatisfaction, and cash interest [16].

3. Data sensitivity and data acquisition

The biggest limitation in this research was the acquisition of data. Banks do not usually expose their
data, due to the following issues:
• The data contain customer information such as the card number along with the Track2 data (the data
behind the mag stripe), which can be exploited in many ways.
• Banks must comply with the Payment Card Industry Data Security Standard (PCI DSS), a set
of security standards designed to ensure that all companies that accept, process, store or transmit
credit/debit card information maintain a secure environment.
• Another reason for not sharing data publicly is customer retention: no bank wants other banks to
capture its customers.
In this section we discuss the exploratory data analysis (EDA): how we extracted features from the data,
how we created the new feature set, and how we plotted the data to analyze patterns.
We collected the ATM transaction data of ten ATMs spread across various localities of Karachi over two and
a half years. The data consist of 120,246 rows. The transactions include withdrawals, fund transfers, title
fetches, bill payments, PIN changes, balance inquiries and more. We also managed to obtain 2.5 years of
ATM replenishment data, which record how much cash was inserted into each ATM and how much cash was
present before the new cash was inserted. The transaction data we received from the bank consist of 60
columns. Each column contains transaction information, such as date, amount, time, terminal ID (ATM ID),
ATM location, currency code, customer ID, card number, track information, transaction type, channel type
and transaction ID. Each of these columns carries unique information: the currency code specifies the
currency in which the transaction was performed; the amount specifies how much was transacted; the track
information is the data stored on the mag stripe, which cannot be read by the human eye; the channel type
indicates the source channel from which the transaction was initiated; the transaction log ID is the unique
reference number for each transaction; and the transaction type indicates the type of transaction. Table 2
shows the transaction codes and descriptions, along with the transaction counts recorded over the 2.5 years.

Table 2: Transaction Code Description

Transaction Code Transaction Description Count


01 Withdrawal 86128
30 Balance Inquiry 18274
43 Open Ended 1410
44 Inter Bank Funds Transfers 2626
50 PIN Change 1
53 Mini Statement 3197
54 PIN Change 25
56 Cheque Book Request 3
57 PIN Validation ATM 1553
63 Title Fetch 5467
71 Bill Relationship Inquiry 615
72 Bill Inquiry 558
73 Bill Payment 389

Our problem statement was to forecast cash withdrawals, therefore we eliminated all transaction
data except withdrawals. After elimination, our dataset was reduced to around 86,129 rows. In this data
we have two kinds of withdrawal transactions: approved (RESPONSE CODE = 000), which guarantees that
money was dispensed in the transaction, and not approved (RESPONSE CODE != 000), which means the
transaction was rejected for some reason. Table 3 shows the response codes and their descriptions, along
with the counts of approved and not-approved transactions.

Response Code Response Description Count


000 Approved 74951
001 Limit Exceeded 1731
002 Account Not Found 254
003 Account Inactive 62
004 Low Balance 2879
007 Card not found 6
009 Error in input data 10
014 Warm Card 326
015 Stolen/Lost Card 14
017 Customer Cancellation 163
024 Bad PIN 542
025 Expired Date 24
028 Account not linked 199
039 No Credit account 1
041 Expired date mismatch 5
045 Unable to process 272
050 Host status unknown 100
053 No saving account 31
055 Host link down 894
058 Timed out 1018
060 PIN retries exhausted 127
061 HSM not responding 24
079 Honor with ID 17
083 No Comms Key 1
091 Issuer reversal 263
094 TXN is not allowed 3
096 Transaction rejected 10
097 Cash has expired 21
104 Account Blocked 148

Table 3: Response codes and descriptions of transactions
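As a minimal sketch, the filtering step described above might look as follows in pandas. The file name and column names here are illustrative placeholders, not the bank's actual schema:

```python
import pandas as pd

# Load the raw transaction log; file and column names are hypothetical.
df = pd.read_csv("txn.csv", parse_dates=["date"])

# Keep only withdrawal transactions (transaction code 01). Both approved
# and rejected withdrawals are retained, as described in the text; approved
# ones can be identified by response code 000.
withdrawals = df[df["tran_code"] == "01"]
approved = withdrawals[withdrawals["response_code"] == "000"]
```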

The replenishment data contain the denomination held by each cassette, along with the note counts.
They also record how many notes were inserted into each particular ATM. There were 459 rows altogether.

4. Pre-processing and Exploratory Data Analysis

In the initial stage of our research we performed EDA (Exploratory Data Analysis) on our data. We
analyzed the filtered data, which consist of the transaction amount and transaction date.

Fig. 1: Exploratory data analysis between transaction amount and transaction date

Figure 1 shows that the pattern is very scattered, making it difficult to see any structure in the data with
the naked eye. Figure 2 below contains the month-wise transaction pattern of withdrawals for the years
2013, 2014 and 2015.

Fig. 2: Pattern of transactions done over 2.5 years

For the year 2014, it can be noticed that people withdrew larger amounts from the ATMs than in 2013.
There can be multiple reasons for this; one important reason may be the inflation rate, since as inflation
increases, the amount of cash withdrawn from ATMs tends to increase.
Figure 3 below shows the yearly withdrawal amounts from the ATMs. We can observe from the figure
that the transaction rate increases in the first two months of the year, gradually decreases over the following
four months, and then increases again in July and August.

Fig. 3: Line plot over yearly transaction

Figure 4 shows the monthly transactions. We can notice from the plot that most transactions occur at
the start and end of the month, while the transaction rate is very low in the middle of the month, because
people spend less mid-month. People most likely receive their salary at the end or the start of the month,
which is why the transaction peaks are high at the start and end of the month.

Fig. 4: Figure of Monthly Transaction

Figure 5 below shows the week-wise transaction pattern. From the figure we can see that most
transactions occur on Sunday, as compared to other days, because Sunday is a weekly holiday.

Fig. 5: Weekly transaction pattern



Figure 6 below shows the daily transaction pattern of the dataset. The figure gives us the insight that
most transactions occur at the start and end of the day.

Fig. 6: Plot of daily transactions

After the data filtration process, only two columns were left: the date and the amount. The data
consisted of withdrawal transactions, both rejected and successful. We then noticed that some dates were
missing. A missing date means that no transaction took place on that particular date, so we inserted the
missing dates with a zero amount. After adding the missing dates, we summed all amounts sharing the
same date. This shrank the data to 911 rows, i.e. exactly one row per day, as shown in Table 4.

Table 4: List of features, along with a sample of the head of the table

S_No Date_Location_Transaction Transaction Amount


28 20130128 635500
29 20130129 700500
30 20130130 658500
31 20130131 762500
32 20130201 138500
33 20130202 79500
34 20130203 0
35 20130204 281500
36 20130205 323000
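A sketch of the missing-date insertion and per-day aggregation, continuing the hypothetical column names from the earlier snippet:

```python
import pandas as pd

# Sum all withdrawal amounts that share the same calendar date.
daily = withdrawals.groupby(withdrawals["date"].dt.normalize())["amount"].sum()

# Insert missing dates with a zero amount (no transactions that day),
# yielding exactly one row per day.
full_range = pd.date_range(daily.index.min(), daily.index.max(), freq="D")
daily = daily.reindex(full_range, fill_value=0)
```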

From the concepts of under-fitting and over-fitting, we know that two features are too few to forecast
the cash withdrawal amount. Therefore we needed more features, so that patterns in the data could be
detected more easily. To achieve this, the first column added was IS_Weekday, which indicates whether
the day was a weekday or not. Our data belong to a Pakistani bank, and Saturday and Sunday are the
weekend in Pakistan. The reason for including this column was that people normally prefer to go out and
spend their money on weekends. After adding this feature, our data can be viewed in Table 5.
The second column we added was IS_Salary, which indicates whether the date is a salary date. In
Pakistan, most people receive their salary during the first and last week of the month, that is, roughly from
the twenty-sixth to the fifth of every month. The reason for adding this column was that people mostly
perform withdrawal transactions when they have their salary in hand. After adding this feature, our data
can be viewed in Table 6.

Table 5: Bifurcating transactions on the basis of a working day

S_No Date_Location_Transaction Transaction Amount Is_WEEKDAY


1 20130101 150500 0
2 20130102 376500 0
3 20130103 662500 0
4 20130104 850500 1
5 20130105 768500 1
6 20130106 301500 1
7 20130107 388000 0
8 20130108 427500 0
9 20130109 409500 0
10 20130110 535500 0
11 20130111 512000 0
12 20130112 212000 1

Table 6: Salary days highlighted

S_No Date_Loc_Trans Transaction Amount ISWeekday ISSalaryWeek


1 20130101 150500 0 1
2 20130102 376500 0 1
3 20130103 662500 0 1
4 20130104 850500 0 1
5 20130105 768500 1 1
6 20130106 301500 1 0
7 20130107 388000 0 0
8 20130108 427500 0 0
9 20130109 409500 0 0
10 20130110 535500 0 0

The next column we added was the transaction count per day. The reason for adding it was to analyze
the pattern of transaction counts across days: it is possible that the transaction count on a given weekend
or weekday is higher than normal. Table 7 shows the resulting data.
After adding the transaction count, we added seven columns indicating the days of the week. If the value
of the WDay1 column is 1 it is a Sunday, and similarly if the value of the WDay7 column is 1 it is a
Saturday. The reason for adding one column per day of the week is to analyze weekly patterns in
withdrawals, for example the total amount transacted by the bank's customers on every Monday of each
month. Table 8 depicts a snapshot of all features after adding a column for each day of the week.

Table 7: Accumulating on the basis of daily transaction count

S_No Date_Loc_Trans Transaction Amount ISWeekday ISSalaryWeek Tran_Count


1 20130101 150500 0 1 20
2 20130102 376500 0 1 52
3 20130103 662500 0 1 68
4 20130104 850500 0 1 98
5 20130105 768500 1 1 76
6 20130106 301500 1 0 40
7 20130107 388000 0 0 30
8 20130108 427500 0 0 54
9 20130109 409500 0 0 53
10 20130110 535500 0 0 71

Before running the experiments, significant manual work was done. For example, we created data
ourselves to capture the holiday effect, naming the parameter "IsHoliday" and assigning it a value of 0 or 1
(0 means not a holiday, 1 means a holiday). To create the holiday data, we searched the internet for the
gazetted holidays observed in the years 2013-2015. The holidays captured for January 2013 to June 2015
are shown in Table 9. After processing and analyzing the data, we ran the experiments to map the
transaction pattern of the data.

Table 8: Trends across individual days of a week

S_No Date_Loc_Tran Tran_Amount Is_WeekDay Is_SalaryWeek Tran_Count WDay1 WDay2 WDay3 WDay4 WDay5 WDay6 WDay7
1 20130101 150500 0 1 20 0 0 1 0 0 0 0
2 20130102 376500 0 1 52 0 0 0 1 0 0 0
3 20130103 662500 0 1 68 0 0 0 0 1 0 0
4 20130104 850500 1 1 98 0 0 0 0 0 1 0
5 20130105 768500 1 1 76 0 0 0 0 0 0 1
6 20130106 301500 1 0 40 1 0 0 0 0 0 0
7 20130107 388000 0 0 30 0 1 0 0 0 0 0
8 20130108 427500 0 0 54 0 0 1 0 0 0 0
9 20130109 409500 0 0 53 0 0 0 1 0 0 0
10 20130110 535500 0 0 71 0 0 0 0 1 0 0
11 20130111 512000 0 0 67 0 0 0 0 0 1 0
12 20130112 212000 1 0 31 0 0 0 0 0 0 1

Table 9: Gazettes holidays’ list in the country of location

S_No Holiday_Name
1 Eid Milad un_Nabi
2 Kashmir Day
3 Pakistan Day
4 Labour Day
5 Eid-ul-Fitr Day 1
6 Eid-ul-Fitr Day 2
7 Eid-ul-Fitr Day 3
8 Eid-ul-Fitr Day 4
9 Independence Day
10 Eid-ul-Azha Day 1
11 Eid-ul-Azha Day 2
12 Eid-ul-Azha Day 3
13 1st Day of Ashura
14 2nd Day of Ashura
15 Iqbal Day
16 Christmas Day
17 Quaid-e-Azam Day
18 Day After Christmas
19 Diwali/Deepavali
20 Holi
21 Easter Monday
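The feature construction of this section can be sketched as follows. The salary window (twenty-sixth to fifth) and the Saturday/Sunday weekend follow the text; the holiday dates shown are hypothetical placeholders, not the actual gazetted list:

```python
import pandas as pd

# 'daily' is the one-row-per-day series built earlier.
features = daily.to_frame()
dates = features.index

features["is_weekday"] = (dates.dayofweek < 5).astype(int)   # Sat/Sun are the weekend
features["is_salary_week"] = ((dates.day >= 26) | (dates.day <= 5)).astype(int)

# One indicator column per day of the week (WDay1 = Sunday ... WDay7 = Saturday).
for i in range(7):
    features[f"wday{i + 1}"] = ((dates.dayofweek + 1) % 7 == i).astype(int)

# Holiday flag from the hand-curated list of gazetted holidays
# (these two dates are hypothetical examples).
holidays = pd.to_datetime(["2013-02-05", "2013-03-23"])
features["is_holiday"] = dates.isin(holidays).astype(int)
```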

5. Experimentation

In this section, we start by mapping the transaction pattern of the 2.5 years, as shown in figure 7. In
the figure we can notice that most daily transaction totals lie between 50k and 100k, indicating the
transactional pattern. The correlation between transaction amount and transaction count is 0.94, which
indicates that most transactions are of small amounts. Our problem is a regression problem. Normally,
a regression analysis is carried out for two purposes: first, to predict the value/behavior of the dependent
variable for individuals for whom we have some information on the explanatory variables; and second,
to estimate the influence of some explanatory variable on the dependent variable. In our case the output
variable is the transaction amount (we are predicting the amount needed on the next day) and the rest of the
parameters are input variables.

Fig. 7: Relationship between transaction amount and date

Figure 7 shows the relationship between transaction amount and date. The green spots show the dates
in 2013, while the blue and red spots indicate the dates in 2014 and 2015, respectively. In the plot, it can
be noticed that most daily transaction totals lie between 50k and 100k, which clearly shows the pattern in
the data. The next plot, figure 8, shows the relationship between transaction amount and transaction count.
The Pearson coefficient value of 0.94 clearly indicates that the two variables are highly correlated. The
maximum transaction count encountered in our data is 200 and the maximum transaction amount is 250k.
The relationship is linear, which means that as the transaction count increases, the withdrawal amount also
increases, as shown in figure 8.

Fig. 8: Correlation between transaction amount and transaction count

A high correlation between transaction amount and transaction count indicates that most transactions
were of a small amount of money, and no single large transaction accounted for a large chunk of the money.
All the experiments were carried out with a 70/30 split, meaning 70 percent of the data is used for training
the model and 30 percent for testing it. We used the Scikit-learn library to conduct our experiments. Below
is the family of functions used for regression, in which the output value is expected to be a linear
combination of the input variables. In mathematical terms, if y(w, x) is the output value, then:

y(w, x) = w_0 + w_1 x_1 + ... + w_p x_p    (1)

The formula above designates the vector w = (w_1, ..., w_p) as coef_ and w_0 as intercept_.
Linear regression fits a linear model with coefficients w = (w_1, ..., w_p) to minimize the residual sum of
squares between the observed responses in the dataset and the responses predicted by the linear approxima-
tion. Mathematically, it solves a problem of the form:

min_w ||Xw - y||_2^2    (2)

5.1. Linear Regression Model


With the Linear Regression model and default settings, we obtained the following result: mean squared
error 0.05, variance score 0.79. The MSE shows that we are about 0.05 away from the target (in normalized
units). The ideal variance score is 1; in our case it is 0.79, which is acceptable.
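A minimal Scikit-learn sketch of this baseline, assuming the engineered feature table from Section 4 (the variable and column names are illustrative):

```python
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score

# X: engineered features; y: the (scaled) daily withdrawal amount.
X = features.drop(columns=["amount"])
y = features["amount"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, shuffle=False)  # 70/30 split, keeping time order

model = LinearRegression().fit(X_train, y_train)
pred = model.predict(X_test)
print("MSE:", mean_squared_error(y_test, pred))
print("variance (R^2):", r2_score(y_test, pred))
```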

5.2. Linear regression with ridge regularisation


Ridge regression is used when the data suffer from multicollinearity. Under multicollinearity, even
though the ordinary least squares (OLS) estimates are unbiased, their variances are large, which deviates
the observed value far from the actual value. By adding a degree of bias to the regression estimates, ridge
regression reduces the standard errors. Ridge regression addresses the multicollinearity problem via the
shrinkage parameter λ (lambda). Some important points about ridge: its assumptions are the same as those
of least squares regression, except that normality is not assumed; it shrinks the values of the coefficients
but does not set them to 0, so it performs no feature selection; it regularizes using the l2 penalty. After
running ridge we obtained the following results: mean squared error 0.03 and variance score 0.89. It can be
noticed that both the MSE and the variance score improved.

5.3. Linear regression with lasso regularisation


The Least Absolute Shrinkage and Selection Operator (lasso) is a regression method that constrains
the absolute size of the regression coefficients. By constraining the sum of the absolute values of the esti-
mates, we end up in a situation where some of the parameter estimates may be exactly 0. The larger the
penalty applied, the further the estimates are shrunk towards 0. This is convenient for automatic variable
selection, or when dealing with highly correlated predictors, where standard regression will usually have
'too large' coefficients. After running lasso we obtained the following results: mean squared error 0.03 and
variance score 0.90. It can be noticed that both the MSE and the variance score improved.

5.4. Linear regression with Bayesian ridge regularisation


Bayesian linear regression is a method of linear regression in which the statistical analysis is carried
out within the context of Bayesian inference. After running Bayesian ridge regression we obtained the
following results: mean squared error 0.03 and variance score 0.90.
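The regularized variants can be compared on the same split; a sketch under the same assumptions as the previous snippet (the alpha values are illustrative defaults, not tuned settings from the paper):

```python
from sklearn.linear_model import Ridge, Lasso, RidgeCV, BayesianRidge
from sklearn.metrics import mean_squared_error, r2_score

# Compare the regularized linear models on the same 70/30 split as above.
models = {
    "ridge": Ridge(alpha=1.0),
    "lasso": Lasso(alpha=0.1),
    "ridge_cv": RidgeCV(alphas=[0.1, 1.0, 10.0]),  # picks alpha by cross-validation
    "bayesian_ridge": BayesianRidge(),
}
for name, m in models.items():
    pred = m.fit(X_train, y_train).predict(X_test)
    print(name, mean_squared_error(y_test, pred), r2_score(y_test, pred))
```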

5.5. Time-series regression


Time series modeling is a method for forecasting and prediction. It makes decisions by working over
time (minutes, hours, days, years) and finds hidden insights in the data. It works well when the data are
autocorrelated. A time series is basically a set of data points gathered at constant time intervals. Two
things make time series special and different from linear regression. First, it is time dependent, unlike
linear regression, which assumes that observations are independent; second, it captures seasonal trends in
the data, for example that most transactions occur just before gazetted holidays. In order to run a time
series model we assumed that the time series (TS) is stationary, i.e. that its statistical properties such as
mean and variance remain the same over time. This matters because there is then a high probability that
the series will follow the same pattern in the future. To check stationarity we did two things. First, we
plotted rolling statistics: the moving average and moving variance, to see whether they vary over time. The
second check is the Dickey-Fuller test, which produces a 'Test Statistic' and 'Critical Values' at different
confidence levels. If the 'Critical Value' is greater than the 'Test Statistic', we can say that the series is
stationary, as shown in figure 9.
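A sketch of both stationarity checks with pandas and statsmodels, continuing the series name from the earlier sketches (the 30-day window is an illustrative choice):

```python
from statsmodels.tsa.stattools import adfuller

# Rolling statistics: mean and standard deviation over a 30-day window.
rolling_mean = daily.rolling(window=30).mean()
rolling_std = daily.rolling(window=30).std()

# Augmented Dickey-Fuller test: compare the (signed) test statistic
# against the critical values to judge stationarity.
stat, pvalue, lags, nobs, critical_values, _ = adfuller(daily)
print("Test Statistic:", stat)
print("p-value:", pvalue)
for level, cv in critical_values.items():
    print(f"Critical Value ({level}):", cv)
```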
From figure 9 we concluded that the difference in standard deviation is very small, but the mean clearly
increases and decreases with time, and thus the series is not stationary. Although the test statistic is below
the critical values, note that the signed values should be compared, not the absolute values.

Fig. 9: Analyzing stationarity of time series with rolling statistics

The results of the Dickey-Fuller test are:

Test Statistic                 -4.880407
p-value                         0.000038
Lags Used                      21
Number of Observations Used   889
Critical Value (1%)            -3.437727
Critical Value (5%)            -2.864797
Critical Value (10%)           -2.568504

From the above we concluded that we have to make the time series stationary. To do so, we need to
model the trend and seasonality in the distribution and remove them from the series to obtain a stationary
distribution. One method to stationarize a series is to subtract its moving average. In this method, we take
the average of the 'k' prior values, depending on the frequency of the time series; here we take the average
over the past month, i.e. the last thirty values. After subtracting the moving average and re-running the
Dickey-Fuller test, we obtained the plot and distribution shown in figure 10.

Fig. 10: Distribution of model after stationarization
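A sketch of this detrending step, under the same assumptions as the previous snippets:

```python
from statsmodels.tsa.stattools import adfuller

# Subtract the 30-day moving average to remove the trend, then drop
# the initial window where no average is defined.
moving_avg = daily.rolling(window=30).mean()
detrended = (daily - moving_avg).dropna()

# Re-run the Dickey-Fuller test on the detrended series.
stat, pvalue, *_ = adfuller(detrended)
print("Test Statistic:", stat, "p-value:", pvalue)
```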

The results of the Dickey-Fuller test on the detrended series are:

Test Statistic                 -1.073524e+01
p-value                         2.910471e-19
Lags Used                       2.100000e+01
Number of Observations Used     8.880000e+02
Critical Value (1%)            -3.437735e+00
Critical Value (5%)            -2.864800e+00
Critical Value (10%)           -2.568506e+00

Figure 10 shows a much better series. The rolling values still vary slightly, but there is no longer a trend
in the series. In addition, the test statistic is smaller than the 5% critical value, so we can conclude with
95% confidence that the series is stationary. Now that the time series is stationary, we can run our time
series forecasting experiments; the approach we use is ARIMA (Auto-Regressive Integrated Moving
Average). This approach resembles a linear equation, the difference being that the predictors depend on
the three parameters p, d and q of the ARIMA model.
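A sketch of an ARIMA fit with statsmodels; the order (p, d, q) = (2, 1, 2) is an illustrative choice, not the paper's reported configuration:

```python
from statsmodels.tsa.arima.model import ARIMA

# Fit an ARIMA model on the daily series.
model = ARIMA(daily, order=(2, 1, 2))
fitted = model.fit()

# Residual sum of squares (RSS) of the in-sample fit.
rss = (fitted.resid ** 2).sum()
print("RSS:", rss)

# One-step-ahead forecast for the next day.
print(fitted.forecast(steps=1))
```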

Fig. 11: Analyzing stationarity of time series after calculating Moving Averages

Figure 11 indicates that our prediction is not accurate. We created the series from two variables, amount
and date, as a time series object. The blue line in the graph shows the original series, whereas the red line
shows the prediction. The RSS (Residual Sum of Squares) indicates an error of 351, which is not
acceptable; after normalization of the RSS we obtained a normalized root squared sum of 0.30. We also ran
the ARMA model on weekly aggregates, but the results were not satisfactory either.

5.6. Recurrent Neural Network - Long Short-Term Memory (LSTM) Model


The long short-term memory network is an RNN that is trained with backpropagation through time. An
LSTM contains memory blocks, instead of neurons, that are connected through layers. To set up our neural
network experiment we used the transaction amount feature, because we are predicting the transaction
amount: given the amount transacted (in units of thousands) on a day, what is the amount transacted the
next day? We created three variables: one contains the transaction amount of the day (X), the second
contains the transaction amount of the next day (Y), and the third contains the transaction amount of the
day after that (Z). Figure 12 contains the plot of the test with the LSTM.
Figure 12 shows the plotted data: the original dataset in blue, the predictions on the training dataset in
green, and the predictions on the unseen test dataset in red. In this experiment, we divided the dataset into
training and testing sets at a ratio of 67:33 and ran 20 epochs. With this setup of variables, we got an error
of 0.027 on the training dataset and 0.028 on the testing dataset, indicating that we are about 0.027 away
from the actual result (in normalized units).
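A minimal Keras sketch of this window-based setup; the layer size, scaling, and batch size are illustrative assumptions, as the paper does not specify its exact architecture:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

# Scale the daily amounts to [0, 1] and build (day t -> day t+1) pairs.
scaler = MinMaxScaler()
series = scaler.fit_transform(daily.values.reshape(-1, 1))
X = series[:-1].reshape(-1, 1, 1)   # [samples, time_steps=1, features=1]
y = series[1:]

split = int(len(X) * 0.67)          # 67:33 train/test split, keeping time order
X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]

model = Sequential([LSTM(4, input_shape=(1, 1)), Dense(1)])
model.compile(loss="mean_squared_error", optimizer="adam")
model.fit(X_train, y_train, epochs=20, batch_size=1, verbose=0)
print(model.evaluate(X_test, y_test, verbose=0))
```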

5.7. Recurrent Neural Network - LSTM for Regression with Time Series
Next, we converted our problem into a time series problem: time steps provide another way to phrase
it. As in the window formulation above, we take prior time steps in the series as inputs to predict the output
at the next time step.

Fig. 12: Experiment with neural network LSTM

After reshaping the data accordingly, we ran our experiment on the previous variables and got an error
of 0.028 on the training dataset and 0.029 on the testing dataset, indicating that we are about 0.028 away
from the actual result. Figure 13 shows the test with the LSTM using time steps.

Fig. 13: Plot of LSTM for regression with time Series

As before, figure 13 shows the original dataset in blue, the predictions on the training dataset in green,
and the predictions on the unseen test dataset in red, with the same 67:33 train/test split and 20 epochs.
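The only change from the previous sketch is the tensor layout: past values are treated as time steps rather than parallel features. A sketch, assuming a hypothetical look-back of 3 days:

```python
import numpy as np

def make_timestep_dataset(series, look_back=3):
    """Build [samples, time_steps, 1] inputs from a 1-D scaled series."""
    X, y = [], []
    for i in range(len(series) - look_back):
        X.append(series[i:i + look_back])   # the prior 'look_back' days
        y.append(series[i + look_back])     # the next day's amount
    X = np.array(X).reshape(-1, look_back, 1)
    return X, np.array(y)

# 'series' is the scaled array from the previous sketch; the LSTM is then
# defined with input_shape=(3, 1) instead of (1, 1).
X, y = make_timestep_dataset(series, look_back=3)
```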

5.8. Discussion
In Section 5 we ran different regression techniques while increasing the feature set; the total feature set
used in all regression experiments was thirteen features. The features are discussed in Section 4, where we
also filled in the missing values in the dataset. After running the experiments we found the following MSE
and variance for each experiment.

Table 10: Best accuracy per algorithm

Algorithm Mean Squared Error Variance


Linear Regression Model 0.05 0.79
Ridge Linear Regression Model 0.03 0.89
Lasso Model 0.03 0.90
RidgeCV Model 0.03 0.90
LassoLAR 0.26 -0.01
Bayesian Ridge Regression 0.03 0.90
Recurrent Neural Network (LSTM Model) 0.028 0.91
LSTM for Regression with Time Series 0.029 0.91

Note that the MSE (mean squared error) indicates how far we are from the actual values, while the variance
score measures how much of the variation in the data is explained; a good variance score is close to one. We
can see from the table above that almost all of the regression techniques provided impressive results, but the
best we obtained came from Bayesian Ridge Regression and the RidgeCV model, at 0.90 variance and 0.03
MSE. One reason for the better result may be the cross-validation on the data, due to which the model is
trained and tested more effectively.
The next experiment we ran on our data was time series regression. In this experiment we kept only two
features, the transaction amount and the date, and created a stationary series in order to identify trends and
seasonality in the data. We tested the stationarity of the time series with the Dickey-Fuller test and found
95% confidence in the stationary series. After making the time series stationary we experimented with
ARIMA (Auto-Regressive Integrated Moving Average). ARIMA forecasting for a stationary time series is
essentially a linear equation (like a linear regression) whose predictors depend on the parameters (p, d, q)
of the ARIMA model: the number of AR (auto-regressive) terms (p), the number of MA (moving average)
terms (q), and the number of differencing terms (d). The RSS (Residual Sum of Squares) indicates an error
of 351, which is not acceptable; after normalization of the RSS we obtained a normalized root squared sum
of 0.30, which is still high. We therefore conclude that the ARIMA approach does not work well on our
data; figure 11 shows the resulting fit. The reasons for this result may be: too few features, asymmetric data,
improperly selected samples, and under-fitting or over-fitting.

6. Conclusions

In this paper we presented a manual analysis of the ATM out-of-cash problem, feeding into a time-series
prediction loop. The dataset used in this research contains the ATM withdrawal transaction data of one of
the largest banks of Pakistan. The study implemented various algorithms, such as linear regression, ridge
linear regression, the lasso model, the RidgeCV model, lasso LAR, Bayesian ridge regression, random forest
regression, time series prediction, an RNN, an RNN with time series, and ARIMA. In the final analysis, we
found that the LSTM-based approach provided the best solution, with more than 98% accuracy on the novel
dataset.

References
[1] R. Armenise, C. Birtolo, E. Sangianantoni, L. Troiano, "A generative solution for ATM cash management", 2010.
[2] S. Madhavi, S. Abirami, C. Bharathi, B. Ekambaram, T. Krishna Sankar, A. Nattudurai, N. Vijayarangan, "ATM service analysis using predictive data mining", International Journal of Computer, Electrical, Automation, Control and Information Engineering, vol. 8, no. 2, 2014.
[3] R. Simutis, D. Dilijonas, L. Bastina, J. Friman, P. Drobinov, "Optimization of cash management for ATM network", Information Technology and Control, vol. 36, no. 1A, 2007. ISSN 1392-124X.
[4] M. E. Genevois, D. Celik, H. Z. Ulukan, "ATM location problem and cash management in automated teller machines", International Journal of Social, Behavioral, Educational, Economic, Business and Industrial Engineering, vol. 9, no. 7, 2015.
[5] A. Ghodratia, H. Abyakb, A. Sharifihosseinic, "ATM cash management using genetic algorithm", Management Science Letters, vol. 3, pp. 2007-2014, 2013.
[6] S. M. Darwish, "A methodology to improve cash demand forecasting for ATM network", International Journal of Computer and Electrical Engineering, vol. 5, no. 4, August 2013.
[7] M. H. Pour Kazemi, E. Sedaght Parast, M. Amini, "Prediction of optimal cash injection in automated teller machines, ARMA approach", 2014.
[8] V. Kamini, V. Ravi, A. Prinzie, D. Van den Poel, "Cash demand forecasting in ATMs by clustering and neural networks", 30 September 2013.
[9] N. Arora, J. K. R. Saini, "Approximating methodology: managing cash in automated teller machines using fuzzy ARTMAP network".
[10] M. Zandevakili, M. Javanmard, "Using fuzzy logic (type II) in the intelligent ATMs' cash management", International Research Journal of Applied and Basic Sciences, vol. 8, no. 10, pp. 1516-1519, 2014. ISSN 2251-838X, available online at www.irjabs.com.
[11] A. Laukaitis, "Functional data analysis for cash flow and transactions intensity continuous-time prediction using Hilbert-valued autoregressive processes", International Science Index, Industrial and Manufacturing Engineering, vol. 9, no. 7, 2015. waset.org/Publication/10002685.
[12] A. R. Brentnall, M. J. Crowder, D. J. Hand, "Predictive-sequential forecasting system development for cash machine stocking", International Journal of Forecasting, vol. 26, pp. 764-776, 2010a.
[13] A. R. Brentnall, M. J. Crowder, D. J. Hand, "Predicting the amount individuals withdraw at cash machines using a random effects multinomial model", Statistical Modeling, vol. 10, no. 2, pp. 197-214, 2010b.
[14] J. Castro, "A stochastic programming approach to cash management in banking", European Journal of Operational Research, vol. 192, pp. 963-974, 2009.
[15] S. D. Teddy, S. K. Ng, "Forecasting ATM cash demands using a local learning model of cerebellar associative memory network", International Journal of Forecasting, vol. 27, pp. 760-776, 2011.
[16] Y. Ekinci, J.-C. Lu, E. Duman, "Optimization of ATM cash replenishment with group-demand forecasts", Expert Systems with Applications, vol. 42, pp. 3480-3490, 2015.
