
Prepared by Asif Bhat

Plotly / Plotly Express Tutorial using ANZ Dataset


In [1]: import numpy as np
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go
import plotly.figure_factory as ff
import plotly.io as pio
from plotly.subplots import make_subplots
import warnings
warnings.filterwarnings('ignore')

In [2]: df = pd.read_excel('ANZ synthesised transaction dataset.xlsx')


df.head()

Out[2]:
       status  card_present_flag  bpay_biller_code         account  currency       long_lat  txn_description                           merchant_id  merchant_code  first_name  ...  age
0  authorized                1.0               NaN  ACC-1598451071       AUD  153.41 -27.95              POS  81c48296-73be-44a7-befa-d053f48ce7cd            NaN       Diana  ...   26
1  authorized                0.0               NaN  ACC-1598451071       AUD  153.41 -27.95        SALES-POS  830a451c-316e-4a6a-bf25-e37caedca49e            NaN       Diana  ...   26
2  authorized                1.0               NaN  ACC-1222300524       AUD  151.23 -33.94              POS  835c231d-8cdf-4e96-859d-e9d571760cf0            NaN     Michael  ...   38
3  authorized                1.0               NaN  ACC-1037050564       AUD  153.10 -27.66        SALES-POS  48514682-c78a-4a88-b0da-2d6302e64673            NaN      Rhonda  ...   40
4  authorized                1.0               NaN  ACC-1598451071       AUD  153.41 -27.95        SALES-POS  b4e02c10-0852-4273-b8fd-7b3395e32eb0            NaN       Diana  ...   26

5 rows × 23 columns

In [3]: df.shape

Out[3]: (12043, 23)

In [4]: df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 12043 entries, 0 to 12042
Data columns (total 23 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 status 12043 non-null object
1 card_present_flag 7717 non-null float64
2 bpay_biller_code 885 non-null object
3 account 12043 non-null object
4 currency 12043 non-null object
5 long_lat 12043 non-null object
6 txn_description 12043 non-null object
7 merchant_id 7717 non-null object
8 merchant_code 883 non-null float64
9 first_name 12043 non-null object
10 balance 12043 non-null float64
11 date 12043 non-null datetime64[ns]
12 gender 12043 non-null object
13 age 12043 non-null int64
14 merchant_suburb 7717 non-null object
15 merchant_state 7717 non-null object
16 extraction 12043 non-null object
17 amount 12043 non-null float64
18 transaction_id 12043 non-null object
19 country 12043 non-null object
20 customer_id 12043 non-null object
21 merchant_long_lat 7717 non-null object
22 movement 12043 non-null object
dtypes: datetime64[ns](1), float64(4), int64(1), object(17)
memory usage: 2.1+ MB

In [7]: df.isnull().sum() # Most values in 'bpay_biller_code' and 'merchant_code' are null, so both columns will be dropped

Out[7]: status 0
card_present_flag 4326
bpay_biller_code 11158
account 0
currency 0
long_lat 0
txn_description 0
merchant_id 4326
merchant_code 11160
first_name 0
balance 0
date 0
gender 0
age 0
merchant_suburb 4326
merchant_state 4326
extraction 0
amount 0
transaction_id 0
country 0
customer_id 0
merchant_long_lat 4326
movement 0
dtype: int64
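The choice of columns to drop can be made explicit with a missing-value threshold. A minimal sketch, assuming a 50% cutoff (the notebook picks the columns by inspection) and using a small hypothetical frame in place of the ANZ data:

```python
import numpy as np
import pandas as pd

# Hypothetical sample standing in for the ANZ dataset
sample = pd.DataFrame({
    "status": ["authorized", "posted", "authorized", "posted"],
    "bpay_biller_code": [np.nan, np.nan, np.nan, "0"],
    "merchant_code": [np.nan, np.nan, np.nan, np.nan],
    "amount": [16.25, 14.19, 6.42, 40.90],
})

# Fraction of missing values in each column
null_share = sample.isnull().mean()

# Keep the names of columns whose missing share exceeds the assumed threshold
THRESHOLD = 0.5
to_drop = null_share[null_share > THRESHOLD].index.tolist()
print(to_drop)  # ['bpay_biller_code', 'merchant_code']
```

On the full data this cutoff would select the same two columns: each is about 93% null, while the next-emptiest columns (such as `card_present_flag`) are about 36% null.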

In [8]: df.country.value_counts() # This can be dropped as we are only dealing with one country

Out[8]: Australia 12043


Name: country, dtype: int64

In [9]: df.currency.value_counts() # This can also be dropped, as the dataset contains only one currency

Out[9]: AUD 12043


Name: currency, dtype: int64
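Instead of checking each column by hand, single-valued columns can be found in one pass with `nunique`. A minimal sketch on a hypothetical sample frame:

```python
import pandas as pd

# Hypothetical sample: two constant columns and one varying column
sample = pd.DataFrame({
    "country": ["Australia"] * 3,
    "currency": ["AUD"] * 3,
    "gender": ["F", "M", "F"],
})

# dropna=False counts NaN as a distinct value, so "one value plus NaN" is not treated as constant
constant_cols = [c for c in sample.columns if sample[c].nunique(dropna=False) == 1]
print(constant_cols)  # ['country', 'currency']
```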

In [10]: # Drop the mostly-null 'bpay_biller_code' and 'merchant_code' columns, plus the single-valued 'currency' and 'country' columns.


df.drop(['bpay_biller_code','merchant_code', 'currency','country'],axis=1,inplace=True)

In [11]: # Duplicates
df.duplicated().sum() # NO Duplicates

Out[11]: 0

In [12]: # Create Age buckets


df['age_group']=pd.cut(df.age,[0,20,30,40,50,60,99999],labels=['<20','20-30','30-40','40-50','50-60','>60'])
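`pd.cut` builds right-closed intervals by default, so an age that falls exactly on a bin edge goes into the lower bucket: a 20-year-old lands in '<20' and a 30-year-old in '20-30'. A short check with made-up ages; if the labels are meant literally ('<20' as "under 20"), `right=False` may match them better:

```python
import pandas as pd

ages = pd.Series([19, 20, 21, 30, 61])  # hypothetical ages
bins = [0, 20, 30, 40, 50, 60, 99999]
labels = ['<20', '20-30', '30-40', '40-50', '50-60', '>60']

# Default right=True: intervals are (0, 20], (20, 30], ...
right_closed = pd.cut(ages, bins, labels=labels)
# right=False: intervals are [0, 20), [20, 30), ...
left_closed = pd.cut(ages, bins, labels=labels, right=False)

print(right_closed.tolist())  # ['<20', '<20', '20-30', '20-30', '>60']
print(left_closed.tolist())   # ['<20', '20-30', '20-30', '30-40', '>60']
```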

In [13]: # Change datatype of extraction to datetime


df['extraction']= pd.to_datetime(df['extraction'])
In [14]: # Create date helper columns
df['month'] =df['date'].dt.month_name()
df['day'] = df['date'].dt.day_name()
df['hour']= df.extraction.dt.hour
df.head()

Out[14]:
       status  card_present_flag         account       long_lat  txn_description                           merchant_id  first_name  balance        date  gender  ...                 extraction  amount
0  authorized                1.0  ACC-1598451071  153.41 -27.95              POS  81c48296-73be-44a7-befa-d053f48ce7cd       Diana    35.39  2018-08-01       F  ...  2018-08-01 01:01:15+00:00   16.25
1  authorized                0.0  ACC-1598451071  153.41 -27.95        SALES-POS  830a451c-316e-4a6a-bf25-e37caedca49e       Diana    21.20  2018-08-01       F  ...  2018-08-01 01:13:45+00:00   14.19
2  authorized                1.0  ACC-1222300524  151.23 -33.94              POS  835c231d-8cdf-4e96-859d-e9d571760cf0     Michael     5.71  2018-08-01       M  ...  2018-08-01 01:26:15+00:00    6.42
3  authorized                1.0  ACC-1037050564  153.10 -27.66        SALES-POS  48514682-c78a-4a88-b0da-2d6302e64673      Rhonda  2117.22  2018-08-01       F  ...  2018-08-01 01:38:45+00:00   40.90
4  authorized                1.0  ACC-1598451071  153.41 -27.95        SALES-POS  b4e02c10-0852-4273-b8fd-7b3395e32eb0       Diana    17.95  2018-08-01       F  ...  2018-08-01 01:51:15+00:00    3.25

5 rows × 23 columns
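The helper columns rely on the `.dt` accessor: `month_name()` and `day_name()` are taken from `date`, while `hour` is taken from `extraction`, because `date` carries no time of day. A minimal sketch on two hypothetical timestamps copied from the first rows above:

```python
import pandas as pd

# Hypothetical two-row sample mirroring the first rows of the dataset
sample = pd.DataFrame({
    "date": pd.to_datetime(["2018-08-01", "2018-08-01"]),
    "extraction": pd.to_datetime(["2018-08-01 01:01:15+00:00",
                                  "2018-08-01 01:13:45+00:00"]),
})

sample["month"] = sample["date"].dt.month_name()   # e.g. 'August'
sample["day"] = sample["date"].dt.day_name()       # e.g. 'Wednesday'
sample["hour"] = sample["extraction"].dt.hour      # hour of the timestamp
print(sample[["month", "day", "hour"]])
```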

In [15]: df.card_present_flag = df.card_present_flag.astype('Int64')


df.head()

Out[15]:
       status  card_present_flag         account       long_lat  txn_description                           merchant_id  first_name  balance        date  gender  ...                 extraction  amount
0  authorized                  1  ACC-1598451071  153.41 -27.95              POS  81c48296-73be-44a7-befa-d053f48ce7cd       Diana    35.39  2018-08-01       F  ...  2018-08-01 01:01:15+00:00   16.25
1  authorized                  0  ACC-1598451071  153.41 -27.95        SALES-POS  830a451c-316e-4a6a-bf25-e37caedca49e       Diana    21.20  2018-08-01       F  ...  2018-08-01 01:13:45+00:00   14.19
2  authorized                  1  ACC-1222300524  151.23 -33.94              POS  835c231d-8cdf-4e96-859d-e9d571760cf0     Michael     5.71  2018-08-01       M  ...  2018-08-01 01:26:15+00:00    6.42
3  authorized                  1  ACC-1037050564  153.10 -27.66        SALES-POS  48514682-c78a-4a88-b0da-2d6302e64673      Rhonda  2117.22  2018-08-01       F  ...  2018-08-01 01:38:45+00:00   40.90
4  authorized                  1  ACC-1598451071  153.41 -27.95        SALES-POS  b4e02c10-0852-4273-b8fd-7b3395e32eb0       Diana    17.95  2018-08-01       F  ...  2018-08-01 01:51:15+00:00    3.25

5 rows × 23 columns
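`card_present_flag` contains NaN for transactions without a card record, so a plain NumPy integer cast fails; the capitalised `'Int64'` is pandas' nullable integer dtype, which keeps the gaps as `<NA>`. A small illustration on a hypothetical series:

```python
import numpy as np
import pandas as pd

flag = pd.Series([1.0, 0.0, np.nan], name="card_present_flag")

# A NumPy int64 column has no missing-value marker, so this cast raises
cast_failed = False
try:
    flag.astype("int64")
except ValueError:
    cast_failed = True

# 'Int64' (capital I) is the nullable integer dtype: 1.0 -> 1, NaN -> <NA>
nullable = flag.astype("Int64")
print(cast_failed, nullable.dtype)  # True Int64
```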
In [16]: cols = ['card_present_flag', 'status', 'txn_description', 'movement', 'gender', 'merchant_state']

# Subplot initialization: a 3 x 2 grid with one panel per categorical column
fig = make_subplots(
    rows=3,
    cols=2,
    subplot_titles=('card_present_flag', 'status', 'txn_description', 'movement', 'gender', 'merchant_state'),
    horizontal_spacing=0.2,
    vertical_spacing=0.2
)
# Adding subplots: frequency of each category, labelled with its percentage share
count = 0
for i in range(1, 4):
    for j in range(1, 3):
        counts = df[cols[count]].value_counts()
        pct = (df[cols[count]].value_counts(normalize=True) * 100).round(2)
        fig.add_trace(go.Bar(x=counts.index,
                             y=counts,
                             name=cols[count],
                             textposition='auto',
                             text=[str(p) + '%' for p in pct.tolist()]
                             ),
                      row=i, col=j)
        count += 1
fig.update_xaxes(showgrid=False)
fig.update_yaxes(showgrid=False)
fig.update_layout(
    title=dict(text="Analyze Categorical variables (Frequency / Percentage)", x=0.5, y=0.95),
    title_font_size=20,
    showlegend=False,
    width=980,
    height=920,
    margin=dict(l=80, r=80, t=150, b=80)
)
fig.show()

[Figure: Analyze Categorical variables (Frequency / Percentage). Six bar-chart panels showing the share of transactions in each category: card_present_flag (1: 80.26%, 0: 19.74%); status (authorized: 64.08%, posted: 35.92%); txn_description (SALES-POS: 32.67%, POS: 31.41%, PAYMENT: 21.59%, PAY/SALARY: 7.33%, INTER BANK: 6.16%, PHONE BANK: 0.84%); movement (debit: 92.67%, credit: 7.33%); gender (M: 52.19%, F: 47.81%); merchant_state (NSW: 28.11%, VIC: 27.61%, QLD: 20.16%, WA: 14.25%, SA: 5.38%, NT: 2.66%, ACT: 0.95%, TAS: 0.88%).]

Insights
Of the transactions that record card_present_flag, 80.26% were made with the card physically present.
64.08% of transactions were authorized and the rest were posted.
92.67% of transactions are debits; the remaining 7.33% are credits.
SALES-POS and POS are the most common transaction types (32.67% and 31.41% of transactions).
Males make slightly more transactions than females (52.19% vs 47.81%).
NSW, VIC and QLD are the busiest merchant states.
ACT and TAS are the least busy merchant states.
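The subplot cell calls `value_counts()` twice per column, once for counts and once for percentages. Both can be combined into a single frequency table; a minimal sketch, where `freq_table` is a hypothetical helper and the `gender` values are made up:

```python
import pandas as pd

def freq_table(s: pd.Series) -> pd.DataFrame:
    """Count and percentage share of each category, most frequent first (NaN excluded)."""
    counts = s.value_counts()
    pct = (s.value_counts(normalize=True) * 100).round(2)
    return pd.DataFrame({"count": counts, "percent": pct})

table = freq_table(pd.Series(["M", "F", "M", "M"], name="gender"))
print(table)
```

On the notebook's data, `freq_table(df['merchant_state'])` would give the numbers behind one panel of the chart.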

In [17]: df0_grp = df.groupby(by='txn_description')[['amount']].sum().reset_index()  # select 'amount' before summing
df0_grp['amount'] = df0_grp['amount'].round().astype(int)
df0_grp.head()

Out[17]:
txn_description amount

0 INTER BANK 64331

1 PAY/SALARY 1676577

2 PAYMENT 201794

3 PHONE BANK 10716

4 POS 152861

In [18]: fig=px.treemap(df0_grp,
path=['txn_description'],
values='amount',
color = 'amount',
)

fig.update_layout(
title=dict(text = "Total Amount by Transaction Description",x=0.5,y=0.95),
margin=dict(l=10, r=10, t=70, b=10),
)
fig.data[0].textinfo = 'label+value'
fig.update_traces(marker_coloraxis=None)
fig.show()

[Figure: Total Amount by Transaction Description (treemap). Tile labels: PAY/SALARY 1,676,577; PAYMENT 201,794; SALES-POS 157,005; POS 152,861; INTER BANK 64,331.]

Insights:

PAY/SALARY is by far the largest contributor to the total transaction amount (1,676,577, roughly three quarters of the total). This is expected, as salary payments are usually much larger than everyday debit transactions.
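To put a number on the insight, each category's share of the total can be computed from the grouped sums. A minimal sketch, assuming the rounded totals reported above (PHONE BANK's 10,716 comes from the `df0_grp.head()` output); on the notebook's data the same calculation would start from `df.groupby('txn_description')['amount'].sum()`:

```python
import pandas as pd

# Rounded totals per transaction description, as reported above
totals = pd.Series({
    "INTER BANK": 64331, "PAY/SALARY": 1676577, "PAYMENT": 201794,
    "PHONE BANK": 10716, "POS": 152861, "SALES-POS": 157005,
}, name="amount")

# Percentage share of the overall amount, largest first
share = (totals / totals.sum() * 100).round(2).sort_values(ascending=False)
print(share)
```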

In [19]: df_grp0 = df.groupby(by='merchant_suburb')[['amount']].sum().reset_index()  # select 'amount' before summing
df_grp0.head()

Out[19]:
merchant_suburb amount

0 Abbotsford 2004.29

1 Aberdeen 52.45

2 Aberfeldie 57.77

3 Aberfoyle Park 84.92

4 Acacia Ridge 10.30


In [20]: fig=px.treemap(df_grp0,
path=['merchant_suburb'],
values='amount',
color = 'amount',
)
fig.update_layout(
title=dict(text = "Total Txn Amount by Suburb",x=0.5,y=0.95),
margin=dict(l=10, r=10, t=50, b=10),
showlegend=False,
)
fig.data[0].textinfo = 'label+value'
fig.update_traces(marker_coloraxis=None)
fig.show()

[Figure: Total Txn Amount by Suburb, a treemap of the summed transaction amount for each merchant suburb.]

Insights:

Sydney, Melbourne, South Brisbane, Mascot and Mount Gambier are the leading contributors to the transaction amount.
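A treemap with hundreds of suburbs is hard to read, so the leading contributors can also be listed directly with `DataFrame.nlargest`. A minimal sketch that uses the five rows from the `df_grp0.head()` output as hypothetical input; on the notebook's data, `df_grp0.nlargest(5, 'amount')` would list the top five suburbs:

```python
import pandas as pd

suburbs = pd.DataFrame({
    "merchant_suburb": ["Abbotsford", "Aberdeen", "Aberfeldie", "Aberfoyle Park", "Acacia Ridge"],
    "amount": [2004.29, 52.45, 57.77, 84.92, 10.30],
})

# Three rows with the largest summed amount, in descending order
top3 = suburbs.nlargest(3, "amount")
print(top3["merchant_suburb"].tolist())  # ['Abbotsford', 'Aberfoyle Park', 'Aberfeldie']
```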

Analyzing Debit Transactions


In [21]: df1 = df[df.movement=='debit'] # Debit Transactions
df1.head()

Out[21]:
       status  card_present_flag         account       long_lat  txn_description                           merchant_id  first_name  balance        date  gender  ...                 extraction  amount
0  authorized                  1  ACC-1598451071  153.41 -27.95              POS  81c48296-73be-44a7-befa-d053f48ce7cd       Diana    35.39  2018-08-01       F  ...  2018-08-01 01:01:15+00:00   16.25
1  authorized                  0  ACC-1598451071  153.41 -27.95        SALES-POS  830a451c-316e-4a6a-bf25-e37caedca49e       Diana    21.20  2018-08-01       F  ...  2018-08-01 01:13:45+00:00   14.19
2  authorized                  1  ACC-1222300524  151.23 -33.94              POS  835c231d-8cdf-4e96-859d-e9d571760cf0     Michael     5.71  2018-08-01       M  ...  2018-08-01 01:26:15+00:00    6.42
3  authorized                  1  ACC-1037050564  153.10 -27.66        SALES-POS  48514682-c78a-4a88-b0da-2d6302e64673      Rhonda  2117.22  2018-08-01       F  ...  2018-08-01 01:38:45+00:00   40.90
4  authorized                  1  ACC-1598451071  153.41 -27.95        SALES-POS  b4e02c10-0852-4273-b8fd-7b3395e32eb0       Diana    17.95  2018-08-01       F  ...  2018-08-01 01:51:15+00:00    3.25

5 rows × 23 columns
In [22]: cols = ['card_present_flag', 'status', 'txn_description', 'movement', 'gender', 'merchant_state']
# Subplot initialization: a 3 x 2 grid with one panel per categorical column
fig = make_subplots(
    rows=3,
    cols=2,
    subplot_titles=('card_present_flag', 'status', 'txn_description', 'movement', 'gender', 'merchant_state'),
    horizontal_spacing=0.2,
    vertical_spacing=0.2
)
# Adding subplots: total debit amount per category, labelled with its share of the total
count = 0
for i in range(1, 4):
    for j in range(1, 3):
        totals = df1.groupby(by=cols[count])['amount'].sum()
        fig.add_trace(go.Bar(x=totals.index,
                             y=totals.values.round(2),
                             name=cols[count],
                             textposition='auto',
                             text=[str(round(v / totals.sum() * 100, 2)) + '%' for v in totals.values]
                             ),
                      row=i, col=j)
        count += 1
fig.update_xaxes(showgrid=False)
fig.update_yaxes(showgrid=False)
fig.update_layout(
    title=dict(text="Analyze Categorical variables (Total Txn Amount/Percentage)", x=0.5, y=0.95),
    title_font_size=20,
    showlegend=False,
    width=980,
    height=980,
    margin=dict(l=80, r=80, t=150, b=80)
)
fig.show()

[Figure: Analyze Categorical variables (Total Txn Amount/Percentage). Six bar-chart panels showing each category's share of the total debit amount: card_present_flag (0: 20.04%, 1: 79.96%); status (authorized: 52.81%, posted: 47.19%); txn_description (INTER BANK: 10.96%, PAYMENT: 34.39%, PHONE BANK: 1.83%, POS: 26.05%, SALES-POS: 26.76%); movement (debit: 100.0%); gender (F: 45.45%, M: 54.55%); merchant_state (ACT: 1.57%, NSW: 32.92%, NT: 2.96%, QLD: 17.26%, SA: 5.41%, TAS: 0.63%, VIC: 28.27%, WA: 10.97%).]
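Insights

Around 80% of the debit amount with a recorded card_present_flag was spent with the card present.


PAYMENT transactions contribute the largest share (34.39%) of the debit amount.
The NSW and VIC merchant states together account for more than half (about 61%) of the debit amount spent at merchants.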
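Both subplot cells walk the 3 × 2 grid with nested loops plus a manual `count`. An alternative sketch derives each panel's 1-based (row, col) position from its index with integer division; `grid_positions` is a hypothetical helper, not part of Plotly:

```python
def grid_positions(names, n_cols=2):
    """Map each name to a 1-based (row, col) subplot position, filling each row left to right."""
    return {name: (k // n_cols + 1, k % n_cols + 1) for k, name in enumerate(names)}

cols = ['card_present_flag', 'status', 'txn_description', 'movement', 'gender', 'merchant_state']
positions = grid_positions(cols)
print(positions['txn_description'])  # (2, 1)
```

Inside the notebook, a single loop such as `for name, (r, c) in positions.items():` could then call `fig.add_trace(..., row=r, col=c)` without a separate counter.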

Data preparation for a grouped bar chart of total transaction amount by merchant state and gender

In [23]: df_grp = df1.groupby(by=['merchant_state', 'gender'])[['amount']].sum().reset_index()
df_grp # This is not sorted yet

Out[23]:
merchant_state gender amount

0 ACT F 1657.44

1 ACT M 3219.24

2 NSW F 41430.88

3 NSW M 60590.89

4 NT F 8741.42

5 NT M 427.47

6 QLD F 28611.05

7 QLD M 24872.40

8 SA F 11349.73

9 SA M 5426.84

10 TAS F 622.72

11 TAS M 1340.21

12 VIC F 38626.01

13 VIC M 48957.99

14 WA F 19908.15

15 WA M 14083.91

In [24]: # Sort the dataframe by txn amount


df1.groupby(by=['merchant_state'])[['amount']].sum().sort_values(by='amount', ascending=False).index.values

Out[24]: array(['NSW', 'VIC', 'QLD', 'WA', 'SA', 'NT', 'ACT', 'TAS'], dtype=object)

In [25]: # Perform sorting using custom order


order = df1.groupby(by=['merchant_state'])[['amount']].sum().sort_values(by='amount', ascending=False).index
df_grp['merchant_state']=pd.Categorical(df_grp['merchant_state'],order)
df_grp= df_grp.groupby(by=['merchant_state','gender']).sum().reset_index()
df_grp

Out[25]:
merchant_state gender amount

0 NSW F 41430.88

1 NSW M 60590.89

2 VIC F 38626.01

3 VIC M 48957.99

4 QLD F 28611.05

5 QLD M 24872.40

6 WA F 19908.15

7 WA M 14083.91

8 SA F 11349.73

9 SA M 5426.84

10 NT F 8741.42

11 NT M 427.47

12 ACT F 1657.44

13 ACT M 3219.24

14 TAS F 622.72

15 TAS M 1340.21
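Wrapping the column in `pd.Categorical` with an explicit category list is what makes later sorting follow the custom order instead of alphabetical order. A minimal sketch on hypothetical rows, using `sort_values` to show the effect:

```python
import pandas as pd

rows = pd.DataFrame({
    "merchant_state": ["ACT", "NSW", "VIC", "NSW"],
    "gender": ["F", "M", "F", "F"],
    "amount": [1657.44, 60590.89, 38626.01, 41430.88],
})
order = ["NSW", "VIC", "ACT"]  # e.g. states ranked by total amount

# Sorting a categorical column follows the category order, not the alphabet
rows["merchant_state"] = pd.Categorical(rows["merchant_state"], categories=order, ordered=True)
sorted_rows = rows.sort_values(["merchant_state", "gender"]).reset_index(drop=True)
print(sorted_rows[["merchant_state", "gender"]].values.tolist())
# [['NSW', 'F'], ['NSW', 'M'], ['VIC', 'F'], ['ACT', 'F']]
```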
In [26]: fig=px.bar(data_frame=df_grp,
x='merchant_state',
y='amount',color='gender',
barmode='group',
text=df_grp.amount.apply(lambda x : str(round(x/1000,2))+'k')
)
fig.update_traces(textposition='outside')
fig.update_xaxes(title='Merchant State',showgrid=False)
fig.update_yaxes(title='Transaction Amount',showgrid=False)
fig.update_layout(
title=dict(text = "Transaction Amount in Merchant State by Gender",x=0.5,y=0.95),
title_font_size=20,
)
fig.show()

[Figure: Transaction Amount in Merchant State by Gender, a grouped bar chart (x-axis: merchant state, ordered NSW, VIC, QLD, WA, SA, NT, ACT, TAS; y-axis: Transaction Amount; bars coloured by gender and labelled in thousands).]

Insights: Overall, males spend more than females, but females lead in four states: QLD, WA, SA and NT.
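The states where women out-spend men can be picked out programmatically by pivoting the genders into columns. A minimal sketch using a subset of the `df_grp` totals above as hypothetical input:

```python
import pandas as pd

sample = pd.DataFrame({
    "merchant_state": ["NSW", "NSW", "NT", "NT", "QLD", "QLD"],
    "gender": ["F", "M", "F", "M", "F", "M"],
    "amount": [41430.88, 60590.89, 8741.42, 427.47, 28611.05, 24872.40],
})

# One row per state, one column per gender
wide = sample.pivot(index="merchant_state", columns="gender", values="amount")
female_led = wide[wide["F"] > wide["M"]].index.tolist()
print(female_led)  # ['NT', 'QLD']
```

On the full `df_grp` table above, the same comparison picks out QLD, WA, SA and NT.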
In [27]: day_counts = df1['day'].value_counts()  # number of debit transactions per day
fig = px.bar(x=day_counts.index.tolist(),
             y=day_counts.tolist(),
             color=day_counts.tolist(),
             text=day_counts.tolist()
             )
fig.update_traces(textposition='outside', marker_coloraxis=None)
fig.update_xaxes(title='Day', showgrid=False)
fig.update_yaxes(title='Transaction count', showgrid=False)
fig.update_layout(
    title=dict(text="Transaction flow by each day", x=0.5, y=0.95),
    title_font_size=20,
    showlegend=False,
    height=450,
)
fig.show()

day_amount = df1.groupby(by='day')['amount'].sum().sort_values(ascending=False)  # total debit amount per day
fig1 = px.bar(x=day_amount.index.tolist(),
              y=day_amount.tolist(),
              text=day_amount.apply(lambda x: str(round(x / 1000, 2)) + 'k').tolist()
              )
fig1.update_traces(textposition='outside')
fig1.update_xaxes(title='Day', showgrid=False)
fig1.update_yaxes(title='Transaction Amount', showgrid=False)
fig1.update_layout(
    title=dict(text="Transaction amount by each day", x=0.5, y=0.95),
    title_font_size=20,
    showlegend=False,
    height=450,
)
fig1.show()

[Figure: Transaction flow by each day, a bar chart of debit transaction counts (x-axis: Day, y-axis: Transaction count): Wednesday 1891, Friday 1872, Saturday 1709, Thursday 1658, Sunday 1550, Tuesday 1327, Monday 1153.]

[Figure: Transaction amount by each day, a bar chart of total debit amount (x-axis: Day, y-axis: Transaction Amount): Wednesday 94.96k, Saturday 93.0k, Friday 89.24k, Thursday 87.68k, Sunday 82.17k, Tuesday 74.24k, Monday 65.41k.]

Insights

Transaction counts are lowest at the start of the week (Monday and Tuesday) and pick up from Wednesday through Saturday.
Although Saturday ranks only third by transaction count, it ranks second by transaction amount, which suggests larger individual transactions on Saturdays.
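The Saturday observation compares two rankings; dividing each day's total amount by its transaction count makes the comparison direct. A minimal sketch using three days' counts from the first chart and their totals (F plus M) from the per-day, per-gender sums below; on the notebook's data, `df1.groupby('day')['amount'].agg(['count', 'sum', 'mean'])` would give all days at once:

```python
import pandas as pd

stats = pd.DataFrame({
    "count": [1891, 1872, 1709],
    "amount": [94959.44, 89239.66, 93002.56],
}, index=["Wednesday", "Friday", "Saturday"])

# Average debit amount per transaction on each day
stats["avg_amount"] = (stats["amount"] / stats["count"]).round(2)
print(stats["avg_amount"].idxmax())  # Saturday
```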
Data preparation for a grouped bar chart of total transaction amount by day and gender

In [28]: df1_grp = df1.groupby(by=['day', 'gender'])[['amount']].sum().reset_index()
df1_grp # This is not sorted yet

Out[28]:
    day        gender  amount
0   Friday     F       44450.06
1   Friday     M       44789.60
2   Monday     F       31511.01
3   Monday     M       33901.90
4   Saturday   F       36124.99
5   Saturday   M       56877.57
6   Sunday     F       39932.72
7   Sunday     M       42241.84
8   Thursday   F       35366.77
9   Thursday   M       52310.59
10  Tuesday    F       33626.24
11  Tuesday    M       40614.62
12  Wednesday  F       45654.61
13  Wednesday  M       49304.83

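In [29]: order = ['Monday','Tuesday', 'Wednesday','Thursday','Friday','Saturday','Sunday']

# An ordered Categorical makes sorting follow weekday order instead of alphabetical order
df1_grp['day']=pd.Categorical(df1_grp['day'],categories=order,ordered=True)
df1_grp= df1_grp.sort_values(by=['day','gender']).reset_index(drop=True)
df1_grp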

Out[29]:
    day        gender  amount
0   Monday     F       31511.01
1   Monday     M       33901.90
2   Tuesday    F       33626.24
3   Tuesday    M       40614.62
4   Wednesday  F       45654.61
5   Wednesday  M       49304.83
6   Thursday   F       35366.77
7   Thursday   M       52310.59
8   Friday     F       44450.06
9   Friday     M       44789.60
10  Saturday   F       36124.99
11  Saturday   M       56877.57
12  Sunday     F       39932.72
13  Sunday     M       42241.84
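The Categorical step in In [29] matters because, once `day` is an ordered Categorical, pandas sorts and groups by the category order rather than alphabetically. Here is a minimal sketch on a made-up frame (not the ANZ data):

```python
import pandas as pd

# Made-up rows for illustration only
toy = pd.DataFrame({
    'day': ['Friday', 'Monday', 'Wednesday', 'Monday'],
    'amount': [10.0, 20.0, 5.0, 1.0],
})

order = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']
toy['day'] = pd.Categorical(toy['day'], categories=order, ordered=True)

# Grouping follows weekday order; observed=True drops weekdays with no rows
totals = toy.groupby('day', observed=True)['amount'].sum()
print(totals)

# Plain string sorting, by contrast, is alphabetical
print(sorted(['Friday', 'Monday', 'Wednesday']))
```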
In [30]: fig=px.bar(data_frame=df1_grp,
x='day',
y='amount',color='gender',
barmode='group',
text=df1_grp.amount.apply(lambda x : str(round(x/1000,2))+'k')
)
fig.update_traces(textposition='outside')
fig.update_xaxes(title='Day',showgrid=False)
fig.update_yaxes(title='Transaction Amount',showgrid=False)
fig.update_layout(
title=dict(text = "Transaction Amount by Day & Gender",x=0.5,y=0.95),
title_font_size=20,
)
fig.show()

[Figure: grouped bar chart of total transaction amount by day and gender (x: Day, y: Transaction Amount; legend: gender); bar labels match the values in Out[29]]

Insights:

Males spend the most on Saturday (56.88k).
Females spend the most on Wednesday (45.65k) and Friday (44.45k).
In [31]: # Transaction count per month (value_counts() sorts in descending order)
month_counts = df['month'].value_counts()
fig= px.bar(x=month_counts.index.tolist(),
y=month_counts.tolist(),
color=month_counts.tolist(),
text=month_counts.tolist()
)
fig.update_traces(textposition='outside')
fig.update_xaxes(title='Month',showgrid=False)
fig.update_yaxes(title='Transaction Count',showgrid=False)
fig.update_layout(
title=dict(text = "Transaction flow by each month",x=0.5,y=0.95),
title_font_size=20,
width = 700,
height = 450,
)
fig.show()

[Figure: transaction count by month (x: Month, y: Transaction Count; bars coloured by count): October 4087, September 4013, August 3943]

Insights: The number of transactions increases steadily from month to month (August 3943, September 4013, October 4087), which is a good sign.
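To put a number on "steady increase", you can compute month-over-month growth with `pct_change`. This sketch uses the counts from the monthly transaction-count chart, entered by hand in calendar order:

```python
import pandas as pd

# Monthly transaction counts from the month chart, in calendar order
counts = pd.Series([3943, 4013, 4087], index=['August', 'September', 'October'])

# Percentage change relative to the previous month (the first month has no predecessor, so NaN)
growth = counts.pct_change() * 100
print(growth.round(2))
```

Growth is positive in both months but modest, under 2% each time.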

In [32]: # Total amount per customer; selecting 'amount' first means only that column is summed
top10 = df.groupby(by='customer_id')['amount'].sum().sort_values(ascending=False).head(10)
fig=px.bar(top10,
color=top10,
text=top10.round(),
)
fig.update_traces(textposition='outside',marker_coloraxis=None)
fig.update_xaxes(title='Customer ID',showgrid=False)
fig.update_yaxes(title='Transaction Amount',showgrid=False)
fig.update_layout(
title=dict(text = "Top 10 customers by Transaction Amount",x=0.5,y=0.95),
title_font_size=20,
showlegend=False,
height = 500,
)
fig.show()

[Figure: Top 10 customers by Transaction Amount (x: Customer ID, y: Transaction Amount): 45409, 42688, 40216, 37944, 36786, 36639, 36588, 36544, 36051, 35833]
In [33]: df.age_group.value_counts()

Out[33]: 20-30 5071
30-40 3405
<20 1900
40-50 1293
>60 224
50-60 150
Name: age_group, dtype: int64

In [34]: fig=px.bar(df1.age_group.value_counts(),
color=df1.age_group.value_counts(),
text=df1.age_group.value_counts().tolist(),
)
fig.update_traces(textposition='outside',marker_coloraxis=None)
fig.update_xaxes(title='Age Group',showgrid=False)
fig.update_yaxes(title='Transaction Count',showgrid=False)
fig.update_layout(
title=dict(text = "Transactions by Age Group",x=0.5,y=0.95),
title_font_size=20,
showlegend=False,
height = 450,
)
fig.show()

[Figure: Transactions by Age Group (x: Age Group, y: Transaction Count): 20-30 4756, 30-40 3158, <20 1766, 40-50 1165, >60 185, 50-60 130]

Insights:

Most transactions are carried out by the "20-30" and "30-40" age groups.
Given the low transaction volume of the "50-60" and ">60" age groups, the company could consider attractive offers targeted at these customers.
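Shares make the age-group comparison easier to read than raw counts. This sketch uses the counts printed in Out[33], which cover all transactions in `df`. On the real frame, `df.age_group.value_counts(normalize=True)` returns the same shares as fractions of 1.

```python
import pandas as pd

# Transaction counts per age group, copied from Out[33]
counts = pd.Series({'20-30': 5071, '30-40': 3405, '<20': 1900,
                    '40-50': 1293, '>60': 224, '50-60': 150})

# Percentage of all transactions in each age group
share = counts / counts.sum() * 100
print(share.round(1))

young = share[['20-30', '30-40']].sum()
older = share[['50-60', '>60']].sum()
print(f"20-40: {young:.1f}%, over 50: {older:.1f}%")
```

About 70% of transactions come from the 20-40 groups, and just over 3% from customers over 50.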

Data preparation for a grouped bar chart of total transaction amount by age group and gender

In [35]: df2_grp=df1.groupby(by=['age_group','gender'])['amount'].sum().reset_index()
df2_grp

Out[35]:
    age_group  gender  amount
0   <20        F       46543.04
1   <20        M       35515.86
2   20-30      F       100941.94
3   20-30      M       129592.17
4   30-40      F       74379.11
5   30-40      M       100520.30
6   40-50      F       38050.03
7   40-50      M       43137.16
8   50-60      F       3652.86
9   50-60      M       5742.71
10  >60        F       3099.42
11  >60        M       5532.75
In [36]: fig=px.bar(data_frame=df2_grp,
x = 'age_group',
y = 'amount',
color='gender',
barmode='group',
text=df2_grp.amount.apply(lambda x : str(round(x/1000,2))+'k')
)
fig.update_traces(textposition='outside')
fig.update_xaxes(title='Age Group',showgrid=False)
fig.update_yaxes(title='Transaction Amount',showgrid=False)
fig.update_layout(
title=dict(text = "Transaction Amount by Age Group & Gender",x=0.5,y=0.95),
title_font_size=20,
)
fig.show()

[Figure: Transaction Amount by Age Group & Gender (x: Age Group, y: Transaction Amount; legend: gender); bar labels match the values in Out[35]]

Insights:

Males in the 20-30 age group contribute the most to the total transaction amount (129.59k).
In the '<20' age group, females are ahead of males in total transaction amount (46.54k vs 35.52k).
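Reshaping the long table from Out[35] into one column per gender makes the female-versus-male comparison explicit. This sketch uses the totals copied from Out[35]. `pivot` is one option; `pivot_table` would also work.

```python
import pandas as pd

# Total transaction amount by age group and gender, copied from Out[35]
df2_grp = pd.DataFrame({
    'age_group': ['<20', '<20', '20-30', '20-30', '30-40', '30-40',
                  '40-50', '40-50', '50-60', '50-60', '>60', '>60'],
    'gender': ['F', 'M'] * 6,
    'amount': [46543.04, 35515.86, 100941.94, 129592.17, 74379.11, 100520.30,
               38050.03, 43137.16, 3652.86, 5742.71, 3099.42, 5532.75],
})

# One row per age group, one column per gender
wide = df2_grp.pivot(index='age_group', columns='gender', values='amount')
wide['F_minus_M'] = wide['F'] - wide['M']
print(wide.round(2))

# Age group and gender of the single largest total
top = df2_grp.loc[df2_grp['amount'].idxmax(), ['age_group', 'gender']]
print(top.tolist())
```

In this table, '<20' is the only age group where the female-minus-male difference is positive.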

In [37]: # Daily mean amount merged with the daily transaction count (select columns before aggregating)
df3_grp=df1.groupby(by='date')[['amount']].mean().merge(df1.groupby(by='date')[['transaction_id']].count(),on='date')
df3_grp.columns= ['Amount','Transaction Count']
df3_grp.head()

Out[37]:
            Amount     Transaction Count
date
2018-08-01  44.729355  124
2018-08-02  53.225986  142
2018-08-03  56.590845  142
2018-08-04  53.356356  118
2018-08-05  44.265000  100
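The merge in In [37] combines two separate groupby results. Named aggregation (pandas 0.25 and later) can build the same two columns in a single pass. Here is a minimal sketch on made-up rows; the column names mirror the notebook's, but the values do not.

```python
import pandas as pd

# Made-up transactions for illustration
txns = pd.DataFrame({
    'date': pd.to_datetime(['2018-08-01', '2018-08-01', '2018-08-02']),
    'transaction_id': ['t1', 't2', 't3'],
    'amount': [10.0, 30.0, 50.0],
})

# Each keyword becomes an output column: (source column, aggregation)
daily = txns.groupby('date').agg(
    Amount=('amount', 'mean'),
    TransactionCount=('transaction_id', 'count'),
).rename(columns={'TransactionCount': 'Transaction Count'})
print(daily)
```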


In [38]: fig=px.line(df3_grp)
fig.update_xaxes(title='Date',showgrid=False)
fig.update_yaxes(showgrid=False)
fig.update_layout(
title=dict(text = "Average Amount VS Txn Count over time",x=0.5,y=0.95),
title_font_size=20,
width = 980,
height = 500,
)
fig.show()

[Figure: Average Amount VS Txn Count over time (x: Date, August to October 2018; y: value; legend: Amount, Transaction Count)]

Insights:

The average transaction amount was very high, roughly 100 AUD, on 7 August and 21 October.
The largest numbers of transactions took place on 17 August and 28 September.

In [39]: fig=px.line(df1.groupby(by='date')[['amount']].sum())
fig.update_traces(line=dict(color="#8cba51", width=3.5))
fig.update_xaxes(title='Date',showgrid=False)
fig.update_yaxes(title='Transaction Amount',showgrid=False)
fig.update_layout(
title=dict(text = "Total Txn Amount over time",x=0.5,y=0.95),
title_font_size=20,
showlegend=False,
width = 980,
height = 500
)
fig.show()

[Figure: Total Txn Amount over time (x: Date, August to October 2018; y: Transaction Amount, axis up to 14k)]
Insights: The total transaction amount almost reached 14k AUD on 21 October. Since the transaction count was not particularly high that day, a few large transactions probably account for the spike.

In [40]: # Total amount per hour; select 'amount' before summing
hourly = df1.groupby(by='hour')[['amount']].sum()
fig=px.line(hourly,
text=hourly['amount'].apply(lambda x : str(round(x/1000))+'k').values
)
fig.update_traces(line=dict(color="#f58634", width=5), textposition='middle right')
fig.update_xaxes(title='Hour',showgrid=False)
fig.update_yaxes(title='Transaction Amount',showgrid=False)
fig.update_layout(
title=dict(text = "Total Txn Amount hourly",x=0.5,y=0.95),
title_font_size=20,
showlegend=False,
width = 980,
height = 500
)
fig.show()

[Figure: Total Txn Amount hourly (x: Hour, 0 to 23; y: Transaction Amount; point labels in thousands, peaking at 47k)]

Insights:

The total transaction amount at 9:00 AM is approximately 47k, the highest of any hour in the day.
Between midnight and 7:00 AM, transaction amounts are lowest, as expected for off hours.
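You can find the peak hour and the overnight total programmatically instead of reading them off the chart. This sketch uses made-up hourly data. With the real data, `hourly` would be `df1.groupby('hour')['amount'].sum()`.

```python
import pandas as pd

# Made-up transactions: hour of day and amount
toy = pd.DataFrame({'hour': [0, 3, 9, 9, 14],
                    'amount': [5.0, 2.0, 30.0, 20.0, 40.0]})

hourly = toy.groupby('hour')['amount'].sum()

# Hour with the largest total amount
peak_hour = hourly.idxmax()

# Label-based slice on the sorted integer index: hours 0 to 6 inclusive (midnight up to 7:00 AM)
off_hours_total = hourly.loc[0:6].sum()
print(peak_hour, hourly[peak_hour], off_hours_total)
```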

In [41]: df4_grp= df1.groupby(by=['hour','month','gender'])['amount'].agg(['count','sum']).reset_index()

df4_grp.columns = ['hour', 'month' ,'gender','Transaction Count', 'Total Txn Amount']
df4_grp

Out[41]:
     hour  month      gender  Transaction Count  Total Txn Amount
0    0     August     F       27                 676.43
1    0     August     M       20                 568.77
2    0     October    F       11                 379.63
3    0     October    M       16                 411.30
4    0     September  F       19                 574.91
...  ...   ...        ...     ...                ...
139  23    August     M       82                 4630.99
140  23    October    F       56                 2527.53
141  23    October    M       82                 3404.24
142  23    September  F       60                 2621.67
143  23    September  M       87                 5732.32

144 rows × 5 columns


In [42]: fig1=px.line(data_frame=df4_grp,
x=df4_grp.hour,
y=df4_grp['Transaction Count'],
color=df4_grp.gender,
facet_col= df4_grp.month
)
fig1.update_xaxes(title='Hour',showgrid=False)
fig1.update_yaxes(showgrid=False)
fig1.update_layout(
title=dict(text = "Hourly Transaction count by Month ",x=0.5,y=0.95),
title_font_size=20,
width = 980,
height = 500,
margin=dict(l=80, r=80, t=100, b=80)
)
fig1.show()

fig2=px.line(data_frame=df4_grp,
x=df4_grp.hour,
y=df4_grp['Total Txn Amount'],
color=df4_grp.gender,
facet_col= df4_grp.month
)
fig2.update_xaxes(title='Hour',showgrid=False)
fig2.update_yaxes(showgrid=False)
fig2.update_layout(
title=dict(text = "Hourly Transaction Amount by Month",x=0.5,y=0.95),
title_font_size=20,
width = 980,
height = 500,
margin=dict(l=80, r=80, t=100, b=80)
)
fig2.show()

[Figure: Hourly Transaction count by Month, one panel per month in the order month=August, month=October, month=September (x: Hour, y: Transaction Count; legend: gender)]

[Figure: hourly total transaction amount by month, same panel layout (x: Hour, y: Total Txn Amount; legend: gender)]


Insights:

In September and October, females make more transactions than males at 9:00 AM, yet their total transaction amount at that hour is lower. This suggests that females make comparatively small transactions at the start of the day.
In October at 2:00 PM, the transaction amount for females is almost double that for males.
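Both observations compare a count with a total. Dividing the two gives the average transaction size per hour and gender. This sketch runs on made-up rows and uses a single-column variant of the aggregation in In [41]:

```python
import pandas as pd

# Made-up transactions: hour, gender and amount
toy = pd.DataFrame({
    'hour': [9, 9, 9, 14],
    'gender': ['F', 'F', 'M', 'M'],
    'amount': [5.0, 7.0, 40.0, 12.0],
})

# Selecting 'amount' first means only that column is aggregated
grp = toy.groupby(['hour', 'gender'])['amount'].agg(['count', 'sum']).reset_index()
grp.columns = ['hour', 'gender', 'Transaction Count', 'Total Txn Amount']

# Average amount per transaction in each hour/gender cell
grp['Avg Txn Amount'] = grp['Total Txn Amount'] / grp['Transaction Count']
print(grp)
```

In this toy data the 9:00 F cell has more transactions than the M cell but a much smaller average. That is the same pattern the insight describes.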

In [43]: df4_grp= df1.groupby(by=['hour','day','gender'])['amount'].agg(['count','sum']).reset_index()

df4_grp.columns = ['hour', 'day' ,'gender','Transaction Count', 'Total Txn Amount']
df4_grp.head()

Out[43]:
   hour  day       gender  Transaction Count  Total Txn Amount
0  0     Friday    F       10                 268.61
1  0     Friday    M       12                 265.03
2  0     Monday    F       6                  149.13
3  0     Monday    M       3                  61.40
4  0     Saturday  F       10                 303.74
In [44]: fig1=px.line(data_frame=df4_grp,
x=df4_grp.hour,
y=df4_grp['Transaction Count'],
color=df4_grp.gender,
facet_col= df4_grp.day
)
fig1.update_xaxes(title='Hour',showgrid=False)
fig1.update_yaxes(showgrid=False)
fig1.update_layout(
title=dict(text = "Hourly Transaction Count by Day",x=0.5,y=0.95),
title_font_size=20,
width = 980,
height = 500,
margin=dict(l=80, r=80, t=100, b=80)
)
fig1.show()

fig2=px.line(data_frame=df4_grp,
x=df4_grp.hour,
y=df4_grp['Total Txn Amount'],
color=df4_grp.gender,
facet_col= df4_grp.day
)
fig2.update_xaxes(title='Hour',showgrid=False)
fig2.update_yaxes(showgrid=False)
fig2.update_layout(
title=dict(text = "Hourly Transaction Amount by Day",x=0.5,y=0.95),
title_font_size=20,
width = 980,
height = 500,
margin=dict(l=80, r=80, t=100, b=80)
)
fig2.show()

[Figure: hourly transaction count by day, one panel per day in the order Friday, Monday, Saturday, Sunday, Thursday, Tuesday, Wednesday (x: Hour, y: Transaction Count; legend: gender)]

[Figure: hourly total transaction amount by day, same panel layout (x: Hour, y: Total Txn Amount; legend: gender)]


Insights:

On Saturday around lunchtime (2:00 PM), the transaction amount for males is almost six times that for females. On Sunday at the same hour, the pattern is reversed.

Analysing Credit Transactions


In [45]: df2 = df[df.movement=='credit']
df2.head()

Out[45]:
    status  card_present_flag  account         long_lat       txn_description  merchant_id  first_name  balance  date        gender  ...  extraction                 amount
50  posted  <NA>               ACC-588564840   151.27 -33.76  PAY/SALARY       NaN          Isaiah      8342.11  2018-08-01  M       ...  2018-08-01 11:00:00+00:00  3903.95
61  posted  <NA>               ACC-1650504218  145.01 -37.93  PAY/SALARY       NaN          Marissa     2040.58  2018-08-01  F       ...  2018-08-01 12:00:00+00:00  1626.48
64  posted  <NA>               ACC-3326339947  151.18 -33.80  PAY/SALARY       NaN          Eric        3158.51  2018-08-01  M       ...  2018-08-01 12:00:00+00:00  983.36
68  posted  <NA>               ACC-3541460373  145.00 -37.83  PAY/SALARY       NaN          Jeffrey     2517.66  2018-08-01  M       ...  2018-08-01 13:00:00+00:00  1408.08
70  posted  <NA>               ACC-2776252858  144.95 -37.76  PAY/SALARY       NaN          Kristin     2271.79  2018-08-01  F       ...  2018-08-01 13:00:00+00:00  1068.04

5 rows × 23 columns

In [46]: # Average balance per customer over credit transactions; select 'balance' before averaging
top_balance = df2.groupby(by='customer_id')['balance'].mean().sort_values(ascending=False).head(10)
fig=px.bar(
top_balance,
text = top_balance.apply(lambda x : str(round(x/1000,2))+'k'),
color = top_balance
)
fig.update_traces(textposition='outside',marker_coloraxis=None)
fig.update_xaxes(title='Customer ID',showgrid=False)
fig.update_yaxes(title='Average Balance',showgrid=False)
fig.update_layout(
title=dict(text = "Top Valuable Customers by AVG Balance",x=0.5,y=0.95),
title_font_size=20,
showlegend=False,
height = 500
)
fig.show()

[Figure: Top Valuable Customers by AVG Balance (x: Customer ID, y: Average Balance): 264.13k, 199.84k, 113.25k, 73.96k, 63.58k, 58.42k, 57.07k, 54.79k, 49.72k, 40.73k]
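Selecting the `balance` column before calling `mean()` keeps pandas from trying to average text columns; pandas 2.x raises a TypeError for non-numeric columns there. `nlargest(n)` is a shorthand for `sort_values(ascending=False).head(n)`. Here is a sketch with made-up customer IDs and names:

```python
import pandas as pd

# Made-up credit rows: customer, a text column, and balance
toy = pd.DataFrame({
    'customer_id': ['CUS-A', 'CUS-A', 'CUS-B', 'CUS-C'],
    'first_name': ['Ann', 'Ann', 'Bob', 'Cy'],
    'balance': [100.0, 300.0, 50.0, 400.0],
})

# Only the numeric 'balance' column is averaged per customer
avg_balance = toy.groupby('customer_id')['balance'].mean()

# Two customers with the highest average balance
top2 = avg_balance.nlargest(2)
print(top2)
```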

In [47]: order = ['August','September','October']


df['month']=pd.Categorical(df['month'],order)
In [48]: g1 = df.groupby(by='month')['amount'].agg(['mean','sum'])
g1.columns=['Avg Amount', 'Total Amount']
g1[['Avg Amount','Total Amount']]=g1[['Avg Amount','Total Amount']].round().astype(int)
g1.reset_index(inplace=True)
g1

Out[48]:
   month      Avg Amount  Total Amount
0  August     185         729936
1  September  182         730550
2  October    196         802798

In [49]: g2=df.groupby(by='month')['balance'].agg(['mean','sum'])
g2.columns=['Avg Balance', 'Total Balance']
g2[['Avg Balance','Total Balance']]=g2[['Avg Balance','Total Balance']].round().astype(int)
g2.reset_index(inplace=True)
g2

Out[49]:
   month      Avg Balance  Total Balance
0  August     10794        42561328
1  September  14730        59112097
2  October    18451        75409203

In [50]: month = g1.merge(g2,on='month')


month

Out[50]:
   month      Avg Amount  Total Amount  Avg Balance  Total Balance
0  August     185         729936        10794        42561328
1  September  182         730550        14730        59112097
2  October    196         802798        18451        75409203

In [51]: pio.templates.default = "plotly_white"


fig = ff.create_table(month)
for i in range(len(fig.layout.annotations)):
fig.layout.annotations[i].font.size = 13
fig.show()

month      Avg Amount  Total Amount  Avg Balance  Total Balance
August     185         729936        10794        42561328
September  182         730550        14730        59112097
October    196         802798        18451        75409203

Insights:

The average transaction amount rose by roughly 6% from August to October (185 to 196 AUD).
The average balance maintained by customers rose by 71% (10794 to 18451), which may indicate strong customer trust in ANZ.
The total balance rose by 77% over these three months.
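You can reproduce the percentage changes from the monthly summary. This sketch uses the rounded values from Out[50]. Because "Avg Amount" is rounded to whole dollars, its percentage is only approximate.

```python
import pandas as pd

# Monthly summary values copied from Out[50] (already rounded to whole numbers)
month = pd.DataFrame({
    'Avg Amount': [185, 182, 196],
    'Avg Balance': [10794, 14730, 18451],
    'Total Balance': [42561328, 59112097, 75409203],
}, index=['August', 'September', 'October'])

# Percentage change from August to October for each column
change = (month.loc['October'] / month.loc['August'] - 1) * 100
print(change.round(1))
```

`month.pct_change()` would instead give the step-by-step changes: September over August, then October over September.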

End
