Notebook: Plotly Express
Out[2]:
   status      card_present_flag  bpay_biller_code  account         currency  long_lat       txn_description  merchant_id                           merchant_code  first_name  ...  age
0  authorized  1.0                NaN               ACC-1598451071  AUD       153.41 -27.95  POS              81c48296-73be-44a7-befa-d053f48ce7cd  NaN            Diana       ...  26
1  authorized  0.0                NaN               ACC-1598451071  AUD       153.41 -27.95  SALES-POS        830a451c-316e-4a6a-bf25-e37caedca49e  NaN            Diana       ...  26
2  authorized  1.0                NaN               ACC-1222300524  AUD       151.23 -33.94  POS              835c231d-8cdf-4e96-859d-e9d571760cf0  NaN            Michael     ...  38
3  authorized  1.0                NaN               ACC-1037050564  AUD       153.10 -27.66  SALES-POS        48514682-c78a-4a88-b0da-2d6302e64673  NaN            Rhonda      ...  40
4  authorized  1.0                NaN               ACC-1598451071  AUD       153.41 -27.95  SALES-POS        b4e02c10-0852-4273-b8fd-7b3395e32eb0  NaN            Diana       ...  26
5 rows × 23 columns
In [3]: df.shape
In [4]: df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 12043 entries, 0 to 12042
Data columns (total 23 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 status 12043 non-null object
1 card_present_flag 7717 non-null float64
2 bpay_biller_code 885 non-null object
3 account 12043 non-null object
4 currency 12043 non-null object
5 long_lat 12043 non-null object
6 txn_description 12043 non-null object
7 merchant_id 7717 non-null object
8 merchant_code 883 non-null float64
9 first_name 12043 non-null object
10 balance 12043 non-null float64
11 date 12043 non-null datetime64[ns]
12 gender 12043 non-null object
13 age 12043 non-null int64
14 merchant_suburb 7717 non-null object
15 merchant_state 7717 non-null object
16 extraction 12043 non-null object
17 amount 12043 non-null float64
18 transaction_id 12043 non-null object
19 country 12043 non-null object
20 customer_id 12043 non-null object
21 merchant_long_lat 7717 non-null object
22 movement 12043 non-null object
dtypes: datetime64[ns](1), float64(4), int64(1), object(17)
memory usage: 2.1+ MB
In [7]: df.isnull().sum()  # drop 'bpay_biller_code' and 'merchant_code', since most of their values are null
Out[7]: status 0
card_present_flag 4326
bpay_biller_code 11158
account 0
currency 0
long_lat 0
txn_description 0
merchant_id 4326
merchant_code 11160
first_name 0
balance 0
date 0
gender 0
age 0
merchant_suburb 4326
merchant_state 4326
extraction 0
amount 0
transaction_id 0
country 0
customer_id 0
merchant_long_lat 4326
movement 0
dtype: int64
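The decision to drop the two mostly-null columns can be made explicit with a threshold on the share of missing values. A minimal sketch, assuming a 50% cutoff (the threshold and the helper name `drop_mostly_null` are illustrative, not from the original notebook); the toy frame below uses made-up values:

```python
import numpy as np
import pandas as pd


def drop_mostly_null(frame: pd.DataFrame, threshold: float = 0.5) -> pd.DataFrame:
    """Drop columns whose share of missing values exceeds `threshold`.

    threshold=0.5 is an assumed cutoff, not one stated in the notebook.
    """
    null_share = frame.isnull().mean()  # fraction of NaNs per column
    to_drop = null_share[null_share > threshold].index
    return frame.drop(columns=to_drop)


# Toy frame with a few of the dataset's column names (values are made up).
toy = pd.DataFrame({
    "status": ["authorized", "posted", "authorized", "posted"],
    "bpay_biller_code": [np.nan, np.nan, np.nan, "0"],
    "merchant_code": [np.nan, np.nan, np.nan, 0.0],
    "amount": [16.25, 14.19, 6.42, 40.90],
})
cleaned = drop_mostly_null(toy)
print(list(cleaned.columns))  # ['status', 'amount']
```

With the counts above, bpay_biller_code and merchant_code are about 93% null (11158 and 11160 of 12043 rows) and would be dropped, while card_present_flag and the merchant_* columns (4326 nulls, about 36%) would be kept.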
In [8]: df.country.value_counts()  # can be dropped: the data covers only one country
In [9]: df.currency.value_counts()  # can also be dropped: all transactions use a single currency
In [11]: # Duplicates
df.duplicated().sum()  # no duplicate rows
Out[11]: 0
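Single-valued columns such as country and currency carry no information. A minimal sketch that finds them generically with `DataFrame.nunique` instead of inspecting each column by hand (the helper name `drop_constant_columns` and the toy values are illustrative):

```python
import pandas as pd


def drop_constant_columns(frame: pd.DataFrame) -> pd.DataFrame:
    """Remove columns that hold at most one distinct non-null value."""
    n_unique = frame.nunique(dropna=True)
    constant = n_unique[n_unique <= 1].index
    return frame.drop(columns=constant)


# Toy data, not taken from the dataset.
toy = pd.DataFrame({
    "country": ["Australia"] * 3,
    "currency": ["AUD"] * 3,
    "amount": [16.25, 14.19, 6.42],
})
reduced = drop_constant_columns(toy)
print(list(reduced.columns))  # ['amount']
```

Note that a column that is entirely null has zero distinct values and would also be dropped by this rule.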
Out[14]:
   status      card_present_flag  account         long_lat       txn_description  merchant_id                           first_name  balance  date        gender  ...  extraction                 amount
0  authorized  1.0                ACC-1598451071  153.41 -27.95  POS              81c48296-73be-44a7-befa-d053f48ce7cd  Diana       35.39    2018-08-01  F       ...  2018-08-01 01:01:15+00:00  16.25
1  authorized  0.0                ACC-1598451071  153.41 -27.95  SALES-POS        830a451c-316e-4a6a-bf25-e37caedca49e  Diana       21.20    2018-08-01  F       ...  2018-08-01 01:13:45+00:00  14.19
2  authorized  1.0                ACC-1222300524  151.23 -33.94  POS              835c231d-8cdf-4e96-859d-e9d571760cf0  Michael     5.71     2018-08-01  M       ...  2018-08-01 01:26:15+00:00  6.42
3  authorized  1.0                ACC-1037050564  153.10 -27.66  SALES-POS        48514682-c78a-4a88-b0da-2d6302e64673  Rhonda      2117.22  2018-08-01  F       ...  2018-08-01 01:38:45+00:00  40.90
4  authorized  1.0                ACC-1598451071  153.41 -27.95  SALES-POS        b4e02c10-0852-4273-b8fd-7b3395e32eb0  Diana       17.95    2018-08-01  F       ...  2018-08-01 01:51:15+00:00  3.25
5 rows × 23 columns
Out[15]:
   status      card_present_flag  account         long_lat       txn_description  merchant_id                           first_name  balance  date        gender  ...  extraction                 amount
0  authorized  1                  ACC-1598451071  153.41 -27.95  POS              81c48296-73be-44a7-befa-d053f48ce7cd  Diana       35.39    2018-08-01  F       ...  2018-08-01 01:01:15+00:00  16.25
1  authorized  0                  ACC-1598451071  153.41 -27.95  SALES-POS        830a451c-316e-4a6a-bf25-e37caedca49e  Diana       21.20    2018-08-01  F       ...  2018-08-01 01:13:45+00:00  14.19
2  authorized  1                  ACC-1222300524  151.23 -33.94  POS              835c231d-8cdf-4e96-859d-e9d571760cf0  Michael     5.71     2018-08-01  M       ...  2018-08-01 01:26:15+00:00  6.42
3  authorized  1                  ACC-1037050564  153.10 -27.66  SALES-POS        48514682-c78a-4a88-b0da-2d6302e64673  Rhonda      2117.22  2018-08-01  F       ...  2018-08-01 01:38:45+00:00  40.90
4  authorized  1                  ACC-1598451071  153.41 -27.95  SALES-POS        b4e02c10-0852-4273-b8fd-7b3395e32eb0  Diana       17.95    2018-08-01  F       ...  2018-08-01 01:51:15+00:00  3.25
5 rows × 23 columns
In [16]: import plotly.graph_objects as go
from plotly.subplots import make_subplots

cols = ['card_present_flag', 'status', 'txn_description', 'movement', 'gender', 'merchant_state']

# Subplot initialization: a 3 x 2 grid, one panel per categorical column
fig = make_subplots(
    rows=3,
    cols=2,
    subplot_titles=cols,
    horizontal_spacing=0.2,
    vertical_spacing=0.2
)

# Adding subplots: count per category, labelled with its percentage share
for count, col in enumerate(cols):
    row, column = divmod(count, 2)
    counts = df[col].value_counts()
    shares = (df[col].value_counts(normalize=True) * 100).round(2)
    fig.add_trace(go.Bar(x=counts.index,
                         y=counts.values,
                         name=col,
                         textposition='auto',
                         text=[f'{p}%' for p in shares.tolist()]),
                  row=row + 1, col=column + 1)

fig.update_xaxes(showgrid=False)
fig.update_yaxes(showgrid=False)
fig.update_layout(
    title=dict(text="Analyze Categorical variables (Frequency / Percentage)", x=0.5, y=0.95),
    title_font_size=20,
    showlegend=False,
    width=980,
    height=920,
    margin=dict(l=80, r=80, t=150, b=80)
)
fig.show()
[Figure: "Analyze Categorical variables (Frequency / Percentage)", bar charts of card_present_flag, status, txn_description, movement, gender and merchant_state, each bar labelled with its percentage share]
Insights
Most transactions with a recorded card_present_flag (80.26%) were made with the card physically present.
About 64.08% of transactions were authorized; the rest were posted.
92.67% of transactions are debits; the remaining 7.33% are credits.
SALES-POS and POS are the most common transaction types.
Males make slightly more transactions than females (52.19% vs 47.81%).
NSW, VIC and QLD are the busiest merchant states.
ACT and TAS are the least busy states.
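The percentage labels in the chart come from `value_counts(normalize=True)`, which ignores missing values by default, so the card_present_flag shares are relative to the rows where the flag is recorded. A minimal sketch that puts counts and shares for one column into a single table (the helper `frequency_table` is hypothetical and the toy series is made up):

```python
import pandas as pd


def frequency_table(series: pd.Series) -> pd.DataFrame:
    """Return counts and percentage shares per category, most frequent first.

    NaN values are excluded, matching value_counts' default behaviour.
    """
    counts = series.value_counts()
    shares = (counts / counts.sum() * 100).round(2)
    return pd.DataFrame({"count": counts, "percent": shares})


movement = pd.Series(["debit", "debit", "debit", "credit"])
table = frequency_table(movement)
print(table)
```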
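In [17]: # Total amount per transaction type; selecting 'amount' before summing avoids
# summing non-numeric columns, which can fail in pandas 2.x
df0_grp = df.groupby(by='txn_description')['amount'].sum().reset_index()
df0_grp['amount'] = df0_grp['amount'].round().astype(int)
df0_grp.head()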
Out[17]:
txn_description amount
1 PAY/SALARY 1676577
2 PAYMENT 201794
4 POS 152861
In [18]: import plotly.express as px

fig = px.treemap(df0_grp,
                 path=['txn_description'],
                 values='amount',
                 color='amount')
fig.update_layout(
    title=dict(text="Total Amount by Transaction Description", x=0.5, y=0.95),
    margin=dict(l=10, r=10, t=70, b=10),
)
# Show each tile's label and summed amount, and detach the colour axis
fig.update_traces(textinfo='label+value', marker_coloraxis=None)
fig.show()
[Figure: "Total Amount by Transaction Description" treemap: PAY/SALARY 1,676,577; PAYMENT 201,794; SALES-POS 157,005; POS 152,861; INTER BANK 64,331]
Insights:
PAY/SALARY is by far the largest contributor to total transaction amount. This is expected, because salary payments are usually much larger than everyday debit transactions.
In [19]: df_grp0 = df.groupby(by='merchant_suburb')['amount'].sum().reset_index()
df_grp0.head()
Out[19]:
merchant_suburb amount
0 Abbotsford 2004.29
1 Aberdeen 52.45
2 Aberfeldie 57.77
[Figure: treemap of total transaction amount by merchant_suburb]
Insights:
Sydney, Melbourne, South Brisbane, Mascot and Mount Gambier are the leading contributors to transaction amount.
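A ranking like this can be read from the data directly rather than from tile sizes. A minimal sketch using `groupby` and `Series.nlargest` on toy data (suburb names and amounts below are made up; on the real data the same call would be `df.groupby('merchant_suburb')['amount'].sum().nlargest(5)`):

```python
import pandas as pd

# Toy data, not taken from the dataset.
toy = pd.DataFrame({
    "merchant_suburb": ["Sydney", "Melbourne", "Sydney", "Ballina", "Melbourne"],
    "amount": [100.0, 80.0, 50.0, 20.0, 30.0],
})

# Total amount per suburb, then the top 3 contributors (largest first).
top_suburbs = toy.groupby("merchant_suburb")["amount"].sum().nlargest(3)
print(top_suburbs)
```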
In [22]: cols = ['card_present_flag', 'status', 'txn_description', 'movement', 'gender', 'merchant_state']

# Subplot initialization
fig = make_subplots(
    rows=3,
    cols=2,
    subplot_titles=cols,
    horizontal_spacing=0.2,
    vertical_spacing=0.2
)

# Adding subplots: total amount per category, labelled with its share of the total
for count, col in enumerate(cols):
    row, column = divmod(count, 2)
    totals = df1.groupby(by=col)['amount'].sum()
    fig.add_trace(go.Bar(x=totals.index,
                         y=totals.values.round(2),
                         name=col,
                         textposition='auto',
                         text=[f'{round(v / totals.sum() * 100, 2)}%' for v in totals.values]),
                  row=row + 1, col=column + 1)

fig.update_xaxes(showgrid=False)
fig.update_yaxes(showgrid=False)
fig.update_layout(
    title=dict(text="Analyze Categorical variables (Total Txn Amount/Percentage)", x=0.5, y=0.95),
    title_font_size=20,
    showlegend=False,
    width=980,
    height=980,
    margin=dict(l=80, r=80, t=150, b=80)
)
fig.show()
[Figure: "Analyze Categorical variables (Total Txn Amount/Percentage)", bar charts of total transaction amount for card_present_flag, status, txn_description, movement, gender and merchant_state, each bar labelled with its share of the total]
Insights
Data preparation for a grouped bar chart of total transaction amount by merchant state and gender
In [23]: df_grp = df1.groupby(by=['merchant_state', 'gender'])['amount'].sum().reset_index()
df_grp  # not yet sorted by state total
Out[23]:
merchant_state gender amount
0 ACT F 1657.44
1 ACT M 3219.24
2 NSW F 41430.88
3 NSW M 60590.89
4 NT F 8741.42
5 NT M 427.47
6 QLD F 28611.05
7 QLD M 24872.40
8 SA F 11349.73
9 SA M 5426.84
10 TAS F 622.72
11 TAS M 1340.21
12 VIC F 38626.01
13 VIC M 48957.99
14 WA F 19908.15
15 WA M 14083.91
Out[24]: array(['NSW', 'VIC', 'QLD', 'WA', 'SA', 'NT', 'ACT', 'TAS'], dtype=object)
Out[25]:
merchant_state gender amount
0 NSW F 41430.88
1 NSW M 60590.89
2 VIC F 38626.01
3 VIC M 48957.99
4 QLD F 28611.05
5 QLD M 24872.40
6 WA F 19908.15
7 WA M 14083.91
8 SA F 11349.73
9 SA M 5426.84
10 NT F 8741.42
11 NT M 427.47
12 ACT F 1657.44
13 ACT M 3219.24
14 TAS F 622.72
15 TAS M 1340.21
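The cells behind Out[24] and Out[25] are not shown. One way to get the same ordering is to rank states by their combined amount and sort on an ordered categorical. A sketch, assuming the order is by total amount (for the figures in Out[23] this reproduces Out[24]; ordering by transaction count happens to give the same sequence); the values are copied from Out[23]:

```python
import pandas as pd

# Totals per state and gender, copied from Out[23].
df_grp = pd.DataFrame({
    "merchant_state": ["ACT", "ACT", "NSW", "NSW", "NT", "NT", "QLD", "QLD",
                       "SA", "SA", "TAS", "TAS", "VIC", "VIC", "WA", "WA"],
    "gender": ["F", "M"] * 8,
    "amount": [1657.44, 3219.24, 41430.88, 60590.89, 8741.42, 427.47,
               28611.05, 24872.40, 11349.73, 5426.84, 622.72, 1340.21,
               38626.01, 48957.99, 19908.15, 14083.91],
})

# Rank states by combined (F + M) amount, largest first.
state_order = (df_grp.groupby("merchant_state")["amount"].sum()
               .sort_values(ascending=False).index.tolist())

# An ordered categorical makes sort_values follow that ranking.
df_grp["merchant_state"] = pd.Categorical(
    df_grp["merchant_state"], categories=state_order, ordered=True)
df_grp = df_grp.sort_values(["merchant_state", "gender"]).reset_index(drop=True)
print(state_order)
```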
In [26]: fig = px.bar(data_frame=df_grp,
                 x='merchant_state',
                 y='amount', color='gender',
                 barmode='group',
                 text=df_grp.amount.apply(lambda x: str(round(x / 1000, 2)) + 'k'))
fig.update_traces(textposition='outside')
fig.update_xaxes(title='Merchant State', showgrid=False)
fig.update_yaxes(title='Transaction Amount', showgrid=False)
fig.update_layout(
    title=dict(text="Transaction Amount in Merchant State by Gender", x=0.5, y=0.95),
    title_font_size=20,
)
fig.show()
[Figure: "Transaction Amount in Merchant State by Gender", grouped bars per state (NSW, VIC, QLD, WA, SA, NT, ACT, TAS) labelled in thousands]
Insights: Overall, males account for a larger total transaction amount than females, but females lead in four states: QLD, WA, SA and NT.
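The states where females lead can be read off a pivot of the same totals rather than off the bar heights. A minimal sketch using the values from Out[23]:

```python
import pandas as pd

# State/gender totals copied from Out[23].
rows = [
    ("ACT", "F", 1657.44), ("ACT", "M", 3219.24),
    ("NSW", "F", 41430.88), ("NSW", "M", 60590.89),
    ("NT", "F", 8741.42), ("NT", "M", 427.47),
    ("QLD", "F", 28611.05), ("QLD", "M", 24872.40),
    ("SA", "F", 11349.73), ("SA", "M", 5426.84),
    ("TAS", "F", 622.72), ("TAS", "M", 1340.21),
    ("VIC", "F", 38626.01), ("VIC", "M", 48957.99),
    ("WA", "F", 19908.15), ("WA", "M", 14083.91),
]
df_grp = pd.DataFrame(rows, columns=["merchant_state", "gender", "amount"])

# One row per state, one column per gender.
wide = df_grp.pivot(index="merchant_state", columns="gender", values="amount")
female_lead = wide[wide["F"] > wide["M"]].index.tolist()
totals = wide.sum()  # overall amount per gender across all states
print(female_lead)
print(totals.round(2))
```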
In [27]: # Transaction count per day, busiest first
day_counts = df1['day'].value_counts()
fig = px.bar(x=day_counts.index.tolist(),
             y=day_counts.tolist(),
             color=day_counts.tolist(),
             text=day_counts.tolist())
fig.update_traces(textposition='outside', marker_coloraxis=None)
fig.update_xaxes(title='Day', showgrid=False)
fig.update_yaxes(title='Transaction count', showgrid=False)
fig.update_layout(
    title=dict(text="Transaction flow by each day", x=0.5, y=0.95),
    title_font_size=20,
    showlegend=False,
    height=450,
)
fig.show()

# Total transaction amount per day, largest first
day_amount = df1.groupby(by='day')['amount'].sum().sort_values(ascending=False)
fig1 = px.bar(x=day_amount.index.tolist(),
              y=day_amount.tolist(),
              text=day_amount.apply(lambda x: str(round(x / 1000, 2)) + 'k').tolist())
fig1.update_traces(textposition='outside')
fig1.update_xaxes(title='Day', showgrid=False)
fig1.update_yaxes(title='Transaction Amount', showgrid=False)
fig1.update_layout(
    title=dict(text="Transaction amount by each day", x=0.5, y=0.95),
    title_font_size=20,
    showlegend=False,
    height=450,
)
fig1.show()
[Figures: "Transaction flow by each day" (transaction count per day, in the order Wednesday, Friday, Saturday, Thursday, Sunday, Tuesday, Monday) and "Transaction amount by each day" (total amount per day, in the order Wednesday, Saturday, Friday, Thursday, Sunday, Tuesday, Monday)]
Insights
Transaction counts are lower early in the week and pick up from Wednesday through Saturday.
Although Saturday ranks only third by transaction count, it ranks second by total transaction amount, which suggests larger transactions on Saturdays.
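The "bigger transactions on Saturday" reading can be checked directly by computing count, total and mean amount per day in one pass. A minimal sketch on toy data (the values are made up; on the real data the same call would use `df1`):

```python
import pandas as pd

# Toy data, not taken from the dataset.
toy = pd.DataFrame({
    "day": ["Saturday", "Saturday", "Friday", "Friday", "Friday"],
    "amount": [90.0, 60.0, 20.0, 30.0, 40.0],
})

# Count, total and mean per day; sort by mean ticket size.
per_day = toy.groupby("day")["amount"].agg(["count", "sum", "mean"])
per_day = per_day.sort_values("mean", ascending=False)
print(per_day)
```

A day with fewer transactions but a higher mean, as Saturday is in this toy example, is exactly the pattern the insight describes.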
Data preparation for a grouped bar chart of total transaction amount by day and gender
In [28]: df1_grp = df1.groupby(by=['day', 'gender'])['amount'].sum().reset_index()
df1_grp  # not yet sorted by weekday
Out[28]:
day gender amount
0 Friday F 44450.06
1 Friday M 44789.60
2 Monday F 31511.01
3 Monday M 33901.90
4 Saturday F 36124.99
5 Saturday M 56877.57
6 Sunday F 39932.72
7 Sunday M 42241.84
8 Thursday F 35366.77
9 Thursday M 52310.59
10 Tuesday F 33626.24
11 Tuesday M 40614.62
12 Wednesday F 45654.61
13 Wednesday M 49304.83
Out[29]:
day gender amount
0 Monday F 31511.01
1 Monday M 33901.90
2 Tuesday F 33626.24
3 Tuesday M 40614.62
4 Wednesday F 45654.61
5 Wednesday M 49304.83
6 Thursday F 35366.77
7 Thursday M 52310.59
8 Friday F 44450.06
9 Friday M 44789.60
10 Saturday F 36124.99
11 Saturday M 56877.57
12 Sunday F 39932.72
13 Sunday M 42241.84
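Out[29] reorders the rows from Monday to Sunday, but the cell that does so is not shown. A minimal sketch with an ordered categorical, using the values from Out[28] (the original notebook may have used a different method):

```python
import pandas as pd

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

# Day/gender totals copied from Out[28] (alphabetical day order).
df1_grp = pd.DataFrame({
    "day": ["Friday", "Friday", "Monday", "Monday", "Saturday", "Saturday",
            "Sunday", "Sunday", "Thursday", "Thursday", "Tuesday", "Tuesday",
            "Wednesday", "Wednesday"],
    "gender": ["F", "M"] * 7,
    "amount": [44450.06, 44789.60, 31511.01, 33901.90, 36124.99, 56877.57,
               39932.72, 42241.84, 35366.77, 52310.59, 33626.24, 40614.62,
               45654.61, 49304.83],
})

# Sorting an ordered categorical follows calendar order, not alphabetical order.
df1_grp["day"] = pd.Categorical(df1_grp["day"], categories=WEEKDAYS, ordered=True)
df1_grp = df1_grp.sort_values(["day", "gender"]).reset_index(drop=True)
print(df1_grp.head(4))
```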
In [30]: fig=px.bar(data_frame=df1_grp,
x='day',
y='amount',color='gender',
barmode='group',
text=df1_grp.amount.apply(lambda x : str(round(x/1000,2))+'k')
)
fig.update_traces(textposition='outside')
fig.update_xaxes(title='Day',showgrid=False)
fig.update_yaxes(title='Transaction Amount',showgrid=False)
fig.update_layout(
title=dict(text = "Transaction Amount by Day and Gender",x=0.5,y=0.95),
title_font_size=20,
)
fig.show()
[Figure: grouped bar chart of total transaction amount by day (Monday to Sunday) and gender, labelled in thousands]
Insights
[Figure: transaction count by month (October, September, August)]
Insights: The bar chart shows the number of transactions increasing with each month from August to October, which is a good sign.
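The month-over-month trend can be quantified with `Series.pct_change` instead of being judged from bar heights. A minimal sketch on toy data (the counts are made up, and the month list assumes the data covers August to October, as the date chart suggests):

```python
import pandas as pd

# Toy transactions tagged with a month name (not the real data).
toy = pd.DataFrame({"month": ["August"] * 4 + ["September"] * 5 + ["October"] * 6})

MONTHS = ["August", "September", "October"]  # calendar order; assumed coverage
monthly = toy["month"].value_counts().reindex(MONTHS)
growth = monthly.pct_change().mul(100).round(1)  # % change vs previous month
print(pd.DataFrame({"count": monthly, "pct_change": growth}))
```

The first month has no predecessor, so its change is NaN.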
In [32]: # Ten customers with the largest total transaction amount
top10 = df.groupby(by='customer_id')['amount'].sum().nlargest(10)
fig = px.bar(x=top10.index.tolist(),
             y=top10.tolist(),
             color=top10.tolist(),
             text=top10.round().tolist())
fig.update_traces(textposition='outside', marker_coloraxis=None)
fig.update_xaxes(title='Customer ID', showgrid=False)
fig.update_yaxes(title='Transaction Amount', showgrid=False)
fig.update_layout(
    title=dict(text="Top 10 customers by Transaction Amount", x=0.5, y=0.95),
    title_font_size=20,
    showlegend=False,
    height=500,
)
fig.show()
[Figure: "Top 10 customers by Transaction Amount", bar chart with customer IDs on the x-axis]
In [33]: df.age_group.value_counts()
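The cell that creates `age_group` is not shown. A minimal sketch with `pd.cut`, assuming the bins implied by the labels (<20, 20-30, 30-40, 40-50, 50-60, >60) and assuming each bin includes its lower edge; the notebook's exact edges are unknown. `pd.cut` returns an ordered categorical, which would be consistent with Out[35] listing `<20` before `20-30`.

```python
import numpy as np
import pandas as pd

ages = pd.Series([18, 26, 38, 40, 55, 64])

# Left-closed bins [0, 20), [20, 30), ..., [60, inf); the edge convention is an assumption.
bins = [0, 20, 30, 40, 50, 60, np.inf]
labels = ["<20", "20-30", "30-40", "40-50", "50-60", ">60"]
age_group = pd.cut(ages, bins=bins, labels=labels, right=False)
print(age_group.tolist())
```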
In [34]: fig=px.bar(df1.age_group.value_counts(),
color=df1.age_group.value_counts(),
text=df1.age_group.value_counts().tolist(),
)
fig.update_traces(textposition='outside',marker_coloraxis=None)
fig.update_xaxes(title='Age Group',showgrid=False)
fig.update_yaxes(title='Transaction Count',showgrid=False)
fig.update_layout(
title=dict(text = "Transactions by Age Group",x=0.5,y=0.95),
title_font_size=20,
showlegend=False,
height = 450,
)
fig.show()
[Figure: "Transactions by Age Group", transaction count per age group, in the order 20-30, 30-40, <20, 40-50, >60, 50-60]
Insights
Most transactions were carried out by the "20-30" and "30-40" age groups.
Given the low transaction volume in the "50-60" and ">60" age groups, the company could consider offering these customers targeted incentives.
Data preparation for a grouped bar chart of total transaction amount by age group and gender
In [35]: df2_grp = df1.groupby(by=['age_group', 'gender'])['amount'].sum().reset_index()
df2_grp
Out[35]:
age_group gender amount
0 <20 F 46543.04
1 <20 M 35515.86
2 20-30 F 100941.94
3 20-30 M 129592.17
4 30-40 F 74379.11
5 30-40 M 100520.30
6 40-50 F 38050.03
7 40-50 M 43137.16
8 50-60 F 3652.86
9 50-60 M 5742.71
10 >60 F 3099.42
11 >60 M 5532.75
In [36]: fig=px.bar(data_frame=df2_grp,
x = 'age_group',
y = 'amount',
color='gender',
barmode='group',
text=df2_grp.amount.apply(lambda x : str(round(x/1000,2))+'k')
)
fig.update_traces(textposition='outside')
fig.update_xaxes(title='Age Group',showgrid=False)
fig.update_yaxes(title='Transaction Amount',showgrid=False)
fig.update_layout(
title=dict(text = "Transaction Amount by Age Group & Gender",x=0.5,y=0.95),
title_font_size=20,
)
fig.show()
[Figure: "Transaction Amount by Age Group & Gender", grouped bars per age group labelled in thousands]
Insights:
Males aged 20-30 contribute the most to total transaction amount.
In the '<20' age group, females are ahead of males in total transaction amount.
In [37]: # Average amount and number of transactions per date
df3_grp = df1.groupby(by='date').agg(avg_amount=('amount', 'mean'),
                                     txn_count=('transaction_id', 'count'))
df3_grp.columns = ['Amount', 'Transaction Count']
df3_grp.head()
Out[37]:
Amount Transaction Count
date
[Figure: average transaction amount ("Amount") and transaction count per date, August to October 2018]
Insights
The average transaction amount was very high, at roughly 100 AUD, on 7 August and 21 October.
The largest numbers of transactions took place on 17 August and 28 September.
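The dates above were read off the chart; they can also be listed from `df3_grp` directly. A minimal sketch with `Series.nlargest` on a toy stand-in for `df3_grp` (dates and values are made up):

```python
import pandas as pd

# Toy stand-in for df3_grp: one row per date (values are made up).
df3_grp = pd.DataFrame(
    {"Amount": [40.5, 101.2, 38.0, 97.9],
     "Transaction Count": [130, 120, 160, 125]},
    index=pd.DatetimeIndex(["2018-08-01", "2018-08-02", "2018-08-03", "2018-08-04"],
                           name="date"),
)

top_avg = df3_grp["Amount"].nlargest(2)               # highest average amount
top_count = df3_grp["Transaction Count"].nlargest(1)  # busiest day
print(top_avg)
print(top_count)
```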
In [39]: daily_amount = df1.groupby(by='date')['amount'].sum()
fig = px.line(x=daily_amount.index, y=daily_amount.values)
fig.update_traces(line=dict(color="#8cba51", width=3.5))
fig.update_xaxes(title='Date',showgrid=False)
fig.update_yaxes(title='Transaction Amount',showgrid=False)
fig.update_layout(
title=dict(text = "Total Txn Amount over time",x=0.5,y=0.95),
title_font_size=20,
showlegend=False,
width = 980,
height = 500
)
fig.show()
[Figure: "Total Txn Amount over time", daily total transaction amount by date]
Insights: The total transaction amount almost reached 14k AUD on 21 October. Because the transaction count on that day is not particularly high, a few large transactions were probably made that day.
In [40]: hourly_amount = df1.groupby(by='hour')['amount'].sum()
fig = px.line(x=hourly_amount.index,
              y=hourly_amount.values,
              text=hourly_amount.apply(lambda x: str(round(x / 1000)) + 'k').values
              )
fig.update_traces(line=dict(color="#f58634", width=5))
fig.update_xaxes(title='Hour',showgrid=False)
fig.update_yaxes(title='Transaction Amount',showgrid=False)
fig.update_layout(
title=dict(text = "Total Txn Amount hourly",x=0.5,y=0.95),
title_font_size=20,
showlegend=False,
width = 980,
height = 500
)
fig.update_traces(textposition='middle right')
fig.show()
[Figure: "Total Txn Amount hourly", total transaction amount by hour of day (0 to 23), each point labelled in thousands]
Insights:
The total transaction amount at 9:00 AM, about 47k, is the highest of the day.
Transaction amounts are lowest between 12:00 AM and 7:00 AM, outside normal business hours.
Out[41]:
hour month gender Transaction Count Total Txn Amount
0 0 August F 27 676.43
1 0 August M 20 568.77
2 0 October F 11 379.63
3 0 October M 16 411.30
4 0 September F 19 574.91
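The cell that builds `df4_grp` is not shown, only its head (Out[41]). A sketch that produces the same column layout with named aggregation, assuming `df1` has `hour`, `month`, `gender`, `transaction_id` and `amount` columns; the toy rows below are made up. Replacing `month` with `day` in the grouping keys gives the layout of Out[43].

```python
import pandas as pd

# Toy rows standing in for df1 (values are made up).
toy = pd.DataFrame({
    "hour": [0, 0, 0, 9],
    "month": ["August", "August", "October", "August"],
    "gender": ["F", "F", "M", "F"],
    "transaction_id": ["t1", "t2", "t3", "t4"],
    "amount": [10.0, 15.5, 7.25, 40.0],
})

# Named aggregation; ** unpacking allows output names containing spaces.
df4_grp = (
    toy.groupby(["hour", "month", "gender"])
       .agg(**{"Transaction Count": ("transaction_id", "count"),
               "Total Txn Amount": ("amount", "sum")})
       .reset_index()
)
print(df4_grp)
```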
fig2 = px.line(data_frame=df4_grp,
               x='hour',
               y='Total Txn Amount',
               color='gender',
               facet_col='month')
fig2.update_xaxes(title='Hour', showgrid=False)
fig2.update_yaxes(showgrid=False)
fig2.update_layout(
    title=dict(text="Hourly Transaction Amount by Month", x=0.5, y=0.95),
    title_font_size=20,
    width=980,
    height=500,
    margin=dict(l=80, r=80, t=100, b=80)
)
fig2.show()
[Figures: hourly transaction count and total transaction amount by gender, one panel per month]
In September and October, females make more transactions than males at 9:00 AM, yet their total transaction amount at that hour is lower. This suggests that females make comparatively small transactions early in the day.
In October at 2:00 PM, the total transaction amount by females is almost double that of males.
Out[43]:
hour day gender Transaction Count Total Txn Amount
0 0 Friday F 10 268.61
1 0 Friday M 12 265.03
2 0 Monday F 6 149.13
3 0 Monday M 3 61.40
4 0 Saturday F 10 303.74
In [44]: # Hourly transaction count by gender, one panel per day of the week
fig1 = px.line(data_frame=df4_grp,
               x='hour',
               y='Transaction Count',
               color='gender',
               facet_col='day')
fig1.update_xaxes(title='Hour', showgrid=False)
fig1.update_yaxes(showgrid=False)
fig1.update_layout(
    title=dict(text="Hourly Transaction Count by Day", x=0.5, y=0.95),
    title_font_size=20,
    width=980,
    height=500,
    margin=dict(l=80, r=80, t=100, b=80)
)
fig1.show()

# Hourly total transaction amount by gender, one panel per day of the week
fig2 = px.line(data_frame=df4_grp,
               x='hour',
               y='Total Txn Amount',
               color='gender',
               facet_col='day')
fig2.update_xaxes(title='Hour', showgrid=False)
fig2.update_yaxes(showgrid=False)
fig2.update_layout(
    title=dict(text="Hourly Transaction Amount by Day", x=0.5, y=0.95),
    title_font_size=20,
    width=980,
    height=500,
    margin=dict(l=80, r=80, t=100, b=80)
)
fig2.show()
[Figures: hourly transaction count and total transaction amount by gender, one panel per day of the week]
On Saturday at 2:00 PM, the total transaction amount by males is almost six times that of females. On Sunday at the same hour, the pattern is reversed.
Out[45]:
status card_present_flag account long_lat txn_description merchant_id first_name balance date gender ... extraction amount
5 rows × 23 columns
In [46]: # Ten customers with the highest average balance
top_balance = df2.groupby(by='customer_id')['balance'].mean().nlargest(10)
fig = px.bar(x=top_balance.index.tolist(),
             y=top_balance.tolist(),
             text=top_balance.apply(lambda x: str(round(x / 1000, 2)) + 'k').tolist(),
             color=top_balance.tolist())
fig.update_traces(textposition='outside', marker_coloraxis=None)
fig.update_xaxes(title='Customer ID', showgrid=False)
fig.update_yaxes(title='Average Balance', showgrid=False)
fig.update_layout(
    title=dict(text="Top 10 Valuable Customers by Average Balance", x=0.5, y=0.95),
    title_font_size=20,
    showlegend=False,
    height=500
)
fig.show()
[Figure: bar chart of the 10 customers with the highest average balance; the top customer averages about 199.84k]
Out[48]:
month Avg Amount Total Amount
In [49]: g2 = df.groupby(by='month')['balance'].agg(['mean', 'sum'])
g2.columns=['Avg Balance', 'Total Balance']
g2[['Avg Balance','Total Balance']]=g2[['Avg Balance','Total Balance']].round().astype(int)
g2.reset_index(inplace=True)
g2
Out[49]:
month Avg Balance Total Balance
Out[50]:
month Avg Amount Total Amount Avg Balance Total Balance
Insights:
End