BDA02 IntroToPython
Shankar Venkatagiri
Reference
Not printed!
Flow
Q: Which is quicker?
Numerical variable
Categorical variable
Database
S. Moro, R. Laureano and P. Cortez. Using Data Mining for Bank Direct Marketing:
An Application of the CRISP-DM Methodology. In P. Novais et al. (Eds.), Proceedings
of the European Simulation and Modelling Conference - ESM'2011, pp. 117-121,
Guimarães, Portugal, October, 2011. EUROSIS.
# bank client data:
1 - age
2 - job: type of job ("admin.","unknown","unemployed","management", ...)
3 - marital : marital status ("married","divorced","single")
4 - education ("unknown","secondary","primary","tertiary")
5 - default: has credit in default? ("yes","no")
6 - balance: average yearly balance, in euros
7 - housing: has housing loan? ("yes","no")
8 - loan: has personal loan? ("yes","no")
# other attributes:
13 - campaign: number of contacts performed during this campaign, for this client
14 - pdays: days passed after last contact (-1 = client was not previously contacted)
15 - previous: number of contacts performed before this campaign for this client
16 - poutcome: outcome of previous campaign ("unknown","other","failure","success")
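The -1 code in pdays is a sentinel, not a real number of days, so it can distort summaries if treated as a number. A minimal sketch of one way to handle it, assuming pandas; the four rows and the column names previously_contacted and pdays_clean are made up for illustration:

```python
import pandas as pd

# Hypothetical values for illustration only (not taken from the dataset).
df = pd.DataFrame({"pdays": [-1, 92, -1, 180], "previous": [0, 2, 0, 1]})

# Flag clients who were contacted before this campaign.
df["previously_contacted"] = df["pdays"] != -1

# Keep real day counts and turn the -1 sentinel into a missing value,
# so that it is skipped when computing averages.
df["pdays_clean"] = df["pdays"].where(df["pdays"] != -1)

print(df)
print(df["pdays_clean"].mean())
```

Whether to flag, drop, or recode the sentinel depends on the analysis; this only shows the mechanics.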
Whenever you read a dataset in, list the first few rows (head)
Check the data types for any surprises
Use astype to convert "object" columns that hold categories (e.g. job, marital) to categorical
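The three steps above can be sketched as follows, assuming pandas. The three-row sample is made up and stands in for the real file; the semicolon separator is also an assumption about how that file is stored:

```python
import io
import pandas as pd

# Hypothetical sample standing in for the bank marketing file.
# The rows are invented; the column names follow the attribute list.
sample_csv = io.StringIO(
    "age;job;marital;balance;housing\n"
    "30;management;single;1200;yes\n"
    "45;admin.;married;-50;no\n"
    "52;unemployed;divorced;300;yes\n"
)

# Step 1: read the data and list the first few rows.
df = pd.read_csv(sample_csv, sep=";")
print(df.head())

# Step 2: check the data types; text columns usually appear as
# "object" (or a string dtype in newer pandas versions).
print(df.dtypes)

# Step 3: convert the text columns that hold categories.
for col in ["job", "marital", "housing"]:
    df[col] = df[col].astype("category")

print(df.dtypes)
```

With the real file, the io.StringIO object would be replaced by the path to the CSV.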
Categorical
Bar