Table of Contents

Preface
1. Preliminaries
What is this book about?
Why Python for Data Analysis?
Get things done, fast
Batteries included
Python as glue
Solving the "two-language" problem
Embedding Python in Larger Systems
Why not Python?
Essential Python Libraries
NumPy
pandas
SciPy
matplotlib
IPython
Installation and Setup
Getting Python
Installing Python Packages
Packages you'll need
Getting help
Finding new packages
Integrated Development Environments (IDEs)
Navigating this book
Naming conventions
Jargon
Python 2 and Python 3
Other Python Implementations
A Brief History of pandas
2. Whetting your appetite
Example: 1.usa.gov data from bit.ly
Counting time zones in pure Python
Counting time zones with pandas
Example: MovieLens 1M data set
Measuring rating disagreement
Example: US Baby Names 1880-2010
Analyzing naming trends
3. IPython: an interactive computing and development environment
IPython Basics
Tab completion
Introspection
The %run command
Executing code from the clipboard
Keyboard shortcuts
Exceptions and tracebacks
Magic commands
Qt-based Rich GUI Console
Matplotlib integration and pylab mode
Using the Command History
Searching and reusing the command history
Input and output variables
Logging the input and output
Interacting with the Operating System
Shell commands and aliases
Directory bookmark system
Software Development Tools
Interactive Debugger
Timing code: %time and %timeit
Basic profiling: %prun and %run -p
Profiling a function line-by-line
Tips for productive code development using IPython
Reloading module dependencies
Code design tips
Advanced IPython Features
Making your own classes IPython-friendly
Profiles and Configuration
Credits
4. NumPy Basics: Arrays and Vectorized Computation
The NumPy ndarray: a multidimensional array object
Creating ndarrays
Data Types for ndarrays
Operations between arrays and scalars
Basic indexing and slicing
Boolean selection, subsetting, filtering
Fancy indexing
Transposing arrays and swapping axes
Universal Functions: Fast element-wise array functions
Data processing using arrays
Expressing conditional logic as array operations
Mathematical and statistical methods
Methods for boolean arrays
Sorting
Unique and other set logic
File input and output with arrays
Storing arrays on disk in binary format
Saving and loading text files
Linear algebra
Random number generation
Example: Random Walks
Simulating many random walks at once
5. First steps with pandas
Data structures in pandas
Series
DataFrame
Axis indexes
Arithmetic and data alignment
Indexing, selection, and filtering
Reindexing
Dropping entries from an axis
Hierarchical indexing
Summaries and descriptive statistics
Unique values, binning, and value counts
Correlation and covariance
Other areas of pandas
6. Data loading and storage
Reading Tabular Data from Files
Reading data from text files
Reading text files in pieces
Reading Microsoft Excel files
HDF5 and binary data formats
Interacting with SQL and NoSQL databases
Exporting data to other formats
JSON Export
Interacting with Web APIs
7. String and Text Processing
String manipulation
Regular expressions
Vectorized string functions in pandas
Dealing with common file formats
JSON
CSV and delimited formats
XML
Web and HTML scraping with BeautifulSoup
8. Data Wrangling: Clean, Transform, Merge, Reshape
Combining and merging data sets
Database-style DataFrame merges
Merging on index
Concatenating along an axis
Combining data with overlap
Reshaping and pivoting
Reshaping with hierarchical indexing
Pivoting "long" to "wide" format
Handling missing data
Filtering out missing data
Filling in missing data
Data transformation and cleaning
Removing duplicates
Transforming data using a function or mapping
Renaming axis indexes
Discretization and binning
Detecting and filtering outliers
Permutation and random sampling
Computing indicator / dummy variables
Example: USDA Food Database
9. Plotting and Visualization
Plotting functions in pandas
Line plots
Bar plots
Histograms and density plots
Scatter plots
Plots grouped by factors
A brief matplotlib API primer
Figures and Subplots
Colors, markers, and line styles
Ticks, labels, and legends
Annotations and drawing on a subplot
Saving plots to file
matplotlib configuration
Plotting Maps: Visualizing Haiti Earthquake Crisis data
Python visualization tool ecosystem
Chaco
mayavi
Other packages
The future of visualization tools?
10. Data Aggregation and Group Operations
GroupBy mechanics
Iterating over groups
Selecting a column or subset of columns
Grouping with dicts and Series
Grouping with functions
Grouping by index levels
Data aggregation
Column-wise and multiple function application
Returning aggregated data in "unindexed" form
Group-wise operations and transformations
Apply: General split-apply-combine
Quantile and bucket analysis
Example: Filling missing values with group-specific values
Example: Random sampling and permutation
Example: Group weighted average and correlation
Example: Group-wise linear regression
Pivot tables and Cross-tabulation
Cross-tabulations: crosstab
Example: 2012 Federal Election Commission Database
Donation statistics by occupation and employer
Bucketing donation amounts
Donation statistics by state
11. Time series
Date and Time Data Types and Tools
Converting between string and datetime
Time Series Basics
Indexing, selection, subsetting
Time series with duplicate dates
Date ranges, Frequencies, and Shifting
Generating date ranges
Frequencies and Date Offsets
Shifting (leading and lagging) data
Time Zone Handling
Localization and Conversion
Operations with time zone-aware Timestamp objects
Operations between different time zones
Periods and Period Arithmetic
Period Frequency Conversion
Quarterly period frequencies
Converting Timestamps to Periods (and back)
Resampling and Frequency Conversion
Downsampling
Upsampling and interpolation
Resampling with periods
Time series plotting
Moving window functions
Exponentially-weighted functions
Binary moving window functions
User-defined moving window functions
Performance and Memory Usage Notes
12. Financial and Economic Data Applications
Data munging topics
Time series and cross-section alignment
Operations with time series of different frequencies
Time of day and "as of" data selection
Splicing together data sources
Return indexes and cumulative returns
Panel data operations
Group transforms and analysis
Group factor exposures
Decile and quartile analysis
More example applications
Signal frontier analysis
Future contract rolling
Rolling correlation and linear regression
13. Advanced NumPy
ndarray object internals
NumPy dtype hierarchy
Advanced array manipulation
Reshaping arrays
C vs. Fortran order
Concatenating and splitting arrays
Repeating elements: tile and repeat
Fancy indexing equivalents: take and put
Broadcasting
Broadcasting over other axes
Setting array values by broadcasting
Advanced ufunc usage
Ufunc instance methods
Custom ufuncs
Structured and record arrays
Nested dtypes and multidimensional fields
Why use structured arrays?
Structured array manipulations: numpy.lib.recfunctions
More about sorting
Indirect sorts: argsort and lexsort
Alternate sort algorithms
numpy.searchsorted: Finding elements in a sorted array
Advanced array input and output
Memory-mapped files
HDF5 and other array storage options
Performance tips
The importance of contiguous memory
Other speed options: Cython, f2py, C
14. Appendix: Python Language Essentials
The Python interpreter
Language Semantics
Indentation, not braces
Everything is an object
Comments
Function and object method calls
Variables and pass-by-reference
Dynamic references, strong types
Attributes and methods
"Duck" typing
Imports
Binary operators and comparisons
Eagerness versus laziness
Mutable and immutable objects
The Zen of Python
Scalar Types
Numeric types
Strings
Booleans
Type casting
None
Dates and times
Control Flow
If, elif, and else
For loops
While loops
pass
range and xrange
Ternary Expressions
Tuple
Unpacking tuples
Tuple methods
List
Adding and removing elements
Concatenating and combining lists
Sorting
Binary search and maintaining a sorted list
Slicing
Built-in Sequence Functions
enumerate
sorted
zip
reversed
Dict
Creating dicts from sequences
Default values
Valid dict key types
Set
List, set, and dict comprehensions
Nested list comprehensions
Functions
Namespaces, scope, and local functions
Returning multiple values
Functions are objects
Anonymous (lambda) functions
Closures: functions that return functions
"Extended call" syntax with *args, **kwargs
Currying: partial argument application
Files and the operating system
Generators
Why care about generators?
The itertools module