Professional Documents
Culture Documents
Machine Learning Contents 8
Machine Learning Contents 8
Preface. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xi
2. Loading Data. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
2.0 Introduction 23
iii
2.1 Loading a Sample Dataset 23
2.2 Creating a Simulated Dataset 24
2.3 Loading a CSV File 27
2.4 Loading an Excel File 28
2.5 Loading a JSON File 29
2.6 Querying a SQL Database 30
3. Data Wrangling. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
3.0 Introduction 33
3.1 Creating a Data Frame 34
3.2 Describing the Data 35
3.3 Navigating DataFrames 37
3.4 Selecting Rows Based on Conditionals 38
3.5 Replacing Values 39
3.6 Renaming Columns 41
3.7 Finding the Minimum, Maximum, Sum, Average, and Count 42
3.8 Finding Unique Values 43
3.9 Handling Missing Values 44
3.10 Deleting a Column 46
3.11 Deleting a Row 47
3.12 Dropping Duplicate Rows 48
3.13 Grouping Rows by Values 50
3.14 Grouping Rows by Time 51
3.15 Looping Over a Column 53
3.16 Applying a Function Over All Elements in a Column 54
3.17 Applying a Function to Groups 55
3.18 Concatenating DataFrames 55
3.19 Merging DataFrames 57
iv | Table of Contents
5. Handling Categorical Data. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
5.0 Introduction 81
5.1 Encoding Nominal Categorical Features 82
5.2 Encoding Ordinal Categorical Features 84
5.3 Encoding Dictionaries of Features 86
5.4 Imputing Missing Class Values 88
5.5 Handling Imbalanced Classes 90
6. Handling Text. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
6.0 Introduction 95
6.1 Cleaning Text 95
6.2 Parsing and Cleaning HTML 97
6.3 Removing Punctuation 98
6.4 Tokenizing Text 98
6.5 Removing Stop Words 99
6.6 Stemming Words 100
6.7 Tagging Parts of Speech 101
6.8 Encoding Text as a Bag of Words 104
6.9 Weighting Word Importance 106
Table of Contents | v
8.9 Binarizing Images 137
8.10 Removing Backgrounds 140
8.11 Detecting Edges 144
8.12 Detecting Corners 146
8.13 Creating Features for Machine Learning 150
8.14 Encoding Mean Color as a Feature 152
8.15 Encoding Color Histograms as Features 153
vi | Table of Contents
12.1 Selecting Best Models Using Exhaustive Search 210
12.2 Selecting Best Models Using Randomized Search 212
12.3 Selecting Best Models from Multiple Learning Algorithms 214
12.4 Selecting Best Models When Preprocessing 215
12.5 Speeding Up Model Selection with Parallelization 217
12.6 Speeding Up Model Selection Using Algorithm-Specific Methods 219
12.7 Evaluating Performance After Model Selection 220
Index. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 341
Table of Contents | ix