Modern Data Science With R-775437 Chapters
Contents
Preface
2 Data visualization
2.1 The 2012 federal election cycle
2.2 Composing data graphics
2.3 Importance of data graphics: Challenger
2.4 Creating effective presentations
2.5 The wider world of data visualization
2.6 Further resources
2.7 Exercises
2.8 Supplementary exercises
3.2 Canonical data graphics in R
3.3 Extended example: Historical baby names
3.4 Further resources
3.5 Exercises
3.6 Supplementary exercises
6 Tidy data
6.1 Tidy data
6.2 Reshaping data
6.3 Naming conventions
6.4 Data intake
6.5 Further resources
6.6 Exercises
6.7 Supplementary exercises
7 Iteration
7.1 Vectorized operations
7.2 Using across() with dplyr functions
7.3 The map() family of functions
7.4 Iterating over a one-dimensional vector
7.5 Iteration over subgroups
7.6 Simulation
7.7 Extended example: Factors associated with BMI
7.8 Further resources
7.9 Exercises
7.10 Supplementary exercises
9 Statistical foundations
9.1 Samples and populations
9.2 Sample statistics
9.3 The bootstrap
9.4 Outliers
9.5 Statistical models: Explaining variation
9.6 Confounding and accounting for other factors
9.7 The perils of p-values
9.8 Further resources
9.9 Exercises
9.10 Supplementary exercises
10 Predictive modeling
10.1 Predictive modeling
10.2 Simple classification models
10.3 Evaluating models
10.4 Extended example: Who has diabetes?
10.5 Further resources
10.6 Exercises
10.7 Supplementary exercises
11 Supervised learning
11.1 Non-regression classifiers
11.2 Parameter tuning
11.3 Example: Evaluation of income models redux
11.4 Extended example: Who has diabetes this time?
11.5 Regularization
11.6 Further resources
11.7 Exercises
11.8 Supplementary exercises
12 Unsupervised learning
12.1 Clustering
12.2 Dimension reduction
12.3 Further resources
12.4 Exercises
12.5 Supplementary exercises
13 Simulation
13.1 Reasoning in reverse
13.2 Extended example: Grouping cancers
13.3 Randomizing functions
13.4 Simulating variability
13.5 Random networks
13.6 Key principles of simulation
13.7 Further resources
13.8 Exercises
13.9 Supplementary exercises
15.8 Exercises
15.9 Supplementary exercises
16 Database administration
16.1 Constructing efficient SQL databases
16.2 Changing SQL data
16.3 Extended example: Building a database
16.4 Scalability
16.5 Further resources
16.6 Exercises
16.7 Supplementary exercises
18 Geospatial computations
18.1 Geospatial operations
18.2 Geospatial aggregation
18.3 Geospatial joins
18.4 Extended example: Trail elevations at MacLeish
18.5 Further resources
18.6 Exercises
18.7 Supplementary exercises
19 Text as data
19.1 Regular expressions using Macbeth
19.2 Extended example: Analyzing textual data from arXiv.org
19.3 Ingesting text
19.4 Further resources
19.5 Exercises
19.6 Supplementary exercises
20 Network science
20.1 Introduction to network science
20.2 Extended example: Six degrees of Kristen Stewart
20.3 PageRank
20.4 Extended example: 1996 men's college basketball
20.5 Further resources
20.6 Exercises
20.7 Supplementary exercises
B.2 Learning R
B.3 Fundamental structures and objects
B.4 Add-ons: Packages
B.5 Further resources
B.6 Exercises
B.7 Supplementary exercises
C Algorithmic thinking
C.1 Introduction
C.2 Simple example
C.3 Extended example: Law of large numbers
C.4 Non-standard evaluation
C.5 Debugging and defensive coding
C.6 Further resources
C.7 Exercises
C.8 Supplementary exercises
E Regression modeling
E.1 Simple linear regression
E.2 Multiple regression
E.3 Inference for regression
E.4 Assumptions underlying regression
E.5 Logistic regression
E.6 Further resources
E.7 Exercises
E.8 Supplementary exercises
Bibliography
Indices
Subject index
R index