Andre Ye and Zian Wang

Modern Deep Learning for Tabular Data


Novel Approaches to Common Modeling Problems
Andre Ye
Seattle, WA, USA

Zian Wang
Redmond, WA, USA

ISBN 978-1-4842-8691-3 e-ISBN 978-1-4842-8692-0


https://doi.org/10.1007/978-1-4842-8692-0

© Andre Ye and Zian Wang 2023

This work is subject to copyright. All rights are solely and exclusively licensed
by the Publisher, whether the whole or part of the material is concerned,
specifically the rights of translation, reprinting, reuse of illustrations, recitation,
broadcasting, reproduction on microfilms or in any other physical way, and
transmission or information storage and retrieval, electronic adaptation,
computer software, or by similar or dissimilar methodology now known or
hereafter developed.

The use of general descriptive names, registered names, trademarks, service
marks, etc. in this publication does not imply, even in the absence of a specific
statement, that such names are exempt from the relevant protective laws and
regulations and therefore free for general use.

The publisher, the authors, and the editors are safe to assume that the advice
and information in this book are believed to be true and accurate at the date
of publication. Neither the publisher nor the authors or the editors give a
warranty, expressed or implied, with respect to the material contained herein
or for any errors or omissions that may have been made. The publisher
remains neutral with regard to jurisdictional claims in published maps and
institutional affiliations.

This Apress imprint is published by the registered company APress Media, LLC,
part of Springer Nature.
The registered company address is: 1 New York Plaza, New York, NY 10004,
U.S.A.
To the skeptics and the adventurers alike
Foreword 1
Tabular data is the most common data type in real-world AI, but until recently,
tabular data problems were almost solely tackled with tree-based models. This
is because tree-based models are representationally efficient for tabular data,
inherently interpretable, and fast to train. In addition, traditional deep neural
network (DNN) architectures are not suitable for tabular data – they are vastly
overparameterized, causing them to rarely find optimal solutions for tabular
data manifolds.
What, then, does deep learning have to offer tabular data learning? The
main attraction is significant expected performance improvements, particularly
for large datasets – as has been demonstrated in the last decade in images,
text, and speech. In addition, deep learning offers many other benefits,
such as the ability to do multimodal learning, removing the need for feature
engineering, transfer learning across different domains, semi-supervised
learning, data generation, and multitask learning – all of which have led to
significant AI advances in other data types.
This led us to develop TabNet, one of the first DNN architectures designed
specifically for tabular data. Similar to the canonical architectures in other data
types, TabNet is trained end to end without feature engineering. It uses
sequential attention to choose which features to reason from at each decision
step, enabling both interpretability and better learning as the learning capacity
is used for the most salient features. Importantly, we were able to show that
TabNet outperforms or is on par with tree-based models and is able to achieve
significant performance improvements with unsupervised pretraining. This has
led to an increased amount of work on deep learning for tabular data.
Importantly, the improvement from deep learning on tabular data extends
from academic datasets to large-scale problems in the real world. As part of
my work at Google Cloud, I have observed TabNet achieving significant
performance gains on billion-scale real-world datasets for many organizations.
When Andre reached out to share this book, I was delighted to write a
foreword. It is the first book I have come across that is dedicated to the
application of deep learning to tabular data. The book is approachable to
anyone who knows only a bit of Python, with helpful, extensive code samples
and notebooks. It strikes a good balance between covering the basics and
discussing more cutting-edge research, including interpretability, data
generation, robustness, meta-optimization, and ensembling. Overall, it is a great
book for data scientist–type personas to get started on deep learning for
tabular data. I’m hoping it will encourage and enable even more successful
real-world applications of deep learning to tabular data.
Tomas Pfister
Head of AI Research, Google Cloud
Author of the TabNet paper (covered in Chapter 6)
Foreword 2
Almost three decades ago, the inception of support vector machines (SVMs)
brought a storm of papers to the machine learning paradigm. The impact of
SVMs was far-reaching and touched many fields of research. Since then, both
the complexity of data and hardware technologies have expanded multifold,
demanding the need for more sophisticated algorithms. This need took us to
the development of state-of-the-art deep learning networks such as
convolutional neural networks (CNNs).
Artificial intelligence (AI) tools like deep learning gained momentum and
stretched into industries, such as driverless cars. High-performance graphics
processing units (GPUs) have accelerated the computation of deep learning
models. Consequently, deeper nets have evolved, pushing both industrial
innovation and academic research.
Generally, visual data occupies the space of deep learning technologies,
and tabular data (nonvisual) is largely ignored in this domain. Many areas of
research, such as biological sciences, material science, medicine, energy, and
agriculture, generate large amounts of tabular data.
This book attempts to explore the less examined domain of tabular data
analysis using deep learning technologies. Very recently, some exciting
research has been done that could drive us into an era of extensively applying
deep learning technologies to tabular data.
Andre and Andy have undoubtedly touched this dimension and
comprehensively put forward essential and relevant works for both novice and
expert readers to apprehend. They describe the core models that bridge
tabular data and deep learning technologies. Moreover, they have built up
detailed and easy-to-follow code examples. Links to Python
scripts and necessary data are provided for anyone to reimplement the
models.
Although the most modern deep learning networks perform very
promisingly on many applications, they have their limitations, particularly
when handling tabular data. For instance, in biological sciences, tabular data
such as multi-omics data have huge dimensions with very few samples. Deep
learning models have transfer learning capability, which could be used on a
large scale on tabular data to tap hidden information.
Andre and Andy's work is commendable in bringing this crucial
information together in a single work. This is a must-read for AI adventurers!
Alok Sharma
RIKEN Center for Integrative Medical Sciences, Japan
Author of the DeepInsight paper (covered in Chapter 4)
Introduction
Deep learning has become the public and private face of artificial intelligence.
When one talks casually about artificial intelligence with friends at a party,
strangers on the street, and colleagues at work, it is almost always on the
exciting models that generate language, create art, synthesize music, and so
on. Massive and intricately designed deep learning models power most of
these exciting machine capabilities. Many practitioners, however, are rightfully
pushing back against the technological sensationalism of deep learning. While
deep learning is “what’s cool,” it certainly is not the be-all and end-all of
modeling.
While deep learning has undoubtedly dominated specialized, high-
dimensional data forms such as images, text, and audio, the general
consensus is that it performs comparatively worse on tabular data. It is
therefore tabular data where those with some distaste, or even resentment,
toward deep learning stake out their argument. It was and still is fashionable
to publish accepted deep learning papers that make seemingly trivial or even
scientifically dubious modifications – this being one of the gripes against deep
learning research culture – but now it is also fashionable within this minority to
bash the “fresh-minted new-generation data scientists” for being too
enamored with deep learning and instead to tout the comparatively more
classic tree-based methods as the “best” model for tabular data. You
will find this perspective everywhere – in bold research papers, AI-oriented
social media, research forums, and blog posts. Indeed, the counter-culture is
often as fashionable as the mainstream culture, whether it is with hippies or
deep learning.
This is not to say that there is no good research that points in favor of
tree-based methods over deep learning – there certainly is.1 But too often this
nuanced research is mistakenly taken as a general rule, and those with a
distaste for deep learning often commit to the same problematic doctrine as
many advancing the state of deep learning: taking results obtained within a
generally well-defined set of limitations and willfully extrapolating them in
ways irrespective of said limitations.
The most obvious shortsightedness of those who advocate for tree-based
models over deep learning models is in the problem domain space. A common
criticism of tabular deep learning approaches is that they seem like “tricks,”
one-off methods that work sporadically, as opposed to reliably high-
performance tree methods. The intellectual question Wolpert and Macready’s
classic No Free Lunch Theorem makes us think about whenever we encounter
claims of universal superiority, whether this is superiority in performance,
consistency, or another metric, is: Universal across what subset of the problem
space?
The datasets used by the well-publicized research surveys and more
informal investigations evaluating the performance of deep learning on tabular data
are common benchmark datasets – the Forest Cover dataset, the Higgs Boson
dataset, the California Housing dataset, the Wine Quality dataset, and so on.
These datasets, even when evaluated in the dozens, are undoubtedly limited.
It would not be unreasonable to suggest that out of all data forms, tabular
data is the most varied. Of course, we must acknowledge that it is much more
difficult to perform an evaluative survey study with poorly behaved diverse
datasets than with more homogeneous benchmark datasets. Yet those who tout the
findings of such studies as bearing a broad verdict on the capabilities of neural
networks on tabular data overlook the sheer breadth of tabular data domains
in which machine learning models are applied.
With the increase of data signals acquirable from biological systems,
biological datasets have increased significantly in feature richness from their
state just one or two decades ago. The richness of these tabular datasets
exposes the immense complexity of biological phenomena – intricate patterns
across a multitude of scales, ranging from the local to the global, interacting
with each other in innumerable ways. Deep neural networks are almost always
used to model modern tabular datasets representing complex biological
phenomena. Alternatively, content recommendation, an intricate domain
requiring careful and high-capacity modeling power, more or less universally
employs deep learning solutions. Netflix, for instance, reported “large
improvements to our recommendations as measured by both offline and online
metrics” when implementing deep learning.2 Similarly, a Google paper
demonstrating the restructuring of deep learning as the paradigm powering
YouTube recommendations writes that “In conjunction with other product
areas across Google, YouTube has undergone a fundamental paradigm shift
towards using deep learning as a general-purpose solution for nearly all
learning problems.”3
We can find many more examples, if only we look. Many tabular datasets
contain text attributes, such as an online product reviews dataset that contains
a textual review as well as user and product information represented in tabular
fashion. Recent house listing datasets contain images associated with standard
tabular information such as the square footage, number of bathrooms, and so
on. Alternatively, consider stock price data that captures time-series data in
addition to company information in tabular form. What if we want to also add
the top ten financial headlines in addition to this tabular data and the time-
series data to forecast stock prices? Tree-based models, to our knowledge,
cannot effectively address any of these multimodal problems. Deep learning,
on the other hand, can be used to solve all of them. (All three of these
problems, and more, will be explored in the book.)
The fact is that data has changed since the 2000s and early 2010s, which
is when many of the benchmark datasets were used in studies that
investigated performance discrepancies between deep learning and tree-based
models. Tabular data is more fine-grained and complex than ever, capturing a
wide range of incredibly complex phenomena. It is decidedly not true that
deep learning functions as an unstructured, sparsely and randomly successful
method in the context of tabular data.
However, raw supervised learning is not the only problem encountered in
modeling tabular data. Tabular data is often noisy, and we need methods to
denoise it or to otherwise be robust to noise. Tabular data
is often constantly changing, so we need models that can structurally adapt
to new data easily. We often encounter many different datasets that share
a fundamentally similar structure, so we would like to be able to transfer
knowledge from one model to another. Sometimes tabular data is lacking, and
we need to generate realistic new data. Alternatively, we would also like to be
able to develop very robust and well-generalized models with a very limited
dataset. Again, as far as we are aware, tree-based models either cannot do
these tasks or have difficulty doing them. Neural networks, on the other hand,
can do all of these successfully, given adaptations to tabular data from
the computer vision and natural language processing domains.
Of course, there are important legitimate general objections to neural
networks.
One such objection is interpretability – the contention that deep neural
networks are less interpretable than tree models. Interpretability is a
particularly interesting idea because it is more an attribute of the human
observer than the model itself. Is a Gradient Boosting model operating on
hundreds of features really more intrinsically interpretable than a multilayer
perceptron (MLP) trained on the same dataset? Tree models do indeed build
easily understandable single-feature split conditions, but this in and of itself is
not especially valuable. Moreover, many or most models in popular tree
ensembles like Gradient Boosting systems do not directly model the target, but
rather the residual, which makes direct interpretability more difficult. What we
care about more is the interaction between features. To effectively capture
this, tree models and neural networks alike need external interpretability
methods to collapse the complexity of decision making into key directions and
forces. Thus, it is not clear that tree models are more inherently interpretable
than neural network models4 on complex datasets.5
The second primary objection is the laboriousness of tuning the meta-
parameters of neural networks. This is an irreconcilable attribute of neural
networks. Neural networks are fundamentally closer to ideas than real
concrete algorithms, given the sheer diversity of possible configurations and
approaches to the architecture, training curriculum, optimization processes,
and so on. It should be noted that this is an even more pronounced problem
for computer vision and natural language processing domains than tabular
data and that approaches have been proposed to mitigate this effect.
Moreover, it should be noted that tree-based models have a large number of
meta-parameters too, which often require systematic meta-optimization.
A third objection is the inability of neural networks to effectively
preprocess data in a way that reflects the effective meaning of the features.
Popular tree-based models are able to interpret features in a way that is
argued to be more effective for heterogeneous data. However, this does not
bar the conjoined application of deep learning with preprocessing schemes,
which can put neural networks on equal footing with tree-based models
with respect to access to expressive features. We spend a significant amount
of space in the book covering different preprocessing schemes.
All of this is to challenge the notion that tree-based models are superior
to deep learning models or even that tree-based models are consistently or
generally superior to deep learning models. It is not to suggest that deep
learning is generally superior to tree-based models, either. To be clear, the
following claims were made:

Deep learning works successfully on a certain well-defined domain of
problems, just as tree-based methods work successfully on another
well-defined problem domain.
Deep learning can solve many problems beyond raw tabular
supervised learning that tree-based methods cannot, such as
modeling multimodal data (image, text, audio, and other data in
addition to tabular data), denoising noisy data, transferring
knowledge across datasets, successfully training on limited datasets,
and generating data.
Deep learning is indeed prone to difficulties, such as interpretability
and meta-optimization. In both cases, however, tree-based models
suffer from the same problems to a somewhat similar degree (on the
sufficiently complex cases where deep learning would be at least
somewhat successful). Moreover, we can attempt to reconcile
weaknesses of neural networks with measures such as employing
preprocessing pipelines in conjunction with deep models.

The objective of this book is to substantiate these claims by providing the
theory and tools to apply deep learning to tabular data problems. The
approach is tentative and explorative, especially given the novel nature of
much of this work. You are not expected to accept the validity or success of
everything or even most things presented. This book is rather a treatment of a
wide range of ideas, with the primary pedagogical goal of exposure.
This book is written toward two audiences (although you are more than
certainly welcome if you feel you do not belong to either): experienced tabular
data skeptics and domain specialists seeking to potentially apply deep learning
to their field. To the former, we hope that you will find the methods and
discussion presented in this book, both original and synthesized from research,
at least interesting and worthy of thought, if not convincing. To the latter, we
have structured the book in a way that provides sufficient introduction to
necessary tools and concepts, and we hope our discussion will assist modeling
your domain of specialty.
---
Organization of the Book
Across 12 chapters, the book is organized into three parts.
Part 1, “Machine Learning and Tabular Data,” contains Chapter 1,
“Classical Machine Learning Principles and Methods,” and Chapter 2, “Data
Preparation and Engineering.” This part introduces machine learning and data
concepts relevant for success in the remainder of the book.

Chapter 1, “Classical Machine Learning Principles and Methods,”
covers important machine learning concepts and algorithms. The
chapter demonstrates the theory and implementation of several
foundational machine learning models and tabular deep learning
competitors, including Gradient Boosting models. It also discusses the
bridge between classical and deep learning.
Chapter 2, “Data Preparation and Engineering,” is a wide exposition
of methods to manipulate, manage, transform, and store tabular data
(as well as other forms of data you may need for multimodal
learning). The chapter discusses NumPy, Pandas, and TensorFlow
Datasets (native and custom); encoding methods for categorical,
text, time, and geographical data; normalization and standardization
(and variants); feature transformations, including through
dimensionality reduction; and feature selection.
Part 2, “Applied Deep Learning Architectures,” contains Chapter 3, “Neural
Networks and Tabular Data”; Chapter 4, “Applying Convolutional Structures to
Tabular Data”; Chapter 5, “Applying Recurrent Structures to Tabular Data”;
Chapter 6, “Applying Attention to Tabular Data”; and Chapter 7, “Tree-Based
Deep Learning Approaches.” This part constitutes the majority of the book and
demonstrates both how various neural network architectures function in their
“native application” and how they can be appropriated for tabular data in both
intuitive and unintuitive ways. Chapters 3, 4, and 5 each center on one of the
three well-established (even “traditional”) areas of deep learning – artificial
neural networks (ANNs), convolutional neural networks, and recurrent neural
networks – and their relevance to tabular data. Chapters 6 and 7 collectively
cover two of the most prominent modern research directions in applying deep
learning to tabular data – attention/transformer methods, which take
inspiration from the similarity of modeling cross-word/token relationships and
modeling cross-feature relationships, and tree-based neural network methods,
which attempt to emulate, in some way or another, the structure or
capabilities of tree-based models in neural network form.

Chapter 3, “Neural Networks and Tabular Data,” covers the
fundamentals of neural network theory – the multilayer perceptron,
the backpropagation derivation, activation functions, loss functions,
and optimizers – and the TensorFlow/Keras API for implementing
neural networks. Comparatively advanced neural network methods
such as callbacks, batch normalization, dropout, nonlinear
architectures, and multi-input/multi-output models are also
discussed. The objective of the chapter is to provide both an
important theoretical foundation to understand neural networks and
the tools to begin implementing functional neural networks to model
tabular data.
Chapter 4, “Applying Convolutional Structures to Tabular Data,”
begins by demonstrating the low-level mechanics of convolution and
pooling operations, followed by the construction and application of
standard convolutional neural networks for image data. The
application of convolutional structures to tabular data is explored in
three ways: multimodal image-and-tabular datasets, one-dimensional
convolutions, and two-dimensional convolutions. This chapter is
especially relevant to biological applications, which often employ
methods in this chapter.
Chapter 5, “Applying Recurrent Structures to Tabular Data,” like
Chapter 4, begins by demonstrating how three variants of recurrent
models – the “vanilla” recurrent layer, the Long Short-Term Memory
(LSTM) layer, and the Gated Recurrent Unit layer – capture sequential
properties in the input. Recurrent models are applied to text, time-
series, and multimodal data. Finally, speculative methods for directly
applying recurrent layers to tabular data are proposed.
Chapter 6, “Applying Attention to Tabular Data,” introduces the
attention mechanism and the transformer family of models. The
attention mechanism is applied to text, multimodal text-and-tabular
data, and tabular contexts. Four research papers – TabTransformer,
TabNet, SAINT (Self-Attention and Intersample Attention
Transformer), and ARM-Net – are discussed in detail and
implemented.
Chapter 7, “Tree-Based Deep Learning Approaches,” is dominantly
research-focused and focuses on three primary classes of tree-based
neural networks: tree-inspired/emulative neural networks, which
attempt to replicate the character of tree models in the architecture
or structure of a neural network; stacking and boosting neural
networks; and distillation, which transfers tree-structured knowledge
into a neural network.

Part 3, “Deep Learning Design and Tools,” contains Chapter 8,
“Autoencoders”; Chapter 9, “Data Generation”; Chapter 10, “Meta-
optimization”; Chapter 11, “Multi-model Arrangement”; and Chapter 12,
“Neural Network Interpretability.” This part, composed of somewhat shorter
chapters, demonstrates how neural networks can be used and understood
beyond the raw task of supervised modeling.

Chapter 8, “Autoencoders,” introduces the properties of autoencoder
architectures and demonstrates how they can be used for pretraining,
multitask learning, sparse/robust learning, and denoising.
Chapter 9, “Data Generation,” shows how Variational Autoencoders
and Generative Adversarial Networks can be applied to the
generation of tabular data in limited-data contexts.
Chapter 10, “Meta-optimization,” demonstrates how Bayesian
optimization with Hyperopt can be employed to automate the
optimization of meta-parameters including the data encoding pipeline
and the model architecture, as well as the basics of Neural
Architecture Search.
Chapter 11, “Multi-model Arrangement,” shows how neural network
models can be dynamically ensembled and stacked together to boost
performance or to evaluate live/“real-time” model prediction quality.
Chapter 12, “Neural Network Interpretability,” presents three
methods, both model-agnostic and model-specific, for interpreting
the predictions of neural networks.

All the code for the book is available in a repository on Apress’s GitHub
here: https://github.com/apress/modern-deep-learning-tabular-data.
We are more than happy to discuss the book and other topics with you.
You can reach Andre at andreye@uw.edu and Andy at andyw0612@gmail.com.
We hope that this book is thought-provoking and interesting and – most
importantly – inspires you to think critically about the relationship between
deep learning and tabular data. Happy reading, and thanks for joining us on
this adventure!
Best,
Andre and Andy
Any source code or other supplementary material referenced by the author in
this book is available to readers on GitHub (https://github.com/Apress). For
more detailed information, please visit http://www.apress.com/source-code.
Acknowledgments
This book would not have been possible without the professional help and
support of so many. We want to express our greatest gratitude to Mark
Powers, the awesome coordinating editor powering the logistics of the book’s
development, and all the other amazing staff at Apress whose tireless work
allowed us to focus on writing the best book we could. We also would like to
thank Bharath Kumar Bolla, our technical reviewer, as well as Kalamendu Das,
Andrew Steckley, Santi Adavani, and Aditya Battacharya for serving as our
third through seventh pair of eyes. Last but not least, we are honored to have
had Tomas Pfister and Alok Sharma each contribute a foreword.
Many have also supported us in our personal lives – we greatly appreciate
our friends and family for their unwavering support and motivation. Although
these incredible individuals might not be as involved in the technical aspect as
those mentioned previously (we have fielded the question “So exactly what is
your book about?” at the dinner table many a time), we would not have been
able to accomplish what we set out to do – in this book and in life – without
their presence.
Table of Contents
Part I: Machine Learning and Tabular Data
Chapter 1: Classical Machine Learning Principles and Methods
Fundamental Principles of Modeling
What Is Modeling?
Modes of Learning
Quantitative Representations of Data: Regression and
Classification
The Machine Learning Data Cycle: Training, Validation,
and Test Sets
Bias-Variance Trade-Off
Feature Space and the Curse of Dimensionality
Optimization and Gradient Descent
Metrics and Evaluation
Mean Absolute Error
Mean Squared Error (MSE)
Confusion Matrix
Accuracy
Precision
Recall
F1 Score
Area Under the Receiver Operating Characteristics Curve
(ROC-AUC)
Algorithms
K-Nearest Neighbors
Linear Regression
Logistic Regression
Decision Trees
Gradient Boosting
Summary of Algorithms
Thinking Past Classical Machine Learning
Key Points
Chapter 2: Data Preparation and Engineering
Data Storage and Manipulation
TensorFlow Datasets
Creating a TensorFlow Dataset
TensorFlow Sequence Datasets
Handling Large Datasets
Data Encoding
Discrete Data
Continuous Data
Text Data
Time Data
Geographical Data
Feature Extraction
Single- and Multi-feature Transformations
Principal Component Analysis
t-SNE
Linear Discriminant Analysis
Statistics-Based Engineering
Feature Selection
Information Gain
Variance Threshold
High-Correlation Method
Recursive Feature Elimination
Permutation Importance
LASSO Coefficient Selection
Key Points
Part II: Applied Deep Learning Architectures
Chapter 3: Neural Networks and Tabular Data
What Exactly Are Neural Networks?
Neural Network Theory
Starting with a Single Neuron
Feed-Forward Operation
Introduction to Keras
Modeling with Keras
Loss Functions
Math Behind Feed-Forward Operation
Activation Functions
The Math Behind Neural Network Learning
Gradient Descent in Neural Networks
The Backpropagation Algorithm
Optimizers
Mini-batch Stochastic Gradient Descent (SGD) and
Momentum
Nesterov Accelerated Gradient (NAG)
Adaptive Moment Estimation (Adam)
A Deeper Dive into Keras
Training Callbacks and Validation
Batch Normalization and Dropout
The Keras Functional API
The Universal Approximation Theorem
Selected Research
Simple Modifications to Improve Tabular Neural
Networks
Wide and Deep Learning
Self-Normalizing Neural Networks
Regularization Learning Networks
Key Points
Chapter 4: Applying Convolutional Structures to Tabular Data
Convolutional Neural Network Theory
Why Do We Need Convolutions?
The Convolution Operation
The Pooling Operation
Base CNN Architectures
Multimodal Image and Tabular Models
1D Convolutions for Tabular Data
2D Convolutions for Tabular Data
DeepInsight
IGTD (Image Generation for Tabular Data)
Key Points
Chapter 5: Applying Recurrent Structures to Tabular Data
Recurrent Models Theory
Why Are Recurrent Models Necessary?
Recurrent Neurons and Memory Cells
LSTMs and Exploding Gradients
Gated Recurrent Units (GRUs)
Bidirectionality
Introduction to Recurrent Layers in Keras
Return Sequences and Return State
Standard Recurrent Model Applications
Natural Language
Time Series
Multimodal Recurrent Modeling
Direct Tabular Recurrent Modeling
A Novel Modeling Paradigm
Optimizing the Sequence
Optimizing the Initial Memory State(s)
Further Resources
Key Points
Chapter 6: Applying Attention to Tabular Data
Attention Mechanism Theory
The Attention Mechanism
The Transformer Architecture
BERT and Pretraining Language Models
Taking a Step Back
Working with Attention
Simple Custom Bahdanau Attention
Native Keras Attention
Attention in Sequence-to-Sequence Tasks
Improving Natural Language Models with Attention
Direct Tabular Attention Modeling
Attention-Based Tabular Modeling Research
TabTransformer
TabNet
SAINT
ARM-Net
Key Points
Chapter 7: Tree-Based Deep Learning Approaches
Tree-Structured Neural Networks
Deep Neural Decision Trees
Soft Decision Tree Regressors
NODE
Tree-Based Neural Network Initialization
Net-DNF
Boosting and Stacking Neural Networks
GrowNet
XBNet
Distillation
DeepGBM
Key Points
Part III: Deep Learning Design and Tools
Chapter 8: Autoencoders
The Concept of the Autoencoder
Vanilla Autoencoders
Autoencoders for Pretraining
Multitask Autoencoders
Sparse Autoencoders
Denoising and Reparative Autoencoders
Key Points
Chapter 9: Data Generation
Variational Autoencoders
Theory
Implementation
Generative Adversarial Networks
Theory
Simple GAN in TensorFlow
CTGAN
Key Points
Chapter 10: Meta-optimization
Meta-optimization: Concepts and Motivations
No-Gradient Optimization
Optimizing Model Meta-parameters
Optimizing Data Pipelines
Neural Architecture Search
Key Points
Chapter 11: Multi-model Arrangement
Average Weighting
Input-Informed Weighting
Meta-evaluation
Key Points
Chapter 12: Neural Network Interpretability
SHAP
LIME
Activation Maximization
Key Points
Closing Remarks
Appendix: NumPy and Pandas
Index
About the Authors
Andre Ye

is a deep learning (DL) researcher with a focus on building and training robust
medical deep computer vision systems for uncertain, ambiguous, and unusual
contexts. He has published another book with Apress, Modern Deep Learning
Design and Application Development, and writes short-form data science
articles on his blog. In his spare time, Andre enjoys keeping up with current
deep learning research and jamming to hard metal.

Zian “Andy” Wang


is a researcher and technical writer passionate about data science and
machine learning (ML). With extensive experience in modern artificial
intelligence (AI) tools and applications, he has competed in various
professional data science competitions while gaining hundreds of thousands
of views across his published articles. His main focus lies in building versatile
model pipelines for different problem settings, including tabular and computer
vision–related tasks. When Andy is not writing or programming,
he enjoys playing piano and swimming.
About the Technical Reviewer
Bharath Kumar Bolla

has over 10 years of experience and is currently working as a senior data
science engineer consultant at Verizon, Bengaluru. He has a PG diploma in
data science from Praxis Business School and an MS in life sciences from
Mississippi State University, USA. He previously worked as a data scientist at
the University of Georgia, Emory University, Eurofins LLC, and Happiest Minds.
At Happiest Minds, he worked on AI-based digital marketing products and
NLP-based solutions in the education domain. Along with his day-to-day
responsibilities, Bharath is a mentor and an active researcher. To date, he has
published around ten articles in journals and peer-reviewed conferences. He is
particularly interested in unsupervised and semi-supervised learning and
efficient deep learning architectures in NLP and computer vision.
Footnotes
1
Two such studies are especially well-articulated and nuanced:
Gorishniy, Y.V., Rubachev, I., Khrulkov, V., & Babenko, A. (2021). Revisiting Deep Learning Models for
Tabular Data. NeurIPS.
Grinsztajn, L., Oyallon, E., & Varoquaux, G. (2022). Why do tree-based models still outperform deep
learning on tabular data?

2
https://ojs.aaai.org/index.php/aimagazine/article/view/18140.

3
Covington, P., Adams, J.K., & Sargin, E. (2016). Deep Neural Networks for YouTube Recommendations.
Proceedings of the 10th ACM Conference on Recommender Systems.

4
It is in fact not implausible to suggest that the differentiability/gradient access of neural network models
enables improved interpretability of decision-making processes relative to the complexity of the model.

5
The prerequisite “complex” is important. A decision tree trained on a comparatively simple dataset with a
small number of features is of course more interpretable than a neural network, but in these sorts of
problems, a decision tree would be preferable to a neural network anyway by virtue of being more
lightweight and achieving similar or superior performance.
Part I
Machine Learning and Tabular Data
© The Author(s), under exclusive license to APress Media, LLC, part of Springer Nature 2023
A. Ye, Z. Wang, Modern Deep Learning for Tabular Data
https://doi.org/10.1007/978-1-4842-8692-0_1

1. Classical Machine Learning Principles and Methods
Andre Ye1 and Andy Wang2

(1)
Seattle, WA, USA
(2)
Redmond, WA, USA

True wisdom is acquired by the person who sees patterns, and comes
to understand those patterns in their connection to other patterns –
and from these interconnected patterns, learns the code of life – and
from the learning of the code of life, becomes a co-creator with God.
—Hendrith Vanlon Smith Jr., Author and Businessman

The focus of this book is on deep learning for tabular data (don’t worry –
you got the right book!), but there are many good reasons to first develop a
solid understanding of classical machine learning as a foundation.
Deep learning rests and builds upon the fundamental principles of
machine learning. In many cases the axioms and behavior of the modern deep
learning regime are different and distorted – these distortions and paradigm
shifts will be further discussed and explored. Nevertheless, a strong
knowledge of classical machine learning is required to intuit and work with
deep learning, in the same way that calculus requires a strong understanding
of algebra even though it operates under new, unfamiliar environments of
infinity.
Moreover, understanding classical machine learning is even more
important for one seeking to apply deep learning to tabular data than for one
seeking to apply deep learning to other sorts of data (image, text,
signal, etc.), because much of classical machine learning was built for and
around tabular data, with later adaptations for other types of data. The core
algorithms and principles of classical machine learning often assume the
presence of certain data characteristics and attributes that are often present
only in tabular data. To understand the modeling of tabular data, one must
understand these characteristics and attributes, why classical machine learning
algorithms can assume and model them, and how they can potentially be
better modeled with deep learning.
Additionally – despite the tremendous hype around it – deep learning is
not a universal solution, nor should it be treated as one! In specialized data
applications, often the only available viable solutions are based on deep
learning. For problems involving tabular data, however, classical machine
learning may provide a better solution. Therefore, it’s important to know the
theory and implementation of that set of classical approaches. Throughout the
book, you’ll continue to hone your judgment regarding which family of
techniques is most applicable to the problems you need to solve.
This chapter will give a wide overview of classical machine learning
principles, theory, and implementations. It is organized into four sections:
“Fundamental Principles of Modeling,” which explores key unifying concepts
through intuition, math, visualizations, and code experiments; “Metrics and
Evaluation,” which provides an overview and implementation for commonly
used model assessment methods; “Algorithms,” which covers the theory and
implementation (from scratch and from existing libraries) of a wide range of
foundational and popular classical machine learning algorithms; and “Thinking
Past Classical Machine Learning,” which introduces the transition into new
deep learning territory. The chapter assumes basic working knowledge of
Python and NumPy but introduces basic usage of the modeling library scikit-
learn. Please see Chapter 2 for an in-depth overview of concepts and
implementation in NumPy, as well as Pandas and TensorFlow Datasets.

Fundamental Principles of Modeling


In this section, we will explore several fundamental ideas, concepts, and
principles of modeling, especially in relation to the tabular data regime and
context. Even if you’re familiar or experienced with these topics, you may find
discussion of new conceptual approaches or frameworks to solidify and push
your understanding.

What Is Modeling?
Thinking generally beyond the field of data science, modeling refers to the
process of building appropriate and “smaller” approximations or
representations of “larger” phenomena. For instance, a fashion model’s role is
to demonstrate key elements and trends of a fashion brand. When a designer
chooses what clothes a fashion model wears, they cannot have them wear
every piece of clothing that the fashion brand carries – they must choose the
most representative items, the ones that best capture the spirit, character, and
philosophy of the brand. In the humanities and the social sciences (philosophy,
sociology, history, psychology, political science, etc.), a model (or, similarly, a
“theory”) acts as a unifying framework that connects ideas and empirical
observations. A psychological theory for love cannot explain or hold all the
possible reasons and conditions in which love has ever arisen, but it can
approximate it by exploring the general contours and dynamics of love.
A scientific model, similarly, articulates a concept that has a general
influence on whatever environment it is derived from. Even though the natural
world is noisy and scientific models are approximations, models can be used to
derive useful approximate predictions. The Copernican model of the solar
system suggested that the sun was at the center of the universe, with Earth
and other planets orbiting around it in circular paths. This model can be used
to understand and predict the motions of planets.
In chemical kinetics, the steady-state approximation is a model to obtain the
rate law of a multi-step chemical reaction by assuming the concentrations of
reaction intermediates remain constant; the amount of an intermediate produced
is equal to the amount of that intermediate consumed. This captures another theme of models: because models
generalize, they must make certain assumptions.
Both steady-state approximation and the Copernican model, and all
models, don’t work perfectly, but rather “well enough” in “most” cases.
As briefly explored, models are useful in all sorts of domains. In these
contexts, the “smallness” of models relative to the “largeness” of the
phenomena being modeled is primarily in representation size. For instance, the
Copernican model allows us to – under certain assumptions and simplifications
– predict the trajectory of planets instead of waiting to let the universe (the
phenomena being modeled) play out and observing the result. Restated,
they’re useful because they are easier to comprehend. One can manipulate
and understand a representation of the phenomenon without observing the
phenomenon itself. (We don’t want to need to actually observe a meteor
crashing into Earth to know that a meteor is crashing into Earth – at least, we
would hope so.) In the context of data science and computer science,
however, “smallness” has another key attribute: ease of model creation and
deployment.
It takes significant time and effort for designers to curate a fashion
model’s outfit, for psychologists to collect disparate ideas and unify them into
a theory of the human mind, and for scientists to develop models from
empirical data. Rather, we would like a way to autonomously unify
observations into a model to generalize an understanding of the phenomena
from which the observations are derived. In this context, autonomous means
“without significant human intervention or supervision during the creation of
the model.” In this book, and in contexts of data science, the term modeling is
usually thus associated with some dimension of automation.
Automated modeling requires a computational/mathematical/symbolic
approach to “understanding” and generalizing across a set of given
observations, as computers are the most powerful and scalable instrument of
automation known to humanity at the writing of this book. Even the most
complex-behaving artificial intelligence models – often modeling the dynamics
of language or vision – are built upon mathematical units and computation.
Moreover, because a computational/mathematical/symbolic approach to
automation must be taken, the observations themselves must be organized in
a quantitative format. Much of scientific data is quantitative, whereas “data”
(perhaps more aptly referenced as “ideas”) in the humanities is more often
qualitative and textual. Does this mean that automated modeling cannot deal
with most data in the humanities or any data that does not appear
quantitatively in raw form? No, it merely means that we need another step of
converting the data into quantitative form. We’ll discuss this more in later
subsections.
We can give this concept of automated modeling a name: learning. In this
context, learning is the process of automatically developing representations of
a concept or phenomenon given a set of observations about/derived from it,
rather than requiring extensive involvement of another being (usually, a
human) to piece the generalization together “for” them.

Modes of Learning
There are two general modes of learning: supervised and unsupervised
learning. Note that we will explore more complex learning paradigms,
especially with deep learning, since its unique framework and dynamics allow
for greater complexity.
Supervised learning is intuitive: given some set of inputs x, the model
attempts to predict y. In the supervised learning regime, we seek to capture
how various contributing factors (the features) have an effect on changing the
output (the label). The data must be correspondingly organized: we must have
a feature set (referred to in shorthand as the “x”) and a label set (referred to
in shorthand as the “y”). Each “row” or “item” in the feature set is an
organization of features derived from the same instance of a thing (e.g., pixels
from the same image, behavior statistics from the same user, demographic
information from the same person) and is associated with a label in the label
set (Figure 1-1). The label is often also referred to as the “ground truth.”
Figure 1-1 Supervised learning relationship between data and model

To accomplish the task of modeling an output given a set of inputs, the
model must develop an understanding of how the inputted features interact
with and relate to each other in producing a certain output. These associations
can be simple, like a weighted linear sum of relevant features, or more
complex, involving multiple duplications and concatenations to model the
relevancy of the provided features with respect to the label (Figure 1-2). The
complexity of the associations between the inputs and the outputs depends
both on the model and the inherent complexity of the dataset/problem being
modeled.
Figure 1-2 Varying levels of complexity in discovering relationships between features and labels
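As a toy numeric illustration of this spectrum (the features, weights, and bias below are made up for demonstration and are not drawn from any dataset in the book), a simple association might be nothing more than a weighted linear sum of the features, while a more complex association transforms and recombines them:

import numpy as np

features = np.array([2.0, 5.0, 1.0])            # one row of a feature set
weights = np.array([0.4, -1.2, 3.0])            # hypothetical learned weights
bias = 0.5                                       # hypothetical learned bias

# Simple association: a weighted linear sum of the relevant features
simple_output = features @ weights + bias

# More complex association: features are transformed nonlinearly and recombined
hidden = np.maximum(0.0, features * weights)     # elementwise products passed through a ReLU-like step
complex_output = hidden.sum() + bias

print(simple_output, complex_output)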

Here are some examples of supervised learning, drawn from various
domains and levels of complexity:

Evaluating diabetes risk: Provided with a dataset of patient
information (age, sex, height, medical history, etc.) and the
associated labels (if they have diabetes/severity or type of diabetes),
the model automatically learns relationships and can predict a new
patient’s probability/severity/type of diabetes.
Forecasting a stock: Provided with a recent time window of data
(e.g., the stock price for the last n days) and the corresponding label
(the stock price on the n+1-th day), the model can associate certain
patterns in a sequence of data to predict the next value.
Facial recognition: Given a few images taken of a person’s face (the
features being the pixels of the image) and the corresponding label
(the person’s identity, which can be any sort of personal identifier, like
a unique number), the model can formulate and recognize identifying
characteristics of a face and associate it with that person.
Note Although in this early-stage introductory context we refer to the
algorithms performing the learning and prediction all as “models,” the
types of models used in each context differ widely. The “Algorithms”
section of this chapter engages in more rigorous exploration of different
machine learning algorithms.

Unsupervised learning, on the other hand, is a little bit more abstract. The
goal of unsupervised learning is to formulate information and trends about a
dataset without the presence of a label set (Figure 1-3). Thus, the only input
to an unsupervised model or system is the feature set, x.

Figure 1-3 Unsupervised learning relationship between data and model


Instead of learning the relationship between features with respect to a
label, unsupervised algorithms learn interactions between features with
respect to the feature set itself (Figure 1-4). This is the essential difference
between supervised and unsupervised learning. Like supervised learning,
these interactions can range from simple to complex.

Figure 1-4 Varying levels of complexity in discovering relationships in between features

Here are some examples of unsupervised learning:

Customer segmentation: A business has a dataset of their users with
features describing their behavior. While they don’t have any
particular labels/targets in mind, they want to identify key segments
or clusters of their customer base. They use an unsupervised
clustering algorithm to separate their customer base into several
groups, in which customers in the same group have similar features.
Visualizing high-dimensional features: When there are only one or
two features in a dataset, visualizing is trivial. However, the
introduction of additional dimensions complicates the task of
visualization. As new features are added, additional dimensions of
separation must be added (e.g., color, marker shape, time). At a
certain dataset dimensionality, it becomes impossible to effectively
visualize data. Dimensionality reduction algorithms find the projection
of high-dimensional data into a lower-dimensional space (two
dimensions for visualization) that maximizes the “information” or
“structure” of the data. These are defined mathematically and do not
require the presence of a label feature.

This book will focus primarily on supervised learning, but we will see
many instances in which unsupervised and supervised learning approaches are
combined to create systems more powerful than any one of its isolated parts.
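To make the distinction concrete, here is a minimal sketch using scikit-learn (introduced later in this chapter) on a tiny made-up feature set: the supervised model is fit on both x and y, whereas the unsupervised model is fit on x alone.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

# A tiny feature set x (four samples, two features) and label set y (one label per sample)
x = np.array([[1.0, 2.0],
              [1.5, 1.8],
              [5.0, 8.0],
              [6.0, 9.0]])
y = np.array([0, 0, 1, 1])

# Supervised learning: learn the relationship between the features and the labels
clf = LogisticRegression().fit(x, y)
print(clf.predict([[1.2, 1.9]]))                 # predicted label for a new sample

# Unsupervised learning: learn structure from the features alone, with no labels involved
clusters = KMeans(n_clusters=2, n_init=10).fit_predict(x)
print(clusters)                                  # cluster assignment for each sample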

Quantitative Representations of Data: Regression and Classification
Earlier, it was discussed that data must be in some sort of quantitative form to
be used by an automated learning model. Some data appears “naturally” or
“raw” in quantitative form, like age, length measurements, or pixel values in
images. However, there are many other types of data that we would like to
incorporate into models but don’t appear in original form as a number. The
primary data type here is categorical (class-based or label-based data), like
animal type (dog, cat, turtle) or country (United States, Chile, Japan). Class-
based data is characterized by a discrete separation of elements with no
intrinsic ranking. That is, we cannot place “dog” as “higher than” a “cat” in the
same way that 5 is higher than 3 (although many dog owners may disagree).
When there are only two classes present in a feature or label set, it is
binary. We can simply associate one class with the quantitative representation
“0” and the other class with the quantitative representation “1.” For instance,
consider a feature in a sample dataset “animal” that contains only two unique
values: “dog” and “cat.” We may attach the numerical representation “0” to
“dog” and the numerical representation “1” to “cat” such that the dataset is
transformed as follows (Figure 1-5).
Figure 1-5 Example quantitative conversion/representation of a categorical feature “animal”

You may want to point out that because 1 is higher than 0, we are placing
the “cat” label higher than the “dog” label. However, note that when we
convert a data form into a quantitative representation, the quantitative
measurement does not always (and often doesn’t) mean the same thing
anymore. In this case, the new feature should be interpreted as “Is the animal
a cat?” rather than still as “animal,” in which a 0 represents “no” and a 1
represents “yes.” We also could have formed a quantitative representation
meaning “Is the animal a dog?” in which all “dog” values are replaced with “1”
and all “cat” values are replaced with “0.” Both approaches hold the same
information (i.e., the separation between animal types) as the original data
(Figure 1-6), but it is represented quantitatively and thus becomes
readable/compatible with an automated learning model.
Figure 1-6 Equivalent binary representations of a categorical variable
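A minimal sketch of this transformation in Pandas (covered in Chapter 2), assuming a toy “animal” column; either derived column below carries the same information as the original feature:

import pandas as pd

df = pd.DataFrame({"animal": ["dog", "cat", "cat", "dog"]})

# "Is the animal a cat?": cat -> 1, dog -> 0
df["is_cat"] = (df["animal"] == "cat").astype(int)

# Equivalent representation, "Is the animal a dog?": dog -> 1, cat -> 0
df["is_dog"] = df["animal"].map({"dog": 1, "cat": 0})

print(df)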

However, when there are more than two unique values in a feature or
label, things become more complicated. In Chapter 2, we will more deeply
explore effective data representation, manipulation, and transformation in the
context of modeling tabular data. A challenge we will encounter and explore
throughout the book is that of converting features of all forms into a more
readable and informative representation for effective modeling.
Supervised problems can be categorized as regression or classification
problems depending on the form of the desired output/label. If the output is
continuous (e.g., house price), the problem is a regression task. If the output
is binned (e.g., animal type), the problem is a classification task. For ordinal
outputs – features that are quantized/discrete/binned but are also intrinsically
ranked/ordered, like grade (A+, A, A–, B+, B, etc.) or education level
(elementary, intermediate, high, undergraduate, graduate, etc.) – the
separation between regression and classification depends more on the
approach you take rather than being inherently attached to the problem itself.
For instance, if you convert education level into quantitative form by attaching
it to an ordered integer (elementary to 0, intermediate to 1, high to 2, etc.)
and the model is to output that ordered integer (e.g., an output of “3” means
the model predicted “undergraduate” education level), the model is performing
a regression task because the outputs are placed on a continuous scale. On
the other hand, if you designate each education level as its own class without
intrinsic rank (see Chapter 2 ➤ “Data Encoding” ➤ “One-Hot Encoding”), the
model is performing a classification task.
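As a minimal sketch of the two approaches (using Pandas and a hypothetical “education” column; Chapter 2 treats encoding in depth):

import pandas as pd

df = pd.DataFrame({"education": ["elementary", "high", "undergraduate", "graduate"]})

# Ordinal encoding: an ordered integer target frames the problem as regression
order = {"elementary": 0, "intermediate": 1, "high": 2,
         "undergraduate": 3, "graduate": 4}
df["education_ordinal"] = df["education"].map(order)

# One-hot encoding: each level becomes its own unranked class, framing the problem as classification
one_hot = pd.get_dummies(df["education"], prefix="education")

print(df.join(one_hot))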
Many classification algorithms, including neural networks, the main study
of this book, output a probability that an input sample falls into a certain class
rather than automatically assigning a label to an input. For instance, a model
may return a 0.782 probability an image of an ape belongs to the class “ape,”
a 0.236 probability it belongs to the class “gibbon,” and lower probabilities for
all other possible classes. We then interpret the class with the highest
probability as the designated label; in this case, the model correctly decides on
the label “ape.” Although the output is technically continuous (in practice it is
almost never exactly 0 or 1), the ideal behavior and the expected labels are
discrete; thus, the task is classification rather than regression.
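In code, interpreting the highest-probability class as the predicted label is typically a single argmax operation. The following sketch uses illustrative class names and probability values rather than the output of any real model:

import numpy as np

class_names = ["ape", "gibbon", "lemur"]
# Hypothetical per-class probabilities returned by a classifier for one input
probabilities = np.array([0.782, 0.193, 0.025])

# The class with the highest probability is taken as the predicted label
predicted_label = class_names[int(np.argmax(probabilities))]
print(predicted_label)  # ape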
Understanding whether your problem and approach perform a
classification or a regression task is a fundamental skill in building functioning
modeling systems.

The Machine Learning Data Cycle: Training, Validation, and Test Sets
Once we fit our model to the dataset, we want to evaluate how well it
performs so we can determine appropriate next steps. If the model performs
very poorly, more experimentation is needed before it is deployed. If it
performs reasonably well given the domain, it may be deployed in an
application. If it performs too well (i.e., perfect performance), the results may
be too good to be true and require additional investigation. Failing to evaluate
a model’s true performance can lead to headaches later at best and can be
dangerous at worst. Consider, for instance, the impact of a medical model
deployed to give diagnoses to real people that incorrectly appears to perfectly
model the data.
The purpose of modeling, as discussed previously, is to represent a
phenomenon or concept. However, because we can only access these
phenomena and concepts through observations and data, we must
approximate modeling the phenomenon by instead modeling data derived
from the phenomenon. Restated, since we can’t directly model the
phenomenon, we must understand the phenomenon through the information
we record from it. Thus, we need to ensure that the approximation and the
end goal are aligned with one another.
Consider a child learning addition. Their teacher gives them a lesson and
six flash cards to practice (Figure 1-7) and tells them that there will be a quiz
the next day to assess the child’s mastery of addition.

Figure 1-7 Example “training dataset” of flash cards. The feature (input) is the addition prompt (e.g.,
“1+1,” “1+2”); the label (desired output) is the sum (e.g., “2,” “3”)

The teacher could use the same questions on the flash cards given to the
students on the quiz (Quiz 1), or they could write a set of different problems
that weren’t given explicitly to the students, but were of the same difficulty
and concept (Quiz 2) (Figure 1-8). Which one would more truly assess the
students’ knowledge?
Figure 1-8 Two possible quizzes: equal to the training dataset (left) or slightly different examples
representing the same concept (right)

If the teacher gave Quiz 1, a student could score perfectly by simply
memorizing six associations between a prompt “x + y” and the answer “z.”
This demonstrates approximation-goal misalignment: the goal of the quiz was
to assess the students’ mastery of the concept of addition, but the
approximation instead assessed whether the students were able to memorize
six arbitrary associations.
On the other hand, a student who merely memorized the six associations
given to them on the flash cards would perform poorly on Quiz 2, because
they are presented with questions that are different in explicit form, despite
being very similar in concept. Quiz 2 demonstrates approximation-goal
alignment: students can score well on the quiz only if they understand the
concept of addition.
The differentiating factor between Quiz 1 and Quiz 2 is that the problems
from Quiz 2 were not presented to the students for studying. This separation
of information allows the teacher to more genuinely evaluate the students’
learning. Thus, in machine learning, we always want to separate the data the
model is trained on from the data the model is evaluated/tested on.
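In practice, this separation is usually implemented by holding out a portion of the dataset before training. As a minimal sketch (the data here is hypothetical), scikit-learn's train_test_split performs such a split:

import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical feature matrix X and label vector y
data = pd.DataFrame({"x1": range(10), "x2": range(10, 20), "label": [0, 1] * 5})
X, y = data[["x1", "x2"]], data["label"]

# Hold out 20% of the samples; the model never sees them during training
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

print(len(X_train), len(X_test))  # 8 2

Evaluating the trained model on X_test and y_test then plays the role of Quiz 2: the model is assessed on examples it has never seen rather than on the ones it could have memorized.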