Data Wrangling on AWS
Navnit Shukla
Sankar M
Sam Palani
BIRMINGHAM—MUMBAI
Data Wrangling on AWS
Copyright © 2023 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted
in any form or by any means, without the prior written permission of the publisher, except in the case
of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information
presented. However, the information contained in this book is sold without warranty, either express
or implied. Neither the authors, nor Packt Publishing or its dealers and distributors, will be held liable
for any damages caused or alleged to have been caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and
products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot
guarantee the accuracy of this information.
ISBN 978-1-80181-090-6
www.packtpub.com
I am grateful to my grandmother, Radhika Shukla, my mother, Asha Shukla, my grandfather, I.D. Shukla,
and my father, N.K. Shukla, for their unwavering support and the sacrifices they have made. They have
been a constant source of inspiration and have shown me the true power of determination. I would also
like to express my heartfelt gratitude to my loving wife, Anchal Dubey, and my son, Anav Shukla, for
being my steadfast companions on our shared life journey. Their love and encouragement have fueled
my ambitions and brought immense joy to my life.
- Navnit Shukla
I am thankful to my wife, Krithiha Kumar, for her unwavering support and motivation in all my endeavors.
I would like to thank my son and daughter for their love, curiosity, creativity, and boundless energy, which
inspires me every day to explore further and dream big.
- Sankar Sundaram
Contributors
I am deeply grateful to those who have been by my side, offering unwavering support throughout my
journey. I extend a heartfelt thank you to my beloved wife, Anchal Dubey, and my devoted parents, whose
encouragement and belief in me have been invaluable.
Sankar M has been working in the IT industry since 2007, specializing in databases, data warehouses,
and the analytics space for many years. As a specialized data architect, he helps customers build
and modernize data architectures and helps them build secure, scalable, and performant data lake,
database, and data warehouse solutions. Prior to joining AWS, he worked with multiple customers in
implementing complex data architectures.
Sam Palani has over 18 years of experience as a developer, data engineer, data scientist, startup co-founder,
and IT leader. He holds a master's in Business Administration with a dual specialization in Information
Technology. His professional career spans 5 countries across financial services, management consulting,
and the technology industries. He is currently Sr Leader for Machine Learning and AI at Amazon
Web Services, where he is responsible for multiple lines of the business, product strategy, and thought
leadership. Sam is also a practicing data scientist, a writer with multiple publications, a speaker at
key industry conferences, and an active open-source contributor. Outside work, he loves hiking,
photography, experimenting with food, and reading.
About the reviewer
Naresh Rohra is an accomplished data modeler and lead analyst with a passion for harnessing the
potential of cutting-edge technologies to unlock valuable insights from complex datasets. With over a
decade of experience in the field of data modeling, he has successfully navigated the dynamic landscape
of data wrangling and analysis, earning him a reputation as a leading expert. He has expertly designed
various data models related to OLTP and OLAP systems. He has worked with renowned organizations
like TCS, Cognizant, and Tech Mahindra.
I extend heartfelt thanks to my colleagues and family for their constant encouragement throughout my book
review journey. To my beloved daughter, Ginisha, you are the luminescent star that adorns my night sky.
Roja Boina is an engineering senior advisor. She provides data-driven solutions to enterprises. She
takes pride in her get-it-done attitude. She enjoys defining requirements, designing, developing,
testing, and delivering backend applications, and communicating data through visualizations. She is
very passionate about being a Women in Tech (WIT) advocate and being a part of WIT communities.
Outside of her 9-5 job, she loves to volunteer for non-profits and mentor fellow women in STEM. She
believes in having a growth mindset.
Table of Contents

Preface xiii

Understanding the pricing of AWS Glue DataBrew 24
Using AWS Glue DataBrew for data wrangling 26
Identifying the dataset 27
Downloading the sample dataset 27
Data discovery – creating an AWS Glue DataBrew profile for a dataset 32
Data cleaning and enrichment – AWS Glue DataBrew transforms 37
Data validation – performing data quality checks using AWS Glue DataBrew 49
Data publication – fixing data quality issues 62
Event-driven data quality check using Glue DataBrew 67
Data protection with AWS Glue DataBrew 67
Encryption at rest 68
Encryption in transit 68
Identifying and handling PII 69
Data lineage and data publication 73
Summary 74

3
Introducing AWS SDK for pandas 75
AWS SDK for pandas 75
Building blocks of AWS SDK for pandas 76
Glue jobs 91
Standard and custom installation on Amazon SageMaker notebooks 98

4
Introduction to SageMaker Data Wrangler 131
Data import 133
Data orchestration 135
Data transformation 136
Insights and data quality 138

6
Working with AWS Glue 175
What is Apache Spark? 175
Apache Spark architecture 176
Apache Spark framework 178
Resilient Distributed Datasets 179
Datasets and DataFrames 180
Data discovery with AWS Glue 182
AWS Glue Data Catalog 182
Glue Connections 185
AWS Glue crawlers 186
Table stats 191
Data ingestion using AWS Glue ETL 194
AWS GlueContext 195
DynamicFrame 195
AWS Glue Job bookmarks 196
AWS Glue Triggers 197
AWS Glue interactive sessions 197
AWS Glue Studio 197
Ingesting data from object stores 198
Summary 202

7
Working with Athena 203
Understanding Amazon Athena 203
When to use SQL/Spark analysis options? 204
Setting up data federation for source databases 220
Enriching data with data federation 227

8
Working with QuickSight 245
Introducing Amazon QuickSight and its concepts 245
Data discovery with QuickSight 247
QuickSight-supported data sources and setup 247
Data discovery with QuickSight analysis 249
QuickSight Q and AI-based data analysis/discovery 258
Data visualization with QuickSight 262
Visualization and charts with QuickSight 262
Embedded analytics 268
Summary 269

9
Building an End-to-End Data-Wrangling Pipeline with AWS SDK for Pandas
Access through Amazon Athena and Glue Catalog 293
Data structuring 294
Different file formats and when to use them 294
Restructuring data using Pandas 296
Flattening nested data with Pandas 298
Data cleaning 304
Data cleansing with Pandas 304
Data enrichment 310
Pandas operations for data transformation 311
Data quality validation 315
Data quality validation with Pandas 316
Data quality validation integration with a data pipeline 320
Data visualization 321
Visualization with Python libraries 321
Summary 327

10
Data Processing for Machine Learning with SageMaker Data Wrangler 329
Technical requirements 330
Step 1 – logging in to SageMaker Studio 330
Step 2 – importing data 333
Exploratory data analysis 336
Built-in data insights 336
Step 3 – creating data analysis 342
Step 4 – adding transformations 346
Categorical encoding 347
Custom transformation 349
Numeric scaling 352
Dropping columns 354
Step 5 – exporting data 355
Training a machine learning model 357
Summary 365

Index 385
• Data Professionals: Data engineers, data analysts, and data scientists who work with large
and complex datasets and want to enhance their data-wrangling skills on the AWS platform
• AWS Users: Individuals who are already familiar with AWS and want to explore the specific
services and tools available for data wrangling
While prior experience with data wrangling or AWS is beneficial, the book provides a solid foundation
for beginners and gradually progresses to more advanced topics. Familiarity with basic programming
concepts and SQL would be advantageous but is not mandatory. The book combines theoretical
explanations with practical examples and hands-on exercises, making it accessible to individuals with
different backgrounds and skill levels.
Chapter 7, Working with Athena: In this chapter, you will explore the powerful capabilities of Amazon
Athena, a serverless query service that enables you to analyze data directly in Amazon S3 using standard
SQL queries. This chapter will guide you through the process of leveraging Amazon Athena to unlock
valuable insights from your data, without the need for complex data processing infrastructure.
Chapter 8, Working with QuickSight: In this chapter, you will discover the power of Amazon QuickSight,
a fast, cloud-powered business intelligence (BI) service provided by AWS. This chapter will guide
you through the process of leveraging QuickSight to create interactive dashboards and visualizations,
enabling you to gain valuable insights from your data.
Chapter 9, Building an End-to-End Data-Wrangling Pipeline with AWS SDK for Pandas: In this
chapter, you will explore the powerful combination of AWS Data Wrangler and pandas, a popular
Python library for data manipulation and analysis. This chapter will guide you through the process of
leveraging pandas operations within AWS Data Wrangler to perform advanced data transformations
and analysis on your datasets.
Chapter 10, Data Processing for Machine Learning with SageMaker Data Wrangler: In this chapter, you
will delve into the world of machine learning (ML) data optimization using the powerful capabilities
of AWS SageMaker Data Wrangler. This chapter will guide you through the process of leveraging
SageMaker Data Wrangler to preprocess and prepare your data for ML projects, maximizing the
performance and accuracy of your ML models.
Chapter 11, Data Lake Security and Monitoring: In this chapter, you will be introduced to Identity and
Access Management (IAM) on AWS and how closely Data Wrangler integrates with AWS’ security
features. We will show how you can interact directly with Amazon CloudWatch Logs, query against
logs, and return the logs as a data frame.
If you are using the digital version of this book, we advise you to type the code yourself or access
the code from the book’s GitHub repository (a link is available in the next section). Doing so will
help you avoid any potential errors related to the copying and pasting of code.
Conventions used
There are a number of text conventions used throughout this book.
Code in text: Indicates code words in text, database table names, folder names, filenames, file
extensions, pathnames, dummy URLs, user input, and Twitter handles. Here is an example: “Mount
the downloaded WebStorm-10*.dmg disk image file as another disk in your system.”
A block of code is set as follows:
import sys
import awswrangler as wr
print(wr.__version__)
When we wish to draw your attention to a particular part of a code block, the relevant lines or items
are set in bold:
Bold: Indicates a new term, an important word, or words that you see onscreen. For instance, words
in menus or dialog boxes appear in bold. Here is an example: “Click on the Upload button.”
Get in touch
Feedback from our readers is always welcome.
General feedback: If you have questions about any aspect of this book, email us at customercare@packtpub.com and mention the book title in the subject of your message.
Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen.
If you have found a mistake in this book, we would be grateful if you would report this to us. Please
visit www.packtpub.com/support/errata and fill in the form.
Piracy: If you come across any illegal copies of our works in any form on the internet, we would
be grateful if you would provide us with the location address or website name. Please contact us at
copyright@packt.com with a link to the material.
If you are interested in becoming an author: If there is a topic that you have expertise in and you
are interested in either writing or contributing to a book, please visit authors.packtpub.com.
https://packt.link/free-ebook/9781801810906
This section marks the beginning of an exciting journey, where we explore the realm of data wrangling
and unveil the remarkable capabilities of AWS for data manipulation and preparation. This section
lays a solid foundation that provides you with insights into key concepts and essential tools that will
pave the way for your data-wrangling endeavors throughout the book.
Now that we understand the basic concept of data wrangling, we’ll learn why it is essential, and the
various benefits we get from it.
Data wrangling can be compared to refining crude oil: a distillation column separates crude oil into
fractions according to their boiling points, with heavy fractions at the bottom and light fractions at
the top. (Figure: how oil processing correlates to the data wrangling process.)
• Enhanced data quality: Data wrangling helps improve the overall quality of the data. It involves
identifying and handling missing values, outliers, inconsistencies, and errors. By addressing
these issues, data wrangling ensures that the data used for analysis is accurate and reliable,
leading to more robust and trustworthy results.
• Improved data consistency: Raw data often comes from various sources or in different formats,
resulting in inconsistencies in naming conventions, units of measurement, or data structure. Data
wrangling allows you to standardize and harmonize the data, ensuring consistency across the
dataset. Consistent data enables easier integration and comparison of information, facilitating
effective analysis and interpretation.
• Increased data completeness: Incomplete data can pose challenges during analysis and modeling.
Data wrangling methods allow you to handle missing data by applying techniques such as
imputation, where missing values are estimated or filled in based on existing information. By
dealing with missing data appropriately, data wrangling helps ensure a more complete dataset,
reducing potential biases and improving the accuracy of analyses.
• Facilitates data integration: Organizations often have data spread across multiple systems and
sources, making integration a complex task. Data wrangling helps in merging and integrating
data from various sources, allowing analysts to work with a unified dataset. This integration
facilitates a holistic view of the data, enabling comprehensive analyses and insights that might
not be possible when working with fragmented data.
• Streamlined data transformation: Data wrangling provides the tools and techniques to
transform raw data into a format suitable for analysis. This transformation includes tasks such
as data normalization, aggregation, filtering, and reformatting. By streamlining these processes,
data wrangling simplifies the data preparation stage, saving time and effort for analysts and
enabling them to focus more on the actual analysis and interpret the results.
• Enables effective feature engineering: Feature engineering involves creating new derived
variables or transforming existing variables to improve the performance of machine learning
models. Data wrangling provides a foundation for feature engineering by preparing the data in a
way that allows for meaningful transformations. By performing tasks such as scaling, encoding
categorical variables, or creating interaction terms, data wrangling helps derive informative
features that enhance the predictive power of models.
• Supports data exploration and visualization: Data wrangling often involves exploratory
data analysis (EDA), where analysts gain insights and understand patterns in the data before
formal modeling. By cleaning and preparing the data, data wrangling enables effective data
exploration, helping analysts uncover relationships, identify trends, and visualize the data using
charts, graphs, or other visual representations. These exploratory steps are crucial for forming
hypotheses, making data-driven decisions, and communicating insights effectively.
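Several of the benefits above, such as numeric scaling and categorical encoding for feature engineering, can be illustrated with a short pandas sketch. The dataset and column names below are invented for the example:

```python
import pandas as pd

# Invented car-sales records; column names are illustrative only
df = pd.DataFrame({
    "price": [12000, 18500, 9900, 22000],
    "fuel_type": ["gas", "diesel", "gas", "electric"],
})

# Numeric scaling: min-max scale price into the [0, 1] range
price_min, price_max = df["price"].min(), df["price"].max()
df["price_scaled"] = (df["price"] - price_min) / (price_max - price_min)

# Categorical encoding: one-hot encode fuel_type into indicator columns
df = pd.get_dummies(df, columns=["fuel_type"], prefix="fuel")
```

Transformations like these are exactly what later chapters automate at scale with AWS Glue DataBrew and SageMaker Data Wrangler.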
The steps involved in data wrangling
Now that we have learned about the advantages of data wrangling, let’s understand the steps involved
in the data wrangling process.
1. Data discovery
2. Data structuring
3. Data cleaning
4. Data enrichment
5. Data validation
6. Data publishing
Before we begin, it's important to understand that these activities may not need to be followed
sequentially, and in some cases, you may skip some of these steps.
Also, keep in mind that these steps are iterative and differ for different personas, such as data analysts,
data scientists, and data engineers.
As an example, data discovery for data engineers may vary from what data discovery means for a
data analyst or data scientist:
Data discovery
The first step of the data wrangling process is data discovery. This is one of the most important steps
of data wrangling. In data discovery, we familiarize ourselves with the kind of data we have as raw
data, what use case we are looking to solve with that data, what kind of relationships exist between the
raw data, what the data format will look like, such as CSV or Parquet, what kind of tools are available
for storing, transforming, and querying this data, and how we wish to organize this data, such as by
folder structure, file size, partitions, and so on, to make it easy to access.
Let’s understand this by looking at an example.
In this example, we will try to understand how data discovery varies based on the persona. Let’s assume
we have two colleagues, James and Jean. James is a data engineer while Jean is a data analyst, and they
both work for a car-selling company.
Jean is new to the organization and she is required to analyze car sales numbers for Southern California.
She has reached out to James and asked him for data from the sales table from the production system.
Here is the data discovery process for Jean (a data analyst):
1. Jean has to identify the data she needs to generate the sales report (for example, sales transaction
data, vehicle details data, customer data, and so on).
2. Jean has to find where the sales data resides (a database, file share, CRM, and so on).
3. Jean has to identify how much data she needs (from the last 12 months, the last month, and so on).
4. Jean has to identify what kind of tool she is going to use (Amazon QuickSight, Power BI, and
so on).
5. Jean has to identify the format she needs the data to be in so that it works with the tools she has.
6. Jean has to identify where she is looking to store this data – in a data lake (Amazon S3), on her
desktop, a file share, a sandbox environment, and so on.
And here is the data discovery process for James (a data engineer):
1. Which system has the requested data? For example, Amazon RDS, Salesforce CRM, a production
SFTP location, and so on.
2. How will the data be extracted? For example, using services such as Amazon DMS or AWS
Glue or writing a script.
3. What will the schedule look like? Daily, weekly, or monthly?
4. What will the file format look like? For example, CSV, Parquet, ORC, and so on.
5. How will the data be stored in the provided store?
Data structuring
To support existing and future business use cases and serve its customers better, an organization
must collect unprecedented amounts of data from different data sources and in different varieties. In
modern data architecture, most of the time, the data is stored in data lakes since a data lake allows
you to store all kinds of data files, whether it is structured data, unstructured data, images, audio,
video, or something else, and it will be of different shapes and sizes in its raw form. When data is in
its raw form, it lacks a definitive structure, which is required for it to be stored in databases or data
warehouses or used to build analytics or machine learning models. At this point, it is not optimized
for cost and performance.
In addition, when you work with streaming data such as clickstreams and log analytics, not all the
data fields (columns) are used in analytics.
At this stage of data wrangling, we try to optimize the raw dataset for cost and performance benefits
by performing partitioning and converting file types (for example, CSV into Parquet).
Once again, let’s consider our friends James and Jean to understand this.
For Jean, the data analyst, data structuring means deciding whether to run direct queries or to store
the data in a BI tool's in-memory store – called the SPICE layer in the case of Amazon QuickSight –
which provides faster access to data.
For James, the data engineer, when he is extracting data from a production system and looking to
store it in a data lake such as Amazon S3, he must consider what the file format will look like. He
can partition it by geographical regions, such as county, state, or region, or by date – for example,
year=YYYY, month=MM, and day=DD.
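James's date-based partitioning scheme can be sketched in pandas. The records are made up, and the commented-out write is an assumption: it requires the pyarrow package and an S3 bucket you control:

```python
import pandas as pd

# Made-up raw sales records, as they might arrive in CSV form
df = pd.DataFrame({
    "sale_date": pd.to_datetime(["2023-01-15", "2023-01-20", "2023-02-03"]),
    "amount": [18500, 9900, 22000],
})

# Derive Hive-style partition columns (year=YYYY/month=MM/day=DD)
df["year"] = df["sale_date"].dt.year
df["month"] = df["sale_date"].dt.month
df["day"] = df["sale_date"].dt.day

# Converting the data into partitioned Parquet would then be one call,
# for example (placeholder bucket path; requires pyarrow):
# df.to_parquet("s3://my-data-lake/sales/", partition_cols=["year", "month", "day"])
```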
Data cleaning
The next step of the data wrangling process is data cleaning. The previous two steps give us an idea
of how the data looks and how it is stored. In the data cleaning step, we start working with raw data
to make it meaningful so that we can define future use cases.
In the data cleaning step, we try to make data meaningful by doing the following:
• Removing unwanted columns and duplicate values, and filling in null values to improve the
data's readiness
• Performing data validation such as identifying missing values for mandatory columns such as
First Name, Last Name, SSN, Phone No., and so on
• Validating or fixing data type for better optimization of storage and performance
• Identifying and fixing outliers
• Removing garbage data or unwanted values, such as special characters
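The cleaning tasks above can be sketched with pandas on a made-up extract. The IQR rule used for outliers below is just one common heuristic, not a method prescribed by this book:

```python
import pandas as pd

# Small, made-up customer extract with typical raw-data problems
df = pd.DataFrame({
    "first_name": ["Ann", "Ann", None, "Bob#", "Cara", "Dan"],
    "phone_no":   ["555-0100", "555-0100", "555-0101",
                   "555-0102", "555-0103", "555-0104"],
    "amount":     [120.0, 120.0, 95.0, 110.0, 105.0, 1_000_000.0],
})

# Remove duplicate rows
df = df.drop_duplicates()

# Drop records missing mandatory columns such as first_name
df = df.dropna(subset=["first_name"])

# Remove garbage data such as special characters from text fields
df["first_name"] = df["first_name"].str.replace(r"[^A-Za-z ]", "", regex=True)

# Flag and drop outliers with a simple IQR rule
q1, q3 = df["amount"].quantile([0.25, 0.75])
iqr = q3 - q1
df = df[(df["amount"] >= q1 - 1.5 * iqr) & (df["amount"] <= q3 + 1.5 * iqr)]
```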
Both James and Jean can perform similar data cleaning tasks; however, their scale might vary. For
James, these tasks must be done for the entire dataset, while Jean may only have to perform them
on the data from Southern California. The granularity might vary as well: for James, it may be
limited to regions such as Southern California and Northern California, while for Jean, it might
be city level or even ZIP code.
Data enrichment
Up until the data cleaning step, we were primarily working on single data sources and making them
meaningful for future use. However, in the real world, most of the time, data is fragmented and
stored in multiple disparate data stores, and to support use cases such as building personalization
or recommendation solutions or building Customer 360s or log forensics, we need to join the data
from different data stores.
For example, to build a Customer 360 solution, you need data from Customer Relationship
Management (CRM) systems, clickstream logs, relational databases, and so on.
So, in the data enrichment step, we build the process that will enhance the raw data with relevant data
obtained from different sources.
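A minimal pandas sketch of this kind of enrichment, joining invented CRM records with aggregated clickstream activity:

```python
import pandas as pd

# Customer records from a CRM export (illustrative data)
crm = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "name": ["Ann", "Bob", "Cara"],
})

# Page views from clickstream logs for the same customers
clicks = pd.DataFrame({
    "customer_id": [1, 1, 3],
    "page": ["home", "pricing", "home"],
})

# Enrich the CRM data with clickstream activity; a left join keeps
# customers who have no recorded clicks
page_views = clicks.groupby("customer_id").size().reset_index(name="page_views")
customer_360 = crm.merge(page_views, on="customer_id", how="left")
customer_360 = customer_360.fillna({"page_views": 0})
```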
Data validation
There is a very interesting term in computer science called garbage in, garbage out (GIGO). GIGO
is the concept that flawed or defective (garbage) input data produces defective output.
In other words, the quality of the output is determined by the quality of the input. So, if we provide
bad data as input, we will get inaccurate results.
In the data validation step, we address this issue by performing various data quality checks:
• Number of records
• Duplicate values
• Missing values
• Outliers
• Distinct values
• Unique values
• Correlation
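The checks listed above can be expressed as simple pandas metrics; the dataset and column names are illustrative:

```python
import pandas as pd

# Illustrative wrangled dataset to validate
df = pd.DataFrame({
    "customer_id": [1, 2, 2, 4, 5],
    "state": ["CA", "CA", "CA", "NV", None],
})

# A handful of the quality checks listed above, gathered into a report
quality_report = {
    "record_count": len(df),
    "duplicate_rows": int(df.duplicated().sum()),
    "missing_state_values": int(df["state"].isna().sum()),
    "distinct_states": int(df["state"].nunique()),
    "customer_id_is_unique": bool(df["customer_id"].is_unique),
}
```

In a pipeline, metrics like these would typically be asserted against agreed thresholds before the data is published.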
There is a lot of overlap between data cleaning and data validation and yes, there are a lot of similarities
between these two processes. However, data validation is done on the resulting dataset, while data
cleaning is primarily done on the raw dataset.
Data publishing
After completing all the data wrangling steps, the data is ready to be used for analytics so that it can
solve business problems.
So, the final step is to publish the data to the end user with the required access and permission.
In this step, we primarily concentrate on how the data is being exposed to the end user and where
the final data gets stored – that is, in a relational database, a data warehouse, curated or user zones in
a data lake, or through the Secure File Transfer Protocol (SFTP).
The choice of data storage depends on the tool through which the end user is looking to access the data.
For example, if the end user is looking to access data through BI tools such as Amazon QuickSight,
Power BI, Informatica, and so on, a relational data store will be an ideal choice. If it is accessed by a
data scientist, ideally, it should be stored in an object store.
We will learn about the different kinds of data stores we can use to store raw and wrangled data later
in this book.
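As a sketch only, publishing a wrangled DataFrame to a curated zone of an S3 data lake with the AWS SDK for pandas (introduced later in this book) might look like the function below. The path, database, and table names are placeholders, and actually running it requires the awswrangler package and AWS credentials:

```python
import pandas as pd

def publish_to_data_lake(df: pd.DataFrame, path: str, database: str, table: str):
    """Write a wrangled DataFrame as Parquet to a curated S3 zone and
    register the table in the AWS Glue Data Catalog so that BI tools
    and Athena can query it."""
    # Imported inside the function so this sketch stays importable
    # even where awswrangler is not installed.
    import awswrangler as wr

    return wr.s3.to_parquet(
        df=df,
        path=path,          # e.g. "s3://my-curated-bucket/sales/" (placeholder)
        dataset=True,       # write as a catalog-aware dataset
        database=database,  # Glue database name (placeholder)
        table=table,        # Glue table name (placeholder)
        mode="overwrite",
    )
```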
In this section, we learned about the various steps of the data wrangling process through our friends
James and Jean and how these steps may or may not vary based on personas. Now, let’s understand
the best practices for data wrangling.
OUTSIDERS.
After all, outsiders furnish quite as much fun as the legislators
themselves. The number of men who persist in writing one letters of
praise, abuse, and advice on every conceivable subject is appalling;
and the writers are of every grade, from the lunatic and the criminal
up. The most difficult to deal with are the men with hobbies. There is
the Protestant fool, who thinks that our liberties are menaced by the
machinations of the Church of Rome; and his companion idiot, who
wants legislation against all secret societies, especially the Masons.
Then there are the believers in “isms,” of whom the women-
suffragists stand in the first rank. Now I have always been a believer
in woman’s rights, but I must confess I have never seen such a
hopelessly impracticable set of persons as the woman-suffragists
who came up to Albany to get legislation. They simply would not
draw up their measures in proper form; when I pointed out to one of
them that their proposed bill was drawn up in direct defiance of
certain of the sections of the Constitution of the State he blandly
replied that he did not care at all for that, because the measure had
been drawn up so as to be in accord with the Constitution of Heaven.
There was no answer to this beyond the very obvious one that
Albany was in no way akin to Heaven. The ultra-temperance people
—not the moderate and sensible ones—are quite as impervious to
common sense.
A member’s correspondence is sometimes amusing. A member
receives shoals of letters of advice, congratulation, entreaty, and
abuse, half of them anonymous. Most of these are stupid; but some
are at least out of the common.
I had some constant correspondents. One lady in the western part
of the State wrote me a weekly disquisition on woman’s rights. A
Buffalo clergyman spent two years on a one-sided correspondence
about prohibition. A gentleman of Syracuse wrote me such a stream
of essays and requests about the charter of that city that I feared he
would drive me into a lunatic asylum; but he anticipated matters by
going into one himself. A New Yorker at regular intervals sent up a
request that I would “reintroduce” the Dongan charter, which had
lapsed two centuries before. A gentleman interested in a proposed
law to protect primaries took to telegraphing daily questions as to its
progress—a habit of which I broke him by sending in response
telegrams of several hundred words each, which I was careful not to
prepay.
There are certain legislative actions which must be taken in a
purely Pickwickian sense. Notable among these are the resolutions
of sympathy for the alleged oppressed patriots and peoples of
Europe. These are generally directed against England, as there
exists in the lower strata of political life an Anglophobia quite as
objectionable as the Anglomania of the higher social circles.
As a rule, these resolutions are to be classed as simply bouffe
affairs; they are commonly introduced by some ambitious legislator
—often, I regret to say, a native American—who has a large foreign
vote in his district. During my term of service in the Legislature,
resolutions were introduced demanding the recall of Minister Lowell,
assailing the Czar for his conduct towards the Russian Jews,
sympathizing with the Land League and the Dutch Boers, etc., etc.;
the passage of each of which we strenuously and usually
successfully opposed, on the ground that while we would warmly
welcome any foreigner who came here, and in good faith assumed
the duties of American citizenship, we had a right to demand in
return that he should not bring any of his race or national antipathies
into American political life. Resolutions of this character are
sometimes undoubtedly proper; but in nine cases out of ten they are
wholly unjustifiable. An instance of this sort of thing which took place
not at Albany may be cited. Recently the Board of Aldermen of one
of our great cities received a stinging rebuke, which it is to be feared
the aldermanic intellect was too dense fully to appreciate. The
aldermen passed a resolution “condemning” the Czar of Russia for
his conduct towards his fellow-citizens of Hebrew faith, and
“demanding” that he should forthwith treat them better; this was
forwarded to the Russian Minister, with a request that it be sent to
the Czar. It came back forty-eight hours afterwards, with a note on
the back by one of the under-secretaries of the legation, to the effect
that as he was not aware that Russia had any diplomatic relations
with this particular Board of Aldermen, and as, indeed, Russia was
not officially cognizant of their existence, and, moreover, was wholly
indifferent to their opinions on any conceivable subject, he herewith
returned them their kind communication.[7]
In concluding I would say, that while there is so much evil at
Albany, and so much reason for our exerting ourselves to bring
about a better state of things, yet there is no cause for being
disheartened or for thinking that it is hopeless to expect
improvement. On the contrary, the standard of legislative morals is
certainly higher than it was fifteen years ago or twenty-five years
ago. In the future it may either improve or retrograde, by fits and
starts, for it will keep pace exactly with the awakening of the popular
mind to the necessity of having honest and intelligent
representatives in the State Legislature.[8]
I have had opportunity of knowing something about the workings
of but a few of our other State legislatures: from what I have seen
and heard, I should say that we stand about on a par with those of
Pennsylvania, Maryland, and Illinois, above that of Louisiana, and
below those of Vermont, Massachusetts, Rhode Island, and
Wyoming, as well as below the national legislature at Washington.
But the moral status of a legislative body, especially in the West,
often varies widely from year to year.
FOOTNOTES:
[6] The Century, January, 1885.
[7] A few years later a member of the Italian Legation “scored”
heavily on one of our least pleasant national peculiarities. An
Italian had just been lynched in Colorado, and an Italian paper in
New York bitterly denounced the Italian Minister for his supposed
apathy in the matter. The member of the Legation in question
answered that the accusations were most unjust, for the Minister
had worked zealously until he found that the deceased “had taken
out his naturalization papers, and was entitled to all the privileges
of American citizenship.”
[8] At present, twelve years later, I should say that there was
rather less personal corruption in the Legislature; but also less
independence and greater subservience to the machine, which is
even less responsive to honest and enlightened public opinion.
VI
MACHINE POLITICS IN NEW YORK CITY[9]
“HEELERS.”
The “heelers,” or “workers,” who stand at the polls, and are paid in
the way above described, form a large part of the rank and file
composing each organization. There are, of course, scores of them
in each assembly district association, and, together with the almost
equally numerous class of federal, State, or local paid office-holders
(except in so far as these last have been cut out by the operations of
the civil-service reform laws), they form the bulk of the men by whom
the machine is run; the bosses of great and small degree chiefly
oversee the work and supervise the deeds of their
henchmen. The organization of a party in our city is really much like
that of an army. There is one great central boss, assisted by some
trusted and able lieutenants; these communicate with the different
district bosses, whom they alternately bully and assist. The district
boss in turn has a number of half-subordinates, half-allies, under
him; these latter choose the captains of the election districts, etc.,
and come into contact with the common heelers. The more stupid
and ignorant the common heelers are, and the more implicitly they
obey orders, the greater becomes the effectiveness of the machine.
An ideal machine has for its officers men of marked force, cunning
and unscrupulous, and for its common soldiers men who may be
either corrupt or moderately honest, but who must be of low
intelligence. This is the reason why such a large proportion of the
members of every political machine are recruited from the lower
grades of the foreign population. These henchmen obey
unhesitatingly the orders of their chiefs, both at the primary or
caucus and on election day, receiving regular rewards for so doing,
either in employment procured for them or else in money outright. Of
course it is by no means true that these men are all actuated merely
by mercenary motives. The great majority entertain also a real
feeling of allegiance towards the party to which they belong, or
towards the political chief whose fortunes they follow; and many
work entirely without pay and purely for what they believe to be right.
Indeed, an experienced politician always greatly prefers to have
under him men whose hearts are in their work and upon whose
unbribed devotion he can rely; but unfortunately he finds in most
cases that their exertions have to be seconded by others which are
prompted by motives far more mixed.
All of these men, whether paid or not, make a business of political
life and are thoroughly at home among the obscure intrigues that go
to make up so much of it; and consequently they have quite as much
the advantage when pitted against amateurs as regular soldiers
have when matched against militiamen. But their numbers, though
absolutely large, are, relatively to the entire community, so small that
some other cause must be taken into consideration in order to
account for the commanding position occupied by the machine and
the machine politicians in public life. This other determining cause is
to be found in the fact that all these machine associations have a
social as well as a political side, and that a large part of the political
life of every leader or boss is also identical with his social life.