Understanding Large Language Models
Learning Their Underlying Concepts and Technologies

Thimira Amaratunga
Nugegoda, Sri Lanka

ISBN-13 (pbk): 979-8-8688-0016-0
ISBN-13 (electronic): 979-8-8688-0017-7
https://doi.org/10.1007/979-8-8688-0017-7

Copyright © 2023 by Thimira Amaratunga


This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or
part of the material is concerned, specifically the rights of translation, reprinting, reuse of
illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way,
and transmission or information storage and retrieval, electronic adaptation, computer software,
or by similar or dissimilar methodology now known or hereafter developed.
Trademarked names, logos, and images may appear in this book. Rather than use a trademark
symbol with every occurrence of a trademarked name, logo, or image we use the names, logos,
and images only in an editorial fashion and to the benefit of the trademark owner, with no
intention of infringement of the trademark.
The use in this publication of trade names, trademarks, service marks, and similar terms, even if
they are not identified as such, is not to be taken as an expression of opinion as to whether or not
they are subject to proprietary rights.
While the advice and information in this book are believed to be true and accurate at the date of
publication, neither the authors nor the editors nor the publisher can accept any legal
responsibility for any errors or omissions that may be made. The publisher makes no warranty,
express or implied, with respect to the material contained herein.
Managing Director, Apress Media LLC: Welmoed Spahr
Acquisitions Editor: Smriti Srivastava
Development Editor: Laura Berendson
Editorial Project Manager: Shaul Elson
Cover designed by eStudioCalamar
Cover image designed by Cai Fang from Unsplash
Distributed to the book trade worldwide by Springer Science+Business Media New York, 1
New York Plaza, Suite 4600, New York, NY 10004-1562, USA. Phone 1-800-SPRINGER, fax (201)
348-4505, e-mail orders-ny@springer-sbm.com, or visit www.springeronline.com. Apress Media,
LLC is a California LLC and the sole member (owner) is Springer Science + Business Media
Finance Inc (SSBM Finance Inc). SSBM Finance Inc is a Delaware corporation.
For information on translations, please e-mail booktranslations@springernature.com; for
reprint, paperback, or audio rights, please e-mail bookpermissions@springernature.com.
Apress titles may be purchased in bulk for academic, corporate, or promotional use. eBook
versions and licenses are also available for most titles. For more information, reference our Print
and eBook Bulk Sales web page at http://www.apress.com/bulk-sales.
Any source code or other supplementary material referenced by the author in this book is available
to readers on GitHub. For more detailed information, please visit https://www.apress.com/gp/
services/source-code.
Paper in this product is recyclable
Dedicated to all who push the boundaries of knowledge.
Table of Contents

About the Author
About the Technical Reviewer
Acknowledgments
Preface

Chapter 1: Introduction
    A Brief History of AI
    Where LLMs Stand
    Summary

Chapter 2: NLP Through the Ages
    History of NLP
    Formal Grammars
    Transformational Grammar and Generative Grammar
    Parsing and Syntactic Analysis
    Context and Semantics
    Language Understanding
    Knowledge Engineering
    Probabilistic Models
    Hidden Markov Models
    N-Gram Language Models
    Maximum Entropy Models
    Conditional Random Fields
    Large Annotated Corpora
    Word Sense Disambiguation
    Machine Translation
    Information Retrieval
    Statistical Approaches
    Availability of Large Text Corpora
    Supervised Learning for NLP Tasks
    Named Entity Recognition
    Sentiment Analysis
    Machine Translation
    Introduction of Word Embeddings
    Deep Learning and Neural Networks
    Deployment in Real-World Applications
    Tasks of NLP
    Basic Concepts of NLP
    Tokenization
    Corpus and Vocabulary
    Word Embeddings
    Language Modeling
    N-Gram Language Models
    Neural Language Models
    Summary

Chapter 3: Transformers
    Paying Attention
    The Transformer Architecture
    The Encoder
    The Decoder
    Scaled Dot Product
    Multihead Attention
    Summary

Chapter 4: What Makes LLMs Large?
    What Makes a Transformer Model an LLM
    Number of Parameters
    Scale of Data
    Computational Power
    Fine-Tuning and Task Adaptation
    Capabilities
    Why Parameters Matter
    Computational Requirements
    Risk of Overfitting
    Model Size
    The Scale of Data
    Types of LLMs
    Based on the Architecture
    Based on the Training Objective
    Usage-Based Categorizations
    Foundation Models
    Pre-training on Broad Data
    Fine-Tuning and Adaptability
    Transfer Learning
    Economies of Scale
    General-Purpose Abilities
    Fine-Tuning Capabilities
    Transfer Learning
    Economies of Scale
    Rapid Deployment
    Interdisciplinary Applications
    Reduced Training Overhead
    Continuous Adaptability
    Democratization of AI
    Applying LLMs
    Prompt Engineering
    Explicitness
    Examples as Guidance
    Iterative Refinement
    Controlling Verbosity and Complexity
    Systematic Variations
    Fine-Tuning
    Overfitting
    Catastrophic Forgetting
    Evaluation
    Summary

Chapter 5: Popular LLMs
    Generative Pre-trained Transformer
    Bidirectional Encoder Representations from Transformers
    Pathways Language Model
    Large Language Model Meta AI
    Summary

Chapter 6: Threats, Opportunities, and Misconceptions
    LLMs and the Threat of a Superintelligent AI
    Levels of AI
    Existential Risk from an ASI
    Where LLMs Fit
    Misconceptions and Misuse
    Opportunities
    Summary

Index
About the Author
Thimira Amaratunga is a senior software
architect at Pearson PLC Sri Lanka with more
than 15 years of industry experience. He is also
an inventor, author, and researcher in the AI,
machine learning, deep learning in education,
and computer vision domains.
Thimira has a master’s degree in
computer science and a bachelor’s degree in
information technology from the University of Colombo, Sri Lanka. He is
also a TOGAF-certified enterprise architect.
He has filed three patents in the fields of dynamic neural networks and
semantics for online learning platforms. He has published three books on
deep learning and computer vision.
Connect with him on LinkedIn: https://www.linkedin.com/in/
thimira-amaratunga.

About the Technical Reviewer
Kasam Shaikh is a prominent figure in India’s
artificial intelligence landscape, holding the
distinction of being one of the country’s first
four Microsoft Most Valuable Professionals
(MVPs) in AI. Currently serving as a senior
architect at Capgemini, Kasam boasts an
impressive track record as an author, having
written five best-selling books dedicated to
Azure and AI technologies. Beyond his writing
endeavors, Kasam is recognized as a Microsoft
Certified Trainer (MCT) and influential tech
YouTuber (@mekasamshaikh). He also leads the largest online Azure
AI community, known as DearAzure | Azure INDIA, and is a globally
renowned AI speaker. His commitment to knowledge sharing extends to
contributions to Microsoft Learn, where he plays a pivotal role.
Within the realm of AI, Kasam is a respected subject-matter expert
(SME) in generative AI for the cloud, complementing his role as a senior
cloud architect. He actively promotes the adoption of No Code and Azure
OpenAI solutions and possesses a strong foundation in hybrid and cross-cloud practices. Kasam’s versatility and expertise make him an invaluable
asset in the rapidly evolving landscape of technology, contributing
significantly to the advancement of Azure and AI.
In summary, Kasam is a multifaceted professional who excels in both
technical expertise and knowledge dissemination. His contributions
span writing, training, community leadership, public speaking, and
architecture, establishing him as a true luminary in the world of Azure
and AI.

Acknowledgments
The idea for this book came during my journey to understand the latest
developments in AI. In creating this book, I want to help others who seek
the same knowledge. Although this was my fourth book, it took a lot of
work. Luckily, I received support from many individuals along the way, to whom I would like to express my sincere gratitude.
First, I would like to thank the team at Apress: Smriti Srivastava,
Sowmya Thodur, Laura Berendson, Shaul Elson, Mark Powers, Joseph
Quatela, Kasam Shaikh, Linthaa Muralidharan, and everyone involved in
the editing and publishing of this book.
To my loving wife, Pramitha. Thank you for the encouragement and
the motivation given from the inception of the idea to its completion.
Completing this might not have been possible without your support
through the long hours and days spent writing and perfecting this book.
To my colleagues and managers at Pearson PLC, who have guided me
throughout the years, I am grateful for your guidance and encouragement.
Finally, to my parents and sister, thank you for your endless support
throughout the years.

Preface
Today, finding someone who hasn’t heard of ChatGPT, the AI chatbot that
took the world by storm, is hard. ChatGPT—and its competitors such as
Google Bard, Microsoft Bing Chat, etc.—are part of a broader area in AI
known as large language models (LLMs). LLMs are the latest frontier in
AI, resulting from recent research into natural language processing (NLP)
and deep learning. However, the immense popularity these applications
have gained has created some concerns and misconceptions around them
because of a lack of understanding of what they truly are.
Understanding the concepts behind this new technology, including how
it evolved, and addressing the misconceptions and genuine concerns around
it are crucial for us to bring out its full potential. Therefore, this book was
designed to provide a crucial overall understanding of large language models.
In this book, you will do the following:

• Learn the history of AI and NLP leading up to large language models

• Learn the core concepts of NLP that help define LLMs

• Look at the transformer architecture, a turning point in NLP research

• See what makes LLMs special

• Understand the architectures of popular LLM applications

• Read about the concerns, threats, misconceptions, and opportunities presented by using LLMs

This is not a coding book. However, this book will provide a strong
foundation for understanding LLMs as you take your first steps toward them.

CHAPTER 1

Introduction
It was late 2022. Reports were coming in about a new AI that had human-like conversational skills and seemingly infinite knowledge. Not only was
it able to articulate answers to a large number of subject domains such
as science, technology, history, and philosophy, it was able to elaborate
on the answers it gave and perform meaningful follow-up conversations
about them.
This was ChatGPT, a large language model (LLM) chatbot developed
by OpenAI. ChatGPT has been trained on a massive dataset of both text
and code, giving it the ability to generate code as well as creative text
content. Being optimized for conversations, ChatGPT allowed users to
steer the conversations to generate the desired content by considering the
succeeding prompts and replies as context.
Because of these capabilities, and because it was made available to the general public, ChatGPT gained immense popularity. It became the fastest-growing consumer software application in history. Since its release,
it has been covered by major news outlets, reviewed in both technical
and nontechnical industries, and even referenced in government
documents. The amount of interest shown in ChatGPT by the general
public is something previously unheard of. The availability of it has made
a substantial impact on many industries both directly and indirectly.
This has resulted in both enthusiasm and concerns about AI and its
capabilities.


While being the most popular LLM product, ChatGPT is barely the
tip of the iceberg when it comes to the capabilities of large language
models. Ushered in by the advancements of deep learning, natural
language processing (NLP), and the ever-increasing processing power
of data processing, LLMs are the bleeding edge of generative AI. The
technology has been in active development since 2018. ChatGPT is not
the first LLM. In fact, it was not even the first LLM from OpenAI. It was,
however, the most impactful one to reach the general public. The success
of ChatGPT has also triggered a wave of competitor conversational AI
platforms, such as Bard from Google and LLaMA from Meta AI, pushing
the boundaries of the technology further.
As with any new technology, not everyone seems to have grasped
what LLMs really are. Also, while many have expressed enthusiasm
regarding LLMs and their capabilities, there are concerns being raised. The
concerns include AI taking over certain job roles, disruption of creative processes, forgeries, and existential risk brought on by superintelligent AIs. However, some of these concerns stem from a misunderstanding of LLMs. There are real potential risks associated with LLMs, but they may not come from where most people are looking.
To understand both the usefulness and the risks, we must first learn
how LLMs work and the history of AI that led to the development of LLMs.

A Brief History of AI
Humans have always been intrigued by the idea of intelligent machines:
the idea that machines or artificial constructs can be built with intelligent
behavior, allowing them to perform tasks that typically require human
intelligence. This idea pre-dates the concept of computers, and written
records of the idea can be traced back to the 13th century. By the 19th
century, it was this idea that brought forward concepts such as formal
reasoning, propositional logic, and predicate calculus.


In June 1956, many expert mathematicians and scientists who were enthusiasts in the subject of intelligent machines came together for a
conference at Dartmouth College (New Hampshire, US). This conference—
The Dartmouth Summer Research Project on Artificial Intelligence—was
the starting point of the formal research field of artificial intelligence. It
was at this conference that the Logic Theorist, developed by Allen Newell,
Herbert A. Simon, and Cliff Shaw and what is now considered to be the
first artificial intelligence program, was also presented. The Logic Theorist
was meant to mimic the logical problem-solving of a human and was able
to prove 38 out of the first 52 theorems in Principia Mathematica (a book
on the principles of mathematics written by Alfred North Whitehead and
Bertrand Russell).
After its initiation, the field of artificial intelligence branched out
into several subfields, such as expert systems, computer vision, natural
language processing, etc. These subfields often overlap and build upon
each other. Over the following years, AI experienced several waves of optimism, each followed by disappointment and the loss of funding (periods referred to as AI winters), and then by new approaches being discovered, renewed success, and restored funding and interest.
One of the main obstacles the researchers of AI faced at the time
was the incomplete understanding of intelligence. Even today we lack a
complete understanding of how human intelligence works. By the late
1990s, researchers proposed a new approach: rather than attempting to
code intelligent behavior into a system, build a system that can grow its
own intelligence. This idea created a new subfield of AI named machine
learning.
The main aim of machine learning (ML) is to provide machines with
the ability to learn without explicit programming, in the hopes that such
systems once built will be able to evolve and adapt when they are exposed
to new data. The core idea is the ability of a learner to generalize from
experience. The learner (the AI system being trained), once given a set of training samples, must be able to build a generalized model upon them, which would allow it to decide upon new cases with sufficient accuracy.
Such training in ML can be provided in three main methods.

• Supervised learning: The system is given a set of labeled cases (a training set), based on which it is asked to create a generalized model that can act on unseen cases.

• Unsupervised learning: The system is given a set of unlabeled cases and asked to find a pattern in them. This is ideal for discovering hidden patterns.

• Reinforcement learning: The system is asked to take an action and is given a reward or a penalty based on how appropriate that action is to the given situation. The system must learn over time which actions yield the most rewards in given situations.

Machine learning can also use a combination of these main learning methods, such as semi-supervised learning, in which a small number of labeled examples are used with a large set of unlabeled data for training.
With these base concepts of machine learning, several models were
introduced as means of implementing trainable systems and learning
techniques, such as artificial neural networks (models inspired by how
neurons of the brain work), decision trees (models that use tree-like
structures to model decisions and outcomes), regression models (models
that use statistical methods to map input and output variables), etc. These
models proved exceptionally effective in areas such as computer vision
and natural language processing.
The success of machine learning saw a steady growth in AI research
and applications over the next decade. By around 2010, a few other factors emerged that pushed this progress further.


Building AI models, especially machine learning models such as neural networks, has always been computationally intensive. By the early 2010s, computing power started becoming cheaper and more plentiful as more powerful and efficient processors became available. In
addition, specialist hardware platforms that benefited AI model training
became available. This allowed more complex models to be evaluated.
In parallel, the cost of data storage and processing continued to decline.
This made collecting and processing large datasets more viable. Finally,
advancements in the medical field increased the understanding of how
the natural brain works. This new knowledge, and the availability of
processing power and data, allowed more complex neural network models
to be created and trained.
It was identified that the natural brain uses a hierarchical method
to obtain knowledge, by building complicated concepts out of simpler
ones. The brain does this by identifying lower-level patterns from the
raw inputs and then building upon those patterns to learn higher-level
features over many levels. This technique, when modeled on machine
learning, is known as hierarchical feature learning and allows such
systems to automatically learn complex features through multiple levels of
abstraction with minimal human intervention. Applying hierarchical feature learning to neural networks results in deep networks with many feature learning layers. Thus, this approach was called deep learning.
A deep learning model will not try to understand the entire problem
at once. Instead, it will look at the input piece by piece, so that it can derive lower-level patterns/features from it. It then uses these lower-level
features to gradually identify higher-level features, through many layers,
hierarchically. This allows deep learning models to learn complicated
patterns, by gradually building them up from simpler ones, allowing them
to comprehend the world better.
Deep learning models were immensely successful in the tasks they
were trained on, resulting in many deep learning architectures being
developed such as convolutional neural networks (CNNs), stacked autoencoders, generative adversarial networks (GANs), transformers,
etc. Their success resulted in deep learning architectures being applied
to many other AI fields such as computer vision and natural language
processing.
In 2014, with the advancements in models such as variational
autoencoders and generative adversarial networks, deep learning
models were able to generate new data based on what they learned from
their training. With the introduction of the transformer deep learning
architecture in 2017, such capabilities were pushed even further. These
latest generations of AI models were named generative AI and within a
few short years were able to generate images, art, music, videos, code, text,
and more.
This is where LLMs come into the picture.

Where LLMs Stand


Large language models are the result of the combination of natural
language processing, deep learning concepts, and generative AI models.
Figure 1-1 shows where LLMs stand in the AI landscape.


Figure 1-1. Where LLMs are in the AI landscape

Summary
In this chapter, we went through the history of AI and how it has evolved.
We also looked at where large language models stand in the broader
AI landscape. In the next few chapters, we will look at the evolution of
NLP and its core concepts, the transformer architecture, and the unique
features of LLMs.

CHAPTER 2

NLP Through the Ages
Natural language processing (NLP) is a subfield of artificial intelligence
and computational linguistics. It focuses on enabling computers to
understand, interpret, and generate human language in a way that is
both meaningful and useful. The primary goal of NLP is to bridge the
gap between human language and computer understanding, allowing
machines to process, analyze, and respond to natural language data.
NLP is the heart of large language models (LLMs). LLMs would
not exist without the concepts and algorithms developed through NLP
research over the years. Therefore, to understand LLMs, we need to
understand the concepts of NLP.

History of NLP
The conception of natural language processing dates to the 1950s. In
1950, Alan Turing published an article titled “Computing Machinery and
Intelligence,” which discussed a method to determine whether a machine
exhibits human-like intelligence. This proposed test, most popularly
referred to as the Turing test, is widely considered to be what inspired early
NLP researchers to attempt natural language understanding.


The Turing test involves a setup where a human evaluator interacts with both a human and a machine without knowing which is which.
The evaluator’s task is to determine which participant is the machine
and which is the human based solely on their responses to questions or
prompts. If the machine is successful in convincing the evaluator that it
is human, then it is said to have passed the Turing test. The Turing test
thus provided a concrete and measurable goal for AI research. Turing’s
proposal sparked interest and discussions about the possibility of
creating intelligent machines that could understand and communicate in
natural language like humans. This led to the establishment of NLP as a
fundamental research area within AI.
In 1956, with the establishment of the artificial intelligence research
field, NLP became an established field of research in AI, making it one of
the oldest subfields in AI research.
During the 1960s and 1970s, NLP research predominantly relied on
rule-based systems. One of the earliest NLP programs was the ELIZA
chatbot, developed by Joseph Weizenbaum between 1964 and 1966. ELIZA
used pattern matching and simple rules to simulate conversation between
the user and a psychotherapist. With an extremely limited vocabulary
and ruleset, ELIZA was still able to produce human-like interactions. The General Problem Solver (GPS) system, developed in the late 1950s by Allen Newell and Herbert A. Simon and working with means-ends analysis, also demonstrated some language processing capabilities.
In the 1970s and 1980s, NLP research began to incorporate linguistic
theories and principles to understand language better. Noam Chomsky’s
theories on generative grammar and transformational grammar influenced
early NLP work. These approaches aimed to use linguistic knowledge and
formal grammatical rules to understand and process human language.
The following are some key aspects of linguistic-based
approaches in NLP.


Formal Grammars
Linguistics-based NLP heavily relied on formal grammars, such as
context-free grammars and phrase structure grammars. These formalisms
provided a way to represent the hierarchical structure and rules of natural
language sentences.

Transformational Grammar and Generative Grammar
Noam Chomsky’s transformational grammar and generative grammar
theories significantly influenced early NLP research. These theories
focused on the idea that sentences in a language are generated from
underlying abstract structures, and rules of transformation govern the
relationship between these structures.

Parsing and Syntactic Analysis


Parsing, also known as syntactic analysis, was a crucial aspect of
linguistics-based NLP. It involved breaking down sentences into their
grammatical components and determining the hierarchical structure.
Researchers explored various parsing algorithms to analyze the syntax of
sentences.

Context and Semantics


Linguistics-based approaches aimed to understand the context and
semantics of sentences beyond just their surface structure. The focus was
on representing the meaning of words and phrases in a way that allowed
systems to reason about their semantic relationships.


Language Understanding
Linguistics-based NLP systems attempted to achieve deeper language
understanding by incorporating syntactic and semantic knowledge. This
understanding was crucial for more advanced NLP tasks, such as question
answering and natural language understanding.

Knowledge Engineering
In many cases, these approaches required manual knowledge engineering,
where linguistic rules and structures had to be explicitly defined by human
experts. This process was time-consuming and limited the scalability of
NLP systems.
Linguistics-based NLP approaches had their limitations, however. While they had theoretical appeal and offered insights into language structure, the complexity of natural languages and the vast number of exceptions to linguistic rules made it challenging to develop comprehensive and robust NLP systems based solely on formal grammars.
Because of these limitations, while linguistic theories continued to
play a role in shaping the NLP field, they were eventually complemented
and, in some cases, surpassed by data-driven approaches and statistical
methods.
During the 1990s and 2000s, NLP started shifting its focus from
rule-based and linguistics-driven systems to data-driven methods.
These approaches leveraged large amounts of language data to build
probabilistic models, leading to significant advancements in various
NLP tasks.
Statistical NLP involved several approaches and applications. Let us look at a few of them next.


Probabilistic Models
Statistical approaches relied on probabilistic models to process and
analyze language data. These models assigned probabilities to different
linguistic phenomena based on their occurrences in large annotated
corpora.

Hidden Markov Models


Hidden Markov models (HMMs) were one of the early statistical models
used in NLP. They were employed for tasks such as part-of-speech tagging
and speech recognition. HMMs use probability distributions to model the
transition between hidden states, which represent the underlying linguistic
structures.

N-Gram Language Models


N-gram language models became popular during this era. They predicted
the likelihood of a word occurring given the preceding (n-1) words.
N-grams are simple but effective for tasks such as language modelling,
machine translation, and information retrieval.
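
As a concrete illustration, here is a minimal sketch (not from the book) of a bigram model, that is, an n-gram model with n = 2, estimated by simple counting; the toy corpus and variable names are assumptions made purely for this example.

from collections import Counter, defaultdict

# Toy corpus; sentences here are illustrative only.
corpus = [
    "i love natural language processing",
    "i love machine learning",
    "you love language models",
]

# Count how often each word follows each preceding word.
bigram_counts = defaultdict(Counter)
for sentence in corpus:
    tokens = ["<s>"] + sentence.split() + ["</s>"]   # sentence boundary markers
    for prev, curr in zip(tokens, tokens[1:]):
        bigram_counts[prev][curr] += 1

def bigram_probability(prev, curr):
    # P(curr | prev) estimated as count(prev, curr) / count(prev, anything)
    total = sum(bigram_counts[prev].values())
    return bigram_counts[prev][curr] / total if total else 0.0

print(bigram_probability("i", "love"))        # 1.0  ("i" is always followed by "love")
print(bigram_probability("love", "natural"))  # ~0.33 (one of three continuations of "love")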

Maximum Entropy Models


Maximum entropy (MaxEnt) models were widely used in various NLP
tasks. They are a flexible probabilistic framework that can incorporate
multiple features and constraints to make predictions.


Conditional Random Fields


Conditional random fields (CRFs) gained popularity for sequence labeling
tasks, such as part-of-speech tagging and named entity recognition. CRFs
model the conditional probabilities of labels given the input features.

Large Annotated Corpora


Statistical approaches relied on large annotated corpora for training and
evaluation. These corpora were essential for estimating the probabilities used
in probabilistic models and for evaluating the performance of NLP systems.

Word Sense Disambiguation


Statistical methods were applied to word sense disambiguation (WSD)
tasks, where the goal was to determine the correct sense of a polysemous
word based on context. Supervised and unsupervised methods were
explored for this task.

Machine Translation
Statistical machine translation (SMT) systems emerged, which used
statistical models to translate text from one language to another. Phrase-based and hierarchical models were common approaches in SMT.

Information Retrieval
Statistical techniques were applied to information retrieval tasks, where
documents were ranked based on their relevance to user queries.
While statistical approaches showed great promise, they still faced
challenges related to data sparsity, handling long-range dependencies in
language, and capturing complex semantic relationships between words.


During the 2000s and 2010s, as we discussed in the history of AI, there was a significant rise in the application of machine learning (ML)
techniques. This period witnessed tremendous advancements in ML
algorithms, computational power, and the availability of large text corpora,
which fueled the progress of NLP research and applications.
Several key developments contributed to the rise of machine learning–
based NLP during this time. Let us explore a few of them.

Statistical Approaches
Statistical approaches became dominant in NLP during this period.
Instead of hand-crafted rule-based systems, researchers started using
probabilistic models and ML algorithms to solve NLP tasks. Techniques
like HMMs, CRFs, and support vector machines (SVMs) gained popularity.

Availability of Large Text Corpora


The rise of the Internet and digitalization led to the availability of vast
amounts of text data. Researchers could now train ML models on large
corpora, which greatly improved the performance of NLP systems.

Supervised Learning for NLP Tasks


Supervised learning became widely used for various NLP tasks. With
labeled data for tasks like part-of-speech tagging, named entity recognition
(NER), sentiment analysis, and machine translation, researchers could
train ML models effectively.


Named Entity Recognition


ML-based NER systems, which identify entities such as the names of people,
organizations, and locations in text, became more accurate and widely used.
This was crucial for information extraction and text understanding tasks.

Sentiment Analysis
Sentiment analysis or opinion mining gained prominence, driven by the
increasing interest in understanding public opinions and sentiments
expressed in social media and product reviews.

Machine Translation
Statistical machine translation (SMT) systems, using techniques such
as phrase-based models, started to outperform rule-based approaches,
leading to significant improvements in translation quality.

Introduction of Word Embeddings


Word embeddings, like Word2Vec and GloVe, revolutionized NLP by
providing dense vector representations of words. These embeddings
captured semantic relationships between words, improving performance
in various NLP tasks.

Deep Learning and Neural Networks


The advent of deep learning and neural networks brought about a
paradigm shift in NLP. Models like recurrent neural networks (RNNs), long
short-term memory (LSTM), and convolutional neural networks (CNNs)
significantly improved performance in sequence-to-sequence tasks,
sentiment analysis, and machine translation.


Deployment in Real-World Applications


ML-based NLP systems found practical applications in various industries,
such as customer support chatbots, virtual assistants, sentiment analysis
tools, and machine translation services.
The combination of statistical methods, large datasets, and the advent
of deep learning paved the way for the widespread adoption of ML-based
NLP during the 2000s and 2010s.
Toward the end of the 2010s, pre-trained language models like ELMo,
Generative Pre-trained Transformer (GPT), and Bidirectional Encoder
Representations from Transformers (BERT) emerged. These models
were pre-trained on vast amounts of data and fine-tuned for specific NLP
tasks, achieving state-of-the-art results in various benchmarks. These
developments enabled significant progress in language understanding,
text generation, and other NLP tasks, making NLP an essential part of
many modern applications and services.

Tasks of NLP
With the primary goal of bridging the gap between human language and
computer understanding, over its history, NLP has been applied to several
tasks concerning language.

• Text classification: Assigning a label or category to a piece of text. For example, classifying emails as spam or not spam, sentiment analysis (identifying the sentiment as positive, negative, or neutral), topic categorization, etc.

• NER: Identifying and classifying entities mentioned in the text, such as names of people, organizations, locations, dates, and more.

• Machine translation: Automatically translating text from one language to another.

• Text generation: Creating human-like text, which could be in the form of chatbots, autogenerated content, or text summarization.

• Speech recognition: Converting spoken language into written text.

• Text summarization: Automatically generating a concise and coherent summary of a longer text.

• Question answering: Providing accurate answers to questions asked in natural language.

• Language modeling: Predicting the likelihood of a given sequence of words occurring in a language.

The combination of one or more of these tasks forms the basis of current NLP applications.

Basic Concepts of NLP


To achieve the previously mentioned tasks, NLP employs a set of key
concepts. These are some of the most common:
• Tokenization: Tokenization is the process of breaking
down a text into smaller units, typically words or
subwords. These smaller units are called tokens, and
tokenization is an essential preprocessing step in most
NLP tasks.
• Stopword removal: Stopwords are common words
(e.g., the, is, and) that often appear in a text but carry
little semantic meaning. Removing stopwords can help
reduce noise and improve computational efficiency.


• Part-of-speech (POS) tagging: POS tagging involves assigning grammatical tags (e.g., noun, verb, adjective) to each word in a sentence, indicating its syntactic role.

• Parsing: Parsing involves analyzing the grammatical structure of a sentence to understand the relationships between words and phrases. Dependency parsing and constituency parsing are common parsing techniques.

• Word embeddings: Word embeddings are dense vector representations of words that capture semantic relationships between words. Word2Vec and GloVe are popular word embedding models.

• NER: NER is the process of identifying and classifying named entities mentioned in the text, such as names of people, organizations, locations, dates, etc.

• Stemming and lemmatization: Stemming and lemmatization are techniques used to reduce words to their base or root form. For example, running, runs, and ran might all be stemmed or lemmatized to run.

• Language models: Language models predict the likelihood of a sequence of words occurring in a language. They play a crucial role in various NLP tasks, such as machine translation and text generation.

Apart from these, other task-specific techniques, such as sequence-to-sequence models, attention mechanisms, and transfer learning, are also used in NLP.
Let us investigate some of these concepts in depth, which will give us a
better understanding of the internal workings of LLMs.


Tokenization
Tokenization is the process of breaking down a text or a sequence of
characters into smaller units, called tokens. In NLP, tokens are typically
words or subwords that form the basic building blocks for language
processing tasks. Tokenization is a crucial preprocessing step before text
can be used in various NLP applications.
Let’s take an example sentence: “I love natural language processing!”
The word-level tokenization output would be as follows:

["I", "love", "natural", "language", "processing", "!"]

In this example, the tokenization process splits the sentence into individual words, with the punctuation mark treated as its own token. Each word in the sentence becomes a separate token, forming a list of tokens.
Tokenization can be performed using various methods, and the choice
of tokenizer depends on the specific NLP task and the characteristics of the
text data. Some common tokenization techniques include the following:

• Whitespace tokenization: The text is split into tokens based on whitespace (spaces, tabs, newlines). It’s a simple and common approach for English text and can handle most cases, but it may not handle special cases like hyphenated words or contractions well.

• Punctuation tokenization: The text is split based on punctuation marks, such as periods, commas, exclamation marks, etc. It can be useful when handling text with significant punctuation, but it may result in issues when dealing with abbreviations or other special cases.

• Word tokenization: This is a more advanced tokenizer that uses language-specific rules to split text into words. It can handle special cases like hyphenated words, contractions, and punctuation in a more linguistically accurate manner.

• Subword tokenization: Subword tokenization methods like byte-pair encoding (BPE) and SentencePiece split words into subword units, allowing the model to handle out-of-vocabulary, rare, or unseen words more effectively.

The choice of tokenizer can depend on the specific use case and
requirements of the NLP task. Tokenization is the first step in converting
raw text into a format that can be processed and analyzed by NLP models
and algorithms.
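
To make this concrete, the short Python sketch below (an illustration added here, not code from the book) applies whitespace tokenization and a simple regex-based word-level tokenization to the example sentence; in practice, library tokenizers are usually used instead.

import re

sentence = "I love natural language processing!"

# Whitespace tokenization: split on spaces, so punctuation stays attached.
print(sentence.split())
# ['I', 'love', 'natural', 'language', 'processing!']

# Simple word-level tokenization: runs of word characters and individual
# punctuation marks become separate tokens.
print(re.findall(r"\w+|[^\w\s]", sentence))
# ['I', 'love', 'natural', 'language', 'processing', '!']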

Corpus and Vocabulary


In NLP, a corpus refers to a large collection of text documents or utterances
that are used as a dataset for language analysis and model training. A
corpus serves as the primary source of data for various NLP tasks, allowing
researchers and practitioners to study language patterns, extract linguistic
information, and develop language models.
A corpus can take various forms depending on the specific NLP task or
research objective. Some common types of corpora include the following:

• Text corpora: A text corpus is a collection of written text documents, such as books, articles, web pages, emails, and social media posts. Text corpora are commonly used for tasks such as language modeling, sentiment analysis, text classification, and information retrieval.

• Speech corpora: A speech corpus consists of audio recordings or transcriptions of spoken language. Speech corpora are used in tasks such as speech recognition, speaker identification, and emotion detection.

• Parallel corpora: A parallel corpus contains text in multiple languages that are aligned at the sentence or document level. Parallel corpora are used for machine translation and cross-lingual tasks.

• Treebanks: Treebanks are annotated corpora that include syntactic parse trees, representing the grammatical structure of sentences. Treebanks are used in tasks like parsing and syntax-based machine learning.

• Multimodal corpora: Multimodal corpora include text along with other modalities, such as images, videos, or audio. They are used in tasks that involve understanding and generating information from multiple modalities.
Building and curating high-quality corpora is essential for the success
of various NLP applications, as the performance and generalization of
language models heavily rely on the quality and diversity of the data they
are trained on.
A vocabulary in NLP refers to the set of unique words or tokens present
in a corpus of text. It is a fundamental component of language processing,
as it defines the complete list of words that a model or system can
understand and work with.


When processing text data, the following steps are typically performed
to create a vocabulary:
1. Tokenization: The text is split into individual tokens,
which can be words, subwords, or characters,
depending on the tokenization strategy used.
2. Filtering and normalization: Common
preprocessing steps such as converting text to
lowercase, removing punctuation, and filtering out
stopwords are applied to clean the data and reduce
the size of the vocabulary.
3. Building vocabulary: After tokenization and
preprocessing, the unique tokens in the text data
are collected to form the vocabulary. Each token is
assigned a unique numerical index, which serves as
its representation in the model or during encoding
processes.
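
The following minimal sketch walks through these three steps and produces a token-to-index vocabulary; the toy corpus and stopword list are assumptions made for illustration, not taken from the book.

import re

corpus = [
    "I love natural language processing!",
    "Language models predict the next word.",
]
stopwords = {"the", "a", "an", "is"}   # a deliberately tiny stopword list

vocabulary = {}
for document in corpus:
    # Step 1: tokenization (words and punctuation as separate tokens)
    tokens = re.findall(r"\w+|[^\w\s]", document)
    # Step 2: filtering and normalization (lowercase, drop punctuation and stopwords)
    tokens = [t.lower() for t in tokens if t.isalnum() and t.lower() not in stopwords]
    # Step 3: building the vocabulary (each new token gets the next index)
    for token in tokens:
        if token not in vocabulary:
            vocabulary[token] = len(vocabulary)

print(vocabulary)
# {'i': 0, 'love': 1, 'natural': 2, 'language': 3, 'processing': 4,
#  'models': 5, 'predict': 6, 'next': 7, 'word': 8}
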
The vocabulary is often used to create numerical representations
of text data. In many NLP models, words are represented as dense
vectors (word embeddings) where each word’s embedding is indexed
using its integer representation in the vocabulary. This allows words to
be processed and manipulated as numerical data, making it easier for
machine learning models to work with textual information.
The size of the vocabulary depends on the corpus of text used for
training the model. Large-scale models, such as LLMs, often have very
extensive vocabularies containing hundreds of thousands or even millions
of unique words.
Handling the vocabulary size can be a challenge, as very large vocabularies require more memory and computational resources. Subword tokenization techniques, such as byte-pair encoding (BPE) and SentencePiece, split words into subword units and can be used to handle large vocabularies more efficiently and to cover rare or out-of-vocabulary words.
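
As a rough illustration of the effect (this hand-rolled greedy split is not how BPE or SentencePiece actually learn or apply their subword inventories), the sketch below breaks words into pieces drawn from a small, hand-picked set of subword units:

# Hand-picked subword inventory; a real tokenizer learns this from data.
subword_vocab = {"un", "break", "able", "token", "ization"} | set("abelnotuirkz")

def subword_split(word, vocab):
    # Greedily match the longest known subword at each position.
    pieces, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):     # try the longest piece first
            if word[i:j] in vocab:
                pieces.append(word[i:j])
                i = j
                break
        else:                                  # unknown character: keep it as is
            pieces.append(word[i])
            i += 1
    return pieces

print(subword_split("unbreakable", subword_vocab))   # ['un', 'break', 'able']
print(subword_split("tokenization", subword_vocab))  # ['token', 'ization']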


Word Embeddings
Word embeddings are dense vector representations of words in a
continuous vector space, where similar words are closer to each other.
These representations capture semantic relationships between words,
allowing NLP models to understand word meanings based on their
context.
The main advantages of word embeddings are as follows:

• Semantic meaning: Word embeddings capture semantic meaning and relationships between words. Similar words are close to each other in the embedding space, and analogies like “man is to woman as king is to queen” can be represented as vector arithmetic.

• Dimensionality reduction: Word embeddings reduce the dimensionality of the word representation compared to one-hot encodings. While one-hot encodings are binary vectors with a length equal to the vocabulary size, word embeddings typically have much smaller fixed dimensions (e.g., 100, 300) regardless of the vocabulary size.

• Generalization: Word embeddings generalize across words, allowing models to learn from limited data. Words that share similar contexts tend to have similar embeddings, which enables models to understand the meaning of new words based on their context.

• Continuous space: The embedding space is continuous, enabling interpolation and exploration of relationships between words. For example, one can take the vector for “Paris,” subtract “France,” and add “Spain” to find a vector close to “Madrid” (a toy sketch of this arithmetic follows this list).
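
The toy NumPy sketch below is an illustration added here, not code from the book: the 3-dimensional vectors are made-up values, chosen only to show the idea, whereas real embeddings are learned from large corpora and have far more dimensions. It carries out the analogy arithmetic and finds the nearest word by cosine similarity.

import numpy as np

# Made-up toy embeddings; real embeddings are learned from data.
embeddings = {
    "king":  np.array([0.8, 0.7, 0.1]),
    "queen": np.array([0.8, 0.1, 0.7]),
    "man":   np.array([0.2, 0.8, 0.1]),
    "woman": np.array([0.2, 0.2, 0.7]),
}

def cosine(a, b):
    # Cosine similarity between two vectors.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# "man is to woman as king is to ?"  ->  king - man + woman
target = embeddings["king"] - embeddings["man"] + embeddings["woman"]
best = max((w for w in embeddings if w != "king"),
           key=lambda w: cosine(target, embeddings[w]))
print(best)  # "queen" with these toy vectors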


Word embeddings are a fundamental tool in NLP and have greatly improved the performance of various NLP tasks, such as machine translation, sentiment analysis, text classification, and information retrieval. Popular word embedding methods range from simpler approaches such as bag-of-words (BoW) to more sophisticated methods such as Word2Vec, Global Vectors for Word Representation (GloVe), and fastText. These methods learn word embeddings by considering the co-occurrence patterns of words in large text corpora, allowing the representations to capture the semantic meaning and contextual relationships of words in the language.
Let us investigate two of these methods, bag-of-words and Word2Vec,
in more detail.

Bag-of-Words
The BoW method is a simple and popular technique for text
representation. It disregards the order and structure of words in a
document and focuses on the frequency of each word in the text. The
BoW model represents a document as a histogram of word occurrences,
creating a “bag” of words without considering their sequence.
Here are the steps of the bag-of-words method:
1. Tokenization: The first step is to break down the text
into individual words or tokens.
2. Vocabulary creation: The BoW model creates a
vocabulary, which is a list of all unique words
found in the corpus. Each word in the vocabulary is
assigned a unique index.
3. Vectorization: To represent a document using BoW,
a vector is created for each document, with the
length equal to the size of the vocabulary. Each
element of the vector corresponds to a word in the
vocabulary, and its value represents the frequency
of that word in the document.

Here is an example of the bag-of-words method. Let us take a corpus of the following three sentences:

• “I love to eat pizza.”

• “She enjoys eating pasta.”

• “They like to cook burgers.”

Step 1: Tokenization
The tokens in the corpus are: [“I”, “love”, “to”, “eat”, “pizza”, “She”,
“enjoys”, “eating”, “pasta”, “They”, “like”, “to”, “cook”, “burgers”].
Step 2: Vocabulary Creation
The vocabulary contains all unique words from the tokenized corpus:
[“I”, “love”, “to”, “eat”, “pizza”, “She”, “enjoys”, “eating”, “pasta”, “They”, “like”,
“cook”, “burgers”].
The vocabulary size is 13.
Step 3: Vectorization
Now, each document is represented as a vector using the vocabulary.
The vectors for the three sentences will be as follows:

[1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0]

(The vector shows that the words I, love, to, eat, and pizza appear once
in the document.)

[0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0]

(The vector shows that the words She, enjoys, eating, and pasta appear
once in the document.)

[0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1]

(The vector shows that the words They, like, to, cook, and burgers
appear once in the document.)
Note that the order of the words is lost in the BoW representation, and
each document is represented solely based on the frequency of the words
present in it.
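
The worked example above can be reproduced with a few lines of Python; the naive punctuation stripping and whitespace splitting below are only meant to mirror the hand tokenization used in this example.

from collections import Counter

corpus = [
    "I love to eat pizza.",
    "She enjoys eating pasta.",
    "They like to cook burgers.",
]

def tokenize(text):
    # Step 1: strip the trailing period and split on whitespace
    return text.replace(".", "").split()

tokenized = [tokenize(document) for document in corpus]

# Step 2: build the vocabulary in first-seen order, as in the example above
vocabulary = []
for tokens in tokenized:
    for token in tokens:
        if token not in vocabulary:
            vocabulary.append(token)

# Step 3: represent each document as word frequencies over the vocabulary
for tokens in tokenized:
    counts = Counter(tokens)
    print([counts.get(word, 0) for word in vocabulary])
# [1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
# [0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0]
# [0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1]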

The BoW method is a straightforward and effective way to convert text into numerical vectors for use in various machine learning algorithms and NLP tasks, such as text classification and information retrieval. However, it does not consider the context or semantics of words, which can limit its ability to capture deeper meaning in language data.
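
In practice you would rarely hand-roll this: libraries such as scikit-learn (one common choice, used here only as an example) ship ready-made BoW vectorizers. A quick sketch follows; note that CountVectorizer's defaults lowercase the text and drop single-character tokens such as “I,” so its vocabulary will differ slightly from the hand-worked example above.

from sklearn.feature_extraction.text import CountVectorizer

corpus = [
    "I love to eat pizza.",
    "She enjoys eating pasta.",
    "They like to cook burgers.",
]

vectorizer = CountVectorizer()
bow_matrix = vectorizer.fit_transform(corpus)   # sparse document-term matrix

print(vectorizer.get_feature_names_out())       # the learned vocabulary
print(bow_matrix.toarray())                     # one frequency vector per document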

Word2Vec
Word2Vec is a popular and influential word embedding method in NLP. It
was introduced by Tomas Mikolov et al. at Google in 2013 and has since
become a foundational technique in various NLP tasks. The main idea
behind Word2Vec is to represent words as points in a high-dimensional
space, where the relative positions of words capture their semantic
relationships and contextual similarities. Words that appear in similar
contexts or have similar meanings are mapped to vectors that are close to
each other in the embedding space.
There are two primary architectures for training Word2Vec models.

• Continuous bag-of-words (CBOW)

CBOW aims to predict the target word given its context (surrounding words). It uses a neural network to learn word embeddings by taking the context words as input and predicting the target word.

The context words are represented as one-hot-encoded vectors or embeddings, and they are averaged to form a single context vector.

The CBOW model tries to minimize the prediction error between the predicted target word and the actual target word.

• Skip-gram

Skip-gram, on the other hand, aims to predict the context words given a target word. It tries to learn the embeddings by maximizing the likelihood of the context words given the target word.

The target word is represented as a one-hot-encoded vector or embedding, and the model tries to predict the surrounding context words based on this representation.

Skip-gram is often preferred when the dataset is large, as it generates more training examples by considering all the context words for each target word.
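
To make the contrast concrete, the following sketch shows how the two architectures frame their training pairs from a window of text; the example sentence and the window size of 2 are arbitrary choices for illustration.

def skipgram_pairs(tokens, window=2):
    # Skip-gram: one (target, context word) pair per context word
    pairs = []
    for i, target in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((target, tokens[j]))
    return pairs

def cbow_pairs(tokens, window=2):
    # CBOW: one (context words, target) pair per target word
    pairs = []
    for i, target in enumerate(tokens):
        context = [
            tokens[j]
            for j in range(max(0, i - window), min(len(tokens), i + window + 1))
            if j != i
        ]
        pairs.append((context, target))
    return pairs

sentence = "she enjoys eating fresh pasta".split()
print(skipgram_pairs(sentence)[:4])
# [('she', 'enjoys'), ('she', 'eating'), ('enjoys', 'she'), ('enjoys', 'eating')]
print(cbow_pairs(sentence)[0])
# (['enjoys', 'eating'], 'she')

Because skip-gram emits one pair per (target, context word) combination, it produces many more training examples from the same corpus, which is one reason it is often preferred for large datasets.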

During training, Word2Vec uses a shallow neural network to learn the embeddings. The weights of the neural network are updated during the training process using stochastic gradient descent or similar optimization techniques. The objective is to learn word embeddings that effectively capture the word semantics and co-occurrence patterns in the corpus.
Once trained, the Word2Vec model provides word embeddings
that can be used as input to various NLP tasks or serve as a powerful
representation for downstream applications. The trained embeddings
can be used in tasks such as sentiment analysis, machine translation,
document classification, and information retrieval, where they capture the
meaning and relationships between words in a continuous vector space.
Word2Vec has been instrumental in advancing the performance of NLP
models by enabling them to work effectively with textual data in a more
semantically meaningful manner.

The typical process to train a Word2Vec model would involve the following steps:

1. Data preparation: Gather a large corpus of text data that will be used for training the Word2Vec model. The corpus should represent the domain or language you want to capture word embeddings for.
2. Tokenization: Tokenize the text data to break it down into individual words or subwords. Remove any unwanted characters, punctuation, and stopwords during tokenization.
3. Create context-target pairs: For each target word in the corpus, create context-target pairs. The context is a window of words surrounding the target word. The size of the window is a hyperparameter, typically set to a small value like 5 to 10 words. The context-target pairs are used to train the model to predict the context given the target word, or vice versa.
4. Convert words to indices: Convert the words in the context-target pairs into numerical indices, as Word2Vec models typically work with integer word indices rather than actual word strings.
5. Create training examples: Use the context-target pairs to create training examples for the Word2Vec model. Each training example consists of a target word (input) and its corresponding context words (output), or vice versa, depending on the architecture (CBOW or skip-gram).
6. Architecture selection: Choose the architecture you want to use for the Word2Vec model. The two main architectures are CBOW, which predicts the target word based on the context words, and skip-gram, which predicts the context words based on the target word.
7. Define the neural network: Create a shallow neural network for the chosen architecture. The network will consist of an embedding layer that represents words as dense vectors and a softmax layer (for CBOW) or negative sampling (for skip-gram) to perform the word predictions.
8. Training: Train the Word2Vec model on the training examples using stochastic gradient descent or other optimization algorithms. The objective is to minimize the prediction loss, which measures the difference between predicted and actual context or target words.
9. Learn word embeddings: As the model trains, the embedding layer learns to map each word to a dense vector representation. These word embeddings capture semantic relationships and meaning based on the co-occurrence patterns of words in the corpus.
10. Evaluation: After training, evaluate the quality of the learned word embeddings on downstream NLP tasks, such as word similarity, analogies, or text classification, to ensure they capture meaningful semantic information.

The training process may require hyperparameter tuning, and the model may need to be trained on a large corpus and for multiple epochs to learn effective word embeddings. Once trained, the Word2Vec model can be used to generate word vectors for any word in the vocabulary, enabling the exploration of semantic relationships between words in a continuous vector space.
Because of the popularity of the Word2Vec model, many machine
learning and NLP libraries have built-in implementations of it. This allows
you to easily utilize Word2Vec embeddings in your code without having to
manually train neural networks for it.
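
For example, the Gensim library (one common choice; the tiny corpus and the parameter values below are purely illustrative) can train and query a Word2Vec model in a few lines. With such a toy corpus the nearest neighbours are not meaningful; the point is only to show the shape of the API.

from gensim.models import Word2Vec

# Tokenized training sentences; a real corpus would be far larger
sentences = [
    ["she", "enjoys", "eating", "pasta"],
    ["they", "like", "to", "cook", "burgers"],
    ["i", "love", "to", "eat", "pizza"],
]

model = Word2Vec(
    sentences,
    vector_size=100,  # dimensionality of the embeddings
    window=5,         # context window size
    min_count=1,      # keep every word, even rare ones
    sg=1,             # 1 = skip-gram, 0 = CBOW
)

vector = model.wv["pizza"]                   # the learned embedding for "pizza"
neighbours = model.wv.most_similar("pizza")  # nearest words in the embedding space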

Bag-of-Words vs. Word2Vec


While both Bag-of-Words and Word2Vec are text representation methods in NLP, there are some key differences between them.

Representation

• Bag-of-Words (BoW): BoW represents a document as a histogram of word occurrences, without considering the order or structure of the words. It creates a “bag” of words, and each element in the vector represents the frequency of a specific word in the document.

• Word2Vec: Word2Vec, on the other hand, represents words as dense vectors in a continuous vector space. It captures the semantic meaning and relationships between words based on their context in the corpus. Word2Vec embeddings are learned through a shallow neural network model trained on a large dataset.

Context and semantics

• BoW: BoW does not consider the context or semantics of words in a document. It treats each word as an independent entity and focuses only on the frequency of occurrence.

• Word2Vec: Word2Vec leverages the distributional hypothesis, which suggests that words with similar meanings tend to appear in similar contexts. Word2Vec captures word embeddings that encode semantic relationships, allowing for better understanding of word meanings and similarities based on context.

Vector size

• BoW: The size of the BoW vector is equal to the size of the vocabulary in the corpus. Each word in the vocabulary is represented by a unique index, and the vector elements indicate the frequency of occurrence.

• Word2Vec: Word2Vec generates dense word embeddings, typically with a fixed size (e.g., 100, 300 dimensions). The size of the word embeddings is generally much smaller compared to the BoW vector, which can be useful for memory and computational efficiency.

Order of words

• BoW: BoW ignores the order of words in the document, as it treats each document as a collection of individual words and their frequencies. The order of words is lost in the BoW representation.

• Word2Vec: Word2Vec takes the local context window around each word into account during training. It learns word embeddings by predicting the likelihood of words appearing in the context of other words, which allows it to capture word meanings based on the surrounding words.

Application

• BoW: BoW is commonly used for text classification, sentiment analysis, and information retrieval tasks. It is a simple and effective representation for these tasks, especially when the sequence of words is not crucial.

• Word2Vec: Word2Vec is more suitable for tasks that require understanding word semantics and capturing word relationships, such as word similarity, word analogies, and language generation tasks.
