Introduction To Conformal
Prediction With Python
A Short Guide for Quantifying Uncertainty of Machine
Learning Models

Christoph Molnar
Introduction To Conformal Prediction With
Python
A Short Guide for Quantifying Uncertainty of Machine Learning Models
© 2023 Christoph Molnar, Germany, Munich
christophmolnar.com
For more information about permission to reproduce selections from this book,
write to christoph.molnar.ai@gmail.com.
2023, First Edition
Christoph Molnar
c/o MUCBOOK, Heidi Seibold
Elsenheimerstraße 48
80687 München, Germany

commit id: 7319978


Contents

1 Summary

2 Preface

3 Who This Book Is For

4 Introduction to Conformal Prediction
  4.1 We need uncertainty quantification
  4.2 Uncertainty has many sources
  4.3 Distinguish good from bad predictions
  4.4 Other approaches don't have guaranteed coverage
  4.5 Conformal prediction fills the gap

5 Getting Started with Conformal Prediction in Python
  5.1 Installing the software
  5.2 Let's classify some beans
  5.3 First try: a naive approach
  5.4 Second try: conformal classification
  5.5 Getting started with MAPIE

6 Intuition Behind Conformal Prediction
  6.1 Conformal prediction is a recipe
  6.2 Understand parallels to out-of-sample evaluation
  6.3 How to interpret prediction regions and coverage
  6.4 Conformal prediction and supervised learning

7 Classification
  7.1 Back to the beans
  7.2 The naive method doesn't work
  7.3 The Score method is simple but not adaptive
  7.4 Use Adaptive Prediction Sets (APS) for conditional coverage
  7.5 Top-k method for fixed size sets
  7.6 Regularized APS (RAPS) for small sets
  7.7 Group-balanced conformal prediction
  7.8 Class-Conditional APS (CCAPS) for coverage by class
  7.9 Guide for choosing a conformal classification method

8 Regression and Quantile Regression
  8.1 Motivation
  8.2 Rent Index Data
  8.3 Conformalized Mean Regression
  8.4 Conformalized Quantile Regression (CQR)

9 A Glimpse Beyond Classification and Regression
  9.1 Quickly categorize conformal prediction by task and score
  9.2 Time Series Forecasting
  9.3 Multi-Label Classification
  9.4 Outlier Detection
  9.5 Probability Calibration
  9.6 And many more tasks
  9.7 How to stay up to date

10 Design Your Own Conformal Predictor
  10.1 Steps to build your own conformal predictor
  10.2 Finding the right non-conformity score
  10.3 Start with a heuristic notion of uncertainty
  10.4 A general recipe for 1D uncertainty heuristics
  10.5 Metrics for evaluating conformal predictors

11 Q & A
  11.1 How do I choose the calibration size?
  11.2 How do I make conformal prediction reproducible?
  11.3 How does alpha affect the size of the prediction regions?
  11.4 What happens if I choose a large 𝛼 for conformal classification?
  11.5 How to interpret empty prediction sets?
  11.6 Can I use the same data for calibration and model evaluation?
  11.7 What if I find errors in the book or want to provide feedback?

12 Acknowledgements

References
1 Summary
A prerequisite for trust in machine learning is uncertainty quantification. Without
it, an accurate prediction and a wild guess look the same.
Yet many machine learning models come without uncertainty quantification. And
while there are many approaches to uncertainty – from Bayesian posteriors to
bootstrapping – we have no guarantees that these approaches will perform well
on new data.
At first glance conformal prediction seems like yet another contender. But conformal prediction can work in combination with any other uncertainty approach and has many advantages that make it stand out:

• Guaranteed coverage: Prediction regions generated by conformal prediction come with coverage guarantees of the true outcome
• Easy to use: Conformal prediction approaches can be implemented from scratch with just a few lines of code
• Model-agnostic: Conformal prediction works with any machine learning model
• Distribution-free: Conformal prediction makes no distributional assumptions
• No retraining required: Conformal prediction can be used without retraining the model
• Broad application: conformal prediction works for classification, regression, time series forecasting, and many other tasks

Sound good? Then this is the right book for you to learn about this versatile,
easy-to-use yet powerful tool for taming the uncertainty of your models.
This book:

• Teaches the intuition behind conformal prediction

• Demonstrates how conformal prediction works for classification and regression
• Shows how to apply conformal prediction using Python
• Enables you to quickly learn new conformal algorithms

With the knowledge in this book, you’ll be ready to quantify the uncertainty of
any model.

2 Preface
My first encounter with conformal prediction was years ago, when I read a paper on feature importance. I wasn't looking for uncertainty quantification. Nevertheless, I tried to understand conformal prediction but was quickly discouraged because I didn't immediately understand the concept. I moved on.
About 4 years later, conformal prediction kept popping up on my Twitter and
elsewhere. I tried to ignore it, mostly successfully, but at some point I became
interested in understanding what conformal prediction was. So I dug deeper and
found a method that I actually find intuitive.
My favorite way to learn is to teach, so I decided to do a deep dive in the form of an
email course. For 5 weeks, my newsletter Mindful Modeler1 became a classroom
for conformal prediction. I didn’t know how this experiment would turn out. But
it quickly became clear that many people were eager to learn about conformal
prediction. The course was a success. So I decided to build on that and turn
everything I learned about conformal prediction into a book. You hold the results
in your hand (or in your RAM).
I love turning academic knowledge into practical advice. Conformal prediction is
in a sweet spot: There's an explosion of academic interest and conformal prediction holds great promise for practical data science. The math behind conformal prediction isn't easy. That's one reason why I gave it a pass for a few years. But
it was a pleasant surprise to find that from an application perspective, conformal
prediction is simple. Solid theory, easy to use, broad applicability – conformal
prediction is ready. But it still lives mostly in the academic sphere.
With this book, I hope to strengthen the knowledge transfer from academia to
practice and bring conformal prediction to the streets.

1
https://mindfulmodeler.substack.com/

3 Who This Book Is For
This book is for data scientists, statisticians, machine learners and all other modelers who want to learn how to quantify uncertainty with conformal prediction. Even if you already use uncertainty quantification in one way or another, conformal prediction is a valuable addition to your toolbox.
Prerequisites:

• You should know the basics of machine learning


• Practical experience with modeling is helpful
• If you want to follow the code examples, you should know the basics of
Python or at least another programming language
• This includes knowing how to install Python and Python libraries
The book is not an academic introduction to the topic, but a very practical one.
So instead of lots of theory and math, there will be intuitive explanations and
hands-on examples.

4 Introduction to Conformal
Prediction
In this chapter, you’ll learn

• Why and when we need uncertainty quantification


• What conformal prediction is

4.1 We need uncertainty quantification


Machine learning models make predictions, and to fully trust them, we need to know how certain those predictions really are.
Uncertainty quantification is essential in many situations:

• When we use model predictions to make decisions


• When we want to design robust systems that can handle unexpected situations
• When we have automated a task with machine learning and need an indicator of when to intervene
• When we want to communicate the uncertainty associated with our predictions to stakeholders

The importance of quantifying uncertainty depends on the application for which machine learning is being used. Here are some use cases:

• Uncertainty quantification can improve fraud detection in insurance claims by providing context to case workers evaluating potentially fraudulent claims. This is especially important when a machine learning model used to detect fraud is uncertain in its predictions. In such cases, the case workers can use the uncertainty estimates to prioritize their review of the claim and intervene if necessary.
• Uncertainty quantification can be used to improve the user experience in a
banking app. While the classification of financial transactions into “rent,”
“groceries,” and so on can be largely automated through machine learning,
there will always be transactions that are difficult to classify. Uncertainty
quantification can identify tricky transactions and prompt the user to classify them.
• Demand forecasting using machine learning can be improved by using uncertainty quantification, which can provide additional context on the confidence in the prediction. This is especially important in situations where the demand must meet a certain threshold in order to justify production. By understanding the uncertainty of the forecast, an organization can make more informed decisions about whether to proceed with production.

Note
As a rule of thumb, you need uncertainty quantification whenever a point
prediction isn’t informative enough.

But where does this uncertainty come from?

4.2 Uncertainty has many sources


A prediction is the result of measuring and collecting data, cleaning the data, and
training a model. Uncertainty can creep into the pipeline at every step of this
long journey:

• The model is trained on a random sample of data, making the model itself
a random variable. If you were to train the model on a different sample
from the same distribution, you would get a slightly different model.
• Some models are even trained in a non-deterministic way. Think of random
weight initialization in neural networks or sampling mechanisms in random
forests. If you train a model with non-deterministic training twice on the
same data, you will get slightly different models.
• This uncertainty in model training is worse when the training dataset is
small.

• Hyperparameter tuning, model selection, and feature selection have the same problem – all of these modeling steps involve estimation based on random samples of data, which adds uncertainty to the modeling process.
• The data may not be perfectly measured. The features or the target may
contain measurement errors, such as people filling out surveys incorrectly,
copying errors, and faulty measurements.
• Data sets may have missing values.
Some examples:

• Let’s say we’re predicting house values. The floor type feature isn’t always
accurate, so our model has to work with data that contains measurement
errors. For this and other reasons, the model will not always predict the
house value correctly.
• Decision trees are known to be unstable – small changes in the data can lead to large differences in what the tree looks like. While this type of uncertainty is "invisible" when only one tree is trained, it becomes apparent when the model is retrained, since a new tree will likely have different splits.
• Image classification: Human labelers may disagree on how to classify an image. A dataset labeled by different humans will therefore contain uncertainty, and the model will never be able to perfectly predict the "correct" class, because the true class is up for debate.

4.3 Distinguish good from bad predictions


A trained machine learning model can be thought of as a function that takes the
features as input and outputs a prediction. But not all predictions are equally
hard. Some predictions will be spot on but others will be like wild guesses by
the model, and if the model doesn’t output some kind of confidence or certainty
score, we have a problem: We can’t distinguish good predictions from wild guesses.
Both are just spit out by the model.
Imagine an image classifier that decides whether a picture shows a cat, a dog
or some other animal. Digging a bit into the data, we find that there are some
images where the pets are dressed in costumes, see Figure 4.1b.
For classification, we at least have an idea of how uncertain the classification was. Look at these two distributions of model probability scores in Figure 4.2:

(a) Clearly A Dog (b) Don't let these dogs bamboozle you. They want you to believe that they are ghosts. They are not!

Figure 4.1: Not all images are equally difficult to classify.
One classification is quite clear, because the probability is so high. In the other
case, it was a close call for the “cat” category, so we would assume that this
classification was less certain.

(a) Easy Dogo (b) Difficult

Figure 4.2: Classification scores by class.

At first glance, aren’t we done when the model outputs probabilities and we use
them to get an idea of uncertainty? Unfortunately, no. Let’s explore why.

4.4 Other approaches don't have guaranteed coverage
For classification we get the class probabilities, Bayesian models produce predictive posterior distributions, and random forests can show the variance across trees. In theory, we could just rely on such approaches to uncertainty. If we do that, why would we need conformal prediction?

The main problem is that these approaches don’t come with any reasonable1
guarantee that they cover the true outcome (Niculescu-Mizil and Caruana 2005;
Lambrou et al. 2012; Johansson and Gabrielsson 2019; Dewolf et al. 2022).

• Class probabilities: We should not interpret these scores as actual probabilities – they just look like probabilities, but are usually not calibrated. Probability scores are calibrated if, for example, among all classifications with a score of 90%, we find the true class 9 times out of 10.
• Bayesian posterior predictive intervals: While these intervals express our belief about where the correct outcome is likely to be, the interval is based on distributional assumptions for the prior and the distribution family chosen for the data. But unfortunately, reality is often more complex than the simplified distribution assumptions that we make.
• Bootstrapping: Refitting the model with sampled data can give us an idea
of the uncertainty of a prediction. However, bootstrapping is known to
underestimate the true variance, meaning that 90% prediction intervals are
likely to cover the true value less than 90% of the time (Hesterberg 2015).
Bootstrapped intervals are usually too narrow, especially for small samples.

Naive Approach

The naive approach is to take at face value the uncertainty scores that the model spits out – confidence intervals, variance, Bayesian posteriors, multi-class probabilities. The problem: you can't expect these outcomes to be well calibrated.
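To make "calibrated" concrete, here is a minimal sketch of a check you could run yourself. It is my own illustration (not from the bean example that follows); it assumes a fitted scikit-learn-style classifier called model and held-out data X_holdout, y_holdout:

import numpy as np

def calibration_check(model, X_holdout, y_holdout, bins=10):
    # Compare the model's top-class probability with the observed accuracy per bin.
    # For a calibrated model, the two numbers should roughly match.
    proba = model.predict_proba(X_holdout)
    top_prob = proba.max(axis=1)
    correct = proba.argmax(axis=1) == np.asarray(y_holdout)
    edges = np.linspace(0, 1, bins + 1)
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (top_prob >= lo) & (top_prob < hi)
        if mask.sum() > 0:
            print(f"score in [{lo:.1f}, {hi:.1f}): "
                  f"mean score {top_prob[mask].mean():.2f}, "
                  f"observed accuracy {correct[mask].mean():.2f}")

If the observed accuracy is systematically lower than the mean score, the model is overconfident – which is exactly the situation conformal prediction is designed to handle.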

4.5 Conformal prediction fills the gap


Conformal prediction is a set of methods that takes an uncertainty score and
turns it into a rigorous score. “Rigorous” means that the output has probabilistic
guarantees that it covers the true outcome.

1
Some methods, such as Bayesian posteriors, actually do have guarantees that they cover the
true values. However, this depends on modeling assumptions, such as the priors and data
distributions. Such distributional assumptions are an oversimplification for practically all
real applications and are likely to be violated. Therefore, you can’t count on coverage
guarantees that are based on strong assumptions.

Conformal prediction changes what a prediction looks like: it turns point predictions into prediction regions.2 For multi-class classification, it turns the class output into a set of classes.

Conformal prediction has many advantages that make it a valuable tool to wield:
• Distribution-free: No assumptions about the distribution of the data,
unlike for Bayesian approaches where you have to specify the priors and
data distribution
• Model-agnostic: Conformal prediction can be applied to any predictive
model
• Coverage guarantee: The resulting prediction sets come with guarantees
of covering the true outcome with a certain probability
2
There’s a difference between confidence intervals (or Bayesian posteriors for that matter) and
prediction intervals. The latter quantify the uncertainty of a prediction and therefore can
be applied to any predictive model. The former only makes sense for parametric models like
logistic regression and describes the uncertainty of the model parameters.

Warning

Conformal prediction has one important assumption: exchangeability. If the data used for calibration is very different from the data for which you want to quantify the predictive uncertainty, the coverage guarantee goes down the drain. In conformal time series forecasting, for example, exchangeability is relaxed, but other assumptions are needed.

Before we delve into theory and intuition, let's see conformal prediction in action.

5 Getting Started with Conformal
Prediction in Python
In this chapter, you’ll learn:

• That naively trusting class probabilities is bad


• How to use conformal prediction in Python with the MAPIE library
• How to implement a simple conformal prediction algorithm yourself

5.1 Installing the software


To run the examples in this book on your machine, you need Python and some
libraries installed. These are the libraries that I used, along with their version:

• Python (3.10.7)
• scikit-learn (1.2.0)
• MAPIE1 (0.6.1)
• pandas (1.5.2)
• matplotlib (3.6.2)
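If you want to reproduce the examples, one way to install these libraries is via pip. The exact command below is my suggestion, not from the book; the data-download example later additionally uses the wget package (and pandas needs openpyxl to read the Excel file):

pip install scikit-learn==1.2.0 mapie==0.6.1 pandas==1.5.2 matplotlib==3.6.2 wget openpyxl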

Before we dive into any kind of theory, let's just get a feel for conformal prediction with a code example.

1
https://mapie.readthedocs.io/en/latest/index.html

5.2 Let’s classify some beans
A (fictional) bean company uses machine learning to automatically classify dry
beans2 into 1 of 7 different varieties: Barbunya, Bombay, Cali, Dermason, Horoz,
Seker, and Sira.
The bean dataset contains 13,611 beans (Koklu and Ozkan 2020). Each row is a
dry bean with 8 measurements such as length, roundness, and solidity, in addition
to the variety which is the prediction target.
The different varieties have different characteristics, so it makes sense to classify
them and sell the beans by variety. Planting the right variety is important for
reasons of yield and disease protection. Automating this classification task with
machine learning frees up a lot of time that would otherwise be spent doing it
manually.
Here is how to download the data:

import os
import wget
import zipfile
from os.path import exists

# Download the data if it is not available locally
bean_data_file = "./DryBeanDataset/Dry_Bean_Dataset.xlsx"
base = "https://archive.ics.uci.edu/ml/machine-learning-databases/"
dataset_number = "00602"
if not exists(bean_data_file):
    filename = "DryBeanDataset.zip"
    url = base + dataset_number + "/" + filename
    wget.download(url)
    with zipfile.ZipFile(filename, 'r') as zip_ref:
        zip_ref.extractall('./')
    os.remove(filename)

2
Dry beans are not to be confused with dried beans. Well, you buy dry beans dried, but not
all dried beans are dry beans. Get it? Dry beans are a type of bean (small and white) eaten
in Turkey, for example.

The model was trained in ancient times by some legendary dude who left the
company a long time ago. It’s a Naive Bayes model. And it sucks. This is his
code:

import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.naive_bayes import GaussianNB

# Read in the data from Excel file


bean_data_file = "./DryBeanDataset/Dry_Bean_Dataset.xlsx"
beans = pd.read_excel(bean_data_file)
# Labels are characters but should be integers for sklearn
le = LabelEncoder()
beans["Class"] = le.fit_transform(beans["Class"])
# Split data into classification target and features
y = beans["Class"]
X = beans.drop("Class", axis = 1)

# Split of training data


X_train, X_rest1, y_train, y_rest1 = train_test_split(
X, y, train_size=10000, random_state=2
)

# From the remaining data, split of test data


X_test, X_rest2, y_test, y_rest2 = train_test_split(
X_rest1, y_rest1, train_size=1000, random_state=42
)

# Split remaining into calibration and "new" data


X_calib, X_new, y_calib, y_new = train_test_split(
X_rest2, y_rest2, train_size=1000, random_state=42
)

# Fit the model


model = GaussianNB().fit(X_train, y_train)

Instead of splitting the data only into training and testing, we split the 13,611 beans into:

• 10,000 data samples (X_train, y_train) for training the model
• 1,000 data samples (X_test, y_test) for evaluating model performance
• 1,000 data samples (X_calib, y_calib) for calibration (more on that later)
• The remaining 1,611 data samples (X_new, y_new) for the conformal prediction step and for evaluating the conformal predictor (more on that later)

The dude didn't even bother to tune hyperparameters or do model selection. Yikes. Well, let's have a look at the predictive performance:

from sklearn.metrics import confusion_matrix


# Check accuracy
y_pred = model.predict(X_test)
print("Accuracy:", (y_pred == y_test).mean())
# Create the confusion matrix
cm = confusion_matrix(y_test, y_pred)
print(pd.DataFrame(cm, index=le.classes_, columns=le.classes_))

Accuracy: 0.758
BARBUNYA BOMBAY CALI DERMASON HOROZ SEKER SIRA
BARBUNYA 46 0 47 0 6 0 4
BOMBAY 0 33 0 0 0 0 0
CALI 20 0 81 0 3 0 0
DERMASON 0 0 0 223 0 32 9
HOROZ 0 0 4 3 104 0 22
SEKER 2 0 0 26 1 127 22
SIRA 0 0 0 10 10 21 144

75.80% of the beans in the test data are classified correctly. How to read this
confusion matrix: rows indicate the true classes and columns the predicted classes.
For example, 47 BARBUNYA beans were falsely classified as CALI.
The classes seem to have different classification difficulties, for example Bombay
is always classified correctly in the test data, but Barbunya only half of the time.

Overall, it's not the best model.3
Unfortunately, the model can't be easily replaced because it's hopelessly intertwined with the rest of the bean company's backend. And nobody wants to be the one to pull the wrong piece out of this Jenga tower of a backend.
The dry bean company is in trouble. Several customers have complained that
they bought bags of one variety of beans but there were too many beans of other
varieties mixed in.
The bean company holds an emergency meeting and it’s decided that they will
offer premium products with a guaranteed percentage of the advertised bean
variety. For example, a bag labeled “Seker” should contain at least 95% Seker
beans.

5.3 First try: a naive approach


Great, now all the pressure is on the data scientist to provide such guarantees all
based on this bad model. Her first approach is the “naive approach” to uncertainty
which means taking the probability outputs and believing in them. So instead of
just using the class, she takes the predicted probability score, and if that score is
above 95%, the bean makes it into the 95% bag.
It’s not yet clear what to do with beans that don’t make the cut for any of the
classes, but stew seems to be the most popular option among the employees. The
data scientist doesn’t fully trust the model scores, so she checks the coverage of
the naive approach. Fortunately, she has access to new, labeled data that she
can use to estimate how well her approach is working.
She obtains the probability predictions for the new data, keeps only beans with
>=0.95 predicted probability, and checks how often the ground truth is actually
in that 95% bag.

3
Other models, like random forest, are more likely to be calibrated for this dataset. But I
found that out later, when I was already pretty invested in the dataset. And I liked the data,
so we’ll stick with this example. And it’s not that uncommon to get stuck with suboptimal
solutions in complex systems, like legacy code, etc.

# Get the "probabilities" from the model
predictions = model.predict_proba(X_calib)
# Get for each instance the highest probability
high_prob_predictions = np.amax(predictions, axis=1)
# Select the predictions where the probability is at least 95%
high_p_beans = np.where(high_prob_predictions >= 0.95)
# Let's count how often we hit the right label
its_a_match = (model.predict(X_calib) == y_calib)
coverage = np.mean(its_a_match.values[high_p_beans])
print(round(coverage, 3))

0.896

Ideally, 95% or more of the beans should have the predicted class, but she finds
that the 95%-bag only contains 89.6% of the correct variety.
Now what?
She could use methods such as Platt scaling or isotonic regression to calibrate
these probabilities, but again, with no guarantee of correct coverage for new
data.
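As a side note, here is a hedged sketch of what such a recalibration could look like with scikit-learn. It is my own illustration, not part of the bean pipeline, and – as just noted – it still comes without a coverage guarantee for new data:

from sklearn.calibration import CalibratedClassifierCV

# Recalibrate the already trained model on the held-out calibration data.
# method="isotonic" is isotonic regression; method="sigmoid" would be Platt scaling.
calibrated = CalibratedClassifierCV(model, method="isotonic", cv="prefit")
calibrated.fit(X_calib, y_calib)
recalibrated_probs = calibrated.predict_proba(X_new)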
But she has an idea.

5.4 Second try: conformal classification


The data scientist decides to think about the problem in a different way: she
doesn’t start with the probability scores, but with how she can get a 95% coverage
guarantee.
Can she produce a set of predictions for each bean that covers the true class with
95% probability? It seems to be a matter of finding the right threshold.
So she does the following:
She ignores that the output could be a probability. Instead, she uses the model
“probabilities” to construct a measure of uncertainty:

𝑠𝑖 = 1 − 𝑓(𝑥𝑖 )[𝑦𝑖 ]
A slightly sloppy notation for saying that we take 1 minus the model score for
the true class. For example, if the ground truth for bean number 8 is “Seker”
and the probability score for Seker is 0.9, then 𝑠8 = 0.1. In conformal prediction
language, this 𝑠𝑖 -score is called non-conformity score.

Non-conformity score

The non-conformity score 𝑠𝑖 for a new data point measures how unusual a suggested outcome 𝑦 seems given the model output for 𝑥𝑖. To decide which of the possible 𝑦's are "conformal" (and together form the prediction region), conformal prediction calculates a threshold. This threshold is based on the non-conformity scores of the calibration data in combination with their true labels.

Then she does the following to find the threshold:

1. Start with data not used for model training
2. Calculate the scores 𝑠𝑖
3. Sort the scores from low (certain) to high (uncertain)
4. Compute the threshold 𝑞̂ where 95% of the 𝑠𝑖's are smaller (= the 95% quantile)

The threshold is therefore chosen to cover 95% of the true bean classes.
In Python, this procedure can be done in just a few lines of code:

# Size of calibration data


n = len(X_calib)
# Get the probability predictions
predictions = model.predict_proba(X_calib)
# We only need the probability for the true class
prob_true_class = predictions[np.arange(n),y_calib]
# Turn into uncertainty score (larger means more uncertain)
scores = 1 - prob_true_class

Next, she has to find the cut-off.

# Setting the alpha so that we get 95% prediction sets
alpha = 0.05
# define quantile
q_level = np.ceil((n+1)*(1-alpha))/n
qhat = np.quantile(scores, q_level, method='higher')

The quantile level (based on 𝛼) requires a finite sample correction to calculate the corresponding quantile 𝑞̂. In this case, the 0.95 was multiplied with (n+1)/n, which means that the quantile level is 0.951 for n = 1000.
If we visualize the scores, we can see that it’s a matter of cutting off at the right
position:

import matplotlib.pyplot as plt

# Get the "probabilities" from the model


predictions = model.predict_proba(X_calib)
# Get for each instance the actual probability of ground truth
prob_for_true_class = predictions[np.arange(len(y_calib)),y_calib]
# Create a histogram
plt.hist(1 - prob_for_true_class, bins=30, range=(0, 1))
# Add a title and labels
plt.xlabel("1 - s(y,x)")
plt.ylabel("Frequency")
plt.show()

Figure: Histogram of the scores from the code above (x-axis: 1 - s(y,x), y-axis: Frequency).

How does the threshold come into play? For the figure above, we would cut off everything above 𝑞̂ = 0.99906. For bean scores 𝑠𝑖 below 0.99906 (equivalent to class "probabilities" > 0.001), we can be confident that the right class is included 95% of the time.
But there’s a catch: For some data points, there will be more than one class that
makes the cut. But prediction sets are not a bug, they are a feature of conformal
prediction.

Prediction Set
A prediction set – for multi-class tasks – is a set of one or more classes.
Conformal classification gives you a set for each instance.

To generate the prediction sets for a new data point, the data scientist has to combine all classes whose score is below the threshold 𝑞̂ into a set.

prediction_sets = (1 - model.predict_proba(X_new) <= qhat)

Let’s look at the prediction sets for 3 “new” beans (X_new):

for i in range(3):
print(le.classes_[prediction_sets[i]])

['DERMASON']
['DERMASON']
['DERMASON' 'SEKER']

On average, the prediction sets cover the true class with a probability of 95%.
That’s the guarantee we get from the conformal procedure.
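We can check this empirically on the new beans. The following check is my own addition; it reuses prediction_sets, y_new, and numpy from above:

# How often does the prediction set contain the true class?
covered = prediction_sets[np.arange(len(y_new)), y_new]
print("Empirical coverage:", round(covered.mean(), 3))

# How large are the prediction sets on average?
print("Average set size:", round(prediction_sets.sum(axis=1).mean(), 2))

The empirical coverage should land close to the desired 95%, with some fluctuation because we only have 1,611 new beans.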
How could the bean company work with such prediction sets? The first set has
only 1 bean variety “DERMASON”, so it would go into a DERMASON bag.
Bean #3 has a prediction set with two varieties. Maybe a chance to offer bean
products with guaranteed coverage, but containing two varieties? Anything with
more categories could be sorted manually, or the CEO could finally make bean
stew for everyone.
The CEO is now more relaxed and confident in the product.
Spoiler alert: the coverage guarantees don't work the way the bean CEO thinks they do, as we will soon learn (what they actually need is a class-wise coverage guarantee, which we will cover in the classification chapter).
And that’s it. You have just seen conformal prediction in action. To be exact,
this was the score method that you will encounter again in the classification
chapter.

5.5 Getting started with MAPIE


The data scientist could also have used MAPIE4, a Python library for conformal prediction.

4
https://mapie.readthedocs.io/en/latest/index.html

from mapie.classification import MapieClassifier

cp = MapieClassifier(estimator=model, cv="prefit", method="score")


cp.fit(X_calib, y_calib)

y_pred, y_set = cp.predict(X_new, alpha=0.05)


y_set = np.squeeze(y_set)

We're no longer working with the Naive Bayes model object, but our model is now a MapieClassifier object. If you are familiar with the sklearn library, it will feel natural to work with objects in MAPIE. These MAPIE objects have a .fit() function and a .predict() function, just like sklearn models do. MapieClassifier can be thought of as a wrapper around our original model.

Figure 5.1: Conformal prediction wraps the model

And when we use the “predict” method of this conformal classifier, we get both the
usual prediction (“y_pred”) and the sets from the conformal prediction (“y_set”).
It's possible to specify more than one value for 𝛼. But in the code above only one value was specified, so the resulting y_set is an array of shape (1611, 7, 1), which means 1,611 data points, 7 classes, and 1 𝛼. The np.squeeze function removes the last dimension.
Let's have a look at some of the resulting prediction sets. Since y_set only contains "True" and "False" at the corresponding class indices, we have to use the class labels to get readable results. Here are the first 5 prediction sets for the beans:

for i in range(5):
print(le.classes_[y_set[i]])

['DERMASON']
['DERMASON']
['DERMASON' 'SEKER']
['DERMASON']
['DERMASON' 'SEKER']

These prediction sets are of size 1 or 2. Let’s have a look at all the other beans
in X_new:

# first count number of classes per bean


set_sizes = y_set.sum(axis=1)
# use pandas to compute how often each size occurs
print(pd.Series(set_sizes).value_counts())

2 871
1 506
3 233
4 1
dtype: int64

Most sets have size 1 or 2, many fewer have 3 varieties, only one set has 4 varieties
of beans.
This looks different if we make 𝛼 small, saying that we want a high probability
that the true class is in there.

y_pred, y_set = cp.predict(X_new, alpha=0.01)


# remove the 1-dim dimension
y_set = np.squeeze(y_set)
for i in range(4):
print(le.classes_[y_set[i]])

['DERMASON']
['DERMASON' 'SEKER']
['DERMASON' 'SEKER']
['DERMASON']

And again we look at the distribution of set sizes:

set_sizes = y_set.sum(axis=1)
print(pd.Series(set_sizes).value_counts())

3 780
2 372
4 236
1 222
5 1
dtype: int64

As expected, we get larger sets with a lower value for 𝛼. This is because the lower the 𝛼, the more often the sets have to cover the true class. So we can already see that there is a trade-off between set size and coverage. We pay for higher coverage with larger set sizes. That's why 100% coverage (𝛼 = 0) would produce a stupid solution: it would just include all bean varieties in every set for every bean.
If we want to see the results under different 𝛼’s, we can pass an array to MAPIE.
MAPIE will then automatically calculate the sets for all the different 𝛼 confidence
levels. We just have to make sure that we use the third dimension to pick the
right value:

y_pred, y_set = cp.predict(X_new, alpha=[0.1, 0.05])


# get prediction sets for 10th observation and second alpha (0.05)
print(le.classes_[y_set[10,:,1]])

['HOROZ' 'SIRA']
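To make the trade-off between coverage and set size more tangible, here is a small sketch (my own addition) that loops over several 𝛼 values and computes the empirical coverage and the average set size on the new beans:

alphas = [0.2, 0.1, 0.05, 0.01]
y_pred, y_sets = cp.predict(X_new, alpha=alphas)

for j, a in enumerate(alphas):
    sets_j = y_sets[:, :, j]
    coverage = sets_j[np.arange(len(y_new)), y_new].mean()
    avg_size = sets_j.sum(axis=1).mean()
    print(f"alpha={a}: coverage={coverage:.3f}, average set size={avg_size:.2f}")

Lower 𝛼 buys you higher coverage, but you pay with larger sets.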

We can also create a pandas DataFrame to hold our results, which will print
nicely:

y_pred, y_set = cp.predict(X_new, alpha=0.05)


y_set = np.squeeze(y_set)
df = pd.DataFrame()

for i in range(len(y_pred)):
predset = le.classes_[y_set[i]]
# Create a new dataframe with the calculated values
temp_df = pd.DataFrame({
"set": [predset],
"setsize": [len(predset)]
}, index=[i])
# Concatenate the new dataframe with the existing one
df = pd.concat([df, temp_df])

print(df.head())

set setsize
0 [DERMASON] 1
1 [DERMASON] 1
2 [DERMASON, SEKER] 2
3 [DERMASON] 1
4 [DERMASON, SEKER] 2

Working with conformal prediction and MAPIE is a great experience. But are
the results really what the bean company was looking for? We’ll learn in the
Classification chapter why the bean CEO may have been celebrating too soon. A
hint: the coverage guarantee of the conformal predictor only holds on average –
not necessarily per class.

Coverage

The percentage of prediction sets that contain the true label
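Here is a quick sketch (my own addition) of what this means for the beans: it reuses y_set (for 𝛼 = 0.05) and y_new from above and computes the coverage separately for each variety:

# Coverage per true class: does the set contain the true variety?
covered = y_set[np.arange(len(y_new)), y_new]
coverage_by_class = pd.Series(covered).groupby(np.asarray(y_new)).mean()
coverage_by_class.index = le.classes_
print(coverage_by_class)

While the average over all beans is close to 95%, individual varieties can be covered noticeably less (or more) often.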

The next chapter is about the intuition behind conformal prediction.

6 Intuition Behind Conformal
Prediction
In this chapter, you will learn

• How conformal prediction works on an intuitive level


• The general “recipe” for conformal prediction
• Parallels to model evaluation

Let’s say you have an image classifier that outputs probabilities, but you want
prediction sets with guaranteed coverage of the true class.
First, we sort the predictions of the calibration dataset from certain to uncertain.
The calibration dataset must be separate from the training dataset. For the
image classifier, we could use 𝑠𝑖 = 1 − 𝑓(𝑥𝑖 )[𝑦𝑖 ] as the so-called non-conformity
score, where 𝑓(𝑥𝑖 )[𝑦𝑖 ] is the model’s probability output for the true class. This
procedure places all images somewhere on a scale of how certain the classification
is, as shown in the following figure.

Don't use training data for calibration

Models have a tendency to overfit the training examples, which in turn biases
their non-conformity scores. If we were to calibrate using the training data,
it’s likely that the threshold would be too small and therefore the coverage
would be too low (less than 1 − 𝛼). The guaranteed coverage only works by
calibrating with data that wasn’t used to train the model.
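To see why, here is a quick sketch (my own addition, reusing the bean model and data splits from the previous chapter) that computes the threshold once from training scores and once from calibration scores; with a model that overfits the training data, the training-based threshold tends to come out too small:

import numpy as np

def conformal_threshold(scores, alpha=0.05):
    # 1 - alpha quantile with the finite sample correction
    n = len(scores)
    q_level = np.ceil((n + 1) * (1 - alpha)) / n
    return np.quantile(scores, q_level, method="higher")

s_train = 1 - model.predict_proba(X_train)[np.arange(len(y_train)), y_train]
s_calib = 1 - model.predict_proba(X_calib)[np.arange(len(y_calib)), y_calib]

print("Threshold from training data:   ", conformal_threshold(s_train))
print("Threshold from calibration data:", conformal_threshold(s_calib))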

The dog on the left has a model output of 0.95 and therefore gets s = 0.05, but
the dogs on the right in their spooky costumes bamboozle the neural network.
This spooky image gets a score of only 0.15 for the class dog, which translates
into a score of s = 0.85.

Figure 6.1: Images from calibration data sorted from certain to uncertain

We rely on this ordering of the images to divide the images into certain (or
conformal) and uncertain. The size of each fraction depends on the confidence
level 𝛼 that the user chooses.
If 𝛼 = 0.1, then we want to have 90% of the calibration images in the "certain" section.
Finding the threshold is easy because it means calculating the quantile 𝑞̂: the score value where 90% (= 1 − 𝛼) of the images are below and 10% (= 𝛼) are above:
In this example, the scary dogs fall into the uncertain region.
Another assumption that conformal prediction requires is exchangeability.

Exchangeability

For the coverage guarantee to hold, the calibration data must be "exchangeable" with the new data we expect. For example, if they are randomly drawn from the same distribution, they are exchangeable. If they come from different distributions, they may not be exchangeable.

Time series data, for example, are not exchangeable, since the temporal order
matters. We will see how conformal prediction can still be adapted for such cases.

Figure 6.2: The threshold divides images along the uncertainty scale into certain
and uncertain.

Exchangeability is a bit less strict than independent and identically distributed (i.i.d.) data, a typical assumption for many statistical procedures.
A point that I found confusing at first: We picked the threshold without looking at wrong classifications. There will be scores for wrong classes that also fall into the "certain" region, but we seemingly ignore them when picking the threshold 𝑞̂.
Within the prediction sets, conformal classification foremost controls the coverage
of positive labels: there’s a guarantee that, on average, 1 − 𝛼 of the sets contain
the true class.
So is it really true that negative examples don’t matter? Because this would
mean that we don’t care how many wrong classes are in the prediction sets. If
we didn’t care about false positives at all, we could always include all the classes
in the prediction sets and guarantee coverage of 100%! A meaningless solution,
of course.
So one part of conformal prediction is about controlling the coverage of positive
labels and the other part is minimizing the number of negative labels, meaning not having too many "wrong" labels in the prediction sets. CP researchers therefore
always look at the average size of prediction sets. Given that two CP algorithms
provide the same guaranteed coverage, the preferred algorithm is the one that
produces smaller prediction sets. In addition, some CP algorithms guarantee
upper bounds on the coverage probability, which also keeps the sets small.
Let’s move on to the conformal prediction step.
For a new image, we check all possible classes: compute the non-conformity score
for each class and keep the classes where the score falls below the threshold 𝑞̂.
All scores below the threshold are conformal with scores that we observed in the
calibration set and are seen as certain enough (based on 𝛼).

Figure 6.3: Prediction step in conformal prediction for classification.

In this example, the image has the prediction set {cat, lion} because both classes
are “conformal” and made the cut. All other class labels are too uncertain and
therefore excluded.

Now perhaps it is clearer what happens to the “wrong classes”: If the model
is worth its money, the probabilities for the wrong classes will be rather low.
Therefore the non-conformity score will probably be above the threshold and the
corresponding classes will not be included in the prediction set.

6.1 Conformal prediction is a recipe


Conformal prediction for classification is different from CP for regression. For example, we use different non-conformity scores, and conformal classification produces prediction sets while conformal regression produces prediction intervals.
Even among classification, there are many different CP algorithms. However, all
conformal prediction algorithms follow roughly the same recipe. That’s great as
it makes it easier to learn new CP algorithms.
Conformal prediction has 3 steps: training, calibration, and prediction.

Training is what you would expect:

1. Split data into training and calibration
2. Train model on training data

Calibration is where the magic happens:

1. Compute uncertainty scores (aka non-conformity scores) for the calibration data
2. Sort the scores from certain to uncertain
3. Decide on a confidence level 𝛼 (𝛼 = 0.1 means 90% coverage)
4. Find the quantile 𝑞̂ where 1 − 𝛼 (multiplied with a finite sample correction) of the non-conformity scores are smaller

Prediction is how you use the calibrated scores:

1. Compute the non-conformity scores for the new data
2. Pick all y's that produce a score below 𝑞̂
3. These y's form your prediction set or interval

In the case of classification, the y’s are classes and for regression, the y’s are all
possible values that could be predicted.
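Putting the recipe into code, here is a minimal, generic sketch of split conformal classification with the score method from the previous chapter (my own summary; it assumes a fitted scikit-learn-style classifier model, calibration data X_calib, y_calib, and new data X_new):

import numpy as np

def split_conformal_sets(model, X_calib, y_calib, X_new, alpha=0.1):
    # Calibration: non-conformity score = 1 - probability of the true class
    n = len(y_calib)
    scores = 1 - model.predict_proba(X_calib)[np.arange(n), np.asarray(y_calib)]
    # Quantile with finite sample correction
    q_level = np.ceil((n + 1) * (1 - alpha)) / n
    qhat = np.quantile(scores, q_level, method="higher")
    # Prediction: keep every class whose score is below the threshold
    return 1 - model.predict_proba(X_new) <= qhat  # boolean array (n_new, n_classes)

Other conformal classification methods keep this skeleton and mainly swap out the non-conformity score.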
A big differentiator between conformal prediction algorithms is the choice of the
non-conformity score. In addition, they can differ in the details of the recipe and
slightly deviate from it as well. In a way, the recipe isn’t fully accurate, or rather
it’s about a specific version of conformal prediction that is called split conformal
prediction. Splitting the data only once into training and calibration is not the
best use of data. If you are familiar with evaluation in machine learning, you
won’t be surprised about the following extensions.

6.2 Understand parallels to out-of-sample evaluation
So far we have learned about conformal prediction using a single split into training
and calibration. But you can also do the split repeatedly using:

• k-fold cross-splitting (like in cross-validation)


• bootstrapping
• leave-one-out (also called jackknife)

Do these sound familiar to you? If you are familiar with evaluating and tuning machine learning algorithms, then you already know these resampling strategies.

Inductive Conformal Prediction

The version of conformal prediction that relies on splitting data into training and calibration is called inductive or split conformal prediction. The alternative is transductive or full conformal prediction (see next box).

For evaluating or tuning machine learning models, you also have to work with
data that was not used for model training. So it makes sense that we encounter
the same options for conformal prediction where we also have to find a balance
between training the model with as much data as possible, but also having access
to “fresh” data for calibration.

Figure 6.4: Different strategies for splitting data into training and calibration
sets.

For cross-conformal prediction, you split the data, for example, into 10 pieces. You take the first 9 pieces together to train the model and compute the non-conformity scores for the remaining 1/10th. You repeat this step 9 times so that each piece is once in the calibration set. You end up with non-conformity scores for the entire dataset and can continue with computing the quantile for conformal prediction as in the single split scenario.
If you take cross-conformal prediction to the extreme, you end up with the leave-one-out (LOO) method, also called jackknife, where you train a total of n models, each with n-1 data points (𝑛 is the number of data points in both training and calibration).
All three options are inductive approaches to conformal prediction. Another
approach is transductive or full conformal prediction.

Transductive Conformal Prediction

Transductive CP (also called full CP) uses the entire dataset, including the new data point, for creating prediction regions. Transductive CP doesn't split the data and instead refits the model multiple times to produce a prediction region: To get the prediction set for a new data point, the model has to be retrained for every possible value of 𝑦𝑛𝑒𝑤. Transductive CP isn't covered in this book.

Which approach should you pick?

• Single split: Computation-wise the cheapest. Results in a higher variance of the prediction sets and is a non-optimal use of data. Ignores variance from model refits. Preferable if refitting the model is expensive.
• Leave-one-out (LOO): Most expensive, since you have to train n models. The LOO approach potentially produces smaller prediction sets/intervals as models are usually more stable when trained with more data points. Preferable if model refit is fast and/or the dataset is small.
• CV and other resampling methods: trade-off between single split and LOO.

In the MAPIE Python library, switching between resampling techniques is as simple as changing a parameter. The following code creates a conformal regression object with the split strategy.

from mapie.regression import MapieRegressor

cp = MapieRegressor(model, cv="prefit")

You can change conformal regression to cross-splitting by changing the cv option:

cp = MapieRegressor(model, cv=10)

Warning

If you don’t specify the cv option at all, MAPIE will use 5-fold cross-splitting
– even if you have already trained your model.

Entering the calibration step is the same for all “cv” options – with cross-splitting
or LOO it just takes longer because the model is trained multiple times.

cp.fit(x_calib, y_calib)

6.3 How to interpret prediction regions and
coverage
The interpretation of prediction regions in conformal prediction depends on the
task. For classification we get prediction sets, while for regression we get pre-
diction intervals. The coverage guarantee, which specifies the probability that
the true outcome is covered by the prediction region, is the central aspect of
conformal prediction.
For example, if the desired coverage is 90% (𝛼 = 0.1), we would expect 90% of
the prediction regions to cover the true outcome. This doesn’t mean that each
individual prediction region has a 90% probability of containing the true outcome.
If you have 10 prediction regions, you can expect 9 out of the 10 to cover the true
class. But you might get 10 out of 10, 8 out of 10 next time, and so on. Just like rolling a die: you can expect to roll a five once in 6 rolls, but only on average. The guarantee is only "marginal", meaning on average for samples from
the distribution of the calibration/new data (remember that the assumption is
exchangeability).
The prediction regions can be considered as random variables that have a frequentist interpretation. That is, they follow a frequentist interpretation of probability, similar to confidence intervals of coefficients in linear regression models. Because
the true value is considered fixed, but the prediction region is the random variable,
it would be wrong to say that the true value “falls” into the interval. Because
the true value is fixed but unknown. Also, if we have only one interval, we can’t
make probabilistic statements because it is a realization of the prediction region
random variable. And either it covers the true value or it doesn’t, not subject
to probability. Very nitpicky, I know, but that’s the way it is. Instead, we can
only talk about the average behavior of these prediction regions in the long run,
e.g. how the region “variable” behaves in repeated observations.
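A tiny simulation (my own illustration) of what this marginal guarantee means in practice: if each prediction region covers the truth with probability 0.9, the number of hits among 10 regions fluctuates around 9:

import numpy as np

rng = np.random.default_rng(seed=1)
# 5 batches of 10 prediction regions, each covering the truth with probability 0.9
hits = rng.random((5, 10)) < 0.9
print(hits.sum(axis=1))  # number of covered regions per batch; 9 only on average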

6.4 Conformal prediction and supervised learning


Conformal prediction as a method makes a lot of sense coming from a supervised learning mindset. In supervised machine learning, there's a strong focus on
evaluation. The evaluation has to happen out-of-sample, meaning a separation between training data and evaluation data. It also requires that we have the
ground truth for the evaluation data. That's very similar to conformal prediction, which requires a calibration dataset where we also have the ground truth available.
I like to think of CP as partially being a supervised learning mindset applied to uncertainty quantification. This is, of course, a simplification since the true motivation behind requiring separate calibration data requires math and statistical theory. But the parallels to model evaluation in supervised learning are there, and understanding the supervised learning mindset helps to understand conformal prediction.

Shameless Self-Promotion
If you want to learn more about the different modeling mindsets – from
frequentist inference to reinforcement learning – I got you covered with my
book Modeling Mindsets: The Many Cultures Of Learning From Data.a

a
https://christophmolnar.com/books/modeling-mindsets/

7 Classification
In this chapter, we will take a closer look at classification. You’ll learn
• How to apply conformal prediction to classification models
• The difference between marginal and conditional coverage
• Specifically, you will learn about the following approaches:
– Naive method
– Score method (Sadinle et al. 2019)
– Adaptive Prediction Sets (APS) (Angelopoulos et al. 2020; Romano
et al. 2020)
– Top K (Angelopoulos et al. 2020)
– Regularized APS (RAPS) (Angelopoulos et al. 2020)
– Group Balanced Conformal Classification (Angelopoulos and Bates
2021)
– Class-conditional Conformal Classification (Derhacobian et al.)
All of these methods produce prediction sets:
Good news: all conformal classification methods presented here are available in
the MAPIE Python library.
More good news: All conformal classification methods presented here work regardless of the data type of the input features. All methods presented here work for image classifiers, tabular classifiers, text classifiers, and so on. The only requirement is that the output has some kind of (probability) score per class.

7.1 Back to the beans


We will work our way through the beans example again. If you haven't downloaded the beans dataset yet, have a look at the Getting Started chapter. For your convenience, here is the code for training the bean classification model again:

Figure 7.1: Prediction sets

import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.naive_bayes import GaussianNB

# Read in the data from Excel file


bean_data_file = "./DryBeanDataset/Dry_Bean_Dataset.xlsx"
beans = pd.read_excel(bean_data_file)
# Labels are characters but should be integers for sklearn
le = LabelEncoder()
beans["Class"] = le.fit_transform(beans["Class"])
# Split data into classification target and features
y = beans["Class"]
X = beans.drop("Class", axis = 1)

# Split of training data


X_train, X_rest1, y_train, y_rest1 = train_test_split(
X, y, train_size=10000, random_state=2
)

# From the remaining data, split of test data


X_test, X_rest2, y_test, y_rest2 = train_test_split(
X_rest1, y_rest1, train_size=1000, random_state=42
)

# Split remaining into calibration and "new" data


X_calib, X_new, y_calib, y_new = train_test_split(
X_rest2, y_rest2, train_size=1000, random_state=42
)

# Fit the model


model = GaussianNB().fit(X_train, y_train)

For this example, we will try different conformal prediction approaches, all of
which produce prediction sets instead of just the top class or class probabilities.
All are implemented in MAPIE, which we can load like this.

from mapie.classification import MapieClassifier


from mapie.metrics import classification_coverage_score
from mapie.metrics import classification_mean_width_score

The first line loads the MapieClassifier, which we will use to conformalize our
classification model. Lines 2 and 3 load functions that we can use to evaluate the
resulting prediction sets.

7.2 The naive method doesn’t work


The naive approach would be to take the probabilities at face value and assume
they are well calibrated. So, to generate a prediction set with at least 1 − 𝛼
probability, we take the following naive approach: For each data instance, we add
up the “probabilities” until the cumulative score of 1 − 𝛼 is exceeded, starting
with the highest.

# Initialize the MapieClassifier
mapie_score = MapieClassifier(model, cv="prefit", method="naive")
# Calibration step
mapie_score.fit(X_calib, y_calib)
# Prediction step
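A minimal sketch of how the prediction and evaluation step could continue, assuming 𝛼 = 0.05 as before (the variable names y_pred_naive and y_set_naive are my own):

y_pred_naive, y_set_naive = mapie_score.predict(X_new, alpha=0.05)
y_set_naive = np.squeeze(y_set_naive)

# Evaluate the naive sets with the metric functions imported above
coverage = classification_coverage_score(y_new, y_set_naive)
avg_size = classification_mean_width_score(y_set_naive)
print(f"Coverage: {coverage:.3f}, average set size: {avg_size:.2f}")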

of the upright buckets was difficult, and naturally I dared not dump
the coal out on the floorplates first. As best I could, I managed it,
spreading the coal over the kindling, a little thin at the front of the
grates, a thicker bed at the rear. That done, I leaned back on my
shovel, and alternated between watching the waterline creeping up
the boiler fronts and my men frantically passing up buckets to fill the
boiler.
It was a big boiler, eight feet in diameter, and would require
innumerable buckets. Mentally I calculated it, making a rough
estimate. Nine tons of water had to be manhandled up into that
boiler to fill it properly, a thousand bucketfuls at the very least. I
timed the heavy buckets; about six a minute were going up, but the
men could hardly maintain that pace. Still, even if they could, it would
take three hours to fill that boiler to the steaming level! Long before
then, the fireboxes at the bottom of the boiler would be flooded, we
could never light off! Somehow, we had to keep the water down in
the fireroom till I got steam, or the Jeannette was doomed. And her
going meant a two hundred mile retreat over the broken pack to
Siberia—in mid January at 40° or worse below zero, an absolutely
hopeless journey!
“Keep ’em flying, boys!” I called out to my coalheavers, “while I lay
up on deck for help. I’ll be back here in a minute!”
Coated with ice to the waist, I clambered up the ladder, went
forward into the deckhouse. Swinging on the pump bars there, were
eight straining seamen; against the bulkhead, resting a moment,
were eight more, including even the Chinamen Ah Sam and Tong
Sing. A little forward of them was De Long, anxiously peering down a
hatch into the forepeak, while below him in that gloomy hole,
Lieutenant Chipp and Nindemann were sloshing round in deep water
with a lantern, searching for the source of our troubles.
“Where’s the leak, captain?” I asked, bending down alongside him.
De Long straightened up, intensely worried.
“We don’t know, chief; Chipp can’t find it. All he can see is that the
water’s gushing through that supposedly solid pine packing the Navy
Yard filled our bow with, as if it were a sieve. The leak’s in the stem,
down somewhere near the keel; I think our forefoot’s twisted off.” He
looked at me with haggard eyes. “We’re still holding our own on the
forepeak with the hand pump; but the men’ll break down before long.
How soon can you give us steam and help out, chief?”
I drew him aside, a little away from that squad of resting seamen,
not wishing to discourage them.
“Never, captain!” I whispered hoarsely, “unless we get help
ourselves!” Briefly I outlined our desperate position. There was no
hand pump in the fireroom, the water was gaining on us there also.
“I’ve got to have a gang to hoist water out of that fireroom by hand
someway to keep it down till my boiler’s filled and I get steam up, or
we’re done for! And it’ll take three hours yet. My gang’s all busy. Who
can you spare?”
De Long gazed at me somberly.
“Except Danenhower, who’s blind, every man and officer’s working
now. But Newcomb and Collins are only collecting records in case
we abandon ship. Will they do?”
I laughed bitterly.
“Newcomb isn’t worth a damn for real work, captain; and from
what I’ve heard from Collins, you could shoot him before he’d turn to
as a seaman! Besides, two are not enough anyway. It’ll take six good
men at least, to keep ahead of that water, and then they may not do
it. But give me Cole and half of that relief gang at the pumps there
and I’ll try.”
“That’ll reduce us here to six men a shift on the pump handles,”
muttered the captain, dubiously eyeing the crew at the pump. “But
we’ve got to get steam! All right, Melville, take them. But for God’s
sake, hurry it up!”
“Aye, aye, sir!” I turned abruptly to our Irish bosun, who was
nearby supervising the pumping. “Jack, pick four men out of the
gang here, any four, and come aft with me. Shake a leg, now!” I
started for the after door in the deckhouse.
Cole grabbed Starr, a Russian and physically the strongest
seaman in our crew, off the starboard pump handle; took Manson, a
burly Swede, off the port handle to even things up, and beckoned to
Ah Sam and Tong Sing from the relief gang.
“C’mon, me byes; lay aft wid yez!” Cole marshalled his little detail
out of the compartment and slammed the deckhouse door behind
them almost before the twelve startled men left at the pump could
realize that they now had the work of all sixteen to carry on.
Close outside the deckhouse stood the barrel which received the
fresh water condensed in our distiller. That barrel was just what I
needed; distilling for the present was the least of my worries.
“Jack,” I explained briefly, “the fireroom’s flooding on us. We got to
keep that water down till I get fires started. Sling that barrel in a
bridle, rig it on a whip to the davit over the machinery hatch, and
start hoisting water out of the fireroom, four bells and a jingle! She’s
all yours now, Jack! Get going!”
Cole, a rattling good bosun if I ever saw one, needed nothing
further.
“Aye, aye, sor. Lave ut to Jack!” In a moment he had that Russian,
the Swede, and the two Chinamen round the barrel, emptying it; in
another second they were rolling it aft; and as I started down the
ladder to the fireroom, Cole had the barrel on end again and already
was expertly throwing a couple of half hitches in a manila line round
it to serve as a sling.
Almost before I got down the ladder to my fireroom again, the
barrel came tumbling down the hatch at the end of a fall and landed
alongside me with a splash, while above, Cole roared out,
“Below there! She’s all yours! Fill ’er up!”
Being nearest, I tipped the barrel sidewise in the water, pushed it
down till it submerged, then righted it. It filled with a gurgle, settled
through the slush to the floor plates.
“Full up!” I shouted. “Take it away!”
“Aye, aye!” The line to the barrel tautened, then started slowly to
rise. Down the hatch floated Cole’s voice, encouraging his squad on
the hoisting line,
“Lay back wid yez, Rooshian! Heave on it, ye Swede! An’ git those
pigtails flyin’ in the breeze, ye two Chinks, or we’ll all be knockin’
soon at the Pearly Gates, an’ fer sailor min the likes of us, wid
damned little chanct to get past St. Peter! Lively wid yez; all togither
now. Heave!”
The loaded barrel suddenly shot up the hatch.
Hurriedly Cole swung it over to the low side scuppers, dumped it,
and sent it clattering down again. Once more I filled it, started it up,
then called Lee, my machinist, from the engine room pump to stand
by on that filling job while I went back to the all-important boiler.
Why go into the agony of the next two hours? Wearily, without
relief, my men heaved water, ice, slush, whatever the flying buckets
scooped up, indiscriminately into the yawning void inside that boiler;
just as wearily, with aching shoulders, Cole and his little group
labored, unrelieved and unshielded from the bitter cold on deck,
heaving that barrel up and down; while from the deckhouse, the
more and more frequent cries of Spell O! showed that at the
undermanned pump, backs were fast giving way under that inhuman
strain.
And in spite of all, I could see that we were going to lose. Another
hour yet to fill the boiler to the steaming level, but from the rate with
which the flood waters were still rising, in another hour it would be
too late—the water would be over the grates. Hoping against hope
that perhaps I was wrong, that perhaps the water was going into that
kettle faster than I thought, I crawled myself to the top of the boiler.
Keeping as clear of Bartlett as the scant space allowed, not to slow
up the stream of buckets, I seized the torch and in between the
dumping of those cumbersome buckets peered through the ice-
rimmed manhole into that Scotch boiler. As I feared. The upper tubes
down there were still uncovered; the crown sheets of the furnaces
were still perhaps a foot above the level of the slush (I could hardly
call it water) line. As I looked, Bartlett, sprawled out beside me, sent
another bucketful splashing through the manhole, which soaked my
beard and almost immediately froze it into a solid mass. But I hardly
noticed it, staring with leaden eyes into that still half-empty boiler.
With a sinking heart, I slid away on the ice-coated cylinder from the
manhole, and crawled down the breechings to stand once again on
the thickening ice covering the flooded floorplates.
Dare I fire up without waiting further?
I was in a terrible predicament. To light fires under a partly filled
boiler like that, with tubes and furnace plates not wholly covered with
water, was not only the surest way to a courtmartial which would
probably end my naval career, it violated also every tenet in my
engineer’s code, violated every principle of safety, practically insured
a boiler explosion! But if I did not get fires going right away, I would
never have a chance to fire up, and not only that boiler but the ship
herself and all her crew besides would vanish in that Arctic ice.
I must risk whatever came.
With flying buckets and tumbling barrel splashing and spilling
water all around me, I applied a match to another oil torch, fanned it
a moment in the chilly air till it blazed brightly, shoved it (in the
narrow space still remaining between the flood waters and the grate
bars) into the inboard furnace under the kindling, till the wood took
fire and then hurriedly transferred it to the outboard furnace until that
also lighted off. The extreme cold of the outside air favored me,
creating a tremendous draft as soon as a little warm air filled the
flues, and in no time at all it seemed, the wood was blazing up
fiercely and igniting the coal which, shining brightly down through the
grate bars onto the water flooding the lower part of the ash-pits, cast
a lurid red glare out into the dark fireroom, evidently putting new life
into the drooping sailors, for both below and on deck, a ragged cheer
greeted that crimson glow.
“Keep that water going, lads; we haven’t won yet!” I warned,
flinging open the furnace doors and heaving in more coal. “We’ve got
to get that water level up over the crown sheets before they get red
hot, or we’re all going straight to hell! Twice as fast now on those
buckets!” And whatever it was, fear or hope, that inspired those
coalheavers, a moment before ready to drop from utter exhaustion,
the buckets started to fly faster than ever.
I finished heaving coal, slammed to the fire doors, and leaned
back on my shovel. I was in for it now. Never in the history of steam,
before nor since, has a boiler been fired under such weird conditions
—furnaces half-flooded, no water showing in the sight glasses, slush
and ice for what charge there was, and the boiler manhead still off!
But I was relying on some of those very dangers to save my bacon—
till I put the manhole cover back, there could be no pressure to
cause real trouble; and till we had melted down and warmed up that
ice and slush, I counted on that chilly mixture and the water still
splashing in to soak up heat so rapidly as to keep the bare tubes and
exposed crown sheets from getting red hot and collapsing.
My other fears I need hardly go into—the dangers of bringing up
steam suddenly in a cold boiler instead of gradually warming up first
for twelve hours as was usual; of frozen gauge glasses; of frozen
feed pumps—all these I deliberately put out of my mind. Only one
thing counted now—to get some steam at any cost whatever before
the water reached the grate bars and flooded out my fires.
And we did. With only a few inches left to go, came at last from
Bartlett the long-awaited cry,
“The crown sheet’s covered now, chief!”
“On with that manhead!” I roared back.
The clanking of Bartlett’s sledge hammer, breaking away the ice
round the manhole so the cover would fit, was my only answer. The
worn-out coalheavers dropped their buckets, rested for the first time
in hours, sagging back against the boiler fronts to keep from
dropping into the icy water. No time for that. I seized a slice bar,
started savagely to slice the fire in the outboard furnace, sang out,
“Boyd, get busy with another slice bar on that inboard fire!
Lauterbach, relieve Lee on filling that barrel! Lee, get back to your
pump now! And, Sharvell, you and Iversen, get into those bunkers
and break out some more coal! Come to life now, all of you!”
Boyd, nearly dead from his half of heaving up over eight tons of
water, staggered over to my side, gripped a slice bar. Together we
labored over the fires, forcing them to the limit, nursing in more coal
without deadening the blaze, till helped by an amazing draft from the
stack, we had them roaring like the very flames of hell itself. Never
have I seen such fires!
Leaving the stoking job now wholly to Boyd, I dropped my slice bar
and stepped back to examine the gauge glasses. Water was barely
showing in the sight glass, but, thank God, it was showing! And the
needle of the pressure gauge was starting to flutter off the zero pin.
Steam was coming up! If we could only hold down the flood for a few
minutes more now, till I could get that pump warmed up and going,
we were saved! But that part was up to Jack Cole.
“Jack!” I shouted up the hatch. “A little more and you can quit. But
right now, for God’s sake, shake it up; faster with that barrel!”
“Aye, aye, sor!” Then to his strangely conglomerate crew, ready
undoubtedly to collapse in their tracks, Cole called gruffly,
“C’mon me byes! Lit’s raylly git to liftin’ now, an’ work up a sweat,
or we’ll freeze to death in this cowld! Lay back on ut, Starr! Heave
there, Manson! Wud yez have thim two Chinks outpullin’ yez? An’
step out there now, ye Chinese seacooks, an’ don’t be clutterin’ up
the decks, or whin that Rooshian gits goin’, he’ll be treadin’ heavy on
thim pigtails! Yo heave! Up wid ut!” And with astonishing speed I saw
the loaded barrel vanish up the hatch.
I breasted my way through the water aft to where Lee in the
engine room stood by my largest steam pump. No need to worry
about priming the pump for suction; another foot higher on that flood
and we would have to go diving to reach the pump valves. I felt the
steam line. The frosty chill was gone; a little steam at least was
already coming through to the pump.
“All right, Lee; let’s get going,” I mumbled. We cracked open the
steam valve a hair, started to drain the line. And no mother nursing
her baby ever handled it more tenderly than Lee and I nursed that
frozen pump, gradually draining and warming the steam cylinder, lest
the sudden application of heat should crack into pieces that
abnormally cold cast iron, and after our heartbreaking struggle with
the boiler, leave us still helpless to eject the sea. With one eye on
Jack Cole’s rapidly moving barrel and the other on that narrowing
margin between flood water and furnace fires, I nursed the pump
along by feel, taking as long to warm it up as I dared without
swamping those flames. At long last the pump cylinder was hot;
steam instead of water was blowing out the drains. And the boiler
gauge needle stood at thirty pounds. Enough; we could go.
I straightened up, motioned Lee to start the pump. He opened the
throttle valve. With a wheeze and a groan the water piston broke free
in its cylinder, the nearly submerged pump commenced to stroke.
Leaving Lee at the pump, I ran (that is, if barely dragging one ice-
weighted foot after another can be called running) up the ladder
toward the deck. While I climbed, the empty barrel came hurtling
down the hatchway, splashed into the water in the fireroom. Before
Lauterbach could fill and upend it, down on top of the barrel in a
maze of coils came the slack end of the hoisting line. Apparently
Cole’s gang was through.
As I poked my head above the hatch into the open, there—Oh,
gorgeous sight for bleary eyes and aching muscles! was a heavy
stream of water pulsing into the scuppers! Nearby, prone on the deck
where they had dropped in their tracks when they let go the hoisting
line, were four utterly worn-out seamen, gazing nevertheless
admiringly on that beautiful stream. And leaning against the bulwark
watching it, was Jack Cole, who as he saw me, sang out,
“Praises be, chief; we’re saved! There’ll be no calls for Spell O
from that chap!”
CHAPTER XX

Our immediate battle was won, but the war thus opened that 19th of
January, 1880, between us and the Arctic Sea for the Jeannette
dragged along with varying fortunes till the last day I ever saw her.
Our big steam pump made short work of all the water in the
fireroom that was still water. In an hour the room was bare down to
ice-coated floors and bilges, with the pump easily keeping ahead of
the leakage coming from forward. But the men at the hand pump,
optimistically knocked off the minute the steam pump began
stroking, were unfortunately not wholly relieved. Despite the fact that
we opened wide the gates in the forepeak and the forehold
bulkheads to let the water run freely aft to the fireroom pump, the
flow through was sluggish, impeded I suppose by having to filter
through the coal in the cross bunker. So fifteen minutes out of every
hour, the hand pump was manned again to keep down the water
level in the forehold, while, sad to contemplate, our weary seamen,
between spells at the pump, had to labor in the forehold storerooms
breaking out provisions (much of which were already water soaked)
and sending them up into the deckhouse to save our food from
complete ruin.
It was ten-thirty in the morning when the leak was discovered; it
was three p.m. when I finally got steam up and a pump going; but at
midnight the whole crew was still at work handling stores. The state
we were then in was deplorable beyond description.
Who struck eight bells that night I do not know, for since morning
we had had no anchor watch, but someone, Dunbar perhaps, whose
seagoing habits were hard to repress, snatched a moment from his
task and manned the lanyard. At any rate, as the clear strokes of the
bronze bell rang out on that frost-bitten night, De Long, in water up to
his knees in the forehold, was recalled to the passage of time. The
provisions actually in the water had been broken out; his effort now
was to send up all the remainder which rising water might menace.
But with the bell echoing in his ears, the captain, looking at the jaded
seamen about him, staggering through the water laden with heavy
boxes and casks, toiling like mules, came suddenly to the realization
that they had only the limited endurance of men and called a halt.
“Knock off, lads,” he said kindly. “If anything more gets wet before
morning, it gets wet. Lay up on deck!” And on deck, as the men
straggled up the hatch to join the rest of the crew round the hand
pump (at the moment unmanned) he ordered Cole to serve out all
around two ounces of brandy each. Frozen hands poured it into
chilled throats, to be downed eagerly at a gulp—there was not a man
who might not have swallowed a whole quart just as eagerly, and
probably then still have felt but little warmth in his congealed veins.
At the captain’s order, Cole then piped down—the starboard watch
to lay below to their bunks, the port watch for whom there was to be
no immediate rest, to man the hand pump as necessary through the
remainder of that dreary night, keeping the water in the forehold
down below the level of the as yet unshifted stores. The frozen
seamen tramped wearily off, some to rest if they could, the others to
bend their backs over the bars of the pump, which soon resumed its
melancholy clanking.
But neither for me, for the captain, nor for Chipp was there any
rest. Immediately I had downed my share of the brandy, I turned to at
once, figuring how I might get steam and a steam pump forward to
suck directly on the forehold and eliminate altogether the toil over the
hand pump which must soon break our men down. I had in my
engine room that spare No. 4 Sewell and Cameron pump (which my
men and I had so thoughtfully picked up in the dark of the moon at
Mare Island before we started). I set to work on a layout for installing
it in the deckhouse forward; which task, between designing
foundations and sketching out suction and steam lines for it, kept me
up the rest of the night. As for Chipp, he was down in the forepeak
with Nindemann, endeavoring to stop, or at least to reduce, the leak.
The water was pouring in through the innumerable joints in that
mass of heavy pine timbers, which stretching from side to side and
from keel to berth deck in our bow, filled it for a distance of ten feet
abaft the stem. However valuable that pine packing may have been
in stiffening our bow for ramming ice, it was now our curse, very
effectively preventing us from caulking whatever was sprung in the
stem itself. All through the night Nindemann and Chipp labored,
stuffing oakum and tallow into the joints of that packing where the
jets of water squirted through. It was discouraging work. As fast as
their numbed fingers rammed a wad of oakum into a leaking joint
and stopped the flow there, water spurted from the joints above.
Methodically through the night they worked in that dismal hole with
freezing water spraying out over them, following up the leaks,
caulking joint after joint, but when at last they got to the top, plugging
oakum into the final crack, the water rose still higher and started to
pour down their necks from between the ceiling and the deck beams
overhead where they could not get to it. They could do no more. At
five a.m., each man a mass of ice, they came up, beaten.
Meanwhile, De Long, foreseeing the possibility of such a
contingency, had himself put in the rest of the night over the ship’s
plans, designing a watertight bulkhead to be built in the forepeak just
abaft that packing, so that if we could not stop the leak, we could at
least confine the flooding to a small space forward and thus stop all
pumping, either by hand or steam.
In the early morning, after twenty-four hours of continuous strain
and toil, the three of us met again in the deckhouse, I with my
sketches for the pump installation, De Long with his bulkhead plans,
and Chipp with the bad news that we had better get both jobs
underway at once for he had failed utterly to stop the leak. So we
turned to.
I will not go into what we went through the week following—my
struggles with frozen lines, improper equipment, and lack of men and
tools for such a job. Suffice it to say that after three days I got that
auxiliary Sewell pump running forward so that to the intense relief of
the deck force, their torture at the hand pumps ended altogether, and
I was able to keep the water in the forepeak so low that Sweetman
and Nindemann were enabled to start building the bulkhead.
From then on, Nindemann and Sweetman bore the brunt. On
these two petty officers, Sweetman, our regular carpenter, and
Nindemann, our quartermaster (but almost as good as a carpenter)
fell the entire labor of building that bulkhead. In the narrow triangular
space in the peak, they toiled hour after hour, day after day, cutting,
fitting, and erecting the planking. William Nindemann, a stocky,
thickset German, was a perfect horse for work, apparently able to
stand anything; but Alfred Sweetman, a tall, spare Englishman, had
so little flesh on his ribs that he froze through rather rapidly, and in
spite of his objections, had to be dragged up frequently to be thawed
out or he would soon have broken down completely. As it was, every
four hours both men got a stiff drink of whiskey to keep them
limbered up, and as much hot coffee and food in between as they
could swallow, which was considerable.
Meanwhile, during all this turmoil and anxiety, the captain was
weighed down with the problem of what to do with the blinded
Danenhower should the water get away from us, either then or later.
To add to his worries, Dunbar, who was also still under the weather
from his illness, seemed between that and his efforts to assist, to
have aged overnight at least twenty years. It was pathetic to see the
old man, looking now positively decrepit, struggling in spite of the
captain’s orders to hold up his end alongside husky seamen, fighting
with them to help save the ship. And as if to make a complete job of
De Long’s mental anguish during that agonizing first day of the leak,
Surgeon Ambler was suddenly taken violently ill, and to the captain’s
great alarm had to be left in his cabin, practically unattended. Aside
from De Long’s natural concern over what might happen to Ambler
himself, the effect on the captain’s mind of this prospect of being left
without a doctor to look after Danenhower and any others who might
collapse in our desperate predicament, can well be imagined. It
amazed me that the captain under the combined impact of all these
worries and disasters, instead of caving in himself, maintained at
least before the men an indomitable appearance, by his actions
encouraging them, and with never a word of profanity, urging and
cheering them on.
By the end of the ensuing week things showed signs of
improvement—I had both steam pumps going, hand pumping was
discontinued, Nindemann and Sweetman against terrible odds were
making progress on the bulkhead, Dunbar was no worse, and
Ambler (whose trouble turned out to be his liver) was under his own
care, sufficiently on the mend to be no longer in danger.
Only Danenhower, aside from our leak, remained as a problem.
He, instead of getting better, got worse.
The third day of our troubles, while I was still struggling with a
frozen steam whistle line through which I was trying to get steam
forward to start my Sewell pump, there into the glacial deckhouse
beside me came our surgeon, wan and pinched and hardly able to
drag one foot after another. I gazed at him startled. He had not been
out of his bunk since his illness.
“What’s the matter, brother?” I queried anxiously. “Why aren’t you
aft in your berth where you belong? We don’t need help; we’re
getting along here beautifully.”
“Where’s the captain?” he asked, ignoring my questions. “I want
him right away.”
“Below there,” I replied, pointing down the forepeak hatch. “He’s
inspecting the work on the bulkhead. Shall I call him for you, doc?”
Apparently too weak to speak a word more than he had to, Ambler
only nodded. A little alarmed, I poked my head down the hatch into
the dark peak tank and called out to De Long standing far below on
the keelson. He looked up, I beckoned him, and he started
cautiously to climb the icy ladder, shortly to be blinking incredulously
through his frosty glasses at Ambler, even more astonished than I at
seeing him out of bed. Ambler wasted no words in explanations
regarding his presence.
“It’s Danenhower, captain. I got up as soon as I could to examine
him. His eye’s so much worse today that if I don’t operate, he’ll lose
it! So I came looking for you to get your permission first. You know
how things stand with us all.”
The captain knew, all right. It was easy to guess, looking into his
harassed eyes as Ambler talked, what was going through De Long’s
mind—a sick surgeon, poor medical facilities, a leaking ship, and the
possibility of having the patient unexpectedly thrust out on that
terrible pack to face the rigors of the Arctic, where with even good
eyes in imminent peril of freezing in their sockets at 50° below zero,
what chance for an eyeball recently sliced open? All this and more
besides was plainly enough reflected in the skipper’s woebegone
eyes and wrinkling brows. De Long thought it over slowly, then
wearily shook his head.
“I can’t give permission, doctor. It’s not Dan’s eye alone; it means
his very life if we have to leave the ship soon. And since it’s his life
against his eye we’re risking, he ought to have a voice in it. I can’t
say yes; I won’t say no. Put it up to Dan; let him decide himself.”
“Aye, aye, sir; I’ll explain it to him.” Dr. Ambler swung about, went
feebly aft, leaving the captain and me soberly regarding each other.
“You’re dead right, captain; nobody but Dan should decide. It’s too
much of a load for another man to have on his conscience if things
go wrong.”
De Long, abstractedly watching Ambler hobbling aft, hardly heard
me. Without a word in reply, he turned to the ladder behind him, and
with his tall frame sagging inside his parka as if the whole world bore
on his bent shoulders, haltingly descended it. I looked after him
pityingly. He had brought Dan, a husky, vital young man into the
Arctic; now of all times, what a weight to have on his mind as Dan’s
life hung in the balance! Unconsciously I groaned as I turned back to
thawing out my steam line and I am afraid that my mind wandered
considerably for the next hour as I played a steam hose back and
forth along that frozen length of iron pipe.
I was still at it, and still not concentrating very well, when Tong
Sing’s slant eyes peered at me through the cloud of vapor
enveloping my head and he pulled my arm to make sure he had my
attention.
“Mister Danenhower likee maybe you see him, chief.”
I shut off my steam hose, nodded to the steward, started aft. If I
could help to lighten poor Dan’s burden any, I was glad to try. But
what, I wondered, did he want of me—advice or information?
I entered Dan’s room, sidling cautiously between the double set of
blankets draping the door to shut out stray light. It was pitch-black
inside.
“That you, chief?” came a strained voice through the darkness the
minute my foot echoed on the stateroom deck.
“Yes, Dan. What is it?”
“My eye’s in horrible shape, the doctor tells me, chief. If it’s
anything like the way it hurts, I guess he understates it. What’s
happened to make it worse the last couple of days I don’t know,” he
moaned, then added bitterly, “Most likely it’s just worry. How do you
think I feel lying here useless, not lending a hand, while the rest of
you are killing yourselves trying to stop that leak and save the ship?”
I felt through the blackness for his bunk, then slid my fingers over
the blankets till I found his hand.
“Don’t let that get you, Dan,” I begged, giving his huge paw a
reassuring squeeze. “We’re making out fine with that leak. As a fact,
we got it practically licked already. It wasn’t much trouble.”
“Quit trying to fool me, chief,” pleaded Dan. “It’s no use. Maybe I
can’t see, but I can hear! So I know what’s going on around me. As
long as I hear that hand pump clanking, things are bad! And with the
skipper’s cabin right over my head and yours just across the
wardroom and me lying here twenty-four hours a day with nothing to
do but listen, don’t you think I know when you turn in? And neither of
you’ve turned in for a total of ten minutes in two nights now! Don’t try
to explain that away!”
I winced. Dan, in spite of the Stygian darkness in which he lived,
had the facts. No use glossing matters over.
“Listen, Dan, I’m not fooling you,” I answered with all the
earnestness I could muster. “It’s true we haven’t slept much, but
we’re both all right. And while things looked pretty bad at first, for a
fact, we got that leak practically licked. Before the day’s over, that
hand pump will shut down for good. Now forget us and the ship; let’s
get back to Danenhower. What can I do for you, brother?” I gave his
palm a friendly caress.
I felt Dan’s invisible hand twitch in mine, then close convulsively
on my fingers.
“I’m in a tough spot, Melville. The doctor tells me if he doesn’t
operate, I’ll go blind. And if he does, and I have to leave the ship
before my eye’s healed and he can strip the bandages, I’ll probably
die! And it’s up to me to decide which. Simple, isn’t it, chief?”
Danenhower groaned. Had I not kept my lips tightly sealed, I should
have groaned also at his pathetic question. With a lump in his throat,
he added, “I don’t want to go back blind to my f—,” he choked the
merest fraction of a second over the word, then substituting another,
I think, hastily finished—“friends, but as much as anybody here I
want to get back alive if I can. Honestly, chief, you won’t fool a blind
shipmate just to spare his feelings, will you?” He gripped my hand
fiercely. “What’re our chances with the ship? I’ve got to know!”
“The leak’s licked, Dan,” I assured him earnestly. “We won’t sink
because of that. But about what the ice is going to do to us, your
guess is as good as mine. Seeing what she’s fought off so far, I’d
back the old Jeannette’s ribs to hold out against the pack for a while
yet.”
“Thanks, chief, for your opinion.” Dan pressed my hand once
more, then slowly relaxed his grip. “I guess I’ll have to think it over
some more before I decide. You’d better go now; sorry to have
dragged you so long from your work to worry you over my poor
carcass.”
I said nothing, I dared not, fearing that my voice would break. With
big Dan stretched out blind and helpless on his bunk, invisible there,
to me only a voice and a groping hand in the darkness, I slipped
away silently, leaving him to grapple with the choice—to operate or
not to operate—possible death in the first case, certain blindness in
the second. And with the knowledge that however he chose, the final
answer lay, not with him, but with the Arctic ice pack. He must guess
what it had in store for the Jeannette with his sight or his life the
forfeit if he guessed wrong. I went back to my own trifling problem,
thawing out the steam line.
Shortly afterward, Tong Sing came forward again, calling the
captain this time, who immediately went aft. Whether Danenhower
had decided or whether he was seeking further information, the
steward did not know. I worked in suspense for the next hour till De
Long returned. One look at his face informed me how Dan had
decided.
“Well, brother, when’s the operation?”
“It’s over already, Melville! Successful too, the doctor says. I
watched it and helped a bit. And, chief, I hardly know which to
admire most—the skill and speed with which Ambler, weak as he
was, worked, or the nerve and heroic endurance with which Dan
stood it. He’s back in his stateroom now, all bandaged again. God
grant the ship doesn’t go out from under us before those bandages
are ready to come off!”
Well, that was that. With a somewhat lighter heart, I resumed
blowing steam on my frozen line. De Long crawled back into the
forepeak to resume his study of the leak.
But my happier frame of mind did not last. If it was not one thing
on the Jeannette to drive us to distraction, it was a couple of others.
The captain soon squirmed back through the hatch with a long face
to join me again beside the deck pump.
“How much coal have we got in our bunkers, now, chief?” he
asked.
“Eighty-three tons and a fraction,” I answered promptly. I felt that I
knew almost every lump of coal in our bunkers by name, so to
speak.
“And what are we burning now?” he continued.
“A ton a day, captain, to run our pumps and for all other purposes,
but as soon as that bulkhead’s finished and the leak’s stopped, we
ought to get down to 300 pounds again, our old allowance.”
De Long shook his head sadly.
“No, chief, we never will. The way the ship’s built, I see now we’ll
never get that bulkhead really tight; she’s going to keep on leaking
and we’re going to keep on pumping. But a ton of coal a day’ll ruin
us! By April, at that rate, the bunkers’ll be bare. Can’t you do
something, anything, to cut down that coal consumption?”
I thought hastily. Our main boiler, designed of course for furnishing
steam to propel the ship, was far bigger than necessary just to run a
couple of pumps, and consequently it was wasteful of fuel. If
pumping, instead of lasting only a few days more, was to be our
steady occupation, I ought to get some setup more nearly suited to
the job. Before me in the deckhouse was the little Baxter boiler I had
rigged for an evaporator. That might run the forward pump. And
looking speculatively aft through the deckhouse door, my eye fell on
our useless steam cutter, half buried in a mound of snow and ice
covering its cradle on the poop. There was a small boiler in that
cutter. Perhaps I could remove it, rig it somehow to run a pump in the
engine room. And then I might let fires die out under the main boiler
again and do the job with less coal.
Briefly I outlined my ideas to the captain, who, willing to clutch at
any straw, gave blanket approval to my making anything on the ship
over into what I would, so long as it promised to save some coal.
“Good, brother,” I promised. “As soon as I get this pump running
and knock off the hand pump, I’ll turn to with the black gang and try
to rig up those small boilers so we can shut down that big coal hog.
And even if we have to hook up Ah Sam’s teakettle to help out on
the steam, we’ll get her shut down; you can lay to that!”
“I’m sure you will, chief,” answered De Long gratefully. “Now is
there any way we can help you out with the deck force?”
“Only by plugging away on those leaks, captain. We’re making
3300 gallons of salt water an hour in leakage; every gallon of that
you plug off means so much more coal left in the bunkers.”
“I well appreciate that, Melville. Nindemann and his mate are doing
what they can with the bulkhead; I’m starting Cole and the deck
watch to shoving down ashes and picked felt between the frames
and the ceilings in the forepeak to stop the flow of water there. We’ll

You might also like