nature biotechnology

Article https://doi.org/10.1038/s41587-022-01483-z

High-plex imaging of RNA and proteins at subcellular resolution in fixed tissue by spatial molecular imaging

Shanshan He1, Ruchir Bhatt1, Carl Brown1, Emily A. Brown1, Derek L. Buhr1, Kan Chantranuvatana1, Patrick Danaher1, Dwayne Dunaway1, Ryan G. Garrison1, Gary Geiss1, Mark T. Gregory1, Margaret L. Hoang1, Rustem Khafizov1, Emily E. Killingbeck1, Dae Kim1,2, Tae Kyung Kim1, Youngmi Kim1, Andrew Klock1, Mithra Korukonda1, Alecksandr Kutchma1, Zachary R. Lewis1, Yan Liang1, Jeffrey S. Nelson1, Giang T. Ong1, Evan P. Perillo1, Joseph C. Phan1, Tien Phan-Everson1, Erin Piazza1, Tushar Rane1, Zachary Reitz1, Michael Rhodes1, Alyssa Rosenbloom1, David Ross1, Hiromi Sato1, Aster W. Wardhani1, Corey A. Williams-Wietzikoski1, Lidan Wu1 and Joseph M. Beechem1

Received: 31 December 2021
Accepted: 19 August 2022
Published online: 6 October 2022

Resolving the spatial distribution of RNA and protein in tissues at subcellular
resolution is a challenge in the field of spatial biology. We describe spatial
molecular imaging, a system that measures RNAs and proteins in intact
biological samples at subcellular resolution by performing multiple
cycles of nucleic acid hybridization of fluorescent molecular barcodes.
We demonstrate that spatial molecular imaging has high sensitivity (one
or two copies per cell) and very low error rate (0.0092 false calls per cell)
and background (~0.04 counts per cell). The imaging system generates
three-dimensional, super-resolution localization of analytes at ~2 million
cells per sample. Cell segmentation is morphology based using antibodies,
compatible with formalin-fixed, paraffin-embedded samples. We measured
multiomic data (980 RNAs and 108 proteins) at subcellular resolution in
formalin-fixed, paraffin-embedded tissues (nonsmall cell lung and breast
cancer) and identified >18 distinct cell types, ten unique tumor
microenvironments and 100 pairwise ligand–receptor interactions. Data on
>800,000 single cells and ~260 million transcripts can be accessed
at http://nanostring.com/CosMx-dataset.

Understanding the spatial distribution of RNA and protein in tissues carries great potential to expand our knowledge of all aspects of life science research1–6. Fluorescence in situ hybridization (FISH) and immunohistochemistry (IHC) are the most common technologies used to assess the spatial distribution of RNA and protein in fixed tissue samples7–10. However, these technologies can detect only a handful of targets at a time. In addition, high-plex, high-sensitivity multiomic detection (RNA and proteins simultaneously) in formalin-fixed, paraffin-embedded (FFPE) form has not yet been accomplished. Current multiplex IHC techniques11 involve a long assay time, which is not easily automated, and often require extensive assay optimization, hindering their establishment as routine methods.

1NanoString Technologies, Seattle, WA, USA. 2Present address: Dxome Co., Gyeonggi-do, Republic of Korea.
e-mail: jbeechem@nanostring.com

Nature Biotechnology | Volume 40 | December 2022 | 1794–1806 1794



For high-throughput, high-plex RNA profiling at single-cell resolution, the most commonly used technology to date is single-cell RNA sequencing (scRNA-seq), including Drop–seq, inDrop and Chromium (10X Genomics)12–15. These technologies require tissue dissociation and cell isolation into nanoliter droplets with hydrogel beads bearing barcoding DNA primers. Although these methods enable whole-transcriptome RNA profiling at single-cell resolution, the lack of spatial information limits this approach to generation of a very detailed ‘parts list’ of the tissue.

Spatially resolved technologies have recently been developed that are aimed at maximization of the number of markers observable simultaneously. These methods fall into two categories. One is profilers that generate spatially resolved high-plex data, often at the whole-transcriptome level, from small subregions of the tissue using massive parallel sequencing as a readout. Profilers do not have single-cell resolution and often have RNA capture reagents arranged in a predefined grid-like pattern on which tissue sections are mounted. The second is imagers that provide true single-cell and subcellular resolution using in situ reagents, although the level of multiplexing is lower than the whole transcriptome in most platforms.

Profiling-based techniques include spatial transcriptomics16, slide-seqV2 (ref. 17), pixel–seq18, dbit–seq19 and digital spatial profiling (DSP)20 that offer the ability to perform highly multiplexed profiling based on detection readout using next-generation sequencing (NGS). These techniques are limited, however, by not being resolved at the single-cell level and often suffer from dead-space areas in the sample where no measurements are performed. Although the active RNA capture regions (‘spots’) can be precisely organized, the placement of the tissue sample onto the spots has limited control in the selection of regions to be analyzed, ignoring the morphological information in the tissue. Also, the spot size in the RNA capture area introduces analysis issues: for example, if the spot is set to a larger size (50 μm), many cells are randomly selected by the spot location on tissue. If the spots are small (≤2 μm), the number of captured transcripts is very low and the spots are grouped together for analysis, leading to the same issues as with larger spots. In addition, many RNA profiling methods capture RNA molecules via the poly-A tail, which is inefficient for the subsequent sequencing process because poly-A transcripts are dominated by highly expressed genes21. Because spot-based methods rely on the mobility of RNA molecules in the tissue sample, they struggle to work with FFPE samples without requiring separate chemistries for different sample types. The in situ solution offered by the GeoMx® DSP overcomes many of these issues, providing a method for highly multiplexed spatial profiling of RNA and proteins on both FFPE and frozen samples20.

Single-cell, high-plex imagers use sequential cycles of probe hybridization and imaging, offering the potential to combine the benefits of scRNA-seq analysis and added spatial resolution at single-cell or even subcellular resolution. Recent single-cell, high-plex imaging technologies include MERFISH22,23, Molecular Cartography24, FISSEQ25 and seqFISH+ (ref. 26). Despite isolated proof-of-concept experiments demonstrating RNA-plex levels in ~10,000 targets26,27, most molecular imaging systems allow lower plex, between 100 (ref. 24) and 500 targets22. Because these methods were developed for fresh-frozen samples, detection efficiency can be very low on FFPE tissue samples.

For protein detection, CODEX and InSituPlex methodologies increase target-plex by labeling antibodies with unique oligonucleotide tags that are identified by fluorescence detection during amplification of the probe sequence, or through cyclical hybridization of fluorescently labeled reverse-complement sequences28. Although cyclic immunofluorescence and ion-beam imaging have enabled high-plex detection of dozens of proteins in situ29,30, both approaches are limited in the maximal plex number due to on-instrument time or availability of metal isotopes.

Here we describe the chemistry and applications of a spatial molecular imager (SMI) to address many of the current unmet needs in spatial high-plex profiling of RNA and protein expression. SMI is a completely automated and integrated platform comprising chemistry, hardware and software that enables highly sensitive spatial profiling of 980 RNAs and 108 proteins in FFPE tissues at single-cell and subcellular resolution.

SMI chemistry and workflow

The SMI was developed to perform multiple cycles of nucleic acid hybridization of fluorescent molecular barcodes, to enable in situ measurement of RNAs and proteins on intact biological samples at subcellular resolution. The SMI chemistry is an enzyme-free, nucleic acid amplification-free, hybridization-based, single-molecule barcode detection methodology that can be directly applied to both intact FFPE and fresh-frozen tissues on 1-mm-thick standard glass slides for pathology.

The chemistry of SMI relies on in situ hybridization (ISH) probes and fluorescent readout probes called reporters (Fig. 1a) to detect RNAs in intact tissue. ISH probes consist of a DNA of length 35–50 nt that will hybridize with the RNA target in the tissue (target-binding domain), coupled with a stretch of 60–80 nt of DNA ‘readout domain’ for RNA identification. The readout domain consists of four consecutive 10–20-nt sequences that allow four individual SMI imaging barcodes (reporters) to bind sequentially. In addition, for each gene, five RNA-detection oligonucleotide probes (‘tiles’) of ISH probes are designed to detect different regions of the RNA target, with each tile independently recording RNA location identity.

Each tile of the ISH probe has its unique sequence in the target-binding domain whereas all tiles share the same sequence in the readout domain. This design enables the highest detection sensitivity in FFPE tissue, in which RNA targets may be highly fragmented. For instance, even if only one of the tiles successfully binds to the target, the RNA target can be detected at the readout step. Each reporter construct contains the controlled number of 15–60 dyes, depending on desired sensitivity, assembled with fluorophore-conjugated oligos with photocleavable (PC) linkers (Fig. 1a). All reporters are single color, containing one of four fluorophores: Alexa Fluor-488, ATTO 532, Dyomics-605 or Alexa Fluor-647. The key advantages of the SMI reporter chemistry are high signal-to-noise ratio (SNR) detection over the background for accurate spot calling and fast fluorescent signal quenching by ultraviolet (UV) cleavage of PC linkers (Fig. 1b).

For sample preparation, SMI utilizes standard methods performed for FISH on FFPE tissue sections to expose RNA targets, followed by the introduction of fluorescent bead-based fiducials that are fixed to the tissue to provide an optical reference for cyclic image acquisition and registration. Following hybridization of ISH probes, slides are washed, assembled into a flow cell and placed within a fluidic manifold on the SMI instrument for RNA readout and morphological imaging. In RNA assay readout the tissue is hybridized with 16 sets of fluorescent reporters sequentially, each reporter set containing four single-color reporter pools. The reporters specifically bind to ISH probes during the 16 rounds of reporter hybridization according to the barcode assigned to each gene (Supplementary Fig. 1). After incubation of each set of reporters, high-resolution Z-stacked images are acquired for downstream analysis (Fig. 1c). Before incubation with the next set of reporters, PC linkers are cleaved by UV illumination and released fluorophores are removed by washing (Fig. 1b).

The SMI encoding scheme is designed to assign a unique barcode to each target transcript from a set of 64-bit barcodes (four color reporters in each readout round over 16 readout rounds), with Hamming distance 4 (HD4) and Hamming weight 4 (HW4)31. Every barcode is separated by an HD of at least four from all other barcodes to maximally suppress RNA decoding error. Every barcode has a constant HW4, in which each target is ‘on’ in four rounds and ‘off’ in 12 rounds


[Fig. 1 panel labels. a, ISH probe (target-binding domain, 35–50 nt; readout domain, 60–80 nt) bound to RNA; reporter with PC sites. b, Expose RNA in tissue; ISH probe hybridization; cyclic readout with 16 sets of reporters (reporter set 1; cleave and remove reporter set 1; reporter set 2). c, Raw images (Z1–Z4); x, y, z identification; morphology; segmentation; cell assignment (COL1A1, IGHG1, KRT19). d, Mitochondria (antibody labeling); mitochondrial genes (MT-CYB_2, MT-CO1_1, MT-ND5_1, MT-ND2_1, MT-ND3_3, MT-CO3_3, MT-ND6_3, MT-ATP6_3); nuclear genes (MALAT1, NEAT1).]

Table in panel d:
                 Mitochondrial genes (%)    Nuclear genes (%)
Mitochondria     98.7                       3.1
Nuclei           1.3                        96.9

Fig. 1 | CosMx SMI chemistry and workflow and 3D mapping of RNA at single-cell and subcellular resolution using morphology-based cell segmentation. a, Schematic description of the SMI ISH probe and reporter design. The ISH probe consists of the target-binding and readout domains, the former being a 35–50-nt DNA sequence that hybridizes with target RNA and the latter a 60–80-nt DNA sequence containing four consecutive 10–20-nt reporter-landing domains, where each landing domain can be hybridized with a unique reporter. With a 64-bit barcoding design, there is a total of 64 unique reporter-landing sequences. Each reporter is a single-color branched structure assembled with oligos conjugated with one of four fluorophores, and will be detected as one of four colors (blue, green, yellow or red) in SMI images. Each reporter has a controlled number of 15–60 dyes with six PC sites to efficiently quench signals by UV illumination and a washing step before each cyclic reporter readout. b, Illustration of the SMI RNA assay workflow. The FFPE slide undergoes standard tissue preparation to expose RNA targets for ISH probe hybridization. The sample is assembled into a flow cell and loaded onto an SMI instrument for cyclic readout with 16 sets of reporters. Because each reporter set contains four reporters with four different fluorophores, 64 unique reporters are used in the SMI assay to bind to the different reporter-landing domains on ISH probes. Following each set of reporter hybridization, high-resolution Z-stacked images are acquired followed by cleavage and removal of fluorophores from the reporters before incubation with the next set of reporters. c, Analysis workflow used to transform raw images into decoded RNA transcripts at subcellular resolution. The workflow includes: (1) 3D primary image processing to identify and register reporter spots; (2) decoding of reporter spots to RNA transcripts with registered x, y, z spatial location; (3) outlining of nuclei and cell boundaries with DAPI and antibodies after cyclic reporter readout for morphology-based cell segmentation; and (4) assigning RNA transcripts to single cells. As an example of the final output of the analysis workflow, three identified genes (COL1A1 (yellow), IGHG1 (cyan) and KRT19 (orange)) in FFPE human lung tissue are overlaid with segmented cells based on their registered spatial information. Scale bars, 100 μm for raw images, 20 μm for images with morphology-based cell segmentation images. d, Demonstration of subcellular resolution in U2OS cells. Transcripts included in a 980-plex panel located in mitochondria and nuclei are shown. These genes were read out on fixed U2OS cells with the SMI RNA assay, followed by labeling with morphology antibodies for membrane (CD298) and mitochondria, as well as DAPI stain for nuclei. Scale bar, 20 μm. Table shows quantification of transcripts detected in these two cell compartments.
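The 64-bit barcode design in the caption and main text (16 readout rounds, four reporter colors per round, Hamming weight 4, pairwise Hamming distance of at least 4) can be illustrated with a minimal sketch. The bit layout `round * 4 + color`, the function names and the example barcodes are assumptions made for illustration only; they are not the assignments used in the SMI panel.

```python
from itertools import combinations

N_ROUNDS = 16  # readout rounds, as stated in the text
N_COLORS = 4   # single-color reporters per round, as stated in the text


def encode(on_spots):
    """Map {round: color} (both 0-based) to a 64-bit integer barcode.

    The bit position round * N_COLORS + color is an illustrative
    assumption, not the layout used by the instrument software.
    """
    bits = 0
    for rnd, color in on_spots.items():
        if not (0 <= rnd < N_ROUNDS and 0 <= color < N_COLORS):
            raise ValueError("round or color out of range")
        bits |= 1 << (rnd * N_COLORS + color)
    return bits


def hamming_weight(code):
    """Number of 'on' bits in a barcode."""
    return bin(code).count("1")


def hamming_distance(a, b):
    """Number of bit positions at which two barcodes differ."""
    return bin(a ^ b).count("1")


def is_valid_codebook(codes, weight=4, min_distance=4):
    """Check the HW4 and HD4 constraints over a set of barcodes."""
    if any(hamming_weight(c) != weight for c in codes):
        return False
    return all(hamming_distance(a, b) >= min_distance
               for a, b in combinations(codes, 2))


# Hypothetical barcodes: each is 'on' in four rounds and 'off' in 12.
gene_a = encode({0: 0, 3: 1, 7: 2, 12: 3})
gene_b = encode({0: 0, 3: 1, 8: 2, 13: 3})  # shares two on-bits with gene_a
gene_c = encode({0: 0, 3: 1, 7: 2, 13: 3})  # shares three on-bits with gene_a

print(is_valid_codebook([gene_a, gene_b]))          # distance 4: accepted
print(is_valid_codebook([gene_a, gene_b, gene_c]))  # gene_c is too close
```

Because every barcode has weight 4, two barcodes that share k on-bits differ in 2 × (4 − k) positions, so the HD4 constraint is equivalent to requiring that any two barcodes share at most two reporter-binding events.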


during the 16 rounds of reporter hybridization. This on and off signal barcode design allows for continued expansion to even higher plex, because only a subset of RNAs is on in any given cycle. Note that the nature of these on–off imaging signals represents a ‘deterministic super-resolution imaging system’, and hence each SMI imaging barcode can be located well below the diffraction limit of the imaging system. For each reporter hybridization round, a single reporter can bind to one of the reporter-landing domains on the ISH probe of the gene (Fig. 1a,b and Supplementary Fig. 1). The 64-bit encoding scheme with HD4 and HW4 yields 1,210 barcodes, from which a subset of 980 is selected to detect 960 target genes and 20 negative control targets (Supplementary Table 1). Negative control probes are modeled after synthetic sequences from the External RNA Controls Consortium (ERCC)22. Because they target alien sequences that are nonexistent in human tissue, negative control probes serve as nontarget controls for quantification of nonspecific ISH probe hybridization. Five RNA-detection oligonucleotide probes/tiles are designed per gene and negative control target (Supplementary Table 2). The colors of the barcodes are assigned by minimizing color overlap between universal high expressors (Supplementary Table 3). The remaining 197 barcodes were used as blank controls for misidentification quantification of reporter readout.

Another key component of the SMI platform is the integrated and fully automated fluidic and imaging capability. SMI features a large scan area (range 16–375 mm^2) on each tissue slide and supports up to four slides simultaneously. Run time per sample is dependent on the number of flow cells utilized, the area imaged per sample and the size of the target assay panel. The on-instrument time for the commercial-release SMI instrument will range from four samples per day for a ~16-mm^2 tissue area to ~0.5 per day for 100-mm^2 samples. The ability to measure four slides per single SMI run, with each slide allowing the placing of multiple samples onto a single slide (375 mm^2 active imaging area), provides extreme flexibility in the maximized throughput of the SMI system.

The RNA and protein SMI imaging barcodes over-background pixels are located in one of the four color channels, and the intensity distribution of each barcode is fit using a two-dimensional (2D) parabola at subpixel resolution. Images from different cycles and different colors are registered using an affine transform to align the fluorescent bead reagents added to each tissue slide. The positions of the imaging spots contribute to an individual RNA target call by generating a very rich super-resolution data matrix of the final x, y and z coordinates of the analyte. The position of each transcript call made using SMI can be estimated, on average, within a disc of radius ~50 nm by making use of the super-resolution locations of individual reporter-binding events contributing to that transcript call over the timescale for the entire experiment (Supplementary Fig. 2). Z-axis localization is dependent on the number of optical Z-stacks acquired during data collection. Because a step size of 800 nm was used for data collection, z-axis localization is approximately ±400 nm. Unlike the x and y axes, there was no fitting in the z axis with different planes and accuracy is close to pixel- or step-size resolution.

Three-dimensional RNA mapping and cell segmentation at subcellular resolution

The combination of SMI chemistry, hardware and software enables high-plex spatial RNA profiling within single cells at very high sensitivity. For spatial profiling, the data of fluorescent signals in Z-stacked raw images are transformed into decoded RNA transcript information at their registered spatial location in three dimensions (3D) and assigned to single cells identified by segmentation (Fig. 1c). During primary image processing, the fluorescence signals in raw images are transformed into digital output comprising detected spots localized in x-, y- and z-dimensions for subsequent RNA decoding. A spot is identified in an image as an isolated fluorescence signal with intensity much higher than its neighbors. In biological tissues, however, dense fluorescent spots can be detected in a small spatial region, which may limit the accuracy of resolving each spot. To solve this potential issue, we developed custom image analysis algorithms to process 3D multichannel image stacks obtained in each field of view (FOV). The key objective of this analytical method is to reduce the multidimensional image stack to a single list of individual reporter-binding events. This process is performed across all FOVs, concurrently with image acquisition during cyclic reporter readout. All spots pertaining to a given FOV are collated into a single list of xyz locations of all individual reporter-binding events. This list is used in the next step to decode the gene-specific barcodes formed by these reporter-binding events.

The decoding algorithm enables the detection of as many transcripts as possible in crowded reporter-binding events while limiting the rate of error calls that can be generated due to the presence of multiple transcripts in close vicinity. In this algorithm, each unique xyz location with at least one reporter-binding event is considered a ‘seed’ and used to construct a neighborhood in which gene-specific barcodes are searched. Two passes of the data with the seed-centered neighborhood search are used to obtain a list of potential transcripts with their spatial locations. In the first pass-through of the data, the neighborhood search is limited to a radius of 0.5 pixels (90 nm). Because every gene barcode has four on spots, any seeds with fewer than four unique reporter probe-binding events in the neighborhood are removed from consideration for transcript decoding due to their inability to form a complete gene-specific barcode. All possible four reporter combinations of unique reporter probes in a seed’s neighborhood are then matched against a table of all potential gene-specific barcodes to detect the presence of a gene in that seed’s neighborhood. If more than one gene is detected in a seed’s neighborhood, the seed and all transcripts detected in that neighborhood are dropped from further analysis. The impact of filtering out seeds with multiple target calls was minor (3–10%) on overall transcript calls.

Any reporter-binding events used in making transcript calls were removed from the dataset, and the process was repeated with a slightly increased search radius of 0.75 pixels to try to recover any potential transcript calls that may have been lost by local tissue motion. The transcript calls generated by these two passes of the data were further filtered to remove any potential duplicate calls, or calls in neighborhoods with a high probability of making a transcript call by random chance. These various filtering steps are crucial to addressing potentially elevated call-rate error and duplicate calls of individual transcripts, ensuring that at the completion of the transcript decoding process we retain only high-confidence transcript calls while maximizing the number of decoded transcripts.

Following RNA decoding and 3D spatial registration of target signals, the cell segmentation process is performed on tissue sections of FFPE human lung to define cell boundaries based on tissue morphology markers (Fig. 1c).

In this cell segmentation process, tissue is stained with a nuclear dye (DAPI) and labeled with antibodies for morphology markers such as membrane (CD298), epithelial cells (pancytokeratin (PanCK)) and T cells (CD3). Defining accurate cell boundaries by segmentation is critical to data quality because it affects the spatial assignment of transcripts to specific cells. It is challenging to achieve high accuracy on tissue sections in which cells are tightly packed, have shared 3D boundaries or are unevenly stained with morphology markers. To address this issue, we developed a cell segmentation pipeline that combines image preprocessing with machine learning for better accuracy (Supplementary Figs. 3 and 4). The signals of membrane channels were combined and normalized to the range of the nuclear channel, and then image subtraction was performed between the nuclear channel and normalized membrane image. After these preprocessing steps the images were fed into pretrained Cellpose neural network models18. The Cellpose model has been validated with an average precision metric


of 0.91 (that is, 9% error) at a threshold level of 0.5 on the specialized data. In this study we augmented the pure protein-based (cytoplasmic) Cellpose algorithm with an additional independent analysis of SMI nuclear-based signal using the Cellpose algorithm for a more accurate final cell segmentation. Using this improved approach, we estimate the net segmentation error rate to be approximately 5–9%.

In the registered image, each transcript location is mapped to the corresponding cell and within the cellular compartment (for example, nuclei, cytoplasm, membrane). To demonstrate accurate RNA detection at subcellular resolution, we designed a set of SMI probes that target both mitochondrial and nuclear genes32,33. The cell compartments in U2OS cells were labeled with fluorescent probe-conjugated antibodies for mitochondria and membrane (CD298), and stained with DAPI for nuclei (Fig. 1d). RNA profiling data shows that 98.7% of detected mitochondrial genes specifically colocalized with the antibody-based mitochondrial segment and 96.9% of nuclear genes strictly located within nuclei. These results indicate high accuracy in RNA detection and its spatial assignment at subcellular resolution.

SMI 980-plex RNA panel and cross-platform validation

For RNA profiling we designed a panel targeting 980 genes to investigate the biology of single cells or subcellular compartments. Among those genes, 211 were selected for use in cell type identification with the remainder designed to capture critical cell states, cell–cell interactions and hormone activity (Supplementary Fig. 5; list of genes in Supplementary Table 1). The panel also includes 20 ERCC negative targets containing hybridization regions not complementary to the human transcriptome34. The panel design and gene selection process are fully described in Methods.

To ensure that this panel was capable of quantifying RNAs in a wide range of biological samples, we profiled the SMI 980-plex panel on 16 different cell lines (CCRF-CEM, COLO205, DU145, EKVX, HCT116, HL60, HOP92, HS578T, IGROV1, M14, MDA-MB-468, MOLT4, PC3, RPMI-8226, SKMEL2 and SUDHL4), spanning a broad range of biology and expression patterns (Supplementary Table 4). Furthermore, to understand how SMI performs on ‘real-world’ FFPE samples, the 980-plex assay was run on eight FFPE slides from five non-small cell lung cancer (NSCLC) tumors (Supplementary Table 5). All samples were derived from tumor tissues archived in 2017 and 2018, the typical age for many archived cancer samples available to researchers (Supplementary Table 6). The RNA in these samples ranged from ‘too degraded to sequence’ to ‘medium quality to sequence’ according to either RNA integrity number (RIN) or DV200 scoring for the degree of RNA fragmentation (Supplementary Table 7 and Supplementary Fig. 6). These data, therefore, represent a range of sample types, including the most difficult class of archived material any researcher is likely to examine.

Using the SMI image processing methodology, hundreds of transcripts per cell were detected in the tested FFPE cell line pellets with a maximum of 4,500 transcripts resolved in a single cell (Fig. 2a and Supplementary Table 4). Testing in FFPE NSCLC tissues identified an average of 260 counts per cell and a maximum of 2,737 transcripts (Fig. 2b and Supplementary Table 5). SMI RNA profiling also features a cell dropout rate (cells containing fewer than 20 total transcripts) of <3% in the 16 cell lines tested. Background signals for the 20 ERCC control targets were extremely low (~0.04 counts per target per cell), resulting in 12.3% of transcripts called being off-target background. Our single-cell distribution analysis revealed that even low-expressing targets were detected well above the background (Fig. 2a,b), confirming the high sensitivity of the SMI RNA assay.

For cross-platform validation, SMI RNA counts were compared with bulk RNA-seq FPKM values (fragments per kilobase of transcript per million mapped reads) published in the Cancer Cell Line Encyclopedia (CCLE)35 and TPM values (transcripts per million) published as part of the NCI-60 Human Tumor Cell Lines Screen36,37. Segmented regression was fit predicting raw SMI counts from RNA-seq (Fig. 2c). For the average cell line, the breakpoint in the segmented regression occurred at an RNA-seq value ranging from 0.36 to 5.20 (median, 1.443) (Fig. 2d). Above this breakpoint, SMI counts linearly tracked RNA-seq counts with high correlation (Fig. 2c). The results show adequate concordance between SMI counts and bulk RNA-seq values (Fig. 2e), indicating that the panel can measure a wide range of biological processes. Furthermore, the comparison of SMI data from three representative cell lines with those from a conventional hybridization-based assay, RNAscope, revealed that the average counts of five representative genes per cell were highly correlated between these platforms (R^2 = 0.8122–0.9993; Fig. 2f). SMI counts per cell were also compared with published CPM-normalized scRNA-seq counts38 from three matched cancer cell lines (Pearson’s r = 0.74–0.76; Supplementary Fig. 7a). In addition, SMI data were compared with this scRNA-seq dataset for the fraction of cells having nonzero counts for each of the 788 genes common to both datasets (Supplementary Fig. 7b). This fraction for each gene indicates the sensitivity ratio between the two technologies, as recently described in ref. 39. The data points above the slope = 1 line in a scatterplot would indicate that one technology has higher sensitivity than the other. In the three cell lines shown, the ratios of sensitivity of SMI over scRNA-seq are 520/268 in the EKVX lung adenocarcinoma cell line, 609/179 in the SKMEL2 melanoma cell line and 569/219 in the T47D breast cancer cell line.

For the NSCLC dataset, SMI counts were compared with a published scRNA-seq dataset of CD45+ cells in NSCLC40. Mean counts for the 952 shared genes in the immune cell population (100,193 cells) of the SMI dataset were compared with 215,423 cells in the scRNA-seq dataset. In the SMI data, the mean of negative targets was subtracted from each gene’s mean count. All counts from SMI and scRNA-seq were thresholded up to 1 × 10^−3 to enable log transformation and ensure positive numbers after subtracting the background from SMI counts. Counts were log2 transformed and all 952 genes are plotted in Fig. 2g (Pearson’s r = 0.74). For the average gene, the log2 ratio of the SMI signal strength over scRNA-seq signal strength is 0.61; that is, background-subtracted SMI data show a 2^0.6 = 1.51-fold stronger signal than scRNA-seq. For those genes with counts in the 25th and 75th percentiles for scRNA-seq, the average log2 ratio is 2.62 and −1.42, respectively. In addition, we compared the SMI bulk profile of each NSCLC sample with the log10 mean FPKM of NSCLC data in The Cancer Genome Atlas using the breakpoint regression shown in Fig. 2c. The breakpoints of five tissues, above which signal predominates over background, occurred between 3.3 and 9.6 FPKM (Supplementary Fig. 7c). Correlations above breakpoints ranged from 0.57 to 0.75. A small number of high-outlier genes are evident, such as MALAT1, MZT2A, COL9A2, NPPC, DUSP5 and TYK2. These outliers were all one log10 unit above the trend line in three or four of the five samples. These outliers can be explained by sample variation due to variation in cell type composition and tumor states used for the public RNA-seq dataset. Recurring outliers could be partially due to platform effects such as different target accessibility for detection, or different sample preparation procedures for the in situ imaging-based platform versus bulk RNA-seq. Most outliers in Lung 6 are keratin-encoding genes and appeared in this sample but not in others. This indicates that SMI procedures have only slight bias in gene quantification.

RNA-based cell typing and interaction analysis in NSCLC

Measurements of eight slides from NSCLC samples represent a total of 800,327 cells of which nearly 96% passed quality control, yielding 769,114 analyzable cells (Supplementary Table 5). An average of 260 transcripts per cell were detected (Supplementary Tables 5 and 8), and 850 of the 960 potential genes in this panel were measured above the background (Supplementary Table 5). A first step in the analysis of these datasets was to identify the high-level cell types of as many

[Fig. 2 image. Panels a,b: histograms of number of cells versus total transcripts per cell (MDA-MB-468 cells; lung tissue), with counts per cell for negative probes and for low, medium and high expressers (H4C3, MALAT1 and EEF1A1 in MDA-MB-468 cells; CD274, IDO1 and HLA-C in tumor cells). Panel c: log10(CosMx count) versus log10(RNA-seq) for 16 cell lines (CCRF-CEM, COLO205, DU145, EKVX, HCT116, HL60, HOP92, HS578T, IGROV1, M14, MDA-MB-468, MOLT4, PC3, RPMI-8226, SKMEL2 and SUDHL4). Panel d: breakpoint RNA-seq value (FPKM/TPM) for each cell line. Panel e: correlation of log-transformed data above breakpoint for each cell line. Panel f: RNAscope counts per cell versus SMI counts per cell for CD8A, CD44, CD68, PTPRC and MS4A1. Panel g: SMI log2 versus scRNA log2 mean counts per gene, with labeled outlier genes; Pearson's r = 0.74.]

Fig. 2 | Single-cell distribution and detection sensitivity of low, medium and high expressers and comparison of SMI data with RNA-seq and RNAscope data. a, Single-cell distribution of total transcripts per cell in FFPE cell line MDA-MB-468. Average counts per cell of genes representing low (H4C3), medium (MALAT1) and high (EEF1A1) expressers in this cell line and negative probes. b, Single-cell distribution of total transcripts per cell in FFPE lung tissue. Among only tumor cells, average counts per cell of genes representing low (CD274), medium (IDO1) and high (HLA-C) expressers and negative targets. c, For 16 cell lines, RNA-seq values are plotted against raw SMI counts. RNA-seq values are in TPM for cell lines from NCI-60, and in FPKM for cell lines from CCLE. Orange lines show breakpoints from segmented regression; red lines denote segmented regression fit. d, Estimated breakpoints and 95% confidence intervals from each cell line, in FPKM or TPM depending on cell line origin. Red line denotes median value. e, Estimates and 95% confidence intervals for correlation between log-transformed RNA-seq and SMI data of genes with RNA-seq values above breakpoint. Red line denotes median value. f, Average counts per cell of five genes (CD8A, CD44, CD68, PTPRC and MS4A1) are compared between RNAscope and SMI data from three representative cell lines (MDA-MB-468, SUDHL4 and CCRF-CEM). g, For the SMI NSCLC dataset, log2 of mean counts per gene in immune cell populations (100,193 cells) compared with 215,423 cells in a published scRNA-seq dataset in NSCLC40.
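
The breakpoint and above-breakpoint correlation summarized in Fig. 2c–e can be illustrated with a minimal sketch. This is not the authors' segmented regression procedure, which is not specified here; it assumes a simple hinge model (flat below the breakpoint, linear above it) fitted by grid search over the observed log10 RNA-seq values. All function names and the toy data are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def fit_hinge(xs, ys, b):
    """Least-squares fit of y = c + s * max(0, x - b); returns (c, s, sse) or None."""
    hs = [max(0.0, x - b) for x in xs]
    n = len(xs)
    mh, my = sum(hs) / n, sum(ys) / n
    shh = sum((h - mh) ** 2 for h in hs)
    if shh == 0.0:  # no point lies above b, so the slope is undefined
        return None
    s = sum((h - mh) * (y - my) for h, y in zip(hs, ys)) / shh
    c = my - s * mh
    sse = sum((y - (c + s * h)) ** 2 for h, y in zip(hs, ys))
    return c, s, sse

def estimate_breakpoint(xs, ys):
    """Grid search over observed x values for the hinge location with smallest SSE."""
    best_b, best_sse = None, None
    for b in sorted(set(xs)):
        fit = fit_hinge(xs, ys, b)
        if fit is not None and (best_sse is None or fit[2] < best_sse):
            best_b, best_sse = b, fit[2]
    return best_b

def correlation_above(xs, ys, b):
    """Pearson correlation restricted to points with x above the breakpoint b."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x > b]
    return pearson([p[0] for p in pairs], [p[1] for p in pairs])

# Toy data on a log10 scale: flat background below 1.0, linear response above it.
log_rnaseq = [-2.0 + 0.5 * i for i in range(13)]
log_smi = [0.5 + max(0.0, x - 1.0) for x in log_rnaseq]

breakpoint_est = estimate_breakpoint(log_rnaseq, log_smi)
r_above = correlation_above(log_rnaseq, log_smi, breakpoint_est)
```

With real data, each cell line would contribute one breakpoint estimate and one correlation; confidence intervals such as those in Fig. 2d,e would need an additional step (for example, bootstrapping), which is omitted here.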

spatially resolved cells as possible. Single-cell expression profiles were derived by counting the transcripts of each gene that fell within the area assigned to a cell according to the cell segmentation algorithm (Supplementary Figs. 3 and 4). An average of 4% of cells contained fewer than 20 total transcripts (Supplementary Table 5), which were omitted from further analysis because this class of cells behaves anomalously when projected into lower dimensions (for example, uniform manifold approximation and projection (UMAP)). The average false code had 0.0092 counts in the average cell, indicated by mean false calls per cell per target in Supplementary Table 5.

Cell type in NSCLC tissue was determined by comparison of individual cells’ expression profiles with reference profiles for different cell types (scRNA-seq and bulk RNA-seq of flow-sorted blood and stroma databases)41, assigning each cell to the cell type under which its profile was most likely (Supplementary Fig. 8 and Supplementary Table 9). Likelihood was defined using a negative binomial distribution, with mean defined by a cell type’s reference profile plus expected background and with a size parameter set at 10 to allow for extensive overdispersion. Reference profiles of immune and stromal cell types were acquired from previous work41. New reference profiles were defined with the mean expression profiles of PanCK+ clusters in the UMAP projection. These new clusters were all PanCK+ and included five tumor-specific clusters and one shared across all tumors. Tumor-specific clusters were labeled ‘tumor’ and the shared cluster ‘epithelial’, and this interpretation was confirmed by a pathologist review.

The matrix of single-cell gene expression profiles was analyzed using UMAP, with each cell assigned to a cell type as described above (Fig. 3a). It should be emphasized that, for classic single-cell studies, this UMAP–cell type combination plot represents the essential information content of the experiment. For spatially resolved studies, however, this basic data type is the very beginning of a rich spatial analysis. In spatial profiling, each cell in this UMAP–cell ID representation has additional information of high-resolution x, y and z spatial coordinates associated with it. For example, we observed that in these tumors B cells typically gathered in dense clusters accompanied by T cells (Fig. 3b). Plasmablasts also gathered densely, often proximal to smaller numbers of T cells. Macrophages both gathered in small clusters and trafficked diffusely throughout tumors. Neutrophils were usually found filling large vacancies within tumors, accompanied by very few other immune cells (Fig. 3b).

This spatial information also allows detailed analysis of cell neighborhoods. We defined a neighborhood matrix encoding the number of each cell type among each cell’s 200 closest neighbors (Fig. 3c). Neighborhood matrices can be tailored to answer a broad range of different biological questions. For example, neighborhoods could be defined over smaller or larger distances, or a neighborhood matrix could encode average gene expression profiles or average gene expression profiles within specific cell populations.

Once a neighborhood matrix was defined, it was subjected to traditional single-cell analyses. A UMAP projection of our neighborhood matrix shows the diverse microenvironment states within these tumors (Fig. 3d). For example, it shows neighborhoods of almost pure tumor cells with very low levels of macrophage and T cell infiltration. Some neighborhoods were dominated by single cell types, including macrophages, neutrophils, plasmablasts and myeloid dendritic cells (mDCs); other neighborhoods held distinct mixtures of immune populations, such as B cells with T cells, macrophages with T cells and plasmacytoid dendritic cells (pDCs), and macrophages with neutrophils and infrequent lymphoid cells. Some of these neighborhoods were specific to single tumors while others were shared across tumors. Finally, by clustering this neighborhood matrix we partitioned the tumor microenvironment into distinct niches. Plotting niches in physical space clarified the spatial organization within, and the contrasts between, these tumors (Fig. 3e).

Studies across larger numbers of samples require sample-level summary statistics. Using the cell types derived from the gene expression matrix, we found these tissues to differ in the relative abundances of the immune cell populations within each tumor (Supplementary Fig. 9a). Using niches derived from the neighborhood matrix, we identified additional differences in the multicellular niches comprising their microenvironments (Supplementary Fig. 9b). Niche abundances expand the information beyond that which cell type abundances alone can provide. For instance, samples Lung 5 and Lung 6 have similar macrophage abundance (7 and 8% of cells, respectively) while only Lung 6 contains the macrophage-dominated niche (9% of cells).

To define more nuanced sample-level summaries, we scored each cell for the number of tumor cells among its 100 closest neighbors, a metric of how much it has invaded into the tumor rather than remaining confined to the stroma. Contrasting this invasiveness score across cell types and across tumors revealed differences both within and between tumors (Supplementary Fig. 10). For example, in Lung 6, macrophages were primarily surrounded by non-tumor cells while neutrophils were more likely to be surrounded by tumor cells.

By contrasting the gene expression and neighborhood matrices, we examined more advanced questions for every gene, cell type and neighborhood characteristic: how does this cell type change expression in response to this neighborhood characteristic? And how does this dependency vary across tissues? To answer these questions, we investigated the changes of gene expression in macrophages between niches in Lung 6 (Fig. 3f). More than 43% of genes (415 of 960) had expression changes between niches with false discovery rate (FDR) <0.05. The most statistically significant gene was SPP1 (P = 5 × 10^−61). SPP1 has been shown to mediate macrophage polarization and upregulate PD-L1 expression21. Plotting macrophage SPP1 expression across the physical space of Lung 6 demonstrated two clear subpopulations of macrophage (Fig. 3g): SPP1-high macrophages dominated the tumor interior and the upper half of the stroma whereas SPP1-negative macrophages dominated the lower half of the stroma and the long thrust of diverse immune cells cutting deep into the tumor along the vasculature. Spatial expression analysis of SPP1 and HLA-DQA1 transcripts together (Supplementary Fig. 11) revealed these genes to be expressed in mostly mutually exclusive regions of the tumor, suggesting an antigen-presentation role for SPP1-negative macrophages. Plotting the density of macrophage SPP1 expression across tumors identified changes between tumors and between niches (Supplementary Fig. 12). For every tumor, macrophages in the diverse ‘immune’ niche had lower SPP1 expression than in almost any other niche.

Fig. 3 | Spatial RNA detection to identify cell types and cell–cell interactions in FFPE human NSCLC tissue. a, UMAP projections of each tissue based on gene expression matrix. b, Cells in physical locations (x, y coordinates). a,b, Color denotes cell type. c, Definition of a neighborhood matrix. For each cell, the nearest neighbors are identified and a summary of those neighbors is recorded. Here, the abundance of each cell type is used. This operation is performed for all cells, defining a matrix of cells and neighborhood characteristics. d, UMAP projection and clustering of cells based on the neighborhood matrix. Left: colored by cell type; center: colored by tissue; right: colored by clustering of neighborhood data or niche. e, Spatial arrangement of niches. For each cell, the frequency of each cell type among its 200 closest neighbors is recorded; cells are then clustered into niches based on these data. Cells are shown in their physical locations and colored by niche. f, Mean gene expression of macrophages in each niche in Lung 6. The green sidebar shows statistical significance from a global likelihood ratio test for uniform expression across all niches. g, Macrophage expression of SPP1 in Lung 6. In this tissue, macrophages located within the tumor express SPP1, as shown in the upper half of the dense macrophage region; in contrast, macrophages located along the vasculature and lymphoid cells in the tumor rarely express SPP1. Macrophages denoted by bold points; SPP1 expression in macrophages is shown on a color scale of black to red. These data can be interactively viewed at higher resolution in the CosMx Data Viewer (https://nanostring.com/products/cosmx-spatial-molecular-imager/ffpe-dataset/). NK, natural killer cell; Treg, regulatory T cell.
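
The neighborhood matrix defined for Fig. 3c (for each cell, the number of cells of each type among its K closest neighbors) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses a brute-force nearest-neighbor search on x, y coordinates, and the function name, toy coordinates and toy cell-type labels are hypothetical.

```python
from collections import Counter

def neighborhood_matrix(coords, cell_types, k):
    """Count cell types among each cell's k nearest other cells (brute force).

    Returns the sorted list of cell types (the column order) and one row of
    counts per cell.
    """
    types = sorted(set(cell_types))
    rows = []
    for i, (xi, yi) in enumerate(coords):
        # Squared Euclidean distance to every other cell, nearest first.
        dists = sorted(
            ((xi - xj) ** 2 + (yi - yj) ** 2, j)
            for j, (xj, yj) in enumerate(coords)
            if j != i
        )
        counts = Counter(cell_types[j] for _, j in dists[:k])
        rows.append([counts.get(t, 0) for t in types])
    return types, rows

# Toy tissue: a small tumor-rich corner and a small lymphoid corner.
coords = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (10, 11)]
cell_types = ["tumor", "tumor", "macrophage", "T cell", "T cell", "B cell"]

types, rows = neighborhood_matrix(coords, cell_types, k=2)

# A per-cell count of tumor neighbors, analogous to the invasiveness score,
# is the 'tumor' column of the same matrix.
tumor_neighbors = [row[types.index("tumor")] for row in rows]
```

The brute-force search scales quadratically with the number of cells, so a dataset of the size described here would call for a spatial index such as a k-d tree; the resulting matrix can then be clustered to define niches.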

[Fig. 3 image. Tissues shown: Lung 5-3, Lung 5-4, Lung 5-5, Lung 6, Lung 9, Lung 12 and Lung 13; scale bars, 1 mm. Cell type legend: endothelial, fibroblast, macrophage, mast, mDC, monocyte, neutrophil, NK, pDC, plasmablast, Treg, tumor, epithelial, memory CD4 T cell, naive CD4 T cell, memory CD8 T cell, naive CD8 T cell and B cell. Panel c schematic: neighborhood matrix, the number of each cell type among the closest K neighbors of each cell. Niche legend: plasmablast-enriched stroma, lymphoid structure, myeloid-enriched stroma, tumor–stroma boundary, tumor interior, stroma, neutrophils, macrophages and immune-enriched. Panels f,g: sidebar marking FDR < 0.05; SPP1 color scale from 0 to 10+.]

[Fig. 4 image. Panel a: functional categories of the 100 unique ligand–receptor pairs in the 980-plex panel (cadherins, calprotectin, other binding, EGFR signaling, TNF signaling, ephrins, TLR signaling, PDGFR signaling, Notch signaling, MHC class I and immune checkpoints). Panel b: workflow. Step 1: build Delaunay network. Step 2: extract linkages. Step 3a: reorient linkages. Step 4a: randomize linkages. Steps 3b and 4b: calculate score across linkages (LR interaction score, computed from the LiRi terms and nlinkages). Step 5: calculate P value by comparing the interaction score with the distribution of randomized scores. Panel c: normalized interaction score (0 to 1.0) in Lung 6, Lung 5, Lung 13, Lung 9 and Lung 12 for CEACAM6/EGFR, TNFSF13B/TNFRSF17, CXCL16/CXCR6, HLA-E/KLRK1, APP/TNFRSF21, EFNB1/ERBB2, NRG1/ERBB2, CD274/PDCD1, TNFSF12/TNFRSF12A, CDH1/ITGAE, CDH1/EGFR, CEACAM1/EGFR, ANXA1/EGFR, NRG4/EGFR, Calprotectin/ALCAM and NRG4/ERBB2.]

Fig. 4 | Paired LR expression between interacting tumor cells and T cells varies across tumors. a, One hundred unique LR pairs are included in the SMI 980-plex panel. These interactions fall into functional categories in relative proportion. b, LR interactions (Li, Ri) are scored by first building a Delaunay network based on cellular spatial locations. The desired linkages are extracted from the network and used to calculate the LR interaction score, which measures pairwise LR expression between specified cell types. Linkages are reoriented to differentiate between ligand- and receptor-expressing cells. This score is compared with a distribution of scores produced using randomized linkages. Such a comparison can determine whether the given configuration of interactions between specified cell types significantly enriches for pairwise LR expression. c, Sixteen LR pairs exhibited spatial significance in at least one of five different lung cancer tumors. The interaction scores for these pairs are scaled, each score having a maximum of 1 across all tumors.
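
The scoring and randomization steps described for Fig. 4b can be illustrated with a minimal sketch. Several points are assumptions rather than the authors' method: the score is taken as the mean product of ligand and receptor expression over oriented linkages, the randomization shuffles the receptor-side cells among the linkages, and the linkages are supplied directly instead of being extracted from a Delaunay network. All names and the toy values are hypothetical.

```python
import random

def lr_score(linkages, ligand_expr, receptor_expr):
    """Mean of ligand x receptor expression over oriented (sender, receiver) linkages."""
    total = sum(ligand_expr[s] * receptor_expr[r] for s, r in linkages)
    return total / len(linkages)

def permutation_p_value(linkages, ligand_expr, receptor_expr, n_perm=1000, seed=0):
    """Compare the observed score with scores from randomized linkages.

    Randomization here shuffles which receiver is paired with which sender
    (an assumed scheme). Returns (observed score, one-sided P value).
    """
    rng = random.Random(seed)
    observed = lr_score(linkages, ligand_expr, receptor_expr)
    senders = [s for s, _ in linkages]
    receivers = [r for _, r in linkages]
    hits = 0
    for _ in range(n_perm):
        shuffled = receivers[:]
        rng.shuffle(shuffled)
        randomized = list(zip(senders, shuffled))
        if lr_score(randomized, ligand_expr, receptor_expr) >= observed:
            hits += 1
    # Add-one correction keeps the estimated P value strictly positive.
    return observed, (hits + 1) / (n_perm + 1)

# Toy example: tumor cells (senders) linked to adjacent T cells (receivers).
ligand_expr = {"tumor1": 5, "tumor2": 0, "tumor3": 4, "tumor4": 0}
receptor_expr = {"T1": 3, "T2": 0, "T3": 2, "T4": 0}
linkages = [("tumor1", "T1"), ("tumor2", "T2"), ("tumor3", "T3"), ("tumor4", "T4")]

score, p_value = permutation_p_value(linkages, ligand_expr, receptor_expr)
```

In this toy case the ligand-high tumor cells are linked to the receptor-high T cells, so few randomized configurations reach the observed score and the estimated P value is small.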

With the spatial map of cell types in place, we turned to interrogate the interactions between tumor cells and T cells. First, we annotated 100 canonical ligand–receptor (LR) partners within the 980-plex panel (Fig. 4a). Within our panel, many LR pairs relevant to the tumor immune interface can be found, including various immune checkpoints such as PD-L1/PD-1 and CTLA4/CD86. To understand how these interactions change across space and between samples, we devised a new computational method to search for coordinated LR expression in neighboring cell types (Fig. 4b). Using this method, we discovered 16 LR pairs that were enriched at the tumor–T cell interface in at least one of five lung tumors (Fig. 4c); many of these interactions were present in only a subset of tumors. PD-L1/PD-1 (CD274/PDCD1) exhibited a higher interaction score across Lungs 5, 9, 12 and 13 but remained lower in Lung 6. Notably, HER2 (ERBB2) showed a similar profile across tumors. However, a member of the same receptor family, EGFR (EGFR), maintained a higher interaction score in Lung 6 and decreased interaction in all other lung tumors. This between-tumor variability in LR signaling strength is consistent with the known variability in tumor response to immune checkpoint inhibitors and EGFR inhibitors.

Reproducibility across serial sections of FFPE NSCLC tumor

To obtain a better understanding of the reproducibility of the SMI platform, three serial sections from an NSCLC tissue were profiled. Although these serial sections would not contain the same individual cells, they had nearly identical tissue architecture (Lung 5-3, Lung 5-4 and Lung 5-5 in Fig. 3), allowing comparisons of the same tissue region across slides. This experiment offers an opportunity to observe technical variability that tests all aspects of the SMI platform: independent tissue preparation, cyclic chemistry, imaging, primary/secondary/tertiary data processing and analysis in solid tissue with minimal biological variability.

As a first step, we examined the total-slide integrated RNA expression profile from each section by acquiring the total counts of each gene across all cells. These whole-slide integrated bulk profiles, each including 94,977–105,903 cells, were highly concordant. The lowest correlation between the log-scale expression profile of any pair of the three replicates was 0.989 (Supplementary Fig. 13).

To demonstrate reproducibility on a smaller spatial scale than a whole section, we partitioned two replicate (rep) sections into 74 grid

[Fig. 5 image. Panel a: Lung 5-3 and Lung 5-5 sections with grid overlay. Panel b: Lung 5-3 square profiles and Lung 5-5 square profiles; log-scale expression in Lung 5-5 plotted against log-scale expression in Lung 5-3.]

Fig. 5 | Concordance between serial FFPE lung sections over a spatial grid. a, Each serial section of FFPE lung tissue (Lung 5-3 and Lung 5-5) was partitioned into a grid, each square containing 600–2,000 cells. b, Concordance between the 960-gene expression profiles of matching grid squares. Six squares overlapping a hole in Lung 5-3 are excluded. To the right of the hole, a square with worse concordance is visible; this square contains part of a tertiary lymphoid structure in the Lung 5-3 slide but not in the Lung 5-5 slide.
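
The grid-square concordance summarized in Fig. 5b can be sketched as a per-square correlation of log-transformed expression profiles between two replicate sections. The details below are assumptions for illustration: a log2 transform with a pseudocount of 1, Pearson correlation, and squares identified by (row, column) tuples. The function names and toy counts are hypothetical.

```python
import math

def log_profile(counts, pseudocount=1.0):
    """Log2-transform per-gene total counts (pseudocount is an assumed choice)."""
    return [math.log2(c + pseudocount) for c in counts]

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def square_concordance(rep_a, rep_b, excluded=()):
    """Correlate matching grid squares of two replicates.

    rep_a and rep_b map a grid-square id to its per-gene total counts (same
    gene order in both). Squares listed in `excluded`, for example those
    overlapping a tissue hole, are skipped.
    """
    result = {}
    for square, counts_a in rep_a.items():
        if square in rep_b and square not in excluded:
            result[square] = pearson(log_profile(counts_a), log_profile(rep_b[square]))
    return result

# Toy replicates with four genes and three grid squares.
rep_a = {(0, 0): [10, 200, 3, 50], (0, 1): [5, 80, 40, 7], (1, 0): [1, 1, 1, 2]}
rep_b = {(0, 0): [12, 180, 4, 55], (0, 1): [60, 6, 9, 70], (1, 0): [3, 2, 1, 1]}

concordance = square_concordance(rep_a, rep_b, excluded={(1, 0)})
```

In the toy data, square (0, 0) has similar profiles in both replicates and correlates strongly, while square (0, 1) mimics a square whose tissue content differs between sections.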

squares each (Fig. 5a, top). For this analysis, we used slides Lung 5-3 and Lung 5-5 (replicates of Lung 5), which were the most spatially well-aligned pair. Grid squares contained 600–2,000 cells; six grid squares intersecting a tissue hole in Lung 5-3 were discarded (Fig. 5b). The total expression profile of each grid square was acquired to produce a gene expression matrix of 960 genes across 74 grid squares in each replicate. The correlation of 960-gene expression profiles between matching grid squares was high: the average square had a correlation of 0.96 between the log-transformed expression profiles of two replicates, and 95% of all squares had correlations of 0.87–0.94 (Fig. 5b). The lowest correlation occurred in a square where the biological structure in serial sections differed: in Lung 5-3 the square contained a small part of a tertiary lymphoid structure while in Lung 5-5 it did not.

High-plex protein imaging using SMI chemistry

The SMI encoded detection chemistry utilized for RNA can be applied to protein detection by conjugating the oligonucleotide readout sequences to antibodies (Supplementary Fig. 14). This simple expansion of SMI chemistry for antibody-based detection is enabled by the extremely small size of the oligonucleotides required for SMI 64-bit encoding: only 60–80 nucleotides are required to encode all of the multiplexing.

For the 108-plex protein panel described here, a set of 129 barcodes with HD5 and HW4 were generated with the same initial set of 64-bit barcodes used for RNA. The barcodes were designed to maximize bit distribution over 16 hybridization rounds and to minimize bit overlap across barcodes. A subset was selected to detect target proteins, with the remaining barcodes as blank controls for misidentification quantification.

Using modified SMI oligonucleotide barcodes, we performed site-specific conjugation of 108 antibodies against immune cell activation states and drivers of cancer progression (Supplementary Table 10). Among these, 104 antibodies carried a 64-bit encoded barcode to be read out over four detection events across 16 rounds of reporter hybridization. Four antibodies (PanCK, CD45, CD68 and membrane marker CD298) were conjugated to a distinct set of oligonucleotide landing sites for non-encoded morphological visualization during a single hybridization event with a distinct set of fluorescent reporter oligonucleotides. Each conjugated antibody had previously been reviewed by a pathologist and validated on GeoMx DSP20 or via single-plex IHC.

Formalin-fixed, paraffin-embedded breast cancer tissue of HER2-positive invasive carcinoma was prepared under standard IHC conditions with overnight antibody incubation. Following washes and fiducial application, the tissue sample in flow cells was placed on the SMI instrument. Protein localization patterns (Fig. 6a,b) and relative expression levels were read out on the SMI instrument using the same cyclic chemistry as the RNA readout assay. The primary data output consists of decoded OME-TIFF files, showing the localization pattern and relative intensity of each protein target in the assay (Fig. 6).

The OME-TIFF files were subset using cell segmentation masks to yield cell-by-cell expression patterns (Fig. 6c and Supplementary Fig. 15). Protein localization patterns for a subset of targets, including CD45, Vimentin, CD3 and Histone H3, were cross-validated by comparison of immunofluorescence signals from antibodies detected using traditional IHC or DAPI staining with a single target detected per channel to the computationally decoded protein maps (Supplementary Fig. 16). All other localization patterns were further validated by a pathologist review of decoded protein localization patterns using appropriate control tissues.

While this identical SMI chemistry has been successfully utilized to plex >900 RNA targets, the ability to multiplex proteins needed to be established. Plex testing was accomplished by performing ‘nested multiplexed’ protein assays across various plex levels on 35 FFPE cell lines. In a nested multiplexed assay we compare a 25-plex result with

[Fig. 6 image. Panels a,b: multichannel overlay with channel labels HER2, B7-H3, CD45, CD4, PanCK, SMA and DAPI. Panel c: single-cell maps on a log2 intensity scale (16 to 22) for the 104 encoded targets: 4–1BB, ARG1, B2M, B7–H3, BAD, BCL2, BCL6, BCLXL, Beta–catenin, BIM, BRAF, cCASP9, CCR7, CD107a, CD11b, CD11c, CD127, CD138, CD14, CD15, CD16, CD163, CD20, CD206, CD25, CD27, CD3, CD31, CD34, CD39, CD4, CD40, CD44, CD45, CD45RO, CD56, CD66b, CD68, CD8, CD80, CD95, CTLA4, Desmin, EGFR, EPCAM, ER–alpha, FN1, FOXP3, GAPDH, GATA3, GITR, GZMA, GZMB, H3C1, HER2, HLA–DRA, ICOS, IDO1, IgD, IL–18, IL–1B, iNOS, KI67, LAG3, MART1, MET, MPO, MsIgG1, MsIgG2a, NeuN, NF–kBp65, NF1, NT5E, NY–ESO–1, OX40L, p44MAPKERK, p53, pAKT1, panAKT, panRAS, PARP, PD1, PDL1, PDL2, pGSK3A–S21–pGSK3B–S9, pJNK, PLCG1, pMEK1, pp38MAPK, pp44MAPKERK, pp90RSK, pPRAS40, PR, PTEN, pTuberin–T1462, RbIgG, S6, SMA, STING, T–bet, TCF–1, TIM3, VIM and VISTA.]

Fig. 6 | Spatial subcellular protein analysis on SMI. a, Multichannel overlay of six protein targets detected in a breast cancer biopsy (HER2-positive invasive carcinoma) from the 108-plex assay (104 encoded targets and four morphological markers). b, Enlargement of boxed region in a. Each decoded marker is visualized with DAPI (blue) and morphological markers HER2 (yellow), B7-H3 (light green), CD45 (cyan), CD4 (light purple), PanCK (red) and SMA (white). a,b, Scale bars, 50 µm. c, Single-cell expression profiles across 104 encoded protein targets in the sample shown in a. Colored according to log2 transformation of the sum of pixel intensities in each cell, with the threshold for visualization being the geometric mean of negative isotype controls (mouse IgG1, mouse IgG2a, rabbit IgG) plus 1.5 standard deviations.

a 50-plex assay, where the latter consists of the same targets as the former but with 25 additional new targets. In this manner, the counts of the 25 targets measured in the 25-plex assay are graphed along the x axis while the abundance of the same 25 targets, when measured as part of the 50-plex assay, is graphed along the y axis (Supplementary Fig. 17). Measured intensities should be independent of the plex of the final assay. Nested protein assays at 25, 50 and 79 plex yielded highly concordant results over all FFPE cell lines (Supplementary Fig. 17). These data suggest that protein detection on SMI can probably achieve well over 79–108 plex and may scale in a manner very similar to that of the RNA-based assay.

Discussion

Imaging of real-world FFPE cancer samples is challenging at high plex in a multiomic manner using conventional methods. As shown in Supplementary Table 5, >96% of the 800,327 cells in the samples used in this study passed quality control and were analyzable at 980-plex RNA and subcellular spatial resolution. Especially noteworthy, this high efficiency of detection was independent of the overall RNA integrity of the samples. Samples Lung 5 and Lung 13 had DV200 values <30% and were classified as too degraded to even attempt to perform bulk RNA sequencing42, while Lung 9 and Lung 12 were graded as being of medium RNA quality. The fact that SMI yielded >96% cell detection

efficiency across this wide range of DV200 (and RIN scores) reflects the ability of the very small hybridization footprint of SMI chemistry (35–50-nt hybridization zone) to quantitate very fragmented RNA patterns where DV200 scores are classified as too degraded to sequence.

The ultrasmall readout zone also permits the very simple ‘migration’ of SMI chemistry, from detection of in situ RNAs to oligo-labeled antibodies. Our 60–80-nt oligonucleotide-labeled antibodies are easily conjugated and purified and retain their original specificity and sensitivity. Over 80% of the antibodies in this SMI protein panel migrated directly from the extensively validated antibody panels developed for GeoMx DSP20. All DSP-labeled antibodies successfully migrated to an SMI oligo-labeled antibody format; the remaining 20% were screened before and after conjugation in single-plex chromogenic assays and reviewed by a pathologist. To the best of our knowledge, this represents the highest plex protein imaging experiment carried out on FFPE samples.

Another unique feature of the SMI platform is the ability to work with FFPE samples and utilize standard, automatable FFPE sample preparation methods. SMI applies a sample preparation protocol nearly identical to standard low-plex in situ hybridization, which does not require tissue clearing and/or expansion. Sample preparation time for an SMI measurement is 1 day, with <60 min of hands-on time. All of these procedures can be automated on a standard tissue sample processor such as the Leica Bond system43. The simplicity and automated features of SMI sample preparation save time and labor and keep tissue preparation highly consistent and reproducible (Fig. 5). The SMI chemistry uses a controlled number of 15–60 fluorescent dyes encoded into a single 64-bit imaging barcode, providing sufficient fluorescence signal-to-noise ratio to quantitate a single in situ molecule of RNA or protein even in the presence of severe autofluorescence of FFPE tissue samples.

Spatial molecular imaging has demonstrated detection of up to 980-plex RNA and up to 108-plex protein, which is the highest plex of spatial RNA and protein profiling on FFPE tissue available to date at single-cell and subcellular resolution (Figs. 1 and 6). Currently, RNA readout can be simultaneously performed with morphology marker antibodies but is not yet compatible with the 104-plex encoded protein panel. Increases in the plex of both RNA and protein assays are possible under this encoding scheme but have not yet been demonstrated. High accuracy is ensured by the error-robust color encoding scheme with a large Hamming distance (HD ≥ 4) between targets. In addition to the high sensitivity and specificity of RNA and protein detection, the very low cell dropout rate (<8%, average of 4% in NSCLC samples) makes the data from almost every cell available for analysis, which is a distinct advantage over gene dropout in droplet-based, single-cell sequencing methods44.

whereas RNA images reveal ‘punctate spots’ distributed in a much more uniform pattern. SMI is capable of combining both robust RNA information content and tissue architecture in the image.

Simultaneous RNA and protein detection on a single slide is also required for accurate assignment of RNA to single cells, especially in real tissue materials where cells are of very divergent size and density. Cell segmentation methods relying on nuclear staining to predict cell boundaries are particularly challenged by low- and high-density tissue cell environments. SMI uses nuclear staining and antibody-based universal membrane stain cocktails, along with state-of-the-art cell segmentation algorithms, to accurately outline the cell boundary. This is essential for high-quality single-cell data enabling downstream analysis, including cell typing, cell state and cell–cell interactions (Figs. 3 and 4).

The spatial imaging data from five lung tumors provide more richness than could be fully explored in a single manuscript. For experts in immuno-oncology, the images in Fig. 3 will suggest insights and questions far beyond those discussed here. We expect that similar datasets may often lead to multiple publications as new analysts discover additional insights. We demonstrate a narrow application of spatial differential expression, and we looked for spatial dependency in a single cell type in a single sample. Using our data, similar analyses may still be performed for 959 other genes in any of 18 cell types and in any of the five tumors. These results could then be contrasted across samples. Gene expression could be contrasted with more nuanced spatial variables, such as the expression profile among cells’ neighbors or in that of only a cell’s neighbors of a given cell type. Experts in a given cell type or gene will find much to explore in these data.

With the advent of spatial molecular imaging using technologies such as the SMI platform, we can now directly test for the enrichment of pairwise LR expression at the interface of interacting cells. Here we find that, out of 100 unique LR pairs, 16 were significantly enriched at the interface between tumor and T cells in at least one of five tumors. As expected, the interaction scores for each of these pairs varied across tumors. This approach can be employed for any pair of cell types, and contrasts in LR interactions can be drawn between different samples as well as between spatial contexts within a single sample.

We have not attempted a comprehensive analysis of all of the data in this first manuscript describing the SMI technology. For that reason, we are placing the raw and processed data described in this study into the public domain (http://nanostring.com/CosMx-dataset) where interested scientists from around the world can explore the data. Spatial imaging technologies have the potential to greatly enhance the field of spatial biology. When SMI is combined with high-plex profiling technologies (such as DSP), a truly comprehensive spatial investigation of tissues can be accomplished.
Spatial molecular imaging accomplishes RNA and protein mul-
tiplexing using a true 64-bit encoding method. This contrasts with Online content
techniques that cycle non-encoded reagents for RNA and protein Any methods, additional references, Nature Research reporting sum-
imaging. For example, cyclic immunofluorescence methods45 are maries, source data, extended data, supplementary information,
limited to the number of fluorescent channels multiplied by the num- acknowledgements, peer review information; details of author con-
ber of reagent imaging cycles, which increases linearly in the number tributions and competing interests; and statements of data and code
of cycles of reagents (for example, 16 imaging rounds × four-channel availability are available at https://doi.org/10.1038/s41587-022-01483-z.
detection = 64-plex capability). SMI uses encoded barcoded antibodies
and the same 16 imaging rounds with four-channel detection, with HD4 References
and HW4 yielding 1,210-plex capability. This class of encoded barcode 1. Garon, E. B. et al. Pembrolizumab for the treatment of
technology provides the type of chemistry that can be essentially non-small-cell lung cancer. N. Engl. J. Med. 372, 2018–28 (2015).
unlimited in plex, and with no fundamental change in instrumentation. 2. Yu, H. et al. PD-L1 expression by two complementary diagnostic
The multiomic capability of SMI is important in regard to a number assays and mRNA in situ hybridization in small cell lung cancer.
of key features. While high-plex RNA images supply incredible infor- J. Thorac. Oncol. 12, 110–120 (2017).
mation content concerning the overall activity of individual cells, the 3. Yu, H., Boyle, T. A., Zhou, C., Rimm, D. L. & Hirsch, F. R. PD-L1
images themselves do not reveal detailed tissue architecture as well as expression in lung cancer. J .Thorac. Oncol. 11, 964–975 (2016).
protein images—for instance, comparison of a typical RNA tissue image 4. Ting, D. T. et al. Aberrant overexpression of satellite repeats
(Fig. 1c) with a typical protein tissue image (Fig. 6a). Protein-based in pancreatic and other epithelial cancers. Science 331,
images clearly reveal overall tissue architecture in exquisite detail 593–596 (2011).


5. Garber, K. Oncologists await historic first: a pan-tumor predictive marker, for immunotherapy. Nat. Biotechnol. 35, 297–298 (2017).
6. Sokolenko, A. P. & Imyanitov, E. N. Molecular tests for the choice of cancer therapy. Curr. Pharm. Des. 23, 4794–4806 (2017).
7. Dereli, A. S., Bailey, E. J. & Kumar, N. N. Combining multiplex fluorescence in situ hybridization with fluorescent immunohistochemistry on fresh frozen or fixed mouse brain sections. J. Vis. Exp. https://doi.org/10.3791/61709 (2021).
8. Taube, J. M. et al. The Society for Immunotherapy of Cancer statement on best practices for multiplex immunohistochemistry (IHC) and immunofluorescence (IF) staining and validation. J. Immunother. Cancer 8, e000155 (2020).
9. Hirsch, F. R. et al. PD-L1 immunohistochemistry assays for lung cancer: results from Phase 1 of the Blueprint PD-L1 IHC Assay Comparison Project. J. Thorac. Oncol. 12, 208–222 (2017).
10. Udall, M. et al. PD-L1 diagnostic tests: a systematic literature review of scoring algorithms and test-validation metrics. Diagn. Pathol. 13, 12 (2018).
11. Halse, H. et al. Multiplex immunohistochemistry accurately defines the immune context of metastatic melanoma. Sci. Rep. 8, 11158 (2018).
12. Macosko, E. Z. et al. Highly parallel genome-wide expression profiling of individual cells using nanoliter droplets. Cell 161, 1202–1214 (2015).
13. Zilionis, R. et al. Single-cell barcoding and sequencing using droplet microfluidics. Nat. Protoc. 12, 44–73 (2017).
14. Wang, X., He, Y., Zhang, Q., Ren, X. & Zhang, Z. Direct comparative analyses of 10X Genomics Chromium and Smart-seq2. Genomics Proteomics Bioinformatics 19, 253–266 (2021).
15. See, P., Lum, J., Chen, J. & Ginhoux, F. A single-cell sequencing guide for immunologists. Front. Immunol. 9, 2425 (2018).
16. Ståhl, P. L. et al. Visualization and analysis of gene expression in tissue sections by spatial transcriptomics. Science 353, 78–82 (2016).
17. Stickels, R. R. et al. Highly sensitive spatial transcriptomics at near-cellular resolution with Slide-seqV2. Nat. Biotechnol. 39, 313–319 (2021).
18. Fu, X. et al. Continuous polony gels for tissue mapping with high resolution and RNA capture efficiency. Preprint at bioRxiv https://doi.org/10.1101/2021.03.17.435795 (2021).
19. Liu, Y. et al. High-spatial-resolution multi-omics sequencing via deterministic barcoding in tissue. Cell 183, 1665–1681 (2020).
20. Merritt, C. R. et al. Multiplex digital spatial profiling of proteins and RNA in fixed tissue. Nat. Biotechnol. 38, 586–599 (2020).
21. Wu, Q. et al. Poly A- transcripts expressed in HeLa cells. PLoS ONE 3, e2803 (2008).
22. Moffitt, J. R. et al. High-throughput single-cell gene-expression profiling with multiplexed error-robust fluorescence in situ hybridization. Proc. Natl Acad. Sci. USA 113, 11046–11051 (2016).
23. Moffitt, J. R. & Zhuang, X. in Methods in Enzymology (eds Filonov, G. S. & Jaffrey, S. R.) 1–49 (Academic Press, 2016).
24. Groiss, S. et al. Highly resolved spatial transcriptomics for detection of rare events in cells. Preprint at bioRxiv https://doi.org/10.1101/2021.10.11.463936 (2021).
25. Lee, J. H. et al. Fluorescent in situ sequencing (FISSEQ) of RNA for gene expression profiling in intact cells and tissues. Nat. Protoc. 10, 442–458 (2015).
26. Eng, C.-H. L. et al. Transcriptome-scale super-resolved imaging in tissues by RNA seqFISH+. Nature 568, 235–239 (2019).
27. Xia, C., Fan, J., Emanuel, G., Hao, J. & Zhuang, X. Spatial transcriptome profiling by MERFISH reveals subcellular RNA compartmentalization and cell cycle-dependent gene expression. Proc. Natl Acad. Sci. USA 116, 19490–19499 (2019).
28. Goltsev, Y. et al. Deep profiling of mouse splenic architecture with CODEX multiplexed imaging. Cell 174, 968–981 (2018).
29. Lin, J. R., Fallahi-Sichani, M. & Sorger, P. K. Highly multiplexed imaging of single cells using a high-throughput cyclic immunofluorescence method. Nat. Commun. 6, 8390 (2015).
30. Giesen, C. et al. Highly multiplexed imaging of tumor tissues with subcellular resolution by mass cytometry. Nat. Methods 11, 417–422 (2014).
31. Chen, K. H., Boettiger, A. N., Moffitt, J. R., Wang, S. & Zhuang, X. RNA imaging. Spatially resolved, highly multiplexed RNA profiling in single cells. Science 348, aaa6090 (2015).
32. Mercer, T. R. et al. The human mitochondrial transcriptome. Cell 146, 645–658 (2011).
33. Gudenas, B. L. & Wang, L. Prediction of LncRNA subcellular localization with deep learning from sequence features. Sci. Rep. 8, 16385 (2018).
34. Baker, S. C. et al. The External RNA Controls Consortium: a progress report. Nat. Methods 2, 731–734 (2005).
35. Ghandi, M. et al. Next-generation characterization of the Cancer Cell Line Encyclopedia. Nature 569, 503–508 (2019).
36. Nusinow, D. P. et al. Quantitative proteomics of the Cancer Cell Line Encyclopedia. Cell 180, 387–402 (2020).
37. National Cancer Institute. NCI-60 Human Tumor Cell Lines Screen https://dtp.cancer.gov/discovery_development/nci-60/ (2020).
38. Kinker, G. S. et al. Pan-cancer single-cell RNA-seq identifies recurring programs of cellular heterogeneity. Nat. Genet. 52, 1208–1218 (2020).
39. Liu, J. et al. Concordance of MERFISH spatial transcriptomics with bulk and single-cell RNA sequencing. Preprint at bioRxiv https://doi.org/10.1101/2022.03.04.483068 (2022).
40. Leader, A. M. et al. Single-cell analysis of human non-small cell lung cancer lesions refines tumor classification and patient stratification. Cancer Cell 39, 1594–1609 (2021).
41. Danaher, P. et al. Advances in mixed cell deconvolution enable quantification of cell types in spatially-resolved gene expression data. Nat. Commun. 13, 385 (2022).
42. Illumina. Evaluating RNA quality from FFPE samples. https://www.illumina.com/content/dam/illumina-marketing/documents/products/technotes/evaluating-rna-quality-from-ffpe-samples-technical-note-470-2014-001.pdf (2021).
43. Leica Biosystems. BOND-III fully automated IHC and ISH staining system. https://www.leicabiosystems.com/ihc-ish-fish/fully-automated-ihc-ish-instruments/bond-iii/ (2021).
44. Qiu, P. Embracing the dropouts in single-cell RNA-seq analysis. Nat. Commun. 11, 1169 (2020).
45. Lin, J. R., Fallahi-Sichani, M., Chen, J. Y. & Sorger, P. K. Cyclic immunofluorescence (CycIF), a highly multiplexed method for single-cell imaging. Curr. Protoc. Chem. Biol. 8, 251–264 (2016).

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

© The Author(s), under exclusive licence to Springer Nature America, Inc. 2022
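
The plex comparison made in the Discussion (16 imaging rounds × four channels = 64 bits, with barcodes of Hamming weight 4 and pairwise Hamming distance ≥ 4) can be explored with a small sketch. The code below is illustrative only: the function names are hypothetical, and the greedy construction is a generic textbook approach, not the authors' barcode design procedure. It ignores SMI-specific constraints (reporter pool layout, readout sequence filters), so it is not expected to reproduce the 1,210-plex figure quoted in the text.

```python
from itertools import combinations
from math import comb


def hamming_weight(word: int) -> int:
    """Number of 'on' bits in a barcode stored as an integer."""
    return bin(word).count("1")


def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions at which two barcodes differ."""
    return bin(a ^ b).count("1")


def to_word(on_bits) -> int:
    """Pack a tuple of on-bit indices into an integer barcode."""
    return sum(1 << b for b in on_bits)


def greedy_constant_weight_code(n_bits: int, weight: int = 4, min_dist: int = 4):
    """Greedily pick weight-`weight` barcodes with pairwise HD >= min_dist.

    Hypothetical helper for illustration; returns a list of on-bit tuples.
    """
    if not 0 < min_dist <= 2 * weight:
        raise ValueError("min_dist must be in 1..2*weight for constant-weight words")
    # Two weight-w words sharing `o` on bits differ in 2 * (w - o) positions,
    # so HD >= min_dist means they may share at most `max_overlap` on bits.
    max_overlap = weight - (min_dist + 1) // 2
    k = max_overlap + 1  # no k-subset of on bits may appear in two barcodes
    used, codes = set(), []
    for on_bits in combinations(range(n_bits), weight):
        subsets = list(combinations(on_bits, k))
        if any(s in used for s in subsets):
            continue  # would sit too close to an already accepted barcode
        used.update(subsets)
        codes.append(on_bits)
    return codes


if __name__ == "__main__":
    rounds, channels = 16, 4
    n_bits = rounds * channels  # 64 bits, as in the text
    print("non-encoded (linear) plex:", rounds * channels)
    codes = greedy_constant_weight_code(n_bits, weight=4, min_dist=4)
    # Counting bound for this unconstrained setting: C(n, k) / C(w, k)
    print("greedy HW4/HD4 barcodes found:", len(codes))
    print("upper bound without extra constraints:", comb(n_bits, 3) // comb(4, 3))
```

The point of the sketch is only the scaling argument: cycling non-encoded reagents gives a plex that grows linearly with rounds × channels, whereas an error-robust constant-weight code over the same 64 bits admits far more distinct barcodes.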


Methods
FFPE cell pellet microarrays and tumor tissues
Custom FFPE cell pellet arrays of 16 cell lines for RNA assay and 35 cell lines for protein assay were made with A-FLX FFPE CELL PELLET (Acepix Bioscience). All cell lines were originally sourced from ATCC. The 16 cell lines for RNA assay include CCRF-CEM, COLO205, DU145, EKVX, HCT116, HL60, HOP92, HS578T, IGROV1, M14, MDA-MB-468, MOLT4, PC3, RPMI-8226, SKMEL2 and SUDHL4. The 35 cell lines for protein assay include SHSY5Y CA, A431 CA, NCIH2228, DBTRG05MG, SW48, NB4, U251MG, U87MG, SUPB15, THP1, WSUNHL, SKMEL2, SKMEL5, SKBR3, RAMOS, RPMI-8226, SUDHL6, RI1, RAJI, MDAMB468, HUT78, HUH7, HL60, HCT116, HCC78, H596, A431, HEK293 ICOS, HEK293 PD1-overexpressing, HEK293 CTLA4-overexpressing, HEK293 GITR-overexpressing, HEK293 LAG3-overexpressing, HEK293 PDL2-overexpressing and HEK293 PDL1-overexpressing.
The breast cancer biopsy used for protein assay was a CTRL301 tissue microarray supplied by US Biomax. The FFPE sample used was a malignant breast biopsy from a 61-year-old female with invasive carcinoma (HER2 3+), T2N0M0 at IIA grade. Human tissues were collected under Health Insurance Portability and Accountability Act-approved protocols (US Biomax).
Five FFPE human NSCLC tissues were acquired from ProteoGenex, all of which were collected under an ethics committee (European IRB analog) and with informed consent.

SMI instrument
The Spatial Molecular Imager utilizes standard sample preparation methods typical for FISH or IHC on FFPE tissue sections, with the introduction of fluorescent bead-based fiducials that are fixed to the tissue to provide an optical reference for cyclic image acquisition and registration. For sample-processing details, see RNA assay FFPE tissue manual preparation.
Following hybridization of ISH probes or antibody incubation, slides were washed and a coverslip with a spacer of predefined height (50–100 μm) was applied. The slide plus coverslip constitutes the flow cell, which was placed within a fluidic manifold on the SMI instrument for analyte readout and morphological imaging.
The in situ chemistry and imaging analyses were performed on a prototype instrument with a custom large FOV, high numerical aperture (NA) and low aberration objective and associated optics that permit >6,000 cells to be imaged per FOV from a flow cell assembled onto a standard glass slide. The prototype system can run up to four flow cells simultaneously and can interleave fluidics and optical imaging operations to maximize throughput.
The optical system has an epifluorescent configuration based on a custom water objective (×13/0.82 NA). FOV size was customized to 0.7 × 0.9 mm2. Illumination is widefield, with a mix of lasers and light-emitting diodes (130 W cm–2 at 488 nm, 69 W cm–2 at 530 nm, 26 W cm–2 at 590 nm, 81 W cm–2 at 647 nm) that allow imaging of Alexa Fluor-488, Atto 532, Dyomics Dy-605 and Alexa Fluor-647, as well as cleaving of photolabile dye components. The camera used was a FLIR BFS-U3_200S6M-C based on the IMX183 Sony industrial CMOS sensor, and sampling at the image plane was 180 nm per pixel. An xy stage moves the flow cell above the objective lens while a z-axis motor moves the objective lens.
The fluidic system uses a custom interface to draw reagents through a flow cell with a syringe pump. Reagent selection is controlled by a shear valve (Idex Health & Science). A flow sensor between the flow cell and syringe pump was used for flow rate feedback (Sensirion AG). The fluidic interface includes a flat aluminum plate in direct contact with the flow cell. The temperature of this metal plate was controlled to regulate the reporter hybridization temperature at 20–35 °C. The enclosure around the instrument was also maintained at a constant temperature using a separate thermoelectric cooler.
To date, the CosMx SMI platform has been used only for research purposes.

SMI ISH probe design
The ISH probes were designed to bind in situ messenger RNA targets (Fig. 1a). From 5′ to 3′, they each comprised a 35–50-nt target-complementary sequence followed by four consecutive 10–20-nt readout sequences corresponding to four on bits (HW4) assigned to each target. The target-binding sequences in ISH probes were developed by a probe design pipeline that optimizes sensitivity and specificity for mRNA transcripts. The process begins with an exhaustive evaluation of all possible contiguous 35–50-nt sequence windows for each mRNA target. This large pool of possible probe candidates was first filtered for ideal intrinsic characteristics including melting temperature (Tm), guanine-cytosine (GC) content, secondary structure and runs of polynucleotides. Probes satisfying these parameters were further screened for homology to the full transcriptome of the parent organism, utilizing the basic local alignment search tool (BLAST) from the National Center for Biotechnology Information (NCBI). Preference was given to probes covering known protein-coding transcripts, lying within coding regions and maximizing the coverage of the isoform repertoire. Final panel candidates were further screened for intermolecular interactions with other probes in the candidate pool, including potential probe–probe hybridization as well as minimization of common sequences between probes. Five oligonucleotide RNA detection probes were designed per target mRNA. Negative control probes were modeled after synthetic sequences from the ERCC set34. These negative ISH probes were designed to contain the same intrinsic characteristics and subjected to the same inter-/intramolecular interaction screens as the primary panel of probes.
The 10–20-nt readout sequences based on the 64-bit barcode design were filtered based on having a GC fraction >35% to minimize cross-hybridization between reporter probes and the junctions between readout sequences, as well as to maintain HD = 4 between sequences. Also, readout sequences contained only bases A, C and T to maximize binding kinetics between reporters and ISH probes.

SMI reporter design and assembly
The SMI reporter is a defined 15–60-dye DNA construct assembled from three oligonucleotide motifs: dye (RPD), sub-branch (RPU) and nanoBarCode (nBC) (Supplementary Fig. 18 and Supplementary Table 11). All oligos were made from DNA amidite (ChemGenes), while RPU and nBC also contain a PC linker for photocleavage. The PC linker moiety is introduced to the GeoMx probes via the conventional phosphoramidite method of DNA chain assembly using a custom 2-cyanoethyl phosphoramidite reagent derived from 5-methyl-2-nitrobenzoic acid starting material in a ten-step synthesis46. Commercially available photocleavage phosphoramidites, such as PC-Spacer (no. P/N 10-4913) and PC-Linker (no. P/N 10-4920) from Glen Research, would accomplish the same function as this custom phosphoramidite reagent. The RPD contains 15 nt with a 5'-amino modifier used to conjugate Alexa Fluor-488, ATTO 532, Dyomics-605 or Alexa Fluor-647 fluorophores. RPU contains a repeated 10–20-nt nBC motif on the 5' end, followed by a PC linker and a repeated 10–20-nt RPD motif toward the 3' end. nBC contains a 10–20-nt ISH probe motif at the 3' end, followed by a PC linker and a repeated RPU motif at the 5' end. Each RPD has one corresponding RPU and 16 nBCs for a total of 64 reporters.
Lyophilized RPD and RPUs were normalized to 1.5 and 0.5 mM, respectively, in Tris EDTA (TE) pH 8.0, and nBCs were normalized to 50 μM in TE. All oligos were stored at 4 °C before use. Reporter assembly occurs via a two-stage process involving RPD–RPU hybridization and nBC hybridization (Supplementary Fig. 18). The design of the ‘N’ oligo regions (14 mer; Supplementary Table 11) utilizes the observation noted by Zhang et al.47, that sequences composed of three (of four possible) nucleotides hybridize more rapidly than those with four of four. Thus, we constructed all combinations of 14-mer sequences composed of ATG nucleotides and filtered for single-base repeats (no single-base repeats >4 nt), particularly poly-G (no repeats of G > 3 nt), at least 5 G bases, Tm ≥ 40 °C (calculated for 5 nM oligo concentration and 750 mM NaCl using the SantaLucia method), and deoxyguanosine (dG) > 0 at 5 °C. Any passing sequences were screened for intersequence compatibility for both Hamming distance (≥4) and cross-hybridization (no cross-hybridization with any other sequence with Tm > 32 °C, based on thermodynamics of cross-hybridization using DNA folding/hybridization algorithms and calculated for 250 nM oligo concentration and 750 mM NaCl using the SantaLucia method). The 980-plex RNA and 108-plex protein readout requires 16 sets of pools, each set containing four reporters with four different fluorophores. The pools were made by diluting the 1 μM stock to a 5 nM per-probe solution in 8.75× sodium chloride-sodium phosphate-EDTA (SSPE), 0.5% Tween 20 (%w/v) and 0.1% ProClin 950. Pool identity and cleavage were assessed using biotinylated targets on a streptavidin-covered slide (Schott). Pools were stored at 4 °C until ready for use.

Antibody conjugation
All antibodies were sourced from vendors in a bovine serum albumin (BSA)- and glycerol-free format. Antibodies were quantified using UV spectrophotometry and quality checked by gel electrophoresis. Antibody heavy chains were prepared for conjugation using SiteClick azide modification (Invitrogen) of carbohydrate domains. Amine-modified oligo tags were conjugated to antibodies using modifications of SiteClick chemistry (Invitrogen) and the heterobifunctional DBCO linker (Click Chemistry Tools). The resulting conjugates were HPLC purified and normalized to 200 μg ml–1 as previously described20.
For the protein detection assay, oligonucleotides resembled those used for the SMI RNA detection in that there were landing sites for four of the 64 nano-barcode reporters, corresponding to the 64-bit barcode (Supplementary Fig. 14). For imaging of tissue morphology and cell boundaries, antibodies were conjugated to an oligonucleotide possessing a single landing site for one of four morphology reporters, which have the same overall structure as the nano-barcode reporters but bind a unique set of four sequences that are orthogonal to the 64 landing sites used in the readout assay. This allows the signals from antibodies used for morphological visualization to be eliminated before high-plex RNA and protein readouts.

SMI 980-plex RNA panel design
For comprehensive RNA profiling, we designed a 980-plex panel (Supplementary Table 1 and Supplementary Fig. 5e) to investigate the biology of single cells across tumors and diverse organs.
To determine panel content, 749 genes were selected to capture critical cell states and cell–cell interactions; the remaining genes were selected to optimize the panel’s power to distinguish between different cell types.
While cell state is a broad term encompassing a wide variety of cellular phenotypes and processes, we focused our curation on core pathways or environmental factors that are important across broad areas of physiology and disease. These included immune cell states, basic cellular processes (for example, apoptosis, autophagy), cellular structures that integrate environmental cues (for example, cytoskeleton, extracellular matrix) and stress or damage responses (for example, hypoxia, wound healing, DNA damage response) (Supplementary Fig. 5). Genes for these cell states were primarily curated from the literature using hallmark genes for each set. Genes for cellular signaling pathways, including ligands and receptors, were curated from the literature, HUGO gene families and the KEGG BRITE Ontology. Finally, target genes for common cell signaling pathways were included when those transcriptional outputs were known with confidence. Consensus target genes were determined for pathways MAPK48, NF-κB48–51, Interferon52, Wnt53–60 and Hedgehog61.
Cell state and cell signaling categories were also subjected to data-driven expression-level assessment. We employed 76 individual Human Cell Landscape Datasets to score for expression level across cell types and tissues. We removed genes that were highly unlikely to be detected in any tissue types. However, we utilized subject matter expertise to selectively retain genes of high biological interest, for example, T cell checkpoint genes. While these genes may be rarely detected in healthy tissue, they are crucial to understanding the immune system’s role in disease (Supplementary Fig. 5).
Once genes of biological interest were selected, the remainder of the panel was chosen to inform cell typing. First, literature-driven searches identified 56 highly informative immune cell markers41,62,63 and six adipocyte markers64–67. The remaining genes were then chosen in a data-driven manner to maximize the contrast between expression profiles of different cell types.
Specifically, for a given pair of cell types, we scored each gene’s ‘pairwise distance’ as (|x1 – x2|/sqrt(x1 + x2)) × max(x1, x2), where x1 and x2 are a gene’s expression levels in the two cell types. The left side of this equation is based on the t-statistic and mean-variance relationship of Poisson distribution; the right side is based on the assumption that every transcript of a gene offers a fixed amount of evidence for one cell type versus another, and that the total evidence from a gene will therefore vary in proportion to its total transcripts. While the exact statistical power conferred by a gene depends on the cell-typing method used, the above statistic formalizes the intuition that useful cell-typing genes will have two properties: they will vary strongly across cell types and they will have high expression levels in at least one cell type.
For every gene, the pairwise distance between all pairs of cell types within each dataset was calculated, then the total pairwise distance between all cell types within each dataset was calculated over those genes already chosen for the panel. Cell-typing genes were then chosen in a greedy manner to improve pairwise distances between the least distant cell types across datasets. To initially choose genes that were informative between many pairs of cell types, the first were chosen to maximally increase the 40th percentile of between-cell distances. Over 224 successive iterations, this percentile was dropped to 0.005 to focus on increasing distances between the most similar cell types.

RNA assay FFPE tissue manual preparation
Five-micron tissue sections were cut from FFPE tissue blocks using a microtome, placed in a heated water bath and adhered to Leica Bond Plus Microscope slides (Leica Biosystems). Slides were then dried at 37 °C overnight and stored at 4 °C.
To perform in situ hybridization on tissue sections, the slides were baked in an oven overnight at 37 °C and then at 60 °C for at least 2 h. Tissue sections were dewaxed twice in xylene (Millipore) for 5 min, twice in ethanol (Pharmco) for 2 min and then the slides were baked at 60 °C for 5 min. Tissue sections were subjected to the target retrieval step using the RNAscope Target Retrieval kit (ACD Bio) and heated at 100 °C in a pressure cooker for 15–30 min. After target retrieval, tissue sections were rinsed with diethyl pyrocarbonate (DEPC)-treated water (DEPC H2O, ThermoFisher), washed in ethanol for 3 min and dried at room temperature for 30 min. On the dried slide, a hydrophobic barrier line was drawn around the tissue section using an ImmEdge Pen (Vector). Tissue was then digested with Protease Plus (ACD Bio) spiked with Proteinase K (ThermoFisher), ranging from 1 to 5 μg ml–1 depending on tissue type, at 40 °C for 15–30 min. Tissue sections were rinsed twice with DEPC H2O, incubated in 1:400 diluted fiducials (Bangs Laboratories) in 2× saline sodium citrate and Tween (0.001% Tween 20, Teknova) for 5 min at room temperature and washed with 1× PBS (ThermoFisher) for 5 min. After digestion and fiducial placement, tissue was fixed with 10% neutral-buffered formalin for 1 min to maintain soft tissue morphology, washed twice with Tris-glycine buffer (0.1 M glycine (Sigma), 0.1 M Tris-base (FisherScientific) in DEPC H2O) for 5 min and then washed with 1× PBS for 5 min. Fixed tissue was blocked using 100 mM


N-succinimidyl (acetylthio) acetate (NHS-acetate, ThermoFisher) diluted in NHS-acetate buffer (0.1 M NaP + 0.1% Tween pH 8.0 in DEPC H2O) for 15 min at room temperature and washed in 2× saline sodium citrate (SSC) for 5 min. An adhesive SecureSeal Hybridization Chamber (Grace Bio-Labs) was placed over tissue.
NanoString® ISH probes were prepared by incubation at 95 °C for 2 min and immediately transferred to ice. The ISH probe mix (1 nM ISH probe, 40% formamide, 2.5% dextran sulfate, 0.2% BSA, 100 μg ml–1 salmon sperm DNA, 2× SSC, 0.1 U μl–1 SUPERase•In (ThermoFisher) in DEPC H2O) was then pipetted into the chamber, and adhesives supplied with the chambers were applied to the chamber ports. Hybridization occurred at 37 °C overnight after sealing the chamber to prevent evaporation. Following ISH probe hybridization, tissue sections were washed twice in a buffer comprising 50% formamide (VWR) in 2× SSC at 37 °C for 25 min, washed twice with 2× SSC for 2 min each at room temperature and then blocked with 100 mM NHS-acetate for 15 min. After blocking, the hydrophobic barrier was removed with a blade and a custom-made flow cell was attached to the slide.

Automated RNA assay FFPE tissue preparation on Leica Bond RX
As instructed in the manual for FFPE tissue preparation, tissue slides were baked overnight at 60 °C to ensure tissue adherence to the positively charged glass slides (Leica Bond Plus Microscope slides, Leica Biosystems). Tissue was then deparaffinized and digested with Protease Plus (ACD Bio) spiked with Proteinase K (ThermoFisher) ranging from 1 to 5 μg ml–1 depending on tissue type and prepared for heat-induced epitope retrieval (HIER) on a Leica Biosystems automated tissue handler (Bond RX, Leica Biosystems). For cell pellet array or tissue microarray samples, HIER requires treatment with Leica buffer ER1 at 100 °C for 8 min, and 30 min of treatment for tissue samples. After Leica handling, the remaining process is the same as for the manual RNA assay FFPE tissue preparation described above.

RNA isolation
Total lung RNA was isolated from single or double 20-μm FFPE curls using the RNeasy FFPE Kit (Qiagen) and digested with Proteinase K for 30 min. Lung RNA was quantified using an Agilent RNA 6000 Nano Kit (Agilent) according to the manufacturer’s instructions. DV200 and RIN scores were determined using Agilent BioAnalyzer 2100 Expert Software B.02.09. For DV200, Region 1 selection was from 200 to about 8,000 nt using smear analysis.

Automated protein assay sample preparation on Leica Bond RX
Deparaffinization and antigen retrieval were performed automatically on a BOND RX machine (Leica Biosystems) using Bond Dewax Solution (30 s at 72 °C) and Bond ER Solution 1 (20 min at 100 °C). Samples were blocked for 1 h with blocking buffer W (NanoString). Oligonucleotide-conjugated primary antibodies were pooled and diluted uniformly to 100 ng ml–1 in blocking buffer. Samples were incubated with primary antibodies overnight at 4 °C, then incubated with fiducials (0.00025%, Bangs Laboratories) for 5 min. To secure antibodies and fiducials, samples were fixed with 4% paraformaldehyde for 10 min at room temperature and then washed in 1× PBS. Before cyclic encoded protein detection, slides were incubated with NHS-acetate (Thermo Scientific) then diluted in 0.0932 M Na2HPO4 + 6.8 mM NaH2PO4 + 0.1% Tween pH 8.0 for 15 min on the SMI instrument.

Cyclic RNA readout on the SMI instrument
Processed tissue was assembled into the flow cell and loaded onto the SMI instrument, then washed with 1.5 ml of reporter wash buffer to remove air bubbles from the flow cell. The reporter wash buffer consisted of 1× SSPE, 0.5% Tween 20, 0.1 U μl–1 SUPERase•In RNase Inhibitor

image of the whole slide was acquired to allow the user to access the tissue, then FOVs were placed at the areas of interest.
To start the cyclic RNA readout, 100 μl of reporter 1 was flowed in at 200 μl min–1 and incubated for 15 min. After incubation, 1 ml of reporter wash buffer was flowed into the flow cell at 750 μl min–1 to wash out reporter probes that had not hybridized. Following reporter wash, 100 μl of imaging buffer was flowed into the flow cell before imaging. The imaging buffer consisted of 80 mM glucose, 0.6 U ml–1 pyranose oxidase from Coriolus sp., 18 U ml–1 catalase from bovine liver, 1:1,000 Proclin 950, 500 mM Tris-HCl buffer pH 7.5, 150 mM sodium chloride and 0.1% Tween 20 in DEPC-treated water. Following the acquisition of eight Z-stack images (800 nm step size) of each FOV, fluorophores on the reporter were UV cleaved (385 nm, 116 J cm–2, 500 ms) and washed off with 200 μl of strip wash buffer (0.0033× SSPE, 0.5% Tween 20 and 1:1,000 Proclin 950). This fluidic and imaging procedure was repeated for the 16 reporter pools, and the 16 rounds of reporter hybridization were repeated multiple times for increased sensitivity.
After all cycles were completed, tissue was subjected to morphology staining workflow on the same instrument. The blocking buffer (buffer W, NanoString) was incubated in the flow cell for 30 min followed by incubation of tissue with a four-antibody cocktail (CD298, CD45, CD3, PanCK) diluted in blocking buffer for 1 h. After incubation, tissue was washed with 8 ml of reporter wash and then 100 μl of imaging buffer was flowed into the flow cell before collection of antibody-labeled tissue images.

Cyclic protein readout on the SMI instrument
The on-instrument SMI protein assay readout was performed as described for RNA, with three exceptions: (1) the readout was only 16 rounds of hybridization, with no repeat cycling; (2) morphology visualization was performed using oligonucleotide-conjugated antibodies (Supplementary Fig. 14), as described in Antibody conjugation; and (3) tissue and cell membrane morphology were visualized following hybridization of a specific pool of nano-barcode reporters to oligonucleotide-conjugated antibodies.

Primary data processing
Primary data processing is a standard image analysis of a 3D multichannel image stack obtained at each FOV location. The objective of this analysis is to reduce multidimensional image stacks to a single list of individual reporters seen in the specific binding event. This process was performed in parallel across all FOVs and in line with data acquisition. The image processing comprises three main steps: registration, feature detection and localization.
Three-dimensional rigid image registration was performed with the use of fiducial markers embedded within the tissue sample. The fixed image reference was established at the start of the experiment before reporter hybridization. Subsequent image stacks, shifted by stage motion, were matched to this reference using phase correlation. Individual channels within the image were aligned to each other through the application of precalibrated affine transformation.
The RNA image analysis pipeline focuses on identification of diffraction-limited features that represent the fluorescence response from a single molecule. Once the image stacks were registered, a 2D Laplacian of Gaussian filter was applied to each channel to remove background and enhance the encoded reporter signatures. Kernel size and standard deviation of the filter were matched to the expected reporter point spread function. Post filtering, potential reporter locations were identified as local maxima using a 3D nearest-neighbor search. Only local maxima greater than a channel-specific threshold were retained for further localization. Thresholds for each channel were predetermined empirically based on the SNR of the fluorescent channel. The retained maxima were assigned a confidence value determined by
(20 U μl–1), 0.1% Proclin 950 and DEPC-treated water. A low-resolution the intensity of the reporter signal.
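The feature-detection step described under Primary data processing (Laplacian of Gaussian filtering, then a thresholded local-maximum search) can be illustrated with a minimal sketch. This is not the authors' pipeline. It works on a single 2D plane with an 8-neighbor maximum test rather than the 3D nearest-neighbor search described. The function names, the synthetic image, and the values of `sigma` and `threshold` are assumptions chosen for illustration.

```python
import math

def log_kernel(sigma, radius):
    """Negated Laplacian-of-Gaussian kernel, so bright blobs give positive peaks."""
    kernel = [[(1 - (dx * dx + dy * dy) / (2 * sigma ** 2))
               * math.exp(-(dx * dx + dy * dy) / (2 * sigma ** 2))
               for dx in range(-radius, radius + 1)]
              for dy in range(-radius, radius + 1)]
    mean = sum(map(sum, kernel)) / (2 * radius + 1) ** 2
    # Force a zero-sum kernel: a flat background then produces no response.
    return [[v - mean for v in row] for row in kernel]

def filter_image(image, kernel):
    """Correlate image with kernel, replicating edge pixels at the border."""
    h, w = len(image), len(image[0])
    radius = len(kernel) // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for ky, krow in enumerate(kernel):
                yy = min(max(y + ky - radius, 0), h - 1)
                for kx, kval in enumerate(krow):
                    xx = min(max(x + kx - radius, 0), w - 1)
                    acc += kval * image[yy][xx]
            out[y][x] = acc
    return out

def detect_spots(image, sigma, threshold):
    """Return (row, col, confidence) for local maxima above the threshold."""
    response = filter_image(image, log_kernel(sigma, math.ceil(3 * sigma)))
    spots = []
    for y in range(1, len(response) - 1):
        for x in range(1, len(response[0]) - 1):
            value = response[y][x]
            neighbours = [response[y + dy][x + dx]
                          for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                          if (dy, dx) != (0, 0)]
            if value > threshold and all(value > n for n in neighbours):
                # The filtered intensity doubles as the confidence value.
                spots.append((y, x, value))
    return spots

def synthetic_image(size, centres, sigma, background=10.0, amplitude=100.0):
    """Flat background plus Gaussian spots (hypothetical test data)."""
    return [[background + sum(amplitude * math.exp(
                -((y - cy) ** 2 + (x - cx) ** 2) / (2 * sigma ** 2))
                for cy, cx in centres)
             for x in range(size)] for y in range(size)]

image = synthetic_image(21, [(6, 5), (14, 15)], sigma=1.2)
spots = detect_spots(image, sigma=1.2, threshold=50.0)
```

Matching `sigma` to the width of the simulated spots mirrors the statement that the filter's kernel size and standard deviation were matched to the expected point spread function. The zero-sum kernel is what suppresses the constant background.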

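The phase-correlation registration described under Primary data processing can be sketched in a few lines. This is a hedged illustration, not the instrument software: it assumes a pure integer-pixel translation of a small 2D image, uses a naive discrete Fourier transform so that the example needs only the standard library (an FFT library would normally be used), and all function names and test data are hypothetical.

```python
import cmath
import random

def dft(values, inverse=False):
    """Naive 1D discrete Fourier transform (O(n^2)); fine for a small demo."""
    n = len(values)
    sign = 1 if inverse else -1
    scale = 1 / n if inverse else 1
    return [scale * sum(v * cmath.exp(sign * 2j * cmath.pi * k * t / n)
                        for t, v in enumerate(values))
            for k in range(n)]

def dft2(image, inverse=False):
    """2D DFT: transform every row, then every column."""
    rows = [dft(row, inverse) for row in image]
    cols = [dft(list(col), inverse) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]

def phase_correlation_shift(reference, moving):
    """Estimate the (row, col) shift that maps `reference` onto `moving`."""
    f_ref, f_mov = dft2(reference), dft2(moving)
    cross = []
    for row_m, row_r in zip(f_mov, f_ref):
        cross_row = []
        for m, r in zip(row_m, row_r):
            c = m * r.conjugate()
            cross_row.append(c / (abs(c) + 1e-12))  # keep the phase only
        cross.append(cross_row)
    corr = dft2(cross, inverse=True)
    n_rows, n_cols = len(corr), len(corr[0])
    peak_r, peak_c = max(((r, c) for r in range(n_rows) for c in range(n_cols)),
                         key=lambda rc: corr[rc[0]][rc[1]].real)

    def wrap(p, n):
        # Peaks past the midpoint correspond to negative shifts.
        return p if p <= n // 2 else p - n

    return wrap(peak_r, n_rows), wrap(peak_c, n_cols)

random.seed(0)
size = 16
reference = [[random.random() for _ in range(size)] for _ in range(size)]
# Simulate stage motion: cyclic shift of +3 rows and -5 columns.
moving = [[reference[(r - 3) % size][(c + 5) % size] for c in range(size)]
          for r in range(size)]
estimated_shift = phase_correlation_shift(reference, moving)
```

Normalizing the cross-power spectrum to unit magnitude leaves only the phase difference between the two images, so the inverse transform concentrates into a single peak at the displacement. Real image stacks are not cyclic, so windowing and subpixel peak fitting would be needed in practice.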

Subpixel localization of each feature was obtained by fitting a
2D polynomial to the maxima and analytically solving for the subpixel
maxima locations in x and y. The final reporter signature locations on
the x, y and z axes, along with the assigned confidence, were recorded in
a list assigned to the specific reporter-binding event. After completion
of acquisition, all features pertaining to a given FOV were collated into
a single list forming the basis of the secondary analysis.
The protein image analysis pipeline requires a different analytic
approach because, in general, multiple proteins have signals that
overlap on a single pixel. The protein image analysis pipeline uses
locally adaptive thresholds⁶⁸ to segment regions that range from single
diffraction-limited features to large contiguous clusters within the FOV.
Instead of quantification of proteins as individual diffraction-limited
spots, we analyze regions of expression by linear unmixing of overlapping
expression patterns combined with hard decoding of thresholded
encoding images. The protein is deemed potentially present if a pixel
matches the expected encoding ‘on bits’ within the 64-bit encoding
scheme. From that, protein intensity is computed by linear unmixing
using a pseudo-inverse of the protein signature matrix (PSM).
The output of the primary data analysis is a series of 2D binary maps
that represent encoded reporter signals in each hybridization round.
Binary segments are binned together based on encoded bits. Common
regions between the binned masks generate protein localization
masks; localization masks for each protein are stored, followed by
linear unmixing using the pseudo-inverse of PSM. The method used
for pseudo-inverse is Cholesky decomposition⁶⁹. The pseudo-inverse
of PSM is multiplied by the measured images, which provides protein
expression and localization patterns. A more detailed description of
the protein analysis pipeline will be the subject of a future manuscript.

Secondary analysis and decoding
The imaging data, converted to a table of xyz locations of all
individual reporter-binding events, were used to determine the presence
of individual transcripts. To start this process, each unique location
with at least one reporter-binding event was considered a seed and all
neighboring locations to each seed with at least one reporter-binding
event were determined. The neighbor search is limited to a radius of
0.5 pixels (90 nm) in a first pass through the data. Any seeds with fewer
than four unique reporter probe-binding events in the neighborhood
were removed from consideration for transcript decoding due to their
inability to form a complete gene-specific barcode. All possible
four-reporter combinations of unique reporter probes in a seed’s
neighborhood are then matched against a table of all potential gene-specific
barcodes to detect the presence of a gene in a seed’s neighborhood.
Only those four-reporter combinations for which at least one of the
reporter probes is detected at the seed location (as against the
neighborhood around the seed) are considered valid for the target-matching
process. If more than one gene was detected in a seed’s neighborhood,
that seed and all transcripts detected in its neighborhood were dropped
from further analysis.
All seeds (or transcripts) retained after this step underwent
another filtering step. In this step, any seeds with a high probability
of making a transcript call by random chance (and hence the
transcripts detected in their neighborhoods) were then dropped from
further analysis in the first pass through the data. Given that a set
number of barcodes (for example, 980) out of a pool of all possible
four-reporter barcodes that can be generated using 64 unique reporters
(465,920) were used to denote gene types, there is nonzero probability
that any random combination of four unique reporters can match
a gene-specific barcode. This probability is further increased when
the unique reporter-binding events in a seed’s neighborhood can be
used to generate many potential four-reporter barcodes. To ensure
high confidence in the presence of a true transcript for every target
call made, any seeds (and associated transcript calls) with a random
transcript call probability >2% were dropped from further analysis in
the first pass through the data. Those reporter-binding event locations
that contributed to making transcript calls were used to estimate a
centroid location for all transcripts retained after this filtering step.
All reporter-binding events that contributed to making retained
transcript calls were then removed from the original imaging dataset, and
this modified dataset was used as a starting point for a second pass
through the data.
The second pass through the data repeated all the steps above,
albeit with an increased target search radius of 0.75 pixels (135 nm).
The rationale behind this increase in radius was to try to recover any
potential transcript calls that may have been lost due to local tissue
motion during reagent cycling. With the increased radius of 0.75 pixels,
the transcripts added to the final list of transcript calls after addition
of a second pass through the data can vary from sample to sample
and were found to range from approximately 20 to 40% of the total
transcript calls in sample FOVs with single- and dual-radius analysis.
After the conclusion of two passes through the data, we had a list of
potential transcript locations in each FOV. However, as we consider
each unique reporter-binding event location as a seed for transcript
calling, there is a potential for duplicate calling of each individual
transcript, artificially inflating the total count of transcripts detected
in a FOV. For instance, if all four reporter-binding events contributing
to a transcript call occur at slightly different locations, the same
transcript could be counted four times. To prevent this from happening,
for each gene all transcript calls made are filtered to ensure that there
is no other transcript call present within a radius of 0.75 pixels from
each transcript’s estimated location. Whenever multiple transcript calls
are found within this search radius, the transcript call with the highest
number of reporter-binding events contributing to it is retained and
others discarded.

Cell segmentation algorithm
We established a cell segmentation pipeline combining image
preprocessing and machine learning techniques (Supplementary Fig. 3).
Briefly, our pipeline takes tissue images stained with both nuclear and
membrane markers (DAPI, CD298/PanCK/CD3) to perform rescaling,
normalization, image deconvolution and boundary enhancement.
Image subtraction was performed between the nuclear and membrane
channels to enhance the contrast between adjacent cells while reducing
autofluorescence signal. The preprocessed images were fed into
pretrained Cellpose neural network models⁷⁰ for both nuclear and
cytoplasm modes of segmentation. Results from two segmentation
tasks were combined to select the best results from each mode by
analysis of intersection and union between all segmented cells. The
combination of preprocessing steps, dual-mode segmentation and
use of publicly available pretrained models has greatly improved the
robustness of cell segmentation outcomes, making model retraining
unnecessary for most tissue images.
The analysis pipeline was run with two different modes of hardware
processor: CUDA GPU (GeForce RTX 2070, Quadro P4000) and
parallel CPU threads, with average segmentation run time of around
94 s per FOV with five channels, six to eight Z-slices and image size
5,472 × 3,678 pixels. The final segmentation step was to map each
transcript location in the registered image to the corresponding cell,
as well as to the cell compartment (nuclei, cytoplasm, membrane),
where the transcript is located. Other features/properties generated
include shape (area, aspect ratio) and intensity statistics (minimum,
maximum, average) per cell.

Segmented regression to estimate the relationship between SMI and RNA-seq
Segmented regression was performed using the R package ‘segmented’.
Log-transformed SMI counts were modeled as a function of
log-transformed RNA-seq counts. To avoid negative infinity values
following log transformation, RNA-seq counts were thresholded below
their lowest nonzero value. Estimated breakpoints and linear trends
were extracted from segmented regression models.

Analysis of NSCLC samples
Single-cell expression profiles were derived by counting the transcripts
of each gene that fell within the area assigned to a cell by the
segmentation algorithm. Cells with fewer than 20 total transcripts were omitted
from analysis. A normalized expression profile was defined for each cell
by dividing its raw counts vector by its total counts. A separate UMAP
projection was computed for each tissue.
After analysis of single-cell expression data we created a single-cell
neighborhood matrix. This matrix specifies the number of each cell
type among each cell’s 200 closest neighbors in 2D physical space. The
matrix was then input into the UMAP algorithm. To define niches, this
matrix was clustered using the Mclust algorithm.
To test whether each gene was differentially expressed between
macrophages in different niches, a linear model was run predicting raw
counts from the niche. Only those 6,884 macrophages found in Lung 6
were used in this analysis. A global P value for each gene was taken using
a likelihood ratio test comparing this linear model to the null model,
using the R package lmtest. False discovery rates were calculated using
the Benjamini–Hochberg procedure via the R function p.adjust.
Following cell typing, we interrogated ligand and receptor signaling
between adjacent cells. Delaunay triangulation was used to build a
spatial adjacency network, given the spatial locations of each cell. For
each pair of adjacent tumor cells and T cells, we calculated an
interaction score using the geometric mean of their ligand and receptor
expression, respectively. This LR interaction score was recalculated
across all adjacent cell pairs for 100 distinct LR interacting partners.
Next, an average score was calculated for each LR pair. Finally, each
average score was tested to determine whether it was enriched by the
spatial arrangement of cells within the adjacency matrix. This was
performed by producing a null distribution of simulated average scores
calculated using randomized adjacency networks.

Protein expression visualization
The primary outputs of protein data processing are protein localization
maps for each antibody in the assay, as well as images of the three
antibodies used for tissue morphology and cell membrane visualization.
Protein expression per cell was reported as the sum of the intensity
values of pixels present in both the cell label and protein localization
mask. A limit of detection for visualization was set by filtering to only
cells with a detectable signal from at least one of the three isotype
controls (19% of cells for Fig. 6), using the geometric mean of the isotype
control signals and adding 1.5 standard deviations. Isotype controls
are nonspecific negative control antibodies against rabbit IgG1, mouse
IgG2a and mouse IgG1 (Supplementary Fig. 15). Visualization of protein
localization patterns was performed in ImageJ, and single-cell
expression visualization was performed in R.

Reporting summary
Further information on research design is available in the Nature
Research Reporting Summary linked to this article.

Data availability
The full RNA NSCLC dataset used in this study is available at
http://nanostring.com/CosMx-dataset.

Code availability
Data from this publication (the full RNA NSCLC dataset) has been
placed in the public domain in a format that can be analyzed and
visualized using a variety of open-source packages, such as Seurat
(https://github.com/satijalab/seurat) and Giotto
(https://github.com/RubD/Giotto). Nearly all of the analyses performed in
this paper can be accomplished using these open-source packages. For any
of the specialized code/analyses performed in this manuscript that are not
available through open-source packages, requests can be made through email
to the corresponding author.

References
46. Caruthers, M. H. et al. Chemical synthesis of
deoxyoligonucleotides by the phosphoramidite method. Methods
Enzymol. 154, 287–313 (1987).
47. Zhang, Z., Revyakin, A., Grimm, J. B., Lavis, L. D. & Tjian,
R. Single-molecule tracking of the transcription cycle by
sub-second RNA detection. eLife 3, e01775 (2014).
48. Wagle, M.-C. et al. A transcriptional MAPK Pathway Activity Score
(MPAS) is a clinically relevant biomarker in multiple cancer types.
NPJ Precis. Oncol. 2, 7 (2018).
49. Son, Y. H. et al. Roles of MAPK and NF-kappaB in interleukin-6
induction by lipopolysaccharide in vascular smooth muscle cells.
J. Cardiovasc. Pharmacol. 51, 71–77 (2008).
50. Kang, H. B., Kim, Y. E., Kwon, H. J., Sok, D. E. & Lee, Y. Enhancement
of NF-kappaB expression and activity upon differentiation of
human embryonic stem cell line SNUhES3. Stem Cells Dev. 16,
615–623 (2007).
51. Hiscott, J. et al. Induction of human interferon gene expression is
associated with a nuclear factor that interacts with the NF-kappa
B site of the human immunodeficiency virus enhancer. J. Virol. 63,
2557–2566 (1989).
52. Kitamura, A., Takahashi, K., Okajima, A. & Kitamura, N. Induction
of the human gene for p44, a hepatitis-C-associated microtubular
aggregate protein, by interferon-alpha/beta. Eur. J. Biochem. 224,
877–883 (1994).
53. Kim, J. H., Park, S. Y., Jun, Y., Kim, J. Y. & Nam, J. S. Roles of Wnt
target genes in the journey of cancer stem cells. Int. J. Mol. Sci. 18,
1604 (2017).
54. Jho, E. H. et al. Wnt/beta-catenin/Tcf signaling induces the
transcription of Axin2, a negative regulator of the signaling
pathway. Mol. Cell. Biol. 22, 1172–1183 (2002).
55. Lustig, B. et al. Negative feedback loop of Wnt signaling through
upregulation of conductin/axin2 in colorectal and liver tumors.
Mol. Cell. Biol. 22, 1184–1193 (2002).
56. Yan, D. et al. Elevated expression of axin2 and hnkd mRNA
provides evidence that Wnt/beta-catenin signaling is activated in
human colon tumors. Proc. Natl Acad. Sci. USA 98, 14973–14978
(2001).
57. Ramakrishnan, A. B. & Cadigan, K. M. Wnt target genes and where
to find them. F1000Res 6, 746 (2017).
58. Barker, N. et al. Identification of stem cells in small intestine and
colon by marker gene Lgr5. Nature 449, 1003–1007 (2007).
59. He, T. C. et al. Identification of c-MYC as a target of the APC
pathway. Science 281, 1509–1512 (1998).
60. Shtutman, M. et al. The cyclin D1 gene is a target of the
beta-catenin/LEF-1 pathway. Proc. Natl Acad. Sci. USA 96,
5522–5527 (1999).
61. Katoh, Y. & Katoh, M. Hedgehog target genes: mechanisms
of carcinogenesis induced by aberrant hedgehog signaling
activation. Curr. Mol. Med. 9, 873–886 (2009).
62. Danaher, P. et al. Gene expression markers of Tumor Infiltrating
Leukocytes. J. Immunother. Cancer 5, 18 (2017).
63. Nguyen, Q. H. et al. Profiling human breast epithelial cells using
single cell RNA sequencing identifies cell diversity. Nat. Commun.
9, 2028 (2018).
64. Barneda, D. et al. The brown adipocyte protein CIDEA promotes
lipid droplet fusion via a phosphatidic acid-binding amphipathic
helix. eLife 4, e07485 (2015).
65. Ussar, S. et al. ASC-1, PAT2, and P2RX5 are cell surface markers for
white, beige, and brown adipocytes. Sci. Transl. Med. 6, 247ra103
(2014).
66. Min, S. Y. et al. Diverse repertoire of human adipocyte subtypes
develops from transcriptionally distinct mesenchymal progenitor
cells. Proc. Natl Acad. Sci. USA 116, 17970–17979 (2019).
67. Shan, T., Liu, W. & Kuang, S. Fatty acid binding protein 4
expression marks a population of adipocyte progenitors in white
and brown adipose tissues. FASEB J. 27, 277–287 (2013).
68. Bradley, D. & Roth, G. Adaptive thresholding using the integral
image. J. Graph. Tools 12, 13–21 (2007).
69. Krishnamoorthy, A. & Menon, D. Matrix inversion using Cholesky
decomposition. In 2013 Signal Processing: Algorithms,
Architectures, Arrangements, and Applications (SPA) (IEEE, 2013).
70. Stringer, C., Wang, T., Michaelos, M. & Pachitariu, M. Cellpose: a
generalist algorithm for cellular segmentation. Nat. Methods 18,
100–106 (2021).

Acknowledgements
Research and development reported in this publication was supported
in part through a strategic development collaboration between
NanoString Technologies and Lam Research (Fremont, CA). The
authors thank B. Birditt, B. Filanoski, J. Jenkins and E. Zhao from
NanoString Technologies for provision of technical support.

Author contributions
S.H. undertook conception and design of the work, supervised data
collection, analysis and interpretation and drafted the manuscript.
R.B. developed protein data processing algorithms. C.B. was
responsible for instrument control software, SMI automation and
integration of all instrument subsystems software. E.A.B. reviewed
and validated protein data (Fig. 6) and performed the nested protein
multiplex validation experiment (Supplementary Fig. 17). D.L.B.
developed manual and automated processes for inventorying,
quantitation, normalization, pooling, purification and quality control
of oligonucleotides in 980-plex RNA panel and SMI reporters. K.C.
performed oligo conjugations of antibodies used in SMI protein
assays. P.D. designed the 980-plex RNA panel with E.P., performed
comparison with RNA-seq in Fig. 2, performed NSCLC analyses in
Fig. 3, performed reproducibility analysis in Fig. 5 and contributed
to manuscript writing/editing. D.D. led the team that developed the
system, instrumentation and software. R.G.G. developed reporter
chemistry, assembly, quality control and manufacturing, and
contributed to the section SMI reporter design and assembly and
Supplementary Fig. 18. G.G. led development of protein-based SMI
assays and design and interpretation of protein experiments. M.T.G.
analyzed profiling data of human lung cells. M.L.H. contributed to
the development of reporters and analysis of FFPE RNA quality and
created Supplementary Fig. 6. R.K. designed and developed reporter
structure. E.E.K. contributed to SMI methods. D.K. developed the
overall concept, designed and guided experiments and analyzed data.
T.K.K. supervised data collection and interpretation of human lung
samples. Y.K. supervised data collection, analysis and interpretation
of human lung samples. A. Klock developed the SMI ISH probe design
pipeline and designed SMI ISH probes used in the 980-plex RNA
panel. M.K. developed the primary data analysis pipeline for RNA and
protein targets. A. Kutchma designed SMI ISH probes and wrote the
section SMI ISH probe design. Z.R.L. designed and developed the
protein assay, executed protein analyses and performed manuscript
writing and editing. Y.L. conducted a pathological review to identify
the correct staining pattern for all antibodies used. J.S.N.’s team was
responsible for all chemistry process development efforts and supply
chain management pertaining to the outsourcing of oligonucleotide
synthesis, process improvements, price negotiation and supply
agreement, key component scale-up and validation of all custom
reagents and DNA components in R&D required for SMI, including
contract manufacturing operation and validation of the PC spacer,
synthesis and quality control of the three large-scale component
sets needed for SMI reporter manufacturing plus high-throughput
synthesis, outsourcing, quality control and processing of the
thousands of required ISH probes. G.T.O. developed morphology and
segmentation markers. E.P.P. developed the SMI instrument optical
subsystem, instrument validation and support. J.C.P. developed the
encoding scheme, screened reporter sequences and developed
readout sequences. T.P.-E. optimized protein assay and undertook
data collection, protein analyses and manuscript writing/editing.
E.P. designed the 980-plex RNA panel with P.D. and contributed to
manuscript writing/editing. T.R. developed the secondary analysis
and target decoding pipeline for RNA targets, and the SMI instrument
fluidic subsystem and workflow software. Z.R. developed the
LR interaction analysis method, analyzed LR interactions across
all tumors and contributed to manuscript writing/editing. M.R.
performed initial chemistry development, supervised data release and
contributed to writing the manuscript. A.R. was responsible for SMI
protein assay content design and reagent validation. D.R. created plots
of transcript positions overlaid on morphology and segmentation
images for the figures. H.S. carried out manuscript development,
writing and editing and created figures and tables. A.W.W. designed
and developed the cell segmentation pipeline and contributed to
manuscript writing/editing. C.A.W.-W. performed lung RNA isolation,
processing and quality measurement and NGS library preparation
and sequencing. L.W. established the cell segmentation pipeline,
optimized the on-instrument SMI readout workflow and contributed
to manuscript writing/editing. J.M.B. conceived the project, helped
with experimental design and analysis and contributed to manuscript
writing/editing.

Competing interests
All authors are employees of NanoString Technologies and hold
NanoString stock or stock options. D.K. is an employee of Dxome Co.

Additional information
Supplementary information The online version contains
supplementary material available at
https://doi.org/10.1038/s41587-022-01483-z.

Correspondence and requests for materials should be addressed to
Joseph M. Beechem.

Peer review information Nature Biotechnology thanks Sanjay Tyagi
and the other, anonymous, reviewer(s) for their contribution to the
peer review of this work.

Reprints and permissions information is available at
www.nature.com/reprints.
