
Journal pre-proof

DOI: 10.1016/j.cub.2020.03.022

This is a PDF file of an accepted peer-reviewed article but is not yet the definitive version of record. This version
will undergo additional copyediting, typesetting and review before it is published in its final form, but we are
providing this version to give early visibility of the article. Please note that, during the production process, errors
may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
© 2020 The Author(s).

Title: Probable pangolin origin of SARS-CoV-2 associated with the COVID-19 outbreak
Authors: Tao Zhang1†, Qunfu Wu1†, Zhigang Zhang1,2*
Affiliations:
1 State Key Laboratory for Conservation and Utilization of Bio-Resources in Yunnan, School of Life Sciences, Yunnan University, No.2 North Cuihu Road, Kunming, Yunnan, 650091, China
2 Lead Contact
† These authors contributed equally to this work
* Correspondence: zhangzhigang@ynu.edu.cn
Summary:
An outbreak of coronavirus disease 2019 (COVID-19) caused by the 2019 novel coronavirus (SARS-CoV-2) began in the city of Wuhan in China and has spread widely worldwide. Identifying potential intermediate hosts of SARS-CoV-2 is currently vital for controlling the spread of COVID-19. We therefore reinvestigated published data from pangolin lung samples in which SARS-CoV-like CoVs were detected by Liu et al.[1]. We found genomic and evolutionary evidence of a SARS-CoV-2-like CoV (named Pangolin-CoV) in dead Malayan pangolins. At the whole-genome level, Pangolin-CoV is 91.02% and 90.55% identical to SARS-CoV-2 and BatCoV RaTG13, respectively. Aside from RaTG13, Pangolin-CoV is the CoV most closely related to SARS-CoV-2. The S1 protein of Pangolin-CoV is much more closely related to that of SARS-CoV-2 than to that of RaTG13. Five key amino acid residues involved in the interaction with human ACE2 are identical between Pangolin-CoV and SARS-CoV-2, whereas RaTG13 carries four amino acid mutations at these residues. Both Pangolin-CoV and RaTG13 lack the putative furin recognition sequence motif at the S1/S2 cleavage site that is present in SARS-CoV-2. In conclusion, this study suggests that pangolin species are a natural reservoir of SARS-CoV-2-like CoVs.
Keywords: Pangolin; SARS-CoV-2; COVID-19; Origin.
Results and Discussion
As for SARS-CoV and MERS-CoV[2], bats remain a probable species of origin for SARS-CoV-2, because SARS-CoV-2 shares 96% whole-genome identity with a bat coronavirus (CoV), BatCoV RaTG13, from Rhinolophus affinis in Yunnan Province[3]. However, SARS-CoV and MERS-CoV typically passed through intermediate hosts, such as civets or camels, before jumping to humans[4]. This suggests that SARS-CoV-2 was probably also transmitted to humans via other animals. Because the earliest reported COVID-19 patient had no exposure to the seafood market[5], it is vital to identify the intermediate host of SARS-CoV-2 in order to block interspecies transmission. On 24 October 2019, Liu and colleagues from the Guangdong Wildlife Rescue Center of China[1] first reported a SARS-CoV-like CoV in lung samples from two dead Malayan pangolins with frothy liquid in their lungs and pulmonary fibrosis, a finding made close in time to the onset of the COVID-19 outbreak. Using their published results, we found that all virus contigs assembled from the two lung samples (lung07 and lung08) showed low identity, ranging from 80.24% to 88.93%, to known SARSr-CoVs. We therefore hypothesized that the dead Malayan pangolins may carry a new CoV closely related to SARS-CoV-2.
Assessing the probability of SARS-CoV-2-like CoV presence in pangolin species
To test this hypothesis, we downloaded the raw RNA-seq data for the two lung samples from the NCBI Sequence Read Archive (SRA; BioProject accession PRJNA573298) and performed quality control and contaminant removal consistent with the procedure described by Liu et al.[1]. We found 1882 clean reads from the lung08 sample that mapped to the SARS-CoV-2 reference genome (GenBank accession MN908947)[6] and covered 76.02% of the SARS-CoV-2 genome. De novo assembly of these reads yielded 36 contigs ranging from 287 bp to 2187 bp in length (mean length 700 bp). Through BLAST analysis against proteins from 2845 CoV reference genomes, including RaTG13, SARS-CoV-2 genomes, and other known CoVs, we found that 22 contigs best matched SARS-CoV-2 (70.6%-100% amino acid identity; average 95.41%) and 12 contigs matched bat SARS-CoV-like CoVs (92.7%-100% amino acid identity; average 97.48%) (Table S1). These results indicate that the Malayan pangolin might carry a novel CoV (here named Pangolin-CoV) that is similar to SARS-CoV-2.
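The contig annotation step above amounts to keeping each contig's best BLAST hit and summarizing identities per taxon. A minimal Python sketch follows, assuming BLAST+ tabular output in the default `-outfmt 6` column order and ranking hits by bit score; the grouping function and the `<group>|<protein>` subject IDs are hypothetical, and the paper does not state how multiple or tied hits were resolved. With BLASTx against protein references, `pident` is amino acid identity.

```python
from collections import defaultdict
from statistics import mean

# Default column order of BLAST+ tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(lines):
    """Keep, for each query contig, the hit with the highest bit score."""
    best = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        row = dict(zip(FIELDS, line.rstrip("\n").split("\t")))
        row["pident"] = float(row["pident"])
        row["bitscore"] = float(row["bitscore"])
        current = best.get(row["qseqid"])
        if current is None or row["bitscore"] > current["bitscore"]:
            best[row["qseqid"]] = row
    return best


def summarize_by_group(best, group_of):
    """Count best hits per group and report (n, min, max, mean) percent identity.

    group_of maps a subject ID to a label such as "SARS-CoV-2"; it is a
    caller-supplied (hypothetical) lookup, not part of BLAST output.
    """
    groups = defaultdict(list)
    for row in best.values():
        groups[group_of(row["sseqid"])].append(row["pident"])
    return {g: (len(v), min(v), max(v), round(mean(v), 2)) for g, v in groups.items()}


if __name__ == "__main__":
    # Toy rows with hypothetical subject IDs ("<group>|<protein>").
    demo = [
        "contig1\tSARS2|orf1ab\t95.0\t200\t10\t0\t1\t600\t1\t200\t1e-50\t300",
        "contig1\tbatSL|orf1ab\t90.0\t200\t20\t0\t1\t600\t1\t200\t1e-40\t250",
        "contig2\tbatSL|S\t97.0\t100\t3\t0\t1\t300\t1\t100\t1e-30\t180",
    ]
    print(summarize_by_group(best_hits(demo), lambda s: s.split("|")[0]))
```

Ranking by bit score rather than percent identity favors longer, more significant alignments, which is a reasonable default for short contigs.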
Draft genome of Pangolin-CoV and its genomic characteristics
Using a reference-guided scaffolding approach, we created a Pangolin-CoV draft genome (19,587 bp) from the 34 annotated contigs described above. To reduce the effect of raw read errors on scaffolding quality, small fragments shorter than 25 bp in the alignment against the reference genome were manually discarded if they were not covered by any larger fragment or by the reference genome. Remapping the 1882 reads against the draft genome resulted in 99.99% genome coverage (depth range 1X-47X) (Figure 1A). The mean coverage depth across the whole genome was 7.71X, more than twice the 3X read depth that was the lowest common coverage for single-nucleotide polymorphism (SNP) calling from low-coverage sequencing in the 1000 Genomes Project pilot phase[7]. Similar coverage levels are also sufficient to detect rare or low-abundance microbial species in metagenomic datasets[8], indicating that our assembled Pangolin-CoV draft genome is reliable for further analyses. Based on SimPlot analysis[9], Pangolin-CoV showed high overall sequence identity to RaTG13 (90.55%) and SARS-CoV-2 (91.02%) throughout the genome (Figure 1B), although SARS-CoV-2 and RaTG13 share even higher identity with each other (96.2%)[3]. Other SARS-CoV-like CoVs similar to Pangolin-CoV were bat SARSr-CoV ZXC21 (85.65%) and bat SARSr-CoV ZC45 (85.01%). While this manuscript was under review, two similar preprint studies found that CoVs in pangolins shared 90.3%[10] and 92.4%[11] DNA identity with SARS-CoV-2, close to the 91.02% identity observed here, which supports our findings. Taken together, these results indicate that Pangolin-CoV might be the common origin of SARS-CoV-2 and RaTG13.
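The coverage figures above (breadth, depth range and mean depth) can be derived from per-position read depths. The following sketch assumes input in the three-column format written by `samtools depth -a` (reference name, 1-based position, depth) for a single reference sequence; the function name and dictionary keys are illustrative and not taken from the study.

```python
def coverage_stats(depth_lines, genome_length):
    """Summarize `samtools depth -a` style output (chrom, 1-based pos, depth).

    Assumes a single reference sequence; positions missing from the input
    are treated as zero depth.
    """
    depths = [0] * genome_length
    for line in depth_lines:
        if not line.strip():
            continue
        _chrom, pos, depth = line.rstrip("\n").split("\t")[:3]
        depths[int(pos) - 1] = int(depth)  # convert 1-based position to index
    covered = [d for d in depths if d > 0]
    return {
        "breadth_pct": 100.0 * len(covered) / genome_length,
        "mean_depth": sum(depths) / genome_length,   # averaged over all positions
        "min_depth_covered": min(covered, default=0),
        "max_depth": max(depths, default=0),
    }
```

Applied to the reported values, 19,585 covered positions out of 19,587 give a breadth of 99.99%, and a mean depth of 7.71X is about 2.6 times the 3X threshold cited above.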
The Pangolin-CoV genome organization was characterized by sequence alignment against SARS-CoV-2 (GenBank accession MN908947) and RaTG13. The Pangolin-CoV genome consists of six major open reading frames (ORFs) common to CoVs and four other accessory genes (Figure 1C and Table S2). Pangolin-CoV genes aligned to SARS-CoV-2 genes with coverage ranging from 45.8% to 100% (average coverage 76.9%). Pangolin-CoV genes shared high average nucleotide and amino acid identity with the genes of both SARS-CoV-2 (MN908947) (93.2% nucleotide/94.1% amino acid identity) and RaTG13 (92.8% nucleotide/93.5% amino acid identity) (Figure 1C and Table S2). Surprisingly, some Pangolin-CoV genes showed higher amino acid identity to SARS-CoV-2 genes than to RaTG13 genes, including orf1b (73.4%/72.8%), the spike (S) protein gene (97.5%/95.4%), orf7a (96.9%/93.6%), and orf10 (97.3%/94.6%). The high amino acid identity of the S protein implies functional similarity between Pangolin-CoV and SARS-CoV-2.
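Gene-level and whole-genome identities such as those above are computed from pairwise alignments. The sketch below shows one common convention (columns with a gap in either sequence are skipped) and a SimPlot-style sliding window. The 200-column window and 20-column step are assumed values for illustration, since the study does not report its SimPlot settings, and SimPlot's own gap-handling options may differ.

```python
def percent_identity(a, b):
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue  # gap columns are excluded (an assumed convention)
        compared += 1
        matches += x == y
    return 100.0 * matches / compared if compared else float("nan")


def sliding_identity(query, ref, window=200, step=20):
    """Return (window midpoint, percent identity) pairs along an alignment."""
    points = []
    for start in range(0, max(len(query) - window, 0) + 1, step):
        end = min(start + window, len(query))
        mid = (start + end) // 2
        points.append((mid, percent_identity(query[start:end], ref[start:end])))
    return points
```

The same `percent_identity` function applies to a single aligned gene pair (nucleotide or amino acid), which is how per-gene values like those in Table S2 would be obtained under this convention.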
Phylogenetic relationships among Pangolin-CoV, RaTG13 and SARS-CoV-2
To determine the evolutionary relationships among Pangolin-CoV, SARS-CoV-2, and previously identified CoVs, we estimated phylogenetic trees based on the nucleotide sequences of the whole genome, the RNA-dependent RNA polymerase (RdRp) gene, the non-structural protein genes ORF1a and ORF1b, and the S and M genes, which encode the main structural proteins. In all phylogenies, Pangolin-CoV, RaTG13, and SARS-CoV-2 clustered into a well-supported group, here named the "SARS-CoV-2 group" (Figure 2 and Figures S1 and S2). This group represents a novel Betacoronavirus group. Within this group, RaTG13 and SARS-CoV-2 clustered together, and Pangolin-CoV was their closest relative, branching at the base of the group. However, whether bat SARSr-CoV ZXC21 and/or bat SARSr-CoV ZC45 occupy the position basal to the SARS-CoV-2 group is still under debate; the same question arose in both the Wu et al.[6] and Zhou et al.[3] studies. A possible explanation is a past history of recombination in the Betacoronavirus group[6]. Notably, the evolutionary relationships among CoVs recovered from the whole genome, the RdRp gene, and the S gene were highly consistent with those inferred from complete genome information by Zhou et al.[3]. This correspondence indicates that our Pangolin-CoV draft genome contains enough genomic information to trace the true evolutionary position of Pangolin-CoV among CoVs.
Dualism of the S protein of Pangolin-CoV
The CoV S protein consists of two subunits (S1 and S2), mediates infection of receptor-expressing host cells, and is a critical target for antiviral neutralizing antibodies[12]. S1 contains a receptor-binding domain (RBD) of approximately 193 amino acids that is responsible for recognizing and binding the cell-surface receptor[13, 14]. Zhou et al. experimentally confirmed that SARS-CoV-2 can use human, Chinese horseshoe bat, civet, and pig ACE2 proteins as an entry receptor in ACE2-expressing cells[3], suggesting that the RBD of SARS-CoV-2 mediates infection in humans and other animals. To gain sequence-level insight into the pathogenic potential of Pangolin-CoV, we first investigated the amino acid variation pattern of the S1 proteins from Pangolin-CoV, SARS-CoV-2, RaTG13, and other representative SARS/SARSr-CoVs. The amino acid phylogenetic tree showed that the S1 protein of Pangolin-CoV is more closely related to that of SARS-CoV-2 than to that of RaTG13. Within the RBD, we further found that Pangolin-CoV and SARS-CoV-2 were highly conserved, with only one amino acid change (500H/500Q) (Figure 3), which is not one of the five key residues involved in the interaction with human ACE2[3, 14]. These results indicate that Pangolin-CoV could have pathogenic potential similar to that of SARS-CoV-2. In contrast, RaTG13 has changes at 17 amino acid residues, 4 of which are among the key residues (Figure 3). There is evidence that the change from 472L (SARS-CoV) to 486F (SARS-CoV-2) (corresponding to the second key residue change in Figure 3) may produce stronger van der Waals contact with M82 of ACE2[15]. In addition, the major substitution of 404V in the SARS-CoV RBD by 417K in the SARS-CoV-2 RBD (alignment position 420 in Figure 3; this residue does not differ between SARS-CoV-2 and RaTG13) may result in tighter association because of salt bridge formation between 417K and 30D of ACE2[15]. Nevertheless, whether these mutations affect the affinity for ACE2 needs further investigation, and whether Pangolin-CoV or RaTG13 are potential infectious agents in humans remains to be determined.
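Counting RBD differences and checking key residues, as in the comparison above, reduces to a column-wise comparison of an alignment against a reference row. A minimal sketch, assuming the sequences are already aligned to equal length and that the caller supplies the region and the key-residue columns as 0-based alignment columns (not protein numbering); the function name and toy sequences are hypothetical.

```python
def compare_region(alignment, reference, start, end, key_columns):
    """Compare aligned sequences with a reference row.

    alignment: {name: aligned sequence}, all of equal length.
    Counts differing columns in [start, end) and lists which of the
    caller-supplied key columns differ. Columns are 0-based.
    Returns {name: (n_differences, differing_key_columns)}.
    """
    ref = alignment[reference]
    result = {}
    for name, seq in alignment.items():
        if name == reference:
            continue
        diffs = [i for i in range(start, end) if seq[i] != ref[i]]
        key_diffs = [i for i in key_columns if seq[i] != ref[i]]
        result[name] = (len(diffs), key_diffs)
    return result
```

For a real analysis, the five key ACE2-contact positions would have to be mapped from protein numbering to alignment columns first, since gaps shift the two coordinate systems.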
The S1/S2 cleavage site in the S protein is also an important determinant of the transmissibility and pathogenicity of SARS-CoV/SARSr-CoV viruses[16]. The trimeric S protein is processed at the S1/S2 cleavage site by host cell proteases during infection. Following cleavage, also known as priming, the protein is divided into an N-terminal S1 ectodomain that recognizes a cognate cell-surface receptor and a C-terminal, membrane-anchored S2 subunit that drives fusion of the viral envelope with a cellular membrane. We found that the SARS-CoV-2 S protein contains a putative furin recognition motif (PRRARSV) (Figure 4), similar to that of MERS-CoV, whose PRSVRSV motif is likely cleaved by furin during virus egress[16, 17]. Conversely, the furin motif at the S1/S2 site is missing from the S proteins of Pangolin-CoV and all other SARS/SARSr-CoVs. This difference suggests that SARS-CoV-2 might have acquired a distinct mechanism to promote its entry into host cells[18]. Interestingly, besides MERS-CoV, sequence patterns similar to that of SARS-CoV-2 are also present in some members of Alphacoronavirus, Betacoronavirus, and Gammacoronavirus[19]. This raises the question of whether the furin motif in SARS-CoV-2 was derived from the S proteins of other existing coronaviruses or, alternatively, whether SARS-CoV-2 is a recombinant of Pangolin-CoV or RaTG13 and other coronaviruses carrying a similar furin recognition motif in an unknown intermediate host.
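Whether an S1/S2 region carries a furin-like site can be screened with a simple pattern search. The sketch below assumes the commonly cited minimal furin recognition pattern R-X-X-R and the stricter R-X-[K/R]-R consensus; it is an illustrative screen, not the alignment-based comparison used for Figure 4, and the pattern names are hypothetical. The SARS-CoV-2 motif PRRARSV matches only the minimal pattern (as RRAR), and the MERS-CoV motif PRSVRSV matches it as RSVR.

```python
import re

# Minimal furin recognition pattern R-X-X-R (arginine, any two residues, arginine).
# The lookahead lets overlapping hits be reported.
MINIMAL_FURIN = re.compile(r"(?=(R..R))")
# Stricter consensus R-X-[K/R]-R.
CANONICAL_FURIN = re.compile(r"(?=(R.[KR]R))")


def find_motifs(seq, pattern=MINIMAL_FURIN):
    """Return (0-based start, matched motif) for every, possibly overlapping, hit."""
    return [(m.start(), m.group(1)) for m in pattern.finditer(seq.upper())]


if __name__ == "__main__":
    for label, motif in [("SARS-CoV-2", "PRRARSV"), ("MERS-CoV", "PRSVRSV")]:
        print(label, find_motifs(motif), find_motifs(motif, CANONICAL_FURIN))
```

A pattern match is only a sequence-level hint; whether a site is actually cleaved by furin has to be shown experimentally.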
Amino acid variations in the nucleocapsid (N) protein for potential diagnosis
The N protein is the most abundant protein in CoVs. It is a highly immunogenic phosphoprotein that is normally well conserved, and the CoV N protein is often used as a marker in diagnostic assays. To gain further insight into the diagnostic potential of Pangolin-CoV, we investigated the amino acid variation pattern of the N proteins from Pangolin-CoV, SARS-CoV-2, RaTG13, and other representative SARS-CoVs. Phylogenetic analysis based on the N protein supported the classification of Pangolin-CoV as a sister taxon of SARS-CoV-2 and RaTG13 (Figure S3). We further found eight amino acid mutations that differentiated CoVs of our defined "SARS-CoV-2 group" (12N, 26G, 27S, 104D, 218A, 335T, 346N, and 350Q) from other known SARS-CoVs (12S, 26D, 27N, 104E, 218T, 335H, 346Q, and 350N). Two residues (38P and 268Q) shared by Pangolin-CoV, RaTG13, and SARS-CoVs are replaced by 38S and 268A in SARS-CoV-2. Only one residue shared by Pangolin-CoV and other SARS-CoVs (129E) differs consistently in both SARS-CoV-2 and RaTG13 (129D). These amino acid changes in the N protein could be useful for developing antigens with improved sensitivity for SARS-CoV-2 serological detection.
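Group-differentiating N protein sites of the kind listed above can be found by scanning alignment columns for residues that are fixed within each group but differ between groups. A minimal sketch, assuming an equal-length amino acid alignment stored as a dict and caller-defined group memberships; the function name is hypothetical, and the reported columns are alignment positions, which match protein numbering only if the reference row has no gaps.

```python
def diagnostic_sites(alignment, group_a, group_b):
    """Find columns where group A is fixed for one residue, group B is fixed
    for another, and the two residues differ. Gapped columns are skipped.

    Returns a list of (1-based column, residue in A, residue in B).
    """
    length = len(next(iter(alignment.values())))
    sites = []
    for i in range(length):
        a = {alignment[name][i] for name in group_a}
        b = {alignment[name][i] for name in group_b}
        if len(a) == 1 and len(b) == 1 and a != b and "-" not in a | b:
            sites.append((i + 1, a.pop(), b.pop()))
    return sites
```

Requiring each group to be fixed makes the criterion strict; a single divergent sequence removes a column, which mirrors the "consistently different" wording used above.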
Conclusion
Based on published metagenomic data, this study provides the first report of a potential close relative (Pangolin-CoV) of SARS-CoV-2, discovered in dead Malayan pangolins after extensive rescue efforts. Aside from RaTG13, Pangolin-CoV is the CoV most closely related to SARS-CoV-2. Because the original samples were unavailable, we did not perform further experiments to confirm our findings, such as PCR validation, serological detection, or isolation of virus particles. The Pangolin-CoV genome we identified showed 91.02% nucleotide identity with the SARS-CoV-2 genome. However, whether pangolin species are good candidates for the origin of SARS-CoV-2 is still under debate. Given the wide distribution of SARSr-CoVs in natural reservoirs such as bats, camels, and pangolins, our findings could help in identifying novel intermediate hosts of SARS-CoV-2 and in blocking interspecies transmission.
Acknowledgements
This study was supported by the Second Tibetan Plateau Scientific Expedition and Research (STEP) program (no. 2019QZKK0503), the National Key Research and Development Program of China (no. 2018YFC2000500), the Key Research Program of the Chinese Academy of Sciences (no. KFZD-SW-219), and the Chinese National Natural Science Foundation (no. 31970571).
Author Contributions
Z.Z. performed project planning, coordination, execution, and facilitation. T.Z. and Q.W. performed the metagenomic analysis. T.Z. carried out assembly, gene prediction, and annotation. Q.W. performed data collection and phylogenetic analysis. Z.Z., T.Z., and Q.W. prepared the manuscript.
Declaration of Interests
The authors declare no competing interests.
Figure Legends
Figure 1 Genome-related analysis. (A) Sequencing depth of reads remapped to Pangolin-CoV. (B) Similarity plot based on the full-length genome sequence of Pangolin-CoV. Full-length genome sequences of SARS-CoV-2 (Beta-CoV/Wuhan-Hu-1), BatCoV RaTG13, bat SARSr-CoV ZXC21, bat SARSr-CoV ZC45, bat SARSr-CoV WIV1, and SARS-CoV BJ01 were used as reference sequences. (C) Comparison of genome organization among SARS-CoV-2, Pangolin-CoV, and BatCoV RaTG13. Related to Table S2.
Figure 2 Phylogenetic relationships of CoVs based on whole-genome and RdRp gene nucleotide sequences. Red text denotes the Malayan Pangolin-CoV. Pink text denotes SARS-CoV-2. Green text denotes a bat CoV with 96% genome-level identity to SARS-CoV-2. Blue text denotes the reference CoVs used in Figure 1B. Detailed information can be found in the STAR Methods. Related to Figures S1 to S3.
Figure 3 Amino acid sequence alignment of the S1 protein and its phylogeny. The receptor-binding motif of SARS-CoV and the homologous regions of other CoVs are indicated by the grey box. The key amino acid residues involved in the interaction with human ACE2 are marked with orange boxes. Bat SARS-CoV-like CoVs have been reported not to use ACE2 and to carry amino acid deletions at two motifs, marked by yellow boxes. Detailed information can be found in the STAR Methods.
Figure 4 CoV S protein S1/S2 cleavage sites. Four amino acid insertions (SPRR) unique to SARS-CoV-2 are marked in yellow. Conserved S1/S2 cleavage sites are marked in green.
STAR Methods
KEY RESOURCES TABLE
LEAD CONTACT AND MATERIALS AVAILABILITY
Requests for further information and data resources should be directed to and will be fulfilled by the Lead Contact, Zhigang Zhang (zhangzhigang@ynu.edu.cn). This study did not generate new unique reagents.
METHOD DETAILS
Data collection and preprocessing
We downloaded raw data for the lung08 and lung07 samples published in Liu's study[1] from the NCBI SRA under BioProject PRJNA573298. Raw reads were first adaptor- and quality-trimmed using Trimmomatic (version 0.39)[20]. To remove host contamination, clean reads were mapped with Bowtie2 (version 2.3.4.3)[21] to the Manis javanica host reference genome (NCBI Project ID: PRJNA256023). Only the unmapped reads were then mapped to the SARS-CoV-2 reference genome (GenBank accession MN908947) to identify virus reads.
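The host-removal step keeps only reads that fail to align to the pangolin genome. The sketch below applies the standard SAM FLAG bit 0x4 (segment unmapped) to SAM records such as those written by Bowtie2; it is illustrative only, the function name is hypothetical, and in practice Bowtie2's `--un`/`--un-conc` options can write unaligned reads directly. How to treat pairs in which only one mate is unmapped is left to the caller.

```python
def unmapped_read_names(sam_lines):
    """Return names of reads whose SAM FLAG has bit 0x4 (segment unmapped) set.

    Header lines (starting with '@') are skipped; each name is reported once,
    in order of first appearance.
    """
    names = []
    for line in sam_lines:
        if line.startswith("@") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        qname, flag = fields[0], int(fields[1])
        if flag & 0x4:
            names.append(qname)
    return list(dict.fromkeys(names))  # de-duplicate mates sharing a name
```

The retained names could then be used to pull the corresponding reads from the trimmed FASTQ files before mapping them to MN908947.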
Genome assembly and gene prediction
Virus-mapped reads were assembled de novo using MEGAHIT (version 1.1.3)[22]. Reads were remapped to the assembled contigs using Bowtie2[21]. Mapping coverage and depth were determined using SAMtools (version 1.9)[23]. Contigs were taxonomically annotated using BLAST+ 2.9.0[24] against 2845 CoV reference genomes (Table S1). The BatCoV RaTG13 genome was downloaded from the NGDC database (https://bigd.big.ac.cn/) (accession no. GWHABKP00000000)[3]. The SARS-CoV-2 reference genome was downloaded from NCBI (accession no. MN908947)[6]. Other CoV genomes were downloaded from the ViPR database (https://www.viprbrc.org/brc/home.spg?decorator=corona) on 6 February 2020. We then used a reference-guided strategy to construct a draft genome from contigs taxonomically annotated as SARS-CoV-2, SARS-CoV, or bat SARS-CoV-like CoV. Each contig was aligned against the SARS-CoV-2 reference genome with MUSCLE (version 3.8.31)[25]. Aligned contigs were merged into consensus scaffolds with BioEdit version 7.2.5 (http://www.mybiosoftware.com/bioedit-7-0-9-biological-sequence-alignment-editor.html) after manual quality checking. Small fragments shorter than 25 bp were discarded if they were not covered by any larger fragment. The potential ORFs of the final draft genome were annotated by alignment to the SARS-CoV-2 reference genome (accession no. MN908947). SimPlot 3.5.1[9] was used to analyze whole-genome nucleotide identity.
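The small-fragment rule and consensus merging described above were done manually in BioEdit; the sketch below only illustrates the logic under stated assumptions. Fragments are assumed to be already placed on SARS-CoV-2 reference coordinates (0-based, end-exclusive), a short fragment is kept only if a fragment of at least 25 bp spans it, and each position takes the majority base, with `N` where no fragment aligns. Function names are hypothetical.

```python
from collections import Counter


def filter_small_fragments(fragments, min_len=25):
    """fragments: list of (start, end, seq) on reference coordinates.

    Keep fragments of at least min_len; keep a shorter one only if its span
    lies entirely inside some fragment of at least min_len.
    """
    large = [(s, e) for s, e, _ in fragments if e - s >= min_len]
    kept = []
    for s, e, seq in fragments:
        if e - s >= min_len or any(ls <= s and e <= le for ls, le in large):
            kept.append((s, e, seq))
    return kept


def consensus(fragments, ref_len, gap="N"):
    """Majority-vote base at each reference position; uncovered positions get `gap`."""
    columns = [Counter() for _ in range(ref_len)]
    for s, e, seq in fragments:
        for offset, base in enumerate(seq[: e - s]):
            if base != "-":  # alignment gaps do not vote
                columns[s + offset][base] += 1
    return "".join(c.most_common(1)[0][0] if c else gap for c in columns)
```

Whether uncovered stretches were padded or removed in the 19,587 bp draft is not stated; the `N` padding here is an assumption that keeps reference coordinates intact.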
Phylogeny
Sequence alignment was carried out using MUSCLE[25]. Alignment accuracy was checked manually base by base. Gblocks[26] was used to process gaps in the aligned sequences. Using MEGA X (version 10.1.7)[27], we inferred all maximum likelihood (ML) phylogenetic trees.
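Gblocks selects conserved blocks using several criteria, such as minimum block length and the treatment of gap positions. The much simpler filter below, which only drops alignment columns whose gap fraction exceeds a threshold, is a stand-in to show the idea of gap removal; it is not the Gblocks algorithm, and the function name and threshold parameter are assumptions.

```python
def strip_gappy_columns(alignment, max_gap_fraction=0.0):
    """Drop alignment columns whose fraction of '-' exceeds max_gap_fraction.

    alignment: {name: aligned sequence}, all of equal length.
    With the default of 0.0, every column containing any gap is removed.
    """
    names = list(alignment)
    seqs = [alignment[n] for n in names]
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all sequences must be aligned to the same length")
    keep = [i for i in range(length)
            if sum(s[i] == "-" for s in seqs) / len(seqs) <= max_gap_fraction]
    return {n: "".join(s[i] for i in keep) for n, s in zip(names, seqs)}
```

Raising `max_gap_fraction` retains more sites at the cost of keeping partially gapped columns, which is the same trade-off Gblocks' gap settings control in a more refined way.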
QUANTIFICATION AND STATISTICAL ANALYSIS
Using MEGA X[27], we constructed all maximum likelihood (ML) phylogenetic trees under the best-fit DNA/amino acid substitution model with 1000 bootstrap replicates. Phylogenetic analyses were performed using the nucleotide sequences of various CoV gene datasets: the whole genome, ORF1a, ORF1b, and the membrane (M), S, and RdRp genes. The best-fit model for M was GTR+G, and that for all the others was GTR+G+I. Two additional protein-based trees were constructed under WAG+G (S1 subunit of the S protein) and JTT+G (N protein). Branches with bootstrap values < 70% were hidden in all phylogenetic trees.
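"Hidden" branches can mean either that bootstrap labels below 70% were suppressed or that those branches were collapsed into polytomies. The sketch below assumes the second reading and uses a small hypothetical `Node` class rather than any particular tree library; it returns a new tree and can render it in a Newick-like form.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    name: str = ""
    support: Optional[float] = None  # bootstrap % on the branch leading to this node
    children: List["Node"] = field(default_factory=list)


def collapse_low_support(node, threshold=70.0):
    """Return a copy in which internal branches with support < threshold are
    collapsed, i.e. their children are attached directly to the parent."""
    new_children = []
    for child in node.children:
        child = collapse_low_support(child, threshold)
        is_internal = bool(child.children)
        if is_internal and child.support is not None and child.support < threshold:
            new_children.extend(child.children)  # splice grandchildren upward
        else:
            new_children.append(child)
    return Node(node.name, node.support, new_children)


def to_newick(node):
    """Render a tree as a Newick-like string (no branch lengths, no trailing ';')."""
    if not node.children:
        return node.name
    inner = ",".join(to_newick(c) for c in node.children)
    label = "" if node.support is None else f"{node.support:g}"
    return f"({inner}){label}"
```

Collapsing rather than merely hiding labels makes the displayed topology reflect only well-supported relationships.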
DATA AND CODE AVAILABILITY
The dataset used in this study is provided as supplementary material (Tables S1 and S2). This study did not generate code.
Supplemental Information
Table S1 Contigs taxonomically annotated using BLASTx against 2845 CoV reference genomes. Related to STAR Methods.
Table S2 Comparison of nucleotide and amino acid sequence identity of ten genes among Pangolin-CoV, SARS-CoV-2, and RaTG13. Related to Figure 1C.
References
1. Liu, P., Chen, W., and Chen, J.-P. (2019). Viral metagenomics revealed Sendai virus and coronavirus infection of Malayan Pangolins (Manis javanica). Viruses 11, 979.
2. Li, W., Shi, Z., Yu, M., Ren, W., Smith, C., Epstein, J.H., Wang, H., Crameri, G., Hu, Z., Zhang, H., et al. (2005). Bats are natural reservoirs of SARS-like coronaviruses. Science 310, 676-679.
3. Zhou, P., Yang, X.-L., Wang, X.-G., Hu, B., Zhang, L., Zhang, W., Si, H.-R., Zhu, Y., Li, B., Huang, C.-L., et al. (2020). A pneumonia outbreak associated with a new coronavirus of probable bat origin. Nature. doi: https://doi.org/10.1038/s41586-020-2012-7.
4. Cui, J., Li, F., and Shi, Z.-L. (2019). Origin and evolution of pathogenic coronaviruses. Nat. Rev. Microbiol. 17, 181-192.
5. Huang, C., Wang, Y., Li, X., Ren, L., Zhao, J., Hu, Y., Zhang, L., Fan, G., Xu, J., Gu, X., et al. (2020). Clinical features of patients infected with 2019 novel coronavirus in Wuhan, China. Lancet 395, 497-506.
6. Wu, F., Zhao, S., Yu, B., Chen, Y.-M., Wang, W., Song, Z.-G., Hu, Y., Tao, Z.-W., Tian, J.-H., Pei, Y.-Y., et al. (2020). A new coronavirus associated with human respiratory disease in China. Nature. doi: https://doi.org/10.1038/s41586-020-2008-3.
7. Durbin, R.M., Altshuler, D., Durbin, R.M., Abecasis, G.R., Bentley, D.R., Chakravarti, A., Clark, A.G., Collins, F.S., De La Vega, F.M., Donnelly, P., et al. (2010). A map of human genome variation from population-scale sequencing. Nature 467, 1061-1073.
8. Albertsen, M., Hugenholtz, P., Skarshewski, A., Nielsen, K.L., Tyson, G.W., and Nielsen, P.H. (2013). Genome sequences of rare, uncultured bacteria obtained by differential coverage binning of multiple metagenomes. Nat. Biotechnol. 31, 533-538.
9. Lole, K.S., Bollinger, R.C., Paranjape, R.S., Gadkari, D., Kulkarni, S.S., Novak, N.G., Ingersoll, R., Sheppard, H.W., and Ray, S.C. (1999). Full-length human immunodeficiency virus type 1 genomes from subtype C-infected seroconverters in India, with evidence of intersubtype recombination. J. Virol. 73, 152-160.
10. Xiao, K., Zhai, J., Feng, Y., Zhou, N., Zhang, X., Zou, J.-J., Li, N., Guo, Y., Li, X., Shen, X., et al. (2020). Isolation and characterization of 2019-nCoV-like coronavirus from Malayan pangolins. bioRxiv, 2020.02.17.951335.
11. Lam, T.T.-Y., Shum, M.H.-H., Zhu, H.-C., Tong, Y.-G., Ni, X.-B., Liao, Y.-S., Wei, W., Cheung, W.Y.-M., Li, W.-J., Li, L.-F., et al. (2020). Identification of 2019-nCoV related coronaviruses in Malayan pangolins in southern China. bioRxiv, 2020.02.13.945485.
12. Tortorici, M.A., and Veesler, D. (2019). Structural insights into coronavirus entry. In Advances in Virus Research, F.A. Rey, ed. (Academic Press), pp. 93-116.
13. Ge, X.-Y., Li, J.-L., Yang, X.-L., Chmura, A.A., Zhu, G., Epstein, J.H., Mazet, J.K., Hu, B., Zhang, W., Peng, C., et al. (2013). Isolation and characterization of a bat SARS-like coronavirus that uses the ACE2 receptor. Nature 503, 535-538.
14. Wong, S.K., Li, W., Moore, M.J., Choe, H., and Farzan, M. (2004). A 193-amino acid fragment of the SARS coronavirus S protein efficiently binds angiotensin-converting enzyme 2. J. Biol. Chem. 279, 3197-3201.
15. Yan, R., Zhang, Y., Li, Y., Xia, L., Guo, Y., and Zhou, Q. (2020). Structural basis for the recognition of the SARS-CoV-2 by full-length human ACE2. Science, eabb2762.
16. Millet, J.K., and Whittaker, G.R. (2014). Host cell entry of Middle East respiratory syndrome coronavirus after two-step, furin-mediated activation of the spike protein. Proc. Natl. Acad. Sci. USA 111, 15214-15219.
17. Coutard, B., Valle, C., de Lamballerie, X., Canard, B., Seidah, N.G., and Decroly, E. (2020). The spike glycoprotein of the new coronavirus 2019-nCoV contains a furin-like cleavage site absent in CoV of the same clade. Antiviral Res. 176, 104742.
18. Hoffmann, M., Kleine-Weber, H., Schroeder, S., Krüger, N., Herrler, T., Erichsen, S., Schiergens, T.S., Herrler, G., Wu, N.-H., Nitsche, A., et al. (2020). SARS-CoV-2 cell entry depends on ACE2 and TMPRSS2 and is blocked by a clinically proven protease inhibitor. Cell. doi: 10.1016/j.cell.2020.02.052.
19. Millet, J.K., and Whittaker, G.R. (2015). Host cell proteases: Critical determinants of coronavirus tropism and pathogenesis. Virus Res. 202, 120-134.
20. Bolger, A.M., Lohse, M., and Usadel, B. (2014). Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30, 2114-2120.
21. Langmead, B., and Salzberg, S.L. (2012). Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357-359.
22. Li, D., Liu, C.-M., Luo, R., Sadakane, K., and Lam, T.-W. (2015). MEGAHIT: an ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph. Bioinformatics 31, 1674-1676.
23. Li, H., Handsaker, B., Wysoker, A., Fennell, T., Ruan, J., Homer, N., Marth, G., Abecasis, G., Durbin, R., and Subgroup, G.P.D.P. (2009). The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078-2079.
24. Camacho, C., Coulouris, G., Avagyan, V., Ma, N., Papadopoulos, J., Bealer, K., and Madden, T.L. (2009). BLAST+: architecture and applications. BMC Bioinformatics 10, 421.
25. Edgar, R.C. (2004). MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 32, 1792-1797.
26. Talavera, G., and Castresana, J. (2007). Improvement of phylogenies after removing divergent and ambiguously aligned blocks from protein sequence alignments. Syst. Biol. 56, 564-577.
27. Kumar, S., Stecher, G., Li, M., Knyaz, C., and Tamura, K. (2018). MEGA X: molecular evolutionary genetics analysis across computing platforms. Mol. Biol. Evol. 35, 1547-1549.
KEY RESOURCES TABLE
REAGENT or RESOURCE | SOURCE | IDENTIFIER
Deposited Data
Raw and analyzed data | [1] | PRJNA573298
Manis javanica reference genome | NCBI sequence read archive (SRA) | PRJNA256023
SARS-CoV-2 reference genome | GenBank | MN908947
BatCoV RaTG13 genome | NGDC (https://bigd.big.ac.cn/) | GWHABKP00000000
2845 Coronavirus reference genomes set | ViPR | https://www.viprbrc.org/brc/home.spg?decorator=corona
Software and Algorithms
SimPlot | [9] | https://www.mybiosoftware.com/simplot-3-5-1-sequence-similarity-plotting.html
Trimmomatic | [20] | http://www.usadellab.org/cms/index.php?page=trimmomatic
Bowtie2 | [21] | http://bowtie-bio.sourceforge.net/bowtie2
MEGAHIT | [22] | https://github.com/voutcn/megahit
BLAST+ | [24] | ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST
SAMtools | [23] | http://samtools.sourceforge.net/
MUSCLE | [25] | http://drive5.com/muscle/
BioEdit | San Diego Supercomputer Center | http://www.mybiosoftware.com/bioedit-7-0-9-biological-sequence-alignment-editor.html
Gblocks | [26] | http://molevol.cmima.csic.es/castresana/Gblocks.html
MEGA X | [27] | https://www.megasoftware.net/
Figure 1
(A) Genome coverage: 99.99% (19,585 nt/19,587 nt); mean depth: 7.71. [Plot not reproduced; y-axis: sequencing depth; x-axis: genome nucleotide position.]
(B) Query: Pangolin-CoV. Whole-genome nucleotide identity: Beta-CoV/Wuhan-Hu-1 (91.02%), BatCoV RaTG13 (90.55%), Bat SARSr-CoV ZXC21 (85.65%), Bat SARSr-CoV ZC45 (85.01%), Bat SARSr-CoV WIV1 (73.94%), SARS-CoV BJ01 (73.62%). [Plot not reproduced; y-axis: percentage nucleotide identity; x-axis: genome nucleotide position.]
(C) Genome organization (genes 1a, 1b, S, 3a, E, M, 6, 7a, 8, N, 10) of SARS-CoV-2 (human; 29,903 bp), Pangolin-CoV (pangolin; 19,587 bp), and BatCoV RaTG13 (bat; 29,855 bp). Nucleotide/amino acid identity (DNA/AA %) of Pangolin-CoV genes:
Gene | vs SARS-CoV-2 | vs BatCoV RaTG13
1a | 89.9/96.8 | 90.3/97.0
1b | 90.7/73.4 | 90.4/72.8
S | 88.3/97.5 | 87.3/95.4
3a | 94.5/96.8 | 95.0/96.8
E | 98.3/97.4 | 98.3/97.4
M | 93.0/98.6 | 93.0/98.6
6 | 95.1/96.3 | 93.8/96.3
7a | 92.0/96.9 | 90.4/93.6
8 | 91.7/94.8 | 91.5/96.6
N | 94.9/96.4 | 95.4/96.4
10 | 99.1/97.3 | 98.3/94.6
Mean | 93.2/94.1 | 92.8/93.5
Figure 2
[Phylogenetic trees not reproduced. Panels: Whole Genome and RdRp. Both trees place Pangolin-CoV, Bat CoV RaTG13, and the SARS-CoV-2 sequences (Beta-CoV/Wuhan-Hu-1 and Beta-CoV/Wuhan/IPBCAMS-WH-01 to WH-05) in the SARS-CoV-2 group within the Beta-CoV clade, alongside bat SARSr-CoVs, SARS-CoV, MERS-CoV, and other betacoronaviruses, with a separate Alpha-CoV clade; node labels are bootstrap values.]
Figure 3. [Amino acid sequence alignment of part of the spike (S) protein receptor-binding domain (ruler positions 340 to 520; the Beta-CoV/Wuhan-Hu-1 blocks end at residues 432 and 522) for the SARS-CoV-2 isolates, Pangolin-CoV, Bat CoV RaTG13, SARS-CoV SZ3 and BJ01, and bat SARSr-CoVs.]
Figure 4. S1/S2 cleavage site (axis labels 680, 690, 700; "." = identical to Beta-CoV/Wuhan-Hu-1, "-" = gap)
Beta-CoV/Wuhan-Hu-1           YQTQTNSPRRARSVASQSIIAYTMSLG
Beta-CoV/Wuhan/IPBCAMS-WH-02  ...........................
Beta-CoV/Wuhan/IPBCAMS-WH-03  ...........................
Beta-CoV/Wuhan/IPBCAMS-WH-05  ...........................
Beta-CoV/Wuhan/IPBCAMS-WH-01  ...........................
Beta-CoV/Wuhan/IPBCAMS-WH-04  ...........................
Bat_CoV_RaTG13                ......----S................
Pangolin-CoV                  ......----S...S..A.........
Bat_SARSr-CoV_ZXC21           .H.ASI----L..TGQKA.V.......
Bat_SARSr-CoV_ZC45            .H.ASI----L..TSQKA.V.......
SARS-CoV_SZ3                  .H.VSS----L..TSQK..V.......
SARS-CoV_BJ01                 .H.VSL----L..TSQK..V.......
Bat_SARSr-CoV_WIV1            .H.VSS----L..TSQK..V.......
Bat_SARSr-CoV_SHC014          .H.VSS----L..TSQK..V.......
Bat_SARSr-CoV_Rs4231          .H.VSS----L..TSQK..V.......
Bat_SARSr-CoV_WIV16           .H.VSS----L..TSQK..V.......
Bat_SARSr-CoV_YNLF31C         .H.ASV----L..TGQK..V.......
Bat_SARSr-CoV_SX2013          .H.ASL----L..TGQK..V.......
Bat_SARSr-CoV_Rf1             .H.ASH----L..TGQK..V.......
Bat_SARSr-CoV_Rp3             .H.AST----L...GQK..V.......
Bat_SARSr-CoV_GX2013          .H.ASV----L..TGQK..V.......
Bat_SARSr-CoV_HKU3-1          .H.ASV----L..TGQK..V.......
Bat_SARSr-CoV_Longquan-140    .H.ASV----L..TGQK..V.......
Bat_SARSr-CoV_SC2018          .H.AST----L..TGQK..V.......
Bat_SARSr-CoV_HuB2013         .H.ASV----L..TGQK..V.......
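In the Figure 4 S1/S2 alignment, every row except the SARS-CoV-2 isolates carries the same four-column gap, so those sequences are four residues shorter than Wuhan-Hu-1 in this window. The minimal Python sketch below rewrites the dot notation into full sequences and reports each gap run. The row strings are transcribed from Figure 4; the helper names `expand` and `gap_runs` are hypothetical, not from the paper. Note that the exact position of a gap inside a stretch like this is partly the aligner's choice: here it falls opposite reference columns SPRR, with the following S aligned to the reference A.

```python
# Sketch: expand the dot notation of Figure 4 and locate gap runs.
# Row strings are transcribed from Figure 4; helper names are hypothetical.

# Beta-CoV/Wuhan-Hu-1 row of Figure 4, used as the reference.
REFERENCE = "YQTQTNSPRRARSVASQSIIAYTMSLG"

# '.' = same residue as the reference, '-' = alignment gap.
ROWS = {
    "Bat_CoV_RaTG13": "......----S................",
    "Pangolin-CoV": "......----S...S..A.........",
    "SARS-CoV_BJ01": ".H.VSL----L..TSQK..V.......",
}


def expand(row: str, reference: str) -> str:
    """Replace each '.' with the reference residue; keep gaps and substitutions as written."""
    if len(row) != len(reference):
        raise ValueError("row and reference must be the same length")
    return "".join(ref if ch == "." else ch for ch, ref in zip(row, reference))


def gap_runs(seq: str) -> list[tuple[int, int]]:
    """Return 0-based, half-open (start, end) column ranges of consecutive '-' characters."""
    runs = []
    start = None
    for i, ch in enumerate(seq):
        if ch == "-" and start is None:
            start = i  # a gap run begins
        elif ch != "-" and start is not None:
            runs.append((start, i))  # the gap run ended at the previous column
            start = None
    if start is not None:
        runs.append((start, len(seq)))  # gap run reaching the end of the window
    return runs


if __name__ == "__main__":
    for name, row in ROWS.items():
        full = expand(row, REFERENCE)
        ungapped = full.replace("-", "")
        for start, end in gap_runs(full):
            print(f"{name}: {full} ({len(ungapped)} residues); "
                  f"gap opposite reference columns {start + 1}-{end}: {REFERENCE[start:end]}")
```

Keeping the reference separate from the rows mirrors how the figure is drawn: only differences are spelled out, so expanding a row is a single pass over the two strings.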
Supplemental Data
[Phylogenetic tree panels (A) and (B).]
Figure S1. Phylogenetic relationship of CoVs based on the ORF1a gene (A) and ORF1b gene (B) nucleotide sequences.
Related to Figure 2.
[Phylogenetic tree panels (A) and (B).]

Figure S2. Phylogenetic relationship of CoVs based on the S gene (A) and M gene (B) nucleotide sequences. Related to Figure 2.
Alignment columns 10 to 220 (ruler labels)
Beta-CoV/Wuhan-Hu-1 MSDNGPQ-NQRNAPRITFGGPSDSTGSNQNGERSGARSKQRRPQGLPNNTASWFTALTQHGKEDLKFPRGQGVPINTNSSPDDQIGYYRRATRRIRGGDGKMKDLSPRWYFYYLGTGPEAGLPYGANKDGIIWVATEGALNTPKDHIGTRNPANNAAIVLQLPQGTTLPKGFYAEGSRGGSQASSRSSSRSRNSSRNSTPGSSRGTSPARMAGNGGDAAL
Beta-CoV/Wuhan/IPBCAMS-WH-02 .......-....................................................................................................................................................................................................................
Beta-CoV/Wuhan/IPBCAMS-WH-03 .......-....................................................................................................................................................................................................................
Beta-CoV/Wuhan/IPBCAMS-WH-05 .......-....................................................................................................................................................................................................................
Beta-CoV/Wuhan/IPBCAMS-WH-01 .......-....................................................................................................................................................................................................................
Beta-CoV/Wuhan/IPBCAMS-WH-04 .......-....................................................................................................................................................................................................................
Bat_CoV_RaTG13 .......-.............................P.................................................................................................................................................................................S....
Pangolin-CoV .......-................A............P...........................R..............................................................E......N.............................................................T......................
Bat_SARSr-CoV_ZXC21 .......-..SS............SDNS.....N...P.........................N.T..............K......................E........................E........................................................................................T..
Bat_SARSr-CoV_ZC45 .......-...S............SDNSK....N...P.........................N.T..............K......................E........................E........................................................................................T..
SARS-CoV_SZ3 .......S...S.........T...DN....G.N...P.........................E.R.............G..............V........E................S.......E..V....................N....T..................................GN...........N......SG..ET..
SARS-CoV_BJ01 .......S...S.........T...DN....G.N...P.........................E.R.............G..............V........E................S.......E..V....................N....T..................................GN...........N......SG..ET..
Bat_SARSr-CoV_WIV1 .......S...S.........T...DN....G.N...P.........................E.R.............G..............V........E................S.......E..V....................N....T..................................GN...........N......SG..ET..
Bat_SARSr-CoV_SHC014 .......P...S.........T.P.DN....G.N...P.........................E.R.............G..............V........E................S.......E..V....................N....T..................................GN...........N......SG..ET..
Bat_SARSr-CoV_WIV16 .......P...S.........T..IDN....G.N...P.........................E.R.............G..............V........E................S.......E..V....................N....T..................................GN...........N......SG..ET..
Bat_SARSr-CoV_YNLF31C ......H-...S.S.......T...DN....G.N...P.........................E.R..Q..........G..............V........E................S.......E.......................N....T..................................GN................I.SG..ET..
Bat_SARSr-CoV_Rs4231 .......P...S.........T...DN....G.N...P.........................E.R.............G..............V........E................S.......E.......................N....T..................................GN...........N......SG..ET..
Bat_SARSr-CoV_Longquan-140 .......-S..S.........T..ADN..D.G.....P.........................E.R.............GK........K....V........E................S.......E..V....................N.......................................GN...........N....L.SG..ET..
Bat_SARSr-CoV_HKU3-1 .......-S..S.........A..NDN..D.G.....P.........................E.R.............GK.............V........E................S.......E..V....................N.............................S.........GN...........S....L.SG..ET..
Bat_SARSr-CoV_GX2013 .......S...S..S......T...DN..D.G.....P.........................E.R.............GK.............V........E................S.......E..V...I................N.......................................GN...........N....L.SG..ET..
Bat_SARSr-CoV_Rf1 .......-..CS.............DN..D.G.....P.........................G....Q..........GR.............V........E................S.......E..V....................N.........................N.............GN..T........N....V.SG..ET..
Bat_SARSr-CoV_SX2013 .......-...S.............DN..D.G...V.P.........................G....Q..........GR.............V........E................S.......E..V....................N.........................N.............GN..T........N....V.SG..ET..
Bat_SARSr-CoV_HuB2013 .......-...S.............DN..D.G.....P.........................E.R.............GK.............V........E................S.......E..V....................N.......................................GN...........N......SG..ET..
Bat_SARSr-CoV_Rp3 .......-...S.........T...DN..D.G.....P.........................E.R.............GK.............V........E................S.......E..V....................N.......................................GN...........N......SG..ET..
Bat_SARSr-CoV_SC2018 .......-...S.........T...DN..D.G.....P.........................E...............GK.............V........E................S.......E..V....................N.......................................GN...........N......SG..ET..
Alignment columns 230 to 420 (ruler labels)
Beta-CoV/Wuhan-Hu-1 ALLLLDRLNQLESKMSGKGQQQQGQTVTKKSAAEASKKPRQKRTATKAYNVTQAFGRRGPEQTQGNFGDQELIRQGTDYKHWPQIAQFAPSASAFFGMSRIGMEVTPSGTWLTYTGAIKLDDKDPNFKDQVILLNKHIDAYKTFPPTEPKKDKKKKADETQALPQRQKKQQTVTLLPAADLDDFSKQLQQSMS--SADSTQA*
Beta-CoV/Wuhan/IPBCAMS-WH-02 .................................................................................................................................................................................................--.......*
Beta-CoV/Wuhan/IPBCAMS-WH-03 .................................................................................................................................................................................................--.......*
Beta-CoV/Wuhan/IPBCAMS-WH-05 .................................................................................................................................................................................................--.......*
Beta-CoV/Wuhan/IPBCAMS-WH-01 .................................................................................................................................................................................................--.......* 96
Beta-CoV/Wuhan/IPBCAMS-WH-04 .................................................................................................................................................................................................--.......* 100
Bat_CoV_RaTG13 .......................S.......................Q.................................................................................................................................................--.......*
Pangolin-CoV ...............................................Q................................Q.................................................................------------------------------..-N.............--.......*
Bat_SARSr-CoV_ZXC21 ............N.I................................Q..................................................................H..........Q...N.............................L......................E..........G--T.....*
Bat_SARSr-CoV_ZC45 ............N.V................................Q..................................................................H..........Q...N.............................L......................E..........G--T.....*
SARS-CoV_SZ3 ..............V................................Q......................D...........................................H..........Q...N..........................T..A.P........P.........M....R...N...GA.......*
SARS-CoV_BJ01 ..............V................................Q......................D...........................................H..........Q...N..........................T..A.P........P.........M....R...N...GA.......*
Bat_SARSr-CoV_WIV1 ..............V.....................R..........Q......................D...........................................H..........Q...N..........................T..A.P........P.........M....R...N...GA.......*
Bat_SARSr-CoV_SHC014 ..............V................................Q......................D...........................................H..........Q...N..........................T..A.P........P.........M....R...N...GA.......*
Bat_SARSr-CoV_WIV16 ..............V................................Q......................D...........................................H..........Q...N..........................T..A.P........P.........M....R...N...GA.......*
Bat_SARSr-CoV_YNLF31C ..............V................................Q............D.........D...........................................H..........Q...N..........................T..A.P........P.........M....R...N...GA.......*
Bat_SARSr-CoV_Rs4231 ..............V................................Q......................D...........................................H..........Q...N..........................T..A.P........P.........M....R...N...GA.......*
Bat_SARSr-CoV_Longquan-140 ..............V.......P........................Q..................................................................H..........Q...N..........................T..A.P....P...P.........M....R...N...GA.......*
Bat_SARSr-CoV_HKU3-1 ..............V.......P........................Q............................I.....................................H..........Q...N..........................T..A.P........P.........M....R...H...GA.......*
Bat_SARSr-CoV_GX2013 ..............V................................Q......................D.T.........................................H..........Q...N..........................T..A.P........P.........M....R...N...GA.......*
Bat_SARSr-CoV_Rf1 ..............V................TS..............Q............D........................................S..........I.H..........Q...N..........................T..A.P........P.........M....R...N...GA.......*
Bat_SARSr-CoV_SX2013 ..............V.................S..............Q............D.....................................................H..........Q...N..........................T..A.P........P.........M....R...N...GA.......*
Bat_SARSr-CoV_HuB2013 ..............V................................S..................................................................H..........Q...N..........................T..A.P....-...P.........M....R...N...GA.......*
Bat_SARSr-CoV_Rp3 ..............V..RS............................Q..................................................................H..........Q...N............I.............T..A.P........P.........M....R...N...GA.......*
Bat_SARSr-CoV_SC2018 ..............V................................Q..................................................................H..........Q...N.M..........A.............T..A.P........P.........M....R...N...GA.......*

Figure S3. Amino acid sequence alignment of the N protein and its phylogeny. Related to Figure 2. Highly conserved amino acid
residues in the N protein, marked in colour, have diagnostic potential.
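The Figure S3 caption notes that highly conserved N-protein residues have diagnostic potential. As a minimal sketch, assuming the dot notation used in the alignment and two simple criteria chosen here for illustration (a column is "invariant" if every sequence carries the reference residue; it is "diagnostic" for a group if all group members share one residue that no other sequence carries), the Python below computes percent identity and both column sets. The sequences are hypothetical toy data, not taken from Figure S3, and the criteria and function names are assumptions rather than the authors' method.

```python
# Sketch with hypothetical toy data: simple column statistics on a dot-notation alignment.
# The "invariant" and "diagnostic" criteria are illustrative assumptions, not the authors' definitions.

REFERENCE = "ACDEFGHIKL"  # hypothetical reference sequence
ROWS = {  # hypothetical rows; '.' = same as reference, '-' = gap
    "ref": "..........",
    "seq1": "...Q......",
    "seq2": "...Q.....R",
    "seq3": ".-..S.....",
}


def expand(row: str, reference: str) -> str:
    """Replace each '.' with the corresponding reference residue."""
    if len(row) != len(reference):
        raise ValueError("row and reference must be the same length")
    return "".join(ref if ch == "." else ch for ch, ref in zip(row, reference))


def percent_identity(row: str, reference: str) -> float:
    """Percent of columns without a gap in either sequence at which `row` matches `reference`."""
    full = expand(row, reference)
    pairs = [(a, b) for a, b in zip(full, reference) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(a == b for a, b in pairs) / len(pairs)


def invariant_columns(rows: dict[str, str], reference: str) -> list[int]:
    """0-based columns where every row carries the reference residue (no substitution, no gap)."""
    full = [expand(r, reference) for r in rows.values()]
    return [i for i, ref in enumerate(reference)
            if ref != "-" and all(seq[i] == ref for seq in full)]


def diagnostic_columns(rows: dict[str, str], reference: str, group: set[str]) -> list[int]:
    """0-based columns where all `group` members share one residue that no other row carries."""
    full = {name: expand(r, reference) for name, r in rows.items()}
    hits = []
    for i in range(len(reference)):
        inside = {full[name][i] for name in group}
        outside = {seq[i] for name, seq in full.items() if name not in group}
        if len(inside) == 1 and "-" not in inside and not (inside & outside):
            hits.append(i)
    return hits


if __name__ == "__main__":
    for name, row in ROWS.items():
        print(name, round(percent_identity(row, REFERENCE), 2))
    print("invariant:", invariant_columns(ROWS, REFERENCE))
    print("diagnostic for seq1+seq2:", diagnostic_columns(ROWS, REFERENCE, {"seq1", "seq2"}))
```

Gap columns are excluded from the identity denominator here; other conventions (for example, counting gaps as mismatches) give different percentages, so the choice should be stated alongside any reported value.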
