Supplemental Material

Validation of novel forensic DNA markers using multiplex microhaplotype sequencing

1. Description of mMHseq assay.

2. Supplementary figures and tables.

Figure S1. Regional distributions of effective number of alleles (Ae).

Figure S2. Distribution of ordered informativeness values (In).

Figure S3. Correlation of Ae and In values.

Figure S4. Population specific random match probability (RMP) and genotype frequency.

Figure S5. Principal component analysis (PCA) of 10 African populations.

Figure S6. PCA of 20 non-African populations.

Figure S7. STRUCTURE analysis of 30 populations in the 90 MH.

Table S1. Study populations.

Table S2. mMHseq primers.

Table S3. Ae values in 30 populations. Note: This is a large separate Excel data file.

Table S4. SNPs in 90 microhaplotype regions.

1. Description of mMHseq assay.
mMHseq assay design: We developed mMHseq for multiplex sequencing of 90 MH in a
single-tube reaction. The assay design is based on our established technology for multiplex
targeted sequencing from small DNA input amounts[1, 2]. Initial optimization started with primer
pairs for 96 loci and involved several rounds of primer pool rebalancing, which included
raising or lowering the concentration of specific primers to reach good amplification uniformity
across all microhaplotypes, replacing failed primers, resequencing, and data analysis. This
resulted in a final set of 90 MH, which were then used to sequence 156 DNA samples. Based on
the human reference genome sequence (hg19/GRCh37), the 90 microhaplotypes cover a total
of 29,364 nucleotide bases, of which 15,585 are target nucleotide bases.

mMHseq primer design: We developed a custom Perl script that integrated the Primer3
algorithm to design target-specific forward and reverse primers for each microhaplotype
(Supplemental Table S2). Whenever possible, primer hybridization sites were selected to avoid
common SNPs found in the dbSNP database, which could interfere with amplification success.
We ran RepeatMasker on the genomic regions of the 90 MH to identify primer pairs falling in
regions that contain repeats and low-complexity DNA sequences. Primer pairs with multiple
genomic BLAST hits were removed as well. Primers were designed to have similar length
(average 23 bp, range 22-27 bp), similar GC content (average 53%), and similar amplicon size
(average 327 bp, range 300-350 bp) in order to match the 2x250 bp paired-end sequencing
chemistry on the Illumina instruments. Adapter sequences (24 bp) were included at the 5' end
of each primer for post-capture amplification[1]. This design choice facilitates bidirectional
phasing of each microhaplotype.
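
The design ranges above can be checked against any primer in Supplemental Table S2. The following Python sketch is illustrative only and is not the authors' Perl script; the function names are hypothetical, and the example uses the left primer and amplicon size listed for mh01KK-117.

```python
def gc_percent(seq):
    """Return the GC content of a primer sequence as a percentage."""
    seq = seq.upper()
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def within_design_ranges(seq, amplicon_bp):
    """Check primer length (22-27 bp) and amplicon size (300-350 bp),
    the ranges stated in the primer design paragraph."""
    return 22 <= len(seq) <= 27 and 300 <= amplicon_bp <= 350


# Left primer and amplicon size for mh01KK-117, copied from Supplemental Table S2.
left_primer = "AGAAATCCTCCCCACCCAGCTCT"
amplicon_size = 302

primer_length = len(left_primer)
primer_gc = gc_percent(left_primer)
primer_ok = within_design_ranges(left_primer, amplicon_size)
print(primer_length, round(primer_gc, 1), primer_ok)
```

GC content is reported in the text only as an average across primers, so the sketch computes it per primer without applying a cutoff.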

mMHseq target capture: All primer pairs targeting the 90 MH were pooled in a single
tube at 4 µM (10X) concentration. Multiplex amplification was performed using 2 µl of the 10X
primer stock, 3-6 µl of template DNA at a concentration of 5-20 ng/µl, and the
KAPA2G Fast Multiplex PCR Kit reagents (Kapa Biosystems, Wilmington, MA) in a final
volume of 20 µl in a Veriti 96-well thermal cycler (Applied Biosystems, Foster City, CA) using
the following thermal profile: 95°C for 3 minutes; 12 cycles of 95°C for 16 seconds, 69-52°C
(-1.5°C per cycle) for 2 minutes, and 72°C for 45 seconds; followed by 10 cycles of 95°C for 16
seconds, 72°C for 20 seconds, and 72°C for 2 minutes. PCR product cleanup was performed
by adding 24 µl (ratio 1.2:1) of AMPure XP beads (Beckman Coulter, Brea, CA) according to
the manufacturer's recommendations. The DNA was eluted in a final volume of 15 µl of
elution buffer.
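
The touchdown step and the bead volume in the protocol above are simple arithmetic, sketched below in Python. This is a minimal illustration, not part of the published protocol: the function name is hypothetical, and it assumes the 1.5°C decrement is applied after each cycle starting from 69°C, so the twelfth cycle anneals at 52.5°C, close to the 52°C lower bound given in the text.

```python
def touchdown_temperatures(start_c, step_c, cycles):
    """List the annealing temperature used in each touchdown cycle."""
    return [start_c - step_c * i for i in range(cycles)]


# 12 touchdown cycles starting at 69 C and dropping 1.5 C per cycle.
annealing = touchdown_temperatures(69.0, 1.5, 12)

# AMPure XP bead volume for a 20 ul reaction at a 1.2:1 bead-to-sample ratio.
reaction_volume_ul = 20.0
bead_volume_ul = 1.2 * reaction_volume_ul

print(annealing)
print(bead_volume_ul)
```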

Sequence library construction and sequencing: Sequencing library preparation for the MiSeq
platform was performed according to the manufacturer's recommendation using 5 µl of
purified 1st PCR product per sample. PCR was performed using the KAPA library amplification
kit (Kapa Biosystems, Wilmington, MA) in 25 µl reactions, using common primers with
sample-specific indices and Illumina's P5 and P7 adapter sequences attached at the 5' end.
Samples were barcoded with 8 bp dual indices according to Illumina's index sequencing
protocol. PCR was performed under the following cycling conditions: 98°C for 16 seconds, then
13 cycles of 98°C for 16 seconds and 72°C for 20 seconds. The 2nd PCR products were
individually quantified on Agilent's DNA 1000 chip. Equal quantities of each sample were
pooled (approximately 200-400 µl total volume), purified using AMPure XP beads
(Beckman Coulter, Brea, CA) at a bead-to-sample ratio of 1.2:1, and eluted in 60 µl.
Bead-purified DNA was further size-selected for fragments of 370-600 bp using the Pippin Prep
system (Sage Science Inc., Beverly, MA). The DNA concentration of the purified mMHseq library
was quantified using the 2100 Bioanalyzer instrument (Agilent Technologies, Santa Clara,
CA), and 12 pmoles of the library along with a 15% PhiX spike-in were sequenced on the MiSeq
instrument using a 2x250-bp paired-end kit (MiSeq Reagent kit v2, 500-cycles, Illumina, San
Diego, CA) according to the manufacturer's protocol.

Sequence alignment: Image analysis and sample de-multiplexing were performed with the
Illumina MiSeq Control Software version 2.4.1 and MiSeq Reporter version 2.5.1.3
(Illumina, San Diego, CA). The resulting processed fastq files were aligned to the MH
sequences (hg19/GRCh37) using the Burrows-Wheeler Aligner (BWA-MEM, version 0.7.12).
Picard (version 2.9.0) was used to sort and convert files to BAM format. The potential for
misalignments is greatly reduced, if not eliminated, by: 1) the larger amplicon length of our
MH regions (avg. size 326 bp) in combination with the longer 2x250 PE read length and high
coverage depth, which increases confidence in read alignment and base calling, and 2) the
high number of SNPs per MH locus (4-14), which makes these MH loci unique.

mMHseq quality control: QC was performed at 3 levels for each of the 156 samples: the
individual sample, the sample amplicon, and the base level. Samtools view was used to extract
the total number of reads for each sample. Bedtools coverage was used to count the total
number of reads mapping to the region of interest (ROI) of each microhaplotype in each sample.
Based on this information, a value for amplicon uniformity was calculated by obtaining the mean
amplicon coverage of each sample and then calculating the percentage of amplicons in that
sample with coverage of at least 50% of the mean amplicon coverage (0.5 x mean). The
higher the amplicon uniformity value, the more uniform the overall amplification for a sample.
This amplicon measure is important for future studies in order to determine the optimal read
count for each sample amplicon. Samtools mpileup was used to obtain the percentage of base
pairs for each microhaplotype region. The number of sequence reads per base (coverage) is
directly related to the confidence of variant base-calling.
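
The amplicon uniformity value described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function name is hypothetical, the read counts are made-up example values, and in practice the per-amplicon counts would come from the Bedtools coverage step.

```python
def amplicon_uniformity(read_counts, fraction=0.5):
    """Percentage of amplicons whose read count is at least
    `fraction` times the mean amplicon read count of the sample."""
    mean_coverage = sum(read_counts) / len(read_counts)
    threshold = fraction * mean_coverage
    passing = sum(1 for count in read_counts if count >= threshold)
    return 100.0 * passing / len(read_counts)


# Hypothetical read counts for five amplicons in one sample.
example_counts = [1200, 900, 1500, 300, 1100]
uniformity = amplicon_uniformity(example_counts)
print(uniformity)
```

In this example the mean is 1000 reads, so the threshold is 500 reads and four of the five amplicons pass.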

Variant calling: Variant calling was performed using the GATK UnifiedGenotyper (GATK UG)
in the ROI of the 90 MHs. GATK UG was used to call variants for all samples simultaneously,
with downsampling set to none and the other parameters left at default values. The raw SNP
calls generated were then selected by GATK SelectVariants (parameter selectType SNP) and
hard-filtered through GATK VariantFiltration with the filtering criteria “QD < 2.0 ||
MQ < 40.0 || MQRankSum < -12.5” in order to exclude potential false-positive variants caused
by sequencing and mapping errors.
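
The logic of the hard-filter expression can be illustrated in Python. This sketch only mirrors the boolean expression quoted above and does not call GATK; the function name and the example annotation values are hypothetical, and treating a missing annotation as not triggering the filter is an assumption of this sketch.

```python
def fails_hard_filter(info):
    """Return True if a SNP call meets any of the filtering criteria
    QD < 2.0, MQ < 40.0, or MQRankSum < -12.5."""
    def below(key, cutoff):
        value = info.get(key)
        # Assumption: a missing annotation does not trigger the filter.
        return value is not None and value < cutoff

    return below("QD", 2.0) or below("MQ", 40.0) or below("MQRankSum", -12.5)


# Hypothetical INFO annotations for three SNP calls.
good_call = {"QD": 15.2, "MQ": 60.0, "MQRankSum": 0.3}
low_mq_call = {"QD": 15.2, "MQ": 35.0, "MQRankSum": 0.3}
low_qd_no_ranksum = {"QD": 1.5, "MQ": 60.0}

results = [fails_hard_filter(call)
           for call in (good_call, low_mq_call, low_qd_no_ranksum)]
print(results)
```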

Microhaplotype phasing and visualization: Each MH region in each sample was phased using
the GATK ReadBackedPhasing tool in GATK 3.8.0 with phaseQualityThresh > 20. SNPs
identified in the primer landing region were excluded from the analysis. Three filters were
established that a SNP locus had to pass: 1) an average heterozygous ratio filter, set to
smaller than 0.2; 2) a novel STR filter for sites with high global allele frequency (>0.8) in a
homopolymer region; and 3) a deletion-induced variant filter, detected by calculating the read
coverage of the called SNP at heterozygous or homozygous alternate sites relative to the
amplicon read count derived from the amplicon QC step, with the ratio required to be greater
than 80%. Finally, each site was compared with previously known SNPs in our samples, 1KG
data, and dbSNP, followed by labeling and visualization through ggplot2. A customized R script
was written to extract and convert the phased haplotypes into plotting data for each sample
and each MH region to be uploaded to the mMHseq website for visualization.
The scripts are publicly available at https://github.com/ScharfeLab/mMHseq_website.

References

1. Lefterova, M.I., et al., Next-Generation Molecular Testing of Newborn Dried Blood Spots for
Cystic Fibrosis. J Mol Diagn, 2016. 18(2): p. 267-82.
2. Peng, G., et al., Combining newborn metabolic and DNA analysis for second-tier testing of
methylmalonic acidemia. Genet Med, 2019. 21(4): p. 896-903.
3. 1000 Genomes Project Consortium, Auton, A., et al., A global reference for human genetic
variation. Nature, 2015. 526(7571): p. 68-74.

2. Supplementary figures and tables.

Supplemental Figure S1: Regional distributions of effective number of alleles (Ae).


Shown are the distributions of the average Ae for 10 African populations, comprising 7
populations from 1000 Genomes (1KG) and 3 populations from this study (A); for 5 European
populations from the 1KG and the Adygei population from this study (B); for 5 South Asian
populations from the 1KG (C); and for 5 East Asian populations from the 1KG (D).
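
For readers unfamiliar with the statistic, the effective number of alleles is commonly defined as the reciprocal of the sum of squared haplotype frequencies at a locus. The Python sketch below assumes that standard definition, which is not restated in this supplement; the function name and the frequencies are hypothetical.

```python
def effective_number_of_alleles(hap_freqs):
    """Ae = 1 / sum(p_i^2) for the haplotype frequencies at one locus."""
    return 1.0 / sum(p * p for p in hap_freqs)


# Four equally frequent haplotypes behave like four alleles.
ae_equal = effective_number_of_alleles([0.25, 0.25, 0.25, 0.25])

# Skewed frequencies give a lower Ae even with the same haplotype count.
ae_skewed = effective_number_of_alleles([0.85, 0.05, 0.05, 0.05])

print(ae_equal, round(ae_skewed, 3))
```

Averaging such per-population values within a region would give regional averages of the kind plotted in the figure.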

Supplemental Figure S2: Distribution of ordered informativeness values (In). The In
values for the 90 mMHseq microhaplotypes were calculated from the 30 populations and
ordered from low to high along the X-axis. The In values ranged from 0.08 to 0.7.
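
As an illustration of the statistic, the sketch below computes informativeness for assignment (In) for a single locus in Python. It assumes the widely used definition by Rosenberg and colleagues with natural logarithms, which this supplement does not restate; the function name and the example frequencies are hypothetical.

```python
import math


def informativeness(freqs_by_population):
    """In for one locus: sum over haplotypes j of
    -p_j * ln(p_j) + (1/K) * sum_i p_ij * ln(p_ij),
    where p_ij is the frequency of haplotype j in population i
    and p_j is its mean frequency over the K populations."""
    k = len(freqs_by_population)
    n_haplotypes = len(freqs_by_population[0])
    total = 0.0
    for j in range(n_haplotypes):
        column = [pop[j] for pop in freqs_by_population]
        mean_p = sum(column) / k
        if mean_p > 0:
            total -= mean_p * math.log(mean_p)
        # Terms with p = 0 contribute nothing (limit of p * ln p is 0).
        total += sum(p * math.log(p) for p in column if p > 0) / k
    return total


# Identical frequencies in both populations carry no ancestry information.
in_identical = informativeness([[0.5, 0.5], [0.5, 0.5]])

# Population-private haplotypes give the maximum, ln(K).
in_private = informativeness([[1.0, 0.0], [0.0, 1.0]])

print(in_identical, in_private)
```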

Supplemental Figure S3. Correlation of Ae and In values. The scatterplot of informativeness
(In) by the effective number of alleles (Ae) for the 90 mMHseq microhaplotypes shows a linear
relationship with R² = 0.5084 and a correlation coefficient of 0.713.

[Scatterplot. X-axis: Effective number of alleles (Ae), 0 to 14. Y-axis: Informativeness (In), 0 to 0.8. R² = 0.5084.]
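
The two statistics reported for Supplemental Figure S3 are consistent with each other: for a simple linear fit with one predictor, the correlation coefficient is the square root of R², taken here as positive on the assumption that In increases with Ae. A short Python check:

```python
import math

r_squared = 0.5084  # value reported in the figure caption

# Positive root, assuming In increases with Ae.
correlation = math.sqrt(r_squared)
print(round(correlation, 3))
```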

Supplemental Figure S4. Population specific random match probability (RMP) and
genotype frequency. The RMP and the most common genotype frequency in the 90
microhaplotypes in each of the 30 populations were analyzed. The populations include 26
populations from 1KG and 4 populations sequenced in this study (Adygei, Chaga, Sandawe
and Zaramo). The Y-axis scale is inverted to show the smallest value at the top.
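
The sketch below shows one conventional way to compute a random match probability from haplotype frequencies in Python. It is an illustration under stated assumptions and may differ from the authors' calculation: genotype frequencies are derived assuming Hardy-Weinberg equilibrium, loci are treated as independent, and the function names and frequencies are hypothetical.

```python
from itertools import combinations_with_replacement
from math import prod


def genotype_frequencies(hap_freqs):
    """Expected genotype frequencies at one locus under Hardy-Weinberg
    equilibrium: p_i^2 for homozygotes, 2 * p_i * p_j for heterozygotes."""
    freqs = []
    for i, j in combinations_with_replacement(range(len(hap_freqs)), 2):
        f = hap_freqs[i] * hap_freqs[j]
        freqs.append(f if i == j else 2 * f)
    return freqs


def locus_rmp(hap_freqs):
    """Probability that two random individuals share a genotype at one locus."""
    return sum(g * g for g in genotype_frequencies(hap_freqs))


def profile_rmp(loci):
    """Multi-locus RMP, assuming the loci are independent."""
    return prod(locus_rmp(hap_freqs) for hap_freqs in loci)


def most_common_genotype_frequency(loci):
    """Frequency of the most common multi-locus genotype, assuming independence."""
    return prod(max(genotype_frequencies(hap_freqs)) for hap_freqs in loci)


# Two hypothetical loci, each with two equally frequent haplotypes.
example_loci = [[0.5, 0.5], [0.5, 0.5]]
rmp = profile_rmp(example_loci)
top_genotype = most_common_genotype_frequency(example_loci)
print(rmp, top_genotype)
```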

Supplemental Figure S5. Principal component analysis (PCA) of 10 African populations.
The 10 African populations, including 7 from 1KG and 3 from this study (CGA, SND, and ZRM,
underlined), are plotted using the first two principal coordinates based on the haplotype
frequencies for the 90 MH. PC#1 accounts for 19.56% of the variance and PC#2 accounts for
an additional 14.64%. The symbols for the populations are given in Supplemental Table S1.

Supplemental Figure S6. PCA of 20 non-African populations. The 20 non-African
populations are plotted using the first two principal coordinates based on the haplotype
frequencies for the 90 MH. PC#1 accounts for 20.6% of the variance and PC#2 for an additional
12.4%. The symbols for the populations are given in Supplemental Table S1.

Supplemental Figure S7. STRUCTURE analysis of 30 populations in the 90 MH. Individual bar
plot based on the highest-likelihood run at K=6 clusters from a STRUCTURE software analysis
for 90 microhaplotypes and 30 populations. The standard admixture model assuming correlated
allele frequencies was applied. A range of possible clusters from K=3 to K=10 was tested
with 20 runs at each K value. The results for K=6 appear optimal. Each run had 10,000
burn-in iterations and 10,000 Markov chain Monte Carlo iterations.

Supplemental Table S1. Study populations. Statistical analysis was performed on individuals
from 30 populations, including 4 populations sequenced using mMHseq (*labeled in bold) and
data from 26 populations from the 1000 Genomes project[3].

World Region          Population               Abbreviation   N     Location of population

West Africa           Gambians                 GWD            113   The Gambia
West Africa           Mende                    MSL            85    Sierra Leone
West Africa           Yoruba                   YRI            108   Ibadan, Nigeria
West Africa           Esan                     ESN            99    Nigeria
East Africa           Chagga*                  CGA            45    Tanzania
East Africa           Sandawe*                 SND            40    Tanzania
East Africa           Zaramo*                  ZRM            28    Tanzania
East Africa           Luhya                    LWK            99    Webuye, Kenya
Afr-Americans         African Americans        ASW            61    Southwest, USA
Afr-Americans         Afro-Caribbeans          ACB            96    Barbados
Europe                Iberians                 IBS            107   Spain
Europe                Toscani                  TSI            107   Italy
Europe                Adygei*                  ADY            30    Krasnodar, Russia
Europe                NW European Ancestry     CEU            99    Utah, USA
Europe                British                  GBR            91    England and Scotland, UK
Europe                Finns                    FIN            99    Finland
South Central Asia    Punjabi                  PJL            96    Lahore, Pakistan
South Central Asia    Gujarati                 GIH            103   Houston, Texas, USA
South Central Asia    Telugu                   ITU            102   United Kingdom
South Central Asia    Tamils                   STU            102   Sri Lanka
South Central Asia    Bengali                  BEB            86    Bangladesh
East Asia             Japanese                 JPT            104   Tokyo, Japan
East Asia             Han Chinese              CHB            103   Beijing, China
East Asia             Southern Han Chinese     CHS            105   China
East Asia             Dai                      CDX            93    Xishuangbanna, China
East Asia             Vietnamese               KHV            99    Ho Chi Minh City, Vietnam
South America         Peruvians                PEL            85    Peru
Americas-admixed      Mexican Americans        MXL            64    Los Angeles, CA, USA
Americas-admixed      Puerto Ricans            PUR            104   Puerto Rico
Americas-admixed      Colombians               CLM            94    Medellin, Colombia

Supplemental Table S2. mMHseq primers.

Primer Name Left Primer Sequence Right Primer Sequence Amplicon size (bp)
mh01KK-001 5’-TTGATGTGAGCTCTAAAACGATT-3’ 5’-AAGCACATTTCTGGGTTTTGTTT-3’ 350
mh01KK-117 5’-AGAAATCCTCCCCACCCAGCTCT-3’ 5’-CTCCATCACTGTGCCTCCCACAC-3’ 302
mh01KK-172 5’-GGGCAGCTTCTCCCCAAATCACA-3’ 5’-ATCTCTGCCCCAAATCCACGTGG-3’ 317
mh01KK-205 5’-TTTGGGGTCCAGAGCACCAGTTC-3’ 5’-GCTTCATCCACCGGACCAGTGAG-3’ 321
mh01KK-212 5’-CTGTGGCTGTTCAGGTGTCCTCC-3’ 5’-TTCCCTCCCCACTTCAGACCTGT-3’ 305
mh01KK-213 5’-GGTGTGGGGGTCTGCATGACTTA-3’ 5’-GGTTACAGCCCTGAAGCCAAGCT-3’ 336
mh01NK-001 5’-ACAGCTAAGTCTACTGTCATTTT-3’ 5’-AGGGAACTTGGATTGTGGCCACT-3’ 346
mh02KK-013 5’-TGGCCTCCCTCAGAAAACCTCCT-3’ 5’-AGCTTGGTGGGGACACAAGAACC-3’ 349
mh02KK-014 5’-GGACCTGTCTCCGGACTGCAGTC-3’ 5’-TGCTTTGTCACCACCTGTTATGTGC-3’ 304
mh02KK-015 5’-AGCTACAGGTGAGGAGCGATGGA-3’ 5’-CCCTTAGGAGGAGCTGAGGGAGG-3’ 310
mh02KK-022 5’-GCCGCCCACTGAACCTGCTAAAT-3’ 5’-TGCCCTTATTCCCAACAGCAGCA-3’ 338
mh02KK-029 5’-CCCACTCTTCCCAGCACTCAGAG-3’ 5’-GAACTTGTCCCACCCTCCACCAG-3’ 331
mh02KK-031 5’-GATCTCCTGACCTCGTGATCCGC-3’ 5’-CCCATGCCATCTCATCCCTAGCC-3’ 328
mh02KK-134 5’-CTGGCAAAGGGGCTCTTCTCTCC-3’ 5’-GGAAGCTGGGATCCTGCTGCTTT-3’ 300
mh02KK-136 5’-CCCTGGGTCAGACTGTCTCACCT-3’ 5’-TGCATTCAAAACCTTTTAGGGGGT-3’ 324
mh02KK-138 5’-AGTTTACTCTAATGCTGCTGTGCCT-3’ 5’-TGTGTGCCTCTCTGGGAAGATACA-3’ 319
mh03KK-016 5’-AACAACAGGGGCCTCAGGTTCAC-3’ 5’-AGGGTCTAGTACTGTGCCTGGCA-3’ 309
mh03KK-017 5’-AGCACTTAACACACCCAGCTGTGA-3’ 5’-GCAAAGCTGAGGGTTTCGTGTGT-3’ 301
mh03KK-018 5’-AGCTGAGAGGAGAAAGCGACAAGT-3’ 5’-TCACTCGTTTTGTGAGACAGTTCA-3’ 306
mh03KK-047 5’-AGCAGCTGGTGTTTCCTCCATCC-3’ 5’-AGGACACCAGGGAACTTCAAGGT-3’ 338
mh03KK-150 5’-GGCCTAAAGGACCTTGCATGCCT-3’ 5’-TCAGTGGATCCCAGGCACTTTCA-3’ 346
mh04KK-010 5’-TTCCAGAAGAGGCCATGCTTCCG-3’ 5’-GCGGCGGGAAGAAAGGAGCTTAT-3’ 336
mh04KK-013 5’-AGTAAGCCATTGCAGTCATCTGA-3’ 5’-AGGAGTACACTAAAAACTCTGGCA-3’ 322
mh04KK-030 5’-TCAGGTCTCAGTGCTCAAGGGGA-3’ 5’-CGTGGTTCTGCTGTGCAAGTTGG-3’ 349
mh05KK-020 5’-TGTTCCCCCAAGACCTGAGTAGCT-3’ 5’-CAGGTTCTGCATTGGAGCTGGGA-3’ 348
mh05KK-169 5’-TGGTTTCCCTTGTGCTCAGGGTG-3’ 5’-AGAAGCCCAGAGAGGAGGCAGAA-3’ 337
mh05KK-170 5’-CGTGTGGGTCTCGGATCACAGAG-3’ 5’-TTGTCATCACCAGCACAGTGCCA-3’ 326
mh05KK-178 5’-TGAGAATCGAAAAAGCTGCCTACA-3’ 5’-AAGCTTGACAGGAGAAGCCACCC-3’ 350
mh06KK-008 5’-TTCATGACTCCACGTGCCGATGG-3’ 5’-CAGAGGGGGAGTAGAGGGGACAG-3’ 343
mh06KK-090 5’-ATCCCCTCCTTCCTGCTTTCCCT-3’ 5’-GGGGAAGGGACTTGCACACTAGG-3’ 341
mh06KK-104 5’-AGCCTGCATTCCTGGGTCAATGA-3’ 5’-CCCAGTGCCTAAAACAGTTGTTGGC-3’ 348
mh07KK-009 5’-GCAGGAACCAAAAGGCCAGTGTG-3’ 5’-TCCCCACTTCAGATGCTGGTTGC-3’ 310
mh08KK-039 5’-GGGAAGCCCAGGGAGGTGAAATT-3’ 5’-TCTTTGCCAGCTGTCCTGGGATG-3’ 329
mh08KK-131 5’-GGTATGAATCACCGCATGCCACT-3’ 5’-CCTGGTGTTCTGGGATCTGGTGT-3’ 300
mh08KK-137 5’-GTAAATTAAAACAGTTGGCAAGTCCCA-3’ 5’-TCCATTGCTGTGCCTTCTAGTGT-3’ 300
mh09KK-010 5’-CCAGCTCTGCATCTGCCCTTCTG-3’ 5’-ACACTCCCAAAAGTTTGCCCGGA-3’ 334
mh09KK-145 5’-TCTGAGGTGTGGGAACTGAGGCA-3’ 5’-GAAAAAGGGGGAGGAGGCAGGAC-3’ 300
mh09KK-153 5’-AGCATTAGACCAGATTACCTGCAGT-3’ 5’-GGCTGTATGATACTGGGCCACCC-3’ 327
mh09KK-157 5’-CGAGAGTGTCAGGTTGGAGCCAG-3’ 5’-CTCCAAGCAACAGCCCTGTCCTT-3’ 345

mh09KK-161 5’-AACAGACCCACAAATACAGAGTT-3’ 5’-TCCAGTCCTTGTGTGCTTTAGGGT-3’ 350
mh10KK-162 5’-TTACGCCACTACACTCCAGCCTG-3’ 5’-TGGGGTGTCTGACATCGTTCTCC-3’ 345
mh10KK-167 5’-CACTTGTGCCGCTGGGATTTTGG-3’ 5’-GCGGGACGTTTGTGAGTGGAGAT-3’ 332
mh10KK-170 5’-ACCCACTGCCTGAGGTGTAGGTT-3’ 5’-GGGAGCTGTTAGGGCCAGAAAGG-3’ 311
mh11KK-180 5’-TCCTGTGATCTCCAGCATCGAGA-3’ 5’-ATCTCCACAATCAGCCTGCCTGC-3’ 338
mh11KK-181 5’-CCCAGCTTGAGTGTCCCATCCTG-3’ 5’-CAACCCTGAGAGCAGCCCCTTTT-3’ 305
mh11KK-183 5’-AGATCTCCTCTGCCCTTGCCTGA-3’ 5’-AGAGAGGCTGCTACTGACCACTCA-3’ 345
mh11KK-190 5’-ATCTCCCTTCTGCAGCACCCTCA-3’ 5’-ACATCCTGAGACTGGGGAGGGAC-3’ 301
mh11KK-191 5’-TGGGAAGTGTTTCAAGAGAACCCA-3’ 5’-TCAGGAGAGGCAGTGGAAGCTCT-3’ 307
mh12KK-046 5’-AGGAACACTGGTATAGGAGGAGA-3’ 5’-CTTACCTACCACGGATGCTGCCT-3’ 350
mh12KK-199 5’-TCGCTATCAGCCAATGTGAGCCT-3’ 5’-CCACCTCACATTGCCTCAAACCC-3’ 301
mh12KK-201 5’-CACCGTGCATGCTACCTCTCCC-3’ 5’-GGGACCACTGTCTTCTTGGTACT-3’ 309
mh12KK-202 5’-CTCCACACACACCTCCCTCTCCA-3’ 5’-AGGCACTAGTGTGGCTCTCTGGA-3’ 313
mh12KK-209 5’-AGGTATAAAGCAGAGTGCCCAGA-3’ 5’-CCCAGGAAGCCCACCAAAGCATA-3’ 326
mh13KK-213 5’-TCTGCCACTTTGTTTGGAAGTCT-3’ 5’-CAGTCCCCAGCTGTGAGGAGAAG-3’ 343
mh13KK-215 5’-TGTTCAGGAAATGGACAAGTTCA-3’ 5’-GGCGCCCTCCTAGAATATCAAGGC-3’ 324
mh13KK-217 5’-CCCTGGGTCACTCTTGCTTCTGT-3’ 5’-ACAGAACAAATTGCCTAATTGGAACT-3’ 330
mh13KK-218 5’-TTTCACCATGTTGGCCAGGCAGG-3’ 5’-ACATGTGAGTTCTGCATTACTCT-3’ 346
mh13KK-221 5’-ACTTCCCTCTCTTGTCTGCTGCC-3’ 5’-AGGGTAGAGACAGTGAACAGTGGGT-3’ 317
mh13KK-222 5’-AACTCCTGGGCCTACCTTCACAA-3’ 5’-GGGTGGGGTAGGGGAGTGGTAAA-3’ 310
mh13KK-223 5’-TGAGTGTATCAAACAGGGGCCTTGT-3’ 5’-TGGTCACATGGAGAAGCTGCTCA-3’ 321
mh13KK-225 5’-TGCCGTGTCAAAGACAGTTCAGT-3’ 5’-AAGCACCACTTCTGCCAGGTGAG-3’ 346
mh14KK-048 5’-CAGGACAGAACTGCCAGCCGTG-3’ 5’-TGCTGGGTTGCCTAATTACCTCCA-3’ 301
mh14KK-227 5’-GCAAGCTGCCTGCAAATTCACCA-3’ 5’-CGGTGTCTTAGTTGAGGCCCTCC-3’ 300
mh15KK-066 5’-ACAGGCCAGAGACTAGGAGTGCT-3’ 5’-TCAGAGAAGATAAACGGCATGGCA-3’ 346
mh15KK-067 5’-CTCACACCCGGGAAGAAGGCATT-3’ 5’-AGTCAACAAAATAGCCCCTCATGGA-3’ 331
mh16KK-011 5’-CACATAGAAAGCCGTGGGGGAGG-3’ 5’-GGAAGTTGCAGTGTGGAGGGGAG-3’ 342
mh16KK-049 5’-TTCTCTGCATACTGCCCTGGAGA-3’ 5’-TGAGGGCTGTCACTAAGGGGACT-3’ 347
mh16KK-255 5’-TTGCTGTGTGAGGGCTTTCTGCT-3’ 5’-CCTTCTGTACCCTGACGTGAGCG-3’ 313
mh16KK-259 5’-CAAAACTGAGGGGCCCCAAGGAA-3’ 5’-TCAGACAGCCCAAACCAAGGTGG-3’ 314
mh16KK-262 5’-GCCTCATCTCATCCGCTCTCTGC-3’ 5’-GGCAACTGGAGCAGACAGGACAT-3’ 313
mh16KK-302 5’-CTGAGCTGGGGGTGAAGACATCG-3’ 5’-ATGCATGAGGGGAAATCCGTGGT-3’ 316
mh17KK-012 5’-CCCGAGTTTGGGCACAGATTCTG-3’ 5’-CGAAGTCTGGCCACAGGGACTTT-3’ 304
mh17KK-013 5’-TACGCACATGCACACACCACACA-3’ 5’-AACGCCTTTGAGGAGAACACGCT-3’ 332
mh17KK-272 5’-ATACCTTTCCTCTTCGCAGGGCC-3’ 5’-AGCAGAGAGCAGAGCTGACTTGT-3’ 342
mh17KK-278 5’-TCCACATCCTTTCTAGTTCTTGGCA-3’ 5’-TCCCCTTCAGCGTCTCTTGGAGT-3’ 346
mh18KK-293 5’-GAGGGGACTGTCCTTCCTCCAGT-3’ 5’-TCGGATGTGTCAGTGTGGAGAAGA-3’ 324
mh19KK-299 5’-TCTCTATCATGTGGCCTGGCACA-3’ 5’-AGCATCAGCATCTCCTGGGAGC-3’ 300
mh19KK-300 5’-TACCACACGCCTGAAAACTGCCA-3’ 5’-TGATTGTATTTGGGATACATTGGGT-3’ 338
mh20KK-058 5’-CAGGGCCAGGCATATAGCTGGTG-3’ 5’-CGCTAAGGAGTGGTTGGCATCCA-3’ 300
mh20KK-306 5’-GAGCCAGAGGTGGTTGTCAAGCT-3’ 5’-AACCACAGTGGCATCCTCAGCTC-3’ 349
mh20KK-307 5’-GGACACCCCATGTTCATGTGGCT-3’ 5’-AAGTGATCGTGGTAATGCTCACA-3’ 306
mh21KK-313 5’-ACCCTCTCACATCCCTGGAGGTC-3’ 5’-ACGTTCTGAGGTTTTGGGTGGACA-3’ 330

mh21KK-315 5’-CTGAAATCTCAGGCAGGCAGGCA-3’ 5’-CTGGTGAGGAGCAGCAAGGTCAG-3’ 306
mh21KK-316 5’-TGGCAGACCAGGGAAGATATGTGC-3’ 5’-TGCTTCTCATTCTCCCACAGCTTGA-3’ 339
mh21KK-318 5’-AAGACACAGCACCGGGGGATAGA-3’ 5’-TGCTGGGTCCAGGCTCCTTAGAT-3’ 311
mh21KK-320 5’-GAGGCTGTGGAGAGGGTGTGTTT-3’ 5’-CCTGGGGAGCACAGTGAACTCC-3’ 350
mh21KK-324 5’-CTGCCGTCATCTGGGAAACGTGG-3’ 5’-CACATGGATCCTGAGCCCGCATA-3’ 347
mh22KK-061 5’-GCTCTAGCACGGACCTCAAAGGG-3’ 5’-AGTGGAGAGGGGATCCAGAACCC-3’ 348
mh22KK-328 5’-GCAGTCCTCGTCTGATTGGCCAA-3’ 5’-AGGCATGAAGTCCAGGTCAAGGC-3’ 324
mh22KK-340 5’-GCCTACGGTGTTTCGTGGACTTG-3’ 5’-ACGTCATCTTACCCCAAAGCCGT-3’ 350

Supplemental Table S4. SNPs in 90 microhaplotype regions. The 90 MH loci contain a
total of 717 SNPs, including 632 known SNPs identified from dbSNP (version 2.0 Build 152,
extracted in July 2019) and 85 novel SNPs discovered in this study. The number of SNPs per
individual MH ranged from 4 to 16. For each SNP, the reference (REF) and the alternate
(ALT) alleles are provided. *The asterisk indicates 44 novel MH loci validated in this project.

MH name dbSNP ID Variant category REF ALT Number of SNPs per MH


mh01KK-001 rs2275834 Known A G 10
rs4648344 Known T C
rs4648345 Known G A
rs6663840 Known G A
rs58111155 Known G C
rs6688969 Known C T
rs199565833 Known G A
rs376407444 Known C T
rs114208229 Known C T
. Novel C T
mh01KK-117 rs17413714 Known A C 5
rs2772234 Known A G
rs1610401 Known G C
rs1610400 Known C T
. Novel T C
mh01KK-172 rs3128342 Known C A 7
rs3766176 Known T C
rs1887284 Known G A
rs115309688 Known C T
rs3766175 Known A T
rs74511912 Known C T
. Novel G C
mh01KK-205 rs11810587 Known T C 8
rs1336130 Known T C
rs1533623 Known G A
rs1533622 Known G A
rs1336131 Known G T
rs56176731 Known G C
rs138362286 Known G A
. Novel C T
mh01KK-212* rs16850184 Known G C 11
rs57658975 Known C A
rs58023111 Known T C
rs59815839 Known T C
rs58060874 Known T C
rs12121078 Known C T
rs11589785 Known A G
rs114630930 Known C T
rs186753019 Known G A
rs72750624 Known C G
rs114711668 Known G A
mh01KK-213* rs79793450 Known T A 10
rs7521883 Known G C
rs9424470 Known G C
rs12118941 Known C A
rs77211663 Known T C
rs6424246 Known T C
rs75030648 Known G A
rs186507293 Known G A
rs12120379 Known A G
. Novel C G
mh01NK-001 rs2479135 Known A G 4
rs2296796 Known G A
rs2296797 Known G T
rs2296798 Known G A
mh02KK-013* rs7584136 Known A C 7
rs35414072 Known A G
rs34544149 Known C T
rs1978821 Known C T
rs1978820 Known T C
rs114422847 Known C T
rs147490454 Known C G
mh02KK-014* rs6730730 Known A G 13
rs72961055 Known G C
rs58129342 Known G A
rs4332915 Known G A
rs4321359 Known T C
rs4270334 Known T C
rs4580373 Known A G
rs60403543 Known C T
rs77155305 Known A G
rs182570717 Known T G
rs73084811 Known G A
rs78561058 Known T C
. Novel C G
mh02KK-015* rs3791372 Known T C 10
rs3791373 Known A G
rs3791374 Known A G
rs3791375 Known G A
rs28404128 Known A G
rs112145050 Known C T
rs140587940 Known A C
rs74566136 Known C T
rs75104001 Known C T
. Novel G A
mh02KK-022* rs13021132 Known C T 6
rs10495516 Known G A
rs10495515 Known G A
rs62123466 Known A G
rs74618846 Known G A
rs79989676 Known A G
mh02KK-029* rs77788766 Known G A 13
rs6726044 Known C A
rs10191936 Known C A
rs12999343 Known A G
rs114774962 Known A G
rs115054914 Known C T
rs72905669 Known T C
rs78044414 Known A G
rs372620871 Known C T
rs191237929 Known C T
. Novel T C
. Novel C T
. Novel G A
mh02KK-031* rs6740521 Known T C 5
rs13400652 Known C T
rs13400673 Known C T
rs7575942 Known C T
. Novel G A
mh02KK-134 rs12469721 Known A T 6
rs3101043 Known T C

rs3111398 Known C T
rs72623112 Known G A
rs190062220 Known C T
. Novel C T
mh02KK-136 rs6714835 Known T G 5
rs6756898 Known C T
rs12617010 Known C A
rs6728149 Known A G
rs184900040 Known C T
mh02KK-138 rs4953292 Known G A 7
rs2595202 Known T G
rs6759301 Known T C
rs59298278 Known A G
rs2595203 Known G A
rs6715568 Known A G
. Novel A G
mh03KK-016* rs9310445 Known C T 9
rs9867515 Known C T
rs9867516 Known C T
rs874969 Known C T
rs874967 Known G T
rs9310444 Known T G
rs367833150 Known C T
rs375235722 Known T G
rs13082821 Known G C
mh03KK-017* rs267527 Known A G 8
rs267526 Known A G
rs2162683 Known C T
rs197766 Known G A
rs35590538 Known G T
rs140794158 Known C T
rs150108526 Known G A
. Novel G A
mh03KK-018* rs6806321 Known G A 10
rs9870146 Known C T
rs3971402 Known C A
rs6770710 Known T C
rs3971401 Known A C
rs12695349 Known T C
rs9832546 Known A G
rs77141935 Known T G
rs116231064 Known G A
. Novel C T
mh03KK-047* rs6441891 Known T C 6
rs59145932 Known C T
rs7612333 Known G A
rs34703630 Known C T
rs189841224 Known G A
. Novel G A
mh03KK-150 rs1225051 Known G A 4
rs1225050 Known G A
rs1225049 Known C T
rs1225048 Known C A
mh04KK-010 rs3135123 Known G A 4
rs495367 Known A G
rs376745306 Known C G
rs185470596 Known C T
mh04KK-013 rs13131164 Known C A 7
rs3775866 Known G A
rs11725922 Known G A
rs3775867 Known G A

rs17088476 Known T C
rs73829227 Known T C
. Novel G C
mh04KK-030 rs16844737 Known T C 7
rs4916615 Known C T
rs1884412 Known A G
rs1884411 Known C G
rs58827274 Known C G
rs148696985 Known G A
. Novel T A
mh05KK-020 rs617938 Known G T 6
rs2278325 Known C T
rs2278324 Known G T
rs525735 Known G T
rs74924445 Known A C
rs80052994 Known C T
mh05KK-169* rs260410 Known T A 6
rs62335822 Known C A
rs10428542 Known C T
rs10428597 Known A C
rs62335823 Known G A
rs116092316 Known C A
mh05KK-170 rs74865590 Known C T 13
rs438055 Known A G
rs370672 Known G A
rs6555108 Known A G
rs436910 Known A G
rs79374881 Known G A
rs80244780 Known T C
rs143205094 Known T G
rs58296930 Known G A
rs199671779 Known G A
rs143851316 Known G A
rs116278333 Known C T
. Novel A G
mh05KK-178* rs35138278 Known A G 8
rs282290 Known T C
rs282291 Known C A
rs1437125 Known G C
rs75963146 Known A G
rs116644437 Known A G
rs115575506 Known T C
. Novel C G
mh06KK-008 rs6930377 Known T G 12
rs6921774 Known G C
rs6605524 Known A G
rs6605523 Known A G
rs6422749 Known A G
rs55995725 Known G A
rs115242802 Known T G
rs113888184 Known G C
rs147272602 Known G T
. Novel C G
. Novel G A
. Novel G A
mh06KK-090* rs2256539 Known C T 12
rs2256543 Known T C
rs3202637 Known C T
rs1061535 Known T C
rs6911940 Known A G
rs1061537 Known G A

rs1061536 Known T C
rs2523972 Known T C
rs9260770 Known G C
rs4713276 Known C G
rs6939037 Known T C
rs113213828 Known A G
mh06KK-104* rs16897782 Known T C 5
rs220807 Known C T
rs9457084 Known A G
rs220806 Known C T
rs375807743 Known G C
mh07KK-009* rs13244868 Known C A,G 13
rs2520361 Known A G
rs2588633 Known A G
rs28636374 Known A G
rs28712504 Known T C
rs28401174 Known T C
rs148036593 Known A G
rs28465160 Known G A
rs375413102 Known T G
rs115971745 Known C T
rs189222341 Known G T
. Novel A G
. Novel T G
mh08KK-039 rs1905100 Known C T 9
rs10503214 Known T C
rs7822812 Known A G
rs922794 Known C T
rs7838695 Known G C
rs7839517 Known C G
rs74752104 Known T C
rs60536879 Known T G
. Novel A T
mh08KK-131* rs11782143 Known C A 14
rs11136908 Known A G
rs67207258 Known C T
rs11136909 Known C T
rs11775090 Known G A
rs72624088 Known G C
rs140864246 Known G A
rs117526954 Known C T
rs190035804 Known G A
rs150466498 Known G A
. Novel C A
. Novel T G
. Novel C G
. Novel G C
mh08KK-137* rs35052638 Known C A 9
rs58875218 Known T C
rs59644946 Known T C
rs62506141 Known C T
rs73232805 Known A G
rs62506140 Known T C
rs113565897 Known C T
rs184519930 Known A G
. Novel C A
mh09KK-010* rs1408328 Known C T 10
rs1535837 Known G A
rs12555748 Known C T
rs1535838 Known G A
rs1408330 Known C A

rs1408329 Known G A
rs11789647 Known T C
rs114987572 Known G A
rs189134433 Known C T
. Novel C T
mh09KK-145* rs409950 Known C A 8
rs12003360 Known A G
rs12005199 Known G A
rs7849565 Known G C
rs10815071 Known G A
rs73401750 Known C G
rs59428916 Known G A
rs181550768 Known G A
mh09KK-153 rs10125791 Known T C 6
rs2987741 Known A G
rs7047561 Known A C
rs1156731 Known C G,T
rs62576887 Known C T
. Novel C T
mh09KK-157 rs606141 Known G A 7
rs8193001 Known C T
rs56256724 Known C T
rs2073578 Known A C
rs633153 Known C T
rs370406708 Known C T
. Novel C T
mh09KK-161 rs7045710 Known T G 9
rs4741822 Known C G
rs4741823 Known C T
rs16932430 Known T G
rs76724192 Known G A
rs57160652 Known C G
rs75177501 Known G T
rs184018458 Known C G
. Novel G A
mh10KK-162* rs9423462 Known T C 12
rs4881098 Known G A
rs3829911 Known C T
rs3829913 Known T C
rs79563339 Known C G
rs9423463 Known A G
rs3829912 Known G A
rs112639679 Known G A
rs12777199 Known T C
. Novel A G
. Novel C A
. Novel C T
mh10KK-167* rs45594139 Known C T 8
rs1111062 Known T C
rs1111063 Known T C
rs2815662 Known A G
rs2815663 Known G C
rs146279227 Known G A
rs370497100 Known C T
. Novel G A
mh10KK-170 rs2250841 Known G T 5
rs2250840 Known A G
rs12359688 Known A G
. Novel C G
. Novel T G
mh11KK-180 rs12802112 Known A G 10

rs28631755 Known A C
rs7112918 Known T C
rs4752777 Known C G
rs12360952 Known T C
rs4752778 Known C T
rs7109697 Known T G
rs7105950 Known A C
rs74047734 Known G A
rs113739994 Known G A
mh11KK-181* rs3852525 Known G A 7
rs233440 Known G C
rs78013 Known T G
rs163173 Known T A
rs116463395 Known G T
rs372212221 Known C T
. Novel A T
mh11KK-183* rs4757879 Known G T 9
rs2403552 Known C T
rs11025352 Known C G
rs2896591 Known T G
rs112906811 Known T A
rs56114506 Known A G
rs139023539 Known G C
. Novel T C
. Novel G A
mh11KK-190* rs1426651 Known T C 7
rs56058681 Known C T
rs7936934 Known T C
rs61894297 Known G T
rs72968980 Known A G
rs114239369 Known A T
rs149859006 Known T C
mh11KK-191 rs12421109 Known T C 6
rs12289401 Known A G
rs12420819 Known A G
rs770566 Known T C
rs11222337 Known A C
. Novel G A
mh12KK-046 rs1503767 Known T G 6
rs11068953 Known G A
rs3903095 Known C T
rs139873540 Known G T
rs75554846 Known T C
. Novel T C
mh12KK-199* rs17819677 Known C G 5
rs1641719 Known C T
rs1641720 Known T G
rs6488494 Known C T
rs139574449 Known C T
mh12KK-201* rs11049080 Known C T 12
rs73084354 Known A G
rs7959164 Known T C
rs7959186 Known T C
rs7972446 Known G C
rs7138224 Known T C
rs7959051 Known T C
rs11049081 Known T C
rs11834107 Known C T
rs73084353 Known G A
rs140345281 Known G C
rs145035089 Known A G

mh12KK-202 rs10506052 Known A C 5
rs4931233 Known G A
rs10506053 Known T C
rs4931234 Known T C
rs116560588 Known A T
mh12KK-209* rs764902 Known A G 7
rs79004033 Known C T
rs5012151 Known G T
rs2173910 Known G A
rs5012152 Known G A
rs147809360 Known G A
rs113507589 Known A C
mh13KK-213 rs8181845 Known C T 8
rs679482 Known C A
rs9510616 Known G A
rs8181836 Known G A
rs2152727 Known T C
rs186130261 Known A G
. Novel C T
. Novel A G
mh13KK-215* rs1539549 Known C T 8
rs4146589 Known T C
rs1539548 Known T A
rs1750921 Known A G
rs1539547 Known C G
. Novel G T
. Novel A G
. Novel C A
mh13KK-217 rs7320507 Known A G 7
rs9562648 Known G A
rs9562649 Known C T
rs2765614 Known A G
rs9534373 Known G A
rs76839632 Known A G
rs2764588 Known T G
mh13KK-218 rs9536428 Known C G 7
rs1927847 Known T C
rs9536429 Known T C
rs7492234 Known T C
rs9536430 Known C T
. Novel C T
. Novel G A
mh13KK-221* rs112552144 Known A G 12
rs608555 Known C A
rs11843065 Known A T
rs9300648 Known T C
rs61973993 Known T C
rs602728 Known T C
rs7993431 Known T C
rs113732191 Known A G
rs114169269 Known T G
rs9557588 Known T A
rs114345885 Known A G
. Novel T G
mh13KK-222* rs2248394 Known C T 10
rs2248391 Known A G
rs2248388 Known T C
rs2391263 Known C A
rs61972611 Known C T
rs2476747 Known G C
rs9583061 Known T A

rs79323262 Known T G
rs180754893 Known T C
. Novel G A
mh13KK-223 rs1192204 Known T C 8
rs1192205 Known C G
rs3825483 Known C T
rs3825481 Known T C
rs1192203 Known T C
rs3825482 Known G T
rs115278730 Known G A
. Novel G A
mh13KK-225 rs4884651 Known G A 5
rs9529023 Known A C
rs7329287 Known G A
rs4884652 Known C T
rs115260715 Known A T
mh14KK-048 rs12717560 Known A G 4
rs12878166 Known C T
rs12879393 Known T C
rs149195448 Known G A
mh14KK-227* rs1953862 Known A G 9
rs57928440 Known T C
rs12885784 Known A G
rs17124680 Known A C
rs114965032 Known G A
rs146988848 Known T C
rs148884074 Known A T
. Novel C A
. Novel G A
mh15KK-066 rs1063902 Known A C 6
rs4219 Known G T
rs1077383 Known G C
rs147715323 Known T C
rs183816954 Known G T
rs148957716 Known G A
mh15KK-067 rs701463 Known T G 5
rs701464 Known C T
rs7183984 Known T C
rs79763993 Known T C
. Novel T C
mh16KK-011* rs390112 Known T C 9
rs390379 Known T C
rs422842 Known G A
rs7190826 Known C A
rs441199 Known G A
rs426580 Known C G
rs76460635 Known G C
rs188060999 Known C T
rs74957131 Known C T
mh16KK-049 rs9937467 Known A C 16
rs17670098 Known C A
rs17670111 Known A G
rs12929083 Known A G
rs9926495 Known G A
rs9937460 Known A G
rs184573741 Known A T
rs72765534 Known T C
rs72765535 Known A G
rs11642586 Known C G,T
rs113511754 Known A C
rs115605200 Known C T
rs74009271 Known G A
rs149818672 Known A G
. Novel C T
. Novel T G
mh16KK-255 rs16956011 Known G A 12
rs3934955 Known A C
rs3934956 Known C T
rs4073828 Known A G
rs3934954 Known A G
rs3934957 Known C G
rs149471008 Known T C
rs62044947 Known C G,T
rs184092108 Known C T
rs117206377 Known G C
. Novel G A
. Novel C T
mh16KK-259* rs10163264 Known C G 11
rs10163267 Known G C
rs10163270 Known G C
rs8058168 Known G A
rs145717803 Known C T
rs148944690 Known G C
rs143423041 Known C G
rs78983027 Known T C
rs140436204 Known G C
. Novel G A
. Novel C T
mh16KK-262* rs825589 Known G C 10
rs825588 Known A G
rs863752 Known G C
rs861133 Known T G
rs61069393 Known G A
rs79458371 Known C T
rs193263518 Known G T
. Novel C T
. Novel A T
. Novel C T
mh16KK-302 rs1395579 Known G A 7
rs1395580 Known C T
rs1395582 Known T A
rs9939248 Known T C
rs1507022 Known G T
rs1395581 Known C T
rs377520498 Known C T
mh17KK-012* rs56342141 Known A G 11
rs7211056 Known C T
rs55949744 Known G A
rs4789870 Known C T
rs55877426 Known T C
rs55714291 Known G A
rs56069218 Known A G
rs7211029 Known C T
rs76332411 Known A G
rs145799703 Known G A
. Novel G A
mh17KK-013* rs11651201 Known T C 9
rs11651204 Known T C
rs8074560 Known G A
rs1563446 Known G T
rs4465632 Known G A
rs72848020 Known C T
rs147699838 Known C T
rs116177500 Known G A
. Novel T C
mh17KK-272 rs2934897 Known T C 11
rs7207239 Known C T
rs16955257 Known C A
rs7212184 Known T C
rs2934896 Known A G
rs111774039 Known C A
rs7207213 Known C T
rs79327757 Known T A
rs78871226 Known C T
. Novel T G
. Novel T C
mh17KK-278* rs11150747 Known C T 5
rs7225755 Known C A
rs72852775 Known A G
rs7224758 Known G A
rs4969266 Known C T
mh18KK-293 rs621320 Known A G 7
rs621340 Known T G
rs678179 Known G A
rs621766 Known A G
rs80207925 Known C G
. Novel G A
. Novel G A
mh19KK-299 rs12985452 Known G A 7
rs4932999 Known C T
rs4932769 Known A G
rs2361019 Known T A
rs2860462 Known G A
rs139451798 Known C T
. Novel C G
mh19KK-300* rs2411333 Known A G 7
rs10415306 Known C T
rs34338963 Known A C
rs10415743 Known C A
rs150844118 Known C T
rs139954003 Known G A
. Novel G A
mh20KK-058 rs6122890 Known T C 7
rs6095836 Known A G
rs6012881 Known T C
rs6020381 Known G T
rs4811008 Known T G
rs4811009 Known T C
rs73121345 Known A C
mh20KK-306* rs11697918 Known A G 5
rs533194 Known A G
rs6086446 Known C T
rs1014897 Known C G,T
rs1014898 Known T C
mh20KK-307 rs6044080 Known T C 8
rs6044081 Known A G
rs16997830 Known A C
rs17674942 Known T A,C
rs112240396 Known T C
rs116146231 Known G A
rs182552564 Known C G
. Novel A G
mh21KK-313 rs6586324 Known C T 5
rs6586325 Known G T
rs6586326 Known T C
. Novel C G
. Novel G C
mh21KK-315 rs8126597 Known G A 5
rs6517970 Known A C,G
rs8131148 Known C T
rs6517971 Known T C
rs76016088 Known G A
mh21KK-316 rs961302 Known A G 6
rs17002090 Known C T
rs961301 Known A G
rs2830208 Known C T
rs73361625 Known G A
. Novel A G
mh21KK-318* rs2250341 Known C T 9
rs2299752 Known C T
rs17753847 Known C T
rs2299753 Known T C
rs73904461 Known G A
rs373008264 Known T C
rs370036702 Known A G
rs149759443 Known T C
. Novel G C
mh21KK-320 rs2838081 Known G A 7
rs2838082 Known A G
rs78902658 Known C T
rs2838083 Known A G
rs75885188 Known A T
rs114395414 Known A G
rs370194397 Known C T
mh21KK-324 rs6518223 Known T C 6
rs2838868 Known C T
rs7279250 Known T A
rs8133697 Known G A
rs7282557 Known G A
rs80278660 Known C T
mh22KK-061 rs763040 Known G A 7
rs5764924 Known A G
rs763041 Known G A
rs139200695 Known G A
. Novel C T
. Novel G A
. Novel C G
mh22KK-328* rs1076115 Known T C 5
rs5746505 Known C T
rs13057610 Known T C
rs4819661 Known G A
. Novel T C
mh22KK-340* rs4925431 Known A G 8
rs4925399 Known A G
rs4925400 Known A G
rs4925401 Known G A
rs9628590 Known T C
rs4925432 Known T C
rs77899570 Known T C
rs79558241 Known C T