Professional Documents
Culture Documents
SNP Typing Uas Kinship Estimation Gedmatch Pro Tech Note vd2020058 A
SNP Typing Uas Kinship Estimation Gedmatch Pro Tech Note vd2020058 A
For Research, Forensic, or Paternity Use Only. Not for use in diagnostic procedures. 1
Technical Note
UAS evaluates SNP allele calls based on the total number Table 1: Filters and sort options
of reads and classifies alleles as references or alternatives,
providing locus call rate and heterozygosity data with Feature Options
quality control (QC) indicators to facilitate data review. Metadata filters Chromosome
Estimates of contributor status and gender indicate
Typed or Untyped
whether each sample is a mixture and the biological sex.
When analysis is complete, users can manually inspect Homozygous or Heterozygous
each SNP call or use the multi-level filtering and sorting QC indicator filters Allele Count
options to focus the review on SNP metadata, calls flagged
with QC indicators, and specific SNP categories (Table 1). Analytical Threshold
This combination of filter and sort capabilities minimizes Imbalanced
the time required for data review.
Interpretation Threshold
To supplement the guided exploration of onscreen No QC Indicators
results, the software offers four types of reports for Not Detected
packaging SNP data:
Unexpected Allele
• Phenotype and Ancestry—Estimates of hair color, eye User Modified
color, and biogeographical ancestry.
SNP type filters Ancestry SNP
• GEDmatch PRO—SNP genotype calls configured for
secure upload to the database. Identity SNP
For Research, Forensic, or Paternity Use Only. Not for use in diagnostic procedures. 2
Technical Note
Analytical and interpretation thresholds (Table 3). Given these results, Verogen recommends
settings that call fewer SNPs rather than introduce
To determine the default setting of 3% for both AT and
potential errors into kinship estimation.
IT, 10 operators performed 25 runs on the MiSeq FGx
System using the MiSeq FGx Reagent Kit. Each run
Sample read count guideline
sequenced three libraries, which is the recommended
plexity, prepared with the ForenSeq Kintelligence Kit. To determine the sample read count guideline of 15
Data from 31 1 ng positive amplification controls and million reads per sample, Verogen evaluated the total
21 negative amplification controls were analyzed using number of reads obtained from 29 samples at 1 ng
various AT and IT settings with the aim of identifying the input. Libraries were prepared with the ForenSeq
combination that maximizes concordant allele calls and Kintelligence Kit and sequenced on the MiSeq FGx System
minimizes discordant allele calls. The lowest combination using the MiSeq FGx Reagent Kit. Each run sequenced
Verogen evaluated was 1.5% AT and 1.5% IT, which three libraries. Samples that reached 15 million reads
represents a minimum read count of 10. The increase demonstrated a median call rate of ≥ 99.8% and a
to 3% raises the minimum read count to 20 and results minimum call rate of 98.6%. However, samples that do not
in an overall positive effect on concordant call rates and reach 15 million reads might still generate sufficient data
a negligible effect on the detection of true allele calls for analysis.
Convert BCL files Align reads to targets Count and assign Contributor status and User review and
to FASTQ files and optimize SNPs gender estimation report creation
Raw base calls (BCL Both reads are aligned Reads are assigned to Contributor status Multi-level filtering and
files) are converted into to the manifest file, each SNP base call. If the number of sorting for efficient loci
sequence reads (FASTQ generating BAM files. imbalanced alleles review. Common filters
files). Intralocus balance is exceeds the threshold, are applied by default.
The Smith-Waterman calculated for a sample is classified as
Demultiplexing divides algorithm aligns heterozygotes. a mixture. Ability to update
reads into separate files amplicons. contributor status,
based on the index QC indicators identify Gender gender, or allele calls.
adapter sequence below-threshold or Samples with a low
associated with each undetected reads, number of loci passing SNP report
sample. unexpected bases or AT are undetermined. preconfigured for
numbers of calls, Samples that pass upload to GEDmatch
Reads are trimmed for imbalanced thresholds and have a PRO, plus three other
quality. heterozygotes, and minimum number of report formats.
user-edited data. loci on the Y
chromosome are male,
else female.
Figure 1: UAS automatically performs SNP calling, completing analysis in less than one hour. Single‑click reporting
produces a GEDmatch PRO-ready report.
For Research, Forensic, or Paternity Use Only. Not for use in diagnostic procedures. 3
Technical Note
Kinship estimation with dense SNP data candidates. Evaluating the top match candidates, building
and merging family trees, identifying the most recent
GEDmatch PRO is a dedicated forensic portal for kinship common ancestor, and admixture analysis also help with
estimation in cases of unidentified human remains, genealogical assessments.3,4
missing persons, and violent crimes. It supports upload
of SNP reports (kits) that WGS, microarrays, and targeted Nonsegment-based comparisons
sequencing generate, enabling comparisons against
profiles voluntarily shared in GEDmatch. Output from both In contrast to traditional, segment-based methods, the
segment-based and nonsegment-based comparisons Verogen method does not rely on overlap between long,
is a list of candidate matches. Candidate matches from contiguous segments to compare kits and generate
the nonsegment-based tool can be used as input for matches. Instead, the One-to-Many Kinship tool leverages
tree‑building and other segment‑based genealogy tools. a variation of principal component analysis (PCA) to
Users can also leverage the segment‑based tool to analyze the dense set of 10,230 forensically curated
properly place associations on a family tree. SNPs included in the ForenSeq Kintelligence Kit. Typically
used in genome‑wide association studies with genotype
Segment-based comparisons SNP data to infer and correct for population structure
(i.e., ancestry interference), PCA takes SNPs as input and
Traditional DNA-based genealogy methods generate a performs dimension reduction to calculate principal
sizable number of SNP calls, typically three billion for WGS components (PCs) that capture variability in the data. The
and 650,000–2,500,000 for microarrays, and employ a top PCs generally reflect population structure among the
segment-based approach to determine genetic relatives. samples. However, when a population contains known
The DNA between SNP locations on a chromosome is or unknown relatives, PC is confounded by the family
called a segment. If two kits have similar SNP alleles structure and presents clusters of close relatives instead
between contiguous segments, they are considered a of indicating variations across populations.5,6
candidate. A centimorgan (cM) is a measure of genetic
distance that denotes the size of matching DNA segments To bypass this issue, Verogen applied a variation of PCA
in autosomal DNA tests. Segments that have many to kits opted in for law enforcement use. One-to-Many
centimorgans in common are more likely to be significant Kinship builds on published, peer-reviewed methods such
and to indicate a common ancestor within a genealogical as PC-Relate and PC-AiR and uses targeted sequencing
timeframe, indicating biological inheritance. data to infer population structure and make genetic
correlations without the need for ancestry or reference
Users can leverage the Shared cM Project or similar tools population information. PC-Air accounts for relatedness
to evaluate the number of shared centimorgans, average in the population to provide ancestry estimations that
segment length, and the longest segment across matches are not confounded by family structure. Accordingly, the
to understand the most likely relationships that the total PCs that PC-AiR generates are robust to known or cryptic
number of shared centimorgans across chromosomes relatedness. When implemented on kits in GEDmatch,
indicates. Different genealogy resources, such as census, PC‑AiR identifies mutually unrelated kits that are maximally
family, birth, and death records, further refine these ancestrally diverse.7,8
For Research, Forensic, or Paternity Use Only. Not for use in diagnostic procedures. 4
Technical Note
For Research, Forensic, or Paternity Use Only. Not for use in diagnostic procedures. 5
Technical Note
Verogen • 1.833.837.6436 toll-free (North America) • +1.858.285.4101 tel • +44 (0) 208 054 8706 (United Kingdom) info@verogen.com • www.verogen.com
© 2021 Verogen Inc. All rights reserved. | Document # VD2020058 Rev. A | Current as of March 2021
For Research, Forensic, or Paternity Use Only. Not for use in diagnostic procedures. 6