Professional Documents
Culture Documents
DNA Seq Analysis
DNA Seq Analysis
wc -l file_name.fastq
• The Per base sequence quality report, which can help you decide if sequence trimming is needed
before alignment.
• The Sequence Duplication Levels report, which helps you evaluate library enrichment / complexity
machine_id:lane:flowcell_grid_coordinates end_number:failed_qc:0:barcode
• Alignment score = Σ
• match reward
• base mismatch penalty
• gap open penalty
• gap extension penalty
• rewards and penalties may be adjusted for quality scores
of bases involved
Read 1 Read 2
ATCGGGAGATCC
or ATCGGGAGATCC GCGTAGTCTGCC
|||||||||||| |||||||||||| || |||| ||||
...TAATCGGGAGATCCGC...TTATCGGGAGATCCGC... ...TAGCCTAGTGTGCCGC...
Reference Sequence
High alignment score ≠ high mapping quality.
Phred score: P(mismapped) = 10–MQ/10
Paired-end mapping (PEM)
• There is an expected insert size distribution based
on the DNA fragment library.
• Mapping one read anchor the paired read to a
specific location, even if the second read alone
maps multiple places in the reference.
• Only one read in a pair might be mappable.
(singleton/orphan)
• Both reads can map with an unexpected insert size
or orientation (discordant pair)
Split-read alignment (SRA)
• Useful for predicting splice variants or structural
variants.
• Not many mappers do this directly, usually happens
in a post-processing step.
BWA 110.73
Bowtie 220.82
Some Comparisons
Ma p p in g Ac c u ra c y - S p lic e d D a t a t o W h o le Ge n o m e
Mapsplice
Tophat
ABI Bioscope
Mapped correctly
Mapped incorrectly
Splicemap
Not mapped
BWA
SOAP
Bowtie
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
Ma p p in g %
Some Comparisons
CP U Tim e - W h o le Ch ro m o s o m e
SOAP
Bowtie
BWA
ABI Bioscope
Mapsplice (Threaded)
Tophat
Splicemap