

Overview of available metagenomic analysis tools

Oksana Lukjancenko
Outline

• General overview of the methods


• Classification methods
• Commonly used sequence search algorithms

Metagenomic analysis methods

Reads

• Assembly
    • Binning
    • Annotation
• Classification methods
    • Sequence similarity-based
    • Sequence composition-based
    • Marker-based
    • Hybrid
Sequence similarity-based methods

A homology search (comparison) of the reads against a database of reference organisms.

Disadvantage: cannot identify organisms that are not present in the reference database.
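
As an illustration only, the sketch below assigns each read to its best-scoring hit in BLAST tabular output (the 12-column `-outfmt 6` layout). The read names, reference names, scores and the `max_evalue` cutoff are made-up assumptions, and real tools apply more elaborate rules than a single best hit. The read with no hit stays unclassified, which is the disadvantage stated above.

```python
import csv
import io

# Hypothetical BLAST tabular output (-outfmt 6); all names and values are invented.
# Columns: qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
BLAST_TSV = """\
read1\tEscherichia_coli\t98.5\t100\t1\t0\t1\t100\t500\t599\t1e-45\t180
read1\tSalmonella_enterica\t92.0\t100\t8\t0\t1\t100\t700\t799\t1e-35\t150
read2\tBacillus_subtilis\t99.0\t100\t1\t0\t1\t100\t10\t109\t1e-48\t185
"""


def best_hits(tsv_text, max_evalue=1e-5):
    """Return {read_id: (reference, bitscore)}, keeping the highest bitscore per read."""
    best = {}
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        if not row:
            continue
        read_id, reference = row[0], row[1]
        evalue, bitscore = float(row[10]), float(row[11])
        if evalue > max_evalue:
            continue  # discard weak hits
        if read_id not in best or bitscore > best[read_id][1]:
            best[read_id] = (reference, bitscore)
    return best


def classify(read_ids, hits):
    """Label every read; reads without any accepted hit remain unclassified."""
    return {r: hits[r][0] if r in hits else "unclassified" for r in read_ids}


# read3 has no hit, e.g. because its source organism is missing from the database.
assignments = classify(["read1", "read2", "read3"], best_hits(BLAST_TSV))
print(assignments)
```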

Sequence composition-based methods

• Based on characteristics of the nucleotide composition (e.g. GC% or codon usage)
• Find the best-fitting model for each sequence read

Disadvantage: short reads (<1000 bp) are not suited to this method
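
A minimal sketch of two composition features, GC% and a normalised k-mer frequency profile, is given here as an illustration. The function names and the example read are assumptions, and the model-fitting step (e.g. the SVM or k-NN classifiers used by composition-based tools) is not shown.

```python
from collections import Counter
from itertools import product


def gc_percent(seq):
    """GC content in percent, ignoring characters other than A, C, G and T."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt


def kmer_profile(seq, k=4):
    """Relative frequency of every possible k-mer over ACGT, in a fixed order."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    all_kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[km] for km in all_kmers)
    return [counts[km] / total if total else 0.0 for km in all_kmers]


read = "ATGCGCGCTTAA"  # made-up example read
print(gc_percent(read))            # 50.0
profile = kmer_profile(read, k=2)  # 16 dinucleotide frequencies
print(len(profile))                # 16
```

A 100 bp read contains only 97 tetranucleotides spread over 256 possible values, so its profile is sparse and noisy; this is one way to see why short reads suit this approach poorly.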
Hybrid methods

Hybrid methods combine elements of both similarity-based and composition-based methods.

Marker-based methods

Compare each metagenomic read to a curated collection of marker genes to identify high-confidence matches.

Disadvantage: low sensitivity when the reads do not come from genomes represented in the marker gene database.

Functional Analysis

Mapping against marker genes:


• Antimicrobial Resistance (AMR) genes
• Virulence Factors
• Transposons
• Enzymes
• etc.

Commonly used sequence search algorithms

• Variations of BLAST (blastn, blastx, MegaBLAST) – find regions of similarity between biological sequences.
• Hidden Markov models (HMMER) – search sequence profile (model) databases for sequence homologs.
• Bowtie/Bowtie2 – alignment of reads to long reference sequences.

• Burrows-Wheeler Aligner (BWA) – mapping of low-divergence sequences against a large reference genome.
• k-mers – exact-match search against a database of the substrings of length k contained in the reference sequences.
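
The k-mer idea can be sketched as follows. This is a toy illustration, not the algorithm of any particular tool: the reference sequences, the value of k and the simple majority vote are all assumptions (Kraken, for instance, maps each k-mer to the lowest common ancestor of the genomes that contain it instead of voting per genome).

```python
from collections import Counter


def kmers(seq, k):
    """Set of all substrings of length k contained in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_index(references, k):
    """Map each k-mer to the set of reference names that contain it."""
    index = {}
    for name, seq in references.items():
        for km in kmers(seq, k):
            index.setdefault(km, set()).add(name)
    return index


def classify_read(read, index, k):
    """Assign the read to the reference sharing the most exact k-mer matches."""
    votes = Counter()
    for km in kmers(read, k):
        for name in index.get(km, ()):
            votes[name] += 1
    return votes.most_common(1)[0][0] if votes else "unclassified"


# Made-up reference sequences, far shorter than real genomes.
references = {"genomeA": "ACGTACGTGGCCTTAA", "genomeB": "TTTTGGGGCCCCAAAA"}
K = 5
index = build_index(references, K)

print(classify_read("ACGTGGCC", index, K))  # genomeA
print(classify_read("GGGGCCCC", index, K))  # genomeB
print(classify_read("ATATATAT", index, K))  # unclassified
```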
Commonly used tools

Method name   Class of method  Sequence search method        Composition method  Functional classification
MEGAN4        Similarity       BLAST programs                N/A                 KEGG, SEED
MG-RAST       Similarity       BLASTN, BLAT                  N/A                 SEED, NOG, COG, KEGG
CARMA3        Similarity       BLAST programs                N/A                 Pfam, COG, GO, TIGRFAM
Kraken        Similarity       Exact-match k-mers            N/A                 N/A
MGmapper      Similarity       BWA                           N/A                 N/A
MLTreeMap     Marker           BLASTX                        N/A                 4 enzyme families
AMPHORA2      Marker           HMMER3                        N/A                 N/A
MetaPhlAn     Marker           MEGABLAST, Bowtie2            N/A                 N/A
PhymmBL       Hybrid           MEGABLAST                     IMM                 N/A
RITA          Hybrid           Pipeline of BLAST variations  NB                  N/A
PhyloPythiaS  Composition      N/A                           SVM                 N/A
TACOA         Composition      N/A                           k-NN                N/A

Peabody et al.

