
Module 8 Comparative Genomics

Julie Horvath, PhD
Evolutionary Anthropology Department
Primate Genomics Initiative
Institute for Genome Sciences & Policy
Duke University

What is Comparative Genomics?


The analysis and comparison of genomes from different species

A tool for decoding genomic information stored in DNA
-Functional sequences evolve more slowly than nonfunctional sequences
-Sequences that remain conserved throughout evolution may perform a biological function

Comparative Sequence Analysis


Identify conserved regions
-genes
-regulatory sequences

Identify species-specific sequences
-genes that contribute to unique characteristics

Understand structure and function of genes for disease studies
-identify new model organisms

Evolutionary reconstructions of genes and genomes

Apply knowledge to medicine, biotechnology, agriculture, conservation

Comparative Genomics: Genotype to Phenotype

Humans and chimpanzees have different disease susceptibilities:
-A common sialic acid (Neu5Gc) is absent in humans because the enzyme that produces it (CMAH) is inactivated; this may affect cell surface binding of various pathogens
Chou et al. PNAS 2002

Humans have smaller masticatory muscles than other apes:
-Humans have a loss-of-function mutation in the MYH16 myosin gene
-Did this myosin mutation lead to gracilization of the human skull?
Stedman et al. Nature 2004 and McCollum et al. J Hum Evol 2006

**Use caution when linking genes to phenotype!**
See Varki and Altheide Genome Research 2005

Where To Start?
To identify conserved regions, you must:
-Decide which species you would like to compare
-Identify and extract the relevant genome sequences
-Annotate genes and other features found in the genome sequences
-Ensure that repetitive sequences are masked
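
The masking step can be illustrated with a minimal sketch. It assumes the common soft-masking convention in which repeats are written in lowercase (as in many genome browser downloads); the function names and the example sequence are hypothetical and not part of the original slides.

```python
def hard_mask(sequence: str, mask_char: str = "N") -> str:
    """Replace soft-masked (lowercase) bases with mask_char."""
    return "".join(mask_char if base.islower() else base for base in sequence)


def masked_fraction(sequence: str) -> float:
    """Return the fraction of bases that are soft-masked (lowercase)."""
    if not sequence:
        return 0.0
    return sum(base.islower() for base in sequence) / len(sequence)


# Hypothetical sequence: the lowercase run stands in for an annotated repeat.
example = "ACGTacgtacgtTTGA"
print(hard_mask(example))        # ACGTNNNNNNNNTTGA
print(masked_fraction(example))  # 0.5
```

Hard-masking hides repeats from alignment tools entirely, whereas soft-masked input lets a tool decide how to treat them; which form is needed depends on the tool.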

Selection of Species for DNA comparisons

Human vs.    Size (Gbp)  Time since divergence  Sequence conservation (in coding regions)  Background noise  Aids identification of
Chimpanzee   3.0         ~6 MYA                 >99%                                       High              Recently changed sequences and genomic rearrangements
Mouse        2.5         >90 MYA                ~80%                                       Moderate          Both coding and noncoding sequences
Opossum      4.2         ~150 MYA               ~70-75%                                    Low               Both coding and noncoding sequences
Pufferfish   0.4         ~450 MYA               ~65%                                       Lower             Primarily coding sequences

Currently Available Genome Sequences


Ensembl: http://www.ensembl.org/

Draft Assembly ~entire genome represented

Low Coverage ~2X=70% genome coverage

Homology, Orthology, Paralogy

Homologues - Genes derived from a common ancestral gene

Orthologues - Genes in different species that are derived from the same gene in the last common ancestor

Paralogues - Gene families that have diverged within a single species, often by duplication

[Figure: gene tree drawn against time for Gene 1, Gene 2 and Gene 3. A speciation event gives rise to orthologues and a duplication event gives rise to paralogues. Branches are labeled "Original function maintained" or "Novel function", and functional orthologues are marked.]
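
The definitions above can be sketched in code. This is a toy illustration, not a tool from the slides: it assumes a gene tree whose internal nodes are already labeled as speciation or duplication events, and it uses the common convention that any two genes separated by a duplication node are labeled paralogues, even when they sit in different species. The tree and gene names are hypothetical.

```python
from itertools import product

# A leaf is a gene name (str). An internal node is (event, left, right),
# where event is "speciation" or "duplication".


def leaves(node):
    """Return all gene names below a node."""
    if isinstance(node, str):
        return [node]
    _, left, right = node
    return leaves(left) + leaves(right)


def classify_pairs(node, result=None):
    """Label every gene pair by the event at its most recent common node."""
    if result is None:
        result = {}
    if isinstance(node, str):
        return result
    event, left, right = node
    relation = "orthologues" if event == "speciation" else "paralogues"
    # Pairs that split at this node take this node's event type.
    for a, b in product(leaves(left), leaves(right)):
        result[frozenset((a, b))] = relation
    classify_pairs(left, result)
    classify_pairs(right, result)
    return result


# Hypothetical tree: one duplication followed by a speciation in each copy.
toy_tree = (
    "duplication",
    ("speciation", "human_A", "mouse_A"),
    ("speciation", "human_B", "mouse_B"),
)
pairs = classify_pairs(toy_tree)
for pair, relation in sorted((sorted(p), r) for p, r in pairs.items()):
    print(pair, relation)
```

In this toy tree human_A and mouse_A come out as orthologues, while human_A and human_B come out as paralogues.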

Performing Comparative Sequence Analysis


1. Identify orthologous gene/genomic sequences
-Use public databases and genome browsers
-Use homology searches such as BLAST

2. Determine phylogenetic relationships
-Pre-computed phylogenetic trees
-Create your own phylogenetic trees

3. Align genome sequences
-Use public databases and genome browsers

4. Identify conserved regions
-Use ECR browser

1. Identifying Orthologous Genes


Using genome browsers
Orthologue Prediction at Ensembl: http://www.ensembl.org/

-Links to the closest putative orthologous genes in other species
-Hyperlinks to view alignments & positional information

1. Identifying Orthologous Genes


NCBI Homologene
http://www.ncbi.nlm.nih.gov/sites/entrez?db=homologene&cmd

Contains a wealth of information about homologous genes and links to other resources

1. Identifying Orthologous Genes


BLAST searches
http://www.ncbi.nlm.nih.gov/BLAST/

Species specific searches

Nucleotide or protein searches

Trace archives

Be Cautious!
BLAST searches can identify related but non-orthologous genes

[Figure: comparison of human chr1 (BMP8a, BMP8b) and mouse chr4 (Bmp8a, Bmp8b), with mRNA alignments and protein alignments; scale labels ~220 kb and ~170 kb. Adapted from Nardone et al., 2004]
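
One widely used guard against this pitfall is the reciprocal best hit heuristic, which is not named on the slide and is offered here only as an illustrative sketch. It assumes search results have already been reduced to (query, subject, score) tuples; the gene names and scores below are invented placeholders, not real BLAST output.

```python
def best_hits(hits):
    """Map each query to its highest-scoring subject.

    hits: iterable of (query, subject, score) tuples.
    """
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return (a, b) pairs that are each other's best hit."""
    forward = best_hits(a_vs_b)
    backward = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if backward.get(b) == a)


# Invented scores for two paralogues per species (hsA/hsB and mmA/mmB).
species1_vs_species2 = [
    ("hsA", "mmA", 700.0), ("hsA", "mmB", 650.0),
    ("hsB", "mmA", 690.0), ("hsB", "mmB", 640.0),
]
species2_vs_species1 = [
    ("mmA", "hsA", 700.0), ("mmA", "hsB", 690.0),
    ("mmB", "hsA", 650.0), ("mmB", "hsB", 640.0),
]
rbh = reciprocal_best_hits(species1_vs_species2, species2_vs_species1)
print(rbh)  # [('hsA', 'mmA')]
```

Here hsB's best hit is mmA, but mmA's best hit is hsA, so hsB gets no reciprocal partner. A one-directional search would have paired hsB with mmA, which is exactly the kind of related but non-orthologous match the slide warns about.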

Do You Have The Orthologous Gene?

Percent identity (protein and nucleotide)
(e.g. BLAST, ClustalW, sometimes Homologene)

Confirm that no other paralogous genes are present in your species of interest
(BLAST, Self Chain track @ UCSC, Segmental Dup track @ UCSC, Ensembl paralogous genes)

Compare the size of exons in orthologous genes
(Ensembl; EST or cDNA comparisons to genome with Spidey or Sim4)

Positional information - neighboring genes
(Ensembl SyntenyView)

Phylogenetic analysis
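
The first check in the list above, percent identity, can be sketched as follows. This is a minimal illustration that assumes two sequences already aligned to the same length with "-" as the gap character; it skips gapped columns, which is only one of several definitions in use, so values may differ from what a given tool reports. The example sequences are made up.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of ungapped alignment columns where the two residues match."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap or y == gap:
            continue  # gapped columns are excluded from the denominator
        compared += 1
        matches += x == y
    return 100.0 * matches / compared if compared else 0.0


# Made-up alignment: 8 ungapped columns, 7 of them identical.
print(percent_identity("ATGCC-GTA", "ATGCTAGTA"))  # 87.5
```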

2. Phylogenetic Analysis

Clusters homologous genes into a gene genealogy (evolutionary tree)

Can directly view patterns of duplication resulting in orthologues and paralogues

Pre-calculated trees available in Ensembl GeneTreeView

Pre-calculated Phylogenetic Trees


Ensembl Gene Tree View

Species Specific Gene Duplications

Constructing Your Own Phylogenetic Trees

Obtain sequences from all species to compare
-Genome browsers, or PCR amplify and sequence

Align all sequences
-MUSCLE, ClustalW, DIALIGN, Threaded Blockset Aligner (TBA), etc.

Use different methods to reconstruct the evolutionary history of the sequences
-Parsimony (PAUP), Likelihood (GARLI), Bayesian (MrBayes), Distance (Neighbor-joining)
-Other packages: HyPhy, PHYLIP, MEGA, Geneious
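
Distance methods such as neighbor-joining start from a matrix of pairwise distances. The sketch below shows one simple way such a matrix might be computed, using the Jukes-Cantor correction, d = -(3/4) ln(1 - 4p/3), where p is the proportion of differing sites. The choice of Jukes-Cantor is an assumption made for illustration (the slides do not name a substitution model), and the three sequences are invented.

```python
import math


def p_distance(a: str, b: str, gap: str = "-") -> float:
    """Proportion of ungapped aligned sites at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    sites = 0
    diffs = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == gap or y == gap:
            continue
        sites += 1
        diffs += x != y
    if sites == 0:
        raise ValueError("no ungapped sites to compare")
    return diffs / sites


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor corrected distance for an observed proportion p."""
    if p >= 0.75:
        raise ValueError("sequences too divergent for this correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def distance_matrix(alignment: dict) -> dict:
    """Return {(name1, name2): distance} for every pair of sequences."""
    names = sorted(alignment)
    return {
        (m, n): jukes_cantor(p_distance(alignment[m], alignment[n]))
        for i, m in enumerate(names)
        for n in names[i + 1:]
    }


# Invented 10-column alignment, for illustration only.
toy_alignment = {
    "human": "ACGTACGTAC",
    "chimp": "ACGTACGTAT",
    "mouse": "ACGAACTTAT",
}
distances = distance_matrix(toy_alignment)
for pair, d in distances.items():
    print(pair, round(d, 4))
```

The resulting matrix is the kind of input a neighbor-joining implementation in one of the packages listed above would take; the correction matters more as sequences diverge and multiple substitutions at the same site become likely.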

Performing Comparative Sequence Analysis


1. Identify orthologous gene/genomic sequences
-Use public databases and genome browsers
-Use homology searches such as BLAST

2. Determine phylogenetic relationships
-Pre-computed phylogenetic trees
-Create your own phylogenetic trees

3. Align genome sequences
-Use public databases and genome browsers
-Multiple-species (>20) alignments are more informative than pairwise alignments (Margulies et al. TIGS 2006)

4. Identify conserved regions
-Use ECR browser

3. Aligning Genome Sequences


UCSC genome browser
UCSC Conservation Track: http://genome.ucsc.edu/

Pairwise genome sequence alignments combined with additional phylogenetic information

3. Aligning Genomic Regions


Using genome browsers


Identifying Conserved Regions


Functionally conserved units may be conserved at the sequence level
-Evolutionary Conserved Regions (ECRs)

Miller et al., 2004. Annu Rev Genomics Hum Genet
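
The idea of an ECR can be sketched as a sliding-window scan over a pairwise alignment. This is an illustrative simplification, not the algorithm of any specific tool named in this module: the window size and identity threshold are free parameters chosen for the example, gap columns count as mismatches, and the sequences are invented.

```python
def conserved_windows(aligned_a, aligned_b, window=100, min_identity=70.0):
    """Return merged (start, end) column intervals, end exclusive, covered by
    windows whose percent identity is at least min_identity."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    n = len(aligned_a)
    # 1 where the column is an ungapped match, 0 otherwise.
    match = [
        int(x == y and x != "-")
        for x, y in zip(aligned_a.upper(), aligned_b.upper())
    ]
    regions = []
    if n < window:
        return regions
    running = sum(match[:window])
    for start in range(n - window + 1):
        if start > 0:
            # Slide the window one column to the right.
            running += match[start + window - 1] - match[start - 1]
        if 100.0 * running / window >= min_identity:
            end = start + window
            if regions and start <= regions[-1][1]:
                regions[-1] = (regions[-1][0], end)  # extend previous region
            else:
                regions.append((start, end))
    return regions


# Invented alignment: two identical 20-column blocks around a diverged block.
seq_a = "A" * 20 + "C" * 20 + "A" * 20
seq_b = "A" * 20 + "G" * 20 + "A" * 20
ecrs = conserved_windows(seq_a, seq_b, window=10, min_identity=100.0)
print(ecrs)  # [(0, 20), (40, 60)]
```

Lowering `min_identity` lets windows extend into the diverged block, which is why threshold choice strongly affects where region boundaries fall.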


Evolutionary Conserved Regions


Manual
PipMaker - http://bio.cse.psu.edu/cgi-bin/pipmaker
-requires repeat-masked sequence and annotation files
-local alignment, BLASTZ

VISTA - http://www-gsd.lbl.gov/vista
-requires annotation files; masks repeats for you
-global alignment, AVID

Semi-automated
zPicture - http://www.dcode.org
-local alignment, BLASTZ

Automatic
Genome browsers, e.g. UCSC and Ensembl

ECR Browser - http://www.dcode.org
-BLAT, BLAST and BLASTZ
-can link to both zPicture and rVista

4. Identifying Conserved Regions


VISTA and UCSC

4. Identifying Conserved Regions


ECR browser

Summary Of Worked Examples


Identify paralogous, orthologous and phylogenetic relationships
Ensembl: Protein Family, Paralogue Prediction, Orthologue Prediction, MultiSpecies View, Synteny View, Gene Tree View

Identify segmental duplications and paralogous regions


UCSC: Self Chain

Identify and align conserved regions


UCSC: Conservation Track

Identify transcription factor binding sites


ECR browser

On Your Own: Galaxy


http://main.g2.bx.psu.edu/

Do the tutorials (Quickies) on your own to learn how to use Galaxy!
