
Genome Sequencing of Arabidopsis thaliana

Commonly known as thale cress, mouse-ear cress or arabidopsis.
A small flowering plant native to Eurasia.
First described in 1577 in the Harz Mountains by Johannes Thal.
750 natural varieties of A. thaliana are found around the world.

Why A. thaliana ?
1. It has a very small genome with little repetitive DNA.
2. It is well suited to growth in a laboratory setting.
3. It has a very short generation time (6-8 weeks) compared to
other plant species.
4. It produces a large number of seeds.
5. It is amenable to most known tissue culture

6. It can self-fertilize.
7. There are a variety of landraces with many different
morphological and physiological characteristics.
8. It is a member of an agronomically important group of plants,
the brassica or mustard family.

Initiatives
Biological, Behavioral, and Social Sciences (BBS) Directorate
of NSF (1989).
'A Long-Range Plan for the Multinational Coordinated A.
thaliana Genome Research Project' by NSF (1990).
Arabidopsis Genome Initiative (AGI) (1996).
Arabidopsis genome publication by AGI (2000).

Arabidopsis genome sequencing effort (2000)

Copyright: Dimitris Beis / Ben Scheres, Utrecht University

Five chromosomes of A. thaliana. The centromeres (CEN) are shown along with
the two nucleolar organizing regions (NORs) and the 5S rDNA regions.
Source: Haas et al. (2005)

Representation of the Arabidopsis chromosomes

Each chromosome is represented as a coloured bar. Sequenced portions are red,
telomeric and centromeric regions are light blue, heterochromatic knobs are shown
in black and the rDNA repeat regions are magenta. Telomeres are not drawn to scale.
Mitochondrial and chloroplast insertions ('MT/CP') were assigned black and green
tick marks, respectively. Transfer RNAs and small nucleolar RNAs ('RNAs') were
assigned black and red tick marks, respectively.

By the end of 2000, 115,409,949 bp of the Arabidopsis genome
had been sequenced.
Gene density in Arabidopsis is one gene per 4.1-4.6 kb, about twice that
observed in Drosophila (one gene per 9 kb) but similar to that
found for C. elegans (one gene per 5 kb).
Genome annotation also revealed 589 cytoplasmic tRNAs and 27
organelle-derived tRNAs in Arabidopsis.
The large gene set in Arabidopsis is due to the much greater
number of gene duplications and segmental duplications than in
either Drosophila or C. elegans. In fact, 58-60 % of the
Arabidopsis genome occurs in duplicated segments that are
responsible for 6303 highly conserved duplications.
The fact that so much of the genome is represented in duplicated
segments lends credence to the hypothesis that Arabidopsis had
a tetraploid ancestor.
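
The gene density quoted above can be checked with simple arithmetic. The sketch below is illustrative only: it divides the sequenced length reported for 2000 by the 25,498 genes predicted that year (the figure given elsewhere in this deck) and compares the result with the Drosophila and C. elegans values quoted above. The function name `kb_per_gene` is hypothetical, and the calculation ignores unsequenced regions.

```python
# Figures quoted in the slides; the helper name is hypothetical.
SEQUENCED_BP = 115_409_949   # bp sequenced by the end of 2000
PREDICTED_GENES = 25_498     # genes predicted in the 2000 annotation


def kb_per_gene(sequence_bp: int, gene_count: int) -> float:
    """Return the average amount of sequence per gene, in kilobases."""
    return sequence_bp / gene_count / 1_000


arabidopsis = kb_per_gene(SEQUENCED_BP, PREDICTED_GENES)
drosophila = 9.0   # kb per gene, as quoted in the slides
c_elegans = 5.0    # kb per gene, as quoted in the slides

print(f"Arabidopsis: one gene per {arabidopsis:.2f} kb")
print(f"Drosophila has {drosophila / arabidopsis:.1f}x more sequence per gene")
print(f"C. elegans has {c_elegans / arabidopsis:.1f}x more sequence per gene")
```

The result should fall inside the 4.1-4.6 kb range stated above, which is why Arabidopsis is described as roughly twice as gene-dense as Drosophila.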

Segmentally duplicated regions in the Arabidopsis genome. Individual
chromosomes are depicted as horizontal grey bars (with chromosome
1 at the top), centromeres are marked in black. Coloured bands connect
corresponding duplicated segments.

Among the 25,498 predicted genes in 2000, 11,601
gene families were identified. Approximately 150 of
these gene families are unique to plants. These
unique gene families encode enzymes, transcription
factors (TFs) and unknown proteins.

Arabidopsis gene set post-2000 and its comparison to those of other biota
Centromeres are longer than originally estimated. With these longer
estimated centromere lengths, the genome would be 146 Mb.
ESTs, full-length cDNAs and, more recently, RNA-seq data have
been and continue to be instrumental in annotating the
Arabidopsis genome.
As of 2013, there were almost 2 million Arabidopsis EST sequences
in NCBI.
The first EST collections were small, but the collection grew quickly.
The final TAIR genome annotation release (TAIR 10) contains
27,202 nuclear protein-coding genes, 4827 pseudogenes and
transposable element genes and 1359 ncRNAs (689 tRNAs, 15
rRNAs, 90 snRNAs, 177 miRNAs and 394 other RNAs).

Chromosome statistics from TAIR 10

General features of genes encoded by the three genomes in Arabidopsis

Gene density is 4.35 kb/gene with an average of 5.89
exons/gene, average exon length of 296 nt and
average intron length of 165 nt.
Arabidopsis appears to contain significantly fewer
nuclear protein-coding genes than any other
sequenced plant species, except for S. bicolor and C.
papaya.
In TAIR 10, the number of genes identified with splice
variants increased to 5885 (18 %).
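
As a rough illustration of what the averages above imply, the sketch below estimates how much of each 4.35 kb interval is occupied by a typical gene. It is a back-of-the-envelope calculation, not a TAIR statistic: it assumes a gene with n exons has n - 1 introns and that the averages can simply be multiplied, which is only an approximation. The function name `estimate_gene_span` is hypothetical.

```python
# Averages quoted in the slides for TAIR 10; the function name is hypothetical.
EXONS_PER_GENE = 5.89
EXON_LENGTH_NT = 296
INTRON_LENGTH_NT = 165
KB_PER_GENE = 4.35


def estimate_gene_span(exons: float, exon_nt: float, intron_nt: float) -> dict:
    """Rough per-gene lengths, assuming a gene with n exons has n - 1 introns."""
    exonic = exons * exon_nt
    intronic = (exons - 1) * intron_nt
    return {
        "exonic_nt": exonic,
        "intronic_nt": intronic,
        "span_nt": exonic + intronic,
    }


est = estimate_gene_span(EXONS_PER_GENE, EXON_LENGTH_NT, INTRON_LENGTH_NT)
# Share of the average per-gene interval covered by exons plus introns.
genic_fraction = est["span_nt"] / (KB_PER_GENE * 1_000)

print(f"Estimated exonic length:   {est['exonic_nt']:.0f} nt")
print(f"Estimated intronic length: {est['intronic_nt']:.0f} nt")
print(f"Estimated gene span:       {est['span_nt']:.0f} nt")
print(f"Approximate genic share of each interval: {genic_fraction:.0%}")
```

Because exon count and exon length are not independent in real genes, the output should be read as an order-of-magnitude check rather than a measured value.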

Table shows the number of genes in selected plant genomes.

Functional annotation

As in flies and worms, a large number of
mutants have been identified and mapped in
Arabidopsis.
Of these mutant phenotypes, 30 % of the underlying
genes are essential for early development and
survival, 36 % are responsible for morphology, 12 %
are responsible for cellular or biochemical pathways
and 22 % were classified as conditional.
Screening genes in protein families has shown that
most genes do not show an alteration in phenotype
when individually disrupted.

Table shows the type and number of predicted genes in selected functional
categories in Arabidopsis.

Table continued from previous slide

Pie chart showing the proportion of predicted
Arabidopsis genes in different functional
categories.

Comparison of functional categories between organisms. Subsets of the Arabidopsis
proteome containing all proteins that fall into a common functional class were assembled.
This reflects the measure of sequence conservation of proteins within this particular
functional category between Arabidopsis and the respective reference genome. y axis, 0.1
= 10%.

Conclusion
The sequence of the Arabidopsis genome has accelerated our
understanding of specific genes as well as gene families more than
we could have predicted when NSF proposed funding the
sequencing project in 1989.
Arabidopsis has served as one of the most important model plant
species and has been, and continues to be, used to lead the way
in many areas of plant biology.
Now that the genome is completed, it is clear that we still have a lot
to learn, for example about novel classes of regulation (epigenetics
and ncRNAs), the role of alternative splicing and others yet to be
discovered, and about how this massive body of data can be used to
improve crops through either breeding or a transgenic approach.

THANK YOU
