New Microsoft Office Power Point Presentation
• ASN.1
• EMBL, Swiss Prot
• FASTA
• GCG
• GenBank/GenPept
• PHYLIP
• Plain sequence format
ASN.1: Abstract Syntax Notation One
Used by NCBI to store and exchange sequence records.
Seq-entry ::= set {
  class phy-set ,
  descr {
    pub {
      pub {
        article {
          title {
            name "Cross-species infection of blood parasites between resident
and migratory songbirds in Africa" } ,
          authors {
            names
              std {
                {
                  name
                    name {
                      last "Waldenstroem" ,
                      first "Jonas" ,
                      initials "J." } } ,
                {
                  name
                    name {
                      last "Bensch" ,
                      first "Staffan" ,
                      initials "S." } } ,
                {
                  name
                    name {
                      last "Kiboi" ,
                      first "Sam" ,
                      initials "S." } } ,
                {
                  name
                    name {
                      last "Hasselquist" ,
                      first "Dennis" ,
                      initials "D." } } ,
                {
                  name
                    name {
                      last "Ottosson" ,
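Because an ASN.1 value is a nest of named, brace-delimited fields, even a few lines of code can pull data out of it or check whether a record is complete. The sketch below assumes the text layout of the record above; `author_names` and `unclosed_braces` are hypothetical helpers, and the regular expression is only a quick stand-in for a proper ASN.1 parser (NCBI's own toolkits provide one).

```python
import re

# Abbreviated copy of the start of the record above (first author only).
EXCERPT = '''Seq-entry ::= set {
  class phy-set ,
  descr {
    pub {
      pub {
        article {
          title {
            name "Cross-species infection of blood parasites between resident
and migratory songbirds in Africa" } ,
          authors {
            names
              std {
                {
                  name
                    name {
                      last "Waldenstroem" ,
                      first "Jonas" ,
                      initials "J." } } ,
'''

# Assumes the fields appear in the order last, first, initials, as above.
NAME_RE = re.compile(
    r'last\s+"([^"]*)"\s*,\s*first\s+"([^"]*)"\s*,\s*initials\s+"([^"]*)"'
)


def author_names(asn1_text):
    """Return (last, first, initials) tuples found in ASN.1 value text."""
    return NAME_RE.findall(asn1_text)


def unclosed_braces(asn1_text):
    """Count '{' not yet matched by '}', skipping braces inside quoted strings."""
    depth = 0
    in_string = False
    for ch in asn1_text:
        if ch == '"':
            in_string = not in_string  # a doubled "" toggles twice, so it is harmless
        elif not in_string:
            if ch == "{":
                depth += 1
            elif ch == "}":
                depth -= 1
    return depth


print(author_names(EXCERPT))     # [('Waldenstroem', 'Jonas', 'J.')]
print(unclosed_braces(EXCERPT))  # 7
```

A non-zero result from `unclosed_braces`, as here, shows that the text was cut off before its closing braces.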
EMBL/Swiss Prot
(http://www.ebi.ac.uk/help/formats_frame.html)
• The first line of each sequence entry is the ID (identification) line, which contains the entry name, data class, molecule type, taxonomic division and sequence length.
• XX lines contain no data; they are spacers that separate blocks of other line types.
• The AC line lists the accession number.
• The DE line gives a description of the sequence.
• FT (feature table) lines give precise annotation of the sequence.
• The sequence header line has SQ in the first two columns and states the sequence length and base composition.
• The sequence data itself begins on the line after the SQ line.
• The last line of each sequence entry is a terminator line with the two characters // in the first two columns.
ID   AA03518    standard; DNA; FUN; 237 BP.
XX
AC   U03518;
XX
DE   Aspergillus awamori internal transcribed spacer 1 (ITS1) and 18S
DE   rRNA and 5.8S rRNA genes, partial sequence.
RX   MEDLINE; 94303342.
RX   PUBMED; 8030378.
XX
FT   rRNA            <1..20
FT                   /product="18S ribosomal RNA"
FT   misc_RNA        21..205
FT                   /standard_name="Internal transcribed spacer 1 (ITS1)"
FT   rRNA            206..>237
FT                   /product="5.8S ribosomal RNA"
SQ   Sequence 237 BP; 41 A; 77 C; 67 G; 52 T; 0 other;
     aacctgcgga aggatcatta ccgagtgcgg gtcctttggg cccaacctcc catccgtgtc        60
     tattgtaccc tgttgcttcg gcgggcccgc cgcttgtcgg ccgccggggg ggcgcctctg       120
     ccccccgggc ccgtgcccgc cggagacccc aacacgaaca ctgtctgaaa gcgtgcagtc       180
     tgagttgatt gaatgcaatc agttaaaact ttcaacaatg gatctcttgg ttccggc          237
//
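The line-code layout described above can be read mechanically. The sketch below is a minimal Python illustration, not a full EMBL parser: `parse_embl` and `check_sq_header` are hypothetical helpers, and they assume the fixed layout shown in the example (a two-letter line code, data starting in column 6, sequence lines after SQ, a `//` terminator). As a consistency check, it compares the SQ line's declared length and base counts with the parsed sequence. For real data, a library reader such as Biopython's `Bio.SeqIO` (format name "embl") is the safer choice.

```python
import re
from collections import Counter

# The example entry above, in fixed columns: 2-letter code, data from column 6.
ENTRY = """\
ID   AA03518    standard; DNA; FUN; 237 BP.
XX
AC   U03518;
XX
DE   Aspergillus awamori internal transcribed spacer 1 (ITS1) and 18S
DE   rRNA and 5.8S rRNA genes, partial sequence.
RX   MEDLINE; 94303342.
RX   PUBMED; 8030378.
XX
FT   rRNA            <1..20
FT                   /product="18S ribosomal RNA"
FT   misc_RNA        21..205
FT                   /standard_name="Internal transcribed spacer 1 (ITS1)"
FT   rRNA            206..>237
FT                   /product="5.8S ribosomal RNA"
SQ   Sequence 237 BP; 41 A; 77 C; 67 G; 52 T; 0 other;
     aacctgcgga aggatcatta ccgagtgcgg gtcctttggg cccaacctcc catccgtgtc        60
     tattgtaccc tgttgcttcg gcgggcccgc cgcttgtcgg ccgccggggg ggcgcctctg       120
     ccccccgggc ccgtgcccgc cggagacccc aacacgaaca ctgtctgaaa gcgtgcagtc       180
     tgagttgatt gaatgcaatc agttaaaact ttcaacaatg gatctcttgg ttccggc          237
//
"""


def parse_embl(text):
    """Group one EMBL-style entry by line code; return (fields, sequence)."""
    fields = {}
    seq_parts = []
    in_sequence = False
    for line in text.splitlines():
        if line.startswith("//"):
            break  # terminator line ends the entry
        if in_sequence:
            # Keep residue letters; drop spacing and the position numbers.
            seq_parts.append("".join(ch for ch in line if ch.isalpha()))
            continue
        code, content = line[:2], line[5:].rstrip()
        if code == "XX":
            continue  # spacer line, carries no data
        fields.setdefault(code, []).append(content)
        if code == "SQ":
            in_sequence = True  # sequence data starts on the next line
    return fields, "".join(seq_parts)


def check_sq_header(fields, sequence):
    """Check that the SQ line's length and A/C/G/T counts match the sequence."""
    header = fields["SQ"][0]
    declared_length = int(re.search(r"Sequence (\d+) BP", header).group(1))
    declared_counts = {base: int(n) for n, base in re.findall(r"(\d+) ([ACGT]);", header)}
    counts = Counter(sequence.upper())
    return declared_length == len(sequence) and all(
        counts[base] == n for base, n in declared_counts.items()
    )


fields, sequence = parse_embl(ENTRY)
print(fields["AC"], len(sequence))        # ['U03518;'] 237
print(check_sq_header(fields, sequence))  # True
```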
PHYLIP
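• The first line of the input file contains the number of sequences and their length (all sequences must have the same length), separated by blanks.
• Each following record begins with the sequence name (in strict PHYLIP, a 10-character field padded with blanks), followed by the sequence itself, often written in blocks of 10 characters and possibly continuing onto further lines. The remaining sequences follow in the same way.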
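A minimal sketch of the layout just described, assuming strict sequential PHYLIP (names in a fixed 10-character field, one sequence after another). `write_phylip` and `read_phylip` are hypothetical helper names and the two sequences are made-up data; interleaved PHYLIP and the relaxed variants accepted by some programs are not handled.

```python
def write_phylip(records):
    """Format (name, sequence) pairs as strict sequential PHYLIP text."""
    length = len(records[0][1])
    if any(len(seq) != length for _, seq in records):
        raise ValueError("all sequences must have the same length")
    lines = [f"{len(records)} {length}"]
    for name, seq in records:
        # Name padded/truncated to 10 characters, sequence in blocks of 10.
        blocks = " ".join(seq[i:i + 10] for i in range(0, length, 10))
        lines.append(f"{name[:10]:<10}{blocks}")
    return "\n".join(lines) + "\n"


def read_phylip(text):
    """Parse strict sequential PHYLIP; a sequence may wrap onto later lines."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    n_seqs, length = map(int, lines[0].split())
    records = []
    for line in lines[1:]:
        if records and len(records[-1][1]) < length:
            # Current sequence is still short: this line continues it.
            records[-1][1] += "".join(line.split())
        else:
            # New record: the first 10 columns hold the name.
            records.append([line[:10].strip(), "".join(line[10:].split())])
    if len(records) != n_seqs:
        raise ValueError(f"expected {n_seqs} sequences, found {len(records)}")
    return [(name, seq) for name, seq in records]


# Hypothetical two-sequence alignment, 14 positions long.
records = [("alpha", "ACGTACGTACGTAC"), ("beta", "ACGTTCGTACGTAC")]
text = write_phylip(records)
print(text, end="")
# 2 14
# alpha     ACGTACGTAC GTAC
# beta      ACGTTCGTAC GTAC
print(read_phylip(text) == records)  # True
```

Padding names to a fixed width is what lets the reader separate name from sequence without a delimiter, which is also why strict PHYLIP allows blanks inside names.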
Plain sequence format
ACAAGATGCCATTGTCCCCCGGCCTCCTGCTGCTGCTGCTCTCCGGGGCCACGGCCACCG
CTGCCCTGCCCCTGGAGGGTGGCCCCACCGGCCGAGACAGCGAGCATATGCAGGAAGC
GGCAGGAATAAGGAAAAGCAGCCTCCTGACTTTCCTCGCTTGGTGGTTTGAGTGGACCT
CCCAGGCCAGTGCCGGGCCCCTCATAGGAGAGGAAGCTCGGGAGGTGGCCAGGCGGC
AGGAAGGCGCACCCCCCCAGCAATCCGCGCGCCGGGACAGAATGCCCTGCAGGAACT
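Plain format carries only the residues, with no name or annotation, so converting it to another format means supplying that information yourself. A minimal sketch converting the first two lines of the example above to FASTA; `plain_to_fasta` and the header `example_plain` are hypothetical, and the 60-character line width is an assumed convention rather than a requirement.

```python
PLAIN = """\
ACAAGATGCCATTGTCCCCCGGCCTCCTGCTGCTGCTGCTCTCCGGGGCCACGGCCACCG
CTGCCCTGCCCCTGGAGGGTGGCCCCACCGGCCGAGACAGCGAGCATATGCAGGAAGC
"""


def plain_to_fasta(plain_text, header, width=60):
    """Wrap a plain (header-less) sequence as a single FASTA record."""
    # Plain format stores only residues, so all whitespace and line breaks are dropped.
    sequence = "".join(plain_text.split()).upper()
    # The FASTA header is not in the plain file; the caller must supply it.
    lines = [">" + header]
    lines += [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join(lines) + "\n"


print(plain_to_fasta(PLAIN, "example_plain"), end="")
```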
GenBank format