8e5bbfunda Seq Anals
Sequence Analysis
Fourie Joubert
FASTA Example
> mysequence
ACGTCGATCGATCGATGCATCGTGCTAGCTACAGTCGATGCAT
CAGTCGATGCTAGCATGCTAGCTGCATCGATCGATGCTACGTA
CAGTCGATCGATGCAT
> mysequence2
ACCGTACGATGCTAGCTAGCTAGCTACAGTCAGTCGATGCTACG
CAGTCGTAGCATGCTAACGTCGATCGTA
> mysequence3
CAGTCAGTCGTAGCTAGCTAGCTAGCTAGGGGTATCGATGCTAA
CAGTACTTTGCATGCAGCATGCTAGCTAGCTAGCTA
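The FASTA layout above (a ">" header line followed by one or more sequence lines) can be read with a few lines of code. The following is a minimal sketch, not a full parser: the function name `parse_fasta` is hypothetical, and it assumes headers are unique and that everything after ">" is the sequence name.

```python
def parse_fasta(text):
    """Return a dict mapping each header (without '>') to its joined sequence."""
    records = {}
    name = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A header line starts a new record
            name = line[1:].strip()
            records[name] = []
        elif name is not None:
            # Sequence lines are appended to the current record
            records[name].append(line)
    return {key: "".join(parts) for key, parts in records.items()}


example = """> mysequence
ACGTCGATCGATCGATGCATCGTGCTAGCTACAGTCGATGCAT
CAGTCGATCGATGCAT
> mysequence2
ACCGTACGATGCTAGCTAGCTAGCTACAGTCAGTCGATGCTACG
CAGTCGTAGCATGCTAACGTCGATCGTA
"""

records = parse_fasta(example)
for name, seq in records.items():
    print(name, len(seq))
```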
File Header
The first line in the file must have "GENETIC SEQUENCE DATA BANK" in spaces
20 through 46.
The next 8 lines may contain arbitrary text. They are ignored but are required to
maintain the GenBank format.
Sequence Data Entries
Each sequence entry in the file should have the following format:
1st line: Must have LOCUS in the first 5 spaces. The genetic locus name or
identifier must be in spaces 13 - 22. The length of the sequences is right
justified in spaces 23 through 29.
2nd line: Must have DEFINITION in the first 10 spaces. Spaces 13 - 80 are free
form text to identify the sequence.
3rd line: Must have ACCESSION in the first 9 spaces. Spaces 13 - 18 must hold
the primary accession number.
4th line: Must have ORIGIN in the first 6 spaces. Nothing else is required on this
line; it indicates that the nucleic acid sequence begins on the next line.
5th line: Begins the nucleotide sequence. The first 9 spaces of each sequence
line may either be blank or may contain the position in the sequence of the first
nucleotide on the line. The next 66 spaces hold the nucleotide sequence in six
blocks of ten nucleotides. Each of the six blocks begins with a blank space
followed by ten nucleotides. Thus the first nucleotide is in space eleven of the
line while the last is in space 75.
Last line: Must have // in the first 2 spaces to indicate termination of the
sequence.
NOTE: Multiple sequences may appear in each file. To begin another sequence,
go back to the 1st line (LOCUS) and start again.
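The sequence-line layout described above (a 9-space position field, then six blocks of ten nucleotides, each preceded by a blank) can be sketched as follows. The function name `format_origin_lines` is hypothetical, and the sketch only covers the sequence lines, not the LOCUS, DEFINITION or ACCESSION lines.

```python
def format_origin_lines(seq, per_block=10, blocks_per_line=6):
    """Format a nucleotide string as GenBank-style sequence lines."""
    lines = []
    per_line = per_block * blocks_per_line
    for start in range(0, len(seq), per_line):
        chunk = seq[start:start + per_line]
        blocks = [chunk[i:i + per_block] for i in range(0, len(chunk), per_block)]
        # Spaces 1-9: position of the first nucleotide on the line, right-justified.
        # Each block is then written as one blank followed by its nucleotides.
        lines.append(f"{start + 1:>9}" + "".join(" " + block for block in blocks))
    return lines


seq = "acgt" * 20  # 80 nucleotides: one full line and one partial line
lines = format_origin_lines(seq)
print("ORIGIN")
print("\n".join(lines))
print("//")
```

With this layout the first nucleotide of a full line lands in space 11 and the last in space 75, as stated in the description.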
Genbank Example
LOCUS
DEFINITION
ACCESSION
VERSION
KEYWORDS
SOURCE
ORGANISM
     CDS             181..924
                     /gene="Tpi"
                     /EC_number="5.3.1.1"
                     /note="Nucleotide sequence of the Celera sequence differs from the published
                     sequence for this transcript."
                     /codon_start=1
                     /db_xref="FLYBASE:FBgn0003738"
                     /db_xref="LocusID:43582"
                     /product="Triose phosphate isomerase"
                     /protein_id="NP_524585.1"
                     /db_xref="GI:17864112"
                     /translation="MSRKFCVGGNWKMNGDQKSIAEIAKTLSSAALDPNTEVVIGCPA
                     IYLMYARNLLPCELGLAGQNAYKVAKGAFTGEISPAMLKDIGADWVILGHSERRAIFG
                     ESDALIAEKAEHALAEGLKVIACIGETLEEREAGKTNEVVARQMCAYAQKIKDWKNVV
                     VAYEPVWAIGTGQTATPDQAQEVHAFLRQWLSDNISKEVSASLRIQYGGSVTAANAKE
                     LAKKPDIDGFLVGGASLKPEFVDIINARQ"
     misc_feature    187..921
                     /note="TIM; Region: Triosephosphate isomerase"
BASE COUNT 279 a 368 c 323 g 220 t
ORIGIN
        1 ttaatctcga atctgggaaa aatctgagtg gaaaagtcga cggcgagcct ccagtcatcg
       61 agttacccac ttgaaattat cagttccaaa cactctaata gcagtcccct tgttttgtcc
      121 cccgatccgc agttctacgc caatttcagc accgattgca ccgacagcaa cagcaacaac
      181 atgagccgaa agttctgcgt gggaggcaac tggaagatga acggcgacca gaagtccatc
      241 gccgagatcg ccaagaccct gagctcggcc gccctcgacc ccaacacgga ggtggtcatc
      301 ggctgcccgg ccatctacct gatgtacgcc cgcaacctgc tgccctgcga gctgggtctg
      361 gccggccaga atgcctacaa ggtggccaag ggcgcattca ccggcgagat ctcccctgcg
      421 atgctgaagg
//
Unlike the GenBank file format the EMBL file format does not require a series
of header lines. Thus the first line in the file begins the first sequence entry
of the file.
The first line of each sequence entry contains the two letters ID in the first
two spaces. This is followed by the EMBL identifier in spaces 6 through 14.
The second line of each sequence entry has the two letters AC in the first two
spaces. This is followed by the accession number in spaces 6 through 11.
The third line of each sequence entry has the two letters DE in the first two
spaces. This is followed by a free form text definition in spaces 6 through 72.
The fourth line in each sequence entry has the two letters SQ in the first two
spaces. This is followed by the length of the sequence beginning at or after
space 13. After the sequence length there is a blank space and the two
letters BP.
The nucleotide sequence begins on the fifth line of the sequence entry. Each
line of sequence begins with four blank spaces. The next 66 spaces hold the
nucleotide sequence in six blocks of ten nucleotides. Each of the six blocks
begins with a blank space followed by ten nucleotides. Thus the first
nucleotide is in space 6 of the line while the last is in space 70.
The last line of each sequence entry in the file is a terminator line which has
the two characters // in the first two spaces.
Multiple sequences may appear in each file. To begin another sequence, go
back to the first line (ID) and start again.
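The simplified EMBL layout described above (ID, AC, DE and SQ lines, indented sequence lines, and a // terminator) can be read with a small state machine. This is a sketch under stated assumptions: the function name `parse_simple_embl` and the sample entry are hypothetical, only the line types listed above are interpreted, and other line codes (such as XX) are skipped.

```python
def parse_simple_embl(text):
    """Parse entries that follow the simplified EMBL layout described above."""
    entries = []
    current = None
    for line in text.splitlines():
        code = line[:2]
        if code == "ID":
            # The identifier starts in space 6
            current = {"id": line[5:].strip(), "accession": "",
                       "definition": "", "length": None, "sequence": ""}
        elif current is None:
            continue  # ignore anything outside an entry
        elif code == "AC":
            current["accession"] = line[5:].strip()
        elif code == "DE":
            current["definition"] = line[5:].strip()
        elif code == "SQ":
            # The length is the first token after the line code, followed by "BP"
            current["length"] = int(line[2:].split()[0])
        elif code == "//":
            entries.append(current)
            current = None
        elif line.startswith("    "):
            # Sequence line: keep letters only, dropping blanks and any numbering
            current["sequence"] += "".join(ch for ch in line if ch.isalpha())
    return entries


sample = (
    "ID   DEMO1\n"
    "AC   X00001\n"
    "DE   hypothetical demo entry\n"
    "SQ          20 BP\n"
    "     acgtacgtac gtacgtacgt\n"
    "//\n"
)
entries = parse_simple_embl(sample)
print(entries[0]["id"], entries[0]["length"], entries[0]["sequence"])
```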
EMBL Example
ID   DMTPIG
XX
AC   X57576; S70377;
XX
SV   X57576.1
XX
DT   20-JAN-1992 (Rel. 30, Created)
DT   19-AUG-1996 (Rel. 49, Last updated, Version 10)
XX
DE   D.melanogaster Tpi gene for Triosephosphate isomerase
XX
KW   glycolytic enzyme; tpi gene; triosephosphate isomerase.
XX
OS   Drosophila melanogaster (fruit fly)
OC   Eukaryota; Metazoa; Arthropoda; Tracheata; Hexapoda; Insecta; Pterygota;
OC   Neoptera; Endopterygota; Diptera; Brachycera; Muscomorpha; Ephydroidea;
OC   Drosophilidae; Drosophila.
XX
RN   [1]
RP   1-3419
RA   Sullivan D.T.;
RT   ;
RL   Submitted (07-FEB-1991) to the EMBL/GenBank/DDBJ databases.
RL   D.T. Sullivan, Biological Research Laboratories, 130 College Pl, Syracuse
RL   University, Syracuse, NY 13244, USA
XX
RN   [3]
RX   MEDLINE; 92079900.
RA   Shaw-Lee R.L., Lissemore J.L., Sullivan D.T.;
RT   "Structure and expression of the triose phosphate isomerase (Tpi) gene of
RT   Drosophila melanogaster.";
RL   Mol. Gen. Genet. 230:225-229(1991).
XX
DR   FLYBASE; FBgn0003738; Tpi.
DR   SWISS-PROT; P29613; TPIS_DROME.
XX
FH   Key             Location/Qualifiers
FH
FT   source          1..3419
FT                   /db_xref="taxon:7227"
FT                   /germline
FT                   /organism="Drosophila melanogaster"
FT                   /strain="Oregon-R"
FT                   /clone_lib="EMBL-4"
FT   CDS             join(2237..2773,2830..3036)
FT                   /db_xref="FLYBASE:FBgn0003738"
FT                   /db_xref="SWISS-PROT:P29613"
FT                   /gene="Tpi"
FT                   /EC_number="5.3.1.1"
FT                   /product="triosephosphate isomerase"
FT                   /protein_id="CAA40804.1"
FT                   /translation="MSRKFCVGGNWKMNGDQKSIAEIAKTLSSAALDPNTEVVIGCPAI
FT                   YLMYARNLLPCELGLAGQNAYKVAKGAFTGEISPAMLKDIGADWVILGHSERRAIFGES
FT                   DALIAEKAEHALAEGLKVIACIGETLEEREAGKTNEVVARQMCAYAQKIKDWKNVVVAY
FT                   EPVWAIGTGKTATPDQAQEVHASLRQWLSDNISKEVSASLRIQYGGSVTAANAKELAKK
FT                   PDIDGFLVGGASLKPEFLDIINARQ"
FT   mRNA            join(2004..2028,2186..2773,2830..3036)
FT                   /gene="Tpi"
FT   prim_transcript 2004..3296
FT   exon            2008..2032
FT                   /number=1
FT   exon            2189..2773
FT                   /number=2
FT   exon            2830..3296
FT                   /number=3
FT   intron          2033..2188
FT                   /number=1
FT   intron          2774..2829
FT                   /number=2
FT   misc_feature    2147..2151
FT                   /note="intron 1 lariat sequence"
FT   misc_feature    2789..2793
FT                   /note="intron 2 lariat sequence"
FT   polyA_signal    3258..3262
XX
SQ
     ... gctgggtgaa        60
     ... cgaaatagca       120
     ... tccgcctcct       180
     ... gatctggagt       240
     ... ggcatagatc       300
     ... tcggatgaca       360
     ... tccagggaga       420
     ... ctgagggcca       480
     ... ccacggcgag       540
     ... gtcccggctc       600
     ... cgctgaagaa       660
     ... acgagcccag       720
     ... agccagccaa       780
//
Interleaved
   18   206
a121      MNTTNCFIAL VHAIREIRAF FLSRATG-KM EFTLYNGERK TFYSRPNNHD
a241      MNTTDCFIAL VTAIREIRAF FLPRATG-RM EFTLHNGERK VFYSRPNNHD
c-s8c1    MNTTDCFIAV VNAIKEVRAL FLPRTAG-KM EFTLHDGEKK VFYSRPNNHD
c1nov     MNTTDCFIAV VNAIREIRAL FLPRTTG-KM EFTLHDGEKK VFYSRPNNHD
o1brazl   MNTTDCFIAL VQAIREIKAL FLPRTTG-KM ELTLYNGEKK TFYSRPNNHD
o1campos  MNTTDCFIAL VQAIREIKAL FLPRTTG-KM ELTLYNGEKK TFYSRPNNHD
o1kauf    MNTTDCFIAL VQAIREIKAL FLSRTTG-KM ELTLYNGEKK TFYSRPNNHD
ken1-76   MNTTDCFIAL LRAFREIKTL FLSRVRG-KM EFTLYNGEKK TFYSRPNNHD
ken34-84  MNTTDCFIAL VRAIREFKIL FSLRPLARKM EFTLYNGIKK TFYSRPNKHD
ken       MNTTDCFIAL VQAIREIKLL FKG--IR-KM KLTLYNGEKK TFYSRPNSHD
uga97-1   MNTTDCFIAL VQAIREIKSL FRS--SR-KM EFTLYNGEKK TFYSRPNNHD
bec1-65   MKTTDCFNVL FEIFHRFGQT FKA--DR-KM EFTLYNGEKK TFYSRPNTHG
zim88-3   MKTTDCFDVL LEIFHRFRQT FKT--DR-KM EFTLYNGEKK TFYSRPNTHG
knp10-90  MKTTDCFNVL LETFHRFRNV FKT--DR-KM EFTLYNGDKK TFYSRPNTHG
zim96-3   MKTTGCFDVL IEIAHRLRQL NKT--DR-KM EFTLYNGEKK TFYSRPNTHG
zim7-83   MKTTDCFNVL LEIIYRFRHT FKT--DR-KM EFTLYNGEKK TFYSRPNKHG
knp196-9  MKTTDCFSVL FEIFHRLRHT LKT--ER-KM EFTLYNGERK TFYSRPNKHG
zam4-96   MKTTDCFDAL LEAFHRLRQT FKT--DR-KM EFTLYNGEKK TFYSRPNRHG

          NCWLNTILQL FRYVDEPFFD WVYNSPENLT LAAIKQLEEL TGLELHEGGP
          NCWLNTILQL FRYVGEPFFD WVYDSPENLT LEAIEQLEEL TGLELHEGGP
          NCWLNTILQL FRYVDEPFFD WVYNSPENLT LEAIKQLEEL TGLELREGGP
          NCWLNTILQL FRYVDEPFFD WVYNSPENLT LEAIKQLEEL TGLELREGGP
          NCWLNAILQL FRYVEEPFFD WVYSTPENLT LEAIKQLEDL TGLELHEGGP
          NCWLNAILQL FRYVEEPFFD WVYSTPENLT LEAIKQLEDL TGLELHEGGP
          NCWLNAILQL FRYVEEPFFD WVYSSPENLT LEAIKQLEDL TGLELHEGGP
          NCWLNAILQL FRYVDEPFFE WVYDSPENLT VEAIRQLEEL TGLELHEGGP
          NCWLNAILQL FRYVDEPFFD WVYESPENLT IQAIGQLEEL TGLDLREGGP
          NCWLNTILQL FRYVDEPFFD WVYNSPENLT LRAIEQLEEL TGLELREGGP
          NCWLNTILQL FRYVDEPFFD WVYNSPENLT LQAIEQLEEL TGLELHEGGP
          NCWLNSLLQL FRYVDEPLFE SEYLSPENKT LDMIKQLSDY TKLDLSDGGP
          NCWLNSLLQL FRYVDEPLFE SEYLSPENKT LDMIKQLSDY TKLDLSDGGP
          NCWLNSLLQL FRYVDEPLFE SEYLSPENKT LDMIKRLSDY TKLDLSDGGP
          NCWLNSLLQL FRYVDEPLFE SEYLSPENKT LDMIKQLSDY TKLDLSDGGP
          NCWLNSLLQL FRYVDEPLFE SEYLSPENKT LDMIKQLSDY TKLDLSDGGP
          NCWLNSLLQL FRYVDEPLFE SEYLSPENKT LDMIKQLSDY TKLDLSDGGP
          NCWLNSLLQL FRYVDEPLFE SEYLSPENKT LDMIKQLSDY TKLDLSDGGP

          PALVIWNIKH LLQTGIGTAS RPAR-CMVDG TNMCLADFHA GIFLKEQEHA
          PALVIWNIKH LLHTGIGTAS RPSEVCMVDG TNMCLADFHA GIFLKGQEHA
          PALVIWNIKH LLHTGIGTAS RPSEVCMVDG TDMCLADFHA GIFMKGREHA
          PALVIWNIKH LLHTGIGTAS RPSEVCMVDG TDMCLADFHA GIFMKGQEHA
          PALVIWNIKH LLHTGIGTAS RPSEVCMVDG TDMCLADFHA GIFLKGQEHA
Sequential
   18   206 YF
a121      MNTTNCFIAL VHAIREIRAF FLSRATG-KM EFTLYNGERK TFYSRPNNHD
          NCWLNTILQL FRYVDEPFFD WVYNSPENLT LAAIKQLEEL TGLELHEGGP
          PALVIWNIKH LLQTGIGTAS RPAR-CMVDG TNMCLADFHA GIFLKEQEHA
          VFACVTSNGW YAIDDEDFYP WTPDPSDVLV FVPYDQEPLN GGWKANVQRK
          LK---
a241      MNTTDCFIAL VTAIREIRAF FLPRATG-RM EFTLHNGERK VFYSRPNNHD
          NCWLNTILQL FRYVGEPFFD WVYDSPENLT LEAIEQLEEL TGLELHEGGP
          PALVIWNIKH LLHTGIGTAS RPSEVCMVDG TNMCLADFHA GIFLKGQEHA
          VFACVTSNGW YAIDDDDFYP WTPDPSDVLV FVPYDQEPLN GEWKTKVQQK
          LK---
c-s8c1    MNTTDCFIAV VNAIKEVRAL FLPRTAG-KM EFTLHDGEKK VFYSRPNNHD
          NCWLNTILQL FRYVDEPFFD WVYNSPENLT LEAIKQLEEL TGLELREGGP
          PALVIWNIKH LLHTGIGTAS RPSEVCMVDG TDMCLADFHA GIFMKGREHA
          VFACVTSNGW YAIDDEDFYP WTPDPSDVLV FVPYDQEPLN EGWKASVQRK
          LKGAGQ
c1nov     MNTTDCFIAV VNAIREIRAL FLPRTTG-KM EFTLHDGEKK VFYSRPNNHD
          NCWLNTILQL FRYVDEPFFD WVYNSPENLT LEAIKQLEEL TGLELREGGP
          PALVIWNIKH LLHTGIGTAS RPSEVCMVDG TDMCLADFHA GIFMKGQEHA
          VFACVTSNGW YAIDDEDFYP WTPDPSDVLV FVPYDQEPLN EGWKANVQRK
          LKGAGQ
o1brazl   MNTTDCFIAL VQAIREIKAL FLPRTTG-KM ELTLYNGEKK TFYSRPNNHD
          NCWLNAILQL FRYVEEPFFD WVYSTPENLT LEAIKQLEDL TGLELHEGGP
          PALVIWNIKH LLHTGIGTAS RPSEVCMVDG TDMCLADFHA GIFLKGQEHA
          VFACVTSNGW YAIDDEDFYP WTPDPSDVLV FVPYDQEPLN GEWKAKVQRK
          LK---
o1campos  MNTTDCFIAL
          NCWLNAILQL
          PALVIWNIKH
          VFAC
PDB Example
HEADER    LYASE                                   06-JUL-99   1QU4
TITLE     CRYSTAL STRUCTURE OF TRYPANOSOMA BRUCEI ORNITHINE
TITLE    2 DECARBOXYLASE
COMPND    MOL_ID: 1;
COMPND   2 MOLECULE: ORNITHINE DECARBOXYLASE;
COMPND   3 CHAIN: A, B, C, D;
COMPND   4 EC: 4.1.1.17;
COMPND   5 ENGINEERED: YES
SOURCE    MOL_ID: 1;
SOURCE   2 ORGANISM_SCIENTIFIC: TRYPANOSOMA BRUCEI;
SOURCE   3 EXPRESSION_SYSTEM: ESCHERICHIA COLI;
SOURCE   4 EXPRESSION_SYSTEM_COMMON: BACTERIA;
SOURCE   5 EXPRESSION_SYSTEM_STRAIN: B21/DG3;
SOURCE   6 EXPRESSION_SYSTEM_VECTOR_TYPE: PLASMID
KEYWDS    POLYAMINE METABOLISM, PYRIDOXAL 5'-PHOSPHATE, ALPHA-BETA
KEYWDS   2 BARREL, LYASE
EXPDTA    X-RAY DIFFRACTION
AUTHOR    N.V.GRISHIN,A.L.OSTERMAN,H.B.BROOKS,M.A.PHILLIPS,
AUTHOR   2 E.J.GOLDSMITH
REVDAT   2   29-DEC-99 1QU4    1       JRNL   COMPND REMARK
REVDAT   1   17-NOV-99 1QU4    0
JRNL        AUTH   N.V.GRISHIN,A.L.OSTERMAN,H.B.BROOKS,M.A.PHILLIPS,
JRNL        AUTH 2 E.J.GOLDSMITH
JRNL        TITL   X-RAY STRUCTURE OF ORNITHINE DECARBOXYLASE FROM
JRNL        TITL 2 TRYPANOSOMA BRUCEI: THE NATIVE STRUCTURE AND THE
JRNL        TITL 3 STRUCTURE IN COMPLEX WITH
JRNL        TITL 4 ALPHA-DIFLUOROMETHYLORNITHINE
JRNL        REF    BIOCHEMISTRY                  V.  38 15174 1999
JRNL        REFN   ASTM BICHAW  US ISSN 0006-2960
REMARK   1
REMARK   2
REMARK   2 RESOLUTION. 2.90 ANGSTROMS.
REMARK
DBREF  1QU4 A    1   425  SWS    P07805   DCOR_TRYBB      21    445
DBREF  1QU4 B    1   425  SWS    P07805   DCOR_TRYBB      21    445
DBREF  1QU4 C    1   425  SWS    P07805   DCOR_TRYBB      21    445
DBREF  1QU4 D    1   425  SWS    P07805   DCOR_TRYBB      21    445
SEQRES   1 A  425  GLY ALA MET ASP ILE VAL VAL ASN ASP ASP LEU SER CYS
SEQRES   2 A  425  ARG PHE LEU GLU GLY PHE ASN THR ARG ASP ALA LEU CYS
SEQRES   3 A  425  LYS LYS ILE SER MET ASN THR CYS ASP GLU GLY ASP PRO
SEQRES   4 A  425  PHE PHE VAL ALA ASP LEU GLY ASP ILE VAL ARG LYS HIS
SEQRES   5 A  425  GLU THR TRP LYS LYS CYS LEU PRO ARG VAL THR PRO PHE
SEQRES   6 A  425  TYR ALA VAL LYS CYS ASN ASP ASP TRP ARG VAL LEU GLY
SEQRES   7 A  425  THR LEU ALA ALA LEU GLY THR GLY PHE ASP CYS ALA SER
SEQRES   8 A  425  ASN THR GLU ILE GLN ARG VAL ARG GLY ILE GLY VAL PRO
SEQRES   9 A  425  PRO GLU LYS ILE ILE TYR ALA ASN PRO CYS LYS GLN ILE
SEQRES  10 A  425  SER HIS ILE ARG TYR ALA ARG ASP SER GLY VAL ASP VAL
SEQRES  11 A  425  MET THR PHE ASP CYS VAL ASP GLU LEU GLU LYS VAL ALA
SEQRES  12 A  425  LYS THR HIS PRO LYS ALA LYS MET VAL LEU ARG ILE SER
SEQRES  13 A  425  THR ASP ASP SER LEU ALA ARG CYS ARG LEU SER VAL LYS
SEQRES  14 A  425  PHE GLY ALA LYS VAL GLU ASP CYS ARG PHE ILE LEU GLU
SEQRES  15 A  425  GLN ALA LYS LYS LEU ASN ILE ASP VAL THR GLY VAL SER
SEQRES  16 A  425  PHE HIS VAL GLY SER GLY SER THR ASP ALA SER THR PHE
SEQRES  17 A  425  ALA GLN ALA ILE SER ASP SER ARG PHE VAL PHE ASP MET
SEQRES  18 A  425  GLY THR GLU LEU GLY PHE ASN MET HIS ILE LEU ASP ILE
SEQRES  19 A  425  GLY GLY GLY PHE PRO GLY THR ARG ASP ALA PRO LEU LYS
SEQRES  20 A  425  PHE GLU GLU ILE ALA GLY VAL ILE ASN ASN ALA LEU GLU
SEQRES  21 A  425  LYS HIS PHE PRO PRO ASP LEU LYS LEU THR ILE VAL ALA
SEQRES  22 A  425  GLU PRO GLY ARG TYR TYR VAL ALA SER ALA PHE THR LEU
SEQRES  23 A  425  ALA VAL ASN VAL ILE ALA LYS LYS VAL THR PRO GLY VAL
SEQRES  24 A  425  GLN THR ASP VAL GLY ALA HIS ALA GLU SER ASN ALA GLN
SEQRES  25 A  425  SER PHE MET TYR TYR VAL ASN ASP GLY VAL TYR GLY SER
SEQRES  26 A  425  PHE ASN CYS ILE LEU TYR ASP HIS ALA VAL VAL ARG PRO
SEQRES  27 A  425  LEU PRO GLN ARG GLU PRO ILE PRO ASN GLU LYS LEU TYR
SEQRES  28 A  425  PRO SER SER VAL TRP GLY PRO THR CYS ASP GLY LEU ASP
SEQRES  29 A  425  GLN ILE VAL GLU ARG TYR TYR LEU PRO GLU MET GLN VAL
SEQRES  30 A  425  GLY GLU TRP LEU LEU PHE GLU ASP MET GLY ALA TYR THR
SEQRES  31 A  425  VAL VAL GLY THR SER SER PHE ASN GLY PHE GLN SER PRO
SEQRES  32 A  425  THR ILE TYR TYR VAL VAL SER GLY LEU PRO ASP HIS VAL
SEQRES  33 A  425  VAL ARG GLU LEU LYS SER GLN LYS SER
HET    PLP  A 600      15
HET    PLP  B 600      15
HET    PLP  C 600      15
HET    PLP  D 600      15
HETNAM     PLP PYRIDOXAL-5'-PHOSPHATE
HETSYN     PLP VITAMIN B6 COMPLEX
FORMUL   5  PLP    4(C8 H10 N1 O6 P1)
HELIX    1   1 LEU A   45  LEU A   59  1                                  15
HELIX    2   2 LYS A   69  ASN A   71  5                                   3
HELIX    3   3 ASP A   73  GLY A   84  1                                  12
HELIX    4   4 SER A   91  ILE A  101  1                                  11
HELIX    5   5 PRO A  104  GLU A  106  5                                   3
HELIX    6   6 GLN A  116  SER A  126  1                                  11
HELIX    7   7 CYS A  135  HIS A  146  1                                  12
HELIX    8   8 LYS A  173  GLU A  175  5                                   3
HELIX    9   9 ASP A  176  LEU A  187  1                                  12
HELIX   10  10 ALA A  205  LEU A  225  1                                  21
HELIX   11  11 LYS A  247  PHE A  263  1                                  17
HELIX   12  12 GLY A  276  ALA A  281  1                                   6
HELIX   13  13 PHE A  326  HIS A  333  1                                   8
HELIX   14  14 THR A  390  THR A  394  5                                   5
HELIX   15  15 SER A  396  PHE A  400  5                                   5
SHEET    1   A 6 GLN A 365  PRO A 373  0
SHEET    2   A 6 LEU A 350  TRP A 356 -1  N   TYR A 351   O   LEU A 372
SHEET    3   A 6 SER A 313  VAL A 318  1  O   PHE A 314   N   SER A 354
SHEET    4   A 6 PHE A 284  THR A 296 -1  N   ILE A 291   O   TYR A 317
SHEET    5   A 6 PHE A  40  ASP A  44 -1  O   PHE A  40   N   ALA A 287
SHEET    6   A 6 THR A 404  VAL A 408  1  O   THR A 404   N   PHE A  41
SHEET    1  A1 6 GLN A 365  PRO A 373  0
SHEET    2  A1 6 LEU A 350  TRP A 356 -1  N   TYR A 351   O   LEU A 372
SHEET    3  A1 6 SER A 313  VAL A 318  1  O   PHE A 314   N   SER A 354
SHEET    4  A1 6 PHE A 284  THR A 296 -1  N   ILE A 291   O   TYR A 317
SHEET    5  A1 6 TRP A 380  PHE A 383 -1  N   LEU A 381   O   VAL A 288
SHEET    6  A1 6 PRO A 338  PRO A 340 -1  O   LEU A 339   N   LEU A 382
CRYST1   66.800  151.700   85.350  90.00 102.30  90.00 P 1 21 1
ORIGX1      1.000000  0.000000  0.000000        0.00000
ORIGX2      0.000000  1.000000  0.000000        0.00000
ORIGX3      0.000000  0.000000  1.000000        0.00000
SCALE1      0.014970  0.000000  0.003264        0.00000
SCALE2      0.000000  0.006592  0.000000        0.00000
SCALE3      0.000000  0.000000  0.011992        0.00000
ATOM      1  N   ASP A  35      34.731  -5.686  15.000  1.00 98.44           N
ATOM      2  CA  ASP A  35      34.249  -5.884  13.629  1.00 98.39           C
ATOM      3  C   ASP A  35      33.320  -4.750  13.203  1.00 98.13           C
ATOM      4  O   ASP A  35      33.474  -3.594  13.603  1.00 98.29           O
ATOM      5  CB  ASP A  35      33.558  -7.247  13.545  1.00 98.38           C
ATOM      6  CG  ASP A  35      33.566  -7.887  12.170  1.00 98.36           C
ATOM      7  OD1 ASP A  35      33.717  -9.133  12.114  1.00 98.26           O
ATOM      8  OD2 ASP A  35      33.419  -7.182  11.148  1.00 98.39           O
ATOM      9  N   GLU A  36      32.332  -5.073  12.378  1.00 97.79           N
ATOM     10  CA  GLU A  36      31.446  -4.080  11.787  1.00 95.51           C
ATOM     11  C   GLU A  36      32.259  -2.944  11.199  1.00 90.65           C
ATOM     12  O   GLU A  36      32.220  -1.813  11.692  1.00 94.96           O
ATOM     13  CB  GLU A  36      30.419  -3.638  12.840  1.00 97.63           C
ATOM     14  CG  GLU A  36      29.111  -3.155  12.261  1.00 98.19           C
ATOM     15  CD  GLU A  36      27.791  -3.597  12.824  1.00 98.33           C
ATOM     16  OE1 GLU A  36      27.308  -4.727  12.601  1.00 98.28           O
ATOM     17  OE2 GLU A  36      27.115  -2.806  13.520  1.00 98.43           O
ATOM     18  N   GLY A  37      33.018  -3.192  10.131  1.00 52.86           N
ATOM     19  CA  GLY A  37      33.624  -2.167   9.299  1.00 39.88           C
ATOM     20  C   GLY A  37      32.598  -1.167   8.712  1.00 34.34           C
ATOM     21  O   GLY A  37      32.236  -1.162   7.531  1.00 31.44           O
ATOM     22  N   ASP A  38      32.135  -0.248   9.564  1.00 37.23           N
ATOM     23  CA  ASP A  38      31.136   0.700   9.138  1.00 36.44           C
ATOM     24  C   ASP A  38      31.794   1.722   8.228  1.00 33.49           C
ATOM     25  O   ASP A  38      33.029   1.896   8.156  1.00 34.06           O
ATOM     26  CB  ASP A  38      30.500   1.242  10.405  1.00 42.06           C
ATOM     27  CG  ASP A  38      29.583   0.207  11.047  1.00 44.59           C
ATOM     28  OD1 ASP A  38      29.408  -0.876  10.434  1.00 45.72           O
ATOM     38  CA  PHE A  40      32.728   6.727   7.615  1.00 20.51           C
...
CONECT1117911177
CONECT1118011177
MASTER      482    0    4   60   80    0    0    611176   64  132
END
ATOM record columns labelled on the slide: Name, Residue, Chain, Seq Nr, X, Y, Element
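PDB ATOM records are fixed-width, so the labelled columns can be pulled out by character position. The sketch below assumes the standard PDB column layout (serial in columns 7-11, atom name in 13-16, residue name in 18-20, chain in 22, residue number in 23-26, coordinates in 31-54, occupancy in 55-60, B-factor in 61-66, element in 77-78). The function name `parse_atom_record` is hypothetical, and the sample line is assembled field by field so that the column positions are explicit.

```python
def parse_atom_record(line):
    """Split one fixed-width PDB ATOM line into named fields."""
    return {
        "serial": int(line[6:11]),
        "name": line[12:16].strip(),
        "res_name": line[17:20].strip(),
        "chain": line[21],
        "res_seq": int(line[22:26]),
        "x": float(line[30:38]),
        "y": float(line[38:46]),
        "z": float(line[46:54]),
        "occupancy": float(line[54:60]),
        "b_factor": float(line[60:66]),
        "element": line[76:78].strip(),
    }


# First atom of the 1QU4 example, built column by column
sample = (
    "ATOM  "                # columns 1-6: record name
    + f"{1:>5}" + " "       # columns 7-11: serial, column 12 blank
    + f"{' N':<4}" + " "    # columns 13-16: atom name, column 17 blank
    + "ASP" + " " + "A"     # residue name, blank, chain identifier
    + f"{35:>4}" + "    "   # columns 23-26: residue number, then 4 blanks
    + f"{34.731:8.3f}{-5.686:8.3f}{15.0:8.3f}"  # x, y, z
    + f"{1.0:6.2f}{98.44:6.2f}"                 # occupancy, B-factor
    + " " * 10 + f"{'N':>2}"                    # columns 77-78: element
)
atom = parse_atom_record(sample)
print(atom)
```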
1. IG/Stanford
2. GenBank/GB
3. NBRF
4. EMBL
5. GCG
6. DNAStrider
7. Fitch
8. Pearson/Fasta
9. Zuker (in-only)
10. Olsen (in-only)
11. Phylip3.2 (Sequential)
12. Phylip (Interleaved)
13. Plain/Raw
14. PIR/CODATA
15. MSF
16. ASN.1
17. PAUP/NEXUS
18. Pretty (out-only)
seqret (EMBOSS)
Major formats
PDB - Protein Data Bank
mol2 - Tripos Sybyl
mmCIF - Macromolecular Crystallographic Information File
XYZ
Babel
Alchemy
Biosym .CAR
Cambridge CADPAC
Chem3D Cartesian 2
CSD GSTAT
Feature
Gaussian Output
Gaussian 94 Output
Hyperchem HIN
Mac Molecule
MM2 Input
MMADS
Mopac Cartesian
PC Model
PS-GVB Output
ShelX
Spartan Semi-Empirical
Sybyl Mol2
XYZ
AMBER PREP
Boogie
CHARMm
CSD CSSR
Dock Database
Free Form Fractional
Gaussian Z-Matrix
GAMESS Output (A)
MDL Isis (SDF)
Macromodel
MM2 Output
MDL MOLfile
Mopac Internal
PDB
Quanta MSF
SMILES
Spartan Mol. Mechanics
Conjure
XED
biopython
1 letter
3 letter
EMBOSS
transeq
It can translate in any of the three forward or three reverse
sense frames, or in all three forward or reverse frames,
or in all six frames.
It can translate specified regions corresponding to the
coding regions of your sequences.
It can translate using the standard ('Universal') genetic
code and also with a selection of non-standard codes.
Termination (STOP) codons are translated as the
character '*'.
The output peptide sequence is always in the standard
one-letter IUPAC code.
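The behaviour listed above (six reading frames, standard genetic code, STOP codons written as '*', one-letter output) can be illustrated in plain Python. This is a sketch of the idea, not transeq itself: the function names are hypothetical, only the standard code is included, codons containing ambiguous bases are written as 'X', and the reverse frames are simply numbered from the first base of the reverse complement, which may differ from transeq's own frame numbering.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def translate(seq, frame=1):
    """Translate one forward frame (1, 2 or 3); STOP codons become '*'."""
    seq = seq.upper().replace("U", "T")
    codons = (seq[i:i + 3] for i in range(frame - 1, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frames(seq):
    """Translate frames 1..3 of the sequence and of its reverse complement."""
    rc = reverse_complement(seq)
    frames = {f: translate(seq, f) for f in (1, 2, 3)}
    frames.update({-f: translate(rc, f) for f in (1, 2, 3)})
    return frames


# Start of the Tpi coding region from the GenBank example, plus a stop codon
cds = "ATGAGCCGAAAGTTCTGCGTGGGAGGCAACTAA"
frames = six_frames(cds)
print(frames[1])
```

Frame 1 of the sample reproduces the first ten residues of the /translation qualifier in the GenBank example, followed by '*'.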
prettyseq
This writes out a nicely formatted display of the
sequence with the translation (within specified ranges)
displayed beneath it.
Slightly unusually, this application uses the codon usage
tables to translate the codons.
Web tools
Expasy translate tool
abiview (EMBOSS)
Trev (Unix)
EditView (Mac)
Chromas (Windows)
AbiView (Windows)
Assembly program
gap4
Assembly
Contig joining
Assembly checking
Repeat searching
Experiment suggestion
Read pair analysis
Contig editing
Graphical views of contigs
Database
Consed
GRAIL
Neural network
Combines evidence from 7 different statistical
measures
Frame bias
Periodicities
Fractal dimensions
Coding 6-tuples
In-frame 6-tuples
K-tuple commonality
Repetitive 6-tuple words
Organism/dataset specificity
Genscan
GeneWise
NetGene
EMBOSS
getorf
plotorf
SMART
Pfam
Prosite
Detects signature motifs in proteins
Regular expression searches
Scan sequences against database
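Because Prosite patterns are essentially regular expressions, a pattern can be converted to Python `re` syntax and used to scan a sequence. The sketch below handles only the common pattern elements (x for any residue, [..] for allowed residues, {..} for excluded residues, (n) or (n,m) for repeats, < and > as terminal anchors). The function name `prosite_to_regex` is hypothetical, and anchors placed inside square brackets are not handled. The example pattern N-{P}-[ST]-{P} is the Prosite N-glycosylation site pattern.

```python
import re


def prosite_to_regex(pattern):
    """Convert a simple Prosite pattern to a Python regular expression."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        element = element.replace("<", "^").replace(">", "$")   # terminal anchors
        element = element.replace("{", "[^").replace("}", "]")  # excluded residues
        element = element.replace("(", "{").replace(")", "}")   # repeat counts
        element = element.replace("x", ".")                     # any residue
        parts.append(element)
    return "".join(parts)


regex = prosite_to_regex("N-{P}-[ST]-{P}")
hit = re.search(regex, "AANGSAA")
print(regex, hit.start() if hit else None)
```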
Prints
Protein fingerprints
EMBOSS DNA
cpgplot plots cpg rich areas
restrict restriction sites
tfscan transcription factors
einverted find inverted repeats
chips codon usage
geecee GC content
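As an illustration of the simplest calculation in the list above, GC content is the fraction of G and C among the counted bases. This is a minimal sketch, not the geecee implementation: the function name `gc_content` is hypothetical, and the choice to ignore ambiguous characters such as N is an assumption.

```python
def gc_content(seq):
    """Return the fraction of G and C among the A/C/G/T bases of a sequence."""
    counted = [base for base in seq.upper() if base in "ACGT"]
    if not counted:
        return 0.0
    return sum(base in "GC" for base in counted) / len(counted)


primer = "AAGAGTCTGGGGGAGCTGAT"  # 20-mer with 11 G/C bases
print(f"{100 * gc_content(primer):.2f}%")
```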
EMBOSS protein
Primer Design
Factors
Melting point
Length
Composition
Methods for calculating melting point
Internal stability
Specificity
Internal stability
Hairpin structures
Compatibility
Primer dimers
Compatible melting points
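As a rough illustration of one method for calculating melting point, the Wallace rule estimates Tm as 2 °C per A or T plus 4 °C per G or C. It is a rule of thumb for short oligonucleotides and is much cruder than nearest-neighbour thermodynamic methods, so its values will generally differ from those reported by primer design programs. The function name `wallace_tm` is hypothetical.

```python
def wallace_tm(primer):
    """Estimate Tm in degrees C with the Wallace rule: 2*(A+T) + 4*(G+C)."""
    primer = primer.upper()
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc


left = "AAGAGTCTGGGGGAGCTGAT"   # 9 A/T and 11 G/C
right = "ATCATTGCTGGGCTGATCTC"  # 10 A/T and 10 G/C
print(wallace_tm(left), wallace_tm(right))
```

Comparing the two estimates is one simple way to check that a primer pair has compatible melting points.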
OLIGO Package
Nearest neighbour method for Tm
calculation
Comprehensive analysis suite
$$$
CODEHOP
COnsensus-DEgenerate Hybrid Oligonucleotide Primer
PCR primers designed from protein multiple sequence
alignments
Primer3
You provide the target sequence
It picks primers for PCR reactions, considering
as criteria:
                 start  len     tm    gc%   any    3'  seq
1 LEFT PRIMER       66   20  60.22  55.00  5.00  2.00  AAGAGTCTGGGGGAGCTGAT
  RIGHT PRIMER     259   20  60.19  50.00  4.00  2.00  ATCATTGCTGGGCTGATCTC
  PRODUCT SIZE: 194, PAIR ANY COMPL: 4.00, PAIR 3' COMPL: 2.00
2 LEFT PRIMER      331   20  60.25  45.00  5.00  2.00  AGCTCATTGGGCAAAAAGTG
  RIGHT PRIMER     529   20  59.55  55.00  2.00  1.00  CCAGTTCCAATAGCCCAGAC
  PRODUCT SIZE: 199, PAIR ANY COMPL: 6.00, PAIR 3' COMPL: 1.00
3 LEFT PRIMER      331   20  60.25  45.00  5.00  2.00  AGCTCATTGGGCAAAAAGTG
  RIGHT PRIMER     538   20  60.12  45.00  3.00  2.00  GCAGTTTTGCCAGTTCCAAT
  PRODUCT SIZE: 208, PAIR ANY COMPL: 7.00, PAIR 3' COMPL: 2.00
4 LEFT PRIMER      379   20  59.67  50.00  4.00  2.00  TCATCGCCTGTATTGGTGAG
  RIGHT PRIMER     578   20  60.44  50.00  6.00  2.00  GCGGAGTTTCTTGTGCACTT
  PRODUCT SIZE: 200, PAIR ANY COMPL: 3.00, PAIR 3' COMPL: 1.00
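The reported product sizes follow from the primer start positions. The sketch below assumes that the right primer's start is the position of its 5' base, that is, the rightmost base of the product; this is an inference from the numbers in the output above, and it reproduces all four reported sizes. The function name `product_size` is hypothetical.

```python
def product_size(left_start, right_start):
    """Product length, assuming right_start is the right primer's 5' (rightmost) base."""
    return right_start - left_start + 1


# (left start, right start, reported product size) for the four pairs above
pairs = [(66, 259, 194), (331, 529, 199), (331, 538, 208), (379, 578, 200)]
sizes = [product_size(left, right) for left, right, _ in pairs]
for (left, right, reported), size in zip(pairs, sizes):
    print(left, right, size, "matches" if size == reported else "differs")
```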
Statistics
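Columns: considered, too many Ns, in target, in excl reg, bad GC%, no GC clamp, tm too low, tm too high, high any compl, high 3' compl, poly X, high end stab, ok
(only the values below were recovered, listed in column order)
Left   4198  810  2322  17  65  86  898
Right  4172  807  2281  83  994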
Pair Stats:
considered 811, unacceptable product size 422, high any compl 1, high end compl 33, ok 355