8e5bbfunda Seq Anals

Fundamentals of
Sequence Analysis
Fourie Joubert
FASTA File Format
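First line contains > followed by a short descriptor (many parsers expect the identifier directly after the >, with no space)
Sequence usually 60 or 80 characters per line on following lines
Further records may follow, each starting with its own > line (a blank line between records is optional)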
FASTA Example
> mysequence
ACGTCGATCGATCGATGCATCGTGCTAGCTACAGTCGATGCAT
CAGTCGATGCTAGCATGCTAGCTGCATCGATCGATGCTACGTA
CAGTCGATCGATGCAT
> mysequence2
ACCGTACGATGCTAGCTAGCTAGCTACAGTCAGTCGATGCTACG
CAGTCGTAGCATGCTAACGTCGATCGTA
> mysequence3
CAGTCAGTCGTAGCTAGCTAGCTAGCTAGGGGTATCGATGCTAA
CAGTACTTTGCATGCAGCATGCTAGCTAGCTAGCTA
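The layout above is simple enough to read with a few lines of code. The following is a minimal sketch, not a reference implementation: the function name `parse_fasta` is an assumption made for illustration, and the reader tolerates a space after the `>` as well as blank lines between records.

```python
from typing import Dict, Iterable


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Return {descriptor: sequence} for FASTA-formatted lines."""
    records: Dict[str, str] = {}
    name = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # blank lines between records are tolerated
        if line.startswith(">"):
            name = line[1:].strip()  # drop '>' and any space after it
            records[name] = ""
        elif name is not None:
            records[name] += line  # sequence may span several lines
    return records


example = """> mysequence
ACGTCGATCGATCGATGCATCGTGCTAGCTACAGTCGATGCAT
CAGTCGATCGATGCAT
> mysequence2
ACCGTACGATGCTAGC
""".splitlines()

records = parse_fasta(example)
for name, seq in records.items():
    print(name, len(seq))
```

Keeping the whole descriptor as the dictionary key is a design choice for this sketch; real tools usually split it into an identifier and a free-text description.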
Genbank File Format
File Header
The first line in the file must have "GENETIC SEQUENCE DATA BANK" in spaces
20 through 46.
The next 8 lines may contain arbitrary text. They are ignored but are required to
maintain the GenBank format.
Sequence Data Entries
Each sequence entry in the file should have the following format:
1st line: Must have LOCUS in the first 5 spaces. The genetic locus name or
identifier must be in spaces 13 - 22. The length of the sequences is right
justified in spaces 23 through 29.
2nd line: Must have DEFINITION in the first 10 spaces. Spaces 13 - 80 are free
form text to identify the sequence.
3rd line: Must have ACCESSION in the first 9 spaces. Spaces 13 - 18 must hold
the primary accession number.
4th line: Must have ORIGIN in the first 6 spaces. Nothing else is required on this
line, it indicates that the nucleic acid sequence begins on the next line.
5th line: Begins the nucleotide sequence. The first 9 spaces of each sequence
line may either be blank or may contain the position in the sequence of the first
nucleotide on the line. The next 66 spaces hold the nucleotide sequence in six
blocks of ten nucleotides. Each of the six blocks begins with a blank space
followed by ten nucleotides. Thus the first nucleotide is in space eleven of the
line while the last is in space 75.
Last line: Must have // in the first 2 spaces to indicate termination of the
sequence.
NOTE: Multiple sequences may appear in each file. To begin another sequence
go back to the 1st line (LOCUS) and start again.
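As a rough illustration of the fixed-column rules above, the sketch below reads one simplified entry. It assumes only the four keyword lines described here (LOCUS, DEFINITION, ACCESSION, ORIGIN); the function name `parse_genbank_entry` and the returned dictionary keys are hypothetical, and real GenBank records carry many more fields.

```python
def parse_genbank_entry(text: str) -> dict:
    """Read one simplified GenBank entry into a dictionary."""
    entry = {"locus": None, "definition": None, "accession": None, "sequence": ""}
    in_origin = False
    for line in text.splitlines():
        if line.startswith("//"):
            break  # terminator line ends the entry
        if in_origin:
            # keep letters only: drops the position number and block spacing
            entry["sequence"] += "".join(ch for ch in line if ch.isalpha())
        elif line.startswith("LOCUS"):
            entry["locus"] = line[12:].split()[0]  # name starts in space 13
        elif line.startswith("DEFINITION"):
            entry["definition"] = line[12:].strip()
        elif line.startswith("ACCESSION"):
            entry["accession"] = line[12:].split()[0]
        elif line.startswith("ORIGIN"):
            in_origin = True  # sequence begins on the next line
    return entry


sample = (
    "LOCUS       NM_079846    1190 bp\n"
    "DEFINITION  Drosophila melanogaster Triose phosphate isomerase (Tpi), mRNA.\n"
    "ACCESSION   NM_079846\n"
    "ORIGIN\n"
    "        1 ttaatctcga atctgggaaa aatctgagtg\n"
    "//\n"
)
entry = parse_genbank_entry(sample)
print(entry["locus"], len(entry["sequence"]))
```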
Genbank Example
LOCUS       NM_079846 1190 bp mRNA linear INV 15-DEC-2001
DEFINITION  Drosophila melanogaster Triose phosphate isomerase (Tpi), mRNA.
ACCESSION   NM_079846
VERSION     NM_079846.1 GI:17864111
KEYWORDS    .
SOURCE      fruit fly.
  ORGANISM  Drosophila melanogaster
            Eukaryota; Metazoa; Arthropoda; Tracheata; Hexapoda; Insecta;
            Pterygota; Neoptera; Endopterygota; Diptera; Brachycera;
            Muscomorpha; Ephydroidea; Drosophilidae; Drosophila.
REFERENCE   1 (bases 1 to 1190)
  AUTHORS   Shaw-Lee,R.L., Lissemore,J.L. and Sullivan,D.T.
  TITLE     Structure and expression of the triose phosphate isomerase (Tpi) gene of
            Drosophila melanogaster
  JOURNAL   Mol. Gen. Genet. 230 (1-2), 225-229 (1991)
  MEDLINE   92079900
  PUBMED    1720860
COMMENT     PROVISIONAL REFSEQ: This record has not yet been subject to final NCBI
            review. The reference sequence was derived from AE003772.1.
FEATURES             Location/Qualifiers
     source          1..1190
                     /organism="Drosophila melanogaster"
                     /db_xref="taxon:7227"
                     /chromosome="3"
                     /map="99E1-99E2"
     gene            1..1190
                     /gene="Tpi"
                     /note="TPI; TPIS; CG2171; CT6334"
                     /db_xref="FLYBASE:FBgn0003738"
                     /db_xref="LocusID:43582"
     CDS             181..924
                     /gene="Tpi"
                     /EC_number="5.3.1.1"
                     /note="Nucleotide sequence of the Celera sequence differs from the published
                     sequence for this transcript."
                     /codon_start=1
                     /db_xref="FLYBASE:FBgn0003738"
                     /db_xref="LocusID:43582"
                     /product="Triose phosphate isomerase"
                     /protein_id="NP_524585.1"
                     /db_xref="GI:17864112"
                     /translation="MSRKFCVGGNWKMNGDQKSIAEIAKTLSSAALDPNTEVVIGCPA
                     IYLMYARNLLPCELGLAGQNAYKVAKGAFTGEISPAMLKDIGADWVILGHSERRAIFG
                     ESDALIAEKAEHALAEGLKVIACIGETLEEREAGKTNEVVARQMCAYAQKIKDWKNVV
                     VAYEPVWAIGTGQTATPDQAQEVHAFLRQWLSDNISKEVSASLRIQYGGSVTAANAKE
                     LAKKPDIDGFLVGGASLKPEFVDIINARQ"
     misc_feature    187..921
                     /note="TIM; Region: Triosephosphate isomerase"
BASE COUNT  279 a 368 c 323 g 220 t
ORIGIN
        1 ttaatctcga atctgggaaa aatctgagtg gaaaagtcga cggcgagcct ccagtcatcg
       61 agttacccac ttgaaattat cagttccaaa cactctaata gcagtcccct tgttttgtcc
      121 cccgatccgc agttctacgc caatttcagc accgattgca ccgacagcaa cagcaacaac
      181 atgagccgaa agttctgcgt gggaggcaac tggaagatga acggcgacca gaagtccatc
      241 gccgagatcg ccaagaccct gagctcggcc gccctcgacc ccaacacgga ggtggtcatc
      301 ggctgcccgg ccatctacct gatgtacgcc cgcaacctgc tgccctgcga gctgggtctg
      361 gccggccaga atgcctacaa ggtggccaag ggcgcattca ccggcgagat ctcccctgcg
      421 atgctgaagg
//
EMBL File Format
Unlike the GenBank file format the EMBL file format does not require a series
of header lines. Thus the first line in the file begins the first sequence entry
of the file.
The first line of each sequence entry contains the two letters ID in the first
two spaces. This is followed by the EMBL identifier in spaces 6 through 14.
The second line of each sequence entry has the two letters AC in the first two
spaces. This is followed by the accession number in spaces 6 through 11.
The third line of each sequence entry has the two letters DE in the first two
spaces. This is followed by a free form text definition in spaces 6 through 72.
The fourth line in each sequence entry has the two letters SQ in the first two
spaces. This is followed by the length of the sequence beginning at or after
space 13. After the sequence length there is a blank space and the two
letters BP.
The nucleotide sequence begins on the fifth line of the sequence entry. Each
line of sequence begins with four blank spaces. The next 66 spaces hold the
nucleotide sequence in six blocks of ten nucleotides. Each of the six blocks
begins with a blank space followed by ten nucleotides. Thus the first
nucleotide is in space 6 of the line while the last is in space 70.
The last line of each sequence entry in the file is a terminator line which has
the two characters // in the first two spaces.
Multiple sequences may appear in each file. To begin another sequence go
back to item 1 and start again.
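The two-letter line codes make EMBL entries easy to split line by line. The sketch below is an illustration under the simplified rules above (ID, AC, DE, SQ, sequence lines, terminator); `parse_embl_entry` and its dictionary keys are names invented for this example, and the sample entry is cut down from the slide's record.

```python
def parse_embl_entry(text: str) -> dict:
    """Read one simplified EMBL entry, keyed on the two-letter line code."""
    entry = {"id": None, "accession": None, "description": "", "sequence": ""}
    in_sequence = False
    for line in text.splitlines():
        code, data = line[:2], line[5:].strip()
        if code == "//":
            break  # terminator line ends the entry
        if in_sequence:
            # keep letters only: drops block spacing and trailing position numbers
            entry["sequence"] += "".join(ch for ch in line if ch.isalpha())
        elif code == "ID":
            entry["id"] = data.split()[0]
        elif code == "AC":
            entry["accession"] = data.split(";")[0]  # primary accession comes first
        elif code == "DE":
            entry["description"] = (entry["description"] + " " + data).strip()
        elif code == "SQ":
            in_sequence = True  # sequence lines follow the SQ line
    return entry


sample = "\n".join([
    "ID   DMTPIG     standard; DNA; INV; 3419 BP.",
    "AC   X57576; S70377;",
    "DE   D.melanogaster Tpi gene for Triosephosphate isomerase",
    "SQ   Sequence 3419 BP; 855 A; 933 C; 849 G; 778 T; 4 other;",
    "     gatctcgagc gagaaatgtg        20",
    "//",
])
entry = parse_embl_entry(sample)
print(entry["id"], entry["accession"], entry["sequence"])
```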
EMBL Example
ID   DMTPIG     standard; DNA; INV; 3419 BP.
XX
AC   X57576; S70377;
XX
SV   X57576.1
XX
DT   20-JAN-1992 (Rel. 30, Created)
DT   19-AUG-1996 (Rel. 49, Last updated, Version 10)
XX
DE   D.melanogaster Tpi gene for Triosephosphate isomerase
XX
KW   glycolytic enzyme; tpi gene; triosephosphate isomerase.
XX
OS   Drosophila melanogaster (fruit fly)
OC   Eukaryota; Metazoa; Arthropoda; Tracheata; Hexapoda; Insecta; Pterygota;
OC   Neoptera; Endopterygota; Diptera; Brachycera; Muscomorpha; Ephydroidea;
OC   Drosophilidae; Drosophila.
XX
RN   [1]
RP   1-3419
RA   Sullivan D.T.;
RT   ;
RL   Submitted (07-FEB-1991) to the EMBL/GenBank/DDBJ databases.
RL   D.T. Sullivan, Biological Research Laboratories, 130 College Pl, Syracuse
RL   University, Syracuse, NY 13244, USA
XX
RN   [3]
RX   MEDLINE; 92079900.
RA   Shaw-Lee R.L., Lissemore J.L., Sullivan D.T.;
RT   "Structure and expression of the triose phosphate isomerase (Tpi) gene of
RT   Drosophila melanogaster.";
RL   Mol. Gen. Genet. 230:225-229(1991).
XX
DR   FLYBASE; FBgn0003738; Tpi.
DR   SWISS-PROT; P29613; TPIS_DROME.
XX
FH   Key             Location/Qualifiers
FH
FT   source          1..3419
FT                   /db_xref="taxon:7227"
FT                   /germline
FT                   /organism="Drosophila melanogaster"
FT                   /strain="Oregon-R"
FT                   /clone_lib="EMBL-4"
FT   CDS             join(2237..2773,2830..3036)
FT                   /db_xref="FLYBASE:FBgn0003738"
FT                   /db_xref="SWISS-PROT:P29613"
FT                   /gene="Tpi"
FT                   /EC_number="5.3.1.1"
FT                   /product="triosephosphate isomerase"
FT                   /protein_id="CAA40804.1"
FT                   /translation="MSRKFCVGGNWKMNGDQKSIAEIAKTLSSAALDPNTEVVIGCPAI
FT                   YLMYARNLLPCELGLAGQNAYKVAKGAFTGEISPAMLKDIGADWVILGHSERRAIFGES
FT                   DALIAEKAEHALAEGLKVIACIGETLEEREAGKTNEVVARQMCAYAQKIKDWKNVVVAY
FT                   EPVWAIGTGKTATPDQAQEVHASLRQWLSDNISKEVSASLRIQYGGSVTAANAKELAKK
FT                   PDIDGFLVGGASLKPEFLDIINARQ"
FT   mRNA            join(2004..2028,2186..2773,2830..3036)
FT                   /gene="Tpi"
FT   prim_transcript 2004..3296
FT   exon            2008..2032
FT                   /number=1
FT   exon            2189..2773
FT                   /number=2
FT   exon            2830..3296
FT                   /number=3
FT   intron          2033..2188
FT                   /number=1
FT   intron          2774..2829
FT                   /number=2
FT   misc_feature    2147..2151
FT                   /note="intron 1 lariat sequence"
FT   misc_feature    2789..2793
FT                   /note="intron 2 lariat sequence"
FT   polyA_signal    3258..3262
XX
SQ   Sequence 3419 BP; 855 A; 933 C; 849 G; 778 T; 4 other;
     gatctcgagc gagaaatgtg gaacatagtg gaggcctcca gtggcgccga gctgggtgaa        60
     accagctacg agttcccttc ccccgctccg gttcccagcg cagcagtgaa cgaaatagca       120
     gttccacagt cccaccagct cctcctgctc ctgcgaagcc ctcagttccg tccgcctcct       180
     atgacaacca caactacagt ttcagccagg atgaggacga agatgatgat gatctggagt       240
     ttgaggacgt attcgtgccg gccagctctg ttccaaatcc cgttcagcct ggcatagatc       300
     ccgtggaact gcgtcgctcc ctggctttgg tcatgaggga gaaattgcga tcggatgaca       360
     cggactccag gccaatgggc aacaatcagg atcttcccat agatgaacag tccagggaga       420
     gaccgctctc cactcaaaca tctcccacaa atggcccact tccggctctt ctgagggcca       480
     aactgcttgc tgggcaactc nnnncaatag cgctcactgc ctgccaggat ccacggcgag       540
     tcctgctccc caggagcaat ccggtatctt tgtgatcgat agtgaggcga gtcccggctc       600
     aaatgggcac aagcctaagt atcgaaaggg cacggcattc actcggagtt cgctgaagaa       660
     gagccgatcc tgcaactgta gctccatcgc taagggacga ggggtccacg acgagcccag       720
     cagtaatctc tgcagggatc aggagtcctc tgtacttcca cagcatccgc agccagccaa       780
     ccatcccaca gagaactttt
//
PHYLIP File Format
Interleaved and Sequential formats

The sequences can continue over multiple lines;
when this is done the sequences must be
either in "interleaved" format, similar to the
output of alignment programs, or "sequential"
format. These are described in the main
document file. In sequential format all of one
sequence is given, possibly on multiple lines,
before the next starts. In interleaved format the
first part of the file should contain the first
part of each of the sequences, then possibly a line
containing nothing but a carriage-return
character, then the second part of each
sequence, and so on. Only the first parts of the
sequences should be preceded by names.
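To make the interleaved layout concrete, here is a small sketch of a reader. It assumes the classic PHYLIP convention that names occupy the first ten columns of the first block; the function name `parse_phylip_interleaved` is hypothetical, and the three-sequence sample is a cut-down illustration rather than a complete alignment.

```python
def parse_phylip_interleaved(text: str):
    """Return ({name: sequence}, n_sites) for an interleaved PHYLIP alignment."""
    lines = text.splitlines()
    n_seqs, n_sites = (int(x) for x in lines[0].split()[:2])
    body = [ln for ln in lines[1:] if ln.strip()]  # drop blank separator lines
    names, chunks = [], []
    for i, line in enumerate(body):
        if i < n_seqs:
            names.append(line[:10].strip())  # only the first block carries names
            chunks.append(line[10:].replace(" ", ""))
        else:
            # later blocks repeat the sequences in the same order, without names
            chunks[i % n_seqs] += line.replace(" ", "")
    return dict(zip(names, chunks)), n_sites


sample = """ 3 30
a121      MNTTNCFIAL VHAIREIRAF
a241      MNTTDCFIAL VTAIREIRAF
c-s8c1    MNTTDCFIAV VNAIKEVRAL

FLSRATG-KM
FLPRATG-RM
FLPRTAG-KM
"""
alignment, n_sites = parse_phylip_interleaved(sample)
for name, seq in alignment.items():
    print(name, seq)
```

A sequential file would instead be read one sequence at a time, consuming lines until `n_sites` characters have been collected for the current name.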
Interleaved
   18   206
a121      MNTTNCFIAL VHAIREIRAF FLSRATG-KM EFTLYNGERK TFYSRPNNHD
a241      MNTTDCFIAL VTAIREIRAF FLPRATG-RM EFTLHNGERK VFYSRPNNHD
c-s8c1    MNTTDCFIAV VNAIKEVRAL FLPRTAG-KM EFTLHDGEKK VFYSRPNNHD
c1nov     MNTTDCFIAV VNAIREIRAL FLPRTTG-KM EFTLHDGEKK VFYSRPNNHD
o1brazl   MNTTDCFIAL VQAIREIKAL FLPRTTG-KM ELTLYNGEKK TFYSRPNNHD
o1campos  MNTTDCFIAL VQAIREIKAL FLPRTTG-KM ELTLYNGEKK TFYSRPNNHD
o1kauf    MNTTDCFIAL VQAIREIKAL FLSRTTG-KM ELTLYNGEKK TFYSRPNNHD
ken1-76   MNTTDCFIAL LRAFREIKTL FLSRVRG-KM EFTLYNGEKK TFYSRPNNHD
ken34-84  MNTTDCFIAL VRAIREFKIL FSLRPLARKM EFTLYNGIKK TFYSRPNKHD
ken       MNTTDCFIAL VQAIREIKLL FKG--IR-KM KLTLYNGEKK TFYSRPNSHD
uga97-1   MNTTDCFIAL VQAIREIKSL FRS--SR-KM EFTLYNGEKK TFYSRPNNHD
bec1-65   MKTTDCFNVL FEIFHRFGQT FKA--DR-KM EFTLYNGEKK TFYSRPNTHG
zim88-3   MKTTDCFDVL LEIFHRFRQT FKT--DR-KM EFTLYNGEKK TFYSRPNTHG
knp10-90  MKTTDCFNVL LETFHRFRNV FKT--DR-KM EFTLYNGDKK TFYSRPNTHG
zim96-3   MKTTGCFDVL IEIAHRLRQL NKT--DR-KM EFTLYNGEKK TFYSRPNTHG
zim7-83   MKTTDCFNVL LEIIYRFRHT FKT--DR-KM EFTLYNGEKK TFYSRPNKHG
knp196-9  MKTTDCFSVL FEIFHRLRHT LKT--ER-KM EFTLYNGERK TFYSRPNKHG
zam4-96   MKTTDCFDAL LEAFHRLRQT FKT--DR-KM EFTLYNGEKK TFYSRPNRHG

          NCWLNTILQL FRYVDEPFFD WVYNSPENLT LAAIKQLEEL TGLELHEGGP
          NCWLNTILQL FRYVGEPFFD WVYDSPENLT LEAIEQLEEL TGLELHEGGP
          NCWLNTILQL FRYVDEPFFD WVYNSPENLT LEAIKQLEEL TGLELREGGP
          NCWLNTILQL FRYVDEPFFD WVYNSPENLT LEAIKQLEEL TGLELREGGP
          NCWLNAILQL FRYVEEPFFD WVYSTPENLT LEAIKQLEDL TGLELHEGGP
          NCWLNAILQL FRYVEEPFFD WVYSTPENLT LEAIKQLEDL TGLELHEGGP
          NCWLNAILQL FRYVEEPFFD WVYSSPENLT LEAIKQLEDL TGLELHEGGP
          NCWLNAILQL FRYVDEPFFE WVYDSPENLT VEAIRQLEEL TGLELHEGGP
          NCWLNAILQL FRYVDEPFFD WVYESPENLT IQAIGQLEEL TGLDLREGGP
          NCWLNTILQL FRYVDEPFFD WVYNSPENLT LRAIEQLEEL TGLELREGGP
          NCWLNTILQL FRYVDEPFFD WVYNSPENLT LQAIEQLEEL TGLELHEGGP
          NCWLNSLLQL FRYVDEPLFE SEYLSPENKT LDMIKQLSDY TKLDLSDGGP
          NCWLNSLLQL FRYVDEPLFE SEYLSPENKT LDMIKQLSDY TKLDLSDGGP
          NCWLNSLLQL FRYVDEPLFE SEYLSPENKT LDMIKRLSDY TKLDLSDGGP
          NCWLNSLLQL FRYVDEPLFE SEYLSPENKT LDMIKQLSDY TKLDLSDGGP
          NCWLNSLLQL FRYVDEPLFE SEYLSPENKT LDMIKQLSDY TKLDLSDGGP
          NCWLNSLLQL FRYVDEPLFE SEYLSPENKT LDMIKQLSDY TKLDLSDGGP
          NCWLNSLLQL FRYVDEPLFE SEYLSPENKT LDMIKQLSDY TKLDLSDGGP

          PALVIWNIKH LLQTGIGTAS RPAR-CMVDG TNMCLADFHA GIFLKEQEHA
          PALVIWNIKH LLHTGIGTAS RPSEVCMVDG TNMCLADFHA GIFLKGQEHA
          PALVIWNIKH LLHTGIGTAS RPSEVCMVDG TDMCLADFHA GIFMKGREHA
          PALVIWNIKH LLHTGIGTAS RPSEVCMVDG TDMCLADFHA GIFMKGQEHA
          PALVIWNIKH LLHTGIGTAS RPSEVCMVDG TDMCLADFHA GIFLKGQEHA
Sequential
18 206 YF
a121      MNTTNCFIAL VHAIREIRAF FLSRATG-KM EFTLYNGERK TFYSRPNNHD
          NCWLNTILQL FRYVDEPFFD WVYNSPENLT LAAIKQLEEL TGLELHEGGP
          PALVIWNIKH LLQTGIGTAS RPAR-CMVDG TNMCLADFHA GIFLKEQEHA
          VFACVTSNGW YAIDDEDFYP WTPDPSDVLV FVPYDQEPLN GGWKANVQRK
          LK---
a241      MNTTDCFIAL VTAIREIRAF FLPRATG-RM EFTLHNGERK VFYSRPNNHD
          NCWLNTILQL FRYVGEPFFD WVYDSPENLT LEAIEQLEEL TGLELHEGGP
          PALVIWNIKH LLHTGIGTAS RPSEVCMVDG TNMCLADFHA GIFLKGQEHA
          VFACVTSNGW YAIDDDDFYP WTPDPSDVLV FVPYDQEPLN GEWKTKVQQK
          LK---
c-s8c1    MNTTDCFIAV VNAIKEVRAL FLPRTAG-KM EFTLHDGEKK VFYSRPNNHD
          NCWLNTILQL FRYVDEPFFD WVYNSPENLT LEAIKQLEEL TGLELREGGP
          PALVIWNIKH LLHTGIGTAS RPSEVCMVDG TDMCLADFHA GIFMKGREHA
          VFACVTSNGW YAIDDEDFYP WTPDPSDVLV FVPYDQEPLN EGWKASVQRK
          LKGAGQ
c1nov     MNTTDCFIAV VNAIREIRAL FLPRTTG-KM EFTLHDGEKK VFYSRPNNHD
          NCWLNTILQL FRYVDEPFFD WVYNSPENLT LEAIKQLEEL TGLELREGGP
          PALVIWNIKH LLHTGIGTAS RPSEVCMVDG TDMCLADFHA GIFMKGQEHA
          VFACVTSNGW YAIDDEDFYP WTPDPSDVLV FVPYDQEPLN EGWKANVQRK
          LKGAGQ
o1brazl   MNTTDCFIAL VQAIREIKAL FLPRTTG-KM ELTLYNGEKK TFYSRPNNHD
          NCWLNAILQL FRYVEEPFFD WVYSTPENLT LEAIKQLEDL TGLELHEGGP
          PALVIWNIKH LLHTGIGTAS RPSEVCMVDG TDMCLADFHA GIFLKGQEHA
          VFACVTSNGW YAIDDEDFYP WTPDPSDVLV FVPYDQEPLN GEWKAKVQRK
          LK---
o1campos  MNTTDCFIAL VQAIREIKAL FLPRTTG-KM ELTLYNGEKK TFYSRPNNHD
          NCWLNAILQL FRYVEEPFFD WVYSTPENLT LEAIKQLEDL TGLELHEGGP
          PALVIWNIKH LLHTGIGTAS RPSEVCMVDG TDMCLADFHA GIFLKGQEHA
          VFAC
PDB File Format

COLUMNS   DATA TYPE      FIELD        DEFINITION
--------------------------------------------------------------------------------
 1 -  6   Record name    "ATOM  "
 7 - 11   Integer        serial       Atom serial number.
13 - 16   Atom           name         Atom name.
17        Character      altLoc       Alternate location indicator.
18 - 20   Residue name   resName      Residue name.
22        Character      chainID      Chain identifier.
23 - 26   Integer        resSeq       Residue sequence number.
27        AChar          iCode        Code for insertion of residues.
31 - 38   Real(8.3)      x            Orthogonal coordinates for X in Angstroms.
39 - 46   Real(8.3)      y            Orthogonal coordinates for Y in Angstroms.
47 - 54   Real(8.3)      z            Orthogonal coordinates for Z in Angstroms.
55 - 60   Real(6.2)      occupancy    Occupancy.
61 - 66   Real(6.2)      tempFactor   Temperature factor.
73 - 76   LString(4)     segID        Segment identifier, left-justified.
77 - 78   LString(2)     element      Element symbol, right-justified.
79 - 80   LString(2)     charge       Charge on the atom.
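Because every field sits in fixed columns, an ATOM record is read by slicing rather than by splitting on whitespace. The sketch below maps the 1-based column ranges in the table to 0-based Python slices; `parse_atom_record` is a name chosen for this illustration, and the sample record is assembled field by field from the first atom of the 1QU4 example so that the column positions are explicit.

```python
def parse_atom_record(line: str) -> dict:
    """Slice one ATOM record using the column ranges from the table."""
    line = line.ljust(80)  # pad short lines so every slice is valid
    return {
        "serial": int(line[6:11]),       # columns 7-11
        "name": line[12:16].strip(),     # columns 13-16
        "altLoc": line[16].strip(),      # column 17
        "resName": line[17:20].strip(),  # columns 18-20
        "chainID": line[21],             # column 22
        "resSeq": int(line[22:26]),      # columns 23-26
        "iCode": line[26].strip(),       # column 27
        "x": float(line[30:38]),         # columns 31-38
        "y": float(line[38:46]),         # columns 39-46
        "z": float(line[46:54]),         # columns 47-54
        "occupancy": float(line[54:60]),
        "tempFactor": float(line[60:66]),
        "element": line[76:78].strip(),  # columns 77-78
    }


# Build the record field by field so each value lands in its columns.
record = (
    "ATOM  " + "1".rjust(5) + " " + " N  " + " " + "ASP" + " " + "A"
    + "35".rjust(4) + " " + "   "
    + "34.731".rjust(8) + "-5.686".rjust(8) + "15.000".rjust(8)
    + "1.00".rjust(6) + "98.44".rjust(6) + " " * 10 + "N".rjust(2)
)
atom = parse_atom_record(record)
print(atom["name"], atom["resName"], atom["resSeq"], atom["x"], atom["y"], atom["z"])
```

Slicing matters here because adjacent fields may touch with no space between them, as in the CONECT and MASTER lines of the example.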
PDB Example
HEADER    LYASE                                   06-JUL-99   1QU4
TITLE     CRYSTAL STRUCTURE OF TRYPANOSOMA BRUCEI ORNITHINE
TITLE    2 DECARBOXYLASE
COMPND    MOL_ID: 1;
COMPND   2 MOLECULE: ORNITHINE DECARBOXYLASE;
COMPND   3 CHAIN: A, B, C, D;
COMPND   4 EC: 4.1.1.17;
COMPND   5 ENGINEERED: YES
SOURCE    MOL_ID: 1;
SOURCE   2 ORGANISM_SCIENTIFIC: TRYPANOSOMA BRUCEI;
SOURCE   3 EXPRESSION_SYSTEM: ESCHERICHIA COLI;
SOURCE   4 EXPRESSION_SYSTEM_COMMON: BACTERIA;
SOURCE   5 EXPRESSION_SYSTEM_STRAIN: B21/DG3;
SOURCE   6 EXPRESSION_SYSTEM_VECTOR_TYPE: PLASMID
KEYWDS    POLYAMINE METABOLISM, PYRIDOXAL 5'-PHOSPHATE, ALPHA-BETA
KEYWDS   2 BARREL, LYASE
EXPDTA    X-RAY DIFFRACTION
AUTHOR    N.V.GRISHIN,A.L.OSTERMAN,H.B.BROOKS,M.A.PHILLIPS,
AUTHOR   2 E.J.GOLDSMITH
REVDAT   2   29-DEC-99 1QU4    1       JRNL   COMPND REMARK
REVDAT   1   17-NOV-99 1QU4    0
JRNL        AUTH   N.V.GRISHIN,A.L.OSTERMAN,H.B.BROOKS,M.A.PHILLIPS,
JRNL        AUTH 2 E.J.GOLDSMITH
JRNL        TITL   X-RAY STRUCTURE OF ORNITHINE DECARBOXYLASE FROM
JRNL        TITL 2 TRYPANOSOMA BRUCEI: THE NATIVE STRUCTURE AND THE
JRNL        TITL 3 STRUCTURE IN COMPLEX WITH
JRNL        TITL 4 ALPHA-DIFLUOROMETHYLORNITHINE
JRNL        REF    BIOCHEMISTRY                  V. 38 15174 1999
JRNL        REFN   ASTM BICHAW US ISSN 0006-2960
REMARK   1
REMARK   2
REMARK   2 RESOLUTION. 2.90 ANGSTROMS.
REMARK
DBREF  1QU4 A    1   425  SWS    P07805   DCOR_TRYBB      21    445
DBREF  1QU4 B    1   425  SWS    P07805   DCOR_TRYBB      21    445
DBREF  1QU4 C    1   425  SWS    P07805   DCOR_TRYBB      21    445
DBREF  1QU4 D    1   425  SWS    P07805   DCOR_TRYBB      21    445
SEQRES   1 A  425  GLY ALA MET ASP ILE VAL VAL ASN ASP ASP LEU SER CYS
SEQRES   2 A  425  ARG PHE LEU GLU GLY PHE ASN THR ARG ASP ALA LEU CYS
SEQRES   3 A  425  LYS LYS ILE SER MET ASN THR CYS ASP GLU GLY ASP PRO
SEQRES   4 A  425  PHE PHE VAL ALA ASP LEU GLY ASP ILE VAL ARG LYS HIS
SEQRES   5 A  425  GLU THR TRP LYS LYS CYS LEU PRO ARG VAL THR PRO PHE
SEQRES   6 A  425  TYR ALA VAL LYS CYS ASN ASP ASP TRP ARG VAL LEU GLY
SEQRES   7 A  425  THR LEU ALA ALA LEU GLY THR GLY PHE ASP CYS ALA SER
SEQRES   8 A  425  ASN THR GLU ILE GLN ARG VAL ARG GLY ILE GLY VAL PRO
SEQRES   9 A  425  PRO GLU LYS ILE ILE TYR ALA ASN PRO CYS LYS GLN ILE
SEQRES  10 A  425  SER HIS ILE ARG TYR ALA ARG ASP SER GLY VAL ASP VAL
SEQRES  11 A  425  MET THR PHE ASP CYS VAL ASP GLU LEU GLU LYS VAL ALA
SEQRES  12 A  425  LYS THR HIS PRO LYS ALA LYS MET VAL LEU ARG ILE SER
SEQRES  13 A  425  THR ASP ASP SER LEU ALA ARG CYS ARG LEU SER VAL LYS
SEQRES  14 A  425  PHE GLY ALA LYS VAL GLU ASP CYS ARG PHE ILE LEU GLU
SEQRES  15 A  425  GLN ALA LYS LYS LEU ASN ILE ASP VAL THR GLY VAL SER
SEQRES  16 A  425  PHE HIS VAL GLY SER GLY SER THR ASP ALA SER THR PHE
SEQRES  17 A  425  ALA GLN ALA ILE SER ASP SER ARG PHE VAL PHE ASP MET
SEQRES  18 A  425  GLY THR GLU LEU GLY PHE ASN MET HIS ILE LEU ASP ILE
SEQRES  19 A  425  GLY GLY GLY PHE PRO GLY THR ARG ASP ALA PRO LEU LYS
SEQRES  20 A  425  PHE GLU GLU ILE ALA GLY VAL ILE ASN ASN ALA LEU GLU
SEQRES  21 A  425  LYS HIS PHE PRO PRO ASP LEU LYS LEU THR ILE VAL ALA
SEQRES  22 A  425  GLU PRO GLY ARG TYR TYR VAL ALA SER ALA PHE THR LEU
SEQRES  23 A  425  ALA VAL ASN VAL ILE ALA LYS LYS VAL THR PRO GLY VAL
SEQRES  24 A  425  GLN THR ASP VAL GLY ALA HIS ALA GLU SER ASN ALA GLN
SEQRES  25 A  425  SER PHE MET TYR TYR VAL ASN ASP GLY VAL TYR GLY SER
SEQRES  26 A  425  PHE ASN CYS ILE LEU TYR ASP HIS ALA VAL VAL ARG PRO
SEQRES  27 A  425  LEU PRO GLN ARG GLU PRO ILE PRO ASN GLU LYS LEU TYR
SEQRES  28 A  425  PRO SER SER VAL TRP GLY PRO THR CYS ASP GLY LEU ASP
SEQRES  29 A  425  GLN ILE VAL GLU ARG TYR TYR LEU PRO GLU MET GLN VAL
SEQRES  30 A  425  GLY GLU TRP LEU LEU PHE GLU ASP MET GLY ALA TYR THR
SEQRES  31 A  425  VAL VAL GLY THR SER SER PHE ASN GLY PHE GLN SER PRO
SEQRES  32 A  425  THR ILE TYR TYR VAL VAL SER GLY LEU PRO ASP HIS VAL
SEQRES  33 A  425  VAL ARG GLU LEU LYS SER GLN LYS SER
HET    PLP  A 600      15
HET    PLP  B 600      15
HET    PLP  C 600      15
HET    PLP  D 600      15
HETNAM     PLP PYRIDOXAL-5'-PHOSPHATE
HETSYN     PLP VITAMIN B6 COMPLEX
FORMUL   5  PLP    4(C8 H10 N1 O6 P1)
HELIX    1   1 LEU A   45  LEU A   59  1                                  15
HELIX    2   2 LYS A   69  ASN A   71  5                                   3
HELIX    3   3 ASP A   73  GLY A   84  1                                  12
HELIX    4   4 SER A   91  ILE A  101  1                                  11
HELIX    5   5 PRO A  104  GLU A  106  5                                   3
HELIX    6   6 GLN A  116  SER A  126  1                                  11
HELIX    7   7 CYS A  135  HIS A  146  1                                  12
HELIX    8   8 LYS A  173  GLU A  175  5                                   3
HELIX    9   9 ASP A  176  LEU A  187  1                                  12
HELIX   10  10 ALA A  205  LEU A  225  1                                  21
HELIX   11  11 LYS A  247  PHE A  263  1                                  17
HELIX   12  12 GLY A  276  ALA A  281  1                                   6
HELIX   13  13 PHE A  326  HIS A  333  1                                   8
HELIX   14  14 THR A  390  THR A  394  5                                   5
HELIX   15  15 SER A  396  PHE A  400  5                                   5
SHEET    1   A 6 GLN A 365  PRO A 373  0
SHEET    2   A 6 LEU A 350  TRP A 356 -1  N  TYR A 351   O  LEU A 372
SHEET    3   A 6 SER A 313  VAL A 318  1  O  PHE A 314   N  SER A 354
SHEET    4   A 6 PHE A 284  THR A 296 -1  N  ILE A 291   O  TYR A 317
SHEET    5   A 6 PHE A  40  ASP A  44 -1  O  PHE A  40   N  ALA A 287
SHEET    6   A 6 THR A 404  VAL A 408  1  O  THR A 404   N  PHE A  41
SHEET    1  A1 6 GLN A 365  PRO A 373  0
SHEET    2  A1 6 LEU A 350  TRP A 356 -1  N  TYR A 351   O  LEU A 372
SHEET    3  A1 6 SER A 313  VAL A 318  1  O  PHE A 314   N  SER A 354
SHEET    4  A1 6 PHE A 284  THR A 296 -1  N  ILE A 291   O  TYR A 317
SHEET    5  A1 6 TRP A 380  PHE A 383 -1  N  LEU A 381   O  VAL A 288
SHEET    6  A1 6 PRO A 338  PRO A 340 -1  O  LEU A 339   N  LEU A 382
CRYST1   66.800  151.700   85.350  90.00 102.30  90.00 P 1 21 1
ORIGX1      1.000000  0.000000  0.000000        0.00000
ORIGX2      0.000000  1.000000  0.000000        0.00000
ORIGX3      0.000000  0.000000  1.000000        0.00000
SCALE1      0.014970  0.000000  0.003264        0.00000
SCALE2      0.000000  0.006592  0.000000        0.00000
SCALE3      0.000000  0.000000  0.011992        0.00000
ATOM      1  N   ASP A  35      34.731  -5.686  15.000  1.00 98.44           N
ATOM      2  CA  ASP A  35      34.249  -5.884  13.629  1.00 98.39           C
ATOM      3  C   ASP A  35      33.320  -4.750  13.203  1.00 98.13           C
ATOM      4  O   ASP A  35      33.474  -3.594  13.603  1.00 98.29           O
ATOM      5  CB  ASP A  35      33.558  -7.247  13.545  1.00 98.38           C
ATOM      6  CG  ASP A  35      33.566  -7.887  12.170  1.00 98.36           C
ATOM      7  OD1 ASP A  35      33.717  -9.133  12.114  1.00 98.26           O
ATOM      8  OD2 ASP A  35      33.419  -7.182  11.148  1.00 98.39           O
ATOM      9  N   GLU A  36      32.332  -5.073  12.378  1.00 97.79           N
ATOM     10  CA  GLU A  36      31.446  -4.080  11.787  1.00 95.51           C
ATOM     11  C   GLU A  36      32.259  -2.944  11.199  1.00 90.65           C
ATOM     12  O   GLU A  36      32.220  -1.813  11.692  1.00 94.96           O
ATOM     13  CB  GLU A  36      30.419  -3.638  12.840  1.00 97.63           C
ATOM     14  CG  GLU A  36      29.111  -3.155  12.261  1.00 98.19           C
ATOM     15  CD  GLU A  36      27.791  -3.597  12.824  1.00 98.33           C
ATOM     16  OE1 GLU A  36      27.308  -4.727  12.601  1.00 98.28           O
ATOM     17  OE2 GLU A  36      27.115  -2.806  13.520  1.00 98.43           O
ATOM     18  N   GLY A  37      33.018  -3.192  10.131  1.00 52.86           N
ATOM     19  CA  GLY A  37      33.624  -2.167   9.299  1.00 39.88           C
ATOM     20  C   GLY A  37      32.598  -1.167   8.712  1.00 34.34           C
ATOM     21  O   GLY A  37      32.236  -1.162   7.531  1.00 31.44           O
ATOM     22  N   ASP A  38      32.135  -0.248   9.564  1.00 37.23           N
ATOM     23  CA  ASP A  38      31.136   0.700   9.138  1.00 36.44           C
ATOM     24  C   ASP A  38      31.794   1.722   8.228  1.00 33.49           C
ATOM     25  O   ASP A  38      33.029   1.896   8.156  1.00 34.06           O
ATOM     26  CB  ASP A  38      30.500   1.242  10.405  1.00 42.06           C
ATOM     27  CG  ASP A  38      29.583   0.207  11.047  1.00 44.59           C
ATOM     28  OD1 ASP A  38      29.408  -0.876  10.434  1.00 45.72           O
ATOM     38  CA  PHE A  40      32.728   6.727   7.615  1.00 20.51           C
...
CONECT1117911177
CONECT1118011177
MASTER      482    0    4   60   80    0    0    611176   64  132
END
Fields labelled on the ATOM records: Atom serial number, Name, Residue, Chain, Seq Nr, X, Y, Occupancy, Temp Factor, Element
CRYST1   66.800  151.700   85.350  90.00 102.30  90.00 P 1 21 1
ORIGX1      1.000000  0.000000  0.000000        0.00000
ORIGX2      0.000000  1.000000  0.000000        0.00000
ORIGX3      0.000000  0.000000  1.000000        0.00000
SCALE1      0.014970  0.000000  0.003264        0.00000
SCALE2      0.000000  0.006592  0.000000        0.00000
SCALE3      0.000000  0.000000  0.011992        0.00000
ATOM      1  N   ASP A  35      34.731  -5.686  15.000  1.00 98.44           N
ATOM      2  CA  ASP A  35      34.249  -5.884  13.629  1.00 98.39           C
ATOM      3  C   ASP A  35      33.320  -4.750  13.203  1.00 98.13           C
ATOM      4  O   ASP A  35      33.474  -3.594  13.603  1.00 98.29           O
ATOM      5  CB  ASP A  35      33.558  -7.247  13.545  1.00 98.38           C
ATOM      6  CG  ASP A  35      33.566  -7.887  12.170  1.00 98.36           C
ATOM      7  OD1 ASP A  35      33.717  -9.133  12.114  1.00 98.26           O
ATOM      8  OD2 ASP A  35      33.419  -7.182  11.148  1.00 98.39           O
ATOM      9  N   GLU A  36      32.332  -5.073  12.378  1.00 97.79           N
ATOM     10  CA  GLU A  36      31.446  -4.080  11.787  1.00 95.51           C
ATOM     11  C   GLU A  36      32.259  -2.944  11.199  1.00 90.65           C
ATOM     12  O   GLU A  36      32.220  -1.813  11.692  1.00 94.96           O
ATOM     13  CB  GLU A  36      30.419  -3.638  12.840  1.00 97.63           C
ATOM     14  CG  GLU A  36      29.111  -3.155  12.261  1.00 98.19           C
ATOM     15  CD  GLU A  36      27.791  -3.597  12.824  1.00 98.33           C
ATOM     16  OE1 GLU A  36      27.308  -4.727  12.601  1.00 98.28           O
ATOM     17  OE2 GLU A  36      27.115  -2.806  13.520  1.00 98.43           O
ATOM     18  N   GLY A  37      33.018  -3.192  10.131  1.00 52.86           N
ATOM     19  CA  GLY A  37      33.624  -2.167   9.299  1.00 39.88           C
ATOM     20  C   GLY A  37      32.598  -1.167   8.712  1.00 34.34           C
ATOM     21  O   GLY A  37      32.236  -1.162   7.531  1.00 31.44           O
ATOM     22  N   ASP A  38      32.135  -0.248   9.564  1.00 37.23           N
ATOM     23  CA  ASP A  38      31.136   0.700   9.138  1.00 36.44           C
ATOM     24  C   ASP A  38      31.794   1.722   8.228  1.00 33.49           C
ATOM     25  O   ASP A  38      33.029   1.896   8.156  1.00 34.06           O
ATOM     26  CB  ASP A  38      30.500   1.242  10.405  1.00 42.06           C
ATOM     27  CG  ASP A  38      29.583   0.207  11.047  1.00 44.59           C
ATOM     28  OD1 ASP A  38      29.408  -0.876  10.434  1.00 45.72           O
ATOM     38  CA  PHE A  40      32.728   6.727   7.615  1.00 20.51           C
...
CONECT1117911177
CONECT1118011177
MASTER      482    0    4   60   80    0    0    611176   64  132
END
File Format Conversions
Wide variety of formats

Common tools
readseq (all flavors of Unix)
1. IG/Stanford
2. GenBank/GB
3. NBRF
4. EMBL
5. GCG
6. DNAStrider
7. Fitch
8. Pearson/Fasta
9. Zuker (in-only)
10. Olsen (in-only)
11. Phylip3.2 (Sequential)
12. Phylip (Interleaved)
13. Plain/Raw
14. PIR/CODATA
15. MSF
16. ASN.1
17. PAUP/NEXUS
18. Pretty (out-only)
seqret (EMBOSS)
gcg GCG 9.x and 10.x format

embl
swiss
fasta
genbank
nbrf
pir NBRF (PIR)
codata CODATA format.
strider DNA strider format
clustal
phylip PHYLIP non-interleaved multiple alignment format.
acedb ACeDB format
msf Wisconsin Package GCG's MSF multiple sequence format.
hennig86 Hennig86 format
jackknifer Jackknifer format
jackknifernon Jackknifernon format
nexus
paup Nexus/PAUP format
treecon Treecon format
mega Mega format
ig IntelliGenetics format.
staden
text
Many GUI packages such as GCG SeqLab (Unix), BioEdit (Windows), etc. have built-in conversion utilities between different file formats
Forcon is handy for converting between phylogenetic multiple alignment formats
Structure file formats
Major formats
PDB Protein Data Bank
mol2 Tripos Sybyl
mmCIF - Macromolecular Crystallographic Information File
XYZ
Some packages can automatically convert between these formats
Babel
Alchemy
Biosym .CAR
Cambridge CADPAC
Chem3D Cartesian 2
CSD GSTAT
Feature
Gaussian Output
Gaussian 94 Output
Hyperchem HIN
Mac Molecule
MM2 Input
MMADS
Mopac Cartesian
PC Model
PS-GVB Output
ShelX
Spartan Semi-Empirical
Sybyl Mol2
XYZ
AMBER PREP
Boogie
CHARMm
CSD CSSR
Dock Database
Free Form Fractional
Gaussian Z-Matrix
GAMESS Output (A)
MDL Isis (SDF)
Macromodel
MM2 Output
MDL MOLfile
Mopac Internal
PDB
Quanta MSF
SMILES
Spartan Mol. Mechanics
Conjure
XED
Ball and Stick

Cacao Cartesian
Chem3D Cartesian 1
CSD FDAT
Dock PDB
GAMESS Output
Gaussian 92 Output
GROMOS96 (nm)
M3D
Micro World
MM3
MOLIN
Mopac Output
PS-GVB Input
Schakal
Spartan
Sybyl Mol
UniChem XYZ
Also has the ability to add and delete hydrogens

Available for Unix (AIX, Ultrix, Sun-OS, Convex, SGI, Cray, Linux), MS-DOS, and
on Macs running at least System 7.0.
babel -imm2out mm2.grf -omopint mopac.dat
Some programming tools for conversions
bioperl
use Bio::SeqIO;
$in = Bio::SeqIO->new(-file => "inputfilename" , '-format' => 'Fasta');
$out = Bio::SeqIO->new(-file => ">outputfilename" , '-format' => 'EMBL');
while ( my $seq = $in->next_seq() ) {
    $out->write_seq($seq);
}
or
use Bio::SeqIO;
$in = Bio::SeqIO->newFh(-file => "inputfilename" , '-format' => 'Fasta');
$out = Bio::SeqIO->newFh('-format' => 'EMBL');
# World's shortest Fasta<->EMBL format converter:
print $out $_ while <$in>;
biopython
Scanner - The part of the parser that actually does
the work of going through the file and extracting
useful information. This useful information is
converted into events.
Consumer - The consumer does the job of
processing the useful information and spitting it out
in a format that the programmer can use. The
consumer does this by receiving the events created
by the scanner.
You may be required to write your own scanner and
consumer for certain formats
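The scanner/consumer split can be sketched in a few lines. This is an illustration of the idea only, not Biopython's actual API: the class names `FastaScanner` and `DictConsumer` and the event names `title` and `sequence` are assumptions made for this example.

```python
class FastaScanner:
    """Goes through FASTA lines and turns what it finds into events."""

    def feed(self, lines, consumer):
        for line in lines:
            line = line.strip()
            if line.startswith(">"):
                consumer.title(line[1:].strip())  # event: a new record starts
            elif line:
                consumer.sequence(line)           # event: more sequence data


class DictConsumer:
    """Receives the scanner's events and builds a {title: sequence} dict."""

    def __init__(self):
        self.records = {}
        self._current = None

    def title(self, text):
        self._current = text
        self.records[text] = ""

    def sequence(self, text):
        if self._current is not None:
            self.records[self._current] += text


consumer = DictConsumer()
FastaScanner().feed([">seq1", "ACGT", "TTGA", ">seq2", "GGCC"], consumer)
print(consumer.records)
```

The point of the split is that a different consumer (for example one that only counts records) can be plugged in without touching the scanner.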
Translating nucleotide formats
Factors to take into account

Translate in all 6 reading frames
3 forward, 3 reverse
The use of non-standard genetic codes for
different organisms
Stop codons
Output format
1 letter
3 letter
EMBOSS
transeq
It can translate in any of the 3 forward or three reverse
sense frames, or in all three forward or reverse frames,
or in all six frames.
It can translate specified regions corresponding to the
coding regions of your sequences.
It can translate using the standard ('Universal') genetic
code and also with a selection of non-standard codes.
Termination (STOP) codons are translated as the
character '*'.
The output peptide sequence is always in the standard
one-letter IUPAC code.
prettyseq
This writes out a nicely formatted display of the
sequence with the translation (within specified ranges)
displayed beneath it.
Slightly unusually, this application uses the codon usage
tables to translate the codons
Web tools
Expasy translate tool
EBI translation machine
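The factors listed above (six reading frames, stop codons, one-letter output) can be illustrated with a short sketch. It uses only the standard genetic code, writes stop codons as '*' and unrecognised codons as 'X', and labels frames +1..+3 and -1..-3 by offset from the start of each strand; that labelling is this sketch's own convention and need not match the numbering transeq uses for reverse frames.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, ordered by first, second, third base over T, C, A, G.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq: str) -> str:
    """Translate complete codons only; stops become '*', unknown codons 'X'."""
    seq = seq.upper()
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frames(seq: str) -> dict:
    """Return translations for three forward and three reverse frames."""
    frames = {}
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(s[offset:])
    return frames


# First 30 bases of the CDS in the GenBank example (positions 181-210).
cds_start = "atgagccgaaagttctgcgtgggaggcaac"
for label, protein in six_frames(cds_start).items():
    print(label, protein)
```

Supporting a non-standard genetic code would only require swapping the `AMINO_ACIDS` string for the appropriate table.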
Viewers for sequencer data
abiview (EMBOSS)
Trev (Unix)
EditView (Mac)
Chromas (Windows)
AbiView (Windows)
Most viewers allow you to:

View the traces
Change the scale
Edit the basecalling
Preserve the original sequence
Export the data
Analysis of primary data from

sequencers
Staden Package (MRC-LMB)
Preparing sequence trace data for analysis

for assembly
pregap4
Graphical user interface

Prepare trace data
Automation
Trace format conversion
Quality analysis
Vector clipping
Contaminant screening
Repeat searching.
Assembly program
gap4
Assembly
Contig joining
Assembly checking
Repeat searching
Experiment suggestion
Read pair analysis
Contig editing
Graphical views of contigs
Database
Consed
Phred: base caller

Phrap: assembler
Consed: Editor and finishing program
Quality values
Phred designed for gel-based sequencers

Being checked for capillary data
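For reference, a Phred quality value Q is related to the estimated base-calling error probability P by Q = -10 log10(P), so Q20 corresponds to 1 error in 100 and Q30 to 1 in 1000. The helper names in this small sketch are chosen for illustration only.

```python
import math


def phred_to_error_probability(q: float) -> float:
    """Error probability implied by a Phred quality value."""
    return 10 ** (-q / 10)


def error_probability_to_phred(p: float) -> float:
    """Phred quality value implied by an error probability."""
    return -10 * math.log10(p)


for q in (10, 20, 30, 40):
    print(q, phred_to_error_probability(q))
```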
Finding open reading frames
GRAIL
Neural network
Combine evidence from 7 different statistical
measures
Frame bias
Periodicities
Fractal dimensions
Coding 6-tuples
In-frame 6-tuples
K-tuple commonality
Repetitive 6-tuple words
At each position of the sequence, info is
weighted, integrated and scored for ORF or
intergenic region
Organism/dataset specificity
Genscan
Statistics and probabilistic models of gene

structure
GeneWise
Comparison of translations with known

proteins
NetGene
Donor and acceptor sites
EMBOSS
getorf
plotorf
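The simplest form of ORF finding needs no statistics at all: scan each frame for a start codon followed in-frame by a stop codon. The sketch below does this for the three forward frames only. It is a naive illustration, not the method used by getorf, GRAIL or Genscan; `find_orfs` and its `min_codons` parameter are hypothetical names.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 10):
    """Return (start, end, frame) for ATG..stop ORFs on the forward strand.

    Coordinates are 1-based and inclusive; the end includes the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame ATG opens a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start + 1, i + 3, frame + 1))
                start = None  # look for the next ATG after this stop
    return orfs


toy = "CC" + "ATG" + "GCT" * 4 + "TAA" + "G"
print(find_orfs(toy, min_codons=3))
```

Running the same function on the reverse complement would cover the other three frames.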
Determining protein and DNA

characteristics
Web
BCM Search Launcher
Nucleic acid sequence searches
General protein sequence/pattern searches
Species-Specific protein sequence searches
Multiple sequence alignments
Pairwise sequence alignments
Gene feature searches
Sequence utilities
Protein secondary structure prediction
SMART
Protein domain and feature analysis
Pfam
HMM-based protein motif searches
Prosite
Detects signature motifs in proteins
Regular expression searches
Scan sequences against database
Prints
Protein fingerprints
EMBOSS DNA
cpgplot plots cpg rich areas
restrict restriction sites
tfscan transcription factors
einverted find inverted repeats
chips codon usage
geecee GC content
EMBOSS protein
garnier - predicts protein secondary structure

helixturnhelix - report nucleic acid binding motifs
hmoment - hydrophobic moment calculation
pepcoil - predicts coiled coil regions
pepnet - displays proteins as a helical net
pepwheel - shows protein sequences as helices
tmap - displays membrane spanning regions
topo - draws an image of a transmembrane protein
charge - protein charge plot
checktrans - reports STOP codons and ORF statistics of a protein
sequence
compseq - counts the composition of dimer/trimer/etc words in a
sequence
iep - calculates the isoelectric point of a protein
octanol - displays protein hydropathy
pepinfo - plots simple amino acid properties in parallel
pepstats - protein statistics
pepwindow - displays protein hydropathy
antigenic - finds antigenic sites in proteins
pscan - scans proteins using PRINTS
sigcleave - reports protein signal cleavage sites
Primer Design
Factors
Melting point
Length
Composition
Methods for calculating melting point
Internal stability
Specificity
False priming sites
Internal stability
Hairpin structures
Compatibility
Primer dimers
Compatible melting points
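Two of the factors above, composition and melting point, are easy to compute for a candidate primer. The sketch below uses the simple Wallace rule, Tm = 2(A+T) + 4(G+C), which is only a rough estimate for short oligonucleotides; it is not the nearest-neighbour method, so its values will differ from those reported by dedicated primer design programs. The function names are chosen for illustration.

```python
def gc_percent(primer: str) -> float:
    """Percentage of G and C bases in the primer."""
    p = primer.upper()
    return 100.0 * (p.count("G") + p.count("C")) / len(p)


def wallace_tm(primer: str) -> int:
    """Rough melting temperature in degrees C by the Wallace rule."""
    p = primer.upper()
    at = p.count("A") + p.count("T")
    gc = p.count("G") + p.count("C")
    return 2 * at + 4 * gc


left_primer = "AAGAGTCTGGGGGAGCTGAT"
print(len(left_primer), gc_percent(left_primer), wallace_tm(left_primer))
```

Comparing the estimates for a left and right primer gives a quick check on whether their melting points are compatible.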
OLIGO Package
Nearest neighbour method for Tm
calculation
Comprehensive analysis suite
$$$
CODEHOP
COnsensus-DEgenerate Hybrid Oligonucleotide Primer
PCR primers designed from protein multiple sequence
alignments
Primer3
You provide the target sequence
It picks primers for PCR reactions, considering
as criteria:
Oligonucleotide melting temperature

Size
GC content
primer-dimer possibilities
PCR product size
Positional constraints within the source sequence
Miscellaneous other constraints.
                 start  len     tm    gc%   any    3'  seq
1 LEFT PRIMER       66   20  60.22  55.00  5.00  2.00  AAGAGTCTGGGGGAGCTGAT
  RIGHT PRIMER     259   20  60.19  50.00  4.00  2.00  ATCATTGCTGGGCTGATCTC
PRODUCT SIZE: 194, PAIR ANY COMPL: 4.00, PAIR 3' COMPL: 2.00
2 LEFT PRIMER      331   20  60.25  45.00  5.00  2.00  AGCTCATTGGGCAAAAAGTG
  RIGHT PRIMER     529   20  59.55  55.00  2.00  1.00  CCAGTTCCAATAGCCCAGAC
3 LEFT PRIMER      331   20  60.25  45.00  5.00  2.00  AGCTCATTGGGCAAAAAGTG
  RIGHT PRIMER     538   20  60.12  45.00  3.00  2.00  GCAGTTTTGCCAGTTCCAAT
4 LEFT PRIMER      379   20  59.67  50.00  4.00  2.00  TCATCGCCTGTATTGGTGAG
  RIGHT PRIMER     578   20  60.44  50.00  6.00  2.00  GCGGAGTTTCTTGTGCACTT
Statistics
Columns: considered, too many Ns, in target, in excl reg, bad GC%, no GC clamp, tm too low, tm too high, high any compl, high 3' compl, poly X, high end stab, ok
Left 4198 810 2322 17 65 86 898
Right 4172 807 2281 83 994
Pair Stats:
considered 811, unacceptable product size 422, high any compl 1, high end compl 33, ok 355
