Project Report On Influenza Virus
Preface
Chapter 1: Introduction
Chapter 4: Conclusion
Chapter 6: Abbreviation
Appendix
Preface
Viruses are masters of interspecies navigation. Mutating rapidly and often grabbing the genetic material of other
viruses, they can jump from animals to humans with a quick flick of their DNA. Sometimes, as in West Nile
fever, the transfer occurs through an intermediate host such as a mosquito. But viruses can also make the leap
directly.
Since the 1980s, the list of diseases that have hitchhiked directly from animals to people has grown
rapidly: hantavirus, SARS, monkeypox and, most recently, avian influenza, commonly called bird flu. With
the exception of HIV/AIDS, perhaps none of these illnesses has more potential to create widespread harm than
bird flu does.
In people, bird flu usually begins much like conventional influenza, with fever, cough, sore throat and
muscle aches, but bird flu can lead to life-threatening complications.
So far, bird flu is hard for humans to contract, but health officials warn a major flu outbreak could occur if the
virus mutates into a form that can spread easily from person to person. The grimmest scenario would be a global
epidemic to rival the flu pandemic of 1918 and 1919, which claimed millions of lives worldwide. In the
meantime, researchers are trying to sort out options for a vaccine. Bird flu seems to be developing resistance to
the flu drug Tamiflu, and a French vaccine maker has produced a bird flu vaccine that promoted an immune
system response but still needs further study.
A great deal of research is under way on the bird flu virus, which motivated the choice of this topic: a
phylogenetic analysis of different strains of influenza A virus.
The present work aims to analyze the phylogenetic relationships among different strains of influenza A virus and
to examine the causes of virulence in the light of evolution. We compare the evolutionary positions of the strains
in the phylogenetic trees, taking five different types of strains, each classified as either HPAI or LPAI. The study
sheds some light on the evolutionary relationships among influenza A strains, for a better understanding of how
pathogenesis evolves through antigenic drift and shift. We also aim to model the unknown protein structure of
influenza A virus and to find an appropriate drug target for it.
Gene sequences for strains of influenza A virus were collected from the NCBI site for phylogenetic analysis,
yielding nearly 40 sequences. The sequences were aligned with the CLUSTAL W package to assess their similarity
and relationships. Using the CLUSTAL W output as the input to PHYLIP, we performed the phylogenetic analysis,
and then used NJplot to visualize the tree constructed by PHYLIP.
Next, we collected protein sequences of the H5N1 strain of influenza A virus from the NCBI website, obtaining
nearly 10 sequences. Only H5N1 protein sequences were used for modelling, which was carried out on the
SWISS-MODEL server.
We then proceeded to docking to continue the structure prediction analysis. We found that the neuraminidase
protein of influenza A virus, encoded by the NA gene, is one of the reasons for its pathogenicity. We identified
an appropriate ligand from PDBsum or CSA and then performed docking with the HEX software.
We also performed family analysis with GENSCAN and ORF Finder to find consensus sequences, motifs, exons,
introns, ORFs, and so on.
Chapter 1: Introduction
Influenza viruses that infect birds are called avian influenza viruses. Only influenza A viruses infect birds, and
all known subtypes of influenza A virus can infect birds. However, there are substantial genetic differences
between the subtypes that typically infect people and those that typically infect birds. Avian influenza A H5 and
H7 viruses can be distinguished as low pathogenic and highly pathogenic forms on the basis of genetic features of the virus.
Influenza A virus, the virus that causes avian flu. Transmission electron micrograph of negatively stained
virus particles in late passage. (Source: Dr. Erskine Palmer, Centers for Disease Control and Prevention
Public Health Image Library)
Avian influenza is a disease of birds caused by influenza viruses closely related to human influenza viruses.
Transmission to humans in close contact with poultry or other birds occurs rarely and only with some strains of
avian influenza. The potential for transformation of avian influenza into a form that both causes severe disease in
humans and spreads easily from person to person is a great concern for world health.
Wild birds are the natural host for all known subtypes of influenza A viruses. Typically, wild birds do not become
sick when they are infected with avian influenza A viruses. However, domestic poultry, such as turkeys and
chickens, can become very sick and die from avian influenza, and some avian influenza A viruses also can cause
serious disease and death in wild birds.
Three distinct types of influenza virus, dubbed A, B, and C, have been identified. Together these viruses, which
are antigenically distinct from one another, comprise their own viral family, Orthomyxoviridae. Most cases of
the flu, especially those that occur in epidemics or pandemics, are caused by the influenza A virus, which can
affect a variety of animal species, but the B virus, which normally is only found in humans, is responsible for
many localized outbreaks. The influenza C virus is morphologically and genetically different from the other two
viruses and is generally nonsymptomatic, so it is of little medical concern.
Influenza Type A
Influenza type A viruses can infect people, birds, pigs, horses, seals, whales, and other animals, but wild birds are
the natural hosts for these viruses. Influenza type A viruses are divided into subtypes based on two proteins on the
surface of the virus. These proteins are called hemagglutinin (HA) and neuraminidase (NA). There are 15
different HA subtypes and 9 different NA subtypes. Many different combinations of HA and NA proteins are
possible. Only some influenza A subtypes (i.e., H1N1, H1N2, and H3N2) are currently in general circulation
among people. Other subtypes are found most commonly in other animal species. For example, H7N7 and H3N8
viruses cause illness in horses and dogs.
Subtypes of influenza A virus are named according to their HA and NA surface proteins. For example, an “H7N2
virus” designates an influenza A subtype that has an HA 7 protein and an NA 2 protein. Similarly, an “H5N1” virus
has an HA 5 protein and an NA 1 protein.
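The naming rule above is mechanical enough to sketch in a few lines of Python. This is only an illustration: the function names are hypothetical, and the subtype counts (15 HA, 9 NA) are simply the figures quoted in this section, not an authoritative catalogue.

```python
import re
from itertools import product

# Counts as quoted in the text above (assumption: other sources may list more).
HA_SUBTYPES = range(1, 16)   # H1 .. H15
NA_SUBTYPES = range(1, 10)   # N1 .. N9


def parse_subtype(name):
    """Split a subtype label such as 'H5N1' into its (HA, NA) numbers."""
    match = re.fullmatch(r"H(\d+)N(\d+)", name.strip().upper())
    if match is None:
        raise ValueError(f"not an influenza A subtype label: {name!r}")
    return int(match.group(1)), int(match.group(2))


def all_subtype_labels():
    """List every HA/NA combination implied by the counts above."""
    return [f"H{h}N{n}" for h, n in product(HA_SUBTYPES, NA_SUBTYPES)]


if __name__ == "__main__":
    print(parse_subtype("H7N2"))       # (7, 2)
    print(len(all_subtype_labels()))   # 135
```

With the counts used here, 15 x 9 = 135 combinations are possible in principle, although only a few circulate in any one host species.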
Influenza Type B
Influenza B viruses are normally found only in humans. Unlike influenza A viruses, these viruses are not
classified according to subtype. Although influenza type B viruses can cause human epidemics, they have not
caused pandemics.
Influenza Type C
Influenza type C viruses cause mild illness in humans and do not cause epidemics or pandemics. These viruses are
not classified according to subtype.
Strains
Influenza B viruses and subtypes of influenza A virus are further characterized into strains. There are many
different strains of influenza B viruses and of influenza A subtypes. New strains of influenza viruses appear and
replace older strains. This process occurs through a type of change called “drift” (see the discussion of
antigenic drift and shift). When a new strain of human influenza virus emerges, antibody protection that may
have developed after infection or vaccination with an older strain may not provide protection against the new
strain. Thus, the influenza vaccine is updated on a yearly basis to keep up with the changes in influenza viruses.
Subtypes
Influenza A viruses are significant for their potential for disease and death in humans and other animals. Influenza
A virus subtypes that have been confirmed in humans, in order of the number of known human pandemic deaths
that they have caused, include:
• H1N1, which caused "Spanish Flu" and currently causes seasonal human flu
• H2N2, which caused "Asian Flu"
• H3N2, which caused "Hong Kong Flu" and currently causes seasonal human flu
• H5N1, the world's major current pandemic threat
• H9N2, which has infected three people
Only influenza A viruses infect birds. Wild birds are the natural host for all subtypes of influenza A virus.
Typically wild birds do not get sick when they are infected with influenza virus. However, domestic poultry, such
as turkeys and chickens, can get very sick and die from avian influenza, and some avian viruses also can cause
serious disease and death in wild birds.
Structure
The structure of the influenza virus (see Figure 1) is somewhat variable, but the virion particles are usually
spherical or ovoid in shape and 80 to 120 nanometres in diameter. Sometimes filamentous forms of the virus
occur as well, and are more common among some influenza strains than others. The influenza virion is an
enveloped virus that derives its lipid bilayer from the plasma membrane of a host cell. Two different varieties of
glycoprotein spike are embedded in the envelope. Approximately 80 percent of the spikes are hemagglutinin, a
trimeric protein that functions in the attachment of the virus to a host cell. The remaining 20 percent or so of the
glycoprotein spikes consist of neuraminidase, which is thought to be predominantly involved in facilitating the
release of newly produced virus particles from the host cell. On the inner side of the envelope that surrounds an
influenza virion is an antigenic matrix protein lining. Within the envelope is the influenza genome, which is
organized into eight pieces of single-stranded RNA (A and B forms only; influenza C has 7 RNA segments). The
RNA is packaged with nucleoprotein into a helical ribonucleoprotein form, with three polymerase peptides for
each RNA segment.
Diagrammatic representation of the morphology of an influenza virion.
o NA codes for neuraminidase, which is an antigenic glycoprotein enzyme, found on the surface
of the influenza viruses. It helps the release of progeny viruses from infected cells.
• Internal viral protein encoding gene segments (RNA molecule): (M, NP, NS, PA, PB1, PB2)
o M codes for the matrix proteins (M1 and M2), which along with the two surface proteins
(hemagglutinin and neuraminidase) make up the capsid (protective coat) of the virus. The two
proteins are encoded using different reading frames from the same RNA segment.
M1 is a protein that binds to the viral RNA.
M2 is a protein that uncoats the virus, exposing its contents (the eight RNA segments)
to the cytoplasm of the host cell. The M2 transmembrane protein is an ion channel
required for efficient infection.
Nucleoprotein encoding gene segments
o PA codes for the PA protein which is a critical component of the viral polymerase.
o PB1 codes for the PB1 protein and the PB1-F2 protein.
o PB2 codes for the PB2 protein which is a critical component of the viral polymerase.
Influenza viruses can change in two different ways. One is called "antigenic drift." These are small changes in
the virus that happen continually over time. Antigenic drift produces new virus strains that may not be
recognized by the body's immune system. This process works as follows: a person infected with a particular flu
virus strain develops antibody against that virus. As newer virus strains appear, the antibodies against the older
strains no longer recognize the "newer" virus, and reinfection can occur. This is one of the main reasons why
people can get the flu more than one time. In most years, one or two of the three virus strains in the influenza
vaccine are updated to keep up with the changes in the circulating flu viruses. So, people who want to be
protected from flu need to get a flu shot every year.
The other type of change is called "antigenic shift." Antigenic shift is an abrupt, major change in the influenza A
viruses, resulting in a new hemagglutinin, or a new hemagglutinin together with a new neuraminidase, in influenza
viruses that infect humans. Shift results in a new influenza A subtype. When shift happens, most people have
little or no protection against the new virus. While influenza viruses are changing by antigenic drift all the time,
antigenic shift happens only occasionally. Type A viruses undergo both kinds of changes; influenza type B
viruses change only by the more gradual process of antigenic drift.
Influenza viruses are dynamic and are continuously evolving. Influenza viruses can change in two different
ways: antigenic drift and antigenic shift. Influenza viruses are changing by antigenic drift all the time, but
antigenic shift happens only occasionally. Influenza type A viruses undergo both kinds of changes; influenza
type B viruses change only by the more gradual process of antigenic drift.
Genetics of Drift and Shift:
Antigenic drift refers to small, gradual changes that occur through point mutations in the two genes that contain
the genetic material to produce the main surface proteins, hemagglutinin, and neuraminidase. These point
mutations occur unpredictably and result in minor changes to these surface proteins. Antigenic drift produces
new virus strains that may not be recognized by antibodies to earlier influenza strains. This process works as
follows: a person infected with a particular influenza virus strain develops antibody against that strain. As newer
virus strains appear, the antibodies against the older strains might not recognize the "newer" virus, and infection
with a new strain can occur. This is one of the main reasons why people can become infected with influenza
viruses more than one time and why global surveillance is critical to monitor the evolution of human
influenza virus strains and select which strains should be included in the annual production of influenza
vaccine. In most years, one or two of the three virus strains in the influenza vaccine are updated to keep up with
the changes in the circulating influenza viruses. For this reason, people who want to be immunized against
influenza need to be vaccinated every year.
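The gradual accumulation of point mutations described above can be sketched as a toy simulation. This is a minimal illustration under loud assumptions: the 20-base sequence is made up, mutations are drawn uniformly at random, and no selection or real mutation rate is modelled.

```python
import random

BASES = "ACGU"  # influenza genomes are RNA


def point_mutate(seq, n_mutations, rng):
    """Return a copy of seq after n_mutations random single-base substitutions."""
    bases = list(seq)
    for _ in range(n_mutations):
        pos = rng.randrange(len(bases))
        # Always pick a different base so every event changes the chosen site.
        bases[pos] = rng.choice([b for b in BASES if b != bases[pos]])
    return "".join(bases)


def hamming(a, b):
    """Count the positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


if __name__ == "__main__":
    rng = random.Random(0)              # fixed seed so runs are repeatable
    ancestor = "AUGGCUAACCGUUAGCAUGC"   # hypothetical fragment, not real HA/NA data
    strain = ancestor
    for season in range(1, 6):
        strain = point_mutate(strain, 2, rng)
        print(season, strain, hamming(ancestor, strain))
```

The printed distance from the ancestor tends to grow from season to season, which is the sense in which antibodies raised against an older strain gradually stop matching newer ones.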
Antigenic shift refers to an abrupt, major change that produces a novel influenza A virus subtype in humans, one
that was not previously circulating among people (see Influenza Type A and Subtypes above). Antigenic shift can
occur either through direct animal (poultry)-to-human transmission or through
mixing of human influenza A and animal influenza A virus genes to create a new human influenza A subtype
virus through a process called genetic reassortment. Antigenic shift results in a new human influenza A subtype.
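A global influenza pandemic (worldwide spread) may occur if three conditions are met:
• A new subtype of influenza A virus is introduced into the human population.
• The virus causes serious illness in humans.
• The virus can spread easily from person to person in a sustained manner.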
Diagrammatic representation of Antigenic shift
Low Pathogenic versus Highly Pathogenic Avian Influenza A Viruses
Avian influenza A virus strains are further classified as low pathogenic (LPAI) or highly pathogenic (HPAI) on
the basis of specific molecular genetic and pathogenesis criteria that require specific testing. Most avian
influenza A viruses are LPAI viruses that are usually associated with mild disease in poultry. In contrast, HPAI
viruses can cause severe illness and high mortality in poultry. More recently, some HPAI viruses (e.g., H5N1)
have been found to cause no illness in some poultry, such as ducks. LPAI viruses have the potential to evolve
into HPAI viruses and this has been documented in some poultry outbreaks. Avian influenza A viruses of the
subtypes H5 and H7, including H5N1, H7N7, and H7N3 viruses, have been associated with HPAI, and human
infections with these viruses have ranged from mild (H7N3, H7N7) to severe and fatal disease (H7N7, H5N1).
Human illness due to infection with LPAI viruses has been documented, ranging from very mild symptoms (e.g.,
conjunctivitis) to influenza-like illness. Examples of LPAI viruses that have infected humans include H7N7,
H9N2, and H7N2.
In general, direct human infection with avian influenza viruses occurs very infrequently, and has been associated
with direct contact (e.g., touching) with sick or dead infected birds (domestic poultry).
Mutation
Influenza viruses have a relatively high mutation rate that is characteristic of RNA viruses. The H5N1 virus has
mutated into a variety of types with differing pathogenic profiles; some pathogenic to one species but not
others, some pathogenic to multiple species. The ability of various influenza strains to show species-selectivity
is largely due to variation in the hemagglutinin genes. Genetic mutations in the hemagglutinin
gene that cause single amino acid substitutions can significantly alter the ability of viral hemagglutinin
proteins to bind to receptors on the surface of host cells. Such mutations in avian H5N1 viruses can change
virus strains from being inefficient at infecting human cells to being as efficient in causing human
infections as more common human influenza virus types. This doesn't mean one amino acid substitution
can cause a pandemic but it does mean one amino acid substitution can cause an avian flu virus that is not
pathogenic in humans to become pathogenic in humans.
H3N2 ("swine flu") is endemic in pigs in China,
and has been detected in pigs in Vietnam, increasing fears of the emergence of new variant strains. The
dominant strain of annual flu virus in January 2006 was H3N2, which is now resistant to the standard
antiviral drugs amantadine and rimantadine. The possibility of H5N1 and H3N2 exchanging genes through
reassortment is a major concern. If a reassortment in H5N1 occurs, it might remain an H5N1 subtype, or it
could shift subtypes, as H2N2 did when it evolved into the Hong Kong Flu strain of H3N2. Both the H2N2
and H3N2 pandemic strains contained avian flu virus RNA segments. "While the pandemic human
influenza viruses of 1957 (H2N2) and 1968 (H3N2) clearly arose through reassortment between human
and avian viruses, the influenza virus causing the 'Spanish flu' in 1918 appears to be entirely derived from
an avian source".
Influenza A viruses have infected many different animals, including ducks, chickens, pigs, whales, horses, and
seals. However, certain subtypes of influenza A virus are specific to certain species, except for birds, which are
hosts to all known subtypes of influenza A. Subtypes that have caused widespread illness in people either in the
past or currently are H3N2, H2N2, H1N1, and H1N2. H1N1 and H3N2 subtypes also have caused outbreaks in
pigs, and H7N7 and H3N8 viruses have caused outbreaks in horses.
Influenza A viruses normally seen in one species sometimes can cross over and cause illness in another species.
For example, until 1998, only H1N1 viruses circulated widely in the U.S. pig population. However, in 1998,
H3N2 viruses from humans were introduced into the pig population and caused widespread disease among pigs.
Most recently, H3N8 viruses from horses have crossed over and caused outbreaks in dogs.
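Avian influenza A viruses may be transmitted from animals to humans in two main ways:
• Directly from birds or from avian virus-contaminated environments to people.
• Through an intermediate host, such as a pig.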
Influenza A viruses have eight separate gene segments. The segmented genome allows influenza A viruses from
different species to mix and create a new influenza A virus if viruses from two different species infect the same
person or animal. For example, if a pig were infected with a human influenza A virus and an avian influenza A
virus at the same time, the new replicating viruses could mix existing genetic information (reassortment) and
produce a new virus that had most of the genes from the human virus, but a hemagglutinin and/or neuraminidase
from the avian virus. The resulting new virus might then be able to infect humans and spread from person to
person, but it would have surface proteins (hemagglutinin and/or neuraminidase) not previously seen in influenza
viruses that infect humans.
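The reassortment scenario above can be made concrete by enumerating the possibilities. The sketch below assumes a co-infection by exactly two parent viruses, labelled "human" and "avian" for illustration, and treats every segment as free to come from either parent; real reassortment is not uniformly random, so this counts possibilities only. The segment names follow the standard eight-segment list.

```python
from itertools import product

# The eight genome segments of influenza A.
SEGMENTS = ("PB2", "PB1", "PA", "HA", "NP", "NA", "M", "NS")


def reassortants(parent_a="human", parent_b="avian"):
    """Enumerate every way of drawing each segment from one of two parents."""
    return [
        dict(zip(SEGMENTS, origins))
        for origins in product((parent_a, parent_b), repeat=len(SEGMENTS))
    ]


def is_shift_candidate(genotype, backbone="human", donor="avian"):
    """True if all internal genes come from `backbone` but HA and/or NA from `donor`."""
    internal = [s for s in SEGMENTS if s not in ("HA", "NA")]
    surface_from_donor = genotype["HA"] == donor or genotype["NA"] == donor
    return surface_from_donor and all(genotype[s] == backbone for s in internal)


if __name__ == "__main__":
    combos = reassortants()
    print(len(combos))                                   # 256
    print(sum(is_shift_candidate(g) for g in combos))    # 3
```

Of the 2^8 = 256 segment combinations, two are simply the parents themselves. Three match the pattern described in the text: a human-virus backbone carrying an avian hemagglutinin, an avian neuraminidase, or both.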
This type of major change in the influenza A viruses is known as antigenic shift. Antigenic shift results when a
new influenza A subtype to which most people have little or no immune protection infects humans. If this new
virus causes illness in people and can be transmitted easily from person to person, an influenza pandemic can
occur.
It is possible that the process of genetic reassortment could occur in a human who is co-infected with avian
influenza A virus and a human strain of influenza A virus. The genetic information in these viruses could reassort
to create a new virus with a hemagglutinin from the avian virus and other genes from the human virus.
Theoretically, influenza A viruses with a hemagglutinin against which humans have little or no immunity that
have reassorted with a human influenza virus are more likely to result in sustained human-to-human transmission
and pandemic influenza. Therefore, careful evaluation of influenza viruses recovered from humans who are
infected with avian influenza is very important to identify reassortment if it occurs.
Although it is unusual for people to get influenza virus infections directly from animals, sporadic human
infections and outbreaks caused by certain avian influenza A viruses and pig influenza viruses have been reported.
These sporadic human infections and
outbreaks, however, rarely result in sustained transmission among humans.
Symptoms in humans
Avian influenza hemagglutinins bind alpha 2-3 sialic acid receptors, while human influenza hemagglutinins bind
alpha 2-6 sialic acid receptors. Other differences usually exist as well. There is as yet no human form of
H5N1, so all humans who have caught it so far have caught avian H5N1.
Humans who catch a humanized Influenza A virus (in other words a human flu virus of type A) usually have
symptoms that include fever, cough, sore throat, muscle aches, conjunctivitis and, in severe cases, severe
breathing problems and pneumonia that may be fatal. The severity of the infection will depend to a large part on
the state of the infected person's immune system and if the victim has been exposed to the strain before, and is
therefore partially immune. No one knows if these or other symptoms will be the symptoms of a humanized H5N1
flu.
Highly pathogenic H5N1 avian flu in a human is far worse, killing about 50% of the people known to have caught it. In one case, a boy
with H5N1 experienced diarrhea followed rapidly by a coma without developing respiratory or flu-like symptoms.
There have been studies of the levels of cytokines in humans infected by the H5N1 flu virus. Of particular concern
are elevated levels of tumor necrosis factor alpha (TNFα), a protein that is associated with tissue destruction at
sites of infection and increased production of other cytokines. Flu virus-induced increases in the level of cytokines
are also associated with flu symptoms including fever, chills, vomiting and headache. Tissue damage associated
with pathogenic flu virus infection can ultimately result in death. The inflammatory cascade triggered by H5N1
has been called a 'cytokine storm' by some, because of what seems to be a positive feedback process of damage to
the body resulting from immune system stimulation. H5N1 type flu virus induces higher levels of cytokines than
the more common flu virus types such as H1N1.
PREVENTION
Vaccines
A new vaccine is formulated annually with the types and strains of influenza predicted to be the major problems
for that year (predictions are based on worldwide monitoring of influenza). The vaccine is multivalent; the
current one covers two strains of influenza A and one of influenza B. The vaccine given to adults at present is an
inactivated preparation of egg-grown virus. It is contraindicated for those with allergies to eggs. It has a
short-lived protective effect and so is usually given in the fall (figure 11) so that protection is high in December/January,
the usual peak months for flu in the northern hemisphere. It needs to be given every year since, besides the
short-lived nature of the protection, the most effective strains for the vaccine will change due to drift or shift. Only
certain formulations of the vaccine are approved for young children. Previously, a subunit vaccine was
recommended.
In 2003, a live, attenuated (much less pathogenic than wild-type virus) vaccine (marketed as FluMist) was
approved for use in the United States. It is only approved for healthy individuals (those not at risk for
complications from influenza infection) from five to forty-nine years of age. It is given nasally and should provide
mucosal, humoral and cell-mediated immunity. In this vaccine, the vaccine virus is a cold-adapted strain which
can grow in the upper respiratory tract where it is cooler, but grows poorly in the lower respiratory tract. It is
attenuated due to multiple changes in the various genome segments. Reassortment is used to generate viruses
which have six gene segments from the attenuated virus and the HA and NA coding segments from the virus
which is likely to be a problem in the up-coming influenza season. A reassortant is generated for each strain
expected to be a problem. Since this is a live vaccine, given intranasally as a spray, it generates an IgA response
and an IgM/G response. FluMist vaccine virus is also grown on eggs and so is contraindicated for people with an
egg allergy. Since this is a live viral vaccine, it is also contraindicated for children and young adolescents on any
therapy containing aspirin due to the potential risk of Reye's syndrome.
The CDC recommends: “Physicians should administer influenza vaccine to any person who wishes to reduce the
likelihood of becoming ill with influenza (the vaccine can be administered to children as young as 6 months).
Persons who provide essential community services should be considered for vaccination to minimize disruption of
essential activities during influenza outbreaks. Students or other persons in institutional settings (e.g., those who
reside in dormitories) should be encouraged to receive vaccine to minimize the disruption of routine activities
during epidemics.”
Chemotherapy
Rimantadine and amantadine block virus entry across the endosome and also interfere with virus release (see
anti-viral chemotherapy section). They are good prophylactic agents for influenza A, but there are some problems in
taking them on a long term basis. They may be given as protective agents during an outbreak, especially to those
at severe risk and key personnel. They may also be given at the time of vaccination for a few weeks, until the
humoral response has time to develop. (There is some evidence that these drugs can help prevent more serious
complications if given early in infection.)
Two neuraminidase inhibitors have recently been approved by the FDA (zanamivir [Relenza] and oseltamivir [Tamiflu]).
They are active against influenza A and influenza B. These drugs can reduce the duration of uncomplicated
influenza (by approximately 1 day). Oseltamivir is approved for prophylaxis as well as treatment. At the moment,
zanamivir is only approved for treatment but trials indicate it is probably as effective as oseltamivir in
prophylaxis.
As yet there are no clear data on the ability of any of these drugs to reduce serious complications when used to
treat influenza (as contrasted with when they are used prophylactically).
The best treatments are rest, liquids, and anti-febrile agents (not aspirin in the young or adolescent, since Reye's
syndrome is a potential problem). Be aware of and treat complications appropriately.
There is no highly effective treatment for H5N1 flu, but oseltamivir (commercially marketed by Roche as Tamiflu) can
sometimes inhibit the influenza virus from spreading inside the user's body. This drug has become a
focus for some governments and organizations trying to be seen as making preparations for a possible
H5N1 pandemic. On April 20, 2006, Roche AG announced that a stockpile of three million treatment
courses of Tamiflu is waiting at the disposal of the World Health Organization to be used in case of a flu
pandemic; separately Roche donated two million courses to the WHO for use in developing nations that
may be affected by such a pandemic but lack the ability to purchase large quantities of the drug.
There are several H5N1 vaccines for several of the avian H5N1 varieties, but the continual mutation of H5N1
renders them of limited use to date: while vaccines can sometimes provide cross-protection against related flu
strains, the best protection would be from a vaccine specifically produced for any future pandemic flu virus strain.
Dr. Daniel Lucey, co-director of the Biohazardous Threats and Emerging Diseases graduate program at
Georgetown University, has made this point: "There is no H5N1 pandemic so there can be no pandemic
vaccine". However, "pre-pandemic vaccines" have been created, are being refined and tested, and do have some
promise both in furthering research and preparedness for the next pandemic. Vaccine manufacturing companies are
being encouraged to increase capacity so that if a pandemic vaccine is needed, facilities will be available for rapid
production of large amounts of a vaccine specific to a new pandemic strain.
Animal and lab studies suggest that Relenza (zanamivir), which is in the same class of drugs as Tamiflu, may also
be effective against H5N1. In a study performed on mice in 2000, "zanamivir was shown to be efficacious in
treating avian influenza viruses H9N2, H6N1, and H5N1 transmissible to mammals" (Leneva 2001). However,
another paper, de Jong 2005, suggested that zanamivir might not provide protection in humans from the current
avian strain of H5N1 if "systemic involvement of influenza infection is suspected - as has recently been suggested
by some reports on avian H5N1 influenza in humans." While no one knows whether zanamivir will be useful against a
yet-to-exist pandemic strain of H5N1, it might be useful to stockpile zanamivir as well as oseltamivir in the event
of an H5N1 influenza pandemic. Neither oseltamivir nor zanamivir can currently be manufactured in quantities
that would be meaningful once efficient human transmission starts.
Phylogenetic analysis
Phylogenetic analysis tools are applied to reconstruct evolutionary trees at the molecular level.
Systematics describes the pattern of relationships among taxa and is intended to help us understand the history of all life. But
history is not something we can see—it has happened once and leaves only clues as to the actual events. Scientists use these clues
to build hypotheses, or models, of life's history. In phylogenetic studies, the most convenient way of visually presenting
evolutionary relationships among a group of organisms is through illustrations called phylogenetic trees.
• Operational Taxonomic Unit (OTU): taxonomic level of sampling selected by the user to be used in a study, such as
individuals, populations, species, genera, or bacterial strains.
A phylogenetic tree is composed of nodes, each representing a taxonomic unit (species, populations, individuals), and branches,
which define the relationship between the taxonomic units in terms of descent and ancestry. Only one branch can connect any two
adjacent nodes. The branching pattern of the tree is called the topology, and the branch length usually represents the number of
changes that have occurred in the branch. This is called a scaled branch. Scaled trees are often calibrated to represent the passage
of time. Such trees have a theoretical basis in the particular gene or genes under analysis. Branches can also be unscaled, which
means that the branch length is not proportional to the number of changes that have occurred, although the actual number may be
indicated numerically somewhere on the branch. Phylogenetic trees may also be either rooted or unrooted. In rooted trees, there
is a particular node, called the root, representing a common ancestor, from which a unique path leads to any other node. An
unrooted tree only specifies the relationship among species,
without identifying a common ancestor, or evolutionary path.
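The idea of a rooted, scaled tree can be made concrete with a small sketch. The tree below is hypothetical (node names and branch lengths are invented for illustration); it only shows how a unique path leads from the root to any node and how scaled branch lengths add up along that path.

```python
# Hypothetical rooted, scaled tree: each node maps to (parent, branch_length).
# The root has no parent. Names and lengths are made up for illustration.
TREE = {
    "root": (None, 0.0),
    "n1": ("root", 0.5),
    "A": ("n1", 1.0),
    "B": ("n1", 2.0),
    "C": ("root", 3.0),
}


def path_to_root(tree, node):
    """Return the unique list of nodes from `node` up to the root."""
    path = [node]
    while tree[node][0] is not None:
        node = tree[node][0]
        path.append(node)
    return path


def root_to_tip_length(tree, tip):
    """Sum the branch lengths along the path between the root and `tip`."""
    return sum(tree[n][1] for n in path_to_root(tree, tip))


if __name__ == "__main__":
    for tip in ("A", "B", "C"):
        print(tip, path_to_root(TREE, tip), root_to_tip_length(TREE, tip))
```

In an unrooted tree there is no such distinguished starting node, so only distances between pairs of tips are defined.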
Two major groups of analyses exist to examine phylogenetic relationships: phenetic methods and cladistic methods. It is
important to note that phenetics and cladistics have had an uneasy relationship over the last 40 years or so. Most of today's
evolutionary biologists favor cladistics, although a strictly cladistic approach may result in counterintuitive results.
Phenetic Method of Analysis
Phenetics, also known as numerical taxonomy, involves the use of various measures of overall similarity for the ranking of
species. There is no restriction on the number or type of characters (data) that can be used, although all data must be first
converted to a numerical value, without any character "weighting". Each organism is then compared with every other for all
characters measured, and the number of similarities (or differences) is calculated. The organisms are then clustered in such a way
that the most similar are grouped close together and the more different ones are linked more distantly. The taxonomic clusters,
called phenograms, that result from such an analysis do not necessarily reflect genetic similarity or evolutionary relatedness. The
lack of evolutionary significance in phenetics has meant that this system has had little impact on animal classification, and as a
consequence, interest in and use of phenetics has been declining in recent years.
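As an illustration of the phenetic procedure described here, the following minimal sketch compares every organism with every other over unweighted characters and reports the most similar pair. The organisms and character states are hypothetical, and a simple count of matching states stands in for whichever similarity measure a real study would choose.

```python
from itertools import combinations

# Hypothetical character matrix: each organism is scored 0/1 for four characters.
CHARACTERS = {
    "org1": [1, 0, 1, 1],
    "org2": [1, 0, 1, 0],
    "org3": [0, 1, 0, 0],
}


def similarity(a, b):
    """Count characters with identical states (no character weighting)."""
    return sum(1 for x, y in zip(a, b) if x == y)


def pairwise_similarities(chars):
    """Compare every organism with every other one."""
    return {
        (p, q): similarity(chars[p], chars[q])
        for p, q in combinations(sorted(chars), 2)
    }


def most_similar_pair(chars):
    """Return the pair that a clustering step would group first."""
    sims = pairwise_similarities(chars)
    return max(sims, key=sims.get)


if __name__ == "__main__":
    print(pairwise_similarities(CHARACTERS))
    print(most_similar_pair(CHARACTERS))
```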
An alternative approach to diagramming relationships between taxa is called cladistics. The basic assumption behind cladistics is
that members of a group share a common evolutionary history. Thus, they are more closely related to one another than they are to
other groups of organisms. Related groups of organisms are recognized because they share a set of unique features (apomorphies)
that were not present in distant ancestors but which are shared by most or all of the organisms within the group. These shared
derived characteristics are called synapomorphies. Therefore, in contrast to phenetics, cladistic groupings do not depend on
whether organisms share physical traits but depend on their evolutionary relationships. Indeed, in cladistic analyses two organisms
may share numerous characteristics but still be considered members of different groups.
Cladistic analysis entails a number of assumptions. For example, species are assumed to arise primarily by bifurcation, or
separation, of the ancestral lineage; species are often considered to become extinct upon hybridization (crossbreeding); and
hybridization is assumed to be rare or absent. In addition, cladistic groupings must possess the following characteristics: all
species in a grouping must share a common ancestor and all species derived from a common ancestor must be included in the
taxon. The application of these requirements results in the following terms being used to describe the different ways in which
groupings can be made:
• A monophyletic grouping is one in which all species share a common ancestor, and all species derived from that
common ancestor are included. This is the only form of grouping accepted as valid by cladists.
• A paraphyletic grouping is one in which all species share a common ancestor, but not all species derived from that
common ancestor are included.
• A polyphyletic grouping is one in which species that do not share an immediate common ancestor are lumped together,
while excluding other members that would link them.
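The monophyly criterion can be checked mechanically once a rooted tree is available. The sketch below assumes a hypothetical tree written as nested tuples with taxa A to E; a grouping is monophyletic exactly when it equals the full set of tips below some node. Telling paraphyletic from polyphyletic groupings needs more information and is not attempted here.

```python
# Hypothetical rooted tree as nested tuples; strings are tips (taxa).
TREE = (("A", "B"), ("C", ("D", "E")))


def collect_clades(tree, clades):
    """Return the tip set under `tree`, recording every clade in `clades`."""
    if isinstance(tree, str):
        leaves = frozenset([tree])
    else:
        leaves = frozenset().union(
            *(collect_clades(child, clades) for child in tree)
        )
    clades.add(leaves)
    return leaves


def is_monophyletic(tree, group):
    """True if `group` is exactly the set of all tips below one node."""
    clades = set()
    collect_clades(tree, clades)
    return frozenset(group) in clades


if __name__ == "__main__":
    print(is_monophyletic(TREE, {"D", "E"}))   # ancestor plus all descendants
    print(is_monophyletic(TREE, {"C", "D"}))   # leaves out E
```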
The Origins of Molecular Phylogenetics
Macromolecular data, meaning gene (DNA) and protein sequences, are accumulating at an increasing rate because of recent
advances in molecular biology. For the evolutionary biologist, the rapid accumulation of sequence data from whole genomes has
been a major advance, because the very nature of DNA allows it to be used as a "document" of evolutionary history. Comparisons
of the DNA sequences of various genes between different organisms can tell a scientist a lot about the relationships of organisms
that cannot otherwise be inferred from morphology, or an organism's outer form and inner structure. Because genomes evolve by
the gradual accumulation of mutations, the amount of nucleotide sequence difference between a pair of genomes from different
organisms should indicate how recently those two genomes shared a common ancestor. Two genomes that diverged in the recent
past should have fewer differences than two genomes whose common ancestor is more ancient. Therefore, by comparing different
genomes with each other, it should be possible to derive evolutionary relationships between them, the major objective of
molecular phylogenetics.
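The simplest measure of the sequence difference discussed here is the proportion of aligned sites at which two sequences differ (the p-distance). The sketch below is a minimal illustration, not the method used by any particular package; the two example strings are the first 20 aligned positions of sequences gi|7391905 and gi|8486138 from the ClustalW output in this report, and gap handling (skipping any column that contains a gap) is an assumption.

```python
def p_distance(seq1, seq2, gap="-"):
    """Proportion of differing sites among columns where neither sequence has a gap."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a == gap or b == gap:
            continue  # skip columns with a gap in either sequence
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


if __name__ == "__main__":
    s1 = "AGCAAAAGCAGGTCAATTAT"  # gi|7391905, first 20 aligned positions
    s2 = "AGCGAAAGCAGGTCAATTAT"  # gi|8486138, first 20 aligned positions
    print(p_distance(s1, s2))
```

Two sequences that diverged recently give a value close to 0, while more anciently diverged sequences give larger values.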
Molecular phylogenetics attempts to determine the rates and patterns of change occurring in DNA and proteins and to reconstruct
the evolutionary history of genes and organisms. Two general approaches may be taken to obtain this information. In the first
approach, scientists use DNA to study the evolution of an organism. In the second approach, different organisms are used to study
the evolution of DNA. Whatever the approach, the general goal is to infer process from pattern: the processes of organismal
evolution deduced from patterns of DNA variation and processes of molecular evolution inferred from the patterns of variations in
the DNA itself.
Molecular Phylogenetic Analysis: Fundamental Elements
As we just discussed, macromolecules, especially gene and protein sequences, have surpassed morphological and other organismal
characters as the most popular forms of data for phylogenetic analyses. Therefore, this next section will concentrate only on
molecular data.
It is important to point out that a single, all-purpose recipe does not exist for phylogenetic analysis of molecular data. Although
numerous algorithms, procedures, and computer programs have been developed, their reliability and practicality are, in all cases,
dependent upon the size and structure of the dataset under analysis. The merits and shortfalls of these various methods are subject
to much scientific debate, because the danger of generating incorrect results is greater in computational molecular phylogenetics
than in many other fields of science. Occasionally, the limiting factor in such analyses is not so much the computational method
used, but the users' understanding of what the method is actually doing with the data. Therefore, the goal of this section is to
demonstrate to the reader that practical analysis should be thought of both as a search for a correct model (analysis) as well as a
search for the correct tree (outcome).
Phylogenetic tree-building models presume particular evolutionary models. For any given set of data, these models may be
violated because of various occurrences, such as the transfer of genetic material between organisms. Therefore, when interpreting
a given analysis, a person should always consider the model used and entertain possible explanations for the results obtained. For
example, models used in molecular phylogenetic analysis methods make "default" assumptions, including:
• The sequence variability in the sample contains phylogenetic signal adequate to resolve the problem under study.
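Phylogenetic analysis of DNA or protein sequences involves four steps:
1. Sequence alignment.
2. Determining the substitution model.
3. Tree building.
4. Tree evaluation.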
Introduction to Homology modelling
One method that can be applied to generate a reasonable model of a protein structure is homology modelling. This procedure is also
termed comparative modelling or knowledge-based modelling.
Homology models are useful for getting a rough idea of where the alpha carbon of each residue sits in the folded protein. They
can guide hypotheses about structure–function relationships. Homology models are unreliable in predicting the
conformation of insertions or deletions. They are unlikely to be useful in modelling ligand docking for
drug design unless the sequence identity with the template is > 70%, and even then they are less reliable than an
empirical crystallographic or NMR structure.
Introduction to Docking
Docking studies are molecular modeling studies aiming at finding a proper fit between a ligand and its binding
site.
Protein receptor-ligand motifs fit together tightly, and are often referred to as a lock and key mechanism. There is
both high specificity and induced fit within these interfaces, with specificity increasing with rigidity. A protein
receptor-ligand pair can have either a rigid ligand and a flexible receptor, or a flexible ligand and a rigid receptor.
The native structure of a rigid ligand with a flexible receptor often maximizes the interface area between the
molecules. The two move with respect to one another in a direction perpendicular to the interface. This
allows a receptor to bind a larger than usual ligand. Normally, when the ligand overlaps the
docking interface, energy penalties are incurred. If the van der Waals forces can be decreased, energy loss in the system
will be minimized. This can be accomplished by allowing flexibility in the receptor. Flexible receptors allow
docking of a larger ligand than a rigid receptor would allow.
When the fit between the ligand and receptor does not need to be induced, the receptor can retain its rigidity while
maintaining the free energy of the system. For successful docking, the parameters of the ligand need to be
maintained and the ligand must be slightly smaller than the receptor interface. No docking is
completely rigid, though; there is intrinsic movement which allows small conformational adaptations for ligand
binding. When the six degrees of freedom for protein movement are taken into consideration (three rotational,
three translational), the amount of inherent flexibility allowed the receptor is even greater. This further offsets any
energy penalty between the receptor and ligand, allowing easier, more energetically favorable binding between
the two.
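The six degrees of freedom mentioned here (three rotational, three translational) can be illustrated with a small rigid-body transform. This is a sketch under stated assumptions: the coordinates are hypothetical, the rotation order (about x, then y, then z) is one arbitrary convention, and the code does not reproduce the search procedure of any docking program.

```python
import math


def rotation_matrix(rx, ry, rz):
    """Rotation about x, then y, then z (angles in radians): R = Rz * Ry * Rx."""
    cx, sx = math.cos(rx), math.sin(rx)
    cy, sy = math.cos(ry), math.sin(ry)
    cz, sz = math.cos(rz), math.sin(rz)
    return [
        [cz * cy, cz * sy * sx - sz * cx, cz * sy * cx + sz * sx],
        [sz * cy, sz * sy * sx + cz * cx, sz * sy * cx - cz * sx],
        [-sy, cy * sx, cy * cx],
    ]


def apply_pose(coords, angles, shift):
    """Rotate each (x, y, z) point by `angles`, then translate by `shift`."""
    r = rotation_matrix(*angles)
    moved = []
    for x, y, z in coords:
        moved.append(tuple(
            r[i][0] * x + r[i][1] * y + r[i][2] * z + shift[i]
            for i in range(3)
        ))
    return moved


if __name__ == "__main__":
    ligand = [(1.0, 0.0, 0.0), (0.0, 2.0, 0.0)]  # hypothetical atom positions
    pose = apply_pose(ligand, (0.0, 0.0, math.pi / 2), (0.0, 0.0, 5.0))
    print(pose)
```

A rigid-body transform of this kind changes the position and orientation of the ligand but leaves all of its internal distances unchanged; flexibility has to be modelled separately.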
Aim of docking
The aim of docking is to identify new drug targets, which opens new vistas for further drug development. The
findings of our docking study may be useful in the search for a cure for the infectious disease bird flu, and may also open new
avenues for finding other possible drug targets in influenza A virus. The docking results can be used to design
new lead compounds and hence can aid the drug discovery process.
Receptor
A site on the surface of the cell that serves as a recognition or binding site for antigens, antibodies or other
cellular or immunological components. It is a molecule within a cell or on the cell surface to which a substance (such as
a hormone or a drug) selectively binds, causing a change in the activity of the cell.
Ligand
A ligand is a molecule that binds to a protein molecule (e.g., a receptor). As a ligand binds through the interaction of many
weak, noncovalent bonds formed to the binding site of a protein, the tight binding of a ligand depends upon a
precise fit to the surface-exposed amino acid residues on the protein.
Active Site
The active site of a protein/enzyme is the region that binds the substrates (and the cofactor, if any). It also contains
the residues that directly participate in the making and breaking of bonds. These residues are called the catalytic
groups. In essence, the interaction of the enzyme and substrate at the active site promotes the formation of the
transition state. The active site is the region of the enzyme that most directly lowers the activation free energy (ΔG‡) of the reaction,
which results in the rate enhancement characteristic of enzyme action.
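The link between lowering the activation free energy (often written ΔG‡) and the resulting rate enhancement can be sketched numerically. The calculation below assumes transition-state theory, in which the rate constant is proportional to exp(-ΔG‡/RT); the function name and the example value of 5.7 kJ/mol are illustrative choices, not figures from this report.

```python
import math

R_KJ = 8.314e-3  # gas constant in kJ mol^-1 K^-1


def rate_enhancement(lowering_kj_per_mol, temperature_k=298.15):
    """Factor by which the rate rises when the activation free energy is
    lowered by `lowering_kj_per_mol`, assuming k is proportional to
    exp(-dG_activation / RT)."""
    return math.exp(lowering_kj_per_mol / (R_KJ * temperature_k))


if __name__ == "__main__":
    # Lowering the barrier by about 5.7 kJ/mol at 25 degrees C gives
    # roughly a tenfold faster reaction under this assumption.
    print(round(rate_enhancement(5.7), 2))
```

Because the dependence is exponential, modest reductions in the barrier translate into large rate enhancements.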
Amino acids in protein active sites:
It is difficult to generalize which amino acids are likely to be in a protein active/functional site, as this greatly
depends on the type of function. With that in mind, below are preferences for the 20 amino acids to lie within
functional regions on proteins. These were worked out by considering how often particular amino acids were in
contact with bound non-protein atoms in protein three-dimensional structures. Positive values mean that the amino
acid makes more contacts than one would expect by chance; negative values mean that it makes fewer. The values below
do not include protein-protein or protein-peptide interactions, where many of the amino acids with negative
values (e.g. tryptophan or proline) can play critical roles.
Neuraminidase
Neuraminidase is an antigenic glycoprotein enzyme (EC 3.2.1.18) found on the surface of the influenza virus.
Subtypes
Nine neuraminidase subtypes are known; many occur only in various species of duck and chicken. Subtypes N1
and N2 have been positively linked to epidemics in man, and strains with N3 or N7 subtypes have been identified
in a number of isolated deaths.
Structure
The neuraminidase enzyme exists as a mushroom-shape projection on the surface of the influenza virus. It has a
head consisting of four co-planar and roughly spherical subunits, and a hydrophobic region that is embedded
within the interior of the virus' membrane. It is comprised of a single polypeptide chain that is oriented in the
opposite direction to the hemagglutinin antigen. The composition of the polypeptide is a single chain of six
conserved polar amino acids, followed by hydrophilic, variable amino acids.
Function
Neuraminidase has functions that aid in the efficiency of virus release from cells. Neuraminidase cleaves terminal
sialic acid residues from carbohydrate moieties on the surfaces of infected cells. This promotes the release of
progeny viruses from infected cells. Neuraminidase also cleaves sialic acid residues from viral proteins,
preventing aggregation of viruses. Administration of chemical inhibitors of neuraminidase is a treatment that
limits the severity and spread of viral infections.
Ideally, influenza virus neuraminidase (NA) should act on the same type of virus receptor that the virus hemagglutinin
(HA) binds to. This is not always so. It is not quite clear how the virus manages to function if there is no close
match between the specificities of NA and HA.
Neuraminidase inhibitors
Two neuraminidase inhibitors, zanamivir and oseltamivir, are used to combat the virus.
Neuraminidase inhibitors are a class of antiviral drugs whose mode of action relies on blocking the function of
the viral neuraminidase protein, thus preventing the virus from budding from the host cell.
Unlike the M2 inhibitors, which work only against influenza A, neuraminidase inhibitors act against both
influenza A and B.
Chapter 2
MATERIALS AND METHODS
Materials and methods
Influenza virus belongs to the Orthomyxoviridae family. Its sequence is available not as a single genome
molecule but in segments (8 segments in total). The influenza A virus genome is carried on 8 single
(non-paired) RNA strands that code for 10 proteins. The segmented nature of the genome allows the exchange of
entire genes between different viral strains when they infect the same cell.
For our analysis we take three different types of sequences: gene, genome and protein sequences. We take gene sequences of five
different strains (i.e. H5N1, H2N2, H1N1, H9N2, and H3N2), available as separate segments, collected from NCBI
(www.ncbi.nlm.nih.gov). We get 41 such gene sequences. Along with the gene sequences, we also take genome sequences
and protein sequences of the same strains (i.e. H5N1, H2N2, H9N2, H1N1, and H3N2).
We collect gene and protein sequences from the influenza virus resources available on the website
(www.ncbi.nlm.nih.gov). We got around 40 such nucleotide sequences and around 60 such protein sequences.
We take these three types of sequences because each is informative in its own way.
After collecting these sequences from their repositories, we proceed with further analysis.
Phylogenetic analysis:
• Sequence alignment
• Determining the substitution model
• Tree building
• Tree evaluation
The first step following data retrieval is the execution of a multiple sequence alignment, obtained via
CLUSTALW (progressive alignment method). The purpose of this step is to place the most closely related
sequences in the user's data set together prior to initiating tree construction. PHYLIP takes the patterns gleaned
from multiple sequence alignment when building phylogenies.
2. Phylogenetic Method
Analyses in the present work are carried out according to the distance method. Four programs within PHYLIP are
employed here: SEQBOOT, DNADIST, NEIGHBOR and CONSENSE.
[A] Once multiple alignment has been completed, the data set is transmitted to SEQBOOT. SEQBOOT generates
multiple resampled versions of the alignment (bootstrap replicates) by sampling its columns with replacement.
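Bootstrap resampling of an alignment can be sketched in a few lines. The example below is an illustration of the general idea (drawing alignment columns at random with replacement), not a reimplementation of SEQBOOT; the short alignment, the function names and the fixed random seed are all hypothetical.

```python
import random

# Hypothetical toy alignment: sequence name -> aligned sequence.
ALIGNMENT = {
    "seqA": "ACGTACGT",
    "seqB": "ACGTTCGT",
    "seqC": "ACCTACGA",
}


def bootstrap_alignment(alignment, rng):
    """Build one replicate by sampling column indices with replacement."""
    names = list(alignment)
    length = len(alignment[names[0]])
    if any(len(alignment[n]) != length for n in names):
        raise ValueError("all sequences must have the same aligned length")
    columns = [rng.randrange(length) for _ in range(length)]
    # The same sampled columns are used for every sequence, so each
    # replicate is still a valid alignment of the same size.
    return {n: "".join(alignment[n][c] for c in columns) for n in names}


def bootstrap_replicates(alignment, n_replicates, seed=0):
    """Return a list of `n_replicates` resampled alignments."""
    rng = random.Random(seed)
    return [bootstrap_alignment(alignment, rng) for _ in range(n_replicates)]


if __name__ == "__main__":
    for replicate in bootstrap_replicates(ALIGNMENT, 3):
        print(replicate)
```

Each replicate is then analysed in the same way as the original alignment, and agreement among the resulting trees is used to judge support.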
[B] DNADIST reads in the data from SEQBOOT and computes a distance score for each pair of nucleotide sequences. This step is
most critical, since no subsequent analysis can be made without a measure of sequence divergence or similarity.
For nucleotide data the distances are computed under a nucleotide substitution model; a Dayhoff PAM matrix applies
only to protein sequences, which PHYLIP handles with the separate PROTDIST program. A distance score
reflects the estimated number of substitutions per site separating one sequence from a second
sequence.
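To show what a model-based distance score looks like, the sketch below applies the Jukes-Cantor correction, one of the simpler nucleotide substitution models, to an observed proportion of differing sites. It is an assumption for illustration only: the report does not state which model was selected in DNADIST, and the function name is hypothetical.

```python
import math


def jukes_cantor_distance(p):
    """Convert an observed proportion of differing sites `p` into an
    estimated number of substitutions per site: d = -3/4 * ln(1 - 4p/3)."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must satisfy 0 <= p < 0.75 for this correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


if __name__ == "__main__":
    for p in (0.0, 0.05, 0.30):
        print(p, jukes_cantor_distance(p))
```

The corrected distance is always at least as large as the observed proportion, because it allows for multiple substitutions at the same site.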
[C] NEIGHBOR implements the Neighbor-joining method (Saitou and Nei 1987) to determine the most
reasonable positioning of branches. Two sequences having the smallest distance scores are joined as "neighbors"
and will share a node below them (or to their left) in the final tree.
Alternatively, if the user specifies a rooted tree, then NEIGHBOR implements another algorithm, the
unweighted pair group method with arithmetic mean (UPGMA). The UPGMA algorithm assumes a
molecular clock and generates rooted trees.
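The first joining step of the neighbor-joining method can be sketched as follows. This is a simplified illustration of the standard criterion (choose the pair minimising Q(i, j) = (n - 2) d(i, j) - r(i) - r(j), where r is the row sum of distances), not the NEIGHBOR program itself; the five taxa and their distances are hypothetical.

```python
# Hypothetical symmetric distance matrix for five taxa.
NAMES = ["a", "b", "c", "d", "e"]
MATRIX = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
DIST = {
    NAMES[i]: {NAMES[j]: MATRIX[i][j] for j in range(5)} for i in range(5)
}


def nj_first_join(names, dist):
    """Return (taxon1, taxon2, branch1, branch2) for the first pair joined."""
    n = len(names)
    if n < 3:
        raise ValueError("need at least three taxa")
    totals = {a: sum(dist[a][b] for b in names if b != a) for a in names}
    best = None
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            q = (n - 2) * dist[a][b] - totals[a] - totals[b]
            if best is None or q < best[0]:
                best = (q, a, b)
    _, a, b = best
    # Branch lengths from each joined taxon to the new internal node.
    len_a = dist[a][b] / 2 + (totals[a] - totals[b]) / (2 * (n - 2))
    len_b = dist[a][b] - len_a
    return a, b, len_a, len_b


if __name__ == "__main__":
    print(nj_first_join(NAMES, DIST))
```

Note that the pair with the smallest raw distance (d and e here) is not necessarily the pair joined first; the row-sum correction accounts for how far each taxon is from all the others. A full implementation would replace the joined pair with a new node, recompute distances, and repeat until the tree is complete.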
[D] The trees built from the resampled data sets are then passed to CONSENSE, which summarizes them as a consensus tree. Any phylogenetic
method renders the most likely tree, i.e., those relationships that are most reasonable given the sequence
alignments. As such, any single tree is only one of many possible trees that could have arisen over evolutionary
time. Resampling methods, therefore, are designed to find the most probable tree among the many possible
evolutionary paths that could have generated a given set of sequences.
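How resampled trees are summarised can be illustrated with a simple count of how often each grouping appears. The sketch below is a simplification and not the CONSENSE algorithm: each tree is represented only by its set of groupings (as frozensets of taxon names), the four replicate trees are hypothetical, and groupings found in more than half of the replicates are reported as a majority-rule style summary.

```python
AB = frozenset({"A", "B"})
AC = frozenset({"A", "C"})
CD = frozenset({"C", "D"})
DE = frozenset({"D", "E"})

# Hypothetical trees from four bootstrap replicates, each given as the
# set of groupings (clades) it contains.
REPLICATE_TREES = [{AB, DE}, {AB, DE}, {AB, CD}, {AC, DE}]


def clade_support(clades, replicate_trees):
    """Fraction of replicate trees that contain each clade of interest."""
    total = len(replicate_trees)
    if total == 0:
        raise ValueError("no replicate trees given")
    return {
        clade: sum(1 for tree in replicate_trees if clade in tree) / total
        for clade in clades
    }


def majority_clades(replicate_trees):
    """Clades present in more than half of the replicate trees."""
    counts = {}
    for tree in replicate_trees:
        for clade in tree:
            counts[clade] = counts.get(clade, 0) + 1
    return {c for c, k in counts.items() if k > len(replicate_trees) / 2}


if __name__ == "__main__":
    print(clade_support([AB, CD, DE], REPLICATE_TREES))
    print(majority_clades(REPLICATE_TREES))
```

The support fractions computed this way correspond to the bootstrap values usually written at the nodes of the final tree.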
[E] Lastly, we draw the tree using NJplot. NJplot is a tree drawing program able to draw any phylogenetic tree
expressed in the Newick phylogenetic tree format (e.g., the format used by the PHYLIP package). NJplot is
especially convenient for rooting the unrooted trees obtained from parsimony, distance or maximum likelihood
tree-building methods. The trees were drawn as unrooted trees.
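The Newick format mentioned here writes a tree as nested parentheses with optional branch lengths after colons. The sketch below serialises a small hypothetical tree; the nested-list representation and the taxon names are assumptions made for illustration. A three-way split at the top level is the usual way of writing an unrooted tree.

```python
# Hypothetical tree: a tip is a string; an internal node is a list of
# (child, branch_length) pairs.
TREE = [("A", 0.1), ("B", 0.2), ([("C", 0.3), ("D", 0.4)], 0.5)]


def to_newick(tree):
    """Serialise a nested tree with branch lengths as a Newick string."""
    def render(node):
        if isinstance(node, str):
            return node
        parts = [f"{render(child)}:{length}" for child, length in node]
        return "(" + ",".join(parts) + ")"
    return render(tree) + ";"


if __name__ == "__main__":
    print(to_newick(TREE))
```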
Family analysis
We then carry out family analysis. Here we use the GENSCAN tool to find motifs,
exons and introns in our genome sequences. To find the open reading frames (ORFs), we use the getorf program of EMBOSS.
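The idea behind ORF finding can be sketched as a scan for a start codon followed, in the same reading frame, by a stop codon. The code below is a minimal illustration and is not the EMBOSS implementation: it searches only the three forward frames, it defines an ORF as ATG to the first in-frame stop, and the function name, the minimum-length parameter and the test sequence are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=2):
    """Return (start, end, frame) for ATG...stop ORFs in the forward frames.

    `start` is the index of the A in ATG, `end` is one past the stop codon,
    and `min_codons` counts codons before the stop (including ATG).
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None
    return orfs


if __name__ == "__main__":
    sequence = "CCATGAAATTTTAGGG"  # made-up example sequence
    for start, end, frame in find_orfs(sequence):
        print(frame, sequence[start:end])
```

A complete analysis would also scan the reverse complement and translate each ORF into its protein sequence.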
Modelling
Taking protein sequences of the H5N1 strain of influenza A virus from NCBI, we perform homology modelling of
these protein sequences using the SWISS-MODEL server.
As a first step, we run a BLAST search of these sequences against the PDB to find an appropriate template for our
sequences. We get many hits; among them we select the best template following some criteria.
We then carry out the modelling through the SWISS-MODEL server.
We then visualize the structure modelled by the SWISS-MODEL server in Swiss-PdbViewer. After that,
our next step is docking of the neuraminidase protein.
Docking
Finally, we take the neuraminidase protein of avian influenza virus; this protein is one of the causes of its
pathogenicity. To perform docking we use the HEX software. It automatically searches for the site
where our ligand fits best.
To perform the docking, we identify the ligand and receptor for our protein using several receptor- and ligand-finding
tools such as PDBSUM, SUMO, CSA, and JENA LIBRARY.
Chapter 3
Results and discussion
In the current phylogenetic analysis, the trees were constructed using the neighbor-joining method and were
represented as unrooted trees. The bootstrap values at the nodes, which represent the robustness of the trees, were also
satisfactory. We determine the branch lengths and distances of each gene.
The general nature of the tree and the relative distances of different strains from common ancestors are analyzed.
Bootstrapping is used to evaluate the reliability of a phylogenetic tree. In a bootstrapped tree, a value is shown
at each node; these values indicate how strongly the data support that node. The scale bar shows
the number of substitutions per residue.
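Table 1: Genes showing higher branch length in each strain
[A] Genes showing higher branch length in H5N1
Gene            Branch length   Strain
PA              2.23            H5N1
NS1, NS2        1.56            H5N1
[B] Genes showing higher branch length in H9N2
Gene            Branch length   Strain
NP              1.767           H9N2
PB2             1.665           H9N2
[C] Genes showing higher branch length in H3N2
Gene            Branch length   Strain
HA              2.03            H3N2
M1, M2          2.45            H3N2
[D] Genes showing higher branch length in H1N1
Gene            Branch length   Strain
NA              2.206           H1N1
PB1, PB1-F2     1.858           H1N1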
Our analysis of the gene sequences shows that the same genes (PA, HA, PB1, PB1-F2, NS, PB2, M1, M2, NP and
NA) are present in all strains. This suggests that H5N1, H2N2, H9N2, H3N2 and H1N1 evolved from the same
common ancestor at the same rate.
The genes PA, NS1 and NS2 remain more conserved in H9N2, H2N2, H3N2 and H1N1 than in H5N1 [Table 1 (A)].
The genes NP and PB2 remain more conserved in H5N1, H2N2, H3N2 and H1N1 than in H9N2 [Table 1 (B)].
In contrast, for the genes HA, M1 and M2, the H3N2 strain appears to diverge more from the common ancestor than
H1N1, H2N2, H5N1 and H9N2 [Table 1 (C)].
The genes PB1 and PB1-F2 are more highly conserved in H3N2, H2N2, H9N2 and H5N1 than in H1N1 [Table 1 (D)].
Therefore, from this observation it might be concluded that, in the course of evolution, the genes underwent
more modification in strains H1N1, H9N2, H5N1 and H3N2 than in H2N2. This suggests that H2N2 has less
pandemic potential than the others, which are the main causes of pandemic bird flu nowadays.
Overall, our current analysis indicates that these strains have diverged considerably from the common ancestor in
the course of evolution. This drift is more prominent where it helps the virus adopt a better survival strategy.
Outputs
ClustalW output
ClustalW Results
Results of search
Number of sequences 41
Sequence type nt
Alignment
41 2392
gi|7391905 AGCAAAAGCA GGTCAATTAT ATTCAGTATG GAAAGAATAA AAGAACTACG
gi|8486138 AGCGAAAGCA GGTCAATTAT ATTCAATATG GAAAGAATAA AAGAACTAAG
gi|7392156 ---------- ---------- ---------- ---------- ----------
gi|7391921 ---------- ---------- ---------- ---------- ----------
gi|8486131 ---------- ---------- ---------- ---------- ----------
gi|3214016 ---------- ---------- ---------- ---------- ----------
gi|7385294 ---------- ---------- ---------- ---------- ----------
gi|7385295 ---------- ---------- ---------- ---------- ----------
gi|3214142 ---------- ---------- ---------- ---------- ----------
gi|7391268 ---------- ---------- ---------- ---------- ----------
gi|7391915 ---------- ---------- ---------- ---------- ----------
gi|8486122 ---------- ---------- ---------- ---------- ----------
gi|7385295 ---------- ---------- ---------- ---------- ----------
gi|7391914 ---------- ---------- ---------- ---------- ----------
gi|8486125 ---------- ---------- ---------- ---------- ----------
gi|3214016 ---------- ---------- ---------- ---------- ----------
gi|7391920 ---------- ---------- ---------- ---------- ----------
gi|7392126 ---------- ---------- ---------- ---------- ----------
gi|8486127 ---------- ---------- ---------- ---------- ----------
gi|7392130 ---------- ---------- ---------- ---------- ----------
gi|7391913 ---------- ---------- ---------- ---------- ----------
gi|3214016 ---------- ---------- ---------- ---------- ----------
gi|7385294 ---------- ---------- ---------- ----AGCAAA AGCAGGCAAA
gi|3214016 ---------- ---------- ---------- -----GCAAA AGCAGGCAAA
gi|7391268 ---------- ---------- ---------- ----AGCAAA AGCAGGCAAA
gi|7391914 ---------- ---------- ---------- ----AGCAAA AGCAGGCAAA
gi|8486134 ---------- ---------- ---------- ----AGCGAA AGCAGGCAAA
TGGACCATAT GGCCATAATC AAGAAGTACA CATCAGGAAG ACAGGAGAAG
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
CAAAATGCTA TAAGTACCAC ATTCCCTTAT ACTGGAGATC CTCCATACAG
CAAAATGCAA TAAGTACCAC ATTCCCTTAT ACTGGAGATC CCCCATATAG
CAAAATGCCA TAAGTACTAC ATTCCCTTAT ACTGGAGATC CTCCATACAG
CAAAATGCCA TAAGCACCAC ATTCCCTTAT ACTGGAGATC CTCCATACAG
CAAAATGCTA TAAGCACAAC TTTCCCTTAT ACCGGAGACC CTCCTTACAG
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- -------GCA GGGGT----- --ATAATCTG TCAAAATGGA
---------- AGCAAAAGCA GGGGTTATA- CCATAGACAA CCAAAAGCAT
---------- AGCAAAAGCA GGGGAAA--- ATAAAAACAA CCAAAATGAA
---------- -GCAAAAGCA GGGGAAT--- TACTTAACTA GCAAAATGGA
---------- AGCAAAAGCA GGGGATAATT CTATTAACCA TGAAGACTAT
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
AATATTCAGA AAAGGGGAAA TGGACAACGA ACACAGAGAC TGGAGCACCC
AATATTCAGA AAAAGGGAGG TGGACAACAA ACACAGAGAC CGGAGCACCC
AATATTCAGA AAAGGGGAAG TGGACAACAA ACACGGAAAC TGGAGCGCCC
AATATTCAGA GAAGGGGAAG TGGACGACAA ATACAGAAAC TGGGGCACCC
AGTACTCAGA AAAGGCAAGA TGGACAACAA ACACCGAAAC TGGAGCACCG
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
TCAAAAGTGA TCAGA----- ----TTTGCA TTGGTTACCA TGCAAACAAC
TGAGGGGGGA CCAGA----- ----TATGCA TTGGATACCA TGCCAATAAT
CAGATGCAGA CACAA----- ----TATGTA TAGGCTACCA TGCGAACAAT
GCAATGCAGA TAAAA----- ----TCTGCA TCGGCCACCA GTCAACAAAC
GAAATGACAA CAGCACGGCA ACGCTGTGCC TTGGGCACCA TGCAGT--AC
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
GTATGCACAA ACAGATTGTG TATTGGAAGC AATGGCTTTC CTTGAAGAAT
GTATGCACAA ACAGATTGTG TATTGGAAGC AATGGCTTTC CTTGAAGAAT
ATATGCACAA ACAGACTGCG TCCTGGAAGC AATGGCTTTC CTTGAGGAAT
ATATGCACAA ACAGACTGTG TCCTGGAGGC TATGGCCTTC CTTGAAGAAT
TTATGCCCAA ACAGATTGTG TATTGGAAGC AATGGCTTTC CTTGAGGAAT
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
CATGCCCAAG ACATACTGGA AAAGACACAC AATGGGAAGC TCTGCGATCT
CATGCCAAGG ACATCCTTGA GAAGACCCAT AACGGAAAGC TATGCAAACT
CACTCTGTTA ACCTGCTCGA AGACAGCCAC AACGGAAAAC TATGTAGATT
CATGCCAAAG AATTGCTCCA CACAGAGCAT AATGGAATGC TGTGTGCAAC
AATGCTACTG AACTGGTTCA GAGTTCCTCA ACAGGTGGAA TATGCGA--C
---------- ---------- -------AGC AAAAGCAGGA GATTAAAATG
---------- ---------- -------AGC GAAAGCAGGG GTTTAAAATG
---------- ---------- ---------- ---------- -------ATG
---------- ---------- -------AGC AAAAGCAGGA G-TAAAGATG
---------- ---------- ---------- ---------- -------ATG
CAGCAAACAA GAGTGGATAA GCTGACCCAA GGTCGCCAAA CCTATGACTG
CAGCAAACGA GAGTGGATAA GCTGACCCAA GGTCGCCAGA CTTATGACTG
CAACAAACAA GAGTGGACAA ACTGACCCAA GGTCGTCAGA CCTATGACTG
CAACAAACAA GGGTGGACAA ACTAACCCAA GGCCGCCAGA CTTATGATTG
CAGCAAACAC GAGTAGACAA GCTGACACAA GGCCGACAGA CCTATGACTG
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
CTCCTCGGAA ACCCTATGTG TGACGAATTC ATCAATGTGC CGGAATGGTC
CTCCTTGGAA ATCCAGAATG TGATAGGCTT CTAAGTGTGC CAGAATGGTC
CTCTTGGGAA ACCCAGAATG CGACCCACTG CTTCCAGTGA GATCATGGTC
GTCTATGGCA ACCCTTCTTG TGACCTGCTG TTGGGAGGAA GAGAATGGTC
CTATTGGGAG ACCCTCAGTG TGATGGCTTC C--AAAATAA GAAATGGGAC
GGATAATTAG CTTGATGTTA CAAATTGGGA ACATAATCTC AA-TATGGGT
GACTAATTAG CCTAATATTG CAAATAGGGA ATATAATCTC AA-TATGGAT
CAACAGTATG CTTCCTCATG CAGATTGCCA TCCTGGTAAC TACTGTGACA
CCACAATATG CTTCTTCATG CAAATTGCCA TCCTGATAAC CACTGTAACA
CGACAATATG TTTACTCATG CAGATTGCCA TCTTAGCAAC GACTATGACA
AGGTCTTCAG ATCGAATGGT CTAACAGCCA ATGAATCGGG AAGGCTAATA
AAGTATTCAG ATCGAACGGT CTAACAGCCA ATGAGTCAGG AAGGTTAATA
AGGTCTTCAG ATCGAATGGA CTGACAGCTA ATGAGTCGGG AAGGCTAATA
AAGTTTTTAG ATCGAATGGA CTAACAGCCA ATGAATCAGG AAGGCTAATA
AAGTGTTCAG ATCAAATGGC CTCACGGCCA ATGAGTCTGG AAGGCTCATA
ATGGATTCCA ACACGATAAC CTCGTTTCAG GTAGATTGTT ATCTAT-GGC
---------- ---------- ---------A GCAAAAGCAG GTAGAT-ATT
---------- ---------- --GGGGAATT CCAAAAGCAG GTAGAT-ATT
---------- ---------- ---------A GCAAAAGCAG GTAGAT-ATT
---------- ---------- ---------A GCAAAAGCAG GTAGAT-ATT
---------- ---------- ---------A GCGAAAGCAG GTAGAT-ATT
ATTTC---AA CGACTATGAA GAACTGAAAC ACCTATTGAG CAGAAC-AAA
GCTTC---AA TGACTATGAA GAATTGAAAC ATCTCCTCAG CAGCGT-GAA
ATTTC---AT CGACTATGAG GAGCTGAGGG AGCAATTGAG CTCAGT-GTC
ATGTA---GA AAACCTAGAG GAACTCAGGA CACTTTTTAG TTCCGC-TAG
ATGTG---CC GGATTATGCC TCCCTTAGGT CACTAGTTGC CTCATC-CGG
CAAAGCATTA TTACTTATGA AAACAACACC TGGGTAAATC A------AAC
CAAAACATCA TTACCTATAA AAATAGCACC TGGGTAAAGG A------CAC
AATGCCGTGT GAACCAATAA TAATAGAAAG -GAACATAAC A------GAG
GATGCTGTGT GAACCAACAA TAATAGAAAG -AAACATAAC A------GAG
AGTGCCATGT GAACCAATCA TAATAGAAAG -GAACATAAC A------GAG
AATAACACAT TTCCAGAGAA AGAGAAGAGT GAGGGACAAC ATGACCAAGA
AACAACACAT TTCCAGAGAA AGAGAAGAGT GAGGGACAAC ATGACCAAGA
AACAACACAC TTCCAAAGAA AAAGAAGAGT AAGAGACAAC ATGACCAAGA
AACAACACAC TTTCAAAGAA AAAGGAGAGT AAGAGACAAC ATGACCAAGA
CACAACTCAT TTTCAGAGAA AGAGACGGGT GAGAGACAAT ATGACTAAGA
CCCGTCAGGC CCCCTCAAAG CCGAGATCGC GCAGAGACTT GAGGATGTCT
CCCATCAGGC CCCCTCAAAG CCGAGATCGC GCAGAGACTT GAGGATGTTT
CCCGTCAGGC CCCCTCAAAG CCGAGATCGC ACAGAGACTT GAAGATGTCT
TCCATCAGGC CCCCTCAAAG CCGAGATCGC GCAGAGACTT GAAGATGTCT
CCCGTCAGGC CCCCTCAAAG CCGAGATCGC ACAGAGACTT GAAGATGTCT
ATGATGCCTC ATCAGGGGTG AGCTCAGCAT GTCCATACCA TGGGAGGTCC
AGCATACAAC AACTGGAGGT TCATGGGCCT GCGCGGTGTC AGGTAAACCA
ACAACACAAC CAAAGGAGTA ACGGCAGCAT GCTCCCATGC GGGGAAAAGC
ATGTGACTTA CACTGGAACA AGCAGAGCAT GTTC------ ------AGGT
GAGTCACTCA AAATGGAACA AGCTCTGCTT GCAAAAGGAG ATCTAATAAC
TTCAGTAA-- -CATTAGCGG GCAATTCA-- -TCTCTTTGC CCCAT----T
TTCAGTGA-- -TATTAACCG GCAATTCA-- -TCTCTTTGT CCCAT----C
AGTAGTGG-- -AATACAGAA ATTGGTCAAA GCCGCAATGT CAAAT----T
ACTAGCAG-- -AATACAGAA ATTGGTCAAA GCCGCAATGT GACAT----T
AGTAGCAG-- -AATACAAGA ATTGGTCAAA ACCGCAATGT CAAAT----T
AGGAGCTACC TAATAAGAGC ACTGACACTG AACACAATGA CAAAAGACGC
AAGAGCTACC TAATAAGAGC ACTGACACTG AACACAATGA CAAAAGATGC
AGAAGCTATC TGATAAGAGC ACTGACATTG AACACAATGA CTAAAGATGC
AGAGGCTATC TAATAAGAGC TTTGACATTG AACACGATGA CCAAAGATGC
AGGAGTTATC TAATTAGAGC ATTGACCCTG AACACAATGA CCAAAGATGC
AAGACCAATC CTGTC---AC CTCTGACTAA GGGGATTTTA GGGTT--TGT
AAGACCAATC CTGTC---AC CTCTGACTAA GGGGATTTTG GGATT--TGT
AAGACCAATT CTGTC---AC CTCTGACTAA GGGGATTTTG GGGTT--TGT
AAGACCAATC CTGTC---AC CTCTGACTAA GGGGATTTTA GGATT--TGT
CAACAATAAA GAGGAGCTAC AATAATACCA ACCAAGAAGA TCTTTTAGTA
CGGTTGCCAA AGGATCGTAC AACAATACAA GCGGAGAACA AATGCTAATA
CAAAGCTGAA AAATTCTTAT GTGAACAAGA AAGGGAAAGA AGTCCTTGTA
CTGTTCAAGA CGCCCAATAC ACAAATAACA GGGGAAAGAG CATTCTTTTC
CAGCATTGAA CGTGACTATG CCAAACAATG AAAAATTTGA CAAACTGTAC
GGGGGATGTG TTTGTTATAA GA-GAGCCGT TCATCTCATG CTCCCACTTG
AGGAGACGTT TTTGTCATAA GA-GAGCCCT TTATTTCATG TTCTCACTTG
TGGGGACATT TGGGTGACGA GA-GAACCTT ATGTGTCATG CGATCCTGGC
TGGGGACATC TGGGTGACAA GA-GAACCTT ATGTGTCATG CGACCCTGAC
CGGGGATATT TGGGTGACAA GA-GAACCTT ATGTATCGTG CGGTCTTGGT
TCAGAGGATT CGTGTACTTT GTCGAAACAC TAGCGAGGAG TATCTGTGAG
TCAGAGGATT CGTGCACTTT GTCGAAGCAC TAGCAAGGAG CATCTGTGAA
TCAGAGGGTT CGTGCACTTT GTCGAAACAC TAGCGAGAAA TATTTGTGAG
TTAGAGGGTT CGTGTACTTC GTTGAAACTT TAGCTAGAAG CATTTGCGAA
TAAGGGGGTT TGTATACTTT GTTGAGACAC TGGCAAGGAG TATATGTGAG
CGCTTTGTCC AAAATGCCCT CAATG--GGA ATGGGGATCC AAATAACATG
CGCTTTGTCC AAAATGCCCT CAATG--GGA ATGGAGATCC AAATAACATG
CGCTTTGTCC AAAATGCCCT TAATG--GGA ACGGGGATCC AAATAACATG
CTATCAAAAC CCAACCACTT ACATTTCCGT TGGAACATCA ACACTGAACC
GTACCAGAAT GTGGGAACCT ATGTTTCCGT AGCCACATCA ACATTGTACA
CTATCAGAAT GAAAATGCTT ATGTCTCTGT AGTGACTTCA AATTATAACA
GTACATAAGA AACGACACAA CAACAAGCGT GACAACAGAA GATTTGAATA
ATATGCTCAA GCATCAGGAA GAATCACAGT CTCTACCAAA AGAAGCCAAC
GCACTCCAAT GGGACCGTCA AAGACAGAAG CC-CTCACAG AACATTGA--
GCATTCAAAT GGGACTGTTA AGGACAGAAG CC-CTTATAG GGCCTTAA--
ACATTCAAAT GACACAATAC ATGATAGAAT CC-CTCATCG AACCCTAT--
GCATTCAAAT GACACAGTAC ATGATAGGAC CC-CTTATCG GACCCTAT--
ACACTCAAAT GGCACAATAC ATGATAGGAG TC-CCCATAG AACCCTTT--
ATTGGCAAAT GTCGTGAGGA AGATGATGAC TAACTCACAA GATACAGAGC
ATTGGCAAAT GTTGTGAGAA AGATGATGAC TAACTCACAA GACACAGAGC
ACTAGCAAAT GTTGTTAGAA AAATGATGAC TAATTCACAA GACACAGAGC
ACTGGCAAAT GTTGTGAGAA AAATGATGAC TAATTCACAA GACACTGAGC
GTTGGCAAAT GTTGTAAGGA AGATGATGAC CAATTCTCAG GACACCGAAC
TGGGGCCAAA GAAATAGCTC -TCAGTTATT CTGCTGGTGC ACTTGCCAGT
TGGGGCCAAA GAAATCTCAC -TCAGTTATT CTGCTGGTGC ACTTGCCAGT
GGAAGAATGG AGTTCTTCTG GACAATTTTA AAGCCGAATG ATGCCATCAA
CGTAGAATGG AATTCTCTTG GACCCTCTTG GATATGTGGG ACACCATAAA
GGGAGGATGA ACTATTACTG GACCTTGCTA AAACCCGGAG ACACAATAAT
GGAAGAATTG ATTATTATTG GTCGGTACTA AAACCAGGCC AAACATTGCG
AGCAGAATAA GCATCTATTG GACAATAGTA AAACCGGGAG ACATACTTTT
GTCTGTTGCT TGGTCGGCAA GTGCTTGCCA TGATGGCACC AGTTGGTTGA
ATCGGTTGCT TGGTCAGCAA GTGCATGTCA TGATGGCATG GGCTGGCTAA
GTGTGTAGCA TGGTCCAGCT CAAGTTGTCA CGATGGAAAA GCATGGTTGC
GTGCATAGCA TGGTCCAGCT CAAGTTGTCA CGATGGAAAA GCATGGCTGC
GTGCATAGCA TGGTCCAGCT CAAGCTGCCA TGATGGGAAG GCATGGTTAC
ACCCTCGGAT GTTTCTAGCA ATGATAACAT ACATCACAAG GAACCAACCT
ATCCTCGAAT ATTTCTAGCA ATGATAACAT ACATCACAAG GAACCAACCT
ATCCTCGAGT GTTTCTGGCG ATGATAACAT ACATCACAAG AAATCAACCT
ACCCTCGAAT GTTTTTGGCG ATGATTACAT ATATCACAAA AAATCAACCT
ATCCTCGGAT GTTTTTGGCC ATGATCACAT ATATGACCAG AAATCAGCCC
TGGCATTTGG CCTGGTATGT GCAACCTGTG AAC-AGATTG C-----TGAC
TCAAGAAAGG GGACTCAGCA ATTATGAAAA GTGAATTGGA ATATGGTAAC
CGAAAAGAGG TAGTTCAGGG ATCATGAAGA CAGAAGGAAC ACTTGAGAAC
GTAGAGGCTT TGGGTCCGGC ATCATCACCT CAAACGCATC AATGCATGAG
CAGGAGGGAG CCATGGAAGA ATCCTGAAGA CTGATTTAAA AGGTGGTAAT
GAAGTGGGAA AAGCTCA--- ATAATGAGAT CAGATGCACC CATTGGCAAA
AACGGCATAA TAACAGACAC TATCAAGAGT TG-GAGGAAC AACAT---AC
AACGGCATAA TAACTGAAAC CATAAAAAGT TG-GAGGAAG AAAAT---AT
GACGGGAGGC TTATGGACAG TATTGGTTCA TG-GTCTCAA AATAT---CC
AATGGGAGGC TTGTAGATAG TATTGTTTCA TG-GTCCAAA AAAAT---CC
GATGGGATGC TTACCGACAG TATTGGTTCA TG-GTCTAAG AACAT---CC
GATGGCAAGA TTAGGGAAAG GATACATGTT CGAAAGTAAG AGCATGAAGC
AATGGCGAGG TTAGGAAAAG GATACATGTT CGAGAGTAAG AGCATGAAGC
AATGGCTAGA CTAGGGAAAG GTTACATGTT CGAAAGCAAG AGCATGAAGC
AATGGCAAGA CTAGGAAAAG GATACATGTT CGAGAGTAAG AGAATGAAGC
AATGGCGAGA CTGGGAAAAG GGTATATGTT TGAGAGCAAG AGTATGAAAC
CCATTCCACA ACATACACCC CCTCACCATC GGG--GAATG CCCCAAATAT
CCTTTTCACA ATGTCCACCC ACTGACAATA GGT--GAATG CCCCAAATAT
CCTTTCCAGA ATATACACCC AGTCACAATA GGA--GAGTG CCCAAAATAC
CCATTCCACA ATATCAGTAA ATATGCATTT GGA--ACCTG CCCCAAATAT
CCATTTCAAA ATGTAAACAG GATCACATAT GGG--GCCTG TCCCAGATAT
CTGTAATGAC TGACGGACCA AGTAATGGGC AGGCCTCATA T-AAGATCTT
CTATAATGAC TGATGGCCCG AGTGATGGGC TGGCCTCGTA C-AAAATTTT
TAGTAATGAC TGATGGAA-G TGCTTCAGGA AGAGCCGATA CTAGAATACT
TAGTAATGAC TGATGGGA-G TGCTTCAGGA AAAGCTGATA CTAAAATACT
TAGTAATGAC TGATGGAA-G TGCATCAGGA AGGGCTGATA CTAAAATACT
CTTCAACGAA TCAACGAG-- ----AAAGAA AATCGAGAAA ATAAGACCTC
CTTCAACGAA TCGACGAG-- ----AAAGAA AATTGAGAAA ATAAGACCTC
CTTTAATGAA TCAACCAG-- ----AAAGAA AATTGAGAAA ATAAGGCCTC
TTTCAATGAA TCAACAAG-- ----GAAGAA AATTGAGAAA ATAAGGCCTC
TTTCAATGAT TCAACAAG-- ----AAAGAA GATTGAAAAA ATCCGACCGC
GATTGAATCA AG-------- ----AGGATT GTTTGGGGCA ATAGCTGGTT
CATTCAATCC AG-------- ----AGGTCT ATTTGGAGCC ATTGCCGGTT
TAGATCAAGT AG-------- ----AGGACT ATTTGGAGCC ATAGCTGGAT
GAAACAAACT AG-------- ----AGGCAT ATTTGGCGCA ATCGCGGGTT
TAATTATCAC TATGAGGAGT GCTCCTGTTA TCCTGATGCT G---GCGAAA
TAATTCTCAC TATGAGGAAT GTTCCTGTTA CCCTGATACC G---GCAAAG
TGCTCAGCAT GTAGAGGAGT GTTCCTGTTA TCCTCGATAT C---CTGACG
TGCTCAGCAT GTCGAGGAGT GCTCCTGCTA TCCTCGATAT C---CTGGTG
TGCTCAGCAT GTGGAGGAAT GCTCCTGTTA CCCCCGGTAT C---CAGAAG
TGTTCAATAT GCTGAGTACA GTCTTAGGAG TTTCAATCCT GAATCTTGGG
TGTTTAATAT GCTAAGTACG GTCTTAGGAG TCTCAATCTT AAATCTTGGG
TGTTCAACAT GCTAAGTACA GTCTTAGGAG TCTCAATCCT GAATCTCGGG
TGTTCAACAT GCTAAGTACG GTTTTAGGAG TCTCGGTACT GAATCTTGGG
TGTTCAATAT GTTAAGCACT GTATTAGGCG TCTCCATCCT GAATCTTGGA
CATCAGAATG AACAGGGATC AGGCTATGCA GCGGATCAAA AAAGCACACA
CATTCAAATG ATCAAGGGGT TGGTATGGCT GCAGATAGGG ATTCAACTCA
CATCAAAATT CTGAGGGAAC AGGACAAGCA GCAGATCTCA AAAGCACTCA
GGTATCTTTC AA--TCAAAA --TTTGGAGT ATCAAATA-G GATATATATG
GGTGTCTTTC GA--TCAAAA --CCTGGATT ATCAAATA-G GATACATCTG
CATAGACATA AATATGGAAG --ATTATAGC ATTGATTCCA GTTATGTGTG
CGTAGATATA AACATAAAGG --ATTATAGC ATTGTTTCCA GTTATGTGTG
GCTATATATA AATGTGGCAG --ATTATAGT GTTGATTCTA GTTATGTGTG
TGATGATTTC GCTCTCATAG --TGAATGCA CCAAATCATG AGGGAATAGA
TGATGATTTC GCTCTCATAG --TGAATGCA CCAAATCATG AGGGAATACA
TGATGACTTC GCTCTCATAG --TGAATGCA CCAAATCATG AGGGAATACA
CGACGATTTT GCCCTCATAG --TGAATGCA CCAAATCATG AGGGAATACA
TGACGATTTT GCTCTGATTG --TGAATGCA CCCAATCATG AAGGGATTCA
TGAACAAGCA ATATGAAATA ATTGATCATG AATTCAGTGA GGTTGAAACT
CAAACGAGAA ATTCCATCAG ATTGAAAAAG AATTCTCAGA AGTAGAAGGG
GCAGTTGTGG TCCGGTGTCC CCTAAC---- -----GGGGC ATATGGAGTA
GCAGCTGTGG TCCAGTGTAT GTTGAT---- -----GGAGC AAACGGAGTA
GCAATAGTAA TTGCAGGAAT CCTAACAATG AGAGAGGGAA TCCAGGAGTG
GCAGTAGCCA TTGCTTGGAT CCTAACAATG AAGAAGGTGG TCATGGAGTG
GCAGCAGTAA CTGCAGGGAT CCTAATAACG AGAGAGGGGG CCCAGGAGTG
ATATGACCAA GAAGAAGTCT TACATAAATC GGACAGGAAC ATGTGAATTC
ACATGAGCAA AAAGAAGTCT TACATAAATC GGACAGGAAC ATTTGAGTTC
ATATGAGCAA AAAGAAGTCC TACATAAATA GGACAGGGAC ATTTGAATTC
ACATGAGCAA AAAGAAGTCC TATATAAATA AAACAGGGAC ATTTGAATTC
ATATGAGCAA GAAAAAGTCT TACATAAACA GAACAGGTAC ATTTGAATTC
GTCATACAAC GCGGAGCTTC TTGTGGCCCT GGAGAAC--C AACATACAAT
AACCAAAAGC ACTAATTCCA GGAGCGGCTT TGAAATGATT TGGGATCCAA
GACCAAAAGT CACAGTTCCA GACATGGGTT TGAGATGATT TGGGATCCTA
AACGATCAGC AAGGATTTAC GCTCAGGTTA TGAAACTTTC AAAGTCATTG
AACGATCAGC GAGAAGTTAC GCTCAGGATA TGAAACCTTC AAAGTCATTG
AACAATCAAG AAAGATTCGC GCTCTGGTTA TGAGACTTTC AGGGTCGTTG
GCCCAGCTTT GGAGTGTCTG GGATTAATGA ATCGGCTGAC ATGAGCATTG
GCCCAGCTTT GGAGTTTCCG GAATTAATGA ATCGGCTGAC ATGAGCATTG
GCCCAGCTTT GGAGTGTCTG GAATTAATGA ATCGGCTGAT ATGAGCATTG
TCCCAGTTTT GGAGTGTCTG GAATAAACGA GTCAGCTGAT ATGAGTATTG
TCCCAGTTTT GGTGTGTCTG GGAGCAACGA GTCAGCGGAC ATGAGTATTG
ATCGTAGCAA TAACTG---- ---ATTGGTC AGGATATAGC GGGAGTTTTG
GTTGTGGCAA TGACTG---- ---ATTGGTC AGGGTATAGC GGGAGTTTCG
ATAGTTGACA GCAATA---- ---ATTGGTC AGGTTACTCT GGTATTTTCT
ATAGTTGACA GAGGTA---- ---ATAGGTC CGGTTATTCT GGTATTTTCT
ATAGTTGACA GTGATA---- ---ACTGGTC TGGGTATTCT GGTATATTCT
ACAGCTCAGA TGGCTCTTCA GCTATTCATT AAGGACTACA GATACCCATA
ACAGCCCAGA TGGCTCTTCA GCTGTTCATT AAAGACTACA GATACACCTA
ACAGCCCAAA TGGCTCTTCA ACTATTCATC AAAGACTACA GATACACGTA
ACAGCCCAGA TGGCTCTCCA ATTGTTCATC AAAGACTACA GATATACATA
ACAGCTCAAA TGGCCCTTCA GTTGTTCATC AAAGATTACA GGTACACGTA
GTTGAATTAA TCAGGGGACG ACCTAAAGAA AAAACAATCT GGACTAGTGC
GTGGAGTTGA TAAGGGGAAG GCAACAGGAG ACTAGAGTAT GGTGGACCTC
GTGGAGTTGA TAAGGGGAAG AAAAGAGGAA ACTGAAGTCT TGTGGACCTC
GTGGAGTTGA TAAGAGGGAG ACCACAGGAG ACCAGAGTAT GGTGGACTTC
CTGAAGAAGC TGTGGGAGCA GACCCGCTCA AAGGCAGGAC TGTTGGTTTC
TTGAAGAAGC TGTGGGAGCA GACCCGCTCA AAGGCAGGAC TGTTGGTTTC
CTAAAGAAGC TGTGGGAGCA AACCCGCTCA AAGGCAGGAC TTTTGGTGTC
CTAAAGAAGC TGTGGGATCA AACCCAATCA AGGGCAGGAC TATTGGTATC
ATAAAGAAAC TGTGGGAGCA AACCCGTTCC AAAGCTGGAC TGCTGGTCTC
CATGGCCTGA -TGGGGCGAA CATCAATTTC ATGCCTATAT AA--------
CATGGCCTGA -TGGGGCGGA CATCAATCTC ATGCCTATAT AAGCTTTCGC
CATGGCCTGA -TGGAGCGAA TATCAATTTC ATGTCTATAT AAGCTTTCGC
CTGGCTTGAA GTGGGAATTG ATGGATGAAG ACTACCAGGG CAGACTGTGT
TTTGCTTGAA GTGGGAGTTG ATGGATGAAG ATTACCAGGG AAGACTGTGT
TCTGCTTGAA ATGGGAGCTA ATGGATGAAG ACTATCAGGG GAGGCTTTGT
TCTGCTTAAA GTGGGAGCTA ATGGATGAGA ATTATCGGGG AAGACTTTGT
TCTGCCTAAA ATGGGAATTG ATGGATGAGG ATTACCAGGG GCGTTTATGC
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
TGCTGTGGTA ATGCCAGCTC ATGGCCCAGC CAAGAGCATG GAATATGATG
TGCTGTGGTA ATGCCAGCCC ATGGTCCGGC CAAGAGCATG GAATATGATG
TGCTGTGGTA ATGCCAGCTC ACGGTCCAGC CAAGAGCATG GAATATGATG
TGCTGTAGTG ATGCCAGCCC ACGGTCCAGC CAAAAGTATG GAATATGATG
TGCAGTGATG ATGCCAGCAC ATGGTCCAGC CAAAAACATG GAGTATGATG
---------- ---------- ---------- ---------- ----------
AACACCAGCC AAAGGGGGAT TCTTGAGGAT GAACAGATGT ATCAGAAGTG
AACACTAGCC AAAGGGGAAT TCTTGAGGAT GAACAAATGT ACCAGAAGTG
AACACAAGCC AAAGGGGAAT TCTTGAAGAT GAACAGATGT ATCAGAAGTG
AACACTAGCC AAAGGGGAAT TCTTGAGGAT GAACAGATGT ACCAAAAGTG
AATACAAGTC AAAGAGGAGT ACTTGAAGAT GAACAAATGT ACCAAAGGTG
TTGGAATTTC CAGCATGGTG GAGGCCATGG TGTCTAGGGC CCGAATTGAT
TTGGAATTTC CAGCATGATG GAGGCCATGG TGTCTAGGGC CCGAATTGAT
TTGGAATTTC CAGCATGGTG GAGGCCATGG TGTCTAGGGC TCGGATTGAT
TTGGAATTTC TAGCATGGTG GAGGCCATGG TGTCTAGGGC CCGGATTGAT
TCGGGATATC CAGTATGGTG GAGGCTATGG TTTCCAGAGC CCGAATTGAT
GATCTTGAAG ATCTGTTCCA CCATTGAAGA GCTCGGACGG CAAGGGAAGT
GATCATGAAG ATCTGTTCCA CCATTGAAGA GCTCAGACGG CAAAAATAGT
GATCATGAAG ATCTGTTCCA CCATTGAAGA ACTCAGACGG CAAAAATAAT
GATCATGAAG ATCTGTTCCA CCATTGAAGA GCTCAGACGG CAAAAATAGT
41
gi|7385295
0.0000 0.1063 0.1753 0.1641 0.1931 2.3471 2.3223 2.3333
2.3923 2.3492 2.4457 2.4589 2.4414 2.4361 2.4361 2.4175 2.1281
2.1334 2.1353 2.1765 2.4725 2.3944 2.4102 2.3769 2.3570 2.3404
3.4066 3.1170 3.3324 3.3557 3.9585 3.7485 3.8024 3.4048 3.3406
3.4574 3.6922 3.6989 3.7171 3.5242 3.9746
gi|3214017
0.1063 0.0000 0.1819 0.1662 0.1978 2.2872 2.2655 2.2463
2.3663 2.3324 2.3903 2.4156 2.3836 2.3723 2.3888 2.3549 2.1077
2.1311 2.1038 2.1778 2.3779 2.3026 2.3312 2.3061 2.3164 2.2842
3.2983 3.0012 3.2202 3.3403 3.8456 3.6957 3.7325 3.3231 3.2845
3.4185 3.5965 3.6382 3.6024 3.4367 3.9079
gi|7391268
0.1753 0.1819 0.0000 0.0635 0.0732 2.3310 2.3093 2.2899
2.3079 2.2873 2.3792 2.3645 2.3543 2.3290 2.3290 2.3346 2.0488
2.0537 2.0979 2.0716 2.4847 2.1932 2.2132 2.2194 2.1773 2.1622
3.1839 2.9733 3.2670 3.3618 3.6743 3.4250 3.3827 3.3130 3.2729
3.4077 3.6756 3.6942 3.6854 3.5127 3.9462
gi|8486136
0.1641 0.1662 0.0635 0.0000 0.1143 2.2752 2.2540 2.2418
2.2874 2.2447 2.3460 2.3970 2.3782 2.3763 2.3879 2.3699 2.0142
2.0117 2.0652 2.0141 2.4847 2.2457 2.2457 2.2790 2.2420 2.1932
3.2076 2.9808 3.3757 3.3544 3.7509 3.4675 3.4940 3.4097 3.4056
3.5080 3.6887 3.6989 3.6968 3.5370 4.0509
gi|7391913
0.1931 0.1978 0.0732 0.1143 0.0000 2.3164 2.2949 2.2982
2.2653 2.2545 2.3346 2.3900 2.3915 2.3745 2.3763 2.3722 2.0646
2.0454 2.1000 2.0535 2.5042 2.1984 2.2083 2.2405 2.2301 2.1671
3.3522 3.0557 3.2515 3.3894 3.6161 3.3418 3.3619 3.2329 3.1764
3.2901 3.7411 3.8351 3.7548 3.5715 4.1240
gi|3214015
2.3471 2.2872 2.3310 2.2752 2.3164 0.0000 0.0026 0.0604
0.2072 0.2188 0.1729 1.8735 1.8961 1.8326 1.8229 1.8459 1.7862
1.8365 1.8321 1.9505 1.9020 2.5268 2.4282 2.3710 2.3772 2.4389
3.2524 3.5546 2.8988 3.2747 4.0283 3.7818 4.1020 3.5122 3.2693
3.4376 3.3673 3.3846 3.2598 3.4325 3.6583
gi|9316315
2.3223 2.2655 2.3093 2.2540 2.2949 0.0026 0.0000 0.0629
0.2081 0.2185 0.1773 1.8978 1.9210 1.8593 1.8535 1.8706 1.7862
1.8365 1.8321 1.9505 1.9020 2.5268 2.4282 2.3710 2.3772 2.4389
3.2085 3.5254 2.8775 3.2445 3.9636 3.7506 4.0654 3.5122 3.2480
3.4376 3.3452 3.3622 3.2386 3.4065 3.6268
gi|7385295
2.3333 2.2463 2.2899 2.2418 2.2982 0.0604 0.0629 0.0000
0.1857 0.1931 0.1538 1.8691 1.8946 1.8198 1.8043 1.8482 1.8428
1.9098 1.9150 2.0378 1.9624 2.5193 2.4126 2.3697 2.3658 2.4131
3.2091 3.3703 2.8235 3.1243 3.7117 3.7628 4.0000 3.4275 3.2102
3.3757 3.2474 3.2770 3.1404 3.3142 3.5072
gi|7392130
2.3923 2.3663 2.3079 2.2874 2.2653 0.2072 0.2081 0.1857
0.0000 0.0840 0.0854 1.7959 1.7853 1.7906 1.8189 1.8426 1.7581
1.8321 1.8591 1.9264 1.9050 2.3097 2.2709 2.2180 2.2387 2.2348
3.4449 3.4819 3.0244 3.2002 3.8465 3.5716 3.8485 3.2679 3.1513
3.2696 3.3755 3.4057 3.3646 3.5577 3.8782
gi|7391914
2.3492 2.3324 2.2873 2.2447 2.2545 0.2188 0.2185 0.1931
0.0840 0.0000 0.1316 1.8751 1.9080 1.8509 1.8905 1.8919 1.8452
1.9098 1.9579 1.9903 2.0053 2.2506 2.2334 2.2124 2.2378 2.2339
3.4797 3.6735 3.0447 3.2764 3.8861 3.6433 3.9515 3.2690 3.1515
3.1859 3.5497 3.5460 3.5616 3.7566 3.8968
gi|8486129
2.4457 2.3903 2.3792 2.3460 2.3346 0.1729 0.1773 0.1538
0.0854 0.1316 0.0000 1.9223 1.9150 1.8833 1.9165 1.9450 1.7786
1.8124 1.9045 1.9465 1.9350 2.2993 2.2342 2.1553 2.1970 2.2077
3.4354 3.6329 2.9941 3.2301 3.7635 3.6555 3.9188 3.3270 3.1548
3.4501 3.5787 3.6013 3.5044 3.6913 3.9760
gi|7385294
2.4589 2.4156 2.3645 2.3970 2.3900 1.8735 1.8978 1.8691
1.7959 1.8751 1.9223 0.0000 0.1329 0.1899 0.1958 0.1861 1.8359
1.7804 1.8717 1.7894 1.6858 2.3617 2.3911 2.4332 2.5037 2.4545
3.4453 2.9944 3.2243 3.2154 3.6537 3.8403 3.7811 3.5032 3.4476
3.6604 3.3555 3.3149 3.3699 3.2953 3.1628
gi|3214016
2.4414 2.3836 2.3543 2.3782 2.3915 1.8961 1.9210 1.8946
1.7853 1.9080 1.9150 0.1329 0.0000 0.1985 0.2048 0.1866 1.8219
1.7786 1.8343 1.7768 1.6980 2.3391 2.3485 2.4835 2.5265 2.4560
3.4125 2.9268 3.2629 3.0825 3.4834 4.0980 3.8438 3.4661 3.4527
3.7056 3.2613 3.2418 3.2920 3.2639 3.1153
gi|7391882
2.4361 2.3723 2.3290 2.3763 2.3745 1.8326 1.8593 1.8198
1.7906 1.8509 1.8833 0.1899 0.1985 0.0000 0.0718 0.0771 1.7394
1.6830 1.7922 1.7328 1.6304 2.3199 2.3441 2.4214 2.4810 2.4421
3.2508 2.8484 2.9419 3.0785 3.3668 3.6878 3.5423 3.4478 3.4432
3.5653 3.4535 3.4243 3.4878 3.5116 3.3345
gi|7391905
2.4361 2.3888 2.3290 2.3879 2.3763 1.8229 1.8535 1.8043
1.8189 1.8905 1.9165 0.1958 0.2048 0.0718 0.0000 0.1157 1.7256
1.6556 1.7738 1.6841 1.6127 2.2979 2.3320 2.3968 2.4595 2.4125
3.1872 2.8279 2.8438 3.0565 3.4205 3.7099 3.5393 3.5158 3.5048
3.5986 3.4109 3.3748 3.4148 3.4229 3.3269
gi|8486138
2.4175 2.3549 2.3346 2.3699 2.3722 1.8459 1.8706 1.8482
1.8426 1.8919 1.9450 0.1861 0.1866 0.0771 0.1157 0.0000 1.7464
1.6892 1.8241 1.7445 1.6850 2.3446 2.3642 2.4330 2.4839 2.4675
3.3591 2.8793 3.0757 3.1889 3.4216 3.5794 3.4283 3.3558 3.3524
3.4877 3.3955 3.3630 3.4362 3.4367 3.2463
gi|7392156
2.1281 2.1077 2.0488 2.0142 2.0646 1.7862 1.7862 1.8428
1.7581 1.8452 1.7786 1.8359 1.8219 1.7394 1.7256 1.7464 0.0000
0.0672 0.0791 0.1454 0.4088 1.9652 2.0763 1.9867 2.0224 2.0344
2.7992 2.6308 2.6861 3.4638 3.2188 3.7354 3.3783 3.5087 3.4517
3.3484 3.2180 3.0730 3.3767 3.2010 3.3079
gi|7391921
2.1334 2.1311 2.0537 2.0117 2.0454 1.8365 1.8365 1.9098
1.8321 1.9098 1.8124 1.7804 1.7786 1.6830 1.6556 1.6892 0.0672
0.0000 0.1177 0.1624 0.3727 1.9452 2.0526 1.9835 2.0339 2.0339
2.7025 2.4562 2.5497 3.3621 3.0388 3.8921 3.5995 3.4530 3.4882
3.3117 3.0655 2.9303 3.2099 3.0375 3.2489
gi|8486131
2.1353 2.1038 2.0979 2.0652 2.1000 1.8321 1.8321 1.9150
1.8591 1.9579 1.9045 1.8717 1.8343 1.7922 1.7738 1.8241 0.0791
0.1177 0.0000 0.1252 0.3973 2.0882 2.1954 2.1324 2.1609 2.1728
2.8688 2.6256 2.7224 3.3832 3.4314 4.2119 3.7273 3.6344 3.6545
3.4832 3.1278 3.0100 3.3041 3.1280 3.4143
gi|3214016
2.1765 2.1778 2.0716 2.0141 2.0535 1.9505 1.9505 2.0378
1.9264 1.9903 1.9465 1.7894 1.7768 1.7328 1.6841 1.7445 0.1454
0.1624 0.1252 0.0000 0.4015 2.0495 2.1645 2.1431 2.1843 2.1541
2.9173 2.7117 2.7484 3.3454 3.3862 4.5655 3.9304 3.9352 3.7668
3.7097 3.3456 3.1959 3.5743 3.4190 3.7992
gi|7385294
2.4725 2.3779 2.4847 2.4847 2.5042 1.9020 1.9020 1.9624
1.9050 2.0053 1.9350 1.6858 1.6980 1.6304 1.6127 1.6850 0.4088
0.3727 0.3973 0.4015 0.0000 2.1572 2.2267 2.2642 2.2859 2.2775
3.1556 2.6173 3.0914 3.5086 3.3677 3.8865 4.2283 3.4490 3.3421
3.4440 2.8189 2.7691 2.8591 2.7993 3.0417
gi|7385295
2.3944 2.3026 2.1932 2.2457 2.1984 2.5268 2.5268 2.5193
2.3097 2.2506 2.2993 2.3617 2.3391 2.3199 2.2979 2.3446 1.9652
1.9452 2.0882 2.0495 2.1572 0.0000 0.0786 0.1123 0.1362 0.1063
4.8457 3.5818 3.5454 4.2749 4.3292 5.3580 5.6194 4.9122 4.8318
4.5861 4.5477 4.3458 4.3820 5.0460 4.4859
gi|3214142
2.4102 2.3312 2.2132 2.2457 2.2083 2.4282 2.4282 2.4126
2.2709 2.2334 2.2342 2.3911 2.3485 2.3441 2.3320 2.3642 2.0763
2.0526 2.1954 2.1645 2.2267 0.0786 0.0000 0.1233 0.1396 0.1160
4.4229 3.3648 3.6261 4.2044 4.2963 5.4914 5.7825 5.6269 5.4956
5.1057 4.6878 4.4745 4.5093 5.2488 4.6878
gi|7391268
2.3769 2.3061 2.2194 2.2790 2.2405 2.3710 2.3710 2.3697
2.2180 2.2124 2.1553 2.4332 2.4835 2.4214 2.3968 2.4330 1.9867
1.9835 2.1324 2.1431 2.2642 0.1123 0.1233 0.0000 0.0525 0.0567
5.3095 3.8582 3.7704 4.5809 4.8015 5.8650 6.0600 5.0782 4.9995
4.8603 4.7422 4.6551 4.6975 5.3408 4.9884
gi|7391915
2.3570 2.3164 2.1773 2.2420 2.2301 2.3772 2.3772 2.3658
2.2387 2.2378 2.1970 2.5037 2.5265 2.4810 2.4595 2.4839 2.0224
2.0339 2.1609 2.1843 2.2859 0.1362 0.1396 0.0525 0.0000 0.0779
5.1091 3.7759 3.8247 4.2165 4.6605 5.5875 5.9400 4.7465 4.8260
4.5831 4.6156 4.4064 4.4469 4.9537 4.6771
gi|8486122
2.3404 2.2842 2.1622 2.1932 2.1671 2.4389 2.4389 2.4131
2.2348 2.2339 2.2077 2.4545 2.4560 2.4421 2.4125 2.4675 2.0344
2.0339 2.1728 2.1541 2.2775 0.1063 0.1160 0.0567 0.0779 0.0000
4.7002 3.6553 3.6434 4.0542 4.4690 5.7112 6.0600 4.6885 4.6312
4.5197 4.6864 4.4694 4.5141 5.0517 4.7604
gi|7385295
3.4066 3.2983 3.1839 3.2076 3.3522 3.2524 3.2085 3.2091
3.4449 3.4797 3.4354 3.4453 3.4125 3.2508 3.1872 3.3591 2.7992
2.7025 2.8688 2.9173 3.1556 4.8457 4.4229 5.3095 5.1091 4.7002
0.0000 0.4017 0.5474 0.7250 0.9789 4.1633 3.8804 3.0973 3.2133
3.2358 3.7356 3.5969 3.6391 3.7716 3.6338
gi|7391914
3.1170 3.0012 2.9733 2.9808 3.0557 3.5546 3.5254 3.3703
3.4819 3.6735 3.6329 2.9944 2.9268 2.8484 2.8279 2.8793 2.6308
2.4562 2.6256 2.7117 2.6173 3.5818 3.3648 3.8582 3.7759 3.6553
0.4017 0.0000 0.5016 0.7396 0.9167 3.8304 3.9581 3.0471 3.1679
3.2251 3.3705 3.2747 3.2685 3.3578 3.3159
gi|8486125
3.3324 3.2202 3.2670 3.3757 3.2515 2.8988 2.8775 2.8235
3.0244 3.0447 2.9941 3.2243 3.2629 2.9419 2.8438 3.0757 2.6861
2.5497 2.7224 2.7484 3.0914 3.5454 3.6261 3.7704 3.8247 3.6434
0.5474 0.5016 0.0000 0.7361 0.8937 3.3880 3.4460 2.9635 2.9855
3.1497 3.4162 3.3643 3.2892 3.3775 3.3498
gi|3214016
3.3557 3.3403 3.3618 3.3544 3.3894 3.2747 3.2445 3.1243
3.2002 3.2764 3.2301 3.2154 3.0825 3.0785 3.0565 3.1889 3.4638
3.3621 3.3832 3.3454 3.5086 4.2749 4.2044 4.5809 4.2165 4.0542
0.7250 0.7396 0.7361 0.0000 0.9653 3.9915 3.8362 3.1780 3.0869
3.3141 3.6510 3.4541 3.3768 3.3963 3.5212
gi|7391920
3.9585 3.8456 3.6743 3.7509 3.6161 4.0283 3.9636 3.7117
3.8465 3.8861 3.7635 3.6537 3.4834 3.3668 3.4205 3.4216 3.2188
3.0388 3.4314 3.3862 3.3677 4.3292 4.2963 4.8015 4.6605 4.4690
0.9789 0.9167 0.8937 0.9653 0.0000 3.5835 3.6419 3.6494 3.7049
3.7925 3.7010 3.5998 3.4375 3.3912 3.7110
gi|7392126
3.7485 3.6957 3.4250 3.4675 3.3418 3.7818 3.7506 3.7628
3.5716 3.6433 3.6555 3.8403 4.0980 3.6878 3.7099 3.5794 3.7354
3.8921 4.2119 4.5655 3.8865 5.3580 5.4914 5.8650 5.5875 5.7112
4.1633 3.8304 3.3880 3.9915 3.5835 0.0000 0.2097 0.9395 0.9020
0.8816 1.8960 1.8747 1.8492 1.7900 1.8792
gi|8486127
3.8024 3.7325 3.3827 3.4940 3.3619 4.1020 4.0654 4.0000
3.8485 3.9515 3.9188 3.7811 3.8438 3.5423 3.5393 3.4283 3.3783
3.5995 3.7273 3.9304 4.2283 5.6194 5.7825 6.0600 5.9400 6.0600
3.8804 3.9581 3.4460 3.8362 3.6419 0.2097 0.0000 0.9753 0.9052
0.9255 1.8904 1.8394 1.8416 1.7969 1.8803
gi|7392130
3.4048 3.3231 3.3130 3.4097 3.2329 3.5122 3.5122 3.4275
3.2679 3.2690 3.3270 3.5032 3.4661 3.4478 3.5158 3.3558 3.5087
3.4530 3.6344 3.9352 3.4490 4.9122 5.6269 5.0782 4.7465 4.6885
3.0973 3.0471 2.9635 3.1780 3.6494 0.9395 0.9753 0.0000 0.1261
0.1695 1.7832 1.7161 1.8218 1.7742 1.8920
gi|7391913
3.3406 3.2845 3.2729 3.4056 3.1764 3.2693 3.2480 3.2102
3.1513 3.1515 3.1548 3.4476 3.4527 3.4432 3.5048 3.3524 3.4517
3.4882 3.6545 3.7668 3.3421 4.8318 5.4956 4.9995 4.8260 4.6312
3.2133 3.1679 2.9855 3.0869 3.7049 0.9020 0.9052 0.1261 0.0000
0.2151 1.8357 1.7788 1.8401 1.7820 1.9628
gi|3214016
3.4574 3.4185 3.4077 3.5080 3.2901 3.4376 3.4376 3.3757
3.2696 3.1859 3.4501 3.6604 3.7056 3.5653 3.5986 3.4877 3.3484
3.3117 3.4832 3.7097 3.4440 4.5861 5.1057 4.8603 4.5831 4.5197
3.2358 3.2251 3.1497 3.3141 3.7925 0.8816 0.9255 0.1695 0.2151
0.0000 1.6716 1.6186 1.6838 1.6851 1.8115
gi|7385294
3.6922 3.5965 3.6756 3.6887 3.7411 3.3673 3.3452 3.2474
3.3755 3.5497 3.5787 3.3555 3.2613 3.4535 3.4109 3.3955 3.2180
3.0655 3.1278 3.3456 2.8189 4.5477 4.6878 4.7422 4.6156 4.6864
3.7356 3.3705 3.4162 3.6510 3.7010 1.8960 1.8904 1.7832 1.8357
1.6716 0.0000 0.0715 0.1065 0.1524 0.1954
gi|3214016
3.6989 3.6382 3.6942 3.6989 3.8351 3.3846 3.3622 3.2770
3.4057 3.5460 3.6013 3.3149 3.2418 3.4243 3.3748 3.3630 3.0730
2.9303 3.0100 3.1959 2.7691 4.3458 4.4745 4.6551 4.4064 4.4694
3.5969 3.2747 3.3643 3.4541 3.5998 1.8747 1.8394 1.7161 1.7788
1.6186 0.0715 0.0000 0.1206 0.1522 0.1953
gi|7391268
3.7171 3.6024 3.6854 3.6968 3.7548 3.2598 3.2386 3.1404
3.3646 3.5616 3.5044 3.3699 3.2920 3.4878 3.4148 3.4362 3.3767
3.2099 3.3041 3.5743 2.8591 4.3820 4.5093 4.6975 4.4469 4.5141
3.6391 3.2685 3.2892 3.3768 3.4375 1.8492 1.8416 1.8218 1.8401
1.6838 0.1065 0.1206 0.0000 0.1089 0.1903
gi|7391914
3.5242 3.4367 3.5127 3.5370 3.5715 3.4325 3.4065 3.3142
3.5577 3.7566 3.6913 3.2953 3.2639 3.5116 3.4229 3.4367 3.2010
3.0375 3.1280 3.4190 2.7993 5.0460 5.2488 5.3408 4.9537 5.0517
3.7716 3.3578 3.3775 3.3963 3.3912 1.7900 1.7969 1.7742 1.7820
1.6851 0.1524 0.1522 0.1089 0.0000 0.2047
gi|8486134
3.9746 3.9079 3.9462 4.0509 4.1240 3.6583 3.6268 3.5072
3.8782 3.8968 3.9760 3.1628 3.1153 3.3345 3.3269 3.2463 3.3079
3.2489 3.4143 3.7992 3.0417 4.4859 4.6878 4.9884 4.6771 4.7604
3.6338 3.3159 3.3498 3.5212 3.7110 1.8792 1.8803 1.8920 1.9628
1.8115 0.1954 0.1953 0.1903 0.2047 0.0000
41 Populations
Neighbor-Joining/UPGMA method version 3.573c
Neighbor-joining method
Negative branch lengths allowed
+gi|3214017
!
! +gi|7391268
! +-19
! +-20 +gi|7391913
! ! !
! ! +gi|8486136
! !
! ! +gi|7385295
! ! +-16
! ! ! +gi|3214142
! ! +------------17
! ! ! ! +gi|7391268
! ! ! ! +-14
! ! ! +-15 +gi|7391915
! ! ! !
! ! ! +gi|8486122
-18-21 !
! ! ! +gi|7385294
! ! ! +-23
! ! ! ! +gi|3214016
! ! ! +-------26
! ! ! ! ! +gi|7391882
! ! ! ! ! +-24
! ! ! ! +-25 +gi|8486138
! ! ! ! !
! ! ! ! +gi|7391905
! ! ! !
! ! ! ! +gi|7392156
! ! ! ! +-28
! ! ! ! ! ! +gi|8486131
! +----------31 +-33 +-29 +-27
! ! ! ! ! ! +gi|3214016
! ! ! ! +------30 !
! ! ! ! ! ! +gi|7391921
! ! ! ! ! !
! ! ! ! ! +--gi|7385294
! ! ! ! !
! ! ! ! ! +---gi|7385295
! ! ! ! ! +--9
! ! ! ! ! ! +gi|7391914
! ! ! +-32 +-11
! ! ! ! ! ! +---gi|3214016
! ! ! ! +------------12 +-10
! ! ! ! ! ! +-------gi|7391920
! ! ! ! ! !
! ! ! ! ! +-gi|8486125
! ! ! ! !
! ! ! ! ! +gi|7392126
! ! ! ! ! +------1
! ! ! +------22 ! +gi|8486127
! ! ! ! +------8
! +-34 ! ! ! +gi|7392130
! ! ! ! ! +--2
! ! ! ! +--3 +gi|7391913
! ! ! ! !
! ! ! ! +gi|3214016
! ! +----------13
! ! ! +gi|7385294
! ! ! +--5
! ! ! ! ! +gi|7391914
! ! ! +--6 +--4
! ! ! ! ! +-gi|8486134
! ! +-------7 !
! ! ! +gi|7391268
! ! !
! ! +gi|3214016
! !
! ! +gi|3214015
! ! +-35
! ! +-36 +gi|9316315
! ! ! !
! ! ! +gi|7385295
! +--------37
! ! +gi|7392130
! ! +-38
! +-39 +gi|7391914
! !
! +gi|8486129
!
+gi|7385295
23 gi|7385294 0.06604
23 gi|3214016 0.06686
26 25 0.02485
25 24 0.01715
24 gi|7391882 0.02677
24 gi|8486138 0.05033
25 gi|7391905 0.03805
33 32 0.06004
32 30 0.61962
30 29 0.09053
29 28 0.01212
28 gi|7392156 0.00981
28 27 0.03984
27 gi|8486131 0.03722
27 gi|3214016 0.08798
29 gi|7391921 0.03538
30 gi|7385294 0.24606
32 22 0.62308
22 12 1.17380
12 11 0.08424
11 9 0.08769
9 gi|7385295 0.36795
9 gi|7391914 0.03375
11 10 0.06886
10 gi|3214016 0.33587
10 gi|7391920 0.62943
12 gi|8486125 0.16543
22 13 1.01545
13 8 0.57819
8 1 0.57132
1 gi|7392126 0.09325
1 gi|8486127 0.11645
8 3 0.14471
3 2 0.03457
2 gi|7392130 0.08332
2 gi|7391913 0.04278
3 gi|3214016 0.09468
13 7 0.69229
7 6 0.03131
6 5 0.01490
5 gi|7385294 0.04241
5 4 0.02914
4 gi|7391914 0.06240
4 gi|8486134 0.14230
6 gi|7391268 0.02620
7 gi|3214016 0.02627
34 37 0.82306
37 36 0.07466
36 35 0.04020
35 gi|3214015 0.00078
35 gi|9316315 0.00182
36 gi|7385295 0.02015
37 39 0.02873
39 38 0.01388
38 gi|7392130 0.02606
38 gi|7391914 0.05794
39 gi|8486129 0.05262
18 gi|7385295 0.07476
Output of consense
Majority-rule and strict consensus tree program, version 3.573c
Species in order:
gi|3214017
gi|7391268
gi|7391913
gi|8486136
gi|7385295
gi|3214142
gi|7391268
gi|7391915
gi|8486122
gi|7385294
gi|3214016
gi|7391882
gi|8486138
gi|7391905
gi|7392156
gi|8486131
gi|3214016
gi|7391921
gi|7385294
gi|7385295
gi|7391914
gi|3214016
gi|7391920
gi|8486125
gi|7392126
gi|8486127
gi|7392130
gi|7391913
gi|3214016
gi|7385294
gi|7391914
gi|8486134
gi|7391268
gi|3214016
gi|3214015
gi|9316315
gi|7385295
gi|7392130
gi|7391914
gi|8486129
gi|7385295
CONSENSUS TREE:
the numbers at the forks indicate the number
of times the group consisting of the species
which are to the right of that fork occurred
among the trees, out of 1.00 trees
+----gi|8486131
+--1.0
+--1.0 +----gi|3214016
! !
+--1.0 +---------gi|7392156
! !
+------------1.0 +--------------gi|7391921
! !
! +-------------------gi|7385294
!
! +----gi|7391914
! +--1.0
! ! +----gi|7385295
! +--1.0
+--1.0 ! ! +----gi|3214016
! ! +------------1.0 +--1.0
! ! ! ! +----gi|7391920
! ! ! !
! ! ! +--------------gi|8486125
! ! !
! ! ! +---------gi|3214016
! ! ! +--1.0
! +--1.0 ! ! +----gi|7391913
! ! ! +--1.0
! ! +-------1.0 +----gi|7392130
! ! ! !
! ! ! ! +----gi|7392126
! ! ! +-------1.0
! ! ! +----gi|8486127
+--1.0 +--1.0
! ! ! +--------------gi|7391268
! ! ! !
! ! ! +--1.0 +----gi|7391914
! ! ! ! ! +--1.0
! ! ! ! +--1.0 +----gi|8486134
! ! +--1.0 !
! ! ! +---------gi|7385294
! ! !
! ! +-------------------gi|3214016
! !
! ! +----gi|7391882
+--1.0 ! +--1.0
! ! ! +--1.0 +----gi|8486138
! ! ! ! !
! ! +----------------------1.0 +---------gi|7391905
! ! !
! ! ! +----gi|3214016
! ! +-------1.0
! ! +----gi|7385294
! !
! ! +----gi|7391914
! ! +--1.0
! ! +--1.0 +----gi|7392130
+--1.0 ! ! !
! ! +---------------------------1.0 +---------gi|8486129
! ! !
! ! ! +---------gi|7385295
! ! +--1.0
! ! ! +----gi|9316315
! ! +--1.0
! ! +----gi|3214015
! !
! ! +----gi|7391268
+--1.0 ! +--1.0
! ! ! +--1.0 +----gi|7391915
! ! ! ! !
! ! +--------------------------------1.0 +---------gi|8486122
! ! !
! ! ! +----gi|3214142
! ! +-------1.0
! ! +----gi|7385295
! !
! ! +----gi|7391268
! ! +--1.0
! +------------------------------------------1.0 +----gi|7391913
! !
! +---------gi|8486136
!
+-----------------------------------------------------------gi|3214017
!
+-----------------------------------------------------------gi|7385295
Neighbor tree with branch length:
[Figure: tree with branch lengths; scale bar = 0.2. The 41 tip labels are the gi identifiers listed under "Species in order" above.]
Family Analysis
We also performed a family analysis in the current work to determine the number of base pairs, exons, introns, and open reading frames (ORFs) present in the sequences of these strains.
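The ORF listing that follows reports each frame as a coordinate pair, with reverse-strand frames marked "(REVERSE SENSE)" and written with the larger coordinate first. The report does not say which program produced it, so the sketch below is only an illustration of how such a six-frame scan can work, not the tool that was used. The function names (`find_orfs`, `reverse_complement`), the `min_len` cutoff, and the rule "first ATG through the next in-frame stop codon" are assumptions made for this example; ambiguous bases are not handled.

```python
# Illustrative six-frame ORF scan (an assumed approach, not the report's tool).

COMPLEMENT = str.maketrans("ACGT", "TGCA")
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def reverse_complement(seq):
    """Return the reverse complement of an A/C/G/T string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_len=300):
    """Return (start, end, strand) tuples in 1-based forward-strand coordinates.

    An ORF here runs from the first ATG in a frame through the next in-frame
    stop codon (stop included). Reverse-strand ORFs are reported with
    start > end, mirroring the "[1706 - 1359]" style of the listing.
    min_len is a hypothetical cutoff in nucleotides.
    """
    seq = seq.upper()
    n = len(seq)
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None  # 0-based index of the open ATG, if any
            for i in range(frame, n - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    end = i + 3  # exclusive end, includes the stop codon
                    if end - start >= min_len:
                        if strand == "+":
                            orfs.append((start + 1, end, "+"))
                        else:
                            # map reverse-strand indices back to the forward strand
                            orfs.append((n - start, n - end + 1, "-"))
                    start = None
    return orfs


if __name__ == "__main__":
    demo = "CC" + "ATG" + "AAA" * 3 + "TAA" + "G"  # toy sequence, not report data
    for start, end, strand in find_orfs(demo, min_len=15):
        sense = " (REVERSE SENSE)" if strand == "-" else ""
        print(f"[{start} - {end}]{sense}")
```

Each reported nucleotide range can then be translated codon by codon to obtain protein sequences like those listed below.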
MEDFVRQCFNPMIVELAEKAMKEYGEDPKIETNKFAAICTHLEVCFMYSDFHFIDERGESTIIESGDPNALLKHRFEIIEGRDRTMAWTVVNSICNTTGVE
KPKFLPDLYDYKENRFIEIGVTRREVHTYYLEKANKIKSEKTHIHIFSFTGEEMATKADYTLDEESRARIKTRLFTIRQEMASRGLWDSFRQSERGEETVEE
RFEITGTMCRLADQSLPPNFSSLEKFRAYVDGFEPNGCIEGKLSQMSKEVNARIEPFLKTTPRPLRLPDGPPCSQRSKFLLMDALKLSIEDPSHEGEGIPLYD
AIKCMKTFFGWKEPNIVKPHEKGINPNYLLAWKQVLAELQDIENEEKIPKTKNMRKTSQLKWALGENMAPEKVDFEDCKDVSDLRQYDSDEPKPRSLA
SWIQSEFNKACELTDSSWIELDEIGEDVAPIEHIASMRRNYFTAEVSHCRATEYIMKGVYINTALLNASCAAMDDFQLIPMISKCRTKEGRRKTNLYGFLIK
GRSHLRNDTDVVNFVSMEFSLTDPRLEPHRWEKYCVLRIGDMLLRTEIGQVSRPMFLYVRTNGTSKIKMKWGMEMRRCPFQSLQQIESMIEAESSVKEK
DMTKEFFENKSETWPIGESPKGVEEGSIGKVCRTLLAKSVFNSLYASPQLEGFSAESRKLLLIVQALRDNLEPGTFDLGGLYEAIEECLINDPWVLLNASW
FNSFLTHALR
>NC_007359.1_2 [1706 - 1359] (REVERSE SENSE) Influenza A virus (A/Goose/Guangdong/1/96(H5N1)) segment 3, complete sequence
MGLDTWPISVRKSMSPIRRTQYFSHLCGSSLGSVRENSILTKFTTSVSFLKWDLPFIRNPYRLVFRLPSLVLHLLIIGISWKSSMAAQDAFNKAVFMYTPFI
MYSVALQWDTSAVK
>NC_007359.1_3 [1337 - 1026] (REVERSE SENSE) Influenza A virus (A/Goose/Guangdong/1/96(H5N1)) segment 3, complete sequence
MCSIGATSSPISSSSIQLESVNSHALLNSLWIQLASDLGFGSSLSYCLRSLTSLQSSKSTFSGAIFSPSAHFNWLVFLMFFVFGIFSSFSISWSSASTCFQARR
>NC_007359.1_4 [311 - 3] (REVERSE SENSE) Influenza A virus (A/Goose/Guangdong/1/96(H5N1)) segment 3, complete sequence
MQILFTTVQAIVRSLPSIISNRCFNNALGSPDSIIVDSPRSSIKWKSEYMKQTSKCVHIAANLFVSIFGSSPYSFIAFSASSTIIGLKHCRTKSSILDQYLLL
>NC_007357.1_1 [28 - 2304] Influenza A virus (A/Goose/Guangdong/1/96(H5N1)) segment 1, complete sequence
MERIKELRDLMSQSRTREILTKTTVDHMAIIKKYTSGRQEKNPALRMKWMMAMKYPITADKRIMEMIPERNEQGQTLWSKTNDAGSDRVMVSPLAVT
WWNRNGPTTSTVHYPKVYKTYFEKVERLKHGTFGPVHFRNQVKIRRRVDINPGHADLSAKEAQDVIMEVVFPNEVGARILTSESQLTITKEKKEELQDC
KIAPLMVAYMLERELVRKTRFLPVAGGTSSVYIEVLHLTQGTCWEQMYTPGGEVRNDDVDQSLIIAARNIVRRATVSADPLASLLEMCHSTQIGGIRMV
DILRQNPTEEQAVDICKAAMGLRISSSFSFGGFTFKRTNGSSVKKEEEVLTGNLQTLKIKVHEGYEEFTMVGRRATAILRKATRRLIQLIVSGRDEQSIAEAI
IVAMVFSQEDCMIKAVRGDLNFVNRANQRLNPMHQLLRHFQKDAKVLFQNWGIEPIDNVMGMIGILPDMTPSAEMSLRGVRVSKMGVDEYSSTERVV
VSIDRFLRVRDQQGNVLLSPEEVSETQGTEKLTITYSSSMMWEINGPESVLVNTYQWIIRNWETVKIQWSQDPTMLYNKMEFESFQSLVPKAARSQYSGF
VRTLFQQMRDVLGTFDTVQIIKLLPFAAAPPEPSRMQFSSLTVNVRGSGMRILVRGNSPVFNYNKATKRLTVLGKDAGALTEDPDEGTAGVESAVLRGF
LILGREDKRYGPALSINELSNLAKGEKANVLIMQGDVVLVMKRKRDFSILTDSQTATKRIRMAIN
>NC_007357.1_2 [2303 - 2001] (REVERSE SENSE) Influenza A virus (A/Goose/Guangdong/1/96(H5N1)) segment 1, complete sequence
MMAIRILLVAVWLSVSMLKSRFRFITNTTSPCIINTLAFSPFARLLSSLMLNAGPYLLSSLPRIRNPLNTADSTPAVPSSGSSVNAPASFPRTVSLLVALL
>NC_007364.1_1 [15 - 704] Influenza A virus (A/Goose/Guangdong/1/96(H5N1)) segment 8, complete sequence
MDSNTITSFQVDCYLWHIRKLLSMRDMCDAPFDDRLRRDQKALKGRGSTLGLDLRVATMEGKKIVEDILKSETNENLKIAIASSPAPRYITDMSIEEMSRE
WYMLMPRQKITGGLMVKMDQAIMDKRIILKANFSVLFDQLETLVSLRAFTESGAIVAEIFPIPSVPGHFTEDVKNAIGI
LIGGLEWNDNSIRASENIQRFAWGIHDENGGPSLPPKQKRYMAKRVESEV
>NC_007364.1_2 [481 - 849] Influenza A virus (A/Goose/Guangdong/1/96(H5N1)) segment 8, complete sequence
MWLKYFPFPPYQDILQRMSKMQLESSSVDLNGMITQFERLKIYRDSLGESMMRMGDLHSLQNRNATWRNELSQKFEEIRWLIAECRNILTKTENSFEQIT
FLQALQLLLEVESEIRTFSFQLI
>NC_007364.1_3 [382 - 56] (REVERSE SENSE) Influenza A virus (A/Goose/Guangdong/1/96(H5N1)) segment 8, complete sequence
MAWSIFTIRPPVIFCLGISMYHSRLISSMLISVIYRGAGLEAMAILRFSFVSLFRMSSTIFFPSIVATLKSSPSVLPLPFNAFWSLRSLSSKGASHMSLILSSFLM
CHR
>NC_007361.1_1 [21 - 1427] Influenza A virus (A/Goose/Guangdong/1/96(H5N1)) segment 6, complete sequence
MNPNQKIITIGSICMVVGIISLMLQIGNIISIWVSHSIQTGNQHQAEPCNQSIITYENNTWVNQTYVNISNTNFLTEKAVASVTLAGNSSLCPISGWAVHSKD
NGIRIGSKGDVFVIREPFISCSHLECRTFFLTQGALLNDKHSNGTVKDRSPHRTLMSCPVGEAPSPYNSRFESVAWSASACHDGTSWLTIGISGPDNGAVA
VLKYNGIITDTIKSWRNNILRTQESECACVNGSCFTVMTDGPSNGQASYKIFKMEKGKVVKSVELNAPNYHYEECSCYPDAGEITCVCRDNWHGSNRPW
VSFNQNLEYQIGYICSGVFGDNPRPNDGTGSCGPVSPNGAYGVKGFSFKYGNGVWIGRTKSTNSRSGFEMIWDPNGWTGTDSSFSVKQDIVAITDWSGY
SGSFVQHPELTGLDCIRPCFWVELIRGRPKESTIWTSGSSISFCGVNSDTVGWSWPDDAELPFTIDK
>NC_007361.1_2 [1426 - 959] (REVERSE SENSE) Influenza A virus (A/Goose/Guangdong/1/96(H5N1)) segment 6, complete sequence
MSMVNGNSASSGQDQPTVSLFTPQKDMLLPLVQIVLSLGRPLISSTQKQGLMQSNPVSSGCWTKLPLYPDQSVIATISCFTEKLLSVPVHPFGSQIISKPLLE
LVLLVLPIQTPLPYLNENPFTPYAPLGDTGPQLPVPSLGRGLSPKTPLHIYPI
>NC_007362.1_1 [22 - 1725] Influenza A virus (A/Goose/Guangdong/1/96(H5N1)) segment 4, complete sequence
MEKIVLLLAIVSLVKSDQICIGYHANNSTEQVDTIMEKNVTVTHAQDILEKTHNGKLCDLNGVKPLILRDCSVAGWLLGNPMCDEFINVPEWSYIVEKAS
PANDLCYPGDFNDYEELKHLLSRTNHFEKIQIIPKSSWSNHDASSGVSSACPYHGRSSFFRNVVWLIKKNSAYPTIKRSYNNTNQEDLLVLWGIHHPNDA
AEQTKLYQNPTTYISVGTSTLNQRLVPEIATRPKVNGQSGRMEFFWTILKPNDAINFESNGNFIAPEYAYKIVKKGDSAIMKSELEYGNCNTKCQTPMGA
INSSMPFHNIHPLTIGECPKYVKSNRLVLATGLRNTPQRERRRKKRGLFGAIAGFIEGGWQGMVDGWYGYHHSNEQGSGYAADKESTQKAIDGVTNKV
NSIIDKMNTQFEAVGREFNNLERRIENLNKQMEDGFLDVWTYNAELLVLMENERTLDFHDSNVKNLYDKVRLQLRDNAKELGNGCFEFYHKCDNECM
ESVKNGTYDYPQYSEEARLNREEISGVKLESMGTYQILSIYSTVASSLALAIMVAGLSLWMCSNGSLQCRICI
>NC_007360.1_1 [28 - 1539] Influenza A virus (A/Goose/Guangdong/1/96(H5N1)) segment 5, complete sequence
MSDINIMASQGTKRSYEQMETGGERQNATEIRASVGRMVGGIGRFYIQMCTELKLSDYEGRLIQNSITIERMVLSAFDERRNKYLEEHPSAGKDPKKTGG
PIYRRRDGKWVRELILYDKEEIRRIWRQANNGEDATAGLTHMMIWHSNLNDATYQRTRALVRTGMDPRMCSLMQGSTLPRRSGAAGAAVKGVGTMV
MELIRMIKRGINDRNFWRGENGRRTRIAYERMCNILKGKFQTAAQRAMMDQVRESRNPGNAEIEDLIFLARSALILRGSVAHKSCLPACVYGLAVASGY
DFERE
GYSLVGIDPFRLLQNSQVFSLIRPNENPAHKSQLVWMACHSAAFEDLRVSSFIRGTRVAPRGQLSTRGVQIASNENMETMDSSTLELRSRYWAIRTRSGG
NTNQQRASAGQISVQPTFSVQRNLPFERATIMAAFTGNTEGRTSDMRTEIIRMMESSRPEDVSFQGRGVFELSDEKATNPIVPSFDMSNEGSYFFGDNAEE
YDN
>NC_007360.1_2 [1487 - 1137] (REVERSE SENSE) Influenza A virus (A/Goose/Guangdong/1/96(H5N1)) segment 5, complete sequence
MSKEGTIGFVAFSSESSKTPRPWKDTSSGLELSIILMISVLMSDVLPSVFPVNAAIMVALSKGRFLCTEKVGCTLICPADALCWLVFPPLLVLIAQYLLLSSR
VLESIVSMFSFEAI
>NC_007360.1_3 [465 - 28] (REVERSE SENSE) Influenza A virus (A/Goose/Guangdong/1/96(H5N1)) segment 5, complete sequence
MPDHHVSETSSCIFSIVRLTPNSPDLLFVIQNQLSHPFSVSSSVDWTSSFLWVLPRTGMFFQVFVPPFIKCRENHSLYCYAVLNQPSFIVAEFEFSAHLYIKPP
NSTNHSSNRCSDLSSILAFSTSFHLFIRSFGALRRHDVDVTQ
>NC_007358.1_1 [25 - 2295] Influenza A virus (A/Goose/Guangdong/1/96(H5N1)) segment 2, complete sequence
MDVNPTLLFLKVPAQNAISTTFPYTGDPPYSHGTGTGYTMDTVNRTHQYSEKGKWTTNTETGAPQLNPIDGPLPEDNEPSGYAQTDCVLEAMAFLEESH
PGIFENSCLETMEVVQQTRVDKLTQGRQTYDWTLKRNQPAATALANTIEVFRSNGLTANESGRLIDFLKDVMESMDKGEMEIITHFQRKRRVRDNMTK
KMVTQRTIGKKKQRLNKRSYLIRALTLNTMTKDAERGKLKRRAIATPGMQIRGFVYFVETLARSICEKLEQSGLPVGGNEKKAKLANVVRKMMTNSQD
TELSFTITGDNTKWNENQNPRMFLAMITYITRNQPEWFRNVLSIAPIMFSNKMARLGKGYMFESKSMKLRTQIPAEMLASIDLKYFNESTRKKIEKIRPLLI
DGTASLSPGMMMGMFNMLSTVLGVSILNLGQKRYTKTTYWWDGLQSSDDFALIVNAPNHEGIEAGVDRFYRTCKLVGINMTKKKSYINRTGTCEFTSF
FYRYGFVANFSMELPSFGVSGINESADMSIGVTVIKNNMMDNDLGPATAQMALQLFIKDYRYPYRCHRGDTQIQTRRSFELKKLWEQTRSKAGLLVSDG
GPNPYNIRNLHIPEAGLKWELMDEDYQGRLCNPLNPFVSHKEIESVNNAVVMPAHGPAKSMEYDAVATTHSWIPKRNRSILNTSQRGILEDEQMYQKCC
NLFEKFFPSSSYRRPVGISSMVEAMVSRARIDARIDFESGRIKKEEFAEIMKICSTIEELGRQK
>NC_007358.1_2 [1373 - 963] (REVERSE SENSE) Influenza A virus (A/Goose/Guangdong/1/96(H5N1)) segment 2, complete sequence
MRAKSSEDWSPSHQYVVLVYLFCPRFRIETPKTVLSILNMPIIIPGLNEAVPSISRGLIF
SIFFLVDSLKYFKSMLASISAGICVRSFMLLLSNMYPFPNLAILFENIIGAMLKTFLNHS
GWFLVMYVIIARNIRGF
>NC_007363.1_1 [26 - 781] Influenza A virus (A/Goose/Guangdong/1/96(H5N1)) segment 7, complete sequence
MSLLTEVETYVLSIVPSGPLKAEIAQRLEDVFAGKNTDLEALMEWLKTRPILSPLTKGILGFVFTLTVPSERGLQRRRFVQNALNGNGDPNNMDRAVKLY
KKLKREITFHGAKEVALSYSTGALASCMGLIYNRMGTVTTEVAFGLVCATCEQIADSQHRSHRQMATTTNPLIRHENRMVLASTTAKAMEQMAGSSEQ
AAEAMEVASQARQMVQAMRTIGTHPSSSAGLKDNLLENLQAYQKRMGVQMQRFK
>NC_007363.1_2 [865 - 542] (REVERSE SENSE) Influenza A virus (A/Goose/Guangdong/1/96(H5N1)) segment 7, complete sequence
MHLKKRRSRIHNIKCSIPMILAATTRGSLESLHLHSHSFLVGLQIFKKIIFQTGTGARMSPNCPHCLHHLPSLTSNLHGFRCLLTRSSHLLHSLSCSAGQHHS
VLMPD
>NC_007363.1_3 [378 - 1] (REVERSE SENSE) Influenza A virus (A/Goose/Guangdong/1/96(H5N1)) segment 7, complete sequence
MSATSLAPWNVISLFSFLYSLTALSILFGSPFPFKAFWTKRLRCSPRSLGTVSVNTNPKIPLVRGDRIGLVFSHSMRASRSVFFPAKTSSSLCAISALRGPDGT
IERTYVSTSVRRLIFQYLPAFA
>NC_004907.1_1 [33 - 788] Influenza A virus (A/Hong Kong/1073/99(H9N2)) segment 7, complete sequence
MSLLTEVETYVLSIIPSGPLKAEIAQRLEDVFAGKNTDLEALMEWLKTRPILSPLTKGILGFVFTLTVPSERGLQRRRFVQNALNGNGDPNNMDRAVKLY
KKLKREMTFHGAKEVALSYSTGALASCMGLIYNRMGTVTTEVALGLVCATCEQIADAQHRSHRQMATTTNPLIRHENRMVLASTTAKAMEQMAGSSE
QAAEAMEVASQARQMVQAMRTIGTHPSSSAGLKDDLIENLQAYQKRMGVQMQRFK
>NC_004907.1_2 [385 - 2] (REVERSE SENSE) Influenza A virus (A/Hong Kong/1073/99(H9N2)) segment 7, complete sequence
MSATSFAPWNVISLFSFLYSLTALSMLFGSPFPFRAFWTNRLRCSPRSLGTVSVNTNPKIPLVRGDRIGLVFSHSMRASRSVFFPAKTSSSLCAISALRGPDG
MIERTYVSTSVRRLIFQYLPAFGIP
>NC_004912.1_1 [21 - 2168] Influenza A virus (A/Hong Kong/1073/99(H9N2)) segment 3, complete sequence
MEDFVRQCFNPMIVELAEKTMKEYGEDPKIETNKFAAICTHLEVCFMYSDFHFIDERGESIIVESGDPNALLKHRFEIIEGRDRAMAWTVVNSICNTTGVD
KPKFLPDLYDYKENRFTEIGVTRREVHIYYLEKANKIKSEKTHIHIFSFTGEEMATKADYTLDEESRARIKTRLFTIRQEMASRGLWDSFRQSERGEETIEER
FEITGTMRRLADQSLPPNFSSLENFRAYVDGFKPNGCIEGKLSQMSKEVNARIEPFLKTTPRPLRLPDGPPCSQRSKFLLMDALKLSIEDPSHEGEGIPLYDA
IKCMKTFFGWREPNIIKPHEKGINPNYLLAWKQVLAELQDIENEDKIPKTKNMKKTSQLMWALGENMAPEKLDFEDCKDIGDLKQYQSDEPELRSIASWI
QSEFNKACELTDSSWIELDEIGEDVAPIEHIASMRRNYFTAEVSHCRATEYIMKGVYINTALLNASCAAMDDFQLIPMISKCRTKEGRRKTNLYGFIIKGRS
HLRNDTDVVNFVSMEFSLTDPRLEPHKWEKYCVLEVGEMLLRTAIGQVSRPMFLYVRTNGTSKIKMKWGMEMRRCLLQSLQQIESMIEAESSIKEKDM
TKEFFENRSETWPIGESPKGVEEGSIGKVCRTLLAKSVFNSLYSSPQLEGFSAESRKLLLIVQALRDNLEPGTFDLEGLYGAIEECLINDPWVLLNASWFNS
FLTHALK
>NC_004912.1_2 [1735 - 1412] (REVERSE SENSE) Influenza A virus (A/Hong Kong/1073/99(H9N2)) segment 3, complete sequence
MEVPLVLTYRNMGLDTWPIAVRKSISPTSRTQYFSHLCGSSLGSVRENSILTKFTTSVSFLKWDLPFIMNPYRFVFLLPSFVLHLLIIGISWKSSMAAQDALS
KAVFM
>NC_004912.1_3 [1688 - 1119] (REVERSE SENSE) Influenza A virus (A/Hong Kong/1073/99(H9N2)) segment 3, complete sequence
MAYCSPQEHFPYFKNTVLFPFVWLQPWVSKGEFHTHKVHHVSIISQMGPSFYNEPIQVCLPSSFFCSAFAYHWNQLEVIHGCTRCIEQSCIYVNPLHYVLS
GPAMRHFRCEVVPSHACNVLNWGNIFPYLIEFYPARIGQFTCLVELTLDPACYRSELWLITLILFQIANIFAVLKVQFFRCHILPECPH
>NC_004912.1_4 [331 - 2] (REVERSE SENSE) Influenza A virus (A/Hong Kong/1073/99(H9N2)) segment 3, complete sequence
MGLSTPVVLQMLFTTVQAIARSLPSIISNLCFNNAFGSPDSTIIDSPRSSMKWKSEYMKQTSKCVHIAANLFVSIFGSSPYSFIVFSASSTIIGLKHCRTKSSIL
DQYLL
>NC_004906.1_1 [27 - 716] Influenza A virus (A/Hong Kong/1073/99(H9N2)) segment 8, complete sequence
MDSNTVSSFQVDCFLWHVRKRFADQELGDAPFLDRLRRDQKSLRGRGSTLGLDIRTATREGKHIVERILEEESDEALKMTIASVPASRYLTEMTLEEMSR
DWLMLIPKQKVTGPLCIRMDQAVMGKTIILKANFSVIFNRLEALILLRAFTDEGAIVGEISPLPSLPGHTDEDVKNAIGVLIGGLEWNDNTVRVSETLQRFT
WRSSDENGRSPLPPKQKRKVERTIEPEV
>NC_004906.1_2 [535 - 861] Influenza A virus (A/Hong Kong/1073/99(H9N2)) segment 8, complete sequence
MTRMSKMQLGSSSEDLNGMITQFESLKLYRDSLGEAVMRMGDLHSLQNRNGKWREQLSQKFEEIRWLIEEMRHRLRITENSFEQITFMQALQLLLEVEQ
EIRTFSFQLI
>NC_004906.1_3 [643 - 293] (REVERSE SENSE) Influenza A virus (A/Hong Kong/1073/99(H9N2)) segment 8, complete sequence
MLLQVNLCRVSETRTVLSFHSSPPMRTPIAFLTSSSVCPGREGNGEISPTIAPSSVNALSSIRASSRLKITLKFAFNMMVLPITAWSILMQRGPVTFCLGMSIN
QSLDISSRVISVR
>NC_004905.1_1 [28 - 1539] Influenza A virus (A/Hong Kong/1073/99(H9N2)) segment 5, complete sequence
MSDINIMASQGTKRSYEQMETGGERQNATEIRASVGRMVGGIGRFYVQMCTELKLSDQEGRLIQNSITIERMVLSAFDERRNRYLEEHPSAGKDPKKTG
GPIYRRRDGKWVRELILYDKEEIRRIWRQANNGEDATAGLTHMMIWHSNLNDATYQRTRALVRTGMDPRMCSLMQGSTLPRRSGAAGAAIKGVGTMV
MELIRMIKRGINDRNFWRGDNGRRTRIAYERMCNILKGKFQTAAQRAMMDQVRESRNPGNAEIEDLIFLARSALILRGSVAHKSCLPACVYGLAVASGY
DFEREGYSLVGIDPFRLLQNSQVFSLIRPNENPAHKSQLVWMACHSAAFEDLRVSSFIRGTRVIPRGQLSTRGVQIASNENVEAMDSSTLELRSRYWAIRT
RSGGNTNQQRASAGQISVQPTFSVQRNLPFERPTIMAAFKGNTEGRTSDMRTEIIRMMESARPEDVSFQGRGVFELSDEKATNPIVPSFDMSNEGSYFFGD
NAEEYDN
>NC_004905.1_2 [1047 - 625] (REVERSE SENSE) Influenza A virus (A/Hong Kong/1073/99(H9N2)) segment 5, complete sequence
MACHPYQLTFMCWILIWSNKTEDLTVLKQTERIYPNQRVPFPLKIISTGHSEPVHTSRQAGLMGYGSSQDECRPCQKDEIFNFSIPRISAFSHLIHHCSLCCC
LKFPFEDVAHSLICNPCSSSIIASPEVPVINASLYHPN
>NC_004908.1_1 [32 - 1711] Influenza A virus (A/Hong Kong/1073/99(H9N2)) segment 4, complete sequence
METISLITILLVVTASNADKICIGHQSTNSTETVDTLTETNVPVTHAKELLHTEHNGMLCATSLGHPLILDTCTIEGLVYGNPSCDLLLGGREWSYIVERSSA
VNGTCYPGNVENLEELRTLFSSASSYQRIQIFPDTTWNVTYTGTSRACSGSFYRSMRWLTQKSGFYPVQDAQYTNNRGKSILFVWGIHHPPTYTEQTNLY
IRNDTTTSVTTEDLNRTFKPVIGPRPLVNGLQGRIDYYWSVLKPGQTLRVRSNGNLIAPWYGHVLSGGSHGRILKTDLKGGNCVVQCQTEKGGLNSTLPF
HNISKYAFGTCPKYVRVNSLKLAVGLRNVPARSSRGLFGAIAGFIEGGWPGLVAGWYGFQHSNDQGVGMAADRDSTQKAIDKITSKVNNIVDKMNKQ
YEIIDHEFSEVETRLNMINNKIDDQIQDVWAYNAELLVLLENQKTLDEHDANVNNLYNKVKRALGSNAMEDGKGCFELYHKCDDQCMETIRNGTYNRR
KYREESRLERQKIEGVKLESEGTYKILTIYSTVASSLVLAMGFAAFLFWAMSNGSCRCNICI*
>NC_004909.1_1 [1 - 1401] Influenza A virus (A/Hong Kong/1073/99(H9N2)) segment 6, complete sequence
MNPNQKIIALGSVSITIATICLLMQIAILATTMTLHFNECTNPSNNQAVPCEPIIIERNITEIVHLNNTTIEKESCPKVAEYKNWSKPQCQITGFAPFSKDNSIRL
SAGGDIWVTREPYVSCGLGKCYQFALGQGTTLNNKHSNGTIHDRSPHRTLLMNELGVPFHLGTKQVCIAWSSSSCHDGKAWLHVCVTGDDRNATASII
YDGMLTDSIGSWSKNILRTQESECVCINGTCTVVMTDGSASGRADTKILFIREGKIVHIGPLSGSAQHVEECSCYPRYPEVRCVCRDNWKGSNRPVLYINV
ADYSVDSSYVCSGLVGDTPRNDDSSSSSNCRDPNNERGGPGVKGWAFDNGNDVWMGRTIKKDSRSGYETFRVVGGWTTANSKSQINRQVIVDSDNWS
GYSGIFSVEGKTCINRCFYVELIRGRPQETRVWWTSNSIIVFCGTSGTYGTGSWPDGANINFMSI
>NC_004910.1_1 [28 - 2304] Influenza A virus (A/Hong Kong/1073/99(H9N2)) segment 1, complete sequence
MERIKELRNLMSQSRTREILTKTTVDHMAIIKKYTSGRQEKNPALRMKWMMAMKYPITADKRIMEMIPERNEQGQTLWSKTNDAGSDRVMVSPLAVT
WWNRNGPTTSTVHYPKVYKTYFEKVERLKHGTFGPVHFRNQVKIRRRVDMNPGHADLSAKEAQDVIMEVVFPNEVGARILTSESQLTITKEKREELKN
CNIAPLMVAYMLERELVRKTRFLPVAGGTSSVYIEVLHLTQGTCWEQMYTPGGEVRNDDVDQSLIIAARNIVRRATVSADPLASLLEMCHSTQIGGVRM
VDILKQNPTEEQAVDICKAAMGLKISSSFSFGGFTFKRTKGSSVKREEEVLTGNLQTLKIKVHEGYEEFTMVGRRATAILRKATRRMIQLIVSGRDEQSIAE
AIIVAMVFSQEDCMVKAVRGDLNFVNRANQRLNPMHQLLRHFQKDAKVLFQNWGIEPIDNVMGMIGILPDMTPSTEMSLRGVRVSKMGVDEYSSTER
VVVSIDRFLRVRDQRGNVLLSPEEVSETQGMEKLTITYSSSMMWEINGPESVLVNTYQWIIRNWETVKIQWSQEPTMLYNKMEFEPFQSLVPKAARSQYS
GFVRTLFQQMRDVLGTFDTVQIIKLLPFAAAPPEQSRMQFSSLTVNVRGSGMRILVRGNSPAFNYNKTTKRLTILGKDAGALTEDPDEGTAGVESAVLRG
FLILGKEDKRYGPALSINELSNLTKGEKANVLIGQGDVVLVMKRKRDSSILTDSQTATKRIRMAIN
>NC_004910.1_2 [2303 - 2001] (REVERSE SENSE) Influenza A virus (A/Hong Kong/1073/99(H9N2)) segment 1, complete sequence
MMAIRILLVAVWLSVSMLESRFRFITNTTSPCPINTLAFSPFVRLLSSLMLNAGPYLLSSLPRIRNPLNTADSTPAVPSSGSSVSAPASFPSIVSLLVVLL
>NC_004910.1_3 [1295 - 735] (REVERSE SENSE) Influenza A virus (A/Hong Kong/1073/99(H9N2)) segment 1, complete sequence
MFTKFKSPRTAFTMQSSCENTIATIIASAIDCSSLPLTISWIILLVAFLRMAVALRPTIVNSSYPSCTFIFNVWRLPVSTSSSLLTEDPFVLLKVNPPKLKDELIF
KPIAALHISTACSSVGFCLRMSTILTPPICVLWHISRSEANGSADTVALLTMFLAAMIKLWSTSSFLTSPPGVYICSQQVP
>NC_004905.2_1 [22 - 1533] Influenza A virus (A/Hong Kong/1073/99(H9N2)) segment 5, complete sequence
MSDINIMASQGTKRSYEQMETGGERQNATEIRASVGRMVGGIGRFYVQMCTELKLSDQEGRLIQNSITIERMVLSAFDERRNRYLEEHPSAGKDPKKTG
GPIYRRRDGKWVRELILYDKEEIRRIWRQANNGEDATAGLTHMMIWHSNLNDATYQRTRALVRTGMDPRMCSLMQGSTLPRRSGAAGAAIKGVGTMV
MELIRMIKRGINDRNFWRGDNGRRTRIAYERMCNILKGKFQTAAQRAMMDQVRESRNPGNAEIEDLIFLARSALILRGSVAHKSCLPACVYGLAVASGY
DFEREGYSLVGIDPFRLLQNSQVFSLIRPNENPAHKSQLVWMACHSAAFEDLRVSSFIRGTRVIPRGQLSTRGVQIASNENVEAMDSSTLELRSRYWAIRT
RSGGNTNQQRASAGQISVQPTFSVQRNLPFERPTIMAAFKGNTEGRTSDMRTEIIRMMESARPEDVSFQGRGVFELSDEKATNPIVPSFDMSNEGSYFFGD
NAEEYDN
>NC_004905.2_2 [1041 - 619] (REVERSE SENSE) Influenza A virus (A/Hong Kong/1073/99(H9N2)) segment 5, complete sequence
MACHPYQLTFMCWILIWSNKTEDLTVLKQTERIYPNQRVPFPLKIISTGHSEPVHTSRQAGLMGYGSSQDECRPCQKDEIFNFSIPRISAFSHLIHHCSLCCC
LKFPFEDVAHSLICNPCSSSIIASPEVPVINASLYHPN
>NC_004911.1_1 [24 - 2297] Influenza A virus (A/Hong Kong/1073/99(H9N2)) segment 2, complete sequence
MDVNPTLLFLKVPAQNAISTTFPYTGDPPYSHGTGTGYTMDTVNRTHQYSEKGRWTTNTETGAPQLNPIDGPLPEDNEPSGYAQTDCVLEAMAFLEESH
PGLFENSCLETMEVVQQTRVDKLTQGRQTYDWTLNRNQPAATALANTIEVFRSNGLTANESGRLIDFLKDVMESMDKEEMEITTHFQRKRRVRDNMTK
KMVTQRTIGKKKQKLTKKSYLIRALTLNTMTKDAERGKLKRRAIATPGMQIRGFVHFVEALARSICEKLEQSGLPVGGNEKKAKLANVVRKMMTNSQD
TELSFTVTGDNTKWNENQNPRIFLAMITYITRNQPEWFRNVLSIAPIMFSNKMARLGKGYMFESKSMKLRTQIPAEMLANIDLKYFNESTRKKIEKIRPLLI
EGTASLSPGMMMGMFNMLSTVLGVSILNLGQKRYTKTTYWWDGLQSSDDFALIVNAPNHEGIQAGVDRFYRTCKLVGINMSKKKSYINRTGTFEFTSFF
YRYGFVANFSMELPSFGVSGINESADMSIGVTVIKNNMINNDLGPATAQMALQLFIKDYRYTYRCHRGDTQIQTRRSFELKKLWEQTRSKAGLLVSDGG
PNLYNIRNLHIPEVCLKWELMDEDYQGRLCNPLNPFVSHKEVESVNNAVVMPAHGPAKSMEYDAVATTHSWIPKRNRSILNTSQRGILEDEQMYQKCC
TLFEKFFPSSSYRRPVGISSMMEAMVSRARIDARIDFESGRIKKEEFAEILKICSTIEELGRQGK
>NC_004911.1_2 [2272 - 1925] (REVERSE SENSE) Influenza A virus (A/Hong Kong/1073/99(H9N2)) segment 2, complete sequence
MVEQIFKISANSSFLILPDSKSIRASIRALDTMASIMLEIPTGLRYELLGKNFSNRVQHFWYICSSSRIPLWLVLRMERFLLGIHECVVATASYSMLLAGPWA
GITTALLTDSTSL
>NC_004911.1_3 [1372 - 962] (REVERSE SENSE) Influenza A virus (A/Hong Kong/1073/99(H9N2)) segment 2, complete sequence
MRAKSSEDWSPSHQYVVLVYLFCPRFKIETPKTVLSILNMPIIIPGLNEAVPSISRGLIFSIFFLVDSLKYFKSMFASISAGICVRSFMLLLSNMYPFPNLAILF
ENIIGAMLKTFLNHSGWFLVMYVIIARNIRGF
>NC_007374.1_1 [44 - 1729] Influenza A virus (A/Korea/426/68(H2N2)) segment 4, complete sequence
MAIIYLILLFTAVRGDQICIGYHANNSTEKVDTILERNVTVTHAKDILEKTHNGKLCKLNGIPPLELGDCSIAGWLLGNPECDRLLSVPEWSYIMEKENPRY
SLCYPGSFNDYEELKHLLSSVKHFEKVKILPKDRWTQHTTTGGSWACAVSGKPSFFRNMVWLTRKGSNYPVAKGSYNNTSGEQMLIIWGVHHPNDEAE
QRALYQNVGTYVSVATSTLYKRSIPEIAARPKVNGLGRRMEFSWTLLDMWDTINFESTGNLVAPEYGFKISKRGSSGIMKTEGTLENCETKCQTPLGAIN
TTLPFHNVHPLTIGECPKYVKSEKLVLATGLRNVPQIESRGLFGAIAGFIEGGWQGMVDGWYGYHHSNDQGSGYAADKESTQKAFNGITNKVNSVIEKM
NTQFEAVGKEFSNLEKRLENLNKKMEDGFLDVWTYNAELLVLMENERTLDFHDSNVKNLYDKVRMQLRDNVKELGNGCFEFYHKCDNECMDSVKNG
TYDYPKYEEESKLNRNEIKGVKLSSMGVYQILAIYATVAGSLSLAIMMAGISFWMCSNGSLQCRICI
>NC_007376.1_1 [25 - 2172] Influenza A virus (A/Korea/426/68(H2N2)) segment 3, complete sequence
MEDFVRQCFNPMIVELAEKAMKEYGEDLKIETNKFAAICTHLEVCFMYSDFHFINEQGESIMVELDDPNALLKHRFEIIEGRDRTMAWTVVNSICNTTGA
EKPKFLPDLYDYKENRFIEIGVTRREVHIYYLEKANKIKSENTHIHIFSFTGEEMATKADYTLDEESRARIKTRLFTIRQEMANRGLWDSFRQSERGEETIEE
RFEITGTMRRLADQSLPPNFSCLENFRAYVDGFEPNGYIEGKLSQMSKEVNAKIEPFLKTTPRPIRLPDGPPCFQRSKFLLMDALKLSIEDPSHEGEGIPLYD
AIKCMRTFFGWKEPYIVKPHEKGINPNYLLSWKQVLAELQDIENEEKIPRTKNMKKTSQLKWALGENMAPEKVDFDNCRDISDLKQYDSDEPELRSLSS
WIQNEFNKACELTDSIWIELDEIGEDVAPIEHIASMRRNYFTAEVSHCRATEYIMKGVYINTALLNASCAAMDDFQLIPMISKCRTKEGRRKTNLYGFIIKG
RSHLRNDTDVVNFVSMEFSLTDPRLEPHKWEKYCVLEIGDMLLRSAIGQMSRPMFLYVRTNGTSKIKMKWGMEMRPCLLQSLQQIESMVEAESSVKEK
DMTKEFFENKSETWPIGESPKGVEEGSIGKVCRTLLAKSVFNSLYASPQLEGFSAESRKLLLVVQALRDNLEPGTFDLGGLYEAIEECLINDPWVLLNASW
FNSFLTHALR
>NC_007376.1_2 [858 - 550] (REVERSE SENSE) Influenza A virus (A/Korea/426/68(H2N2)) segment 3, complete sequence
MKTRRPIRKSNWSWCCFQKRFNFCIYFFGHLRKLALNVAVRFESIHIGSKILKAGEVRRETLVGKPAHCPCDFKSFFNCFFASFGLTKGIPEASVGHFLSYG
E
>NC_007377.1_1 [26 - 781] Influenza A virus (A/Korea/426/68(H2N2)) segment 7, complete sequence
MSLLTEVETYVLSIVPSGPLKAEIAQRLEDVFAGKNTDLEALMEWLKTRPILSPLTKGILGFVFTLTVPSERGLQRRRFVQNALNGNGDPNNMDRAVKLY
RKLKREITFHGAKEVALSYSAGALASCMGLIYNRMGAVTTEVAFAVVCATCEQIADSQHRSHRQMVTTTNPLIRHENRMVLASTTAKAMEQMAGSSEQ
AAEAMEVASQARQMVQAMRAIGTPPSSSAGLKDDLLENLQAYQKRMGVQMQRFK
>NC_007377.1_2 [517 - 185] (REVERSE SENSE) Influenza A virus (A/Korea/426/68(H2N2)) segment 7, complete sequence
MPVRPMLGVSNLFTGCTYHGKGHFSGHSPHPVVYEAHATGKCTSRITERYFFGPMECYLP
LKLSIQFNCSVHVIWIPIPIEGILDKASTLQSSLTWHGEREYKSQNPLSQR
>NC_007377.1_3 [378 - 1] (REVERSE SENSE) Influenza A virus (A/Korea/426/68(H2N2)) segment 7, complete sequence
MSATSLAPWNVISLLSFLYSLTALSMLFGSPFPLRAFWTKRLRCSPRSLGTVSVNTNPKI
PLVRGDRIGLVFSHSMRASRSVFFPAKTSSSLCAISALRGPDGTIERTYVSTSVRRLIFQ
YLPAFA
>NC_007382.1_1 [1 - 1407] Influenza A virus (A/Korea/426/68(H2N2)) segment 6, complete sequence
MNPNQKIITIGSVSLTIATVCFLMQIAILVTTVTLHFKQHECDSPASNQVMPCEPIIIER
NITEIVYLNNTTIEKEICPEVVEYRNWSKPQCQITGFAPFSKDNSIRLSAGGDIWVTREP
YVSCDPGKCYQFALGQGTTLDNKHSNDTIHDRIPHRTLLMNELGVPFHLGTRQVCVAWSS
SSCHDGKAWLHVCVTGDDKNATASFIYDGRLMDSIGSWSQNILRTQESECVCINGTCTVV
MTDGSASGRADTRILFIEEGKIVHISPLSGSAQHVEECSCYPRYPDVRCICRDNWKGSNR
PVIDINMEDYSIDSSYVCSGLVGDTPRNDDRSSNSNCRNPNNERGNPGVKGWAFDNGDDV
WMGRTISKDLRSGYETFKVIGGWSTPNSKSQINRQVIVDSNNWSGYSGIFSVEGKRCINR
CFYVELIRGRQQETRVWWTSNSIVVFCGTSGTYGTGSWPDGANINFMPI*
>NC_007381.1_1 [1 - 1494] Influenza A virus (A/Korea/426/68(H2N2)) segment 5, complete sequence
MASQGTKRSYEQMETDGERQNATEIRASVGKMIDGIGRFYIQMCTELKLSDYEGRLIQNS
LTIERMVLSAFDERRNKYLEEHPSAGKDPKKTGGPIYKRVDGKWMRELVLYDKEEIRRIW
RQANNGDDATAGLTHMMIWHSNLNDTTYQRTRALVRTGMDPRMCSLMQGSTLPRRSGAAG
AAVKGVGTMVMELIRMIKRGINDRNFWRGENGRKTRSAYERMCNILKGKFQTAAQRAMMD
QVRESRNPGNAEIEDLIFLARSALILRGSVAHKSCLPACVYGPAIASGYNFEKEGYSLVG
IDPFKLLQNSQVYSLIRPNENPAHKSQLVWMACNSAAFEDLRVLSFIRGTKVSPRGKLST
RGVQIASNENMDTMESSTLELRSRYWAIRTRSGGNTNQQRASAGQISVQPAFSVQRNLPF
DKPTIMAAFTGNTEGRTSDMRAEIIRMMEGAKPEEMSFQGRGVFELSDEKATNPIVPSFD
MSNEGSYFFGDNAEEYDN*
>NC_007381.1_2 [723 - 184] (REVERSE SENSE) Influenza A virus (A/Korea/426/68(H2N2)) segment 5, complete sequence
MIHHCSLCSCLKFSFENVAHSLVSTPCFPSILTSPEVPIIDPTFDHPDQLHHHCPNSFDC
SACSSRPPRESRTLHQRAHPGIHSGANKSSCPLVCCIIQIGMPDHHVSQPSCCIITIIGL
APDSPYFFFVIKDEFPHPLSIYSLVYGSSSFLRILPRAGMFFQIFIPSLVKSREHHSLYC
>NC_007375.1_1 [25 - 2295] Influenza A virus (A/Korea/426/68(H2N2)) segment 2, complete sequence
MDVNPTLLFLKVPAQNAISTTFPYTGDPPYSHGTGTGYTMDTVNRTHQYSEKGKWTTNTE
TGAPQLNPIDGPLPEDNEPSGYAQTDCVLEAMAFLEESHPGIFENSCLETMEVIQQTRVD
KLTQGRQTYDWTLNRNQPAATALANTIEVFRSNGLTANESGRLIDFLKDVIESMDKEEME
ITTHFQRKRRVRDNMTKKMVTQRTIGKKKQRLNKRSYLIRALTLNTMTKDAERGKLKRRA
IATPGMQIRGFVHFVETLARNICEKLEQSGLPVGGNEKKAKLANVVRKMMTNSQDTELSF
TITGDNTKWNENQNPRVFLAMITYITRNQPEWFRNVLSIAPIMFSNKMARLGKGYMFESK
SMKLRTQIPAEMLASIDLKYFNESTRKKIEKIRPLLIDGTVSLSPGMMMGMFNMLSTVLG
VSILNLGQKKYTKTTYWWDGLQSSDDFALIVNAPNHEGIQAGVNRFYRTCKLVGINMSKK
KSYINRTGTFEFTSFFYRYGFVANFSMELPSFGVSGINESADMSIGVTVIKNNMINNDLG
PATAQMALQLFIKDYRYTYRCHRGDTQIQTRRSFELKKLWEQTRSKAGLLVSDGGSNLYN
IRNLHIPEVCLKWELMDEDYQGRLCNPLNPFVSHKEIESVNNAVVMPAHGPAKSMEYDAV
ATTHSWTPKRNRSILNTSQRGILEDEQMYQKCCNLFEKFFPSSSYRRPVGISSMVEAMVS
RARIDARIDFESGRIKKEEFAEIMKICSTIEELRRQK
>NC_007380.1_1 [1 - 711] Influenza A virus (A/Korea/426/68(H2N2)) segment 8, complete sequence
MDSNTVSSFQVDCFLWHVRKQVVDQELGDAPFLDRLRRDQKSLRGRGSTLDLDIEAATRV
GKQIVERILKEESDEALKMTMASAPASRYLTDMTIEELSRDWFMLMPKQKVEGPLCIRID
QAIMDKNIMLKANFSVIFDRLETLILLRAFTEEGAIVGEISPLPSLPGHTIEDVKNAIGV
LIGGLEWNDNTVRVSKTLQRFAWRSSNENGRPPLTPKQKRKMARTIRSKVRRDKMAD
>NC_007380.1_2 [467 - 835] Influenza A virus (A/Korea/426/68(H2N2)) segment 8, complete sequence
MLAKFHHCLLFQDILLRMSKMQLGSSSEDLNGMITQFESLKLYRDSLGEAVMRMGDLHSL
QNRNGKWREQLGQKFEEIRWLIEEVRHRLKITENSFEQITFMQALQLLFEVEQEIRTFSF
QLI*
>NC_007380.1_3 [767 - 120] (REVERSE SENSE) Influenza A virus (A/Korea/426/68(H2N2)) segment 8, complete sequence
MLFAQNYSLLSSICVSLLQSAILSLRTFDLIVLAIFRFCFGVSGGLPFSLLLLQANLCRV
LETRTVLSFHSSPPMRTPIAFLTSSIVCPGREGNGEISPTIAPSSVKALSNIRVSSRSKI
TLKFAFNMMFLSMIAWSILMQRGPSTFCLGISMNQSLDNSSIVMSVRYREAGAEAMVILS
ASSDSSFRILSTICFPTRVAASMSRSRVLPLPLRDF
>NC_007378.1_1 [28 - 2304] Influenza A virus (A/Korea/426/68(H2N2)) segment 1, complete sequence
MERIKELRNLMSQSRTREILTKTTVDHMAIIKKYTSGRQEKNPSLRMKWMMAMKYPITAD
KRITEMVPERNEQGQTLWSKMSDAGSDRVMVSPLAVTWWNRNGPMTSTVHYPKIYKTYFE
KVERLKHGTFGPVHFRNQVKIRRRVDINPGHADLSAKEAQDVIMEVVFPNEVGARILTSE
SQLTITKEKKEELQDCKISPLMVAYMLERELVRKTRFLPVAGGTSSVYIEVLHLTQGTCW
EQMYTPGGEVRNDDVDQSLIIAARNIVRRAAVSADPLASLLEMCHSTQIGGTRMVDILRQ
NPTEEQAVDICKAAMGLRISSSFSFGGFTFKRTSGSSIKREEEVLTGNLQTLKIRVHEGY
EEFTMVGKRATAILRKATRRLVQLIVSGRDEQSIAEAIIVAMVFSQEDCMIKAVRGDLNF
VNRANQRLNPMHQLLRHFQKDAKVLFQNWGIEHIDNVMGMIGVLPDMTPSTEMSMRGIRV
SKMGVDEYSSTERVVVSIDRFLRVRDQRGNVLLSPEEVSETQGTEKLTITYSSSMMWEIN
GPESVLVNTYQWIIRNWETVKIQWSQNPTMLYNKMEFEPFQSLVPKAIRGQYSGFVRTLF
QQMRDVLGTFDTTQIIKLLPFAAAPPKQSRMQFSSLTVNVRGSGMRILVRGNSPVFNYNK
TTKRLTILGKDAGTLTEDPDEGTSGVESAVLRGFLILGKEDRRYGPALSINELSTLAKGE
KANVLIGQGDVVLVMKRKRDSSILTDSQTATKRIRMAIN
>NC_007378.1_2 [2303 - 2001] (REVERSE SENSE) Influenza A virus (A/Korea/426/68(H2N2)) segment 1, complete sequence
MMAIRILLVAVWLSVSMLESRFRFITNTTSPCPISTLAFSPFARVLSSLMLNAGPYLLSS
LPRMRNPLRTADSTPDVPSSGSSVKVPASFPRIVSLLVVLL
>NC_007378.1_3 [1163 - 798] (REVERSE SENSE) Influenza A virus (A/Korea/426/68(H2N2)) segment 1, complete sequence
MVAFLSIAVALFPTIVNSSYPSCTLIFNVWRLPVSTSSSLLIDDPLVLLNVNPPKLKDEL
ILSPIAALHISTACSSVGFCLRMSTILVPPICVLWHISNKDASGSADTAALLTMFLAAII
RL
>NC_007373.1_1 [28 - 2304] Influenza A virus (A/New York/392/2004(H3N2)) segment 1, complete sequence
MERIKELRNLMSQSRTREILTKTTVDHMAIIKKYTSGRQEKNPSLRMKWMMAMKYPITADKRITEMVPERNEQGQTLWSKMSDAGSDRVMVSPLAVT
WWNRNGPVASTVHYPKVYKTYFDKVERLKHGTFGPVHFRNQVKIRRRVDINPHADLSAKEAQDVIMEVVFPNEVGARILTSESQLTITKEKKEELRDCK
ISPLMVAYMLERELVRKTRFLPVAGGTSSIYIEVLHLTQGTCWEQMYTPGGEVRNDDVDQSLIIAARNIVRRAAVSADPLASLLEMCHSTQIGGTRMVDI
LRQNPTEEQAVDICKAAMGLRISSSFSFGGFTFKRTSGSSVKKEEEVLTGNLQTLKIRVHEGYEEFTMVGKRATAILRKATRRLVQLIVSGRDEQSIAEAII
VAMVFSQEDCMIKAVRGDLNFVNRANQRLNPMHQLLRHFQKDAKVLFQNWGIEHIDSVMGMVGVLPDMTPSTEMSMRGIRVSKMGVDEYSSTERVV
VSIDRFLRVRDQRGNVLLSPEEVSETQGTERLTITYSSSMMWEINGPESVLVNTYQWIIRNWEAVKIQWSQNPAMLYNKMEFEPFQSLVPKAIRSQYSGF
VRTLFQQMRDVLGTFDTTQIILLPFAAAPPKQSRMQFSSLTVNVRGSGMRILVRGNSPVFNYNKTTKRLTILGKDAGTLIEDPDESTSGVESAVLRGFLIIG
KEDRRYGPALSINELSNLAKGEKANVLIGQGDVVLVMKRKRDSSILTDSQTATKRIRMAIN
>NC_007373.1_2 [2303 - 2001] (REVERSE SENSE) Influenza A virus (A/New York/392/2004(H3N2)) segment 1, complete sequence
MMAIRILLVAVWLSVSMLESRFRFITNTTSPCPISTLAFSPFARLLSSLMLNAGPYLLSSLPIMRNPLKTADSTPDVLSSGSSIKVPASFPRIVSLLVVLL
>NC_007373.1_3 [1670 - 1368] (REVERSE SENSE) Influenza A virus (A/New York/392/2004(H3N2)) segment 1, complete sequence
MTKTDSGPLISHIIDDEYVIVNLSVPCVSLTSSGDNNTFPRWSRTLKNRSMLTTTLSVLEYSSTPILLTLIPLIDISVLGVISGNTPTIPITLSMCSIPQF
>NC_007373.1_4 [1163 - 735] (REVERSE SENSE) Influenza A virus (A/New York/392/2004(H3N2)) segment 1, complete sequence
MVAFLSIAVALFPTIVNSSYPSCTLIFNVWRLPVSTSSSFLTDDPLVLLNVNPPKLKDELILNPIAALHISTACSSVGFCLRMSTILVPPICVLWHISNKDASGS
ADTAALLTMFLAAIIRLWSTSSFLTSPPGVYICSQHVP
>NC_007369.1_1 [46 - 1539] Influenza A virus (A/New York/392/2004(H3N2)) segment 5, complete sequence
MASQGTKRSYEQMETDGDRQNATEIRASVGKMIDGIGRFYIQMCTELKLSDHEGRLIQNSLTIEKMVLSAFDERRNKYLEEHPSAGKDPKKTGGPIYRRV
DGKWMRELVLYDKEEIRRIWRQANNGEDATAGLTHIMIWHSNLNDATYQRTRALVRTGMDPRMCSLMQGSTLPRRSGAAGAAVKGIGTMVMELIRM
VKRGINDRNFWRGENGRKTRSAYERMCNILKGKFQTAAQRAMVDQVRESRNPGNAEIEDLIFLARSALILRGSVAHKSCLPACAYGPAVSSGYDFEKEG
YSLVGIDPFKLLQNSQIYSLIRPNENPAHKSQLVWMACHSAAFEDLRLLSFIRGTKVSPRGKLSTRGVQIASNENMDNMGSSTLELRSGYWAIRTRSGGNT
NQQRASAGQTSVQPTFSVQRNLPFEKSTIMAAFTGNTEGRTSDMRAEIIRMMEGAKPEEVSFRGRGVFELSDEKATNPIVPSFDMSNEGSYFFGDNAEEY
DN
>NC_007369.1_2 [768 - 445] (REVERSE SENSE) Influenza A virus (A/New York/392/2004(H3N2)) segment 5, complete sequence
MIHHCSLCSCLKFSFKNVAHSLISTSCFPPILTSPEISIVDPPFDHSDQFHHHCPDSFDCSTCSSGPSRESRALHQRAHSGIHSSSNKSSCPLVCCIIQIGMPDHY
VS
>NC_007369.1_3 [499 - 194] (REVERSE SENSE) Influenza A virus (A/New York/392/2004(H3N2)) segment 5, complete sequence
MSSGMLHHSNWNARSLCELDQLSHPHHCWLGARFALSLLCHKGRVPSSIFHLLSCIWAPQFSWDLSPRWGVLPGIYSFFHQKQRAPFSLLSSCSGSTALH
DH
>NC_007369.1_4 [411 - 28] (REVERSE SENSE) Influenza A virus (A/New York/392/2004(H3N2)) segment 5, complete sequence
MAPDSPYLFFVIKDEFPHPFSIYSPVYGPPSFLGIFPRAGVFFQVFIPSFIKSREHHFLYCQAVLDQPPFMITEFKFSAHLDVESPNSINHLPDGCPNLSCILAIPI
SFHLFIRPFGALGRHDFDVTR
>NC_007367.1_1 [26 - 781] Influenza A virus (A/New York/392/2004(H3N2)) segment 7, complete sequence
MSLLTEVETYVLSIVPSGPLKAEIAQRLEDVFAGKNTDLEALMEWLKTRPILSPLTKGILGFVFTLTVPSERGLQRRRFVQNALNGNGDPNNMDKAVKLY
RKLKREITFHGAKEIALSYSAGALASCMGLIYNRMGAVTTEVAFGLVCATCEQIADSQHRSHRQMVATTNPLIKHENRMVLASTTAKAMEQMAGSSEQ
AAEAMEIASQARQMVQAMRAVGTHPSSSTGLRDDLLENLQTYQKRMGVQMQRFK
>NC_007367.1_2 [517 - 185] (REVERSE SENSE) Influenza A virus (A/New York/392/2004(H3N2)) segment 7, complete sequence
MPMRPVLGVSNLFTCCTYQAKCHFSGYSPHPIVYEAHATGKCTSRITESYFFGPMERYLPLKFPIQFNCFVHVIWISIPIEGILDKASTLQSSLTGHGEREHK
PQNPLSQR
>NC_007367.1_3 [378 - 1] (REVERSE SENSE) Influenza A virus (A/New York/392/2004(H3N2)) segment 7, complete sequence
MRAISLAPWNVISLLSFLYSLTALSMLFGSPFPLRAFWTKRLRCSPRSLGTVSVNTNPKIPLVRGDRIGLVFSHSMRASRSVFFPAKTSSSLCAISALRGPDG
TIERTYVSTSVRRLIFQYLPAFA
>NC_007371.1_1 [25 - 2172] Influenza A virus (A/New York/392/2004(H3N2)) segment 3, complete sequence
MEDFVRQCFNPMIVELAEKAMKEYGEDLKIETNKFAAICTHLEVCFMYSDFHFINEQGESIVVELDDPNALLKHRFEIIEGRDRTMAWTVVNSICNTTGA
EKPKFLPDLYDYKENRFIEIGVTRREVHIYYLEKANKIKSENTHIHIFSFTGEEIATKADYTLDEESRARIKTRLFTIRQEMANRGLWDSFRQSERGEETIEEK
FEISGTMRRLADQSLPPKFSCLENFRAYVDGFEPNGCIEGKLSQMSKEVNAKIEPFLKTTPRPIKLPNGPPCYQRSKFLLMDALKLSIEDPSHEGEGIPLYDA
IKCIKTFFGWKEPYIVKPHEKGINSNYLLSWKQVLSELQDIENEEKIPRTKNMKKTSQLKWALGENMAPEKVDFDNCRDISDLKQYDSDEPELRSLSSWIQ
NEFNKACELTDSIWIELDEIGEDVAPIEYIASMRRNYFTAEVSHCRATEYIMKGVYINTALLNASCAAMDDFQLIPMISKCRTKEGRRKTNLYGFIIKGRSH
LRNDTDVVNFVSMEFSLTDPRLEPHKWEKYCVLEIGDMLLRSAIGQISRPMFLYVRTNGTSKVKMKWGMEMRRCLLQSLQQIESMIEAESSIKEKDMTK
EFFENKSEAWPIGESPKGVEEGSIGKVCRTLLAKSVFNSLYASPQLEGFSAESRKLLLVVQALRDNLEPGTFDLGGLYEAIEECLINDPWVLLNASWFNSF
LTHALK
>NC_007368.1_1 [20 - 1426] Influenza A virus (A/New York/392/2004(H3N2)) segment 6, complete sequence
MNPNQKIITIGSVSLTISTICFFMQIAILITTVTLHFKQYEFNSPPNNQVMLCEPTIIERNITEIVYLTNTTIEKEMCPKLAEYRNWSKPQCDITGFAPFSKDNSI
RLSAGGDIWVTREPYVSCDPDKCYQFALGQGTTLNNVHSNDTVHDRTPYRTLLMNELGVPFHLGTKQVCIAWSSSSCHDGKAWLHVCVTGDDKNATA
SFIYNGRLVDSIVSWSKKILRTQESECVCINGTCTVVMTDGSASGKADTKILFIEEGKIIHTSTLSGSAQHVEECSCYPRYPGVRCVCRDNWKGSNRPIVDI
NIKDYSIVSSYVCSGLVGDTPRKNDSSSSSHCLDPNNEEGGHGVKGWAFDDGNDVWMGRTISEKLRSGYETFKVIEGWSKPNSKLQINRQVIVDRGNRS
GYSGIFSVEGKSCINRCFYVELIRGRKEETEVLWTSNSIVVFCGTSGTYGTGSWPDGADINLMPI
>NC_007368.1_2 [360 - 34] (REVERSE SENSE) Influenza A virus (A/New York/392/2004(H3N2)) segment 6, complete sequence
MSPPAESLIELSLEKGANPVMSHCGFDQFLYSASLGHISFSMVVLVRYTISVMFLSIIVGSHSITWLFGGELNSYCLKCNVTVVIRMAICMKKHIVEMVRET
EPIVIIF
>NC_007370.1_1 [27 - 716] Influenza A virus (A/New York/392/2004(H3N2)) segment 8, complete sequence
MDSNTVSSFQVDCFLWHIRKQVVDQELSDAPFLDRLRRDQRSLRGRGNTLGLDIKAATHVGKQIVEKILKEESDEALKMTMVSTPASRYITDMTIEELSR
NWFMLMPKQKVEGPLCIRMDQAIMEKNIMLKANFSVIFDRLETIVLLRAFTEEGAIVGEISPLPSFPGHTIEDVKNAIGVLIGGLEWNDNTVRVSKNLQRF
AWRSSNENGGPPLTPKQKRKMARTARSKV
>NC_007370.1_2 [493 - 861] Influenza A virus (A/New York/392/2004(H3N2)) segment 8, complete sequence
MLAKSHHCLLFQDILLRMSKMQLGSSSEDLNGMITQFESLKIYRDSLGEAVMRMGDLHLLQNRNGKWREQLGQKFEEIRWLIEEVRHRLKTTENSFEQI
TFMQALQLLFEVEQEIRTFSFQLI
>NC_007370.1_3 [793 - 146] (REVERSE SENSE) Influenza A virus (A/New York/392/2004(H3N2)) segment 8, complete sequence
MLFVQSYFQLFLVCVSLLQSAILSLQTFDLAVLAIFRFCFGVSGGPPFSLLLLQANLCRFLETRTVLSFHSSPPMRTPIAFLTSSIVCPGKEGNGEISPTIAPSS
VKALSNTMVSSRSKITLKFAFNMMFFSMIAWSILMQRGPSTFCLGISMNQFLDNSSIVMSVMYREAGVETMVILSASSDSSFRIFSTICFPTWVAALMSRP
RVLPLPLRDL
>NC_007372.1_1 [25 - 2295] Influenza A virus (A/New York/392/2004(H3N2)) segment 2, complete sequence
MDVNPTLLFLKVPAQNAISTTFPYTGDPPYSHGTGTGYTMDTVNRTHQYSEKGKWTTNTETGAPQLNPIDGPLPEDNEPSGYAQTDCVLEAMAFLEESH
PGIFENSCLETMEVVQQTRVDKLTQGRQTYDWTLNRNQPAATALANTIEVFRSNGLTANESGRLIDFLKDVMESMDKEEMEITTHFQRKRRVRDNMTK
KMVTQRTIGKKKQRVNKRGYLIRALTLNTMTKDAERGKLKRRAIATPGMQIRGFVYFVETLARSICEKLEQSGLPVGGNEKKAKLANVVRKMMTNSQ
DTELSF
TITGDNTKWNENQNPRMFLAMITYITKNQPEWFRNILSIAPIMFSNKMARLGKGYMFESKRMKLRTQIPAEMLASIDLKYFNESTRKKIEKIRPLLIDGTAS
LSPGMMMGMFNMLSTVLGVSVLNLGQKKYTKTTYWWDGLQSSDDFALIVNAPNHEGIQAGVDRFYRTCKLVGINMSKKKSYINKTGTFEFTSFFYRY
GFVANFSMELPSFGVSGINESADMSIGVTVIKNNMINNDLGPATAQMALQLFIKDYRYTYRCHRGDTQIQTRRSFELKKLWDQTQSRAGLLVSDGGPNL
YN
IRNLHIPEVCLKWELMDENYRGRLCNPLNPFVSHKEIESVNNAVVMPAHGPAKSMEYDAVATTHSWNPKRNRSILNTSQRGILEDEQMYQKCCNLFEKF
FPSSSYRRPIGISSMVEAMVSRARIDARIDFESGRIKKEEFSEIMKICSTIEELRRQK
>NC_007372.1_2 [2285 - 1884] (REVERSE SENSE) Influenza A virus (A/New York/392/2004(H3N2)) segment 2, complete sequence
MSSSMVEQIFMISENSSFLIRPDSKSILASIRALDTMASTMLEIPIGLLYELLGKNFSNKLQHFWYICSSSRIPLWLVFRIERFLLGFQECVVATASYSILLAGP
WAGITTALFTDSISLWLTKGFRGLQSLPR
>NC_007372.1_3 [1373 - 1011] (REVERSE SENSE) Influenza A virus (A/New York/392/2004(H3N2)) segment 2, complete sequence
MRAKSSEDWSPSHQYVVLVYFFCPRFSTETPKTVLSMLNMPIIIPGLNDAVPSIRRGLIFSIFFLVDSLKYFRSMLASISAGICVRSFILLLSNMYPFPSLAILF
ENIIGAMLRMFLNHSG
>NC_007366.1_1 [30 - 1727] Influenza A virus (A/New York/392/2004(H3N2)) segment 4, complete sequence
MKTIIALSYILCLVFAQKLPGNDNSTATLCLGHHAVPNGTIVKTITNDQIEVTNATELVQSSSTGGICDSPHQILDGENCTLIDALLGDPQCDGFQNKKWDL
FVERSKAYSNCYPYDVPDYASLRSLVASSGTLEFNNESFNWTGVTQNGTSSACKRRSNNSFFSRLNWLTHLKFKYPALNVTMPNNEKFDKLYIWGVHH
PGTDNDQISLYAQASGRITVSTKRSQQTVIPSIGSRPRIRDVPSRISIYWTIVKPGDILLINSTGNLIAPRGYFKIRSGKSSIMRSDAPIGKCNSECITPNGSIPND
KPFQNVNRITYGACPRYVKQNTLKLATGMRNVPEKQTRGIFGAIAGFIENGWEGMVDGWYGFRHQNSEGTGQAADLKSTQAAINQINGKLNRLIGKTN
EKFHQIEKEFSEVEGRIQDLEKYVEDTKIDLWSYNAELLVALENQHTIDLTDSEMNKLFERTKKQLRENAEDMGNGCFKIYHKCDNACIGSIRNGTYDHD
VYRDEALNNRFQIKGVELKSGYKDWILWISFAISC
FLLCVALLGFIMWACQKGNIRCNICI
>NC_007366.1_2 [1259 - 888] (REVERSE SENSE) Influenza A virus (A/New York/392/2004(H3N2)) segment 4, complete sequence
MMEFLVCFPDQPIQLPIDLVDCCLSAFEICCLSCSLRILMPETVPTVYHSLPTIFYETRDCAKYASSLFLWYISHPCCQFQSVLLNISGTGPICDPVYILKWFV
IGNASIWSDAFRIAFANGCI
>NC_002018.1_1 [21 - 1382] Influenza A virus (A/Puerto Rico/8/34(H1N1)) segment 6, complete sequence
MNPNQKIITIGSICLVVGLISLILQIGNIISIWISHSIQTGSQNHTGICNQNIITYKNSTWVKDTTSVILTGNSSLCPIRGWAIYSKDNSIRIGSKGDVFVIREPFIS
CSHLECRTFFLTQGALLNDRHSNGTVKDRSPYRALMSCPVGEAPSPYNSRFESVAWSASACHDGMGWLTIGISGPDNGAVAVLKYNGIITETIKSWRKKI
LRTQESECACVNGSCFTIMTDGPSDGLASYKI
FKIEKGKVTKSIELNAPNSHYEECSCYPDTGKVMCVCRDNWHGSNRPWVSFDQNLDYQIGYICSGVFGDNPRPKDGTGSCGPVYVDGANGVKGFSYRY
GNGVWIGRTKSHSSRHGFEMIWDPNGWTETDSKFSVRQDVVAMTDWSGYSGSFVQHPELTGLDCIRPCFWVELIRGRPKEKTIWTSASSISFCGVNSDT
VDWSWPDGAELPFTIDK
>NC_002023.1_1 [28 - 2304] Influenza A virus (A/Puerto Rico/8/34(H1N1)) segment 1, complete sequence
MERIKELRNLMSQSRTREILTKTTVDHMAIIKKYTSGRQEKNPALRMKWMMAMKYPITADKRITEMIPERNEQGQTLWSKMNDAGSDRVMVSPLAVT
WWNRNGPMTNTVHYPKIYKTYFERVERLKHGTFGPVHFRNQVKIRRRVDINPGHADLSAKEAQDVIMEVVFPNEVGARILTSESQLTITKEKKEELQDC
KISPLMVAYMLERELVRKTRFLPVAGGTSSVYIEVLHLTQGTCWEQMYTPGGEVKNDDVDQSLIIAARNIVRRAAVSADPLASLLEMCHSTQIGGIRMV
DILKQ
NPTEEQAVGICKAAMGLRISSSFSFGGFTFKRTSGSSVKREEEVLTGNLQTLKIRVHEGYEEFTMVGRRATAILRKATRRLIQLIVSGRDEQSIAEAIIVAMV
FSQEDCMIKAVRGDLNFVNRANQRLNPMHQLLRHFQKDAKVLFQNWGVEPIDNVMGMIGILPDMTPSIEMSMRGVRISKMGVDEYSSTERVVVSIDRF
LRVRDQRGNVLLSPEEVSETQGTEKLTITYSSSMMWEINGPESVLVNTYQWIIRNWETVKIQWSQNPTMLYNKMEFEPFQSLVPKAIRGQYSGFVRTLFQ
QMRDVLGTFDTAQIIKLLPFAAAPPKQSRMQFSSFTVNVRGSGMRILVRGNSPVFNYNKATKRLTVLGKDAGTLTEDPDEGTAGVESAVLRGFLILGKED
RRYGPALSINELSNLAKGEKANVLIGQGDVVLVMKRKRDSSILTDSQTATKRIRMAIN
>NC_002023.1_2 [2303 - 2001] (REVERSE SENSE) Influenza A virus (A/Puerto Rico/8/34(H1N1)) segment 1, complete sequence
MMAIRILLVAVWLSVSMLESRFRFITNTTSPCPISTLAFSPFARLLSSLMLNAGPYLLSSLPRMRNPLRTADSTPAVPSSGSSVKVPASFPRTVSLFVALL
>NC_002017.1_1 [33 - 1730] Influenza A virus (A/Puerto Rico/8/34(H1N1)) segment 4, complete sequence
MKANLLVLLCALAAADADTICIGYHANNSTDTVDTVLEKNVTVTHSVNLLEDSHNGKLCRLKGIAPLQLGKCNIAGWLLGNPECDPLLPVRSWSYIVET
PNSENGICYPGDFIDYEELREQLSSVSSFERFEIFPKESSWPNHNTTKGVTAACSHAGKSSFYRNLLWLTEKEGSYPKLKNSYVNKKGKEVLVLWGIHHPS
NSKDQQNIYQNENAYVSVVTSNYNRRFTPEIAERPKVRDQAGRMNYYWTLLKPGDTIIFEANGNLIAPRYAFALSRGFGSGIITSNASMHECNTKCQTPL
GAINSSLPFQNIHPVTIGECPKYVRSAKLRMVTGLRNIPSIQSRGLFGAIAGFIEGGWTGMIDGWYGYHHQNEQGSGYAADQKSTQNAINGITNKVNSVIE
KMNIQFTAVGKEFNKLEKRMENLNKKVDDGFLDIWTYNAELLVLLENERTLDFHDSNVKNLYEKVKSQLKNNAKEIGNGCFEFYHKCDNECMESVRN
GTYDYPKYSEESKLNREKVDGVKLESMGIYQILAIYSTVASSLVLLVSLGAISFWMCSNGSLQCRICI
>NC_002017.1_2 [1680 - 1276] (REVERSE SENSE) Influenza A virus (A/Puerto Rico/8/34(H1N1)) segment 4, complete sequence
MPPGRPKAPVNWRQLSRSPESDRSPLIPISLHLPFPCSTLTLLNIWDNHKSHFLHFPCIHCHTCGRTQNIHFRFLWHYSLIGFLLSHTDSSHLSHGNPESFHFP
VELTILHYMSKCPEIHHQLFYLNFPSFFLIC
>NC_002019.1_1 [28 - 1539] Influenza A virus (A/Puerto Rico/8/34(H1N1)) segment 5, complete sequence
MSDIKIMASQGTKRSYEQMETDGERQNATEIRASVGKMIGGIGRFYIQMCTELKLSDYEGRLIQNSLTIERMVLSAFDERRNKYLEEHPSAGKDPKKTGG
PIYRRVNGKWMRELILYDKEEIRRIWRQANNGDDATAGLTHMMIWHSNLNDATYQRTRALVRTGMDPRMCSLMQGSTLPRRSGAAGAAVKGVGTMV
MELVRMIKRGINDRNFWRGENGRKTRIAYERMCNILKGKFQTAAQKAMMDQVRESRDPGNAEFEDLTFLARSALILRGSVAHKSCLPACVYGPAVASG
YDFEREGYSLVGIDPFRLLQNSQVYSLIRPNENPAHKSQLVWMACHSAAFEDLRVLSFIKGTKVVPRGKLSTRGVQIASNENMETMESSTLELRSRYWAI
RTRSGGNTNQQRASAGQISIQPTFSVQRNLPFDRTTVMAAFTGNTEGRTSDMRTEIIRMMESARPEDVSFQGRGVFELSDEKAASPIVPSFDMSNEGSYFF
GDNAEEYDN
>NC_002019.1_2 [549 - 229] (REVERSE SENSE) Influenza A virus (A/Puerto Rico/8/34(H1N1)) segment 5, complete sequence
MHQRAHPGIHSGANKSPCPLISCIIQIGMPDHHVSQTSRCIVTIISLAPDSPYFFFVIKDEFSHPLSVYSSVYRSSSFLRILPRTGMFFQVFISPFVKSREHHSLY
C
>NC_002022.1_1 [25 - 2172] Influenza A virus (A/Puerto Rico/8/34(H1N1)) segment 3, complete sequence
MEDFVRQCFNPMIVELAEKTMKEYGEDLKIETNKFAAICTHLEVCFMYSDFHFINEQGESIIVELGDPNALLKHRFEIIEGRDRTMAWTVVNSICNTTGAE
KPKFLPDLYDYKENRFIEIGVTRREVHIYYLEKANKIKSEKTHIHIFSFTGEEMATKADYTLDEESRARIKTRLFTIRQEMASRGLWDSFRQSERGEETIEER
FEITGTMRKLADQSLPPNFSSLENFRAYVDGFEPNGYIEGKLSQMSKEVNARIEPFLKTTPRPLRLPNGPPCSQRSKFLLMDALKLSIEDPSHEGEGIPLYDA
IKCMRTFFGWKEPNVVKPHEKGINPNYLLSWKQVLAELQDIENEEKIPKTKNMKKTSQLKWALGENMAPEKVDFDDCKDVGDLKQYDSDEPELRSLAS
WIQNEFNKACELTDSSWIELDEIGEDVAPIEHIASMRRNYFTSEVSHCRATEYIMKGVYINTALLNASCAAMDDFQLIPMISKCRTKEGRRKTNLYGFIIKG
RSHLRNDTDVVNFVSMEFSLTDPRLEPHKWEKYCVLEIGDMLLRSAIGQVSRPMFLYVRTNGTSKIKMKWGMEMRRCLLQSLQQIESMIEAESSVKEKD
MTKEFFENKSETWPIGESPKGVEESSIGKVCRTLLAKSVFNSLYASPQLEGFSAESRKLLLIVQALRDNLEPGTFDLGGLYEAIEECLINDPWVLLNASWFN
SFLTHALS
>NC_002022.1_2 [1706 - 1380] (REVERSE SENSE) Influenza A virus (A/Puerto Rico/8/34(H1N1)) segment 3, complete sequence
MGLETWPMALLRSISPISRTQYFSHLCGSSLGSVRENSMLTKFTTSVSFLKWDLPFMMKPYKLVFRLPSLVLHLLIIGINWKSSIAAQDALSKAVLMYTPFI
MYSVALQ
>NC_002022.1_3 [858 - 550] (REVERSE SENSE) Influenza A virus (A/Puerto Rico/8/34(H1N1)) segment 3, complete sequence
MRTGRPIRKSKWSWCCFQKRFNSSIYFFGHLRQLALNVAVRFESIHIGSKIFKAGEVRRETLVGKLAHCSCDFKPFFNCLFSSLGLTKGIPEASAGHFLSYG
E
>NC_002016.1_1 [26 - 781] Influenza A virus (A/Puerto Rico/8/34(H1N1)) segment 7, complete sequence
MSLLTEVETYVLSIIPSGPLKAEIAQRLEDVFAGKNTDLEVLMEWLKTRPILSPLTKGILGFVFTLTVPSERGLQRRRFVQNALNGNGDPNNMDKAVKLY
RKLKREITFHGAKEISLSYSAGALASCMGLIYNRMGAVTTEVAFGLVCATCEQIADSQHRSHRQMVTTTNPLIRHENRMVLASTTAKAMEQMAGSSEQA
AEAMEVASQARQMVQAMRTIGTHPSSSAGLKNDLLENLQAYQKRMGVQMQRFK
>NC_002016.1_2 [378 - 1] (REVERSE SENSE) Influenza A virus (A/Puerto Rico/8/34(H1N1)) segment 7, complete sequence
MSEISLAPWNVISLLSFLYSLTALSMLFGSPFPLRAFWTKRLRCSPRSLGTVSVNTNPKIPLVRGDRIGLVFSHSMRTSRSVFFPAKTSSSLCAISALRGPDG
MIERTYVSTSVRRLIFQYLPAFA
>NC_002020.1_1 [27 - 716] Influenza A virus (A/Puerto Rico/8/34(H1N1)) segment 8, complete sequence
MDPNTVSSFQVDCFLWHVRKRVADQELGDAPFLDRLRRDQKSLRGRGSTLGLDIETATRAGKQIVERILKEESDEALKMTMASVPASRYLTDMTLEEMS
REWSMLIPKQKVAGPLCIRMDQAIMDKNIILKANFSVIFDRLETLILLRAFTEEGAIVGEISPLPSLPGHTAEDVKNAVGVLIGGLEWNDNTVRVSETLQRF
AWRSSNENGRPPLTPKQKREMAGTIRSEV
>NC_002020.1_2 [493 - 861] Influenza A virus (A/Puerto Rico/8/34(H1N1)) segment 8, complete sequence
MLAKFHHCLLFQDILLRMSKMQLESSSEDLNGMITQFESLKLYRDSLGEAVMRMGDLHSLQNRNEKWREQLGQKFEEIRWLIEEVRHKLKVTENSFEQI
TFMQALHLLLEVEQEIRTFSFQLI
>NC_002020.1_3 [793 - 293] (REVERSE SENSE) Influenza A virus (A/Puerto Rico/8/34(H1N1)) segment 8, complete sequence
MLFAQNYSLLPSVCVSLLQSTILFLQTSDLIVPAISRFCFGVSGGLPFSLLLLQANLCRVSETRTVLSFHSSPPMRTPTAFLTSSAVCPGREGNGEISPTIAPSS
VKALSNIRVSSRSKITLKFAFSMMFLSMIAWSILIQRGPATFCLGMSMDHSLDISSRVMSVR
>NC_002021.1_1 [25 - 2295] Influenza A virus (A/Puerto Rico/8/34(H1N1)) segment 2, complete sequence
MDVNPTLLFLKVPAQNAISTTFPYTGDPPYSHGTGTGYTMDTVNRTHQYSEKARWTTNTETGAPQLNPIDGPLPEDNEPSGYAQTDCVLEAMAFLEESH
PGIFENSCIETMEVVQQTRVDKLTQGRQTYDWTLNRNQPAATALANTIEVFRSNGLTANESGRLIDFLKDVMESMKKEEMGITTHFQRKRRVRDNMTK
KMITQRTIGKRKQRLNKRSYLIRALTLNTMTKDAERGKLKRRAIATPGMQIRGFVYFVETLARSICEKLEQSGLPVGGNEKKAKLANVVRKMMTNSQDT
ELSLTITGDNTKWNENQNPRMFLAMITYMTRNQPEWFRNVLSIAPIMFSNKMARLGKGYMFESKSMKLRTQIPAEMLASIDLKYFNDSTRKKIEKIRPLLI
EGTASLSPGMMMGMFNMLSTVLGVSILNLGQKRYTKTTYWWDGLQSSDDFALIVNAPNHEGIQAGVDRFYRTCKLHGINMSKKKSYINRTGTFEFTSFF
YRYGFVANFSMELPSFGVSGSNESADMSIGVTVIKNNMINNDLGPATAQMALQLFIKDYRYTYRCHRGDTQIQTRRSFEIKKLWEQTRSKAGLLVSDGG
PNLYNIRNLHIPEVCLKWELMDEDYQGRLCNPLNPFVSHKEIESMNNAVMMPAHGPAKNMEYDAVATTHSWIPKRNRSILNTSQRGVLEDEQMYQRCC
NLFEKFFPSSSYRRPVGISSMVEAMVSRARIDARIDFESGRIKKEEFTEIMKICSTIEELRRQK
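Note on reading the ORF headers above: each header carries an ORF identifier, a pair of nucleotide coordinates in square brackets, and an optional "(REVERSE SENSE)" flag for ORFs found on the complementary strand. The following is a minimal Python sketch, not part of the original analysis, showing how such headers could be parsed and how the nucleotide span relates to the number of codons. The names OrfHeader and parse_orf_header are hypothetical and introduced only for illustration. Whether the reported span includes the stop codon depends on the ORF-finding tool and is not assumed here.

```python
import re
from typing import NamedTuple


class OrfHeader(NamedTuple):
    """One parsed ORF header line (hypothetical helper type)."""
    orf_id: str
    begin: int
    end: int
    reverse: bool
    description: str

    @property
    def length_nt(self) -> int:
        # Coordinates are inclusive; reverse-sense ORFs list begin > end.
        return abs(self.end - self.begin) + 1

    @property
    def codons(self) -> int:
        # Number of complete codons covered by the reported span.
        return self.length_nt // 3


# Matches e.g. ">NC_002017.1_2 [1680 - 1276] (REVERSE SENSE) Influenza A virus ..."
HEADER_RE = re.compile(
    r"^>(\S+)\s+\[(\d+)\s*-\s*(\d+)\]\s*(\(REVERSE SENSE\))?\s*(.*)$"
)


def parse_orf_header(line: str) -> OrfHeader:
    """Parse a single header line; raise ValueError if it does not match."""
    match = HEADER_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not an ORF header: {line!r}")
    orf_id, begin, end, rev_flag, description = match.groups()
    return OrfHeader(orf_id, int(begin), int(end), rev_flag is not None, description)


if __name__ == "__main__":
    header = parse_orf_header(
        ">NC_002023.1_2 [2303 - 2001] (REVERSE SENSE) Influenza A virus "
        "(A/Puerto Rico/8/34(H1N1)) segment 1, complete sequence"
    )
    print(header.orf_id, header.length_nt, header.codons, header.reverse)
```

For the header used in the example, the span 2303 to 2001 covers 303 nucleotides, that is 101 codons, which matches the 101 residues listed for that ORF above.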
Predicted genes/exons:
Gn.Ex Type S .Begin ...End .Len Fr Ph I/Ac Do/T CodRg P.... Tscr..
----- ---- - ------ ------ ---- -- -- ---- ---- ----- ----- ------
 8.00 Prom +  14579  14618   40                              -13.78
 8.01 Sngl +  14658  16346 1689  2  0   80   38   557 0.990  45.02
 8.02 PlyA +  16363  16368    6                               -1.75

26.00 Prom +  63021  63060   40                              -13.78
26.01 Sngl +  63101  63793  693  1  0   73   49   386 0.985  29.41
26.02 PlyA +  63940  63945    6                                1.05
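Note on the table above: in each row, Len equals End - Begin + 1, and for a single-exon gene (type Sngl) the length in base pairs divided by three gives the number of codons, one of which is the stop codon. The following Python sketch, which is an illustration and not part of the original analysis, parses one whitespace-separated row and checks this arithmetic. The function name parse_genscan_row and the dictionary keys are hypothetical. The sketch assumes that promoter and poly-A rows carry only the first six columns plus the final score, as they do in the rows shown here.

```python
def parse_genscan_row(row: str) -> dict:
    """Parse one whitespace-separated row of the predicted genes/exons table."""
    fields = row.split()
    if len(fields) < 7:
        raise ValueError(f"unexpected row: {row!r}")
    record = {
        "gn_ex": fields[0],        # gene number and exon number, e.g. "8.01"
        "type": fields[1],         # Prom, Sngl, PlyA, ...
        "strand": fields[2],       # "+" or "-"
        "begin": int(fields[3]),
        "end": int(fields[4]),
        "len": int(fields[5]),
        "tscr": float(fields[-1]),  # the score is always the last column
    }
    if len(fields) == 13:
        # Exon rows also carry frame, phase, signal scores,
        # coding-region score and exon probability.
        record.update(
            fr=int(fields[6]),
            ph=int(fields[7]),
            i_ac=int(fields[8]),
            do_t=int(fields[9]),
            cod_rg=int(fields[10]),
            p=float(fields[11]),
        )
    return record


def expected_peptide_length(cds_len_bp: int) -> int:
    """Residues encoded by a complete CDS whose last codon is a stop codon."""
    if cds_len_bp % 3 != 0:
        raise ValueError("CDS length is not a multiple of three")
    return cds_len_bp // 3 - 1


if __name__ == "__main__":
    exon = parse_genscan_row(
        "8.01 Sngl + 14658 16346 1689 2 0 80 38 557 0.990 45.02"
    )
    # Inclusive coordinates: End - Begin + 1 should reproduce Len.
    print(exon["end"] - exon["begin"] + 1 == exon["len"])
    print(expected_peptide_length(exon["len"]))
```

For row 8.01 this gives 1689 bp, that is 563 codons or 562 residues plus the stop codon, which agrees with the 562_aa and 1689_bp labels of predicted peptide 8 and predicted CDS 8 listed below.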
>gi|GENSCAN_predicted_peptide_1|230_aa
MDSNTVSSFQVDCFLWHVRKRFADQELGDAPFLDRLRRDQKSLRGRGSTLGLDIRTATREGKHIVERILEEESDEALKMTIASVPASRYLTEMTL
EEMSRDWLMLIPKQKVTGPLCIRMDQAVMGKTIILKANFSVIFNRLEALILLRAFTDEGAIVGEISPLPSLPGHTDEDVKNAIGVLIGGLEWNDN
TVRVSETLQRFTWRSSDENGRSPLPPKQKRKVERTIEPEV
>gi|GENSCAN_predicted_CDS_1|693_bp
atggattccaacactgtgtcaagctttcaggtagactgctttctttggcatgtccgcaaacgatttgcagaccaagaactgggtgatgccccatt
ccttgaccggcttcgccgagatcagaagtccctaagaggaagaggcagcactcttggtctggacatcagaactgccactcgtgaaggaaagcata
tagtggagcggattctggaggaagaatctgacgaggcacttaaaatgactatcgcttcagtgcctgcttcacgctacctaactgaaatgactctt
gaggaaatgtcaagggattggttaatgctcattcccaagcagaaagtgacagggcccctttgcattagaatggaccaggcagtaatgggtaaaac
catcatattgaaagcaaactttagtgtgatttttaatcgacttgaagctctgatactacttagagcgtttacagatgaaggagcaatagtgggcg
aaatctcaccattaccttcccttccaggacatactgacgaggatgtcaaaaatgcaattggggtcctcatcggaggacttgaatggaatgataac
acagttcgagtctctgaaactctacagagattcacttggagaagcagtgatgagaatgggagatctccactccctccaaaacagaaacggaaagt
ggagagaacaattgagccagaagtttga
>gi|GENSCAN_predicted_peptide_2|498_aa
MASQGTKRSYEQMETGGERQNATEIRASVGRMVGGIGRFYVQMCTELKLSDQEGRLIQNSITIERMVLSAFDERRNRYLEEHPSAGKDPKKTGGP
IYRRRDGKWVRELILYDKEEIRRIWRQANNGEDATAGLTHMMIWHSNLNDATYQRTRALVRTGMDPRMCSLMQGSTLPRRSGAAGAAIKGVGTMV
MELIRMIKRGINDRNFWRGDNGRRTRIAYERMCNILKGKFQTAAQRAMMDQVRESRNPGNAEIEDLIFLARSALILRGSVAHKSCLPACVYGLAV
ASGYDFEREGYSLVGIDPFRLLQNSQVFSLIRPNENPAHKSQLVWMACHSAAFEDLRVSSFIRGTRVIPRGQLSTRGVQIASNENVEAMDSSTLE
LRSRYWAIRTRSGGNTNQQRASAGQISVQPTFSVQRNLPFERPTIMAAFKGNTEGRTSDMRTEIIRMMESARPEDVSFQGRGVFELSDEKATNPI
VPSFDMSNEGSYFFGDNAEEYDN
>gi|GENSCAN_predicted_CDS_2|1497_bp
atggcgtcgcaaggcaccaaacgatcctatgaacagatggaaactggtggagaacgccagaatgccactgagatcagggcatctgttggaagaat
ggttggtggaattgggaggttttacgtacagatgtgcactgaactcaaactcagcgaccaagaaggaaggttgatccagaacagtataacaatag
agagaatggttctctccgcatttgatgaaaggaggaacaggtacctagaggaacatcccagtgcggggaaggacccgaagaagaccggaggtcca
atctaccgaaggagagacgggaaatgggtgagagagctgattctgtatgacaaagaggagataaggagaatttggcgtcaagcgaacaatggaga
agacgcaactgctggtctcactcatatgatgatctggcattccaacctaaatgatgccacataccagagaacaagagccctcgtgcggactggaa
tggaccccagaatgtgctctctgatgcaaggatcaaccctcccgaggagatctggagctgctggtgcagcaataaagggagtcgggacaatggta
atggaactaattcggatgataaagcgaggcattaatgaccggaacttctggagaggcgataatggacgaagaacaaggattgcatatgagagaat
gtgcaacatcctcaaagggaaatttcaaacagcagcacaaagagcaatgatggatcaggtgcgagaaagcagaaatcctgggaatgctgaaattg
aagatctcatctttctggcacggtctgcactcatcctgagaggatccgtagcccataagtcctgcttgcctgcttgtgtgtacgggctcgctgtg
gccagtggatatgattttgagagggaagggtactctctggttgggatagatcctttccgtctgcttcagaacagtcaggtcttcagtcttattag
accaaatgagaatccagcacataaaagtcaattggtatggatggcatgccattctgcagcatttgaggacctgagagtctcaagtttcattagag
gaacaagagtgatcccaagaggacaactatccactagaggagttcagattgcttcaaatgagaacgtggaagcaatggattccagcactcttgaa
ctgagaagcagatattgggctataaggaccaggagtggaggaaacaccaatcaacagagagcatctgcaggacaaatcagtgtacagcccacttt
ctcagtacagagaaatcttcccttcgaaagaccgaccattatggctgcgtttaaggggaataccgagggcagaacatctgacatgaggactgaaa
tcataaggatgatggaaagtgccagaccagaagatgtgtctttccaggggcggggagtcttcgagctctcggacgaaaaggcaacgaacccgatc
gtgccttcctttgacatgagtaatgaaggatcttatttcttcggagacaatgcagaggaatatgacaattga
>gi|GENSCAN_predicted_peptide_3|498_aa
MASQGTKRSYEQMETDGERQNATEIRASVGKMIDGIGRFYIQMCTELKLSDYEGRLIQNSLTIERMVLSAFDERRNKYLEEHPSAGKDPKKTGGP
IYKRVDGKWMRELVLYDKEEIRRIWRQANNGDDATAGLTHMMIWHSNLNDTTYQRTRALVRTGMDPRMCSLMQGSTLPRRSGAAGAAVKGVGTMV
MELIRMIKRGINDRNFWRGENGRKTRSAYERMCNILKGKFQTAAQRAMMDQVRESRNPGNAEIEDLIFLARSALILRGSVAHKSCLPACVYGPAI
ASGYNFEKEGYSLVGIDPFKLLQNSQVYSLIRPNENPAHKSQLVWMACNSAAFEDLRVLSFIRGTKVSPRGKLSTRGVQIASNENMDTMESSTLE
LRSRYWAIRTRSGGNTNQQRASAGQISVQPAFSVQRNLPFDKPTIMAAFTGNTEGRTSDMRAEIIRMMEGAKPEEMSFQGRGVFELSDEKATNPI
VPSFDMSNEGSYFFGDNAEEYDN
>gi|GENSCAN_predicted_CDS_3|1497_bp
atggcgtcccaaggcaccaaacggtcttatgaacagatggaaactgatggggaacgccagaatgcaactgagatcagagcatccgtcgggaagat
gattgatggaattggacgattctacatccaaatgtgcaccgaacttaaactcagtgattatgaggggcgactgatccagaacagcttaacaatag
agagaatggtgctctctgcttttgacgagagaaggaataaatatctggaagaacatcccagcgcggggaaggatcctaagaaaactggaggaccc
atatacaagagagtagatggaaagtggatgagggaactcgtcctttatgacaaagaagaaataaggcgaatctggcgccaagccaataatggtga
tgatgcaacagctgggctgactcacatgatgatctggcattccaatttgaatgatacaacataccagaggacaagagctcttgttcgcaccggaa
tggatcccaggatgtgctctttgatgcagggttcgactctccctaggaggtctggagctgcaggcgctgcagtcaaaggagttgggacaatggtg
atggagttgatcaggatgatcaaacgtgggatcaatgatcggaacttctggagaggtgagaatggacggaaaacaaggagtgcttacgagagaat
gtgcaacattctcaaaggaaaatttcaaacagctgcacaaagagcaatgatggatcaagtgagagaaagccggaacccaggaaatgctgagatcg
aagatctaatctttctggcacggtctgcactcatattgagagggtcagttgctcacaaatcttgtctgcccgcctgtgtgtatggacctgccata
gccagtgggtacaacttcgaaaaagagggatactctctagtgggaatagaccctttcaaactgcttcaaaacagccaagtatacagcctaatcag
accgaacgagaatccagcacacaagagtcagctggtgtggatggcatgcaattctgctgcatttgaagatctaagagtattaagcttcatcagag
ggaccaaagtatccccaagggggaaactttccactagaggagtacaaattgcttcaaatgaaaacatggatactatggaatcaagtactcttgaa
ctaagaagcaggtactgggccataaggaccagaagtggaggaaacactaatcaacagagggcctctgcaggtcaaatcagtgtacaacctgcatt
ttctgtgcaaagaaacctcccatttgacaaaccaaccatcatggcagcattcactgggaatacagagggaagaacatcagacatgagggcagaaa
tcataaggatgatggaaggtgcaaaaccagaagaaatgtccttccaggggcggggagtcttcgagctctcggacgaaaaggcaacgaacccgatc
gtgccctcttttgacatgagtaatgaaggatcttatttcttcggagacaatgcagaggagtacgacaattaa
>gi|GENSCAN_predicted_peptide_4|759_aa
MERIKELRNLMSQSRTREILTKTTVDHMAIIKKYTSGRQEKNPSLRMKWMMAMKYPITADKRITEMVPERNEQGQTLWSKMSDAGSDRVMVSPLA
VTWWNRNGPMTSTVHYPKIYKTYFEKVERLKHGTFGPVHFRNQVKIRRRVDINPGHADLSAKEAQDVIMEVVFPNEVGARILTSESQLTITKEKK
EELQDCKISPLMVAYMLERELVRKTRFLPVAGGTSSVYIEVLHLTQGTCWEQMYTPGGEVRNDDVDQSLIIAARNIVRRAAVSADPLASLLEMCH
STQIGGTRMVDILRQNPTEEQAVDICKAAMGLRISSSFSFGGFTFKRTSGSSIKREEEVLTGNLQTLKIRVHEGYEEFTMVGKRATAILRKATRR
LVQLIVSGRDEQSIAEAIIVAMVFSQEDCMIKAVRGDLNFVNRANQRLNPMHQLLRHFQKDAKVLFQNWGIEHIDNVMGMIGVLPDMTPSTEMSM
RGIRVSKMGVDEYSSTERVVVSIDRFLRVRDQRGNVLLSPEEVSETQGTEKLTITYSSSMMWEINGPESVLVNTYQWIIRNWETVKIQWSQNPTM
LYNKMEFEPFQSLVPKAIRGQYSGFVRTLFQQMRDVLGTFDTTQIIKLLPFAAAPPKQSRMQFSSLTVNVRGSGMRILVRGNSPVFNYNKTTKRL
TILGKDAGTLTEDPDEGTSGVESAVLRGFLILGKEDRRYGPALSINELSTLAKGEKANVLIGQGDVVLVMKRKRDSSILTDSQTATKRIRMAIN
>gi|GENSCAN_predicted_CDS_4|2280_bp
atggaaagaataaaagaactacggaatctgatgtcgcagtctcgcactcgcgagatactaacaaaaaccacagtggaccatatggccataattaa
gaagtacacatcagggagacaggaaaagaacccgtcacttaggatgaaatggatgatggcaatgaaatatccaattacagctgacaagaggataa
cagaaatggttcctgagagaaatgagcaaggacaaactctatggagtaaaatgagtgatgccgggtcagatcgagtaatggtatcacctttggca
gtgacatggtggaatagaaatggaccaatgacaagtacggttcattatccaaaaatctacaagacttattttgagaaagtcgaaaggttaaaaca
tggaacctttggccctgtccattttagaaaccaagtcaaaatacgccgaagagttgacataaaccctggtcatgcagacctcagtgccaaggagg
cacaagacgtaatcatggaagttgttttccccaatgaagtgggggccaggatactaacgtcggaatcacaattaacaataaccaaagagaaaaaa
gaagaactccaagattgcaaaatttctcctttgatggttgcatacatgttagagagagaacttgtccgaaaaacgagatttctcccagttgctgg
tggaacaagcagtgtgtacattgaagtgttacacttgactcaaggaacatgttgggaacagatgtacaccccaggtggagaagtgaggaatgatg
atgttgatcaaagtctaattattgcagccaggaacatagtgagaagagcagcagtatcagcagatccactagcatctttattggagatgtgccac
agcacacagattggcgggacaaggatggtggacattcttaggcagaacccaacggaagaacaagctgtggatatatgcaaggctgcaatgggact
gagaatcagctcatccttcagttttggcgggttcacatttaagagaacaagcgggtcatcaatcaagagagaggaagaagtgcttacgggcaatc
tccaaacattgaaaataagggtgcatgaggggtacgaggaattcacaatggtggggaaaagggcaacagctatactcagaaaagcaaccaggaga
ttggttcagctgatagtgagtggaagagacgaacagtcaatagccgaagcaataattgtagccatggtgttttcacaagaagattgcatgataaa
agcagttagaggtgacctgaatttcgttaatagggcaaatcagcgattgaatcccatgcatcaacttttaagacattttcagaaagatgcaaaag
tgctctttcaaaattggggaattgaacatatcgacaatgtaatgggaatgattggagtattaccagacatgactccaagcacagagatgtcaatg
agagggataagagtcagcaaaatgggcgtggatgaatactccagcacagagagggtagtggtaagcattgaccggtttttgagagttcgagacca
acgaggaaatgtactactatctcctgaggaggtcagtgaaacacaggggacagagaaactgacaataacttactcatcgtcaatgatgtgggaga
ttaatggccctgagtcagtgttggtcaatacctatcagtggatcatcagaaactgggaaactgttaaaattcaatggtctcagaatcctacaatg
ctatacaataaaatggaatttgagccatttcagtctttagttcctaaggccattagaggccaatacagtggatttgttaggactctattccaaca
aatgagggatgtacttgggacatttgataccacccagataataaagcttcttccctttgcagccgccccaccaaagcaaagtagaatgcagttct
cttcattgactgtgaatgtgaggggatcaggaatgagaatacttgtaaggggcaattctcctgtattcaactacaacaagaccactaagagacta
acaattctcggaaaggatgctggcactttaactgaagacccagatgaaggcacatccggagtggagtccgctgttctgagaggattcctcattct
gggcaaggaagatagaagatatggaccagcattaagcatcaatgaactgagtacccttgcaaaaggagaaaaggctaatgtactaattgggcaag
gagacgtggtgttggtaatgaaacgaaaacgggactctagcatacttactgacagccagacagcgaccaaaagaattcggatggccatcaattaa
>gi|GENSCAN_predicted_peptide_5|252_aa
MSLLTEVETYVLSIVPSGPLKAEIAQRLEDVFAGKNTDLEALMEWLKTRPILSPLTKGILGFVFTLTVPSERGLQRRRFVQNALNGNGDPNNMDR
AVKLYRKLKREITFHGAKEVALSYSAGALASCMGLIYNRMGAVTTEVAFAVVCATCEQIADSQHRSHRQMVTTTNPLIRHENRMVLASTTAKAME
QMAGSSEQAAEAMEVASQARQMVQAMRAIGTPPSSSAGLKDDLLENLQAYQKRMGVQMQRFK
>gi|GENSCAN_predicted_CDS_5|759_bp
atgagccttctaaccgaggtcgaaacgtacgttctctctatcgtcccgtcaggccccctcaaagccgagatcgcacagagacttgaagatgtctt
tgctgggaagaacacagatcttgaggctctcatggaatggctaaagacaagaccaatcctgtcacctctgactaaggggattttgggatttgtat
tcacgctcaccgtgccaagtgagcgaggactgcagcgtagacgctttgtccaaaatgccctcaatgggaatggggatccaaataacatggacaga
gcagttaaactgtatagaaagcttaagagggagataacattccatggggccaaagaagtagcgctcagttattctgctggtgcacttgccagttg
catgggcctcatatacaacaggatgggggctgtgaccactgaagtggcctttgccgtggtatgtgcaacctgtgaacagattgctgactcccagc
ataggtctcacaggcaaatggtgacaacaaccaatccactaataagacatgagaacagaatggttctggccagcactacagctaaggctatggag
caaatggctggatcgagtgagcaagcagcagaggccatggaggttgctagtcaggccaggcaaatggtgcaggcaatgagagccattgggactcc
tcctagctccagtgctggtctaaaagatgatcttcttgaaaatttgcaggcctatcagaaacgaatgggggtgcagatgcaacgattcaagtga
>gi|GENSCAN_predicted_peptide_6|716_aa
MEDFVRQCFNPMIVELAEKAMKEYGEDLKIETNKFAAICTHLEVCFMYSDFHFINEQGESIMVELDDPNALLKHRFEIIEGRDRTMAWTVVNSIC
NTTGAEKPKFLPDLYDYKENRFIEIGVTRREVHIYYLEKANKIKSENTHIHIFSFTGEEMATKADYTLDEESRARIKTRLFTIRQEMANRGLWDS
FRQSERGEETIEERFEITGTMRRLADQSLPPNFSCLENFRAYVDGFEPNGYIEGKLSQMSKEVNAKIEPFLKTTPRPIRLPDGPPCFQRSKFLLM
DALKLSIEDPSHEGEGIPLYDAIKCMRTFFGWKEPYIVKPHEKGINPNYLLSWKQVLAELQDIENEEKIPRTKNMKKTSQLKWALGENMAPEKVD
FDNCRDISDLKQYDSDEPELRSLSSWIQNEFNKACELTDSIWIELDEIGEDVAPIEHIASMRRNYFTAEVSHCRATEYIMKGVYINTALLNASCA
AMDDFQLIPMISKCRTKEGRRKTNLYGFIIKGRSHLRNDTDVVNFVSMEFSLTDPRLEPHKWEKYCVLEIGDMLLRSAIGQMSRPMFLYVRTNGT
SKIKMKWGMEMRPCLLQSLQQIESMVEAESSVKEKDMTKEFFENKSETWPIGESPKGVEEGSIGKVCRTLLAKSVFNSLYASPQLEGFSAESRKL
LLVVQALRDNLEPGTFDLGGLYEAIEECLINDPWVLLNASWFNSFLTHALR
>gi|GENSCAN_predicted_CDS_6|2151_bp
atggaagattttgtgcgacaatgcttcaatccgatgattgtcgaacttgcggaaaaggcaatgaaagagtatggagaagatctgaaaatcgaaac
aaacaaatttgcagcaatatgcactcacttggaagtatgcttcatgtattcagattttcatttcatcaatgagcaaggcgagtcaataatggtag
agcttgatgatccaaatgcacttttgaagcacagatttgaaataatagagggaagagatcgcacaatggcctggacagtagtaaacagtatttgc
aacaccacaggagctgagaaaccgaagtttctgccagatttgtatgattacaaggagaatagattcatcgagattggagtgacaaggagagaagt
ccacatatactatcttgaaaaggccaataaaattaaatctgagaatacacacatccacattttctcattcactggggaagaaatggccacaaagg
ccgactacactctcgatgaggaaagcagggctaggatcaaaaccagactattcaccataagacaagaaatggccaacagaggcctctgggattcc
tttcgtcagtccgaaagaggcgaagaaacaattgaagaaagatttgaaatcacagggacaatgcgcaggcttgccgaccaaagtctcccgccgaa
cttctcctgccttgagaattttagagcctatgtggatggattcgaaccgaacggctacattgagggcaagctttctcaaatgtccaaagaagtaa
atgcaaaaattgaaccttttctgaaaacaacaccaagaccaattagacttccggatgggcctccttgttttcagcggtccaaattcctgctgatg
gatgctttaaaattaagcattgaggacccaagtcacgaaggggagggaataccactatatgatgcgatcaagtgcatgagaacattctttggatg
gaaagaaccctatattgttaaaccacacgaaaagggaataaatccaaattatctgctgtcatggaagcaagtactggcggaactgcaggacattg
agaatgaggagaagattccaagaactaaaaacatgaagaaaacgagtcagctaaagtgggcacttggtgagaacatggcaccagagaaggtagac
tttgacaactgtagagacataagcgatttgaagcaatatgatagtgacgaacctgaattaaggtcactttcaagctggatccagaatgagttcaa
caaggcatgcgagctgaccgattcaatctggatagagctcgatgagattggagaagacgtggctccaattgaacacattgcaagcatgagaagga
attacttcacagcagaggtgtcccattgcagagccacagaatatataatgaagggggtatacattaatactgccttgcttaatgcatcctgtgca
gcaatggacgatttccaactaattcccatgataagcaagtgtagaactaaagagggaaggcgaaagaccaatttatatggtttcatcataaaagg
aagatctcacttaaggaatgacaccgacgtggtaaactttgtgagcatggagttttctctcactgacccgagacttgagccacacaaatgggaga
agtactgtgtccttgagataggagatatgctactaagaagtgccataggccagatgtcaaggcctatgttcttgtatgtgaggacaaatggaaca
tcaaagattaaaatgaaatggggaatggagatgaggccttgcctccttcagtcactacaacaaatcgagagtatggttgaagccgagtcctctgt
caaagagaaagacatgaccaaagagttttttgagaataaatcagaaacatggcccattggggagtcccccaaaggagtggaagaaggttccattg
ggaaggtctgcaggactttattagccaagtcggtattcaatagcctgtatgcatccccacaattagaaggattttcagctgaatcaagaaaactg
cttcttgtcgttcaggctcttagggacaatcttgaacctggaacctttgatcttggggggctatatgaagcaattgaggagtgcctgattaatga
tccctgggttttgcttaatgcgtcttggttcaactccttcctaacacatgcattaagatag
>gi|GENSCAN_predicted_peptide_7|718_aa
MDTVNRTHQYSEKGKWTTNTETGAPQLNPIDGPLPEDNEPSGYAQTDCVLEAMAFLEESHPGIFENSCLETMEVIQQTRVDKLTQGRQTYDWTLN
RNQPAATALANTIEVFRSNGLTANESGRLIDFLKDVIESMDKEEMEITTHFQRKRRVRDNMTKKMVTQRTIGKKKQRLNKRSYLIRALTLNTMTK
DAERGKLKRRAIATPGMQIRGFVHFVETLARNICEKLEQSGLPVGGNEKKAKLANVVRKMMTNSQDTELSFTITGDNTKWNENQNPRVFLAMITY
ITRNQPEWFRNVLSIAPIMFSNKMARLGKGYMFESKSMKLRTQIPAEMLASIDLKYFNESTRKKIEKIRPLLIDGTVSLSPGMMMGMFNMLSTVL
GVSILNLGQKKYTKXTYWWDGLQSSDDFALIVNAPNHEGIQAGVNRFYRTCKLVGINMSKKKSYINRTGTFEFTSFFYRYGFVANFSMELPSFGV
SGINESADMSIGVTVIKNNMINNDLGPATAQMALQLFIKDYRYTYRCHRGDTQIQTRRSFELKKLWEQTRSKAGLLVSDGGSNLYNIRNLHIPEV
CLKWELMDEDYQGRLCNPLNPFVSHKEIESVNNAVVMPAHGPAKSMEYDAVATTHSWTPKRNRSILNTSQRGILEDEQMYQKCCNLFEKFFPSSS
YRRPVGISSMVEAMVSRARIDARIDFESGRIKKEEFAEIMKICSTIEELRRQK
>gi|GENSCAN_predicted_CDS_7|2157_bp
atggacacagtcaacagaacacatcaatattcagaaaaggggaagtggacaacaaacacggaaactggagcgccccaacttaacccaattgatgg
accactacctgaggacaatgaaccaagtggatatgcacaaacagactgcgtcctggaagcaatggctttccttgaggaatcacacccaggaatct
ttgaaaattcgtgtcttgaaacgatggaagttattcaacaaacaagagtggacaaactgacccaaggtcgtcagacctatgactggacattgaac
agaaatcagccggctgcaactgcgctagccaacactatagaggtcttcagatcgaatggactgacagctaatgagtcgggaaggctaatagattt
cctcaaggatgtgatagaatcaatggataaagaggagatggaaataacaacacacttccaaagaaaaagaagagtaagagacaacatgaccaaga
aaatggtcacacaacgaacaataggaaagaagaagcaaagattgaacaagagaagctatctgataagagcactgacattgaacacaatgactaaa
gatgcagagagaggtaaattaaaaagaagagcaattgcaacacccggtatgcagatcagagggttcgtgcactttgtcgaaacactagcgagaaa
tatttgtgagaaacttgaacagtctgggcttccggttggaggtaatgaaaagaaggctaaactagcaaatgttgttagaaaaatgatgactaatt
cacaagacacagagctctctttcacaattactggagacaacaccaaatggaatgagaatcaaaatcctcgagtgtttctggcgatgataacatac
atcacaagaaatcaacctgaatggtttagaaacgtcctgagcattgcacccataatgttctcaaataaaatggctagactagggaaaggttacat
gttcgaaagcaagagcatgaagctccgaacacaaataccagcagaaatgctagcaagtattgacctgaaatactttaatgaatcaaccagaaaga
aaattgagaaaataaggcctctcctaatagatggcacagtctcattgagtcctggaatgatgatgggcatgttcaacatgctaagtacagtctta
ggagtctcaatcctgaatctcgggcaaaagaaatacaccaaaacnacatactggtgggacggactccaatcctctgatgacttcgctctcatagt
gaatgcaccaaatcatgagggaatacaagcaggggtgaatagattctacagaacctgcaagctagtcggaatcaatatgagcaaaaagaagtcct
acataaataggacagggacatttgaattcacaagctttttctatcgctatggatttgtagccaattttagcatggagctgcccagctttggagtg
tctggaattaatgaatcggctgatatgagcattggggtaacagtgataaagaacaatatgataaataatgaccttgggccagcaacagcccaaat
ggctcttcaactattcatcaaagactacagatacacgtaccggtgccacagaggggacacacaaattcagacaaggagatcattcgagctaaaga
agctgtgggagcaaacccgctcaaaggcaggacttttggtgtcggatggaggatcaaacttatacaatatccggaatctccacattccagaagtc
tgcttgaaatgggagctaatggatgaagactatcaggggaggctttgtaatcccctgaatccatttgtcagtcataaggaaattgagtctgtaaa
caatgctgtggtaatgccagctcacggtccagccaagagcatggaatatgatgctgttgctactacacactcctggacccctaagaggaaccgct
ccattctcaacacaagccaaaggggaattcttgaagatgaacagatgtatcagaagtgttgcaatctatttgagaaattcttccctagcagttcg
tacaggagaccagttggaatttccagcatggtggaggccatggtgtctagggctcggattgatgcacggattgacttcgagtctggacggattaa
gaaagaggagttcgctgagatcatgaagatctgttccaccattgaagagctcagacggcaaaaatag
>gi|GENSCAN_predicted_peptide_8|562_aa
MAIIYLILLFTAVRGDQICIGYHANNSTEKVDTILERNVTVTHAKDILEKTHNGKLCKLNGIPPLELGDCSIAGWLLGNPECDRLLSVPEWSYIM
EKENPRYSLCYPGSFNDYEELKHLLSSVKHFEKVKILPKDRWTQHTTTGGSWACAVSGKPSFFRNMVWLTRKGSNYPVAKGSYNNTSGEQMLIIW
GVHHPNDEAEQRALYQNVGTYVSVATSTLYKRSIPEIAARPKVNGLGRRMEFSWTLLDMWDTINFESTGNLVAPEYGFKISKRGSSGIMKTEGTL
ENCETKCQTPLGAINTTLPFHNVHPLTIGECPKYVKSEKLVLATGLRNVPQIESRGLFGAIAGFIEGGWQGMVDGWYGYHHSNDQGSGYAADKES
TQKAFNGITNKVNSVIEKMNTQFEAVGKEFSNLEKRLENLNKKMEDGFLDVWTYNAELLVLMENERTLDFHDSNVKNLYDKVRMQLRDNVKELGN
GCFEFYHKCDNECMDSVKNGTYDYPKYEEESKLNRNEIKGVKLSSMGVYQILAIYATVAGSLSLAIMMAGISFWMCSNGSLQCRICI
>gi|GENSCAN_predicted_CDS_8|1689_bp
atggccatcatttatctcatactcctgttcacagcagtgaggggggaccagatatgcattggataccatgccaataattccacagaaaaggtcga
cacaattctagagcggaatgtcactgtgactcatgccaaggacatccttgagaagacccataacggaaagctatgcaaactaaacggaatccctc
cacttgaactaggggactgtagcattgccggatggctccttggaaatccagaatgtgataggcttctaagtgtgccagaatggtcctatataatg
gagaaagaaaacccgagatacagtttgtgttacccaggcagcttcaatgactatgaagaattgaaacatctcctcagcagcgtgaaacattttga
gaaagttaagattttgcccaaagatagatggacacagcatacaacaactggaggttcatgggcctgcgcggtgtcaggtaaaccatcattcttca
ggaacatggtctggctgacacgtaaaggatcaaattatccggttgccaaaggatcgtacaacaatacaagcggagaacaaatgctaataatttgg
ggagtgcaccatcctaatgatgaggcagaacaaagagcattgtaccagaatgtgggaacctatgtttccgtagccacatcaacattgtacaaaag
gtcaatcccagaaatagcagcaaggcctaaagtgaatggactaggacgtagaatggaattctcttggaccctcttggatatgtgggacaccataa
attttgagagcactggtaatctagttgcaccagagtatgggttcaaaatatcgaaaagaggtagttcagggatcatgaagacagaaggaacactt
gagaactgtgaaaccaaatgccaaactcctttgggagcaataaatacaacactaccttttcacaatgtccacccactgacaataggtgaatgccc
caaatatgtaaaatcggagaaattggtcttagcaacaggactaaggaatgttccccagattgaatcaagaggattgtttggggcaatagctggtt
ttatagaaggaggatggcaaggaatggttgatggttggtatggataccatcacagcaatgaccagggatcagggtatgcagcagacaaagaatcc
actcaaaaggcatttaatggaatcaccaacaaggtaaattctgtgattgaaaagatgaacacccaatttgaagctgttgggaaagaattcagtaa
cttagagaaaagactggagaacttgaacaaaaagatggaagacgggtttctagatgtgtggacatacaatgcagagcttctagttctgatggaaa
atgagaggacacttgactttcatgattctaatgtcaagaatctgtatgataaagtcagaatgcagctgagagacaacgtcaaagaactaggaaat
ggatgttttgaattttatcacaaatgtgacaatgaatgcatggatagtgtgaaaaacgggacatatgattatcccaagtatgaagaagaatctaa
actaaatagaaatgaaatcaaaggggtaaaattgagcagcatgggggtttatcaaatccttgccatttatgctacagtagcaggttctctgtcac
tggcaatcatgatggctgggatctctttctggatgtgctccaacgggtctctgcagtgcagaatctgcatatga
>gi|GENSCAN_predicted_peptide_9|759_aa
MERIKELRNLMSQSRTREILTKTTVDHMAIIKKYTSGRQEKNPSLRMKWMMAMKYPITADKRITEMVPERNEQGQTLWSKMSDAGSDRVMVSPLA
VTWWNRNGPVASTVHYPKVYKTYFDKVERLKHGTFGPVHFRNQVKIRRRVDINPGHADLSAKEAQDVIMEVVFPNEVGARILTSESQLTITKEKK
EELRDCKISPLMVAYMLERELVRKTRFLPVAGGTSSIYIEVLHLTQGTCWEQMYTPGGEVRNDDVDQSLIIAARNIVRRAAVSADPLASLLEMCH
STQIGGTRMVDILRQNPTEEQAVDICKAAMGLRISSSFSFGGFTFKRTSGSSVKKEEEVLTGNLQTLKIRVHEGYEEFTMVGKRATAILRKATRR
LVQLIVSGRDEQSIAEAIIVAMVFSQEDCMIKAVRGDLNFVNRANQRLNPMHQLLRHFQKDAKVLFQNWGIEHIDSVMGMVGVLPDMTPSTEMSM
RGIRVSKMGVDEYSSTERVVVSIDRFLRVRDQRGNVLLSPEEVSETQGTERLTITYSSSMMWEINGPESVLVNTYQWIIRNWEAVKIQWSQNPAM
LYNKMEFEPFQSLVPKAIRSQYSGFVRTLFQQMRDVLGTFDTTQIIKLLPFAAAPPKQSRMQFSSLTVNVRGSGMRILVRGNSPVFNYNKTTKRL
TILGKDAGTLIEDPDESTSGVESAVLRGFLIIGKEDRRYGPALSINELSNLAKGEKANVLIGQGDVVLVMKRKRDSSILTDSQTATKRIRMAIN
>gi|GENSCAN_predicted_CDS_9|2280_bp
atggaaagaataaaagaactacggaacctgatgtcgcagtctcgcactcgcgagatactgacaaaaaccacagtggaccatatggccataattaa
gaagtacacatcggggagacaggaaaagaacccgtcacttaggatgaaatggatgatggcaatgaaatacccaatcactgctgacaaaaggataa
cagaaatggttccggagagaaatgaacaaggacaaactctatggagtaaaatgagtgatgctggatcagatcgagtgatggtatcacctttggct
gtaacatggtggaatagaaatggacccgtggcaagtacggtccattacccaaaagtatacaagacttattttgacaaagtcgaaaggttaaaaca
tggaacctttggccctgttcattttagaaatcaagtcaagatacgcagaagagtagacataaaccctggtcatgcagacctcagtgccaaagagg
cacaagatgtaattatggaagttgtttttcccaatgaagtgggagccaggatactaacatcagaatcgcaattaacaataactaaagagaaaaaa
gaagaactccgagattgcaaaatttctcccttgatggttgcatacatgttagagagagaacttgtccgaaaaacaagatttctcccagttgctgg
cggaacaagcagtatatacattgaagtcttacatttgactcaaggaacgtgttgggaacaaatgtacactccaggtggagaagtgaggaatgacg
atgttgaccaaagcctaattattgcggccaggaacatagtaagaagagctgcagtatcagcagatccactagcatctttattggagatgtgccac
agcacacaaattggcgggacaaggatggtggacattcttagacagaacccgactgaagaacaagctgtggatatatgcaaggctgcaatgggatt
gagaatcagctcatccttcagctttggtgggtttacatttaaaagaacaagcgggtcatcagtcaaaaaagaggaagaagtgcttacaggcaatc
tccaaacattgaagataagagtacatgaggggtatgaggagttcacaatggtggggaaaagagcaacagctatactcagaaaagcaaccagaaga
ttggttcagctcatagtgagtggaagagacgaacagtcaatagccgaagcaataatcgtggccatggtgttttcacaagaggattgcatgataaa
agcagttagaggtgacctgaatttcgtcaacagagcaaatcaacggttgaaccccatgcatcagcttttaaggcattttcagaaagatgcgaaag
tgctttttcaaaattggggaattgaacacatcgacagtgtgatgggaatggttggagtattaccagatatgactccaagcacagagatgtcaatg
agaggaataagagtcagcaaaatgggtgtggatgaatactccagtacagagagggtggtggttagcattgatcggtttttgagagttcgagacca
acgcgggaatgtattattgtctcctgaggaggtcagtgaaacacagggaactgaaagattgacaataacatattcatcgtcgatgatgtgggaga
ttaacggtcctgagtcggttttggtcaatacctatcaatggatcatcagaaattgggaagctgtcaaaattcaatggtctcagaatcctgcaatg
ttgtacaacaaaatggaatttgaaccatttcaatctttagtccccaaggccattagaagccaatacagtgggtttgtcagaactctattccaaca
aatgagagacgtacttgggacatttgacaccacccagataataaagcttctcccttttgcagccgctccaccaaagcaaagcagaatgcagttct
cttcactgactgtaaatgtgaggggatcagggatgagaatacttgtaaggggcaattctcctgtattcaactacaacaagaccactaaaagacta
acaattctcggaaaagatgccggcactttaattgaagacccagatgaaagcacatccggagtggagtccgccgtcttgagagggtttctcattat
aggtaaggaagacagaagatacggaccagcattaagcatcaatgaactgagtaaccttgcaaaaggggaaaaggctaatgtgctaatcgggcaag
gagacgtggtgttggtaatgaaacgaaaacgggactctagcatacttactgacagccagacagcgaccaaaagaattcggatggccatcaattaa
>gi|GENSCAN_predicted_peptide_10|757_aa
MDVNPTLLFLKVPAQNAISTTFPYTGDPPYSHGTGTGYTMDTVNRTHQYSEKGKWTTNTETGAPQLNPIDGPLPEDNEPSGYAQTDCVLEAMAFL
EESHPGIFENSCLETMEVVQQTRVDKLTQGRQTYDWTLNRNQPAATALANTIEVFRSNGLTANESGRLIDFLKDVMESMDKEEMEITTHFQRKRR
VRDNMTKKMVTQRTIGKKKQRVNKRGYLIRALTLNTMTKDAERGKLKRRAIATPGMQIRGFVYFVETLARSICEKLEQSGLPVGGNEKKAKLANV
VRKMMTNSQDTELSFTITGDNTKWNENQNPRMFLAMITYITKNQPEWFRNILSIAPIMFSNKMARLGKGYMFESKRMKLRTQIPAEMLASIDLKY
FNESTRKKIEKIRPLLIDGTASLSPGMMMGMFNMLSTVLGVSVLNLGQKKYTKTTYWWDGLQSSDDFALIVNAPNHEGIQAGVDRFYRTCKLVGI
NMSKKKSYINKTGTFEFTSFFYRYGFVANFSMELPSFGVSGINESADMSIGVTVIKNNMINNDLGPATAQMALQLFIKDYRYTYRCHRGDTQIQT
RRSFELKKLWDQTQSRAGLLVSDGGPNLYNIRNLHIPEVCLKWELMDENYRGRLCNPLNPFVSHKEIESVNNAVVMPAHGPAKSMEYDAVATTHS
WNPKRNRSILNTSQRGILEDEQMYQKCCNLFEKFFPSSSYRRPIGISSMVEAMVSRARIDARIDFESGRIKKEEFSEIMKICSTIEELRRQK
>gi|GENSCAN_predicted_CDS_10|2274_bp
atggatgtcaatccgactctactgttcctaaaggttccagcgcaaaatgccataagcaccacattcccttatactggagatcctccatacagcca
tggaacaggaacaggatacaccatggacacagtcaacagaacacaccaatattcagagaaggggaagtggacgacaaatacagaaactggggcac
cccaactcaacccaattgatggaccactacctgaggataatgagccaagtggatatgcacaaacagactgtgtcctggaggctatggccttcctt
gaagaatcccacccaggtatctttgagaactcatgccttgaaacaatggaagtcgttcaacaaacaagggtggacaaactaacccaaggccgcca
gacttatgattggacattaaacagaaatcaaccggcagcaactgcattagccaacaccatagaagtttttagatcgaatggactaacagccaatg
aatcaggaaggctaatagatttcctcaaggatgtgatggaatcaatggataaagaggaaatggagataacaacacactttcaaagaaaaaggaga
gtaagagacaacatgaccaagaaaatggtcacacaaagaacaatagggaagaaaaaacaaagagtgaataagagaggctatctaataagagcttt
gacattgaacacgatgaccaaagatgcagagagaggtaaattaaaaagaagggctattgcaacacccgggatgcaaattagagggttcgtgtact
tcgttgaaactttagctagaagcatttgcgaaaagcttgaacagtctggacttccggttgggggtaatgaaaagaaggccaaactggcaaatgtt
gtgagaaaaatgatgactaattcacaagacactgagctttctttcacaatcactggggacaacactaagtggaatgaaaatcaaaaccctcgaat
gtttttggcgatgattacatatatcacaaaaaatcaacctgagtggttcagaaacatcctgagcatcgcaccaataatgttctcaaacaaaatgg
caagactaggaaaaggatacatgttcgagagtaagagaatgaagctccgaacacaaatacccgcagaaatgctagcaagcattgacctgaagtat
ttcaatgaatcaacaaggaagaaaattgagaaaataaggcctcttctaatagatggcacagcatcattgagccctgggatgatgatgggcatgtt
caacatgctaagtacggttttaggagtctcggtactgaatcttgggcaaaagaaatacaccaagacaacatactggtgggatgggctccaatcct
ccgacgattttgccctcatagtgaatgcaccaaatcatgagggaatacaagcaggagtggatagattctacaggacctgcaagttagtgggaatc
aacatgagcaaaaagaagtcctatataaataaaacagggacatttgaattcacaagctttttttatcgatatggatttgtggctaattttagcat
ggagcttcccagttttggagtgtctggaataaacgagtcagctgatatgagtattggagtaacagtgataaagaacaacatgataaacaatgacc
ttgggccagcaacagcccagatggctctccaattgttcatcaaagactacagatatacatataggtgccatagaggagacacacaaattcagacg
agaagatcattcgagctaaagaagctgtgggatcaaacccaatcaagggcaggactattggtatcagatgggggaccaaacttatacaatatccg
gaaccttcacatccctgaagtctgcttaaagtgggagctaatggatgagaattatcggggaagactttgtaaccccctgaatccctttgtcagcc
ataaagaaattgagtctgtaaacaatgctgtagtgatgccagcccacggtccagccaaaagtatggaatatgatgccgttgcaactacacactcc
tggaatcccaagaggaaccgctctattctaaacactagccaaaggggaattcttgaggatgaacagatgtaccaaaagtgctgcaacttgttcga
gaaatttttccctagtagttcatataggagaccgattggaatttctagcatggtggaggccatggtgtctagggcccggattgatgccagaattg
acttcgagtctggacggattaagaaggaagagttctctgagatcatgaagatctgttccaccattgaagaactcagacggcaaaaataa
>gi|GENSCAN_predicted_peptide_11|716_aa
MEDFVRQCFNPMIVELAEKAMKEYGEDLKIETNKFAAICTHLEVCFMYSDFHFINEQGESIVVELDDPNALLKHRFEIIEGRDRTMAWTVVNSIC
NTTGAEKPKFLPDLYDYKENRFIEIGVTRREVHIYYLEKANKIKSENTHIHIFSFTGEEIATKADYTLDEESRARIKTRLFTIRQEMANRGLWDS
FRQSERGEETIEEKFEISGTMRRLADQSLPPKFSCLENFRAYVDGFEPNGCIEGKLSQMSKEVNAKIEPFLKTTPRPIKLPNGPPCYQRSKFLLM
DALKLSIEDPSHEGEGIPLYDAIKCIKTFFGWKEPYIVKPHEKGINSNYLLSWKQVLSELQDIENEEKIPRTKNMKKTSQLKWALGENMAPEKVD
FDNCRDISDLKQYDSDEPELRSLSSWIQNEFNKACELTDSIWIELDEIGEDVAPIEYIASMRRNYFTAEVSHCRATEYIMKGVYINTALLNASCA
AMDDFQLIPMISKCRTKEGRRKTNLYGFIIKGRSHLRNDTDVVNFVSMEFSLTDPRLEPHKWEKYCVLEIGDMLLRSAIGQISRPMFLYVRTNGT
SKVKMKWGMEMRRCLLQSLQQIESMIEAESSIKEKDMTKEFFENKSEAWPIGESPKGVEEGSIGKVCRTLLAKSVFNSLYASPQLEGFSAESRKL
LLVVQALRDNLEPGTFDLGGLYEAIEECLINDPWVLLNASWFNSFLTHALK
>gi|GENSCAN_predicted_CDS_11|2151_bp
atggaagattttgtgcgacaatgcttcaacccgatgattgtcgaacttgcagaaaaagcaatgaaagagtatggagaggatctgaaaattgaaac
aaacaaatttgcagcaatatgcacccacttggaggtatgtttcatgtattcagattttcatttcatcaatgaacaaggcgaatcaatagtggtag
aacttgatgatccaaatgcactgttaaagcacagatttgaaataatcgaggggagagacagaacaatggcctggacagtagtaaacagtatctgc
aacactactggagcagaaaaaccaaagtttctaccagatttgtatgattacaaggagaatagattcatcgaaattggagtgacaagaagagaagt
ccacatatattaccttgaaaaggccaataaaattaaatctgagaacacacacattcacatcttctcattcactggggaggaaatagccacaaagg
cagactacactctcgacgaggaaagcagggctaggattaaaaccaggctatttaccataagacaagaaatggccaacagaggcctctgggattcc
tttcgtcagtccgaaagaggcgaagaaacaattgaagaaaaatttgaaatctcaggaactatgcgtaggcttgccgaccaaagtctcccaccgaa
attctcctgccttgagaattttagagcctatgtggatggattcgaaccgaacggctgcattgagggcaagctttctcaaatgtccaaagaagtga
atgccaaaattgaaccttttctgaagacaacaccaagaccaatcaaacttcctaatggacctccttgttatcagcggtccaaattcctcctgatg
gatgctttgaaattgagcattgaagacccaagtcatgaaggagaagggattccattatatgatgcgatcaagtgcataaaaacattctttggatg
gaaagaaccttatatagtcaaaccacacgaaaagggaataaattcaaattacctgctgtcatggaagcaagtattgtcagaattgcaggacattg
aaaatgaggagaagatcccaaggactaaaaacatgaagaaaacgagtcaactaaagtgggctcttggtgaaaacatggcaccagagaaagtagac
tttgacaactgcagagacataagcgatttgaagcaatatgatagtgacgaacctgaattaaggtcactttcaagctggatacagaatgagttcaa
caaggcctgcgagctaactgattcaatctggatagagctcgatgaaattggagaggacgtagccccaattgagtacattgcaagcatgaggagga
attatttcacagcagaggtgtcccattgtagagccactgagtacataatgaagggggtatacattaatactgccctgctcaatgcatcctgtgca
gcaatggacgattttcaactaattcccatgataagcaagtgcagaactaaagagggaaggcgaaaaaccaatttatatggattcatcataaaggg
aagatctcatttaaggaatgacacagatgtggtaaactttgtgagcatggagttttctctcactgacccgagacttgagccacataaatgggaga
aatactgtgtccttgagataggagatatgttactaagaagtgccataggccaaatttcaaggcctatgttcttgtatgtgaggacaaacggaaca
tcaaaggtcaaaatgaaatggggaatggagatgagacgttgcctccttcagtcactccagcagatcgagagcatgattgaagccgagtcctcgat
taaagagaaagacatgaccaaagagttttttgagaataaatcagaagcatggcccattggggagtcccccaagggagtggaagaaggttccattg
ggaaagtctgtaggactctattggctaagtcagtgttcaatagcctgtatgcatcaccacaattggaaggattttcagcggagtcaagaaaactg
cttcttgttgttcaggctcttagggacaacctcgaacctgggacctttgatctcggggggctatatgaagcaattgaggagtgcctgattaatga
tccctgggttttgctcaatgcatcttggttcaactccttcctgacacatgcattaaaatag
>gi|GENSCAN_predicted_peptide_12|230_aa
MDSNTVSSFQVDCFLWHIRKQVVDQELSDAPFLDRLRRDQRSLRGRGNTLGLDIKAATHVGKQIVEKILKEESDEALKMTMVSTPASRYITDMTI
EELSRNWFMLMPKQKVEGPLCIRMDQAIMEKNIMLKANFSVIFDRLETIVLLRAFTEEGAIVGEISPLPSFPGHTIEDVKNAIGVLIGGLEWNDN
TVRVSKNLQRFAWRSSNENGGPPLTPKQKRKMARTARSKV
>gi|GENSCAN_predicted_CDS_12|693_bp
atggattccaacactgtgtcaagtttccaggtagattgctttctttggcatatccggaaacaagttgtagaccaagaactgagtgatgccccatt
ccttgatcggcttcgccgagatcagaggtccctaaggggaagaggcaatactctcggtctagacatcaaagcagccacccatgttggaaagcaaa
ttgtagaaaagattctgaaagaagaatctgatgaggcacttaaaatgaccatggtctccacacctgcttcgcgatacataactgacatgactatt
gaggaattgtcaagaaactggttcatgctaatgcccaagcagaaagtggaaggacctctttgcatcagaatggaccaggcaatcatggagaaaaa
catcatgttgaaagcgaatttcagtgtgatttttgaccgactagagaccatagtattactaagggctttcaccgaagagggagcaattgttggcg
aaatctcaccattgccttcttttccaggacatactattgaggatgtcaaaaatgcaattggggtcctcatcggaggacttgaatggaatgataac
acagttcgagtctctaaaaatctacagagattcgcttggagaagcagtaatgagaatgggggacctccacttactccaaaacagaaacggaaaat
ggcgagaacagctaggtcaaaagtttga
>gi|GENSCAN_predicted_peptide_13|498_aa
MASQGTKRSYEQMETDGDRQNATEIRASVGKMIDGIGRFYIQMCTELKLSDHEGRLIQNSLTIEKMVLSAFDERRNKYLEEHPSAGKDPKKTGGP
IYRRVDGKWMRELVLYDKEEIRRIWRQANNGEDATAGLTHIMIWHSNLNDATYQRTRALVRTGMDPRMCSLMQGSTLPRRSGAAGAAVKGIGTMV
MELIRMVKRGINDRNFWRGENGRKTRSAYERMCNILKGKFQTAAQRAMVDQVRESRNPGNAEIEDLIFLARSALILRGSVAHKSCLPACAYGPAV
SSGYDFEKEGYSLVGIDPFKLLQNSQIYSLIRPNENPAHKSQLVWMACHSAAFEDLRLLSFIRGTKVSPRGKLSTRGVQIASNENMDNMGSSTLE
LRSGYWAIRTRSGGNTNQQRASAGQTSVQPTFSVQRNLPFEKSTIMAAFTGNTEGRTSDMRAEIIRMMEGAKPEEVSFRGRGVFELSDEKATNPI
VPSFDMSNEGSYFFGDNAEEYDN
>gi|GENSCAN_predicted_CDS_13|1497_bp
atggcgtcccaaggcaccaaacggtcttatgaacagatggaaactgatggggatcgccagaatgcaactgagattagggcatccgtcgggaagat
gattgatggaattgggagattctacatccaaatgtgcactgaacttaaactcagtgatcatgaagggcggttgatccagaacagcttgacaatag
agaaaatggtgctctctgcttttgatgaaagaaggaataaatacctggaagaacaccccagcgcggggaaagatcccaagaaaactggggggccc
atatacaggagagtagatggaaaatggatgagggaactcgtcctttatgacaaagaagagataaggcgaatctggcgccaagccaacaatggtga
ggatgcgacagctggtctaactcacataatgatctggcattccaatttgaatgatgcaacataccagaggacaagagctcttgttcgaactggaa
tggatcccagaatgtgctctctgatgcagggctcgactctccctagaaggtccggagctgcaggtgctgcagtcaaaggaatcgggacaatggtg
atggaactgatcagaatggtcaaacgggggatcaacgatcgaaatttctggagaggtgagaatgggcggaaaacaagaagtgcttatgagagaat
gtgcaacattcttaaaggaaaatttcaaacagctgcacaaagagcaatggtggatcaagtgagagaaagtcggaacccaggaaatgctgagatcg
aagatctcatatttttggcaagatctgcattgatattgagagggtcagttgctcacaaatcttgcctacctgcctgtgcgtatggacctgcagta
tccagtgggtacgacttcgaaaaagagggatattccttggtgggaatagaccctttcaaactacttcaaaatagccaaatatacagcctaatcag
acctaacgagaatccagcacacaagagtcagctggtgtggatggcatgccattctgctgcatttgaagatttaagattgttaagcttcatcagag
ggacaaaagtatctccgcgggggaaactgtcaactagaggagtacaaattgcttcaaatgagaacatggataatatgggatcgagcactcttgaa
ctgagaagcgggtactgggccataaggaccaggagtggaggaaacactaatcaacagagggcctccgcaggccaaaccagtgtgcaacctacgtt
ttctgtacaaagaaacctcccatttgaaaagtcaaccatcatggcagcattcactggaaatacggagggaaggacttcagacatgagggcagaaa
tcataagaatgatggaaggtgcaaaaccagaagaagtgtcattccgggggaggggagttttcgagctctcagacgagaaggcaacgaacccgatc
gtgccctcttttgatatgagtaatgaaggatcttatttcttcggagacaatgcagaagagtacgacaattaa
>gi|GENSCAN_predicted_peptide_14|1223_aa
MNPNQKIITIGSVSLTISTICFFMQIAILITTVTLHFKQYEFNSPPNNQVMLCEPTIIERNITEIVYLTNTTIEKEMCPKLAEYRNWSKPQCDIT
GFAPFSKDNSIRLSAGGDIWVTREPYVSCDPDKCYQFALGQGTTLNNVHSNDTVHDRTPYRTLLMNELGVPFHLGTKQVCIAWSSSSCHDGKAWL
HVCVTGDDKNATASFIYNGRLVDSIVSWSKKILRTQESECVCINGTCTVVMTDGSASGKADTKILFIEEGKIIHTSTLSGSAQHVEECSCYPRYP
GVRCVCRDNWKGSNRPIVDINIKDYSIVSSYVCSGLVGDTPRKNDSSSSSHCLDPNNEEGGHGVKGWAFDDGNDVWMGRTISEKLRSGYETFKVI
EGWSKPNSKLQINRQVIVDRGPLKAEIAQRLEDVFAGKNTDLEALMEWLKTRPILSPLTKGILGFVFTLTVPSERGLQRRRFVQNALNGNGDPNN
MDKAVKLYRKLKREITFHGAKEIALSYSAVPNGTIVKTITNDQIEVTNATELVQSSSTGGICDSPHQILDGENCTLIDALLGDPQCDGFQNKKWD
LFVERSKAYSNCYPYDVPDYASLRSLVASSGTLEFNNESFNWTGVTQNGTSSACKRRSNNSFFSRLNWLTHLKFKYPALNVTMPNNEKFDKLYIW
GVHHPGTDNDQISLYAQASGRITVSTKRSQQTVIPSIGSRPRIRDVPSRISIYWTIVKPGDILLINSTGNLIAPRGYFKIRSGKSSIMRSDAPIG
KCNSECITPNGSIPNDKPFQNVNRITYGACPRYVKQNTLKLATGMRNVPEKQTRGIFGAIAGFIENGWEGMVDGWYGFRHQNSEGTGQAADLKST
QAAINQINGKLNRLIGKTNEKFHQIEKEFSEVEGRIQDLEKYVEDTKIDLWSYNAELLVALENQHTIDLTDSEMNKLFERTKKQLRENAEDMGNG
CFKIYHKCDNACIGSIRNGTYDHDVYRDEALNNRFQIKGPLKAEIAQRLEDVFAGKNTDLEALMEWLKTRPILSPLTKGILGFVFTLTVPSERGL
QRRRFVQNALNGNGDPNNMDRAVKLYKKLKREITFHGAKEVALSYSTGALASCMGLIYNRMGTVTTEVAFGLVCATCEQIADSQHRSHRQMATTT
NPLIRHENRMVLASTTAKAMEQMAGSSEQAAEAMEVASQARQMVQAMRTIGTHPSSSAGLKDNLLENLQAYQKRMGVQMQRFK
>gi|GENSCAN_predicted_CDS_14|3672_bp
atgaatccaaatcaaaagataataacgattggctctgtttctctcaccatttccacaatatgcttcttcatgcaaattgccatcctgataaccac
tgtaacattgcatttcaagcaatatgaattcaactcccccccaaacaaccaagtgatgctgtgtgaaccaacaataatagaaagaaacataacag
agatagtgtatctgaccaacaccaccatagagaaggaaatgtgccccaaactagcagaatacagaaattggtcaaagccgcaatgtgacattaca
ggatttgcacctttttctaaggacaattcgattaggctttccgctggtggggacatctgggtgacaagagaaccttatgtgtcatgcgaccctga
caagtgttaccaatttgcccttggacagggaacaacactaaacaacgtgcattcaaatgacacagtacatgataggaccccttatcggaccctat
tgatgaatgaattaggtgttccatttcatctggggaccaagcaagtgtgcatagcatggtccagctcaagttgtcacgatggaaaagcatggctg
catgtttgtgtaacgggggatgataaaaatgcaactgctagcttcatttacaatgggaggcttgtagatagtattgtttcatggtccaaaaaaat
cctcaggacccaggagtcagaatgcgtttgtatcaatggaacttgtacagtagtaatgactgatgggagtgcttcaggaaaagctgatactaaaa
tactattcattgaggaggggaaaatcattcatactagcacattgtcaggaagtgctcagcatgtcgaggagtgctcctgctatcctcgatatcct
ggtgtcagatgtgtctgcagagacaactggaaaggctccaataggcccatcgtagatataaacataaaggattatagcattgtttccagttatgt
gtgctcagggcttgttggagacacacccagaaaaaacgacagctccagcagtagccattgcttggatcctaacaatgaagaaggtggtcatggag
tgaaaggctgggcctttgatgatggaaatgacgtgtggatgggaagaacgatcagcgagaagttacgctcaggatatgaaaccttcaaagtcatt
gaaggctggtccaaacctaattccaaattgcagataaataggcaagtcatagttgacagaggccccctcaaagccgagatcgcgcagagacttga
agatgtctttgctgggaaaaacacagatcttgaggctctcatggaatggctaaagacaagaccaattctgtcacctctgactaaggggattttgg
ggtttgtgttcacgctcaccgtgcccagtgagcgaggactgcagcgtagacgctttgtccaaaatgccctcaatgggaatggagatccaaataac
atggacaaagcagttaaactgtataggaaacttaagagggagataacgttccatggggccaaagaaatagctctcagttattctgctgtaccaaa
cggaacgatagtgaaaacaatcacgaatgaccaaattgaagtcactaatgctactgaactggttcagagttcctcaacaggtggaatatgcgaca
gtcctcatcagatccttgatggagaaaactgcacactaatagatgctctattgggagaccctcagtgtgatggcttccaaaataagaaatgggac
ctttttgttgaacgcagcaaagcctacagcaactgttacccttatgatgtgccggattatgcctcccttaggtcactagttgcctcatccggcac
actggagtttaacaatgaaagcttcaattggactggagtcactcaaaatggaacaagctctgcttgcaaaaggagatctaataacagtttcttta
gtagattgaattggttgacccacttaaaattcaaatacccagcattgaacgtgactatgccaaacaatgaaaaatttgacaaactgtacatttgg
ggggttcaccacccgggtacggacaatgaccaaatcagcctatatgctcaagcatcaggaagaatcacagtctctaccaaaagaagccaacaaac
cgtaatcccgagtatcggatctagacccaggataagggatgtccccagcagaataagcatctattggacaatagtaaaaccgggagacatacttt
tgattaacagcacagggaatctaattgctcctcggggttacttcaaaatacgaagtgggaaaagctcaataatgagatcagatgcacccattggc
aaatgcaattctgaatgcatcactccaaatggaagcattcccaatgacaaaccatttcaaaatgtaaacaggatcacatatggggcctgtcccag
atatgttaagcaaaacactctgaaattggcaacagggatgcgaaatgtaccagagaaacaaactagaggcatatttggcgcaatcgcgggtttca
tagaaaatggttgggagggaatggtagacggttggtacggtttcaggcatcaaaattctgagggaacaggacaagcagcagatctcaaaagcact
caagcagcaatcaaccaaatcaatgggaagctgaataggttgatcgggaaaacaaacgagaaattccatcagattgaaaaagaattctcagaagt
agaagggagaattcaggacctcgagaaatatgttgaggacactaaaatagatctctggtcatacaacgcggagcttcttgtggccctggagaacc
aacatacaattgatctaactgactcagaaatgaacaaactgtttgaaagaacaaagaagcaactgagggaaaatgctgaggatatgggcaatggt
tgtttcaaaatataccacaaatgtgacaatgcctgcatagggtcaatcagaaatggaacttatgaccatgatgtatacagagatgaagcattaaa
caaccggttccagatcaaaggccccctcaaagccgagatcgcgcagagacttgaggatgtctttgcaggaaagaacaccgatctcgaggctctca
tggaatggctaaagacaagaccaatcctgtcacctctgactaaagggattttaggatttgtgttcacgctcaccgtgcccagtgagcgaggactg
cagcgtagacgctttgtccagaatgccttaaatggaaatggagatccaaacaatatggatagggcagttaagctatacaagaagctgaaaagaga
aataacattccatggggctaaggaggtcgcactcagctactcaaccggtgcacttgccagttgtatgggtctcatatacaacaggatgggaacgg
tgaccacagaagtggcttttggcctagtgtgtgccacttgtgagcagattgcagattcacagcatcggtctcacagacagatggcaactaccacc
aacccactaatcaggcatgagaacagaatggtgctggccagcactacagctaaggctatggagcagatggctggatcgagtgagcaggcagcgga
agccatggaggttgctagtcaggctaggcagatggtgcaggcaatgaggacaattgggactcatcctagctccagtgccggtctgaaagataatc
ttcttgaaaatttgcaggcctaccaaaaacgaatgggagtgcaaatgcagcgattcaagtga
>gi|GENSCAN_predicted_peptide_15|1320_aa
MEKIVLLLAIVSLVKSDQICIGYHANNSTEQVDTIMEKNVTVTHAQDILEKTHNGKLCDLNGVKPLILRDCSVAGWLLGNPMCDEFINVPEWSYI
VEKASPANDLCYPGDFNDYEELKHLLSRTNHFEKIQIIPKSSWSNHDASSGVSSACPYHGRSSFFRNVVWLIKKNSAYPTIKRSYNNTNQEDLLV
LWGIHHPNDAAEQTKLYQNPTTYISVGTSTLNQRLVPEIATRPKVNGQSGRMEFFWTILKPNDAINFESNGNFIAPEYAYKIVKKGDSAIMKSEL
EYGNCNTKCQTPMGAINSSMPFHNIHPLTIGECPKYVKSNRLVLATGLRNTPQRERRRKKRGLFGAIAGFIEGGWQGMVDGWYGYHHSNEQGSGY
AADKESTQKAIDGVTNKVNSIIDKMNTQFEAVGREFNNLERRIENLNKQMEDGFLDVWTYNAELLVLMENERTLDFHDSNVKNLYDKVRLQLRDN
AKELGNGCFEFYHKCDNECMESVKNGTYDYPQYSEEARLNREEISGVKLESMGTYQILSIYSTVASSLALAIMGALLNDKHSNGTVKDRSPHRTL
MSCPVGEAPSPYNSRFESVAWSASACHDGTSWLTIGISGPDNGAVAVLKYNGIITDTIKSWRNNILRTQESECACVNGSCFTVMTDGPSNGQASY
KIFKMEKGKVVKSVELNAPNYHYEECSCYPDAGEITCVCRDNWHGSNRPWVSFNQNLEYQIGYICSGVFGDNPRPNDGTGSCGPVSPNGAYGVKG
FSFKYGNGVWIGRTKSTNSRSGFEMIWDPNGWTGTDSSFSVKQDIVAITDWVDNHSLSDINIMASQGTKRSYEQMETGGERQNATEIRASVGRMV
GGIGRFYIQMCTELKLSDYEGRLIQNSITIERMVLSAFDERRNKYLEEHPSAGKDPKKTGGPIYRRRDGKWVRELILYDKEEIRRIWRQANNGED
ATAGLTHMMIWHSNLNDATYQRTRALVRTGMDPRMCSLMQGSTLPRRSGAAGAAVKGVGTMVMELIRMIKRGINDRNFWRGENGRRTRIAYERMC
NILKGKFQTAAQRAMMDQVRESRNPGNAEIEDLIFLARSALILRGSVAHKSCLPACVYGLAVASGYDFEREGYSLVGIDPFRLLQNSQVFSLIRP
NENPAHKSQLVWMACHSAAFEDLRVSSFIRGTRVAPRGQLSTRGVQIASNENMETMDSSTLELRSRYWAIRTRSGGNTNQQRASAGQISVQPTFS
VQRNLPFERATIMAAFTGNTEGRTSDMRTEIIRMMESSRPEDVSFQGRGVFELSDEKATNPIVPSFDMSNEGSYFFGDNAEEYDN
>gi|GENSCAN_predicted_CDS_15|3963_bp
atggagaaaatagtgcttcttcttgcaatagtcagtcttgtcaaaagtgatcagatttgcattggttaccatgcaaacaactcgacagagcaggt
tgacacaataatggaaaagaacgttactgttacacatgcccaagacatactggaaaagacacacaatgggaagctctgcgatctaaatggagtga
agcctctcattttgagagattgtagtgtagctggatggctcctcggaaac
cctatgtgtgacgaattcatcaatgtgccggaatggtcttacatagtggagaaggccagtccagccaatgacctctgttacccaggggatttcaa
cgactatgaagaactgaaacacctattgagcagaacaaaccattttgagaaaattcagatcatccccaaaagttcttggtccaatcatgatgcct
catcaggggtgagctcagcatgtccataccatgggaggtcctcctttttcagaaatgtggtatggcttatcaaaaagaacagtgcatacccaaca
ataaagaggagctacaataataccaaccaagaagatcttttagtactgtgggggattcaccatcctaatgatgcggcagagcagacaaagctcta
tcaaaacccaaccacttacatttccgttggaacatcaacactgaaccagagattggttccagaaatagctactagacccaaagtaaacgggcaaa
gtggaagaatggagttcttctggacaattttaaagccgaatgatgccatcaatttcgagagtaatggaaatttcattgctccagaatatgcatac
aaaattgtcaagaaaggggactcagcaattatgaaaagtgaattggaatatggtaactgcaacaccaagtgtcaaactccaatgggggcgataaa
ctctagtatgccattccacaacatacaccccctcaccatcggggaatgccccaaatatgtgaaatcaaacagattagtccttgcgactggactca
gaaatacccctcagagagagagaagaagaaaaaagagaggactatttggagctatagcaggttttatagagggaggatggcagggaatggtagat
ggttggtatgggtaccaccatagcaatgagcaggggagtggatacgctgcagacaaagaatccactcaaaaggcaatagatggagtcaccaataa
ggtcaactcgatcattgacaaaatgaacactcagtttgaggccgttggaagggaatttaataacttggaaaggaggatagagaatttaaacaagc
agatggaagacggattcctagatgtctggacttataatgctgaacttctggttctcatggaaaatgagagaactctagactttcatgactcaaat
gtcaagaacctttatgacaaggtccgactacagcttagggataatgcaaaggagctgggtaatggttgtttcgagttctatcacaaatgtgataa
tgaatgtatggaaagtgtaaaaaacggaacgtatgactacccgcagtattcagaagaagcaagactaaacagagaggaaataagtggagtaaaat
tggaatcaatgggaacttaccaaatactgtcaatttattcaacagtggcgagttccctagcactggcaatcatgggagccttgctgaatgacaag
cactccaatgggaccgtcaaagacagaagccctcacagaacattgatgagttgtcctgtgggtgaggctccctccccatataactcaaggtttga
gtctgttgcttggtcggcaagtgcttgccatgatggcaccagttggttgacaattggaatttctggcccagacaatggggctgtggctgtattga
aatacaacggcataataacagacactatcaagagttggaggaacaacatactgagaactcaagagtctgaatgtgcatgtgtaaatggctcttgc
tttactgtaatgactgacggaccaagtaatgggcaggcctcatataagatcttcaaaatggaaaaagggaaagtagttaaatcagtcgaattgaa
tgcccctaattatcactatgaggagtgctcctgttatcctgatgctggcgaaatcacatgtgtgtgcagggataattggcatggctcaaatcggc
catgggtatctttcaatcaaaatttggagtatcaaataggatatatatgcagtggagttttcggagacaatccacgccccaatgatggaacaggc
agttgtggtccggtgtcccctaacggggcatatggagtaaaagggttttcatttaaatacggcaatggtgtttggatcgggagaaccaaaagcac
taattccaggagcggctttgaaatgatttgggatccaaatgggtggactggaacggacagtagcttctcggtgaaacaagatatcgtagcaataa
ctgattgggtagataatcactcactgagtgacatcaacatcatggcgtctcagggcaccaaacgatcttatgaacagatggaaactggtggagaa
cgccagaatgctactgagatcagagcatctgttggaagaatggttggtggaattgggaggttttatatacagatgtgcactgaactcaaactcag
cgactatgaaggaaggctgattcagaacagcataacaatagagagaatggttctctctgcatttgatgaaaggaggaacaaatacctggaagaac
atcccagtgcggggaaggacccaaagaaaactggaggtccaatctaccgaagaagagacggaaaatgggtgagagagctgattctgtatgacaaa
gaggagatcaggagaatttggcgtcaagcgaacaatggagaagatgcaactgctggtctcactcacatgatgatctggcattccaatctaaatga
tgccacataccagagaacaagagctctcgtgcgtactgggatggaccctagaatgtgctctctgatgcaaggatcaactctcccgaggagatctg
gagctgctggtgcggcagtaaagggagtcggaacgatggtgatggaactaattcggatgataaagcgagggattaacgatcggaatttctggaga
ggtgaaaatgggcgaagaacaagaattgcatatgagagaatgtgcaacatcctcaaagggaaattccaaacagcagcacaaagagcaatgatgga
tcaggtacgggaaagcagaaatcctgggaatgctgagattgaagatctcatatttctggcacggtctgcactcatcctgagaggatcagtggccc
acaagtcctgcttgcctgcttgtgtgtacgggcttgccgtggccagtggatatgactttgagagagaagggtactctctggtcgggattgatcct
ttccgtctgctgcaaaacagccaggtctttagtctaattagaccaaatgagaatccagcacataaaagtcaattggtgtggatggcatgccattc
tgcagcatttgaagatctgagagtctcaagcttcatcagagggacaagagtggccccaaggggacaactatctactagaggagttcaaattgctt
caaatgagaacatggaaacaatggactccagcactcttgaactgagaagcagatattgggctataaggaccaggagtggaggaaacaccaaccag
cagagagcatctgcaggacaaatcagtgtgcagcctactttctcggtacagagaaatcttcccttcgaaagagcgaccattatggcggcattcac
agggaatacagagggcagaacatctgacatgaggactgaaatcataaggatgatggaaagctccagaccagaagatgtgtctttccaggggcggg
gagtcttcgagctctcggacgaaaaggcaacgaacccgatcgtgccttcctttgacatgagtaatgaaggatcttatttcttcggagacaatgca
gaggaatatgacaattga
>gi|GENSCAN_predicted_peptide_16|716_aa
MEDFVRQCFNPMIVELAEKAMKEYGEDPKIETNKFAAICTHLEVCFMYSDFHFIDERGESTIIESGDPNALLKHRFEIIEGRDRTMAWTVVNSIC
NTTGVEKPKFLPDLYDYKENRFIEIGVTRREVHTYYLEKANKIKSEKTHIHIFSFTGEEMATKADYTLDEESRARIKTRLFTIRQEMASRGLWDS
FRQSERGEETVEERFEITGTMCRLADQSLPPNFSSLEKFRAYVDGFEPNGCIEGKLSQMSKEVNARIEPFLKTTPRPLRLPDGPPCSQRSKFLLM
DALKLSIEDPSHEGEGIPLYDAIKCMKTFFGWKEPNIVKPHEKGINPNYLLAWKQVLAELQDIENEEKIPKTKNMRKTSQLKWALGENMAPEKVD
FEDCKDVSDLRQYDSDEPKPRSLASWIQSEFNKACELTDSSWIELDEIGEDVAPIEHIASMRRNYFTAEVSHCRATEYIMKGVYINTALLNASCA
AMDDFQLIPMISKCRTKEGRRKTNLYGFLIKGRSHLRNDTDVVNFVSMEFSLTDPRLEPHRWEKYCVLRIGDMLLRTEIGQVSRPMFLYVRTNGT
SKIKMKWGMEMRRCPFQSLQQIESMIEAESSVKEKDMTKEFFENKSETWPIGESPKGVEEGSIGKVCRTLLAKSVFNSLYASPQLEGFSAESRKL
LLIVQALRDNLEPGTFDLGGLYEAIEECLINDPWVLLNASWFNSFLTHALR
>gi|GENSCAN_predicted_CDS_16|2151_bp
atggaagactttgtgcgacaatgcttcaatccaatgattgtcgagcttgcggaaaaggcaatgaaagaatatggggaagatccgaaaatcgaaac
gaacaaatttgccgcaatatgcacgcacttagaagtctgtttcatgtattcagatttccactttattgatgaacggggcgaatcaacaattatag
aatctggcgatcccaatgcattattgaaacaccggtttgaaataatcgaagggagggaccgaacaatggcctggacagtggtgaatagtatctgc
aacaccacaggagttgagaagcctaaatttctcccagatttgtatgactacaaggagaaccgatttattgaaattggagtgacacggagggaagt
tcacacatactatctagaaaaagccaacaagataaaatctgagaagacacacattcacatattctcattcactggagaggaaatggccaccaaag
cggactacacccttgatgaagaaagcagggcccgaatcaaaaccaggctgttcactataaggcaggaaatggccagtaggggtttatgggattcc
tttcgtcagtccgagagaggcgaagagacagttgaagaaagatttgaaatcacagggactatgtgcaggcttgccgaccaaagtctcccacctaa
tttctccagccttgaaaaatttagagcctatgtggatggattcgaaccgaacggctgcattgagggcaagctttctcaaatgtcgaaagaagtaa
acgccagaattgagccatttctgaagacaacaccacgccctcttagattacctgatgggcctccctgctctcagcggtcgaagtttttgctgatg
gatgcccttaaattaagcatcgaagacccgagtcatgagggggaggggataccgctatatgatgcaatcaaatgcatgaaaacatttttcggctg
gaaagagcccaacattgtaaaaccacatgaaaaaggcataaaccccaattacctcctggcttggaagcaggtgctggcagagctccaagatattg
aaaacgaggagaaaattccaaagacaaagaacatgaggaaaacaagccaattgaagtgggcacttggtgagaatatggcaccagagaaagtagac
tttgaggattgcaaagatgttagcgatctaaggcagtatgacagtgatgaaccaaagcctagatcactagcaagctggatccagagtgaattcaa
caaggcatgcgaattgacagattcaagttggattgaacttgatgaaataggggaagacgttgctccaattgagcacattgcaagtatgagaagga
actatttcacagcggaagtatcccattgcagggctactgaatacataatgaagggagtgtacataaacacagctttgttgaatgcatcctgtgca
gccatggatgacttccaactgatcccaatgataagcaaatgcagaaccaaagaaggaagacggaaaactaacctgtatggattccttataaaagg
aagatcccatttgagaaatgacaccgatgtggtaaactttgtgagtatggaattctctcttactgatccgaggctggagccacacagatgggaaa
agtactgcgttcttcggataggagacatgctcttacggactgaaataggccaagtgtcaaggcccatgtttctttatgtgagaaccaatggaacc
tccaagatcaagatgaaatggggcatggaaatgaggcgatgcccttttcaatcccttcaacagattgagagcatgattgaggccgagtcttctgt
caaagaaaaagacatgactaaagaattctttgaaaacaaatcagaaacatggccaattggagaatcacccaagggagtggaggaaggctccatcg
ggaaggtgtgcagaaccttactggctaaatctgttttcaacagtctatatgcatctccacaactcgaggggttttcagctgaatcaagaaaattg
cttctcattgttcaggcacttagggacaacctggaacctggaaccttcgatcttggggggctatatgaagcaattgaggagtgcctgattaatga
tccctgggttttgcttaatgcatcttggttcaactccttcctcacacatgcactaagatag
>gi|GENSCAN_predicted_peptide_17|718_aa
MDTVNRTHQYSEKGKWTTNTETGAPQLNPIDGPLPEDNEPSGYAQTDCVLEAMAFLEESHPGIFENSCLETMEVVQQTRVDKLTQGRQTYDWTLK
RNQPAATALANTIEVFRSNGLTANESGRLIDFLKDVMESMDKGEMEIITHFQRKRRVRDNMTKKMVTQRTIGKKKQRLNKRSYLIRALTLNTMTK
DAERGKLKRRAIATPGMQIRGFVYFVETLARSICEKLEQSGLPVGGNEKKAKLANVVRKMMTNSQDTELSFTITGDNTKWNENQNPRMFLAMITY
ITRNQPEWFRNVLSIAPIMFSNKMARLGKGYMFESKSMKLRTQIPAEMLASIDLKYFNESTRKKIEKIRPLLIDGTASLSPGMMMGMFNMLSTVL
GVSILNLGQKRYTKTTYWWDGLQSSDDFALIVNAPNHEGIEAGVDRFYRTCKLVGINMTKKKSYINRTGTCEFTSFFYRYGFVANFSMELPSFGV
SGINESADMSIGVTVIKNNMMDNDLGPATAQMALQLFIKDYRYPYRCHRGDTQIQTRRSFELKKLWEQTRSKAGLLVSDGGPNPYNIRNLHIPEA
GLKWELMDEDYQGRLCNPLNPFVSHKEIESVNNAVVMPAHGPAKSMEYDAVATTHSWIPKRNRSILNTSQRGILEDEQMYQKCCNLFEKFFPSSS
YRRPVGISSMVEAMVSRARIDARIDFESGRIKKEEFAEIMKICSTIEELGRQK
>gi|GENSCAN_predicted_CDS_17|2157_bp
atggacacagtcaacagaacacatcaatattcagaaaaggggaaatggacaacgaacacagagactggagcaccccaactcaatccgattgatgg
accactacctgaggataatgagccgagtgggtatgcacaaacagattgtgtattggaagcaatggctttccttgaagaatcccacccagggatct
ttgaaaactcgtgtcttgaaacgatggaagttgttcagcaaacaagagtggataagctgacccaaggtcgccaaacctatgactggacattgaaa
agaaaccagccggctgcaaccgctttggccaacactatagaggtcttcagatcgaatggtctaacagccaatgaatcgggaaggctaatagattt
cctcaaagacgtgatggaatcaatggataagggagaaatggaaataataacacatttccagagaaagagaagagtgagggacaacatgaccaaga
aaatggtcacacaaagaacaatagggaagaaaaaacaaaggctgaacaaaaggagctacctaataagagcactgacactgaacacaatgacaaaa
gacgcagaaagaggcaaattgaagaggcgggcaattgcaacacccgggatgcaaatcagaggattcgtgtactttgtcgaaacactagcgaggag
tatctgtgagaaacttgagcaatctggactccccgtcggagggaatgaaaagaaggctaaattggcaaatgtcgtgaggaagatgatgactaact
cacaagatacagagctctcttttacaattactggagacaacaccaaatggaatgagaatcagaaccctcggatgtttctagcaatgataacatac
atcacaaggaaccaacctgaatggtttagaaatgtcttaagcattgctcctataatgttctcaaacaagatggcaagattagggaaaggatacat
gttcgaaagtaagagcatgaagctacggacacaaataccagcagaaatgcttgcaagcattgacttgaaatacttcaacgaatcaacgagaaaga
aaatcgagaaaataagacctctactaatagatggcacagcctcattgagtcctggaatgatgatgggcatgttcaatatgctgagtacagtctta
ggagtttcaatcctgaatcttgggcagaagaggtacaccaaaaccacatactggtgggacggactccaatcctctgatgatttcgctctcatagt
gaatgcaccaaatcatgagggaatagaagcaggggtggataggttctataggacttgcaaactagttggaatcaatatgaccaagaagaagtctt
acataaatcggacaggaacatgtgaattcacaagcttcttctaccgctatgggttcgtagccaacttcagtatggagctgcccagctttggagtg
tctgggattaatgaatcggctgacatgagcattggtgttacagtgataaagaacaatatgatggacaacgaccttggaccagcaacagctcagat
ggctcttcagctattcattaaggactacagatacccataccgatgccacaggggggatacacaaatccaaacgaggagatcattcgagctgaaga
agctgtgggagcagacccgctcaaaggcaggactgttggtttcagatggaggaccaaacccatacaatatccggaatctccacattccggaggct
ggcttgaagtgggaattgatggatgaagactaccagggcagactgtgtaatcctctgaacccgtttgttagtcataaggaaattgagtctgtcaa
caatgctgtggtaatgccagctcatggcccagccaagagcatggaatatgatgcagttgcgactacacattcatggattcccaagaggaatcgtt
ccattctcaacaccagccaaagggggattcttgaggatgaacagatgtatcagaagtgctgcaatctattcgagaaattcttccctagcagttca
tatcggaggccagttggaatttccagcatggtggaggccatggtgtctagggcccgaattgatgcacgaattgacttcgagtctggaaggattaa
gaaagaagagtttgctgagatcatgaagatctgttccaccattgaagagctcggacggcaaaaatag
>gi|GENSCAN_predicted_peptide_18|759_aa
MERIKELRDLMSQSRTREILTKTTVDHMAIIKKYTSGRQEKNPALRMKWMMAMKYPITADKRIMEMIPERNEQGQTLWSKTNDAGSDRVMVSPLA
VTWWNRNGPTTSTVHYPKVYKTYFEKVERLKHGTFGPVHFRNQVKIRRRVDINPGHADLSAKEAQDVIMEVVFPNEVGARILTSESQLTITKEKK
EELQDCKIAPLMVAYMLERELVRKTRFLPVAGGTSSVYIEVLHLTQGTCWEQMYTPGGEVRNDDVDQSLIIAARNIVRRATVSADPLASLLEMCH
STQIGGIRMVDILRQNPTEEQAVDICKAAMGLRISSSFSFGGFTFKRTNGSSVKKEEEVLTGNLQTLKIKVHEGYEEFTMVGRRATAILRKATRR
LIQLIVSGRDEQSIAEAIIVAMVFSQEDCMIKAVRGDLNFVNRANQRLNPMHQLLRHFQKDAKVLFQNWGIEPIDNVMGMIGILPDMTPSAEMSL
RGVRVSKMGVDEYSSTERVVVSIDRFLRVRDQQGNVLLSPEEVSETQGTEKLTITYSSSMMWEINGPESVLVNTYQWIIRNWETVKIQWSQDPTM
LYNKMEFESFQSLVPKAARSQYSGFVRTLFQQMRDVLGTFDTVQIIKLLPFAAAPPEPSRMQFSSLTVNVRGSGMRILVRGNSPVFNYNKATKRL
TVLGKDAGALTEDPDEGTAGVESAVLRGFLILGREDKRYGPALSINELSNLAKGEKANVLIMQGDVVLVMKRKRDFSILTDSQTATKRIRMAIN
>gi|GENSCAN_predicted_CDS_18|2280_bp
atggaaagaataaaagaactaagagatctaatgtcgcagtcccgcactcgcgagatactaacaaaaaccactgtggatcatatggccataatcaa
gaaatacacatcaggaagacaagagaagaaccctgctctcagaatgaaatggatgatggcaatgaaatatccaatcacagcagacaagagaataa
tggagatgattcctgaaaggaatgagcaaggacaaacgctttggagcaagacaaatgatgctgggtcggacagagtgatggtgtctcccctagct
gtaacttggtggaacaggaatgggccgacaacaagtacagtccattatccaaaggtttacaaaacatactttgagaaggttgaaaggttaaaaca
tggaaccttcggtcccgttcatttccgaaaccaagttaaaatacgtcgccgggtggatataaacccgggccatgcagatctcagtgctaaagaag
cacaagatgttatcatggaggtcgttttcccaaatgaagtgggagctagaatattgacatcagagtcgcaattgacaataacaaaagagaagaaa
gaagagctccaggattgtaaaattgctcctttaatggtggcatacatgttggaaagagaactggtccgcaaaaccagatttctaccggtagcagg
cggaacaagcagtgtgtacattgaggtattgcatttgactcaagggacctgttgggaacagatgtacactcccggcggagaagtaagaaatgatg
atgttgaccagagtttgatcatcgctgccagaaacattgttaggagagcaacagtatcagcggacccactggcatcactcttggagatgtgtcac
agcacacaaattgggggaataaggatggtggacatccttaggcaaaacccaactgaggagcaagctgtggatatatgcaaagcagcaatgggttt
gaggatcagttcatcctttagctttggaggcttcactttcaaaagaacaaatggatcatccgtcaagaaggaagaggaagtgcttacaggcaacc
tccaaacattgaaaataaaagtacatgaggggtatgaagaattcacaatggttgggcggagagcaacagctatcctgaggaaagcaactagaagg
ctgattcagttgatagtaagtggaagagatgaacaatcaatcgctgaagcgatcattgtagcaatggtgttctcacaggaggattgcatgataaa
ggcagtccgaggcgatctgaatttcgtgaacagagcaaaccaaagattgaaccccatgcatcaactcctgaggcacttccaaaaagatgcaaaag
tgctgtttcagaactggggaattgaacctattgacaatgtcatggggatgatcggaatattacctgacatgactccaagcgcagagatgtcactg
agaggagtgagagttagtaagatgggagtagatgaatattccagcacggagagagtggtggtgagtattgaccgtttcttgagggtccgagatca
gcaggggaacgtactcttatctcctgaagaggttagtgaaacacagggaacagagaagttgacaataacatattcatcctcaatgatgtgggaaa
tcaacggtcctgagtcagtgcttgttaacacttatcaatggatcatcaggaattgggagactgtaaagattcaatggtctcaagatcccacaatg
ctgtacaataagatggagtttgaatcgttccaatccttggtgccaaaggctgccagaagccaatatagtggatttgtgagaacactattccaaca
gatgcgtgatgttttggggacatttgatactgtccaaataatcaagctgctaccatttgcagcagccccaccggagccgagcagaatgcagtttt
cttctctaactgtgaatgtgagaggctcaggaatgagaatactcgtgaggggtaactcccccgtgttcaactacaacaaggcaaccaaaaggctt
acagtcctcggaaaggacgcaggtgcattaacagaagatccagacgagggaacagccggggtggaatctgcagtattgaggggattcctaattct
aggcagagaggacaaaagatatggacccgcattgagcatcaatgaactgagcaatcttgcaaaaggggagaaggctaatgtattgataatgcaag
gagacgtggtgttggtaatgaaacggaaacgggactttagcatacttactgacagccagacagcgaccaaaagaattcggatggccatcaattag
>gi|GENSCAN_predicted_peptide_19|716_aa
MEDFVRQCFNPMIVELAEKTMKEYGEDPKIETNKFAAICTHLEVCFMYSDFHFIDERGESIIVESGDPNALLKHRFEIIEGRDRAMAWTVVNSIC
NTTGVDKPKFLPDLYDYKENRFTEIGVTRREVHIYYLEKANKIKSEKTHIHIFSFTGEEMATKADYTLDEESRARIKTRLFTIRQEMASRGLWDS
FRQSERGEETIEERFEITGTMRRLADQSLPPNFSSLENFRAYVDGFKPNGCIEGKLSQMSKEVNARIEPFLKTTPRPLRLPDGPPCSQRSKFLLM
DALKLSIEDPSHEGEGIPLYDAIKCMKTFFGWREPNIIKPHEKGINPNYLLAWKQVLAELQDIENEDKIPKTKNMKKTSQLMWALGENMAPEKLD
FEDCKDIGDLKQYQSDEPELRSIASWIQSEFNKACELTDSSWIELDEIGEDVAPIEHIASMRRNYFTAEVSHCRATEYIMKGVYINTALLNASCA
AMDDFQLIPMISKCRTKEGRRKTNLYGFIIKGRSHLRNDTDVVNFVSMEFSLTDPRLEPHKWEKYCVLEVGEMLLRTAIGQVSRPMFLYVRTNGT
SKIKMKWGMEMRRCLLQSLQQIESMIEAESSIKEKDMTKEFFENRSETWPIGESPKGVEEGSIGKVCRTLLAKSVFNSLYSSPQLEGFSAESRKL
LLIVQALRDNLEPGTFDLEGLYGAIEECLINDPWVLLNASWFNSFLTHALK
>gi|GENSCAN_predicted_CDS_19|2151_bp
atggaagactttgtgcgacagtgcttcaatccaatgattgtcgagcttgcggaaaagacaatgaaggaatatggggaagacccgaaaattgaaac
aaataagttcgctgcaatatgcacacacttagaagtctgcttcatgtattcagacttccatttcattgacgaacgaggcgaatcaataattgtgg
aatctggtgatccaaatgcattgttgaagcacaggtttgaaataattgaaggaagagaccgagcaatggcctggacagtggtgaatagcatctgc
aacacaacaggagtcgataaacccaaatttcttccggatctatacgactacaaggaaaaccgattcactgaaattggtgtgacacggagggaagt
tcacatatattacttagaaaaagctaacaagataaaatccgagaaaacacatatccacatcttttcattcactggagaagaaatggccactaaag
ctgactacacccttgatgaagagagcagggcaagaataaaaaccagactattcaccataagacaggaaatggcaagcaggggtctatgggattcc
tttcgtcagtccgagagaggcgaagagacaattgaagaaagatttgaaatcacagggaccatgcgtaggcttgccgaccaaagtctcccacctaa
cttctccagccttgaaaactttagagcctatgtggatggattcaaaccgaacggctgcattgagggcaagctttctcaaatgtcgaaagaagtga
acgccagaattgagccatttctgaagacaacaccacgtcccctcagattgcctgatggacctccctgctcccagcggtcgaaattcttgctgatg
gatgctctgaaattaagcattgaggacccgagccatgagggggaggggataccgctatatgatgcgataaaatgcatgaaaacattcttcggctg
gagagagcccaacatcatcaagccacacgagaagggcataaatcccaattatcttctggcttggaagcaggtgctggcagaactccaggatattg
aaaatgaggataaaatcccaaaaacaaagaacatgaagaaaacaagccaattaatgtgggcactcggggagaatatggcaccggaaaaattggac
tttgaggactgcaaagatattggcgatctgaaacagtatcaaagtgatgagccagagctcagatcgatagcaagctggatccagagtgagttcaa
caaggcatgtgaattgaccgattcgagctggatagaactcgatgagataggggaagatgttgccccaattgagcacattgcaagcatgagaagga
actacttcacagcggaagtgtctcattgcagggccactgagtacataatgaagggggtttacataaatacagctttgctcaatgcatcttgtgca
gccatggatgacttccaactgattccaatgataagcaaatgcagaacaaaagaaggaagaaggaagacaaacctgtatgggttcattataaaagg
aaggtcccatttgagaaatgatactgacgtggtgaactttgtgagtatggaattctcccttactgacccaaggctggagccacacaaatgggaaa
agtactgtgttcttgaagtaggggaaatgctcttgcggactgcaataggccaggtgtcaaggcccatgttcctgtatgtgagaactaacggaacc
tccaaaattaagatgaaatgggggatggaaatgagacgctgccttcttcaatctcttcaacagattgagagcatgatcgaggctgagtcttctat
caaagagaaagacatgaccaaagaattctttgaaaacagatcggagacatggccaattggagagtcacctaagggagtggaggaaggctcaatcg
ggaaggtgtgcagaaccttactagcaaaatctgtgttcaacagcctatattcatctccacaactcgaaggattttcagctgaatcgagaaaacta
ctactcattgttcaagcacttagggacaacctggaacctggaacctttgatcttgaagggctatatggagcaattgaggagtgcctgattaatga
tccctgggttttgcttaatgcatcttggttcaactccttcctcacacatgcactaaaatag
>gi|GENSCAN_predicted_peptide_20|719_aa
MDTVNRTHQYSEKGRWTTNTETGAPQLNPIDGPLPEDNEPSGYAQTDCVLEAMAFLEESHPGLFENSCLETMEVVQQTRVDKLTQGRQTYDWTLN
RNQPAATALANTIEVFRSNGLTANESGRLIDFLKDVMESMDKEEMEITTHFQRKRRVRDNMTKKMVTQRTIGKKKQKLTKKSYLIRALTLNTMTK
DAERGKLKRRAIATPGMQIRGFVHFVEALARSICEKLEQSGLPVGGNEKKAKLANVVRKMMTNSQDTELSFTVTGDNTKWNENQNPRIFLAMITY
ITRNQPEWFRNVLSIAPIMFSNKMARLGKGYMFESKSMKLRTQIPAEMLANIDLKYFNESTRKKIEKIRPLLIEGTASLSPGMMMGMFNMLSTVL
GVSILNLGQKRYTKTTYWWDGLQSSDDFALIVNAPNHEGIQAGVDRFYRTCKLVGINMSKKKSYINRTGTFEFTSFFYRYGFVANFSMELPSFGV
SGINESADMSIGVTVIKNNMINNDLGPATAQMALQLFIKDYRYTYRCHRGDTQIQTRRSFELKKLWEQTRSKAGLLVSDGGPNLYNIRNLHIPEV
CLKWELMDEDYQGRLCNPLNPFVSHKEVESVNNAVVMPAHGPAKSMEYDAVATTHSWIPKRNRSILNTSQRGILEDEQMYQKCCTLFEKFFPSSS
YRRPVGISSMMEAMVSRARIDARIDFESGRIKKEEFAEILKICSTIEELGRQGK
>gi|GENSCAN_predicted_CDS_20|2160_bp
atggacacagtcaacagaacacatcaatattcagaaaaagggaggtggacaacaaacacagagaccggagcaccccaactcaaccctattgatgg
accattacctgaagacaatgagccgagcgggtatgcacaaacagattgtgtattggaagcaatggctttccttgaagaatcccacccaggactct
ttgaaaactcatgtcttgaaacgatggaagttgtccagcaaacgagagtggataagctgacccaaggtcgccagacttatgactggacattgaat
agaaaccagccggctgcaactgctttggccaacaccatagaagtattcagatcgaacggtctaacagccaatgagtcaggaaggttaatagattt
cctcaaggacgtaatggaatcaatggataaggaagaaatggaaataacaacacatttccagagaaagagaagagtgagggacaacatgaccaaga
aaatggtcacacaaagaacaatagggaagaagaagcaaaagctgacaaaaaagagctacctaataagagcactgacactgaacacaatgacaaaa
gatgctgaaaggggaaaattgaaaagacgagcgattgcaacacccggaatgcaaatcagaggattcgtgcactttgtcgaagcactagcaaggag
catctgtgaaaaacttgagcaatctggactccccgttggagggaatgagaagaaggctaaattggcaaatgttgtgagaaagatgatgactaact
cacaagacacagagctctcctttacagttaccggagacaacaccaaatggaatgagaatcagaatcctcgaatatttctagcaatgataacatac
atcacaaggaaccaacctgaatggtttagaaatgtcttgagcattgcccctataatgttctcaaataaaatggcgaggttaggaaaaggatacat
gttcgagagtaagagcatgaagctacggacacaaataccagcagaaatgcttgcaaacattgacttgaaatacttcaacgaatcgacgagaaaga
aaattgagaaaataagacctctactaatagagggcacagcctcattgagtccagggatgatgatgggcatgtttaatatgctaagtacggtctta
ggagtctcaatcttaaatcttgggcagaagaggtacaccaaaaccacatactggtgggatgggctccaatcctctgatgatttcgctctcatagt
gaatgcaccaaatcatgagggaatacaagcaggagtggatagattctataggacttgcaagctagttggaatcaacatgagcaaaaagaagtctt
acataaatcggacaggaacatttgagttcacaagctttttctaccgctatgggtttgtagccaacttcagcatggagctgcccagctttggagtt
tccggaattaatgaatcggctgacatgagcattggagttacagtgataaagaataatatgataaacaacgaccttggaccagcaacagcccagat
ggctcttcagctgttcattaaagactacagatacacctaccgatgccacagaggtgatacacaaattcaaactagaagatcatttgaattgaaga
agctgtgggagcagacccgctcaaaggcaggactgttggtttcagatggagggccgaatttatacaacatccggaatcttcacattccagaagtt
tgcttgaagtgggagttgatggatgaagattaccagggaagactgtgtaaccctctgaacccgtttgtcagtcataaggaagttgaatccgtcaa
caatgctgtggtaatgccagcccatggtccggccaagagcatggaatatgatgccgttgcaactacacattcatggattcccaagagaaatcgct
ccattctcaacactagccaaaggggaattcttgaggatgaacaaatgtaccagaagtgctgcactctattcgagaaattcttccctagcagttca
tatcggaggccagttggaatttccagcatgatggaggccatggtgtctagggcccgaattgatgcacggattgacttcgagtctggaaggattaa
gaaagaagaatttgctgagatcttgaagatctgttccaccattgaagagctcggacggcaagggaagtga
>gi|GENSCAN_predicted_peptide_21|709_aa
MAMKYPITADKRIMEMIPERNEQGQTLWSKTNDAGSDRVMVSPLAVTWWNRNGPTTSTVHYPKVYKTYFEKVERLKHGTFGPVHFRNQVKIRRRV
DMNPGHADLSAKEAQDVIMEVVFPNEVGARILTSESQLTITKEKREELKNCNIAPLMVAYMLERELVRKTRFLPVAGGTSSVYIEVLHLTQGTCW
EQMYTPGGEVRNDDVDQSLIIAARNIVRRATVSADPLASLLEMCHSTQIGGVRMVDILKQNPTEEQAVDICKAAMGLKISSSFSFGGFTFKRTKG
SSVKREEEVLTGNLQTLKIKVHEGYEEFTMVGRRATAILRKATRRMIQLIVSGRDEQSIAEAIIVAMVFSQEDCMVKAVRGDLNFVNRANQRLNP
MHQLLRHFQKDAKVLFQNWGIEPIDNVMGMIGILPDMTPSTEMSLRGVRVSKMGVDEYSSTERVVVSIDRFLRVRDQRGNVLLSPEEVSETQGME
KLTITYSSSMMWEINGPESVLVNTYQWIIRNWETVKIQWSQEPTMLYNKMEFEPFQSLVPKAARSQYSGFVRTLFQQMRDVLGTFDTVQIIKLLP
FAAAPPEQSRMQFSSLTVNVRGSGMRILVRGNSPAFNYNKTTKRLTILGKDAGALTEDPDEGTAGVESAVLRGFLILGKEDKRYGPALSINELSN
LTKGEKANVLIGQGDVVLVMKRKRDSSILTDSQTATKRIRMAIN
>gi|GENSCAN_predicted_CDS_21|2130_bp
atggcgatgaaatacccgatcacagctgacaaaagaataatggagatgatccctgaaaggaatgagcaaggccaaactctttggagcaaaacaaa
tgacgctggatcagacagggtaatggtatcacctctggctgtaacgtggtggaacagaaatggaccaacaacaagtacagtccattatccaaagg
tgtataaaacctactttgaaaaggttgaaagattaaaacacggaacctttggccctgttcatttccggaatcaagtcaaaatacgccgcagggtt
gacatgaaccctggccatgcagatctcagcgctaaagaagcacaagatgtcatcatggaggtcgttttcccaaatgaagttggagccaggatatt
gacatcagaatcacagctgacaataacaaaggaaaagagggaggaactcaagaattgtaatattgctcctttaatggtggcatatatgttggaaa
gagaattggttcgcaagaccagattcctacccgtggctggcgggacaagcagcgtatatatagaagtattgcatttgactcaaggaacttgctgg
gagcagatgtacacaccaggaggggaggtaagaaatgatgatgttgaccaaagtttaatcattgctgctaggaacattgtcaggagagcaacagt
atcagcagacccattggcttcactcctggaaatgtgccatagcacacaaattggcggagtaagaatggtagacatccttaaacaaaacccaacag
aagagcaagctgtagatatatgcaaggcagcaatgggtttgaaaatcagctcatccttcagctttggagggttcactttcaaaagaacaaagggg
tcttctgtcaaaagagaggaagaagtgcttacaggcaacctccaaacattgaagataaaagtacatgaaggatatgaggaattcacaatggttgg
acgaagagcaacagccattctaagaaaagcaaccagaaggatgatccaactgatagtcagcggaagggacgagcaatcaattgctgaggcaatta
ttgtggcaatggtgttctcacaagaagattgcatggtaaaggcagtccgaggtgatttgaatttcgtaaacagagcaaatcaacgactgaatccc
atgcaccaactcctgagacactttcaaaaggatgcaaaggtgctgtttcaaaactggggaattgaacccatcgacaatgtcatgggtatgattgg
aatattgcctgacatgacccccagcacggaaatgtcactaagaggagtgagagttagcaaaatgggggtggatgaatattctagcactgaaaggg
tggtcgtgagcattgaccgtttcttaagggtccgagatcagcgaggaaatgtactcctatcccctgaagaagttagtgaaacacagggaatggaa
aagttgacgataacttattcatcgtctatgatgtgggagattaacgggccagaatcagtgctagttaacacatatcaatggatcattaggaattg
ggagactgtaaagatccaatggtcccaagaacccaccatgctatacaataagatggagtttgaaccatttcaatctttagtaccaaaggctgcca
gaagccaatatagtggatttgtgagaacgctattccagcagatgcgtgatgttttgggaacgttcgacactgttcaaataatcaaactactacca
tttgcagcagccccaccggaacagagtaggatgcaattttcttctctgactgtgaatgtgaggggatcaggaatgagaatacttgtgagaggtaa
ctcccctgcatttaactacaacaagacaactaagaggcttacaatacttgggaaggacgcaggtgcgcttacagaggacccagatgaaggaacag
caggagtagagtctgcagtattgagaggatttctaatcctcggcaaagaagacaaaagatatggaccagcattaagcatcaatgaactgagcaat
cttacgaaaggggagaaagctaatgtattgatagggcaaggagacgtagtgttggtaatgaaacggaaacgggactctagcatacttactgacag
ccagacagcgaccaaaagaattcggatggccatcaattag
>gi|GENSCAN_predicted_peptide_22|751_aa
METISLITILLVVTASNADKICIGHQSTNSTETVDTLTETNVPVTHAKELLHTEHNGMLCATSLGHPLILDTCTIEGLVYGNPSCDLLLGGREWS
YIVERSSAVNGTCYPGNVENLEELRTLFSSASSYQRIQIFPDTTWNVTYTGTSRACSGSFYRSMRWLTQKSGFYPVQDAQYTNNRGKSILFVWGI
HHPPTYTEQTNLYIRNDTTTSVTTEDLNRTFKPVIGPRPLVNGLQGRIDYYWSVLKPGQTLRVRSNGNLIAPWYGHVLSGGSHGRILKTDLKGGN
CVVQCQTEKGGLNSTLPFHNISKYAFGTCPKYVRVNSLKLAVGLRNVPARSSRGLFGAIAGFIEGGWPGLVAGWYGFQHSNDQGVGMAADRDSTQ
KAIDKITSKVNNIVDKMNKQYEIIDHEFSEVETRLNMINNKIDDQIQDVWAYNAELLVLLENQKTLDEHDANVNNLYNKVKRALGSNAMEDGKGC
FELYHKCDDQCMETIRNGTYNRRKYREESRLERQKIEGGILGFVFTLTVPSERGLQRRRFVQNALNGNGDPNNMDRAVKLYKKLKREMTFHGAKE
VALSYSTGALASCMGLIYNRMGTVTTEVALGLVCATCEQIADAQHRSHRQMATTTNPLIRHENRMVLASTTAKAMEQMAGSSEQAAEAMEVASQA
RQMVQAMRTIGTHPSSSAASIIGILHLILWILDRLFFKCIYRRFKYGLKRGPSTEGVPESMREEYRQEQQNAVDVDDGHFVNIELE
>gi|GENSCAN_predicted_CDS_22|2256_bp
atggaaacaatatcactaataactatactactagtagtaacagcaagcaatgcagataaaatctgcatcggccaccagtcaacaaactccacaga
aactgtggacacgctaacagaaaccaatgttcctgtgacacatgccaaagaattgctccacacagagcataatggaatgctgtgtgcaacaagcc
tgggacatcccctcattctagacacatgcactattgaaggactagtctatggcaacccttcttgtgacctgctgttgggaggaagagaatggtcc
tacatcgtcgaaagatcatcagctgtaaatggaacgtgttaccctgggaatgtagaaaacctagaggaactcaggacactttttagttccgctag
ttcctaccaaagaatccaaatcttcccagacacaacctggaatgtgacttacactggaacaagcagagcatgttcaggttcattctacaggagta
tgagatggctgactcaaaagagcggtttttaccctgttcaagacgcccaatacacaaataacaggggaaagagcattcttttcgtgtggggcata
catcacccacccacctataccgagcaaacaaatttgtacataagaaacgacacaacaacaagcgtgacaacagaagatttgaataggaccttcaa
accagtgatagggccaaggccccttgtcaatggtctgcagggaagaattgattattattggtcggtactaaaaccaggccaaacattgcgagtac
gatccaatgggaatctaattgctccatggtatggacacgttctttcaggagggagccatggaagaatcctgaagactgatttaaaaggtggtaat
tgtgtagtgcaatgtcagactgaaaaaggtggcttaaacagtacattgccattccacaatatcagtaaatatgcatttggaacctgccccaaata
tgtaagagttaatagtctcaaactggcagtcggtctgaggaacgtgcctgctagatcaagtagaggactatttggagccatagctggattcatag
aaggaggttggccaggactagtcgctggctggtatggtttccagcattcaaatgatcaaggggttggtatggctgcagatagggattcaactcaa
aaggcaattgataaaataacatccaaggtgaataatatagtcgacaagatgaacaagcaatatgaaataattgatcatgaattcagtgaggttga
aactagactcaatatgatcaataataagattgatgaccaaatacaagacgtatgggcatataatgcagaattgctagtactacttgaaaatcaaa
aaacactcgatgagcatgatgcgaacgtgaacaatctatataacaaggtgaagagggcactgggctccaatgctatggaagatgggaaaggctgt
ttcgagctataccataaatgtgatgatcagtgcatggaaacaattcggaacgggacctataataggagaaagtatagagaggaatcaagactaga
aaggcagaaaatagagggggggattttagggtttgtgttcacgctcaccgtgcccagtgagcgaggactgcagcgtagacgatttgtccaaaatg
ccctaaatgggaatggagacccaaacaacatggacagggcagttaaactatacaagaagctgaagagggaaatgacattccatggagcaaaggaa
gttgcactcagttactcaactggtgcgcttgccagttgcatgggtctcatatacaaccggatgggaacagtgaccacagaagtggctcttggcct
agtatgtgccacttgtgaacagattgctgatgcccaacatcggtcccacaggcagatggcgactaccaccaacccactaatcaggcatgagaaca
gaatggtactagccagcactacggctaaggccatggagcagatggctggatcaagtgagcaggcagcagaagccatggaagtcgcaagtcaggct
aggcaaatggtgcaggctatgaggacaattgggactcaccctagttccagtgcagcaagtatcattgggatattgcacttgatattgtggattct
tgatcgtcttttcttcaaatgcatttatcgtcgctttaaatacggtttgaaaagagggccttctacggaaggagtgcctgagtctatgagggaag
agtatcggcaggaacagcagaatgctgtggatgttgacgatggtcattttgtcaacatagagctggagtaa
>gi|GENSCAN_predicted_peptide_23|759_aa
MERIKELRNLMSQSRTREILTKTTVDHMAIIKKYTSGRQEKNPALRMKWMMAMKYPITADKRITEMIPERNEQGQTLWSKMNDAGSDRVMVSPLA
VTWWNRNGPMTNTVHYPKIYKTYFERVERLKHGTFGPVHFRNQVKIRRRVDINPGHADLSAKEAQDVIMEVVFPNEVGARILTSESQLTITKEKK
EELQDCKISPLMVAYMLERELVRKTRFLPVAGGTSSVYIEVLHLTQGTCWEQMYTPGGEVKNDDVDQSLIIAARNIVRRAAVSADPLASLLEMCH
STQIGGIRMVDILKQNPTEEQAVGICKAAMGLRISSSFSFGGFTFKRTSGSSVKREEEVLTGNLQTLKIRVHEGYEEFTMVGRRATAILRKATRR
LIQLIVSGRDEQSIAEAIIVAMVFSQEDCMIKAVRGDLNFVNRANQRLNPMHQLLRHFQKDAKVLFQNWGVEPIDNVMGMIGILPDMTPSIEMSM
RGVRISKMGVDEYSSTERVVVSIDRFLRVRDQRGNVLLSPEEVSETQGTEKLTITYSSSMMWEINGPESVLVNTYQWIIRNWETVKIQWSQNPTM
LYNKMEFEPFQSLVPKAIRGQYSGFVRTLFQQMRDVLGTFDTAQIIKLLPFAAAPPKQSRMQFSSFTVNVRGSGMRILVRGNSPVFNYNKATKRL
TVLGKDAGTLTEDPDEGTAGVESAVLRGFLILGKEDRRYGPALSINELSNLAKGEKANVLIGQGDVVLVMKRKRDSSILTDSQTATKRIRMAIN
>gi|GENSCAN_predicted_CDS_23|2280_bp
atggaaagaataaaagaactaagaaatctaatgtcgcagtctcgcacccgcgagatactcacaaaaaccaccgtggaccatatggccataatcaa
gaagtacacatcaggaagacaggagaagaacccagcacttaggatgaaatggatgatggcaatgaaatatccaattacagcagacaagaggataa
cggaaatgattcctgagagaaatgagcaaggacaaactttatggagtaaaatgaatgatgccggatcagaccgagtgatggtatcacctctggct
gtgacatggtggaataggaatggaccaatgacaaatacagttcattatccaaaaatctacaaaacttattttgaaagagtcgaaaggctaaagca
tggaacctttggccctgtccattttagaaaccaagtcaaaatacgtcggagagttgacataaatcctggtcatgcagatctcagtgccaaggagg
cacaggatgtaatcatggaagttgttttccctaacgaagtgggagccaggatactaacatcggaatcgcaactaacgataaccaaagagaagaaa
gaagaactccaggattgcaaaatttctcctttgatggttgcatacatgttggagagagaactggtccgcaaaacgagattcctcccagtggctgg
tggaacaagcagtgtgtacattgaagtgttgcatttgactcaaggaacatgctgggaacagatgtatactccaggaggggaagtgaagaatgatg
atgttgatcaaagcttgattattgctgctaggaacatagtgagaagagctgcagtatcagcagacccactagcatctttattggagatgtgccac
agcacacagattggtggaattaggatggtagacatccttaagcagaacccaacagaagagcaagccgtgggtatatgcaaggctgcaatgggact
gagaattagctcatccttcagttttggtggattcacatttaagagaacaagcggatcatcagtcaagagagaggaagaggtgcttacgggcaatc
ttcaaacattgaagataagagtgcatgagggatatgaagagttcacaatggttgggagaagagcaacagccatactcagaaaagcaaccaggaga
ttgattcagctgatagtgagtgggagagacgaacagtcgattgccgaagcaataattgtggccatggtattttcacaagaggattgtatgataaa
agcagttagaggtgatctgaatttcgtcaatagggcgaatcagcgactgaatcctatgcatcaacttttaagacattttcagaaggatgcgaaag
tgctttttcaaaattggggagttgaacctatcgacaatgtgatgggaatgattgggatattgcccgacatgactccaagcatcgagatgtcaatg
agaggagtgagaatcagcaaaatgggtgtagatgagtactccagcacggagagggtagtggtgagcattgaccggttcttgagagtccgggacca
acgaggaaatgtactactgtctcccgaggaggtcagtgaaacacagggaacagagaaactgacaataacttactcatcgtcaatgatgtgggaga
ttaatggtcctgaatcagtgttggtcaatacctatcaatggatcatcagaaactgggaaactgttaaaattcagtggtcccagaaccctacaatg
ctatacaataaaatggaatttgaaccatttcagtctttagtacctaaggccattagaggccaatacagtgggtttgtgagaactctgttccaaca
aatgagggatgtgcttgggacatttgataccgcacagataataaaacttcttcccttcgcagccgctccaccaaagcaaagtagaatgcagttct
cctcatttactgtgaatgtgaggggatcaggaatgagaatacttgtaaggggcaattctcctgtattcaactacaacaaggccacgaagagactc
acagttctcggaaaggatgctggcactttaaccgaagacccagatgaaggcacagctggagtggagtccgctgttctgaggggattcctcattct
gggcaaagaagacaggagatatgggccagcattaagcatcaatgaactgagcaaccttgcgaaaggagagaaggctaatgtgctaattgggcaag
gagacgtggtgttggtaatgaaacgaaaacgggactctagcatacttactgacagccagacagcgaccaaaagaattcggatggccatcaattag
>gi|GENSCAN_predicted_peptide_24|716_aa
MEDFVRQCFNPMIVELAEKTMKEYGEDLKIETNKFAAICTHLEVCFMYSDFHFINEQGESIIVELGDPNALLKHRFEIIEGRDRTMAWTVVNSIC
NTTGAEKPKFLPDLYDYKENRFIEIGVTRREVHIYYLEKANKIKSEKTHIHIFSFTGEEMATKADYTLDEESRARIKTRLFTIRQEMASRGLWDS
FRQSERGEETIEERFEITGTMRKLADQSLPPNFSSLENFRAYVDGFEPNGYIEGKLSQMSKEVNARIEPFLKTTPRPLRLPNGPPCSQRSKFLLM
DALKLSIEDPSHEGEGIPLYDAIKCMRTFFGWKEPNVVKPHEKGINPNYLLSWKQVLAELQDIENEEKIPKTKNMKKTSQLKWALGENMAPEKVD
FDDCKDVGDLKQYDSDEPELRSLASWIQNEFNKACELTDSSWIELDEIGEDVAPIEHIASMRRNYFTSEVSHCRATEYIMKGVYINTALLNASCA
AMDDFQLIPMISKCRTKEGRRKTNLYGFIIKGRSHLRNDTDVVNFVSMEFSLTDPRLEPHKWEKYCVLEIGDMLLRSAIGQVSRPMFLYVRTNGT
SKIKMKWGMEMRRCLLQSLQQIESMIEAESSVKEKDMTKEFFENKSETWPIGESPKGVEESSIGKVCRTLLAKSVFNSLYASPQLEGFSAESRKL
LLIVQALRDNLEPGTFDLGGLYEAIEECLINDPWVLLNASWFNSFLTHALS
>gi|GENSCAN_predicted_CDS_24|2151_bp
atggaagattttgtgcgacaatgcttcaatccgatgattgtcgagcttgcggaaaaaacaatgaaagagtatggggaggacctgaaaatcgaaac
aaacaaatttgcagcaatatgcactcacttggaagtatgcttcatgtattcagatttccacttcatcaatgagcaaggcgagtcaataatcgtag
aacttggtgatcctaatgcacttttgaagcacagatttgaaataatcgagggaagagatcgcacaatggcctggacagtagtaaacagtatttgc
aacactacaggggctgagaaaccaaagtttctaccagatttgtatgattacaaggaaaatagattcatcgaaattggagtaacaaggagagaagt
tcacatatactatctggaaaaggccaataaaattaaatctgagaaaacacacatccacattttctcgttcactggggaagaaatggccacaaagg
ccgactacactctcgatgaagaaagcagggctaggatcaaaaccaggctattcaccataagacaagaaatggccagcagaggcctctgggattcc
tttcgtcagtccgagagaggagaagagacaattgaagaaaggtttgaaatcacaggaacaatgcgcaagcttgccgaccaaagtctcccgccgaa
cttctccagccttgaaaattttagagcctatgtggatggattcgaaccgaacggctacattgagggcaagctgtctcaaatgtccaaagaagtaa
atgctagaattgaaccttttttgaaaacaacaccacgaccacttagacttccgaatgggcctccctgttctcagcggtccaaattcctgctgatg
gatgccttaaaattaagcattgaggacccaagtcatgaaggagagggaataccgctatatgatgcaatcaaatgcatgagaacattctttggatg
gaaggaacccaatgttgttaaaccacacgaaaagggaataaatccaaattatcttctgtcatggaagcaagtactggcagaactgcaggacattg
agaatgaggagaaaattccaaagactaaaaatatgaaaaaaacaagtcagctaaagtgggcacttggtgagaacatggcaccagaaaaggtagac
tttgacgactgtaaagatgtaggtgatttgaagcaatatgatagtgatgaaccagaattgaggtcgcttgcaagttggattcagaatgagttcaa
caaggcatgcgaactgacagattcaagctggatagagcttgatgagattggagaagatgtggctccaattgaacacattgcaagcatgagaagga
attatttcacatcagaggtgtctcactgcagagccacagaatacataatgaagggggtgtacatcaatactgccttacttaatgcatcttgtgca
gcaatggatgatttccaattaattccaatgataagcaagtgtagaactaaggagggaaggcgaaagaccaacttgtatggtttcatcataaaagg
aagatcccacttaaggaatgacaccgacgtggtaaactttgtgagcatggagttttctctcactgacccaagacttgaaccacacaaatgggaga
agtactgtgttcttgagataggagatatgcttctaagaagtgccataggccaggtttcaaggcccatgttcttgtatgtgaggacaaatggaacc
tcaaaaattaaaatgaaatggggaatggagatgaggcgttgtctcctccagtcacttcaacaaattgagagtatgattgaagctgagtcctctgt
caaagagaaagacatgaccaaagagttctttgagaacaaatcagaaacatggcccattggagagtctcccaaaggagtggaggaaagttccattg
ggaaggtctgcaggactttattagcaaagtcggtatttaacagcttgtatgcatctccacaactagaaggattttcagctgaatcaagaaaactg
cttcttatcgttcaggctcttagggacaatctggaacctgggacctttgatcttggggggctatatgaagcaattgaggagtgcctaattaatga
tccctgggttttgcttaatgcttcttggttcaactccttccttacacatgcattgagttag
>gi|GENSCAN_predicted_peptide_25|718_aa
MDTVNRTHQYSEKARWTTNTETGAPQLNPIDGPLPEDNEPSGYAQTDCVLEAMAFLEESHPGIFENSCIETMEVVQQTRVDKLTQGRQTYDWTLN
RNQPAATALANTIEVFRSNGLTANESGRLIDFLKDVMESMKKEEMGITTHFQRKRRVRDNMTKKMITQRTIGKRKQRLNKRSYLIRALTLNTMTK
DAERGKLKRRAIATPGMQIRGFVYFVETLARSICEKLEQSGLPVGGNEKKAKLANVVRKMMTNSQDTELSLTITGDNTKWNENQNPRMFLAMITY
MTRNQPEWFRNVLSIAPIMFSNKMARLGKGYMFESKSMKLRTQIPAEMLASIDLKYFNDSTRKKIEKIRPLLIEGTASLSPGMMMGMFNMLSTVL
GVSILNLGQKRYTKTTYWWDGLQSSDDFALIVNAPNHEGIQAGVDRFYRTCKLHGINMSKKKSYINRTGTFEFTSFFYRYGFVANFSMELPSFGV
SGSNESADMSIGVTVIKNNMINNDLGPATAQMALQLFIKDYRYTYRCHRGDTQIQTRRSFEIKKLWEQTRSKAGLLVSDGGPNLYNIRNLHIPEV
CLKWELMDEDYQGRLCNPLNPFVSHKEIESMNNAVMMPAHGPAKNMEYDAVATTHSWIPKRNRSILNTSQRGVLEDEQMYQRCCNLFEKFFPSSS
YRRPVGISSMVEAMVSRARIDARIDFESGRIKKEEFTEIMKICSTIEELRRQK
>gi|GENSCAN_predicted_CDS_25|2157_bp
atggatactgtcaacaggacacatcagtactcagaaaaggcaagatggacaacaaacaccgaaactggagcaccgcaactcaacccgattgatgg
gccactgccagaagacaatgaaccaagtggttatgcccaaacagattgtgtattggaagcaatggctttccttgaggaatcccatcctggtattt
ttgaaaactcgtgtattgaaacgatggaggttgttcagcaaacacgagtagacaagctgacacaaggccgacagacctatgactggactttaaat
agaaaccagcctgctgcaacagcattggccaacacaatagaagtgttcagatcaaatggcctcacggccaatgagtctggaaggctcatagactt
ccttaaggatgtaatggagtcaatgaaaaaagaagaaatggggatcacaactcattttcagagaaagagacgggtgagagacaatatgactaaga
aaatgataacacagagaacaataggtaaaaggaaacagagattgaacaaaaggagttatctaattagagcattgaccctgaacacaatgaccaaa
gatgctgagagagggaagctaaaacggagagcaattgcaaccccagggatgcaaataagggggtttgtatactttgttgagacactggcaaggag
tatatgtgagaaacttgaacaatcagggttgccagttggaggcaatgagaagaaagcaaagttggcaaatgttgtaaggaagatgatgaccaatt
ctcaggacaccgaactttctttgaccatcactggagataacaccaaatggaacgaaaatcagaatcctcggatgtttttggccatgatcacatat
atgaccagaaatcagcccgaatggttcagaaatgttctaagtattgctccaataatgttctcaaacaaaatggcgagactgggaaaagggtatat
gtttgagagcaagagtatgaaacttagaactcaaatacctgcagaaatgctagcaagcattgatttgaaatatttcaatgattcaacaagaaaga
agattgaaaaaatccgaccgctcttaatagaggggactgcatcattgagccctggaatgatgatgggcatgttcaatatgttaagcactgtatta
ggcgtctccatcctgaatcttggacaaaagagatacaccaagactacttactggtgggatggtcttcaatcctctgacgattttgctctgattgt
gaatgcacccaatcatgaagggattcaagccggagtcgacaggttttatcgaacctgtaagctacatggaatcaatatgagcaagaaaaagtctt
acataaacagaacaggtacatttgaattcacaagttttttctatcgttatgggtttgttgccaatttcagcatggagcttcccagttttggtgtg
tctgggagcaacgagtcagcggacatgagtattggagttactgtcatcaaaaacaatatgataaacaatgatcttggtccagcaacagctcaaat
ggcccttcagttgttcatcaaagattacaggtacacgtaccgatgccatagaggtgacacacaaatacaaacccgaagatcatttgaaataaaga
aactgtgggagcaaacccgttccaaagctggactgctggtctccgacggaggcccaaatttatacaacattagaaatctccacattcctgaagtc
tgcctaaaatgggaattgatggatgaggattaccaggggcgtttatgcaacccactgaacccatttgtcagccataaagaaattgaatcaatgaa
caatgcagtgatgatgccagcacatggtccagccaaaaacatggagtatgatgctgttgcaacaacacactcctggatccccaaaagaaatcgat
ccatcttgaatacaagtcaaagaggagtacttgaagatgaacaaatgtaccaaaggtgctgcaatttatttgaaaaattcttccccagcagttca
tacagaagaccagtcgggatatccagtatggtggaggctatggtttccagagcccgaattgatgcacggattgatttcgaatctggaaggataaa
gaaagaagagttcactgagatcatgaagatctgttccaccattgaagagctcagacggcaaaaatag
>gi|GENSCAN_predicted_peptide_26|230_aa
MDPNTVSSFQVDCFLWHVRKRVADQELGDAPFLDRLRRDQKSLRGRGSTLGLDIETATRAGKQIVERILKEESDEALKMTMASVPASRYLTDMTL
EEMSREWSMLIPKQKVAGPLCIRMDQAIMDKNIILKANFSVIFDRLETLILLRAFTEEGAIVGEISPLPSLPGHTAEDVKNAVGVLIGGLEWNDN
TVRVSETLQRFAWRSSNENGRPPLTPKQKREMAGTIRSEV
>gi|GENSCAN_predicted_CDS_26|693_bp
atggatccaaacactgtgtcaagctttcaggtagattgctttctttggcatgtccgcaaacgagttgcagaccaagaactaggtgatgccccatt
ccttgatcggcttcgccgagatcagaaatccctaagaggaaggggcagcactcttggtctggacatcgagacagccacacgtgctggaaagcaga
tagtggagcggattctgaaagaagaatccgatgaggcacttaaaatgaccatggcctctgtacctgcgtcgcgttacctaaccgacatgactctt
gaggaaatgtcaagggaatggtccatgctcatacccaagcagaaagtggcaggccctctttgtatcagaatggaccaggcgatcatggataaaaa
catcatactgaaagcgaacttcagtgtgatttttgaccggctggagactctaatattgctaagggctttcaccgaagagggagcaattgttggcg
aaatttcaccattgccttctcttccaggacatactgctgaggatgtcaaaaatgcagttggagtcctcatcggaggacttgaatggaatgataac
acagttcgagtctctgaaactctacagagattcgcttggagaagcagtaatgagaatgggagacctccactcactccaaaacagaaacgagaaat
ggcgggaacaattaggtcagaagtttga
>gi|GENSCAN_predicted_peptide_27|498_aa
MASQGTKRSYEQMETDGERQNATEIRASVGKMIGGIGRFYIQMCTELKLSDYEGRLIQNSLTIERMVLSAFDERRNKYLEEHPSAGKDPKKTGGP
IYRRVNGKWMRELILYDKEEIRRIWRQANNGDDATAGLTHMMIWHSNLNDATYQRTRALVRTGMDPRMCSLMQGSTLPRRSGAAGAAVKGVGTMV
MELVRMIKRGINDRNFWRGENGRKTRIAYERMCNILKGKFQTAAQKAMMDQVRESRDPGNAEFEDLTFLARSALILRGSVAHKSCLPACVYGPAV
ASGYDFEREGYSLVGIDPFRLLQNSQVYSLIRPNENPAHKSQLVWMACHSAAFEDLRVLSFIKGTKVVPRGKLSTRGVQIASNENMETMESSTLE
LRSRYWAIRTRSGGNTNQQRASAGQISIQPTFSVQRNLPFDRTTVMAAFTGNTEGRTSDMRTEIIRMMESARPEDVSFQGRGVFELSDEKAASPI
VPSFDMSNEGSYFFGDNAEEYDN
>gi|GENSCAN_predicted_CDS_27|1497_bp
atggcgtcccaaggcaccaaacggtcttacgaacagatggagactgatggagaacgccagaatgccactgaaatcagagcatccgtcggaaaaat
gattggtggaattggacgattctacatccaaatgtgcacagaacttaaactcagtgattatgagggacggttgatccaaaacagcttaacaatag
agagaatggtgctctctgcttttgacgaaaggagaaataaatacctggaagaacatcccagtgcggggaaggatcctaagaaaactggaggacct
atatacagaagagtaaacggaaagtggatgagagaactcatcctttatgacaaagaagaaataaggcgaatctggcgccaagctaataatggtga
cgatgcaacggctggtctgactcacatgatgatctggcattccaatttgaatgatgcaacttatcagaggacaagggctcttgttcgcaccggaa
tggatcccaggatgtgctctctgatgcaaggttcaactctccctaggaggtctggagccgcaggtgctgcagtcaaaggagttggaacaatggtg
atggaattggtcaggatgatcaaacgtgggatcaatgatcggaacttctggaggggtgagaatggacgaaaaacaagaattgcttatgaaagaat
gtgcaacattctcaaagggaaatttcaaactgctgcacaaaaagcaatgatggatcaagtgagagagagccgggacccagggaatgctgagttcg
aagatctcacttttctagcacggtctgcactcatattgagagggtcggttgctcacaagtcctgcctgcctgcctgtgtgtatggacctgccgta
gccagtgggtacgactttgaaagagagggatactctctagtcggaatagaccctttcagactgcttcaaaacagccaagtgtacagcctaatcag
accaaatgagaatccagcacacaagagtcaactggtgtggatggcatgccattctgccgcatttgaagatctaagagtattgagcttcatcaaag
ggacgaaggtggtcccaagagggaagctttccactagaggagttcaaattgcttccaatgaaaatatggagactatggaatcaagtacacttgaa
ctgagaagcaggtactgggccataaggaccagaagtggaggaaacaccaatcaacagagggcatctgcgggccaaatcagcatacaacctacgtt
ctcagtacagagaaatctcccttttgacagaacaaccgttatggcagcattcactgggaatacagaggggagaacatctgacatgaggaccgaaa
tcataaggatgatggaaagtgcaagaccagaagatgtgtctttccaggggcggggagtcttcgagctctcggacgaaaaggcagcgagcccgatc
gtgccttcctttgacatgagtaatgaaggatcttatttcttcggagacaatgcagaggagtacgacaattaa
>gi|GENSCAN_predicted_peptide_28|711_aa
MKANLLVLLCALAAADADTICIGYHANNSTDTVDTVLEKNVTVTHSVNLLEDSHNGKLCRLKGIAPLQLGKCNIAGWLLGNPECDPLLPVRSWSY
IVETPNSENGICYPGDFIDYEELREQLSSVSSFERFEIFPKESSWPNHNTTKGVTAACSHAGKSSFYRNLLWLTEKEGSYPKLKNSYVNKKGKEV
LVLWGIHHPSNSKDQQNIYQNENAYVSVVTSNYNRRFTPEIAERPKVRDQAGRMNYYWTLLKPGDTIIFEANGNLIAPRYAFALSRGFGSGIITS
NASMHECNTKCQTPLGAINSSLPFQNIHPVTIGECPKYVRSAKLRMVTGLRNIPSIQSRGLFGAIAGFIEGGWTGMIDGWYGYHHQNEQGSGYAA
DQKSTQNAINGITNKVNSVIEKMNIQFTAVGKEFNKLEKRMENLNKKVDDGFLDIWTYNAELLVLLENERTLDFHDSNVKNLYEKVKSQLKNNAK
EIGNGCFEFYHKCDNECMESVRNGTYDYPKYSEESKLNREKGILGFVFTLTVPSERGLQRRRFVQNALNGNGDPNNMDKAVKLYRKLKREITFHG
AKEISLSYSAGALASCMGLIYNRMGAVTTEVAFGLVCATCEQIADSQHRSHRQMVTTTNPLIRHENRMVLASTTAKAMEQMAGSSEQAAEAMEVA
SQARQMVQAMRTIGTHPSSSAGLKNDLLENLQAYQKRMGVQMQRFK
>gi|GENSCAN_predicted_CDS_28|2136_bp
atgaaggcaaacctactggtcctgttatgtgcacttgcagctgcagatgcagacacaatatgtataggctaccatgcgaacaattcaaccgacac
tgttgacacagtgctcgagaagaatgtgacagtgacacactctgttaacctgctcgaagacagccacaacggaaaactatgtagattaaaaggaa
tagccccactacaattggggaaatgtaacatcgccggatggctcttgggaaacccagaatgcgacccactgcttccagtgagatcatggtcctac
attgtagaaacaccaaactctgagaatggaatatgttatccaggagatttcatcgactatgaggagctgagggagcaattgagctcagtgtcatc
attcgaaagattcgaaatatttcccaaagaaagctcatggcccaaccacaacacaaccaaaggagtaacggcagcatgctcccatgcggggaaaa
gcagtttttacagaaatttgctatggctgacggagaaggagggctcatacccaaagctgaaaaattcttatgtgaacaagaaagggaaagaagtc
cttgtactgtggggtattcatcacccgtctaacagtaaggatcaacagaatatctatcagaatgaaaatgcttatgtctctgtagtgacttcaaa
ttataacaggagatttaccccggaaatagcagaaagacccaaagtaagagatcaagctgggaggatgaactattactggaccttgctaaaacccg
gagacacaataatatttgaggcaaatggaaatctaatagcaccaaggtatgctttcgcactgagtagaggctttgggtccggcatcatcacctca
aacgcatcaatgcatgagtgtaacacgaagtgtcaaacacccctgggagctataaacagcagtctccctttccagaatatacacccagtcacaat
aggagagtgcccaaaatacgtcaggagtgccaaattgaggatggttacaggactaaggaacattccgtccattcaatccagaggtctatttggag
ccattgccggttttattgaagggggatggactggaatgatagatggatggtacggttatcatcatcagaatgaacagggatcaggctatgcagcg
gatcaaaaaagcacacaaaatgccattaacgggattacaaacaaggtgaactctgttatcgagaaaatgaacattcaattcacagctgtgggtaa
agaattcaacaaattagaaaaaaggatggaaaatttaaataaaaaagttgatgatggatttctggacatttggacatataatgcagaattgttag
ttctactggaaaatgaaaggactctggatttccatgactcaaatgtgaagaatctgtatgagaaagtaaaaagccaattaaagaataatgccaaa
gaaatcggaaatggatgttttgagttctaccacaagtgtgacaatgaatgcatggaaagtgtaagaaatgggacttatgattatcccaaatattc
agaagagtcaaagttgaacagggaaaaggggattttaggatttgtgttcacgctcaccgtgcccagtgagcgaggactgcagcgtagacgctttg
tccaaaatgcccttaatgggaacggggatccaaataacatggacaaagcagttaaactgtataggaagctcaagagggagataacattccatggg
gccaaagaaatctcactcagttattctgctggtgcacttgccagttgtatgggcctcatatacaacaggatgggggctgtgaccactgaagtggc
atttggcctggtatgtgcaacctgtgaacagattgctgactcccagcatcggtctcataggcaaatggtgacaacaaccaacccactaatcagac
atgagaacagaatggttttagccagcactacagctaaggctatggagcaaatggctggatcgagtgagcaagcagcagaggccatggaggttgct
agtcaggctaggcaaatggtgcaagcgatgagaaccattgggactcatcctagctccagtgctggtctgaaaaatgatcttcttgaaaatttgca
ggcctatcagaaacgaatgggggtgcagatgcaacggttcaagtga
Explanation
Table: Length (in base pairs) of each genome segment in the different strains

Segment     H1N1   H2N2   H3N2   H9N2   H5N1
Segment 1   2341   2341   2341   2341   2341
Segment 2   2341   2341   2341   2328   2341
Segment 3   2233   2233   2233   2225   2233
Segment 4   1778   1773   1762   1714   1760
Segment 5   1565   1497   1566   1557   1565
Segment 6   1413   1410   1467   1418   1458
Segment 7   1027   1027   1027   1025   1027
Segment 8    890    838    890    890    865
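The segment lengths listed above can be summed to give an approximate total genome length for each strain, and compared to see which segments vary in length. The following is a minimal sketch, not part of the original analysis; the names SEGMENT_LENGTHS, genome_length and variable_segments are chosen here only for illustration.

```python
# Segment lengths in base pairs (segments 1 to 8), copied from the table above.
SEGMENT_LENGTHS = {
    "H1N1": [2341, 2341, 2233, 1778, 1565, 1413, 1027, 890],
    "H2N2": [2341, 2341, 2233, 1773, 1497, 1410, 1027, 838],
    "H3N2": [2341, 2341, 2233, 1762, 1566, 1467, 1027, 890],
    "H9N2": [2341, 2328, 2225, 1714, 1557, 1418, 1025, 890],
    "H5N1": [2341, 2341, 2233, 1760, 1565, 1458, 1027, 865],
}


def genome_length(strain):
    """Total number of base pairs over all eight segments of one strain."""
    return sum(SEGMENT_LENGTHS[strain])


def variable_segments():
    """Return the 1-based segment numbers whose length differs between strains."""
    per_segment = zip(*SEGMENT_LENGTHS.values())
    return [
        number
        for number, lengths in enumerate(per_segment, start=1)
        if len(set(lengths)) > 1
    ]


# Total genome length per strain, computed once for convenience.
TOTALS = {strain: genome_length(strain) for strain in SEGMENT_LENGTHS}

if __name__ == "__main__":
    for strain, total in TOTALS.items():
        print(f"{strain}: {total} bp")
    print("Segments with differing lengths:", variable_segments())
```

In this table only segment 1 has the same length in all five strains; the largest differences are in segments 4 to 6 and in segment 8.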
Modelling
Columns: query sequence, subject, % identity, alignment length, mismatches, gap openings, query start, query end, subject start, subject end, E value, bit score
gi|73921266|ref|YP_308668.1| pdb|1NMB|N 44.51 474 248 9 1 465 1 468 6.00E-111 396
gi|73921266|ref|YP_308668.1| pdb|5NN9| 48.81 379 188 5 91 465 10 386 2.00E-102 367
gi|73921266|ref|YP_308668.1| pdb|1XOG|A 48.81 379 188 5 91 465 9 385 5.00E-102 366
gi|73921266|ref|YP_308668.1| pdb|1L7F|A 48.81 379 188 5 91 465 10 386 5.00E-102 366
gi|73921266|ref|YP_308668.1| pdb|4NN9| 48.81 379 188 5 91 465 10 386 5.00E-102 366
gi|73921266|ref|YP_308668.1| pdb|1NCC|N 48.81 379 188 5 91 465 11 387 5.00E-102 366
gi|73921266|ref|YP_308668.1| pdb|1NCA|N 48.81 379 188 5 91 465 11 387 5.00E-102 366
gi|73921266|ref|YP_308668.1| pdb|1NMA|N 48.28 379 190 5 91 465 10 386 8.00E-102 365
gi|73921266|ref|YP_308668.1| pdb|1NCD|N 48.28 379 190 5 91 465 11 387 8.00E-102 365
gi|73921266|ref|YP_308668.1| pdb|1L7H|A 48.55 379 189 5 91 465 10 386 1.00E-101 365
gi|73921266|ref|YP_308668.1| pdb|1W20|D 48.41 378 188 5 91 463 10 385 2.00E-101 364
gi|73921266|ref|YP_308668.1| pdb|1NCB|N 48.55 379 189 5 91 465 11 387 2.00E-101 364
gi|73921266|ref|YP_308668.1| pdb|6NN9| 48.55 379 189 5 91 465 10 386 2.00E-101 364
gi|73921266|ref|YP_308668.1| pdb|3NN9| 48.55 379 189 5 91 465 10 386 2.00E-101 364
gi|73921266|ref|YP_308668.1| pdb|1INY| 48.55 379 189 5 91 465 10 386 2.00E-101 364
gi|73921266|ref|YP_308668.1| pdb|1L7G|A 48.55 379 189 5 91 465 10 386 3.00E-101 363
gi|73921266|ref|YP_308668.1| pdb|2AEQ|A 46.7 379 189 7 92 463 18 390 4.00E-95 343
gi|73921266|ref|YP_308668.1| pdb|2BAT| 45.93 381 193 7 92 465 11 385 4.00E-95 343
gi|73921266|ref|YP_308668.1| pdb|1INX| 45.93 381 193 7 92 465 11 385 6.00E-95 343
1
gi|73921266|ref|YP_308668.1| pdb|1VCJ|A 36.36 352 203 0 116 455 37 379 7.00E-59 223
1
gi|73921266|ref|YP_308668.1| pdb|1B9V|A 36.36 352 203 0 116 455 38 380 7.00E-59 223
1
gi|73921266|ref|YP_308668.1| pdb|1INF| 36.36 352 203 0 116 455 38 380 7.00E-59 223
gi|73921266|ref|YP_308668.1| pdb|1A4Q|B 35.9 351 206 9 116 455 38 380 3.00E-58 221
gi|73921266|ref|YP_308668.1| pdb|2AZD|B 50 22 11 0 294 315 159 180 0.22 32.3
gi|73921266|ref|YP_308668.1| pdb|1QM5|B 50 22 11 0 294 315 159 180 0.22 32.3
gi|73921266|ref|YP_308668.1| pdb|2ECP|B 50 22 11 0 294 315 159 180 0.22 32.3
gi|73921266|ref|YP_308668.1| pdb|1AHP|B 50 22 11 0 294 315 160 181 0.22 32.3
gi|73921266|ref|YP_308668.1| pdb|1U8C|B 25 120 70 4 226 345 496 595 0.28 32
gi|73921266|ref|YP_308668.1| pdb|1SSK|A 31.75 63 42 1 146 207 76 138 1.1 30
gi|73921266|ref|YP_308668.1| pdb|1EGI|B 41.67 24 14 0 422 445 72 95 3.1 28.5
gi|73921266|ref|YP_308668.1| pdb|1EGZ|C 30.3 66 35 3 354 410 9 72 5.3 27.7
gi|73921266|ref|YP_308668.1| pdb|1WB0|A 28.75 80 51 2 328 407 263 336 9.1 26.9
gi|73921266|ref|YP_308668.1| pdb|1HKM|A 28.75 80 51 2 328 407 263 336 9.1 26.9
gi|73921266|ref|YP_308668.1| pdb|1HKK|A 28.75 80 51 2 328 407 263 336 9.1 26.9
gi|73921266|ref|YP_308668.1| pdb|1LQ0|A 28.75 80 51 2 328 407 263 336 9.1 26.9
gi|73921266|ref|YP_308668.1| pdb|1GUV|A 28.75 80 51 2 328 407 263 336 9.1 26.9
gi|73852953|ref|YP_308667.1| pdb|1I7A|D 25.47 106 73 2 254 357 3 104 0.88 30.4
gi|73852953|ref|YP_308667.1| pdb|1TBG|H 32.73 55 35 1 41 95 6 58 5.7 27.7
gi|73852953|ref|YP_308667.1| pdb|1B9Y|B 32.73 55 35 1 41 95 6 58 5.7 27.7
gi|73852953|ref|YP_308667.1| pdb|1A0R|G 32.73 55 35 1 41 95 5 57 5.7 27.7
gi|73852953|ref|YP_308667.1| pdb|1GOT|G 32.73 55 35 1 41 95 13 65 5.7 27.7
gi|73852947|ref|YP_308664.1| pdb|1W1W|D 21.88 128 85 4 106 231 173 287 0.83 31.2
gi|73852947|ref|YP_308664.1| pdb|1GTM|C 20.65 92 72 1 149 240 274 364 4.1 28.9
gi|73852947|ref|YP_308664.1| pdb|1EUZ|F 20.21 94 74 1 149 242 274 366 5.3 28.5
gi|73852947|ref|YP_308664.1| pdb|1J0W|B 28.26 46 33 0 503 548 1 46 5.3 28.5
gi|73852947|ref|YP_308664.1| pdb|1BVU|F 21.74 92 71 1 149 240 273 363 7 28.1
gi|73852947|ref|YP_308664.1| pdb|1BBW|A 22.78 79 59 1 166 242 33 111 9.1 27.7
gi|73852947|ref|YP_308664.1| pdb|1KRT| 22.78 79 59 1 166 242 5 83 9.1 27.7
gi|73852957|ref|YP_308671.1| pdb|1EA3|B 95.12 164 8 0 1 164 1 164 3.00E-87 316
gi|73852957|ref|YP_308671.1| pdb|1AA7|B 94.94 158 8 0 1 158 1 158 3.00E-83 303
gi|73852957|ref|YP_308671.1| pdb|2CRL|A 28.24 85 52 3 118 199 2 80 4.1 26.9
gi|73852957|ref|YP_308671.1| pdb|1QGE|D 36.36 44 27 1 178 221 70 112 9.1 25.8
27
Results of the modelling
Template selection:
For modelling, we chose the appropriate template from the list of PDB BLAST hits according to its E-value, bit score, % identity and alignment length.
H5N1
Target: Neuraminidase
Template: PDB id = 1w21
E-value = 1e-96
Bit score = 348
% identity = 47.15%
Target: HA1
Template: PDB id = 1jsmA
E-value = 1e-96
Bit score = 645
% identity = 94.17%
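The template selection described above can be sketched in code. The example below is only an illustration under stated assumptions: it assumes the hits are whitespace-separated rows in the 12-column layout of the BLAST table in this report, and the ranking order (lowest E-value first, then highest bit score, % identity and alignment length) is one possible reading of the selection criteria, not necessarily the exact procedure used here. The names Hit, parse_hit and best_template are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    query: str
    subject: str
    identity: float   # % identity
    length: int       # alignment length
    evalue: float
    bitscore: float


def parse_hit(line):
    """Parse one whitespace-separated row of the 12-column BLAST table."""
    fields = line.split()
    if len(fields) != 12:
        raise ValueError(f"expected 12 columns, got {len(fields)}")
    return Hit(
        query=fields[0],
        subject=fields[1],
        identity=float(fields[2]),
        length=int(fields[3]),
        evalue=float(fields[10]),
        bitscore=float(fields[11]),
    )


def best_template(lines):
    """Pick the hit with the lowest E-value; ties go to the higher bit score,
    then higher % identity, then longer alignment (an assumed tie-break order)."""
    hits = [parse_hit(line) for line in lines if line.strip()]
    return min(hits, key=lambda h: (h.evalue, -h.bitscore, -h.identity, -h.length))


# Three rows taken from the BLAST table in this report.
EXAMPLE_ROWS = [
    "gi|73921266|ref|YP_308668.1| pdb|1NMB|N 44.51 474 248 9 1 465 1 468 6.00E-111 396",
    "gi|73921266|ref|YP_308668.1| pdb|5NN9| 48.81 379 188 5 91 465 10 386 2.00E-102 367",
    "gi|73921266|ref|YP_308668.1| pdb|2AZD|B 50 22 11 0 294 315 159 180 0.22 32.3",
]

if __name__ == "__main__":
    print(best_template(EXAMPLE_ROWS))
```

Ranking by E-value first keeps short, high-identity but statistically weak matches (such as the 22-residue hit above) from being chosen over long, significant alignments.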
Docking:
We find that a het group, NAG (N-acetyl-D-glucosamine), is present as an inhibitor of the neuraminidase
protein. We docked this inhibitor with the protein. After docking we obtain an e-value for each result and
choose the best score, that is, the result with the lowest e-value.
PDB id:
Name:
Title:
Structure:
Source:
UniProt:
Enzyme class:
Reaction:
R-factor: 0.152
Structure of NAG:
Het group: NAG
Calculations:
Receptor: 7NN9
Ligand: 1AOH
LIGAND: 1A3K
LIGAND: 1A3K
Conclusion
After analyzing sequences from different strains of influenza A virus, we conclude that although the
strains are closely related, they show distinctly different pathogenic behaviour, which plays an important
role in their survival in different host species. It is worth examining the question more closely at the gene
level, and a phylogenetic analysis is very helpful for understanding the evolutionary pattern.
Based on the current analysis, the different strains appear to have diverged at different levels.
The same genes are present in all strains, which suggests that the strains evolved together.
Influenza viruses change through the well-known processes of antigenic drift and antigenic shift.
Comparing four other strains with H5N1 suggests that the strains were related to one another in the past
and may have given rise to each other (for example, H5N1 may have evolved from H1N1 or from another
strain, or vice versa).
Studies on the ecology of influenza viruses have led to the hypothesis that
all mammalian influenza viruses derive from the avian influenza reservoir.
Once the ongoing gene sequencing project on avian influenza is complete, we hope it will be possible
to draw firm conclusions about the true picture of evolution in the near future and to identify the genes
responsible for pathogenesis.
A complete inference can only be drawn from a comprehensive list of the
gene products and their functions.
To predict the unknown structure of proteins present in the H5N1 strain,
we carried out homology modelling. The structures submitted so far were determined by X-ray
crystallography or NMR. We take a step forward by presenting a theoretical model built with available
online modelling tools.
Neuraminidase, the protein encoded by the NA gene, is one of the factors behind the
pathogenicity of influenza A virus. We therefore docked this protein with an appropriate ligand
in order to inhibit its activity, as a basis on which drugs can be developed.
FUTURE PROSPECTS
The work presented in this report may serve as a stepping stone towards such discoveries; it is a small
finding within a much larger problem.
Phylogenetics is the field of biology that deals with identifying and understanding the relationships
among the many different kinds of life on earth. It includes methods for collecting and analysing
data, as well as the interpretation of the results as new biological information.
With the aid of sequences, it should be possible to find closely related organisms. Experience shows
that closely related organisms have similar sequences, while more distantly related organisms have more
dissimilar sequences. One objective is to reconstruct the evolutionary relationships between species.
Another objective is to estimate the time of divergence between two organisms since they last shared a
common ancestor.
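The idea that closely related organisms have more similar sequences can be made concrete with a simple pairwise distance. The sketch below computes the p-distance (the proportion of differing sites) between aligned sequences; it is an illustration only, it is not the distance method used in this report, and the three short sequences are hypothetical toy data rather than real influenza sequences.

```python
from itertools import combinations


def p_distance(seq_a, seq_b):
    """Proportion of differing sites between two aligned sequences.

    Columns where either sequence has a gap ('-') are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a != b:
            differences += 1
    return differences / compared if compared else 0.0


def distance_matrix(aligned):
    """Return {(name1, name2): p-distance} for every pair of sequences."""
    return {
        (n1, n2): p_distance(aligned[n1], aligned[n2])
        for n1, n2 in combinations(aligned, 2)
    }


# Hypothetical toy alignment, for illustration only.
TOY_ALIGNMENT = {
    "strain_a": "ATGGAAACAATA",
    "strain_b": "ATGGAAAGAATA",
    "strain_c": "ATGGATACTGTC",
}

if __name__ == "__main__":
    for pair, dist in distance_matrix(TOY_ALIGNMENT).items():
        print(pair, round(dist, 3))
```

A matrix of such distances is the usual input to distance-based tree-building methods such as neighbor joining, where a smaller distance indicates a more recent common ancestor.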
The purpose of modelling is to help drug developers and biotechnologists develop drugs more
efficiently and effectively in future by analysing the modelled structure of the protein.
As new drug targets are identified, new vistas will open for further drug development. The
findings of our docking may be useful in the search for a cure for bird flu, and may also
open new avenues for finding other possible drug targets in influenza A virus.
The docking results can be used to design new lead compounds and hence can aid the drug
discovery process.
Finally, a similar process can be applied to other pathogens and other infectious diseases to identify
possible therapeutic sites in them, so that we can look forward to a healthier, more disease-free
world.
The work presented here is only a small part of a large problem, and much work remains to be done to
establish a sound phylogenetic relationship and a full-fledged cure for bird flu. We hope that these
findings will go a long way and prove fruitful to anyone working in a similar area.
BIBLIOGRAPHY AND REFERENCES
• Gog, J. R., Rimmelzwaan, G. F., Osterhaus, A. D. M. E., Grenfell, B. T. (2003). Population dynamics of
rapid fixation in cytotoxic T lymphocyte escape mutants of influenza A. Proc. Natl. Acad. Sci. U. S. A.
100: 11143-11147
• Nakagawa, N., Nukuzuma, S., Haratome, S., Go, S., Nakagawa, T., Hayashi, K. (2002). Emergence of an
Influenza B Virus with Antigenic Change. J. Clin. Microbiol. 40: 3068-3070
• Tumpey, T. M., Suarez, D. L., Perkins, L. E. L., Senne, D. A., Lee, J.-G., Lee, Y.-J., Mo, I.-P.,
Sung, H.-W., Swayne, D. E. (2002). Characterization of a Highly Pathogenic H5N1 Avian Influenza A Virus
Isolated from Duck Meat. J. Virol. 76: 6344-6355
• Benton, K. A., Misplon, J. A., Lo, C.-Y., Brutkiewicz, R. R., Prasad, S. A., Epstein, S. L. (2001).
Heterosubtypic Immunity to Influenza A Virus in Mice Lacking IgA, All Ig, NKT Cells, or γδ
T Cells. J. Immunol. 166: 7437-7445
• Lindstrom, S. E., Hiromoto, Y., Nishimura, H., Saito, T., Nerome, R., Nerome, K. (1999). Comparative
Analysis of Evolutionary Mechanisms of the Hemagglutinin and Three Internal Protein Genes of
Influenza B Virus: Multiple Cocirculating Lineages and Frequent Reassortment of the NP, M, and NS
Genes. J. Virol. 73: 4413-4426
• Voeten, J. T. M., Bestebroer, T. M., Nieuwkoop, N. J., Fouchier, R. A. M., Osterhaus, A. D. M. E.,
Rimmelzwaan, G. F. (2000). Antigenic Drift in the Influenza A Virus (H3N2) Nucleoprotein and Escape
from Recognition by Cytotoxic T Lymphocytes. J. Virol. 74: 6800-6807
• Cooper, L. A., Subbarao, K. (2000). A Simple Restriction Fragment Length Polymorphism-Based
Strategy That Can Distinguish the Internal Genes of Human H1N1, H3N2, and H5N1 Influenza A
Viruses. J. Clin. Microbiol. 38: 2579-2583
• Karasin, A. I., Olsen, C. W., Anderson, G. A. (2000). Genetic Characterization of an H1N2 Influenza
Virus Isolated from a Pig in Indiana. J. Clin. Microbiol. 38: 2453-2456
• Naffakh, N., Massin, P., Escriou, N., Crescenzo-Chaigne, B., van der Werf, S. (2000). Genetic analysis
of the compatibility between polymerase proteins from human and avian strains of influenza A viruses.
J. Gen. Virol. 81: 1283-1291
• Hiromoto, Y., Yamazaki, Y., Fukushima, T., Saito, T., Lindstrom, S. E., Omoe, K., Nerome, R., Lim,
W., Sugita, S., Nerome, K. (2000). Evolutionary characterization of the six internal genes of H5N1
human influenza A virus. J. Gen. Virol. 81: 1293-1303
• Hiromoto, Y., Saito, T., Lindstrom, S. E., Li, Y., Nerome, R., Sugita, S., Shinjoh, M., Nerome, K.
(2000). Phylogenetic analysis of the three polymerase genes (PB1, PB2 and PA) of influenza B virus.
J. Gen. Virol. 81: 929-937
• Zhou, N. N., Senne, D. A., Landgraf, J. S., Swenson, S. L., Erickson, G., Rossow, K., Liu, L.,
Yoon, K.-J., Krauss, S., Webster, R. G. (1999). Genetic Reassortment of Avian, Swine, and Human Influenza A
Viruses in American Pigs. J. Virol. 73: 8851-8856
• Alexander DJ, Brown IH. Recent zoonoses caused by influenza A viruses. Rev Sci Tech 2000; 19:197-225.
• Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer, Jinghui Zhang, Zheng Zhang, Webb
Miller, and David J. Lipman. Gapped BLAST and PSI-BLAST: a new generation of protein database
search programs. Nucleic Acids Res. 25:3389-3402 (1997)
• Naffakh N, Massin P, Escriou N, Crescenzo-Chaigne B, van der Werf S. Genetic analysis of the
compatibility between polymerase proteins from human and avian strains of influenza A viruses.
(http://jgv.sgmjournals.org/cgi/content/abstract/81/5/1283)
• Holmes EC, Ghedin E, Miller N, Taylor J, Bao Y, St. George K, Grenfell BT, Salzberg SL, Fraser CM,
Lipman DJ, Taubenberger JK. Whole-Genome Analysis of Human Influenza A Virus Reveals Multiple
Persistent Lineages and Reassortment among Recent H3N2 Viruses.
• Daum LT, Shaw MW, Klimov AI, Canas LC, Macias EA, Niemeyer D, Chambers JP, Renthal R,
Shrestha SK, Acharya RP, Huzdar SP, Rimal N, Myint KS, Gould P. Influenza A (H3N2) Outbreak, Nepal.
(http://www.cdc.gov/ncidod/eid/vol11no08/05-0302.htm)
• Felsenstein J. (1989). PHYLIP: Phylogeny inference package (version 3.2). Cladistics 5: 164-166.
• Higgins DG and Sharp PM. (1988). CLUSTAL: A package for performing multiple sequence alignment
on a microcomputer. Gene 73: 237-244.
• Higgins DG, Thompson JD, and Gibson TJ. (1996). Using CLUSTAL for multiple sequence alignment.
Methods Enzymol. 266: 383-402.
• Mount DW. (2001). Bioinformatics: Sequence and genome analysis. Cold Spring Harbor Laboratory
Press, 564 pp.
• Saitou N and Nei M. (1987). The neighbor-joining method: A new method for reconstructing
phylogenetic trees. Mol. Biol. Evol. 4: 406-425.
• Hinshaw VS, Webster RG. The natural history of influenza A viruses. In: Beare AS, editor. Basic and
applied influenza research. Boca Raton (FL): CRC Press; 1982. p. 79-104.
• Scholtissek C, Naylor E. Fish farming and influenza pandemics. Nature 1988;331:215.
• Bean WJ, Kawaoka Y, Wood JM, Pearson JE, Webster RG. Characterization of virulent and avirulent
• Fouchier RAM, Munster V, Wallensten A, et al, 2005. Characterization of a novel influenza A virus
hemagglutinin subtype (H16) obtained from black-headed gulls. J Virol vol 79, issue 5, pp2814-22.
• Gambaryan A, Tuzikov A, Pazynina G, Bovin N, Balish A, Klimov A, 2005. Evolution of the receptor
binding phenotype of influenza A (H5) viruses in Virology (electronic publication ahead of print
version).
• Hatta M, Gao P, Halfmann P, Kawaoka Y, 2001. Molecular Basis for High Virulence of Hong Kong
H5N1 Influenza A Viruses in Science vol 293, pp1840-1842.
• Nelson DL and Cox MM, 2005. Lehninger Principles of Biochemistry, 4th edition, WH Freeman, New
York, NY.
• Suzuki, Y, 2005. Sialobiology of Influenza: Molecular Mechanism of Host Range Variation of Influenza
Viruses in Biological and Pharmaceutical Bulletin, vol 28, pp399-408.
• Senne DA, Panigrahy B, Kawaoka Y, Pearson JE, Suss J, Lipkind M, Kida H, Webster RG, 1996. Survey
of the hemagglutinin (HA) cleavage site sequence of H5 and H7 avian influenza viruses: amino acid
sequence at the HA cleavage site as a marker of pathogenicity potential in Avian Diseases vol 40,
pp425-437.
• Weis WI, Brünger AT, Skehel JJ, et al, 1990. Refinement of the influenza virus hemagglutinin by
simulated annealing. J Mol Biol vol 212, pp737-761.
• White JM, Hoffman LR, Arevalo JH, et al, 1997. Attachment and entry of influenza virus into host cells.
Pivotal roles of hemagglutinin. In Structural Biology of Viruses. Chiu W, Burnett RM, and Garcea RL,
editors. Oxford University Press, NY. pp80-104.
Website
1. http://www.ncbi.nlm.nih.gov/genomes/VIRUSES/11308.html
2. http://www.ncbi.nlm.nih.gov/genomes/FLU/FLU.html
3. http://www.cdc.gov/ncidod/eid/vol4no3/webster.htm
4. http://www.influenzacentre.org/fluinfo.htm
5. http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=genome&cmd=search&term=influenza+A+virus
6. http://www.ncbi.nih.gov/genomes/VIRUSES
7. http://www.nhsdirect.nhs.uk
8. http://www.influenzareport.com/ir/ai.htm
9. http://www.agnr.umd.edu/avianflu/
10. http://www.cdc.gov/flu/about/fluviruses.htm
11. http://www.cdc.gov/flu/avian/gen-info/flu-viruses.htm
12. http://bioinformatics.ubc.ca/resources/tools/?name=clustalx
13. http://bips.u-strasbg.fr/fr/Documentation/ClustalX/
14. http://pbil.univ-lyon1.fr/software/njplot.html
15. http://www.cdc.gov/ncidod/eid/vol4no3/webster.htm#ref6
16. http://en.wikipedia.org/wiki
17. http://www.who.int/csr/don/2004_01_15/en/
18. http://www.mayoclinic.com/health/bird-flu/DS00566
19. http://www.pandemicflu.state.pa.us/pandemicflu/cwp/view.asp?a=501&q=151742
20. http://micro.magnet.fsu.edu/cells/viruses/influenzavirus.html
21. http://www.cdc.gov/flu/about/fluviruses.htm
22. http://en.wikipedia.org/wiki/h5n1_genetic_structure
23. http://www.cdc.gov/flu/avian/gen-info/flu-viruses.htm
24. http://www3.niaid.nih.gov/news/focuson/flu/illustrations/antigenic/antigenicdrift.htm
25. http://www3.niaid.nih.gov/news/focuson/flu/illustrations/antigenic/antigenicshift.htm
26. http://www.cdc.gov/flu/avian/gen-info/flu-viruses.htm
27. http://en.wikipedia.org/wiki
28. http://pathmicro.med.sc.edu/mhunt/flu.htm
29. http://en.wikipedia.org/wiki/H5N1#Genetic_structure_and_related_subtypes
30. http://www.csd.abdn.ac.uk/hex/
31. http://www.ebi.ac.uk/thornton-srv/databases/pdbsum
32. http://www.ebi.ac.uk/thornton-srv/databases/CSA
33. http://en.wikipedia.org/wiki/Neuraminidase
34. en.wikipedia.org/wiki/Neuraminidase_inhibitor
35. www.qdots.com/live/render/content.asp
Books
1) “BIOINFORMATICS AND FUNCTIONAL GENOMICS”
Author: Jonathan Pevsner
2) “SEQUENCE AND GENOME ANALYSIS”
Author: David W. Mount
3) “BIOINFORMATICS: METHODS AND APPLICATIONS: GENOMICS, PROTEOMICS”
Authors: S. C. Rastogi, Namita Mendiratta, Parag Rastogi
ABBREVIATION
APPENDIX
PDBsum:- A database of the known 3D structures of proteins and nucleic acids. PDBsum is a pictorial
database providing an at-a-glance overview of every macromolecular structure deposited in the Protein
Data Bank (PDB). It provides schematic diagrams of the molecules in each structure and of the
interactions between them. Entries are accessed by their PDB code.
(http://www.ebi.ac.uk/thornton-srv/databases/pdbsum/)
Jena Library:- The Jena Library of Biological Macromolecules (JenaLib) is aimed at a better
dissemination of information on three-dimensional biopolymer structures with an emphasis on
visualization and analysis. It provides access to all structure entries deposited at the Protein Data
Bank (PDB) or at the Nucleic Acid Database (NDB). (http://www.fli-leibniz.de/IMAGE.html)
CSA (Catalytic Site Atlas):- The Catalytic Site Atlas (CSA) is a database documenting enzyme active
sites and catalytic residues in enzymes of known 3D structure. It provides catalytic residue
annotation for enzymes in the Protein Data Bank.
The CSA contains 2 types of entry:
1. Original hand-annotated entries, derived from the primary literature. References for these
entries are given.
2. Homologous entries, found by PSI-BLAST alignment (using an e-value cut-off of 0.00005) to
one of the original entries. The equivalent residues, which align in sequence to the catalytic
residues found in the original entry, are documented.
CSA Version 2.1.7 (http://www.ebi.ac.uk/thornton-srv/databases/CSA)
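The idea behind the homologous entries, carrying catalytic residue annotation across a sequence alignment, can be illustrated with a short sketch. This is not the CSA's own code: the function name, the gapped-string input format and the 1-based numbering are assumptions made for illustration only.

```python
def map_catalytic_residues(ref_aln, hom_aln, catalytic_positions):
    """Map 1-based catalytic positions of a reference onto a homolog.

    ref_aln and hom_aln are the two rows of one pairwise alignment
    (equal length, '-' marks a gap). Returns a dict
    {reference_position: (homolog_position, homolog_residue)} for every
    catalytic position that aligns to a residue rather than to a gap.
    """
    if len(ref_aln) != len(hom_aln):
        raise ValueError("aligned sequences must have equal length")
    wanted = set(catalytic_positions)
    mapping = {}
    ref_pos = hom_pos = 0
    for ref_char, hom_char in zip(ref_aln, hom_aln):
        # Advance each residue counter only when that row has a residue.
        if ref_char != "-":
            ref_pos += 1
        if hom_char != "-":
            hom_pos += 1
        if ref_char != "-" and hom_char != "-" and ref_pos in wanted:
            mapping[ref_pos] = (hom_pos, hom_char)
    return mapping
```

A catalytic position that falls opposite a gap in the homolog is simply left out of the result, since there is no equivalent residue to report.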
Swiss-Model:- Swiss-Model is an automated homology modelling server developed within the Swiss
Institute of Bioinformatics in collaboration between Glaxo and SBG. It makes it easy to submit a target
sequence and get back an automatically generated homology model, provided an empirical structure
with >30% sequence identity exists to use as a template. These automated models may be useful, but
will sometimes have errors that could be avoided if an expert manually adjusts the sequence alignment.
SwissPDB Viewer: Swiss-PdbViewer can load and display several molecules simultaneously. Each
molecule is loaded into its own layer. Each molecule is composed of groups (e.g. amino acids,
nucleotides, substrates...). Each group is composed of atoms, whose coordinates are taken directly from
a PDB file.
Swiss-PdbViewer is a free program to display, analyse and manipulate PDB protein structures. Next
to features such as protein superimposition, H-bond detection, amino acid mutation etc., the program is
tightly linked to Swiss-Model, an automated homology modelling server running at the Geneva
Biomedical Research Center. This allows for threading a protein primary sequence to a 3D template
and analysing homology. The displaying options of the program include spacefill, ball & stick, stick
and ribbon representations, all of which can be applied simultaneously within one structure model.
SwissPDB Viewer Version 3.7 http://www.expasy.ch/spdbv/text/main.htm
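The molecule, group and atom hierarchy described for Swiss-PdbViewer can be made concrete with a small sketch that reads atom coordinates from PDB-format text and collects them by group. It is not how Swiss-PdbViewer itself is implemented. The function name and the returned data layout are assumptions. The column positions follow the fixed-width ATOM/HETATM record layout of the PDB format.

```python
def parse_pdb_atoms(lines):
    """Group ATOM/HETATM records by (chain, residue number, residue name).

    Returns a dict mapping each group key to a list of
    (atom_name, x, y, z) tuples, in file order.
    """
    groups = {}
    for line in lines:
        if not line.startswith(("ATOM", "HETATM")):
            continue  # skip headers, TER, END and other record types
        if len(line) < 54:
            continue  # too short to hold the coordinate columns
        atom_name = line[12:16].strip()   # columns 13-16
        res_name = line[17:20].strip()    # columns 18-20
        chain_id = line[21]               # column 22
        res_seq = int(line[22:26])        # columns 23-26
        x = float(line[30:38])            # columns 31-38
        y = float(line[38:46])            # columns 39-46
        z = float(line[46:54])            # columns 47-54
        key = (chain_id, res_seq, res_name)
        groups.setdefault(key, []).append((atom_name, x, y, z))
    return groups
```

Calling `parse_pdb_atoms(open(path))` on one structure file (where `path` is a placeholder for a local PDB file) gives the contents of a single layer in the sense used above, with one dictionary entry per group.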
Hex: - Hex is an interactive molecular graphics program for calculating and displaying feasible docking
modes of pairs of protein and DNA molecules. Hex can also calculate small-ligand/protein docking
(provided the ligand is rigid), and it can superpose pairs of molecules using only knowledge of their 3D
shapes.
In Hex's docking calculations, each molecule is modelled using 3D parametric functions which are used
to encode both surface shape and electrostatic charge and potential distributions.
Hex Version 4.5 (http://www.csd.abdn.ac.uk/hex/)
PHYLIP: PHYLIP (the PHYLogeny Inference Package) is a package of programs for inferring phylogenies
(evolutionary trees). Methods that are available in the package include parsimony, distance matrix, and
likelihood methods, including bootstrapping and consensus trees. Data types that can be handled
include molecular sequences, gene frequencies, restriction sites and fragments, distance matrices, and
discrete characters.
Some sequence analysis programs such as the ClustalW alignment program can write data files in the
PHYLIP format. Most of the programs look for the data in a file called "infile" -- if they do not find this
file they then ask the user to type in the file name of the data file.
Output is written onto special files with names like "outfile" and "outtree". Trees written onto "outtree"
are in the Newick format, an informal standard agreed to in 1986 by authors of a number of major
phylogeny packages.
http://evolution.genetics.washington.edu/phylip
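The two file conventions mentioned for PHYLIP, an input file in PHYLIP format and a Newick tree in "outtree", can be sketched as follows. This is a minimal illustration, not part of PHYLIP. The function names and strain labels are hypothetical, the writer uses the simple sequential layout with names padded to 10 characters, and the Newick reader handles only unquoted labels.

```python
def to_phylip(alignment):
    """Format {name: aligned_sequence} as sequential PHYLIP text.

    The header line holds the number of sequences and the alignment
    length. Each name is truncated or padded to 10 characters.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same aligned length")
    rows = [" %d %d" % (len(alignment), lengths.pop())]
    for name, seq in alignment.items():
        rows.append(name[:10].ljust(10) + seq)
    return "\n".join(rows) + "\n"


def newick_leaves(newick):
    """Return the leaf names of a simple Newick tree, left to right."""
    leaves = []
    label = ""
    is_leaf = True       # False while reading a label that follows ')'
    in_length = False    # True while skipping a branch length after ':'
    for ch in newick.strip():
        if ch in "(),;":
            if label and is_leaf:
                leaves.append(label)
            label = ""
            in_length = False
            is_leaf = ch != ")"
        elif ch == ":":
            in_length = True
        elif not in_length and not ch.isspace():
            label += ch
    return leaves


# Hypothetical strain labels, used only to show the two formats.
infile_text = to_phylip({"H5N1_a": "ATGC", "H3N2_b": "ATGA"})
leaf_names = newick_leaves("((H5N1_a:0.1,H3N2_b:0.2):0.05,H1N1_c:0.3);")
```

Writing `infile_text` to a file named "infile" produces input of the kind the PHYLIP programs look for by default, and `newick_leaves` can be applied to the text of an "outtree" file to list the taxa in a tree.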
getorf: This EMBOSS program finds and outputs the sequences of open reading frames (ORFs).
The ORFs can be defined as regions of a specified minimum size between STOP codons or between
START and STOP codons. The ORFs can be output as the nucleotide sequence or as the translation.
The program can also output the region around the START or the initial STOP codon or the ending
STOP codons of an ORF for those doing analysis of the properties of these regions.
The START and STOP codons are defined in the Genetic Code tables. A suitable Genetic Code table
can be selected for the organism you are investigating.
(http://www.3rog.org/general/software/packages/emboss/getorf.html)
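The two ORF definitions above (STOP to STOP, or START to STOP, with a minimum size) can be sketched in a few lines. This is not the EMBOSS implementation. The function name and return format are assumptions, only the forward strand is scanned, and the standard genetic code is hard-coded rather than read from a Genetic Code table.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code
START_CODON = "ATG"


def find_orfs(seq, min_size=30, require_start=False):
    """Return (start, end, frame) for ORFs on the forward strand.

    start and end are 0-based, end-exclusive nucleotide offsets, and the
    region excludes the terminating STOP codon. With require_start=False
    an ORF is a region between STOP codons (or a sequence end). With
    require_start=True it runs from the first in-frame START codon to
    the next STOP.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        # End of the last complete codon in this frame.
        last = frame + ((len(seq) - frame) // 3) * 3
        boundaries = [p for p in range(frame, last, 3)
                      if seq[p:p + 3] in STOP_CODONS]
        boundaries.append(last)
        region_start = frame
        for stop in boundaries:
            start = region_start
            if require_start:
                # Advance codon by codon to the first START, if any.
                while start < stop and seq[start:start + 3] != START_CODON:
                    start += 3
            if stop > start and stop - start >= min_size:
                orfs.append((start, stop, frame))
            region_start = stop + 3
    return orfs
```

The nucleotide sequence of each ORF is `seq[start:end]`. Handling the reverse strand would mean running the same scan on the reverse complement.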
Clustal W: ClustalW is a general purpose multiple sequence alignment program for DNA or proteins. It
produces biologically meaningful multiple sequence alignments of divergent sequences. It calculates
the best match for the selected sequences, and lines them up so that the identities, similarities and
differences can be seen. Evolutionary relationships can be seen by viewing cladograms or phylograms.
(http://www.ebi.ac.uk/clustalw/)
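Once sequences are lined up, identities and differences can be read column by column. The sketch below shows that step on an alignment that already exists. It does not perform the alignment itself and is not ClustalW code. The function names are assumptions, and only fully identical columns are marked (with '*'), not the weaker similarity classes.

```python
def conservation_line(aligned):
    """Return a marker string with '*' under fully conserved columns.

    aligned is a list of equal-length aligned sequences ('-' for gaps).
    A column containing a gap is never marked as conserved.
    """
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("need at least one sequence, all of equal length")
    marks = []
    for column in zip(*aligned):
        conserved = "-" not in column and len(set(column)) == 1
        marks.append("*" if conserved else " ")
    return "".join(marks)


def percent_identity(a, b):
    """Percent identity over columns where neither sequence has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)
```

Pairwise identities computed this way are one simple input from which a distance matrix, and then a tree, could be derived.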
Genscan: This server provides access to the program Genscan for predicting the locations and exon-intron
structures of genes in genomic sequences from a variety of organisms. This server can accept sequences
up to 1 million base pairs (1 Mbp) in length.
http://genes.mit.edu/GENSCAN.html
bioinformatics.ubc.ca/resources/tools/index.php?name=genscan