Imatinib Drug Resistance


CHAPTER - 1

INTRODUCTION

Chronic Myelogenous Leukaemia is a myeloproliferative disorder. It is characterized by a biphasic or triphasic clinical course in which a benign chronic phase is followed by transformation into an accelerated phase, a blast phase, or both.

On a cytogenetic and molecular level, most patients with Chronic Myelogenous Leukaemia demonstrate BCR-ABL fusion genes in hematopoietic progenitor cells, which result from a reciprocal translocation between chromosomes 9 and 22; this translocation leads to a shortened chromosome 22, called the Philadelphia chromosome. Translation of the fusion products yields chimeric proteins of variable size that have increased tyrosine kinase activity.

Conventional chemotherapy with hydroxyurea or busulfan can achieve hematologic control but cannot modify the natural course of the disease, which inevitably terminates in a rapidly fatal blastic phase. Since its introduction in the 1980s, allogeneic stem-cell transplantation has provided the groundwork for a cure of Chronic Myelogenous Leukaemia. However, few patients are eligible for this treatment because of donor availability and age restrictions. Therapy with interferon-α, alone or in combination with cytarabine, suppresses the leukemic clone, produces cytogenetic remissions, and prolongs survival; it is an effective alternative first-line treatment for patients ineligible for transplantation. New drugs active against CML may show increased activity in the transformed phases of the disease.

Novel therapies and concepts are developing rapidly; molecular targets include tyrosine kinases, Ras, and messenger RNA (targeted through antisense oligonucleotides). Alternative transplantation options, such as stem cells from autologous sources and matched unrelated donors, are expanding. Immunomodulation by adoptive immunotherapy and vaccine strategies holds significant promise for the cure of Chronic Myelogenous Leukaemia.

The development of the Bcr-Abl-targeted drug Imatinib represents a paradigm shift in the treatment of CML, because treatment with Imatinib resulted in significantly better patient outcomes, response rates, and overall survival compared with previous standards. Despite this advance, not all patients benefit from Imatinib because of resistance and intolerance. Resistance to Imatinib can arise through a number of mechanisms, which can be classified as Bcr-Abl-dependent (most commonly point mutations in the Abl kinase domain) or Bcr-Abl-independent (including constitutive activation of downstream signalling molecules, e.g., Src family kinases, which can keep the pathway active regardless of Bcr-Abl inhibition). Although the Imatinib dose can be escalated in patients who demonstrate resistance, new treatment approaches are clearly required for patients who are resistant to or intolerant of Imatinib.

In this study, we used an in silico approach to study the effect of point mutations on the observed drug resistance. First, data on the point mutations were collected from a client, who derived them by PCR amplification and sequencing of the nucleotide sequence from some patients. These data were then used to build mutant protein models by homology modelling. Finally, the mutant protein molecules were docked against the drug Imatinib, and the results were compared with those obtained for the wild-type protein.
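The first step of this workflow, turning a list of point mutations into mutant sequences for homology modelling, can be sketched in Python. This is a minimal illustration under stated assumptions, not the pipeline used in this study: it assumes mutations are written in the common one-letter notation (wild-type residue, 1-based position, mutant residue, e.g. T315I) and that positions are numbered on the same reference sequence being mutated. The function name `apply_point_mutation` is hypothetical.

```python
import re

# One-letter codes of the 20 standard amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Assumed notation: wild-type residue, 1-based position, mutant residue (e.g. "T315I").
MUTATION_PATTERN = re.compile(rf"^([{AMINO_ACIDS}])(\d+)([{AMINO_ACIDS}])$")


def apply_point_mutation(sequence: str, mutation: str) -> str:
    """Return a copy of `sequence` carrying a single substitution such as 'T315I'."""
    match = MUTATION_PATTERN.match(mutation)
    if match is None:
        raise ValueError(f"unrecognised mutation notation: {mutation!r}")
    wild_type, position, mutant = match.group(1), int(match.group(2)), match.group(3)
    if not 1 <= position <= len(sequence):
        raise ValueError(f"position {position} is outside the sequence")
    # Check the wild-type residue so that a numbering mismatch is caught early.
    if sequence[position - 1] != wild_type:
        raise ValueError(
            f"expected {wild_type} at position {position}, found {sequence[position - 1]}"
        )
    return sequence[: position - 1] + mutant + sequence[position:]


# Toy example on a short made-up peptide (not the ABL sequence).
print(apply_point_mutation("MSTKE", "T3I"))  # MSIKE
```

The wild-type check is a deliberate design choice: ABL residues are commonly numbered according to isoform 1a, and a sequence numbered on a different isoform would otherwise be mutated at the wrong residue without any warning.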

Figure 1. Peripheral blood (left) and bone marrow (right) smears of a CML patient in chronic phase, showing leukocytosis in the peripheral blood and hypercellularity in the bone marrow, due mainly to neutrophils at different stages of maturation. In CML bone marrow, typical megakaryocytes are smaller than normal and have hypolobulated nuclei.

CHAPTER - 2

REVIEW OF LITERATURE

2.1 CHRONIC MYELOID LEUKAEMIA- THE DISEASE:


Chronic Myelogenous (or myeloid) Leukaemia (CML), also known as chronic granulocytic leukaemia (CGL), is a form of leukaemia characterized by the increased and unregulated growth of predominantly myeloid cells in the bone marrow and the accumulation of these cells in the blood. CML is a clonal bone marrow stem cell disorder in which proliferation of mature granulocytes (neutrophils, eosinophils, and basophils) and their precursors is the main finding. It is a type of myeloproliferative disease associated with a characteristic chromosomal translocation called the Philadelphia chromosome. Historically, it has been treated with chemotherapy, interferon and bone marrow transplantation, although targeted therapies introduced at the beginning of the 21st century have radically changed the management of CML.

Normally, the bone marrow makes blood stem cells (immature cells) that develop into
mature blood cells over time. A blood stem cell may become a myeloid stem cell or a
lymphoid stem cell. The lymphoid stem cell develops into a white blood cell. The myeloid
stem cell develops into one of three types of mature blood cells:

• Red blood cells that carry oxygen and other materials to all tissues of the body.
• Platelets that help prevent bleeding by causing blood clots to form.
• Granulocytes (white blood cells) that fight infection and disease.

Figure 2. Normal blood stem cell development.

In CML, too many blood stem cells develop into a type of white blood cell called
granulocytes. These granulocytes are abnormal and do not become healthy white blood cells.
They may also be called leukemic cells. The leukemic cells can build up in the blood and
bone marrow so there is less room for healthy white blood cells, red blood cells, and platelets.
When this happens, infection, anaemia, or easy bleeding may occur.

Most people with CML have a gene mutation (change) called the Philadelphia chromosome. Every cell in the body contains DNA (genetic material) that determines how the cell looks and acts. DNA is contained inside chromosomes. In CML, part of the DNA from one chromosome moves to another chromosome. This change is called the “Philadelphia chromosome.” It results in the bone marrow making an enzyme, called tyrosine kinase, that causes too many stem cells to develop into white blood cells (granulocytes or blasts). The Philadelphia chromosome is not passed from parent to child.

Figure 3. Formation of the Philadelphia chromosome.

Figure 4 (a and b). Structure of the c-Bcr, c-Abl and Bcr-Abl proteins. c-Bcr comprises an oligomerization domain, a domain thought to mediate binding to SH2-domain-containing proteins, a serine/threonine kinase domain, a region with homology to Rho guanine-nucleotide-exchange factor (Rho-GEF), a region thought to facilitate calcium-dependent lipid binding (CaLB) and a region showing homology to Rac GTPase-activating protein (Rac-GAP). The main phosphorylation site of Bcr (Tyr 177) is indicated. c-Abl comprises SH3 and SH2 domains, an SH1 tyrosine kinase domain, several proline-rich domains (P), a nuclear localization signal (NLS), several DNA-binding domains (DNA BD) and an actin-binding domain. The Bcr-Abl fusion protein comprises the first four domains of c-Bcr and all the c-Abl domains except the N-terminal region encoded by ABL exon 1.

Figure 5. Mechanisms responsible for Bcr-Abl-induced malignant transformation in Ph+ cells.

As a consequence of the t(9;22) translocation, the regulatory regions at the NH2-terminus of c-Abl are lost and replaced by the oligomerization domain of c-Bcr. This induces constitutive dimerization and autophosphorylation of Bcr-Abl, whose uncontrolled activity is responsible for alterations in the physiological processes regulated by c-Abl: proliferation, apoptosis and adherence to marrow stroma.

Common symptoms of CML include:

• Splenomegaly
• Susceptibility to infections
• Anaemia
• Thrombocytopenia
• Enlargement of the liver (hepatomegaly)

Diagnosis of CML:

• Physical exam and history: An exam of the body to check general signs of health, including checking for signs of disease such as an enlarged spleen. A history of the patient’s health habits and past illnesses and treatments will also be taken.
• Complete blood count (CBC): A procedure in which a sample of blood is drawn and checked for the following:
   - The number of red blood cells, white blood cells, and platelets.
   - The amount of haemoglobin (the protein that carries oxygen) in the red blood cells.
   - The portion of the sample made up of red blood cells (the haematocrit).
• Blood chemistry studies: A procedure in which a blood sample is checked to measure the amounts of certain substances released into the blood by organs and tissues in the body. An unusual (higher or lower than normal) amount of a substance can be a sign of disease in the organ or tissue that makes it.
• Cytogenetic analysis: A test in which cells in a sample of blood or bone marrow are viewed under a microscope to look for certain changes in the chromosomes, such as the Philadelphia chromosome.
• Bone marrow aspiration and biopsy: The removal of bone marrow, blood, and a small piece of bone by inserting a needle into the hipbone or breastbone. A pathologist views the bone marrow, blood, and bone under a microscope to look for abnormal cells.

A small proportion of patients have a clinical picture consistent with CML, but no Ph chromosome can be observed cytogenetically. In these cases the chromosomal aberrations are submicroscopic, and in conventional cytogenetic studies the cases appear to be Ph chromosome negative. These are also called cryptic translocations or masked Ph chromosomes. However, even though no abnormality may be observed cytogenetically, the pathogenic BCR-ABL fusion gene characteristic of CML is detectable at the molecular level. This condition is called Ph-negative, BCR-ABL-positive CML. Ph-negative, BCR-ABL-positive cases do not otherwise differ from standard Ph-positive patients, except that the fusion gene is most often formed not by translocation but by insertion of 3′ ABL sequences into chromosome 22 or of 5′ BCR sequences into chromosome 9.

The “true” Ph-negative cases that also lack the BCR-ABL molecular rearrangement are regarded as separate entities, such as chronic neutrophilic leukaemia or atypical CML. According to the WHO classification, these disorders are classified either as other chronic myeloproliferative diseases or as myelodysplastic/myeloproliferative diseases. They are usually unresponsive to tyrosine kinase inhibitors and have a poor prognosis. Because of this unresponsiveness, the name CML (even with the prefix “atypical”) is somewhat misleading.

CML is often divided into three phases based on clinical characteristics and laboratory findings. In the absence of intervention, CML typically begins in the chronic phase, and over the course of several years progresses to an accelerated phase and ultimately to a blast crisis. Blast crisis is the terminal phase of CML and clinically behaves like an acute leukemia. One of the drivers of the progression from chronic phase through acceleration and blast crisis is the acquisition of new chromosomal abnormalities (in addition to the Philadelphia chromosome). Some patients may already be in the accelerated phase or blast crisis by the time they are diagnosed.

Chronic phase

Approximately 85% of patients with CML are in the chronic phase at the time of diagnosis.
During this phase, patients are usually asymptomatic or have only mild symptoms of fatigue
or abdominal fullness. The duration of chronic phase is variable and depends on how early
the disease was diagnosed as well as the therapies used. Ultimately, in the absence of curative
treatment, the disease progresses to an accelerated phase.

Accelerated phase

Criteria for diagnosing transition into the accelerated phase are somewhat variable; the most
widely used criteria are those put forward by investigators at M.D. Anderson Cancer Centre,
by Sokal et al, and the World Health Organization. The WHO criteria are perhaps most
widely used, and include:

• 10–19% myeloblasts in the blood or bone marrow

• >20% basophils in the blood or bone marrow

• Platelet count <100,000/µL, unrelated to therapy

• Platelet count >1,000,000/µL, unresponsive to therapy

• Cytogenetic evolution with new abnormalities in addition to the Philadelphia chromosome

• Increasing splenomegaly or white blood cell count, unresponsive to therapy

The patient is considered to be in the accelerated phase if any of the above are present. The
accelerated phase is significant because it signals that the disease is progressing and
transformation to blast crisis is imminent.

Blast crisis

Blast crisis is the final phase in the evolution of CML, and behaves like an acute leukaemia,
with rapid progression and short survival. Blast crisis is diagnosed if any of the following are
present in a patient with CML:

• >20% myeloblasts or lymphoblasts in the blood or bone marrow

• Large clusters of blasts in the bone marrow on biopsy

• Development of a chloroma (solid focus of leukaemia outside the bone marrow)
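The phase criteria listed above amount to a simple decision rule, sketched below in Python. This is an illustrative encoding of only the thresholds quoted in this section, not a clinical tool. The record type `CMLFindings`, its field names, and the 20% blast cut-off used to close the gap between the 10–19% and >20% ranges are assumptions made for the sketch.

```python
from dataclasses import dataclass


@dataclass
class CMLFindings:
    """Hypothetical record of the findings named in the phase criteria above."""
    blast_pct: float                       # myeloblasts/lymphoblasts in blood or marrow, %
    basophil_pct: float                    # basophils in blood or marrow, %
    platelets_per_ul: int                  # platelet count per microlitre
    low_platelets_from_therapy: bool = False
    high_platelets_unresponsive: bool = False
    clonal_evolution: bool = False         # new abnormalities besides the Ph chromosome
    rising_spleen_or_wbc_unresponsive: bool = False
    blast_clusters_on_biopsy: bool = False
    chloroma: bool = False


def cml_phase(f: CMLFindings) -> str:
    # Blast crisis criteria are checked first because they define the later phase.
    # The text lists 10-19% (accelerated) and >20% (blast crisis); 20% is assumed as the cut-off.
    if f.blast_pct >= 20 or f.blast_clusters_on_biopsy or f.chloroma:
        return "blast crisis"
    # Any single accelerated-phase criterion is sufficient.
    accelerated = (
        f.blast_pct >= 10
        or f.basophil_pct > 20
        or (f.platelets_per_ul < 100_000 and not f.low_platelets_from_therapy)
        or (f.platelets_per_ul > 1_000_000 and f.high_platelets_unresponsive)
        or f.clonal_evolution
        or f.rising_spleen_or_wbc_unresponsive
    )
    return "accelerated" if accelerated else "chronic"


print(cml_phase(CMLFindings(blast_pct=3, basophil_pct=4, platelets_per_ul=350_000)))   # chronic
print(cml_phase(CMLFindings(blast_pct=12, basophil_pct=4, platelets_per_ul=350_000)))  # accelerated
```

Checking blast crisis before the accelerated-phase criteria mirrors the "any of the following" wording of both lists: a patient who meets a blast crisis criterion is classified in the later phase even if accelerated-phase criteria are also met.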

2.2 IMATINIB- THE DRUG:


Imatinib (Glivec®, Gleevec™, formerly STI571 or CGP57148B, also called Imatinib Mesylate) is a selective small-molecule tyrosine kinase inhibitor used in the targeted treatment of CML and Ph chromosome-positive acute lymphoblastic leukaemia (ALL). Imatinib is a 2-phenylaminopyrimidine compound that in preclinical studies reduced the formation of BCR-ABL-positive colonies by 92-98% while having no inhibitory effect on normal colonies. This observation suggested the potential utility of the compound in the treatment of BCR-ABL-positive leukaemia.

The high specificity of Imatinib for its target tyrosine kinases is achieved by its ability to bind the kinase molecule in its closed (inactive) conformation. In the closed conformation, the centrally located activation loop of the kinase is not phosphorylated, and the kinase is therefore inactive. When phosphorylated, the activation loop shifts to the open (active) conformation, which enables substrate molecules to bind to the kinase and subsequently be phosphorylated. The active conformation is very similar in all known kinases. In contrast, the inactive conformation varies greatly among protein kinases, which explains the specificity of Imatinib. Imatinib occupies the ATP-binding site of the BCR-ABL kinase domain and acts as a competitive inhibitor of BCR-ABL with respect to ATP. The side chain of the threonine residue at position 315 (T315) forms a hydrogen bond with the Imatinib molecule. In many other kinases this position is occupied by a methionine, which cannot form such a bond; this makes T315 a key element for the inhibition of BCR-ABL by Imatinib. When Imatinib occupies the ATP-binding pocket, it stabilizes the inactive form of BCR-ABL, thus preventing autophosphorylation of the kinase itself and, subsequently, phosphorylation of its substrates. This in turn inhibits the signalling cascades downstream of BCR-ABL and cell proliferation, and eventually induces apoptosis.
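Because Imatinib competes with ATP for the same site, its effect can be illustrated with the standard Michaelis–Menten rate law for a competitive inhibitor, in which the inhibitor raises the apparent Km for ATP without changing Vmax. The sketch below uses that textbook model; the parameter values are arbitrary illustrative numbers, not measured constants for BCR-ABL, and representing a resistance mutation simply as a larger Ki is an assumption made only for the example.

```python
def competitive_rate(s: float, vmax: float, km: float, i: float, ki: float) -> float:
    """Reaction rate v = Vmax*[S] / (Km*(1 + [I]/Ki) + [S]) for a competitive inhibitor.

    s  : substrate (here ATP) concentration
    i  : inhibitor (here Imatinib) concentration
    km : Michaelis constant for the substrate
    ki : dissociation constant of the enzyme-inhibitor complex
    """
    km_apparent = km * (1.0 + i / ki)  # the inhibitor raises the apparent Km only
    return vmax * s / (km_apparent + s)


# Arbitrary units: [ATP] = Km, inhibitor present at [I] = Ki.
uninhibited = competitive_rate(s=1.0, vmax=1.0, km=1.0, i=0.0, ki=1.0)  # 0.5
inhibited = competitive_rate(s=1.0, vmax=1.0, km=1.0, i=1.0, ki=1.0)    # 1/3
# Hypothetical mutant that binds the inhibitor 100-fold more weakly (Ki x 100).
mutant = competitive_rate(s=1.0, vmax=1.0, km=1.0, i=1.0, ki=100.0)     # about 0.498
print(uninhibited, inhibited, mutant)
```

In this model a weaker-binding mutant kinase runs at almost its uninhibited rate at the same drug concentration, which is one simple way to picture why mutations in the binding site can cause resistance.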

Some facts about Imatinib:

Primary Accession Number: DB00619
Secondary Accession Number: APRD01028
Name: Imatinib
Drug Type: Approved; Small Molecule
Synonyms: Imatinib Mesylate; Imatinib Methansulfonate
Brand Names: Gleevec; Glivec
Chemical IUPAC Name: 4-[(4-methylpiperazin-1-yl)methyl]-N-[4-methyl-3-[(4-pyridin-3-ylpyrimidin-2-yl)amino]phenyl]benzamide
Chemical Formula: C29H31N7O
Chemical Structure: (structure image not reproduced)
CAS Registry Number: 152459-95-5
Average Molecular Weight: 493.6027 g/mol
Monoisotopic Molecular Weight: 493.2590 Da
State: Solid
Melting Point: 226 °C (mesylate salt)
Experimental Water Solubility: Very soluble in water at pH < 5.5 (mesylate salt) (Source: PhysProp)
Predicted Water Solubility: 1.46e-02 mg/mL (calculated using ALOGPS)
Absorption: Imatinib is well absorbed, with a mean absolute bioavailability of 98%; maximum levels are reached within 2-4 hours of dosing.
Toxicity: Side effects include nausea, vomiting, diarrhea, loss of appetite, dry skin, hair loss, swelling (especially in the legs or around the eyes) and muscle cramps.
Protein Binding: Very high (95%)
Biotransformation: Primarily hepatic via CYP3A4. Other cytochrome P450 enzymes, such as CYP1A2, CYP2D6, CYP2C9, and CYP2C19, play a minor role in its metabolism. The main circulating active metabolite in humans is the N-demethylated piperazine derivative, formed predominantly by CYP3A4.
Half Life: 18 hours for Imatinib; 40 hours for its major active metabolite, the N-desmethyl derivative
Dosage Forms: Capsule (oral); Tablet (oral)
Food Interactions: Take with food to reduce the incidence of gastric irritation, and follow with a large glass of water. A lipid-rich meal will slightly reduce and delay absorption. Avoid grapefruit and grapefruit juice throughout treatment; grapefruit can significantly increase serum levels of this product.
Organisms Affected: Humans and other mammals
Phase 1 Metabolizing Enzymes: Cytochrome P450 3A4 (CYP3A4)
Targets:
• Proto-oncogene tyrosine-protein kinase ABL1
• Beta platelet-derived growth factor receptor
• Mast/stem cell growth factor receptor
• Alpha platelet-derived growth factor receptor
• Macrophage colony-stimulating factor 1 receptor
• Multidrug resistance protein 1
• High affinity nerve growth factor receptor
• ATP-binding cassette sub-family G member 2
• RET proto-oncogene
• Epithelial discoidin domain-containing receptor 1
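The two molecular weights listed for Imatinib follow from its chemical formula, C29H31N7O. A small Python check reproduces both values, assuming standard average atomic weights (C 12.0107, H 1.00794, N 14.0067, O 15.9994) for the average weight and the masses of the most abundant isotopes for the monoisotopic weight. The helper names are hypothetical, and the simple parser handles only formulas without brackets, charges or isotope labels.

```python
import re

# Standard average atomic weights and monoisotopic masses (most abundant isotope).
AVERAGE_MASS = {"C": 12.0107, "H": 1.00794, "N": 14.0067, "O": 15.9994}
MONOISOTOPIC_MASS = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146}


def parse_formula(formula: str) -> dict[str, int]:
    """Count atoms in a simple formula such as 'C29H31N7O' (no brackets or charges)."""
    counts: dict[str, int] = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
    return counts


def formula_mass(formula: str, masses: dict[str, float]) -> float:
    """Sum the per-element masses weighted by the atom counts."""
    return sum(masses[element] * n for element, n in parse_formula(formula).items())


print(f"{formula_mass('C29H31N7O', AVERAGE_MASS):.4f}")       # 493.6027
print(f"{formula_mass('C29H31N7O', MONOISOTOPIC_MASS):.4f}")  # 493.2590
```

The average weight is what matters for weighing out the drug, while the monoisotopic weight is the value observed for the main peak in high-resolution mass spectrometry.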

2.3 BIOLOGICAL DATABASES:

2.3.1 NATIONAL CENTER FOR BIOTECHNOLOGY INFORMATION:

The National Center for Biotechnology Information (NCBI) provides a comprehensive website for biologists that includes biology-related databases and tools for viewing and analyzing the data in those databases. A division of the National Library of Medicine at the National Institutes of Health, NCBI is the agency responsible for creating automated systems for storing and analyzing the rapidly growing volume of genetic and molecular data. One of the most difficult challenges faced in the field of bioinformatics is how to store, in an easily accessible manner, the overwhelming abundance of new information, including the sequences of entire genomes, the ongoing discoveries of new genes and gene products, and the determination of their functions and structures. NCBI was established as the government's response to the need for more and better information-processing methods to deal with this challenge.

On the NCBI home page, the list along the left border gives a reasonably good overview of the tools and databases that can be accessed through NCBI. Clicking on the link entitled "About NCBI" produces a second menu containing the topics "A Science Primer" and "Databases and Tools", among others. "A Science Primer" gives general definitions and introductory information on the branches of science included in bioinformatics; many bioinformatics terms are defined there in a clear and basic manner, making the Primer an excellent first resource. Selecting "Databases and Tools" from the "About NCBI" menu yields a complete and well-ordered listing of accessible information, and this page is worth bookmarking.

The first item under the "Databases and Tools" menu is "Literature Databases". PubMed is the most heavily used of the literature databases and can be used to access MEDLINE citations of biological and medical journal articles dating back to the mid-1960s. The second item under the "Databases and Tools" menu is "Entrez Databases". Entrez is a search and retrieval system developed by NCBI that accesses integrated information by searching many of the NCBI databases with a single query, instead of requiring the same query to be repeated separately in each database. The NCBI databases included in an Entrez search are listed under this link. The "Nucleotide Databases" link under the "Databases and Tools" menu lists all the sequence databases available through NCBI. These sequence databases contain annotated collections of publicly available DNA, RNA and protein sequences. The evolution of bioinformatics data mining methods has been largely driven by the prodigious amount of sequence information collected by scientists in recent years. New sequences of unknown function can be compared with sequences of well-characterized genes and proteins; similarities between the new sequences and the well-characterized ones can then be used to propose hypotheses about their function or structure.
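Entrez databases can also be queried programmatically through NCBI's E-utilities web interface. The sketch below only builds an ESearch request URL with Python's standard library; the database, search term and the `esearch_url` helper name are illustrative assumptions. Actually sending the request (for example with `urllib.request.urlopen`) requires network access and should follow NCBI's usage guidelines.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities service.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def esearch_url(db: str, term: str, retmax: int = 20) -> str:
    """Build an ESearch URL that returns matching record IDs from one Entrez database."""
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    return EUTILS_BASE + "esearch.fcgi?" + urlencode(params)


print(esearch_url("pubmed", "imatinib resistance"))
# https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=imatinib+resistance&retmax=20&retmode=json
```

Changing only the `db` parameter (for example to `protein` or `nucleotide`) runs the same query against a different Entrez database.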

Among the tools listed under the NCBI "Databases and Tools" menu are the "Tools for Data Mining". Selecting this topic shows a list of data retrieval tools, including Entrez, mentioned above, and BLAST, the Basic Local Alignment Search Tool. BLAST is the predominant sequence alignment tool for performing rapid searches of nucleotide and protein sequence databases, detecting local sequence alignments between the query sequence and the database sequences.
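The notion of a local alignment score can be illustrated with the Smith–Waterman dynamic-programming recurrence, which finds the best-scoring local alignment exactly. BLAST itself does not run this full computation; it uses fast word-based heuristics and substitution matrices such as BLOSUM62 to approximate it. The sketch below uses simple scores (match +2, mismatch −1, gap −1) that are assumptions chosen for illustration only.

```python
def smith_waterman_score(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -1) -> int:
    """Best local alignment score between a and b with a linear gap penalty."""
    # h[i][j] is the best score of a local alignment ending at a[i-1] and b[j-1].
    h = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    best = 0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            diagonal = h[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # The 0 lets an alignment restart anywhere, which is what makes it local.
            h[i][j] = max(0, diagonal, h[i - 1][j] + gap, h[i][j - 1] + gap)
            best = max(best, h[i][j])
    return best


print(smith_waterman_score("ACGT", "TTACGTTT"))  # 8: the exact ACGT match
print(smith_waterman_score("ACGT", "ACT"))       # 5: AC-T with one gap
```

Because the score never drops below zero, unrelated flanking sequence does not penalize a strong local match, which is why local alignment suits the search for conserved regions within otherwise different sequences.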

This is a brief glimpse of some of the more widely used tools and databases provided by NCBI, intended to give a feel for the number and types of bioinformatics tools available on the internet today. The pages linked from the "About NCBI" menu under the topics "A Science Primer", "Outreach and Education", and "News" provide further background.

The NCBI has had responsibility for making available the GenBank DNA sequence database
since 1992. GenBank coordinates with individual laboratories and other sequence databases
such as those of the European Molecular Biology Laboratory (EMBL) and the DNA Database
of Japan (DDBJ).

Since 1992, NCBI has grown to provide other databases in addition to GenBank. NCBI
provides Online Mendelian Inheritance in Man, the Molecular Modeling Database (3D
protein structures), dbSNP a database of Single Nucleotide Polymorphisms, the Unique
Human Gene Sequence Collection, a Gene Map of the Human genome, a Taxonomy Browser,
and coordinates with the National Cancer Institute to provide the Cancer Genome Anatomy
Project. The NCBI assigns a unique identifier (Taxonomy ID number) to each species of
organism.

2.3.2 PROTEIN DATA BANK:



The Protein Data Bank (PDB) is a repository for the 3-D structural data of large
biological molecules, such as proteins and nucleic acids.
The data, typically obtained by X-ray crystallography or NMR spectroscopy and submitted
by biologists and biochemists from around the world, can be accessed at no charge on the
internet. The PDB is overseen by an organization called the Worldwide Protein Data Bank,
wwPDB.

The PDB is a key resource in areas of structural biology, such as structural genomics. Most
major scientific journals, and some funding agencies, such as the NIH in the USA, now
require scientists to submit their structure data to the PDB. If the contents of the PDB are
thought of as primary data, then there are hundreds of derived (i.e., secondary) databases that
categorize the data differently. For example, both SCOP and CATH categorize structures
according to type of structure and assumed evolutionary relations; GO categorizes structures
based on genes.

The PDB originated as a grassroots effort. In 1971, Walter Hamilton of the Brookhaven National Laboratory agreed to set up the data bank at Brookhaven. Upon Hamilton's death in 1973, Tom Koetzle took over direction of the PDB. In January 1994, Joel Sussman was appointed head of the PDB. In October 1998, the PDB was transferred to the Research Collaboratory for Structural Bioinformatics (RCSB); the transfer was completed in June 1999. The new director was Helen M. Berman of Rutgers University (one of the member institutions of the RCSB). In 2003, with the formation of the wwPDB, the PDB became an international organization. Each of the four members of wwPDB can act as a deposition, data processing and distribution centre for PDB data. Data processing here means that wwPDB staff review and annotate each submitted entry; the data are then automatically checked for plausibility.

The PDB database is updated weekly. Likewise, the PDB Holdings List is also updated
weekly. As of 28 April 2009, the breakdown of current holdings was as follows:

Experimental method | Proteins | Nucleic acids | Protein/nucleic acid complexes | Other | Total
X-ray diffraction | 45825 | 1141 | 2110 | 17 | 49093
NMR | 6815 | 850 | 144 | 7 | 7816
Electron microscopy | 155 | 16 | 59 | 0 | 230
Other | 110 | 4 | 4 | 9 | 127
Total | 52905 | 2011 | 2317 | 33 | 57266

38,249 structures in the PDB have a structure factor file.

4,496 structures have an NMR restraint file.

These data show that most structures are determined by X-ray diffraction, but about 14% of structures are now determined by protein NMR, and a few are even determined by cryo-electron microscopy.
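The share of structures solved by each experimental method can be computed directly from the per-method totals in the holdings table. A short Python calculation, using only the numbers from that table, is:

```python
# Per-method totals from the PDB holdings table (28 April 2009).
holdings = {
    "X-ray diffraction": 49093,
    "NMR": 7816,
    "Electron microscopy": 230,
    "Other": 127,
}

total = sum(holdings.values())  # equals the table's grand total, 57266
shares = {method: 100.0 * count / total for method, count in holdings.items()}

for method, percent in shares.items():
    print(f"{method:<20} {percent:5.1f}%")
```

It reports about 85.7% for X-ray diffraction, 13.6% for NMR, 0.4% for electron microscopy and 0.2% for other methods, and the summed counts match the table's total of 57266.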

The significance of the structure factor files mentioned above is that, for PDB structures determined by X-ray diffraction that have a structure factor file, the electron density map may be viewed. The data for such structures are stored on the "electron density server", where the electron density maps can be viewed.

In the past, the number of structures in the PDB grew nearly exponentially. In 2007, 7263 structures were added; in 2008, however, only 7073 were added, suggesting that the rate of structure deposition may have started to level off.

The file format initially used by the PDB was called the PDB file format. This original format was restricted by the width of computer punch cards to 80 characters per line. Around 1996, the "macromolecular Crystallographic Information File" format, mmCIF, started to be phased in. An XML version of this format, called PDBML, was described in 2005. The structure files can be downloaded in any of these three formats. Individual files can also be loaded directly into graphics packages using web addresses such as:

• For PDB format files, use, e.g., http://www.pdb.org/pdb/files/4hhb.pdb.gz

• For PDBML (XML) files, use, e.g., http://www.pdb.org/pdb/files/4hhb.xml.gz
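The fixed 80-column layout of the PDB format means that a coordinate record can be read by slicing fixed character positions. The sketch below parses a single ATOM/HETATM line using the column ranges of the PDB format specification (atom serial number in columns 7–11, atom name 13–16, residue name 18–20, chain identifier 22, residue number 23–26, and x, y, z coordinates in 31–38, 39–46 and 47–54). It ignores the remaining fields, and the `parse_atom_record` name is hypothetical; dedicated libraries such as Biopython's PDB module handle the full format.

```python
from typing import NamedTuple


class Atom(NamedTuple):
    serial: int
    name: str
    res_name: str
    chain_id: str
    res_seq: int
    x: float
    y: float
    z: float


def parse_atom_record(line: str) -> Atom:
    """Parse an ATOM/HETATM record by its fixed columns.

    The format specification counts columns from 1; Python slices count from 0.
    """
    record = line[0:6].strip()
    if record not in ("ATOM", "HETATM"):
        raise ValueError(f"not a coordinate record: {record!r}")
    return Atom(
        serial=int(line[6:11]),        # columns 7-11
        name=line[12:16].strip(),      # columns 13-16
        res_name=line[17:20].strip(),  # columns 18-20
        chain_id=line[21],             # column 22
        res_seq=int(line[22:26]),      # columns 23-26
        x=float(line[30:38]),          # columns 31-38
        y=float(line[38:46]),          # columns 39-46
        z=float(line[46:54]),          # columns 47-54
    )


line = "ATOM      1  N   VAL A   1       6.204  16.869   4.854  1.00 49.05           N"
print(parse_atom_record(line))
```

Slicing by column rather than splitting on whitespace is the safer choice for this format, because adjacent fields (for example large coordinates or long residue numbers) can run together without a separating space.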

Here "4hhb" is the PDB identifier. Each structure published in the PDB receives a four-character alphanumeric identifier, its PDB ID. The PDB ID cannot be used as an identifier for a biomolecule, because several structures of the same molecule (in different environments or conformations) are often contained in the PDB under different PDB IDs.

The structure files may be viewed using one of several open source computer programs.
Some other free, but not open source programs include VMD, MDL Chime, Swiss-PDB
Viewer, StarBiochem (a Java-based interactive molecular viewer with integrated search of
protein databank) and Sirius. The RCSB PDB website contains an extensive list of both free
and commercial molecule visualization programs and web browser plugins.

2.3.3 DRUGBANK:

The DrugBank database is a unique bioinformatics and cheminformatics resource that combines detailed drug (i.e. chemical, pharmacological and pharmaceutical) data with comprehensive drug target (i.e. sequence, structure, and pathway) information. The database contains nearly 4800 drug entries including >1,350 FDA-approved small molecule drugs, 123 FDA-approved biotech (protein/peptide) drugs, 71 nutraceuticals and >3,243 experimental drugs. Additionally, more than 2,500 non-redundant protein (i.e. drug target) sequences are linked to these FDA approved drug entries. Each DrugCard entry contains more than 100 data fields with half of the information being devoted to drug/chemical data and the other half devoted to drug target or protein data. DrugBank is supported by David Wishart, Departments of Computing Science & Biological Sciences, University of Alberta.

Users may query DrugBank in a number of ways. The simple text query supports general text queries of the entire textual component of the database.

Clicking on the Browse button on the DrugBank navigation panel generates a tabular synopsis of DrugBank's content. This browse view allows users to casually scroll through the database or re-sort its contents.

Clicking on a given DrugCard button brings up the full data content for the corresponding drug. A complete explanation of all the DrugCard fields and their sources is available on the DrugBank website.

The PharmaBrowse button allows users to browse through drugs as grouped by their
indication. This is particularly useful for pharmacists and physicians, but also for
pharmaceutical researchers looking for potential drug leads.

The ChemQuery button allows users to draw (using MarvinSketch applet or a ChemSketch
applet) or write (SMILES string) a chemical compound and to search DrugBank for
chemicals similar or identical to the query compound.

The TextQuery button supports a more sophisticated text search (partial word matches, case
sensitive, misspellings, etc.) of the text portion of DrugBank.

The SeqSearch button allows users to conduct BLASTP (protein) sequence searches of the
18,000 sequences contained in DrugBank. Both single and multiple sequence (i.e. whole
proteome) BLAST queries are supported.

The Data Extractor button opens an easy-to-use relational query search tool that allows
users to select or search over various combinations of subfields. The Data Extractor is the
most sophisticated search tool for DrugBank.

Users may download selected text components and sequence data from DrugBank and track
the latest DrugBank statistics by clicking on the Download button.

2.4 WORKING TOOLS:

2.4.1 ORF FINDER:

The ORF Finder (Open Reading Frame Finder) is a graphical analysis tool which finds all
open reading frames of a selectable minimum size in a user's sequence or in a sequence
already in the database. This tool identifies all open reading frames using the standard or
alternative genetic codes. The deduced amino acid sequence can be saved in various formats
and searched against the sequence database using the WWW BLAST server. The ORF Finder
helps in preparing complete and accurate sequence submissions. It is also packaged with the
Sequin sequence submission software.

Link: www.ncbi.nlm.nih.gov/orf_finder.html

To use ORF Finder, enter the accession or GI number of the sequence of interest, or paste the query sequence directly into the text box in FASTA format. ORF Finder will identify all open reading frames, using either the standard genetic code or an alternative one for translation. Users can limit the search to a portion of the query sequence by specifying the positions (in base pairs) in the "From" and "To" boxes. Pressing the ORF Find button retrieves a graphic display of the ORFs and their locations in the sequence in all six reading frames. Users have the option to change the minimum ORF length to 50 or 300 nucleotides and to redraw the display. The Six Frames option features a graphic of all start and stop codons. Clicking on a particular ORF shows its amino acid sequence with all alternative start codons. After selecting an ORF of interest, clicking the Accept button gives the option to view the ORF in various formats: GenBank flat file, FASTA nucleotide, or FASTA amino acid sequence. Selecting View retrieves the full GenBank record with its annotated sequence information.
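The core idea behind ORF Finder, scanning all six reading frames for stretches that begin with a start codon and run to an in-frame stop codon, can be sketched as follows. This simplified version assumes the standard genetic code, recognises only ATG as a start codon (ORF Finder can also use alternative start codons and genetic codes) and reports the first ATG of each ORF; the function names and the default minimum length of 50 nucleotides (the smaller option mentioned above) are assumptions.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq: str, min_len: int = 50) -> list[tuple[str, int, int, int]]:
    """Return (strand, frame, start, end) for each ATG...stop ORF of at least min_len nt.

    Coordinates are 0-based and half-open on the strand that was scanned, so
    '-' strand positions refer to the reverse complement. The stop codon is included.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first in-frame ATG opens an ORF
                elif start is not None and codon in STOP_CODONS:
                    end = i + 3
                    if end - start >= min_len:
                        orfs.append((strand, frame, start, end))
                    start = None  # look for the next ATG after this stop
    return orfs


print(find_orfs("CCATGAAATTTGGGTAACC", min_len=12))  # [('+', 2, 2, 17)]
```

An ATG that is never followed by an in-frame stop codon (such as the one near the end of the reverse strand in this example) is not reported, which matches the usual definition of a complete ORF.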

For those scientists submitting sequence data, ORF Finder is also packaged with the Sequin
sequence submission software. ORF Finder can be used in conjunction with Sequin’s
Sequence Editor to annotate new coding regions on the record, perform basic editing, and
translate nucleotide sequences. The Sequin program can be downloaded from NCBI’s FTP
site accessible from the NCBI WWW home page.

2.4.2 BASIC LOCAL ALIGNMENT SEARCH TOOL:



The BLAST algorithm was developed as a new way to perform a sequence similarity
search, using an algorithm that is faster than FASTA while being considered comparably sensitive. A powerful
computer system dedicated to running BLAST has been established at NCBI, National
Library of Medicine. Access to this BLAST system is possible through the Internet
(http://www.ncbi.nlm.nih.gov/) as a Web site and through a BLAST E-mail server. There are
also numerous other Web sites that provide a BLAST database search. In addition to the
BLAST programs developed at the NCBI, an independent set of BLAST programs has been
developed at Washington University. These programs perform similarity searches using the
same methods as NCBI-BLAST and produce gapped local alignments. The statistical
methods used to evaluate sequence similarity scores are different, and thus WU-BLAST and
NCBI-BLAST can produce different results.

The BLAST Web server at http://www.ncbi.nlm.nih.gov/ is the most widely used one
for sequence database searches and is backed up by a powerful computer system so that there
is usually very little wait. Like FASTA, the BLAST algorithm increases the speed of sequence
alignment by searching first for common words or k-tuples in the query sequence and each
database sequence. Whereas FASTA searches for all possible words of the same length,
BLAST confines the search to the words that are the most significant. For proteins,
significance is determined by evaluating these word matches using log odds scores in the
BLOSUM62 amino acid substitution matrix. For the BLAST algorithm, the word length is
fixed at 3 (formerly 4) for proteins and 11 for nucleic acids (3 if the sequences are translated
in all six reading frames). This length is the minimum needed to achieve a word score that is
high enough to be significant but not so long as to miss short but significant patterns.
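The word-lookup idea can be illustrated with a short Python sketch. It lists the overlapping three-letter words of a protein query and, for one word, brute-forces every three-letter word whose BLOSUM62 score reaches a neighbourhood threshold T (the threshold step appears in the list of search steps below). The function names and the default T = 13 are illustrative assumptions. Real BLAST generates neighbourhood words far more efficiently and uses its own default thresholds.

```python
# Sketch of BLAST's neighbourhood-word step for protein queries (word length 3).
# Brute force over all 20**3 words; real BLAST builds this list far more cleverly.
from itertools import product

AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"

# BLOSUM62 scores; rows and columns follow the AMINO_ACIDS order above.
_BLOSUM62_ROWS = """
 4 -1 -2 -2  0 -1 -1  0 -2 -1 -1 -1 -1 -2 -1  1  0 -3 -2  0
-1  5  0 -2 -3  1  0 -2  0 -3 -2  2 -1 -3 -2 -1 -1 -3 -2 -3
-2  0  6  1 -3  0  0  0  1 -3 -3  0 -2 -3 -2  1  0 -4 -2 -3
-2 -2  1  6 -3  0  2 -1 -1 -3 -4 -1 -3 -3 -1  0 -1 -4 -3 -3
 0 -3 -3 -3  9 -3 -4 -3 -3 -1 -1 -3 -1 -2 -3 -1 -1 -2 -2 -1
-1  1  0  0 -3  5  2 -2  0 -3 -2  1  0 -3 -1  0 -1 -2 -1 -2
-1  0  0  2 -4  2  5 -2  0 -3 -3  1 -2 -3 -1  0 -1 -3 -2 -2
 0 -2  0 -1 -3 -2 -2  6 -2 -4 -4 -2 -3 -3 -2  0 -2 -2 -3 -3
-2  0  1 -1 -3  0  0 -2  8 -3 -3 -1 -2 -1 -2 -1 -2 -2  2 -3
-1 -3 -3 -3 -1 -3 -3 -4 -3  4  2 -3  1  0 -3 -2 -1 -3 -1  3
-1 -2 -3 -4 -1 -2 -3 -4 -3  2  4 -2  2  0 -3 -2 -1 -2 -1  1
-1  2  0 -1 -3  1  1 -2 -1 -3 -2  5 -1 -3 -1  0 -1 -3 -2 -2
-1 -1 -2 -3 -1  0 -2 -3 -2  1  2 -1  5  0 -2 -1 -1 -1 -1  1
-2 -3 -3 -3 -2 -3 -3 -3 -1  0  0 -3  0  6 -4 -2 -2  1  3 -1
-1 -2 -2 -1 -3 -1 -1 -2 -2 -3 -3 -1 -2 -4  7 -1 -1 -4 -3 -2
 1 -1  1  0 -1  0  0  0 -1 -2 -2  0 -1 -2 -1  4  1 -3 -2 -2
 0 -1  0 -1 -1 -1 -1 -2 -2 -1 -1 -1 -1 -2 -1  1  5 -2 -2  0
-3 -3 -4 -4 -2 -2 -3 -2 -2 -3 -2 -3 -1  1 -4 -3 -2 11  2 -3
-2 -2 -2 -3 -2 -1 -2 -3  2 -1 -1 -2 -1  3 -3 -2 -2  2  7 -1
 0 -3 -3 -3 -1 -2 -2 -3 -3  3  1 -2  1 -1 -2 -2  0 -3 -1  4
"""

BLOSUM62 = {
    (a, b): int(score)
    for a, row in zip(AMINO_ACIDS, _BLOSUM62_ROWS.strip().splitlines())
    for b, score in zip(AMINO_ACIDS, row.split())
}


def query_words(query: str, w: int = 3) -> list:
    """Overlapping words: positions 1-3, 2-4, ..., up to the last w residues."""
    return [query[i:i + w] for i in range(len(query) - w + 1)]


def word_score(w1: str, w2: str) -> int:
    """Ungapped BLOSUM62 score of two equal-length words."""
    return sum(BLOSUM62[(x, y)] for x, y in zip(w1, w2))


def neighbourhood(word: str, threshold: int = 13) -> list:
    """All (score, word) pairs scoring >= threshold against `word`, best first."""
    hits = []
    for letters in product(AMINO_ACIDS, repeat=len(word)):
        candidate = "".join(letters)
        score = word_score(word, candidate)
        if score >= threshold:
            hits.append((score, candidate))
    return sorted(hits, key=lambda h: (-h[0], h[1]))
```

With T = 13, the query word PQG (self-score 18) has nine neighbourhood words, including PEG (15) and PKG and PRG (14 each). Lowering T enlarges this list and makes the search more sensitive but slower.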

FASTA theoretically provides a more sensitive search of DNA sequence databases
because a shorter word length may be used. Like FASTA, the BLAST algorithm has gone
through several developmental stages. The most recent gapped BLAST, or BLAST2, is
recommended, as older versions of BLAST are reported to overestimate the significance of
database matches (Brenner et al. 1998). The most important recent change is that BLAST
reports the significance of a gapped alignment of the query and database sequences. Former
versions reported several ungapped alignments, and it was more difficult to evaluate their
overall significance.
Steps for searching a protein sequence database by a query protein sequence include the
following:

• The query sequence is optionally filtered to remove low-complexity regions, which are not useful for producing meaningful sequence alignments.
• A list of words of length 3 in the query protein sequence is made, starting with positions 1, 2 and 3; then 2, 3 and 4; and so on, until the last three positions in the sequence are reached (the word length is 11 for DNA sequences and 3 for programs that translate DNA sequences).
• Using the BLOSUM62 substitution scores, the query sequence words from the previous step are evaluated for an exact match with a word in any database sequence. The words are also evaluated for matches with every other combination of three amino acids, the object being to find the score for aligning the query word with any other three-letter word that may occur in a database sequence.
• A cut-off score called the neighbourhood word score threshold (T) is selected to reduce the number of possible matches to each query word (for example, the word PQG) to the most significant ones.
• The above procedure is repeated for each three-letter word in the query sequence.
• The remaining high-scoring words that comprise possible matches to each three-letter position in the query sequence are organized into an efficient search tree so that they can be compared rapidly with the database sequences.
• Each database sequence is scanned for an exact match to one of the high-scoring words (about 50 per position in a typical example) corresponding to the first query sequence position, then to the words for the second position, and so on. If a match is found, it is used to seed a possible ungapped alignment between the query and database sequences.
• The seed is extended in both directions until the alignment score falls a set amount below the best score reached so far; the resulting ungapped alignment is called a high-scoring segment pair (HSP).
• The next step is to determine whether each HSP score is greater than a cut-off score S. A suitable value for S is determined empirically by examining the range of scores obtained by comparing random sequences and choosing a value that is significantly greater. The HSPs matched in the entire database are identified and listed.
• BLAST next determines the statistical significance of each HSP score by calculating the probability that two random sequences, one the length of the query sequence and the other the entire length of the database (approximately the sum of the lengths of all database sequences), could achieve the HSP score. Sometimes two or more HSP regions that can be combined into a longer alignment are found, which provides additional evidence that the query and database sequences are related; in such cases, a combined assessment of significance is made.

• Smith-Waterman local alignments of the query sequence with each of the matched database sequences are shown. The score of each alignment is obtained and the expect value for that score is calculated.
• When the expect value for a given database sequence satisfies the user-selectable threshold parameter E, the match is reported.
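The significance values in the last steps are based on Karlin-Altschul statistics. In their standard form for ungapped local alignments, the expected number E of chance HSPs with score at least S, the probability P of at least one such HSP, and the normalized bit score S' are related as shown below. Here m is the query length, n is the total database length, and K and λ are parameters determined by the scoring matrix and residue frequencies (for gapped alignments they are estimated empirically). These symbols are introduced for illustration and are not part of the original description.

```latex
E = K\,m\,n\,e^{-\lambda S}
\qquad
P = 1 - e^{-E}
\qquad
S' = \frac{\lambda S - \ln K}{\ln 2}
\;\Longrightarrow\;
E = m\,n\,2^{-S'}
```

Because E grows linearly with the database length n, the same alignment score becomes less significant as the database grows.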

2.4.3 ICM MOLSOFT PRO:

ICM-Pro is an easy-to-use and complete desktop modelling environment for biologists and chemists interested in molecular structure and function.

Platforms available: Windows Vista/XP/NT/2000, Linux/i386/AMD64, SGI IRIX, Mac OS X

ICM gives a biologist or chemist fast access to the entire structural database, with high-quality
interactive 3D views. In a few seconds you can browse hundreds of structures of interest and load them;
analyse and visualize sequences, structures, alignments and sites; study pockets, bound ligands and drugs,
surfaces, electrostatics, mutations and sequence conservation; and perform docking of small molecules as
well as protein-protein docking. ICM supports multiple input formats. You can search the structural
database by field or by sequence pattern and get an interactive table for instant viewing. ICM offers a
rich graphical environment with powerful views for professional-quality images and molecular animation
videos.

The ICM ('Internal Coordinate Mechanics') software project was originally designed
around a new molecular mechanics approach and optimization algorithm for peptide structure
prediction, homology modelling, loop simulations, flexible macromolecular docking and
refinement, and was then extended to graphics, molecular animations, chemistry, sequence
analysis, database searches, mathematics, statistics and plotting. ICM-Pro contains an all-atom
internal coordinate force field and an efficient algorithm to perform local and global energy
optimization of small or large molecules with respect to an arbitrary subset of variables. In
addition, ICM contains the MMFF94 force field for energy optimization in Cartesian space for
any organic molecule. ICM-Pro allows users to read, build, convert, refine, analyze and
superimpose molecules, and it includes graphics tools for molecular rendering, perspective
viewing and depth cueing.

Molecular graphics:

It provides a full array of graphics tools, all accessible from the graphical user interface (GUI).
Molecules can be displayed as wire, CPK, ball-and-stick, worm, ribbon, accessible surface,
transparent molecular surface, and smooth or rugged solid surfaces, with perspective and depth cueing.
Both hardware and side-by-side stereo are supported. A screen image can be saved and printed as a compact
vectorized PostScript file (also in stereo) in addition to a compressed bitmap. Movies can easily be
created featuring molecules in solid representations such as CPK, smooth molecular surface and
ball-and-stick, and any 3D object can be read, displayed, reshaped and written in the Wavefront
format.

Key molecular graphics features of ICM pro:

• Export publication quality molecular images at high resolution and vector images
(metafile)
• Annotate atoms, residues and sites
• 2D and 3D user-defined labels
• Hydrogen bond and distance labels
• Display atom clashes, distance restraints
• High quality molecular surface representation, skin, wire, xstick and ribbon
representations
• Easy control of thickness, colour and type in molecular graphics. Colour by atom type,
residue side-chain, molecule, unique carbon atom colouring for multiple objects, B-factor,
occupancy, accessibility, hydrophobicity, polarity, secondary structure, paint structure by
alignment colour, colour by user-defined values

• Visual effects: dynamic shadows, fog, hardware and side-by-side stereo, clipping planes,
full screen
• Export coloured and annotated sequence alignments.
• Easy to use and control animation effects: rotations, rocking, zooming
• Store current views/viewpoints, layers and slides
• Two kinds of stereo, including a high quality “in-window” mode, as well as a stereo mode
which does not require any special hardware.

ICM-Pro also supports protein structure analysis and provides a direct link to the PDB. Once
you have downloaded a structure, you can analyse it (flagging problem regions),
superimpose multiple structures, and analyse distances and electrostatic properties.

Key protein structure analysis features of ICM pro:

• Dynamic link to the PDB
• One click search and download PDB structures
• Tabulated PDB data for easy manipulation, sorting and searching
• Extract PDB sequence
• PDB file preparation, detecting and fixing problems, optimization of H, His, Asn, Gln
and Pro
• Superimpose multiple structures and calculate RMSD
• Calculate contact area, surface area
• Measure and display distances and angles
• Fully-linked and dynamic structure-sequence environment
• Drug binding pocket prediction
• Protein-protein interaction prediction
• One click ligand pocket display and H-bond optimization with ligand
• One click analysis of protein-ligand interactions
• Predict protein flexibility
• Build electrostatic surfaces
• Interactive Ramachandran plots

Crystallographic Analysis Tools:


A key step in understanding a protein structure is to evaluate fully the underlying
crystallographic information contained in a PDB file. For example, it is important to
know the full biological unit of a protein in order to judge whether crystal-packing contacts have
influenced the structure.

The crystallographic analysis features include:

• View crystallographic cell
• Generate crystallographic neighbors
• Build biological units and apply transformations
• Direct link to electron density map server
• Contour electron density maps
• Convert electron density map to grid energy map for real space refinement

Protein Structure Prediction

Predicting low-energy conformations for chemical compounds, peptides, nucleic acids, etc.:
take a peptide sequence and predict its three-dimensional structure. Success is not guaranteed,
especially if the peptide is longer than about 25 residues, but some preliminary tests are
encouraging. Local secondary structure preferences can be evaluated directly from the simulation,
and the folding of the peptide can be watched as a movie.

Protein Modelling: ICM-Pro has a good record in protein modelling. There are procedures
that regularize or build the backbone and that shake up the side-chains and loops by global
energy optimization. You can also colour the model by local reliability to identify
potentially wrong parts of the model. This does not, however, include the fast routine for
building a complete model by homology with loops combined with the database search
(ICM-Homology is a separate add-on to ICM-Pro).

Loop modelling and protein design: ICM-Pro was used to design two new 7-residue loops, and
in both cases the designs were successful. Moreover, the predicted conformations turned out
to be accurate to within 0.5 Å when the crystallographic structures of the designed proteins
were determined by Rik Wierenga and his co-workers.

Key structure prediction features:

• A variety of different energy terms and grids are available
• Define distance restraints and tethers
• Local minimization
• Protein structure prediction and optimization
• Prediction of the effect of a mutation
• Generation of multiple receptor conformations
• Model using restraints and symmetry

Bioinformatics Tools

ICM-Bioinformatics, which is included in the ICM-Pro package, allows users to search a sequence
database with high-quality global pairwise and multiple alignment algorithms. It also supports
pattern searches and PROSITE and profile searches. Multiple sequence alignment is fast, and the
package also provides evolutionary trees, a principal component view, annotation transfer from
sequence to structure, threading and alignment visualization tools.

Sequence Analysis: Find alternative alignments and repeats using a filtered, probability-based
dot-plot. Make an accurate pairwise sequence alignment with a double affine gap penalty and
evaluate the probability that the two aligned sequences share the same structural fold. Build
multiple sequence alignments, construct and plot evolutionary trees, visualize sequence clustering
in two and three dimensions, and predict protein secondary structure with a set of algorithms.
Search your sequence interactively or in batch through any database and generate a list of
possible homologues that are sorted and evaluated by the probability of structural significance.
The sensitive and rigorous Zega alignment is used for each comparison, so this search may find
more homologues than a BLAST search. The output can be presented as a linked table. Text
sequence databases can be indexed and queried with ICM.

Key bioinformatics features include:

• Read in sequences and alignments in FASTA and other formats
• Fast sequence searching in BLAST databases
• High quality pairwise and multiple alignment generation
• Interactive alignment editing
• Predict sequence secondary structure content
• Structure-linked sequence alignments and alignment annotation
• Drag-and-drop alignment generation

Small Molecule Docking


ICM-Docking provides a set of tools for modelling protein/ligand interactions. It performs fast
and accurate docking of fully flexible small-molecule ligands to a protein represented by grid
interaction potentials, and it also allows the ligand to be docked to the explicit full-atom
representation of the receptor with an arbitrarily selected subset of flexible side-chains. Docking
is performed by the ICM stochastic global optimization procedure, which combines pseudo-Brownian
positional and torsional steps with fast local gradient minimization. Continuously differentiable
grid potentials ensure rapid convergence of the local minimizations, and an algorithm for tracking
the simulation trajectory avoids trapping in sub-optimal conformations and allows an efficient
search of the conformational space. ICM-Docking provides tools for automatic conversion of 2D
chemical structures to 3D, atom type assignment, charge assignment and recognition of rotatable
bonds. Parts of the ligand can be automatically constrained to a pre-defined position during
docking, and multiple conformations of the free or docked ligand can be generated. Special
Monte Carlo steps allow sampling of stereoisomers for racemic compounds. The protein surface can
be analysed for potential binding pockets, with the interaction properties displayed on the 'skin'
representation of the surface. A graphical user interface makes it easy to set up the simulations,
and the docking scripts, written in the ICM molecular modelling scripting language, can be modified
to meet specific project requirements. ICM also performs protein-protein docking with a fast global
rigid-body search using grid potentials, and refines the best docked configurations with flexible
side-chains to allow for induced fit.
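The general idea behind this kind of stochastic global optimization, random moves each followed by local minimization, can be sketched in Python as Monte Carlo minimization with a Metropolis acceptance test. This is a generic illustration, not ICM's algorithm. The toy energy function, the fixed-step gradient descent used as the local minimizer, and the parameters `move` and `kT` are all hypothetical choices.

```python
# Generic Monte Carlo-minimization sketch (random step, local minimization,
# Metropolis acceptance); an illustration of the idea, not ICM's procedure.
import math
import random


def gradient_descent(grad, x, step=0.01, iters=200):
    """Simple fixed-step gradient descent, standing in for local minimization."""
    for _ in range(iters):
        x = [xi - step * gi for xi, gi in zip(x, grad(x))]
    return x


def mc_minimize(f, grad, x0, n_steps=200, move=2.0, kT=1.0, seed=0):
    """Return the lowest-energy (x, f(x)) visited by Monte Carlo minimization."""
    rng = random.Random(seed)
    x = gradient_descent(grad, list(x0))
    e = f(x)
    best_x, best_e = x, e
    for _ in range(n_steps):
        # Random displacement of every variable, then relax to a nearby minimum.
        trial = gradient_descent(grad, [xi + rng.uniform(-move, move) for xi in x])
        e_trial = f(trial)
        # Metropolis criterion: always accept downhill, sometimes accept uphill.
        if e_trial <= e or rng.random() < math.exp(-(e_trial - e) / kT):
            x, e = trial, e_trial
            if e < best_e:
                best_x, best_e = x, e
    return best_x, best_e
```

On the one-dimensional double-well function f(x) = x^4 - 3x^2 + x, started from the shallower minimum near x = 1.13, the search finds the global minimum near x = -1.30. Minimizing after every random move means the Metropolis test compares local minima rather than raw, strained conformations.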

Small molecule docking features include:

• Drug pocket identification, analysis and visualization tools
• Small molecule docking
• Sample racemic centers and double bond cis/trans
• Relax covalent geometry
• Keep carboxyls neutral and set charges for amino groups
• Template docking
• Incorporation of flexibility into the ligand and receptor side chains/backbone
• Multiple-receptor conformation docking
• Automated model building into density - docking to electron density
• Tabulated and easy to visualize docking results
• Multiple solutions ranked by energy values

Protein-Protein Docking

The ICM protein-protein docking procedure has performed consistently well in docking accuracy in
the worldwide CAPRI (Critical Assessment of PRedicted Interactions) protein-protein docking
experiment. In the past, ICM was used to dock ab initio a full-atom model of lysozyme to an
antibody with 1.6 Å accuracy (Nature Struct. Biol., 1994, 1, 259). Later, Maxim Totrov and Ruben
Abagyan correctly predicted the association of beta-lactamase and its protein inhibitor in the
Docking Challenge (Nature Struct. Biol., 1996, 3, 290) using ICM pseudo-Brownian docking with
subsequent ICM side-chain refinement.

2.4.4 MOLEGRO VIRTUAL DOCKER:


Molegro Virtual Docker (MVD) is an integrated environment for studying and predicting how
ligands interact with macromolecules. The identification of ligand binding modes is done by
iteratively evaluating a number of candidate solutions (ligand conformations) and estimating
the energy of their interactions with the macromolecule. The highest scoring solutions are
returned for further analysis.

MVD requires a three-dimensional structure of both protein and ligand (usually derived from
X-ray/NMR experiments or homology modelling). MVD performs flexible ligand docking, so
the optimal geometry of the ligand will be determined during the docking.

The system requirements for Molegro Virtual Docker are:

• Windows Vista, 2003, XP or 2000.
• Linux: most standard distributions.
• Mac OS X 10.4 (and later versions).

Molegro Virtual Docker contains a built-in version checker making it easy to check for new
program updates including new features and bug fixes. To check for new updates, select Help
| Check for Updates. A window showing available updates and details about changes made
will appear.

The MolDock scoring function (MolDock Score) used by MVD is derived from the PLP
scoring functions originally proposed by Gehlhaar et al. [GEHLHAAR 1995,1998] and later
extended by Yang et al. [YANG 2004]. The MolDock scoring function further improves these
scoring functions with a new hydrogen bonding term and new charge schemes. The docking
scoring function, Escore, is the sum of two energy terms:

Escore = Einter + Eintra

where Einter is the ligand-protein interaction energy and Eintra is the internal energy of the ligand.

After MVD has predicted one or more promising poses using the MolDock score, it
calculates several additional energy terms. All of these terms are stored in the
'DockingResults.mvdresults' file at the end of the docking run. The 'rerank score' is a linear
combination of these terms, weighted by the coefficients given in the
'RerankingCoefficients.txt' file.
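The rerank score is simply a weighted sum, which a few lines of Python make explicit. The term names, values and weights below are hypothetical placeholders. They are not the contents of MVD's 'RerankingCoefficients.txt', whose exact layout is not shown here.

```python
# Sketch: the rerank score as a weighted linear combination of energy terms.
# Term names and coefficient values are hypothetical placeholders, not the
# contents of MVD's RerankingCoefficients.txt.


def rerank_score(terms: dict, coefficients: dict) -> float:
    """Return sum(coefficient * term value) over every term that has a coefficient."""
    missing = set(coefficients) - set(terms)
    if missing:
        raise KeyError(f"energy terms missing for pose: {sorted(missing)}")
    return sum(c * terms[name] for name, c in coefficients.items())


pose_terms = {"Steric": -100.0, "HBond": -5.0, "Electro": -2.0}  # illustrative values
weights = {"Steric": 0.5, "HBond": 1.0, "Electro": 0.25}         # hypothetical weights
print(rerank_score(pose_terms, weights))  # -55.5
```

Keeping the weights in a separate table, as MVD does with its coefficients file, means the ranking can be re-tuned without re-running the docking.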

A '.mvdresults' file is not meant to be interpreted or inspected manually. Instead, it should be
opened in MVD (either by dragging it onto the workspace or by selecting 'File | Import
Docking Results (*.mvdresults)...'). It is also possible to open the file in the Data Analyzer in
order to create new regression models based on the energy terms in the file.

The following table explains the different terms in a '.mvdresults' file:

Textual Information
• Ligand: The name of the ligand the pose was created from.
• Name: The internal name of the pose (a concatenation of the pose id and ligand
name).
• Filename: The file containing the pose.
• Workspace: The workspace (.mvdml-file) containing the protein.
• Run: When running multiple docking runs for each ligand, this field contains the
docking run number.
Energy terms (total):
• Energy: The MolDock score (arbitrary units).
• RerankScore: The reranking score (arbitrary units).
• PoseEnergy: The score actually assigned to the pose during the docking.
• SimilarityScore: Similarity Score (if docking templates are enabled).
• LE1 Ligand Efficiency 1: MolDock Score divided by Heavy Atoms count.
• LE3 Ligand Efficiency 3: Rerank Score divided by Heavy Atoms count.
Energy terms (contributions)
• E-Total: The total MolDock Score energy is the sum of internal ligand energies,
protein interaction energies and soft penalties.
• E-Inter total: The total MolDock Score interaction energy between the pose and the
target molecule(s).
• E-Inter (cofactor - ligand): The total MolDock Score interaction energy between the
pose and the cofactors. (The sum of the steric interaction energies calculated by PLP,
and the electric and hydrogen bonding terms below).
• Cofactor (VdW): The steric interaction energy between the pose and the cofactors
calculated using a LJ12-6 approximation.
• Cofactor (elec): The electrostatic interaction energy between the pose and the
cofactors.
• Cofactor (hbond): The hydrogen bonding interaction energy between the pose and the
cofactors (calculated by PLP).

• E-Inter (protein - ligand): The MolDock Score interaction energy between the pose
and the protein. (Equal to Steric+HBond+Electro+ElectroLong below)
• Steric: Steric interaction energy between the protein and the ligand (calculated by
PLP).
• HBond: Hydrogen bonding energy between protein and ligand (calculated by PLP).
• Electro: The short-range (r<4.5Å) electrostatic protein-ligand interaction energy.
• ElectroLong: The long-range (r>4.5Å) electrostatic protein-ligand interaction energy.
• NoHBond90: The hydrogen bonding energy (protein-ligand) calculated without taking the
directionality of the H-bond into account.
• VdW (LJ12-6): Protein steric interaction energy from a LJ 12-6 VdW potential
approximation.
• E-Inter (water - ligand): The MolDockScore interaction energy between the pose and
the water molecules.
• E-Intra (tors, ligand atoms): The total internal MolDockScore energy of the pose.
• E-Intra (steric): Steric self-interaction energy for the pose (calculated by PLP).
• E-Intra (hbond): Hydrogen bonding self-interaction energy for the pose (calculated by
PLP).
• E-Intra (elec): Electrostatic self-interaction energy for the pose.
• E-Intra (tors): Torsional energy for the pose.
• E-Intra (sp2-sp2): Additional sp2-sp2 torsional term for the pose.
• E-Intra (vdw): Steric self-interaction energy for the pose (calculated by a LJ12-6 VdW
approximation).
• E-Solvation: The energy calculated from the implicit solvation model.
• E-Soft Constraint Penalty: The energy contributions from soft constraints.

Static terms
• Torsions: The number of (chosen) rotatable bonds in the ligand.
• HeavyAtoms: Number of heavy atoms.
• MW: Molecular weight (in daltons).
• C0: Obsolete constant term. This value is always 1.
• CO2minus: Number of carboxyl groups in ligand.
• Csp2: Number of sp2-hybridized carbon atoms in ligand.

• Csp3: Number of sp3-hybridized carbon atoms in ligand.
• DOF: Degrees of internal rotational freedom. Currently this is the number of chosen
rotatable bonds in the ligand and is thus equal to the 'Torsions' term. It is intended to
reflect how many rotational degrees of freedom are lost upon binding. Future work
may include a more advanced model where the actual conformation is inspected in
order to determine whether rotational degrees of freedom are lost.
• N: Number of nitrogen atoms in ligand.
• Nplus: Number of positively charged nitrogen atoms in ligand.
• OH: Number of hydroxyl groups in ligand.

• OPO32minus: Number of PO4^2- groups in ligand.
• OS: Number of ethers and thioethers in ligand.
• Carbonyl: Number of carbonyl groups in ligand.
• Halogen: Number of halogen groups in ligand.

Other terms
• RMSD: The RMS deviation from a reference ligand.
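Several entries in the list above are simple functions of other entries. The Python sketch below encodes three of them for a single pose: LE1, LE3 and the protein-ligand interaction sum. The class, its field names and the numbers are hypothetical, and parsing the actual '.mvdresults' file is deliberately not shown.

```python
# Sketch: relationships between a few .mvdresults fields, as described in the
# list above. The dataclass and field names are hypothetical; reading the
# actual file format is not shown.
from dataclasses import dataclass


@dataclass
class PoseTerms:
    energy: float        # MolDock score
    rerank_score: float  # RerankScore
    heavy_atoms: int     # HeavyAtoms
    steric: float        # Steric (PLP)
    hbond: float         # HBond (PLP)
    electro: float       # short-range electrostatics (r < 4.5 Å)
    electro_long: float  # long-range electrostatics (r > 4.5 Å)

    def le1(self) -> float:
        """Ligand efficiency 1: MolDock score per heavy atom."""
        return self.energy / self.heavy_atoms

    def le3(self) -> float:
        """Ligand efficiency 3: rerank score per heavy atom."""
        return self.rerank_score / self.heavy_atoms

    def e_inter_protein(self) -> float:
        """E-Inter (protein - ligand) = Steric + HBond + Electro + ElectroLong."""
        return self.steric + self.hbond + self.electro + self.electro_long


pose = PoseTerms(energy=-125.0, rerank_score=-100.0, heavy_atoms=20,
                 steric=-110.0, hbond=-8.0, electro=-1.5, electro_long=-0.5)
print(pose.le1(), pose.le3(), pose.e_inter_protein())  # illustrative numbers only
```

Normalizing by heavy-atom count is what makes LE1 and LE3 useful for comparing ligands of different sizes, since raw scores tend to favour larger molecules.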

The docking search algorithm (MolDock Optimizer) used in MVD is based on an
evolutionary algorithm [MICHALEWICZ 1992, 2000]. Evolutionary algorithms (EAs) are
iterative optimization techniques inspired by Darwinian evolution theory. In EAs, the
evolutionary process is simplified and thus it has very little in common with real world
evolution. Nevertheless, during the last fifty years EAs have proved their worth as powerful
optimization techniques that can assist or replace traditional techniques when these fail or
are inadequate for the task to be solved. Basically, an EA consists of a population of
individuals (candidate solutions), which is exposed to random variation by means of variation
operators, such as mutation and recombination. The individual being altered is often referred
to as the parent and the resulting solution after modification is called the offspring.
Sometimes more than one parent is used to create the offspring by recombination of solutions,
which is also referred to as crossover.

The guided differential evolution algorithm (MolDock Optimizer) used in MVD is based on
an EA variant called differential evolution (DE). The DE algorithm was introduced by Storn
and Price in 1995 [STORN 1995]. Compared to more widely known EA-based techniques
(e.g. genetic algorithms, evolutionary programming, and evolution strategies), DE uses a
different approach to select and modify candidate solutions (individuals). The main
innovative idea in DE is to create offspring from a weighted difference of parent solutions.

The DE algorithm works as follows:

First, all individuals are initialized and evaluated according to the MolDock Score (fitness
function). Afterwards, the following process is executed as long as the termination
condition is not fulfilled: for each individual in the population, an offspring is created by
adding a weighted difference of parent solutions, which are randomly selected from the
population. (In the classic DE scheme, the scaled difference of two randomly chosen individuals
is added to a third, and the result is recombined with the current individual according to the
crossover rate.)

Afterwards, the offspring replaces the parent if and only if it is fitter. Otherwise, the
parent survives and is passed on to the next generation (iteration) of the algorithm.
Additionally, guided differential evolution may use a cavity prediction algorithm to constrain
predicted conformations (poses) during the search process. More specifically, if a candidate
solution is positioned outside the cavity, it is translated so that a randomly chosen ligand atom
will be located within the region spanned by the cavity.

Naturally, this strategy is only applied if a cavity has been found. If no cavities are reported,
the search procedure does not constrain the candidate solutions. One of the reasons why DE
works so well is that the variation operator exploits the population diversity in the following
manner: Initially, when the candidate solutions in the population are randomly generated the
diversity is large. Thus, when offspring are created the differences between parental solutions
are big, resulting in large step sizes being used. As the algorithm converges to better
solutions, the population diversity is lowered, and the step sizes used to create offspring are
lowered correspondingly. Therefore, by using the differences between other individuals in the
population, DE automatically adapts the step sizes used to create offspring as the search
process converges toward good solutions.
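The update rule described above can be sketched in Python as the classic DE/rand/1/bin scheme. The sketch uses the default parameters quoted for MVD: population size 50, crossover rate 0.9 and scaling factor 0.5. It is plain differential evolution on a generic real-valued objective, not MVD's guided variant. The function names, the box bounds and the generation count are assumptions, and the cavity constraint and ligand encoding are omitted.

```python
# Classic DE/rand/1/bin minimiser (a sketch of plain differential evolution,
# not MVD's guided variant: no cavity constraint, no docking-specific encoding).
import random


def differential_evolution(f, bounds, pop_size=50, cr=0.9, scale=0.5,
                           generations=200, seed=0):
    """Minimise f over a box given as [(lo, hi), ...]; returns (best_x, best_f)."""
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    fit = [f(x) for x in pop]
    for _ in range(generations):
        new_pop, new_fit = list(pop), list(fit)
        for i in range(pop_size):
            # Three distinct individuals, all different from the parent i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(dim)  # ensures at least one coordinate changes
            trial = [
                pop[a][k] + scale * (pop[b][k] - pop[c][k])  # weighted difference
                if rng.random() < cr or k == j_rand
                else pop[i][k]                               # inherited from parent
                for k in range(dim)
            ]
            f_trial = f(trial)
            if f_trial < fit[i]:  # offspring replaces parent only if it is fitter
                new_pop[i], new_fit[i] = trial, f_trial
        pop, fit = new_pop, new_fit
    best = min(range(pop_size), key=lambda i: fit[i])
    return pop[best], fit[best]
```

For example, minimising the sphere function (the sum of squared coordinates) in three dimensions drives the best fitness close to zero. As the population contracts, the differences `pop[b][k] - pop[c][k]` shrink, which is the automatic step-size adaptation described in the text.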

Only ligand properties are represented in the individuals since the protein remains rigid
during the docking simulation. Thus, a candidate solution is encoded by an array of real-valued
numbers representing ligand position, orientation and conformation: three Cartesian
coordinates for the ligand translation, four variables specifying the ligand orientation
(encoded as a rotation vector and a rotation angle), and one angle for each flexible torsion
angle in the ligand (if any). Each individual in the initial population is assigned a random
position within the search space region (defined by the user).

Initializing the orientation is more complicated: By just choosing uniform random numbers
for the orientation axis (between -1.0 and 1.0 followed by normalization of the values to form
a unit vector) and the angle of rotation (between -180° and +180°), the initial population
would be biased towards the identity orientation (i.e. no rotation). To avoid this bias, the
algorithm by Shoemake et al. [SHOEMAKE 1992] for generating uniform random
quaternions is used and the random quaternions are then converted to their rotation
axis/rotation angle representation. The flexible torsion angles (if any) are assigned a random
angle between -180° and +180°.
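The orientation initialization can be sketched in Python: draw a uniformly distributed unit quaternion with Shoemake's method and convert it to a rotation axis and an angle in [-180°, 180°]. The function names and the (x, y, z, w) component order are assumptions for illustration. This is not MVD's source code.

```python
# Uniform random orientations via Shoemake's random-quaternion method,
# converted to the axis/angle form used in the individual encoding.
import math
import random


def random_unit_quaternion(rng: random.Random) -> tuple:
    """Uniformly distributed unit quaternion (x, y, z, w) (Shoemake's method)."""
    u1, u2, u3 = rng.random(), rng.random(), rng.random()
    r1, r2 = math.sqrt(1.0 - u1), math.sqrt(u1)
    t1, t2 = 2.0 * math.pi * u2, 2.0 * math.pi * u3
    return (r1 * math.sin(t1), r1 * math.cos(t1), r2 * math.sin(t2), r2 * math.cos(t2))


def quaternion_to_axis_angle(q: tuple) -> tuple:
    """Return (unit axis, angle in degrees within [-180, 180]) for a unit quaternion."""
    x, y, z, w = q
    w = max(-1.0, min(1.0, w))            # guard acos against rounding error
    angle = 2.0 * math.acos(w)            # in [0, 2*pi]
    s = math.sqrt(max(0.0, 1.0 - w * w))  # equals sin(angle / 2) >= 0
    if s < 1e-12:
        return (1.0, 0.0, 0.0), 0.0       # identity rotation: any axis will do
    if angle > math.pi:
        angle -= 2.0 * math.pi            # same rotation, expressed in (-pi, pi]
    return (x / s, y / s, z / s), math.degrees(angle)
```

A rotation by θ about an axis is the same as a rotation by θ - 360° about that axis, so angles above 180° are folded into the negative range instead of being discarded.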

In MVD, the following default parameters are used for the guided differential evolution
algorithm: population size = 50, crossover rate = 0.9, and scaling factor = 0.5. These settings
were found by trial and error and generally gave the best results across a test set of 77 complexes.

In order to determine the potential binding sites, a grid-based cavity prediction algorithm
has been developed. The cavity prediction algorithm works as follows:
First, a discrete grid with a resolution of 0.8 Å covering the protein is created. At every grid
point a sphere of radius 1.4 Å is placed. It is checked whether this sphere will overlap with
any of the spheres determined by the Van der Waals radii of the protein atoms. Grid points
where the probe clashes with the protein atom spheres are referred to as part of the
inaccessible volume; all other points are referred to as accessible.

Second, each accessible grid point is checked for whether it is part of a cavity or not using the
following procedure: From the current grid point a random direction is chosen, and this
direction (and the opposite direction) is followed until the grid boundaries are hit, checking if
an inaccessible grid point is hit on the way. This is repeated a number of times, and if the
percentage of lines hitting an inaccessible volume is larger than a given threshold, the point is
marked as being part of a cavity. By default 16 different directions are tested, and a grid point
is assumed part of a cavity if 12 or more of these lines hit an inaccessible volume. The
threshold can be tuned according to how enclosed the found cavities should be. A hit percentage of 0%
would only occur far from the protein, whereas a value of 100% corresponds to a
binding site buried deeply in the protein.

The final step is to determine the connected regions. Two grid points are connected if they are
neighbours. Regions with a volume below 10.0 Å³ are discarded as irrelevant (the volume of
a connected set of grid points is estimated as the number of grid points times the volume of a
unit grid cell). The cavities found are then ranked according to their volume.
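The three steps of the cavity search can be sketched in Python. Where the text gives values (0.8 Å grid, 1.4 Å probe, 16 directions, 12 hits, 10 Å³ minimum volume) they are used as defaults. Everything else is an assumption for illustration: the (x, y, z, radius) atom tuples, the 4 Å padding around the protein, marching along each line in half-grid steps, drawing directions uniformly on the sphere, and treating face-adjacent grid points as neighbours. It is not MVD's implementation.

```python
# Sketch of the grid-based cavity search described above (not MVD's code).
# atoms: list of (x, y, z, vdw_radius); defaults follow the text where it
# gives values (0.8 Å grid, 1.4 Å probe, 16 directions, 12 hits, 10 Å^3).
import itertools
import math
import random
from collections import deque


def accessibility_grid(atoms, spacing=0.8, probe=1.4, padding=4.0):
    """Return (grid, shape); grid[(i, j, k)] is True where the probe fits."""
    lo = [min(a[d] - a[3] for a in atoms) - padding for d in range(3)]
    hi = [max(a[d] + a[3] for a in atoms) + padding for d in range(3)]
    shape = tuple(math.ceil((hi[d] - lo[d]) / spacing) + 1 for d in range(3))
    grid = {}
    for idx in itertools.product(*(range(n) for n in shape)):
        p = [lo[d] + idx[d] * spacing for d in range(3)]
        grid[idx] = all(math.dist(p, a[:3]) >= a[3] + probe for a in atoms)
    return grid, shape


def find_cavities(grid, shape, spacing=0.8, n_dirs=16, min_hits=12,
                  min_volume=10.0, seed=0):
    """Return [(volume, points), ...] for cavities, largest first."""
    rng = random.Random(seed)

    def line_hits_protein(start, direction):
        # Walk the line in both directions, half a grid step at a time.
        for sign in (1.0, -1.0):
            t = 0.5
            while True:
                idx = tuple(math.floor(start[d] + sign * t * direction[d] + 0.5)
                            for d in range(3))
                if not all(0 <= idx[d] < shape[d] for d in range(3)):
                    break  # reached the grid boundary
                if not grid[idx]:
                    return True  # hit the inaccessible volume
                t += 0.5
        return False

    cavity_points = set()
    for idx, accessible in grid.items():
        if not accessible:
            continue
        hits = 0
        for _ in range(n_dirs):
            g = [rng.gauss(0.0, 1.0) for _ in range(3)]
            norm = math.sqrt(sum(v * v for v in g))
            if line_hits_protein(idx, [v / norm for v in g]):
                hits += 1
        if hits >= min_hits:
            cavity_points.add(idx)

    # Connected regions (face neighbours); volume = points * cell volume.
    cavities, seen = [], set()
    for seed_point in cavity_points:
        if seed_point in seen:
            continue
        seen.add(seed_point)
        region, queue = [], deque([seed_point])
        while queue:
            p = queue.popleft()
            region.append(p)
            for d, step in itertools.product(range(3), (-1, 1)):
                q = p[:d] + (p[d] + step,) + p[d + 1:]
                if q in cavity_points and q not in seen:
                    seen.add(q)
                    queue.append(q)
        volume = len(region) * spacing ** 3
        if volume >= min_volume:
            cavities.append((volume, region))
    cavities.sort(key=lambda c: c[0], reverse=True)
    return cavities
```

As a check, on a hand-built grid containing a closed hollow box, every accessible point inside the box meets the wall along every line, so the box interior is returned as one cavity. Lowering `min_hits` relaxes how enclosed a region must be, as described for the threshold above.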

Clustering Algorithm
The multiple poses returned from a docking run are identified using the following procedure:
• During the docking run, new candidate solutions (poses) scoring better than parental
solutions are added to a temporary pool of docking solutions.
• If the number of poses in the pool is higher than 300, a clustering algorithm is used to
cluster all the solutions in the pool. The clustering is performed on-line during the docking
search and when the docking run terminates. Because of the limit of 300 poses, the clustering
process is fast. The members of the pool are replaced by the new cluster representatives found
(limited by the Max number of poses returned option).

The clustering procedure works as follows:


1. The pool of solutions is sorted according to energy scores (starting with the best-scoring
pose).
2. The first member of the sorted pool of solutions is added to the first initial cluster and the
member is assigned to be the cluster representative.
3. The remainder of the pool members are added to the most similar cluster available (using
the common RMSD measure) if and only if the RMSD between the representative of the most
similar cluster and the member is below a user-specified RMSD threshold. Otherwise, a new
cluster is created and the member is assigned to be the cluster representative.
4. The clustering procedure is terminated when the total number of clusters created exceeds
Max number of poses returned (user-defined parameter) or when all members of the pool
have been assigned to a cluster.
5. When the clustering procedure has terminated, the set of representatives (one from each
cluster) is returned.
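The five clustering steps can be expressed as a short greedy routine. This is a sketch, not MVD's code. `cluster_poses` and its parameters are hypothetical names. Only cluster representatives are tracked, since only they are returned, and the caller supplies the RMSD function. In the usage check, poses are reduced to single numbers and the RMSD to an absolute difference purely for illustration.

```python
def cluster_poses(poses, rmsd, rmsd_threshold, max_poses):
    """Greedy clustering of docking poses following the steps above.

    poses: list of (energy, pose) pairs; lower energy means a better score.
    rmsd: function returning the RMSD between two poses.
    Returns one representative (energy, pose) per cluster, best first."""
    ordered = sorted(poses, key=lambda item: item[0])          # step 1
    representatives = []
    for energy, pose in ordered:
        if representatives:
            # step 3: most similar cluster, judged by its representative
            nearest = min(representatives, key=lambda rep: rmsd(rep[1], pose))
            if rmsd(nearest[1], pose) < rmsd_threshold:
                continue  # joins that cluster; only representatives are kept
        if len(representatives) == max_poses:
            break         # step 4: another cluster would exceed the limit
        representatives.append((energy, pose))                 # steps 2-3: new cluster
    return representatives                                      # step 5
```

Usage with toy one-dimensional poses: `cluster_poses([(-10.0, 0.0), (-9.5, 0.3), (-9.0, 5.0)], lambda a, b: abs(a - b), 1.0, 5)` keeps the poses at 0.0 and 5.0 and folds 0.3 into the first cluster.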

MVD accepts the following molecular structure formats:

• PDB (Protein Data Bank). Supported file extensions: pdb/ent.
• Mol2 (Sybyl Mol2 format). Supported file extensions: mol2.
• SDF (MDL format). Supported file extensions: sdf/sd (for multiple structures) and
mol/mdl (for a single molecular structure).

Additionally, Molegro Virtual Docker uses its own MVDML file format. MVDML is short for
Molegro Virtual Docker Markup Language and is an XML-based file format. In general, MVDML
can be used to store the following information:
• Molecular structures (atom coordinates, atom types, partial charges,
bond orders, hybridization states, ...)
• Constraints (location, type, and constraint parameters)
• Search space (center and radius)
• State information (workspace properties)
• Cavities (location, cavity grid points)

2.5 HOMOLOGY MODELLING:

Homology modelling is the prediction of the three-dimensional structure of a given protein
sequence (target) based on its alignment to one or more known protein structures (templates).
If similarity between the target sequence and the template sequence is detected, structural
similarity can be assumed. In general, at least about 30% sequence identity is required to
generate a useful model. Homology models can be used to understand protein function, activity,
specificity, etc., and are of interest to drug companies wishing to do structure-aided drug design.
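A rough check of the 30% rule of thumb only needs the percent identity of a target-template alignment. The sketch below assumes the alignment has already been produced, with gaps written as `-`. It divides the number of identical columns by the full alignment length, which is the convention behind BLAST's "Identities" line. Other programs normalise differently, so the threshold should be read as approximate. The function names are hypothetical.

```python
def percent_identity(aligned_target, aligned_template):
    """Percent identity of two aligned sequences of equal length ('-' = gap):
    identical non-gap columns divided by the full alignment length."""
    if len(aligned_target) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")
    identical = sum(1 for a, b in zip(aligned_target, aligned_template)
                    if a == b and a != "-")
    return 100.0 * identical / len(aligned_target)


def usable_template(aligned_target, aligned_template, min_identity=30.0):
    """Rule-of-thumb check from the text: about 30% identity or more."""
    return percent_identity(aligned_target, aligned_template) >= min_identity
```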

Structure prediction by homology modelling

Homology modelling makes two fundamental assumptions:

• The structure of a protein is determined by its primary amino acid sequence (Anfinsen).
• During evolution, protein structure has changed much more slowly than protein sequence.

Closely similar sequences adopt practically identical structures, and distantly related
sequences still fold into similar structures.

Homology Modelling Steps:

1) Template recognition & initial alignment

Select the best template from a library of known protein structures derived from the
PDB. Templates can be found by using the target sequence as the query in a FASTA or
BLAST search.

To find one or more template structures in the Protein Data Bank:


2) Alignment correction
Alignments are scored in order to quantify the similarity between the two sequences: a
substitution score is calculated for each aligned pair of residues. Substitution matrices:
- Reflect the probabilities of mutations occurring over a period of evolution
- PAM family: based on global alignments of closely related proteins; a mutation
probability matrix that is extrapolated to larger evolutionary distances.
- BLOSUM family: based on directly observed alignments (conserved blocks) of related
sequences, with no extrapolation.
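Scoring an existing alignment with a substitution matrix is a simple sum over aligned columns. The sketch below uses a small toy matrix with illustrative values. It also assumes a hypothetical linear gap penalty of -4 per gap column. A real run would use a complete PAM or BLOSUM matrix, for example one loaded with Biopython's `Bio.Align.substitution_matrices.load("BLOSUM62")`.

```python
def alignment_score(aligned_a, aligned_b, matrix, gap_penalty=-4):
    """Sum substitution scores over aligned columns; a linear gap penalty
    (hypothetical value) is charged for each column containing '-'."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    score = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            score += gap_penalty
        elif (a, b) in matrix:
            score += matrix[(a, b)]
        else:
            score += matrix[(b, a)]   # matrices are symmetric; each pair stored once
    return score


# Toy values for illustration only (not a complete PAM or BLOSUM matrix)
TOY_MATRIX = {("A", "A"): 4, ("T", "T"): 5, ("P", "P"): 7,
              ("A", "T"): 0, ("A", "P"): -1, ("T", "P"): -1}
```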

3) Backbone generation
Known structurally conserved regions (SCRs) are used to generate coordinates for the
unknown (target) structure.
For SCRs - copy coordinates from known structures.

For variable regions (VR) - copy from a known structure if the residue types are
similar; otherwise, use databases of loop fragments.

4) Loop modelling:
Loop regions differ between members of the same family as a result of substitutions,
insertions and deletions.

Loop modelling is done by:

• Database search for segments from known protein structures that fit fixed end-points
• Molecular mechanics/molecular dynamics
• A combination of the two
For missing loops, ab initio rebuilding is done.

5) Side-chain modelling
• Use of rotamer libraries (backbone dependent)
• Molecular mechanics optimization
- Dead-end elimination (deterministic pruning)
- Monte Carlo (heuristic)
- Branch & Bound (exact)
• Mean-field methods
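Dead-end elimination can be illustrated with the Goldstein criterion. Rotamer r at position i is eliminated if some other rotamer t at the same position satisfies E(i_r) - E(i_t) + Σ_j min_s [E(i_r, j_s) - E(i_t, j_s)] > 0. In that case t is better than r whatever rotamers the other positions take. The sketch below performs one such pass over hypothetical energy tables. Production side-chain packers repeat the pass and add further criteria.

```python
def goldstein_dee(self_energy, pair_energy):
    """One pass of dead-end elimination using the Goldstein criterion.

    self_energy[i][r]: energy of rotamer r at position i with the backbone.
    pair_energy[(i, j)][r][s]: interaction of rotamer r at i with rotamer s
    at j, stored for every ordered pair of distinct positions.
    Returns the set of (position, rotamer) pairs that cannot be part of the
    global minimum energy conformation."""
    positions = range(len(self_energy))
    eliminated = set()
    for i in positions:
        n_rot = len(self_energy[i])
        for r in range(n_rot):
            for t in range(n_rot):
                if t == r:
                    continue
                # lower bound of E(r) - E(t) over all rotamer choices elsewhere
                bound = self_energy[i][r] - self_energy[i][t]
                for j in positions:
                    if j != i:
                        bound += min(
                            pair_energy[(i, j)][r][s] - pair_energy[(i, j)][t][s]
                            for s in range(len(self_energy[j]))
                        )
                if bound > 0:      # replacing r by t always lowers the energy
                    eliminated.add((i, r))
                    break
    return eliminated
```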

6) Model optimization
Done by molecular mechanics methods.
7) Model validation:

The model should be evaluated for:

- Correctness of the overall fold/structure
- Errors over localized regions
- Stereochemical parameters: bond lengths, angles, etc.
Some software tools for model verification:
- PROCHECK http://www.biochem.ucl.ac.uk/~roman/procheck/procheck.html
- WHAT IF http://swift.cmbi.kun.nl/whatif
- PROSA II http://www.came.sbg.ac.at/Services/prosa.html
- Profile 3D & Verify 3D http://shannon.mbi.ucla.edu/DOE/Services
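Validation programs such as PROCHECK start from simple geometric quantities computed from atomic coordinates. The sketch below is plain Python with hypothetical function names. It computes a bond length, a bond angle and a torsion angle (such as the backbone φ/ψ used in Ramachandran plots), using the IUPAC sign convention. It does not reproduce the checks or reference values of any of the tools listed.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _norm(a):
    return math.sqrt(_dot(a, a))


def bond_length(p1, p2):
    """Distance between two atoms (same units as the coordinates, e.g. Å)."""
    return _norm(_sub(p2, p1))


def bond_angle(p1, p2, p3):
    """Angle p1-p2-p3 in degrees."""
    u, v = _sub(p1, p2), _sub(p3, p2)
    cos_t = _dot(u, v) / (_norm(u) * _norm(v))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))


def dihedral(p1, p2, p3, p4):
    """Torsion angle p1-p2-p3-p4 in degrees, in (-180, 180]."""
    b1, b2, b3 = _sub(p2, p1), _sub(p3, p2), _sub(p4, p3)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    y = _norm(b2) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))
```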

Frequently used servers and software for homology modelling:

Online servers: CPH protein model server; PS2 protein model server; 3D JIGSAW;
PHYRE

Offline software: SPDBV (Swiss-PdbViewer)

Commercial tool: ICM Molsoft

Applications of homology modelling:

• Structure-based assessment of target druggability
• Structure-guided design of mutagenesis experiments
• Tool compound design for probing biological function
• Homology model based ligand design
• Design of in vitro test assays
• Structure-based prediction of drug metabolism and toxicity

2.6 DOCKING:

Docking is a computer simulation of the binding interaction between two molecules.
These two molecules may be:

1. Two proteins

2. A protein and a drug

3. A nucleic acid and a drug.

The first docking program was developed by Kuntz [1982].

Docking strategy:

Types of docking:

1. Rigid docking: In this type of docking, both molecules are kept rigid; that is,
their side chains cannot move. This type of docking does not reflect natural binding
and exists only as a computational simplification.

2. Semi-flexible docking: In this type of docking, the larger protein molecule is kept
rigid, whereas the smaller ligand is kept flexible. This is usually done in protein-drug
docking. It is also known as quasi-flexible docking.

3. Flexible docking: In this type of docking, both molecules are kept flexible; this
is the only type of docking that corresponds to natural conditions.

Search algorithms in docking:

Every docking program follows a method, i.e. an algorithm, to perform the docking
process and find the best binding pose of a drug in a particular protein. This method is
known as the search algorithm or search strategy. There are four types of search algorithms:

1. Random search algorithms:

Genetic algorithm: e.g. AUTODOCK, GOLD

Monte Carlo method: e.g. PRODOCK, MC-DOCK, ICM, DOCKVISION, GLIDE

Tabu search

2. Systematic search algorithms:

Fragment-based method: e.g. DOCK, FLEXX, ADAM

Point complementarity method / conformational method

Distance geometry method

Database method: e.g. FLOG, EUDOC

3. Simulation search algorithms:

Molecular dynamics

Energy minimization

4. Multiple-method algorithms
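To make the random (stochastic) search idea concrete, here is a minimal Metropolis Monte Carlo sketch of the kind of acceptance rule used by Monte Carlo docking methods. It is not the algorithm of any program listed. The "pose" is reduced to a single number, the energy is a hypothetical one-dimensional function, and the step size and temperature are arbitrary.

```python
import math
import random


def metropolis_search(energy, start, step=0.5, n_steps=5000, temperature=1.0, seed=0):
    """Metropolis Monte Carlo search over a single pose parameter.
    Returns (best_x, best_energy) seen during the walk."""
    rng = random.Random(seed)
    x, e = start, energy(start)
    best_x, best_e = x, e
    for _ in range(n_steps):
        candidate = x + rng.uniform(-step, step)   # random perturbation of the pose
        e_new = energy(candidate)
        # Metropolis criterion: always accept downhill moves; accept uphill
        # moves with probability exp(-(E_new - E_old) / T)
        if e_new <= e or rng.random() < math.exp(-(e_new - e) / temperature):
            x, e = candidate, e_new
            if e < best_e:
                best_x, best_e = x, e
    return best_x, best_e
```

Usage: `metropolis_search(lambda x: (x - 3.0) ** 2, start=0.0)` walks from 0 towards the minimum at 3. The temperature controls how often uphill moves are accepted, which lets the search escape local minima in rugged energy landscapes.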

Scoring functions used in docking:

Scoring functions are used to score protein-ligand complexes and identify the complex with
the lowest energy value. For every docking run, a particular score is given by the scoring
function. Scoring functions vary from tool to tool. There are three types of scoring functions:

1. Force-field-based scoring functions: atomic structure, valency, bond angle, bond
length, etc.

• GOLD score

• G score

• D score

• AMBER score

• CHARMM

• GROMOS

2. Empirical scoring functions: terms weighted by regression coefficients fitted
statistically to experimental data.

• ChemScore

• Böhm’s scoring function

• F score

• X-Score

3. Knowledge-based scoring functions: derived from experimental structural data
(X-ray crystallography, R-values, NMR data).

• DrugScore

• SMOG score

• Potential of mean force (PMF) score
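As an illustration of a force-field-based score, the sketch below sums a 12-6 Lennard-Jones term and a Coulomb term over all protein-ligand atom pairs. It is not the functional form of any specific program listed. The atom parameters, the Lorentz-Berthelot combining rules and the constant dielectric of 4 are assumptions chosen for the example. Distances are in Å, charges in elementary charges, and energies in kcal/mol; 332.06 is the approximate Coulomb conversion factor in those units.

```python
import math

COULOMB_FACTOR = 332.06  # approx. kcal*Å/(mol*e^2)


def pair_energy(r, epsilon, sigma, q1, q2, dielectric=4.0):
    """12-6 Lennard-Jones plus Coulomb energy of one atom pair at distance r."""
    lennard_jones = 4.0 * epsilon * ((sigma / r) ** 12 - (sigma / r) ** 6)
    coulomb = COULOMB_FACTOR * q1 * q2 / (dielectric * r)
    return lennard_jones + coulomb


def interaction_score(protein_atoms, ligand_atoms, dielectric=4.0):
    """Sum pair energies over all protein-ligand atom pairs.
    Each atom is a tuple (x, y, z, charge, epsilon, sigma)."""
    total = 0.0
    for px, py, pz, pq, pe, ps in protein_atoms:
        for lx, ly, lz, lq, le, ls in ligand_atoms:
            r = math.dist((px, py, pz), (lx, ly, lz))
            eps = math.sqrt(pe * le)   # Lorentz-Berthelot combining rules
            sig = 0.5 * (ps + ls)
            total += pair_energy(r, eps, sig, pq, lq, dielectric)
    return total
```

The Lennard-Jones minimum lies at r = 2^(1/6)·σ with depth -ε, so a lower (more negative) total indicates a more favourable complex under this toy model.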



CHAPTER - 3
MATERIALS AND METHODS

3.1 TABLE OF DATABASES USED

NAME | URL | USED FOR
NCBI | www.ncbi.nlm.nih.gov | mRNA and protein sequence of the human abl1 gene
PDB | www.rcsb.org/pdb/home/home | BLASTp against PDB to search for structural similarity of the protein
DrugBank | www.drugbank.ca | Downloading the drug imatinib in mol format
Exonic mutation information | Clients’ research | Preparing the mutant nucleotide [mRNA] sequences

3.2 TABLE OF TOOLS, THEIR SOURCES AND THE WORKING METHOD THEY ARE USED IN:

NAME | URL/SOURCE | USED FOR/WORKING METHOD
ORF Finder | http://www.ncbi.nlm.nih.gov/gorf/gorf.html | Finding the correct reading frame of a given nucleotide sequence
BLASTp | http://blast.ncbi.nlm.nih.gov/Blast.cgi | Pairwise sequence alignment to find similarity against PDB and find the structure of the abl1 protein
ICM Molsoft Pro | Purchased from the ICM makers; software installed on the system | Homology modelling of the mutant protein sequences using the template found by BLASTp
Molegro Virtual Docker | Purchased from the Molegro makers; software installed on the system | Docking the drug imatinib onto the mutant protein sequences
Argus Lab 4.0.1 | Free download from the Argus Lab site | Opening and viewing PDB files

NOTE: The protocols followed when using the above databases and tools for the working
methods mentioned above are detailed in the next chapter.

CHAPTER - 4
THE EXPERIMENT

4.1 mRNA and Protein sequence of ABL1 (c-abl oncogene 1, receptor tyrosine kinase) Download:

• Opened the NCBI website by entering the following URL in the Internet Explorer address
bar:

www.ncbi.nlm.nih.gov

• Chose “nucleotide” from the database drop-down list, typed <abl1 AND human> in the
search box and clicked Search.

Among the many results returned, the required result was chosen, and its mRNA and
protein sequences were opened through the link given [marked by arrow].

Both sequences were opened in FASTA format and saved. Two transcripts were shown;
we chose the second one at random.

4.2 Creating mutant mRNA sequence:

The client sequenced the Abl tyrosine kinase domain in CML patients and provided us with
the point mutation data. Using these data, we created mutant mRNA sequences by editing, in
Notepad, the base at which each mutation occurred in the mRNA sequence downloaded from
NCBI; any spaces were removed while doing so. Each mutant nucleotide sequence was saved as
a separate file.
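The manual Notepad edit can be reproduced with a small helper that strips whitespace and substitutes one base. The sketch assumes the mutation position has already been expressed as a 1-based position within the downloaded mRNA sequence. How the client's genomic positions (e.g. 148967) map to mRNA coordinates is not described here and is not attempted. `apply_point_mutation` and its `expected_old` check are hypothetical.

```python
def apply_point_mutation(sequence, position, new_base, expected_old=None):
    """Return a copy of a nucleotide sequence with one base substituted.

    position is 1-based. Spaces and line breaks are removed first, as was
    done by hand. If expected_old is given, the wild-type base at that
    position is checked before editing."""
    seq = "".join(sequence.split()).upper()
    index = position - 1
    if not 0 <= index < len(seq):
        raise IndexError("position outside the sequence")
    if expected_old is not None and seq[index] != expected_old.upper():
        raise ValueError(f"expected {expected_old} at {position}, found {seq[index]}")
    return seq[:index] + new_base.upper() + seq[index + 1:]
```

Usage: `apply_point_mutation("ATG GCC\nTAA", 5, "T", expected_old="C")` returns `"ATGGTCTAA"`. The `expected_old` check guards against editing the wrong base when positions are transcribed by hand.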

4.3 ORF finder:

Now, to convert these mutant nucleotide sequences into protein sequences, we used the
ORF Finder [Open Reading Frame Finder] provided by NCBI, at the following link:

http://www.ncbi.nlm.nih.gov/gorf/gorf.html

The mutant nucleotide sequence was pasted into the box provided and the ORF find button
was clicked. A screenshot of this step is shown below.

The ORF Finder result is shown below. We chose the sequence and frame in which our
mutation was likely to be present.

The screenshot below shows the protein sequence of the chosen frame and length.

After clicking the Accept button, we chose “FASTA protein” from the View drop-down list
and saved the result in Notepad.
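For a single strand, what the ORF Finder does can be sketched as follows. Each of the three forward reading frames is translated with the standard genetic code, and every stretch from an ATG to the next stop codon is reported. This is a simplified illustration, not NCBI's program. It ignores the reverse strand, alternative start codons and ORFs that run off the end of the sequence. The function names are hypothetical.

```python
BASES = "TCAG"
# Standard genetic code (NCBI table 1); codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(seq):
    """Translate complete codons; '*' marks stop, 'X' an unknown codon."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


def find_orfs(seq, min_codons=1):
    """Return (frame, start_nt, protein) for ATG-to-stop ORFs on the forward
    strand, longest first. frame and start_nt are 1-based."""
    seq = seq.upper().replace("U", "T")
    orfs = []
    for frame in range(3):
        protein = translate(seq[frame:])
        start = None
        for idx, aa in enumerate(protein):
            if aa == "M" and start is None:
                start = idx                       # first ATG opens an ORF
            elif aa == "*" and start is not None:
                if idx - start >= min_codons:
                    orfs.append((frame + 1, frame + 3 * start + 1, protein[start:idx]))
                start = None                      # stop codon closes it
    return sorted(orfs, key=lambda orf: len(orf[2]), reverse=True)
```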

4.4 BLASTp:
Now that we had the mutant protein sequences, we needed to find their structures by
homology modelling, which requires a template. The template was obtained using the Basic
Local Alignment Search Tool (BLAST). The BLASTp program, provided by NCBI, allows us to
search a protein database using a protein query.

For this, the following link was opened:

http://blast.ncbi.nlm.nih.gov/Blast.cgi

The link to BLASTp [encircled in red] was then opened. This is shown below in two
screenshots. In the first, the sequence is pasted in FASTA format; in the second, the job
title, the database and the algorithm are specified and the BLAST button is clicked.

The result was then obtained.



The results showed that the query sequence has 100% identity with the protein 2e2b
[Crystal structure of the c-Abl kinase domain in complex with INNO-406] in the PDB
database. The PDB file of this protein was downloaded from the PDB site.

4.5 HOMOLOGY MODELLING BY USING ICM MOLSOFT PRO:

After opening the application, <file – new> was selected from the toolbar. This opens a
dialogue box “new molecule/sequence/grob”, in which the protein sequence of the mutant
obtained from the ORF Finder was pasted in FASTA format, a sequence name was given and
OK was clicked.

The protein sequence is now loaded in the workspace. From the toolbar, <file—open> was
chosen and the template, i.e. 2e2b, was imported. A screenshot of this step is shown
below.

In this case, the template was already in object format. If the template is not in object
format, it does not appear in the workspace. In that case, the template can be converted
into an object using <ICM CONVERT—protein> under MolMechanics in the toolbar.

This opens a dialogue box “convert molecular object to.....”; choose the options as shown
and click OK.

When the status window shows <if yes cool a_0>, the conversion is complete.

The next step is to build the homology model. For this, “build model” under Homology in the
toolbar is clicked.

This opens a dialogue box “build model by homology”.



The “Sources” fields were chosen by us. The “Preferences” fields were left at their default
settings, and the options were chosen as shown.

Building the model takes a few minutes, after which the result is retrieved as shown below.

The middle box [marked in red] shows the alignment between the template and the protein
sequence uploaded for homology modelling. The molecule can be viewed using the viewing
tools on the right [marked in red]. The panel beneath the alignment box gives information
about the loops.

The workspace is saved by right-clicking the “icm” icon next to the protein sequence in
the selection space and choosing “save as”. The file is saved as a PDB file.

4.6 DOCKING-MOLEGRO VIRTUAL DOCKER:

After opening the application, the mutant protein model was imported via <File—import
molecule>, browsing to and selecting the molecule in its folder.

This opened a dialogue box “import molecules”, in which the required options were chosen.

This imported the molecule into the model/docking visualization window. The protein
molecule then has to be prepared for docking. For this, <preparation—prepare molecule> was
selected from the toolbar. A screenshot of this step is shown below.

This opened a dialogue box “prepare molecules”. The appropriate fields were chosen.

Next, the protein surface is created. For this, right-clicking the protein icon in the
workspace and choosing “create surface” opened a dialogue box “create surface”. The
appropriate fields were chosen and surface creation was initiated. Screenshots of these
steps are shown below.

Once the “probing of grid points” is done, the protein molecule’s surface is created. The
cavities [in green] of the protein molecule were then detected as shown below.

During cavity detection, the application opens a dialogue box “cavity prediction”, whose
fields were set as shown below.

Next, the drug “imatinib” was imported in the same way as the protein [first step].

Now the docking wizard was opened.



A series of dialogue boxes then opened, and the appropriate fields were selected as shown.

The fields marked by arrows were set by us; the unmarked fields are default settings.
Docking was initiated by clicking the Start button. Some screenshots taken during the
process are shown below.

The result obtained is shown below.

The MVD batch job dialogue box is then closed, and the results are imported into the
workspace as shown below.

Each docked pose can now be visualized by selecting the pose of interest.

This now opens up, showing the drug docked in the cavity.

CHAPTER - 5
RESULTS AND DISCUSSIONS

5.1 CLIENT’S RESEARCH DATA ON THE EXONIC MUTATIONS IN ABL1
TYROSINE KINASE GENE IN HUMANS:
POSITION | Type of SNP | 5' flanking sequence | 3' flanking sequence | Patient’s codes used for sequencing
165970 | Deletion -G | GGGG | TTTT | P264, P69, P56, P265, P106
166015 | T/G | AAAT | CTTA | P268, P114, P81, P171, P98
166024 | C/T | ATGT | TATA | P79
166109 | G/A | TCTA | ACTT | ON1, P56, P7, ON4
166200 | A/C | TGGA | TCCC | P30
166238 | G/C | CCAA | CCTT | 0N1, ON7
166248 | C/A | GAAA | AATG | P136
166307 | T/G | GCAG | GGGG | P47
166307 | Insertion -T | CAGT | GGGG | P274, P169
166369 | A/C | TGAA | GTTC | P95, ON9, P265, P259, P130, P88, P66, P257, P97, P250, P119, P257
166373 | C/A | AGTT | ACAG | 0N38
166375 | C/A | TTCA | AGAC | P115, P260, P119, P58
166410 | G/A | GGCA | AGGT | 0N41, P260, P263, P56, ON21
148967 | A/C | CACC | CGCT | P276, P25
148977 | A/C | CATT | TCCA | P66
148982 | G/A | TCCA | CCCC | P102
149029, 149030 | C/A, G/A | ACTA | ACAA | P143
149056 | C/A | CGGA | ATCA | P179
149119 | G/A | GAAA | AAAT | P273, P208, P25, P242, P246, HO1*, FO1 *E1 FORWARD PRIMER
149294 | C/A | CCGT | TTAT | P3, P265
149251 | C/A | CCTG | TGTC | P174, P256
149285 | C/A | TTGA | TTTT | P182, P255* (HETEROZYGOUS FAMILY DETAILS/HISTORY)
158108 | Insertion -A | CCTG | TCTG | P62, P110, P237
158238 | T/A | TTTT | CCTT | P70
158417 | C/A | GCGG | TCAC | P278, P214
158816 | A/G | TCCC | CACG | P81, P20, P16, P83, P183, P200, P81, P59, P277, P104, P42
158806 | T/C | GCCA | CTCC | P116, P135, P115
158896 | C/T | AATC | TTCA | ON48, ON46, P260, P117, P168, P38, P173
159110 | G/A | CTCA | ATCT | P115, P76, P43
159178 | C/A | GCAG | CTGC | P72, P9, P45, P194, P33, P92, P70, P90, P91
159229 | C/A | AAAG | CCCC | P173
160838 | G/T | CACT | TTTT | P262, P261, ON45, P179
160847 | G/T | AATT | CCGT | P182, ON35, ON7, P179, ON16, P128, P137
160938 | A/C | GTGA | GTGG | P257
160935 | T/G | TTTG | GAAG | ON45
160987 | G/A | CTTA | AAAT | P247, P79, P175, P201, P245, P158, P163, P246, P89, P128, P182, P100, P243, P176, P226, P214, ON45, P244, P184, P11, P174, P257, 0N30, P276
160983 | C/T | CTTT | TTAG | P235, P159, P233, P42, P244, P59, P51, P56, P76, P128, P126, P44, P7, P37, P74, ON50, P62, P176, P52, P280, P179, P133, P275, 0N45, P134, P261, ON21, P212, P228, P11, P119, P14, P135, P21, P66, P29, ON38, P243, P86, P10, P41, ON17, P45
161068 | C/A | ATGA | AGGG | P200, P210
161141 | C/A | CCTA | AACA | P213, P156
161136 | G/T | CCTG | CCTA | AO4 WELL E4 FORWARD
161153 | C/A | TCTC | ATCA | P201, P182, P33, P72
161196 | G/A | TGAA | TGGT | P247, P43, P136, P200, P137
161227 | T/A | TTTT | CTGC | ON41, P174, P214, P102, P26, P148, P225, P214, P154
164404 | A/C | TGGT | AAAT | P123
164443 | C/T | CCTT | TGAG | P242, P261, P183, P137, P138, P207, P141, P182, P269, P158, P277, P195, P129, P133, P132, P124, P120, P119
164469 | T/G | CTGA | TTTA | ON9
164490 | A/G | TGAA | TGCT | P131
164488 | A/C | AGTG | AATG | P132
164493 | C/G | AATG | TACA | 0N38, ON35
164530 | T/G | TTCT | TCAG | ON16, P141, P142, P150, P153, P260, P122
164607 | T/G | CAGG | GTAT | P261
164760 | C/A | TGTA | ACAA | P142
164762 | C/A | TACA | AAAG | P142
164764 | A/C | ACAC | AAGT | P146
164818 | A/G | AGCT | ATGT | P148, P260, P175, P245, P136, P36, P185

In the above table, the mutations marked with stars lie in exonic regions and the others
are intronic. Because introns are removed by splicing after transcription, the intronic
mutations are not taken into account here. In all, there are 16 exonic mutations. Since P25
has a double mutation [148967 + 149119], there are 17 different mutation cases in the data
given.

5.2 Database search for wild-type mRNA and protein sequences:

mRNA sequence:
>gi|62362411|ref|NM_007313.2| Homo sapiens c-abl oncogene 1, receptor tyrosine kinase (ABL1), transcript variant b, mRNA

GGTTGGTGACTTCCACAGGAAAAGTTCTGGAGGAGTAGCCAAAGACCATCAGCGTTTCCT
TTATGTGTGAGAATTGAAATGACTAGCATTATTGACCCTTTTCAGCATCCCCTGTGAATATTT
CTGTTTAGGTTTTTCTTCTTGAAAAGAAATTGTTATTCAGCCCGTTTAAAACAAATCAAGA
AACTTTTGGGTAACATTGCAATTACATGAAATTGATAACCGCGAAAATAATTGGAACTCCT
GCTTGCAAGTGTCAACCTAAAAAAAGTGCTTCCTTTTGTTATGGAAGATGTCTTTCTGTGA
TTGACTTCAATTGCTGACTTGTGGAGATGCAGCGAATGTGAAATCCCACGTATATGCCATTT
CCCTCTACGCTCGCTGACCGTTCTGGAAGATCTTGAACCCTCTTCTGGAAAGGGGTACCTA
TTATTACTTTATGGGGCAGCAGCCTGGAAAAGTACTTGGGGACCAAAGAAGGCCAAGCTT
GCCTGCCCTGCATTTTATCAAAGGAGCAGGGAAGAAGGAATCATCGAGGCATGGGGGTCC
ACACTGCAATGTTTTTGTGGAACATGAAGCCCTTCAGCGGCCAGTAGCATCTGACTTTGAG
CCTCAGGGTCTGAGTGAAGCCGCTCGTTGGAACTCCAAGGAAAACCTTCTCGCTGGACCC
AGTGAAAATGACCCCAACCTTTTCGTTGCACTGTATGATTTTGTGGCCAGTGGAGATAACA
CTCTAAGCATAACTAAAGGTGAAAAGCTCCGGGTCTTAGGCTATAATCACAATGGGGAATG
GTGTGAAGCCCAAACCAAAAATGGCCAAGGCTGGGTCCCAAGCAACTACATCACGCCAG
TCAACAGTCTGGAGAAACACTCCTGGTACCATGGGCCTGTGTCCCGCAATGCCGCTGAGT
ATCTGCTGAGCAGCGGGATCAATGGCAGCTTCTTGGTGCGTGAGAGTGAGAGCAGTCCTG
GCCAGAGGTCCATCTCGCTGAGATACGAAGGGAGGGTGTACCATTACAGGATCAACACTG
CTTCTGATGGCAAGCTCTACGTCTCCTCCGAGAGCCGCTTCAACACCCTGGCCGAGTTGGT
TCATCATCATTCAACGGTGGCCGACGGGCTCATCACCACGCTCCATTATCCAGCCCCAAAG
CGCAACAAGCCCACTGTCTATGGTGTGTCCCCCAACTACGACAAGTGGGAGATGGAACGC
ACGGACATCACCATGAAGCACAAGCTGGGCGGGGGCCAGTACGGGGAGGTGTACGAGGG
CGTGTGGAAGAAATACAGCCTGACGGTGGCCGTGAAGACCTTGAAGGAGGACACCATGG
AGGTGGAAGAGTTCTTGAAAGAAGCTGCAGTCATGAAAGAGATCAAACACCCTAACCTG
GTGCAGCTCCTTGGGGTCTGCACCCGGGAGCCCCCGTTCTATATCATCACTGAGTTCATGA
CCTACGGGAACCTCCTGGACTACCTGAGGGAGTGCAACCGGCAGGAGGTGAACGCCGTG
GTGCTGCTGTACATGGCCACTCAGATCTCGTCAGCCATGGAGTACCTGGAGAAGAAAAAC
TTCATCCACAGAGATCTTGCTGCCCGAAACTGCCTGGTAGGGGAGAACCACTTGGTGAAG
GTAGCTGATTTTGGCCTGAGCAGGTTGATGACAGGGGACACCTACACAGCCCATGCTGGA
GCCAAGTTCCCCATCAAATGGACTGCACCCGAGAGCCTGGCCTACAACAAGTTCTCCATC
AAGTCCGACGTCTGGGCATTTGGAGTATTGCTTTGGGAAATTGCTACCTATGGCATGTCCC
CTTACCCGGGAATTGACCTGTCCCAGGTGTATGAGCTGCTAGAGAAGGACTACCGCATGG
AGCGCCCAGAAGGCTGCCCAGAGAAGGTCTATGAACTCATGCGAGCATGTTGGCAGTGGA
ATCCCTCTGACCGGCCCTCCTTTGCTGAAATCCACCAAGCCTTTGAAACAATGTTCCAGGA
ATCCAGTATCTCAGACGAAGTGGAAAAGGAGCTGGGGAAACAAGGCGTCCGTGGGGCTG
TGAGTACCTTGCTGCAGGCCCCAGAGCTGCCCACCAAGACGAGGACCTCCAGGAGAGCT
GCAGAGCACAGAGACACCACTGACGTGCCTGAGATGCCTCACTCCAAGGGCCAGGGAGA
GAGCGATCCTCTGGACCATGAGCCTGCCGTGTCTCCATTGCTCCCTCGAAAAGAGCGAGG
TCCCCCGGAGGGCGGCCTGAATGAAGATGAGCGCCTTCTCCCCAAAGACAAAAAGACCA
ACTTGTTCAGCGCCTTGATCAAGAAGAAGAAGAAGACAGCCCCAACCCCTCCCAAACGC
AGCAGCTCCTTCCGGGAGATGGACGGCCAGCCGGAGCGCAGAGGGGCCGGCGAGGAAG
AGGGCCGAGACATCAGCAACGGGGCACTGGCTTTCACCCCCTTGGACACAGCTGACCCA
GCCAAGTCCCCAAAGCCCAGCAATGGGGCTGGGGTCCCCAATGGAGCCCTCCGGGAGTC
CGGGGGCTCAGGCTTCCGGTCTCCCCACCTGTGGAAGAAGTCCAGCACGCTGACCAGCA
GCCGCCTAGCCACCGGCGAGGAGGAGGGCGGTGGCAGCTCCAGCAAGCGCTTCCTGCGC
TCTTGCTCCGCCTCCTGCGTTCCCCATGGGGCCAAGGACACGGAGTGGAGGTCAGTCACG
CTGCCTCGGGACTTGCAGTCCACGGGAAGACAGTTTGACTCGTCCACATTTGGAGGGCAC
AAAAGTGAGAAGCCGGCTCTGCCTCGGAAGAGGGCAGGGGAGAACAGGTCTGACCAGG
TGACCCGAGGCACAGTAACGCCTCCCCCCAGGCTGGTGAAAAAGAATGAGGAAGCTGCT
GATGAGGTCTTCAAAGACATCATGGAGTCCAGCCCGGGCTCCAGCCCGCCCAACCTGACT
CCAAAACCCCTCCGGCGGCAGGTCACCGTGGCCCCTGCCTCGGGCCTCCCCCACAAGGA
AGAAGCTGGAAAGGGCAGTGCCTTAGGGACCCCTGCTGCAGCTGAGCCAGTGACCCCCA
CCAGCAAAGCAGGCTCAGGTGCACCAGGGGGCACCAGCAAGGGCCCCGCCGAGGAGTC
CAGAGTGAGGAGGCACAAGCACTCCTCTGAGTCGCCAGGGAGGGACAAGGGGAAATTGT
CCAGGCTCAAACCTGCCCCGCCGCCCCCACCAGCAGCCTCTGCAGGGAAGGCTGGAGGA
AAGCCCTCGCAGAGCCCGAGCCAGGAGGCGGCCGGGGAGGCAGTCCTGGGCGCAAAGA
CAAAAGCCACGAGTCTGGTTGATGCTGTGAACAGTGACGCTGCCAAGCCCAGCCAGCCG
GGAGAGGGCCTCAAAAAGCCCGTGCTCCCGGCCACTCCAAAGCCACAGTCCGCCAAGCC
GTCGGGGACCCCCATCAGCCCAGCCCCCGTTCCCTCCACGTTGCCATCAGCATCCTCGGCC
CTGGCAGGGGACCAGCCGTCTTCCACCGCCTTCATCCCTCTCATATCAACCCGAGTGTCTC
TTCGGAAAACCCGCCAGCCTCCAGAGCGGATCGCCAGCGGCGCCATCACCAAGGGCGTG
GTCCTGGACAGCACCGAGGCGCTGTGCCTCGCCATCTCTAGGAACTCCGAGCAGATGGCC
AGCCACAGCGCAGTGCTGGAGGCCGGCAAAAACCTCTACACGTTCTGCGTGAGCTATGTG
GATTCCATCCAGCAAATGAGGAACAAGTTTGCCTTCCGAGAGGCCATCAACAAACTGGAG
AATAATCTCCGGGAGCTTCAGATCTGCCCGGCGACAGCAGGCAGTGGTCCAGCGGCCACT
CAGGACTTCAGCAAGCTCCTCAGTTCGGTGAAGGAAATCAGTGACATAGTGCAGAGGTAG
CAGCAGTCAGGGGTCAGGTGTCAGGCCCGTCGGAGCTGCCTGCAGCACATGCGGGCTCG
CCCATACCCGTGACAGTGGCTGACAAGGGACTAGTGAGTCAGCACCTTGGCCCAGGAGCT
CTGCGCCAGGCAGAGCTGAGGGCCCTGTGGAGTCCAGCTCTACTACCTACGTTTGCACCG
CCTGCCCTCCCGCACCTTCCTCCTCCCCGCTCCGTCTCTGTCCTCGAATTTTATCTGTGGAG
TTCCTGCTCCGTGGACTGCAGTCGGCATGCCAGGACCCGCCAGCCCCGCTCCCACCTAGT
GCCCCAGACTGAGCTCTCCAGGCCAGGTGGGAACGGCTGATGTGGACTGTCTTTTTCATTT
TTTTCTCTCTGGAGCCCCTCCTCCCCCGGCTGGGCCTCCTTCTTCCACTTCTCCAAGAATG
GAAGCCTGAACTGAGGCCTTGTGTGTCAGGCCCTCTGCCTGCACTCCCTGGCCTTGCCCG
TCGTGTGCTGAAGACATGTTTCAAGAACCGCATTTCGGGAAGGGCATGCACGGGCATGCA
CACGGCTGGTCACTCTGCCCTCTGCTGCTGCCCGGGGTGGGGTGCACTCGCCATTTCCTCA
CGTGCAGGACAGCTCTTGATTTGGGTGGAAAACAGGGTGCTAAAGCCAACCAGCCTTTGG
GTCCTGGGCAGGTGGGAGCTGAAAAGGATCGAGGCATGGGGCATGTCCTTTCCATCTGTC
CACATCCCCAGAGCCCAGCTCTTGCTCTCTTGTGACGTGCACTGTGAATCCTGGCAAGAA
AGCTTGAGTCTCAAGGGTGGCAGGTCACTGTCACTGCCGACATCCCTCCCCCAGCAGAAT
GGAGGCAGGGGACAAGGGAGGCAGTGGCTAGTGGGGTGAACAGCTGGTGCCAAATAGCC
CCAGACTGGGCCCAGGCAGGTCTGCAAGGGCCCAGAGTGAACCGTCCTTTCACACATCTG
GGTGCCCTGAAAGGGCCCTTCCCCTCCCCCACTCCTCTAAGACAAAGTAGATTCTTACAAG
GCCCTTTCCTTTGGAACAAGACAGCCTTCACTTTTCTGAGTTCTTGAAGCATTTCAAAGCC
CTGCCTCTGTGTAGCCGCCCTGAGAGAGAATAGAGCTGCCACTGGGCACCTGCGCACAGG
TGGGAGGAAAGGGCCTGGCCAGTCCTGGTCCTGGCTGCACTCTTGAACTGGGCGAATGTC
TTATTTAATTACCGTGAGTGACATAGCCTCATGTTCTGTGGGGGTCATCAGGGAGGGTTAG
GAAAACCACAAACGGAGCCCCTGAAAGCCTCACGTATTTCACAGAGCACGCCTGCCATCT
TCTCCCCGAGGCTGCCCCAGGCCGGAGCCCAGATACGGGGGCTGTGACTCTGGGCAGGG
ACCCGGGGTCTCCTGGACCTTGACAGAGCAGCTAACTCCGAGAGCAGTGGGCAGGTGGC
CGCCCCTGAGGCTTCACGCCGGGAGAAGCCACCTTCCCACCCCTTCATACCGCCTCGTGC
CAGCAGCCTCGCACAGGCCCTAGCTTTACGCTCATCACCTAAACTTGTACTTTATTTTTCTG
ATAGAAATGGTTTCCTCTGGATCGTTTTATGCGGTTCTTACAGCACATCACCTCTTTGCCCC
CGACGGCTGTGACGCAGCCGGAGGGAGGCACTAGTCACCGACAGCGGCCTTGAAGACAG
AGCAAAGCGCCCACCCAGGTCCCCCGACTGCCTGTCTCCATGAGGTACTGGTCCCTTCCTT
TTGTTAACGTGATGTGCCACTATATTTTACACGTATCTCTTGGTATGCATCTTTTATAGACGC
TCTTTTCTAAGTGGCGTGTGCATAGCGTCCTGCCCTGCCCCCTCGGGGGCCTGTGGTGGCT
CCCCCTCTGCTTCTCGGGGTCCAGTGCATTTTGTTTCTGTATATGATTCTCTGTGGTTTTTTT
TGAATCCAAATCTGTCCTCTGTAGTATTTTTTAAATAAATCAGTGTTTACATTAGAA

Wild-type protein sequence:

>gi|62362412|ref|NP_009297.2| c-abl oncogene 1, receptor tyrosine kinase isoform b [Homo sapiens]
MGQQPGKVLGDQRRPSLPALHFIKGAGKKESSRHGGPHCNVFVEHEALQRPVASDFEPQGLSEAARWN
SKENLLAGPSENDPNLFVALYDFVASGDNTLSITKGEKLRVLGYNHNGEWCEAQTKNGQGWVPSNYIT
PVNSLEKHSWYHGPVSRNAAEYLLSSGINGSFLVRESESSPGQRSISLRYEGRVYHYRINTASDGKLYVS
SESRFNTLAELVHHHSTVADGLITTLHYPAPKRNKPTVYGVSPNYDKWEMERTDITMKHKLGGGQYGE
VYEGVWKKYSLTVAVKTLKEDTMEVEEFLKEAAVMKEIKHPNLVQLLGVCTREPPFYIITEFMTYGNLL
DYLRECNRQEVNAVVLLYMATQISSAMEYLEKKNFIHRDLAARNCLVGENHLVKVADFGLSRLMTGDT
YTAHAGAKFPIKWTAPESLAYNKFSIKSDVWAFGVLLWEIATYGMSPYPGIDLSQVYELLEKDYRMERP
EGCPEKVYELMRACWQWNPSDRPSFAEIHQAFETMFQESSISDEVEKELGKQGVRGAVSTLLQAPELPT
KTRTSRRAAEHRDTTDVPEMPHSKGQGESDPLDHEPAVSPLLPRKERGPPEGGLNEDERLLPKDKKTNL
FSALIKKKKKTAPTPPKRSSSFREMDGQPERRGAGEEEGRDISNGALAFTPLDTADPAKSPKPSNGAGVP
NGALRESGGSGFRSPHLWKKSSTLTSSRLATGEEEGGGSSSKRFLRSCSASCVPHGAKDTEWRSVTLPR
DLQSTGRQFDSSTFGGHKSEKPALPRKRAGENRSDQVTRGTVTPPPRLVKKNEEAADEVFKDIMESSPG
SSPPNLTPKPLRRQVTVAPASGLPHKEEAGKGSALGTPAAAEPVTPTSKAGSGAPGGTSKGPAEESRVRR
HKHSSESPGRDKGKLSRLKPAPPPPPAASAGKAGGKPSQSPSQEAAGEAVLGAKTKATSLVDAVNSDAA
KPSQPGEGLKKPVLPATPKPQSAKPSGTPISPAPVPSTLPSASSALAGDQPSSTAFIPLISTRVSLRKTRQPP
ERIASGAITKGVVLDSTEALCLAISRNSEQMASHSAVLEAGKNLYTFCVSYVDSIQQMRNKFAFREAINK
LENNLRELQICPATAGSGPAATQDFSKLLSSVKEISDIVQR

5.3 ORF finder results:


With the help of the ORF Finder, the reading frames of the mutant [edited] nucleotide
sequences were identified. The results are shown below:

MUTANT NAME | A.A.R. CHANGE WITH POSITION | VALIDITY
148967 | T231P | Protein created
148977 | Y234S | Protein created
148982 | A236T | Protein created
148929+148930 | STOP CODON | Protein truncated
149056 | D260E | Protein created
149119 | Q [NO CHANGE IN A.A.R] | Protein created
148967+149119 [P25] | T231P; Q [NO CHANGE] | Protein created
159110 | N [NO CHANGE IN A.A.R] | Protein created
161068 | STOP CODON | Protein truncated
161136 | STOP CODON | Protein truncated
161141 | STOP CODON | Protein truncated
161153 | STOP CODON | Protein truncated
164607 | V467G | Protein created
166200 | N498T | Protein created
166238 | A511P | Protein created
166248 | T514K | Protein created

T = Threonine; P = Proline; Y = Tyrosine; S = Serine; A = Alanine; D = Aspartic acid; E = Glutamic acid; Q = Glutamine; N = Asparagine; V = Valine; G = Glycine; K = Lysine

In this table and hereafter, T514K means that the T (threonine) at position 514 of the
wild-type protein has been replaced by K (lysine) in the mutant. The table shows 5 mutation
cases in which the mutant protein is truncated. A truncated protein cannot form an active
protein, so these mutation cases were omitted from the subsequent steps. The table also shows
3 mutation cases in which, although the nucleotide sequence is changed, the altered codon codes
for the same amino acid, so the protein is unchanged.
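The classification in the table above can be automated by comparing each translated mutant protein with the wild-type protein. The sketch assumes both proteins were translated from the same start codon. It uses the notation just defined (e.g. T514K, with 1-based positions). `describe_mutation` is a hypothetical helper.

```python
def describe_mutation(wild_type, mutant):
    """Compare a mutant protein with the wild type (one-letter strings).

    Returns substitutions in the form "T514K" (joined with "; " if several),
    "no change" for a silent mutation, or a truncation note when the mutant
    protein is shorter (premature stop codon)."""
    if len(mutant) < len(wild_type):
        return f"truncated after residue {len(mutant)}"
    changes = [
        f"{wt}{pos}{mt}"
        for pos, (wt, mt) in enumerate(zip(wild_type, mutant), start=1)
        if wt != mt
    ]
    if len(mutant) > len(wild_type):
        changes.append(f"extended by {len(mutant) - len(wild_type)} residues")
    return "; ".join(changes) if changes else "no change"
```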

5.4 BLASTp results:

The BLASTp search of the wild-type Abl tyrosine kinase protein against the PDB database gave
the following result. Only the top few hits are shown here.

gi|62362412|ref|NP_009297.2| c-abl oncogene...


Query ID: lcl|64796
Description:
gi|62362412|ref|NP_009297.2| c-abl oncogene 1, receptor tyrosine kinase isoform b [Homo sapiens]
Molecule type: amino acid
Query Length: 1149
Database Name: pdb
Description: PDB protein database
Program: BLASTP 2.2.20+

Search Parameters
Program blastp
Word size 3
Expect value 10
Hitlist size 100
Gapcosts 11,1
Matrix BLOSUM62
Threshold 11
Composition-based stats 2
Filter string F
Genetic Code 1
Window Size 40
Database

Posted date May 17, 2009 5:41 PM
Number of letters 9,422,204
Number of sequences 41,234
Entrez query none
Karlin-Altschul statistics
Params Ungapped Gapped
Lambda 0.311071 0.267
K 0.12901 0.041
H 0.377932 0.14
Results Statistics
Length adjustment 106
Effective length of query 1043


Effective length of database 5051400
Effective search space 5268610200
Effective search space used 5268610200
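The search statistics above can be checked with the standard Karlin-Altschul relations, using the length adjustment ℓ. The effective query length is m' = m - ℓ. The effective database length is n' = N - (number of sequences) × ℓ. The bit score is S' = (λS - ln K)/ln 2, and E = m'n'·2^(-S'). The sketch below plugs in the values from this report (gapped λ = 0.267, K = 0.041). It reproduces the effective search space, and the bit scores 1128 and 1033 from the raw scores 2917 (1OPL) and 2670 (1OPK) given in the alignments. For 2E2B it gives an E-value that agrees with the reported 9e-177 within the rounding of the displayed bit score. The function names are hypothetical.

```python
import math


def effective_lengths(query_length, db_letters, db_sequences, length_adjustment):
    """Effective query length, effective database length and search space."""
    eff_query = query_length - length_adjustment
    eff_db = db_letters - db_sequences * length_adjustment
    return eff_query, eff_db, eff_query * eff_db


def bit_score(raw_score, lam, k):
    """Normalised score S' = (lambda * S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def e_value(bits, search_space):
    """Expected number of chance hits: E = m' * n' * 2 ** (-S')."""
    return search_space * 2.0 ** (-bits)


# Values taken from the report (gapped lambda and K)
eff_q, eff_db, space = effective_lengths(1149, 9_422_204, 41_234, 106)
print(eff_q, eff_db, space)                   # 1043 5051400 5268610200
print(round(bit_score(2917, 0.267, 0.041)))   # 1128 (1OPL)
print(f"{e_value(617, space):.1e}")           # 9.7e-177 (2E2B, reported as 9e-177)
```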

Descriptions

Sequences producing significant alignments: Score (Bits) E Value

pdb|1OPL|A Chain A, Structural Basis For The Auto-Inhibition ... 1128 0.0
pdb|1OPK|A Chain A, Structural Basis For The Auto-Inhibition ... 1033 0.0
pdb|2FO0|A Chain A, Organization Of The Sh3-Sh2 Unit In Activ... 1021 0.0
pdb|2E2B|A Chain A, Crystal Structure Of The C-Abl Kinase Dom... 617 9e-177
pdb|2QOH|A Chain A, Crystal Structure Of Abl Kinase Bound Wit... 611 5e-175
pdb|1FPU|A Chain A, Crystal Structure Of Abl Kinase Domain In... 611 5e-175
pdb|2G1T|A Chain A, A Src-Like Inactive Conformation In The A... 611 5e-175
pdb|2F4J|A Chain A, Structure Of The Kinase Domain Of An Imat... 610 8e-175

>pdb|1OPL|A Chain A, Structural Basis For The Auto-Inhibition Of C-Abl Tyrosine Kinase
pdb|1OPL|B Chain B, Structural Basis For The Auto-Inhibition Of C-Abl Tyrosine Kinase
Length=537

Score = 1128 bits (2917), Expect = 0.0, Method: Compositional matrix adjust.
Identities = 528/534 (98%), Positives = 533/534 (99%), Gaps = 0/534 (0%)

Query 1 MGQQPGKVLGDQRRPSLPALHFIKGAGKKESSRHGGPHCNVFVEHEALQRPVASDFEPQG 60
MGQQPGKVLGDQRRPSLPALHFIKGAGK++SSRHGGPHCNVFVEHEALQRPVASDFEPQG
Sbjct 1 MGQQPGKVLGDQRRPSLPALHFIKGAGKRDSSRHGGPHCNVFVEHEALQRPVASDFEPQG 60

Query 61 LSEAARWNSKENLLAGPSENDPNLFVALYDFVASGDNTLSITKGEKLRVLGYNHNGEWCE 120
LSEAARWNSKENLLAGPSENDPNLFVALYDFVASGDNTLSITKGEKLRVLGYNHNGEWCE
Sbjct 61 LSEAARWNSKENLLAGPSENDPNLFVALYDFVASGDNTLSITKGEKLRVLGYNHNGEWCE 120

Query 121 AQTKNGQGWVPSNYITPVNSLEKHSWYHGPVSRNAAEYLLSSGINGSFLVRESESSPGQR 180
AQTKNGQGWVPSNYITPVNSLEKHSWYHGPVSRNAAEYLLSSGINGSFLVRESESSPGQR
Sbjct 121 AQTKNGQGWVPSNYITPVNSLEKHSWYHGPVSRNAAEYLLSSGINGSFLVRESESSPGQR 180

Query 181 SISLRYEGRVYHYRINTASDGKLYVSSESRFNTLAELVHHHSTVADGLITTLHYPAPKRN 240
SISLRYEGRVYHYRINTASDGKLYVSSESRFNTLAELVHHHSTVADGLITTLHYPAPKRN
Sbjct 181 SISLRYEGRVYHYRINTASDGKLYVSSESRFNTLAELVHHHSTVADGLITTLHYPAPKRN 240

Query 241 KPTVYGVSPNYDKWEMERTDITMKHKLGGGQYGEVYEGVWKKYSLTVAVKTLKEDTMEVE 300
KPTVYGVSPNYDKWEMERTDITMKHKLGGGQYGEVYEGVWKKYSLTVAVKTLKEDTMEVE
Sbjct 241 KPTVYGVSPNYDKWEMERTDITMKHKLGGGQYGEVYEGVWKKYSLTVAVKTLKEDTMEVE 300

Query 301 EFLKEAAVMKEIKHPNLVQLLGVCTREPPFYIITEFMTYGNLLDYLRECNRQEVNAVVLL 360
EFLKEAAVMKEIKHPNLVQLLGVCTREPPFYIITEFMTYGNLLDYLRECNRQEVNAVVLL
Sbjct 301 EFLKEAAVMKEIKHPNLVQLLGVCTREPPFYIITEFMTYGNLLDYLRECNRQEVNAVVLL 360

Query 361 YMATQISSAMEYLEKKNFIHRDLAARNCLVGENHLVKVADFGLSRLMTGDTYTAHAGAKF 420
YMATQISSAMEYLEKKNFIHR+LAARNCLVGENHLVKVADFGLSRLMTGDTYTAHAGAKF
Sbjct 361 YMATQISSAMEYLEKKNFIHRNLAARNCLVGENHLVKVADFGLSRLMTGDTYTAHAGAKF 420

Query 421 PIKWTAPESLAYNKFSIKSDVWAFGVLLWEIATYGMSPYPGIDLSQVYELLEKDYRMERP 480
PIKWTAPESLAYNKFSIKSDVWAFGVLLWEIATYGMSPYPGIDLSQVYELLEKDYRMERP
Sbjct 421 PIKWTAPESLAYNKFSIKSDVWAFGVLLWEIATYGMSPYPGIDLSQVYELLEKDYRMERP 480

Query 481 EGCPEKVYELMRACWQWNPSDRPSFAEIHQAFETMFQESSISDEVEKELGKQGV 534
EGCPEKVYELMRACWQWNPSDRPSFAEIHQAFETMFQESSISDEVEKELGK+ +
Sbjct 481 EGCPEKVYELMRACWQWNPSDRPSFAEIHQAFETMFQESSISDEVEKELGKENL 534

>pdb|1OPK|A Chain A, Structural Basis For The Auto-Inhibition Of C-Abl Tyrosine
Kinase
Length=495

Score = 1033 bits (2670), Expect = 0.0, Method: Compositional matrix adjust.
Identities = 484/488 (99%), Positives = 488/488 (100%), Gaps = 0/488 (0%)

Query  46   EALQRPVASDFEPQGLSEAARWNSKENLLAGPSENDPNLFVALYDFVASGDNTLSITKGE  105
            EALQRPVASDFEPQGLSEAARWNSKENLLAGPSENDPNLFVALYDFVASGDNTLSITKGE
Sbjct  7    EALQRPVASDFEPQGLSEAARWNSKENLLAGPSENDPNLFVALYDFVASGDNTLSITKGE  66

Query  106  KLRVLGYNHNGEWCEAQTKNGQGWVPSNYITPVNSLEKHSWYHGPVSRNAAEYLLSSGIN  165
            KLRVLGYNHNGEWCEAQTKNGQGWVPSNYITPVNSLEKHSWYHGPVSRNAAEYLLSSGIN
Sbjct  67   KLRVLGYNHNGEWCEAQTKNGQGWVPSNYITPVNSLEKHSWYHGPVSRNAAEYLLSSGIN  126

Query  166  GSFLVRESESSPGQRSISLRYEGRVYHYRINTASDGKLYVSSESRFNTLAELVHHHSTVA  225
            GSFLVRESESSPGQRSISLRYEGRVYHYRINTASDGKLYVSSESRFNTLAELVHHHSTVA
Sbjct  127  GSFLVRESESSPGQRSISLRYEGRVYHYRINTASDGKLYVSSESRFNTLAELVHHHSTVA  186

Query  226  DGLITTLHYPAPKRNKPTVYGVSPNYDKWEMERTDITMKHKLGGGQYGEVYEGVWKKYSL  285
            DGLITTLHYPAPKRNKPT+YGVSPNYDKWEMERTDITMKHKLGGGQYGEVYEGVWKKYSL
Sbjct  187  DGLITTLHYPAPKRNKPTIYGVSPNYDKWEMERTDITMKHKLGGGQYGEVYEGVWKKYSL  246

Query  286  TVAVKTLKEDTMEVEEFLKEAAVMKEIKHPNLVQLLGVCTREPPFYIITEFMTYGNLLDY  345
            TVAVKTLKEDTMEVEEFLKEAAVMKEIKHPNLVQLLGVCTREPPFYIITEFMTYGNLLDY
Sbjct  247  TVAVKTLKEDTMEVEEFLKEAAVMKEIKHPNLVQLLGVCTREPPFYIITEFMTYGNLLDY  306

Query  346  LRECNRQEVNAVVLLYMATQISSAMEYLEKKNFIHRDLAARNCLVGENHLVKVADFGLSR  405
            LRECNRQEV+AVVLLYMATQISSAMEYLEKKNFIHR+LAARNCLVGENHLVKVADFGLSR
Sbjct  307  LRECNRQEVSAVVLLYMATQISSAMEYLEKKNFIHRNLAARNCLVGENHLVKVADFGLSR  366

Query  406  LMTGDTYTAHAGAKFPIKWTAPESLAYNKFSIKSDVWAFGVLLWEIATYGMSPYPGIDLS  465
            LMTGDTYTAHAGAKFPIKWTAPESLAYNKFSIKSDVWAFGVLLWEIATYGMSPYPGIDLS
Sbjct  367  LMTGDTYTAHAGAKFPIKWTAPESLAYNKFSIKSDVWAFGVLLWEIATYGMSPYPGIDLS  426

Query  466  QVYELLEKDYRMERPEGCPEKVYELMRACWQWNPSDRPSFAEIHQAFETMFQESSISDEV  525
            QVYELLEKDYRMERPEGCPEKVYELMRACWQWNPSDRPSFAEIHQAFETMFQESSISDEV
Sbjct  427  QVYELLEKDYRMERPEGCPEKVYELMRACWQWNPSDRPSFAEIHQAFETMFQESSISDEV  486

Query  526  EKELGKQG  533
            EKELGK+G
Sbjct  487  EKELGKRG  494

>pdb|2E2B|A Chain A, Crystal Structure Of The C-Abl Kinase Domain In Complex
With Inno-406
pdb|2E2B|B Chain B, Crystal Structure Of The C-Abl Kinase Domain In Complex
With Inno-406
Length=293

Score = 617 bits (1590), Expect = 9e-177, Method: Compositional matrix adjust.
Identities = 287/287 (100%), Positives = 287/287 (100%), Gaps = 0/287 (0%)

Query  248  SPNYDKWEMERTDITMKHKLGGGQYGEVYEGVWKKYSLTVAVKTLKEDTMEVEEFLKEAA  307
            SPNYDKWEMERTDITMKHKLGGGQYGEVYEGVWKKYSLTVAVKTLKEDTMEVEEFLKEAA
Sbjct  7    SPNYDKWEMERTDITMKHKLGGGQYGEVYEGVWKKYSLTVAVKTLKEDTMEVEEFLKEAA  66

Query  308  VMKEIKHPNLVQLLGVCTREPPFYIITEFMTYGNLLDYLRECNRQEVNAVVLLYMATQIS  367
            VMKEIKHPNLVQLLGVCTREPPFYIITEFMTYGNLLDYLRECNRQEVNAVVLLYMATQIS
Sbjct  67   VMKEIKHPNLVQLLGVCTREPPFYIITEFMTYGNLLDYLRECNRQEVNAVVLLYMATQIS  126

Query  368  SAMEYLEKKNFIHRDLAARNCLVGENHLVKVADFGLSRLMTGDTYTAHAGAKFPIKWTAP  427
            SAMEYLEKKNFIHRDLAARNCLVGENHLVKVADFGLSRLMTGDTYTAHAGAKFPIKWTAP
Sbjct  127  SAMEYLEKKNFIHRDLAARNCLVGENHLVKVADFGLSRLMTGDTYTAHAGAKFPIKWTAP  186

Query  428  ESLAYNKFSIKSDVWAFGVLLWEIATYGMSPYPGIDLSQVYELLEKDYRMERPEGCPEKV  487
            ESLAYNKFSIKSDVWAFGVLLWEIATYGMSPYPGIDLSQVYELLEKDYRMERPEGCPEKV
Sbjct  187  ESLAYNKFSIKSDVWAFGVLLWEIATYGMSPYPGIDLSQVYELLEKDYRMERPEGCPEKV  246

Query  488  YELMRACWQWNPSDRPSFAEIHQAFETMFQESSISDEVEKELGKQGV  534
            YELMRACWQWNPSDRPSFAEIHQAFETMFQESSISDEVEKELGKQGV
Sbjct  247  YELMRACWQWNPSDRPSFAEIHQAFETMFQESSISDEVEKELGKQGV  293

>pdb|2QOH|A Chain A, Crystal Structure Of Abl Kinase Bound With Ppy-A
pdb|2QOH|B Chain B, Crystal Structure Of Abl Kinase Bound With Ppy-A
Length=288

Score = 611 bits (1575), Expect = 5e-175, Method: Compositional matrix adjust.
Identities = 284/286 (99%), Positives = 286/286 (100%), Gaps = 0/286 (0%)

Query  248  SPNYDKWEMERTDITMKHKLGGGQYGEVYEGVWKKYSLTVAVKTLKEDTMEVEEFLKEAA  307
            SPNYDKWEMERTDITMKHKLGGGQYGEVYEGVWKKYSLTVAVKTLKEDTMEVEEFLKEAA
Sbjct  2    SPNYDKWEMERTDITMKHKLGGGQYGEVYEGVWKKYSLTVAVKTLKEDTMEVEEFLKEAA  61

Query  308  VMKEIKHPNLVQLLGVCTREPPFYIITEFMTYGNLLDYLRECNRQEVNAVVLLYMATQIS  367
            VMKEIKHPNLVQLLGVCTREPPFYIITEFMTYGNLLDYLRECNRQEV+AVVLLYMATQIS
Sbjct  62   VMKEIKHPNLVQLLGVCTREPPFYIITEFMTYGNLLDYLRECNRQEVSAVVLLYMATQIS  121

Query  368  SAMEYLEKKNFIHRDLAARNCLVGENHLVKVADFGLSRLMTGDTYTAHAGAKFPIKWTAP  427
            SAMEYLEKKNFIHRDLAARNCLVGENHLVKVADFGLSRLMTGDTYTAHAGAKFPIKWTAP
Sbjct  122  SAMEYLEKKNFIHRDLAARNCLVGENHLVKVADFGLSRLMTGDTYTAHAGAKFPIKWTAP  181

Query  428  ESLAYNKFSIKSDVWAFGVLLWEIATYGMSPYPGIDLSQVYELLEKDYRMERPEGCPEKV  487
            ESLAYNKFSIKSDVWAFGVLLWEIATYGMSPYPGIDLSQVYELLEKDYRMERPEGCPEKV
Sbjct  182  ESLAYNKFSIKSDVWAFGVLLWEIATYGMSPYPGIDLSQVYELLEKDYRMERPEGCPEKV  241

Query  488  YELMRACWQWNPSDRPSFAEIHQAFETMFQESSISDEVEKELGKQG  533
            YELMRACWQWNPSDRPSFAEIHQAFETMFQESSISDEVEKELGK+G
Sbjct  242  YELMRACWQWNPSDRPSFAEIHQAFETMFQESSISDEVEKELGKRG  287

Of the alignments above, only 2E2B [Crystal Structure Of The C-Abl Kinase Domain In Complex
With Inno-406] gave 100% identity (287/287 residues, covering query positions 248-534). Its
structure was downloaded from the PDB and used as the template for homology modelling in ICM Molsoft.
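The Identities and Positives figures in the BLAST report can be recomputed from each pairwise block: in the middle (match) line a letter marks an identical residue, "+" a conservative substitution and a blank a non-conservative one. The Python sketch below assumes that convention; the helper names `alignment_stats` and `percent_identity` are hypothetical. It also reproduces the template choice by ranking the four hits on percent identity alone.

```python
from typing import Dict, List, Tuple


def alignment_stats(query: str, match: str, subject: str,
                    query_start: int) -> Tuple[int, int, List[str]]:
    """Count identities and positives in one BLAST block and list substitutions.

    Substitutions are written as <query residue><query position><subject residue>,
    with "(+)" appended when the match line marks the pair as a positive.
    """
    # Restore trailing blanks in case they were lost when the report was copied
    match = match.ljust(len(query))
    identities = sum(1 for m in match if m.isalpha())
    positives = identities + match.count("+")
    substitutions = []
    position = query_start
    for q, m, s in zip(query, match, subject):
        if q != "-" and s != "-" and q != s:
            substitutions.append(f"{q}{position}{s}" + ("(+)" if m == "+" else ""))
        if q != "-":  # gaps in the query do not advance its numbering
            position += 1
    return identities, positives, substitutions


def percent_identity(identical: int, aligned: int) -> int:
    """Whole-number percentage, rounded down like the figures in the report."""
    return 100 * identical // aligned


# (identical residues, aligned length) taken from the Identities lines above
HITS: Dict[str, Tuple[int, int]] = {
    "1OPL_B": (528, 534),
    "1OPK_A": (484, 488),
    "2E2B_A": (287, 287),
    "2QOH_A": (284, 286),
}

if __name__ == "__main__":
    # Last block of the 1OPL alignment
    ids, pos, subs = alignment_stats(
        "EGCPEKVYELMRACWQWNPSDRPSFAEIHQAFETMFQESSISDEVEKELGKQGV",
        "EGCPEKVYELMRACWQWNPSDRPSFAEIHQAFETMFQESSISDEVEKELGK+ +",
        "EGCPEKVYELMRACWQWNPSDRPSFAEIHQAFETMFQESSISDEVEKELGKENL",
        481,
    )
    print(ids, pos, subs)  # 51 53 ['Q532E(+)', 'G533N', 'V534L(+)']
    for name, (identical, aligned) in HITS.items():
        print(name, f"{percent_identity(identical, aligned)}%")
    best = max(HITS, key=lambda h: HITS[h][0] / HITS[h][1])
    print("Highest identity:", best)  # Highest identity: 2E2B_A
```

Ranking on identity alone ignores coverage: 2E2B aligns only over query residues 248-534, while 1OPL covers residues 1-534 at 98% identity, so a score that also weights coverage could favour a different template.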

5.5 Molegro Virtual Docker – Docking Results:


The mutant models built in ICM were docked with Imatinib using Molegro Virtual Docker (MVD).
The table below lists the best docking energy of each mutant model and its deviation from the
wild type (deviation = mutant docking energy - wild-type docking energy).
WILD TYPE DOCKING ENERGY: -5271.97

MUTANT NAME          DOCKING ENERGY    DEVIATION
T231P                -5515.16          -243.19
Y234S                -5762.53          -490.56
A236T                -5504.64          -232.67
D260E                -5167.09          +104.88
Q271Q                -5271.41          +0.56
N355N                -5269.39          +2.58
V467G                -5153.63          +118.34
N498T                -5147.55          +124.42
A511P                -5156.21          +115.76
T514K                -5136.71          +135.26
T231P+Q271Q[P25]     -5515.16          -243.19

The energy deviations in the table show that, among the 11 mutant types, the largest deviation
is for Y234S (-490.56). Because this deviation is negative, the mutation “might” in fact help
Imatinib bind and so act effectively. On the other hand, the largest positive deviation is for
T514K (+135.26), followed by N498T (+124.42) and V467G (+118.34). This indicates that a
patient carrying such a mutation who is given Imatinib to combat CML “may” need to be
monitored for drug resistance. The two synonymous mutations, Q271Q and N355N, showed
negligible deviation from the wild type (+0.56 and +2.58), since the amino acid residue does
not change in either case.
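The deviation column is the mutant energy minus the wild-type energy; for T231P, -5515.16 - (-5271.97) = -243.19. The Python sketch below recomputes it for every row and sorts the mutants. The three labels are an assumed reading of the interpretation above (synonymous changes such as Q271Q mean no change, a negative deviation may favour binding, a positive one suggests possible resistance), not a validated resistance cut-off, and the helper names are hypothetical.

```python
import re
from typing import Dict

WILD_TYPE_ENERGY = -5271.97

# Best MVD docking energies from the table above (the [P25] label is dropped from the key)
MUTANT_ENERGIES: Dict[str, float] = {
    "T231P": -5515.16,
    "Y234S": -5762.53,
    "A236T": -5504.64,
    "D260E": -5167.09,
    "Q271Q": -5271.41,
    "N355N": -5269.39,
    "V467G": -5153.63,
    "N498T": -5147.55,
    "A511P": -5156.21,
    "T514K": -5136.71,
    "T231P+Q271Q": -5515.16,
}


def deviation(energy: float) -> float:
    """Mutant energy minus wild-type energy, rounded to 2 decimals as in the table."""
    return round(energy - WILD_TYPE_ENERGY, 2)


def is_synonymous(name: str) -> bool:
    """True when every component keeps its residue, e.g. 'Q271Q'."""
    return all(re.fullmatch(r"([A-Z])\d+\1", part) for part in name.split("+"))


def classify(name: str, energy: float) -> str:
    # Assumed labels: synonymous -> no change; more negative energy -> may favour binding
    if is_synonymous(name):
        return "no amino-acid change"
    return "may favour binding" if deviation(energy) < 0 else "possible resistance"


if __name__ == "__main__":
    ranked = sorted(MUTANT_ENERGIES,
                    key=lambda n: deviation(MUTANT_ENERGIES[n]), reverse=True)
    for name in ranked:
        energy = MUTANT_ENERGIES[name]
        print(f"{name:<12} {deviation(energy):+8.2f}  {classify(name, energy)}")
```

Sorting by deviation places T514K, N498T and V467G at the top, and the label counts (4 favourable, 5 possible resistance, 2 unchanged) agree with the conclusions of this chapter.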

CHAPTER - 6
CONCLUSIONS AND SCOPE

6.1 CONCLUSIONS:
Bioinformatics makes it possible to form working assumptions about a particular case in a
time-, cost- and labour-efficient way. This in silico approach has helped many researchers rule
out certain possibilities within a large project and so finish it in less time and at lower cost.
Of course, the results of an in silico approach cannot be taken as final; they need to be tested,
at least to some extent, in vitro. The same is true of this project: the results here are only
assumptions, meant to help researchers choose the direction in which the work should proceed.

After evaluating the mutation data sent by the client, creating the mutant nucleotide sequences,
finding their reading frames, modelling the mutant proteins in ICM Molsoft and docking
Imatinib onto these mutant models with Molegro Virtual Docker, we conclude that, of the 16
mutation cases, 5 yield truncated proteins that cannot form active protein. Of the remaining 11,
2 mutations cause no change at the protein level and hence no appreciable docking energy
deviation; in 4 cases the mutation “might” favour the binding of Imatinib and its action against
CML; and in 5 cases the mutation “might” cause slight, if not severe, Imatinib resistance.
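The exact pipeline used to build the mutant sequences is not listed here. As an illustration only, the Python sketch below shows how a single-nucleotide substitution can be sorted into the categories used above (synonymous, missense or truncating) by translating the wild-type and mutant coding sequences in frame 1 with the standard genetic code. The function names and the toy sequence are hypothetical, not the client data, and insertions or deletions (frameshifts) are not handled.

```python
# Standard genetic code, built in TCAG order; "*" marks a stop codon
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(cds: str) -> str:
    """Translate in frame 1 up to (not including) the first stop codon."""
    protein = []
    for start in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE[cds[start:start + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def classify_substitution(cds: str, position: int, new_base: str) -> str:
    """Classify a single-nucleotide substitution at a 1-based CDS position."""
    cds = cds.upper()
    mutant = cds[:position - 1] + new_base.upper() + cds[position:]
    wt_protein, mut_protein = translate(cds), translate(mutant)
    if mut_protein == wt_protein:
        return "synonymous"
    if len(mut_protein) < len(wt_protein):
        return f"truncating: protein shortened to {len(mut_protein)} residues"
    # Report the first changed residue in the same notation as the table (e.g. T231P)
    for i, (w, m) in enumerate(zip(wt_protein, mut_protein)):
        if w != m:
            return f"missense {w}{i + 1}{m}"
    return "extended (stop codon lost)"


if __name__ == "__main__":
    toy_cds = "ATGCAGTGGACCTAA"  # hypothetical: M Q W T stop
    for pos, base in [(6, "A"), (9, "A"), (10, "C")]:
        print(pos, base, classify_substitution(toy_cds, pos, base))
```

Only missense results would go on to modelling and docking; truncating results correspond to the 5 cases excluded above.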

6.4 SCOPE OF THE PROJECT:

The scope of this project remains broad. About 250 patients were screened and their DNA was
sequenced to obtain the mutation data; screening more individuals is one way to extend the
work. Also, only one mutation was considered in each case [except P25], as given in the
individual details list, so the link between mutations and drug resistance could further be
tested for combinations of these mutations. Another obvious direction is designing new drugs,
or testing other existing drugs in silico and in clinical trials, for patients with confirmed
Imatinib resistance.
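To give a sense of scale for testing combinations, the Python sketch below enumerates compound mutants from the eight missense mutations in the docking table; each combination would then need its own model and docking run. The helper names are hypothetical, and leaving out the synonymous mutations Q271Q and N355N is an assumption (they do not change the modelled protein).

```python
import re
from itertools import combinations
from typing import Iterable, Iterator

# Missense mutations from the docking table
MISSENSE = ["T231P", "Y234S", "A236T", "D260E", "V467G", "N498T", "A511P", "T514K"]


def residue_position(mutation: str) -> int:
    """Extract the residue number from a label such as 'T231P'."""
    return int(re.search(r"\d+", mutation).group())


def compound_mutants(mutations: Iterable[str], size: int = 2) -> Iterator[str]:
    """Yield every combination of `size` mutations that touch distinct residues."""
    for combo in combinations(mutations, size):
        positions = {residue_position(m) for m in combo}
        if len(positions) == size:  # skip two changes at the same residue
            yield "+".join(combo)


if __name__ == "__main__":
    pairs = list(compound_mutants(MISSENSE))
    triples = list(compound_mutants(MISSENSE, 3))
    print(len(pairs), len(triples))  # 28 56
    print(pairs[:3])  # ['T231P+Y234S', 'T231P+A236T', 'T231P+D260E']
```

Pairs alone already give 28 compound models and triples 56, so the number of docking runs grows quickly with the number of mutations combined.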

