Imatinib Drug Resistance
INTRODUCTION
Novel therapies and concepts are developing rapidly; the molecular targets include tyrosine
kinases, Ras, and messenger RNA (targeted with antisense oligonucleotides). Alternative
transplantation options, such as stem cells from autologous sources and matched unrelated
donors, are expanding. Immunomodulation by adoptive immunotherapy and vaccine
strategies holds significant promise for the cure of Chronic Myelogenous Leukaemia (CML).
In this study, we used an in silico approach to study the effect of point mutations on the
observed drug resistance. First, data on the point mutations were obtained from a client
[who derived the data by PCR amplification and sequencing of nucleotide sequences from
some patients]. Using these data, mutant protein models were then created by homology
modelling. Finally, these mutant protein molecules were docked against the drug Imatinib,
and the results were compared with those for the wild-type protein.
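The first step of this workflow, turning a list of point mutations into mutant sequences for homology modelling, can be sketched in a few lines of Python. This is only an illustrative sketch: the function name, the `offset` parameter and the five-residue toy fragment (assumed here to start at residue 313) are not taken from the study data.

```python
import re


def apply_point_mutation(sequence: str, mutation: str, offset: int = 0) -> str:
    """Return a copy of `sequence` with one substitution (e.g. 'T315I') applied.

    `offset` is the number of residues that precede the first character of
    `sequence` in the full-length numbering (0 for a full-length sequence).
    """
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
    if match is None:
        raise ValueError(f"unrecognised mutation label: {mutation!r}")
    wild, position, mutant = match.group(1), int(match.group(2)), match.group(3)

    index = position - 1 - offset  # convert 1-based residue number to 0-based index
    if not 0 <= index < len(sequence):
        raise ValueError(f"position {position} lies outside the sequence")
    if sequence[index] != wild:
        # Guard against numbering errors before building a mutant model.
        raise ValueError(
            f"expected {wild} at position {position}, found {sequence[index]}"
        )
    return sequence[:index] + mutant + sequence[index + 1:]


# Hypothetical toy fragment covering residues 313-317 (an assumption for illustration).
fragment = "IITEF"
mutant_fragment = apply_point_mutation(fragment, "T315I", offset=312)
```

Checking the wild-type residue before substituting is a deliberate choice: a mismatch usually means the numbering of the mutation list and of the sequence do not agree.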
1. Figures of peripheral blood (left) and bone marrow (right) smears of a CML patient
in chronic phase, showing leukocytosis in the peripheral blood, and hypercellularity
in the bone marrow due mainly to neutrophils in different stages of maturation. In
CML bone marrow, typical megakaryocytes are smaller than normal and have
hypolobulated nuclei.
CHAPTER - 2
REVIEW OF LITERATURE
2.1
targeted therapies introduced at the beginning of the 21st century have radically changed the
management of CML.
Normally, the bone marrow makes blood stem cells (immature cells) that develop into
mature blood cells over time. A blood stem cell may become a myeloid stem cell or a
lymphoid stem cell. The lymphoid stem cell develops into a white blood cell. The myeloid
stem cell develops into one of three types of mature blood cells:
• Red blood cells that carry oxygen and other materials to all tissues of the body.
• Platelets that help prevent bleeding by causing blood clots to form.
• Granulocytes (white blood cells) that fight infection and disease
In CML, too many blood stem cells develop into a type of white blood cell called
granulocytes. These granulocytes are abnormal and do not become healthy white blood cells.
They may also be called leukemic cells. The leukemic cells can build up in the blood and
bone marrow so there is less room for healthy white blood cells, red blood cells, and platelets.
When this happens, infection, anaemia, or easy bleeding may occur.
Most people with CML have a gene mutation (change) called the Philadelphia
chromosome. Every cell in the body contains DNA (genetic material) that determines how the
cell looks and acts. DNA is contained inside chromosomes. In CML, part of the DNA from
one chromosome moves to another chromosome. This change is called the “Philadelphia
chromosome.” It results in the bone marrow making an enzyme, a tyrosine kinase, that
causes too many stem cells to develop into white blood cells (granulocytes or blasts). The
Philadelphia chromosome is not passed from parent to child.
4[a and b].Structure of the c-Bcr, c-Abl and Bcr-Abl proteins. c-Bcr comprises an
oligomerization domain, a domain thought to mediate binding to SH2-domain-containing proteins, a
serine/threonine kinase domain, a region with homology to Rho guanine-nucleotide-exchange factor
(Rho-GEF), a region thought to facilitate calcium-dependent lipid binding (CaLB) and a region
showing homology to Rac GTPase activating protein (Rac-GAP). The main phosphorylation site of
Bcr (Tyr 177) is indicated. c-Abl comprises an SH3 and SH2 domain, an SH1 tyrosine kinase
domain, several proline-rich domains (P), a nuclear localization signal (NLS), several DNA-binding
domains (DNA BD) and an actin-binding domain. The Bcr-Abl fusion protein comprises the first four
domains of c-Bcr and all the c-Abl domains except the N-terminal SH3 domain.
Symptoms of CML:
• Splenomegaly
• Susceptibility to infections
• Anaemia
• Thrombocytopenia
• Enlargement of the liver
Diagnosis of CML:
• Physical exam and history: An exam of the body to check general signs of health,
including checking for signs of disease such as an enlarged spleen. A history of the
patient’s health habits and past illnesses and treatments will also be taken.
• Complete blood count (CBC): A procedure in which a sample of blood is drawn and
checked for the following:
The number of red blood cells, white blood cells, and platelets.
The amount of haemoglobin (the protein that carries oxygen) in the red blood cells.
The portion of the sample made up of red blood cells
A small proportion of patients have a clinical picture consistent with CML, but no Ph
chromosome can be observed cytogenetically. In these cases the chromosomal aberrations are
sub-microscopic, and in conventional cytogenetic studies the cases appear to be Ph
chromosome negative. These aberrations may also be called cryptic translocations or masked
Ph chromosomes. However, even though no abnormality may be observed cytogenetically, the
pathogenic BCR-ABL fusion gene characteristic of CML is detectable at the molecular level.
This condition is called Ph negative, BCR-ABL positive CML. The Ph negative, BCR-ABL
positive cases do not otherwise differ from standard Ph positive cases, except that the
fusion gene is most often formed not by translocation but by insertion of 3´ABL or 5´BCR
sequences into chromosome 22 or 9, respectively.
The “real” Ph negative cases that are also lacking BCR-ABL molecular rearrangement are
regarded as separate entities: as chronic neutrophilic leukaemia or atypical CML. These
disorders are classified as either other chronic myeloproliferative or myelodysplastic/
myeloproliferative diseases according to WHO classification. Usually these diseases are
unresponsive to tyrosine kinase inhibitors and have a poor prognosis. Because of
unresponsiveness to these inhibitors the name (regardless of the prefix “atypical”) CML is
slightly misleading.
CML is often divided into three phases based on clinical characteristics and laboratory
findings. In the absence of intervention, CML typically begins in the chronic phase, and over
the course of several years progresses to an accelerated phase and ultimately to a blast crisis.
Blast crisis is the terminal phase of CML and clinically behaves like an acute leukemia. One
of the drivers of the progression from chronic phase through acceleration and blast crisis is
Chronic phase
Approximately 85% of patients with CML are in the chronic phase at the time of diagnosis.
During this phase, patients are usually asymptomatic or have only mild symptoms of fatigue
or abdominal fullness. The duration of chronic phase is variable and depends on how early
the disease was diagnosed as well as the therapies used. Ultimately, in the absence of curative
treatment, the disease progresses to an accelerated phase.
Accelerated phase
Criteria for diagnosing transition into the accelerated phase are somewhat variable; the most
widely used criteria are those put forward by investigators at M.D. Anderson Cancer Centre,
by Sokal et al, and the World Health Organization. The WHO criteria are perhaps most
widely used, and include:
The patient is considered to be in the accelerated phase if any of the above are present. The
accelerated phase is significant because it signals that the disease is progressing and
transformation to blast crisis is imminent.
Blast crisis
Blast crisis is the final phase in the evolution of CML, and behaves like an acute leukaemia,
with rapid progression and short survival. Blast crisis is diagnosed if any of the following are
present in a patient with CML:
The high specificity of Imatinib in inhibiting the tyrosine kinases mentioned above is
achieved by its ability to bind the kinase molecule in its closed (inactive) conformation. In the
closed conformation the centrally located activation loop of the kinase is not phosphorylated
and therefore inactive. When phosphorylated, the activation loop extends to the open (active)
conformation which enables binding of substrate molecules to the kinase and subsequently
their phosphorylation. The active conformation is very similar in all known kinases. In
contrast, the inactive conformation has great diversity among protein kinases, explaining the
specificity of Imatinib. Imatinib occupies the ATP binding site of the BCR-ABL kinase
domain and acts as a competitive inhibitor of BCR-ABL with respect to ATP. The side chain
of the threonine residue at position 315 (T315) forms a hydrogen bond with the Imatinib
molecule. In many other kinases this position is occupied by methionine, which cannot form
such a bond, so T315 is a key element in the inhibition of BCR-ABL by Imatinib. When
Imatinib occupies the ATP binding pocket it stabilizes the inactive form of BCR-ABL, thus
preventing autophosphorylation of the kinase itself and subsequently phosphorylation of its
• Small Molecule
Synonyms:
  1. Imatinib Mesylate
  2. Imatinib Methanesulfonate
Brand Names:
  1. Gleevec
  2. Glivec
Chemical IUPAC Name: 4-[(4-methylpiperazin-1-yl)methyl]-N-[4-methyl-3-[(4-pyridin-3-ylpyrimidin-2-yl)amino]phenyl]benzamide
Chemical Formula: C29H31N7O
Chemical Structure: [figure]
• Capsule, oral
• Tablet, oral
Food Interactions:
  • Take with food to reduce the incidence of gastric irritation. Follow with a large
    glass of water. A lipid-rich meal will slightly reduce and delay absorption. Avoid
    grapefruit and grapefruit juice throughout treatment; grapefruit can significantly
    increase serum levels of this product.
Organisms Affected:
  • Humans and other mammals
Phase 1 Metabolizing Enzymes:
  1. Cytochrome P450 3A4 (CYP3A4)
Targets:
  • Proto-oncogene tyrosine-protein kinase ABL1
  • Beta platelet-derived growth factor receptor
  • Mast/stem cell growth factor receptor
  • Alpha platelet-derived growth factor receptor
  • Macrophage colony-stimulating factor 1 receptor
  • Multidrug resistance protein 1
  • High affinity nerve growth factor receptor
  • ATP-binding cassette sub-family G member 2
  • RET proto-oncogene
Oxygen
View the NCBI home page. A relatively good overview of the tools and databases that can be
accessed through NCBI is provided in the list along the left border of the home page.
Clicking on the link entitled "About NCBI" produces a second menu containing the topics "A
Science Primer", and "Databases and Tools", among others. Selecting "A Science Primer"
yields access to general definitions and introductory information regarding the branches of
science included in bioinformatics. Many bioinformatics terms are defined in this section in a
clear-cut and basic manner, making this Primer an excellent first resource. Selecting
"Databases and Tools" from the "About NCBI" webpage menu yields a complete and well-
ordered listing of accessible information. This web page containing the databases and tools
menu is a good choice for those who are inclined toward bookmarking.
The first item under the "Databases and Tools" menu is "Literature Databases". PubMed is the
most heavily used of the literature databases and can be used to access MEDLINE biological
and medical scientific journal citations dating back to the mid-1960s. The
second item under the "Databases and Tools" menu is "Entrez Databases". Entrez is a search
and retrieval system developed by NCBI that is capable of accessing integrated information
by searching many of the NCBI databases with just one query (instead of searching only one
database per query, then having to repeat the query to find information on the same topic from
another NCBI database). The NCBI databases that are included in the search when you
launch an Entrez query are shown when you click on this link. The "Nucleotide Databases"
link under the "Databases and Tools" menu lists all the sequence databases available through
NCBI. These sequence databases contain annotated collections of publicly available DNA,
RNA and protein sequences. The evolution of bioinformatics data mining methods has been
largely driven by the prodigious amount of sequence information collected by scientists in
recent years. New sequences of unknown function can be compared with sequences of well-
characterized genes and proteins. Similarities can be identified between the new, unknown
sequences and the well-characterized sequences, and used to postulate theories regarding
function or structure.
Among the tools listed under the NCBI "Databases and Tools" menu are "Tools for Data
Mining". Selecting the "Tools for Data Mining" topic shows a list of data retrieval tools,
including Entrez, mentioned above, and BLAST, the Basic Local Alignment Search Tool.
BLAST is the predominant sequence alignment tool for performing rapid searches of nucleotide
and protein sequence databases and detecting local sequence alignments between the query
sequence and the database sequences.
This is a brief glimpse at some of the more widely used tools and databases provided by
NCBI, intended to help the novice get a feel for the number and types of bioinformatics
tools that are available on the internet today. Several of these tools are
covered in more detail in subsequent modules included in this bioinformatics course. Before
proceeding to the next module, take a moment to return to the "About NCBI" webpage menu
and glance through some of the interesting web pages linked under the topics "A Science
Primer", "Outreach and Education", and "News".
The NCBI has been responsible for making the GenBank DNA sequence database available
since 1992. GenBank coordinates with individual laboratories and other sequence databases
such as those of the European Molecular Biology Laboratory (EMBL) and the DNA Data Bank
of Japan (DDBJ).
Since 1992, NCBI has grown to provide other databases in addition to GenBank. NCBI
provides Online Mendelian Inheritance in Man, the Molecular Modeling Database (3D
protein structures), dbSNP (a database of Single Nucleotide Polymorphisms), the Unique
Human Gene Sequence Collection, a Gene Map of the Human genome, a Taxonomy Browser,
and coordinates with the National Cancer Institute to provide the Cancer Genome Anatomy
Project. The NCBI assigns a unique identifier (Taxonomy ID number) to each species of
organism.
The Protein Data Bank (PDB) is a repository for the 3-D structural data of large
biological molecules, such as proteins and nucleic acids. The data, typically obtained by
X-ray crystallography or NMR spectroscopy and submitted by biologists and biochemists
from around the world, can be accessed at no charge on the internet. The PDB is overseen
by an organization called the Worldwide Protein Data Bank (wwPDB).
The PDB is a key resource in areas of structural biology, such as structural genomics. Most
major scientific journals, and some funding agencies, such as the NIH in the USA, now
require scientists to submit their structure data to the PDB. If the contents of the PDB are
thought of as primary data, then there are hundreds of derived (i.e., secondary) databases that
categorize the data differently. For example, both SCOP and CATH categorize structures
according to type of structure and assumed evolutionary relations; GO categorizes structures
based on genes.
The PDB originated as a grassroots effort. In 1971, Walter Hamilton of the Brookhaven
National Laboratory agreed to set up the data bank at Brookhaven. Upon Hamilton's death in
1973, Tom Koetzle took over direction of the PDB. In January 1994, Joel Sussman was
appointed head of the PDB. In October 1998 the PDB was transferred to the Research
Collaboratory for Structural Bioinformatics (RCSB); the transfer was completed in June
1999. The new director was Helen M. Berman of Rutgers University (one of the member
institutions of the RCSB). In 2003, with the formation of the wwPDB, the PDB became an
international organization. Each of the four members of wwPDB can act as a deposition, data
processing and distribution centre for PDB data. Data processing means that wwPDB staff
review and annotate each submitted entry. The data are then automatically
checked for plausibility.
The PDB database is updated weekly. Likewise, the PDB Holdings List is also updated
weekly. As of 28 April 2009, the breakdown of current holdings was as follows:
These data show that most structures are determined by X-ray diffraction, but about 15% of
structures are now determined by protein NMR, and a few are even determined by cryo-
electron microscopy.
The significance of the structure factor files, mentioned above, is that, for PDB structures
determined by X-ray diffraction that have a structure factor file, the electron density map
may be viewed. The data for such structures are stored on the "electron density server",
where the electron density maps can be viewed.
In the past, the number of structures in the PDB has grown nearly exponentially. In 2007,
7263 structures were added. However, in 2008, only 7073 structures were added, so the rate
of production of structures has started to decrease.
The file format initially used by the PDB was called the PDB file format. This original format
was restricted by the width of computer punch cards to 80 characters per line. Around 1996,
the "macromolecular Crystallographic Information file" format, mmCIF, started to be phased
in. An XML version of this format, called PDBML, was described in 2005. The structure files
can be downloaded in any of these three formats. In fact, individual files are easily
downloaded into graphics packages using web addresses:
The "4hhb" is the PDB identifier. Each structure published in PDB receives a four-character
alphanumeric identifier, its PDB ID. (This cannot be used as an identifier for biomolecules,
because often several structures for the same molecule (in different environments or
conformations) are contained in PDB with different PDB IDs.)
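As a rough illustration of how a PDB ID maps to a download address for each of the three file formats, the sketch below builds such an address in Python. The URL pattern, the extension table and the function name are assumptions for illustration (the pattern follows the layout of the RCSB file server as commonly used and should be checked against the current RCSB documentation); the code only constructs the address and does not contact any server.

```python
import re

# Assumed URL pattern for the RCSB file server; verify before relying on it.
RCSB_DOWNLOAD = "https://files.rcsb.org/download/{pdb_id}.{extension}"

# File extensions assumed for the three formats named in the text.
EXTENSIONS = {"pdb": "pdb", "mmcif": "cif", "pdbml": "xml"}


def pdb_download_url(pdb_id: str, file_format: str = "pdb") -> str:
    """Build a download address for a four-character PDB ID."""
    if not re.fullmatch(r"[0-9A-Za-z]{4}", pdb_id):
        raise ValueError(f"not a four-character alphanumeric PDB ID: {pdb_id!r}")
    if file_format not in EXTENSIONS:
        raise ValueError(f"unknown format: {file_format!r}")
    return RCSB_DOWNLOAD.format(
        pdb_id=pdb_id.upper(), extension=EXTENSIONS[file_format]
    )


haemoglobin_url = pdb_download_url("4hhb")
```

Validating the identifier first keeps malformed IDs from producing addresses that look plausible but point nowhere.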
The structure files may be viewed using one of several open source computer programs.
Some other free, but not open source programs include VMD, MDL Chime, Swiss-PDB
Viewer, StarBiochem (a Java-based interactive molecular viewer with integrated search of
protein databank) and Sirius. The RCSB PDB website contains an extensive list of both free
and commercial molecule visualization programs and web browser plugins.
2.3.3 DRUGBANK:
Users may query DrugBank in any number of ways. The simple text query (above) supports
general text queries of the entire textual component of the database.
Clicking on the Browse button (on the DrugBank navigation panel above) generates a
tabular synopsis of DrugBank's content. This browse view allows users to casually scroll
through the database or re-sort its contents.
Clicking on a given DrugCard button brings up the full data content for the corresponding
drug. A complete explanation of all the DrugCard fields and sources is given in the
DrugBank documentation.
The PharmaBrowse button allows users to browse through drugs as grouped by their
indication. This is particularly useful for pharmacists and physicians, but also for
pharmaceutical researchers looking for potential drug leads.
The ChemQuery button allows users to draw (using the MarvinSketch applet or the ChemSketch
applet) or write (as a SMILES string) a chemical compound and to search DrugBank for
chemicals similar or identical to the query compound.
The TextQuery button supports a more sophisticated text search (partial word matches, case
sensitive, misspellings, etc.) of the text portion of DrugBank.
The SeqSearch button allows users to conduct BLASTP (protein) sequence searches of the
18,000 sequences contained in DrugBank. Both single and multiple sequence (i.e. whole
proteome) BLAST queries are supported.
The Data Extractor button opens an easy-to-use relational query search tool that allows
users to select or search over various combinations of subfields. The Data Extractor is the
most sophisticated search tool for DrugBank.
Users may download selected text components and sequence data from DrugBank and track
the latest DrugBank statistics by clicking on the Download button.
The ORF Finder (Open Reading Frame Finder) is a graphical analysis tool which finds all
open reading frames of a selectable minimum size in a user's sequence or in a sequence
already in the database. This tool identifies all open reading frames using the standard or
alternative genetic codes. The deduced amino acid sequence can be saved in various formats
and searched against the sequence database using the WWW BLAST server. The ORF Finder
helps in preparing complete and accurate sequence submissions. It is also packaged with the
Sequin sequence submission software.
Link: www.ncbi.nlm.nih.gov/orf_finder.html
To use ORF Finder, enter the accession or GI number of the sequence of interest, or enter
your query sequence directly into the text box in FASTA format. ORF Finder will identify all
open reading frames using the standard genetic code or an alternative one for translation.
Users can limit the search for open reading frames to a portion of the query sequence by
specifying the positions (in base pairs) in the "From" and "To" boxes. Press the ORF Find
button to retrieve a graphic display of ORFs and their location in the sequence in 6 reading
frames. Users have the option to change the minimum ORF length to 50 or 300 nucleotides
(in base pairs) and Redraw the query sequence. The Six Frames option features a graphic of
all start and stop codons. Select a particular ORF by clicking on it to see the amino acid
sequence with all alternative start codons. After selecting an ORF of interest, click
on the Accept button to view the ORF in various formats: GenBank flat-file, FASTA
nucleotide, or FASTA amino acid sequence. Selecting View retrieves the full
GenBank record with its annotated sequence information.
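The six-frame search described above can be approximated with a short Python sketch. It is not the NCBI implementation: it assumes the standard genetic code only, treats ATG as the only start codon, and reports 0-based coordinates on whichever strand was scanned. The function names are illustrative.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def find_orfs(dna: str, min_length: int = 50):
    """Return ORFs (ATG ... stop) of at least `min_length` nucleotides.

    Each hit is (strand, frame, start, end, sequence); `start` and `end`
    are 0-based positions on the scanned strand, and `frame` is 1, 2 or 3.
    """
    dna = dna.upper()
    orfs = []
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for frame in range(3):
            start = None
            for i in range(frame, len(seq) - 2, 3):
                codon = seq[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first start codon opens the ORF
                elif start is not None and codon in STOP_CODONS:
                    end = i + 3  # include the stop codon
                    if end - start >= min_length:
                        orfs.append((strand, frame + 1, start, end, seq[start:end]))
                    start = None
    return orfs


# Short made-up sequence; the minimum length is lowered so the toy ORF is reported.
toy_orfs = find_orfs("CCATGAAATTTGGGTAACC", min_length=9)
```

The `min_length` argument plays the role of the minimum ORF length setting mentioned above; its default of 50 nucleotides mirrors one of the two values named in the text.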
For those scientists submitting sequence data, ORF Finder is also packaged with the Sequin
sequence submission software. ORF Finder can be used in conjunction with Sequin’s
Sequence Editor to annotate new coding regions on the record, perform basic editing, and
translate nucleotide sequences. The Sequin program can be downloaded from NCBI’s FTP
site accessible from the NCBI WWW home page.
The BLAST algorithm was developed as a new way to perform a sequence similarity
search by an algorithm that is faster than FASTA while being as sensitive. A powerful
computer system dedicated to running BLAST has been established at NCBI, National
Library of Medicine. Access to this BLAST system is possible through the Internet
(http://www.ncbi.nlm.nih.gov/) as a Web site and through a BLAST E-mail server. There are
also numerous other Web sites that provide a BLAST database search. In addition to the
BLAST programs developed at the NCBI, an independent set of BLAST programs has been
developed at Washington University. These programs perform similarity searches using the
same methods as NCBI-BLAST and produce gapped local alignments. The statistical
methods used to evaluate sequence similarity scores are different, and thus WU-BLAST and
NCBI-BLAST can produce different results.
The BLAST Web server at http://www.ncbi.nlm.nih.gov/ is the most widely used one
for sequence database searches and is backed up by a powerful computer system so that there
is usually very little wait. Like FASTA, the BLAST algorithm increases the speed of sequence
alignment by searching first for common words or k-tuples in the query sequence and each
database sequence. Whereas FASTA searches for all possible words of the same length,
BLAST confines the search to the words that are the most significant. For proteins,
significance is determined by evaluating these word matches using log odds scores in the
BLOSUM62 amino acid substitution matrix. For the BLAST algorithm, the word length is
fixed at 3 (formerly 4) for proteins and 11 for nucleic acids (3 if the sequences are translated
in all six reading frames). This length is the minimum needed to achieve a word score that is
high enough to be significant but not so long as to miss short but significant patterns.
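To make the notion of a word (k-tuple) concrete, the following Python sketch lists the overlapping words of a sequence for the two word lengths quoted above. The example sequences are arbitrary and the function name is illustrative.

```python
def query_words(sequence: str, word_length: int = 3) -> list[str]:
    """Return every overlapping word of `word_length` letters, in sequence order."""
    if word_length < 1:
        raise ValueError("word_length must be at least 1")
    return [
        sequence[i:i + word_length]
        for i in range(len(sequence) - word_length + 1)
    ]


# Word length 3 for a protein query, 11 for a nucleotide query.
protein_words = query_words("PQGEFG")
dna_words = query_words("ACGTACGTACGTAC", word_length=11)
```

A sequence of length L yields L - k + 1 words of length k, so a longer word length gives fewer, more specific words.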
The sequence is optionally filtered to remove low-complexity regions that are not useful
for producing meaningful sequence alignments.
A list of words of length 3 in the query protein sequence is made starting with positions 1,
2, and 3; then 2, 3, and 4, etc.; until the last 3 available positions in the sequence are reached
(word length 11 for DNA sequences, 3 for programs that translate DNA sequences).
Using the BLOSUM62 substitution scores, the query sequence words in step 1 are
evaluated for an exact match with a word in any database sequence. The words are also
evaluated for matches with any other combination of three amino acids, the object being to
find the scores for aligning the query word with any other three-letter word found in a
database sequence.
A cut-off score called the neighbourhood word score threshold (T) is selected to reduce the
number of possible matches to PQG to the most significant ones.
The above procedure is repeated for each three-letter word in the query sequence.
The remaining high-scoring words that comprise possible matches to each three letter
position in the query sequence are organized into an efficient search tree for comparing them
rapidly to the database sequences.
Each database sequence is scanned for an exact match to one of the 50 words
corresponding to the first query sequence position, for the words to the second position, and
so on. If a match is found, this match is used to seed a possible ungapped alignment between
the query and database sequences.
The next step is to determine whether each HSP score found by one of the above methods
is greater in value than a cut-off score S. A suitable value for S is determined empirically by
examining the range of scores found by comparing random sequences, and by choosing a
value that is significantly greater. The high scoring pairs (HSPs) matched in the entire
database are identified and listed.
BLAST next determines the statistical significance of each HSP score. A probability that
two random sequences, one the length of the query sequence and the other the entire length of
the database (which is approximately equal to the sum of the lengths of all of the database
sequences), could achieve the HSP score is calculated. Sometimes, two or more HSP regions
that can be made into a longer alignment will be found, thereby providing additional evidence
that the query and database sequences are related. In such cases, a combined assessment of
the significance will be made.
Smith-Waterman local alignments are shown for the query sequence with each of the
matched sequences in the database. The score of the alignment is obtained and the expect
value for that score is calculated.
When the expect value for a given database sequence satisfies the user-selectable threshold
parameter E, the match is reported.
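The seed-and-extend idea in the steps above can be sketched in Python as follows. This is a simplified illustration, not the NCBI code: it uses a toy substitution score (+5 for identity, -2 otherwise) in place of BLOSUM62, a plain dictionary in place of the search tree, ungapped extension with a simple drop-off rule, and the standard Karlin-Altschul form E = K·m·n·exp(-λS) for the expect value, with K and λ supplied by the caller. All function names and the example sequences are assumptions.

```python
import math
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def toy_score(a: str, b: str) -> int:
    """Toy substitution score, NOT BLOSUM62: +5 for identity, -2 otherwise."""
    return 5 if a == b else -2


def word_score(word_a: str, word_b: str, score=toy_score) -> int:
    return sum(score(a, b) for a, b in zip(word_a, word_b))


def neighbourhood(word: str, threshold: int, score=toy_score) -> list[str]:
    """All words of the same length scoring at least `threshold` (T) against `word`."""
    candidates = ("".join(letters) for letters in product(AMINO_ACIDS, repeat=len(word)))
    return [c for c in candidates if word_score(word, c, score) >= threshold]


def find_seeds(query: str, subject: str, threshold: int, k: int = 3, score=toy_score):
    """Return (query_pos, subject_pos) pairs where a subject word hits a query neighbourhood."""
    index = {}
    for q_pos in range(len(query) - k + 1):
        for neighbour in neighbourhood(query[q_pos:q_pos + k], threshold, score):
            index.setdefault(neighbour, []).append(q_pos)
    return [
        (q_pos, s_pos)
        for s_pos in range(len(subject) - k + 1)
        for q_pos in index.get(subject[s_pos:s_pos + k], [])
    ]


def extend_ungapped(query, subject, q_pos, s_pos, k=3, x_drop=10, score=toy_score):
    """Extend a seed in both directions without gaps.

    Extension in a direction stops once the running score falls more than
    `x_drop` below the best score seen. Returns (best_score, q_start, q_end).
    """
    best = current = word_score(query[q_pos:q_pos + k], subject[s_pos:s_pos + k], score)
    q_start, q_end = q_pos, q_pos + k

    i, j = q_pos + k, s_pos + k  # extend to the right
    while i < len(query) and j < len(subject) and best - current <= x_drop:
        current += score(query[i], subject[j])
        i, j = i + 1, j + 1
        if current > best:
            best, q_end = current, i

    current = best  # extend to the left from the best right-extended segment
    i, j = q_pos - 1, s_pos - 1
    while i >= 0 and j >= 0 and best - current <= x_drop:
        current += score(query[i], subject[j])
        if current > best:
            best, q_start = current, i
        i, j = i - 1, j - 1
    return best, q_start, q_end


def expect_value(raw_score: float, query_len: int, db_len: int, lam: float, k_param: float) -> float:
    """Karlin-Altschul expect value; `lam` and `k_param` must match the scoring system."""
    return k_param * query_len * db_len * math.exp(-lam * raw_score)


# Made-up sequences sharing the segment PQGEFG.
seeds = find_seeds("MKPQGEFGLL", "TTPQGEFGSS", threshold=15)
hsp = extend_ungapped("MKPQGEFGLL", "TTPQGEFGSS", *seeds[0])
```

With the toy scores, a threshold of 15 keeps only exact three-letter matches; lowering it to 8 admits every word that differs from the query word at one position, which is the role T plays in the description above.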
ICM gives a biologist or chemist fast access to, and high-quality interactive 3D views of,
the entire structural database. In just a few seconds you can browse hundreds of
structures of interest; load them; analyze and visualize sequences, structures, alignments
and sites; study pockets, bound ligands and drugs; study surfaces, electrostatics, mutations
and sequence conservation; and perform docking of small molecules as well as protein-protein
docking. ICM supports multiple input formats. You can search the structural database by
field or sequence pattern and get an interactive table for instant viewing. ICM offers a
rich graphical environment and powerful views for professional-quality images and molecular
animation videos.
The ICM ('Internal Coordinate Mechanics') software project was originally designed
around a new molecular mechanics approach and optimization algorithm for peptide
prediction, homology modelling, loop simulations, flexible macromolecular docking and
refinement, and then was extended to graphics, molecular animations, chemistry, sequence
analysis, database searches, mathematics, statistics and plotting. ICM-Pro contains an all
atom internal coordinate force field and efficient algorithm to perform local and global energy
optimization of small or large molecules with respect to an arbitrary subset of variables. In
addition, ICM contains MMFF94 force field for energy optimization in Cartesian space for
any organic molecule. ICM-Pro allows users to read, build, convert, refine, analyze and
superimpose molecules. Includes graphics tools for diverse molecular rendering, perspective
viewing, depth cueing, etc. Uses both hardware and side-by-side stereo. Allows saving and
printing a screen image as a compact vectorized postscript file in addition to a compressed
bitmap.
Molecular graphics:
It provides a full array of graphics tools, all accessible from a GUI. It displays
molecules in wire, CPK, ball-and-stick, worm, ribbon, accessible surface, transparent
molecular surface, perspective, depth cueing, and smooth and rugged solid surfaces.
It uses both hardware and side-by-side stereo. A screen image can be saved and printed as a
compact vectorized postscript file (also in stereo) in addition to a compressed bitmap.
Movies can be created featuring molecules in solid representations such as CPK, smooth
molecular surface and ball-and-stick. Any 3D object in the Wavefront format can be read,
displayed, reshaped and written.
• Export publication quality molecular images at high resolution and vector images
(metafile)
• Annotate atoms, residues and sites
• 2D and 3D user-defined labels
• Hydrogen bond and distance labels
• Display atom clashes, distance restraints
• High quality molecular surface representation, skin, wire, xstick and ribbon
representations
• Easy control of thickness, colour and type in molecular graphics. Colour by atom type,
residue side-chain, molecule, unique carbon atom colouring for multiple objects, bfactor,
occupancy, accessibility, hydrophobicity, polarity, secondary structure, paint structure by
alignment colour, colour by user-defined values
• Visual effects: dynamic shadows, fog, hardware and side-by-side stereo, clipping planes,
full screen
• Export coloured and annotated sequence alignments.
• Easy to use and control animation effects: rotations, rocking, zooming
• Store current views/viewpoints, layers and slides
• Two kinds of stereo, including a high quality “in-window” mode, as well as a stereo mode
which does not require any special hardware.
Protein Structure Analysis: ICM-Pro provides a direct link to the PDB. Once you have
downloaded a structure you can analyse it: flag problem regions, superimpose multiple
structures, and analyse distances and electrostatic properties.
Protein Modelling: ICM-Pro has a good record in protein modelling. There are
procedures which will regularize or build the backbone, shake up the side-chains and loops
by global energy optimization. You can also colour the model by local reliability to identify
the potentially wrong parts of the model. This does not include, however, the fast routine for
building a complete model by homology with loops combined with the database search
(ICM-Homology is a separate add-on to ICM-Pro).
Bioinformatics Tools
Protein-Protein Docking
worldwide CAPRI protein-protein docking competition. In the past ICM has been used to
dock ab initio a full-atom model of lysozyme to an antibody with 1.6 Å accuracy (Nature
Struc. Biol., 1994, 1, 259). Later, Maxim Totrov and Ruben Abagyan correctly predicted the
association of beta-lactamase and its protein inhibitor in the Docking Challenge (Nature
Struc. Biol., 1996, 3, 290) using the ICM pseudo-Brownian docking with subsequent ICM
side-chain refinement.
MVD requires a three-dimensional structure of both protein and ligand (usually derived from
X-ray/NMR experiments or homology modelling). MVD performs flexible ligand docking, so
the optimal geometry of the ligand will be determined during the docking.
Molegro Virtual Docker contains a built-in version checker making it easy to check for new
program updates including new features and bug fixes. To check for new updates, select Help
| Check for Updates. A window showing available updates and details about changes made
will appear.
The MolDock scoring function (MolDock Score) used by MVD is derived from the PLP
scoring functions originally proposed by Gehlhaar et al. [GEHLHAAR 1995,1998] and later
extended by Yang et al. [YANG 2004]. The MolDock scoring function further improves these
scoring functions with a new hydrogen bonding term and new charge schemes. The docking
scoring function, Escore, is defined by the following energy terms:
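For orientation, the general form of the MolDock score as described by Thomsen and Christensen (J. Med. Chem., 2006) is sketched below. The symbols (q for partial charges, r_ij for interatomic distances, A, m and θ_0 for the torsional parameters) follow that description and are an assumption here, not a transcription from this report:

```latex
E_{score} = E_{inter} + E_{intra}

E_{inter} = \sum_{i \in \text{ligand}} \sum_{j \in \text{protein}}
            \left[ E_{PLP}(r_{ij}) + 332.0 \, \frac{q_i q_j}{4 r_{ij}^{2}} \right]

E_{intra} = \sum_{i \in \text{ligand}} \sum_{j \in \text{ligand}} E_{PLP}(r_{ij})
            + \sum_{\text{flexible bonds}} A \left[ 1 - \cos(m \theta - \theta_0) \right]
            + E_{clash}
```

Here E_PLP is the piecewise linear potential covering the steric and hydrogen bonding contributions, and E_clash is a penalty for ligand conformations in which heavy atoms come too close to each other.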
After MVD has predicted one or more promising poses using the MolDock score, it
calculates several additional energy terms. All of these terms are stored in the
'DockingResults.mvdresults' file at the end of the docking run. The 'rerank score' is a linear
combination of these terms, weighted by the coefficients given in the
'RerankingCoefficients.txt'.
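The idea of a rerank score as a weighted sum can be sketched as follows. This is a minimal illustration, not MVD code: the term names are borrowed from the result fields listed in this section, while the numerical values and coefficients are hypothetical and are not taken from 'RerankingCoefficients.txt'.

```python
from typing import Dict


def rerank_score(terms: Dict[str, float], coefficients: Dict[str, float]) -> float:
    """Linear combination of energy terms.

    Terms that have no coefficient contribute nothing to the sum.
    """
    return sum(coefficients.get(name, 0.0) * value for name, value in terms.items())


# Hypothetical numbers for illustration only; they are not MVD output.
example_terms = {"Steric": -120.0, "HBond": -8.0, "Electro": -2.0}
example_coefficients = {"Steric": 0.5, "HBond": 1.0, "Electro": 0.25}

if __name__ == "__main__":
    print(rerank_score(example_terms, example_coefficients))
```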
Textual Information
• Ligand: The name of the ligand the pose was created from.
• Name: The internal name of the pose (a concatenation of the pose id and ligand
name).
• Filename: The file containing the pose.
• Workspace: The workspace (.mvdml-file) containing the protein.
• Run: When running multiple docking runs for each ligand, this field contains the
docking run number.
Energy terms (total):
• Energy: The MolDock score (arbitrary units).
• RerankScore: The reranking score (arbitrary units).
• PoseEnergy: The score actually assigned to the pose during the docking.
• SimilarityScore: Similarity Score (if docking templates are enabled).
• LE1: Ligand Efficiency 1, the MolDock Score divided by the heavy atom count.
• LE3: Ligand Efficiency 3, the Rerank Score divided by the heavy atom count.
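The two ligand efficiency values are simple ratios. A minimal sketch, assuming a hypothetical MolDock score; the heavy atom count of 37 follows from the molecular formula of Imatinib (C29H31N7O):

```python
def ligand_efficiency(score: float, heavy_atoms: int) -> float:
    """Score divided by the number of heavy (non-hydrogen) atoms."""
    if heavy_atoms <= 0:
        raise ValueError("heavy_atoms must be positive")
    return score / heavy_atoms


IMATINIB_HEAVY_ATOMS = 29 + 7 + 1  # C29 H31 N7 O: carbons + nitrogens + oxygen

if __name__ == "__main__":
    hypothetical_score = -148.0  # illustrative value, not a docking result
    print(ligand_efficiency(hypothetical_score, IMATINIB_HEAVY_ATOMS))
```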
Energy terms (contributions)
• E-Total: The total MolDock Score energy is the sum of internal ligand energies,
protein interaction energies and soft penalties.
• E-Inter total: The total MolDock Score interaction energy between the pose and the
target molecule(s).
• E-Inter (cofactor - ligand): The total MolDock Score interaction energy between the
pose and the cofactors. (The sum of the steric interaction energies calculated by PLP,
and the electric and hydrogen bonding terms below).
• Cofactor (VdW): The steric interaction energy between the pose and the cofactors
calculated using a LJ12-6 approximation.
• Cofactor (elec): The electrostatic interaction energy between the pose and the
cofactors.
• Cofactor (hbond): The hydrogen bonding interaction energy between the pose and the
cofactors (calculated by PLP).
• E-Inter (protein - ligand): The MolDock Score interaction energy between the pose
and the protein. (Equal to Steric+HBond+Electro+ElectroLong below)
• Steric: Steric interaction energy between the protein and the ligand (calculated by
PLP).
• HBond: Hydrogen bonding energy between protein and ligand (calculated by PLP).
• Electro: The short-range (r<4.5Å) electrostatic protein-ligand interaction energy.
• ElectroLong: The long-range (r>4.5Å) electrostatic protein-ligand interaction energy.
• NoHBond90: This is the hydrogen bonding energy (protein-ligand) as calculated if the
directionality of the HBond was not taken into account.
• VdW (LJ12-6): Protein steric interaction energy from a LJ 12-6 VdW potential
approximation.
• E-Inter (water - ligand): The MolDockScore interaction energy between the pose and
the water molecules.
• E-Intra (tors, ligand atoms): The total internal MolDockScore energy of the pose.
• E-Intra (steric): Steric self-interaction energy for the pose (calculated by PLP).
• E-Intra (hbond): Hydrogen bonding self-interaction energy for the pose (calculated by
PLP).
• E-Intra (elec): Electrostatic self-interaction energy for the pose.
• E-Intra (tors): Torsional energy for the pose.
• E-Intra (sp2-sp2): Additional sp2-sp2 torsional term for the pose.
• E-Intra (vdw): Steric self-interaction energy for the pose (calculated by a LJ12-6 VdW
approximation).
• E-Solvation: The energy calculated from the implicit solvation model.
• E-Soft Constraint Penalty: The energy contributions from soft constraints.
Static terms
• Torsions: The number of (chosen) rotatable bonds in the ligand.
• HeavyAtoms: Number of heavy atoms.
• MW: Molecular weight (in daltons).
• C0: Obsolete constant term. This value is always 1.
• CO2minus: Number of carboxyl groups in ligand.
• Csp2: Number of sp2 hybridized carbon atoms in ligand.
Other terms
• RMSD: The RMS deviation from a reference ligand.
The guided differential evolution algorithm (MolDock Optimizer) used in MVD is based on
an EA variant called differential evolution (DE). The DE algorithm was introduced by Storn
and Price in 1995 [STORN 1995]. Compared to more widely known EA-based techniques
(e.g. genetic algorithms, evolutionary programming, and evolution strategies), DE uses a
different approach to select and modify candidate solutions (individuals). The main
innovative idea in DE is to create offspring from a weighted difference of parent solutions.
Afterwards, the offspring replaces the parent if and only if it is more fit. Otherwise, the
parent survives and is passed on to the next generation (iteration of the algorithm).
Additionally, guided differential evolution may use a cavity prediction algorithm to constrain
predicted conformations (poses) during the search process. More specifically, if a candidate
solution is positioned outside the cavity, it is translated so that a randomly chosen ligand atom
will be located within the region spanned by the cavity.
Naturally, this strategy is only applied if a cavity has been found. If no cavities are reported,
the search procedure does not constrain the candidate solutions. One of the reasons why DE
works so well is that the variation operator exploits the population diversity in the following
manner: Initially, when the candidate solutions in the population are randomly generated the
diversity is large. Thus, when offspring are created the differences between parental solutions
are big, resulting in large step sizes being used. As the algorithm converges to better
solutions, the population diversity is lowered, and the step sizes used to create offspring are
lowered correspondingly. Therefore, by using the differences between other individuals in the
population, DE automatically adapts the step sizes used to create offspring as the search
process converges toward good solutions.
Only ligand properties are represented in the individuals since the protein remains rigid
during the docking simulation. Thus, a candidate solution is encoded by an array of real-
valued numbers representing ligand position, orientation, and conformation as Cartesian
coordinates for the ligand translation, four variables specifying the ligand orientation
(encoded as a rotation vector and a rotation angle), and one angle for each flexible torsion
angle in the ligand (if any). Each individual in the initial population is assigned a random
position within the search space region (defined by the user).
Initializing the orientation is more complicated: By just choosing uniform random numbers
for the orientation axis (between -1.0 and 1.0 followed by normalization of the values to form
a unit vector) and the angle of rotation (between -180° and +180°), the initial population
would be biased towards the identity orientation (i.e. no rotation). To avoid this bias, the
algorithm by Shoemake et al. [SHOEMAKE 1992] for generating uniform random
quaternions is used and the random quaternions are then converted to their rotation
axis/rotation angle representation. The flexible torsion angles (if any) are assigned a random
angle between -180° and +180°.
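The orientation sampling described above can be sketched as follows. This is an illustrative implementation of the uniform random quaternion construction attributed to Shoemake, followed by conversion to a rotation axis and angle; the function names are hypothetical and this is not MVD code.

```python
import math
import random
from typing import Tuple

Quaternion = Tuple[float, float, float, float]  # (x, y, z, w)


def random_unit_quaternion(rng: random.Random) -> Quaternion:
    """Draw a unit quaternion uniformly distributed over all rotations."""
    u1, u2, u3 = rng.random(), rng.random(), rng.random()
    a, b = math.sqrt(1.0 - u1), math.sqrt(u1)
    return (
        a * math.sin(2.0 * math.pi * u2),
        a * math.cos(2.0 * math.pi * u2),
        b * math.sin(2.0 * math.pi * u3),
        b * math.cos(2.0 * math.pi * u3),
    )


def quaternion_to_axis_angle(q: Quaternion) -> Tuple[Tuple[float, float, float], float]:
    """Convert a unit quaternion to (unit axis, angle in radians within [-pi, pi])."""
    x, y, z, w = q
    s = math.sqrt(x * x + y * y + z * z)  # length of the vector part
    if s < 1e-12:
        return (1.0, 0.0, 0.0), 0.0  # identity rotation: any axis will do
    angle = 2.0 * math.atan2(s, w)  # in [0, 2*pi]
    if angle > math.pi:
        angle -= 2.0 * math.pi  # same rotation, expressed in [-pi, pi]
    return (x / s, y / s, z / s), angle


if __name__ == "__main__":
    rng = random.Random(42)
    axis, angle = quaternion_to_axis_angle(random_unit_quaternion(rng))
    print(axis, math.degrees(angle))
```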
In MVD, the following default parameters are used for the guided differential evolution
algorithm: population size = 50, crossover rate = 0.9, and scaling factor = 0.5. These settings
have been found by trial and error, and are generally found to give the best results across a
test set of 77 complexes.
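To make the DE scheme concrete, the following is a minimal sketch of plain differential evolution using the default parameters quoted above (population size 50, crossover rate 0.9, scaling factor 0.5). It is an assumption-laden illustration: it omits the cavity guidance of MVD, and the objective in the usage example is a toy function rather than a docking score. The number of generations, the bound clamping and the seed are choices made here for the example.

```python
import random
from typing import Callable, List, Sequence, Tuple


def differential_evolution(
    objective: Callable[[Sequence[float]], float],
    bounds: Sequence[Tuple[float, float]],
    population_size: int = 50,
    crossover_rate: float = 0.9,
    scaling_factor: float = 0.5,
    generations: int = 200,
    seed: int = 0,
) -> Tuple[List[float], float]:
    """Minimise `objective` within `bounds`; lower values count as more fit."""
    rng = random.Random(seed)
    dim = len(bounds)
    population = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(population_size)]
    scores = [objective(ind) for ind in population]

    for _ in range(generations):
        next_population = list(population)
        next_scores = list(scores)
        for i in range(population_size):
            # Three distinct individuals, all different from the parent i.
            a, b, c = rng.sample([k for k in range(population_size) if k != i], 3)
            forced = rng.randrange(dim)  # at least one variable comes from the mutant
            trial = list(population[i])
            for d in range(dim):
                if d == forced or rng.random() < crossover_rate:
                    lo, hi = bounds[d]
                    # Offspring built from a weighted difference of parent solutions.
                    value = population[a][d] + scaling_factor * (
                        population[b][d] - population[c][d]
                    )
                    trial[d] = min(max(value, lo), hi)
            trial_score = objective(trial)
            # The offspring replaces the parent only if it is more fit.
            if trial_score < scores[i]:
                next_population[i], next_scores[i] = trial, trial_score
        population, scores = next_population, next_scores

    best = min(range(population_size), key=lambda i: scores[i])
    return population[best], scores[best]


if __name__ == "__main__":
    solution, value = differential_evolution(
        lambda x: sum(v * v for v in x), [(-5.0, 5.0)] * 3
    )
    print(solution, value)
```

Because the step is a difference between two population members, the step size shrinks automatically as the population converges, which is the adaptive behaviour described in the text.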
In order to determine the potential binding sites, a grid-based cavity prediction algorithm
has been developed. The cavity prediction algorithm works as follows:
First, a discrete grid with a resolution of 0.8 Å covering the protein is created. At every grid
point a sphere of radius 1.4 Å is placed. It is checked whether this sphere will overlap with
any of the spheres determined by the Van der Waals radii of the protein atoms. Grid points
where the probe clashes with the protein atom spheres will be referred to as part of the
inaccessible volume, all other points are referred to as accessible.
Second, each accessible grid point is checked for whether it is part of a cavity or not using the
following procedure: From the current grid point a random direction is chosen, and this
direction (and the opposite direction) is followed until the grid boundaries are hit, checking if
an inaccessible grid point is hit on the way. This is repeated a number of times, and if the
percentage of lines hitting an inaccessible volume is larger than a given threshold, the point is
marked as being part of a cavity. By default 16 different directions are tested, and a grid point
is assumed part of a cavity if 12 or more of these lines hit an inaccessible volume. The
threshold can be tuned according to how enclosed the found cavities should be. A value of 0%
would only be possible far from the protein as opposed to a value of 100% corresponding to a
binding site buried deeply in the protein.
The final step is to determine the connected regions. Two grid points are connected if they are
neighbours. Regions with a volume below 10.0 Å³ are discarded as irrelevant (the volume of
a connected set of grid points is estimated as the number of grid points times the volume of a
unit grid cell). The cavities found are then ranked according to their volume.
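The three steps above can be sketched in a compact form. This is a simplified illustration under stated assumptions, not the MVD implementation: each of the 16 tested directions is treated as one ray (8 random axes, each followed both ways), "neighbours" is taken to mean the six face-adjacent grid points, and the padding `margin` around the atoms is a value chosen for this example.

```python
import math
import random
from collections import deque
from typing import List, Sequence, Set, Tuple

GridPoint = Tuple[int, int, int]
Atom = Tuple[float, float, float, float]  # x, y, z, van der Waals radius (in Å)


def find_cavities(
    atoms: Sequence[Atom],
    spacing: float = 0.8,      # grid resolution in Å (from the text)
    probe: float = 1.4,        # probe sphere radius in Å (from the text)
    n_axes: int = 8,           # 8 axes x 2 directions = 16 rays (assumption)
    min_hits: int = 12,        # rays that must meet inaccessible volume
    min_volume: float = 10.0,  # smaller regions are discarded (Å^3)
    margin: float = 3.0,       # padding around the atoms (assumption)
    seed: int = 0,
) -> List[Tuple[float, Set[GridPoint]]]:
    rng = random.Random(seed)
    lo = [min(a[d] for a in atoms) - margin for d in range(3)]
    hi = [max(a[d] for a in atoms) + margin for d in range(3)]
    shape = [int((hi[d] - lo[d]) / spacing) + 1 for d in range(3)]
    points = [(i, j, k) for i in range(shape[0])
              for j in range(shape[1]) for k in range(shape[2])]

    def in_grid(p: Tuple[int, ...]) -> bool:
        return all(0 <= p[d] < shape[d] for d in range(3))

    # Step 1: a grid point is inaccessible if the probe sphere placed there
    # overlaps the van der Waals sphere of any atom.
    def clashes(p: GridPoint) -> bool:
        x, y, z = (lo[d] + p[d] * spacing for d in range(3))
        return any((x - ax) ** 2 + (y - ay) ** 2 + (z - az) ** 2 < (r + probe) ** 2
                   for ax, ay, az, r in atoms)

    blocked = {p for p in points if clashes(p)}

    # Step 2: follow a ray to the grid boundary; report whether it meets
    # an inaccessible grid point on the way.
    def ray_hits(start: GridPoint, direction: Tuple[float, ...]) -> bool:
        step = 1
        while True:
            p = tuple(round(start[d] + step * direction[d]) for d in range(3))
            if not in_grid(p):
                return False
            if p in blocked:
                return True
            step += 1

    cavity: Set[GridPoint] = set()
    for p in points:
        if p in blocked:
            continue
        hits = 0
        for _ in range(n_axes):
            v = [rng.gauss(0.0, 1.0) for _ in range(3)]
            norm = math.sqrt(sum(c * c for c in v)) or 1.0
            forward = tuple(c / norm for c in v)
            backward = tuple(-c for c in forward)
            hits += ray_hits(p, forward) + ray_hits(p, backward)
        if hits >= min_hits:
            cavity.add(p)

    # Step 3: group neighbouring cavity points, drop small regions, rank by volume.
    regions: List[Tuple[float, Set[GridPoint]]] = []
    unvisited = set(cavity)
    while unvisited:
        first = unvisited.pop()
        region, queue = {first}, deque([first])
        while queue:
            i, j, k = queue.popleft()
            for n in ((i + 1, j, k), (i - 1, j, k), (i, j + 1, k),
                      (i, j - 1, k), (i, j, k + 1), (i, j, k - 1)):
                if n in unvisited:
                    unvisited.remove(n)
                    region.add(n)
                    queue.append(n)
        volume = len(region) * spacing ** 3
        if volume >= min_volume:
            regions.append((volume, region))
    regions.sort(key=lambda item: item[0], reverse=True)
    return regions
```

With a 0.8 Å grid, one cell has a volume of 0.512 Å³, so a region needs at least 20 connected grid points to pass the 10.0 Å³ cut-off.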
Clustering Algorithm
The multiple poses returned from a docking run are identified using the following procedure:
During the docking run, new candidate solutions (poses) scoring better than parental
solutions are added to a temporary pool of docking solutions.
If the number of poses in the pool is higher than 300, a clustering algorithm is used to
cluster all the solutions in the pool. The clustering is performed on-line during the docking
search and when the docking run terminates. Because of the limit of 300 poses, the clustering
process is fast. The members of the pool are replaced by the new cluster representatives found
(limited by the Max number of poses returned option).
Additionally, Molegro Virtual Docker uses its own MVDML file format. MVDML is a
shorthand notation for Molegro Virtual Docker Markup Language and is an XML-based file
format. In general, MVDML can be used to store the following information:
• Molecular structures (atom coordinates, atom types, partial charges,
bond orders, hybridization states, ...)
• Constraints (location, type, and constraint parameters)
• Search space (center and radius)
• State information (workspace properties)
• Cavities (location, cavity grid points)
Similar sequences adopt identical structures and distantly related sequences fold into similar
structures.
2) Alignment correction
Alignments are scored (substitution score) in order to define the similarity between two
amino acid residues in the sequences. A substitution score is calculated for each aligned
pair of letters. Substitution matrices:
- Reflect the true probabilities of mutations occurring through a period of evolution.
- PAM family: based on global alignments of closely related proteins (mutation
probability matrix).
- BLOSUM family: based on observed alignments; not extrapolated from comparisons of
closely related sequences.
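Scoring an alignment with a substitution matrix amounts to a table lookup per aligned pair. A minimal sketch follows; the matrix is a toy one with values chosen for illustration only, and the `default` score for pairs missing from the table is likewise an assumption.

```python
from typing import Dict, Tuple

# Toy substitution scores for illustration only; not a complete published matrix.
TOY_MATRIX: Dict[Tuple[str, str], int] = {
    ("K", "K"): 5, ("R", "R"): 5, ("E", "E"): 5, ("D", "D"): 6,
    ("K", "R"): 2, ("E", "D"): 2, ("K", "E"): 1,
}


def pair_score(a: str, b: str, matrix: Dict[Tuple[str, str], int], default: int = -1) -> int:
    """Look up a symmetric substitution score for one aligned pair of residues."""
    return matrix.get((a, b), matrix.get((b, a), default))


def alignment_score(seq_a: str, seq_b: str, matrix: Dict[Tuple[str, str], int]) -> int:
    """Sum the substitution scores over an ungapped alignment of equal length."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    return sum(pair_score(a, b, matrix) for a, b in zip(seq_a, seq_b))


if __name__ == "__main__":
    print(alignment_score("KKE", "KRD", TOY_MATRIX))
```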
3) Backbone generation
Uses known structurally conserved regions (SCRs) to generate coordinates for the unknown
structure.
For SCRs - copy coordinates from known structures.
For variable regions (VR) - copy from known structure, if the residue types are
similar; otherwise, use databases for fragmented loop sequences.
4) Loop modelling:
Loops are created as a result of substitutions, insertions and deletions in the same
family.
5) Side-chain modelling
Use of rotamer libraries (backbone dependent)
6) Model optimization
Done by molecular mechanics methods.
7) Model validation:
Online servers: CPH protein model server; PS2 protein model server; 3D JIGSAW;
PHYRE
2.6 DOCKING:
Docking is the computer simulation of the binding interaction between two molecules.
These two molecules may be:
1. Two proteins
Docking strategy:
Types of docking:
1. Rigid docking: In this type of docking, both molecules are kept rigid; that is,
their side chains are not movable. This type of docking does not occur in nature and
is performed only in software.
2. Semi-flexible docking: In this type of docking, the larger protein molecule is kept
rigid, whereas the smaller ligand is kept flexible. This is usually done in protein-drug
docking. It is also known as quasi-flexible docking.
3. Flexible docking: In this type of docking, both molecules are kept flexible. This is
the only type of docking that is also seen under natural conditions.
Every docking program follows a method, i.e. an algorithm, to perform the docking
and to identify the best drug for a particular protein. This method is known as the
search algorithm or search strategy. There are four types of search algorithms:
Tabu search
Molecular dynamics
Energy minimization
Scoring functions are the functions used to score protein-ligand complexes and to
identify the complex with the lowest energy value. For every docking run, a particular
score is given by the scoring function. Scoring functions vary from tool to tool. There are
three types of scoring functions:
1. Force field based scoring function: atomic structure, valency, bond angle, bond
length etc.
• GOLD score
• G score
• D score
• AMBER score
• CHARMM
• GROMOS
• Chem score
• F score
• X score
• Drug score
• SMOG score
CHAPTER - 3
MATERIALS AND METHODS
… | … | search for structural similarity of the protein
DrugBank | www.drugbank.ca | Download the drug Imatinib in mol format
Exonic mutation information | Client’s research | Preparing the mutant nucleotides [mRNA]
3.2 TABLE OF TOOLS, THEIR SOURCES AND THE WORKING METHOD THEY ARE USED IN:
NAME | URL/SOURCE | USED FOR/WORKING METHOD
ORF Finder | http://www.ncbi.nlm.nih.gov/gorf/gorf.html | Finding the correct reading frame of a given nucleotide sequence
Blast p | http://blast.ncbi.nlm.nih.gov/Blast.cgi | Pair-wise sequence alignment to find similarity against
NOTE: The protocol followed while using the above databases and tools for the
above-mentioned working methods is detailed in the next chapter.
CHAPTER - 4
THE EXPERIMENT
tyrosine kinase Download:
Opened the NCBI website by entering the following url in the internet explorer address
bar:
www.ncbi.nlm.nih.gov
Chose “nucleotide” from the database scroll list and typed < abl1 AND human> in the
search box and clicked on search.
Among the many results returned, the required result was chosen, and its mRNA and
protein sequences were opened through the link given [marked by arrow].
Both sequences were opened in FASTA format and saved. Two transcripts are shown
here; we chose the second one at random.
The client sequenced the Abl tyrosine kinase domain in CML patients and provided us with
the point mutation data. Using this data, we created mutant mRNA sequences by editing, in
Notepad, the base at which each mutation occurred in the mRNA sequence downloaded
from NCBI. While doing so, any spaces were removed. Each mutant nucleotide sequence
was saved as a separate file.
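The manual editing step can also be scripted. The sketch below is an illustration only: the function name is hypothetical, and `position` is assumed to be a 1-based index into the mRNA sequence, so any mapping from the client's coordinates to mRNA positions would have to be done beforehand.

```python
from typing import Optional


def apply_point_mutation(
    sequence: str,
    position: int,
    new_base: str,
    expected_base: Optional[str] = None,
) -> str:
    """Return a copy of `sequence` with the base at `position` (1-based) replaced.

    Whitespace (spaces, line breaks) is removed first, as was done by hand.
    If `expected_base` is given, it is checked against the wild-type base.
    """
    cleaned = "".join(sequence.split()).upper()
    if not 1 <= position <= len(cleaned):
        raise ValueError("position is outside the sequence")
    if expected_base is not None and cleaned[position - 1] != expected_base.upper():
        raise ValueError("wild-type base does not match the expected base")
    return cleaned[: position - 1] + new_base.upper() + cleaned[position:]


if __name__ == "__main__":
    print(apply_point_mutation("ATG GCC\nTAA", 4, "A", expected_base="G"))
```

Checking the expected wild-type base guards against off-by-one errors when positions are transferred from a table.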
Now, to convert these mutant nucleotide sequences into a protein sequence, we used the
ORF Finder [Open Reading Frame Finder] provided by the NCBI, at the following link:
http://www.ncbi.nlm.nih.gov/gorf/gorf.html
We copied and pasted the mutant nucleotide sequence into the box provided and clicked
the ORF Find button. A screen shot of the same is shown below.
The result of the ORF find retrieved is shown below. We chose that sequence and frame in
which our mutation is likely to be present.
The screen shot below shows the result of the protein sequence of the chosen frame and
length.
After clicking on the accept button, we chose to view the sequence in Fasta protein from
the view scroll list and saved the result in a notepad.
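What the ORF search does in this step can be approximated in a few lines. The sketch below is a simplification and not the NCBI tool: it uses the standard genetic code, considers only ATG start codons on the forward strand, and returns the longest translated reading frame; the function names are hypothetical.

```python
from typing import Dict

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE: Dict[str, str] = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate_from(mrna: str, start: int) -> str:
    """Translate codon by codon from `start` until a stop codon or the end."""
    protein = []
    for pos in range(start, len(mrna) - 2, 3):
        residue = CODON_TABLE.get(mrna[pos:pos + 3], "X")  # 'X' for unknown codons
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


def longest_orf(mrna: str) -> str:
    """Protein of the longest forward-strand reading frame that starts at an ATG."""
    seq = "".join(mrna.split()).upper().replace("U", "T")
    candidates = [
        translate_from(seq, i) for i in range(len(seq) - 2) if seq[i:i + 3] == "ATG"
    ]
    return max(candidates, key=len, default="")


if __name__ == "__main__":
    print(longest_orf("CCATGGCCAAATAAGG"))
```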
4.4 BLASTp:
Now that we had the mutant protein sequences, we needed to find their structures by
homology modelling. For that, we needed a template, which was obtained using the
Basic Local Alignment Search Tool (BLAST). The BLASTp feature, provided by the
NCBI, allows a protein database to be searched with a protein query.
http://blast.ncbi.nlm.nih.gov/Blast.cgi
The link to BLASTp [encircled in red] was then opened. This is shown below in two
screen shots. In the first, the sequence is pasted in FASTA format; in the second, the job
title, the database and the algorithm are specified and the BLAST button is clicked.
The results have shown that the query sequence shows 100% identity with the protein 2e2b
[Crystal structure of the c-Abl kinase domain in complex with INNO-406 ] in the PDB
database. The pdb file of this protein was downloaded from the PDB site.
After opening the application, <file – new> was chosen from the tool bar. This opens
a dialogue box, “new molecule/sequence/grob”. In it, the protein sequence of the mutant
nucleotide obtained from the ORF Finder was pasted in FASTA format, a sequence name
was given, and OK was clicked.
Now the protein sequence is uploaded in the work space. From the tool bar <file—open>
was chosen and the template i.e 2e2b was imported. And the screen shot of the same is shown
below.
In this case, the template was already in object format. If the template is not in the object
format, it won’t show itself in the work space. In such a case, the template can be converted
into an object using <ICM CONVERT—protein > under MolMechanics in the tool bar.
This will open a dialogue box “convert molecular object to.....” choose the options as shown
and click ok.
When the status window shows <if yes cool a_0>, the conversion is complete.
Next step is to build the homology model. For this, “build model” under homology in the
tools bar is clicked.
The sources: fields were chosen by us. The preferences: fields are default settings and the
options were chosen as shown.
Building the model takes a few minutes. But then the result is retrieved as shown below.
The mid box[marked with red] shows alignment between the template and the protein
sequence uploaded for homology modelling. The molecule can be viewed properly by using
the viewing tools on the right [marked with red]. The loop beneath the alignment box gives
information about the loops.
The work space is saved by right clicking the “icm” icon next to the protein sequence in
the selection space and then “save as”. The file is saved as a PDB file.
After opening the application, the mutant protein model was imported by <File—import
molecule> and then browsing and selecting the molecule from the folder.
This opened a dialogue box “import molecules” and the required options were chosen.
This imported the molecule into the model / docking visualization box. Now the protein
molecule is to be prepared for docking. For this <preparation—prepare molecule> from tool
bar was initiated. The screen shot of the same is shown below.
This opened a dialogue box “prepare molecules”. The appropriate fields were chosen.
Next, the protein surface is to be created. For this, a right click on the protein icon in the
work space and subsequently choosing the “create surface” opened a dialogue box “create
surface”. The appropriate fields were chosen. And the create surface was initiated. The screen
shots of these steps are shown below.
Once the “probing of grid points” is done, the protein molecule’s surface is created. Now,
the cavities [in green] of the protein molecule were detected as shown below.
Next, the drug “imatinib” was imported in the same way as the protein [first step].
Now a series of dialogue boxes were opened and the appropriate fields were selected as
shown.
The fields that have been shown by arrows were set by us and the fields not marked are
default settings. When start button was clicked, the docking was initiated. Some screen shots
were taken during the process and are shown below.
Now the MVD batchjob dialogue box is closed. And the results are imported into the work
space as shown below.
Each docked pose can now be visualized by selecting the pose of interest.
Drug docked in the cavity.
CHAPTER - 5
RESULTS AND DISCUSSIONS
5.1 CLIENT’S
In the above table, the mutations marked with stars lie in exonic regions and the others
are intronic. Since introns are spliced out after transcription, the intronic mutations are not
taken into account here. In all, there are 16 exonic mutations. Since p25 has a double
mutation [148967+149119], there are 17 different mutation cases as per the data
given.
5.2 Database search for wild type mRNA and protein sequence:
mRNA sequence :
>gi|62362411|ref|NM_007313.2| Homo sapiens c-abl oncogene 1, receptor tyrosine kinase (ABL1),
transcript variant b, mRNA
GGTTGGTGACTTCCACAGGAAAAGTTCTGGAGGAGTAGCCAAAGACCATCAGCGTTTCCT
TTATGTGTGAGAATTGAAATGACTAGCATTATTGACCCTTTTCAGCATCCCCTGTGAATATTT
CTGTTTAGGTTTTTCTTCTTGAAAAGAAATTGTTATTCAGCCCGTTTAAAACAAATCAAGA
AACTTTTGGGTAACATTGCAATTACATGAAATTGATAACCGCGAAAATAATTGGAACTCCT
GCTTGCAAGTGTCAACCTAAAAAAAGTGCTTCCTTTTGTTATGGAAGATGTCTTTCTGTGA
TTGACTTCAATTGCTGACTTGTGGAGATGCAGCGAATGTGAAATCCCACGTATATGCCATTT
CCCTCTACGCTCGCTGACCGTTCTGGAAGATCTTGAACCCTCTTCTGGAAAGGGGTACCTA
TTATTACTTTATGGGGCAGCAGCCTGGAAAAGTACTTGGGGACCAAAGAAGGCCAAGCTT
GCCTGCCCTGCATTTTATCAAAGGAGCAGGGAAGAAGGAATCATCGAGGCATGGGGGTCC
ACACTGCAATGTTTTTGTGGAACATGAAGCCCTTCAGCGGCCAGTAGCATCTGACTTTGAG
CCTCAGGGTCTGAGTGAAGCCGCTCGTTGGAACTCCAAGGAAAACCTTCTCGCTGGACCC
AGTGAAAATGACCCCAACCTTTTCGTTGCACTGTATGATTTTGTGGCCAGTGGAGATAACA
CTCTAAGCATAACTAAAGGTGAAAAGCTCCGGGTCTTAGGCTATAATCACAATGGGGAATG
GTGTGAAGCCCAAACCAAAAATGGCCAAGGCTGGGTCCCAAGCAACTACATCACGCCAG
TCAACAGTCTGGAGAAACACTCCTGGTACCATGGGCCTGTGTCCCGCAATGCCGCTGAGT
ATCTGCTGAGCAGCGGGATCAATGGCAGCTTCTTGGTGCGTGAGAGTGAGAGCAGTCCTG
GCCAGAGGTCCATCTCGCTGAGATACGAAGGGAGGGTGTACCATTACAGGATCAACACTG
CTTCTGATGGCAAGCTCTACGTCTCCTCCGAGAGCCGCTTCAACACCCTGGCCGAGTTGGT
TCATCATCATTCAACGGTGGCCGACGGGCTCATCACCACGCTCCATTATCCAGCCCCAAAG
CGCAACAAGCCCACTGTCTATGGTGTGTCCCCCAACTACGACAAGTGGGAGATGGAACGC
ACGGACATCACCATGAAGCACAAGCTGGGCGGGGGCCAGTACGGGGAGGTGTACGAGGG
CGTGTGGAAGAAATACAGCCTGACGGTGGCCGTGAAGACCTTGAAGGAGGACACCATGG
AGGTGGAAGAGTTCTTGAAAGAAGCTGCAGTCATGAAAGAGATCAAACACCCTAACCTG
GTGCAGCTCCTTGGGGTCTGCACCCGGGAGCCCCCGTTCTATATCATCACTGAGTTCATGA
CCTACGGGAACCTCCTGGACTACCTGAGGGAGTGCAACCGGCAGGAGGTGAACGCCGTG
GTGCTGCTGTACATGGCCACTCAGATCTCGTCAGCCATGGAGTACCTGGAGAAGAAAAAC
TTCATCCACAGAGATCTTGCTGCCCGAAACTGCCTGGTAGGGGAGAACCACTTGGTGAAG
GTAGCTGATTTTGGCCTGAGCAGGTTGATGACAGGGGACACCTACACAGCCCATGCTGGA
GCCAAGTTCCCCATCAAATGGACTGCACCCGAGAGCCTGGCCTACAACAAGTTCTCCATC
AAGTCCGACGTCTGGGCATTTGGAGTATTGCTTTGGGAAATTGCTACCTATGGCATGTCCC
CTTACCCGGGAATTGACCTGTCCCAGGTGTATGAGCTGCTAGAGAAGGACTACCGCATGG
AGCGCCCAGAAGGCTGCCCAGAGAAGGTCTATGAACTCATGCGAGCATGTTGGCAGTGGA
ATCCCTCTGACCGGCCCTCCTTTGCTGAAATCCACCAAGCCTTTGAAACAATGTTCCAGGA
ATCCAGTATCTCAGACGAAGTGGAAAAGGAGCTGGGGAAACAAGGCGTCCGTGGGGCTG
TGAGTACCTTGCTGCAGGCCCCAGAGCTGCCCACCAAGACGAGGACCTCCAGGAGAGCT
GCAGAGCACAGAGACACCACTGACGTGCCTGAGATGCCTCACTCCAAGGGCCAGGGAGA
GAGCGATCCTCTGGACCATGAGCCTGCCGTGTCTCCATTGCTCCCTCGAAAAGAGCGAGG
TCCCCCGGAGGGCGGCCTGAATGAAGATGAGCGCCTTCTCCCCAAAGACAAAAAGACCA
ACTTGTTCAGCGCCTTGATCAAGAAGAAGAAGAAGACAGCCCCAACCCCTCCCAAACGC
AGCAGCTCCTTCCGGGAGATGGACGGCCAGCCGGAGCGCAGAGGGGCCGGCGAGGAAG
AGGGCCGAGACATCAGCAACGGGGCACTGGCTTTCACCCCCTTGGACACAGCTGACCCA
GCCAAGTCCCCAAAGCCCAGCAATGGGGCTGGGGTCCCCAATGGAGCCCTCCGGGAGTC
CGGGGGCTCAGGCTTCCGGTCTCCCCACCTGTGGAAGAAGTCCAGCACGCTGACCAGCA
GCCGCCTAGCCACCGGCGAGGAGGAGGGCGGTGGCAGCTCCAGCAAGCGCTTCCTGCGC
TCTTGCTCCGCCTCCTGCGTTCCCCATGGGGCCAAGGACACGGAGTGGAGGTCAGTCACG
CTGCCTCGGGACTTGCAGTCCACGGGAAGACAGTTTGACTCGTCCACATTTGGAGGGCAC
AAAAGTGAGAAGCCGGCTCTGCCTCGGAAGAGGGCAGGGGAGAACAGGTCTGACCAGG
TGACCCGAGGCACAGTAACGCCTCCCCCCAGGCTGGTGAAAAAGAATGAGGAAGCTGCT
GATGAGGTCTTCAAAGACATCATGGAGTCCAGCCCGGGCTCCAGCCCGCCCAACCTGACT
CCAAAACCCCTCCGGCGGCAGGTCACCGTGGCCCCTGCCTCGGGCCTCCCCCACAAGGA
AGAAGCTGGAAAGGGCAGTGCCTTAGGGACCCCTGCTGCAGCTGAGCCAGTGACCCCCA
CCAGCAAAGCAGGCTCAGGTGCACCAGGGGGCACCAGCAAGGGCCCCGCCGAGGAGTC
CAGAGTGAGGAGGCACAAGCACTCCTCTGAGTCGCCAGGGAGGGACAAGGGGAAATTGT
CCAGGCTCAAACCTGCCCCGCCGCCCCCACCAGCAGCCTCTGCAGGGAAGGCTGGAGGA
AAGCCCTCGCAGAGCCCGAGCCAGGAGGCGGCCGGGGAGGCAGTCCTGGGCGCAAAGA
CAAAAGCCACGAGTCTGGTTGATGCTGTGAACAGTGACGCTGCCAAGCCCAGCCAGCCG
GGAGAGGGCCTCAAAAAGCCCGTGCTCCCGGCCACTCCAAAGCCACAGTCCGCCAAGCC
GTCGGGGACCCCCATCAGCCCAGCCCCCGTTCCCTCCACGTTGCCATCAGCATCCTCGGCC
CTGGCAGGGGACCAGCCGTCTTCCACCGCCTTCATCCCTCTCATATCAACCCGAGTGTCTC
TTCGGAAAACCCGCCAGCCTCCAGAGCGGATCGCCAGCGGCGCCATCACCAAGGGCGTG
GTCCTGGACAGCACCGAGGCGCTGTGCCTCGCCATCTCTAGGAACTCCGAGCAGATGGCC
AGCCACAGCGCAGTGCTGGAGGCCGGCAAAAACCTCTACACGTTCTGCGTGAGCTATGTG
GATTCCATCCAGCAAATGAGGAACAAGTTTGCCTTCCGAGAGGCCATCAACAAACTGGAG
AATAATCTCCGGGAGCTTCAGATCTGCCCGGCGACAGCAGGCAGTGGTCCAGCGGCCACT
CAGGACTTCAGCAAGCTCCTCAGTTCGGTGAAGGAAATCAGTGACATAGTGCAGAGGTAG
CAGCAGTCAGGGGTCAGGTGTCAGGCCCGTCGGAGCTGCCTGCAGCACATGCGGGCTCG
CCCATACCCGTGACAGTGGCTGACAAGGGACTAGTGAGTCAGCACCTTGGCCCAGGAGCT
CTGCGCCAGGCAGAGCTGAGGGCCCTGTGGAGTCCAGCTCTACTACCTACGTTTGCACCG
CCTGCCCTCCCGCACCTTCCTCCTCCCCGCTCCGTCTCTGTCCTCGAATTTTATCTGTGGAG
TTCCTGCTCCGTGGACTGCAGTCGGCATGCCAGGACCCGCCAGCCCCGCTCCCACCTAGT
GCCCCAGACTGAGCTCTCCAGGCCAGGTGGGAACGGCTGATGTGGACTGTCTTTTTCATTT
TTTTCTCTCTGGAGCCCCTCCTCCCCCGGCTGGGCCTCCTTCTTCCACTTCTCCAAGAATG
GAAGCCTGAACTGAGGCCTTGTGTGTCAGGCCCTCTGCCTGCACTCCCTGGCCTTGCCCG
TCGTGTGCTGAAGACATGTTTCAAGAACCGCATTTCGGGAAGGGCATGCACGGGCATGCA
CACGGCTGGTCACTCTGCCCTCTGCTGCTGCCCGGGGTGGGGTGCACTCGCCATTTCCTCA
CGTGCAGGACAGCTCTTGATTTGGGTGGAAAACAGGGTGCTAAAGCCAACCAGCCTTTGG
GTCCTGGGCAGGTGGGAGCTGAAAAGGATCGAGGCATGGGGCATGTCCTTTCCATCTGTC
CACATCCCCAGAGCCCAGCTCTTGCTCTCTTGTGACGTGCACTGTGAATCCTGGCAAGAA
AGCTTGAGTCTCAAGGGTGGCAGGTCACTGTCACTGCCGACATCCCTCCCCCAGCAGAAT
GGAGGCAGGGGACAAGGGAGGCAGTGGCTAGTGGGGTGAACAGCTGGTGCCAAATAGCC
CCAGACTGGGCCCAGGCAGGTCTGCAAGGGCCCAGAGTGAACCGTCCTTTCACACATCTG
GGTGCCCTGAAAGGGCCCTTCCCCTCCCCCACTCCTCTAAGACAAAGTAGATTCTTACAAG
GCCCTTTCCTTTGGAACAAGACAGCCTTCACTTTTCTGAGTTCTTGAAGCATTTCAAAGCC
CTGCCTCTGTGTAGCCGCCCTGAGAGAGAATAGAGCTGCCACTGGGCACCTGCGCACAGG
TGGGAGGAAAGGGCCTGGCCAGTCCTGGTCCTGGCTGCACTCTTGAACTGGGCGAATGTC
TTATTTAATTACCGTGAGTGACATAGCCTCATGTTCTGTGGGGGTCATCAGGGAGGGTTAG
GAAAACCACAAACGGAGCCCCTGAAAGCCTCACGTATTTCACAGAGCACGCCTGCCATCT
TCTCCCCGAGGCTGCCCCAGGCCGGAGCCCAGATACGGGGGCTGTGACTCTGGGCAGGG
ACCCGGGGTCTCCTGGACCTTGACAGAGCAGCTAACTCCGAGAGCAGTGGGCAGGTGGC
CGCCCCTGAGGCTTCACGCCGGGAGAAGCCACCTTCCCACCCCTTCATACCGCCTCGTGC
CAGCAGCCTCGCACAGGCCCTAGCTTTACGCTCATCACCTAAACTTGTACTTTATTTTTCTG
ATAGAAATGGTTTCCTCTGGATCGTTTTATGCGGTTCTTACAGCACATCACCTCTTTGCCCC
CGACGGCTGTGACGCAGCCGGAGGGAGGCACTAGTCACCGACAGCGGCCTTGAAGACAG
AGCAAAGCGCCCACCCAGGTCCCCCGACTGCCTGTCTCCATGAGGTACTGGTCCCTTCCTT
TTGTTAACGTGATGTGCCACTATATTTTACACGTATCTCTTGGTATGCATCTTTTATAGACGC
TCTTTTCTAAGTGGCGTGTGCATAGCGTCCTGCCCTGCCCCCTCGGGGGCCTGTGGTGGCT
CCCCCTCTGCTTCTCGGGGTCCAGTGCATTTTGTTTCTGTATATGATTCTCTGTGGTTTTTTT
TGAATCCAAATCTGTCCTCTGTAGTATTTTTTAAATAAATCAGTGTTTACATTAGAA
SESRFNTLAELVHHHSTVADGLITTLHYPAPKRNKPTVYGVSPNYDKWEMERTDITMKHKLGGGQYGE
VYEGVWKKYSLTVAVKTLKEDTMEVEEFLKEAAVMKEIKHPNLVQLLGVCTREPPFYIITEFMTYGNLL
DYLRECNRQEVNAVVLLYMATQISSAMEYLEKKNFIHRDLAARNCLVGENHLVKVADFGLSRLMTGDT
YTAHAGAKFPIKWTAPESLAYNKFSIKSDVWAFGVLLWEIATYGMSPYPGIDLSQVYELLEKDYRMERP
EGCPEKVYELMRACWQWNPSDRPSFAEIHQAFETMFQESSISDEVEKELGKQGVRGAVSTLLQAPELPT
KTRTSRRAAEHRDTTDVPEMPHSKGQGESDPLDHEPAVSPLLPRKERGPPEGGLNEDERLLPKDKKTNL
FSALIKKKKKTAPTPPKRSSSFREMDGQPERRGAGEEEGRDISNGALAFTPLDTADPAKSPKPSNGAGVP
NGALRESGGSGFRSPHLWKKSSTLTSSRLATGEEEGGGSSSKRFLRSCSASCVPHGAKDTEWRSVTLPR
DLQSTGRQFDSSTFGGHKSEKPALPRKRAGENRSDQVTRGTVTPPPRLVKKNEEAADEVFKDIMESSPG
SSPPNLTPKPLRRQVTVAPASGLPHKEEAGKGSALGTPAAAEPVTPTSKAGSGAPGGTSKGPAEESRVRR
HKHSSESPGRDKGKLSRLKPAPPPPPAASAGKAGGKPSQSPSQEAAGEAVLGAKTKATSLVDAVNSDAA
KPSQPGEGLKKPVLPATPKPQSAKPSGTPISPAPVPSTLPSASSALAGDQPSSTAFIPLISTRVSLRKTRQPP
ERIASGAITKGVVLDSTEALCLAISRNSEQMASHSAVLEAGKNLYTFCVSYVDSIQQMRNKFAFREAINK
LENNLRELQICPATAGSGPAATQDFSKLLSSVKEISDIVQR
T=Threonine; P=Proline; Y=Tyrosine; S=Serine; A=Alanine; D=Aspartic acid; E=Glutamic acid; Q=Glutamine; N=Asparagine; V=Valine; G=Glycine; K=Lysine;
Here in this table and hereafter, T514K means that T in the wild type has been replaced by K
in the mutant type at position 514. The above table shows that there are 5 mutation cases
where the mutant protein is truncated. A truncated protein cannot form an active protein, and
hence these mutation cases have been omitted in the next steps. The above table also shows
that there are 3 mutation cases in which, although there is a change at the nucleotide level,
the changed codon codes for the same amino acid and hence there is no change at the protein
level.
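The three outcomes described here (truncated protein, unchanged protein, amino acid substitution) can be told apart by comparing the wild-type and mutant protein sequences. A minimal sketch, with a hypothetical function name and toy sequences in the usage example:

```python
from typing import List, Tuple


def classify_mutation(wild: str, mutant: str) -> Tuple[str, List[str]]:
    """Compare two protein sequences and label the effect of the mutation.

    Returns a (kind, changes) pair, where changes use the T514K style:
    wild-type residue, 1-based position, mutant residue.
    """
    if mutant == wild:
        return "silent", []      # codon changed, amino acid did not
    if len(mutant) < len(wild):
        return "truncated", []   # premature stop codon
    if len(mutant) > len(wild):
        return "other", []       # e.g. loss of the stop codon
    changes = [
        f"{w}{pos}{m}"
        for pos, (w, m) in enumerate(zip(wild, mutant), start=1)
        if w != m
    ]
    return "missense", changes


if __name__ == "__main__":
    # Toy sequences for illustration only.
    print(classify_mutation("MTKA", "MKKA"))
    print(classify_mutation("MTKA", "MT"))
    print(classify_mutation("MTKA", "MTKA"))
```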
The BLASTp result of the wild-type Abl tyrosine kinase protein against the PDB database
is given below. Only the top few hits are shown here.
Search Parameters
Program blastp
Word size 3
Expect value 10
Hitlist size 100
Gapcosts 11,1
Matrix BLOSUM62
Threshold 11
Composition-based stats 2
Filter string F
Genetic Code 1
Window Size 40
Database
Descriptions
pdb|1OPL|A Chain A, Structural Basis For The Auto-Inhibition ... 1128 0.0
pdb|1OPK|A Chain A, Structural Basis For The Auto-Inhibition ... 1033 0.0
pdb|2FO0|A Chain A, Organization Of The Sh3-Sh2 Unit In Activ... 1021 0.0
pdb|2E2B|A Chain A, Crystal Structure Of The C-Abl Kinase Dom... 617 9e-177
pdb|2QOH|A Chain A, Crystal Structure Of Abl Kinase Bound Wit... 611 5e-175
pdb|1FPU|A Chain A, Crystal Structure Of Abl Kinase Domain In... 611 5e-175
pdb|2G1T|A Chain A, A Src-Like Inactive Conformation In The A... 611 5e-175
pdb|2F4J|A Chain A, Structure Of The Kinase Domain Of An Imat... 610 8e-175
Score = 1128 bits (2917), Expect = 0.0, Method: Compositional matrix adjust.
Identities = 528/534 (98%), Positives = 533/534 (99%), Gaps = 0/534 (0%)
Query 1 MGQQPGKVLGDQRRPSLPALHFIKGAGKKESSRHGGPHCNVFVEHEALQRPVASDFEPQG 60
MGQQPGKVLGDQRRPSLPALHFIKGAGK++SSRHGGPHCNVFVEHEALQRPVASDFEPQG
Sbjct 1 MGQQPGKVLGDQRRPSLPALHFIKGAGKRDSSRHGGPHCNVFVEHEALQRPVASDFEPQG 60
Score = 1033 bits (2670), Expect = 0.0, Method: Compositional matrix adjust.
Identities = 484/488 (99%), Positives = 488/488 (100%), Gaps = 0/488 (0%)
Score = 617 bits (1590), Expect = 9e-177, Method: Compositional matrix adjust.
Identities = 287/287 (100%), Positives = 287/287 (100%), Gaps = 0/287 (0%)
Score = 611 bits (1575), Expect = 5e-175, Method: Compositional matrix adjust.
Identities = 284/286 (99%), Positives = 286/286 (100%), Gaps = 0/286 (0%)
Of the above alignment results, the protein 2E2B [Crystal Structure Of The C-Abl Kinase
Domain In Complex With Inno-406] gave 100% identity. The PDB file of this protein was
downloaded from the PDB and then used as a template for homology modelling in ICM Molsoft.
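The "Identities" figures in the BLAST output are a count of matching positions over the alignment length. The sketch below recomputes them for the 60-residue segment shown for the first hit; the function names are hypothetical, and the percentage is truncated to a whole number, which reproduces figures such as 528/534 being reported as 98%.

```python
from typing import Tuple


def count_identities(query: str, subject: str) -> Tuple[int, int]:
    """Number of identical aligned positions and the alignment length."""
    if len(query) != len(subject):
        raise ValueError("aligned sequences must have the same length")
    matches = sum(1 for q, s in zip(query, subject) if q == s and q != "-")
    return matches, len(query)


def blast_style(matches: int, length: int) -> str:
    """Format as 'matches/length (percent%)' with the percentage truncated."""
    return f"{matches}/{length} ({100 * matches // length}%)"


if __name__ == "__main__":
    query = "MGQQPGKVLGDQRRPSLPALHFIKGAGKKESSRHGGPHCNVFVEHEALQRPVASDFEPQG"
    sbjct = "MGQQPGKVLGDQRRPSLPALHFIKGAGKRDSSRHGGPHCNVFVEHEALQRPVASDFEPQG"
    print(blast_style(*count_identities(query, sbjct)))
```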
The energy deviations in the above table show that, among the 11 mutant types, Y234S has
a large deviation, but this deviation is negative, which suggests that this mutation “might”
in fact help Imatinib binding and, consequently, effective drug action. On the other
hand, the highest positive energy deviation is for the mutant model V467G. This indicates
that a patient with this mutation who is given Imatinib to combat CML “may” need to be
monitored for drug resistance. The two mutations N355N and Q271Q showed no appreciable
docking energy deviation from the wild type, since there was no change in the amino acid
residue in these two cases.
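The deviation used in this comparison is the mutant docking energy minus the wild-type docking energy, so a negative value means the drug docks more favourably to the mutant. A minimal sketch with hypothetical labels and numbers; these are not the values obtained in this study:

```python
from typing import Dict


def energy_deviations(wild_energy: float, mutant_energies: Dict[str, float]) -> Dict[str, float]:
    """Docking energy of each mutant minus that of the wild type."""
    return {name: energy - wild_energy for name, energy in mutant_energies.items()}


def interpret(deviation: float, tolerance: float = 0.0) -> str:
    """Qualitative reading of a deviation; `tolerance` is an assumed cut-off."""
    if deviation < -tolerance:
        return "binding may be more favourable"
    if deviation > tolerance:
        return "possible drug resistance"
    return "no appreciable change"


if __name__ == "__main__":
    # Hypothetical energies in arbitrary units, for illustration only.
    deviations = energy_deviations(-150.0, {"mutant_A": -160.0, "mutant_B": -140.0})
    for name, value in deviations.items():
        print(name, value, interpret(value))
```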
CHAPTER - 6
CONCLUSIONS AND SCOPE
6.1 CONCLUSIONS:
Bioinformatics has led to an approach where certain assumptions can be made for a particular
case in a very time-, cost- and labour-efficient way. This in silico approach has helped many
researchers to eliminate certain instances in a big project and to finish it in less time and at
lower cost. Of course, the results of an in silico approach cannot be taken as final; they need
to be tested to some extent in vitro. In this project the situation is similar: the results here
are only an assumption, meant to help researchers choose the direction in which the project
should proceed.
After evaluating the mutation data sent by the client, creating the mutant nucleotide
sequences, finding their reading frames, modelling the mutant proteins in ICM Molsoft, and
docking the drug in question onto these mutant models with Molegro Virtual Docker, we have
come to the conclusion that, out of the 16 mutation cases, 5 mutant proteins are truncated
and cannot form an active protein. Of the remaining 11, there are 2 mutant cases that show
no change at the protein level and hence no appreciable docking energy deviation; 4 mutant
cases where the mutation “might” act favourably for Imatinib to bind and act effectively
against CML; and 5 mutant cases where the mutation “might” cause slight, if not severe,
Imatinib drug resistance.
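The step that separates truncated mutants from full-length ones can be sketched as below: apply a single-base substitution to a coding sequence, translate it in frame, and check whether a premature stop codon shortens the protein. This is a minimal illustration under stated assumptions, not the procedure actually used in the project: the function names are hypothetical, the sequence is a toy example rather than the ABL coding sequence, and the reading frame is assumed to start at the first base.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG codon ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds):
    """Translate a coding sequence from its first base up to the first stop codon."""
    protein = []
    for start in range(0, len(cds) - 2, 3):
        amino_acid = CODON_TABLE[cds[start:start + 3]]
        if amino_acid == "*":  # stop codon ends translation
            break
        protein.append(amino_acid)
    return "".join(protein)


def apply_substitution(cds, position, new_base):
    """Replace the base at a 1-based nucleotide position."""
    return cds[:position - 1] + new_base + cds[position:]


def is_truncated(wild_cds, mutant_cds):
    """True when the mutant protein is shorter than the wild-type protein."""
    return len(translate(mutant_cds)) < len(translate(wild_cds))


if __name__ == "__main__":
    wild = "ATGGGACAGCAGTAA"  # toy sequence (hypothetical), not the real gene
    nonsense = apply_substitution(wild, 7, "T")  # CAG -> TAG, premature stop
    silent = apply_substitution(wild, 6, "C")    # GGA -> GGC, still glycine
    print(translate(wild), translate(nonsense), is_truncated(wild, nonsense))
    print(translate(silent), is_truncated(wild, silent))
```

Frameshift insertions or deletions would need a separate helper, since this sketch handles substitutions only.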
The scope of this project remains large. About 250 patients were screened and their DNA was
sequenced to obtain the mutation data; screening more individuals would extend it. Also, we
have considered only one mutation in each case [except P25], as given in the individual
details list. The relationship between mutation and drug resistance can be tested further
with combinations of these mutations. One more aspect that may be looked into is designing a
new drug, or testing other drugs, in silico and in clinical trials for patients with
confirmed Imatinib drug resistance.