Quantitative Structure Activity Relationship (QSAR) Study of HIV-1 Integrase Inhibitors Based On Chalcone Pharmacophore
M.Sc. Bioinformatics
B.SANGEETHA
(Reg. No: 1076013)
Dr. R. KRISHNA
Lecturer,
Centre for Bioinformatics
Pondicherry University
MAY 2009
CERTIFICATE
EXTERNAL EXAMINER
DECLARATION
Place:
Date: B.SANGEETHA
ACKNOWLEDGEMENTS
I extend my gratitude and sincere thanks to Dr. Suresh Kumar and Mr. M.
Sundara Mohan for their kindness and support during the course of the work.
I would especially like to thank all my friends for their constant support during the
progress of the work and throughout the course.
B. Sangeetha
CONTENTS
1. INTRODUCTION
2. REVIEW OF LITERATURE
2.1 HUMAN IMMUNODEFICIENCY VIRUS AND AIDS
2.2 NATURAL PRODUCTS AS DRUGS
2.3 HIV INTEGRASE
2.4 PHARMACOPHORE HYPOTHESIS MODEL
2.5 MOLECULAR DOCKING
2.5.1 TERMS USED IN MOLECULAR DOCKING
2.6 QUANTITATIVE STRUCTURE ACTIVITY RELATIONSHIP (QSAR)
2.6.1 DESCRIPTORS
2.6.2 STATISTICAL TERMS USED IN QSAR
2.6.2.1 DEPENDENT VARIABLE
2.6.2.2 INDEPENDENT VARIABLES
2.6.2.3 COVARIANCE
2.6.2.4 PEARSON r/R
2.6.2.5 MULTIPLE LINEAR REGRESSION
2.6.2.6 R-SQUARED
2.6.2.7 F-STATISTIC
2.6.2.8 P-VALUE
3. OBJECTIVE
4. MATERIALS
4.1 HIV-1 Integrase
4.2 LIGANDS
4.3 AutoDock 4.0
4.4 STATISTICA
4.5 HYPERCHEM
4.6 QIKPROP
5. METHODS
5.1 SCREENING
5.1.1 TERMS USED FOR SCREENING
5.2 DOCKING
5.3 SELECTION OF DESCRIPTORS
5.3.1 HOMO
5.3.2 DIPOLE
5.3.3 HYDRATION ENERGY
5.4 MODEL GENERATION
5.5 PREDICTION
6. RESULTS AND DISCUSSION
6.1 AUTODOCK 4.0
6.2 QSAR RESULTS
6.2.1 TRAINING SET DATA
6.2.2 CORRELATIONS FOR TRAINING SET
6.2.3 COVARIANCES FOR TRAINING SET
6.2.4 REGRESSION RESULTS
6.2.4.1 REGRESSION SUMMARY FOR THE TOP 6
6.2.4.2 SUMMARY OF RESIDUALS AND PREDICTED
6.2.4.3 REGRESSION SUMMARY FOR THREE FACTORS WITH GOOD t-VALUE
6.2.4.4 SUMMARY OF RESIDUALS AND PREDICTED
6.2.4.5 PLOT OF OUTLIERS
6.2.4.6 REGRESSION AFTER REMOVING OUTLIER
6.2.4.7 SUMMARY OF RESIDUAL AND PREDICTED
6.2.4.8 PLOT OF PREDICTED AND OBSERVED pIC50 FOR THE TRAINING SET
6.2.4.9 PREDICTED AND OBSERVED pIC50 FOR THE TEST SET
7. CONCLUSION
8. REFERENCES
QSAR STUDY OF HIV-1 INTEGRASE INHIBITORS
1. INTRODUCTION:
The drug industry is one of the major players in the development of the field of
Bioinformatics[1]. Many pharmaceutical companies have internal teams conducting
Bioinformatics research. Most drugs are small molecules designed to bind to, interact
with, and modulate the activity of specific biological receptors. Receptors are proteins
that bind and interact with other molecules to perform the numerous functions required for
the maintenance of life. They include an immense array of cell-surface receptors,
enzymes and other functional proteins. Due to genetic abnormalities, physiologic stress or
some combination thereof, the function of specific receptors and enzymes may become
altered to the point that our well-being is diminished. The role of drugs is to correct the
functioning of these receptors.
Structural bioinformatics can facilitate the discovery, design and optimization of new
chemical entities, ranging from drugs and biological probes to biomaterials. CADD
(Computer-Aided Drug Design)[1] is a specialized discipline that uses computational
methods to simulate drug-receptor interactions. Drug design is an iterative process
involving drug discovery, lead optimization and chemical synthesis, with the aim of
maximizing functional activity and minimizing adverse effects. In this sense, drug design
is an optimization problem.
HIV infection is a growing problem, and to date there is no cure. Combination therapies
are used for treatment, but they must be taken long term. Studies are being carried out to
find a better way to treat HIV.
QSAR is a lead optimization method by which new drugs with the desired activity can be
designed and synthesized. 2D-QSAR is a statistical method that expresses the numerical
relationship between the structural properties (descriptors) and the biological activity
of a series of compounds. Based on this relationship, new compounds can be synthesized.
2. REVIEW OF LITERATURE:
2.1 HUMAN IMMUNODEFICIENCY VIRUS AND AIDS:
Two major types of HIV have been identified so far, HIV-1 and HIV-2[2]. HIV-1 is the
cause of the worldwide epidemic and is most commonly referred to as HIV. It is a highly
variable virus, which mutates readily, and there are different strains of HIV-1. HIV-2 is
much less pathogenic and occurs rarely. It is found mostly in West Africa.
HIV begins its infection of a susceptible host cell by binding to the CD4 receptor on
the host cell. It also needs a co-receptor to enter the cell. The co-receptors were found
to be CCR5 and CXCR4[3]. Based on the co-receptor used, strains of the virus are
classified as R5, X4 and R5X4. The R5 strain does not infect naïve T-cells. All three
strains infect memory T-cells.
Following fusion and entry into the host cell, the viral RNA (genetic material) is
released and undergoes reverse transcription into DNA. HIV REVERSE TRANSCRIPTASE is
necessary to catalyze the conversion of viral RNA into DNA. The viral DNA enters the host
cell nucleus, where it integrates into the genetic material of the cell. The enzyme INTEGRASE
catalyses this process. The integrated viral DNA is transcribed in the host cell nucleus, and
the resulting mRNA is translated into structural proteins in the form of polyproteins, which
are cleaved by the viral PROTEASE. All three of these enzymes have been targeted by anti-HIV
drugs in recent years. Among them, integrase is the most recently identified target.
HIV mutates rapidly and develops drug resistance, and current drugs have cytotoxic side
effects. As a result, many patients experience unsatisfactory virologic, immunologic, or
clinical outcomes from currently available therapies, which limits therapeutic options.
Hence combinations of inhibitors are used in therapy (Combination Therapy or HAART, Highly
Active Anti-Retroviral Therapy). Present HIV therapies consist of combinations of integrase
inhibitors, nucleoside reverse transcriptase inhibitors (NRTIs), non-nucleoside reverse
transcriptase inhibitors (NNRTIs), protease inhibitors and fusion inhibitors[2].
2.2 NATURAL PRODUCTS AS DRUGS:
Nature has always provided a source of drugs for various ailments. A number of
medicinal plants have been reported to have anti-HIV properties[2]. The bioactivity-guided
fractionation of crude extracts has provided lead molecules for the discovery of anti-HIV
drug candidates. A variety of secondary metabolites of natural origin have shown
moderate to good anti-HIV activity. The natural products with anti-HIV properties can
be classified into the following groups:
Alkaloids
Coumarins
Flavonoids
Lignans
Phenolics
Quinones
Saponins
Terpenes/sterols
Xanthones
Carbohydrates
Peptides
Proteins
Some of these natural products, such as calanolide and andrographolide, are currently
in clinical trials for approval as drugs.
Flavonoids have long been studied for their wide variety of functions and because they
are readily available in nature as plant coloring pigments. Chalcones are compounds
that belong to the flavonoid class. They have anticancer, antiviral, antibacterial and
antifungal activity, and are of interest because of their easy availability. For example,
xanthohumol is a prenylated chalcone found in the plant Humulus lupulus. The flowers of this
plant (hops) are used in the beer industry as a flavoring agent. Studies are being carried
out to increase the xanthohumol content of beer, so that an anti-HIV beer can be produced
and marketed.
Since chalcones are non-specific inhibitors and are also cytotoxic in a large number of
tumor cell lines, studies are being carried out to find which part of the compound’s
structure is important for its anti-HIV activity. The main difficulty is that the HIV
target has not yet been identified for all of the compounds.
2.3 HIV INTEGRASE:
HIV-1 integrase[4] is a 32-kDa enzyme, encoded by the pol gene, that carries out DNA
integration in a two-step reaction.
1. 3’ processing: two nucleotides are removed from each 3’ end of the viral DNA made
by reverse transcription.
2. DNA strand transfer: a pair of transesterification reactions integrates the ends of
the viral DNA into the host genome.
Integrase comprises three structurally and functionally distinct domains, and all
three domains are required for each step of the integration reaction. The isolated domains
form homodimers in solution, and the three-dimensional structures of all three separate
dimers have been determined. Although little is known concerning the organization of these
domains in the active complex with DNA substrates, integrase is likely to function as at least
a tetramer. Extensive mutagenesis studies mapped the catalytic site to the core domain
(residues 50-212), which contains the catalytic residues Asp-64, Asp-116, and Glu-152. The
structure of this domain of HIV-1 integrase has been reported both in apoenzyme form
(1BIS) and bound with magnesium as the cofactor (1QS4).
2.4 PHARMACOPHORE HYPOTHESIS MODEL:
There are numerous potential interactions between ligand and receptor. Depending
upon the size of the active site, there may be numerous steric, electrostatic, and
hydrophobic contacts. However, some are more important than others. The specific
interactions that are crucial for ligand recognition and binding by the receptor are termed
the pharmacophore[1]. Usually, these are the interactions that directly factor into the
structural integrity of a receptor or are involved in the mechanism of its action.
A pharmacophore model can be derived from a set of known ligands for the target. A
pharmacophore can be developed using the set of features common to a series of active
molecules. These features can include acceptors, donors, ring centroids, hydrophobes, etc.
A 3-D pharmacophore is used to define the spatial relationship between the groups or
features.
sites. Binding sites are areas of protein known to be active in forming of compounds. There
are several possible mutual conformations in which binding may occur. These are commonly
called binding modes.
Molecular docking can be divided into two separate problems[6]. The search
algorithm should create an optimum number of configurations that include the
experimentally determined binding modes. These configurations are evaluated using scoring
functions to distinguish the experimental binding modes from all other modes explored
through the searching algorithm.
Search algorithms used in docking include:
Molecular dynamics
Monte Carlo methods
Genetic algorithms
Fragment-based methods
Point complementarity methods
Distance geometry methods
Tabu searches
Systematic searches
Scoring functions include:
Force-field methods
Empirical free energy scoring functions
Knowledge-based potential of mean force
2.6 QUANTITATIVE STRUCTURE ACTIVITY RELATIONSHIP (QSAR):
A QSAR study tries to establish a link between the ability of a molecule to
perform its desired function and the properties of that molecule. QSAR methods involve the
statistical analysis of a set of properties or descriptors for a series of biologically active
molecules. The statistical model developed is then used to predict the activity of additional
compounds against the target. The QSAR model can be used to predict which members of a
series of proposed compounds are likely to be active and therefore should be synthesized
and tested. As new compounds are assayed, the additional experimental data are used to
refine the model. The relationship between these numerical properties and the activity is
often described by an equation of the general form[1]:
V=f(p)
where V is the activity in question, p are structure-derived properties of the molecule (i.e.
descriptors), and f is some function. QSAR is applicable only to series of similar compounds.
2.6.1 DESCRIPTORS:
Constitutional descriptors
Topological descriptors
Geometrical descriptors
Electrostatic descriptors
Quantum chemical descriptors
Molecular Orbital related descriptors
Thermodynamic descriptors
Density Functional Theory (DFT) and Reactivity descriptors
Activities that are used in QSAR include IC50, EC50, etc., that can be obtained by
chemical measurements and biological assays.
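Since the later sections work with pIC50 rather than IC50, the conversion can be sketched as follows. This is a minimal illustration that assumes the conventional definition pIC50 = -log10(IC50 in mol/L) and an input in micromolar; the function name is hypothetical and not part of the study.

```python
import math

def pic50_from_micromolar(ic50_um: float) -> float:
    """Convert an IC50 in micromolar to pIC50 = -log10(IC50 in mol/L).

    Assumes the conventional molar definition of pIC50.
    """
    if ic50_um <= 0:
        raise ValueError("IC50 must be positive")
    return -math.log10(ic50_um * 1e-6)

# A lower IC50 (more potent compound) gives a higher pIC50.
weak = pic50_from_micromolar(100.0)
strong = pic50_from_micromolar(0.6)
```

Working on the logarithmic scale keeps activities that span several orders of magnitude within a range suitable for linear regression.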
2.6.2 STATISTICAL TERMS USED IN QSAR:
2.6.2.1 DEPENDENT VARIABLE:
The dependent variable (or response variable) is the variable that is being fitted to in
a regression model. It is referred to as dependent as it is assumed that its values are
dependent on the values of independent variables that will be used to generate the
predictive model. This variable is also referred to as the dependent descriptor or the activity
property.
2.6.2.2 INDEPENDENT VARIABLES:
The independent variables are those that are being used to fit a regression to a
dependent variable in partial least squares, principal component analysis, or multiple linear
regression methods. They are referred to as independent as their values are assumed not to
depend on the values of the dependent variable. The term independent descriptor is also
used.
2.6.2.3 COVARIANCE:
The covariance measures the extent to which two variables vary together. A positive
value of the covariance indicates that larger than average values of one variable tend to be
paired with larger than average values of the second variable. A negative value of the
covariance indicates that larger than average values of one variable tend to be paired with
smaller than average values of the second variable. A zero covariance indicates that there
is no linear relationship between the two variables. The covariance depends on the
magnitude of the variables involved and is most useful when the variables are measured on
the same scale.
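The dependence of the covariance on the scale of the variables can be seen in a small sketch. The data below are made up for illustration, and the sample form of the covariance (division by n - 1) is an assumption.

```python
def covariance(x, y):
    """Sample covariance of two equal-length sequences (divides by n - 1)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)

# Illustrative values only
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 6.0, 8.0]
cov_xy = covariance(x, y)
# Rescaling x by a factor of 10 rescales the covariance by the same factor.
cov_scaled = covariance([10 * a for a in x], y)
```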
2.6.2.4 PEARSON r/R:
The Pearson r is a correlation coefficient that determines the extent to which two
variables are proportional to one another. In other words, the Pearson r provides a measure
of linear association between variables. Calculated Pearson r values lie on a scale from -1.0
to +1.0, with negative values indicating that the best least-squares line between variables x
and y slopes downward to the right and positive values indicating that the best line slopes
upward to the right. A value of zero indicates no linear correlation between the two
variables. The Pearson r is independent of the magnitude of the variables (unlike the
covariance).
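A short sketch of the Pearson r, written from its textbook definition, shows the scale independence mentioned above. The data are illustrative only.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Illustrative values only
x = [1.0, 2.0, 3.0, 5.0]
y = [1.2, 1.9, 3.4, 4.8]
r = pearson_r(x, y)
# Multiplying x by a positive constant leaves r unchanged.
r_scaled = pearson_r([1000 * a for a in x], y)
```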
2.6.2.5 MULTIPLE LINEAR REGRESSION:
Multiple linear regression (MLR) generates linear equations that describe the
relationship between a set of independent descriptors and a dependent descriptor.
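As an illustration of what an MLR fit computes, the sketch below solves the least-squares normal equations with plain Python. It is not the routine used by any statistics package mentioned in this work; the function name and the data are hypothetical.

```python
def fit_mlr(X, y):
    """Least-squares fit of y = b0 + b1*x1 + ... + bk*xk.

    X is a list of rows (one row of descriptor values per compound).
    Returns [b0, b1, ..., bk], found by solving the normal equations
    (A^T A) b = A^T y with Gauss-Jordan elimination and partial pivoting.
    """
    A = [[1.0] + list(row) for row in X]  # prepend the intercept column
    m = len(A[0])
    # Augmented matrix [A^T A | A^T y]
    M = [
        [sum(r[i] * r[j] for r in A) for j in range(m)]
        + [sum(r[i] * yi for r, yi in zip(A, y))]
        for i in range(m)
    ]
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("descriptors are collinear; fit is not unique")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(m):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [M[i][m] / M[i][i] for i in range(m)]

# Hypothetical data generated from y = 1 + 2*x1 - 3*x2
X = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]]
y = [1, 3, -2, 0, 2]
coefficients = fit_mlr(X, y)  # intercept first, then one slope per descriptor
```

With real descriptor data the fit will not be exact, and the residuals are what the statistics in the following sections summarize.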
2.6.2.6 R-SQUARED:
R-squared is the square of the Pearson r correlation coefficient. Its value ranges from
0.0 to 1.0 with a value of zero indicating the two variables have no correlation and a value of
one indicating the variables are perfectly correlated. Like the Pearson r, the r-squared is
independent of the magnitude of the two variables.
2.6.2.7 F-STATISTIC:
The F-statistic is the ratio of the variance explained by the regression to the residual
(unexplained) variance, and so indicates how well the regression fits the data overall. A
strong relationship between the variables gives a high F-ratio.
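The R-squared and F-statistic described above can be computed from observed and predicted values as sketched below. The formulas are the standard ones for a least-squares regression with an intercept, n compounds and k descriptors; the numbers in the example are hypothetical and are not results of this study.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

def f_statistic(r2, n, k):
    """F = (R^2 / k) / ((1 - R^2) / (n - k - 1)) for n compounds, k descriptors."""
    if n - k - 1 <= 0:
        raise ValueError("not enough compounds for this many descriptors")
    return (r2 / k) / ((1.0 - r2) / (n - k - 1))

# Hypothetical example: R^2 = 0.5 from 20 compounds and 3 descriptors
f_value = f_statistic(0.5, n=20, k=3)
```

For a given R-squared, a smaller dataset or a larger number of descriptors leaves fewer residual degrees of freedom and lowers F.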
2.6.2.8 P-VALUE:
The p-value is the probability of obtaining a regression at least as strong as the one
observed if there were in fact no relationship between the dependent and independent
variables, that is, by chance alone. Generally, p-values of < 0.05, which correspond to a
less than 1 in 20 probability of such a chance result, are considered statistically
significant.
3. OBJECTIVE:
The present study is an attempt to build a QSAR model for integrase inhibitors based
on a set of compounds that were shown to be integrase inhibitors by in-vitro inhibition
assays and by docking methods. The compounds were obtained from the results of a study
by Deng et al.[8], who discovered structurally diverse HIV-1 integrase inhibitors based on a
chalcone pharmacophore. Preliminary studies had shown that the chalcone backbone is not
responsible for the anti-viral activity. Hence the previously identified chalcone compounds
were used to build a pharmacophore model and to search a database of small molecules.
71 such compounds with good inhibitory capacity were identified, and their IC50 values
were obtained from biochemical studies. These compounds were further screened, and 38
compounds were selected for generating the QSAR model. Docking studies were carried out
using AutoDock 4.0, and the QSAR model was generated using the STATISTICA statistical
analysis package.
4. MATERIALS:
4.1 HIV-1 Integrase:
The crystal structure of integrase chosen for docking studies is 1QS4, the core domain of
HIV-1 integrase[4] complexed with Mg++ and 5-CITEP.
Resolution: 2.10 Å
Chains: 3
Full-length integrase is a 32 kDa protein comprising the N-terminal and C-terminal domains
along with the core domain. The structure of the catalytic core domain (CCD) of HIV-1
integrase consists of a central five-stranded β-sheet with six surrounding helices.
Three amino acids in the CCD are highly conserved among retrotransposon and
retroviral integrases. Mutation of these residues generally leads to a loss of all catalytic
activities of these proteins, and they are therefore thought to be essential components of
the integrase active site. Two of these in HIV-1 integrase are Asp64 and Asp116. The third
conserved residue, Glu152, lies near the other two. Site-directed mutagenesis and
photo-cross-linking experiments have identified several residues near the active site,
including Lys156, Lys159, Gln148 and Tyr143, that are critical for binding the viral DNA
substrate. Many of these residues, including Lys156, Lys159 and Gln148, are involved in
binding the inhibitor[4].
4.2 LIGANDS:
38 compounds[8] used in the docking studies and to build the QSAR model are:
LIGAND NAME#    IC50 (ST) (µM)*
1               66
2               4±3
3               49
5               30
6               72
7               18
8               12
11              60
14              28
15              30
16              18
17              ~33
18              48
21              26±8
22              460
23              25±21
24              130±52
26              77±28
28              83±20
29              57±20
32              >100
33              16±3
36              100
37              >100
38              >100
40              11
45              80
46              100
47              100
49              53
51              200
52              542±160
56              33
57              0.6
63              800
65              84±40
66              210±109
67              37±10
[STRUCTURE column: structure drawings not reproduced]
# Naming of the ligands is based on all the 71 compounds identified by the pharmacophore and not based on screening
* IC50 values calculated for inhibition of the Strand Transfer reaction
4.3 AutoDock 4.0:
AutoDock 4.0 [9] is a docking tool that uses Monte Carlo simulated annealing and a
Lamarckian genetic algorithm to create a set of possible conformations. The Lamarckian
genetic algorithm is used as a global optimizer, with energy minimization as a local search
method. Possible orientations are evaluated with an AMBER force field model in conjunction
with free energy scoring functions calibrated on a large set of protein-ligand complexes with
known protein-ligand binding constants. Version 4 adds side chain flexibility.
4.4 STATISTICA:
4.5 HYPERCHEM:
4.6 QIKPROP:
QikProp computes over twenty physical descriptors, which can be used to improve
predictions by fitting to additional or proprietary experimental data, and to generate
alternate QSAR models. QikProp accepts a wide variety of input formats, including Maestro
files, MDL SD files, and PDB files; calculations are easily set up, and results can be plotted
and analyzed using the Maestro user interface.
5. METHODS:
5.1 SCREENING:
The 71 compounds were screened based on the Blood Brain Barrier and the Gut
Blood Barrier properties. These values were calculated using the QikProp module of
Maestro package. The compounds for which the values were above or below the range of
expected values (25-500) were eliminated from the study. The remaining 38 compounds
which passed the screening were used for further docking and QSAR studies.
5.2 DOCKING:
The ligand 5-CITEP present with HIV integrase in the crystal structure 1QS4 was
removed, and the protein was prepared by adding hydrogens and removing water
molecules. The magnesium ion present in the crystal acts as a co-factor for the protein’s
function, and hence it was retained in the structure during docking. Charges were
added to the protein and ligand, and both were prepared for docking with AutoDock 4.0.
The number of GA runs was set to 10 and the number of generations to 1000.
5.3 SELECTION OF DESCRIPTORS:
All calculated descriptor values were converted to absolute values, and those which
showed some kind of pattern were further converted to logarithmic values. Among the
calculated descriptors, surface area, volume, mass, binding energy and total energy were
converted to absolute values and then to logarithmic values. Hydration energy, HOMO, LUMO
and heat of formation were converted to absolute values only.
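The transformation described above can be sketched as follows. The descriptor keys are hypothetical labels chosen for this example, and base-10 logarithms are an assumption, since the base is not stated.

```python
import math

# Descriptors that are made absolute and then log-transformed (per the text above).
# The key names are hypothetical labels for this sketch.
LOG_SCALED = {"surface_area", "volume", "mass", "binding_energy", "total_energy"}

def transform_descriptor(name, value):
    """Return |value|, or log10(|value|) for the log-scaled descriptors.

    Assumes base-10 logarithms; the base is not specified in the text.
    """
    magnitude = abs(value)
    if name in LOG_SCALED:
        if magnitude == 0:
            raise ValueError("cannot take the logarithm of zero")
        return math.log10(magnitude)
    return magnitude

# A binding energy of -7.87 becomes log10(7.87); a HOMO value keeps only its magnitude.
scaled_energy = transform_descriptor("binding_energy", -7.87)
abs_homo = transform_descriptor("homo", -9.1)
```

Taking absolute values discards the sign of a descriptor, so the sign of a fitted coefficient then refers to the magnitude of the descriptor rather than to its raw value.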
5.3.1 HOMO:
HOMO (Highest Occupied Molecular Orbital) is the highest energy level in the
molecule that contains electrons. It is crucially important in governing molecular reactivity
and properties. When a molecule acts as a Lewis base (an electron-pair donor) in bond
formation, the electrons are supplied from the molecule's HOMO. How readily this occurs is
reflected in the energy of the HOMO. Molecules with high HOMOs are more able to donate
their electrons and are hence relatively reactive compared to molecules with low-lying
HOMOs; thus the HOMO descriptor measures the nucleophilicity of a molecule.
5.3.2 DIPOLE:
A dipole consists of two equal and opposite charges separated by a small distance. The
dipole moment is the product of the magnitude of the charge and the distance separating
the charges. The dipole moment is used as a 3D electronic descriptor in QSAR modeling. It
indicates the strength and orientation behavior of a molecule in an electrostatic field.
Dipole properties have been correlated to long-range ligand-receptor recognition and
subsequent binding.
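In the usual point-charge picture, the dipole moment is the vector sum of each charge multiplied by its position. The sketch below illustrates that definition only; it is not how a quantum chemistry package computes the descriptor, and the function name is hypothetical.

```python
import math

E_ANGSTROM_TO_DEBYE = 4.80320  # 1 elementary charge x 1 angstrom, expressed in debye

def dipole_moment_debye(charges, positions):
    """Magnitude of mu = sum(q_i * r_i) for point charges.

    charges are in units of the elementary charge, positions are (x, y, z)
    tuples in angstrom. Intended for a neutral set of charges.
    """
    mu = [sum(q * r[axis] for q, r in zip(charges, positions)) for axis in range(3)]
    return E_ANGSTROM_TO_DEBYE * math.sqrt(sum(c * c for c in mu))

# Two opposite unit charges 1 angstrom apart
mu_unit = dipole_moment_debye([1.0, -1.0], [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)])
# Doubling the separation doubles the dipole moment
mu_double = dipole_moment_debye([1.0, -1.0], [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)])
```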
5.3.3 HYDRATION ENERGY:
Hydration energy is the heat change occurring when one mole of an anhydrous substance
undergoes complete hydration (i.e. combines with the required number of water molecules
to form the hydrated salt). Hydration is an exothermic process, since it involves bonding
between the water molecules and the central metal ion. A higher hydration energy means
that the compound is more stable and less reactive, while a lower hydration energy means
that the compound can react easily.
5.4 MODEL GENERATION:
Only those ligands with exact IC50 values were considered. Those reported with a
range or an approximate value were not used for generating the QSAR model.
Hence, among the 38 compounds, 18 were considered in generating the model. The remaining
20 had IC50 values given as ranges and hence were omitted from the model generation.
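The selection rule above can be sketched as a simple filter on the reported IC50 strings. The five entries are taken from the ligand table for illustration; the function name and the set of marker characters are assumptions of this sketch.

```python
def is_exact_ic50(text):
    """True when the reported IC50 is a single number (no '±', '~', '>' or '<')."""
    return not any(mark in text for mark in ("±", "~", ">", "<"))

# A few entries from the ligand table, keyed by ligand name
reported = {"1": "66", "2": "4±3", "17": "~33", "32": ">100", "57": "0.6"}

# Keep only ligands with exact values, converted to numbers
exact = {lig: float(val) for lig, val in reported.items() if is_exact_ic50(val)}
```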
Multiple linear regression was used to generate the regression equations. All
the equations were generated using the SPSS program. The best equation was chosen based
on the t-test values (t>1.74) and the p-values at a 10% (0.10) significance level
(two-tailed test). The best equation was used for predicting the pIC50 values.
1. All 13 descriptors were considered in the first step and correlation matrix is
developed for all 13 descriptors with the pIC50 value.
2. The 6 descriptors with higher correlation were considered for multiple regression.
3. The descriptors which had significant t-values and p-value were selected for further
refinement and regression.
4. The possible outliers are detected and removed.
5. Final regression is run for the 3 significant descriptors and the model is generated.
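The first two steps above (correlating every descriptor with pIC50 and keeping the most correlated ones) can be sketched as follows. The descriptor names and values are hypothetical, and ranking by the absolute value of the Pearson r is an assumption about how "higher correlation" was judged.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def top_descriptors(descriptors, activity, k):
    """Return the k descriptor names with the largest |r| against the activity."""
    ranked = sorted(
        descriptors,
        key=lambda name: abs(pearson_r(descriptors[name], activity)),
        reverse=True,
    )
    return ranked[:k]

# Hypothetical values for five compounds
activity = [4.2, 4.5, 4.9, 5.1, 5.6]
descriptors = {
    "d1": [1.0, 2.0, 3.0, 4.0, 5.0],  # rises with activity
    "d2": [5.0, 1.0, 4.0, 2.0, 3.0],  # little relation to activity
    "d3": [9.0, 8.5, 7.0, 6.8, 5.0],  # falls as activity rises
}
selected = top_descriptors(descriptors, activity, k=2)
```

Ranking on the absolute value keeps strongly negative correlations, which are as informative for regression as strongly positive ones.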
5.5 PREDICTION:
The generated model is used to predict the pIC50 values for the compounds in the
test set using STATISTICA.
6. RESULTS AND DISCUSSION:
6.1 AUTODOCK 4.0:
The docking studies with AutoDock 4.0 showed that all ligands bind to the
catalytic residues and to other residues in the active site that are critical for the
function of the protein. AutoDock 4.0 returns the energy of the protein-ligand complex as
the Binding Energy, which is the sum of the Final Intermolecular Energy (van der Waals
energy, Hydrogen bond energy, Desolvation Energy, Electrostatic Energy), the Final Total
Internal Energy and the Torsional Free Energy, minus the Unbound System's Energy.
All compounds were docked and the energies are tabulated. Ligand 3 docked but did
not form hydrogen bonds with any of the protein residues.
LIGAND    BINDING ENERGY    HYDROGEN-BONDED RESIDUES
2         -3.39             ASN155
3         -7.29             no HBs
5         -6.29             ASP116, LYS156
7         -5.26             ASN155, LYS156, GLU152
8         -6.16             CYS65, ASN155
11        -4.99             GLU152, GLN148
15        -5.15             GLU152, GLN148
17        -5.49             ASP116, GLN148
18        -4.46             ASP116
21        -4.39             ASP64, GLN148, ASN155
22        -5.05             ASP116, GLU152, CYS65, THR66
23        -7.87             ASP64, LYS156, GLU152, GLN148, ASN155
24        -4.78             ASN155, GLU152, GLN148, GLU152
26        -4.55             ASP64, SER147
28        -4.94             GLN148, LYS156, THR66
29        -5.31             ASP64, ASN155, GLN148
32        -4.94             ASP116
33        -5.35             ASP64
36        -5.91             ASP116, ASN155
37        -4.93             ASP116
38        -4.52             ASP64
40        -5.42             GLN148, ASN155
45        -4.69             ASN155, LYS156
46        -5.38             GLN146
47        -5.16             GLU152
49        -4.73             ASN155, LYS156
56        -7.05             ASN155
57        -6.3              GLU152, LYS156
63        -6.03             ASN155
65        -5.41             GLU152
Ligand 23 has the lowest binding energy, -7.87, and hence has docked in the most
favorable pose. Ligand 57 was found to have high levels of inhibition in HIV-1 integrase
inhibition assays. Its energy was found to be -6.3. The difference in binding energy is due to
the number of interactions the ligand makes with the protein residues. Ligand 23
interacts with two of the catalytic residues (Asp64 and Glu152), while Ligand 57
interacts with only one of the catalytic residues (Glu152); it is nevertheless active in the
in-vitro assays because it has hydrophobic interactions with the other two catalytic residues.
REGRESSION SUMMARY:
6.2.4.8 PLOT OF PREDICTED AND OBSERVED pIC50 FOR THE TRAINING SET:
[Figure: pIC50 (predicted) plotted against pIC50 (observed) for the training set, with a linear trend line; R² = 0.578]
Descriptors were selected based on the correlation between the individual descriptors
and the pIC50 values. Six descriptors showed higher correlations (17%-45%):
Hydration energy
Log P
Refractivity
Dipole
Heat of formation
HOMO
When a regression was run with all factors, the significance was high only for:
Hydration energy
Dipole
HOMO
To confirm the significance of these factors, the 6 factors with higher correlations were
considered. The multiple regression for these 6 factors showed that the same three factors,
Hydration Energy, Dipole and HOMO, had significant t-values at the 5% significance level.
Thus the three factors Hydration Energy, Dipole and HOMO alone significantly affect the
activity of the compounds. The regression model with these three factors showed a
reasonable R² of 0.580, but Dipole showed less significance than the others.
To address this, outliers were detected. Ligand 57 was identified as an outlier, since its
activity is much higher than that of the other ligands in the dataset. The outlier was
removed and the regression was repeated. The final regression result showed that all three
descriptors have better t-values at the 5% significance level.
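One common way to flag an outlier is to compare each residual with the spread of all residuals. The sketch below uses that rule with an assumed threshold of two standard deviations; it is not necessarily the criterion applied in this study, and the values are hypothetical.

```python
import math

def flag_outliers(observed, predicted, threshold=2.0):
    """Indices whose residual lies more than `threshold` sample standard
    deviations from the mean residual. The threshold is an assumed choice."""
    residuals = [o - p for o, p in zip(observed, predicted)]
    n = len(residuals)
    mean_res = sum(residuals) / n
    sd = math.sqrt(sum((r - mean_res) ** 2 for r in residuals) / (n - 1))
    return [i for i, r in enumerate(residuals) if abs(r - mean_res) > threshold * sd]

# Hypothetical pIC50 values: the last compound is far more active than predicted
observed = [4.2, 4.3, 4.1, 4.4, 4.2, 4.3, 6.2]
predicted = [4.2, 4.2, 4.2, 4.3, 4.3, 4.3, 4.4]
outliers = flag_outliers(observed, predicted)
```

Removing a flagged compound narrows the activity range covered by the training set, which limits how far the refitted model can be extrapolated.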
From the derived equation, it can be seen that the biological activity increases as
Hydration Energy and HOMO increase, while the Dipole affects the biological activity
negatively.
The most active compound in the dataset was identified as an outlier because the
activities of the ligands do not span a wide range. The model is biased towards
compounds with moderate activity.
The R² is low even though the t-values are highly significant. The reason is that the
data are insufficient and do not give many degrees of freedom for the regression.
7. CONCLUSION:
The model is not trained well enough to be applied to external data, owing to the narrow
range of pIC50 values in the training set. The model is biased towards the compounds in the
training set, as can be seen clearly from the predictions made by this model on the test
set. Hence the model is not valid for prediction.
8. REFERENCES: