
Quantitative Structure Activity Relationship (QSAR)
Study of HIV-1 Integrase Inhibitors Based on Chalcone
Pharmacophore

A dissertation submitted to Pondicherry University in partial fulfillment of
the requirement for the degree of

M.Sc. Bioinformatics
B.SANGEETHA
(Reg. No: 1076013)

Under the guidance of

Dr. R. KRISHNA
Lecturer,
Centre for Bioinformatics
Pondicherry University

CENTRE FOR EXCELLENCE IN BIOINFORMATICS


SCHOOL OF LIFE SCIENCES
PONDICHERRY UNIVERSITY
PONDICHERRY- 605 014
INDIA

MAY 2009

CERTIFICATE

Certified that the dissertation entitled “Quantitative Structure Activity
Relationship (QSAR) study of HIV-1 integrase inhibitors based on
chalcone pharmacophore” is an authentic record of project work done by
Ms. B. Sangeetha at Centre for Excellence in Bioinformatics, Pondicherry
University, in partial fulfillment for award of M.Sc. in Bioinformatics and
that it has not previously formed the basis for the award of any degree.

Prof. P. P. Mathur
Coordinator,
Centre for Excellence in Bioinformatics,
School of Life Sciences

Dr. R. Krishna,
Lecturer,
Centre for Excellence in Bioinformatics,
School of Life Sciences

EXTERNAL EXAMINER

DECLARATION

I hereby declare that the dissertation entitled “Quantitative Structure
Activity Relationship (QSAR) study of HIV-1 integrase inhibitors based
on chalcone pharmacophore” is submitted in partial fulfillment for the
award of M.Sc. in Bioinformatics. This project is entirely original work done
by me at Centre for Excellence in Bioinformatics, Pondicherry University,
Pondicherry. This work has not been submitted elsewhere for any other
degree, diploma, associateship or fellowship.

Place:

Date: B.SANGEETHA

ACKNOWLEDGEMENTS

My sincere thanks to my guide, Dr. R. Krishna, Lecturer, Centre for Excellence in
Bioinformatics, Pondicherry University, who inspired and supported me in taking up
this innovative project. He has been the driving force behind this project. I
sincerely thank him for giving me the opportunity to carry out this project and for
motivating me throughout its course.

I am grateful to Prof. P. P. Mathur, Coordinator of the Centre for Excellence in
Bioinformatics, Pondicherry University for his proactive help and support.

I extend my gratitude and sincere thanks to Dr. Suresh Kumar and Mr. M.
Sundara Mohan for their kindness and support during the course of the work.

I express my heartfelt thanks to the JRFs of this Centre, M. Jayakanthan, J.
Muthukumaran and Om Prakash Sharma, for their timely help and valuable support.

I also express my heartfelt thanks to my friends Dr. Vijayakumar and Mr.
Kalesh for their timely help and support.

I would especially like to thank all my friends for their constant support during the
progress of the work and throughout the course.

B. Sangeetha

CONTENTS

1. INTRODUCTION
2. REVIEW OF LITERATURE
2.1 HUMAN IMMUNODEFICIENCY VIRUS AND AIDS
2.2 NATURAL PRODUCTS AS DRUGS
2.3 HIV INTEGRASE
2.4 PHARMACOPHORE HYPOTHESIS MODEL
2.5 MOLECULAR DOCKING
2.5.1 TERMS USED IN MOLECULAR DOCKING
2.6 QUANTITATIVE STRUCTURE ACTIVITY RELATIONSHIP (QSAR)
2.6.1 DESCRIPTORS
2.6.2 STATISTICAL TERMS USED IN QSAR
2.6.2.1 DEPENDENT VARIABLE
2.6.2.2 INDEPENDENT VARIABLES
2.6.2.3 COVARIANCE
2.6.2.4 PEARSON r/R
2.6.2.5 MULTIPLE LINEAR REGRESSION
2.6.2.6 R-SQUARED
2.6.2.7 F-STATISTIC
2.6.2.8 P-VALUE
3. OBJECTIVE
4. MATERIALS
4.1 HIV-1 Integrase
4.2 LIGANDS
4.3 AutoDock 4.0
4.4 STATISTICA
4.5 HYPERCHEM
4.6 QIKPROP
5. METHODS
5.1 SCREENING
5.1.1 TERMS USED FOR SCREENING
5.2 DOCKING
5.3 SELECTION OF DESCRIPTORS
5.3.1 HOMO
5.3.2 DIPOLE
5.3.3 HYDRATION ENERGY
5.4 MODEL GENERATION
5.5 PREDICTION
6. RESULTS AND DISCUSSION
6.1 AUTODOCK 4.0
6.2 QSAR RESULTS
6.2.1 TRAINING SET DATA
6.2.2 CORRELATIONS FOR TRAINING SET
6.2.3 COVARIANCES FOR TRAINING SET
6.2.4 REGRESSION RESULTS
6.2.4.1 REGRESSION SUMMARY FOR THE TOP 6
6.2.4.2 SUMMARY OF RESIDUALS AND PREDICTED
6.2.4.3 REGRESSION SUMMARY FOR THREE FACTORS WITH GOOD t-VALUE
6.2.4.4 SUMMARY OF RESIDUALS AND PREDICTED
6.2.4.5 PLOT OF OUTLIERS
6.2.4.6 REGRESSION AFTER REMOVING OUTLIER
6.2.4.7 SUMMARY OF RESIDUAL AND PREDICTED
6.2.4.8 PLOT OF PREDICTED AND OBSERVED pIC50 FOR THE TRAINING SET
6.2.4.9 PREDICTED AND OBSERVED pIC50 FOR THE TEST SET
7. CONCLUSION
8. REFERENCES

QSAR STUDY OF HIV-1 INTEGRASE INHIBITORS

INTRODUCTION


1. INTRODUCTION:

The pharmaceutical industry is one of the major drivers of the development of the
field of Bioinformatics[1]. Many pharmaceutical companies have internal teams conducting
Bioinformatics research. Most drugs are small molecules designed to bind to, interact
with, and modulate the activity of specific biological receptors. Receptors are proteins
that bind and interact with other molecules to perform the numerous functions required for
the maintenance of life. They include an immense array of cell-surface receptors,
enzymes and other functional proteins. Due to genetic abnormalities, physiologic stress or
some combination thereof, the function of specific receptors and enzymes may become
altered to the point that our well-being is diminished. The role of drugs is to correct the
functioning of these receptors.

Structural bioinformatics can facilitate the discovery, design and optimization of new
chemical entities, ranging from drugs and biological probes to biomaterials. CADD
(Computer-Aided Drug Design)[1] is a specialized discipline that uses computational
methods to simulate drug-receptor interactions. Drug design is an iterative process
involving drug discovery, lead optimization and chemical synthesis, with the aim of
maximizing functional activity and minimizing adverse effects. It is, in essence, an
optimization problem.

HIV infection is a growing problem and there is no cure to date. Combination
therapies are used for treatment, but they must be taken long term. Studies are being
carried out to find a better way to treat HIV.

QSAR is a lead optimization method by which new drugs with the desired activity can
be designed and synthesized. 2D-QSAR is a statistical method that expresses the
numerical relationship between the various structural properties (descriptors) and the
biological activity of a series of compounds. Based on this relationship, new compounds
can be synthesized.


REVIEW OF
LITERATURE


2. REVIEW OF LITERATURE:

2.1 HUMAN IMMUNODEFICIENCY VIRUS AND AIDS:

Acquired immunodeficiency syndrome (AIDS) is a clinical syndrome that is the result
of infection with human immunodeficiency virus (HIV), which causes profound
immunosuppression. It has been a serious, life-threatening health problem since the first
case was identified in 1981 and is the most quickly spreading disease of the century.

Two major types of HIV have been identified so far, HIV-1 and HIV-2[2]. HIV-1 is the
cause of the worldwide epidemic and is most commonly referred to as HIV. It is a highly
variable virus, which mutates readily. There are different strains of HIV-1. HIV-2 is much less
pathogenic and occurs rarely. It is found mostly in West Africa.

HIV begins its infection of a susceptible host cell by binding to the CD4 receptor on
the host cell. It also needs a co-receptor to enter the cell. The co-receptors were found
to be CCR5 and CXCR4[3]. Based on the co-receptor used, different strains of the virus are
identified as R5, X4 and R5X4. The R5 strain does not infect naïve T-cells. All three strains
infect memory T-cells.

Following fusion and entry into the host cell, the viral RNA (genetic material) is
released and undergoes reverse transcription into DNA. HIV REVERSE TRANSCRIPTASE is
necessary to catalyze the conversion of viral RNA into DNA. The viral DNA enters the host
cell nucleus where it integrates into the genetic material of the cell. The enzyme INTEGRASE
catalyses this process. The integrated viral DNA is transcribed in the host cell nucleus, and
the resulting mRNA is translated into structural proteins as polyproteins, which are cleaved
by the viral PROTEASE. All three enzymes have been targeted by anti-HIV drugs in recent
years. Among these, integrase is the most recently identified target.

HIV mutates rapidly and develops drug resistance, and current drugs have cytotoxic side
effects. As a result, many patients experience unsatisfactory virologic, immunologic, or
clinical outcomes from currently available therapies, which limits therapeutic options.
Hence a combination of inhibitors is used in therapy (Combination Therapy or HAART, Highly
Active Anti-Retroviral Therapy). Present HIV therapies consist of combinations of integrase
inhibitors, nucleoside reverse transcriptase inhibitors (NRTIs), non-nucleoside reverse
transcriptase inhibitors (NNRTIs), protease inhibitors and fusion inhibitors[2].

2.2 NATURAL PRODUCTS AS DRUGS:

Nature has always provided a source of drugs for various ailments. A number of
medicinal plants have been reported to have anti-HIV properties[2]. The bioactivity-guided
fractionation of crude extracts has provided lead molecules for discovery of anti-HIV drug
candidates. A variety of secondary metabolites obtained from natural origin showed
moderate to good anti-HIV activity. The natural products with anti-HIV properties can
be classified into different groups, such as:

• Alkaloids
• Coumarins
• Flavonoids
• Lignans
• Phenolics
• Quinones
• Saponins
• Terpenes/sterols
• Xanthones
• Carbohydrates
• Peptides
• Proteins

Some of these natural products, such as calanolide and andrographolide, are currently
in clinical trials as drug candidates.

Flavonoids have long been studied for their wide variety of functions and because
they are readily available in nature as plant coloring pigments. Chalcones are compounds
that belong to the flavonoid class. They have anticancer, antiviral, antibacterial and
antifungal activity, and are of interest because of their easy availability. For example,
xanthohumol is a prenylated chalcone found in the plant Humulus lupulus. The flowers of
this plant (hops) are used in the beer industry as a flavoring agent. Studies are being
carried out to increase the xanthohumol content of beer, so that anti-HIV beer can be
produced and marketed.

Since chalcones are non-specific inhibitors and are also cytotoxic in a large
number of tumor cell lines, studies are being carried out to find out which part of the
structure is important for anti-HIV activity. The main difficulty is that the target
within HIV has not yet been identified for all of the compounds.

2.3 HIV INTEGRASE:

HIV-1 integrase[4] is a 32-kDa enzyme, encoded by the pol gene, that carries out DNA
integration in a two-step reaction.

1. 3’ processing: two nucleotides are removed from each 3’ end of the viral DNA made
by reverse transcription.
2. DNA strand transfer: a pair of transesterification reactions integrates the ends of
the viral DNA into the host genome.

Integrase comprises three structurally and functionally distinct domains, and all
three domains are required for each step of the integration reaction. The isolated domains
form homodimers in solution, and the three-dimensional structures of all three separate
dimers have been determined. Although little is known concerning the organization of these
domains in the active complex with DNA substrates, integrase is likely to function as at least
a tetramer. Extensive mutagenesis studies mapped the catalytic site to the core domain
(residues 50-212), which contains the catalytic residues Asp-64, Asp-116, and Glu-152. The
structure of this domain of HIV-1 integrase has been reported both in apoenzyme form
(1BIS) and bound to magnesium as the cofactor (1QS4).

The Integrase enzyme possesses three functional domains[5].

1. The N-terminal region is characterized by a HHCC “zinc finger”-like sequence. Studies
have indicated that this region can bind zinc and that this binding stabilizes its structure.
2. The central region (Catalytic Domain) is characterized by a D,D(35)E motif.
These acidic residues have been proposed to be involved in binding the required
metal ions, Mg2+ or Mn2+. Mutagenesis studies have shown that these conserved
acidic residues are essential for both processing and joining activity. This region is
resistant to proteases.
3. The C-terminal region is not highly conserved and contains strong
metal-independent, sequence-independent DNA-binding activity.

Integrase functions as a multimer.

2.4 PHARMACOPHORE HYPOTHESIS MODEL:

There are numerous potential interactions between ligand and receptor. Depending
upon the size of the active site, there may be numerous steric, electrostatic, and
hydrophobic contacts. However, some are more important than others. The specific
interactions that are crucial for ligand recognition and binding by the receptor are termed
the pharmacophore[1]. Usually, these are the interactions that directly factor into the
structural integrity of a receptor or are involved in the mechanism of its action.

A pharmacophore model can be derived from a set of known ligands for the target. A
pharmacophore can be developed using the set of features common to a series of active
molecules. These features can include acceptors, donors, ring centroids, hydrophobes, etc.
A 3-D pharmacophore is used to define the spatial relationship between the groups or
features.

2.5 MOLECULAR DOCKING:

Docking is a method which predicts the preferred orientation of one molecule to a
second when bound to each other to form a stable complex. Molecular docking is used to
predict the structure of the intermolecular complex formed between two or more
molecules. The most interesting case is the protein-ligand interaction, because of its
applications in medicine. A ligand is a small molecule that interacts with a protein’s
binding sites. Binding sites are regions of the protein known to be active in complex
formation. There are several possible mutual conformations in which binding may occur.
These are commonly called binding modes.

2.5.1 TERMS USED IN MOLECULAR DOCKING:

• RECEPTOR or HOST: the "receiving" molecule (protein or other biomolecule)
• LIGAND or GUEST: the complementary partner molecule (small molecule or biopolymer)
• DOCKING: computational simulation of a candidate ligand binding to a receptor
• POSE: a candidate binding mode

Molecular docking can be divided into two separate problems[6]: searching and
scoring. The search algorithm should create an optimum number of configurations that
include the experimentally determined binding modes. These configurations are then
evaluated using scoring functions to distinguish the experimental binding modes from all
other modes explored by the search algorithm.

Some common searching algorithms include

• Molecular dynamics
• Monte Carlo methods
• Genetic algorithms
• Fragment-based methods
• Point complementarity methods
• Distance geometry methods
• Tabu searches
• Systematic searches

Some common scoring functions are

• Force-field methods
• Empirical free energy scoring functions
• Knowledge-based potential of mean force

2.6 QUANTITATIVE STRUCTURE ACTIVITY RELATIONSHIP (QSAR):

A QSAR study tries to establish a link between the ability of a certain molecule to
perform its desired function and properties of that molecule. QSAR methods involve the
statistical analysis of a set of properties or descriptors for a series of biologically active
molecules. The statistical model developed is then used to predict the activity of additional
compounds against the target. The QSAR model can be used to predict which members of a
series of proposed compounds are likely to be active and therefore should be synthesized
and tested. As new compounds are assayed, the additional experimental data are used to
refine the model. The relationship between these numerical properties and the activity is
often described by an equation of the general form[1]:

V = f(p)

where V is the activity in question, p are structure-derived properties of the molecule (i.e.
descriptors), and f is some function. QSAR is applicable only to structurally similar
compounds.

Once a correlation is established, the structures of any number of compounds with the
desired properties can be proposed. The QSAR methodology thus saves resources and
expedites the development of new molecules and drugs.

2.6.1 DESCRIPTORS:

In the development of QSAR, structural or property descriptors of compounds are
correlated with activities. The different classes of descriptors that can be used in
QSAR/QSPR/QSTR are:

• Constitutional descriptors
• Topological descriptors
• Geometrical descriptors
• Electrostatic descriptors
• Quantum chemical descriptors
• Molecular Orbital related descriptors
• Thermodynamic descriptors
• Density Functional Theory (DFT) and Reactivity descriptors

Activities that are used in QSAR include IC50, EC50, etc., that can be obtained by
chemical measurements and biological assays.
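
Because IC50 values span several orders of magnitude, QSAR models are usually fitted to
pIC50 rather than to IC50 itself. A minimal sketch of the conversion, assuming the usual
definition pIC50 = -log10(IC50 in mol/L) and IC50 values reported in µM (the function name
is illustrative and not taken from the study):

```python
import math

def pic50_from_ic50_um(ic50_um: float) -> float:
    """Convert an IC50 given in micromolar to pIC50 = -log10(IC50 in mol/L)."""
    if ic50_um <= 0:
        raise ValueError("IC50 must be positive")
    # 1 µM = 1e-6 mol/L
    return -math.log10(ic50_um * 1e-6)

# An IC50 of 1 µM corresponds to pIC50 = 6; 100 µM corresponds to pIC50 = 4.
print(pic50_from_ic50_um(1.0), pic50_from_ic50_um(100.0))
```

A more potent compound (lower IC50) therefore has a higher pIC50.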

2.6.2 STATISTICAL TERMS USED IN QSAR[7]:

2.6.2.1 DEPENDENT VARIABLE:

The dependent variable (or response variable) is the variable that is being fitted to in
a regression model. It is referred to as dependent as it is assumed that its values are
dependent on the values of independent variables that will be used to generate the
predictive model. This variable is also referred to as the dependent descriptor or the activity
property.


2.6.2.2 INDEPENDENT VARIABLES:

The independent variables are those that are being used to fit a regression to a
dependent variable in partial least squares, principal component analysis, or multiple linear
regression methods. They are referred to as independent as their values are assumed not to
depend on the values of the dependent variable. The term independent descriptor is also
used.

2.6.2.3 COVARIANCE:

The covariance measures the extent to which two variables vary together. A positive
covariance indicates that larger-than-average values of one variable tend to be
paired with larger-than-average values of the second variable. A negative
covariance indicates that larger-than-average values of one variable tend to be paired with
smaller-than-average values of the second variable. A zero covariance indicates that there
is no linear association between the two variables. The covariance depends on the
magnitude of the variables involved and is most useful when the variables are on the same
scale.
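
As an illustration, the sample covariance can be computed directly from its definition.
This is a sketch with made-up numbers, assuming the n - 1 (sample) denominator; the
statistical package used in the study may use a different convention.

```python
def covariance(x, y):
    """Sample covariance of two equal-length sequences (n - 1 denominator)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)

# Hypothetical values: y rises with x, so the covariance is positive.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 6.0, 8.0]
print(covariance(x, y))

# Rescaling x by a factor of 10 rescales the covariance by the same factor,
# which is why covariances are hard to compare across differently scaled descriptors.
print(covariance([10 * a for a in x], y))
```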

2.6.2.4 PEARSON r/R:

The Pearson r is a correlation coefficient that measures the extent to which two
variables are linearly related. In other words, the Pearson r provides a measure
of linear association between variables. Calculated Pearson r values lie on a scale from -1.0
to +1.0, with negative values indicating that the best least-squares line between variables x
and y slopes downward to the right and positive values indicating that the best line slopes
upward to the right. A value of zero indicates no linear correlation between the two
variables. The Pearson r is independent of the magnitude of variables (unlike the covariance).
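
A short sketch of the Pearson r computed from sums of squared deviations, using made-up
numbers; it also shows that, unlike the covariance, r does not change when a variable is
rescaled.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical descriptor values and activities.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
r = pearson_r(x, y)
r_scaled = pearson_r([1000 * a for a in x], y)  # rescaling x leaves r unchanged
print(r, r_scaled)
```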

2.6.2.5 MULTIPLE LINEAR REGRESSION:

Multiple linear regression (MLR) generates linear equations that describe the
relationship between a set of independent descriptors and a dependent descriptor.
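
The fitted equation has the form y = b0 + b1*x1 + ... + bk*xk. The following is a minimal
sketch of such a fit by ordinary least squares using NumPy. The data and the function
names are hypothetical, and this is not the statistical package used in the study.

```python
import numpy as np

def fit_mlr(X, y):
    """Ordinary least-squares fit of y = b0 + b1*x1 + ... + bk*xk.

    X has shape (n_samples, n_descriptors); returns the array [b0, b1, ..., bk].
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(X)), X])  # prepend the intercept column
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

def predict_mlr(coef, X):
    """Predicted activities for descriptor rows X using fitted coefficients."""
    X = np.asarray(X, dtype=float)
    return coef[0] + X @ coef[1:]

# Hypothetical noise-free data generated from y = 1 + 2*x1 - 3*x2.
X = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]]
y = [1, 3, -2, 0, 2]
coef = fit_mlr(X, y)
print(np.round(coef, 6))
```

With noise-free data the fit recovers the generating coefficients; with real descriptors
the residuals are non-zero and the quality of the fit is judged with the statistics below.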


2.6.2.6 R-SQUARED:

R-squared is the square of the Pearson r correlation coefficient; in multiple regression
it is the squared correlation between the observed and predicted values of the dependent
variable. Its value ranges from 0.0 to 1.0, with zero indicating that the variables have no
linear correlation and one indicating that they are perfectly correlated. Like the Pearson r,
R-squared is independent of the magnitude of the variables.
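
For a least-squares regression with an intercept, R-squared can equivalently be computed
as one minus the ratio of the residual to the total sum of squares. A small sketch with
made-up observed and predicted values:

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

# Hypothetical observed and predicted pIC50 values.
observed = [4.0, 5.0, 6.0, 7.0]
predicted = [4.2, 4.8, 6.1, 6.9]
print(round(r_squared(observed, predicted), 4))  # 0.98
```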

2.6.2.7 F-STATISTIC:

The F-statistic is the ratio of the variance explained by the regression to the residual
(unexplained) variance, and so indicates how well the regression fits the data. A strong
relationship between the variables gives a high F-ratio.
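
For a regression with k descriptors fitted to n compounds, the overall F-statistic can be
written in terms of R-squared as F = (R²/k) / ((1 - R²)/(n - k - 1)). A sketch follows; the
R-squared value is hypothetical, while n = 18 and k = 3 are used only because they match
the training-set size and the three-factor model mentioned in this study.

```python
def f_statistic(r2, n_samples, n_descriptors):
    """Overall regression F-statistic computed from R-squared."""
    df_model = n_descriptors
    df_resid = n_samples - n_descriptors - 1
    if df_model < 1 or df_resid < 1:
        raise ValueError("not enough samples for this many descriptors")
    if not 0.0 <= r2 < 1.0:
        raise ValueError("R-squared must lie in [0, 1)")
    return (r2 / df_model) / ((1.0 - r2) / df_resid)

# Hypothetical R-squared of 0.8 for 18 compounds and 3 descriptors.
print(round(f_statistic(0.8, 18, 3), 2))
```

The same R-squared obtained with more descriptors or fewer compounds gives a smaller F,
which is one reason a high R-squared alone is not sufficient evidence of a good model.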

2.6.2.8 P-VALUE:

The p-value is the probability of obtaining a regression at least as strong as the one
observed purely by chance, that is, if there were no real relationship between the
dependent and independent variables. Generally, p-values of < 0.05, corresponding to a
less than 1 in 20 probability of such a chance result, are considered statistically
significant.


OBJECTIVE


3. OBJECTIVE:

The present study is an attempt to build a QSAR model for integrase inhibitors based
on a set of compounds that were shown to be integrase inhibitors by in-vitro inhibition
assay and by docking methods. The compounds were obtained from the results of a study
by Deng et al.[8], who discovered structurally diverse HIV-1 integrase inhibitors based on a
chalcone pharmacophore. Preliminary studies had shown that the chalcone backbone is not
responsible for the anti-viral activity. Hence the previously identified chalcone compounds
were used to build a pharmacophore model and to search a database of small molecules;
71 compounds with good inhibitory capacity were identified and their IC50 values
were obtained from biochemical studies. These compounds were further screened and 38
compounds were selected for generating the QSAR model. Docking studies were carried out
using AutoDock 4.0 and the QSAR model was generated using the STATISTICA statistical
analysis package.


MATERIALS
AND METHODS


4. MATERIALS:

The crystal structure of integrase chosen for docking studies is 1QS4, core domain of
HIV-1 integrase[4] complexed with Mg++ and 5-CITEP.

4.1 HIV-1 Integrase (EC: 2.7.7.49)

Resolution: 2.10 Å

Chains: 3

Total residues: 157

Hetero group: Mg++

It is a 32 kDa protein comprising the N-terminal and C-terminal domains along with the
core domain. The structure of the catalytic core domain (CCD) of HIV-1 integrase consists
of a central five-stranded β-sheet with six surrounding helices.

Three amino acids in the CCD are highly conserved among retrotransposon and
retroviral integrases. Mutation of these residues generally leads to a loss of all catalytic
activities of these proteins, and they are therefore thought to be essential components of
the integrase active site. Two of these in HIV-1 integrase are Asp64 and Asp116. The third
conserved residue, Glu152, lies near the other two. Site-directed mutagenesis and
photo-cross-linking experiments have identified several residues near the active site,
including Lys156, Lys159, Gln148 and Tyr143, that are critical for binding the viral DNA
substrate. Many of these residues, including Lys156, Lys159 and Gln148, are involved in
binding the inhibitor[4].

4.2 LIGANDS:

The 38 compounds[8] used in the docking studies and to build the QSAR model are:

LIGAND NAME#    IC50(ST) (µM)*
1               66
2               4±3
3               49
5               30
6               72
7               18
8               12
11              60
14              28
15              30
16              18
17              ~33
18              48
21              26±8
22              460
23              25±21
24              130±52
26              77±28
28              83±20
29              57±20
32              >100
33              16±3
36              100
37              >100
38              >100
40              11
45              80
46              100
47              100
49              53
51              200
52              542±160
56              33
57              0.6
63              800
65              84±40
66              210±109
67              37±10

[STRUCTURE column: ligand structure drawings, not reproduced]

# Naming of the ligands is based on all the 71 compounds identified by the pharmacophore and not based on screening
* IC50 values calculated for inhibition of Strand Transfer reaction

4.3 AutoDock 4.0:

AutoDock 4.0[9] is a docking tool that can use Monte Carlo simulated annealing or a
Lamarckian genetic algorithm to generate a set of possible conformations. The Lamarckian
genetic algorithm acts as a global optimizer, with energy minimization as the local search
method. Possible orientations are evaluated with an AMBER-based force field in conjunction
with a free energy scoring function calibrated on a large set of protein-ligand complexes
with known binding constants. Version 4 also supports side chain flexibility.

4.4 STATISTICA:

STATISTICA[10] is a statistics and analytics software package developed by StatSoft.
STATISTICA provides a selection of data analysis, data management, data mining, and data
visualization procedures. Features of the software include basic and multivariate statistical
analysis, quality control modules, neural networks, and a collection of data mining
techniques. All of these tools are provided in an open architecture with a single software
platform.

4.5 HYPERCHEM:

HyperChem[11] is a sophisticated molecular modeling environment that is known for
its quality, flexibility, and ease of use, uniting 3D visualization and animation with quantum
chemical calculations, molecular mechanics, and dynamics. It has options to draw a
structure and optimize it using molecular mechanics, semi-empirical and ab initio quantum
mechanical methods. It can also calculate a wide range of molecular properties such as
energies, molecular orbitals and some QSAR properties, which can be useful for generating
a QSAR model.

4.6 QIKPROP:

Schrödinger’s QikProp[12] is an extremely fast ADME properties prediction program.
It can predict a wide range of properties and the predicted ADME properties are accurate. It
also checks the Lipinski Rule of Five and the Jorgensen Rule of Three.

QikProp computes over twenty physical descriptors, which can be used to improve
predictions by fitting to additional or proprietary experimental data, and to generate
alternate QSAR models. QikProp accepts a wide variety of input formats, including Maestro
files, MDL SD files, and PDB files; calculations are easily set up, and results can be plotted
and analyzed using the Maestro user interface.


5. METHODS:

5.1 SCREENING:

The 71 compounds were screened based on predicted blood-brain barrier and gut-blood
barrier permeability. These values were calculated using the QikProp module of the
Maestro package. Compounds whose values fell outside the expected range (25-500) were
eliminated from the study. The remaining 38 compounds, which passed the screening, were
used for the docking and QSAR studies.
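
The screening rule can be sketched as a simple range filter. The sketch assumes that both
predicted permeabilities must lie within 25-500 (bounds inclusive); the text does not state
whether the bounds are inclusive or whether both values were required, and the compound
names and values below are made up.

```python
def passes_screen(caco2, mdck, low=25.0, high=500.0):
    """True if both predicted permeabilities (nm/sec) lie within [low, high]."""
    return low <= caco2 <= high and low <= mdck <= high

# Hypothetical (Caco-2, MDCK) predictions for three compounds.
compounds = {
    "A": (120.0, 60.0),   # both values in range
    "B": (10.0, 300.0),   # Caco-2 below range
    "C": (450.0, 900.0),  # MDCK above range
}
kept = [name for name, (caco2, mdck) in compounds.items() if passes_screen(caco2, mdck)]
print(kept)  # ['A']
```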

5.1.1 TERMS USED FOR SCREENING:

1. Predicted Caco-2 cell permeability in nm/sec.
Caco-2 cells are a model for the gut-blood barrier. The predictions are for
non-active transport.

2. Predicted MDCK cell permeability in nm/sec.
MDCK cells are considered to be a good mimic for the blood-brain barrier.
The predictions are for non-active transport.

5.2 DOCKING:

The ligand 5-CITEP present with the HIV integrase in the crystal structure 1QS4 was
removed, and the protein was prepared by adding hydrogens and removing water
molecules. The magnesium ion present in the crystal acts as a cofactor for the protein’s
function and hence was retained in the structure during docking. Charges were
added to the protein and ligand, and both were prepared for docking with AutoDock 4.0.
The number of GA runs was set to 10 and the number of generations to 1000.

5.3 SELECTION OF DESCRIPTORS:

Descriptors such as surface area, volume, hydration energy, log P, refractivity,
polarizability, mass, highest occupied molecular orbital (HOMO) energy, lowest unoccupied
molecular orbital (LUMO) energy, binding energy, heat of formation, dipole moment and total
energy were considered[13], and from these an optimal subset that most affects the
activity was selected by correlation methods.

All calculated values were made absolute, and those that showed some kind
of pattern were converted to logarithmic values. Among the calculated descriptors, surface
area, volume, mass, binding energy and total energy were converted to absolute values and
then to logarithmic values. Hydration energy, HOMO, LUMO and heat of formation were
converted to absolute values only.
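
The transformation can be sketched as follows. The descriptor keys and the example values
are hypothetical, and base-10 logarithms are an assumption, since the text does not state
which base was used.

```python
import math

# Descriptors converted to log10 of the absolute value (log base is an assumption).
LOG_ABS = {"surface_area", "volume", "mass", "binding_energy", "total_energy"}
# Descriptors converted to the absolute value only.
ABS_ONLY = {"hydration_energy", "homo", "lumo", "heat_of_formation"}

def transform_descriptors(raw):
    """Apply the absolute/logarithmic transformations to one compound's descriptors."""
    out = {}
    for name, value in raw.items():
        if name in LOG_ABS:
            out[name] = math.log10(abs(value))  # value must be non-zero
        elif name in ABS_ONLY:
            out[name] = abs(value)
        else:
            out[name] = value  # descriptors not listed are left unchanged
    return out

# Hypothetical raw descriptor values for one ligand.
example = {"mass": 100.0, "total_energy": -1000.0, "homo": -9.1, "dipole": 2.5}
print(transform_descriptors(example))
```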


5.3.1 HOMO:

HOMO (Highest Occupied Molecular Orbital) is the highest energy level in the
molecule that contains electrons. It is crucially important in governing molecular reactivity
and properties. When a molecule acts as a Lewis base (an electron-pair donor) in bond
formation, the electrons are supplied from the molecule's HOMO. How readily this occurs is
reflected in the energy of the HOMO. Molecules with high HOMOs are more able to donate
their electrons and are hence relatively reactive compared to molecules with low-lying
HOMOs; thus the HOMO descriptor measures the nucleophilicity of a molecule.

5.3.2 DIPOLE:

A dipole consists of two equal and opposite charges separated by a small distance.
The dipole moment is the product of the magnitude of either charge and the distance between
them. The dipole moment is used as a 3D electronic descriptor in QSAR modeling. It
indicates the strength and orientation behavior of a molecule in an electrostatic field. Dipole
properties have been correlated to long-range ligand-receptor recognition and subsequent
binding.

5.3.3 HYDRATION ENERGY:

It is the heat change occurring when one mole of an anhydrous substance undergoes
complete hydration (i.e. combines with the required number of water molecules to form the
hydrated salt). Hydration is an exothermic process since it involves bonding between the
water molecules and the central metal ion. A higher hydration energy means that the
compound is more stable and less reactive, while a lower hydration energy means that the
compound reacts more easily.

5.4 MODEL GENERATION:

Only ligands with exact IC50 values were considered for generating the QSAR model.
Of the 38 ligands, 18 had exact values and were used to generate the model. The remaining
20 had IC50 values reported as ranges or approximate values and were omitted from model
generation.
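
A minimal sketch of this filtering step is given below. It assumes that the IC50 values are stored as text in micromolar units; the sample entries and the string formats used for ranges are hypothetical. The conversion uses the standard definition pIC50 = -log10(IC50 in mol/L), which for micromolar input equals 6 - log10(IC50).

```python
import math


def parse_exact_ic50(text):
    """Return the IC50 as a float if it is a single exact number, else None."""
    try:
        value = float(text)
    except ValueError:
        return None  # ranges such as "50-100" or bounds such as ">100"
    return value if value > 0 else None


def pic50_from_micromolar(ic50_um):
    """pIC50 = -log10(IC50 in mol/L) = 6 - log10(IC50 in micromolar)."""
    return 6.0 - math.log10(ic50_um)


# Hypothetical entries: ligand id -> reported IC50 (micromolar, as text).
reported = {"A": "10", "B": ">100", "C": "50-100", "D": "30.5"}

training = {}
for ligand, text in reported.items():
    value = parse_exact_ic50(text)
    if value is not None:
        training[ligand] = pic50_from_micromolar(value)

print(training)
```

Only the entries with a single numeric value (A and D here) reach the training dictionary; the bounded and ranged entries are dropped.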

Multiple linear regression was used to generate the regression equations. All
the equations were generated using the SPSS program. The best equation was chosen based
on the t-values (t > 1.74) and the p-values at the 10% (0.10) significance level for a
two-tailed test. The best equation was used for predicting the pIC50 values.

1. All 13 descriptors were considered in the first step, and a correlation matrix was
developed for all 13 descriptors with the pIC50 value.
2. The 6 descriptors with the highest correlations were considered for multiple regression.
3. The descriptors which had significant t-values and p-values were selected for further
refinement and regression.
4. Possible outliers were detected and removed.
5. The final regression was run for the 3 significant descriptors and the model was generated.
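
Steps 1, 2 and 5 above can be sketched as follows. This is an illustrative pure-Python version, not the SPSS or STATISTICA procedure used in the study. The function names and the toy data are hypothetical, and the t-value screening and outlier removal of steps 3 and 4 are not implemented here.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def rank_descriptors(descriptors, activity, top_k):
    """Return the top_k descriptor names by absolute correlation with activity."""
    ranked = sorted(
        descriptors,
        key=lambda name: abs(pearson(descriptors[name], activity)),
        reverse=True,
    )
    return ranked[:top_k]


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_least_squares(columns, y):
    """Fit y = b0 + b1*x1 + ... via the normal equations; returns [b0, b1, ...]."""
    rows = [[1.0] + list(values) for values in zip(*columns)]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Toy data (not real descriptor or pIC50 values): activity = 2 + 3*hydration + homo.
descriptors = {
    "hydration": [1.0, 2.0, 3.0, 4.0, 5.0],
    "homo": [2.0, 1.0, 5.0, 3.0, 8.0],
    "mass": [5.0, 1.0, 4.0, 2.0, 3.0],
}
activity = [7.0, 9.0, 16.0, 17.0, 25.0]

selected = rank_descriptors(descriptors, activity, top_k=2)
coefficients = fit_least_squares([descriptors[name] for name in selected], activity)
print(selected, [round(c, 3) for c in coefficients])
```

In the toy data the weakly correlated "mass" column is discarded by the correlation screen, and the fit on the remaining two columns recovers the intercept and slopes used to build the data.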

5.5 PREDICTION:

The generated model is used to predict the pIC50 values for the compounds in the
test set using STATISTICA.




6. RESULTS AND DISCUSSION:

6.1 AUTODOCK 4.0:

The docking studies with AutoDock 4.0 showed that all ligands bind only to the
catalytic residues and to other active-site residues that are critical for the function of
the protein. AutoDock 4.0 returns the energy of the protein-ligand complex as the Binding
Energy, which is the sum of the Final Intermolecular Energy (van der Waals, hydrogen bond,
desolvation and electrostatic energies), the Final Total Internal Energy and the Torsional
Free Energy, minus the Unbound System's Energy.
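
The way these terms combine can be sketched as below. The sketch assumes the usual convention of AutoDock 4 docking logs, in which the unbound system's energy is subtracted from the sum of the other three terms; the function name and the component values are hypothetical and are not taken from the docking runs of this study.

```python
def estimated_binding_energy(intermolecular, total_internal, torsional, unbound):
    """Combine the four energy terms (kcal/mol) as (1) + (2) + (3) - (4)."""
    return intermolecular + total_internal + torsional - unbound


# Hypothetical component values for one docked pose (not from the study).
energy = estimated_binding_energy(
    intermolecular=-7.0,   # vdW + H-bond + desolvation + electrostatic
    total_internal=-0.5,
    torsional=1.1,
    unbound=-0.5,
)
print(round(energy, 2))
```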

All compounds were docked and the energies are tabulated. Ligand 3 has docked but it has
not formed any hydrogen bonds with any of the protein residues.

LIGAND NAME   BINDING ENERGY (kcal/mol)   BINDING RESIDUES

1             -5.44                       ASP64, ASP116
2             -3.39                       ASN155
3             -7.29                       none (no hydrogen bonds)
5             -6.29                       ASP116, LYS156
6             -5.39                       GLN148
7             -5.26                       ASN155, LYS156, GLU152
8             -6.16                       CYS65, ASN155
11            -4.99                       GLU152, GLN148
14            -5.54                       GLN148, SER147
15            -5.15                       GLU152, GLN148
16            -5.78                       GLN148, ASN155
17            -5.49                       ASP116, GLN148
18            -4.46                       ASP116
21            -4.39                       ASP64, GLN148, ASN155
22            -5.05                       ASP116, GLU152, CYS65, THR66
23            -7.87                       ASP64, LYS156, GLU152, GLN148, ASN155
24            -4.78                       ASN155, GLU152, GLN148, GLU152
26            -4.55                       ASP64, SER147
28            -4.94                       GLN148, LYS156, THR66
29            -5.31                       ASP64, ASN155, GLN148
32            -4.94                       ASP116
33            -5.35                       ASP64
36            -5.91                       ASP116, ASN155
37            -4.93                       ASP116
38            -4.52                       ASP64
40            -5.42                       GLN148, ASN155
45            -4.69                       ASN155, LYS156
46            -5.38                       GLN146
47            -5.16                       GLU152
49            -4.73                       ASN155, LYS156
51            -5.59                       GLN146
52            -6.09                       GLN148
56            -7.05                       ASN155
57            -6.3                        GLU152, LYS156
63            -6.03                       ASN155
65            -5.41                       GLU152
66            -3.97                       GLN146, LYS156
67            -7.74                       LYS156

Ligand 23 has the lowest binding energy, -7.87, and hence has docked in the most
favorable pose. Ligand 57 was found to have high levels of inhibition in HIV-1 integrase
inhibition assays; its binding energy was found to be -6.3. The difference in binding
energy is due to the number of interactions each ligand makes with the protein residues.
Ligand 23 interacts with two of the catalytic residues (Asp64 and Glu152), while Ligand 57
interacts with only one of the catalytic residues (Glu152). Ligand 57 is nevertheless active
in the in vitro assays because it has hydrophobic interactions with the other two catalytic
residues.


6.2 QSAR RESULTS:

6.2.1 TRAINING SET DATA:


6.2.2 CORRELATIONS FOR TRAINING SET:

6.2.3 COVARIANCES FOR TRAINING SET:


6.2.4 REGRESSION RESULTS:

6.2.4.1 REGRESSION SUMMARY FOR THE TOP 6:


Dependent: pIC50(ST)
No. of cases: 18
Multiple R = 0.83245124     R² = 0.69297506     adjusted R² = 0.52550691
F = 4.137951                df = 6,11           p = 0.020249
Standard error of estimate: 0.475685849
Intercept: -19.04330494     Std. Error: 5.681681     t(11) = -3.352     p = 0.0065

DESCRIPTOR      BETA
AbsHydra        1.10
LOG P           0.273
REFRACTIVITY    0.155
abs homo        0.869
abs heat        0.247
DIPOLE         -0.78

REGRESSION SUMMARY:

6.2.4.2 SUMMARY OF RESIDUALS AND PREDICTED:


6.2.4.3 REGRESSION SUMMARY FOR THREE FACTORS WITH GOOD t-VALUE:


Dependent: pIC50(ST)
No. of cases: 18
Multiple R = 0.76187166     R² = 0.58044843     adjusted R² = 0.49054453
F = 6.456320                df = 3,14           p = 0.005716
Standard error of estimate: 0.492899529
Intercept: -17.43847968     Std. Error: 5.705083     t(14) = -3.057     p = 0.0085

DESCRIPTOR      BETA
AbsHydra        0.981
abs homo        0.848
DIPOLE         -0.52
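
As a consistency check on the two regression summaries, the adjusted R² and the F statistic can be recomputed from the reported R², the number of cases and the number of descriptors, using the standard formulas for a multiple regression with an intercept. The sketch below is an added illustration; the function names are hypothetical, and the inputs are the values reported in the summaries.

```python
def adjusted_r2(r2, n_cases, n_predictors):
    """Adjusted R² = 1 - (1 - R²) * (n - 1) / (n - k - 1)."""
    return 1.0 - (1.0 - r2) * (n_cases - 1) / (n_cases - n_predictors - 1)


def f_statistic(r2, n_cases, n_predictors):
    """F = (R² / k) / ((1 - R²) / (n - k - 1)), with df = (k, n - k - 1)."""
    return (r2 / n_predictors) / ((1.0 - r2) / (n_cases - n_predictors - 1))


# R² values as reported for the six- and three-descriptor models (18 cases each).
for label, r2, k in [("six descriptors", 0.69297506, 6),
                     ("three descriptors", 0.58044843, 3)]:
    print(label, round(adjusted_r2(r2, 18, k), 6), round(f_statistic(r2, 18, k), 4))
```

Both models reproduce the reported adjusted R² and F values to the printed precision. The comparison also shows that dropping three descriptors lowers the adjusted R² only from about 0.53 to about 0.49, while the F statistic rises from about 4.14 to about 6.46.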

REGRESSION SUMMARY:

6.2.4.4 SUMMARY OF RESIDUALS AND PREDICTED:


6.2.4.5 PLOT OF OUTLIERS:

6.2.4.6 REGRESSION AFTER REMOVING OUTLIER:

6.2.4.7 SUMMARY OF RESIDUAL AND PREDICTED:


6.2.4.8 PLOT OF PREDICTED AND OBSERVED pIC50 FOR THE TRAINING SET:

[Figure: predicted versus observed pIC50 for the training set, with linear trend line (R² = 0.578). x-axis: pIC50 (observed); y-axis: pIC50 (predicted).]

The final regression equation for the generated model is:

pIC50(ST) = (0.143 * AbsHydra) + (1.826 * abs homo) - (0.219 * DIPOLE) - 13.959
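
The final equation can be applied to a new compound as in the following sketch. The function name and the input values are hypothetical; the inputs are assumed to be prepared in the same way as the training descriptors (absolute hydration energy, absolute HOMO energy and the dipole value as used in the model).

```python
def predict_pic50(abs_hydration, abs_homo, dipole):
    """Evaluate the final regression equation reported in the text."""
    return 0.143 * abs_hydration + 1.826 * abs_homo - 0.219 * dipole - 13.959


# Hypothetical descriptor values for one compound (not from the dataset).
example = predict_pic50(abs_hydration=10.0, abs_homo=9.0, dipole=3.0)
print(round(example, 3))
```

Because the HOMO coefficient (1.826) is much larger than the other two, small changes in the absolute HOMO energy shift the predicted pIC50 more than comparable changes in hydration energy or dipole.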

6.2.4.9 PREDICTED AND OBSERVED pIC50 FOR THE TEST SET:

pIC50(ST)    PREDICTED    RESIDUAL

4.5228       4.4117        0.1111
4.3187       3.7501        0.5686
4.4814       2.4364        2.0450
4.2757       4.6126       -0.3369
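
The residuals in the test-set table can be recomputed and summarized as below. The root-mean-square error (RMSE) and mean absolute error (MAE) are not reported in the original results; they are added here only as a compact summary of the four tabulated residuals.

```python
from math import sqrt

# Observed and predicted pIC50 values from the test-set table.
observed = [4.5228, 4.3187, 4.4814, 4.2757]
predicted = [4.4117, 3.7501, 2.4364, 4.6126]

residuals = [o - p for o, p in zip(observed, predicted)]
rmse = sqrt(sum(r * r for r in residuals) / len(residuals))
mae = sum(abs(r) for r in residuals) / len(residuals)

print([round(r, 4) for r in residuals])
print(round(rmse, 3), round(mae, 3))
```

The RMSE of roughly 1.08 pIC50 units is dominated by the third compound, whose residual of 2.045 is far larger than the other three.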

The descriptors were selected based on the correlation between the individual
descriptors and the pIC50 values. Six descriptors showed higher correlations (17%-45%):

• Hydration energy
• Log P
• Refractivity
• Dipole
• Heat of formation
• HOMO

When a regression was run with all factors, the significance was high only for:

• Hydration energy
• Dipole
• HOMO


To confirm the significance of these factors, the 6 factors with higher correlations
were considered. The multiple regression for these 6 factors showed that the same three
factors, Hydration Energy, Dipole and HOMO, had significant t-values at the 5% significance
level.

Thus, only the three factors Hydration Energy, Dipole and HOMO significantly affect
the activity of the compounds. The regression model with these three factors showed a
reasonable R² of 0.580, but Dipole showed less significance than the other two.

To address this, outliers were detected. Ligand 57 was identified as an outlier, since
its activity is higher than that of the other ligands in the dataset. The outlier was removed
and the regression was repeated. The final regression result showed that all three
descriptors have better t-values at the 5% significance level.

Based on the derived equation, the biological activity increases as the absolute
values of Hydration Energy and HOMO increase, while Dipole affects the biological activity
negatively.

The most active compound in the dataset was identified as an outlier because the
ligands do not span a wide range of activity. The model is therefore biased toward
compounds with moderate activity.

The R² is low even though the t-values are significant. The reason is that the
dataset is small and leaves few degrees of freedom for the regression.




7. CONCLUSION:

The model is not trained well enough to be applied to external data because the
pIC50 values in the training set span only a narrow range. The model is biased toward the
compounds in the training set, as can be seen from the predictions it makes on the test
set. Hence the model is not valid for prediction.




8. REFERENCES:

1. Rastogi, S.C., Bioinformatics Methods and Applications. 2006: Prentice-Hall of India
Private Ltd.
2. Singh, I.P., Anti-HIV natural products. Current Science, July 2005. 89(2): p. 269-289.
3. Vaishnav, Y.N. and F. Wong-Staal, The biochemistry of AIDS. Annu Rev Biochem, 1991. 60: p.
577-630.
4. Goldgur, Y., et al., Structure of the HIV-1 integrase catalytic domain complexed with an
inhibitor: a platform for antiviral drug design. Proc Natl Acad Sci U S A, 1999. 96(23): p.
13040-3.
5. Katz, R.A. and A.M. Skalka, The retroviral enzymes. Annu Rev Biochem, 1994. 63: p. 133-73.
6. Kaapro, Protein Docking. 2002. Available from:
http://www.lce.hut.fi/teaching/S-114.500/k2002/Protdock.pdf.
7. SCHRODINGER, Strike.
8. Deng, J., et al., Discovery of structurally diverse HIV-1 integrase inhibitors based on a
chalcone pharmacophore. Bioorg Med Chem, 2007. 15(14): p. 4985-5002.
9. AUTODOCK 4.0, http://autodock.scripps.edu/.
10. STATISTICA, http://www.statsoftindia.com/.
11. Hyperchem, HyperChem(TM) Professional 7.51, Hypercube, Inc., 1115 NW 4th Street,
Gainesville, Florida 32601, USA.
12. SCHRODINGER, QikProp.
13. Olivero-Verbel, J. and L. Pacheco-Londono, Structure-activity relationships for the anti-HIV
activity of flavonoids. J Chem Inf Comput Sci, 2002. 42(5): p. 1241-6.
14. Geban, QSAR study on antibacterial and antifungal activities of some
3,4-disubstituted-1,2,4-oxa(thia)-diazole-5(4H)-ones(thiones) using physicochemical,
quantumchemical and structural parameters. Eur. J. Med. Chem, 1999. 34: p. 753-758.
15. Ertepinar, A QSAR study of the biological activities of some benzimidazoles and
imidazopyridines against Bacillus subtilis. Eur. J. Med. Chem, 1995. 30: p. 171-175.
