https://doi.org/10.1093/nar/gkad832
Database issue
Abstract
Leveraging genetic insights to promote drug repurposing has become a promising and active strategy in pharmacology. Indeed, among the
50 drugs approved by the FDA in 2021, two-thirds have genetically supported evidence. In this regard, the increasing number of widely available
genome-wide association study (GWAS) datasets has provided substantial opportunities for drug repurposing based on genetic discoveries.
Here, we developed PharmGWAS, a comprehensive knowledgebase designed to identify candidate drugs through the integration of GWAS data.
PharmGWAS focuses on novel connections between diseases and small-molecule compounds derived using a reverse relationship between the
genetically-regulated expression signature and the drug-induced signature. Specifically, we collected and processed 1929 GWAS datasets across
a diverse spectrum of diseases and 724 485 perturbation signatures pertaining to 33 609 molecular compounds. To obtain reliable and
robust predictions for the reverse connections, we implemented six distinct connectivity methods. In the current version, PharmGWAS deposits
a total of 740 227 genetically-informed disease-drug pairs derived from drug-perturbation signatures, presenting a valuable and comprehensive
catalog. Further equipped with its user-friendly web design, PharmGWAS is expected to greatly aid the discovery of novel drugs, the exploration
of drug combination therapies and the identification of drug resistance or side effects. PharmGWAS is available at
https://ngdc.cncb.ac.cn/pharmgwas.
Graphical abstract
[Figure labels: GWAS summary statistics; S-PrediXcan; Genetically regulated expression profiles; Drug induced expression profiles; Connectivity methods]
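As a rough illustration of the reverse relationship between a genetically regulated expression signature and a drug-induced signature, the Python sketch below correlates two toy signatures over their shared genes. The gene names, values and function names are invented for illustration. Pearson and Spearman correlation are used here only as generic measures of an inverse relationship; this is not the PharmGWAS implementation.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation, computed as Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def reverse_correlation(disease_sig, drug_sig):
    """Correlate two gene -> score signatures over the genes they share."""
    genes = sorted(set(disease_sig) & set(drug_sig))
    disease_values = [disease_sig[g] for g in genes]
    drug_values = [drug_sig[g] for g in genes]
    return {
        "n_genes": len(genes),
        "PCC": pearson(disease_values, drug_values),
        "SCC": spearman(disease_values, drug_values),
    }


# Toy signatures (hypothetical genes and scores, not taken from PharmGWAS).
disease = {"GENE_A": 2.1, "GENE_B": -1.3, "GENE_C": 0.4, "GENE_D": -2.0, "GENE_E": 1.0}
drug = {"GENE_A": -1.8, "GENE_B": 1.1, "GENE_C": -0.2, "GENE_D": 1.5, "GENE_F": 0.3}

result = reverse_correlation(disease, drug)
print(result)  # negative PCC and SCC indicate a reversed signature
```

A strongly negative correlation means the drug tends to push genes in the direction opposite to the disease-associated signature, which is the pattern a connectivity analysis looks for.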
Received: August 14, 2023. Revised: September 12, 2023. Editorial Decision: September 19, 2023. Accepted: September 21, 2023
© The Author(s) 2023. Published by Oxford University Press on behalf of Nucleic Acids Research.
This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial License
(http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the
original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
2 Nucleic Acids Research, 2023
ing approach, also called the connectivity approach, has been proven particularly effective in discovering new drug repurposing
candidates across a wide spectrum of therapeutic domains, even at the level of single-cell resolution (4–10). It usually involves the
comparison of the signature of a drug, often derived through differential gene expression analysis before and after drug treatment, with
that of another disease phenotype. The degree of inverse correlation between the two signatures can provide insights into whether the
drug may revert the disease phenotype itself. Specifically, representative signature-based methods, or connectivity methods, included
diseases and candidate drugs. PharmGWAS thus provides a valuable reference resource by leveraging genetic evidence to aid the discovery
of novel drugs, the exploration of drug combination therapies and the identification of drug resistance or side effects.
Materials and methods
Data collection
GWAS summary statistics. We downloaded GWAS sum-
With the results of all the six connectivity methods, we define the candidate disease-drug pairs to be included in our database
according to the following criteria:
WTCS < 0 & CSS < 0 & CSS p < 0.05 & XSum < 0 & XSum p < 0.05 & SCC < 0 & PCC < 0 & CCS < 0.
Furthermore, to amalgamate the significance derived from all six connectivity methods, we propose a meta-score calculated as the count
of methods that identify the correlation with high priority. Specifically, for each dataset-tissue pair, we rank all candidate drugs by
their scores from each method. A drug is considered identified with high priority by a method if it is
in the database (Figure 2). The search function available in the homepage supports keyword-based quick queries for multiple forms of
items, such as names of diseases, CMap signatures, or GEO signatures (Figure 2A). Meanwhile, users can navigate the entire database
through three featured modules: Browse, CMap Results and GEO Results. All processed data in PharmGWAS are freely accessible.
The Browse module comprises three pages to facilitate exploration of GWAS datasets, CMap signatures and GEO signatures. A series of
extended interactive functional modules, such as multi-criteria search and download, are designed to
[Figure 1 labels: GWAS Summary Statistics; Connectivity Map; GEO; TSEA; TWAS; Disease Gene Expression Signatures; Drug-induced Expression Signatures; Connectivity Methods]
Figure 1. Schematic overview of PharmGWAS. TSEA: Tissue-Specific Enrichment Analysis. TWAS: Transcriptome Wide Association Studies. GEO: Gene
Expression Omnibus. WTCS: Weighted Connectivity Score. CSS: Connection Strength Score. XSum: The eXtreme Sum score.
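The inclusion criteria stated in the text (all six scores negative, with CSS and XSum p-values below 0.05) can be written as a small predicate. The following Python sketch is illustrative only: the class and function names are hypothetical, and the score values are those reported in the text for lisofylline in the coronary artery disease case study, with the XSum p-value (reported only as < 1 × 10⁻⁵) represented by that upper bound. The `meta_score` helper counts the methods that place a drug among the `top_k` most negative scores; the actual cutoff that defines "high priority" is not given in this excerpt, so `top_k` is left as a required argument and the ranking rule is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConnectivityScores:
    """Scores of one disease-drug pair from the six connectivity methods."""
    wtcs: float
    css: float
    css_p: float
    xsum: float
    xsum_p: float
    scc: float
    pcc: float
    ccs: float


def passes_inclusion_criteria(s, alpha=0.05):
    """WTCS < 0 & CSS < 0 & CSS p < 0.05 & XSum < 0 & XSum p < 0.05
    & SCC < 0 & PCC < 0 & CCS < 0."""
    return (
        s.wtcs < 0
        and s.css < 0 and s.css_p < alpha
        and s.xsum < 0 and s.xsum_p < alpha
        and s.scc < 0
        and s.pcc < 0
        and s.ccs < 0
    )


def meta_score(drug, scores_by_method, top_k):
    """Count the methods that rank `drug` within their top_k candidates.

    Assumption: drugs are ranked from the most negative score upward, and
    top_k is a caller-chosen, hypothetical cutoff.
    """
    count = 0
    for scores in scores_by_method.values():
        ranked = sorted(scores, key=scores.get)  # most negative score first
        if drug in ranked[:top_k]:
            count += 1
    return count


# Values reported in the text for lisofylline in artery coronary.
lisofylline = ConnectivityScores(
    wtcs=-0.5730, css=-0.1464, css_p=0.0101,
    xsum=-9.3722, xsum_p=1e-5,  # reported as P < 1e-5; upper bound used
    scc=-0.1478, pcc=-0.1996, ccs=-0.1998,
)
print(passes_inclusion_criteria(lisofylline))  # True

# Toy scores for the meta-score (hypothetical drugs and values).
toy_scores = {
    "WTCS": {"drug_x": -0.6, "drug_y": -0.2, "drug_z": -0.4},
    "XSum": {"drug_x": -3.0, "drug_y": -9.0, "drug_z": -1.0},
}
print(meta_score("drug_x", toy_scores, top_k=1))  # 1
```

Keeping the cutoff as an explicit argument avoids hard-coding a threshold that the excerpt does not state.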
sociation ID’ in the CMap Results and GEO Results pages. In addition, all processed results can be freely downloaded.
Application of PharmGWAS: coronary artery disease as an example
We take the GWAS dataset of coronary artery disease (CAD) reported by Webb et al. (50) as a case study to demonstrate the application of
PharmGWAS (Figure 2C and D). CAD is a leading cause of death worldwide with a major heritable component, and GWAS has identified ∼60 loci
explaining ∼15% of the heritability. The three tissues most strongly associated with CAD, as determined using MAGMA and deTS, were artery
coronary (P = 0.0070), liver (P = 0.0242) and spleen (P = 0.0242). We calculated GReX in these three tissues as well as in whole blood.
After connectivity analyses with CMap2.0 signatures, the top-ranked drug candidate was lisofylline (WTCS = −0.5730, XSum = −9.3722,
XSum P < 1 × 10⁻⁵, CSS = −0.1464, CSS P = 0.0101, SCC = −0.1478, PCC = −0.1996, CCS = −0.1998) in artery coronary (Figure 2C).
Lisofylline was originally designed as an anti-inflammatory agent and has been investigated for the therapy of diabetes (51). It has been
reported to have suppressive effects on the serum-free fatty acids (52), a risk factor for atherosclerosis, and atherosclerosis is a major
complication of diabetes (53). Besides, pentoxifylline, a methylxanthine derivative of lisofylline, has been proven to have therapeutic
benefits on atherosclerosis (54). Taken together, lisofylline is a reliable candidate drug for the treatment of CAD. These results suggest
that PharmGWAS could serve as a valuable resource for drug repurposing by leveraging the discovery power of human genetics data.
Discussion and future developments
In this study, we developed PharmGWAS, a GWAS-based knowledgebase for drug repurposing by incorporating human genetics data and drug
perturbation data. To the best of our knowledge, PharmGWAS is the first valuable reference resource that provides the analysis results
processed by a unified drug repurposing workflow for thousands of publicly available disease-relevant GWAS datasets. In addition to the
wide range of disease types covered by GWAS datasets from multiple sources, the drug perturbation data retrieved from CMap2.0 and the GEO
RNA-seq signatures extracted from SigCom LINCS are among the most comprehensive. Our results shed new light on drug discovery, drug
combination, drug resistance and drug side effects from the human genetics perspective.
With the accumulation of new GWAS datasets with increasing numbers of samples, PharmGWAS will be continuously updated by collecting and
processing the latest available genetic data. In particular, our results are restricted solely to scientific studies and do not constitute
any clinical recommendations. Because of the varying quality of the GWAS datasets and imputed GReX, caution is required in interpreting
our results. Moreover, because CMap data are limited to a selection of cell lines rather than tissues, further efforts to expand the
repertoire of tissues will enhance the reliability of our results. Lastly, population-scale applications of single-cell sequencing are
becoming possible with the development of single-cell sequencing technology and decreasing costs (55–57). Thus, we anticipate that our
framework for drug repurposing based on
Figure 2. Screenshots of the major web pages for PharmGWAS. (A) The homepage search function allows users to quickly query for multiple items
including GWAS datasets, CMap signatures and GEO signatures. (B) The browse module provides a multi-criteria search function to further filter
the data of interest. (C) Detailed browse page for an individual GWAS dataset. Clicking on the ‘Association ID’ will direct users to view the
detailed results and charts for the corresponding item and clicking on the ‘CMap Name’ will lead to PubChem for specific information. (D) The
detailed results page for each candidate disease–drug pair. (E) The CMap Results and GEO Results modules.
6. Malcomson,B., Wilson,H., Veglia,E., Thillaiyampalam,G., Barsden,R., Donegan,S., El Banna,A., Elborn,J.S., Ennis,M., Kelly,C., et al.
(2016) Connectivity mapping (ssCMap) to predict A20-inducing drugs and their antiinflammatory action in cystic fibrosis. Proc. Natl. Acad.
Sci. U.S.A., 113, E3725–E3734.
7. Raghavan,R., Hyter,S., Pathak,H.B., Godwin,A.K., Konecny,G., Wang,C., Goode,E.L. and Fridley,B.L. (2016) Drug discovery using clinical
outcome-based Connectivity Mapping: application to ovarian cancer. BMC Genomics, 17, 811.
8. Mirza,N., Sills,G.J., Pirmohamed,M. and Marson,A.G. (2017) Identifying new antiepileptic drugs through genomics-based drug repurposing.
Hum. Mol. Genet., 26, 527–537.
24. UCLEB Consortium, Speed,D., Cai,N., Johnson,M.R., Nejentsev,S. and Balding,D.J. (2017) Reevaluation of SNP heritability in complex
human traits. Nat. Genet., 49, 986–992.
25. Yang,J., Zeng,J., Goddard,M.E., Wray,N.R. and Visscher,P.M. (2017) Concepts, estimation and interpretation of SNP-based heritability.
Nat. Genet., 49, 1304–1310.
26. Khera,A.V., Chaffin,M., Aragam,K.G., Haas,M.E., Roselli,C., Choi,S.H., Natarajan,P., Lander,E.S., Lubitz,S.A., Ellinor,P.T., et al.
(2018) Genome-wide polygenic scores for common diseases identify individuals with risk equivalent to monogenic mutations. Nat. Genet., 50,
1219–1224.
27. So,H.C., Chau,C.K., Chiu,W.T., Ho,K.S., Lo,C.P., Yim,S.H. and
mechanisms governing human ovarian ageing. Nature, 596, 393–397.
40. Mullins,N., Forstner,A.J., O’Connell,K.S., Coombes,B., Coleman,J.R.I., Qiao,Z., Als,T.D., Bigdeli,T.B., Borte,S., Bryois,J., et al.
(2021) Genome-wide association study of more than 40,000 bipolar disorder cases provides new insights into the underlying biology. Nat.
Genet., 53, 817–829.
41. Cade,B.E., Lee,J., Sofer,T., Wang,H., Zhang,M., Chen,H., Gharib,S.A., Gottlieb,D.J., Guo,X., Lane,J.M., et al. (2021) Whole-genome
association analyses of sleep-disordered breathing phenotypes in the NHLBI TOPMed program. Genome Med., 13, 136.