
Nucleic Acids Research, 2023, 1–8

https://doi.org/10.1093/nar/gkad832
Database issue

PharmGWAS: a GWAS-based knowledgebase for drug repurposing

Hongen Kang1,2, Siyu Pan1,2, Shiqi Lin1,2, Yin-Ying Wang1, Na Yuan1 and Peilin Jia1,2,3,*

Downloaded from https://academic.oup.com/nar/advance-article/doi/10.1093/nar/gkad832/7311079 by guest on 20 December 2023


1 CAS Key Laboratory of Genomic and Precision Medicine, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China
2 University of Chinese Academy of Sciences, Beijing 100049, China
3 National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China
* To whom correspondence should be addressed. Tel: +86 10 84097798; Email: pjia@big.ac.cn

Abstract
Leveraging genetic insights to promote drug repurposing has become a promising and active strategy in pharmacology. Indeed, among the 50 drugs approved by the FDA in 2021, two-thirds have genetically supported evidence. In this regard, the increasing amount of widely available genome-wide association study (GWAS) datasets has provided substantial opportunities for drug repurposing based on genetic discoveries. Here, we developed PharmGWAS, a comprehensive knowledgebase designed to identify candidate drugs through the integration of GWAS data. PharmGWAS focuses on novel connections between diseases and small-molecule compounds derived using a reverse relationship between the genetically-regulated expression signature and the drug-induced signature. Specifically, we collected and processed 1929 GWAS datasets across a diverse spectrum of diseases and 724 485 perturbation signatures pertaining to 33 609 molecular compounds. To obtain reliable and robust predictions for the reverse connections, we implemented six distinct connectivity methods. In the current version, PharmGWAS deposits a total of 740 227 genetically-informed disease-drug pairs derived from drug-perturbation signatures, presenting a valuable and comprehensive catalog. Further equipped with its user-friendly web design, PharmGWAS is expected to greatly aid the discovery of novel drugs, the exploration of drug combination therapies and the identification of drug resistance or side effects. PharmGWAS is available at https://ngdc.cncb.ac.cn/pharmgwas.

Graphical abstract
[Graphical abstract: GWAS summary statistics are converted by S-PrediXcan into genetically regulated expression profiles, which are compared with drug-induced expression profiles by connectivity methods to yield drug repurposing candidates.]

Received: August 14, 2023. Revised: September 12, 2023. Editorial Decision: September 19, 2023. Accepted: September 21, 2023
© The Author(s) 2023. Published by Oxford University Press on behalf of Nucleic Acids Research.
This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com

Introduction

The development of novel drugs is a daunting, time-consuming and costly process, which severely impedes the translation of scientific breakthroughs into clinical applications (1,2). To address this challenge, drug repurposing, also known as drug repositioning, has emerged as a promising strategy. Drug repurposing seeks to identify novel clinical applications for existing medications, offering accelerated timelines, reduced risks and lower expenses compared to the traditional de novo drug discovery pipeline.

Diverse computational approaches have been developed to aid the identification of repurposing candidates, such as signature matching, molecular docking and retrospective clinical analysis (3). Among these, the signature matching approach, also called the connectivity approach, has proven particularly effective in discovering new drug repurposing candidates across a wide spectrum of therapeutic domains, even at single-cell resolution (4–10). It usually involves comparing the signature of a drug, often derived through differential gene expression analysis before and after drug treatment, with that of a disease phenotype. The degree of inverse correlation between the two signatures can provide insights into whether the drug may revert the disease phenotype. Representative signature-based methods, or connectivity methods, include the Connectivity Score, which is based on Kolmogorov-Smirnov (KS) statistics and was first implemented by the Connectivity Map (CMap) project (11). Later methods, such as the Connection Strength Score (CSS) and the eXtreme Sum (XSum) score, measure not only the strength of the inverse connections but also their statistical significance (12,13).

Recently, advances in genome-wide association studies (GWAS) have revealed important biological insights into complex diseases. These genetic discoveries can assist in identifying compounds suitable for tailored treatments of the respective diseases (14–16). Indeed, among the 50 drugs approved by the FDA in 2021, two-thirds have genetically supported evidence (17). Traditionally, GWAS signals may provide opportunities for selecting candidate drug targets by identifying the causal variants and genes with large effect sizes (18–20). For example, IL23R was identified by GWAS as a repurposing candidate for Crohn’s disease (21), and the corresponding drug, ustekinumab, has been shown to have a significant therapeutic effect on IL23R (22). However, the majority of significant variants are located in the non-coding regions of the genome, making it quite challenging to identify the causal genes. Furthermore, the majority of common variants identified in GWAS have small to modest individual effects on traits, which may not be substantial enough for therapeutic intervention. In contrast, the combined polygenic effect of many such variants can be considerably large and constitute a significant portion of the overall trait heritability (23–26). To leverage the discovery power of the vast majority of common variants, GWAS summary statistics can be used to impute the genetically regulated expression (GReX) profile, which is then compared with drug-induced gene expression profiles, such as those generated by the Connectivity Map project (11). Such extended signature matching strategies have been successfully applied to psychiatric disorders (27,28) and blood traits (29), among others. However, there is still a lack of a systematic and comprehensive knowledgebase for genetically-informed drug repurposing for a wide range of complex diseases.

To address this issue, we developed PharmGWAS, a knowledgebase for GWAS-informed drug repurposing across a diverse spectrum of diseases. We constructed a standard computational inference workflow to first implement the transcriptome-wide association study (TWAS) strategy and impute the GReX signatures in disease-relevant tissues for each GWAS dataset. We also curated a comprehensive list of drug-induced gene expression profiles from the CMap project and the Gene Expression Omnibus (GEO) database (30,31). We then implemented six computational methods to search for the drug-induced signatures that had a negative correlation with the GReX signature. These methods have been well reported and evaluated in the literature to successfully connect diseases and candidate drugs. PharmGWAS thus provides a valuable reference resource by leveraging genetic evidence to aid the discovery of novel drugs, the exploration of drug combination therapies and the identification of drug resistance or side effects.

Materials and methods

Data collection

GWAS summary statistics. We downloaded GWAS summary statistics mainly from three sources: GWAS Catalog (32), UK BioBank (UKBB) (33) and large consortia dedicated to diverse diseases, including Coronary ARtery DIsease Genome wide Replication and Meta-analysis (CARDIoGRAM) plus The Coronary Artery Disease (C4D) Genetics Consortium (34), Cerebrovascular Disease Knowledge Portal/International Stroke Genetics Consortium (CDKP/ISGC) (35), Common Metabolic Diseases Knowledge Portal (CMDKP) (36), Cardiovascular Disease Knowledge Portal (CVDKP) (37,38), Reproductive Genetics Consortium (RGC) (39), Psychiatric Genomic Consortium (PGC) (40) and Sleep Disorder Knowledge Portal (SDKP) (41). According to the Experimental Factor Ontology (EFO) provided by GWAS Catalog, we filtered out the GWAS data where the trait description did not map to the disease terms in the EFO. Also, we only retained disease-related GWAS data for UKBB according to the international classification of diseases (ICD-10), cancer code and non-cancer illness code. Considering that the TWAS prediction models were built based on reference data of European ancestry (42), we only kept the GWAS data from European ancestry in our current version of PharmGWAS. Redundant datasets were removed. We also collected metadata such as sample size, ancestry and release year of the datasets from GWAS Catalog or by manually reviewing the original publications. The summary of GWAS datasets is shown in Table 1.

Drug perturbation signatures. The drug-induced gene expression profiles were retrieved from two sources: the Expanded CMap LINCS Resource 2020 (hereafter referred to as CMap2.0) and the SigCom LINCS resource (30,31). For CMap2.0, we downloaded the level 5 data of small molecule compounds along with the metadata, including information on compounds, dosage, time, genes and cell lines. SigCom LINCS is a webserver that provides services to process and analyze over a million gene expression signatures. We downloaded the level 5 data from the SigCom LINCS resource (file name: Automatic Human GEO RNA-seq Signatures; access date: 29 May 2023), which included 4269 signatures that could also be used for connectivity analyses.

Data analysis workflow

We constructed a standard pipeline to infer genetically-informed drugs. First, because gene expression data could be quite tissue-specific, we defined disease-associated genes using the GWAS data and conducted tissue-specific enrichment analysis (TSEA) to identify disease-relevant tissues for TWAS analyses. Second, we applied TWAS in the predicted tissues and obtained the imputed GReX. Third, we conducted connectivity analyses to infer drug candidates by integrating the GWAS-informed signature (GReX) and drug-induced signatures. Finally, we defined candidate disease-drug pairs by combining the results of the six connectivity methods.
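To make the third step of the workflow concrete, the sketch below computes three correlation measures between a disease GReX signature and a drug-induced signature over their shared genes. It is only a toy illustration under stated assumptions: the gene names and z-scores are invented, the helper names (`pearson`, `ranks`, `spearman`, `cosine`) are hypothetical, and the rank helper assumes no tied values. It is not the PharmGWAS implementation.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def ranks(values):
    """Ranks starting at 1 for the smallest value (assumes no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        out[idx] = rank
    return out

def spearman(x, y):
    """Spearman correlation as the Pearson correlation of ranks."""
    return pearson(ranks(x), ranks(y))

def cosine(x, y):
    """Cosine similarity of two equal-length numeric lists."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return dot / norm

# Invented z-scores: disease GReX signature and drug-induced signature.
grex = {"G1": 2.1, "G2": 1.2, "G3": -0.4, "G4": -1.8}
drug = {"G1": -1.9, "G2": -0.7, "G3": 0.3, "G4": 2.2, "G5": 0.1}

shared = sorted(set(grex) & set(drug))  # only genes present in both profiles
x = [grex[g] for g in shared]
y = [drug[g] for g in shared]

scc, pcc, ccs = spearman(x, y), pearson(x, y), cosine(x, y)
print(scc, pcc, ccs)  # all three are negative for this toy pair
```

In this toy pair the drug moves every shared gene in the direction opposite to the disease signature, so all three measures are negative, which is the pattern a reverse connection is expected to show.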

Table 1. Summary of GWAS datasets

Source               Datasets   URL
UK Biobank           1551       http://www.nealelab.is/uk-biobank
GWAS Catalog         340        https://www.ebi.ac.uk/gwas/home
CARDIoGRAMplusC4D    2          http://www.cardiogramplusc4d.org/data-downloads/
CDKP/ISGC            4          https://cd.hugeamp.org/downloads.html
CMDKP                13         https://hugeamp.org/downloads.html
CVDKP                3          https://cvd.hugeamp.org/downloads.html
RGC                  1          http://www.reprogen.org/data_download.html
PGC                  13         https://www.med.unc.edu/pgc/results-and-downloads
SDKP                 2          https://sleep.hugeamp.org/downloads.html

CARDIoGRAMplusC4D: Coronary ARtery DIsease Genome wide Replication and Meta-analysis (CARDIoGRAM) plus The Coronary Artery Disease (C4D) Genetics; CDKP/ISGC: Cerebrovascular Disease Knowledge Portal/International Stroke Genetics Consortium; CMDKP: Common Metabolic Diseases Knowledge Portal; CVDKP: Cardiovascular Disease Knowledge Portal; RGC: Reproductive Genetics Consortium; PGC: Psychiatric Genomic Consortium; SDKP: Sleep Disorder Knowledge Portal.
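As a quick consistency check, the per-source counts in Table 1 can be summed and compared with the 1929 GWAS datasets reported in the abstract. The dictionary below simply transcribes the table; the variable names are illustrative.

```python
# Dataset counts per source, transcribed from Table 1.
table1 = {
    "UK Biobank": 1551,
    "GWAS Catalog": 340,
    "CARDIoGRAMplusC4D": 2,
    "CDKP/ISGC": 4,
    "CMDKP": 13,
    "CVDKP": 3,
    "RGC": 1,
    "PGC": 13,
    "SDKP": 2,
}

total = sum(table1.values())
print(total)  # 1929, matching the number of GWAS datasets stated in the abstract
```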

Tissue-specific enrichment analysis

We used Multi-marker Analysis of GenoMic Annotation (MAGMA v1.10) (43) to calculate gene-based P values using GWAS summary statistics. We defined disease-associated genes as those with MAGMA P < 0.05 and conducted TSEA by using the deTS algorithm to infer disease-relevant tissues (44). For each GWAS dataset, deTS identified the top three tissues that were most relevant to the disease and had P < 0.05. We used these tissues for the following analysis.

Transcriptome-wide association study

To impute the GReX signatures in disease-relevant tissues, we used the TWAS method S-PrediXcan in this study (45). We chose Multivariate adaptive shrinkage (Mashr) models trained using the GTEx data (release v8) as the transcriptome prediction models. For each GWAS dataset, we used S-PrediXcan to impute GReX in the top 3 disease-relevant tissues as determined by deTS. In addition, we imputed GReX in the whole blood tissue for all diseases regardless of their tissue-specific context.

Connectivity methods and filtering criteria

We devised and implemented six distinct connectivity methods to infer drug candidates based on several previous benchmark and retrospective studies (13,46–48). They were the Weighted Connectivity Score (WTCS), Connection Strength Score (CSS), eXtreme Sum score (XSum) and three types of correlation measurements (Spearman, Pearson and Cosine).

The method WTCS is used in CMap2.0 and employs the Enrichment Score (ES) used in Gene Set Enrichment Analysis (GSEA) (30). WTCS is defined as follows:

\[
\mathrm{WTCS} =
\begin{cases}
\dfrac{ES_{up} - ES_{down}}{2}, & \text{if } \operatorname{sign}(ES_{up}) \neq \operatorname{sign}(ES_{down}) \\
0, & \text{otherwise}
\end{cases}
\]

where $ES_{up}$ and $ES_{down}$ are the ESs for upregulated and downregulated gene sets, respectively. The sizes of these gene sets are uniformly defined as 50.

The method CSS measures the strength of the correlation based on the signed ranks of genes in the disease signature and the drug-induced signature. CSS has been shown to outperform other methods when evaluated for assessing drug–drug similarities using L1000 data (46). CSS is defined as follows:

\[
\mathrm{css}(R, S) = \sum_{i=1}^{N_S} r^{abs}_{drug}(g_{s_i}) \times \operatorname{sign}_{drug}(g_{s_i})
\]

\[
\mathrm{css}_{max}(N_R, N_S) = \sum_{i=1}^{N_S} (N_R - i + 1)
\]

\[
\mathrm{CSS} = \frac{\mathrm{css}(R, S)}{\mathrm{css}_{max}(N_R, N_S)}
\]

where $R$ is the rank-ordered drug list, $S$ is the unordered disease signature (50 upregulated genes and 50 downregulated genes), $N_S$ is the number of genes in $S$ that appear in $R$, $g_{s_i}$ is the $i$th gene in the set $S$, $r^{abs}_{drug}(g_{s_i})$ is the absolute-value-based rank of $g_{s_i}$ in the drug profile, $\operatorname{sign}_{drug}(g_{s_i})$ is the sign of $g_{s_i}$ in $R$, and $N_R$ is the number of genes in $R$. To generate the non-parametric P value of CSS, we randomly select $N_S$ genes in $S$ and perform 10 000 permutations.

The method XSum focuses on top- or bottom-ranked genes and has been shown to perform better than other methods in the inference of drug indications using the CMap data (13). The XSum score is defined as follows:

ChangeByCompound = top N upregulated ∪ top N downregulated genes (compound-treated versus no compound-treated)
XUpInDisease = N disease-upregulated genes ∩ ChangeByCompound
XDownInDisease = N disease-downregulated genes ∩ ChangeByCompound
Sum(XUpInDisease) = sum of Z scores (level 5 data) of compound-induced genes in the set XUpInDisease
Sum(XDownInDisease) = sum of Z scores (level 5 data) of compound-induced genes in the set XDownInDisease
XSum = Sum(XUpInDisease) − Sum(XDownInDisease)

Similarly, we calculate the P value of XSum by 10 000 permutations of XUpInDisease and XDownInDisease. In this study, we set N to 50.

In addition, we calculate three metrics of correlation: Spearman Correlation Coefficient (SCC), Pearson Correlation Coefficient (PCC) and Cosine Correlation Score (CCS) using all shared genes between drug-induced profiles and disease-induced profiles. These three metrics can also be extended to the extremely shared genes between the sets of genes most perturbed (50 upregulated genes + 50 downregulated genes) by the drug and disease (XPearson: eXtreme Pearson correlation; XSpearman: eXtreme Spearman rank correlation; XCosine: eXtreme Cosine correlation).
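The WTCS combination rule and the XSum definition given above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function names, the toy gene labels and the z-scores are invented, N is set to 2 rather than 50 to keep the example small, and the WTCS branch uses the Connectivity Map convention that the score is non-zero only when the two enrichment scores differ in sign. Computing the enrichment scores themselves and the permutation P values is out of scope here.

```python
def sign(x):
    """Return -1, 0 or 1 according to the sign of x."""
    return (x > 0) - (x < 0)

def wtcs(es_up, es_down):
    """Combine the enrichment scores of the up- and down-regulated disease gene sets."""
    if sign(es_up) != sign(es_down):
        return (es_up - es_down) / 2
    return 0.0

def xsum(drug_z, disease_up, disease_down, n=50):
    """XSum over the genes most changed by the compound.

    drug_z: dict mapping gene -> compound-induced z-score
    disease_up / disease_down: disease up- and down-regulated gene lists
    """
    ranked = sorted(drug_z, key=drug_z.get)           # ascending z-score
    changed = set(ranked[:n]) | set(ranked[-n:])      # ChangeByCompound
    x_up = [g for g in disease_up if g in changed]    # XUpInDisease
    x_down = [g for g in disease_down if g in changed]  # XDownInDisease
    return sum(drug_z[g] for g in x_up) - sum(drug_z[g] for g in x_down)

# Toy compound profile (invented values).
drug_z = {"A": 2.0, "B": 1.5, "C": 0.25, "D": -0.5, "E": -1.5, "F": -2.0}
disease_up = ["F", "E", "C"]     # genes upregulated in disease
disease_down = ["A", "D"]        # genes downregulated in disease

xsum_score = xsum(drug_z, disease_up, disease_down, n=2)
wtcs_score = wtcs(es_up=-0.5, es_down=0.25)

print(xsum_score)  # -5.5: disease-up genes are pushed down by the compound
print(wtcs_score)  # -0.375
```

In both toy cases the score is negative, which is the direction of interest for reversal: the compound lowers genes that are raised in disease and raises genes that are lowered.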

With the results of all six connectivity methods, we define the candidate disease-drug pairs to be included in our database according to the following criteria:

WTCS < 0 & CSS < 0 & CSS P < 0.05 & XSum < 0 & XSum P < 0.05 & SCC < 0 & PCC < 0 & CCS < 0.

Furthermore, to amalgamate the significance derived from all six connectivity methods, we propose a meta-score calculated as the count of methods that identify the correlation with high priority. Specifically, for each dataset-tissue pair, we rank all candidate drugs by their scores from each method. A drug is considered identified with high priority by a method if it is ranked among the top 5% of all candidates by the corresponding method. As a result, each drug receives six labels, each indicating whether it is of high priority or not according to one of the six methods. The meta-score therefore ranges from 0 to 6.

Database implementation

PharmGWAS was deployed in a virtual machine with a CentOS 7.9 environment. Nginx v1.20.1 (https://nginx.org/en/) was used as an HTTP and reverse proxy server. We used MySQL (https://www.mysql.com) as the database engine. The backend RESTful web service was built with the Java Spring Boot framework (https://spring.io/projects/spring-boot). The frontend of the web interface was developed using React (https://reactjs.org) and organized using the UMI (https://umijs.org) framework. For the user interface (UI) library, we chose Ant Design (https://ant.design), which contains a set of high-quality components that can be easily extended to build rich and interactive user interfaces. In addition, the interactive visualization of charts was achieved by using different libraries such as HighCharts (https://www.highcharts.com) and ECharts (https://echarts.apache.org).

Database content and usage

Overview of PharmGWAS

PharmGWAS aims to translate GWAS signals into clinical practice. Currently, PharmGWAS contains 1929 GWAS datasets curated from GWAS Catalog, UKBB and several large consortia. Moreover, 720 216 compound-perturbed gene expression signatures were collected from CMap2.0 and an additional 4269 perturbed signatures were collected from GEO. We implemented our standard pipeline to infer candidate disease-drug pairs for each GWAS dataset. The outline of PharmGWAS is illustrated in Figure 1. Currently, PharmGWAS deposits a total of 732 947 genetically-informed disease-drug pairs derived from CMap signatures and 7280 pairs from GEO signatures, totaling 740 227 (732 947 + 7280) disease-drug pairs. Among them, 81.58% (603 874) have a meta-score of 0, 11.40% (84 411) have a meta-score of 1, 3.52% (26 021) have 2, 2.89% (21 370) have 3, 0.49% (3652) have 4, 838 have 5, and 61 have 6. In all web pages, the default rule for sorting results is based on the meta-score in descending order. To the best of our knowledge, PharmGWAS is the first systematic and comprehensive database to identify drug repurposing candidates for nearly all available disease traits.

[Figure 1: workflow schematic. GWAS summary statistics pass through TSEA (disease-relevant tissues) and TWAS to give disease gene expression signatures; Connectivity Map and GEO provide drug-induced expression signatures; both feed the connectivity methods (WTCS, CSS, XSum, Spearman, Pearson, Cosine), which yield genetically-informed drug candidates.]

Figure 1. Schematic overview of PharmGWAS. TSEA: Tissue-Specific Enrichment Analysis. TWAS: Transcriptome Wide Association Studies. GEO: Gene Expression Omnibus. WTCS: Weighted Connectivity Score. CSS: Connection Strength Score. XSum: The eXtreme Sum score.

Web interface

PharmGWAS provides a user-friendly interface that allows users to intuitively search, browse and download any results in the database (Figure 2). The search function available on the homepage supports keyword-based quick queries for multiple forms of items, such as names of diseases, CMap signatures, or GEO signatures (Figure 2A). Meanwhile, users can navigate the entire database through three featured modules: Browse, CMap Results and GEO Results. All processed data in PharmGWAS are freely accessible.

The Browse module comprises three pages to facilitate exploration of GWAS datasets, CMap signatures and GEO signatures. A series of extended interactive functional modules, such as multi-criteria search and download, are designed to enable users to quickly retrieve relevant data of interest (Figure 2B). The Browse page of ‘GWAS datasets’ stores key meta information, including data source, disease name, publication year, the sample size of the GWAS, the number of cases and the number of controls. The Browse pages of ‘CMap signatures’ and ‘GEO signatures’ also contain extensive meta information, such as signature name, mechanism of action (MOA), Canonical SMILES, InChiKey, cell line, dose and GEO accession ID. Clicking on the corresponding ‘Dataset Name’ or ‘Signature ID’ button on the row of interest directs users to a new ‘detail page’ to view detailed results for a specific GWAS dataset or signature. The detail page for a GWAS dataset displays the meta-information about the study, the disease-relevant tissues identified by deTS and the drug candidates derived from CMap signatures and GEO signatures (Figure 2C). Furthermore, users can filter the results by entering and selecting keywords about the tissue, CMap name, or GEO accession ID. More explanatory information about each column can be viewed by hovering over the corresponding table header, and clicking on the header will sort the table by the selected column. By clicking on the ‘CMap Name’ column, users can access the details of the corresponding compound in PubChem (49), such as chemical and physical properties, clinical trials, consolidated references and patents. By clicking on the ‘Association ID’ column, users will be directed to a dedicated page with detailed results and charts for the item (Figure 2C).

The ‘detail page’ about individual disease-drug pairs contains five sections (Figure 2D). First, meta-information about the corresponding disease and drug is shown at the top of the page. Second, we use a radargram to provide an overview of the results from all six connectivity methods. The larger the area enclosed by the results of the six methods, the more reliable the drug candidate is. Third, to assess whether the disease-upregulated genes appear toward the bottom of drug-induced profiles and disease-downregulated genes appear toward the top of drug-induced profiles, the GSEA plots of disease upregulated and downregulated genes in the pre-ranked list of the drug signature are provided. Next, the reverse intersection analysis of disease-upregulated genes with drug-downregulated genes is illustrated in Venn diagrams, and similarly for disease-downregulated genes with drug-upregulated genes. Finally, the corresponding Z scores of upregulated and downregulated genes in disease and drug signatures are displayed in the histograms.

The modules for ‘CMap Results’ and ‘GEO Results’ provide an opportunity to directly search for disease-drug pairs across datasets and signatures (Figure 2E). An advanced search function has also been provided. Users can jump to the detail page of disease-drug pairs by clicking on the ‘Association ID’ in the CMap Results and GEO Results pages. In addition, all processed results can be freely downloaded.

Figure 2. Screenshots of the major web pages for PharmGWAS. (A) The homepage search function allows users to quickly query for multiple items including GWAS datasets, CMap signatures and GEO signatures. (B) The browse module provides a multi-criteria search function to further filter the data of interest. (C) Detailed browse page for an individual GWAS dataset. Clicking on the ‘Association ID’ will direct users to view the detailed results and charts for the corresponding item and clicking on the ‘CMap Name’ will lead to PubChem for specific information. (D) The detail results page for each candidate disease–drug pair. (E) The CMap Results and GEO Results modules.

Application of PharmGWAS: coronary artery disease as an example

We take the GWAS dataset of coronary artery disease (CAD) reported by Webb et al. (50) as a case study to demonstrate the application of PharmGWAS (Figure 2C and D). CAD is a leading cause of death worldwide with a major heritable component, and GWAS has identified ∼60 loci explaining ∼15% of the heritability. The three tissues most associated with CAD as determined by using MAGMA and deTS were artery coronary (P = 0.0070), liver (P = 0.0242) and spleen (P = 0.0242). We calculated GReX in these three tissues as well as in the whole blood. After connectivity analyses with CMap2.0 signatures, the top-ranked drug candidate was lisofylline (WTCS = −0.5730, XSum = −9.3722, XSum P < 1 × 10^−5, CSS = −0.1464, CSS P = 0.0101, SCC = −0.1478, PCC = −0.1996, CCS = −0.1998) in artery coronary (Figure 2C). Lisofylline was originally designed as an anti-inflammatory agent and has been investigated for the therapy of diabetes (51). It has been reported to have suppressive effects on serum free fatty acids (52), a risk factor for atherosclerosis, and atherosclerosis is a major complication of diabetes (53). Besides, pentoxifylline, a methylxanthine derivative of lisofylline, has been proven to have therapeutic benefits on atherosclerosis (54). Taken together, lisofylline is a reliable candidate drug for the treatment of CAD. These results suggest that PharmGWAS could serve as a valuable resource for drug repurposing by leveraging the discovery power of human genetics data.

Discussion and future developments

In this study, we developed PharmGWAS, a GWAS-based knowledgebase for drug repurposing by incorporating human genetics data and drug perturbation data. To the best of our knowledge, PharmGWAS is the first reference resource that provides the analysis results processed by a unified drug repurposing workflow for thousands of publicly available disease-relevant GWAS datasets. In addition to the wide range of disease types covered by GWAS datasets from multiple sources, the drug perturbation data retrieved from CMap2.0 and the GEO RNA-seq signatures extracted from SigCom LINCS are among the most comprehensive. Our results shed new light on drug discovery, drug combination, drug resistance and drug side effects from the human genetics perspective.

With the accumulation of new GWAS datasets with increasing sample sizes, PharmGWAS will be continuously updated by collecting and processing the latest available genetic data. In particular, our results are restricted solely to scientific studies and do not constitute any clinical recommendations. Because of the varying quality of the GWAS datasets and imputed GReX, caution is required in interpreting our results. Moreover, because CMap data are limited to a selection of cell lines rather than tissues, further efforts to expand the repertoire of tissues will enhance the reliability of our results. Lastly, population-scale applications of single-cell sequencing are becoming possible with the development of single-cell sequencing technology and decreasing costs (55–57). Thus, we anticipate that our framework for drug repurposing based on the TWAS concept will expand to single-cell resolution in the future.

Data availability

PharmGWAS is available at https://ngdc.cncb.ac.cn/pharmgwas. Users can freely access all processed data and results through the web service.

Acknowledgements

We thank National Genomics Data Center (NGDC) for providing support in database deployment. We are grateful to all members of the Laboratory for Precision Health (LPH) for their valuable comments.

Funding

National Natural Science Foundation of China [32270706]; Strategic Priority Research Program of the Chinese Academy of Sciences [XDB38010400]; Startup Research Fund of Henan Academy of Sciences [232016009]. Funding for open access charge: National Natural Science Foundation of China [32270706] and Strategic Priority Research Program of Chinese Academy of Sciences [XDB38010400 to P.J.].

Conflict of interest statement

None declared.

References

1. Waring,M.J., Arrowsmith,J., Leach,A.R., Leeson,P.D., Mandrell,S., Owen,R.M., Pairaudeau,G., Pennie,W.D., Pickett,S.D., Wang,J., et al. (2015) An analysis of the attrition of drug candidates from four major pharmaceutical companies. Nat. Rev. Drug Discov., 14, 475–486.
2. Dowden,H. and Munro,J. (2019) Trends in clinical success rates and therapeutic focus. Nat. Rev. Drug Discov., 18, 495–496.
3. Pushpakom,S., Iorio,F., Eyers,P.A., Escott,K.J., Hopper,S., Wells,A., Doig,A., Guilliams,T., Latimer,J., McNamee,C., et al. (2019) Drug repurposing: progress, challenges and recommendations. Nat. Rev. Drug Discov., 18, 41–58.
4. Kunkel,S.D., Suneja,M., Ebert,S.M., Bongers,K.S., Fox,D.K., Malmberg,S.E., Alipour,F., Shields,R.K. and Adams,C.M. (2011) mRNA expression signatures of human skeletal muscle atrophy identify a natural compound that increases muscle mass. Cell Metab., 13, 627–638.
6. Malcomson,B., Wilson,H., Veglia,E., Thillaiyampalam,G., Barsden,R., Donegan,S., El Banna,A., Elborn,J.S., Ennis,M., Kelly,C., et al. (2016) Connectivity mapping (ssCMap) to predict A20-inducing drugs and their antiinflammatory action in cystic fibrosis. Proc. Natl. Acad. Sci. U.S.A., 113, E3725–E3734.
7. Raghavan,R., Hyter,S., Pathak,H.B., Godwin,A.K., Konecny,G., Wang,C., Goode,E.L. and Fridley,B.L. (2016) Drug discovery using clinical outcome-based Connectivity Mapping: application to ovarian cancer. BMC Genomics, 17, 811.
8. Mirza,N., Sills,G.J., Pirmohamed,M. and Marson,A.G. (2017) Identifying new antiepileptic drugs through genomics-based drug repurposing. Hum. Mol. Genet., 26, 527–537.
9. Williams,G., Gatt,A., Clarke,E., Corcoran,J., Doherty,P., Chambers,D. and Ballard,C. (2019) Drug repurposing for Alzheimer’s disease based on transcriptional profiling of human iPSC-derived cortical neurons. Transl. Psychiatry, 9, 220.
10. Wang,Y.Y., Kang,H., Xu,T., Hao,L., Bao,Y. and Jia,P. (2022) CeDR Atlas: a knowledgebase of cellular drug response. Nucleic Acids Res., 50, D1164–D1171.
11. Lamb,J., Crawford,E.D., Peck,D., Modell,J.W., Blat,I.C., Wrobel,M.J., Lerner,J., Brunet,J.P., Subramanian,A., Ross,K.N., et al. (2006) The Connectivity Map: using gene-expression signatures to connect small molecules, genes, and disease. Science, 313, 1929–1935.
12. Zhang,S.D. and Gant,T.W. (2008) A simple and robust method for connecting small-molecule drugs using gene-expression signatures. BMC Bioinf., 9, 258.
13. Cheng,J., Yang,L., Kumar,V. and Agarwal,P. (2014) Systematic evaluation of connectivity map for disease indications. Genome Med., 6, 540.
14. Szustakowski,J.D., Balasubramanian,S., Kvikstad,E., Khalid,S., Bronson,P.G., Sasson,A., Wong,E., Liu,D., Wade Davis,J., Haefliger,C., et al. (2021) Advancing human genetics research and drug discovery through exome sequencing of the UK Biobank. Nat. Genet., 53, 942–948.
15. Reay,W.R. and Cairns,M.J. (2021) Advancing the use of genome-wide association studies for drug repurposing. Nat. Rev. Genet., 22, 658–671.
16. Carss,K.J., Deaton,A.M., Del Rio-Espinola,A., Diogo,D., Fielden,M., Kulkarni,D.A., Moggs,J., Newham,P., Nelson,M.R., Sistare,F.D., et al. (2023) Using human genetics to improve safety assessment of therapeutics. Nat. Rev. Drug Discov., 22, 145–162.
17. Ochoa,D., Karim,M., Ghoussaini,M., Hulcoop,D.G., McDonagh,E.M. and Dunham,I. (2022) Human genetics evidence supports two-thirds of the 2021 FDA-approved drugs. Nat. Rev. Drug Discov., 21, 551.
18. Nelson,M.R., Tipney,H., Painter,J.L., Shen,J., Nicoletti,P., Shen,Y., Floratos,A., Sham,P.C., Li,M.J., Wang,J., et al. (2015) The support of human genetic evidence for approved drug indications. Nat. Genet., 47, 856–860.
19. Okada,Y., Wu,D., Trynka,G., Raj,T., Terao,C., Ikari,K., Kochi,Y., Ohmura,K., Suzuki,A., Yoshida,S., et al. (2014) Genetics of rheumatoid arthritis contributes to biology and drug discovery. Nature, 506, 376–381.
20. Plenge,R.M., Scolnick,E.M. and Altshuler,D. (2013) Validating therapeutic targets through human genetics. Nat. Rev. Drug Discov., 12, 581–594.
21. Duerr,R.H., Taylor,K.D., Brant,S.R., Rioux,J.D., Silverberg,M.S., Daly,M.J., Steinhart,A.H., Abraham,C., Regueiro,M., Griffiths,A., et al. (2006) A genome-wide association study identifies IL23R as an inflammatory bowel disease gene. Science, 314, 1461–1463.
22. Feagan,B.G., Sandborn,W.J., Gasink,C., Jacobstein,D., Lang,Y., Friedman,J.R., Blank,M.A., Johanns,J., Gao,L.L., Miao,Y., et al. (2016) Ustekinumab as Induction and Maintenance Therapy for Crohn’s Disease. N. Engl. J. Med., 375, 1946–1960.
5. Huang,C.H., Ciou,J.S., Chen,S.T., Kok,V.C., Chung,Y., Tsai,J.J., 23. Park,J.H., Wacholder,S., Gail,M.H., Peters,U., Jacobs,K.B.,
Kurubanjerdjit,N., Huang,C.F. and Ng,K.L. (2016) Identify Chanock,S.J. and Chatterjee,N. (2010) Estimation of effect size
potential drugs for cardiovascular diseases caused by distribution from genome-wide association studies and
stress-induced genes in vascular smooth muscle cells. PeerJ, 4, implications for future discoveries. Nat. Genet., 42, 570–575.
e2478.
24. UCLEB Consortium, Speed,D., Cai,N., Johnson,M.R., Nejentsev,S. and Balding,D.J. (2017) Reevaluation of SNP heritability in complex human traits. Nat. Genet., 49, 986–992.
25. Yang,J., Zeng,J., Goddard,M.E., Wray,N.R. and Visscher,P.M. (2017) Concepts, estimation and interpretation of SNP-based heritability. Nat. Genet., 49, 1304–1310.
26. Khera,A.V., Chaffin,M., Aragam,K.G., Haas,M.E., Roselli,C., Choi,S.H., Natarajan,P., Lander,E.S., Lubitz,S.A., Ellinor,P.T., et al. (2018) Genome-wide polygenic scores for common diseases identify individuals with risk equivalent to monogenic mutations. Nat. Genet., 50, 1219–1224.
27. So,H.C., Chau,C.K., Chiu,W.T., Ho,K.S., Lo,C.P., Yim,S.H. and Sham,P.C. (2017) Analysis of genome-wide association data highlights candidates for drug repositioning in psychiatry. Nat. Neurosci., 20, 1342–1349.
28. Woodward,D.J., Thorp,J.G., Akosile,W., Ong,J.S., Gamazon,E.R., Derks,E.M. and Gerring,Z.F. (2023) Identification of drug repurposing candidates for the treatment of anxiety: a genetic approach. Psychiatry Res., 326, 115343.
29. Wu,P., Feng,Q., Kerchberger,V.E., Nelson,S.D., Chen,Q., Li,B., Edwards,T.L., Cox,N.J., Phillips,E.J., Stein,C.M., et al. (2022) Integrating gene expression and clinical data to identify drug repurposing candidates for hyperlipidemia and hypertension. Nat. Commun., 13, 46.
30. Subramanian,A., Narayan,R., Corsello,S.M., Peck,D.D., Natoli,T.E., Lu,X., Gould,J., Davis,J.F., Tubelli,A.A., Asiedu,J.K., et al. (2017) A Next Generation Connectivity Map: L1000 Platform and the First 1,000,000 Profiles. Cell, 171, 1437–1452.
31. Evangelista,J.E., Clarke,D.J.B., Xie,Z., Lachmann,A., Jeon,M., Chen,K., Jagodnik,K.M., Jenkins,S.L., Kuleshov,M.V., Wojciechowicz,M.L., et al. (2022) SigCom LINCS: data and metadata search engine for a million gene expression signatures. Nucleic Acids Res., 50, W697–W709.
32. Sollis,E., Mosaku,A., Abid,A., Buniello,A., Cerezo,M., Gil,L., Groza,T., Gunes,O., Hall,P., Hayhurst,J., et al. (2023) The NHGRI-EBI GWAS Catalog: knowledgebase and deposition resource. Nucleic Acids Res., 51, D977–D985.
33. Sudlow,C., Gallacher,J., Allen,N., Beral,V., Burton,P., Danesh,J., Downey,P., Elliott,P., Green,J., Landray,M., et al. (2015) UK biobank: an open access resource for identifying the causes of a wide range of complex diseases of middle and old age. PLoS Med., 12, e1001779.
34. Nelson,C.P., Goel,A., Butterworth,A.S., Kanoni,S., Webb,T.R., Marouli,E., Zeng,L., Ntalla,I., Lai,F.Y., Hopewell,J.C., et al. (2017) Association analyses based on false discovery rate implicate new loci for coronary artery disease. Nat. Genet., 49, 1385–1391.
35. Traylor,M., Malik,R., Nalls,M.A., Cotlarciuc,I., Radmanesh,F., Thorleifsson,G., Hanscombe,K.B., Langefeld,C., Saleheen,D., Rost,N.S., et al. (2017) Genetic variation at 16q24.2 is associated with small vessel stroke. Ann. Neurol., 81, 383–394.
36. Sandholm,N., Cole,J.B., Nair,V., Sheng,X., Liu,H., Ahlqvist,E., van Zuydam,N., Dahlstrom,E.H., Fermin,D., Smyth,L.J., et al. (2022) Genome-wide meta-analysis and omics integration identifies novel genes associated with diabetic kidney disease. Diabetologia, 65, 1495–1509.
37. Roselli,C., Yu,M., Nauffal,V., Georges,A., Yang,Q., Love,K., Weng,L.C., Delling,F.N., Maurya,S.R., Schrolkamp,M., et al. (2022) Genome-wide association study reveals novel genetic loci: a new polygenic risk score for mitral valve prolapse. Eur. Heart J., 43, 1668–1680.
38. Nauffal,V., Di Achille,P., Klarqvist,M.D.R., Cunningham,J.W., Hill,M.C., Pirruccello,J.P., Weng,L.C., Morrill,V.N., Choi,S.H., Khurshid,S., et al. (2023) Genetics of myocardial interstitial fibrosis in the human heart and association with disease. Nat. Genet., 55, 777–786.
39. Ruth,K.S., Day,F.R., Hussain,J., Martinez-Marchal,A., Aiken,C.E., Azad,A., Thompson,D.J., Knoblochova,L., Abe,H., Tarry-Adkins,J.L., et al. (2021) Genetic insights into biological mechanisms governing human ovarian ageing. Nature, 596, 393–397.
40. Mullins,N., Forstner,A.J., O’Connell,K.S., Coombes,B., Coleman,J.R.I., Qiao,Z., Als,T.D., Bigdeli,T.B., Borte,S., Bryois,J., et al. (2021) Genome-wide association study of more than 40,000 bipolar disorder cases provides new insights into the underlying biology. Nat. Genet., 53, 817–829.
41. Cade,B.E., Lee,J., Sofer,T., Wang,H., Zhang,M., Chen,H., Gharib,S.A., Gottlieb,D.J., Guo,X., Lane,J.M., et al. (2021) Whole-genome association analyses of sleep-disordered breathing phenotypes in the NHLBI TOPMed program. Genome Med., 13, 136.
42. GTEx Consortium (2020) The GTEx Consortium atlas of genetic regulatory effects across human tissues. Science, 369, 1318–1330.
43. de Leeuw,C.A., Mooij,J.M., Heskes,T. and Posthuma,D. (2015) MAGMA: generalized gene-set analysis of GWAS data. PLoS Comput. Biol., 11, e1004219.
44. Pei,G., Dai,Y., Zhao,Z. and Jia,P. (2019) deTS: tissue-specific enrichment analysis to decode tissue specificity. Bioinformatics, 35, 3842–3845.
45. Barbeira,A.N., Dickinson,S.P., Bonazzola,R., Zheng,J., Wheeler,H.E., Torres,J.M., Torstenson,E.S., Shah,K.P., Garcia,T., Edwards,T.L., et al. (2018) Exploring the phenotypic consequences of tissue specific gene expression variation inferred from GWAS summary statistics. Nat. Commun., 9, 1825.
46. Lin,K., Li,L., Dai,Y., Wang,H., Teng,S., Bao,X., Lu,Z.J. and Wang,D. (2020) A comprehensive evaluation of connectivity methods for L1000 data. Brief Bioinform, 21, 2194–2205.
47. Struckmann,S., Ernst,M., Fischer,S., Mah,N., Fuellen,G. and Moller,S. (2021) Scoring functions for drug-effect similarity. Brief Bioinform, 22, 1–8.
48. Samart,K., Tuyishime,P., Krishnan,A. and Ravi,J. (2021) Reconciling multiple connectivity scores for drug repurposing. Brief Bioinform, 22, 1–15.
49. Kim,S., Chen,J., Cheng,T., Gindulyte,A., He,J., He,S., Li,Q., Shoemaker,B.A., Thiessen,P.A., Yu,B., et al. (2023) PubChem 2023 update. Nucleic Acids Res., 51, D1373–D1380.
50. Webb,T.R., Erdmann,J., Stirrups,K.E., Stitziel,N.O., Masca,N.G., Jansen,H., Kanoni,S., Nelson,C.P., Ferrario,P.G., Konig,I.R., et al. (2017) Systematic evaluation of pleiotropy identifies 6 further loci associated with coronary artery disease. J. Am. Coll. Cardiol., 69, 823–836.
51. Yang,Z.D., Chen,M., Wu,R., McDuffie,M. and Nadler,J.L. (2002) The anti-inflammatory compound lisofylline prevents Type I diabetes in non-obese diabetic mice. Diabetologia, 45, 1307–1314.
52. Bursten,S.L., Federighi,D., Wald,J., Meengs,B., Spickler,W. and Nudelman,E. (1998) Lisofylline causes rapid and prolonged suppression of serum levels of free fatty acids. J. Pharmacol. Exp. Ther., 284, 337–345.
53. Beckman,J.A., Creager,M.A. and Libby,P. (2002) Diabetes and atherosclerosis: epidemiology, pathophysiology, and management. JAMA, 287, 2570–2581.
54. Laurat,E., Poirier,B., Tupin,E., Caligiuri,G., Hansson,G.K., Bariety,J. and Nicoletti,A. (2001) In vivo downregulation of T helper cell 1 immune responses reduces atherogenesis in apolipoprotein E-knockout mice. Circulation, 104, 197–202.
55. Sumida,T.S. and Hafler,D.A. (2022) Population genetics meets single-cell sequencing. Science, 376, 134–135.
56. Yazar,S., Alquicira-Hernandez,J., Wing,K., Senabouth,A., Gordon,M.G., Andersen,S., Lu,Q., Rowson,A., Taylor,T.R.P., Clarke,L., et al. (2022) Single-cell eQTL mapping identifies cell type-specific genetic control of autoimmune disease. Science, 376, eabf3041.
57. Perez,R.K., Gordon,M.G., Subramaniam,M., Kim,M.C., Hartoularos,G.C., Targ,S., Sun,Y., Ogorodnikov,A., Bueno,R., Lu,A., et al. (2022) Single-cell RNA-seq reveals cell type-specific molecular and genetic associations to lupus. Science, 376, eabf1970.

Received: August 14, 2023. Revised: September 12, 2023. Editorial Decision: September 19, 2023. Accepted: September 21, 2023
© The Author(s) 2023. Published by Oxford University Press on behalf of Nucleic Acids Research.
This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial License
(http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com