Big Data in Healthcare
The 4 Vs of Big Data
Volume: datasets are huge
Velocity: new data are generated every second
Variety: many different types of data are available
Veracity: not all data are useful; some are useless
Source: IBM Big Data Hub
Big Biomedical and Health Data
Volume of biomedical and health data, e.g. Electronic Health Records (EHR), ProteomicsDB, medical imaging
Variety of data types and structures, e.g. transcriptomics, proteomics, medical images
“Omics” data provide systematic data at all levels
What is Big Data?
How does it compare to other types of data?
Electronic Health Records
Observational studies
Billing data, insurance claims
Molecular databases
Clinical trials
Doctors’ notes, lab results, discussions
Real World Research Data
Big Data in Drug Discovery
Virtual screening
Cellular and protein binding assays
Cheminformatics databases
Biological efficacy
Physicochemical information
Toxicity
Open Access Drug Databases
ChEMBL: manually curated bioactive molecules with drug-like properties.
SuperTarget: ~7300 drug–target associations, with ~5000 manually annotated.
DrugBank: FDA-approved and experimental drugs with drug target, bio- and chemoinformatic data.
PharmGKB: pharmacogenomics-focused genetic, molecular, cellular and clinical data for drugs.
ZINC: ~21 million commercially available compounds prepared for virtual screening.
ChemBank: biomedical measurements derived from cell lines treated with small molecules.
Open Access Databases on Protein–Protein, Protein–Gene and Other Interactions
BioGRID: ~730,000 raw protein and genetic interactions from major model organisms.
MINT: molecular interaction database focusing on experimentally validated protein–protein interactions.
DGIdb: drug–gene interaction database curated from multiple well-established databases.
Database of Interacting Proteins: manual and computational curation of experimentally determined protein–protein interactions.
ExPASy STRING: known and predicted protein–protein interactions from experimental repositories and computational methods.
MatrixDB: interactions between extracellular proteins (e.g., collagen and laminins) and polysaccharides.
Open Access Genomics Databases
Oncomine: cancer microarray database that can be subdivided by treatment, patient survival and other patient characteristics.
GDOC: broad collection of bioinformatics and systems biology tools for the analysis and visualization of four major ‘omics’ types: DNA, mRNA, microRNA and metabolites.
The Cancer Genome Atlas: large-scale genome sequencing platform for multiple cancers, led by the NCI and the NHGRI.
Open Access Proteomics Databases
dbDEPC: database of differentially expressed proteins in human cancer.
PRIDE: centralized, standards-compliant repository of mass spectrometry proteomics data, including post-translational modifications.
BiGG: genome-based reconstruction of human metabolism for systems biology simulation and flux modeling.
HumanCyc: human metabolic pathway/genome bioinformatics database covering over 28,000 genes.
HMDB: human small molecule metabolites with associated chemical, clinical and molecular biology information.
SMPDB: small molecule pathway database with >400 unique human pathways not found in other databases.
Big Data provides new insights for drug design.
New patterns and associations are being discovered by mining these data.
Tumors are heterogeneous:
Different individuals ➜ different tumor cells
Same individual ➜ different tumor tissues ➜ different cells
Same individual ➜ same tumor tissue ➜ different cells!
Cells undergo further changes when stressed (hypoxia, exposure to drugs)
Participants divided into groups ➜ different groups receive different interventions ➜ results from each group compared
LIMITATIONS!
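A minimal sketch of the parallel-group comparison described above, assuming a hypothetical two-arm trial with a binary outcome (1 = responded). Participants are shuffled and dealt into arms of (near-)equal size, and the arms are compared by the difference in outcome proportions. The function names, arm labels and outcome data are illustrative only, not taken from any specific trial.

```python
import random
from typing import Dict, Hashable, Iterable, Optional, Sequence


def randomize(participant_ids: Iterable[Hashable],
              arms: Sequence[str] = ("treatment", "control"),
              seed: Optional[int] = None) -> Dict[Hashable, str]:
    """Random allocation into arms of (near-)equal size (shuffle, then deal)."""
    rng = random.Random(seed)
    ids = list(participant_ids)
    rng.shuffle(ids)
    # Deal the shuffled participants into the arms in turn.
    return {pid: arms[i % len(arms)] for i, pid in enumerate(ids)}


def risk_difference(allocation: Dict[Hashable, str],
                    outcomes: Dict[Hashable, int],
                    treated: str = "treatment",
                    control: str = "control") -> float:
    """Outcome proportion in the treated arm minus that in the control arm."""
    def proportion(arm: str) -> float:
        members = [pid for pid, a in allocation.items() if a == arm]
        return sum(outcomes[pid] for pid in members) / len(members)
    return proportion(treated) - proportion(control)


# Hypothetical usage: a fixed allocation so the result is reproducible.
allocation = {1: "treatment", 2: "treatment", 3: "control", 4: "control"}
outcomes = {1: 1, 2: 1, 3: 0, 4: 1}
print(risk_difference(allocation, outcomes))  # 0.5
print(randomize(range(1, 9), seed=42))
```

Only the baseline arm enters this comparison, which is exactly the simplification the following slides question.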
E.g. an RCT evaluating the efficacy of a chemotherapy drug:
Baseline ➜ Intervention (different therapies, control) ➜ Measurement(s) ➜ Analysis
Between baseline and measurement: second- or third-line therapies, dynamic processes, many other factors
A commonly neglected problem: statistical analysis typically compares only the effectiveness of the different interventions assigned at baseline.
It is taken for granted that the baseline intervention is the only factor that varies and that subsequent interventions (second- or third-line therapies) do not affect the outcome ➜ usually not the case.
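To illustrate the problem described above, the sketch below contrasts a baseline-only analysis with one that also groups patients by whether they later received a second-line therapy. The records and the field names `arm`, `second_line` and `responded` are hypothetical and were chosen so that the effect is easy to see.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def response_rates(records: List[dict], keys: Sequence[str]) -> Dict[Tuple, float]:
    """Response rate for each combination of values of the given fields."""
    counts = defaultdict(lambda: [0, 0])  # key -> [responders, total]
    for rec in records:
        key = tuple(rec[k] for k in keys)
        counts[key][0] += rec["responded"]
        counts[key][1] += 1
    return {key: resp / total for key, (resp, total) in counts.items()}


# Hypothetical records: baseline arm, later second-line therapy, outcome.
records = [
    {"arm": "A", "second_line": True, "responded": 1},
    {"arm": "A", "second_line": True, "responded": 1},
    {"arm": "A", "second_line": True, "responded": 1},
    {"arm": "A", "second_line": False, "responded": 0},
    {"arm": "B", "second_line": True, "responded": 1},
    {"arm": "B", "second_line": False, "responded": 0},
    {"arm": "B", "second_line": False, "responded": 0},
    {"arm": "B", "second_line": False, "responded": 0},
]

baseline_only = response_rates(records, ["arm"])
by_arm_and_second_line = response_rates(records, ["arm", "second_line"])
print(baseline_only)           # {('A',): 0.75, ('B',): 0.25}
print(by_arm_and_second_line)  # {('A', True): 1.0, ('A', False): 0.0, ('B', True): 1.0, ('B', False): 0.0}
```

In this made-up data the baseline-only view favors arm A (0.75 vs 0.25), yet within each second-line group the arms have identical response rates, so the apparent difference tracks the later therapy rather than the baseline one. Conditioning on a post-baseline variable has its own pitfalls, so this shows the problem rather than a recommended correction.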
Current statistical analyses focus on the relationship between the baseline intervention and a specific outcome:
Intervention ➜ Endpoint/outcome
Such a point-to-point relationship may not provide the whole picture.
Real World Research Data
Demographics
Billing data and health insurance claims
EHRs
Laboratory results
Registries for chronic and infectious diseases
Procedures, surgery and clinical outcomes
Big Data augments RCTs:
Addresses the limitations of RCTs
Provides resources and methodologies that can supplement RCTs in clinical research
Risk factor evaluation
Example question: Is urine output on ICU entry associated with mortality?
Required data resolution: high (other risk factors should be provided)
Notes: multivariate models, stratified analysis and propensity score analysis can be employed

Effectiveness of intervention
Example question: Will drug A improve the outcome of patients with septic shock?
Required data resolution: high (including a large number of confounding factors)
Notes: the intervention may be given to patients with different conditions; these conditions should be controlled for to avoid "selective treatment"

Prediction model
Example question: a prediction model for ICU delirium
Required data resolution: moderate (general description of risk factors)
Notes: the predictive value of the whole model is emphasized, rather than that of a single risk factor

Epidemiological study
Example question: What are the incidence and prevalence rates of catheter-related bloodstream infection in the ICU?
Required data resolution: low
Notes: a simple description is enough; no risk factor adjustment is required

Implementation of healthcare policy
Example question: Is the policy of screening for and controlling hypertension effective in lowering the cardiovascular event rate?
Required data resolution: low
Notes: no complex clinical data are required
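For the risk factor evaluation study type above, one common form of stratified analysis is the Mantel–Haenszel pooled odds ratio, which combines 2×2 tables (risk factor by outcome) across strata of a confounder: OR_MH = Σ(a·d/n) / Σ(b·c/n). The sketch assumes a binary risk factor (for instance, low versus normal urine output on ICU entry), binary mortality and a single severity confounder; all counts are hypothetical.

```python
from typing import NamedTuple, Sequence


class Stratum(NamedTuple):
    """2x2 counts (risk factor by mortality) within one stratum of a confounder."""
    exposed_died: int        # a
    exposed_survived: int    # b
    unexposed_died: int      # c
    unexposed_survived: int  # d


def crude_odds_ratio(strata: Sequence[Stratum]) -> float:
    """Odds ratio after collapsing all strata into a single 2x2 table."""
    a = sum(s.exposed_died for s in strata)
    b = sum(s.exposed_survived for s in strata)
    c = sum(s.unexposed_died for s in strata)
    d = sum(s.unexposed_survived for s in strata)
    return (a * d) / (b * c)


def mantel_haenszel_odds_ratio(strata: Sequence[Stratum]) -> float:
    """Mantel-Haenszel pooled odds ratio: sum(a*d/n) / sum(b*c/n) over strata."""
    numerator = 0.0
    denominator = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n == 0:
            continue  # an empty stratum carries no information
        numerator += a * d / n
        denominator += b * c / n
    if denominator == 0:
        raise ValueError("odds ratio undefined: sum of b*c/n is zero")
    return numerator / denominator


# Hypothetical counts stratified by illness severity (the confounder).
strata = [
    Stratum(exposed_died=40, exposed_survived=40, unexposed_died=10, unexposed_survived=10),  # severe
    Stratum(exposed_died=5, exposed_survived=15, unexposed_died=20, unexposed_survived=60),   # mild
]

print(round(crude_odds_ratio(strata), 2))  # 1.91
print(mantel_haenszel_odds_ratio(strata))  # 1.0
```

Each stratum here has an odds ratio of exactly 1, so the crude odds ratio of about 1.91 reflects confounding by severity (exposed patients are concentrated in the severe stratum); the stratified estimate removes it. Propensity score analysis, also listed above, targets the same problem by a different route and is not shown.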