COMP90016 2023 09 Variant Consequences


Computational Genomics

Lecture 9
Functional consequences of genetic variants
Dr Gayle Philip

Before watching this lecture, make sure you are familiar with:
• Lecture 1: Intro & Genomics I
• Lecture 2: Genomics II
• Lecture 3: Sequencing technologies
• Lecture 4: Intro to computing
• Lecture 6: Sequence alignment
• Lecture 7: Variant calling I
• Lecture 8: Variant calling II
Today:
• Lecture 9: Variant consequences
About me

• Academic Specialist at Melbourne Bioinformatics

• BSc Biotechnology (NUIM, Ireland)


• PhD in Phylogenetics (NUIM, Ireland) – Supervised by Prof James McInerney
• Postdoctoral researcher at NASA Astrobiology Institute (Honolulu, Hawaii)

• Research interests: Genomics, Metagenomics

Lecture 9 Summary
• Introduction to genetic variation
• Large scale (chromosomal)
• Small scale (SNPs, indels)
• Mutation context
• Ploidy and inheritance
• mtDNA
• Chromosomal level context: regulation, tRNA, miRNA, splicing
• Protein level context: functional domains, folding and interactions
• Computational methods to predict effects of mutation
• Protein folding, structure and function
• Computational methods to predict consequences
• A clinical example
• Example tool (VEP)
• Variant interpretation in a clinical setting
• Melbourne Genomics Health Alliance

Recap: Codon Table

The genetic code is considered redundant because a particular
amino acid can be encoded by more than one codon sequence.
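
The redundancy can be made concrete with a minimal sketch that builds the standard codon table and groups codons by the amino acid they encode. The names `CODON_TABLE` and `synonymous_codons` are illustrative choices, not part of the lecture; DNA codons (T rather than U) and `*` for stop are assumptions of this sketch.

```python
from collections import defaultdict
from itertools import product

# Standard genetic code in the conventional TCAG order:
# the first base varies slowest, the third base fastest.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def synonymous_codons(table):
    """Group codons by the amino acid (or '*' for stop) they encode."""
    groups = defaultdict(list)
    for codon, aa in table.items():
        groups[aa].append(codon)
    return dict(groups)


groups = synonymous_codons(CODON_TABLE)
print(len(CODON_TABLE))     # number of codons in the table
print(sorted(groups["L"]))  # all codons for leucine
print(groups["W"])          # tryptophan has a single codon
```

Amino acids with several codons (e.g. leucine) are the reason some single-base changes leave the protein sequence unchanged.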

Amino Acid Properties

https://www.labxchange.org/library/pathway/lx-pathway:573b19de-1dda-48de-b82c-7cf4c3524f3c/items/lx-pb:573b19de-1dda-48de-b82c-7cf4c3524f3c:html:4f257f77
Types of mutations: large scale
• Affect large chunks of the genome:
• Involve single chromosomes:
• Deletion (copy number variation)
• Duplication (copy number variation)
• Inversion
• Insertion
• Involve multiple chromosomes:
• Translocation
• Insertion
• Examples in disease:
• Translocation of ABL1 gene in chromosome
9 to chromosome 22 breakpoint cluster
region (BCR) gene – BCR-ABL gene: “always
on” tyrosine kinase signaling protein causing
chronic myeloid leukemia (CML)
• More than 36 CAG trinucleotide repeats in the
huntingtin gene cause Huntington’s disease. The
number of repeats is inversely correlated with
age of onset (more repeats, earlier onset).
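
As a rough illustration of the repeat-count idea, the sketch below finds the longest uninterrupted CAG run in a DNA string and compares it with the 36-repeat threshold from the slide. The function names and the toy sequence are hypothetical, and real repeat sizing from sequencing reads is considerably harder than this string search.

```python
import re


def longest_cag_run(sequence):
    """Length, in repeat units, of the longest uninterrupted CAG run."""
    runs = re.findall(r"(?:CAG)+", sequence.upper())
    return max((len(run) // 3 for run in runs), default=0)


def exceeds_threshold(sequence, threshold=36):
    """True if the longest CAG run is longer than `threshold` repeats."""
    return longest_cag_run(sequence) > threshold


# Toy sequence (not a real HTT allele): 40 CAG units flanked by other bases.
toy_allele = "ATG" + "CAG" * 40 + "CCGTAA"
print(longest_cag_run(toy_allele))
print(exceeds_threshold(toy_allele))
```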

Types of mutations: small scale

Point mutations:
• Single nucleotide change in the DNA
• SNV (single nucleotide variant): any single
nucleotide difference, including one seen in
only one individual
• SNP (single nucleotide polymorphism): an SNV
present in at least 1% of the population

Types of mutations: small scale

If in the coding genome:


• Insertions and deletions (indels) whose length
is not a multiple of 3 nucleotides:
frameshift mutations
• Insertions and deletions of whole codons
(multiples of 3 nucleotides): addition or
deletion of amino acids without shifting the
reading frame
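
The reading-frame rule can be sketched in a few lines: a coding indel shifts the frame unless the change in length is a multiple of three. The function name `classify_indel` is illustrative, and the inputs are assumed to be VCF-style REF and ALT allele strings.

```python
def classify_indel(ref, alt):
    """Classify a coding indel from VCF-style REF and ALT alleles."""
    length_change = len(alt) - len(ref)
    if length_change == 0:
        return "not an indel"
    kind = "insertion" if length_change > 0 else "deletion"
    # Only length changes that are multiples of 3 keep the reading frame.
    if abs(length_change) % 3 == 0:
        return f"in-frame {kind}"
    return f"frameshift {kind}"


print(classify_indel("A", "AT"))       # one base inserted
print(classify_indel("ATCG", "A"))     # three bases deleted
print(classify_indel("A", "ACCCGGG"))  # six bases inserted
```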

Definitions
• Loss of function (LoF) variants reduce the function of the resulting gene product, e.g. a frameshift or premature stop codon is likely to result in a nonfunctional allele.
• Gain of function (GoF) variants increase function, e.g. in cancer, cells divide more rapidly
• Somatic mutations arise in any cell of the body except the germ cells (gametes); non-heritable
• Germline mutations arise in the germ cells (gametes); heritable

Phenotype or trait is the observable/measurable characteristic of an organism, e.g. diabetes
• Influenced by genetics, epigenetics and environment
Genotype describes the genetic makeup of an organism

An allele is a variant form of a gene, inherited from a parent. Alleles can be:
• Wild type: the most common allele, expressing the most common phenotype in a population.
• Dominant: the allele's effect is observed irrespective of the other allele in a diploid genomic context
• Recessive: the allele's effect is only seen if a diploid cell has two copies of the same allele.
• Codominant: the effect of both alleles is detectable
If both inherited alleles are the same, they are termed homozygous; if they are different (e.g. one dominant and one recessive), they are heterozygous.
Case Study: Mila
• At 6 years old, and after progressive regression of her eyesight,
coordination and speech, Mila was diagnosed with Batten’s disease.
• Batten’s disease is an autosomal recessive disorder which is always fatal
due to progressive neurodegeneration from toxic buildup of lipids and
proteins in the brain.
• There are >12 variants which cause disease, and very few treatment
options.
• Initial genetic testing identified that Mila carried a recessive disease-causing
CLN7 variant from her father, but an apparently normal copy from her mother.
• How is Mila expressing disease if she is heterozygous?

Context is important!

• Coding/Non-coding?
• Specific cellular localization?
• Mitochondrial DNA (mtDNA)?
• Promoter/enhancer region?
• Splicing region?
• Important protein domains?
• Important interaction sites?
• Conserved protein regions?

Chromosomal context: Ploidy
• Ploidy refers to the number of complete sets of
chromosomes in a cell.
• Humans have diploid cells (2 copies): one from each parent
• Similar, but not identical as they contain genetic variation
• In bioinformatics we represent this as two strings, or one
string + a set of differences
• Ploidy has important influence on consequence of a
mutation.
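
The "one string + a set of differences" representation can be sketched as follows. The reference string, the positions and the names `apply_snvs` and `genotype_at` are all made up for illustration, and only single-base substitutions are handled.

```python
# Toy reference sequence (hypothetical, 0-based coordinates).
REFERENCE = "ACGTACGTAC"

# Each haplotype is stored as {position: alternate base} relative to REFERENCE.
maternal_diffs = {2: "T"}
paternal_diffs = {2: "T", 7: "A"}


def apply_snvs(reference, diffs):
    """Rebuild a full haplotype string from the reference and its SNVs."""
    bases = list(reference)
    for pos, alt in diffs.items():
        bases[pos] = alt
    return "".join(bases)


def genotype_at(pos, reference, hap1, hap2):
    """Return both alleles at a position and whether they match."""
    allele1 = hap1.get(pos, reference[pos])
    allele2 = hap2.get(pos, reference[pos])
    zygosity = "homozygous" if allele1 == allele2 else "heterozygous"
    return allele1, allele2, zygosity


print(apply_snvs(REFERENCE, maternal_diffs))
print(apply_snvs(REFERENCE, paternal_diffs))
print(genotype_at(7, REFERENCE, maternal_diffs, paternal_diffs))
```

Storing only the differences is compact because two human haplotypes are nearly identical; the two-string form is simply what `apply_snvs` reconstructs.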

Inheritance
• The trait expressed depends
on whether it is dominant (B)
or recessive (b).
• If a disease follows autosomal
dominant inheritance, the
individual needs only 1
mutated copy (allele) to
manifest disease, and each child
of an affected heterozygous
parent has a 50% chance of
inheriting the mutated allele.
E.g. one mutated copy of BRCA1
or BRCA2 greatly increases the
risk of breast cancer
• If a disease follows autosomal
recessive inheritance, the
individual needs 2 mutated
copies (alleles) to manifest
disease. E.g. both copies of
CFTR need to be mutated to
cause cystic fibrosis.
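
The inheritance probabilities can be checked with a small Punnett-square sketch for a single gene. Genotypes are written as two-letter strings (for example "Bb"), and the function names are illustrative. The sketch assumes full penetrance and independent, equally likely transmission of each parental allele.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def offspring_genotypes(parent1, parent2):
    """Return {genotype: probability} for a single-gene cross."""
    combos = Counter(
        "".join(sorted(a + b)) for a, b in product(parent1, parent2)
    )
    total = sum(combos.values())
    return {genotype: Fraction(n, total) for genotype, n in combos.items()}


def probability_affected(genotype_probs, disease_allele, mode):
    """Probability of disease; mode is 'dominant' or 'recessive'."""
    copies_needed = 1 if mode == "dominant" else 2
    return sum(
        (p for g, p in genotype_probs.items()
         if g.count(disease_allele) >= copies_needed),
        Fraction(0),
    )


# Dominant disease allele D: affected heterozygote x unaffected parent.
print(probability_affected(offspring_genotypes("Dd", "dd"), "D", "dominant"))
# Recessive disease allele a: two unaffected carriers.
print(probability_affected(offspring_genotypes("Aa", "Aa"), "a", "recessive"))
```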
Chromosomal context: mtDNA

• Human mitochondrial genome:


• 13 proteins, 22 tRNAs, 2 rRNAs specific to the
mitochondrion
• Important for energy production in the cell
• Maternally inherited – mutations cause mitochondrial disease
• Each mitochondrion carries several copies of mtDNA, so a cell holds hundreds to thousands of copies, depending on cell type
Chromosomal context: Regulation
• The non-coding genome makes up over 98% of human DNA
• Still functionally important:
• https://www.encodeproject.org/
• Important for regulation of transcription:
• Cis- (nearby) or Trans- (distant) regulatory elements
• Promoter regions upstream of coding regions
• Enhancer regions recruit proteins to regulate transcription
levels
• Important for regulation of translation:
• tRNA
• Mutations in non-coding regions can affect coding regions
and resultant protein function

Chromosomal context: tRNA/miRNA

Transfer RNA (tRNA)
• Function: physical link between mRNA and amino acid during translation in the ribosome
• Mutational consequences: affect translation, prevent essential tRNA base modifications. E.g. > 200 mitochondrial tRNA mutations linked to disease

microRNA (miRNA)
• Function: post-transcriptional regulation of gene expression, e.g. gene silencing
• Mutational consequences: improper gene silencing. E.g. chronic lymphocytic leukemia
Chromosomal context: gene splicing
• Gene splicing is important because it permits
combination of different exons within a gene prior
to translation, creating the full protein, or partial
protein combinations of different domains
(alternative splicing).
• Specific gene regions encode for splicing:
• at exon/intron junction
• within introns (alternative splicing)
• within exons (for exon recognition)
• Regulated by:
• Exonic splicing Enhancers (ESE) bind to positive splicing
regulators e.g. SR
• Exonic splicing Silencers (ESS) bind to negative splicing
regulators e.g. hnRNP
• Intronic splicing enhancers (ISE) and silencers (ISS) also
recruit splicing regulatory complexes
https://onlinelibrary.wiley.com/doi/full/10.1002/path.2649
• Mutations at these sites can cause disease:
• E.g. Riley-Day syndrome (hereditary sensory and
autonomic neuropathy type III): a mutation in the donor
splice site of intron 20 of the IKBKAP gene causes exon 20
skipping, which introduces a premature termination codon
and triggers nonsense-mediated decay of the IKBKAP mRNA.
https://www.pnas.org/content/112/9/2637
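
A minimal sketch of a splice-site check at the exon/intron junction, assuming the common GT...AG rule (most human introns begin with GT and end with AG on the coding strand). The function names and coordinates are hypothetical; real tools also score the wider splice region and the enhancer/silencer motifs listed on the slide.

```python
def canonical_splice_sites(intron):
    """Check the GT...AG rule for an intron given as a forward-strand DNA string."""
    intron = intron.upper()
    return {
        "donor_GT": intron[:2] == "GT",
        "acceptor_AG": intron[-2:] == "AG",
    }


def splice_site_hit(intron_start, intron_end, variant_pos):
    """Locate a variant relative to an intron (1-based, inclusive coordinates)."""
    if intron_start <= variant_pos <= intron_start + 1:
        return "donor site (first two intron bases)"
    if intron_end - 1 <= variant_pos <= intron_end:
        return "acceptor site (last two intron bases)"
    if intron_start < variant_pos < intron_end:
        return "elsewhere in intron"
    return "outside intron"


print(canonical_splice_sites("GTAAGT" + "C" * 10 + "TTTCAG"))
print(splice_site_hit(100, 200, 101))
print(splice_site_hit(100, 200, 150))
```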
Variant within or near a gene
The terms in the Ensembl table of predicted consequences (linked below) are shown in order of severity (more severe to less severe).

https://m.ensembl.org/info/genome/variation/prediction/predicted_data.html
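
When a variant overlaps several transcripts it can receive several consequence terms, and a severity ordering lets you pick one to report. The sketch below uses only a subset of Sequence Ontology terms, listed in what is assumed to be the same relative order as the Ensembl table; check the linked page for the full, current list. The name `most_severe` is illustrative.

```python
# Subset of consequence terms, from more severe to less severe.
# Assumption: relative order follows the Ensembl table; the list is not complete.
SEVERITY_ORDER = [
    "transcript_ablation",
    "splice_acceptor_variant",
    "splice_donor_variant",
    "stop_gained",
    "frameshift_variant",
    "stop_lost",
    "start_lost",
    "inframe_insertion",
    "inframe_deletion",
    "missense_variant",
    "splice_region_variant",
    "synonymous_variant",
    "5_prime_UTR_variant",
    "3_prime_UTR_variant",
    "intron_variant",
    "upstream_gene_variant",
    "downstream_gene_variant",
    "intergenic_variant",
]
RANK = {term: index for index, term in enumerate(SEVERITY_ORDER)}


def most_severe(consequences):
    """Return the most severe known term, or None if none are recognised."""
    known = [term for term in consequences if term in RANK]
    if not known:
        return None
    return min(known, key=RANK.__getitem__)


print(most_severe(["intron_variant", "missense_variant", "synonymous_variant"]))
```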
Case Study: Mila (Cont.)
• Whole genome testing revealed the CLN7 gene inherited from Mila’s
mother was compromised, as a retrotransposon (non-coding DNA) had
been inserted close to it.
• A specific therapy was developed in record time (less than a year),
based on an antisense oligonucleotide therapy previously
developed for spinal muscular atrophy. This therapy works by
counteracting the effects of the faulty RNA so that the correct protein is
produced, which in theory would give Mila one working copy of CLN7 and
treat her condition, i.e. a drug engineered with a piece of genetic code
to patch Mila’s unique mutation
• Although it halted her rapidly progressing condition and later improved
her quality of life, sadly Mila died at 10 years old.
• Both the coding and the non-coding genomes have implications in disease
https://www.statnews.com/2018/10/22/a-tailor-made-therapy-may-have-halted-a-rare-disease/
Structural context: Protein Folding

Complex 3D shapes emerge from a string of amino acids.

https://www.deepmind.com/blog/alphafold-using-ai-for-scientific-discovery-2020
Structural context: Protein structure
• For a protein to carry out its function, it needs to be in the right
subcellular location

• Recognition sites for other proteins permit sub-cellular localization, e.g.
nuclear localization signals (NLS) and nuclear export signals (NES) ‘tag’ the
protein for accurate localization
• Complex machinery of protein interactions permits specific
localization and functions, e.g. a mutated targeting signal can prevent
protein localization to the ER, so the protein cannot fold properly

• Protein sequence determines protein folding, and specific domains
have specific functions. Mutations can have different
consequences:
• Synonymous: no change to the amino acid sequence, but can slow down
translation because of codon usage bias
• Missense: one amino acid is replaced by another; implicated in many diseases e.g. cancer
• Nonsense: introduces a premature stop codon, truncating the protein, which is often non-functional
• Indels: can change protein structure to different extents
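
For single-base substitutions inside a codon, the first three categories can be told apart by translating the reference and altered codons. This is a minimal sketch using the standard genetic code; `classify_codon_change` is an illustrative name, and the "stop lost" label is an addition for completeness rather than a category from the slide.

```python
from itertools import product

# Standard genetic code in TCAG order ('*' marks a stop codon).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_codon_change(ref_codon, alt_codon):
    """Classify a codon substitution by its effect on the encoded amino acid."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "nonsense"
    if ref_aa == "*":
        return "stop lost"
    return "missense"


print(classify_codon_change("CTT", "CTC"))  # Leu -> Leu
print(classify_codon_change("GAG", "GTG"))  # Glu -> Val
print(classify_codon_change("TGG", "TGA"))  # Trp -> stop
```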
Protein domain conservation

• Protein sequence
determines structure so
similar (conserved)
sequences tend to fold into
similar structures

• However, in some cases,
distinct sequences may still
fold into similarly shaped
domains, which implies the
conservation of an
important, localized
function.

Protein structure over sequence
• Structural similarity that
spans at least one complete
domain likely reflects
homology

• A new protein (DUF2815,
purple) in Enterobacter
cancerogenus
bacteriophage Enc34 had
unknown function
• Structural alignment to
other phage-related
proteins from B. cereus
(green), E. faecalis
(orange) and T7 gp2.5
(yellow) shows conserved
ssDNA binding motifs
(black box)
• Specific ssDNA binding
was confirmed
experimentally using
oligonucleotide
[Figure labels: Unknown Enc34 ORF6 (purple); DNA]

Cernooka, E., Rumnieks, J., Tars, K. et al. Structural Basis for DNA Recognition of a Single-stranded DNA-binding Protein from Enterobacter Phage Enc34. Sci Rep 7, 15529 (2017). https://doi.org/10.1038/s41598-017-15774-y
Structural context: stability
• During folding, a protein seeks to achieve a global energy
minimum: the point at which the protein is most stable
• Proteins are naturally dynamic in nature:
• Need to coordinate different biochemical processes
• Any mutation to a protein (e.g. missense, nonsense,
indels) may affect protein stability, and change this
minimum.
• Both an increase and a decrease in stability can be
detrimental:
• Increase: reduces overall dynamics required for certain
protein functions
• Decrease: disturbs the 3-dimensional shape of localized
domains, affecting functions
• Changes in special amino acids: Glycine and Proline –
change backbone orientation, can affect conformation
and stability
Image generated using DynaMut: http://biosig.unimelb.edu.au/dynamut/
• If mutation in important domain, a disruption in stability
is usually accompanied by a disruption in other protein
interactions
Structural context: interactions
• Mutations within the protein structure can
affect natural molecular recognition patterns
carried out by the wild-type protein
• These patterns are essential for carrying out
specific functions within biological pathways
• At the molecular level, even subtle mutations
like missense mutations can affect affinities
between molecules:
• Changes in amino acid charge (e.g. +ve Lysine
to -ve Aspartic acid)
• Change in amino acid nature (e.g. from polar
to hydrophobic)
• Changes in amino acid size (e.g. Glycine to
Phenylalanine)
• Loss of special amino acids e.g. Cysteine
participates in disulfide bridge formation
• Diseases or disease phenomena like drug
resistance are heavily influenced by different
interaction patterns at the molecular level.
[Figure panels: Interactions with ligands; Interactions with other proteins; Interactions with nucleic acids. All images generated using Arpeggio: http://biosig.unimelb.edu.au/arpeggioweb/]
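
The property changes listed on this slide can be sketched as simple lookups on one-letter amino acid codes. The groupings below are one common textbook simplification and differ between sources: charge is taken at roughly physiological pH with histidine treated as neutral, the hydrophobic set is an assumption, and size is approximated by the number of non-hydrogen side-chain atoms. All names are illustrative.

```python
# Assumed simplifications: formal side-chain charge near physiological pH.
CHARGE = {"K": +1, "R": +1, "D": -1, "E": -1}
# Assumed hydrophobic set; other sources draw this boundary differently.
HYDROPHOBIC = set("AVILMFW")
# Number of non-hydrogen atoms in each side chain (a crude size proxy).
SIDE_CHAIN_HEAVY_ATOMS = {
    "G": 0, "A": 1, "S": 2, "C": 2, "V": 3, "T": 3, "P": 3,
    "I": 4, "L": 4, "D": 4, "N": 4, "M": 4, "E": 5, "Q": 5,
    "K": 5, "H": 6, "F": 7, "R": 7, "Y": 8, "W": 10,
}


def describe_substitution(ref, alt):
    """List the property changes caused by replacing residue ref with alt."""
    changes = []
    ref_charge, alt_charge = CHARGE.get(ref, 0), CHARGE.get(alt, 0)
    if ref_charge != alt_charge:
        changes.append(f"charge {ref_charge:+d} -> {alt_charge:+d}")
    if (ref in HYDROPHOBIC) != (alt in HYDROPHOBIC):
        if alt in HYDROPHOBIC:
            changes.append("not hydrophobic -> hydrophobic")
        else:
            changes.append("hydrophobic -> not hydrophobic")
    size_delta = SIDE_CHAIN_HEAVY_ATOMS[alt] - SIDE_CHAIN_HEAVY_ATOMS[ref]
    if size_delta:
        changes.append(f"side-chain heavy atoms {size_delta:+d}")
    if ref != alt and "C" in (ref, alt):
        changes.append("cysteine gained or lost (disulfide bridge potential)")
    return changes


print(describe_substitution("K", "D"))  # lysine -> aspartic acid
print(describe_substitution("G", "F"))  # glycine -> phenylalanine
print(describe_substitution("C", "S"))  # cysteine -> serine
```

A lookup like this only flags that a property changed; whether the change matters depends on where the residue sits in the structure, which is what the interaction tools later in the lecture try to estimate.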
Computational Resources
• Variant Effect Predictor (VEP): https://asia.ensembl.org/info/docs/tools/vep/index.html
• A freely available, open-source tool for the annotation and filtering of genomic variants. It predicts the molecular
consequences of variants.
• AlphaFold2: https://alphafold.ebi.ac.uk/
• A database of protein structures predicted by AlphaFold2, covering over 900,000 structures from the human genome and other
organisms
• ClinVar: https://www.ncbi.nlm.nih.gov/clinvar/
• A database of gene mutations associated with disease, either observed in the clinic or inferred through sequencing
techniques.
• Others include: TCGA: https://www.cancer.gov/about-nci/organization/ccg/research/structural-genomics/tcga
• ENCODE: https://www.encodeproject.org/
• A database summarizing functional experiments on the non-coding genome
• gnomAD: https://gnomad.broadinstitute.org/
• A database of population variation observed from large-scale genome and exome studies on healthy individuals and even
those experiencing disease
• RCSB Protein Data Bank: https://www.rcsb.org/
• A database of experimentally determined structures (e.g. from X-ray crystallography, NMR and cryo-EM)
• Others include: PDBe: https://www.ebi.ac.uk/pdbe/node/1
• Uniprot: https://www.uniprot.org/
• A database summarizing gene functions, subcellular localization, role in disease, structures and important domains and
interactions
Computational Tools:
• Tools relying on Conservation (as a measure of function) between different organisms:
• ConSurf: https://consurf.tau.ac.il/ (rate of evolution of every wild type residue)
• Polyphen-2: http://genetics.bwh.harvard.edu/pph2/ (functional prediction of human missense mutations)
• PROVEAN protein: http://provean.jcvi.org/seq_submit.php (functional prediction of missense and indels in
different organisms)
• SIFT: https://sift.bii.a-star.edu.sg/ (functional prediction of missense mutations in different organisms)
• SNAP2: https://rostlab.org/services/snap2web/ (functional prediction of missense mutations in different
organisms)

• Tools looking at gene deleteriousness within the human population:


• MTR (Missense Tolerance Ratio): http://biosig.unimelb.edu.au/mtr-viewer/ - calculating tolerance of a
specific gene region to accumulation of missense mutation
• MTR3D: http://biosig.unimelb.edu.au/mtr3d/ - tolerance of a specific protein 3-dimensional locus to
accumulation of missense mutation
Computational Tools: (cont.)
• Tools predicting Protein stability effects:
• DynaMut2: http://biosig.unimelb.edu.au/dynamut2/ (changes in stability and flexibility upon missense
mutations, multiple mutations also calculated)
• mCSM-Stability: http://biosig.unimelb.edu.au/mcsm/ (changes in stability upon missense mutation)
• mCSM-Membrane: http://biosig.unimelb.edu.au/mcsm_membrane/ (changes in stability in
transmembrane proteins)

• Tools predicting Protein interactions:


• mmCSM-AB: http://biosig.unimelb.edu.au/mmcsm_ab/ (changes in antibody-antigen binding affinity upon
missense mutations, multiple mutations also calculated)
• mmCSM-lig: http://biosig.unimelb.edu.au/mmcsm_lig/ (changes in protein-ligand affinity upon missense
mutations, multiple mutations also calculated)
• mmCSM-NA: http://biosig.unimelb.edu.au/mmcsm_na/ (changes in protein-nucleic acid affinity upon
missense mutations, multiple mutations also calculated)
• mmCSM-PPI: http://biosig.unimelb.edu.au/mmcsm_ppi/ (changes in protein-protein affinity upon
missense mutations, multiple mutations also calculated)
Functional consequences applications
• Humans
• Genetic disease diagnosis
• Disease risk: screening programs/patient monitoring

• Agriculture
• Cross-breeding methods – improvement of crop yield
• Identifying pesticide resistance mechanisms

• Micro-organisms
• Understanding virulence
• Understanding pathogen evolution
• Understanding drug resistance

Tool Example: Variant annotation (VEP)
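
When VEP writes VCF output, its annotations are placed in a `CSQ` entry of the INFO column: one pipe-separated record per transcript, records separated by commas, and the field names listed after "Format:" in the VCF header. The sketch below parses such an entry. The header line is shortened to four fields and the gene symbols `GENE1` and `GENE2` are hypothetical; a real header lists many more fields, and the function names are illustrative.

```python
def parse_csq_format(header_line):
    """Extract the CSQ field names from the ##INFO=<ID=CSQ,...> header line."""
    fmt = header_line.split("Format: ", 1)[1]
    return fmt.rstrip('">').strip().split("|")


def parse_csq(info_field, fields):
    """Turn the CSQ entry of a VCF INFO column into one dict per transcript."""
    csq_value = next(
        item[len("CSQ="):]
        for item in info_field.split(";")
        if item.startswith("CSQ=")
    )
    records = []
    for entry in csq_value.split(","):
        record = dict(zip(fields, entry.split("|")))
        # Several consequence terms for one transcript are joined with '&'.
        record["Consequence"] = record["Consequence"].split("&")
        records.append(record)
    return records


# Shortened example header and INFO value (hypothetical data).
header = (
    '##INFO=<ID=CSQ,Number=.,Type=String,Description="Consequence annotations '
    'from Ensembl VEP. Format: Allele|Consequence|IMPACT|SYMBOL">'
)
info = (
    "DP=35;CSQ=T|missense_variant|MODERATE|GENE1,"
    "T|intron_variant&non_coding_transcript_variant|MODIFIER|GENE2"
)

fields = parse_csq_format(header)
for record in parse_csq(info, fields):
    print(record["SYMBOL"], record["IMPACT"], record["Consequence"])
```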

Clinical Application: Variant Interpretation

• There is an intricate path from genetic variant identification to clinical
interpretation (also called ‘variant curation’). This is the complex process of
determining which variant is causing a patient’s condition, or, in the case of
cancer, is driving cancer growth.

• VEP is only one part of this step of classifying/prioritising a variant.

• All variants are assessed using a combination of population, genetic,
computational, literature and clinical data to gather evidence for prioritisation,
and the weight of the different lines of evidence leads to the final clinical
interpretation.
Clinical Applications: Variant Interpretation

https://doi.org/10.1016/j.atg.2014.06.001
Clinical Application: Variant Interpretation
• For inherited conditions (germline studies), the curated variants are classified
on a five-point scale (benign, likely benign, uncertain significance, likely
pathogenic, pathogenic) to indicate the likelihood that a particular variant is
associated with disease. Benign variants are not the cause of a condition, while
pathogenic variants are considered to be the cause of the condition.

• For cancer (somatic studies), the goal of interpretation of variants is to
guide/change patient management. Experts will scan the available evidence to
see if there are effective treatment options for these variants that can help
guide a patient’s care and improve outcomes.
Clinical Application: Variant Interpretation

• Variant curation is an internationally acknowledged ‘bottle-neck’ in the genomic
sequencing process, as it is still largely a task for expert human minds (not
computers).
• A multidisciplinary team, including clinical geneticists, medical scientists,
bioinformaticians, genetic counsellors and other medical specialists, needs to
work together to interpret and agree on a patient’s result.
• As curation is so time consuming, having direct access to all available data for a
variant, including previous observations and the opinion from an expert in the
field, is critical. To this end, expert-curated databases storing genetic variants
and their classifications are an essential resource for any genome diagnostic lab.
Clinical Application: MGHA
• Melbourne Genomics Health Alliance (MGHA)

• Working with the Victorian Government to embed genomics in our health
system.

https://www.melbournegenomics.org.au/about-us/who-we-are
Clinical Application: GenoVic

• Melbourne Genomics has built GenoVic, a clinical-grade platform for
laboratories that underpins the genomic testing process

• It supports the integration of genomic medicine into routine clinical care

• GenoVic takes data off the sequencing machine, through analysis and
interpretation, to a clinical report that informs a patient’s care, and provides
secure data storage.

https://www.melbournegenomics.org.au/genovic
Lecture Summary
• Introduction to genetic variation
• Large scale (chromosomal)
• Small scale (SNPs, indels)
• Mutation context
• Ploidy and inheritance
• mtDNA
• Chromosomal level context: regulation, tRNA, miRNA, splicing
• Protein level context: functional domains, folding and interactions
• Computational methods to predict effects of mutation
• Protein folding, structure and function
• Computational methods to predict consequences
• A clinical example
• Example tool (VEP)
• Variant interpretation in a clinical setting
• Melbourne Genomics Health Alliance
