
Population Approaches to Detecting and Genotyping Copy Number Variation

Lachlan Coin
July 2010
Outline
• Population-haplotype approach to CNV
detection and genotyping

• Application to SNP and CGH data

• Application to NGS sequence data


cnvHap approach to CNV discovery
and genotyping

Coin et al., Nature Methods 7, 541-546 (2010)


Example of trained model
cnvHap models haploid CN transitions
• Specify a per-base global transition rate matrix

                        copy number to
                        0     1     2     3     4
  copy number from  0   q00   q10   ….
                    1   …
                    2
                    3
                    4

• Rate matrix is multiplied by a position-specific scalar rate


• Values are trained using EM, following the approach of
Klosterman et al. used in Xrate for finding substitution
rates
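As a rough illustration of how a global rate matrix and a position-specific scalar could combine, the sketch below turns them into copy-number transition probabilities between two positions. Several things here are assumptions, not taken from the slides: the matrix exponential form P = exp(Q × rate × distance), the rule that only single-step copy-number changes have non-zero rate, the value of STEP_RATE, and all function names.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_exp(m, terms=12, squarings=10):
    """Approximate exp(m) by scaling and squaring with a truncated Taylor series."""
    n = len(m)
    small = [[x / 2 ** squarings for x in row] for row in m]
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        term = [[x / k for x in row] for row in mat_mul(term, small)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result


def make_rate_matrix(off_diagonal):
    """Set each diagonal entry so that every row of the rate matrix sums to zero."""
    n = len(off_diagonal)
    q = [list(row) for row in off_diagonal]
    for i in range(n):
        q[i][i] = -sum(q[i][j] for j in range(n) if j != i)
    return q


def transition_probs(q, site_rate, distance_bp):
    """Transition probabilities over distance_bp bases (assumed form: exp(Q * rate * d))."""
    return mat_exp([[x * site_rate * distance_bp for x in row] for row in q])


# Hypothetical values: rows are "copy number from", columns "copy number to",
# and only changes of one copy are given a non-zero per-base rate.
STEP_RATE = 1e-6
Q = make_rate_matrix([[STEP_RATE if abs(i - j) == 1 else 0.0 for j in range(5)]
                      for i in range(5)])
P = transition_probs(Q, site_rate=1.0, distance_bp=1000)
P_hot = transition_probs(Q, site_rate=10.0, distance_bp=1000)
```

Under these assumptions, a larger position-specific rate (P_hot) lowers the probability of keeping the same copy number across the interval, which is one way a breakpoint hotspot could be represented.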
cnvHap joint model of CNV + SNP
haplotypes
Cluster positions modelled using a
linear model

\[
\begin{pmatrix} r_m(g) \\ r_m^2(g) \\ b_m(g) \\ b_m^2(g) \end{pmatrix}
= \beta
\begin{pmatrix}
f_0(g) = 1 \\
f_1(g) = \log(\mathrm{CN}(g)/2) \\
f_2(g) = \left(\log(\mathrm{CN}(g)/2)\right)^2 \\
f_3(g) = \mathrm{bfrac}(g) \\
f_4(g) = \mathrm{bfrac}(g)\,(1 - \mathrm{bfrac}(g)) \\
f_5(g) = \mathrm{bfrac}(g)\,(\mathrm{bfrac}(g) - 0.5)\,(\mathrm{bfrac}(g) - 1)
\end{pmatrix}
\]

Model fitted using ridge regression, carried out
at each iteration of the EM algorithm
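The feature functions f0 to f5 and a ridge fit of one row of β can be sketched as follows. This is a minimal illustration under stated assumptions: the per-genotype weights stand in for whatever the E-step supplies, the penalty value and all function names are hypothetical, and copy number 0 is rejected because log(CN/2) is undefined there (the slides do not say how that case is handled).

```python
import math


def cluster_features(cn, bfrac):
    """Return [f0, ..., f5] for a genotype with copy number cn and B-allele fraction bfrac."""
    if cn < 1:
        raise ValueError("log(CN/2) is undefined for CN = 0 in this sketch")
    lr = math.log(cn / 2)
    return [1.0, lr, lr ** 2, bfrac,
            bfrac * (1 - bfrac),
            bfrac * (bfrac - 0.5) * (bfrac - 1)]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def ridge_fit(rows, targets, weights, lam):
    """Weighted ridge regression: solve (X'WX + lam*I) beta = X'Wy."""
    p = len(rows[0])
    xtx = [[lam if i == j else 0.0 for j in range(p)] for i in range(p)]
    xty = [0.0] * p
    for x, y, w in zip(rows, targets, weights):
        for i in range(p):
            xty[i] += w * x[i] * y
            for j in range(p):
                xtx[i][j] += w * x[i] * x[j]
    return solve(xtx, xty)


# Hypothetical training set: every genotype with CN 1..4 and B-allele count 0..CN.
genotypes = [(cn, b / cn) for cn in range(1, 5) for b in range(cn + 1)]
X = [cluster_features(cn, bf) for cn, bf in genotypes]
y = [math.log(cn / 2) for cn, _ in genotypes]  # toy target, not real array data
beta_row = ridge_fit(X, y, weights=[1.0] * len(X), lam=1e-8)
predicted = sum(b * f for b, f in zip(beta_row, cluster_features(3, 1 / 3)))
```

In an EM setting the weights would typically be posterior genotype probabilities, so that uncertain calls contribute less to the fit; that detail is an assumption here.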
Using Illumina SNP arrays
Combined Illumina and Agilent arrays
Figure panels: Illumina and Agilent (three pairs)
Some CNVs exhibit shared structure
Improved CNV genotyping accuracy
Cumulative Frequency of Squared Pearson Correlation
A deletion at 16p11.2 in a patient with
‘extreme obesity’
Figure: log2 ratio (+1 to -3) across chromosome 16, 28.9 Mb to 30.7 Mb, with MLPA probes and segmental duplications marked

• Deletion size estimated by aCGH at 546-700 kb
• Flanked by segmental duplications (>99% sequence identity)
• Probably arises by non-allelic homologous recombination (NAHR), implying the deletion is 739 kb
• BMI = 29.2 kg/m² at age 7½
• Learning difficulties, delayed speech
RG Walters et al. Nature 463, 671-675 (2010) doi:10.1038/nature08727
16p11.2 deletions in obesity and
population cohorts
Cohort                                          Obese     Lean/Normal Weight
French child obesity case:control               4/643     0/530
British extreme early-onset obesity (SCOOP)     3/931     -
French adult obesity case:control               4/705     0/669
French bariatric surgery patients               2/141     -
Swedish discordant siblings                     2/159     0/140
Population cohorts (NFBC1966, CoLaus, EGPUT)    3/1592    1/6235

Obesity: P = 5.8 × 10⁻⁷, OR = 29.8 [3.9–225]
Morbid obesity: P = 6.4 × 10⁻⁸, OR = 43.0 [5.6–329]
Coverage affected by GC content
Regression model fit to correct for
GC bias
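A minimal sketch of a regression-based GC correction is shown below. The slides do not state the form of the regression, so a straight-line fit of window depth on GC fraction is assumed purely for illustration; the function names and the toy numbers are hypothetical.

```python
def fit_gc_model(gc_fractions, depths):
    """Ordinary least-squares line depth = intercept + slope * GC (assumed model form)."""
    n = len(gc_fractions)
    mean_gc = sum(gc_fractions) / n
    mean_depth = sum(depths) / n
    sxx = sum((g - mean_gc) ** 2 for g in gc_fractions)
    sxy = sum((g - mean_gc) * (d - mean_depth)
              for g, d in zip(gc_fractions, depths))
    slope = sxy / sxx
    return mean_depth - slope * mean_gc, slope


def normalise_depth(depth, gc_fraction, model):
    """Divide observed depth by the depth expected for this GC fraction."""
    intercept, slope = model
    return depth / (intercept + slope * gc_fraction)


# Toy windows whose depth rises linearly with GC content.
gcs = [0.30, 0.40, 0.50, 0.60]
depths = [20 + 40 * g for g in gcs]
model = fit_gc_model(gcs, depths)

# A window at GC = 0.5 with half the expected depth, as a heterozygous
# deletion might produce, normalises to about 0.5.
deleted = normalise_depth(20.0, 0.50, model)
```

After normalisation, windows with typical copy number sit near 1 regardless of GC content, so remaining departures are easier to attribute to copy number.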
Loess curves fit to remove residual
spatial variation of coverage
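One way to picture the loess step is a local linear fit with tricube weights along the chromosome, with coverage then divided by the fitted curve. The sketch below is a plain degree-1 loess without robustness iterations; the span value, the choice to divide instead of subtract, and the function names are assumptions, not details from the slides.

```python
import math


def loess(xs, ys, span=0.5):
    """Local linear regression with tricube weights, evaluated at each x in xs."""
    n = len(xs)
    k = max(3, int(math.ceil(span * n)))  # number of neighbours defining the bandwidth
    fitted = []
    for x0 in xs:
        dists = sorted(abs(x - x0) for x in xs)
        h = dists[min(k, n) - 1] or 1e-12  # bandwidth: distance to the k-th nearest point
        w = [(1 - min(1.0, abs(x - x0) / h) ** 3) ** 3 for x in xs]
        sw = sum(w)
        mx = sum(wi * x for wi, x in zip(w, xs)) / sw
        my = sum(wi * y for wi, y in zip(w, ys)) / sw
        sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
        sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (x0 - mx))
    return fitted


def remove_spatial_trend(positions, coverage, span=0.5):
    """Divide coverage by its loess fit so that slow spatial drift is flattened."""
    trend = loess(positions, coverage, span)
    return [c / t for c, t in zip(coverage, trend)]


# Toy example: coverage drifting linearly with position is flattened to 1.
positions = [float(i) for i in range(20)]
coverage = [2.0 * p + 1.0 for p in positions]
flattened = remove_spatial_trend(positions, coverage)
```

The span controls the trade-off: a wide span removes only slow drift, while a narrow span risks smoothing away real copy-number changes along with the artefact.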
Detecting CNVs with NGS data
Depth/haploid coverage

B-allele frequency
NGS versus CGH data
Figure panels: NGS data chrom1:350mb-351mb; CGH data chrom1:350mb-351mb
NGS vs CGH data
Haplotype structure of deletion
NGS amplification
Depth/coverage
With consistent break-points in
population
Polyploid phasing and imputation
Figure panels: imputation error rate; switch error rate
Conclusions
• Population-haplotype model enables joint
CNV discovery and genotyping using array
data
• Preliminary results indicate that it will also
help with NGS data
• Combining information from multiple
platforms improves sensitivity
• Imputation still works for ploidy > 2, but
phasing becomes more difficult
Acknowledgements
Evangelos Bellos
Shu-Yi Su
Robin Walters
Julian Asher
Alex Blakemore
Adam de Smith
Philippe Froguel
Julia El-Sayed Moustafa
David Balding (UCL)
Rob Sladek (McGill)
