Question

CNV Calling in a Region with Homozygous SNPs

0

11.2 years ago

romsen ▴ 70

I genotyped 100 samples for CNVs with TaqMan assays and found 6 samples with a heterozygous duplication. I used two independent assays that target the beginning and the end of the CNV, as annotated in DGV, so I'm "quite" sure that these duplications are real gains.

We also have SNP data (LRR and BAF values) from a genome-wide SNP array for this region, but it contains only 17 SNPs. That is not many, but it is sufficient for PennCNV to call CNVs. However, PennCNV reproduced the qPCR results in only 4 of the 6 samples.

Therefore I plotted the BAF and LRR values against SNP position and noticed several things:

The 4 samples in which both methods (qPCR and PennCNV) called a gain have many heterozygous SNPs (8 to 11 of 17). Their BAF values therefore cluster around 4 points (AAA, AAB, ABB, BBB), and the LRR values increase.

In the 2 samples in which qPCR showed a duplication but PennCNV did not, nearly all SNPs are homozygous (15 to 16 of 17), so their BAF values lie around 0 or 1 (AAA or BBB). The surprising part is that the LRR does not increase, although it should for a duplication.
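To make the BAF reasoning in these two observations concrete, here is a minimal sketch of the ideal BAF positions for two and three copies. It assumes BAF is simply the fraction of B alleles at a SNP, with no noise; the function name `expected_baf_clusters` is made up for illustration and is not part of PennCNV or BeadStudio.

```python
from fractions import Fraction
from itertools import combinations_with_replacement


def expected_baf_clusters(copy_number):
    """Map each unordered genotype at this copy number to its ideal BAF.

    Assumption: BAF = (number of B alleles) / (total copies), noise-free.
    """
    clusters = {}
    for genotype in combinations_with_replacement("AB", copy_number):
        label = "".join(genotype)
        clusters[label] = Fraction(label.count("B"), copy_number)
    return clusters


normal = expected_baf_clusters(2)  # AA, AB, BB
gain = expected_baf_clusters(3)    # AAA, AAB, ABB, BBB

for label, baf in gain.items():
    print(f"{label}: ideal BAF = {float(baf):.2f}")

# Homozygous genotypes sit at BAF 0 or 1 at either copy number, so on BAF
# alone they cannot separate a gain from the normal two-copy state.
homozygous_bafs_normal = {normal["AA"], normal["BB"]}
homozygous_bafs_gain = {gain["AAA"], gain["BBB"]}
```

In this idealized picture, only the heterozygous SNPs move (from 0.5 to about 0.33 or 0.67), so a sample with 15 to 16 homozygous SNPs out of 17 leaves the caller depending almost entirely on LRR.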

Now my question: could the non-increasing LRR values be due to the homozygous SNPs? The data was exported from BeadStudio, so I imagine that the internal LRR calculation [LRR = log2(Robserved/Rexpected)] could have failed because of threshold or interpolation errors. (More than 60% of the 100 samples carry mainly homozygous SNPs.)

(Rexpected is computed from linear interpolation of canonical genotype clusters (Peiffer et al. 2006))
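For reference, the LRR definition above can be sketched as follows. This is only an illustration of linear interpolation between canonical cluster centres, not BeadStudio's actual implementation: the cluster centres in `CLUSTERS` are made-up numbers, the function names are hypothetical, and clamping outside the AA..BB range is an assumption.

```python
import math


def interpolate_expected_r(theta, clusters):
    """Linearly interpolate R_expected at `theta` between cluster centres.

    `clusters` holds (theta, R) centres for AA, AB, BB, sorted by theta.
    Outside the AA..BB range the nearest centre's R is used (assumption).
    """
    (t_aa, r_aa), (t_ab, r_ab), (t_bb, r_bb) = clusters
    if theta <= t_aa:
        return r_aa
    if theta >= t_bb:
        return r_bb
    if theta < t_ab:
        return r_aa + (theta - t_aa) * (r_ab - r_aa) / (t_ab - t_aa)
    return r_ab + (theta - t_ab) * (r_bb - r_ab) / (t_bb - t_ab)


def log_r_ratio(r_observed, theta, clusters):
    """LRR = log2(R_observed / R_expected)."""
    return math.log2(r_observed / interpolate_expected_r(theta, clusters))


# Hypothetical canonical cluster centres (theta, R) for one SNP.
CLUSTERS = [(0.05, 1.0), (0.50, 1.2), (0.95, 1.0)]

# A normal AA sample sitting exactly on the AA cluster centre.
lrr_normal = log_r_ratio(1.0, 0.05, CLUSTERS)

# A homozygous AAA gain, assuming intensity scales as 3/2 (idealized).
lrr_gain_hom = log_r_ratio(1.5, 0.05, CLUSTERS)

print(f"normal AA: {lrr_normal:.3f}, AAA gain: {lrr_gain_hom:.3f}")
```

Under these idealized assumptions a homozygous SNP in a duplication still gets a positive LRR, because theta stays at the AA cluster while R rises. Real arrays often show a smaller shift than log2(3/2).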

Thanks

cnv snp • 2.9k views

ADD COMMENT • link updated 11.2 years ago by Matt Shirley 10k • written 11.2 years ago by romsen ▴ 70

score 1 · Answer 1 · 2013-03-15

1

11.2 years ago

Matt Shirley 10k

If you are calculating your genotype clusters from 17 SNPs, I would not trust any of your data. You should be using the entire array for cluster generation, and then subsetting the BAF and LRR values afterwards.
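A rough sketch of the "subset afterwards" step, assuming a tab-delimited whole-array export with `Name`, `Chr`, `Position`, `<sample>.Log R Ratio` and `<sample>.B Allele Freq` columns. The column names, the sample prefix `S1`, the coordinates, the heterozygosity thresholds and all values below are made up for illustration.

```python
import csv
import io
import statistics


def subset_region(rows, chrom, start, end):
    """Keep only the SNPs that fall inside chrom:start-end (inclusive)."""
    return [
        r for r in rows
        if r["Chr"] == chrom and start <= int(r["Position"]) <= end
    ]


def summarize(rows, lrr_col, baf_col, het_low=0.15, het_high=0.85):
    """Median LRR and count of SNPs whose BAF lies away from 0 and 1.

    The het_low/het_high cut-offs are arbitrary illustrative thresholds.
    """
    lrr = [float(r[lrr_col]) for r in rows]
    baf = [float(r[baf_col]) for r in rows]
    n_het = sum(het_low < b < het_high for b in baf)
    return {
        "n_snps": len(rows),
        "median_lrr": statistics.median(lrr),
        "n_het": n_het,
    }


# Toy stand-in for a whole-array export (invented values).
EXAMPLE = """Name\tChr\tPosition\tS1.Log R Ratio\tS1.B Allele Freq
rs1\t1\t100\t0.02\t0.00
rs2\t1\t200\t0.31\t0.34
rs3\t1\t300\t0.28\t1.00
rs4\t1\t400\t0.35\t0.66
rs5\t2\t150\t-0.01\t0.50
"""

rows = list(csv.DictReader(io.StringIO(EXAMPLE), delimiter="\t"))
region = subset_region(rows, "1", 150, 450)
summary = summarize(region, "S1.Log R Ratio", "S1.B Allele Freq")
print(summary)
```

Comparing the per-sample median LRR and heterozygous SNP count for the region across all 100 samples would show whether the two discordant samples differ in LRR, or only in the number of informative BAF points.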

ADD COMMENT • link 11.2 years ago by Matt Shirley 10k

0

Sorry, I'm a complete rookie in this field. But if I understand you correctly ("entire array for cluster generation"), I think I've done that. I used the BAF and LRR values exported from BeadStudio as final results, for the complete array and not only the 17 SNPs.

ADD REPLY • link 11.2 years ago by romsen ▴ 70