Question: From Imputed SNPs to CNVs
6.6 years ago, romsen wrote:


I'll try to describe it in more detail. :)

I have genome-wide data for 300,000 SNPs (Illumina HumanHap). Of course I also have intensity values and B allele frequencies; in principle, every kind of value one can export from BeadStudio.

For CNV detection I used the Log R Ratio and B allele frequency with PennCNV. This was successful. But because of the low SNP density in some regions, as you already mentioned, breakpoints are over- or underestimated, or a CNV call is missing completely. For some specific regions I have positive-control CNVs that were genotyped via TaqMan/PCR, but I can't call them with PennCNV because there are no SNPs on the array in those regions.

In a side project I imputed the 300k SNP data with IMPUTE2, using the genotype calls (AA, AB and BB) as input.

Now I wonder whether there is a method that uses imputed SNP data (genotype calls AA, BB) to call CNVs, via linkage disequilibrium, tag SNP information or whatever :) I found this, but I'm not sure if it works with my data.
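The coverage problem described above can be checked before any CNV calling. The following is a minimal Python sketch that counts how many array probes fall inside each known control region. The probe positions, the region names and the `min_probes=3` cutoff are all made-up assumptions for illustration, not values taken from PennCNV or from the actual array.

```python
from bisect import bisect_left, bisect_right


def probes_in_region(sorted_positions, start, end):
    """Count probes with start <= position <= end (positions sorted ascending)."""
    return bisect_right(sorted_positions, end) - bisect_left(sorted_positions, start)


def callable_regions(sorted_positions, regions, min_probes=3):
    """Map each (name, start, end) region to (probe count, enough probes?)."""
    result = {}
    for name, start, end in regions:
        n = probes_in_region(sorted_positions, start, end)
        result[name] = (n, n >= min_probes)
    return result


# Hypothetical probe positions on one chromosome (sorted, in bp).
positions = [1_000, 12_000, 25_000, 31_000, 90_000, 140_000]

# Hypothetical positive-control CNVs as (name, start, end).
controls = [("cnv_a", 10_000, 35_000), ("cnv_b", 40_000, 80_000)]

coverage = callable_regions(positions, controls)
for name, (n, ok) in coverage.items():
    print(f"{name}: {n} probes, callable={ok}")
```

A region with zero probes, like the hypothetical `cnv_b`, cannot be called by any intensity-based tool, whatever parameters are used.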


I think your question is not totally clear (and that might be why you are getting little response). So you have data on 300,000 SNPs (GW = genome-wide?)? What kind of data is this: genotype calls (AA, AB, BB), intensity/relative abundance such as probe intensities or read counts, B allele frequencies? PLINK is used for genome-wide association studies (GWAS), which is totally different from CNV analysis. Furthermore, with data on 300,000 SNPs distributed across the whole genome you are fine doing CNV analysis with PennCNV, QuantiSNP or any other CNV tool. With fewer probes your resolution just gets coarser: the shortest CNV segment you can detect becomes longer. However, people have done CNV analyses with far fewer probes/SNPs than 300,000.
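To put rough numbers on how probe count relates to detectable segment length, here is a back-of-the-envelope Python sketch. It assumes evenly spaced probes, a genome length of about 3.1 Gb and a hypothetical requirement of 10 consecutive probes per call. Real arrays are unevenly spaced, so this is only an order-of-magnitude estimate.

```python
GENOME_LENGTH_BP = 3_100_000_000  # assumed approximate human genome length


def mean_spacing_bp(n_probes, genome_length_bp=GENOME_LENGTH_BP):
    """Average distance between neighbouring probes if they were evenly spaced."""
    return genome_length_bp / n_probes


def approx_min_cnv_bp(n_probes, min_probes_per_call=10,
                      genome_length_bp=GENOME_LENGTH_BP):
    """Span covered by min_probes_per_call consecutive, evenly spaced probes."""
    return (min_probes_per_call - 1) * mean_spacing_bp(n_probes, genome_length_bp)


for n in (100_000, 300_000, 1_000_000):
    print(n, round(mean_spacing_bp(n)), round(approx_min_cnv_bp(n)))
```

Under these assumptions a 300k array has a mean spacing of roughly 10 kb, so calls shorter than several tens of kb are out of reach on average, and much longer ones are missed in probe-poor regions.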

written 6.6 years ago by Irsan

CNV analysis requires intensities. 300k SNPs is fine for CNV detection.

written 6.6 years ago by Sean Davis

And how come you don't have the intensity data? Are you sure it's not in the public domain?

written 6.6 years ago by Irsan

I edited and changed some things!

written 6.6 years ago by romsen
6.6 years ago, Irsan wrote:

As far as I know, you cannot estimate copy number states from genotype calls. You can estimate loss of heterozygosity (LOH) from the B allele frequencies (the numbers that are used to make genotype calls).
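One simple way to look for LOH candidates in BAF values is to search for long runs of SNPs without any heterozygous-looking BAF. Below is a minimal Python sketch of that idea. The 0.25 to 0.75 "het-like" band, the `min_run` length and the example values are assumptions for illustration only. Many SNPs are homozygous in a normal diploid genome anyway, so real data would need a much longer minimum run than the toy value used here.

```python
def is_het_like(baf, low=0.25, high=0.75):
    """True if the BAF lies in the (assumed) heterozygous band around 0.5."""
    return low < baf < high


def homozygous_runs(bafs, min_run=5, low=0.25, high=0.75):
    """Return inclusive (start, end) index pairs of runs of at least
    min_run consecutive SNPs with no het-like BAF."""
    runs = []
    run_start = None
    for i, baf in enumerate(bafs):
        if is_het_like(baf, low, high):
            # A het-like SNP ends the current homozygous run, if any.
            if run_start is not None and i - run_start >= min_run:
                runs.append((run_start, i - 1))
            run_start = None
        elif run_start is None:
            run_start = i
    # Close a run that extends to the last SNP.
    if run_start is not None and len(bafs) - run_start >= min_run:
        runs.append((run_start, len(bafs) - 1))
    return runs


# Hypothetical BAF values along one chromosome, in genomic order.
bafs = [0.02, 0.51, 0.98, 0.01, 0.99, 0.97, 0.03, 0.00, 0.48, 0.99]
loh_candidates = homozygous_runs(bafs, min_run=5)
print(loh_candidates)
```

Runs found this way are only candidates: copy-neutral LOH, a hemizygous deletion and ordinary stretches of homozygosity look the same in BAF alone, so the LRR in the same region is needed to tell them apart.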

Since you have successfully done CNV analysis with PennCNV, I assume you have the Log R Ratios (LRR) and B Allele Frequencies (BAF) somewhere. For copy number variation analysis and loss of heterozygosity (I think detecting LOH is what you are looking for) I would do the following using R/Bioconductor:

  1. Put the LRR of all samples in one GRanges object, where each column in the elementMetadata holds the LRR values of one sample. Make sure the rows of the GRanges object are sorted in natural chromosome order (chr1, chr2, chr3, ...), not string-sorted (chr1, chr10, chr11, ...).
  2. Segment the GRanges object with fastseg. (As far as I know, fastseg uses its own fast segmentation algorithm rather than CBS proper; classic CBS is available in DNAcopy.)
  3. Make a dot plot of the LRR of each chromosome and each sample, together with the segmented regions, with ggplot (popular visualization that supports common Bioconductor data formats like GRanges). Check in the LRR dot plot whether the segmentation describes your data well. CBS-style segmentation is known for producing too many breakpoints. When you see that happening, you might want to change the segmentation parameters until you feel the segmentation is optimized for your data.
  4. Also, for each chromosome and sample, visualize the B Allele Frequencies (for example with ggplot2) and look for regions where the B Allele Frequency profile changes. I don't know if/how you can do segmentation on LOH...
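
To make the logic of steps 1 to 3 concrete, here is a small language-agnostic sketch in Python rather than R/Bioconductor. It shows a natural chromosome sort key and a deliberately simple threshold-based segmentation of smoothed LRR values. This is not CBS and not what fastseg does; the thresholds (`loss=-0.3`, `gain=0.2`), the window size and the example values are assumptions chosen for illustration.

```python
import re
from statistics import median


def chrom_key(chrom):
    """Sort key giving chr1, chr2, ..., chr10, ... before chrX, chrY, chrM."""
    name = re.sub(r"^chr", "", chrom)
    return (0, int(name), "") if name.isdigit() else (1, 0, name)


def rolling_median(values, window=3):
    """Median-smooth values with a centred window (shorter at the edges)."""
    half = window // 2
    return [median(values[max(0, i - half): i + half + 1])
            for i in range(len(values))]


def call_segments(lrr, loss=-0.3, gain=0.2, window=3):
    """Label each probe from its smoothed LRR, then merge equal neighbours
    into (label, start_index, end_index) segments (indices inclusive)."""
    labels = []
    for value in rolling_median(lrr, window):
        if value <= loss:
            labels.append("loss")
        elif value >= gain:
            labels.append("gain")
        else:
            labels.append("neutral")
    segments = []
    for i, label in enumerate(labels):
        if segments and segments[-1][0] == label:
            segments[-1] = (label, segments[-1][1], i)
        else:
            segments.append((label, i, i))
    return segments


ordered = sorted(["chr10", "chr2", "chrX", "chr1"], key=chrom_key)

# Hypothetical LRR values for one sample on one chromosome, in genomic order.
lrr = [0.0, 0.05, -0.02, -0.5, -0.6, -0.55, -0.45, 0.01, 0.03, -0.01]
segments = call_segments(lrr)
print(ordered)
print(segments)
```

Plotting the raw LRR points with the resulting segments on top, as in step 3, is the quickest way to see whether the chosen parameters over- or under-segment the data.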