Dealing with CNV overlap
7.8 years ago
Max ▴ 140

I am trying to perform a regression analysis of CNV genotype vs. clinical phenotype for a data set of cancer patients, and need to find out the best way to deal with the issue of CNV overlap.

The CNV data files typically provide chr, start position, stop position, and score for each "locus." The problem is that the start/stop for one "locus" may contain or overlap with the start/stop of other loci to varying degrees. For instance, one locus may be Chr 1 start=1000, stop=10000, and another may be Chr 1 start=5000, stop=15000.

Presumably each "locus" listed corresponds to a probe, and treating each probe as an independent predictor variable doesn't make sense, because that would involve counting the same duplicated region multiple times with multiple overlapping probes.

Is there a canonical way around this problem, e.g. using "average" CNV scores weighted by the proportion of overlap?

Thanks in advance for any advice and references on this matter.

CNV • 2.3k views
You can convert your CNV segments to a reduced segment matrix with, for example, bedops --partition, and then correlate each reduced segment with your phenotypes.
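For anyone who wants to see what the partitioning step does, here is a minimal Python sketch of the idea rather than a replacement for bedops. It cuts overlapping segments at every start/stop breakpoint so the resulting pieces are disjoint, then gives each piece the mean score of the segments covering it. The function name `partition_segments`, the half-open coordinates, the averaging rule, and the example scores are all assumptions made for illustration; bedops --partition itself only produces the disjoint intervals.

```python
from collections import defaultdict


def partition_segments(segments):
    """Split overlapping segments into disjoint pieces.

    segments: iterable of (chrom, start, stop, score), treated as
    half-open intervals [start, stop) (an assumption of this sketch).
    Returns a list of (chrom, start, stop, mean_score), one entry per
    disjoint piece covered by at least one input segment.
    """
    by_chrom = defaultdict(list)
    for chrom, start, stop, score in segments:
        by_chrom[chrom].append((start, stop, score))

    pieces = []
    for chrom in sorted(by_chrom):
        segs = by_chrom[chrom]
        # Every start or stop position is a potential cut point.
        breaks = sorted({pos for s, e, _ in segs for pos in (s, e)})
        for lo, hi in zip(breaks, breaks[1:]):
            # Scores of all input segments that fully cover this piece.
            covering = [sc for s, e, sc in segs if s <= lo and hi <= e]
            if covering:  # skip gaps that no segment covers
                pieces.append((chrom, lo, hi, sum(covering) / len(covering)))
    return pieces


# The two loci from the question, with made-up scores.
example = [("chr1", 1000, 10000, 0.5), ("chr1", 5000, 15000, 1.5)]
for piece in partition_segments(example):
    print(piece)
```

Each disjoint piece can then serve as one predictor column per patient, so a duplicated region is counted once regardless of how many original loci overlapped it.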
I am trying to predict CNVs for three rice genomes using three different tools: Pindel, CNVnator, and BreakDancer. If a CNV is reported by two of the three tools and the two calls overlap, should we take only the overlapping region for the wet-lab study, or the full span from the smallest start coordinate to the largest stop coordinate?
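To make the two options concrete, here is a small Python sketch that computes both candidate regions for a pair of calls, along with their reciprocal overlap (the smaller of the two fractions of each call covered by the shared region). The function name `compare_calls`, the half-open coordinates, and the example positions are assumptions for illustration only; this is not output from, or an API of, any of the tools mentioned.

```python
def compare_calls(a, b):
    """Compare two CNV calls given as (chrom, start, stop).

    Coordinates are treated as half-open [start, stop) (an assumption).
    Returns None if the calls do not overlap, otherwise a dict with the
    shared region, the full span, and the reciprocal overlap fraction.
    """
    if a[0] != b[0]:
        return None  # different chromosomes never overlap
    lo, hi = max(a[1], b[1]), min(a[2], b[2])
    if lo >= hi:
        return None  # same chromosome but no shared bases
    shared = hi - lo
    return {
        # Region supported by both calls.
        "intersection": (a[0], lo, hi),
        # Smallest start to largest stop.
        "union": (a[0], min(a[1], b[1]), max(a[2], b[2])),
        # Fraction of the longer call covered by the shared region.
        "reciprocal_overlap": min(shared / (a[2] - a[1]), shared / (b[2] - b[1])),
    }


# Hypothetical calls from two different tools.
print(compare_calls(("chr1", 1000, 10000), ("chr1", 5000, 15000)))
```

A low reciprocal overlap suggests the two tools may not be describing the same event, which is worth checking before choosing either region.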

