Question

Dealing with CNV overlap

10.1 years ago

Max ▴ 150

I am trying to perform a regression analysis of CNV genotype vs. clinical phenotype for a data set of cancer patients, and need to find out the best way to deal with the issue of CNV overlap.

The CNV data files typically provide chr, start position, stop position, and score for each "locus." The problem is that the start/stop for one "locus" may contain or overlap the start/stop of other loci to varying degrees. For instance, one locus may be Chr 1 start=1000, stop=10000, while another may be Chr 1 start=5000, stop=15000.

Presumably each "locus" listed corresponds to a probe, and treating each probe as an independent predictor variable doesn't make sense, because that would involve counting the same duplicated region multiple times with multiple overlapping probes.

Is there a canonical way around this problem, e.g. using "average" CNV scores weighted by proportion of overlap?
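To make the "weighted by proportion of overlap" idea concrete, here is a minimal Python sketch. It assumes half-open coordinates and a fixed region of interest; the function name `overlap_weighted_score`, the region, and the scores 0.5 and 1.0 are all hypothetical and only there for illustration.

```python
def overlap_weighted_score(region, segments):
    """Average segment scores over `region`, weighting each segment
    by the number of bases it shares with the region.

    region   -- (chrom, start, end), half-open (assumed convention)
    segments -- iterable of (chrom, start, end, score)
    Returns None when no segment overlaps the region.
    """
    r_chrom, r_start, r_end = region
    weighted_sum = 0.0
    covered = 0
    for chrom, start, end, score in segments:
        if chrom != r_chrom:
            continue
        # Number of bases shared between this segment and the region.
        overlap = min(end, r_end) - max(start, r_start)
        if overlap > 0:
            weighted_sum += score * overlap
            covered += overlap
    return weighted_sum / covered if covered else None


# The two loci from the example above, with made-up scores.
segments = [("chr1", 1000, 10000, 0.5), ("chr1", 5000, 15000, 1.0)]
region = ("chr1", 4000, 12000)  # hypothetical region of interest
score = overlap_weighted_score(region, segments)
```

Bases covered by both loci count twice here (once per locus), which is one possible weighting and not necessarily the right one for every analysis.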

Thanks in advance for any advice and references on this matter.

updated 2.6 years ago by Ram 45k • written 10.1 years ago by Max ▴ 150

You can convert your CNV segments to a reduced segment matrix, for example with bedops --partition, and then correlate each reduced segment with your phenotypes.
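For readers without bedops at hand, the partitioning step can be sketched in plain Python. This is only an illustration of the idea: it cuts at every segment boundary to get disjoint pieces, then assigns each piece the mean score of the segments that cover it. The averaging is an assumed way to combine scores and is not part of bedops; the function name `partition` and the scores are hypothetical.

```python
from collections import defaultdict


def partition(segments):
    """Split overlapping (chrom, start, end, score) segments into
    disjoint pieces, each scored by the mean of its covering segments."""
    by_chrom = defaultdict(list)
    for chrom, start, end, score in segments:
        by_chrom[chrom].append((start, end, score))

    pieces = []
    for chrom in sorted(by_chrom):
        segs = by_chrom[chrom]
        # Every segment start or end is a breakpoint.
        cuts = sorted({p for s, e, _ in segs for p in (s, e)})
        for lo, hi in zip(cuts, cuts[1:]):
            # Segments that fully cover the piece [lo, hi).
            scores = [sc for s, e, sc in segs if s <= lo and hi <= e]
            if scores:  # skip gaps that no segment covers
                pieces.append((chrom, lo, hi, sum(scores) / len(scores)))
    return pieces


# Example loci from the question, with made-up scores.
pieces = partition([("chr1", 1000, 10000, 0.5), ("chr1", 5000, 15000, 1.0)])
```

To build a matrix across patients, one would presumably take breakpoints from all samples together so that every patient is described on the same set of pieces, and then use one column per piece as a predictor.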

9.9 years ago by Irsan ★ 7.8k

Hi

I am trying to predict CNVs for three rice genomes using three different tools: Pindel, CNVnator, and BreakDancer. If the CNVs reported by two of the three tools overlap, should we take only the overlapping region for the wet-lab study, or the span from the smallest start coordinate to the largest end coordinate?
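The two options can be written down as follows. This is a minimal Python sketch with hypothetical coordinates and a hypothetical function name; it only computes both candidate regions for a pair of calls and does not decide which one is appropriate.

```python
def intersection_and_union(call_a, call_b):
    """Return (intersection, union_span) for two overlapping CNV calls,
    each given as (chrom, start, end) in half-open coordinates (assumed).
    Returns None if the calls do not overlap."""
    chrom_a, start_a, end_a = call_a
    chrom_b, start_b, end_b = call_b
    if chrom_a != chrom_b or min(end_a, end_b) <= max(start_a, start_b):
        return None
    # Region supported by both callers.
    inter = (chrom_a, max(start_a, start_b), min(end_a, end_b))
    # Smallest start to largest end across both callers.
    union = (chrom_a, min(start_a, start_b), max(end_a, end_b))
    return inter, union


# Made-up calls from two different tools.
result = intersection_and_union(("chr1", 1000, 10000), ("chr1", 5000, 15000))
```

The intersection is the part both tools agree on, while the outer span includes sequence that only one tool reported.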

8.2 years ago by sukesh1411 ▴ 30