Question: Help Refining CNV Regions For Population Comparisons
I am working with CNV data generated from the Affymetrix 6.0 array. CNVs were called using the PennCNV-Affy protocol. Glancing at the data after this step, I noticed that many segments overlap across different samples. Simply sorting by start probe location works well enough, but for each "unique" CNV region I'd like to identify the number of samples called as copy-number variant. Comparisons among populations should then be straightforward.
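A minimal sketch of the "unique region" step, written in Python rather than Perl and not part of PennCNV itself. It assumes each call has already been parsed into a record with hypothetical fields `sample`, `population`, `chrom`, `start` and `end` (inclusive base-pair coordinates). Within each chromosome it sorts segments by start, unions any that overlap by at least one base, and then counts distinct carrier samples per population in each merged region.

```python
from collections import defaultdict
from typing import NamedTuple


class CNV(NamedTuple):
    """One called segment; field names are hypothetical, not PennCNV's output format."""
    sample: str
    population: str
    chrom: str
    start: int  # first base covered (inclusive)
    end: int    # last base covered (inclusive)


def merge_regions(cnvs):
    """Union segments that overlap by at least one base into single regions.

    Returns a list of (chrom, start, end, members) tuples, where members is
    the list of CNV records that fall into that region.
    """
    by_chrom = defaultdict(list)
    for cnv in cnvs:
        by_chrom[cnv.chrom].append(cnv)

    regions = []
    for chrom in sorted(by_chrom):
        # Sorting by start lets one left-to-right sweep find every overlap.
        segs = sorted(by_chrom[chrom], key=lambda c: (c.start, c.end))
        cur_start, cur_end, members = segs[0].start, segs[0].end, [segs[0]]
        for seg in segs[1:]:
            if seg.start <= cur_end:  # overlaps the region being built
                cur_end = max(cur_end, seg.end)
                members.append(seg)
            else:  # gap: close the current region and start a new one
                regions.append((chrom, cur_start, cur_end, members))
                cur_start, cur_end, members = seg.start, seg.end, [seg]
        regions.append((chrom, cur_start, cur_end, members))
    return regions


def samples_per_population(members):
    """Count distinct samples per population carrying a CNV in one region."""
    carriers = defaultdict(set)
    for cnv in members:
        carriers[cnv.population].add(cnv.sample)
    return {pop: len(s) for pop, s in sorted(carriers.items())}


# Usage with made-up calls (illustration only, not real data):
calls = [
    CNV("s1", "popA", "chr1", 1000, 5000),
    CNV("s2", "popA", "chr1", 4000, 9000),
    CNV("s3", "popB", "chr1", 8500, 12000),
    CNV("s4", "popB", "chr1", 20000, 25000),
]
for chrom, start, end, members in merge_regions(calls):
    print(chrom, start, end, samples_per_population(members))
# chr1 1000 12000 {'popA': 2, 'popB': 1}
# chr1 20000 25000 {'popB': 1}
```

Note that a plain union chains segments: s1 and s3 do not overlap each other, yet they land in one region because s2 bridges them. That is why an explicit overlap criterion matters. It may also make sense to run the merge separately for deletions and duplications so that losses and gains are not counted together.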

My problem so far is that I don't know which parameters to use to decide when two (or, ideally, more) segments truly overlap (length of overlap, number of SNPs in the overlap, percentage of total segment length) and when they may instead be two separate CNV loci.
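One common way to decide whether two calls represent the same locus is a reciprocal overlap threshold: the shared stretch must cover at least some fraction (50% is a frequently used value) of *each* segment, optionally with a minimum number of array probes inside the shared stretch. The sketch below assumes inclusive base-pair coordinates on the same chromosome and a sorted list of probe positions for that chromosome; `min_frac` and `min_probes` are illustrative parameters to tune, not PennCNV defaults.

```python
from bisect import bisect_left, bisect_right


def overlap_bp(a_start, a_end, b_start, b_end):
    """Number of bases shared by two inclusive intervals (0 if disjoint)."""
    return max(0, min(a_end, b_end) - max(a_start, b_start) + 1)


def reciprocal_overlap(a, b):
    """Smaller of the two fractions of each segment covered by the overlap.

    a and b are (start, end) tuples with inclusive coordinates.
    """
    ov = overlap_bp(a[0], a[1], b[0], b[1])
    len_a = a[1] - a[0] + 1
    len_b = b[1] - b[0] + 1
    return min(ov / len_a, ov / len_b)


def probes_in(start, end, probe_positions):
    """Count probes inside [start, end]; probe_positions must be sorted."""
    return bisect_right(probe_positions, end) - bisect_left(probe_positions, start)


def same_locus(a, b, min_frac=0.5, probe_positions=None, min_probes=0):
    """Decide whether two same-chromosome segments look like one CNV locus.

    min_frac=0.5 (50% reciprocal overlap) and min_probes are illustrative
    thresholds (assumptions), not values prescribed by PennCNV.
    """
    if reciprocal_overlap(a, b) < min_frac:
        return False
    if probe_positions is not None and min_probes > 0:
        ov_start, ov_end = max(a[0], b[0]), min(a[1], b[1])
        if probes_in(ov_start, ov_end, probe_positions) < min_probes:
            return False
    return True


# Usage with made-up probe positions (illustration only):
probes = [1200, 2500, 3100, 4200, 4800, 6000, 7500]
print(same_locus((1000, 5000), (2000, 6000), probe_positions=probes, min_probes=3))
print(same_locus((1000, 5000), (4500, 12000)))
```

Requiring the overlap to be large relative to *both* segments keeps a short call from being absorbed into a much longer one just because it sits inside it; the probe-count check guards against overlaps supported by only one or two SNPs on the array.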

Does anyone know of a standard pipeline for merging overlapping CNVs across samples, or a simple method that is easy to justify given the platforms I have used so far? I should mention that I am extremely new to programming and have only a little experience with Perl. Thanks!

