I'm working with copy number variation (CNV) data, if that helps to visualize the problem. There are two file types. The first file contains the CNV calls and (for simplicity's sake) is formatted like so:
chr1:100000-149000 numsnp=10 length=49000 sample1 startsnp=rs100 endsnp=rs149
chr1:150000-200000 numsnp=10 length=50000 sample1 startsnp=rs150 endsnp=rs200
In the above, sample1 has two adjacent CNV calls of about 50 kb each, which look like a single CNV that has been split in two. This sometimes happens when some intermediate probes don't detect a copy number change.
The second file lists the probes/SNPs and their positions, and looks like this:
Name   Chr  Position
rs100  1    100000
rs101  1    101000
...
rs200  1    200000
My goal is to merge CNVs from the same sample that are separated either A) by less than some distance or B) by fewer than some number of probes, as defined by the second file. B is the better choice. Are there any tools or resources that any of you use to do this on a regular basis? Links or detailed instructions would be most appreciated.
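
To make option B concrete, here is a rough Python sketch of what I have in mind. It is not an established tool, just an illustration under some assumptions: both files are whitespace-delimited exactly as in the simplified examples above, every startsnp/endsnp appears in the probe file, and the function names (`load_probe_index`, `parse_cnv`, `merge_cnvs`) and the `max_gap_probes` threshold are my own inventions.

```python
import re
from collections import defaultdict

REGION = re.compile(r"chr(\w+):(\d+)-(\d+)")

def load_probe_index(map_lines):
    """Map each SNP name to (chromosome, rank), ranked by position per chromosome."""
    by_chr = defaultdict(list)
    for line in map_lines:
        fields = line.split()
        if len(fields) < 3 or fields[0] == "Name":  # skip header and short lines
            continue
        by_chr[fields[1]].append((int(fields[2]), fields[0]))
    index = {}
    for chrom, probes in by_chr.items():
        for rank, (_pos, name) in enumerate(sorted(probes)):
            index[name] = (chrom, rank)
    return index

def parse_cnv(line):
    """Parse one CNV line in the simplified format shown above."""
    fields = line.split()
    chrom, start, end = REGION.match(fields[0]).groups()
    attrs = dict(f.split("=", 1) for f in fields if "=" in f)
    # Assumption: the sample ID is the only later field without an '='.
    sample = next(f for f in fields[1:] if "=" not in f)
    return {"sample": sample, "chrom": chrom,
            "start": int(start), "end": int(end),
            "startsnp": attrs["startsnp"], "endsnp": attrs["endsnp"]}

def merge_cnvs(cnv_lines, probe_index, max_gap_probes=5):
    """Merge same-sample, same-chromosome calls separated by at most
    max_gap_probes probes (counted strictly between the two calls)."""
    calls = sorted((parse_cnv(l) for l in cnv_lines if l.strip()),
                   key=lambda c: (c["sample"], c["chrom"], c["start"]))
    merged = []
    for call in calls:
        prev = merged[-1] if merged else None
        if (prev and prev["sample"] == call["sample"]
                and prev["chrom"] == call["chrom"]):
            gap = (probe_index[call["startsnp"]][1]
                   - probe_index[prev["endsnp"]][1] - 1)
            if gap <= max_gap_probes:
                if call["end"] > prev["end"]:  # extend the previous call
                    prev["end"] = call["end"]
                    prev["endsnp"] = call["endsnp"]
                continue
        merged.append(dict(call))
    return merged
```

For option A, the same loop would work with the gap computed as `call["start"] - prev["end"]` in base pairs instead of probe ranks. The numsnp and length fields are dropped here; they would need to be recomputed from the merged start/end and the probe ranks.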