Question: Merging Genomic Segments Separated By Some Distance / Number Of Markers
8.4 years ago, Ryan D (3.3k) wrote:

I'm working with copy number variation (CNV) data, if it helps visualize it at all. There are two file types. In the first file we have CNV data which (for simplicity's sake) is formatted like so:

chr1:100000-149000  numsnp=10  length=49000 sample1 startsnp=rs100 endsnp=rs149

chr1:150000-200000  numsnp=10  length=50000 sample1 startsnp=rs150 endsnp=rs200

In the above, each CNV in sample1 spans about 50k, but they are split. This sometimes happens if some intermediate probes didn't detect a copy number change.

There is another file which lists the probes/SNPs and their positions, and looks like this:

Name    Chr     Position
rs100   1       100000
rs101   1       101000


rs200   1       200000

My goal is to merge CNVs within the same sample that are separated either A) by some distance or B) by some number of probes, as defined by the second file type. B is the better choice. Are there any tools or resources you use to do this on a regular basis? Links or detailed instructions most appreciated.

Thanks, Rx

perl merge cnv • 2.9k views
modified 8.3 years ago by Malachi Griffith (17k) • written 8.4 years ago by Ryan D (3.3k)

Could you define more precisely what is meant by "merge CNVs"? Perhaps give an indication of what the final output should look like.

written 8.4 years ago by Neilfws (48k)

As Neil suggested, try to reformulate your question; your current description does not have enough detail.

written 8.4 years ago by Istvan Albert ♦♦ (80k)

OK, the final output here should look like:

chr1:100000-200000 numsnp=21 length=100000 sample1 startsnp=rs100 endsnp=rs200

assuming a gap of one SNP. I think I'm close to a Perl solution, but are there any bioinformatics tools that can merge adjacent segments separated by some number (or percentage) of markers designated by a third file type?


written 8.4 years ago by Ryan D (3.3k)
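
For option B (merging by number of intervening probes), the idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a tested tool: the function names (`parse_cnv`, `merge_by_probe_gap`, `format_cnv`) are hypothetical, segments within a sample are assumed not to overlap, the merged `numsnp` is taken as the sum of both segments plus the probes in the gap, and the marker map below is toy data chosen so that exactly one probe falls between the two segments.

```python
import re
from bisect import bisect_left, bisect_right

# Pattern for the simplified CNV line format shown in the question.
CNV_LINE = re.compile(
    r"chr(?P<chrom>\S+?):(?P<start>\d+)-(?P<end>\d+)\s+numsnp=(?P<numsnp>\d+)\s+"
    r"length=\S+\s+(?P<sample>\S+)\s+startsnp=(?P<startsnp>\S+)\s+endsnp=(?P<endsnp>\S+)"
)

def parse_cnv(line):
    """Turn one CNV line into a dict with integer coordinates."""
    rec = CNV_LINE.match(line.strip()).groupdict()
    for key in ("start", "end", "numsnp"):
        rec[key] = int(rec[key])
    return rec

def merge_by_probe_gap(cnvs, positions, max_gap=1):
    """Merge same-sample, same-chromosome CNVs separated by <= max_gap probes.

    positions maps chromosome name -> sorted list of probe positions
    (built from the Name/Chr/Position file). Assumes non-overlapping input.
    """
    merged = []
    for cnv in sorted(cnvs, key=lambda c: (c["sample"], c["chrom"], c["start"])):
        prev = merged[-1] if merged else None
        if prev and prev["sample"] == cnv["sample"] and prev["chrom"] == cnv["chrom"]:
            pos = positions.get(cnv["chrom"], [])
            # Probes strictly between the end of prev and the start of cnv.
            gap = bisect_left(pos, cnv["start"]) - bisect_right(pos, prev["end"])
            if gap <= max_gap:
                prev["numsnp"] += cnv["numsnp"] + gap
                prev["end"], prev["endsnp"] = cnv["end"], cnv["endsnp"]
                continue
        merged.append(dict(cnv))
    return merged

def format_cnv(c):
    """Write a record back out in the original line format."""
    return (f"chr{c['chrom']}:{c['start']}-{c['end']} numsnp={c['numsnp']} "
            f"length={c['end'] - c['start']} {c['sample']} "
            f"startsnp={c['startsnp']} endsnp={c['endsnp']}")

lines = [
    "chr1:100000-149000  numsnp=10  length=49000 sample1 startsnp=rs100 endsnp=rs149",
    "chr1:150000-200000  numsnp=10  length=50000 sample1 startsnp=rs150 endsnp=rs200",
]
positions = {"1": [100000, 149000, 149500, 150000, 200000]}  # toy marker map
for rec in merge_by_probe_gap([parse_cnv(l) for l in lines], positions, max_gap=1):
    print(format_cnv(rec))
```

Setting `max_gap=0` would leave these two segments separate, since one probe lies between them in the toy map.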
7.4 years ago, Malachi Griffith (17k), Washington University School of Medicine, St. Louis, USA, wrote:

Merging CNVs that are separated by some distance is a task that could be handled by BEDTools. In particular, see the mergeBed tool.

"mergeBed combines overlapping or “book-ended” (that is, one base pair away) features in a feature file into a single feature which spans all of the combined features."

By default, only features that overlap or are book-ended will be merged. If you want to merge features that are separated by some distance, the -d option should work.

"-d Maximum distance between features allowed for features to be merged. Default is 0. That is, overlapping and/or book-ended features are merged."
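
To make the distance rule concrete, here is a minimal Python sketch of the same idea. It is not BEDTools code, only an approximation of the merge-within-distance behaviour described in the quote; the function name `merge_by_distance` and the tuple layout are assumptions made for illustration.

```python
def merge_by_distance(intervals, max_dist=0):
    """Merge (chrom, start, end) intervals whose gap is <= max_dist.

    With max_dist=0, only overlapping or book-ended intervals are merged.
    """
    merged = []
    for chrom, start, end in sorted(intervals):
        last = merged[-1] if merged else None
        if last and last[0] == chrom and start - last[2] <= max_dist:
            # Close enough to the previous interval: extend it.
            last[2] = max(last[2], end)
        else:
            merged.append([chrom, start, end])
    return [tuple(m) for m in merged]

segments = [("chr1", 100000, 149000), ("chr1", 150000, 200000)]
print(merge_by_distance(segments, max_dist=1000))
```

Note that this sketch treats all intervals as belonging to one sample; for multi-sample data the intervals would need to be grouped by sample first.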

You will need to convert your current file format into BED format, but that is trivial. Your second scenario is not as obvious, but there are many features of BEDTools that allow for a variety of comparisons between two files containing coordinates.
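
As a rough sketch of the conversion step, the following Python function rewrites one line of the simplified CNV format from the question as a four-column BED line. The function name `cnv_line_to_bed` is hypothetical, and the code assumes the input coordinates are 1-based and inclusive (so the start is shifted by one for BED's 0-based starts); drop the `- 1` if your coordinates are already 0-based.

```python
def cnv_line_to_bed(line):
    """Convert 'chr1:100000-149000 numsnp=.. length=.. sample1 ...' to BED.

    Output columns: chrom, start (0-based), end, sample name.
    Assumes the region is the first field and the sample is the fourth.
    """
    fields = line.split()
    region, sample = fields[0], fields[3]
    chrom, span = region.split(":")
    start, end = (int(x) for x in span.split("-"))
    return f"{chrom}\t{start - 1}\t{end}\t{sample}"

print(cnv_line_to_bed(
    "chr1:100000-149000  numsnp=10  length=49000 sample1 startsnp=rs100 endsnp=rs149"
))
```

Because merging works on coordinates only, it is probably safest to write one BED file per sample and sort each by chromosome and start before merging, so that segments from different samples are not combined.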

BEDTools on Google Code

BEDTools Manual
