Question: bedtools, merge function: avoid merging intervals if separated by a single base
gabri wrote (22 months ago):

Hi All,

I'm using bedtools v2.26.0 to combine overlapping intervals of a BED file into “merged” intervals. I have a problem with some SNP features (same start and end coordinate). These are my BED file and command line:

input.bed

chr1  70833  70833  a
chr1  70837  70837  b
chr1  70839  70839  c
chr1  71001  71001  d

$ bedtools merge -i input.bed -c 4 -o collapse > output.bed

output.bed

chr1  70833  70833  a
chr1  70836  70840  b,c
chr1  71001  71001  d

By default, overlapping and/or "book-ended" features are combined.

For my analysis, I need to be very accurate, so I only want to merge truly overlapping features. Features must remain separate if they are one or two bases apart. In this case, the output should be the same as the input because there aren't any overlapping intervals:

output.bed

chr1  70833  70833  a
chr1  70837  70837  b
chr1  70839  70839  c
chr1  71001  71001  d

Is there a way to obtain this kind of sensitivity with bedtools?

Thanks


Hello gabri,

The output of bedtools is interesting. I'm not sure whether this is a bug or by design.

Nevertheless, I think your BED file doesn't represent the positions you think it does. BED uses 0-based, half-open intervals: positions are counted starting from 0 instead of 1, and the end position, given in the third column, is not included. As a result, all of your intervals (start equal to end) contain no bases.

I guess your bed file should look like this:

chr1  70832  70833  a
chr1  70836  70837  b
chr1  70838  70839  c
chr1  71000  71001  d
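
The conversion above can be sketched in a few lines of Python. This assumes the coordinates in the question are 1-based SNP positions, which is a guess about the original data; `point_to_bed` is a hypothetical helper name used only for illustration.

```python
def point_to_bed(chrom, pos, name):
    """Convert a 1-based single-base position to a 0-based, half-open BED record."""
    # BED starts are 0-based, so subtract 1 from the position.
    # BED ends are exclusive, so the end equals the 1-based position.
    return (chrom, pos - 1, pos, name)


# Assumed 1-based SNP positions, taken from the question's input.bed.
snps = [
    ("chr1", 70833, "a"),
    ("chr1", 70837, "b"),
    ("chr1", 70839, "c"),
    ("chr1", 71001, "d"),
]

bed_records = [point_to_bed(*snp) for snp in snps]

for chrom, start, end, name in bed_records:
    # Each record now spans exactly one base: end - start == 1.
    print(chrom, start, end, name, sep="\t")
```

With half-open coordinates the length of a feature is simply end minus start, so a record whose start equals its end covers zero bases.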

fin swimmer

written 22 months ago by finswimmer

Thank you ATpoint. It worked for me!

written 10 months ago by Ekalavya

Hi gabri, did you find any workaround for this problem? I am in the same situation. I want to consolidate the coordinates of all the Cs in my RRBS dataset. For this, I pooled the CX-report from Bismark (all replicates and groups) and used bedtools merge to get unique locations. My final BED file contains regions with length > 1! I found these are repeats of C. I tried with both 0- and 1-based coordinates.

(...truncated by ATpoint to avoid overly long post, please check my answer towards the -d option in bedtools merge)

written 10 months ago by Ekalavya
ATpoint wrote (10 months ago):

Check the -d option of bedtools merge. If you provide a negative integer, it sets the minimum number of bases that must overlap before two features are merged, which should be what you need. For the top-level question, -d -1 should do the trick.
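
To make the effect of -d easier to see, here is a minimal Python sketch of the merge rule. It is a simplified model, not the bedtools implementation: it assumes sorted input and assumes a record joins the current block when it is on the same chromosome and its start minus the block's end is at most d. `merge_intervals` is a hypothetical helper, and bedtools may treat special cases such as zero-length features differently.

```python
def merge_intervals(records, d=0):
    """Merge sorted (chrom, start, end, name) records under a simplified -d rule.

    Assumption: a record joins the current block when it is on the same
    chromosome and (its start - the block's end) <= d.
    """
    merged = []
    for chrom, start, end, name in records:
        if merged and merged[-1][0] == chrom and start - merged[-1][2] <= d:
            prev = merged[-1]
            # Extend the block and collapse the names, like "-c 4 -o collapse".
            merged[-1] = (chrom, prev[1], max(prev[2], end), prev[3] + "," + name)
        else:
            merged.append((chrom, start, end, name))
    return merged


# Two adjacent single-base intervals (book-ended) and one overlapping pair.
adjacent = [("chr1", 100, 101, "x"), ("chr1", 101, 102, "y")]
overlapping = [("chr1", 10, 20, "p"), ("chr1", 19, 30, "q")]

print(merge_intervals(adjacent, d=0))      # book-ended records are combined
print(merge_intervals(adjacent, d=-1))     # kept separate: no shared base
print(merge_intervals(overlapping, d=-1))  # combined: one shared base
```

Under this model, neighbouring single-base features (for example a run of consecutive Cs) are book-ended and collapse into one region with d=0, while d=-1 keeps them apart because they share no base.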
