Question: Bedtools merge -d option help
3 months ago, jantu008 wrote:

Hi, I'm using Bedtools for R (bedr) in Ubuntu and I am trying to merge genomic regions that are within 100 bp of one another. For example, I have a data frame (df) with 3274 obs of 3 variables and the first rows look like:

chr start end
chr1 1214882 1214884
chr1 1214942 1214944
chr1 1215030 1215032
chr1 1215036 1215038

and when I merge using:

> df1 <- bedr.merge.region(df.sorted, distance = 100, number = TRUE, = TRUE, check.chr = TRUE, check.valid = TRUE, check.sort = TRUE)

I get the data frame (df1):

chr start end V4
chr1 1214882 1215038 4

My goal is to merge all coordinates that fall within 100 bp (distance = 100). From data frame df, it should have merged the first 2 rows together and then the last 2 rows together, since there are fewer than 100 bp between the start of row 1 (in df) and the end of row 2 (in df). It should not have merged all 4 rows (as df1 shows), since that gives a span of 156 bp (1215038 - 1214882 = 156).

Any help as to why the parameter "distance = 100" is not merging only regions within 100 bp and it merges regions at 156 bp? The goal is to be able to design probes for wet lab to capture regions of interest but our probes are limited to 100 bp so I want to see how many probes of 100 bp I would need to build to capture all regions and what would their coordinates be.

Thank you Joana

To clarify, my expected output from the code is (df2)

chr start end V4
chr1 1214882 1214944 2
chr1 1215030 1215038 2

modified 3 months ago • written 3 months ago by jantu008

If you need to merge one element at a time, you might use a Python or awk script that reads in one element at a time, storing the first element. Keep merging as long as each subsequent element's end position is within X bases of the first element's start position. When that distance test fails, print the range of the merged elements and reset the "first" element to the element that failed the test. Repeat this test as you iterate through the rest of the elements. This is a fairly basic scripting exercise.

modified 3 months ago • written 3 months ago by Alex Reynolds
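
The approach described in the reply above could be sketched roughly as follows. This is only a minimal illustration, not code from the thread: the function name `merge_by_span` and the parameter `max_span` are hypothetical, the input is assumed to be sorted by chromosome and start, and the four example rows are the ones from the question.

```python
def merge_by_span(intervals, max_span=100):
    """Greedily group sorted intervals so that each merged region,
    measured from the first start to the last end, spans at most max_span bp."""
    merged = []
    current = None  # [chrom, start, end, number of merged elements]
    for chrom, start, end in intervals:
        fits = (
            current is not None
            and chrom == current[0]
            and end - current[1] <= max_span
        )
        if fits:
            # Extend the current region and count the element.
            current[2] = max(current[2], end)
            current[3] += 1
        else:
            # Distance test failed: emit the region and reset the "first" element.
            if current is not None:
                merged.append(tuple(current))
            current = [chrom, start, end, 1]
    if current is not None:
        merged.append(tuple(current))
    return merged


regions = [
    ("chr1", 1214882, 1214884),
    ("chr1", 1214942, 1214944),
    ("chr1", 1215030, 1215032),
    ("chr1", 1215036, 1215038),
]

span_result = merge_by_span(regions, max_span=100)
for row in span_result:
    print(*row, sep="\t")
```

On the four example rows this yields two regions, 1214882 to 1214944 and 1215030 to 1215038, each built from 2 elements. Note that a single input interval longer than `max_span` would still be emitted unchanged, so such cases would need separate handling.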

Hi Alex,

Thank you for your reply. I figured it could be something simple to write even though my experience in Python or R is very basic. So I figured if this function existed already or there was a parameter for the 'merge' function in Bedtools that I could use for this purpose, I could just plug my data and get the probes for the wet lab portion, and then revisit the issue as a scripting exercise for my Python course ... Thank you!

written 12 weeks ago by jantu008
3 months ago, ATpoint wrote:

The "wrong" result is correct. Merging intervals 1 and 2 creates the new interval 1214882 to 1214944, and 1214944 is 86 bp away from the start of interval 3 (1215030), which triggers the merge again. The same happens for interval 4, resulting in one final interval.

written 3 months ago by ATpoint
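
To make the chaining effect concrete, here is a small sketch of gap-based merging, where the distance is measured from the end of the growing merged interval to the start of the next element. It is a simplified model written for illustration, not the bedtools or bedr implementation; the name `merge_by_gap` and the parameter `max_gap` are hypothetical, and the input is assumed to be sorted.

```python
def merge_by_gap(intervals, max_gap=100):
    """Merge sorted intervals whenever the gap between the end of the
    current merged interval and the next start is at most max_gap bp."""
    merged = []  # each entry: [chrom, start, end, number of merged elements]
    for chrom, start, end in intervals:
        last = merged[-1] if merged else None
        if last is not None and last[0] == chrom and start - last[2] <= max_gap:
            # The gap is small enough, so the merged interval keeps growing.
            last[2] = max(last[2], end)
            last[3] += 1
        else:
            merged.append([chrom, start, end, 1])
    return [tuple(m) for m in merged]


rows = [
    ("chr1", 1214882, 1214884),
    ("chr1", 1214942, 1214944),
    ("chr1", 1215030, 1215032),
    ("chr1", 1215036, 1215038),
]

# Gaps between consecutive elements: 58, 86 and 4 bp, all <= 100.
gaps = [rows[i + 1][1] - rows[i][2] for i in range(len(rows) - 1)]
print(gaps)

chained = merge_by_gap(rows, max_gap=100)
print(chained)
```

Because every consecutive gap is at most 100 bp, the four elements chain into a single interval from 1214882 to 1215038, even though the total span is 156 bp.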

Hi ATpoint,

I see what it is doing now ... any idea how I can prevent it from doing this? I don't see any parameter for this in the code ... Thank you! Joana

written 3 months ago by jantu008

It will come down to a custom script as Alex Reynolds suggests. Try something out, and come back if you get stuck (but try something yourself first ;-)

written 12 weeks ago by ATpoint