Bedtools merge -d option help
5.1 years ago
jantu008 • 0

Hi, I'm using Bedtools for R (bedr) in Ubuntu and I am trying to merge genomic regions that are within 100 bp of one another. For example, I have a data frame (df) with 3274 obs of 3 variables and the first rows look like:

df
chr start end
chr1 1214882 1214884
chr1 1214942 1214944
chr1 1215030 1215032
chr1 1215036 1215038

and when I merge using:

> df1 <- bedr.merge.region(df.sorted, distance = 100, number = TRUE, check.zero.based = TRUE, check.chr = TRUE, check.valid = TRUE, check.sort = TRUE)

I get the data frame (df1):

df1
chr start end V4
chr1 1214882 1215038 4

My goal is to merge all coordinates that are within 100 bp (distance = 100). From df, I expected the first two rows to be merged together and the last two rows to be merged together, since the start of row 1 and the end of row 2 are less than 100 bp apart. Instead, all four rows were merged (as df1 shows), which gives a region of 156 bp (1215038 - 1214882 = 156).

Can anyone explain why the parameter "distance = 100" merges regions spanning 156 bp instead of only regions within 100 bp? The goal is to design probes for the wet lab to capture regions of interest, but our probes are limited to 100 bp, so I want to see how many 100 bp probes I would need to build to capture all regions and what their coordinates would be.

Thank you, Joana

To clarify, my expected output from the code is (df2)

df2
chr start end V4
chr1 1214882 1214944 2
chr1 1215030 1215038 2

bedtools Rstudio Ubuntu distance bedr • 1.7k views

If you need to merge one element at a time, you might use a Python or awk script that reads in one element at a time, storing the first element. Keep merging while each subsequent element's end position is within X bases of the first element's start position. When that distance test fails, print the range of the merged elements and reset the "first" element to the current one. Repeat this test as you iterate through the rest of the elements. This is a fairly basic scripting exercise.
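
The procedure described above could be sketched in Python roughly as follows. This is a minimal sketch, not a bedtools or bedr feature: the function name merge_within_span and the parameter max_span are made up for illustration, and it assumes the input is already sorted by chromosome and start and held as (chrom, start, end) tuples.

```python
def merge_within_span(intervals, max_span=100):
    """Greedily group sorted (chrom, start, end) intervals.

    An interval joins the current group only if its end lies within
    max_span bases of the group's first start, so no merged region
    grows beyond max_span unless a single input interval is already
    longer than that. (Hypothetical helper, not part of bedtools.)
    """
    merged = []
    current = None  # [chrom, start, end, number_of_merged_intervals]
    for chrom, start, end in intervals:
        same_group = (
            current is not None
            and chrom == current[0]
            and end - current[1] <= max_span
        )
        if same_group:
            # Extend the current group and count the interval.
            current[2] = max(current[2], end)
            current[3] += 1
        else:
            # Distance test failed: emit the finished group, start a new one.
            if current is not None:
                merged.append(tuple(current))
            current = [chrom, start, end, 1]
    if current is not None:
        merged.append(tuple(current))
    return merged


# The four rows from the question.
regions = [
    ("chr1", 1214882, 1214884),
    ("chr1", 1214942, 1214944),
    ("chr1", 1215030, 1215032),
    ("chr1", 1215036, 1215038),
]

for row in merge_within_span(regions, max_span=100):
    print(*row, sep="\t")
```

On the four example rows this yields two regions, chr1 1214882 1214944 (2 intervals) and chr1 1215030 1215038 (2 intervals), which matches the expected output df2 in the question. Note that the grouping is greedy from left to right, so it is not guaranteed to give the smallest possible number of probes in every case.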


Hi Alex,

Thank you for your reply. I figured it could be something simple to write, even though my experience in Python or R is very basic. I was hoping that, if this function already existed or there was a parameter for the 'merge' function in Bedtools that I could use for this purpose, I could just plug in my data and get the probes for the wet lab portion, and then revisit the issue as a scripting exercise for my Python course ... Thank you!

5.1 years ago
ATpoint 81k

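The "wrong" result is correct. The distance is the gap between the end of one interval and the start of the next, not the total span of the merged region. Merging intervals 1 and 2 creates the new interval 1214882 to 1214944, and 1214944 is 86 bp away from the start of interval 3 (1215030), triggering the merge again, and so it goes for interval 4, resulting in one final interval.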
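
To see the chaining effect on the question's data, here is a small Python model of gap-based merging. It is only a simplified illustration of the behaviour described above, not the actual bedtools code: the name merge_by_gap and the parameter max_gap are hypothetical, and the input is assumed to be sorted (chrom, start, end) tuples.

```python
def merge_by_gap(intervals, max_gap=100):
    """Merge sorted intervals whenever the gap between the end of the
    growing region and the next start is at most max_gap.

    Simplified model of distance-based merging; the total span of the
    merged region is never checked, so merges can chain indefinitely.
    """
    merged = []  # each entry: [chrom, start, end, count]
    for chrom, start, end in intervals:
        last = merged[-1] if merged else None
        if last is not None and last[0] == chrom and start - last[2] <= max_gap:
            # Gap is small enough: grow the region, whatever its span.
            last[2] = max(last[2], end)
            last[3] += 1
        else:
            merged.append([chrom, start, end, 1])
    return [tuple(m) for m in merged]


regions = [
    ("chr1", 1214882, 1214884),
    ("chr1", 1214942, 1214944),
    ("chr1", 1215030, 1215032),
    ("chr1", 1215036, 1215038),
]

# Gaps between consecutive intervals: 58, 86 and 4 bp, all <= 100.
gaps = [b[1] - a[2] for a, b in zip(regions, regions[1:])]
print(gaps)
print(merge_by_gap(regions, max_gap=100))
```

Because every consecutive gap (58, 86 and 4 bp) is within 100 bp, the four rows collapse into the single 156 bp region chr1 1214882 1215038 with a count of 4, the same as df1 in the question.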


Hi ATpoint,

I see what it is doing now ... any idea how I can prevent it from doing this? I don't see any parameter for it in the function ... Thank you! Joana


It will come down to a custom script as Alex Reynolds suggests. Try something out, and come back if you get stuck (but try something yourself first ;-)
