Question: Merging a bed file with regions delimited by reads
0
gravatar for Jon B
3.3 years ago by
Jon B0
Jon B0 wrote:

Hi,

I have a massive file with a lot of regions in bed format. Many of them are overlapping either partially or entirely. I would like to merge these regions, together with their names, in a way that I can see exactly what region (ID'd by the name) contributed to the result.

Using the standard mergeBed doesn't do what I need it to. I can run mergeBed - i bigfile.bed -c1,12 -o count,collapse to get the clustered regions, but that is not what I want. Due to the data I have, there are partial overlaps in many many places and this leads to gargantuan merged intervals that do not specify where each name contributes.

Using this input -

chr1 150 158 A
chr1 155 165 B
chr1 160 170 C
chr1 177 190 D

With the mergeBed I listed above would result in the output -

chr1 150 170 A,B,C 3
chr1 177 190 D, 1

This is unfortunately not what I need.

Ideally, merging the above input would give me the following output (with the 4th column being the name of the region and the 5th being the number of input regions used for calculating the output region) -

chr1 150 154 A 1
chr1 155 158 A,B 2
chr1 159 159 B, 1
chr1 160 165 B,C 2
chr1 166 170 C 1
chr1 177 190 D 1

Note that the regions here are divided by the contributing regions, and you can easily see the original input regions in this result. It is also possible to deduce exactly what region is contributing, such that if you SNPs fall into one of these, you know exactly what is being directly impacted.

Is there any way to do this with existing software?

Thank you in advance

mergebed snp bedtools genome • 816 views
ADD COMMENTlink modified 3.3 years ago by Alex Reynolds30k • written 3.3 years ago by Jon B0
2
gravatar for Alex Reynolds
3.3 years ago by
Alex Reynolds30k
Seattle, WA USA
Alex Reynolds30k wrote:

Sure, use BEDOPS. You can partition intervals with bedops --partition and pipe them to bedmap to do mapping and counting:

$ bedops --partition intervals.bed | bedmap --echo --echo-map-id-uniq --count --delim '\t' --multidelim ',' - intervals.bed > answer.bed

For example:

$ echo -e 'chr1\t100\t200\tA\nchr1\t120\t220\tB\nchr1\t180\t260\tC' > test.bed

Here's what test.bed looks like:

$ less test.bed
chr1    100     200     A
chr1    120     220     B
chr1    180     260     C

Then:

$ bedops --partition test.bed | bedmap --echo --echo-map-id-uniq --count --delim '\t' --multidelim ',' - test.bed 
chr1    100     120     A       1
chr1    120     180     A,B     2
chr1    180     200     A,B,C   3
chr1    200     220     B,C     2
chr1    220     260     C       1
ADD COMMENTlink modified 3.3 years ago • written 3.3 years ago by Alex Reynolds30k

That worked beautifully, thank you.

Btw, is there an option to make bedmap count from 1 rather than from 0? When I have

chr1 5 10 chr1 6 20

I get

chr1 5 6 chr1 6 10 chr1 10 20

And I would prefer to see

chr1 5 5 chr1 6 10 chr1 11 20

Anyhoo, I've marked your answer as the correct one, thanks again.

ADD REPLYlink written 3.3 years ago by Jon B0
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1554 users visited in the last hour