Question: Merge intervals between but NOT within 2 BED files
gravatar for Anand Rao
4 months ago by
Anand Rao190
United States
Anand Rao190 wrote:

I have BED files for 2 types of coords: A and B. I want to collapse them into intervals where an entry of A is separated from an entry of B by not more than 20, and vice versa. What is not acceptable is a merge of coordinates with 2 or more entries from just A or 2 or more entries from just B.

For example, let's say:

cat A.bed 
Chr1    10  20  A1
Chr1    50  60  A2
Chr1    75  100 A3


cat B.bed 
Chr1    25  40  B1
Chr1    115 160 B2


cat A.bed B.bed > AB.bed
sortBed -i AB.bed > AB_sort.bed

The output from mergeBed with -d 20 gives what I do NOT want!

mergeBed -d 20 -i AB_sort.bed -c 4 -o collapse -delim ","
Chr1    10  160 A1,B1,A2,A3,B2

Results look different when I visually parse bedtools closest results seen below

bedtools closest -d -a A.bed -b B.bed 
Chr1    10  20  A1  Chr1    25  40  B1  6
Chr1    50  60  A2  Chr1    25  40  B1  11
Chr1    75  100 A3  Chr1    115 160 B2  16

What combinations of bedtools's sub-commands should I use to get the type of result that I seek, shown below:

Chr1    10  60  A1,B1,A2
Chr1    75  160 A3,B2

Thanks to Alex for pointing out an error, changed from A3,B1 to A3,B2 in the line above

In my manually parsed output example above, I do NOT collapse coords for features A2 and A3 together (the rule mentioned in the intro) whereas mergeBed is agnostic of this. Any ideas on how to use bedtools and / or bedOps?

note to self: Images (with and without annotation) to help visualize this example has been added below...

Raw Image for example Visualization of example

Annotated image for example AnnotatedImage

bedops merge overlap bedtools • 266 views
ADD COMMENTlink modified 4 months ago by Alex Reynolds27k • written 4 months ago by Anand Rao190

Can you explain the A3,B1 part of your expected result? That doesn't make sense in the context of your question.

You can do this with bedmap very easily, I think, but I need to understand if your expected output is correct, first.

ADD REPLYlink modified 4 months ago • written 4 months ago by Alex Reynolds27k

Thank you for pointing that out, Alex. I fixed the error above. Look forward to your response. Thanks, in advance.

ADD REPLYlink written 4 months ago by Anand Rao190

I think this is complicated enough to warrant a custom python/perl/R/ruby/haskell/FORTRAN program :-)

ADD REPLYlink written 4 months ago by RamRS20k
gravatar for Alex Reynolds
4 months ago by
Alex Reynolds27k
Seattle, WA USA
Alex Reynolds27k wrote:

I think this logic should work, so long as elements in B.bed are not within each other by 20*2 (40) bases, and elements in A.bed and B.bed are sorted with sort-bed (not sortBed):

$ bedmap --echo-map --range 20 --delim '\t' --echo A.bed B.bed \
    | sort -k4,4 \
    | awk -vOFS="\t" -vFS="\t" '{ \
          a[$4]["chr"] = $1; \
          a[$4]["start"] = ($2 < $6) ? ($2 < a[$4]["start"] ? $2 : a[$4]["start"]) : $6; \
          a[$4]["stop"] = ($3 > $7) ? ($3 > a[$4]["stop"] ? $3 : a[$4]["stop"]) : $7; \
          a[$4]["ids"] = a[$4]["ids"] ":" $8; \
      } \
      END { \
          for (i in a) { \
              print a[i]["chr"], a[i]["start"], a[i]["stop"], a[i]["ids"]":"i; \
          } \
      }' \
    | sort-bed - \
    > answer.bed

Test run:

$ cat answer.bed
Chr1    10  60  :A1:A2:B1
Chr1    75  160 :A3:B2

If elements in B.bed are within 40 bases or less, then I think the result could be muddled by multiple overlap categories.

Also, this isn't strictly a merge, but simply taking the min-max bounds of all spanning elements in a "span group". There could be gaps within these bounds (as in this example) that a strict merge would be split upon.

ADD COMMENTlink modified 4 months ago • written 4 months ago by Alex Reynolds27k

Thank you. I've got to think about this, using real data. Because there is no restriction to separation between elements in real-life B.bed data....Thanks again...Like Ram suggested, I might have to cobble up some code for these customized analyses...

ADD REPLYlink written 4 months ago by Anand Rao190
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 753 users visited in the last hour