Question: Merge intervals between but NOT within 2 BED files
0
gravatar for Anand Rao
11 months ago by
Anand Rao210
United States
Anand Rao210 wrote:

I have BED files for 2 types of coords: A and B. I want to collapse them into intervals where an entry of A is separated from an entry of B by not more than 20, and vice versa. What is not acceptable is a merge of coordinates with 2 or more entries from just A or 2 or more entries from just B.

For example, let's say:

cat A.bed 
Chr1    10  20  A1
Chr1    50  60  A2
Chr1    75  100 A3

and

cat B.bed 
Chr1    25  40  B1
Chr1    115 160 B2

Now

cat A.bed B.bed > AB.bed
sortBed -i AB.bed > AB_sort.bed

The output from mergeBed with -d 20 gives what I do NOT want!

mergeBed -d 20 -i AB_sort.bed -c 4 -o collapse -delim ","
Chr1    10  160 A1,B1,A2,A3,B2

Results look different when I visually parse bedtools closest results seen below

bedtools closest -d -a A.bed -b B.bed 
Chr1    10  20  A1  Chr1    25  40  B1  6
Chr1    50  60  A2  Chr1    25  40  B1  11
Chr1    75  100 A3  Chr1    115 160 B2  16

What combinations of bedtools's sub-commands should I use to get the type of result that I seek, shown below:

AB_dist20_merge.bed
Chr1    10  60  A1,B1,A2
Chr1    75  160 A3,B2

Thanks to Alex for pointing out an error, changed from A3,B1 to A3,B2 in the line above

In my manually parsed output example above, I do NOT collapse coords for features A2 and A3 together (the rule mentioned in the intro) whereas mergeBed is agnostic of this. Any ideas on how to use bedtools and / or bedOps?

note to self: Images (with and without annotation) to help visualize this example has been added below...

Raw Image for example Visualization of example

Annotated image for example AnnotatedImage

bedops merge overlap bedtools • 430 views
ADD COMMENTlink modified 11 months ago by Alex Reynolds28k • written 11 months ago by Anand Rao210
1

Can you explain the A3,B1 part of your expected result? That doesn't make sense in the context of your question.

You can do this with bedmap very easily, I think, but I need to understand if your expected output is correct, first.

ADD REPLYlink modified 11 months ago • written 11 months ago by Alex Reynolds28k

Thank you for pointing that out, Alex. I fixed the error above. Look forward to your response. Thanks, in advance.

ADD REPLYlink written 11 months ago by Anand Rao210

I think this is complicated enough to warrant a custom python/perl/R/ruby/haskell/FORTRAN program :-)

ADD REPLYlink written 11 months ago by RamRS23k
1
gravatar for Alex Reynolds
11 months ago by
Alex Reynolds28k
Seattle, WA USA
Alex Reynolds28k wrote:

I think this logic should work, so long as elements in B.bed are not within each other by 20*2 (40) bases, and elements in A.bed and B.bed are sorted with sort-bed (not sortBed):

$ bedmap --echo-map --range 20 --delim '\t' --echo A.bed B.bed \
    | sort -k4,4 \
    | awk -vOFS="\t" -vFS="\t" '{ \
          a[$4]["chr"] = $1; \
          a[$4]["start"] = ($2 < $6) ? ($2 < a[$4]["start"] ? $2 : a[$4]["start"]) : $6; \
          a[$4]["stop"] = ($3 > $7) ? ($3 > a[$4]["stop"] ? $3 : a[$4]["stop"]) : $7; \
          a[$4]["ids"] = a[$4]["ids"] ":" $8; \
      } \
      END { \
          for (i in a) { \
              print a[i]["chr"], a[i]["start"], a[i]["stop"], a[i]["ids"]":"i; \
          } \
      }' \
    | sort-bed - \
    > answer.bed

Test run:

$ cat answer.bed
Chr1    10  60  :A1:A2:B1
Chr1    75  160 :A3:B2

If elements in B.bed are within 40 bases or less, then I think the result could be muddled by multiple overlap categories.

Also, this isn't strictly a merge, but simply taking the min-max bounds of all spanning elements in a "span group". There could be gaps within these bounds (as in this example) that a strict merge would be split upon.

ADD COMMENTlink modified 11 months ago • written 11 months ago by Alex Reynolds28k

Thank you. I've got to think about this, using real data. Because there is no restriction to separation between elements in real-life B.bed data....Thanks again...Like Ram suggested, I might have to cobble up some code for these customized analyses...

ADD REPLYlink written 11 months ago by Anand Rao210
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 542 users visited in the last hour