Question

Pulling out interval adjoining regions

0

Entering edit mode

6.9 years ago

rbronste ▴ 420

Looking for a good way to take a set of intervals and print out an interval set (bed file) that represents regions just upstream and downstream of every interval in the original file, lets say 10kb up and downstream. Any help appreciated, thanks!

bedtools bedops bed interval • 1.9k views

ADD COMMENT • link updated 6.9 years ago by Alex Reynolds 36k • written 6.9 years ago by rbronste ▴ 420

score 3 · Answer 1 · 2018-11-26

3

Entering edit mode

6.9 years ago

Damian Kao 16k

As with most interval operations, bedtools has a command for it:

https://bedtools.readthedocs.io/en/latest/content/tools/flank.html

ADD COMMENT • link 6.9 years ago by Damian Kao 16k

0

Entering edit mode

I didn't know that one, thanks !

ADD REPLY • link 6.9 years ago by Pierre Lindenbaum 166k

score 1 · Answer 2 · 2018-11-26

1

Entering edit mode

6.9 years ago

Pierre Lindenbaum 166k

 awk -F '\t' '{X=10000; B=int($2);E=int($3);printf("%s\t%d\t%d\n%s\t%d\t%d\n",$1,B-X<0?0:B-X,B,$1,E,E+X);}'

ADD COMMENT • link 6.9 years ago by Pierre Lindenbaum 166k

score 1 · Answer 3 · 2018-11-26

1

Entering edit mode

6.9 years ago

bernatgel ★ 3.4k

If you are using R, you can do it with the flank function in GenomicRanges. It takes into account the chromosome lengths, if present.

https://bioconductor.org/packages/3.7/bioc/vignettes/GenomicRanges/inst/doc/GenomicRangesIntroduction.pdf

ADD COMMENT • link 6.9 years ago by bernatgel ★ 3.4k

score 1 · Answer 4 · 2018-11-26

Yep, just use BEDOPS bedmap --range to map padded elements:

$ bedmap --skip-unmapped --echo-map --range 10000 reference.map map.bed | awk '(!a[$0]++)' | sort-bed - > answer.bed

We use awk to strip duplicates from unsorted results. Sorting is necessary because we use --echo-map, where mapped elements can be returned out of order.

The file answer.bed will contain unique elements from map.bed that overlap elements from a 10kb-padded version of reference.bed.

score 1 · Answer 5 · 2018-11-27

Here's another approach that uses bedops --range:

$ bedops --merge reference.bed | bedops --range 10000 - | bedops --element-of 1 map.bed - > answer.bed

The file answer.bed will contain unique elements from map.bed that overlap elements from a 10kb-padded version of reference.bed. Adjust padding, as needed.

Merging the reference intervals before padding should handle overlaps, which avoids the need to filter duplicates and resort. So this should work faster than using bedmap --range, I think.