Question: Pulling out interval adjoining regions
rbronste wrote, 16 days ago:

I'm looking for a good way to take a set of intervals and write out a new interval set (BED file) covering the regions just upstream and downstream of every interval in the original file, let's say 10 kb on each side. Any help appreciated, thanks!

Tags: interval, bedops, bed, bedtools
Damian Kao wrote, 16 days ago:

As with most interval operations, bedtools has a command for it:

https://bedtools.readthedocs.io/en/latest/content/tools/flank.html
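
A minimal sketch of how that might look for the 10 kb case. The file names (intervals.bed, genome.txt, flanks.bed) and the toy coordinates are made up for illustration. bedtools flank needs a tab-delimited chromosome-sizes file via -g so that it can clip flanks at the chromosome ends; -b sets the same flank size on both sides.

```shell
# Toy input; file names and coordinates are hypothetical.
printf 'chr1\t15000\t20000\nchr1\t42000\t45000\n' > intervals.bed
# Genome file: chromosome name and length, tab-delimited.
printf 'chr1\t50000\n' > genome.txt

# -b 10000 asks for 10 kb flanks on both sides of each interval
# (-l and -r set the left and right sizes separately).
# The guard only skips the call when bedtools is not on the PATH.
if command -v bedtools >/dev/null 2>&1; then
    bedtools flank -i intervals.bed -g genome.txt -b 10000 > flanks.bed
fi
```

For a real genome, the sizes file would list every chromosome that appears in the BED file.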


Pierre Lindenbaum replied, 16 days ago:

I didn't know that one, thanks!
Pierre Lindenbaum wrote, 16 days ago:
 awk -F '\t' '{X=10000; B=int($2);E=int($3);printf("%s\t%d\t%d\n%s\t%d\t%d\n",$1,B-X<0?0:B-X,B,$1,E,E+X);}'
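
Note that the one-liner above clamps the upstream flank at 0 but lets the downstream flank run past the end of the chromosome. Here is a sketch of one way to clip both ends, assuming a tab-delimited chromosome-sizes file (name, then length) is available. The function name flank_bed and the file names in the usage line are made up; it also skips flanks that would have zero length.

```shell
# flank_bed PAD GENOME BED: print PAD-bp flanks for each BED interval,
# clipped to [0, chromosome length]. Names here are illustrative only.
flank_bed() {
    awk -F '\t' -v X="$1" '
        # First file: chromosome sizes (name, length).
        NR == FNR { len[$1] = int($2); next }
        # Second file: BED intervals.
        {
            B = int($2); E = int($3)
            up = (B - X < 0) ? 0 : B - X
            down = E + X
            # Clip the downstream flank if the chromosome length is known.
            if (($1 in len) && down > len[$1]) down = len[$1]
            if (up < B)   printf("%s\t%d\t%d\n", $1, up, B)     # upstream flank
            if (E < down) printf("%s\t%d\t%d\n", $1, E, down)   # downstream flank
        }' "$2" "$3"
}
```

Usage would be something like: flank_bed 10000 genome.txt intervals.bed > flanks.bed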
bernatgel wrote, 16 days ago:

If you are using R, you can do it with the flank function in GenomicRanges. It takes into account the chromosome lengths, if present.

https://bioconductor.org/packages/3.7/bioc/vignettes/GenomicRanges/inst/doc/GenomicRangesIntroduction.pdf

Alex Reynolds wrote, 16 days ago:

Yep, just use BEDOPS bedmap --range to map padded elements:

$ bedmap --skip-unmapped --echo-map --multidelim '\n' --range 10000 reference.bed map.bed | awk '(!a[$0]++)' | sort-bed - > answer.bed

We use awk to strip duplicates from unsorted results. Sorting is necessary because we use --echo-map, where mapped elements can be returned out of order.

The file answer.bed will contain unique elements from map.bed that overlap elements from a 10kb-padded version of reference.bed.
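
To make the expected result concrete, here is a rough plain-awk sketch of the same selection. It only illustrates what should end up in answer.bed, not how BEDOPS computes it: it compares every map element against every reference element, so it will be slow on real data. The function name padded_overlap is made up.

```shell
# padded_overlap PAD REF MAP: print each MAP element once if it overlaps
# any REF element padded by PAD bases on both sides (brute force).
padded_overlap() {
    awk -F '\t' -v X="$1" '
        NR == FNR {                      # first file: reference intervals
            chr[NR] = $1
            s[NR] = (int($2) - X < 0) ? 0 : int($2) - X
            e[NR] = int($3) + X
            n = NR
            next
        }
        {                                # second file: map elements
            for (i = 1; i <= n; i++)
                # Half-open overlap test against the padded interval.
                if ($1 == chr[i] && int($2) < e[i] && int($3) > s[i]) {
                    print
                    break                # report each map element only once
                }
        }' "$2" "$3"
}
```

Usage would be something like: padded_overlap 10000 reference.bed map.bed > answer.bed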

Alex Reynolds wrote, 15 days ago:

Here's another approach that uses bedops --range:

$ bedops --merge reference.bed | bedops --range 10000 - | bedops --element-of 1 map.bed - > answer.bed

The file answer.bed will contain unique elements from map.bed that overlap elements from a 10kb-padded version of reference.bed. Adjust padding, as needed.

Merging the reference intervals before padding should handle overlaps, which avoids the need to filter duplicates and re-sort. So this should run faster than using bedmap --range, I think.
