Question: Pulling out interval adjoining regions
rbronste wrote:

I'm looking for a good way to take a set of intervals and write out an interval set (BED file) containing the regions just upstream and downstream of every interval in the original file, let's say 10 kb on each side. Any help appreciated, thanks!

interval bedops bed bedtools • 456 views
modified 2.0 years ago by Alex Reynolds • written 2.0 years ago by rbronste
Damian Kao wrote:

As with most interval operations, bedtools has a command for it:

https://bedtools.readthedocs.io/en/latest/content/tools/flank.html
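
For reference, a minimal sketch of what a bedtools flank call might look like. The file names and the chromosome size are made up for illustration; bedtools flank needs a genome file (chromosome name and length, tab-separated) so it can clip flanks at the chromosome ends. The block skips the call if bedtools is not installed.

```shell
# Hypothetical input: two intervals, one of them within 10 kb of the chromosome start.
printf 'chr1\t20000\t21000\nchr1\t5000\t6000\n' > intervals.bed

# Hypothetical genome file: chromosome name and length, tab-separated.
printf 'chr1\t1000000\n' > genome.txt

if command -v bedtools >/dev/null 2>&1; then
    # -b adds a flank of the given size on both sides of each interval.
    bedtools flank -i intervals.bed -g genome.txt -b 10000 > flanks_bedtools.bed
    cat flanks_bedtools.bed
else
    echo "bedtools not found; skipping" >&2
fi
```

Use -l and -r instead of -b if the upstream and downstream flanks should have different sizes, and -s if "upstream" should follow the strand column.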

written 2.0 years ago by Damian Kao

I didn't know that one, thanks!

reply written 2.0 years ago by Pierre Lindenbaum
Pierre Lindenbaum wrote:
 awk -F '\t' '{X=10000; B=int($2);E=int($3);printf("%s\t%d\t%d\n%s\t%d\t%d\n",$1,B-X<0?0:B-X,B,$1,E,E+X);}'
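
A small worked example of the same idea, as a sketch only: input.bed and flanks.bed are hypothetical file names, the flank size is passed in with -v, and the extra check that skips an empty upstream flank (when an interval starts at 0) is an addition, not part of the one-liner above. Like the one-liner, it does not clip the downstream flank at the chromosome end.

```shell
# Hypothetical input; the second interval starts within 10 kb of the chromosome start.
printf 'chr1\t20000\t21000\nchr1\t5000\t6000\n' > input.bed

awk -F '\t' -v X=10000 'BEGIN { OFS = "\t" }
{
    B = int($2); E = int($3)
    up = (B - X < 0) ? 0 : B - X      # clamp the upstream start at 0
    if (up < B) print $1, up, B       # skip an empty flank when B is 0
    print $1, E, E + X                # downstream flank, not clipped to chromosome length
}' input.bed > flanks.bed

cat flanks.bed
# chr1    10000   20000
# chr1    21000   31000
# chr1    0       5000
# chr1    6000    16000
```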
written 2.0 years ago by Pierre Lindenbaum
bernatgel wrote:

If you are using R, you can do it with the flank function in GenomicRanges. If chromosome lengths (seqlengths) are set on the object, flanks that run past a chromosome end are reported with a warning and can be clipped with trim.

https://bioconductor.org/packages/3.7/bioc/vignettes/GenomicRanges/inst/doc/GenomicRangesIntroduction.pdf

written 2.0 years ago by bernatgel
Alex Reynolds wrote:

Yep, just use BEDOPS bedmap --range to map padded elements:

$ bedmap --skip-unmapped --echo-map --range 10000 reference.bed map.bed | awk '(!a[$0]++)' | sort-bed - > answer.bed

We use awk to strip duplicates from unsorted results. Sorting is necessary because we use --echo-map, where mapped elements can be returned out of order.

The file answer.bed will contain unique elements from map.bed that overlap elements from a 10kb-padded version of reference.bed.
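
To show what the awk filter and the sort step do on their own, here is a small sketch on made-up lines. It uses plain sort in place of sort-bed so that it runs without BEDOPS installed; dedup.bed is a hypothetical output name.

```shell
# Three unsorted lines, one of them a duplicate (made-up data).
printf 'chr1\t100\t200\nchr1\t50\t80\nchr1\t100\t200\n' |
    awk '!seen[$0]++' |                 # keep only the first occurrence of each line
    sort -k1,1 -k2,2n -k3,3n > dedup.bed  # stand-in for sort-bed: chromosome, then start, then end

cat dedup.bed
# chr1    50      80
# chr1    100     200
```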

modified 2.0 years ago • written 2.0 years ago by Alex Reynolds
Alex Reynolds wrote:

Here's another approach that uses bedops --range:

$ bedops --merge reference.bed | bedops --range 10000 - | bedops --element-of 1 map.bed - > answer.bed

The file answer.bed will contain unique elements from map.bed that overlap elements from a 10kb-padded version of reference.bed. Adjust padding, as needed.

Merging the reference intervals before padding should handle overlaps, which avoids the need to filter duplicates and re-sort. So I think this should run faster than the bedmap --range approach.
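
The merge-then-pad idea can be illustrated without BEDOPS. The following is a plain awk sketch of the concept, not the BEDOPS implementation: it merges overlapping or adjoining intervals from a sorted file, then pads each merged interval by X bases, clamping the start at 0. The input data and the padded.bed name are made up.

```shell
# Made-up reference intervals: the first two overlap, the third stands alone.
printf 'chr1\t100\t300\nchr1\t200\t500\nchr1\t20000\t21000\n' > reference.bed

sort -k1,1 -k2,2n reference.bed | awk -F '\t' -v X=10000 'BEGIN { OFS = "\t" }
# Print the current merged interval, padded by X on both sides.
function emit(   s) { s = cs - X; if (s < 0) s = 0; print cc, s, ce + X }
NR == 1 { cc = $1; cs = $2 + 0; ce = $3 + 0; next }
# Same chromosome and starting at or before the current end: extend the merge.
$1 == cc && ($2 + 0) <= ce { if (($3 + 0) > ce) ce = $3 + 0; next }
# Otherwise flush the current interval and start a new one.
{ emit(); cc = $1; cs = $2 + 0; ce = $3 + 0 }
END { if (NR) emit() }' > padded.bed

cat padded.bed
# chr1    0       10500
# chr1    10000   31000
```

Note that padded intervals can overlap each other again, as they do here; for an overlap query against map.bed that is harmless, since each map element is still reported once.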

written 2.0 years ago by Alex Reynolds