Question: Pulling out interval adjoining regions
rbronste wrote (11 weeks ago):

I'm looking for a good way to take a set of intervals and write out a new interval set (BED file) containing the regions immediately upstream and downstream of every interval in the original file, let's say 10 kb on each side. Any help appreciated, thanks!

Tags: interval, bedops, bed, bedtools
Damian Kao wrote (11 weeks ago):

As with most interval operations, bedtools has a command for it:

https://bedtools.readthedocs.io/en/latest/content/tools/flank.html
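
A minimal sketch of what that might look like for 10 kb flanks. The wrapper name `flank_both` and the file names in the usage comment are placeholders, not anything from the question. `bedtools flank` needs a tab-delimited chromosome-sizes file (`-g`) so that flanks can be clipped at chromosome ends; `-b` sets the flank size on both sides, while `-l` and `-r` set each side separately.

```shell
# Hypothetical wrapper: flank_both IN.bed GENOME.txt SIZE
# Prints SIZE-bp flanks on both sides of every interval in IN.bed.
# GENOME.txt is a two-column, tab-delimited file: chromosome name, length.
flank_both() {
    bedtools flank -i "$1" -g "$2" -b "$3"
}

# Example usage (placeholder file names):
#   flank_both input.bed genome.txt 10000 > flanks.bed
```

Add `-s` if "upstream" and "downstream" should follow the strand column rather than plain coordinate order.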


I didn't know that one, thanks!

(reply from Pierre Lindenbaum, 11 weeks ago)
Pierre Lindenbaum wrote (11 weeks ago):
 awk -F '\t' '{X=10000; B=int($2);E=int($3);printf("%s\t%d\t%d\n%s\t%d\t%d\n",$1,B-X<0?0:B-X,B,$1,E,E+X);}'
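
The one-liner above reads BED lines from standard input and clamps the upstream flank at 0, but it knows nothing about chromosome ends. Here is a rough sketch of a variant that also clips the downstream flank, assuming you have a two-column chromosome-sizes file; the helper name `flank_awk` and its argument order are made up for illustration.

```shell
# Hypothetical helper: flank_awk GENOME.txt IN.bed [SIZE]
# GENOME.txt: tab-delimited chromosome name and length.
# SIZE defaults to 10000 (10 kb).
flank_awk() {
    awk -F '\t' -v OFS='\t' -v X="${3:-10000}" '
        # First file: remember each chromosome length as a number.
        NR == FNR { len[$1] = $2 + 0; next }
        {
            B = int($2); E = int($3)
            up = (B - X < 0) ? 0 : B - X
            down = E + X
            # Clip only when the chromosome length is known.
            if (($1 in len) && down > len[$1]) down = len[$1]
            if (up < B) print $1, up, B       # flank on the low-coordinate side
            if (E < down) print $1, E, down   # flank on the high-coordinate side
        }
    ' "$1" "$2"
}
```

Zero-length flanks (an interval that starts at 0 or ends at the chromosome end) are skipped. Strand is ignored here, so "upstream" just means lower coordinates.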
bernatgel wrote (11 weeks ago):

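If you are using R, you can do it with the flank() function in GenomicRanges. If the GRanges object carries chromosome lengths (seqlengths), flanks that run past a chromosome end are flagged with a warning and can be clipped with trim().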

https://bioconductor.org/packages/3.7/bioc/vignettes/GenomicRanges/inst/doc/GenomicRangesIntroduction.pdf

Alex Reynolds wrote (11 weeks ago):

Yep, just use BEDOPS bedmap --range to map padded elements:

$ bedmap --skip-unmapped --echo-map --range 10000 reference.bed map.bed | awk '(!a[$0]++)' | sort-bed - > answer.bed

We use awk to strip duplicates from unsorted results. Sorting is necessary because we use --echo-map, where mapped elements can be returned out of order.

The file answer.bed will contain unique elements from map.bed that overlap elements from a 10kb-padded version of reference.bed.

Alex Reynolds wrote (11 weeks ago):

Here's another approach that uses bedops --range:

$ bedops --merge reference.bed | bedops --range 10000 --everything - | bedops --element-of 1 map.bed - > answer.bed

The file answer.bed will contain unique elements from map.bed that overlap elements from a 10kb-padded version of reference.bed. Adjust padding, as needed.

Merging the reference intervals before padding should handle overlaps, which avoids the need to filter duplicates and re-sort. So this should run faster than using bedmap --range, I think.
