Question: Splitting a TSS extended by 1000bp into 500bp windows produces a single base-pair window
James Ashmore (UK/Edinburgh/MRC Centre for Regenerative Medicine) wrote, 3.3 years ago:

I ran into this issue today while plotting TSS occupancy heatmaps. If you take the coordinates of a single transcription start site, extend them by 1000bp upstream and downstream, and cut this region into 500bp windows, you end up with 5 windows, not 4 as I had assumed.

Take the TSS of a gene:

# test.bed
chr1    4857693    4857694    Tcea1    1    +

Increase the size by 1000bp upstream and downstream:

bedtools slop -i test.bed -g mm10.chromsizes -b 1000 > test.plusminus1000bp.bed

Check the output of bedtools slop:

# test.plusminus1000bp.bed
chr1    4856693    4858694    Tcea1    1    +
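Because BED intervals are half-open, a record's length is simply end - start. A minimal awk sketch (bed_lengths is a hypothetical helper, not a bedtools command) makes the length of the slopped record explicit: the 1bp TSS plus 1000bp on each side gives 2001bp, not 2000bp.

```shell
# Print chrom, start, end and length for each BED record.
# BED intervals are half-open [start, end), so length = end - start.
bed_lengths() {
  awk 'BEGIN { OFS = "\t" } { print $1, $2, $3, $3 - $2 }' "$@"
}

# The slopped TSS record from above
printf 'chr1\t4856693\t4858694\tTcea1\t1\t+\n' | bed_lengths
# prints: chr1  4856693  4858694  2001
```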

Split feature into 500bp windows:

bedtools makewindows -b test.plusminus1000bp.bed -w 500 > test.plusminus1000bp.window500bp.bed

Check the output of bedtools makewindows:

# test.plusminus1000bp.window500bp.bed
chr1    4856693    4857193
chr1    4857193    4857693
chr1    4857693    4858193
chr1    4858193    4858693
chr1    4858693    4858694
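The five windows follow from walking along the region in 500bp steps: the fifth step starts at 4858693, one base before the end, so the last window is truncated to 1bp. The awk sketch below (make_windows is a hypothetical helper reproducing the output above, not bedtools' own implementation) shows that logic:

```shell
# Split each BED interval into consecutive windows of W bases, starting at the
# interval's start. A remainder shorter than W becomes a final, shorter window.
make_windows() {
  w=$1; shift
  awk -v w="$w" 'BEGIN { OFS = "\t" }
    {
      for (s = $2; s < $3; s += w) {
        e = s + w
        if (e > $3) e = $3   # truncate the last window at the region end
        print $1, s, e
      }
    }' "$@"
}

printf 'chr1\t4856693\t4858694\n' | make_windows 500
# prints the same five windows as above, ending with chr1  4858693  4858694
```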

The last feature in the file is a single base-pair window. I assume this happens because of the 0-based coordinate system, but it is not obvious that such a window will be produced. Could this output change the results of an analysis that assumes all windows are the same length? Would it be better to remove this single base-pair window?
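If simply dropping the short window is acceptable, one option is to keep only windows whose length equals the requested width. This is a minimal sketch (keep_full_windows is a hypothetical helper); note that it leaves the last base of the padded region uncovered, whereas the answers below avoid the remainder by padding asymmetrically instead.

```shell
# Keep only windows that are exactly W bases long (end - start == W),
# dropping any shorter remainder window.
keep_full_windows() {
  w=$1; shift
  awk -v w="$w" '$3 - $2 == w' "$@"
}

printf 'chr1\t4856693\t4857193\nchr1\t4858193\t4858693\nchr1\t4858693\t4858694\n' |
  keep_full_windows 500
# prints only the two 500bp windows
```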

Alex Reynolds (Seattle, WA USA) wrote, 3.3 years ago:

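This is due to the half-open nature of how BED elements are indexed. The TSS record [4857693, 4857694) covers exactly one base, so padding it by 1000bp on each side gives [4856693, 4858694), a 2001bp region. Since 2001 = 4 × 500 + 1, chopping it into 500bp windows leaves a 1bp remainder. You can fix this with an asymmetric range operation using BEDOPS bedops --range, and then generate windows with bedops --chop.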

For instance:

$ echo -e 'chr1\t4857693\t4857694\tTcea1\t1\t+' | bedops --range -1000:999 --everything - | bedops --chop 500 -
chr1    4856693    4857193
chr1    4857193    4857693
chr1    4857693    4858193
chr1    4858193    4858693

BEDOPS natively works with standard input/output streams, which makes pipelines like this one expressive and fast on large-scale datasets.

Note that --chop calculates new elements, so only the first three columns (chromosome, start, end) are kept.
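To see why the asymmetric range works, compare the region length and remainder for symmetric and asymmetric padding. The helpers below (pad and window_summary) are hypothetical awk sketches of the arithmetic only, not BEDOPS or bedtools commands; unlike bedtools slop with -g, pad does not clamp ends to chromosome sizes.

```shell
# pad L R: subtract L from each start (clamped at 0) and add R to each end.
pad() {
  l=$1; r=$2; shift 2
  awk -v l="$l" -v r="$r" 'BEGIN { OFS = "\t" }
    {
      s = $2 - l
      if (s < 0) s = 0
      print $1, s, $3 + r
    }' "$@"
}

# window_summary W: print chrom, region length, number of full W-base windows,
# and the length of the leftover window (0 means there is no leftover).
window_summary() {
  w=$1; shift
  awk -v w="$w" 'BEGIN { OFS = "\t" }
    {
      len = $3 - $2
      print $1, len, int(len / w), len % w
    }' "$@"
}

printf 'chr1\t4857693\t4857694\n' | pad 1000 1000 | window_summary 500
# prints: chr1  2001  4  1
printf 'chr1\t4857693\t4857694\n' | pad 1000 999 | window_summary 500
# prints: chr1  2000  4  0
```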

dariober (Glasgow - UK) wrote, 3.3 years ago:

I think your reasoning is correct. You could also fix it by using -l 1000 -r 999 instead of -b 1000:

bedtools slop -i test.bed -g mm10.chromsizes -l 1000 -r 999 > test.plusminus1000bp.bed
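Whichever fix you use, it may be worth asserting the equal-width assumption from the question before plotting. A minimal sketch (check_window_width is a hypothetical helper) that fails if any window differs from the expected width:

```shell
# Exit with status 0 if every BED record is exactly W bases long; otherwise
# print the offending records to stderr and exit with status 1.
check_window_width() {
  w=$1; shift
  awk -v w="$w" '
    BEGIN { bad = 0 }
    $3 - $2 != w { print "unexpected width: " $0 > "/dev/stderr"; bad = 1 }
    END { exit bad }' "$@"
}

printf 'chr1\t4856693\t4857193\nchr1\t4858193\t4858693\n' |
  check_window_width 500 && echo "all windows are 500bp"
# prints: all windows are 500bp
```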

 

Powered by Biostar version 2.3.0