Bed Coordinates
13.1 years ago
Florianino ▴ 30

Hi all,

I have installed BEDTools and tried fastaFromBed, but when I ask for positions 1 to 25, the output contains bases 2 to 25 instead. How come?

I had posted that as a comment and got a first reply:

"BED format uses zero-based, half-open coordinates, so the first 25 bases of a sequence are in the range 0-25 (those bases being numbered 0 to 24). – Keith James♦ Mar 12 at 16:33"

So BED coordinates are different from GFF3, for example? How can I confidently reformat columns of start-stop intervals before extracting sequences with BEDTools?

Thanks in advance!

You may want to see this related question on the pros/cons of different coordinate systems: What Are The Advantages/Disadvantages Of One-Based Vs. Zero-Based Genome Coordinate Systems

13.1 years ago

So BED coordinates are different from GFF3 for example?

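Yes, the start coordinate is shifted by 1: BED starts are zero-based while GFF3 starts are one-based, and the end coordinate is the same number in both formats. See http://genome.ucsc.edu/FAQ/FAQformat.html#format1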

chromStart - The starting position of the feature in the chromosome or scaffold. The first base in a chromosome is numbered 0.

chromEnd - The ending position of the feature in the chromosome or scaffold. The chromEnd base is not included in the display of the feature. For example, the first 100 bases of a chromosome are defined as chromStart=0, chromEnd=100, and span the bases numbered 0-99.

As for

So BED coordinates are different from GFF3 for example? How to confidently reformat columns of start-stop intervals before extracting coordinates using BEDtools?

You can simply use awk. For example:

echo -e "chr1\t1\t100" | awk '{printf("%s\t%d\t%d\n",$1,int($2)-1,int($3));}'
chr1    0   100
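
The same idea can be wrapped into a pair of small shell functions for whole files. This is only a sketch: the function names one_to_bed and bed_to_one are made up for illustration, and it assumes tab-separated input with chrom, start, and end in columns 1 to 3 (a real GFF3 file keeps start and end in columns 4 and 5, so those would have to be extracted first). Any extra columns are passed through unchanged.

```shell
# Hypothetical helpers; both read tab-separated chrom/start/end lines on stdin.

# 1-based inclusive (GFF3-style) -> 0-based half-open (BED):
# subtract 1 from the start, leave the end unchanged.
one_to_bed() {
  awk 'BEGIN { FS = OFS = "\t" } { $2 = $2 - 1; print }'
}

# 0-based half-open (BED) -> 1-based inclusive:
# add 1 to the start, leave the end unchanged.
bed_to_one() {
  awk 'BEGIN { FS = OFS = "\t" } { $2 = $2 + 1; print }'
}

printf 'chr1\t1\t25\n' | one_to_bed   # chr1  0  25 (tab-separated)
printf 'chr1\t0\t25\n' | bed_to_one   # chr1  1  25 (tab-separated)
```

Only the start column changes in either direction, which is why asking for "1 to 25" in a BED file returns bases 2 to 25.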
12.7 years ago
Rlong ▴ 340

I have found it useful to think of BED coordinates as marking the spaces between the bases, rather than the bases themselves. I will try to represent this:

| A | C | G | T | A | C | G | T |
0   1   2   3   4   5   6   7   8

So if you wanted to describe the first base, it would be:

chr    0    1

and GTAC:

chr    2    6

Another handy thing to note: you should always be able to subtract the start from the end to get the number of bases you are describing. The one special case is an insertion, which is the only time you should have start == stop. This makes sense in this scheme, since you are really only calling out a position between two bases, where a bit of sequence has been inserted.
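
To check the "spaces between bases" picture against the ACGTACGT example, here is a rough sketch in shell. The function name bed_slice is hypothetical, and it works on a plain string rather than a FASTA file. Since awk's substr() counts from 1, the zero-based BED start becomes start + 1, and the length is simply end minus start.

```shell
# Hypothetical helper: print the bases of a sequence covered by a BED interval.
# Arguments: sequence, BED start (0-based), BED end (exclusive).
bed_slice() {
  awk -v seq="$1" -v bstart="$2" -v bend="$3" \
    'BEGIN { print substr(seq, bstart + 1, bend - bstart) }'
}

bed_slice ACGTACGT 0 1   # prints A (the first base)
bed_slice ACGTACGT 2 6   # prints GTAC (length 6 - 2 = 4)
bed_slice ACGTACGT 3 3   # prints an empty line: start == end covers no bases
```

The last call corresponds to the insertion case: the interval names a position between two bases and covers none of them.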
