Question: Bed Coordinates
9.9 years ago by
Florianino30 wrote:

Hi all,

I have installed BEDTools and tried fastaFromBed, but when I ask for positions 1 to 25, the output contains positions 2 to 25 instead. How come?

I had posted that as a comment and got a first reply:

"BED format uses zero-based, half-open coordinates, so the first 25 bases of a sequence are in the range 0-25 (those bases being numbered 0 to 24). – Keith James♦ Mar 12 at 16:33"

So BED coordinates are different from GFF3, for example? How can I confidently reformat columns of start-stop intervals before extracting coordinates using BEDTools?

Thanks in advance!


You may want to see this related question on the pros/cons of different coordinate systems: What Are The Advantages/Disadvantages Of One-Based Vs. Zero-Based Genome Coordinate Systems

written 9.9 years ago by Casey Bergman

9.9 years ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum134k wrote:

So BED coordinates are different from GFF3 for example?

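Yes, the start is shifted by 1: BED starts are 0-based and the end is exclusive, while GFF3 starts are 1-based and the end is inclusive, so only the start value changes between the two. See http://genome.ucsc.edu/FAQ/FAQformat.html#format1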

chromStart - The starting position of the feature in the chromosome or scaffold. The first base in a chromosome is numbered 0.

chromEnd - The ending position of the feature in the chromosome or scaffold. The chromEnd base is not included in the display of the feature. For example, the first 100 bases of a chromosome are defined as chromStart=0, chromEnd=100, and span the bases numbered 0-99.
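The definition above can be checked with a little arithmetic. Here is a minimal sketch, not part of BEDTools: the function name `bed_span` is made up for illustration. Given a BED start and end, it reports how many bases the interval covers and which 1-based, inclusive positions those are.

```shell
# Hypothetical helper (not a BEDTools command): describe a BED interval.
# Usage: bed_span START END
bed_span() {
    awk -v s="$1" -v e="$2" 'BEGIN {
        # length is end - start; the 1-based, inclusive range is start+1 .. end
        printf("length=%d one_based=%d-%d\n", e - s, s + 1, e)
    }'
}

bed_span 0 25    # the first 25 bases
bed_span 1 25    # the interval from the question: starts at the second base
```

The second call shows why asking for 1 to 25 returns bases 2 to 25: a BED start of 1 points just after the first base.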

 

How can I confidently reformat columns of start-stop intervals before extracting coordinates using BEDTools?

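You can simply use awk to subtract 1 from the start column and leave the end column unchanged. For example, for the 1-based, inclusive interval 1-100 on chr1: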

echo -e "chr1\t1\t100" | awk '{printf("%s\t%d\t%d\n",$1,int($2)-1,int($3));}'
chr1    0   100
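The same idea can be applied to a whole GFF3 file. This is only a sketch under some assumptions: the input is tab-separated GFF3 with the start in column 4 and the end in column 5, a three-column BED is enough, and `gff3_to_bed` is a made-up name rather than an existing tool.

```shell
# Hypothetical helper: convert GFF3 features on stdin to 3-column BED on stdout.
# Assumes tab-separated GFF3 with start in column 4 and end in column 5.
gff3_to_bed() {
    awk 'BEGIN { FS = OFS = "\t" }
         /^#/ || NF < 5 { next }        # skip directives, comments and short lines
         { print $1, $4 - 1, $5 }'      # 1-based closed -> 0-based half-open
}

printf 'chr1\t.\tgene\t1\t25\t.\t+\t.\tID=g1\n' | gff3_to_bed
```

For the feature above (bases 1 to 25 in GFF3), this prints chr1, 0 and 25 separated by tabs, which is the interval fastaFromBed expects for the first 25 bases.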
9.6 years ago by
US
Rlong340 wrote:

I have found it useful to think of BED coordinates as marking the spaces between the bases, rather than the bases themselves. I will try to represent this:

| A | C | G | T | A | C | G | T |
0   1   2   3   4   5   6   7   8

So if you wanted to describe the first base, it would be:

chr    0    1

and GTAC:

chr    2    6

Another handy thing to note: you should always be able to subtract the start from the end to get the number of bases you are describing. The one exception is an insertion, which is the only case where you should have start == stop. This makes sense in this scheme, since you are really only calling out a position between two bases, where a bit of sequence has been inserted.
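To make the "spaces between the bases" picture concrete, here is a small sketch that slices a sequence string using BED coordinates. The function name `bed_slice` is hypothetical; it relies only on awk's `substr()`, which counts from 1, so the BED start has to be shifted by one.

```shell
# Hypothetical helper: return the bases of SEQ covered by BED interval START-END.
# Usage: bed_slice SEQ START END
bed_slice() {
    awk -v seq="$1" -v s="$2" -v e="$3" 'BEGIN {
        # awk substr() is 1-based, so the first base is at s + 1;
        # the number of bases is e - s (zero when s == e).
        print substr(seq, s + 1, e - s)
    }'
}

bed_slice ACGTACGT 0 1    # first base: A
bed_slice ACGTACGT 2 6    # GTAC
bed_slice ACGTACGT 4 4    # empty line: a point between two bases
```

The last call returns no bases because start == stop names a position between two bases, not a base.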
