Question: FASTA of an arbitrary genome: Is the first base the 5' end?
5.4 years ago by
Hello fellows,

I know that it doesn't matter which of the two strands you declare the reference. But it should matter whether you store that strand (whichever you have chosen) in 5' -> 3' or 3' -> 5' orientation in a FASTA file. When I download a sequence, is there some convention that the first base is always the 5' (or 3') end? When I have a read that overhangs the beginning of the sequence in the FASTA file, is it correct to say "The read overhangs the 5' end"?

5.4 years ago by
Walnut Creek, USA
Nucleotide fasta is always 5' to 3', as are all other formats.
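A minimal sketch of what the 5' -> 3' convention implies in practice: the opposite strand is obtained by reversing *and* complementing the stored record, and the result is again written 5' -> 3'. Under this convention, a read that overhangs base 1 of a record overhangs the 5' end of the stored strand. The toy sequence is made up for illustration.

```python
# Complement table for upper- and lower-case bases; N stays N.
COMP = str.maketrans("ACGTNacgtn", "TGCANtgcan")

def reverse_complement(seq):
    """Return the opposite strand, also written 5' -> 3'."""
    return seq.translate(COMP)[::-1]

top = "ATGGCTTAC"  # hypothetical record, stored 5' -> 3' as in FASTA
bottom = reverse_complement(top)
print(bottom)  # GTAAGCCAT
```

Complementing alone would give the other strand written 3' -> 5', which is exactly the orientation FASTA files do not use.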

The only exception I am aware of is SOLiD colorspace, but that is not in nucleotide space, and it has the interesting property that the reverse is the same as the reverse complement, so the orientation doesn't matter. It's notable, though, because their chemistry includes an enzyme that reads 3' to 5' for read 2.

Even so, when the colorspace reads were translated to nucleotide space, they were always represented 5' to 3'.
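The "reverse equals reverse complement" property can be checked with a short sketch. It assumes the usual description of SOLiD two-base encoding: bases coded A=0, C=1, G=2, T=3, and each color is the XOR of two adjacent base codes. Complementing a base flips both bits (XOR 3), which cancels out in the XOR, so the complement has the same colors and the reverse complement has the reversed colors. The sequence is a toy example.

```python
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}  # assumed 2-bit base codes
COMP = {"A": "T", "C": "G", "G": "C", "T": "A"}

def to_colors(seq):
    """Encode a sequence as colors: XOR of each pair of adjacent base codes."""
    return [CODE[a] ^ CODE[b] for a, b in zip(seq, seq[1:])]

def revcomp(seq):
    """Reverse complement in nucleotide space."""
    return "".join(COMP[b] for b in reversed(seq))

seq = "ATGGCTTAC"
print(to_colors(seq))  # [3, 1, 0, 3, 2, 0, 3, 1]
# Reverse complement in base space == plain reversal in color space
print(to_colors(revcomp(seq)) == to_colors(seq)[::-1])  # True
```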

If I recall correctly, that technology produced both reads from the same strand, pointing in the same direction, but still reported them in the right orientation. So it may have read the fragment backwards from the end, but it reversed the read when reporting it. This was simple to do because another property of colorspace is that colors read in reverse decode to the reversed sequence.
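A sketch of that decoding property, under the same assumed encoding (A=0, C=1, G=2, T=3, color = XOR of adjacent codes). Decoding needs one known anchor base; each next base is the previous base code XOR the color. Because XOR is symmetric, reading the colors backwards and anchoring on the last base yields the sequence reversed, which is why flipping a backwards read into 5' -> 3' orientation is cheap. The sequence is a toy example.

```python
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}  # assumed 2-bit base codes
BASE = "ACGT"

def to_colors(seq):
    """Encode a sequence as colors: XOR of each pair of adjacent base codes."""
    return [CODE[a] ^ CODE[b] for a, b in zip(seq, seq[1:])]

def decode(anchor, colors):
    """Decode colors starting from a known anchor base: next = prev XOR color."""
    bases = [anchor]
    for c in colors:
        bases.append(BASE[CODE[bases[-1]] ^ c])
    return "".join(bases)

seq = "ATGGCTTAC"
colors = to_colors(seq)
print(decode(seq[0], colors))         # ATGGCTTAC
print(decode(seq[-1], colors[::-1]))  # CATTCGGTA, i.e. the reversed sequence
```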

5.4 years ago by
Matt Shirley
Cambridge, MA
It depends on whether you find the ATG codon at the beginning of a gene that is annotated on the + strand :). I'm not sure that there is a convention, but it would surely be more confusing to distribute sequences as 3' -> 5', so that is probably uncommon.
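The ATG check suggested above can be sketched as follows. This is a hypothetical sanity check, not a standard tool: it assumes the annotation gives a 1-based start coordinate for a + strand coding sequence (as GFF does), and the FASTA record and coordinate are made up. If the record is written 5' -> 3', the three bases at that start should read ATG left to right.

```python
def parse_fasta(text):
    """Minimal FASTA parser: returns {record_id: uppercase sequence}."""
    records, name, chunks = {}, None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line.upper())
    if name is not None:
        records[name] = "".join(chunks)
    return records

def starts_with_atg(genome, start):
    """True if the + strand CDS at 1-based `start` begins with ATG."""
    return genome[start - 1:start + 2] == "ATG"

fasta = ">chrToy hypothetical record\nccATGGCT\nTAAgg\n"
genome = parse_fasta(fasta)["chrToy"]
print(starts_with_atg(genome, 3))  # True
```

Checking many annotated + strand genes this way gives some evidence about orientation; one hit alone proves little, since ATG also occurs by chance.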

5.4 years ago by
Istvan Albert
University Park, USA
This is an interesting question, because my first answer is: of course it is in the 5' to 3' direction. But then I don't know of any rule that requires this, other than that pandemonium would break out otherwise.

