Question

Off topic: Meaning of the identifier of a fasta file

8.4 years ago

Lluís R. ★ 1.2k

I found a fasta file (which I must analyse) whose identifiers contain information I don't completely understand.

>NM_001103386.01.e12_cds11 chrX 23878 11577 11716 FWD(+) 140bp frame: 1
>NM_001014709.01.i03_cds02-cds03 chrX 173667 6453 6751 REV(-) 299bp frame: 0

The first part is an NCBI RefSeq accession (NM_001103386), followed by some obscure cds identifiers, the chromosome, the position of something, and the start and end positions of the feature on that chromosome. Then comes whether it is forward or reverse, the length (end minus start, plus one; for example 11716 - 11577 + 1 = 140), and the frame.
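To make that reading of the header concrete, here is a minimal Python sketch that splits one header line into named fields. The field names (`accession`, `suffix`, `feature`, `unknown`, and so on) are my own labels rather than anything documented for this format, and the meaning of the `01` suffix and of the third column is still unknown to me.

```python
import re

# Hypothetical field names: this layout is inferred from the two example
# headers only, not taken from any documentation of the format.
HEADER_RE = re.compile(
    r"^>?(?P<accession>[A-Z]{2}_\d+)\.(?P<suffix>\d+)\.(?P<feature>\S+)\s+"
    r"(?P<chrom>\S+)\s+(?P<unknown>\d+)\s+(?P<start>\d+)\s+(?P<end>\d+)\s+"
    r"(?P<strand>FWD\(\+\)|REV\(-\))\s+(?P<length>\d+)bp\s+"
    r"frame:\s*(?P<frame>\d+)$"
)


def parse_header(line):
    """Split one header line into a dict of fields (names are assumptions)."""
    match = HEADER_RE.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognised header: {line!r}")
    fields = match.groupdict()
    # Convert the numeric columns to integers.
    for key in ("unknown", "start", "end", "length", "frame"):
        fields[key] = int(fields[key])
    # Normalise FWD(+) / REV(-) to a single strand character.
    fields["strand"] = "+" if fields["strand"].startswith("FWD") else "-"
    return fields


header = ">NM_001103386.01.e12_cds11 chrX 23878 11577 11716 FWD(+) 140bp frame: 1"
fields = parse_header(header)
# The stated length agrees with end - start + 1 for this header.
print(fields["accession"], fields["strand"],
      fields["end"] - fields["start"] + 1 == fields["length"])
```

Both example headers satisfy `end - start + 1 == length`, which supports reading the two columns before the strand as inclusive start and end coordinates.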

But how do I use this?

I am trying to extract the feature from the sequence. I thought I should use the frame number together with FWD and REV to select the right sequence, but now I suspect that the whole sequence may already be the feature. Does anyone know where these data could come from? Should I use the whole sequence as is, or should I find the right strand and the right frame?
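One check that might help decide between the two options: compare the length stated in each header (the `NNNbp` token) with the actual length of the sequence under it. If they always match, the record is probably already the feature. This is only a sketch; the helper names are hypothetical, and the sequence in the example is a placeholder, not real data from the file.

```python
def declared_length(header):
    """Return the length from the 'NNNbp' token in the header, or None."""
    for token in header.split():
        if token.endswith("bp") and token[:-2].isdigit():
            return int(token[:-2])
    return None


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def sequence_is_whole_feature(header, sequence):
    """True when the sequence length equals the length stated in the header."""
    return declared_length(header) == len(sequence)


# Placeholder record: the header is from the file, the sequence is made up.
example = [
    ">NM_001103386.01.e12_cds11 chrX 23878 11577 11716 FWD(+) 140bp frame: 1",
    "A" * 70,
    "A" * 70,
]
for header, sequence in read_fasta(example):
    print(header.split()[0], sequence_is_whole_feature(header, sequence))
```

On the real file, `read_fasta(open(path))` would be used instead of the placeholder list. A consistent match would not by itself say whether REV(-) records are stored already reverse-complemented; that would still need to be checked against the reference.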

fasta • 1.5k views

updated 20 months ago by Ram 43k