Off topic: Meaning of the identifier of a fasta file
8.4 years ago
Lluís R. ★ 1.2k

I found a fasta file (which I must analyse) whose headers contain some identifiers and information that I don't fully understand:

>NM_001103386.01.e12_cds11 chrX 23878 11577 11716 FWD(+) 140bp frame: 1
>NM_001014709.01.i03_cds02-cds03 chrX 173667 6453 6751 REV(-) 299bp frame: 0

The first part is a RefSeq mRNA accession from NCBI (NM_001103386), followed by some obscure cds identifiers, the chromosome, the position of something, and the start and end positions of the feature on that chromosome. Then comes whether it is forward or reverse, the length (end minus start, plus one: 11716 - 11577 + 1 = 140) and the frame.
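To make that reading of the header concrete, here is a minimal Python sketch that splits one header line into named fields. The field names (`feature`, `pos`, `start`, `end` and so on) are my own labels based on the guesses above, not an official specification of this format, and `pos` in particular is a value whose meaning is still unknown.

```python
import re

# Field names are guesses based on the two example headers, not a documented format.
HEADER_RE = re.compile(
    r"^>?(?P<accession>[A-Z]{2}_\d+)\.(?P<version>\d+)\.(?P<feature>\S+)\s+"
    r"(?P<chrom>\S+)\s+(?P<pos>\d+)\s+(?P<start>\d+)\s+(?P<end>\d+)\s+"
    r"(?P<strand>FWD|REV)\((?P<sign>[+-])\)\s+"
    r"(?P<length>\d+)bp\s+frame:\s*(?P<frame>\d)$"
)


def parse_header(line):
    """Split one header line into a dict of fields (numbers converted to int)."""
    match = HEADER_RE.match(line.strip())
    if match is None:
        raise ValueError("header does not match the assumed layout: %r" % line)
    fields = match.groupdict()
    for key in ("pos", "start", "end", "length", "frame"):
        fields[key] = int(fields[key])
    return fields


if __name__ == "__main__":
    header = ">NM_001103386.01.e12_cds11 chrX 23878 11577 11716 FWD(+) 140bp frame: 1"
    fields = parse_header(header)
    print(fields["accession"], fields["feature"], fields["strand"])
    # The declared length should equal end - start + 1 if the guess above is right.
    print(fields["end"] - fields["start"] + 1 == fields["length"])
```

Both example headers satisfy the `end - start + 1 == length` relation, which supports reading the two numbers before FWD/REV as inclusive start and end coordinates.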

But how do I use this?

I am trying to extract the feature from the sequence. At first I thought I should use the frame number together with FWD/REV to select the right sequence, but now I suspect that each sequence may already be the whole feature. Does anyone know where these data could come from? Should I use each sequence as is, or do I need to find the right strand and the right frame?
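One check I can run myself is to compare the length declared in each header (the `140bp` part) with the actual length of the sequence under it. This is only a sketch: the helper names `read_fasta`, `declared_length`, `length_report` and `reverse_complement` are my own, and the file name in the usage example is a placeholder. If the lengths agree for every record, that would suggest each sequence is already the whole feature, although it would not tell me which strand it was written on.

```python
import re


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line, []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def declared_length(header):
    """Return the integer in front of 'bp' in the header, or None if absent."""
    match = re.search(r"(\d+)bp", header)
    return int(match.group(1)) if match else None


def length_report(lines):
    """Return (record id, declared length, actual sequence length) per record."""
    return [
        (header.split()[0].lstrip(">"), declared_length(header), len(seq))
        for header, seq in read_fasta(lines)
    ]


# Only needed if the REV(-) records turn out to be stored on the forward strand.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Reverse-complement a DNA string (A, C, G, T, N; case preserved)."""
    return seq.translate(_COMPLEMENT)[::-1]


if __name__ == "__main__":
    import sys

    # Usage: python check_lengths.py my_file.fasta   (placeholder file name)
    with open(sys.argv[1]) as handle:
        for record_id, declared, actual in length_report(handle):
            status = "match" if declared == actual else "MISMATCH"
            print(record_id, declared, actual, status)
```

How to apply the frame value is still open; I have not included it here because I do not know whether it counts bases to skip or something else.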

fasta • 1.5k views