Question: RefSeq (.fna) vs. (.gbff) Files
When I downloaded transcript data from RefSeq via their FTP site, I noticed that there are two file types: .gbff and .fna. Is it correct to assume that the .gbff file (GenBank flat file, I believe) contains EXACTLY the same sequence information as the .fna file (FASTA-format sequences), in the same order, except that the .fna file has only short one-line descriptions for the sequences?

Also, what are the possible last 'words' in the ">..." title for each sequence in the .fna file? I've seen for example 'mRNA' and 'ncRNA', and so forth. Is there a fixed and standardized list by chance?

The > line in a FASTA file is divided into only two parts: the identifier and the description. The identifier runs from the > to the first space. Whatever comes after the first space is the description, which can contain as many spaces as you want, so there is no such thing as a "last word". The description is optional; the identifier is not. And yes, I would say that the sequences should be the same; the gbff file just has more "structured" metadata.
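To make the identifier/description split concrete, here is a minimal Python sketch. The function name `split_fasta_header` and the example header are made up for illustration (the accession is not a real RefSeq record), and the code only implements the "split at the first whitespace" rule described above, nothing RefSeq-specific.

```python
def split_fasta_header(line):
    """Split a FASTA '>' line into (identifier, description).

    Hypothetical helper for illustration: the identifier is everything
    between '>' and the first whitespace; the description is whatever
    follows and may be empty.
    """
    if not line.startswith(">"):
        raise ValueError("not a FASTA header line")
    # Drop the '>' and the line ending, then split once on whitespace.
    parts = line[1:].rstrip("\r\n").split(None, 1)
    identifier = parts[0] if parts else ""
    description = parts[1] if len(parts) > 1 else ""
    return identifier, description


# Made-up header; the accession and gene name are placeholders.
header = ">XM_000000.1 PREDICTED: Example species hypothetical gene (ABC1), mRNA"
seq_id, desc = split_fasta_header(header)
print(seq_id)  # identifier part
print(desc)    # free-text description part
```

Anything like 'mRNA' or 'ncRNA' at the end of the line is simply part of the free-text description returned as the second value, so if you need it you would have to parse it out of that string yourself.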

