Question: RefSeq .fna vs. .gbff Files
Truxton wrote (5.1 years ago):

When I download transcript data from RefSeq via their FTP site (ftp://ftp.ncbi.nlm.nih.gov/refseq/), I noticed that there are two file types: .gbff and .fna. Is it correct to assume that the .gbff file (GenBank flat file format, I believe) contains EXACTLY the same sequence information as the .fna file (FASTA format sequences), in the same order, and that the only difference is that the .fna file has just a short one-line description for each sequence?

Also, what are the possible last "words" in the ">..." header line of each sequence in the .fna file? I've seen, for example, "mRNA" and "ncRNA". Is there a fixed, standardized list, by chance?

Pablacious (Cambridge, UK) wrote (5.1 years ago):

The header line of a FASTA file (the line starting with >) is divided into just two parts: the identifier and the description. The identifier runs from the > up to the first space; whatever follows the first space is the description (which can contain as many spaces as you want), so there is no such thing as a fixed "last word". The description is optional; the identifier is not. And yes, I would say the sequences should be the same; the .gbff file just carries more structured metadata.
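If you would rather verify this on your own download than take it on trust, you can compare the two files directly. Below is a minimal, standard-library-only sketch, not an official NCBI tool. `parse_fasta` splits each header exactly as described above (identifier up to the first whitespace, optional description after it). `parse_gbff` is a hypothetical, deliberately minimal GenBank reader that assumes each record has a `VERSION` line, an `ORIGIN` sequence section and a closing `//` line. It also assumes the .fna identifier is the bare accession.version that appears on the `VERSION` line; older files with `gi|...|ref|...|` style identifiers would need the accession extracted first. Sequences are upper-cased before comparison because GenBank flat files write the sequence in lowercase. The toy records at the bottom are made up for illustration.

```python
import io
from typing import List, Tuple


def parse_fasta(handle) -> List[Tuple[str, str, str]]:
    """Return (identifier, description, sequence) for each FASTA record."""
    records = []
    ident, desc, chunks = None, "", []
    for line in handle:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if ident is not None:
                records.append((ident, desc, "".join(chunks).upper()))
            # Identifier = text up to the first whitespace; the rest
            # (possibly empty) is the free-text description.
            parts = line[1:].split(maxsplit=1)
            if not parts:
                raise ValueError("FASTA header without an identifier")
            ident = parts[0]
            desc = parts[1] if len(parts) > 1 else ""
            chunks = []
        elif line.strip():
            chunks.append(line.strip())
    if ident is not None:
        records.append((ident, desc, "".join(chunks).upper()))
    return records


def parse_gbff(handle) -> List[Tuple[str, str]]:
    """Return (accession.version, sequence) for each GenBank flat-file record.

    Minimal sketch: only the VERSION line and the ORIGIN section are read.
    """
    records = []
    version, chunks, in_origin = None, [], False
    for line in handle:
        if line.startswith("VERSION"):
            version = line.split()[1]
        elif line.startswith("ORIGIN"):
            in_origin = True
        elif line.startswith("//"):
            records.append((version, "".join(chunks).upper()))
            version, chunks, in_origin = None, [], False
        elif in_origin:
            # Sequence lines look like '        1 acgtacgtac gg': drop the
            # leading position number, keep the 10-base blocks.
            chunks.extend(line.split()[1:])
    return records


def compare(fna_handle, gbff_handle) -> List[str]:
    """List discrepancies between the two files (empty list = identical)."""
    fna = parse_fasta(fna_handle)
    gbff = parse_gbff(gbff_handle)
    problems = []
    if [r[0] for r in fna] != [r[0] for r in gbff]:
        problems.append("identifiers differ or are in a different order")
    gb_seqs = dict(gbff)
    for ident, _desc, seq in fna:
        if ident not in gb_seqs:
            problems.append(f"{ident}: missing from .gbff")
        elif gb_seqs[ident] != seq:
            problems.append(f"{ident}: sequence differs")
    return problems


# Made-up toy records, only to show the call pattern.
fna_text = ">NM_TOY1.1 Toy gene A, mRNA\nACGTACGTAC\nGG\n>NR_TOY2.1 Toy gene B, ncRNA\nTTTT\n"
gbff_text = (
    "LOCUS       NM_TOY1   12 bp    mRNA\n"
    "VERSION     NM_TOY1.1\n"
    "ORIGIN      \n"
    "        1 acgtacgtac gg\n"
    "//\n"
    "LOCUS       NR_TOY2   4 bp    RNA\n"
    "VERSION     NR_TOY2.1\n"
    "ORIGIN      \n"
    "        1 tttt\n"
    "//\n"
)
print(compare(io.StringIO(fna_text), io.StringIO(gbff_text)))  # → []
```

On real files you would pass open file handles instead of `io.StringIO`. For anything beyond a quick check, Biopython's `Bio.SeqIO.parse(handle, "genbank")` and `Bio.SeqIO.parse(handle, "fasta")` handle the formats far more robustly than this hand-rolled reader.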

Powered by Biostar version 2.3.0