Question: Which file should I use to retrieve CDS sequences from the GenBank assembly FTP directory when no RefSeq assembly is available?
6 months ago, andrespara wrote:


I want to gather CDS sequences from NCBI or Ensembl. For some species there is no curated RefSeq assembly, only the link to "FTP directory for GenBank assembly". There I found GBFF files and genomic.fna files. If I only want the CDS sequences I usually get from the RefSeq links, which file is the correct one?

I think I can convert GBFF files into FASTA using other programs, but I don't know how to filter the CDS out of the genomic.fna file (I think it contains more than just CDS). Thanks for your help.


cds gbff refseq assembly ncbi

Not sure why you see only GBFF and genomic files. Any time I get assemblies there is a bunch of other files, including cds_from_genomic.fna.gz, which is what you need. Can you give a link to an assembly?
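
For what it's worth, files in an NCBI assembly directory are normally prefixed with the directory's own name (accession plus assembly name), so the expected location of the CDS file can be guessed from the directory link. A minimal sketch, assuming that naming convention holds for your assembly; the directory below is a made-up placeholder, and `cds_file_url` is just a helper name I picked:

```python
from urllib.parse import urlsplit


def cds_file_url(assembly_dir_url: str) -> str:
    """Build the expected URL of the CDS FASTA inside an assembly directory."""
    base = assembly_dir_url.rstrip("/")
    # The directory name (accession + assembly name) prefixes the files in it.
    prefix = urlsplit(base).path.rsplit("/", 1)[-1]
    return f"{base}/{prefix}_cds_from_genomic.fna.gz"


# Placeholder directory, not a real assembly.
example_dir = (
    "https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/000/000/"
    "GCA_000000000.1_ExampleAsm/"
)
print(cds_file_url(example_dir))
```

If that URL does not exist for a given assembly, the assembly probably has no annotation and only the genomic sequence was deposited.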

I would not be surprised if some assemblies have only a genomic DNA file, but if they have .gff or .gbff files, it is straightforward to extract coding sequences or proteins from them. You may want to try any2fasta or gffread from Cufflinks. I am sure there are many other tools for converting .gff and .gbff into .fasta.
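
To show what such a conversion does under the hood, here is a rough pure-Python sketch that pulls CDS sequences out of a genomic FASTA using GFF3 coordinates. It is not how gffread or any2fasta are implemented, only an illustration. It assumes each CDS row carries a Parent (or ID) attribute, that all segments of one CDS sit on the same strand, and it ignores phase and partial features. The function names and the tiny test data are made up.

```python
from collections import defaultdict

COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def read_fasta(lines):
    """Parse FASTA lines into {sequence_id: sequence}."""
    seqs, name = {}, None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            seqs[name] = []
        elif name is not None:
            seqs[name].append(line)
    return {k: "".join(v) for k, v in seqs.items()}


def extract_cds(fasta_lines, gff_lines):
    """Join the CDS segments of each parent feature into one sequence."""
    genome = read_fasta(fasta_lines)
    segments = defaultdict(list)
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "CDS":
            continue
        attrs = dict(f.split("=", 1) for f in cols[8].split(";") if "=" in f)
        key = attrs.get("Parent") or attrs.get("ID")
        segments[key].append((cols[0], int(cols[3]), int(cols[4]), cols[6]))
    cds = {}
    for key, parts in segments.items():
        parts.sort(key=lambda p: p[1])  # order segments by start coordinate
        # GFF3 coordinates are 1-based and inclusive.
        seq = "".join(genome[c][s - 1:e] for c, s, e, _ in parts)
        if parts[0][3] == "-":
            seq = seq.translate(COMPLEMENT)[::-1]  # reverse complement
        cds[key] = seq
    return cds


# Made-up toy data: one two-segment CDS on "+", one single-segment CDS on "-".
fasta = [">chr1 toy contig", "CCATGAAAGGTTTTAACC", ">chr2", "TTACATGG"]
gff = [
    "##gff-version 3",
    "\t".join(["chr1", "toy", "CDS", "3", "8", ".", "+", "0", "ID=cds1;Parent=mrna1"]),
    "\t".join(["chr1", "toy", "CDS", "11", "16", ".", "+", "0", "ID=cds1;Parent=mrna1"]),
    "\t".join(["chr2", "toy", "CDS", "1", "6", ".", "-", "0", "ID=cds2;Parent=mrna2"]),
]
for name, seq in extract_cds(fasta, gff).items():
    print(f">{name}\n{seq}")
```

For real assemblies I would still use an established tool, since it handles the edge cases this sketch skips.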

written 6 months ago by Mensur Dlakic

Thanks, I will try some of these tools.

written 6 months ago by andrespara