Which file should I use to retrieve CDS from the GenBank assembly link when no RefSeq assembly is available?
4.5 years ago
andrespara ▴ 30

Hi,

I want to gather CDS (coding sequences) from NCBI or Ensembl. For some species there is NO curated RefSeq assembly, only the link to "FTP directory for GenBank assembly". There I found GBFF files AND genomic.fna files. If I only want the CDS, like I usually get from the RefSeq links, which file is the correct one?

I think I can convert GBFF files into FASTA using other programs, but I don't know how to filter the CDS out of the genomic.fna file (I think it contains more than just CDS). Thanks for your help,

Andrés

ncbi assembly refseq CDs gbff • 1.7k views

Not sure why you see only GBFF and genomic files. Any time I get assemblies there are a bunch of other files, including cds_from_genomic.fna.gz, which is what you need. Can you give a link to an assembly?
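In case it helps with checking, here is a minimal sketch of how the expected file URL can be built from an assembly directory link. It assumes the usual NCBI layout where each file name is prefixed with the directory name. The function name `cds_url` and the accession in the usage line are hypothetical placeholders, and the file only exists for assemblies that carry an annotation.

```python
from posixpath import basename


def cds_url(assembly_dir_url: str) -> str:
    """Return the expected URL of the CDS FASTA inside an assembly directory.

    Assumes files are named <directory name>_cds_from_genomic.fna.gz.
    The file may be absent if the assembly has no annotation.
    """
    base = assembly_dir_url.rstrip("/")
    name = basename(base)  # last path component, i.e. the directory name
    return f"{base}/{name}_cds_from_genomic.fna.gz"


# Hypothetical placeholder accession, not a real assembly.
example = cds_url(
    "https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/000/000/"
    "GCA_000000000.1_ExampleAsm/"
)
print(example)
```

If that URL returns "not found", the assembly was most likely submitted without annotation, and only the genomic sequence is available.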

I would not be surprised if some assemblies have only a genomic DNA file, but if .gff or .gbff files are present, it is straightforward to extract coding (CDS) or protein sequences from them. You may want to try any2fasta or gffread (distributed with Cufflinks). I am sure there are many other tools for converting .gff and .gbff into .fasta.
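To show what such a conversion does under the hood, here is a rough standard-library sketch, not a replacement for the tools above. It assumes a GFF3 file whose CDS rows carry a Parent (or ID) attribute and a genomic FASTA with matching sequence names. The function names `read_fasta` and `extract_cds` are my own, and it ignores phase, partial features, and features that wrap around a circular origin.

```python
from collections import defaultdict

COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def read_fasta(lines):
    """Parse FASTA lines into {sequence name: sequence string}."""
    seqs, name = {}, None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]  # keep only the ID before the first space
            seqs[name] = []
        elif name is not None:
            seqs[name].append(line)
    return {k: "".join(v) for k, v in seqs.items()}


def extract_cds(gff_lines, genome):
    """Join the CDS segments of each parent feature into one sequence."""
    parts = defaultdict(list)
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "CDS":
            continue
        seqid, start, end, strand = cols[0], int(cols[3]), int(cols[4]), cols[6]
        attrs = dict(f.split("=", 1) for f in cols[8].split(";") if "=" in f)
        key = attrs.get("Parent") or attrs.get("ID")
        parts[key].append((seqid, start, end, strand))

    out = {}
    for key, segs in parts.items():
        segs.sort(key=lambda s: s[1])  # order segments by genomic start
        # GFF coordinates are 1-based and inclusive
        seq = "".join(genome[s][a - 1:b] for s, a, b, _ in segs)
        if segs[0][3] == "-":
            seq = seq.translate(COMPLEMENT)[::-1]  # reverse complement
        out[key] = seq
    return out


# Tiny made-up example data, for illustration only.
genome = read_fasta([">chr1 demo", "TTATGAAACC", "CGGGTAGTT"])
gff = [
    "##gff-version 3",
    "chr1\tsrc\tCDS\t12\t17\t.\t+\t0\tID=cds1;Parent=tx1",
    "chr1\tsrc\tCDS\t3\t8\t.\t+\t0\tID=cds1;Parent=tx1",
    "chr1\tsrc\tCDS\t9\t11\t.\t-\t0\tID=cds2;Parent=tx2",
]
print(extract_cds(gff, genome))  # {'tx1': 'ATGAAAGGGTAG', 'tx2': 'GGG'}
```

For real data, open the files with `gzip.open(path, "rt")` if they are compressed and pass the file handles in place of the lists.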


Thanks, I will try some of these tools.
