Best way to extract gene (CDS) sequences from genome assemblies.
9.4 years ago

Hi!

I constantly need to get a comprehensive list of genes for a bacterial or fungal species from an unfinished genome scaffold. What I got from the FTP site are the GFF files and the FASTA (.fna) files, and I am planning to write code to extract all genes listed in the GFF file from the .fna file into a new genes.fsa file. However, before I get down to writing it, I have a feeling that I might be reinventing the wheel. Is there a standard, established way of doing this? Either a file I have overlooked that already contains the gene sequences in FASTA format, or a Python module that already does this?

Thank you very much!
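For reference, the hand-rolled approach described in the question is short in plain Python. The sketch below is a minimal illustration, not a standard tool: it assumes an uncompressed GFF3 file with tab-separated columns and a FASTA file whose header IDs match the GFF seqid column. The function names and the seqid:start-end(strand) labels are hypothetical. It writes one record per CDS line, which suits most bacterial genes but does not join the exons of intron-containing genes.

```python
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def read_fasta(lines):
    """Parse FASTA lines into {sequence_id: sequence}; the id is the first word of the header."""
    seqs, name, chunks = {}, None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                seqs[name] = "".join(chunks)
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if name is not None:
        seqs[name] = "".join(chunks)
    return seqs


def extract_cds(gff_lines, genome):
    """Yield (label, sequence) for every CDS row, one record per GFF line (no splicing)."""
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break  # some GFF3 files append the genome sequence after this directive
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "CDS":
            continue
        seqid, start, end, strand = cols[0], int(cols[3]), int(cols[4]), cols[6]
        # GFF coordinates are 1-based and end-inclusive -> Python slice [start-1, end)
        seq = genome[seqid][start - 1:end]
        if strand == "-":
            seq = seq.translate(COMPLEMENT)[::-1]  # reverse complement
        yield f"{seqid}:{start}-{end}({strand})", seq


def write_fasta(records, width=60):
    """Format (label, sequence) pairs as FASTA text wrapped at `width` columns."""
    lines = []
    for label, seq in records:
        lines.append(">" + label)
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"
```

Typical usage, with hypothetical file names: open genome.fna and annotation.gff, pass the file handles to read_fasta and extract_cds, and write the string returned by write_fasta to genes.fsa.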

9.4 years ago

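bedtools has this covered with the getfasta command, which cuts the intervals listed in a BED or GFF file out of a FASTA file (add -s to reverse-complement features on the minus strand).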
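Because getfasta extracts every interval it is given, a full GFF (gene, mRNA, exon and CDS rows) yields one sequence per row. A minimal Python sketch, assuming you want only the CDS rows, filters the GFF first and then builds the getfasta command line. The helper names and file names are hypothetical; -fi, -bed, -fo and -s are real getfasta options.

```python
import shlex


def filter_gff(gff_lines, feature_type="CDS"):
    """Keep only rows whose 3rd column equals feature_type; skip comments and the ##FASTA section."""
    kept = []
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) >= 9 and cols[2] == feature_type:
            kept.append(line.rstrip("\n"))
    return kept


def getfasta_cmd(genome_fasta, features, out_fasta, stranded=True):
    """Build a `bedtools getfasta` argument list (-fi genome, -bed intervals, -fo output)."""
    cmd = ["bedtools", "getfasta", "-fi", genome_fasta, "-bed", features, "-fo", out_fasta]
    if stranded:
        cmd.append("-s")  # reverse-complement minus-strand features
    return cmd


def as_shell(cmd):
    """Render the argument list as a copy-pasteable shell command."""
    return shlex.join(cmd)
```

After writing the filtered rows to a file such as cds.gff, subprocess.run(getfasta_cmd("genome.fna", "cds.gff", "cds.fa"), check=True) would run it, provided bedtools is on PATH. Each CDS row still comes out as a separate sequence, so multi-exon genes are not joined.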

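Cufflinks also ships the gffread utility, which offers similar functionality plus more. For example, gffread -g genome.fna -x cds.fa annotation.gff writes the spliced CDS of each transcript, which matters for intron-containing fungal genes.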
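The main thing spliced CDS extraction adds over per-interval extraction is joining: CDS rows that share a Parent (one transcript) are concatenated in genomic order, and the result is reverse-complemented if the transcript is on the minus strand. The Python sketch below illustrates that idea only; it is not gffread's implementation. It assumes GFF3 Parent= attributes holding a single parent ID, all segments of a transcript on one sequence and strand, and no URL-encoded attribute values; the function names are hypothetical.

```python
from collections import defaultdict

COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def parse_attrs(field):
    """Parse a GFF3 column-9 string like 'ID=x;Parent=y' into a dict."""
    return dict(kv.split("=", 1) for kv in field.strip().split(";") if "=" in kv)


def spliced_cds(gff_lines, genome):
    """Return {parent_id: spliced CDS sequence}, joining CDS segments per transcript."""
    segments = defaultdict(list)
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "CDS":
            continue
        parent = parse_attrs(cols[8]).get("Parent")
        if parent is None:
            continue  # cannot group a CDS row without a Parent
        segments[parent].append((cols[0], int(cols[3]), int(cols[4]), cols[6]))

    result = {}
    for parent, segs in segments.items():
        segs.sort(key=lambda s: s[1])  # genomic (left-to-right) order
        seqid, strand = segs[0][0], segs[0][3]
        # 1-based inclusive GFF coordinates -> Python slices, then concatenate exons
        seq = "".join(genome[seqid][start - 1:end] for _, start, end, _ in segs)
        if strand == "-":
            seq = seq.translate(COMPLEMENT)[::-1]  # reverse complement the joined CDS
        result[parent] = seq
    return result
```

The genome dictionary can come from any FASTA reader that maps sequence IDs to sequence strings.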
