Multi-record GenBank to CDS
7 months ago
hazirliver ▴ 10

Hi! I have a file containing several GenBank records written one after the other. I need to extract the CDS features (protein sequence (/translation), /locus_tag, /inference, /product, and contig ID) from all contigs. How can I do it?
The input format looks like this
And the result looks like this

How can I do this?

CDS genbank biopython • 223 views
Since you are analyzing data, it would be helpful if you make some effort to write a small script to read a file line by line and process it.

7 months ago
Joe 19k

To clarify, you want all proteins/products, from all the entries in the file?

If so, take a look here: https://warwick.ac.uk/fac/sci/moac/people/students/peter_cock/python/genbank2fasta/

Yes, thanks! The code in that article gave me an error, but it put me on the right track. I found the solution using SeqIO.InsdcIO.GenBankCdsFeatureIterator.
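
For anyone landing here later, here is a minimal sketch of one way to pull those fields out of a multi-record file. It is not the code from the post above: instead of GenBankCdsFeatureIterator it uses the more common route of Biopython's `SeqIO.parse(path, "genbank")` and loops over each record's features. The function names (`cds_row`, `iter_cds_rows`, `write_cds_table`), the tab-separated layout, and the choice of `record.id` as the contig ID are assumptions made for illustration. Depending on how the file was produced, `record.name` may be the better contig identifier.

```python
import csv

# Qualifiers requested in the question, in output order.
FIELDS = ["locus_tag", "inference", "product", "translation"]


def cds_row(contig_id, qualifiers):
    """Build one output row from a CDS feature's qualifier mapping.

    Biopython stores every qualifier as a list of strings, so repeated
    qualifiers (e.g. several /inference lines) are joined with "; ".
    A missing qualifier becomes an empty string.
    """
    return [contig_id] + ["; ".join(qualifiers.get(key, [])) for key in FIELDS]


def iter_cds_rows(path):
    """Yield one row per CDS feature across all records in a GenBank file."""
    # Imported here so cds_row() can be used without Biopython installed.
    from Bio import SeqIO

    for record in SeqIO.parse(path, "genbank"):
        for feature in record.features:
            if feature.type == "CDS":
                yield cds_row(record.id, feature.qualifiers)


def write_cds_table(path, out):
    """Write a tab-separated table of all CDS features to a file object."""
    writer = csv.writer(out, delimiter="\t")
    writer.writerow(["contig"] + FIELDS)
    writer.writerows(iter_cds_rows(path))
```

To use it, something like `write_cds_table("contigs.gbk", sys.stdout)` (after `import sys`, with a hypothetical file name) should print one line per CDS. Pseudogenes and other CDS features without a /translation qualifier will simply have an empty last column.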