Hey everyone, I'm very new to bioinformatics and python in general, and haven't had any luck with finding a good way to do this.
My current pipeline outputs a multi-entry .gbk file that contains multiple CDS features per contig annotated. From this, I would like to pull out the locus_tag and the nucleotide sequence for each CDS feature (not contig) and have that output to a simple csv file.
What I've tried is from this github link: https://github.com/dewshr/NCBI-GenBank-file-parser
But I'm getting an indexing error that I'm not sure how to correct since I don't know much about the code:
Traceback (most recent call last):
File "/home/seq/gbknuctest.py", line 128, in <module>
ntgenbank()
File "/home/seq/gbknuctest.py", line 50, in ntgenbank
nm_version = (nm_and_version.split('.')[1]).strip('\n')
IndexError: list index out of range
I'm sure there's a much more concise way to do this, but reading through the cookbook and stumbling around hasn't gotten me anywhere so far.
Output of python -V
to show it's not due to a py3/py2 difference:
$ python -V
Python 2.7.15
Thanks so much in advance.
This is very close! It just isn't working perfectly for me. I actually searched this forum first and have this script you gave in a previous answer, albeit it was specific to a certain feature type. I wasn't sure how to edit it to make it more broad. I'm very glad you saw my question haha.
My first .gbk file is a cluster containing 11 CDSs, of which 3 are full genes. I want the nucleotide sequence for all of these, not just the 3 genes.
Right now it's just pulling the 3 "gene" tags and their sequences, but is ignoring the entries that have no "gene" but only a "locus_tag". Is there a way to parse them based on the "locus_tag" object?
An example of a gene in my file:
An example of CDS without a "gene" tag:
Yep, try my edited code above. I changed it before you replied, but possible after you initially saw it. I noticed it was still parsing out
”gene”
features, so I’ve switched that over now. Try the newer code above.Ah I see it now. This is perfect. Thanks so much!