Hey everyone, I'm very new to bioinformatics and Python in general, and haven't had any luck finding a good way to do this.
My current pipeline outputs a multi-entry .gbk file in which each contig is annotated with multiple CDS features. From this, I would like to pull out the locus_tag and the nucleotide sequence for each CDS feature (not each contig) and write them to a simple CSV file.
What I've tried is the script from this GitHub repository: https://github.com/dewshr/NCBI-GenBank-file-parser
But I'm getting an indexing error that I'm not sure how to correct, since I don't know much about the code:
Traceback (most recent call last):
  File "/home/seq/gbknuctest.py", line 128, in <module>
    ntgenbank()
  File "/home/seq/gbknuctest.py", line 50, in ntgenbank
    nm_version = (nm_and_version.split('.')).strip('\n')
IndexError: list index out of range
I'm sure there's a much more concise way to do this, but reading through the cookbook and stumbling around hasn't gotten me anywhere so far.
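For reference, the task described above (one CSV row per CDS feature, holding its locus_tag and nucleotide sequence) could probably be sketched with Biopython's SeqIO along these lines. This is only a minimal sketch, not a tested fix for the script above: it assumes Biopython is installed, that every CDS carries a locus_tag qualifier (an empty string is written otherwise), and the function names cds_rows and write_cds_csv are made up for illustration.

```python
import csv
import sys

from Bio import SeqIO


def cds_rows(records):
    """Yield (locus_tag, nucleotide sequence) for every CDS feature."""
    for record in records:
        for feature in record.features:
            if feature.type != "CDS":
                continue
            # Qualifier values are lists; fall back to "" if the tag is missing.
            locus_tag = feature.qualifiers.get("locus_tag", [""])[0]
            # extract() slices the contig by the feature location and
            # reverse-complements minus-strand features.
            yield locus_tag, str(feature.extract(record.seq))


def write_cds_csv(gbk_path, csv_path):
    """Hypothetical helper: write one CSV row per CDS in a GenBank file."""
    # The csv module wants a binary file on Python 2 and a text file
    # opened with newline="" on Python 3.
    if sys.version_info[0] < 3:
        out = open(csv_path, "wb")
    else:
        out = open(csv_path, "w", newline="")
    with out:
        writer = csv.writer(out)
        writer.writerow(["locus_tag", "sequence"])
        writer.writerows(cds_rows(SeqIO.parse(gbk_path, "genbank")))
```

Calling write_cds_csv("annotated.gbk", "cds.csv") (both file names are placeholders) would then loop over every contig in the multi-entry file. Going through feature.extract rather than slicing by hand is what keeps minus-strand and multi-part CDS locations correct.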
Output of python -V, to show it's not due to a py3/py2 difference:
$ python -V
Python 2.7.15
Thanks so much in advance.