Hello,
To begin with, sorry for the long post. I just want to give all the information I think is relevant to the problem I am having. I am newish to bioinformatics, so I apologize if my coding is not the best. If you read all the way through, thank you! I am trying to filter a GenBank file with a list of gene ids using Biopython's SeqIO. I have done this previously with no problem. I am not sure if there is an issue with this particular GenBank file, or if new updates to Biopython have caused the issues. Here is my code below:
genelist_records = []
gb_file1 = "genbank_file.gk"
for inrecord in SeqIO.parse(open(gb_file1,"r"), "genbank"):
name = inrecord.name
if name in geneslist_exons:
genelist_records.append(inrecord)
else:
pass
with open ("genbank_file_filtered.gb", "w") as outfile:
SeqIO.write(genelist_records,outfile, "genbank")
Add these are the errors I receive:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-9-8f0d4cd86deb> in <module>
18
19 with open ("genbank_file_filtered.gb", "w") as outfile:
---> 20 SeqIO.write(genelist_records,outfile, "genbank")
21
22 #to get the number of gene ids that are over our threshold to make sure we have enough. (we want ~1100)
~/miniconda3/envs/training3/lib/python3.6/site-packages/Bio/SeqIO/__init__.py in write(sequences, handle, format)
489 if format in _FormatToWriter:
490 writer_class = _FormatToWriter[format]
--> 491 count = writer_class(fp).write_file(sequences)
492 elif format in AlignIO._FormatToWriter:
493 # Try and turn all the records into a single alignment,
~/miniconda3/envs/training3/lib/python3.6/site-packages/Bio/SeqIO/Interfaces.py in write_file(self, records)
212 """
213 self.write_header()
--> 214 count = self.write_records(records)
215 self.write_footer()
216 return count
~/miniconda3/envs/training3/lib/python3.6/site-packages/Bio/SeqIO/Interfaces.py in write_records(self, records)
197 count = 0
198 for record in records:
--> 199 self.write_record(record)
200 count += 1
201 # Mark as true, even if there where no records
~/miniconda3/envs/training3/lib/python3.6/site-packages/Bio/SeqIO/InsdcIO.py in write_record(self, record)
807 """Write a single record to the output file."""
808 handle = self.handle
--> 809 self._write_the_first_line(record)
810
811 default = record.id
~/miniconda3/envs/training3/lib/python3.6/site-packages/Bio/SeqIO/InsdcIO.py in _write_the_first_line(self, record)
616 # Must be something like NucleotideAlphabet or
617 # just the generic Alphabet (default for fasta files)
--> 618 raise ValueError("Need a Nucleotide or Protein alphabet")
619
620 # Get the molecule type
ValueError: Need a Nucleotide or Protein alphabet
I have tried to go back and assign both an alphabet and a molecular type to the original GenBank file with this code:
from Bio.Alphabet import generic_dna
from Bio.Seq import Seq
genelist_records = []
gb_file1 = "genbank_file.gk"
for inrecord in SeqIO.parse(open(gb_file1,"r"), "genbank"):
name = inrecord.name
sequence = Seq(str(inrecord.seq), generic_dna)
annotation={"molecule_type":"DNA"}
genelist_records.append(inrecord)
with open ("Trachinotus_ovatus_filtered_genes_mol_type.gb", "w") as outfile:
SeqIO.write(genelist_records,outfile, "genbank")
However, I get the exact same errors as above.
I have seen in the latest updates where Biopython has replaced Bio.Alphabet, so I tried adding this to my code (which I found here- https://biopython.org/wiki/Alphabet):
try:
from Bio.Alphabet import Alphabet
except ImportError:
Alphabet = "DNA"
And still get the same error. I am really confused and frustrated and am not sure what next steps to take. When looking up a record in my Genbank file, it says there is an alphabet assigned.
ID: LG1_667828-726403
Name: LG1_667828-726403
Number of features: 3
Seq('GAAATATGCTTCAGAACATAACTTCATGTGGAGAGGTGCAGTCAAAGGCATCTG...CTG', Alphabet())
Does anyone have any ideas or suggestions on how to fix this so I can write a GenBank file? Or another method I can use to filter a GenBank file with a list of LOCUS ids?
Thank you for reading and any help!
i think you should check you input file first~ your code seems right~