ValueError: Need a Nucleotide or Protein alphabet. Trying to write a Genbank file and getting error
1
0
Entering edit mode
11 months ago
Fishy_DNA ▴ 10

Hello,

To begin with, sorry for the long post. I just want to give all the information I think is relevant to the problem I am having. I am newish to bioinformatics, so I apologize if my coding is not the best. If you read all the way through, thank you! I am trying to filter a GenBank file with a list of gene ids using Biopython's SeqIO. I have done this previously with no problem. I am not sure if there is an issue with this particular GenBank file, or if new updates to Biopython have caused the issues. Here is my code below:

genelist_records = []
gb_file1 = "genbank_file.gk"

for inrecord in SeqIO.parse(open(gb_file1,"r"), "genbank"):
    name = inrecord.name
    if name in geneslist_exons:
        genelist_records.append(inrecord)

    else:
        pass

with open ("genbank_file_filtered.gb", "w") as outfile:
    SeqIO.write(genelist_records,outfile, "genbank")

Add these are the errors I receive:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-9-8f0d4cd86deb> in <module>
     18 
     19 with open ("genbank_file_filtered.gb", "w") as outfile:
---> 20     SeqIO.write(genelist_records,outfile, "genbank")
     21 
     22 #to get the number of gene ids that are over our threshold to make sure we have enough. (we want ~1100)

~/miniconda3/envs/training3/lib/python3.6/site-packages/Bio/SeqIO/__init__.py in write(sequences, handle, format)
    489         if format in _FormatToWriter:
    490             writer_class = _FormatToWriter[format]
--> 491             count = writer_class(fp).write_file(sequences)
    492         elif format in AlignIO._FormatToWriter:
    493             # Try and turn all the records into a single alignment,

~/miniconda3/envs/training3/lib/python3.6/site-packages/Bio/SeqIO/Interfaces.py in write_file(self, records)
    212         """
    213         self.write_header()
--> 214         count = self.write_records(records)
    215         self.write_footer()
    216         return count

~/miniconda3/envs/training3/lib/python3.6/site-packages/Bio/SeqIO/Interfaces.py in write_records(self, records)
    197         count = 0
    198         for record in records:
--> 199             self.write_record(record)
    200             count += 1
    201         # Mark as true, even if there where no records

~/miniconda3/envs/training3/lib/python3.6/site-packages/Bio/SeqIO/InsdcIO.py in write_record(self, record)
    807         """Write a single record to the output file."""
    808         handle = self.handle
--> 809         self._write_the_first_line(record)
    810 
    811         default = record.id

~/miniconda3/envs/training3/lib/python3.6/site-packages/Bio/SeqIO/InsdcIO.py in _write_the_first_line(self, record)
    616             # Must be something like NucleotideAlphabet or
    617             # just the generic Alphabet (default for fasta files)
--> 618             raise ValueError("Need a Nucleotide or Protein alphabet")
    619 
    620         # Get the molecule type

ValueError: Need a Nucleotide or Protein alphabet

I have tried to go back and assign both an alphabet and a molecular type to the original GenBank file with this code:

from Bio.Alphabet import generic_dna
from Bio.Seq import Seq


genelist_records = []
gb_file1 = "genbank_file.gk"

for inrecord in SeqIO.parse(open(gb_file1,"r"), "genbank"):
    name = inrecord.name
    sequence = Seq(str(inrecord.seq), generic_dna) 
    annotation={"molecule_type":"DNA"}
    genelist_records.append(inrecord)

with open ("Trachinotus_ovatus_filtered_genes_mol_type.gb", "w") as outfile:
    SeqIO.write(genelist_records,outfile, "genbank")

However, I get the exact same errors as above.

I have seen in the latest updates where Biopython has replaced Bio.Alphabet, so I tried adding this to my code (which I found here- https://biopython.org/wiki/Alphabet):

try:
    from Bio.Alphabet import Alphabet
except ImportError:
    Alphabet = "DNA"

And still get the same error. I am really confused and frustrated and am not sure what next steps to take. When looking up a record in my Genbank file, it says there is an alphabet assigned.

ID: LG1_667828-726403
Name: LG1_667828-726403
Number of features: 3
Seq('GAAATATGCTTCAGAACATAACTTCATGTGGAGAGGTGCAGTCAAAGGCATCTG...CTG', Alphabet())

Does anyone have any ideas or suggestions on how to fix this so I can write a GenBank file? Or another method I can use to filter a GenBank file with a list of LOCUS ids?

Thank you for reading and any help!

Genbank Biopython SeqIO.write Bio.Alphabet • 1.2k views
ADD COMMENT
0
Entering edit mode

i think you should check you input file first~ your code seems right~

ADD REPLY
0
Entering edit mode
11 months ago
Mensur Dlakic ★ 14k

I think it may solve your problem if you type this line:

inrecord.seq.alphabet = generic_dna

These three lines may not be necessary:

name = inrecord.name
sequence = Seq(str(inrecord.seq), generic_dna) 
annotation={"molecule_type":"DNA"}
ADD COMMENT
0
Entering edit mode

Thank you so much! That seems to have solved the issue!

ADD REPLY

Login before adding your answer.

Traffic: 2265 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6