Parsing a sequence from a genbank file containing multiple sequences using Biopython SeqIO library
4.5 years ago
landonba • 0

I have a GenBank file containing a number of viral sequences. I am trying to parse this file and print a single sequence by looking it up by its LOCUS name. The name of the sequence I am trying to access is "Abras_L_Seg_2". I am using Biopython's SeqIO module. My code is as follows.

from Bio import SeqIO
record_dict = SeqIO.to_dict(SeqIO.parse("all.gb", "gb"))
print(record_dict['Abras_L_Seg_2'])

I am running this script from the terminal, and both the script and the GenBank file are in the same directory. It produces the following output:

Traceback (most recent call last):
  File "practice_parsing.py", line 14, in <module>
    print(record_dict['Abras_L_Seg_2'])
KeyError: 'Abras_L_Seg_2'

Any ideas?


Please show us a minimal example of the genbank file.

I don't believe you can apply to_dict directly to a GenBank file. Section 5.4.1 of the cookbook shows that you instead need to pass it the output of SeqIO.parse:

records = SeqIO.to_dict(SeqIO.parse("/path/to/genbank.gb", "genbank"))

You can then look up entries in the dictionary by each record's id value.

records['ID']
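
If you are not sure which keys the dictionary ends up with, a quick check is to print each record's id next to its name. The sketch below is only illustrative: list_keys is a helper name I made up, and the record is an in-memory stand-in with an invented accession ("XX000001.1"), not data from your file.

```python
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord


def list_keys(records):
    """Return (id, name) pairs so the candidate dictionary keys can be inspected."""
    return [(rec.id, rec.name) for rec in records]


# Stand-in record with a made-up accession. With a real file, pass
# SeqIO.parse("all.gb", "genbank") instead (needs `from Bio import SeqIO`).
demo = [SeqRecord(Seq("ACGT"), id="XX000001.1", name="Abras_L_Seg_2")]

for rec_id, rec_name in list_keys(demo):
    print(rec_id, rec_name)
```

Whatever shows up in the first column is what to_dict uses as the key by default.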

This is not necessarily the same as the LOCUS name, however: it appears that the id comes from the VERSION field (see section 4.2.3 of the cookbook).
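
If you do want to look records up by LOCUS name, one option is the key_function argument of SeqIO.to_dict. This is a sketch under the assumption that the GenBank parser puts the LOCUS name in record.name, which is worth confirming on your own file. The helper name index_by_locus and the accession in the demo record are invented for illustration.

```python
from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord


def index_by_locus(records):
    """Build a dict keyed on record.name instead of the default record.id."""
    # to_dict raises ValueError if two records produce the same key.
    return SeqIO.to_dict(records, key_function=lambda rec: rec.name)


# Stand-in record with a made-up accession. With a real file, pass
# SeqIO.parse("all.gb", "genbank") to index_by_locus instead.
demo = [SeqRecord(Seq("ACGT"), id="XX000001.1", name="Abras_L_Seg_2")]

by_locus = index_by_locus(demo)
print(by_locus["Abras_L_Seg_2"].id)
```

With a dictionary built this way, record_dict['Abras_L_Seg_2'] should work as long as that string really is the LOCUS name in the file.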
