If GI number in list1 is present in genbank file, make new list and append location information
0
1
Entering edit mode
8.1 years ago
pawlowac ▴ 80

Hi all,

I'm fairly new to python.

I have a list of GI numbers (list 1) and my ultimate goal is to slice sequence data for 5 kb upstream and 5 kb downstream of my target CDS in a genbank file, which is also in a list (list 2). I've made an attempt to do this. I do not get any errors, however I do not get any response from printing the new list (combo_list).

I would really appreciate your help.

from Bio import SeqIO

gb_file = 'file.gb'
gb_records = SeqIO.parse(open(gb_file,"r"), "genbank")
combo_list = []
for gb_record in gb_records :
if gb_record.features :
for feature in gb_record.features :
if feature.qualifiers == "db_xref" :
for gi_list in gb_record :
start = feature.location._start.position
end = feature.location._end.postion
combo_list.append(start, end)
print combo_list
stop()
return combo_list

def main():

gi_file = "gi.txt"
gi_list = []
for line in text_file :
gi_list.append(int(line.strip()))

if __name__ == '__main__':
main()


From Ram's assistance;

for genome in gbank:
for gene in genome.features:
if gene.type=="CDS":
if 'db_xref' in gene.qualifiers:
for gi in gi_list:
if str(gi) in gene.qualifiers['db_xref'][0]:

python genbank biopython • 2.4k views
0
Entering edit mode

Looks like qualifiers is a collection. == might not be the right way to compare if that's the case.

0
Entering edit mode

Thank you for pointing that out, I've now switched the line to

if "db_xref" in feature.qualifiers
0
Entering edit mode
I should mention that the code overall still does not work.
0
Entering edit mode

Documentation states that qualifiers is a key value pair. You might wanna check if the db_xref key exists and has a value. And please read the documentation - these are well documented modules you're using.

0
Entering edit mode

http://www2.warwick.ac.uk/fac/sci/moac/people/students/peter_cock/python/genbank/

db_xref does exist. It's the GI number and GeneID. I'm interested in matching GI numbers and pulling out the location associated with it in the GenBank file.

And since I'm new at this, I want to make sure the documentation you're referring to is the biopython cookbook, right?

0
Entering edit mode

I know db_xref exists(duh), I was saying you should check if your features dict has a key named db_xref. And I'm not sure if it is the cookbook - try this: Google BioPython Bio Seq and check out Records and Features under Bio Seq GenBank.

0
Entering edit mode

How do I check if my features dict has a key named db_xref? I thought it was standard much like using 'CDS' and 'source'?

0
Entering edit mode

if key in dict is the syntax you're looking for. Please try Google, "dict has key" gives you the answer on the first link!

0
Entering edit mode

Gee, thanks for telling me so much to Google for something. Like I've already said, I'm new to this (literally started 2 days ago). If I understood the concept, I wouldn't be asking that question here, I would've googled for it...

0
Entering edit mode

Sarcasm is not a good strategy on forums, pawlowac. I stick by my statement. You learn these terms only by Googling stuff repeatedly and not making the same mistake twice. And you've written "how do I check if my dict has a key" - try typing that in google and check what it suggests.

0
Entering edit mode

Sorry, I shouldn't have been sarcastic. I was frustrated because you really weren't clarifying anything for me by just telling me to google things considering most of those websites go over my head. I guess the fault is partially mine since I am not asking specific questions to your responses. I also didn't appreciate you saying duh since it was obvious I completely misunderstood your question.

Reading back now, I think by this;

You might wanna check if the db_xref key exists and has a value.

you were talking about a particular syntax and not telling me to look if db_xref exists in general or in my genbank file. Correct?

If that's the case, what is the dict? does the code read;

if db_xref in gi_list


Since I am looking for gi number from gi_list within gb_records and want to extract a feature associated with it?

0
Entering edit mode

It's OK, pawlowac, we are all used to plenty of misinterpretations :)

Think of the GenBank object like this:

GenBank object can have any number features - 0 or more. So, you should check if the genbank object has any valid features entry.

Each feature has multiple qualifiers, where one or more of the qualifiers might have a key "db_xref". So, you're looking for something along the lines of if "db_xref" in feature.qualifiers print(feature["db_xref"]

0
Entering edit mode

Got it! Thanks for your direction Ram. (Sorry for the late response, benchwork got in the way)

for genome in gbank:
for gene in genome.features:
if gene.type=="CDS":
if 'db_xref' in gene.qualifiers:
for gi in gi_list:
if str(gi) in gene.qualifiers['db_xref'][0]: