Entering edit mode
8.1 years ago
pawlowac ▴ 80
I'm fairly new to python.
I have a list of GI numbers (list 1) and my ultimate goal is to slice sequence data for 5 kb upstream and 5 kb downstream of my target CDS in a genbank file, which is also in a list (list 2). I've made an attempt to do this. I do not get any errors, however I do not get any response from printing the new list (combo_list).
I would really appreciate your help.
from Bio import SeqIO def add_location(gi_list): gb_file = 'file.gb' gb_records = SeqIO.parse(open(gb_file,"r"), "genbank") combo_list =  for gb_record in gb_records : if gb_record.features : for feature in gb_record.features : if feature.qualifiers == "db_xref" : for gi_list in gb_record : start = feature.location._start.position end = feature.location._end.postion combo_list.append(start, end) print combo_list stop() return combo_list def main(): gi_file = "gi.txt" text_file = open("gi.txt", "r").readlines() gi_list =  for line in text_file : gi_list.append(int(line.strip())) location_gi_list = add_location(gi_list) if __name__ == '__main__': main()
From Ram's assistance;
for genome in gbank: for gene in genome.features: if gene.type=="CDS": if 'db_xref' in gene.qualifiers: for gi in gi_list: if str(gi) in gene.qualifiers['db_xref']:
qualifiersis a collection.
==might not be the right way to compare if that's the case.
Thank you for pointing that out, I've now switched the line to
Documentation states that qualifiers is a key value pair. You might wanna check if the db_xref key exists and has a value. And please read the documentation - these are well documented modules you're using.
db_xrefdoes exist. It's the GI number and GeneID. I'm interested in matching GI numbers and pulling out the location associated with it in the GenBank file.
I have read the documentation.
And since I'm new at this, I want to make sure the documentation you're referring to is the biopython cookbook, right?
db_xrefexists(duh), I was saying you should check if your features dict has a key named
db_xref. And I'm not sure if it is the cookbook - try this: Google BioPython Bio Seq and check out Records and Features under Bio Seq GenBank.
How do I check if my features dict has a key named db_xref? I thought it was standard much like using 'CDS' and 'source'?
if key in dictis the syntax you're looking for. Please try Google, "dict has key" gives you the answer on the first link!
Gee, thanks for telling me so much to Google for something. Like I've already said, I'm new to this (literally started 2 days ago). If I understood the concept, I wouldn't be asking that question here, I would've googled for it...
Sarcasm is not a good strategy on forums, pawlowac. I stick by my statement. You learn these terms only by Googling stuff repeatedly and not making the same mistake twice. And you've written "how do I check if my dict has a key" - try typing that in google and check what it suggests.
Sorry, I shouldn't have been sarcastic. I was frustrated because you really weren't clarifying anything for me by just telling me to google things considering most of those websites go over my head. I guess the fault is partially mine since I am not asking specific questions to your responses. I also didn't appreciate you saying duh since it was obvious I completely misunderstood your question.
Reading back now, I think by this;
you were talking about a particular syntax and not telling me to look if
db_xrefexists in general or in my genbank file. Correct?
If that's the case, what is the dict? does the code read;
Since I am looking for gi number from
gb_recordsand want to extract a feature associated with it?
It's OK, pawlowac, we are all used to plenty of misinterpretations :)
Think of the GenBank object like this:
GenBank object can have any number features - 0 or more. So, you should check if the genbank object has any valid features entry.
Each feature has multiple qualifiers, where one or more of the qualifiers might have a key "db_xref". So, you're looking for something along the lines of
if "db_xref" in feature.qualifiers print(feature["db_xref"]
Got it! Thanks for your direction Ram. (Sorry for the late response, benchwork got in the way)