Question: If GI number in list1 is present in genbank file, make new list and append location information
1
gravatar for pawlowac
6.0 years ago by
pawlowac70
Boston, Harvard Medical School
pawlowac70 wrote:

Hi all,

I'm fairly new to python.

I have a list of GI numbers (list 1) and my ultimate goal is to slice sequence data for 5 kb upstream and 5 kb downstream of my target CDS in a genbank file, which is also in a list (list 2). I've made an attempt to do this. I do not get any errors, however I do not get any response from printing the new list (combo_list).

I would really appreciate your help.

 

 

from Bio import SeqIO

def add_location(gi_list):
    gb_file = 'file.gb'
    gb_records = SeqIO.parse(open(gb_file,"r"), "genbank")
    combo_list = []
    for gb_record in gb_records :
        if gb_record.features :
            for feature in gb_record.features :
                if feature.qualifiers == "db_xref" :
                    for gi_list in gb_record :
                        start = feature.location._start.position
                        end = feature.location._end.postion
                        combo_list.append(start, end)
                        print combo_list
                        stop()
    return combo_list

def main():

    gi_file = "gi.txt"
    text_file = open("gi.txt", "r").readlines()
    gi_list = []
    for line in text_file :
        gi_list.append(int(line.strip()))

    location_gi_list = add_location(gi_list)

if __name__ == '__main__':
    main()

 

From Ram's assistance;

 

for genome in gbank:
    for gene in genome.features:
        if gene.type=="CDS": 
            if 'db_xref' in gene.qualifiers:
                for gi in gi_list:
                    if str(gi) in gene.qualifiers['db_xref'][0]:
genbank biopython python • 1.7k views
ADD COMMENTlink modified 6.0 years ago • written 6.0 years ago by pawlowac70

Looks like qualifiers is a collection. == might not be the right way to compare if that's the case.

ADD REPLYlink written 6.0 years ago by Ram32k

Thank you for pointing that out, I've now switched the line to

if "db_xref" in feature.qualifiers
ADD REPLYlink modified 6.0 years ago • written 6.0 years ago by pawlowac70
I should mention that the code overall still does not work.
ADD REPLYlink written 6.0 years ago by pawlowac70

Documentation states that qualifiers is a key value pair. You might wanna check if the db_xref key exists and has a value. And please read the documentation - these are well documented modules you're using.

ADD REPLYlink written 6.0 years ago by Ram32k

http://www2.warwick.ac.uk/fac/sci/moac/people/students/peter_cock/python/genbank/

db_xref does exist. It's the GI number and GeneID. I'm interested in matching GI numbers and pulling out the location associated with it in the GenBank file. 

I have read the documentation. 

And since I'm new at this, I want to make sure the documentation you're referring to is the biopython cookbook, right?

ADD REPLYlink modified 6.0 years ago • written 6.0 years ago by pawlowac70

I know db_xref exists(duh), I was saying you should check if your features dict has a key named db_xref. And I'm not sure if it is the cookbook - try this: Google BioPython Bio Seq and check out Records and Features under Bio Seq GenBank. 

ADD REPLYlink written 6.0 years ago by Ram32k

How do I check if my features dict has a key named db_xref? I thought it was standard much like using 'CDS' and 'source'?

ADD REPLYlink written 6.0 years ago by pawlowac70

if key in dict is the syntax you're looking for. Please try Google, "dict has key" gives you the answer on the first link!

ADD REPLYlink written 6.0 years ago by Ram32k

Gee, thanks for telling me so much to Google for something. Like I've already said, I'm new to this (literally started 2 days ago). If I understood the concept, I wouldn't be asking that question here, I would've googled for it...

ADD REPLYlink written 6.0 years ago by pawlowac70

Sarcasm is not a good strategy on forums, pawlowac. I stick by my statement. You learn these terms only by Googling stuff repeatedly and not making the same mistake twice. And you've written "how do I check if my dict has a key" - try typing that in google and check what it suggests.

ADD REPLYlink written 6.0 years ago by Ram32k

Sorry, I shouldn't have been sarcastic. I was frustrated because you really weren't clarifying anything for me by just telling me to google things considering most of those websites go over my head. I guess the fault is partially mine since I am not asking specific questions to your responses. I also didn't appreciate you saying duh since it was obvious I completely misunderstood your question.

Reading back now, I think by this;

"You might wanna check if the db_xref key exists and has a value."  

you were talking about a particular syntax and not telling me to look if db_xref exists in general or in my genbank file. Correct?

If that's the case, what is the dict? does the code read;

if db_xref in gi_list

since I am looking for gi number from gi_list within gb_records and want to extract a feature associated with it?

ADD REPLYlink written 6.0 years ago by pawlowac70

It's OK, pawlowac, we are all used to plenty of misinterpretations :)

Think of the GenBank object like this:

GenBank object can have any number features - 0 or more. So, you should check if the genbank object has any valid features entry.

Each feature has multiple qualifiers, where one or more of the qualifiers might have a key "db_xref". So, you're looking for something along the lines of if "db_xref" in feature.qualifiers print(feature["db_xref"]

ADD REPLYlink written 6.0 years ago by Ram32k

Got it! Thanks for your direction Ram. (Sorry for the late response, benchwork got in the way)

for genome in gbank:
    for gene in genome.features:
       if gene.type=="CDS": 
            if 'db_xref' in gene.qualifiers:
               for gi in gi_list:
                    if str(gi) in gene.qualifiers['db_xref'][0]:
ADD REPLYlink written 6.0 years ago by pawlowac70
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 2136 users visited in the last hour
_