Question: Get location based on protein ids from genbankfile
4.7 years ago by
Tony10 wrote:

Hello everyone. I am quite new here, and also new to Biopython.

I have a set of protein IDs, and I used them to extract locus_tags from a GenBank file. Is there a way to also extract the locations of those genes, i.e. the start and end positions, using the information I already have?

For example, I have two files:

  1. genbank file
  2. text file containing protein_ids

Using the protein ID file, I need to get the start and end positions of the corresponding genes from the gbk file.

Many thanks, I really appreciate this service. :)

python R gene genome • 1.3k views

Could be helpful : How To Get Ensembl Id (Gene, Transcript, Protein) Mapping Information?

BiomaRt :

modified 4.7 years ago • written 4.7 years ago by Tanvir Ahamed 290

But I am looking for the gene location on the genome. In a GenBank file, start and end coordinates are given for each gene. Using the information I already have, i.e. the gene ID or locus tag, I would like to get the start and end coordinates of the corresponding genes.

written 4.7 years ago by Tony10

Please add an example.

written 4.7 years ago by Tanvir Ahamed 290
4.7 years ago by
United States
skbrimer640 wrote:

You should be able to write a script that reads both the protein file and the GenBank file and either appends to the protein file or writes a new one. To loop through the GenBank file you can use this loop from one of my scripts:

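from Bio import SeqIO

# gb_file is the path to the GenBank file
for record in SeqIO.parse(gb_file, "genbank"):
    for feature in record.features:
        if feature.type == 'CDS':
            # Biopython positions are 0-based and end-exclusive (as BED expects)
            start = int(feature.location.start)
            stop = int(feature.location.end)
            try:
                name = feature.qualifiers['gene'][0]
            except KeyError:
                # some features only have locus tags
                name = feature.qualifiers['locus_tag'][0]
            if feature.location.strand == -1:
                strand = "-"
            else:
                strand = "+"
            # assumption: record.id fills the first (chrom) column, whose value was lost from this snippet
            bed_line = record.id + "\t{0}\t{1}\t{2}\t500\t{3}\t{0}\t{1}\t50,205,50\n".format(start, stop, name, strand)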

This should get you what you want. You can find the whole script here: "More file parsing :) EDIT how do I make a fast and bed file from Genbank file - SOLVED"
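
For the setup in the question (a text file of protein IDs plus a GenBank file), a minimal sketch along the same lines might look like the following. It assumes the text file holds one protein ID per line and that each CDS of interest carries a `protein_id` qualifier; the helper names `load_protein_ids`, `locate_proteins` and `locate_in_file` are made up for illustration and are not part of Biopython. Keep in mind that Biopython reports 0-based starts, so add 1 to the start if you want the coordinate as printed in the GenBank file.

```python
def load_protein_ids(path):
    """Read one protein ID per line, skipping blank lines."""
    with open(path) as handle:
        return {line.strip() for line in handle if line.strip()}


def locate_proteins(records, wanted_ids):
    """Return {protein_id: (record_id, locus_tag, start, end, strand)}.

    `records` is any iterable of SeqRecord-like objects, for example
    the iterator returned by SeqIO.parse(path, "genbank").
    """
    hits = {}
    for record in records:
        for feature in record.features:
            if feature.type != "CDS":
                continue
            # CDS features without a protein_id (e.g. pseudogenes) are skipped
            protein_id = feature.qualifiers.get("protein_id", [None])[0]
            if protein_id not in wanted_ids:
                continue
            locus_tag = feature.qualifiers.get("locus_tag", [""])[0]
            strand = "-" if feature.location.strand == -1 else "+"
            hits[protein_id] = (
                record.id,
                locus_tag,
                int(feature.location.start),  # 0-based
                int(feature.location.end),    # end-exclusive
                strand,
            )
    return hits


def locate_in_file(gb_file, wanted_ids):
    """Convenience wrapper that parses a GenBank file with Biopython."""
    # imported here so locate_proteins stays usable without Biopython
    from Bio import SeqIO
    return locate_proteins(SeqIO.parse(gb_file, "genbank"), wanted_ids)
```

With hypothetical file names, `locate_in_file("genome.gbk", load_protein_ids("protein_ids.txt"))` would give a dictionary you can write out as a table. Any IDs from the text file that are missing from the result were not found as CDS `protein_id` qualifiers.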
