Question

Get location based on protein ids from genbankfile

1

8.1 years ago

Tony ▴ 10

Hello everyone. I am quite new here, and also new to Biopython.

I have a set of protein IDs, and I used them to extract locus_tags from a GenBank file. Is there a way to also extract the locations of those genes, i.e. their start and end positions, using the same information?

For example, I have two files:

a GenBank file
a text file containing protein_ids

Using the protein ID file, I need to get the start and end positions of the corresponding genes from the .gbk file.

Many thanks, I really appreciate this service. :)

genome gene R python • 2.4k views

ADD COMMENT • link updated 8.1 years ago by skbrimer ▴ 740 • written 8.1 years ago by Tony ▴ 10

0

Could be helpful: How To Get Ensembl Id (Gene, Transcript, Protein) Mapping Information?

biomaRt: http://www.ensembl.info/blog/2015/06/01/biomart-or-how-to-access-the-ensembl-data-from-r/

ADD REPLY • link 8.1 years ago by Tanvir Ahamed ▴ 350

0

But I am looking for the gene location on the genome. As we know, a GenBank file gives the start and end coordinates for each gene. Using the information I already have, i.e. the gene ID or locus tag, I would like to get the start and end coordinates of the corresponding genes.

ADD REPLY • link 8.1 years ago by Tony ▴ 10

0

Please add an example.

ADD REPLY • link 8.1 years ago by Tanvir Ahamed ▴ 350

score 1 · Answer 1 · 2016-03-15

You should be able to write a script that reads both the protein file and the GenBank file and either appends the locations to the protein file or writes a new file. To loop through the GenBank file you can use this loop from one of my scripts:

from Bio import SeqIO

# gb_file (path to the GenBank file) and out_bedfile (an open, writable
# file handle) are defined elsewhere in the full script
for record in SeqIO.parse(gb_file, "genbank"):
    for feature in record.features:
        if feature.type == 'CDS':
            # Biopython positions are 0-based with an exclusive end, like BED
            start = int(feature.location.start)
            stop = int(feature.location.end)
            try:
                name = feature.qualifiers['gene'][0]
            except KeyError:
                # some features only have locus tags
                name = feature.qualifiers['locus_tag'][0]
            if feature.location.strand == -1:
                strand = "-"
            else:
                strand = "+"
            bed_line = record.id + "\t{0}\t{1}\t{2}\t500\t{3}\t{0}\t{1}\t50,205,50\n".format(start, stop, name, strand)
            out_bedfile.write(bed_line)

This should get you what you want. You can find the whole script here: More file parsing :) EDIT how do I make a fast and bed file from Genbank file - SOLVED
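To tie this back to the original question, here is a minimal sketch of how the same loop could be restricted to a list of protein IDs. It is not taken from the linked script. It assumes the text file holds one protein ID per line and that each CDS in the GenBank file carries a `protein_id` qualifier. The function names (`read_protein_ids`, `cds_locations`, `main`) and the tab-separated output layout are made up for illustration. Starts are shifted by one so they match the 1-based coordinates printed in the GenBank file, and for spliced (join) features the values are the outermost start and end.

```python
import csv
import sys


def read_protein_ids(path):
    """Return the set of non-empty lines in a text file (one ID per line, assumed)."""
    with open(path) as handle:
        return {line.strip() for line in handle if line.strip()}


def cds_locations(records, wanted_ids):
    """Yield (protein_id, locus_tag, record_id, start, end, strand) for matching CDS features."""
    for record in records:
        for feature in record.features:
            if feature.type != "CDS":
                continue
            # Pseudogene CDS features may lack protein_id; those are skipped
            protein_id = feature.qualifiers.get("protein_id", [None])[0]
            if protein_id not in wanted_ids:
                continue
            locus_tag = feature.qualifiers.get("locus_tag", [""])[0]
            location = feature.location
            strand = "-" if location.strand == -1 else "+"
            # Biopython stores 0-based starts; add 1 to match the GenBank text
            yield (protein_id, locus_tag, record.id,
                   int(location.start) + 1, int(location.end), strand)


def main(gb_file, ids_file, out=sys.stdout):
    """Write one tab-separated line per requested protein ID found in gb_file."""
    # Imported here so the helpers above can be reused without Biopython
    from Bio import SeqIO

    wanted = read_protein_ids(ids_file)
    writer = csv.writer(out, delimiter="\t")
    writer.writerow(["protein_id", "locus_tag", "record", "start", "end", "strand"])
    for row in cds_locations(SeqIO.parse(gb_file, "genbank"), wanted):
        writer.writerow(row)


# Hypothetical usage (file names are placeholders):
# main("genome.gbk", "protein_ids.txt")
```

If you would rather look things up by locus tag, the same pattern should work by comparing `locus_tag` against the wanted set instead of `protein_id`.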