Question: working with promoter sequence of tomato using biopython
gravatar for kws15
4.6 years ago by
kws1540 wrote:

Hi everyone, I have been working with tomato (s.lycopersicum) with its wild relative(s.pennellii). So below is a pair of ortholog protein I got previously using some alignment method.

XP_004243382.1--> protein LOC107025503 [Solanum pennellii]

XP_015081778.1-->protein LOC101253299 [Solanum lycopersicum]

you can easily search them on ncbi, so what I have to do is to analyse th promoter sequences of the proteins of all the orthologous proteins. I used the biopython script below to extract the promoter sequnce of all the gene in every chromosome genbank file for both species. genbank Chromosome files can be downloaded through the ftp site of ncbi

below is the biopython script

def prom_extract_multi(in_gbk = "in.gbk", prom_len = 1000, file_out = "prom_out.fna"):

from Bio import Seq
from Bio.Seq import Seq
from Bio import SeqIO
from Bio.SeqRecord import SeqRecord

prom_out = ""   

GBrecord = next(SeqIO.parse(in_gbk, "genbank"))
for feature in GBrecord.features:
    if feature.type =="gene":
        my_start = feature.location._start.position # Identifies the start position of the gene on the sense strand (5' to 3' irrespective of actual coding strand).
        my_end = feature.location._end.position # Identifies the end position of the gene on the sense strand (5' to 3' irrespective of actual coding strand).
        start_1000 = my_start - prom_len
        end_1000 = my_end + prom_len
        if feature.strand == -1:
            feat_loc = str(feature.location)
            my_prom = GBrecord[my_end:end_1000].reverse_complement()
            prom_out += "> Promoter rev_comp" + "___" + feature.qualifiers['db_xref'][0] + "___" + feat_loc + "\n"
            prom_out += my_prom.seq.__str__() + "\n\n"

        elif feature.strand == 1:
            feat_loc = str(feature.location)
            my_prom = GBrecord[start_1000:my_start]
            prom_out += "> Promoter" + "___" + feature.qualifiers['db_xref'][0] + "___" + feat_loc + "\n"
            prom_out += my_prom.seq.__str__()+"\n\n"

file=open(file_out, 'w')

so the output file of the promoter fasta file looks like something like this


So i tried to get my promoter sequence for the example pair shown above first, I relate the geneID from the output to the protein id(XP_ 004243382.1) of my example pair to get the promoter sequence for both species, however I was only able to get the one s.pennellii, did I miss something here?? I searched the chromosome genbank file and found something could be causing this,

the link shows “See also 69 discontinued or replaced items.” could this missed/replaced items are the part of my promoter sequence??

Also, can anyone please suggest what I can use to analyse promoter sequences, I tried promoterwise by ebi by failed to install on my laptop, I am thinking just to use blastn , I hope I have explained my question clear enough

thank you very much

biopython promoter tomato • 2.0k views
ADD COMMENTlink modified 2.6 years ago by premraj.preeti0 • written 4.6 years ago by kws1540

When you create the output file handler with the open function, use the option "a" instead of "w" to append the result to the output instead of overwritten it.

ADD REPLYlink written 4.6 years ago by rafa.rios.5060

There's a bug in your FASTA output: you have a space between ">" and "Promoter_..." which can break many tools which will interpret this as a zero length identifier.

ADD REPLYlink written 4.5 years ago by Peter5.8k

In this line of code prom_out += "> Promoter" + "___" + feature.qualifiers['db_xref'][0] + "___" + feat_loc + "\n"

I am getting KeyError: 'db_xref'

Any suggestions please

ADD REPLYlink written 2.6 years ago by premraj.preeti0

Me too premraj.preeti. If you found a solution for this weird error, can you please share? i have Biopython version 1.71 Thanks.

ADD REPLYlink written 2.5 years ago by Areej.alsheikh0
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 665 users visited in the last hour