Question: Which scripts or documentation can you recommend for promoter finding?
juanma_lace wrote (4.5 years ago):

If this question is too broad, please let me know in the comments and I'll try to be more specific.

I'm a software engineer, brand new to bioinformatics, and I've been assigned to find information about promoter finding, particularly for wheat promoters (4D region).

I've searched for information about promoters (the biology) and scripts (Python/Perl), but I can't fully understand how this task can be carried out.

I have the genome sequence in a FASTA file, with headers like this:

>2307812 2159 215576 1548678-,...,2120126+

If anyone can suggest documentation or websites, or give me some hints, I'd be very grateful.

Thank you in advance.


Start by reading about the concept; you'll immediately understand why this is not a task that should be assigned to a software engineer with no background in biology or bioinformatics.

(Istvan Albert, 4.5 years ago)
Renesh (United States) wrote (4.5 years ago):

Since you are a software engineer, I assume you are good at programming. To find promoter sequences in a genome, you first need a genome annotation (coordinate) file. For most sequenced organisms this is available as a GFF3 file, which can be downloaded from the respective genome website.
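A GFF3 file is tab-separated text with nine columns per feature (seqid, source, type, start, end, score, strand, phase, attributes), using 1-based, inclusive coordinates. Below is a minimal sketch of reading it in plain Python. The Feature class and parse_gff3 function are hypothetical names, and a dedicated package such as gffutils may be more robust for real annotation files.

```python
from dataclasses import dataclass, field


@dataclass
class Feature:
    """One GFF3 feature line; start and end are 1-based and inclusive."""
    seqid: str
    type: str
    start: int
    end: int
    strand: str  # '+', '-', '.' (unstranded) or '?' (unknown)
    attributes: dict = field(default_factory=dict)


def parse_gff3(lines):
    """Yield a Feature for every feature line in an iterable of GFF3 text lines."""
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith("##FASTA"):
            break  # the rest of the file is embedded sequence, not features
        if not line or line.startswith("#"):
            continue  # blank line, comment, or other ## directive
        cols = line.split("\t")
        if len(cols) != 9:
            continue  # skip malformed lines rather than fail
        seqid, _source, ftype, start, end, _score, strand, _phase, attr_text = cols
        # Column 9 holds 'key=value' pairs separated by ';'. Values are kept as raw
        # text: not percent-decoded, and multi-valued keys (Parent=a,b) not split.
        attributes = dict(
            pair.split("=", 1) for pair in attr_text.split(";") if "=" in pair
        )
        yield Feature(seqid, ftype, int(start), int(end), strand, attributes)
```

Passing an open file handle works because iterating over it yields lines. From the result you would keep the features whose type is "exon" and group them by their Parent attribute (the transcript they belong to).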

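A common convention is to take the 1000 bp (or 2000 bp) immediately upstream (on the 5' side) of a gene's first exon as its putative promoter sequence. For genes on the plus strand this is the region to the left of the first exon; for genes on the minus strand it is the region to the right, which must then be reverse-complemented.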
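In coordinates, assuming 1-based inclusive positions as in GFF3, with L the window length (1000 or 2000), p the start of the first exon of a plus-strand gene, and q the end of the first exon of a minus-strand gene (its 5'-most base), that convention works out as follows (p, q and L are labels introduced here for illustration):

```latex
\text{promoter}_{+} = [\,p - L,\; p - 1\,]
\qquad
\text{promoter}_{-} = \operatorname{revcomp}\big([\,q + 1,\; q + L\,]\big)
```

For example, with L = 1000, a plus-strand first exon starting at 5001 gives positions 4001 to 5000, and a minus-strand first exon ending at 8000 gives positions 8001 to 9000, reverse-complemented. Windows that would run past either end of the chromosome are simply clipped.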

So you need to write code that takes the position of the first exon from the GFF3 file (say pos) and extracts the 1000 or 2000 bp upstream of it from the genome file: for a plus-strand gene, positions pos-1000 to pos-1 (or pos-2000 to pos-1).
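A minimal sketch of that step in Python, assuming the exon lines have already been read from the GFF3 file into (seqid, transcript_id, start, end, strand) tuples, with transcript_id taken from each exon's Parent attribute, and the genome loaded into a dict of sequences keyed by the same seqid. The function names are hypothetical. For each transcript it takes the 5'-most exon base as pos (smallest start on '+', largest end on '-'), slices the upstream window, and reverse-complements minus-strand windows so that every promoter reads 5' to 3'. If the annotation has no UTRs, that first exon boundary is the start codon rather than the transcription start site.

```python
from collections import defaultdict

# Complements for upper/lower-case IUPAC nucleotide codes (S and W map to themselves).
_COMPLEMENT = str.maketrans("ACGTRYMKBDHVNacgtrymkbdhvn",
                            "TGCAYRKMVHDBNtgcayrkmvhdbn")


def reverse_complement(seq):
    """Reverse-complement a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def extract_promoters(exons, genome, length=1000):
    """Return {transcript_id: upstream sequence, written 5'->3' on the gene's strand}.

    exons:  iterable of (seqid, transcript_id, start, end, strand) tuples taken
            from GFF3 exon lines (1-based, inclusive coordinates).
    genome: dict mapping seqid -> chromosome/scaffold sequence string.
    """
    by_transcript = defaultdict(list)
    for seqid, tx, start, end, strand in exons:
        by_transcript[(seqid, tx, strand)].append((start, end))

    promoters = {}
    for (seqid, tx, strand), coords in by_transcript.items():
        chrom = genome[seqid]
        if strand == "+":
            pos = min(start for start, _ in coords)  # 5'-most exon base
            # 1-based pos-length .. pos-1  ==  0-based slice [pos-1-length, pos-1)
            promoters[tx] = chrom[max(0, pos - 1 - length):pos - 1]
        elif strand == "-":
            pos = max(end for _, end in coords)  # 5'-most exon base on the minus strand
            # 1-based pos+1 .. pos+length  ==  0-based slice [pos, pos+length);
            # Python slicing clips at the chromosome end automatically.
            promoters[tx] = reverse_complement(chrom[pos:pos + length])
        # transcripts with strand '.' or '?' are skipped
    return promoters
```

Grouping by transcript rather than by gene means genes with several annotated transcripts can yield several (possibly different) promoter windows, which is usually what you want when transcripts start at different positions.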

You can easily do this in Perl and Python.


Good information, thank you. Once I have the GFF3 file, where do I get the 1000/2000 bp sequences I need? I mean, from which file format?

(juanma_lace, 4.5 years ago)

From the genome sequence in the FASTA file.
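A minimal sketch of loading a FASTA file into a dict of sequences in Python. It assumes (this is not stated in the thread) that the record ID is the first whitespace-separated token of each header, e.g. 2307812 for a header like ">2307812 2159 215576 ...", and that these IDs match the seqid column of the GFF3 file; check that before extracting anything. The function name read_fasta is hypothetical. A whole wheat genome is very large, so for more than a single chromosome an indexed reader such as samtools faidx is likely a better fit than holding every sequence in memory.

```python
def read_fasta(lines):
    """Return {record_id: sequence} from an iterable of FASTA text lines.

    The record id is the first whitespace-separated token of each '>' header
    (an assumption; it must match the seqid used in the GFF3 file).
    """
    records = {}
    current_id, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if current_id is not None:
                records[current_id] = "".join(chunks)  # finish previous record
            header = line[1:].split()
            current_id, chunks = (header[0] if header else ""), []
        elif current_id is not None:
            chunks.append(line)  # sequence may be wrapped over many lines
    if current_id is not None:
        records[current_id] = "".join(chunks)
    return records
```

Sequence lines are joined as-is, so soft-masked (lower-case) bases are preserved; call .upper() on the result if you want a uniform case.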

(Renesh, 4.5 years ago)