Question

Which scripts or documentation can you recommend for finding promoters?

10.7 years ago

juanma_lace ▴ 20

If this question is too broad, please let me know in the comments and I'll try to be more specific.

I'm a software engineer, brand new to bioinformatics, and I have been assigned to find information about promoter finding, particularly for wheat promoters (4D region).

I've searched for information about promoters (biology) and scripts (Python / Perl), but I cannot fully understand how this task can be carried out.

I have the genome in a FASTA file, for example:

>2307812 2159 215576 1548678-,...,2120126+
CCTCTATCAAGTGGTATCAGATTTTCAGGTTGCTCGGTGAGATTTTACAGTTTTTCATAGTTTAGATCGAGGTTGTTCTTCATACCTTTAGTCCACGAAAAAGCCAAAAACATTTAGGGTTCATCCTATCCAAACCAATCTGAGCCTTTGCATAATCTTGTTTAGAGTTTTTGCTTTGTTGAATTTGCGGTTGCATCGTGGTGTCGAGTTGCTGGTCTTAGCGTCTAGTCCTTTAGAGTTTCGAGTTCTGTTTCATAGTTTGTCACGCCGCCGCCGCACCACCTTTATCACTACCATATACCACCACCCCACCGTATACAT
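
To work with a file like this in Python, a small FASTA reader is enough to get each sequence into memory. This is a minimal sketch, not a library function: `read_fasta` is a hypothetical helper name, and it assumes the record ID is the first whitespace-separated word of the header line.

```python
from typing import Dict, Iterable, List, Optional


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA text into {record_id: sequence}.

    The record ID is the first word after '>', so a header such as
    '>2307812 2159 215576 ...' is stored under the key '2307812'.
    Sequence lines are joined and upper-cased.
    """
    records: Dict[str, str] = {}
    current_id: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if current_id is not None:
                records[current_id] = "".join(chunks)  # close previous record
            current_id = line[1:].split()[0]
            chunks = []
        else:
            chunks.append(line.upper())
    if current_id is not None:
        records[current_id] = "".join(chunks)  # close the last record
    return records
```

Usage, with a hypothetical file name: `with open("genome.fa") as handle: genome = read_fasta(handle)`. This keeps every sequence in memory, which gets expensive for a genome as large as wheat's; Biopython's `SeqIO.parse(handle, "fasta")` or an indexed reader such as `SeqIO.index` are well-tested alternatives.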

If someone can suggest documentation or websites, or give me some hints, I'll be very grateful.

Thank you in advance.

perl python promoters

link updated 3.4 years ago by Ram 45k • written 10.7 years ago by juanma_lace ▴ 20

Start by reading about the concept at http://en.wikipedia.org/wiki/Promoter_(genetics), and you'll immediately understand why this is not a task that should be assigned to a software engineer with no background in biology/bioinformatics.

link updated 3.4 years ago by Ram 45k • written 10.7 years ago by Istvan Albert 102k

Ram · Answer 1 · 2014-10-21

10.7 years ago

Renesh ★ 2.2k

As you are a software engineer, I assume you are good at programming. To extract promoter sequences from a genome, you first need a file with the genome coordinates of the genes (the annotation). For most sequenced organisms this is distributed as a GFF3 file, which can be downloaded from the respective genome website.
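
A GFF3 file is tab-separated text with nine columns per feature (seqid, source, type, start, end, score, strand, phase, attributes), so it can be read without special libraries. The sketch below is a minimal, hypothetical reader (`Feature` and `parse_gff3` are illustrative names); it assumes a well-formed file and does not validate parent/child relationships. Which feature types appear (for example `gene`, `mRNA`, `exon`) depends on the annotation you download.

```python
from typing import Dict, Iterable, Iterator, NamedTuple
from urllib.parse import unquote


class Feature(NamedTuple):
    seqid: str
    ftype: str
    start: int  # 1-based, inclusive (GFF3 convention)
    end: int  # 1-based, inclusive
    strand: str  # '+', '-', '.' or '?'
    attributes: Dict[str, str]


def parse_gff3(lines: Iterable[str]) -> Iterator[Feature]:
    """Yield one Feature per 9-column, tab-separated GFF3 data line."""
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("##FASTA"):
            break  # everything after this directive is sequence, not features
        if not line or line.startswith("#"):
            continue  # skip blank lines, other directives and comments
        cols = line.split("\t")
        if len(cols) != 9:
            continue  # not a well-formed feature line
        attrs: Dict[str, str] = {}
        for field in cols[8].split(";"):
            if "=" in field:
                key, value = field.split("=", 1)
                attrs[key] = unquote(value)  # GFF3 percent-encodes special characters
        yield Feature(cols[0], cols[2], int(cols[3]), int(cols[4]), cols[6], attrs)
```

For example, with a hypothetical file name: `with open("annotation.gff3") as fh: exons = [f for f in parse_gff3(fh) if f.ftype == "exon"]`.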

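The 1000 bp (or 2000 bp) immediately upstream of a gene's first exon (on the 5' side, i.e. to the left of the first exon for a gene on the + strand) is commonly taken as that gene's putative promoter sequence. For a gene on the - strand, upstream lies to the right of the first exon in genome coordinates, and the extracted sequence has to be reverse-complemented.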
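
Finding "the first exon" needs the strand: on the + strand the 5' end of a transcript is its smallest exon start, while on the - strand it is the largest exon end. A minimal sketch of that rule, assuming you have already collected exons as `(transcript_id, seqid, start, end, strand)` tuples with 1-based coordinates (the tuple layout and `first_exon_tss` are hypothetical); it treats the 5' end of the first exon as the transcription start site, which is only as accurate as the annotation's UTRs.

```python
from typing import Dict, Iterable, Tuple

# (transcript_id, seqid, start, end, strand), 1-based inclusive coordinates
Exon = Tuple[str, str, int, int, str]


def first_exon_tss(exons: Iterable[Exon]) -> Dict[str, Tuple[str, int, str]]:
    """Return {transcript_id: (seqid, tss, strand)} from each transcript's 5'-most exon.

    '+' strand: the 5' end is the smallest exon start.
    '-' strand: the 5' end is the largest exon end (transcription runs right to left).
    Exons with an unknown strand ('.' or '?') are skipped.
    """
    tss: Dict[str, Tuple[str, int, str]] = {}
    for tid, seqid, start, end, strand in exons:
        if strand not in ("+", "-"):
            continue
        pos = start if strand == "+" else end
        if tid not in tss:
            tss[tid] = (seqid, pos, strand)
            continue
        best = tss[tid][1]
        more_5prime = pos < best if strand == "+" else pos > best
        if more_5prime:
            tss[tid] = (seqid, pos, strand)
    return tss
```

For example, a + strand transcript with exons at 500-600 and 100-200 gets its TSS at 100, and a - strand transcript with exons at 100-200 and 800-900 gets its TSS at 900.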

So you need to write code that takes the start position of the first exon (say pos) from the GFF3 file and, for a gene on the + strand, extracts the 1000 or 2000 bp to its left from the genome FASTA file (i.e. positions pos-1000 to pos-1, since GFF3 coordinates are 1-based).
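
The extraction step is mostly coordinate arithmetic: GFF3 positions are 1-based and inclusive, Python slices are 0-based and half-open, and - strand genes need the window on the other side plus a reverse complement. A minimal sketch, assuming the chromosome sequence is already a Python string and `tss` is the 1-based 5' end of the first exon; `promoter_sequence` and `reverse_complement` are hypothetical helper names, and the 1000 bp default follows the length suggested above.

```python
def reverse_complement(seq: str) -> str:
    """Reverse-complement A/C/G/T/N; other IUPAC codes are left unchanged."""
    comp = str.maketrans("ACGTNacgtn", "TGCANtgcan")
    return seq.translate(comp)[::-1]


def promoter_sequence(chrom_seq: str, tss: int, strand: str, length: int = 1000) -> str:
    """Return up to `length` bp immediately upstream of a 1-based TSS, 5'->3' on the gene's strand.

    '+': 1-based positions tss-length .. tss-1  -> 0-based slice [tss-1-length, tss-1)
    '-': 1-based positions tss+1 .. tss+length  -> 0-based slice [tss, tss+length), reverse-complemented
    Windows that run past either chromosome end are truncated, not padded.
    """
    if strand == "+":
        start0 = max(0, tss - 1 - length)  # clamp at the chromosome start
        return chrom_seq[start0:tss - 1]
    if strand == "-":
        return reverse_complement(chrom_seq[tss:tss + length])  # slicing clamps at the end
    raise ValueError(f"unknown strand {strand!r}")
```

For example, on the toy chromosome `"AAAACCCCGGGGTTTT"`, `promoter_sequence(chrom, 9, "+", 4)` returns positions 5-8 (`"CCCC"`), and `promoter_sequence(chrom, 8, "-", 4)` takes positions 9-12 (`"GGGG"`) and returns their reverse complement, `"CCCC"`.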

You can easily do this in Perl and Python.

link updated 3.4 years ago by Ram 45k • written 10.7 years ago by Renesh ★ 2.2k

Good information, thank you. Once I have the GFF3 file, where do I get the 1000/2000 bp sequences I need? I mean, which file format contains them?