Which scripts or documentation can you recommend for promoter finding?
9.5 years ago
juanma_lace ▴ 20

If this question is too broad please let me know in the comments and I'll try to be more specific.

I'm a software engineer, brand new to bioinformatics, and I was assigned to find information about promoter finding, particularly wheat promoters (4D region).

I've searched for information on promoters (the biology) and on scripts (Python / Perl), but I still don't fully understand how this task can be carried out.

I have the genome in a FASTA file.

>2307812 2159 215576 1548678-,...,2120126+
CCTCTATCAAGTGGTATCAGATTTTCAGGTTGCTCGGTGAGATTTTACAGTTTTTCATAGTTTAGATCGAGGTTGTTCTTCATACCTTTAGTCCACGAAAAAGCCAAAAACATTTAGGGTTCATCCTATCCAAACCAATCTGAGCCTTTGCATAATCTTGTTTAGAGTTTTTGCTTTGTTGAATTTGCGGTTGCATCGTGGTGTCGAGTTGCTGGTCTTAGCGTCTAGTCCTTTAGAGTTTCGAGTTCTGTTTCATAGTTTGTCACGCCGCCGCCGCACCACCTTTATCACTACCATATACCACCACCCCACCGTATACAT

If someone can suggest documentation, websites, or give me some hints I'll be very grateful.

Thank you in advance.

perl python promoters • 2.3k views

Start by reading about the concept: http://en.wikipedia.org/wiki/Promoter_(genetics). You'll then immediately understand why this is not a task that should be assigned to a software engineer with no background in biology/bioinformatics.

9.5 years ago
Renesh ★ 2.2k

As you are a software engineer, I assume you are comfortable with programming. To extract promoter sequences from a genome, you first need a genome coordinate (annotation) file. For most sequenced organisms this is available as a GFF3 file, which can be downloaded from the respective genome website.

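As a working approximation, the 1000 bp (or 2000 bp) immediately upstream of a gene's first exon (the 5' side, i.e. to the left of the first exon for a gene on the + strand) is taken as the promoter sequence for that gene.

So you need to write code that takes the position of the first exon from the GFF3 file (say pos) and extracts the sequence from 1000 or 2000 bp to the left of that position up to the position itself (pos-1000 to pos) from the genome file. For a gene on the - strand, the upstream region lies to the right of the exon's end instead, and the extracted sequence should be reverse-complemented.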

You can easily do this in Perl or Python.
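
To make the coordinate step concrete, here is a minimal Python sketch of it. It is only an illustration under some assumptions: the function names (`parse_attributes`, `promoter_windows`) are made up for this example, the GFF3 file is assumed to contain `exon` rows with a single `Parent=` attribute, and "promoter" just means the fixed-size upstream window described above, not a biologically validated promoter.

```python
from collections import defaultdict


def parse_attributes(field):
    """Turn a GFF3 attribute column such as 'ID=e1;Parent=t1' into a dict."""
    return dict(item.split("=", 1) for item in field.strip().split(";") if "=" in item)


def promoter_windows(gff3_lines, upstream=1000):
    """Map each exon parent (transcript) to an upstream window.

    Returns {parent_id: (seqid, start, end, strand)} where start/end are
    0-based, half-open indices usable as a Python slice on the chromosome.
    """
    exons = defaultdict(list)
    for line in gff3_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, comments and directives
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "exon":
            continue
        seqid, start, end, strand = cols[0], int(cols[3]), int(cols[4]), cols[6]
        parent = parse_attributes(cols[8]).get("Parent")
        if parent and strand in ("+", "-"):
            exons[parent].append((seqid, start, end, strand))

    windows = {}
    for parent, feats in exons.items():
        seqid, _, _, strand = feats[0]
        if strand == "+":
            # GFF3 is 1-based inclusive: the first exon starts at the smallest start.
            first_start = min(f[1] for f in feats)
            lo, hi = max(0, first_start - 1 - upstream), first_start - 1
        else:
            # On the - strand the first exon is the right-most one;
            # upstream lies to its right (hi may run past the sequence end).
            last_end = max(f[2] for f in feats)
            lo, hi = last_end, last_end + upstream
        windows[parent] = (seqid, lo, hi, strand)
    return windows
```

You would call it as `promoter_windows(open("annotation.gff3"))` (the file name is a placeholder). Windows on the - strand still need to be reverse-complemented after slicing.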


Good information, thank you. Once I have the GFF3 file, where do I look for the 1000/2000 bp I need? I mean, in which file format?


From the genome sequence in the FASTA file.
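
As a rough illustration of that lookup, the sketch below loads a FASTA file into memory and slices out the bases upstream of a given position. The helper names (`read_fasta`, `reverse_complement`, `upstream_sequence`) are hypothetical, the record id is assumed to be the first word of the `>` header line, and loading everything into a dict is only reasonable for small files; for a full wheat genome an indexed reader would be a better fit.

```python
def read_fasta(lines):
    """Parse FASTA lines into {record_id: sequence}."""
    seqs, name, chunks = {}, None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                seqs[name] = "".join(chunks)
            # Record id = first word after '>' (assumption for this sketch).
            name, chunks = (line[1:].split() or [""])[0], []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        seqs[name] = "".join(chunks)
    return seqs


_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Complement each base, then reverse the string."""
    return seq.translate(_COMPLEMENT)[::-1]


def upstream_sequence(genome, seqid, pos, strand="+", length=1000):
    """Return up to `length` bases upstream of `pos` (1-based, as in GFF3).

    For '+', pos is the first exon's start; for '-', pos is the first exon's
    end, and the result is reverse-complemented so it reads 5' to 3'.
    """
    seq = genome[seqid]
    if strand == "+":
        return seq[max(0, pos - 1 - length):pos - 1]
    return reverse_complement(seq[pos:pos + length])
```

Typical use would be `genome = read_fasta(open("genome.fasta"))` followed by `upstream_sequence(genome, seqid, pos, strand, 1000)`, where the file name is a placeholder and `seqid`, `pos` and `strand` come from the GFF3 row of the first exon.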
