Question: How To Get Promoter Sequences For Non-Model Organisms?
Dejian (United States) wrote, 9.4 years ago:

Model species are well researched, and many tools are available for them. For example, there are many ways to get promoter sequences for human genes. But what about a non-model organism? This problem is becoming urgent as more and more new genomes become available.

ADDED: Maybe I should clarify the situation. The genome was newly sequenced by our group, so we have its sequences. However, they are not yet publicly available, so BioMart cannot help. We are annotating the genome, and the CDSs can be predicted. However, I have no idea how to determine the promoter region for each gene.


Your question is pretty vague. What exactly do you mean by the promoter sequence? And which genomes are you referring to? Have genes been annotated on these genomes? Are these genomes already available in a genome browser (e.g. Ensembl, Ensembl Genomes or the UCSC Genome Browser)?

written 9.4 years ago by Bert Overduin

Have you tried the fantastic Ensembl APIs (Perl API / biomart-perl)?

written 9.4 years ago by Jarretinha

Whether it is a 'model' organism is irrelevant; the only things that matter are whether the genome is sequenced and whether it is annotated. If the sequence is available, it's available; if not, not ;)

written 9.4 years ago by Michael Dondrup

Is the organism a bacterium, archaeon or eukaryote? That will affect the choice of software and the type of promoter region.

written 9.4 years ago by Neilfws
Lars Juhl Jensen (Copenhagen, Denmark) wrote, 9.4 years ago:

BioMart is a very powerful tool for many extraction tasks. If your genome of interest, despite not being a model organism, is included in Ensembl, Ensembl Bacteria, Ensembl Metazoa, Ensembl Protists, Ensembl Plants, or Ensembl Fungi, simply go to the corresponding BioMart. Through the web interface you can easily retrieve a FASTA file with the 5' flanking sequence of every annotated gene.

If you have an annotated genome that is not yet public, you will need to do some basic scripting to retrieve what you need. It should not be hard, but since every project organizes its data differently, you generally cannot rely on there being an existing tool that you can just run to do the extraction.
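A minimal sketch of that basic scripting, using only the Python standard library. It assumes the (hypothetical) project stores the assembly as a FASTA file and the gene models as GFF3 with `gene` features carrying an `ID` attribute; the 500 bp window is an arbitrary default, not a recommendation. On the minus strand the upstream region lies beyond the feature's end coordinate, so it is reverse-complemented to read 5' to 3' relative to the gene.

```python
import sys
from typing import Dict, Iterable, Iterator, List, Tuple

# Translation table for complementing DNA (N stays N).
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Return {sequence_id: sequence}; the ID is the first word of each '>' header."""
    seqs: Dict[str, List[str]] = {}
    current = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            current = line[1:].split()[0]
            seqs[current] = []
        elif current is not None:
            seqs[current].append(line)
    return {sid: "".join(parts) for sid, parts in seqs.items()}


def read_gff3_features(
    lines: Iterable[str], feature_type: str = "gene"
) -> Iterator[Tuple[str, int, int, str, str]]:
    """Yield (seqid, start, end, strand, ID) per feature; GFF3 coords are 1-based, inclusive."""
    for line in lines:
        if line.startswith("##FASTA"):
            break  # an embedded sequence section follows; no more features
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != feature_type:
            continue
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        yield cols[0], int(cols[3]), int(cols[4]), cols[6], attrs.get("ID", "unknown")


def upstream_region(genome: Dict[str, str], seqid: str, start: int, end: int,
                    strand: str, length: int = 500) -> str:
    """Return up to `length` bp directly 5' of a feature, clipped at contig ends."""
    seq = genome[seqid]
    if strand == "-":
        # 1-based bases end+1 .. end+length, reverse-complemented.
        return seq[end:end + length].translate(COMPLEMENT)[::-1]
    # Any strand other than '-' is treated as plus: 1-based bases start-length .. start-1.
    return seq[max(0, start - 1 - length):start - 1]


def main(fasta_path: str, gff_path: str, length: int = 500) -> None:
    with open(fasta_path) as fh:
        genome = read_fasta(fh)
    with open(gff_path) as fh:
        for seqid, start, end, strand, fid in read_gff3_features(fh):
            if seqid not in genome:
                continue  # annotation refers to a contig missing from the FASTA
            region = upstream_region(genome, seqid, start, end, strand, length)
            print(f">{fid}_upstream{length} {seqid}:{start}-{end}({strand})")
            print(region)


if __name__ == "__main__" and len(sys.argv) >= 3:
    main(sys.argv[1], sys.argv[2], int(sys.argv[3]) if len(sys.argv) > 3 else 500)
```

Saved as, say, `extract_upstream.py` (a hypothetical name), it would be run as `python extract_upstream.py genome.fa genes.gff3 300 > upstream.fa`. For a bacterial annotation that only has single-segment `CDS` features, calling `read_gff3_features(fh, "CDS")` would give regions upstream of each start codon instead.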

And if the genome has not yet been annotated, you will have to do gene prediction first, which is a task in its own right. Without knowing where the genes are, you cannot extract their putative promoter regions.

written 9.4 years ago by Lars Juhl Jensen
Michael Dondrup (Bergen, Norway) wrote, 9.4 years ago:

I'll assume your organism is a bacterium:

Bacterial promoters are normally located relatively close upstream of the CDS. They are characterized by sequence motifs that allow different sigma factors to bind the DNA and initiate transcription. Different promoter motifs are specific to different families of sigma factors (sigma70 and sigma54 are the most common ones). Promoters specific to other sigma factors may be more variable and harder to detect, and your organism might also contain novel sigma factors.
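To illustrate what motif-based detection means for the sigma70 family, here is a toy consensus scan in Python. It assumes the E. coli sigma70 consensus hexamers (-35 `TTGACA`, -10 `TATAAT`) and a spacer of 15 to 19 bp around the typical 17; the mismatch threshold is an arbitrary assumption. Real promoters rarely match the consensus closely, and dedicated predictors use weight matrices or trained models, so this is only a sketch of the idea, not a usable predictor for a new genome.

```python
from typing import List, NamedTuple

MINUS35 = "TTGACA"  # assumed E. coli sigma70 -35 consensus
MINUS10 = "TATAAT"  # assumed E. coli sigma70 -10 consensus (Pribnow box)


class Hit(NamedTuple):
    pos35: int       # 0-based start of the -35 hexamer
    pos10: int       # 0-based start of the -10 hexamer
    spacer: int      # bases between the two hexamers
    mismatches: int  # total mismatches against both consensus hexamers


def mismatches(a: str, b: str) -> int:
    """Hamming distance between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def scan_sigma70(seq: str, max_mismatches: int = 3,
                 spacer_range: range = range(15, 20)) -> List[Hit]:
    """Return candidate -35/-10 pairs, fewest mismatches (then spacer closest to 17) first."""
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - 6 + 1):
        mm35 = mismatches(seq[i:i + 6], MINUS35)
        if mm35 > max_mismatches:
            continue
        for spacer in spacer_range:
            j = i + 6 + spacer
            if j + 6 > len(seq):
                break  # larger spacers would also run off the end
            total = mm35 + mismatches(seq[j:j + 6], MINUS10)
            if total <= max_mismatches:
                hits.append(Hit(i, j, spacer, total))
    return sorted(hits, key=lambda h: (h.mismatches, abs(h.spacer - 17)))
```

In practice you would run such a scan over upstream regions extracted from your annotated genome and treat any hit only as a weak candidate; promoters for sigma54 or other sigma factors need entirely different motifs.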

To give an overview, I found this list of tools for bacterial promoter prediction:

written 9.4 years ago by Michael Dondrup
