Question: How To Analyze Regulatory Elements, e.g. TF Binding Sites, For Non-Model Organisms?
8.2 years ago by
United States
Dejian1.2k wrote:

I am working on a non-model organism using customised oligo microarrays. I want to study the regulatory elements of differentially expressed genes (DEGs) to see whether they are controlled by shared transcription factors. We can map the DEGs to the recently sequenced genome of this species by selecting the best BLAST hit. I want to retrieve the regulatory genomic sequences of all DEGs and predict their possible TF binding sites. Which tool can do this? Many thanks!

binding transcription • 2.3k views
modified 8.2 years ago by Larry_Parnell16k • written 8.2 years ago by Dejian1.2k

Is this a prokaryote or eukaryote? Euks have transcriptional regulatory elements much more widely dispersed relative to the gene's TSS than do prokaryotes.

written 8.2 years ago by Larry_Parnell16k

Eukaryote; specifically locusts, a kind of insect.

written 8.2 years ago by Dejian1.2k

Thanks, now I can better address your question.

written 8.2 years ago by Larry_Parnell16k
8.2 years ago by
United States
Will4.5k wrote:

So I would break this into two problems:

  1. Getting Genomic data
  2. Scanning for TFs

Problem 1 is pretty easy. I would use Galaxy. It has tools for uploading entire genomes from arbitrary organisms and a set of tools for extracting genomic intervals. Within ~10 minutes you should be able to get the upstream region of every gene in the locust genome. For this type of analysis I prefer to use 10 kb upstream, but that number is certainly open for debate.
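If you would rather script the interval extraction than run it in Galaxy, the idea can be sketched in a few lines of Python. This is only an illustration, not what Galaxy does internally: the `Gene` record, the `genome` dictionary (scaffold name to sequence string) and the function names are all hypothetical, and gene coordinates are assumed to be 0-based and half-open. The important detail is that "upstream" depends on the strand.

```python
from typing import Dict, NamedTuple


class Gene(NamedTuple):
    """Hypothetical gene record; coordinates are 0-based, end-exclusive."""
    name: str
    scaffold: str
    start: int
    end: int
    strand: str  # "+" or "-"


_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def upstream_region(genome: Dict[str, str], gene: Gene, length: int = 10_000) -> str:
    """Return up to `length` bases upstream of the gene, read 5'->3' on the gene's strand.

    The region is clipped at scaffold ends, so it may be shorter than `length`.
    """
    scaffold_seq = genome[gene.scaffold]
    if gene.strand == "+":
        lo = max(0, gene.start - length)
        return scaffold_seq[lo:gene.start]
    # On the minus strand, "upstream" lies to the right of the gene on the
    # reference and must be reverse-complemented.
    hi = min(len(scaffold_seq), gene.end + length)
    return reverse_complement(scaffold_seq[gene.end:hi])


# Toy example with a 16 bp "scaffold" and a 4 bp upstream window.
genome = {"s1": "AAAACCCCGGGGTTTT"}
plus_gene = Gene("g1", "s1", 8, 12, "+")
minus_gene = Gene("g2", "s1", 4, 8, "-")
plus_up = upstream_region(genome, plus_gene, length=4)
minus_up = upstream_region(genome, minus_gene, length=4)
```

The 10 kb default simply mirrors the number suggested above; on a fragmented draft assembly many regions will be clipped by scaffold ends, so it is worth checking the returned lengths.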

Problem 2 is a little more difficult and will require some approximations: My knowledge of the taxonomy of insects is not very good but I would guess that Drosophila melanogaster (DM) would be a pretty good approximation. You can use the JASPAR database to get the Position Weight Matrices for all TFs in DM (I would download them from the FTP site). Then scan the upstream region of each gene in locust. I've done this a few times before so here is a link to a Python script that should get you started. It requires the MOODS package which is here: (it's not easy to find with Google) Paper, code. The script can be git-cloned/downloaded from here. It's currently pretty general (it can process any JASPAR file and SEQInteral file) but it should be easy to modify if you need to do so.
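To make the scanning step concrete, here is a minimal pure-Python sketch of what a PWM scan does. It is not the linked script and it does not use MOODS (which is much faster); the function names, the pseudocount, the uniform background and the threshold are all assumptions chosen for illustration. It converts a count matrix into log-odds scores and slides it along the forward strand only; in practice you would also scan the reverse complement.

```python
import math
from typing import Dict, List, Tuple


def pfm_to_log_odds(pfm: Dict[str, List[float]],
                    pseudocount: float = 0.5,
                    background: float = 0.25) -> List[Dict[str, float]]:
    """Convert a position frequency matrix (base -> counts per position)
    into a list of per-position log2-odds scores against a uniform background."""
    width = len(pfm["A"])
    pwm = []
    for i in range(width):
        total = sum(pfm[b][i] for b in "ACGT") + 4 * pseudocount
        pwm.append({
            b: math.log2(((pfm[b][i] + pseudocount) / total) / background)
            for b in "ACGT"
        })
    return pwm


def scan(seq: str, pwm: List[Dict[str, float]], threshold: float) -> List[Tuple[int, float]]:
    """Return (start, score) for every forward-strand window scoring >= threshold.

    Windows containing anything other than A, C, G or T are skipped.
    """
    seq = seq.upper()
    width = len(pwm)
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        if any(base not in "ACGT" for base in window):
            continue
        score = sum(column[base] for column, base in zip(pwm, window))
        if score >= threshold:
            hits.append((start, score))
    return hits


# Toy 3 bp matrix that strongly prefers "ACG".
toy_pfm = {"A": [10, 0, 0], "C": [0, 10, 0], "G": [0, 0, 10], "T": [0, 0, 0]}
toy_pwm = pfm_to_log_odds(toy_pfm)
toy_hits = scan("TTACGTT", toy_pwm, threshold=3.0)
hit_positions = [start for start, _ in toy_hits]
```

The threshold is the main knob: with 10 kb of sequence per gene, a permissive cutoff will report hits for almost every matrix in almost every gene, which makes the downstream enrichment step uninformative.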

If the DM PWMs aren't a reasonable approximation then you'll have to find a tool to predict them. However, you'll need data other than microarray to get reasonable predictions (ChIP-seq would be wonderful).


When it comes to "analysis" of which TFs are enriched in your DEG list I would suggest one of two methods. A hypergeometric test (or Fisher's exact test) can find TFs whose predicted targets are in your DEG list more often than one would expect by chance. This will give you a pretty good "back of the envelope" answer, but the hypergeometric and Fisher's tests are not truly representative of biology. With a little more work you could get the data into the format required by GSEA, think of each TF as a "signature", and then see how well these signatures match your DEG ranking. This method tends to be a little more accepted by the general community.
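For the back-of-the-envelope version, the hypergeometric tail probability can be computed with the standard library alone. The sketch below assumes you have already reduced the scan results to four counts per TF; the function and argument names are made up for this example.

```python
from math import comb


def hypergeom_enrichment_p(n_genes: int, n_with_site: int,
                           n_deg: int, n_deg_with_site: int) -> float:
    """One-sided hypergeometric p-value, P(X >= n_deg_with_site).

    n_genes          all genes that were scanned (the background)
    n_with_site      genes with at least one predicted site for this TF
    n_deg            differentially expressed genes in the background
    n_deg_with_site  DEGs with at least one predicted site for this TF
    """
    max_k = min(n_with_site, n_deg)
    total = comb(n_genes, n_deg)
    tail = sum(
        comb(n_with_site, k) * comb(n_genes - n_with_site, n_deg - k)
        for k in range(n_deg_with_site, max_k + 1)
    )
    return tail / total


# Toy numbers: 10 genes, 5 with a site, 5 DEGs, all 5 DEGs carry the site.
p_all_overlap = hypergeom_enrichment_p(10, 5, 5, 5)
# Requiring at least 0 overlapping genes is always satisfied.
p_trivial = hypergeom_enrichment_p(10, 5, 5, 0)
```

Because this is run once per TF matrix, the resulting p-values should be corrected for multiple testing before any of them is read as a real signal.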

-Hope that helps,


modified 8.2 years ago • written 8.2 years ago by Will4.5k

Hi, Will. Many thanks for your detailed answer. It is very informative and instructive. Since some of this is new to me, it will take me some time to work through the problem. I will provide feedback when there is some progress.

written 8.2 years ago by Dejian1.2k
8.2 years ago by
Dave Gerrard190 wrote:

I agree with Will but would add that you might be able to use your own expression data to filter for candidate transcription factors (TFs) that are a) turned on in either of your samples or, more stringently, b) differentially expressed between your samples. This assumes that your custom arrays have probes for the TFs in question.

written 8.2 years ago by Dave Gerrard190

Thank you, Dave. We have probes for TFs on the microarray, but they were not designed specifically for my current experiment. They were chosen from the locust EST library. Currently, I am planning to check whether they are related to DEGs.

written 8.2 years ago by Dejian1.2k
8.2 years ago by
Boston, MA USA
Larry_Parnell16k wrote:

Finally some time to address this interesting and relevant topic.

First, I am not certain that transcription factors (TFs) will show accurate, reliable, detectable expression differences between control and treatment. You are likely to see a few, but detecting many may not be feasible.

Second, I would have a look at the papers from Manolis Kellis' group on sequence comparisons to detect transcription control motifs. They did this work in yeast and in fruit flies. You may be able to use their fly data to identify motifs. JASPAR and TRANSFAC don't have much in terms of insect motifs. An alternative approach is to use the genome sequences from fruit fly, mosquito, honeybee and others to build alignments in order to identify likely regulatory motifs or TF binding sites (TFBSs). You'll have to do this on a gene-by-gene basis, as synteny won't exist across these species as it does for the yeasts or Drosophila species Kellis et al. examined. Nonetheless, such an effort may identify conserved core motifs. You can then look to see if the presence of that motif corresponds to an enrichment of upregulated, downregulated or non-responding genes.
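As a rough illustration of the gene-by-gene alignment idea, the sketch below takes one multiple alignment of orthologous upstream regions (built beforehand with whatever aligner you prefer) and reports runs of columns that are identical and gap-free in every species. This is a deliberately naive stand-in, not the method used in the Kellis papers; the function name, the input format (equal-length strings with "-" for gaps) and the minimum length of 6 are assumptions.

```python
from typing import List, Tuple


def conserved_windows(aligned: List[str], min_len: int = 6) -> List[Tuple[int, str]]:
    """Return (alignment_column, motif) for each maximal run of columns that
    are identical in all rows and contain no gap, keeping runs >= min_len."""
    if not aligned:
        return []
    length = len(aligned[0])
    if any(len(row) != length for row in aligned):
        raise ValueError("all aligned sequences must have the same length")
    rows = [row.upper() for row in aligned]
    hits = []
    run_start = None
    # Iterate one column past the end so a run touching the last column is closed.
    for col in range(length + 1):
        conserved = (
            col < length
            and rows[0][col] in "ACGT"
            and all(row[col] == rows[0][col] for row in rows)
        )
        if conserved:
            if run_start is None:
                run_start = col
        else:
            if run_start is not None and col - run_start >= min_len:
                hits.append((run_start, rows[0][run_start:col]))
            run_start = None
    return hits


# Toy alignment of three "species" sharing an 8 bp core.
toy_alignment = [
    "AATGACGTCATT",
    "CCTGACGTCAGG",
    "G-TGACGTCA-C",
]
toy_motifs = conserved_windows(toy_alignment, min_len=6)
```

Requiring perfect identity is stricter than real binding sites warrant, so treat the output as seeds for a proper motif model rather than as final motifs.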

Because you are looking at a "new" or different species, linking a motif to a TF may be difficult. I do like Will's GSEA approach but am unsure what success rate you should expect with it. Could work well, could be problematic.

written 8.2 years ago by Larry_Parnell16k

Thank you for your continued attention to this question, Larry. As you expected, few TFs appeared among the DEGs. You mentioned papers on both yeast and fruit flies from Manolis's group, but I could find only one, which focused on yeast [PMID: 12748633]. Could you please provide some details about the papers? Thanks.

written 8.2 years ago by Dejian1.2k

Try these PubMed IDs with respect to Drosophila: 17994087, 17994088, 18421375, 20084099, 21177974.

My search query was "Kellis Drosophila."

written 8.2 years ago by Larry_Parnell16k