How to analyze regulatory elements, e.g. TF binding sites, for non-model organisms?
13.3 years ago
Dejian ★ 1.3k

I am working on a non-model organism using customised oligo microarrays. I want to study the regulatory elements of differentially expressed genes (DEGs) to see whether they are controlled by shared transcription factors (TFs). We can map the DEGs to the recently sequenced genome of this species by selecting the best BLAST hit. I want to retrieve the regulatory genomic sequences of all DEGs and predict their possible TF binding sites. Which tool can do this? Many thanks!

transcription binding • 4.2k views

Is this a prokaryote or eukaryote? Euks have transcriptional regulatory elements much more widely dispersed relative to the gene's TSS than do prokaryotes.


Eukaryote; specifically locusts, a kind of insect.


Thanks, now I can better address your question.

13.3 years ago
Will 4.5k

So I would break this into two problems:

  1. Getting Genomic data
  2. Scanning for TFs

Problem 1 is pretty easy. I would use Galaxy. It has tools for uploading entire genomes from arbitrary organisms and a set of tools for extracting genomic intervals. Within ~10 minutes you should be able to get the upstream region of every gene in the locust genome. For this type of analysis I prefer to use 10 kb upstream, but that number is certainly open for debate.
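If you would rather script the interval extraction than do it in Galaxy, here is a minimal Python sketch of the idea. It assumes you already have each gene's scaffold sequence, 0-based half-open coordinates, and strand from your BLAST mapping; the function names and the toy sequence are hypothetical, not part of any tool mentioned here.

```python
# Sketch only: coordinates are assumed 0-based, half-open, on the forward strand.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def upstream_interval(start, end, strand, length=10_000, scaffold_length=None):
    """Return (start, end) of the region upstream of a gene.

    For a '+' gene the upstream region lies before `start`;
    for a '-' gene it lies after `end` on the forward strand.
    The interval is clipped to the scaffold boundaries.
    """
    if strand == "+":
        return max(0, start - length), start
    if strand == "-":
        up_end = end + length
        if scaffold_length is not None:
            up_end = min(up_end, scaffold_length)
        return end, up_end
    raise ValueError(f"unknown strand: {strand!r}")


def upstream_sequence(scaffold_seq, start, end, strand, length=10_000):
    """Return the upstream sequence, oriented 5'->3' relative to the gene."""
    up_start, up_end = upstream_interval(
        start, end, strand, length, len(scaffold_seq)
    )
    region = scaffold_seq[up_start:up_end]
    return region if strand == "+" else reverse_complement(region)


if __name__ == "__main__":
    toy_scaffold = "AAAACCCCGGGGTTTT"  # made-up sequence for illustration
    print(upstream_sequence(toy_scaffold, 8, 12, "+", length=4))
    print(upstream_sequence(toy_scaffold, 8, 12, "-", length=4))
```

Note that genes close to a scaffold edge will give a shorter region than requested, which is common in draft assemblies.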

Problem 2 is a little more difficult and will require some approximations. My knowledge of insect taxonomy is not very good, but I would guess that Drosophila melanogaster (DM) would be a pretty good approximation. You can use the JASPAR database to get the position weight matrices (PWMs) for all TFs in DM (I would download them from the FTP site). Then scan the upstream region of each gene in locust. I've done this a few times before, so here is a link to a Python script that should get you started. It requires the MOODS package, which is here (it's not easy to find with Google): Paper, code. The script can be git-cloned/downloaded from here. It's currently pretty general (it can process any JASPAR file and SEQInteral file), but it should be easy to modify if you need to do so.
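To make the scanning step concrete, here is a rough pure-Python sketch of what a PWM scan does. This is not the MOODS API and not the linked script; it is a naive illustration that assumes a uniform background, a pseudocount of 0.8, and a made-up count matrix and threshold. A dedicated scanner will be far faster on 10 kb regions for every gene.

```python
import math

BASES = "ACGT"
_COMP = {"A": "T", "C": "G", "G": "C", "T": "A"}


def log_odds_matrix(counts, pseudocount=0.8, background=0.25):
    """Convert a count matrix (dict base -> list of counts) to log2-odds columns.

    The pseudocount and uniform background are assumptions of this sketch.
    """
    width = len(counts["A"])
    matrix = []
    for i in range(width):
        total = sum(counts[b][i] for b in BASES) + pseudocount
        column = {}
        for b in BASES:
            freq = (counts[b][i] + pseudocount * background) / total
            column[b] = math.log2(freq / background)
        matrix.append(column)
    return matrix


def reverse_complement_matrix(matrix):
    """Matrix that scores the reverse strand when applied to forward windows."""
    return [{_COMP[b]: s for b, s in col.items()} for col in reversed(matrix)]


def scan(seq, matrix, threshold):
    """Return (position, strand, score) for every window scoring >= threshold."""
    seq = seq.upper()
    width = len(matrix)
    rc_matrix = reverse_complement_matrix(matrix)
    hits = []
    for pos in range(len(seq) - width + 1):
        window = seq[pos:pos + width]
        if any(ch not in BASES for ch in window):
            continue  # skip windows containing N or other ambiguity codes
        for strand, m in (("+", matrix), ("-", rc_matrix)):
            score = sum(col[ch] for col, ch in zip(m, window))
            if score >= threshold:
                hits.append((pos, strand, score))
    return hits


if __name__ == "__main__":
    # Made-up counts for a GATA-like motif, for illustration only.
    toy_counts = {
        "A": [0, 10, 0, 10],
        "C": [0, 0, 0, 0],
        "G": [10, 0, 0, 0],
        "T": [0, 0, 10, 0],
    }
    pwm = log_odds_matrix(toy_counts)
    for hit in scan("CCGATACCTATCCC", pwm, threshold=6.0):
        print(hit)
```

The score threshold is the main knob: set it too low and every 10 kb region will contain hits for most matrices.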

If the DM PWMs aren't a reasonable approximation, then you'll have to find a tool to predict them. However, you'll need data other than microarray data to get reasonable predictions (ChIP-seq would be wonderful).

EDIT:

When it comes to "analysis" of which TFs are enriched in your DEG list, I would suggest one of two methods. A hypergeometric test (or Fisher's exact test) can find TFs whose predicted sites occur in your DEG list more often than one would expect by chance. This will give you a pretty good "back of the envelope" answer, but the hypergeometric and Fisher's tests are not truly representative of the biology. With a little more work you could get the data into the format required by GSEA, think of each TF as a "signature", and then see how well these signatures match your DEG ranking. This method tends to be a little more accepted by the general community.
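For the hypergeometric option, the calculation is small enough to sketch with the Python standard library. The function name and the numbers in the usage example are made up for illustration; the test asks how likely it is to see at least the observed number of DEGs carrying a predicted site for one TF, given how many genes on the array carry that site overall.

```python
from math import comb


def hypergeom_enrichment_p(n_genes, n_with_site, n_deg, n_deg_with_site):
    """One-sided hypergeometric p-value, P(X >= n_deg_with_site).

    n_genes         -- all genes considered (the background)
    n_with_site     -- background genes with a predicted site for this TF
    n_deg           -- number of differentially expressed genes
    n_deg_with_site -- DEGs that have a predicted site for this TF
    """
    max_overlap = min(n_with_site, n_deg)
    total = comb(n_genes, n_deg)
    tail = sum(
        comb(n_with_site, k) * comb(n_genes - n_with_site, n_deg - k)
        for k in range(n_deg_with_site, max_overlap + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Hypothetical numbers: 10 genes, 4 with a site, 3 DEGs, all 3 with a site.
    print(hypergeom_enrichment_p(10, 4, 3, 3))
```

You would run this once per TF, so the resulting p-values should be corrected for multiple testing before you read much into them.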

-Hope that helps,

Will


Hi, Will. Many thanks for your detailed answer. It is informative and instructive. Since some of this is new to me, it will take me some time to work through the problem. I will provide feedback when there is some progress.

13.3 years ago
Dave Gerrard ▴ 190

I agree with Will but would add that you might be able to use your own expression data to filter for candidate transcription factors (TFs) that are a) turned on in either of your samples or more stringently b) differentially expressed between your samples. This assumes that your custom arrays have probes for the TFs in question.
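As a rough sketch of that filter in Python: the data layout (a dict of TF probe ID to control and treatment signal), the detection threshold, and the probe names below are all hypothetical, so adapt them to however your array data is stored.

```python
def candidate_tfs(tf_expression, detection_threshold, deg_ids=None):
    """Filter TFs by expression.

    tf_expression       -- dict: TF id -> (control_signal, treatment_signal)
    detection_threshold -- signal above which a probe counts as "turned on"
    deg_ids             -- optional iterable of DEG ids; if given, apply the
                           more stringent filter (b): TF must also be a DEG
    """
    expressed = {
        tf
        for tf, (control, treatment) in tf_expression.items()
        if max(control, treatment) >= detection_threshold
    }
    if deg_ids is None:
        return expressed  # filter (a): on in either sample
    return expressed & set(deg_ids)  # filter (b): on and differentially expressed


if __name__ == "__main__":
    # Made-up probe ids and signals, for illustration only.
    toy = {"tf_probe_1": (50.0, 900.0), "tf_probe_2": (20.0, 30.0), "tf_probe_3": (400.0, 420.0)}
    print(sorted(candidate_tfs(toy, detection_threshold=100.0)))
    print(sorted(candidate_tfs(toy, detection_threshold=100.0, deg_ids=["tf_probe_1"])))
```

You would then restrict the PWM scan to matrices for the TFs that survive the filter.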


Thank you, Dave. We have probes for TFs on the microarray, but they were not designed specifically for my current experiment. They were chosen from the locust EST library. Currently, I am planning to check whether they are related to DEGs.

13.3 years ago

Finally some time to address this interesting and relevant topic.

First, I am not certain that transcription factors (TFs) will show accurate, reliable, detectable expression differences between control and treatment. You are likely to see a few, but detecting many may not be feasible.

Second, I would have a look at the papers from Manolis Kellis' group on sequence comparisons to detect transcription control motifs. They did this work in yeast and in fruit flies. You may be able to use their fly data to identify motifs. JASPAR and TRANSFAC don't have much in terms of insect motifs. An alternative approach is to use the genome sequences from fruit fly, mosquito, honeybee and others to build alignments in order to identify likely regulatory motifs or TF binding sites (TFBSs). You'll have to do this on a gene-by-gene basis, as synteny won't exist across these species as it does for the yeasts or Drosophila species Kellis et al. examined. Nonetheless, such an effort may identify conserved core motifs. You can then look to see if the presence of that motif corresponds to an enrichment of upregulated, downregulated or non-responding genes.
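The last step, checking motif presence against response class, can be tallied with a few lines of Python. This is only a sketch: the class labels ('up', 'down', 'none'), the function name, and the gene ids are hypothetical, and the counts it returns are the input to an enrichment test rather than a test in themselves.

```python
from collections import Counter


def motif_by_response(gene_response, genes_with_motif):
    """Tally motif presence per response class.

    gene_response    -- dict: gene id -> 'up', 'down' or 'none'
    genes_with_motif -- set of gene ids whose upstream region has the motif
    Returns dict: class -> (genes with motif, total genes in class).
    """
    totals = Counter(gene_response.values())
    with_motif = Counter(
        cls for gene, cls in gene_response.items() if gene in genes_with_motif
    )
    return {cls: (with_motif[cls], totals[cls]) for cls in totals}


if __name__ == "__main__":
    # Made-up gene ids, for illustration only.
    response = {"g1": "up", "g2": "up", "g3": "down", "g4": "none", "g5": "none"}
    print(motif_by_response(response, {"g1", "g2", "g5"}))
```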

Because you are looking at a "new" or different species, linking a motif to a TF may be difficult. I do like Will's GSEA approach, but I am unsure what success rate you should expect with it. Could work well, could be problematic.


Thank you for your continued attention to this question, Larry. As you expected, few TFs appeared among the DEGs. You mentioned two of Manolis's papers, but I only found one, which focused on yeast [PMID: 12748633]. Could you please provide some details about the papers? Thanks.


Try these PubMed IDs with respect to Drosophila:

17994087
17994088
18421375
20084099
21177974

My search query was "Kellis Drosophila."
