Searching For Mouse Regulatory Motifs
13.9 years ago
Andrew Su 4.9k

Question: The last time I played with transcription factor motif searching (many years ago) I used the TRANSFAC database of motifs combined with the MATCH tool at gene-regulation.com. Has the state of the art progressed? I'm particularly interested in web servers, web services, or R packages to do this analysis.

Background: A collaborator approached us wanting to study the transcriptional regulation of his favorite gene. Specifically, he wanted to identify key regulators that bind in the ~100 kb upstream region. Because he wanted to cast such a wide net, we started by providing the ~2 kb of sequence (spread among four regions) with the highest phylogenetic conservation as calculated in UCSC's conservation track.
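For anyone wanting to reproduce a conservation-based pre-filter like this, here is a minimal sketch. It assumes you have already exported per-base conservation scores for the upstream region (e.g. from the UCSC conservation track) as a plain list of numbers. The greedy non-overlapping window selection, the window size, and the helper name `top_conserved_windows` are hypothetical illustrative choices, not how the ~2 kb in the question was actually picked.

```python
def top_conserved_windows(scores, window, n_windows):
    """Greedily pick up to n_windows non-overlapping windows of length
    `window` with the highest mean score. Returns (start, end, mean)
    tuples sorted by start, in 0-based half-open coordinates."""
    if window <= 0 or window > len(scores):
        return []
    # Sliding-window sums via a running total, one per possible start.
    total = sum(scores[:window])
    sums = [total]
    for i in range(window, len(scores)):
        total += scores[i] - scores[i - window]
        sums.append(total)
    # Visit starts from highest to lowest sum (ties: leftmost first).
    order = sorted(range(len(sums)), key=lambda s: (-sums[s], s))
    chosen = []
    for start in order:
        end = start + window
        # Keep this window only if it overlaps none already chosen.
        if all(end <= s or start >= e for s, e, _ in chosen):
            chosen.append((start, end, sums[start] / window))
            if len(chosen) == n_windows:
                break
    return sorted(chosen)


# Hypothetical toy scores: two conserved blocks in a poorly conserved background.
scores = [0.1] * 10 + [0.9] * 5 + [0.1] * 10 + [0.8] * 5 + [0.0] * 5
for start, end, mean in top_conserved_windows(scores, window=5, n_windows=2):
    print(start, end, round(mean, 2))
```

Greedy selection is simple but not guaranteed to maximize total conservation across all chosen windows; for a quick shortlist of regions that is usually acceptable.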

Based on these results, he generated transgenic mice with various combinations of these conservons knocked out, and many of these mice had dramatically altered expression patterns. The next step of course is to identify the specific regulators binding to these regions.

He is currently making additional mice that refine the 2 kb of sequence into smaller chunks. In parallel, we'd like to use bioinformatics to identify candidate binding sites and their corresponding regulatory proteins.

Edit: To clarify, there are many tools that take many coregulated genes and find enriched motifs, but these tools do not really address my particular need. I'm interested in the regulation of exactly one gene, and I'm interested in identifying candidate binding motifs corresponding to known TFs in that gene's upstream genomic sequence.
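In principle, scanning one sequence for known motifs comes down to scoring every window on both strands against a position weight matrix (PWM). A minimal sketch follows. The toy count matrix, the 0.5 pseudocount, the uniform background, and the score threshold are all illustrative assumptions, and real tools (MATCH, ConSite, matrix-scan, etc.) differ in their scoring and thresholding details.

```python
import math

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def log_odds_matrix(counts, background=(0.25, 0.25, 0.25, 0.25), pseudocount=0.5):
    """Convert per-position [A, C, G, T] counts into log2-odds scores."""
    matrix = []
    for row in counts:
        total = sum(row) + 4 * pseudocount
        matrix.append([math.log2(((c + pseudocount) / total) / bg)
                       for c, bg in zip(row, background)])
    return matrix


def scan(matrix, sequence, threshold):
    """Yield (start, strand, site, score) for every window scoring >= threshold.
    `start` is the 0-based window position on the input (+) sequence."""
    seq = sequence.upper()
    width = len(matrix)
    for start in range(len(seq) - width + 1):
        word = seq[start:start + width]
        if any(b not in BASES for b in word):
            continue  # skip windows containing N or other ambiguity codes
        # Score the forward word and its reverse complement.
        for strand, site in (("+", word), ("-", word.translate(COMPLEMENT)[::-1])):
            score = sum(row[BASES.index(b)] for row, b in zip(matrix, site))
            if score >= threshold:
                yield start, strand, site, score


# Hypothetical toy counts for a 4-column motif with consensus GATA (not from JASPAR).
toy_counts = [[0, 0, 10, 0], [10, 0, 0, 0], [0, 0, 0, 10], [10, 0, 0, 0]]
pwm = log_odds_matrix(toy_counts)
for hit in scan(pwm, "ccGATAttTATCgg", threshold=6.0):
    print(hit)
```

The toy sequence contains one forward-strand GATA and one reverse-strand copy (TATC), so both strands are exercised.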


Thanks a lot for your excellent follow-up on the question and answers. IMHO, this should be cited as one of the best interactive Q&As on BioStar.

13.9 years ago
Neilfws 49k

Has the state of the art progressed? Not much. I think the only difference you'll find since the days of TRANSFAC is that there are a few more (open) database alternatives, particularly JASPAR, and a few more web server tools. Few if any of the latter will have a web service, or even provide results to download as ASCII text. To my knowledge, there is no useful R package for this task.

We have used ConSite and the JASPAR website; to scan a sequence with the latter, follow the link to the vertebrate core database. You could also look at the MEME suite: it is geared more towards de novo motif discovery, but it has tools to scan for known motifs, provided that you convert them to the required format.
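The MEME suite ships its own converters for this (e.g. `jaspar2meme`), which are the safer route. Purely to show what the conversion involves, here is a minimal sketch. It assumes the bracketed JASPAR count-matrix layout (a `>ID name` header followed by `A [ ... ]`, `C [ ... ]`, `G [ ... ]`, `T [ ... ]` rows) and writes MEME's minimal text motif format with a uniform background. The toy matrix and its ID are made up.

```python
import re


def parse_jaspar_counts(text):
    """Parse one bracketed JASPAR count matrix into (id, name, columns),
    where each column is [A, C, G, T] counts for one motif position."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    header = lines[0].lstrip(">").split(None, 1)
    motif_id = header[0]
    name = header[1] if len(header) > 1 else motif_id
    rows = {}
    for ln in lines[1:]:
        base, values = ln.split(None, 1)
        rows[base] = [float(v) for v in re.findall(r"[-+]?\d*\.?\d+", values)]
    # Transpose the A/C/G/T rows into one column per motif position.
    return motif_id, name, [list(col) for col in zip(*(rows[b] for b in "ACGT"))]


def to_meme(motif_id, name, columns, background=(0.25, 0.25, 0.25, 0.25)):
    """Render one motif in MEME's minimal text motif format."""
    nsites = int(round(sum(columns[0])))
    out = [
        "MEME version 4",
        "",
        "ALPHABET= ACGT",
        "",
        "strands: + -",
        "",
        "Background letter frequencies",
        "A {:.3f} C {:.3f} G {:.3f} T {:.3f}".format(*background),
        "",
        f"MOTIF {motif_id} {name}",
        f"letter-probability matrix: alength= 4 w= {len(columns)} nsites= {nsites} E= 0",
    ]
    for col in columns:
        total = sum(col)
        out.append(" ".join(f"{c / total:.6f}" for c in col))  # counts -> probabilities
    return "\n".join(out) + "\n"


# Hypothetical toy matrix, not a real JASPAR entry.
TOY_JASPAR = """>EX0001.1 toy
A [ 10  0  0 ]
C [  0 10  0 ]
G [  0  0  5 ]
T [  0  0  5 ]
"""
print(to_meme(*parse_jaspar_counts(TOY_JASPAR)))
```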


In addition to providing the database of matrices, the scanning tool that JASPAR provides is certainly among the easiest to use. However, they note themselves the "abysmal selectivity" of their tool, and point the user to ConSite. Unfortunately, the ConSite site is down at the moment... I've emailed the administrators and will (hopefully) post another comment on ConSite once it's back up...
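The low selectivity JASPAR warns about is easy to see with a back-of-the-envelope calculation. Many scanners threshold on a relative score, (score - min) / (max - min) over the matrix; an 80% cutoff is a common default, but treat that value as an assumption here. The sketch below enumerates every possible site for a toy 6-column matrix (+1 for an arbitrary consensus base, 0 otherwise; not a real TF profile) and counts how many pass.

```python
from itertools import product

BASES = "ACGT"


def relative_score(matrix, word):
    """Rescale a raw PWM score to [0, 1] using the matrix's min and max."""
    raw = sum(row[BASES.index(b)] for row, b in zip(matrix, word))
    lo = sum(min(row) for row in matrix)
    hi = sum(max(row) for row in matrix)
    return (raw - lo) / (hi - lo)


def pass_fraction(matrix, cutoff):
    """Fraction of all 4**w words with relative score >= cutoff, i.e. the
    chance that a random site under a uniform background passes."""
    width = len(matrix)
    passing = sum(1 for word in product(BASES, repeat=width)
                  if relative_score(matrix, word) >= cutoff)
    return passing / 4 ** width


# Hypothetical toy matrix: +1 for the arbitrary consensus GATAAG, 0 otherwise.
consensus = "GATAAG"
toy = [[1.0 if b == c else 0.0 for b in BASES] for c in consensus]
frac = pass_fraction(toy, 0.80)
print(f"{frac:.5f} of random sites pass; ~{2 * 1000 * frac:.1f} chance hits per kb (both strands)")
```

With this toy matrix, 19 of the 4,096 possible 6-mers pass (the consensus plus its 18 single-mismatch variants), i.e. roughly nine chance hits per kilobase when both strands are scanned. Real matrices are less uniform, so the exact rate varies, but it illustrates why filtering by cross-species conservation, as ConSite does, helps.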


Gave TOUCAN a shot and I actually like it quite a bit! I was a bit wary of the Web Start launch, but now I'm a believer. FASTA input was pretty straightforward, MotifScanner allows automated retrieval of several PWM databases (JASPAR, TRANSFAC, etc.), the GUI is pretty nice for viewing the output, and the GFF export allows for easy downstream parsing. Two thumbs up here!
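As an example of that downstream parsing, here is a minimal reader for the nine standard tab-separated GFF columns. It accepts both GFF3-style `key=value` and GFF2/GTF-style `key "value"` attributes. The sample line (feature type, attribute names, coordinates, score) is hypothetical and is not taken from actual MotifScanner output, so check a real export before relying on specific attribute names.

```python
GFF_COLUMNS = ["seqid", "source", "type", "start", "end",
               "score", "strand", "phase", "attributes"]


def parse_attributes(field):
    """Split a GFF attribute column into a dict, accepting both
    'key=value' and 'key "value"' pairs separated by semicolons."""
    attrs = {}
    for part in field.strip().split(";"):
        part = part.strip()
        if not part:
            continue
        if "=" in part:
            key, value = part.split("=", 1)
        else:
            key, _, value = part.partition(" ")
        attrs[key.strip()] = value.strip().strip('"')
    return attrs


def read_gff(text):
    """Yield one dict per feature line, skipping comments and blank lines."""
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) != 9:
            continue  # not a well-formed 9-column GFF line
        rec = dict(zip(GFF_COLUMNS, fields))
        rec["start"], rec["end"] = int(rec["start"]), int(rec["end"])
        rec["score"] = None if rec["score"] == "." else float(rec["score"])
        rec["attributes"] = parse_attributes(rec["attributes"])
        yield rec


# Hypothetical sample; real MotifScanner attribute names may differ.
SAMPLE_GFF = (
    "##gff-version 2\n"
    "region1\tMotifScanner\tmisc_feature\t120\t125\t0.91\t+\t.\t"
    'id "M00001"; site "CAGGTG"\n'
)
strong = [r for r in read_gff(SAMPLE_GFF) if r["score"] is not None and r["score"] >= 0.9]
print(strong)
```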


After taking a look at all the great suggestions, I'm accepting this one as the answer -- nice GUI, easy access to TRANSFAC and JASPAR, many pre-built background sequences for motif scanning, nice visual display of results, and GFF export. Thanks Fred (and all)...
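Background models matter because log-odds scores are computed relative to them. A common choice, and the assumption here (this is not a description of TOUCAN's internals), is an order-k Markov chain estimated from representative sequence such as upstream or intergenic DNA. A minimal sketch with add-one pseudocounts, trained on a made-up toy sequence:

```python
import math
from collections import Counter


def train_markov(sequence, order=1, pseudocount=1.0):
    """Estimate P(base | preceding `order` bases) from a training sequence.
    Returns a function prob(context, base)."""
    seq = sequence.upper()
    context_counts = Counter()
    transition_counts = Counter()
    for i in range(order, len(seq)):
        context, base = seq[i - order:i], seq[i]
        if any(b not in "ACGT" for b in context + base):
            continue  # skip positions touching N or ambiguity codes
        context_counts[context] += 1
        transition_counts[context + base] += 1

    def prob(context, base):
        return ((transition_counts[context + base] + pseudocount)
                / (context_counts[context] + 4 * pseudocount))

    return prob


def background_log2_prob(prob, word, order=1):
    """log2 P(word): the first `order` bases get a uniform 1/4 each,
    and every later base is conditioned on its preceding context."""
    word = word.upper()
    logp = min(order, len(word)) * math.log2(0.25)
    for i in range(order, len(word)):
        logp += math.log2(prob(word[i - order:i], word[i]))
    return logp


# Hypothetical toy training sequence; real models use far more sequence.
prob = train_markov("ACACACACAC", order=1)
print(round(prob("A", "C"), 3), round(prob("A", "G"), 3))
print(round(background_log2_prob(prob, "ACAC"), 3))
```

Under such a model, a candidate site's score would compare its PWM probability against this background probability instead of a flat 0.25 per base.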


The link no longer seems to work?


A web search tracked it down; here it is: http://med.kuleuven.be/lcb/toucan.php


Thank you very much :)

13.9 years ago
razor ▴ 190

This is an overview of some of the tools available. "Assessing computational tools for the discovery of transcription factor binding sites" http://www.ncbi.nlm.nih.gov/pubmed/15637633. Most of them are command line tools, but there are web server versions for some of them too.


Pretty lengthy list of tools that were compared here, but most fall into the class of tools for de novo motif discovery (as opposed to scanning known motifs). Interesting read though, thanks!

13.9 years ago

I think ORegAnno would be a good start. A list of mouse TFs and TFBSs is available here. If you are interested in regulatory elements in non-coding regions, you may check the Enhancer Browser or RSAT.


ORegAnno appears to be a database of regulatory elements, but it doesn't seem to have a tool to search for those elements in a sequence. Also, JASPAR appears to be the more commonly used database of binding motifs, no?


VISTA's Enhancer Browser appears to contain only human elements -- potentially good but not directly useful for my mouse example...


Trying RSAT's matrix-scan, which appears to be the most promising so far. The "scanning options" don't really make sense to me at first glance, so I hope the defaults are reasonably chosen...


Thanks for the feedback. I suggested ORegAnno because I have used its TFBS information to search upstream of several genes of interest using HMMs. True, there is no search program available at ORegAnno, but you can perform a pattern search with HMMs or other approaches using the TFBSs it reports.
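ORegAnno reports individual binding-site instances rather than matrices, so searching with them first requires building a model. The commenter used HMMs; as a simpler swapped-in alternative, the sketch below turns a set of aligned, equal-length sites into a count matrix and then a probability matrix with pseudocounts. The example sites are hypothetical, not taken from ORegAnno, and real reported instances usually need to be aligned and trimmed to a common core first.

```python
BASES = "ACGT"


def counts_from_sites(sites):
    """Build per-position [A, C, G, T] counts from aligned, equal-length sites."""
    sites = [s.upper() for s in sites]
    width = len(sites[0])
    if any(len(s) != width for s in sites):
        raise ValueError("sites must be aligned to the same length")
    counts = [[0, 0, 0, 0] for _ in range(width)]
    for site in sites:
        for i, base in enumerate(site):
            if base in BASES:  # ignore N and other ambiguity codes
                counts[i][BASES.index(base)] += 1
    return counts


def frequencies(counts, pseudocount=0.25):
    """Turn counts into per-position probabilities, adding a pseudocount to every cell."""
    freqs = []
    for row in counts:
        total = sum(row) + 4 * pseudocount
        freqs.append([(c + pseudocount) / total for c in row])
    return freqs


# Hypothetical aligned binding sites for one TF (illustration only).
sites = ["CACGTG", "CACGTG", "CATGTG", "CACGTC"]
counts = counts_from_sites(sites)
for row, freq in zip(counts, frequencies(counts)):
    print(row, [round(f, 2) for f in freq])
```

The resulting count matrix is the same kind of input a PWM scanner (or MEME, after conversion) expects; the pseudocount keeps unseen bases from getting zero probability when only a handful of sites are known.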


I suggested the Enhancer Browser because the website explicitly states that "The VISTA Enhancer Browser is a central resource for experimentally validated human noncoding fragments with gene enhancer activity as assessed in transgenic mice". I took that to mean these elements must be conserved across species.


I haven't used JASPAR before, but at a quick look ORegAnno reports more TFBSs than JASPAR. As Neil suggested, you could use MEME to search for the ORegAnno TFBSs upstream of your gene of interest.


Great, thanks for the follow-up... Good points all...


I would also recommend taking a look at one of my papers, where we examined the upstream regions of a few Arabidopsis genes involved in the abiotic stress response using HMMs derived from known TFs. http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2561162/

13.9 years ago
Darked89 4.6k

Even restricting yourself to highly conserved regions, 100 kb is a lot to search for motifs.
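To put rough numbers on why a long region is a lot to search: by linearity of expectation, the expected number of chance matches is (number of start positions) x (number of strands) x (probability that a random site passes), whether or not windows overlap. The sketch assumes a uniform, independent background, which real genomic sequence is not, and uses an exact match to one 6-mer purely as an example of a per-site pass probability.

```python
def expected_random_hits(region_length, motif_width, p_site, strands=2):
    """Expected number of chance matches: every start position on each
    strand is one trial that passes with probability p_site."""
    positions = max(region_length - motif_width + 1, 0)
    return strands * positions * p_site


# Exact match to one specific 6-mer under a uniform background: p = 4**-6.
p_exact = 4 ** -6
for length in (100_000, 2_000):
    print(length, round(expected_random_hits(length, 6, p_exact), 1))
```

That works out to roughly 49 chance exact hits in 100 kb versus about one in 2 kb, which is one argument for the conservation-based narrowing described in the question. Looser PWM thresholds raise `p_site`, and the counts scale up accordingly.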

Check:

Prediction of over Represented Transcription Factor Binding Sites in Co-regulated Genes Using Whole Genome Matching Statistics. by Pavesi. Link

After his seminar I updated this page (see the top):

EDIT

I haven't used it yet, but Pavesi has a PSCAN server.

Looks interesting, but ideally I could do this one-off analysis through a web interface, or alternatively a web service or an R package. I'm not quite so motivated to implement a published algorithm. (Added this context to the question above...)


I hope there is no need to reimplement the algorithm; see the PSCAN link in the edited answer above. One-off analyses are what I often do too :).


The PSCAN site looks good (thanks!), but it is geared toward a slightly different use case than what I'm looking for. See my comment to Greg Tyrelle's answer and my edit above...

13.9 years ago
Greg Tyrelle ▴ 70

There are many tools that take a sequence and look for TFBSs using various search approaches, e.g. PWMs. In fact, too many. I will second the recommendation for TOUCAN if you want a GUI tool. I needed to integrate this kind of search into a processing pipeline, and for that I used TAMO, a Python toolkit, which I can recommend if you like Python. There is now a web-based tool, WebMOTIFS, built on TAMO, so you're covered if you want a one-off analysis. On the commercial side, Biobase has good integrated web-based tools for TF searching and analysis. The interfaces are a bit of a mess (organic growth of the product), but all the tools are there.


Thanks for the links. TAMO and webmotifs look like they are targeted toward finding enriched motifs in a putatively coregulated set of genes, but I'm interested in finding instances of known motifs in the upstream region of only one gene. It's a slightly different use case in the general space of promoter bashing. I've added some clarification to my question above.


TAMO will do motif search and discovery.


Greg, thanks for the follow up. I tried entering one refseq on this page (http://fraenkel.mit.edu/webmotifs/form.html) and get this error: "You requested motif discovery on fewer than 10 sequences. WebMOTIF only supports motif discovery on sets of 10 sequences or more."

13.0 years ago
Ejm • 0

It may be worth checking out Biostrings for doing this in R. It can do some PWM manipulations, including matching, and downloading the PWMs you need from somewhere (e.g. JASPAR) and getting them into it should be pretty straightforward.
