Question: Searching For Mouse Regulatory Motifs
9.8 years ago by
Andrew Su4.8k
San Diego, CA
Andrew Su4.8k wrote:

Question: The last time I played with transcription factor motif searching (many years ago) I used the TRANSFAC database of motifs combined with the MATCH tool at gene-regulation.com. Has the state of the art progressed? I'm particularly interested in web servers, web services, or R packages to do this analysis.

Background: A collaborator approached us wanting to study the transcriptional regulation of his favorite gene. Specifically, he wanted to identify key regulators that bind in the ~100 kb upstream region. Because he wanted to cast such a wide net, we started by providing the ~2 kb of sequence (spread among four regions) with the highest phylogenetic conservation as calculated in UCSC's conservation track.

Based on these results, he generated transgenic mice with various combinations of these conservons knocked out, and many of these mice had dramatically altered expression patterns. The next step of course is to identify the specific regulators binding to these regions.

He is currently making additional mice that refine the 2 kb of sequence into smaller chunks. In parallel, we'd like to use bioinformatics to identify candidate binding sites and their corresponding regulatory proteins.

Edit: To clarify, there are many tools that take many coregulated genes and find enriched motifs, but these tools do not really address my particular need. I'm interested in the regulation of exactly one gene, and I'm interested in identifying candidate binding motifs corresponding to known TFs in that gene's upstream genomic sequence.

gene motif transcription • 5.2k views
ADD COMMENTlink modified 9.8 years ago by Ejm0 • written 9.8 years ago by Andrew Su4.8k

Thanks a lot for your excellent follow up with the question and answers. IMHO, this should be cited as one of the best interactive Q&A of BioStar.

ADD REPLYlink written 9.8 years ago by Khader Shameer18k
9.8 years ago by
Neilfws48k
Sydney, Australia
Neilfws48k wrote:

Has the state of the art progressed? Not much. I think the only difference that you'll find from the days of TRANSFAC is that there are a few more (open) database alternatives, particularly JASPAR, and a few more web server tools. Few if any of the latter will have a web service, or even provide results to download as ASCII text. To my knowledge, there is no useful R package for this task.

We have used ConSite and the JASPAR website - to scan sequence at the latter, follow the link to the vertebrate core database. You could also look at the MEME suite; it is more geared towards de novo motif discovery, but has tools to scan for known motifs, provided that you convert them to the required format.
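
As a rough illustration of that conversion step, here is a minimal Python sketch that writes a count matrix out in what I understand to be MEME's minimal motif text format. The function name `to_meme`, the uniform background and the example counts are assumptions made up for illustration; they are not taken from JASPAR or from the MEME documentation, so check the header fields against the MEME suite's format description before relying on it.

```python
# Hypothetical count matrix: one list per base, one entry per motif position.
# The numbers are invented for illustration, not a real JASPAR profile.
DEMO_COUNTS = {
    "A": [0, 10, 0, 10],
    "C": [0, 0, 0, 0],
    "G": [10, 0, 0, 0],
    "T": [0, 0, 10, 0],
}


def to_meme(name, counts, background=0.25):
    """Format a count matrix as MEME minimal-motif text.

    Assumes every position has at least one count and that a uniform
    background frequency is acceptable.
    """
    width = len(counts["A"])
    nsites = sum(counts[b][0] for b in "ACGT")
    lines = [
        "MEME version 4",
        "",
        "ALPHABET= ACGT",
        "",
        "strands: + -",
        "",
        "Background letter frequencies",
        " ".join(f"{b} {background:.3f}" for b in "ACGT"),
        "",
        f"MOTIF {name}",
        f"letter-probability matrix: alength= 4 w= {width} nsites= {nsites} E= 0",
    ]
    # One row per motif position, columns in A C G T order.
    for i in range(width):
        total = sum(counts[b][i] for b in "ACGT")
        lines.append(" ".join(f"{counts[b][i] / total:.6f}" for b in "ACGT"))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(to_meme("demo_motif", DEMO_COUNTS), end="")
```

The resulting text could be saved to a file and supplied as the motif file to the scanning tools in the MEME suite.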

ADD COMMENTlink modified 4 months ago by RamRS26k • written 9.8 years ago by Neilfws48k

In addition to providing the database of matrices, the scanning tool that JASPAR provides is certainly among the easiest to use. However, they note themselves the "abysmal selectivity" of their tool, and point the user to ConSite. Unfortunately, the ConSite site is down at the moment... I've emailed the administrators and will (hopefully) post another comment on ConSite once it's back up...

ADD REPLYlink written 9.8 years ago by Andrew Su4.8k
9.8 years ago by
Fred Fleche4.3k
Paris, France
Fred Fleche4.3k wrote:

http://homes.esat.kuleuven.be/~saerts/software/toucan.php

ADD COMMENTlink written 9.8 years ago by Fred Fleche4.3k

Gave TOUCAN a shot and I actually like it quite a bit! I was a bit wary of the webstart, but now I'm a believer. FASTA input was pretty straightforward, MotifScanner allows automated retrieval of several PWM databases (JASPAR, TRANSFAC, etc.), the GUI is pretty nice for viewing the output, and the GFF export allows for easy downstream parsing. Two thumbs up here!
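
To give an idea of what that downstream parsing can look like, here is a minimal Python sketch that reads tab-delimited GFF lines into dictionaries. It relies only on the generic GFF column layout (sequence, source, feature, start, end, score, strand, frame, attributes); the function name `read_gff` and the example line are hypothetical, and which column the TOUCAN export uses for the motif name is an assumption that should be checked against a real export.

```python
GFF_FIELDS = ["seqid", "source", "type", "start", "end",
              "score", "strand", "phase", "attributes"]


def read_gff(lines):
    """Yield one dict per feature line, skipping comments and malformed lines."""
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) < 8:
            continue  # not a GFF feature line
        cols = (cols + [""])[:9]  # the attributes column may be absent
        rec = dict(zip(GFF_FIELDS, cols))
        rec["start"] = int(rec["start"])
        rec["end"] = int(rec["end"])
        yield rec


if __name__ == "__main__":
    # Hypothetical example line; a real export will have its own names.
    example = [
        "##gff-version 2\n",
        "region1\tMotifScanner\tmisc_feature\t10\t21\t8.5\t+\t.\tid \"M1\"\n",
    ]
    for hit in read_gff(example):
        print(hit["seqid"], hit["start"], hit["end"], hit["strand"], hit["attributes"])
```

The score is left as a string on purpose, since GFF allows "." where no score is reported.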

ADD REPLYlink written 9.8 years ago by Andrew Su4.8k

After taking a look at all the great suggestions, I'm accepting this one as the answer -- nice GUI, easy access to TRANSFAC and JASPAR, many pre-built background sequences for motif scanning, nice visual display of results, and GFF export. Thanks Fred (and all)...

ADD REPLYlink written 9.8 years ago by Andrew Su4.8k

The link doesn't seem to work anymore?

ADD REPLYlink written 6.4 years ago by dli220

A web search tracked it down; here it is: http://med.kuleuven.be/lcb/toucan.php

ADD REPLYlink written 6.4 years ago by Neilfws48k

Thank you very much :)

ADD REPLYlink written 6.4 years ago by dli220
9.8 years ago by
razor160
Barcelona
razor160 wrote:

This is an overview of some of the tools available. "Assessing computational tools for the discovery of transcription factor binding sites" http://www.ncbi.nlm.nih.gov/pubmed/15637633. Most of them are command line tools, but there are web server versions for some of them too.

ADD COMMENTlink written 9.8 years ago by razor160

Pretty lengthy list of tools that were compared here, but most fall into the class of tools for de novo motif discovery (as opposed to scanning known motifs). Interesting read though, thanks!

ADD REPLYlink written 9.8 years ago by Andrew Su4.8k
9.8 years ago by
Manhattan, NY
Khader Shameer18k wrote:

I think ORegAnno will be a good start. Lists of mouse TFs and TFBSs are available here. If you are interested in regulatory elements in non-coding regions, you may check Enhancer Browser or RSAT.

ADD COMMENTlink written 9.8 years ago by Khader Shameer18k

ORegAnno appears to be a database of regulatory elements, but it doesn't appear to have a tool to search those elements in sequence. Also, it appears that JASPAR is a more commonly used database of binding motifs, no?

ADD REPLYlink written 9.8 years ago by Andrew Su4.8k

VISTA's Enhancer Browser appears to contain only human elements -- potentially good but not directly useful for my mouse example...

ADD REPLYlink written 9.8 years ago by Andrew Su4.8k

Trying RSAT's matrix-scan which appears to be the most promising so far. The "scanning options" don't really make sense to me at first glance, so I hope the defaults are reasonably chosen...

ADD REPLYlink written 9.8 years ago by Andrew Su4.8k

Thanks for the feedback. I suggested ORegAnno because I have used the TFBS information from ORegAnno to search upstream of several genes of interest using HMMs. True, there is no search program available at ORegAnno, but you may perform a pattern search using HMMs or other approaches with the TFBSs reported in ORegAnno.

ADD REPLYlink written 9.8 years ago by Khader Shameer18k

I suggested Enhancer Browser because the website explicitly mentions that "The VISTA Enhancer Browser is a central resource for experimentally validated human noncoding fragments with gene enhancer activity as assessed in transgenic mice". I took that to mean these elements must be conserved across species.

ADD REPLYlink written 9.8 years ago by Khader Shameer18k

I haven't used JASPAR before, but on a quick look I can see that ORegAnno reports more TFBSs than JASPAR. As Neil suggested, you could use MEME to search for the TFBSs from ORegAnno upstream of your gene of interest.

ADD REPLYlink written 9.8 years ago by Khader Shameer18k

Great! thanks for the follow up... Good points all...

ADD REPLYlink written 9.8 years ago by Andrew Su4.8k

I would also recommend taking a look at one of my papers, where we looked at the upstream regions of a few Arabidopsis genes involved in abiotic stress response using HMMs derived from known TFs. http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2561162/

ADD REPLYlink written 9.8 years ago by Khader Shameer18k
9.8 years ago by
Darked894.2k
Barcelona, Spain
Darked894.2k wrote:

Even restricting yourself to highly conserved regions, 100kb is a lot to search for motifs.

Check:

Prediction of over Represented Transcription Factor Binding Sites in Co-regulated Genes Using Whole Genome Matching Statistics. by Pavesi. Link

After his seminar I updated this page (see the top):

EDIT

I did not use it yet, but Pavesi has a PSCAN server

ADD COMMENTlink modified 18 months ago by RamRS26k • written 9.8 years ago by Darked894.2k

Looks interesting, but ideally I could do this one-off analysis through a web interface, or alternatively a web service or an R package. I'm not quite so motivated to implement a published algorithm. (Added this context to the question above...)

ADD REPLYlink written 9.8 years ago by Andrew Su4.8k

I hope there is no need for reimplementing the algorithm. See PSCAN link in the edited answer above. One-offism is also what I often do :).

ADD REPLYlink written 9.8 years ago by Darked894.2k

The PSCAN site looks good (thanks!), but it is geared toward a slightly different use case than what I'm looking for. See my comment to Greg Tyrelle's answer and my edit above...

ADD REPLYlink written 9.8 years ago by Andrew Su4.8k
9.8 years ago by
Greg Tyrelle70
Netherlands
Greg Tyrelle70 wrote:

There are many tools that will take a sequence and look for TFBSs using various search approaches, e.g. PWMs. In fact, too many. I will second the recommendation for TOUCAN if you want a GUI tool. I needed to integrate this kind of search into a processing pipeline. For that I used TAMO, which is a Python toolkit. I can recommend it if you like Python. There is now a web-based tool, webmotifs, that is based on TAMO, so you're covered if you want a one-off analysis. On the commercial side, Biobase have good integrated web-based tools for TF searching and analysis. The interfaces are a bit of a mess (organic growth of the product), but all the tools are there.

ADD COMMENTlink modified 9.8 years ago • written 9.8 years ago by Greg Tyrelle70

Thanks for the links. TAMO and webmotifs look like they are targeted toward finding enriched motifs based on a putatively coregulated set of genes, but I'm interested in finding instances of known motifs in the upstream region of only one gene. Slightly different use case in the general space of promoter bashing. I've added some clarification to my question above.

ADD REPLYlink written 9.8 years ago by Andrew Su4.8k

TAMO will do motif search and discovery.

ADD REPLYlink written 9.8 years ago by Greg Tyrelle70

Greg, thanks for the follow up. I tried entering one RefSeq ID on this page (http://fraenkel.mit.edu/webmotifs/form.html) and got this error: "You requested motif discovery on fewer than 10 sequences. WebMOTIF only supports motif discovery on sets of 10 sequences or more."

ADD REPLYlink written 9.8 years ago by Andrew Su4.8k
9.0 years ago by
Ejm0
Ejm0 wrote:

Maybe worth checking out Biostrings for doing this in R? It can do some PWM manipulations, including matching, and downloading the PWMs you need from somewhere (e.g. JASPAR) and getting them in there should be pretty straightforward.
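
For readers who want to see what PWM matching boils down to, here is a minimal, library-free Python sketch of the general technique: turn a count matrix into log-odds scores and slide it along both strands of a sequence. This is not the Biostrings implementation. The count matrix, the pseudocount, the uniform background and the "80% of the maximum score" cutoff are all assumptions chosen for illustration.

```python
import math

# Hypothetical count matrix for a 4 bp motif (consensus GATA).
# The numbers are invented for illustration, not a real JASPAR profile.
DEMO_COUNTS = {
    "A": [0, 10, 0, 10],
    "C": [0, 0, 0, 0],
    "G": [10, 0, 0, 0],
    "T": [0, 0, 10, 0],
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def log_odds(counts, pseudocount=0.5, background=0.25):
    """Convert per-position counts into a log2-odds matrix (list of dicts)."""
    pwm = []
    for i in range(len(counts["A"])):
        total = sum(counts[b][i] for b in "ACGT") + 4 * pseudocount
        pwm.append({
            b: math.log2((counts[b][i] + pseudocount) / total / background)
            for b in "ACGT"
        })
    return pwm


def revcomp(seq):
    """Reverse complement of an upper-case ACGT string."""
    return seq.translate(COMPLEMENT)[::-1]


def scan(seq, pwm, min_fraction=0.8):
    """Return (start, strand, score) for windows scoring at least
    min_fraction of the best possible score; start is 0-based."""
    seq = seq.upper()
    width = len(pwm)
    threshold = min_fraction * sum(max(col.values()) for col in pwm)
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        if any(b not in "ACGT" for b in window):
            continue  # skip windows containing N or other symbols
        for strand, site in (("+", window), ("-", revcomp(window))):
            score = sum(col[b] for col, b in zip(pwm, site))
            if score >= threshold:
                hits.append((start, strand, round(score, 2)))
    return hits


if __name__ == "__main__":
    pwm = log_odds(DEMO_COUNTS)
    for hit in scan("ccGATAccTATCcc", pwm):
        print(hit)
```

A fixed fraction of the maximum score is the simplest possible cutoff; on real upstream sequence it will produce many false positives, which is why the dedicated tools add background models and conservation filters.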

ADD COMMENTlink written 9.0 years ago by Ejm0