Question: Discovering motifs for core promoter elements on a set of enhancer sites
dally180 (United States) wrote 3.3 years ago:

I have a BED file of enhancer sites that I'd like to run motif analysis on. I'm looking for core promoter elements (if any exist) such as the TATA box, Sp1 sites, Inr, etc.

I came across MEME, and while I admittedly haven't read the entirety of the manual (I'm working on it though!), I thought it would be a good idea to come here and ask about common pitfalls for this type of analysis.

Specifically, I'm looking for advice to make this analysis statistically and biologically sound. Is the input to the MEME suite my BED file of enhancer sites, or should I first convert this BED file to FASTA? Which of the MEME suite tools should I be using if my enhancer sites range from no less than 20 bp to no more than 1000 bp? What is the difference between MEME's novel, ungapped motif identifier and GLAM2's novel, gapped motif identifier? Which one would be better suited to this type of analysis?

 

Thank you!

chip-seq motif analysis meme • 1.3k views
modified 3.3 years ago by Anima Mundi2.4k • written 3.3 years ago by dally180

PWMs for canonical core promoter elements have already been published. For example, Ohler et al. 2002 (Genome Biology) has several of them. In addition, Vo Ngoc et al. 2017 (Genes & Development) recently refined the Inr element.

Unless you are looking for novel core promoter elements, I recommend you just use these prior annotations and set the P- or E-value cutoff on your own. Also, in Ohler et al. 2002 all of the elements were about 12 nt long. You definitely need to trim them to 4 to 8 nt, to keep just the positions with the most information in your analysis.
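
One way to do that trimming, as a rough sketch: score each column of the PWM by its information content and keep the contiguous window with the highest total. The function names and the toy matrix are made up for illustration, and the calculation assumes a uniform background (2 bits maximum per position), which may not suit GC-rich promoters.

```python
import math

def column_information(col):
    """Information content (bits) of one PWM column of A, C, G, T probabilities."""
    # 2 bits is the maximum for a 4-letter alphabet; subtract the column entropy.
    entropy = -sum(p * math.log2(p) for p in col if p > 0)
    return 2.0 - entropy

def trim_pwm(pwm, width):
    """Return (start, columns) for the contiguous window of `width` columns
    with the highest total information content."""
    if not 0 < width <= len(pwm):
        raise ValueError("width must be between 1 and the motif length")
    info = [column_information(col) for col in pwm]
    best_start = max(range(len(pwm) - width + 1),
                     key=lambda s: sum(info[s:s + width]))
    return best_start, pwm[best_start:best_start + width]

# Hypothetical 8-column motif: uninformative flanks around a TATA-like core.
flat = [0.25, 0.25, 0.25, 0.25]
example_pwm = [
    flat,
    flat,
    [0.02, 0.02, 0.02, 0.94],  # mostly T
    [0.94, 0.02, 0.02, 0.02],  # mostly A
    [0.02, 0.02, 0.02, 0.94],  # mostly T
    [0.94, 0.02, 0.02, 0.02],  # mostly A
    flat,
    flat,
]

start, core = trim_pwm(example_pwm, 4)
print(start, len(core))
```

Here the window starting at column 2 (0-based) is selected, since the flanking columns carry no information.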

Good luck.

written 2.2 years ago by maduh1710
Fidel1.9k (Germany) wrote 3.3 years ago:

You can use `centrimo` from the MEME suite. This is the description:

"CentriMo identifies known or user-provided motifs that show a significant preference for particular locations in your sequences (sample output from sequences and motifs). CentriMo can also show if the local enrichment is significant relative to control sequences. See this Manual for more information."

As input you need a FASTA file containing the sequences you are interested in. You can convert your .bed file to a FASTA file using `fastaFromBed` from bedtools (also available as `bedtools getfasta`).
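
To make the coordinate convention concrete, here is a minimal pure-Python sketch of the BED to FASTA step. It is only an illustration: the function name, the toy genome, and the intervals are invented, and in practice bedtools does this against a real genome FASTA. The point is that BED intervals are 0-based and half-open, so they map directly onto string slices.

```python
def bed_to_fasta(bed_lines, genome):
    """Yield (header, sequence) pairs for BED intervals.

    `genome` is a dict mapping chromosome name to its sequence string.
    BED coordinates are 0-based and half-open, like Python slices.
    """
    for line in bed_lines:
        line = line.strip()
        # Skip blank lines and the usual BED header/comment lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        start, end = int(start), int(end)
        yield f"{chrom}:{start}-{end}", genome[chrom][start:end]

# Hypothetical toy genome and enhancer intervals.
toy_genome = {"chr1": "ACGTACGTTATAAAAGGCGC"}
toy_bed = ["chr1\t8\t15\tenh1", "chr1\t0\t4\tenh2"]

for header, seq in bed_to_fasta(toy_bed, toy_genome):
    print(f">{header}\n{seq}")
```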

written 3.3 years ago by Fidel1.9k

Hi Fidel. In sequences not larger than 1 Kb, personally I would not particularly focus on motif position.

written 3.3 years ago by Anima Mundi2.4k

If anything, this gives me excellent information to go on. I will probably use CentriMo and then compare its results to those from AME/FIMO, which do not focus on motif position.

modified 3.3 years ago • written 3.3 years ago by dally180

Anima Mundi2.4k (Italy) wrote 3.3 years ago:

Hello,

Question 1: yes, you should convert your input file to FASTA.

Question 2: while you can certainly play around with the tools of the suite, in your shoes I would start with MEME.

Question 3: GLAM2's approach is somewhat more ambitious than MEME's, as it tries to identify complex motifs (it tries to identify de novo "meta-motifs" made of units that might be separated by gaps).

Question 4: as above, I would prioritize MEME. Also, you could find existing PSWMs for motifs of interest (e.g. Sp1), convert them to MEME format, and input them, together with your FASTA sequences, to FIMO.
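
As a rough sketch of the conversion step, the snippet below writes a count matrix in the MEME minimal motif text format. The function name and the toy counts are invented for illustration (this is not a published Sp1 matrix), and the uniform background line is an assumption you would probably want to replace with frequencies estimated from your own sequences.

```python
def counts_to_meme(name, counts, pseudocount=0.0):
    """Format a count matrix (rows = positions, columns = A, C, G, T)
    as MEME minimal motif text."""
    lines = [
        "MEME version 4",
        "",
        "ALPHABET= ACGT",
        "",
        "strands: + -",
        "",
        "Background letter frequencies",
        "A 0.25 C 0.25 G 0.25 T 0.25",  # assumed uniform background
        "",
        f"MOTIF {name}",
    ]
    nsites = int(sum(counts[0]))
    lines.append(
        f"letter-probability matrix: alength= 4 w= {len(counts)} nsites= {nsites} E= 0"
    )
    for row in counts:
        # Convert counts to probabilities, optionally smoothed by a pseudocount.
        total = sum(row) + 4 * pseudocount
        probs = [(c + pseudocount) / total for c in row]
        lines.append(" ".join(f"{p:.6f}" for p in probs))
    return "\n".join(lines) + "\n"

# Hypothetical toy counts for a short GC-rich motif (20 sites).
toy_counts = [
    [0, 0, 20, 0],
    [0, 0, 20, 0],
    [2, 0, 18, 0],
]
meme_text = counts_to_meme("toy_GC_box", toy_counts)
print(meme_text)
```

The resulting text can be saved to a file and passed to FIMO as the motif file, alongside the FASTA file of enhancer sequences.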

 

written 3.3 years ago by Anima Mundi2.4k

I'll give FIMO a try. I have been having a hard time finding PWMs for my targets of interest in Homo sapiens (I have found only TBP and Sp1). I find it hard to believe there aren't already motifs for the others, but JASPAR does not seem to have them, and my search is turning up nothing.

I will also give MEME a try. I have been running into the error "Your sequence must be at least 8 characters long, remove shorter sequences and re-run", but I'll get it. Thanks for the help!
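
For that error, a small filter over the FASTA file before running MEME may be enough. This is a minimal sketch with made-up function names and toy records; the 8-character threshold is taken from the error message itself.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line and header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)

def drop_short(records, min_len=8):
    """Keep only records whose sequence has at least `min_len` characters."""
    return [(h, s) for h, s in records if len(s) >= min_len]

# Hypothetical toy input: enh2 is too short and gets dropped.
toy = [">enh1", "ACGTACGTAC", ">enh2", "ACGT", ">enh3", "TATAAAAG"]
kept = drop_short(read_fasta(toy))
for header, seq in kept:
    print(f">{header}\n{seq}")
```

With a real file, `read_fasta(open("enhancers.fa"))` would work the same way (the file name here is just a placeholder).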

written 3.3 years ago by dally180

Regarding PSWM databases, also check out TRANSFAC (unfortunately mostly not free) and UniPROBE.

modified 3.3 years ago • written 3.3 years ago by Anima Mundi2.4k

It's amazing that none of these have the actual promoter element motifs I'm looking for (at least not the free version of TRANSFAC). I have found a few Drosophila motifs, but since I am working with human data I don't really know how reliable they would be.

written 3.3 years ago by dally180

Yes, they might be unreliable because of differences between your ideal motifs and the Drosophila ones. Also, keep in mind that FIMO is mostly concerned with avoiding type I errors (false positives). There are more permissive algorithms, if you are already sufficiently convinced that your motifs have to be there (see MotifViz Possum, for example).

modified 3.3 years ago • written 3.3 years ago by Anima Mundi2.4k

Would you know of a database that contains mammalian promoter sequences with an annotated TATA box? Some of the papers I read mentioned obtaining a list from GenBank, but I'm not having much luck. I want to validate some of the motifs I have found.

written 3.3 years ago by dally180

I do not know of such a database, but you may find what you are looking for by taking TSS positions defined by RNA-seq studies and the like, and then expanding their coordinates to yield a list of narrow ranges (say, 50 bp or less) that are very likely to contain TATA boxes, provided you choose your promoter type carefully (e.g., there are TATA-less promoters). You can then use that list to get a FASTA file to screen via MEME; this would output a PSWM that you might then use on your promoters. Actually, your list of promoters might already have been obtained by expanding TSS positions, so if you have TSS information you can just trim your promoters in a different way. In general, I suggest you take care over the quality of the sequences you provide to motif-searching algorithms, as the signal-to-noise ratio is very important.
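
A minimal sketch of the coordinate expansion, assuming you have TSS positions with strand information. The function name and the toy TSS list are invented; the 50 bp width comes from the suggestion above, and the window is placed immediately upstream of the TSS, which is itself an assumption about where you expect the TATA box.

```python
def upstream_window(chrom, tss, strand, width=50):
    """Return a BED-style (chrom, start, end, strand) interval covering
    `width` bp immediately upstream of a TSS.

    `tss` is the 0-based position of the first transcribed base.
    """
    if strand == "+":
        # Clip at the chromosome start.
        start, end = max(0, tss - width), tss
    elif strand == "-":
        # On the minus strand, upstream means higher genomic coordinates.
        # Note: this sketch does not clip at the chromosome end.
        start, end = tss + 1, tss + 1 + width
    else:
        raise ValueError("strand must be '+' or '-'")
    return chrom, start, end, strand

# Hypothetical TSS positions.
toy_tss = [("chr1", 1000, "+"), ("chr2", 5000, "-"), ("chr3", 20, "+")]
for chrom, pos, strand in toy_tss:
    c, s, e, st = upstream_window(chrom, pos, strand)
    # BED6 line: chrom, start, end, name, score, strand
    print(f"{c}\t{s}\t{e}\t.\t0\t{st}")
```

When extracting sequences for these windows, remember that minus-strand intervals need to be reverse complemented (the `-s` option of `bedtools getfasta` does this), otherwise the motif search sees the wrong strand.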

written 3.3 years ago by Anima Mundi2.4k