Question: Tools For ChIP-seq Scale De Novo Motif Finding On Unaligned Sequences?
2
8.3 years ago by
2184687-1231-83-5.0k wrote:

Following up on this question: http://biostar.stackexchange.com/questions/598/tools-for-chipseq-scale-motif-finding

I've got a large number of unaligned eukaryotic regulatory sequences and I want to do de novo motif discovery on them. These sequences have already been filtered to remove reads that have no mapping and reads that wouldn't make a peak.

Most tools I've seen require aligned sequences and/or search only for a list of predefined motifs.

In its simplest form, what I am looking for is a program that would read file.fa, where file.fa contains ~1M regulatory sequences of 50-200 bp, and produce motif predictions without needing to align the sequences to a reference or scan for known motifs.

Does anybody know of a tool that can do de novo motif discovery on this many unaligned FASTA sequences?

chip-seq motif denovo • 2.7k views
modified 8.3 years ago by Dataminer2.6k • written 8.3 years ago by 2184687-1231-83-5.0k
3

How large were your ChIP fragments, and how far did you sequence in? As ChIP-seq sequences from the end of your fragment inwards, do you think the unaligned reads will even have the potential regulatory motifs contained within them?

written 8.3 years ago by Aaron Statham1.1k
1

Is this prokaryotic or eukaryotic data?

written 8.3 years ago by Pasta1.3k

These unaligned regulatory sequences have already been filtered to remove reads that have no mapping and reads that wouldn't make a peak, so most of the data with no potential is already gone.

written 8.3 years ago by 2184687-1231-83-5.0k

It's eukaryotic data.

written 8.3 years ago by 2184687-1231-83-5.0k
3
8.3 years ago by
Amyemilie30 wrote:

Hi,

I'm using GimmeMotifs, a de novo motif prediction pipeline that is especially suited to ChIP-seq datasets.

It's free and easy to install and run. I also think it is the most precise tool available online.

Good luck :).

http://www.ncmls.eu/bioinfo/gimmemotifs/

written 8.3 years ago by Amyemilie30
1

This looks interesting, thanks. How robust are its predictions?

written 8.3 years ago by Alastair Kerr5.2k
1
8.3 years ago by
Mikael Huss4.7k (Stockholm) wrote:

I don't quite see why there would be an issue with unaligned reads, as most de novo motif finding algorithms accept FASTA input.

You could try CisFinder or ChIPMunk. The already proposed GimmeMotifs seems nice too.

written 8.3 years ago by Mikael Huss4.7k
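To illustrate why no alignment step is needed, here is a deliberately naive sketch that works directly on raw sequences: it counts how many input sequences contain each k-mer and lets you rank the most widespread ones. This is not the algorithm used by CisFinder, ChIPMunk, or GimmeMotifs; the function name `kmer_sequence_counts` and the choice of k are assumptions made purely for illustration, and a real tool would also model background frequencies, both strands, and degenerate positions.

```python
from collections import Counter


def kmer_sequence_counts(sequences, k=8):
    """Count, for each k-mer, how many sequences contain it at least once.

    Hypothetical helper for illustration only; k-mers containing characters
    other than A, C, G, T (for example N) are skipped.
    """
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        # Collect each distinct k-mer once per sequence.
        seen = {seq[i:i + k] for i in range(len(seq) - k + 1)}
        counts.update(kmer for kmer in seen if set(kmer) <= set("ACGT"))
    return counts


# Toy input: two sequences share the 6-mer GATAAG, the third does not.
toy = ["TTGATAAGCC", "CCGATAAGTT", "AAAAAAAAAA"]
top = kmer_sequence_counts(toy, k=6).most_common(1)
```

Over-represented k-mers found this way are only candidate seeds; the dedicated tools mentioned in this thread turn such signals into proper motif models.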

In its simplest form, what I am looking for is a program that would read file.fa, where file.fa contains ~1M regulatory sequences of 50-200 bp, and produce motif predictions without needing to align the sequences to a reference or scan for known motifs. Would CisFinder or ChIPMunk work like that?

written 8.3 years ago by 2184687-1231-83-5.0k

Yes - although 1 million is a lot. The most I have tried was about 100,000 sequences with CisFinder, which worked well.

written 8.3 years ago by Mikael Huss4.7k
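If 1 million sequences turns out to be too many for a given tool, one option is to run it on a random subset first. The sketch below draws a uniform random sample of FASTA records in a single pass using reservoir sampling, so the whole file never has to be held in memory. The helper names (`read_fasta`, `reservoir_sample`, `subsample_fasta`), the output file name, and the 100,000 default (taken from the figure quoted above) are assumptions for illustration, not part of any of the tools discussed.

```python
import random


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def reservoir_sample(records, k, seed=0):
    """Return up to k records chosen uniformly at random in one pass."""
    rng = random.Random(seed)
    sample = []
    for i, record in enumerate(records):
        if i < k:
            sample.append(record)
        else:
            # Keep the new record with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                sample[j] = record
    return sample


def subsample_fasta(in_path, out_path, k=100_000, seed=0):
    """Write a random subset of k records from in_path to out_path."""
    with open(in_path) as src:
        sample = reservoir_sample(read_fasta(src), k, seed)
    with open(out_path, "w") as dst:
        for header, seq in sample:
            dst.write(f">{header}\n{seq}\n")
```

For example, `subsample_fasta("file.fa", "subset.fa")` would write a 100,000-sequence subset that can be passed to the motif finder; running it with a few different seeds gives a rough check on whether the reported motifs are stable.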
1
7.5 years ago by
Dataminer2.6k (Netherlands) wrote:

Try GimmeMotifs; it is one of the best in the business, and Emilie did an internship on it. Good luck.

written 7.5 years ago by Dataminer2.6k