Question: Scan a large genome for the ocurrence of a single sequence motif?
2
gravatar for qwerty
5.8 years ago by
qwerty50
Germany
qwerty50 wrote:

Hello,

I want to compile a BED file with genomic coordinates of all occurrences of a particular sequence motif in the complete genome. One obvious online tool to do this task, FIMO, can not be applied because it has a limit of 1 million nucleotides. I need to scan the whole genome (~1e9 bp). Any software suggestions?

Thanks!

ADD COMMENTlink modified 5.8 years ago • written 5.8 years ago by qwerty50
1

what does your motif look like? is it a sequence you guys are working on (i.e. not in databases already) or a previously known  motif?

you could try IGV http://www.broadinstitute.org/igv/motif_finder

ADD REPLYlink modified 5.8 years ago • written 5.8 years ago by TriS4.2k

It's not a known TF motif, it's a novel motif determined from ChIP-seq

ADD REPLYlink modified 5.8 years ago • written 5.8 years ago by qwerty50
2
gravatar for thackl
5.8 years ago by
thackl2.8k
MIT
thackl2.8k wrote:

If I understand correctly, you can download and install MEME/FIMO locally (http://meme.nbcr.net/meme/meme-download.html, http://meme.nbcr.net/meme/doc/fimo.html). That might solve the 1Mbp limit. If it doesn't you could chop up your sequence like this:

bedtools makewindow -g GEN.fa -w 1000000 > GEN.bed
bedtools getfasta -fi GEN.fa -bed GEN.bed -fo GEN-win.fa
csplit -z GEN-win.fa '/>/' '{*}'
for FILE in GEN-win.fa.*; do FIMO; done

You would have to adjust coordinates by 1Mbp * split.count for each result file

ADD COMMENTlink modified 5.8 years ago • written 5.8 years ago by thackl2.8k
1
gravatar for Fidel
5.8 years ago by
Fidel2.0k
Germany
Fidel2.0k wrote:

Use TRAP. What you need is the genome in fasta format and the motif matrix. It is possible to use Transfac and Jaspar or use your own motif as a position-specific scoring matrix.

You may want to take a look at the Nature Protocols paper describing the tool: http://www.nature.com/nprot/journal/v6/n12/full/nprot.2011.409.html

 
ADD COMMENTlink written 5.8 years ago by Fidel2.0k

TRAP has a limit of 5000 bp :(

ADD REPLYlink written 5.8 years ago by qwerty50
1

It can scan whole genomes -that is the idea- and it does it very fast. 

This is how I ran it using drosophila genome:

ANNOTATE_v3.04 -s dm3.fa --pssm meme.pssm -g 0.5 --ttype balanced -name 'MOTIF_1' -d > trap.txt
ADD REPLYlink written 5.8 years ago by Fidel2.0k

Thank you. Could you please tell me an approximate duration of such a job? As you have suggested, I am running a command line with the same parameters as you just indicated. My motif length is 27 nucleotides, and the genome is human. It took already more than 3 hours, still running, and no indications of the state of the job...

ADD REPLYlink modified 5.8 years ago • written 5.8 years ago by qwerty50

Fidel, So I waited for 24 hours for the program to finish, and then killed the task. Any ideas how long should it take normally?

ADD REPLYlink written 5.8 years ago by qwerty50

24 hours is too much! The problem may lie on the --ttype balanced, to solve this replace it by -t 5.0 (I just asked my boss Thomas Manke who participated in the development of the tool). The -t (threshold) is the score used to define hits. Higher numbers result in fewer hits that are of higher confidence. How did you compute the pssm matrix? 

 

ADD REPLYlink modified 5.8 years ago • written 5.8 years ago by Fidel2.0k

The PSSM matrix was reported by MEME

ADD REPLYlink written 5.8 years ago by qwerty50

The MEME output only include the frequency counts. You need a matrix that has log ratios summing up the probabilities of finding a particular base against a background. Try converting your meme matrix using this tool: http://rsat.ulb.ac.be/convert-matrix_form.cgi

We just tried and a local run took only 10 minutes using the human genome.

ADD REPLYlink written 5.8 years ago by Fidel2.0k

Hi Fidel,

Could you please tell which input and output matrix formats should be selected at RSAT?

Thanks!

ADD REPLYlink written 5.8 years ago by chemcehn190

rsat seems to be down for several days now. They have a nice manual page explaining what you want.

ADD REPLYlink written 5.8 years ago by Fidel2.0k
0
gravatar for qwerty
5.8 years ago by
qwerty50
Germany
qwerty50 wrote:

So finally I ended up mapping the fixed motif by Bowtie using the "-a" option. It took just several seconds.

FIMO did not install locally after several attempts and upgrades of the Perl and Python. Some compilation errors...

TRAP did install locally, but it did not finish the job, and the format of the input matrix required for TRAP remains unclear.

ADD COMMENTlink written 5.8 years ago by qwerty50
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 2072 users visited in the last hour