Question: Instaling Reas Repeat Finding
3
gravatar for Darked89
4.0 years ago by
Darked893.5k
Barcelona, Spain
Darked893.5k wrote:

I am trying to run ReAS, de novo repeat library construction package. My steps are described here:

http://openwetware.org/wiki/Wikiomics:Repeat_finding#ReAS

I got no errors running make (after changing one line, see above link), and several steps processing steps seems to be running OK. But when I run

nohup reas_all.pl -read 454_subset_4repeat_search.fas -pa 8 -output \ 
454_subset_4repeat_search.fas.cons_reas   2 > nohup.reas1.txt & 

I end up with: /usr/local/bin/multi_run2.sh: line 72: dust: command not found

This is strange, since I do have dust (from wublast) on the PATH.

Line 72 multi_run2.sh is just "eval $comm"

The relevant part of the code is in run_SegLink.pl:

`split_fa_bygroup.pl ../$read_file $Nfiles 1`;

open (Fout, ">sh") || die "can not write file sh\n";

for (my $i=1; $i<=$Nfiles; $i++) {

    print Fout "dust leaf.$i.fa 50 >$i.mask &\n";
}

`cat sh|multi_run2.sh -n $Nprocess -w 2\n`;

close Fout;

`cat *.mask >seg.mask.fa`;


my $N_out_file = $Nprocess * 10;

my $add = 0;

Two questions:

  • anybody managed to get ReAS working? If yes, how?
  • any ideas why I am getting "no dust" error?

EDIT: Perl code changed (previous one got mangled while doing copy-paste) See Michael comment below.

EDIT 2: title change (it is not about dust program anymore)

ADD COMMENTlink modified 3.8 years ago • written 4.0 years ago by Darked893.5k
1

Sorry but this doesn't look like valid perl to me: for (my $i=1; $i$i.mask &n"; } Are your sure this is correct? Anyway, I have no idea about this tool, after seeing this code snippet, whatever it does, I would abstain from ever using it unless forced by threats to life ;D

ADD REPLYlink written 4.0 years ago by Michael Dondrup27k

@Michel: my Firefox crashed while posting, I completed the post not noticing that code is mangled somehow. Sorry.

ADD REPLYlink written 4.0 years ago by Darked893.5k

@Michel: now I know pre tag did not protect messing up HTML by Perl code.

ADD REPLYlink written 4.0 years ago by Darked893.5k
3
gravatar for Michael Dondrup
4.0 years ago by
Bergen
Michael Dondrup27k wrote:

Well, well, I see. The author of such a script is probably in urgent need of a perl course (rofl) What is done is: the perl script creates a shell script with some shell commands, adds "&" to fork them off and passes this to another shell script via stdin piping "|" that just evals the lines in (probably) bash syntax given to it. This is achieved by back-tick evaluation.

The author nicely presents some of the worst practices in perl programming, unaware of such useful perl functions such as exec, system and fork. And somewhere on the way, the PATH environment is probably unfortunately lost. Anyway even though crude, this should work at a first glance. So check the following:

  1. In the console, try which dust If you don't get the path, the script cannot. Then try e.g. locate dust
  2. As a quick fix, replace:

     print Fout "dust leaf.$i.fa 50 >$i.mask &\n";
    

    with

     print Fout "/path/to/dust leaf.$i.fa 50 >$i.mask &\n";
    
  3. better fix: make configuration variable for all executables that are called from inside it. Define them in the beginning of the script or in a separate config.pm and use something like:

    print Fout "$dustexecutable leaf.$i.fa 50 >$i.mask &n";

  4. Even better: Try to re-write the code to make use of system and fork and to do all the things that are done purely in perl.

(sorry, no offence...)

ADD COMMENTlink written 4.0 years ago by Michael Dondrup27k

in the same shell window I was getting dust to execute. Same peeve with mixing shell scripting & perl/python. Applied 1 & 2, will know if this solves the problem in few hours.

ADD REPLYlink written 4.0 years ago by Darked893.5k

Good news: it did not print "no dust" this time. Bad news: finished without printing any other error, but also without producing final output. 2.8k lines of Perl in 32 files + one shell file...

ADD REPLYlink written 4.0 years ago by Darked893.5k

Why didn't this suprise me? And again, discovered just another case of a vapour-software publication (sometimes called 'misconduct'). Maybe send a nice mail to the authors, including a link to this question. You can also leave a comment on the PLoS-Comp. Biology site maybe: http://www.ploscompbiol.org/article/info:doi%2F10.1371%2Fjournal.pcbi.0010043

ADD REPLYlink written 4.0 years ago by Michael Dondrup27k

Well, there has been ppl who managed to make ReAS work, but somehow this skill is rare... I posted few authors who did just that, maybe they will answer. re PLOS-Comp: I will post the links.

ADD REPLYlink written 4.0 years ago by Darked893.5k
1
gravatar for Darked89
3.8 years ago by
Darked893.5k
Barcelona, Spain
Darked893.5k wrote:

late follow up:

1) I run ReAS in a single process mode:

 nohup reas_all.pl -read 454_subset_4repeat_search.fas  -output 454_subset_4repeat_search.fas.cons_reas   2 > nohup.reas8.txt &

This does not change much, but helps to exclude any errors caused by executing commands in paralell

2) to my bewilderment our sys admins did not link all binaries from cross_match directory to bin, so I had no cross_match.manyreads on my PATH... Fixed.

3) run_SegLink.pl from ReAS writes lovely named temp.sh with a following command:

cross_match.manyreads leaf.1.fa leaf.1.fa -masklevel 101 -minmatch 14 -minscore 30 2>/dev/null |ConvertAlign.pl -s 100 -i 0.6 -t 0 | rmLCSlin
k.pl -m seg.mask.fa -j 50 | align_to_binary >>0.link &

This makes sure that if there are any problems with cross_match.manyreads then you can not read them.

Executing part of it by hand:

cross_match.manyreads leaf.1.fa leaf.1.fa -masklevel 101 -minmatch 14 -minscore 30  > cross_match.out

gives what was wrong:

<snip>
1.008 Mbytes allocated -- total 3647.952 Mbytes

FATAL ERROR: MAX_NUM_BLOCKS exceeded

4) in summary: ReAS produces no sensible result because at this moment cross_match.manyreads can not take 1.8Gb fasta files as an input.

5) Phrap/cross_match distribution has a file db.c defining MAX_NUM_BLOCKS = 1000. Simple number increase to 4000 gives the same error, 5000 does not work:

1 1784847673 -725271951
FATAL ERROR: seq_area_size

seq_area_size is declared in swat.h as int. Changing everything which deals with seq_area_size from int to big int does not look good without consulting program author.

EDIT:

6) decreasing the size of the fasta input file (8x, from 2.1Gb to 265Mb/592k 454 reads). ReAS is executing "run" file:

 cross_match.manyreads leaf.1.fa ../HD_reads.fa -masklevel 101 -minmatch 17 -minscore 30  2>/dev/null |ConvertAlign.pl -s 100 -i 0.6 -t 0 |HighDseg -s 100 -i 0.6 -d 6 >1.bk &

It is still running cross_match.manyreads part @100% CPU, close to 20h by now on Xeon @2.93GHz Running it in a single process mode => slow.

ADD COMMENTlink modified 3.8 years ago • written 3.8 years ago by Darked893.5k
Please log in to add an answer.

Help
Access
  • RSS
  • Stats
  • API

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.0.0
Traffic: 635 users visited in the last hour