Question: Missing or disappearing output RepeatMasker
mtollis (United States) wrote, 6.4 years ago:

I have received this message recently while using makeblastdb for rmblast in RepeatMasker, and it is a real head-scratcher for me. 

After no errors and completely running through all cycles, RepeatMasker finishes but there are no output files. The only trace of the analysis is the rmblastdb.log file in the RepeatMasker/Libraries directory which reads:

Building a new DB, current time: 05/29/2014 10:54:02
New DB name:   /home/mtollis/RepeatMasker/Libraries/20140131/anolis/specieslib
New DB title:  /home/mtollis/RepeatMasker/Libraries/20140131/anolis/specieslib
Sequence type: Nucleotide
Keep Linkouts: T
Keep MBits: T
Maximum file size: 1000000000B
Error: (1431.1) FASTA-Reader: Warning: FASTA-Reader: First data line in seq is about 100% ambiguous nucleotides (shouldn't be over 40%)
Error: (1431.1) FASTA-Reader: Warning: FASTA-Reader: First data line in seq is about 100% ambiguous nucleotides (shouldn't be over 40%)
Adding sequences from FASTA; added 776 sequences in 0.108892 seconds.

Perhaps the makeblastdb "error" is harmless, and maybe it is merely coincidental that my analysis fails. I don't see how either true ambiguities or line endings could be the problem, as my database is hardly novel: I am using the RepBase update and the -species option. The option appears to work, as it creates the species-specific library as well as the general library in the RepeatMasker/Libraries directory.

Does anyone know why RepeatMasker would run without throwing any errors and then leave no output files whatsoever?


<deleted>

modified 6.4 years ago • written 6.4 years ago by PoGibas

Do the headers in your FASTA files look OK? Each should start with ">" followed by the header name, with the actual sequence starting on the next line. I can imagine these errors arising for sequences without proper headers (just a guess, though).

written 6.4 years ago by Biomonika (Noolean)
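
The header check suggested above, together with the "first data line" warning in the makeblastdb log, can be sketched in Python. This is only an illustrative check, not what makeblastdb itself does: the function name check_fasta is hypothetical, the 40% threshold is copied from the log message, and treating every character other than A, C, G or T as "ambiguous" is an assumption.

```python
# Assumption: anything other than A/C/G/T counts as an ambiguous nucleotide.
UNAMBIGUOUS = frozenset("ACGTacgt")


def check_fasta(lines, max_ambiguous=0.40):
    """Return a list of (line_number, header, problem) tuples.

    `lines` is any iterable of text lines, e.g. an open file handle.
    `max_ambiguous` mirrors the 40% figure quoted in the makeblastdb log.
    """
    problems = []
    header = None            # name of the record currently being read
    header_lineno = 0        # line on which that header was seen
    seen_header = False      # has any '>' line appeared yet?
    need_first_line = False  # still waiting for the record's first data line?
    reported_orphan = False  # report headerless data only once

    for lineno, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if need_first_line:
                problems.append((header_lineno, header, "header with no sequence data"))
            header = line[1:].strip()
            header_lineno = lineno
            seen_header = True
            need_first_line = True
            if not header:
                problems.append((lineno, header, "empty header name"))
        elif not seen_header:
            if not reported_orphan:
                problems.append((lineno, None, "sequence data before the first '>' header"))
                reported_orphan = True
        elif need_first_line:
            # Only the first data line of each record is inspected.
            need_first_line = False
            ambiguous = sum(ch not in UNAMBIGUOUS for ch in line)
            fraction = ambiguous / len(line)
            if fraction > max_ambiguous:
                problems.append((lineno, header, f"first data line is {fraction:.0%} ambiguous"))

    if need_first_line:
        problems.append((header_lineno, header, "header with no sequence data"))
    return problems
```

To try it on a library file, open the file and pass the handle, for example `check_fasta(open(path))` where `path` points at the FASTA file of interest; an empty list means none of these particular problems were found.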

It is hard to diagnose the issue without seeing the exact commands. I realize this is an old post now, but if you can provide the command used, and some information about the data, it would likely be helpful for others. And, it's always nice to answer questions and see things resolved.

written 6.2 years ago by SES

Here is the command I used:

RepeatMasker -no_is -pa 16 -species "vertebrates" -a -html -gff genome.fasta

And this is an error message I found in the standard output.

Can't call method "getScore" on unblessed reference at /home/mtollis/RepeatMasker/PRSearchResult.pm line 164.

written 6.1 years ago by mtollis

Also, the data is a vertebrate-sized genome with hundreds of thousands of scaffolds. However, I have had RM work on these kinds of datasets with no problems before.

written 6.1 years ago by mtollis
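
Because the run above ended without an obvious failure, it can help to verify the outputs explicitly after each run instead of relying on the absence of error messages. The Python sketch below is one way to do that. The function name missing_outputs is hypothetical, and the default suffixes (.masked, .out, .tbl, appended to the input file name and written next to the input) are an assumption about a typical RepeatMasker run; adjust them for the options you use.

```python
from pathlib import Path


def missing_outputs(input_path, suffixes=(".masked", ".out", ".tbl"), out_dir=None):
    """Return the names of expected output files that are absent or empty.

    Assumption: outputs are named <input file name><suffix> and live in the
    input file's directory unless `out_dir` is given.
    """
    input_path = Path(input_path)
    directory = Path(out_dir) if out_dir is not None else input_path.parent
    missing = []
    for suffix in suffixes:
        candidate = directory / (input_path.name + suffix)
        # Treat zero-byte files as missing: they usually indicate a failed write.
        if not candidate.is_file() or candidate.stat().st_size == 0:
            missing.append(candidate.name)
    return missing
```

Calling `missing_outputs("genome.fasta")` right after the RepeatMasker command returns an empty list when every expected file is present and non-empty, which makes a silent failure easy to catch in a pipeline.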

mtollis (United States) wrote, 5.7 years ago:

The RepeatMasker developer suggested the following two fixes:

"The culprit is the processing of the alignment data using the "-a" flag. I tracked it down to a bug in a routine which handles joining DNA transposons. The ugly match set was:

334 C21533332 2812 2859 + HAT1_DR#DNA/hAT-Ac 598 645
299 C21533332 2812 2859 C hAT-N76_DR#DNA/hAT 2324 2371

And the line in ProcessRepeats is ( line 7852 )

# add fused element to our derived from list
if ( $options{'source'} ) {
  $lastAnnot->addDerivedFromAnnot( $member );
}

This should be:

# add fused element to our derived from list
if ( $options{'source'} ) {
  $lastAnnot->addDerivedFromAnnot( $member->{'annot'} );
}
"

"I found something which causes ProcessRepeats to go into an infinite loop. It keeps expanding an array until the computer runs out of memory and the process is killed. It didn't print the "Can't call method "getScore" on unblessed reference at /home/mtollis/RepeatMasker/PRSearchResult.pm line 164" error you have seen before, though. I am not sure how you got that a second time. In any case I fixed this problem and I wondered if you might rerun this file on your system. The fix is in the PRSearchResult.pm module. You can download a patched copy of the module here:

http://www.repeatmasker.org/~rhubley...chResult.pm.gz

Copy this into your RepeatMasker directory, back up your old file and unzip this one:

mv PRSearchResult.pm PRSearchResult.pm.bak
gunzip PRSearchResult.pm.gz

I hope this fixes your problem. Thanks for reporting this!"


Thanks for the update. It's too bad there is not an easier method for distributing the updates, e.g., github.

written 5.7 years ago by SES

gbdias wrote, 18 months ago:

This is an old post but I observed the same behavior in a more recent version of RepeatMasker (4.0.7). This version already has the fix to the bug you reported above, but the behavior persists. The program apparently runs to completion and throws no error message, but the running directory is empty after the run.

In my case, it turned out to be a file name problem. I ran RepeatMasker on several genome assemblies, and the ones that produced no results were the ones whose file names contained a plus sign, as in p+a_contigs.fasta. After I renamed these files to remove the + sign (pa_contigs.fasta), RepeatMasker finished successfully and produced all expected output.
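
One way to guard against this kind of file name problem is to give RepeatMasker a conservatively named symlink instead of the original file. The Python sketch below illustrates the idea. Only the plus sign is reported as problematic in this thread; the broader whitelist of letters, digits, dot, underscore and hyphen is a cautious assumption, and the helper names safe_name and link_with_safe_name are hypothetical.

```python
import re
from pathlib import Path

# Assumption: keep only letters, digits, '.', '_' and '-'.
# Only '+' is known from this thread to cause trouble.
UNSAFE = re.compile(r"[^A-Za-z0-9._-]")


def safe_name(name, replacement="_"):
    """Replace every character outside the whitelist with `replacement`."""
    return UNSAFE.sub(replacement, name)


def link_with_safe_name(path, work_dir):
    """Create a symlink with a sanitized name in `work_dir` and return it.

    The original file is left untouched, so the real assembly name is preserved.
    """
    path = Path(path)
    work_dir = Path(work_dir)
    work_dir.mkdir(parents=True, exist_ok=True)
    target = work_dir / safe_name(path.name)
    if not target.exists():
        target.symlink_to(path.resolve())
    return target
```

With the default replacement, p+a_contigs.fasta becomes p_a_contigs.fasta; passing `replacement=""` reproduces the pa_contigs.fasta rename described above, at the cost of a higher chance that two different names collapse into one.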
