RepeatMasker crashing with error "can't fork"
8.8 years ago
scambier • 0


I'm trying to run RepeatMasker on an Amazon EC2 machine. I know I can run these files with the standard settings (a single sequence at a time) just fine, but that will take about 20 hours. I'm analyzing huge files, which I know isn't ideal for RepeatMasker, but I'm not aware of any other options for retroelement identification and quantification. I see there has been a similar post before, but it wasn't resolved.

I tried the command RepeatMasker -pa 32 S5.fa, and it starts off great. Then, just after the refining SINE/ALU step, I get the "can't fork" error and it dies. I was running this on a c3.8xlarge instance with Ubuntu. I tried -pa 16, 10, 8, and 6. At 6 I'm able to proceed, and I've switched from a c3.8xlarge to an r3.2xlarge. I'm 50 batches into 5000, and no fork error yet.

Any thoughts on why I'm limited to 6 processors? I'm beyond excited to bump up my analysis speed 6x, but I'm also worried that a few hours in I'll get a fork error and have to start from scratch using the standard settings.


According to top, I'm using 50-70% CPU and about 13.2% memory.


What do you mean by "huge files" are being used? A draft assembly with thousands of scaffolds, or is it millions of unassembled WGS reads?


Sorry I wasn't more explicit; I'm very new to NGS. I've basically taken a MiSeq run, trimmed adapters, converted the reads from FASTQ to FASTA, and am running the FASTA file through RepeatMasker. So that's about 2 million sequences.
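The FASTQ-to-FASTA step mentioned above just drops the "+" and quality lines from each record and swaps "@" for ">". A minimal Python sketch, assuming unwrapped four-line FASTQ records as written by Illumina instruments (the function names are illustrative, not from any particular toolkit):

```python
import io
from typing import Iterator, TextIO, Tuple


def read_fastq(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from a FASTQ stream.

    Assumes exactly four lines per record: @header, sequence, +, qualities.
    """
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        if not header.strip():
            continue  # tolerate blank lines between records
        seq = handle.readline().strip()
        plus = handle.readline()
        qual = handle.readline().strip()
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record at {header.strip()!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch at {header.strip()!r}")
        yield header[1:].strip(), seq


def fastq_to_fasta(fq: TextIO, fa: TextIO) -> int:
    """Write each read as a two-line FASTA record; return how many were written."""
    count = 0
    for name, seq in read_fastq(fq):
        fa.write(f">{name}\n{seq}\n")
        count += 1
    return count


def fastq_text_to_fasta(text: str) -> str:
    """Convenience wrapper for in-memory strings (handy for quick checks)."""
    out = io.StringIO()
    fastq_to_fasta(io.StringIO(text), out)
    return out.getvalue()
```

For real files, open the input FASTQ for reading and the output FASTA for writing and pass both handles to fastq_to_fasta; reading record by record keeps memory use flat even for millions of reads.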


The reads were single-end 150 bp, so they run around 130 bp after adapter trimming.
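At roughly 2 million reads of about 130 bp each, the input is on the order of 2,000,000 × 130 ≈ 260 million bases. A short Python sketch for checking those figures directly from the trimmed FASTA file (read count, total bases, and length distribution); the function names are illustrative:

```python
import io
from statistics import mean, median
from typing import Dict, List, TextIO


def fasta_lengths(handle: TextIO) -> List[int]:
    """Return the length of each FASTA record, joining wrapped sequence lines."""
    lengths: List[int] = []
    current = None  # length of the record being read; None before the first header
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def length_summary(lengths: List[int]) -> Dict[str, float]:
    """Read count, total bases, and basic length statistics."""
    if not lengths:
        return {"reads": 0, "total_bp": 0}
    return {
        "reads": len(lengths),
        "total_bp": sum(lengths),
        "min": min(lengths),
        "mean": mean(lengths),
        "median": median(lengths),
        "max": max(lengths),
    }


def summarize_fasta_text(text: str) -> Dict[str, float]:
    """Convenience wrapper for in-memory FASTA text."""
    return length_summary(fasta_lengths(io.StringIO(text)))
```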


Thanks for the information, I wanted to make sure before I answered.

8.8 years ago
SES 8.6k

I recommend Transposome for this type of data because this is exactly what it was designed for. It has been accepted at Bioinformatics, but I don't have a citation yet. If you are already on a cloud instance, just issue the two commands shown under the "installation" section and it should install just fine for you.

For this tool, you actually don't need 2 million reads, but it won't hurt. I advise starting with about 100,000 reads (which should take a couple of minutes to analyze), then add more reads. I'm sorry I don't have the publication to show, but I found that in maize it adds very little information to sample above 2 million reads, so you definitely have enough data. You will need to obtain a set of reference repeat sequences for the annotation step. RepBase will work fine, but anything closer to the species you are analyzing is better. Feel free to send me a message or email with questions/observations.
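One way to draw the suggested starting subset of about 100,000 reads is reservoir sampling, which picks a uniform random sample in a single pass without loading the whole file. This is a standard technique chosen here for illustration, not something Transposome itself prescribes, and the function names are hypothetical:

```python
import random
from typing import Iterable, Iterator, List, Optional, TextIO, Tuple

Record = Tuple[str, str]  # (header without '>', sequence)


def read_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) pairs, joining wrapped sequence lines."""
    header: Optional[str] = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None and line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def reservoir_sample(records: Iterable[Record], k: int, seed: Optional[int] = None) -> List[Record]:
    """Uniform random sample of k records in one pass (Algorithm R), O(k) memory."""
    rng = random.Random(seed)  # seeded generator so the subset is reproducible
    sample: List[Record] = []
    for i, rec in enumerate(records):
        if i < k:
            sample.append(rec)
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                sample[j] = rec
    return sample


def write_fasta(records: Iterable[Record], handle: TextIO) -> None:
    """Write records as two-line FASTA entries."""
    for header, seq in records:
        handle.write(f">{header}\n{seq}\n")
```

Opening the full read file, passing read_fasta(handle) to reservoir_sample with k=100_000, and writing the result with write_fasta gives the first subset; rerunning with a larger k (and the same seed for reproducibility) covers the "then add more reads" step.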

Just for reference, RepeatMasker is for masking genomes. It's not really designed for identifying novel repeats, especially from raw reads. It aims to be precise and it will take forever on this type of data.


Wow, this seems like it could be much better suited to my data than RepeatMasker.

Essentially, we've designed a method (based on Pääbo's Neanderthal protocol) to pull down RNA/DNA hybrids from a cell and ligate them to adapters for Illumina sequencing. So, hopefully, all my reads will contain either R-loops or retroelements.

What I need is a program that can check all my reads for repetitive elements and then output a table summarizing the counts of the different elements found (which is why the RepeatMasker summary table is so nice). But I have no need to mask these elements afterwards.
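If RepeatMasker output is already available, a per-class count table can also be built straight from its .out file. A minimal Python sketch, assuming the standard whitespace-delimited .out layout in which the 5th field is the query (read) name and the 11th is the repeat class/family; it counts both hits and distinct reads per class, and is a custom tally rather than RepeatMasker's own .tbl summary:

```python
import sys
from collections import Counter, defaultdict
from typing import Dict, Iterable, Set, Tuple


def tally_repeat_classes(lines: Iterable[str]) -> Tuple[Counter, Dict[str, int]]:
    """Count hits and distinct reads per repeat class/family in a RepeatMasker .out file."""
    hits: Counter = Counter()
    reads: Dict[str, Set[str]] = defaultdict(set)
    for line in lines:
        fields = line.split()
        # Data rows begin with an integer SW score; header and blank lines do not.
        if len(fields) < 11 or not fields[0].isdigit():
            continue
        query, repeat_class = fields[4], fields[10]
        hits[repeat_class] += 1
        reads[repeat_class].add(query)
    return hits, {cls: len(names) for cls, names in reads.items()}


if __name__ == "__main__":
    with open(sys.argv[1]) as handle:  # path to a .out file, e.g. S5.fa.out
        hits, n_reads = tally_repeat_classes(handle)
    print("class/family\thits\treads")
    for cls, n in hits.most_common():
        print(f"{cls}\t{n}\t{n_reads[cls]}")
```

Counting distinct reads as well as hits matters for short reads, because one read can carry several adjacent hits to the same element.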

I will definitely give Transposome a try, unless you think it won't be suitable for my needs with the additional information I've just provided.

Thanks so much for your help, it's really appreciated.


I don't see a problem at all, I think this will work nicely for your data. There are two tables (explained on the wiki in detail) of summary results produced which should give you what you want (but I could help if that's not the case). This sounds like a very interesting study, I'd be interested to see what you find. Don't hesitate to ask me anything. Good luck!



I've tried installing Transposome but am encountering some issues (I have very little knowledge of Linux and the command line, sorry about that).

I've followed the wiki, trying both the two-line install and the manual install. I ran into a number of issues but thought I'd worked around them. Finally, I'm being told I need to add fast db to my PATH, which is beyond me. I was trying all this on an Amazon EC2 instance running Ubuntu.

Is it ok to post my issues here, or should I perhaps start a new post?

Thanks so much.


I guess start a new post, or send me a message. Either way, please include the exact message or issue you are having. There is no message that would say "fast db to path" so I'm confused what you are referring to. To be clear, you did edit the configuration file to include the name of your sequence data and repeat database, correct? If not, then the program won't know about your data.


I will start a new post if I continue to have problems. I want to give it one more try, as I think I was having troubles where I hadn't correctly installed Perl in my home directory, (even though I tried to follow your directions in the wiki), and I was "messing with the system Perl".

Thanks so much.


On a cloud instance you can skip that part and just try the two install commands under the "installation" section of the README file. The reason is that you have admin privileges and a recent version of Perl (I'll add some context to that section of the docs).

Also, please try to be more explicit if you want help. If you say you are having troubles or you got stuck, that is not enough information to diagnose the issue. I am curious where you met troubles because I can copy/paste those commands to install perlbrew one at a time and they work on any *nix system (Mac, Ubuntu, Fedora, etc.).



Thanks so much for the reply. Sorry again if I'm not including enough info; I'm an extreme newbie, and trying to figure this all out on my own (with help from the internet) is a challenge for me.

So, to start, here is what happens when I try to follow the commands to install/update Perl with perlbrew. I'm doing this on Ubuntu on an Amazon EC2 machine. Initially, I get this message:

ubuntu@ip-172-31-8-176:~$ \curl -L | bash
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0   315    0     0    0     0      0      0 --:--:--  0:00:01 --:--:--     0
100  1095  100  1095    0     0    845      0  0:00:01  0:00:01 --:--:--   845

## Download the latest perlbrew
## Installing perlbrew
perlbrew is installed: ~/perl5/perlbrew/bin/perlbrew

perlbrew root (~/perl5/perlbrew) is initialized.

Append the following piece of code to the end of your ~/.bash_profile and start a
new shell, perlbrew should be up and fully functional from there:

    source ~/perl5/perlbrew/etc/bashrc

Simply run `perlbrew` for usage details.
Happy brewing!

So I append that line to ~/.bashrc from the current terminal:

ubuntu@ip-172-31-8-176:~$ echo "source ~/perl5/perlbrew/etc/bashrc" >> ~/.bashrc

Then I start a new terminal and type this:

ubuntu@ip-172-31-8-176:~$ source ~/.bashrc
ubuntu@ip-172-31-8-176:~$ perlbrew install perl-5.20.1 -Dusethreads
Fetching perl 5.20.1 as /home/ubuntu/perl5/perlbrew/dists/perl-5.20.1.tar.bz2
Download to /home/ubuntu/perl5/perlbrew/dists/perl-5.20.1.tar.bz2
Installing /home/ubuntu/perl5/perlbrew/build/perl-5.20.1 into ~/perl5/perlbrew/perls/perl-5.20.1

This could take a while. You can run the following command on another shell to track the status:

  tail -f ~/perl5/perlbrew/build.perl-5.20.1.log

Installation process failed. To spot any issues, check


If some perl tests failed and you still want install this distribution anyway,

  (cd /home/ubuntu/perl5/perlbrew/build/perl-5.20.1; make install)

You might also want to try upgrading patchperl before trying again:

  perlbrew install-patchperl

Generally, if you need to install a perl distribution known to have minor test
failures, do one of these command to avoid seeing this message

  perlbrew --notest install perl-5.20.1
  perlbrew --force install perl-5.20.1

At this point I give up, because I don't really know where I've gone wrong.


Then I tried starting a fresh EC2 instance, again with Ubuntu, and ran the two-line "easy install". These are the problems I ran into (I literally just copy-pasted the lines into my terminal without changing anything).

The first problem seems to be that transposome_config.yml is missing a line (or rather, has a different line than the program expects):

[ERROR]: 'fraction_coverage' is not defined after parsing configuration file.
         This indicates there may be a blank line in your configuration file.
         Please check your configuration file and try again. Exiting.
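That error means the program could not find a value for fraction_coverage after reading the file. A rough Python sketch of this kind of "required key" check, useful for spotting a renamed or blank entry before launching a long run. It is a naive line scan, not a YAML parser and not how Transposome itself reads its configuration; only fraction_coverage comes from the error above, and the other key names in the test are hypothetical:

```python
from typing import Iterable, List, Set


def defined_keys(lines: Iterable[str]) -> Set[str]:
    """Collect keys that have a non-empty value in 'key: value' lines.

    Deliberately naive: ignores nesting, strips '#' comments, and treats
    list items written as '- key: value' like plain keys.
    """
    keys: Set[str] = set()
    for raw in lines:
        line = raw.split("#", 1)[0].strip()
        if line.startswith("- "):
            line = line[2:].strip()
        if ":" not in line:
            continue
        key, value = line.split(":", 1)
        if value.strip():  # a key with nothing after the colon counts as undefined
            keys.add(key.strip())
    return keys


def missing_keys(lines: Iterable[str], required: Iterable[str]) -> List[str]:
    """Return the required keys that are absent or empty, in the order given."""
    found = defined_keys(lines)
    return [key for key in required if key not in found]
```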

I used nano to edit the .yml file, replacing the alignment length line with a fraction coverage line. That seemed to work until I hit the error message below, which I suspect is due to me messing with the system Perl, but I really don't know:

ubuntu@ip-172-31-4-83:/tmp/wefwj48Sfi/config$ transposome --config transposome_config.yml
INFO - ======== Transposome version: 0.07.9 (started at: 07-12-2014 06:16:14) ========
INFO - Configuration - Log file for monitoring progress and errors: first_try.txt
INFO - Configuration - Sequence file:                               /home/ubuntu/Stetson_1_PF_R1_holytrim3.fa
INFO - Configuration - Sequence number for each BLAST process:      100000
INFO - Configuration - Number of CPUs per thread:                   4
INFO - Configuration - Number of threads:                           2
INFO - Configuration - Output directory:                            transposome_results_out
INFO - Configuration - In-memory analysis:                          1
INFO - Configuration - Percent identity for matches:                90
INFO - Configuration - Fraction coverage for pairwise matches:      0.55
INFO - Configuration - Merge threshold for clusters:                2
INFO - Configuration - Minimum cluster size for annotation:         1
INFO - Configuration - BLAST e-value threshold for annotation:      10
INFO - Configuration - Repeat database for annotation:              /home/ubuntu/RepBase19.11.fasta
INFO - Configuration - Log file for clustering/merging results:     first_try__report.txt
ERROR - Unable to find formatdb. Check your PATH to see that it is installed. Exiting.

I looked for formatdb and checked my PATH, and this is what I found:

ubuntu@ip-172-31-4-83:/tmp/wefwj48Sfi/config$ sudo find / -name "formatdb"
ubuntu@ip-172-31-4-83:/tmp/wefwj48Sfi/config$ echo $PATH

Again, I've obviously gone wrong somewhere, but I don't know how to proceed from here, so I gave up.
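The empty result from find suggests that no file named formatdb exists anywhere on the instance, so this looks like a missing program rather than a PATH typo. formatdb belongs to the legacy NCBI BLAST toolkit; BLAST+ replaced it with makeblastdb. A small Python sketch that prints each PATH entry and reports which of these tools the current PATH can see (the tool list is an illustrative choice):

```python
import os
import shutil
from typing import Dict, Iterable, Optional


def locate_tools(names: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each executable name to its full path on $PATH, or None if not found."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    # One PATH entry per line is easier to read than the raw colon-separated string.
    for entry in os.environ.get("PATH", "").split(os.pathsep):
        print("PATH:", entry)
    # formatdb is the legacy NCBI BLAST database builder; makeblastdb is its BLAST+ counterpart.
    for name, path in locate_tools(["formatdb", "makeblastdb", "blastn"]).items():
        print(f"{name:12s} {path or 'NOT FOUND'}")
```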


Please be patient and try not to give up. It appears that Amazon has modified paths relative to a normal Ubuntu distribution and other cloud services that I have tried (Linode, Digital Ocean, Rackspace, etc.). The first error was mine, I merged a change from a dev branch that modified the config file, but the master branch is correct now. Thank you for pointing that out. Everything appears to be working except there is something funny with the paths or perl. Could you tell me the output of: perl -V

EDIT: There is no issue with Amazon, see below.


Thanks so much! I'm 100% willing to troubleshoot this with you; I really appreciate all the folks in the NGS field who are willing to share their work and expertise. Here's the output (this is from the terminal where I've been attempting to update with perlbrew; let me know if you need the output from a brand-new, untouched instance):

ubuntu@ip-172-31-8-176:~$ perl -V
Summary of my perl5 (revision 5 version 18 subversion 2) configuration:

    osname=linux, osvers=3.2.0-58-generic, archname=x86_64-linux-gnu-thread-multi
    uname='linux brownie 3.2.0-58-generic #88-ubuntu smp tue dec 3 17:37:58 utc 2013 x86_64 x86_64 x86_64 gnulinux '
    config_args='-Dusethreads -Duselargefiles -Dccflags=-DDEBIAN -D_FORTIFY_SOURCE=2 -g -O2 -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security -Dldflags= -Wl,-Bsymbolic-functions -Wl,-z,relro -Dlddlflags=-shared -Wl,-Bsymbolic-functions -Wl,-z,relro -Dcccdlflags=-fPIC -Darchname=x86_64-linux-gnu -Dprefix=/usr -Dprivlib=/usr/share/perl/5.18 -Darchlib=/usr/lib/perl/5.18 -Dvendorprefix=/usr -Dvendorlib=/usr/share/perl5 -Dvendorarch=/usr/lib/perl5 -Dsiteprefix=/usr/local -Dsitelib=/usr/local/share/perl/5.18.2 -Dsitearch=/usr/local/lib/perl/5.18.2 -Dman1dir=/usr/share/man/man1 -Dman3dir=/usr/share/man/man3 -Dsiteman1dir=/usr/local/man/man1 -Dsiteman3dir=/usr/local/man/man3 -Duse64bitint -Dman1ext=1 -Dman3ext=3perl -Dpager=/usr/bin/sensible-pager -Uafs -Ud_csh -Ud_ualarm -Uusesfio -Uusenm -Ui_libutil -Uversiononly -DDEBUGGING=-g -Doptimize=-O2 -Duseshrplib -des'
    hint=recommended, useposix=true, d_sigaction=define
    useithreads=define, usemultiplicity=define
    useperlio=define, d_sfio=undef, uselargefiles=define, usesocks=undef
    use64bitint=define, use64bitall=define, uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
    cc='cc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -DDEBIAN -fstack-protector -fno-strict-aliasing -pipe -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
    optimize='-O2 -g',
    cppflags='-D_REENTRANT -D_GNU_SOURCE -DDEBIAN -fstack-protector -fno-strict-aliasing -pipe -I/usr/local/include'
    ccversion='', gccversion='4.8.2', gccosandvers=''
    intsize=4, longsize=8, ptrsize=8, doublesize=8, byteorder=12345678
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16
    ivtype='long', ivsize=8, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
    alignbytes=8, prototype=define
  Linker and Libraries:
    ld='cc', ldflags =' -fstack-protector -L/usr/local/lib'
    libpth=/usr/local/lib /lib/x86_64-linux-gnu /lib/../lib /usr/lib/x86_64-linux-gnu /usr/lib/../lib /lib /usr/lib
    libs=-lgdbm -lgdbm_compat -ldb -ldl -lm -lpthread -lc -lcrypt
    perllibs=-ldl -lm -lpthread -lc -lcrypt
    libc=, so=so, useshrplib=true,
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E'
    cccdlflags='-fPIC', lddlflags='-shared -L/usr/local/lib -fstack-protector'

Characteristics of this binary (from libperl):
                        USE_64_BIT_INT USE_ITHREADS USE_LARGE_FILES
  Locally applied patches:
        DEBPKG:debian/cpan_definstalldirs - Provide a sensible INSTALLDIRS default for modules installed from CPAN.
        DEBPKG:debian/db_file_ver - Remove overly restrictive DB_File version check.
        DEBPKG:debian/doc_info - Replace generic man(1) instructions with Debian-specific information.
        DEBPKG:debian/enc2xs_inc - Tweak enc2xs to follow symlinks and ignore missing @INC directories.
        DEBPKG:debian/errno_ver - Remove Errno version check due to upgrade problems with long-running processes.
        DEBPKG:debian/libperl_embed_doc - Note that libperl-dev package is required for embedded linking
        DEBPKG:fixes/respect_umask - Respect umask during installation
        DEBPKG:debian/writable_site_dirs - Set umask approproately for site install directories
        DEBPKG:debian/extutils_set_libperl_path - EU:MM: Set location of libperl.a to /usr/lib
        DEBPKG:debian/no_packlist_perllocal - Don't install .packlist or perllocal.pod for perl or vendor
        DEBPKG:debian/prefix_changes - Fiddle with *PREFIX and variables written to the makefile
        DEBPKG:debian/fakeroot - Postpone LD_LIBRARY_PATH evaluation to the binary targets.
        DEBPKG:debian/instmodsh_doc - Debian policy doesn't install .packlist files for core or vendor.
        DEBPKG:debian/ld_run_path - Remove standard libs from LD_RUN_PATH as per Debian policy.
        DEBPKG:debian/libnet_config_path - Set location of libnet.cfg to /etc/perl/Net as /usr may not be writable.
        DEBPKG:debian/mod_paths - Tweak @INC ordering for Debian
        DEBPKG:debian/module_build_man_extensions - Adjust Module::Build manual page extensions for the Debian Perl policy
        DEBPKG:debian/prune_libs - Prune the list of libraries wanted to what we actually need.
        DEBPKG:fixes/net_smtp_docs - [ #36038] Document the Net::SMTP 'Port' option
        DEBPKG:debian/perlivp - Make perlivp skip include directories in /usr/local
        DEBPKG:debian/cpanplus_definstalldirs - Configure CPANPLUS to use the site directories by default.
        DEBPKG:debian/cpanplus_config_path - Save local versions of CPANPLUS::Config::System into /etc/perl.
        DEBPKG:debian/deprecate-with-apt - Point users to Debian packages of deprecated core modules
        DEBPKG:debian/squelch-locale-warnings - Squelch locale warnings in Debian package maintainer scripts
        DEBPKG:debian/skip-upstream-git-tests - Skip tests specific to the upstream Git repository
        DEBPKG:debian/patchlevel - List packaged patches for 5.18.2-2ubuntu1 in patchlevel.h
        DEBPKG:debian/skip-kfreebsd-crash - [perl #96272] Skip a crashing test case in t/op/threads.t on GNU/kFreeBSD
        DEBPKG:fixes/document_makemaker_ccflags - [ #68613] Document that CCFLAGS should include $Config{ccflags}
        DEBPKG:debian/find_html2text - Configure CPAN::Distribution with correct name of html2text
        DEBPKG:debian/hurd_test_skip_stack - Disable failing GNU/Hurd tests dist/threads/t/stack.t
        DEBPKG:fixes/manpage_name_Test-Harness - [ #73399] cpan/Test-Harness: add NAME headings in modules with POD
        DEBPKG:debian/makemaker-pasthru - [ #28632] Make EU::MM pass LD through to recursive Makefile.PL invocations
        DEBPKG:debian/perl5db-x-terminal-emulator.patch - Invoke x-terminal-emulator rather than xterm in
        DEBPKG:debian/cpan-missing-site-dirs - Fix CPAN::FirstTime defaults with nonexisting site dirs if a parent is writable
        DEBPKG:fixes/memoize_storable_nstore - [ #77790] Memoize::Storable: respect 'nstore' option not respected
        DEBPKG:fixes/net_ftp_failed_command - [ #37700] Net::FTP: cope gracefully with a failed command
        DEBPKG:fixes/perlbug-patchlist - [3541c11] [perl #118433] Make perlbug look up the list of local patches at run time
        DEBPKG:fixes/module_metadata_security_doc - [68cdd4b] CVE-2013-1437 documentation fix
        DEBPKG:fixes/module_metadata_taint_fix - [bff978f] [ #88576] untaint version, if needed, in Module::Metadata
        DEBPKG:fixes/IPC-SysV-spelling - [ #86736] Fix spelling of IPC_CREAT in IPC-SysV documentation
        DEBPKG:fixes/fix-undef-source -
  Built under linux
  Compiled at Mar 27 2014 18:30:28

Thank you, I see the issue. I just set up Amazon ec2 instance that is identical, so I will tell you shortly what to do. EDIT: There is no issue, see below.


Did you try the suggestion at the end? This is common on older systems but I've never seen this with cloud instances, so it's somewhat odd. It is nothing to worry about though, it just means some minor tests failed. Try the last command and it will work (with --force).


I just tried all of those suggestions (force install, notest install, upgrading patchperl), and nothing helped; I still get the same message as before. Should I try something other than Ubuntu, or maybe try updating it? The version I'm running is the first one Amazon offers when setting up an instance: Ubuntu Server 14.04 LTS (HVM), SSD Volume Type.


I just set up a fresh Amazon ec2 instance (Ubuntu 14.04), and these commands worked without errors:

sudo apt-get update
sudo apt-get install -y build-essential lib32z1 git ncbi-blast+

Then, I copy-and-pasted the 6 lines for installing Perl from the wiki. Then, one more line installs Transposome:

cpanm git://

Now, test it out:

$ transposome

ERROR: No arguments were given.

USAGE: transposome [-c] [-v] [-h] [-m]

    -c|config  :    The Transposome configuration file.

    -v|version :    Print the program version and exit.
    -h|help    :    Print a usage statement.
    -m|man     :    Print the full documentation.

If you still have issues, please contact me via email and I can help further, so we don't have too lengthy a discussion here. I honestly can't see where you are getting stuck, but I also don't know the exact commands you used. The good news is that it installs and works with the commands shown, which is what I expect. So we just need to figure out what, specifically, the issue is with your instance or commands.



I got through all the steps above just fine, but am running into problems trying to run my data through. I've emailed you, (I think), so if you want to continue helping me, that would be fantastic. Thanks so much.

