RepeatMasker crashing with error "can't fork"
8.8 years ago
scambier • 0

Hello-

I'm trying to run RepeatMasker on an Amazon EC2 machine. I know that I can run these files with the standard settings (so a single sequence at a time) just fine, but it will take 20 hours this way. I'm analyzing huge files, which I know isn't optimal with RepeatMasker, but there are no other options I'm aware of for retroelement identification and quantification. I see there has been a similar post before, but it wasn't resolved.

I tried the following command: RepeatMasker -pa 32 S5.fa, and it starts off great. Then, just after the refining SINE/ALU step, I get the "can't fork" error and it dies. I was running this on a c3.8xlarge with Ubuntu. I tried -pa 16, 10, 8, and 6. At 6 I'm able to proceed, and I've switched from the c3.8xlarge to an r3.2xlarge. I'm 50 batches into 5000, with no fork error yet.

Any thoughts on why I'm limited to 6 processors? I'm beyond excited to bump up my analysis speed 6x, but I'm also worried that a few hours in I'll get a fork error and have to start from scratch using the standard settings.
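One way to reduce the cost of a late crash is to split the input into chunks and mask each chunk separately, so that a failed run only loses the chunk in progress. The following is a minimal sketch, not a RepeatMasker feature: the function names `split_fasta` and `mask_chunks` and the chunk file naming are hypothetical, and the resume check assumes RepeatMasker writes its annotation to `<input>.out` next to the input file, which may not hold for chunks where no repeats are detected.

```shell
# Split a FASTA file into chunks of N sequences each (hypothetical helper).
# $1 = input FASTA, $2 = sequences per chunk, $3 = output prefix
split_fasta() {
    awk -v n="$2" -v prefix="$3" '
        /^>/ {
            # Start a new chunk file every n headers.
            if (count % n == 0) {
                if (out) close(out)
                out = sprintf("%s.%04d.fa", prefix, ++chunk)
            }
            count++
        }
        { print > out }
    ' "$1"
}

# Mask every chunk, skipping chunks that already have an .out file
# (assumed to mean the chunk finished in an earlier run).
# $1 = chunk prefix, $2 = value for -pa
mask_chunks() {
    for chunk in "$1".*.fa; do
        [ -e "$chunk.out" ] && continue
        RepeatMasker -pa "$2" "$chunk" || return 1
    done
}

# Example usage (not run here):
#   split_fasta S5.fa 100000 chunks/S5
#   mask_chunks chunks/S5 6
```

Note that each chunk gets its own summary, so the per-chunk counts would need to be added up afterwards.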

software-error • 3.6k views

According to top, I'm using between 50% and 70% CPU and ~13.2% memory.
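In general, a fork fails when the kernel cannot create another process, most often because a per-user process limit has been reached or because memory (RAM plus swap) has run out at the moment of the fork. A snapshot from top can miss a short spike, so it may help to record the limits themselves. This is a small sketch for Linux only; `fork_limits` is a hypothetical name, and it does not prove which limit RepeatMasker actually hit.

```shell
# Print the system limits that most commonly make fork() fail (Linux only).
fork_limits() {
    # Number of CPU cores visible to this shell.
    echo "cores=$(nproc)"
    # Soft limit on the number of processes for the current user.
    awk '/^Max processes/ { print "max_user_processes=" $3 }' /proc/self/limits
    # Total and available RAM and total swap, converted from kB to MB.
    awk '/^(MemTotal|MemAvailable|SwapTotal):/ {
             sub(":", "", $1)
             print $1 "_mb=" int($2 / 1024)
         }' /proc/meminfo
}

fork_limits
```

If SwapTotal_mb is 0, a brief memory spike from many parallel workers has nothing to spill into, which is one plausible reason a lower -pa value survives where a higher one does not.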


What do you mean by "huge files"? Is it a draft assembly with thousands of scaffolds, or millions of unassembled WGS reads?


Sorry I wasn't more explicit, I'm very new to NGS. I've basically taken a MiSeq run, trimmed adaptors, converted from FASTQ to FASTA, and am running the FASTA through RepeatMasker. So ~2 million sequences or so.
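For readers following along, the FASTQ to FASTA step can be sketched with awk. This assumes the usual four-line FASTQ records (header, sequence, "+", qualities) with no wrapped sequence lines; `fastq_to_fasta` is a hypothetical name, and this is not necessarily the tool used in this thread.

```shell
# Convert four-line FASTQ records on stdin to FASTA on stdout.
fastq_to_fasta() {
    awk '
        NR % 4 == 1 { print ">" substr($0, 2) }  # "@name" becomes ">name"
        NR % 4 == 2 { print }                    # sequence line, unchanged
    '
}

# Example usage (file names are placeholders):
#   fastq_to_fasta < reads.trimmed.fastq > reads.trimmed.fa
```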


And the reads were SE 150, so they run around 130 bp with adaptors trimmed.


Thanks for the information, I wanted to make sure before I answered.

8.8 years ago
SES 8.6k

I recommend Transposome for this type of data because this is exactly what it was designed for. It has been accepted at Bioinformatics, but I don't have a citation yet. If you are already on a cloud instance, just issue the two commands shown under the "installation" section and it should install just fine for you.

For this tool, you actually don't need 2 million reads, but it won't hurt. I advise starting with about 100,000 reads (which should take a couple of minutes to analyze), then add more reads. I'm sorry I don't have the publication to show, but I found that in maize it adds very little information to sample above 2 million reads, so you definitely have enough data. You will need to obtain a set of reference repeat sequences for the annotation step. RepBase will work fine, but anything closer to the species you are analyzing is better. Feel free to send me a message or email with questions/observations.
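As a rough illustration of starting with about 100,000 reads, the sketch below takes the first N records of a FASTA file. This is an assumption about how to subsample, not a Transposome feature: `fasta_head` is a hypothetical name, and taking the first N reads is not a random sample, so a proper random subsampler may be preferable if read order could bias the result.

```shell
# Print the first N sequences of a FASTA file (multi-line records are kept whole).
# $1 = number of sequences, $2 = FASTA file
fasta_head() {
    awk -v n="$1" '
        /^>/ { if (++seen > n) exit }  # stop at header number n+1
        { print }
    ' "$2"
}

# Example usage (output name is a placeholder):
#   fasta_head 100000 S5.fa > S5.first100k.fa
```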

Just for reference, RepeatMasker is for masking genomes. It's not really designed for identifying novel repeats, especially from raw reads. It aims to be precise and it will take forever on this type of data.


Wow, this seems like it could be much better suited to my data than RepeatMasker.

Essentially, we've designed a method (based on Pääbo's Neanderthal protocol) to pull down RNA/DNA hybrids from a cell and ligate them to adaptors for Illumina sequencing. So, hopefully, all my reads will contain either R-loops or retroelements.

What I need is a program that can check all my reads for repetitive elements, then spit out a table summarizing the counts of the different elements that are found (which is why the RepeatMasker summary table is so nice). But I have no need to then mask these elements.
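For comparison, a count table of this kind can be sketched directly from a RepeatMasker .out file. This assumes the standard layout of that file (three header lines, then whitespace-separated rows with the repeat class/family in column 11); `count_repeat_classes` is a hypothetical name, and the counts are raw hits, so they will not match the bases-masked figures in RepeatMasker's own summary.

```shell
# Count annotated hits per repeat class/family in a RepeatMasker .out file.
# $1 = path to the .out file
count_repeat_classes() {
    awk '
        NR > 3 && NF >= 11 { n[$11]++ }              # skip header, tally column 11
        END { for (c in n) print n[c] "\t" c }
    ' "$1" | sort -k1,1nr -k2,2                      # most frequent class first
}

# Example usage:
#   count_repeat_classes S5.fa.out
```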

I will definitely give Transposome a try, unless you think it won't be suitable for my needs with the additional information I've just provided.

Thanks so much for your help, it's really appreciated.


I don't see a problem at all, I think this will work nicely for your data. There are two tables (explained on the wiki in detail) of summary results produced which should give you what you want (but I could help if that's not the case). This sounds like a very interesting study, I'd be interested to see what you find. Don't hesitate to ask me anything. Good luck!


Hello-

I've tried installing Transposome but am encountering some issues (I have very little knowledge of Linux or the command line, sorry about that).

I've followed the wiki, trying both the two-line install and the manual install. I encountered a number of issues, but thought I'd worked around them. Finally, I'm being told I need to add fast db to path, which is beyond me. I was trying all this on an Amazon EC2 instance running Ubuntu.

Is it ok to post my issues here, or should I perhaps start a new post?

Thanks so much.


I guess start a new post, or send me a message. Either way, please include the exact message or issue you are having. There is no message that would say "fast db to path", so I'm confused about what you are referring to. To be clear, you did edit the configuration file to include the name of your sequence data and repeat database, correct? If not, then the program won't know about your data.


I will start a new post if I continue to have problems. I want to give it one more try, as I think my trouble was that I hadn't correctly installed Perl in my home directory (even though I tried to follow your directions in the wiki) and I was "messing with the system Perl".

Thanks so much.


On a cloud instance you can skip that part and just try the two install commands under the "installation" section of the README file. The reason is that you have admin privileges and a recent version of Perl (I'll add some context to that section of the docs).

Also, please try to be more explicit if you want help. If you say you are having troubles or you got stuck, that is not enough information to diagnose the issue. I am curious where you ran into trouble, because I can copy/paste those commands to install perlbrew one at a time and they work on any *nix system (Mac, Ubuntu, Fedora, etc.).


Hello-

Thanks so much for the reply. Sorry again if I'm not including enough info. I'm an extreme newbie here, and trying to figure this all out on my own (with help from the internets) is a challenge for me.

So to start, here is what's happening when I try to follow the commands to install/update Perl with perlbrew. I'm doing this using Ubuntu on an Amazon EC2 machine. I get this message initially:

ubuntu@ip-172-31-8-176:~$ \curl -L http://install.perlbrew.pl | bash
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0   315    0     0    0     0      0      0 --:--:--  0:00:01 --:--:--     0
100  1095  100  1095    0     0    845      0  0:00:01  0:00:01 --:--:--   845

## Download the latest perlbrew
## Installing perlbrew
perlbrew is installed: ~/perl5/perlbrew/bin/perlbrew

perlbrew root (~/perl5/perlbrew) is initialized.

Append the following piece of code to the end of your ~/.bash_profile and start a
new shell, perlbrew should be up and fully functional from there:

    source ~/perl5/perlbrew/etc/bashrc

Simply run `perlbrew` for usage details.
Happy brewing!

So I add this piece of code into the current terminal-

ubuntu@ip-172-31-8-176:~$ echo "source ~/perl5/perlbrew/etc/bashrc" >> ~/.bashrc

Then I start a new terminal and type this

ubuntu@ip-172-31-8-176:~$ source ~/.bashrc
ubuntu@ip-172-31-8-176:~$ perlbrew install perl-5.20.1 -Dusethreads
Fetching perl 5.20.1 as /home/ubuntu/perl5/perlbrew/dists/perl-5.20.1.tar.bz2
Download http://www.cpan.org/src/5.0/perl-5.20.1.tar.bz2 to /home/ubuntu/perl5/perlbrew/dists/perl-5.20.1.tar.bz2
Installing /home/ubuntu/perl5/perlbrew/build/perl-5.20.1 into ~/perl5/perlbrew/perls/perl-5.20.1

This could take a while. You can run the following command on another shell to track the status:

  tail -f ~/perl5/perlbrew/build.perl-5.20.1.log

Installation process failed. To spot any issues, check

  /home/ubuntu/perl5/perlbrew/build.perl-5.20.1.log

If some perl tests failed and you still want install this distribution anyway,
do:

  (cd /home/ubuntu/perl5/perlbrew/build/perl-5.20.1; make install)

You might also want to try upgrading patchperl before trying again:

  perlbrew install-patchperl

Generally, if you need to install a perl distribution known to have minor test
failures, do one of these command to avoid seeing this message

  perlbrew --notest install perl-5.20.1
  perlbrew --force install perl-5.20.1

At this point I give up, because I don't really know where I've gone wrong.
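The perlbrew message above points at a build log, and that log usually names the actual failure. The sketch below pulls the first likely error lines out of it. The function name `show_build_errors` and the search patterns are assumptions chosen for illustration; a missing compiler toolchain is one possible cause on a fresh instance, but the log is what would confirm it.

```shell
# Show the first lines of a build log that look like errors, with line numbers.
# $1 = path to the log file
show_build_errors() {
    grep -n -i -E 'error|not found|cannot' "$1" | head -n 20
}

# Example usage with the log path printed by perlbrew:
#   show_build_errors ~/perl5/perlbrew/build.perl-5.20.1.log
```

Posting those lines verbatim gives whoever is helping something concrete to work from.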


Then I tried starting up a new instance on EC2, again with Ubuntu, and ran the two-line "easy install". These are the problems I encounter (I literally just copy-paste the lines into my terminal; I don't change anything).

The first problem seems to be that transposome_config.yml is missing a line (or actually has a different line than expected):

[ERROR]: 'fraction_coverage' is not defined after parsing configuration file.
         This indicates there may be a blank line in your configuration file.
         Please check your configuration file and try again. Exiting.

I used nano to alter the .yml file, replacing the alignment length line with fraction coverage. This seems to work until I hit the error message below, which I think may be due to me messing with the system Perl, but I really don't know.

ubuntu@ip-172-31-4-83:/tmp/wefwj48Sfi/config$ transposome --config transposome_config.yml
INFO - ======== Transposome version: 0.07.9 (started at: 07-12-2014 06:16:14) ========
INFO - Configuration - Log file for monitoring progress and errors: first_try.txt
INFO - Configuration - Sequence file:                               /home/ubuntu/Stetson_1_PF_R1_holytrim3.fa
INFO - Configuration - Sequence number for each BLAST process:      100000
INFO - Configuration - Number of CPUs per thread:                   4
INFO - Configuration - Number of threads:                           2
INFO - Configuration - Output directory:                            transposome_results_out
INFO - Configuration - In-memory analysis:                          1
INFO - Configuration - Percent identity for matches:                90
INFO - Configuration - Fraction coverage for pairwise matches:      0.55
INFO - Configuration - Merge threshold for clusters:                2
INFO - Configuration - Minimum cluster size for annotation:         1
INFO - Configuration - BLAST e-value threshold for annotation:      10
INFO - Configuration - Repeat database for annotation:              /home/ubuntu/RepBase19.11.fasta
INFO - Configuration - Log file for clustering/merging results:     first_try__report.txt
ERROR - Unable to find formatdb. Check your PATH to see that it is installed. Exiting.

I looked for formatdb and checked my path, and this is what I find

ubuntu@ip-172-31-4-83:/tmp/wefwj48Sfi/config$ sudo find / -name "formatdb"
/tmp/wefwj48Sfi/build/ci/bin/formatdb
/tmp/wefwj48Sfi/bin/formatdb
/usr/local/bin/formatdb
ubuntu@ip-172-31-4-83:/tmp/wefwj48Sfi/config$ echo $PATH
/home/ubuntu/perl5/perlbrew/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games

Again, I've gone wrong somewhere, obviously, but I don't know how to proceed from here so I gave up.
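Since find reports a copy of formatdb in /usr/local/bin, which is on the PATH shown above, one further check is whether the shell can actually resolve and execute it. This is a generic sketch, and `check_tool` is a hypothetical helper rather than part of Transposome; it only tests lookup and the executable bit, so a binary that is present but cannot run on this system would need a closer look.

```shell
# Report whether a program can be found on PATH and is executable.
# $1 = program name
check_tool() {
    tool_path=$(command -v "$1") || { echo "$1: not on PATH"; return 1; }
    echo "$1: $tool_path"
    [ -x "$tool_path" ] || { echo "$1: found but not executable"; return 1; }
}

# Example usage:
#   check_tool formatdb
# If it is found, running the file by its full path shows whether it starts at all.
```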


Please be patient and try not to give up. It appears that Amazon has modified paths relative to a normal Ubuntu distribution and other cloud services that I have tried (Linode, Digital Ocean, Rackspace, etc.). The first error was mine, I merged a change from a dev branch that modified the config file, but the master branch is correct now. Thank you for pointing that out. Everything appears to be working except there is something funny with the paths or perl. Could you tell me the output of: perl -V

EDIT: There is no issue with Amazon, see below.


Thanks so much! I'm 100% willing to troubleshoot this with you, and I'm very appreciative of all the folks in the NGS field willing to share their work and expertise with others. Here's the output (this is from the terminal where I've been attempting to update with perlbrew; let me know if you need the output from a brand-new, untouched instance):

ubuntu@ip-172-31-8-176:~$ perl -V
Summary of my perl5 (revision 5 version 18 subversion 2) configuration:

  Platform:
    osname=linux, osvers=3.2.0-58-generic, archname=x86_64-linux-gnu-thread-multi
    uname='linux brownie 3.2.0-58-generic #88-ubuntu smp tue dec 3 17:37:58 utc 2013 x86_64 x86_64 x86_64 gnulinux '
    config_args='-Dusethreads -Duselargefiles -Dccflags=-DDEBIAN -D_FORTIFY_SOURCE=2 -g -O2 -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security -Dldflags= -Wl,-Bsymbolic-functions -Wl,-z,relro -Dlddlflags=-shared -Wl,-Bsymbolic-functions -Wl,-z,relro -Dcccdlflags=-fPIC -Darchname=x86_64-linux-gnu -Dprefix=/usr -Dprivlib=/usr/share/perl/5.18 -Darchlib=/usr/lib/perl/5.18 -Dvendorprefix=/usr -Dvendorlib=/usr/share/perl5 -Dvendorarch=/usr/lib/perl5 -Dsiteprefix=/usr/local -Dsitelib=/usr/local/share/perl/5.18.2 -Dsitearch=/usr/local/lib/perl/5.18.2 -Dman1dir=/usr/share/man/man1 -Dman3dir=/usr/share/man/man3 -Dsiteman1dir=/usr/local/man/man1 -Dsiteman3dir=/usr/local/man/man3 -Duse64bitint -Dman1ext=1 -Dman3ext=3perl -Dpager=/usr/bin/sensible-pager -Uafs -Ud_csh -Ud_ualarm -Uusesfio -Uusenm -Ui_libutil -Uversiononly -DDEBUGGING=-g -Doptimize=-O2 -Duseshrplib -Dlibperl=libperl.so.5.18.2 -des'
    hint=recommended, useposix=true, d_sigaction=define
    useithreads=define, usemultiplicity=define
    useperlio=define, d_sfio=undef, uselargefiles=define, usesocks=undef
    use64bitint=define, use64bitall=define, uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='cc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -DDEBIAN -fstack-protector -fno-strict-aliasing -pipe -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
    optimize='-O2 -g',
    cppflags='-D_REENTRANT -D_GNU_SOURCE -DDEBIAN -fstack-protector -fno-strict-aliasing -pipe -I/usr/local/include'
    ccversion='', gccversion='4.8.2', gccosandvers=''
    intsize=4, longsize=8, ptrsize=8, doublesize=8, byteorder=12345678
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16
    ivtype='long', ivsize=8, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
    alignbytes=8, prototype=define
  Linker and Libraries:
    ld='cc', ldflags =' -fstack-protector -L/usr/local/lib'
    libpth=/usr/local/lib /lib/x86_64-linux-gnu /lib/../lib /usr/lib/x86_64-linux-gnu /usr/lib/../lib /lib /usr/lib
    libs=-lgdbm -lgdbm_compat -ldb -ldl -lm -lpthread -lc -lcrypt
    perllibs=-ldl -lm -lpthread -lc -lcrypt
    libc=, so=so, useshrplib=true, libperl=libperl.so.5.18.2
    gnulibc_version='2.19'
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E'
    cccdlflags='-fPIC', lddlflags='-shared -L/usr/local/lib -fstack-protector'

Characteristics of this binary (from libperl):
  Compile-time options: HAS_TIMES MULTIPLICITY PERLIO_LAYERS
                        PERL_DONT_CREATE_GVSV
                        PERL_HASH_FUNC_ONE_AT_A_TIME_HARD
                        PERL_IMPLICIT_CONTEXT PERL_MALLOC_WRAP
                        PERL_PRESERVE_IVUV PERL_SAWAMPERSAND USE_64_BIT_ALL
                        USE_64_BIT_INT USE_ITHREADS USE_LARGE_FILES
                        USE_LOCALE USE_LOCALE_COLLATE USE_LOCALE_CTYPE
                        USE_LOCALE_NUMERIC USE_PERLIO USE_PERL_ATOF
                        USE_REENTRANT_API
  Locally applied patches:
        DEBPKG:debian/cpan_definstalldirs - Provide a sensible INSTALLDIRS default for modules installed from CPAN.
        DEBPKG:debian/db_file_ver - http://bugs.debian.org/340047 Remove overly restrictive DB_File version check.
        DEBPKG:debian/doc_info - Replace generic man(1) instructions with Debian-specific information.
        DEBPKG:debian/enc2xs_inc - http://bugs.debian.org/290336 Tweak enc2xs to follow symlinks and ignore missing @INC directories.
        DEBPKG:debian/errno_ver - http://bugs.debian.org/343351 Remove Errno version check due to upgrade problems with long-running processes.
        DEBPKG:debian/libperl_embed_doc - http://bugs.debian.org/186778 Note that libperl-dev package is required for embedded linking
        DEBPKG:fixes/respect_umask - Respect umask during installation
        DEBPKG:debian/writable_site_dirs - Set umask approproately for site install directories
        DEBPKG:debian/extutils_set_libperl_path - EU:MM: Set location of libperl.a to /usr/lib
        DEBPKG:debian/no_packlist_perllocal - Don't install .packlist or perllocal.pod for perl or vendor
        DEBPKG:debian/prefix_changes - Fiddle with *PREFIX and variables written to the makefile
        DEBPKG:debian/fakeroot - Postpone LD_LIBRARY_PATH evaluation to the binary targets.
        DEBPKG:debian/instmodsh_doc - Debian policy doesn't install .packlist files for core or vendor.
        DEBPKG:debian/ld_run_path - Remove standard libs from LD_RUN_PATH as per Debian policy.
        DEBPKG:debian/libnet_config_path - Set location of libnet.cfg to /etc/perl/Net as /usr may not be writable.
        DEBPKG:debian/mod_paths - Tweak @INC ordering for Debian
        DEBPKG:debian/module_build_man_extensions - http://bugs.debian.org/479460 Adjust Module::Build manual page extensions for the Debian Perl policy
        DEBPKG:debian/prune_libs - http://bugs.debian.org/128355 Prune the list of libraries wanted to what we actually need.
        DEBPKG:fixes/net_smtp_docs - [rt.cpan.org #36038] http://bugs.debian.org/100195 Document the Net::SMTP 'Port' option
        DEBPKG:debian/perlivp - http://bugs.debian.org/510895 Make perlivp skip include directories in /usr/local
        DEBPKG:debian/cpanplus_definstalldirs - http://bugs.debian.org/533707 Configure CPANPLUS to use the site directories by default.
        DEBPKG:debian/cpanplus_config_path - Save local versions of CPANPLUS::Config::System into /etc/perl.
        DEBPKG:debian/deprecate-with-apt - http://bugs.debian.org/702096 Point users to Debian packages of deprecated core modules
        DEBPKG:debian/squelch-locale-warnings - http://bugs.debian.org/508764 Squelch locale warnings in Debian package maintainer scripts
        DEBPKG:debian/skip-upstream-git-tests - Skip tests specific to the upstream Git repository
        DEBPKG:debian/patchlevel - http://bugs.debian.org/567489 List packaged patches for 5.18.2-2ubuntu1 in patchlevel.h
        DEBPKG:debian/skip-kfreebsd-crash - http://bugs.debian.org/628493 [perl #96272] Skip a crashing test case in t/op/threads.t on GNU/kFreeBSD
        DEBPKG:fixes/document_makemaker_ccflags - http://bugs.debian.org/628522 [rt.cpan.org #68613] Document that CCFLAGS should include $Config{ccflags}
        DEBPKG:debian/find_html2text - http://bugs.debian.org/640479 Configure CPAN::Distribution with correct name of html2text
        DEBPKG:debian/hurd_test_skip_stack - http://bugs.debian.org/650175 Disable failing GNU/Hurd tests dist/threads/t/stack.t
        DEBPKG:fixes/manpage_name_Test-Harness - http://bugs.debian.org/650451 [rt.cpan.org #73399] cpan/Test-Harness: add NAME headings in modules with POD
        DEBPKG:debian/makemaker-pasthru - http://bugs.debian.org/660195 [rt.cpan.org #28632] Make EU::MM pass LD through to recursive Makefile.PL invocations
        DEBPKG:debian/perl5db-x-terminal-emulator.patch - http://bugs.debian.org/668490 Invoke x-terminal-emulator rather than xterm in perl5db.pl
        DEBPKG:debian/cpan-missing-site-dirs - http://bugs.debian.org/688842 Fix CPAN::FirstTime defaults with nonexisting site dirs if a parent is writable
        DEBPKG:fixes/memoize_storable_nstore - [rt.cpan.org #77790] http://bugs.debian.org/587650 Memoize::Storable: respect 'nstore' option not respected
        DEBPKG:fixes/net_ftp_failed_command - [rt.cpan.org #37700] http://bugs.debian.org/491062 Net::FTP: cope gracefully with a failed command
        DEBPKG:fixes/perlbug-patchlist - [3541c11] http://bugs.debian.org/710842 [perl #118433] Make perlbug look up the list of local patches at run time
        DEBPKG:fixes/module_metadata_security_doc - [68cdd4b] CVE-2013-1437 documentation fix
        DEBPKG:fixes/module_metadata_taint_fix - [bff978f] http://bugs.debian.org/722210 [rt.cpan.org #88576] untaint version, if needed, in Module::Metadata
        DEBPKG:fixes/IPC-SysV-spelling - http://bugs.debian.org/730558 [rt.cpan.org #86736] Fix spelling of IPC_CREAT in IPC-SysV documentation
        DEBPKG:fixes/fix-undef-source -
  Built under linux
  Compiled at Mar 27 2014 18:30:28
  %ENV:
    PERLBREW_BASHRC_VERSION="0.71"
    PERLBREW_HOME="/home/ubuntu/.perlbrew"
    PERLBREW_ROOT="/home/ubuntu/perl5/perlbrew"
  @INC:
    /etc/perl
    /usr/local/lib/perl/5.18.2
    /usr/local/share/perl/5.18.2
    /usr/lib/perl5
    /usr/share/perl5
    /usr/lib/perl/5.18
    /usr/share/perl/5.18
    /usr/local/lib/site_perl

Thank you, I see the issue. I just set up an Amazon EC2 instance that is identical, so I will tell you shortly what to do. EDIT: There is no issue, see below.


Did you try the suggestion at the end? This is common on older systems but I've never seen this with cloud instances, so it's somewhat odd. It is nothing to worry about though, it just means some minor tests failed. Try the last command and it will work (with --force).


I just tried all of the suggestions (force install, notest install, upgrading patchperl) and nothing helped. I still get the same message as before. Should I try with something other than Ubuntu, or maybe try updating it? The version I'm running is the first one that's available from Amazon when setting up an instance: Ubuntu Server 14.04 LTS (HVM), SSD Volume Type.


I just set up a fresh Amazon ec2 instance (Ubuntu 14.04), and these commands worked without errors:

sudo apt-get update
sudo apt-get install -y build-essential lib32z1 git ncbi-blast+

Then, I copy-and-pasted the 6 lines for installing Perl from the wiki. Then, one more line installs Transposome:

cpanm git://github.com/sestaton/Transposome.git

Now, test it out:

$ transposome

ERROR: No arguments were given.

USAGE: transposome [-c] [-v] [-h] [-m]

Required:
    -c|config  :    The Transposome configuration file.

Options:
    -v|version :    Print the program version and exit.
    -h|help    :    Print a usage statement.
    -m|man     :    Print the full documentation.

If you still have issues, please contact me via email and I can help further, so we don't have too lengthy a discussion here. I honestly can't see where you are getting stuck, but I also don't know the exact commands you used. The good news is that it will install and work with the commands shown, which is what I expect. So we just need to figure out what specifically is the issue with your instance or commands.


Hello-

I got through all the steps above just fine, but am running into problems trying to run my data through. I've emailed you (I think), so if you want to continue helping me, that would be fantastic. Thanks so much.
