Question: What happens after coverage.hist is generated in ABySS?
10 weeks ago, wangzz.email wrote:

Hi,

I have a general question: after coverage.hist is generated, what is the next step in ABySS? After the file was produced, the program seemed to get stuck and has now been running for two days without any output. I am trying to assemble an H. sapiens genome on a single machine with 40 cores, using the MPI command recommended by the developers.

Thank you!

George

abyss assembly • 202 views
modified 9 weeks ago by Jean-Karim Heriche • written 10 weeks ago by wangzz.email

First, it's not unusual for a genome assembly to take a long time. On the other hand, check that you have enough RAM on this machine. According to this paper, ABySS 2.0 required 34 GB of RAM for a human genome and computation took 20 hours with 64 cores. It will take longer with fewer cores and/or slower CPUs. If you don't have enough RAM, it'll take forever.
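A quick way to check this on a Linux node is sketched below. The helper name `mem_total_gb` is made up for illustration, and the sketch assumes `/proc/meminfo`, `nproc` and a procps-style `ps` are available. `ABYSS-P` is the name of the MPI assembler binary that abyss-pe launches.

```shell
#!/bin/sh
# Hypothetical helper: print total RAM in GB from a meminfo-style file.
# Defaults to /proc/meminfo, which exists on Linux only.
mem_total_gb() {
    awk '/^MemTotal:/ { printf "%d\n", $2 / 1024 / 1024 }' "${1:-/proc/meminfo}"
}

if [ -r /proc/meminfo ]; then
    echo "Total RAM: $(mem_total_gb) GB"
fi

# Number of CPU cores available to this process (GNU coreutils).
if command -v nproc >/dev/null 2>&1; then
    echo "Cores: $(nproc)"
fi

# While the assembly runs: resident memory (KB), CPU % and elapsed time
# per ABYSS-P process. Falls back to a message if none is running.
ps -C ABYSS-P -o pid,rss,pcpu,etime 2>/dev/null \
    || echo "No running ABYSS-P process found"
```

If the ABYSS-P processes show close to 0% CPU for a long time, the job is more likely hung than slow.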

written 10 weeks ago by Jean-Karim Heriche

Hi @wangzz,

Can you please provide your abyss-pe command and your complete log output? It would be helpful in determining exactly where the program froze. You can use a GitHub gist for the log output if you don't want to post it here. If possible, please enable verbose logging by adding v=-v to your abyss-pe command. It would be very hard to troubleshoot your assembly otherwise.
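One way to capture the complete output for sharing is to send standard output and standard error to separate files. This is only a sketch: `run_logged` is a hypothetical wrapper, not part of ABySS, and the file prefix is arbitrary.

```shell
#!/bin/sh
# Hypothetical wrapper: run a command, writing its stdout to PREFIX.out
# and its stderr to PREFIX.err.
run_logged() {
    prefix=$1
    shift
    "$@" >"$prefix.out" 2>"$prefix.err"
}

# Example (not executed here; fill in your own library arguments):
#   run_logged abyss.96 abyss-pe -C abyss.96 np=40 k=96 v=-v name=sample lib=...
# Afterwards, the last line of each file shows where the run stopped:
#   tail -n 1 abyss.96.out abyss.96.err
```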

Also: Thank you for your reply, Jean-Karim.

modified 9 weeks ago • written 9 weeks ago by benv

Thanks for the quick reply.

The command looks like this:

    abyss-pe -C abyss.96 np=40 k=96 v=-v name=sample q=15 \
        lib='sample_lib_0 sample_lib_1' \
        sample_lib_0='L003_R1.00.fa.gz L003_R2.00.fa.gz' \
        sample_lib_1='L003_R1.01.fa.gz L003_R2.01.fa.gz' \

There are 100+ paired files.

The standard error stops at this line: 'L008_R1.15.fa.gz': discarded 124675 reads shorter than 96 bases

The standard output stops at this line: 36: Found 78803591 k-mer in 378755 contigs before removing low-coverage contigs. Removed 1456823 k-mer in 57048 low-coverage contigs.

We sequenced the same sample twice, with different libraries (all paired-end). The two data sets have about the same coverage (40X). The first data set was assembled successfully by ABySS; the second appears to have gotten stuck. We ran both samples on a 500 GB cluster node with 40 cores.

Thanks.

written 9 weeks ago by wangzz.email

Unfortunately, 500 GB might not be enough.

Your abyss-pe command looks good.

The ABySS log contains messages indicating how much memory is being used per MPI process. If you are able to post the full log output (e-mail benv at bcgsc dot ca, or post to a GitHub gist), I could have a look and see if that is the problem.

If you don't have a larger memory machine, you could try assembling with ABySS's new Bloom filter mode: https://github.com/bcgsc/abyss#assembling-using-a-bloom-filter-de-bruijn-graph
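As a rough sketch, a Bloom filter run with the first two libraries from the command above might look like the following. `B` (Bloom filter memory budget), `H` (number of hash functions) and `kc` (minimum k-mer count) are the parameters described in the README section linked above. The values below are placeholders, not recommendations, and `j` (threads) is used instead of `np` on the assumption that this mode is multithreaded rather than MPI-based. The `q=15` option is left out because it is unclear whether it applies in this mode.

```shell
#!/bin/sh
# Placeholder values only: tune these for your own data and machine.
BLOOM_SIZE=32G      # B: total memory budget for the Bloom filter
NUM_HASHES=3        # H: number of Bloom filter hash functions
MIN_KMER_COUNT=3    # kc: minimum k-mer count threshold

# Build the command as a string and print it rather than running it,
# so the parameters can be reviewed first.
bloom_cmd="abyss-pe name=sample k=96 v=-v j=40 \
B=$BLOOM_SIZE H=$NUM_HASHES kc=$MIN_KMER_COUNT \
lib='sample_lib_0 sample_lib_1' \
sample_lib_0='L003_R1.00.fa.gz L003_R2.00.fa.gz' \
sample_lib_1='L003_R1.01.fa.gz L003_R2.01.fa.gz'"

echo "$bloom_cmd"
```

Paste the printed command into a shell once the values look right, adding the remaining libraries in the same way.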

written 9 weeks ago by benv

OK. It might be due to a network problem. The machine I was running on has more than two network configurations (at least two cards), which may confuse MPI. I switched to another machine, and it worked. Thanks for your reply.
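For anyone who cannot switch machines, a possible workaround is to pin MPI to one network interface. The sketch below rests on several assumptions: the MPI implementation is Open MPI (`btl_tcp_if_include` is an Open MPI MCA parameter; other implementations use different options), your abyss-pe version accepts an `mpirun=` override (check the abyss-pe man page), and `eth0` is a placeholder interface name. The helper `mpirun_for_iface` is hypothetical.

```shell
#!/bin/sh
# List the network interfaces on this machine (iproute2, Linux only).
if command -v ip >/dev/null 2>&1; then
    ip -brief addr show || true
fi

# Hypothetical helper: build an mpirun prefix restricted to one interface.
mpirun_for_iface() {
    printf 'mpirun --mca btl_tcp_if_include %s' "$1"
}

# Print (do not run) an abyss-pe call using that prefix.
# eth0 is a placeholder; the library arguments are elided.
echo "abyss-pe mpirun=\"$(mpirun_for_iface eth0)\" np=40 k=96 v=-v name=sample ..."
```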

written 9 weeks ago by wangzz.email
9 weeks ago, Jean-Karim Heriche (EMBL Heidelberg, Germany) wrote:

This may be related to this MPI issue described in the FAQ.

written 9 weeks ago by Jean-Karim Heriche