7.5 years ago
wangzz.email
Hi,
I have a general question: after coverage.hist is generated, what is the next step in ABySS? After the file was produced, the program appears to have gotten stuck; it has been running for two days without any further output. I am trying to assemble an H. sapiens genome on a single machine with 40 cores, using the MPI command recommended by the developers.
Thank you!
George
First, it's not unusual for a genome assembly to take a long time. On the other hand, check that you have enough RAM on this machine. According to this paper, ABySS 2.0 required 34GB of RAM for a human genome and computation took 20 hours with 64 cores. It will take longer if using fewer cores and/or slower CPUs. If you don't have enough RAM, it'll take forever.
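As a rough baseline for "how long is too long", the figures quoted above can be rescaled to a different core count. This is only a sketch: it assumes ideal linear scaling and comparable CPUs, which a real assembly will not achieve, and the function name is made up for illustration.

```python
def scaled_hours(ref_hours, ref_cores, cores):
    """Estimate wall time on `cores` cores, assuming ideal linear scaling."""
    return ref_hours * ref_cores / cores

# 20 hours on 64 cores (the figure quoted from the paper), rescaled to 40 cores.
print(scaled_hours(20, 64, 40))  # 32.0
```

Under that optimistic assumption a 40-core run would need about 32 hours in total, so a real run can reasonably take longer than that without being stuck.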
Hi @wangzz,
Can you please provide your abyss-pe command and your complete log output? It would be helpful in determining exactly where the program froze. You can use a GitHub gist for the log output if you don't want to post it here. If possible, please enable verbose logging by adding v=-v to your abyss-pe command. It would be very hard to troubleshoot your assembly otherwise.
Also: Thank you for your reply, Jean-Karim.
Thanks for quick reply.
The command looks like this:
abyss-pe -C abyss.96 np=40 k=96 v=-v name=sample q=15 \
    lib='sample_lib_0 sample_lib_1' \
    sample_lib_0='L003_R1.00.fa.gz L003_R2.00.fa.gz' \
    sample_lib_1='L003_R1.01.fa.gz L003_R2.01.fa.gz' \
There are 100+ paired files.
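With that many pairs, typing the library arguments by hand is error-prone. Below is a minimal sketch of generating them, assuming every R2 file has the same name as its R1 file with `_R1.` replaced by `_R2.`, as in the two pairs shown above. The helper name `build_lib_args` and the one-library-per-pair layout are illustrative choices, not something abyss-pe requires.

```python
import shlex

def build_lib_args(r1_files, prefix="sample_lib"):
    """Return abyss-pe arguments declaring one paired-end library per R1 file."""
    lib_names = []
    lib_args = []
    for i, r1 in enumerate(sorted(r1_files)):
        if "_R1." not in r1:
            raise ValueError(f"not an R1 file name: {r1}")
        # Assumption: the mate file differs only in the _R1. / _R2. tag.
        r2 = r1.replace("_R1.", "_R2.", 1)
        name = f"{prefix}_{i}"
        lib_names.append(name)
        lib_args.append(f"{name}={r1} {r2}")
    return ["lib=" + " ".join(lib_names)] + lib_args

r1_files = ["L003_R1.00.fa.gz", "L003_R1.01.fa.gz"]
cmd = ["abyss-pe", "-C", "abyss.96", "np=40", "k=96", "v=-v",
       "name=sample", "q=15"] + build_lib_args(r1_files)
# shlex.quote adds shell quoting around arguments that contain spaces.
print(" ".join(shlex.quote(arg) for arg in cmd))
```

Printing the command rather than running it makes it easy to check the pairing before starting a multi-day job.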
The standard error stops at this line: 'L008_R1.15.fa.gz': discarded 124675 reads shorter than 96 bases
The standard output stops at this line: 36: Found 78803591 k-mer in 378755 contigs before removing low-coverage contigs. Removed 1456823 k-mer in 57048 low-coverage contigs.
We sequenced the same sample twice, using different libraries (all paired-end). The two data sets have about the same coverage (40X). The first data set was assembled successfully by ABySS; the second appears to have gotten stuck. We ran both samples on a 500GB cluster node with 40 cores.
Thanks.
500 GB might not be enough unfortunately.
Your abyss-pe command looks good. The ABySS log contains messages indicating how much memory is being used per MPI process. If you are able to post the full log output (e-mail benv at bcgsc dot ca, or post to a GitHub gist), I could have a look and see if that is the problem.
If you don't have a larger memory machine, you could try assembling with ABySS's new Bloom filter mode: https://github.com/bcgsc/abyss#assembling-using-a-bloom-filter-de-bruijn-graph
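For orientation, here is a hedged sketch of how the arguments might change when switching to Bloom filter mode. The README linked above describes the parameters B (Bloom filter memory budget), H (number of hash functions) and kc (minimum k-mer count); the values used here are placeholders only and would need to be chosen for the actual data set by following the README. The helper name `to_bloom_mode` is made up, and the library arguments are left out for brevity.

```python
def to_bloom_mode(args, budget, hashes, min_count):
    """Drop the MPI process count and append the Bloom filter parameters."""
    kept = [a for a in args if not a.startswith("np=")]
    return kept + [f"B={budget}", f"H={hashes}", f"kc={min_count}"]

mpi_args = ["abyss-pe", "-C", "abyss.96", "np=40", "k=96", "v=-v", "name=sample"]
# Placeholder values; not a recommendation for a human genome.
print(" ".join(to_bloom_mode(mpi_args, "100G", 4, 3)))
```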
OK. It might have been a network problem. The machine I was running on has more than two network configurations (at least two network cards). Could this confuse MPI? I switched to another machine and it worked. Thanks for your reply.
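One quick way to check whether a machine exposes several network interfaces is sketched below. It assumes a Unix-like system where the loopback interface is named `lo`, and it only lists interfaces; it does not show which one MPI actually uses. If the MPI installation is Open MPI (the thread does not say), its TCP transport can be restricted to a single interface with the `btl_tcp_if_include` MCA parameter.

```python
import socket

def non_loopback_interfaces():
    """Return the names of all network interfaces except the loopback one."""
    return [name for _, name in socket.if_nameindex() if name != "lo"]

interfaces = non_loopback_interfaces()
print(interfaces)
if len(interfaces) > 1:
    print("Several interfaces found; MPI may need to be told which one to use.")
```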