ABySS - Out of memory using 64 cores (520GB)
0
0
Entering edit mode
5.7 years ago
wjar6718 • 0

Hi ABySS,

I used a HiSeq X Ten to generate M001_R1.fastq (170 GB, 450,000,000 reads) and M001_R2.fastq (170 GB, 450,000,000 reads), with 37x coverage; 95% of reads are above Q30 and most are 150 bp long.

My only compute resource is a single machine with 64 cores and 520 GB of total memory. My simple question is whether I can assemble these two fastq files with ABySS 1.9.0.

I ran some tests on the data using 4 and 8 cores on my computer and got the following results:

1. Used 4 cores

$ cat abysspe91.sh.o3767790
/opt/openmpi/bin/mpirun -np 4 ABYSS-P -k64 -q3 -v --coverage-hist=coverage.hist -s FR07886691_Human_WEEJAR_R1R2-bubbles.fa -o FR07886691_Human_WEEJAR_R1R2-1.fa M001_R1.fastq M001_R2.fastq
ABySS 1.9.0
ABYSS-P -k64 -q3 -v --coverage-hist=coverage.hist -s FR07886691_Human_WEEJAR_R1R2-bubbles.fa -o FR07886691_Human_WEEJAR_R1R2-1.fa M001_R1.fastq M001_R2.fastq
Running on 4 processors
0: Running on host omega-0-9.local
1: Running on host omega-0-9.local
2: Running on host omega-0-9.local
3: Running on host omega-0-9.local
0: Reading `HCCJFCCXX_2_150527_FR07886691_Human__R_150526_WEEJAR_FGS_M001_R1.fastq'...
1: Reading `HCCJFCCXX_2_150527_FR07886691_Human__R_150526_WEEJAR_FGS_M001_R2.fastq'...

[cut]

0: Read 6900000 reads. 0: Hash load: 229724096 / 536870912 = 0.428 using 8.01 GB
1: Read 7000000 reads. 1: Hash load: 229433787 / 536870912 = 0.427 using 8 GB
[Job stopped here without error message. I think out of memory]

2. Used 8 cores

$ cat abysspe81_eager.sh.o3769063
/opt/openmpi/bin/mpirun -np 8 ABYSS-P -k64 -q3 -v --coverage-hist=coverage.hist -s FR07886681_Human_WEEJAR_R1R2_T8-bubbles.fa -o FR07886681_Human_WEEJAR_R1R2_T8-1.fa M001_R1.fastq M001_R2.fastq
ABySS 1.9.0
ABYSS-P -k64 -q3 -v --coverage-hist=coverage.hist -s FR07886681_Human_WEEJAR_R1R2_T8-bubbles.fa -o FR07886681_Human_WEEJAR_R1R2_T8-1.fa M001_R1.fastq M001_R2.fastq
Running on 8 processors
0: Running on host omega-0-17.local
1: Running on host omega-0-17.local
2: Running on host omega-0-17.local
3: Running on host omega-0-17.local
4: Running on host omega-0-17.local
5: Running on host omega-0-17.local
6: Running on host omega-0-17.local
7: Running on host omega-0-17.local
0: Reading `HCCJFCCXX_1_150527_FR07886681_Human__R_150526_WEEJAR_FGS_M001_R1.fastq'...

[cut]

0: Read 16900000 reads. 0: Hash load: 233529399 / 536870912 = 0.435 using 8.15 GB
[Job stopped here without error message. I think out of memory]

From this I concluded that even with 64 cores, ABySS could load only about 132,000,000 reads from each fastq file, so my job would fail.

Can you help me get a genome assembly out of this data? I would really like to use ABySS because of its high accuracy. I tried splitting my large files into 60 smaller files, but ABySS still uses the same amount of memory to load them. Thank you in advance.
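The arithmetic behind this kind of estimate can be sketched in a few lines of Python. This is only a rough illustration: it takes the figures printed in the 8-core log (16,900,000 reads read by rank 0 and a hash load of 233529399 / 536870912) and assumes that the number of reads loaded before the job stops scales linearly with the number of MPI processes. The function names are made up for this example, and the result differs slightly from the 132,000,000 quoted here depending on which log line is used as the starting point.

```python
def hash_load(occupied: int, capacity: int) -> float:
    """Fraction of hash-table slots in use, as printed in the 'Hash load' line."""
    return occupied / capacity


def extrapolate_reads(reads_loaded: int, cores_used: int, target_cores: int) -> int:
    """Scale the reads loaded before the stop to another core count.

    Assumption (not confirmed in the thread): the number of reads that fit
    grows in direct proportion to the number of processes.
    """
    return reads_loaded * target_cores // cores_used


if __name__ == "__main__":
    # Figures copied from the 8-core log.
    print(f"hash load: {hash_load(233529399, 536870912):.3f}")
    print(f"table capacity equals 2**29: {536870912 == 2**29}")

    estimate = extrapolate_reads(16_900_000, cores_used=8, target_cores=64)
    print(f"estimated reads loadable on 64 cores: {estimate:,}")
    print(f"share of the 450,000,000 reads per file: {estimate / 450_000_000:.0%}")
```

Under the linear assumption, the estimate covers well under half of each file, which is what motivates the question that follows.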

Cheers, Weerachai

abyss assembly • 2.6k views
0
Entering edit mode

It is not clear whether "your computer", on which you ran the tests, is the same machine as the one with 64 cores and 520 GB of memory. How much memory was available for the test runs?

Also, did you perform quality checking and trimming, adapter trimming, error correction, maybe digital normalization? These steps should lower memory requirements.

0
Entering edit mode

Thanks for this. I am actually planning to do these now. I did trim adapters but none of the other steps, since as far as I can see ABySS can do quality checking itself.

It is interesting! In your case, how well can error correction and digital normalisation cut down the number of reads?

Cheers, weerachai

0
Entering edit mode

I wonder what the output of "free -g" is when ABySS hangs. That would tell you whether it actually runs out of memory, whether it has started to write to swap (which would explain why it seemingly hangs: it just takes forever once it starts writing to disk), or whether there is actually still enough memory left.
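For anyone who can get a shell on the node (or add a step to the job script), the same check can be scripted. The following is a minimal sketch, assuming a Linux node where /proc/meminfo is readable; the function names are illustrative and not part of ABySS or SGE. The `MemAvailable` field is missing on some older kernels, so the code falls back to `MemFree`.

```python
import os


def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo-style text into {field name: value in kB}."""
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that are not 'Name: value' pairs
        parts = rest.split()
        if parts and parts[0].isdigit():
            info[name.strip()] = int(parts[0])
    return info


def memory_summary(info: dict) -> dict:
    """Return available memory and swap in use, both in GiB."""
    kib_per_gib = 1024 * 1024
    available = info.get("MemAvailable", info.get("MemFree", 0))
    swap_used = info.get("SwapTotal", 0) - info.get("SwapFree", 0)
    return {
        "available_gib": available / kib_per_gib,
        "swap_used_gib": swap_used / kib_per_gib,
    }


if __name__ == "__main__":
    # Only meaningful on Linux; elsewhere the file does not exist.
    if os.path.exists("/proc/meminfo"):
        with open("/proc/meminfo") as handle:
            print(memory_summary(parse_meminfo(handle.read())))
```

A `swap_used_gib` value that keeps growing while the read counter in the ABySS log stalls would point to swapping rather than a hard out-of-memory kill.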

0
Entering edit mode

Hi Philipp,

I would like to know too, but as far as I know it is difficult to check. I am using a cluster shared by many people, and a 64-core request is about as much as I can reasonably wait for in the queue. I submitted my job to SGE using qsub. All compute nodes have 64 cores and 520 GB of memory, and they are probably configured to give a job all of its assigned CPU resources. The details I know are as follows:

 machine_type x86_64
 os_name Linux
 os_release 2.6.32-504.8.1.el6.x86_64
 sys_clock Thu, 13 Aug 2015 04:15:58 +1000
 Uptime 16659 days, 18:16:09
 Constant Metrics
 cpu_num 64 CPUs
 cpu_speed 2599 MHz
 mem_total 529414720 KB
 swap_total 268435456 KB
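To put these node metrics in more familiar units, here is a small worked conversion in Python. It is a sketch under one assumption that the thread does not confirm: that "KB" in the metrics means 1024 bytes. The helper names are hypothetical.

```python
KIB_PER_GIB = 1024 * 1024


def kib_to_gib(kib: int) -> float:
    """Convert a value in KB (assumed to be 1024-byte units) to GiB."""
    return kib / KIB_PER_GIB


def per_core_gib(mem_total_kib: int, cores: int) -> float:
    """Memory per core in GiB if the total were divided evenly."""
    return kib_to_gib(mem_total_kib) / cores


if __name__ == "__main__":
    mem_total_kib = 529414720   # mem_total from the compute node metrics
    swap_total_kib = 268435456  # swap_total from the compute node metrics
    cores = 64                  # cpu_num from the compute node metrics

    print(f"total memory: {kib_to_gib(mem_total_kib):.2f} GiB")
    print(f"total swap:   {kib_to_gib(swap_total_kib):.2f} GiB")
    print(f"even share per core: {per_core_gib(mem_total_kib, cores):.2f} GiB")
```

The per-core figure is only an even split of the total; how the scheduler actually allots memory to each slot is not stated in the thread.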

Cheers, weerachai

0
Entering edit mode

SGE lets you redirect stdout and stderr to a file. Did you check the file for errors?

0
Entering edit mode

I think I have since deleted the STDERR files, but I checked both of them and they were empty (0 bytes). Weerachai

0
Entering edit mode

I have never used SGE, but Torque sends these messages to the output file. In fact, I got one today:

=>> PBS: job killed: mem job total 10549688 kb exceeded limit 10485760 kb
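A message like this already contains the two numbers that matter. As a minimal sketch, the Python below extracts them and reports the overshoot; the regular expression is modelled only on the single line quoted here, so other Torque/PBS versions may word the message differently, and the function name is made up for the example.

```python
import re

# Pattern based solely on the one message quoted in this post.
PBS_MEM_PATTERN = re.compile(r"mem job total (\d+) kb exceeded limit (\d+) kb")


def parse_pbs_mem_kill(line: str):
    """Return (used_kb, limit_kb, overshoot_kb), or None if the line does not match."""
    match = PBS_MEM_PATTERN.search(line)
    if match is None:
        return None
    used_kb, limit_kb = (int(group) for group in match.groups())
    return used_kb, limit_kb, used_kb - limit_kb


if __name__ == "__main__":
    message = "=>> PBS: job killed: mem job total 10549688 kb exceeded limit 10485760 kb"
    used_kb, limit_kb, over_kb = parse_pbs_mem_kill(message)
    print(f"limit: {limit_kb / 1024 / 1024:.1f} GiB")
    print(f"overshoot: {over_kb / 1024:.1f} MiB")
```

Searching a job's output files for a line of this shape is a quick way to tell a scheduler kill apart from a silent crash.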

Are you redirecting the ABySS output to a file? It is difficult to troubleshoot with no output and no clues.

0
Entering edit mode

Thx h.mon for your great support,

1. I reran the test and got these:

-rw-r--r-- 1 weejar HumanComparativeandProstateCancerGe            0 Aug 13 16:56 abysspe81_64_nslots_Q10q15k70.sh.e3781043
-rw-r--r-- 1 weejar HumanComparativeandProstateCancerGe            0 Aug 13 18:07 abysspe81_64_nslots_Q10q15k70.sh.e3781361
-rw-r--r-- 1 weejar HumanComparativeandProstateCancerGe        10910 Aug 13 18:41 abysspe81_64_nslots_Q10q15k70.sh.o3781361
-rw-r--r-- 1 weejar HumanComparativeandProstateCancerGe            0 Aug 13 18:07 abysspe81_64_nslots_Q10q15k70.sh.pe3781361
-rw-r--r-- 1 weejar HumanComparativeandProstateCancerGe            0 Aug 13 18:07 abysspe81_64_nslots_Q10q15k70.sh.po3781361

-bash-4.1$ cat abysspe81_64_nslots_Q10q15k70.sh
#!/bin/bash
#
#
RUN_DIR=/share/Temp/weejar/fulcrum_v_043
./abyss-pe -C $RUN_DIR np=$NSLOTS v=-v k=30 n=10 name=FR07886681_R1R2 \
in='Clean2_FR07886681_Human_WEEJAR_R1_fix_kmer_q15_N0_L70_fastx_maskq10.fasta Clean2_FR07886681_Human_WEEJAR_R2_fix_kmer_q15_N0_L70_fastx_maskq10.fasta' \
aligner=bwa

As you can see, the *.pe and *.e files were empty, and the *.o file again stopped while loading reads.

2. I ran another test on the login node, without submitting it to SGE, and got the following:

-bash-4.1$ ./abyss-pe -C $RUN_DIR np=$NSLOTS v=-v k=64 n=5 name=FR07886681_R1R2_login_clean3 in='Clean3_FR07886681_Human_WEEJAR_R1_fix_kmer_q15_N0_L70_fastx_maskq10_N0.fasta Clean3_FR07886681_Human_WEEJAR_R2_fix_kmer_q15_N0_L70_fastx_maskq10_N0.fasta' aligner=bwa
make: Entering directory `/share/Temp/weejar/fulcrum_v_043'
ABySS 1.9.0
[cut]

make: Leaving directory `/share/Temp/weejar/fulcrum_v_043'
-bash-4.1$

It seems clear to me that the problem is memory usage, considering the total memory of the login node:

 machine_type x86_64
 os_name Linux
 os_release 2.6.32-504.8.1.el6.x86_64
 sys_clock Fri, 14 Aug 2015 11:59:43 +1000
 Uptime 44 days, 23:18:50
 Constant Metrics
 cpu_num 8 CPUs
 cpu_speed 2599 MHz
 mem_total 33014604 KB
 swap_total 16777212 KB

Cheers, Weerachai

0
Entering edit mode

It could be that the system administrators limit the memory available to users of the login node, considering that it is just for submitting jobs. So the node may have enough memory, but you are not allowed to use it. Back at UQ I got angry automated emails when I ran tasks on login nodes... `ulimit -a` may tell you more about your allowed limits, but I wouldn't use the login node for anything. Can you ssh into your computing node while the job is running?

0
Entering edit mode

I have no idea of the IP addresses or ssh-able hostnames of the compute nodes, and I don't think the admin would let me have them. Thanks anyway, Philipp.

0
Entering edit mode

$ ulimit -a

Last login: Thu Aug 13 18:36:05 2015 from 129.94.14.94
Rocks 6.0 (Mamba)
Profile built 12:24 30-Jun-2015

Kickstarted 12:37 30-Jun-2015
-bash-4.1$ ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 257782
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 10240
cpu time               (seconds, -t) unlimited
max user processes              (-u) 1024
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited
-bash-4.1$
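These limits were read on the login node, and the limits inside a scheduled job may differ. One way to compare is to print them from within the job itself. The sketch below uses Python's standard `resource` module (Unix only) to report a few of the same limits; the selection of limits and the helper names are choices made for this example. Note that `resource` reports memory limits in bytes, whereas `ulimit -a` prints them in kbytes.

```python
import resource

# A subset of the limits shown by `ulimit -a`, keyed by a readable label.
LIMITS = {
    "virtual memory (-v)": resource.RLIMIT_AS,
    "max memory size (-m)": resource.RLIMIT_RSS,
    "data seg size (-d)": resource.RLIMIT_DATA,
    "stack size (-s)": resource.RLIMIT_STACK,
    "max user processes (-u)": resource.RLIMIT_NPROC,
    "open files (-n)": resource.RLIMIT_NOFILE,
}


def describe(value: int) -> str:
    """Render a limit value the way ulimit does for unlimited resources."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


def current_limits() -> dict:
    """Return {label: (soft, hard)} for the limits of the current process."""
    return {
        label: tuple(describe(v) for v in resource.getrlimit(res))
        for label, res in LIMITS.items()
    }


if __name__ == "__main__":
    for label, (soft, hard) in current_limits().items():
        print(f"{label:26s} soft={soft} hard={hard}")
```

Running this at the start of the submitted job script would put the job's own limits into the *.o file, next to the ABySS log.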

0
Entering edit mode

You can try to start an interactive job, but I do not know how to do that on SGE. You will attract the admins' ire if you keep running ABySS on the login node; depending on local policies and how much you abuse it, you could be blocked from using the cluster.

P.S.: have you been using the login node all this time?

0
Entering edit mode

Using the login nodes for testing is okay here, and the ABySS tests ran for only a few hours before they stopped. Weerachai

0
Entering edit mode

I am facing the same problem. I am going to try splitting the fastq files into smaller ones, but I don't know whether it is feasible to join the resulting assemblies.