Question: task killed while running SOAPdenovo
Yingzi Zhang wrote:

Hi all, I was running SOAPdenovo-127mer with 12 libraries. While importing reads from the 7th library's files, it reported that the task was killed, with no further information. I re-ran the same command and the task was killed again while importing the 7th library. What could be the cause? No one killed the task manually. Was it a memory issue? Thank you.

Yingzi


Can you please share the config file contents?

Reply by Vijay Lakhujani

Yes. Here it is:

#config_file#
max_rd_len=100
[LIB]
avg_ins=469
reverse_seq=0
asm_flags=3
rank=1
map_len=32
q1=1_1.fastq
q2=1_2.fastq
[LIB]
avg_ins=467
reverse_seq=0
asm_flags=3
rank=1
q1=2_1.fastq
q2=2_2.fastq
[LIB]
avg_ins=474
reverse_seq=0
asm_flags=3
rank=1
map_len=32
q1=3_1.fastq
q2=3_2.fastq

... 

[LIB]
avg_ins=469
reverse_seq=0
asm_flags=3
rank=1
map_len=32
q1=12_1.fastq
q2=12_2.fastq

I wrote the config file by imitating examples on the Internet. The insert-size values were estimated with Qualimap, a BAM analysis tool. The fastq files are about 50 GB each.

Reply by Yingzi Zhang

Likely a memory problem. How much memory are you using?
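If this is a Linux machine, the kernel log will usually confirm whether the out-of-memory (OOM) killer ended the job. A quick check (assuming you can read the kernel log):

# look for OOM-killer entries from around the time the job died
dmesg -T | grep -i -E "out of memory|oom|killed process"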

Reply by genomax

RAM: 768 GB; internal storage: 50 TB

Reply by Yingzi Zhang

Are you the only user on this machine? 768 GB may not be enough for a large data set (it looks like you have ~600 GB of data). You may want to normalize the data set (bbnorm.sh from the BBMap suite can do this). Be aware that normalization itself can also take a large amount of memory.
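A minimal sketch of such a normalization run, reusing the file names from your config (the target/min depth values and the Java heap size are illustrative, not tuned recommendations):

# normalize one paired run to ~100x depth, discarding reads below 5x coverage
bbnorm.sh in=1_1.fastq in2=1_2.fastq out=1_1.norm.fastq out2=1_2.norm.fastq target=100 min=5 -Xmx300g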

Reply by genomax

Yes, I am the only user. I have about 400 GB of reads in total. Do you know from experience how much memory I would need?

Reply by Yingzi Zhang

That will likely depend on the number of unique k-mers in your data. You can try @Brian's suggestion in this thread to estimate it (How to estimate peak memory usage of SOAPdenovo).
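For a rough unique k-mer count, loglog.sh (also in the BBMap suite) streams the reads and estimates cardinality in very little memory. A sketch, assuming it accepts paired input via in2= like the other BBMap wrappers:

# estimate distinct k-mers at (or near) your assembly k; k=63 is an example
loglog.sh in=1_1.fastq in2=1_2.fastq k=63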

Are these 12 separate libraries or multiple runs of the same library? You should be able to reduce the redundancy of the data in either case. Having too much coverage is actually bad for de novo assemblies (it sounds counter-intuitive, but it is true: de novo sequence assembly with extremely high coverage).
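If they are runs of one library, the SOAPdenovo config format allows several q1/q2 pairs under a single [LIB] block, so the runs can be pooled rather than declared as separate libraries. A sketch based on the config above (avg_ins=470 is illustrative; use the value estimated for the pooled runs):

#one library, two runs: list the q1/q2 pairs in order
[LIB]
avg_ins=470
reverse_seq=0
asm_flags=3
rank=1
map_len=32
q1=1_1.fastq
q2=1_2.fastq
q1=2_1.fastq
q2=2_2.fastq

On the rank option: it sets the order in which libraries are used during scaffolding, and libraries sharing the same rank value are used together.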

Reply by genomax

They are multiple runs of the same library. I am not clear on how to arrange runs into one library or several libraries, nor on how to set the rank option. I will take your advice and estimate peak memory usage first. :) Thank you. As supporting evidence: I tried running the 7th library alone and it ran smoothly to completion.

Yingzi

Reply by Yingzi Zhang