samtools sort not working when using multiple threads
2.2 years ago
bio_elle ▴ 10

I'm trying to use the -@ option on samtools sort to speed up the sorting of a file using multiple threads.

Using this line

samtools sort -n -@ 4 file.bam > file_sorted.bam

I get this error

[bam_sort_core] merging from 192 files and 4 in-memory blocks...
samtools sort: failed writing to "-"

Reading online I found that a solution could be using the -m option, giving each thread more memory, so I tried running this

samtools sort -n -@ 4 -m 2G file.bam > file_sorted.bam

Even with this option I get the same error. What could be the problem? Without -@ the sort works, but since my files range from 100 to 300 GB, I'd like to speed up the process.

samtools bam • 3.1k views
ADD COMMENT
1
Entering edit mode

That error comes from a failure to write the final output file, in this case writing to stdout. The most likely cause I can think of is running out of disk space. So you could check to see if you have enough room for the tmp files and the end result.

It could be that stdout is being closed somehow, but if you are running from the command line I don't see how that could happen.
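A rough way to check the disk-space theory before starting a long sort is sketched below. The function names are made up for illustration, and the "twice the input size" threshold is only the rule of thumb discussed in this thread (temp files plus final output), not something samtools itself reports.

```shell
# Rough sketch: compare free space in a directory against an input file's size.
# Assumption: the sort needs roughly 2x the input size (temp files + output).

# Free space (in KiB) on the filesystem that holds directory $1.
free_kb() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Disk usage (in KiB) of file $1.
file_kb() {
    du -k "$1" | awk '{ print $1 }'
}

# Succeeds if directory $2 has at least twice the size of file $1 available.
enough_space() {
    need=$(( $(file_kb "$1") * 2 ))
    have=$(free_kb "$2")
    [ "$have" -ge "$need" ]
}

# Example (file.bam is the input from the question):
# enough_space file.bam . && echo "looks OK" || echo "probably too little space"
```

If the temp files and the output end up on different filesystems, each location would need to be checked separately.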

ADD REPLY
0
Entering edit mode

For the tmp files and the end result??

So if I have a 100GB file do I need to have 200GB of space (100 for the tmp files and 100 for the final product)??

That is double what I accounted for... if that is so then that is probably the problem. I am going to try again using the -@ option but I will delete the other sorted bam before doing so.

Could I use the -l option to compress the file? It will take longer, but if it really is a disk space problem, then I shouldn't have trouble using multiple threads, and it might still be enough to make the sort go faster.

ADD REPLY
2
Entering edit mode

"So if I have a 100GB file do I need to have 200GB of space (100 for the tmp files and 100 for the final product)??"

Basically, yes. samtools sort will put all the temporary files in the same location as it is run from. To put the tmp files in /tmp, use the -T option.

samtools sort -T /tmp -n -@ 4 file.bam > file_sorted.bam

The -m 2G option you used lets samtools sort use more memory for sorting, but that is only 8G in total (2G per thread × 4 threads), so you still need at least 92GB of disk space for the rest of the BAM file in its partially sorted state.
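The arithmetic can be sketched as a small shell helper. The function name is hypothetical, and it assumes the data spilled to disk is about as large as the corresponding part of the input; temp files may be compressed differently from the input, so treat the number as a rough estimate only.

```shell
# Rough estimate of temp-file space (in GB) needed by samtools sort.
# Hypothetical helper; assumes spilled data is about as large as the input.
# $1 = input size in GB, $2 = number of threads (-@), $3 = memory per thread in GB (-m)
tmp_space_gb() {
    in_mem=$(( $2 * $3 ))        # total memory available for sorting
    spill=$(( $1 - in_mem ))     # whatever does not fit goes to temp files
    if [ "$spill" -lt 0 ]; then
        spill=0                  # everything fits in memory
    fi
    echo "$spill"
}

tmp_space_gb 100 4 2   # prints 92
```

The final output still needs its own space on top of this estimate.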

I don't know how much -l will help, but you can give it a try. Another option would be writing the result as a CRAM file, which would give you better compression.

ADD REPLY
0
Entering edit mode

what is the output of

ls -lah .

and what is the output of

tr "\0" "\n" < /dev/zero | head -n 2500000000 > remove_me_later.txt

in the very same directory ?

ADD REPLY
0
Entering edit mode

The only output of the ls -lah command is the file_sorted.bam file. There are no tmp files, if that is what you're looking for. During the sort there were lots of them (I'm guessing 192, from the error), but they disappeared after the sort finished with the error I showed.

The remove_me_later.txt file seems empty: it looks like a file of empty lines (\n) and nothing else.

ADD REPLY
0
Entering edit mode

OK, I was checking that you had write permissions and that the filesystem supports files larger than 2G.
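The same check can be wrapped up so that it cleans up after itself. This is only a sketch: the function name and the probe file name are made up, and it relies on head -c, which is widely available but not guaranteed on every system.

```shell
# Sketch: check that directory $1 is writable and can hold a file of $2 bytes.
# Hypothetical helper; it writes real zero bytes, then removes the probe file.
can_write_bytes() {
    dir=$1
    bytes=$2
    [ -w "$dir" ] || return 1
    probe="$dir/remove_me_later.$$"
    # head -c stops after reading exactly $bytes bytes from /dev/zero
    if head -c "$bytes" /dev/zero > "$probe" 2>/dev/null; then
        written=$(( $(wc -c < "$probe") ))
        rm -f "$probe"
        [ "$written" -eq "$bytes" ]
    else
        rm -f "$probe"
        return 1
    fi
}

# Example: test 2.5 GB in the current directory, like the tr command above.
# can_write_bytes . 2500000000 && echo "OK"
```

Writing real bytes rather than creating a sparse file means the check also fails when the disk fills up part-way through.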

ADD REPLY
