Samtools Sort Set Max Memory, Value In Bytes, KB, MB Or GB?
10.8 years ago
William ★ 5.3k

Does the samtools sort maximum memory parameter (-m) require / accept values given in bytes, kB, MB or GB?

I would like to set the maximum memory parameter to 32 GB, so that all the sorting of a BAM chunk is done in memory. The default value is something like 500000000.

samtools • 15k views
10.8 years ago
eonsim ▴ 100

If you use the most recent samtools version from the GitHub repository, you can specify the memory unit and use multiple threads, which makes a massive difference!

https://github.com/samtools/samtools

$ samtools sort
Usage:   samtools sort [options] <in.bam> <out.prefix>
Options: -n        sort by read name
         -f        use <out.prefix> as full file name instead of prefix
         -o        final output to stdout
         -l INT    compression level, from 0 to 9 [-1]
         -@ INT    number of sorting and compression threads [1]
         -m INT    max memory per thread; suffix K/M/G recognized [768M]
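
Since the usage text above describes -m as a per-thread limit, a total budget such as the 32 GB from the question has to be divided by the thread count. A minimal sketch of that arithmetic, assuming 8 threads (a hypothetical choice) and placeholder file names; it prints the command instead of running it:

```shell
# Split a total memory budget evenly across sorting threads.
total_gb=32      # overall budget taken from the question
threads=8        # hypothetical thread count, adjust to your machine
per_thread_mb=$(( total_gb * 1024 / threads ))

# Build and print the command; in.bam and out.prefix are placeholders.
sort_cmd="samtools sort -@ ${threads} -m ${per_thread_mb}M in.bam out.prefix"
echo "$sort_cmd"
```

Actual memory use may exceed the requested figure, so leaving some headroom below the machine's physical RAM is probably wise.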

Ah, you're right! The version I tested at home was the 'old' 0.1.18.


Nice! I'll certainly try this. What effect does the compression level have? Isn't SAM format uncompressed? Does it mean compression to keep the memory use down?


The records stored in memory are uncompressed.


What is the option -l for then?


It is used when samtools is writing the data out (not for data held in memory): https://github.com/samtools/samtools/blob/develop/bam_sort.c#L500


What is the advantage / disadvantage of using it? If the data is not compressed in memory and also the output data is not compressed?


Uncompressed in memory: the sorting algorithm runs faster because it doesn't need to compress/uncompress each item. Compressed output: less storage needed. Uncompressed output: faster if a downstream program reads the data in a pipeline.

10.8 years ago
Does the samtools sort maximum memory parameter (-m) require / accept values given in bytes, kB, MB or GB?

Short answer: no

$ grep "max_mem" bam_sort.c

size_t max_mem = 500000000;
    case 'm': max_mem = atol(optarg); break;

but you can always use a bash inline multiplication:

samtools sort -m $((10*1024*1024*1024)) ....
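
For builds that parse -m with atol, as in the grep output above, a suffixed value such as 32G would be read as just 32, because atol stops at the first non-digit. A small helper can do the conversion to bytes instead. This is only a sketch: to_bytes is a hypothetical name, and it assumes binary multiples (1K = 1024 bytes).

```shell
# Convert a size such as 32G, 768M or 512K to a plain byte count.
# Values without a suffix are passed through unchanged.
to_bytes() {
  case "$1" in
    *[Kk]) echo $(( ${1%?} * 1024 )) ;;
    *[Mm]) echo $(( ${1%?} * 1024 * 1024 )) ;;
    *[Gg]) echo $(( ${1%?} * 1024 * 1024 * 1024 )) ;;
    *)     echo "$1" ;;
  esac
}

to_bytes 32G   # prints 34359738368
```

It could then be used as `samtools sort -m "$(to_bytes 32G)" ....` with the older byte-only option.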

Ehm. So a number of bytes is expected? Or is it a number of records?


It's a number of bytes:

  max_mem  approximate maximum memory (very inaccurate)
  (...)
  buf = (bam1_t**)calloc(max_mem / BAM_CORE_SIZE, sizeof(bam1_t*));