Question: bam_sort_core problem when BAM files are processed by samtools
1
27 days ago by
110651827150
110651827150 wrote:

I sorted my RNA-Seq BAM files after mapping with STAR. Here is the key command line, which runs inside a for loop in a bash script:

samtools sort $file -o ${filename}.sorted.bam

In the end, the error log file contains lines like:

[bam_sort_core] merging from 6 files and 1 in-memory blocks...
[bam_sort_core] merging from 4 files and 1 in-memory blocks...
[bam_sort_core] merging from 8 files and 1 in-memory blocks...
[bam_sort_core] merging from 9 files and 1 in-memory blocks...
[bam_sort_core] merging from 7 files and 1 in-memory blocks...
[bam_sort_core] merging from 4 files and 1 in-memory blocks...
[bam_sort_core] merging from 5 files and 1 in-memory blocks...
[bam_sort_core] merging from 3 files and 1 in-memory blocks...
[bam_sort_core] merging from 6 files and 1 in-memory blocks...
[bam_sort_core] merging from 13 files and 1 in-memory blocks...
[bam_sort_core] merging from 8 files and 1 in-memory blocks...
[bam_sort_core] merging from 2 files and 1 in-memory blocks...
...

Is something wrong here? I searched but still don't know why this happens, so I'm not sure whether these sorted files can be used for the next step.
What's more, I have M (around 63) files, but there are only M-N (around 6; yes, few) such lines. Does that mean not every file runs into this? That confuses me even more.

Any ideas would be appreciated!

rna-seq samtools • 130 views
modified 10 days ago by linouhao 0 • written 27 days ago by 1106518271 50
2

If you have the memory, you can reduce the number of temporary files by raising the per-thread memory limit from its default of 768 MB to, say, 2 GB with the -m option, e.g. samtools sort -m 2G -o out.bam in.bam. Never use something like -m 2 instead of -m 2G: that sets the memory limit to 2 bytes, which produces thousands of temporary files and can eventually crash the system.
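To guard against the missing-suffix mistake in a script, one option is a small wrapper that refuses any -m value without a unit. This is only a sketch: check_mem_arg and sort_bam are hypothetical helper names, not part of samtools, and the check is deliberately stricter than samtools itself (it accepts only upper-case K, M, or G).

```shell
# Hypothetical helper: succeed only if the value is digits followed by
# exactly one K, M, or G suffix (e.g. 768M, 2G). A bare "2" is rejected.
check_mem_arg() {
    # Strip one trailing K, M, or G; if nothing was stripped, no suffix was given.
    digits=${1%[KMG]}
    [ "$digits" != "$1" ] || return 1
    # What remains must be a non-empty string of digits.
    case "$digits" in
        ''|*[!0-9]*) return 1 ;;
    esac
    return 0
}

# Hypothetical wrapper. Usage: sort_bam MEM IN.bam OUT.bam
sort_bam() {
    if ! check_mem_arg "$1"; then
        echo "refusing -m '$1': add a K, M, or G suffix" >&2
        return 2
    fi
    samtools sort -m "$1" -o "$3" "$2"
}
```

For example, sort_bam 2G in.bam out.bam runs the sort, while sort_bam 2 in.bam out.bam stops with a message before samtools is ever called.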

modified 25 days ago • written 25 days ago by ATpoint 10k
1

Got it, many thanks!

written 25 days ago by 1106518271 50

So is it really an error? Also, does anyone know how to set -@ and -m, and which of the two is more useful when dealing with hundreds of BAMs at the same time on a cluster? Thanks a lot.

written 10 days ago by linouhao 0

Please use Add Comment for comments. As Istvan explained, these are just status messages, neither errors nor warnings. Set -@ and -m as you like, but these are options that still deal with one file at a time. If you want things parallelized, have a look at GNU parallel, like:

find ./ -maxdepth 1 -name "*.bam" | parallel -j 8 "samtools sort -@ 2 -m 2G -o {.}_sorted.bam {}"

This command will sort all BAM files in your current directory, 8 at a time, with each job using 2 threads and 2 GB of memory per thread.
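Because -m applies per thread, the memory needed by the whole batch is roughly jobs × threads × memory per thread. A quick back-of-the-envelope check before submitting to a cluster might look like this (total_sort_mem_gb is a hypothetical helper name, and samtools documents -m as an approximate limit, so treat the result as a lower bound):

```shell
# Rough peak memory in GB for a batch of concurrent sorts.
# Arguments: number of parallel jobs, threads per job (-@), GB per thread (-m).
total_sort_mem_gb() {
    echo $(( $1 * $2 * $3 ))
}

# The command above: 8 jobs, 2 threads each, 2 GB per thread.
total_sort_mem_gb 8 2 2   # prints 32
```

So the example needs a node with at least about 32 GB free; lower -j, -@, or -m if the node has less.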

modified 10 days ago • written 10 days ago by ATpoint 10k

By the way, will this "error" lead to a wrong result (${filename}.sorted.bam)?

written 9 days ago by 1106518271 50
5
27 days ago by
Istvan Albert ♦♦ 78k
University Park, USA
Istvan Albert ♦♦ 78k wrote:

These are not error messages, just debugging notes.

Large files cannot be sorted entirely in memory, so sorted chunks are written to temporary files and merged at the end. Once the sort completes, the temporary files are removed.

There is nothing to be concerned about.
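If you still want reassurance that an output file is usable, a light sanity check is possible. The sketch below assumes samtools is on your PATH; is_coordinate_sorted and check_sorted_bam are hypothetical helper names. It only confirms that the file is intact and that its header declares coordinate order; it does not re-verify the order of every record.

```shell
# Hypothetical helper: read a SAM header on stdin and succeed only if the
# @HD line declares coordinate sort order (SO:coordinate).
is_coordinate_sorted() {
    awk '/^@HD/ && /SO:coordinate/ { found = 1 } END { exit !found }'
}

# Hypothetical wrapper. Usage: check_sorted_bam FILE.sorted.bam
check_sorted_bam() {
    # quickcheck fails on truncated files (missing header or EOF block).
    samtools quickcheck "$1" || return 1
    # view -H prints only the header, which is piped into the check above.
    samtools view -H "$1" | is_coordinate_sorted
}
```

Running check_sorted_bam on each ${filename}.sorted.bam in the loop, and printing the names of files that fail, is usually enough to confirm the merge messages were harmless.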

written 27 days ago by Istvan Albert ♦♦ 78k