Samtools sort by name - bam size issue
5 weeks ago

Hello all,

I want to sort my BAM by query name, so I used the command:

samtools sort -n -m 1 -@ 10 -o /path/output.bam /path/input.bam

It works fine, but the resulting output.bam is much bigger than input.bam (70 GB vs 47 GB). Is this normal? I'd prefer to be sure before going further with this BAM. My input BAM is the output of samtools view (used to remove unwanted reads).

Thanks in advance,

Quentin

alignment genome samtools • 119 views

I would suggest always being explicit when setting memory limits, so 1G rather than 1. Recent samtools versions may have made this bullet-proof, but (if memory serves) there was a time when -m 1 was interpreted as a memory limit of 1 byte, which resulted in samtools spamming the disk with millions of tiny temporary files, one per chunk. I may be mixing this up with another tool, but it does not hurt to be explicit. And yes, as Pierre says, this behaviour is expected.


Thanks for your reply. Yes, in the real command I used 1G, but I forgot it here, sorry 🙂

5 weeks ago

Yes, it's normal. When a BAM is sorted by coordinate, similar DNA sequences are grouped in the same gzip compression block, which improves the compression ratio. When you sort by query name, you break up those groups.

see Size of BAM file reduces after sorting with samtools
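The effect can be reproduced on toy data. The sketch below is only an illustration, not a measurement on real BAM files: the random "reference", the read names, the tab-separated record layout and the use of plain zlib are all assumptions made for the example. It samples overlapping reads, then compresses them once in coordinate order and once in name order.

```python
import random
import zlib

random.seed(0)

# Hypothetical toy data: a random 200 kb "reference" and 100 bp reads
# sampled from it at random positions (about 10x coverage).
reference = "".join(random.choice("ACGT") for _ in range(200_000))
reads = []
for i in range(20_000):
    pos = random.randrange(len(reference) - 100)
    reads.append((f"read{i:06d}", pos, reference[pos:pos + 100]))


def compressed_size(records):
    """Serialize records as tab-separated text and return the zlib-compressed size in bytes."""
    text = "\n".join(f"{name}\t{pos}\t{seq}" for name, pos, seq in records)
    return len(zlib.compress(text.encode("ascii"), 6))


# Coordinate order puts overlapping reads next to each other;
# name order scatters them across the whole stream.
by_coordinate = sorted(reads, key=lambda r: r[1])
by_name = sorted(reads, key=lambda r: r[0])

coord_size = compressed_size(by_coordinate)
name_size = compressed_size(by_name)
print(f"coordinate-sorted: {coord_size} bytes")
print(f"name-sorted:       {name_size} bytes")
```

In coordinate order each read mostly repeats the one just before it, so the compressor can encode it as a short back-reference. In name order the overlapping reads are usually too far apart to fall inside the compressor's window, so the same data compresses worse.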
