samtools sort generates millions of temp files
7.4 years ago
sherry760302 ▴ 10

Hi, I am using bwa mem to map Illumina paired-end reads to the rat genome, and then samtools sort to sort the BAM file, which is about 15 GB. However, millions of temp files were generated during the samtools sort step, and the job could not finish because it hit the maximum directory limit on the school computer. I checked the temp files, and each one appears to contain only a single read. That doesn't look right. Does anyone know what the problem could be? Thanks a lot!

alignment genome sequencing • 3.9k views

You should start by adding the command that you used.


I would guess that you forgot to set the -m parameter appropriately. It defines the maximum amount of memory to use before spilling data to a temp file. Check that you specified the unit correctly. I think simply typing -m 1 would mean 1 byte of memory (which would explain the huge number of temp files, as basically every read gets its own file); you need something like 1G for reasonable performance.
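
To make the unit point concrete, here is a minimal shell sketch of what a value passed to -m amounts to in bytes. The function names mem_to_bytes and leading_digits are hypothetical helpers, not part of samtools, and the sketch assumes K/M/G are binary multiples. The second helper shows what a parser that reads only the leading digits would see; whether a given samtools version actually parses -m that way is an assumption you would need to check against its source.

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert a -m style value (e.g. 8G, 768M, 500000000) to bytes.
# Assumes K/M/G mean binary multiples (1024-based).
mem_to_bytes() {
    local value="$1"
    local number="${value%[KkMmGg]}"   # strip one trailing unit letter, if present
    local unit="${value#"$number"}"    # the unit letter that was stripped (may be empty)
    case "$number" in
        ''|*[!0-9]*) echo "not a valid memory value: $value" >&2; return 1 ;;
    esac
    case "$unit" in
        [Kk]) echo $(( number * 1024 )) ;;
        [Mm]) echo $(( number * 1024 * 1024 )) ;;
        [Gg]) echo $(( number * 1024 * 1024 * 1024 )) ;;
        '')   echo "$number" ;;
    esac
}

# Hypothetical helper: what a parser that only reads leading digits would see.
leading_digits() {
    local value="$1"
    local digits="${value%%[!0-9]*}"   # keep everything before the first non-digit
    echo "${digits:-0}"
}
```

With these, mem_to_bytes 8G prints 8589934592, while leading_digits 8G prints 8. A limit of a few bytes would force a spill to disk for practically every read.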


Actually $MEM is set to 8G in the script. Assigning the memory via a variable must be causing some strange problem here (as linked by @tonor in one of the threads above).


Yeah, it is probably the variable. I had the same issue on our CentOS server. Setting -m in a variable always caused trouble there, but the same script on OS X Mavericks worked fine.


Can you provide the command you used to sort with samtools? samtools sort does produce multiple temp files, but millions is not normal.


Here is the command (REF, FASTQ1, FASTQ2, and OUTDIR were defined elsewhere):

SAMPLE=NP2
RG="@RG\tID:$SAMPLE\tSM:$SAMPLE\tPL:illumina\tLB:$SAMPLE\tPU:$SAMPLE"
NT=8
MEM="8G"

module load bwa
module load java
module load samtools

bwa mem -t $NT -M -R "$RG" $REF $FASTQ1 $FASTQ2 | samtools view -bS - > $OUTDIR/$SAMPLE.bam

samtools sort -m $MEM $OUTDIR/$SAMPLE.bam $OUTDIR/$SAMPLE.sorted

What version of samtools are you using?


Version 0.1.18. Thanks!


Eeek! That is an ancient version of samtools. Please consider upgrading to the latest release. I'm not sure why samtools would put one read per temp file.


Definitely worth upgrading. If you can't, this post seems related:

[Samtools-help] samtools sort creates millions of files https://sourceforge.net/p/samtools/mailman/samtools-help/thread/638D9B69-C6AA-40E9-8E3E-D2F20407471D@bx.psu.edu/

It suggests the problem is to do with the -m option. If you run the sort command manually, without the $MEM variable, does it work?
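
One way to check whether the variable is the culprit is to print exactly what the shell hands to the program before running the real command. This is only a sketch: show_args is a hypothetical helper and the file names are placeholders.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each argument on its own line, in brackets,
# so stray spaces, empty values or unexpanded variables are easy to spot.
show_args() {
    local i=1 arg
    for arg in "$@"; do
        printf 'arg %d: [%s]\n' "$i" "$arg"
        i=$(( i + 1 ))
    done
}

MEM="8G"
# Placeholder file names; substitute your own paths.
show_args samtools sort -m $MEM input.bam output.sorted
```

If the fourth line reads arg 4: [8G], the variable expanded as intended, and any difference in behaviour would have to come from how samtools itself interprets that value.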


OK, I will try a newer version. I just found out that we do have samtools/1.3.1, but 0.1.18 is the default. Thanks for the suggestion.
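
Since the cluster default can silently differ from the version you expect, a job script can check which samtools it actually picked up. The sketch below is hedged: version_ge and samtools_version are hypothetical helpers, the version parsing assumes samtools prints a line starting with "Version:" when run without arguments, and sort -V assumes GNU coreutils.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if version $1 is at least version $2.
# Relies on "sort -V" (version sort), which GNU coreutils provides.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Hypothetical helper: pull the version out of the usage banner that samtools
# prints when run without arguments (assumes a line starting with "Version:").
samtools_version() {
    samtools 2>&1 | awk '/^Version:/ { print $2; exit }'
}

# Example use in a job script (module name as given in this thread):
#   module load samtools/1.3.1
#   version_ge "$(samtools_version)" 1.3.1 || { echo "samtools too old" >&2; exit 1; }
```

The threshold 1.3.1 is simply the version mentioned here as being installed; it is not a claim about the earliest release that behaves correctly.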


Also, what operating system does your school computer run?


Red Hat Enterprise Linux 6.x

7.4 years ago
sherry760302 ▴ 10

Thank you all for your comments. I used samtools 1.3.1 and set -m explicitly instead of through a variable, and now it works! Here is the command I used:

samtools sort -m 8G -T $OUTDIR/$SAMPLE.sorted -o $OUTDIR/$SAMPLE.sorted.bam $OUTDIR/$SAMPLE.bam