Question: samtools sort generates millions of temp files
Asked 10 months ago by sherry760302:

Hi, I am using bwa mem to map Illumina paired-end reads to the rat genome, and then samtools sort to sort the resulting BAM file, which is about 15 GB. However, the samtools sort step generated millions of temp files and could not finish, because it hit the maximum number of files allowed in a directory on the school computer. When I checked the temp files, each one seemed to contain only a single read. That doesn't look right. Does anyone know what the problem could be? Thanks a lot!

Tags: sequencing, alignment, genome

You should start by posting the command you used.

(Comment by Chris Miller, 10 months ago)

I would guess that you did not set the -m parameter appropriately. It defines the maximum amount of memory to use before spilling data to a temp file, so check that you specified the unit correctly. Typing just -m 1 would mean 1 byte of memory, I think (which would explain the huge number of temp files, as basically every read gets its own file); you need something like 1G for reasonable performance.

(Comment by ATPoint, 10 months ago)
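To illustrate why the unit matters, here is a minimal sketch with two hypothetical helpers (mem_to_bytes and digits_only are not samtools functions). The first understands K/M/G suffixes, which samtools 1.x documents for sort -m; the second keeps only the leading digits, the way C's atol() does. If an older release parsed -m the second way (an assumption, not something confirmed in this thread), -m 8G would become an 8-byte limit, and practically every read would get its own temp file.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of samtools) contrasting two ways a program
# could interpret the value given to -m.

# Suffix-aware parsing: K, M and G multiply by powers of 1024.
mem_to_bytes() {
  local value=$1
  local number=${value%[KkMmGg]}   # strip one trailing unit letter, if present
  local unit=${value:${#number}}   # the stripped letter, or empty
  case $unit in
    K|k) echo $(( number * 1024 )) ;;
    M|m) echo $(( number * 1024 * 1024 )) ;;
    G|g) echo $(( number * 1024 * 1024 * 1024 )) ;;
    *)   echo $(( number )) ;;
  esac
}

# Digits-only parsing, like atol(): keep the leading digits, ignore the rest.
digits_only() {
  local value=$1
  local digits=${value%%[!0-9]*}   # drop everything from the first non-digit on
  echo "${digits:-0}"
}

for m in 1 500000000 8G; do
  printf '%-10s suffix-aware: %12s bytes   digits-only: %12s bytes\n' \
    "$m" "$(mem_to_bytes "$m")" "$(digits_only "$m")"
done
```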

Actually, $MEM is set to 8G in the script. Assigning the memory via a variable seems to be causing some strange problem here (see the samtools-help thread linked by @Tonor).

(Comment by genomax, 10 months ago)

Yeah, it is probably the variable. I had the same issue on our CentOS server. Setting -m in a variable always caused trouble there, but the same script on OS X Mavericks worked fine.

(Comment by ATPoint, 10 months ago)
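In bash, -m $MEM with MEM="8G" should expand to exactly the same arguments as typing -m 8G, so the variable on its own is not expected to change what samtools receives. A quick way to check is to print each argument with printf %q. The sketch below uses a hypothetical show_args helper for that, and also shows how a hidden carriage return would become visible (for example from a script saved with Windows line endings; this is an assumed scenario, not a diagnosis from the thread).

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each argument a command would receive, one per
# line, with invisible characters made visible by printf %q.
show_args() {
  local arg
  for arg in "$@"; do
    printf '%q\n' "$arg"
  done
}

MEM="8G"
echo "via variable:"; show_args -m $MEM   # unquoted, as in the original script
echo "typed inline:"; show_args -m 8G

# A value carrying a trailing carriage return is a different argument;
# %q prints it as $'8G\r'.
MEM_CRLF=$'8G\r'
echo "with stray CR:"; show_args -m "$MEM_CRLF"
```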

Can you provide the command you used to sort with samtools? samtools sort does produce multiple temp files, but millions is not normal.

(Comment by genomax, 10 months ago)
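For a sense of scale, here is a rough model (an assumption, not a description of samtools internals): if the sorter starts a new temp file every time its buffer exceeds the memory limit, the number of temp files is about the data size divided by the limit, but never more than the number of reads, since each file holds at least one read. The 15 GiB figure comes from the question (it is the compressed BAM size, so the in-memory size is likely larger); the read count is a made-up placeholder, and estimate_tmp_files is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Rough model (hypothetical helper): a sorter that starts a new temp file each
# time its buffer exceeds MAX_MEM writes about ceil(DATA_BYTES / MAX_MEM)
# files, capped at one file per read because every file holds at least one read.
estimate_tmp_files() {
  local data_bytes=$1 max_mem=$2 n_reads=$3
  local files=$(( (data_bytes + max_mem - 1) / max_mem ))   # ceiling division
  if (( files > n_reads )); then
    files=$n_reads
  fi
  echo "$files"
}

GIB=$(( 1024 * 1024 * 1024 ))
DATA=$(( 15 * GIB ))     # 15 GiB, the BAM size from the question
N_READS=200000000        # made-up placeholder, not from the thread

echo "limit 8 GiB: $(estimate_tmp_files "$DATA" $(( 8 * GIB )) "$N_READS") temp files"
echo "limit 8 B:   $(estimate_tmp_files "$DATA" 8 "$N_READS") temp files"
```

With a limit smaller than a single read, the model collapses to one temp file per read, which is what the question describes.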

Here are the commands (REF, FASTQ1, FASTQ2 and OUTDIR are defined elsewhere):

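```shell
SAMPLE=NP2
RG="@RG\tID:$SAMPLE\tSM:$SAMPLE\tPL:illumina\tLB:$SAMPLE\tPU:$SAMPLE"
NT=8        # threads for bwa mem
MEM="8G"    # memory limit intended for samtools sort

module load bwa
module load java
module load samtools    # the default module, which is samtools 0.1.18 here

# Align the paired-end reads and convert the SAM stream to BAM (-S: input is SAM)
bwa mem -t $NT -M -R "$RG" $REF $FASTQ1 $FASTQ2 | samtools view -bS - > $OUTDIR/$SAMPLE.bam

# 0.1.x syntax: the last argument is an output prefix, and .bam is appended to it
samtools sort -m $MEM $OUTDIR/$SAMPLE.bam $OUTDIR/$SAMPLE.sorted
```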
(Comment by sherry760302, 10 months ago; edited by Alex Reynolds)
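In samtools 0.1.x, the last argument to sort is an output prefix (.bam is appended to it), whereas samtools 1.x names the output file with -o and the temp-file prefix with -T. The hypothetical sort_1x function below is a minimal sketch of that translation for the script above; the paths are placeholders, and with DRY_RUN=1 it only prints the command instead of running it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the samtools 1.x equivalent of the 0.1.x call
#   samtools sort -m MEM IN.bam OUT_PREFIX        (writes OUT_PREFIX.bam)
# which in 1.x becomes
#   samtools sort -m MEM -T OUT_PREFIX -o OUT_PREFIX.bam IN.bam
sort_1x() {
  local mem=$1 in_bam=$2 out_prefix=$3
  # An array keeps each argument intact, even if a path contains spaces.
  local -a cmd=(samtools sort -m "$mem" -T "$out_prefix" -o "$out_prefix.bam" "$in_bam")
  if [[ ${DRY_RUN:-0} == 1 ]]; then
    echo "${cmd[*]}"      # only print the command
  else
    "${cmd[@]}"           # actually run samtools
  fi
}

# Placeholder values; in the real script OUTDIR is defined elsewhere.
OUTDIR=/path/to/outdir
SAMPLE=NP2
DRY_RUN=1 sort_1x 8G "$OUTDIR/$SAMPLE.bam" "$OUTDIR/$SAMPLE.sorted"
```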

What version of samtools are you using?

(Comment by Tonor, 10 months ago)

Version 0.1.18. Thanks!

(Comment by sherry760302, 10 months ago)

Eeek! That is an ancient version of samtools. Please consider upgrading to the latest release. I'm not sure why samtools would put one read per temp file, though.

(Comment by genomax, 10 months ago)

Definitely worth upgrading. If you can't, this post seems related:

[Samtools-help] samtools sort creates millions of files https://sourceforge.net/p/samtools/mailman/samtools-help/thread/638D9B69-C6AA-40E9-8E3E-D2F20407471D@bx.psu.edu/

It suggests the problem has to do with the -m option. Does it work if you run the sort command manually, without the $MEM variable?

(Comment by Tonor, 10 months ago)

OK, I will try a newer version. I just found out we do have samtools/1.3.1, but 0.1.18 is the default. Thanks for the suggestion.

(Comment by sherry760302, 10 months ago)
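Since the default here is still 0.1.18, a script could refuse to run with a pre-1.0 samtools. The samtools_is_1x function below is a hypothetical guard that only checks the major version number. The commented line that obtains the version string assumes that samtools 1.x prints "samtools X.Y.Z" as the first line of samtools --version, and that 0.1.x does not support --version, which leaves the string empty and makes the guard fail.

```shell
#!/usr/bin/env bash
# Hypothetical guard: succeed only if the version string is 1.0 or newer.
samtools_is_1x() {
  local version=$1
  local major=${version%%.*}                 # text before the first dot
  [[ $major =~ ^[0-9]+$ ]] && (( major >= 1 ))
}

# Obtaining the version (assumption, see the note above):
# ver=$(samtools --version 2>/dev/null | head -n 1 | awk '{print $2}')

for v in 0.1.18 0.1.19 1.3.1; do
  if samtools_is_1x "$v"; then
    echo "$v: OK"
  else
    echo "$v: too old, load a newer samtools module"
  fi
done
```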

Also, what operating system is your school computer running?

(Comment by Tonor, 10 months ago)

Red Hat Enterprise Linux 6.x

(Comment by sherry760302, 10 months ago)
Answer by sherry760302 (10 months ago):

Thank you all for your comments. I switched to samtools 1.3.1 and typed the -m value directly instead of using the variable, and now it works! Here is the command I used:

```shell
# samtools 1.3.1: -T sets the temp-file prefix, -o names the sorted output file
samtools sort -m 8G -T $OUTDIR/$SAMPLE.sorted -o $OUTDIR/$SAMPLE.sorted.bam $OUTDIR/$SAMPLE.bam
```
(Answer edited by Alex Reynolds)
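One more detail worth knowing with samtools 1.x: -m is a per-thread limit, so adding -@ for multiple sort threads multiplies the total memory use by roughly the thread count. The hypothetical per_thread_mem helper below is a minimal sketch that splits an assumed total budget across threads. The commented pipeline, which streams bwa mem output straight into samtools sort, is an untested variant and not the command used in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical helper: split a total memory budget (in GiB) across sort
# threads, because samtools 1.x applies -m to each thread separately.
per_thread_mem() {
  local total_gib=$1 threads=$2
  echo "$(( total_gib * 1024 / threads ))M"   # rounded down to whole MiB
}

THREADS=8        # matches NT=8 in the original script
TOTAL_GIB=8      # assumed total budget for the sort step
MEM=$(per_thread_mem "$TOTAL_GIB" "$THREADS")
echo "samtools sort -@ $THREADS -m $MEM ..."

# Possible streaming variant (untested sketch; NT, RG, REF, FASTQ1, FASTQ2,
# OUTDIR and SAMPLE as in the original script):
# bwa mem -t "$NT" -M -R "$RG" "$REF" "$FASTQ1" "$FASTQ2" \
#   | samtools sort -@ "$THREADS" -m "$MEM" -T "$OUTDIR/$SAMPLE.sorted" \
#       -o "$OUTDIR/$SAMPLE.sorted.bam" -
```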