I realized that this was a stupid mistake on my part. Since samtools merge does not overwrite an existing output file by default, the output I got from samtools merge output.bam f2.bam f1.bam wasn't what I thought it was.
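To avoid this pitfall in a script, you can build the merge command so that a stale output.bam can't be silently reused. samtools merge has a -f option that forces it to overwrite an existing output file. The sketch below assumes Python 3 and samtools on the PATH. The helper names build_merge_cmd and merge_bams are hypothetical and are not part of samtools.

```python
import subprocess
from typing import List, Sequence


def build_merge_cmd(output: str, inputs: Sequence[str], force: bool = True) -> List[str]:
    """Build a `samtools merge` argument list (hypothetical helper).

    With force=True, `-f` is added so samtools overwrites an existing
    output file instead of refusing to write it.
    """
    if not inputs:
        raise ValueError("need at least one input BAM")
    cmd = ["samtools", "merge"]
    if force:
        cmd.append("-f")  # options go before the output file name
    cmd.append(output)
    cmd.extend(inputs)
    return cmd


def merge_bams(output: str, inputs: Sequence[str], force: bool = True) -> None:
    # check=True raises CalledProcessError on a non-zero exit status,
    # so a failed or refused merge is not silently ignored.
    subprocess.run(build_merge_cmd(output, inputs, force), check=True)


if __name__ == "__main__":
    # Only prints the command; call merge_bams() to actually run samtools.
    print(" ".join(build_merge_cmd("output.bam", ["f2.bam", "f1.bam"])))
```

Another option is to delete output.bam before each run. With -f, though, the command itself shows that overwriting is intended.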
Below is my original post:
I'm using samtools/1.9.0 and I'm trying to merge 2 files, but I observed that when I give the inputs in a different order I get very different output.
I have f1.bam (3G) and f2.bam (200M). When I run samtools merge output.bam f1.bam f2.bam, I get a file that's slightly larger than 3G.
When I run samtools merge output.bam f2.bam f1.bam, I get a file that's slightly larger than 200M.
I'm sure they are not just f1.bam or f2.bam, but I wonder what could have gone wrong, or is it an issue with samtools/1.9.0?
I also observed that the total count I get from the output BAM file of samtools merge output.bam f1.bam f2.bam is about 15 times larger than that from samtools merge output.bam f2.bam f1.bam.
This is the flagstat output from samtools merge output.bam f1.bam f2.bam:
164862366 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
164862366 + 0 mapped (100.00% : N/A)
0 + 0 paired in sequencing
0 + 0 read1
0 + 0 read2
0 + 0 properly paired (N/A : N/A)
0 + 0 with itself and mate mapped
0 + 0 singletons (N/A : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)
This is the flagstat output from samtools merge output.bam f2.bam f1.bam:
11009638 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
11009638 + 0 mapped (100.00% : N/A)
0 + 0 paired in sequencing
0 + 0 read1
0 + 0 read2
0 + 0 properly paired (N/A : N/A)
0 + 0 with itself and mate mapped
0 + 0 singletons (N/A : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)
File sizes should not be used as a metric for anything but a qualitative assessment (e.g. something worked and there is a file with stuff in it). Have you tried sorting the final files after merging? The sorted results should then be identical, or at least very similar, in size.
Agree. I also observed that the total counts I get from the output BAM files of samtools merge output.bam f1.bam f2.bam are about 15 times larger than those from samtools merge output.bam f2.bam f1.bam.
Something odd is going on here, since the total number of reads differs between the two merged files. You are running samtools v.1.9 (the current version is samtools v.1.14), so one suggestion would be to try the latest version to see if that fixes the problem.
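File size is not a reliable check. Merging only combines records, so a better check compares counts: the "in total" count of the merged file should equal the sum of the counts of its inputs, whatever the input order. The sketch below is a minimal version of that check. It assumes you have saved the text output of samtools flagstat for the merged file and for each input. The function names flagstat_total and merge_counts_consistent are hypothetical.

```python
import re
from typing import Sequence, Tuple

# Matches the first flagstat line, e.g.
# "164862366 + 0 in total (QC-passed reads + QC-failed reads)"
_TOTAL_RE = re.compile(r"^(\d+) \+ (\d+) in total", re.MULTILINE)


def flagstat_total(text: str) -> Tuple[int, int]:
    """Return (QC-passed, QC-failed) record counts from flagstat text."""
    m = _TOTAL_RE.search(text)
    if m is None:
        raise ValueError("no 'in total' line found in flagstat output")
    return int(m.group(1)), int(m.group(2))


def merge_counts_consistent(merged: str, inputs: Sequence[str]) -> bool:
    """True if the merged file's totals equal the sums over the inputs."""
    passed, failed = flagstat_total(merged)
    in_passed = sum(flagstat_total(t)[0] for t in inputs)
    in_failed = sum(flagstat_total(t)[1] for t in inputs)
    return (passed, failed) == (in_passed, in_failed)
```

The two merges in the question reported 164862366 and 11009638 records from the same two inputs. At most one of them can pass this check against the f1.bam and f2.bam counts.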