I have been using Picard's DownsampleSam to extract random alignments from a .bam file.
The tool documentation for DownsampleSam states that "Reads marked as not primary alignments are all discarded."
However, when running DownsampleSam (Picard release 2.4.1) on my .bam file, these 'not primary alignments' are not discarded.
For example, running samtools flagstat on my .bam file indicates I have a total of 66427440 reads, of which 22101618 are secondary alignments.
DownsampleSam should only take 44325822 reads forward for downsampling, but it is downsampling from all 66427440 reads ("Kept 39866332 out of 66427440 reads (60.01%)").
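As a sanity check on those figures, the arithmetic can be sketched in a few lines of shell. The counts are the ones quoted above; the variable names are only illustrative.

```shell
# Counts taken from the samtools flagstat and DownsampleSam output quoted above
total=66427440
secondary=22101618
kept=39866332

# Reads that should remain once secondary alignments are discarded
primary=$((total - secondary))
echo "expected pool after discarding secondary alignments: $primary"

# Fraction that release 2.4.1 kept, measured against the full read count
awk -v k="$kept" -v t="$total" \
    'BEGIN { printf "kept fraction of total: %.2f%%\n", 100 * k / t }'
```

The expected pool comes out at 44325822, and the kept fraction only matches the reported 60.01% when measured against all 66427440 reads.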
Below is the command line for the above:
DownsampleSam I=input.bam O=downsampled.bam RANDOM_SEED=2 P=0.6
I then ran the above command again, but using Picard release 1.130. This time, DownsampleSam did indeed discard the secondary alignments as expected (it is now downsampling from 44325822 reads: "Finished! Kept 35458630 out of 44325822 reads").
I am not using this release of Picard because it does not support the STRATEGY option, which I will need for downsampling to smaller fractions.
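One possible interim workaround is to drop the secondary alignments (SAM flag 0x100, i.e. 256) with samtools before downsampling, so that release 2.4.1 starts from primary reads only. The following is an untested sketch. It assumes samtools is on the PATH and that Picard is launched through a picard.jar in the working directory; the function names and the intermediate file name are hypothetical.

```shell
# SAM flag for "not primary alignment" (0x100)
SECONDARY_FLAG=256

# Write a BAM containing only reads WITHOUT the secondary flag set.
# $1 = input BAM, $2 = output BAM
filter_primary() {
    samtools view -b -F "$SECONDARY_FLAG" -o "$2" "$1"
}

# Downsample with the same options as in the post.
# Assumption: Picard is invoked via "java -jar picard.jar".
# $1 = input BAM, $2 = output BAM
downsample() {
    java -jar picard.jar DownsampleSam I="$1" O="$2" RANDOM_SEED=2 P=0.6
}

# Usage (not executed here):
#   filter_primary input.bam primary_only.bam
#   downsample primary_only.bam downsampled.bam
```

If the filtering works as intended, DownsampleSam should report a pool of 44325822 reads regardless of which release is used.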
Has anyone come across this behaviour before? I find it strange that a more recent version does not behave as documented. Or am I missing something obvious? I would be very grateful for any suggestions on why DownsampleSam behaves differently between versions. Thanks in advance!