I am attempting to run a comparison across a batch of alignments (BAM format) to assess spread. However, some of my datasets are larger than others, and I want to subsample each down to an equal number of reads per sample.
If I wanted to randomly extract 1 million reads from a BAM file, is there a method to do this?
Note: I am fully aware of the samtools and picard methods which reduce by a proportion (e.g. the flag -s 0.33), but that yields a reduced proportion per sample, not the fixed number of reads per sample that I need.
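To make the exact-count behaviour I'm after concrete: the general technique for drawing a fixed-size uniform random sample from a stream of unknown length is reservoir sampling. A minimal Python sketch is below; plain integers stand in for reads, and the idea of iterating reads via pysam is only noted as an assumption, not shown.

```python
import random

def reservoir_sample(stream, k, seed=None):
    """Keep exactly k items from a stream, each with equal
    probability, in one pass (Algorithm R)."""
    rng = random.Random(seed)
    sample = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            sample.append(item)
        else:
            # Replace a reservoir slot with probability k/(i+1).
            j = rng.randrange(i + 1)
            if j < k:
                sample[j] = item
    return sample

# In a real run the stream would be reads iterated from a BAM file
# (e.g. via pysam.AlignmentFile -- an assumption, not shown here);
# integers stand in for reads in this sketch.
picked = reservoir_sample(range(100_000), 1_000, seed=42)
print(len(picked))  # always exactly 1000, unlike proportional -s
```

The trade-off is that the whole sample must fit in memory (1 million reads is usually fine), whereas samtools -s streams but only hits the target count in expectation.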
BAM subsampling has been discussed before, but always in terms of proportional reduction rather than a fixed read count: Sample Sam File, A: Subsampling Bam File With Samtools, How to reduce reads in fastq, bam, bed files in a random manner?, https://broadinstitute.github.io/picard/command-line-overview.html
Edit: Also, I've come across bamtools, but I haven't been able to get it to work, and it seems not to have been updated in quite some time.