Hello!
I've been pulling my hair out Googling this and trying the solutions I found, none of which worked in the end. It baffles me that Samtools does not have a command to do just this.
I've tried view:
samtools view -b -h in.bam chr{1..22} chr{X,Y,M} > out.bam
This properly removes reads, but not the corresponding header lines of the unwanted contigs.
I've tried reheader:
samtools reheader -c " <sed commands that delete the header lines for the unwanted contigs> " in.bam > out.bam
But while the header afterwards looks correct and out.bam is indexable, the resulting file is truncated after a few hundred reads, even though the file size is several GB!
What's going on? What is the proper, standard, official way of subsetting a BAM without breaking the BAM format like I have?
Before you ask, I really need to remove both the reads and the header lines. The alternative is to ask several developers to change their programs, and I suspect that's the worse solution.
Apologies for my frustration, and big big thanks in advance!
Joel
Nevertheless, I'm asking: why?!
Because the header lines are being parsed by programs I'm using in my research, and those programs crash on any contig that is not among the standard 1-22, X, Y, M contigs!
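For what it's worth, here is a rough sketch of one way this could be done, not an official Samtools recipe. The idea is to go through SAM text: let samtools view -h select the reads, filter the text so the header and the reads agree, then convert back to BAM. The reheader attempt probably breaks because BAM records refer to contigs by their numeric position in the header, so deleting @SQ lines leaves reads pointing at the wrong or missing entries. The function name filter_sam_contigs is made up for this example. It drops @SQ lines for unwanted contigs and also drops reads whose mate sits on an unwanted contig, on the assumption that such reads would otherwise reference a contig missing from the new header.

```shell
# Hypothetical helper (not part of samtools): reads SAM text on stdin,
# writes filtered SAM text on stdout.
# $1 is a space-separated list of contig names to keep.
filter_sam_contigs() {
  awk -v keep="$1" '
    BEGIN {
      FS = OFS = "\t"
      n = split(keep, names, " ")
      for (i = 1; i <= n; i++) wanted[names[i]] = 1
    }
    # Header lines: drop @SQ entries for unwanted contigs, pass the rest through.
    /^@/ {
      if ($1 == "@SQ") {
        name = ""
        for (i = 2; i <= NF; i++) if ($i ~ /^SN:/) name = substr($i, 4)
        if (!(name in wanted)) next
      }
      print
      next
    }
    # Alignment lines: column 3 is RNAME, column 7 is RNEXT
    # ("=" means same contig as the read, "*" means no mate position).
    ($3 in wanted) && ($7 == "=" || $7 == "*" || ($7 in wanted))
  '
}

# Intended usage (assumes bash for the brace expansion and an indexed in.bam):
#   samtools view -h in.bam chr{1..22} chr{X,Y,M} \
#     | filter_sam_contigs "$(echo chr{1..22} chr{X,Y,M})" \
#     | samtools view -b -o out.bam -
```

Dropping reads with a mate on a removed contig is a design choice in this sketch. If you need to keep them, their mate fields would have to be rewritten instead, which this example does not attempt.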