To speed up variant calling (SNPs / indels) on a set of large BAM files, I want to run the variant calling in parallel on a Sun Grid Engine cluster.
I have already managed to split the BAMs by reference sequence (chromosome) using https://github.com/pezmaster31/bamtools/ . After calling variants per chromosome, I use VCFtools to concatenate the per-chromosome VCF files into one VCF file per sample.
The per-chromosome BAM files are still relatively big, so I would like to split them further at the long stretches of N bases (N polymer sequences) in the reference. I already have a file with the locations of these N polymer sequences in our reference.
Is there a tool that already has the functionality to split BAM files by N polymer sequences?
Or what would be the easiest way to use this region information to split BAM files?
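To make the idea concrete, here is a rough sketch of what I have in mind: take the N-run locations and the chromosome lengths, compute the stretches in between, and turn each into a region string. Everything here is an assumption for illustration: the function names are made up, the example coordinates are invented, and I assume the N-run file is BED-like (0-based, half-open intervals).

```python
from collections import defaultdict


def non_gap_regions(chrom_lengths, gaps):
    """Return the stretches between N runs as (chrom, start, end).

    Coordinates are assumed to be 0-based, half-open (BED-like).
    """
    gaps_by_chrom = defaultdict(list)
    for chrom, start, end in gaps:
        gaps_by_chrom[chrom].append((start, end))

    regions = []
    for chrom, length in chrom_lengths.items():
        pos = 0  # start of the current non-N stretch
        for start, end in sorted(gaps_by_chrom[chrom]):
            if start > pos:
                regions.append((chrom, pos, start))
            pos = max(pos, end)  # also handles overlapping N runs
        if pos < length:
            regions.append((chrom, pos, length))
    return regions


def to_region_string(chrom, start, end):
    """Convert a 0-based half-open interval to a 1-based inclusive chr:start-end string."""
    return f"{chrom}:{start + 1}-{end}"


# Hypothetical example data, not taken from a real reference.
chrom_lengths = {"chr1": 1000, "chr2": 500}
gaps = [("chr1", 0, 100), ("chr1", 400, 600), ("chr2", 450, 500)]

region_strings = [to_region_string(*r) for r in non_gap_regions(chrom_lengths, gaps)]
for region in region_strings:
    print(region)
```

Each region string could then, I assume, be handed to one cluster job that extracts the reads from an indexed BAM, for example with `samtools view -b in.bam REGION`, but I do not know whether this is the easiest or most robust way.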