I would like to subset a large bam file so that I can do the variant calling in parallel (using freebayes).
I have tried to subset the bamfile based on subsets of the reference sequence. First, I subset the fasta reference sequence into fasta files containing the equal amount of chromosomes. Then, I used the following command
samtools view file.bam -T subset.fasta -b > subset.bam
to subset the bam file so that it extracts only the reads that map to the chromosomes included in the respective reference subset:
When I view subset.bam in IGV it only shows the chromosomes that are included in the respective reference sequence which tells me that it worked. However, when I do the variant calling, I get an error saying that freebayes was unable to find a FASTA index entry for the chromosomes that don't belong to that reference subset. Obviously, they are not there because I removed them when I subset the reference.
Any suggestions on what might have gone wrong? If not, does anybody know a different way to subset a bam file into equal parts based on chromosome IDs?