I am trying to realign a whole genome BAM file from one reference genome to another. The reason for this is that I am interested in HLA regions, and the original reference genome does not include these regions. The process involves converting the name-sorted BAM file to fastq, then realigning the fastq to a new reference.
I seem to be losing reads when converting from BAM to fastq. I have tried a number of ways to do this, including:
- samtools fastq -1 <file1.fq> -2 <file2.fq> <input.bam>
- bamToFastq -i <input.bam> -fq <file1.fq> -fq2 <file2.fq>
- Following the process here:
In each case the number of reads in my output fastq file (counted using wc -l <file> / 4) is slightly less than in the original BAM file. When using bamToFastq I get several errors like this:
*****WARNING: Query 6:1219:30638:3260 is marked as paired, but its mate does not occur next to it in your BAM file. Skipping.
I suspect this is the cause of my read loss. Most of these seem to be in chromosome 6, which is my region of interest. I have tried using samtools fixmate, but still get the same error.
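For reference, a minimal sketch of the read-count comparison described above. The function names are hypothetical helpers, not part of samtools or bedtools, and the sketch assumes uncompressed fastq files with exactly four lines per record. It also assumes that the BAM count should skip secondary (0x100) and supplementary (0x800) records, since samtools fastq excludes those by default; an unfiltered BAM record count may therefore be somewhat higher than the fastq count without any read actually being lost. For paired data, the BAM figure would be compared against the sum of both fastq files.

```shell
# Count reads in an uncompressed FASTQ file.
# Assumption: every record occupies exactly 4 lines.
count_fastq_reads() {
    local lines
    lines=$(wc -l < "$1")
    echo $(( lines / 4 ))
}

# Count BAM records, skipping secondary (0x100) and supplementary (0x800)
# alignments so that each read end is counted once.
# Requires samtools on PATH; defined here but not called.
count_bam_primary_records() {
    samtools view -c -F 0x900 "$1"
}

# Hypothetical usage (file names are placeholders):
#   count_fastq_reads file1.fq
#   count_fastq_reads file2.fq
#   count_bam_primary_records input.bam
```

If the filtered BAM count still exceeds the combined fastq count, the difference would point to reads that were dropped during conversion, such as the skipped mates in the warning above.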
Any ideas would be greatly appreciated!