The following read has a flag of 97, which indicates a paired read, but there is no other read in the BAM with the same ID. This must be an error that occurred during some of the processing of this BAM, correct?
Unfortunately, there is nothing in the SAM spec that actually requires the mate read to exist. Your file is still perfectly valid. That said, there are many possible reasons your mate could be missing, including:
- The BAM file was filtered to include only specific regions of the genome
- The BAM file was filtered to remove specific regions of the genome
- Unmapped reads were removed
- A de-duplication algorithm that was not read-pair aware was run
- A downsampling algorithm that was not read-pair aware was run
- The mate failed QC and was removed
- The mate was aggressively base quality or adapter trimmed to nothing
- Some reads were filtered on the command line (e.g. piping to grep)
- Some other kind of filtering was performed in your pipeline
- The mate is actually there, but you ran an algorithm that changed its alignment position, so it is no longer where your read says it should be (e.g. GATK indel realignment does this)
- The mate wasn't there to start with
- The SAM flag was changed and your read was never part of a pair
- The read name of either the read or the mate was changed
To work out why it's missing, you're going to have to go back to the FASTQ files and work your way forward through each step in your pipeline.
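As a starting point for that tracing, here is a minimal Python sketch that decodes a FLAG value and lists paired reads whose mate record is absent from a set of SAM text lines. The function names `decode_flag` and `find_orphans` are hypothetical helpers made up for this illustration, not part of any library. The bit values are taken from the SAM specification. It assumes you are only interested in primary alignments, so secondary and supplementary records are skipped.

```python
from collections import defaultdict

# Bit values of the FLAG field as defined in the SAM specification.
FLAG_BITS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse",
    0x20: "mate_reverse",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def decode_flag(flag):
    """Return the names of all bits set in a SAM FLAG value."""
    return [name for bit, name in FLAG_BITS.items() if flag & bit]


def find_orphans(sam_lines):
    """Return sorted names of paired reads whose mate record is missing.

    sam_lines is any iterable of SAM text lines (header lines allowed).
    """
    ends_seen = defaultdict(set)
    for line in sam_lines:
        # Skip blank lines and header lines, which start with "@".
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        name, flag = fields[0], int(fields[1])
        if not flag & 0x1:
            continue  # not flagged as paired, so no mate is expected
        if flag & 0x900:
            continue  # assumption: ignore secondary/supplementary records
        if flag & 0x40:
            ends_seen[name].add(1)
        if flag & 0x80:
            ends_seen[name].add(2)
    # A complete pair has both a first-in-pair and a second-in-pair record.
    return sorted(n for n, ends in ends_seen.items() if ends != {1, 2})
```

For flag 97, `decode_flag(97)` gives `paired`, `mate_reverse` and `first_in_pair`, matching the read in the question. Running `find_orphans` on the text output of each pipeline step (for example, the lines produced by `samtools view`) should show you the first step at which the mate disappears.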