The following read has a flag of 97, which indicates a paired read, but there is no other read in the BAM with the same ID. This must be an error that occurred during some of the processing of this BAM, correct?
Unfortunately, there is nothing in the SAM spec that actually requires the mate read to exist. Your file is still perfectly valid. That said, there are many possible reasons your mate could be missing, including:
The BAM file was filtered to include only specific regions of the genome
The BAM file was filtered to remove specific regions of the genome
Unmapped reads were removed
A de-duplication algorithm that was not read-pair aware was run
A downsampling algorithm that was not read-pair aware was run
The mate failed QC and was removed
The mate was aggressively base quality or adapter trimmed to nothing
Some reads were filtered on the command line (e.g. piping to grep)
Some other kind of filtering was performed in your pipeline
The mate actually is there, but you ran an algorithm that changed the alignment position so it's not in the position it's supposed to be according to your read (e.g. GATK indel realignment does this)
The mate wasn't there to start with
The SAM flag was changed and your read was never paired
The read name of either the read or the mate was changed
To work out why it's missing you're going to have to go back to the fastq files and work your way forward through each step in your pipeline.
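If it helps while stepping through the pipeline, here is a minimal sketch for listing paired reads whose mate is absent from a given intermediate file. It is only an illustration, not a standard tool: the function name `find_orphans` is made up, it expects SAM text lines (for example the output of `samtools view`), and it keeps every read name in memory, so it assumes the file is small enough for that.

```python
from collections import defaultdict

# Bit values from the SAM flag field
PAIRED = 0x1
FIRST = 0x40
LAST = 0x80
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def find_orphans(sam_lines):
    """Return sorted names of paired reads for which only one end is present.

    Hypothetical helper: sam_lines is any iterable of SAM text lines.
    """
    seen = defaultdict(int)
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # not a usable alignment record
        name, flag = fields[0], int(fields[1])
        if not flag & PAIRED:
            continue  # single-end read, no mate expected
        if flag & (SECONDARY | SUPPLEMENTARY):
            continue  # only count primary records
        # Remember which end(s) of the pair we have seen for this name
        seen[name] |= flag & (FIRST | LAST)
    return sorted(n for n, ends in seen.items() if ends != (FIRST | LAST))
```

Running this on the output of each step in turn (for instance by passing `sys.stdin` while piping from `samtools view`) should show the first step after which your read name appears in the list.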
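Definitely. If you grep the read name in the raw SAM file, you will surely find both reads. 97 means: read paired (0x1), mate on the reverse strand (0x20), and first in pair (0x40). The mate-unmapped bit (0x8) is not set, so this means that the mate is also mapped and on the reverse strand. Perhaps you filtered this file before doing this operation? How did you filter it?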
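For anyone who wants to check a flag value themselves, here is a small sketch that breaks a SAM flag into its set bits. The bit values come from the SAM specification; the function name `decode_flag` and the wording of the descriptions are just for illustration.

```python
# Bit values from the SAM specification; descriptions are paraphrased
SAM_FLAGS = {
    0x1: "read paired",
    0x2: "read mapped in proper pair",
    0x4: "read unmapped",
    0x8: "mate unmapped",
    0x10: "read reverse strand",
    0x20: "mate reverse strand",
    0x40: "first in pair",
    0x80: "second in pair",
    0x100: "not primary alignment",
    0x200: "read fails quality checks",
    0x400: "PCR or optical duplicate",
    0x800: "supplementary alignment",
}


def decode_flag(flag):
    """Return the descriptions of every bit set in a SAM flag."""
    return [name for bit, name in SAM_FLAGS.items() if flag & bit]


print(decode_flag(97))
# ['read paired', 'mate reverse strand', 'first in pair']
```

Since 97 = 1 + 32 + 64, neither "read unmapped" nor "mate unmapped" is set, which is why the aligner must have seen a mapped mate at some point.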
yes