Hi,
I have a rookie question. I was using samtools flagstat to check the statistics of BAM files. When I look at the results for a BAM file, I see that the number of reads that pass QC is sometimes higher than the number of reads that mapped. My understanding was that BAM files only include mapped reads. Do they contain unmapped reads too?
For example:
$ samtools flagstat file.bam
257823892 + 0 in total (QC-passed reads + QC-failed reads)
132531248 + 0 duplicates
209402202 + 0 mapped (81.22%:nan%)
257823892 + 0 paired in sequencing
128911946 + 0 read1
128911946 + 0 read2
152678438 + 0 properly paired (59.22%:nan%)
48421690 + 0 singletons (18.78%:nan%)
3565988 + 0 with mate mapped to a different chr
1316058 + 0 with mate mapped to a different chr (mapQ>=5)
Here, the mapping rate is 81.22%. I thought that if BAM files contained only mapped reads, it should be 100%. Can anyone help me understand this? I tried looking online but had no luck.
The BAM file was generated by Lifescope mapping of paired SOLiD reads.
Thanks!
I guess that for Lifescope, a read pair where both reads remain unaligned ends up in the unmapped.bam file. I think you have the option to choose what to do with the unmapped reads. But the mapped BAM file will contain both reads of a pair where one read mapped and the other failed to map.
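As a quick sanity check on that explanation, here is a minimal sketch that just redoes the arithmetic on the numbers from the flagstat output above. The variable names are mine; only the three counts come from the post.

```python
# Counts copied from the flagstat output quoted in the question.
total = 257_823_892       # "in total"
mapped = 209_402_202      # "mapped"
singletons = 48_421_690   # "singletons"

# Reads present in the BAM file that are not flagged as mapped.
unmapped = total - mapped
print(unmapped)                 # 48421690
print(unmapped == singletons)   # True

# Mapping rate as a percentage, for comparison with the flagstat line.
mapped_pct = round(100 * mapped / total, 2)
print(mapped_pct)
```

The unmapped count matches the singleton count exactly, which fits the idea that every unmapped read in this file is there because its mate did map.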
Oh I see. In that case singleton means all the single reads which failed to map. And the number 209,402,202 means all the pairs that were mapped. Is that right?
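Not quite, as far as I understand the flagstat documentation: "mapped" counts individual reads (FLAG bit 0x4 not set), not pairs, and a "singleton" is a read that did map but whose mate did not (bits 0x1 and 0x8 set, bit 0x4 not set). A rough sketch of that logic follows. The bit values are from the SAM format, while the function `classify` and its labels are made up for illustration and are not part of samtools.

```python
# FLAG bits defined by the SAM format.
PAIRED = 0x1          # read is part of a pair
READ_UNMAPPED = 0x4   # this read is unmapped
MATE_UNMAPPED = 0x8   # the mate of this read is unmapped


def classify(flag: int) -> str:
    """Hypothetical helper: label one read by its FLAG value."""
    if flag & READ_UNMAPPED:
        return "unmapped"
    if flag & PAIRED and flag & MATE_UNMAPPED:
        # Mapped read whose mate failed to map.
        return "singleton"
    return "mapped"


print(classify(73))   # singleton (paired, mate unmapped, first in pair)
print(classify(133))  # unmapped  (paired, read unmapped, second in pair)
print(classify(99))   # mapped    (paired, both mates mapped)
```

So in the output above, the 48,421,690 singletons would be mapped reads, and the reads that failed to map are the total minus the 209,402,202 mapped reads.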