Samtools flagstat results
6.7 years ago
Gene_MMP8 ▴ 240
221372 + 0 in total (QC-passed reads + QC-failed reads)  
0 + 0 secondary  
0 + 0 supplementary  
20419 + 0 duplicates  
218469 + 0 mapped (98.69% : N/A)  
155851 + 0 paired in sequencing  
77895 + 0 read1  
77956 + 0 read2  
142663 + 0 properly paired (91.54% : N/A)  
150045 + 0 with itself and mate mapped  
2903 + 0 singletons (1.86% : N/A)  
4938 + 0 with mate mapped to a different chr  
2120 + 0 with mate mapped to a different chr (mapQ>=5)

I have the following questions:
1. From previous posts I understood that read1 may not be equal to read2, as there may be reads whose mates didn't align. These are singletons. So doesn't that mean that read2 - read1 must be equal to the number of singletons? What am I missing here?
2. What does the field "with itself and mate mapped" mean?


What preprocessing steps have been applied to get this file ?


I wrote this command: samtools flagstat example.bam. Does this help?

20 months ago

I realized there aren't accessible resources that clearly explain what the different lines of the flagstat output mean (This answer was useful). Here is what I found based on some other answers on forums and playing around with my own files (paired-end reads mapped using bwa mem). Some of these lines may apply regardless of aligner but others may not.

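total (QC-passed reads + QC-failed reads): Total number of alignment records in the file, including secondary and supplementary records as well as primary ones

primary: Number of records that are neither secondary nor supplementary. This is normally the number of reads that were provided as input for mapping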

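secondary: records with the secondary flag (0x100) set, i.e. alternative alignments of a read that also has a primary alignment

supplementary: records with the supplementary flag (0x800) set, i.e. additional segments of a split (chimeric) alignment

duplicates: records with the duplicate flag (0x400) set, typically by a duplicate-marking tool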

primary duplicates: primary reads that were marked as duplicates

mapped: number of mapped records, including secondary and supplementary ones

mapped %: percentage of mapped records, including secondary and supplementary ones (denominator is the number in the total field)

primary mapped: number of mapped reads that are labelled primary (just the number of mapped reads out of the input reads). i.e. excludes the number of reads that are supplementary.

primary mapped %: percentage of mapped primary reads (denominator is number of primary reads)

paired in sequencing: This is the number of paired reads. If you used only paired reads after trimming, this will be the same as the number in the primary field

read1: This is the number of forward (R1) reads. If you used only paired reads after trimming, this will be half of the number in the primary field

read2: This is the number of reverse (R2) reads. If you used only paired reads after trimming, this will be the same as the read1 field.

properly paired: This is the number of reads that map in a way that makes sense (not too far apart, not on different chromosomes, R1 read maps to the forward strand and R2 to the reverse strand, etc., depending on the aligner). See this for some more information. This is suitable if you want to be very conservative with the number of reads that you consider mapped.

properly paired %: percentage of properly paired reads. The denominator is the number in the paired in sequencing field, which equals the number of primary reads only if every read is paired. In the question's output, 142663 / 155851 = 91.54%.

with itself and mate mapped: Number of reads whose corresponding reverse / forward read is also mapped. This is less strict than properly paired but stricter than primary mapped.

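singletons: This is the number of reads that are mapped but whose corresponding reverse / forward read did not map. (If every read is paired, primary mapped - with itself and mate mapped = singletons. This does not hold when the file also contains unpaired reads.)

singletons %: percentage of singletons. The denominator is the number in the paired in sequencing field, which equals the number of primary reads only if every read is paired. In the question's output, 2903 / 155851 = 1.86%.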

with mate mapped to a different chr: This is the number of reads that are mapped and whose corresponding reverse / forward read mapped to a different chromosome. Remove these from properly paired to get an even more conservative estimate of the number of mapped reads.

with mate mapped to a different chr (mapQ>=5): This is the subset of the previous field in which the read's mapping quality is at least 5, i.e. the alignment is of reasonable quality. Remove these from properly paired to get a more conservative estimate of the number of mapped reads.
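
To make the definitions above concrete, here is a minimal Python sketch of how these categories can be tallied from SAM flag values. It is not samtools' own code, only my understanding of the counting rules, and the `(flag, mate_same_chr, mapq)` tuple layout and the function name `flagstat_like` are made up for illustration. The flag bit values themselves come from the SAM specification.

```python
from collections import Counter

# SAM flag bits (values from the SAM specification).
PAIRED, PROPER_PAIR, UNMAPPED, MATE_UNMAPPED = 0x1, 0x2, 0x4, 0x8
READ1, READ2 = 0x40, 0x80
SECONDARY, DUPLICATE, SUPPLEMENTARY = 0x100, 0x400, 0x800


def flagstat_like(records):
    """Tally flagstat-style categories.

    `records` is an iterable of hypothetical (flag, mate_same_chr, mapq)
    tuples; this layout is an assumption made for the example.
    """
    counts = Counter()
    for flag, mate_same_chr, mapq in records:
        is_mapped = not (flag & UNMAPPED)
        # These three lines count every record, primary or not.
        counts["total"] += 1
        counts["mapped"] += is_mapped
        counts["duplicates"] += bool(flag & DUPLICATE)
        if flag & SECONDARY:
            counts["secondary"] += 1
            continue
        if flag & SUPPLEMENTARY:
            counts["supplementary"] += 1
            continue
        # Everything below is counted for primary records only.
        counts["primary"] += 1
        counts["primary mapped"] += is_mapped
        if not (flag & PAIRED):
            continue
        counts["paired in sequencing"] += 1
        counts["read1"] += bool(flag & READ1)
        counts["read2"] += bool(flag & READ2)
        mate_mapped = not (flag & MATE_UNMAPPED)
        if is_mapped and (flag & PROPER_PAIR):
            counts["properly paired"] += 1
        if is_mapped and mate_mapped:
            counts["with itself and mate mapped"] += 1
            if not mate_same_chr:
                counts["with mate mapped to a different chr"] += 1
                if mapq >= 5:
                    counts["with mate mapped to a different chr (mapQ>=5)"] += 1
        elif is_mapped:
            counts["singletons"] += 1
    return counts


# Toy records: a proper pair (99/147), a singleton (73) with its unmapped
# mate (133), a pair split across chromosomes (65/129), and one
# supplementary record (2113).
toy = [
    (99, True, 60), (147, True, 60),
    (73, True, 60), (133, True, 0),
    (65, False, 30), (129, False, 3),
    (2113, True, 60),
]
result = flagstat_like(toy)
for name, value in result.items():
    print(f"{value} {name}")
```

In this toy set every primary read is paired, so primary mapped equals with itself and mate mapped plus singletons. Adding an unpaired mapped read (for example flag 0) breaks that equality.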


Thank you for your simple and understandable explanation. I've read several definitions but still didn't quite understand how to read Samtools flagstat results.

6.7 years ago
  1. Singletons occur when only one mate in a pair aligns. You can also have situations where one mate aligns multiple times (e.g., to a simple repeat) and the other only once. Then one will have a single entry and the other may have multiple. Also, if you did any filtering then that'd affect this as well.
  2. It means exactly what it says: both mates mapped. They may be "properly paired" or they may not be. Regardless, if they both align somewhere at least once then they count toward this.

When I subtract the number of mapped reads from the total number of reads (221372 - 218469 = 2903), I get the same number as the singletons. So can I say that singletons are the reads which didn't map to any reference?


By definition a singleton cannot be unmapped. If it is, it's not a singleton.
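
The arithmetic can be checked with the numbers from the flagstat output in the question. This is only a sketch in Python, and the interpretation in the comments is my reading of the counts rather than something flagstat reports directly: the unmapped reads are most likely the mates of the singletons, which would explain why the two numbers match even though a singleton itself is always mapped.

```python
# Numbers copied from the flagstat output in the question.
total = 221372
mapped = 218469
paired = 155851
properly_paired = 142663
both_mapped = 150045
singletons = 2903

# Reads that did not map at all.
unmapped = total - mapped

# Paired reads that are neither in "with itself and mate mapped" nor
# singletons must themselves be unmapped.
paired_unmapped = paired - both_mapped - singletons

print("unmapped reads:", unmapped)
print("unmapped paired reads:", paired_unmapped)
print("singletons (mapped, mate unmapped):", singletons)

# The percentages flagstat prints, recomputed with their denominators.
mapped_pct = round(100 * mapped / total, 2)
properly_paired_pct = round(100 * properly_paired / paired, 2)
singleton_pct = round(100 * singletons / paired, 2)
print(mapped_pct, properly_paired_pct, singleton_pct)
```

The three counts come out equal, which is consistent with each singleton having exactly one unmapped mate in the file, although the totals alone cannot prove that pairing.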


Thanks for your reply. Can you explain what you meant by "You can also have situations where one mate aligns multiple times (e.g., to a simple repeat) and the other only once"? I know it is a trivial question, but I am entirely new to this field, hence asking.


Suppose you have the sequence ATATATATATATATATATAAGCGCTAGCTAGTCGATCTAGCTAGCTGATCGGTCGTCAGAC. You might have reads ATATATAT and GCGCTAGC. The latter read can only align to one place in that sequence. The former read can align equally well to multiple places. Consequently, some aligners will produce multiple entries for ATATATAT and a single one for GCGCTAGC.
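
The same toy example can be tried in a few lines of Python. This is just an illustration with exact string matching, which is far simpler than what a real aligner does, and `find_all` is a helper name made up for this sketch.

```python
def find_all(reference, read):
    """Return every 0-based start position where `read` matches exactly."""
    positions = []
    start = reference.find(read)
    while start != -1:
        positions.append(start)
        # Advance by one base so overlapping matches are also found.
        start = reference.find(read, start + 1)
    return positions


reference = "ATATATATATATATATATAAGCGCTAGCTAGTCGATCTAGCTAGCTGATCGGTCGTCAGAC"
repeat_hits = find_all(reference, "ATATATAT")
unique_hits = find_all(reference, "GCGCTAGC")

print("ATATATAT matches at:", repeat_hits)
print("GCGCTAGC matches at:", unique_hits)
```

The repeat read matches at several overlapping positions, while the other read has a single match.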


Excellent explanation. Thanks a lot!
