Running the samtools flagstat command on one BAM file gives the following output:
1403478261 in total
100745504 QC failure
1403478261 mapped (100.00%)
1403478261 paired in sequencing
1331910684 properly paired (94.90%)
1343780774 with itself and mate mapped
59697487 singletons (4.25%)
7717886 with mate mapped to a different chr
7574861 with mate mapped to a different chr (mapQ>=5)
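For reference, the percentages above can be reproduced by hand. The following is a small Python sketch, not part of samtools. It assumes that this samtools version reports the properly-paired and singleton percentages relative to the "paired in sequencing" count. Here that count equals the total, so the assumption does not change the result. The sketch also checks that "with itself and mate mapped" plus "singletons" adds up to the total. That is expected because every read in this file is both paired and mapped.

```python
# Counts copied from the flagstat output above.
total = 1403478261
paired_in_sequencing = 1403478261
properly_paired = 1331910684
both_mapped = 1343780774  # "with itself and mate mapped"
singletons = 59697487     # mapped reads whose mate is unmapped


def pct(count: int, denominator: int) -> float:
    """Percentage rounded to two decimals, as flagstat prints it."""
    return round(100.0 * count / denominator, 2)


print(pct(properly_paired, paired_in_sequencing))  # 94.9
print(pct(singletons, paired_in_sequencing))       # 4.25
# A paired, mapped read either has its mate mapped or is a singleton.
print(both_mapped + singletons == total)           # True
```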
I then use Picard tools to remove duplicates. After that, how can I remove the reads that failed QC? Can the command 'samtools view -F 0x0200 -b a.bam > b.bam' be used here?
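The `-F` option of samtools view drops a record if any of the given bits is set in its FLAG field, and 0x200 is the SAM bit for "not passing filters, such as platform/vendor quality controls". The Python sketch below only mimics that test on a few hypothetical FLAG values. The function name `kept_by_exclude` is invented for illustration and is not a samtools API.

```python
QC_FAIL = 0x200  # SAM FLAG bit: read failed platform/vendor quality checks


def kept_by_exclude(flag: int, exclude: int) -> bool:
    """Mimic `samtools view -F exclude`: drop a record if ANY excluded bit is set."""
    return (flag & exclude) == 0


# Hypothetical FLAG values: two QC-passed pairs' reads and two QC-failed reads.
flags = [99, 147, 99 | QC_FAIL, 83, 163 | QC_FAIL]
kept = [f for f in flags if kept_by_exclude(f, QC_FAIL)]
print(kept)  # [99, 147, 83]
```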
If I want to call SNPs and indels, can I use the command 'samtools view -f 0x0002 -b a.bam > b.bam' to keep only reads that are mapped in a proper pair?
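Unlike `-F`, the `-f` option keeps a record only if all of the given bits are set, so `-f 0x2` keeps reads whose FLAG has the proper-pair bit. The Python sketch below illustrates that "all bits" test on hypothetical FLAG values. `kept_by_require` is an illustrative name, not a samtools function.

```python
PROPER_PAIR = 0x2  # SAM FLAG bit: each segment properly aligned according to the aligner


def kept_by_require(flag: int, require: int) -> bool:
    """Mimic `samtools view -f require`: keep a record only if ALL required bits are set."""
    return (flag & require) == require


# Hypothetical FLAG values: 99/147 are a proper pair, 97/145 are paired but
# not proper, and 73 is a read whose mate is unmapped.
flags = [99, 147, 97, 145, 73]
print([f for f in flags if kept_by_require(f, PROPER_PAIR)])  # [99, 147]
```

samtools view also accepts `-f` and `-F` together, so a command such as 'samtools view -b -f 0x2 -F 0x200 a.bam > b.bam' should apply both filters in one pass.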
By the way, where can I find a detailed explanation of the samtools flagstat output? The page http://i.seqanswers.com/questions/80/interpreting-samtools-flagstat-output gives a brief explanation, but I still do not understand what "properly paired" means.
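The flagstat lines are counts of reads with particular SAM FLAG bits set. The SAM format specification defines bit 0x2 only as "each segment properly aligned according to the aligner". What counts as "proper" is therefore decided by the aligner. Typically this means both mates are mapped to the same chromosome in the expected orientation, with an insert size consistent with the library. The Python sketch below decodes a FLAG value into the bit names listed in the specification. The names follow the spelling used by the `samtools flags` subcommand in recent releases. `decode_flag` is a hypothetical helper for illustration.

```python
from typing import List

# FLAG bits from the SAM format specification, lowest bit first.
SAM_FLAGS = {
    0x1: "PAIRED",          # template has multiple segments (paired read)
    0x2: "PROPER_PAIR",     # each segment properly aligned according to the aligner
    0x4: "UNMAP",           # this segment is unmapped
    0x8: "MUNMAP",          # next segment (mate) is unmapped
    0x10: "REVERSE",        # SEQ is reverse complemented
    0x20: "MREVERSE",       # mate's SEQ is reverse complemented
    0x40: "READ1",          # first segment in the template
    0x80: "READ2",          # last segment in the template
    0x100: "SECONDARY",     # secondary alignment
    0x200: "QCFAIL",        # not passing platform/vendor quality controls
    0x400: "DUP",           # PCR or optical duplicate
    0x800: "SUPPLEMENTARY", # supplementary alignment
}


def decode_flag(flag: int) -> List[str]:
    """Return the names of all bits set in a SAM FLAG value, lowest bit first."""
    return [name for bit, name in SAM_FLAGS.items() if flag & bit]


print(decode_flag(99))   # ['PAIRED', 'PROPER_PAIR', 'MREVERSE', 'READ1']
print(decode_flag(611))  # ['PAIRED', 'PROPER_PAIR', 'MREVERSE', 'READ1', 'QCFAIL']
```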