How To Remove The Qc-Failure Reads?
1
2
Entering edit mode
13.2 years ago
Junfeng ▴ 330

I obtain the following output from running the samtools flagstat command for one bam file:

1403478261 in total

100745504 QC failure

63676619 duplicates

1403478261 mapped (100.00%)

1403478261 paired in sequencing

704406339 read1

699071922 read2

1331910684 properly paired (94.90%)

1343780774 with itself and mate mapped

59697487 singletons (4.25%)

7717886 with mate mapped to a different chr

7574861 with mate mapped to a different chr (mapQ>=5)

Then I use picard tools to remove duplicates. After that how to remove the QC failure reads? Command 'samtools view -F 0x0200 -b a.bam > b.bam' can be used here?

If i want to call SNPs and indels, can i use the command 'samtools view -f 0x0002 -b a.bam > b.bam' to only include reads that is mapped in a proper pair?

By the way, where can I find the detailed explanation of samtools flagstat output? The website http://i.seqanswers.com/questions/80/interpreting-samtools-flagstat-output gives a simple explanation. But I still cannot understand what is the meaning of properly paired.

next-gen sequencing samtools • 5.5k views
ADD COMMENT
5
Entering edit mode
13.2 years ago
Nina ▴ 400

QC failure of a read is recorded in the bitwise flag of the sam record. so yes "samtools view -F 512" is exactly what you want to use if you want to create a new bam that excludes the QC failed reads.

(I forget if you'll also need to add the -h option so samtools has the header info required to construct the output bam)

flagstat is a really simple tool that just reports what's in a bam file by looking at the bitwise flag and the mate reference name (ie cols 2 and 7 in sam format).

The bitwise flags are already set before flagstat is run -- either by the aligner or by other processing steps that manipulated the bam file. So the answer provided in seqanswers wrt proper pairs is correct and can't really be expanded on...in the sam/bam format it's just a bit in each record. The definition of proper which determines whether the bit is set or not, depends on the aligner and/or other upstream bam processing steps. So if to get the answer as to what "proper pair" means, I would suggest submitting another question where you specify how the bam file was generated.

ADD COMMENT
0
Entering edit mode

Thanks for your useful answer. I have submitted another question about the meaning of samtools flagstat output The Meaning Of Samtools Flagstat Output Of Illumina'S Bam File

ADD REPLY

Login before adding your answer.

Traffic: 2842 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6