Question: How To Remove The Qc-Failure Reads?
2
gravatar for Junfeng
7.7 years ago by
Junfeng310
Junfeng310 wrote:

I obtain the following output from running the samtools flagstat command for one bam file:

1403478261 in total

100745504 QC failure

63676619 duplicates

1403478261 mapped (100.00%)

1403478261 paired in sequencing

704406339 read1

699071922 read2

1331910684 properly paired (94.90%)

1343780774 with itself and mate mapped

59697487 singletons (4.25%)

7717886 with mate mapped to a different chr

7574861 with mate mapped to a different chr (mapQ>=5)

Then I use picard tools to remove duplicates. After that how to remove the QC failure reads? Command 'samtools view -F 0x0200 -b a.bam > b.bam' can be used here?

If i want to call SNPs and indels, can i use the command 'samtools view -f 0x0002 -b a.bam > b.bam' to only include reads that is mapped in a proper pair?

By the way, where can I find the detailed explanation of samtools flagstat output? The website http://i.seqanswers.com/questions/80/interpreting-samtools-flagstat-output gives a simple explanation. But I still cannot understand what is the meaning of properly paired.

next-gen samtools sequencing • 3.6k views
ADD COMMENTlink written 7.7 years ago by Junfeng310
5
gravatar for Nina
7.6 years ago by
Nina330
Vancouver, BC, Canada
Nina330 wrote:

QC failure of a read is recorded in the bitwise flag of the sam record. so yes "samtools view -F 512" is exactly what you want to use if you want to create a new bam that excludes the QC failed reads.

(I forget if you'll also need to add the -h option so samtools has the header info required to construct the output bam)

flagstat is a really simple tool that just reports what's in a bam file by looking at the bitwise flag and the mate reference name (ie cols 2 and 7 in sam format).

The bitwise flags are already set before flagstat is run -- either by the aligner or by other processing steps that manipulated the bam file. So the answer provided in seqanswers wrt proper pairs is correct and can't really be expanded on...in the sam/bam format it's just a bit in each record. The definition of proper which determines whether the bit is set or not, depends on the aligner and/or other upstream bam processing steps. So if to get the answer as to what "proper pair" means, I would suggest submitting another question where you specify how the bam file was generated.

ADD COMMENTlink written 7.6 years ago by Nina330

Thanks for your useful answer. I have submitted another question about the meaning of samtools flagstat output http://biostar.stackexchange.com/questions/6143/the-meaning-of-samtools-flagstat-output-of-illuminas-bam-file.

ADD REPLYlink written 7.6 years ago by Junfeng310
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1642 users visited in the last hour