Question: Filtering Out Reads Based On Base Quality With Samtools Seems Not To Work Properly
gravatar for Pascal
8.3 years ago by
Pascal1.5k wrote:

We have an issue with the -Q parameter in samtools mpileup: it seems not working properly.

In a bam file I know that the all reads have base Q = 40, the following commands doesn't filter out nothing:

samtools mpileup -Q 93 out_sorted.bam

Here's a complete workflow to reproduce quickly the issue we are experiencing based on simulated data. It generates a bam file and converts all read base qualities to 40 then samtools mpileup is called with a minimum of 93:

samtools faidx human_g1k_v37.fasta 20:1-5000000 > ref.fasta
wgsim -N 200000 -r 0 -R 0 -X 0 ref.fasta out.read1.fq out.read2.fq > wgsim.out
awk ' /222/ { gsub("2", "I"); print $0; next } { print } ' out.read1.fq > out.read1b.fq  
awk ' /222/ { gsub("2", "I"); print $0; next } { print } ' out.read2.fq > out.read2b.fq
bwa index ref.fasta 
bwa aln ref.fasta out.read1b.fq -f out1.sai
bwa aln ref.fasta out.read2b.fq -f out2.sai
bwa sampe -f out.sam -r '@RG\tID:foo\tSM:bar' ref.fasta out1.sai out2.sai out.read1b.fq out.read2b.fq
samtools view -hbS out.sam -o out.bam
samtools sort out.bam out_sorted
samtools index out_sorted.bam
samtools mpileup -Q 93 out_sorted.bam > mpileup.txt

Please let me know if I did not understand something or if there is a real issue here with the -Q function.

samtools • 5.2k views
ADD COMMENTlink written 8.3 years ago by Pascal1.5k

perhaps 93 is an invalid value? just guessing try 92 or 60 or even 41

ADD REPLYlink written 8.3 years ago by Istvan Albert ♦♦ 84k
gravatar for FGV
8.0 years ago by
FGV110 wrote:

I'm under the impression that the -Q and -q option won't filter out anything from the pileup file. That is, even reads/bases are below the threshold they will still show up there. SAMTOOLS just won't take them into account when calculating genotype likelihoods (-g).

Also, -Q option in the manual: "-Q INT Minimum base quality for a base to be considered [13] "

which leads me to think that SAMTOOLS won't throw away entire read, but just ignore bases with lower than -Q quality when computing genotype likelihoods (GL)...

ADD COMMENTlink modified 8.0 years ago • written 8.0 years ago by FGV110
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 962 users visited in the last hour