Question: Bbduk filters away good reads
dt wrote:

Hi,

I may be missing something obvious, but I have a read (actually many reads; this is just one example) that should not be filtered out by bbduk.sh based on average quality, yet it is.

Read:

@NZ_AP014881.1_0_0/2
TGCAGCATTCTCCTGATGGCGGTCTTGATGAAGAGCTTTGTTACGGGGGTCATCCTCATCCATCAGGTCTGGTGCAGAAATAAAGCGCAAAGGCTTGGGGTTACCGCCTGCGCGCATGTCTGCCAGCATATCCTCTAGCGCGGCAGGCTCTGGGCAATTAATCTCAATCTGCTGACGGTCAGACTTTGGCAAATTGAGCAGGCGGTTGCGGGCCGACGTATCCAAAAGGCGATTGCACCAGCGCTGAACACGATATCCCGGACGATCTGGTAGCTGTTCTTCTTCCAGTTCCTCACGCAGA
+
<CCCCEGG6GGGGF,GEGGGC,FFGFFFCEGGCGGGGG@GGFCGGC<FCGGGGEFGFGGF@GCG,GGG@GGGGGG<GEGFGFGGGEGGFAEEC<GGGGFFFEECD<EFGGF8<G5GGGCDBGFFGEG<GCGFECG+FCGF,CC=,*DF5=,9<7C4E:EF,,=B@CGF>:GGECGG;;8G>C1,:,C:+E,9,FF<@,*6:,793G,4+13G*7*;3*=@6C7/85+C59C+<>***2C)*1*/**))<+<2)+**)4/)A)>)1+2)**51065.:091>1***0*)*).0+*(*2*90.

Command:

bbduk.sh in1=test.fq out1=test_out.fq maq=20

The BBDuk version is 38.46. The average quality of the read seems to be ~27. I hope someone can help; thanks in advance.
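For reference, a figure like ~27 is what you get from a plain arithmetic mean of the Phred scores. Below is a minimal Python sketch of that calculation. It assumes Phred+33 encoding (the usual encoding for current Illumina FASTQ files), and `phred_scores` and `arithmetic_mean_q` are hypothetical helper names, not part of BBTools:

```python
from typing import List


def phred_scores(qual: str, offset: int = 33) -> List[int]:
    """Decode a FASTQ quality string into integer Phred scores (Phred+33 assumed)."""
    return [ord(c) - offset for c in qual]


def arithmetic_mean_q(qual: str, offset: int = 33) -> float:
    """Linear (arithmetic) mean of the Phred scores of one read."""
    scores = phred_scores(qual, offset)
    if not scores:
        raise ValueError("empty quality string")
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Paste the full quality line (the line after '+') here; this is only its first 21 characters.
    qual = "<CCCCEGG6GGGGF,GEGGGC"
    print(f"arithmetic mean Q = {arithmetic_mean_q(qual):.1f}")
```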

modified 9 days ago • written 9 days ago by dt

Could it be that the reads underwent some trimming that caused them to fall below the maq threshold?

minavgquality=0 (maq) Reads with average quality (after trimming) below this will be discarded.

modified 9 days ago • written 9 days ago by lieven.sterck

Not really; I provided the exact read and the exact command needed to reproduce the problem. Can you reproduce it?

written 9 days ago by dt
dt wrote:

OK, I believe I have figured it out; it is explained in https://github.com/wdecoster/NanoPlot/issues/57:

BBDuk calculates average quality score by converting to probability scale, taking an average, and then converting back to Phred scale. So for example, a 2bp read with quality scores 10 and 20 would yield an average quality of (0.9+0.99)/2=0.945 -> Q12.6 rather than Q15 with a linear average.
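The calculation described in that issue can be sketched in Python as follows: convert each Phred score to an error probability, average the probabilities, and convert the mean back to the Phred scale. This is a minimal illustration of the technique, not BBDuk's actual source code, and `prob_avg_q` is a hypothetical name:

```python
import math
from typing import Sequence


def prob_avg_q(scores: Sequence[float]) -> float:
    """Average quality computed in probability space.

    Each Phred score Q is converted to an error probability p = 10^(-Q/10),
    the probabilities are averaged, and the mean is converted back with
    Q = -10 * log10(mean p).
    """
    if len(scores) == 0:
        raise ValueError("no quality scores given")
    mean_err = sum(10 ** (-q / 10) for q in scores) / len(scores)
    return -10 * math.log10(mean_err)


# The 2 bp example from the issue: Q10 and Q20.
print(round(prob_avg_q([10, 20]), 1))  # 12.6 (probability-space average)
print(sum([10, 20]) / 2)               # 15.0 (linear average)
```

Averaging error probabilities (0.1 and 0.01) gives the same result as averaging accuracies (0.9 and 0.99) and subtracting from 1, which is the form used in the quote.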

Essentially, this means that, looking for example at the seqtk fqchk output, BBDuk uses the value reported in the errQ field rather than the avgQ field. I believe this can be confusing and should be mentioned in the BBDuk documentation. If someone knows the developer, could you let him know? Thanks.
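To see how a read with a high linear average can still fail `maq=20`, the sketch below computes both averages for one quality string and applies the same threshold to each. It assumes Phred+33 encoding and assumes that fqchk's avgQ and errQ correspond to the linear and probability-space averages described above. The helper names are hypothetical, and the quality string is made up for illustration, not taken from the read above:

```python
import math
from typing import Dict, List


def phred_scores(qual: str, offset: int = 33) -> List[int]:
    # Decode a FASTQ quality string (Phred+33 assumed).
    return [ord(c) - offset for c in qual]


def avg_q(scores: List[int]) -> float:
    # Linear average of the Phred scores (assumed to match fqchk's avgQ).
    return sum(scores) / len(scores)


def err_q(scores: List[int]) -> float:
    # Mean error probability converted back to Phred (assumed to match fqchk's errQ).
    mean_err = sum(10 ** (-q / 10) for q in scores) / len(scores)
    return -10 * math.log10(mean_err)


def passes_maq(qual: str, maq: float) -> Dict[str, bool]:
    # Hypothetical helper: reads whose average is below maq would be discarded.
    scores = phred_scores(qual)
    return {"avgQ": avg_q(scores) >= maq, "errQ": err_q(scores) >= maq}


qual = "IIIIIIII##"  # made-up: eight Q40 bases followed by two Q2 bases
scores = phred_scores(qual)
print(f"avgQ = {avg_q(scores):.1f}, errQ = {err_q(scores):.1f}")  # avgQ = 32.4, errQ = 9.0
print(passes_maq(qual, 20))  # {'avgQ': True, 'errQ': False}
```

A few very low-quality bases dominate the mean error probability, so the probability-space average drops far below the linear one.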

written 9 days ago by dt

I believe, this can be confusing

This is not confusing at all. Please familiarize yourself with the mathematics of Q values.

Q values are a logarithmic transformation of error rates. Logarithmic transformations are common in science and engineering, but they have some pitfalls, especially when it comes to calculating a "mean". Why would you want the arithmetic mean of Q values? The arithmetic mean of Q values is equivalent to the geometric mean of the error rates, which is most likely not what you want.
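To spell this out, let p_i be the error probability of base i in a read of length n, so that Q_i = -10 log10 p_i. The linear average of the Q values is the geometric mean of the p_i on the Phred scale, the probability-space average is the arithmetic mean of the p_i on the Phred scale, and the AM-GM inequality orders the two:

```latex
\begin{aligned}
Q_i &= -10\log_{10} p_i \\
\bar{Q}_{\mathrm{arith}} &= \frac{1}{n}\sum_{i=1}^{n} Q_i = -10\log_{10}\Bigl(\prod_{i=1}^{n} p_i\Bigr)^{1/n} \\
Q_{\mathrm{prob}} &= -10\log_{10}\Bigl(\frac{1}{n}\sum_{i=1}^{n} p_i\Bigr) \\
\Bigl(\prod_{i=1}^{n} p_i\Bigr)^{1/n} &\le \frac{1}{n}\sum_{i=1}^{n} p_i \;\Longrightarrow\; Q_{\mathrm{prob}} \le \bar{Q}_{\mathrm{arith}}
\end{aligned}
```

The implication holds because -10 log10 is decreasing. Equality holds only when all p_i are equal, so any spread in base qualities, such as a low-quality tail, pulls the probability-space average below the linear one.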

modified 8 days ago • written 8 days ago by piet

Nice, that indeed probably explains it.

For suggestions and comments on the BBTools package, you can find a link on their web page: https://jgi.doe.gov/data-and-tools/bbtools/bbtools-faq-support-forums/

written 9 days ago by lieven.sterck