Question: bbduk flags 'tossbrokenreads' and 'nullifybrokenquality'
Asked 4 weeks ago by Anand Rao (United States):

I would like help understanding two flags for bbduk.sh (part of BBMap): 'tossbrokenreads' and 'nullifybrokenquality'.

These flags are mentioned in the STDERR of my bbduk.sh step (BBMap version 38.60), which I use to decontaminate Illumina single-end 100 nt raw reads following the "Adapter and Quality Trimming" procedure. The relevant block of STDERR is pasted below:

[E::bgzf_read] Read block operation failed with error 2 after 58624 of 65536 bytes
Error 3 in block starting at offset 1321362048(4EC26280)
java.lang.Exception: 
Mismatch between length of bases and qualities for read 17377414 (id=HWI-ST797:117:D091UACXX:4:1303:5955:45869 1:).
# qualities=27, # bases=101

CCCFFFFFHHHHHJJIIJJJIJIEIHJ
TTCCCGATCATCCCGAGAAGGAACGTCTGCCATAATCTTCTCCTGACCGCGCCAAAGAATTTTGTCAATGACCCCAAATTCCTTAGCCAATAATGCGTCCA

This can be bypassed with the flag 'tossbrokenreads' or 'nullifybrokenquality'
    at shared.KillSwitch.kill(KillSwitch.java:96)
    at stream.Read.validateQualityLength(Read.java:214)
    at stream.Read.validate(Read.java:104)
    at jgi.BBDuk$ProcessThread.run(BBDuk.java:2418)

However, the bbduk.sh help menu (too long to paste here in full) does not list these exact flags; the closest one I can find is tossjunk=f. Therefore, I am:

A. confused about these messages,

B. curious when and why I would use these flags, and

C. wondering why I get these error messages: do they imply corrupted reads in my FASTQ input?

Could forum members please help? Thanks!


Those two options are not available in bbduk.sh, so this seems to be a case of bbduk suggesting the wrong fix in its error message. You could point this out to Brian Bushnell, the BBMap developer, by creating a ticket here.

Your data appears to have become corrupted at some step. Hopefully this is a transient issue, which you can check by running the sample through your pipeline again.
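Before rerunning everything, a quick test of the "corrupted at some step" hypothesis is gzip's own integrity check (`gzip -t`). It also works on BGZF files like the one behind the `[E::bgzf_read]` line in the question, because BGZF is a series of standard gzip members. A minimal sketch, assuming the FASTQ files involved are gzip-compressed; the function name `check_gzip` is hypothetical:

```bash
# check_gzip (hypothetical helper): run gzip's integrity test on each file,
# print OK or CORRUPT per file, and return nonzero if any file failed.
check_gzip() {
  local f status=0
  for f in "$@"; do
    if gzip -t -- "$f" 2>/dev/null; then
      printf 'OK\t%s\n' "$f"
    else
      printf 'CORRUPT\t%s\n' "$f"
      status=1
    fi
  done
  return "$status"
}
```

For example, `check_gzip raw.fq.gz step*.fq.gz` (hypothetical file names) flags any input or intermediate that is truncated or damaged, which narrows down where the corruption first appears.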

Reply by genomax, 4 weeks ago

How did you perform 'decontamination'?

The error clearly states that the length of the base string differs from the length of its associated quality string (101 bases vs. 27 quality values).

Personally, I would rather investigate the problem than try to work around it with bbmap flags.

Reply by michael.ante, 4 weeks ago

I agree, Michael.

Here are my steps, up to and including the bbduk decontamination step(s):

rename.sh in=$IN out=$OUT fixsra=t -Xmx64g # from release 38.61, all other steps from release 38.60
IN=$OUT
clumpify.sh -Xmx64g in=$IN out=$OUT dedupe optical
IN=$OUT
bbduk.sh -Xmx64g in=$IN out=$OUT ktrim=r k=23 mink=11 hdist=1 tbo tpe minlen=70 ref=adapters ftm=5 ordered
IN=$OUT
bbduk.sh -Xmx64g in=$IN out=$OUT k=31 ref=artifacts,phix ordered cardinality
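As posted, OUT is never reassigned, so if the snippet is literal, every step after rename.sh reads from and writes to the same path. The post does not make clear whether that is what actually ran, but it is worth ruling out. A minimal sketch of the same four commands with a distinct output file per step, stopping as soon as a step fails or its output fails `gzip -t`. The function names and `.fq.gz` file names are hypothetical (BBTools chooses compression from the output extension); the tool flags are copied unchanged from the commands above:

```bash
# run_step (hypothetical): run one command, then verify its gzip output.
# Usage: run_step OUTPUT_FILE command args...
run_step() {
  local out="$1"; shift
  "$@" || { echo "step failed: $*" >&2; return 1; }
  gzip -t -- "$out" || { echo "corrupt output: $out" >&2; return 1; }
}

# run_pipeline (hypothetical): RAW_FASTQ_GZ and an output PREFIX;
# each step reads the previous step's file and writes a new one.
run_pipeline() {
  local raw="$1" p="$2"
  run_step "${p}.1_rename.fq.gz" \
    rename.sh -Xmx64g in="$raw" out="${p}.1_rename.fq.gz" fixsra=t || return 1
  run_step "${p}.2_clump.fq.gz" \
    clumpify.sh -Xmx64g in="${p}.1_rename.fq.gz" out="${p}.2_clump.fq.gz" \
      dedupe optical || return 1
  run_step "${p}.3_trim.fq.gz" \
    bbduk.sh -Xmx64g in="${p}.2_clump.fq.gz" out="${p}.3_trim.fq.gz" \
      ktrim=r k=23 mink=11 hdist=1 tbo tpe minlen=70 ref=adapters ftm=5 ordered || return 1
  run_step "${p}.4_decon.fq.gz" \
    bbduk.sh -Xmx64g in="${p}.3_trim.fq.gz" out="${p}.4_decon.fq.gz" \
      k=31 ref=artifacts,phix ordered cardinality || return 1
}
```

Called as `run_pipeline sample.fq.gz sample`, it leaves `sample.1_rename.fq.gz` through `sample.4_decon.fq.gz` on disk, which are exactly the per-step files needed to trace a broken read back to the step that produced it.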

Could you advise on how to "investigate" the underlying problem(s)?

Reply by Anand Rao, 4 weeks ago

You have the read ID: check after each step whether this mismatch between base and quality lengths appears. Try to find the read in the original SRA file.
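Tracing the read through the steps can be scripted. A minimal sketch, assuming standard 4-line FASTQ records (plain or gzipped); the function name `trace_read` is hypothetical. It matches the ID as a substring of the header line, so it should also catch headers where the Illumina name follows an SRR accession, as can happen in FASTQ dumped from SRA:

```bash
# trace_read (hypothetical): print, for each file, every record whose header
# contains READ_ID together with its base and quality lengths.
# Usage: trace_read READ_ID file1 [file2 ...]
trace_read() {
  local id="$1" f
  shift
  for f in "$@"; do
    case "$f" in
      *.gz) gzip -dc -- "$f" ;;
      *)    cat -- "$f" ;;
    esac | awk -v id="$id" -v file="$f" '
      NR % 4 == 1 { hit = (index($0, id) > 0); hdr = $0; next }  # header line
      NR % 4 == 2 && hit { seq = $0; next }                      # bases line
      NR % 4 == 0 && hit {                                       # quality line
        printf "%s\t%s\tbases=%d\tquals=%d\n", file, hdr, length(seq), length($0)
        found = 1
      }
      END { if (!found) printf "%s\tnot found\n", file }'
  done
}
```

For example, `trace_read 'HWI-ST797:117:D091UACXX:4:1303:5955:45869' raw.fq.gz step1.fq.gz step2.fq.gz` (hypothetical file names); the first file where `bases` and `quals` disagree points at the step to reproduce. The .sra file itself has to be converted to FASTQ before it can be searched this way.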

Check with a simple script whether this is the only such read. If the mismatch was introduced in one of your steps, try to reproduce the error; if it is reproducible, contact the BBTools developers.
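Such a "simple script" could look like the following minimal sketch, which again assumes standard 4-line FASTQ records; the function name `count_broken_fastq` is hypothetical. If a file is truncated mid-record, the 4-line alignment breaks and later records are reported as mismatched too, which is itself a sign of corruption:

```bash
# count_broken_fastq (hypothetical): list every record whose sequence and
# quality lengths differ, print a total on stderr, and return 1 if any found.
count_broken_fastq() {
  case "$1" in
    *.gz) gzip -dc -- "$1" ;;
    *)    cat -- "$1" ;;
  esac | awk '
    NR % 4 == 1 { id = $0 }    # header line
    NR % 4 == 2 { seq = $0 }   # bases line
    NR % 4 == 0 && length(seq) != length($0) {
      bad++
      printf "%s\tbases=%d\tquals=%d\n", id, length(seq), length($0)
    }
    END {
      printf "broken records: %d\n", bad + 0 > "/dev/stderr"
      exit (bad > 0)
    }'
}
```

Running it on the raw input and on each intermediate file shows whether the broken read is a one-off and at which step the mismatch first appears.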

Reply by michael.ante, 4 weeks ago