Do threads matter for host DNA removal with bbmap?
11 months ago
HPC123

Hello,

I want to use bbmap to filter out host sequences from my metagenomics reads. My understanding is that bbmap calculates the insert size independently for each thread it runs on, and I am trying to figure out whether that matters for my purposes. Does the insert size influence which reads are mapped? In other words, does bbmap (A) map all the reads and then determine the insert size, or (B) calculate the insert size and use it to decide whether a read has mapped?

command:

bbmap.sh -Xmx${mem}g -t=${threads} minid=0.95 maxindel=3 bwr=0.16 bw=12 \
      quickmatch=t fast=t minhits=2 qtrim=rl trimq=10 untrim=t in1=${forward_fastq} \
      in2=${reverse_fastq} path=${filtered_read_path} outu1=${out1} outu2=${out2}

Thanks!!

bbmap metagenomics
11 months ago

The difference between runs is very small (generally, ~99.99% of reads are unaffected). But you can add the "deterministic" flag to disable the per-thread dynamic calculation of insert size. In that case you may wish to set the "averagepairdist" flag, which defaults to 100; that's the average inner distance between reads, so basically (insert size) - (sum of read lengths), and it can be negative.
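To make the inner-distance arithmetic concrete, here is a small shell sketch. `inner_distance` is a hypothetical helper, not part of BBTools; it only computes (insert size) - (sum of read lengths). With 2x150 bp reads, a 400 bp insert gives 100 (the default), a 300 bp insert gives 0 (the mates abut), and a 250 bp insert gives -50 (the mates overlap by 50 bp), which is why the value can be negative.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not a BBTools command: compute the inner distance
# between mates, i.e. insert size - (read 1 length + read 2 length).
# This is the quantity the averagepairdist setting refers to.
inner_distance() {
  local insert_size=$1 len1=$2 len2=$3
  echo $(( insert_size - (len1 + len2) ))
}

inner_distance 400 150 150   # prints 100 (same as the default)
inner_distance 300 150 150   # prints 0: mates abut
inner_distance 250 150 150   # prints -50: mates overlap by 50 bp
```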

To be clear, the number of threads does not change this. When run multithreaded with paired reads, there will be a very slight nondeterminism even if you run it twice with the same number of threads, unless you use the "deterministic" flag.

So, I would leave it at default unless for some reason you absolutely need 100% repeatable results (for validating a pipeline, for example).
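If you do need repeatable output, a sketch of the command from the question with deterministic mode switched on might look like the following. It only adds the two options named above (`deterministic` and `averagepairdist`); `APD` is a hypothetical shell variable you would set to the inner distance estimated for your own library, and the remaining variables are the ones from the original command. This is a configuration sketch, not a tested invocation.

```shell
# Sketch only: the question's command with deterministic mode enabled.
# APD is a hypothetical variable holding your estimated inner distance
# (insert size minus the sum of both read lengths); the default is 100.
APD=100
bbmap.sh -Xmx${mem}g -t=${threads} minid=0.95 maxindel=3 bwr=0.16 bw=12 \
      quickmatch=t fast=t minhits=2 qtrim=rl trimq=10 untrim=t \
      deterministic=t averagepairdist=${APD} \
      in1=${forward_fastq} in2=${reverse_fastq} path=${filtered_read_path} \
      outu1=${out1} outu2=${out2}
```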

