bbduk.sh trimmer with multiple input files

10 months ago

predeus ★ 1.9k

Hi all,

I am using bbduk.sh and was wondering whether there is an efficient way to process multiple sets of reads with it. For example, if read 1 and read 2 are each split across 4 separate files, typical mappers like bowtie2 or STAR accept a comma-separated list of files.

Concatenating the files seems like a waste of I/O, which is already under heavy stress when we process many samples.

bbduk bbmap bbtools • 709 views

updated 10 months ago by GenoMax 141k • written 10 months ago by predeus ★ 1.9k

Have you tried using process substitution or a named pipe? The most efficient way to process the data would still be to start 4 jobs in parallel.
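A minimal sketch of the named-pipe idea, assuming uncompressed FASTQ chunks: each FIFO is fed by a background `cat`, so no concatenated copy is written to disk. The helper `stream_to_fifo` and all file names are hypothetical. The FIFOs are given a `.fq` name because BBTools generally infers the format from the file extension; whether BBDuk reads FIFOs cleanly in a given version is untested here and worth checking on a small sample first.

```shell
# stream_to_fifo FIFO FILE...
# Hypothetical helper (not part of BBTools): create FIFO and feed it the
# given files, in order, from a background job.
stream_to_fifo() {
    rm -f "$1"
    mkfifo "$1" || return 1
    # Subshell keeps the shifted argument list local to the writer.
    # The writer blocks until a reader opens the FIFO.
    ( fifo=$1; shift; cat "$@" > "$fifo" ) &
}

# Intended use (file names are placeholders; the BBDuk options follow the
# adapter-trimming example in the BBDuk guide):
#
#   stream_to_fifo r1.fq s_L001_R1.fq s_L002_R1.fq s_L003_R1.fq s_L004_R1.fq
#   stream_to_fifo r2.fq s_L001_R2.fq s_L002_R2.fq s_L003_R2.fq s_L004_R2.fq
#   bbduk.sh in1=r1.fq in2=r2.fq out1=trim_R1.fq.gz out2=trim_R2.fq.gz \
#       ref=adapters.fa ktrim=r k=23 mink=11 hdist=1 tpe tbo
#   wait              # let both background writers finish
#   rm -f r1.fq r2.fq
```

The R1 and R2 chunks must be listed in the same order so that the pairs stay in sync. For gzipped chunks, `cat` could be swapped for `gzip -dc` in the helper.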

10 months ago by GenoMax 141k

Thank you for the suggestions! Process substitution fails and I'm not sure why; I think something in its I/O layer chokes on stdin. The errors don't make much sense. I haven't tried a named pipe yet.

BBDuk is already extremely efficient, so even running things sequentially is OK. What takes longer is concatenating the sequences afterwards, and when the files are large, which is often, this causes hundreds of GB of redundant I/O. For now I have the "extensive" solution, but I'll post here if I find something efficient and sleek.

10 months ago by predeus ★ 1.9k

Merging the BAMs further down the workflow would likely be the most efficient approach, since samtools can do that multi-threaded.
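As a rough sketch of that late-merge step, assuming each chunk was trimmed, mapped, and sorted on its own: `samtools merge` takes the output file first and accepts `-@` for additional threads. The wrapper `merge_chunk_bams`, the `SAMTOOLS` override variable, and the file names are hypothetical.

```shell
# merge_chunk_bams OUT THREADS IN...
# Hypothetical wrapper: merge per-chunk BAMs into OUT using THREADS
# additional threads. Set SAMTOOLS to point at a specific binary.
# The function body is a subshell, so its variables stay local.
merge_chunk_bams() (
    out=$1
    threads=$2
    shift 2
    "${SAMTOOLS:-samtools}" merge -@ "$threads" "$out" "$@"
)

# Intended use (placeholder file names):
#   merge_chunk_bams sample.bam 8 chunk1.bam chunk2.bam chunk3.bam chunk4.bam
```

The input BAMs are expected to be sorted the same way before merging.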

10 months ago by GenoMax 141k
