bbduk.sh trimmer with multiple input files
10 months ago
predeus ★ 1.9k

Hi all,

I am using bbduk.sh and was wondering whether there is an efficient way to process multiple sets of reads with it. For example, if read 1 and read 2 are each split across 4 separate files, typical mappers such as bowtie2 or STAR accept a comma-separated list of input files.

Concatenating the files first seems like a waste of I/O, which is already under heavy stress when we process many samples.

bbduk bbmap bbtools • 709 views

Have you tried process substitution or a named pipe? The most efficient way to process the data would still be to start 4 jobs in parallel.
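For the parallel route, a minimal sketch could look like the one below. The file names, `adapters.fa`, and the trimming parameters are placeholders rather than anything taken from this thread, and by default the script only prints the commands (`DRY_RUN=1`) so the layout can be checked before anything is launched.

```shell
#!/usr/bin/env bash
# Sketch: trim each R1/R2 chunk pair as its own bbduk.sh job, all chunks at once.
# Assumptions: sample_<lane>_R{1,2}.fastq.gz naming and adapters.fa are hypothetical.
set -euo pipefail

# DRY_RUN=1 (default) prints the commands; DRY_RUN=0 actually runs bbduk.sh.
DRY_RUN="${DRY_RUN:-1}"

trim_chunk() {
    local lane="$1"
    # Illustrative adapter-trimming settings; adjust to your own protocol.
    local cmd=(bbduk.sh
        in="sample_${lane}_R1.fastq.gz" in2="sample_${lane}_R2.fastq.gz"
        out="trimmed_${lane}_R1.fastq.gz" out2="trimmed_${lane}_R2.fastq.gz"
        ref=adapters.fa ktrim=r k=23 mink=11 hdist=1 tpe tbo threads=4)
    if [ "$DRY_RUN" = "1" ]; then
        echo "${cmd[*]}"
    else
        "${cmd[@]}"
    fi
}

# One background job per chunk pair, then block until all of them finish.
for lane in L001 L002 L003 L004; do
    trim_chunk "$lane" &
done
wait
```

With 4 jobs at `threads=4` each, this assumes roughly 16 free cores; lower either number if the node is shared.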


Thank you for the suggestions! Process substitution fails and I'm not sure why; I think something in bbduk's I/O block chokes on stdin. The errors don't make much sense. I haven't tried a named pipe yet.
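For reference, the named-pipe idea in isolation looks roughly like this. It is only a sketch of the mechanism: `wc -l` stands in for the trimmer, the chunk files are tiny made-up examples, and whether bbduk.sh accepts a FIFO as `in=` is not confirmed anywhere in this thread.

```shell
#!/usr/bin/env bash
# Sketch: present several FASTQ chunks to a reader as one stream via a named pipe,
# without writing a concatenated copy to disk. All file names are hypothetical.
set -euo pipefail

workdir="$(mktemp -d)"
trap 'rm -rf "$workdir"' EXIT

# Two tiny stand-in chunks for read 1 (real chunks would come from the sequencer).
printf '@r1\nACGT\n+\nIIII\n' > "$workdir/chunk1_R1.fq"
printf '@r2\nTTGA\n+\nIIII\n' > "$workdir/chunk2_R1.fq"

# The FIFO has a path like a regular file but stores no data on disk.
mkfifo "$workdir/all_R1.fq"

# Writer: stream the chunks into the pipe in the background.
cat "$workdir/chunk1_R1.fq" "$workdir/chunk2_R1.fq" > "$workdir/all_R1.fq" &

# Reader: any tool that reads its input exactly once, front to back, can consume
# the pipe. wc -l is a stand-in here; a real run would pass the pipe path to the tool.
n_lines="$(wc -l < "$workdir/all_R1.fq")"
wait

echo "records streamed: $((n_lines / 4))"
```

A pipe can only be read once, so a tool that opens its input a second time (for example to probe the format) will not work with this approach.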

Bbduk is already extremely efficient, so even processing the files sequentially is OK. What takes longer is concatenating the trimmed sequences afterwards; when the files are large, which is often the case, this causes hundreds of GB of redundant I/O. For now I have the "extensive" solution, but I'll post here if I find something efficient and sleek.


Merging the BAMs further down the workflow would likely be the most efficient option, since samtools can do it multi-threaded.
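Something along these lines, assuming each chunk was aligned separately and the per-chunk BAMs are already coordinate-sorted. The file names and thread count are placeholders, and the script only prints the command unless `DRY_RUN=0` is set.

```shell
#!/usr/bin/env bash
# Sketch: merge per-chunk BAMs with samtools using extra threads.
# Assumptions: input BAMs are sorted the same way; all file names are hypothetical.
set -euo pipefail

# DRY_RUN=1 (default) prints the command; DRY_RUN=0 actually runs samtools.
DRY_RUN="${DRY_RUN:-1}"

threads=8
chunks=(sample_L001.bam sample_L002.bam sample_L003.bam sample_L004.bam)

# samtools merge takes the output file first, then the inputs; -@ adds worker threads.
merge_cmd=(samtools merge -@ "$threads" sample_merged.bam "${chunks[@]}")

if [ "$DRY_RUN" = "1" ]; then
    echo "${merge_cmd[*]}"
else
    "${merge_cmd[@]}"
fi
```

This way the trimmed FASTQ chunks never need to be concatenated; the only merge happens once, on the alignments.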
