How to run obisplit in parallel
5.7 years ago

The obisplit command from obitools takes a file of sequences and sorts the reads into separate files, which it names according to the value of a specified sequence attribute. For example, the command

obisplit -p DATA_ -t color input.fastq


will produce the files DATA_red.fastq, DATA_green.fastq, and DATA_blue.fastq.

However, when I try to run this in parallel using GNU parallel, no files are written; the output is printed to the console instead.

I split input.fastq into several files, e.g. input_01.fastq and input_02.fastq (using ngsutils), and run obisplit on them in parallel:

find . * | grep -P "^input_\\d+" | parallel -j+2 obisplit -p DATA_processed_color_{/.}_ -t color {/}


{/.} is the basename of the input file without its extension, so it records which chunk (01 or 02) the data came from, and {/} is the basename of the input_* file as captured by find.
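To make the expansion of these replacement strings concrete, here is a minimal sketch that reproduces them with plain shell parameter expansion, without calling parallel or obisplit at all. `expand_names` is a hypothetical helper written only for illustration; it prints the command line that would be built for one input path. With GNU parallel itself, the `--dry-run` option prints the commands instead of running them, which serves the same purpose.

```shell
# Hypothetical helper: mimic GNU parallel's {/} and {/.} for a single path
# and print the obisplit command that would be built from it.
expand_names() {
    path=$1
    base=${path##*/}      # {/}  : basename, leading directories removed
    stem=${base%.*}       # {/.} : basename with the last extension removed
    printf 'obisplit -p DATA_processed_color_%s_ -t color %s\n' "$stem" "$base"
}

expand_names ./input_01.fastq
expand_names ./input_02.fastq
```

Because {/} drops the directory part, the command only finds its input if it runs from the directory that holds the input_* files.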

How can I convince parallel to write to files instead of the console?


I'm not sure about your parallel command and/or obisplit, but I would try:

ls *.fastq | parallel -j 2 'obisplit -p DATA_processed_color_{.}_ -t color {}'


Although this might not produce exactly the output names you want.


I have tried several variations of what you suggested, but still no dice. What I care about is that each file carries the right designation as assigned by obisplit (e.g. DATA_processed_color_bookkeeping_*red*); the rest is just bookkeeping, which I can handle later on.

5.7 years ago

The solution was more obvious than I initially thought. I was referencing the wrong .fastq file, and for some reason obisplit printed its output to the console for that particular file. When I switched to the right file, everything fell into place.
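A check along the following lines could help spot a wrong input file before launching parallel. It is only a rough sketch and rests on two assumptions that are not stated above: that the reads are stored as four-line FASTQ records, and that the attribute appears in the header in the `color=value;` style. `count_tagged` is a hypothetical helper name, and the match is a plain substring search, so treat the counts as a hint rather than a guarantee.

```shell
# Hypothetical helper: count FASTQ records whose header mentions "<key>=".
# Assumes 4-line FASTQ records and "key=value;" style header annotations.
count_tagged() {
    file=$1
    key=$2
    awk -v key="$key" '
        NR % 4 == 1 && index($0, key "=") { n++ }   # look at header lines only
        END { print n + 0 }                         # print 0 when nothing matched
    ' "$file"
}

# Warn about any chunk in which no read carries the attribute.
for f in input_*.fastq; do
    [ -e "$f" ] || continue            # skip if the glob matched nothing
    n=$(count_tagged "$f" color)
    [ "$n" -gt 0 ] || echo "warning: no 'color' attribute found in $f" >&2
done
```

If a chunk is reported here, it is worth checking that it is the intended file before passing it to obisplit.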