how to run obisplit in parallel
7.5 years ago

The obisplit command from OBITools takes a file of sequences and sorts the reads into separate files, which it names according to the value of a specified sequence attribute. For example, the command

obisplit -p DATA_ -t color input.fastq

will result in files DATA_red.fastq, DATA_green.fastq, DATA_blue.fastq.

However, when I try to run this in parallel using GNU parallel, the output is printed to the console instead of being written to files.

I split input.fastq into several files, e.g. input_01.fastq and input_02.fastq (using ngsutils), and run them in parallel:

find . * | grep -P "^input_\\d+" | parallel -j+2 obisplit -p DATA_processed_color_{/.}_ -t color {/}

{/.} is the basename of the input file without its extension (input_01 or input_02), so it records which chunk the data came from, and {/} is the basename (input_*) of the path captured by find.
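To make the two replacement strings concrete, here is a minimal sketch that mimics them with plain shell parameter expansion and prints the command that would be built for one input. The path `./data/input_01.fastq` is a made-up example, and the expansion only imitates what parallel does; running the real pipeline with `parallel --dry-run` should show the actual commands without executing them.

```shell
# Hypothetical input path, as find might report it.
path="./data/input_01.fastq"

# Equivalent of {/}: strip everything up to the last slash.
base="${path##*/}"

# Equivalent of {/.}: additionally strip the last extension.
stem="${base%.*}"

# Print the obisplit command that would be built for this input.
echo "obisplit -p DATA_processed_color_${stem}_ -t color ${base}"
```

Note that {/} drops the directory part, so the command only finds the file if it lives in the directory parallel is run from.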

How can I convince parallel to write to files instead of the console?

obitools obisplit parallel linux fastq

I'm not sure about your parallel command and/or obisplit, but I would try:

ls *.fastq | parallel -j 2 'obisplit -p DATA_processed_color_{.}_ -t color {}'

Although this might not have the exact output name you would want to have.


I have tried several variations of what you suggested, but still no dice. What I care about is that each file carries the right designation as assigned by obisplit (e.g. DATA_processed_color_bookkeeping_*red*); the rest is just bookkeeping, which I can handle later.

7.5 years ago

The solution was more obvious than I initially thought. I was referencing the wrong .fastq file, and for some reason obisplit printed its output for that particular file to the console. When I switched to the right file, everything fell into place.
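One way to catch this kind of mix-up early is to check each chunk before handing it to parallel. The sketch below is only an illustration: `count_tagged` is a hypothetical helper, and it assumes unwrapped four-line FASTQ records with OBITools-style `key=value;` annotations in the header line. It reports how many reads in each input_*.fastq file carry the `color` attribute, so a file with a count of 0 stands out.

```shell
# Hypothetical helper: count FASTQ header lines that carry a given attribute key.
# Assumes 4-line records, so headers are lines 1, 5, 9, ...
count_tagged() {
    key="$1"
    file="$2"
    awk -v k="$key=" 'NR % 4 == 1 && index($0, k) { n++ } END { print n + 0 }' "$file"
}

# Report the number of color-tagged reads per chunk in the current directory.
for f in input_*.fastq; do
    [ -e "$f" ] || continue   # skip the literal pattern if nothing matches
    printf '%s\t%s\n' "$f" "$(count_tagged color "$f")"
done
```

I don't know that a missing attribute was the cause here, so treat this as a sanity check rather than a diagnosis.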
