Question: process substitution input
gmdc (Brazil) wrote, 23 months ago:

dbgh5 does not accept input from a pipe [-] or from process substitution such as <(cat )

dbgh5 -in <(zcat big_file.fastq.gz another_huge.fq.gz ...) ...
result:  EXCEPTION: Empty bank

It works fine when the uncompressed or compressed input is given without process substitution, either one file at a time or after concatenating the files beforehand.

The problem is that the concatenated temporary file becomes enormous when there are many huge input files, and it takes time to create.

Is there a reason this is not supported? It would help avoid using extra disk space in some cases.

Or is there something I am missing here?

 


Hello,

I think this is not possible, because the dbgh5 command actually reads the input file several times:

  1. Computing statistics about part of the input file (statistics about minimizers distribution)
  2. Reading the kmers from the input file and dispatching them in partitions (according to the minimizers distribution computed in step 1)

Since a pipe cannot be rewound, there is currently no way to use pipes with dbgh5.
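
To illustrate why a second pass finds nothing, here is a minimal bash sketch. The function two_pass_count is hypothetical: it simply counts lines twice and stands in for any tool that opens its input twice. It is not how dbgh5 is implemented, only an assumption-laden model of the two passes described above.

```shell
#!/usr/bin/env bash
# Sketch: a two-pass reader applied to a regular file and to process substitution.
# two_pass_count is a hypothetical stand-in for a multi-pass tool, not dbgh5 itself.

two_pass_count() {
    local first second
    # Each pass opens "$1" again and reads it to the end.
    first=$(( $(wc -l < "$1") ))
    second=$(( $(wc -l < "$1") ))
    echo "$first $second"
}

tmp=$(mktemp)
printf 'a\nb\nc\n' > "$tmp"

# A regular file can be reopened and read from the start on every pass.
file_result=$(two_pass_count "$tmp")

# Process substitution hands over a pipe: the first pass drains it,
# so the second pass reaches end-of-file immediately.
pipe_result=$(two_pass_count <(cat "$tmp"))

rm -f "$tmp"

echo "regular file:         $file_result"
echo "process substitution: $pipe_result"
```

With bash on Linux this should report 3 lines on both passes for the regular file, and 3 then 0 for the process substitution, which matches the "Empty bank" symptom in spirit.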

 

written 23 months ago by edrezen

I assume you are using DiscoSNP? Check this post: https://www.biostars.org/p/156901/

It is not as if the program authors had, by some magic, implemented random access to gz files...

 

written 23 months ago by Darked89

I am talking about the dbgh5 command itself here (DiscoSNP uses this command to build a de Bruijn graph from the input reads).

As a reminder, the '-in' parameter of dbgh5 can be one of the following:

  1. a fasta file; ex: reads.fa
  2. a gzipped fasta file; ex: reads.fa.gz
  3. a list of fasta files (gzipped or not); ex: r1.fa,r2.fa.gz
  4. a text file containing a list of files, one file per line (possibly another text file); ex:
       r1.fa
       r2.fa.gz
       fileofile.txt

However, a named pipe should not work here, because dbgh5 makes several passes over the '-in' input: the pipe would be consumed during the first pass, leaving nothing to read for the later passes.
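
Given the accepted '-in' forms listed above, one way to avoid a concatenated temporary file might be to pass the compressed files as a list. The following bash sketch builds both list forms. The file names are taken from the question and are placeholders, inputs.txt is a hypothetical name, and the dbgh5 commands are only printed, not run.

```shell
#!/usr/bin/env bash
# Sketch: build the two list forms of '-in' without concatenating any data.
# File names are placeholders; replace them with the real inputs.
inputs=(big_file.fastq.gz another_huge.fq.gz)

# Form 4: a text file with one input path per line (hypothetical name).
list_file=inputs.txt
printf '%s\n' "${inputs[@]}" > "$list_file"

# Form 3: a comma-separated list of the same paths.
csv=$(IFS=,; echo "${inputs[*]}")

# Print the commands instead of running them, since dbgh5 may not be installed here.
echo "dbgh5 -in $list_file"
echo "dbgh5 -in $csv"
```

Either form lets the tool reopen each file for every pass, so no extra disk space is needed beyond the small list file.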

written 23 months ago by edrezen
Darked89 (Barcelona, Spain) wrote, 23 months ago:

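You can try named pipes:

http://www.linuxjournal.com/article/2156

For example:

mkfifo pipe1 pipe2
# Run the writers in the background: opening a FIFO for writing blocks until a reader opens it.
zcat file1.gz > pipe1 &
zcat file2.gz > pipe2 &
# Several inputs are passed to '-in' as a comma-separated list.
dbgh5 -in pipe1,pipe2

Let us know if it works for you, but start with some toy-sized fastq.gz files to test it.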
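
A toy-sized test of the FIFO plumbing itself could look like the sketch below. It assumes a single-pass reader (wc stands in for the real tool), so it only checks that data flows through the named pipe. It says nothing about whether a multi-pass tool such as dbgh5 can use it. All file names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: push a tiny gzipped FASTQ through a named pipe to a single-pass reader.
workdir=$(mktemp -d)

# Toy input: two FASTQ records (8 lines), gzipped.
printf '@r1\nACGT\n+\nIIII\n@r2\nTTGA\n+\nIIII\n' | gzip -c > "$workdir/toy.fastq.gz"

mkfifo "$workdir/pipe1"

# The writer runs in the background: opening a FIFO for writing blocks
# until some process opens it for reading.
gzip -dc "$workdir/toy.fastq.gz" > "$workdir/pipe1" &

# Single-pass reader standing in for the real tool.
lines=$(( $(wc -l < "$workdir/pipe1") ))
wait

rm -rf "$workdir"
echo "lines read through the FIFO: $lines"
```

A reader that tried to open the FIFO a second time would block waiting for a new writer, so a hang at this point would be a sign that the tool makes more than one pass.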
