Question: process substitution input
3.6 years ago by
gmdc0 wrote:

dbgh5 does not accept input from a pipe [-] or from process substitution, <(cat ...)

dbgh5 -in <(zcat big_file.fastq.gz another_huge.fq.gz ...) ...
result:  EXCEPTION: Empty bank

It works fine when the input, compressed or uncompressed, is given without process substitution, either one file at a time or after concatenating the files beforehand.

The problem is that the concatenated temporary file becomes humongous when there are many huge input files, and creating it takes time.

Is there a reason not to support this? It would help avoid using extra disk space in some cases.

Or is there something that I am missing here?


gatb dbgh5 • 968 views
modified 3.6 years ago by edrezen720 • written 3.6 years ago by gmdc0


I don't think this is possible, because the dbgh5 command actually reads the input file several times:

  1. Computing statistics about part of the input file (statistics about minimizers distribution)
  2. Reading the kmers from the input file and dispatching them in partitions (according to the minimizers distribution computed in step 1)

Since you can't rewind a pipe (see here), there is no way right now to use pipes with dbgh5.
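
To make the rewind problem concrete, here is a minimal sketch. It does not use dbgh5 at all: count_twice is a hypothetical stand-in for any tool that opens its input by name and scans it in two passes. On Linux, process substitution hands the tool a /dev/fd/N path backed by the same kind of pipe as the /dev/stdin case below, so it should behave the same way.

```shell
#!/usr/bin/env bash
# count_twice is a hypothetical stand-in for a tool that opens its input
# by name and scans it in two separate passes (it is NOT dbgh5).
count_twice() {
    local pass1 pass2
    pass1=$(wc -l < "$1" | tr -d ' ')   # pass 1: read to the end of the input
    pass2=$(wc -l < "$1" | tr -d ' ')   # pass 2: reopen the same name, read again
    echo "pass1=$pass1 pass2=$pass2"
}

# Regular file: every open starts at offset 0, so both passes see all 3 lines.
tmp=$(mktemp)
printf 'r1\nr2\nr3\n' > "$tmp"
from_file=$(count_twice "$tmp")

# Pipe: bytes are gone once they are read, so the second pass finds nothing.
from_pipe=$(printf 'r1\nr2\nr3\n' | count_twice /dev/stdin)

rm -f "$tmp"
echo "file: $from_file"
echo "pipe: $from_pipe"
```

The second pass over the pipe comes back empty, which would be consistent with an "Empty bank" style failure in a multi-pass reader, though the exact cause inside dbgh5 is not confirmed here.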


written 3.6 years ago by edrezen720

I assume you are using DiscoSNP? Check this post:

It's not as if the program authors have, by some magic, implemented random access to gz files...


written 3.6 years ago by Darked894.2k

Here I talk about the dbgh5 command itself (DiscoSNP uses this command to build a de Bruijn graph from the input reads).

For reference, the '-in' parameter of dbgh5 can be one of the following:

  1. a fasta file; ex: reads.fa
  2. a gzipped fasta file; ex: reads.fa.gz
  3. a list of fasta files (gzipped or not); ex: r1.fa,r2.fa.gz
  4. a text file containing a list of files, one file per line (possibly another text file); ex:

However, a named pipe should not work here, because dbgh5 makes several passes over the '-in' input: the pipe would be consumed during the first pass, leaving nothing to read for the later passes.
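
Since forms 3 and 4 in the list above take several files directly, they may avoid the large concatenated temporary file. The sketch below only builds the two kinds of argument and prints the resulting commands. The file names are placeholders, and it assumes the list forms accept the same gzipped FASTQ files that already work when given one at a time.

```shell
#!/usr/bin/env bash
# Hypothetical input names; substitute your own gzipped read files.
set -- big_file.fastq.gz another_huge.fq.gz

# Form 3: a single comma-separated argument ("$*" joins on the first IFS char).
csv=$(IFS=,; echo "$*")

# Form 4: a text file with one path per line (the file name is arbitrary).
list=$(mktemp)
printf '%s\n' "$@" > "$list"

# Print the commands rather than run them, since dbgh5 may not be installed.
echo "dbgh5 -in $csv"
echo "dbgh5 -in $list"
```

Either way the inputs stay compressed on disk, and the tool can reopen each file for every pass.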

written 3.6 years ago by edrezen720
3.6 years ago by
Barcelona, Spain
Darked894.2k wrote:

You can try using named pipes:

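mkfifo pipe1
mkfifo pipe2
# Run the writers in the background: opening a FIFO for writing
# blocks until a reader opens the other end.
zcat file1.gz > pipe1 &
zcat file2.gz > pipe2 &
# Multiple inputs go to -in as one comma-separated list.
dbgh5 -in pipe1,pipe2
rm pipe1 pipe2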

Let us know if it works for you, but start with some toy-sized fastq.gz files just to test it.

written 3.6 years ago by Darked894.2k