It seems that 'fastuniq' does not support fq.gz files. How can I work around this without decompressing the files first?
To avoid creating large intermediate files, you might try running fastuniq with named pipes.
To set up the pipes:
$ mkfifo example.pair1.fastq.gz.pipe
$ mkfifo example.pair2.fastq.gz.pipe
$ gunzip -c example.pair1.fastq.gz > example.pair1.fastq.gz.pipe &
$ gunzip -c example.pair2.fastq.gz > example.pair2.fastq.gz.pipe &
Set up a list of filenames from the named pipes:
$ cat pipelist.txt
example.pair1.fastq.gz.pipe
example.pair2.fastq.gz.pipe
Then run fastuniq with your list and options:
$ fastuniq -i pipelist.txt ...
When you are done, delete the pipes:
$ rm example.pair*.fastq.gz.pipe
Not all binaries accept named pipes, though, so this may not work. But it might be worth trying.
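The steps above can be wrapped in a small script so that the pipes and the list file are created and removed together. This is only a sketch: `setup_pipes`, `cleanup_pipes` and `PIPE_DIR` are hypothetical names made up here, and it assumes fastuniq reads each listed file once, front to back, which is the same caveat that applies to named pipes in general.

```shell
#!/usr/bin/env bash
# Sketch: feed gzipped FASTQ files to a tool that only reads plain files,
# using named pipes (FIFOs) instead of decompressed copies on disk.

# setup_pipes LIST FILE.gz [FILE.gz ...]
# Creates one FIFO per input in a fresh temp directory, starts a background
# gunzip writing into each FIFO, and writes the FIFO paths (one per line)
# to LIST. The temp directory path is left in the global PIPE_DIR.
setup_pipes() {
    local list=$1; shift
    PIPE_DIR=$(mktemp -d) || return 1
    : > "$list"
    local gz pipe
    for gz in "$@"; do
        pipe="$PIPE_DIR/$(basename "$gz").pipe"
        mkfifo "$pipe" || return 1
        # Blocks in the background until a reader opens the pipe.
        gunzip -c "$gz" > "$pipe" &
        printf '%s\n' "$pipe" >> "$list"
    done
}

# cleanup_pipes: remove the FIFOs created by setup_pipes.
cleanup_pipes() {
    rm -rf "$PIPE_DIR"
}

# Intended use (fastuniq options elided, as in the answer above):
#   setup_pipes pipelist.txt example.pair1.fastq.gz example.pair2.fastq.gz
#   fastuniq -i pipelist.txt ...
#   wait            # let the background gunzip jobs finish
#   cleanup_pipes
```

Keeping the pipes in a temporary directory avoids leaving stray `.pipe` files next to the data if the run is interrupted.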
Works great! (also with multiple .gz files).
Thank you very much. I'll try.
Because FastUniq requires a list of files as input, you can only use it by decompressing the files first. Alternatively, you could use another tool, such as Dedupe from BBTools or SuperDeduper; both accept .gz files as input.
Thank you for your recommendation! Do you know which software is better at removing PCR duplicate reads, regardless of the input data format?
If it doesn't support zipped files there is no way around this. Your best approach is to unzip the file, pipe the data into fastuniq, then zip the output. For example:
gunzip sample.fastq | fastuniq | gzip -c > uniq.fastq.gz
Running gunzip like that will not write uncompressed data to standard output. Also, fastuniq does not appear to accept standard input.
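To illustrate the gunzip point, here is a small sketch using a throwaway file in a temporary directory (the file name and the variables `demo_dir`, `streamed` and `silent` are made up for the demo). It only shows the difference between `gunzip` and `gunzip -c`; it does not make fastuniq read standard input.

```shell
# Build a tiny gzipped FASTQ in a temp directory (file name is hypothetical).
demo_dir=$(mktemp -d)
printf '@r1\nACGT\n+\nIIII\n' | gzip -c > "$demo_dir/sample.fastq.gz"

# With -c, gunzip writes the decompressed data to standard output
# and leaves the .gz file in place.
streamed=$(gunzip -c "$demo_dir/sample.fastq.gz")
printf '%s\n' "$streamed"

# Without -c, gunzip replaces the .gz file with sample.fastq on disk
# and writes nothing to standard output, so a pipe would receive no data.
silent=$(gunzip "$demo_dir/sample.fastq.gz")
ls "$demo_dir"
```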
Thank you. But fastuniq does not accept standard input.