Question: How To Merge Two Fastq.Gz Files?
6.4 years ago by
United States
newDNASeqer670 wrote:

I need to merge two fastq.gz files. I could decompress the files and run "cat" on them, but is there a faster way? Can I use the following?

gzcat file1.fastq.gz file2.fastq.gz | gzip > merged.fastq.gz


merge fastq • 70k views
modified 6.4 years ago by Pierre Lindenbaum126k • written 6.4 years ago by newDNASeqer670

Try seqtk mergepe; see A: Merging two fastq files

written 2.8 years ago by pengchy410

How can I find all fastq files and merge them into one log file?

written 2.5 years ago by adithyankala0
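
For the "find and merge everything" case, something along these lines might work. It is only a sketch: the directory layout (runs/lane1, runs/lane2), the file names, and the output name all.fastq.gz are made up for the demo, and it assumes file names contain no newlines. The output is written outside the searched directory so that find cannot pick it up as an input.

```shell
#!/bin/sh
set -eu

# Hypothetical demo layout: a scratch directory with a few tiny gzipped FASTQ files.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
mkdir -p "$workdir/runs/lane1" "$workdir/runs/lane2"

printf '@r1\nACGT\n+\nIIII\n' | gzip -c > "$workdir/runs/lane1/a.fastq.gz"
printf '@r2\nTTGA\n+\nIIII\n' | gzip -c > "$workdir/runs/lane2/b.fastq.gz"
printf '@r3\nGGCC\n+\nIIII\n' | gzip -c > "$workdir/runs/lane2/c.fastq.gz"

# Find every *.fastq.gz under runs/, sort for a reproducible order,
# and concatenate the compressed files as-is (no decompression needed).
merged="$workdir/all.fastq.gz"
find "$workdir/runs" -type f -name '*.fastq.gz' | sort |
while IFS= read -r f; do
    cat "$f"
done > "$merged"

# Count records in the merged file (4 lines per FASTQ record).
merged_records=$(( $(gzip -dc "$merged" | wc -l) / 4 ))
echo "merged records: $merged_records"
```

Sorting the file list is a design choice rather than a requirement; it just makes the read order in the merged file predictable from run to run.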
6.4 years ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum126k wrote:

See Stack Overflow: Fast Concatenation of Multiple GZip Files

A gzip file consists of a series of "members" (compressed data sets). [...] The members simply appear one after another in the file, with no additional information before, between, or after them.

cat file1.gz file2.gz file3.gz > allfiles.gz
modified 12 months ago by RamRS25k • written 6.4 years ago by Pierre Lindenbaum126k
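
The "members" behaviour quoted above can be tried on a pair of throwaway files. This is a minimal sketch: the read names and paths are invented, and `gzip -dc` is used in place of zcat because zcat behaves differently on some systems. It concatenates two gzip files byte for byte and checks that decompressing the result gives the same text as concatenating the uncompressed files.

```shell
#!/bin/sh
set -eu

# Hypothetical scratch files; substitute real paths in practice.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

printf '@read1\nACGT\n+\nIIII\n' > "$tmp/file1.fastq"
printf '@read2\nTGCA\n+\nIIII\n' > "$tmp/file2.fastq"
gzip -c "$tmp/file1.fastq" > "$tmp/file1.fastq.gz"
gzip -c "$tmp/file2.fastq" > "$tmp/file2.fastq.gz"

# Byte-level concatenation: the result holds two gzip members back to back.
cat "$tmp/file1.fastq.gz" "$tmp/file2.fastq.gz" > "$tmp/merged.fastq.gz"

# Decompressing the merged file should yield file1 followed by file2.
cat "$tmp/file1.fastq" "$tmp/file2.fastq" > "$tmp/expected.fastq"
gzip -dc "$tmp/merged.fastq.gz" > "$tmp/actual.fastq"

if cmp -s "$tmp/expected.fastq" "$tmp/actual.fastq"; then
    result="identical"
else
    result="different"
fi
echo "decompressed content vs. plain concatenation: $result"
```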

This worked well for me. Note that you can confirm that this works in most cases by doing something like the following:

cat file1.gz file2.gz file3.gz > allfiles-cat.gz
zcat file1.gz file2.gz file3.gz | gzip -c > allfiles-zcat.gz
zcat allfiles-cat.gz | md5sum
zcat allfiles-zcat.gz | md5sum

The resulting hash/message digests should be identical.

My experience is that the zcat method is around 40x slower, but the cat method's resulting file is a few percent bigger, depending on the gzip parameters used.

modified 12 months ago by RamRS25k • written 4.7 years ago by alan80

Note that this will sometimes lead to trouble if tools do not handle concatenated (multi-member) gzip files correctly. I spent an hour or so finding out that FastQC was only reading the first of 10 fastq.gz files I had combined in this way (they seem to have fixed this in the newest release).

written 6.4 years ago by lelle820
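
One way to guard against that kind of silent truncation might be to count the records before and after merging, then compare that number with what the downstream tool reports. The sketch below uses invented file names and a hypothetical helper, count_records. It only shows that gzip itself sees every member; whether a particular tool does still has to be checked against the tool's own read count. The last step recompresses the data into a single gzip member, which is slower but avoids the multi-member issue altogether.

```shell
#!/bin/sh
set -eu

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# Two tiny hypothetical inputs, merged by plain concatenation.
printf '@read1\nACGT\n+\nIIII\n' | gzip -c > "$tmp/file1.fastq.gz"
printf '@read2\nTGCA\n+\nIIII\n@read3\nGGAA\n+\nIIII\n' | gzip -c > "$tmp/file2.fastq.gz"
cat "$tmp/file1.fastq.gz" "$tmp/file2.fastq.gz" > "$tmp/merged.fastq.gz"

# Helper (hypothetical name): number of FASTQ records in a gzipped file,
# assuming the usual 4 lines per record.
count_records() {
    echo $(( $(gzip -dc "$1" | wc -l) / 4 ))
}

expected=$(( $(count_records "$tmp/file1.fastq.gz") + $(count_records "$tmp/file2.fastq.gz") ))
merged=$(count_records "$tmp/merged.fastq.gz")
echo "records expected: $expected, records in merged file: $merged"

# Fallback for tools that stop after the first member:
# recompress everything into a single gzip member.
gzip -dc "$tmp/merged.fastq.gz" | gzip -c > "$tmp/merged.single.fastq.gz"
single=$(count_records "$tmp/merged.single.fastq.gz")
echo "records after recompression: $single"
```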

Why did my two 1.5G fastq.gz files become one 700M file after the cat? Why did it get so much smaller?

written 2.7 years ago by xiaoyonf10

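Probably the compression found more patterns when more data was added, so it could shrink the data further. Note that this can only apply if the data was recompressed; a plain cat copies the bytes unchanged, so its output should be exactly as large as the inputs combined.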

written 5 months ago by Lluís R.890
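
The size question can be checked directly on synthetic data. In this sketch the reads, file names, and the helper named size are all made up. It compares the byte size of a cat-merged file with the sum of the input sizes, and prints the size of a recompressed merge next to it. If a merged file comes out far smaller than its inputs, it may be worth checking that every input was actually read and written.

```shell
#!/bin/sh
set -eu

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# Hypothetical inputs: repetitive synthetic reads so gzip has something to compress.
i=0
while [ "$i" -lt 500 ]; do
    printf '@read%d\nACGTACGTACGTACGT\n+\nIIIIIIIIIIIIIIII\n' "$i"
    i=$((i + 1))
done > "$tmp/reads.fastq"
gzip -c "$tmp/reads.fastq" > "$tmp/file1.fastq.gz"
gzip -c "$tmp/reads.fastq" > "$tmp/file2.fastq.gz"

# Merge once by concatenation and once by decompressing and recompressing.
cat "$tmp/file1.fastq.gz" "$tmp/file2.fastq.gz" > "$tmp/merged-cat.fastq.gz"
gzip -dc "$tmp/file1.fastq.gz" "$tmp/file2.fastq.gz" | gzip -c > "$tmp/merged-recompressed.fastq.gz"

# Helper (hypothetical name): file size in bytes.
size() { echo $(( $(wc -c < "$1") )); }

size1=$(size "$tmp/file1.fastq.gz")
size2=$(size "$tmp/file2.fastq.gz")
size_cat=$(size "$tmp/merged-cat.fastq.gz")
size_re=$(size "$tmp/merged-recompressed.fastq.gz")

echo "inputs: $size1 + $size2 bytes"
echo "cat: $size_cat bytes"
echo "recompressed: $size_re bytes"
```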