How to merge two fastq.gz files?
10.6 years ago
newDNASeqer ▴ 760

I need to merge two fastq.gz files. I could decompress the files and run "cat" on them, but is there a faster way? Can I use

gzcat file1.fastq.gz file2.fastq.gz | gzip > merged.fastq.gz

?

merge fastq • 127k views

Try seqtk mergepe; see "A: Merging two fastq files".


How to find and merge all fastq files into one log file

10.6 years ago

See Stack Overflow: Fast Concatenation of Multiple GZip Files

A gzip file consists of a series of "members" (compressed data sets). [...] The members simply appear one after another in the file, with no additional information before, between, or after them.

cat file1.gz file2.gz file3.gz > allfiles.gz

This worked well for me. Note that you can confirm that this works in most cases by doing something like the following:

cat file1.gz file2.gz file3.gz > allfiles-cat.gz
zcat file1.gz file2.gz file3.gz | gzip -c > allfiles-zcat.gz
zcat allfiles-cat.gz | md5sum
zcat allfiles-zcat.gz | md5sum

The resulting hash/message digests should be identical.

My experience is that the zcat method is around 40x slower, but the cat method's resulting file is a few percent bigger, depending on the gzip parameters used.
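To see the size trade-off on your own data, something like the sketch below may help. It builds two toy FASTQ files (the file names and reads are made up for illustration), merges them both ways, and prints the compressed sizes. `gzip -dc` is used in place of `zcat`/`gzcat` because it behaves the same on Linux and macOS.

```shell
#!/bin/sh
# Sketch: compare plain concatenation with decompress-and-recompress.
# All file names and reads here are invented for the demonstration.
set -eu
tmp=$(mktemp -d)
cd "$tmp"

# Build a small FASTQ file (4 lines per record) and compress it twice.
i=0
while [ "$i" -lt 2000 ]; do
    printf '@read%s\nACGTACGTACGTACGT\n+\nIIIIIIIIIIIIIIII\n' "$i"
    i=$((i + 1))
done > sample.fastq
gzip -c sample.fastq > file1.fastq.gz
gzip -c sample.fastq > file2.fastq.gz

# Method 1: byte-level concatenation, nothing is decompressed.
cat file1.fastq.gz file2.fastq.gz > merged-cat.fastq.gz

# Method 2: decompress both inputs and recompress as one gzip member.
gzip -dc file1.fastq.gz file2.fastq.gz | gzip -c > merged-zcat.fastq.gz

# Compressed sizes in bytes (arithmetic expansion strips wc's padding).
size_cat=$(( $(wc -c < merged-cat.fastq.gz) ))
size_zcat=$(( $(wc -c < merged-zcat.fastq.gz) ))
echo "cat: $size_cat bytes, recompressed: $size_zcat bytes"
```

For timings, prefixing each merge command with `time` under bash gives a rough comparison; the actual ratio will depend on file size, disk speed, and the gzip level.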


Note that this will sometimes lead to trouble if tools do not handle multi-member gzip files correctly. I spent an hour or so finding out that FastQC was only reading the first of 10 fastq.gz files I had combined in this way (they seem to have fixed this in the newest release).
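One way to catch this kind of problem is to count the records yourself and compare against what the downstream tool reports. The sketch below uses two made-up single-read files; if a tool sees fewer reads than the count printed here, it is probably stopping after the first gzip member. Recompressing into a single member is one possible workaround, at the cost of the extra time.

```shell
#!/bin/sh
# Sketch: count FASTQ records in a concatenated file and, if needed,
# rewrite it as a single gzip member. File names are hypothetical.
set -eu
tmp=$(mktemp -d)
cd "$tmp"

# Two tiny inputs, one read each.
printf '@r1\nACGT\n+\nIIII\n' | gzip -c > a.fastq.gz
printf '@r2\nTTGA\n+\nIIII\n' | gzip -c > b.fastq.gz
cat a.fastq.gz b.fastq.gz > merged.fastq.gz

# A FASTQ record is 4 lines, so records = decompressed lines / 4.
expected=$(( $(gzip -dc a.fastq.gz b.fastq.gz | wc -l) / 4 ))
merged=$(( $(gzip -dc merged.fastq.gz | wc -l) / 4 ))
echo "inputs hold $expected records, merged file holds $merged"

# Workaround for readers that stop after the first member:
# decompress everything and recompress as one member.
gzip -dc merged.fastq.gz | gzip -c > merged-single.fastq.gz
```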


Why did my two 1.5G fastq.gz files become one 700M file after the cat? Why did it get so much smaller?


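A plain cat copies the compressed bytes unchanged, so the merged file should be the sum of the two inputs (about 3G). The compressor can only find more patterns if the data is decompressed and recompressed, so a 700M result suggests the merge did something else or did not finish.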
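A quick sanity check for a cat-based merge is to compare byte counts and test the archive, since cat only copies bytes and the output size should equal the sum of the inputs. This is a sketch with made-up file names, not a full validation:

```shell
#!/bin/sh
# Sketch: verify that a cat-merged gzip file has the expected size
# and decompresses cleanly. File names are hypothetical.
set -eu
tmp=$(mktemp -d)
cd "$tmp"

printf '@r1\nACGT\n+\nIIII\n' | gzip -c > file1.fastq.gz
printf '@r2\nTTGA\n+\nIIII\n' | gzip -c > file2.fastq.gz
cat file1.fastq.gz file2.fastq.gz > merged.fastq.gz

# Sum of the input sizes versus the merged size, in bytes.
in_bytes=$(( $(wc -c < file1.fastq.gz) + $(wc -c < file2.fastq.gz) ))
out_bytes=$(( $(wc -c < merged.fastq.gz) ))
echo "inputs: $in_bytes bytes, merged: $out_bytes bytes"

# gzip -t checks integrity without writing any output; a nonzero exit
# status means the file is corrupt or truncated.
gzip -t merged.fastq.gz && echo "merged.fastq.gz passes gzip -t"
```

If the two byte counts differ, the merge was either interrupted or was not a plain concatenation.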
