Question: How To Merge Two Fastq.Gz Files?
17
7.5 years ago by
newDNASeqer710
United States
newDNASeqer710 wrote:

I need to merge two fastq.gz files. I could decompress the files and run "cat" on them, but is there a faster way? Can I use

gzcat file1.fastq.gz file2.fastq.gz | gzip > merged.fastq.gz

?

merge fastq • 84k views
modified 7.5 years ago by Pierre Lindenbaum134k • written 7.5 years ago by newDNASeqer710

Try seqtk mergepe; see A: Merging two fastq files

written 3.9 years ago by pengchy430

How can I find and merge all fastq files into one log file?

written 3.6 years ago by adithyankala0
51
7.5 years ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum134k wrote:

See Stack Overflow: Fast Concatenation of Multiple GZip Files

A gzip file consists of a series of "members" (compressed data sets). [...] The members simply appear one after another in the file, with no additional information before, between, or after them.

cat file1.gz file2.gz file3.gz > allfiles.gz
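
To see the multi-member behaviour on a small scale, here is a minimal sketch. The file names and the two one-read FASTQ files are made up for illustration, and it uses gzip -dc (equivalent to zcat) so it should behave the same on most systems. It concatenates two gzipped files, tests the result with gzip -t, and counts the reads.

```shell
set -eu
# Work in a scratch directory so nothing real is overwritten.
workdir=$(mktemp -d)
cd "$workdir"

# Two tiny, made-up FASTQ files (one read each, four lines per read).
printf '@read1\nACGT\n+\nIIII\n' > a.fastq
printf '@read2\nTTGA\n+\nIIII\n' > b.fastq
gzip a.fastq b.fastq          # produces a.fastq.gz and b.fastq.gz

# Byte-level concatenation: the result holds two gzip members.
cat a.fastq.gz b.fastq.gz > merged.fastq.gz

# Integrity check, then count reads (lines / 4) across both members.
gzip -t merged.fastq.gz
lines=$(gzip -dc merged.fastq.gz | wc -l)
reads=$((lines / 4))
echo "reads in merged file: $reads"
```

If the merge worked, the read count should equal the sum of the reads in the input files (2 in this toy case).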
modified 2.1 years ago by Ram32k • written 7.5 years ago by Pierre Lindenbaum134k
8

This worked well for me. Note that you can confirm that this works in most cases by doing something like the following:

cat file1.gz file2.gz file3.gz > allfiles-cat.gz
zcat file1.gz file2.gz file3.gz | gzip -c > allfiles-zcat.gz
zcat allfiles-cat.gz | md5sum
zcat allfiles-zcat.gz | md5sum

The resulting hash/message digests should be identical.

In my experience, the zcat method is around 40x slower, but the file produced by the cat method is a few percent bigger, depending on the gzip parameters used in each method.
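
If you want to measure the size difference on your own data, something along these lines might do. This is only a sketch: the input is generated filler data with hypothetical file names, and the actual numbers will depend on your reads and gzip settings.

```shell
set -eu
workdir=$(mktemp -d)
cd "$workdir"

# Made-up input: 2000 short reads, duplicated into a second file,
# only so there is something to compress.
i=0
while [ "$i" -lt 2000 ]; do
    printf '@read%s\nACGTACGTAC\n+\nIIIIIIIIII\n' "$i"
    i=$((i + 1))
done > a.fastq
cp a.fastq b.fastq
gzip a.fastq b.fastq

# Method 1: plain concatenation (no recompression).
cat a.fastq.gz b.fastq.gz > merged-cat.fastq.gz
# Method 2: decompress everything and recompress as one stream.
gzip -dc a.fastq.gz b.fastq.gz | gzip -c > merged-recompressed.fastq.gz

size_cat=$(wc -c < merged-cat.fastq.gz)
size_re=$(wc -c < merged-recompressed.fastq.gz)
echo "cat: $size_cat bytes, recompressed: $size_re bytes"

# Both files should decompress to the same content.
sum_cat=$(gzip -dc merged-cat.fastq.gz | cksum)
sum_re=$(gzip -dc merged-recompressed.fastq.gz | cksum)
```

Prefixing either merge command with time gives a rough idea of the speed difference as well.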

modified 2.1 years ago by Ram32k • written 5.7 years ago by alan90
3

Note that this will sometimes lead to trouble if tools do not implement gzip decompression correctly. I spent an hour or so finding out that FastQC was only reading the first of the 10 fastq.gz files I had combined in this way (they seem to have fixed this in the newest release).
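
If a downstream tool seems to stop after the first member, one possible workaround is to rewrite the concatenated file as a single gzip member, at the cost of one decompress/recompress pass. A minimal sketch, with made-up file names and toy reads:

```shell
set -eu
workdir=$(mktemp -d)
cd "$workdir"

# Two toy inputs, concatenated into a file with two gzip members.
printf '@read1\nACGT\n+\nIIII\n' | gzip -c > a.fastq.gz
printf '@read2\nTTGA\n+\nIIII\n' | gzip -c > b.fastq.gz
cat a.fastq.gz b.fastq.gz > merged.fastq.gz

# Rewrite as a single gzip member for tools that stop after the first one.
gzip -dc merged.fastq.gz | gzip -c > merged-single.fastq.gz

# Both files should decompress to the same number of lines.
lines_multi=$(gzip -dc merged.fastq.gz | wc -l)
lines_single=$(gzip -dc merged-single.fastq.gz | wc -l)
echo "multi-member: $lines_multi lines, single-member: $lines_single lines"
```

Comparing the read count a tool reports against lines / 4 from gzip -dc is a quick way to spot a tool that only read the first member.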

written 7.4 years ago by lelle830

Why did my two 1.5G fastq.gz files become one 700M file after the cat? Why did it get so much smaller?

written 3.7 years ago by xiaoyonf40

Probably the compression found more patterns when more data was added, thus it could reduce more information.
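
Note that plain cat copies the compressed bytes as they are and does not recompress anything, so the merged file should be exactly as large as the inputs added together. A smaller file may mean the copy was incomplete or that a different command was used. A quick way to check, sketched here with hypothetical file names and toy data:

```shell
set -eu
workdir=$(mktemp -d)
cd "$workdir"

# Toy inputs standing in for the real fastq.gz files.
printf '@read1\nACGT\n+\nIIII\n' | gzip -c > file1.fastq.gz
printf '@read2\nTTGA\n+\nIIII\n' | gzip -c > file2.fastq.gz
cat file1.fastq.gz file2.fastq.gz > merged.fastq.gz

# Compare the merged size with the sum of the input sizes.
size1=$(wc -c < file1.fastq.gz)
size2=$(wc -c < file2.fastq.gz)
merged=$(wc -c < merged.fastq.gz)
expected=$((size1 + size2))
echo "expected $expected bytes, got $merged bytes"

if [ "$merged" -ne "$expected" ]; then
    echo "size mismatch: the merge did not copy every input byte" >&2
fi
```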

written 17 months ago by Lluís R.1000

I followed this and just used cat to combine my .fastq.gz files. For anyone else, make sure you use zcat and not just cat to combine gzipped files. zcat decompresses first, cat does not. If you combine gzipped fastq files with just cat, you will get gibberish downstream.

written 9 months ago by omg what am I doing70
2

This is wrong. 'cat' is OK and faster. Running 'cat' on a set of gzipped files will produce a concatenated gzipped fastq file.

written 9 months ago by Pierre Lindenbaum134k