Question: Observed and expected FASTQ header counts differ after merging two FASTQ files
Hi All,

I recently concatenated two FASTQ files from one library (same sample, sequenced on different lanes) using the following command:

cat Sample01_L001_R1.all.fastq.gz Sample01_L006_R1.all.fastq.gz > Sample01_L001-6_R1.all.fastq.gz

Example FASTQ header line:

@HISEQ:137:C8W59ACXX:1:1101:1183:2157 1:N:0:TATGGC

To check that all reads were copied to the output file "Sample01_L001-6_R1.all.fastq.gz", I counted the header lines in each file using the following commands:

$ zcat Sample01_L001_R1.all.fastq.gz | grep '@HISEQ:137' | wc -l  


$ zcat Sample01_L006_R1.all.fastq.gz | grep '@HISEQ:137' | wc -l 


$ zcat Sample01_L001-6_R1.all.fastq.gz | grep '@HISEQ:137' | wc -l  


The expected count for the merged file (the sum of the two input counts) is 56,340,558.

Why does the header count of the merged file differ from the expected value?
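A slightly stricter version of the same check can be sketched in bash, assuming GNU-compatible `gzip` and `grep`. It anchors the pattern with `^` so only lines that begin with the prefix are counted, lets `grep -c` do the counting, and compares the merged file against the sum of the inputs. The helper names `count_headers` and `check_merge` are hypothetical.

```bash
# count_headers FILE PREFIX
# Print the number of lines in a gzipped FASTQ that START with PREFIX.
# PREFIX is used as a basic regular expression; '@HISEQ:137' has no special characters.
count_headers() {
  local file=$1 prefix=$2
  gzip -dc "$file" | grep -c "^${prefix}"
}

# check_merge PREFIX MERGED INPUT1 [INPUT2 ...]
# Print expected (sum over inputs) and observed (merged) header counts;
# return 0 only if they match.
check_merge() {
  local prefix=$1 merged=$2
  shift 2
  local expected=0 observed n f
  for f in "$@"; do
    n=$(count_headers "$f" "$prefix")
    expected=$(( expected + n ))
  done
  observed=$(count_headers "$merged" "$prefix")
  echo "expected=$expected observed=$observed"
  [ "$expected" -eq "$observed" ]
}
```

For example, `check_merge '@HISEQ:137' Sample01_L001-6_R1.all.fastq.gz Sample01_L001_R1.all.fastq.gz Sample01_L006_R1.all.fastq.gz` prints both totals and returns a non-zero exit status when they differ.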

written 5.0 years ago by nalandaatmi90

Hard to say. Do it again; I agree that the counts should be the same. It is possible that the file has been corrupted in some way.
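One quick way to test the corruption hypothesis: `gzip -t` checks a compressed file without writing any output and exits with a non-zero status if the stream is damaged or truncated. A minimal bash sketch looping over several files, using a hypothetical helper named `check_gzip`:

```bash
# check_gzip FILE...
# Run gzip's integrity test on each file and report OK/CORRUPT.
# Returns non-zero if any file fails the test.
check_gzip() {
  local f status=0
  for f in "$@"; do
    if gzip -t "$f" 2>/dev/null; then
      echo "OK       $f"
    else
      echo "CORRUPT  $f"
      status=1
    fi
  done
  return "$status"
}
```

For example: `check_gzip Sample01_L001_R1.all.fastq.gz Sample01_L006_R1.all.fastq.gz Sample01_L001-6_R1.all.fastq.gz`.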

written 5.0 years ago by Istvan Albert ♦♦ 85k

Thanks Istvan Albert. I will try it again.

written 5.0 years ago by nalandaatmi90

Have you tried counting all lines and dividing by 4 to see what number you get?
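The line-count check can be sketched in bash as follows, assuming standard 4-line FASTQ records (one line each for header, sequence, `+`, and quality, with no wrapped sequence lines, as in typical Illumina output). A non-zero remainder means the file ends partway through a record. The helper name `count_records` is hypothetical.

```bash
# count_records FILE...
# For each gzipped FASTQ, print total lines, implied records (lines / 4),
# and the remainder (non-zero means a truncated final record).
count_records() {
  local f lines
  for f in "$@"; do
    # $(( ... )) strips the leading spaces some wc implementations print
    lines=$(( $(gzip -dc "$f" | wc -l) ))
    echo "$f lines=$lines records=$(( lines / 4 )) remainder=$(( lines % 4 ))"
  done
}
```

For the merged file, the `records` value should equal the sum of the values for the two lane files.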

To be safe, you can also try this instead:

$ zcat seq1.fq.gz seq2.fq.gz | gzip -c > all.fq.gz
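One caveat with any such pipeline: by default a shell pipeline's exit status is that of its last command, so if decompression of a damaged input fails partway, `gzip -c` can still exit successfully after writing a shorter file. A minimal bash sketch that makes the failure visible, using a hypothetical `merge_fastq` helper that enables `pipefail` and then tests the new file with `gzip -t`:

```bash
# merge_fastq OUT IN1 [IN2 ...]
# Decompress all inputs, recompress them into one gzip stream, then test the result.
# The body runs in a subshell ( ... ) so that pipefail does not leak into the caller.
merge_fastq() (
  set -o pipefail          # a failing gzip -dc now fails the whole pipeline
  out=$1
  shift
  gzip -dc "$@" | gzip -c > "$out" || exit 1
  gzip -t "$out"           # exit status 0 only if the new file decompresses cleanly
)
```

For example: `merge_fastq Sample01_L001-6_R1.all.fastq.gz Sample01_L001_R1.all.fastq.gz Sample01_L006_R1.all.fastq.gz || echo "merge failed"`.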

modified 5.0 years ago • written 5.0 years ago by genomax 92k

What are those numbers without the grep (i.e., counting all lines)?

modified 5.0 years ago • written 5.0 years ago by Pierre Lindenbaum 131k

I know this doesn't answer your question directly, but you can simply pass all your FASTQ files to the aligner; most aligners can merge them for you. I know for sure that STAR does.
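For STAR, several files belonging to the same mate are given to `--readFilesIn` as one comma-separated list, and `--readFilesCommand zcat` lets it read gzipped input. The small bash helper below (hypothetical name `comma_join`) builds that list; the commented STAR call uses placeholder values for the index path and thread count. For paired-end data, a second comma-separated list for R2 would follow after a space.

```bash
# comma_join ARG...
# Join arguments with commas, the separator STAR's --readFilesIn expects
# for multiple files of the same mate.
comma_join() {
  local IFS=,
  echo "$*"
}

# Hypothetical usage (index path and thread count are placeholders):
# STAR --runThreadN 8 --genomeDir /path/to/star_index \
#      --readFilesCommand zcat \
#      --readFilesIn "$(comma_join Sample01_L001_R1.all.fastq.gz Sample01_L006_R1.all.fastq.gz)"
```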

written 5.0 years ago by Kirill 290
