Forgive the silly question, but I'm having a problem with concatenation that is driving me a little mad.
I have a sequencing run in which the reads for each sample are spread across multiple lanes, so I wanted to concatenate each sample's lane files before mapping and further downstream analysis.
I looked up how to concatenate multiple FASTQ files on Biostars and found this great answer: "merge large amount of fastq files into a single one".
I then concatenated the lane files with:
cat *fastq.gz > merged.fastq.gz
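For reference, here is a slightly more defensive sketch of the same concatenation. It assumes each sample's lane files sit in their own directory, and `merge_lanes` is a hypothetical helper name (bash required). The output is checked so that a leftover merged file matching the `*fastq.gz` glob is never concatenated into itself, and `gzip -t` confirms the result decompresses cleanly.

```bash
# merge_lanes is a hypothetical helper name; requires bash.
# Assumption: all lane files for one sample live in the directory given as $1.
merge_lanes() {
  local in_dir=$1 out_file=$2 f
  # The glob is expanded (in sorted order) here, before $out_file is written.
  local -a inputs=( "$in_dir"/*fastq.gz )
  # An unmatched glob stays literal, so make sure the first entry is a real file.
  if [[ ! -e ${inputs[0]} ]]; then
    echo "no *fastq.gz files in $in_dir" >&2
    return 1
  fi
  # Refuse to run if an existing output file (e.g. an old merge) matches the glob.
  for f in "${inputs[@]}"; do
    if [[ $f -ef $out_file ]]; then
      echo "$out_file would be concatenated into itself" >&2
      return 1
    fi
  done
  # Byte-wise concatenation of gzip files yields a valid multi-member gzip file.
  cat "${inputs[@]}" > "$out_file"
  # Check that every member of the merged file decompresses without errors.
  gzip -t "$out_file"
}
```

Usage would look like `merge_lanes lanes/sampleA merged/sampleA.fastq.gz` (hypothetical paths); writing the output to a separate directory keeps it out of the input glob entirely.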
The problem is that when I count the reads in each individual file and add them up, I get 31764073 reads, but when I count the concatenated file I get only 15434478. Typing the file names out one by one instead of using the glob gives the same result.
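The per-file versus merged comparison described above can be scripted so that both totals are computed the same way. This is a sketch with hypothetical helper names (`count_reads`, `check_merge`); it assumes 4 lines per FASTQ record (no line-wrapped sequences) and uses `gzip -dc`, which with GNU gzip decompresses every member of a concatenated `.gz` file, as `zcat` does.

```bash
# Hypothetical helpers; assume 4 lines per FASTQ record (no wrapped sequences).
count_reads() {
  local lines
  # GNU gzip -dc decompresses all members of a multi-member .gz, like zcat.
  lines=$(gzip -dc "$1" | wc -l)
  echo $(( lines / 4 ))
}

# Usage: check_merge merged.fastq.gz lane1.fastq.gz lane2.fastq.gz ...
# Prints each input's count, their sum and the merged count; returns 1 on mismatch.
check_merge() {
  local merged=$1 total=0 n f
  shift
  for f in "$@"; do
    n=$(count_reads "$f")
    printf '%s\t%d\n' "$f" "$n"
    total=$(( total + n ))
  done
  n=$(count_reads "$merged")
  printf 'sum of inputs\t%d\nmerged\t%d\n' "$total" "$n"
  (( total == n ))
}
```

Passing the exact same file list to `check_merge` that was given to `cat` makes any difference between the two numbers attributable to the merged file rather than to a different set of inputs.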
I'm counting reads with the command from "Sequence Number Count In Fastq.Gz File":
zcat my.fastq.gz | echo $((`wc -l`/4))
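Dividing the line count by 4 assumes every record is exactly four lines. As a cross-check, the sketch below (hypothetical helper `fastq_stats`, assuming unwrapped 4-line records) counts reads with `awk` while also flagging header lines that don't start with `@`, separator lines that don't start with `+`, and a trailing partial record, so a malformed file is reported instead of silently divided by 4.

```bash
# fastq_stats is a hypothetical helper; assumes unwrapped 4-line FASTQ records.
fastq_stats() {
  gzip -dc "$1" | awk '
    # Line 1 of each record should be the "@" header, line 3 the "+" separator.
    NR % 4 == 1 && substr($0, 1, 1) != "@" { bad++ }
    NR % 4 == 3 && substr($0, 1, 1) != "+" { bad++ }
    END {
      # A trailing partial record also counts as malformed.
      if (NR % 4 != 0) bad++
      printf "reads=%d malformed=%d\n", int(NR / 4), bad
    }'
}
```

Running `fastq_stats` on each lane file and on the merged file gives read counts from a second method, plus a quick signal of whether any file breaks the 4-line layout.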
Can anyone help me understand what is happening? Am I losing some of these reads in the concatenation process?