Concatenate fastq.gz - fewer reads after concatenation than before?
6.1 years ago
josh.cutts1 ▴ 40

Forgive the silly question but I'm having a problem with concatenation that is driving me a little mad.

I have a sequencing run with reads for each sample spread across multiple lanes. So I wanted to concatenate them before proceeding with mapping and further downstream analysis.

I looked up how to concatenate multiple fastq files on biostars and found this great answer: merge large amount of fastq files into a single one

I proceeded to concatenate the multiple lanes using:

cat *fastq.gz > merged.fastq.gz

The problem is that when I count the reads in each individual file and add them up I get 31764073 reads, but when I cat the files together and count, I only get 15434478 reads. Typing the file names out one by one gives the same result as the file globbing above.

I'm counting the number of reads using the command from "Sequence Number Count In Fastq.Gz File":

zcat my.fastq.gz | echo $((`wc -l`/4))
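The comparison described above (sum of per-file counts versus the count of the merged file) can be sketched as two small shell functions. This is only a minimal sketch: `count_reads` and `sum_reads` are hypothetical helper names, the file names in the usage comment are placeholders, and it assumes plain 4-line FASTQ records.

```shell
# Count reads in one gzipped FASTQ file, assuming 4 lines per record.
# gzip -dc is used instead of zcat because it behaves the same on Linux and macOS.
count_reads() {
    local lines
    lines=$(gzip -dc "$1" | wc -l)
    echo $(( lines / 4 ))
}

# Sum the read counts of every file passed as an argument.
sum_reads() {
    local total=0 f
    for f in "$@"; do
        total=$(( total + $(count_reads "$f") ))
    done
    echo "$total"
}

# Usage (placeholder file names):
#   sum_reads sample_L00*.fastq.gz    # total over the per-lane files
#   count_reads merged.fastq.gz       # should print the same number
```

If the two numbers differ, printing `count_reads` for each input file in turn usually shows which file is missing from the merge.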

Can anyone help me understand what is happening? Am I losing some of these reads in the concatenation process?

next-gen sequencing ChIP-Seq • 3.0k views

There's likely a fastq file with a typo in the file name, such that it doesn't end in fastq.gz.
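One rough way to check for such a file is to list anything in the directory that looks like a FASTQ file but would not be picked up by `*fastq.gz`. A minimal sketch, where `list_unmatched` is a hypothetical helper name and the "looks like FASTQ" patterns are only a guess at common naming variants:

```shell
# List files in a directory (default: current directory) whose names
# contain "fastq" or "fq" but do NOT end in "fastq.gz", i.e. files that
# the glob *fastq.gz would silently skip.
list_unmatched() {
    local dir=${1:-.}
    find "$dir" -maxdepth 1 -type f \
        \( -iname '*fastq*' -o -iname '*fq*' \) \
        ! -name '*fastq.gz'
}

# Usage:
#   list_unmatched /path/to/run_folder
```

An empty listing means every FASTQ-looking file in that directory matches the glob.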

As an aside, tell your sequencing provider that bcl2fastq has a --no-lane-splitting option that they could have used to obviate the need for you to merge the files.


Josh, *fastq.gz matches all of the FASTQ files. Are you sure you don't have paired-end reads that need to be concatenated separately into two different files? I'd check on downstream tool requirements before doing this cat.
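If the run does turn out to be paired-end, the lanes could be merged per read direction along these lines. This is a sketch under assumptions: `merge_lanes` is a hypothetical helper, and the glob assumes the usual Illumina naming pattern (`<sample>_L001_R1_001.fastq.gz` and so on), which may not match your files.

```shell
# merge_lanes SAMPLE_PREFIX OUT_DIR
# Concatenate all lanes of one sample into one file per read direction.
# SAMPLE_PREFIX is the path up to (not including) "_L00x", e.g. data/S_S1.
merge_lanes() {
    local sample=$1 outdir=$2 read
    mkdir -p "$outdir"
    for read in R1 R2; do
        # The glob expands in sorted order, so lanes are concatenated in
        # the same order for R1 and R2 and the mates stay in sync.
        cat "${sample}"_L*_"${read}"_*.fastq.gz \
            > "${outdir}/$(basename "$sample")_${read}.merged.fastq.gz"
    done
}

# Usage (placeholder paths):
#   merge_lanes data/S_S1 merged
```

The merged files are written with names that do not match the input glob, so re-running the function does not feed its own output back in.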

6.1 years ago
josh.cutts1 ▴ 40

Arg sorry! I made a mistake that I can't reproduce.

I tried to recreate the problem, but it has gone away and everything adds up correctly. I just have fewer reads than our sequencing provider said we would, but the counts in the individual files add up, so I need to follow up with them.

Thanks for your help Devon and Ram. Good to know that there is a no lane splitting option for the future!


Good to know this, Josh. I'm moving your post to an accepted answer to provide this thread with closure. For people that visit this post in the future, the takeaway is: try to reproduce the problem :-)
