Question

merging fastq files from RNAseq

7.2 years ago

4galaxy7 • 0

Hi,

I have been given a bunch of files from an RNA-seq run that look like this:

3062_GTGGCC_L003_R2_008.fastq.gz
3062_GTGGCC_L003_R2_007.fastq.gz
3062_GTGGCC_L003_R2_006.fastq.gz
....
3062_GTGGCC_L003_R2_001.fastq.gz

3062_GTGGCC_L003_R1_008.fastq.gz
3062_GTGGCC_L003_R1_007.fastq.gz

I haven't been given much information about them other than that they are all from the same sample. I presume R1 is the forward read and R2 the reverse read of each pair, and that the forward and reverse reads were each split into 8 files because of file size issues or something similar.

I know there have been threads before about merging fastq files using just a simple shell script. Is it really as easy as concatenating them? I am a bit suspicious because it looks too simple, and I am wary of introducing errors with paired reads further down the line. Can anyone give me some advice on the best way to merge them into a single forward and a single reverse file for downstream analysis?

Thanks.

RNA-Seq sequence • 3.9k views

ADD COMMENT • link updated 7.2 years ago by Pierre Lindenbaum 161k • written 7.2 years ago by 4galaxy7 • 0

They are split because bcl2fastq (Illumina's demultiplexing tool) has a parameter that limits the number of reads per FASTQ file. It defaults to 4,000,000 reads per file. I think that's for historical reasons. I have never heard of a problem with FASTQs being too big, but people do complain about having too many FASTQs.

And yes, as Pierre already said, you can simply cat gzipped files: a concatenation of gzip files is itself a valid gzip file. I was surprised to hear about this too. Just make sure you keep R1 and R2 separate.
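A minimal sketch of that concatenation, assuming the chunks are named as in the question (PREFIX_R1_001.fastq.gz and so on) with zero-padded three-digit chunk numbers. The helper name merge_reads and the output file names are hypothetical, not part of any tool:

```shell
# merge_reads PREFIX MATE OUT
# Hypothetical helper: concatenate every gzipped chunk of one read
# direction (R1 or R2) into a single gzipped file.
merge_reads() {
    prefix=$1   # e.g. 3062_GTGGCC_L003
    mate=$2     # R1 or R2
    out=$3      # merged output file
    # The shell expands the glob in sorted order, so the zero-padded
    # chunks _001, _002, ... are joined in the same order for R1 and R2.
    # Matching exactly three digits keeps the output file out of the glob.
    cat "${prefix}_${mate}"_[0-9][0-9][0-9].fastq.gz > "$out"
}

# Possible usage with the file names from the question:
#   merge_reads 3062_GTGGCC_L003 R1 3062_R1.merged.fastq.gz
#   merge_reads 3062_GTGGCC_L003 R2 3062_R2.merged.fastq.gz
```

Running it once per read direction keeps R1 and R2 in separate files while preserving the chunk order in both.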

ADD REPLY • link 7.2 years ago by igor 13k

I think that's for historical reasons.

It's better to have multiple FASTQ files per sample: for WGS/WES you can then map them in parallel and merge the results at the end.

ADD REPLY • link 7.2 years ago by Pierre Lindenbaum 161k

Sure. But there is nothing magical about 4,000,000 as far as I know.

Also, if you want to split your mapping, you can always split the full FASTQs.
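As a rough sketch of splitting a full FASTQ back into chunks, assuming every record spans exactly 4 lines (no wrapped sequence lines). The helper name split_fastq and its arguments are hypothetical:

```shell
# split_fastq SRC READS_PER_CHUNK OUT_PREFIX
# Hypothetical helper: cut a gzipped FASTQ into uncompressed chunks
# named OUT_PREFIXaa, OUT_PREFIXab, ...
split_fastq() {
    src=$1
    reads_per_chunk=$2
    out_prefix=$3
    # Assumes 4 lines per FASTQ record, so the line count given to
    # split is always a multiple of 4 and no record is cut in half.
    gunzip -c "$src" | split -l "$((reads_per_chunk * 4))" - "$out_prefix"
}

# Possible usage (file names are placeholders):
#   split_fastq sample_R1.fastq.gz 4000000 sample_R1.part_
#   split_fastq sample_R2.fastq.gz 4000000 sample_R2.part_
```

Using the same chunk size for R1 and R2 should keep the corresponding chunks paired, provided both files list the reads in the same order.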

ADD REPLY • link 7.2 years ago by igor 13k

score 4 · Answer 1 · 2017-02-25

I know there have been threads before about merging fastq files just using a simple shell script - is it simply as easy as concatenating them?

yes

and, of course, you'll merge the R1 and R2 reads into two different files, concatenating the chunks in the same order for both...

and you can test this:

$ cat f1.R1.fq.gz f2.R1.fq.gz f3.R1.fq.gz | gunzip -c | sha1sum
a9d47bdeac29619b9d70369588a6de99365f75f3  -
$ gunzip -c f1.R1.fq.gz f2.R1.fq.gz f3.R1.fq.gz | sha1sum
a9d47bdeac29619b9d70369588a6de99365f75f3  -
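
A further sanity check that may be worth running on the merged files is to confirm that R1 and R2 contain the same number of reads. This is only a sketch, assuming 4 lines per FASTQ record; the helper names count_reads and check_pairs are hypothetical:

```shell
# count_reads FILE
# Print the number of records in a gzipped FASTQ (assumes 4 lines each).
count_reads() {
    lines=$(gunzip -c "$1" | wc -l)
    echo $((lines / 4))
}

# check_pairs R1_FILE R2_FILE
# Succeed only if both files hold the same number of reads.
check_pairs() {
    n1=$(count_reads "$1")
    n2=$(count_reads "$2")
    if [ "$n1" -eq "$n2" ]; then
        echo "OK: $n1 read pairs"
    else
        echo "MISMATCH: R1=$n1 R2=$n2" >&2
        return 1
    fi
}

# Possible usage (file names are placeholders):
#   check_pairs merged_R1.fastq.gz merged_R2.fastq.gz
```

Equal counts do not prove the reads are in matching order, but a mismatch is a quick sign that a chunk was missed or duplicated.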