merging fastq files from RNAseq
7.2 years ago
4galaxy7 • 0

Hi,

I have been given a bunch of files from an RNA-seq run, which look like this:

3062_GTGGCC_L003_R2_008.fastq.gz 3062_GTGGCC_L003_R2_007.fastq.gz 3062_GTGGCC_L003_R2_006.fastq.gz .... 3062_GTGGCC_L003_R2_001.fastq.gz

3062_GTGGCC_L003_R1_008.fastq.gz 3062_GTGGCC_L003_R1_007.fastq.gz

I haven't been given much info about them other than that they are all from the same sample. I presume R1 holds the forward reads and R2 the reverse reads of each pair, and that the forward and reverse reads each had to be split into 8 files because of file size issues or something.

I know there have been threads before about merging fastq files just using a simple shell script - is it simply as easy as concatenating them? I am a bit suspicious because it looks too simple, and I am wary of introducing errors with paired reads further down the line. Can anyone give me a bit of advice on the best way to merge them into a single forward file and a single reverse file for downstream analysis?

Thanks.

RNA-Seq sequence • 3.9k views

They are split because of a parameter in bcl2fastq (Illumina's demultiplexing tool) that defaults to 4,000,000 reads per file. I think that's for historical reasons. I have never heard of a problem with FASTQs being too big, but people do complain about having too many FASTQs.

And yes, as Pierre already said, you can simply cat gzipped files. I was surprised to hear about this too. Just make sure you keep R1 and R2 separate.
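
For what it's worth, here is a minimal sketch of that concatenation. It builds its own tiny demo files in a temporary directory (the names mimic the pattern in the question, the read contents and the merged output names are made up), and it assumes the chunk numbers are zero-padded so that the shell's sorted glob expansion gives R1 and R2 the same chunk order.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Demo data: a throwaway directory with two tiny chunks per read direction.
# File names mimic the question's pattern; the contents are made up.
workdir=$(mktemp -d)
cd "$workdir"
for i in 001 002; do
  printf '@read%s 1:N:0:GTGGCC\nACGT\n+\nIIII\n' "$i" | gzip -c > "3062_GTGGCC_L003_R1_${i}.fastq.gz"
  printf '@read%s 2:N:0:GTGGCC\nTGCA\n+\nIIII\n' "$i" | gzip -c > "3062_GTGGCC_L003_R2_${i}.fastq.gz"
done

# The shell expands globs in sorted order, so zero-padded chunk numbers
# (001, 002, ...) are concatenated in the same order for R1 and R2.
# The output names (hypothetical) do not match the input globs.
cat 3062_GTGGCC_L003_R1_*.fastq.gz > 3062_R1.merged.fastq.gz
cat 3062_GTGGCC_L003_R2_*.fastq.gz > 3062_R2.merged.fastq.gz

# Each merged file is still valid gzip; count the reads (4 lines per record).
r1_reads=$(( $(gunzip -c 3062_R1.merged.fastq.gz | wc -l) / 4 ))
r2_reads=$(( $(gunzip -c 3062_R2.merged.fastq.gz | wc -l) / 4 ))
echo "R1 reads: $r1_reads, R2 reads: $r2_reads"
```

If the chunk numbers were not zero-padded (1, 2, ..., 10), the glob order would no longer be the numeric order, so it would be safer to list the files explicitly.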


I think that's for historical reasons.

It's better to have multiple fastq files per sample: for WGS/WES you can then map them in parallel and merge the results at the end.


Sure. But there is nothing magical about 4,000,000 as far as I know.

Also, if you want to split your mapping, you can always split the full FASTQs.
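
A rough sketch of that kind of splitting, assuming plain 4-line FASTQ records (no wrapped sequence lines). The file names and the chunk size are made up for the demo; for paired data the same chunk size would have to be used on R1 and R2 so the pairs stay aligned.

```shell
#!/usr/bin/env bash
set -euo pipefail

workdir=$(mktemp -d)
cd "$workdir"

# Demo input: one gzipped FASTQ with 5 made-up reads.
for i in 1 2 3 4 5; do
  printf '@read%s\nACGT\n+\nIIII\n' "$i"
done | gzip -c > sample_R1.fastq.gz

# A FASTQ record is 4 lines, so the line count given to split must be a
# multiple of 4 or records would be cut in half.
reads_per_chunk=2
gunzip -c sample_R1.fastq.gz | split -l $(( reads_per_chunk * 4 )) - sample_R1.chunk_

# split names the pieces chunk_aa, chunk_ab, ...; compress each one.
gzip sample_R1.chunk_*

chunks=(sample_R1.chunk_*.gz)
n_chunks=${#chunks[@]}
echo "chunks written: $n_chunks"
```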

7.2 years ago

I know there have been threads before about merging fastq files just using a simple shell script - is it simply as easy as concatenating them?

yes

and, of course, you'll merge the R1 and R2 reads into two different files, concatenating the chunks in the same order for both...

and you can test this:

$ cat f1.R1.fq.gz f2.R1.fq.gz f3.R1.fq.gz | gunzip -c | sha1sum
a9d47bdeac29619b9d70369588a6de99365f75f3  -
$ gunzip -c f1.R1.fq.gz f2.R1.fq.gz f3.R1.fq.gz | sha1sum
a9d47bdeac29619b9d70369588a6de99365f75f3  -
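
The checksum test shows that the concatenation itself is lossless. To also check that R1 and R2 ended up in the same order, one possible sketch is to compare the read IDs of the two merged files. This assumes headers where the read ID is the first whitespace-separated field and the mate number comes after the space; older headers ending in /1 and /2 would need that suffix stripped first. The file names and reads below are made up for the demo.

```shell
#!/usr/bin/env bash
set -euo pipefail

workdir=$(mktemp -d)
cd "$workdir"

# Demo data: merged R1/R2 files with three made-up read pairs.
for i in 1 2 3; do
  printf '@read%s 1:N:0:GTGGCC\nACGT\n+\nIIII\n' "$i"
done | gzip -c > merged_R1.fastq.gz
for i in 1 2 3; do
  printf '@read%s 2:N:0:GTGGCC\nTGCA\n+\nIIII\n' "$i"
done | gzip -c > merged_R2.fastq.gz

# Print the read ID: first field of every 4th line, starting at line 1.
read_ids() { gunzip -c "$1" | awk 'NR % 4 == 1 { print $1 }'; }

# cmp exits non-zero at the first difference, including a length mismatch.
if cmp -s <(read_ids merged_R1.fastq.gz) <(read_ids merged_R2.fastq.gz); then
  pairing="ok"
else
  pairing="mismatch"
fi
echo "pairing: $pairing"
```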