Question: merging fastq files from RNAseq
Asked 2.7 years ago by 4galaxy7:

Hi,

I have been given a set of files from an RNA-seq run that look like this:

3062_GTGGCC_L003_R2_008.fastq.gz 3062_GTGGCC_L003_R2_007.fastq.gz 3062_GTGGCC_L003_R2_006.fastq.gz .... 3062_GTGGCC_L003_R2_001.fastq.gz

3062_GTGGCC_L003_R1_008.fastq.gz 3062_GTGGCC_L003_R1_007.fastq.gz

I haven't been given much information about them, other than that they all come from the same sample. I presume R1 holds the forward reads and R2 the reverse reads of each pair, and that the forward and reverse reads were each split into 8 files because of file-size limits or something similar.

I know there have been threads before about merging FASTQ files with a simple shell script. Is it really as easy as concatenating them? It looks too simple to me, and I am wary of introducing errors in the read pairing further down the line. Can anyone advise on the best way to merge them into a single forward file and a single reverse file for downstream analysis?

Thanks.

Tags: rna-seq, sequence
(Question written 2.7 years ago by 4galaxy7; last modified 2.7 years ago by Pierre Lindenbaum.)

They are split because of a parameter in bcl2fastq (Illumina's demultiplexing tool) that defaults to 4,000,000 reads per file. I think that's for historical reasons. I have never heard of a problem with FASTQ files being too big, but people do complain about having too many of them.
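A quick way to see how many reads one of the parts actually holds is to count its lines and divide by four, since each FASTQ record is four lines. The sketch below assumes unwrapped records (as bcl2fastq writes them); the count_reads function and the demo file are hypothetical. If the default was used, each full part should presumably report 4000000, with the last part holding the remainder.

```shell
#!/bin/sh
# Sketch: count the reads in a gzipped FASTQ file.
# Each FASTQ record is four lines (header, sequence, '+', qualities),
# so reads = lines / 4. Assumes unwrapped records, as bcl2fastq writes them.
set -eu

# Hypothetical helper: prints the number of reads in the gzipped FASTQ given as $1.
count_reads() {
  lines=$(gunzip -c "$1" | wc -l)
  echo $(( lines / 4 ))
}

# Hypothetical demo file holding three reads.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
printf '@r1\nACGT\n+\nIIII\n@r2\nACGT\n+\nIIII\n@r3\nACGT\n+\nIIII\n' \
  | gzip -c > "$workdir/demo.fastq.gz"

count_reads "$workdir/demo.fastq.gz"
```

On the real data this would be run as, for example, count_reads 3062_GTGGCC_L003_R1_001.fastq.gz.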

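And yes, as Pierre already said, you can simply cat gzipped files together. I was surprised to hear about this too. It works because a gzip file may consist of several compressed members one after another, and gunzip decompresses them all in sequence, giving the same output as decompressing each part on its own. Just make sure you keep R1 and R2 separate.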
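A minimal sketch of the merge, assuming the naming pattern from the question (PREFIX_R1_NNN.fastq.gz) and zero-padded part numbers, so that the shell's sorted glob order equals the part order for both R1 and R2. The prefix variable and the tiny demo files are illustrative only.

```shell
#!/bin/sh
# Minimal sketch: merge split, gzipped FASTQ parts into one file per mate.
# Assumes the naming pattern from the question (PREFIX_R1_NNN.fastq.gz) and
# zero-padded part numbers, so the glob's sorted order equals the part order.
set -eu

prefix=3062_GTGGCC_L003   # taken from the file names in the question

# Hypothetical demo data (three parts per mate, one read each) so the sketch runs anywhere.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"
for part in 001 002 003; do
  for mate in R1 R2; do
    printf '@read%s %s:N:0:GTGGCC\nACGT\n+\nIIII\n' "$part" "${mate#R}" \
      | gzip -c > "${prefix}_${mate}_${part}.fastq.gz"
  done
done

# Concatenate the compressed parts directly; no decompression is needed.
# R1 and R2 go to separate outputs, and both globs expand in the same
# sorted order (001, 002, ...), so the read pairs stay in step.
for mate in R1 R2; do
  cat "${prefix}_${mate}"_[0-9][0-9][0-9].fastq.gz > "${prefix}_${mate}.merged.fastq.gz"
done
```

For the real files, drop the demo block and run the final loop in the directory holding the parts. Matching the parts with [0-9][0-9][0-9] rather than * keeps a re-run from picking up the merged output itself.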

(Reply by igor, 2.7 years ago.)

> I think that's for historical reasons.

It's better to have multiple FASTQs per sample: for WGS/WES you can then map them in parallel and merge the results at the end.

(Reply by Pierre Lindenbaum, 2.7 years ago.)

Sure. But there is nothing magical about 4,000,000 as far as I know.

Also, if you want to split your mapping, you can always split the full FASTQs.
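Splitting a full FASTQ again is a one-liner with split, as long as the line count per chunk is a multiple of four so that no record is cut in half. This is a sketch under that assumption; reads_per_chunk is a toy value (the bcl2fastq default discussed above is 4,000,000 reads), and the file names are hypothetical. For paired data, running the same command on R2 with the same chunk size gives chunks that pair up by suffix (chunk_R1_aa with chunk_R2_aa, and so on).

```shell
#!/bin/sh
# Sketch: split a merged FASTQ into chunks of N reads for parallel mapping.
# split -l counts lines, so the chunk size is N * 4 to avoid cutting a record.
set -eu

reads_per_chunk=2   # illustrative toy value

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"
# Hypothetical demo input: five reads.
for i in 1 2 3 4 5; do
  printf '@r%s\nACGT\n+\nIIII\n' "$i"
done | gzip -c > merged_R1.fastq.gz

# Writes uncompressed chunks chunk_R1_aa, chunk_R1_ab, ... ("-" means stdin).
gunzip -c merged_R1.fastq.gz | split -l $(( reads_per_chunk * 4 )) - chunk_R1_
ls chunk_R1_*
```

With five reads and two reads per chunk, the last chunk simply holds the single remaining read.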

(Reply by igor, 2.6 years ago.)
Answer by Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087), 2.7 years ago:

> I know there has been threads before about merging fastq files just using a simple shell script - is it simply as easy as concatenating them?

Yes.

And, of course, you'll merge the R1 and R2 reads into two different files, concatenating the parts in the same order for both.

And you can test this:

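$ # concatenate the compressed parts, then decompress the combined stream
$ cat f1.R1.fq.gz f2.R1.fq.gz f3.R1.fq.gz | gunzip -c | sha1sum
a9d47bdeac29619b9d70369588a6de99365f75f3  -
$ # decompress each part in turn: the checksum is the same
$ gunzip -c f1.R1.fq.gz f2.R1.fq.gz f3.R1.fq.gz | sha1sum
a9d47bdeac29619b9d70369588a6de99365f75f3  -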
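The checksum test shows that concatenation loses nothing. A complementary check that R1 and R2 were merged in the same order is to compare the read names of both files record by record. This sketch assumes CASAVA 1.8+ style headers, where the mate number follows a space (e.g. "@readA 1:N:0:GTGGCC"), so the first token is identical for both mates; older names ending in /1 and /2 would need that suffix stripped first. The merged file names and demo data are hypothetical.

```shell
#!/bin/sh
# Sketch: check that merged R1 and R2 files list their reads in the same order.
# Assumes CASAVA 1.8+ style headers, where the mate number follows a space
# (e.g. "@readA 1:N:0:GTGGCC"), so the first token is identical for both mates.
set -eu

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"
# Hypothetical demo files standing in for the merged output.
printf '@readA 1:N:0:GTGGCC\nACGT\n+\nIIII\n@readB 1:N:0:GTGGCC\nACGT\n+\nIIII\n' \
  | gzip -c > merged_R1.fastq.gz
printf '@readA 2:N:0:GTGGCC\nTTTT\n+\nIIII\n@readB 2:N:0:GTGGCC\nTTTT\n+\nIIII\n' \
  | gzip -c > merged_R2.fastq.gz

read_ids() {
  # Line 1 of every 4-line record is the header; keep its first token.
  gunzip -c "$1" | awk 'NR % 4 == 1 { print $1 }'
}

read_ids merged_R1.fastq.gz > ids_R1.txt
read_ids merged_R2.fastq.gz > ids_R2.txt

if cmp -s ids_R1.txt ids_R2.txt; then
  echo "R1 and R2 reads are in the same order"
else
  echo "R1 and R2 read order differs" >&2
  exit 1
fi
```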