Question

Merge fastq files

1

6.2 years ago

rse ▴ 100

Hi,

I have the following fastq files: *_L001_R1.fastq, *_L001_R2.fastq, *_L002_R1.fastq, *_L002_R2.fastq, *_L001_I1.fastq, *_L001_I2.fastq, *_L002_I1.fastq, *_L002_I2.fastq

How do I merge these files?

Thank you

next-gen sequencing • 6.0k views

ADD COMMENT • link 6.2 years ago by rse ▴ 100

0

You mean: cat file1 file2 file3 > mergefile ?

ADD REPLY • link 6.2 years ago by zhangdengwei ▴ 210

0

Yes, I want to merge these files into paired-end files, R1.fastq and R2.fastq, so I will merge all the R1 files and all the R2 files using separate cat commands. But I am not sure what to do with I1 and I2. Do I just ignore them?

ADD REPLY • link 6.2 years ago by rse ▴ 100

0

Perhaps first explain what the goal of the merging is.

If it is simply to 'reduce' the number of files, then yes, cat (as suggested by zhangdengwei) will do, but that actually makes little 'biological/technical' sense.

ADD REPLY • link 6.2 years ago by lieven.sterck 15k

0

I want to merge these files into paired-end files, R1.fastq and R2.fastq, so I will merge all the R1 files and all the R2 files using separate cat commands. But I am not sure what to do with I1 and I2. Do I just ignore them?

ADD REPLY • link 6.2 years ago by rse ▴ 100

0

OK, then cat is NOT the correct approach. What you should look for is tools that can create interleaved fastq files starting from separate fastq files. Simply cat-ing them together will not generate valid fastq files.

I still don't fully get why you want to merge them, though; most programs will expect two files when processing paired-end data anyway (or at best the interleaved format mentioned above).
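For illustration, interleaving can be sketched with standard Unix tools alone. This is only a minimal sketch, not the method of any particular interleaving tool: it assumes strict 4-line FASTQ records, no tab characters in the header lines, and mates stored in the same order in both files. The file names and the two demo records are made up for the example.

```shell
set -eu

# Hypothetical demo data: two read pairs, mates in the same order in both files.
workdir=$(mktemp -d)
cd "$workdir"
printf '@r1/1\nACGT\n+\nIIII\n@r2/1\nTTTT\n+\nIIII\n' > R1.fastq
printf '@r1/2\nGGCC\n+\nIIII\n@r2/2\nAAAA\n+\nIIII\n' > R2.fastq

# "paste - - - -" folds each 4-line record into one tab-separated line.
paste - - - - < R1.fastq > R1.tab
paste - - - - < R2.fastq > R2.tab

# Put mate 1 and mate 2 side by side, then unfold the tabs back into lines.
paste R1.tab R2.tab | tr '\t' '\n' > interleaved.fastq

# Print the header lines; the mates of each pair should now be adjacent.
awk 'NR % 4 == 1' interleaved.fastq
```

On real data, a dedicated interleaving tool is usually the safer choice, since it can also check that the read names of each pair actually match.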

ADD REPLY • link 6.2 years ago by lieven.sterck 15k

0

Ok, thank you. Yes, I have 4 files (2 R1 and 2 R2 files from different lanes), so I am merging the 4 files into 2 files.

ADD REPLY • link 6.2 years ago by rse ▴ 100

0

Does anyone know how to handle the I1 and I2 files? Thank you

ADD REPLY • link 6.2 years ago by rse ▴ 100

0

Those (the I files, I mean) you can omit; they are index files and not needed for typical downstream analysis.

If you want to join the two R1 files together and then the two R2 files, you could use cat (but make sure you list the lanes in the same order for both). If you want to join R1 with R2, you will have to go for an interleaved file.
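A minimal sketch of that lane-wise concatenation, assuming uncompressed files and the hypothetical prefix "sample" in place of the real file name prefix. The tiny demo records exist only to make the example runnable.

```shell
set -eu

# Hypothetical demo data: one record per lane and read direction.
workdir=$(mktemp -d)
cd "$workdir"
for lane in L001 L002; do
  for read in R1 R2; do
    printf '@read_%s/%s\nACGT\n+\nIIII\n' "$lane" "$read" > "sample_${lane}_${read}.fastq"
  done
done

# List the lanes in the same order for R1 and R2, so that the n-th record
# of R1.fastq still pairs with the n-th record of R2.fastq.
cat sample_L001_R1.fastq sample_L002_R1.fastq > R1.fastq
cat sample_L001_R2.fastq sample_L002_R2.fastq > R2.fastq

# Sanity check: both merged files should hold the same number of records
# (4 lines per record, assuming unwrapped FASTQ).
r1_records=$(( $(wc -l < R1.fastq) / 4 ))
r2_records=$(( $(wc -l < R2.fastq) / 4 ))
echo "R1 records: $r1_records, R2 records: $r2_records"
```

Spelling out the file names, rather than relying on a shell glob, makes the lane order explicit and identical for both commands.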

ADD REPLY • link 6.2 years ago by lieven.sterck 15k

0

Ok, understood. Thank you for the help.

ADD REPLY • link 6.2 years ago by rse ▴ 100

0

lieven.sterck : That is only correct if one has no interest in the index sequences (not sure why one would run these samples as indexed in the first place, but stuff happens).

It sounds like these samples are not demultiplexed. The index reads are present in separate files. This type of data is generally required for Qiime analysis.

rse : Are these 16S/metagenomic samples? If so, you will need to make use of those I* files. If these are not for Qiime analysis, are you interested in separating the samples based on the index sequences?
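If the index reads are kept, they have to stay in step with the R1/R2 records after merging. The sketch below merges all four read types with the lanes in the same order and then compares read IDs. It assumes uncompressed 4-line FASTQ, the hypothetical prefix "sample", and headers in which the read ID is the part before the first space. It does not show any Qiime or demultiplexing step.

```shell
set -eu

# Hypothetical demo data: one record per lane for each read type.
workdir=$(mktemp -d)
cd "$workdir"
for lane in L001 L002; do
  for read in R1 R2 I1 I2; do
    printf '@read_%s extra\nACGT\n+\nIIII\n' "$lane" > "sample_${lane}_${read}.fastq"
  done
done

# Merge every read type with the lanes listed in the same order.
for read in R1 R2 I1 I2; do
  cat "sample_L001_${read}.fastq" "sample_L002_${read}.fastq" > "${read}.fastq"
done

# Extract the read IDs (header text before the first space) from R1 and
# compare them, in order, with the IDs of the other merged files.
awk 'NR % 4 == 1 { print $1 }' R1.fastq > R1.ids
status=in_sync
for read in R2 I1 I2; do
  awk 'NR % 4 == 1 { print $1 }' "${read}.fastq" > "${read}.ids"
  if ! cmp -s R1.ids "${read}.ids"; then
    status=out_of_sync
    echo "${read}.fastq does not match R1.fastq"
  fi
done
echo "$status"
```

Headers that mark the mate with a /1 or /2 suffix would need that suffix stripped before the comparison.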

ADD REPLY • link 6.2 years ago by GenoMax 152k

0

I stand corrected.

Indeed, I jumped to a conclusion too soon. Of course the index files are useful (and required) for some analyses.

ADD REPLY • link 6.2 years ago by lieven.sterck 15k