Merging two fastq.gz files
tcf.hcdg · 3.6 years ago

Hello,

I have 96 raw read files (*.fastq.gz) from 24 samples. Each sample was sequenced on two lanes, for both reads of the pair (R1 and R2).

For each sample, I would like to merge the reads from both lanes into one output file per read (R1/R2), named with the sample identifier taken from the file name (e.g. 2271_merged_R1_001.fastq.gz).

File names follow this pattern (sample IDs 2271 to 2294):
22[71-94]*R[1-2]_001.fastq.gz

**2271**_ID890_1_S1_L001_**R1_001.fastq.gz**
**2271**_ID890_1_S1_L002_**R1_001.fastq.gz**

**2271**_ID890_1_S1_L001_**R2_001.fastq.gz**
**2271**_ID890_1_S1_L002_**R2_001.fastq.gz**


I tried the following short script, but only two output files are generated (the first and the last).

For R1 files:

    for rf in 22[71-94]*R1_001.fastq.gz; do zcat $rf > 22"${71-94}"_merged_R1_001.fastq.gz ; done


For R2 files:

    for rf in 22[71-94]*R2_001.fastq.gz; do zcat $rf > 22"${71-94}"_merged_R2_001.fastq.gz ; done


My questions are:

1. Why are only two output files generated?
2. Why is the number of reads in the output files not the sum of the reads in the input files from both lanes?
3. Is there a neat way to merge the reads from both lanes for both R1 and R2 in a single step, instead of running the loop twice (once per read type)?

What went wrong in the code, and how can I verify that the output files are completely merged?
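A small bash sketch (assuming GNU bash semantics) that prints what the constructs in the loops above actually do. `${71-94}` is a parameter expansion, "positional parameter $71, or the default `94` if it is unset", not a number range, so every iteration targets the same file name. `[71-94]` in a glob is a bracket expression that matches a single character ('7', '1' to '9', or '4'), not the numbers 71 to 94. Finally, `>` truncates the target on every iteration, so only the last input survives, and zcat writes it uncompressed into a file named .gz. Together this would be consistent with seeing only two output files (one per read) whose read counts are not the sum of both lanes.

```shell
#!/usr/bin/env bash
# Illustrative sketch: what the constructs in the original loops do in bash.

# 1) ${71-94}: positional parameter $71, or the default "94" if $71 is unset.
#    Run without arguments, $71 is unset, so every iteration gets the same name.
echo "22${71-94}_merged_R1_001.fastq.gz"   # prints 2294_merged_R1_001.fastq.gz

# 2) [71-94] matches ONE character: '7', any of '1'..'9', or '4'.
#    So 22[71-94]* matches any name starting with 22 and a digit from 1 to 9.
for id in 2200 2270 2271 2294; do
  case "$id" in
    22[71-94]*) echo "$id matches 22[71-94]*" ;;
    *)          echo "$id does not match 22[71-94]*" ;;
  esac
done

# 3) '>' truncates the target file on every iteration of the loop.
tmp=$(mktemp -d)
for word in first second third; do
  echo "$word" > "$tmp/out.txt"
done
cat "$tmp/out.txt"   # prints only "third"
rm -r "$tmp"
```

Note that 2270 matches the pattern while 2200 does not, which shows the bracket expression is a character set rather than a numeric range.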

Thanks


For the 48 R1 files, the following code will work (back up your data first, test it on one or two sets before using it, and compare MD5 sums):

    for i in *1_R1_001.fastq.gz; do zcat ${i%%01*}01_R1_001.fastq.gz ${i%%01*}02_R1_001.fastq.gz | gzip -c - > ${i%%_*}_"merged_R"${i#*_R*} ; done

It works for R2 as well if you replace R1 with R2. The output file names would be, e.g., 2271_merged_R1_001.fastq.gz for sample 2271 R1.

3.6 years ago

There is no need to use gzcat; just use cat. See also: "merge large amount of fastq files into a single one".

yhoogstrate · 3.6 years ago

Is this what you're looking for, maybe?

    for rf in 22[71-94]*R1_001.fastq.gz; do cat $rf >> 22"${71-94}"_merged_R1_001.fastq.gz ; done

zcat extracts, which is unnecessary as you dump the result into a .gz file. Also, >> appends while > overwrites; appending seems to be what you need. I hope this helps you a bit.

Enjoy,
Youri

And what about "1. Why are only two output files generated?"

I used the following and it worked:

R1:

    for ((num=71; num<=94; num++)); do cat 22"$num"*{L001,L002}_R1_001.fastq.gz > "22${num}_merged_R1_001.fastq.gz"; done

R2:

    for ((num=71; num<=94; num++)); do cat 22"$num"*{L001,L002}_R2_001.fastq.gz > "22${num}_merged_R2_001.fastq.gz"; done

igor · 3.6 years ago

If you are not sure what your code is doing, check what is actually happening. Instead of generating the final file blindly and hoping it works, print the progress. For example, you can check which inputs are being paired with which outputs:

    for rf in 22[71-94]*R1_001.fastq.gz; do
        echo "$rf  to  22${71-94}_merged_R1_001.fastq.gz"
    done
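Building on the cat-based replies above, here is a sketch that merges both lanes for R1 and R2 in one pass (question 3) and verifies each merged file by comparing its read count with the sum of the two inputs. It assumes bash, sample IDs 2271 to 2294, exactly one file per sample, lane and read matching `<sample>_*_L00<lane>_<read>_001.fastq.gz`, and plain 4-line FASTQ records; `merge_lanes`, `count_reads` and `main` are hypothetical helper names, not from any tool.

```shell
#!/usr/bin/env bash
# Sketch. Assumptions: bash; samples 2271..2294; exactly one input per
# sample/lane/read named <sample>_*_L00<lane>_<read>_001.fastq.gz;
# 4-line FASTQ records. merge_lanes/count_reads/main are hypothetical names.
set -o pipefail
shopt -s nullglob   # unmatched globs expand to nothing instead of themselves

# Print the number of reads in one or more gzipped FASTQ files (lines / 4).
count_reads() {
  local lines
  lines=$(gzip -cd "$@" | wc -l)
  echo $(( lines / 4 ))
}

# Merge L001 then L002 of one sample/read into <sample>_merged_<read>_001.fastq.gz.
# Returns 0 on success, 1 if the read counts differ, 2 if the inputs are not found.
merge_lanes() {
  local sample=$1 mate=$2
  local -a lane1 lane2
  lane1=( "$sample"_*_L001_"$mate"_001.fastq.gz )
  lane2=( "$sample"_*_L002_"$mate"_001.fastq.gz )
  if (( ${#lane1[@]} != 1 || ${#lane2[@]} != 1 )); then
    echo "skip $sample $mate: expected exactly one file per lane" >&2
    return 2
  fi

  local out="${sample}_merged_${mate}_001.fastq.gz"
  # Concatenated gzip members form a valid gzip file, so plain cat is enough.
  cat "${lane1[0]}" "${lane2[0]}" > "$out"

  local n_in n_out
  n_in=$(count_reads "${lane1[0]}" "${lane2[0]}")
  n_out=$(count_reads "$out")
  echo "$out: $n_out reads (L001 + L002 = $n_in)"
  if [[ "$n_in" != "$n_out" ]]; then
    echo "read count mismatch for $out" >&2
    return 1
  fi
}

# One pass over all samples and both reads; returns 1 if any check failed.
main() {
  local status=0 num mate rc
  for (( num = 71; num <= 94; num++ )); do
    for mate in R1 R2; do
      merge_lanes "22$num" "$mate"
      rc=$?
      if (( rc == 1 )); then status=1; fi
    done
  done
  return "$status"
}

main
```

Run it from the directory that holds the FASTQ files. Using cat instead of `zcat | gzip` avoids decompressing and recompressing the data; `gzip -t <file>` can additionally check that each merged file decompresses cleanly.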

