Question: Merging two fastq.gz files
11 months ago by
tcf.hcdg60
European Union
tcf.hcdg60 wrote:

Hello,

I have 96 *fastq.gz raw read files from 24 samples. Each sample was sequenced on two lanes for each pair.

I would like to merge the reads for each pair from both lanes into one output file named after the sample identifier in the input file name (e.g. 2271_merged_R1_001.fastq.gz).

File names are in this order:
22[71-94]*R[1-2]_001.fastq.gz;

**2271**_ID890_1_S1_L001_**R1_001.fastq.gz**
**2271**_ID890_1_S1_L002_**R1_001.fastq.gz**

**2271**_ID890_1_S1_L001_**R2_001.fastq.gz**
**2271**_ID890_1_S1_L002_**R2_001.fastq.gz**

I tried the following short script but only two output files are being generated (first and the last).

FOR R1 files

  for rf in 22[71-94]*R1_001.fastq.gz; do zcat $rf > 22"${71-94}"_merged_R1_001.fastq.gz ; done

FOR R2 files

for rf in 22[71-94]*R2_001.fastq.gz; do zcat $rf > 22"${71-94}"_merged_R2_001.fastq.gz ; done

My questions are:

1. Why are only two output files generated?
2. Why is the number of reads in each output file not the sum of the reads in the files merged from both lanes?
3. Is there a neat way to merge the reads from both lanes for both read types (R1 and R2) in a single step, instead of running the loop once per read type?

What went wrong in the code, and how can I verify that the output files are completely merged?

Thanks

modified 11 months ago by igor7.6k • written 11 months ago by tcf.hcdg60

For the 48 R1 files, the following code will work (take a backup of your work and try it on one or two sets first; check that the MD5 sums match):

$ for i in *1_R1_001.fastq.gz; do zcat "${i%%01*}01_R1_001.fastq.gz" "${i%%01*}02_R1_001.fastq.gz" | gzip -c - > "${i%%_*}_merged_R${i#*_R*}" ; done

The same loop works for R2 after replacing R1 with R2 in the glob and in both input file names. The output file name for sample 2271 R1 would be 2271_merged_R1_001.fastq.gz.
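To see how the parameter expansions in that one-liner pick apart a file name, here is a small dry run. It is only a sketch: the file name is the example from the question, and the variable names (lane_prefix, sample, read_suffix) are made up for illustration. Nothing is read or written.

```shell
# Example file name from the question, used here only as a test string.
i="2271_ID890_1_S1_L001_R1_001.fastq.gz"

# ${i%%01*} removes the longest suffix that starts with "01".
# The first "01" in this name is inside "L001", so "2271_ID890_1_S1_L0" remains.
lane_prefix="${i%%01*}"

# ${i%%_*} removes everything from the first underscore on: the sample ID.
sample="${i%%_*}"

# ${i#*_R*} removes the shortest prefix ending in "_R".
read_suffix="${i#*_R*}"

lane1="${lane_prefix}01_R1_001.fastq.gz"
lane2="${lane_prefix}02_R1_001.fastq.gz"
merged="${sample}_merged_R${read_suffix}"

echo "$lane1 + $lane2 -> $merged"
```

One caveat that follows from the first expansion: if the sample or ID part of a name contained "01" before the lane field (say, a hypothetical ID901), the prefix would be cut too early, so it is worth echoing the names like this before running the real merge.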

modified 11 months ago • written 11 months ago by cpad011211k
11 months ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum119k wrote:

No need to use gzcat; just use cat to merge a large number of fastq.gz files into a single one.
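This works because a file made of several gzip streams placed back to back is itself a valid gzip file, and gzip decompresses it as if the members had been concatenated before compression. As a rough sketch of how one might check a merge (the question asked about verification), the following builds two tiny made-up lane files in a temporary directory, merges them with cat, and compares read counts and a checksum of the decompressed content. The record contents and the count_reads helper are illustrative assumptions, not data from the thread.

```shell
tmp=$(mktemp -d)

# Two tiny hypothetical lane files: one read in lane 1, two reads in lane 2.
printf '@r1\nACGT\n+\nIIII\n' | gzip -c > "$tmp/2271_ID890_1_S1_L001_R1_001.fastq.gz"
printf '@r2\nTTGA\n+\nIIII\n@r3\nGGCC\n+\nIIII\n' | gzip -c > "$tmp/2271_ID890_1_S1_L002_R1_001.fastq.gz"

# Merge without decompressing.
cat "$tmp"/2271_*_L001_R1_001.fastq.gz "$tmp"/2271_*_L002_R1_001.fastq.gz \
  > "$tmp/2271_merged_R1_001.fastq.gz"

# Hypothetical helper: reads = decompressed lines / 4 (assumes 4-line FASTQ records).
count_reads() { echo $(( $(gzip -dc "$@" | wc -l) / 4 )); }

in_reads=$(count_reads "$tmp"/2271_*_L00[12]_R1_001.fastq.gz)
out_reads=$(count_reads "$tmp/2271_merged_R1_001.fastq.gz")

# The decompressed content should be identical before and after merging.
in_sum=$(gzip -dc "$tmp"/2271_*_L00[12]_R1_001.fastq.gz | cksum)
out_sum=$(gzip -dc "$tmp/2271_merged_R1_001.fastq.gz" | cksum)

# gzip -t checks the integrity of every member of the merged file.
gzip -t "$tmp/2271_merged_R1_001.fastq.gz" && echo "merged file is valid gzip"
echo "input reads: $in_reads, merged reads: $out_reads"
[ "$in_sum" = "$out_sum" ] && echo "decompressed content matches"
```

gzip -dc is used instead of zcat because zcat behaves differently on some systems (on macOS it expects .Z files). Note that the MD5 sum of the merged .gz file itself will not match a file produced with zcat | gzip, since the compressed bytes differ; compare the decompressed content instead.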

written 11 months ago by Pierre Lindenbaum119k
11 months ago by
yhoogstrate50
Netherlands
yhoogstrate50 wrote:

Is this what you're looking for maybe?:

for rf in 22*_L00[12]_R1_001.fastq.gz; do cat "$rf" >> "${rf%%_*}_merged_R1_001.fastq.gz" ; done

zcat decompresses, which is unnecessary because you write the result to a .gz file anyway. Also, >> appends while > overwrites, and appending seems to be what you need.

I hope this helps you a bit.

Enjoy,

Youri

modified 11 months ago • written 11 months ago by yhoogstrate50

And what about question 1: why are only two output files generated?

written 11 months ago by tcf.hcdg60

I used the following and it worked:

R1

for ((num=71; num<=94; num++)); { cat 22"$num"*{L001,L002}_R1_001.fastq.gz > "22${num}_merged_R1_001.fastq.gz" ;}

R2

for ((num=71; num<=94; num++)); { cat 22"$num"*{L001,L002}_R2_001.fastq.gz > "22${num}_merged_R2_001.fastq.gz" ;}
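For question 3 (doing R1 and R2 in one step), one possible approach is to nest a loop over the read type inside a loop over the samples. The sketch below is self-contained: it creates small dummy files for two samples in a temporary directory, so the ID890_1_S1 part of the names and the record contents are placeholders, and the sample ID is taken from the file name rather than counted from 71 to 94.

```shell
dir=$(mktemp -d)

# Dummy inputs: 2 samples x 2 lanes x 2 read types, one record per file.
for s in 2271 2272; do
  for lane in L001 L002; do
    for read in R1 R2; do
      printf '@%s_%s_%s\nACGT\n+\nIIII\n' "$s" "$lane" "$read" \
        | gzip -c > "$dir/${s}_ID890_1_S1_${lane}_${read}_001.fastq.gz"
    done
  done
done

# One pass over the samples; the inner loop handles both read types.
for first in "$dir"/*_L001_R1_001.fastq.gz; do
  sample=$(basename "$first")
  sample=${sample%%_*}               # text before the first underscore, e.g. 2271
  for read in R1 R2; do
    cat "$dir/${sample}"_*_L001_"${read}"_001.fastq.gz \
        "$dir/${sample}"_*_L002_"${read}"_001.fastq.gz \
      > "$dir/${sample}_merged_${read}_001.fastq.gz"
  done
done

ls "$dir" | grep merged
```

Listing L001 before L002 for both read types keeps the records of R1 and R2 in the same order, which paired-end tools generally expect.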
written 11 months ago by tcf.hcdg60
11 months ago by
igor7.6k
United States
igor7.6k wrote:

If you are not sure what your code is doing, try checking what is actually happening. Instead of generating the final file blindly and hoping it is working properly, print the progress. For example, you can check which inputs are getting paired with which outputs:

for rf in 22[71-94]*R1_001.fastq.gz; do
  echo "$rf  to  22${71-94}_merged_R1_001.fastq.gz"
done
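A check like this would likely point at two separate issues in the original loops. The sketch below tests both without touching any files; the matches helper and the extra file names are made up for the demonstration, and it assumes the script is run without positional arguments.

```shell
# 1) ${71-94} is a parameter expansion, not a range: it means
#    "positional parameter 71, or the default text 94 if it is unset".
name="22${71-94}_merged_R1_001.fastq.gz"
echo "$name"

# 2) [71-94] is a bracket expression matching ONE character:
#    '7', any character in the range '1'-'9', or '4'.
#    Hypothetical helper that applies the same pattern via case.
matches() {
  case "$1" in
    22[71-94]*R1_001.fastq.gz) echo yes ;;
    *) echo no ;;
  esac
}

matches "2271_ID890_1_S1_L001_R1_001.fastq.gz"   # a real-looking input name
matches "2299_x_R1_001.fastq.gz"                 # 99 is outside 71-94 but still matches
matches "2205_x_R1_001.fastq.gz"                 # third character is 0, so no match
```

So every input in a loop is written to the same output name, and because > truncates the file on each iteration, only the last input survives (decompressed by zcat, despite the .gz name). That would explain both observations: one output file per loop, two in total, and read counts that are not the sum of the lanes.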
modified 11 months ago • written 11 months ago by igor7.6k