IF statement (in linux script) to merge Fastq files with similar (but not the same) names
7 months ago
endretoth ▴ 30

Dear All,

I have a lot of paired-end sequences (R1 and R2) from an NGS run. Each sample was run on more than one lane of the flow cell, so the fastq files of the same sample from different lanes must be merged. For example, my files look like:

POP_Sample1_L001_R1.fastq.gz

POP_Sample1_L002_R1.fastq.gz

POP_Sample1_L001_R2.fastq.gz

POP_Sample1_L002_R2.fastq.gz

...

POP_Sample2_L001_R1.fastq.gz

POP_Sample2_L002_R1.fastq.gz

...

All files are stored in a single folder (I have a lot of files). I would like to merge the files that belong to the same sample and read but come from different lanes into one file (for example POP_Sample1_L001_R1.fastq.gz and POP_Sample1_L002_R1.fastq.gz). For this purpose, I would like to write a script with an IF condition on their names. This is just an idea; I'm sure this script doesn't work. Within the IF, I would concatenate the files with cat:

if [ -f $POP_Sample*_R1.fastq.gz ==$POP_Sample*_R1.fastq.gz]

then

cat POP_Sample*_R1.fastq.gz > POP_Sample*_R1_concatenated.fastq.gz

fi


May I kindly ask for your help? I'm not even sure this is possible.

Best, Thend

unix script ifstatement linux fastq • 337 views

Instead of using if, how about using two loops, one over the samples (for S in sample1 sample2 sample3) and one over the reads (for R in R1 R2), and using find to get the fastq.gz files to concatenate?
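The two-loop idea could be sketched roughly as below. This is only an illustration: the function name `merge_by_sample` is made up, the `POP_` prefix and the `_concatenated` output suffix are taken from the file names in the question, and the sample names have to be passed in by hand. It relies on the fact that gzip files concatenated with cat are still valid gzip files.

```shell
#!/usr/bin/env bash
# Sketch only: merge_by_sample is a hypothetical helper, not an existing tool.
# Usage: merge_by_sample DIR SAMPLE [SAMPLE...]
merge_by_sample() {
    local dir=$1
    shift
    local sample mate f files
    for sample in "$@"; do
        for mate in R1 R2; do
            # Collect the lane files of this sample/read, sorted so that
            # R1 and R2 are concatenated in the same lane order.
            files=()
            while IFS= read -r f; do
                files+=("$f")
            done < <(find "$dir" -maxdepth 1 -type f \
                         -name "POP_${sample}_L*_${mate}.fastq.gz" | sort)

            # Skip combinations that have no files at all.
            [ "${#files[@]}" -gt 0 ] || continue

            cat "${files[@]}" > "$dir/POP_${sample}_${mate}_concatenated.fastq.gz"
        done
    done
}
```

Called as `merge_by_sample . Sample1 Sample2`, this should leave the original lane files untouched and write one merged file per sample and read next to them.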

1. Get the sample names from the lane 1 (L001) R1 files using find.
2. In a bash loop, cat the R1 files of each sample.
3. In the same loop, repeat the 2nd step for R2.

Bash string manipulation in combination with find will work in a single loop. You do not even need a loop for this if you use GNU parallel.
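The three steps above could look something like this in a single loop. It is a sketch under a few assumptions: every sample has an `L001` R1 file, every sample also has R2 files (paired-end), and file names contain no newlines. The function name `merge_lanes` and the `_concatenated` suffix are only illustrative.

```shell
#!/usr/bin/env bash
# Sketch only: merge_lanes is a hypothetical helper name.
# Usage: merge_lanes [DIR]   (DIR defaults to the current directory)
merge_lanes() {
    local dir=${1:-.}
    local first prefix
    # Step 1: one L001 R1 file per sample identifies the sample.
    find "$dir" -maxdepth 1 -type f -name '*_L001_R1.fastq.gz' |
    while IFS= read -r first; do
        # String manipulation: strip the lane/read suffix,
        # e.g. ./POP_Sample1_L001_R1.fastq.gz -> ./POP_Sample1
        prefix=${first%_L001_R1.fastq.gz}

        # Step 2: merge all lanes of R1 (the glob expands in sorted lane order).
        cat "$prefix"_L*_R1.fastq.gz > "${prefix}_R1_concatenated.fastq.gz"

        # Step 3: same for R2 (assumes R2 files exist for this sample).
        cat "$prefix"_L*_R2.fastq.gz > "${prefix}_R2_concatenated.fastq.gz"
    done
}
```

Because the sample name is derived from the file name itself, no list of samples has to be typed in. The trailing `_L` in the glob keeps `POP_Sample1` from also picking up `POP_Sample10`.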