Concatenating 4 files into 1
14 days ago
Roland ▴ 10

Hi.

I'm trying to concatenate 4 files into one. This is what my raw data looks like:

> S9_L001_R1_001_1P.fq.gz

> S9_L001_R1_001_1U.fq.gz

> S9_L001_R1_001_2P.fq.gz

> S9_L001_R1_001_2U.fq.gz

> S10_L001_R1_001_1P.fq.gz

> S10_L001_R1_001_1U.fq.gz

> S10_L001_R1_001_2P.fq.gz

> S10_L001_R1_001_2U.fq.gz


I have twenty samples (S1-20) and each sample consists of four files (1P, 1U, 2P and 2U). The code I've come up with, which doesn't work, looks like this:

for i in {1..20}; do
  for j in 1 2; do
    cat S${i}_L001_R1_001_${j}*.fq.gz > S${i}_concatenate.fq.gz
  done
done

It only concatenates any 2 files from each sample. Any suggestions? Thanks.

Concatenate • 454 views

I hope there is a reason you are trying to cat these together. Based on the names it looks like these are properly paired and unpaired reads after trimming. Your code is ignoring the 1P, 1U, 2P and 2U in names. What order do you want to concatenate those pieces in?

Since I'm not mapping the reads to a reference genome or building my own, I figured I might as well treat them as single end reads. I don't think it matters what order I map them in, but I guess 1P-1U-2P-2U

14 days ago
DavidStreid ▴ 70

Change the > to >> in the inner loop

• > Writes a new file, overwriting anything already there
• >> Also creates a new file, but will append to the existing file if present

Your code only writes the two S${i}_L001_R1_001_2*.fq.gz files for any given i because it is overwriting the output of the S${i}_L001_R1_001_1*.fq.gz files in the second pass through the inner loop

for i in {1..20}; do
  for j in 1 2; do
    # ONLY CHANGE: ">" => ">>"
    cat S${i}_L001_R1_001_${j}*.fq.gz >> S${i}_concatenate.fq.gz;
  done
done
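One caveat with >> is that running the loop a second time appends everything again. A minimal sketch of one way to guard against that, assuming it is acceptable to delete any earlier output first. The scratch directory and the tiny stand-in files are hypothetical and only there to make the example self-contained:

```shell
set -eu

# Scratch directory with small stand-in files (hypothetical contents,
# each file holds just its own tag so the final order is easy to check).
workdir=$(mktemp -d)
cd "$workdir"
for tag in 1P 1U 2P 2U; do
  printf '%s\n' "$tag" | gzip -c > "S9_L001_R1_001_${tag}.fq.gz"
done

i=9
out="S${i}_concatenate.fq.gz"

# Remove any earlier output so a re-run does not append a second copy.
rm -f "$out"

for j in 1 2; do
  # ">>" appends, so the second pass (j=2) adds to what j=1 wrote.
  cat S${i}_L001_R1_001_${j}*.fq.gz >> "$out"
done

# Concatenated gzip files decompress as a single stream.
result=$(gzip -dc "$out" | tr -d '\n')
echo "$result"
```

The glob expands in sorted order, so the pieces land as 1P, 1U, 2P, 2U, which matches the order asked for above.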


Thank you so much! This worked.


Good luck, np!

14 days ago
Mensur Dlakic ★ 22k

I am all for writing code to support tedious tasks, and I hope you get your answer. That said, it seems easier to type cat and paste 10 names, and do so twice, than to wait for responses here.

From what I can tell, the only thing that needs changing is * to ?

for i in {1..20}; do
  for j in 1 2; do
    cat S${i}_L001_R1_001_${j}?.fq.gz > S${i}_concatenate.fq.gz
  done
done

When in doubt, I suggest you put an echo command in front of your actual command. It will print everything on screen without executing it, so it may be easier to troubleshoot what is wrong.

for i in {1..20}; do
  for j in 1 2; do
    echo "cat S${i}_L001_R1_001_${j}?.fq.gz > S${i}_concatenate.fq.gz"
  done
done
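A small note on the echo trick: inside double quotes the shell does not expand ? or *, so the quoted form prints the pattern itself rather than the files it matches. A rough sketch of the difference, using empty stand-in files in a scratch directory (both hypothetical):

```shell
set -eu

# Scratch directory with empty stand-in files (hypothetical).
workdir=$(mktemp -d)
cd "$workdir"
for tag in 1P 1U 2P 2U; do
  : > "S9_L001_R1_001_${tag}.fq.gz"
done

i=9
j=1

# Quoted: variables are substituted, but the "?" stays a literal "?".
quoted=$(echo "cat S${i}_L001_R1_001_${j}?.fq.gz")

# Unquoted: the shell expands the pattern to the matching file names.
expanded=$(echo cat S${i}_L001_R1_001_${j}?.fq.gz)

echo "$quoted"
echo "$expanded"
```

Leaving the pattern unquoted in the dry run shows exactly which files cat would receive, and in which order.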


Maybe this will do the trick:

for i in {1..20};
do cat S${i}_L001_R1_001_??.fq.gz > S${i}_concatenate.fq.gz; done


What do ? vs. ?? do?

Ah just tried it, the ? is very helpful as a wildcard - thank you


Thank you for your help. I'm currently working with some "test" samples in preparation for my real data which consists of well over 200 samples, so that's why I'd like to have it automated!
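For a larger run, one option is to take the sample names from the files themselves instead of hard-coding a range like {1..20}. A minimal sketch, assuming every sample has a 1P file and follows the naming shown in the question. The scratch directory and stand-in files are hypothetical. It uses a single cat per sample with one > redirection, so nothing is overwritten between passes:

```shell
set -eu

# Scratch directory with stand-in files for three samples (hypothetical).
workdir=$(mktemp -d)
cd "$workdir"
for s in S1 S2 S10; do
  for tag in 1P 1U 2P 2U; do
    printf '%s_%s\n' "$s" "$tag" | gzip -c > "${s}_L001_R1_001_${tag}.fq.gz"
  done
done

# Use each sample's 1P file to discover the sample name.
for first in S*_L001_R1_001_1P.fq.gz; do
  # Drop everything from the first underscore on: S10_L001_... -> S10
  sample=${first%%_*}
  # "??" matches exactly two characters: 1P, 1U, 2P, 2U (in sorted order).
  cat "${sample}"_L001_R1_001_??.fq.gz > "${sample}_concatenate.fq.gz"
done

ls S*_concatenate.fq.gz
```

Because the underscore after the sample name is part of the pattern, S1 does not pick up the S10 files.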