I have very large RNA-seq files generated in different lanes. I extracted a few of the file names, as shown below.
MC9_FNEN_638A_S19_L008_R1_001.fastq.gz
MC9_FNEN_638A_S19_L008_R2_001.fastq.gz
MC9_FNEN_638A_S9_L001_R1_001.fastq.gz
MC9_FNEN_638A_S9_L001_R2_001.fastq.gz
MC9_FNEN_638A_S9_L002_R1_001.fastq.gz
MC9_FREN_638A_S9_L002_R2_001.fastq.gz
MC9_FREN_638A_S9_L006_R1_001.fastq.gz
MC9_FREN_638A_S9_L006_R2_001.fastq.gz
MC9_FREN_638A_S9_L008_R1_001.fastq.gz
MC9_FREN_638A_S9_L008_R2_001.fastq.gz
MC9_637A_S74_L001_R1_001.fastq.gz
MC9_ZH_637A_S74_L001_R2_001.fastq.gz
MC9_ZH_637A_S74_L003_R1_001.fastq.gz
MC9_ZH_637A_S74_L003_R2_001.fastq.gz
MC9_ZH_637A_S74_L007_R1_001.fastq.gz
MC9_ZH_637A_S74_L007_R2_001.fastq.gz
MC9_ZH_637A_S74_L008_R1_001.fastq.gz
MC9_ZH_637A_S74_L008_R2_001.fastq.gz
MC9_ZH_637A_S84_L008_R1_001.fastq.gz
MC9_ZH_637A_S84_L008_R2_001.fastq.gz
DR14_DCRP_479C_S50_L001_R1_001.fastq.gz
DR14_DCRP_479C_S50_L001_R2_001.fastq.gz
DR14_DCRP_479C_S50_L002_R1_001.fastq.gz
DR14_DCRP_479C_S50_L002_R2_001.fastq.gz
DR14_DCRP_479C_S50_L006_R1_001.fastq.gz
DR14_DCRP_479C_S50_L006_R2_001.fastq.gz
DR14_DCRP_479C_S50_L007_R1_001.fastq.gz
DR14_DCRP_479C_S50_L007_R2_001.fastq.gz
DR14_DCRP_479C_S50_L008_R1_001.fastq.gz
DR14_DCRP_479C_S50_L008_R2_001.fastq.gz
I want to concatenate all the sequences generated in different lanes, separately for the forward and reverse reads. For example, the first 10 files listed are sequence files from the same animal and tissue (MC9_FREN). I want to concatenate all the forward reads (XXXXX_R1_001.fastq.gz) that were generated in different lanes into a file named MC9_FREN_R1.fastq.gz, and all the reverse reads (XXXX_R2_001.fastq.gz) into MC9_FREN_R2.fastq.gz.
cat MC9_FREN_638A_S19_L008_R1_001.fastq.gz MC9_FREN_638A_S9_L001_R1_001.fastq.gz MC9_FREN_638A_S9_L002_R1_001.fastq.gz MC9_FREN_638A_S9_L007_R1_001.fastq.gz MC9_FREN_638A_S9_L008_R1_001.fastq.gz > MC9_FREN_R1.fastq.gz
cat MC9_FREN_638A_S19_L008_R2_001.fastq.gz MC9_FREN_638A_S9_L001_R2_001.fastq.gz MC9_FREN_638A_S9_L002_R2_001.fastq.gz MC9_FREN_638A_S9_L007_R2_001.fastq.gz MC9_FREN_638A_S9_L008_R2_001.fastq.gz > MC9_FREN_R2.fastq.gz
cat MC9_ZH_637A_S74_L001_R1_001.fastq.gz MC9_ZH_637A_S74_L003_R1_001.fastq.gz MC9_ZH_637A_S74_L007_R1_001.fastq.gz MC9_ZH_637A_S74_L008_R1_001.fastq.gz MC9_ZH_637A_S84_L008_R1_001.fastq.gz > MC9_ZH_R1.gz
cat MC9_ZH_637A_S74_L001_R2_001.fastq.gz MC9_ZH_637A_S74_L003_R2_001.fastq.gz MC9_ZH_637A_S74_L007_R2_001.fastq.gz MC9_ZH_637A_S74_L008_R2_001.fastq.gz MC9_ZH_637A_S84_L008_R2_001.fastq.gz > MC9_ZH_R2.gz
cat DR14_DCRP_479C_S50_L001_R1_001.fastq.gz DR14_DCRP_479C_S50_L002_R1_001.fastq.gz DR14_DCRP_479C_S50_L006_R1_001.fastq.gz DR14_DCRP_479C_S50_L007_R1_001.fastq.gz DR14_DCRP_479C_S50_L008_R1_001.fastq.gz > DR14_DCRP_R1.gz
cat DR14_DCRP_479C_S50_L001_R2_001.fastq.gz DR14_DCRP_479C_S50_L002_R2_001.fastq.gz DR14_DCRP_479C_S50_L006_R2_001.fastq.gz DR14_DCRP_479C_S50_L007_R2_001.fastq.gz DR14_DCRP_479C_S50_L008_R2_001.fastq.gz > DR14_DCRP_R2.gz
Thanks jrj.healey. I have large files for multiple samples, and writing the commands manually for each sample and tissue is time consuming. It would be nice if someone already had a script to do the task.
No one is going to have a script with your IDs in it already.
Do you have a text file with all the lane IDs or anything? How many lanes do you have?
I’ll need to think a bit further about how to do all of it in a single loop.
Looks good, I’d test it by echoing the command in the loop before you run the whole thing though.
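For illustration only, here is a minimal sketch of what an echo-first loop could look like. It is not the command discussed above. It assumes every file is named <animal>_<tissue>_<id>_S<n>_L<lane>_R<1|2>_001.fastq.gz and that the first two underscore-separated fields identify the sample; the function name merge_lanes and its yes/no argument are made up for this example. Files that do not follow that pattern would be grouped incorrectly, so check the printed commands first. Plain cat is enough here because concatenated gzip files are themselves a valid gzip stream.

```bash
# Sketch: merge per-lane FASTQ files into one file per sample and read direction.
# ASSUMPTION: names look like <animal>_<tissue>_<id>_S<n>_L<lane>_R<1|2>_001.fastq.gz
# and the first two underscore-separated fields identify the sample.
# merge_lanes is a hypothetical helper name, not an existing tool.

merge_lanes() {
    dry_run="${1:-yes}"   # "yes" only prints the commands; "no" runs them

    # Unique <animal>_<tissue> prefixes, taken from the R1 files in this directory.
    samples=$(for f in *_L[0-9][0-9][0-9]_R1_001.fastq.gz; do
                  if [ -e "$f" ]; then
                      printf '%s\n' "$f" | cut -d_ -f1,2
                  fi
              done | sort -u)

    for sample in $samples; do
        for read in R1 R2; do
            # The glob expands in sorted order, so R1 and R2 are merged in the
            # same lane order as long as every lane has both files.
            set -- "${sample}"_*_L[0-9][0-9][0-9]_"${read}"_001.fastq.gz
            [ -e "$1" ] || continue   # no matching files for this read
            out="${sample}_${read}.fastq.gz"
            if [ "$dry_run" = yes ]; then
                echo "cat $* > $out"
            else
                cat "$@" > "$out"
            fi
        done
    done
}

# Dry run: prints one cat command per sample and read, writes nothing.
merge_lanes yes
```

Once the printed commands look right, calling merge_lanes no in the same directory runs them. The merged names do not match the input pattern, so rerunning the function does not pick up its own output.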
I got the command from another forum. I will test it too.
Just a word to the wise: it is not really considered fair practice to cross-post the same question in a bunch of places (especially without saying so), as it results in duplication of effort.