Question: How to concatenate RNA-seq files generated in different lanes
2.5 years ago by
desu_gett0 wrote:

I have very large RNA-seq files generated in different lanes. A few of the file names are shown below.

MC9_FNEN_638A_S19_L008_R1_001.fastq.gz
MC9_FNEN_638A_S19_L008_R2_001.fastq.gz
MC9_FNEN_638A_S9_L001_R1_001.fastq.gz
MC9_FNEN_638A_S9_L001_R2_001.fastq.gz
MC9_FNEN_638A_S9_L002_R1_001.fastq.gz
MC9_FREN_638A_S9_L002_R2_001.fastq.gz
MC9_FREN_638A_S9_L006_R1_001.fastq.gz
MC9_FREN_638A_S9_L006_R2_001.fastq.gz
MC9_FREN_638A_S9_L008_R1_001.fastq.gz
MC9_FREN_638A_S9_L008_R2_001.fastq.gz
MC9_637A_S74_L001_R1_001.fastq.gz
MC9_ZH_637A_S74_L001_R2_001.fastq.gz
MC9_ZH_637A_S74_L003_R1_001.fastq.gz
MC9_ZH_637A_S74_L003_R2_001.fastq.gz
MC9_ZH_637A_S74_L007_R1_001.fastq.gz
MC9_ZH_637A_S74_L007_R2_001.fastq.gz
MC9_ZH_637A_S74_L008_R1_001.fastq.gz
MC9_ZH_637A_S74_L008_R2_001.fastq.gz
MC9_ZH_637A_S84_L008_R1_001.fastq.gz
MC9_ZH_637A_S84_L008_R2_001.fastq.gz
DR14_DCRP_479C_S50_L001_R1_001.fastq.gz
DR14_DCRP_479C_S50_L001_R2_001.fastq.gz
DR14_DCRP_479C_S50_L002_R1_001.fastq.gz
DR14_DCRP_479C_S50_L002_R2_001.fastq.gz
DR14_DCRP_479C_S50_L006_R1_001.fastq.gz
DR14_DCRP_479C_S50_L006_R2_001.fastq.gz
DR14_DCRP_479C_S50_L007_R1_001.fastq.gz
DR14_DCRP_479C_S50_L007_R2_001.fastq.gz
DR14_DCRP_479C_S50_L008_R1_001.fastq.gz
DR14_DCRP_479C_S50_L008_R2_001.fastq.gz

I want to concatenate all the sequences generated in different lanes, separately for the forward and reverse reads. For example, the first 10 files are sequence files from the same animal and a specific tissue (MC9_FREN). I want to concatenate all the forward reads (XXXXX_R1_001.fastq.gz) generated in different lanes into a file named MC9_FREN_R1.fastq.gz, and all the reverse reads (XXXX_R2_001.fastq.gz) into MC9_FREN_R2.fastq.gz:

cat MC9_FREN_638A_S19_L008_R1_001.fastq.gz MC9_FREN_638A_S9_L001_R1_001.fastq.gz MC9_FREN_638A_S9_L002_R1_001.fastq.gz MC9_FREN_638A_S9_L007_R1_001.fastq.gz MC9_FREN_638A_S9_L008_R1_001.fastq.gz > MC9_FREN_R1.fastq.gz

cat MC9_FREN_638A_S19_L008_R2_001.fastq.gz MC9_FREN_638A_S9_L001_R2_001.fastq.gz MC9_FREN_638A_S9_L002_R2_001.fastq.gz MC9_FREN_638A_S9_L007_R2_001.fastq.gz MC9_FREN_638A_S9_L008_R2_001.fastq.gz > MC9_FREN_R2.fastq.gz

cat MC9_ZH_637A_S74_L001_R1_001.fastq.gz MC9_ZH_637A_S74_L003_R1_001.fastq.gz MC9_ZH_637A_S74_L007_R1_001.fastq.gz MC9_ZH_637A_S74_L008_R1_001.fastq.gz MC9_ZH_637A_S84_L008_R1_001.fastq.gz > MC9_ZH_R1.gz

cat MC9_ZH_637A_S74_L001_R2_001.fastq.gz MC9_ZH_637A_S74_L003_R2_001.fastq.gz MC9_ZH_637A_S74_L007_R2_001.fastq.gz MC9_ZH_637A_S74_L008_R2_001.fastq.gz MC9_ZH_637A_S84_L008_R2_001.fastq.gz > MC9_ZH_R2.gz

cat DR14_DCRP_479C_S50_L001_R1_001.fastq.gz DR14_DCRP_479C_S50_L002_R1_001.fastq.gz DR14_DCRP_479C_S50_L006_R1_001.fastq.gz DR14_DCRP_479C_S50_L007_R1_001.fastq.gz DR14_DCRP_479C_S50_L008_R1_001.fastq.gz > DR14_DCRP_R1.gz

cat DR14_DCRP_479C_S50_L001_R2_001.fastq.gz DR14_DCRP_479C_S50_L002_R2_001.fastq.gz DR14_DCRP_479C_S50_L006_R2_001.fastq.gz DR14_DCRP_479C_S50_L007_R2_001.fastq.gz DR14_DCRP_479C_S50_L008_R2_001.fastq.gz > DR14_DCRP_R2.gz

rna-seq sequence • 1.5k views
modified 2.5 years ago • written 2.5 years ago by desu_gett0
2.5 years ago by
United Kingdom
Joe18k wrote:

cat MC9_FREN_*R1* > MC9_FREN_R1.fastq.gz

cat MC9_FREN_*R2* > MC9_FREN_R2.fastq.gz

Rinse and repeat for each sample prefix.
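
Plain cat works here because a gzip file may consist of several compressed members back to back, and gzip decompresses them in sequence. A rough way to sanity-check a merged file is to test its integrity and compare its line count with that of its inputs. This is only a sketch: check_merge is a hypothetical helper name made up for this example, and the read count assumes standard four-line FASTQ records.

```shell
#!/usr/bin/env bash
# check_merge is a hypothetical helper, not part of any existing tool.
# Usage: check_merge MERGED.fastq.gz LANE1.fastq.gz [LANE2.fastq.gz ...]
check_merge() {
    local merged="$1"; shift
    local merged_lines input_lines

    # Fail early if the merged file is not a valid gzip stream.
    gzip -t "$merged" || return 1

    # Count decompressed lines in the merged file and in all inputs together.
    merged_lines=$(gzip -dc "$merged" | wc -l | tr -d ' ')
    input_lines=$(gzip -dc "$@" | wc -l | tr -d ' ')

    if [ "$merged_lines" -eq "$input_lines" ]; then
        # Assumes 4 lines per FASTQ record.
        echo "OK: $merged has $((merged_lines / 4)) reads"
    else
        echo "MISMATCH: $merged has $merged_lines lines, inputs have $input_lines" >&2
        return 1
    fi
}

# Example call (file names taken from the question):
# check_merge MC9_FREN_R1.fastq.gz MC9_FREN_638A_*_R1_001.fastq.gz
```

Note that the example call uses a glob ending in _R1_001.fastq.gz so that the merged file itself is not picked up as one of its own inputs.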

modified 2.5 years ago • written 2.5 years ago by Joe18k

Thanks jrj.healey. I have large files for multiple samples, and writing the commands manually for each sample and tissue is time-consuming. It would be nice if someone already had a script for this task.

modified 2.5 years ago • written 2.5 years ago by desu_gett0

No one is going to have a script with your IDs in it already.

Do you have a text file with all the lane IDs or anything? How many lanes do you have?

I’ll need to think a bit further about how to do all of it in a single loop.

modified 2.5 years ago • written 2.5 years ago by Joe18k
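# Strip the trailing _<id>_<sample>_<lane>_R1/R2... fields to get each sample prefix
for name in *.fastq.gz; do
    printf '%s\n' "${name%_*_*_*_R[12]*}"
done | uniq |
while IFS= read -r prefix; do
    # gzip files can be joined with plain cat; one merged file per read direction
    cat "$prefix"*R1*.fastq.gz >"${prefix}_R1.fastq.gz"
    cat "$prefix"*R2*.fastq.gz >"${prefix}_R2.fastq.gz"
done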
modified 2.5 years ago • written 2.5 years ago by desu_gett0

Looks good, I’d test it by echoing the command in the loop before you run the whole thing though.
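
A dry run along those lines might look like the sketch below, which prints each cat command with its expanded file list instead of executing it. The function name merge_lanes and its 1/0 dry-run argument are hypothetical, introduced only for this example. The globs are also tightened to _R1_ / _R2_ (an assumption based on the file names in the question) so that a sample name containing "R1", or an already merged output file, is not matched by accident.

```shell
#!/usr/bin/env bash
# merge_lanes is a hypothetical helper introduced for this example.
# Usage: merge_lanes 1   -> dry run, only print the commands (default)
#        merge_lanes 0   -> actually concatenate the files
merge_lanes() {
    local dry_run="${1:-1}"
    local name prefix mate

    # Only consider per-lane files, not previously merged outputs.
    for name in *_R[12]_001.fastq.gz; do
        [ -e "$name" ] || continue   # glob matched nothing
        printf '%s\n' "${name%_*_*_*_R[12]*}"
    done | sort -u |
    while IFS= read -r prefix; do
        for mate in R1 R2; do
            if [ "$dry_run" = 1 ]; then
                # Print the command with the glob expanded so the inputs can be inspected.
                echo cat "$prefix"_*_"$mate"_*.fastq.gz ">" "${prefix}_${mate}.fastq.gz"
            else
                cat "$prefix"_*_"$mate"_*.fastq.gz >"${prefix}_${mate}.fastq.gz"
            fi
        done
    done
}

# Inspect first, then run for real:
# merge_lanes 1
# merge_lanes 0
```

Checking the printed file lists before the real run makes it easy to spot samples whose names do not follow the expected pattern and would therefore end up under the wrong prefix.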

written 2.5 years ago by Joe18k

I got the command from another forum. I will test it too.

written 2.5 years ago by desu_gett0

Just a word to the wise: it is not really considered fair practice to cross-post the same question in a bunch of places (especially without saying so), as it results in duplication of effort.

written 2.5 years ago by Joe18k