Question: How to concatenate RNA-seq files generated in different lanes
11 months ago, desu_gett wrote:

I have very large RNA-seq files generated in different lanes. A few of the file names are shown below.

MC9_FNEN_638A_S19_L008_R1_001.fastq.gz
MC9_FNEN_638A_S19_L008_R2_001.fastq.gz
MC9_FNEN_638A_S9_L001_R1_001.fastq.gz
MC9_FNEN_638A_S9_L001_R2_001.fastq.gz
MC9_FNEN_638A_S9_L002_R1_001.fastq.gz
MC9_FREN_638A_S9_L002_R2_001.fastq.gz
MC9_FREN_638A_S9_L006_R1_001.fastq.gz
MC9_FREN_638A_S9_L006_R2_001.fastq.gz
MC9_FREN_638A_S9_L008_R1_001.fastq.gz
MC9_FREN_638A_S9_L008_R2_001.fastq.gz
MC9_637A_S74_L001_R1_001.fastq.gz
MC9_ZH_637A_S74_L001_R2_001.fastq.gz
MC9_ZH_637A_S74_L003_R1_001.fastq.gz
MC9_ZH_637A_S74_L003_R2_001.fastq.gz
MC9_ZH_637A_S74_L007_R1_001.fastq.gz
MC9_ZH_637A_S74_L007_R2_001.fastq.gz
MC9_ZH_637A_S74_L008_R1_001.fastq.gz
MC9_ZH_637A_S74_L008_R2_001.fastq.gz
MC9_ZH_637A_S84_L008_R1_001.fastq.gz
MC9_ZH_637A_S84_L008_R2_001.fastq.gz
DR14_DCRP_479C_S50_L001_R1_001.fastq.gz
DR14_DCRP_479C_S50_L001_R2_001.fastq.gz
DR14_DCRP_479C_S50_L002_R1_001.fastq.gz
DR14_DCRP_479C_S50_L002_R2_001.fastq.gz
DR14_DCRP_479C_S50_L006_R1_001.fastq.gz
DR14_DCRP_479C_S50_L006_R2_001.fastq.gz
DR14_DCRP_479C_S50_L007_R1_001.fastq.gz
DR14_DCRP_479C_S50_L007_R2_001.fastq.gz
DR14_DCRP_479C_S50_L008_R1_001.fastq.gz
DR14_DCRP_479C_S50_L008_R2_001.fastq.gz

I want to concatenate the sequences generated in different lanes, separately for the forward and reverse reads. For example, the first 10 files above come from the same animal and tissue (MC9_FREN). I want to concatenate all the forward reads (XXXX_R1_001.fastq.gz) from the different lanes into MC9_FREN_R1.fastq.gz, and all the reverse reads (XXXX_R2_001.fastq.gz) into MC9_FREN_R2.fastq.gz. Written out by hand, that is:

cat MC9_FREN_638A_S19_L008_R1_001.fastq.gz MC9_FREN_638A_S9_L001_R1_001.fastq.gz MC9_FREN_638A_S9_L002_R1_001.fastq.gz MC9_FREN_638A_S9_L007_R1_001.fastq.gz MC9_FREN_638A_S9_L008_R1_001.fastq.gz > MC9_FREN_R1.fastq.gz

cat MC9_FREN_638A_S19_L008_R2_001.fastq.gz MC9_FREN_638A_S9_L001_R2_001.fastq.gz MC9_FREN_638A_S9_L002_R2_001.fastq.gz MC9_FREN_638A_S9_L007_R2_001.fastq.gz MC9_FREN_638A_S9_L008_R2_001.fastq.gz > MC9_FREN_R2.fastq.gz

cat MC9_ZH_637A_S74_L001_R1_001.fastq.gz MC9_ZH_637A_S74_L003_R1_001.fastq.gz MC9_ZH_637A_S74_L007_R1_001.fastq.gz MC9_ZH_637A_S74_L008_R1_001.fastq.gz MC9_ZH_637A_S84_L008_R1_001.fastq.gz > MC9_ZH_R1.gz

cat MC9_ZH_637A_S74_L001_R2_001.fastq.gz MC9_ZH_637A_S74_L003_R2_001.fastq.gz MC9_ZH_637A_S74_L007_R2_001.fastq.gz MC9_ZH_637A_S74_L008_R2_001.fastq.gz MC9_ZH_637A_S84_L008_R2_001.fastq.gz > MC9_ZH_R2.gz

cat DR14_DCRP_479C_S50_L001_R1_001.fastq.gz DR14_DCRP_479C_S50_L002_R1_001.fastq.gz DR14_DCRP_479C_S50_L006_R1_001.fastq.gz DR14_DCRP_479C_S50_L007_R1_001.fastq.gz DR14_DCRP_479C_S50_L008_R1_001.fastq.gz > DR14_DCRP_R1.gz

cat DR14_DCRP_479C_S50_L001_R2_001.fastq.gz DR14_DCRP_479C_S50_L002_R2_001.fastq.gz DR14_DCRP_479C_S50_L006_R2_001.fastq.gz DR14_DCRP_479C_S50_L007_R2_001.fastq.gz DR14_DCRP_479C_S50_L008_R2_001.fastq.gz > DR14_DCRP_R2.gz

11 months ago, jrj.healey (United Kingdom) wrote:

cat MC9_FREN_*R1* > MC9_FREN_R1.fastq.gz

cat MC9_FREN_*R2* > MC9_FREN_R2.fastq.gz

Rinse and repeat for each sample ID.
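
Plain `cat` is enough here because a gzip file may consist of several compressed members back to back, so concatenated `.gz` files decompress as one stream without recompressing. A minimal sanity check, assuming made-up file names (`DEMO_TIS_...` is hypothetical, not one of the samples above), is to compare read counts before and after merging:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Work in a scratch directory with two tiny made-up lane files (hypothetical names).
workdir=$(mktemp -d)
cd "$workdir"
printf '@read1\nACGT\n+\nIIII\n' | gzip > DEMO_TIS_001A_S1_L001_R1_001.fastq.gz
printf '@read2\nTTGA\n+\nIIII\n@read3\nCCAT\n+\nIIII\n' | gzip > DEMO_TIS_001A_S1_L002_R1_001.fastq.gz

# Byte-for-byte concatenation of gzip files gives a valid multi-member gzip stream.
cat DEMO_TIS_*_R1_001.fastq.gz > DEMO_TIS_R1.fastq.gz

# A FASTQ record is four lines, so reads = decompressed lines / 4.
count_reads() { echo $(( $(gzip -dc "$@" | wc -l) / 4 )); }

lanes_total=$(count_reads DEMO_TIS_*_R1_001.fastq.gz)
merged_total=$(count_reads DEMO_TIS_R1.fastq.gz)
echo "lanes: $lanes_total  merged: $merged_total"
```

If the two numbers differ for a real sample, a lane file was probably missed or picked up twice by the glob. Running the same check on the R2 files also confirms that both mates contain the same number of reads.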


Thanks jrj.healey. I have large files for multiple samples, and writing the command manually for each sample and tissue is time-consuming. It would be nice if someone already had a script to do the task.

written 11 months ago by desu_gett

No one is going to have a script with your IDs in it already.

Do you have a text file with all the lane IDs or anything? How many lanes do you have?

I’ll need to think a bit further about how to do all of it in a single loop.

written 11 months ago by jrj.healey

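# Derive each sample prefix (e.g. MC9_FREN) by stripping the trailing _<id>_<S#>_<lane>_R[12]... part
for name in *_R[12]_001.fastq.gz; do
    printf '%s\n' "${name%_*_*_*_R[12]*}"
done | uniq |
while read -r prefix; do
    # Match only the per-lane inputs, so a rerun does not pick up the merged outputs
    cat "$prefix"_*_R1_001.fastq.gz >"${prefix}_R1.fastq.gz"
    cat "$prefix"_*_R2_001.fastq.gz >"${prefix}_R2.fastq.gz"
done

written 11 months ago by desu_gett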

Looks good, I’d test it by echoing the command in the loop before you run the whole thing though.

written 11 months ago by jrj.healey
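
The dry run suggested above could look roughly like the sketch below. It only prints the `cat` commands it would run. The file names are hypothetical placeholders created in a scratch directory, and the globs are narrowed to `_R[12]_001.fastq.gz` on the assumption that every input follows that naming scheme:

```shell
#!/usr/bin/env bash
set -euo pipefail

workdir=$(mktemp -d)
cd "$workdir"
# Hypothetical empty placeholder files, only so the globs have something to match.
for lane in L001 L002; do
    for r in R1 R2; do
        : > "DEMO_TIS_001A_S1_${lane}_${r}_001.fastq.gz"
    done
done

# Dry run: collect the commands as text instead of executing them.
dry_run=$(
    for name in *_R[12]_001.fastq.gz; do
        printf '%s\n' "${name%_*_*_*_R[12]*}"
    done | uniq |
    while read -r prefix; do
        for r in R1 R2; do
            echo cat "$prefix"_*_"${r}"_001.fastq.gz ">" "${prefix}_${r}.fastq.gz"
        done
    done
)
printf '%s\n' "$dry_run"
```

Once the printed commands list the expected lanes for each sample, removing the `echo` and the quotes around `>` turns the dry run into the real merge.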

I got the command from another forum; I will test it too.

written 11 months ago by desu_gett

Just a word to the wise: it is not really considered fair practice to cross-post the same question in a bunch of places (especially without saying so), as it results in duplication of effort.

written 11 months ago by jrj.healey