Question: How to concatenate RNA-seq files generated in different lanes
19 months ago by
desu_gett0
desu_gett0 wrote:

I have very large RNA-seq files generated in different lanes. I extracted a few of the file names, as shown below.

MC9_FNEN_638A_S19_L008_R1_001.fastq.gz
MC9_FNEN_638A_S19_L008_R2_001.fastq.gz
MC9_FNEN_638A_S9_L001_R1_001.fastq.gz
MC9_FNEN_638A_S9_L001_R2_001.fastq.gz
MC9_FNEN_638A_S9_L002_R1_001.fastq.gz
MC9_FREN_638A_S9_L002_R2_001.fastq.gz
MC9_FREN_638A_S9_L006_R1_001.fastq.gz
MC9_FREN_638A_S9_L006_R2_001.fastq.gz
MC9_FREN_638A_S9_L008_R1_001.fastq.gz
MC9_FREN_638A_S9_L008_R2_001.fastq.gz
MC9_637A_S74_L001_R1_001.fastq.gz
MC9_ZH_637A_S74_L001_R2_001.fastq.gz
MC9_ZH_637A_S74_L003_R1_001.fastq.gz
MC9_ZH_637A_S74_L003_R2_001.fastq.gz
MC9_ZH_637A_S74_L007_R1_001.fastq.gz
MC9_ZH_637A_S74_L007_R2_001.fastq.gz
MC9_ZH_637A_S74_L008_R1_001.fastq.gz
MC9_ZH_637A_S74_L008_R2_001.fastq.gz
MC9_ZH_637A_S84_L008_R1_001.fastq.gz
MC9_ZH_637A_S84_L008_R2_001.fastq.gz
DR14_DCRP_479C_S50_L001_R1_001.fastq.gz
DR14_DCRP_479C_S50_L001_R2_001.fastq.gz
DR14_DCRP_479C_S50_L002_R1_001.fastq.gz
DR14_DCRP_479C_S50_L002_R2_001.fastq.gz
DR14_DCRP_479C_S50_L006_R1_001.fastq.gz
DR14_DCRP_479C_S50_L006_R2_001.fastq.gz
DR14_DCRP_479C_S50_L007_R1_001.fastq.gz
DR14_DCRP_479C_S50_L007_R2_001.fastq.gz
DR14_DCRP_479C_S50_L008_R1_001.fastq.gz
DR14_DCRP_479C_S50_L008_R2_001.fastq.gz

I want to concatenate, for each sample, all the sequences generated in different lanes, keeping the forward and reverse reads separate. For example, the first 10 file names are sequence files from the same animal and tissue (MC9_FREN). I want to concatenate all the forward reads (XXXXX_R1_001.fastq.gz) generated in different lanes into a file named MC9_FREN_R1.fastq.gz, and all the reverse reads (XXXX_R2_001.fastq.gz) into MC9_FREN_R2.fastq.gz.

cat MC9_FREN_638A_S19_L008_R1_001.fastq.gz MC9_FREN_638A_S9_L001_R1_001.fastq.gz MC9_FREN_638A_S9_L002_R1_001.fastq.gz MC9_FREN_638A_S9_L007_R1_001.fastq.gz MC9_FREN_638A_S9_L008_R1_001.fastq.gz > MC9_FREN_R1.fastq.gz

cat MC9_FREN_638A_S19_L008_R2_001.fastq.gz MC9_FREN_638A_S9_L001_R2_001.fastq.gz MC9_FREN_638A_S9_L002_R2_001.fastq.gz MC9_FREN_638A_S9_L007_R2_001.fastq.gz MC9_FREN_638A_S9_L008_R2_001.fastq.gz > MC9_FREN_R2.fastq.gz

cat MC9_ZH_637A_S74_L001_R1_001.fastq.gz MC9_ZH_637A_S74_L003_R1_001.fastq.gz MC9_ZH_637A_S74_L007_R1_001.fastq.gz MC9_ZH_637A_S74_L008_R1_001.fastq.gz MC9_ZH_637A_S84_L008_R1_001.fastq.gz > MC9_ZH_R1.fastq.gz

cat MC9_ZH_637A_S74_L001_R2_001.fastq.gz MC9_ZH_637A_S74_L003_R2_001.fastq.gz MC9_ZH_637A_S74_L007_R2_001.fastq.gz MC9_ZH_637A_S74_L008_R2_001.fastq.gz MC9_ZH_637A_S84_L008_R2_001.fastq.gz > MC9_ZH_R2.fastq.gz

cat DR14_DCRP_479C_S50_L001_R1_001.fastq.gz DR14_DCRP_479C_S50_L002_R1_001.fastq.gz DR14_DCRP_479C_S50_L006_R1_001.fastq.gz DR14_DCRP_479C_S50_L007_R1_001.fastq.gz DR14_DCRP_479C_S50_L008_R1_001.fastq.gz > DR14_DCRP_R1.fastq.gz

cat DR14_DCRP_479C_S50_L001_R2_001.fastq.gz DR14_DCRP_479C_S50_L002_R2_001.fastq.gz DR14_DCRP_479C_S50_L006_R2_001.fastq.gz DR14_DCRP_479C_S50_L007_R2_001.fastq.gz DR14_DCRP_479C_S50_L008_R2_001.fastq.gz > DR14_DCRP_R2.fastq.gz

rna-seq sequence • 901 views
modified 19 months ago • written 19 months ago by desu_gett0
19 months ago by
Joe15k
United Kingdom
Joe15k wrote:

cat MC9_FREN_*_R1_001.fastq.gz > MC9_FREN_R1.fastq.gz

cat MC9_FREN_*_R2_001.fastq.gz > MC9_FREN_R2.fastq.gz

Rinse and repeat for each sample prefix; the wildcard already picks up every lane.
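
To avoid retyping the two commands for every sample, they can be wrapped in a small function. This is only a sketch: `concat_sample` is a made-up helper name, and it assumes the per-lane files follow the `<prefix>_..._R1_001.fastq.gz` / `<prefix>_..._R2_001.fastq.gz` naming shown in the question and sit in the current directory.

```shell
# concat_sample PREFIX
# Hypothetical helper: merges all per-lane files of one sample prefix
# into PREFIX_R1.fastq.gz and PREFIX_R2.fastq.gz in the current directory.
concat_sample() {
    local prefix=$1 mate
    for mate in R1 R2; do
        # Expand the per-lane inputs for this mate into the positional parameters.
        set -- "${prefix}"_*_"${mate}"_001.fastq.gz
        # If the glob matched nothing, $1 is the literal pattern and does not exist.
        [ -e "$1" ] || { echo "no ${mate} files for ${prefix}" >&2; return 1; }
        # Concatenated gzip members still form a valid gzip stream.
        cat "$@" > "${prefix}_${mate}.fastq.gz"
    done
}
```

It could then be called once per sample, for example `for p in MC9_FREN MC9_ZH DR14_DCRP; do concat_sample "$p"; done`. Because the shell expands both globs in the same sorted order, R1 and R2 stay paired as long as every lane has both files.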

modified 19 months ago • written 19 months ago by Joe15k

Thanks jrj.healey. I have large files for multiple samples, and writing the commands manually for each sample and tissue is time-consuming. It would be nice if someone already had a script to do the task.

modified 19 months ago • written 19 months ago by desu_gett0

No one is going to have a script with your IDs in it already.

Do you have a text file with all the lane IDs or anything? How many lanes do you have?

I’ll need to think a bit further about how to do all of it in a single loop.

ADD REPLYlink modified 19 months ago • written 19 months ago by Joe15k
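# Derive each sample prefix (e.g. MC9_FREN) by stripping the _ID_S#_L###_R#_001 suffix.
for name in *_R[12]_001.fastq.gz; do
    printf '%s\n' "${name%_*_*_*_R[12]*}"
done | sort -u |
while IFS= read -r prefix; do
    # Match only the per-lane inputs so merged outputs are never read back in on a re-run.
    cat "$prefix"_*_R1_001.fastq.gz >"${prefix}_R1.fastq.gz"
    cat "$prefix"_*_R2_001.fastq.gz >"${prefix}_R2.fastq.gz"
done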
modified 19 months ago • written 19 months ago by desu_gett0

Looks good. I’d test it by echoing the commands in the loop before you run the whole thing, though.
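
A dry run along those lines might look like the sketch below. `plan_merges` is a hypothetical name, and the function assumes the same file naming as in the question, with all per-lane files in the current directory. It only prints the `cat` commands it would run and writes nothing.

```shell
# plan_merges
# Hypothetical dry-run helper: prints one "cat ... > output" line per
# sample prefix and mate, without creating or touching any files.
plan_merges() {
    local name prefix mate
    for name in *_L[0-9][0-9][0-9]_R[12]_001.fastq.gz; do
        [ -e "$name" ] || continue                 # glob matched nothing
        # Strip the _ID_S#_L###_R#_001.fastq.gz suffix to get the sample prefix.
        printf '%s\n' "${name%_*_*_*_R[12]*}"
    done | sort -u |
    while IFS= read -r prefix; do
        for mate in R1 R2; do
            # The glob is expanded here, so the printed line lists the real inputs.
            echo cat "${prefix}"_*_"${mate}"_001.fastq.gz '>' "${prefix}_${mate}.fastq.gz"
        done
    done
}
```

Reading through the printed lines should show whether any file was named inconsistently, since a file with a missing field would yield a shorter prefix whose wildcard also matches other samples.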

written 19 months ago by Joe15k

I got the command from another forum. I will test it too.

written 19 months ago by desu_gett0

Just a word to the wise: it is not really considered good practice to cross-post the same question in a bunch of places (especially without saying so), as it results in duplication of effort.

written 19 months ago by Joe15k