How to concatenate RNA-seq files generated in different lanes
7.3 years ago
desu_gett • 0

I have very large RNA-seq files generated in different lanes. I extracted a few of the file names, shown below:

MC9_FREN_638A_S19_L008_R1_001.fastq.gz
MC9_FREN_638A_S19_L008_R2_001.fastq.gz
MC9_FREN_638A_S9_L001_R1_001.fastq.gz
MC9_FREN_638A_S9_L001_R2_001.fastq.gz
MC9_FREN_638A_S9_L002_R1_001.fastq.gz
MC9_FREN_638A_S9_L002_R2_001.fastq.gz
MC9_FREN_638A_S9_L006_R1_001.fastq.gz
MC9_FREN_638A_S9_L006_R2_001.fastq.gz
MC9_FREN_638A_S9_L008_R1_001.fastq.gz
MC9_FREN_638A_S9_L008_R2_001.fastq.gz
MC9_ZH_637A_S74_L001_R1_001.fastq.gz
MC9_ZH_637A_S74_L001_R2_001.fastq.gz
MC9_ZH_637A_S74_L003_R1_001.fastq.gz
MC9_ZH_637A_S74_L003_R2_001.fastq.gz
MC9_ZH_637A_S74_L007_R1_001.fastq.gz
MC9_ZH_637A_S74_L007_R2_001.fastq.gz
MC9_ZH_637A_S74_L008_R1_001.fastq.gz
MC9_ZH_637A_S74_L008_R2_001.fastq.gz
MC9_ZH_637A_S84_L008_R1_001.fastq.gz
MC9_ZH_637A_S84_L008_R2_001.fastq.gz
DR14_DCRP_479C_S50_L001_R1_001.fastq.gz
DR14_DCRP_479C_S50_L001_R2_001.fastq.gz
DR14_DCRP_479C_S50_L002_R1_001.fastq.gz
DR14_DCRP_479C_S50_L002_R2_001.fastq.gz
DR14_DCRP_479C_S50_L006_R1_001.fastq.gz
DR14_DCRP_479C_S50_L006_R2_001.fastq.gz
DR14_DCRP_479C_S50_L007_R1_001.fastq.gz
DR14_DCRP_479C_S50_L007_R2_001.fastq.gz
DR14_DCRP_479C_S50_L008_R1_001.fastq.gz
DR14_DCRP_479C_S50_L008_R2_001.fastq.gz

I want to concatenate all the reads generated in different lanes, separately for the forward and reverse reads. For example, the first 10 files come from the same animal and tissue (MC9_FREN). I want to concatenate all the forward-read files from the different lanes (XXXXX_R1_001.fastq.gz) into a file named MC9_FREN_R1.fastq.gz, and all the reverse-read files (XXXX_R2_001.fastq.gz) into MC9_FREN_R2.fastq.gz. Done by hand, that looks like this:

cat MC9_FREN_638A_S19_L008_R1_001.fastq.gz MC9_FREN_638A_S9_L001_R1_001.fastq.gz MC9_FREN_638A_S9_L002_R1_001.fastq.gz MC9_FREN_638A_S9_L007_R1_001.fastq.gz MC9_FREN_638A_S9_L008_R1_001.fastq.gz > MC9_FREN_R1.fastq.gz

cat MC9_FREN_638A_S19_L008_R2_001.fastq.gz MC9_FREN_638A_S9_L001_R2_001.fastq.gz MC9_FREN_638A_S9_L002_R2_001.fastq.gz MC9_FREN_638A_S9_L007_R2_001.fastq.gz MC9_FREN_638A_S9_L008_R2_001.fastq.gz > MC9_FREN_R2.fastq.gz

cat MC9_ZH_637A_S74_L001_R1_001.fastq.gz MC9_ZH_637A_S74_L003_R1_001.fastq.gz MC9_ZH_637A_S74_L007_R1_001.fastq.gz MC9_ZH_637A_S74_L008_R1_001.fastq.gz MC9_ZH_637A_S84_L008_R1_001.fastq.gz > MC9_ZH_R1.fastq.gz

cat MC9_ZH_637A_S74_L001_R2_001.fastq.gz MC9_ZH_637A_S74_L003_R2_001.fastq.gz MC9_ZH_637A_S74_L007_R2_001.fastq.gz MC9_ZH_637A_S74_L008_R2_001.fastq.gz MC9_ZH_637A_S84_L008_R2_001.fastq.gz > MC9_ZH_R2.fastq.gz

cat DR14_DCRP_479C_S50_L001_R1_001.fastq.gz DR14_DCRP_479C_S50_L002_R1_001.fastq.gz DR14_DCRP_479C_S50_L006_R1_001.fastq.gz DR14_DCRP_479C_S50_L007_R1_001.fastq.gz DR14_DCRP_479C_S50_L008_R1_001.fastq.gz > DR14_DCRP_R1.fastq.gz

cat DR14_DCRP_479C_S50_L001_R2_001.fastq.gz DR14_DCRP_479C_S50_L002_R2_001.fastq.gz DR14_DCRP_479C_S50_L006_R2_001.fastq.gz DR14_DCRP_479C_S50_L007_R2_001.fastq.gz DR14_DCRP_479C_S50_L008_R2_001.fastq.gz > DR14_DCRP_R2.fastq.gz
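Before concatenating, it may be worth checking that every R1 lane file has a matching R2 file. If one lane's R2 is missing or misnamed, the merged R1 and R2 files go out of step and every read pair after that lane ends up mispaired. Below is a minimal bash sketch; check_pairs is a hypothetical helper name, and it assumes the Illumina-style names ending in _R1_001.fastq.gz / _R2_001.fastq.gz sit in the current directory.

```shell
# check_pairs (hypothetical helper): report any lane file whose R1/R2 mate is
# missing from the current directory; return 1 if at least one is missing.
# Assumes names end in _R1_001.fastq.gz or _R2_001.fastq.gz.
check_pairs() {
    local f mate status=0
    for f in *_R1_001.fastq.gz *_R2_001.fastq.gz; do
        [[ -e "$f" ]] || continue   # a pattern that matched nothing stays literal
        case "$f" in
            *_R1_001.fastq.gz) mate="${f%_R1_001.fastq.gz}_R2_001.fastq.gz" ;;
            *)                 mate="${f%_R2_001.fastq.gz}_R1_001.fastq.gz" ;;
        esac
        if [[ ! -e "$mate" ]]; then
            printf 'no mate for %s (expected %s)\n' "$f" "$mate" >&2
            status=1
        fi
    done
    return "$status"
}
```

Running check_pairs in the data directory prints nothing and returns 0 when every file has its mate.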

7.3 years ago
Joe 22k

cat MC9_FREN_*_R1_001.fastq.gz > MC9_FREN_R1.fastq.gz

cat MC9_FREN_*_R2_001.fastq.gz > MC9_FREN_R2.fastq.gz

Rinse and repeat for each sample/tissue prefix (e.g. MC9_ZH, DR14_DCRP).
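The two commands per sample can also be wrapped in a small bash function and looped over the sample prefixes. This is a sketch rather than a tested pipeline: merge_sample is a hypothetical helper, and it assumes every lane file is named <prefix>_<id>_S<n>_L<lane>_R<1|2>_001.fastq.gz. The glob "<prefix>_*_R1_001.fastq.gz" cannot match the merged output, so re-running it does not fold the previous result back in, and the function stops instead of writing an empty file when a prefix has no lane files.

```shell
# merge_sample (hypothetical helper): concatenate all lane files of one sample
# into <prefix>_R1.fastq.gz and <prefix>_R2.fastq.gz.
# Assumes names like <prefix>_<id>_S<n>_L<lane>_R<1|2>_001.fastq.gz.
merge_sample() {
    local prefix=$1 end
    local -a files
    for end in R1 R2; do
        # Sorted glob expansion; never matches the "<prefix>_R1.fastq.gz" output.
        files=( "${prefix}_"*_"${end}"_001.fastq.gz )
        if [[ ! -e "${files[0]}" ]]; then
            printf 'no %s lane files for %s\n' "$end" "$prefix" >&2
            return 1
        fi
        # Concatenated gzip members form a valid gzip stream; no need to decompress.
        cat "${files[@]}" > "${prefix}_${end}.fastq.gz"
    done
}
```

For the three samples in the question this would be: for p in MC9_FREN MC9_ZH DR14_DCRP; do merge_sample "$p"; done. Because the R1 and R2 globs expand in the same sorted order, the lanes are concatenated in the same order in both files, which keeps the read pairs in step.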


Thanks jrj.healey. I have large files for many samples, and writing the commands manually for each sample and tissue is time-consuming. It would be nice if someone already had a script to do this task.


No one is going to have a script with your IDs in it already.

Do you have a text file with all the lane IDs or anything? How many lanes do you have?

I’ll need to think a bit further about how to do all of it in a single loop.
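A text file of IDs does not have to be written by hand: the sample prefixes and how many lane files each one has can be pulled from the file names themselves. A minimal bash sketch, assuming the naming scheme from the question (prefix, then three underscore-separated fields, then _R1_001.fastq.gz) and no spaces in names; list_samples is a hypothetical name. It counts R1 files only, so each number is the count of R1 lane files per sample (for MC9_ZH that includes both S74 and S84 files).

```shell
# list_samples (hypothetical helper): print "<prefix><TAB><number of R1 lane files>"
# for every sample in the current directory. Assumes names like
# <prefix>_<id>_S<n>_L<lane>_R1_001.fastq.gz and no spaces in names.
list_samples() {
    local name
    for name in *_R1_001.fastq.gz; do
        [[ -e "$name" ]] || continue
        # Drop the shortest suffix matching _*_*_*_R1_001.fastq.gz.
        printf '%s\n' "${name%_*_*_*_R1_001.fastq.gz}"
    done | sort | uniq -c | awk '{ printf "%s\t%s\n", $2, $1 }'
}
```

Running list_samples > samples.txt gives a tab-separated list that can be checked by eye before merging anything.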

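for name in *_R1_001.fastq.gz; do
    # strip "_<id>_S<n>_L<lane>_R1_001.fastq.gz" to leave the sample prefix, e.g. MC9_FREN
    printf '%s\n' "${name%_*_*_*_R1_001.fastq.gz}"
done | sort -u |
while IFS= read -r prefix; do
    # "_*_R1_001" never matches the merged output, so a re-run does not re-include it
    cat "${prefix}_"*_R1_001.fastq.gz >"${prefix}_R1.fastq.gz"
    cat "${prefix}_"*_R2_001.fastq.gz >"${prefix}_R2.fastq.gz"
done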
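After the loop has run, a quick sanity check is that each merged R1/R2 pair has the same number of lines: every FASTQ record is four lines, and paired files must hold the same number of records. A bash sketch; check_merged is a hypothetical helper, it assumes the merged files are named <prefix>_R1.fastq.gz / <prefix>_R2.fastq.gz in the current directory, and since it decompresses every file it will take a while on large data.

```shell
# check_merged (hypothetical helper): for every <prefix>_R1.fastq.gz in the
# current directory, compare its line count with <prefix>_R2.fastq.gz.
# Prints the pair count per sample; returns 1 if any pair differs.
check_merged() {
    local r1 r2 n1 n2 status=0
    for r1 in *_R1.fastq.gz; do
        [[ -e "$r1" ]] || continue                      # no merged files yet
        r2="${r1%_R1.fastq.gz}_R2.fastq.gz"
        # $(( )) normalises wc's output (some wc versions pad with spaces).
        n1=$(( $(gzip -dc "$r1" | wc -l) ))
        n2=$(( $(gzip -dc "$r2" 2>/dev/null | wc -l) ))  # missing R2 counts as 0
        if (( n1 == n2 )); then
            printf '%s: %d read pairs\n' "${r1%_R1.fastq.gz}" "$(( n1 / 4 ))"
        else
            printf '%s: R1 has %d lines but R2 has %d\n' "${r1%_R1.fastq.gz}" "$n1" "$n2" >&2
            status=1
        fi
    done
    return "$status"
}
```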

Looks good. I’d test it by echoing the commands in the loop before running the whole thing, though.
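Following that suggestion, here is a dry-run sketch of the same loop that only prints the cat commands, so the file groupings can be checked before anything is written. dry_run_merge is a hypothetical name, and it assumes the <prefix>_<id>_S<n>_L<lane>_R<1|2>_001.fastq.gz naming; printf %q quotes each name the way the shell would need it.

```shell
# dry_run_merge (hypothetical helper): print, but do not run, the cat commands
# that would merge each sample's lanes. Assumes names like
# <prefix>_<id>_S<n>_L<lane>_R<1|2>_001.fastq.gz in the current directory.
dry_run_merge() {
    local name prefix end
    for name in *_R1_001.fastq.gz; do
        [[ -e "$name" ]] || continue
        printf '%s\n' "${name%_*_*_*_R1_001.fastq.gz}"   # sample prefix
    done | sort -u |
    while IFS= read -r prefix; do
        for end in R1 R2; do
            printf 'cat'
            printf ' %q' "${prefix}_"*_"${end}"_001.fastq.gz   # %q shell-quotes each name
            printf ' > %q\n' "${prefix}_${end}.fastq.gz"
        done
    done
}
```

Once the printed commands look right, they can be piped to bash (dry_run_merge | bash) or the real loop can be run instead.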


I got the command from another forum; I will test it too.


Just a word to the wise: it is not really considered fair practice to cross-post the same question in a bunch of places (especially without saying so), as it leads to duplicated effort.
