Print based on character strings
3.7 years ago
selplat21 ▴ 20

I have a series of fastq files in different folders like so:

Historic_WestFork/80_S74_L002_R1_001.fastq.gz
Historic_WestFork/80_S74_L002_R2_001.fastq.gz
Historic_WestFork/82_S75_L002_R1_001.fastq.gz
Historic_WestFork/82_S75_L002_R2_001.fastq.gz
Historic_WestFork/83_S76_L002_R1_001.fastq.gz
Historic_WestFork/83_S76_L002_R2_001.fastq.gz
Historic_WestFork/84_S77_L002_R1_001.fastq.gz
Historic_WestFork/84_S77_L002_R2_001.fastq.gz
Historic_WestFork/85_S78_L002_R1_001.fastq.gz
Historic_WestFork/85_S78_L002_R2_001.fastq.gz
Historic_WestFork/86_S79_L002_R1_001.fastq.gz
Historic_WestFork/86_S79_L002_R2_001.fastq.gz
Historic_WestFork/88_S80_L002_R1_001.fastq.gz
Historic_WestFork/88_S80_L002_R2_001.fastq.gz
Historic_WestFork/90_S81_L002_R1_001.fastq.gz
Historic_WestFork/90_S81_L002_R2_001.fastq.gz
Historic_WestFork/91_S82_L002_R1_001.fastq.gz
Historic_WestFork/91_S82_L002_R2_001.fastq.gz
Historic_WestFork2/80_S70_L002_R1_001.fastq.gz
Historic_WestFork2/80_S70_L002_R2_001.fastq.gz
Historic_WestFork2/81_S71_L002_R1_001.fastq.gz
Historic_WestFork2/81_S71_L002_R2_001.fastq.gz
Historic_WestFork2/82_S72_L002_R1_001.fastq.gz
Historic_WestFork2/82_S72_L002_R2_001.fastq.gz
Historic_WestFork2/83_S73_L002_R1_001.fastq.gz
Historic_WestFork2/83_S73_L002_R2_001.fastq.gz
Historic_WestFork2/84_S74_L002_R1_001.fastq.gz
Historic_WestFork2/84_S74_L002_R2_001.fastq.gz
Historic_WestFork2/88_S75_L002_R1_001.fastq.gz
Historic_WestFork2/88_S75_L002_R2_001.fastq.gz

Each unique sample is identified by the leading number in the file name (e.g., 88). Only 5 of the 10 samples have two R1 files and two R2 files. I need to merge the R1 files and the R2 files for each sample, and I want to write a loop that echoes each merge command.

For each sample, I want to echo for example "cat Historic_WestFork/88_S80_L002_R1_001.fastq.gz Historic_WestFork2/88_S75_L002_R1_001.fastq.gz > 88_merged_R1.fastq.gz" (used 88 for example here). And then the same for the R2 files.

Any help would be appreciated for R or command line.

Thank you.

R Command Line Unix • 1.0k views

Your example doesn't match your data. Do you wish to replace the _S*_L00* part with _merged_? Please show a proper example.


Edited, Yes!


This is a fun exercise but if you only have to do this 10 times, you're better off doing it manually. Figuring out a loop is going to take longer than doing it manually.


Thank you so much!!


If an answer was helpful, you should upvote it; if the answer resolved your question, you should mark it as accepted. You can accept more than one if they work.

3.7 years ago
Ram 43k

If the above is all you have, this should work:

for r1fq in $(grep -E "^Historic_WestFork/.+_R1_" list_of_fastq_files.txt)
do
    smpl=$(basename "$r1fq") #remove dir name
    smpl=${smpl%%_*} #trim everything from the first underscore to the end
    r2fq=${r1fq/_R1_/_R2_} #this is the R2 fastq

    echo "cat $r1fq Historic_WestFork2/${smpl}_*_R1_*.fastq.gz > ${smpl}_merged_R1.fastq.gz"
    echo "cat $r2fq Historic_WestFork2/${smpl}_*_R2_*.fastq.gz > ${smpl}_merged_R2.fastq.gz"
    ## Remove the `echo` and the quotes (or pipe the loop's output to bash) to do the actual operation
done

This won't work if you have more than these two folders, or if the folder names differ from the ones above. You might need a more complicated find operation in those cases. It would be easier to create soft links to all the files manually in a separate folder, then use globs within that folder to get things done.
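As a rough illustration of the glob idea, here is a minimal sketch that globs across the folders directly instead of creating soft links first. It is a variation, not the loop above. It assumes bash (for arrays), that it is run from the directory containing the `Historic_WestFork*` folders, and that every file follows the `<sample>_S<n>_L<lane>_R<1|2>_001.fastq.gz` naming shown in the question. The function name `print_merge_cmds` is made up for this example.

```shell
#!/usr/bin/env bash
# Sketch only: print one "cat ... > <sample>_merged_<R1|R2>.fastq.gz" command
# per sample and read direction. Assumes file names look like
# <sample>_S<n>_L<lane>_R<1|2>_001.fastq.gz inside folders matching Historic_WestFork*.
# print_merge_cmds is a hypothetical helper name, not part of any tool.

print_merge_cmds() {
    local f name smpl mate files
    # Collect unique sample IDs: the part of each file name before the first underscore
    for smpl in $(
        for f in Historic_WestFork*/*_R1_*.fastq.gz; do
            [ -e "$f" ] || continue      # glob matched nothing
            name=${f##*/}                # strip the folder name
            echo "${name%%_*}"           # keep the text before the first underscore
        done | sort -u
    ); do
        for mate in R1 R2; do
            # Expand the glob into an array: one entry per folder that has this sample
            files=(Historic_WestFork*/"${smpl}"_*_"${mate}"_*.fastq.gz)
            [ -e "${files[0]}" ] || continue   # no file for this sample/read
            echo "cat ${files[*]} > ${smpl}_merged_${mate}.fastq.gz"
        done
    done
}
```

Calling `print_merge_cmds > merge_commands.sh` writes the commands to a file that can be checked by eye and then run with `bash merge_commands.sh`. Samples that occur in only one folder get a `cat` of a single file, which amounts to a copy under the merged name. The R1 and R2 globs differ only in the read label, so the folders should be listed in the same order for both reads, but that is worth confirming in the printed commands before running them.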
