Bash script for writing each file's sample name, path, and read direction
4.8 years ago
Bioinfonext ▴ 460

Hi, I am using the script below to write the sample name, followed by the path of the file, and then the direction: forward for R1 and reverse for R2. For some of the files the output is not written correctly. Could you please suggest what the problem is? The expected header is:

sample-id,absolute-filepath,direction

Incorrect output (the forward line is correct, but the reverse line is not):

Leaf-T1-FD-R10_S73_L001,/users/3052771/sharedscratch/Amplicon_data_july_2019/PN0086C_16S-137179427/Amplicon_2019_RNAseq/Leaf-T1-FD-R10_S73_L001_R1_001.fastq.gz,forward    

Leaf-T1-FD-R20_S73_L001_R1_001.fastq.gz,/users/3052771/sharedscratch/Amplicon_data_july_2019/PN0086C_16S-137179427/Amplicon_2019_RNAseq/Leaf-T1-FD-R20_S73_L001_R1_001.fastq.gz,reverse

Correct output:

Leaf-T1-FD-R2_S29_L001,/users/3052771/sharedscratch/Amplicon_data_july_2019/PN0086C_16S-137179427/Amplicon_2019_RNAseq/Leaf-T1-FD-R2_S29_L001_R1_001.fastq.gz,forward

Leaf-T1-FD-R2_S29_L001,/users/3052771/sharedscratch/Amplicon_data_july_2019/PN0086C_16S-137179427/Amplicon_2019_RNAseq/Leaf-T1-FD-R2_S29_L001_R2_001.fastq.gz,reverse

Script:

#!/bin/bash
# header
echo "sample-id,absolute-filepath,direction"

# PATH to directory holding files, no trailing slash!
TARGET="/users/3052771/sharedscratch/Amplicon_data_july_2019/PN0086C_16S-137179427/Amplicon_2019_RNAseq"
for fileR1 in $TARGET/*_R1_*gz; do
    # $fileR1 is the absolute path
    fileR2=$(echo $fileR1 | sed 's/R1/R2/')
    R1=$(basename $fileR1)
    R2=$(basename $fileR2)
    sampleR1=$(echo $R1 | sed 's/_R1_001.fastq.gz//')
    sampleR2=$(echo $R2 | sed 's/_R2_001.fastq.gz//')
    # print the results to stdout
    echo "$sampleR1,$fileR1,forward"
    echo "$sampleR2,$fileR2,reverse"
done
4.8 years ago
AK ★ 2.2k

Hi Bioinfonext,

Can you check by printing fileR2 right after the line fileR2=$(echo $fileR1 | sed 's/R1/R2/')?

Also, you can pass _R1_001.fastq.gz to basename as its suffix argument, so the directory and the suffix are stripped in one step. Test with:

$ ls /users/3052771/sharedscratch/Amplicon_data_july_2019/PN0086C_16S-137179427/Amplicon_2019_RNAseq/Leaf-T1-FD-R2_S29_L001_R*
/users/3052771/sharedscratch/Amplicon_data_july_2019/PN0086C_16S-137179427/Amplicon_2019_RNAseq/Leaf-T1-FD-R2_S29_L001_R1_001.fastq.gz
/users/3052771/sharedscratch/Amplicon_data_july_2019/PN0086C_16S-137179427/Amplicon_2019_RNAseq/Leaf-T1-FD-R2_S29_L001_R2_001.fastq.gz

The following script:

#!/bin/bash
# header
echo "sample-id,absolute-filepath,direction"

# PATH to directory holding files, no trailing slash!
TARGET="/users/3052771/sharedscratch/Amplicon_data_july_2019/PN0086C_16S-137179427/Amplicon_2019_RNAseq"

for fileR1 in $TARGET/*_R1_*gz; do
  # $fileR1 is the absolute path
  fileR2=$(echo $fileR1 | sed 's/R1/R2/')
  sampleR1=$(basename $fileR1 _R1_001.fastq.gz)
  sampleR2=$(basename $fileR2 _R2_001.fastq.gz)
  # print the results to stdout
  echo "$sampleR1,$fileR1,forward"
  echo "$sampleR2,$fileR2,reverse"
done

Will return:

sample-id,absolute-filepath,direction
Leaf-T1-FD-R2_S29_L001,/users/3052771/sharedscratch/Amplicon_data_july_2019/PN0086C_16S-137179427/Amplicon_2019_RNAseq/Leaf-T1-FD-R2_S29_L001_R1_001.fastq.gz,forward
Leaf-T1-FD-R2_S29_L001,/users/3052771/sharedscratch/Amplicon_data_july_2019/PN0086C_16S-137179427/Amplicon_2019_RNAseq/Leaf-T1-FD-R2_S29_L001_R2_001.fastq.gz,reverse
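As a minimal sketch of the basename suffix form used in the script above: the second argument is removed from the end of the file name if it is present. The path here is a shortened, hypothetical stand-in for the real directory.

```bash
#!/bin/bash
# Hypothetical shortened path, used only for illustration.
fileR1="/data/run/Leaf-T1-FD-R2_S29_L001_R1_001.fastq.gz"

# basename NAME SUFFIX drops the directory part and, if present, the trailing SUFFIX.
sampleR1=$(basename "$fileR1" _R1_001.fastq.gz)

echo "$sampleR1"   # prints Leaf-T1-FD-R2_S29_L001
```

If the suffix does not match the end of the name, basename leaves the name unchanged, which is worth keeping in mind when the R2 path is built incorrectly.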

Hi, the other files are written correctly; the problem is with files that have R10 in their name.

For example, this pair is not handled correctly:

Leaf-T1-FD-R10_S73_L001_R1_001.fastq.gz

Leaf-T1-FD-R10_S73_L001_R2_001.fastq.gz

Output of the script:

Leaf-T1-FD-R10_S73_L001,/users/3052771/sharedscratch/Amplicon_data_july_2019/PN0086C_16S-137179427/Amplicon_2019_RNAseq/Leaf-T1-FD-R10_S73_L001_R1_001.fastq.gz,forward

Leaf-T1-FD-R20_S73_L001_R1_001.fastq.gz,/users/3052771/sharedscratch/Amplicon_data_july_2019/PN0086C_16S-137179427/Amplicon_2019_RNAseq/Leaf-T1-FD-R20_S73_L001_R1_001.fastq.gz,reverse
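The output above can be reproduced in isolation. This is only a sketch of the likely cause, using the file name from the report with the directory omitted: sed without the g flag replaces just the first match, and in this name the first R1 is the start of R10.

```bash
#!/bin/bash
# File name taken from the report above; directory omitted for brevity.
fileR1="Leaf-T1-FD-R10_S73_L001_R1_001.fastq.gz"

# Without the g flag, sed replaces only the first match on the line.
# The first "R1" here is the start of "R10", not the "_R1_" read tag.
fileR2=$(echo "$fileR1" | sed 's/R1/R2/')
echo "$fileR2"     # prints Leaf-T1-FD-R20_S73_L001_R1_001.fastq.gz

# The suffix _R2_001.fastq.gz is absent, so basename strips nothing.
sampleR2=$(basename "$fileR2" _R2_001.fastq.gz)
echo "$sampleR2"   # prints Leaf-T1-FD-R20_S73_L001_R1_001.fastq.gz
```

This matches both symptoms in the reverse line: the sample name becomes R20, and the full file name appears in the sample-id column because the suffix was never removed.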

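You can change fileR2=$(echo $fileR1 | sed 's/R1/R2/') to fileR2=$(echo $fileR1 | sed 's/R1_/R2_/'). Without the trailing underscore, sed replaces the first R1 it finds, which for these files is the R1 inside R10 rather than the _R1_ read tag.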
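One caveat, offered as an assumption about the data rather than something confirmed in this thread: if a sample name itself contains R1 followed by an underscore (for example a hypothetical Leaf-T1-FD-R1_S28_L001), a pattern that matches R1_ would hit the sample name first. A sketch that sidesteps this strips the known suffix from the end of the path with shell parameter expansion. The helper name pair_for and the paths are hypothetical, and every forward file is assumed to end in _R1_001.fastq.gz.

```bash
#!/bin/bash
# pair_for is a hypothetical helper name, not part of the original script.
# It assumes every forward file ends in _R1_001.fastq.gz.
pair_for() {
    local fileR1=$1
    # ${var%pattern} removes the shortest match of pattern from the END of var,
    # so only the trailing read tag is touched, never the sample name.
    local prefix=${fileR1%_R1_001.fastq.gz}
    # ${var##*/} removes everything up to the last slash (the directory part).
    local sample=${prefix##*/}
    echo "$sample,$fileR1,forward"
    echo "$sample,${prefix}_R2_001.fastq.gz,reverse"
}

# Hypothetical paths for illustration only.
pair_for "/data/run/Leaf-T1-FD-R10_S73_L001_R1_001.fastq.gz"
pair_for "/data/run/Leaf-T1-FD-R1_S28_L001_R1_001.fastq.gz"
```

In the original loop, the body could call pair_for "$fileR1" in place of the sed and basename lines; quoting the variable also keeps paths with spaces intact.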


The script now works perfectly after the correction suggested by SMK.

Thanks a lot for your valuable time.

Regards, Bioinfonext
