Question: bash script for writing each file's sample name, path, and read direction
Bioinfonext (Korea) wrote, 7 weeks ago:

Hi, I am using the script below to write the sample name, followed by the path of the file and then the read direction: forward for R1 and reverse for R2. For some of the files the output is not written correctly. Could you please suggest what the problem is?

sample-id,absolute-filepath,direction

Wrong output: the forward line is correct, but the reverse line is not written correctly:

Leaf-T1-FD-R10_S73_L001,/users/3052771/sharedscratch/Amplicon_data_july_2019/PN0086C_16S-137179427/Amplicon_2019_RNAseq/Leaf-T1-FD-R10_S73_L001_R1_001.fastq.gz,forward

Leaf-T1-FD-R20_S73_L001_R1_001.fastq.gz,/users/3052771/sharedscratch/Amplicon_data_july_2019/PN0086C_16S-137179427/Amplicon_2019_RNAseq/Leaf-T1-FD-R20_S73_L001_R1_001.fastq.gz,reverse

Correct output:

Leaf-T1-FD-R2_S29_L001,/users/3052771/sharedscratch/Amplicon_data_july_2019/PN0086C_16S-137179427/Amplicon_2019_RNAseq/Leaf-T1-FD-R2_S29_L001_R1_001.fastq.gz,forward

Leaf-T1-FD-R2_S29_L001,/users/3052771/sharedscratch/Amplicon_data_july_2019/PN0086C_16S-137179427/Amplicon_2019_RNAseq/Leaf-T1-FD-R2_S29_L001_R2_001.fastq.gz,reverse

Script:

#!/bin/bash
# header
echo "sample-id,absolute-filepath,direction"

# PATH to directory holding files, no trailing slash!
TARGET="/users/3052771/sharedscratch/Amplicon_data_july_2019/PN0086C_16S-137179427/Amplicon_2019_RNAseq"
for fileR1 in $TARGET/*_R1_*gz; do
    # $fileR1 is the absolute path
    fileR2=$(echo $fileR1 | sed 's/R1/R2/')
    R1=$(basename $fileR1)
    R2=$(basename $fileR2)
    sampleR1=$(echo $R1 | sed 's/_R1_001.fastq.gz//')
    sampleR2=$(echo $R2 | sed 's/_R2_001.fastq.gz//')
    # print the results to stdout
    echo "$sampleR1,$fileR1,forward"
    echo "$sampleR2,$fileR2,reverse"
done
SMK wrote, 7 weeks ago:

Hi Bioinfonext,

Can you check by printing fileR2 right after the line fileR2=$(echo $fileR1 | sed 's/R1/R2/')?
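One way to run that check without touching the real data is to push a single file name through the same sed expression. This is only a sketch: the /data/ directory is a shortened, hypothetical stand-in for the real path, and the file name is the one reported as failing in the question.

```bash
#!/bin/bash
# Hypothetical, shortened path standing in for one of the real files.
fileR1="/data/Leaf-T1-FD-R10_S73_L001_R1_001.fastq.gz"

# Same substitution as in the question. Without the g flag, sed replaces
# only the first "R1" it finds on the line.
fileR2=$(echo "$fileR1" | sed 's/R1/R2/')

echo "R1: $fileR1"
echo "R2: $fileR2"
# R2: /data/Leaf-T1-FD-R20_S73_L001_R1_001.fastq.gz
```

The printed R2 name shows that the first R1 on the line is the one inside R10, so the read tag _R1_ is never reached.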

Also, you can pass the suffix _R1_001.fastq.gz to basename as a second argument, so that the directory and the suffix are stripped in one step. Test with:

$ ls /users/3052771/sharedscratch/Amplicon_data_july_2019/PN0086C_16S-137179427/Amplicon_2019_RNAseq/Leaf-T1-FD-R2_S29_L001_R*
/users/3052771/sharedscratch/Amplicon_data_july_2019/PN0086C_16S-137179427/Amplicon_2019_RNAseq/Leaf-T1-FD-R2_S29_L001_R1_001.fastq.gz
/users/3052771/sharedscratch/Amplicon_data_july_2019/PN0086C_16S-137179427/Amplicon_2019_RNAseq/Leaf-T1-FD-R2_S29_L001_R2_001.fastq.gz

The following script:

#!/bin/bash
# header
echo "sample-id,absolute-filepath,direction"

# PATH to directory holding files, no trailing slash!
TARGET="/users/3052771/sharedscratch/Amplicon_data_july_2019/PN0086C_16S-137179427/Amplicon_2019_RNAseq"

for fileR1 in $TARGET/*_R1_*gz; do
  # $fileR1 is the absolute path
  fileR2=$(echo $fileR1 | sed 's/R1/R2/')
  sampleR1=$(basename $fileR1 _R1_001.fastq.gz)
  sampleR2=$(basename $fileR2 _R2_001.fastq.gz)
  # print the results to stdout
  echo "$sampleR1,$fileR1,forward"
  echo "$sampleR2,$fileR2,reverse"
done

Will return:

sample-id,absolute-filepath,direction
Leaf-T1-FD-R2_S29_L001,/users/3052771/sharedscratch/Amplicon_data_july_2019/PN0086C_16S-137179427/Amplicon_2019_RNAseq/Leaf-T1-FD-R2_S29_L001_R1_001.fastq.gz,forward
Leaf-T1-FD-R2_S29_L001,/users/3052771/sharedscratch/Amplicon_data_july_2019/PN0086C_16S-137179427/Amplicon_2019_RNAseq/Leaf-T1-FD-R2_S29_L001_R2_001.fastq.gz,reverse
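The suffix argument to basename used in the script above can be tried on its own. A minimal sketch, again with a hypothetical shortened /data/ path in place of the real directory:

```bash
#!/bin/bash
# Hypothetical, shortened path used only for illustration.
fileR1="/data/Leaf-T1-FD-R2_S29_L001_R1_001.fastq.gz"

# basename drops the directory part; the optional second argument is a
# suffix that is also removed when the name ends with it.
name=$(basename "$fileR1")
sample=$(basename "$fileR1" _R1_001.fastq.gz)

echo "$name"     # Leaf-T1-FD-R2_S29_L001_R1_001.fastq.gz
echo "$sample"   # Leaf-T1-FD-R2_S29_L001
```

A name that does not end in the given suffix is returned with the suffix still attached, so a mis-paired reverse file shows up as a sample-id that still contains .fastq.gz.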

Hi, the other files are written correctly; the problem is with files that have R10 in their name.

For example, this pair is not written correctly:

Leaf-T1-FD-R10_S73_L001_R1_001.fastq.gz

Leaf-T1-FD-R10_S73_L001_R2_001.fastq.gz

Output of the script:

Leaf-T1-FD-R10_S73_L001,/users/3052771/sharedscratch/Amplicon_data_july_2019/PN0086C_16S-137179427/Amplicon_2019_RNAseq/Leaf-T1-FD-R10_S73_L001_R1_001.fastq.gz,forward

Leaf-T1-FD-R20_S73_L001_R1_001.fastq.gz,/users/3052771/sharedscratch/Amplicon_data_july_2019/PN0086C_16S-137179427/Amplicon_2019_RNAseq/Leaf-T1-FD-R20_S73_L001_R1_001.fastq.gz,reverse
Reply written 7 weeks ago by Bioinfonext

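You can change fileR2=$(echo $fileR1 | sed 's/R1/R2/') to fileR2=$(echo $fileR1 | sed 's/R1_/R2_/'). Without the g flag, sed replaces only the first match on the line, and in these file names the first R1 is the one inside R10, not the read tag _R1_.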

Reply written 7 weeks ago by SMK

The script now works perfectly after the correction suggested by SMK.

Thanks a lot for your valuable time.

Regards, Bioinfonext

Reply written 7 weeks ago by Bioinfonext
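The sed 's/R1_/R2_/' correction works for the names in this thread, but it would still mis-pair a sample whose own name contains R1_ (for example a hypothetical Leaf-T1-FD-R1_S5_L001), because the first match would again fall inside the sample name. A possible alternative, sketched here under the assumption that every forward file ends in _R1_001.fastq.gz, is to strip that suffix from the end of the path with shell parameter expansion. The function name manifest_rows and the /data/ paths are made up for illustration.

```bash
#!/bin/bash
# Print the two manifest rows for one forward-read file.
# Assumes the name ends in _R1_001.fastq.gz, as in the thread.
# The body is in ( ) so it runs in a subshell and its variables stay private.
manifest_rows() (
    fileR1=$1
    suffix="_R1_001.fastq.gz"
    # ${var%pattern} removes the pattern only from the END of the value,
    # so an "R1" inside the sample name (R10, R1_S5, ...) is left alone.
    stem=${fileR1%"$suffix"}
    fileR2="${stem}_R2_001.fastq.gz"
    # ${var##*/} strips everything up to the last slash (the directory).
    sample=${stem##*/}
    echo "$sample,$fileR1,forward"
    echo "$sample,$fileR2,reverse"
)

# Hypothetical paths; real use would loop over "$TARGET"/*_R1_001.fastq.gz.
echo "sample-id,absolute-filepath,direction"
manifest_rows "/data/Leaf-T1-FD-R10_S73_L001_R1_001.fastq.gz"
manifest_rows "/data/Leaf-T1-FD-R1_S5_L001_R1_001.fastq.gz"
```

Because the sample name is taken from the same stem for both rows, the forward and reverse lines always carry the same sample-id. The sketch does not check that the R2 file exists; a test such as [ -f "$fileR2" ] could be added if missing mates are possible.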