Question: Bash script for writing file names, paths, and read directions
Bioinfonext (Korea) wrote, 12 months ago:

Hi, I am using the script below to write the sample name, followed by the path of the file and then the read direction: forward for R1 and reverse for R2. For some files the output is not correct. Could you please suggest what the problem is? The output format is:

sample-id,absolute-filepath,direction

Incorrect output (the forward line is correct, but the reverse line is not):

Leaf-T1-FD-R10_S73_L001,/users/3052771/sharedscratch/Amplicon_data_july_2019/PN0086C_16S-137179427/Amplicon_2019_RNAseq/Leaf-T1-FD-R10_S73_L001_R1_001.fastq.gz,forward    

Leaf-T1-FD-R20_S73_L001_R1_001.fastq.gz,/users/3052771/sharedscratch/Amplicon_data_july_2019/PN0086C_16S-137179427/Amplicon_2019_RNAseq/Leaf-T1-FD-R20_S73_L001_R1_001.fastq.gz,reverse

Correct output:

Leaf-T1-FD-R2_S29_L001,/users/3052771/sharedscratch/Amplicon_data_july_2019/PN0086C_16S-137179427/Amplicon_2019_RNAseq/Leaf-T1-FD-R2_S29_L001_R1_001.fastq.gz,forward

Leaf-T1-FD-R2_S29_L001,/users/3052771/sharedscratch/Amplicon_data_july_2019/PN0086C_16S-137179427/Amplicon_2019_RNAseq/Leaf-T1-FD-R2_S29_L001_R2_001.fastq.gz,reverse

Script:

#!/bin/bash
# header
echo "sample-id,absolute-filepath,direction"

# PATH to directory holding files, no trailing slash!
TARGET="/users/3052771/sharedscratch/Amplicon_data_july_2019/PN0086C_16S-137179427/Amplicon_2019_RNAseq"
for fileR1 in $TARGET/*_R1_*gz; do
    # $fileR1 is the absolute path of the forward read
    fileR2=$(echo $fileR1 | sed 's/R1/R2/')
    R1=$(basename $fileR1)
    R2=$(basename $fileR2)
    sampleR1=$(echo $R1 | sed 's/_R1_001.fastq.gz//')
    sampleR2=$(echo $R2 | sed 's/_R2_001.fastq.gz//')
    # print the results to stdout
    echo "$sampleR1,$fileR1,forward"
    echo "$sampleR2,$fileR2,reverse"
done
SMK wrote, 12 months ago:

Hi Bioinfonext,

Can you check by printing fileR2 right after the line fileR2=$(echo $fileR1 | sed 's/R1/R2/')?
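One way to run that check without touching the real files is to feed a single problem name through the same sed command. This is only a minimal sketch: the file name is the R10 sample from the question, and the variable names follow the original script.

```shell
#!/bin/bash
# File name from the question whose sample ID contains "R10".
fileR1="Leaf-T1-FD-R10_S73_L001_R1_001.fastq.gz"

# Same substitution as in the script: without the g flag,
# sed replaces only the first "R1" it finds on the line.
fileR2=$(echo "$fileR1" | sed 's/R1/R2/')

echo "$fileR2"
# prints: Leaf-T1-FD-R20_S73_L001_R1_001.fastq.gz
```

The printed name shows that the sample ID was changed (R10 became R20) while the _R1_ read tag was left alone.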

Also, you can pass _R1_001.fastq.gz as a second argument to basename, which strips that suffix along with the directory. Test with:

$ ls /users/3052771/sharedscratch/Amplicon_data_july_2019/PN0086C_16S-137179427/Amplicon_2019_RNAseq/Leaf-T1-FD-R2_S29_L001_R*
/users/3052771/sharedscratch/Amplicon_data_july_2019/PN0086C_16S-137179427/Amplicon_2019_RNAseq/Leaf-T1-FD-R2_S29_L001_R1_001.fastq.gz
/users/3052771/sharedscratch/Amplicon_data_july_2019/PN0086C_16S-137179427/Amplicon_2019_RNAseq/Leaf-T1-FD-R2_S29_L001_R2_001.fastq.gz

The following script:

#!/bin/bash
# header
echo "sample-id,absolute-filepath,direction"

# PATH to directory holding files, no trailing slash!
TARGET="/users/3052771/sharedscratch/Amplicon_data_july_2019/PN0086C_16S-137179427/Amplicon_2019_RNAseq"

for fileR1 in $TARGET/*_R1_*gz; do
  # $fileR1 is the absolute path of the forward read
  fileR2=$(echo $fileR1 | sed 's/R1/R2/')
  sampleR1=$(basename $fileR1 _R1_001.fastq.gz)
  sampleR2=$(basename $fileR2 _R2_001.fastq.gz)
  # print the results to stdout
  echo "$sampleR1,$fileR1,forward"
  echo "$sampleR2,$fileR2,reverse"
done

Will return:

sample-id,absolute-filepath,direction
Leaf-T1-FD-R2_S29_L001,/users/3052771/sharedscratch/Amplicon_data_july_2019/PN0086C_16S-137179427/Amplicon_2019_RNAseq/Leaf-T1-FD-R2_S29_L001_R1_001.fastq.gz,forward
Leaf-T1-FD-R2_S29_L001,/users/3052771/sharedscratch/Amplicon_data_july_2019/PN0086C_16S-137179427/Amplicon_2019_RNAseq/Leaf-T1-FD-R2_S29_L001_R2_001.fastq.gz,reverse
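
The suffix argument to basename used in the script above can also be tried in isolation. The sketch below assumes a made-up directory (/data/run1 is hypothetical); basename works on the string only, so the file does not need to exist.

```shell
#!/bin/bash
# Hypothetical path for illustration; basename never opens the file.
fileR1="/data/run1/Leaf-T1-FD-R2_S29_L001_R1_001.fastq.gz"

# With one argument, basename strips only the directory part.
name=$(basename "$fileR1")

# With a suffix as second argument, it also strips that trailing string.
sample=$(basename "$fileR1" _R1_001.fastq.gz)

echo "$name"    # Leaf-T1-FD-R2_S29_L001_R1_001.fastq.gz
echo "$sample"  # Leaf-T1-FD-R2_S29_L001
```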

Hi, the other files are written correctly; the problem is with files that have R10 in their name.

For example, these files are not handled correctly:

Leaf-T1-FD-R10_S73_L001_R1_001.fastq.gz

Leaf-T1-FD-R10_S73_L001_R2_001.fastq.gz

Output of the script:

Leaf-T1-FD-R10_S73_L001,/users/3052771/sharedscratch/Amplicon_data_july_2019/PN0086C_16S-137179427/Amplicon_2019_RNAseq/Leaf-T1-FD-R10_S73_L001_R1_001.fastq.gz,forward

Leaf-T1-FD-R20_S73_L001_R1_001.fastq.gz,/users/3052771/sharedscratch/Amplicon_data_july_2019/PN0086C_16S-137179427/Amplicon_2019_RNAseq/Leaf-T1-FD-R20_S73_L001_R1_001.fastq.gz,reverse
Reply written 12 months ago by Bioinfonext

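You can change fileR2=$(echo $fileR1 | sed 's/R1/R2/') to fileR2=$(echo $fileR1 | sed 's/R1_/R2_/'). sed replaces only the first match on the line, and in these names the first R1 is the one inside R10, so the _R1_ read tag is never reached. With the trailing underscore in the pattern, R10_ no longer matches and the substitution lands on _R1_ instead.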
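
The R1_ pattern fixes the names in this thread, but it would still misfire if a sample ID itself contained R1_ (for example a sample called R1). A possible alternative, sketched here under the assumption that every forward file ends in exactly _R1_001.fastq.gz, is to swap the suffix at the end of the path with bash parameter expansion. The path and sample name below are hypothetical.

```shell
#!/bin/bash
# Assumption: all forward/reverse files end in exactly these suffixes.
suffixR1="_R1_001.fastq.gz"
suffixR2="_R2_001.fastq.gz"

# Hypothetical name whose sample ID contains "R1_",
# which would still trip up sed 's/R1_/R2_/'.
fileR1="/data/run1/Leaf-T1-FD-R1_S12_L001_R1_001.fastq.gz"

# ${var%pattern} removes a match of pattern from the END of the value,
# so only the trailing read tag is replaced.
fileR2="${fileR1%"$suffixR1"}$suffixR2"

# Sample ID: strip the directory and the suffix in one basename call.
sample=$(basename "$fileR1" "$suffixR1")

echo "$sample,$fileR1,forward"
echo "$sample,$fileR2,reverse"
```

Because both reads share one sample ID by construction, there is no need to derive it separately from the R2 name.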

Reply written 12 months ago by SMK

The script now works perfectly after the correction suggested by SMK.

Thanks a lot for your valuable time.

Regards, Bioinfonext

Reply written 12 months ago by Bioinfonext