looping through PE files ?
1
0
Entering edit mode
3.7 years ago
sunnykevin97 ▴ 980

HI

Newbie to shell scripting, written a bash script loop through 100 PE files, unable to generate output. Can some one correct my code or any suggestions.

#!/bin/bash
for f1 in *_R1_sic.fastq
do
  f2=${f1%%_R1.sic.fastq}"_R2.sic.fastq"
  /home/cent/anaconda3/envs/shannon_env/bin/shannon_cpp shannon -k 30 -l 150 -p /home/cent/data/cat_samples/fastqgz/trim/cent1/sickle_files/$f1 /home/cent/data/cat_samples/fastqgz/trim/cent1/sickle_files/$f2 -o /home/cent/data/cat_samples/fastqgz/trim/cent1/sickle_files/$f1$f2_shan -t 5 -u 5 -g 150 -m 150G > /home/cent/data/cat_samples/fastqgz/trim/cent1/sickle_files/$f1$f2_shan_log
done
RNA-Seq • 1.6k views
ADD COMMENT
0
Entering edit mode

unable to generate output

Use echo at start of the command /home/cent/anaconda3/envs/shannon_env/bin/shannon_cpp shannon to make sure command lines look ok. If not you should get an idea of what needs to be changed.

A better option is to grab sample names by doing:

name = $(basename $f1 _R1_sic.fastq)
# then reconstruct names as needed
f1=${name}_R1_sic.fastq
f2=${name}_R2_sic.fastq
${name}.log
ADD REPLY
0
Entering edit mode
for f1 in *_R1_sic.fastq
do
echo "/home/cent/anaconda3/envs/shannon_env/bin/shannon_cpp shannon -k 30 -l 150 -p /home/cent/data/cat_samples/fastqgz/trim/cent1/sickle_files/$f1 /home/cent/data/cat_samples/fastqgz/trim/cent1/sickle_files/${f1/_R1_/_R2_} -o /home/cent/data/cat_samples/fastqgz/trim/cent1/sickle_files/$f1${f1/_R1_/_R2_}_shan -t 5 -u 5 -g 150 -m 150G > /home/cent/data/cat_samples/fastqgz/trim/cent1/sickle_files/$f1${f1/_R1_/_R2_}_shan_log;"
done

Remove echo after checking the function.

However, your output directory and logs will have unnecessarily long names. Assuming that base name of the files is unique, you can try this:

for f1 in *_R1_sic.fastq
do
echo "/home/cent/anaconda3/envs/shannon_env/bin/shannon_cpp shannon -k 30 -l 150 -p /home/cent/data/cat_samples/fastqgz/trim/cent1/sickle_files/$f1 /home/cent/data/cat_samples/fastqgz/trim/cent1/sickle_files/${f1/_R1_/_R2_} -o /home/cent/data/cat_samples/fastqgz/trim/cent1/sickle_files/${f1%_R1_*}_shan -t 5 -u 5 -g 150 -m 150G > /home/cent/data/cat_samples/fastqgz/trim/cent1/sickle_files/${f1%_R1_*}_shan_log;"
done

Remove echo for execution after checking the function.

I suggest you to use parallel for this kind of work.

ADD REPLY
0
Entering edit mode

It not working. : No such file or directory I give all the input files. If I run interactively it works.

ADD REPLY
3
Entering edit mode
3.7 years ago

Hi,

From your example, there is one inconsistency, the pattern name of the file is _R1_sic.fastq or _R1.sic.fastq?

I would suggest to use bash variables to define the path to the program and to the files. It simplifies your code. In addition you can use escape, i.e., \ to break down your code in multiple lines. It also helps to read the code.

The problem with your code is related with the f2=${f1%%_R1.sic.fastq}"_R2.sic.fastq" that is not doing what you're expecting.

Well assuming that is the second one, i.e., _R1.sic.fastq (if not, change accordingly), therefore do the following:

#!/bin/bash

PROGRAM=/home/cent/anaconda3/envs/shannon_env/bin/shannon_cpp
FILES=/home/cent/data/cat_samples/fastqgz/trim/cent1/sickle_files

for f1 in *_R1.sic.fastq
do
        f2=${f1%%_*}_R2.sic.fastq
        $PROGRAM shannon -k 30 -l 150 \
                -p $FILES/$f1 $FILES/$f2 \
                -o $FILES/${f1%%_*}_shan \
                -t 5 -u 5 -g 150 -m 150G > $FILES/${f1%%_*}_shan_log
done

Let me know if the code worked.

António

ADD COMMENT
0
Entering edit mode

When I run the script it overwrites the R1 files. All the R1 replaced with text files.

-rwxr-xr-x 1 cent cent  926 Jul 27 23:10 10_R1.fastq
-rwxr-xr-x 1 cent cent 5.2G Jul 26 20:20 10_R2.fastq
-rwxr-xr-x 1 cent cent  926 Jul 27 23:10 11_R1.fastq
-rwxr-xr-x 1 cent cent 5.4G Jul 26 20:14 11_R2.fastq
-rwxr-xr-x 1 cent cent  926 Jul 27 23:10 12_R1.fastq
-rwxr-xr-x 1 cent cent 5.9G Jul 26 20:18 12_R2.fastq
-rwxr-xr-x 1 cent cent  926 Jul 27 23:10 13_R1.fastq
-rwxr-xr-x 1 cent cent 4.9G Jul 26 20:17 13_R2.fastq
-rwxr-xr-x 1 cent cent  926 Jul 27 23:10 14_R1.fastq
-rwxr-xr-x 1 cent cent 5.9G Jul 26 20:18 14_R2.fastq
-rwxr-xr-x 1 cent cent  926 Jul 27 23:10 15_R1.fastq
-rwxr-xr-x 1 cent cent 6.5G Jul 26 20:10 15_R2.fastq
-rwxr-xr-x 1 cent cent  926 Jul 27 23:10 16_R1.fastq
-rwxr-xr-x 1 cent cent 6.6G Jul 26 20:22 16_R2.fastq
-rwxr-xr-x 1 cent cent  926 Jul 27 23:10 17_R1.fastq
-rwxr-xr-x 1 cent cent 6.5G Jul 26 20:15 17_R2.fastq
-rwxr-xr-x 1 cent cent  926 Jul 27 23:10 18_R1.fastq
ADD REPLY
1
Entering edit mode

Hi,

Yes, I understand why. This is because I did not noticed in your code that $f1$f2_shan is actually the same as $f1, that's why is overwriting the R1 files. I corrected the code to appear in the output only the name of the sample: ${f1%%_*}, i.e., the pattern before the first _ underscore.

#!/bin/bash

PROGRAM=/home/cent/anaconda3/envs/shannon_env/bin/shannon_cpp
FILES=/home/cent/data/cat_samples/fastqgz/trim/cent1/sickle_files

for f1 in *_R1.sic.fastq
do
        f2=${f1%%_*}_R2.sic.fastq
        $PROGRAM shannon -k 30 -l 150 \
                -p $FILES/$f1 $FILES/$f2 \
                -o $FILES/${f1%%_*}_shan \
                -t 5 -u 5 -g 150 -m 150G > $FILES/${f1%%_*}_shan_log
done

Let me know if this solves your problem,

António

ADD REPLY
0
Entering edit mode

It works great. I make similar script for preprocesing analysis any suggestions.

for f1 in *_R1.fastq
  do
          f2=${f1%%_*}_R2.fastq
          time java -Xmx12G -jar $PROGRAM PE \
                 $FILES/$f1 $FILES/$f2 \ 
                 $FILES/${f1%%_*}.trim.fastq  $FILES/${f1%%_*}.untrim.fastq \
                 $FILES/${f2%%_*}.trim.fastq  $FILES/${f2%%_*}.untrim.fastq \
                 SLIDINGWINDOW:4:20 MINLEN:20 ILLUMINACLIP:/home/cent/data/softwares/Trim/adapters/TruSeq3-PE.fa:2:30:15 > $FILES/${f1%%_*}_trim_log
done
ADD REPLY
0
Entering edit mode

I don't have further suggestions. The code seems good to me.

I just want to add that I changed the answer above, to correct the problem related with the overwriting of R1 files.

António

ADD REPLY

Login before adding your answer.

Traffic: 2885 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6