Question: HISAT2 for loop shell scripting
0
gravatar for e.amiri79
16 months ago by
e.amiri790
e.amiri790 wrote:

Dear all would you please help me to modify my loop in bash to align my samples of FASTQ files? I have paired end RNA-seq files as:

EGG12-Clean_ACTGAT_S33_L004_R1_001.fastq
EGG12-Clean_ACTGAT_S33_L004_R2_001.fastq
EGG14-Clean_GAGTGG_S34_L004_R1_001.fastq
EGG14-Clean_GAGTGG_S34_L004_R2_001.fastq

... I have tried:

#!/bin/bash
export RNA_HOME=~/workspace/rnaseq
cd $RNA_HOME
export RNA_DATA_DIR=$RNA_HOME/data
cd $RNA_DATA_DIR
export RNA_REF_INDEX=$RNA_REFS_DIR/amel_OGSv3.2

for i in $(ls *.fastq | rev | cut -c 22- | rev | uniq)
do
    hisat2 -p 8 \
        --rg-id=${i} \
        --rg SM:${i}\
        --rg PL:ILLUMINA \
        -x $RNA_REF_INDEX \
        --dta --rna-strandness RF \
        -1 $RNA_DATA_DIR/${i}_*_R1_001.fastq \
        -2 $RNA_DATA_DIR/${i}_*_R2_001.fastq \
        -S ./${i}.sam
done;

but unfortunately it can not find the paired end files and therefore will not be executed. Any help would be appreciated

rna-seq • 1.4k views
ADD COMMENTlink modified 16 months ago by jkkim30 • written 16 months ago by e.amiri790
1

Looks like you are entering twice in $RNA_DATA_DIR. You do:

...
cd $RNA_DATA_DIR
...
-1 $RNA_DATA_DIR/${i}_*_R1_001.fastq \
-2 $RNA_DATA_DIR/${i}_*_R2_001.fastq \
ADD REPLYlink written 16 months ago by Carlo Yague4.6k
1

In addition $RNA_REFS_DIR is not defined.

$RNA_DATA_DIR/${i}_*_R1_001.fastq is also weird. Why do you use a wildcard here ?

I think your code is a bit messed up. You should probably test your loop first with something like:

for i in $(ls *.fastq | rev | cut -c 22- | rev | uniq)
do
  echo $i
done;

If this doesn't work, then you know you have issues with your loop.

ADD REPLYlink written 16 months ago by Carlo Yague4.6k

Thanks but it works fine and it is exactly what I need

ADD REPLYlink written 16 months ago by e.amiri790

No it was only to let you know about the direction.

ADD REPLYlink written 16 months ago by e.amiri790

I added code markup to your post for increased readability. You can do this by selecting the text and clicking the 101010 button. When you compose or edit a post that button is in your toolbar, see image below:

101010 Button

ADD REPLYlink written 16 months ago by WouterDeCoster40k

One issue you have is that your files are .fastq files but your code is trying to search for files that end in .fa. Change your for loop to this:

for i in $(ls *.fastq | rev | cut -c 22- | rev | uniq)

I don't think this is the only issue but try running this and seeing what happens.

ADD REPLYlink modified 16 months ago • written 16 months ago by danielbower910

Thanks, I did run it but still it is not working

ADD REPLYlink written 16 months ago by e.amiri790

can you update your original post with the modified code you are using, and any error message you are getting? Also can you give the ls or tree output of the directory you are running the script in?

ADD REPLYlink written 16 months ago by steve2.2k

ls: cannot access '*.fastq': No such file or directory

this is what I get.

ADD REPLYlink modified 16 months ago • written 16 months ago by e.amiri790

Please use ADD COMMENT or ADD REPLY to answer to previous reactions, as such this thread remains logically structured and easy to follow. I have now moved your reaction but as you can see it's not optimal. Adding an answer should only be used for providing a solution to the question asked.

DO YOU UNDERSTAND?

ADD REPLYlink written 16 months ago by WouterDeCoster40k

Do not use quotes with shell globs (aka *); the command needs to be: ls *.fastq

ADD REPLYlink written 16 months ago by steve2.2k

@OP: Your $i is EGG12-Clean_ACTGAT_ and when you reconstruct with $RNA_DATA_DIR/${i}_*_R1_001.fastq, your sample is EGG12-Clean_ACTGAT__*_R1_001.fastq (with an extra _) instead of EGG12-Clean_ACTGAT_*_R1_001.fastq. Remove _ after } (some thing like: ${i}*_R1). Try to print $i (echo $i for both reads) after renaming. Change fa to fastq as mentioned above.

Also try to use bash string manipulation:

$ for i in $(ls *.fastq); do echo ${i%_*_*_*_*.fastq};done
EGG12-Clean_ACTGAT
EGG12-Clean_ACTGAT
EGG12-Clean_ACTGAT
EGG12-Clean_ACTGAT

thanks @ram

ADD REPLYlink modified 16 months ago • written 16 months ago by cpad011211k
2
gravatar for h.mon
16 months ago by
h.mon27k
Brazil
h.mon27k wrote:

note: Carlos Yague already pointed out your mistakes, but I will repeat then here because I wrote the post before reading the deeply branched comments.

There are a couple of errors. For example, $RNA_REFS_DIR hasn't been defined, but it is used here:

export RNA_REF_INDEX=$RNA_REFS_DIR/amel_OGSv3.2

Another one is that you want to loop over fastq files, but uses ls *.fa.

Also, if you cd $RNA_DATA_DIR, then on the loop you need:

-1 ${i}_*_R1_001.fastq \
-2 ${i}_*_R2_001.fastq \

Finally, you can't use * ([glob][1]) here, as you need specific file names as input.

It is never a good idea to parse ls, and almost always it is not needed. Instead of you convoluted:

for i in $(ls *.fastq | rev | cut -c 22- | rev | uniq)

You can just glob for filenames, then manipulate $i inside the loop:

for i in *_R1_001.fastq

This is just a skeleton to get you started:

for i in *_R1_001.fastq
do
    SAMPLE=$(echo ${i%_S*})
    R1=$(echo ${i#*_S})
    R2=$(echo ${i#*_S} | sed "s/_R1_/_R2_/")
    echo "${SAMPLE}_S${R1}"
    echo "${SAMPLE}_S${R2}"
done
ADD COMMENTlink written 16 months ago by h.mon27k

Thank you! I need to run it in my computer and if it worked then I can adjust it for my data

ADD REPLYlink written 16 months ago by e.amiri790
1
gravatar for jkkim
16 months ago by
jkkim30
S.Korea
jkkim30 wrote:

I'd like to introduce snakemake for a task like this.

https://snakemake.readthedocs.io/en/stable/

It's easy to use and well-documented. Here is a tutorial from Mr. Slowikowski's website.

https://slowkow.com/notes/snakemake-tutorial/

And this is my mRNAseq pipeline made by snakemake. (I use STAR and RSEM.)

https://github.com/heyyyjude/ngs-data-pipeline/blob/master/mrna-seq.fq.snakemake

ADD COMMENTlink modified 16 months ago • written 16 months ago by jkkim30
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1607 users visited in the last hour