Question: Can HISAT2 be invoked in bash shell, and can it be fed variables or names with wildcards as inputs?
Asked 9 months ago by RNAseqer:

Hello all,

So I have been trying to write a .sh script that automates running HISAT2 on a batch of paired-end data. These are massive read files and disk space on my computer is limited, so the bash script copies the read files from the external hard drive two at a time into a folder on my desktop, runs programs on them (sending the output back to a freshly made folder on the external drive), and then deletes the two read files from my computer before moving on to the next pair.

This script has worked quite nicely in the testing phase, where I ran a simple regular-expression Perl script on the read files to check my program's functionality.

However, I now have to add a line that calls HISAT2, and I'm not sure how to do that! Since the file names change from pair to pair, I was wondering whether I could use variable names or wildcard characters in the HISAT2 command. My few faltering attempts so far have produced error messages from HISAT2, specifically about "extra parameters" being detected.

I'd be very grateful for any help you could provide. I have included my script below, and I am painfully aware of how clunky and inefficient it probably is. However, it has the incredible virtue of being code I actually understand (and as the rankest of amateurs when it comes to bash, that REALLY matters more to me than efficiency), so I am primarily concerned with the HISAT2 issue. That said, I would of course take any advice on how to tackle such a problem next time to heart!

#!/bin/bash
# bash rather than sh, because (( count++ )) below is a bash feature

prev_dir=/Volumes/My\ Passport/systemPipeR_tests_arabidopsis/data/
new_dir=/Users/mylapple/desktop/test_reads_folder_b


count=0 # keeps track of how many .fastq files have been moved 

cd "$prev_dir" || exit 1   # stop if the external drive is not mounted
for i in $(cat targeted_files.txt)
do
   sed -i '' 's/\r$//' "$i"

   #Making a folder in the directory on our external HD
   # (e.g. seqrunID_1.fastq -> seqrunID: everything before the first "_")

   DIR="${i%%_*}"
   mkdir -p "$prev_dir/$DIR"


   #Copying one mate of the pair to the desktop (two loop iterations per pair)
   cd "$prev_dir"
   cp "$i" "$new_dir"
   (( count++ ))

   cd "$new_dir" || exit 1   # so the rm below never runs in the wrong directory

   #With both mate pair files in desktop folder
   if [ "$count" -eq 2 ]
        then
        echo "$DIR: forward and reverse reads both in temporary folder. Processing..."
        count=0             # re-zero count

        sleep 2             # pause so the user can visually check the desktop folder contents


         #Loop over all the read files in desktop folder

                for f in ./*.fastq;
                    do

                    # A simple regular expression substitution script, used while writing/testing/troubleshooting script
                    perl /Users/mylapple/desktop/test_reads_folder_perl/regexSwapFQ.pl "$f" > "$prev_dir"/"$DIR"/"${f%.*}_trimmed.fastq"  

                    done


        # What I would LIKE to do is use the two read files as part of a HISAT2 run
        # Ideally, output from HISAT2 would be sent to the new folder on the external HD

#Here's the problem....
        # Can I use wildcard characters/variable names in the following line for the 2 read files? For the output dir?
        # hisat2 -x genome_snp_tran -1 seqrunID_1.fastq -2 seqrunID_2.fastq -S "seqrunID"



        #Now remove the two fastq files from desktop folder
        rm -- ./*.fastq

    fi


    cd "$prev_dir"

done

Again, my main concern is with this bit right here:

    # Can I use wildcard characters/variable names in the following line for the 2 read files? For the output dir?
    # hisat2 -x genome_snp_tran -1 seqrunID_1.fastq -2 seqrunID_2.fastq -S "seqrunID"
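A likely source of the "extra parameters" message, offered as a hedged sketch: bash splits an unquoted variable at spaces and expands an unquoted wildcard to every matching file, so HISAT2 can end up receiving more arguments than its options expect. prev_dir in the script contains a space (My Passport), so an unquoted path built from it becomes two arguments. The helper count_args below is hypothetical and only prints how many arguments it was given:

```shell
#!/bin/bash
# Hypothetical helper: prints how many arguments it received.
count_args() { echo "$#"; }

# Same value as prev_dir in the script above; note the space in "My Passport".
prev_dir="/Volumes/My Passport/systemPipeR_tests_arabidopsis/data/"
DIR=seqrunID

count_args $prev_dir/$DIR/$DIR.sam     # unquoted: split at the space, prints 2
count_args "$prev_dir/$DIR/$DIR.sam"   # quoted: kept as one argument, prints 1

# An unquoted wildcard expands to every matching file (here two runs' mate-1 files).
tmp=$(mktemp -d)
touch "$tmp/runA_1.fastq" "$tmp/runB_1.fastq"
count_args "$tmp"/*_1.fastq            # two matches, prints 2
rm -r "$tmp"
```

So inside the script, every path built from prev_dir or new_dir should be quoted, and any wildcard given to -1 or -2 should match exactly one file.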

Try this (the echo only prints each command so you can check it; remove echo to actually run HISAT2):

$ ls
seqrunID_1.fastq  seqrunID_2.fastq

$ for i in *_1.fastq; do echo hisat2 -x genome_snp_tran -1 "$i" -2 "${i/_1.fastq/_2.fastq}" -S "${i/_1.fastq/.sam}"; done
hisat2 -x genome_snp_tran -1 seqrunID_1.fastq -2 seqrunID_2.fastq -S seqrunID.sam
Answer written 9 months ago by cpad0112.
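The same parameter-expansion idea can be dropped into the question's script as a small function. This is a minimal sketch, assuming the mates are named ID_1.fastq / ID_2.fastq and that genome_snp_tran is the HISAT2 index basename reachable from the working directory (otherwise give its full path); run_hisat2_pair and DRY_RUN are hypothetical names. The command is kept in a bash array so a path containing a space stays a single argument, and by default it is only printed:

```shell
#!/bin/bash
# Hypothetical helper: print (default) or run the hisat2 command for one read pair.
#   $1 = mate-1 file (e.g. seqrunID_1.fastq)
#   $2 = output directory for the .sam file
# Set DRY_RUN=0 to actually run hisat2 instead of printing the command.
run_hisat2_pair() {
    local r1="$1"
    local outdir="$2"
    local r2="${r1/_1.fastq/_2.fastq}"      # seqrunID_1.fastq -> seqrunID_2.fastq
    local id
    id="$(basename "$r1" _1.fastq)"         # seqrunID_1.fastq -> seqrunID

    # One array element per argument, so "/Volumes/My Passport/..." is not split.
    local cmd=(hisat2 -x genome_snp_tran -1 "$r1" -2 "$r2" -S "$outdir/$id.sam")

    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "${cmd[@]}"
    else
        "${cmd[@]}"
    fi
}
```

Inside the if [ $count -eq 2 ] block, once both mates are in the desktop folder, the commented-out line could become DRY_RUN=0 run_hisat2_pair "${DIR}_1.fastq" "$prev_dir/$DIR", which would write $DIR.sam into the per-sample folder on the external drive.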