Can HISAT2 be invoked in bash shell, and can it be fed variables or names with wildcards as inputs?
5.4 years ago
RNAseqer ▴ 270

Hello all,

So I have been trying to write a .sh script that will automate running HISAT2 on a bunch of paired-end data. Since these are massive read files and physical memory on my computer is limited, the script pulls read files off an external hard drive two at a time, puts them into a folder on my desktop, runs programs on them (sending the output back to a freshly made folder on the external hard drive), then deletes the two read files from my computer before moving on to the next pair.

This script has worked quite nicely in the testing phase, where I ran a simple regular-expression Perl script on the read files to check that my program works.

However, I now have to write a line into it calling HISAT2, and I find that I'm not sure how to do that! Since the names of the files are going to be changing, I was wondering if I could use variable names or wildcard characters in the HISAT2 command. My few faltering attempts to do this have given me error messages from HISAT2, specifically "extra parameters" being detected.

I'd be very grateful for any help you could provide in this matter. I have included my script below, and I am painfully aware of how clunky and inefficient it probably is. However, it has the incredible virtue of being code I actually understand (and as the rankest of amateurs when it comes to bash, that REALLY matters more to me than efficiency), so I am primarily concerned with addressing the HISAT2 issue. That said, I would of course take to heart any advice on how to tackle such a problem the next time around!


prev_dir=/Volumes/My\ Passport/systemPipeR_tests_arabidopsis/data/

# new_dir (temporary desktop folder) and DIR (output folder name) must be set before this point

count=0 # keeps track of how many .fastq files have been moved

cd "$prev_dir"
for i in `cat targeted_files.txt`
do
  sed -i '' 's/\r$//' "$i"

  #Making a folder in the directory on our external HD
  mkdir -p "$prev_dir"/"$DIR"

  #Copying 2 mated .fastq files to desktop
  cd "$prev_dir"
  cp "$i" "$new_dir"
  (( count++ ))

  cd "$new_dir"

  #With both mate pair files in desktop folder
  if [ $count -eq 2 ]
  then
    echo "$DIR Forward and Reverse reads both in temporary folder. Processing..."
    count=0             #re-zero count

    sleep 2             # pause to allow user to visually inspect desktop folder contents

    #Loop over all the read files in desktop folder
    for f in ./*.fastq
    do
        # A simple regular expression substitution script, used while writing/testing/troubleshooting script
        perl /Users/mylapple/desktop/test_reads_folder_perl/ "$f" > "$prev_dir"/"$DIR"/"${f%.*}_trimmed.fastq"
    done

    #What I would LIKE to do is use the two read files as part of a HISAT2 run
    #Ideally, output from HISAT2 would be sent to the new folder on ext HD
    #Here's the problem....
    # Can I use wildcard characters/variable names in the following line for the 2 read files? For the output dir?
    # hisat2 -x genome_snp_tran -1 seqrunID_1.fastq -2 seqrunID_2.fastq -S "seqrunID"

    #Now remove the two fastq files from desktop folder
    rm ./*
  fi
  cd "$prev_dir"
done

Again, my main concern is with this bit right here:

# Can I use wildcard characters/variable names in the following line for the 2 read files? For the output dir?
# hisat2 -x genome_snp_tran -1 seqrunID_1.fastq -2 seqrunID_2.fastq -S "seqrunID"
Bash Shell HISAT2 • 1.5k views

$ ls
seqrunID_1.fastq  seqrunID_2.fastq

$ for i in *_1.fastq; do echo hisat2 -x genome_snp_tran -1 "$i" -2 "${i/_1.fastq/_2.fastq}" -S "${i/_1.fastq/}"; done
hisat2 -x genome_snp_tran -1 seqrunID_1.fastq -2 seqrunID_2.fastq -S seqrunID
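
To expand on the one-liner above: variables work fine in a hisat2 call, because the shell substitutes them before hisat2 ever sees the command line. A bare wildcard after -1 or -2 is riskier. A pattern that matches several files expands to several separate words, which is one plausible source of the "extra parameters" message. Below is a minimal sketch, assuming the mates are named <sample>_1.fastq and <sample>_2.fastq as in the question. The helper name hisat2_cmd, its out_dir argument and the .sam suffix on the output are my own illustrative choices, not anything defined by HISAT2. It only prints the command (a dry run) rather than running it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the hisat2 command for one forward-read file.
# Assumes mate files are named <sample>_1.fastq and <sample>_2.fastq.
hisat2_cmd() {
    local fwd="$1"                     # e.g. seqrunID_1.fastq
    local out_dir="$2"                 # e.g. the new folder on the external drive
    local sample="${fwd%_1.fastq}"     # strip the suffix -> seqrunID
    local rev="${sample}_2.fastq"      # matching reverse-read file
    # Quotes are printed around each file name so that paths containing
    # spaces (such as "My Passport") would still be single arguments.
    printf 'hisat2 -x genome_snp_tran -1 "%s" -2 "%s" -S "%s"\n' \
        "$fwd" "$rev" "$out_dir/$(basename "$sample").sam"
}

# Dry run over every forward-read file in the current folder.
for fwd in ./*_1.fastq; do
    [ -e "$fwd" ] || continue          # skip if the glob matched nothing
    hisat2_cmd "$fwd" "$PWD"
done
```

Once the printed commands look right, the printf line can be swapped for a real hisat2 call with the same quoted variables.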
