Question

run multiple files for sortmeRNA

0

Entering edit mode

4.3 years ago

tianshenbio ▴ 170

I used the following codes to run multiple files on sortmeRNA but it did not work...Any problem with it?

#!/bin/bash 

READ1="$1"

READ2="$2"

BASE=$(basename "${READ1/_1P.fq.gz/}")

echo "Uncompressing FASTQ data of $BASE"
gunzip "$READ1" "$READ2"

READ1="${READ1%.gz}"
READ2="${READ2%.gz}"

echo "Merging pairs of $BASE"
merge-paired-reads.sh "$READ1" "$READ2" "${BASE}_merge.fq"

echo "Running SortMeRNA for $BASE"
sortmerna --ref ./rRNA_databases/silva-bac-16s-id90.fasta,./index/silva-bac-16s-db --reads "${BASE}_merge.fq" --aligned \
"${BASE}_rRNA" --other  "${BASE}_clean" --log -a 16 \
-v --paired_in --fastx

echo "Unmerging SortMeRNA filtered pairs for $BASE"
unmerge-paired-reads.sh "${BASE}_clean.fq" \
"${BASE}_clean1.fq" "${BASE}_clean2.fq"

echo "Doing cleanup for $BASE"
gzip "$READ1" "$READ2" "${BASE}_clean1.fq" \
"${BASE}_clean2.fq" "${BASE}_rRNA.fq"
rm "${BASE}_merge.fq" "${BASE}_clean.fq"

##--- END HERE ---

RNA-Seq rna-seq sortmerna sequencing • 2.0k views

ADD COMMENT • link 4.3 years ago by tianshenbio ▴ 170

0

Entering edit mode

do you have any error messages when you run this script? Please also post the example names for forward and reverse files. Can you print out put from variable: $FILEBASE? where did you define it?@ tianshenbio

ADD REPLY • link 4.3 years ago by cpad0112 21k

0

Entering edit mode

Hi! Sorry for the typo, the '$FILEBASE' should be '$BASE'. My files were named as DS_1_1P.fq.gz, DS_1_2P.fq.gz, WS_2_1P.fq.gz... I have different names for 'DS_1_', and '_1P' and '_2P' indicate forward and reverse reads. I tried to execute the script using:

find /path to my original files -name "*.fq.gz" | sort | head -n 32 | while read READ1 do read READ2 bash /path to my script $READ1 $READ2 done

When I started running, the terminal just stuck there and no msg popped up.

ADD REPLY • link 4.3 years ago by tianshenbio ▴ 170

0

Entering edit mode

basename removes only the extension, not the string. Print / echo base output

ADD REPLY • link 4.3 years ago by cpad0112 21k

0

Entering edit mode

Sorry, I didn't get that, could you explain more?

ADD REPLY • link 4.3 years ago by tianshenbio ▴ 170

0

Entering edit mode

Do values of $BASE that you are printing at each step look ok is what @cpad0112 is asking.

ADD REPLY • link 4.3 years ago by GenoMax 141k

0

Entering edit mode

I think the script did not execute at all...

ADD REPLY • link 4.3 years ago by tianshenbio ▴ 170

0

Entering edit mode

What is the exact command you are using to run the script (code posted above in a file)? Code above is expecting to get two values $1 and $2 from command line so it would need to look something like:

my_script.sh blah_R1.fq blah_R2.fq

ADD REPLY • link 4.3 years ago by GenoMax 141k

0

Entering edit mode

I mentioned that in my previous response, it was:

find /path to my original files -name "*.fq.gz" | sort | head -n 32 | while read READ1 do read READ2 bash /path to my script.sh $READ1 $READ2 done

ADD REPLY • link 4.3 years ago by tianshenbio ▴ 170

0

Entering edit mode

Here is setup similar to your case:

$ ls -1
DS_1_1P.fq.gz
DS_1_2P.fq.gz
DS_2_1P.fq.gz
DS_2_2P.fq.gz
my.sh

My script just has the first part of your script:

$ more my.sh
READ1="$1"

READ2="$2"

BASE=$(basename "${READ1/_1P.fq.gz/}")

echo ${BASE}

If I run your command line then I can correctly grab the filename in $BASE variable. So you need to figure out what else is not working in your script after the step where you grab the base file name.

$ ls -1 *.gz | sort | head -n 32 | while read READ1; do read READ2; echo $READ1 $READ2; bash my.sh $READ1 $READ2; done
DS_1_1P.fq.gz DS_1_2P.fq.gz
DS_1
DS_2_1P.fq.gz DS_2_2P.fq.gz
DS_2

Edit: Depending on the size of your data files it may take a while for this analysis to run. You are uncompressing the data files and then doing additional operations on them.

ADD REPLY • link 4.3 years ago by GenoMax 141k

0

Entering edit mode

Thank you so much! That's very clear. My script works fine now.

ADD REPLY • link 4.3 years ago by tianshenbio ▴ 170

0

Entering edit mode

create a directory test
copy two examples sample files into test. Both forward and reverse reads for each sample
try following code and see if it works:

#!/bin/bash 
# stores basenames i.e sample names (from 1p read files) in samps array from directory test
samps=($(basename -s .fq.gz test/*1P*.gz))

# Prints the sample names, sample numbers
printf '%s\n' "These are your samples" "${samps[@]}" "Number of samples are ${#samps[@]}" ''

# Stores Read 1 and 2 files in two indepedent arrays
READ1=(${samps[@]/%/.fastq.gz})
READ2=(${samps[@]/%1P/2P.fastq.gz})

# Prints file names of Read1 and Read2.

echo -e  "These are READ1 files: ${READ1[@]} \n"
echo -e "These are READ2 files: ${READ2[@]} \n"

# Gunzip all the files

echo -e "gunzip ${READ1[@]} ${READ2[@]}\n"

# Iterates over the read arrays.

for i in $(seq 0 $((${#samps[@]}-1)))
  do echo -e  ${READ1[$i]} ${READ2[$i]}
  echo -e "merging sample ${samps[$i]} files"
  echo -e "merge-paired-reads.sh  ${samps[$i]/%/.fastq} ${samps[$i]/%1P/2P.fastq} ${samps[$i]/%/_merged.fastq}"

done

If this works, you only need to edit commands in loop. Remove echo in loop and other places wherever necessary.

output on my local pc is:

$ ./script.sh 
These are your samples
DS_1_1P
WS_2_1P
Number of samples are 2

These are READ1 files: DS_1_1P.fastq.gz WS_2_1P.fastq.gz 

These are READ2 files: DS_1_2P.fastq.gz WS_2_2P.fastq.gz 

gunzip DS_1_1P.fastq.gz WS_2_1P.fastq.gz DS_1_2P.fastq.gz WS_2_2P.fastq.gz

DS_1_1P.fastq.gz DS_1_2P.fastq.gz
merging sample DS_1_1P files
merge-paired-reads.sh  DS_1_1P.fastq DS_1_2P.fastq DS_1_1P_merged.fastq
WS_2_1P.fastq.gz WS_2_2P.fastq.gz
merging sample WS_2_1P files
merge-paired-reads.sh  WS_2_1P.fastq WS_2_2P.fastq WS_2_1P_merged.fastq

ADD REPLY • link 4.3 years ago by cpad0112 21k

0

Entering edit mode

Hi, thank you for your suggestion, I tried it and I got this:

These are your samples
DS_4_1P.fq.gz
WS_1_1P.fq.gz
Number of samples are 2

These are READ1 files: DS_4_1P.fq.gz.fastq.gz WS_1_1P.fq.gz.fastq.gz 

These are READ2 files: DS_4_1P.fq.gz WS_1_1P.fq.gz 

gunzip DS_4_1P.fq.gz.fastq.gz WS_1_1P.fq.gz.fastq.gz DS_4_1P.fq.gz WS_1_1P.fq.gz

DS_4_1P.fq.gz.fastq.gz DS_4_1P.fq.gz
merging sample DS_4_1P.fq.gz files
merge-paired-reads.sh  DS_4_1P.fq.gz.fastq DS_4_1P.fq.gz DS_4_1P.fq.gz_merged.fastq
WS_1_1P.fq.gz.fastq.gz WS_1_1P.fq.gz
merging sample WS_1_1P.fq.gz files
merge-paired-reads.sh  WS_1_1P.fq.gz.fastq WS_1_1P.fq.gz WS_1_1P.fq.gz_merged.fastq

ADD REPLY • link updated 4.3 years ago by GenoMax 141k • written 4.3 years ago by tianshenbio ▴ 170