Question: Running SortMeRNA for bulk RNAseq data
Asked 7 weeks ago by F. Golestan:

Hello,

I need to run the commands below on my university cluster to run SortMeRNA on the fastq.gz files from my RNA-seq experiment. I have 18 paired-end fastq.gz files with names like 30786524-PBG_NAM_18_R1.fastq.gz and 30786524-PBG_NAM_18_R2.fastq.gz.

R1 is the forward read and R2 is the reverse read. The files all have different names, but each one ends in either R1.fastq.gz or R2.fastq.gz.

I should save the chunk of code below in a file using nano:

#!/bin/bash

READ_FW="$1"
READ_RV="$2"

FILEBASE=$(basename "${READ_FW/_1.fq.gz/}")

echo "Uncompressing FASTQ data of $FILEBASE"
gunzip "$READ_FW" "$READ_RV"

READ_FW="${READ_FW%.gz}"
READ_RV="${READ_RV%.gz}"

echo "Merging pairs of $FILEBASE"
merge-paired-reads.sh "$READ_FW" "$READ_RV" "${FILEBASE}_interleaved.fq"

echo "Running SortMeRNA for $FILEBASE"
sortmerna --ref $SORTMERNA_DB --reads "${FILEBASE}_interleaved.fq" --aligned \
"${FILEBASE}-rRNA-hits" --other  "${FILEBASE}-sortmerna" --log -a 16 \
-v --paired_in --fastx

echo "Unmerging SortMeRNA filtered pairs for $FILEBASE"
unmerge-paired-reads.sh "${FILEBASE}-sortmerna.fq" \
"${FILEBASE}-sortmerna_1.fq" "${FILEBASE}-sortmerna_2.fq"

echo "Doing cleanup for $FILEBASE"
gzip "$READ_FW" "$READ_RV" "${FILEBASE}-sortmerna_1.fq" \
"${FILEBASE}-sortmerna_2.fq" "${FILEBASE}-rRNA-hits.fq"
rm "${FILEBASE}_interleaved.fq" "${FILEBASE}-sortmerna.fq"

Then, I should use a loop to execute this script:

mkdir ~/sortmerna

cd ~/sortmerna
find ../raw -name "*.fq.gz" | sort | head -n 32 | while read -r READ_FW
do
  read -r READ_RV
  bash ../runSortMeRNA.sh "$READ_FW" "$READ_RV"
done

As I am new to programming, may I ask what FILEBASE is? Should I specify a name or a path for it?

Also, should I keep "${READ_FW/_1.fq.gz/}", READ_FW="$1" and READ_RV="$2" exactly like this or adapt them based on my data? What about find ../raw -name "*.fq.gz" and bash ../runSortMeRNA.sh? Which parts of the code should I adapt?

I would highly appreciate your help. Best wishes, Farah

Answer by Carambakaracho (Switzerland/Basel), 7 weeks ago:

Also, should I keep "${READ_FW/_1.fq.gz/}" , READ_FW="$1" and READ_RV="$2" exactly like this or adapt them based on my data?

You may need to adapt the _1.fq.gz part. It matches only forward read files ending in _1.fq.gz, so it won't work for names ending in _R1.fq.gz or _1.fastq.gz.

What about find ../raw -name "*.fq.gz" and bash ../runSortMeRNA.sh? Which parts of the code should I adapt?

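../raw needs to be changed to the path where your gzipped fastq files are. The same goes for ../runSortMeRNA.sh, which must point to wherever you saved the script. Note that the -name "*.fq.gz" pattern will likewise not match files ending in .fastq.gz.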
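
As a rough sketch of how the loop could look for files named *_R1.fastq.gz and *_R2.fastq.gz: the version below is a dry run that only prints the commands it would execute. RAW_DIR, SCRIPT and the sampleA/sampleB dummy files are assumptions for illustration, not names from the question. It also swaps the find | sort pipeline for a shell glob and derives the R2 name from the R1 name, so the pairing does not depend on the sort order.

```bash
#!/bin/bash
# Dry run: prints the commands instead of running SortMeRNA.
# RAW_DIR and SCRIPT are assumptions; point them at your own locations.
RAW_DIR=$(mktemp -d)          # stand-in for the directory holding the fastq.gz files
SCRIPT="../runSortMeRNA.sh"   # wherever the script was saved

# Create two empty dummy pairs so the loop has something to find.
touch "$RAW_DIR/sampleA_R1.fastq.gz" "$RAW_DIR/sampleA_R2.fastq.gz" \
      "$RAW_DIR/sampleB_R1.fastq.gz" "$RAW_DIR/sampleB_R2.fastq.gz"

PAIRS=0
for READ_FW in "$RAW_DIR"/*_R1.fastq.gz
do
  # Build the reverse read name by swapping the _R1 suffix for _R2.
  READ_RV="${READ_FW%_R1.fastq.gz}_R2.fastq.gz"
  if [ -f "$READ_RV" ]; then
    # Remove the leading "echo" to actually run the script.
    echo bash "$SCRIPT" "$READ_FW" "$READ_RV"
    PAIRS=$((PAIRS + 1))
  else
    echo "No R2 file found for $READ_FW" >&2
  fi
done

echo "Found $PAIRS pairs"
rm -r "$RAW_DIR"
```

Once the printed commands look right, set RAW_DIR to the real directory, drop the touch and rm lines, and remove the echo in front of bash.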


OK. Many thanks for your help.

(Reply written 6 weeks ago by F. Golestan)

Answer by caggtaagtat, 7 weeks ago:

With FILEBASE=$(basename "${READ_FW/_1.fq.gz/}") you define a new variable called FILEBASE. It holds the file name of the respective READ_FW file as a character string, but without the ending "_1.fq.gz" and without the directory path that is originally part of READ_FW.
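
To make this concrete, here is a small sketch that can be pasted into a bash terminal. The ../raw/ directory is an assumed location; only the file name comes from the question. It shows that the original pattern leaves such a name untouched, while a pattern adapted to _R1.fastq.gz gives the sample name.

```bash
#!/bin/bash
# The ../raw/ directory is an assumption; the file name is the one from the question.
READ_FW="../raw/30786524-PBG_NAM_18_R1.fastq.gz"

# Original pattern: "_1.fq.gz" does not occur in the name, so nothing is
# removed and basename only strips the directory part.
UNCHANGED=$(basename "${READ_FW/_1.fq.gz/}")
echo "$UNCHANGED"    # prints 30786524-PBG_NAM_18_R1.fastq.gz

# Adapted pattern: "_R1.fastq.gz" is removed first, then basename strips
# the directory part.
FILEBASE=$(basename "${READ_FW/_R1.fastq.gz/}")
echo "$FILEBASE"     # prints 30786524-PBG_NAM_18
```

So FILEBASE is neither a name nor a path you type in yourself; the script works it out from the first argument each time it is called.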

The file endings in the script should match the endings of your files.

The script is written so that you don't have to adjust it for every file.
