Running SortMeRNA for bulk RNAseq data
2
0
Entering edit mode
2.2 years ago
Farah ▴ 70

Hello,

I need to run the below commands in my university cluster to be able to do SortMeRNA for my fastq.gz files from RNA-seq experiment. I have 18 paired-end fastq.gz files naming like 30786524-PBG_NAM_18_R1.fastq.gz & 30786524-PBG_NAM_18_R2.fastq.gz

R1 shows forward read and R2 is reverse read. All of my fastq.gz files have different names but they are common in R1.fastq.gz and R2.fastq.gz

I should save the below chunk of code in nano:

# !/bin/bash

READ_FW="$1" READ_RV="$2"

FILEBASE=$(basename "${READ_FW/_1.fq.gz/}")

echo "Uncompressing FASTQ data of $FILEBASE" gunzip "$READ_FW" "$READ_RV" READ_FW="${READ_FW%.gz}"
READ_RV="${READ_RV%.gz}" echo "Merging pairs of$FILEBASE"
merge-paired-reads.sh "$READ_FW" "$READ_RV" "${FILEBASE}_interleaved.fq" echo "Running SortMeRNA for$FILEBASE"
sortmerna --ref $SORTMERNA_DB --reads "${FILEBASE}_interleaved.fq" --aligned \
"${FILEBASE}-rRNA-hits" --other "${FILEBASE}-sortmerna" --log -a 16 \
-v --paired_in --fastx

echo "Unmerging SortMeRNA filtered pairs for $FILEBASE" unmerge-paired-reads.sh "${FILEBASE}-sortmerna.fq" \
"${FILEBASE}-sortmerna_1.fq" "${FILEBASE}-sortmerna_2.fq"

echo "Doing cleanup for $FILEBASE" gzip "$READ_FW" "$READ_RV" "${FILEBASE}-sortmerna_1.fq" \
"${FILEBASE}-sortmerna_2.fq" "${FILEBASE}-rRNA-hits.fq"
rm "${FILEBASE}_interleaved.fq" "${FILEBASE}-sortmerna.fq"


Then, I should use a loop to execute this script:

mkdir ~/sortmerna

cd ~/sortmerna
do
bash ../runSortMeRNA.sh $READ_FW$READ_RV
done


As I am new in programming, may I know what is FILEBASE? should I specify a name or a path for it?

Also, should I keep "${READ_FW/_1.fq.gz/}" , READ_FW="$1" and READ_RV="$2" exactly like this or adapt them based on my data? What about find ../raw -name "*.fq.gz" and bash ../runSortMeRNA.sh ? which parts of the codes I should adapt? I would highly appreciate your help. Best wishes, Farah sortmerna RNA-Seq bioinformatics • 1.4k views ADD COMMENT 0 Entering edit mode Can you explain the role of --paired_in and --paired_out ? I am still confused about these two options after reading the usage. Thanks. ADD REPLY 1 Entering edit mode 2.2 years ago Carambakaracho ★ 2.7k Also, should I keep "${READ_FW/_1.fq.gz/}" , READ_FW="$1" and READ_RV="$2" exactly like this or adapt them based on my data?

You may need to adapt the _1.fq.gz part, it matches only forward read files ending in _1.fq.gz but doesn't work for _R1.fq.gz or _1.fastq.gz

What about find ../raw -name "*.fq.gz" and bash ../runSortMeRNA.sh ? which parts of the codes I should adapt?

../raw needs to be adapted to the path wherever your gzipped fastq files are. Same for ../runSortMeRNA.sh

0
Entering edit mode

OK. Many thanks for your help.

0
Entering edit mode
2.2 years ago
caggtaagtat ★ 1.5k

With FILEBASE=$(basename "${READ_FW/_1.fq.gz/}") you define the new variable called \$FILEBASE as a character string using the file name of the respective READ_FW file, but without the ending "_1.fq.gz" and without the whole filepath which originally is in the filename too.

The endings file endings should be the same in the script.

The script you have is written, so that you don't have to adjust it every time.