Running SortMeRNA for bulk RNAseq data
2
0
Entering edit mode
4.8 years ago
Farah ▴ 80

Hello,

I need to run the below commands in my university cluster to be able to do SortMeRNA for my fastq.gz files from RNA-seq experiment. I have 18 paired-end fastq.gz files naming like 30786524-PBG_NAM_18_R1.fastq.gz & 30786524-PBG_NAM_18_R2.fastq.gz

R1 shows forward read and R2 is reverse read.

All of my fastq.gz files have different names but they are common in R1.fastq.gz and R2.fastq.gz

I should save the below chunk of code in nano:

#!/bin/bash    
READ_FW="$1"
READ_RV="$2"

FILEBASE=$(basename "${READ_FW/_1.fq.gz/}")

echo "Uncompressing FASTQ data of $FILEBASE"
gunzip "$READ_FW" "$READ_RV"

READ_FW="${READ_FW%.gz}"
READ_RV="${READ_RV%.gz}"

echo "Merging pairs of $FILEBASE"
merge-paired-reads.sh "$READ_FW" "$READ_RV" "${FILEBASE}_interleaved.fq"

echo "Running SortMeRNA for $FILEBASE"
sortmerna --ref $SORTMERNA_DB --reads "${FILEBASE}_interleaved.fq" --aligned \
"${FILEBASE}-rRNA-hits" --other  "${FILEBASE}-sortmerna" --log -a 16 \
-v --paired_in --fastx

echo "Unmerging SortMeRNA filtered pairs for $FILEBASE"
unmerge-paired-reads.sh "${FILEBASE}-sortmerna.fq" \
"${FILEBASE}-sortmerna_1.fq" "${FILEBASE}-sortmerna_2.fq"

echo "Doing cleanup for $FILEBASE"
gzip "$READ_FW" "$READ_RV" "${FILEBASE}-sortmerna_1.fq" \
"${FILEBASE}-sortmerna_2.fq" "${FILEBASE}-rRNA-hits.fq"
rm "${FILEBASE}_interleaved.fq" "${FILEBASE}-sortmerna.fq"

Then, I should use a loop to execute this script:

mkdir ~/sortmerna
cd ~/sortmerna
find ../raw -name "*.fq.gz"  | sort | head -n 32 | while read READ_FW
do 
  read READ_RV
  bash ../runSortMeRNA.sh $READ_FW $READ_RV
done

As I am new in programming, may I know what is FILEBASE? Should I specify a name or a path for it?

Also, should I keep ${READ_FW/_1.fq.gz/}, READ_FW="$1" and READ_RV="$2" exactly like this or adapt them based on my data?

What about find ../raw -name "*.fq.gz" and bash ../runSortMeRNA.sh? Which parts of the codes I should adapt?

I would highly appreciate your help.

Best wishes,
Farah

sortmerna RNA-Seq • 2.8k views
ADD COMMENT
0
Entering edit mode

Can you explain the role of --paired_in and --paired_out ? I am still confused about these two options after reading the usage. Thanks.

ADD REPLY
1
Entering edit mode
4.8 years ago
Carambakaracho ★ 3.2k

Also, should I keep "${READ_FW/_1.fq.gz/}" , READ_FW="$1" and READ_RV="$2" exactly like this or adapt them based on my data?

You may need to adapt the _1.fq.gz part, it matches only forward read files ending in _1.fq.gz but doesn't work for _R1.fq.gz or _1.fastq.gz

What about find ../raw -name "*.fq.gz" and bash ../runSortMeRNA.sh ? which parts of the codes I should adapt?

../raw needs to be adapted to the path wherever your gzipped fastq files are. Same for ../runSortMeRNA.sh

ADD COMMENT
0
Entering edit mode

OK. Many thanks for your help.

ADD REPLY
0
Entering edit mode
4.8 years ago
caggtaagtat ★ 1.9k

With FILEBASE=$(basename "${READ_FW/_1.fq.gz/}") you define the new variable called $FILEBASE as a character string using the file name of the respective READ_FW file, but without the ending "_1.fq.gz" and without the whole filepath which originally is in the filename too.

The endings file endings should be the same in the script.

The script you have is written, so that you don't have to adjust it every time.

ADD COMMENT

Login before adding your answer.

Traffic: 2058 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6