Question

Closed: Understanding RNA-Seq Pipeline: newbie

4.7 years ago

WUSCHEL ▴ 750

I am new to bioinformatics and RNA-seq. I plan to use the workflow below to analyze my Arabidopsis thaliana RNA-seq data, but I am not sure how to get started because I do not understand the purpose of each step. Could someone explain the purpose of the main commands in this pipeline, and the general workflow?

    #!/bin/bash

# Use kallisto to perform k-mer based transcript quantification
# https://www.nature.com/articles/nbt.3519
# Build the annotation index beforehand with: kallisto index -i annotation.idx annotation.fa

set -eu

if [ "$#" -lt 5 ]; then
    echo "Missing arguments!"
    echo "USAGE: kallisto.sh <SE,PE> <R1> <R2> <strandedness> <index> <name>"
    echo "strand: unstranded, fr_stranded, rf_stranded"
    echo "EXAMPLE: kallisto.sh PE SRR5724597_1.fastq.gz SRR5724597_2.fastq.gz unstranded AtRTD2_19April2016.idx col0-r1"
    exit 1
fi

dow=$(date +"%F")

###########
### SINGLE END
###########

if [ "$1" == "SE" ]; then
    # requirements
    if [ "$#" -ne 5 ]; then
        echo "Missing required arguments for single-end!"
        echo "USAGE: kallisto.sh <SE> <R1> <strandedness> <index> <name>"
        exit 1
    fi

type=$1
R1=$2
strand=$3
annotation=$4
name=$5

echo "##################"
echo "Performing single-end alignments with kallisto"
echo "Type: $type"
echo "Input Files: $R1"
echo "Annotation: $annotation"
echo "Sample: $name"
echo "Time of analysis: $dow"
echo "##################"

# file structure (variables quoted so names containing spaces do not break the commands)
mkdir "${name}_kallisto_${dow}"
mv "$R1" -t "${name}_kallisto_${dow}"
cd "${name}_kallisto_${dow}"

mkdir 0_fastq
mv "$R1" -t 0_fastq/

### Read trimming & FastQC
echo "Read trimming and FastQC"

mkdir 1_trimmed_fastq
cd 1_trimmed_fastq
trim_galore --fastqc --fastqc_args "--threads 4" "../0_fastq/$R1" | tee -a "../${name}_logs_${dow}.log"
cd ../

mkdir 2_quant/
mv 1_trimmed_fastq/*fq.gz 2_quant/
cd 2_quant/

echo "                      "
echo "kallisto"
echo "                      "

# Pick the kallisto strandedness flag; any value other than
# "unstranded" or "fr_stranded" is treated as rf_stranded.
if [ "$strand" == "unstranded" ]; then
    kallisto quant -i "$annotation" -t 4 --bias --single "${R1%%.fastq*}"_trimmed.fq* -b 50 -l 300 -s 100 -o ./ 2>&1 | tee -a "../${name}_logs_${dow}.log"
elif [ "$strand" == "fr_stranded" ]; then
    kallisto quant -i "$annotation" --fr-stranded -t 4 --bias --single "${R1%%.fastq*}"_trimmed.fq* -b 50 -l 300 -s 100 -o ./ 2>&1 | tee -a "../${name}_logs_${dow}.log"
else
    kallisto quant -i "$annotation" --rf-stranded -t 4 --bias --single "${R1%%.fastq*}"_trimmed.fq* -b 50 -l 300 -s 100 -o ./ 2>&1 | tee -a "../${name}_logs_${dow}.log"
fi

mv *fq.gz ../1_trimmed_fastq/

echo "complete"

fi
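
To make the part I find hardest to follow more concrete, here is a minimal dry-run sketch of how I think the single-end branch builds its kallisto command. It only prints the command and runs neither Trim Galore nor kallisto. The file names are the example values from the usage message. The `_trimmed.fq.gz` suffix is my assumption about how Trim Galore names single-end output, inferred from the script. I used a `case` statement instead of the script's if/elif chain, and unlike the script it rejects unknown strandedness values rather than treating them as rf_stranded.

```shell
#!/bin/bash
# Dry-run sketch: prints the kallisto command the single-end branch would run.
# Nothing is executed; the values below are example inputs, not real data.
set -eu

R1="SRR5724597_1.fastq.gz"
strand="fr_stranded"
annotation="AtRTD2_19April2016.idx"

# "%%.fastq*" removes the longest suffix matching ".fastq*",
# leaving only the stem of the read file name.
stem="${R1%%.fastq*}"   # SRR5724597_1

# Assumption: Trim Galore writes single-end output as <stem>_trimmed.fq.gz.
trimmed="${stem}_trimmed.fq.gz"

# Map the strandedness argument to the matching kallisto flag
# (no flag at all for unstranded libraries).
case "$strand" in
    unstranded)  strand_flag="" ;;
    fr_stranded) strand_flag="--fr-stranded" ;;
    rf_stranded) strand_flag="--rf-stranded" ;;
    *) echo "Unknown strandedness: $strand" >&2; exit 1 ;;
esac

# Assemble and print the command instead of running it.
cmd="kallisto quant -i $annotation $strand_flag -t 4 --bias --single -b 50 -l 300 -s 100 -o ./ $trimmed"
echo "$cmd"
```

If this reading is right, changing `strand` to `rf_stranded` should only swap the flag, and the rest of the command stays the same.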

Your help is greatly appreciated. Thank you.

RNA-Seq next-gen sequencing
