Question: bash script for RNA seq alignment with STAR
1
gravatar for nicole p
16 months ago by
nicole p10
nicole p10 wrote:

Hi all,

I need to come up with some kind of automated script that will align each sample to the my reference genome. I am using STAR to align reads. I have reads that look something like this (all paired end):

SRR112312_1.fastq.gz 
SRR112312_2.fastq.gz
SRR112313_1.fastq.gz
SRR112313_2.fastq.gz
SRR112372_1.fastq.gz
SRR112372_2.fastq.gz
...

I have tried with code like this:

for i in $(ls *.fastq.gz | rev | cut -c 11- | rev | uniq)

My typical STAR command looks like this:

STAR --genomeDir /indices/human --readFilesIn SRR112312_1.fastq.gz SRR112312_2.fastq.gz --runThreadN 8 --outFileNamePrefix /alignment/treg_NBP_8_ --quantMode GeneCounts --readFilesCommand zcat

I tried using the answer from this post: bash loop for alignment RNA-seq data but im not having any luck. I would like to align each PE read to the genome to get a BAM file. In this example, there would be 3 BAM files. Also, all of my reads end as in _1.fastq.gz and _2.fastq.gz

If anyone can offer any help that would be much appreciated! Just jumping into scripting and this has been giving me a hard time. Thanks!

script rna-seq star bash • 3.8k views
ADD COMMENTlink modified 15 months ago by mforde841.2k • written 16 months ago by nicole p10
2
gravatar for Michael Dondrup
16 months ago by
Bergen, Norway
Michael Dondrup45k wrote:

This is the script I am currently using, with my default settings, so it can be called just as: star-wrapper.sh reads_1.fastq.gz reads_2.fastq.gz

#!/bin/sh
set -eu

## this is a wrapper for mapping single end and paired end data

STAR=`which STAR` # adapt to your needs
### lsalmonis genome for 76bp reads
DBDIR=licebase/genomedata/lsal76ribo

BASE=`basename $1`
DNAME=`dirname $1`
#BASE=VOD/$BASE
#ZCAT=
ZCAT="--readFilesCommand zcat"
THREADS=140
OPTS_P1="--outReadsUnmapped Fastx --outSAMtype BAM SortedByCoordinate --limitBAMsortRAM 100000000000 --genomeLoad LoadAndKeep --seedSearchStartLmax 8 --outFilterMultimapNmax 100 --outFilterMismatchNoverLmax 0.5 --outFileNamePrefix $DNAME/${BASE}"

if [ "$#" -ge 3  ]; then
   echo "illegal number of parameters"
   exit 1
fi

if [ "$#" -eq 2 ]; then
CMD="$STAR --runThreadN $THREADS --outBAMsortingThreadN 10 $OPTS_P1 $ZCAT  --genomeDir $DBDIR --readFilesIn $1 $2"
echo $CMD
$CMD
fi

if [ "$#" -eq 1 ]; then
CMD="$STAR --runThreadN $THREADS --outBAMsortingThreadN 10 $OPTS_P1 $ZCAT  --genomeDir $DBDIR --readFilesIn $1"
echo $CMD
$CMD
fi

Then I have a little wrapper:

#!/bin/sh
set -eu

for F in $@ ; do
  R1=`echo $F | sed s/_2/_1/`
  R2=`echo $F | sed s/_1/_2/`
  echo "Processing $R1 $R2"
  ./star-wrapper.sh $R1 $R2
  echo "Done"
done

which can be called like nohup doit.sh *_1.fastq.gz for paired end reads on all reads in the directory.

ADD COMMENTlink modified 16 months ago • written 16 months ago by Michael Dondrup45k
0
gravatar for Sirus
15 months ago by
Sirus770
Boston/USA
Sirus770 wrote:

You can to use bpipe, to automate and parallelize your pipeline.

For example, you can create a file STAR_mapping.groovy then write your script as follow (supposing that you already have done the adaptor trimming and quality control)

runSTAR = {
   def prefix="/alignment/treg_NBP_8_ "
   exec """
   STAR --genomeDir /indices/human 
   --readFilesIn $inputs
   --runThreadN 8 
   --outFileNamePrefix ${prefix}
   --quantMode GeneCounts 
   --readFilesCommand zcat
   """
 }

 run {
   "%_*.fastq.gz " * [runSTAR]
 }

Then you can run it using the following command:

bpipe run -r -n 2 STAR_mapping.groovy *.fastq.gz

Here: - r : generate an HTML report / documentation for the pipeline -n : how many parallel job you allow to run

Here -n 2 means, you'll be running two STAR mapping jobs in the same time. In your script, each job will need 8 threads. In total you'll be using 2*8=16 threads. Also keep in mind that STAR uses about 32Gb, so here you'll need 32 *2 =64Gb.

you can set -n 1 to allow just one job at a time if you don't have too much resources.

ADD COMMENTlink modified 15 months ago • written 15 months ago by Sirus770
0
gravatar for mforde84
15 months ago by
mforde841.2k
mforde841.2k wrote:

https://github.com/mforde84/RNAseq_BigMechanism_DARPA

ADD COMMENTlink written 15 months ago by mforde841.2k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 977 users visited in the last hour