Question: bash script for RNA seq alignment with STAR
gravatar for nicole p
3.3 years ago by
nicole p40
nicole p40 wrote:

Hi all,

I need to come up with some kind of automated script that will align each sample to the my reference genome. I am using STAR to align reads. I have reads that look something like this (all paired end):


I have tried with code like this:

for i in $(ls *.fastq.gz | rev | cut -c 11- | rev | uniq)

My typical STAR command looks like this:

STAR --genomeDir /indices/human --readFilesIn SRR112312_1.fastq.gz SRR112312_2.fastq.gz --runThreadN 8 --outFileNamePrefix /alignment/treg_NBP_8_ --quantMode GeneCounts --readFilesCommand zcat

I tried using the answer from this post: bash loop for alignment RNA-seq data but im not having any luck. I would like to align each PE read to the genome to get a BAM file. In this example, there would be 3 BAM files. Also, all of my reads end as in _1.fastq.gz and _2.fastq.gz

If anyone can offer any help that would be much appreciated! Just jumping into scripting and this has been giving me a hard time. Thanks!

script rna-seq star bash • 9.2k views
ADD COMMENTlink modified 3.3 years ago by mforde841.3k • written 3.3 years ago by nicole p40
gravatar for Michael Dondrup
3.3 years ago by
Bergen, Norway
Michael Dondrup48k wrote:

This is the script I am currently using, with my default settings, so it can be called just as: reads_1.fastq.gz reads_2.fastq.gz

set -eu

## this is a wrapper for mapping single end and paired end data

STAR=`which STAR` # adapt to your needs
### lsalmonis genome for 76bp reads

BASE=`basename $1`
DNAME=`dirname $1`
ZCAT="--readFilesCommand zcat"
OPTS_P1="--outReadsUnmapped Fastx --outSAMtype BAM SortedByCoordinate --limitBAMsortRAM 100000000000 --genomeLoad LoadAndKeep --seedSearchStartLmax 8 --outFilterMultimapNmax 100 --outFilterMismatchNoverLmax 0.5 --outFileNamePrefix $DNAME/${BASE}"

if [ "$#" -ge 3  ]; then
   echo "illegal number of parameters"
   exit 1

if [ "$#" -eq 2 ]; then
CMD="$STAR --runThreadN $THREADS --outBAMsortingThreadN 10 $OPTS_P1 $ZCAT  --genomeDir $DBDIR --readFilesIn $1 $2"
echo $CMD

if [ "$#" -eq 1 ]; then
CMD="$STAR --runThreadN $THREADS --outBAMsortingThreadN 10 $OPTS_P1 $ZCAT  --genomeDir $DBDIR --readFilesIn $1"
echo $CMD

Then I have a little wrapper:

set -eu

for F in $@ ; do
  R1=`echo $F | sed s/_2/_1/`
  R2=`echo $F | sed s/_1/_2/`
  echo "Processing $R1 $R2"
  ./ $R1 $R2
  echo "Done"

which can be called like nohup *_1.fastq.gz for paired end reads on all reads in the directory.

ADD COMMENTlink modified 3.3 years ago • written 3.3 years ago by Michael Dondrup48k
gravatar for Sirus
3.3 years ago by
Sirus790 wrote:

You can to use bpipe, to automate and parallelize your pipeline.

For example, you can create a file STAR_mapping.groovy then write your script as follow (supposing that you already have done the adaptor trimming and quality control)

runSTAR = {
   def prefix="/alignment/treg_NBP_8_ "
   exec """
   STAR --genomeDir /indices/human 
   --readFilesIn $inputs
   --runThreadN 8 
   --outFileNamePrefix ${prefix}
   --quantMode GeneCounts 
   --readFilesCommand zcat

 run {
   "%_*.fastq.gz " * [runSTAR]

Then you can run it using the following command:

bpipe run -r -n 2 STAR_mapping.groovy *.fastq.gz

Here: - r : generate an HTML report / documentation for the pipeline -n : how many parallel job you allow to run

Here -n 2 means, you'll be running two STAR mapping jobs in the same time. In your script, each job will need 8 threads. In total you'll be using 2*8=16 threads. Also keep in mind that STAR uses about 32Gb, so here you'll need 32 *2 =64Gb.

you can set -n 1 to allow just one job at a time if you don't have too much resources.

ADD COMMENTlink modified 3.3 years ago • written 3.3 years ago by Sirus790
gravatar for mforde84
3.3 years ago by
mforde841.3k wrote:

ADD COMMENTlink written 3.3 years ago by mforde841.3k
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1851 users visited in the last hour