Question: bash script for RNA seq alignment with STAR
gravatar for nicole p
7 months ago by
nicole p10
nicole p10 wrote:

Hi all,

I need to come up with some kind of automated script that will align each sample to the my reference genome. I am using STAR to align reads. I have reads that look something like this (all paired end):


I have tried with code like this:

for i in $(ls *.fastq.gz | rev | cut -c 11- | rev | uniq)

My typical STAR command looks like this:

STAR --genomeDir /indices/human --readFilesIn SRR112312_1.fastq.gz SRR112312_2.fastq.gz --runThreadN 8 --outFileNamePrefix /alignment/treg_NBP_8_ --quantMode GeneCounts --readFilesCommand zcat

I tried using the answer from this post: bash loop for alignment RNA-seq data but im not having any luck. I would like to align each PE read to the genome to get a BAM file. In this example, there would be 3 BAM files. Also, all of my reads end as in _1.fastq.gz and _2.fastq.gz

If anyone can offer any help that would be much appreciated! Just jumping into scripting and this has been giving me a hard time. Thanks!

script rna-seq star bash • 1.3k views
ADD COMMENTlink modified 6 months ago by mforde841.1k • written 7 months ago by nicole p10
gravatar for Michael Dondrup
7 months ago by
Bergen, Norway
Michael Dondrup43k wrote:

This is the script I am currently using, with my default settings, so it can be called just as: reads_1.fastq.gz reads_2.fastq.gz

set -eu

## this is a wrapper for mapping single end and paired end data

STAR=`which STAR` # adapt to your needs
### lsalmonis genome for 76bp reads

BASE=`basename $1`
DNAME=`dirname $1`
ZCAT="--readFilesCommand zcat"
OPTS_P1="--outReadsUnmapped Fastx --outSAMtype BAM SortedByCoordinate --limitBAMsortRAM 100000000000 --genomeLoad LoadAndKeep --seedSearchStartLmax 8 --outFilterMultimapNmax 100 --outFilterMismatchNoverLmax 0.5 --outFileNamePrefix $DNAME/${BASE}"

if [ "$#" -ge 3  ]; then
   echo "illegal number of parameters"
   exit 1

if [ "$#" -eq 2 ]; then
CMD="$STAR --runThreadN $THREADS --outBAMsortingThreadN 10 $OPTS_P1 $ZCAT  --genomeDir $DBDIR --readFilesIn $1 $2"
echo $CMD

if [ "$#" -eq 1 ]; then
CMD="$STAR --runThreadN $THREADS --outBAMsortingThreadN 10 $OPTS_P1 $ZCAT  --genomeDir $DBDIR --readFilesIn $1"
echo $CMD

Then I have a little wrapper:

set -eu

for F in $@ ; do
  R1=`echo $F | sed s/_2/_1/`
  R2=`echo $F | sed s/_1/_2/`
  echo "Processing $R1 $R2"
  ./ $R1 $R2
  echo "Done"

which can be called like nohup *_1.fastq.gz for paired end reads on all reads in the directory.

ADD COMMENTlink modified 7 months ago • written 7 months ago by Michael Dondrup43k
gravatar for Sirus
6 months ago by
Sirus770 wrote:

You can to use bpipe, to automate and parallelize your pipeline.

For example, you can create a file STAR_mapping.groovy then write your script as follow (supposing that you already have done the adaptor trimming and quality control)

runSTAR = {
   def prefix="/alignment/treg_NBP_8_ "
   exec """
   STAR --genomeDir /indices/human 
   --readFilesIn $inputs
   --runThreadN 8 
   --outFileNamePrefix ${prefix}
   --quantMode GeneCounts 
   --readFilesCommand zcat

 run {
   "%_*.fastq.gz " * [runSTAR]

Then you can run it using the following command:

bpipe run -r -n 2 STAR_mapping.groovy *.fastq.gz

Here: - r : generate an HTML report / documentation for the pipeline -n : how many parallel job you allow to run

Here -n 2 means, you'll be running two STAR mapping jobs in the same time. In your script, each job will need 8 threads. In total you'll be using 2*8=16 threads. Also keep in mind that STAR uses about 32Gb, so here you'll need 32 *2 =64Gb.

you can set -n 1 to allow just one job at a time if you don't have too much resources.

ADD COMMENTlink modified 6 months ago • written 6 months ago by Sirus770
gravatar for mforde84
6 months ago by
mforde841.1k wrote:

ADD COMMENTlink written 6 months ago by mforde841.1k
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1281 users visited in the last hour