Running STAR aligner on paired-end reads
1
1
Entering edit mode
2.2 years ago
javanokendo ▴ 60

I have four files from paired end reads: SRX10603399_SRR14240730_1.fastq.gz,SRX10603399_SRR14240730_2.fastq.gz,SRX10603417_SRR14240748_1.fastq.gz, and SRX10603417_SRR14240748_2.fastq.gz. I want to use #STAR aligner to align the four files and get two bam files. The code I have is producing four bam files. The following is my code:

module load software/star-2.7.9a

# define variables
index=/scratch/oknjav001/sarsCovRNA/star_index
# get our data files
FILES=/scratch/oknjav001/sarsCovRNA/pbmcs_healthyvscovid/pbmcs/fastq/*.fastq.gz

for f in $FILES
do
  echo $f
  base=$(basename $f .fastq.gz)
  echo $base
  STAR --runThreadN 3 --genomeDir $index --readFilesIn $f --outSAMtype BAM   SortedByCoordinate --outTmpDir /scratch/oknjav001/sarsCovRNA/tempalign --quantMode GeneCounts
  --readFilesCommand zcat --outFileNamePrefix $base"_"
done

echo "done!"
STAR aligner • 7.6k views
ADD COMMENT
0
Entering edit mode

What is the problem here?

ADD REPLY
0
Entering edit mode

I want to pass in _R1.fq.gz and _R2.fq.gz to get one combined bam file from the forward and reverse reads. I want to do this in a loop.

ADD REPLY
0
Entering edit mode

You are getting 4 bam files because you are running STAR 4 times, once per fastq file. This is due to $FILES being an array of 4 different fastq filenames.

ADD REPLY
3
Entering edit mode
2.2 years ago
jv ★ 1.8k

One solution (that minimally changes what you already have) is to loop through an array of $base names instead of fastq file names, example given below. My recommendation for a more flexible script would be to create a separate file that is a list of the base names and to read that in via a while read loop.

module load software/star-2.7.9a

# define variables
index=/scratch/oknjav001/sarsCovRNA/star_index
# get our data files
FQ_DIR=/scratch/oknjav001/sarsCovRNA/pbmcs_healthyvscovid/pbmcs/fastq

for base in SRX10603399_SRR14240730 SRX10603417_SRR14240748
do
  echo $base

  # define R1 fastq filename
  fq1=$FQ_DIR/${base}_1.fastq.gz

 # define R2 fastq filename
  fq2=$FQ_DIR/${base}_2.fastq.gz

 # align with STAR
  STAR --runThreadN 3 --genomeDir $index \
    --readFilesIn $fq1 $fq2 --outSAMtype BAM   SortedByCoordinate \
    --outTmpDir /scratch/oknjav001/sarsCovRNA/tempalign --quantMode GeneCounts
    --readFilesCommand zcat --outFileNamePrefix $base"_"
done

echo "done!"
ADD COMMENT
0
Entering edit mode

Thanks so much this works well for me. I am now able to get on bam file from paired-end raw file.

ADD REPLY

Login before adding your answer.

Traffic: 3741 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6