Question

STAR alignment speed

0

5 months ago

Gilles ▴ 10

Hello, I am trying to align RNA sequencing data from the NCBI SRA database to the Apis mellifera genome with STAR. The alignment worked fine. However, the mapping step of the alignment seems to be a bit slow. Furthermore, increasing the number of available threads does not improve the speed. Below you can find the command I used and the content of the Log.final.out file. Is this a good speed for STAR? Are there any methods to improve the speed?

  STAR --runThreadN 12 --genomeDir ~/scratch/genomeDir --readFilesIn ${word}_1.fastq ${word}_2.fastq --outFileNamePrefix $word --outSAMtype BAM SortedByCoordinate --outSAMattrRGline ID:$word SM:$sample PL:ILLUMINA

                             Started job on |       Nov 20 12:44:12
                         Started mapping on |       Nov 20 12:44:12
                                Finished on |       Nov 20 12:57:25
   Mapping speed, Million of reads per hour |       52.40

                      Number of input reads |       11542556
                  Average input read length |       150
                                UNIQUE READS:
               Uniquely mapped reads number |       10873607
                    Uniquely mapped reads % |       94.20%
                      Average mapped length |       149.74
                   Number of splices: Total |       3605561
        Number of splices: Annotated (sjdb) |       0
                   Number of splices: GT/AG |       3574735
                   Number of splices: GC/AG |       23103
                   Number of splices: AT/AC |       1720
           Number of splices: Non-canonical |       6003
                  Mismatch rate per base, % |       0.46%
                     Deletion rate per base |       0.03%
                    Deletion average length |       2.17
                    Insertion rate per base |       0.02%
                   Insertion average length |       1.90
                         MULTI-MAPPING READS:
    Number of reads mapped to multiple loci |       299041
         % of reads mapped to multiple loci |       2.59%
    Number of reads mapped to too many loci |       2889
         % of reads mapped to too many loci |       0.03%
                              UNMAPPED READS:
  Number of reads unmapped: too many mismatches |       0
   % of reads unmapped: too many mismatches |       0.00%
        Number of reads unmapped: too short |       364700
             % of reads unmapped: too short |       3.16%
            Number of reads unmapped: other |       2319
                 % of reads unmapped: other |       0.02%
                              CHIMERIC READS:
                   Number of chimeric reads |       0
                        % of chimeric reads |       0.00%

STAR alignment RNA • 768 views

updated 5 months ago by ATpoint 82k • written 5 months ago by Gilles ▴ 10

4

Is this a good speed for STAR? Are there any methods to improve the speed?

This job finished in 13 min! With larger genomes like human it can take several hours to complete similar jobs.
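As a quick cross-check, the reported mapping speed can be recomputed from the timestamps and read count in the Log.final.out posted in the question. This is only a small sketch; it assumes both timestamps fall on the same day, and the variable and function names are made up for illustration.

```shell
# Values copied from the Log.final.out in the question.
reads=11542556
start="12:44:12"    # "Started mapping on"
finish="12:57:25"   # "Finished on"

# Convert HH:MM:SS to seconds since midnight.
to_seconds() {
  echo "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

elapsed=$(( $(to_seconds "$finish") - $(to_seconds "$start") ))

# Millions of reads per hour = (reads / 1e6) / (elapsed seconds / 3600).
speed=$(awk -v r="$reads" -v s="$elapsed" 'BEGIN { printf "%.2f", r / 1e6 / (s / 3600) }')

echo "elapsed: ${elapsed} s"            # 793 s, i.e. 13 min 13 s
echo "speed:   ${speed} M reads/hour"   # 52.40, matching the log
```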

Furthermore, increasing the number of available threads does not improve the speed

Not unexpected. The number of cores is only one part of the equation; other limits, such as the input/output throughput of your storage, can become the bottleneck. Algorithms used in bioinformatics programs do not always scale linearly with the number of threads, and the software itself may not have been written in a way that enables this.
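One way to see where scaling stops on a given machine is to time the same alignment at a few thread counts. The following is a rough sketch, not a rigorous benchmark: `time_cmd`, the `sample1` prefix, the `bench_` output prefixes, and the list of thread counts are all placeholders, and file-system caching can make the first run look slower than later ones.

```shell
# Placeholder values for illustration; adjust to your own data.
word="sample1"
genome_dir="$HOME/scratch/genomeDir"

# time_cmd LABEL CMD...: run CMD, then print "LABEL <wall-clock seconds>".
time_cmd() {
  label=$1
  shift
  t0=$(date +%s)
  "$@" >/dev/null 2>&1 || echo "warning: '$1' exited with an error" >&2
  t1=$(date +%s)
  echo "$label $(( t1 - t0 ))"
}

# Only attempt the timing runs if STAR is actually installed.
if command -v STAR >/dev/null 2>&1; then
  for n in 2 4 8 12; do
    time_cmd "threads=$n" STAR --runThreadN "$n" \
      --genomeDir "$genome_dir" \
      --readFilesIn "${word}_1.fastq" "${word}_2.fastq" \
      --outFileNamePrefix "bench_${n}_" \
      --outSAMtype BAM Unsorted
  done
else
  echo "STAR not found on PATH; nothing timed" >&2
fi
```

If the wall-clock time stops dropping beyond some thread count, the limit is probably somewhere other than the CPU, for example disk throughput.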

5 months ago by GenoMax 141k

0

13 minutes is very fast. I'm currently working on an assembly polishing pipeline that carries a warning that it may take 0.5 to 10 days, so you should be sure of your data before starting it. Many bioinformatics tools need to run overnight or longer, so 13 minutes is a luxury.

5 months ago by colindaven 6.4k

0

First world problems.

5 months ago by Mensur Dlakic ★ 27k

0

What helped me was what has already been suggested here: turning off BAM sorting in STAR. Writing an unsorted BAM file (--outSAMtype BAM Unsorted) and then sorting it with samtools worked best for me (samtools sort myfile.bam -o myfile_sorted.bam). Another option is to use an aligner that consumes fewer resources, for example HISAT2.

5 months ago by sansan_96 ▴ 80

0

Fewer resources does not mean faster processing.

5 months ago by ATpoint 82k

score 2 · Answer 1 · 2023-11-20

2

5 months ago

ATpoint 82k

You can disable the sorting, which for many analyses is not even required. If you do need a sorted BAM, use samtools sort rather than STAR, which is not optimized for this as far as I know. STAR is also resource-hungry during sorting.
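A minimal sketch of that two-step approach, modeled on the command in the question. The `run` helper, the `sample1` prefix, and the `.sorted.bam` output name are placeholders chosen for illustration, and the read-group option is left out for brevity. By default the script only prints the commands; set RUN=1 to execute them.

```shell
# Placeholder values; adjust to your own data.
word="sample1"
threads=12

# With --outSAMtype BAM Unsorted, STAR writes <prefix>Aligned.out.bam.
unsorted_bam="${word}Aligned.out.bam"
sorted_bam="${word}.sorted.bam"

# run CMD...: print the command, and execute it only when RUN=1.
run() {
  echo "+ $*"
  if [ "${RUN:-0}" = "1" ]; then
    "$@"
  fi
}

# Step 1: align without coordinate sorting.
run STAR --runThreadN "$threads" \
  --genomeDir "$HOME/scratch/genomeDir" \
  --readFilesIn "${word}_1.fastq" "${word}_2.fastq" \
  --outFileNamePrefix "$word" \
  --outSAMtype BAM Unsorted

# Step 2: sort with samtools, only if a downstream tool needs sorted input.
run samtools sort -@ "$threads" -o "$sorted_bam" "$unsorted_bam"
```

Tools that work on read names or unsorted alignments can use the unsorted BAM directly, in which case step 2 can be skipped.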
