Question: Long Read Length, yet STAR says many reads too short
4.0 years ago by
United States
dec986 wrote:


I've aligned single-cell RNA seq to mm10 using STAR. I only get about 13% uniquely mapped reads, with 79% being too short.

I get the following output:

                             Started job on |   Mar 09 14:04:53
                         Started mapping on |   Mar 09 14:07:01
                                Finished on |   Mar 09 14:23:11
   Mapping speed, Million of reads per hour |   67.13

                      Number of input reads |   18088226
                  Average input read length |   47
                                UNIQUE READS:
               Uniquely mapped reads number |   2298713
                    Uniquely mapped reads % |   12.71%
                      Average mapped length |   44.12
                   Number of splices: Total |   54580
        Number of splices: Annotated (sjdb) |   0
                   Number of splices: GT/AG |   51443
                   Number of splices: GC/AG |   601
                   Number of splices: AT/AC |   27
           Number of splices: Non-canonical |   2509
                  Mismatch rate per base, % |   6.80%
                     Deletion rate per base |   0.02%
                    Deletion average length |   1.51
                    Insertion rate per base |   0.02%
                   Insertion average length |   1.40
                         MULTI-MAPPING READS:
    Number of reads mapped to multiple loci |   1405637
         % of reads mapped to multiple loci |   7.77%
    Number of reads mapped to too many loci |   95119
         % of reads mapped to too many loci |   0.53%
                              UNMAPPED READS:
   % of reads unmapped: too many mismatches |   0.00%
             % of reads unmapped: too short |   78.96%
                 % of reads unmapped: other |   0.03%
                              CHIMERIC READS:
                   Number of chimeric reads |   0
                        % of chimeric reads |   0.00%

These two pieces of information appear to contradict each other:

1) 79% of reads are unmapped as "too short"

2) Average input read length 47 nucleotides.

I looked at the fastq file and there aren't many short reads. I don't understand what went wrong.

What explains the poor alignment?

modified 4.0 years ago by datascientist28 • written 4.0 years ago by dec986

Did you check your read length distribution with FastQC?

written 4.0 years ago by Benn

![enter image description here][1]

Hi b.nota, this just looks like a zoomed-out view. Yes, some reads are too short, but they should only be a fraction of a percent.

written 4.0 years ago by dec986

You can change the minimum read length manually; hopefully this helps. See this previous post:

STAR Aligner minimum read-length

written 4.0 years ago by Benn
4.0 years ago by
University of Washington
datascientist28 wrote:

I had a similar problem a couple of months ago. The category is poorly labeled: "too short" doesn't literally mean the reads themselves are too short.

"too short" means that the best alignments STAR found were too short to pass the filters.

This is controlled by --outFilterScoreMinOverLread and --outFilterMatchNminOverLread, which are both set to 0.66 by default. That means ~2/3 of the total read length (the sum of mates for paired-end reads) must be mapped.
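To put a number on that for this dataset, here is a rough sketch in Python. The helper name `min_aligned_bases` is made up for illustration, and it assumes the cutoff is simply the fraction times the read length rounded up; STAR's exact internal comparison may differ slightly.

```python
import math


def min_aligned_bases(read_length: int, min_fraction: float = 0.66) -> int:
    """Approximate number of bases that must align for a read to pass the filter.

    Assumption: the cutoff is min_fraction * read_length, rounded up.
    """
    return math.ceil(min_fraction * read_length)


# 47 is the "Average input read length" reported in the log above.
default_cutoff = min_aligned_bases(47)        # with the default 0.66
relaxed_cutoff = min_aligned_bases(47, 0.3)   # 0.3 is only an illustrative value

print(f"default 0.66: at least {default_cutoff} of 47 bases must align")
print(f"relaxed 0.30: at least {relaxed_cutoff} of 47 bases must align")
```

So for 47 nt reads, an alignment covering fewer than roughly 31 to 32 bases is dropped and counted as "too short", even though the read itself is full length.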

So what your output means is that about 13% of the reads aligned uniquely, about 8% aligned but multimapped, and about 79% couldn't be aligned under the above parameters. You can try reducing these parameters to see how many more reads get mapped. However, with alignment rates like that, it looks like your data might just be contaminated :((
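A minimal sketch of what such a re-run could look like, built as an argument list in Python. The index directory, FASTQ name, output prefix, thread count and the value 0.3 are all placeholders I made up, not values from this thread; only the STAR option names are real.

```python
import shlex


def star_command(genome_dir, fastq, prefix, min_fraction=0.3, threads=4):
    """Build a STAR command line with relaxed 'too short' filters.

    All argument values are hypothetical; adjust them to your own setup.
    """
    return [
        "STAR",
        "--runThreadN", str(threads),
        "--genomeDir", genome_dir,
        "--readFilesIn", fastq,
        "--outFileNamePrefix", prefix,
        # Both filters default to 0.66; lower them together.
        "--outFilterScoreMinOverLread", str(min_fraction),
        "--outFilterMatchNminOverLread", str(min_fraction),
    ]


cmd = star_command("mm10_star_index", "cells.fastq", "relaxed_")
print(shlex.join(cmd))
# To actually run it: import subprocess; subprocess.run(cmd, check=True)
```

Reads rescued this way align over a shorter span, so it is worth inspecting them (for example by BLASTing a few of the still-unmapped reads) before trusting the higher mapping rate.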

Here is a link to the convo I had with Alex, the developer of STAR. LINK

written 4.0 years ago by datascientist28