Question: Terrible RNA mapping result by STAR
19 months ago
wrote:

Hi all, this is my first time mapping RNA-seq reads. The data were generated from corals. I used STAR with default parameters to map the reads to the reference, but got a terrible result. The final mapping log was:

                             Started job on |       Dec 12 16:02:44
                         Started mapping on |       Dec 12 16:03:13
                                Finished on |       Dec 12 16:12:34
   Mapping speed, Million of reads per hour |       85.99

                      Number of input reads |       13400813
                  Average input read length |       150
                                UNIQUE READS:
               Uniquely mapped reads number |       3114
                    Uniquely mapped reads % |       0.02%
                      Average mapped length |       124.34
                   Number of splices: Total |       41
        Number of splices: Annotated (sjdb) |       2
                   Number of splices: GT/AG |       24
                   Number of splices: GC/AG |       4
                   Number of splices: AT/AC |       0
           Number of splices: Non-canonical |       13
                  Mismatch rate per base, % |       4.13%
                     Deletion rate per base |       0.03%
                    Deletion average length |       1.86
                    Insertion rate per base |       0.01%
                   Insertion average length |       1.47
                         MULTI-MAPPING READS:
    Number of reads mapped to multiple loci |       1505
         % of reads mapped to multiple loci |       0.01%
    Number of reads mapped to too many loci |       47
         % of reads mapped to too many loci |       0.00%
                              UNMAPPED READS:
   % of reads unmapped: too many mismatches |       0.00%
             % of reads unmapped: too short |       99.96%
                 % of reads unmapped: other |       0.00%
                              CHIMERIC READS:
                   Number of chimeric reads |       0
                        % of chimeric reads |       0.00%

Any idea why so many reads are unmapped? I don't understand what the reason 'too short' means. Can somebody explain it? Thanks!

Could you run your data through a quality-control tool such as FastQC?

What are your read lengths?

What is your command line for the alignment?

"% of reads unmapped: too short" can mean two things with STAR:

  • The read is too short, so STAR throws it away.
  • The read is long enough to map but does not map, so STAR trims the read and tries to align it again; if it still does not map, STAR trims it again, and so on. In the end, if the read still does not map, it is too short to map, so STAR throws it away.
Bastien Hervé wrote:

Too short means the alignment is too short. Are you sure you are using the right reference?
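
To illustrate what "alignment too short" means in practice, here is a minimal sketch of a relative-length filter. It assumes STAR's documented default of 0.66 for `--outFilterMatchNminOverLread`, which is normalized to the read length (the sum of both mates' lengths for paired-end reads). The function name and the example numbers are hypothetical, and STAR's real filtering also involves a score-based threshold, so treat this as a simplification rather than STAR's actual logic.

```python
def passes_length_filter(aligned_len, read_len, min_over_lread=0.66):
    """Return True if the aligned length clears the relative-length threshold.

    read_len is the full read length; for paired-end data this is assumed
    to be the sum of both mates' lengths.
    """
    return aligned_len >= min_over_lread * read_len


# Hypothetical paired-end case, 2 x 150 nt: the threshold applies to 300 nt.
print(passes_length_filter(124, 300))  # False: 124 < 198

# Hypothetical single-end case, 150 nt.
print(passes_length_filter(124, 150))  # True: 124 >= 99
```

Under this simplified view, a read whose best alignment covers only a small part of its length is reported as "too short", which is what you would expect if most reads have no good match in the reference.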

Benn wrote:

In fact, I have nine types of coral, and I chose five of them to build reference indexes independently. Unfortunately, the results were similar.

wrote:

Can you elaborate on this? Are you just using short contigs as a reference (please give stats, e.g. from BBMap)? Are you aligning against a single species?

Have you tried bwa-mem or minimap2 to check their mapping rates for general info? Have you ever tried alignments to these references before?
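
If no assembly-stats tool is at hand, the basic reference stats asked for above (number of contigs, total length, longest contig, N50) can be sketched in a few lines of Python. The function names and the toy sequences below are hypothetical, and the parser assumes a plain, uncompressed FASTA file.

```python
def fasta_lengths(lines):
    """Yield the length of each sequence from an iterable of FASTA lines."""
    length = None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if length is not None:
                yield length
            length = 0
        elif line and length is not None:
            length += len(line)
    if length is not None:
        yield length


def assembly_stats(lengths):
    """Return contig count, total bases, longest contig, and N50."""
    lengths = sorted(lengths, reverse=True)
    total = sum(lengths)
    running = 0
    n50 = 0
    for contig_len in lengths:
        running += contig_len
        # N50: length of the contig at which half of the assembly is covered.
        if running * 2 >= total:
            n50 = contig_len
            break
    return {
        "contigs": len(lengths),
        "total_bp": total,
        "longest": lengths[0] if lengths else 0,
        "n50": n50,
    }


# Toy in-memory example; for a real reference, pass an open file handle instead.
example = [">c1", "ACGTACGT", ">c2", "ACGT", "AC", ">c3", "AC"]
print(assembly_stats(fasta_lengths(example)))
```

A reference made up of many very short contigs (low N50) would be one plausible reason for a spliced aligner to struggle, so these numbers are worth posting alongside the mapping log.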

colindaven wrote:

I've noticed that older STAR versions have issues with PE reads that overlap too much.

If you've got PE data, try only R1 first. Otherwise, check your FastQC reports, as Bastien mentioned, for adapter contamination or overrepresented sequences indicating other contamination.
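
A test run with R1 only could look roughly like the sketch below, which just assembles the STAR command line. The index directory, FASTQ name, and output prefix are hypothetical placeholders; the options used (`--runThreadN`, `--genomeDir`, `--readFilesIn`, `--outFileNamePrefix`, `--readFilesCommand`) are standard STAR parameters, and all other settings are left at their defaults.

```python
import shlex


def star_r1_command(genome_dir, r1_fastq, out_prefix, threads=4):
    """Build a STAR command that maps only the R1 file as single-end reads."""
    cmd = [
        "STAR",
        "--runThreadN", str(threads),
        "--genomeDir", genome_dir,
        "--readFilesIn", r1_fastq,  # only R1 is given, so STAR treats it as single-end
        "--outFileNamePrefix", out_prefix,
    ]
    if r1_fastq.endswith(".gz"):
        # Gzipped input has to be decompressed on the fly.
        cmd += ["--readFilesCommand", "zcat"]
    return cmd


# Hypothetical paths; replace with your own index and FASTQ file.
cmd = star_r1_command("star_index", "sample_R1.fastq.gz", "r1_only_")
print(shlex.join(cmd))
# To actually run it: import subprocess; subprocess.run(cmd, check=True)
```

If the R1-only mapping rate is much higher than the paired-end one, the problem is more likely related to mate overlap or pairing than to the reference itself.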

michael.ante wrote:
