I've aligned single-cell RNA-seq reads to mm10 using STAR, but only about 13% of reads map uniquely, and 79% are reported as unmapped because they are "too short".
I get the following output:
Started job on | Mar 09 14:04:53
Started mapping on | Mar 09 14:07:01
Finished on | Mar 09 14:23:11
Mapping speed, Million of reads per hour | 67.13
Number of input reads | 18088226
Average input read length | 47
UNIQUE READS:
Uniquely mapped reads number | 2298713
Uniquely mapped reads % | 12.71%
Average mapped length | 44.12
Number of splices: Total | 54580
Number of splices: Annotated (sjdb) | 0
Number of splices: GT/AG | 51443
Number of splices: GC/AG | 601
Number of splices: AT/AC | 27
Number of splices: Non-canonical | 2509
Mismatch rate per base, % | 6.80%
Deletion rate per base | 0.02%
Deletion average length | 1.51
Insertion rate per base | 0.02%
Insertion average length | 1.40
MULTI-MAPPING READS:
Number of reads mapped to multiple loci | 1405637
% of reads mapped to multiple loci | 7.77%
Number of reads mapped to too many loci | 95119
% of reads mapped to too many loci | 0.53%
UNMAPPED READS:
% of reads unmapped: too many mismatches | 0.00%
% of reads unmapped: too short | 78.96%
% of reads unmapped: other | 0.03%
CHIMERIC READS:
Number of chimeric reads | 0
% of chimeric reads | 0.00%
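To double-check the numbers, here is a small sketch (the helper names are my own, not part of STAR) that parses the "label | value" lines of a Log.final.out like the one above. It confirms that the unique, multi-mapping and unmapped percentages add up to about 100% of the input reads and turns each percentage back into an approximate read count. Counts are approximate because STAR rounds the percentages to two decimals.

```python
# Hypothetical helpers for a STAR Log.final.out summary ("label | value" lines).

# Excerpt of the summary shown above; section headers have no "|" and are skipped.
LOG = """\
Number of input reads | 18088226
Average input read length | 47
UNIQUE READS:
Uniquely mapped reads % | 12.71%
MULTI-MAPPING READS:
% of reads mapped to multiple loci | 7.77%
% of reads mapped to too many loci | 0.53%
UNMAPPED READS:
% of reads unmapped: too many mismatches | 0.00%
% of reads unmapped: too short | 78.96%
% of reads unmapped: other | 0.03%
"""

# Percentage lines that together should account for every input read.
CATEGORY_LABELS = [
    "Uniquely mapped reads %",
    "% of reads mapped to multiple loci",
    "% of reads mapped to too many loci",
    "% of reads unmapped: too many mismatches",
    "% of reads unmapped: too short",
    "% of reads unmapped: other",
]


def parse_star_log(text):
    """Return a dict mapping each label to its raw string value."""
    stats = {}
    for line in text.splitlines():
        if "|" not in line:
            continue  # skip section headers such as "UNIQUE READS:"
        label, value = line.split("|", 1)
        stats[label.strip()] = value.strip()  # strip() also removes STAR's padding/tabs
    return stats


def percent(stats, label):
    """Convert a value like '78.96%' to the float 78.96."""
    return float(stats[label].rstrip("%"))


def category_breakdown(stats):
    """Approximate read count per category, derived from the rounded percentages."""
    n_input = int(stats["Number of input reads"])
    return {label: round(n_input * percent(stats, label) / 100) for label in CATEGORY_LABELS}


if __name__ == "__main__":
    stats = parse_star_log(LOG)  # or parse_star_log(open("Log.final.out").read())
    total = sum(percent(stats, label) for label in CATEGORY_LABELS)
    print(f"categories cover {total:.2f}% of input reads")
    for label, n in category_breakdown(stats).items():
        print(f"{label}: ~{n:,} reads")
```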
These two pieces of information appear to contradict each other:
1) 78.96% of reads are unmapped as "too short"
2) The average input read length is 47 nucleotides.
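For reference, STAR's "too short" label is about the aligned portion of a read, not its input length. The STAR manual documents outFilterScoreMinOverLread and outFilterMatchNminOverLread, both defaulting to 0.66 and normalized to read length (the sum of mates' lengths for paired-end reads). A read whose best alignment falls below either threshold is counted as "too short". The sketch below is only a rough illustration, assuming those defaults; STAR's internal rounding may differ by a base.

```python
def min_aligned_length(read_len, frac=0.66):
    """Rough minimum number of matched bases STAR requires, assuming the
    default outFilterMatchNminOverLread of 0.66 (exact rounding in STAR may differ)."""
    return int(frac * read_len)


if __name__ == "__main__":
    # A 47-nt single-end read needs roughly 31 matched bases to count as mapped.
    print(min_aligned_length(47))
    # If two 47-nt mates were aligned as a pair, the threshold applies to their combined length.
    print(min_aligned_length(47 + 47))
```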
I looked at the FASTQ file, and there aren't many short reads, so I don't understand what went wrong.
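One way to check the read-length claim more systematically is a length histogram of the FASTQ. This is a minimal sketch that assumes standard 4-line FASTQ records (no wrapped sequence lines); the function names and the example file name are hypothetical.

```python
import gzip
from collections import Counter


def length_histogram(lines):
    """Count sequence lengths from an iterable of FASTQ lines (4 lines per record)."""
    counts = Counter()
    for i, line in enumerate(lines):
        if i % 4 == 1:  # the second line of each record is the sequence
            counts[len(line.rstrip("\r\n"))] += 1
    return counts


def histogram_from_path(path):
    """Open a plain or gzip-compressed FASTQ file and build its length histogram."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as fh:
        return length_histogram(fh)


def fraction_shorter_than(counts, cutoff):
    """Fraction of reads whose length is below `cutoff`."""
    total = sum(counts.values())
    return sum(n for length, n in counts.items() if length < cutoff) / total if total else 0.0


if __name__ == "__main__":
    hist = histogram_from_path("reads.fastq.gz")  # hypothetical file name
    for length in sorted(hist):
        print(length, hist[length])
    print("fraction < 31 nt:", fraction_shorter_than(hist, 31))
```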
What explains the poor alignment?