Good statistics for RNA-seq alignments using RSEM
Hi, after calculating expression from my raw read files, I retrieved these statistics from the .cnt file in the .stat folder:

17572420 28769454 0 46341874
27465439 1304015 11918214
60305896 3

I'm quite concerned that the number of unalignable reads (17,572,420) is about 61% of the number of alignable reads (28,769,454), so only about 62% of all reads aligned. However, my reference transcriptome is the one provided on the RSEM website, which only includes RefSeq transcripts with the NM_ prefix.
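To put these numbers in context, here is a minimal Python sketch that parses the three header lines of an RSEM .cnt file and computes the alignment rate. The field meanings (line 1: N0 unalignable, N1 alignable, N2 filtered, N_tot total; line 2: nUnique, nMulti, nUncertain; line 3: nHits and read_type) follow my reading of RSEM's .cnt file description, so treat them as an assumption and check them against the RSEM documentation. The names `RsemCnt`, `parse_cnt` and `READ_TYPES` are hypothetical, not part of RSEM.

```python
from dataclasses import dataclass

# read_type codes as given in RSEM's .cnt file description (assumed mapping)
READ_TYPES = {
    0: "single-end, no quality scores",
    1: "single-end, with quality scores",
    2: "paired-end, no quality scores",
    3: "paired-end, with quality scores",
}


@dataclass
class RsemCnt:
    n_unalignable: int  # N0
    n_alignable: int    # N1
    n_filtered: int     # N2: reads filtered for having too many alignments
    n_total: int        # N_tot = N0 + N1 + N2
    n_unique: int       # reads aligned uniquely to a gene
    n_multi: int        # reads aligned to multiple genes
    n_uncertain: int    # reads with multiple alignments (incl. isoform-level)
    n_hits: int         # total number of alignments
    read_type: int

    @property
    def alignment_rate(self) -> float:
        return self.n_alignable / self.n_total


def parse_cnt(text: str) -> RsemCnt:
    """Parse only the first three (header) lines of an RSEM .cnt file."""
    # Blank lines are skipped, so forum-style spacing also parses
    rows = [line.split() for line in text.splitlines() if line.strip()]
    n0, n1, n2, n_tot = map(int, rows[0][:4])
    n_unique, n_multi, n_uncertain = map(int, rows[1][:3])
    n_hits, read_type = map(int, rows[2][:2])
    # Consistency checks implied by the format description
    if n0 + n1 + n2 != n_tot:
        raise ValueError("N0 + N1 + N2 does not equal N_tot")
    if n_unique + n_multi != n1:
        raise ValueError("nUnique + nMulti does not equal N1")
    return RsemCnt(n0, n1, n2, n_tot, n_unique, n_multi,
                   n_uncertain, n_hits, read_type)


if __name__ == "__main__":
    cnt = parse_cnt("""17572420 28769454 0 46341874
27465439 1304015 11918214
60305896 3
""")
    print(f"alignment rate: {cnt.alignment_rate:.1%}")
    print(f"unalignable / alignable: {cnt.n_unalignable / cnt.n_alignable:.2f}")
    print(f"read type: {READ_TYPES[cnt.read_type]}")
```

With the numbers above this reports an alignment rate of about 62.1% and a ratio of about 0.61; if the read_type mapping is right, the trailing 3 means paired-end reads with quality scores.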

Does this mean that the unalignable reads may come from noncoding sequences, miRNAs, etc., rather than mature mRNAs? And are these alignment statistics good enough to proceed to differential expression analysis?

Is this standard RNA-seq? Did you do poly-A enrichment or ribosomal depletion?

Some of the reasons why you might get low mapping rates:

  • There is a high level of adapter-only reads
  • The inserts are very short due to RNA degradation
  • There is a high level of ribosomal RNA contamination
  • The base quality scores are very low
  • There was a mix-up with the reference genome
  • There was contamination with genomic DNA

So you should look further into the QC to rule out the above possibilities.
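FastQC is the usual tool for this, but as a rough illustration, here is a minimal standard-library Python sketch that checks three of the causes listed above on a FASTQ file: adapter-only reads, short inserts, and low base quality. It assumes the Illumina TruSeq adapter prefix `AGATCGGAAGAGC`, Phred+33 quality encoding, and a hypothetical 30-base cutoff for a "short" insert; adjust these for your own library prep. It only finds exact matches of that adapter prefix, so partial adapters at the 3' end are missed, and it does not address rRNA or genomic DNA contamination, which need alignment against rRNA sequences or the genome. The function names are hypothetical.

```python
import gzip
import io
from typing import Iterator, TextIO, Tuple

# Assumption: start of the standard Illumina TruSeq adapter; replace it with
# the adapter used in your own library prep.
ADAPTER_PREFIX = "AGATCGGAAGAGC"


def open_fastq(path: str) -> TextIO:
    """Open a plain or gzip-compressed FASTQ file as text."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


def read_fastq(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (sequence, quality) pairs, assuming 4 lines per record."""
    while True:
        header = handle.readline()
        if not header:
            return
        seq = handle.readline().strip()
        handle.readline()  # '+' separator line
        qual = handle.readline().strip()
        yield seq, qual


def qc_summary(handle: TextIO, min_insert: int = 30) -> dict:
    """Rough QC: adapter content, short inserts, mean base quality."""
    n_reads = n_adapter = n_short = 0
    qual_total = n_bases = 0
    for seq, qual in read_fastq(handle):
        n_reads += 1
        pos = seq.find(ADAPTER_PREFIX)
        if pos >= 0:
            n_adapter += 1
        # Insert length = bases before the adapter (whole read if none found);
        # an insert length of 0 means an adapter-only read (adapter dimer).
        insert_len = pos if pos >= 0 else len(seq)
        if insert_len < min_insert:
            n_short += 1
        # Phred+33 encoding assumed
        qual_total += sum(ord(c) - 33 for c in qual)
        n_bases += len(qual)
    if n_reads == 0:
        raise ValueError("no FASTQ records found")
    return {
        "reads": n_reads,
        "frac_with_adapter": n_adapter / n_reads,
        "frac_short_insert": n_short / n_reads,
        "mean_base_quality": qual_total / n_bases if n_bases else 0.0,
    }


if __name__ == "__main__":
    # Tiny in-memory demo: one clean read and one adapter-only, low-quality read
    demo = (
        "@r1\n" + "ACGT" * 9 + "\n+\n" + "I" * 36 + "\n"
        "@r2\n" + ADAPTER_PREFIX + "A" * 23 + "\n+\n" + "#" * 36 + "\n"
    )
    print(qc_summary(io.StringIO(demo)))
```

On a real file you would call something like `with open_fastq("sample_R1.fastq.gz") as fh: print(qc_summary(fh))`, where the file name is only a placeholder. A large adapter or short-insert fraction points to the first two causes; a low mean quality points to the fourth.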


