High amount of intronic/intergenic reads in SMARTer stranded total bulk RNAseq
I have bulk RNA-seq data (SMARTer stranded total RNA with ribosomal depletion, 100M paired-end 150 bp reads) from 12 human samples, and the QC stats look like this:

  • High mapping rate against the genome (~90% with Hisat2/STAR)
  • Low mapping rate against the transcriptome with Salmon (3-30%; 7 samples at or below 10%). Alignment-based quantification with STAR alignments as input did not increase the mapping rate either
  • According to Qualimap, most reads map to intronic regions, followed by intergenic regions, e.g.:
    • exonic: 8%
    • intronic: 59%
    • intergenic: 33%
    • overlapping exon: 3%
  • After trimming with fastp, around 65-75% of reads map uniquely to the genome and 15-25% are multimapping
  • According to STAR, the average input read length and the average mapped length are both ~280 bp

This is consistent across all 12 samples.
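As a rough sanity check, the low Salmon rate is about what the genome-level numbers predict. The sketch below assumes that Qualimap's exonic/intronic/intergenic percentages are fractions of genome-mapped reads and that only reads overlapping annotated exons can map to the transcriptome, counting Qualimap's "overlapping exon" reads only as an upper bound. The function name `expected_transcriptome_rate` is hypothetical. With the example sample (90% genome mapping, 8% exonic, 3% overlapping exon) it gives roughly 7% to 10%, in line with the 7 samples at or below 10%.

```python
from typing import Tuple


def expected_transcriptome_rate(
    genome_rate: float, exonic: float, overlapping_exon: float
) -> Tuple[float, float]:
    """Return (lower, upper) bounds on the fraction of all reads that could be
    compatible with the transcriptome.

    Assumption (hypothetical helper): Qualimap percentages are fractions of
    genome-mapped reads; purely exonic reads give the lower bound, and adding
    reads that only partially overlap exons gives the upper bound.
    """
    lower = genome_rate * exonic
    upper = genome_rate * (exonic + overlapping_exon)
    return lower, upper


# Values taken from the example sample in the post.
low, high = expected_transcriptome_rate(genome_rate=0.90, exonic=0.08, overlapping_exon=0.03)
print(f"expected transcriptome mapping rate: {low:.1%} to {high:.1%}")
```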

Are there explanations other than genomic DNA contamination for such a high proportion of intronic/intergenic reads, and what else could I check?

A similar question has been asked before: High percentage of intronic/intergenic reads in RNA-seq

Thank you very much. MM

Sounds like genomic DNA contamination to me. Even if you had captured nascent (unspliced) RNA, you should still see much higher coverage over exons.
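One way to make the "much higher coverage over exons" argument concrete is to normalise read counts by the total length of each region class. Genomic DNA would spread reads roughly in proportion to length, giving similar reads per kb in exons, introns and intergenic space, whereas RNA (even with a nascent component) should show clear exon enrichment. This is a minimal sketch: the helper names are hypothetical, the toy numbers are illustrative only, and the region lengths are assumed to be computed from your own annotation (e.g. merged exon and gene intervals from the GTF). No specific cutoff is implied.

```python
from typing import Dict


def reads_per_kb(read_counts: Dict[str, int], region_bp: Dict[str, int]) -> Dict[str, float]:
    """Reads per kilobase for each region class (exonic, intronic, intergenic)."""
    return {name: read_counts[name] / (region_bp[name] / 1_000) for name in read_counts}


def exon_enrichment(read_counts: Dict[str, int], region_bp: Dict[str, int]) -> float:
    """Exonic read density divided by intronic read density.

    A value near 1 means exons are covered no more densely than introns
    (uniform, DNA-like coverage); values well above 1 indicate exon
    enrichment as expected from RNA.
    """
    density = reads_per_kb(read_counts, region_bp)
    return density["exonic"] / density["intronic"]


# Toy numbers for illustration only, not derived from the samples in the post.
counts = {"exonic": 500, "intronic": 1_000, "intergenic": 500}
lengths = {"exonic": 1_000, "intronic": 10_000, "intergenic": 5_000}
print(reads_per_kb(counts, lengths))
print(exon_enrichment(counts, lengths))
```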

If you want to check further, take a subset of the reads, map them to genomic coordinates (i.e. produce a BAM file), and visualize them in a genome browser.


