I have bulk RNA-seq data (SMARTer stranded total RNA library with ribosomal RNA depletion, 100M 150 bp paired-end reads) from 12 human samples, and the QC stats look like this:
- High mapping rate against the genome (~90% with HISAT2/STAR)
- Low mapping rate against the transcriptome with Salmon (3-30%, with 7 samples at or below 10%); alignment-based quantification using the STAR alignments as input did not increase the mapping rate either
- According to Qualimap, most reads map to intronic regions, followed by intergenic regions, e.g.:
  - exonic: 8%
  - intronic: 59%
  - intergenic: 33%
  - overlapping exon: 3%
- After trimming with fastp, around 65-75% of the reads map uniquely to the genome and 15-25% are multimappers
- The average input read length and the average mapped length are both ~280 bp according to STAR (for paired-end data, STAR reports the sum of both mates)
This is consistent across all 12 samples.
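As a rough consistency check, the low Salmon rate can be compared with the share of reads that Qualimap places on exons. This is a sketch under simplifying assumptions, not a model of how Salmon actually scores mappings: it assumes only reads classified as exonic or exon-overlapping can be assigned to transcripts, and that the Qualimap percentages refer to genome-mapped reads. The function name `expected_transcriptome_rate` is hypothetical. With the example values above it gives about 10%, the same order as the observed 3-30%.

```python
def expected_transcriptome_rate(genome_rate: float, exonic: float,
                                overlapping_exon: float) -> float:
    """Share of all input reads expected to be assignable to transcripts.

    Simplifying assumption (not Salmon's actual scoring): only reads that
    Qualimap classifies as exonic or exon-overlapping can be placed, and
    the Qualimap fractions are relative to genome-mapped reads.
    """
    return genome_rate * (exonic + overlapping_exon)


# Example values from the QC summary above.
rate = expected_transcriptome_rate(genome_rate=0.90, exonic=0.08,
                                   overlapping_exon=0.03)
print(f"{rate:.1%}")  # 9.9%
```

If this estimate roughly matches the per-sample Salmon rate, the low transcriptome mapping is explained by the read composition itself rather than by a separate quantification problem.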
Are there explanations other than genomic DNA contamination for such a high proportion of intronic/intergenic reads, and what else could I check?
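Because the library is stranded, strand balance offers one possible check. RNA-derived reads should fall mostly on the gene's expected strand, whereas gDNA-derived reads should split roughly 50/50. The sketch below assumes a perfectly stranded library, so a gDNA fraction g produces an opposite-strand fraction of about g/2. The helper `estimate_gdna_fraction` and the counts in the usage lines are hypothetical and only illustrate the arithmetic. It only applies to regions with an annotated strand, such as introns; intergenic reads have no reference strand. Sense and antisense counts could come, for example, from running featureCounts once with `-s 1` and once with `-s 2` against an intron annotation.

```python
def estimate_gdna_fraction(expected_strand: int, opposite_strand: int) -> float:
    """Rough gDNA share of reads in a stranded region class (e.g. introns).

    Assumes a perfectly stranded library: RNA-derived reads land on the
    expected strand, while gDNA-derived reads split ~50/50 between strands.
    If g is the gDNA fraction, the opposite-strand fraction is a = g / 2,
    so g = 2a (capped at 1.0).
    """
    total = expected_strand + opposite_strand
    if total == 0:
        raise ValueError("no reads counted")
    opposite_frac = opposite_strand / total
    return min(1.0, 2.0 * opposite_frac)


# Hypothetical counts, for illustration only (not from this dataset).
print(estimate_gdna_fraction(expected_strand=900, opposite_strand=100))  # 0.2
print(estimate_gdna_fraction(expected_strand=500, opposite_strand=500))  # 1.0
```

Real libraries are never perfectly stranded, and genuine antisense transcription also adds opposite-strand reads. The result is therefore best read as an upper bound and compared against the strand balance over exons in the same sample.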
A similar question has been asked before: High percentage of intronic/intergenic reads in RNA-seq
Thank you very much. MM