I have 24 RNA-seq samples from pig (Sus scrofa) and have seen some strange results during QC. When counting features, many samples have around 20-80% of reads assigned to "no feature" and 20-50% not assigned due to "multi-mapping". The proportions vary a lot between samples.
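To make the comparison between samples concrete, here is a minimal sketch of how per-sample proportions like these could be tabulated. It assumes the counting tool's summary has already been parsed into a dictionary of read counts per category. The function name, the category names, and the numbers are hypothetical placeholders for illustration, not my real data or the exact labels of any particular tool.

```python
def category_percentages(counts):
    """Return each category's share of the sample's reads, in percent (1 decimal)."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no reads")
    return {cat: round(100.0 * n / total, 1) for cat, n in counts.items()}


# Hypothetical counts for one sample (made-up numbers, illustration only)
example = {"assigned": 300, "no_feature": 500, "multi_mapping": 200}

if __name__ == "__main__":
    # Print one tab-separated line per category
    for cat, pct in category_percentages(example).items():
        print(f"{cat}\t{pct}%")
```

Running this over all 24 samples and putting the results side by side would show which samples are dominated by "no feature" and which by multi-mapping.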
Overall mapping with STAR is not that bad: in total, around 90% of reads are either uniquely mapped or multi-mapped, so I don't suspect contamination from other species. However, I do want to check for genomic DNA (gDNA) and rRNA.
For gDNA: I have checked some samples in IGV, but doing this for all samples is cumbersome, and IGV constantly crashes on my MacBook. Are there any systematic ways to assess gDNA contamination?
For rRNA: Searching around turns up numerous suggested approaches, but I can't find one that sounds straightforward to me. Where do I even get a reference FASTA file of pig rRNA sequences? Should I use gene sequences or transcripts? Any simple explanation of this would be extremely helpful.
Can the high proportion of "no feature" reads be due to poor annotation? The library prep uses poly-A enrichment, but it's a custom protocol and we don't know how well it works, so the error could be anywhere.