Hello, Biostars. What steps do you take when STAR reports a mapping rate below 70% against the human reference genome? I am looking for a general procedure that works for most human samples. Based on my searches, I have put together the procedure below, but I know it has some gaps. I would be grateful if you could help me complete it:
1- If the "Per sequence GC content" plot from FastQC shows two or more peaks, my data is probably contaminated. In that case I should BLAST 10-15 unmapped reads to find the source of the contamination.
2- If FastQC also reports overrepresented sequences, I should BLAST them to find the source of the contamination. The source might be rRNA, DNA contamination, or reads from other organisms.
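As a rough sketch of step 1, the script below assumes STAR was run with `--outReadsUnmapped Fastx`, which writes unmapped reads to `Unmapped.out.mate1` (plus `Unmapped.out.mate2` for paired-end data). It prints a coarse per-read GC histogram, a plain-text stand-in for FastQC's plot rather than FastQC's actual method, and writes 15 randomly chosen reads to FASTA for BLAST. The output name `unmapped_sample.fa`, the 5% bin width and the fixed seed are arbitrary assumptions.

```python
"""Sketch: GC histogram of STAR's unmapped reads plus a random sample for BLAST.

Assumes an uncompressed, well-formed 4-line FASTQ such as the
Unmapped.out.mate1 file written by STAR with --outReadsUnmapped Fastx.
"""
import random
import sys
from collections import Counter
from typing import Iterable, Iterator, List, Tuple

BIN_WIDTH = 5  # arbitrary histogram bin width, in GC percentage points


def parse_fastq(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (read_id, sequence) from 4-line FASTQ records."""
    it = iter(lines)
    for header in it:
        header = header.rstrip("\n")
        if not header:
            continue  # tolerate a trailing blank line
        seq = next(it).rstrip("\n")
        next(it)  # '+' separator line
        next(it)  # quality line
        yield header[1:].split()[0], seq


def gc_percent(seq: str) -> float:
    """Percent G/C among A/C/G/T bases; N and other codes are ignored."""
    seq = seq.upper()
    acgt = sum(seq.count(b) for b in "ACGT")
    if acgt == 0:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt


def gc_histogram(seqs: Iterable[str], bin_width: int = BIN_WIDTH) -> Counter:
    """Count reads per GC bin; separated maxima hint at a mixture of sources."""
    hist: Counter = Counter()
    for s in seqs:
        hist[int(gc_percent(s) // bin_width) * bin_width] += 1
    return hist


def sample_reads(records: Iterable[Tuple[str, str]], n: int = 15,
                 seed: int = 0) -> List[Tuple[str, str]]:
    """Reservoir sampling: n reads chosen uniformly in a single pass."""
    rng = random.Random(seed)
    reservoir: List[Tuple[str, str]] = []
    for i, rec in enumerate(records):
        if i < n:
            reservoir.append(rec)
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < n:
                reservoir[j] = rec
    return reservoir


def to_fasta(records: Iterable[Tuple[str, str]]) -> str:
    return "".join(f">{rid}\n{seq}\n" for rid, seq in records)


def main(path: str) -> None:
    with open(path) as fh:
        hist = gc_histogram(seq for _, seq in parse_fastq(fh))
    for gc_bin in sorted(hist):
        print(f"GC {gc_bin}-{gc_bin + BIN_WIDTH}%: {hist[gc_bin]} reads")
    with open(path) as fh:  # second pass so the file is never held in memory
        sample = sample_reads(parse_fastq(fh), n=15)
    with open("unmapped_sample.fa", "w") as out:
        out.write(to_fasta(sample))


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])
```

Usage would be something like `python gc_sample.py Unmapped.out.mate1` (the script name is hypothetical), after which `unmapped_sample.fa` can be submitted to BLAST. Sampling at random rather than taking the first 15 reads avoids relying on the start of the file being representative.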
Regarding step 2, I don't know what to do for each contamination source. Should I remove rRNA reads? What about DNA contamination and contamination from other organisms?
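For step 2, FastQC already lists overrepresented sequences; the sketch below only illustrates the idea and writes the hits to FASTA so they can be BLASTed together. It flags any sequence making up more than 0.1% of reads, loosely following FastQC's documented warning threshold, and compares only the first 50 bases of each read. The prefix length, the output name `overrepresented.fa` and the exact counting rule are assumptions, not FastQC's implementation.

```python
"""Sketch: flag overrepresented read sequences and write them to FASTA.

Assumes an uncompressed FASTQ with strict 4-line records and no blank lines.
"""
import sys
from collections import Counter
from typing import Iterable, Iterator, List, Tuple


def read_sequences(lines: Iterable[str], prefix_len: int = 50) -> Iterator[str]:
    """Yield the first prefix_len bases of each record (line 2 of every 4)."""
    for i, line in enumerate(lines):
        if i % 4 == 1:
            yield line.strip()[:prefix_len]


def overrepresented(seqs: Iterable[str], min_fraction: float = 0.001
                    ) -> List[Tuple[str, int, float]]:
    """Return (sequence, count, fraction) above min_fraction, most common first."""
    counts = Counter(seqs)
    total = sum(counts.values())
    if total == 0:
        return []
    hits = [(s, c, c / total) for s, c in counts.items() if c / total > min_fraction]
    return sorted(hits, key=lambda h: h[1], reverse=True)


def main(path: str) -> None:
    with open(path) as fh:
        hits = overrepresented(read_sequences(fh))
    with open("overrepresented.fa", "w") as out:
        for k, (seq, count, frac) in enumerate(hits, 1):
            out.write(f">overrep_{k} count={count} fraction={frac:.4%}\n{seq}\n")
            print(f"overrep_{k}\t{count}\t{frac:.4%}\t{seq}")


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])
```

Counting every read exactly with a `Counter` is simple but memory-hungry for very large libraries; running it on a subsample is one way to keep it manageable.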