I have a bunch of samples from an RNA-seq project that I've just tried aligning to my reference genome. For most of my treatments, all replicates are 79-92% aligned to the reference. For one of the treatments, however, the alignment rate is much lower (11-52%). Can anyone suggest steps to check whether the low-alignment treatment has a problem that could be overcome? Could reads that should align be failing to align for some reason?
Info: The sequences come from a sequencing facility, which produced 150 bp single-end (SE) reads. Raw reads went through Trimmomatic to remove adapters and 3 bp from each end. The trimmed reads were aligned with both STAR and HISAT2 (separately), and both confirmed the low alignment rate in this treatment.
FastQC looks similar between the "good" and "bad" sets (except that maybe the "bad" sets look like they have more kmers?). I tried aligning the raw reads and got similar results. I also BLASTed some raw reads, and they get hits to the species of interest.
The most likely cause is contamination. Did you BLAST the unmapped reads? If so, for the unmapped reads that BLASTed to the correct organism, why did they not align in the first place? Are they low-quality? What alignment length and percent identity does BLAST report?
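To put numbers on the last question, here is a minimal Python sketch that summarizes a BLAST result for the unmapped reads. It assumes you ran BLAST with tabular output (`-outfmt 6`) in the default column order; the function names and the file name in the usage comment are hypothetical, not something from your pipeline. It keeps the best hit per read (highest bitscore) and reports the median percent identity, the median alignment length, and how many reads hit each subject sequence.

```python
import csv
from collections import Counter
from statistics import median

# Default column order of BLAST tabular output (-outfmt 6).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(lines):
    """Return the highest-bitscore hit per query read from outfmt 6 lines."""
    best = {}
    for row in csv.reader(lines, delimiter="\t"):
        # Skip blank lines, comment lines, and rows with missing columns.
        if len(row) < len(COLUMNS) or row[0].startswith("#"):
            continue
        hit = dict(zip(COLUMNS, row))
        hit["pident"] = float(hit["pident"])
        hit["length"] = int(hit["length"])
        hit["bitscore"] = float(hit["bitscore"])
        previous = best.get(hit["qseqid"])
        if previous is None or hit["bitscore"] > previous["bitscore"]:
            best[hit["qseqid"]] = hit
    return best


def summarize(best):
    """Summarize identity, alignment length, and subjects over best hits."""
    hits = list(best.values())
    if not hits:
        return None
    return {
        "n_reads_with_hit": len(hits),
        "median_pident": median(h["pident"] for h in hits),
        "median_length": median(h["length"] for h in hits),
        "hits_per_subject": dict(Counter(h["sseqid"] for h in hits)),
    }


# Example usage (hypothetical file name):
# with open("unmapped_vs_db.tsv") as handle:
#     print(summarize(best_hits(handle)))
```

If the best hits to your species cover most of the 150 bp read at high identity, the reads look alignable and the aligner settings or the reference deserve a closer look. Short or low-identity hits, or many hits to other organisms, point toward low-quality reads or contamination.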
Also, I recommend running BBMerge to check the insert size in an alignment-free manner:
You write that you get different results for one of the treatments, but how many samples does that involve? If you observe this for just one sample, it might be a technical or library-prep issue.