My issue is with some recent RNA-seq data. I've aligned the reads to the genome with STAR. The organism in question is a cnidarian. We have a set of control replicates and a set of treated replicates, and only the treated set shows poor alignment rates. RNA extraction, enrichment and sequencing were performed on all samples at the same time and in the same run.
For the low-aligning samples, STAR reports that 60% of reads are unmapped because they are 'too short', and this is consistent across all the treatment reps. Read QC looks fine. I trimmed only minimally with Trimmomatic because I don't want to discard a lot of valuable data. No head cropping or anything else that could affect the alignment rate was performed.
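For reference, this is roughly how the 'too short' percentage can be pulled out of each sample's `Log.final.out` so that control and treated reps can be compared side by side. It is a minimal sketch: it assumes the usual `key |<tab>value` layout of that file, and the excerpt and numbers in the example are made up for illustration, not taken from my real logs.

```python
def parse_star_log(text):
    """Collect the 'key | value' lines of a STAR Log.final.out into a dict."""
    stats = {}
    for line in text.splitlines():
        if "|" not in line:
            continue  # section headers such as 'UNMAPPED READS:' carry no value
        key, value = line.split("|", 1)
        stats[key.strip()] = value.strip()
    return stats


def percent(stats, key):
    """Convert a value such as '60.12%' to the float 60.12."""
    return float(stats[key].rstrip("%"))


# Hypothetical excerpt with invented numbers, only to show the layout.
example_log = "\n".join([
    "                          Number of input reads |\t1000000",
    "                        Uniquely mapped reads % |\t35.20%",
    "                                  UNMAPPED READS:",
    "                 % of reads unmapped: too short |\t60.12%",
])

stats = parse_star_log(example_log)
too_short = percent(stats, "% of reads unmapped: too short")
unique = percent(stats, "Uniquely mapped reads %")
print(f"uniquely mapped: {unique:.2f}%  unmapped (too short): {too_short:.2f}%")
```

With real data, the text would come from reading each sample's `Log.final.out` (the path depends on the `--outFileNamePrefix` used) instead of the inline example.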
Salmon gives the same result, so the aligner itself doesn't seem to be the problem. I've also noticed that GC content in the treatment samples is 2% higher than in the control samples.
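In case it helps to reproduce the GC comparison, here is a small sketch of how overall GC content can be computed directly from a FASTQ file. The function names are my own, it assumes standard four-line FASTQ records, and it ignores `N` calls when counting bases.

```python
import gzip


def fastq_sequences(path):
    """Yield the sequence line of every record in a 4-line-per-record FASTQ."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for i, line in enumerate(handle):
            if i % 4 == 1:  # second line of each record holds the bases
                yield line.strip()


def gc_fraction(seqs):
    """Fraction of G/C among the A, C, G and T bases in an iterable of reads."""
    gc = 0
    total = 0
    for seq in seqs:
        seq = seq.upper()
        gc += seq.count("G") + seq.count("C")
        total += sum(seq.count(base) for base in "ACGT")
    return gc / total if total else 0.0


# Toy reads only; real use would be gc_fraction(fastq_sequences("reads.fastq.gz")),
# where the file name is a placeholder.
print(gc_fraction(["GGCC", "AATT"]))
```

Running this per sample gives one number per library, which makes it easy to see whether the shift is consistent across all treated reps.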
I'm starting to suspect contamination. Or is something else going on?