I have Illumina mRNA-seq samples in which a subset has low RINs (2-4) compared to the others. Seemingly because of this, I am getting widely varying mapping rates (15%-70%) and therefore widely varying counts per sample (e.g. 8,000,000 mapped reads vs 40,000,000). On top of that, I can't really use RIN or mapping rate as a covariate, because both are heavily confounded with a group of interest. I am currently looking at excluding another 20 samples with low mapping rates.
Is there a preferred way of analyzing this type of data? If I apply the usual VST through DESeq2, I get a cluster of samples with irregularly high expression of many genes. These are also the samples with low overall counts, so presumably this is a result of what I describe above. I was wondering whether quantile normalisation would help, since it uses ranks to make the samples more comparable, and this could be the kind of extreme situation where it is useful. Are there any other ideas?
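To be concrete about what I mean by quantile normalisation, here is a minimal sketch in Python with NumPy. It is only an illustration of the rank-based idea on a toy matrix, not what DESeq2 does and not necessarily what I would run on the real data. The function name `quantile_normalize` and the toy values are made up for the example, and in practice I assume it would be applied to log-transformed expression values (genes in rows, samples in columns) rather than raw counts.

```python
import numpy as np


def quantile_normalize(values):
    """Force every sample (column) to share the same distribution.

    `values` is a genes x samples array. Each value is replaced by the
    mean, across samples, of the values holding the same within-sample rank.
    Ties are broken by row order (a simplification; some implementations
    average tied ranks instead).
    """
    values = np.asarray(values, dtype=float)
    # Within-sample rank of every gene (0 = lowest value in that sample).
    order = np.argsort(values, axis=0, kind="stable")
    ranks = np.argsort(order, axis=0, kind="stable")
    # Reference distribution: mean of the k-th smallest value across samples.
    reference = np.sort(values, axis=0).mean(axis=1)
    # Map each rank back to the corresponding reference value.
    return reference[ranks]


# Toy example: 4 genes x 3 samples (hypothetical numbers).
toy = np.array([
    [5.0, 4.0, 3.0],
    [2.0, 1.0, 4.0],
    [3.0, 4.0, 6.0],
    [4.0, 2.0, 8.0],
])
normalized = quantile_normalize(toy)
print(normalized)
```

After this step every sample has exactly the same set of values, only assigned to different genes. That is also my worry: if the low-RIN samples differ in which genes are detected at all, forcing identical distributions could hide or distort real differences rather than fix them.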
I quantified the data with Salmon using the GC bias correction and validate mappings flags (--gcBias and --validateMappings). Reads are 150 bp paired-end. If I run without GC bias correction and validate mappings, the mapping rates go up by about 10%, but I suspect the quality of those mappings is lower, so I am currently using the data generated with these flags.