I've been given some data to perform differential expression on, and in the process of QCing the resulting count data, I'm seeing that the library sizes differ considerably between the 2 samples shown below. I know a good Illumina run generates between 10 and 40 million reads, but is it normal for such runs to produce starkly different total read counts like this? That is, is this an acceptable library size?
I ran PCA on this particular grouping, found that P70F20 is a clear outlier, and removed it, so I'm also curious how much of that variability is potentially attributable to library size. As I understand it, DESeq normalizes for sequencing depth using median-of-ratios size factors (not TPM); should that control for this difference?
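For reference, here is a minimal sketch of the median-of-ratios idea behind DESeq-style size factors, written in plain Python so the arithmetic is visible. It is not the DESeq2 implementation; the function name `size_factors`, the sample names, and the toy counts are all hypothetical and not taken from my data.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors (illustrative sketch, not DESeq2 itself).

    counts: dict mapping sample name -> list of raw counts,
            with genes in the same order for every sample.
    """
    samples = list(counts)
    n_genes = len(counts[samples[0]])

    # Log geometric mean per gene, using only genes that are
    # nonzero in every sample (the log is undefined otherwise).
    log_geo_means = []
    for g in range(n_genes):
        values = [counts[s][g] for s in samples]
        if all(v > 0 for v in values):
            mean_log = sum(math.log(v) for v in values) / len(values)
            log_geo_means.append((g, mean_log))

    if not log_geo_means:
        raise ValueError("no gene has nonzero counts in every sample")

    # Size factor = median over genes of (count / geometric mean).
    factors = {}
    for s in samples:
        log_ratios = [math.log(counts[s][g]) - lgm for g, lgm in log_geo_means]
        factors[s] = math.exp(median(log_ratios))
    return factors


# Hypothetical toy data: the second sample is sequenced 4x deeper.
# The last gene is zero in one sample and is therefore skipped.
toy_counts = {
    "sample_low": [10, 20, 30, 40, 0],
    "sample_high": [40, 80, 120, 160, 7],
}

factors = size_factors(toy_counts)

# Dividing raw counts by the size factor puts samples on a common scale.
normalized = {
    s: [c / factors[s] for c in toy_counts[s]] for s in toy_counts
}

for s in toy_counts:
    print(s, round(factors[s], 3), [round(x, 1) for x in normalized[s]])
```

In this toy case the deeper sample gets a size factor about four times larger, so a pure depth difference is scaled away. A difference in library composition or complexity would not be corrected by this step.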
Any help is appreciated; I have never seen this magnitude of difference within a single grouping before. The FastQC results were clean as well, adapters were trimmed with cutadapt, and alignment and counting were done using the Rsubread package in R.