My experimental setup is as follows: 3 plates, each containing a chemical treatment at 6 different dosages plus a control. The duration of the treatment is the same across all dosages. Sequencing was done via BioSpyder, where each gene is represented by a 50 bp probe, so you sequence a ~unique probe for each individual gene. In the end, there are 3 biological replicates for each condition.
I ran DESeq2 to find differentially expressed genes and called counts(dds, normalized=TRUE) to keep the normalized counts on the side. However, when I went to put the normalized counts for each dosage into a separate data frame, I realized that the normalized counts of the controls did not match: I used the same 3 controls for, e.g., the 10 mM and the 20 mM treatment in DESeq2, but counts(dds, normalized=TRUE) gives different results for each analysis. This is because sizeFactors(dds), which is used for normalization, changes with every set of samples.
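To make the size-factor point concrete, here is a minimal base-R sketch of median-of-ratios size factors, the approach DESeq2 uses by default. The count matrix and the sample names (ctrl1, ctrl2, d10, d20) are made up for illustration and are not my real data; size_factors() is a hypothetical helper, not a DESeq2 function.

```r
# Median-of-ratios size factors in base R (toy re-implementation,
# not the DESeq2 code itself).
size_factors <- function(counts) {
  log_counts <- log(counts)
  log_geo_means <- rowMeans(log_counts)   # per-gene log geometric mean
  keep <- is.finite(log_geo_means)        # drop genes with a zero in any sample
  apply(log_counts[keep, , drop = FALSE], 2,
        function(col) exp(median(col - log_geo_means[keep])))
}

# Made-up counts: 5 genes x 4 samples (assumption, for illustration only).
counts <- cbind(
  ctrl1 = c(100, 200, 50,  400,  80),
  ctrl2 = c(110, 190, 55,  420,  75),
  d10   = c(150, 210, 40,  800,  90),
  d20   = c(300, 180, 20, 1600, 100)
)
rownames(counts) <- paste0("gene", 1:5)

# Same controls, two different analysis sets.
set_a <- counts[, c("ctrl1", "ctrl2", "d10")]
set_b <- counts[, c("ctrl1", "ctrl2", "d20")]

sf_a <- size_factors(set_a)
sf_b <- size_factors(set_b)

# Normalized counts = raw counts divided column-wise by the size factors.
norm_a <- sweep(set_a, 2, sf_a, "/")
norm_b <- sweep(set_b, 2, sf_b, "/")

# The control columns are normalized differently in the two sets.
print(rbind(set_a = sf_a[c("ctrl1", "ctrl2")],
            set_b = sf_b[c("ctrl1", "ctrl2")]))
```

Because the per-gene geometric mean is taken over whichever samples are in the object, swapping the treated sample changes the reference, and with it the size factors of the unchanged controls.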
Then I thought that putting all my samples into one dds object and normalizing them as a whole might help (I suspected it would fail). But when I clustered the normalized data frame, samples with the same dosage did not cluster together; instead, samples mostly clustered by plate, regardless of the dosage. I then tried plate-by-plate normalization, and that failed as well. There is a strong sequencing-session effect.
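For reference, the kind of clustering check I mean can be sketched in base R roughly as follows. The data here are simulated with a deliberately strong plate shift and a weak dose shift, so every number and name is an assumption for illustration, not my actual counts.

```r
set.seed(1)

# Simulated layout (assumption): 3 plates x 3 conditions, one sample each.
plate <- rep(c("P1", "P2", "P3"), each = 3)
dose  <- rep(c("ctrl", "10mM", "20mM"), times = 3)
n_genes <- 200

# Log-scale expression = gene baseline + plate shift + dose shift + noise.
base <- rnorm(n_genes, mean = 6, sd = 1)
plate_shift <- matrix(rnorm(n_genes * 3, sd = 1), ncol = 3,
                      dimnames = list(NULL, c("P1", "P2", "P3")))
dose_shift <- matrix(rnorm(n_genes * 3, sd = 0.1), ncol = 3,
                     dimnames = list(NULL, c("ctrl", "10mM", "20mM")))
noise <- matrix(rnorm(n_genes * 9, sd = 0.05), ncol = 9)

log_expr <- base + plate_shift[, plate] + dose_shift[, dose] + noise
colnames(log_expr) <- paste(plate, dose, sep = "_")

# Hierarchical clustering on 1 - Pearson correlation between samples.
hc <- hclust(as.dist(1 - cor(log_expr)), method = "average")
clusters <- cutree(hc, k = 3)

# Cross-tabulate cluster membership against plate and against dose.
print(table(plate, clusters))
print(table(dose, clusters))
```

In this simulation each cluster lines up with one plate and mixes all doses, which is the same pattern I see in my normalized data.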
As this platform uses a 50 bp probe for each gene, with a GTF built around that design, I cannot get a proper gene-length table that I could feed into edgeR either.
So my question in the end is: what is the right way to normalize a data structure like mine? Also, is there a QC method that is applied to raw counts? What would you look at in the raw counts when it comes to QC?