I am analyzing RNA-Seq data from animals at different time points after a spinal cord trauma. The animals' tissue was pooled by time point, so I have neither biological nor technical replicates. To analyze the data, I have assigned a new label to my samples, representing "acute" or "later" conditions after the trauma. This grouping would give me biological replicates, and therefore an "n".
I have plotted a heatmap of sample distances and a PCA. As I understand the PCA, the samples group according to this label, which would suggest that the biological condition explains a large part of the variation in the data.
My question is: considering that my biological variable would explain a large part of the variation in the data, could I model the design in DESeq2 as a function of it alone, without modelling any batch variable?
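To make the PCA reasoning concrete, here is a minimal sketch of the kind of check I mean. It uses simulated counts rather than my real data, and base R with a plain log2 transform instead of DESeq2's own transformation and plotting functions; the sample names, the number of samples and the size of the simulated effect are all assumptions made for illustration.

```r
set.seed(1)

# Toy count matrix: 200 genes x 6 pooled samples (simulated, hypothetical names).
groups <- factor(rep(c("acute", "later"), each = 3))
counts <- matrix(rnbinom(200 * 6, mu = 100, size = 10), nrow = 200,
                 dimnames = list(paste0("gene", 1:200), paste0("tp", 1:6)))

# Simulate a condition effect: the first 100 genes are 4-fold higher in "later".
counts[1:100, groups == "later"] <- counts[1:100, groups == "later"] * 4

# PCA on log2-transformed counts, with samples as rows.
logCounts <- log2(counts + 1)
pca <- prcomp(t(logCounts))

# Percentage of total variance carried by each principal component.
pctVar <- 100 * pca$sdev^2 / sum(pca$sdev^2)
round(pctVar[1:2], 1)

# PC1 scores per group: non-overlapping ranges mean the label separates on PC1.
pc1ByGroup <- split(pca$x[, 1], groups)
pc1ByGroup
```

If the first component carries most of the variance and its scores separate cleanly by label, that matches what I believe I am seeing in my own plots.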
ddsHTSeq3 <- DESeqDataSetFromHTSeqCount(sampleTable = sampleTable3, directory = directory, design = ~ condition2)
Do you think I could rely on such results, given that subtle variations between the samples might be carried into their respective groups and therefore into the differential expression results? I don't think it would be possible to model any type of batch, because each sample is a single experiment and the design matrix would therefore never be full rank.
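The rank problem can be illustrated with a small base R sketch. The sample table below is hypothetical (six pooled samples, three per label, each treated as its own batch) and only stands in for my real one; `isFullRank` is a helper I define here, not a DESeq2 function.

```r
# Toy sample table: one pooled sample per time point (hypothetical labels).
sampleTable <- data.frame(
  sample     = paste0("tp", 1:6),
  condition2 = factor(c("acute", "acute", "acute", "later", "later", "later")),
  batch      = factor(paste0("b", 1:6))  # every sample is its own "batch"
)

# Design with the biological label only.
mmCondition <- model.matrix(~ condition2, data = sampleTable)

# Design that also tries to include a per-sample batch term.
mmBatch <- model.matrix(~ batch + condition2, data = sampleTable)

# A design matrix is full rank when its rank equals its number of columns.
isFullRank <- function(mm) qr(mm)$rank == ncol(mm)

isFullRank(mmCondition)  # 2 columns, 6 samples
isFullRank(mmBatch)      # 7 columns, 6 samples
```

With one batch level per sample, the second design has more coefficients than samples, so it cannot be full rank regardless of how the labels are assigned.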
Another possibility that I have considered would be to split the later group into two distinct groups.
It is a poor experimental design, so this is the only way I see to analyze the data. I am not inclined to use any algorithm to create pseudo-replicates, because I do not think I could draw any biological conclusions from those results.
I would really appreciate the community’s opinion before making a final decision. Please let me know if anything is unclear.
Thank you all very much!