
DESeq2 design and Batch effects


4.7 years ago

baldissera152 ▴ 10

Hi, guys.

I am analyzing RNA-Seq data from animals at different time points after a spinal cord trauma. The animals' tissues were pooled by time point, so I have neither biological nor technical replicates. To analyze this data, I gave my samples a new label representing "acute" or "later" conditions after the trauma. This grouping gives me several samples per group, and therefore an "n" greater than one.
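A minimal sketch of the relabeling described above. The sample names, file names and time points are hypothetical placeholders, and which time points count as "acute" is an assumption that should come from the original study, not from the data. The column order follows what `DESeqDataSetFromHTSeqCount()` expects: sample name first, HTSeq count file name second, metadata after.

```r
# Hypothetical sample table: names, files and time points are placeholders.
samples <- c("t1", "t2", "t3", "t4", "t5", "t6")
sampleTable3 <- data.frame(
  sampleName = samples,
  fileName   = paste0(samples, ".counts.txt"),
  timepoint  = c("6h", "12h", "1d", "7d", "14d", "28d")
)
# Assumed split of time points into the coarser grouping, fixed a priori
acute <- c("6h", "12h", "1d")
sampleTable3$condition2 <- factor(
  ifelse(sampleTable3$timepoint %in% acute, "acute", "later"),
  levels = c("acute", "later")
)
table(sampleTable3$condition2)  # 3 "acute", 3 "later"
```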

I made a sample-distance heatmap and a PCA. As I read the PCA, the samples cluster according to this label, which suggests that the biological condition explains a large part of the variation in the data.
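In DESeq2 this check is usually done on transformed counts with `vst()` (or `rlog()`) followed by `plotPCA()` and `dist()`. The base R sketch below mimics the idea on simulated counts only, with `log2(counts + 1)` as a crude stand-in for the variance-stabilizing transform, so it can run without Bioconductor. The data and the injected group effect are invented for illustration.

```r
set.seed(1)
# Toy counts: 500 genes x 6 pooled samples (simulated, not real data)
counts <- matrix(rpois(500 * 6, lambda = 50), nrow = 500,
                 dimnames = list(NULL, paste0("s", 1:6)))
condition2 <- factor(rep(c("acute", "later"), each = 3))
# Inject a 4-fold change in 50 genes for the "later" group
counts[1:50, condition2 == "later"] <- counts[1:50, condition2 == "later"] * 4
logc <- log2(counts + 1)          # crude stand-in for DESeq2::vst()
sampleDists <- dist(t(logc))      # sample-to-sample distances (heatmap input)
pca <- prcomp(t(logc))            # samples as rows
pc1 <- pca$x[, "PC1"]
split(pc1, condition2)            # PC1 scores per group
```

A clean separation on PC1 only shows that the label agrees with the dominant axis of variation; it does not rule out that sample-specific effects are carried along with each pooled sample.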

My question is: given that my biological variable explains a large part of the variation in the data, can I specify the DESeq2 design as a function of that variable only, without modeling any batch variable?

library(DESeq2)
ddsHTSeq3 <- DESeqDataSetFromHTSeqCount(sampleTable = sampleTable3,
                                       directory = directory,
                                       design = ~ condition2)

Could I rely on such results, given that subtle sample-specific variation would be carried into the respective groups and therefore into the differential expression results? I don't think I can model any kind of batch, because each sample is a separate experiment, so the design matrix would never be full rank.
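The full-rank problem can be checked directly with base R's `model.matrix()` and `qr()`. The layout below is hypothetical: six pooled samples, each from its own run, so the "batch" factor has one level per sample. Adding it to the design leaves more coefficients than the data can identify, and `condition2` becomes a linear combination of the batch columns.

```r
# Hypothetical layout: every pooled sample is its own batch/run
sampleInfo <- data.frame(
  batch      = factor(paste0("run", 1:6)),
  condition2 = factor(rep(c("acute", "later"), each = 3))
)
X_cond <- model.matrix(~ condition2, data = sampleInfo)
X_full <- model.matrix(~ batch + condition2, data = sampleInfo)
qr(X_cond)$rank   # 2 columns, rank 2: full rank
ncol(X_full)      # 7 columns (intercept + 5 batch + 1 condition)
qr(X_full)$rank   # rank 6 < 7: condition2 is confounded with batch
```

DESeq2 runs a similar check when it builds the model and stops if the model matrix is not full rank, so `~ batch + condition2` is not an option here.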

Another option I have considered is splitting the "later" group into two distinct groups.

It is a poor experimental design, so this is the only way I can see to analyze the data. I am not inclined to use an algorithm to create pseudo-replicates, because I don't think I could draw any biological conclusions from those results.

I would really appreciate the community's opinion before making a final decision. Please let me know if anything is unclear.

Thank you all very much!

Gabriel


Written 4.7 years ago by baldissera152 ▴ 10 • updated 3 months ago by Ram 44k


For me, the most important point is NOT to define sample groups from the expression data (PCA, etc.). The grouping should be fixed beforehand, according to the... well... design of the experiment. It should never be set a posteriori. So when you say:

Another possibility that I have considered would be to split the later group into two distinct groups.

I hope you are not defining those new groups based on the PCA/clustering.

PS: for your design, I strongly suggest following this workflow for time-course experiments. In your case you will end up without replicates, so you will have to be careful with the interpretation, but it is the best you can do with your dataset.
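Why "without replicates" matters can be seen by counting residual degrees of freedom. The time points below are hypothetical, one pooled sample each. A design on time alone has as many coefficients as samples, leaving nothing to estimate gene-wise dispersion from; as far as I know, recent DESeq2 versions stop with an error in that situation instead of treating samples as replicates. The coarser `condition2` grouping leaves some residual degrees of freedom, but only because it assumes samples within a group are interchangeable.

```r
# Hypothetical: one pooled sample per time point
tp <- c("6h", "12h", "1d", "7d", "14d", "28d")
timepoint  <- factor(tp, levels = tp)
condition2 <- factor(rep(c("acute", "later"), each = 3))
X_time <- model.matrix(~ timepoint)
X_cond <- model.matrix(~ condition2)
# Residual df = samples minus estimable coefficients
resid_df_time <- nrow(X_time) - qr(X_time)$rank  # 6 - 6 = 0
resid_df_cond <- nrow(X_cond) - qr(X_cond)$rank  # 6 - 2 = 4
c(time = resid_df_time, condition2 = resid_df_cond)
```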

4.7 years ago by Carlo Yague 8.7k


Thank you for your reply. You are right, it's very hard to analyze data when the design is so restricted. The groups I defined as "acute" and "later" were based on the original paper's findings; my PCA just confirmed a grouping tendency.

Regarding my idea of splitting a group based on the PCA results, I think you are right. The samples may be distinct because of technical variation, so I should not define another group based only on that result.

I will take a look at the time-course analysis documentation. Unfortunately, as you said yourself, I would end up without replicates, so I must be very careful about drawing any kind of biological conclusion.

4.7 years ago by baldissera152 ▴ 10