I am facing issues with the control and samples clustering even after batch correction - the samples contain multiple tissue types
I can tell you what you already know: this result is unexpected. Nobody can tell you whether the plot is "correct" or not - it depends on whether you processed the data properly. If yes, then the result is what it is. Either your two groups are not different in ways you expected, or something went wrong during the experimental stage.
Assuming that the PCA was technically done correctly (a bulletproof way is to use plotPCA from DESeq2 as a wrapper), the separation you see is simply not driven by treatment. You say the samples come from multiple tissues. PCA is usually run on a subset of genes selected by variance, and it is perfectly expected that differences between tissues are (much) stronger than differences due to treatment. Hence two things for starters:
Run removeBatchEffect() (using tissue as batch) and repeat the PCA.

In addition to these comments, I would suggest checking other PCs: the first two PCs only explain a combined 39% of the variance. Since you have multiple tissues, it could be that the clustering you're looking for is smaller than expected and shows up in one of the other PC combinations (e.g., PC1 + PC3).
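Both suggestions can be tried together in a few lines of R. The following is only a minimal sketch on simulated data, not the poster's actual pipeline: the toy matrix, the tissue and treatment labels, and the helper names `run_pca` and `group_r2` are all hypothetical, and it assumes limma is installed. It removes the tissue effect with `removeBatchEffect()` while passing treatment in the design so that it is preserved, then reports for each of the first five PCs how much variance it explains and how strongly it tracks treatment.

```r
library(limma)  # Bioconductor package that provides removeBatchEffect()

set.seed(1)

# Hypothetical toy design: 3 tissues x 2 treatments x 2 replicates = 12 samples
tissue    <- factor(rep(c("liver", "lung", "brain"), each = 4))
treatment <- factor(rep(c("control", "treated"), times = 6))

# Simulated log-scale expression: 200 genes (rows) x 12 samples (columns)
n_genes <- 200
expr <- matrix(rnorm(n_genes * 12, sd = 0.5), nrow = n_genes)

# Strong tissue effect on every gene, weaker treatment effect on 40 genes
tissue_shift <- matrix(rnorm(n_genes * nlevels(tissue), sd = 3), nrow = n_genes)
expr <- expr + tissue_shift[, as.integer(tissue)]
expr[1:40, treatment == "treated"] <- expr[1:40, treatment == "treated"] + 2

# PCA on samples; returns the scores and the fraction of variance per PC
run_pca <- function(x) {
  pca <- prcomp(t(x), center = TRUE, scale. = FALSE)
  list(scores = pca$x, var_explained = pca$sdev^2 / sum(pca$sdev^2))
}

# R^2 of each PC's scores regressed on a grouping factor
group_r2 <- function(scores, group, n_pcs = 5) {
  sapply(seq_len(n_pcs),
         function(k) summary(lm(scores[, k] ~ group))$r.squared)
}

pca_raw <- run_pca(expr)

# Remove the tissue effect; treatment goes in the design so it is kept
design <- model.matrix(~ treatment)
expr_corrected <- removeBatchEffect(expr, batch = tissue, design = design)
pca_corr <- run_pca(expr_corrected)

# Percent variance per PC and treatment R^2 per PC, before and after
print(round(100 * pca_raw$var_explained[1:5], 1))
print(round(group_r2(pca_raw$scores, treatment), 2))
print(round(100 * pca_corr$var_explained[1:5], 1))
print(round(group_r2(pca_corr$scores, treatment), 2))

# Look beyond PC1/PC2: colour by treatment, plotting symbol by tissue
plot(pca_raw$scores[, c(1, 3)], col = treatment, pch = as.integer(tissue),
     main = "Before correction: PC1 vs PC3")
```

With real data the same call would typically be applied to transformed values (for example `assay(vsd)` from a DESeq2 variance-stabilized object), and only for visualization; for the differential expression test itself, tissue is better kept as a term in the model design.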
It would also help to add information about what kind of data this is and what you did to the data to get to this point.