6.2 years ago
Gabiole7
Hello there,
My question: can I use a dataset for further analysis when its PCA plot looks like the one I obtained? I computed the plot after running DESeq2 with the following code:
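```r
library(DESeq2)
# "spikes" is a character vector of ERCC spike-in IDs; size factors come from the spike-ins only
dds <- estimateSizeFactors(dds, controlGenes = which(rownames(dds) %in% spikes))
dds <- estimateDispersions(dds)
dds <- nbinomWaldTest(dds)
# Regularized-log transform for visualization (blind = TRUE by default)
rld <- rlog(dds)
# PCA on the 500 most variable genes, colored by "condition" (the default intgroup)
plotPCA(rld, ntop = 500)
```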
As you can see, one control sample clusters with the treated samples at the bottom right, and the two remaining controls do not cluster together. Should I try different values of ntop?
Please don't hesitate to ask if something is missing; I am still a newbie!
Thx
In my view you have too few samples to judge which are outliers and which are okay.
I agree with my colleague Wouter!
From what I have seen, 3 replicates per condition is fairly standard. Of course I would have preferred more, especially given this kind of PCA plot. Since I cannot change the number of replicates I have, what analysis would you advise to test them for homogeneity? Thx
How do they look on box-and-whisker and violin plots?
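A minimal base-R sketch of the box-plot check. It assumes a genes × samples count matrix; here `counts` is simulated with hypothetical sample names, so substitute your own matrix (for example `counts(dds, normalized = TRUE)`). Violin plots need an extra package such as ggplot2 and are not shown.

```r
set.seed(1)
# Hypothetical data: 1000 genes x 6 samples of negative-binomial counts
counts <- matrix(rnbinom(1000 * 6, mu = 100, size = 5), nrow = 1000,
                 dimnames = list(NULL, c(paste0("ctrl", 1:3), paste0("trt", 1:3))))
# log2(x + 1) keeps zero counts finite and compresses the dynamic range
logc <- log2(counts + 1)
# One box per sample: a shifted median or an unusually wide box flags a suspicious sample
boxplot(logc, las = 2, ylab = "log2(count + 1)", main = "Per-sample distributions")
# Numeric summary of the same information
sample_medians <- apply(logc, 2, median)
print(round(sample_medians, 2))
```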
How do the data distributions appear when you compare them pairwise in scatter plots?
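Pairwise scatter plots can be drawn in base R with `pairs()`. The sketch below again uses a simulated, hypothetical matrix in place of your data, and adds Spearman correlations as a compact numeric summary of the same comparisons (the correlation table is an addition, not something suggested above).

```r
set.seed(2)
# Hypothetical data: 1000 genes x 6 samples
counts <- matrix(rnbinom(1000 * 6, mu = 100, size = 5), nrow = 1000,
                 dimnames = list(NULL, c(paste0("ctrl", 1:3), paste0("trt", 1:3))))
logc <- log2(counts + 1)
# Scatter plot matrix: every sample plotted against every other sample
pairs(logc, pch = ".", main = "Pairwise log2 counts")
# Pairwise Spearman correlations; a sample that correlates poorly with its group stands out
cors <- cor(logc, method = "spearman")
print(round(cors, 3))
```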
PCA will always 'stretch' samples to the extremities of the plotting window, but it's not always evidence of an outlier. That said, the variance explained by your PC1 is fairly high.
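The "variance explained" on a PCA axis is that component's squared standard deviation divided by the sum over all components. A base-R sketch with `prcomp`, assuming hypothetical log-scale data with a built-in treatment effect:

```r
set.seed(3)
# Hypothetical log-scale expression: 500 genes x 6 samples (genes in rows)
x <- matrix(rnorm(500 * 6, mean = 8), nrow = 500,
            dimnames = list(NULL, c(paste0("ctrl", 1:3), paste0("trt", 1:3))))
# Simulated treatment effect: shift the first 50 genes in the treated samples
x[1:50, 4:6] <- x[1:50, 4:6] + 3
# prcomp expects samples in rows, so transpose; genes are centered but not scaled
pca <- prcomp(t(x), center = TRUE, scale. = FALSE)
# Fraction of total variance captured by each principal component
percent_var <- pca$sdev^2 / sum(pca$sdev^2)
print(round(100 * percent_var, 1))
```

With only 6 samples there are at most 5 informative components, so even pure noise gives every component a visible share of the variance.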
Can you also try my PCA code to see how they look with that? (A: PCA plot from read count matrix from RNA-Seq.) Be aware that DESeq2's plotPCA function keeps only the ntop most variable genes (500 by default) before performing PCA and discards the many low-variance genes, so the differences between samples can appear larger than they are.
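A sketch of the effect described above, using a hypothetical helper `pca_top()` that approximates plotPCA's gene-selection step (keep the ntop rows with the highest variance, then run `prcomp`). It is not DESeq2's own code, and the data are simulated.

```r
set.seed(4)
# Hypothetical log-scale data: 5000 genes x 6 samples, 100 genes shifted in treated samples
x <- matrix(rnorm(5000 * 6, mean = 8), nrow = 5000,
            dimnames = list(NULL, c(paste0("ctrl", 1:3), paste0("trt", 1:3))))
x[1:100, 4:6] <- x[1:100, 4:6] + 4

# Hypothetical helper: keep the ntop most variable genes, return fraction of variance per PC
pca_top <- function(mat, ntop = 500) {
  rv <- apply(mat, 1, var)
  keep <- order(rv, decreasing = TRUE)[seq_len(min(ntop, length(rv)))]
  pca <- prcomp(t(mat[keep, , drop = FALSE]))
  pca$sdev^2 / sum(pca$sdev^2)
}

# Compare PC1's share of variance for different gene subsets
pc1 <- sapply(c(500, 2000, 5000), function(n) pca_top(x, n)[1])
print(round(100 * pc1, 1))
```

With a real rlog object, `plotPCA(rld, ntop = nrow(rld))` uses all genes and can be compared with the default of 500.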
I agree that 3 samples per condition is common, but it is insufficient for identifying outliers. There is no way to tell which of the 'control' samples does not belong, if any, because all of them are spread out. If you had 10 samples and one lay far away from the rest, then you could make a judgement.