Question: RNAseq normalized counts PCA
14 months ago, Gabiole7 wrote:

Hello there,

My question: can I use a dataset for further analysis if it gives the following kind of PCA plot? The plot was computed after running DESeq2 with the following code:

# here "spikes" refers to the ERCC spike-ins
dds <- estimateSizeFactors(dds, controlGenes=match(spikes, rownames(dds)))
dds <- estimateDispersions(dds)
dds <- nbinomWaldTest(dds)
rld <- rlog(dds)
plotPCA(rld, ntop = 500)

[Image: PCA plot]

As you can see, one control sample groups with the treated samples at the bottom right, and the two remaining controls do not group together either. Should I adjust ntop?

Please don't hesitate to ask if something is missing, I am still a newbie!

Thx

modified 14 months ago • written 14 months ago by Gabiole7

I consider your number of samples too few to make a judgement about which are outliers and which are okay.

written 14 months ago by WouterDeCoster

I agree with my colleague Wouter!

written 14 months ago by Kevin Blighe

From what I have seen, three replicates per condition is pretty much the standard. Of course I would have preferred more, especially after getting this kind of PCA plot. Since I cannot change the number of replicates I have, what analysis would you advise to test them for homogeneity? Thx

modified 14 months ago • written 14 months ago by Gabiole7

How do they look on box-and-whisker and violin plots?

How do the data distributions appear when you compare them pairwise in scatter plots?
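For anyone following along, the two checks above could be sketched roughly like this in base R. This is only an illustration under stated assumptions: the matrix `mat` is simulated and stands in for a log-scale expression matrix such as `assay(rld)`, and the sample names and sizes are made up. Violin plots need an extra package, so they are left out here.

```r
# Simulated stand-in for a log-scale expression matrix (genes x samples).
# With real data this would be something like assay(rld); the values,
# sample names and sizes below are made up for illustration only.
set.seed(1)
n_genes <- 1000
samples <- c("ctrl1", "ctrl2", "ctrl3", "trt1", "trt2", "trt3")
base <- rnorm(n_genes, mean = 8, sd = 2)  # shared per-gene expression level
mat <- sapply(samples, function(s) base + rnorm(n_genes, sd = 0.5))
rownames(mat) <- paste0("gene", seq_len(n_genes))

# Box-and-whisker plot: one box per sample, to compare overall distributions
boxplot(mat, las = 2, ylab = "log-scale expression")

# Pairwise scatter plots of every sample against every other sample
pairs(mat, pch = ".", lower.panel = NULL)

# Sample-to-sample Pearson correlations as a numeric summary
sample_cor <- cor(mat)
print(round(sample_cor, 3))
```

A sample whose box sits clearly apart from the others, or whose scatter plots against all other samples are visibly more diffuse, would be worth a closer look, although with three replicates per group this remains a qualitative check.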

PCA will always 'stretch' samples to the extremities of the plotting window, but it's not always evidence of an outlier. That said, the variance explained by your PC1 is fairly high.

Can you also try my PCA code to see how they look with that? See: "A: PCA plot from read count matrix from RNA-Seq". Be aware that DESeq2's plotPCA function automatically removes a large chunk of low-variance genes before performing PCA (by default it keeps only the 500 most variable genes, which is what ntop controls), so the differences between samples can appear larger than they are.
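To illustrate the effect of that variance filter, here is a minimal base R sketch (not the code from the linked answer). It assumes a simulated matrix `mat` standing in for `assay(rld)`, and `pca_top` is a hypothetical helper written only for this example. It runs PCA once on all genes and once on the 500 most variable genes so the two results can be compared.

```r
# Simulated log-scale matrix (genes x samples); a made-up stand-in for
# something like assay(rld). Sample names are illustrative only.
set.seed(1)
n_genes <- 2000
samples <- c("ctrl1", "ctrl2", "ctrl3", "trt1", "trt2", "trt3")
mat <- matrix(rnorm(n_genes * length(samples), mean = 8),
              nrow = n_genes, dimnames = list(NULL, samples))

# Hypothetical helper: PCA on the `ntop` most variable genes
# (or on all genes when ntop is NULL).
pca_top <- function(mat, ntop = NULL) {
  if (!is.null(ntop)) {
    gene_var <- apply(mat, 1, var)  # per-gene variance across samples
    keep <- order(gene_var, decreasing = TRUE)[seq_len(min(ntop, nrow(mat)))]
    mat <- mat[keep, , drop = FALSE]
  }
  pca <- prcomp(t(mat))  # samples as rows; centered, not scaled
  list(scores  = pca$x,
       pct_var = 100 * pca$sdev^2 / sum(pca$sdev^2),
       n_genes = nrow(mat))
}

pca_all <- pca_top(mat)              # no filtering
pca_500 <- pca_top(mat, ntop = 500)  # keep the 500 most variable genes

# Percent variance explained by PC1 and PC2 under each setting
print(round(pca_all$pct_var[1:2], 1))
print(round(pca_500$pct_var[1:2], 1))

# PC1 vs PC2 for the filtered run, coloured by (assumed) group
grp_col <- ifelse(grepl("^ctrl", samples), "blue", "red")
plot(pca_500$scores[, 1], pca_500$scores[, 2], col = grp_col, pch = 19,
     xlab = "PC1", ylab = "PC2")
text(pca_500$scores[, 1], pca_500$scores[, 2], labels = samples, pos = 3)
```

If the grouping of the samples changes a lot between the filtered and unfiltered runs, the separation seen in the plot probably depends heavily on the gene selection rather than on a robust difference between samples.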

written 14 months ago by Kevin Blighe

I agree that 3 samples per condition is indeed often used, but it's insufficient for dealing with outliers. There is no way to tell which of the 'control' samples does not belong, if any, because all are spread out. If you had 10 samples and one were far away, then you could make a judgement.

written 14 months ago by WouterDeCoster