Question

RNAseq normalized counts PCA

6.2 years ago

Gabiole7 • 0

Hello there,

My question: can I use a dataset for further analysis if it gives the following type of PCA plot? The plot was computed after running DESeq2 with the following code:

# here "spikes" refers to the ERCC spike-in IDs
dds <- estimateSizeFactors(dds, controlGenes=match(spikes, rownames(dds)))
dds <- estimateDispersions(dds)
dds <- nbinomWaldTest(dds)
rld <- rlog(dds)
plotPCA(rld, ntop = 500)

[Image: PCA plot of the rlog-transformed samples]

As you can see, one control sample groups with the treated samples at the bottom right, and the two remaining controls do not group with each other. Should I play with ntop?

Please don't hesitate to ask if something is missing, I am still a newbie!

Thx

RNA-Seq PCA DESeq2 ERCC spike-in • 3.0k views

I consider your number of samples too few to make a judgement about which are outliers and which are okay.

6.2 years ago by WouterDeCoster 47k

I agree with my colleague Wouter!

6.2 years ago by Kevin Blighe 87k

From what I have seen, 3 replicates per condition is fairly standard. Of course I would have preferred more, especially after getting this kind of PCA plot. Since I cannot change the number of replicates I have, what analysis would you advise to test them for homogeneity? Thx

6.2 years ago by Gabiole7 • 0

How do they look on box-and-whisker and violin plots?

How do the data distributions appear when you compare them pairwise in scatter plots?
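A minimal sketch of those two checks in base R, in case it helps. The matrix here is simulated and all names (mat, ctrl1, trt1, ...) are placeholders; with DESeq2 you would presumably substitute your own log-scale matrix, e.g. assay(rld).

```r
# Simulated stand-in for a log-scale expression matrix (genes x samples).
# Assumption: in a real analysis this would be your own data, e.g. assay(rld).
set.seed(1)
n_genes <- 2000
gene_mean <- rnorm(n_genes, mean = 8, sd = 2)                 # shared per-gene level
mat <- gene_mean + matrix(rnorm(n_genes * 6, sd = 0.5), nrow = n_genes)
dimnames(mat) <- list(paste0("gene", seq_len(n_genes)),
                      c("ctrl1", "ctrl2", "ctrl3", "trt1", "trt2", "trt3"))

# Box-and-whisker plot per sample: medians and spreads should look broadly similar.
boxplot(mat, las = 2, ylab = "log-scale expression")

# Pairwise scatter plots between samples (random subset of genes to keep it light).
pairs(mat[sample(n_genes, 500), ], pch = ".")

# Pairwise Spearman correlations as a numeric companion to the scatter plots.
sample_cor <- cor(mat, method = "spearman")
print(round(sample_cor, 2))
```

A sample whose box sits clearly apart from the others, or whose correlations with every other sample are noticeably lower, would be worth a closer look, although with 3 replicates per group this remains a judgement call.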

PCA will always 'stretch' samples to the extremities of the plotting window, so a sample sitting far from the others is not always evidence of an outlier. That said, the variance explained by your PC1 is fairly high.

Can you also try my PCA code to see how they look with that? See my answer: "PCA plot from read count matrix from RNA-Seq". Be aware that DESeq2's plotPCA function keeps only the ntop most variable genes (500 in your call) before performing PCA, so the differences between samples can appear larger than they are.
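To illustrate the effect of that variance filter, here is a rough sketch in base R on simulated data. It is not the code from the linked answer; pca_top is a hypothetical helper that roughly mirrors the filter-then-prcomp approach, and the simulated matrix stands in for something like assay(rld).

```r
# Simulated log-scale matrix: 3 controls, 3 treated, with a treatment effect
# on the first 100 genes only. All names and numbers here are made up.
set.seed(42)
n_genes <- 2000
group <- rep(c("control", "treated"), each = 3)
gene_mean <- rnorm(n_genes, mean = 8, sd = 2)
mat <- gene_mean + matrix(rnorm(n_genes * 6, sd = 0.3), nrow = n_genes)
colnames(mat) <- paste0(group, 1:3)
mat[1:100, group == "treated"] <- mat[1:100, group == "treated"] + 2

# Hypothetical helper: PCA on the ntop most variable genes (all genes by default).
pca_top <- function(m, ntop = nrow(m)) {
  rv <- apply(m, 1, var)                                   # per-gene variance
  keep <- order(rv, decreasing = TRUE)[seq_len(min(ntop, nrow(m)))]
  pca <- prcomp(t(m[keep, , drop = FALSE]), center = TRUE, scale. = FALSE)
  list(pca = pca, pct = 100 * pca$sdev^2 / sum(pca$sdev^2))
}

res_top <- pca_top(mat, ntop = 500)   # filtered, as with ntop = 500
res_all <- pca_top(mat)               # no filtering

# Percent variance explained by PC1 and PC2, with and without the filter.
print(round(res_top$pct[1:2], 1))
print(round(res_all$pct[1:2], 1))

# PC1 vs PC2 using all genes, coloured by group.
plot(res_all$pca$x[, 1:2], pch = 19, col = ifelse(group == "control", "blue", "red"))
text(res_all$pca$x[, 1:2], labels = colnames(mat), pos = 3)
```

Comparing the two sets of percentages, and the plots for a few values of ntop, gives a feel for how much of the apparent separation depends on the gene filter rather than on the samples themselves.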

6.2 years ago by Kevin Blighe 87k

I agree that 3 samples per condition is often used, but it is insufficient when outliers are suspected. There is no way to tell which of the 'control' samples does not belong, if any, because all of them are spread out. If you had 10 samples and one were far away from the rest, then you could make a judgement.

6.2 years ago by WouterDeCoster 47k