Question: PCA plot, fewer points than samples
eridanus wrote:

Hello. I am working with RNA-seq data and am trying to do a differential expression analysis. I am trying to make a PCA plot with DESeq2. However, although I have 12 samples, the PCA plot appears to have only 11 dots. When I make an MDS plot with limma-voom I see the same thing (11 dots instead of 12). I have checked the dds object (and the rlog-transformed object) and both contain 12 samples. What could be the cause? Thank you!

library(DESeq2)
library(ggplot2)

dds <- DESeqDataSetFromMatrix(countData = countData, colData = colData, design = ~ group)
# Keep genes with at least 10 reads in total across all samples
keep <- rowSums(counts(dds)) >= 10
dds <- dds[keep, ]
rld <- rlog(dds)
# Build the PCA plot once, then label each point with its sample name
plot_pca <- plotPCA(rld, intgroup = "group")
plot_pca <- plot_pca + geom_text(aes(label = name), color = "black")
print(plot_pca)
Tags: rna-seq
modified 21 days ago • written 21 days ago by eridanus

[Image: PCA plot]

This is my PCA plot. I checked, and all the samples are in fact included. I wonder whether I should remove some samples from the differential expression analysis. What do you think?

modified 21 days ago • written 21 days ago by eridanus
ATpoint wrote:

Use returnData=TRUE for plotPCA and see whether there are 12 or 11 samples in the returned data.frame. If there are 12, then two points are overlapping perfectly.

modified 21 days ago • written 21 days ago by ATpoint
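
A minimal sketch of this check in base R. The helper find_overlaps and the toy data frame are hypothetical, made up for illustration. With real data, the data frame would come from plotPCA(rld, intgroup = "group", returnData = TRUE), which returns PC1, PC2 and name columns.

```r
# Hypothetical helper: list pairs of samples whose PC1/PC2 coordinates
# are closer than `tol`, i.e. points that would be drawn on top of each other.
find_overlaps <- function(pca_df, tol = 1e-8) {
  d <- as.matrix(dist(pca_df[, c("PC1", "PC2")]))
  rownames(d) <- colnames(d) <- pca_df$name
  # upper.tri() keeps each pair once and skips the diagonal
  idx <- which(d < tol & upper.tri(d), arr.ind = TRUE)
  data.frame(sample_a = rownames(d)[idx[, "row"]],
             sample_b = colnames(d)[idx[, "col"]],
             stringsAsFactors = FALSE)
}

# Toy data standing in for the real call (assumption):
#   pca_df <- plotPCA(rld, intgroup = "group", returnData = TRUE)
pca_df <- data.frame(
  PC1  = c(-3.1, 2.4, 2.4, 0.7),
  PC2  = c(1.0, -0.5, -0.5, 2.2),
  name = c("s1", "s2", "s3", "s4")
)

print(nrow(pca_df))            # number of samples that went into the plot
overlaps <- find_overlaps(pca_df)
print(overlaps)                # pairs of samples sharing the same position
```

If the row count matches the number of samples and a pair shows up here, the missing dot is just hidden behind another one.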

Thank you. I tried it and there are 12. I had thought about overlapping, but when I added the labels I expected any overlap to be obvious from them, and that was not the case. What can I do about the overlapping? Thanks a lot!

written 21 days ago by eridanus

You can try increasing the number of genes used for the PCA (the ntop argument of plotPCA), for example to 1000, and see whether the points separate better. See ?plotPCA for details.

written 21 days ago by ATpoint
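
A rough sketch of what changing the number of genes does, written in base R so it runs without DESeq2. The helper pca_top_genes and the simulated matrix are hypothetical stand-ins. The idea is to keep the most variable genes and run the PCA on those, which is the same idea as plotPCA's ntop argument.

```r
# Hypothetical helper mirroring the idea behind plotPCA's `ntop` argument:
# keep the `ntop` most variable genes (rows), then run PCA on the samples.
pca_top_genes <- function(mat, ntop = 500) {
  rv <- apply(mat, 1, var)
  select <- order(rv, decreasing = TRUE)[seq_len(min(ntop, length(rv)))]
  prcomp(t(mat[select, , drop = FALSE]))
}

set.seed(1)
# Simulated matrix, 2000 genes x 12 samples (stand-in for assay(rld))
mat <- matrix(rnorm(2000 * 12), nrow = 2000,
              dimnames = list(paste0("gene", 1:2000), paste0("s", 1:12)))

pca500  <- pca_top_genes(mat, ntop = 500)
pca1000 <- pca_top_genes(mat, ntop = 1000)
print(head(pca1000$x[, 1:2]))  # sample coordinates on PC1 and PC2

# With DESeq2 itself the corresponding call would be:
#   plotPCA(rld, intgroup = "group", ntop = 1000)
```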

Or, I'd double-check that your 12 samples really are 12 samples, and that you don't have a goof where the same fastq was analyzed under two different names.

written 21 days ago by swbarnes2

Good point. For this, I guess a simple cor() would do: you should get a Pearson correlation of 1 in that case.

written 21 days ago by ATpoint
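
A small sketch of the cor() check on simulated counts. The matrix counts_mat and the planted duplicate are assumptions for illustration. With real data one could pass counts(dds) or assay(rld) instead.

```r
set.seed(42)
# Simulated count matrix, 500 genes x 4 samples (stand-in for counts(dds))
counts_mat <- matrix(rpois(500 * 4, lambda = 50), nrow = 500,
                     dimnames = list(NULL, c("s1", "s2", "s3", "s4")))
# Plant a duplicate: s4 is the same data as s2 under another name
counts_mat[, "s4"] <- counts_mat[, "s2"]

# Pairwise Pearson correlation between samples (columns)
cors <- cor(counts_mat)

# Report each off-diagonal pair once if its correlation is (numerically) 1
dup <- which(cors > 0.9999 & upper.tri(cors), arr.ind = TRUE)
dup_pairs <- data.frame(sample_a = rownames(cors)[dup[, "row"]],
                        sample_b = colnames(cors)[dup[, "col"]],
                        stringsAsFactors = FALSE)
print(dup_pairs)
```

Real replicates are usually highly correlated but not at exactly 1, so a pair reported here is worth tracing back to the input files.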

[Image: PCA plot] This is my PCA plot. The clustering is not that good, but in the differential expression analysis I can see differences in expression. Should I remove some samples? What do you think? Thank you!

modified 21 days ago by ATpoint • written 21 days ago by eridanus

You can't remove samples just because they don't cluster how you like. It might be nice if you could figure out what's driving that first principal component.

written 21 days ago by swbarnes2
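
One possible way to look into what drives the first principal component, sketched in base R on simulated data. The matrix, the hidden batch factor and all names here are assumptions for illustration, not taken from the thread. The sketch ranks genes by their PC1 loading and compares PC1 scores against a known sample covariate.

```r
set.seed(7)
n_genes <- 2000
# Simulated expression matrix, genes x 12 samples (stand-in for assay(rld))
mat <- matrix(rnorm(n_genes * 12), nrow = n_genes,
              dimnames = list(paste0("gene", seq_len(n_genes)),
                              paste0("s", 1:12)))
# Assumed hidden factor: samples s7..s12 have genes 1..50 shifted upwards
batch <- rep(c("A", "B"), each = 6)
mat[1:50, batch == "B"] <- mat[1:50, batch == "B"] + 5

# PCA on samples (rows of the transposed matrix)
pca <- prcomp(t(mat))

# Genes with the largest absolute loading contribute most to PC1
loadings <- pca$rotation[, "PC1"]
top_genes <- names(sort(abs(loadings), decreasing = TRUE))[1:10]
print(top_genes)

# Mean PC1 score per level of a known covariate: a large gap suggests
# that this covariate lines up with PC1
pc1_by_batch <- tapply(pca$x[, "PC1"], batch, mean)
print(pc1_by_batch)
```

With real data, the covariate would be any column of colData (batch, sequencing run, extraction date and so on), and the top-loading genes can hint at the biology or artifact behind the axis.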

Yes, you are right. Thank you for your response!

written 20 days ago by eridanus