How to output MDS plot for RNAseq samples with ggplot2?
6 days ago
Riku ▴ 40

Dear all.

I am trying to output an MDS plot to understand the relationships among my RNAseq samples.

I was able to output the MDS plot using "diffExpr.P0.001_C2.matrix" output by edgeR. However, I would like to use ggplot2 to output it in a more readable form.

I have tried the following steps, but I am getting an error. What is the problem with my pipeline?

Thank you very much for your advice!

> d <- dist(1 - rho)
> d
           Control1   Control2       Dry1       Dry2    DryRec1    DryRec2    PreRec1    PreRec2    Predry1
Control2 0.06502264                                                                                        
Dry1     1.48474438 1.52669729                                                                             
Dry2     1.62197906 1.66162549 0.22807033                                                                  
DryRec1  1.63688521 1.67153910 0.45567746 0.34034233                                                       
DryRec2  1.54150125 1.57960816 0.30996896 0.28957991 0.23296231                                            
PreRec1  1.60822608 1.64742160 0.24699751 0.15173269 0.34458096 0.28891000                                 
PreRec2  1.63382533 1.67045446 0.35367499 0.20514152 0.28429467 0.32059748 0.18160279                      
Predry1  1.54096592 1.58217537 0.15757815 0.19393518 0.42187210 0.29722224 0.18873484 0.30663769           
Predry2  1.63482837 1.67433055 0.25604187 0.11518698 0.33440870 0.29088779 0.13758830 0.19564874 0.18327134
> mds <- cmdscale(d)
> plot(mds, type = "n")
> text(mds, labels = colnames(count))
> mds2 <-
> ggplot(mds2, aes(x = `1`, y = `2`, color = dex, shape = cell))
Error in FUN(X[[i]], ...) : object '1' not found

> dput(mds2)
structure(list(V1 = c(-1.25870472974072, -1.29753868680971, 0.214130257699255, 
0.359880006140221, 0.354475784656956, 0.269322460991461, 0.34548218958249, 
0.366016370053708, 0.273662948738074, 0.373273398688271), V2 = c(-0.00570676794533206, 
0.0186976427031538, -0.153998283521261, -0.060516886861918, 0.266222331907919, 
0.126701199467327, -0.0536343596445791, 0.0529305521126579, -0.13730725971642, 
-0.0533881685015465)), class = "data.frame", row.names = c("Control1", 
"Control2", "Dry1", "Dry2", "DryRec1", "DryRec2", "PreRec1", 
"PreRec2", "Predry1", "Predry2"))
6 days ago
ATpoint 53k

Here is a scatterplot in its most basic form with some label annotation:

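library(ggplot2)
library(ggrepel)  # provides geom_label_repel()

# Copy the sample names from the row names into a column so aes() can use them
df <- data.frame(Sample=rownames(mds2), mds2)
ggplot(df, aes(x=V1, y=V2, label=Sample)) +
  geom_point() +
  geom_label_repel(min.segment.length = 0)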

I encourage you to really learn the basics of ggplot2; it is super helpful and, IMHO, an essential skill for anyone who works in R.
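
As a slightly fuller sketch of the same idea, the snippet below rebuilds mds2 from the (rounded) dput output in the question, checks its column names, and colors the points by condition. Note that the Condition column is an assumption made for illustration: it is derived by stripping the trailing replicate number from each sample name, and geom_text() from ggplot2 is swapped in for geom_label_repel() only so that the example needs no package besides ggplot2.

```r
library(ggplot2)

# Coordinates copied (rounded to 3 decimals) from the dput(mds2) output in the question.
mds2 <- data.frame(
  V1 = c(-1.259, -1.298, 0.214, 0.360, 0.354, 0.269, 0.345, 0.366, 0.274, 0.373),
  V2 = c(-0.006, 0.019, -0.154, -0.061, 0.266, 0.127, -0.054, 0.053, -0.137, -0.053),
  row.names = c("Control1", "Control2", "Dry1", "Dry2", "DryRec1", "DryRec2",
                "PreRec1", "PreRec2", "Predry1", "Predry2")
)

# The columns are called V1 and V2, so aes() must refer to those names
# rather than `1` and `2`.
names(mds2)

# Copy the sample names from the row names into a regular column.
df <- data.frame(Sample = rownames(mds2), mds2)

# Assumption: the condition is the sample name minus its trailing replicate number.
df$Condition <- sub("[0-9]+$", "", df$Sample)

p <- ggplot(df, aes(x = V1, y = V2, color = Condition, label = Sample)) +
  geom_point(size = 3) +
  geom_text(vjust = -1, show.legend = FALSE) +
  labs(x = "MDS dimension 1", y = "MDS dimension 2") +
  theme_bw()
print(p)
```

Columns such as dex and cell from the original attempt would have to be added to df in the same way before they could be mapped to color or shape.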

It's a beautiful graph! Thank you very much! I had to specify the column names "V1" and "V2".

I would like to ask one question: what does the following part mean? In particular, I don't understand "Sample=rownames(mds2)". Could you please explain?

> df <- data.frame(Sample=rownames(mds2), mds2)

If you want to plot the sample names in ggplot, the names have to be part of the data.frame that you pass to the ggplot function. In your mds2 they are the row names, so the command above copies the row names into a column of the data.frame called Sample, which you can then specify in the ggplot aes() to be used as labels. I know ggplot is a bit strange if you have no experience with it, but if you spend some quality time with the many tutorials online you will soon be fluent in it.
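
To see the effect in isolation, here is a minimal sketch on a toy data frame. The object m and its values are hypothetical and only stand in for mds2.

```r
# Toy data frame (hypothetical values) whose sample names exist only as row names.
m <- data.frame(V1 = c(0.1, 0.2), V2 = c(0.3, 0.4),
                row.names = c("SampleA", "SampleB"))

colnames(m)   # only V1 and V2: the sample names are not a column yet

# Put the row names into a new first column called Sample.
df <- data.frame(Sample = rownames(m), m)

colnames(df)  # Sample, V1, V2: aes(label = Sample) can now find the names
```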

Now, I got it! You're right, I haven't used ggplot2 much yet. It seems I need to learn more about this.

Your advice was very accurate and very helpful. Sincerely, thank you very much for your perfect explanation!

You're very welcome :)


