Question: Help explaining MDS plot of RNAseq?
Dayna wrote (17 months ago):

Hello everybody

Can anyone help me understand the MDS plot for two groups of samples, such as control vs. treated, that we generate from edgeR or DESeq2? I understand that the figure separates the control samples from the treated samples very well, but if the y-axis is the logFC, I don't get what the x-axis is. What is the x-axis in such a 2D figure?

Thanks

rna-seq mds plot
Kevin Blighe wrote (17 months ago):

Hello, the y-axis in an MDS plot does not represent logFC, and neither does the x-axis.

Provided you generated the plot according to the Bioconductor workflow 'RNA-seq workflow: gene-level exploratory analysis and differential expression', your x- and y-axes represent Euclidean distances between your samples. MDS places each sample in a set of dimensions so that the distances between points approximate those Euclidean distances; the standard way of presenting the result is to plot only the first 2 dimensions, with the x-axis being Dimension 1 and the y-axis being Dimension 2. The dimensions are ordered by how well they fit your samples (for further reading, take a look at the Details section of the cmdscale documentation).
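
To make the idea of "dimensions" concrete, here is a minimal sketch of classical MDS (the method that R's cmdscale implements) written in plain Python. The expression values and sample names are invented for illustration, and the power-iteration eigen-solver is a simplification chosen to avoid dependencies, not what any particular package does internally.

```python
import math

# Hypothetical log2 expression values: rows are samples, columns are genes.
names = ["control_1", "control_2", "treated_1", "treated_2"]
samples = [
    [5.0, 8.0, 3.0],
    [5.2, 7.9, 3.1],
    [7.0, 6.0, 3.0],
    [7.1, 6.2, 2.9],
]

def squared_distances(rows):
    """Squared Euclidean distance between every pair of samples."""
    return [[sum((a - b) ** 2 for a, b in zip(r1, r2)) for r2 in rows]
            for r1 in rows]

def double_center(d2):
    """Convert squared distances to an inner-product matrix B = -1/2 * J * D2 * J."""
    n = len(d2)
    row_mean = [sum(row) / n for row in d2]
    grand_mean = sum(row_mean) / n
    return [[-0.5 * (d2[i][j] - row_mean[i] - row_mean[j] + grand_mean)
             for j in range(n)] for i in range(n)]

def top_eigenpair(b, iterations=500):
    """Largest eigenvalue and its unit eigenvector, via power iteration."""
    n = len(b)
    v = [1.0 + i for i in range(n)]  # arbitrary non-constant starting vector
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iterations):
        w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    eigval = sum(v[i] * b[i][j] * v[j] for i in range(n) for j in range(n))
    return eigval, v

def classical_mds(rows, k=2):
    """Coordinates of each sample on the first k MDS dimensions."""
    b = double_center(squared_distances(rows))
    n = len(b)
    coords = [[0.0] * k for _ in range(n)]
    for dim in range(k):
        eigval, v = top_eigenpair(b)
        scale = math.sqrt(max(eigval, 0.0))
        for i in range(n):
            coords[i][dim] = v[i] * scale
        # Deflate: remove the component just extracted before finding the next one.
        b = [[b[i][j] - eigval * v[i] * v[j] for j in range(n)] for i in range(n)]
    return coords

coords = classical_mds(samples)
for name, (dim1, dim2) in zip(names, coords):
    print(f"{name}: dim1={dim1:+.3f} dim2={dim2:+.3f}")
```

With data like these, Dimension 1 should carry most of the control vs. treated separation. The sign of each dimension is arbitrary, so a mirrored plot is equally valid.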

MDS is commonly used in genetic studies to find relationships between samples based on genotype, but, as you can see, it's also used in RNA-seq. For information on a related method, i.e., principal components analysis, please see my threads here:


For what it's worth:

  • A volcano plot represents logFC (x-axis) and negative log10 P-value or adjusted P-value (y-axis)
  • An MA plot represents average expression (x-axis) and logFC (y-axis)
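
As a rough illustration of where the points on these two plots come from, the sketch below computes both sets of coordinates for a few genes. The gene names, expression values, and p-values are all made up, and the logFC here is a plain difference of group means on the log2 scale; edgeR and DESeq2 estimate logFC, average expression, and p-values with their own statistical models.

```python
import math

# Hypothetical per-gene inputs: log2 expression per group and an invented p-value.
genes = {
    "geneA": {"control": [5.0, 5.2], "treated": [7.0, 7.1], "p_value": 0.001},
    "geneB": {"control": [8.0, 7.9], "treated": [6.0, 6.2], "p_value": 0.01},
    "geneC": {"control": [3.0, 3.1], "treated": [3.0, 2.9], "p_value": 0.5},
}

def mean(values):
    return sum(values) / len(values)

def plot_coordinates(gene):
    """Return the (x, y) position of one gene on a volcano plot and an MA plot."""
    control_mean = mean(gene["control"])
    treated_mean = mean(gene["treated"])
    log_fc = treated_mean - control_mean         # log2 fold change, treated vs. control
    average = (treated_mean + control_mean) / 2  # average log2 expression
    return {
        "volcano": (log_fc, -math.log10(gene["p_value"])),
        "ma": (average, log_fc),
    }

points = {name: plot_coordinates(gene) for name, gene in genes.items()}
for name, xy in points.items():
    print(name, "volcano:", xy["volcano"], "MA:", xy["ma"])
```

Note that logFC is the x-axis of the volcano plot and the y-axis of the MA plot, whereas neither axis of an MDS plot is computed per gene.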

Thanks a lot, Kevin. Could you please give me an example of what dimensions 1 and 2 could be, just to get the idea?
Thanks.

Reply written 17 months ago by Dayna

Hi! You can say that each dimension represents similarity / dissimilarity between your samples of interest, but that's effectively it. The dimensions do not directly measure any parameter in your data. The coordinates are abstract: in your situation they are an abstraction of the values supplied in your input data matrix, i.e., gene expression. You should see a very similar separation of your dataset by doing simple hierarchical clustering with Euclidean distance as the distance metric.

Think of it another way: given the expression levels of your input genes, how similar or dissimilar are your samples to each other based on the expression of these genes?
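
The hierarchical clustering comparison can be sketched in a few lines. This is a toy example under stated assumptions: the sample names and log2 expression values are invented, and average linkage is just one reasonable choice of linkage. It merges the closest clusters by Euclidean distance until two groups remain.

```python
import math
from itertools import combinations

# Hypothetical log2 expression values for three genes in each sample.
samples = {
    "control_1": [5.0, 8.0, 3.0],
    "control_2": [5.2, 7.9, 3.1],
    "treated_1": [7.0, 6.0, 3.0],
    "treated_2": [7.1, 6.2, 2.9],
}

def average_linkage(cluster_a, cluster_b):
    """Mean Euclidean distance over all cross-cluster pairs of samples."""
    total = sum(math.dist(samples[a], samples[b])
                for a in cluster_a for b in cluster_b)
    return total / (len(cluster_a) * len(cluster_b))

def cluster(names, n_clusters):
    """Agglomerative clustering: repeatedly merge the two closest clusters."""
    clusters = [(name,) for name in names]
    while len(clusters) > n_clusters:
        a, b = min(combinations(clusters, 2),
                   key=lambda pair: average_linkage(*pair))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return clusters

groups = cluster(list(samples), n_clusters=2)
print(groups)
```

If the MDS plot separates control from treated cleanly, a clustering like this would be expected to recover the same two groups, since both start from the same between-sample distances.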

For downstream applications, I prefer to do PCA, as you can better quantify the similarities / dissimilarities through the component loadings on each axis / 'dimension' and, thus, infer which specific genes are responsible for segregation along a particular axis.
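
A small sketch of what reading the loadings means in practice: compute the first principal component of a samples-by-genes matrix and rank the genes by the absolute size of their entry in it. The gene names and values are hypothetical, "loadings" here means the entries of the first eigenvector of the gene covariance matrix, and power iteration stands in for a proper eigen-solver.

```python
import math

genes = ["geneA", "geneB", "geneC"]
# Hypothetical log2 expression: rows are samples (2 control, then 2 treated).
expression = [
    [5.0, 8.0, 3.0],
    [5.2, 7.9, 3.1],
    [7.0, 6.0, 3.0],
    [7.1, 6.2, 2.9],
]

def covariance_matrix(rows):
    """Gene-by-gene sample covariance after centering each gene."""
    n, p = len(rows), len(rows[0])
    means = [sum(row[g] for row in rows) / n for g in range(p)]
    centered = [[row[g] - means[g] for g in range(p)] for row in rows]
    return [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(p)]
            for a in range(p)]

def first_component(cov, iterations=500):
    """Unit eigenvector of the largest eigenvalue, via power iteration."""
    p = len(cov)
    v = [1.0 + i for i in range(p)]  # arbitrary non-constant starting vector
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

loadings = dict(zip(genes, first_component(covariance_matrix(expression))))
# Genes with the largest absolute loading contribute most to PC1.
ranked = sorted(loadings, key=lambda g: abs(loadings[g]), reverse=True)
for gene in ranked:
    print(f"{gene}: {loadings[gene]:+.3f}")
```

In this toy data, geneA and geneB change between groups in opposite directions, so they should dominate PC1 with opposite signs, while geneC contributes very little.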

Trust that further helps.

Kevin

Reply written 17 months ago by Kevin Blighe (modified 6 months ago)

That's very helpful, thanks a lot, Kevin!

Reply written 17 months ago by Dayna

No problem - happy to have helped.

Reply written 17 months ago by Kevin Blighe