Hey Marc, MetaboAnalyst is primarily for metabolomics data analyses; however, it does have companion tools, such as PLS-DA, which is what these authors used. I looked at the methods in the paper and their input seems to have been Z-scores. They say:
Raw read counts were quantile-normalized, mean-centered, and divided
by the standard deviation of each variable.
Quantile normalisation you probably know. The second part, i.e. mean centering and dividing by the sdev, is a Z transformation.
Thus, with your RNA-seq data, I would normalise the raw counts with one of edgeR, DESeq2, or limma/voom, transform the normalised counts to logged or variance-stabilised counts, and then Z-transform (in R, this can easily be done with the scale() function).
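As a rough sketch of that last step, assuming a matrix with genes in rows and samples in columns: scale() centres and scales columns, so the matrix is transposed before and after to Z-score each gene. The simulated counts, the object names, and the log2(x + 1) transform are stand-ins for illustration only; in practice the input would be the logged or variance-stabilised output of edgeR, DESeq2, or limma/voom.

```r
set.seed(1)

# Hypothetical toy data: 200 genes (rows) x 6 samples (columns),
# simulated from a negative binomial purely for illustration.
counts <- matrix(rnbinom(200 * 6, mu = 100, size = 2), nrow = 200,
                 dimnames = list(paste0("gene", 1:200), paste0("sample", 1:6)))

# Stand-in for a proper transformation (e.g. variance-stabilised counts).
logged <- log2(counts + 1)

# scale() works column-wise, so transpose to Z-score each gene
# across samples, then transpose back.
z <- t(scale(t(logged)))

# Each gene should now have mean ~0 and standard deviation 1.
summary(rowMeans(z))
summary(apply(z, 1, sd))
```

If the tool you are feeding expects samples in rows instead, scale() can be applied directly to the transposed matrix without transposing back.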
Hopefully you are familiar with edgeR, DESeq2, and/or limma/voom.
Kevin
---------------------
Edit: April 5th, 2019
MetaboAnalyst, in fact, does the normalisation and transformation itself. See below: C: RNA sequencing with metaboanalyst
------------------------
Thanks for your response Kevin,
Unfortunately, I'm new to "big data" statistics and RStudio, but I'm slowly getting the hang of RStudio.
I'd like to ask you two more questions based on your answer, if possible. From your own experience, which post hoc transformation would be best suited after running DESeq2 normalisation? Do you think variance-stabilised counts followed by division by the standard deviation would be best?
Why use variance-stabilised counts instead of mean-centred counts?
Hey Marc. You could do the variance-stabilising transformation and then do mean-centering followed by division by the standard deviation.
Normalised counts would not be ideal, as they follow a negative binomial distribution (shifted toward 0) - you can check with the hist() function in R. After you perform the variance-stabilising transformation, re-check the distribution to see how it has changed. The final conversion to Z scores (mean-centering and division by standard deviation) then just helps to 'iron out' the data even more. Z scores are also easily interpretable to the human brain.
Thanks for your help Kevin; it is much appreciated. I'm now quite confident with most of the steps thanks to your advice.
No problem. I should add that the idea is that different programs/tools (like MetaboAnalyst) will expect your data to follow different distributions. It seems that MetaboAnalyst wants data to be normally distributed. So, the problem with RNA-seq data in this case is that it follows a negative binomial distribution, which is unsuitable for MetaboAnalyst. If you use hist() to plot the RNA-seq normalised counts, you'll likely have to set breaks=50 or breaks=100.
Thus, through the variance-stabilising transformation followed by the Z-scaling, we 'gracefully' transform the RNA-seq negative binomial counts to an approximately normal distribution, which is more amenable to MetaboAnalyst.
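To re-check the distribution at each step, something along these lines could be used. It is only a sketch: the counts are simulated, the object names are hypothetical, and log2(x + 1) stands in for a real variance-stabilising transformation, so the exact shapes will differ from those of real data.

```r
set.seed(42)

# Hypothetical normalised counts, simulated from a negative binomial
# distribution (illustration only).
norm_counts <- rnbinom(5000, mu = 200, size = 2)

# log2(x + 1) is used here only as a simple stand-in for a
# variance-stabilising transformation.
transformed <- log2(norm_counts + 1)

# Z-scale the transformed values.
z <- as.numeric(scale(transformed))

# When run as a script, draw to a null device so no file is written.
if (!interactive()) pdf(NULL)

# Compare the three distributions side by side.
op <- par(mfrow = c(1, 3))
hist(norm_counts, breaks = 50, main = "Normalised counts")
hist(transformed, breaks = 50, main = "Transformed")
hist(z, breaks = 50, main = "Z-scores")
par(op)
```

With real data, breaks=100 may be needed for the first panel if most values pile up near 0.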
I also want to use MetaboAnalyst for RNA-seq data. I variance stabilized the normalized DESeq2 counts with the built-in vst() function and used scale() after that to get Z-transformed counts. Now I have realized that some values are negative, but MetaboAnalyst does not allow negative values, as you can read here. Do you have any suggestions for how to avoid negative values while maintaining a correct transformation?
You must be referring to this line:
It makes sense that those would have to be positive.
I can see that, in fact, MetaboAnalyst does the normalisation and scaling itself, as this screenshot shows:
This must also be what the authors in the OP's original question did - I have checked the boxes (in the screenshot) that match the described methods in their manuscript.
So, you will have to input your peak intensities, or, for RNA-seq, raw read counts. I won't comment on the validity of simply quantile normalising raw counts.
I would ensure you're using raw read counts; to my knowledge, it's not possible to have negative raw read counts. Ensure you have sample normalisation and autoscaling set to none, as both have already been done with the vst() function in DESeq2 and the scale() function in R.
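A quick check like the following can show which matrix contains negative values before anything is uploaded. It is a sketch on simulated data with hypothetical object names, and log2(x + 1) again stands in for the variance-stabilising transformation.

```r
set.seed(7)

# Hypothetical raw count matrix: 100 genes (rows) x 4 samples (columns).
raw_counts <- matrix(rnbinom(100 * 4, mu = 80, size = 1), nrow = 100)

# Raw read counts are non-negative whole numbers.
all(raw_counts >= 0)
all(raw_counts == round(raw_counts))

# Z-scores are mean-centred per gene, so any gene that varies across
# samples must have at least one value below zero.
z <- t(scale(t(log2(raw_counts + 1))))
any(z < 0, na.rm = TRUE)
```

Negative values after scale() are therefore expected and do not indicate a mistake in the transformation.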