Question: RNA sequencing with metaboanalyst
5 months ago, marc.osullivan wrote:

I'm currently trying to input my RNA sequencing data into MetaboAnalyst, as was done in this article:

Can anyone recommend whether I should use the QC'd raw counts or the normalised counts in the data file that I input?

Can anyone recommend which normalisation/pre-processing steps within MetaboAnalyst are best for RNA sequencing data?

Tags: sequencing, rna-seq, R
5 months ago, Kevin Blighe wrote:

Hey Marc, MetaboAnalyst is primarily for metabolomics data analyses; however, it does have companion tools, such as PLS-DA, which is what these authors used. I looked at the methods in the paper and their input seems to have been Z-scores. They say:

Raw read counts were quantile-normalized, mean-centered, and divided by the standard deviation of each variable.

Quantile normalisation you probably know. The second part, i.e. mean-centering and dividing by the standard deviation, is a Z transformation.
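For readers who have not met quantile normalisation, here is a minimal sketch of the idea in plain Python (not the R or MetaboAnalyst implementation). The function name `quantile_normalise` and the toy counts are hypothetical, and ties are broken by position rather than averaged, which is a simplification.

```python
from statistics import mean

def quantile_normalise(samples):
    """Force every sample onto the same distribution.

    samples: list of equal-length lists, one list of values per sample.
    Ties are broken by position (a simplification of the usual method).
    """
    # Reference distribution: mean of the k-th smallest value across samples.
    sorted_samples = [sorted(s) for s in samples]
    reference = [mean(values) for values in zip(*sorted_samples)]

    normalised = []
    for s in samples:
        # Indices of this sample's values, from smallest to largest.
        order = sorted(range(len(s)), key=lambda i: s[i])
        out = [0.0] * len(s)
        for rank, idx in enumerate(order):
            out[idx] = reference[rank]
        normalised.append(out)
    return normalised

# Hypothetical counts: 2 samples x 3 genes.
counts = [[5, 2, 3], [4, 1, 6]]
qn = quantile_normalise(counts)
```

After this step every sample contains exactly the same set of values; only the order of those values, i.e. the gene ranking within each sample, differs.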

Thus, with your RNA-seq data, I would normalise the raw counts with one of edgeR, DESeq2, or limma/voom, transform the normalised counts to logged or variance-stabilised counts, and then Z-transform (in R, this can easily be done with the scale() function).

Hopefully you are familiar with edgeR, DESeq2, and/or limma/voom.
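To make the Z-transform step concrete, here is a minimal sketch in plain Python rather than R. It assumes the same defaults as R's scale(): each column is centred on its mean and divided by its sample standard deviation (n - 1 denominator). The function name `z_transform` and the toy values are hypothetical.

```python
from statistics import mean, stdev

def z_transform(matrix):
    """Column-wise Z-transform of a matrix given as a list of rows.

    Each column is mean-centred and divided by its sample standard
    deviation. A constant column has zero standard deviation and would
    raise ZeroDivisionError, so filter such columns out beforehand.
    """
    columns = list(zip(*matrix))
    centres = [mean(col) for col in columns]
    spreads = [stdev(col) for col in columns]
    return [
        [(value - c) / s for value, c, s in zip(row, centres, spreads)]
        for row in matrix
    ]

# Hypothetical transformed expression values: 3 samples (rows) x 2 genes (columns).
expr = [[2.0, 10.0], [4.0, 12.0], [6.0, 17.0]]
z = z_transform(expr)
first_gene = [row[0] for row in z]
```

Each column of `z` now has mean 0 and standard deviation 1, so any value that was below its column mean comes out negative.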



Edit: April 5th, 2019

MetaboAnalyst, in fact, does the normalisation and transformation itself. See the comment further down this thread (C: RNA sequencing with metaboanalyst).


Answer written 5 months ago by Kevin Blighe (modified 3 months ago)

Thanks for your response Kevin,

Unfortunately, I'm new to "big data" statistics and RStudio, but I'm slowly getting the hang of RStudio.

I'd like to ask you two more questions based on your answer, if possible: Can you recommend, from your own experience, what post hoc transformation would be best suited after running DESeq2 normalisation? Do you think variance-stabilised counts and dividing by the standard deviation would be best?

Why use variance-stabilised counts instead of mean-centred counts?

Reply written 5 months ago by marc.osullivan (modified 5 months ago)

Hey Marc. You could do the variance-stabilising transformation and then do mean-centering followed by division by the standard deviation.

Normalised counts would not be ideal, as they follow a negative binomial distribution (shifted toward 0); you can check this with the hist() function in R. After you perform the variance-stabilising transformation, re-check the distribution to see how it has changed. The final conversion to Z scores (mean-centering and division by the standard deviation) then just helps to 'iron out' the data even more. Z scores are also easily interpretable by the human brain.
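The before/after check described above can also be done numerically. The sketch below, in plain Python, compares the skewness of some made-up counts before and after a transformation. Note that it uses log2(x + 1) as a simple stand-in for the variance-stabilising transformation, which is an assumption for illustration only; the `skewness` helper and the counts are hypothetical.

```python
import math
from statistics import mean, pstdev

def skewness(values):
    """Moment-based skewness: 0 for a symmetric distribution,
    large and positive for data piled up near zero with a long right tail."""
    m = mean(values)
    s = pstdev(values)
    return mean([((v - m) / s) ** 3 for v in values])

# Hypothetical normalised counts for one sample: mostly small, a few very large.
counts = [0, 1, 2, 3, 5, 8, 20, 150, 900]

# Stand-in for the variance-stabilising transformation (assumption).
logged = [math.log2(c + 1) for c in counts]

raw_skew = skewness(counts)
log_skew = skewness(logged)
print(f"skewness before: {raw_skew:.2f}, after: {log_skew:.2f}")
```

A drop in skewness after the transformation corresponds to the histogram looking less piled up against zero.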

Reply written 5 months ago by Kevin Blighe

Thanks for your help, Kevin; it is much appreciated. I'm now quite confident with most of the steps thanks to your advice.

Reply written 5 months ago by marc.osullivan

No problem. I should add that the idea is that different programs/tools (like MetaboAnalyst) will expect your data to follow different distributions. It seems that MetaboAnalyst wants data to be normally distributed. So, the problem with RNA-seq data in this case is that it follows a negative binomial distribution, which is unsuitable for MetaboAnalyst. If you use hist() to plot the RNA-seq normalised counts, you'll likely have to set breaks=50 or breaks=100.

Thus, through the variance-stabilising transformation followed by the Z-scaling, we 'gracefully' transform the RNA-seq negative binomial counts to an approximately normal distribution, which is more amenable to MetaboAnalyst.

Reply written 5 months ago by Kevin Blighe (modified 3 months ago)

I also want to use MetaboAnalyst for RNA-seq data. I variance-stabilised the normalised DESeq2 counts with the built-in vst() function and then used scale() to get Z-transformed counts. I have now realised that some values are negative, but MetaboAnalyst does not allow negative values, as you can read here. Do you have any suggestions on how to avoid negative values while maintaining a correct transformation?

Reply written 3 months ago by t-jim

You must be referring to this line:

Data values (concentrations, bins, peak intensities) should contain only numeric and positive values (using empty or NA for missing values)

It makes sense that those would have to be positive.

I can see that, in fact, MetaboAnalyst does the normalisation and scaling itself, as this screenshot shows:


This must also be what the authors in the OP's original question did - I have checked the boxes (in the screenshot) that match the described methods in their manuscript.

So, you will have to input your peak intensities, or, for RNA-seq, raw read counts. I won't comment on the validity of simply quantile normalising raw counts.

Reply written 3 months ago by Kevin Blighe (modified 11 weeks ago)

I would ensure you're using raw read counts; to my knowledge, it's not possible to have negative raw read counts. Ensure you have sample normalisation and autoscaling set to none, as both have already been done with DESeq2's vst() and R's scale() functions.

Reply written 11 weeks ago by marc.osullivan