RNA sequencing with MetaboAnalyst
5.2 years ago

I'm currently trying to input my RNA sequencing data into metaboanalyst.ca as in this article: https://jamanetwork.com/journals/jamapediatrics/fullarticle/2663880

Can anyone recommend whether I should use the QC raw counts or the normalised counts in the data file that I upload?

Can anyone recommend which normalisation/pre-processing steps within MetaboAnalyst are best for RNA sequencing data?

5.2 years ago

Hey Marc, MetaboAnalyst is primarily for metabolomics data analysis; however, it does include general-purpose methods, such as PLS-DA, which is what these authors used. I looked at the methods in the paper, and their input seems to have been Z-scores. They say:

Raw read counts were quantile-normalized, mean-centered, and divided by the standard deviation of each variable.

Quantile normalisation you probably know. The second part, i.e., mean-centring and dividing by the standard deviation, is a Z-transformation.
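To make the quoted preprocessing concrete, here is a minimal sketch in plain Python, assuming a genes x samples matrix (rows = genes, columns = samples). It quantile-normalises the samples, then Z-scales each gene ("each variable"). The function names and the toy matrix are hypothetical, and ties are handled naively rather than averaged as some implementations do.

```python
from statistics import mean, stdev


def quantile_normalise(matrix):
    """Quantile-normalise the columns (samples) of a genes x samples matrix.

    Each value is replaced by the mean of the values holding the same rank
    across all samples. Ties are not averaged here; they simply keep the
    order produced by a stable sort.
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    columns = [[row[j] for row in matrix] for j in range(n_cols)]
    # Reference distribution: mean of the k-th smallest value over all samples.
    sorted_cols = [sorted(col) for col in columns]
    reference = [mean(sc[k] for sc in sorted_cols) for k in range(n_rows)]
    result = [[0.0] * n_cols for _ in range(n_rows)]
    for j, col in enumerate(columns):
        order = sorted(range(n_rows), key=lambda i: col[i])
        for rank, i in enumerate(order):
            result[i][j] = reference[rank]
    return result


def z_score_rows(matrix):
    """Mean-centre each row (gene) and divide by its sample standard deviation.

    A constant row raises ZeroDivisionError, so such genes should be
    filtered out beforehand.
    """
    scaled = []
    for row in matrix:
        m, s = mean(row), stdev(row)
        scaled.append([(x - m) / s for x in row])
    return scaled


# Toy genes x samples matrix (rows = genes, columns = samples).
raw = [[5, 4, 3], [2, 1, 4], [3, 4, 6], [4, 2, 8]]
z = z_score_rows(quantile_normalise(raw))
```

After quantile normalisation every sample has the same set of values, and after Z-scaling every gene has mean 0 and standard deviation 1.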

Thus, with your RNA-seq data, I would normalise the raw counts with edgeR, DESeq2, or limma/voom, transform the normalised counts to log or variance-stabilised counts, and then Z-transform them (in R, this can easily be done with the scale() function).
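For intuition about the first two steps, the sketch below re-implements, for illustration only, the median-of-ratios idea behind DESeq2's size factors and then applies log2(x + 1). The log is a crude stand-in for vst()/rlog(), not the same transformation. The function names and the `pseudocount` parameter are hypothetical; in practice you would use DESeq2 itself. For the final Z-step, note that R's scale() works column by column, so with genes in rows you would scale the transposed matrix, e.g. t(scale(t(x))).

```python
import math
from statistics import mean, median


def size_factors(counts):
    """Median-of-ratios size factors for a genes x samples matrix of raw counts.

    Genes with a zero in any sample are skipped, because their geometric
    mean is zero and the ratio would be undefined.
    """
    usable = [row for row in counts if all(c > 0 for c in row)]
    # Per-gene geometric mean, computed on the log scale.
    geo_means = [math.exp(mean(math.log(c) for c in row)) for row in usable]
    n_samples = len(counts[0])
    return [
        median(row[j] / g for row, g in zip(usable, geo_means))
        for j in range(n_samples)
    ]


def log_normalised(counts, pseudocount=1):
    """Divide each sample by its size factor, then take log2(x + pseudocount)."""
    sf = size_factors(counts)
    return [[math.log2(c / s + pseudocount) for c, s in zip(row, sf)] for row in counts]


# Toy counts: sample 2 was sequenced exactly twice as deeply as sample 1.
counts = [[10, 20], [5, 10], [100, 200], [0, 4]]
logged = log_normalised(counts)
```

With these toy counts the two size factors differ by exactly a factor of 2, so the first three genes end up with identical normalised values in both samples.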

Hopefully you are familiar with edgeR, DESeq2, and/or limma/voom.

Kevin

---------------------

Edit: April 5th, 2019

MetaboAnalyst, in fact, does the normalisation and transformation itself. See my reply further down this thread.

------------------------


Thanks for your response, Kevin.

Unfortunately, I'm new to "big data" statistics and RStudio, but I'm slowly getting the hang of it.

I'd like to ask you two more questions based on your answer, if possible. From your own experience, which post hoc transformation would be best suited after running DESeq2 normalisation? Do you think variance-stabilised counts divided by the standard deviation would be best?

Why use variance-stabilised counts instead of mean-centred counts?


Hey Marc. You could do the variance-stabilising transformation and then do mean-centering followed by division by the standard deviation.

Normalised counts would not be ideal, as they follow a negative binomial distribution (heavily skewed, with most values close to 0); you can check this with the hist() function in R. After you perform the variance-stabilising transformation, re-check the distribution to see how it has changed. The final conversion to Z-scores (mean-centring and division by the standard deviation) then helps to 'iron out' the data even more. Z-scores are also easy for people to interpret.
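Alongside the visual hist() check, skewness gives a single number for how lopsided the distribution is. A positive value means a long right tail, and a value near 0 means the data are roughly symmetric. This is a minimal sketch on toy numbers. log2(x + 1) stands in for the variance-stabilising transformation here, although DESeq2's vst() behaves differently, especially at low counts.

```python
import math
from statistics import mean, pstdev


def skewness(values):
    """Population (moment) skewness: ~0 for symmetric data, > 0 for a long right tail."""
    m, s = mean(values), pstdev(values)
    return mean(((v - m) / s) ** 3 for v in values)


# Toy right-skewed counts, standing in for a set of normalised counts.
counts = [0, 1, 1, 2, 3, 5, 8, 15, 40, 250]
# log2(x + 1) is used here as a simple stand-in for a variance-stabilising transformation.
logged = [math.log2(c + 1) for c in counts]

print(f"skewness before: {skewness(counts):.2f}")
print(f"skewness after:  {skewness(logged):.2f}")
```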


Thanks for your help, Kevin; it's much appreciated. Thanks to your advice, I'm now quite confident with most of the steps.


No problem. I should add that different programs/tools (like MetaboAnalyst) expect your data to follow particular distributions, and MetaboAnalyst seems to want normally distributed data. The problem with RNA-seq data here is that it follows a negative binomial distribution, which is unsuitable for MetaboAnalyst. If you use hist() to plot the RNA-seq normalised counts, you'll likely have to set breaks=50 or breaks=100 to see the shape of the distribution clearly.

Thus, through the variance-stabilising transformation followed by the Z-scaling, we 'gracefully' transform the negative binomial RNA-seq counts toward a normal distribution, which is more amenable to MetaboAnalyst.
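For the hist() check with many breaks, here is a dependency-free illustration in Python that prints a text histogram. It is not R's hist(). The function name is hypothetical. Its `bins` argument plays a role similar to `breaks`, but R treats a single `breaks` number only as a suggestion and chooses "pretty" break points, whereas this sketch uses exactly `bins` equal-width bins.

```python
def text_histogram(values, bins=10, width=40):
    """Count values into equal-width bins and draw one '#' bar per bin.

    Returns the bin counts and the printable lines.
    """
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid dividing by zero when all values are equal
    bin_counts = [0] * bins
    for v in values:
        # The maximum value would land in bin `bins`, so clamp it into the last bin.
        idx = min(int((v - lo) / span * bins), bins - 1)
        bin_counts[idx] += 1
    peak = max(bin_counts)
    lines = []
    for k, c in enumerate(bin_counts):
        left_edge = lo + k * span / bins
        lines.append(f"{left_edge:10.2f} | {'#' * round(c / peak * width)} {c}")
    return bin_counts, lines


# Toy right-skewed counts.
toy_counts = [0, 1, 1, 2, 3, 5, 8, 15, 40, 250]
bin_counts, lines = text_histogram(toy_counts, bins=20)
print("\n".join(lines))
```

With count-like data, most values pile into the first bin and a few outliers sit far to the right. That is why a fine binning is needed to see anything useful.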


I also want to use MetaboAnalyst for RNA-seq data. I variance-stabilised the normalised DESeq2 counts with the built-in vst() function and then used scale() to get Z-transformed counts. Now I've realised that some values are negative, but MetaboAnalyst does not allow negative values, as stated in its data format requirements. Do you have any suggestions on how to avoid negative values while keeping the transformation correct?


You must be referring to this line:

Data values (concentrations, bins, peak intensities) should contain only numeric and positive values (using empty or NA for missing values)

It makes sense that those would have to be positive.
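Z-scaled data can never satisfy that requirement. Z-scores are centred on zero by construction, so unless all of a gene's values are identical, some of its Z-scores will be negative, however positive the input was. A minimal sketch with toy values (the function name is hypothetical):

```python
from statistics import mean, stdev


def z_scores(values):
    """Mean-centre and divide by the sample standard deviation (as R's scale() does per column)."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


# Toy, strictly positive values, e.g. one gene's variance-stabilised expression.
expression = [6.1, 7.4, 8.0, 9.3]
z = z_scores(expression)
print([round(v, 2) for v in z])
```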

I can see that, in fact, MetaboAnalyst does the normalisation and scaling itself, as this screenshot shows:

[Screenshot: MetaboAnalyst normalisation options]

This must also be what the authors of the paper in the OP's question did; I have checked the boxes (in the screenshot) that match the methods described in their manuscript.

So, you will have to input your peak intensities, or, for RNA-seq, raw read counts. I won't comment on the validity of simply quantile normalising raw counts.


I would make sure you're using raw read counts; to my knowledge, raw read counts can't be negative. Also make sure sample normalisation and autoscaling are set to none, as both will already have been done with vst() in DESeq2 and scale() in R.
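Before uploading, a quick sanity check that the table really holds raw counts (finite, non-negative whole numbers) can catch an accidentally exported VST or Z-scaled matrix. This is a minimal sketch on a list-of-lists matrix. The function name is hypothetical, and reading the file and handling MetaboAnalyst's exact table layout (e.g. its class-label row or column) are left out.

```python
import math


def invalid_count_cells(matrix):
    """Return (row, column, value) for every cell that is not a valid raw count.

    A valid raw count is a finite, non-negative whole number. None is treated
    as a missing value and skipped.
    """
    bad = []
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            if value is None:
                continue
            if not math.isfinite(value) or value < 0 or value != int(value):
                bad.append((i, j, value))
    return bad


print(invalid_count_cells([[12, 0, 7], [3, -0.41, 2.5]]))
```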
