Question: RNA sequencing with metaboanalyst
8 days ago, marc.osullivan wrote:

I'm currently trying to input my RNA sequencing data into MetaboAnalyst, as in this article:

Can anyone recommend whether I should use the QC'd raw counts or the normalised counts in the data file that I input?

Can anyone recommend which normalisation/pre-processing steps within MetaboAnalyst are best for RNA sequencing data?

sequencing rna-seq R • 91 views
8 days ago, Kevin Blighe (Republic of Ireland) wrote:

Hey Marc, MetaboAnalyst is primarily for metabolomics data analyses. However, it does have companion tools, such as PLS-DA, which is what these authors used. I looked at the methods in the paper and their input seems to have been Z-scores. They say:

Raw read counts were quantile-normalized, mean-centered, and divided by the standard deviation of each variable.

Quantile normalisation you probably know. The second part, i.e. mean-centering and dividing by the standard deviation, is a Z transformation.

Thus, with your RNA-seq data, I would normalise the raw counts with one of edgeR, DESeq2, or limma/voom, transform the normalised counts to logged or variance-stabilised counts, and then Z-transform (in R, this can easily be done with the scale() function).
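As a small illustration of that last step: scale() centres and scales the columns of a matrix, so with the usual genes-in-rows layout you need to transpose first to get per-gene Z-scores. The matrix `expr` below is made-up toy data standing in for your logged or variance-stabilised counts; it is only a sketch, not the paper's data or code.

```r
# Toy matrix standing in for logged / variance-stabilised counts
# (genes in rows, samples in columns). The values are made up.
expr <- matrix(
  c(5.1, 5.4,  6.0,  6.3,
    9.8, 9.9, 10.4, 10.1,
    2.0, 3.5,  2.5,  4.0),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("gene", 1:3), paste0("sample", 1:4))
)

# scale() works column-wise, so transpose to put genes in columns,
# Z-transform, then transpose back to genes x samples.
z <- t(scale(t(expr), center = TRUE, scale = TRUE))

# Each gene should now have mean 0 and standard deviation 1
round(rowMeans(z), 10)
apply(z, 1, sd)
```

If your variables (genes) are already in columns, plain scale(expr) is enough and no transposing is needed.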

Hopefully you are familiar with edgeR, DESeq2, and/or limma/voom.



Thanks for your response Kevin,

Unfortunately, I'm new to "big data" statistics and RStudio, but I'm slowly getting the hang of RStudio.

I'd like to ask you two more questions based on your answer, if possible. From your own experience, which post hoc transformation would be best suited after running DESeq2 normalisation? Do you think variance-stabilised counts followed by dividing by the standard deviation would be best?

Why use variance-stabilised counts instead of mean-centred counts?

Reply written 4 days ago by marc.osullivan

Hey Marc. You could do the variance-stabilising transformation and then do mean-centering followed by division by the standard deviation.
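A minimal sketch of that sequence with DESeq2 could look like the following. It assumes DESeq2 is installed from Bioconductor, and it uses the simulated dataset from makeExampleDESeqDataSet() purely as a placeholder; with real data you would build `dds` from your own raw counts (for example with DESeqDataSetFromMatrix()). The object names `vst_mat` and `z` are just illustrative.

```r
library(DESeq2)

# Simulated placeholder data (1000 genes, 6 samples); swap in your own
# DESeqDataSet built from raw counts.
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 1000, m = 6)

# Variance-stabilising transformation; blind = TRUE ignores the design
vsd <- varianceStabilizingTransformation(dds, blind = TRUE)
vst_mat <- assay(vsd)  # genes x samples matrix

# Drop genes with no variation, which would give NaN after scaling
vst_mat <- vst_mat[apply(vst_mat, 1, sd) > 0, , drop = FALSE]

# Mean-centre and divide by the standard deviation, per gene
z <- t(scale(t(vst_mat)))
```

The filtering step is there only because a gene with identical values in every sample has a standard deviation of 0, and dividing by that produces NaN.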

Normalised counts would not be ideal, as they follow a negative binomial distribution (skewed toward 0); you can check this with the hist() function in R. After you perform the variance-stabilising transformation, re-check the distribution to see how it has changed. The final conversion to Z-scores (mean-centering and division by the standard deviation) then just helps to 'iron out' the data even more. Z-scores are also easy for people to interpret.

Reply written 4 days ago by Kevin Blighe

Thanks for your help, Kevin; it is much appreciated. I'm now quite confident with most of the steps thanks to your advice.

Reply written 3 days ago by marc.osullivan

No problem. I should add that the idea is that different programs/tools (like MetaboAnalyst) expect your data to follow different distributions. It seems that MetaboAnalyst wants data to be normally distributed. So, the problem with RNA-seq data in this case is that it follows a negative binomial distribution, which is unsuitable for MetaboAnalyst. If you use hist() to plot the RNA-seq normalised counts, you'll likely have to set breaks=50 or breaks=100 to see the shape of the distribution.
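To see why the breaks argument matters, here is a rough sketch using simulated negative binomial values from rnbinom() in place of real normalised counts. The parameters (mu = 200, size = 0.5) are arbitrary choices for illustration only.

```r
# Simulated negative binomial 'counts'; the parameters are arbitrary
set.seed(42)
counts <- rnbinom(n = 5000, mu = 200, size = 0.5)

# Right-skewed data: the mean is pulled well above the median
mean(counts)
median(counts)

# With the default binning, a large share of the values falls into the
# first few bars; more breaks make the shape near zero easier to see.
par(mfrow = c(1, 2))
h_default <- hist(counts, main = "Default breaks", xlab = "count")
h_fine    <- hist(counts, breaks = 50, main = "breaks = 50", xlab = "count")
```

Note that hist() treats a single number for breaks as a suggestion, so the actual number of bins may differ slightly from 50.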

Thus, through the variance-stabilising transformation followed by the Z-scaling, we 'gracefully' transform the RNA-seq negative binomial counts to an approximately normal distribution, which is more amenable to MetaboAnalyst.

'Bad' analysts would just log the normalised counts and use those but, in doing that, the relationship between the variables in the dataset is not preserved, loosely speaking. That is, logging the data directly may bring it to a normal distribution but, in the process, you'll introduce bias into the data.

Reply written 3 days ago by Kevin Blighe