Question: Log2(x + 1) transformation in gene expression not normally distributed.
11 months ago, rin30 wrote:

Hi all!

I am using raw count data from TCGA. As I want to compute the Z-score between tumor and normal samples, I first have to ensure that my data are normally distributed. So far, I have downloaded the raw counts, normalized them for GC content using the TCGAanalyze_Normalization() function from TCGAbiolinks, and log2(x+1) transformed them, but the distribution is right-skewed and definitely not normal, as seen in qqnorm() plots.


How could I tackle that? I have been trying to figure it out for days, but I cannot find a solution.

Thanks a lot, R.


Could you reattach the link to your plot, please?

written 11 months ago by russhh4.7k

Edited! Sorry about that! :)

written 11 months ago by rin30
11 months ago, Benn7.7k (Netherlands) wrote:

Some data cannot be transformed into a normal distribution. RNA-seq count data fits a Poisson distribution or a negative binomial distribution. There is a great answer here about how RNA-seq data is distributed.
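As a rough illustration, the base R sketch below simulates negative binomial counts for a single low-expressed gene and applies log2(x+1). The parameters (mu = 1, size = 0.5, 500 samples) are arbitrary assumptions chosen so that zeros are common, and the `skewness` helper is defined here only for the example. It is simulated data, not TCGA data.

```r
set.seed(1)

# Simulated counts for one low-expressed gene across 500 samples.
# mu (mean) and size (inverse dispersion) are illustrative assumptions.
counts <- rnbinom(n = 500, mu = 1, size = 0.5)

# The same transformation as in the question.
log_counts <- log2(counts + 1)

# Sample skewness: third central moment divided by the cubed standard deviation.
# (Helper defined for this example only.)
skewness <- function(x) {
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}

skew_raw <- skewness(counts)
skew_log <- skewness(log_counts)
print(c(raw = skew_raw, log2_plus_1 = skew_log))

# Visual check, as in the question: points bend away from the reference line.
qqnorm(log_counts)
qqline(log_counts)
```

With many zeros, the transformed values pile up at 0 and keep a long right tail, so the log transform reduces the skew but does not produce a normal distribution.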


RNA-seq data are typically modelled with a Poisson or negative binomial (NB) distribution. Claiming that they fit those distributions is a bit strong, though.

written 11 months ago by russhh4.7k
11 months ago, Devon Ryan91k (Freiburg, Germany) wrote:

This is expected: RNA-seq data should be right-skewed or multimodal.


@Devon Ryan @b.nota @russhh Really helpful link and answers! Thank you! The reason I want them to be normally distributed is to assess the change between tumor and normal expression by computing a Z-score. Would that be possible / have the same interpretation if they fit a Poisson or NB distribution?

written 11 months ago by rin30

Try to use limma or edgeR for this kind of analysis.
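A minimal sketch of what that could look like with edgeR's quasi-likelihood pipeline, assuming a simple unpaired tumor vs. normal design. The count matrix is simulated toy data (200 genes, 3 normal and 3 tumor samples), and the object names are placeholders; with real TCGA data you would pass the raw, untransformed counts instead.

```r
library(edgeR)

set.seed(1)

# Hypothetical toy data: 200 genes x 6 samples (3 normal, 3 tumor).
n_genes <- 200
counts <- matrix(
  rnbinom(n_genes * 6, mu = 50, size = 5),
  nrow = n_genes,
  dimnames = list(paste0("gene", seq_len(n_genes)), paste0("s", 1:6))
)
group <- factor(rep(c("normal", "tumor"), each = 3),
                levels = c("normal", "tumor"))

# Build the edgeR object from raw counts (no log transform beforehand).
y <- DGEList(counts = counts, group = group)

# Drop genes with too few counts, then compute TMM normalization factors.
keep <- filterByExpr(y)
y <- y[keep, , keep.lib.sizes = FALSE]
y <- calcNormFactors(y)

# Design matrix: the second coefficient is tumor vs. normal.
design <- model.matrix(~ group)

# Estimate NB dispersions, fit the quasi-likelihood GLM, test tumor vs. normal.
y <- estimateDisp(y, design)
fit <- glmQLFit(y, design)
res <- glmQLFTest(fit, coef = 2)

# Table of the top 10 genes with logFC, test statistic, p-value and FDR.
top <- topTags(res, n = 10)$table
print(top)
```

The logFC and FDR columns play the role the Z-score was meant to play, without assuming normality. With limma, a comparable route would be `voom()` on the same `DGEList` and design, followed by `lmFit()` and `eBayes()`.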

written 11 months ago by Benn7.7k