Question: Gene expression not normally distributed after log2(x + 1) transformation
rin30 wrote (17 months ago):

Hi all!

I am using raw count data from TCGA. Because I want to compute a Z-score between tumor and normal samples, I first have to make sure my data are normally distributed. So far I have downloaded the raw counts, normalized them for GC content with the TCGAanalyze_Normalization() function from TCGAbiolinks, and log2(x + 1)-transformed them, but the distribution is right-skewed and definitely not normal, as the qqnorm() plots show.


How can I tackle this? I have been trying to figure it out for days, but I cannot find a solution.

Thanks a lot, R.

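To put a number on what the qqnorm() plots show, the sample skewness of the transformed values can be checked alongside them (about 0 for symmetric data, positive for a long right tail). The following is a minimal Python sketch using only the standard library; it is not from the thread, the function names `log2_plus_one` and `sample_skewness` are hypothetical, and the counts are made-up toy values rather than TCGA data.

```python
import math
from typing import List


def log2_plus_one(counts: List[float]) -> List[float]:
    """Apply the log2(x + 1) transform to each (normalized) count."""
    return [math.log2(c + 1) for c in counts]


def sample_skewness(values: List[float]) -> float:
    """Moment-based sample skewness g1 = m3 / m2**1.5.

    Close to 0 for symmetric data; positive when the right tail is longer.
    """
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    if m2 == 0:
        return 0.0  # all values identical: treat as no skew
    return m3 / m2 ** 1.5


if __name__ == "__main__":
    # Toy counts (made up for illustration): several zeros plus one very large value.
    toy_counts = [0, 0, 0, 1, 3, 7, 15, 1023]
    transformed = log2_plus_one(toy_counts)
    print(transformed)  # [0.0, 0.0, 0.0, 1.0, 2.0, 3.0, 4.0, 10.0]
    print(sample_skewness(transformed))  # positive: long right tail remains
```

The transform compresses large counts but leaves every zero at exactly 0, so a pile of low values plus a few highly expressed genes can still give a clearly positive skewness after log2(x + 1).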

Could you reattach the link to your plot, please?

Reply written 17 months ago by russhh

Edited! Sorry about that! :)

Reply written 17 months ago by rin30

Benn (Netherlands) wrote (17 months ago):

Some data cannot be transformed into a normal distribution. RNA-seq count data fit a Poisson distribution or a negative binomial distribution. There is a great answer here about how RNA-seq data are distributed.


RNA-seq data are typically fitted with a Poisson or negative binomial (NB) distribution. Claiming that the data actually follow those distributions is a bit strong, though.

Reply written 17 months ago by russhh

Devon Ryan (Freiburg, Germany) wrote (17 months ago):
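Whether the data "fit" or are merely "fitted to" these distributions, both models are defined by their mean-variance relationship: a Poisson model forces the variance to equal the mean, while the negative binomial adds a dispersion term, Var = mu + phi * mu^2. A small Python sketch of that relationship follows; `moment_dispersion` is a hypothetical helper giving a crude per-gene method-of-moments estimate of phi, which is not how edgeR or DESeq2 estimate dispersion (those tools share information across genes).

```python
from statistics import mean, variance
from typing import List, Optional


def poisson_variance(mu: float) -> float:
    """Poisson model: the variance equals the mean."""
    return mu


def nb_variance(mu: float, phi: float) -> float:
    """Negative binomial (mean/dispersion form): var = mu + phi * mu^2."""
    return mu + phi * mu ** 2


def moment_dispersion(counts: List[float]) -> Optional[float]:
    """Crude method-of-moments estimate of phi from one gene's replicate counts.

    Solves var = mean + phi * mean^2 for phi. Returns None when the sample
    shows no overdispersion (variance <= mean) or the mean is zero.
    """
    m = mean(counts)
    v = variance(counts)  # sample variance, n - 1 denominator
    if m == 0 or v <= m:
        return None
    return (v - m) / m ** 2


if __name__ == "__main__":
    # Made-up replicate counts for a single gene, for illustration only.
    gene_counts = [10, 20, 30, 40]
    print(mean(gene_counts), variance(gene_counts))  # variance well above the mean
    print(moment_dispersion(gene_counts))
```

A variance well above the mean, as in the toy counts, is the overdispersion that makes the NB model the usual choice over the Poisson for biological replicates.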

This is expected: RNA-seq data should be right-skewed or multimodal.


@Devon Ryan @b.nota @russhh Really helpful link and answers, thank you! The reason I want the data to be normally distributed is to assess the change between tumor and normal expression by computing a Z-score. Would that be possible, and would it have the same interpretation, if the data fit a Poisson or NB distribution?

Reply written 17 months ago by rin30
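The thread does not define the Z-score, so as an assumption, one common reading is to score each tumor value of a gene against that gene's normal samples, z = (x - mean_normal) / sd_normal, on the log2(x + 1) scale. A minimal Python sketch with a hypothetical `zscores_vs_normal` helper and toy values:

```python
from statistics import mean, stdev
from typing import List


def zscores_vs_normal(tumor: List[float], normal: List[float]) -> List[float]:
    """Z-score of each tumor value relative to one gene's normal samples.

    z = (x - mean(normal)) / sd(normal); inputs are assumed to already be
    on the log2(x + 1) scale. Needs at least two normal samples.
    """
    mu = mean(normal)
    sd = stdev(normal)  # sample standard deviation, n - 1 denominator
    if sd == 0:
        raise ValueError("normal samples have zero variance; z-score undefined")
    return [(x - mu) / sd for x in tumor]


if __name__ == "__main__":
    normal_log = [1.0, 2.0, 3.0]  # toy log2(x + 1) values for normal samples
    tumor_log = [2.0, 4.0, 0.0]   # toy log2(x + 1) values for tumor samples
    print(zscores_vs_normal(tumor_log, normal_log))  # [0.0, 2.0, -2.0]
```

The z-score itself can be computed whatever the distribution; normality matters when z is converted into a tail probability or p-value with the standard normal distribution, which is where right-skewed or overdispersed data make that interpretation unreliable.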

Try limma or edgeR for this kind of analysis.

Reply written 17 months ago by Benn
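As a pointer to how limma handles counts, its `voom()` function works on log2 counts per million, computed as log2((count + 0.5) / (library size + 1) * 10^6), and pairs them with precision weights before linear modelling. The Python sketch below reproduces only that log-CPM step, taking each sample's column total as its library size; it is not a substitute for limma or edgeR, and `log_cpm` is a hypothetical name.

```python
import math
from typing import List


def log_cpm(counts: List[List[int]]) -> List[List[float]]:
    """voom-style log2 counts per million (transform step only).

    counts[g][s] is the raw count of gene g in sample s. Each sample's
    library size is its column total, and each value becomes
    log2((count + 0.5) / (lib_size + 1) * 1e6).
    """
    n_samples = len(counts[0])
    lib_sizes = [sum(row[s] for row in counts) for s in range(n_samples)]
    return [
        [math.log2((row[s] + 0.5) / (lib_sizes[s] + 1) * 1e6) for s in range(n_samples)]
        for row in counts
    ]


if __name__ == "__main__":
    # Toy 3-gene x 2-sample count matrix, made up for illustration.
    toy = [[0, 12], [150, 300], [5000, 9000]]
    for row in log_cpm(toy):
        print([round(v, 2) for v in row])
```

The 0.5 offset keeps zero counts finite on the log scale, and dividing by library size puts samples sequenced to different depths on a comparable scale; the precision weights and moderated statistics are what limma adds on top.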