Question: Gene expression not normally distributed after log2(x + 1) transformation
rin30 wrote (23 months ago):

Hi all!

I am using raw count data from TCGA. Because I want to compute Z-scores between tumor and normal samples, I first have to make sure that my data are normally distributed. So far, I have downloaded the raw counts, normalized them for GC content with the TCGAanalyze_Normalization() function from TCGAbiolinks, and log2(x+1)-transformed them. However, the distribution is right-skewed and definitely not normal, as the qqnorm() plots show.


How could I tackle that? I have been trying to figure it out for days, but I cannot find a solution.

Thanks a lot, R.
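The situation in the question can be reproduced on simulated data. The Python sketch below is a minimal illustration, not the poster's pipeline (it skips GC normalization): it assumes one hypothetical gene whose counts across 2000 samples follow a negative binomial distribution (mean 5, dispersion 1, values chosen only for illustration), applies log2(x + 1), and compares the result with a truly normal sample using `scipy.stats.probplot`, which computes the same quantile comparison that R's `qqnorm()` plots. The correlation it reports is one rough summary of how straight the Q-Q plot is.

```python
import numpy as np
from scipy import stats


def simulate_nb_counts(rng, mean, dispersion, size):
    """Draw negative binomial counts with a given mean and dispersion.

    Uses the (mean, dispersion) form where variance = mean + dispersion * mean**2.
    NumPy's negative_binomial takes (n, p), so convert: n = 1/dispersion,
    p = n / (n + mean).
    """
    n = 1.0 / dispersion
    p = n / (n + mean)
    return rng.negative_binomial(n, p, size=size)


def qq_correlation(values):
    """Correlation between sample quantiles and theoretical normal quantiles.

    This is the comparison a normal Q-Q plot draws; values close to 1 mean
    the points lie close to a straight line.
    """
    (_, _), (_, _, r) = stats.probplot(values, dist="norm")
    return r


rng = np.random.default_rng(42)

# Hypothetical single gene: 2000 samples, low mean, high dispersion.
counts = simulate_nb_counts(rng, mean=5.0, dispersion=1.0, size=2000)
log_counts = np.log2(counts + 1)  # the log2(x + 1) transform from the question

# A genuinely normal sample of the same size, for comparison.
normal_sample = rng.normal(loc=0.0, scale=1.0, size=2000)

frac_zero = np.mean(counts == 0)
r_log = qq_correlation(log_counts)
r_normal = qq_correlation(normal_sample)

print(f"fraction of zero counts: {frac_zero:.3f}")
print(f"Q-Q correlation, log2(x+1) NB counts: {r_log:.4f}")
print(f"Q-Q correlation, normal sample:       {r_normal:.4f}")
```

The zeros all map to 0 after the transform and the values stay discrete, so log2(x + 1) alone does not turn counts like these into a normal sample.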


russhh replied: Could you reattach the link to your plot, please?

rin30 replied: Edited! Sorry about that! :)
Benn (Netherlands) wrote:

Some data cannot be transformed into a normal distribution. RNA-seq count data follow a Poisson or a negative binomial distribution. There is a great answer here about how RNA-seq data are distributed.
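To make the Poisson versus negative binomial (NB) distinction concrete, the sketch below simulates counts for one hypothetical gene under both models and compares their sample mean and variance. It assumes the (mean, dispersion) parameterisation used by many RNA-seq tools, variance = mean + dispersion × mean², and converts it to the (n, p) arguments that NumPy's `negative_binomial` expects; the mean of 100 and dispersion of 0.2 are arbitrary illustrative values.

```python
import numpy as np

rng = np.random.default_rng(0)

mean = 100.0        # hypothetical mean count for one gene
dispersion = 0.2    # hypothetical NB dispersion
n_samples = 10_000

# Poisson: the variance equals the mean.
poisson_counts = rng.poisson(lam=mean, size=n_samples)

# NB in (mean, dispersion) form: variance = mean + dispersion * mean**2.
n = 1.0 / dispersion
p = n / (n + mean)
nb_counts = rng.negative_binomial(n, p, size=n_samples)

poisson_var = poisson_counts.var()
nb_var = nb_counts.var()

print(f"Poisson: mean={poisson_counts.mean():.1f}, var={poisson_var:.1f}")
print(f"NB:      mean={nb_counts.mean():.1f}, var={nb_var:.1f}")
print(f"NB theoretical variance: {mean + dispersion * mean**2:.1f}")
```

Both samples have a mean near 100, but only the Poisson counts have a variance close to that mean; the NB counts are strongly overdispersed.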

russhh replied: RNA-seq data are typically fitted with a Poisson or negative binomial (NB) distribution. Claiming that the data actually follow those distributions is a bit strong, though.
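One way to see that these distributions are fitted to the data rather than simply assumed is to estimate a dispersion per gene. The sketch below uses a simple method-of-moments estimate, solving variance = mean + φ × mean² for φ. This is not the estimator edgeR or DESeq2 use (they rely on likelihood-based, shrunken estimates), and both genes are hypothetical simulations.

```python
import numpy as np


def moment_dispersion(counts):
    """Method-of-moments NB dispersion estimate for each gene (row).

    Solves variance = mean + phi * mean**2 for phi using the sample mean and
    sample variance. Negative estimates (variance below the mean) are clipped
    to 0, which corresponds to the Poisson case.
    """
    counts = np.asarray(counts, dtype=float)
    m = counts.mean(axis=1)
    v = counts.var(axis=1, ddof=1)
    phi = (v - m) / m**2
    return np.clip(phi, 0.0, None)


rng = np.random.default_rng(1)
n_samples = 500

# Hypothetical genes: one simulated as Poisson, one as NB with dispersion 0.3.
poisson_gene = rng.poisson(lam=50.0, size=n_samples)
n = 1.0 / 0.3
nb_gene = rng.negative_binomial(n, n / (n + 50.0), size=n_samples)

counts = np.vstack([poisson_gene, nb_gene])  # genes x samples
phi = moment_dispersion(counts)
print(phi)
```

An estimate near 0 is consistent with Poisson noise, while a clearly positive estimate points to extra-Poisson variation that an NB model tries to capture.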

Devon Ryan (Freiburg, Germany) wrote:

This is expected: RNA-seq data are typically right-skewed or multimodal.


rin30 replied: @Devon Ryan @b.nota @russhh Really helpful link and answers, thank you! The reason I want the data to be normally distributed is that I want to assess the change between tumor and normal expression by computing a Z-score. Would that be possible, and would it have the same interpretation, if the data follow a Poisson or NB distribution?
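For reference, one common way to define such a Z-score is to standardize each tumor sample against the normal samples of the same gene on the log scale: z = (x_tumor - mean_normal) / sd_normal. The function below is a hypothetical sketch under that assumed definition, not a TCGAbiolinks function; it expects genes × samples arrays that have already been normalized and log2(x + 1)-transformed. Because the standard deviation comes from the normal samples only, it is unstable when there are few of them, and the Z-score by itself is not a test of statistical significance.

```python
import numpy as np


def tumor_vs_normal_z(log_tumor, log_normal):
    """Z-score each tumor sample against the normal samples, gene by gene.

    Both inputs are genes x samples arrays on the same (e.g. log2(x + 1))
    scale. For every gene, subtract the normal-sample mean and divide by the
    normal-sample standard deviation. Genes with zero spread in the normals
    get NaN instead of a division by zero.
    """
    log_tumor = np.asarray(log_tumor, dtype=float)
    log_normal = np.asarray(log_normal, dtype=float)
    mu = log_normal.mean(axis=1, keepdims=True)
    sd = log_normal.std(axis=1, ddof=1, keepdims=True)
    with np.errstate(divide="ignore", invalid="ignore"):
        z = (log_tumor - mu) / sd
    z[np.broadcast_to(sd == 0, z.shape)] = np.nan
    return z


# Tiny hypothetical example: 2 genes, 3 normal and 2 tumor samples.
normal = np.log2(np.array([[10, 12, 14], [5, 5, 5]]) + 1)
tumor = np.log2(np.array([[40, 11], [6, 5]]) + 1)
z = tumor_vs_normal_z(tumor, normal)
print(z)
```

In this toy input the first tumor sample of gene 1 lies far above the normals, the second lies slightly below their mean, and gene 2 gets NaN because its normal samples have no spread.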


Benn replied: Try limma or edgeR for this kind of analysis.
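limma and edgeR are R/Bioconductor packages, so they are not reproduced here. As a hedged illustration of the first step of the limma-voom workflow, the sketch below computes log-counts per million with the offsets used by voom, log2((count + 0.5) / (library size + 1) × 10⁶). voom then also estimates a mean-variance trend and per-observation precision weights, which this sketch does not do; the function name and the count matrix are hypothetical.

```python
import numpy as np


def voom_style_log_cpm(counts, lib_size=None):
    """log2 counts per million with voom-style offsets.

    counts: genes x samples array of raw counts.
    lib_size: per-sample library sizes; defaults to the column sums.
    Adds 0.5 to each count and 1 to each library size so that zero counts
    still give a finite value.
    """
    counts = np.asarray(counts, dtype=float)
    if lib_size is None:
        lib_size = counts.sum(axis=0)
    lib_size = np.asarray(lib_size, dtype=float)
    # lib_size has one entry per sample and broadcasts across the columns.
    return np.log2((counts + 0.5) / (lib_size + 1.0) * 1e6)


# Hypothetical 3 genes x 2 samples count matrix.
counts = np.array([[0, 10], [100, 200], [900, 790]])
log_cpm = voom_style_log_cpm(counts)
print(log_cpm.round(3))
```

In practice the limma-voom or edgeR workflow would be run in R; this snippet only shows what the log-CPM input to limma looks like.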

Powered by Biostar version 2.3.0