Question: RNAseq expression data log2 transformed has negative values.
2
juncheng (180), Köln, wrote 4.8 years ago:

Hi,

I have FPKM-normalized RNA-seq gene expression data. Many genes in some groups have expression values below 1, which means that after log2 transformation the values are negative. Some of the values are even as low as -1000, which is really annoying.

How do you usually treat these values? In my experience, log-transformed RNA-seq expression data never has negative values.
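To make the arithmetic concrete, here is a minimal sketch of why any value below 1 ends up negative after log2. The FPKM numbers are made up for illustration and are not taken from the data in question.

```python
import math

# Hypothetical FPKM values, chosen only to illustrate the arithmetic.
fpkm_values = [8.0, 1.0, 0.5, 0.001]

# log2 is positive above 1, zero at exactly 1, and negative below 1.
log2_values = [math.log2(v) for v in fpkm_values]

for value, logged in zip(fpkm_values, log2_values):
    print(f"FPKM {value:>6} -> log2 {logged:.2f}")
```

The closer a value is to 0, the more negative its log2 becomes, so very small FPKMs give large negative numbers.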

 

rna-seq • 15k views
modified 4.8 years ago by dariober (9.9k) • written 4.8 years ago by juncheng (180)

Update October 15, 2018

Just to clarify something for others arriving here: logging RPKM or FPKM values does not make these any better for conducting statistical comparisons. With no cross-sample normalisation used when producing RPKM / FPKM, these units are not suitable for differential expression.

Please read this: A comprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis

"The Total Count and RPKM [FPKM] normalization methods, both of which are still widely in use, are ineffective and should be definitively abandoned in the context of differential analysis."

Also, by Harold Pimentel: What the FPKM? A review of RNA-Seq expression units

"The first thing one should remember is that without between sample normalization (a topic for a later post), NONE of these units are comparable across experiments. This is a result of RNA-Seq being a relative measurement, not an absolute one."
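A toy sketch of the "relative measurement" point, under simplifying assumptions: reads are assigned exactly in proportion to abundance times length, both samples are sequenced to the same depth, and the gene names, abundances and depth are hypothetical. RPKM here is reads divided by (length in kb times total reads in millions).

```python
def expected_reads(abundances, lengths_kb, total_reads):
    """Idealised sequencing: reads proportional to abundance * length."""
    weights = {g: abundances[g] * lengths_kb[g] for g in abundances}
    total_weight = sum(weights.values())
    return {g: total_reads * w / total_weight for g, w in weights.items()}

def rpkm(reads, lengths_kb):
    """Reads per kilobase of transcript per million mapped reads."""
    total_millions = sum(reads.values()) / 1e6
    return {g: reads[g] / (lengths_kb[g] * total_millions) for g in reads}

# Hypothetical genes: geneA and geneB are unchanged, only geneC goes up.
lengths_kb = {"geneA": 1.0, "geneB": 1.0, "geneC": 1.0}
sample1 = {"geneA": 10, "geneB": 10, "geneC": 10}
sample2 = {"geneA": 10, "geneB": 10, "geneC": 100}
depth = 3_000_000  # same sequencing depth in both samples

rpkm1 = rpkm(expected_reads(sample1, lengths_kb, depth), lengths_kb)
rpkm2 = rpkm(expected_reads(sample2, lengths_kb, depth), lengths_kb)

for gene in lengths_kb:
    print(f"{gene}: sample1 RPKM {rpkm1[gene]:.0f}, sample2 RPKM {rpkm2[gene]:.0f}")
```

In this toy case geneA has the same absolute abundance in both samples, yet its RPKM drops in sample2 because geneC takes a larger share of the reads. Logging the values does not change that.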

modified 5 months ago • written 5 months ago by Kevin Blighe (39k)
3
dariober (9.9k), WCIP | Glasgow | UK, wrote 4.8 years ago:

In these cases it's not unusual to add a pseudocount of 1 to all counts, so that genes with a count of 0 are still 0 after the log. Obviously you have to assume that adding 1 doesn't bias the initial non-zero counts much.

(EDIT: Apologies, this was meant to be a comment to question of how to treat genes with 0 counts)
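For reference, the pseudocount idea can be sketched as follows. The function name `log2_pseudocount` and the example counts are assumptions made up for illustration.

```python
import math

def log2_pseudocount(values, pseudocount=1.0):
    """Return log2(value + pseudocount) for each value."""
    return [math.log2(v + pseudocount) for v in values]

# Hypothetical counts, including a zero.
counts = [0, 1, 3, 1023]
logged = log2_pseudocount(counts)

# 0 stays at 0.0 because log2(0 + 1) = 0, and nothing goes negative.
print(logged)
```

With a pseudocount of 1 every non-negative input gives a non-negative output, at the cost of compressing differences between small values.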

modified 4.8 years ago • written 4.8 years ago by dariober (9.9k)

Just for reference, a value of 0.25 (by default; it's the prior.count option) is added by edgeR before calculating log2 RPKM or CPM values. Whether this is needed or not depends on one's goals.
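A rough comparison of how the size of the offset matters, assuming a plain log2(value + prior). This is not edgeR's actual computation: as far as I know edgeR scales the prior count in proportion to library size. The function name and the values are hypothetical.

```python
import math

def log2_with_prior(values, prior):
    """Plain additive offset: log2(value + prior). Not edgeR's method."""
    return [math.log2(v + prior) for v in values]

# Hypothetical expression values: a zero, a low value, a high value.
values = [0.0, 0.75, 100.0]
by_prior = {prior: log2_with_prior(values, prior) for prior in (0.25, 1.0)}

for prior, logged_values in by_prior.items():
    formatted = ", ".join(f"{x:.3f}" for x in logged_values)
    print(f"prior={prior}: {formatted}")
```

A prior below 1 still gives negative values for inputs near zero, while high values are barely affected by either choice.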

written 4.8 years ago by Devon Ryan (88k)

Thanks, it seems that a log2(x+1) transform is common. Adding 0.25 also seems fine, depending on the data.

modified 4.8 years ago • written 4.8 years ago by juncheng (180)

While it may be common to log(x+1) for RPKM / FPKM, this does not mean that it is statistically valid if your goal is differential expression analysis.

written 5 months ago by Kevin Blighe (39k)
1
Devon Ryan (88k), Freiburg, Germany, wrote 4.8 years ago:

You treat negative values the same as positive ones. There's no reason to expect FPKM or TPM or CPM or even normalized counts to be greater than 1.

written 4.8 years ago by Devon Ryan (88k)

Thanks. How about the 0 values? After the log transform they become -Inf. Should these just be treated as NAs?

written 4.8 years ago by juncheng (180)

That's what I'd do.
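A minimal sketch of that approach in plain Python, using NaN as the stand-in for NA. The helper name `log2_or_nan` and the values are made up for illustration.

```python
import math

def log2_or_nan(value):
    """log2 for positive values, NaN (missing) for zeros."""
    # Python's math.log2(0) raises ValueError rather than returning -inf,
    # so zeros are mapped to NaN before taking the log.
    return math.log2(value) if value > 0 else math.nan

# Hypothetical FPKM values for one gene across three samples.
fpkm = [0.0, 0.5, 4.0]
logged = [log2_or_nan(v) for v in fpkm]

# Downstream summaries then have to skip the missing entries explicitly.
observed = [x for x in logged if not math.isnan(x)]
mean_observed = sum(observed) / len(observed)
print(logged, mean_observed)
```

Note that the negative value (-1.0 here) is kept and treated like any other number; only the zeros are dropped.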
 

written 4.8 years ago by Devon Ryan (88k)

Thanks. I will do that.

written 4.8 years ago by juncheng (180)