RNAseq expression data log2 transformed has negative values.
8.2 years ago
juncheng ▴ 200

Hi,

I have FPKM-normalized RNA-seq gene expression data. Many genes in some groups have expression values less than 1, which means that after a log2 transform the values are negative. Some of the values are even as low as -1000, which is really annoying.

How do you usually treat these values? In my experience, log-transformed RNA-seq expression data never have negative values.

RNA-Seq • 26k views

Update October 15, 2018

Just to clarify something for others arriving here: logging RPKM or FPKM values does not make these any better for conducting statistical comparisons. With no cross-sample normalisation used when producing RPKM / FPKM, these units are not suitable for differential expression.

Please read this: A comprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis

The Total Count and RPKM [FPKM] normalization methods, both of which are still widely in use, are ineffective and should be definitively abandoned in the context of differential analysis.

Also, by Harold Pimentel: What the FPKM? A review of RNA-Seq expression units

The first thing one should remember is that without between sample normalization (a topic for a later post), NONE of these units are comparable across experiments. This is a result of RNA-Seq being a relative measurement, not an absolute one.

8.2 years ago

In these cases it's not unusual to add a pseudocount of 1 to all counts, so that genes with a count of 0 map back to 0 after the log transform (log2(0 + 1) = 0). Obviously you have to assume that adding 1 doesn't bias the initial non-zero counts much.

(EDIT: Apologies, this was meant to be a comment to question of how to treat genes with 0 counts)
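
To make the pseudocount idea concrete, here is a minimal sketch in Python with NumPy. The FPKM values are made up purely for illustration; the point is only to show that log2(x + 1) maps zeros to 0, never goes negative, and shifts small values far more than large ones.

```python
import numpy as np

# Hypothetical expression values for one gene across five samples (illustrative only).
fpkm = np.array([0.0, 0.25, 1.0, 3.0, 1023.0])

# log2(x + 1): zeros map to 0 and every result is non-negative.
log_fpkm = np.log2(fpkm + 1.0)

# How much the pseudocount changes each non-zero value relative to a plain log2.
# The shift is large for small values and nearly vanishes for large ones.
shift = log_fpkm[1:] - np.log2(fpkm[1:])

print(log_fpkm)
print(shift)
```

Whether that distortion of low values is acceptable depends on what the transformed values are used for.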


Just for reference, a value of 0.25 (the default of the prior.count option) is added by edgeR before calculating log2 RPKM or CPM values. Whether this is needed or not depends on one's goals.
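
As a rough illustration of what a small prior count does, here is a simplified Python sketch of a log2 counts-per-million calculation. This is not edgeR's implementation: the function name `log2_cpm`, the toy count matrix, and the way the prior count is simply added to every count are all assumptions made for the example, so the numbers should not be expected to match edgeR's output.

```python
import numpy as np

def log2_cpm(counts, prior_count=0.25):
    """Simplified log2 CPM with a pseudocount (hypothetical helper, not edgeR's code)."""
    counts = np.asarray(counts, dtype=float)
    lib_size = counts.sum(axis=0)  # total reads per sample (one column per sample)
    # Add the prior count to every entry, then scale to counts per million.
    return np.log2((counts + prior_count) / lib_size * 1e6)

# Toy matrix: rows are genes, columns are samples; each sample has 1,000,000 reads.
counts = np.array([
    [0, 10],
    [999_990, 999_990],
    [10, 0],
])

logcpm = log2_cpm(counts)
print(logcpm)
```

With a prior count, zero counts give a finite (negative) value instead of -inf, and they still rank below genes with observed reads.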


Thanks, it seems that a log2(x+1) transform is common. Adding 0.25 also seems fine, depending on the data.


While it may be common to log(x+1) for RPKM / FPKM, this does not mean that it is statistically valid if your goal is differential expression analysis.

8.2 years ago

You treat negative values the same as positive ones. There's no reason to expect FPKM or TPM or CPM or even normalized counts to be greater than 1.


Thanks. How about the 0 values? After a log transform they become -Inf; should these just be treated as NAs?


That's what I'd do.
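
For anyone wanting to see this in practice, here is a minimal sketch in Python with NumPy, using made-up values. It keeps the negative log2 values as they are and only turns the -inf produced by zeros into NaN (the NumPy counterpart of NA), so that NaN-aware functions skip them.

```python
import numpy as np

# Hypothetical expression values, including a zero and a value below 1.
fpkm = np.array([0.0, 0.5, 1.0, 8.0])

# Silence the divide-by-zero warning NumPy emits for log2(0).
with np.errstate(divide="ignore"):
    log_fpkm = np.log2(fpkm)  # 0 becomes -inf, 0.5 becomes -1

# Treat the infinite values as missing; negative values are left untouched.
log_fpkm[np.isinf(log_fpkm)] = np.nan

# NaN-aware summary: the missing entry is ignored.
mean_expr = np.nanmean(log_fpkm)

print(log_fpkm)
print(mean_expr)
```

Note that dropping zeros this way discards the information that the gene was not detected in that sample, which may or may not matter for the analysis at hand.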


Thanks. I will do that.
