Question: DESeq2 rlogTransformed count data have negative values. Why are they negative?
okna0215 wrote (6 weeks ago):

Hello all. I am running an RNA-seq analysis in R, and my goal is to get rlog-transformed count data. However, some of the rlog-transformed values are negative. Does anyone know why? I would appreciate an explanation.

Here is my script:

library("DESeq2")

# Read the count matrix (genes in rows, samples a..p in columns)
countdata <- as.matrix(read.table("all_count.txt", header = TRUE, row.names = 1))

# One condition per sample, a through p (no replicates)
condition <- factor(letters[1:16])
coldata <- data.frame(row.names = colnames(countdata), condition)

dds <- DESeqDataSetFromMatrix(countData = countdata, colData = coldata, design = ~condition)
dds <- DESeq(dds)

# rlog() returns log2-scale values; compute it once and export the matrix
rld <- rlog(dds, blind = FALSE)
t1 <- assay(rld)
write.csv(t1, file = "rld.csv")

My raw count data looks like this:

gene_id            a    b    c    d    e    f    g    h    i    j    k    l    m    n    o    p
ENSG00000223972    0    5    0    1    0    0    3    0    7    2    0    0    0    6    0    0
ENSG00000227232  738  687  817  785  862  920  616  828  718  533  338  718  563  622  241  402
ENSG00000278267   35   45   44   28   25   48   32   27   23   15   11   21    3   22   40   24
ENSG00000243485    0    0    0    2    0    0    0    0    0    2    0    0    0    0    0    5

Here are the normalized counts from counts(dds, normalized = TRUE) (these are size-factor-normalized counts, not FPKM):

gene_id  a  b  c  d  e  f  g  h  i  j  k  l  m  n  o  p
ENSG00000223972  0  4.288951195  0  0.886495015  0  0  2.634664669  0  6.740607986  1.825436668  0  0  0  5.552312428  0  0
ENSG00000227232  668.8930875  589.3018942  718.6769008  695.8985868  812.6102698  883.1227192  540.9844787  749.7218024  691.3937905  486.4788721  348.1203536  682.8221808  560.7432175  575.5897217  349.3737234  574.2096037
ENSG00000278267  31.7225719  38.60056075  38.70475353  24.82186042  23.56758323  46.07596796  28.10308981  24.44745008  22.14771195  13.69077501  11.32936062  19.97112228  2.987974516  20.3584789  57.98733999  34.28117037
ENSG00000243485  0  0  0  1.77299003  0  0  0  0  0  1.825436668  0  0  0  0  0  7.141910494
ENSG00000284332  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0

Here is my rlog-transformed data:

gene_id  a  b  c  d  e  f  g  h  i  j  k  l  m  n  o  p
ENSG00000223972  -0.136494061  0.473993602  -0.136927121  0.031100344  -0.135908179  -0.135632089  0.286572765  -0.136508589  0.692740738  0.177626545  -0.134517951  -0.135774897  -0.135056306  0.593107085  -0.128129828  -0.128442585
ENSG00000227232  9.355356576  9.209612781  9.438295885  9.401053408  9.58080676  9.677798069  9.111648865  9.487270931  9.393542263  8.990608069  8.614883247  9.379137129  9.152689426  9.182616956  8.61918245  9.17988811
ENSG00000278267  4.84784725  5.034954118  5.037459351  4.621104012  4.574230631  5.206904356  4.734912851  4.607316964  4.518616217  4.107979334  3.959359318  4.427255644  3.127111744  4.444027165  5.432977714  4.91932701
ENSG00000243485  -1.132208473  -1.13259963  -1.132423133  -0.82661946  -1.129384537  -1.127751285  -1.132434672  -1.132215675  -1.127467001  -0.819432811  -1.121395624  -1.128593051  -1.124421178  -1.131058152  -1.090539496  -0.28826095
ENSG00000284332  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0

I would like to understand why the rlog-transformed values look like this. A clear explanation would be much appreciated.

Thank you.

dsull (UCLA) wrote (6 weeks ago):

Because it's a log transform. Taking the log of any positive number less than 1 (which some of your counts will be post-normalization) will get you a negative number.
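A minimal base-R illustration of this point, using rlog values copied from the question's table: log2 of any number strictly between 0 and 1 is negative, so a negative rlog value x corresponds to 2^x < 1 on the count scale. Reading 2^x as a "count" is only approximate, because rlog also shrinks values; the names in rlog_vals are just labels added for readability.

```r
# log2 of values in (0, 1) is negative; log2(1) is exactly 0
x <- c(0.25, 0.5, 1, 2)
log2(x)  # -2 -1  0  1

# rlog values taken from the question's table
rlog_vals <- c(
  ENSG00000243485_a = -1.132208473,  # zero raw count in sample a
  ENSG00000243485_p = -0.28826095,   # 5 raw counts in sample p
  ENSG00000227232_a = 9.355356576    # 738 raw counts in sample a
)

# Back-transform to an approximate normalized-count scale:
# negative rlog values map to numbers below 1
round(2^rlog_vals, 3)
```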


Thank you for the kind reply. Do you mean that after normalization some counts fall between 0 and 1 (0 < x < 1), and that taking the log of those numbers gives negative values?

As far as I understand, DESeq() normalizes the counts, and I can get the normalized counts with counts(dds, normalized = TRUE). But even where a normalized count is exactly 0, the corresponding rlog value is sometimes negative. (I edited the post to include the normalized counts.)

If you have any idea, please let me know.

Written 6 weeks ago by okna0215.
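A quick way to check what the normalized values in the question are, sketched in base R: counts(dds, normalized = TRUE) divides each sample's raw counts by that sample's size factor, so raw / normalized should give one constant per sample, identical for every gene. FPKM would additionally divide by gene length, so that ratio would differ between genes. The numbers below are copied from the two genes in the question that have no zero counts; the variable names are only illustrative.

```r
# Raw counts for the two genes with no zeros, copied from the question
raw <- rbind(
  ENSG00000227232 = c(738, 687, 817, 785, 862, 920, 616, 828,
                      718, 533, 338, 718, 563, 622, 241, 402),
  ENSG00000278267 = c(35, 45, 44, 28, 25, 48, 32, 27,
                      23, 15, 11, 21, 3, 22, 40, 24)
)
# The corresponding values from counts(dds, normalized = TRUE)
normed <- rbind(
  ENSG00000227232 = c(668.8930875, 589.3018942, 718.6769008, 695.8985868,
                      812.6102698, 883.1227192, 540.9844787, 749.7218024,
                      691.3937905, 486.4788721, 348.1203536, 682.8221808,
                      560.7432175, 575.5897217, 349.3737234, 574.2096037),
  ENSG00000278267 = c(31.7225719, 38.60056075, 38.70475353, 24.82186042,
                      23.56758323, 46.07596796, 28.10308981, 24.44745008,
                      22.14771195, 13.69077501, 11.32936062, 19.97112228,
                      2.987974516, 20.3584789, 57.98733999, 34.28117037)
)
colnames(raw) <- letters[1:16]
colnames(normed) <- letters[1:16]

# Implied size factor for each sample, computed separately per gene
size_factors <- raw / normed
round(size_factors, 4)

# Largest relative disagreement between the two genes; a value near zero
# means the same per-sample divisor was applied to both genes
max(abs(size_factors[1, ] - size_factors[2, ]) / size_factors[1, ])
```

Because the two rows agree to many decimal places, the table is consistent with plain size-factor normalization rather than a length-dependent unit such as FPKM.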

This is because of how rlog works.

The r in rlog stands for "regularised". Roughly speaking, you can think of rlog as computing rlog(X) = log2(X + a) for any count X. In many applications people just use a = 1, but rlog effectively chooses a more appropriate a for each gene, so you can end up with an a less than one. If a were 0.5, for example, then rlog(0) would be log2(0.5) = -1.

Written 6 weeks ago by i.sudbery.
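A minimal base-R sketch of the pseudocount picture in the reply above, taking the log base 2 (which is what makes rlog(0) = -1 when a = 0.5). The helper log2_pseudo() and the toy counts are hypothetical and only show why log2(0 + a) is negative whenever a < 1. DESeq2's rlog() itself does not literally add a per-gene pseudocount: for low-count genes it shrinks each sample's log2 value toward the gene's average across samples, so this sketch will not reproduce the exact numbers in the question's rlog table.

```r
# Hypothetical helper: log2 with a pseudocount a (a mental model, not rlog)
log2_pseudo <- function(x, a = 1) log2(x + a)

toy_counts <- c(0, 1, 5, 100)

log2_pseudo(toy_counts, a = 1)    # zero maps to log2(1) = 0
log2_pseudo(toy_counts, a = 0.5)  # zero maps to log2(0.5) = -1
```

With a = 1 a zero count can never go below 0, while any a below 1 pushes zero counts into negative territory, which is the pattern visible for the low-count genes in the question.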

Thank you, i.sudbery. Your explanation makes sense, and now I understand. :)

Written 6 weeks ago by okna0215.