Question: how to use log transformation with read count data?
1
M K510 wrote:

Hi All,

I have a data set of ~20,000 observations, and when I plot a histogram of the data, I find it is skewed to the right. I tried a log transformation, but I got infinite values because many of the values are equal to zero. Is there a way to use the log transformation without removing these zeros? They are important in my analysis. Or is there another transformation I could use for this data?

R • 13k views
modified 6.1 years ago by Nicolas Rosewick9.2k • written 6.1 years ago by M K510
2

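We add 1 to the RNA-seq counts for all transcripts before the log transformation so that zero counts do not become -Inf (log(0) is undefined). With the pseudocount, zeros map to 0 and no transformed value is negative.

Because the same value is added to every transcript, this should not materially affect the results.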
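As a minimal sketch of the point above, the snippet below shows what happens to zeros with and without a pseudocount of 1. The `counts` vector is made up for illustration and is not the poster's data.

```r
# Hypothetical right-skewed count vector containing zeros
counts <- c(0, 0, 1, 3, 10, 250, 12000)

# Plain log: the zero counts become -Inf
raw_log <- log2(counts)

# Pseudocount of 1: zeros map to 0 and every value stays finite
log_counts <- log2(counts + 1)

# Base R's log1p(x) computes the natural log of (1 + x) directly
ln_counts <- log1p(counts)

print(raw_log)
print(log_counts)
```

`log1p` is the natural-log equivalent of the same idea; which base to use is a matter of convention, and log2 is common for expression data.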

3
Nicolas Rosewick9.2k wrote:

You should use the regularized log or the variance stabilizing transformation. Both transformations are implemented in the DESeq2 package. Check the vignette, it's very well made: http://www.bioconductor.org/packages/release/bioc/html/DESeq2.html

```
library(DESeq2)
rld <- rlog(dds)                               # regularized log transformation
vsd <- varianceStabilizingTransformation(dds)  # variance stabilizing transformation
```

where dds is a DESeqDataSet object. You can build it from a count matrix or from htseq-count output (again, check the vignette).
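For the count-matrix route, a rough sketch might look like the following. The simulated `counts` matrix, the sample names, and the `condition` column are all assumptions made up for illustration; the DESeq2 calls are wrapped in a check so the snippet only uses the package if it is installed.

```r
set.seed(1)

# Hypothetical simulated data: 1000 genes x 6 samples of negative binomial counts
n_genes <- 1000
n_samples <- 6
gene_means <- rexp(n_genes, rate = 1 / 100)
counts <- matrix(
  rnbinom(n_genes * n_samples, mu = gene_means, size = 5),
  nrow = n_genes,
  dimnames = list(paste0("gene", seq_len(n_genes)),
                  paste0("sample", seq_len(n_samples)))
)

# Hypothetical sample table; row names must match the columns of counts
coldata <- data.frame(
  condition = factor(rep(c("A", "B"), each = 3)),
  row.names = colnames(counts)
)

if (requireNamespace("DESeq2", quietly = TRUE)) {
  dds <- DESeq2::DESeqDataSetFromMatrix(countData = counts,
                                        colData = coldata,
                                        design = ~ condition)
  # blind = TRUE ignores the design when estimating the transformation
  vsd <- DESeq2::varianceStabilizingTransformation(dds, blind = TRUE)
  rld <- DESeq2::rlog(dds, blind = TRUE)
  # Transformed values are stored in the assay slot
  print(head(SummarizedExperiment::assay(vsd)))
}
```

Both transformations accept zeros in the input, so no manual pseudocount is needed on this route.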

When I try rlog or varianceStabilizingTransformation I get the following error:

Error in DESeqDataSet(se, design = design, ignoreRank) :
some values in assay are not integers

My input is RNA-seq count data (a matrix) where some values are 0 and the rest are positive integers. Can you tell me why I am getting this error? Thanks

Could you post the commands you used, please?
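Without seeing the commands, one thing worth checking is whether the matrix really contains only whole numbers, since that is what the error message complains about. The sketch below uses a small made-up matrix in which two entries are fractional; the object names are hypothetical.

```r
# Hypothetical matrix: two entries are fractional, as can happen with
# estimated or normalized counts
counts <- matrix(
  c(0, 12, 3.5, 7, 0, 41.2),
  nrow = 3,
  dimnames = list(paste0("gene", 1:3), c("s1", "s2"))
)

# TRUE where a value is a whole number
is_whole <- counts == round(counts)
print(all(is_whole))

# Row and column positions of the offending entries
print(which(!is_whole, arr.ind = TRUE))

# Assumption: rounding is only reasonable if these are estimated counts,
# not normalized values such as FPKM or TPM
counts_int <- round(counts)
storage.mode(counts_int) <- "integer"
```

If the fractional values turn out to be normalized expression values rather than counts, rounding them is not a fix; DESeq2 expects raw counts as input.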

1
_r_am31k wrote:

I'd suggest using a pseudocount. Adding a small value such as 0.0001 to the actual values would make very little difference to the log-transformed data.

3

I think 1 is a better (and more common) number to add. The natural log of 0.0001 is about -9.2, so that is probably not what you want for your zero counts.

1

I should've thought of that. I agree, 1 is much better!

Do you mean adding a small value before the log transformation, as in the examples below?

tran_data<- log(data+0.0001)

tran_data<- log2(data+1)

because log2(1) is zero.
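To make the difference between the two pseudocounts concrete, here is a small comparison on a made-up vector (the values are assumptions for illustration only):

```r
# Hypothetical counts, including a zero
counts <- c(0, 1, 10, 100)

# Tiny pseudocount with the natural log: the zero lands at about -9.21
small_pc <- log(counts + 0.0001)

# Pseudocount of 1 with log2: the zero lands exactly at 0
unit_pc <- log2(counts + 1)

# Distance between a count of 0 and a count of 1 on each scale
gap_small <- small_pc[2] - small_pc[1]
gap_unit <- unit_pc[2] - unit_pc[1]

print(rbind(small_pc, unit_pc))
print(c(gap_small = gap_small, gap_unit = gap_unit))
```

With the tiny pseudocount, the step from 0 to 1 is larger than the step from 1 to 100, which exaggerates differences among low counts. With a pseudocount of 1 the step from 0 to 1 is a single log2 unit.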