Deseq Infinite In Logfc And Na For P Value
1
0
Entering edit mode
11.0 years ago
k.nirmalraman ★ 1.1k

I am performing DE analysis using DESeq package. The single end illumina reads are bowtie aligned. I used the raw counts for DE as required by DESeq. I got the below shown normalized counts from the following code

temp = t(sizeFactors(cds))  
sizematrix<-matrix(data=temp, nrow=nrow(countTable), ncol=ncol(temp), byrow=TRUE)
scaledcounts = countTable/sizematrix

This one of the feature I am showing here for an example. (S1_1,S1_2,S1_3 are replicates and similarly S2)

`S1_1            S1_2          S1_3         S2_1      S2_2        S2_3`
`0                    0                0        5.87      12.60         21.85

The corresponding result after negative binomial Test from DESeq, I get the below data.

`Base Mean         BaseMeanA     BaseMeanB      FoldChange   log2FoldChange      pVal           pAdj
`6.72205848463734    0         13.4                  Inf               Inf                         0.0004       0.00136

My question is in the given case, I would assume, S2 expresses this gene and S1 does not. In my filtering for significant genes, my biologist is more concerned about the log fold change besides the PValue. So filtering criteria would look something like Pval < 0.0001 & abs(log2Foldchange) > 1.5.

In this case, I would not be able to deal with the infinite value. Would it make sense to make the count data non-zero (say by simply adding a constant value to all count data (constant, not too large)) and then perform DE analysis.

Another extension question would be, how to evaluate the quality of each sample based on count data... in terms of any sample outliers, etc!

Thanks!

differential-expression differential-expression rna rna-seq illumina • 5.5k views
ADD COMMENT
1
Entering edit mode
11.0 years ago

Inf as a fold change should not give you any trouble when your criterion is abs(log2Foldchange) > 1.5. If you try this in R, you should get the following:

abs(Inf) > 1.5 [1] TRUE

If a gene does not have any reads in the control condition and is induced in your condition S2, the log2Foldchange will always be Inf, no matter whether you have only a few or many reads in S2. The p-value, however, will differ, because many reads are less likely to be obtained by chance than just a few. Here is one example from an analysis that I did with DESeq:

Gene1 ctrl1: 0 ctrl2: 0
ctrl3: 0
treated1: 25
treated2: 30
treated3: 51
pVal: 0.000158773 pAdj: 0.000545344 log2FoldChange: Inf

Gene2 ctrl1: 0 ctrl2: 0
ctrl3: 0
treated1: 6
treated2: 8
treated3: 7
pVal: 0,339767037 pAdj: 0.524389436 log2FoldChange: Inf

Gene1 has much lower p-values although the fold change is Inf in both cases.

ADD COMMENT
0
Entering edit mode

Thank you,... That was really helpful.

ADD REPLY

Login before adding your answer.

Traffic: 2735 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6