Hello !
I have a DESeq2 result dataframe-like structure in R with some NAs in the p.adj column. Strangely, those NAs are handled seemingly randomly by two extremely similar functions used to test the genes dow- or up- regulation. What is happening here ? A quick fix would be to change those NAs into "1" (never significant) but I want to understand :) Here is my code:
significant_D <- function(x){return(x$padj < 0.01 & (x$log2FoldChange) < -0.584)}
significant_O <- function(x){return(x$padj < 0.01 & (x$log2FoldChange) > 0.584)}
head(DESeq_results[whichis.na(DESeq_results$padj)),])
baseMean log2FoldChange lfcSE stat pvalue padj
<numeric> <numeric> <numeric> <numeric> <numeric> <numeric>
WBGene00021406 20.718704 -0.21384520 0.5063939 -0.42229021 0.6728132 NA
WBGene00021407 3.961096 0.66807041 1.1159760 0.59864226 0.5494115 NA
WBGene00021405 1.939649 -1.16416923 1.6007395 -0.72726966 0.4670608 NA
WBGene00021409 1.719862 1.91952086 2.2842210 0.84033940 0.4007181 NA
WBGene00235257 7.055687 0.23653150 0.8488041 0.27866442 0.7805024 NA
WBGene00015246 15.699319 0.06366663 0.6514634 0.09772863 0.9221478 NA
sumis.na(significant_D(DESeq_results)))
[1] 2558
sumis.na(significant_O(DESeq_results)))
[1] 1452
sumis.na(significant_D(DESeq_results)) & is.na(significant_O(DESeq_results)))
[1] 0
sumis.na(DESeq_results$padj))
[1] 8582
Good catch ! Thanks !