DESeq2 and NA adj.pvalue
3.3 years ago

Hi, could you explain to me how it is possible to have large fold changes together with adj.pvalues equal to "NA"? The counts table doesn't show any missing or aberrant values. Thanks!

3.3 years ago

DESeq2 gives a few reasons in their documentation.

Note on p-values set to NA: some values in the results table can be set to NA for one of the following reasons:

  • If within a row, all samples have zero counts, the baseMean column will be zero, and the log2 fold change estimates, p value and adjusted p value will all be set to NA.
  • If a row contains a sample with an extreme count outlier then the p value and adjusted p value will be set to NA. These outlier counts are detected by Cook’s distance. Customization of this outlier filtering and description of functionality for replacement of outlier counts and refitting is described below
  • If a row is filtered by automatic independent filtering, for having a low mean normalized count, then only the adjusted p value will be set to NA. Description and customization of independent filtering is described below
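The three cases above can usually be told apart from the results table itself. Below is a minimal sketch in base R, assuming the results have been converted with `as.data.frame()` and carry the usual `baseMean`, `pvalue` and `padj` columns. `na_reason()` is a hypothetical helper written for this illustration, not a DESeq2 function, and the toy values are made up.

```r
# Hypothetical helper (not part of DESeq2): label why padj is NA,
# following the three cases in the documentation quoted above.
na_reason <- function(res) {
  reason <- rep("not NA", nrow(res))
  # padj missing but pvalue present: removed by independent filtering
  reason[is.na(res$padj) & !is.na(res$pvalue)] <- "independent filtering"
  # pvalue missing although the gene has counts: most likely a Cook's outlier
  reason[is.na(res$pvalue) & res$baseMean > 0] <- "Cook's outlier"
  # no counts in any sample
  reason[res$baseMean == 0] <- "all zero counts"
  reason
}

# Toy table standing in for as.data.frame(results(dds)); values are made up.
toy <- data.frame(
  baseMean = c(0, 10.7, 2.1, 250),
  pvalue   = c(NA, NA, 0.0004, 0.001),
  padj     = c(NA, NA, NA, 0.01)
)
print(table(na_reason(toy)))
```

Checking whether `pvalue` is also NA is the quickest way to separate outlier flagging from independent filtering.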

Thanks, but in my case not all samples have zero counts, only those from one condition:

log2FC            adj.pValue  Cond1             Cond2             Cond3             Mock1  Mock2  Mock3
6.88709872531625  NA          5.31277170703943  27.6763981516444  31.2711272029594  0      0      0

So I was expecting a low adj.pValue, since there is a clear difference between the Mock and Cond counts. Maybe the problem comes from "Cond1", which has a lower count than "Cond2" and "Cond3".


That only addresses point 1 in their documentation. See @i.sudbery's answer on how points 2 and 3 factor in.

3.3 years ago

There are several stages at which an NA might be included in the adj.pvalues column in DESeq2 output.

The two most likely are low expression filtering and outlier exclusion.

  1. Independent filtering: DESeq2 automatically finds the mean expression level at which to filter in order to maximise the number of genes that pass the specified level of alpha (i.e. the FDR). Any genes whose expression level is below this threshold have their adj.pvalues set to NA, but they still have pvalues. This can be disabled by setting independentFiltering=FALSE in the call to results(). For more details see this part of the DESeq2 user guide.
  2. Outlier exclusion: DESeq2 tests each sample in each gene for outliers using Cook's distance. Genes that have outliers will have their adj.pvalues set to NA, and their pvalues as well, in contrast to low count filtering. This can be disabled with cooksCutoff=FALSE in the call to results(). For more details see this part of the DESeq2 user guide.
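As a rough illustration of the two switches, the sketch below runs DESeq2 on simulated counts from `makeExampleDESeqDataSet()` (not the poster's data) and compares how many `padj` values are NA under each setting. The object names are placeholders, and the block is skipped if DESeq2 is not installed.

```r
if (requireNamespace("DESeq2", quietly = TRUE)) {
  library(DESeq2)
  set.seed(1)
  # Simulated data: 1000 genes, 3 vs 3 samples (stand-in for a real dataset).
  dds <- DESeq(makeExampleDESeqDataSet(n = 1000, m = 6, betaSD = 1), quiet = TRUE)

  res_default  <- results(dds)                                # both filters on
  res_nofilter <- results(dds, independentFiltering = FALSE)  # keep low-count genes
  res_nocooks  <- results(dds, cooksCutoff = FALSE)           # keep outlier-flagged genes

  # Number of NA adjusted p-values under each setting.
  na_counts <- sapply(
    list(default = res_default,
         no_indep_filter = res_nofilter,
         no_cooks = res_nocooks),
    function(r) sum(is.na(r$padj))
  )
  print(na_counts)

  # Per-sample Cook's distances, useful for inspecting a flagged gene.
  cooks <- assays(dds)[["cooks"]]
}
```

For a gene like the one posted above, looking at its row of `cooks` would show whether a single sample is driving the flag.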

Using independentFiltering=FALSE, some genes get a significant padj (i.e. < 0.05). So do you think I should absolutely try to recover those genes which have strong log2FCs, very low pvalues but padj set to NA?


You don't have to use this option, but using it maximises the number of significant genes you will get. This does not mean that none of the genes it filters out would be significant, but rather that filtering out those genes allows a larger number of other genes to become significant.
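The trade-off can be seen with a toy calculation. The sketch below uses base R's `p.adjust()` with Benjamini-Hochberg correction on made-up p-values: removing genes that had little chance of reaching significance reduces the multiple-testing burden for the rest. It illustrates the principle only; DESeq2 picks its threshold from the mean normalized counts, which this sketch does not reproduce.

```r
# Made-up p-values: 4 well-expressed genes and 16 low-count genes.
p_expressed <- c(0.001, 0.004, 0.008, 0.03)
p_lowcount  <- seq(0.3, 1, length.out = 16)

# Adjust over all 20 genes (no filtering); keep the 4 expressed genes.
padj_all <- p.adjust(c(p_expressed, p_lowcount), method = "BH")[1:4]

# Adjust over the 4 genes left after dropping the low-count ones.
padj_filtered <- p.adjust(p_expressed, method = "BH")

print(sum(padj_all < 0.05))       # 2 genes pass without filtering
print(sum(padj_filtered < 0.05))  # 4 genes pass after filtering
```

The same raw p-values give more discoveries once the uninformative tests are no longer counted.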

The other thing to be aware of is that the default setting for significance in DESeq2 is padj<0.1, and this is the level that independentFiltering uses as its benchmark. If you are using 0.05, then you should set alpha=0.05 in your call to results().
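A short sketch of matching `alpha` to the cutoff actually used, on simulated data from `makeExampleDESeqDataSet()` rather than the poster's dataset. The object names are placeholders, and the block is skipped if DESeq2 is not installed.

```r
if (requireNamespace("DESeq2", quietly = TRUE)) {
  library(DESeq2)
  set.seed(1)
  # Simulated 3 vs 3 dataset standing in for real counts.
  dds <- DESeq(makeExampleDESeqDataSet(n = 1000, m = 6, betaSD = 1), quiet = TRUE)

  # Tell independent filtering to optimise for the cutoff used downstream.
  res05 <- results(dds, alpha = 0.05)
  n_sig <- sum(res05$padj < 0.05, na.rm = TRUE)  # NA padj values are not counted
  print(n_sig)

  print(metadata(res05)$alpha)            # alpha used by the filter
  print(metadata(res05)$filterThreshold)  # mean-count threshold that was chosen
}
```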
