Why do I still have missing values in my gene list, even after cooksCutoff and independentFiltering have been set to FALSE?
2.7 years ago
Sukrit • 0

Hi,

I recently started using the DESeq2 package for RNA-Seq analysis. I noticed that many genes in my list had been assigned a missing value (NA). On reading further, I found that this can be avoided by setting cooksCutoff = FALSE and independentFiltering = FALSE in the results() call.

However, even after that, almost 3000 genes in my gene list still have an NA value. Is there any other way to prevent the missing values from occurring at all? If not, how can I proceed?

DESEQ2 missing symbol gene values • 2.2k views

What is the baseMean for these genes? Please paste a few lines from the results table.


Hi Kevin,

I am attaching a snippet of the table below for your reference. The baseMean is not zero. Some genes with a low baseMean still have a gene symbol assigned, while others with a higher baseMean do not.

[Image: snippet of the results table]


Hi, all of those genes shown in your screenshot have p-values. Can you show the entries for the genes that have NA p-values?

Some of these baseMeans are very low, and a lot of these should be filtered out, in my opinion. However, if this is a knock-out experiment, then, technically, one could expect a low baseMean, depending on the sample size per condition.


Hi Kevin,

Yes, you guessed right, this is a knockout experiment for the APEX1 gene. I do not have any genes with NA p-values. The NA values appear only in the gene list, not in any of the statistical columns such as baseMean or log2FoldChange.

I would like to prevent NA values appearing for the gene names. I do not have NA values in any other part of the table, except the "symbol" column.

2.7 years ago
ATpoint 82k

Three possible reasons described in the vignette:

http://bioconductor.org/packages/devel/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#pvaluesNA

Note on p-values set to NA: some values in the results table can be set to NA for one of the following reasons:

  • If within a row, all samples have zero counts, the baseMean column will be zero, and the log2 fold change estimates, p value and adjusted p value will all be set to NA.
  • If a row contains a sample with an extreme count outlier then the p value and adjusted p value will be set to NA. These outlier counts are detected by Cook’s distance. Customization of this outlier filtering and description of functionality for replacement of outlier counts and refitting is described below
  • If a row is filtered by automatic independent filtering, for having a low mean normalized count, then only the adjusted p value will be set to NA. Description and customization of independent filtering is described below

Since you have excluded the second and third reasons, it is probably the first one: an all-zero gene.
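To see which of these cases applies, it can help to count the NAs per column before guessing. Below is a minimal sketch in base R. The data frame `res_df` is a made-up stand-in for `as.data.frame(results(dds))` with a `symbol` column already added, and all row names, gene names, and values in it are hypothetical.

```r
# Toy stand-in for a DESeq2 results table converted to a data frame.
# All identifiers and numbers here are invented for illustration.
res_df <- data.frame(
  baseMean       = c(0, 152.3, 8.1, 930.4),
  log2FoldChange = c(NA, -1.2, 0.4, 2.1),
  pvalue         = c(NA, 1e-9, NA, 3e-13),
  padj           = c(NA, 1e-7, NA, 1e-11),
  symbol         = c("GENE_A", NA, "GENE_C", NA),
  row.names      = c("ENSG_1", "ENSG_2", "ENSG_3", "ENSG_4")
)

# Number of missing values in each column
na_per_column <- colSums(is.na(res_df))
print(na_per_column)

# Rows where every sample had zero counts (baseMean is exactly zero)
all_zero <- sum(res_df$baseMean == 0)

# Rows that have a valid adjusted p-value but no gene symbol:
# these are an annotation problem, not a DESeq2 filtering problem
annot_only <- res_df[!is.na(res_df$padj) & is.na(res_df$symbol), ]
print(rownames(annot_only))
```

If the NAs sit only in the `symbol` column, as in the last step, none of the three vignette reasons is involved and the results() arguments will not change anything.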


Hi,

Thank you for your response. I am facing a different issue here. The padj and pvalue are not NA, but the gene symbol is. This prevents me from investigating those genes further, especially the ones with a very low pvalue and padj. I have attached an image showing this, where you can see padj values in the range of e-9 to e-13.

[Image: results table rows with NA in the symbol column and very low padj values]


Can you show the code for adding that column? How did you identify those symbols?


Hi,

Please find the code for the addition of the gene symbols.

# Use the mapIds() function to add individual columns to the results table.
# The row names of the results table are supplied as keys, with keytype = "ENSEMBL".
# The column argument tells mapIds() which information we want, and
# multiVals tells it what to do if there are multiple possible values.
library(AnnotationDbi)
library(org.Hs.eg.db)

# Keep the first 15 characters of each row name to drop the Ensembl version suffix
ens.str <- substr(rownames(DEresults_filtered), 1, 15)

DEresults_filtered$symbol <- mapIds(org.Hs.eg.db, keys = ens.str, column = "SYMBOL", keytype = "ENSEMBL", multiVals = "first")
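A side note on the `substr(..., 1, 15)` step: it works for human gene IDs because `ENSG` plus 11 digits is exactly 15 characters, but it silently truncates IDs with a longer prefix. Here is a small sketch comparing it with a pattern-based alternative that removes everything from the first dot onward. The IDs and version numbers are only illustrative inputs, and the `sub()` approach is a suggestion, not what the original code does.

```r
# Example IDs: two human-style and one mouse-style; version suffixes are made up
ids <- c("ENSG00000100823.12", "ENSG00000141510", "ENSMUSG00000000001.4")

# Fixed-width approach: keep the first 15 characters
fixed_width <- substr(ids, 1, 15)

# Pattern approach: drop the first dot and everything after it
by_pattern <- sub("\\..*$", "", ids)

print(data.frame(ids, fixed_width, by_pattern))

# The two agree on the human-style IDs but differ on the longer mouse-style ID
print(fixed_width == by_pattern)
```

For a human-only dataset both give the same keys, so this would not explain the NA symbols in this thread. It only matters if the same code is reused on another organism.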


I see. Some genes do not have HGNC gene names, simple as that. Welcome to the beautiful messy world of annotations :)
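Since the missing symbols cannot be prevented, one common workaround is to fall back to the Ensembl ID as a label so that every row stays identifiable. Below is a minimal sketch in base R on a toy table. The column names `ensembl`, `symbol`, and `label`, and all values, are hypothetical.

```r
# Toy table: two of three genes have no symbol after annotation
res_df <- data.frame(
  ensembl = c("ENSG_1", "ENSG_2", "ENSG_3"),
  symbol  = c("GENE_A", NA, NA),
  padj    = c(0.2, 1e-9, 1e-13)
)

# Use the symbol where available, otherwise fall back to the Ensembl ID
res_df$label <- ifelse(is.na(res_df$symbol), res_df$ensembl, res_df$symbol)

# Collect the IDs without a symbol, e.g. to look them up manually in Ensembl
unmapped <- res_df$ensembl[is.na(res_df$symbol)]

print(res_df)
print(unmapped)
```

The `label` column can then be used for plots and tables, while `unmapped` gives the list of IDs to check by hand, for example to see whether they are pseudogenes or antisense transcripts.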


This was my thought too. A few of them are pseudogenes but a lot are producing anti-sense RNA. However, thank you so much again for all the help.
