Question

How do I subset gene loci with no difference from DESeq2 output?

0

6.2 years ago

MAPK ★ 2.1k

I have a DESeq2 log fold change output file for the comparison of sample 1 vs sample 2, with the following columns:

"Loci"  "baseMean"  "log2FoldChange"    "lfcSE" "stat"  "pvalue"    "padj"

I understand that I can extract differentially expressed loci from the table above using a padj threshold of <0.1. Can someone please tell me how I can separate **upregulated genes**, **downregulated genes** and genes with **no difference (i.e. conserved loci)**? What cutoff values (specifically for padj) should I use to extract each of these subsets? Also, many loci have NA in the padj column, and I would like to know what NA means in this case.

deseq2 • 1.6k views

updated 6.2 years ago by igor 13k • written 6.2 years ago by MAPK ★ 2.1k

1

Just adding a tip for your no-difference question. In DESeq2 you can actually test for no differential expression: see the section "Tests of log2 fold change above or below a threshold" in the vignette.
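A minimal sketch of that threshold test, assuming the DESeq2 Bioconductor package is installed. The simulated dataset, the 0.5 log2 fold change threshold, and the 0.1 padj cutoff are illustrative assumptions, not values from this thread; with real data you would use your own `dds` object.

```r
library(DESeq2)

set.seed(1)
# Simulated counts standing in for real data (betaSD = 0: no true changes).
dds <- makeExampleDESeqDataSet(n = 1000, m = 6, betaSD = 0)
dds <- DESeq(dds)

# Null hypothesis: |log2FoldChange| >= 0.5; alternative: |log2FoldChange| < 0.5.
res_la <- results(dds, lfcThreshold = 0.5, altHypothesis = "lessAbs")

# Loci with significant evidence that any change is smaller than the threshold.
# which() skips rows whose padj is NA.
conserved <- rownames(res_la)[which(res_la$padj < 0.1)]
length(conserved)
```

Here a small padj supports "no change larger than the threshold", which is the opposite reading from the default test.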

6.2 years ago by Martombo ★ 3.1k

score 2 · Accepted Answer · 2018-01-29

2

6.2 years ago

swbarnes2 14k

If the number in "log2FoldChange" is negative, the locus is downregulated; if it is positive, it is upregulated.

The NAs are usually genes with so few counts the software can't draw any conclusions about their expression. You'll have to ignore those.
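As a rough illustration in base R, the split by sign could look like the sketch below. The toy table and its values are made up; in practice `res` would be the output file read into a data frame, and the 0.1 cutoff is simply the one mentioned in the question.

```r
# Toy results table with some of the DESeq2 output columns (values are made up).
res <- data.frame(
  Loci           = c("g1", "g2", "g3", "g4", "g5"),
  log2FoldChange = c(2.1, -1.8, 0.1, 3.0, NA),
  padj           = c(0.01, 0.03, 0.80, NA, NA),
  stringsAsFactors = FALSE
)

alpha <- 0.1  # padj cutoff from the question

# which() drops rows where the comparison evaluates to NA,
# so loci with NA padj are ignored rather than returned as NA rows.
up   <- res[which(res$padj < alpha & res$log2FoldChange > 0), ]
down <- res[which(res$padj < alpha & res$log2FoldChange < 0), ]

# Loci with NA padj, kept separately for inspection.
untested <- res[is.na(res$padj), ]
```

Using `which()` matters here: indexing with a logical vector that contains NA would otherwise add rows filled with NA to the subsets.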

6.2 years ago by swbarnes2 14k

0

Thanks. But how about those with no change? How do you subset those that are statistically conserved?

6.2 years ago by MAPK ★ 2.1k

1

There is no standard cut-off. If you set the cut-off at log (base 2) fold-change <= -2 for down-regulation and >= +2 for up-regulation (coupled with some cut-off for the FDR-adjusted P value), then you're implying that anything between -2 and +2 is neither up- nor down-regulated.
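The cut-off logic above could be sketched in base R as follows. The table values are made up, and both cutoffs (2 for the log2 fold change, 0.1 for padj) are only illustrative since there is no standard choice.

```r
# Toy results table (values are made up).
res <- data.frame(
  Loci           = c("g1", "g2", "g3", "g4", "g5"),
  log2FoldChange = c(2.5, -3.0, 0.4, 2.2, -0.2),
  padj           = c(0.001, 0.02, 0.90, 0.30, NA),
  stringsAsFactors = FALSE
)

lfc_cutoff  <- 2    # illustrative
padj_cutoff <- 0.1  # illustrative

# TRUE only where padj is present and below the cutoff.
sig <- !is.na(res$padj) & res$padj < padj_cutoff

# Everything that fails either cutoff keeps the default label.
res$call <- "not called"
res$call[sig & res$log2FoldChange >=  lfc_cutoff] <- "up"
res$call[sig & res$log2FoldChange <= -lfc_cutoff] <- "down"

table(res$call)
```

The default label is "not called" rather than "unchanged" on purpose: failing the cutoffs is not by itself evidence that a locus is conserved.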

Z-scores may be an additional way to gauge genes that are unchanged. For example, a gene with an absolute Z-score below 1 has expression within 1 standard deviation of the mean across all samples.

6.2 years ago by Kevin Blighe 87k

score 2 · Accepted Answer · 2018-01-29

If you are using padj<0.1 as significant, then the rest are not significant. Of course, "not significant" can mean either that the locus is not altered or that there is insufficient information to make the call.

Regarding NAs, that is actually described in the vignette:

If within a row, all samples have zero counts, the baseMean column will be zero, and the log2 fold change estimates, p value and adjusted p value will all be set to NA.

If a row contains a sample with an extreme count outlier then the p value and adjusted p value will be set to NA. These outlier counts are detected by Cook’s distance. Customization of this outlier filtering and description of functionality for replacement of outlier counts and refitting is described below.

If a row is filtered by automatic independent filtering, for having a low mean normalized count, then only the adjusted p value will be set to NA.
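Based on the three cases quoted above, the likely reason for each NA can be guessed from which columns are missing. This is a sketch on a made-up table, and the `na_reason` column is a hypothetical name introduced here, not part of the DESeq2 output.

```r
# Toy results table covering the three NA patterns (values are made up).
res <- data.frame(
  Loci           = c("g1", "g2", "g3", "g4"),
  baseMean       = c(0, 150.2, 1.3, 820.5),
  log2FoldChange = c(NA, 1.9, 0.4, -2.2),
  pvalue         = c(NA, NA, 0.40, 0.0001),
  padj           = c(NA, NA, NA, 0.002),
  stringsAsFactors = FALSE
)

# Check the patterns in order, from most to least NA columns.
res$na_reason <- ifelse(
  res$baseMean == 0, "all counts zero",
  ifelse(is.na(res$pvalue), "count outlier (Cook's distance)",
  ifelse(is.na(res$padj), "independent filtering (low mean count)",
         "tested")))

table(res$na_reason)
```

Tabulating the reasons shows how many loci fall into each case before deciding to ignore them.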