Question: How do I subset gene loci with no difference from DESeq2 output?
2.5 years ago by
MAPK (1.6k) wrote:

I have a DESeq2 log2 fold change output file for a sample 1 vs. sample 2 comparison, with the following columns:

"Loci"  "baseMean"  "log2FoldChange"    "lfcSE" "stat"  "pvalue"    "padj"

I understand that I can extract differentially expressed loci from the table above using a padj threshold of <0.1. Can someone please tell me how to separate **upregulated genes**, **downregulated genes**, and gene sets with **no difference (i.e., conserved loci)**? What cutoff values (specifically for padj) should I consider to extract each of these subsets? Also, many of my loci have NA in the padj column, and I would like to know what NA means in this case.


Just adding a tip for your no-difference question. In DESeq2 you can actually test for no differential expression: see the section "Tests of log2 fold change above or below a threshold" in the vignette.

written 2.5 years ago by Martombo (2.6k)
2.5 years ago by
United States
swbarnes2 (8.2k) wrote:

If the number in "log2FoldChange" is negative, the gene is downregulated; if it is positive, the gene is upregulated.

The NAs are usually genes with so few counts the software can't draw any conclusions about their expression. You'll have to ignore those.
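As a rough illustration of the sign rule combined with the padj < 0.1 threshold from the question, here is a minimal Python sketch that works on the exported tab-delimited table. The helper names (`parse_float`, `split_by_direction`) and the example rows are made up for illustration; only the column names come from the question. Loci with NA in padj or log2FoldChange are kept in their own group instead of being called either way.

```python
import csv
import io
import math


def parse_float(text):
    """Convert a table field to float; 'NA' or an empty field becomes NaN."""
    text = text.strip()
    if text in ("NA", ""):
        return math.nan
    return float(text)


def split_by_direction(rows, alpha=0.1):
    """Group loci by padj significance and the sign of log2FoldChange."""
    groups = {"up": [], "down": [], "not_significant": [], "na": []}
    for row in rows:
        padj = parse_float(row["padj"])
        lfc = parse_float(row["log2FoldChange"])
        if math.isnan(padj) or math.isnan(lfc):
            groups["na"].append(row["Loci"])      # no call possible
        elif padj < alpha and lfc > 0:
            groups["up"].append(row["Loci"])
        elif padj < alpha and lfc < 0:
            groups["down"].append(row["Loci"])
        else:
            groups["not_significant"].append(row["Loci"])
    return groups


# Hypothetical rows, only to show the expected layout of the file.
example = "\n".join([
    "Loci\tbaseMean\tlog2FoldChange\tlfcSE\tstat\tpvalue\tpadj",
    "locusA\t250.0\t2.1\t0.4\t5.25\t1.5e-07\t3.0e-06",
    "locusB\t180.0\t-1.8\t0.5\t-3.6\t3.2e-04\t2.1e-03",
    "locusC\t300.0\t0.05\t0.3\t0.17\t0.87\t0.95",
    "locusD\t1.2\t0.9\t1.5\t0.6\t0.55\tNA",
])

rows = list(csv.DictReader(io.StringIO(example), delimiter="\t"))
groups = split_by_direction(rows)
print(groups)
```

To run it on a real file, replace `io.StringIO(example)` with an opened file handle. Which sample counts as the numerator of the fold change depends on how the contrast was set up, so it is worth checking that before reading positive values as "up in sample 2".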


Thanks. But how about those with no change? How do you subset those that are statistically conserved?

written 2.5 years ago by MAPK (1.6k)

There is no standard cut-off. If you set the cut-off at log2 fold-change <= -2 for down-regulation (coupled with some cut-off for the FDR-adjusted P value), then you're implying that anything between -2 and +2 is neither up- nor down-regulated.
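To make the implication concrete, here is a small Python sketch of that kind of two-part rule. The cutoff of 2 is just the example value used above and 0.1 is the padj threshold from the question; the function name `classify` and the example values are hypothetical.

```python
import math


def classify(lfc, padj, lfc_cutoff=2.0, alpha=0.1):
    """Label a locus using a fold-change cutoff plus an adjusted p-value cutoff."""
    if math.isnan(lfc) or math.isnan(padj):
        return "NA"
    if padj < alpha and lfc >= lfc_cutoff:
        return "up"
    if padj < alpha and lfc <= -lfc_cutoff:
        return "down"
    # Everything else lands here, including significant loci with a
    # fold change between -lfc_cutoff and +lfc_cutoff.
    return "neither"


# Hypothetical (log2FoldChange, padj) pairs.
results = {
    "locusA": (2.4, 0.001),
    "locusB": (-3.1, 0.02),
    "locusC": (1.2, 0.003),   # significant, but below the fold-change cutoff
    "locusD": (0.1, 0.9),
    "locusE": (2.8, math.nan),
}

labels = {locus: classify(lfc, padj) for locus, (lfc, padj) in results.items()}
print(labels)
```

Note that "neither" here only means the locus did not pass both cutoffs. It is not evidence that the locus is unchanged.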

Z-scores may be an additional way to gauge genes that are unchanged. For example, if a gene has an absolute Z-score below 1, its expression differs by less than 1 standard deviation across all samples.

written 2.5 years ago by Kevin Blighe (63k)
2.5 years ago by
United States
igor (11k) wrote:

If you are using padj < 0.1 as significant, then the rest are not significant. Of course, "not significant" could mean either that the gene is not altered or that there is insufficient information to make the call.

Regarding NAs, that is actually described in the vignette:

If within a row, all samples have zero counts, the baseMean column will be zero, and the log2 fold change estimates, p value and adjusted p value will all be set to NA.

If a row contains a sample with an extreme count outlier then the p value and adjusted p value will be set to NA. These outlier counts are detected by Cook’s distance. Customization of this outlier filtering and description of functionality for replacement of outlier counts and refitting is described below.

If a row is filtered by automatic independent filtering, for having a low mean normalized count, then only the adjusted p value will be set to NA.
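The three cases quoted above can be told apart from the pattern of NAs in each row. Here is a minimal Python sketch of that logic; the function name `na_reason` and the example values are made up, and the returned label is only the likely explanation according to the vignette text, not something DESeq2 reports itself.

```python
import math


def na_reason(base_mean, lfc, pvalue, padj):
    """Guess why padj is NA from which other columns are also NA."""
    if not math.isnan(padj):
        return "padj available"
    if base_mean == 0 and math.isnan(lfc) and math.isnan(pvalue):
        return "all samples have zero counts"
    if math.isnan(pvalue):
        return "extreme count outlier (Cook's distance)"
    # pvalue is present but padj is not
    return "independent filtering (low mean normalized count)"


nan = math.nan
# Hypothetical rows: (baseMean, log2FoldChange, pvalue, padj)
rows = {
    "locusA": (0.0, nan, nan, nan),
    "locusB": (150.0, 3.2, nan, nan),
    "locusC": (1.5, 0.8, 0.4, nan),
    "locusD": (200.0, 1.9, 0.0001, 0.002),
}

reasons = {locus: na_reason(*values) for locus, values in rows.items()}
for locus, reason in reasons.items():
    print(locus, reason)
```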
