Question: How do I subset gene loci with no difference from DESeq2 output?
MAPK (United States) wrote, 20 months ago:

I have a DESeq2 log fold change output file for a comparison of sample 1 vs. sample 2, with the following columns:

"Loci"  "baseMean"  "log2FoldChange"    "lfcSE" "stat"  "pvalue"    "padj"

I understand that I can extract differentially expressed loci from the table above using a padj threshold of <0.1 for significance. Can someone please tell me how I can separate **upregulated genes**, **downregulated genes** and gene sets with **no difference (i.e. conserved loci)**? What cutoff values should I consider (specifically for padj) to extract each of these subsets? Also, many loci have a padj of NA, and I want to know what NA means in this case.


Just adding a tip for your no-difference question. In DESeq2 you can actually test for no differential expression: see the section "Tests of log2 fold change above or below a threshold" in the vignette.

Reply by Martombo, 20 months ago.
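
As a rough illustration of the idea behind that vignette section: to call a locus "unchanged", one can test the null hypothesis |LFC| >= threshold against the alternative |LFC| < threshold. In DESeq2 itself this is requested through the `lfcThreshold` and `altHypothesis="lessAbs"` arguments of `results()`. The Python sketch below only approximates such a test from the `log2FoldChange` and `lfcSE` columns, using two one-sided Wald tests under a normal approximation. The function names and the 0.5 threshold are illustrative assumptions, and the resulting p-values would still need multiple-testing adjustment.

```python
import math

def upper_tail(z):
    """P(Z > z) for a standard normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def less_abs_pvalue(lfc, lfc_se, threshold=0.5):
    """Approximate p-value for H0: |LFC| >= threshold vs. H1: |LFC| < threshold.

    Takes the larger of two one-sided Wald p-values, so both
    'LFC < +threshold' and 'LFC > -threshold' must be supported.
    """
    p_above = upper_tail((threshold - lfc) / lfc_se)  # tests LFC < +threshold
    p_below = upper_tail((lfc + threshold) / lfc_se)  # tests LFC > -threshold
    return max(p_above, p_below)

# Hypothetical values: a precisely estimated LFC near zero vs. a large LFC.
print(less_abs_pvalue(0.05, 0.1))  # small p-value: evidence of no meaningful change
print(less_abs_pvalue(1.50, 0.1))  # p-value near 1: no evidence of "no change"
```

A small p-value here supports "no meaningful change", which is a stronger statement than merely failing to reach significance in the usual test.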
swbarnes2 (United States) wrote, 20 months ago:

If the number in "log2FoldChange" is negative, the gene is downregulated; if it is positive, the gene is upregulated.

The NAs are usually genes with so few counts the software can't draw any conclusions about their expression. You'll have to ignore those.
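
The sign rule and the NA handling above can be sketched as a small filter over the results table. This is a minimal Python sketch, assuming the table has been exported as tab-separated text with the column names from the question; the `classify` helper, the 0.1 cutoff and the example rows are illustrative and not part of DESeq2.

```python
import csv
import io

def classify(row, alpha=0.1):
    """Label one results row as up, down, not_significant, or NA."""
    if row["padj"] in ("NA", ""):
        return "NA"  # no adjusted p-value, so no call can be made
    if float(row["padj"]) >= alpha:
        return "not_significant"
    # Significant: the sign of the log2 fold change gives the direction.
    return "up" if float(row["log2FoldChange"]) > 0 else "down"

# Made-up example rows using a subset of the columns from the question.
example = io.StringIO(
    "Loci\tlog2FoldChange\tpadj\n"
    "geneA\t2.1\t0.001\n"
    "geneB\t-1.7\t0.03\n"
    "geneC\t0.1\t0.8\n"
    "geneD\t0.4\tNA\n"
)

groups = {}
for row in csv.DictReader(example, delimiter="\t"):
    groups.setdefault(classify(row), []).append(row["Loci"])
print(groups)
```

The `not_significant` group only collects loci that failed the significance cutoff; on its own that is not evidence that a locus is unchanged.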


Thanks. But how about those with no change? How do you subset those that are statistically conserved?

Reply by MAPK, 20 months ago.

There is no standard cut-off. If you set the cut-off at log (base 2) fold-change <= -2 for down-regulation and >= +2 for up-regulation (coupled with some cut-off for the FDR-adjusted P value), then you're implying that anything between -2 and +2 is neither up- nor down-regulated.

Z-scores may be an additional way to gauge genes that are unchanged. For example, if a gene has an absolute Z-score <1 in a sample, its expression there is less than 1 standard deviation from its mean across all samples.

Reply by Kevin Blighe, 20 months ago.
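
One possible reading of the Z-score suggestion, sketched in Python: scale each gene's expression values by that gene's own mean and standard deviation across samples, then look at which samples fall within 1 standard deviation. The input values and the `z_scores` helper are hypothetical, and the sketch assumes normalized or transformed expression values rather than raw counts. Because the scores are scaled per gene, this is only a rough gauge and not a statistical test.

```python
import statistics

def z_scores(values):
    """Z-score each value against the mean and sample SD of the same gene."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    if sd == 0:
        return [0.0 for _ in values]  # constant gene: no variation to scale by
    return [(v - mean) / sd for v in values]

# Hypothetical transformed expression values for one gene across six samples.
expression = [8.1, 8.3, 7.9, 8.0, 8.2, 8.1]
scores = z_scores(expression)

# Indices of samples whose expression lies within 1 SD of the gene's mean.
within_one_sd = [i for i, z in enumerate(scores) if abs(z) < 1]
print([round(z, 2) for z in scores])
print(within_one_sd)
```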
igor (United States) wrote, 20 months ago:

If you are using padj<0.1 as significant, then the rest are not significant. Of course, "not significant" can mean either that a locus is not altered or that there is not enough information to make the call.

Regarding NAs, that is actually described in the vignette:

If within a row, all samples have zero counts, the baseMean column will be zero, and the log2 fold change estimates, p value and adjusted p value will all be set to NA.

If a row contains a sample with an extreme count outlier then the p value and adjusted p value will be set to NA. These outlier counts are detected by Cook’s distance. Customization of this outlier filtering and description of functionality for replacement of outlier counts and refitting is described below.

If a row is filtered by automatic independent filtering, for having a low mean normalized count, then only the adjusted p value will be set to NA.
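
The three cases quoted above differ in which columns are NA, so the likely reason can be read off each row. The following Python sketch assumes rows are dictionaries of strings with the column names from the question and that missing values appear as the text `NA`; the `na_reason` helper and the example rows are illustrative only.

```python
def is_na(value):
    """True if a table cell holds a missing value."""
    return value in ("NA", "", None)

def na_reason(row):
    """Guess why a results row carries NA values, following the three quoted cases."""
    if not is_na(row["padj"]):
        return "no NA"
    if is_na(row["log2FoldChange"]) and is_na(row["pvalue"]):
        return "all counts zero"
    if is_na(row["pvalue"]):
        return "count outlier (Cook's distance)"
    return "independent filtering (low mean count)"

# Made-up rows, one per case, plus one fully populated row.
rows = [
    {"Loci": "geneA", "baseMean": "0", "log2FoldChange": "NA", "pvalue": "NA", "padj": "NA"},
    {"Loci": "geneB", "baseMean": "150.2", "log2FoldChange": "3.1", "pvalue": "NA", "padj": "NA"},
    {"Loci": "geneC", "baseMean": "1.3", "log2FoldChange": "0.4", "pvalue": "0.2", "padj": "NA"},
    {"Loci": "geneD", "baseMean": "900.0", "log2FoldChange": "-2.0", "pvalue": "1e-6", "padj": "1e-4"},
]
reasons = {row["Loci"]: na_reason(row) for row in rows}
for locus, reason in reasons.items():
    print(locus, reason)
```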
