I ran the DESeq2 tool and have obtained a table whose column names are here:
baseMean log2FoldChange lfcSE stat pvalue padj
I have two important genes: gene A has a log2FC of +0.31 and gene B has a log2FC of -0.25, and padj is <0.05 for both. I assumed gene A is upregulated and gene B is downregulated. Is that correct, or does log2FC have to be >1 for a gene to be considered upregulated? Please confirm.
pvalue and padj help you determine whether the up-/down-regulation is statistically significant.
Whether you need a log2FC to be >1 is up to you. log2FC is an "effect size" -- e.g. a log2FC of 8.2 is a bigger effect (i.e. more upregulation) than a log2FC of 0.7. It's up to you whether you want to care about huge effects vs. if you're also interested in more subtle effects. Obviously if your experiment is overexpressing gene A under a super strong promoter, then I'd be concerned about getting a log2FC of +0.31.
The meaning of log2FoldChange is literal: we take a log2 of the calculated fold change.
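To make the arithmetic concrete, here is a minimal Python sketch. The mean counts are made up, and the helper name log2_fold_change is my own. DESeq2 estimates log2FoldChange from a fitted model rather than from a plain ratio of means, so treat this only as an illustration of the scale.

```python
import math

def log2_fold_change(treated_mean, control_mean):
    """log2 of the ratio treated/control (plain arithmetic, not DESeq2's model-based estimate)."""
    return math.log2(treated_mean / control_mean)

# Hypothetical normalized mean counts (treated, control), chosen only for illustration.
examples = {
    "doubled": (200.0, 100.0),
    "halved": (50.0, 100.0),
    "quartered": (25.0, 100.0),
    "unchanged": (100.0, 100.0),
}

log2fc = {name: log2_fold_change(t, c) for name, (t, c) in examples.items()}

for name, value in log2fc.items():
    print(f"{name:>9}: log2FC = {value:+.1f}")
```

A doubling and a halving land at +1 and -1, which is why the log2 scale is convenient for comparing up- and down-regulation.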
The inverse operation gets us back to the fold change: raise 2 to the power of the log2FC. So you are correct about the meaning of a positive or negative log2FC. Many people require the absolute value of log2FC to be larger than 1 for a change to be considered significant, not for it to be considered upregulated. Since 2^1 = 2, that means many people consider a fold change of 2x in either direction to be significant. In your case the fold changes are 2^0.31 = 1.24 and 2^-0.25 = 0.84 compared to your standard conditions.
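If you want to check these numbers yourself, a short Python sketch of the back-conversion could look like this. The two log2FC values are the ones from the question; the percent column is just my way of restating the fold change.

```python
# Convert log2 fold changes back to linear fold changes: FC = 2 ** log2FC.
# The two values are the ones quoted in the question.
log2fc = {"gene A": 0.31, "gene B": -0.25}

fold_change = {gene: 2 ** value for gene, value in log2fc.items()}

for gene, fc in fold_change.items():
    percent = (fc - 1) * 100  # percent change relative to the reference condition
    print(f"{gene}: log2FC={log2fc[gene]:+.2f} -> FC={fc:.2f} ({percent:+.0f}%)")
```

So gene A is up by roughly a quarter and gene B is down by roughly a sixth relative to the reference condition.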
I am not going to tell you how to feel about your research findings, but I would not be terribly excited that my gene is upregulated 1.24x compared to the standard condition, regardless of what the adjusted p-value says. Or that it is downregulated from 1 to 0.84x.
"doubling in the original scaling is equal to a log2 fold change of 1, a quadrupling is equal to a log2 fold change of 2 and so on. Conversely, the measure is symmetric when the change decreases by an equivalent amount e.g. a halving is equal to a log2 fold change of −1, a quartering is equal to a log2 fold change of −2 and so on"
In DE analysis, a gene is often considered upregulated if its log2FC is greater than or equal to 1 (and, symmetrically, downregulated if its log2FC is less than or equal to -1). This threshold, as explained by others, can vary with the genes considered, sequencing depth, experimental setup and so on. For more details, read the DESeq2 vignette, where these values are well explained.
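As a rough sketch of how such cutoffs play out, here is a small Python example. Only the log2FoldChange values for genes A and B come from the question. Their padj values, genes C and D, and the function name classify are all made up for illustration, and both cutoffs are analyst choices rather than anything DESeq2 prescribes.

```python
# Hypothetical rows mimicking DESeq2 output columns. Only the log2FoldChange
# values of genes A and B come from the question; everything else is invented.
results = [
    {"gene": "A", "log2FoldChange": 0.31, "padj": 0.01},
    {"gene": "B", "log2FoldChange": -0.25, "padj": 0.02},
    {"gene": "C", "log2FoldChange": 2.40, "padj": 0.001},
    {"gene": "D", "log2FoldChange": -1.80, "padj": 0.20},
]

def classify(row, lfc_cutoff=1.0, padj_cutoff=0.05):
    """Label a gene 'up', 'down', or 'ns' (not selected) using both cutoffs."""
    # A missing padj (None here) is treated as not significant.
    if row["padj"] is None or row["padj"] >= padj_cutoff:
        return "ns"
    if row["log2FoldChange"] >= lfc_cutoff:
        return "up"
    if row["log2FoldChange"] <= -lfc_cutoff:
        return "down"
    return "ns"

# Strict: require |log2FC| >= 1. Lenient: use the sign only (cutoff of 0).
strict = {r["gene"]: classify(r) for r in results}
lenient = {r["gene"]: classify(r, lfc_cutoff=0.0) for r in results}

print("strict :", strict)
print("lenient:", lenient)
```

With the strict cutoff, genes A and B are not selected even though their padj is below 0.05. With the lenient one they are called up and down respectively, which is the choice being debated in this thread.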
In most cases it is not just about what we feel, but what we can convince others to be relevant. Anyone can feel great that the expression of their gene changed 1.17x from one condition to another, but we still need to convince our peers (and reviewers) that this change is biologically meaningful.
I don't think rationalizing log2FC in terms of the promoter strength is what will help with data explanation. A gene can be 10x overexpressed from a weak or from a strong promoter, because the conditions for high transcription were created (say, all activating transcription factors were present in a given condition). What we are measuring is a relative expression change for a large number of genes with changing conditions, and promoter strength doesn't really enter my thinking. Rather, most people think of the effect of transcriptional noise, and try to rationalize what level of fold-change is likely to raise above the transcriptional noise. I don't know if anyone has quantified this properly - chances are that someone has. Most people will intuitively accept that a fold-change of 2+ is real. One can certainly make an argument that a fold change of 1.7 is real as well, but that argument is less likely to fly the closer we get to 1. In my mind that has nothing to do with promoter strength.
RNA-seq analysis is about relative abundances. If "conditions for high transcription were created", you would not see a fold change of 10x for every gene if you normalize your count estimates correctly.
My "changing condition" would be putting gene A in an expression vector under a strong promoter (e.g. a CMV promoter) versus an empty vector control. So yes, you would expect a large fold change; otherwise there's something wrong with your experiment. (Edit: perhaps my example was confusing and you misunderstood, so I'll provide a better one: imagine gene A is a strong marker gene, or one of the most highly expressed genes, for liver cells but isn't expressed at all in bile duct cells, and you're comparing liver vs. bile duct as your treatment vs. control condition.) My point is that, if you have some very strong prior about gene A's expression changes, you should use that.
And yes, you do have to rely on your intuition when thresholding (you need to think about whether you care about only the biggest changes or whether you also want to select more subtle changes). Same thing with p-value thresholds: I'll use a looser threshold when I want more sensitivity (at the risk of generating more false positives). I don't think there's anyone who can justify "a 2-fold change is definitely relevant while a 1.7-fold change is definitely not" to a reviewer. Defining a threshold isn't a perfect science and we shouldn't pretend that it is.
As for your claim "Rather, most people think of the effect of transcriptional noise, and try to rationalize what level of fold-change is likely to raise above the transcriptional noise", this is what hypothesis testing is for. The change might be too small to be meaningful in your biological system, but the change is statistically significant so you can't tell me there is no change and that it's all random noise.
Nice explanation. Thanks for this.