Fold changes tend to be higher when genes have overall lower expression (which means low counts).
Since low counts give less statistical power than high counts, these fold changes (FCs) are often not significant unless they are supported by many replicates.
Example 1: A gene has an expression of 50 in one condition and 5 in the other. That would be a fold change of 10.
Example 2: A gene has an expression of 5000 in one condition and 500 in the other. That would be a fold change of 10 as well.
Still, the second one is much more reliable, as the first one could be a product of the technical noise produced by the sequencing. Adding or removing just a few counts can change the result of example 1 considerably:
(50 - 10) / (5 + 5) = 40 / 10 would already change the original FC from 10 to 4, whereas
(5000 - 10) / (500 + 5) = 4990 / 505 changes the FC from 10 to only 9.88.
You can see that higher counts are less affected by small fluctuations in counts, therefore they are more reliable.
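To make the arithmetic easy to replay, here is a minimal Python sketch. The function names and the perturbation sizes (10 counts removed from the higher value, 5 added to the lower one) are illustrative choices only, not anything DESeq2 does internally.

```python
def fold_change(high, low):
    """Ratio of the higher count to the lower count."""
    return high / low


def perturbed_fc(high, low, minus=10, plus=5):
    """Fold change after removing `minus` counts from the higher value
    and adding `plus` counts to the lower one (hypothetical noise)."""
    return fold_change(high - minus, low + plus)


# Same fold change of 10, once at low counts and once at high counts.
for high, low in [(50, 5), (5000, 500)]:
    before = fold_change(high, low)
    after = perturbed_fc(high, low)
    print(f"{high} vs {low}: FC {before:.2f} -> {after:.2f}")
```

The same absolute fluctuation moves the low-count ratio a lot and the high-count ratio hardly at all.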
In DESeq2 you can check the baseMean column to get the average expression. This is probably low for many of these genes with high FCs but large padj. You can visualize this relationship of baseMean to logFC with an MA plot.
This is where the concept of shrinkage kicks in. It aims to estimate the "true" fold changes from the data. There is little evidence that the fold changes of genes with low baseMean are actually real, so they are shrunken towards zero. If you want lowly expressed genes to be significant (given that they are in fact DEGs), then you need, most importantly, many replicates and high sequencing depth.
Check the DESeq2 vignette for it.
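For intuition only, here is a toy Python sketch of what shrinking towards zero means. It is not DESeq2's estimator: it assumes Poisson-distributed counts for a rough standard error (DESeq2 uses a negative binomial model) and a zero-centered normal prior with a made-up `prior_sd` of 1. All function names are hypothetical.

```python
import math


def log2_fc_and_se(count_a, count_b):
    """log2 fold change plus a rough standard error.

    ASSUMPTION: Poisson counts, delta method: var(ln(a/b)) ~ 1/a + 1/b.
    """
    lfc = math.log2(count_a / count_b)
    se = math.sqrt(1 / count_a + 1 / count_b) / math.log(2)
    return lfc, se


def shrink(lfc, se, prior_sd=1.0):
    """Posterior mean under a zero-centered normal prior (toy model).

    The noisier the estimate (large se), the more it is pulled to zero.
    """
    weight = prior_sd**2 / (prior_sd**2 + se**2)
    return weight * lfc


for a, b in [(50, 5), (5000, 500)]:
    lfc, se = log2_fc_and_se(a, b)
    print(f"{a} vs {b}: log2FC {lfc:.2f}, SE {se:.2f}, "
          f"shrunken {shrink(lfc, se):.2f}")
```

Both pairs start from the same log2 fold change, but the low-count pair has a much larger standard error and is therefore pulled further towards zero.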
10 weeks ago by ATpoint (modified 10 weeks ago)