Question: High log fold change but padj > 0.1 in DESeq2
10 weeks ago by
thjnant100
Germany
thjnant100 wrote:

Hello,

I am analysing RNA-seq data to investigate differential gene expression in hybrids compared to parental species. Since I work with natural populations, I have few samples (5 of two different tissues for each species and hybrid).

I am using the DESeq2 package for my expression analysis. I observe that many genes have a high log fold change (more than 1 or even 1.5) but still a padj > 0.1. This is true in one group, while in another group genes with a log fold change of 1 or even lower have a padj < 0.1 or even padj < 0.05.

What are the reasons for this observation?

Thank you in advance.

expression rna-seq deseq2 R • 145 views

If you plot the normalized expression levels of those genes for each condition, you might understand why.

ADD REPLYlink written 10 weeks ago by geek_y11k
10 weeks ago by
ATpoint36k
Germany
ATpoint36k wrote:

Fold changes tend to be larger for genes with low overall expression (that is, low counts). Since low counts give less statistical power than high counts, these fold changes often fail to reach significance unless they are supported by many replicates.

Example 1: A gene has an expression of 50 in one condition and 5 in the other. That is a fold change of 10.

Example 2: A gene has an expression of 5000 in one condition and 500 in the other. That is a fold change of 10 as well.

Still, the second one is much more reliable, as the first could be a product of the technical noise of sequencing. Adding or removing a few counts (e.g. 5 to 10) in Example 1 can change the result considerably:

(50 - 10) vs (5 + 5) would already change the original FC from 10 to 4, whereas

(5000 - 10) vs (500 + 5) changes the FC from 10 to 9.88.
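The arithmetic above can be checked with a few lines of Python. This is only a sketch of the toy example: the helper name `perturbed_fold_change` is made up for illustration, and the same shift (minus 10 counts in one condition, plus 5 in the other) is applied to both genes.

```python
import math

def perturbed_fold_change(a, b, delta_a, delta_b):
    """Fold change a/b after shifting the two counts by delta_a and delta_b."""
    return (a + delta_a) / (b + delta_b)

# Same perturbation (-10 counts, +5 counts) applied to both examples.
fc_low = perturbed_fold_change(50, 5, -10, 5)        # 40 / 10
fc_high = perturbed_fold_change(5000, 500, -10, 5)   # 4990 / 505

for label, fc in [("low counts", fc_low), ("high counts", fc_high)]:
    print(f"{label}: FC = {fc:.2f}, log2FC = {math.log2(fc):.2f}")
```

On the log2 scale the low-count gene drops from about 3.32 to 2.00, while the high-count gene barely moves (about 3.32 to 3.30).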

You can see that higher counts are less affected by small fluctuations in counts, so they are more reliable. In DESeq2 you can check the baseMean column of the results table to get the average normalized expression across all samples. It is probably low for many of the genes with high FCs but large padj. You can visualize the relationship between baseMean and logFC with the plotMA function.
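To put a rough number on "more reliable", here is a back-of-the-envelope sketch. It assumes each count is a single Poisson draw and uses the delta-method approximation for the standard error of a log ratio. That is a simplification: real RNA-seq counts are overdispersed and DESeq2 fits a negative binomial model across replicates, so treat the values as illustrative only. The function name `log2fc_and_se` is made up for this example.

```python
import math

def log2fc_and_se(count_a, count_b):
    """log2(count_a / count_b) and its approximate standard error.

    Assumption: each count is one Poisson draw (no overdispersion), so
    Var(ln count) is roughly 1 / count by the delta method.
    """
    lfc = math.log2(count_a / count_b)
    se = math.sqrt(1 / count_a + 1 / count_b) / math.log(2)
    return lfc, se

lfc_low, se_low = log2fc_and_se(50, 5)
lfc_high, se_high = log2fc_and_se(5000, 500)

print(f"50 vs 5:     log2FC = {lfc_low:.2f}, approx. SE = {se_low:.3f}")
print(f"5000 vs 500: log2FC = {lfc_high:.2f}, approx. SE = {se_high:.3f}")
```

Both genes have the same log2FC, but under these assumptions the uncertainty of the low-count estimate is ten times larger, which is why it is much harder for it to reach a small padj.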

This is where the concept of shrinkage comes in. It aims to estimate the "true" fold changes from the data. As you can see below, there is little evidence that the fold changes of genes with low baseMean are real, so they are shrunken towards zero. If you want lowly expressed genes to be significant (given they are in fact DEGs), then you need, most importantly, many replicates and high sequencing depth.

Check the DESeq2 vignette for details.
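The idea behind shrinkage can be illustrated with a toy model. Note that this is not what DESeq2 actually computes (in DESeq2, shrinkage is done with `lfcShrink`, which uses more elaborate estimators). The sketch simply assumes a zero-centred normal prior on the log2 fold change with a made-up standard deviation `prior_sd = 1.0`, and Poisson-like standard errors for the two toy genes (50 vs 5 and 5000 vs 500). The function name `shrink_lfc` is hypothetical.

```python
import math

def shrink_lfc(lfc, se, prior_sd=1.0):
    """Posterior mean of the LFC under a zero-centred normal prior (toy model).

    The noisier the estimate (large se), the more it is pulled towards zero.
    prior_sd is an arbitrary assumption for this illustration.
    """
    weight = prior_sd**2 / (prior_sd**2 + se**2)
    return weight * lfc

raw_lfc = math.log2(10)  # both toy genes have a fold change of 10

# Approximate standard errors, assuming single Poisson counts per condition.
se_low_counts = math.sqrt(1 / 50 + 1 / 5) / math.log(2)
se_high_counts = math.sqrt(1 / 5000 + 1 / 500) / math.log(2)

shrunk_low = shrink_lfc(raw_lfc, se_low_counts)
shrunk_high = shrink_lfc(raw_lfc, se_high_counts)

print(f"raw log2FC:            {raw_lfc:.2f}")
print(f"shrunk, 50 vs 5:       {shrunk_low:.2f}")
print(f"shrunk, 5000 vs 500:   {shrunk_high:.2f}")
```

With these assumptions the low-count gene is pulled noticeably towards zero, while the high-count gene keeps almost its full fold change.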

Some examples:

Unshrunken FCs: [image not preserved]

Shrunken: [image not preserved]


Extremely helpful, thank you so much!

ADD REPLYlink written 10 weeks ago by thjnant100