Question: Odd DESeq2 results
0
gravatar for mbk0asis
1 day ago by
mbk0asis550
Korea, Republic Of
mbk0asis550 wrote:

Hi, all!

During differential expression analysis using DESeq2, I found odd results from it; some of genes with many missing data were detected as significant DEGs with very low pvalues as follow.

            log2FoldChange  pvalue  padj    Sample1 Sample2 Sample3 Sample4 Sample5 Sample6 Control1    Control2    Control3    Control4    Control5
Actc1   25.34   7.34E-72    4.45E-68    0.00    0.00    0.00    0.00    0.00    796.48  0.00    0.00    0.00    0.00    0.00    132.75  0.00

Sorry, it's not so easy to see hear, but you can see that only one sample out of 11 samples + controls actually has a count values which is unlikely detectable as a significant DEG. I observed many genes in this condition were detected as significant DEGs.

The code to obtain these results is like below,

conds <- c(rep("Sample",6),rep("Control",5))
res <- results(dds, contrast = c("conds", Sample, Control))

How come these genes have so low pvalues and padj so they are detected as DEGs?

Can somebody explain to me?

Thank you!

rna-seq deseq2 • 83 views
ADD COMMENTlink modified 1 day ago • written 1 day ago by mbk0asis550

Can you show the output of counts(dds[grep("Actc1"), rownames(dds),])?

ADD REPLYlink written 1 day ago by ATpoint36k

Thank you, ATpoint! But I got error with your command.

counts(dds[grep("Actc1"), rownames(dds),])
Error in grep("Actc1") : argument "x" is missing, with no default

I found a typo in the code,

counts(dds[grep("Actc1", rownames(dds)),])
    Sample1 Sample2 Sample3 Sample4 Sample5 Sample6 Control1    Control2    Control3    Control4    Control5
Actc1   0   0   0   0   0   778 0   0   0   0   0
ADD REPLYlink modified 1 day ago • written 1 day ago by mbk0asis550

Sorry for the typo. That is odd, only one sample here has counts whereas in your toplevel example there are also counts for control 4 so it does not really match. Are you sure that there is no parsing error and this p-value and fold change do not belong to a different gene?

ADD REPLYlink written 1 day ago by ATpoint36k

Actual data have more samples in complex conditions, I might have made a mistake copying and pasting the data.

Sorry!

For clarity, I'm posting the actual values in my DESeq2 data. I still see the strangely too low pvalues.

> res[grep("Actc1", rownames(res)),]
log2 fold change (MLE): conds WT_PBMC_24m vs WT_PBMC_2m 
Wald test p-value: conds WT_PBMC_24m vs WT_PBMC_2m 
DataFrame with 1 row and 6 columns
              baseMean   log2FoldChange            lfcSE             stat               pvalue                 padj
             <numeric>        <numeric>        <numeric>        <numeric>            <numeric>            <numeric>
Actc1 4434.62314133305 25.3445537653263 1.41381458753046 17.9263631800519 7.34284592381757e-72 4.44584844534075e-68

> dds2 <- dds[,c(79:84,90:94)]
> counts(dds2[grep("Actc1", rownames(dds2)),])
      WT_PBMC_24m_1 WT_PBMC_24m_2 WT_PBMC_24m_3 WT_PBMC_24m_4 WT_PBMC_24m_5 WT_PBMC_24m_6 WT_PBMC_2m_1 WT_PBMC_2m_2 WT_PBMC_2m_3 WT_PBMC_2m_4 WT_PBMC_2m_5
Actc1             0             0             0             0             0           778            0            0            0            0            0

> cntNorm2 <- cntNorm[,c(79:84,90:94)]
> cntNorm2[grep("Actc1", rownames(cntNorm2)),]
WT_PBMC_24m_1 WT_PBMC_24m_2 WT_PBMC_24m_3 WT_PBMC_24m_4 WT_PBMC_24m_5 WT_PBMC_24m_6  WT_PBMC_2m_1  WT_PBMC_2m_2  WT_PBMC_2m_3  WT_PBMC_2m_4  WT_PBMC_2m_5 
        0.000         0.000         0.000         0.000         0.000       796.481         0.000         0.000         0.000         0.000         0.000

Thank you!

ADD REPLYlink written 1 day ago by mbk0asis550

Without full code and a complete description of the experiment (at least the colData) I cannot comment any further.

ADD REPLYlink written 1 day ago by ATpoint36k

OK. Thank you for spending time for me!

I'll check the codes and data again to see if there's any mistakes.

ADD REPLYlink written 1 day ago by mbk0asis550
3
gravatar for antonioggsousa
1 day ago by
antonioggsousa200 wrote:

Hi,

There are two things that you can do to correct or minimize what you've obtained:

(1) use the lfcShrink() (from the DESeq2) (have a look into the function doc: https://rdrr.io/bioc/DESeq2/man/lfcShrink.html) to shrink the log-fold changes of genes that are highly variable among replicates or have low read counts and, therefore, the estimates are more prone to uncertainty and error. This function is applied after you get the differential expression results. It will not change the p-value, but will decrease the log-fold change towards zero (the official DESeq2 vignette provide some examples);

(2) you can remove zeros and singletons (genes expressed only once) from your data set, as this may represent sequencing artifacts, contamination etc.

I hope this helps,

António

ADD COMMENTlink written 1 day ago by antonioggsousa200
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1077 users visited in the last hour