Question: Actually meaning of log2FoldChange, p-value & padj in DESeq2 results
1
gravatar for hffqyd
9 weeks ago by
hffqyd10
hffqyd10 wrote:

Hi, I'm new to RNA-seq and differentially expressed gene (DEG) analysis using DESeq2, the R package. But I encountered a problem need for help.

The experiment has samples of 3 group (control, treat1, treat2) with 3 replications each. Having RNA-sequenced, the reads counts were inputed to DESeq2 for DEG analysis, according to DESeq2's manual.

dds=DESeqDataSetFromMatrix(countData = counts, 
                             colData = samples, 
                           design = ~ group)

dds = DESeq(dds)

result=results(dds, contrast=c("group", "treat1", "control"))

In the result, there was log2FoldChange, p value, padj, etc for each gene. The question is when I compared the log2FoldChange with the raw reads of some genes it seemed that the log2FoldChange compared "control VS treat1" (here treat1 was the base level) or "treat1 VS treat2", instead of "treat1 VS control" (here control was the base level, which was my expectation). So it would be the case that a gene up expressed in treat1 (by looking at the raw reads) was down expressed in the results.

Below is a subset of the result.


<style type="text/css"> .tg {border-collapse:collapse;border-spacing:0;border-color:#ccc;} .tg td{font-family:Arial, sans-serif;font-size:14px;padding:10px 5px;border-style:solid;border-width:0px;overflow:hidden;word-break:normal;border-color:#ccc;color:#333;background-color:#fff;} .tg th{font-family:Arial, sans-serif;font-size:14px;font-weight:normal;padding:10px 5px;border-style:solid;border-width:0px;overflow:hidden;word-break:normal;border-color:#ccc;color:#333;background-color:#f0f0f0;} .tg .tg-0lax{text-align:left;vertical-align:top} </style>

symbol control-1_reads control-2_reads control-3_reads treat1-1_reads treat1-2_reads treat1-3_reads treat2-1_reads treat2-2_reads treat2-3_reads control-1_norm control-2_norm control-3_norm treat1-1_norm treat1-2_norm treat1-3_norm treat2-1_norm treat2-2_norm treat2-3_norm baseMean log2FoldChange lfcSE stat pvalue padj
gene1 104 113 155 518 514 471 0 1 0 121.8857286 99.70389301 129.1534889 489.6363946 454.1863516 426.6076245 0 1.150867377 0 191.3693721 10.14006488 1.185315798 8.554736972 1.18E-17 6.17E-17
gene2 44 36 47 903 1020 1033 2 1 1 51.56703902 31.76407211 39.16267082 853.5553366 901.303655 935.6383781 2.048006023 1.150867377 1.223606775 313.0459591 9.221995767 0.729090182 12.64863524 1.14E-36 1.02E-35
gene3 28 30 38 133 132 119 0 0 0 32.81538847 26.47006009 31.66343599 125.7174527 116.6392965 107.784092 0 0 0 49.00996952 9.133180831 1.200947709 7.604977935 2.85E-14 1.30E-13
gene4 457 573 721 34150 35421 35562 65 65 62 535.5940189 505.5781477 600.7720354 32280.08277 31299.09487 32210.23427 66.56019573 74.80637953 75.86362004 10849.84292 8.790220784 0.109694018 80.13400281 0 0
gene5 2 1 1 14 9 15 473 395 372 2.343956319 0.882335336 0.833248315 13.23341607 7.952679309 13.58623008 484.3534243 454.592614 455.1817202 159.217736 -5.33149575 0.272127405 -19.59191047 1.81E-85 3.54E-84
gene6 34 55 44 158 162 187 3498 3292 2806 39.84725743 48.5284435 36.66292588 149.3485528 143.1482276 169.3750017 3581.962533 3788.655406 3433.44061 1265.663218 -4.548224914 0.088341971 -51.48430413 0 0
gene7 1170 1703 1749 337 339 424 8399 7277 7116 1371.214447 1502.617078 1457.351304 318.5472297 299.5509206 384.0374369 8600.601292 8374.861905 8707.185809 3446.218602 -4.679805864 0.064073627 -73.03794184 0 0
gene8 25 31 29 1194 1279 1196 22834 21585 19438 29.29945399 27.35239543 24.16420115 1128.621342 1130.164093 1083.275412 23382.08476 24841.47234 23784.46849 8381.211387 -4.429334677 0.041503905 -106.7209146 0 0


Could you please tell me why? Was there I'm doing something wrong? Thank you very much!

ADD COMMENTlink modified 8 weeks ago • written 9 weeks ago by hffqyd10
1

The object generated by results() should spell out pretty clearly which comparison you did as in the following example:

>DESeq.ds <- DESeq(DESeq.ds)
>DGE.results <- results(DESeq.ds)
> head(DGE.results)
log2 fold change (MLE): condition WT vs SNF2  ## this indicates that the logFC corresponds to WT/SNF2
Wald test p-value: condition WT vs SNF2 
DataFrame with 6 rows and 6 columns
                   baseMean     log2FoldChange             lfcSE               stat               pvalue                 padj
                  <numeric>          <numeric>         <numeric>          <numeric>            <numeric>            <numeric>
YAL012W    5542.39136255275 -0.316249974757704  0.16213508054358  -1.95053392330292   0.0511125141585705    0.100943930597228
YAL068C   0.967854292906838 -0.402411323206287  1.22661502833643 -0.328066519576276    0.742861367806886                   NA
YAL067C    40.8786754337591   -1.0896295169803 0.229556222673793  -4.74667819625479 2.06784550031866e-06 1.26457496964761e-05
YAL068C   0.967854292906838 -0.402411323206287  1.22661502833643
ADD REPLYlink modified 9 weeks ago • written 9 weeks ago by Friederike2.3k

Thanks for your reply. Yes, the lines in the head of result did indicate the logFC corresponding to treat1/control of group. But, as mentioned above, some of the logFC looked odd.

> head(result)
log2 fold change (MLE): group treat1 vs control 
Wald test p-value: group treat1 vs control 
DataFrame with 6 rows and 6 columns
         baseMean log2FoldChange     lfcSE        stat    pvalue

...
...
ADD REPLYlink modified 9 weeks ago • written 9 weeks ago by hffqyd10

in your snapshot above, which logFC values seem odd to you?

ADD REPLYlink written 9 weeks ago by Friederike2.3k

gene5, gene6, gene7, gene8.

ADD REPLYlink written 9 weeks ago by hffqyd10

what do the rlog- or vst-transformed values look like?

gene 7 looks fine to me: gene7 1170 1703 1749 337 339 424

EDIT: I agree, your table contains a couple of puzzling results. Here's a couple of thoughts:

  • lowly expressed genes tend to show sometimes non-intuitive behaviors due to all the normalizations and assumptions that are being made.
  • How did you add the raw counts for the second treatment?
ADD REPLYlink modified 8 weeks ago • written 8 weeks ago by Friederike2.3k

Sorry for replying late.

  1. I updated the DESeq2 normalized counts;

  2. Yeah, gene7 is fine, I pasted the wrong line;

  3. If I didn't count wrong, there about 2% genes seemed non-intuitive like this.

  4. I merged the original counts data to the result by gene symbol;

Thanks for your reply.

ADD REPLYlink modified 8 weeks ago • written 8 weeks ago by hffqyd10

Did you check whether those 2% fall into the category of 'lowly expressed'?

I think, having the counts in treatment 2 in there plays a role (just look at the baseMean values). I'm not super familiar with the nitty-gritty details of how DESeq uses the contrast argument for the logFC generation, but I have a feeling that the results would probably look more intuitive if you ran the same analysis using only the values for treat1 and control. I recommend you post that question over on https://support.bioconductor.org/ where the authors of DESeq2 are going to see the question. Do post the answer here once you have it.

EDIT: Out of curiosity, can you post the results of resultsNames(dds)?

ADD REPLYlink modified 8 weeks ago • written 8 weeks ago by Friederike2.3k
  1. I supposed the treat2 looked to have been used in the "contrast";

  2. I had already posted to https://support.bioconductor.org, thanks;

  3. Below is the output of resultsNames(dds):

> resultsNames(dds)

[1]  "Intercept" 

[2]  "group_treat1_vs_control"

[3]  "group_treat2_vs_control"
  
ADD REPLYlink modified 8 weeks ago • written 8 weeks ago by hffqyd10

What was the response at the bioconductor support page?

What happens when you use "group_treat1_vs_control" as the contrast name for retrieving the results?

ADD REPLYlink written 8 weeks ago by Friederike2.3k
1

Rather than the raw counts, post the normalized counts:

counts(dds, normalized = TRUE)

ADD REPLYlink written 8 weeks ago by swbarnes24.5k

Sorry for replying late.

I updated the DESeq2 normalized counts above.

Thanks for your reply.

ADD REPLYlink written 8 weeks ago by hffqyd10

Is it possible that you've mixed up the "group" labeling with respect to samples?

ADD REPLYlink written 9 weeks ago by swbarnes24.5k

Thanks for your reply, I checked but I didn't mix up.

ADD REPLYlink written 9 weeks ago by hffqyd10
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1608 users visited in the last hour