DESeq2 multifactor design unexpected result
0
0
Entering edit mode
7.5 years ago
john ▴ 50

Some lines of original data

         p1n  p1t  p2n  p2t  p3n  p3t  p4n  p4t  p5n  p5t  p6n  p6t  p7n  p7t  p8n  p8t  p9n p9t  p10n p10t p11n p11t p12n p12t p13n p13t p14n p14t p15n p15t p16n p16t p17n p17t p18n p18t

  "MYOC|4653" 605.00 0.00 1432.00 0.00 3155.00 10.00 36.00 0.00 197.00 2.00 18.00 0.00 37.00 4.00 301.00 1.00 5959.00 2.00 676.00 4.00 2228.00 11.00 1.00 1.00 134.00 5.00 230.00 15.00 676.00 1.00 35.00 1.00 675.00 3.00 204.00 3.00

  "COL10A1|1300" 5.00 622.00 2.00 4525.00 13.00 4091.00 1.00 2493.00 21.00 19.00 1.00 55.00 1.00 5016.00 19.00 1401.00 11.00 1245.00 11.00 24.00 6.00 142.00 3.00 175.00 8.00 1559.00 6.00 1980.00 11.00 535.00 2.00 328.00 8.00 863.00 10.00 1157.00

where p# stands for patient number, n stands for normal, and t stands for tumor

The table design looks like this

         condition patientNumber
  p1n     normal             1
  p1t      tumor             1
  p2n     normal             2
  p2t      tumor             2
  p3n     normal             3
  p3t      tumor             3
  p4n     normal             4
  p4t      tumor             4
  p5n     normal             5
  p5t      tumor             5
  p6n     normal             6
  p6t      tumor             6
  p7n     normal             7
  p7t      tumor             7
  p8n     normal             8
  p8t      tumor             8
  p9n     normal             9
  p9t      tumor             9
  p10n    normal            10
  p10t     tumor            10
  p11n    normal            11
  p11t     tumor            11
  p12n    normal            12
  p12t     tumor            12
  p13n    normal            13
  p13t     tumor            13
  p14n    normal            14
  p14t     tumor            14
  p15n    normal            15
  p15t     tumor            15
  p16n    normal            16
  p16t     tumor            16
  p17n    normal            17
  p17t     tumor            17
  p18n    normal            18
  p18t     tumor            18

Running a multifactor design as such

design = ~condition + patientNumber

I get these results for the corresponding genes

                 "baseMean" "log2FoldChange"    "lfcSE" "stat"  "pvalue"    "padj"
  "COL10A1|1300"    653.809042950468    -0.0625642907767188 0.0326682787216305  -1.91513888166056   0.0554747943911895  0.252118389003285
  "MYOC|4653"   542.597748780303    -0.0328043889593688 0.0345755091590205  -0.94877529665562   0.342734908563562   0.609860961407258

where as with a single factor design like

design = ~condition

I get results like

                 "baseMean" "log2FoldChange"    "lfcSE" "stat"  "pvalue"    "padj"
  "COL10A1|1300"    653.809042950468    6.78450422681106    0.48849885948802    13.8884750599453    7.44025301718552e-44    1.42845417676945e-39
  "MYOC|4653"   542.597748780303    -7.66525845621716   0.603532101073441   -12.7006640451829   5.8630099542801e-37 5.62819640561118e-33

In particular I want to know why the log2foldchange values of the multifactor design is so off

RNA-Seq DESeq2 • 1.8k views
ADD COMMENT
2
Entering edit mode

Without looking, is patientNumber a factor or numeric?

ADD REPLY
0
Entering edit mode

Thanks for responding

     condition patientNumber
        <factor>     <integer>
  p1n     normal             1
  p1t      tumor             1
  p2n     normal             2

I'm guessing that the fact that it is an integer is the source of my issues. If so, should I change it to factor or numeric? and If I do change it to factor, isn't there the issue of reference level? In my case, there is no base level for the patientNumber factor. Sorry if these questions are very trivial. It is my first time using DESeq.

ADD REPLY
1
Entering edit mode

Make it a factor, the reference level doesn't matter, it's just a nuisance variable.

ADD REPLY
0
Entering edit mode

I changed patientNumber into a factor vector and re-ran the multifactor design analysis but the fold change values still look pretty off.

"COL10A1|1300"  653.809042950468    0.156514116923504   0.969946841248428   0.161363602898127   0.871807031641036   0.988385593192889
"MYOC|4653" 542.597748780303    -0.771047101522989  0.991499093631348   -0.777657898505023  0.436770718403106   0.820786203692624
ADD REPLY
0
Entering edit mode

It's a shrunken fold-change, which might be a bit aggressive given your design. Try disabling that then.

ADD REPLY
0
Entering edit mode

Ok so I read documentation on DESeq() and results() and it seems like addMLE is the only argument that fits the bill. I did

res <- results(dds, addMLE = TRUE)

and the lfcMLE value still doesn't quite match the single factor run.

                    "baseMean"  "log2FoldChange"    "lfcMLE"
"MYOC|4653" 542.597748780303    -0.771047101522989  -1.13754641637061
"COL10A1|1300"  653.809042950468    0.156514116923504   0.225895491935506``
ADD REPLY
0
Entering edit mode

There's no reason the two should match, or even be close...one is controlling for patients the other isn't.

ADD REPLY
0
Entering edit mode

Well I understand that it shouldn't match but based on several examples I've seen that compared pool test with paired test, the fold changes values don't vary that much so I assumed there shouldn't be too much of difference in my case too. But this is besides the point.

The lfcMLE values for COL10A1 does not make sense to me, 2^0.225895491 =~1.17. If you were to inspect the expression values for COL10A1, you will see that for all patients except from one (patient 5), the increases in expression are massively positive.

If i were to take 0.225895491 as the correct value for my data, can you explain to me why accounting for patient factor drops my fold change value from 6.8 to 0.23 when the majority of the patients exhibit increased expression dramatically

Thanks

ADD REPLY
0
Entering edit mode

What code are you using to get the results? Be sure to specify that you want condition, since otherwise you'll get the patient18 offset (unless you changed your design such that the patients were before condition).

ADD REPLY
0
Entering edit mode

The code is like this. Everything else I ran with default parameters

dds <- DESeqDataSetFromMatrix(countData = expressionTable, colData = tableDesign, design = ~condition + patientNumber)}
dds<-  dds[    rowSums    (    counts    (dds))    >      1    , ]
dds$condition <- relevel(dds$condition, ref="normal")
dds <- DESeq(dds)
res <- results(dds, addMLE = TRUE)
resOrdered <- res[order(res$padj),]

After that I output the file and grep gene file to look up values

Am I supposed to do

design = ~ patientNumber + condition

instead? I assumed the order for the design arguments didn't matter

ADD REPLY
1
Entering edit mode

It doesn't except when you're trying to plot things or extract results (in which case you can specify the coefficient you want). By default, most things (not just DESeq2) will use the last thing in the model as the default output. In your case that's a patient, so either change the order in the design or use the name= parameter in results().

ADD REPLY
1
Entering edit mode

Devon, thank you so much.

I'm finally seeing fold change values that correspond to what I expect

               log2FoldChange
 "MYOC|4653"     -7.1
 "COL10A1"         6.33333
ADD REPLY
0
Entering edit mode

Glad that's cleared up :)

ADD REPLY

Login before adding your answer.

Traffic: 1873 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6