Entering edit mode
                    9.1 years ago
        john
        
    
        ▴
    
    50
    Some lines of original data
         p1n  p1t  p2n  p2t  p3n  p3t  p4n  p4t  p5n  p5t  p6n  p6t  p7n  p7t  p8n  p8t  p9n p9t  p10n p10t p11n p11t p12n p12t p13n p13t p14n p14t p15n p15t p16n p16t p17n p17t p18n p18t
  "MYOC|4653" 605.00 0.00 1432.00 0.00 3155.00 10.00 36.00 0.00 197.00 2.00 18.00 0.00 37.00 4.00 301.00 1.00 5959.00 2.00 676.00 4.00 2228.00 11.00 1.00 1.00 134.00 5.00 230.00 15.00 676.00 1.00 35.00 1.00 675.00 3.00 204.00 3.00
  "COL10A1|1300" 5.00 622.00 2.00 4525.00 13.00 4091.00 1.00 2493.00 21.00 19.00 1.00 55.00 1.00 5016.00 19.00 1401.00 11.00 1245.00 11.00 24.00 6.00 142.00 3.00 175.00 8.00 1559.00 6.00 1980.00 11.00 535.00 2.00 328.00 8.00 863.00 10.00 1157.00
where p# stands for patient number, n stands for normal, and t stands for tumor
The table design looks like this
         condition patientNumber
  p1n     normal             1
  p1t      tumor             1
  p2n     normal             2
  p2t      tumor             2
  p3n     normal             3
  p3t      tumor             3
  p4n     normal             4
  p4t      tumor             4
  p5n     normal             5
  p5t      tumor             5
  p6n     normal             6
  p6t      tumor             6
  p7n     normal             7
  p7t      tumor             7
  p8n     normal             8
  p8t      tumor             8
  p9n     normal             9
  p9t      tumor             9
  p10n    normal            10
  p10t     tumor            10
  p11n    normal            11
  p11t     tumor            11
  p12n    normal            12
  p12t     tumor            12
  p13n    normal            13
  p13t     tumor            13
  p14n    normal            14
  p14t     tumor            14
  p15n    normal            15
  p15t     tumor            15
  p16n    normal            16
  p16t     tumor            16
  p17n    normal            17
  p17t     tumor            17
  p18n    normal            18
  p18t     tumor            18
Running a multifactor design as such
design = ~condition + patientNumber
I get these results for the corresponding genes
                 "baseMean" "log2FoldChange"    "lfcSE" "stat"  "pvalue"    "padj"
  "COL10A1|1300"    653.809042950468    -0.0625642907767188 0.0326682787216305  -1.91513888166056   0.0554747943911895  0.252118389003285
  "MYOC|4653"   542.597748780303    -0.0328043889593688 0.0345755091590205  -0.94877529665562   0.342734908563562   0.609860961407258
where as with a single factor design like
design = ~condition
I get results like
                 "baseMean" "log2FoldChange"    "lfcSE" "stat"  "pvalue"    "padj"
  "COL10A1|1300"    653.809042950468    6.78450422681106    0.48849885948802    13.8884750599453    7.44025301718552e-44    1.42845417676945e-39
  "MYOC|4653"   542.597748780303    -7.66525845621716   0.603532101073441   -12.7006640451829   5.8630099542801e-37 5.62819640561118e-33
In particular I want to know why the log2foldchange values of the multifactor design is so off
Without looking, is
patientNumbera factor or numeric?Thanks for responding
I'm guessing that the fact that it is an integer is the source of my issues. If so, should I change it to factor or numeric? and If I do change it to factor, isn't there the issue of reference level? In my case, there is no base level for the patientNumber factor. Sorry if these questions are very trivial. It is my first time using DESeq.
Make it a factor, the reference level doesn't matter, it's just a nuisance variable.
I changed patientNumber into a factor vector and re-ran the multifactor design analysis but the fold change values still look pretty off.
It's a shrunken fold-change, which might be a bit aggressive given your design. Try disabling that then.
Ok so I read documentation on DESeq() and results() and it seems like addMLE is the only argument that fits the bill. I did
and the lfcMLE value still doesn't quite match the single factor run.
There's no reason the two should match, or even be close...one is controlling for patients the other isn't.
Well I understand that it shouldn't match but based on several examples I've seen that compared pool test with paired test, the fold changes values don't vary that much so I assumed there shouldn't be too much of difference in my case too. But this is besides the point.
The lfcMLE values for COL10A1 does not make sense to me, 2^0.225895491 =~1.17. If you were to inspect the expression values for COL10A1, you will see that for all patients except from one (patient 5), the increases in expression are massively positive.
If i were to take 0.225895491 as the correct value for my data, can you explain to me why accounting for patient factor drops my fold change value from 6.8 to 0.23 when the majority of the patients exhibit increased expression dramatically
Thanks
What code are you using to get the results? Be sure to specify that you want
condition, since otherwise you'll get the patient18 offset (unless you changed your design such that the patients were beforecondition).The code is like this. Everything else I ran with default parameters
After that I output the file and grep gene file to look up values
Am I supposed to do
instead? I assumed the order for the design arguments didn't matter
It doesn't except when you're trying to plot things or extract results (in which case you can specify the coefficient you want). By default, most things (not just DESeq2) will use the last thing in the model as the default output. In your case that's a patient, so either change the order in the design or use the
name=parameter inresults().Devon, thank you so much.
I'm finally seeing fold change values that correspond to what I expect
Glad that's cleared up :)