Entering edit mode
7.5 years ago
john
▴
50
Some lines of original data
p1n p1t p2n p2t p3n p3t p4n p4t p5n p5t p6n p6t p7n p7t p8n p8t p9n p9t p10n p10t p11n p11t p12n p12t p13n p13t p14n p14t p15n p15t p16n p16t p17n p17t p18n p18t
"MYOC|4653" 605.00 0.00 1432.00 0.00 3155.00 10.00 36.00 0.00 197.00 2.00 18.00 0.00 37.00 4.00 301.00 1.00 5959.00 2.00 676.00 4.00 2228.00 11.00 1.00 1.00 134.00 5.00 230.00 15.00 676.00 1.00 35.00 1.00 675.00 3.00 204.00 3.00
"COL10A1|1300" 5.00 622.00 2.00 4525.00 13.00 4091.00 1.00 2493.00 21.00 19.00 1.00 55.00 1.00 5016.00 19.00 1401.00 11.00 1245.00 11.00 24.00 6.00 142.00 3.00 175.00 8.00 1559.00 6.00 1980.00 11.00 535.00 2.00 328.00 8.00 863.00 10.00 1157.00
where p# stands for patient number, n stands for normal, and t stands for tumor
The table design looks like this
condition patientNumber
p1n normal 1
p1t tumor 1
p2n normal 2
p2t tumor 2
p3n normal 3
p3t tumor 3
p4n normal 4
p4t tumor 4
p5n normal 5
p5t tumor 5
p6n normal 6
p6t tumor 6
p7n normal 7
p7t tumor 7
p8n normal 8
p8t tumor 8
p9n normal 9
p9t tumor 9
p10n normal 10
p10t tumor 10
p11n normal 11
p11t tumor 11
p12n normal 12
p12t tumor 12
p13n normal 13
p13t tumor 13
p14n normal 14
p14t tumor 14
p15n normal 15
p15t tumor 15
p16n normal 16
p16t tumor 16
p17n normal 17
p17t tumor 17
p18n normal 18
p18t tumor 18
Running a multifactor design as such
design = ~condition + patientNumber
I get these results for the corresponding genes
"baseMean" "log2FoldChange" "lfcSE" "stat" "pvalue" "padj"
"COL10A1|1300" 653.809042950468 -0.0625642907767188 0.0326682787216305 -1.91513888166056 0.0554747943911895 0.252118389003285
"MYOC|4653" 542.597748780303 -0.0328043889593688 0.0345755091590205 -0.94877529665562 0.342734908563562 0.609860961407258
where as with a single factor design like
design = ~condition
I get results like
"baseMean" "log2FoldChange" "lfcSE" "stat" "pvalue" "padj"
"COL10A1|1300" 653.809042950468 6.78450422681106 0.48849885948802 13.8884750599453 7.44025301718552e-44 1.42845417676945e-39
"MYOC|4653" 542.597748780303 -7.66525845621716 0.603532101073441 -12.7006640451829 5.8630099542801e-37 5.62819640561118e-33
In particular I want to know why the log2foldchange values of the multifactor design is so off
Without looking, is
patientNumber
a factor or numeric?Thanks for responding
I'm guessing that the fact that it is an integer is the source of my issues. If so, should I change it to factor or numeric? and If I do change it to factor, isn't there the issue of reference level? In my case, there is no base level for the patientNumber factor. Sorry if these questions are very trivial. It is my first time using DESeq.
Make it a factor, the reference level doesn't matter, it's just a nuisance variable.
I changed patientNumber into a factor vector and re-ran the multifactor design analysis but the fold change values still look pretty off.
It's a shrunken fold-change, which might be a bit aggressive given your design. Try disabling that then.
Ok so I read documentation on DESeq() and results() and it seems like addMLE is the only argument that fits the bill. I did
and the lfcMLE value still doesn't quite match the single factor run.
There's no reason the two should match, or even be close...one is controlling for patients the other isn't.
Well I understand that it shouldn't match but based on several examples I've seen that compared pool test with paired test, the fold changes values don't vary that much so I assumed there shouldn't be too much of difference in my case too. But this is besides the point.
The lfcMLE values for COL10A1 does not make sense to me, 2^0.225895491 =~1.17. If you were to inspect the expression values for COL10A1, you will see that for all patients except from one (patient 5), the increases in expression are massively positive.
If i were to take 0.225895491 as the correct value for my data, can you explain to me why accounting for patient factor drops my fold change value from 6.8 to 0.23 when the majority of the patients exhibit increased expression dramatically
Thanks
What code are you using to get the results? Be sure to specify that you want
condition
, since otherwise you'll get the patient18 offset (unless you changed your design such that the patients were beforecondition
).The code is like this. Everything else I ran with default parameters
After that I output the file and grep gene file to look up values
Am I supposed to do
instead? I assumed the order for the design arguments didn't matter
It doesn't except when you're trying to plot things or extract results (in which case you can specify the coefficient you want). By default, most things (not just DESeq2) will use the last thing in the model as the default output. In your case that's a patient, so either change the order in the design or use the
name=
parameter inresults()
.Devon, thank you so much.
I'm finally seeing fold change values that correspond to what I expect
Glad that's cleared up :)