Hello everyone, here's my question:
I have three datasets, each with RNA-seq and WES data from cases and controls of the same disease. After stratifying the cases in all three datasets by the presence or absence of a deleterious mutation in a particular gene, I obtain three groups per dataset: Controls, Cases_1 and Cases_2. I then perform differential expression analyses for Cases_1 vs Controls and Cases_2 vs Controls, for each dataset separately. If I select the genes that are differentially expressed in all three "Cases_2 vs Controls" analyses but not in all three "Cases_1 vs Controls" analyses, can I consider those a signature for the Cases_2 group? Or are additional analyses needed?
Thanks in advance to anyone who will answer.
Hi, thank you very much for the response.
So for the binomial logistic regression, would it be best to put the raw counts from all three datasets into one matrix, normalize, and then perform the regressions, or to do that separately for each dataset?
Can the normalization be performed with DESeq2 functions?
Also, would it be best to perform the regressions first and then check how many of the genes with a significant regression are also differentially expressed in all three datasets (FDR < 0.05, log2FoldChange > 1 or < -1), or vice versa?
Thank you and sorry for the many questions,
Giovanni
Good evening Giovanni, yes, I would use the `vsd` data. The idea of the regression, in this case, is to confirm that each gene's expression differs based on mutation status.
`mydata` may contain Group1 and Group2 samples combined, or it may contain Group1 + Group2 + Controls. In each case, the meaning of the result will change. This is part of research. Please be flexible with these models, though, and use whatever you feel is appropriate.
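As a rough illustration of the kind of per-gene regression described above, here is a minimal sketch in base R. It is not necessarily the exact model intended in this thread: the data are simulated, and the names `expr`, `mutation` and `fit_gene` are hypothetical. With real data, `expr` could be the variance-stabilized matrix (for example `assay(vsd)`), and which samples go into `mydata` determines what the result means.

```r
# Toy illustration with simulated data; all object names are hypothetical.
set.seed(1)
n_samples <- 40

# 0 = no deleterious mutation, 1 = deleterious mutation
mutation <- rep(c(0, 1), each = n_samples / 2)

# Stand-in for variance-stabilized expression (genes in rows, samples in columns).
expr <- rbind(
  geneA = rnorm(n_samples, mean = 8 + 1.5 * mutation, sd = 1),  # shifted in mutated samples
  geneB = rnorm(n_samples, mean = 8, sd = 1)                    # no true difference
)

# Fit one binomial logistic regression per gene: mutation status ~ expression.
fit_gene <- function(gene) {
  mydata <- data.frame(mutation = mutation, expression = expr[gene, ])
  model <- glm(mutation ~ expression, data = mydata,
               family = binomial(link = "logit"))
  coefs <- summary(model)$coefficients
  c(log_odds = coefs["expression", "Estimate"],
    p_value  = coefs["expression", "Pr(>|z|)"])
}

# One row per gene, then Benjamini-Hochberg adjustment across genes.
results <- t(sapply(rownames(expr), fit_gene))
results <- cbind(results, padj = p.adjust(results[, "p_value"], method = "BH"))
print(results)
```

If the three datasets are combined into one `mydata`, adding a dataset term to the formula (for example `mutation ~ expression + dataset`) is one way to account for batch differences, though whether that is appropriate depends on the design.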