My question is about the model (design) formula used for differential expression analysis in tools such as DESeq2, edgeR, Sleuth, etc.
My sample table looks like this (the real dataset has more replicates; I have trimmed it here):
sample  tissue  family  replicate  condition
a11c    a       1       1          c
a12c    a       1       2          c
a21c    a       2       1          c
a22c    a       2       2          c
b11c    b       1       1          c
b12c    b       1       2          c
b21c    b       2       1          c
b22c    b       2       2          c
a11t    a       1       1          t
a12t    a       1       2          t
a21t    a       2       1          t
a22t    a       2       2          t
b11t    b       1       1          t
b12t    b       1       2          t
b21t    b       2       1          t
b22t    b       2       2          t
I have two tissues (a and b), each under two conditions: control (c) and treated (t). I also have families. I am not really interested in differentially expressed genes/transcripts (DEGs/DETs) between tissues; I am interested in DEGs/DETs between control and treated in both tissues. What is the correct way to specify this model? The candidates I am considering are:
~tissue+condition
~tissue*condition
~tissue:condition
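To see what each candidate formula actually fits, it can help to look at the design matrix it produces: edgeR and limma users usually build it themselves with model.matrix(), and DESeq2 derives one from its design formula. The base-R sketch below rebuilds the example sample table (`samples` is a hypothetical name for the sample sheet/colData) and prints the columns and rank for each formula, plus a commonly used alternative that merges tissue and condition into one `group` factor (also a name introduced here for illustration).

```r
# Rebuild the example sample table: 2 tissues x 2 families x 2 replicates x 2 conditions.
samples <- expand.grid(
  replicate = c("1", "2"),
  family    = c("1", "2"),
  tissue    = c("a", "b"),
  condition = c("c", "t")   # "c" (control) sorts first, so it becomes the reference level
)
rownames(samples) <- with(samples, paste0(tissue, family, replicate, condition))

# 1. Additive: a single treatment effect assumed to be shared by both tissues.
mm_add <- model.matrix(~ tissue + condition, data = samples)

# 2. Interaction: tissue + condition + tissue:condition, so the treatment effect may differ by tissue.
mm_int <- model.matrix(~ tissue * condition, data = samples)

# 3. Interaction term only: R keeps the intercept, so the columns are linearly dependent.
mm_colon <- model.matrix(~ tissue:condition, data = samples)

# Common alternative: one level per tissue/condition combination, no intercept.
samples$group <- factor(paste0(samples$tissue, samples$condition))
mm_group <- model.matrix(~ 0 + group, data = samples)

for (mm in list(mm_add, mm_int, mm_colon, mm_group)) {
  cat(paste(colnames(mm), collapse = ", "),
      "| columns:", ncol(mm), "rank:", qr(mm)$rank, "\n")
}
```

In the interaction model, the treated-vs-control effect in tissue a is the `conditiont` coefficient and in tissue b it is `conditiont + tissueb:conditiont`, while `tissueb:conditiont` on its own asks whether the treatment effect differs between tissues. `~tissue:condition` alone is not full rank, and DESeq2 will typically stop with a "not full rank" error for such a design. With the group coding, the per-tissue comparisons are `at` vs `ac` and `bt` vs `bc`, e.g. `results(dds, contrast = c("group", "at", "ac"))` in DESeq2.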
Since I am not very interested in DEGs between tissues, would it make sense to split the data into two datasets by tissue and analyse each one separately?
subset(df, tissue == "a")   ~condition
subset(df, tissue == "b")   ~condition
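As a rough analogue of the split-versus-pooled question, the sketch below uses an ordinary linear model (lm) on one simulated value per sample, hypothetical data standing in for one gene's log-scale expression. This is not what DESeq2 or edgeR fit (they use negative binomial GLMs with their own dispersion estimation), but it shows the trade-off: with a tissue-by-condition interaction, the pooled model reproduces each per-tissue treatment estimate exactly, while estimating residual variability from 12 rather than 6 residual degrees of freedom.

```r
# Linear-model analogue (not DESeq2/edgeR): pooled vs per-tissue fits for one simulated gene.
samples <- expand.grid(
  replicate = c("1", "2"),
  family    = c("1", "2"),
  tissue    = c("a", "b"),
  condition = c("c", "t")
)
set.seed(1)
samples$y <- rnorm(nrow(samples))   # hypothetical log-scale expression values

# Pooled: both tissues in one model, treatment effect allowed to differ by tissue.
fit_all <- lm(y ~ tissue * condition, data = samples)

# Split: a separate ~condition model for each tissue.
fit_a <- lm(y ~ condition, data = samples, subset = tissue == "a")
fit_b <- lm(y ~ condition, data = samples, subset = tissue == "b")

# Per-tissue treatment effects recovered from the pooled interaction model.
b <- coef(fit_all)
effect_a_pooled <- unname(b["conditiont"])
effect_b_pooled <- unname(b["conditiont"] + b["tissueb:conditiont"])

cat("tissue a effect: pooled", effect_a_pooled, "split", coef(fit_a)[["conditiont"]], "\n")
cat("tissue b effect: pooled", effect_b_pooled, "split", coef(fit_b)[["conditiont"]], "\n")
cat("residual df: pooled", df.residual(fit_all),
    "| per tissue", df.residual(fit_a), df.residual(fit_b), "\n")
```

The pooled residual variance here is effectively the average of the two per-tissue variances, which is only a sensible summary if both tissues are similarly variable. A common rule of thumb is therefore to keep both tissues in one model unless their within-group variability differs markedly.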
Family is an additional variable that is not critical but would be interesting to inspect. Can I simply add it to the original model? Also, does the order of the terms in the formula matter?
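Adding family as another term is straightforward as long as it is not confounded with the other variables. The sketch below assumes, hypothetically, that family labels 1 and 2 refer to the same families in both tissues (the sample names suggest this, but it is an assumption). It adds family to the interaction model and checks, with a simulated response, that reordering the terms leaves the fitted values unchanged.

```r
samples <- expand.grid(
  replicate = c("1", "2"),
  family    = c("1", "2"),   # assumed to be the same families in both tissues
  tissue    = c("a", "b"),
  condition = c("c", "t")
)
set.seed(42)
y <- rnorm(nrow(samples))   # hypothetical response for a single gene

# Family added as an extra (nuisance) term, written in two different orders.
mm_first <- model.matrix(~ family + tissue * condition, data = samples)
mm_last  <- model.matrix(~ tissue * condition + family, data = samples)

# Both matrices contain the same columns (only their order differs) and are full rank.
print(colnames(mm_first))
print(colnames(mm_last))
cat("rank:", qr(mm_first)$rank, "of", ncol(mm_first), "columns\n")

# Same column space, so the least-squares fitted values are identical.
fitted_first <- qr.fitted(qr(mm_first), y)
fitted_last  <- qr.fitted(qr(mm_last), y)
cat("identical fit:", isTRUE(all.equal(fitted_first, fitted_last)), "\n")
```

Term order does not change the fit or the estimate for a given coefficient. In DESeq2 it does change what results() reports by default (the last variable in the design formula), so the usual convention is to put nuisance variables such as family first and the variable of interest last.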
Any other considerations for such analyses? Thanks.