
My question is about the model formula used in differential expression analyses in applications such as DESeq2, edgeR, Sleuth etc.

I have a dataset that looks like this. There are more replicates, but the table is reduced here.

```
sample tissue family replicate condition
a11c a 1 1 c
a12c a 1 2 c
a21c a 2 1 c
a22c a 2 2 c
b11c b 1 1 c
b12c b 1 2 c
b21c b 2 1 c
b22c b 2 2 c
a11t a 1 1 t
a12t a 1 2 t
a21t a 2 1 t
a22t a 2 2 t
b11t b 1 1 t
b12t b 1 2 t
b21t b 2 1 t
b22t b 2 2 t
```

I have 2 tissues (a and b) and 2 treatments (control and treated). I also have families. I am not really interested in differentially expressed genes/transcripts (deg/det) between tissues. I am interested in deg/det between control and treated within each of the two tissues. What is the correct way to set up this model?

```
~tissue+condition
~tissue*condition
~tissue:condition
```
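To make the comparison concrete, here is a minimal base R sketch that rebuilds the reduced sample table and lists the design-matrix columns each formula expands to. The object names (`df`, `mm_add`, `mm_int`, `mm_only`) are my own, and the tools above may process the formula further, so treat this only as a way to inspect the formulas.

```r
# Rebuild the reduced sample table (hypothetical object name: df).
# expand.grid varies the first argument fastest, matching the row order above.
df <- expand.grid(
  replicate = factor(1:2),
  family    = factor(1:2),
  tissue    = factor(c("a", "b")),
  condition = factor(c("c", "t"))
)
df$sample <- paste0(df$tissue, df$family, df$replicate, df$condition)

# Additive model: one tissue effect, one condition effect shared by both tissues
mm_add <- model.matrix(~ tissue + condition, data = df)

# Main effects plus interaction: the condition effect is allowed to differ by tissue
mm_int <- model.matrix(~ tissue * condition, data = df)

# Interaction term only, without the main effects
mm_only <- model.matrix(~ tissue:condition, data = df)

# Inspect the coefficients each formula would estimate
print(colnames(mm_add))
print(colnames(mm_int))
print(colnames(mm_only))

# Compare the number of columns with the matrix rank to spot redundant columns
print(c(columns = ncol(mm_only), rank = qr(mm_only)$rank))
```

If the rank is lower than the number of columns, the design is not full rank and some coefficients cannot be estimated separately.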

Since I am not that interested in degs between tissues, would it make sense to split the data into 2 datasets by tissue and analyse each separately?

```
# tissue a only
subset(df, tissue == "a")
~condition
# tissue b only
subset(df, tissue == "b")
~condition
```
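A rough sketch of the split approach in base R, assuming the sample table is a data frame called `df` as above (the names `by_tissue` and `designs` are made up for illustration). The count matrix would have to be subset to the same samples, which is not shown here.

```r
# Reduced sample table, rebuilt so the example runs on its own
df <- expand.grid(
  replicate = factor(1:2),
  family    = factor(1:2),
  tissue    = factor(c("a", "b")),
  condition = factor(c("c", "t"))
)

# One data frame per tissue; list elements are named "a" and "b"
by_tissue <- split(df, df$tissue)

# Build a ~condition design for each tissue separately.
# droplevels() removes the tissue level that is no longer present.
designs <- lapply(by_tissue, function(d) {
  model.matrix(~ condition, data = droplevels(d))
})

# Number of samples left in each per-tissue analysis
print(sapply(by_tissue, nrow))
print(lapply(designs, colnames))
```

Each per-tissue analysis then sees only half of the samples, which is the main trade-off against fitting one model on the full dataset.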

Family is an additional variable that is not critical, but it would be interesting to inspect. Can I just add it to the original model? Also, does the order of the terms matter?

```
~tissue+condition+family
```
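A small base R sketch for checking the term order, assuming `family` is stored as the integers 1 and 2 as in the table (object names `m1` and `m2` are hypothetical). A numeric column in a formula is treated as a continuous covariate, so it is converted to a factor first.

```r
# Reduced sample table with family stored as integers, as in the table above
df <- data.frame(
  tissue    = factor(rep(rep(c("a", "b"), each = 4), times = 2)),
  family    = rep(rep(1:2, each = 2), times = 4),
  condition = factor(rep(c("c", "t"), each = 8))
)

# Convert family to a factor so it is modelled as a grouping variable,
# not as a numeric slope
df$family <- factor(df$family)

# Same additive terms written in two different orders
m1 <- model.matrix(~ tissue + condition + family, data = df)
m2 <- model.matrix(~ family + tissue + condition, data = df)

print(colnames(m1))
print(colnames(m2))

# TRUE if both designs contain the same set of columns
print(setequal(colnames(m1), colnames(m2)))
```

For an additive model like this, the two designs contain the same columns in a different order. Which coefficient a given tool reports by default may still depend on the order, so that is worth checking in the tool's documentation.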

Any other considerations for such analyses? Thanks.

written 3.2 years ago by rmf
