pseudobulk differential expression design matrix
3 months ago
nhaus ▴ 360

Hi all,

I have the following situation and I just want to make sure that I understand everything correctly from a statistics point of view...

I run a pseudobulk differential expression analysis, where we have a treatment group and a control group. Each group has two replicates (i.e. Ctrl_1, Ctrl_2, Treat_1 and Treat_2). The replicates were performed in batches, i.e. replicate 1 in batch 1 and replicate 2 in batch 2. After summarizing all the counts for one cell population of interest, we end up with metadata that essentially looks like this:

sample_id group_id batch
Ctrl_1 Ctrl 1
Ctrl_2 Ctrl 2
Treat_1 Treat 1
Treat_2 Treat 2

I am interested in comparing Treat vs Ctrl while adjusting for batch, so our model matrix looks like this: mm <- model.matrix(~ batch + group_id, data = mdata)

(Intercept) batch2 group_idTreat
1 0 0
1 1 0
1 0 1
1 1 1
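
For reference, here is a minimal sketch that reproduces this design matrix from the metadata above. The names `mdata` and `mm` come from the post; encoding `batch` and `group_id` as factors, with `Ctrl` as the reference level, is an assumption about how the metadata was set up.

```r
# Sample-level metadata for the pseudobulk samples (one row per sample).
# Assumption: batch and group_id are stored as factors, Ctrl is the reference.
mdata <- data.frame(
  sample_id = c("Ctrl_1", "Ctrl_2", "Treat_1", "Treat_2"),
  group_id  = factor(c("Ctrl", "Ctrl", "Treat", "Treat"),
                     levels = c("Ctrl", "Treat")),
  batch     = factor(c(1, 2, 1, 2))
)

# Additive model: batch effect plus treatment effect.
mm <- model.matrix(~ batch + group_id, data = mdata)
print(mm)

# The design is estimable only if the rank equals the number of columns.
mm_rank <- qr(mm)$rank
cat("columns:", ncol(mm), "rank:", mm_rank, "\n")
```

With four samples and three coefficients, this leaves a single residual degree of freedom for estimating dispersion.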

This is all very straightforward.

Here is the part that confuses me slightly. We are using a method that classifies some cells from the Treat group as controls (because the experimental perturbation did not work properly). This means that we end up with new group_ids, namely Ctrl_like and Treat_like. I am still interested in comparing the expression of Treat_like vs Ctrl_like. Is my assumption correct that it is now impossible to perform a standard pseudobulk differential expression analysis, because one sample (e.g. Treat_1) can belong to two groups (Ctrl_like and Treat_like) simultaneously, and that it is therefore no longer possible to adjust for batch effects? This is what the metadata would look like:

sample_id group_id batch
Ctrl_1 Ctrl_like 1
Ctrl_1 Treat_like 1
Ctrl_2 Ctrl_like 2
Ctrl_2 Treat_like 2
Treat_1 Treat_like 1
Treat_1 Ctrl_like 1
Treat_2 Treat_like 2
Treat_2 Ctrl_like 2

Any insights on that matter are greatly appreciated!

pseudobulk single-cell scRNA-seq • 361 views
3 months ago

You should be fine because every sample_id and batch appears in every group_id. You would use the formula ~ group_id + sample_id + batch.


Will this account for the fact that some cells come from the same original sample? This seems like relevant information for a correct analysis.


Also, I just tried to do a formula like this and got the following error: Design matrix not of full rank.

I assume that is because the design matrix has columns that are linearly dependent? That is, the sample_id columns already encode the batch column, since each sample belongs to exactly one batch. Is that correct?
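
One way to check this is to build the design matrix by hand and compare its rank to its number of columns. The sketch below uses the eight-row metadata from the question; the object names (`mdata2`, `mm_full`, `mm_sample`) and the factor encoding are assumptions made for illustration, not output from any particular DE package.

```r
# Metadata after re-labelling: each sample contributes one Ctrl_like and
# one Treat_like pseudobulk profile. Names and factor coding are assumptions.
mdata2 <- data.frame(
  sample_id = factor(rep(c("Ctrl_1", "Ctrl_2", "Treat_1", "Treat_2"), each = 2)),
  group_id  = factor(c("Ctrl_like", "Treat_like", "Ctrl_like", "Treat_like",
                       "Treat_like", "Ctrl_like", "Treat_like", "Ctrl_like"),
                     levels = c("Ctrl_like", "Treat_like")),
  batch     = factor(rep(c(1, 2, 1, 2), each = 2))
)

# Formula suggested in the answer: sample and batch together.
mm_full <- model.matrix(~ group_id + sample_id + batch, data = mdata2)
full_rank <- qr(mm_full)$rank
cat("full model  columns:", ncol(mm_full), "rank:", full_rank, "\n")

# batch2 is exactly the sum of the indicators for the two batch-2 samples,
# which is the linear dependence behind the "not of full rank" error.
batch_from_samples <- mm_full[, "sample_idCtrl_2"] + mm_full[, "sample_idTreat_2"]
cat("batch2 == Ctrl_2 + Treat_2 indicators:",
    all(mm_full[, "batch2"] == batch_from_samples), "\n")

# Dropping batch and keeping sample_id gives a full-rank design.
mm_sample <- model.matrix(~ group_id + sample_id, data = mdata2)
sample_rank <- qr(mm_sample)$rank
cat("sample model columns:", ncol(mm_sample), "rank:", sample_rank, "\n")
```

Because sample_id is nested within batch, a per-sample term already absorbs any batch difference, so `~ group_id + sample_id` treats the comparison as paired within each original sample. Whether that is the most appropriate model for this experiment is a separate question that the rank check alone does not answer.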
