Good afternoon,

I have a question about using `collapseReplicates` in DESeq2. As far as I understand, this function sums the counts belonging to one biosample. That makes sense to me when every biosample has the same number of technical replicates. What should I do if the number of technical replicates differs between samples?

For example, SAMNXXXXX corresponds to SRRXXXXX1 (count = 5) and SRRXXXXX2 (count = 6), while SAMNYYYYY corresponds only to SRRYYYYY1 (count = 10). If I add up the counts for SAMNXXXXX (5 + 6 = 11) and then compare the result with the count for SAMNYYYYY (10), I will incorrectly conclude that expression is higher in SAMNXXXXX.

Maybe I need to take the arithmetic mean or something else? It seems to me that the arithmetic mean is not very reasonable. For example, I have counts of 181 and 2 for different replicates of the same biosample.

Note: this situation does not apply to most samples. For example, in one particular dataset there are 89 biosamples without technical replicates and 5 biosamples with 2 technical replicates each.

Thanks!

Best regards, Poecile

Thank you very much! Could you please confirm that I am doing the steps in the correct order?

etc.

Normalization happens after `collapseReplicates`, during this step:

So you are all good!
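To make the reasoning concrete, here is a minimal numeric sketch in plain Python (not DESeq2 code) of why summing an unequal number of technical replicates is harmless when normalization comes afterwards. It sums runs per biosample and then computes median-of-ratios size factors, the kind of normalization DESeq2 applies by default. The first gene reuses the 5 / 6 / 10 example from the question; the other three genes, and the helper names `collapse_by_sum` and `median_of_ratios_size_factors`, are made up for illustration.

```python
import math
from statistics import median

# Hypothetical per-run counts for four genes (same gene order in every run).
# Gene 1 is the 5 / 6 / 10 example from the question; the rest are invented.
run_counts = {
    "SRRXXXXX1": [5, 50, 100, 20],
    "SRRXXXXX2": [6, 60, 120, 24],
    "SRRYYYYY1": [10, 100, 200, 40],
}
run_to_biosample = {
    "SRRXXXXX1": "SAMNXXXXX",
    "SRRXXXXX2": "SAMNXXXXX",
    "SRRYYYYY1": "SAMNYYYYY",
}


def collapse_by_sum(counts, groups):
    """Sum the per-run count vectors that belong to the same biosample."""
    collapsed = {}
    for run, values in counts.items():
        sample = groups[run]
        if sample not in collapsed:
            collapsed[sample] = [0] * len(values)
        collapsed[sample] = [a + b for a, b in zip(collapsed[sample], values)]
    return collapsed


def median_of_ratios_size_factors(counts):
    """Size factor per sample: median ratio of its counts to the per-gene geometric mean."""
    samples = list(counts)
    n_genes = len(counts[samples[0]])
    # Only genes with non-zero counts in every sample contribute.
    usable = [g for g in range(n_genes) if all(counts[s][g] > 0 for s in samples)]
    log_geo_mean = {
        g: sum(math.log(counts[s][g]) for s in samples) / len(samples) for g in usable
    }
    return {
        s: math.exp(median(math.log(counts[s][g]) - log_geo_mean[g] for g in usable))
        for s in samples
    }


collapsed = collapse_by_sum(run_counts, run_to_biosample)
size_factors = median_of_ratios_size_factors(collapsed)
normalized = {s: [c / size_factors[s] for c in collapsed[s]] for s in collapsed}

print("collapsed:   ", collapsed)
print("size factors:", size_factors)
print("normalized:  ", normalized)
```

In this toy data the two biosamples have the same underlying expression, so the collapsed sample with two runs simply gets a larger size factor and the normalized counts come out equal: the 11 versus 10 difference is treated as sequencing depth, not as higher expression. Averaging the replicates instead would throw away that depth information and produce non-integer values, which is not what a count-based model expects.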

Thank you for your help!