Question: Fit GLM using part of the data
Asked 22 months ago by rasmus.agren:

I have a single-cell RNA-seq experiment with five different treatments. The treatments are likely to result in different cell types, although this isn't known at this stage of the analysis. Regardless, they can be assumed to be quite different. Due to technical problems, the size factors also differ strongly between treatments.

I now want to use edgeR to find differentially expressed genes between the treatments, but at this stage only treatments 1-3 are of interest. Should I use the full dataset for estimating the dispersion and fitting the model, or only the treatments I want to compare at this stage? On the one hand, the full data should give better tagwise dispersion estimates. On the other hand, this is single-cell data from FACS-sorted cells that represent different cell types, so a gene could very well have zero expression in the treatments I'm interested in and quite high expression in some of the others (or the opposite). Which approach is statistically more correct here? I would like to err on the side of caution. Thanks!

Tags: edger, glm, scrna-seq
Answer by Kevin Blighe, 22 months ago:

There is no real right or wrong here. Start with the entire dataset and apply the usual filtering for low-count transcripts and transcripts with many zeros. As I understand it, imputation methods are also available for scRNA-seq, which you may want to consider.
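To make the filtering step concrete, here is a minimal, language-agnostic sketch in plain Python. It is not edgeR's implementation (edgeR provides `filterByExpr` for this purpose); the function names, the toy count matrix, and the thresholds `min_cpm`, `min_cells` and `max_zero_frac` are all assumptions chosen for illustration.

```python
def cpm(counts, lib_sizes):
    """Convert a genes x cells count matrix to counts per million."""
    return [
        [(c / n * 1e6) if n else 0.0 for c, n in zip(row, lib_sizes)]
        for row in counts
    ]


def keep_genes(counts, min_cpm=1.0, min_cells=2, max_zero_frac=0.8):
    """Flag genes expressed in enough cells and not dominated by zeros.

    All three thresholds are hypothetical defaults, not recommendations.
    """
    lib_sizes = [sum(col) for col in zip(*counts)]  # per-cell library size
    keep = []
    for raw, norm in zip(counts, cpm(counts, lib_sizes)):
        expressed = sum(v >= min_cpm for v in norm)      # cells above the CPM cutoff
        zero_frac = sum(c == 0 for c in raw) / len(raw)  # fraction of zero counts
        keep.append(expressed >= min_cells and zero_frac <= max_zero_frac)
    return keep


# Toy matrix: 4 genes (rows) x 5 cells (columns); the values are made up.
counts = [
    [120, 98, 0, 143, 110],
    [0, 0, 0, 1, 0],
    [15, 0, 22, 18, 0],
    [0, 0, 0, 0, 0],
]
mask = keep_genes(counts)
filtered = [row for row, k in zip(counts, mask) if k]
```

With real data the library sizes are far larger than in this toy matrix, so the CPM cutoff does more work than it does here. The filter should be rerun whenever the set of cells changes, because a gene that passes on all five treatments may fail on treatments 1-3 alone.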

If you run into a brick wall with the entire dataset, consider reducing it, for example to the treatments of interest. In some cases, if the covariation between groups in your dataset is very large and/or inconsistent (heteroskedastic), splitting the dataset may be the only way.
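One rough way to check whether the groups are too inconsistent to share a single dispersion is to compare a per-treatment estimate for a given gene. The sketch below uses a simple method-of-moments estimate based on the negative binomial relation var = mu + phi * mu^2. This is an assumption made for illustration and not the estimator edgeR uses; the counts are invented, the averaging in `pooled` is a crude stand-in for a shared dispersion, and normalization by size factors is ignored.

```python
from statistics import mean, variance


def mom_dispersion(values):
    """Method-of-moments NB dispersion from var = mu + phi * mu^2."""
    mu = mean(values)
    if mu == 0:
        return None  # undefined when the gene is not expressed at all
    return max((variance(values) - mu) / mu ** 2, 0.0)


def pooled(per_group, groups):
    """Crude shared dispersion: plain average over groups where it is defined."""
    vals = [per_group[g] for g in groups if per_group[g] is not None]
    return mean(vals) if vals else None


# Made-up counts for one gene in six cells per treatment.
gene_by_treatment = {
    "T1": [0, 0, 1, 0, 0, 0],
    "T2": [12, 9, 15, 11, 8, 14],
    "T3": [10, 13, 7, 12, 9, 11],
    "T4": [250, 40, 610, 95, 330, 20],
    "T5": [0, 0, 0, 0, 0, 0],
}
per_group = {t: mom_dispersion(v) for t, v in gene_by_treatment.items()}

subset_phi = pooled(per_group, ["T1", "T2", "T3"])           # treatments of interest
full_phi = pooled(per_group, ["T1", "T2", "T3", "T4", "T5"])  # entire dataset
```

In this invented example the highly variable treatment T4 pulls the shared value upward, so tests among treatments 1-3 would see a larger dispersion when the full dataset is used than when only the subset is used. If per-group estimates differ this strongly for many genes, that is the kind of inconsistency that argues for splitting.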

