Question: Fit GLM using part of the data
12 months ago by
rasmus.agren0 wrote:

I have a single-cell RNA-seq experiment with five different treatments. The treatments are likely to result in different cell types, although this isn't known at this stage of the analysis. Regardless, they can be assumed to be quite different. Due to technical problems the size factors are also very different for the different treatments.

I now want to use edgeR to find differentially expressed genes between the treatments, but at this stage only treatments 1-3 are of interest. Should I use the full dataset for estimating the dispersion and fitting the model, or only the treatments I'm interested in comparing? On the one hand, the full data should give better tagwise dispersion estimates. On the other hand, this is single-cell data from FACS-sorted cells that represent different cell types, so a gene could very well have zero expression in the treatments I'm interested in and quite high expression in some of the others (or the opposite). What would be the statistically more correct approach here? I would like to err on the side of caution. Thanks!

edger glm scrna-seq
12 months ago by
Kevin Blighe48k wrote:

There is no real right or wrong here. You should start by using the entire dataset and applying the usual filtering for low-count transcripts and transcripts with many zeros. As I understand it, there are also imputation methods available for scRNA-seq, which you may consider.

If you run into a brick wall using the entire dataset, then consider reducing it in size. In some cases, if the covariation between groups within your dataset is very large and/or inconsistent (i.e., heteroskedastic), splitting the dataset may be the only way.

Kevin
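
For anyone wanting to try both routes side by side, here is a minimal sketch in R, not a definitive workflow. It assumes a quasi-likelihood GLM pipeline in edgeR and uses simulated counts as a stand-in for the real data. The treatment labels `T1` to `T5`, the helper name `fit_edger` and the simulation settings are all hypothetical. Option A fits all five treatments and tests one pair with a contrast; option B refits on treatments 1-3 only.

```r
library(edgeR)  # also attaches limma, which provides makeContrasts()

set.seed(1)
# Simulated stand-in data (hypothetical): 200 genes x 25 cells, 5 cells per treatment
n_genes <- 200
treatment <- factor(rep(paste0("T", 1:5), each = 5))
counts <- matrix(rnbinom(n_genes * length(treatment), mu = 20, size = 2),
                 nrow = n_genes,
                 dimnames = list(paste0("gene", seq_len(n_genes)),
                                 paste0("cell", seq_along(treatment))))

# Hypothetical helper: filter, normalise, estimate dispersions, fit a QL GLM
fit_edger <- function(counts, group) {
  group <- droplevels(group)
  design <- model.matrix(~ 0 + group)
  colnames(design) <- levels(group)
  y <- DGEList(counts = counts, group = group)
  keep <- filterByExpr(y, design)        # drop low-count genes
  y <- y[keep, , keep.lib.sizes = FALSE]
  y <- calcNormFactors(y)
  y <- estimateDisp(y, design)
  list(fit = glmQLFit(y, design), design = design)
}

# Option A: all five treatments in the model, T2 vs T1 tested via a contrast
full <- fit_edger(counts, treatment)
con_full <- makeContrasts(T2 - T1, levels = full$design)
res_full <- glmQLFTest(full$fit, contrast = con_full)

# Option B: only treatments 1-3 are used for filtering, dispersion and fit
sel <- treatment %in% c("T1", "T2", "T3")
sub <- fit_edger(counts[, sel], treatment[sel])
con_sub <- makeContrasts(T2 - T1, levels = sub$design)
res_sub <- glmQLFTest(sub$fit, contrast = con_sub)

# Full result tables for comparing the two fits
top_full <- topTags(res_full, n = Inf)$table
top_sub <- topTags(res_sub, n = Inf)$table
```

Comparing `top_full` and `top_sub` on the real data (for example, the overlap of genes below a chosen FDR cutoff) gives a direct view of how much the extra treatments influence the result. Note that filtering is also done per fit here, so the two options may retain different gene sets.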
