Filtering criteria for selecting DEGs from Monocle3 regression analysis with categorical terms
0
0
Entering edit mode
4.3 years ago
gundalav ▴ 380

I'm testing the Monocle3 (version 0.2.0) with the following code. The main aim of this analysis is to identify differentially expressed genes for every time point.

library(monocle3)
library(dplyr) # imported for some downstream data manipulation


expression_matrix <- readRDS(url("http://staff.washington.edu/hpliner/data/packer_embryo_expression.rds"))
cell_metadata <- readRDS(url("http://staff.washington.edu/hpliner/data/packer_embryo_colData.rds"))
gene_annotation <- readRDS(url("http://staff.washington.edu/hpliner/data/packer_embryo_rowData.rds"))


cds <- new_cell_data_set(expression_matrix,
                         cell_metadata = cell_metadata,
                         gene_metadata = gene_annotation)


cds <- preprocess_cds(cds, num_dim = 50)
cds <- reduce_dimension(cds, reduction_method = "UMAP")


gene_fits <- fit_models(cds, model_formula_str = "~time.point + 0", cores = 15)
fit_coefs <- coefficient_table(gene_fits) %>%
             dplyr::select(-model, -model_summary)

There I perform the regression DEG analysis with the categorical term, e.g. ~time.point + 0.

Using the q_value threshold <=0.05. I still get too many DEGs.

fit_coefs  %>%
  filter(q_value <= 0.05) %>%
  group_by(term) %>%
  summarise(n = n())

Which gives me:

# A tibble: 4 x 2
  term                      n
  <chr>                 <int>
1 time.point300_minutes 14324
2 time.point400_minutes 16615
3 time.point500_minutes 18557
4 time.pointmixed       18114

Notice that the DEGs are above 10K genes, which is unusually high. One would expect they are in the range of hundreds. Please advise what's the reasonable way to select the DEGs in this case.

If I'm wrong, I'd like to know how to properly interpret the DEG in this case, for example with DEG 14324, does that mean that there are 14324 genes that are specifically expressed in time.point300_minutes? That is unlikely because I checked there are around 13K genes that appear in all 4 time-points.

RNA-Seq monocle differential-expression • 1.3k views
ADD COMMENT

Login before adding your answer.

Traffic: 2785 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6