Question

How to regress out variants using SCTransform function?

2

Entering edit mode

3.2 years ago

paria ▴ 90

I'm doing single-nuclei RNA sequencing and trying to regress out some variants. But When I try to do so I cannot regress out any variants other than percent.chrM. Could anyone help me with this. Below you can see the code and error I'm getting. I don't have any idea what's the error. The variants I'm trying to regress out have more than 2 levels. Any help would be appreciated!

split_seurat <- SplitObject(filtered_seurat, split.by = "orig.ident")

for (i in 1:length(split_seurat)) {
    split_seurat[[i]] <- NormalizeData(split_seurat[[i]], verbose = TRUE)
    split_seurat[[i]] <- CellCycleScoring(split_seurat[[i]], g2m.features=g2m_genes, s.features=s_genes)
    split_seurat[[i]] <- SCTransform(split_seurat[[i]], vars.to.regress = c("percent.chrM", "individual"))
    }
>>>>>
Performing log-normalization
0%   10   20   30   40   50   60   70   80   90   100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
Warning: The following features are not present in the object: PIMREG, JPT1, not searching for symbol synonyms
Calculating cell attributes from input UMI matrix: log_umi
Variance stabilizing transformation of count matrix of size 15682 by 969
Model formula is y ~ log_umi
Get Negative Binomial regression parameters per gene
Using 2000 genes, 969 cells
  |======================================================================| 100%
Found 73 outliers - those will be ignored in fitting/regularization step

Second step: Get residuals using fitted parameters for 15682 genes
  |======================================================================| 100%
Computing corrected count matrix for 15682 genes
  |======================================================================| 100%
Calculating gene attributes
Wall clock passed: Time difference of 21.22376 secs
Determine variable features
Place corrected count matrix in counts slot
Regressing out percent.chrM, individual
Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) : 
  contrasts can be applied only to factors with 2 or more levels
In addition: There were 50 or more warnings (use warnings() to see the first 50)

Single-nuclei RNA sequencing SCTransform Seurat • 2.9k views

ADD COMMENT • link 3.2 years ago by paria ▴ 90

0

Entering edit mode

Check the percent.chrM meta-data column are not all the same number (such as 0):

purrr::map_lgl(split_seurat, ~length(unique(.x$percent.xhrM)) == 1)

Also make sure there are no NA, NaN, or Inf values:

purrr::map_lgl(split_seurat, ~any(!is.finite(.x$percent.chrM)))

See if any NA values are in your individual column:

purrr::map_lgl(split_seurat, ~any( is.na(.x$individual)))

Finally, are you sure that there is more than one individual represented in each dataset after splitting by orig.ident?

purrr::map_lgl(split_seurat, ~length(unique(.x$individual)) == 1)

ADD REPLY • link 3.2 years ago by rpolicastro 13k

0

Entering edit mode

Thanks for your response. I ran the first two codes and got FALSE for all the 16 samples. But for the last two codes it says cannot find 'individual' in this Seurat object. I'm not sure why when I add column to my meta.data when I do SplitObject it cannot find the value for the Seurat object. Below is one step before SplitObject.

head(filtered_seurat@meta.data)
                     orig.ident nCount_RNA nFeature_RNA percent.chrM
AAACCCAAGCATGATA-1_1     T01471       1512          841    7.0105820
AAACCCAGTCATGCAT-1_1     T01471       7176         2438    0.7803790
AAACCCATCCGATGTA-1_1     T01471       8488         2900    0.3887842
AAACGCTAGGACAACC-1_1     T01471       2608         1300    2.6073620
AAAGAACAGATGTTAG-1_1     T01471       1758         1015    1.1376564
AAAGAACAGTAGCTCT-1_1     T01471       4522         1984    1.3489606
> head(filtered_seurat@meta.data)
                     orig.ident nCount_RNA nFeature_RNA percent.chrM
AAACCCAAGCATGATA-1_1     T01471       1512          841    7.0105820
AAACCCAGTCATGCAT-1_1     T01471       7176         2438    0.7803790
AAACCCATCCGATGTA-1_1     T01471       8488         2900    0.3887842
AAACGCTAGGACAACC-1_1     T01471       2608         1300    2.6073620
AAAGAACAGATGTTAG-1_1     T01471       1758         1015    1.1376564
AAAGAACAGTAGCTCT-1_1     T01471       4522         1984    1.3489606

ADD REPLY • link updated 3.2 years ago by rpolicastro 13k • written 3.2 years ago by paria ▴ 90

0

Entering edit mode

What code are you using to generate the 'individual' column?

ADD REPLY • link 3.2 years ago by rpolicastro 13k

0

Entering edit mode

The below is the code I used for adding column to the meta.data

metadata[which(metadata$orig.ident %in% c("T01493", "T01494")), "individual"] <- "1454"
metadata[which(metadata$orig.ident %in% c("T01495")), "individual"] <- "1888"
metadata[which(metadata$orig.ident %in% c( "T01471", "T01472")), "individual"] <- "1895"

ADD REPLY • link updated 3.2 years ago by rpolicastro 13k • written 3.2 years ago by paria ▴ 90

0

Entering edit mode

Your original ident "T01495" will only have 1 factor level for individual after you split your seurat object, so you can't use that as a regression covariate for that one object.

Also, how are you adding your metadata back to the Seurat object?

ADD REPLY • link 3.2 years ago by rpolicastro 13k

0

Entering edit mode

Oh, I thought the number of factor level is important when it comes to variable I'm regressing out. So, what should I do in my case as the orig.ident is the identity of my samples. It cannot have two factor level. I'm adding my meta.data back to the Seurat object with the below code: merge_seurat@meta.data <- metadata

ADD REPLY • link 3.2 years ago by paria ▴ 90

0

Entering edit mode

Btw, the T01495 is the orig.ident, sample identity and I'm not regressing it out. I'm trying to regress that the variables like individual which has more than two levels.

ADD REPLY • link 3.2 years ago by paria ▴ 90