I have a dataset from a paper that I want to run through the Seurat pipeline. Before that, I ran SoupX on each sample:

    P018_1.data <- Read10X(data.dir = "/fast/scratch/users/cetinsz_c/GSE199321/UR_AKI_P018.1/outs/filtered_feature_bc_matrix")
    P018_1 <- CreateSeuratObject(counts = P018_1.data, project = "P018.1", min.cells = 3, min.features = 200)
    sc_P018_1 = load10X("/fast/scratch/users/xxxx/GSE199321/UR_AKI_P018.1/outs/")
    sc_P018_1 = autoEstCont(sc_P018_1, tfidfMin = 0.6)
    out_P018_1 = adjustCounts(sc_P018_1, roundToInt = TRUE)
    P018_1@assays$RNA@data@x <- out_P018_1@x

I have 20 samples like this plus 6 pooled samples. Next, I wanted to run scDblFinder:

    # load seurat objects from demultiplexed samples (see "AKI_Urine_Sediment_Demultiplexing_pooled_samples_script") ----
    Pool1 <- readRDS("/fast/scratch/users//xxx/akiproject/AKI_Urine_sediment_Pool1.rds")
    Pool2 <- readRDS("/fast/scratch/users//xxx/akiproject/AKI_Urine_sediment_Pool2.rds")
    Pool3 <- readRDS("/fast/scratch/users//xxx/akiproject/AKI_Urine_sediment_Pool3.rds")
    Pool4 <- readRDS("/fast/scratch/users//xxx/akiproject/AKI_Urine_sediment_Pool4.rds")
    Pool5 <- readRDS("/fast/scratch/users/xxx/akiproject/AKI_Urine_sediment_Pool5.rds")
    Pool6 <- readRDS("/fast/scratch/users/xxx/akiproject/AKI_Urine_sediment_Pool6.rds")

    # Copy the 'orig.ident' values to a new 'idents' column in the metadata
    Pool4@meta.data$idents <- Pool4@meta.data$orig.ident
    Pool6@meta.data$idents <- Pool6@meta.data$orig.ident

    # Identify the cells with the idents value "P054"
    cells_to_remove <- rownames(Pool4@meta.data[Pool4@meta.data$idents == "P054", ])

    # Subset the Seurat object using the 'subset' function
    Pool4 <- subset(Pool4, cells = colnames(Pool4)[!colnames(Pool4) %in% cells_to_remove])

    # Identify the cells with the idents values "P116", "P118", and "P120"
    cells_to_remove_Pool6 <- rownames(Pool6@meta.data[Pool6@meta.data$idents %in% c("P116", "P118", "P120"), ])

    # Subset the Seurat object using the 'subset' function
    Pool6 <- subset(Pool6, cells = colnames(Pool6)[!colnames(Pool6) %in% cells_to_remove_Pool6])


    # make a list of all objects, determine percentage of mitochondrial RNA and doublets per sample. ----
    URINEList <- list(P005, P006, P007, P017_1, P017_2, P018_1, P018_2, P019_1, P019_2,
                      P021, P022, P023_1, P023_2, P023_3, P023_4, P024_1, P024_2,
                      P001, P002_1, P002_2, P003,
                      Pool1, Pool2, Pool3, Pool4, Pool5, Pool6)
    URINEList <- list(P005, P006, P007)

    # does not work: Assuming the input to be a matrix of counts or expected counts.
    # Error in validObject(result) : invalid class “dgCMatrix” object: 'i' and 'x' slots do not have equal length
    for (i in 1:length(URINEList)) {
      URINEList[[i]][["percent.mt"]] <- PercentageFeatureSet(URINEList[[i]], pattern = "^MT-")
      doublets <- scDblFinder(GetAssayData(URINEList[[i]], assay = "RNA", slot = "data"))
      doublets <- as.vector(doublets@colData@listData[["scDblFinder.class"]])
      URINEList[[i]]@meta.data$multiplet_class <- doublets
      URINEList[[i]]@project.name <- levels(URINEList[[i]]@active.ident)
      URINEList[[i]] <- RenameCells(URINEList[[i]], add.cell.id = paste0(URINEList[[i]]$orig.ident, "_"))
    }

The error I get:

    Error in validObject(result) : invalid class “dgCMatrix” object: 'i' and 'x' slots do not have equal length

Info about the URINEList:

    Dimensions of the input matrix: [1] 23331 1459
    Number of non-zero elements in the input matrix: [1] 2600061
    Class of the input matrix: [1] "dgCMatrix"
    Length of 'i' slot: [1] 2846422
    Length of 'x' slot: [1] 2872704

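For anyone who wants to check the same thing on their own objects, here is a minimal sketch of how the slot lengths of a `dgCMatrix` can be compared. It only assumes the Matrix package; `sparse_slot_summary` is a hypothetical helper name I made up for illustration, not a function from Seurat, SoupX or scDblFinder, and the toy matrix stands in for the real data.

```r
library(Matrix)

# Hypothetical helper (not part of any package): report the slot lengths
# of a dgCMatrix and whether they agree with each other.
sparse_slot_summary <- function(m) {
  stopifnot(is(m, "dgCMatrix"))
  n_stored <- m@p[length(m@p)]  # last column pointer = number of stored entries
  list(
    dims       = dim(m),
    i_length   = length(m@i),   # row indices, one per stored entry
    x_length   = length(m@x),   # values, one per stored entry
    p_last     = n_stored,
    consistent = length(m@i) == length(m@x) && n_stored == length(m@i)
  )
}

# Toy 3 x 2 matrix with three stored values, standing in for real data
m <- sparseMatrix(i = c(1, 3, 2), j = c(1, 1, 2), x = c(5, 1, 2), dims = c(3, 2))
s <- sparse_slot_summary(m)
str(s)
```

In a valid matrix the `i` and `x` slots have the same length, which is not the case for the numbers shown above.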
I then tried a different version that converts the data to a dense matrix first:

    for (i in 1:length(URINEList)) {
      assay_data <- GetAssayData(URINEList[[i]], assay = "RNA", slot = "data")
      URINEList[[i]][["percent.mt"]] <- PercentageFeatureSet(URINEList[[i]], pattern = "^MT-")
      doublets <- scDblFinder(as.matrix(assay_data))
      doublets <- as.vector(doublets@colData@listData[["scDblFinder.class"]])
      URINEList[[i]]@meta.data$multiplet_class <- doublets
      URINEList[[i]]@project.name <- levels(URINEList[[i]]@active.ident)
      URINEList[[i]] <- RenameCells(URINEList[[i]], add.cell.id = paste0(URINEList[[i]]$orig.ident, "_"))
    }

This is the error I get this time:

    Creating ~5000 artificial doublets...
    Dimensional reduction
    Evaluating kNN...
    Training model...
    iter=0, 399 cells excluded from training.
    iter=1, 227 cells excluded from training.
    iter=2, 220 cells excluded from training.
    Threshold found:0.514
    96 (2%) doublets called
    Error in .checkSCE(sce) :
      sce should be a SingleCellExperiment, a SummarizedExperiment, or an array (i.e. matrix, sparse matric, etc.) of counts.
    In addition: Warning message:
    In asMethod(object) :
      sparse->dense coercion: allocating vector of size 3.5 GiB

Is there any way to fix this problem? I assume it happens because SoupX removes cells or genes before I look for doublets in the dataset: when I don't run SoupX, I do not get this error.
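To illustrate what I suspect is going on, here is a toy sketch using only the Matrix package. It assumes that the matrix in the Seurat object (created with `min.cells = 3, min.features = 200`) and the matrix returned by `adjustCounts()` do not hold the same stored entries; I have not confirmed that this is the case in my data. The two small matrices below are made up and only stand in for the real ones.

```r
library(Matrix)

# Toy stand-in for the filtered matrix inside the Seurat object (3 stored values)
filtered <- sparseMatrix(i = c(1, 2, 3), j = c(1, 2, 2),
                         x = c(4, 2, 7), dims = c(3, 2))

# Toy stand-in for the SoupX-corrected matrix (4 stored values, different shape)
corrected <- sparseMatrix(i = c(1, 2, 3, 3), j = c(1, 1, 2, 3),
                          x = c(3, 1, 6, 2), dims = c(3, 3))

# Same pattern as `P018_1@assays$RNA@data@x <- out_P018_1@x`:
# direct slot assignment does not re-validate the whole object.
broken <- filtered
broken@x <- corrected@x

length(broken@i)  # still the row indices of `filtered`
length(broken@x)  # now the values of `corrected`

# Validation fails once something checks the object
res <- tryCatch(validObject(broken), error = function(e) conditionMessage(e))
print(res)
```

If this is what happens in my objects, the `i` slot would still describe the filtered matrix while the `x` slot holds the SoupX values, which would fit the different slot lengths reported above.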
Please format this wall of code (the 10101 button), remove unnecessary code and describe what exactly the problem is.