Question: subscript out of bounds in seurat expression matrix [SOLVED]
Asked 5 weeks ago by demoraesdiogo2017:

Hello

I'm trying to build an expression matrix from a Seurat object to use as input for a different heatmap tool. For that, I want to take the 500 DEGs with the lowest adjusted p-value.

Here is what I tried:

DEGs <- all.markers %>% top_n(n = -500, wt = p_val_adj)
DEGs.genes <- DEGs$gene
DEGs.genes <- unique(DEGs.genes)
integratedexpression <- as.matrix(GetAssayData(circBALF.pred.integrated.CD4, assay = "integrated"))
integratedexpression.filtered <- integratedexpression[DEGs.genes, ]
annotations <- circBALF.pred.integrated.CD4@meta.data
annotations <- t(annotations)

The 5th line gives me the following error message:

Error in integratedexpression[DEGs.genes, ] : subscript out of bounds

This does not occur if I extract the SCT expression values instead; then it runs perfectly.

What should I do? The format of the matrix seems to be exactly the same. I searched for the error, and from what I understand it would mean that the integrated matrix has fewer than 500 rows, which is not the case.


When you did the marker analysis, what assay and slot did you use? Some of the slots hold all genes that pass quality filtering, while others only hold the top 2-3k most variable genes.

written 5 weeks ago by rpolicastro

I used the RNA assay but did not specify the slot; I assume the default is "data".

Additionally, when I used these genes with the DoHeatmap function on the integrated assay, many genes were indeed left out, but the heatmap was still generated.

written 5 weeks ago by demoraesdiogo2017

I can't remember 100% off the top of my head, but I think all of the RNA slots, as well as the counts and data slots of SCT, contain all genes that pass quality filtering. On the other hand, SCT scale.data and all (or most) of the integrated assay slots only have the top 2-3k most variable genes. You can double-check each slot by getting the number of rows in its matrix, since rows are the genes.
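
The row-count check suggested here could look roughly like the sketch below. It uses small toy matrices in place of the real assay slots, so the names rna_data, integrated_data and deg_genes are assumptions made for illustration, not objects from the thread. With a real Seurat object the matrices would come from calls like the GetAssayData one in the question.

```r
# Toy stand-ins for two assay matrices (hypothetical names and values).
# With a real object, the matrices would come from calls such as
# GetAssayData(circBALF.pred.integrated.CD4, assay = "integrated").
set.seed(1)
all_genes <- paste0("gene", 1:10)
rna_data <- matrix(rnorm(10 * 4), nrow = 10,
                   dimnames = list(all_genes, paste0("cell", 1:4)))
# Pretend only 6 of the 10 genes were kept as variable features.
integrated_data <- rna_data[c(1, 2, 4, 6, 7, 9), ]

# Rows are genes, so nrow() gives the number of genes in each matrix.
gene_counts <- sapply(list(RNA = rna_data, integrated = integrated_data), nrow)
print(gene_counts)

# Genes from a marker list that are absent from the smaller matrix.
deg_genes <- c("gene2", "gene3", "gene9", "gene10")
missing_genes <- setdiff(deg_genes, rownames(integrated_data))
print(missing_genes)
```

If missing_genes is not empty, indexing the smaller matrix by the full marker list will fail, which matches the error reported in the question.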

written 5 weeks ago by rpolicastro

I believe you are correct, but still, when I run the heatmap function using the same list of genes (DEGs$gene) with the integrated assay, the heatmap is generated.

DoMultiBarHeatmap(circBALF.pred.integrated.CD4, assay = 'integrated', features = DEGs$gene, group.by='integrated_snn_res.0.5', additional.group.by = 'Patient.status', additional.group.sort.by = 'Patient.status') + theme(text = element_text(size = 5))

I do get a warning message listing the genes that are not present, though:

  The following features were omitted as they were not found in the scale.data slot for the integrated assay:
written 5 weeks ago by demoraesdiogo2017

The reason integratedexpression.filtered <- integratedexpression[DEGs.genes, ] fails is that not all of the genes you are trying to subset are in the rownames of the matrix. Indexing a matrix by name throws "subscript out of bounds" as soon as a single name is missing. You can ignore the absent genes by doing this instead:

integratedexpression.filtered <- integratedexpression[rownames(integratedexpression) %in% DEGs.genes, ]
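
As a minimal sketch of why the original line fails and what the fix does, the toy example below uses a made-up matrix expr and marker list deg_genes (both hypothetical, not from the thread). It also shows an alternative based on intersect(), which is an addition here rather than part of the answer: the %in% form returns rows in the matrix's own order, while intersect() keeps the order of the marker list.

```r
# Toy matrix standing in for integratedexpression (hypothetical values).
expr <- matrix(1:12, nrow = 4,
               dimnames = list(c("geneA", "geneB", "geneC", "geneD"),
                               c("cell1", "cell2", "cell3")))
# Marker list containing one gene ("geneX") that is not a row of expr.
deg_genes <- c("geneD", "geneX", "geneA")

# Indexing by name fails as soon as one name is missing.
err <- tryCatch(expr[deg_genes, ], error = function(e) conditionMessage(e))
print(err)

# Option 1 (the %in% approach): keep rows whose names are in the list.
# Rows come back in the matrix's own order.
by_in <- expr[rownames(expr) %in% deg_genes, , drop = FALSE]

# Option 2: index by the shared names to keep the marker-list order.
by_intersect <- expr[intersect(deg_genes, rownames(expr)), , drop = FALSE]

print(rownames(by_in))
print(rownames(by_intersect))
```

Which option fits better depends on whether the downstream heatmap tool expects genes in the ranked marker order or does its own clustering. drop = FALSE keeps the result a matrix even if only one gene matches.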
written 5 weeks ago by rpolicastro

This worked! Thanks!

written 5 weeks ago by demoraesdiogo2017