I am trying to map my genes to their chromosome locations so that I can remove low-quality cells with high mitochondrial content. When mapping to the chromosomes, I get the error below.
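library(AnnotationHub)
library(SingleCellExperiment)
library(magrittr)  # provides %>% and set_names()
ah <- AnnotationHub()
# ens.hs.101: presumably the Ensembl 101 EnsDb retrieved from `ah` (retrieval step not shown)
genes <- rowData(sce)$ID  # define the keys before passing them to select()
gene_annot <- AnnotationDbi::select(ens.hs.101, keys = genes, keytype = "GENEID", columns = c("GENEID", "SEQNAME")) %>% set_names(c("ID", "Chromosome"))
rowData(sce) <- merge(rowData(sce), gene_annot, by = "ID", sort = FALSE)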
Error in .local(x, ..., value = value) : 26634 rows in value to replace 26664 rows
I suspect the problem is with the gene IDs.
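The error says the merged table has fewer rows (26634) than `sce` (26664). One way this can happen is that `merge()` performs an inner join by default, so any gene whose ID is absent from the annotation is silently dropped. A minimal base R sketch of that behaviour, using hypothetical toy IDs (`g1` to `g4`) rather than real data:

```r
# Hypothetical toy data: 4 genes in the dataset, but the annotation
# (e.g. built from a different Ensembl release) only knows 3 of them
row_data <- data.frame(ID = c("g1", "g2", "g3", "g4"),
                       Symbol = c("A", "B", "C", "D"))
gene_annot <- data.frame(ID = c("g1", "g2", "g4"),
                         Chromosome = c("1", "MT", "X"))

# merge() is an inner join by default: rows whose ID has no partner are dropped
merged <- merge(row_data, gene_annot, by = "ID", sort = FALSE)
cat(nrow(row_data), "rows before,", nrow(merged), "rows after merge\n")

# List the IDs that were silently dropped
missing_ids <- setdiff(row_data$ID, gene_annot$ID)
print(missing_ids)
```

Running the same `setdiff(genes, gene_annot$ID)` on the real data would show which of the 26664 IDs lack an annotation entry.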
You don't need to determine the chromosome; the better approach is to know how the mitochondrial genes are named. Human mitochondrial genes have gene symbols starting with "MT-" (for example MT-ND1 and MT-CO1). See this example from Seurat:
I am using Bioconductor packages for the analysis and am not familiar with Seurat. Could the discrepancy in gene IDs have come from the cellranger pipeline?
If you used cellranger to process the sequencing data, use the gene IDs provided by cellranger. Whether you analyze the data with Bioconductor packages or not doesn't matter.
If you want the mitochondrial genes, pull their gene IDs based on the "MT-" prefix in the gene symbol.
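To make the advice above concrete, here is a base R sketch that selects mitochondrial genes by symbol and keeps cellranger's own Ensembl IDs. It assumes the cellranger `features.tsv` layout (tab-separated gene ID, gene symbol, feature type, no header); the four rows below are illustrative, not taken from the poster's data.

```r
# A few illustrative rows in the cellranger features.tsv layout:
# Ensembl gene ID, gene symbol, feature type (tab-separated, no header)
features_txt <- paste(
  "ENSG00000198888\tMT-ND1\tGene Expression",
  "ENSG00000198804\tMT-CO1\tGene Expression",
  "ENSG00000075624\tACTB\tGene Expression",
  "ENSG00000111640\tGAPDH\tGene Expression",
  sep = "\n")
features <- read.delim(text = features_txt, header = FALSE,
                       col.names = c("ID", "Symbol", "Type"),
                       stringsAsFactors = FALSE)

# Select mitochondrial genes by the "MT-" symbol prefix,
# then keep the IDs exactly as cellranger reported them
mito_ids <- features$ID[grepl("^MT-", features$Symbol)]
print(mito_ids)
```

If the counts were loaded with `DropletUtils::read10xCounts`, `rowData(sce)` should already carry these two columns as `ID` and `Symbol`, so `grepl("^MT-", rowData(sce)$Symbol)` would flag the mitochondrial rows without any external annotation.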
Do not mix and match annotations. If you used the 10x-provided indexes/annotations when you ran the cellranger pipeline, stay with that set.
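Once the mitochondrial genes are flagged by symbol, the QC metric itself is the share of each cell's counts that falls in those genes. A base R sketch on a hypothetical 4-gene x 3-cell count matrix (the values are made up for illustration):

```r
# Hypothetical toy counts: rows are genes (named by symbol), columns are cells
counts <- matrix(c(10, 40,  5,
                   20, 30,  5,
                   70, 30,  0,
                    0,  0, 90),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(c("MT-CO1", "MT-ND1", "ACTB", "GAPDH"),
                                 c("cell1", "cell2", "cell3")))

# Mitochondrial genes are identified purely by the "MT-" symbol prefix
is_mito <- grepl("^MT-", rownames(counts))

# Per-cell percentage of counts coming from mitochondrial genes
mito_percent <- 100 * colSums(counts[is_mito, , drop = FALSE]) / colSums(counts)
print(mito_percent)
```

On a `SingleCellExperiment`, `scuttle::perCellQCMetrics(sce, subsets = list(Mito = is_mito))` reports the same quantity as `subsets_Mito_percent`, and `scuttle::isOutlier(..., type = "higher")` is one common way to choose a cutoff instead of a fixed threshold.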
I am trying to add chromosome IDs to my rowData based on the gene ID.
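For adding a column to rowData, `match()` is a safer tool than `merge()`: it returns exactly one position per row of the original table, in the original order, with `NA` for IDs the annotation lacks, whereas `merge()` can drop, reorder, or duplicate rows. A base R sketch with hypothetical toy IDs, where the annotation table uses the `GENEID`/`SEQNAME` column names that `select()` returns:

```r
# Hypothetical toy data: a rowData-like table and an annotation table that
# lacks one gene ("g3") and lists the rest in a different order
row_data <- data.frame(ID = c("g1", "g2", "g3", "g4"),
                       Symbol = c("A", "B", "C", "D"))
gene_annot <- data.frame(GENEID = c("g4", "g1", "g2"),
                         SEQNAME = c("X", "1", "MT"))

# For each ID in row_data, find its row in gene_annot (NA if absent),
# so the new column has one entry per original row, in the original order
idx <- match(row_data$ID, gene_annot$GENEID)
row_data$Chromosome <- gene_annot$SEQNAME[idx]
print(row_data)
```

Applied to the question's objects, the same pattern would be `rowData(sce)$Chromosome <- gene_annot$SEQNAME[match(rowData(sce)$ID, gene_annot$GENEID)]` (without the `set_names()` renaming). If an ID maps to several annotation rows, `match()` keeps only the first.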
10x may have built its reference from an older or newer Ensembl release than the one you are querying (check the reference's metadata/release notes). A release mismatch would explain why some gene IDs in your data are missing from the annotation.