biomart returns several ensembl ids for one gene
13 days ago
fifty_fifty ▴ 10

I need to convert the gene names in my scRNA-seq data into Ensembl IDs for downstream analyses. I used the biomaRt package, which converted some of the gene names:

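library(biomaRt)
ensembl <- useMart("ensembl", dataset = "hsapiens_gene_ensembl")

# Map HGNC symbols (row names of the count matrix) to Ensembl gene IDs.
# Note: the mart object is `ensembl`, not `ensemble`.
biomart_hgnc <- getBM(attributes = c("hgnc_symbol", "ensembl_gene_id"),
                      filters = "hgnc_symbol",
                      values = rownames(LeeCRCtumor), bmHeader = TRUE, mart = ensembl)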


However, it returns several Ensembl IDs for one gene, like here:

Should I specify the gene location/chromosome in this case?

scrna-seq ensembl biomart r • 176 views

Yes, I understand that one gene can have several Ensembl IDs. But in my case, I have a single-cell RNA-seq count matrix with gene names, which I got from NCBI. I don't have any FASTQ files or other raw data. I am trying to find a way to convert the gene names to Ensembl IDs. So I think I need to restrict the biomaRt mapping somehow so that genes in haplotypic regions are excluded. I was wondering whether biomaRt has that functionality.


For the two examples above:

197953 is the main gene.
261846 is the alternate sequence gene.

So you could filter your lists to keep only genes on the main chromosomes.
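
One way to do that filtering, as a rough sketch rather than a definitive recipe: request "chromosome_name" as an extra attribute from getBM() and keep only rows on the primary chromosomes (1-22, X, Y, MT). Genes on alternate sequences are reported with chromosome names such as CHR_HSCHR6_MHC_COX_CTG1. The helper name keep_main_chromosomes and the toy data frame with placeholder IDs are made up for illustration; the sketch also assumes bmHeader is left at its default so the column is literally called chromosome_name.

```r
# Primary-assembly chromosome names as Ensembl reports them for human.
main_chromosomes <- c(as.character(1:22), "X", "Y", "MT")

# Hypothetical helper: keep only rows located on a primary chromosome.
# `mapping` is assumed to be a data frame with a `chromosome_name` column,
# for example the result of:
#   getBM(attributes = c("hgnc_symbol", "ensembl_gene_id", "chromosome_name"),
#         filters = "hgnc_symbol", values = rownames(LeeCRCtumor), mart = ensembl)
keep_main_chromosomes <- function(mapping, chromosomes = main_chromosomes) {
  mapping[mapping$chromosome_name %in% chromosomes, , drop = FALSE]
}

# Toy input with placeholder symbols and IDs (not real Ensembl identifiers).
toy <- data.frame(
  hgnc_symbol     = c("GENE_A", "GENE_A", "GENE_B"),
  ensembl_gene_id = c("ID_MAIN_A", "ID_ALT_A", "ID_MAIN_B"),
  chromosome_name = c("6", "CHR_HSCHR6_MHC_COX_CTG1", "1"),
  stringsAsFactors = FALSE
)

# Drops the row on the alternate sequence, leaving one ID per symbol here.
filtered <- keep_main_chromosomes(toy)
```

A symbol can still map to more than one ID on the main chromosomes, so it is worth checking for duplicates with duplicated(filtered$hgnc_symbol) afterwards.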


Yes, I filtered out the genes that are not on the main chromosomes. I used ensembldb and then several biomaRt filters. However, some remaining genes were not recognized by those methods. A lot of them start with RP11. Some of them I couldn't find at all, e.g. CH17-212P11.4. Do you know how to convert them into Ensembl IDs?