I am analyzing single-cell gene expression data and trying to annotate the cell types using SingleR. My data is from 10x and has two columns that could serve as row names: an Ensembl Gene_ID and a gene symbol (for example, CYTH3). The data look like this:
Gene_ID Symbol AAACCTGCACACTGCG.1 AAACCTGGTCAGAATA.1 AAACGGGAGATTACCC.1
ENSG00000000419 DPM1 0 0 0
ENSG00000000457 SCYL3 0 0 0
ENSG00000000460 C1orf112 0 0 0
ENSG00000000938 FGR 1 0 0
ENSG00000000971 CFH 0 0 0
ENSG00000001036 FUCA2 0 0 0
ENSG00000001084 GCLC 0 0 0
ENSG00000001167 NFYA 0 0 0
ENSG00000001460 STPG1 0 0 0
My problem is that only one column is allowed as row names, so I have to drop one of the two columns. However, some gene symbols are duplicated, with different Gene_IDs. As I understand it, these are haplotypes, and it is better to use the Ensembl Gene_IDs when working with genes. The catch is that the cell type annotation libraries I have found are all based on gene symbols, so they won't recognize my Gene_IDs. Is there a workaround (such as giving distinct names to gene symbols from different haplotypes), or is there a library that uses Gene_IDs to recognize cell types?
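To make the workaround I have in mind concrete, here is a minimal sketch in plain Python (only to illustrate the logic, not tied to SingleR). It keeps the symbol as the row name when it is unique and appends the Ensembl ID when the symbol is duplicated. The function name `unique_row_names`, the symbol `GENEX`, and the IDs `ENSG_DUP_A` and `ENSG_DUP_B` are made up for illustration and are not from my data.

```python
from collections import Counter


def unique_row_names(gene_ids, symbols):
    """Return one row name per gene.

    A symbol that occurs once is kept as-is; a symbol that occurs more
    than once gets its Ensembl ID appended so that all names are unique.
    (Hypothetical helper, for illustration only.)
    """
    counts = Counter(symbols)
    return [
        sym if counts[sym] == 1 else f"{sym}_{gid}"
        for gid, sym in zip(gene_ids, symbols)
    ]


# Toy input: the first two rows are from the table above,
# the last two are invented to show a duplicated symbol.
gene_ids = ["ENSG00000000419", "ENSG00000000457", "ENSG_DUP_A", "ENSG_DUP_B"]
symbols = ["DPM1", "SCYL3", "GENEX", "GENEX"]

row_names = unique_row_names(gene_ids, symbols)
print(row_names)
```

One drawback I can see is that the renamed duplicates would no longer match the plain symbols in a symbol-based reference, so only the uniquely named genes would be used for annotation.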