Question

Single-cell identification library with Ensembl IDs


5.2 years ago

chilifan ▴ 120

I am analyzing some gene/cell expression data to try to annotate the cell types using SingleR. My data is from 10x and has two identifier columns that could serve as row names: an Ensembl Gene_ID and a gene symbol, for example CYTH3. The data look like this:

Gene_ID         Symbol  AAACCTGCACACTGCG.1  AAACCTGGTCAGAATA.1  AAACGGGAGATTACCC.1
ENSG00000000419 DPM1        0   0   0
ENSG00000000457 SCYL3       0   0   0
ENSG00000000460 C1orf112    0   0   0
ENSG00000000938 FGR         1   0   0
ENSG00000000971 CFH         0   0   0
ENSG00000001036 FUCA2       0   0   0
ENSG00000001084 GCLC        0   0   0
ENSG00000001167 NFYA        0   0   0
ENSG00000001460 STPG1       0   0   0

My problem is that only one column is allowed as row names, meaning that I have to delete one of the columns. But some gene symbols are duplicated across different Gene_IDs. As I understand it, these are haplotypes, and Ensembl Gene_IDs are preferable when working with genes. However, the cell type annotation libraries I found are all based on gene symbols, so they won't recognize my Gene_IDs. Is there a workaround (such as giving distinct names to the gene symbols from different haplotypes), or is there a library that uses Gene_IDs to recognize cell types?
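To make the "distinct names" workaround concrete, here is a minimal sketch of what I have in mind. It is only an illustration, not something SingleR provides: the function name `make_unique_symbols` and the second DPM1 Gene_ID are made up, and the suffix scheme (`.1`, `.2`, ...) is my assumption, loosely modelled on what R's `make.unique` does. The first occurrence of a symbol keeps its name, later duplicates get a numeric suffix, and the Gene_ID is kept as the key so nothing is lost.

```python
def make_unique_symbols(rows):
    """Map each Gene_ID to a unique row name derived from its symbol.

    rows: iterable of (gene_id, symbol) pairs, in table order.
    The first occurrence of a symbol keeps the plain symbol; later
    duplicates get ".1", ".2", ... appended (assumed naming scheme).
    """
    seen = set()      # row names already handed out
    counts = {}       # how many suffixes were tried per symbol
    row_names = {}    # gene_id -> unique row name
    for gene_id, symbol in rows:
        name = symbol
        # Keep bumping the suffix until the name is unused.
        while name in seen:
            counts[symbol] = counts.get(symbol, 0) + 1
            name = f"{symbol}.{counts[symbol]}"
        seen.add(name)
        row_names[gene_id] = name
    return row_names


rows = [
    ("ENSG00000000419", "DPM1"),
    ("ENSG_HYPOTHETICAL_ALT", "DPM1"),  # made-up ID standing in for a duplicate
    ("ENSG00000000457", "SCYL3"),
]
row_names = make_unique_symbols(rows)
for gene_id, name in row_names.items():
    print(gene_id, name)
```

The catch is that a symbol-based reference would presumably only match the unsuffixed name, so the suffixed duplicates would be ignored during annotation.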

Ensembl library haplotypes Gene_ID annotation • 880 views
