I am working on an analysis that requires integrating per-gene data from several databases and screens. Since gene symbols vary between the data sources I am using, I tried to come up with a way to match them. I tried querying the Homo sapiens gene_info table that I downloaded from NCBI for aliases of genes with incompatible symbols, but then I found out that some gene symbols correspond to multiple genes.
For example, the symbol C10orf2 is associated both with a gene on chromosome 10 and with the gene CHMP1B on chromosome 18. This observation was also confirmed by searching bioDBnet, which was recommended in a previous post.
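The alias lookup described above can be sketched roughly as follows. This is a minimal Python illustration, not the exact code used. It assumes the tab-separated NCBI gene_info layout with GeneID, Symbol and Synonyms columns (synonyms separated by "|", and "-" when there are none). The toy rows, gene names and function names are made up for illustration.

```python
import csv
import io
from collections import defaultdict


def build_symbol_index(handle):
    """Map every official symbol and synonym to the set of GeneIDs using it."""
    index = defaultdict(set)
    # Assumption: tab-separated file with GeneID, Symbol and Synonyms columns.
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        gene_id = row["GeneID"]
        index[row["Symbol"]].add(gene_id)
        if row["Synonyms"] != "-":  # "-" is assumed to mean "no synonyms"
            for alias in row["Synonyms"].split("|"):
                index[alias].add(gene_id)
    return index


def ambiguous_symbols(index):
    """Return the symbols that point to more than one GeneID."""
    return {sym: sorted(ids) for sym, ids in index.items() if len(ids) > 1}


# Hypothetical toy rows (not real genes), standing in for the downloaded table.
TOY = (
    "#tax_id\tGeneID\tSymbol\tLocusTag\tSynonyms\n"
    "9606\t1001\tGENEA\t-\tALIAS1|SHARED\n"
    "9606\t1002\tGENEB\t-\tSHARED\n"
    "9606\t1003\tGENEC\t-\t-\n"
)

index = build_symbol_index(io.StringIO(TOY))
print(ambiguous_symbols(index))  # {'SHARED': ['1001', '1002']}
```

Here "SHARED" plays the role that C10orf2 plays in my real data: it is listed for two different GeneIDs, so a lookup by symbol alone cannot choose between them.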
I have also tried using the geneSynonym package but ran into similar problems.
Does anyone have an idea why this type of ambiguity arises? More practically, if anyone has run into this problem before, I would appreciate any suggestions on how to match the gene symbol lists unambiguously.
(Obviously it would probably be better to compare IDs such as Entrez IDs or ENSGs/ENSPs, but not all the sources that I use provide these.)
Thanks in advance