Hello,
I am trying to convert human gene names to mouse gene names using biomaRt.
Here is the code I am using:
library("biomaRt")
ensembl = useMart("ensembl")
ensembl = useDataset("hsapiens_gene_ensembl",mart=ensembl)
all <- getBM(attributes = c("external_gene_name","mmusculus_homolog_associated_gene_name"), mart=ensembl)
The problem is that some of the genes I am interested in come back with two rows, each with a different mmusculus_homolog_associated_gene_name: one contains the correct homolog and the other is simply empty. One such gene is PTPRD.
When I perform the actual conversion with this code:
mapping <- getBM(attributes = c("external_gene_name","mmusculus_homolog_associated_gene_name"),
                 filters = "external_gene_name",
                 values = genes_of_interest,  # character vector of human gene symbols
                 mart = ensembl)
For PTPRD, the row with the empty homolog is picked instead of the one with Ptprd. This happens for many different genes and messes up my conversion table.
If anyone can help me figure out this problem, or tell me why there are multiple rows for the same external_gene_name in the first place, I'd be very happy.
EDIT:
Added an image to show what I see when inspecting the all object.
NaN generally stands for not-a-number, as in "undefined" or "unrepresentable". I don't think there is a gene called nan. There is one called Nans.
Sorry, it's not actually NaN. It's just empty. I'll specify this in my question.
You can find a list of human-mouse gene homologs at this link from MGI/Jax.
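For the duplicate rows described in the question, one option is to drop the rows with an empty homolog after the query and then check whether any human gene still maps to more than one mouse symbol. The sketch below uses a small hand-made data frame in place of the real getBM() result, so the rows are illustrative only; the names mapping_clean and ambiguous are made up for this example. A likely cause of the duplicates (an assumption on my part, not verified against this query) is that the same symbol is attached to more than one Ensembl gene record, in which case adding ensembl_gene_id to the attributes should make the difference between the rows visible.

```r
# Toy stand-in for the getBM() result; illustrative rows, not real biomaRt output.
mapping <- data.frame(
  external_gene_name = c("PTPRD", "PTPRD", "TP53"),
  mmusculus_homolog_associated_gene_name = c("", "Ptprd", "Trp53"),
  stringsAsFactors = FALSE
)

# Treat empty strings and NA as "no homolog reported" and drop those rows.
homolog <- mapping$mmusculus_homolog_associated_gene_name
has_homolog <- !is.na(homolog) & nzchar(homolog)
mapping_clean <- unique(mapping[has_homolog, ])

# Human genes that still map to more than one mouse symbol need manual review.
dup <- duplicated(mapping_clean$external_gene_name)
ambiguous <- unique(mapping_clean$external_gene_name[dup])
```

With real data, mapping would be the data frame returned by getBM(). Genes listed in ambiguous have several non-empty homologs, and filtering alone cannot decide which one is correct.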