I have a data frame with gene names like this:
> test
                                 A
1                  mmu-miR-181a-5p
2                  mmu-miR-181b-5p
3 mmu-miR-199a-3p__mmu-miR-199b-3p
4 mmu-miR-669o-3p__mmu-miR-669a-3p
5                  mmu-miR-669d-5p
6                   mmu-miR-103-3p
I truncate the names as follows, so that I can match them with miRBase IDs:
> test$A <- gsub( "-3p*$", "", test$A)
> test$A <- gsub( "-5p*$", "", test$A)
> test
                              A
1                  mmu-miR-181a
2                  mmu-miR-181b
3 mmu-miR-199a-3p__mmu-miR-199b
4 mmu-miR-669o-3p__mmu-miR-669a
5                  mmu-miR-669d
6                   mmu-miR-103
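In the output above, the combined entries (rows 3 and 4) keep the "-3p" suffix on their first name, because the patterns are anchored to the end of the string. Note also that "p*" means "zero or more p", so the original patterns would strip a trailing "-3" or "-5" as well. A minimal sketch of one possible alternative, assuming the "__" separator always joins two complete miRNA names; the variable names `stripped` and `name_list` are made up for illustration:

```r
# Same example names as in the data frame above.
test <- data.frame(
  A = c("mmu-miR-181a-5p", "mmu-miR-181b-5p",
        "mmu-miR-199a-3p__mmu-miR-199b-3p",
        "mmu-miR-669o-3p__mmu-miR-669a-3p",
        "mmu-miR-669d-5p", "mmu-miR-103-3p"),
  stringsAsFactors = FALSE
)

# Drop "-3p" or "-5p" when it is followed by "__" or by the end of the string.
# The lookahead (?=...) needs perl = TRUE.
stripped <- gsub("-[35]p(?=__|$)", "", test$A, perl = TRUE)

# Split combined entries so each miRNA name can be looked up on its own.
name_list <- strsplit(stripped, "__", fixed = TRUE)

print(stripped)
```

Each element of `name_list` is then a character vector with one or two names, which could be matched individually instead of as a single combined string.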
Now I would like to use biomaRt to find the Ensembl IDs for these genes, but the lookup returns no matches:
> ensembl = useMart(biomart = "ensembl", dataset = "mmusculus_gene_ensembl")
> genemap <- getBM( attributes = c("ensembl_gene_id", "gene_biotype", "external_gene_name","mirbase_id" ,"mirbase_trans_name"),
+                   mart = ensembl )
> idx <- match(test$A, genemap$mirbase_id )
> idx
[1] NA NA NA NA NA NA
Out of this list, mmu-mir-669d should give a match, but it doesn't. This is just an example: out of the complete list I got about 16 matches, while I was expecting hundreds.
I suspected spaces introduced by the gsub function, but there are none. It's likely a stupid error, but where? Any educated guesses are welcome.
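One way to narrow this down is to check for whitespace and for case differences separately, since match() compares strings exactly and "miR" is not the same as "mir". The sketch below is only a diagnostic under stated assumptions: `mirbase_id` is a hypothetical stand-in for genemap$mirbase_id with made-up values (lowercase "mir", as the name is spelled in the text above), not real biomaRt output.

```r
# Three of the truncated names from the example above.
query <- c("mmu-miR-181a", "mmu-miR-669d", "mmu-miR-103")

# Hypothetical stand-in for genemap$mirbase_id (assumed values for illustration).
mirbase_id <- c("mmu-mir-669d", "mmu-mir-181a-1")

# Check for any stray whitespace in the query names.
has_space <- grepl("[[:space:]]", query)

# Exact matching is case-sensitive.
idx_exact <- match(query, mirbase_id)

# Lowercasing both sides makes the comparison case-insensitive.
idx_nocase <- match(tolower(query), tolower(mirbase_id))

print(has_space)
print(idx_exact)
print(idx_nocase)
```

With these assumed values, only the case-insensitive comparison finds mmu-miR-669d. Names that still fail after lowercasing may differ by more than case (the made-up "mmu-mir-181a-1" carries an extra suffix), so inspecting a few real values with head(genemap$mirbase_id) would show which format the mart actually returns.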