I can't find some gene accessions using biomaRt or org.Hs.eg.db
oscresal, 4 months ago

Hello everyone,

Lately I've been getting a lot of gene identifiers from BLAST alignments run in R. The problem is that when I look for additional data about these genes (coordinates, alternative names, anything), in many cases R packages like org.Hs.eg.db or biomaRt return nothing. But if I search manually on websites like GenBank, UniProt and similar, I do find them.

It could be because there are different nomenclatures (RefSeq, NCBI, Ensembl, UniProt...), but many of the identifiers I get are ones I can't place. Here are some examples:

 [1] "AP023480" "AP023480" "AC134504" "AC234038" "AF119117"
[6] "AH010012" "NG_023352" "AC016632" "AJ004918" "AJ004918"
[11] "NG_052917" "AC025470" "AB462959" "AF411339" "AC103796"
...
[61] "KJ897416" "NG_050664" "KJ897662" "MH180382" "MH180382"
[66] "BC131822" "BC044787" "NG_032122" "NG_008797" "NG_008797"
[71] "AH010822" "CP068261"


I do know how to identify them, because I've looked up what the prefixes mean. What I don't know is which attribute (biomaRt) or keytype (org.Hs.eg.db) I have to use. Can you help me?

Thank you very much!

PS: this is my first post here, so I hope I'm not making any mistakes; the Posting Guide link is not working :(


These are GenBank nucleotide sequence accessions (the NG_ ones are RefSeq genomic accessions, which also live in NCBI's nucleotide database). org.Hs.eg.db does contain GenBank accession mappings (org.Hs.egACCNUM, i.e. keytype "ACCNUM"), but I wouldn't expect to find all of your hits there: a sequence is only included if it maps to an Entrez Gene ID, which won't always be the case.
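To make the distinction concrete, here is a minimal sketch (in Python, purely for illustration; `classify` is a hypothetical helper, not part of any package) that deduplicates a hit list and separates RefSeq-style accessions from INSDC (GenBank/ENA/DDBJ) nucleotide accessions. The patterns follow NCBI's published nucleotide accession formats and deliberately ignore WGS, TSA and protein accessions, which end up in "unknown".

```python
import re

# RefSeq accessions: two uppercase letters, an underscore, then digits (e.g. NG_023352),
# with an optional version suffix such as ".1".
REFSEQ = re.compile(r"^[A-Z]{2}_\d+(\.\d+)?$")

# INSDC (GenBank/ENA/DDBJ) nucleotide accessions: 1 letter + 5 digits,
# or 2 letters + 6 or 8 digits, with an optional version suffix.
INSDC = re.compile(r"^(?:[A-Z]\d{5}|[A-Z]{2}\d{6}|[A-Z]{2}\d{8})(\.\d+)?$")


def classify(accessions):
    """Drop duplicates (keeping first-seen order) and group accessions by format."""
    groups = {"refseq": [], "insdc": [], "unknown": []}
    for acc in dict.fromkeys(accessions):
        if REFSEQ.match(acc):
            groups["refseq"].append(acc)
        elif INSDC.match(acc):
            groups["insdc"].append(acc)
        else:
            # e.g. WGS/TSA-style accessions, protein IDs, or typos
            groups["unknown"].append(acc)
    return groups


hits = ["AP023480", "AP023480", "AC134504", "NG_023352", "AF119117", "NG_052917", "CP068261"]
print(classify(hits))
```

In org.Hs.eg.db terms, the INSDC group corresponds to keytype = "ACCNUM", and "REFSEQ" is the natural keytype to try for the NG_ group, although many keys may still return nothing for the reason given above.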

To get additional metadata about these sequences, you can use the accessions to query NCBI Entrez through its API (E-utilities) or, if you'd rather stay in R, the rentrez package.
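The E-utilities route can be sketched in a few lines. These hypothetical Python helpers only build request URLs (no network calls): ESearch to turn an accession into an Entrez UID, EFetch to download records, and ELink to ask whether a nucleotide UID is linked to an NCBI Gene record. rentrez's entrez_search(), entrez_fetch() and entrez_link() wrap the same three endpoints. Parameter names follow the E-utilities documentation; NCBI asks clients to stay at roughly 3 requests per second without an API key.

```python
from urllib.parse import urlencode

# Base URL of NCBI's E-utilities.
EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def esearch_url(accession, db="nuccore"):
    """ESearch: look up the Entrez UID(s) for an accession using the [accn] field tag."""
    return EUTILS + "esearch.fcgi?" + urlencode({"db": db, "term": f"{accession}[accn]"})


def efetch_url(ids, db="nuccore", rettype="gb"):
    """EFetch: download records (GenBank flat file by default) for a batch of IDs.

    For nuccore, EFetch accepts accession numbers as well as UIDs.
    """
    params = {"db": db, "id": ",".join(ids), "rettype": rettype, "retmode": "text"}
    return EUTILS + "efetch.fcgi?" + urlencode(params)


def elink_url(uid, dbfrom="nuccore", db="gene"):
    """ELink: ask which records in `db` (here NCBI Gene) link to one UID from ESearch."""
    return EUTILS + "elink.fcgi?" + urlencode({"dbfrom": dbfrom, "db": db, "id": uid})


print(esearch_url("AF119117"))
print(efetch_url(["AF119117", "NG_023352"]))
```

If ELink returns a Gene record, that record is usually where genomic coordinates can be found; not every nucleotide record has such a link.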


Thank you very much! It took me a while to get the hang of it, but by combining entrez_search() and entrez_fetch() I managed to do it. The problem now is that, for some accessions, GenBank finds the record but doesn't give me any useful information: it doesn't link to another identifier or tell me where the sequence lies in any reference genome. The UCSC search engine, however, does give me this information.

So do you know of any R package that lets me do the same thing with the UCSC Genome Browser?