How to map SILVA taxa names to NCBI IDs (in R)?
5 weeks ago
Art_SG • 0

Hi,

How can I map taxonomy assigned with the SILVA database to NCBI taxonomy IDs (preferably in R)?

I'm using DADA2 in R with the SILVA database to assign taxonomy, e.g.: Bacteria > Firmicutes > Bacilli > Lactobacillales > Enterococcaceae > Enterococcus

I need to map it to NCBI IDs: 2 > 1239 > 91061 > 186826 > 81852 > 1350

Or at least use the lowest rank to get the NCBI ID: Enterococcus > 1350

I have seen tools such as CrossClassify (and the related paper by Monika Balvočiūtė), but the databases used there are quite old (SILVA 128). Is there an alternative? Is there a ready-made map that can be used easily?

Thanks!

mapping SILVA NCBI R • 294 views

Not R, but using Entrez Direct (EDirect):

$ esearch -db taxonomy -query "enterococcus" | esummary | xtract -pattern DocumentSummary -element TaxId,ScientificName
1350    Enterococcus

OR

$ esearch -db taxonomy -query "enterococcus" | efetch -format xml | xtract -pattern Taxon -element TaxId,ScientificName -tab "\n" -block "*/Taxon" -tab "\n"  -element TaxId,ScientificName
1350    Enterococcus    131567  cellular organisms
2   Bacteria
1783272 Terrabacteria group
1239    Firmicutes
91061   Bacilli
186826  Lactobacillales
81852   Enterococcaceae

Thanks for that. It is a workaround of sorts, but:

  1. It is extremely slow. That might be fine for a small number of queries, but I need to do this for hundreds of gut samples.
  2. Many names will not have a direct match in esearch. For example, the SILVA 138 genus "Lachnospiraceae NK4A136 group" returns nothing. I was hoping for a map that links all SILVA taxa to NCBI IDs (as far as that is possible, accounting for differences in taxonomy). For the example above, that would most likely be "unclassified Lachnospiraceae" (taxid: 186928).

NCBI has taxonomy dump files available that you can download and then do the mapping locally. Unless someone has already created a SILVA-to-NCBI mapping, you are not going to find the information you are looking for. SILVA is the derived database; NCBI simply maintains its own taxonomy database.
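
A minimal sketch of the local mapping, assuming you have unpacked names.dmp from the taxdump archive (taxdump.tar.gz under pub/taxonomy on the NCBI FTP site). The function name map_names and the one-name-per-line query file are my own placeholders, not part of any existing tool, and homonyms are not disambiguated:

```shell
# map_names NAMES_DMP QUERY_FILE
# NAMES_DMP:  names.dmp from the NCBI taxdump (fields separated by "<TAB>|<TAB>").
# QUERY_FILE: hypothetical input, one taxon name per line (e.g. exported from R).
# Prints "name<TAB>taxid", with NA where no name matches (case-insensitive).
map_names() {
    awk -F '\t[|]\t?' '
        FILENAME == ARGV[1] {
            # names.dmp columns: taxid, name, unique name, name class
            key = tolower($2)
            # Scientific names win; synonyms only fill in names not seen yet.
            # Caveat: if two taxa share a scientific name, the last one wins.
            if ($4 == "scientific name") taxid[key] = $1
            else if ($4 == "synonym" && !(key in taxid)) taxid[key] = $1
            next
        }
        {
            key = tolower($0)
            print $0 "\t" ((key in taxid) ? taxid[key] : "NA")
        }
    ' "$1" "$2"
}

# Example call (file names are placeholders):
#   map_names names.dmp silva_names.txt > silva_to_ncbi.tsv
```

This reads names.dmp once and answers all queries in a single pass, so it avoids one network call per name. The resulting table is plain tab-separated text, so it can be loaded in R with read.delim(). Names that only exist in SILVA will still come back as NA.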

In general, you will need to search with just the family or genus name to get a result:

$ esearch -db taxonomy -query "lachnospiraceae" | efetch -format xml | xtract -pattern Taxon -element TaxId,ScientificName -tab "\n" -block "*/Taxon" -tab "\n"  -element TaxId,ScientificName
186803  Lachnospiraceae 131567  cellular organisms
2   Bacteria
1783272 Terrabacteria group
1239    Firmicutes
186801  Clostridia
186802  Eubacteriales
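
For SILVA-only labels, the same idea can be automated by walking up the lineage until some rank has a match. This is only a sketch: resolve_lineage is a made-up name, the two-column "name<TAB>taxid" table is assumed to exist already (for example, derived from names.dmp, with names in any case), and lineages are assumed to be semicolon-separated, one per line:

```shell
# resolve_lineage MAP_TSV < lineages.txt
# MAP_TSV: hypothetical two-column table "name<TAB>taxid".
# stdin:   one lineage per line, ranks separated by ";" (highest rank first).
# Prints "lineage<TAB>matched name<TAB>taxid", or NA/NA if no rank matches.
resolve_lineage() {
    awk -F '\t' '
        FILENAME == ARGV[1] { taxid[tolower($1)] = $2; next }   # load the table
        {
            n = split($0, rank, ";")
            name = "NA"; hit = "NA"
            for (i = n; i >= 1; i--) {            # start at the lowest rank
                gsub(/^ +| +$/, "", rank[i])      # trim stray spaces
                key = tolower(rank[i])
                if (key in taxid) { name = rank[i]; hit = taxid[key]; break }
            }
            print $0 "\t" name "\t" hit
        }
    ' "$1" -
}

# Example call (file names are placeholders):
#   resolve_lineage name_to_taxid.tsv < silva_lineages.txt
```

With this approach, a lineage ending in "Lachnospiraceae NK4A136 group" would fall back to the family Lachnospiraceae (186803) rather than to an "unclassified" node, so whether that is acceptable depends on how the IDs are used downstream.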