Find all NCBI taxids in some group (e.g., all coronaviruses)
3.7 years ago
oddjobs ▴ 10

I am trying to get a list of the taxids of all coronaviruses, which I plan to use in a script. Alternatively, the taxids of all viruses related to SARS-CoV would also be a good starting point.

However, I do not know how to extract this information efficiently. I can find the following pages on NCBI, but they do not provide the information in a text or tabular format that I can feed into a script:

  1. SARS: https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Tree&id=694009&lvl=3&keep=1&srchmode=1&unlock
  2. Coronaviruses: https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&id=11118&lvl=3&lin=f&keep=1&srchmode=1&unlock

I would need to click each organism on these pages to find the taxid of an individual virus. Is there a more machine-readable source for this information? Alternatively, I could script the clicking of each link; is there an efficient way to do that? I can think of the following approaches, but both seem convoluted:

  1. Inspect the page source to find all links to individual viruses
  2. Use the lynx browser in a terminal

NCBI taxonomy browser RNA SARS • 1.5k views
3.7 years ago

You can also fetch the records as XML if you need more information:

esearch -db taxonomy -query "txid694009[Subtree]" | efetch -format xml > out.xml

then process with xtract:

cat out.xml | xtract -pattern Taxon -element TaxId,ScientificName | head

prints:

2709072 Bat coronavirus RaTG13
2697049 Severe acute respiratory syndrome coronavirus 2
2042698 SARS-related betacoronavirus Rp3/2004
2042697 SARS-related bat coronavirus RsSHC014
1699361 Bat SARS-like coronavirus YNLF_34C
1699360 Bat SARS-like coronavirus YNLF_31C
1508227 Bat SARS-like coronavirus
1503303 BtRs-BetaCoV/YN2013
1503302 BtRs-BetaCoV/HuB2013
1503301 BtRs-BetaCoV/GX2013
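
If the end goal is a script rather than a terminal session, the same XML can also be read in Python without xtract. This is only a minimal sketch: it assumes out.xml has a single root element with one Taxon child per record, each carrying TaxId and ScientificName, and the function names are made up for illustration.

```python
import xml.etree.ElementTree as ET


def parse_taxa(xml_text):
    """Return (taxid, scientific_name) pairs for each top-level Taxon record."""
    root = ET.fromstring(xml_text)
    pairs = []
    # Look only at direct children of the root. Lineage entries nested deeper
    # inside a record are also <Taxon> elements and are not search hits.
    for taxon in root.findall("Taxon"):
        taxid = taxon.findtext("TaxId")
        name = taxon.findtext("ScientificName")
        if taxid is not None:
            pairs.append((int(taxid), name))
    return pairs


def read_taxa(path):
    """Parse an efetch XML file on disk (assumed layout, see parse_taxa)."""
    with open(path, encoding="utf-8") as handle:
        return parse_taxa(handle.read())


# Usage (assumes out.xml was written by the efetch command above):
#   for taxid, name in read_taxa("out.xml"):
#       print(taxid, name, sep="\t")
```

Restricting the loop to direct children of the root keeps ancestors listed in each record's lineage out of the result.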

Thanks! I wasn't aware of this utility before. It will probably be helpful for other things I want to do as well.

3.7 years ago
GenoMax 141k

Not an elegant solution but I think this should work for now.

  1. Go to this link. It lists the subtree for taxID 694009.
  2. Open the "Send to" drop-down, select File, and under Format choose "Taxid list".
  3. Click Create File and save the file.

Under Display Settings, choosing the Info option gives you additional information (names, lineages, etc.).

You can also run GenoMax's solution from the command line:

esearch -db taxonomy -query "txid694009[Subtree]" | efetch | head

prints:

2709072
2697049
2042698
2042697
1699361
1699360
1508227
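
Once the full list (without head) is saved to a file, using it from a script is mostly a matter of loading it into a set. A minimal sketch, assuming one taxid per line as in the output above; the file name taxids.txt and the function name are hypothetical:

```python
def load_taxids(lines):
    """Collect integer taxids from an iterable of lines, one taxid per line."""
    taxids = set()
    for line in lines:
        line = line.strip()
        if line:  # skip blank lines
            taxids.add(int(line))
    return taxids


# Usage (taxids.txt is a hypothetical file holding the efetch output):
#   with open("taxids.txt") as handle:
#       sars_related = load_taxids(handle)
#   if 2697049 in sars_related:
#       ...
```

A set makes the membership check cheap when screening many records against the group.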

That's probably the closest I can get to what I want. The 200 is a limitation, but I can work with it for now. Thanks!

You should get all 261 in one file. Check updated instructions above.
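
For a script that should not depend on the web page at all, the same subtree query can probably be sent to the E-utilities esearch endpoint directly. The sketch below is an assumption-laden illustration, not a tested recipe: it assumes the JSON reply contains an esearchresult object with count and idlist fields, and it requests up to 10000 ids in one call because the default page size is much smaller. All function names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(root_taxid, retmax=10000):
    """Build an esearch URL for every taxonomy entry under root_taxid."""
    params = {
        "db": "taxonomy",
        "term": f"txid{root_taxid}[Subtree]",
        "retmax": retmax,  # raise the page size so the list is not truncated
        "retmode": "json",
    }
    return f"{ESEARCH_URL}?{urlencode(params)}"


def parse_idlist(json_text):
    """Return (total_count, taxids) from an esearch JSON reply (assumed layout)."""
    result = json.loads(json_text)["esearchresult"]
    return int(result["count"]), [int(uid) for uid in result["idlist"]]


def fetch_subtree_taxids(root_taxid, retmax=10000):
    """Query NCBI over the network and return the taxids in the subtree."""
    with urlopen(build_esearch_url(root_taxid, retmax)) as response:
        count, taxids = parse_idlist(response.read().decode("utf-8"))
    if count > len(taxids):
        # More hits than were returned: the caller would need to page through.
        raise RuntimeError(f"got {len(taxids)} of {count} taxids")
    return taxids


# Usage (needs network access):
#   taxids = fetch_subtree_taxids(694009)
```

Comparing the reported count with the number of ids actually returned guards against silently working with a truncated list.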

3.7 years ago
Joe 21k

You can use this script I added to Kai Blin's tool ncbi-genome-download:

https://github.com/kblin/ncbi-genome-download/blob/master/contrib/gimme_taxa.py

I see. Thanks. It's nice to have this in script form, as my goal is to use it in a script in the end. The ncbi-genome-download tool also seems quite useful. I will look into both.
