Question: Remove bacteria data from nt database
7 months ago, anasofiamoreira94 wrote:

Hi all, I want to remove the bacterial data from the entire nt database. Can someone tell me the best way to do this? Thanks

nt ncbi • 199 views

As far as I can tell, nt sequences are annotated at the genus level. So the only way you may be able to do this is to get those names and exclude the ones that are bacteria.
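One way to decide which taxa are bacterial is to walk each taxon's lineage in the NCBI taxonomy dump and check whether it passes through Bacteria (taxID 2). A minimal Python sketch, assuming a parent map read from the taxdump `nodes.dmp` file; the function names and the toy map are illustrative only, not part of any BLAST tool:

```python
def load_parents(nodes_dmp_path):
    """Read a taxid -> parent taxid map from an NCBI taxdump nodes.dmp file."""
    parents = {}
    with open(nodes_dmp_path) as handle:
        for line in handle:
            # nodes.dmp fields are separated by "<tab>|<tab>";
            # field 0 is the taxid, field 1 is its parent taxid.
            fields = line.split("\t|\t")
            parents[int(fields[0])] = int(fields[1])
    return parents


def is_bacterial(taxid, parents, bacteria_taxid=2):
    """Walk up the lineage; return True if it passes through Bacteria."""
    seen = set()
    while taxid not in seen:  # the root is its own parent, so this terminates
        if taxid == bacteria_taxid:
            return True
        seen.add(taxid)
        parent = parents.get(taxid)
        if parent is None:  # taxid missing from the map
            return False
        taxid = parent
    return False


# Toy parent map for illustration only (intermediate ranks omitted).
TOY_PARENTS = {1: 1, 131567: 1, 2: 131567, 2759: 131567, 562: 2, 9606: 2759}
print(is_bacterial(562, TOY_PARENTS), is_bacterial(9606, TOY_PARENTS))
```

With the real `nodes.dmp`, `load_parents` would supply the full map in place of the toy dictionary.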

written 7 months ago by genomax

It may be simpler to post-filter your results for bacteria instead?
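Post-filtering could look roughly like this in Python. The sketch assumes tabular output produced with `-outfmt "6 qseqid sseqid evalue bitscore staxids"` (so the subject taxIDs sit in the last column) and a set of bacterial taxIDs that you have built separately; the function name and the sample rows are made up for illustration.

```python
def drop_bacterial_hits(rows, bacterial_taxids, taxid_column=-1):
    """Return only the tabular BLAST rows whose subject is not bacterial."""
    kept = []
    for row in rows:
        fields = row.rstrip("\n").split("\t")
        # staxids can hold several IDs separated by ';' (or a non-numeric
        # placeholder when no taxonomy is attached), so parse defensively.
        hit_taxids = {
            int(t) for t in fields[taxid_column].split(";") if t.isdigit()
        }
        if hit_taxids & bacterial_taxids:
            continue  # at least one subject taxID is bacterial: drop the hit
        kept.append(row)
    return kept


# Made-up example rows: qseqid, sseqid, evalue, bitscore, staxids
example_rows = [
    "q1\tsubjA\t1e-50\t200\t562",
    "q1\tsubjB\t1e-40\t180\t9606",
    "q2\tsubjC\t1e-10\t90\t562;9606",
]
filtered = drop_bacterial_hits(example_rows, {562})
```

The choice to drop a hit when any of its taxIDs is bacterial is a design decision; keeping such mixed entries instead would only need the condition changed.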

As @lieven points out below

 -negative_taxids <String>
   Restrict search of database to everything except the specified taxonomy IDs
   (multiple IDs delimited by ',')

should work, assuming nt is properly annotated with bacterial taxIDs.

Edit: No sequences in nt appear to be annotated with taxID 2 so that idea is not going to work.
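A possible workaround, as far as I understand it: the local taxonomy filters in BLAST+ 2.8.1 match the taxIDs stored with each sequence, which are typically species-level, so a parent ID such as 2 matches nothing on its own. Supplying a file of species-level bacterial taxIDs through `-negative_taxidlist` may work instead (the `get_species_taxids.sh` helper shipped with recent BLAST+ releases is intended to produce such lists). The Python sketch below only assembles the command line; the file names are hypothetical, and it assumes a version 5 nt database.

```python
import shlex


def build_blastn_command(query, db, negative_taxid_file, out):
    """Assemble a blastn call that skips every taxID listed in a file."""
    return [
        "blastn",
        "-query", query,
        "-db", db,
        # one taxID per line; assumed to be species-level bacterial IDs
        "-negative_taxidlist", negative_taxid_file,
        "-outfmt", "6 qseqid sseqid evalue bitscore staxids",
        "-out", out,
    ]


# Hypothetical file names, for illustration only.
cmd = build_blastn_command("query.fa", "nt", "bacteria_species.txids", "hits.tsv")
print(shlex.join(cmd))  # shell-quoted command, ready to inspect or run
```

To actually run the search, the list could be handed to `subprocess.run(cmd, check=True)` on a machine where BLAST+ is installed.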

modified 7 months ago • written 7 months ago by genomax

Alternatively, if you are using the newest BLAST version, use the taxonomic filtering options and set them to report only eukaryotic hits. There is no need to modify your BLAST database in this case.

EDIT/update: though this seems to work on the NCBI web BLAST, there are indications that it does not work in the local CLI version.

modified 7 months ago • written 7 months ago by lieven.sterck

I'm using blast locally

written 7 months ago by anasofiamoreira94

This would work if I add the IDs of the species to remove. But then again, they can change, so the results will differ.

written 7 months ago by anasofiamoreira94

Hi, I think restricting a search to particular taxa should now be possible even in offline BLAST.

See this NCBI webinar

And/or this post: https://ncbiinsights.ncbi.nlm.nih.gov/2019/01/04/blast-2-8-1-with-new-databases-and-better-performance/.

But if you are after the sequences themselves, I'm not aware of any option to extract them by taxon directly from the nt database. However, one possible way might be to list all accessions in nt (blastdbcmd), run them through Entrez or get yourself an accession2taxid table, select the ones you want, and then extract them using blastdbcmd.
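The selection step could be sketched in Python as follows. It assumes a listing made with `blastdbcmd -db nt -entry all -outfmt "%a %T"` (accession and taxID on each line) and a set of species-level bacterial taxIDs prepared beforehand; the function name, file names and sample lines are hypothetical.

```python
def non_bacterial_accessions(listing_lines, bacterial_taxids):
    """Yield accessions whose taxID is not in the bacterial set."""
    for line in listing_lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        accession, taxid = parts
        # Lines without a numeric taxID are skipped in this sketch.
        if taxid.isdigit() and int(taxid) not in bacterial_taxids:
            yield accession


# Made-up sample of "accession taxid" lines, for illustration only.
sample_listing = ["AB000001.1 562", "NM_000001.1 9606", "malformed"]
kept = list(non_bacterial_accessions(sample_listing, {562}))

# Write one accession per line for a later batch extraction.
with open("keep.acc", "w") as handle:
    handle.write("\n".join(kept) + "\n")
```

The resulting file could then be passed to something like `blastdbcmd -db nt -entry_batch keep.acc -out nt_no_bacteria.fa` to pull the sequences.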

GL

modified 7 months ago • written 7 months ago by massa.kassa.sc3na260