Downloading a subset of sequences from the NCBI nucleotide database
3 months ago
jess_palmer ▴ 10

Hello,

I want to make a local BLAST database that contains only Neisseria sequences. I've been looking at the BLAST+ manual, but I'm still confused about how to use blastdb_aliastool to get what I want. Is there a way to make a BLAST database from all nucleotide sequences with a specific taxID?

ncbi blast • 326 views
3 months ago

Absolutely.

Download the sequences you want as a FASTA file and build a BLAST database from them.

(BLAST databases are built with makeblastdb from the BLAST+ package.)

With the newest BLAST database format (v5), it is also possible to restrict a search to a given taxonomy instead of rebuilding a database.
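
A minimal sketch of the build step, written in Python and assuming BLAST+ is installed and on your PATH. The file name `neisseria.fasta`, the database name `neisseria_db`, and the helper `makeblastdb_cmd` are placeholders for illustration, not anything from this thread. The function only assembles the command; it does not run it.

```python
import shlex


def makeblastdb_cmd(fasta_path, db_name, title=None):
    """Assemble a makeblastdb command for a nucleotide FASTA file."""
    cmd = [
        "makeblastdb",
        "-in", fasta_path,
        "-dbtype", "nucl",   # nucleotide database
        "-out", db_name,
        "-parse_seqids",     # assumes NCBI-style FASTA headers with unique IDs
    ]
    if title:
        cmd += ["-title", title]
    return cmd


# Placeholder file and database names, for illustration only.
cmd = makeblastdb_cmd("neisseria.fasta", "neisseria_db", title="Neisseria nt")
print(shlex.join(cmd))
```

To actually build the database, the list can be passed to `subprocess.run(cmd, check=True)`.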

3 months ago
GenoMax 115k

You have a couple of options.

1. Download all Neisseria genome sequences and build a local database, e.g. How to download all Pseudomonas aeruginosa Genomes from NCBI Genomes database? (change this to the species you want)
2. Use the taxID restriction option available in BLAST+ to restrict your searches (against any standard NCBI pre-formatted database) to just that ID.
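
Option 2 could look roughly like the sketch below. It assumes a version 5 pre-formatted database (here `nt`) is available locally, a BLAST+ release that supports the `-taxids` option, and that 482 is the NCBI taxonomy ID for the genus Neisseria. The query and output file names and the helper `taxid_blastn_cmd` are placeholders. Depending on your BLAST+ release, a genus-level ID may first need to be expanded to species-level IDs (the `get_species_taxids.sh` script shipped with BLAST+ is meant for that), so check how your installed version behaves.

```python
import shlex


def taxid_blastn_cmd(query, db, taxids, out):
    """Assemble a blastn command limited to the given NCBI taxonomy IDs."""
    return [
        "blastn",
        "-db", db,
        "-query", query,
        "-taxids", ",".join(str(t) for t in taxids),  # comma-separated IDs
        "-outfmt", "6",                               # tabular output
        "-out", out,
    ]


NEISSERIA_TAXID = 482  # assumption: genus Neisseria in NCBI Taxonomy

# Placeholder file names, for illustration only.
cmd = taxid_blastn_cmd("my_16S.fasta", "nt", [NEISSERIA_TAXID], "neisseria_hits.tsv")
print(shlex.join(cmd))
```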

Hi,

Thank you for your response.

The problem with this is that I don't just want Neisseria genomes; I actually want all Neisseria 16S rRNA gene sequences (I should have mentioned this before, sorry!). I thought I could make a database of all Neisseria sequences and then extract the 16S rRNA genes. Ideally, though, I would search with my 16S rRNA query, filter so that only Neisseria hits are returned, and then download an alignment of all hits, so that I end up with a database of just the 16S rRNA genes.


Since NCBI genomes are at various levels of completeness, your chances of finding annotated 16S rRNA gene sequences vary: you may find them in some genomes but not in others. So your suggestion of searching those genomes (probably best done locally) with an appropriate 16S rRNA sequence, and then extracting the relevant sequences to create your multiple-sequence alignment, would be the way forward.
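
The "search, then extract" step can be sketched as follows. It assumes the local search was run with `-outfmt "6 sseqid sstart send sseq"`, so that each row carries the aligned part of the subject sequence. The function name, the sequence IDs in the example, and the length cutoff are all illustrative. The result is unaligned FASTA, which still has to go through a multiple-sequence aligner.

```python
def hits_to_fasta(tabular_text, min_len=1000):
    """Turn BLAST tabular rows (sseqid, sstart, send, sseq) into FASTA text."""
    records = []
    seen = set()
    for line in tabular_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        sseqid, sstart, send, sseq = line.split("\t")
        seq = sseq.replace("-", "")  # drop alignment gap characters
        if len(seq) < min_len:
            continue  # ignore short partial hits
        key = (sseqid, sstart, send)
        if key in seen:
            continue  # same region already reported
        seen.add(key)
        records.append(f">{sseqid}:{sstart}-{send}\n{seq}")
    return "\n".join(records) + ("\n" if records else "")


# Made-up rows: one hit listed twice, plus one hit below the length cutoff.
example = (
    "hypothetical_contig_1\t10\t19\tACGT-ACGTAC\n"
    "hypothetical_contig_1\t10\t19\tACGT-ACGTAC\n"
    "hypothetical_contig_2\t5\t7\tACG\n"
)
print(hits_to_fasta(example, min_len=5), end="")
```

For real 16S searches a cutoff somewhere above 1,000 bp would keep mostly full-length genes, but the right value is a judgment call.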

Otherwise, use web BLAST and see if you can limit your search there by taxID, perhaps with an additional Entrez query for 16S genes.
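
For the web BLAST route, the "Entrez Query" field takes ordinary Entrez search syntax. Below is a sketch of how such a term could be put together. The `[Organism:exp]` and `[Title]` field tags are standard Entrez syntax, but the title phrase that best captures 16S records is a guess and should be checked against real search results. The helper name `entrez_16s_term` is hypothetical, and 482 is assumed to be the Neisseria genus taxID.

```python
def entrez_16s_term(taxid, title_phrase="16S ribosomal RNA"):
    """Build an Entrez term for one taxon (including descendants) and a title phrase."""
    return f'txid{taxid}[Organism:exp] AND "{title_phrase}"[Title]'


# 482 is assumed to be the NCBI taxonomy ID for the genus Neisseria.
term = entrez_16s_term(482)
print(term)
```

The printed term can be pasted into the Entrez Query box on the web BLAST form, or used as a search in the NCBI Nucleotide database to download matching records directly.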