NCBI BLAST remote search on a specific species
tlorin, 6.4 years ago

Dear all,

I am trying to BLAST a batch of sequences against a specific species using the command-line version of BLAST. I know this can be specified with the -gilist option (see the answer here), but this only works locally, and I am using BLAST with the -remote option.

My command-line:

blastn -query myquery.fa -db nt -remote -out test.out -outfmt 7

What I would like is something like the following (note that it does NOT work; 144197 could be any other species ID):

blastn -query myquery.fa -db nt -remote -out test.out -outfmt 7 -species 144197

Any idea of how I can achieve that? Thanks!

EDIT: Here is another related post that advises using blastdb_aliastool, but again, this seems to work only locally.


GI numbers are being phased out, so don't use a solution that depends on them. You could use -entrez_query instead.


Thanks! I had a look and tried -entrez_query "txid144197[ORGN]", but it does not seem to work.


Use -entrez_query "Stegastes partitus[ORGN]" that should work.


By the way, when the database is downloaded locally, you mainly depend on tools that use GIs, right? Do you know what will happen to them?

Answer posted 4.9 years ago

I know this is an old thread, but it took me a long time to figure this out, and it is not well documented. So I want to post an answer that clarifies how to do what is asked here, both remotely and locally.

I am using the latest version of BLAST+ (version 2.7.1 at the time of writing, available here), and the examples below use nucleotide sequences and Homo sapiens.

A remote BLAST search restricted to a specific organism uses the -entrez_query option:

blastn -query blast_query.fasta -db nt -remote -task blastn-short -word_size 7 -evalue 500 -perc_identity 95 -entrez_query "Homo sapiens [organism]" -outfmt 6 -out blast_result_Hsapiens.table -max_target_seqs 10 -max_hsps 5
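If you need the same remote search for several organisms, the command can be wrapped in a small shell function. This is a minimal sketch, not part of BLAST+: the function name remote_blast_organism and the DRY_RUN switch are hypothetical, and the blastn options are simply the ones from the command above. The main thing it takes care of is quoting: the Entrez query contains spaces, so it has to reach blastn as a single argument.

```shell
#!/bin/sh
# Hypothetical helper: remote blastn restricted to one organism via -entrez_query.
# Usage: remote_blast_organism QUERY_FASTA "Genus species" OUTPUT_TABLE
# Set DRY_RUN=1 to print the arguments (one per line) instead of running blastn.
remote_blast_organism() {
  query=$1
  organism=$2
  out=$3
  # Rebuild the positional parameters as the full blastn command line;
  # "$organism [organism]" stays one argument even though it contains spaces.
  set -- blastn -query "$query" -db nt -remote \
    -task blastn-short -word_size 7 -evalue 500 -perc_identity 95 \
    -entrez_query "$organism [organism]" \
    -outfmt 6 -out "$out" -max_target_seqs 10 -max_hsps 5
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$@"
  else
    "$@"
  fi
}
```

For example, remote_blast_organism blast_query.fasta "Stegastes partitus" blast_result_Spartitus.table would restrict the search to the species from the original question. In this thread the organism name worked where the txid144197[ORGN] form did not, which is why the sketch builds the query from the name.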


A local BLAST search restricted to a specific organism works as follows.

First, install Entrez Direct (setup is straightforward; just follow the guidelines). Then use it to download the GI list you need:

esearch -db nuccore -query "Homo sapiens [organism]" | efetch -db nuccore -format uid > Hsapiens_gilist.gi
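Before passing the file to BLAST, a quick check that the download actually produced identifiers can save a confusing empty result later. A minimal sketch, assuming efetch -format uid writes one numeric identifier per line; check_gilist is a hypothetical name, not an Entrez Direct or BLAST+ tool:

```shell
#!/bin/sh
# Hypothetical sanity check for the GI list written by esearch | efetch.
# Assumes one numeric identifier per line.
check_gilist() {
  file=$1
  # Missing or zero-byte file: the download probably failed.
  if [ ! -s "$file" ]; then
    echo "error: $file is missing or empty" >&2
    return 1
  fi
  # grep -c exits with status 1 when the count is 0, so "|| true" keeps
  # the assignment from failing under "set -e".
  bad=$(grep -cv '^[0-9][0-9]*$' "$file" || true)
  if [ "$bad" -ne 0 ]; then
    echo "error: $file has $bad non-numeric line(s)" >&2
    return 1
  fi
  # grep -c '' also counts a final line that lacks a trailing newline.
  n=$(grep -c '' "$file")
  echo "$n identifiers in $file"
}
```

It could be chained in front of the search, e.g. check_gilist Hsapiens_gilist.gi && blastn ... -gilist Hsapiens_gilist.gi ...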


Once you have the GI list, pass it to BLAST with the -gilist option:

blastn -query blast_query.fasta -db nt -task blastn-short -word_size 7 -evalue 500 -perc_identity 95 -gilist Hsapiens_gilist.gi -outfmt 6 -out local_blast_result_Hsapiens.table -max_target_seqs 10 -max_hsps 5


Alternatively, the GI list can be created from the NCBI website:
1. Go to the NCBI homepage
2. Choose "Nucleotide" next to the search bar
3. Type in "Homo sapiens [organism]" and hit the search button
4. Click on Send to > Choose Destination: File > Format: GI List > Create File

Hello, the query for doing a remote BLAST is very useful. Does anyone know how to also specify the assembly for Homo sapiens, say "GRCh37" or "hg19", in the query?


You could try the -entrez_query "Homo sapiens [organism] AND GRCh37" option.

Edit: it looks like this works:

$ blastn -query test.fa -db nt -task blastn -remote -entrez_query "Homo sapiens [organism] AND GRCh38" -outfmt 6 -out local_blast_result_Hsapiens_38.table -max_target_seqs 6
$ more local_blast_result_Hsapiens_38.table
FJ525883.1      KY429637.1      84.733  131     19      1       2       132     580     709     2.60e-34        143
FJ525883.1      KY429912.1      85.714  77      10      1       59      135     3153    3078    6.12e-17        86.9
FJ525883.1      KY429912.1      81.356  59      10      1       5       63      3597    3540    3.61e-07        54.5
FJ525883.1      KY429869.1      80.000  35      7       0       764     798     1815    1781    1.2     32.8
FJ525883.1      KY429874.1      77.273  44      9       1       763     806     1388    1346    1.2     31.9
FJ525883.1      KY429835.1      91.667  24      1       1       315     337     3661    3638    1.2     31.9
FJ525883.1      KY429731.1      81.250  32      6       0       783     814     2080    2111    1.2     31.9

$ blastn -query test.fa -db nt -task blastn -remote -entrez_query "Homo sapiens [organism] AND GRCh37" -outfmt 6 -out local_blast_result_Hsapiens_37.table -max_target_seqs 6
$ more local_blast_result_Hsapiens_37.table
FJ525883.1      NG_021183.1     89.062  128     14      0       1       128     31697   31824   2.08e-41        168
FJ525883.1      KY429393.1      87.500  88      8       3       47      132     4039    4125    3.09e-20        99.6
FJ525883.1      KY429393.1      84.746  59      8       1       4       62      3901    3958    2.22e-09        63.5
FJ525883.1      NG_008662.2     83.117  77      11      2       47      121     95878   95802   1.23e-12        74.3
FJ525883.1      NG_008662.2     90.909  22      2       0       558     579     67840   67819   3.8     31.9
FJ525883.1      NG_023313.1     83.784  37      5       1       890     925     180695  180731  0.089   37.4
FJ525883.1      NG_027748.1     88.889  27      3       0       4       30      308     334     0.31    36.5
FJ525883.1      GU267523.1      80.000  40      5       1       788     824     150     111     0.31    35.6
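With -outfmt 6 each line is one HSP, and the default columns are qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore; for instance, the top GRCh37 hit above is NG_021183.1 with an e-value of 2.08e-41. When comparing runs like these two, it can help to keep only hits below an e-value cutoff. A minimal awk sketch, assuming the default tab-separated columns; significant_hits is a hypothetical name:

```shell
#!/bin/sh
# Hypothetical filter for default -outfmt 6 output (tab-separated):
# 1 qseqid 2 sseqid 3 pident 4 length 5 mismatch 6 gapopen
# 7 qstart 8 qend 9 sstart 10 send 11 evalue 12 bitscore
# Usage: significant_hits TABLE MAX_EVALUE
# Prints sseqid, pident and evalue for every HSP with evalue <= MAX_EVALUE.
significant_hits() {
  awk -F'\t' -v OFS='\t' -v max="$2" \
    '($11 + 0) <= (max + 0) { print $2, $3, $11 }' "$1"
}
```

For example, significant_hits local_blast_result_Hsapiens_37.table 1e-5 would list only the stronger HSPs from the GRCh37 run. Adding 0 forces a numeric comparison, so values in scientific notation such as 2.08e-41 compare correctly.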


Thank you! Is the query below correct for getting exact alignments of the reads in my FASTA file? Each read is 23 nucleotides long.

blastn -query ../../Desktop/short_sample.fasta -db nt -remote -task blastn-short -word_size 23 -evalue 1 -perc_identity 100 -entrez_query "Homo sapiens [organism] AND GRCh37" -outfmt 6 -out blast_result_k562.txt -max_target_seqs 10
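Note that -perc_identity 100 only requires the reported alignment itself to be 100% identical; on its own it does not require the alignment to cover the whole read. One way to double-check the output is sketched below, assuming the default -outfmt 6 columns and reads that are all 23 nt long: keep only lines whose alignment length equals the read length, with no mismatches and no gap openings. exact_full_length_hits is a hypothetical name:

```shell
#!/bin/sh
# Hypothetical check on default -outfmt 6 output: keep HSPs where the whole
# read aligned exactly (pident 100, alignment length == read length,
# no mismatches, no gap openings).
# Usage: exact_full_length_hits TABLE READ_LENGTH
exact_full_length_hits() {
  awk -F'\t' -v len="$2" \
    '$3 == 100 && $4 == len && $5 == 0 && $6 == 0' "$1"
}
```

For the command above this would be exact_full_length_hits blast_result_k562.txt 23; lines that are dropped are partial or imperfect alignments.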


Great! Looks cool. Thanks a lot!