Question: NCBI BLAST remote search on a specific species
Asked 4.4 years ago by tlorin (Switzerland):

Dear all,

I am trying to BLAST a batch of sequences against a specific species using the command-line version of BLAST. I know this can be specified with the -gilist option (see the answer here), but this only works locally, and I am using BLAST with the -remote option.

My command line:

blastn -query myquery.fa -db nt -remote -out test.out -outfmt 7

What I want is something like the following (note that this does NOT work, as blastn has no -species option):

blastn -query myquery.fa -db nt -remote -out test.out -outfmt 7 -species 144197

with 144197 replaced by any other species ID as needed.

Any idea of how I can achieve that? Thanks!


EDIT: Here is another related post that advises using blastdb_aliastool, but again, this seems to work only locally.

Tags: ncbi • 5.4k views
(Question written 4.4 years ago by tlorin; modified 2.9 years ago by user "'")

GI numbers are about to go away, so don't use a solution that depends on them. You could use -entrez_query instead.

(Comment by GenoMax, 4.4 years ago)

Thanks! I had a look and tried -entrez_query "txid144197[ORGN]", but it does not seem to work.

(Comment by tlorin, 4.4 years ago)

Use -entrez_query "Stegastes partitus[ORGN]" instead; that should work.

(Comment by GenoMax, written and modified 4.4 years ago)

By the way, when the database is downloaded locally, you mainly depend on tools that use GIs, right? Do you know what will happen to them?

(Comment by tlorin, 4.4 years ago)
Answer by user "'", 2.9 years ago:

I know this is an old thread, but it took me a long time to figure out the same thing, as it is not well documented. So I am posting an answer to clarify how to achieve what is asked here, both remotely and locally.

I am using the latest version of BLAST+ (version 2.7.1 at the time of writing, available here), and the following examples focus on nucleotides and "Homo sapiens".


A remote BLAST alignment restricted to a specific organism uses the -entrez_query option:

blastn -query blast_query.fasta -db nt -remote -task blastn-short -word_size 7 -evalue 500 -perc_identity 95 -entrez_query "Homo sapiens [organism]" -outfmt 6 -out blast_result_Hsapiens.table -max_target_seqs 10 -max_hsps 5
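If the same remote search has to be repeated for several organisms, the command above can be wrapped in a small bash loop. This is a minimal sketch rather than part of the original answer: the helper names organism_query, output_name and run_remote_searches are hypothetical, the output file naming scheme is an arbitrary choice, and running it needs network access to NCBI. The blastn options are copied unchanged from the command above.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: wrap an organism name in the Entrez [organism] field tag,
# optionally AND-ing an extra term (e.g. an assembly name such as GRCh37).
organism_query() {
  local organism="$1" extra="${2:-}"
  local query="${organism} [organism]"
  if [[ -n "$extra" ]]; then
    query="${query} AND ${extra}"
  fi
  printf '%s\n' "$query"
}

# Hypothetical helper: derive an output file name, replacing spaces with underscores.
output_name() {
  printf 'blast_result_%s.table\n' "${1// /_}"
}

# Run the remote search from the answer once per organism given as an argument.
run_remote_searches() {
  local organism
  for organism in "$@"; do
    blastn -query blast_query.fasta -db nt -remote -task blastn-short \
      -word_size 7 -evalue 500 -perc_identity 95 \
      -entrez_query "$(organism_query "$organism")" \
      -outfmt 6 -out "$(output_name "$organism")" \
      -max_target_seqs 10 -max_hsps 5
  done
}

# Example (uncomment to run; each remote search can take a while):
# run_remote_searches "Homo sapiens" "Stegastes partitus"
```

Keeping the query construction in its own function makes it easy to check the exact -entrez_query string before spending time on a remote search.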

A local BLAST alignment restricted to a specific organism is done as follows:

First, install Entrez Direct (setting it up is very easy; just follow the installation guidelines). Then use it to download the GI list for your organism:

esearch -db nuccore -query "Homo sapiens [organism]" | efetch -db nuccore -format uid > Hsapiens_gilist.gi
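Before passing the file to -gilist, it may be worth checking that the download actually produced a non-empty list of numeric IDs, since an empty or truncated download is easy to miss. A minimal bash sketch; check_gilist is a hypothetical helper, and it assumes one ID per line, which is how -format uid writes them.

```shell
#!/usr/bin/env bash

# Hypothetical helper: succeed only if the file exists, is non-empty,
# and every line is a plain numeric ID (one GI per line).
check_gilist() {
  local file="$1"
  if [[ ! -s "$file" ]]; then
    echo "GI list '$file' is missing or empty" >&2
    return 1
  fi
  # -v -x -E '[0-9]+' selects lines that are NOT made up entirely of digits.
  if grep -v -x -E '[0-9]+' "$file" >/dev/null; then
    echo "GI list '$file' contains non-numeric lines" >&2
    return 1
  fi
  # Report how many IDs were found (tr strips the padding some wc versions add).
  printf '%s IDs in %s\n' "$(wc -l < "$file" | tr -d ' ')" "$file"
}

# Usage: only start the local search if the list looks sane, e.g.
# check_gilist Hsapiens_gilist.gi && blastn -query blast_query.fasta -db nt -gilist Hsapiens_gilist.gi -outfmt 6 -out local_blast_result_Hsapiens.table
```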

Once you have the GI list, pass it to BLAST with the -gilist option:

blastn -query blast_query.fasta -db nt -task blastn-short -word_size 7 -evalue 500 -perc_identity 95 -gilist Hsapiens_gilist.gi -outfmt 6 -out local_blast_result_Hsapiens.table -max_target_seqs 10 -max_hsps 5

The GI list can also be downloaded manually:

  1. Head over to https://www.ncbi.nlm.nih.gov
  2. Choose "Nucleotide" next to the search bar
  3. Type in "Homo sapiens [organism]" and hit the search button
  4. Click on Send to > Choose Destination: File > Format: GI List > Create File

Hello, the remote BLAST query is very useful. Does anyone know how to also specify the assembly for Homo sapiens, say "grch37" or "hg19", in the query?

(Comment by Ina, 14 months ago)

You could try using the -entrez_query "Homo sapiens [organism] AND GRCh37" option.

Edit: Looks like this works.

$ blastn -query test.fa -db nt -task blastn -remote -entrez_query "Homo sapiens [organism] AND GRCh38" -outfmt 6 -out local_blast_result_Hsapiens_38.table -max_target_seqs 6

$ more local_blast_result_Hsapiens_38.table 
FJ525883.1      KY429637.1      84.733  131     19      1       2       132     580     709     2.60e-34        143
FJ525883.1      KY429912.1      85.714  77      10      1       59      135     3153    3078    6.12e-17        86.9
FJ525883.1      KY429912.1      81.356  59      10      1       5       63      3597    3540    3.61e-07        54.5
FJ525883.1      KY429869.1      80.000  35      7       0       764     798     1815    1781    1.2     32.8
FJ525883.1      KY429874.1      77.273  44      9       1       763     806     1388    1346    1.2     31.9
FJ525883.1      KY429835.1      91.667  24      1       1       315     337     3661    3638    1.2     31.9
FJ525883.1      KY429731.1      81.250  32      6       0       783     814     2080    2111    1.2     31.9

$ blastn -query test.fa -db nt -task blastn -remote -entrez_query "Homo sapiens [organism] AND GRCh37" -outfmt 6 -out local_blast_result_Hsapiens_37.table -max_target_seqs 6

$ more local_blast_result_Hsapiens_37.table 
FJ525883.1      NG_021183.1     89.062  128     14      0       1       128     31697   31824   2.08e-41        168
FJ525883.1      KY429393.1      87.500  88      8       3       47      132     4039    4125    3.09e-20        99.6
FJ525883.1      KY429393.1      84.746  59      8       1       4       62      3901    3958    2.22e-09        63.5
FJ525883.1      NG_008662.2     83.117  77      11      2       47      121     95878   95802   1.23e-12        74.3
FJ525883.1      NG_008662.2     90.909  22      2       0       558     579     67840   67819   3.8     31.9
FJ525883.1      NG_023313.1     83.784  37      5       1       890     925     180695  180731  0.089   37.4
FJ525883.1      NG_027748.1     88.889  27      3       0       4       30      308     334     0.31    36.5
FJ525883.1      GU267523.1      80.000  40      5       1       788     824     150     111     0.31    35.6
(Comment by GenoMax, 14 months ago)
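BLAST writes -outfmt 6 as tab-separated columns (query ID, subject ID, percent identity, alignment length, mismatches, gap opens, query start and end, subject start and end, e-value, bit score), so result tables like the two above can be post-filtered with standard tools. A minimal bash/awk sketch, assuming the default column order; filter_hits is a hypothetical helper and the e-value cut-off is whatever you pass in. Among the lines passing the cut-off, it keeps only the first one listed for each subject.

```shell
#!/usr/bin/env bash

# Hypothetical helper: for a default -outfmt 6 table, print subject ID,
# e-value and bit score for the first line of each subject whose
# e-value is at most the given cut-off.
# Columns used: $2 = subject ID, $11 = e-value, $12 = bit score.
filter_hits() {
  local max_evalue="$1" table="$2"
  awk -F'\t' -v max="$max_evalue" '
    # "+ 0" forces numeric comparison (handles values like 2.60e-34);
    # seen[] is only updated for lines that pass the cut-off.
    ($11 + 0) <= (max + 0) && !seen[$2]++ { print $2 "\t" $11 "\t" $12 }
  ' "$table"
}
```

For example, filter_hits 1e-5 local_blast_result_Hsapiens_37.table lists each subject with an e-value of at most 1e-5, which makes the GRCh37 and GRCh38 results easier to compare side by side.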

Thank you! Is the query below correct for getting exact alignments of the reads in my FASTA file? Each read is 23 nucleotides long.

blastn -query ../../Desktop/short_sample.fasta -db nt -remote -task blastn-short -word_size 23 -evalue 1 -perc_identity 100 -entrez_query "Homo sapiens [organism] AND GRCh37" -outfmt 6 -out blast_result_k562.txt -max_target_seqs 10
(Comment by Ina, 14 months ago; modified by _r_am)
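On the exact-match question: -perc_identity 100 on its own only constrains the identity of each reported alignment, not its length, although with -word_size 23 on 23-nt reads any hit should in principle already contain the whole read as an exact seed. To check the table explicitly, one option is to keep only ungapped, mismatch-free alignments spanning the full read. A minimal bash/awk sketch, assuming the default -outfmt 6 columns and the fixed read length of 23 from the question; exact_full_length is a hypothetical helper.

```shell
#!/usr/bin/env bash

# Hypothetical helper: keep only rows of a default -outfmt 6 table that are
# 100% identical, ungapped, mismatch-free alignments covering the whole read.
# Columns used: $3 = pident, $4 = length, $5 = mismatch, $6 = gapopen,
#               $7 = qstart, $8 = qend.
exact_full_length() {
  local read_len="$1" table="$2"
  awk -F'\t' -v len="$read_len" '
    # A pattern with no action prints the matching line unchanged.
    $3 + 0 == 100 && $4 + 0 == len && $5 == 0 && $6 == 0 && $7 == 1 && $8 + 0 == len
  ' "$table"
}
```

For example: exact_full_length 23 blast_result_k562.txt > exact_hits.txt. If the reads varied in length, requesting -outfmt "6 std qlen" adds the query length as a 13th column, which could be compared against column 4 instead of a fixed number.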

Great! Looks cool. Thanks a lot!

(Comment by Ina, 14 months ago)
Powered by Biostar version 2.3.0