Question: Using 'blastn' to do a remote search on NCBI BLAST against the nt collection database
13 months ago by maciwuk:

I have 60,000 sequences that I want to BLAST against the default 'Nucleotide collection (nt/nr)' database.

Is it possible to do this without setting up a standalone, local version of BLAST? I do have BLAST (blast+-2.6.0) installed, but I wonder whether the search can be run remotely instead. I tried:

blastn -db nt -query input-sequences.fasta -remote -out blast_output.out

I get a long list of errors containing strings such as: Unavailable feature GNUTLS, Failed to initialize secure session, Service not found, stack is empty, etc.

QUESTIONS:

  1. Am I doing something wrong in my command?
  2. Is it faster to build a local database and search locally on my own computer for such a large number of sequences?
linux shell blast unix • 1.5k views
modified 13 months ago • written 13 months ago by maciwuk

I don't recall whether v. 2.6.0 had already moved to https connections. NCBI now uses https for all connectivity, so upgrading to the latest BLAST+ (v. 2.7.1) may not be a bad idea.

If you need to BLAST 60K sequences, consider submitting them in chunks. You don't want to abuse your privileges at NCBI by sending a massive number of BLAST searches their way. Consider using a loop with sleep times built in between submissions.
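The chunk-and-sleep idea could look roughly like the sketch below. It is only an illustration: the chunk size of 100, the 60-second pause, and the helper names (`read_fasta`, `write_chunks`, `blast_chunks_remotely`) are assumptions made for this example, not NCBI recommendations, and the `blastn` command is the one from the question applied to each chunk.

```python
import subprocess
import time
from pathlib import Path


def read_fasta(path):
    """Yield (header, sequence) tuples from a FASTA file."""
    header, seq = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(seq)
                header, seq = line[1:], []
            else:
                seq.append(line)
    if header is not None:
        yield header, "".join(seq)


def write_chunks(fasta_path, out_dir, chunk_size=100):
    """Split a FASTA file into files holding at most chunk_size records."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    chunk_paths, batch = [], []

    def flush():
        path = out_dir / f"chunk_{len(chunk_paths):05d}.fasta"
        path.write_text("".join(f">{h}\n{s}\n" for h, s in batch))
        chunk_paths.append(path)
        batch.clear()

    for record in read_fasta(fasta_path):
        batch.append(record)
        if len(batch) == chunk_size:
            flush()
    if batch:
        flush()
    return chunk_paths


def blast_chunks_remotely(chunk_paths, pause_seconds=60,
                          runner=subprocess.run, sleep=time.sleep):
    """Run one remote blastn per chunk, pausing between submissions.

    pause_seconds is an illustrative value, not an official limit.
    """
    outputs = []
    for i, chunk in enumerate(chunk_paths):
        out_path = chunk.with_suffix(".out")
        cmd = ["blastn", "-db", "nt", "-query", str(chunk),
               "-remote", "-out", str(out_path)]
        runner(cmd, check=True)
        outputs.append(out_path)
        if i < len(chunk_paths) - 1:
            sleep(pause_seconds)  # be polite to the shared NCBI servers
    return outputs
```

Calling `blast_chunks_remotely(write_chunks("input-sequences.fasta", "chunks"))` would then submit the chunks one at a time. The `runner` and `sleep` parameters exist only so the loop can be dry-run without contacting NCBI.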

If you have enough local resources available then doing the search locally will give you more control over things.

written 13 months ago by genomax (64k)

Great. I will update to 2.7.1 and will try again. I have access to a computer with 192 GB RAM and 12 physical CPU cores (each @2.2 GHz). Do you think BLASTing 60 thousand sequences will take a substantial amount of time?

modified 13 months ago • written 13 months ago by maciwuk

What kind of sequences are these? NGS or regular fasta? You may want to use DIAMOND (since you have enough resources available locally) instead of blast. That can speed things up significantly.

written 13 months ago by genomax (64k)

These are short DNA sequences (all between 15 and 30 nt) extracted directly from the UCSC.hg38 and UCSC.mm10 fasta (chromosome) files. They have some modifications introduced, where usually one nucleotide is replaced by either 'H' (not G) or 'N' (any nucleotide).

Supposing a certain sequence is from chromosome 1 of hg38, I want to know whether my modified sequence can also be found on a chromosome other than chr1. In other words, I want a BLAST search that matches any of these sequences with 100% similarity, where the matched hit is NOT on the chromosome my sequence was originally taken from.

I think BLAST fits this situation well for two reasons: (1) it can trim a few nucleotides from each end of the sequence to optimize the alignment, which is exactly what I want, because I am also interested in shorter arms at both ends; and (2) it accepts the 'N' and 'H' codes I have introduced and handles them in a way that suits my end goal. For these reasons I thought BLAST would be even faster than a regular expression search. I am still not sure whether I should run it locally, though!
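The "same sequence, different chromosome" filter could be sketched as below, assuming the search is run with tabular output (`-outfmt 6`, whose default columns start with query ID, subject ID, percent identity, and alignment length). Two further assumptions are made purely for illustration: query IDs encode their origin as `chr1:100-125`, and subject IDs in the database are plain chromosome names. How ambiguity codes such as 'N' or 'H' affect the reported identity is worth checking on a few known cases before trusting a strict 100% cutoff.

```python
def source_chromosome(query_id):
    """Extract the chromosome from a query ID.

    Assumed (hypothetical) naming convention: 'chr1:100-125'.
    """
    return query_id.split(":", 1)[0]


def off_chromosome_hits(tabular_lines, min_identity=100.0):
    """Keep hits at or above min_identity whose subject is not the source chromosome.

    Expects BLAST tabular lines (-outfmt 6, default columns):
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
    """
    hits = []
    for line in tabular_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        qseqid, sseqid, pident = fields[0], fields[1], float(fields[2])
        if pident >= min_identity and sseqid != source_chromosome(qseqid):
            hits.append((qseqid, sseqid, pident, int(fields[3])))
    return hits
```

With a result file on disk, something like `off_chromosome_hits(open("hits.tsv"))` would return the candidate off-chromosome matches together with their alignment lengths, which makes the trimmed, shorter alignments easy to spot.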

modified 13 months ago • written 13 months ago by maciwuk

Ah, sorry, then DIAMOND would not be an option. I suggest running the blastn search locally against a smaller database (just the mouse and human genomes) rather than the entire nt. That will help speed things up.

Remember to use -task blastn-short since you have short sequences.
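A local run along these lines might be set up as in the sketch below. The file and database names (`hg38.fa`, `hg38_db`, `hits.tsv`) are placeholders, and the thread count of 12 simply mirrors the core count mentioned earlier in the thread; the `makeblastdb` and `blastn` options used are standard BLAST+ flags.

```python
import subprocess


def makeblastdb_cmd(genome_fasta, db_name):
    """Command to build a nucleotide BLAST database from a genome FASTA."""
    return ["makeblastdb", "-in", genome_fasta, "-dbtype", "nucl", "-out", db_name]


def blastn_short_cmd(query_fasta, db_name, out_path, threads=12):
    """Command for a short-query search with tabular output (-outfmt 6)."""
    return ["blastn", "-task", "blastn-short",
            "-query", query_fasta, "-db", db_name,
            "-outfmt", "6", "-num_threads", str(threads),
            "-out", out_path]


def run_local_search(genome_fasta, db_name, query_fasta, out_path,
                     threads=12, runner=subprocess.run):
    """Build the database, then search it. Both steps need BLAST+ on PATH."""
    runner(makeblastdb_cmd(genome_fasta, db_name), check=True)
    runner(blastn_short_cmd(query_fasta, db_name, out_path, threads), check=True)
```

For example, `run_local_search("hg38.fa", "hg38_db", "input-sequences.fasta", "hits.tsv")` would build the database once and then search all queries against it. For repeated searches the database step only needs to run the first time.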

Edit: BLAT from UCSC may be very fast, but I am not sure whether it handles IUPAC codes. Look into it as well.

modified 13 months ago • written 13 months ago by genomax (64k)