I am trying to perform a remote blastn search through blast+ from the command line (on Linux Ubuntu). This results in an XML file with the following error message:
internal_error: (Severe Error) Blast search error: Details: search failed. # Informational Message: [blastsrv4.REAL]: Error: CPU usage limit was exceeded, resulting in SIGXCPU (24). No hits found
I have tried several (not too long) sequences. I used the following command:
blastn -db nt -query sequence.fasta -evalue 0.05 -word_size 6 -max_target_seqs 10 -out blast_output.xml -outfmt 5 -remote
I know that a CPU error can happen when your search has a lot of sequences or a very long sequence, but sequence.fasta contains the sequence with accession FS507595, which is only 489 bp long. I also tried a different sequence, with the same result. (I ran it with existing sequences because I want to be sure that there should be hits.) I also tried running the command from a different IP, which did not help.
Also, before blast starts running, I get the following message:
Critical: [blastn] External MBEDTLS version mismatch: 2.16.2 headers vs. 2.16.3 runtime.
I tried reinstalling blast+ and its dependencies, which did not help.
Does anyone have an idea on how to solve this? Your help would be much appreciated!
There may be an issue at NCBI's end. I would suggest waiting until tomorrow and checking back. You are using the latest
I noticed that I was not installing the latest version of blast+ when using apt, so I installed it manually. This helped with the MBEDTLS error, but I still get the CPU error. (I tried running it two days ago, so I am afraid waiting much longer won't help.)
I am facing this problem too, and I haven't found a solution. The NCBI website mentions it, but says nothing clear about how to solve it. I found this link: https://www.schrodinger.com/kb/1955 but I don't know how to apply it. I'll keep trying.
If you are getting this error then you are basically trying to run a large blast search remotely or are trying to run too many blast searches in parallel.
I am sending 200 sequences of at most 700 aa each, in separate files, one by one with a 5 sec break between them; nothing runs in parallel.
Some of them work fine; for others I receive that message. I am starting to think it might be due to the characteristics of the sequence or of the blastp search. I am restricting my blastp to "not Firmicutes" (which is not the phylum of my sequence), so maybe I should set other parameters, perhaps change the BLOSUM matrix. I will think about that.
That is a lot of sequences, and 5 sec between runs is not much time at all. You need to consider running this locally or in the cloud. Usage is cumulative and is counted against your institutional IP (if you are behind a proxy).
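If you do stay with remote searches, spacing the submissions out further could look like the sketch below. It assumes one query per .fasta file in a directory; the function name, the DRY_RUN switch and the 60-second pause are illustrative choices of mine, not an NCBI-documented limit.

```shell
# Sketch: submit remote blastp searches one at a time with a pause between them.
# PAUSE_SECONDS and DRY_RUN are hypothetical settings for this example only.
PAUSE_SECONDS="${PAUSE_SECONDS:-60}"

run_queries() {
    local query_dir="$1" out_dir="$2" fasta name
    for fasta in "$query_dir"/*.fasta; do
        [ -e "$fasta" ] || continue              # glob matched nothing
        name="$(basename "$fasta" .fasta)"
        if [ "${DRY_RUN:-1}" = "1" ]; then
            # Default: only print what would be run.
            echo "blastp -db nr -query $fasta -out $out_dir/$name.xml -outfmt 5 -remote"
        else
            mkdir -p "$out_dir"
            blastp -db nr -query "$fasta" -out "$out_dir/$name.xml" -outfmt 5 -remote
            sleep "$PAUSE_SECONDS"               # wait before the next submission
        fi
    done
}
```

Calling `run_queries queries results` prints the commands; `DRY_RUN=0 run_queries queries results` actually submits them. The pause only spreads requests out, so it will not help if a single search is too large on its own.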
Thank you! Maybe I need to run it locally; I wanted to avoid that because of the disk space the nr db needs. I was also thinking about the IP issue, so I will talk to my IT service. Thanks!
For anyone who needs this in the future: I solved my problem. In my case it was related to the database size. I was using -db nr while excluding one bacterial phylum, and the remaining search space seemed to be too big, so I needed to exclude a bigger "box" (a larger taxonomic group), in my case all eukaryotes too. After that it worked for most of the sequences.
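The post does not say how the exclusion was specified. Assuming it was done through an Entrez query on a remote search, a query that excludes several taxa could be assembled as in this sketch. The helper name is hypothetical, and the taxa are simply the ones mentioned above.

```shell
# Sketch: build an Entrez query that excludes every taxon passed as an argument.
# exclude_taxa_query is a made-up helper, not part of BLAST+.
exclude_taxa_query() {
    local query="all[filter]" taxon
    for taxon in "$@"; do
        query="$query NOT \"$taxon\"[ORGN]"
    done
    printf '%s\n' "$query"
}

exclude_taxa_query Firmicutes Eukaryota
```

The resulting string would be passed to a remote search, e.g. `blastp -db nr -query query.fasta -entrez_query "$(exclude_taxa_query Firmicutes Eukaryota)" -remote`, where query.fasta is a placeholder file name.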
Hello! Am I correct that you excluded sequences from a local copy of the NR database (as you can't limit by taxid with a remote blast)? I'm trying to find a way to shorten my NR database search without having to download it locally, so if you managed to exclude a bigger box remotely, can you tell me how?
I think that using the -entrez_query flag may help, but I don't actually know what this is - do you know what to put for this flag? Could it be used to only consider Micromonospora NR sequences, for example?
Lots of questions, thanks for reading!
-entrez_query "Micromonospora [ORGN]". I just tested it with remote nr and it worked fine. Got only Micromonospora results.
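For reference, a complete remote command using that flag might look like the sketch below. The wrapper function, the DRY_RUN switch and the file names are placeholders I made up; only the blastp options themselves come from BLAST+. Note that -entrez_query is only accepted together with -remote.

```shell
# Sketch: remote search against nr, restricted by an Entrez query.
# remote_blast_entrez and DRY_RUN are illustrative, not part of BLAST+.
remote_blast_entrez() {
    local program="$1" query_fasta="$2" entrez="$3" out_xml="$4"
    if [ "${DRY_RUN:-1}" = "1" ]; then
        # Default: print the command instead of contacting NCBI.
        printf '%s -db nr -query %s -entrez_query "%s" -outfmt 5 -out %s -remote\n' \
            "$program" "$query_fasta" "$entrez" "$out_xml"
    else
        "$program" -db nr -query "$query_fasta" -entrez_query "$entrez" \
            -outfmt 5 -out "$out_xml" -remote
    fi
}

remote_blast_entrez blastp query.fasta 'Micromonospora[ORGN]' hits.xml
```

Run with `DRY_RUN=0` to actually submit the search.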
Perfect, it's still running for me but hasn't errored yet so looks good! I've never used that flag before, seems like a useful one.
Do you know if it limits the database search phase of blast to only Micromonospora DB sequences (which would presumably speed it up)? Or does it use the Entrez query to filter the XML to include only Micromonospora hits (making it more of a results-filtering thing with negligible speed impact)? Because, whilst I haven't had time to test it properly yet, it's still quite slow for me.
I don't know the internals of how NCBI does this via their blast site. They use some non-standard things that we don't have access to with local blast. Plus they have large compute infrastructure. Depending on number of input sequences it will still take time so you will have to be patient.
My worst skill, OK thanks very much!