Blast NCBI online using R
9 weeks ago

I want to use an R script to BLAST sequences against the NCBI databases online.

This post already explored the subject, but all the tools proposed (rBLAST, BLASTr, and metablast) run BLAST against local databases.

So it does not look possible to run BLAST against the NCBI online databases from an R script, or at least I have not found how. Has anyone come across a way to do this?


Keep in mind that "remote blast" is not supposed to be used as a replacement for local blast when you have a large number of sequences. You may get errors or, at worst, get your IP banned.
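For a small number of queries, one way to stay polite is to space the requests out. Below is a minimal sketch in R. `run_throttled` is a hypothetical helper (not part of any package), and the 10-second default is an assumption based on what I recall of NCBI's usage guidelines for remote BLAST, so please check the current guidelines before relying on it.

```r
# Hypothetical helper: apply `fun` to each element of `items`, making sure
# that consecutive calls start at least `min_interval` seconds apart.
# The 10-second default is an assumption, not an official constant.
run_throttled <- function(items, fun, min_interval = 10) {
  results <- vector("list", length(items))
  last_start <- NULL
  for (i in seq_along(items)) {
    if (!is.null(last_start)) {
      elapsed <- as.numeric(difftime(Sys.time(), last_start, units = "secs"))
      # Wait out whatever is left of the interval before the next call.
      if (elapsed < min_interval) Sys.sleep(min_interval - elapsed)
    }
    last_start <- Sys.time()
    results[[i]] <- fun(items[[i]])
  }
  results
}
```

Here `fun` would be whatever function submits one query (or one small FASTA file) to the remote service. This only slows a script down; it does not make remote BLAST suitable for large batches.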


It's possible to send BLAST commands remotely and receive the results with SequenceServer blast. We've made examples for blasting from within Python, and for remote blasting from the Unix command line.

The same concept (i.e. server-side API) should be accessible from R too. But we have yet to make example code for R.

9 weeks ago
Michael 54k

At least BLASTr seems to be quite comprehensive from looking at the code. It supports the remote option. I haven't tried it but it looks like you can call blast like in the usage example:

blastn(query, db = "nt", out = NULL, outfmt = "xml", max_hits = 20, evalue = 10, remote = TRUE, ...)

You just need a local installation of the command-line BLAST tools. If you want to define your workflow in R, it might be worth installing and trying it. The only problem is that the author didn't assign an open-source license or any proper terms at all. So, in principle, you cannot use or modify it for anything public.

EDIT: Thinking about it, the lack of proper license terms is actually a problem here, although the DESCRIPTION file does mention MIT. In any case, I would ask the author for clarification if the package is useful and you want to use the code in a project.


Thanks very much for pointing out this option, which I had missed, and for noticing the lack of clarity about the license. It is a problem, and I will indeed ask the author for clarification.


I don't think the remote=TRUE option is accepted by BLASTr. I might have to find a way to call the command-line blastn from R.

BLASTr::run_blast(asv, remote=TRUE, num_alignments = 4, num_threads = 1, blast_type = "blastn", perc_id = 80, perc_qcov_hsp = 80)
Error in BLASTr::run_blast(asv, remote = TRUE, num_alignments = 4, num_threads = 1,  : 
  unused argument (remote = TRUE)

I only looked at the documentation, so there might be a discrepancy between it and the installed code. How did you invoke blastn?

You can also always invoke blastn via system(), though. However, I think that managing your workflow through R has considerable downsides, as there will be significant overhead for controlling environments, dependencies, resources, and re-entrancy (re-running failed or outdated processing steps). I suggest you implement your workflow in Snakemake instead, using conda environments to install blast and possibly the databases locally. Then let the R script become a single step of your workflow.
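To illustrate the system route, here is a minimal sketch in base R. It assumes the BLAST+ `blastn` executable is installed and on the PATH, and it uses `system2()` rather than `system()` so the arguments can be passed as a vector. The helper names (`build_blastn_args`, `parse_outfmt6`, `blast_remote`) are hypothetical and not taken from BLASTr or any other package. As far as I know, `-num_threads` cannot be combined with `-remote`, so it is left out.

```r
# Build the argument vector for a remote blastn search (hypothetical helper).
build_blastn_args <- function(query_fasta, db = "nt", evalue = 10,
                              max_target_seqs = 20) {
  c("-query", shQuote(query_fasta),  # quote the path in case it has spaces
    "-db", db,
    "-remote",                       # search on NCBI servers, no local db
    "-outfmt", "6",                  # tabular output, one hit per line
    "-evalue", as.character(evalue),
    "-max_target_seqs", as.character(max_target_seqs))
}

# Parse default "-outfmt 6" lines into a data frame (hypothetical helper).
parse_outfmt6 <- function(lines) {
  cols <- c("qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
            "qstart", "qend", "sstart", "send", "evalue", "bitscore")
  if (length(lines) == 0) {
    # No hits: return an empty data frame with the expected column names.
    return(data.frame(matrix(nrow = 0, ncol = length(cols),
                             dimnames = list(NULL, cols))))
  }
  read.delim(text = lines, header = FALSE, col.names = cols,
             stringsAsFactors = FALSE)
}

# Run blastn remotely and return the hits (needs blastn and network access).
blast_remote <- function(query_fasta, ...) {
  out <- system2("blastn", args = build_blastn_args(query_fasta, ...),
                 stdout = TRUE)
  status <- attr(out, "status")
  if (!is.null(status) && status != 0) {
    stop("blastn exited with status ", status)
  }
  parse_outfmt6(out)
}

# Example call, not run here; "asv.fasta" is a placeholder file name:
# hits <- blast_remote("asv.fasta", max_target_seqs = 4)
```

The warning given earlier in the thread still applies: this is only reasonable for a handful of sequences. For anything larger, dropping `-remote` and pointing `-db` at a local database is the safer choice.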


Thank you for the advice @michael. I do not get to choose the tool in this case, and my colleagues have requested a single R script. I wish I could use Snakemake. However, I can follow your and GenoMax's suggestion and use local databases. I had many issues with the -remote flag and will not use it anymore.
