Using R to run NCBI BLAST queries automatically for many DNA sequences
7.3 years ago
Megatron ▴ 10

Hi

I am trying to identify a number of conserved sequences from a well-known animal, such as mouse, in another animal for which only the whole genome shotgun sequence is available.

I usually go to NCBI BLAST, enter my query, and select the whole genome shotgun sequence of my target animal as the database. I run BLAST and get the equivalent of my query sequence in the target animal.

I only have R installed, and I have never really used it for anything. Can I get R to do this for me efficiently, so that I don't have to open around 100 duplicate tabs in Chrome and run NCBI BLAST manually in each one?

Thanks

alignment sequence blast R • 9.7k views

Which OS are you using? If you are on Unix, you don't need R for this. You could easily use a shell script and command-line remote BLAST, with an appropriate delay built in between jobs to avoid spamming the NCBI BLAST server and getting your IP banned.


Currently I have a Win7 machine with R and RStudio that I was planning to do this on. However, I have an Ubuntu computer available to me in the lab that I can use if it would make things easier. Please be patient with me, because I haven't actually used Linux or R before. I am a wet-lab biologist trying to get my foot in the door...


Please use ADD COMMENT/ADD REPLY when responding to existing posts to keep threads logically organized.

If you want to learn to do this right, then review the basics of UNIX before doing anything else.

7.3 years ago
ddiez ★ 2.0k

One option is to use the blastSequences function in the Bioconductor package annotate:

library(annotate)
# search using a fragment of TLR4 mRNA.
x <- blastSequences("acttcttggg cttagaacaa ctagaacatc tggatttcca gcattccaat ttgaaacaaa")
x
[[1]]
DNAMultipleAlignment with 2 rows and 60 columns
     aln
[1] ACTTCTTGGGCTTAGAACAACTAGAACATCTGGATTTCCAGCATTCCAATTTGAAACAAA
[2] ACTTCTTGGGCTTAGAACAACTAGAACATCTGGATTTCCAGCATTCCAATTTGAAACAAA

[[2]]
DNAMultipleAlignment with 2 rows and 60 columns
     aln
[1] ACTTCTTGGGCTTAGAACAACTAGAACATCTGGATTTCCAGCATTCCAATTTGAAACAAA
[2] ACTTCTTGGGCTTAGAACAACTAGAACATCTGGATTTCCAGCATTCCAATTTGAAACAAA

<trimmed output>

I haven't tested it much so it may have many limitations.
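
For many queries, the same call could be wrapped in a loop that pauses between submissions, as suggested in the comments on the question. This is only a minimal sketch and has not been run against the live server: blast_many, blast_fun and delay_sec are hypothetical names made up for this example, and the search function is passed in as an argument so the loop does not depend on any particular BLAST interface.

```r
# Hypothetical helper: apply a search function to many sequences, one at a time.
# blast_fun is any function that takes one sequence string and returns a result,
# for example function(s) annotate::blastSequences(s).
blast_many <- function(queries, blast_fun, delay_sec = 10) {
  results <- vector("list", length(queries))
  names(results) <- names(queries)
  for (i in seq_along(queries)) {
    # Remove whitespace so sequences pasted in blocks of ten become one string.
    query <- gsub("[[:space:]]", "", queries[[i]])
    # A failed query is stored as NULL instead of aborting the whole run.
    hit <- tryCatch(
      blast_fun(query),
      error = function(e) {
        message("Query ", i, " failed: ", conditionMessage(e))
        NULL
      }
    )
    results[i] <- list(hit)  # list(hit) keeps the slot even when hit is NULL
    # Pause between submissions (assumed delay; adjust to NCBI's usage rules).
    if (i < length(queries)) Sys.sleep(delay_sec)
  }
  results
}

# Example use (needs the annotate package and network access, so left commented):
# library(annotate)
# queries <- c(tlr4 = "acttcttggg cttagaacaa ctagaacatc tggatttcca gcattccaat ttgaaacaaa")
# hits <- blast_many(queries, function(s) blastSequences(s))
```

Queries that failed show up as NULL entries in the returned list, so they can be picked out and resubmitted later.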

On this website there is a modified version of blastSequences that attempts to handle failed queries better, but note that the code is from 2014 and might not work correctly (or better solutions may since have been implemented in the original function, such as the timeout argument).

Another option is to install BLAST locally and use rBLAST (this is not what you asked, but it might be useful in some settings). At this point, though, I would most likely use some other language to interface with BLAST (Perl, Python, bash), or use plain system() calls.
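
As a rough sketch of the plain system() route, assuming the NCBI BLAST+ command-line tools are installed and blastn is on the PATH: the arguments can be built in R and handed to system2(). The names blastn_args and run_remote_blast, the file paths and the "nt" database are placeholders chosen for illustration; -query, -db, -remote, -outfmt and -out are standard BLAST+ options.

```r
# Hypothetical helper: build the argument vector for one remote blastn run.
blastn_args <- function(query_fasta, out_file, db = "nt") {
  c("-query", shQuote(query_fasta),  # quote paths in case they contain spaces
    "-db", db,                       # placeholder database name
    "-remote",                       # run the search on the NCBI servers
    "-outfmt", "6",                  # tabular output, one hit per line
    "-out", shQuote(out_file))
}

# Hypothetical driver: run blastn once per FASTA file, pausing between jobs.
run_remote_blast <- function(fasta_files, out_dir, delay_sec = 10) {
  for (f in fasta_files) {
    out <- file.path(out_dir, paste0(basename(f), ".tsv"))
    status <- system2("blastn", args = blastn_args(f, out))
    if (status != 0) warning("blastn failed for ", f)
    Sys.sleep(delay_sec)  # assumed delay to avoid flooding the server
  }
  invisible(NULL)
}

# Example use (needs BLAST+ and network access, so left commented):
# run_remote_blast(list.files("queries", full.names = TRUE), "results")
```

Each query then gets its own tab-separated results file, which can be read back into R with read.delim(out, header = FALSE).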
