Standalone BLAST+: automating searches, blastn formatting, interpretation
4.0 years ago

Hello all, I am struggling with using NCBI's standalone blast, particularly the blastn feature.

I have a large number of nucleotide sequences that I want to identify with blastn. The goal is to retrieve information from the BLAST program and eventually import it into Excel. I want to know what output formatting blastn is capable of, or the easiest way to format the output. I am running a remote search against NCBI's nucleotide database and want to filter and organize the retrieved information as much as possible. The ultimate goal is to automate a remote search against NCBI's nucleotide database and to work with the results in Excel.

I am extremely new to this and struggling to find the best path. From what I can gather, many prefer using external programs, scripts, parsers, interpreters etc to organize their results. I am okay with doing this as well, but would still like to format the output of my blastn search as much as possible beforehand. It is returning a lot of data to me that I don't need.

Can I automate this in Excel, and just use VBA/macros to further organize my data? It seems some people prefer to just import the output text file into a spreadsheet and work from there. I have also seen options like BioEdit, E-utilities, APIs... I'm not sure which route I should invest my time in figuring out.

Thanks so much for lending your time and expertise!

blast ncbi nucleotide sequencing • 1.4k views

Whatever route you choose, be sure to take one that leads you away from Excel. :)


Haha, yes I have heard excel is terrible for this sort of stuff. Even if I can get a nice text file that is tab or comma delimited, I would be overjoyed!


That you can easily get by choosing -outfmt 6 (tab-delimited) or -outfmt 7 (the same, with comment lines) for your blast output. See this.


Yeah I was hoping that would be the way but I am struggling to format it; blastn remote doesn't like a lot of the filtering options, and this format also lacks several specifiers that I need.. sigh!


You can output a blast tabular file with custom columns using -outfmt "6 col1 col2 col3... etc"

See here: C: Blast - Formatting Output
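As a rough illustration of the custom-column syntax, here is a minimal Python sketch that assembles such a blastn call. The file names and the particular column selection are assumptions made for the example; `blastn -help` lists the specifiers your version supports. The function only builds the argument list, so nothing is sent to NCBI until you run it yourself.

```python
import shlex

# Column specifiers for the tabular output; this selection is only an example.
COLUMNS = ["qseqid", "sacc", "pident", "length", "evalue", "bitscore",
           "qcovs", "stitle"]


def build_blastn_command(query_fasta, out_path, columns=COLUMNS):
    """Return the blastn argument list for a remote search with custom columns."""
    outfmt = "6 " + " ".join(columns)
    return [
        "blastn",
        "-query", query_fasta,
        "-db", "nt",        # assumption: NCBI's nucleotide collection
        "-remote",          # run the search on NCBI's servers
        "-outfmt", outfmt,  # passed as ONE argument, so no extra quoting is needed
        "-out", out_path,
    ]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    cmd = build_blastn_command("queries.fasta", "hits.tsv")
    print(shlex.join(cmd))  # the equivalent shell command line
    # import subprocess; subprocess.run(cmd, check=True)  # uncomment to run it
```

Keeping the column list in one place helps later, because whatever parses the output needs the same names in the same order.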


Hey Joe, thanks for the response. This would be the ideal format to import into excel, but in trying to use this format I have run into some issues. There don't seem to be specifiers for many bits of information I would like in my output (such as query sequence, subject binomial etc). I'd also like a way to have some program choose the best option based on my criteria (probably a lot of if/else or if/then statements) and it seems like blast+ doesn't have such options. Do you know of any program or script I could do that with?


There isn't going to be an 'off-the-shelf' solution to give you all of this in a pretty format right away. Whether or not you can get things like the binomial depends entirely on the IDs of the file you input (if there are no binomials in your input, BLAST can't magic these up).

There is no way to output the full sequence for a given entry, only the aligned regions.

You are going to have to do some of your own scripting using the tabular file as a database of hits etc. You might be able to get some of the information you want by querying the local blast database with blastdbcmd. If you read the blast tab file line-by-line, you can perhaps append the relevant bits to the corresponding line.
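To make the line-by-line idea concrete, here is a small Python sketch using only the standard library. It assumes the column list matches what was passed to -outfmt, and the `annotate` helper with its lookup dictionary is hypothetical: it stands in for whatever you retrieve separately (for example with blastdbcmd).

```python
# Assumption: these names match the columns requested via -outfmt, in order.
COLUMNS = ["qseqid", "sacc", "pident", "length", "evalue", "bitscore"]
NUMERIC = {"pident": float, "length": int, "evalue": float, "bitscore": float}


def read_hits(lines, columns=COLUMNS):
    """Yield one dict per hit from BLAST tabular (outfmt 6 or 7) lines."""
    for line in lines:
        # Skip blank lines and the '#' comment lines that outfmt 7 adds.
        if not line.strip() or line.startswith("#"):
            continue
        hit = dict(zip(columns, line.rstrip("\n").split("\t")))
        for name, convert in NUMERIC.items():
            if name in hit:
                hit[name] = convert(hit[name])
        yield hit


def annotate(hits, extra_by_accession):
    """Append looked-up fields to each hit whose accession is in the lookup."""
    for hit in hits:
        hit.update(extra_by_accession.get(hit["sacc"], {}))
        yield hit
```

An open file works as the `lines` argument, e.g. `with open("hits.tsv") as fh: hits = list(read_hits(fh))`.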

"I'd also like a way to have some program choose the best option based on my criteria (probably a lot of if/else or if/then statements) and it seems like blast+ doesn't have such options. Do you know of any program or script I could do that with?"

I'm not sure what you mean by have a program pick the best option. What option?


Thanks again so much for your thorough response.

As for the binomial: no, I don't have IDs in my input. The goal of my search is to assign a binomial to each query I run (they are not yet identified).

As for having the program pick the best option; sorry if that was confusing. The sequences are from fungal samples and as such, it can be difficult to ID some of them well. I would like to filter out unlikely matches, and based on some criteria (like percent identity, query coverage, total score etc) have the most likely match returned to me for each query.
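The selection described here could be sketched in Python roughly as follows. It assumes each hit has already been parsed into a dict with `qseqid`, `pident`, `qcovs` and `bitscore` keys (names borrowed from the BLAST tabular specifiers), and the ranking rule of bit score first, then percent identity, is just one possible choice. The thresholds are deliberately required arguments, since sensible values depend on the organism and marker.

```python
def best_hit_per_query(hits, *, min_pident, min_qcov):
    """Return {qseqid: best hit} after dropping hits below the cut-offs."""
    best = {}
    for hit in hits:
        if hit["pident"] < min_pident or hit["qcovs"] < min_qcov:
            continue  # unlikely match under the chosen criteria
        rank = (hit["bitscore"], hit["pident"])  # score first, then identity
        current = best.get(hit["qseqid"])
        if current is None or rank > (current["bitscore"], current["pident"]):
            best[hit["qseqid"]] = hit
    return best
```

Queries with no hit passing the cut-offs are simply absent from the result, which makes the unidentifiable samples easy to list afterwards.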

I was afraid I would have to do some scripting and that the blast tabular output would only take me so far. I guess now I just have to see how far it can take me! Thanks again Joe.


It is relatively easy to go from an accession to a taxonomic name (though the NCBI taxonomy database is an absolute trash heap, so the data will be very patchy), but you can't do this in one step with BLAST alone (and you'll need to make sure you're getting accessions in your results!). You'd need to parse the hit results as I mentioned and feed the accessions into something like Entrez or ETE3 - see for example: https://github.com/jrjhealey/PYlogeny (W.I.P.)
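For the Entrez route, one option is to call NCBI's E-utilities esummary endpoint directly. The Python sketch below does that rather than using ETE3 or the linked repository. It assumes the JSON reply for the nuccore database carries `accessionversion` and `organism` fields in each record; check a real reply before relying on that. The helper names are made up for this example.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

EUTILS_ESUMMARY = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi"


def esummary_url(accessions):
    """Build one esummary request for a batch of nucleotide accessions."""
    params = {"db": "nuccore", "id": ",".join(accessions), "retmode": "json"}
    return EUTILS_ESUMMARY + "?" + urlencode(params)


def organisms_from_summary(summary):
    """Map accession.version -> organism name from a parsed esummary reply.

    Assumption: records are keyed by UID under "result", with the UID list
    stored under "uids".
    """
    result = summary.get("result", {})
    names = {}
    for uid in result.get("uids", []):
        record = result.get(uid, {})
        if "accessionversion" in record:
            names[record["accessionversion"]] = record.get("organism", "")
    return names


def fetch_organisms(accessions):
    """Network call: send accessions in batches and keep the request rate low."""
    with urlopen(esummary_url(accessions)) as reply:
        return organisms_from_summary(json.load(reply))
```

The resulting dictionary can be joined back onto the parsed hits by accession.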

It wouldn't be too arduous to pipeline all of this together, but it's not the neatest/most elegant task.

For filtering, you can set various cut-offs to BLAST (e.g. percentage identity, or E-value etc.). You could then also 'post-filter' your blasttab file (or whatever). It would be fairly simple to do this in something like a pandas dataframe (if you're a python person) or in R. Unfortunately there don't tend to be magic values you can use for this and it'll depend on the organism (and I am certainly not well placed to suggest what to use for any fungus).
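A post-filter of this kind does not strictly need pandas. The Python sketch below uses only the standard library and assumes the hits are already parsed into dicts with numeric `pident` and `evalue` values. The cut-offs are required arguments rather than defaults because, as noted, there are no universal values. It also writes the survivors as comma-separated text that a spreadsheet can open.

```python
import csv


def post_filter(hits, *, min_pident, max_evalue):
    """Keep only the hits that pass both cut-offs."""
    return [h for h in hits
            if h["pident"] >= min_pident and h["evalue"] <= max_evalue]


def write_csv(hits, handle, columns):
    """Write the chosen columns as comma-separated text with a header row."""
    writer = csv.DictWriter(handle, fieldnames=columns, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(hits)
```

When writing to disk, open the file with `newline=""` (for example `open("filtered.csv", "w", newline="")`) so the csv module controls the line endings.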
