Blast+ Stand Alone Version For Sequence Alignment
3
0
9.7 years ago
Reyhaneh ▴ 520

Hi

The online version of blastp (link text) has a sequence alignment section, which also reports E-values for each alignment. I now want to use the standalone version to perform the alignment (my platform is Windows). I have installed Blast.2.2.5+ from (link text) and I need the database.

I have looked here (link text) but don't know which one is suitable. Can you please guide me on this?

Also, is the following command correct?
psiblast -in_msa seq.txt -db target.fasta

Thanks for your help in advance.

blast alignment blast pairwise • 5.3k views
1

I think it's most likely NR, but you need to understand which database is appropriate and decide that for yourself. And no, you don't need FASTA files; you need the processed database. Read the readme file in ftp.ncbi.nih.gov/blast/db/ first, and see my answer below.

0

"and I need the database." Which one? NR? We cannot know which database you want to use, can we?

0

You are right. I want to do a protein sequence alignment using blastp. I know that for sequence alignment I need to get a FASTA file from (ftp://ftp.ncbi.nih.gov/blast/db/FASTA/). Do you know which one is the default one used by pBlast people on the online version?

3
9.7 years ago
Hamish ★ 3.2k

If I understand this correctly you want to perform an NCBI BLAST+ blastp search against a database provided by NCBI (as Michael says most likely 'nr': the non-identical protein sequence database produced by NCBI) and from the BLAST result you want the hit alignments with their statistics (score, E-value, %identity and %similarity), as they appear in the "Alignments" section of the web result.

So working backwards...

The "Alignments" section in the on-line result is almost a direct reflection of the default NCBI BLAST+ or Legacy NCBI BLAST output with a few bits of HTML added. You can see the plain result which is produced by the program by using the "Formatting options", and disabling the HTML output. NCBI BLAST+ can produce a number of other formats, see "16. The BLAST Sequence Analysis Tool" in the "The NCBI Handbook" and "BLAST Command Line Applications User Manual" in "BLAST Help".
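
If you only need the statistics rather than the full alignment text, the tabular format is easier to post-process. The following is a minimal sketch, assuming the result was written with `-outfmt 6` and the default column order (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore). The function name and file path are hypothetical.

```shell
# Keep only hits at or below an E-value cutoff from BLAST+ tabular output.
# Assumes -outfmt 6 with the default columns, so column 11 is the E-value
# and column 12 is the bit score.
filter_hits_by_evalue() (
    results_file="$1"   # hypothetical path to the tabular BLAST result
    max_evalue="$2"     # cutoff, e.g. 1e-5
    awk -F '\t' -v max="$max_evalue" '
        BEGIN { OFS = "\t" }
        # Adding 0 forces a numeric (not string) comparison.
        ($11 + 0) <= (max + 0) { print $1, $2, $3, $11, $12 }
    ' "$results_file"
)
```

For example, after `blastp -query seq.tfa -db nr -outfmt 6 -out hits.tsv`, calling `filter_hits_by_evalue hits.tsv 1e-5` would print query ID, subject ID, %identity, E-value and bit score for the retained hits.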

On to the NCBI BLAST databases: these are available from the NCBI's FTP site, and descriptions of the available databases can be found in the "BLAST FTP Site" documentation and the "blastftp.txt" file on the FTP site. There are dependencies between some of the databases; for example, the 'swissprot' database requires the 'nr' database because it is implemented as a subset of 'nr' using BLAST mask files. So be sure to download all of the required files. The 'update_blastdb.pl' script mentioned by Michael can be used to mirror the NCBI-produced BLAST databases. While you can create these databases using makeblastdb and the FASTA format files provided by NCBI, the resulting databases will be missing additional information (e.g. taxonomy) present in the pre-formatted databases, since these are generated from the ASN.1, not the FASTA format.
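
As a rough sketch of the mirroring step, assuming `update_blastdb.pl` (shipped with BLAST+) and Perl are on your PATH: the wrapper name and the destination directory below are hypothetical, and the download for 'nr' is large.

```shell
# Mirror one pre-formatted NCBI BLAST database into a local directory.
# Assumes update_blastdb.pl from the BLAST+ distribution is on PATH.
fetch_blast_db() (
    db_name="$1"    # a name listed by: update_blastdb.pl --showall
    dest_dir="$2"   # hypothetical local directory for the database files
    # The script downloads into the current working directory.
    mkdir -p "$dest_dir" && cd "$dest_dir" || exit 1
    # --decompress unpacks each downloaded archive after fetching it.
    update_blastdb.pl --decompress "$db_name"
)
```

Running `update_blastdb.pl --showall` first lists the database names the script can fetch; `fetch_blast_db nr /data/blastdb` would then download all volumes of 'nr' into that directory.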

The command you quote will use a multiple sequence alignment (MSA) as the query for a PSI-BLAST search. From your description I suspect this is not what you intended. Instead I think you wanted to perform a blastp (protein sequence vs. protein sequence database) search, which would be something like:

blastp -query seq.tfa -db 'nr'
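
For this to find 'nr', BLAST+ needs to know where the database files live. One way is the BLASTDB environment variable, sketched below; the function name, directory and output file are hypothetical, and it assumes blastp is on your PATH.

```shell
# Run blastp against a locally mirrored copy of nr.
# Assumes db_dir holds the pre-formatted nr database files.
run_local_blastp() (
    query_file="$1"   # FASTA file with the query protein sequence(s)
    db_dir="$2"       # hypothetical directory containing the nr files
    out_file="$3"     # file the BLAST report is written to
    # BLAST+ looks in the directories listed in BLASTDB for the named database.
    BLASTDB="$db_dir" blastp -query "$query_file" -db nr -out "$out_file"
)
```

For example: `run_local_blastp seq.tfa /data/blastdb result.txt`.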


Depending on what you are doing you may find it more convenient to use the web services to access NCBI's BLAST services remotely rather than trying to maintain the BLAST databases locally. You can do this with the NCBI BLAST+ binaries by using the -remote option, for example:

blastp -query seq.tfa -db 'nr' -remote


To develop programs which use the NCBI BLAST services, see "BLAST Developer Information" for details of the available REST and SOAP APIs. These APIs are supported by many of the bioinformatics code libraries (e.g. BioPerl, BioPython, .NET Bio, etc.) so see their documentation for details. Depending on your database requirements you may also want to look at other organisations which provide NCBI BLAST based services, see BioCatalogue.org for a selection of web services providing NCBI BLAST searches. For example EMBL-EBI provide REST and SOAP APIs for their NCBI BLAST service (used by UniProt.org to power their BLAST search).

If you want to derive a multiple sequence alignment (MSA) from the NCBI BLAST blastp output, then you may want to look at tools such as DbClustal and MView (see http://www.ebi.ac.uk/Tools/msa/ for services and pointers to documentation and downloads).

• Update for additional information relating to the original question

Alternatively if you want to perform a multiple pairwise sequence alignment (multiple PSA) for a set of sequences you can use the pairwise sequence alignment (PSA) functionality in the NCBI BLAST+ programs (in "Legacy" NCBI BLAST this used the bl2seq program instead) thus:

blastp -query querySeqs.tfa -subject targetSeqs.tfa


This gives a set of pairwise alignments for each sequence in querySeqs.tfa vs. each sequence in targetSeqs.tfa. Note that when performing the alignments this way the statistics are based on the pair of sequences being aligned rather than on the query and database. For details of some other methods to consider using when performing multiple pairwise sequence alignments see this post.
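
If you want one summary line per aligned pair rather than the full alignment text, the same pairwise search can write tabular output. This is a minimal sketch, assuming blastp is on your PATH; the wrapper name and output file are hypothetical.

```shell
# Pairwise blastp of every query sequence against every subject sequence,
# written as tab-separated lines (one per reported alignment).
pairwise_blastp_table() (
    query_file="$1"     # FASTA file of query sequences
    subject_file="$2"   # FASTA file of subject sequences (no database needed)
    out_file="$3"       # hypothetical output path for the tabular result
    # -outfmt 6 selects the tabular format; E-value is column 11 by default.
    blastp -query "$query_file" -subject "$subject_file" -outfmt 6 -out "$out_file"
)
```

For example: `pairwise_blastp_table querySeqs.tfa targetSeqs.tfa pairs.tsv`. Pairs with no reported alignment simply do not appear in the output.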

0
9.7 years ago
Reyhaneh ▴ 520

I managed to get the answer:

After installing the BLAST+ package, you can perform sequence alignment with the following command:

blastp -query querySeq.txt -subject subjectSeq.txt -out output.txt


This command will align all the sequences in the subjectSeq.txt file against those in querySeq.txt.

0

It's always nice to find things out yourself. However, it's quite bold to validate your own answer, especially given that Hamish provided you with almost the exact command plus links to documentation, and that your question was so imprecise that it was almost impossible to figure out what you really wanted.

0

I just wanted to point out that it is not nice to try to give the impression you found this out yourself when you obviously didn't.

0

I do appreciate Hamish's help and yours. The reason I marked my own answer as correct was so that, in future, someone who only needs the command to perform MSA does not have to read the whole handbook and try out the commands. I do not earn any credit by making my answer the correct one. My only aim was to guide people in future to the exact command they need. As you can see, what Hamish has given me is a command for BLASTing and getting the MSA through that, but what I have provided here is to get SA for exactly the sequences you provide.

0

I don't think providing people with the direct answer is rude. This does not mean I do not respect others' help and answers, and I have not said that I invented the answer myself. I have been researching this for 2 days, and now that I have found the exact command I would like to share it with others.

0

You didn't get any credit, but neither did Hamish. I'd recommend accepting his answer and posting your specific resolution as a comment to his answer.

0

I wouldn't say it's rude or anything, but I would personally always credit the answer that was most helpful in getting there. It is OK to answer your own question, and often very helpful, but Hamish's answer is actually 'more correct' than yours because he managed to address all the aspects that come up while trying to guess what you really wanted to know. I don't think there is a lot of benefit for somebody searching for a similar question in finding that you had validated your own answer. But still, it is up to you; that's how the system works and it is OK.

0

Sorry, Michael, I think I agree with Reyhaneh. From his original post, he was looking for the -subject modifier and Hamish's answer only mentions -db.

0

It wasn't clear what kind of database the OP was looking for. Of course, if the question is unclear, only the OP can know the answer he/she is looking for. That is also why I don't like to spend too much effort on guesswork, because it rarely turns out to be correct and everybody feels misunderstood. Imho, it is the style of the question that often provokes answers that are not to the point.

0

In fact, when you take into account the question plus the OP's comment answering which db was meant, this answer is outright wrong. "Do you know which one is the default one used by pBlast people on the online version?" That, together with looking for "the database", tells me (without telepathy at least) that the OP is looking for the default database for an online query in NCBI blastp, which is possibly the NR database, and wants to download it to perform a local query. Nowhere was it mentioned that a custom database should be searched.

0

By the way this solution does not perform a multiple sequence alignment (MSA), but instead performs multiple pairwise alignments (multiple PSA). See my answer for methods to perform a MSA with a BLAST result or look at MSA tools (e.g. ClustalW, MUSCLE, T-Coffee, etc.) for a means to perform a de novo MSA. For a selection of MSA tools see the EMBL-EBI's MSA services (http://www.ebi.ac.uk/Tools/msa/).

0

I marked Hamish's reply as the answer to end this useless discussion. I still wish no points were earned by accepting your own answer, so we wouldn't have had this conversation!

3
9.7 years ago

Use the update_blastdb script in the BLAST+ distribution to download the databases you want; it can also list the available databases. The FASTA FTP directory is most likely not what you want to look at.

also if the following command is correct? psiblast -in_msa seq.txt -db target.fasta

This depends entirely on what is in these files. Read the manual for more information on the formats: http://www.ncbi.nlm.nih.gov/books/NBK1763/