How To Produce An Analysis Report From Blast Output?
pmuench, 8.9 years ago

Hello, is there a tool for processing BLAST output and performing a statistical analysis of the alignments, for example a plot of the positions of the mismatches? I can't believe there is no existing tool. Here is an example of my BLAST output:

>ref|NC_018914.1| Homo sapiens chromosome 3, alternate assembly CHM1_1.0, whole
genome shotgun sequence
gb|CM001611.1| Homo sapiens cell-line CHM1htert chromosome 3, whole genome shotgun
sequence
Length=197815086

Score = 54.0 bits (27),  Expect = 3e-05
Identities = 33/35 (94%), Gaps = 0/35 (0%)
Strand=Plus/Minus

Query  1          CTAGAATGCACACTCCTACCTCCTTTACCAAACGT  35
                  |||||||||||| |||||| |||||||||||||||
Sbjct  169676496  CTAGAATGCACATTCCTACTTCCTTTACCAAACGT  169676462
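The pairwise layout above is regular enough to parse with a few lines of code. Below is a minimal sketch in Python (standard library only); `parse_hsp` and `mismatches` are hypothetical names, not part of any existing tool. It assumes the text holds a single HSP from blastn standard output (`-outfmt 0`), so that every aligned column advances a coordinate by one. It compares the Query and Sbjct strings directly instead of reading the midline, and it lets the subject coordinates count down on a Minus strand, as in the example:

```python
import re

# One alignment row: label, start coordinate, aligned residues, end coordinate.
ROW = re.compile(r"^(Query|Sbjct)\s+(\d+)\s+(\S+)\s+(\d+)$")


def parse_hsp(text):
    """Join the Query/Sbjct rows of one HSP (possibly wrapped over several
    blocks) into two aligned strings plus their start/end coordinates."""
    seqs = {"Query": [], "Sbjct": []}
    spans = {"Query": [], "Sbjct": []}
    for line in text.splitlines():
        m = ROW.match(line.strip())
        if m:  # midline and header lines do not match and are skipped
            label, start, residues, end = m.groups()
            seqs[label].append(residues)
            spans[label].append((int(start), int(end)))
    q, s = "".join(seqs["Query"]), "".join(seqs["Sbjct"])
    q_start, q_end = spans["Query"][0][0], spans["Query"][-1][1]
    s_start, s_end = spans["Sbjct"][0][0], spans["Sbjct"][-1][1]
    return q, s, q_start, q_end, s_start, s_end


def mismatches(text):
    """Return (query_pos, sbjct_pos, query_base, sbjct_base) for each mismatch.
    Assumes nucleotide-vs-nucleotide output (one coordinate per column)."""
    q, s, q_pos, q_end, s_pos, s_end = parse_hsp(text)
    q_step = 1 if q_end >= q_pos else -1  # coordinates count down on a Minus strand
    s_step = 1 if s_end >= s_pos else -1
    found = []
    for qc, sc in zip(q, s):
        if qc != "-" and sc != "-" and qc.upper() != sc.upper():
            found.append((q_pos, s_pos, qc, sc))
        if qc != "-":  # a gap does not consume a coordinate on its own sequence
            q_pos += q_step
        if sc != "-":
            s_pos += s_step
    return found


hsp = """Query  1          CTAGAATGCACACTCCTACCTCCTTTACCAAACGT  35
                  |||||||||||| |||||| |||||||||||||||
Sbjct  169676496  CTAGAATGCACATTCCTACTTCCTTTACCAAACGT  169676462"""
print(mismatches(hsp))
# [(13, 169676484, 'C', 'T'), (20, 169676477, 'C', 'T')]
```

Gap columns are skipped without advancing the coordinate of the gapped sequence, so the reported positions stay correct for gapped alignments too.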

Sorry for the late answer; I just saw that this was bumped to the top automatically. Are you still with us?

One could also say that the standard output already is the perfect report: it provides all the required information, including the E-value (the statistical assessment of each hit).

User, 8.9 years ago

The post was deleted.

Thank you, but I'm interested in the positions of mismatches in the reads, and I can't find this information in the output.
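Once the aligned strings of each hit are available, the "plot of mismatch positions" asked for here boils down to counting mismatches per read position over all hits. A minimal Python sketch, assuming the reads align on the query Plus strand (as in blastn pairwise output); `mismatch_profile` and `text_histogram` are hypothetical names, and the three alignments are toy data made up for illustration. It draws a crude text histogram, which you could replace with any plotting library:

```python
from collections import Counter


def mismatch_profile(alignments):
    """Count mismatches per query (read) position over many alignments.

    alignments: iterable of (query_aln, sbjct_aln, query_start) tuples holding
    the aligned strings of each hit; the query is assumed to be on the Plus strand.
    """
    counts = Counter()
    for q_aln, s_aln, q_pos in alignments:
        for qc, sc in zip(q_aln, s_aln):
            if qc != "-" and sc != "-" and qc.upper() != sc.upper():
                counts[q_pos] += 1
            if qc != "-":  # a gap in the query does not consume a read position
                q_pos += 1
    return counts


def text_histogram(counts, width=40):
    """Draw the profile as a crude text bar chart, one row per read position."""
    if not counts:
        return ""
    top = max(counts.values())
    rows = []
    for pos in range(min(counts), max(counts) + 1):
        n = counts.get(pos, 0)
        rows.append(f"{pos:>5} {'#' * round(width * n / top)} {n}")
    return "\n".join(rows)


# Toy alignments (made up for illustration): (query_aln, sbjct_aln, query_start)
reads = [
    ("CTAGAATGCACACTCC", "CTAGAATGCACATTCC", 1),
    ("ACGT", "ACCT", 10),
    ("AC", "AT", 12),
]
print(text_histogram(mismatch_profile(reads), width=4))
#    12 ## 1
#    13 #### 2
```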

Unfortunately, this answer is incorrect (even though it received the most votes), given that the task is to identify the locations of indels and mismatches. The default tabular output format does not convey this information; it only reports the number of mismatches and gap openings per hit. Sorry, I missed this at first.

8.9 years ago

Depending on the language you are working in, you might search your favorite search engine for "[language] blast parser".

I think this is a bit too general to be helpful.

As a general matter, narrowing down the scope of a question allows more focused answers. Cheers.

qiyunzhu, 8.9 years ago

I agree with Gaik Tamazian that you probably need tabular output, although other formats can be parsed too, with a few more text-manipulation tricks. I used to write Perl scripts that retrieved the sequence alignment directly instead of the tabular output. Here are some tricks I can share:

In addition to standalone BLAST (as Gaik Tamazian showed you), I suggest querying NCBI directly over HTTP, so that you don't need to download the whole of GenBank to your computer. Simply open these web addresses in your browser, or request them programmatically from any language:

Blast a sequence:

http://www.ncbi.nlm.nih.gov/blast/Blast.cgi?CMD=Put&PROGRAM=blastp&DATABASE=nr&QUERY=your_query_sequence_or_identifier&EXPECT=1e-5

Get tabular output:

http://www.ncbi.nlm.nih.gov/blast/Blast.cgi?CMD=Get&RID=RID_of_this_search&ALIGNMENT_VIEW=Tabular&FORMAT_TYPE=Text

Get a multiple (not pairwise) sequence alignment:

http://www.ncbi.nlm.nih.gov/blast/Blast.cgi?CMD=Get&RID=RID_of_this_search&ALIGNMENT_VIEW=FlatQueryAnchoredNoIdentities&FORMAT_TYPE=Text

Once the search has finished, the results are displayed directly in your browser.
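Building those addresses by hand is error-prone because the query text has to be URL-encoded. Here is a minimal Python sketch that uses only `urllib.parse.urlencode` and the parameters from the addresses above; `put_url`, `get_url` and `extract_rid` are hypothetical names. It only builds the URLs and pulls the request ID out of a Put response, assuming that response contains a line of the form `RID = ...`. NCBI has since moved its services to HTTPS, so check the current endpoint in NCBI's BLAST URL API documentation before relying on the base URL from this post:

```python
import re
from urllib.parse import urlencode

# Base URL as given in this post; NCBI now serves BLAST over HTTPS, so check
# the current endpoint in NCBI's URL API documentation before relying on it.
BASE = "http://www.ncbi.nlm.nih.gov/blast/Blast.cgi"


def put_url(query, program="blastp", database="nr", expect="1e-5"):
    """URL that submits a search (CMD=Put); NCBI answers with a request ID (RID)."""
    params = {"CMD": "Put", "PROGRAM": program, "DATABASE": database,
              "QUERY": query, "EXPECT": expect}
    return BASE + "?" + urlencode(params)  # urlencode escapes the query text


def get_url(rid, alignment_view="Tabular"):
    """URL that fetches the results of a finished search (CMD=Get) as plain text."""
    params = {"CMD": "Get", "RID": rid,
              "ALIGNMENT_VIEW": alignment_view, "FORMAT_TYPE": "Text"}
    return BASE + "?" + urlencode(params)


def extract_rid(put_response):
    """Return the RID from a Put response, assuming it has a 'RID = ...' line."""
    m = re.search(r"RID = (\S+)", put_response)
    return m.group(1) if m else None


print(put_url("MKTAYIAKQR"))  # placeholder protein query
print(get_url("RID_of_this_search", "FlatQueryAnchoredNoIdentities"))
```

Fetch `put_url(...)` with any HTTP client, pass the response text to `extract_rid`, and request `get_url(rid)` once the search has finished; before that, the Get request will not return the finished results.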

This answer is incorrect given the requirement to retrieve detailed alignment information, and it addresses a different topic (remote BLAST) from the one described in the question.

8.1 years ago

BioPerl is the tool of choice when it comes to more refined dissection of BLAST results. See http://www.bioperl.org/wiki/HOWTO:SearchIO#seq_inds.28.29 for a code example; it shows exactly what you want to do, using the seq_inds() method of an HSP object.

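Ignore other answers pointing to tabular output formats; they are misleading. While the tabular format is easier to parse and useful under certain conditions, the locations of mismatches and indels cannot be derived from its default columns (format 6), which only give counts. They can be derived from the standard format (0), XML, or ASN.1, which contain the aligned sequences and the homology string (or from BLAST+ tabular output if you explicitly request the aligned sequences as custom columns, e.g. qseq and sseq).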
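For the indel half of the question, here is a minimal Python sketch that works on the aligned query and subject strings as found in the standard format (0) or XML output. It is similar in spirit to the BioPerl seq_inds() example, not a port of it; `gap_runs` is a hypothetical name, and both sequences are assumed to be on the Plus strand. Consecutive gap columns are collapsed into one event, which is also why a bare count of gap openings cannot tell you where the indels are:

```python
def gap_runs(q_aln, s_aln, q_start=1, s_start=1):
    """Find runs of gap columns in a pairwise alignment.

    Returns (gapped_seq, q_pos, s_pos, length) tuples, where gapped_seq is
    "query" or "sbjct" (the line that shows the dashes) and q_pos / s_pos are
    the next residue numbers on each sequence when the run begins.
    Assumes both sequences are on the Plus strand.
    """
    runs = []
    q_pos, s_pos = q_start, s_start
    current = None  # the open run: [gapped_seq, q_pos, s_pos, length]
    for qc, sc in zip(q_aln, s_aln):
        kind = "query" if qc == "-" else ("sbjct" if sc == "-" else None)
        if kind is None:
            current = None  # an aligned column closes any open run
        elif current is not None and current[0] == kind:
            current[3] += 1  # extend the run on the same sequence
        else:
            current = [kind, q_pos, s_pos, 1]
            runs.append(current)
        if qc != "-":
            q_pos += 1
        if sc != "-":
            s_pos += 1
    return [tuple(r) for r in runs]


print(gap_runs("ACG--TACGT", "ACGAATA-GT", 1, 100))
# [('query', 4, 103, 2), ('sbjct', 6, 107, 1)]
```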