Question: How To Produce An Analysis Report From BLAST Output?
Votes: 0
6.7 years ago, pmuench wrote:

Hello, is there a tool for processing BLAST output and performing a statistical analysis of the alignment, for example a plot of the positions of the mismatches? I can't believe that there is no existing tool. Here is an example of my BLAST output:

>ref|NC_018914.1| Homo sapiens chromosome 3, alternate assembly CHM1_1.0, whole 
genome shotgun sequence
 gb|CM001611.1| Homo sapiens cell-line CHM1htert chromosome 3, whole genome shotgun 
sequence
Length=197815086

 Score = 54.0 bits (27),  Expect = 3e-05
 Identities = 33/35 (94%), Gaps = 0/35 (0%)
 Strand=Plus/Minus

Query  1          CTAGAATGCACACTCCTACCTCCTTTACCAAACGT  35
                  |||||||||||| |||||| |||||||||||||||
Sbjct  169676496  CTAGAATGCACATTCCTACTTCCTTTACCAAACGT  169676462
modified 5.7 years ago by Michael Dondrup • written 6.7 years ago by pmuench

Sorry for the late answer; I just saw that this was moved to the top automatically. Are you still with us?

written 5.9 years ago by Michael Dondrup

One could also say that the standard output is already the perfect report: it provides all the required information, including the E-value (the statistical analysis).

modified 5.7 years ago • written 5.7 years ago by Michael Dondrup
Votes: 4
6.7 years ago, Gaik Tamazian wrote:

First, I advise you to use tabular output instead of the pairwise format, which is the default. You can select it with the option -outfmt 6 if you use blastn from the NCBI BLAST+ package. A line from the tabular output looks like this:

ENSCAFG00000000367    jcf7180003228068    80.36    56    6    5    34    85    33250628    33250574    0.36    37.4

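The twelve default columns are: query ID, subject ID, percent identity, alignment length, number of mismatches, number of gap openings, query start, query end, subject start, subject end, E-value, and bit score. The output can then be parsed as an ordinary tab-delimited file. For example, if you use R for statistical analysis: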

# Read tab-delimited BLAST output (-outfmt 6); the file has no header row
x <- read.table('blast.out', header = FALSE, sep = '\t')
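For readers who prefer a script over R, the same parsing step can be sketched in Python. This is only a minimal sketch: it assumes the file was written with plain -outfmt 6 (the BLAST+ default twelve columns, whose names are not spelled out in the post), and `parse_tabular` is a hypothetical helper name. Like the tabular format itself, it gives the number of mismatches per hit, not their positions.

```python
import csv
import io

# Assumed: default BLAST+ "-outfmt 6" column order (12 tab-separated fields).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]
INT_FIELDS = ("length", "mismatch", "gapopen", "qstart", "qend", "sstart", "send")
FLOAT_FIELDS = ("pident", "evalue", "bitscore")


def parse_tabular(handle):
    """Yield one dict per hit line, with numeric fields converted."""
    for row in csv.reader(handle, delimiter="\t"):
        # Skip blank lines and comment lines (as written by -outfmt 7).
        if not row or row[0].startswith("#"):
            continue
        hit = dict(zip(COLUMNS, row))
        for key in INT_FIELDS:
            hit[key] = int(hit[key])
        for key in FLOAT_FIELDS:
            hit[key] = float(hit[key])
        yield hit


# The example line from the answer above, with tabs restored.
example = ("ENSCAFG00000000367\tjcf7180003228068\t80.36\t56\t6\t5\t34\t85\t"
           "33250628\t33250574\t0.36\t37.4\n")
hits = list(parse_tabular(io.StringIO(example)))
print(hits[0]["qseqid"], hits[0]["mismatch"], hits[0]["evalue"])
```

To read a real file, pass an open file handle instead of the `io.StringIO` object, for example `with open('blast.out') as fh: hits = list(parse_tabular(fh))`.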

modified 6.7 years ago by Neilfws • written 6.7 years ago by Gaik Tamazian

Thank you, but I'm interested in the positions of the mismatches in the reads. I can't find this information in the output.

written 6.7 years ago by pmuench

Unfortunately, this answer is incorrect (even though it received the most votes) given the task of identifying the locations of indels and mismatches. The tabular output format does not convey this information. Sorry, I missed this in the first place.

modified 5.9 years ago • written 5.9 years ago by Michael Dondrup
Votes: 1
6.7 years ago, Alex Reynolds wrote:

Depending on the language you are working in, you might search for "[language] blast parser" in your favorite search engine.

written 6.7 years ago by Alex Reynolds

I think this is a bit too general to be helpful.

written 5.9 years ago by Michael Dondrup

As a general matter, when the person asking a question narrows down the scope of the question, that generally allows more focused answers. Cheers.

written 5.7 years ago by Alex Reynolds
Votes: 1
6.7 years ago, qiyunzhu wrote:

I agree with Gaik Tamazian that you probably need tabular output, although other formats can be parsed too, with a few more text-manipulation tricks. I once wrote a Perl script to retrieve the sequence alignment directly instead of the tabular output. Here are some tricks I can share with you:

In addition to standalone BLAST (as Gaik Tamazian showed you), I suggest querying NCBI directly over HTTP, so that you don't need to download the whole of GenBank to your computer. Enter these web addresses in your browser manually, or request them automatically from any programming language:

Blast a sequence:

http://www.ncbi.nlm.nih.gov/blast/Blast.cgi?CMD=Put&PROGRAM=blastp&DATABASE=nr&QUERY=**your_query_sequence_or_identifier**&EXPECT=1e-5

Get tabular output:

http://www.ncbi.nlm.nih.gov/blast/Blast.cgi?CMD=Get&RID=**RID_of_this_search**&ALIGNMENT_VIEW=Tabular&FORMAT_TYPE=Text

Get multiple (not pairwise) sequence alignment:

http://www.ncbi.nlm.nih.gov/blast/Blast.cgi?CMD=Get&RID=**RID_of_this_search**&ALIGNMENT_VIEW=FlatQueryAnchoredNoIdentities&FORMAT_TYPE=Text

You will see your results once the search has finished.
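The three addresses above differ only in their query parameters, so they can be assembled programmatically. The following is a minimal sketch that only builds the URL strings and sends no request. The endpoint and parameter names are copied from this post and may have changed since it was written; `put_url`, `get_url`, the peptide `MKTAYIAKQR`, and `EXAMPLE_RID` are hypothetical placeholders.

```python
from urllib.parse import urlencode

# Endpoint as given in the post above; it may have changed since then.
BASE_URL = "http://www.ncbi.nlm.nih.gov/blast/Blast.cgi"


def put_url(query, program="blastp", database="nr", expect="1e-5"):
    """Build the URL that submits a search (CMD=Put)."""
    params = {"CMD": "Put", "PROGRAM": program, "DATABASE": database,
              "QUERY": query, "EXPECT": expect}
    return BASE_URL + "?" + urlencode(params)


def get_url(rid, alignment_view="Tabular"):
    """Build the URL that retrieves results for a request ID (CMD=Get)."""
    params = {"CMD": "Get", "RID": rid,
              "ALIGNMENT_VIEW": alignment_view, "FORMAT_TYPE": "Text"}
    return BASE_URL + "?" + urlencode(params)


# Placeholder query sequence and request ID, for illustration only.
print(put_url("MKTAYIAKQR"))
print(get_url("EXAMPLE_RID"))
print(get_url("EXAMPLE_RID", alignment_view="FlatQueryAnchoredNoIdentities"))
```

Using `urlencode` rather than string concatenation ensures that characters in the query that are not URL-safe are escaped.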

modified 6.7 years ago • written 6.7 years ago by qiyunzhu

This answer is incorrect given the requirement to retrieve detailed alignment information, and it addresses a different topic (remote BLAST) than the one described in the question.

modified 5.9 years ago • written 5.9 years ago by Michael Dondrup
Votes: 1
5.9 years ago, Michael Dondrup wrote:

BioPerl is the tool of choice when it comes to a more refined dissection of BLAST results. See http://www.bioperl.org/wiki/HOWTO:SearchIO#seq_inds.28.29 for a code example. This example shows exactly what you want to do, using the method seq_inds() of an HSP object.

Ignore the other answers pointing to tabular output formats; they are misleading. While the tabular format is easier to parse and useful under certain conditions, the locations of mismatches and indels cannot be derived from the default tabular BLAST format (6), but only from the standard format (0), XML, or ASN.1, which contain the homology string.
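To illustrate why the aligned strings are needed, here is a minimal plain-Python sketch (not the BioPerl seq_inds() method) that walks the query and subject lines of one HSP and reports where they differ. It assumes the two aligned strings and the query start coordinate have already been extracted from the standard pairwise output; `alignment_differences` is a hypothetical helper name, and the input is the alignment from the question.

```python
from collections import Counter


def alignment_differences(query, sbjct, query_start=1):
    """Return (query_position, query_char, sbjct_char) for each differing column.

    A gap in the query ('-') is reported at the position of the last
    query residue before the gap, since it consumes no query residue.
    """
    if len(query) != len(sbjct):
        raise ValueError("aligned strings must have equal length")
    differences = []
    pos = query_start - 1  # position of the last query residue consumed
    for q, s in zip(query, sbjct):
        if q != "-":
            pos += 1
        if q.upper() != s.upper():
            differences.append((pos, q, s))
    return differences


# Aligned lines from the HSP shown in the question (Query 1..35).
query = "CTAGAATGCACACTCCTACCTCCTTTACCAAACGT"
sbjct = "CTAGAATGCACATTCCTACTTCCTTTACCAAACGT"
differences = alignment_differences(query, sbjct, query_start=1)
print(differences)  # [(13, 'C', 'T'), (20, 'C', 'T')]

# Tally positions; with many HSPs this gives the counts for a position plot.
position_counts = Counter(pos for pos, _, _ in differences)
print(sorted(position_counts.items()))
```

The two reported positions correspond to the two blanks in the homology string of the question's alignment (33/35 identities). Collecting `position_counts` over all reads would give the data for the mismatch-position plot asked about in the question.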

written 5.9 years ago by Michael Dondrup