Question: Convert Blast Output Into Blast-Xml
Fabsta wrote, 7.9 years ago:

Hey!

Problem: I need to convert a standard Blast report into the blastxml format.

Background: For speed reasons, I am using UBlast (Usearch) instead of NCBI-Blast. UBlast outputs either a tab-delimited format compatible with the -m8 option of NCBI's Blast or a human-readable Blast report. Does anyone have an idea or a tool to convert a regular Blast report into the blastxml format? Here is an example of the tab-delimited output:

NP_417679.1     YDL171C 41.2    1576    837     27      13      1516    11      1568    0       1131.3

Any help is appreciated. Thanks a lot!

Cheers, Fabian

xml blast conversion • 7.1k views
modified 7.9 years ago by Pedrofeijao • written 7.9 years ago by Fabsta

Can you please include a sample of the output?

written 7.9 years ago by Pierre Lindenbaum
Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087) wrote, 7.9 years ago:

You'll be missing some parameters: BlastOutput_param, Hsp_qseq, Iteration_stat, etc. So you won't be able to use the generated XML with another tool that requires DTD validation. Generating an XML from your text file is just like trying to make a cow from a steak.

That said, one could pipe your file into awk (or perl) to build an XML file. Here I'm just using an awk script on your single line (and I don't know the meaning of your columns). For multiple Hits, or multiple Hsps per Hit, you'll have to modify this script.

# Assumes the 12 standard BLAST -m8 columns:
#   $1 query id, $2 subject id, $3 % identity, $4 alignment length,
#   $5 mismatches, $6 gap openings, $7 query start, $8 query end,
#   $9 subject start, $10 subject end, $11 e-value, $12 bit score.
# Program and version are placeholders; "?" marks values that the
# tabular output does not contain.
{
printf("<?xml version=\"1.0\"?>\n");
printf("<BlastOutput>\n");
printf("  <BlastOutput_program>blastn</BlastOutput_program>\n");
printf("  <BlastOutput_version>BLASTN 2.2.25+</BlastOutput_version>\n");
printf("  <BlastOutput_reference>Stephen F. Altschul, Thomas L. Madden, Alejandro A. Sch&#228;ffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), &quot;Gapped BLAST and PSI-BLAST: a new generation of protein database search programs&quot;, Nucleic Acids Res. 25:3389-3402.</BlastOutput_reference>\n");
printf("  <BlastOutput_db>n/a</BlastOutput_db>\n");
printf("  <BlastOutput_query-ID>%s</BlastOutput_query-ID>\n",$1);
printf("<BlastOutput_iterations>\n");
printf("<Iteration>\n");
printf("  <Iteration_iter-num>1</Iteration_iter-num>\n");
printf("  <Iteration_query-len>?</Iteration_query-len>\n");
printf("<Iteration_hits>\n");
printf("<Hit>\n");
printf("  <Hit_num>%d</Hit_num>\n",++hit_num);
printf("  <Hit_def>%s</Hit_def>\n",$2);
printf("  <Hit_len>?</Hit_len>\n");
printf("  <Hit_hsps>\n");
printf("    <Hsp>\n");
printf("      <Hsp_num>1</Hsp_num>\n");
printf("      <Hsp_bit-score>%s</Hsp_bit-score>\n",$12);
printf("      <Hsp_score>?</Hsp_score>\n");
printf("      <Hsp_evalue>%s</Hsp_evalue>\n",$11);
printf("      <Hsp_query-from>%s</Hsp_query-from>\n",$7);
printf("      <Hsp_query-to>%s</Hsp_query-to>\n",$8);
printf("      <Hsp_hit-from>%s</Hsp_hit-from>\n",$9);
printf("      <Hsp_hit-to>%s</Hsp_hit-to>\n",$10);
printf("      <Hsp_query-frame>???</Hsp_query-frame>\n");
printf("      <Hsp_hit-frame>??</Hsp_hit-frame>\n");
printf("      <Hsp_identity>??</Hsp_identity>\n");
printf("      <Hsp_positive>??</Hsp_positive>\n");
printf("      <Hsp_gaps>?</Hsp_gaps>\n");
printf("      <Hsp_align-len>%s</Hsp_align-len>\n",$4);
printf("    </Hsp>\n");
printf("  </Hit_hsps>\n");
printf("</Hit>\n");
printf("</Iteration_hits>\n");
printf("</Iteration>\n");
printf("</BlastOutput_iterations>\n");
printf("</BlastOutput>\n");
}

awk -f file.awk blast.txt

<?xml version="1.0"?>
<BlastOutput>
  <BlastOutput_program>blastn</BlastOutput_program>
  <BlastOutput_version>BLASTN 2.2.25+</BlastOutput_version>
  <BlastOutput_reference>Stephen F. Altschul, Thomas L. Madden, Alejandro A. Sch&#228;ffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), &quot;Gapped BLAST and PSI-BLAST: a new generation of protein database search programs&quot;, Nucleic Acids Res. 25:3389-3402.</BlastOutput_reference>
  <BlastOutput_db>n/a</BlastOutput_db>
  <BlastOutput_query-ID>NP_417679.1</BlastOutput_query-ID>
<BlastOutput_iterations>
<Iteration>
  <Iteration_iter-num>1</Iteration_iter-num>
  <Iteration_query-len>?</Iteration_query-len>
<Iteration_hits>
<Hit>
  <Hit_num>1</Hit_num>
  <Hit_def>YDL171C</Hit_def>
  <Hit_len>?</Hit_len>
  <Hit_hsps>
    <Hsp>
      <Hsp_num>1</Hsp_num>
      <Hsp_bit-score>1131.3</Hsp_bit-score>
      <Hsp_score>?</Hsp_score>
      <Hsp_evalue>0</Hsp_evalue>
      <Hsp_query-from>13</Hsp_query-from>
      <Hsp_query-to>1516</Hsp_query-to>
      <Hsp_hit-from>11</Hsp_hit-from>
      <Hsp_hit-to>1568</Hsp_hit-to>
      <Hsp_query-frame>???</Hsp_query-frame>
      <Hsp_hit-frame>??</Hsp_hit-frame>
      <Hsp_identity>??</Hsp_identity>
      <Hsp_positive>??</Hsp_positive>
      <Hsp_gaps>?</Hsp_gaps>
      <Hsp_align-len>1576</Hsp_align-len>
    </Hsp>
  </Hit_hsps>
</Hit>
</Iteration_hits>
</Iteration>
</BlastOutput_iterations>
</BlastOutput>
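
For a file with more than one line, the rows first have to be grouped by query and by subject so that each query becomes one Iteration and each subject one Hit. Below is a minimal Python sketch of that grouping, not a complete converter. It assumes the twelve standard -m8 columns; the names `m8_to_xml`, `M8_COLUMNS` and `HSP_FIELDS` are made up for this example, and the default program name `blastp` is only a placeholder. Elements that cannot be derived from the table (Hsp_qseq, Hit_len, Iteration_stat, ...) are simply left out, so the result would still not pass DTD validation.

```python
import xml.etree.ElementTree as ET

# Assumed layout: the 12 standard BLAST -m8 columns.
M8_COLUMNS = ["qid", "sid", "pident", "length", "mismatch", "gapopen",
              "qstart", "qend", "sstart", "send", "evalue", "bitscore"]

# XML element name -> -m8 column it can be filled from.
HSP_FIELDS = [("Hsp_bit-score", "bitscore"), ("Hsp_evalue", "evalue"),
              ("Hsp_query-from", "qstart"), ("Hsp_query-to", "qend"),
              ("Hsp_hit-from", "sstart"), ("Hsp_hit-to", "send"),
              ("Hsp_align-len", "length")]


def m8_to_xml(lines, program="blastp"):
    """Build a partial BlastOutput tree: one Iteration per query, one Hit per subject."""
    # Group rows as query -> subject -> list of HSP rows, keeping input order.
    queries = {}
    for line in lines:
        fields = line.split()
        if len(fields) != len(M8_COLUMNS):
            continue  # skip blank or malformed lines
        row = dict(zip(M8_COLUMNS, fields))
        queries.setdefault(row["qid"], {}).setdefault(row["sid"], []).append(row)

    root = ET.Element("BlastOutput")
    ET.SubElement(root, "BlastOutput_program").text = program  # placeholder value
    iterations = ET.SubElement(root, "BlastOutput_iterations")
    for iter_num, (qid, subjects) in enumerate(queries.items(), start=1):
        iteration = ET.SubElement(iterations, "Iteration")
        ET.SubElement(iteration, "Iteration_iter-num").text = str(iter_num)
        ET.SubElement(iteration, "Iteration_query-def").text = qid
        hits = ET.SubElement(iteration, "Iteration_hits")
        for hit_num, (sid, rows) in enumerate(subjects.items(), start=1):
            hit = ET.SubElement(hits, "Hit")
            ET.SubElement(hit, "Hit_num").text = str(hit_num)
            ET.SubElement(hit, "Hit_def").text = sid
            hsps = ET.SubElement(hit, "Hit_hsps")
            for hsp_num, row in enumerate(rows, start=1):
                hsp = ET.SubElement(hsps, "Hsp")
                ET.SubElement(hsp, "Hsp_num").text = str(hsp_num)
                for tag, column in HSP_FIELDS:
                    ET.SubElement(hsp, tag).text = row[column]
    return root


if __name__ == "__main__":
    sample = ["NP_417679.1 YDL171C 41.2 1576 837 27 13 1516 11 1568 0 1131.3"]
    print(ET.tostring(m8_to_xml(sample), encoding="unicode"))
```

To run it on a real file, pass the open file object instead of the sample list, e.g. `m8_to_xml(open("blast.txt"))`.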
written 7.9 years ago by Pierre Lindenbaum

+1 for steak->cow analogy

written 7.9 years ago by Casey Bergman

@Casey, this analogy was inspired by Dorothea Salo's presentation: http://www.slideshare.net/cavlec/save-the-cows-data-curation-for-the-rest-of-us-1533252

written 7.9 years ago by Pierre Lindenbaum
Pedrofeijao wrote, 7.8 years ago:

There is a Python script that converts text blast reports to blast XML at the Blast2GO page: http://www.blast2go.org/downloads

written 7.8 years ago by Pedrofeijao

The link is not working now.

written 7.1 years ago by Sequer