Convert Blast Output Into Blast-Xml
13.0 years ago
Fabsta ▴ 120

Hey!

Problem: I need to convert a standard Blast report into the blastxml format.

Background: For speed reasons, I am using UBlast (Usearch) instead of NCBI-Blast. UBlast outputs either a tab-delimited format compatible with the -m8 option of NCBI's Blast or a human-readable Blast report. Does anyone have an idea or a tool to convert a regular Blast report into the blastxml format? Here is an example of the tab-delimited output:

NP_417679.1     YDL171C 41.2    1576    837     27      13      1516    11      1568    0       1131.3

Any help is appreciated. Thanks a lot!

Cheers, Fabian

blast xml conversion • 9.5k views

Can you please include a sample of the output?

13.0 years ago

You'll be missing some parameters: BlastOutput_param, Hsp_qseq, Iteration_stat, etc. So you won't be able to use the generated XML with another tool that requires DTD validation. Generating XML from your text file is just like trying to make a cow from a steak.
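
To see the gap concretely, here is a small Python sketch that reports which of those elements never occur in a generated file. The CHECKED list holds only the elements named above plus the alignment strings Hsp_hseq and Hsp_midline; it is an illustrative list, not derived from the DTD, and missing_elements is a made-up helper name.

```python
import xml.etree.ElementTree as ET

# Elements to look for: an illustrative list, not derived from the BLAST DTD.
CHECKED = ["BlastOutput_param", "Iteration_stat",
           "Hsp_qseq", "Hsp_hseq", "Hsp_midline"]


def missing_elements(xml_text):
    """Return the CHECKED element names that never occur in xml_text."""
    present = {el.tag for el in ET.fromstring(xml_text).iter()}
    return [name for name in CHECKED if name not in present]


# A stripped-down document, like one rebuilt from a single tabular line.
sample = """<BlastOutput>
  <BlastOutput_iterations><Iteration><Iteration_hits><Hit><Hit_hsps>
    <Hsp><Hsp_evalue>0</Hsp_evalue></Hsp>
  </Hit_hsps></Hit></Iteration_hits></Iteration></BlastOutput_iterations>
</BlastOutput>"""

print(missing_elements(sample))
```

A tool that needs the aligned sequences or the search statistics will find nothing to read in such a file, whatever the rest of the XML looks like.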

That said, you could pipe your file through awk (or perl) to build an XML file. Here I'm just using an awk script on your single line, assuming its 12 columns follow the usual -m8 order: query id, subject id, % identity, alignment length, mismatches, gap openings, query start, query end, subject start, subject end, e-value, bit score. Anything the line doesn't provide is left as "?". For several Hits, or several Hsps per Hit, you'll have to modify this script.

{
# Header: the program and version strings are placeholders.
printf("<?xml version=\"1.0\"?>\n");
printf("<BlastOutput>\n");
printf("  <BlastOutput_program>blastn</BlastOutput_program>\n");
printf("  <BlastOutput_version>BLASTN 2.2.25+</BlastOutput_version>\n");
# Inner quotes are escaped; the a-umlaut is a numeric reference because no DTD declares &auml; here.
printf("  <BlastOutput_reference>Stephen F. Altschul, Thomas L. Madden, Alejandro A. Sch&#228;ffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), \"Gapped BLAST and PSI-BLAST: a new generation of protein database search programs\", Nucleic Acids Res. 25:3389-3402.</BlastOutput_reference>\n");
printf("  <BlastOutput_db>n/a</BlastOutput_db>\n");
printf("  <BlastOutput_query-ID>%s</BlastOutput_query-ID>\n",$1);
printf("<BlastOutput_iterations>\n");
printf("<Iteration>\n");
printf("  <Iteration_iter-num>1</Iteration_iter-num>\n");
# The query length is not part of the tabular line.
printf("  <Iteration_query-len>?</Iteration_query-len>\n");
printf("<Iteration_hits>\n");
printf("<Hit>\n");
printf("  <Hit_num>%d</Hit_num>\n",++hit_num);
printf("  <Hit_def>%s</Hit_def>\n",$2);
printf("  <Hit_len>?</Hit_len>\n");
printf("  <Hit_hsps>\n");
printf("    <Hsp>\n");
printf("      <Hsp_num>1</Hsp_num>\n");
# $12 = bit score, $11 = e-value, $7..$10 = coordinates, $4 = alignment length.
printf("      <Hsp_bit-score>%s</Hsp_bit-score>\n",$12);
printf("      <Hsp_score>?</Hsp_score>\n");
printf("      <Hsp_evalue>%s</Hsp_evalue>\n",$11);
printf("      <Hsp_query-from>%s</Hsp_query-from>\n",$7);
printf("      <Hsp_query-to>%s</Hsp_query-to>\n",$8);
printf("      <Hsp_hit-from>%s</Hsp_hit-from>\n",$9);
printf("      <Hsp_hit-to>%s</Hsp_hit-to>\n",$10);
printf("      <Hsp_query-frame>?</Hsp_query-frame>\n");
printf("      <Hsp_hit-frame>?</Hsp_hit-frame>\n");
printf("      <Hsp_identity>?</Hsp_identity>\n");
printf("      <Hsp_positive>?</Hsp_positive>\n");
printf("      <Hsp_gaps>?</Hsp_gaps>\n");
printf("      <Hsp_align-len>%s</Hsp_align-len>\n",$4);
printf("    </Hsp>\n");
printf("  </Hit_hsps>\n");
printf("</Hit>\n");
printf("</Iteration_hits>\n");
printf("</Iteration>\n");
printf("</BlastOutput_iterations>\n");
printf("</BlastOutput>\n");
}

Run it with:

awk -f file.awk blast.txt

Output:

<?xml version="1.0"?>
<BlastOutput>
  <BlastOutput_program>blastn</BlastOutput_program>
  <BlastOutput_version>BLASTN 2.2.25+</BlastOutput_version>
  <BlastOutput_reference>Stephen F. Altschul, Thomas L. Madden, Alejandro A. Sch&#228;ffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs", Nucleic Acids Res. 25:3389-3402.</BlastOutput_reference>
  <BlastOutput_db>n/a</BlastOutput_db>
  <BlastOutput_query-ID>NP_417679.1</BlastOutput_query-ID>
<BlastOutput_iterations>
<Iteration>
  <Iteration_iter-num>1</Iteration_iter-num>
  <Iteration_query-len>?</Iteration_query-len>
<Iteration_hits>
<Hit>
  <Hit_num>1</Hit_num>
  <Hit_def>YDL171C</Hit_def>
  <Hit_len>?</Hit_len>
  <Hit_hsps>
    <Hsp>
      <Hsp_num>1</Hsp_num>
      <Hsp_bit-score>1131.3</Hsp_bit-score>
      <Hsp_score>?</Hsp_score>
      <Hsp_evalue>0</Hsp_evalue>
      <Hsp_query-from>13</Hsp_query-from>
      <Hsp_query-to>1516</Hsp_query-to>
      <Hsp_hit-from>11</Hsp_hit-from>
      <Hsp_hit-to>1568</Hsp_hit-to>
      <Hsp_query-frame>?</Hsp_query-frame>
      <Hsp_hit-frame>?</Hsp_hit-frame>
      <Hsp_identity>?</Hsp_identity>
      <Hsp_positive>?</Hsp_positive>
      <Hsp_gaps>?</Hsp_gaps>
      <Hsp_align-len>1576</Hsp_align-len>
    </Hsp>
  </Hit_hsps>
</Hit>
</Iteration_hits>
</Iteration>
</BlastOutput_iterations>
</BlastOutput>
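
For input with more than one line, the same idea can be sketched in Python. This is only a sketch under a few assumptions: the twelve columns follow the usual -m8 order (query id, subject id, % identity, alignment length, mismatches, gap openings, query start, query end, subject start, subject end, e-value, bit score), rows for the same query and the same subject are adjacent, and the names COLUMNS, parse_m8, sub and m8_to_xml are made up for this example. Fields the tabular line doesn't carry (sequences, lengths, statistics) are simply left out, so the result still won't validate against the BLAST DTD.

```python
import xml.etree.ElementTree as ET
from itertools import groupby

# Usual -m8 column order (assumed to hold for the UBLAST output too).
COLUMNS = ("qid", "sid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore")


def parse_m8(lines):
    """Yield one dict per data line, skipping blank and '#' comment lines."""
    for line in lines:
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue
        if len(fields) != len(COLUMNS):
            raise ValueError(f"expected {len(COLUMNS)} columns: {line!r}")
        yield dict(zip(COLUMNS, fields))


def sub(parent, tag, text=None):
    """Append a child element, optionally with text, and return it."""
    child = ET.SubElement(parent, tag)
    if text is not None:
        child.text = str(text)
    return child


def m8_to_xml(lines, program="n/a", db="n/a"):
    """Build a partial BlastOutput tree: one Iteration per query,
    one Hit per subject, one Hsp per input line."""
    root = ET.Element("BlastOutput")
    sub(root, "BlastOutput_program", program)
    sub(root, "BlastOutput_db", db)
    iterations = sub(root, "BlastOutput_iterations")
    # groupby only merges adjacent rows, hence the adjacency assumption.
    by_query = groupby(parse_m8(lines), key=lambda r: r["qid"])
    for iter_num, (qid, q_rows) in enumerate(by_query, start=1):
        iteration = sub(iterations, "Iteration")
        sub(iteration, "Iteration_iter-num", iter_num)
        sub(iteration, "Iteration_query-def", qid)
        hits = sub(iteration, "Iteration_hits")
        by_hit = groupby(q_rows, key=lambda r: r["sid"])
        for hit_num, (sid, h_rows) in enumerate(by_hit, start=1):
            hit = sub(hits, "Hit")
            sub(hit, "Hit_num", hit_num)
            sub(hit, "Hit_def", sid)
            hsps = sub(hit, "Hit_hsps")
            for hsp_num, row in enumerate(h_rows, start=1):
                hsp = sub(hsps, "Hsp")
                sub(hsp, "Hsp_num", hsp_num)
                sub(hsp, "Hsp_bit-score", row["bitscore"])
                sub(hsp, "Hsp_evalue", row["evalue"])
                sub(hsp, "Hsp_query-from", row["qstart"])
                sub(hsp, "Hsp_query-to", row["qend"])
                sub(hsp, "Hsp_hit-from", row["sstart"])
                sub(hsp, "Hsp_hit-to", row["send"])
                sub(hsp, "Hsp_align-len", row["length"])
    return root


if __name__ == "__main__":
    # The single line from the question.
    sample = ["NP_417679.1\tYDL171C\t41.2\t1576\t837\t27\t13\t1516"
              "\t11\t1568\t0\t1131.3"]
    tree = m8_to_xml(sample)
    if hasattr(ET, "indent"):  # pretty-printing needs Python 3.9+
        ET.indent(tree)
    print(ET.tostring(tree, encoding="unicode"))
```

To convert a real file, pass the open file object instead of the sample list, e.g. m8_to_xml(open("blast.txt")). Building the tree with ElementTree rather than printing strings means special characters in the identifiers get escaped automatically.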

+1 for steak->cow analogy


@Casey, this analogy was inspired by Dorothea Salo's presentation.

12.8 years ago
Pedrofeijao ▴ 20

There is a Python script that converts text blast reports to blast XML at the Blast2GO page: http://www.blast2go.org/downloads


The link is no longer working.
