Question: How to convert rpsblast output from XML format to CSV format?
2.5 years ago, casley_queiroz wrote:

Hi,

Does anyone have a script that converts rpsblast output from XML format to CSV format? This is a fragment of my XML result:

    <Hit>
      <Hit_num>1</Hit_num>
      <Hit_id>gnl|CDD|289286</Hit_id>
      <Hit_def>pfam12505, DUF3712, Protein of unknown function (DUF3712).  This domain family is found in eukaryotes, and is approximately 130 amino acids in length.</Hit_def>
      <Hit_accession>289286</Hit_accession>
      <Hit_len>124</Hit_len>
      <Hit_hsps>
        <Hsp>
          <Hsp_num>1</Hsp_num>
          <Hsp_bit-score>93.4135</Hsp_bit-score>
          <Hsp_score>233</Hsp_score>
          <Hsp_evalue>9.11946e-22</Hsp_evalue>
          <Hsp_query-from>846</Hsp_query-from>
          <Hsp_query-to>970</Hsp_query-to>
          <Hsp_hit-from>2</Hsp_hit-from>
          <Hsp_hit-to>122</Hsp_hit-to>
          <Hsp_query-frame>1</Hsp_query-frame>
          <Hsp_hit-frame>1</Hsp_hit-frame>
          <Hsp_identity>39</Hsp_identity>
          <Hsp_positive>60</Hsp_positive>
          <Hsp_gaps>4</Hsp_gaps>
          <Hsp_align-len>125</Hsp_align-len>
          <Hsp_qseq>PLGQIAMPNVSLAGDVGADLNIDAAFAVADVGHLTDFTTYLLTQPSFTWQIYGQNLAVSALGITVPGISILKNVVLDGMDGFKGLVKIESFDLPANDPAGGITLTLATSLTNPSSVGVALSQIGF</Hsp_qseq>
          <Hsp_hseq>PFATVPLPGIKAAGN-GTTLVVDQTLDITDVDAFTDFAKALVFSESFTLSVKGKT-DLKLGGLPFSGVTLDKTVTLKGLNNLKG-FSITDFDLP-LPPADGINLVATATIPNPSVLTIELGNVTL</Hsp_hseq>
          <Hsp_midline>P   + +P +  AG+ G  L +D    + DV   TDF   L+   SFT  + G+   +   G+   G+++ K V L G++  KG   I  FDLP   PA GI L    ++ NPS + + L  +  </Hsp_midline>

I would like a table in CSV in this form:

    query id,subject id,% identity,alignment length,mismatches,gap opens,q. start,q. end,s. start,s. end,evalue,bit score,subject description
    S89_g3,gnl|CDD|109488,43.59,39,22,0,247,285,6,44,3.98E-05,548,457,pfam00432: Prenyltrans: Prenyltransferase and squalene oxidase repeat.

That way I can work with the results in Excel.
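A minimal sketch of such a converter in Python, using only the standard library (`xml.etree.ElementTree` and `csv`). It assumes the usual NCBI BLAST XML layout (`Iteration` > `Iteration_hits` > `Hit` > `Hit_hsps` > `Hsp`) and uses the first word of `Iteration_query-def` as the query id. Mismatches are counted as aligned columns where neither residue is a gap and the residues differ, and gap opens as runs of `-` in either aligned sequence. These column definitions are assumptions modelled on BLAST's tabular output, not taken from any script linked in this thread, and the script name `blast_xml_to_csv.py` is hypothetical. Because `csv.writer` quotes fields that contain commas, a `Hit_def` such as "pfam12505, DUF3712, ..." stays in one column when the file is opened in Excel.

```python
import csv
import re
import sys
import xml.etree.ElementTree as ET

HEADER = [
    "query id", "subject id", "% identity", "alignment length",
    "mismatches", "gap opens", "q. start", "q. end", "s. start",
    "s. end", "evalue", "bit score", "subject description",
]


def count_gap_opens(seq):
    """Number of contiguous runs of '-' in one aligned sequence."""
    return len(re.findall(r"-+", seq))


def count_mismatches(qseq, hseq):
    """Aligned columns where neither side is a gap and the residues differ."""
    return sum(1 for q, h in zip(qseq, hseq) if q != h and "-" not in (q, h))


def blast_xml_to_rows(source):
    """Yield one row per HSP from a BLAST XML file (path or binary file object)."""
    root = ET.parse(source).getroot()
    for iteration in root.iter("Iteration"):
        # Assumption: the first word of the query definition is the query id.
        words = iteration.findtext("Iteration_query-def", default="").split()
        query_id = words[0] if words else ""
        for hit in iteration.iter("Hit"):
            hit_id = hit.findtext("Hit_id")
            hit_def = hit.findtext("Hit_def")  # stored on <Hit>, not on <Hsp>
            for hsp in hit.iter("Hsp"):
                qseq = hsp.findtext("Hsp_qseq", default="")
                hseq = hsp.findtext("Hsp_hseq", default="")
                identity = int(hsp.findtext("Hsp_identity"))
                align_len = int(hsp.findtext("Hsp_align-len"))
                yield [
                    query_id,
                    hit_id,
                    f"{100.0 * identity / align_len:.2f}",
                    align_len,
                    count_mismatches(qseq, hseq),
                    count_gap_opens(qseq) + count_gap_opens(hseq),
                    hsp.findtext("Hsp_query-from"),
                    hsp.findtext("Hsp_query-to"),
                    hsp.findtext("Hsp_hit-from"),
                    hsp.findtext("Hsp_hit-to"),
                    hsp.findtext("Hsp_evalue"),
                    hsp.findtext("Hsp_bit-score"),
                    hit_def,
                ]


def write_csv(rows, out):
    """Write the header plus rows in Excel-friendly CSV (commas are quoted)."""
    writer = csv.writer(out)
    writer.writerow(HEADER)
    writer.writerows(rows)


if __name__ == "__main__":
    # usage: python blast_xml_to_csv.py rpsblast.xml > rpsblast.csv
    write_csv(blast_xml_to_rows(sys.argv[1]), sys.stdout)
```

For the fragment in the question, % identity would come out as 100 × 39 / 125 = 31.20.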


Have you tried any of the standard BLAST XML-to-tabular conversion scripts? For example: https://github.com/peterjc/galaxy_blast/blob/master/tools/ncbi_blast_plus/blastxml_to_tabular.py

written 2.5 years ago by arnstrm

Yes, but I need CSV or XLS format. With Pierre's script I'm almost there.

written 2.5 years ago by casley_queiroz

Well, if you have a TSV file, you can open it directly in Excel. Converting from TSV to CSV is also simple with sed:

    sed 's/\t/,/g' file.tsv > file.csv

written 2.5 years ago by arnstrm
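One caveat with the sed approach: it turns every tab into a comma without quoting anything, so a description field that itself contains commas (the `Hit_def` in the question, "pfam12505, DUF3712, ...", is one) gets split across several columns in Excel. A minimal Python sketch that lets the standard `csv` module do the quoting instead; the function name `tsv_to_csv` is hypothetical, and it assumes the TSV contains no quoted fields of its own.

```python
import csv
import sys


def tsv_to_csv(tsv_in, csv_out):
    """Copy tab-separated rows to CSV, quoting any field that contains a comma."""
    # QUOTE_NONE: treat quote characters in the TSV as ordinary text.
    reader = csv.reader(tsv_in, delimiter="\t", quoting=csv.QUOTE_NONE)
    writer = csv.writer(csv_out)  # default "excel" dialect, minimal quoting
    writer.writerows(reader)


if __name__ == "__main__":
    # usage: python tsv_to_csv.py < file.tsv > file.csv
    tsv_to_csv(sys.stdin, sys.stdout)
```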

Actually, I have an rpsblast result in XML and I want it listed cleanly as a table in XLS, just like the example in my question. So I would like to convert the XML output to either TSV or CSV, which would make it easy for me to use in Excel. I have a script that works great with blastp output (from BLAST+), but it does not work with rpsblast output. I've tried to fix it, without success.

written 2.5 years ago by casley_queiroz

2.5 years ago, Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087) wrote:

I wrote blast2tsv, an XSLT stylesheet you can modify to get the columns you want: https://github.com/lindenb/xslt-sandbox/blob/master/stylesheets/bio/ncbi/blast2tsv.xsl

Usage:

    xsltproc --novalid blast2tsv.xsl blast.xml

Almost. It does not return the "Hit_def" column.

I made this change, from

    59 <xsl:value-of select="Hsp_qseq"/>
    60 <xsl:text> </xsl:text>

to

    59 <xsl:value-of select="Hit_def"/>
    60 <xsl:text> </xsl:text>

but the sequences still appear in the output.

Where am I going wrong?

written 2.5 years ago by casley_queiroz

Not sure; just delete lines 58 to 63?

written 2.5 years ago by Pierre Lindenbaum
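A likely reason the `select="Hit_def"` edit returns nothing: in the XML fragment from the question, `Hit_def` is a child of `Hit`, while `Hsp_qseq` is a child of `Hsp`, two levels further down (`Hit` > `Hit_hsps` > `Hsp`). If line 59 of the stylesheet is evaluated with an `Hsp` as the context node (which the working `select="Hsp_qseq"` suggests, though this is an assumption, not a check of the stylesheet), a relative `Hit_def` matches nothing there; in XPath the description would be reached with `../../Hit_def` or `ancestor::Hit/Hit_def`. This Python sketch reproduces the same relationship with `xml.etree.ElementTree` on an abbreviated copy of the fragment.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the <Hit> fragment from the question.
FRAGMENT = """<Hit>
  <Hit_id>gnl|CDD|289286</Hit_id>
  <Hit_def>pfam12505, DUF3712, Protein of unknown function (DUF3712).</Hit_def>
  <Hit_hsps>
    <Hsp>
      <Hsp_num>1</Hsp_num>
      <Hsp_evalue>9.11946e-22</Hsp_evalue>
    </Hsp>
  </Hit_hsps>
</Hit>"""

hit = ET.fromstring(FRAGMENT)
hsp = hit.find("Hit_hsps/Hsp")

# Relative lookup from the <Hsp> node, like select="Hit_def" on an Hsp: nothing found.
print(hsp.find("Hit_def"))  # None

# ElementTree keeps no parent pointers, so build a child -> parent map
# and walk up two levels, the equivalent of ../../Hit_def.
parent = {child: par for par in hit.iter() for child in par}
print(parent[parent[hsp]].findtext("Hit_def"))
```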

Hey Pierre, I sent you an email at plindenbaum@yahoo.fr about some errors I got. I sent it there because it includes some files, and I don't know how to attach them here.

written 2.5 years ago by casley_queiroz

2.5 years ago, casley_queiroz wrote:

I got it! I replaced blast2 with the latest version of NCBI BLAST+ and used this script to convert the XML to CSV:

https://github.com/Sunhh/NGS_data_processing/blob/master/annot_tools/blast_xml_parse.py

That was all it took!

Thanks for everything!

Powered by Biostar version 2.3.0