Question: How to simply filter an XML file?
23 months ago by
vmicrobio230
France
vmicrobio230 wrote:

Hi all!

I would like to filter my BLAST XML file by removing every <Hit> ... </Hit> element whose <Hit_def> does not contain 'Homo sapiens'. Do you have any idea how to do this simply?

    
    <?xml version="1.0"?>
    <!DOCTYPE BlastOutput PUBLIC "-//NCBI//NCBI BlastOutput/EN" "http://www.ncbi.nlm.nih.gov/dtd/NCBI_BlastOutput.dtd">
    <BlastOutput>
      <BlastOutput_program>blastn</BlastOutput_program>
      <BlastOutput_version>BLASTN 2.2.30+</BlastOutput_version>
      <BlastOutput_reference>Stephen F. Altschul, Thomas L. Madden, Alejandro A. Schäffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs", Nucleic Acids Res. 25:3389-3402.</BlastOutput_reference>
      <BlastOutput_db>nt</BlastOutput_db>
      <BlastOutput_query-ID>Query_1</BlastOutput_query-ID>
      <BlastOutput_query-def>R1_mid6_filt_denovo_18-04-16_c1    cov=12.91 len=1158 gc=29.46 nseq=105</BlastOutput_query-def>
      <BlastOutput_query-len>1158</BlastOutput_query-len>
      <BlastOutput_param>
        <Parameters>
          <Parameters_expect>10</Parameters_expect>
          <Parameters_sc-match>2</Parameters_sc-match>
          <Parameters_sc-mismatch>-3</Parameters_sc-mismatch>
          <Parameters_gap-open>5</Parameters_gap-open>
          <Parameters_gap-extend>2</Parameters_gap-extend>
          <Parameters_filter>L;m;</Parameters_filter>
        </Parameters>
      </BlastOutput_param>
    <BlastOutput_iterations>
    <Iteration>
      <Iteration_iter-num>1</Iteration_iter-num>
      <Iteration_query-ID>Query_1</Iteration_query-ID>
      <Iteration_query-def>R1_mid6_filt_denovo_18-04-16_c1    cov=12.91 len=1158 gc=29.46 nseq=105</Iteration_query-def>
      <Iteration_query-len>1158</Iteration_query-len>
    <Iteration_hits>
    <Hit>
      <Hit_num>1</Hit_num>
      <Hit_id>gi|18642927|gb|AC105445.3|</Hit_id>
      <Hit_def>Homo sapiens BAC clone RP11-350B19 from 4, complete sequence</Hit_def>
      <Hit_accession>AC105445</Hit_accession>
      <Hit_len>128907</Hit_len>
      <Hit_hsps>
        <Hsp>
          <Hsp_num>1</Hsp_num>
          <Hsp_bit-score>1925.48</Hsp_bit-score>
          <Hsp_score>2134</Hsp_score>
          <Hsp_evalue>0</Hsp_evalue>
          <Hsp_query-from>61</Hsp_query-from>
          <Hsp_query-to>1144</Hsp_query-to>
          <Hsp_hit-from>62962</Hsp_hit-from>
          <Hsp_hit-to>61873</Hsp_hit-to>
          <Hsp_query-frame>1</Hsp_query-frame>
          <Hsp_hit-frame>-1</Hsp_hit-frame>
          <Hsp_identity>1080</Hsp_identity>
          <Hsp_positive>1080</Hsp_positive>
          <Hsp_gaps>6</Hsp_gaps>
          <Hsp_align-len>1090</Hsp_align-len>
          <Hsp_qseq>GAGATKWCAAGTAATAACGTAATAATGTCTTTTTGTGATAAGAAATGCTTAGTTGAAGGTATGTGATTATTAGTCATTAAATCGCTTTGCATGTGCTTTGGTTCTAATTTACCTTATCTCATAAGAACATGTAATAAATTATACAGGGAATTTCTGTTAAAAAATAATCCCACAGTTGTATGAGATTGGAGGAGACTTTTAGATGGTAGGTCAGTGGTCTTGCATTTAGTTGAGCAAAAGCAATTTTGCTTCCATTTCAGGACCGCATTTTGCTCCTCATTATAGTAAGTAATGTAGCACTTTTCTGACTTCTATTTTAACATTAGAATTGGGATTACTATCTCATTAATTTTCAAAGTCTCTGCAAGAAAGTCAGTATAATCATCTTTAGTAATGAGGCACCTGTATGGTAAAAAGTCTTAGTAATTTGTCCATTGTTTCAAATCAAAGAGAGAAGTGGAGGCTTTTAAATTCTAGCAAGGTGTTTATGTTATTAATATTTTCACTTTACAACTATTACAAATTAATAATTTTTTTCTTCTTGGAAACCTAGTATAAATATATTTGTAGGTCATAAATAAAAAATGAGAAATCAACTAAAAATGTTACCTTGTTCAGACTCCATTCGCTAAATTTTTCTCTCATTTTCATTGCAGGACATTTGATTATGACTGATCATGCTATTTGTCAGTATGTACATATGTGTGTACGTATGTATGTATCAATCAACCAATTATTATAGTAGCAATATATTAATTTCATATTACCCAAAATTATAAGCCTGAATGTGTTGAAAACTTGAATTTAAAATTACATATTTCTGCAAAACTTTTTATTTCTTTTTGCCTTTTCCAAAAGCAAACACTGTTTTTGGCTTCTTTCTCTTTGCTTACTTCCATATTTCAAGTCATCCTGAAATGATTCCATGCTGGAATTTTCAAAATAATTTCATGTTGAAATTTCTGCCTTAATATCTCTTGTATAAAACTACCTCCTGTCCTAATGTATCATGTCAAAAAAAAAA------AATGAGGTTTCAGCTTTTCCCTTCACAAACTGTGTTTTCCTTTCATATGCAGAAATATGT</Hsp_qseq>
          <Hsp_hseq>GAGATTACAAGTAATAACGTAATAATGTCTTTTTGTGATAAGAAATGCTTAGTTGAAGGTATGTGATTATTAGTCATTAAATCGCTTTGCATGTGCTTTGGTTCTAATTTACCTTATCTCATAAGAACATGTAATAAATTATACAGGGAATTTCTGTTAAAAAATAATCCCACAGTTGTATGAGATTGGAGGAGACTTTTAGATGGTAGGTCAGTGGTCTTGCATTTAGTTGAGCAAAAGCAATTTTGCTTCCATTTCAGGACCGCATTTTGCTCCTCATTATAGTAAGTAATGTAGCACTTTTCTGACTTCTATTTTAACATTAGAATTGGGATTACTATCTCATTAATTTTCAAAGTCTCTGCAAGAAAGTCAGTATAATCATCTTTAGTAATGAGGCACCTCTATGGTAAAAAGTCTTAGTAATTTGTCCATTGTTTCAAATCAAAGAGAGAAGTGGAGGCTTTTAAATTCTAGCAAGGTGTTTATGTTATTAATATTTTCACTTTACAACTATTACAAATTAATAATTTTTTTCTTCTTGGAAACCTAGTATAAATATATTTGTAGGTCATAAATAAAAAATGAGAAATCAACTAAAAATGTTACCTTGTTCAGACTCCATTCGCTAAATTTTTCTCTCATTTTCATTGCAGGACATTTGATTATGACTGATCATGCTATTTGTCAGTATGTACATATGTGTGTACGTATGTATGTATCAATCAACCAATTATTATAGTAGCAATATATTAATTTCATATTACCCAAAATTATAAGCCTGAATGTGTTGAAAACTTGAATTTAAAATTACATATTTCTGCAAAACTTTTTATTTCTTTTTGCCTTTTCCAAAAGCAAACACTGTTTTTGGCTTCTTTCTCTTTGCTTACTTCCATATTTCAAGTCATCCTGAAATGATTCCATGCTGGAATTTTCAAAATAATTTCATGTTGAAATTTCTGCCTTAATATCTCTTGTATAAAACTACCTCCTGTCCTAATGTATCATGTCAAAAAAAAAAAAAAATAAGGAGGTTTCAGCTTTTCCCTTCACAAACTGTGTTTTCCTTTCATATGCAGAAATATGT</Hsp_hseq>
          <Hsp_midline>|||||  ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||      || |||||||||||||||||||||||||||||||||||||||||||||||||||||||||</Hsp_midline>
        </Hsp>
        <Hsp>
          <Hsp_num>2</Hsp_num>
          <Hsp_bit-score>123.915</Hsp_bit-score>
          <Hsp_score>136</Hsp_score>
          <Hsp_evalue>6.73231e-24</Hsp_evalue>
          <Hsp_query-from>1</Hsp_query-from>
          <Hsp_query-to>74</Hsp_query-to>
          <Hsp_hit-from>63185</Hsp_hit-from>
          <Hsp_hit-to>63258</Hsp_hit-to>
          <Hsp_query-frame>1</Hsp_query-frame>
          <Hsp_hit-frame>1</Hsp_hit-frame>
          <Hsp_identity>70</Hsp_identity>
          <Hsp_positive>70</Hsp_positive>
          <Hsp_gaps>0</Hsp_gaps>
          <Hsp_align-len>74</Hsp_align-len>
          <Hsp_qseq>TAATAACTCTATATCAGAAGTGTTTTATCGTTACCATTTACAGATGAGTAAACCAAGASWGAGATKWCAAGTAA</Hsp_qseq>
          <Hsp_hseq>TAATAACTCTATATCAGAAGTGTTTTATCGTTACCATTTACAGATGAGTAAACCAAGACAGAGATGTCAAGTAA</Hsp_hseq>
          <Hsp_midline>||||||||||||||||||||||||||||||||||||||||||||||||||||||||||  |||||  |||||||</Hsp_midline>
        </Hsp>
        <Hsp>
          <Hsp_num>3</Hsp_num>
          <Hsp_bit-score>49.9773</Hsp_bit-score>
          <Hsp_score>54</Hsp_score>
          <Hsp_evalue>0.121831</Hsp_evalue>
          <Hsp_query-from>1132</Hsp_query-from>
          <Hsp_query-to>1158</Hsp_query-to>
          <Hsp_hit-from>62138</Hsp_hit-from>
          <Hsp_hit-to>62164</Hsp_hit-to>
          <Hsp_query-frame>1</Hsp_query-frame>
          <Hsp_hit-frame>1</Hsp_hit-frame>
          <Hsp_identity>27</Hsp_identity>
          <Hsp_positive>27</Hsp_positive>
          <Hsp_gaps>0</Hsp_gaps>
          <Hsp_align-len>27</Hsp_align-len>
          <Hsp_qseq>TGCAGAAATATGTAATTTTAAATTCAA</Hsp_qseq>
          <Hsp_hseq>TGCAGAAATATGTAATTTTAAATTCAA</Hsp_hseq>
          <Hsp_midline>|||||||||||||||||||||||||||</Hsp_midline>
        </Hsp>
      </Hit_hsps>
    </Hit>
    <Hit>
      <Hit_num>2</Hit_num>
      <Hit_id>gi|850484145|gb|CP011891.1|</Hit_id>
      <Hit_def>Ovis canadensis canadensis isolate 43U chromosome 6 sequence</Hit_def>
...
23 months ago by
EMBL Heidelberg, Germany
Jean-Karim Heriche17k wrote:

Use an XSLT stylesheet. See this post and this one.
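
If no XSLT processor is at hand, the same logic (copy the document, drop every <Hit> whose <Hit_def> lacks 'Homo sapiens') can be sketched with Python's standard library. This is not XSLT, only a rough equivalent of what such a stylesheet would do. The function name `keep_matching_hits` and the tiny `SAMPLE` document are made up for illustration; the sample assumes the usual BLAST XML layout in which <Hit> elements sit directly under <Iteration_hits>.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily shortened stand-in for the BLAST XML in the question.
SAMPLE = """<BlastOutput>
  <BlastOutput_iterations>
    <Iteration>
      <Iteration_hits>
        <Hit>
          <Hit_num>1</Hit_num>
          <Hit_def>Homo sapiens BAC clone RP11-350B19 from 4, complete sequence</Hit_def>
        </Hit>
        <Hit>
          <Hit_num>2</Hit_num>
          <Hit_def>Ovis canadensis canadensis isolate 43U chromosome 6 sequence</Hit_def>
        </Hit>
      </Iteration_hits>
    </Iteration>
  </BlastOutput_iterations>
</BlastOutput>"""


def keep_matching_hits(root, needle="Homo sapiens"):
    """Remove every <Hit> whose <Hit_def> text lacks `needle`.

    Returns the number of <Hit> elements removed.
    """
    removed = 0
    # Materialise the list first so the tree is not mutated mid-traversal.
    for hits in list(root.iter("Iteration_hits")):
        for hit in list(hits.findall("Hit")):
            hit_def = hit.findtext("Hit_def", default="")
            if needle not in hit_def:
                hits.remove(hit)  # drops the whole <Hit>...</Hit> subtree
                removed += 1
    return removed


root = ET.fromstring(SAMPLE)
removed = keep_matching_hits(root)
remaining = [hit.findtext("Hit_def") for hit in root.iter("Hit")]
print(removed, remaining)
```

On a real file, `tree = ET.parse("blast.xml")`, then `keep_matching_hits(tree.getroot())` and `tree.write("filtered.xml")` should do the job; both file names are placeholders. Note that ElementTree does not write the DOCTYPE line back out, so add it again by hand if a downstream tool needs it.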

23 months ago by
Alex Reynolds26k
Seattle, WA USA
Alex Reynolds26k wrote:

BeautifulSoup may also be an option.
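
A minimal sketch of how that might look, assuming the third-party packages `beautifulsoup4` and `lxml` are installed (the "xml" parser in BeautifulSoup relies on lxml). The `SAMPLE` string is a made-up, shortened stand-in for the real BLAST output.

```python
from bs4 import BeautifulSoup  # third-party: beautifulsoup4 (plus lxml for XML parsing)

# Hypothetical, heavily shortened stand-in for the BLAST XML in the question.
SAMPLE = """<BlastOutput>
  <BlastOutput_iterations>
    <Iteration>
      <Iteration_hits>
        <Hit>
          <Hit_num>1</Hit_num>
          <Hit_def>Homo sapiens BAC clone RP11-350B19 from 4, complete sequence</Hit_def>
        </Hit>
        <Hit>
          <Hit_num>2</Hit_num>
          <Hit_def>Ovis canadensis canadensis isolate 43U chromosome 6 sequence</Hit_def>
        </Hit>
      </Iteration_hits>
    </Iteration>
  </BlastOutput_iterations>
</BlastOutput>"""

# "xml" selects an XML parser, which keeps the case of tag names such as Hit_def.
soup = BeautifulSoup(SAMPLE, "xml")

# find_all returns a list, so removing hits inside the loop is safe.
for hit in soup.find_all("Hit"):
    hit_def = hit.find("Hit_def")
    text = hit_def.get_text() if hit_def is not None else ""
    if "Homo sapiens" not in text:
        hit.decompose()  # remove the whole <Hit>...</Hit> subtree

kept = [d.get_text() for d in soup.find_all("Hit_def")]
print(kept)
```

`str(soup)` then gives the filtered XML as text, which can be written to a new file.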
