Question: How to simply filter an XML file?
haro (France) wrote, 21 months ago:

Hi all!

I would like to filter my BLAST XML output by removing every <Hit>...</Hit> element whose <Hit_def> does not contain 'Homo sapiens'. Is there a simple way to do this?

    
    <?xml version="1.0"?>
    <!DOCTYPE BlastOutput PUBLIC "-//NCBI//NCBI BlastOutput/EN" "http://www.ncbi.nlm.nih.gov/dtd/NCBI_BlastOutput.dtd">
    <BlastOutput>
      <BlastOutput_program>blastn</BlastOutput_program>
      <BlastOutput_version>BLASTN 2.2.30+</BlastOutput_version>
      <BlastOutput_reference>Stephen F. Altschul, Thomas L. Madden, Alejandro A. Schäffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs", Nucleic Acids Res. 25:3389-3402.</BlastOutput_reference>
      <BlastOutput_db>nt</BlastOutput_db>
      <BlastOutput_query-ID>Query_1</BlastOutput_query-ID>
      <BlastOutput_query-def>R1_mid6_filt_denovo_18-04-16_c1    cov=12.91 len=1158 gc=29.46 nseq=105</BlastOutput_query-def>
      <BlastOutput_query-len>1158</BlastOutput_query-len>
      <BlastOutput_param>
        <Parameters>
          <Parameters_expect>10</Parameters_expect>
          <Parameters_sc-match>2</Parameters_sc-match>
          <Parameters_sc-mismatch>-3</Parameters_sc-mismatch>
          <Parameters_gap-open>5</Parameters_gap-open>
          <Parameters_gap-extend>2</Parameters_gap-extend>
          <Parameters_filter>L;m;</Parameters_filter>
        </Parameters>
      </BlastOutput_param>
    <BlastOutput_iterations>
    <Iteration>
      <Iteration_iter-num>1</Iteration_iter-num>
      <Iteration_query-ID>Query_1</Iteration_query-ID>
      <Iteration_query-def>R1_mid6_filt_denovo_18-04-16_c1    cov=12.91 len=1158 gc=29.46 nseq=105</Iteration_query-def>
      <Iteration_query-len>1158</Iteration_query-len>
    <Iteration_hits>
    <Hit>
      <Hit_num>1</Hit_num>
      <Hit_id>gi|18642927|gb|AC105445.3|</Hit_id>
      <Hit_def>Homo sapiens BAC clone RP11-350B19 from 4, complete sequence</Hit_def>
      <Hit_accession>AC105445</Hit_accession>
      <Hit_len>128907</Hit_len>
      <Hit_hsps>
        <Hsp>
          <Hsp_num>1</Hsp_num>
          <Hsp_bit-score>1925.48</Hsp_bit-score>
          <Hsp_score>2134</Hsp_score>
          <Hsp_evalue>0</Hsp_evalue>
          <Hsp_query-from>61</Hsp_query-from>
          <Hsp_query-to>1144</Hsp_query-to>
          <Hsp_hit-from>62962</Hsp_hit-from>
          <Hsp_hit-to>61873</Hsp_hit-to>
          <Hsp_query-frame>1</Hsp_query-frame>
          <Hsp_hit-frame>-1</Hsp_hit-frame>
          <Hsp_identity>1080</Hsp_identity>
          <Hsp_positive>1080</Hsp_positive>
          <Hsp_gaps>6</Hsp_gaps>
          <Hsp_align-len>1090</Hsp_align-len>
          <Hsp_qseq>GAGATKWCAAGTAATAACGTAATAATGTCTTTTTGTGATAAGAAATGCTTAGTTGAAGGTATGTGATTATTAGTCATTAAATCGCTTTGCATGTGCTTTGGTTCTAATTTACCTTATCTCATAAGAACATGTAATAAATTATACAGGGAATTTCTGTTAAAAAATAATCCCACAGTTGTATGAGATTGGAGGAGACTTTTAGATGGTAGGTCAGTGGTCTTGCATTTAGTTGAGCAAAAGCAATTTTGCTTCCATTTCAGGACCGCATTTTGCTCCTCATTATAGTAAGTAATGTAGCACTTTTCTGACTTCTATTTTAACATTAGAATTGGGATTACTATCTCATTAATTTTCAAAGTCTCTGCAAGAAAGTCAGTATAATCATCTTTAGTAATGAGGCACCTGTATGGTAAAAAGTCTTAGTAATTTGTCCATTGTTTCAAATCAAAGAGAGAAGTGGAGGCTTTTAAATTCTAGCAAGGTGTTTATGTTATTAATATTTTCACTTTACAACTATTACAAATTAATAATTTTTTTCTTCTTGGAAACCTAGTATAAATATATTTGTAGGTCATAAATAAAAAATGAGAAATCAACTAAAAATGTTACCTTGTTCAGACTCCATTCGCTAAATTTTTCTCTCATTTTCATTGCAGGACATTTGATTATGACTGATCATGCTATTTGTCAGTATGTACATATGTGTGTACGTATGTATGTATCAATCAACCAATTATTATAGTAGCAATATATTAATTTCATATTACCCAAAATTATAAGCCTGAATGTGTTGAAAACTTGAATTTAAAATTACATATTTCTGCAAAACTTTTTATTTCTTTTTGCCTTTTCCAAAAGCAAACACTGTTTTTGGCTTCTTTCTCTTTGCTTACTTCCATATTTCAAGTCATCCTGAAATGATTCCATGCTGGAATTTTCAAAATAATTTCATGTTGAAATTTCTGCCTTAATATCTCTTGTATAAAACTACCTCCTGTCCTAATGTATCATGTCAAAAAAAAAA------AATGAGGTTTCAGCTTTTCCCTTCACAAACTGTGTTTTCCTTTCATATGCAGAAATATGT</Hsp_qseq>
          <Hsp_hseq>GAGATTACAAGTAATAACGTAATAATGTCTTTTTGTGATAAGAAATGCTTAGTTGAAGGTATGTGATTATTAGTCATTAAATCGCTTTGCATGTGCTTTGGTTCTAATTTACCTTATCTCATAAGAACATGTAATAAATTATACAGGGAATTTCTGTTAAAAAATAATCCCACAGTTGTATGAGATTGGAGGAGACTTTTAGATGGTAGGTCAGTGGTCTTGCATTTAGTTGAGCAAAAGCAATTTTGCTTCCATTTCAGGACCGCATTTTGCTCCTCATTATAGTAAGTAATGTAGCACTTTTCTGACTTCTATTTTAACATTAGAATTGGGATTACTATCTCATTAATTTTCAAAGTCTCTGCAAGAAAGTCAGTATAATCATCTTTAGTAATGAGGCACCTCTATGGTAAAAAGTCTTAGTAATTTGTCCATTGTTTCAAATCAAAGAGAGAAGTGGAGGCTTTTAAATTCTAGCAAGGTGTTTATGTTATTAATATTTTCACTTTACAACTATTACAAATTAATAATTTTTTTCTTCTTGGAAACCTAGTATAAATATATTTGTAGGTCATAAATAAAAAATGAGAAATCAACTAAAAATGTTACCTTGTTCAGACTCCATTCGCTAAATTTTTCTCTCATTTTCATTGCAGGACATTTGATTATGACTGATCATGCTATTTGTCAGTATGTACATATGTGTGTACGTATGTATGTATCAATCAACCAATTATTATAGTAGCAATATATTAATTTCATATTACCCAAAATTATAAGCCTGAATGTGTTGAAAACTTGAATTTAAAATTACATATTTCTGCAAAACTTTTTATTTCTTTTTGCCTTTTCCAAAAGCAAACACTGTTTTTGGCTTCTTTCTCTTTGCTTACTTCCATATTTCAAGTCATCCTGAAATGATTCCATGCTGGAATTTTCAAAATAATTTCATGTTGAAATTTCTGCCTTAATATCTCTTGTATAAAACTACCTCCTGTCCTAATGTATCATGTCAAAAAAAAAAAAAAATAAGGAGGTTTCAGCTTTTCCCTTCACAAACTGTGTTTTCCTTTCATATGCAGAAATATGT</Hsp_hseq>
          <Hsp_midline>|||||  ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||      || |||||||||||||||||||||||||||||||||||||||||||||||||||||||||</Hsp_midline>
        </Hsp>
        <Hsp>
          <Hsp_num>2</Hsp_num>
          <Hsp_bit-score>123.915</Hsp_bit-score>
          <Hsp_score>136</Hsp_score>
          <Hsp_evalue>6.73231e-24</Hsp_evalue>
          <Hsp_query-from>1</Hsp_query-from>
          <Hsp_query-to>74</Hsp_query-to>
          <Hsp_hit-from>63185</Hsp_hit-from>
          <Hsp_hit-to>63258</Hsp_hit-to>
          <Hsp_query-frame>1</Hsp_query-frame>
          <Hsp_hit-frame>1</Hsp_hit-frame>
          <Hsp_identity>70</Hsp_identity>
          <Hsp_positive>70</Hsp_positive>
          <Hsp_gaps>0</Hsp_gaps>
          <Hsp_align-len>74</Hsp_align-len>
          <Hsp_qseq>TAATAACTCTATATCAGAAGTGTTTTATCGTTACCATTTACAGATGAGTAAACCAAGASWGAGATKWCAAGTAA</Hsp_qseq>
          <Hsp_hseq>TAATAACTCTATATCAGAAGTGTTTTATCGTTACCATTTACAGATGAGTAAACCAAGACAGAGATGTCAAGTAA</Hsp_hseq>
          <Hsp_midline>||||||||||||||||||||||||||||||||||||||||||||||||||||||||||  |||||  |||||||</Hsp_midline>
        </Hsp>
        <Hsp>
          <Hsp_num>3</Hsp_num>
          <Hsp_bit-score>49.9773</Hsp_bit-score>
          <Hsp_score>54</Hsp_score>
          <Hsp_evalue>0.121831</Hsp_evalue>
          <Hsp_query-from>1132</Hsp_query-from>
          <Hsp_query-to>1158</Hsp_query-to>
          <Hsp_hit-from>62138</Hsp_hit-from>
          <Hsp_hit-to>62164</Hsp_hit-to>
          <Hsp_query-frame>1</Hsp_query-frame>
          <Hsp_hit-frame>1</Hsp_hit-frame>
          <Hsp_identity>27</Hsp_identity>
          <Hsp_positive>27</Hsp_positive>
          <Hsp_gaps>0</Hsp_gaps>
          <Hsp_align-len>27</Hsp_align-len>
          <Hsp_qseq>TGCAGAAATATGTAATTTTAAATTCAA</Hsp_qseq>
          <Hsp_hseq>TGCAGAAATATGTAATTTTAAATTCAA</Hsp_hseq>
          <Hsp_midline>|||||||||||||||||||||||||||</Hsp_midline>
        </Hsp>
      </Hit_hsps>
    </Hit>
    <Hit>
      <Hit_num>2</Hit_num>
      <Hit_id>gi|850484145|gb|CP011891.1|</Hit_id>
      <Hit_def>Ovis canadensis canadensis isolate 43U chromosome 6 sequence</Hit_def>
...
Jean-Karim Heriche (EMBL Heidelberg, Germany) wrote, 21 months ago:

Use an XSLT stylesheet. See this post and this one.
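
As a rough illustration of the XSLT route, here is a minimal sketch that applies a stylesheet from Python. It assumes the third-party `lxml` package is installed. The names `STYLESHEET`, `SAMPLE` and `filter_hits` are made up for this example, and `SAMPLE` is a heavily trimmed stand-in for the BLAST output in the question. The stylesheet is an identity transform plus one empty template that matches every `<Hit>` whose `<Hit_def>` lacks 'Homo sapiens', so those hits are dropped and everything else is copied through.

```python
from lxml import etree  # assumption: lxml is installed (pip install lxml)

# Identity transform plus one empty template. A <Hit> whose <Hit_def> does
# not contain the wanted text matches the empty template and is not copied.
STYLESHEET = b"""<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
  <xsl:template match="Hit[not(contains(Hit_def, 'Homo sapiens'))]"/>
</xsl:stylesheet>
"""

# Hypothetical, trimmed-down stand-in for the BLAST XML in the question.
SAMPLE = b"""<Iteration_hits>
<Hit><Hit_num>1</Hit_num><Hit_def>Homo sapiens BAC clone RP11-350B19 from 4, complete sequence</Hit_def></Hit>
<Hit><Hit_num>2</Hit_num><Hit_def>Ovis canadensis canadensis isolate 43U chromosome 6 sequence</Hit_def></Hit>
</Iteration_hits>"""


def filter_hits(xml_bytes: bytes) -> bytes:
    """Return the XML with non-matching <Hit> elements removed."""
    transform = etree.XSLT(etree.XML(STYLESHEET))
    result = transform(etree.fromstring(xml_bytes))
    return etree.tostring(result)


if __name__ == "__main__":
    print(filter_hits(SAMPLE).decode())
```

For a real file, the same stylesheet could be saved on its own and run with any XSLT 1.0 processor. The search string is hard-coded in the match pattern because XSLT 1.0 does not allow variables or parameters there.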

Alex Reynolds (Seattle, WA, USA) wrote, 21 months ago:

BeautifulSoup may also be an option.
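
A minimal sketch of what that could look like, assuming the third-party `beautifulsoup4` and `lxml` packages are installed (BeautifulSoup's "xml" parser depends on lxml). The names `SAMPLE` and `keep_matching_hits` are invented for illustration, and `SAMPLE` is a trimmed stand-in for the file in the question.

```python
from bs4 import BeautifulSoup  # assumption: beautifulsoup4 and lxml installed

# Hypothetical, trimmed-down stand-in for the BLAST XML in the question.
SAMPLE = """<Iteration_hits>
<Hit><Hit_num>1</Hit_num><Hit_def>Homo sapiens BAC clone RP11-350B19 from 4, complete sequence</Hit_def></Hit>
<Hit><Hit_num>2</Hit_num><Hit_def>Ovis canadensis canadensis isolate 43U chromosome 6 sequence</Hit_def></Hit>
</Iteration_hits>"""


def keep_matching_hits(xml_text: str, wanted: str = "Homo sapiens") -> str:
    """Drop every <Hit> whose <Hit_def> does not contain `wanted`."""
    # The "xml" parser keeps tag names case-sensitive; an HTML parser
    # would lowercase them and <Hit> would no longer be found.
    soup = BeautifulSoup(xml_text, "xml")
    for hit in soup.find_all("Hit"):
        hit_def = hit.find("Hit_def")
        if hit_def is None or wanted not in hit_def.get_text():
            hit.decompose()  # remove the whole <Hit> element from the tree
    return str(soup)


if __name__ == "__main__":
    print(keep_matching_hits(SAMPLE))
```

This loads the whole document into memory, which is usually acceptable for a single BLAST report but may be slow for very large files.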
