How to simply filter an XML file?
7.3 years ago
vmicrobio ▴ 290

Hi all!

I would like to filter my XML file by removing every block from <Hit> to </Hit> whose <Hit_def> does not contain 'Homo sapiens'. Do you have any idea how to do this simply?

    
    <?xml version="1.0"?>
    <!DOCTYPE BlastOutput PUBLIC "-//NCBI//NCBI BlastOutput/EN" "http://www.ncbi.nlm.nih.gov/dtd/NCBI_BlastOutput.dtd">
    <BlastOutput>
      <BlastOutput_program>blastn</BlastOutput_program>
      <BlastOutput_version>BLASTN 2.2.30+</BlastOutput_version>
      <BlastOutput_reference>Stephen F. Altschul, Thomas L. Madden, Alejandro A. Schäffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs", Nucleic Acids Res. 25:3389-3402.</BlastOutput_reference>
      <BlastOutput_db>nt</BlastOutput_db>
      <BlastOutput_query-ID>Query_1</BlastOutput_query-ID>
      <BlastOutput_query-def>R1_mid6_filt_denovo_18-04-16_c1    cov=12.91 len=1158 gc=29.46 nseq=105</BlastOutput_query-def>
      <BlastOutput_query-len>1158</BlastOutput_query-len>
      <BlastOutput_param>
        <Parameters>
          <Parameters_expect>10</Parameters_expect>
          <Parameters_sc-match>2</Parameters_sc-match>
          <Parameters_sc-mismatch>-3</Parameters_sc-mismatch>
          <Parameters_gap-open>5</Parameters_gap-open>
          <Parameters_gap-extend>2</Parameters_gap-extend>
          <Parameters_filter>L;m;</Parameters_filter>
        </Parameters>
      </BlastOutput_param>
    <BlastOutput_iterations>
    <Iteration>
      <Iteration_iter-num>1</Iteration_iter-num>
      <Iteration_query-ID>Query_1</Iteration_query-ID>
      <Iteration_query-def>R1_mid6_filt_denovo_18-04-16_c1    cov=12.91 len=1158 gc=29.46 nseq=105</Iteration_query-def>
      <Iteration_query-len>1158</Iteration_query-len>
    <Iteration_hits>
    <Hit>
      <Hit_num>1</Hit_num>
      <Hit_id>gi|18642927|gb|AC105445.3|</Hit_id>
      <Hit_def>Homo sapiens BAC clone RP11-350B19 from 4, complete sequence</Hit_def>
      <Hit_accession>AC105445</Hit_accession>
      <Hit_len>128907</Hit_len>
      <Hit_hsps>
        <Hsp>
          <Hsp_num>1</Hsp_num>
          <Hsp_bit-score>1925.48</Hsp_bit-score>
          <Hsp_score>2134</Hsp_score>
          <Hsp_evalue>0</Hsp_evalue>
          <Hsp_query-from>61</Hsp_query-from>
          <Hsp_query-to>1144</Hsp_query-to>
          <Hsp_hit-from>62962</Hsp_hit-from>
          <Hsp_hit-to>61873</Hsp_hit-to>
          <Hsp_query-frame>1</Hsp_query-frame>
          <Hsp_hit-frame>-1</Hsp_hit-frame>
          <Hsp_identity>1080</Hsp_identity>
          <Hsp_positive>1080</Hsp_positive>
          <Hsp_gaps>6</Hsp_gaps>
          <Hsp_align-len>1090</Hsp_align-len>
          <Hsp_qseq>GAGATKWCAAGTAATAACGTAATAATGTCTTTTTGTGATAAGAAATGCTTAGTTGAAGGTATGTGATTATTAGTCATTAAATCGCTTTGCATGTGCTTTGGTTCTAATTTACCTTATCTCATAAGAACATGTAATAAATTATACAGGGAATTTCTGTTAAAAAATAATCCCACAGTTGTATGAGATTGGAGGAGACTTTTAGATGGTAGGTCAGTGGTCTTGCATTTAGTTGAGCAAAAGCAATTTTGCTTCCATTTCAGGACCGCATTTTGCTCCTCATTATAGTAAGTAATGTAGCACTTTTCTGACTTCTATTTTAACATTAGAATTGGGATTACTATCTCATTAATTTTCAAAGTCTCTGCAAGAAAGTCAGTATAATCATCTTTAGTAATGAGGCACCTGTATGGTAAAAAGTCTTAGTAATTTGTCCATTGTTTCAAATCAAAGAGAGAAGTGGAGGCTTTTAAATTCTAGCAAGGTGTTTATGTTATTAATATTTTCACTTTACAACTATTACAAATTAATAATTTTTTTCTTCTTGGAAACCTAGTATAAATATATTTGTAGGTCATAAATAAAAAATGAGAAATCAACTAAAAATGTTACCTTGTTCAGACTCCATTCGCTAAATTTTTCTCTCATTTTCATTGCAGGACATTTGATTATGACTGATCATGCTATTTGTCAGTATGTACATATGTGTGTACGTATGTATGTATCAATCAACCAATTATTATAGTAGCAATATATTAATTTCATATTACCCAAAATTATAAGCCTGAATGTGTTGAAAACTTGAATTTAAAATTACATATTTCTGCAAAACTTTTTATTTCTTTTTGCCTTTTCCAAAAGCAAACACTGTTTTTGGCTTCTTTCTCTTTGCTTACTTCCATATTTCAAGTCATCCTGAAATGATTCCATGCTGGAATTTTCAAAATAATTTCATGTTGAAATTTCTGCCTTAATATCTCTTGTATAAAACTACCTCCTGTCCTAATGTATCATGTCAAAAAAAAAA------AATGAGGTTTCAGCTTTTCCCTTCACAAACTGTGTTTTCCTTTCATATGCAGAAATATGT</Hsp_qseq>
          <Hsp_hseq>GAGATTACAAGTAATAACGTAATAATGTCTTTTTGTGATAAGAAATGCTTAGTTGAAGGTATGTGATTATTAGTCATTAAATCGCTTTGCATGTGCTTTGGTTCTAATTTACCTTATCTCATAAGAACATGTAATAAATTATACAGGGAATTTCTGTTAAAAAATAATCCCACAGTTGTATGAGATTGGAGGAGACTTTTAGATGGTAGGTCAGTGGTCTTGCATTTAGTTGAGCAAAAGCAATTTTGCTTCCATTTCAGGACCGCATTTTGCTCCTCATTATAGTAAGTAATGTAGCACTTTTCTGACTTCTATTTTAACATTAGAATTGGGATTACTATCTCATTAATTTTCAAAGTCTCTGCAAGAAAGTCAGTATAATCATCTTTAGTAATGAGGCACCTCTATGGTAAAAAGTCTTAGTAATTTGTCCATTGTTTCAAATCAAAGAGAGAAGTGGAGGCTTTTAAATTCTAGCAAGGTGTTTATGTTATTAATATTTTCACTTTACAACTATTACAAATTAATAATTTTTTTCTTCTTGGAAACCTAGTATAAATATATTTGTAGGTCATAAATAAAAAATGAGAAATCAACTAAAAATGTTACCTTGTTCAGACTCCATTCGCTAAATTTTTCTCTCATTTTCATTGCAGGACATTTGATTATGACTGATCATGCTATTTGTCAGTATGTACATATGTGTGTACGTATGTATGTATCAATCAACCAATTATTATAGTAGCAATATATTAATTTCATATTACCCAAAATTATAAGCCTGAATGTGTTGAAAACTTGAATTTAAAATTACATATTTCTGCAAAACTTTTTATTTCTTTTTGCCTTTTCCAAAAGCAAACACTGTTTTTGGCTTCTTTCTCTTTGCTTACTTCCATATTTCAAGTCATCCTGAAATGATTCCATGCTGGAATTTTCAAAATAATTTCATGTTGAAATTTCTGCCTTAATATCTCTTGTATAAAACTACCTCCTGTCCTAATGTATCATGTCAAAAAAAAAAAAAAATAAGGAGGTTTCAGCTTTTCCCTTCACAAACTGTGTTTTCCTTTCATATGCAGAAATATGT</Hsp_hseq>
          <Hsp_midline>|||||  ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||      || |||||||||||||||||||||||||||||||||||||||||||||||||||||||||</Hsp_midline>
        </Hsp>
        <Hsp>
          <Hsp_num>2</Hsp_num>
          <Hsp_bit-score>123.915</Hsp_bit-score>
          <Hsp_score>136</Hsp_score>
          <Hsp_evalue>6.73231e-24</Hsp_evalue>
          <Hsp_query-from>1</Hsp_query-from>
          <Hsp_query-to>74</Hsp_query-to>
          <Hsp_hit-from>63185</Hsp_hit-from>
          <Hsp_hit-to>63258</Hsp_hit-to>
          <Hsp_query-frame>1</Hsp_query-frame>
          <Hsp_hit-frame>1</Hsp_hit-frame>
          <Hsp_identity>70</Hsp_identity>
          <Hsp_positive>70</Hsp_positive>
          <Hsp_gaps>0</Hsp_gaps>
          <Hsp_align-len>74</Hsp_align-len>
          <Hsp_qseq>TAATAACTCTATATCAGAAGTGTTTTATCGTTACCATTTACAGATGAGTAAACCAAGASWGAGATKWCAAGTAA</Hsp_qseq>
          <Hsp_hseq>TAATAACTCTATATCAGAAGTGTTTTATCGTTACCATTTACAGATGAGTAAACCAAGACAGAGATGTCAAGTAA</Hsp_hseq>
          <Hsp_midline>||||||||||||||||||||||||||||||||||||||||||||||||||||||||||  |||||  |||||||</Hsp_midline>
        </Hsp>
        <Hsp>
          <Hsp_num>3</Hsp_num>
          <Hsp_bit-score>49.9773</Hsp_bit-score>
          <Hsp_score>54</Hsp_score>
          <Hsp_evalue>0.121831</Hsp_evalue>
          <Hsp_query-from>1132</Hsp_query-from>
          <Hsp_query-to>1158</Hsp_query-to>
          <Hsp_hit-from>62138</Hsp_hit-from>
          <Hsp_hit-to>62164</Hsp_hit-to>
          <Hsp_query-frame>1</Hsp_query-frame>
          <Hsp_hit-frame>1</Hsp_hit-frame>
          <Hsp_identity>27</Hsp_identity>
          <Hsp_positive>27</Hsp_positive>
          <Hsp_gaps>0</Hsp_gaps>
          <Hsp_align-len>27</Hsp_align-len>
          <Hsp_qseq>TGCAGAAATATGTAATTTTAAATTCAA</Hsp_qseq>
          <Hsp_hseq>TGCAGAAATATGTAATTTTAAATTCAA</Hsp_hseq>
          <Hsp_midline>|||||||||||||||||||||||||||</Hsp_midline>
        </Hsp>
      </Hit_hsps>
    </Hit>
    <Hit>
      <Hit_num>2</Hit_num>
      <Hit_id>gi|850484145|gb|CP011891.1|</Hit_id>
      <Hit_def>Ovis canadensis canadensis isolate 43U chromosome 6 sequence</Hit_def>
...
blast filter xml • 1.6k views
7.3 years ago

Use an XSLT stylesheet. See this post and this one.
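
To illustrate the XSLT suggestion, here is a minimal sketch, not a definitive implementation. The stylesheet copies the whole document unchanged (the usual identity template) and adds one empty template that swallows every <Hit> whose <Hit_def> does not contain 'Homo sapiens'. It is wrapped in Python only so that it can be run in one piece; the names `STYLESHEET` and `filter_hits_xslt` are made up for this example, and the third-party `lxml` package is assumed to be installed.

```python
# Sketch only: STYLESHEET and filter_hits_xslt are illustrative names,
# and lxml is an assumed third-party dependency (pip install lxml).

STYLESHEET = b"""\
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Identity template: copy every node and attribute unchanged. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Empty template: drop each Hit whose Hit_def lacks 'Homo sapiens'. -->
  <xsl:template match="Hit[not(contains(Hit_def, 'Homo sapiens'))]"/>

</xsl:stylesheet>
"""


def filter_hits_xslt(xml_bytes):
    """Apply STYLESHEET to a BLAST XML document given as bytes."""
    # Imported inside the function so the stylesheet text itself can be
    # reused even where lxml is not available.
    from lxml import etree

    transform = etree.XSLT(etree.XML(STYLESHEET))
    result = transform(etree.fromstring(xml_bytes))
    return etree.tostring(result)
```

With a hypothetical input file this could be called as `filter_hits_xslt(open("blast.xml", "rb").read())`. The stylesheet text could equally be saved to its own file and run with a command-line XSLT processor such as xsltproc. Note that `contains()` is case-sensitive, so 'homo sapiens' in lower case would not match.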

7.3 years ago

BeautifulSoup may also be an option.
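
BeautifulSoup is a third-party package, so the sketch below swaps in Python's standard-library `xml.etree.ElementTree` to show the same parse, filter and write idea without extra installs. The function name `keep_matching_hits`, the `needle` parameter and the tiny `SAMPLE` document are assumptions made for illustration; only the element names (`Iteration_hits`, `Hit`, `Hit_def`) come from the file in the question.

```python
import xml.etree.ElementTree as ET

# Tiny stand-in for a BLAST XML file (illustrative data, trimmed from the question).
SAMPLE = """<BlastOutput><BlastOutput_iterations><Iteration><Iteration_hits>
<Hit><Hit_num>1</Hit_num><Hit_def>Homo sapiens BAC clone RP11-350B19</Hit_def></Hit>
<Hit><Hit_num>2</Hit_num><Hit_def>Ovis canadensis canadensis isolate 43U</Hit_def></Hit>
</Iteration_hits></Iteration></BlastOutput_iterations></BlastOutput>"""


def keep_matching_hits(root, needle="Homo sapiens"):
    """Remove every <Hit> whose <Hit_def> lacks `needle`; return how many were removed."""
    removed = 0
    for hits in root.iter("Iteration_hits"):
        # Iterate over a copy of the list, because we remove children as we go.
        for hit in list(hits.findall("Hit")):
            if needle not in hit.findtext("Hit_def", default=""):
                hits.remove(hit)
                removed += 1
    return removed


root = ET.fromstring(SAMPLE)
removed = keep_matching_hits(root)
remaining = [hit.findtext("Hit_def") for hit in root.iter("Hit")]
print(removed, remaining)
```

For a real file, `ET.parse("blast.xml")` followed by `tree.write("filtered.xml")` would replace the `SAMPLE` string (both file names are hypothetical). ElementTree does not keep the DOCTYPE line when writing, so it would need to be added back if a downstream tool requires it.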
