Question: How to simply filter an XML file?
2.1 years ago by
vmicrobio240 wrote:

Hi all!

I would like to filter my XML file, removing every <Hit>...</Hit> element whose <Hit_def> does not contain 'Homo sapiens'. Do you have any idea how to do this simply?

    
    <?xml version="1.0"?>
    <!DOCTYPE BlastOutput PUBLIC "-//NCBI//NCBI BlastOutput/EN" "http://www.ncbi.nlm.nih.gov/dtd/NCBI_BlastOutput.dtd">
    <BlastOutput>
      <BlastOutput_program>blastn</BlastOutput_program>
      <BlastOutput_version>BLASTN 2.2.30+</BlastOutput_version>
      <BlastOutput_reference>Stephen F. Altschul, Thomas L. Madden, Alejandro A. Schäffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs", Nucleic Acids Res. 25:3389-3402.</BlastOutput_reference>
      <BlastOutput_db>nt</BlastOutput_db>
      <BlastOutput_query-ID>Query_1</BlastOutput_query-ID>
      <BlastOutput_query-def>R1_mid6_filt_denovo_18-04-16_c1    cov=12.91 len=1158 gc=29.46 nseq=105</BlastOutput_query-def>
      <BlastOutput_query-len>1158</BlastOutput_query-len>
      <BlastOutput_param>
        <Parameters>
          <Parameters_expect>10</Parameters_expect>
          <Parameters_sc-match>2</Parameters_sc-match>
          <Parameters_sc-mismatch>-3</Parameters_sc-mismatch>
          <Parameters_gap-open>5</Parameters_gap-open>
          <Parameters_gap-extend>2</Parameters_gap-extend>
          <Parameters_filter>L;m;</Parameters_filter>
        </Parameters>
      </BlastOutput_param>
    <BlastOutput_iterations>
    <Iteration>
      <Iteration_iter-num>1</Iteration_iter-num>
      <Iteration_query-ID>Query_1</Iteration_query-ID>
      <Iteration_query-def>R1_mid6_filt_denovo_18-04-16_c1    cov=12.91 len=1158 gc=29.46 nseq=105</Iteration_query-def>
      <Iteration_query-len>1158</Iteration_query-len>
    <Iteration_hits>
    <Hit>
      <Hit_num>1</Hit_num>
      <Hit_id>gi|18642927|gb|AC105445.3|</Hit_id>
      <Hit_def>Homo sapiens BAC clone RP11-350B19 from 4, complete sequence</Hit_def>
      <Hit_accession>AC105445</Hit_accession>
      <Hit_len>128907</Hit_len>
      <Hit_hsps>
        <Hsp>
          <Hsp_num>1</Hsp_num>
          <Hsp_bit-score>1925.48</Hsp_bit-score>
          <Hsp_score>2134</Hsp_score>
          <Hsp_evalue>0</Hsp_evalue>
          <Hsp_query-from>61</Hsp_query-from>
          <Hsp_query-to>1144</Hsp_query-to>
          <Hsp_hit-from>62962</Hsp_hit-from>
          <Hsp_hit-to>61873</Hsp_hit-to>
          <Hsp_query-frame>1</Hsp_query-frame>
          <Hsp_hit-frame>-1</Hsp_hit-frame>
          <Hsp_identity>1080</Hsp_identity>
          <Hsp_positive>1080</Hsp_positive>
          <Hsp_gaps>6</Hsp_gaps>
          <Hsp_align-len>1090</Hsp_align-len>
          <Hsp_qseq>GAGATKWCAAGTAATAACGTAATAATGTCTTTTTGTGATAAGAAATGCTTAGTTGAAGGTATGTGATTATTAGTCATTAAATCGCTTTGCATGTGCTTTGGTTCTAATTTACCTTATCTCATAAGAACATGTAATAAATTATACAGGGAATTTCTGTTAAAAAATAATCCCACAGTTGTATGAGATTGGAGGAGACTTTTAGATGGTAGGTCAGTGGTCTTGCATTTAGTTGAGCAAAAGCAATTTTGCTTCCATTTCAGGACCGCATTTTGCTCCTCATTATAGTAAGTAATGTAGCACTTTTCTGACTTCTATTTTAACATTAGAATTGGGATTACTATCTCATTAATTTTCAAAGTCTCTGCAAGAAAGTCAGTATAATCATCTTTAGTAATGAGGCACCTGTATGGTAAAAAGTCTTAGTAATTTGTCCATTGTTTCAAATCAAAGAGAGAAGTGGAGGCTTTTAAATTCTAGCAAGGTGTTTATGTTATTAATATTTTCACTTTACAACTATTACAAATTAATAATTTTTTTCTTCTTGGAAACCTAGTATAAATATATTTGTAGGTCATAAATAAAAAATGAGAAATCAACTAAAAATGTTACCTTGTTCAGACTCCATTCGCTAAATTTTTCTCTCATTTTCATTGCAGGACATTTGATTATGACTGATCATGCTATTTGTCAGTATGTACATATGTGTGTACGTATGTATGTATCAATCAACCAATTATTATAGTAGCAATATATTAATTTCATATTACCCAAAATTATAAGCCTGAATGTGTTGAAAACTTGAATTTAAAATTACATATTTCTGCAAAACTTTTTATTTCTTTTTGCCTTTTCCAAAAGCAAACACTGTTTTTGGCTTCTTTCTCTTTGCTTACTTCCATATTTCAAGTCATCCTGAAATGATTCCATGCTGGAATTTTCAAAATAATTTCATGTTGAAATTTCTGCCTTAATATCTCTTGTATAAAACTACCTCCTGTCCTAATGTATCATGTCAAAAAAAAAA------AATGAGGTTTCAGCTTTTCCCTTCACAAACTGTGTTTTCCTTTCATATGCAGAAATATGT</Hsp_qseq>
          <Hsp_hseq>GAGATTACAAGTAATAACGTAATAATGTCTTTTTGTGATAAGAAATGCTTAGTTGAAGGTATGTGATTATTAGTCATTAAATCGCTTTGCATGTGCTTTGGTTCTAATTTACCTTATCTCATAAGAACATGTAATAAATTATACAGGGAATTTCTGTTAAAAAATAATCCCACAGTTGTATGAGATTGGAGGAGACTTTTAGATGGTAGGTCAGTGGTCTTGCATTTAGTTGAGCAAAAGCAATTTTGCTTCCATTTCAGGACCGCATTTTGCTCCTCATTATAGTAAGTAATGTAGCACTTTTCTGACTTCTATTTTAACATTAGAATTGGGATTACTATCTCATTAATTTTCAAAGTCTCTGCAAGAAAGTCAGTATAATCATCTTTAGTAATGAGGCACCTCTATGGTAAAAAGTCTTAGTAATTTGTCCATTGTTTCAAATCAAAGAGAGAAGTGGAGGCTTTTAAATTCTAGCAAGGTGTTTATGTTATTAATATTTTCACTTTACAACTATTACAAATTAATAATTTTTTTCTTCTTGGAAACCTAGTATAAATATATTTGTAGGTCATAAATAAAAAATGAGAAATCAACTAAAAATGTTACCTTGTTCAGACTCCATTCGCTAAATTTTTCTCTCATTTTCATTGCAGGACATTTGATTATGACTGATCATGCTATTTGTCAGTATGTACATATGTGTGTACGTATGTATGTATCAATCAACCAATTATTATAGTAGCAATATATTAATTTCATATTACCCAAAATTATAAGCCTGAATGTGTTGAAAACTTGAATTTAAAATTACATATTTCTGCAAAACTTTTTATTTCTTTTTGCCTTTTCCAAAAGCAAACACTGTTTTTGGCTTCTTTCTCTTTGCTTACTTCCATATTTCAAGTCATCCTGAAATGATTCCATGCTGGAATTTTCAAAATAATTTCATGTTGAAATTTCTGCCTTAATATCTCTTGTATAAAACTACCTCCTGTCCTAATGTATCATGTCAAAAAAAAAAAAAAATAAGGAGGTTTCAGCTTTTCCCTTCACAAACTGTGTTTTCCTTTCATATGCAGAAATATGT</Hsp_hseq>
          <Hsp_midline>|||||  ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||      || |||||||||||||||||||||||||||||||||||||||||||||||||||||||||</Hsp_midline>
        </Hsp>
        <Hsp>
          <Hsp_num>2</Hsp_num>
          <Hsp_bit-score>123.915</Hsp_bit-score>
          <Hsp_score>136</Hsp_score>
          <Hsp_evalue>6.73231e-24</Hsp_evalue>
          <Hsp_query-from>1</Hsp_query-from>
          <Hsp_query-to>74</Hsp_query-to>
          <Hsp_hit-from>63185</Hsp_hit-from>
          <Hsp_hit-to>63258</Hsp_hit-to>
          <Hsp_query-frame>1</Hsp_query-frame>
          <Hsp_hit-frame>1</Hsp_hit-frame>
          <Hsp_identity>70</Hsp_identity>
          <Hsp_positive>70</Hsp_positive>
          <Hsp_gaps>0</Hsp_gaps>
          <Hsp_align-len>74</Hsp_align-len>
          <Hsp_qseq>TAATAACTCTATATCAGAAGTGTTTTATCGTTACCATTTACAGATGAGTAAACCAAGASWGAGATKWCAAGTAA</Hsp_qseq>
          <Hsp_hseq>TAATAACTCTATATCAGAAGTGTTTTATCGTTACCATTTACAGATGAGTAAACCAAGACAGAGATGTCAAGTAA</Hsp_hseq>
          <Hsp_midline>||||||||||||||||||||||||||||||||||||||||||||||||||||||||||  |||||  |||||||</Hsp_midline>
        </Hsp>
        <Hsp>
          <Hsp_num>3</Hsp_num>
          <Hsp_bit-score>49.9773</Hsp_bit-score>
          <Hsp_score>54</Hsp_score>
          <Hsp_evalue>0.121831</Hsp_evalue>
          <Hsp_query-from>1132</Hsp_query-from>
          <Hsp_query-to>1158</Hsp_query-to>
          <Hsp_hit-from>62138</Hsp_hit-from>
          <Hsp_hit-to>62164</Hsp_hit-to>
          <Hsp_query-frame>1</Hsp_query-frame>
          <Hsp_hit-frame>1</Hsp_hit-frame>
          <Hsp_identity>27</Hsp_identity>
          <Hsp_positive>27</Hsp_positive>
          <Hsp_gaps>0</Hsp_gaps>
          <Hsp_align-len>27</Hsp_align-len>
          <Hsp_qseq>TGCAGAAATATGTAATTTTAAATTCAA</Hsp_qseq>
          <Hsp_hseq>TGCAGAAATATGTAATTTTAAATTCAA</Hsp_hseq>
          <Hsp_midline>|||||||||||||||||||||||||||</Hsp_midline>
        </Hsp>
      </Hit_hsps>
    </Hit>
    <Hit>
      <Hit_num>2</Hit_num>
      <Hit_id>gi|850484145|gb|CP011891.1|</Hit_id>
      <Hit_def>Ovis canadensis canadensis isolate 43U chromosome 6 sequence</Hit_def>
...
xml blast filter
2.1 years ago by
EMBL Heidelberg, Germany
Jean-Karim Heriche18k wrote:

Use an XSLT stylesheet. See this post and this one.
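
To make the XSLT suggestion concrete, here is a minimal sketch, not a tested recipe for every BLAST file. The stylesheet copies the whole document unchanged (the usual identity template) and adds one empty template that matches any <Hit> whose <Hit_def> lacks 'Homo sapiens', so those hits produce no output. The Python wrapper, the name `filter_hits_xslt`, the trimmed `SAMPLE` document, and the use of the third-party `lxml` package to run the stylesheet are all assumptions made for illustration.

```python
# XSLT 1.0 stylesheet, held as bytes so it can be parsed directly.
STYLESHEET = b"""\
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Identity template: copy every node and attribute unchanged. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
  <!-- Empty template: a Hit whose Hit_def lacks 'Homo sapiens' is dropped. -->
  <xsl:template match="Hit[not(contains(Hit_def, 'Homo sapiens'))]"/>
</xsl:stylesheet>
"""

# Trimmed-down stand-in for the BLAST output shown in the question.
SAMPLE = b"""\
<BlastOutput>
  <BlastOutput_iterations>
    <Iteration>
      <Iteration_hits>
        <Hit>
          <Hit_num>1</Hit_num>
          <Hit_def>Homo sapiens BAC clone RP11-350B19 from 4, complete sequence</Hit_def>
        </Hit>
        <Hit>
          <Hit_num>2</Hit_num>
          <Hit_def>Ovis canadensis canadensis isolate 43U chromosome 6 sequence</Hit_def>
        </Hit>
      </Iteration_hits>
    </Iteration>
  </BlastOutput_iterations>
</BlastOutput>
"""


def filter_hits_xslt(xml_bytes):
    """Apply STYLESHEET to xml_bytes and return the result as bytes."""
    # Imported lazily: lxml is a third-party package (an assumption here),
    # not part of the Python standard library.
    from lxml import etree

    transform = etree.XSLT(etree.fromstring(STYLESHEET))
    result = transform(etree.fromstring(xml_bytes))
    return etree.tostring(result)
```

Calling `filter_hits_xslt(SAMPLE)` should return the document with only the Homo sapiens hit left; the surviving <Hit_num> values are not renumbered. The same stylesheet, saved to a file, can also be run from the command line with `xsltproc`, for example `xsltproc filter.xsl blast.xml > filtered.xml` (both file names are placeholders).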

2.1 years ago by
Seattle, WA USA
Alex Reynolds27k wrote:

BeautifulSoup may also be an option.
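
A parser-based filter along these lines could look like the sketch below. It deliberately swaps BeautifulSoup for Python's standard-library `xml.etree.ElementTree`, so that nothing has to be installed; the idea (parse, drop the unwanted <Hit> elements, write back) is the same. The function name `filter_hits`, the `KEYWORD` constant, and the trimmed `SAMPLE` document are illustrative assumptions, not part of the original post.

```python
import xml.etree.ElementTree as ET

# Text that <Hit_def> must contain for a hit to be kept.
KEYWORD = "Homo sapiens"


def filter_hits(root, keyword=KEYWORD):
    """Remove every <Hit> whose <Hit_def> lacks `keyword`; return how many were removed."""
    removed = 0
    # <Hit> elements sit inside <Iteration_hits>, one container per query.
    # Materialise the containers first so the tree is not edited mid-traversal.
    for hits in list(root.iter("Iteration_hits")):
        # Loop over a copy of the children because we remove from `hits`.
        for hit in list(hits.findall("Hit")):
            if keyword not in hit.findtext("Hit_def", default=""):
                hits.remove(hit)
                removed += 1
    return removed


# Trimmed-down stand-in for the BLAST output shown in the question.
SAMPLE = """\
<BlastOutput>
  <BlastOutput_iterations>
    <Iteration>
      <Iteration_hits>
        <Hit>
          <Hit_num>1</Hit_num>
          <Hit_def>Homo sapiens BAC clone RP11-350B19 from 4, complete sequence</Hit_def>
        </Hit>
        <Hit>
          <Hit_num>2</Hit_num>
          <Hit_def>Ovis canadensis canadensis isolate 43U chromosome 6 sequence</Hit_def>
        </Hit>
      </Iteration_hits>
    </Iteration>
  </BlastOutput_iterations>
</BlastOutput>
"""

root = ET.fromstring(SAMPLE)
removed = filter_hits(root)
kept = [hit.findtext("Hit_def") for hit in root.iter("Hit")]
print(f"removed {removed} hit(s); kept: {kept}")
```

For a real file, `ET.parse(path)` followed by `filter_hits(tree.getroot())` and `tree.write(out_path, encoding="utf-8", xml_declaration=True)` should do the same job. Two caveats: ElementTree does not keep the DOCTYPE line when writing, and it loads the whole file into memory, which may matter for very large BLAST outputs.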
