Question: Different output fields from efetch
Medhat wrote (3 days ago):

If I try to get the organism name using efetch like this:

    efetch -db nuccore -id "NC_001422.1" -format docsum | xtract -pattern DocumentSummary -element Organism

the result is: Escherichia virus phiX174

However, if I use Biopython:

    handle = Entrez.efetch(db="nuccore", id="NC_001422.1", rettype="docsum")

there is no Organism element in the output. Should I use different parameters in Python, or use subprocess to run efetch from the command line, like this?


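    import subprocess
    # Equivalent of: efetch ... -format docsum | xtract -pattern DocumentSummary -element Organism
    info_name_cmd = ['efetch', '-db', 'nuccore', '-id', 'NC_001422.1', '-format', 'docsum']
    filter_cmd = ['xtract', '-pattern', 'DocumentSummary', '-element', 'Organism']
    ps = subprocess.Popen(info_name_cmd, stdout=subprocess.PIPE)
    output = subprocess.check_output(filter_cmd, stdin=ps.stdout)
    ps.stdout.close()  # release our end of the pipe
    ps.wait()          # reap the efetch process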
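
If the only reason for the subprocess pipeline is the xtract step, that filtering can also be done in plain Python. A minimal sketch, assuming efetch's docsum output is `<DocumentSummary>` XML with an `<Organism>` child (which is what `-pattern DocumentSummary -element Organism` implies); `organism_from_docsum` is a hypothetical helper name, not part of EDirect or Biopython.

```python
import xml.etree.ElementTree as ET


def organism_from_docsum(xml_text):
    """Pure-Python stand-in for `xtract -pattern DocumentSummary -element Organism`.

    Returns the <Organism> text of every <DocumentSummary> element found.
    """
    root = ET.fromstring(xml_text)
    # iter() searches the whole tree (including the root itself),
    # so this works whatever element wraps the summaries.
    return [ds.findtext("Organism", default="") for ds in root.iter("DocumentSummary")]


# Hypothetical usage (needs EDirect's efetch on PATH; only xtract is replaced):
# import subprocess
# xml_text = subprocess.check_output(
#     ["efetch", "-db", "nuccore", "-id", "NC_001422.1", "-format", "docsum"], text=True)
# print(organism_from_docsum(xml_text))
```

This still depends on the command-line efetch producing the DocumentSummary layout; it only removes the second process.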



So you can't get the organism name when using Python? It is definitely in the results:

$ efetch -db nuccore -id NC_001422 -format docsum | grep Organism
        <Organism>Escherichia virus phiX174</Organism>
(reply by genomax, 3 days ago)
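
The `<DocumentSummary>`/`<Organism>` layout above (the one `xtract -pattern DocumentSummary` targets) is the ESummary version 2.0 format, whereas the Python call returns the older `<Item Name="...">` layout shown in the answer below. A minimal sketch of requesting version 2.0 directly with the standard library, assuming the `esummary.fcgi` endpoint accepts the accession as `id` the way efetch does here (if not, resolve it to a UID first); `esummary_v2_url` and `fetch_summary_xml` are hypothetical names. With Biopython the analogous call would presumably be `Entrez.esummary(db="nuccore", id="NC_001422.1", version="2.0")`, since extra keyword arguments are passed through as URL parameters; check that against your Biopython version.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# NCBI E-utilities ESummary endpoint.
ESUMMARY = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi"


def esummary_v2_url(db, uid):
    """Build an ESummary URL asking for the version 2.0 (<DocumentSummary>) layout."""
    return ESUMMARY + "?" + urlencode({"db": db, "id": uid, "version": "2.0"})


def fetch_summary_xml(db, uid):
    """Download the summary XML (network access required).

    For real use, NCBI asks clients to identify themselves (email/api_key)
    and to respect its request-rate limits; that is omitted here.
    """
    with urlopen(esummary_v2_url(db, uid)) as response:
        return response.read().decode("utf-8")


# Hypothetical usage (needs network):
# xml_text = fetch_summary_xml("nuccore", "NC_001422.1")
# then read <Organism> from each <DocumentSummary> in xml_text.
```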

As I said, I know the Organism name is present when running efetch from the command line. But try the Python code I posted:

    handle = Entrez.efetch(db="nuccore", id="NC_001422.1", rettype="docsum")

It returns only what you get from running

    wget -O - -q ""

as Pierre Lindenbaum suggested, which contains the Title but no Organism (you can try it).

(reply by Medhat, 3 days ago)
Answer by Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087), 3 days ago:

Your query only returns the TaxId, not the organism name:

$ wget -O - -q ""

    <Item Name="Caption" Type="String">NC_001422</Item>
    <Item Name="Title" Type="String">Coliphage phi-X174, complete genome</Item>
    <Item Name="Extra" Type="String">gi|9626372|ref|NC_001422.1|[9626372]</Item>
    <Item Name="Gi" Type="Integer">9626372</Item>
    <Item Name="CreateDate" Type="String">1993/04/28</Item>
    <Item Name="UpdateDate" Type="String">2018/07/06</Item>
    <Item Name="Flags" Type="Integer">768</Item>
    <Item Name="TaxId" Type="Integer">10847</Item>
    <Item Name="Length" Type="Integer">5386</Item>
    <Item Name="Status" Type="String">live</Item>
    <Item Name="ReplacedBy" Type="String"></Item>
    <Item Name="Comment" Type="String"></Item>
    <Item Name="AccessionVersion" Type="String">NC_001422.1</Item>
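
To check that this older DocSum layout carries only the TaxId, the `<Item>` elements can be read into a dict keyed by their `Name` attribute. A minimal standard-library sketch; `docsum_items` is a hypothetical helper, and the sample is a trimmed copy of the output above, wrapped in the `<eSummaryResult>`/`<DocSum>` elements assumed to surround it.

```python
import xml.etree.ElementTree as ET


def docsum_items(xml_text):
    """Map Item Name -> text for each <DocSum> in a version 1.0 summary
    (the <Item Name="..." Type="..."> layout shown above)."""
    root = ET.fromstring(xml_text)
    return [
        {item.get("Name"): item.text or "" for item in docsum.findall("Item")}
        for docsum in root.iter("DocSum")
    ]


# Trimmed copy of the output above, wrapped in eSummaryResult/DocSum.
SAMPLE = """<eSummaryResult><DocSum><Id>9626372</Id>
<Item Name="Title" Type="String">Coliphage phi-X174, complete genome</Item>
<Item Name="TaxId" Type="Integer">10847</Item>
<Item Name="AccessionVersion" Type="String">NC_001422.1</Item>
</DocSum></eSummaryResult>"""

summary = docsum_items(SAMPLE)[0]
print(summary["TaxId"])        # 10847
print("Organism" in summary)   # False
```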

Requesting the record as FASTA in XML form (rettype=fasta, retmode=xml, i.e. TinySeq XML) returns the organism name:

$ wget -O - -q "" | grep -v TSeq_sequence
  <TSeq_seqtype value="nucleotide"/>
  <TSeq_orgname>Escherichia virus phiX174</TSeq_orgname>
  <TSeq_defline>Coliphage phi-X174, complete genome</TSeq_defline>
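
The TinySeq output can be parsed the same way. A minimal sketch assuming the response is a `<TSeqSet>` of `<TSeq>` records, as the excerpt suggests; `orgnames_from_tinyseq` is a hypothetical helper. In Biopython the corresponding request would presumably be `Entrez.efetch(db="nuccore", id="NC_001422.1", rettype="fasta", retmode="xml")`; that parameter combination is an assumption, since the URL in the answer is not shown.

```python
import xml.etree.ElementTree as ET


def orgnames_from_tinyseq(xml_text):
    """Return the <TSeq_orgname> of every <TSeq> record in TinySeq XML."""
    root = ET.fromstring(xml_text)
    return [tseq.findtext("TSeq_orgname", default="") for tseq in root.iter("TSeq")]


# Trimmed copy of the output above (sequence removed), wrapped in TSeqSet/TSeq.
SAMPLE = """<TSeqSet><TSeq>
<TSeq_seqtype value="nucleotide"/>
<TSeq_orgname>Escherichia virus phiX174</TSeq_orgname>
<TSeq_defline>Coliphage phi-X174, complete genome</TSeq_defline>
</TSeq></TSeqSet>"""

print(orgnames_from_tinyseq(SAMPLE))  # ['Escherichia virus phiX174']
```

Note that this route downloads the full sequence as well (the grep above only hides TSeq_sequence), so for long records a summary-based request is lighter.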
