Question: Different output fields from efetch
Medhat8.1k wrote:

If I try to get the organism name with efetch on the command line:

efetch -db nuccore -id "NC_001422.1" -format docsum | xtract -pattern DocumentSummary -element Organism

the result is: Escherichia virus phiX174

But if I use this in Python:
handle = Entrez.efetch(db="nuccore", id="NC_001422.1", rettype="docsum")

there is no Organism element in the output. Should I use different parameters in Python,
or use subprocess to run efetch from the command line?

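For example:

import subprocess

filter_cmd = ['xtract', '-pattern', 'DocumentSummary', '-element', 'Organism']
info_name_cmd = ['efetch', '-db', 'nuccore', '-id', 'NC_001422.1', '-format', 'docsum']
# Capture the efetch output as bytes
ps = subprocess.run(info_name_cmd, stdout=subprocess.PIPE, check=True)
# Pass the captured bytes with input=; stdin= expects a file object, not bytes
output = subprocess.check_output(filter_cmd, input=ps.stdout)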

Thanks.


Are you saying that you can't get the organism name when you use Python? It is definitely there in the command-line results:

$ efetch -db nuccore -id NC_001422 -format docsum | grep Organism
        <Organism>Escherichia virus phiX174</Organism>
Reply by genomax

As I stated, I know that the Organism name is there when using efetch from the command line. But try the Python code I posted:
handle = Entrez.efetch(db="nuccore", id="NC_001422.1", rettype="docsum")
It gives only the result you get from running:
wget -O - -q "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&rettype=docsum&retmode=xml&id=NC_001422.1"
as Pierre Lindenbaum suggested, which contains the title but not the Organism (you can try it).

Reply by Medhat

Pierre Lindenbaum116k wrote:

Your query only returns the TaxId, not the organism name:

$ wget -O - -q "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&rettype=docsum&retmode=xml&id=NC_001422.1" 

<!DOCTYPE eSummaryResult PUBLIC "-//NLM//DTD esummary v1 20041029//EN" "https://eutils.ncbi.nlm.nih.gov/eutils/dtd/20041029/esummary-v1.dtd">
<eSummaryResult>

<DocSum>
    <Id>9626372</Id>
    <Item Name="Caption" Type="String">NC_001422</Item>
    <Item Name="Title" Type="String">Coliphage phi-X174, complete genome</Item>
    <Item Name="Extra" Type="String">gi|9626372|ref|NC_001422.1|[9626372]</Item>
    <Item Name="Gi" Type="Integer">9626372</Item>
    <Item Name="CreateDate" Type="String">1993/04/28</Item>
    <Item Name="UpdateDate" Type="String">2018/07/06</Item>
    <Item Name="Flags" Type="Integer">768</Item>
    <Item Name="TaxId" Type="Integer">10847</Item>
    <Item Name="Length" Type="Integer">5386</Item>
    <Item Name="Status" Type="String">live</Item>
    <Item Name="ReplacedBy" Type="String"></Item>
    <Item Name="Comment" Type="String"></Item>
    <Item Name="AccessionVersion" Type="String">NC_001422.1</Item>
</DocSum>
</eSummaryResult>
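
To check which fields a docsum response like this one carries, the Item elements can be collected into a dictionary. This is only a sketch using the Python standard library; the helper name docsum_items is made up for illustration, and the sample is a shortened copy of the output above.

```python
import xml.etree.ElementTree as ET


def docsum_items(xml_text):
    """Return {Item Name: text} for the first <DocSum> in an eSummaryResult document."""
    root = ET.fromstring(xml_text)
    docsum = root.find("DocSum")
    if docsum is None:
        return {}
    # Empty items such as ReplacedBy have no text, so map None to ""
    return {item.get("Name"): (item.text or "") for item in docsum.findall("Item")}


# Shortened copy of the response above (hypothetical test input, not a live query)
sample = """<eSummaryResult>
<DocSum>
    <Id>9626372</Id>
    <Item Name="Caption" Type="String">NC_001422</Item>
    <Item Name="Title" Type="String">Coliphage phi-X174, complete genome</Item>
    <Item Name="TaxId" Type="Integer">10847</Item>
    <Item Name="ReplacedBy" Type="String"></Item>
</DocSum>
</eSummaryResult>"""

fields = docsum_items(sample)
print(sorted(fields))          # names of the fields that are present
print("Organism" in fields)    # the docsum above has TaxId but no Organism item
```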

Using rettype=fasta (with retmode=xml) does return the organism name:

$ wget -O - -q "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&rettype=fasta&retmode=xml&id=NC_001422.1"  | grep -v TSeq_sequence

<!DOCTYPE TSeqSet PUBLIC "-//NCBI//NCBI TSeq/EN" "https://www.ncbi.nlm.nih.gov/dtd/NCBI_TSeq.dtd">
<TSeqSet>
<TSeq>
  <TSeq_seqtype value="nucleotide"/>
  <TSeq_accver>NC_001422.1</TSeq_accver>
  <TSeq_taxid>10847</TSeq_taxid>
  <TSeq_orgname>Escherichia virus phiX174</TSeq_orgname>
  <TSeq_defline>Coliphage phi-X174, complete genome</TSeq_defline>
  <TSeq_length>5386</TSeq_length>
</TSeq>

</TSeqSet>
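
The same request can be made from Python without wget. A minimal sketch, assuming only the standard library: the function names fetch_tseq_xml and organism_names are hypothetical, the URL and parameters are the ones from the wget call above, and the sample is the response shown above. Only the parsing part runs offline; the fetch needs network access.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def fetch_tseq_xml(accession):
    """Download the TSeq XML record for one accession (needs network access)."""
    query = urllib.parse.urlencode(
        {"db": "nucleotide", "rettype": "fasta", "retmode": "xml", "id": accession}
    )
    with urllib.request.urlopen(f"{EFETCH_URL}?{query}") as response:
        return response.read().decode("utf-8")


def organism_names(xml_text):
    """Return the TSeq_orgname text of every record in a TSeqSet document."""
    root = ET.fromstring(xml_text)
    return [element.text for element in root.iter("TSeq_orgname")]


# Offline check against the response shown above (sequence element omitted)
sample = """<TSeqSet>
<TSeq>
  <TSeq_seqtype value="nucleotide"/>
  <TSeq_accver>NC_001422.1</TSeq_accver>
  <TSeq_taxid>10847</TSeq_taxid>
  <TSeq_orgname>Escherichia virus phiX174</TSeq_orgname>
  <TSeq_defline>Coliphage phi-X174, complete genome</TSeq_defline>
  <TSeq_length>5386</TSeq_length>
</TSeq>
</TSeqSet>"""

names = organism_names(sample)
print(names)
```

For a live lookup, organism_names(fetch_tseq_xml("NC_001422.1")) should give the same kind of list, although the full record also includes the sequence, so the download is larger than a docsum.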