Question: Attempting to Utilise the New Entrez Direct Package but Having Difficulty with PubMed and Nucleotide XML Parsing
Daniel (Cardiff University) wrote, 7.0 years ago:

The new tool appears to do exactly what I want and I was keen to try it out, but I'm having some difficulty.

I am trying to retrieve sequences of a single gene from across a taxonomic subtree and produce a table of Accession, Author(s), Affiliation and Title. This is for collaborators, so they can confirm which sources they trust; I will pull out the chosen FASTA sequences at a later date.

For PubMed records the documentation is quite clear, and I can confirm this example works for me:

esearch -db pubmed -query "Garber ED [AUTH] AND PNAS [JOUR]" | elink -related | efilter -query "mouse" | efetch -format docsum | xtract -pattern DocumentSummary -element Id SortFirstAuthor Title

However, when I search the nucleotide database I cannot retrieve 'Authors' or any other details with xtract, and I can't find any examples of doing so.

My best attempt is as follows, but it only gives the Id:

esearch -db nucleotide -query "txid2836[Organism:exp] AND rbcl[GENE]" | efetch -format docsum | xtract -pattern DocumentSummary -element Id Authors
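A hedged explanation for the Id-only result: as far as I know, the DocumentSummary that nucleotide (nuccore) returns has no Authors field, so -element Authors has nothing to print; reference details live in the full record instead. The summary does appear to carry fields such as AccessionVersion, Title and Organism. The sketch below wraps the same query in a shell function that asks for those fields; the function name nuc_docsum_table is hypothetical, and the field names are my assumption about the current nuccore summary, so confirm them with xtract -outline on the docsum output before relying on them.

```shell
# Sketch, assuming EDirect (esearch, efetch, xtract) is installed and on PATH.
# nuc_docsum_table is a hypothetical helper name, not part of EDirect.
# AccessionVersion, Title and Organism are ASSUMED nuccore DocumentSummary fields.
nuc_docsum_table() {
  local query="$1"
  esearch -db nucleotide -query "$query" |
    efetch -format docsum |
    xtract -pattern DocumentSummary -element AccessionVersion Title Organism
}
```

Usage: nuc_docsum_table "txid2836[Organism:exp] AND rbcl[GENE]" should print one tab-separated line per record, but no authors, since the summary does not hold them.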

Alternatively, I have tried using efetch -format xml and extracting the information with xtract, but I can't work out how to select the correct hierarchy level. The documentation says:

The xtract function is used for processing XML data:

Exploration Argument Hierarchy
-pattern       (Highest Rank)
-unit          (Lowest Rank)

One such attempt looks like this:

esearch -db nucleotide -query "txid2836[Organism:exp] AND rbcl[GENE]" | efetch -format xml | xtract -division Authors -unit Name
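For the Accession / Author(s) / Affiliation / Title table, one option is to ask efetch for INSDSeq XML with -format gbc, which is flatter than the default ASN.1-derived XML: each record is an INSDSeq element containing INSDReference elements. Note also that xtract expects -pattern to name the record, with lower exploration levels such as -block looping inside it, which is probably why -division on its own gives nothing here. This is a sketch under assumptions: rbcl_reference_table is a hypothetical name, and treating INSDReference_journal as the affiliation relies on the GenBank convention that a "Direct Submission" reference's journal line reads "Submitted (date)" followed by the submitter's address.

```shell
# Sketch, assuming EDirect is installed and that `efetch -format gbc`
# returns INSDSeq XML for nucleotide records.
# rbcl_reference_table is a hypothetical helper, not part of EDirect.
#   -pattern INSDSeq       : one output line per sequence record
#   -block INSDReference   : loop over each reference inside that record
#   -sep ", "              : join repeated INSDAuthor values within a reference
#   INSDReference_journal  : "Submitted (...)" plus address for direct
#                            submissions, the nearest thing to an affiliation
rbcl_reference_table() {
  local query="$1"
  esearch -db nucleotide -query "$query" |
    efetch -format gbc |
    xtract -pattern INSDSeq -element INSDSeq_accession-version \
      -block INSDReference -sep ", " \
      -element INSDAuthor INSDReference_title INSDReference_journal
}
```

Usage: rbcl_reference_table "txid2836[Organism:exp] AND rbcl[GENE]" > rbcl_refs.tsv. Each record comes out as one tab-separated line, the accession followed by an authors/title/journal group for every reference, so records with several references produce wider lines. xtract skips missing fields rather than leaving them blank, so columns can shift when, for example, a reference lists only a consortium.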
Neilfws (Sydney, Australia) wrote, 7.0 years ago:

Currently I can't get your second example to run to completion, so I'll use a simpler query:

esearch -db nucleotide -query "NM_182762.3"

The first step is to run xtract with the -outline option to see the structure of the XML:

esearch -db nucleotide -query "NM_182762.3" | efetch -format xml | xtract -outline > ed.out
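The outline for a full nucleotide record can run to hundreds of lines, so it can help to search it for the element names you are interested in, with line numbers so you can then look at the surrounding hierarchy. A small sketch, assuming ed.out was written by the command above; find_outline_elements is a hypothetical helper, and the search terms are only examples.

```shell
# Sketch: print the lines of an `xtract -outline` file that match an
# extended regular expression (case-insensitive), prefixed by line number.
# find_outline_elements is a hypothetical helper, not part of EDirect.
find_outline_elements() {
  local file="$1" pattern="$2"
  grep -n -i -E -- "$pattern" "$file"
}
```

Usage: find_outline_elements ed.out 'author|person-id'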

If you examine the file ed.out, you will see the hierarchy for an author:


Now run xtract again, setting -pattern to the element one level above the -element you want in the hierarchy:

esearch -db nuccore -query "NM_182762.3" | efetch -format xml | xtract -pattern Person-id -element Person-id_ml | head -10

Ren B
Zakharov V
Yang Q
McMahon L
Yu J
Cao W
Xie C 
Wu J
Yun J
Lai J
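Since the eventual goal is a table, the one-name-per-line output can be collapsed into a single comma-separated field with standard tools. A sketch, assuming a single-record query like the one above (all input is treated as one record); join_unique_names is a hypothetical helper, and dropping repeats assumes an author listed under several references should appear only once.

```shell
# Sketch: join names read from stdin into one comma-separated line,
# keeping first-seen order and dropping repeated names.
# join_unique_names is a hypothetical helper, not part of EDirect.
join_unique_names() {
  awk '!seen[$0]++' | paste -sd, -
}
```

Usage: esearch -db nuccore -query "NM_182762.3" | efetch -format xml | xtract -pattern Person-id -element Person-id_ml | join_unique_names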

It's worth spending some time with the complete EDirect documentation rather than just the simplified introduction.
