Question: edirect: Entrez Unix Command line
st.ph.n (Philadelphia, PA) wrote, 4.6 years ago:

Hi All. I'm trying to output a search using the Entrez Direct: E-utilities on the UNIX Command Line. What I want to do is search using esearch, and using xtract output the following format:

pubmedid        First Author Last Name / FA first initial       First Author Affiliation        Date Published

 

I can get some of the output I need using two different commands, but combining them seems trickier than it should be. The problem, I think, lies with the two different formats: docsum and xml.

The first command I've been playing with:

./esearch -db pubmed -query "search string" | ./efilter -mindate 2005 | ./efetch -format docsum | ./xtract -pattern DocumentSummary -element MedlineCitation/PMID -element Id SortFirstAuthor | sort -t $'\t' -k 3,3n -k 2,2f

 

So that outputs the first two columns as desired. However, the docsum format doesn't contain information about affiliation.

Using this command:

./esearch -db pubmed -query "search string" | ./efilter -mindate 2005 | ./efetch -format xml | ./xtract -pattern PubmedArticle -element MedlineCitation/PMID -element Id SortFirstAuthor Affiliation -block PubDate -sep " " -element Year,Month MedlineDate | sort -t $'\t' -k 3,3n -k 2,2f

I get the pubmedid, and all affiliations of every author on each publication.

Does anyone know how I might tweak either of these commands? Is it possible to ignore all authors except the first author?

All help is appreciated.

modified 4.6 years ago by Pierre Lindenbaum • written 4.6 years ago by st.ph.n

Ashutosh Pandey (Philadelphia) wrote, 4.6 years ago:

You should use the "efetch" E-utility for this task. You could write your own XML parser, but I would not advise reinventing the wheel, as this can be done quite easily with Biopython or BioPerl. Here is how it can be done in Biopython:

from Bio import Entrez
from Bio import Medline

Entrez.email = 'xyz@gmail.com'  # replace it with your own email address

# Search PubMed for a string and collect the matching PMIDs.
handle = Entrez.esearch(db="pubmed", term="Ignorome")
record = Entrez.read(handle)
handle.close()


def Pubmedsearch(pmid):
    """Return PMID, title, first author, first affiliation and date as one tab-separated line."""
    handle = Entrez.efetch(db="pubmed", id=pmid, rettype="medline", retmode="text")
    records = list(Medline.parse(handle))
    handle.close()
    for rec in records:
        # TI > Title, FAU > Full Author Name, AU > Author name, AD > Affiliation, DP > Publication date
        # (More tags can be added from here http://www.ncbi.nlm.nih.gov/books/NBK3827/)
        authors = rec.get("AU", ["?"])
        # Depending on the Biopython version, AD is a single string or a list of strings.
        affiliation = rec.get("AD", "?")
        if isinstance(affiliation, list):
            affiliation = affiliation[0] if affiliation else "?"
        return "\t".join([str(pmid), rec.get("TI", "?"), authors[0],
                          affiliation.split(".")[0], rec.get("DP", "?")])
    return None


## Only the first element of the author list is used because you needed the first author.
## Similarly, the AD string is split on "." to get the affiliation of the first author.
## Test: this is my paper. The title is printed as well.
for pmid in record["IdList"]:
    info = Pubmedsearch(pmid)
    print(info)

###Output

24523945        Functionally enigmatic genes: a case study of the brain ignorome.       Pandey AK       UT Center for Integrative and Translational Genomics and Department of Anatomy and Neurobiology, University of Tennessee Health Science Center, Memphis, Tennessee, United States of America  2014

EDIT: It now takes a string as input, retrieves all the PMIDs from PubMed related to that string, and then loops over each PMID and returns the first author information.

modified 4.6 years ago • written 4.6 years ago by Ashutosh Pandey

Can efetch be used to search for all publications matching a string, similar to my example where esearch is used?

modified 4.6 years ago • written 4.6 years ago by st.ph.n

I didn't read your question carefully. I thought you needed author information for a specific PubMed ID. EFetch works when you already have a PubMed ID. It doesn't work with all the Entrez databases. You will have to write an ESearch -> EFetch pipeline as explained here: http://www.ncbi.nlm.nih.gov/books/NBK25498/#chapter3.ESearch__ESummaryEFetch OR you can use the Biopython module that wraps ESearch and returns a list of PMIDs, and then get the author information using EFetch. I am sure Biopython will have some module for ESearch.

modified 4.6 years ago • written 4.6 years ago by Ashutosh Pandey
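For anyone who wants to see what the ESearch -> EFetch pipeline looks like at the URL level, here is a minimal sketch using only the Python standard library. It only builds the request URLs and reads the history-server handles out of an ESearch response; it does not send anything. The function names (`esearch_url`, `parse_history`, `efetch_url`) are made up for this illustration, and the date filter assumes you want publication dates (`datetype=pdat`) with both ends of the range given.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(term, mindate=None, maxdate=None):
    """Build an ESearch URL that keeps the hits on the history server."""
    params = {"db": "pubmed", "term": term, "usehistory": "y"}
    if mindate is not None and maxdate is not None:
        # mindate and maxdate are given together, with a date type.
        params.update({"datetype": "pdat",
                       "mindate": str(mindate),
                       "maxdate": str(maxdate)})
    return EUTILS + "/esearch.fcgi?" + urlencode(params)


def parse_history(esearch_xml):
    """Pull WebEnv and QueryKey out of an ESearch XML response."""
    root = ET.fromstring(esearch_xml)
    return root.findtext("WebEnv"), root.findtext("QueryKey")


def efetch_url(webenv, query_key, retmax=100):
    """Build an EFetch URL that fetches the stored hits as PubMed XML."""
    params = {"db": "pubmed", "WebEnv": webenv, "query_key": query_key,
              "retmode": "xml", "retmax": retmax}
    return EUTILS + "/efetch.fcgi?" + urlencode(params)


if __name__ == "__main__":
    print(esearch_url("coral genomics", mindate=2005, maxdate=2015))
```

You would request the first URL, pass the response text to `parse_history`, and then request the URL returned by `efetch_url`. Keeping the hits on the history server avoids passing a long list of PMIDs around between the two calls.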

Go through this tutorial ( http://people.duke.edu/~ccc14/pcfb/biopython/BiopythonEntrez.html ). You can see how it uses a string to search PubMed for all the PMIDs matching that string using "esearch". Once you have the PMIDs, you can use "efetch".

modified 4.6 years ago • written 4.6 years ago by Ashutosh Pandey

Ok, so I will modify the first command,

./esearch -db pubmed -query "coral genomics" | ./efilter -mindate 2005 | ./efetch -format xml | ./xtract -pattern PubmedArticle -element MedlineCitation/PMID 

to now only print pubmedid's. From there, I can use BioPython, with your example, to get the information, per pubmedid. What I really want is only the City/State/Country from each first author affiliation, and can parse that out later.

written 4.6 years ago by st.ph.n

You will have to write your own code to extract information at that level of detail. Try to understand the structure of the output and then write a parser that retrieves the city, country, etc. PubMed does not expose the "affiliation" information as separate city, state, and country fields. It comes back as one long affiliation string, and you will have to parse whatever you need from that string. PS: I have updated a link in my previous comment. Please go through that tutorial; it should be very helpful. Thanks.

modified 4.6 years ago • written 4.6 years ago by Ashutosh Pandey
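As a rough starting point for that kind of parsing, here is a small sketch that treats the affiliation as a comma-separated string and keeps the last three fields. This is only a heuristic and an assumption on my part: affiliation strings are free text, so "city, state, country" will not always be the last three fields, and `guess_location` is a made-up helper name.

```python
import re


def guess_location(affiliation):
    """Heuristically return the trailing comma-separated fields of an affiliation."""
    # Drop a trailing e-mail address and the final period, if present.
    text = re.sub(r"\s*\S+@\S+\s*$", "", affiliation.strip()).rstrip(". ")
    parts = [p.strip() for p in text.split(",") if p.strip()]
    # Assumption: the last three fields are city, state/region, country.
    return parts[-3:]


if __name__ == "__main__":
    example = ("UT Center for Integrative and Translational Genomics and "
               "Department of Anatomy and Neurobiology, University of Tennessee "
               "Health Science Center, Memphis, Tennessee, United States of America")
    print(guess_location(example))
    # prints ['Memphis', 'Tennessee', 'United States of America']
```

Affiliations with fewer than three fields simply return whatever fields exist, so it is worth checking the results by eye before geocoding them.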

Please check the new code. 

written 4.6 years ago by Ashutosh Pandey

Thanks, Ashutosh. I'm familiar with Python and know that Biopython can deal with these kinds of tasks. I was hoping to get it all done with Entrez Direct (edirect).

written 4.6 years ago by st.ph.n

Well, I also have code that doesn't use Biopython; it uses pure E-utilities and my own XML parser. But I think Pierre's code would be much better than mine. Good luck.

written 4.6 years ago by Ashutosh Pandey

Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087) wrote, 4.6 years ago:
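For illustration, a parser for the `efetch -format xml` output does not have to be long. The sketch below is not the code mentioned above; it is a minimal example using Python's standard `xml.etree.ElementTree`, assuming the usual PubMed XML layout (`PubmedArticle`, `AuthorList/Author`, `Affiliation`, `PubDate`). It keeps only the first author and that author's first affiliation, which is what the original question asked for. The function name `first_author_rows` is hypothetical.

```python
import xml.etree.ElementTree as ET


def first_author_rows(xml_text):
    """Return one tab-separated row per article: PMID, first author, affiliation, date."""
    rows = []
    root = ET.fromstring(xml_text)
    for article in root.iter("PubmedArticle"):
        pmid = article.findtext("MedlineCitation/PMID", default="?")
        # find() returns the first match in document order, i.e. the first author.
        author = article.find(".//AuthorList/Author")
        if author is not None:
            last = author.findtext("LastName", default="?")
            initials = author.findtext("Initials", default="")
            name = f"{last} {initials[:1]}".strip()
            # Matches Affiliation directly under Author or nested one level down.
            affiliation = author.findtext(".//Affiliation", default="?")
        else:
            name = affiliation = "?"
        pubdate = article.find(".//JournalIssue/PubDate")
        if pubdate is not None:
            # Joins Year/Month/Day, or the single MedlineDate element.
            date = " ".join(child.text for child in pubdate if child.text)
        else:
            date = "?"
        rows.append("\t".join([pmid, name, affiliation, date]))
    return rows
```

Saving the output of `efetch -format xml` to a file and passing its contents to `first_author_rows` gives rows in the "pubmedid, last name and first initial, affiliation, date" layout, which can then be sorted or parsed further.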

FYI: you might be interested in my old blog post "Mapping NCBI/PUBMED" (2007) : http://plindenbaum.blogspot.fr/2007/06/mapping-ncbipubmed.html

http://code.google.com/p/lindenb/source/browse/trunk/src/java/org/lindenb/tool/oneshot/NCBIMap.java

written 4.6 years ago by Pierre Lindenbaum

Thanks, Pierre. What you did looks like what I am trying to accomplish. I plan to parse out the city/state/country information from the affiliation of the first author for each publication found on a given topic (hence esearch instead of efetch). Then I will use geopy to get the longitude/latitude for those cities and map them onto a basemap using networkx and matplotlib.

written 4.6 years ago by st.ph.n