Question: Using Biopython to search through a list of authors on pubmed
flisstee wrote (21 months ago):

I am currently writing a programme using Biopython, and part of it involves searching PubMed using an entry specified by the user. This returns all the records that match this entry. I then want to search just these records for a specific author, again specified by the user, but I am having trouble working out how to do this. If anyone can help, it would be much appreciated.

Here is my code so far; it is largely taken from the Biopython cookbook documentation. For this example, I am searching for the word "brainbow" and the author "Currie" (which I believe should be there).

from Bio import Entrez, SeqIO, Medline
Entrez.email = "A.N.Other@example.com" 

def pbmd_search(): #searches pubmed database, using Biopython documentation
    handle = Entrez.egquery(term="brainbow")
    record = Entrez.read(handle)
    for row in record["eGQueryResult"]:
        if row["DbName"]=="pubmed":
            print(str(row["Count"]) + " records returned")

    handle = Entrez.esearch(db="pubmed", term="brainbow", retmax=1000)
    record = Entrez.read(handle)
    idlist = record["IdList"]

    handle = Entrez.efetch(db="pubmed", id=idlist, rettype="medline", retmode="text")
    records = Medline.parse(handle)
    records = list(records)

    records_str = []

    for record in records:
        records_str += "Title: %s \nAuthors: %s \nSource: %s\n\n" %(record.get("TI"), ", ".join(record.get("AU")), record.get("SO"))

        search_author = "Currie"
        print(search_author)
        if search_author in records_str:
            print("yes")
        else:
            print("no")
pubmed biopython • 1.0k views
modified 21 months ago • written 21 months ago by flisstee

Can you specify what the issue is? What's the reason for not searching for "brainbow AND currie"?

modified 21 months ago • written 21 months ago by WouterDeCoster

What I want to do is to search first for "brainbow," and then to search that list of records for "Currie", i.e. I'm only interested in records that match "brainbow" and that also have an author called "Currie". However, I want it to work so that the user doesn't have to search for both at the same time.

Ps. I'm not an expert programmer so I may be missing something obvious!

modified 21 months ago • written 21 months ago by flisstee

So you'd search for brainbow AND currie[Author]? Programming looks fine, I'm a bit at a loss about the logic ;)

written 21 months ago by WouterDeCoster

Okay, so this is part of a GUI programme that uses GTK3. I have a window with 2 buttons and 2 entry widgets: one button is linked to the code that searches PubMed (e.g. for "brainbow"), and the other button is linked to searching for an author (e.g. "Currie"). My idea was that users can search for records without necessarily also searching for an author.

written 21 months ago by flisstee

Then you could still use the input from both entry widgets to build a single search term? But okay, you seem quite convinced about your approach. Is the problem that the search_author isn't found? Or what goes wrong in your code?

written 21 months ago by WouterDeCoster
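
For readers following the suggestion above, here is a minimal sketch of how the two entry widgets could feed one search term while still letting the author be optional. The helper name build_term is hypothetical and not part of Biopython; the only thing taken from the thread is the query form brainbow AND currie[Author].

```python
def build_term(keyword, author=None):
    """Combine a keyword and an optional author into one PubMed query string.

    build_term is a hypothetical helper for illustration only.
    """
    keyword = keyword.strip()
    # Only add the author clause when the author entry is non-empty
    if author and author.strip():
        return "%s AND %s[Author]" % (keyword, author.strip())
    return keyword


print(build_term("brainbow"))            # brainbow
print(build_term("brainbow", "Currie"))  # brainbow AND Currie[Author]
```

The resulting string could then be passed as the term argument of Entrez.esearch, so leaving the author entry blank gives the plain keyword search.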

Yes, so the bit that goes wrong is the

if search_author in records_str:
    print("yes")
else: 
    print("no")

The search_author isn't found.

written 21 months ago by flisstee

And have you confirmed that records_str contains Currie? Because when I use your code, records_str doesn't look like something useful ;)

You probably need records_str.append() instead of +=. Even then, your if search_author in records_str: won't work, because records_str is a list and the in test compares search_author against whole elements. You'll need to iterate over the list and, for every element, ask whether search_author is in that element.

Note that you can simplify a lot too, but let's first try to get it working correctly before further changing stuff.

written 21 months ago by WouterDeCoster
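
The two points in the comment above can be seen in a small standalone sketch. The entry string is made-up placeholder data in the same format as the question's code, and author_in_any is a hypothetical helper name.

```python
# Placeholder entry in the same format the question builds (not real PubMed data)
entry = "Title: Example \nAuthors: Currie JD, Smith A \nSource: Example source\n\n"

as_chars = []
as_chars += entry        # += with a string extends the list one character at a time

as_items = []
as_items.append(entry)   # append adds the whole string as a single element

print(len(as_chars) == len(entry))  # True
print(len(as_items))                # 1


def author_in_any(search_author, entries):
    """Return True if search_author is a substring of any entry in the list."""
    return any(search_author in e for e in entries)


print("Currie" in as_items)               # False: compares against whole elements
print(author_in_any("Currie", as_items))  # True: substring test per element
```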

Thanks for your help, I have now solved my problem.

written 21 months ago by flisstee
flisstee wrote (21 months ago):

This now searches pubmed and then searches those results for a specific author.

from Bio import Entrez, Medline
Entrez.email = "A.N.Other@example.com" 

def pbmd_search(): #searches pubmed database, using Biopython documentation
    handle = Entrez.egquery(term="brainbow")
    record = Entrez.read(handle)
    for row in record["eGQueryResult"]:
        if row["DbName"]=="pubmed":
            print(str(row["Count"]) + " records returned")

    handle = Entrez.esearch(db="pubmed", term="brainbow", retmax=463)
    record = Entrez.read(handle)
    idlist = record["IdList"]

    handle = Entrez.efetch(db="pubmed", id=idlist, rettype="medline", retmode="text")
    records = Medline.parse(handle)
    records = list(records)
    records_str = []
    for record in records:
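        # append keeps each entry as one list element; get("AU", []) copes with records that have no authors
        records_str.append("Title: %s \nAuthors: %s \nSource: %s\n\n" %(record.get("TI"), ", ".join(record.get("AU", [])), record.get("SO")))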

    print("".join(records_str))

    search_author = "Currie JD"
    author_result = []

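    for record in records:
        # skip records that have no author field at all
        if "AU" not in record:
            continue
        # record["AU"] is a list of names, so this is an exact match on one author entry
        if search_author in record["AU"]:
            author_result.append("Title: %s \nAuthors: %s \nSource: %s\n\n" %(record.get("TI"), ", ".join(record.get("AU")), record.get("SO")))

    # report once, after every record has been checked
    if author_result:
        print(search_author)
    else:
        print("Author not found")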

    print("".join(author_result))

pbmd_search()
written 21 months ago by flisstee
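
Because record["AU"] is a list, the check in the code above only matches the exact string "Currie JD". If the user types just a surname, a case-insensitive comparison on the surname part may be more forgiving. This is a minimal sketch, assuming author entries follow the "Surname Initials" pattern seen in the thread; filter_by_author is a hypothetical helper and the records are made-up placeholders, not real PubMed results.

```python
def filter_by_author(records, surname):
    """Return the records with an author whose surname matches, ignoring case.

    Assumes each AU entry looks like "Surname Initials" (e.g. "Currie JD").
    """
    wanted = surname.strip().lower()
    matches = []
    for record in records:
        for name in record.get("AU", []):   # records without AU are skipped
            lowered = name.lower()
            # Drop the trailing initials, keeping multi-word surnames intact
            surname_part = lowered.rsplit(" ", 1)[0]
            if wanted in (lowered, surname_part):
                matches.append(record)
                break                        # one match per record is enough
    return matches


# Placeholder records shaped like Medline dictionaries (not real data)
example_records = [
    {"TI": "Placeholder title one", "AU": ["Currie JD", "Smith A"]},
    {"TI": "Placeholder title two", "AU": ["Jones B"]},
    {"TI": "Placeholder title three"},
]

for record in filter_by_author(example_records, "currie"):
    print(record["TI"])   # Placeholder title one
```

The same function accepts the full form as well, so "Currie JD" and "currie" both select the first placeholder record.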

Why this part:

handle = Entrez.egquery(term="brainbow")
record = Entrez.read(handle)
for row in record["eGQueryResult"]:
    if row["DbName"]=="pubmed":
        print(str(row["Count"]) + " records returned")

Couldn't you just use the length of the idlist to determine the number of records?

written 21 months ago by WouterDeCoster
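
As a rough illustration of the suggestion above: the parsed esearch result already carries the information needed. One caveat is that len(idlist) is capped by retmax, whereas the "Count" field of the same result reports the total number of matches. In this sketch summarize_search and run_search are hypothetical helper names, and the example dictionary is a stand-in with placeholder IDs, not a real response.

```python
def summarize_search(result):
    """Return (total matches, number of IDs actually returned) for an esearch result."""
    total = int(result["Count"])      # total matches reported by esearch
    fetched = len(result["IdList"])   # limited by the retmax argument
    return total, fetched


def run_search(term, retmax=1000):
    """Query PubMed; needs Biopython and network access, so it is not called here."""
    from Bio import Entrez            # imported lazily so the sketch runs offline
    Entrez.email = "A.N.Other@example.com"
    handle = Entrez.esearch(db="pubmed", term=term, retmax=retmax)
    result = Entrez.read(handle)
    handle.close()
    return result


# Stand-in for a parsed esearch result (placeholder values)
example = {"Count": "463", "IdList": ["111", "222", "333"]}
total, fetched = summarize_search(example)
print("%d records matched, %d IDs returned" % (total, fetched))
```

With a live query, summarize_search(run_search("brainbow")) would replace the separate egquery call.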

I took this part straight out of the Biopython documentation. It seems to work, so I didn't think it needed altering.

written 21 months ago by flisstee