Using Biopython to search through a list of authors on PubMed
7.4 years ago
flisstee • 0

I am currently writing a programme using Biopython, and part of it involves searching PubMed with a term specified by the user. This returns all the records that match the term. I then want to search just those records for a specific author, again specified by the user, but I am having trouble working out how to do this. If anyone can help, it would be much appreciated.

Here is my code so far; it is largely taken from the Biopython cookbook documentation. For this example, I am searching for the word "brainbow" and the author "Currie" (who I believe should be there).

from Bio import Entrez, SeqIO, Medline
Entrez.email = "A.N.Other@example.com" 

def pbmd_search(): #searches pubmed database, using Biopython documentation
    handle = Entrez.egquery(term="brainbow")
    record = Entrez.read(handle)
    for row in record["eGQueryResult"]:
        if row["DbName"]=="pubmed":
            print(str(row["Count"]) + " records returned")

    handle = Entrez.esearch(db="pubmed", term="brainbow", retmax=1000)
    record = Entrez.read(handle)
    idlist = record["IdList"]

    handle = Entrez.efetch(db="pubmed", id=idlist, rettype="medline", retmode="text")
    records = Medline.parse(handle)
    records = list(records)

    records_str = []

    for record in records:
        records_str += "Title: %s \nAuthors: %s \nSource: %s\n\n" %(record.get("TI"), ", ".join(record.get("AU")), record.get("SO"))

        search_author = "Currie"
        print(search_author)
        if search_author in records_str:
            print("yes")
        else:
            print("no")
Biopython pubmed • 4.7k views

Can you specify what the issue is? What's the reason for not searching for "brainbow AND currie"?


What I want to do is to search first for "brainbow," and then to search that list of records for "Currie", i.e. I'm only interested in records that match "brainbow" and that also have an author called "Currie". However, I want it to work so that the user doesn't have to search for both at the same time.

P.S. I'm not an expert programmer, so I may be missing something obvious!


So you'd search for brainbow AND currie[Author]? The programming looks fine, but I'm a bit at a loss about the logic ;)


Okay, so this is part of a GUI programme that uses GTK3. I have a window with two buttons and two entry widgets: one button is linked to the code that searches PubMed (e.g. for "brainbow"), and the other is linked to searching for an author (e.g. "Currie"). My idea was that users can search for records without necessarily also searching for an author.


Then you could still use the input from both entry widgets to build a single search term? But okay, you seem quite convinced about your approach. Is the problem that the search_author isn't found? Or what goes wrong in your code?
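
For illustration, combining the two entry fields into one query could look roughly like the sketch below. The function name `build_pubmed_term` is made up for this example; the only thing taken from the thread is the `keyword AND name[Author]` query form. The resulting string would be passed as the `term` argument of `Entrez.esearch`.

```python
def build_pubmed_term(keyword, author=None):
    """Combine a keyword and an optional author into one PubMed query string.

    Hypothetical helper: if no author is given, the keyword is returned
    unchanged, so the author field in the GUI can be left empty.
    """
    keyword = keyword.strip()
    if not author or not author.strip():
        return keyword
    return "%s AND %s[Author]" % (keyword, author.strip())


print(build_pubmed_term("brainbow"))
print(build_pubmed_term("brainbow", "Currie"))
```

This keeps the author optional, which seems to be the main requirement, but it does mean a second request to PubMed whenever the author changes.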


Yes, the bit that goes wrong is this:

if search_author in records_str:
    print("yes")
else: 
    print("no")

The search_author isn't found.


And have you confirmed that records_str contains Currie? Because when I use your code, records_str doesn't look like something useful ;)

You probably need records_str.append() instead of +=. But even then, your if search_author in records_str: check won't work, because records_str is a list and in on a list only matches whole elements. You'll need to iterate over the list and, for every element, ask whether search_author is in that element.

Note that you can simplify a lot too, but let's first try to get it working correctly before further changing stuff.
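
To make the difference concrete, here is a small standalone sketch. The sample string is invented and only imitates the format used in the question; no PubMed request is made.

```python
# Made-up sample in the same "Title/Authors" layout as the question's code
line = "Title: A\nAuthors: Currie JD\n\n"

as_chars = []
as_chars += line        # += extends the list with every single character

as_items = []
as_items.append(line)   # append() adds the whole string as one element

print(len(as_chars), len(as_items))

# A list membership test compares whole elements, so this is False:
print("Currie" in as_chars)

# Checking each element as a string finds the substring, so this is True:
print(any("Currie" in item for item in as_items))
```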


Thanks for your help, I have now solved my problem.

7.4 years ago
flisstee • 0

This now searches pubmed and then searches those results for a specific author.

from Bio import Entrez, SeqIO, Medline
Entrez.email = "A.N.Other@example.com" 

def pbmd_search(): #searches pubmed database, using Biopython documentation
    handle = Entrez.egquery(term="brainbow")
    record = Entrez.read(handle)
    for row in record["eGQueryResult"]:
        if row["DbName"]=="pubmed":
            print(str(row["Count"]) + " records returned")

    handle = Entrez.esearch(db="pubmed", term="brainbow", retmax=463)
    record = Entrez.read(handle)
    idlist = record["IdList"]

    handle = Entrez.efetch(db="pubmed", id=idlist, rettype="medline", retmode="text")
    records = Medline.parse(handle)
    records = list(records)
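    # Build one formatted string per record. append() keeps each record as a
    # single list element, and get("AU", []) copes with records lacking authors.
    records_str = []
    for record in records:
        records_str.append("Title: %s \nAuthors: %s \nSource: %s\n\n" %(record.get("TI"), ", ".join(record.get("AU", [])), record.get("SO")))

    print("".join(records_str))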

    search_author = "Currie JD"
    author_result = []

    for record in records:
        # Skip records that have no author field at all
        if "AU" not in record:
            continue
        # record["AU"] is a list of names, so this matches a whole name exactly
        if search_author in record["AU"]:
            author_result.append("Title: %s \nAuthors: %s \nSource: %s\n\n" %(record.get("TI"), ", ".join(record.get("AU")), record.get("SO")))
            print(search_author)
        else:
            # Printed once for every record that does not list the author
            print("Author not found")

    print("".join(author_result))

pbmd_search()
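
Because the check above compares against whole entries of the "AU" list, the user has to type the name exactly as it is stored (e.g. "Currie JD"). A more forgiving filter could be sketched as follows. The function name `filter_by_author` and the sample records are invented for illustration, and it assumes each record is dict-like with an optional "AU" list of "Surname Initials" strings, as in the code above.

```python
def filter_by_author(records, author):
    """Return the records whose author list contains the given name.

    Hypothetical helper. Matching is case-insensitive and accepts either
    the full stored name ("Currie JD") or just the surname ("Currie").
    """
    wanted = author.strip().lower()
    matches = []
    for record in records:
        # Records without an "AU" field are treated as having no authors
        names = [name.lower() for name in record.get("AU", [])]
        if any(n == wanted or n.startswith(wanted + " ") for n in names):
            matches.append(record)
    return matches


# Made-up records standing in for the output of Medline.parse()
sample = [
    {"TI": "Example paper one", "AU": ["Currie JD", "Smith A"]},
    {"TI": "Example paper two", "AU": ["Jones B"]},
    {"TI": "Example paper three"},  # no author field
]

for record in filter_by_author(sample, "currie"):
    print(record["TI"])
```

Keeping the filter separate from the PubMed request means the author button in the GUI can reuse the records fetched by the first search.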

Why this part:

handle = Entrez.egquery(term="brainbow")
record = Entrez.read(handle)
for row in record["eGQueryResult"]:
    if row["DbName"]=="pubmed":
        print(str(row["Count"]) + " records returned")

Couldn't you just use the length of the idlist to determine the number of records?
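
One caveat with using the length of the idlist: it is capped by retmax, so it can be smaller than the real number of hits. As far as I know, the parsed esearch result also carries a "Count" entry with the total. The sketch below uses a hand-written dict with placeholder values instead of a real Entrez.read() result, and `summarise_esearch` is a made-up name.

```python
def summarise_esearch(result):
    """Return (total hits, number of IDs actually returned).

    Assumes `result` behaves like the dict parsed from an esearch reply,
    with a "Count" string and an "IdList" list.
    """
    total = int(result["Count"])
    returned = len(result["IdList"])
    return total, returned


# Placeholder values, not real PubMed data
fake_result = {"Count": "5", "IdList": ["111", "222", "333"]}

total, returned = summarise_esearch(fake_result)
print("%d records found, %d IDs returned" % (total, returned))
```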


I took this part straight out of the Biopython documentation. It seems to work, so I didn't really think it needed altering.
