Question: Entrez API returns results different from main search?
2.8 years ago by
Philipp Bayer6.5k
Australia/Perth/UWA
Philipp Bayer6.5k wrote:

I'm having a rather baffling problem. I'm trying to fetch the records for a bunch of BioSample IDs via the Entrez API in Python, using Biopython v1.68. Here's a minimal example for BioSample SAMEA2467098, URL: https://www.ncbi.nlm.nih.gov/biosample/?term=SAMEA2467098

This should be Brassica napus, cv. Beluga

In Python:

from Bio import Entrez
Entrez.email = 'SNIP'
sample = 'SAMEA2467098'
handle = Entrez.efetch('BioSample', id=sample, retmode='text')
print(handle.readlines())
['1: Non-tumor DNA sample from blood of a human participant in the dbGaP study "An APOBEC Cytidine Deaminase Mutagenesis in Human Cancers"\n', 'Identifiers: BioSample: SAMN02467098\n', 'Organism: Homo sapiens\n', 'Attributes:\n', '    /submitter handle="NHGRI_APOBEC_CytidineDeaminase"\n', '    /biospecimen repository="NHGRI_APOBEC_CytidineDeaminase"\n', '    /study name="An APOBEC Cytidine Deaminase Mutagenesis in Human Cancers"\n', '    /study design="Cross-Sectional"\n', '    /biospecimen repository sample id="TCGA-BP-5199-11A-01D-1429-08"\n', '    /submitted sample id="TCGA-BP-5199-11A-01D-1429-08"\n', '    /submitted subject id="TCGA-BP-5199-11A-01D-1429-08"\n', '    /study disease="Neoplasms"\n', '    /tissue="blood"\n', '    /analyte type="DNA"\n', '    /is tumor="No"\n', '    /subject is affected="Yes"\n', '    /molecular data type="SNV (.MAF)"\n', 'Accession: SAMN02467098\tID: 2467098\n', '\n', '\n']

Suddenly it's human, and the accession has changed from SAMEA2467098 to SAMN02467098 (compare SAME A 2467098 with SAM N0 2467098), which is indeed a human sample: https://www.ncbi.nlm.nih.gov/biosample/?term=SAMN02467098

What am I doing wrong here? It works for most of my other BioSample IDs!

Edit: It's not a Biopython problem, using the API manually gives the same result: https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=biosample&id=SAMEA2467098&retmode=text
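To reproduce the request without Biopython, the URL can be assembled with the standard library alone. This is only a minimal sketch: `eutils_url` is a hypothetical helper name, and the second URL (an esearch request that passes the accession as `term` rather than `id`) is included purely for comparison.

```python
from urllib.parse import urlencode

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def eutils_url(tool, **params):
    """Build an E-utilities URL for a tool such as 'efetch' or 'esearch'."""
    return f"{EUTILS}/{tool}.fcgi?{urlencode(params)}"


# Same request as the link above: the accession is passed as `id`.
fetch_url = eutils_url("efetch", db="biosample", id="SAMEA2467098", retmode="text")

# For comparison: a search request passes the accession as `term` instead.
search_url = eutils_url("esearch", db="biosample", term="SAMEA2467098")

print(fetch_url)
print(search_url)
```

Opening either URL in a browser or with `urllib.request.urlopen` shows the raw response, which makes it easy to rule out the client library.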

sra entrez python • 2.0k views
modified 2.8 years ago by DCGenomics320 • written 2.8 years ago by Philipp Bayer6.5k

It may be worth opening a ticket with NCBI support. This looks like some sort of bug in their code.

modified 2.8 years ago • written 2.8 years ago by genomax74k

I sent them a message late last night. It looks like they'll file it as a bug; I'll update here.

written 2.8 years ago by Philipp Bayer6.5k

Any update on this issue?

The bug doesn't seem to be fixed yet.

written 2.1 years ago by Zorino50

I haven't heard anything back, but the workaround of going through the intermediate step (running esearch first, then passing the returned numeric ID to efetch) works.

written 2.1 years ago by Philipp Bayer6.5k
2.8 years ago by
Philipp Bayer6.5k
Australia/Perth/UWA
Philipp Bayer6.5k wrote:

Hm, I think this has to do with the way Entrez handles IDs. If I use esearch first to get the BioSample's numeric ID, it works:

handle = Entrez.esearch('BioSample', term='SAMEA2467098')
record = Entrez.read(handle)
record
{u'Count': '1', u'RetMax': '1', u'IdList': ['3769633'], u'TranslationStack': [{u'Count': '1', u'Field': 'All Fields', u'Term': 'SAMEA2467098[All Fields]', u'Explode': 'N'}, 'GROUP'], u'TranslationSet': [], u'RetStart': '0', u'QueryTranslation': 'SAMEA2467098[All Fields]'}

Using the new ID now in efetch:

handle = Entrez.efetch('BioSample', id=3769633, retmode='text')
handle.readlines()
['1: Beluga_PBY013; Beluga\n', 'Identifiers: BioSample: SAMEA2467098; SRA: ERS436855\n', 'Organism: Brassica napus\n', 'Attributes:\n', '    /sample name="ERS436855"\n', 'Accession: SAMEA2467098\tID: 3769633\n', '\n', '\n']

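We're back at Beluga! I'm still not 100% sure why it behaves this way. Possibly the ID I'm entering clashes with the human entry, or it gets shortened for the lookup: the human record's numeric ID is 2467098, the same digits that follow the SAMEA prefix in my accession.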
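The two steps above can be wrapped into one function. The following is a rough sketch rather than a definitive fix: `pick_single_uid` and `fetch_by_accession` are hypothetical names, and it assumes that searching for the bare accession returns exactly one hit, as it did for SAMEA2467098.

```python
def pick_single_uid(record, accession):
    """Return the only UID in a parsed esearch result, or raise if ambiguous."""
    ids = record["IdList"]
    if len(ids) != 1:
        raise LookupError(f"{accession}: expected 1 hit, got {len(ids)}")
    return ids[0]


def fetch_by_accession(accession, email):
    """Resolve an accession to its numeric UID via esearch, then efetch it."""
    # Imported here so that the helper above works without Biopython installed.
    from Bio import Entrez

    Entrez.email = email

    # Step 1: search for the accession to obtain the numeric UID.
    handle = Entrez.esearch(db="biosample", term=accession)
    record = Entrez.read(handle)
    handle.close()
    uid = pick_single_uid(record, accession)

    # Step 2: fetch the record by numeric UID instead of by accession.
    handle = Entrez.efetch(db="biosample", id=uid, retmode="text")
    text = handle.read()
    handle.close()
    return text
```

Calling `fetch_by_accession('SAMEA2467098', 'you@example.org')` (the e-mail address is a placeholder) would then go through esearch before efetch, so the accession is never passed as `id`.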

written 2.8 years ago by Philipp Bayer6.5k

The esearch command from the EDirect UNIX utilities seems to work:

esearch -db biosample -query "SAMEA2467098"|esummary
written 2.8 years ago by Sej Modha4.5k

Interesting, I can confirm this! I reckon it's because this one asks for a 'query', while I'm supplying an 'ID' above?

written 2.8 years ago by Philipp Bayer6.5k

Yes, I think you're right, because if I supply the accession as an ID, efetch returns the human result:

efetch -db biosample -id "SAMEA2467098"

1: Non-tumor DNA sample from blood of a human participant in the dbGaP study "An APOBEC Cytidine Deaminase Mutagenesis in Human Cancers"
Identifiers: BioSample: SAMN02467098
Organism: Homo sapiens

written 2.8 years ago by Sej Modha4.5k

I'd guess that NCBI Entrez Fetch is being 'helpful' by accepting the sample name when it really wants the numerical identifier, and this is going wrong. I'd email the E-utilities help team about this if I were you.
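Until this is resolved, one defensive option is to check that the record that comes back carries the accession that was asked for. The sketch below assumes the text output always contains a line of the form `Accession: ...<tab>ID: ...`, as in the outputs quoted in this thread; `returned_accession` and `check_record` are hypothetical helper names.

```python
import re

# Matches the "Accession: SAMEA2467098<tab>ID: 3769633" line of the text output.
ACCESSION_LINE = re.compile(r"^Accession:\s*(\S+)\s+ID:\s*(\d+)", re.MULTILINE)


def returned_accession(efetch_text):
    """Extract (accession, uid) from efetch text output, or None if absent."""
    match = ACCESSION_LINE.search(efetch_text)
    return match.groups() if match else None


def check_record(requested, efetch_text):
    """Return the UID, or raise if the record is not the one requested."""
    found = returned_accession(efetch_text)
    if found is None or found[0] != requested:
        raise ValueError(f"asked for {requested}, got {found}")
    return found[1]
```

With the human record quoted earlier, `check_record('SAMEA2467098', text)` raises, because the accession in the response is SAMN02467098.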

written 2.8 years ago by Peter5.8k
2.8 years ago by
DCGenomics320
United States
DCGenomics320 wrote:

Two things,

NCBI Education Group has been offering webinars on this lately.

Check out some of the recent ones at:

https://www.ncbi.nlm.nih.gov/home/coursesandwebinars.shtml

Also, if you put in an issue here, we'll try to help you with a solution:

https://github.com/NCBI-Hackathons/EDirect_EUtils_API_Cookbook

written 2.8 years ago by DCGenomics320