Question: biopython esearch not giving all children taxIDs
yarmda0 wrote:

I know similar questions have been posted before:

How To Retrieve All Sequences, From Ncbi, That Belong To A Specific Txid And Its Sub Txids?

C: Refseq Proteins For A Given Taxid

But I am still having trouble retrieving the sequences for the children of a given taxID.

For instance,

from Bio import Entrez
record = Entrez.read(Entrez.esearch(db='protein', term="txid1392[Organism]"))
record['IdList']

This returns just one list of protein UIDs for the species Bacillus anthracis (taxID 1392), not a list for each organism below this taxon. As a result, Entrez.efetch only returns one set of protein sequences.

Dropping the [Organism] doesn't change this behavior. Am I missing something?

entrez biopython ncbi • 617 views
modified 10 months ago by tsrmhathesh0 • written 23 months ago by yarmda0

Unfortunately, I think you might have to list all the child taxon identifiers explicitly - but try exploring the web interface for building an advanced query first in case that shows a better solution.

written 23 months ago by Peter5.8k

Actually, I think it may have been an issue with a default retmax of 20.

written 23 months ago by yarmda0

Oh good. I should have tried the example myself really to confirm my hunch. Thanks!

written 23 months ago by Peter5.8k

Hi, I am currently learning Biopython. Does it have influence and importance in industry these days?

written 10 months ago by tsrmhathesh0

Hi, this comment is not appropriate to this (very old) thread.

If you wish to ask a question, please create your own thread. If you do, I strongly encourage you to search the forum first (since questions like this are asked often - and are of dubious usefulness). If you cannot find something that satisfies you, ask a question, but please add much more information and detail and make the question as specific as possible.

written 10 months ago by jrj.healey13k
Renesh1.6k wrote:

Your code fetches only the UIDs on the first page of results (ESearch returns 20 by default). You need to pass the retmax parameter to get more records. See the corrected code below:

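 from Bio import Entrez
 Entrez.email = "you@example.com"  # placeholder; NCBI asks for a contact address
 # retmax raises the number of UIDs returned per request (the default is 20)
 record = Entrez.read(Entrez.esearch(db='protein', retmax=770094, term="txid1392[Organism]"))
 record['IdList']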

I set retmax=770094 because this taxon had 770094 protein records at the time of writing.
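
A hard-coded count goes stale as records are added, and as far as I understand the E-utilities documentation a single ESearch request is capped (10,000 UIDs at the time of writing), so a very large retmax may still come back truncated. Below is a minimal sketch of paging with retstart instead, taking the total from the Count field of the response. The helper names fetch_all_ids and entrez_search are made up for illustration, and the batch size and email address are placeholders:

```python
def fetch_all_ids(search, term, db="protein", batch_size=10000):
    """Collect every UID for `term` by paging through ESearch results.

    `search` is any callable taking (db, term, retstart, retmax) and returning
    a dict-like record with "Count" and "IdList" keys (hypothetical interface).
    """
    ids = []
    retstart = 0
    total = None
    while total is None or retstart < total:
        record = search(db=db, term=term, retstart=retstart, retmax=batch_size)
        total = int(record["Count"])  # total hits, independent of retmax
        page = list(record["IdList"])
        if not page:  # stop rather than loop forever on an empty page
            break
        ids.extend(page)
        retstart += len(page)
    return ids


def entrez_search(db, term, retstart, retmax):
    """Thin wrapper around Bio.Entrez.esearch (needs Biopython and network)."""
    from Bio import Entrez
    Entrez.email = "you@example.com"  # placeholder; use your own address
    handle = Entrez.esearch(db=db, term=term, retstart=retstart, retmax=retmax)
    try:
        return Entrez.read(handle)
    finally:
        handle.close()
```

With this, something like ids = fetch_all_ids(entrez_search, "txid1392[Organism]") should return the full list. Passing the search function in as an argument keeps the paging loop separate from the network call, so it can be checked with a stub before hitting NCBI.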

Alternatively, you can get the list of sequences for txid1392[Organism] from the NCBI website:

  • Go to the NCBI Entrez search, choose the protein database from the dropdown list, and type txid1392[Organism] (this fetches all protein sequences for txid1392[Organism])
  • Click the "Send to" button and send to a file (you can choose FASTA format for downloading the sequences)
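
The same download can also be scripted. Here is a rough sketch, assuming the UID list from ESearch is already in hand: Entrez.efetch accepts a comma-separated id string, and rettype="fasta" with retmode="text" asks for plain FASTA. The helper names chunked and download_fasta and the batch size of 200 are my own choices, not anything NCBI prescribes:

```python
def chunked(items, size):
    """Yield consecutive slices of `items` with at most `size` elements each."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def download_fasta(ids, out_path, batch_size=200):
    """Write FASTA records for `ids` to `out_path` (needs Biopython and network)."""
    from Bio import Entrez
    Entrez.email = "you@example.com"  # placeholder; use your own address
    with open(out_path, "w") as out:
        for batch in chunked(ids, batch_size):
            handle = Entrez.efetch(db="protein", id=",".join(batch),
                                   rettype="fasta", retmode="text")
            try:
                out.write(handle.read())
            finally:
                handle.close()
```

Fetching in batches keeps each request small, which matters for a taxon with hundreds of thousands of records.
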
written 23 months ago by Renesh1.6k