Question: biopython esearch not giving all children taxIDs
yarmda40 wrote:

I know similar questions have been posted before:

How To Retrieve All Sequences, From Ncbi, That Belong To A Specific Txid And Its Sub Txids?

C: Refseq Proteins For A Given Taxid

But, I am having trouble retrieving the children sequences of a given taxID.

For instance,

from Bio import Entrez
record = Entrez.read(Entrez.esearch(db='protein', term="txid1392[Organism]"))
record['IdList']

Returns just one list of protein UIDs for the Bacillus anthracis species at 1392, not the list for each organism that is below this taxon. Thus, Entrez.efetch only returns one set of protein sequences.

Dropping the [Organism] doesn't change this behavior. Am I missing something?

entrez biopython ncbi • 757 views
modified 21 months ago by tsrmhathesh0 • written 2.8 years ago by yarmda40

Unfortunately, I think you might have to list all the child taxon identifiers explicitly - but try exploring the web interface for building an advanced query first in case that shows a better solution.

written 2.8 years ago by Peter5.8k

Actually, I think it may have been an issue with a default retmax of 20.
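
One way to check this hunch is to compare the Count field of the esearch result with the length of IdList: Count reports the total number of matches, while IdList holds only the UIDs that were actually returned. Below is a minimal sketch; `is_truncated` is a hypothetical helper name, and the record is assumed to be the dictionary-like object that Entrez.read returns for an esearch call.

```python
def is_truncated(record):
    """Return True when esearch matched more UIDs than it returned."""
    total = int(record["Count"])      # Count arrives as a string
    returned = len(record["IdList"])  # only the UIDs included in this response
    return returned < total


# Usage (needs Biopython and network access; the address is a placeholder):
# from Bio import Entrez
# Entrez.email = "you@example.com"
# record = Entrez.read(Entrez.esearch(db="protein", term="txid1392[Organism]"))
# print(record["Count"], len(record["IdList"]), is_truncated(record))
```

If the helper returns True, the search itself matched more records than were listed, which points at retmax rather than at the taxonomy term.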

written 2.8 years ago by yarmda40

Oh good. I should have tried the example myself really to confirm my hunch. Thanks!

written 2.8 years ago by Peter5.8k

Hi, I am currently learning Biopython. Does it have influence and importance in industry in this era?

written 21 months ago by tsrmhathesh0

Hi, this comment is not appropriate to this (very old) thread.

If you wish to ask a question, please create your own thread. If you do, I strongly encourage you to search the forum first (since questions like this are asked often - and are of dubious usefulness). If you cannot find something that satisfies you, ask a question, but please add much more information and detail and make the question as specific as possible.

written 21 months ago by Joe17k
Renesh1.9k wrote:

Your code fetches only the UIDs from the first page, because esearch returns 20 UIDs by default. You need to pass the retmax parameter to fetch more records. Here is the corrected code:

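 from Bio import Entrez
 Entrez.email = "you@example.com"  # placeholder; NCBI asks for a contact address
 # retmax raises the number of UIDs returned (the default is 20)
 handle = Entrez.esearch(db='protein', retmax=770094, term="txid1392[Organism]")
 record = Entrez.read(handle)
 record['IdList']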

I set retmax=770094 because this taxon has 770094 protein records at the time of writing.
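
Hard-coding the record count goes stale as the database grows, and NCBI's E-utilities documentation describes an upper limit on retmax for a single request (10,000 as far as I know, so treat the page size below as an assumption). A hedged alternative is to read Count from the response and page through the results with retstart. In this sketch, `fetch_all_ids` and `_entrez_search` are hypothetical helper names, and the `search` parameter exists only so the paging logic can be tried without network access.

```python
def _entrez_search(**params):
    """Run one esearch request through Biopython and return the parsed record."""
    # Imported here so the paging logic below can be exercised without Biopython.
    from Bio import Entrez
    Entrez.email = "you@example.com"  # placeholder; NCBI asks for a contact address
    handle = Entrez.esearch(**params)
    try:
        return Entrez.read(handle)
    finally:
        handle.close()


def fetch_all_ids(term, db="protein", page_size=10000, search=None):
    """Collect every UID for `term` by paging through esearch results."""
    if search is None:
        search = _entrez_search
    ids = []
    total = None
    while total is None or len(ids) < total:
        record = search(db=db, term=term, retstart=len(ids), retmax=page_size)
        total = int(record["Count"])  # total matches, reported as a string
        page = list(record["IdList"])
        if not page:  # stop if the server returns an empty page
            break
        ids.extend(page)
    return ids


# Usage (needs Biopython and network access):
# ids = fetch_all_ids("txid1392[Organism]")
# print(len(ids))
```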

Alternatively, you can get the list of sequences for txid1392[Organism] from the NCBI website:

  • Go to the NCBI Entrez search page, choose the protein database from the dropdown list, and type txid1392[Organism] (this fetches all protein sequences for txid1392[Organism]).
  • Click the "Send to" button and choose File (you can select FASTA format to download the sequences).
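
The same download can also be scripted. The sketch below follows the history-server pattern from the Biopython tutorial: esearch with usehistory="y", then efetch in batches using the returned WebEnv and QueryKey. `download_fasta`, the batch size of 500, and the `entrez` parameter (a stand-in hook for trying the loop offline) are my own assumptions, not part of the answer above.

```python
def download_fasta(term, out, db="protein", batch_size=500, entrez=None):
    """Write FASTA records for every hit of `term` to the open text file `out`."""
    if entrez is None:
        # Imported lazily so the batching logic can be exercised with a stand-in object.
        from Bio import Entrez as entrez
        entrez.email = "you@example.com"  # placeholder; NCBI asks for a contact address

    # usehistory="y" asks NCBI to keep the result set on its history server,
    # so no UIDs need to be returned here (retmax=0).
    handle = entrez.esearch(db=db, term=term, usehistory="y", retmax=0)
    search = entrez.read(handle)
    handle.close()
    total = int(search["Count"])

    # Fetch the stored result set one batch at a time.
    for start in range(0, total, batch_size):
        handle = entrez.efetch(
            db=db, rettype="fasta", retmode="text",
            retstart=start, retmax=batch_size,
            webenv=search["WebEnv"], query_key=search["QueryKey"],
        )
        out.write(handle.read())
        handle.close()
    return total


# Usage (needs Biopython and network access; the file name is illustrative):
# with open("txid1392_proteins.fasta", "w") as out:
#     download_fasta("txid1392[Organism]", out)
```

For a taxon with several hundred thousand records this makes many requests, so the web download may still be the more practical route.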
written 2.8 years ago by Renesh1.9k