Biopython: Automating GenPept Queries
12.4 years ago
Ben Yackley ▴ 10

Hi - I've been trying to use Biopython to automate some of my research. I've been able to get the NCBIWWW module to download XML files for the queries I send it, but those files don't contain all the information I need. Specifically, I need the GenPept information (e.g. the taxonomy of the organisms the query found) for each match. I can't find a way to do this in the documentation, so I'm turning to you for help. Can this be done?

Thanks for the help!

biopython genbank sequence retrieval • 2.7k views

I am not sure if this will help, but here is a post that might be relevant: http://www.mailinglistarchive.com/html/biopython@biopython.org/2010-11/msg00008.html

That's actually from the same project I'm working on - it relates more to processing the GenPept files once we've got them. I'm trying to get them to download automatically in the first place. Thanks, though.

12.4 years ago
Peter 6.0k

You can access the NCBI taxonomy information using the NCBI Entrez Utilities. See the "Finding the lineage of an organism" example in the chapter on Bio.Entrez in the Biopython Tutorial (http://biopython.org/DIST/docs/tutorial/Tutorial.html) and also http://eutils.ncbi.nlm.nih.gov/corehtml/query/static/efetchtax_help.html
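
A minimal sketch of that approach, assuming you already have the protein accessions of your BLAST hits. It asks Entrez for each record in GenPept format and reads the lineage that Biopython stores under the "taxonomy" annotation. The function names, the batch size of 100, and the example accession and email address below are illustrative assumptions, not something from the tutorial.

```python
def chunked(items, size):
    """Split a list into consecutive chunks of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def fetch_genpept_taxonomy(accessions, email, batch_size=100):
    """Return {record id: lineage list} for the given protein accessions.

    Needs Biopython and network access to NCBI.
    """
    # Imported here so chunked() stays usable without Biopython installed.
    from Bio import Entrez, SeqIO

    Entrez.email = email  # NCBI asks for a contact address with each request
    lineages = {}
    for batch in chunked(list(accessions), batch_size):
        # rettype="gp" requests GenPept flat-file records from the protein db.
        handle = Entrez.efetch(db="protein", id=",".join(batch),
                               rettype="gp", retmode="text")
        try:
            # GenPept uses the GenBank flat-file layout, so "genbank" parses it.
            for record in SeqIO.parse(handle, "genbank"):
                lineages[record.id] = record.annotations.get("taxonomy", [])
        finally:
            handle.close()
    return lineages
```

Calling something like `fetch_genpept_taxonomy(["NP_000509.1"], email="you@example.org")` should then give a dictionary keyed by record ID, with each value a list of taxon names from the record's lineage.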

12.4 years ago

Hi Ben, have you looked at this? ftp://ftp.ncifcrf.gov/pub/genpept/

There are 4 files here: gpdat_1.seq.gz, gpdat_2.seq.gz, gpdat_3.seq.gz and gpdat_4.seq.gz

You can download and parse them accordingly, based on what you need. Hope it helps.
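
As a rough sketch of the parsing step, assuming one of these files has already been downloaded and is in standard GenPept flat-file format: Biopython's SeqIO can stream the records straight from the gzipped file, and the lineage is then available per record. The function names and the "unknown" label are illustrative assumptions.

```python
import gzip
from collections import Counter


def iter_taxonomy(path):
    """Yield (record id, organism, lineage) from a gzipped GenPept file.

    Needs Biopython; records are streamed, so large files are not loaded whole.
    """
    # Imported here so tally_top_level() stays usable without Biopython.
    from Bio import SeqIO

    with gzip.open(path, "rt") as handle:
        for record in SeqIO.parse(handle, "genbank"):
            yield (record.id,
                   record.annotations.get("organism", ""),
                   record.annotations.get("taxonomy", []))


def tally_top_level(lineages):
    """Count records by the first taxon in each lineage."""
    return Counter(lineage[0] if lineage else "unknown"
                   for lineage in lineages)
```

For example, `tally_top_level(lineage for _, _, lineage in iter_taxonomy("gpdat_1.seq.gz"))` would summarise how many records fall under each top-level taxon.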
