Question: How To Retrieve Data From Jgi Automatically Given A Set Of Ids?
Manu Prestat (Marseille, France) wrote 5.6 years ago:

Hi, I need to retrieve genomes and metagenomes (assemblies or raw sequences) from the JGI databases. This is doable (though not easy) by filling in HTML forms and following links. However, I need to repeat the process hundreds of times, and I would rather not do it by hand.

JGI does provide (very brief) API documentation and an API XML schema (XSD) (usually understood only by Pierre, alias @yokofakun ;-) ), but I cannot even get the curl "signing on" command to work. Do you know a way to automate this task (e.g. using R, Python, or any GNU tool...) given some IDs (such as project or sample IDs)?

Thanks, Manu

R api python xml • 2.6k views

Hi glarue,

I have read the script "jgi-query.py" from https://github.com/glarue/jgi-query, but I don't understand it yet.

I want to download metagenomes from JGI using the API.

Does your script work for downloading metagenomes from JGI?

Best, Bing

written 8 months ago by bison1000

Geez, sorry to have missed this for so long; my notification settings must not be set up correctly.

The answer to your question depends on what you mean by "metagenome" and on how JGI structures its databases, although I fear the answer may be "no". Basically, you have to provide a category to jgi-query, and all of the files organized under that category will be listed. If you are interested in multiple fungal genomes, for example, you can use the query fungi to retrieve a (huge) list of all available files, and then download individual files from within that set (probably using the regex option r at the prompt). If the species you are interested in are not in fungi, you will have to experiment to find a query broad enough to include everything you need.

jgi-query was originally designed for grabbing files on a per-species basis. It can also download large file sets, but how well that works will depend on your specific needs. Hope that helps clarify things.
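
To make the "list everything under a category, then pick files by regex" idea concrete, here is a minimal sketch in Python. It is not taken from jgi-query. It assumes the file listing is an XML document whose file elements carry filename and url attributes, and the sample listing (including every file name and path in it) is made up for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Made-up sample listing; the element and attribute names are assumptions
# about what a JGI file listing looks like, not a documented schema.
SAMPLE_XML = """<organismDownloads name="fungi">
  <folder name="Assembled scaffolds">
    <file filename="Aspni1_scaffolds.fasta.gz" url="/portal/Aspni1/download/Aspni1_scaffolds.fasta.gz"/>
    <file filename="Trire2_scaffolds.fasta.gz" url="/portal/Trire2/download/Trire2_scaffolds.fasta.gz"/>
  </folder>
  <folder name="Annotation">
    <file filename="Aspni1_genes.gff.gz" url="/portal/Aspni1/download/Aspni1_genes.gff.gz"/>
  </folder>
</organismDownloads>"""


def select_files(xml_text, pattern):
    """Return (filename, url) pairs whose filename matches the regex."""
    regex = re.compile(pattern)
    root = ET.fromstring(xml_text)
    return [
        (node.get("filename"), node.get("url"))
        for node in root.iter("file")  # walks nested folders in document order
        if regex.search(node.get("filename", ""))
    ]


if __name__ == "__main__":
    for name, url in select_files(SAMPLE_XML, r"scaffolds\.fasta"):
        print(name, url)
```

With a broad category the listing can be very large, so a tight pattern (for example one anchored on a species prefix) keeps the selection manageable.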

written 8 weeks ago by glarue
glarue (United States) wrote 4.0 years ago:

I know this is a late response, and it may not do exactly what you need, but feel free to check out a script I wrote to do something similar here: https://github.com/glarue/jgi-query

It's written in Python and runs from the command line. I haven't tested it on Mac or Windows, but it should (theoretically) work there as well, as long as cURL and Python are installed.

Hope it helps, if you still need it!

EDIT: while jgi-query was designed primarily to download various files for a single organism, you can also download very large datasets with it by using higher-level taxonomic group names and range-formatted file selection syntax. For example, you can retrieve the entire fungal database with the command "jgi-query fungi", although selecting specific subsets of files can become onerous with large databases.
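
For the batch case in the question (hundreds of IDs), the sign-on and file-listing steps could be scripted roughly as below. This is a sketch under assumptions, not jgi-query's own code: the two URLs and the organism query parameter are assumed from JGI's brief API documentation and may have changed, so check them against the current documentation first. The IDs and credentials in the usage lines are placeholders. By default the function only builds the curl commands (a dry run) so they can be inspected before anything is sent.

```python
import os
import subprocess
from urllib.parse import urlencode

# Assumed endpoints; verify against the current JGI API documentation.
SIGNON_URL = "https://signon.jgi.doe.gov/signon/create"
DIRECTORY_URL = "https://genome.jgi.doe.gov/portal/ext-api/downloads/get-directory"


def signon_command(login, password, cookie_file="cookies"):
    """Build the curl call that signs on and stores the session cookie."""
    return [
        "curl", SIGNON_URL,
        "--data-urlencode", f"login={login}",
        "--data-urlencode", f"password={password}",
        "-c", cookie_file,   # write cookies to this file
        "-o", os.devnull,    # discard the response body
    ]


def directory_command(portal_id, cookie_file="cookies"):
    """Build the curl call that saves the XML file listing for one ID."""
    query = urlencode({"organism": portal_id})
    return [
        "curl", f"{DIRECTORY_URL}?{query}",
        "-b", cookie_file,        # send the cookies saved at sign-on
        "-o", f"{portal_id}.xml",
    ]


def fetch_listings(portal_ids, login, password, dry_run=True):
    """Sign on once, then request one listing per ID; return the commands."""
    commands = [signon_command(login, password)]
    commands += [directory_command(pid) for pid in portal_ids]
    if not dry_run:
        for cmd in commands:
            subprocess.run(cmd, check=True)  # stop at the first failing call
    return commands


if __name__ == "__main__":
    # Placeholder IDs and credentials, printed only (dry run).
    for cmd in fetch_listings(["ID_ONE", "ID_TWO"], "USER_NAME", "USER_PASSWORD"):
        print(" ".join(cmd))
```

Signing on once and reusing the cookie file for every ID avoids repeating the login for each request; the saved XML listings can then be filtered for the files you actually want.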
