Question: retrieve matched genome/annotation pairs using Ensembl API
5 weeks ago by
United States
glarue50 wrote:

I'm trying to figure out a programmatic way of downloading genomes and their corresponding annotation files for a large number of species (hundreds).

I can't seem to find any reference for this in the Ensembl REST API docs. I suppose I could hack together something in Bash to wget the files from the Ensembl FTP server, but I'm wondering if there's a straightforward way that I'm missing.

Tags: ensembl, rest, genome
modified 4 weeks ago by Emily_Ensembl21k • written 5 weeks ago by glarue50

Have you looked at NCBI's new Datasets resource? There is a command-line tool available as well.

written 5 weeks ago by genomax92k

This is interesting, and I had not heard of it. Thanks! It still doesn't solve the issue, since I'd like to use Ensembl, but it's a good resource to be aware of.

written 4 weeks ago by glarue50

I don't know whether the Ensembl API is designed for downloading genome-wide data, though I could be wrong. I will ping @Emily from Ensembl.

written 4 weeks ago by genomax92k

What information do you have about each species (name, accession number, assembly ID, taxonomy ID, ...)? Does it have to be Ensembl, or would NCBI work too (NCBI FTP, efetch, esearch, ...)?

This link may help: "Cannot get efetch to download genome - what is wrong?"

written 5 weeks ago by Fatima820

Currently I'm hoping to use binomial names, although I could probably use any identifier that would work programmatically.

Ensembl is the preferred source. I've used NCBI's utilities in the past, which are much more robust, but in my experience the annotation pipeline at NCBI is more variable, hence the desire for Ensembl's standardized annotation.

written 5 weeks ago by glarue50
4 weeks ago by
Emily_Ensembl21k wrote:

The Ensembl REST API is not designed for bulk downloads like that. It should be relatively easy to script a wget download using the standard paths on the FTP site. You may find it useful to query the info/genomes/division endpoint of the REST API to get the genome names, assembly names, etc. that appear in the FTP site locations, though.
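
For what it's worth, here is a minimal Python sketch of that approach, not an official recipe. The helper names (species_key, build_urls, list_genomes) are hypothetical, and the FTP layout it assumes (release-N/fasta/<species>/dna/ and release-N/gtf/<species>/ under https://ftp.ensembl.org/pub) is the vertebrate layout as I understand it; other divisions are hosted separately and may be laid out differently, and some assembly names may not map directly onto file names. The JSON fields returned by info/genomes/division are not spelled out in this thread, so inspect the response before relying on any particular key.

```python
import json
import urllib.request

FTP_ROOT = "https://ftp.ensembl.org/pub"   # assumption: vertebrate FTP site
REST_ROOT = "https://rest.ensembl.org"


def species_key(binomial):
    """Turn a binomial such as 'Homo sapiens' into 'homo_sapiens'."""
    return "_".join(binomial.strip().lower().split())


def build_urls(binomial, assembly, release):
    """Return the assumed URLs of the top-level genome FASTA and the GTF."""
    key = species_key(binomial)
    prefix = key.capitalize() + "." + assembly   # e.g. 'Homo_sapiens.GRCh38'
    base = f"{FTP_ROOT}/release-{release}"
    return {
        "genome": f"{base}/fasta/{key}/dna/{prefix}.dna.toplevel.fa.gz",
        "annotation": f"{base}/gtf/{key}/{prefix}.{release}.gtf.gz",
    }


def list_genomes(division="EnsemblVertebrates"):
    """Fetch the genome records for one division from the REST API.

    Needs network access; the structure of each record is not assumed here.
    """
    url = f"{REST_ROOT}/info/genomes/division/{division}"
    request = urllib.request.Request(
        url, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


if __name__ == "__main__":
    # Example values only; substitute the release and assembly you need.
    for kind, url in build_urls("Homo sapiens", "GRCh38", 110).items():
        print(kind, url)
```

Building the URLs from a single (species, assembly, release) triple is what keeps each genome matched to its annotation. The printed URLs can be written to a file and passed to wget -i, and it is worth checking a handful by hand before launching hundreds of downloads.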

written 4 weeks ago by Emily_Ensembl21k