6.2 years ago

I think you should provide a tool to search ENA the same way Entrez Direct does.

Consider the simplicity of being able to run:

efetch -db nucleotide -id AF086833 -format genbank


or

esearch -db sra -query PRJNA257197 | efetch -format runinfo > runinfo.txt


In my opinion scientists that write new software that interacts with existing data are the ones driving innovation.

In the context of automating pipelines it would be an essential/useful addition, no argument there.

Having an additional option is always great, but what information can one get from ENA that can't be found using Entrez Direct (that is a real question :) )?

Sometimes having multiple options (that do more or less the same thing) confuses (new) users and becomes a matter of personal preference for educators.

I believe that ENA has a richer set of annotations than NCBI - or at least one that is better organized and more in tune with what biologists want. It is just not easy to get to it if you don't want to use a web browser.

There is a web API, but even I found it too complicated and too low level - though I am not shy about using the command line.

The ENA has programmatic query services

http://www.ebi.ac.uk/ena/browse/programmatic-access

For searching the catalogue there are a number of REST endpoints

http://www.ebi.ac.uk/ena/browse/search-rest

Thanks, but neither of these is similar to what I have described.

FWIW I don't consider the link below a useful programmatic access (taken from one of the resources that you mention).

As a matter of fact, that is really what would make the ENA more useful: instead of obscure, lengthy links, a tool that builds these links for us.

That is what Entrez Direct does.

As mentioned in the post below, we do have much simpler queries for programmatically fetching data and performing text searches, but yes, the advanced search is complicated. Unfortunately there is no easy way around this when you want to target search terms to specific fields within a record.

We are however wanting to start offering downloadable scripts for performing common actions within ENA next year, therefore this feedback is valuable. If you have key things that you feel these scripts should do, please feel free to start posting them here and we'll see what we can do.

This is something that got clarified in my own mind while writing the Biostar Handbook, teaching courses and observing how people interact with and work with bioinformatics data.

Graphical interfaces such as the ENA's are good for exploring data but completely inappropriate for reproducing results or for communicating to a fellow scientist what has actually been done.

In the whole book, which is now clocking in at more than 500 pages, there is not a single instance where we access data via a GUI - and it works like a charm. And I think that is the right way to go about it!

The main point I am trying to make is that we should use the browser to figure out what is there, but then there absolutely needs to be a way to get the same information in a simple and unambiguous way that allows users to go on their own explorations.

A lot is made of reproducible research (or the lack thereof), and one key component of that is well-defined data access. Lengthy URLs that wrap around many rows and are full of weird characters are not a good solution - they are impossible to parse visually or to explain to others.

My opinion is that having a well-documented and simple programmatic interface will be the deciding factor in choosing one resource over the other.

6.2 years ago
cochrane ▴ 10

Yes - lots of services available from EMBL-EBI.

For retrieval: curl "http://www.ebi.ac.uk/ena/data/view/<accession>&display=text" will retrieve any record given the accession, e.g. curl "http://www.ebi.ac.uk/ena/data/view/AF086833&display=text". This accepts ranges and lists and gives a number of format options for different data types. See http://www.ebi.ac.uk/ena/browse/data-retrieval-rest#retrieval_single_identifier for more details.

For search: There is an interactive interface that can be used to build up queries at http://www.ebi.ac.uk/ena/data/warehouse/search. The 'Search query' box shows a search being built up as you fill in fields and you can edit directly. Based on this, you can pull the whole thing out into a standard REST call...

Full details at http://www.ebi.ac.uk/ena/browse/search-rest.

I think the main point here is to build a command line tool like eutils which builds these URLs automagically, without the user having to type even this small string in.

efetch -db nucleotide -id AF086833 -format genbank is much simpler to remember/teach than http://www.ebi.ac.uk/ena/data/view/AF086833&display=text.
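To make the request concrete, here is a minimal sketch of what such a wrapper could look like. The names ena_url and ena_fetch and the -id/-format options are hypothetical (modelled loosely on the efetch call above) and are not an existing ENA tool. The URL pattern and the display=text value are the ones quoted in this thread; any other -format value is simply passed through unchecked, which is an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, NOT an official ENA tool: they only wrap the
# retrieval URL pattern quoted in this thread.

ENA_VIEW_BASE="http://www.ebi.ac.uk/ena/data/view"

# Print the retrieval URL for one accession.
# Usage: ena_url -id ACCESSION [-format DISPLAY]
ena_url() {
    local id="" format="text"   # "text" is the display value used in the thread
    while [ "$#" -gt 0 ]; do
        case "$1" in
            -id|-format)
                if [ "$#" -lt 2 ]; then
                    echo "ena_url: $1 needs a value" >&2
                    return 2
                fi
                if [ "$1" = "-id" ]; then id="$2"; else format="$2"; fi
                shift 2
                ;;
            *)
                echo "ena_url: unknown option: $1" >&2
                return 2
                ;;
        esac
    done
    if [ -z "$id" ]; then
        echo "usage: ena_url -id ACCESSION [-format DISPLAY]" >&2
        return 2
    fi
    # The format value is passed through as-is (assumption: not validated).
    printf '%s/%s&display=%s\n' "$ENA_VIEW_BASE" "$id" "$format"
}

# Fetch the record with curl; takes the same options as ena_url.
ena_fetch() {
    local url
    url="$(ena_url "$@")" || return
    curl --silent --show-error --fail "$url"
}
```

With these functions sourced into a shell, ena_fetch -id AF086833 > AF086833.txt would stand in for typing the full URL, and ena_url -id AF086833 prints the link so it can be recorded in a pipeline log.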

Don't get us wrong. We all appreciate the great job you guys do. We always want more :)

Hi genomax2, do you have a link to this Ensembl eutils project/discussion? As someone who is part of the Ensembl project, I'm not aware of an active project to develop such a tool. However, it sounds like an interesting idea for sure.

I was doing some forward-looking speculation based on this quote from @Nicole's post above :)

We are however wanting to start offering downloadable scripts for performing common actions within ENA next year

Edit: Since this sounds like an idea equivalent to Entrez eutils I should have referred to such a solution, Ensembl ENA eutils.

I hope they materialize at some point in the future, though. It always comes down to limited resources and how to distribute them. We promise to be patient while you consider this request.

Hi

I think you are confusing the ENA with Ensembl.

The ENA is the nucleotide archive hosted at EMBL-EBI and is a different service from Ensembl.

Ensembl, the genome browser, does source data from the ENA but isn't the same entity. Nicole works for the ENA, not Ensembl.

You are correct. My apologies. Got carried away there for a time.

I would suggest looking into how Entrez Direct works and how one can chain its actions together to fully appreciate the level of innovation there.

While the Entrez Direct implementation is a bit choppy (mainly, the error reporting is lacking), it has functionality that makes it uniquely powerful. It is just that many people don't know about it yet.

Thanks for the feedback Istvan and genomax2. I have had a look at Entrez Direct and will use it and your comments here when building a specification for our planned scripts. And yes, error reporting is an important requirement. You will be happy to know that this work is considered to be high-priority for 2017. I'll make a note to make any announcements regarding release of new programmatic tools here, but if you would like to make sure you are kept abreast of any and all announcements regarding new ENA functionality (or service interruptions), please consider signing up to our ena-announce mailing list: http://listserver.ebi.ac.uk/mailman/listinfo/ena-announce

Hi Nicole - I was wondering if there's been any progress on this? I'm just curious whether there's a tool similar to Entrez Direct for ENA.

Thank you!

-- Alex